2.4.3 Character Type

A character in Emacs Lisp is nothing more than an integer. In other words, characters are represented by their character codes. For example, the character A is represented as the integer 65.

Individual characters are used occasionally in programs, but it is more common to work with strings, which are sequences composed of characters. See String Type.

Characters in strings and buffers are currently limited to the range of 0 to 4194303, that is, twenty-two bits (see Character Codes). Codes 0 through 127 are ASCII codes; the rest are non-ASCII (see Non-ASCII Characters). Characters that represent keyboard input have a much wider range, to encode modifier keys such as Control, Meta and Shift.
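The statements above can be checked interactively, for instance with `M-:` or in the `*scratch*` buffer. The following is a minimal sketch rather than part of the original text; it assumes the built-in functions `characterp` and `max-char` are available, as they are in current Emacs releases.

```elisp
;; A character constant is read as a plain integer.
(integerp ?A)            ; ⇒ t
(= ?A 65)                ; ⇒ t

;; `characterp' accepts only codes that can appear in strings and buffers.
(characterp ?A)          ; ⇒ t
(= (max-char) 4194303)   ; ⇒ t, the same as (1- (expt 2 22))
(characterp 4194304)     ; ⇒ nil

;; A keyboard input character carrying a modifier bit is outside that range.
(characterp ?\M-a)       ; ⇒ nil
```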

There are special functions for producing a human-readable textual description of a character for the sake of messages. See Describing Characters.
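As a brief illustration of such descriptions, the sketch below uses `single-key-description` and `text-char-description`. These names do not appear in this section; they are assumed here to be the functions covered under Describing Characters.

```elisp
;; Key-style description, as used when talking about keyboard input.
(single-key-description ?\C-x)   ; ⇒ "C-x"

;; Text-style description, as used for characters found in text.
(text-char-description ?\C-c)    ; ⇒ "^C"

;; For ordinary characters, `format' with %c inserts the character itself.
(format "%c" 65)                 ; ⇒ "A"
```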

• Basic Char Syntax: Syntax for regular characters.
• General Escape Syntax: How to specify characters by their codes.
• Ctl-Char Syntax: Syntax for control characters.
• Meta-Char Syntax: Syntax for meta-characters.
• Other Char Bits: Syntax for hyper-, super-, and alt-characters.

Basic Char Syntax

Since characters are really integers, the printed representation of a character is a decimal number. This is also a possible read syntax for a character, but writing characters that way in Lisp programs is not clear programming. You should always use the special read syntax formats that Emacs Lisp provides for characters. These syntax formats start with a question mark.
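To make the point about printed representation and read syntax concrete, here is a small sketch using only the standard functions `prin1-to-string`, `read`, and `char-to-string`:

```elisp
;; Printing a character yields its decimal code.
(prin1-to-string ?A)          ; ⇒ "65"

;; The decimal number and the question-mark syntax read as the same object.
(= (read "65") (read "?A"))   ; ⇒ t

;; To get the character back as text, convert it to a string explicitly.
(char-to-string 65)           ; ⇒ "A"
```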

The usual read syntax for alphanumeric characters is a question mark followed by the character; thus, ‘?A’ for the character A, ‘?B’ for the character B, and ‘?a’ for the character a.

For example:

?Q ⇒ 81     ?q ⇒ 113

You can use the same syntax for punctuation characters. However, if the punctuation character has a special syntactic meaning in Lisp, you must quote it with a ‘\’. For example, ‘?\(’ is the way to write the open-paren character. Likewise, if the character is ‘\’, you must use a second ‘\’ to quote it: ‘?\\’.
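The quoting rule for punctuation can be tried out as follows. This is only an illustrative sketch; the numeric values are the ordinary ASCII codes of the characters involved.

```elisp
;; Punctuation with syntactic meaning in Lisp is quoted with a backslash.
(= ?\( 40)    ; ⇒ t, open parenthesis
(= ?\) 41)    ; ⇒ t, close parenthesis
(= ?\; 59)    ; ⇒ t, semicolon
(= ?\" 34)    ; ⇒ t, double quote

;; The backslash itself needs a second backslash.
(= ?\\ 92)    ; ⇒ t
```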

You can express the characters control-g, backspace, tab, newline, vertical tab, formfeed, space, return, del, and escape as ‘?\a’, ‘?\b’, ‘?\t’, ‘?\n’, ‘?\v’, ‘?\f’, ‘?\s’, ‘?\r’, ‘?\d’, and ‘?\e’, respectively. (‘?\s’ followed by a dash has a different meaning: it applies the Super modifier to the following character.) Thus,

?\a ⇒ 7                 ; control-g, C-g
?\b ⇒ 8                 ; backspace, BS, C-h
?\t ⇒ 9                 ; tab, TAB, C-i
?\n ⇒ 10                ; newline, C-j
?\v ⇒ 11                ; vertical tab, C-k
?\f ⇒ 12                ; formfeed character, C-l
?\r ⇒ 13                ; carriage return, RET, C-m
?\e ⇒ 27                ; escape character, ESC, C-[
?\s ⇒ 32                ; space character, SPC
?\\ ⇒ 92                ; backslash character, \
?\d ⇒ 127               ; delete character, DEL

These sequences, which start with a backslash, are also known as escape sequences, because backslash plays the role of an escape character; this has nothing to do with the character ESC. ‘\s’ is meant for use in character constants; in string constants, just write the space.
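The following sketch illustrates the two readings of ‘\s’ and the use of escape sequences when building strings. It relies only on the standard functions `string`, `string=`, `logior`, and `ash`; the value 2**23 for the Super modifier is the one given later in this section.

```elisp
;; `\s' on its own is the space character.
(= ?\s 32)                            ; ⇒ t
(string= (string ?a ?\s ?b) "a b")    ; ⇒ t

;; `\s-' applies the Super modifier to the following character instead.
(= ?\s-a (logior (ash 1 23) ?a))      ; ⇒ t

;; Escape sequences work the same way in character and string constants.
(string= (string ?a ?\t ?b) "a\tb")   ; ⇒ t

;; `\e' is the ESC character, unrelated to backslash as an "escape" prefix.
(= ?\e 27)                            ; ⇒ t
```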

A backslash is allowed, and harmless, preceding any character without a special escape meaning; thus, ‘?\+’ is equivalent to ‘?+’. There is no reason to add a backslash before most characters. However, you must add a backslash before any of the characters ‘()[]\;"’, and you should add a backslash before any of the characters ‘|'`#.,’ to avoid confusing the Emacs commands for editing Lisp code. You should also add a backslash before Unicode characters which resemble the previously mentioned ASCII ones, to avoid confusing people reading your code. Emacs will highlight some non-escaped commonly confused characters such as ‘‘’ to encourage this.

You can also add a backslash before whitespace characters such as space, tab, newline and formfeed. However, it is cleaner to use one of the easily readable escape sequences, such as ‘\t’ or ‘\s’, instead of an actual whitespace character such as a tab or a space. (If you do write backslash followed by a space, you should write an extra space after the character constant to separate it from the following text.)

General Escape Syntax

In addition to the specific escape sequences for special important control characters, Emacs provides several types of escape syntax that you can use to specify non-ASCII text characters.

  1. You can specify characters by their Unicode names, if any. ‘?\N{NAME}’ represents the Unicode character named NAME. Thus, ‘?\N{LATIN SMALL LETTER A WITH GRAVE}’ is equivalent to ‘?à’ and denotes the Unicode character U+00E0. To simplify entering multi-line strings, you can replace spaces in the names by non-empty sequences of whitespace (e.g., newlines).
  2. You can specify characters by their Unicode values. ‘?\N{U+X}’ represents a character with Unicode code point X, where X is a hexadecimal number. Also, ‘?\uxxxx’ and ‘?\Uxxxxxxxx’ represent code points xxxx and xxxxxxxx, respectively, where each x is a single hexadecimal digit. For example, ‘?\N{U+E0}’, ‘?\u00e0’ and ‘?\U000000E0’ are all equivalent to ‘?à’ and to ‘?\N{LATIN SMALL LETTER A WITH GRAVE}’. The Unicode Standard defines code points only up to ‘U+10ffff’, so if you specify a code point higher than that, Emacs signals an error.
  3. You can specify characters by their hexadecimal character codes. A hexadecimal escape sequence consists of a backslash, ‘x’, and the hexadecimal character code. Thus, ‘?\x41’ is the character A, ‘?\x1’ is the character C-a, and ‘?\xe0’ is the character à (a with grave accent). You can use any number of hex digits, so you can represent any character code in this way.
  4. You can specify characters by their character code in octal. An octal escape sequence consists of a backslash followed by up to three octal digits; thus, ‘?\101’ for the character A, ‘?\001’ for the character C-a, and ‘?\002’ for the character C-b. Only characters up to octal code 777 can be specified this way.
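The four forms listed above can be compared side by side. The sketch below is illustrative only and assumes an Emacs recent enough to support the ‘\N{...}’ syntax and `=` with more than two arguments.

```elisp
;; All of these denote U+00E0, LATIN SMALL LETTER A WITH GRAVE.
(= ?\N{LATIN SMALL LETTER A WITH GRAVE} #xe0)   ; ⇒ t
(= ?\N{U+E0} ?\u00e0 ?\U000000E0 ?\xe0)         ; ⇒ t

;; Hexadecimal and octal escapes for the character A.
(= ?\x41 ?\101 ?A)                              ; ⇒ t

;; Hexadecimal and octal escapes for C-a, and an octal escape for C-b.
(= ?\x1 ?\001 1)                                ; ⇒ t
(= ?\002 2)                                     ; ⇒ t
```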

These escape sequences may also be used in strings. See Non-ASCII in Strings.

Control-Character Syntax

Control characters can be represented using yet another read syntax. This consists of a question mark followed by a backslash, caret, and the corresponding non-control character, in either upper or lower case. For example, both ‘?\^I’ and ‘?\^i’ are valid read syntax for the character C-i, the character whose value is 9.

Instead of the ‘^’, you can use ‘C-’; thus, ‘?\C-i’ is equivalent to ‘?\^I’ and to ‘?\^i’:

?\^I ⇒ 9     ?\C-I ⇒ 9

In strings and buffers, the only control characters allowed are those that exist in ASCII; but for keyboard input purposes, you can turn any character into a control character with ‘C-’. The character codes for these non-ASCII control characters include the 2**26 bit as well as the code for the corresponding non-control character. Ordinary text terminals have no way of generating non-ASCII control characters, but you can generate them straightforwardly using X and other window systems.
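As a rough illustration of the difference between ASCII and non-ASCII control characters, the sketch below uses the digit 1, which has no ASCII control counterpart. The choice of ‘?\C-1’ is just an example, not taken from the text above.

```elisp
;; All spellings of C-i give the ASCII control character 9 (the same as tab).
(= ?\^I ?\^i ?\C-i ?\C-I ?\t 9)      ; ⇒ t

;; Control applied to a digit sets the 2**26 bit instead.
(= ?\C-1 (logior (ash 1 26) ?1))     ; ⇒ t

;; Such a code is valid as keyboard input, but not in a string or buffer.
(characterp ?\C-1)                   ; ⇒ nil
```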

For historical reasons, Emacs treats the DEL character as the control equivalent of ?:

?\^? ⇒ 127     ?\C-? ⇒ 127

As a result, it is currently not possible to represent the character Control-?, which is a meaningful input character under X, using ‘\C-’. It is not easy to change this, as various Lisp files refer to DEL in this way.
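A short check of this special case, as a sketch: both spellings produce DEL, and the result carries no control modifier bit.

```elisp
;; Both control spellings of `?' read as DEL.
(= ?\^? ?\C-? ?\d 127)               ; ⇒ t

;; The 2**26 control bit is not set, so this is not a Control-? input event.
(zerop (logand ?\C-? (ash 1 26)))    ; ⇒ t
```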

For representing control characters to be found in files or strings, we recommend the ‘^’ syntax; for control characters in keyboard input, we prefer the ‘C-’ syntax. Which one you use does not affect the meaning of the program, but may guide the understanding of people who read it.

Meta-Character Syntax

A meta character is a character typed with the META modifier key. The integer that represents such a character has the 2**27 bit set. We use high bits for this and other modifiers to make possible a wide range of basic character codes.

In a string, the 2**7 bit attached to an ASCII character indicates a meta character; thus, the meta characters that can fit in a string have codes in the range from 128 to 255, and are the meta versions of the ordinary ASCII characters. See Strings of Events, for details about META-handling in strings.
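The bit layout can be verified with a few comparisons. This sketch uses the ‘\M-’ read syntax covered in this section together with the standard functions `logior` and `ash`.

```elisp
;; The meta modifier is the 2**27 bit on top of the basic character code.
(= ?\M-A (logior (ash 1 27) ?A))      ; ⇒ t

;; `\M-' combines with an octal escape for the base character.
(= ?\M-A ?\M-\101)                    ; ⇒ t

;; `\M-' and `\C-' can be written in either order; C-b has code 2.
(= ?\M-\C-b ?\C-\M-b ?\M-\002)        ; ⇒ t
(= ?\M-\C-b (logior (ash 1 27) 2))    ; ⇒ t

;; In a string, the meta version of an ASCII character is its code plus 128.
(+ ?a 128)                            ; ⇒ 225
```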

The read syntax for meta characters uses ‘\M-’. For example, ‘?\M-A’ stands for M-A. You can use ‘\M-’ together with octal character codes (see General Escape Syntax), with ‘\C-’, or with any other syntax for a character. Thus, you can write M-A as ‘?\M-A’, or as ‘?\M-\101’. Likewise, you can write C-M-b as ‘?\M-\C-b’, ‘?\C-\M-b’, or ‘?\M-\002’.

Other Character Modifier Bits

The case of a graphic character is indicated by its character code; for example, ASCII distinguishes between the characters ‘a’ and ‘A’. But ASCII has no way to represent whether a control character is upper case or lower case. Emacs uses the 2**25 bit to indicate that the shift key was used in typing a control character. This distinction is possible only when you use X terminals or other special terminals; ordinary text terminals do not report the distinction. The Lisp syntax for the shift bit is ‘\S-’; thus, ‘?\C-\S-o’ or ‘?\C-\S-O’ represents the shifted-control-o character.

The X Window System defines three other modifier bits that can be set in a character: hyper, super and alt. The syntaxes for these bits are ‘\H-’, ‘\s-’ and ‘\A-’. (Case is significant in these prefixes.) Thus, ‘?\H-\M-\A-x’ represents Alt-Hyper-Meta-x. (Note that ‘\s’ with no following ‘-’ represents the space character.) Numerically, the bit values are 2**22 for alt, 2**23 for super and 2**24 for hyper.
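The modifier bits described in this section can be decoded with a few lines of Lisp. The following is a minimal sketch, not a replacement for Emacs's own event functions; the names `demo-modifier-bits` and `demo-modifiers` are hypothetical and exist only for this illustration.

```elisp
;; Hypothetical table of the modifier bits named in this section.
(defconst demo-modifier-bits
  `((alt . ,(ash 1 22)) (super . ,(ash 1 23)) (hyper . ,(ash 1 24))
    (shift . ,(ash 1 25)) (control . ,(ash 1 26)) (meta . ,(ash 1 27)))
  "Alist mapping modifier names to their bit values.")

(defun demo-modifiers (event)
  "Return the names of the modifier bits set in the integer EVENT."
  (let (found)
    (dolist (entry demo-modifier-bits)
      (unless (zerop (logand event (cdr entry)))
        (push (car entry) found)))
    (nreverse found)))

(demo-modifiers ?\H-\M-\A-x)    ; ⇒ (alt hyper meta)
(demo-modifiers ?\s-a)          ; ⇒ (super)

;; Both spellings of shifted-control-o read as the same integer: the shift
;; bit plus the ASCII control code of C-o.
(= ?\C-\S-o ?\C-\S-O (logior (ash 1 25) ?\C-o))   ; ⇒ t
```

Note that for ‘?\C-\S-o’ the control part is folded into the ASCII control code, so only the shift bit shows up as a separate modifier bit in this simple decoding.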