2.4.8 String Type
A string is an array of characters. Strings are used for many purposes in Emacs, as can be expected in a text editor; for example, as the names of Lisp symbols, as messages for the user, and to represent text extracted from buffers. Strings in Lisp are constants: evaluation of a string returns the same string.
See Strings and Characters, for functions that operate on strings.
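As a minimal sketch of the two claims above (a string is an array, and a string evaluates to itself), the following forms can be evaluated one by one. The name `demo-greeting` is hypothetical and used only for illustration.

```elisp
;; Hypothetical name, for illustration only.
(defconst demo-greeting "hello"
  "A sample string constant.")

;; A string is an array, so the generic array functions apply.
(arrayp demo-greeting)                    ; non-nil
(length demo-greeting)                    ; 5
(aref demo-greeting 1)                    ; 101, the code of the character e

;; A string is self-evaluating: evaluating it returns the same object.
(eq (eval demo-greeting) demo-greeting)   ; non-nil
```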
• Syntax for Strings: How to specify Lisp strings.
• Non-ASCII in Strings: International characters in strings.
• Nonprinting Characters: Literal unprintable characters in strings.
• Text Props and Strings: Strings with text properties.
2.4.8.1 Syntax for Strings
The read syntax for a string is a double-quote, an arbitrary number of characters, and another double-quote, "like this". To include a double-quote in a string, precede it with a backslash; thus, "\"" is a string containing just one double-quote character. Likewise, you can include a backslash by preceding it with another backslash, like this: "this \\ is a single embedded backslash".
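A short sketch of the two escapes just described, checking the resulting strings character by character. The names `demo-quote` and `demo-backslash` are hypothetical.

```elisp
;; Hypothetical names, for illustration only.
(defconst demo-quote "\"")        ; one double-quote character
(defconst demo-backslash "a\\b")  ; a, one backslash, b

(length demo-quote)               ; 1
(aref demo-quote 0)               ; 34, the code of the double-quote
(length demo-backslash)           ; 3
(aref demo-backslash 1)           ; 92, the code of the backslash
```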
The newline character is not special in the read syntax for strings; if you write a new line between the double-quotes, it becomes a character in the string. But an escaped newline (one that is preceded by ‘\’) does not become part of the string; i.e., the Lisp reader ignores an escaped newline while reading a string. An escaped space ‘\ ’ is likewise ignored.
"It is useful to include newlines
in documentation strings,
but the newline is \
ignored if escaped."
⇒ "It is useful to include newlines
in documentation strings,
but the newline is ignored if escaped."
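The same behavior can be checked on shorter strings by comparing lengths. This is only a sketch; the `demo-` names are hypothetical.

```elisp
;; Hypothetical names, for illustration only.
(defconst demo-literal-newline "ab
cd")                                ; the newline is part of the string
(defconst demo-escaped-newline "ab\
cd")                                ; backslash-newline is ignored
(defconst demo-escaped-space "ab\ cd")  ; backslash-space is ignored

(length demo-literal-newline)       ; 5
(length demo-escaped-newline)       ; 4
(length demo-escaped-space)         ; 4
(string= demo-escaped-newline demo-escaped-space)   ; non-nil
```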
2.4.8.2 Non-ASCII Characters in Strings
There are two text representations for non-ASCII characters in Emacs strings: multibyte and unibyte (see Text Representations). Roughly speaking, unibyte strings store raw bytes, while multibyte strings store human-readable text. Each character in a unibyte string is a byte, i.e., its value is between 0 and 255. By contrast, each character in a multibyte string may have a value between 0 and 4194303 (see Character Type). In both cases, characters above 127 are non-ASCII.
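To make the distinction concrete, the sketch below builds one string of each kind with the standard functions `unibyte-string` and `string`, then inspects them with `multibyte-string-p`. The `demo-` names are hypothetical.

```elisp
;; Hypothetical names, for illustration only.
(defconst demo-bytes (unibyte-string 200 201))  ; two raw bytes
(defconst demo-text (string #x3b1 #x3b2))       ; Greek alpha and beta

(multibyte-string-p demo-bytes)   ; nil, a unibyte string
(aref demo-bytes 0)               ; 200, a value between 0 and 255
(multibyte-string-p demo-text)    ; non-nil, a multibyte string
(aref demo-text 0)                ; 945, i.e. #x3b1
(max-char)                        ; 4194303, the largest character code
```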
You can include a non-ASCII character in a string constant by writing it literally. If the string constant is read from a multibyte source, such as a multibyte buffer or string, or a file that would be visited as multibyte, then Emacs reads each non-ASCII character as a multibyte character and automatically makes the string a multibyte string. If the string constant is read from a unibyte source, then Emacs reads the non-ASCII character as unibyte, and makes the string unibyte.
Instead of writing a character literally into a multibyte string, you can write it as its character code using an escape sequence. See General Escape Syntax, for details about escape sequences.
If you use any Unicode-style escape sequence ‘\uNNNN’ or ‘\U00NNNNNN’ in a string constant (even for an ASCII character), Emacs automatically assumes that it is multibyte.
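For instance, a Unicode-style escape for a non-ASCII character yields a multibyte string. This is a minimal sketch and the name `demo-unicode-escape` is hypothetical.

```elisp
;; Hypothetical name, for illustration only.
(defconst demo-unicode-escape "caf\u00e9")  ; ends in e with acute accent

(multibyte-string-p demo-unicode-escape)    ; non-nil
(length demo-unicode-escape)                ; 4
(aref demo-unicode-escape 3)                ; 233, i.e. #xe9
```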
You can also use hexadecimal escape sequences (‘\xn’) and octal escape sequences (‘\n’) in string constants. But beware: If a string constant contains hexadecimal or octal escape sequences, and these escape sequences all specify unibyte characters (i.e., less than 256), and there are no other literal non-ASCII characters or Unicode-style escape sequences in the string, then Emacs automatically assumes that it is a unibyte string. That is to say, it assumes that all non-ASCII characters occurring in the string are 8-bit raw bytes.
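The sketch below illustrates the warning: a string whose only non-ASCII content comes from a hexadecimal or octal escape below 256 is read as unibyte. The `demo-` names are hypothetical.

```elisp
;; Hypothetical names, for illustration only.
(defconst demo-hex-byte "\xe9")     ; hexadecimal escape, value 233
(defconst demo-octal-byte "\351")   ; octal escape, also value 233

(multibyte-string-p demo-hex-byte)      ; nil, read as a unibyte string
(aref demo-hex-byte 0)                  ; 233, a raw byte
(equal demo-hex-byte demo-octal-byte)   ; non-nil, same single byte
```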
In hexadecimal and octal escape sequences, the escaped character code may contain a variable number of digits, so the first subsequent character which is not a valid hexadecimal or octal digit terminates the escape sequence. If the next character in a string could be interpreted as a hexadecimal or octal digit, write ‘\ ’ (backslash and space) to terminate the escape sequence. For example, ‘\xe0\ ’ represents one character, ‘à’ (‘a’ with grave accent). ‘\ ’ in a string constant is just like backslash-newline; it does not contribute any character to the string, but it does terminate any preceding hex escape.
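The effect of the terminating backslash-space is easiest to see when the following character is itself a hexadecimal digit. A minimal sketch, with hypothetical `demo-` names:

```elisp
;; Hypothetical names, for illustration only.
(defconst demo-terminated "\x41\ b")   ; \x41 ends at the backslash-space
(defconst demo-unterminated "\x41b")   ; the b is read as a third hex digit

(equal demo-terminated "Ab")           ; non-nil, two characters
(length demo-unterminated)             ; 1
(aref demo-unterminated 0)             ; 1051, i.e. #x41b
```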
2.4.8.3 Nonprinting Characters in Strings
You can use the same backslash escape-sequences in a string constant as in character literals (but do not use the question mark that begins a character constant). For example, you can write a string containing the nonprinting characters tab and C-a, with commas and spaces between them, like this: "\t, \C-a". See Character Type, for a description of the read syntax for characters.
However, not all of the characters you can write with backslash escape-sequences are valid in strings. The only control characters that a string can hold are the ASCII control characters. Strings do not distinguish case in ASCII control characters.
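As a sketch, the example string can be expanded into its character codes, and the case-insensitivity of ASCII control characters checked directly. The name `demo-control` is hypothetical.

```elisp
;; Hypothetical name, for illustration only.
(defconst demo-control "\t, \C-a")

;; Tab is 9, comma is 44, space is 32, and C-a is 1.
(append demo-control nil)      ; (9 44 32 1)

;; Case is not distinguished in ASCII control characters.
(equal "\C-a" "\C-A")          ; non-nil
```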
Properly speaking, strings cannot hold meta characters; but when a string is to be used as a key sequence, there is a special convention that provides a way to represent meta versions of ASCII characters in a string. If you use the ‘\M-’ syntax to indicate a meta character in a string constant, this sets the 2**7 bit of the character in the string. If the string is used in define-key or lookup-key, this numeric code is translated into the equivalent meta character. See Character Type.
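A small sketch of this convention follows. The names `demo-meta-string` and `demo-map` are hypothetical, and `ignore` merely stands in for a real command.

```elisp
;; Hypothetical names, for illustration only.
(defconst demo-meta-string "\M-a")

;; In the string, meta is recorded by setting the 2**7 bit.
(aref demo-meta-string 0)                  ; 225, i.e. 97 + 128

;; As a character literal, meta sets the 2**27 bit instead.
?\M-a                                      ; 134217825, i.e. 97 + 2**27

;; The string still works as a key sequence.
(defvar demo-map (make-sparse-keymap))
(define-key demo-map demo-meta-string #'ignore)
(lookup-key demo-map demo-meta-string)     ; ignore
```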
Strings cannot hold characters that have the hyper, super, or alt modifiers.
2.4.8.4 Text Properties in Strings
A string can hold properties for the characters it contains, in addition to the characters themselves. This enables programs that copy text between strings and buffers to copy the text's properties with no special effort. See Text Properties, for an explanation of what text properties mean. Strings with text properties use a special read and print syntax:
#("characters" property-data...)
where property-data consists of zero or more elements, in groups of three as follows:
beg end plist
The elements beg and end are integers, and together specify a range of indices in the string; plist is the property list for that range. For example,
#("foo bar" 0 3 (face bold) 3 4 nil 4 7 (face italic))
represents a string whose textual contents are ‘foo bar’, in which the first three characters have a face property with value bold, and the last three have a face property with value italic. (The fourth character has no text properties, so its property list is nil. It is not actually necessary to mention ranges with nil as the property list, since any characters not mentioned in any range will default to having no properties.)
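The properties of such a string can be inspected with `get-text-property`. The sketch below omits the optional nil range, and the name `demo-propertized` is hypothetical.

```elisp
;; Hypothetical name, for illustration only.
(defconst demo-propertized
  #("foo bar" 0 3 (face bold) 4 7 (face italic)))

(get-text-property 0 'face demo-propertized)   ; bold
(get-text-property 3 'face demo-propertized)   ; nil, the space has no properties
(get-text-property 5 'face demo-propertized)   ; italic
(substring-no-properties demo-propertized)     ; "foo bar"
```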