33.10.3 Coding Systems in Lisp
Here are the Lisp facilities for working with coding systems:
function
coding-system-list \&optional base-onlyβ
This function returns a list of all coding system names (symbols). If base-only
is non-nil
, the value includes only the base coding systems. Otherwise, it includes alias and variant coding systems as well.
function
coding-system-p objectβ
This function returns t
if object
is a coding system name or nil
.
function
check-coding-system coding-systemβ
This function checks the validity of coding-system
. If that is valid, it returns coding-system
. If coding-system
is nil
, the function return nil
. For any other values, it signals an error whose error-symbol
is coding-system-error
(see signal).
function
coding-system-eol-type coding-systemβ
This function returns the type of end-of-line (a.k.a. eol) conversion used by coding-system
. If coding-system
specifies a certain eol conversion, the return value is an integer 0, 1, or 2, standing for unix
, dos
, and mac
, respectively. If coding-system
doesnβt specify eol conversion explicitly, the return value is a vector of coding systems, each one with one of the possible eol conversion types, like this:
(coding-system-eol-type 'latin-1)
β [latin-1-unix latin-1-dos latin-1-mac]
If this function returns a vector, Emacs will decide, as part of the text encoding or decoding process, what eol conversion to use. For decoding, the end-of-line format of the text is auto-detected, and the eol conversion is set to match it (e.g., DOS-style CRLF format will imply dos
eol conversion). For encoding, the eol conversion is taken from the appropriate default coding system (e.g., default value of buffer-file-coding-system
for buffer-file-coding-system
), or from the default eol conversion appropriate for the underlying platform.
function
coding-system-change-eol-conversion coding-system eol-typeβ
This function returns a coding system which is like coding-system
except for its eol conversion, which is specified by eol-type
. eol-type
should be unix
, dos
, mac
, or nil
. If it is nil
, the returned coding system determines the end-of-line conversion from the data.
eol-type
may also be 0, 1 or 2, standing for unix
, dos
and mac
, respectively.
function
coding-system-change-text-conversion eol-coding text-codingβ
This function returns a coding system which uses the end-of-line conversion of eol-coding
, and the text conversion of text-coding
. If text-coding
is nil
, it returns undecided
, or one of its variants according to eol-coding
.
function
find-coding-systems-region from toβ
This function returns a list of coding systems that could be used to encode a text between from
and to
. All coding systems in the list can safely encode any multibyte characters in that portion of the text.
If the text contains no multibyte characters, the function returns the list (undecided)
.
function
find-coding-systems-string stringβ
This function returns a list of coding systems that could be used to encode the text of string
. All coding systems in the list can safely encode any multibyte characters in string
. If the text contains no multibyte characters, this returns the list (undecided)
.
function
find-coding-systems-for-charsets charsetsβ
This function returns a list of coding systems that could be used to encode all the character sets in the list charsets
.
function
check-coding-systems-region start end coding-system-listβ
This function checks whether coding systems in the list coding-system-list
can encode all the characters in the region between start
and end
. If all of the coding systems in the list can encode the specified text, the function returns nil
. If some coding systems cannot encode some of the characters, the value is an alist, each element of which has the form (coding-system1 pos1 pos2 β¦)
, meaning that coding-system1
cannot encode characters at buffer positions pos1
, pos2
, ....
start
may be a string, in which case end
is ignored and the returned value references string indices instead of buffer positions.
function
detect-coding-region start end \&optional highestβ
This function chooses a plausible coding system for decoding the text from start
to end
. This text should be a byte sequence, i.e., unibyte text or multibyte text with only ASCII and eight-bit characters (see Explicit Encoding).
Normally this function returns a list of coding systems that could handle decoding the text that was scanned. They are listed in order of decreasing priority. But if highest
is non-nil
, then the return value is just one coding system, the one that is highest in priority.
If the region contains only ASCII characters except for such ISO-2022 control characters ISO-2022 as ESC
, the value is undecided
or (undecided)
, or a variant specifying end-of-line conversion, if that can be deduced from the text.
If the region contains null bytes, the value is no-conversion
, even if the region contains text encoded in some coding system.
function
detect-coding-string string \&optional highestβ
This function is like detect-coding-region
except that it operates on the contents of string
instead of bytes in the buffer.
variable
inhibit-nul-byte-detectionβ
If this variable has a non-nil
value, null bytes are ignored when detecting the encoding of a region or a string. This allows the encoding of text that contains null bytes to be correctly detected, such as Info files with Index nodes.
variable
inhibit-iso-escape-detectionβ
If this variable has a non-nil
value, ISO-2022 escape sequences are ignored when detecting the encoding of a region or a string. The result is that no text is ever detected as encoded in some ISO-2022 encoding, and all escape sequences become visible in a buffer. Warning: Use this variable with extreme caution, because many files in the Emacs distribution use ISO-2022 encoding.
function
coding-system-charset-list coding-systemβ
This function returns the list of character sets (see Character Sets) supported by coding-system
. Some coding systems that support too many character sets to list them all yield special values:
- If
coding-system
supports all Emacs characters, the value is(emacs)
. - If
coding-system
supports all Unicode characters, the value is(unicode)
. - If
coding-system
supports all ISO-2022 charsets, the value isiso-2022
. - If
coding-system
supports all the characters in the internal coding system used by Emacs version 21 (prior to the implementation of internal Unicode support), the value isemacs-mule
.
See Process Information, in particular the description of the functions process-coding-system
and set-process-coding-system
, for how to examine or set the coding systems used for I/O to a subprocess.