Skip to main content

22.17 Unibyte Editing Mode

The ISO 8859 Latin-n character sets define character codes in the range 0240 to 0377 octal (160 to 255 decimal) to handle the accented letters and punctuation needed by various European languages (and some non-European ones). Note that Emacs considers bytes with codes in this range as raw bytes, not as characters, even in a unibyte buffer, i.e., if you disable multibyte characters. However, Emacs can still handle these character codes as if they belonged to one of the single-byte character sets at a time. To specify which of these codes to use, invoke M-x set-language-environment and specify a suitable language environment such as ‘Latin-n’. See Disabling Multibyte Characters in GNU Emacs Lisp Reference Manual.

Emacs can also display bytes in the range 160 to 255 as readable characters, provided the terminal or font in use supports them. This works automatically. On a graphical display, Emacs can also display single-byte characters through fontsets, in effect by displaying the equivalent multibyte characters according to the current language environment. To request this, set the variable unibyte-display-via-language-environment to a non-nil value. Note that setting this only affects how these bytes are displayed, but does not change the fundamental fact that Emacs treats them as raw bytes, not as characters.

If your terminal does not support display of the Latin-1 character set, Emacs can display these characters as ASCII sequences which at least give you a clear idea of what the characters are. To do this, load the library iso-ascii. Similar libraries for other Latin-n character sets could be implemented, but have not been so far.

Normally non-ISO-8859 characters (decimal codes between 128 and 159 inclusive) are displayed as octal escapes. You can change this for non-standard extended versions of ISO-8859 character sets by using the function standard-display-8bit in the disp-table library.

There are two ways to input single-byte non-ASCII characters:

  • You can use an input method for the selected language environment. See Input Methods. When you use an input method in a unibyte buffer, the non-ASCII character you specify with it is converted to unibyte.

  • If your keyboard can generate character codes 128 (decimal) and up, representing non-ASCII characters, you can type those character codes directly.

    On a graphical display, you should not need to do anything special to use these keys; they should simply work. On a text terminal, you should use the command M-x set-keyboard-coding-system or customize the variable keyboard-coding-system to specify which coding system your keyboard uses (see Terminal Coding). Enabling this feature will probably require you to use ESC to type Meta characters; however, on a console terminal or a terminal emulator such as xterm, you can arrange for Meta to be converted to ESC and still be able to type 8-bit characters present directly on the keyboard or using Compose or AltGr keys. See User Input.

  • You can use the key C-x 8 as a compose-character prefix for entry of non-ASCII Latin-1 and a few other printing characters. C-x 8 is good for insertion (in the minibuffer as well as other buffers), for searching, and in any other context where a key sequence is allowed.

    C-x 8 works by loading the iso-transl library. Once that library is loaded, the Alt modifier key, if the keyboard has one, serves the same purpose as C-x 8: use Alt together with an accent character to modify the following letter. In addition, if the keyboard has keys for the Latin-1 dead accent characters, they too are defined to compose with the following character, once iso-transl is loaded.

    Use C-x 8 C-h to list all the available C-x 8 translations.