23.7 inputenc package

Synopsis:

\usepackage[encoding-name]{inputenc}

Declare the input file’s text encoding to be encoding-name. The default, if this package is not loaded, is UTF-8. Technically, specifying the encoding name is optional, but in practice it is not useful to omit it.

In a computer file, the characters are stored according to a scheme called the encoding. There are many different encodings. The simplest is ASCII, which supports 95 printable characters, not enough for most of the world’s languages. For instance, to typeset the a-umlaut character ‘ä’ in an ASCII-encoded LaTeX source file, the sequence \"a is used. This would make source files for anything but English hard to read; even for English, often a more extensive encoding is more convenient.
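As a rough illustration of the difference, the sketch below typesets the same sentence twice, once from ASCII-only source and once typed directly. It assumes a LaTeX release from 2018 or later, so that UTF-8 input works without loading any package; the sample sentence is invented for the example.

```latex
\documentclass{article}
\begin{document}
% ASCII-only source: each accented letter is built with a command.
Sch\"one Gr\"u\ss e aus K\"oln.

% UTF-8 source: the same text typed directly.
Schöne Grüße aus Köln.
\end{document}
```

Both lines should produce the same output; the second is simply easier to read and edit.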

The modern encoding standard, in some ways a union of the others, is UTF-8, one of the representations of Unicode. This is the default for LaTeX since 2018.

The inputenc package is how LaTeX knows what encoding is used. For instance, the following command explicitly says that the input file is UTF-8 (note that the option is spelled utf8, without a hyphen).

\usepackage[utf8]{inputenc}

Caution: use inputenc only with the pdfTeX engine (see TeX engines). (The XeTeX and LuaTeX engines assume that the input file is UTF-8 encoded.) If you invoke LaTeX with either the xelatex command or the lualatex command and try to declare a non-UTF-8 encoding with inputenc, such as latin1, then you will get the error ‘inputenc is not designed for xetex or luatex’.
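One way to keep a single UTF-8 source file usable with all three engines is to load inputenc only under pdfTeX. The following is a minimal sketch, assuming the iftex package is installed and recent enough to provide the \ifPDFTeX conditional; it is not the only way to arrange this.

```latex
\documentclass{article}
\usepackage{iftex}             % assumed available; provides \ifPDFTeX
\ifPDFTeX
  \usepackage[utf8]{inputenc}  % load inputenc only under pdfTeX
\fi
% XeTeX and LuaTeX read UTF-8 natively, so nothing is loaded for them.
\begin{document}
Größe
\end{document}
```

With this preamble the file can be processed with pdflatex, xelatex, or lualatex without inputenc ever being loaded under the last two.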

An inputenc package error such as ‘Invalid UTF-8 byte "96’ means that some of the material in the input file does not follow the encoding scheme. Often these errors come from copying material from a document that uses a different encoding than the input file. Byte "96, for example, is an en dash in Windows-1252, the encoding that web pages labeled latin1 typically use in practice, and it is not valid on its own in a LaTeX input file that uses UTF-8. The simplest solution is to replace the offending character with its UTF-8 equivalent, or with an equivalent LaTeX command or character sequence.
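Typographic quotes and dashes pasted from web pages or word processors are frequent culprits. As a sketch of the second remedy, the document below uses LaTeX's standard ASCII input conventions for this punctuation, which are valid in any encoding; the sample text is invented for the example.

```latex
\documentclass{article}
\begin{document}
`single quotes'      % left and right single quotes

``double quotes''    % left and right double quotes

pages 10--20         % en dash

yes---or no          % em dash
\end{document}
```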

In some documents, such as a collection of journal articles from a variety of authors, changing the encoding in mid-document may be necessary. Use the command \inputencoding{encoding-name}. The most common values for encoding-name are: ascii, latin1, latin2, latin3, latin4, latin5, latin9, latin10, and utf8.
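A minimal sketch of such a collection follows, assuming the pdfTeX engine. The file names article-one.tex and article-two.tex are hypothetical, standing for contributions saved in Latin-1 and UTF-8 respectively.

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}  % the master file itself is UTF-8
\begin{document}

\inputencoding{latin1}       % hypothetical article-one.tex is stored as Latin-1
\input{article-one}

\inputencoding{utf8}         % hypothetical article-two.tex is stored as UTF-8
\input{article-two}

\end{document}
```

Each \inputencoding declaration stays in effect until the next one (or the end of the current group), so switch back explicitly before reading material in a different encoding.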

