Emacs smart umlaut conversion for TeX


tex-smart-umlauts.el

Smart umlaut conversion for TeX.

License GPLv3

Introduction

Transform LaTeX-encoded non-ASCII characters to and from their
visible (utf-8) representations: decode them when a file is visited
and restore their original encodings when the buffer is saved.

TeX/LaTeX documents often write special characters that are not
available in ASCII as commands. For example, the German umlaut ä
can be written in LaTeX as {\"a}, \"{a}, \"a or even "a. Of course,
nowadays one should use an encoding like utf-8, which contains a
natural representation of these characters. Unfortunately, old
documents may still use the old encodings, or non-ASCII encodings
may not be allowed for other reasons. Emacs can already encode and
decode such characters to and from their LaTeX representation
automatically using the iso-cvt package: while the document is being
edited, the characters are shown in utf-8, but when it is saved to
disk they are transformed into a LaTeX representation.
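To make the all-or-nothing problem concrete, here is a small hypothetical helper (the name my-tex-decode-a-umlaut is made up; this is not code from iso-cvt or tex-smart-umlauts). It decodes the four spellings of ä listed above, and all of them collapse to the same character, so once decoded nothing records which spelling the file originally used.

```elisp
;; Hypothetical illustration, not iso-cvt's implementation: every LaTeX
;; spelling of ä is decoded to the same character, so the original
;; spelling is lost unless it is stored somewhere else.
(defun my-tex-decode-a-umlaut (string)
  "Return STRING with {\\\"a}, \\\"{a}, \\\"a and \"a replaced by ä."
  (let ((case-fold-search nil))         ; do not turn "A into ä
    (replace-regexp-in-string
     ;; Literal alternatives, longest first: {\"a}, \"{a}, \"a, "a
     (mapconcat #'regexp-quote '("{\\\"a}" "\\\"{a}" "\\\"a" "\"a") "\\|")
     "ä" string t t)))
```

Re-encoding the result can only pick one spelling for every ä, which is exactly what produces the noisy diffs described below.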

Unfortunately, this is an all-or-nothing approach: either all
characters are transformed or none. This can be problematic when
several authors work on the same document, particularly if the
document is stored in a revision control system. In these cases
each author may use their own editor, and each editor may have its
own settings and encode umlauts in its own preferred way. Each time
one author makes a small change, all umlauts in the whole document
may get transformed. This may lead to huge diffs consisting of many
hunks that only change the encoding of some characters.

The purpose of tex-smart-umlauts is to automatically encode and
decode umlauts to and from their TeX representation while
preserving the original encoding. In other words, when a LaTeX
file is visited, the original encoding of each character is saved
and the character is transformed into its visible (utf-8)
representation. When the document is saved again, each character
that was present when the document was loaded is written in its
original encoding. Only newly inserted non-ASCII characters get a
new encoding, which depends on the user options of
tex-smart-umlauts. This way, a small change in a document does not
reencode all non-ASCII characters as iso-cvt would, and only the
modified parts of the document actually change on disk.
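The preserve-the-original idea can be sketched in Emacs Lisp as follows. This is a minimal illustration under stated assumptions, not the package's implementation: it handles only ä, ö and ü (and their capitals), keeps each original spelling in a hypothetical text property my-tex-orig, uses \"X as an assumed default for new characters, and works on strings rather than on a buffer being visited and saved. All my-tex- names are made up for this example.

```elisp
;;; Hypothetical sketch: decode LaTeX umlauts, remember each original
;;; spelling in the text property `my-tex-orig', and reuse it on encoding.
(require 'rx)

(defconst my-tex-umlauts
  '(("a" . "ä") ("o" . "ö") ("u" . "ü")
    ("A" . "Ä") ("O" . "Ö") ("U" . "Ü"))
  "Map base letters to their umlaut characters.")

(defconst my-tex-umlaut-regexp
  (rx (or (seq "{\\\"" (group (any "aouAOU")) "}")   ; {\"a}
          (seq "\\\"{" (group (any "aouAOU")) "}")   ; \"{a}
          (seq "\\\"" (group (any "aouAOU")))        ; \"a
          (seq "\"" (group (any "aouAOU")))))        ; "a
  "Match one LaTeX umlaut; the base letter is in group 1, 2, 3 or 4.")

(defun my-tex-decode (string)
  "Return STRING with LaTeX umlauts decoded.
Each decoded character carries its original spelling in the text
property `my-tex-orig'."
  (let ((case-fold-search nil)
        (start 0)
        (parts '()))
    (while (string-match my-tex-umlaut-regexp string start)
      (let ((letter (or (match-string 1 string) (match-string 2 string)
                        (match-string 3 string) (match-string 4 string))))
        ;; Keep the text before the match unchanged.
        (push (substring string start (match-beginning 0)) parts)
        ;; Replace the match by the umlaut, tagged with its spelling.
        (push (propertize (cdr (assoc letter my-tex-umlauts))
                          'my-tex-orig (match-string 0 string))
              parts)
        (setq start (match-end 0))))
    (push (substring string start) parts)
    (apply #'concat (nreverse parts))))   ; concat keeps text properties

(defun my-tex-encode (string)
  "Return STRING with umlauts encoded for LaTeX.
Characters carrying `my-tex-orig' get their original spelling back;
any other umlaut (e.g. a newly inserted one) gets the \\\"X form."
  (mapconcat
   (lambda (i)
     (let* ((char (substring string i (1+ i)))
            (orig (get-text-property i 'my-tex-orig string))
            (letter (car (rassoc char my-tex-umlauts))))
       (cond (orig orig)                        ; original spelling
             (letter (concat "\\\"" letter))    ; default for new umlauts
             (t char))))                        ; everything else as-is
   (number-sequence 0 (1- (length string)))
   ""))
```

Storing the spelling on the character itself means it travels with the text and vanishes when the character is deleted. In a real buffer, typed characters normally inherit rear-sticky text properties from the preceding character, so an implementation along these lines would have to guard against a new ä picking up a neighbour's spelling.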

Usage

Simply add

(add-hook 'LaTeX-mode-hook #'tex-smart-umlauts-mode)

to your .emacs file. This will convert all LaTeX-encoded characters
to their respective visible characters and store their original
encodings. It also registers the encoding automatically, so that
the original encodings are restored when the buffer is saved.

Change Log

1.3.1

  • Minor changes.

1.3.0

  • Use a true minor-mode.
  • Handle reverting a buffer.

1.2.0

  • Add \ss{} encoding for ß.

1.1.0

  • Convert strings in iso-cvt table to multibyte
  • Use inhibit-read-only instead of buffer-read-only

Function Documentation

(tex-smart-umlauts-reencode-all &optional FROM TO)

Reencode all characters in the region.
Only characters between the buffer positions FROM and TO are
reencoded. If FROM and TO are nil, the whole buffer is reencoded.
The original encodings of all characters in the region are dropped
and replaced by new encodings according to the rules of
tex-smart-umlauts-encodings and tex-smart-umlauts-encode.
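A possible way to call it from Lisp, sketched under the assumption that the package is loaded and tex-smart-umlauts-mode is enabled in the current buffer; only the function name and its optional FROM and TO arguments are taken from the documentation above.

```elisp
;; Assumes tex-smart-umlauts is loaded and `tex-smart-umlauts-mode'
;; is enabled in the current buffer.

;; Reencode only the active region; the original encodings stored
;; there are dropped in favour of the configured ones.
(when (use-region-p)
  (tex-smart-umlauts-reencode-all (region-beginning) (region-end)))

;; With FROM and TO omitted (nil), the whole buffer is reencoded.
(tex-smart-umlauts-reencode-all)
```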

(tex-smart-umlauts-show-encodings)

Show encodings of all umlauts in buffer.

(tex-smart-umlauts-hide-encodings)

Hide encodings of all umlauts in buffer.