Fine-tuning LaTeX documents

Proofreading is the last step in creating a document. This final stage includes spellchecking, wordsmithing, and fine-tuning the layout. LaTeX is a platform-independent system, so users who proofread together can easily exchange documents. With the right software installed, LaTeX users do not have to do without standard editing functions such as spellchecking and change tracking.

Checking It Off

Spellchecking LaTeX files under Linux relies on command-line programs such as Hunspell [1] and GNU Aspell [2]. Both programs can be downloaded from the Internet. Ubuntu users can install GNU Aspell via:

sudo apt-get install aspell

and Hunspell via

sudo apt-get install hunspell

Both of these programs can be started from the command line, and dictionaries are available in many languages [3]. Aspell even has its own option for LaTeX files, -t.

With the command

aspell -t -c flatland.tex

you can proofread the LaTeX file flatland.tex. The -c option checks the file, and -t switches Aspell to its TeX mode. When the program starts, the terminal shows each word of the designated file that does not appear in the dictionary. Where possible, the program also suggests replacements (Figure 1). You can either accept or ignore a suggestion, and you can add the unrecognized word to your own user dictionary. This user dictionary is stored as a hidden file in the home directory (e.g., ~/.aspell.en.pws).

Figure 1: GNU Aspell is easily operated in the terminal. Misspelled words can be replaced, ignored, or stored in the user dictionary.

Dictionaries in numerous languages can be found for Hunspell, and technical dictionaries are available for things like medical terminology. Like Aspell, Hunspell has its own option for LaTeX files (-t). It can also combine multiple dictionaries in a single run. For example, you could combine a general-use dictionary with a technical dictionary.

The command

hunspell -d en_GB,en_med -t dissertation.tex

proofreads the LaTeX file dissertation.tex using a British English dictionary combined with a technical dictionary for English medical terminology. After the program starts, the terminal displays each word in the designated file that does not appear in any of the designated dictionaries. Again, if possible, the program suggests replacements.

Suggestions can be accepted or ignored, and the unrecognized term can be stored in the user dictionary. The user dictionary is created as a hidden file in the home directory; in this case, it is ~/.hunspell_en_GB.

Not only can you use Hunspell via the command line, you can also integrate it into assorted programs including LibreOffice and Scribus; browsers such as Firefox, Chrome, and Opera; email programs such as Thunderbird and The Bat; text editors such as Gedit and Emacs; and various LaTeX editors such as LyX, Texmaker, and TeXstudio.

This broad scope for integration means you can rely on a single user dictionary for a variety of tasks, including proofreading LaTeX files, LibreOffice documents, and email. As in an office suite, the LaTeX editor checks the entire file, and spellchecking happens as you type. Incorrect spellings are underlined in red. A right-click lets you select from various spelling suggestions (Figure 2).

Figure 2: Hunspell can be integrated into a LaTeX editor; in this case, it is Texmaker.

An issue that arises during the spellcheck of a LaTeX file is that commands and text are checked equally. This means commands are displayed as misspellings. Initially, therefore, you're confronted with a large number of error messages. This issue resolves itself over time as you gradually add commands to the user dictionary. Other issues include escaped umlauts (e.g., \"a) and tagged word breaks (\-), both of which cause words to be displayed as misspellings. You can get around these issues by entering umlauts directly with the correct character encoding and waiting until the spellcheck is complete before tagging word breaks.
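The advice on umlauts can be illustrated with a minimal sketch. It assumes a UTF-8 encoded source file and the inputenc and fontenc packages, neither of which is named in the text above; the sample word is only a placeholder.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}     % font encoding with real accented glyphs
\usepackage[utf8]{inputenc}  % assumption: the file is saved as UTF-8
\begin{document}
% Escaped form: a spellchecker may see "Sch" and "on" as separate fragments.
Sch\"on

% Direct form: the spellchecker sees one ordinary word.
Schön
\end{document}
```

Both lines typeset the same word; only the second is easy for a spellchecker to read.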

Modified

Besides spellchecking, the proofreading phase in document preparation also includes finalizing wording. When multiple authors contribute suggestions for modifications to wording, knowing who has changed what becomes important. Office programs have a feature that tracks changes to a document. Under LaTeX, the changes [4] package assumes this task. Listing 1 shows an example.

Listing 1

Tracking Changes

01 % Preamble:
02 \usepackage
03 %[final]
04 {changes}
05 \definechangesauthor[name={Daniel Tibi}, color=red]{dti}
06 \definechangesauthor[name={editor}, color=blue]{E}
07 \definechangesauthor[name={technical editing}, color=orange]{TE}
08 % body of the document:
09 % ...
10 Only a few of the offspring of our best and \deleted[id=dti]{and} most prominent families can spend the money, which for\added[id=E]{m master} is necessary for this great and well known art.
11 % ...
12 \listofchanges

The package is loaded in the preamble (lines 2 to 4). While changes are under review, the final option inside the square brackets stays commented out with a preceding percent sign, % (line 3). This turns on the highlight function for marking any changes. Once you have worked through all of the modifications, the percent sign should be removed. LaTeX will then make the colored markings of the changes disappear and generate the document with the modifications that have been inserted.

Next, the authors are defined (lines 5 to 7). An abbreviation for each author is designated inside the curly brackets of the command. The abbreviation appears with every modification made by that author.

Additionally, you can enter a complete name and define a color in the square brackets. The color is then used for the modifications made by that author. You can select one of the predefined colors or define your own using the xcolor package [5].
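As a hedged illustration of a self-defined color, the following sketch mixes a color with xcolor and assigns it to an author. The color name reviewgreen, its RGB values, and the author ID rev are hypothetical and chosen only for this example.

```latex
\documentclass{article}
\usepackage{xcolor}
\usepackage{changes}
% Hypothetical color, defined with xcolor's RGB model (values 0 to 255).
\definecolor{reviewgreen}{RGB}{0,120,60}
% Hypothetical author who uses the custom color.
\definechangesauthor[name={Reviewer}, color=reviewgreen]{rev}
\begin{document}
This sentence has \added[id=rev]{an addition} in the custom color.
\end{document}
```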

Three commands are available in the body of the document for marking passages (line 10): \added, \deleted, and \replaced. Text to be added or deleted goes in the single pair of curly braces of the corresponding command. For a replacement, the new text goes in the first pair of curly braces, and the original text goes in the second pair. The author ID, which has been defined in the preamble, and any comments can be added in the square brackets.
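A minimal sketch of the three markup commands, \added, \deleted, and \replaced, might look like the following. The key name for comments is an assumption: recent releases of the changes package use comment, whereas older releases used remark, so check the documentation of the installed version.

```latex
\documentclass{article}
\usepackage{changes}
\definechangesauthor[name={Daniel Tibi}, color=red]{dti}
\begin{document}
% Addition: the new text goes in the braces.
The report is \added[id=dti]{almost} finished.

% Deletion: the text to be removed goes in the braces.
The report is \deleted[id=dti]{very} long.

% Replacement: new text first, original text second.
The report is \replaced[id=dti]{concise}{short}.

% Comment attached to a change (assumed key name: comment).
The report is \added[id=dti, comment={check with the editor}]{final}.
\end{document}
```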

All of the changes to the document will appear in the color that was assigned to the author in the preamble. Deleted passages are shown as struck through. Additionally, the author ID appears as a superscript with each modification; comments are displayed as footnotes. Figure 3 illustrates how this can look. With the \listofchanges command (line 12), you can have LaTeX generate a list of all changes together with a reference to the page in the document where each appears.

Figure 3: Modifications are displayed with colors in the document. You can also generate a list of all modifications.

To reject a modification, delete the corresponding command and restore the original wording. Commands with modifications that you would like to accept can be left untouched. To generate a final version of the document with all of the accepted modifications, add the package option final when loading the package in the preamble. If the file already contains a commented-out version of the final option, simply remove the preceding comment character (line 3).

Completing the Picture

The proofreading phase concludes with the finalization of the layout. Particularly important here are page breaks, line breaks, hyphenation, and ligatures. Normally, LaTeX takes care of all of these things automatically, so you don't have to worry about them. However, once in a while it's necessary to intervene in the layout process.

The first and last paragraph of a page should normally show more than one line; otherwise, the layout looks bad. In the preamble, you can specify how strictly LaTeX should apply this rule via the following commands:

\clubpenalty = 10000
\widowpenalty = 10000

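These commands set penalties for a single first line of a paragraph left at the bottom of a page (line 1) and for a single last line of a paragraph carried over to the top of a page (line 2). The greater the value you enter, the more strictly LaTeX will apply the rule. The value chosen in this example represents strict application: TeX treats a penalty of 10000 as infinite and does not break the page at such a point.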
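The following preamble fragment sketches how the penalties can be graded. The third parameter, \displaywidowpenalty, is not mentioned above; it is a standard TeX parameter that covers a single line left before a displayed equation. The softer value of 5000 is only an illustrative assumption.

```latex
% Preamble fragment, not a complete document.
\clubpenalty = 10000          % no lone first line at the bottom of a page
\widowpenalty = 10000         % no lone last line at the top of a page
\displaywidowpenalty = 10000  % no lone line before a displayed equation

% A softer alternative: such breaks are discouraged but still
% allowed when LaTeX finds no better page break.
% \clubpenalty = 5000
% \widowpenalty = 5000
```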

Alternatively, you can use the nowidow [6] package. This package can be integrated into the preamble via the following command:

\usepackage[defaultlines=2,all]{nowidow}

This command prohibits single lines at the beginning and end of a page, so that at least two lines of a paragraph stay together on either side of a page break.

When in doubt, you can manually intervene in the layout. A page break can be generated via the \pagebreak command. The manual approach also lets you perform minor workarounds to solve layout problems.
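A manual intervention could look like the sketch below. The optional argument of \pagebreak and the \enlargethispage command are standard LaTeX, but they are not named in the text above, so treat their use here as one possible workaround rather than the recommended procedure.

```latex
\documentclass{article}
\begin{document}
Text of the first paragraph.

% Optional argument from 0 to 4: 0 is a mild hint, 4 forces the break.
\pagebreak[3]

Text that LaTeX will try to start on a new page.

% Let the current page run one line longer than usual, for example
% to pull a stray line back from the following page.
\enlargethispage{\baselineskip}

More text.
\end{document}
```

Manual breaks of this kind are best added at the very end, because any later change to the text can shift them to the wrong place.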

LaTeX also automatically takes care of line breaks and hyphenation, so you should only have to resort to manual operations in isolated cases. The microtype [7] package handles a large number of smaller corrections that make for an improved layout. The package can be loaded with its standard settings via:

\usepackage{microtype}

Moreover, a number of detailed settings are available. In particular, the package can adjust the individual letters of a line so that fewer gaps appear in the text and line breaks are improved.
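As a sketch of such detailed settings, the package options below switch individual features on or off. The option names are taken from the microtype documentation; which features actually take effect depends on the TeX engine (font expansion, for instance, is not available with every engine), so the combination shown is an assumption for a pdfLaTeX setup.

```latex
% Preamble fragment, assuming pdfLaTeX.
\usepackage[
  protrusion=true,  % let punctuation hang slightly into the margin
  expansion=true,   % stretch or shrink glyphs minimally to even out spacing
  tracking=false    % leave letterspacing untouched
]{microtype}
```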

If an individual situation requires that a line break be created manually, you can do this via the \linebreak command.
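The following sketch shows \linebreak in context. The optional strength argument and the tie (~) are standard LaTeX but go beyond what is described above.

```latex
\documentclass{article}
\begin{document}
% Without an argument, \linebreak forces the break (same as [4]) and
% still justifies the line, spreading the words to the right margin.
A short line that is broken here\linebreak
and continues on the next line of the same paragraph.

% Strengths from 0 to 3 are requests that LaTeX may decline.
A break after this word\linebreak[2] is only suggested.

% A tie (~) does the opposite and keeps two items on the same line.
See Figure~1 for the result.
\end{document}
```

Because a forced \linebreak stretches the line to full width, a break in a short line can produce wide gaps and an underfull box warning in the log.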

When confronted with compound words or with words from a foreign language, you may need to specify where LaTeX should separate the syllables. Awkward word breaks can otherwise lead to stumbling blocks with-in th-e te-xt. You can tag the location of a syllable separation in two ways.

When a separation point should apply to the entire document, use the command shown below in the preamble.

\hyphenation{
Guard-house
}

LaTeX will then divide the specified words only at the tagged locations. When the separation should apply to a single occurrence of a word only, you can tag the permitted break in the document itself by using the \- command (e.g., guard\-house).
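Both approaches can be combined in one file, as in this minimal sketch. The \showhyphens command is a standard debugging aid that is not mentioned above; it writes the hyphenation points LaTeX would use to the log file.

```latex
\documentclass{article}
% Document-wide rule: break this word only at the marked position.
\hyphenation{guard-house}
\begin{document}
% Every occurrence now follows the rule from the preamble.
The guardhouse stood at the gate.

% Local tag: affects only this one occurrence of the word.
The guard\-house stood at the gate.

% Debugging aid: lists the permitted break points in the log file.
\showhyphens{guardhouse}
\end{document}
```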

A ligature ties two or three letters together to create a special character (e.g., æ) or to improve the readability and flow of the text (e.g., fi). However, a ligature is not always desirable, for example, where the two letters belong to different parts of a compound word. In the event that two letters should not be joined with a ligature, you should add the command \/ between the letters (example: shelf\/ful).
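The effect can be compared side by side with the sketch below. The sample word is an assumption chosen because its two parts meet at a double f; the behavior described applies to pdfLaTeX with the default fonts and may differ with other engines or fonts.

```latex
\documentclass{article}
\begin{document}
% Default: the two f's across the word boundary are joined in a ligature.
shelfful

% With \/ between the letters, the ligature is suppressed.
shelf\/ful
\end{document}
```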

Conclusion

LaTeX does not lag behind Office programs with respect to spellchecking, tagging modifications, and manual access to the layout. The spellcheck is problematic in that commands and text are treated equally, which in turn results in commands being subject to a spellcheck.

This problem gradually disappears as commands are accepted into the user dictionary during the proofreading process. The disadvantage to the modification tagging feature in LaTeX files is that each modification must be marked with its own command. This leads to additional typing. In terms of layout, LaTeX already offers excellent results; nonetheless, it allows for even better results with the use of additional packages and targeted manual intervention.

Infos

  1. Hunspell: http://hunspell.sourceforge.net
  2. Aspell: http://www.aspell.net
  3. Dictionaries for GNU Aspell: ftp://ftp.gnu.org/gnu/aspell/dict/0index.html
  4. Changes package for LaTeX: http://www.ctan.org/pkg/changes/
  5. "Creating Vector Graphics with LaTeX and TikZ" by Daniel Tibi, Ubuntu User, Issue 22, 2014: http://www.ubuntu-user.com/Magazine/Archive/2014/22/Creating-vector-graphics-with-LaTeX-and-TikZ/%28language%29/eng-GB
  6. Nowidow package for LaTeX: http://www.ctan.org/pkg/nowidow
  7. Microtype package for LaTeX: http://www.ctan.org/pkg/microtype