LaTeX Groff and Neatroff

A story about digital typesetting

I am currently writing my bachelor’s thesis for university using the groff typesetting system. Coincidentally, it was during my first year of university that I first started experimenting with writing systems that are not what-you-see-is-what-you-get. This document serves as a summary of the struggles I have had with various typesetting systems over the past three years, and of why I like and dislike each of them. Fair warning: I will mostly be talking about groff, as it is the system I have used the longest.

For the uninitiated, groff, neatroff and LaTeX are programs that let you write a document in a specific syntax and have the program manage the layout and markup of the document. For instance, instead of pressing a button in your word processor, you write \textit{} (LaTeX) or .FT I (groff, with the mom macros) to make text italic.

These sorts of systems have many benefits and there are a number of reasons for choosing each of them. These reasons have been discussed to death and I will not repeat them here.

Markdown

Markdown exists here only as an honorable mention and because it is the first markup language that I used. Notice that I say markup language, not typesetting system. In essence, while markdown is easy to read, write and learn, it is simply not designed for document typesetting. For instance, if you want headers, footers, page breaks, citations, or even superscript text, you will need to fall back to another language such as LaTeX or HTML.

The existence of these fallbacks is appreciated, and does provide markdown with some much-needed power, but it basically means that if you are writing your essays and dissertations in markdown, you still rely on other systems. From my point of view it is then easier to just use those systems directly.

Markdown still has its place, and I use it often, for instance for writing these posts, but it is simply not suited for typesetting.

Groff

Groff is the system I moved to after I left markdown. Groff is difficult to talk about because by default it comes with very few functions. High-level functions such as headers, headings, footnotes and even paragraphs are added with so-called macros. These macros are often provided in the form of a macro package which provides functions that work well together.

I have been on record before singing the praises of the “mom” macro package, and that is because it is great. Mom is the only macro package with thorough documentation and all the functions one might want. Note that the implementations of each function are not always the best, but they do exist.

Things I especially appreciate in the mom package are: table of contents generation, very smart footer handling, smart quotes, and general verbosity. Instead of using two-letter macros that roughly abbreviate the action you are trying to perform (as some macro packages do), mom calls them .HEADING 1, .FOOTNOTE and .FOOTNOTE OFF, .SECTION, .NEWPAGE, etc. This makes the source file extremely readable, even for someone unfamiliar with the syntax. It also helps in remembering the syntax. Never have I wondered “how do I start a block quote again with mom?” because the macro is entirely intuitive (full credit goes to those who guessed .BLOCKQUOTE).

In other words, mom is great: it provides all the features I want in an easy and well-documented way. But this section is about groff, not mom, and groff, frankly, is not that great. On its own, groff is extremely tedious to use, which is why macro packages exist. But even fundamentally, there are some pretty serious design flaws.

Groff is an old system and, as such, it does not have support for a number of near-essential features. Groff does not have colour support, proper ttf or otf font support, widow and orphan protection, or even a way to programmatically access something on a later line. This last point means that if I want a table of contents on page 2, groff does not have the information about the headings and page numbers from later pages. Mom gets around this by generating the table at the end and then moving the entire page up to where you want it, but this method comes with a bunch of downsides that I will not get into now.
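The forward-reference problem can be modelled with a toy sketch. This is only an illustration of the general "collect first, move afterwards" idea, not mom's actual implementation. The function names and the page representation are made up for the example.

```python
def typeset_single_pass(headings_by_page):
    """Walk the pages once, front to back, as a single-pass formatter would.

    headings_by_page: one list of heading titles per body page
    (a hypothetical input format chosen for this sketch).
    The table of contents ends up as the LAST page, because only at the
    end are all headings and their page numbers known.
    """
    pages = []
    toc_entries = []
    for page_number, headings in enumerate(headings_by_page, start=1):
        for title in headings:
            # A heading can only be recorded once its page has been reached.
            toc_entries.append((title, page_number))
        pages.append({"kind": "body", "number": page_number})
    pages.append({"kind": "toc", "entries": toc_entries})
    return pages


def move_toc(pages, index):
    """Relocate the finished table of contents to list position `index`."""
    toc = pages.pop()
    pages.insert(index, toc)
    return pages


if __name__ == "__main__":
    body = [["Introduction"], ["Method"], ["Results", "Discussion"]]
    for page in move_toc(typeset_single_pass(body), 1):
        print(page)
```

In this model the table of contents can only be built after the last body page, which is why it has to be moved into place afterwards rather than written where it belongs.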

One of the most hair-pulling problems with groff, however, is that it (by default) does not support non-latin characters. That means that characters such as ö or á will not process correctly. Even common symbols such as em-dashes (—) or ellipses (…) are not supported. Groff “solves” this problem with a preprocessor, a small program that runs before groff and takes out all of the non-latin characters, replacing them with machine-readable codes.
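To give an idea of what such a conversion looks like, here is a minimal sketch that imitates the preprocessor's behaviour. It assumes the replacement codes take groff's \[uXXXX] form (the Unicode code point in hexadecimal). The function name is hypothetical and this is a rough emulation, not the real program.

```python
def to_groff_escapes(text):
    """Replace every non-ASCII character with a \\[uXXXX] style escape.

    Assumption: the escape carries the character's Unicode code point as
    at least four uppercase hexadecimal digits.
    """
    converted = []
    for char in text:
        if ord(char) < 128:
            converted.append(char)  # plain ASCII passes through untouched
        else:
            converted.append("\\[u{:04X}]".format(ord(char)))
    return "".join(converted)


if __name__ == "__main__":
    print(to_groff_escapes("Björk"))
    # A file name is rewritten just like any other text:
    print(to_groff_escapes(".so übersicht.mom"))
```

The second call hints at the trouble described next: once a file name inside the document has been rewritten like this, it no longer matches the name of the file on disk.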

This is a very bad solution, especially because other preprocessors often pull in content from other files, and that content has not been converted. This is perhaps best explained by an example.

Say you have a document with a couple of chapters stored in different files, with each chapter containing some academic references. You should first process the main file with preconv to fix the encoding. The second step is collecting the chapters with soelim, but you had better not have non-latin characters in your file names, because preconv just changed all of those. Now you should run preconv again to prevent the other processors from complaining. Then you run refer to collect references, as long as you do not try to reference an author with a non-latin character in their name, because (once again) those characters were all changed in the main file. After collecting the reference data, run preconv again. Only then can you finally pass the file into groff and get your pdf.

Now, obviously, you can automate all these steps. In fact, automatability is one of groff’s strong suits. Nevertheless, this idiosyncratic way of handling something like encoding is really annoying, especially when people recommend groff as a lightweight alternative to LaTeX.
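As a sketch of what that automation might look like, the steps described above can be chained from a small script. The file names thesis.mom and thesis.pdf, the choice of the mom macros and the helper names are all assumptions made for the example. The bibliography database is assumed to be configured inside the document itself.

```python
import shlex
import subprocess


def build_pipeline(main_file, macro_package="mom", device="pdf"):
    """Return the stages described above as a list of argument lists."""
    return [
        ["preconv", main_file],  # fix the encoding of the main file
        ["soelim"],              # pull in the chapter files
        ["preconv"],             # convert the freshly included text
        ["refer"],               # collect the references
        ["preconv"],             # convert the reference data
        ["groff", "-m" + macro_package, "-T" + device],
    ]


def as_shell(pipeline):
    """Render the pipeline as a single shell command line, for inspection."""
    return " | ".join(shlex.join(stage) for stage in pipeline)


def run(pipeline, output_path):
    """Run the stages, piping each into the next (needs groff installed)."""
    processes = []
    previous_output = None
    with open(output_path, "wb") as output_file:
        for position, stage in enumerate(pipeline):
            is_last = position == len(pipeline) - 1
            process = subprocess.Popen(
                stage,
                stdin=previous_output,
                stdout=output_file if is_last else subprocess.PIPE,
            )
            if previous_output is not None:
                previous_output.close()  # the child process holds its own copy
            previous_output = process.stdout
            processes.append(process)
        for process in processes:
            process.wait()
    return [process.returncode for process in processes]


if __name__ == "__main__":
    print(as_shell(build_pipeline("thesis.mom")))
    # run(build_pipeline("thesis.mom"), "thesis.pdf")  # uncomment with groff installed
```

Printing the command line before running it makes it easy to check the order of the stages, which is the part that is easy to get wrong.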

Luckily, there is a program out there that fixes groff’s mistakes: neatroff.

Neatroff

Neatroff brings the troff typesetting system to the 21st century with features such as paragraph at-once adjustment, coloured text, and yes… native utf-8 support. Neatroff also does not have widow and orphan protection, but some macro packages at least warn you when an orphan is printed.

Speaking of macros, while neatroff is great, the macro packages are not, at least not for my purposes. The mom macros do not work with neatroff, due to some design differences between neatroff and groff. The macro package closest to mom in extensiveness is utroff, which does not come with a “default” install of neatroff. While utroff has an expansive feature set, its documentation, integration and verbosity are not as far along, making the learning curve quite steep.

For small documents (around a thousand words or so), I have my own set of macros that can generate fine-looking pdfs, but I honestly do not want to program every single little feature that I would want when writing a book or dissertation, for example. This was what I loved so much about mom. Whenever I thought of a nice feature to add, I could go to the mom documentation and read that it was likely already implemented. While writing, I want to write, and while programming, I want to program. These two do not mix all that well, as programming takes me entirely out of the flow of writing.

In general I still prefer neatroff over groff at their cores, but if I want to have an effective workflow, I have to rely on macros made by other people, which means that groff pulls ahead ever so slightly as it is the less niche of the niche options.

LaTeX

Lastly, there is LaTeX, which has lately become my choice of typesetter. LaTeX does not use a macro system like groff, and anything you would likely want to do in a document can already be done out of the box. If you want something that cannot (easily) be done with the base LaTeX syntax, then there is a plethora of packages that you can install to change and extend your documents. Largely due to its popularity, LaTeX is really well documented and supported and can therefore be learned relatively quickly. There are even websites where you can compile your source files allowing one to write anywhere and from any system.

In essence, if you want to do something, LaTeX can do it, and likely someone has already done it before you. For instance, my university requires that I make citations in the Chicago format. For groff, I had to program my own macros for this (the same goes for neatroff). LaTeX simply has a package for it, which can also cover a lot more edge cases due to its more powerful engine. Here we also once again see the benefit of using a more popular program, as almost all websites that allow for exporting references have a BibTeX option (the format LaTeX uses).

Also, the utf-8 support, while not infallible, is absolutely great. Most characters just work out of the box without installing different fonts or using external programs. More esoteric characters such as “Ḥ” or “ī” do not show up properly, but this can be solved with reasonable, human-readable escape sequences.
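As an illustration of what such escape sequences can look like, the sketch below maps an accented character to a standard LaTeX accent command (\d for a dot below, \= for a macron). It assumes these standard accent commands are the escapes in question. The helper name and the small lookup table are made up for the example and cover only these two accents.

```python
import unicodedata

# Combining mark -> (LaTeX accent command, whether the accent sits above the letter)
ACCENT_COMMANDS = {
    "\u0323": ("d", False),  # combining dot below, as in Ḥ
    "\u0304": ("=", True),   # combining macron, as in ī
}


def to_latex_escape(char):
    """Return a LaTeX accent command for `char`, or `char` itself if unknown."""
    decomposed = unicodedata.normalize("NFD", char)
    base, marks = decomposed[0], decomposed[1:]
    if marks not in ACCENT_COMMANDS:
        return char  # no accent, or one this sketch does not cover
    command, sits_above = ACCENT_COMMANDS[marks]
    if base == "i" and sits_above:
        base = "\\i"  # dotless i, so the accent replaces the dot
    return "\\{}{{{}}}".format(command, base)


if __name__ == "__main__":
    for character in "Ḥī":
        print(character, "->", to_latex_escape(character))
```

The resulting commands stay readable in the source file: the base letter is still visible, and the command name hints at the accent.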

The problem with LaTeX, however, is that it is a lot heavier than neatroff and even groff. A full LaTeX install is huge in comparison with the others, and producing a pdf is an order of magnitude slower too.

Conclusion

As I said already, I am now, and will be, using LaTeX for most of my typesetting. This is not because I do not see the value of lightweight software. Hell, I run Alpine Linux on my laptop, with an install size of less than 500 MB, and was running Void and Arch before that. I have a lot of experience with minimalist programs such as mupdf and dwm, and I even maintain my own web browser that is faster and smaller than Firefox and Chrome. For the writing of my documents, however, I honestly do not care whether it takes five seconds or a full minute. What I want from a typesetter is the ability to make readable, beautiful documents with minimal effort.

LaTeX simply scores the highest in all of these metrics, but that does not mean that I will use it for everything. For instance, I have a number of automatically generated pdfs on my system which give information on certain settings. These documents are speed-sensitive and will therefore remain in groff or neatroff.

I hope (but doubt) that this write-up has been useful to you.