Why I moved from LaTeX to HTML

As I wrote in my last weeknote, I decided to switch from LaTeX to HTML for the training manuals and papers my company Freistil Consulting will publish in the future. In this post, I’m going to explain why.

As a computer scientist, system administrator and IT trainer, I’ve written a lot of documents ranging from single-page articles and a diploma thesis to my book “Perl-Meisterkurs” with nearly 200 pages. And all the time, I’ve been looking for the format that best supported publishing those documents.

In the early nineties, I learned to like LaTeX. The process of writing documents in LaTeX (write, process, verify, repeat) is so similar to writing software in a compiled language that I found it very easy to learn. And its output quality was just stunning compared to word processors like Word. Also, as with Perl, there is a huge choice of useful extensions, called “styles”, freely available on the Internet. With those, you can use LaTeX for writing articles and books as well as letters and even presentation slides (or cooking recipes, for that matter).

But while you can generate PDF files easily with LaTeX, converting a document into HTML for the Web is tedious, even with the converter software available. And because I needed printouts as well as web pages to display on a projector, I chose DocBook XML as my source format. It’s been used for publishing technical documents for a long time and can be converted into PDF as easily as into HTML. Unfortunately, at least using open source tools, the PDF output is not nearly as neat as the one from LaTeX.

So, when I started writing my Perl book, I went back to LaTeX, and I was happy with it for some years. That is, until I decided to focus on online training. Now great print quality didn’t matter that much anymore, and I needed HTML files to publish on our online training platform.

The format had to be based on plain text so I could still use TextMate, my favourite text editor, and Perl scripts to process my source files. There weren’t many alternatives left to choose from. One of the most interesting candidates was Markdown. I already use Markdown on most of my blogs because it’s easy to write and also easy on the eyes. Furthermore, Scrivener, the writing tool I like most, also supports working with Markdown.

And then there was HTML: simple, plain HTML. I got the idea from Mark Pilgrim’s interview on The Setup, where he mentioned that he’s writing his new book in HTML. I liked this idea because doing semantic markup in HTML is easy and, by using CSS, you can style a great-looking online presentation.

Writing HTML is almost as easy as writing Markdown, especially when you have support from your editor software. Additionally, there was the strategic aspect: if I wanted to involve other writers, it would be much easier to find some who knew HTML than ones with LaTeX, DocBook or Markdown knowledge.

Because I still wanted to be able to generate PDF files, I looked for decent HTML-to-PDF converters. XSL-FO was the main reason I abandoned DocBook, so I researched the alternatives and found PrinceXML. Prince uses the normal CSS styling information and extends its syntax a bit to cover printing aspects like page sizes and footnotes. I found out that CSS3 can even do page or figure numbering and cross references. It has to be mentioned that PrinceXML is a bit on the pricey side, so I had to do some tests first.

I quickly converted one of the book’s chapters from LaTeX to HTML, created a CSS style sheet for screen and one for print media, and checked the results in HTML and PDF. As was to be expected, the web presentation was fine, and the PDF output from PrinceXML was also quite acceptable. These results finally convinced me to go the full monty and convert the whole book from LaTeX to HTML.

I haven’t finished the conversion yet, but thanks to a Perl script with a growing list of regular expressions, I can minimize my manual work. I’ll also need to write some scripts for generating the table of contents and the keyword index, because PrinceXML doesn’t do that.
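
To give an idea of what such a rule-based conversion looks like, here is a minimal sketch. It is written in Python rather than Perl and is not my actual script: the rule list, the function names `latex_to_html` and `build_toc`, and the `toc-level-N` class names are all made up for illustration. It assumes simple, non-nested LaTeX commands and would need many more rules for a real book.

```python
import html
import re

# Hypothetical rule list: (pattern, replacement) pairs, applied in order.
# Each pattern only matches a command whose argument contains no nested braces.
RULES = [
    (re.compile(r"\\chapter\{([^{}]*)\}"), r"<h1>\1</h1>"),
    (re.compile(r"\\section\{([^{}]*)\}"), r"<h2>\1</h2>"),
    (re.compile(r"\\emph\{([^{}]*)\}"), r"<em>\1</em>"),
    (re.compile(r"\\textbf\{([^{}]*)\}"), r"<strong>\1</strong>"),
    (re.compile(r"\\texttt\{([^{}]*)\}"), r"<code>\1</code>"),
]

# Matches the h1/h2 elements produced by the rules above.
HEADING = re.compile(r"<h([12])>(.*?)</h\1>")


def latex_to_html(source: str) -> str:
    """Escape HTML special characters, then apply each rule in turn."""
    text = html.escape(source, quote=False)
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


def build_toc(html_text: str) -> str:
    """Collect h1/h2 headings into a flat list that CSS can indent by level."""
    items = [
        f'<li class="toc-level-{level}">{title}</li>'
        for level, title in HEADING.findall(html_text)
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    sample = "\\section{Scalars & lists}\nUse \\emph{strict} and \\texttt{warnings}."
    converted = latex_to_html(sample)
    print(converted)
    print(build_toc(converted))
```

The appeal of this approach is that every construct the rules miss stays visible as a leftover backslash command in the output, so the list can grow one pattern at a time while the remaining manual work shrinks.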

I expect to have a complete new version of the “Perl-Meisterkurs” book in May and will let you know about my experiences in another post.