Almost looks like… a book?

Those who know me know that I am a fierce proponent of the digital-native lifestyle. Online shopping, cashless payments, the fewer physical components the better in my book.

Except for my books.

I’ve tried to get on board the e-book hype train, but I’ve never been convinced that an e-reader is more pleasant than a real, dead-tree book. Especially around Christmas gift-buying time, the smell of a bookshop is something difficult to replicate by any other means.

In the spirit of this decidedly irrational attachment to an archaic technology, the only reasonable thing to do was to take this blog, with all of its WordPress underpinnings and moving graphics, convert it to a rigid pattern of inked colours, and splat it onto a few hundred pages of mulched up wood.

Excuse the poor handling by me.

Side.jpg

Interior.jpg

The whole thing weighs in at over 340 pages, with 44 chapters corresponding to a blog post each, and includes 320 full-colour figures.

How to get the book

It will soon be available for purchase on Amazon, under the title ‘Almost Looks Like Work’, e.g. Amazon UK, Amazon US.

The cost is unfortunately high, £30 in the UK and $48 in the US, of which I would receive around £1/$1. This ultimately comes down to my decision to print the book entirely in colour, which I feel adds a lot to many of the plots and illustrations.

Why not to get the book

This book comprises most of the blog posts I’ve written, almost exactly as they appear online. There is a small additional introduction, but otherwise no new content that you can’t read here.

Some of the figures contain text which is quite small, though the vast majority have translated decently well to print format. The variability of the figure quality is much larger than you would expect for a similarly-priced, professional publication.

If you are someone who likes the idea of reading blog posts which are typeset and paginated, but doesn’t want to spend so much on a book, get in touch and I can send you the PDF for a cost much closer to zero!

How it’s made

Right, with a somewhat self-deprecating sales pitch out of the way, let’s focus on the important stuff.

It turns out that it is possible to take an entire WordPress site and export it as an XML file, for the purposes of transferring content to another WordPress installation. Inside this file, each blog post is an <item>, e.g. the beginning of the first ever post:

<item>
<title>The fancy header image</title>
<link>https://jasmcole.com/2014/07/11/the-fancy-header-image/</link>
<pubDate>Fri, 11 Jul 2014 20:34:59 +0000</pubDate>
<dc:creator>jasmcole</dc:creator>
<guid isPermaLink="false">https://jasmcole.com/?p=9</guid>
<description></description> 
<content:encoded>
<![CDATA[The above image is of a few iterations of a Newton fractal.
Assuming that isn't quite enough information for you, a Newton fractal is a 2D plot of the complex plane - that is to say real numbers along the horizontal axis and imaginary along the vertical - where each point is coloured by the value of a complex number there.
<a title="Newton Fractal - Wikipedia" href="http://en.wikipedia.org/wiki/Newton_fractal" target="_blank">http://en.wikipedia.org/wiki/Newton_fractal</a>
<!--more-->

I then used the xmltodict Python package to extract the content corresponding to each blog post, which I could then convert into LaTeX markup.

import xmltodict

# Parse the WordPress export into nested dictionaries
with open('almostlookslikework.xml') as fd:
    doc = xmltodict.parse(fd.read())

# The export can hold other item types too (e.g. pages), so keep only posts
for item in doc['rss']['channel']['item']:
    if item['wp:post_type'] == 'post':
        title = item['title']
        content = item['content:encoded']
        parse(content)  # Do the actual XML -> LaTeX conversion

The actual parsing code I wrote was bad, not particularly idiomatic Python and full of repetition and dirty hacks to work around edge cases. In a typical case below, I wrap anything inside a <strong> tag in the equivalent LaTeX group, {\bfseries ...}

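# Format <strong> blocks
while True:
    i1 = content.find('<strong>')
    i2 = content.find('</strong>', i1)  # closing tag after the opening one
    if i1 < 0 or i2 < 0:
        break  # no complete <strong>...</strong> pair left
    # 8 and 9 are the lengths of '<strong>' and '</strong>'
    content = content[:i1] + '{\\bfseries ' + content[i1+8:i2] + '}' + content[i2+9:]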
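For comparison, the same substitution can be sketched with a regular expression. This is not the code used for the book, just a shorter alternative; the function name bold_to_latex is made up for illustration, and it assumes <strong> tags are never nested.

```python
import re

def bold_to_latex(content):
    """Wrap the text of every <strong>...</strong> pair in {\\bfseries ...}."""
    return re.sub(
        r'<strong>(.*?)</strong>',                    # non-greedy: one pair at a time
        lambda m: '{\\bfseries ' + m.group(1) + '}',  # a function avoids escape handling
        content,
        flags=re.DOTALL,                              # let bold text span line breaks
    )

converted = bold_to_latex('a <strong>b</strong> and <strong>c</strong>')
```

Using a function as the replacement means the backslash in \bfseries is inserted literally, which sidesteps one of the escaping headaches of building LaTeX from Python strings.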

The end result after all of this was a 6,000-line LaTeX file, which was surprisingly long and took up to a minute to compile from scratch.

The XML file doesn’t contain any of the images, so when one of the many variants of the markup WordPress has used over the years to denote an image was found, I grabbed it from the relevant URL with urllib.
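A minimal sketch of that download step might look like the following. It only handles a plain <img src="..."> tag, not the other markup variants mentioned above, and the helper names, the figures folder and the example.com URL are all placeholders rather than anything from the real conversion script.

```python
import os
import re
import urllib.request
from urllib.parse import urlparse

# Assumed markup: a plain <img ... src="..."> tag with a double-quoted URL
IMG_SRC = re.compile(r'<img[^>]*\bsrc="([^"]+)"')

def image_urls(content):
    """Return every image URL found in a post's HTML content."""
    return IMG_SRC.findall(content)

def local_name(url):
    """File name to save under: last path component, query string dropped."""
    return os.path.basename(urlparse(url).path)

def download_images(content, folder='figures'):
    """Fetch each image in the post into `folder` and return the saved paths."""
    os.makedirs(folder, exist_ok=True)
    paths = []
    for url in image_urls(content):
        path = os.path.join(folder, local_name(url))
        urllib.request.urlretrieve(url, path)  # needs network access
        paths.append(path)
    return paths

sample = '<p>Text</p><img class="alignnone" src="https://example.com/2014/07/fractal.png?w=300" alt="" />'
urls = image_urls(sample)
```

Calling download_images(content) on a real post would then fetch the files; dropping the query string keeps resize parameters such as ?w=300 out of the local file name.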

One interesting problem was dealing with the many animated GIFs I’ve made. In this case I downloaded the GIF, used ImageMagick to split it into frames, then concatenated a number of the frames into a single image using numpy.
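The concatenation step can be sketched roughly as below, assuming the frames have already been split out and loaded as equally sized numpy arrays. The function name frame_strip, the choice of evenly spaced frames and the side-by-side layout are illustrative guesses, and dummy arrays stand in for real GIF frames.

```python
import numpy as np

def frame_strip(frames, n_keep=4):
    """Pick n_keep evenly spaced frames and join them side by side."""
    # Evenly spaced indices from the first frame to the last
    idx = np.linspace(0, len(frames) - 1, n_keep).round().astype(int)
    # axis=1 is the width axis for (height, width, channels) arrays
    return np.concatenate([frames[i] for i in idx], axis=1)

# Ten dummy 20x30 RGB frames standing in for frames split from a GIF
frames = [np.full((20, 30, 3), i, dtype=np.uint8) for i in range(10)]
strip = frame_strip(frames)
```

Changing axis=1 to axis=0 would stack the frames vertically instead, which may suit tall figures better on a portrait page.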

Hyperlinks became footnotes, equations were almost all written in LaTeX form already, and header tags were converted into \section and \subsection commands as appropriate.
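As a rough illustration of the link and header conversions, here is a regex-based sketch. The mapping of <h2> to \section and <h3> to \subsection is an assumption, as is the use of \url (which needs the url or hyperref package); neither is taken from the real conversion script.

```python
import re

# Assumed mapping from HTML header level to LaTeX sectioning command
HEADINGS = {'h2': 'section', 'h3': 'subsection'}

LINK = re.compile(r'<a\s[^>]*href="([^"]+)"[^>]*>(.*?)</a>', re.DOTALL)
HEADER = re.compile(r'<(h2|h3)[^>]*>(.*?)</\1>', re.DOTALL)

def links_to_footnotes(html):
    """Keep the link text inline and move the URL into a footnote."""
    return LINK.sub(
        lambda m: m.group(2) + '\\footnote{\\url{' + m.group(1) + '}}', html)

def headings_to_sections(html):
    """Turn <h2>/<h3> blocks into \\section/\\subsection commands."""
    return HEADER.sub(
        lambda m: '\\' + HEADINGS[m.group(1)] + '{' + m.group(2) + '}', html)

linked = links_to_footnotes('See <a title="x" href="http://example.org">this</a>.')
headed = headings_to_sections('<h2>Intro</h2><h3>Detail</h3>')
```

URLs containing characters such as % or # can still upset LaTeX inside a footnote, so a real conversion would probably need some extra escaping.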

One of the lengthier portions of the conversion was writing proper captions for all of the figures, then checking that the figures appeared at least near the prose describing them. In some cases this was not possible, as I tend to include lots of figures when I’m writing, and pesky page breaks would cause them to jump to the end of the chapter.

I made the cover by recreating the fractal used as the header image to the blog in much higher resolution, and added a couple of annotations in Illustrator.

Finally, after everything was compiled together, I uploaded the PDFs to CreateSpace, which takes care of the Amazon listing, printing and so on. There are other self-publishing possibilities, but they require that you purchase your own ISBNs. In the UK these are approximately £90 each, which I deemed slightly too expensive for an experimental book.

Why?

This book was inspired by a commenter on Hacker News saying they would enjoy a printed version of my last blog post, so thanks to them for filling my recent weekends!

This book doesn’t mark the end of my posting here; it is more a challenge to see how well this format translates to print. Here’s to volume 2 in a couple of years’ time!
