and late one accursed night, I compounded the elements

This post is rather obsolete in its relation to this blog but is being kept for posterity and in case I need to refer back to it.

So I finally cracked and moved my blog to a new platform. I was happy with my WordPress blog; it never gave me any bother. From WordPress.com to my self-hosted install, it treated me right. I must admit there was one thing that often bothered me. While many people dislike the dashboard, specifically the new post section, I find it completely fine from an interface point of view. The output, however, often had me vexed. Even in raw HTML mode it would often fiddle with the finished article. I wanted more control. I had overheard conversations amongst friends about something called Habari and I thought it about time I tried something a little more unusual to serve up my inane drivel. At first I did nothing, but it is one thing to mortify curiosity, another to conquer it.

A fortnight later, by excellent good fortune, Norman gave one of his pleasant dinners to some five or six old cronies, all intelligent, reputable men and all judges of enjoyable yak shaving techniques.

Here the conversation turned to Markdown, makefile blogging and the Jekyll static website generator. I had recently been working a lot more with stylesheets and wanted to write blog posts without having to worry about altering the formatting for each one. I often couldn't help moving this paragraph that way a bit, that image the other. It was inconsistent and a waste of time. No shit, you might say, but I catch on slowly. I was also drawn to the notion of having my blog articles in a version control system like git. My curiosity was well and truly piqued. I was worried about ending up with a broken website but the temptation of a discovery at last overcame the suggestions of alarm.

The important thing to remember is that this was all the fault of Norman.
Even this very post.

Story of the door

Enough tomfoolery, down to business. Jekyll needs Ruby. So on my CrunchBang desktop:

sudo apt-get install ruby1.8-dev

After this I needed to install RubyGems, so I followed these instructions. What fun. I was starting to think about turning back at this point, but I think that was mostly because I was tired. That aside, I was now able to follow the Jekyll install instructions:

gem install jekyll

Then I turned to the rest of the reasonably good documentation and did a little googling around, reading some blog posts [1][2][3][4]. I downloaded a sample Jekyll site from git (take your pick) and played with it for a while.

I spent a while familiarising myself with Markdown (of which I am now quite the fan), Liquid markup and how Jekyll generates the site by taking each file it finds with a YAML header and applying the specified layout file in "_layouts" to generate the finished article. All very good, if a bit confusing at first. I messed around with a site design using dummy content and, once I had developed some delusions of adequacy, set about importing my WordPress content.
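To make the YAML header business a little less mysterious, here is a rough sketch of the idea in plain Ruby. It is not how Jekyll itself is implemented; the helper names (split_front_matter, render), the sample post and the one-line layout are all made up for illustration, and real layouts go through Liquid rather than a simple string substitution.

```ruby
require 'yaml'

# Matches a leading block delimited by lines of three dashes.
FRONT_MATTER = /\A---\s*\n(.*?)\n---\s*\n/m

# Illustrative sample data, not taken from the real blog.
SAMPLE_POST = "---\nlayout: post\ntitle: Hello\n---\nSome *markdown* text.\n"
SAMPLE_LAYOUTS = { "post" => "<article>{{ content }}</article>" }

# Hypothetical helper: returns [metadata_hash, body].
# Files without a header come back with empty metadata.
def split_front_matter(source)
  match = FRONT_MATTER.match(source)
  return [{}, source] unless match
  [YAML.safe_load(match[1]) || {}, match.post_match]
end

# Hypothetical helper: drops the body into the placeholder of the
# layout named in the header (a stand-in for real Liquid rendering).
def render(source, layouts)
  meta, body = split_front_matter(source)
  layout = layouts.fetch(meta["layout"], "{{ content }}")
  layout.sub("{{ content }}") { body }
end

puts render(SAMPLE_POST, SAMPLE_LAYOUTS)
```

The point is only that the header is ordinary YAML and the "layout" key decides which template wraps the rest of the file.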

Inane drivel

There are a handful of tutorials for how to get your inane drivel from WordPress to Markdown or Textile files by connecting to and querying the WordPress MySQL database. This seems like overkill to me when you can export an XML file of all of your posts. There are a couple of Ruby scripts kicking around that will parse this XML file into an individual Textile file for each post. Unfortunately there aren't any for Markdown. It wouldn't be too difficult to create one but with the number of posts I have (~20), it hardly seemed worth it. So I... I manually created the Markdown files (-10 geek points).
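For anyone with more than twenty posts, something along these lines might be a starting point. It is an untested-in-anger sketch, not a script I actually used: wxr_to_posts is a made-up name, the sample export is invented, and it assumes the export uses the usual title, wp:post_name, wp:post_date and content:encoded elements. It also leaves the post body as HTML rather than converting it to Markdown, which Markdown tolerates but purists may not.

```ruby
require 'rexml/document'
require 'yaml'

# Invented sample of a WordPress XML export, trimmed to one post.
SAMPLE_WXR = <<~XML
  <rss xmlns:content="http://purl.org/rss/1.0/modules/content/"
       xmlns:wp="http://wordpress.org/export/1.2/">
    <channel>
      <item>
        <title>Why think</title>
        <wp:post_name>why-think</wp:post_name>
        <wp:post_date>2011-03-01 09:30:00</wp:post_date>
        <content:encoded><![CDATA[<p>Hello</p>]]></content:encoded>
      </item>
    </channel>
  </rss>
XML

# Hypothetical helper: maps each <item> to a [file_name, file_body]
# pair that could be written into the _posts folder.
def wxr_to_posts(xml)
  doc = REXML::Document.new(xml)
  posts = []
  doc.elements.each("rss/channel/item") do |item|
    title = item.elements["title"].text
    slug  = item.elements["wp:post_name"].text
    date  = item.elements["wp:post_date"].text[0, 10] # keep "YYYY-MM-DD"
    body  = item.elements["content:encoded"].text.to_s

    # to_yaml emits the opening "---" line; the closing one is added here.
    header = { "layout" => "post", "title" => title }.to_yaml + "---\n"
    posts << ["#{date}-#{slug}.markdown", header + body + "\n"]
  end
  posts
end

wxr_to_posts(SAMPLE_WXR).each { |name, _body| puts name }
```

Nothing is written to disk here; looping over the pairs with File.write would be the obvious next step.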

Tweaking under the bonnet

All seemed to be going to plan so I went about making my design work correctly with my content. Firstly I needed to alter the YAML header to include a category to use for my URLs, and tags because, just because (I remain undecided on whether there will be any tag pages). I then added my own "thumb" variable. My blog design has a small thumbnail to go with each article and the Liquid templates would need to know which image goes with which post. My YAML headers were now like this:

--- 
layout: post
title: and late one accursed night, I compounded the elements
published: true
postdate: 2011-06-15 00:00
categories: [technology]
tags: [technology, blog, markdown, ruby]
thumb: jekyll.jpg
---

Then within any Liquid layout where I had access to a post I would insert the image like this:

{% if page.thumb %}
    <img src="../path/to/images/{{ page.thumb }}" />
{% endif %}

It was at this point that I installed rdiscount as an alternative Markdown parser, as I found a few annoyances with the default. For example, if you want italics inside parentheses, writing (_it like this_) doesn't work by default but it does with rdiscount.

Also, I should mention that I had trouble with the £ sign displaying properly. I made peace with it and figured I'd have to use ampersand pound (&pound;). However, the next time I tried it, it worked fine. I could look into what was going on but I don't want to and you can't make me.

As well as the thumbnail I also wanted a way to include some licence information about any content that was borrowed from other sources. I wanted this information to appear in a footer in each article and didn't think Markdown would allow me that sort of control. I toyed with the idea of having this text in a separate file and including it in the layout but this became tricky. I took the lazy man's route and added another custom YAML variable.

licence: Headline image licenced under CC BY-NC-SA 2.0 provided by...

I accessed this variable in the same way as the thumb variable to insert the text at the foot of each article.

Plug-in baby

Next I turned my attention to how to retrieve an excerpt of an article. Specifically I wanted to be able to place a "<!--more-->" tag in my post and be able to access the text preceding it from my layout files. One possibility would be to truncate the post at an arbitrary word limit like this:

{{ post.content | truncatewords:80 }}

This does work but I had two issues.

  1. Whilst truncating at a fixed point was ok for most of my article previews I
    wanted my most recent article to be more neatly and deliberately snipped.

  2. The formatting such as headings, lists, quotes et cetera was applied to this
    retrieved content and made quite a mess on the home and archive pages.

I did a bit more googling to find out what was possible and found out that Jekyll can use plug-ins. You basically write some Ruby *shudder* and put the script in the "_plugins" folder in the root of the blog (the folder wasn't there already for me but it depends on which blog you use as your starting point). I looked around for anyone who had attempted what I wanted because I never like to let people get away with not doing my work for me. Sure enough I found something that was parsing an HTML document and splitting it on the "<!--more-->" tag. I altered it slightly and made it strip out any images and headings.

require 'nokogiri'

module PostMore
    # Return only the HTML that precedes the "<!--more-->" marker.
    # Posts without a marker are passed through unchanged.
    def postmorefilter(input)
        if input.include? "<!--more-->"
            doc = Nokogiri::XML::DocumentFragment.parse(input.split("<!--more-->").first)
            doc.to_html
        else
            input
        end
    end

    # Remove images and h3 headings so excerpts stay tidy on the
    # home and archive pages, then hand back the remaining HTML.
    def stripstyle(input)
        doc = Nokogiri::XML::DocumentFragment.parse(input)
        doc.css("img, h3").each do |node|
            node.remove
        end
        doc.to_html
    end
end

Liquid::Template.register_filter(PostMore)

It works a treat and it is simple to pass the post content to the filters from the layout.

{{ post.content | postmorefilter | stripstyle }}

For the older articles I decided to just truncate the words but still wanted to strip the unwanted elements. This has flaws but I haven't bothered myself to do it properly and probably won't until it gives me grief.

{{ post.content | stripstyle | truncatewords:25 }}

Bish, bash, bosh.

Comments with Disqus

Jekyll defaults its URLs to be constructed using the post date. It's a good default but I'm not keen as I don't have many posts and I like to have them show the category of the post like this: http://saltmypeanuts.com/technology/why-think/. To change the URL style I simply changed the "_config.yml" file to include:

permalink: /:categories/:title

The title that it uses isn't the YAML variable but comes from the name of the Markdown file itself.
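As a rough illustration of where that URL comes from, the sketch below mimics the mapping in plain Ruby. It is my approximation of the behaviour, not Jekyll's actual code: permalink_for is a made-up helper, and the date in the example file name is invented since only the resulting URL is known.

```ruby
# Hypothetical helper: approximates the path produced by
# "permalink: /:categories/:title" for a post file and its categories.
def permalink_for(filename, categories)
  # Drop the folder and the extension: "2011-03-01-why-think"
  base = File.basename(filename, ".*")
  # Drop the leading date that Jekyll expects in post file names.
  slug = base.sub(/\A\d{4}-\d{2}-\d{2}-/, "")
  "/" + (categories + [slug]).join("/")
end

# The date in this file name is an assumption for the example.
puts permalink_for("_posts/2011-03-01-why-think.markdown", ["technology"])
```

So renaming the file changes the URL, while editing the title in the YAML header does not.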

Doing the URLs this way kept parity with my WordPress blog, which is probably a good idea if you have... well... readers. I'm not overly concerned with link rot so I did change a few that had previously unseen mistakes that bugged me. From my point of view the only reason to worry about altering the URLs was maintaining existing Disqus comments.

I needn't have worried because moving the comments is a cinch. For any posts that have new URLs you just make a CSV file containing the old URL and the new URL for each, then upload the CSV in the Migrate Threads section of the dashboard. From there I just copied the various snippets from Disqus and pasted them into the layout files and did a bit of style tweaking.
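With only a few changed URLs I typed the CSV by hand, but a sketch of generating it might look like this. The old URL below is invented for the example, url_map_csv is a made-up helper, and it assumes none of the URLs contain commas or quotes (otherwise a proper CSV library would be the safer choice).

```ruby
# Hypothetical old => new pairs. The old URL is invented; only the
# new one appears on the blog.
URL_MOVES = {
  "http://saltmypeanuts.com/technology/why-thnik/" =>
    "http://saltmypeanuts.com/technology/why-think/"
}

# Hypothetical helper: one "old,new" line per moved post.
def url_map_csv(moves)
  moves.map { |old_url, new_url| "#{old_url},#{new_url}\n" }.join
end

puts url_map_csv(URL_MOVES)
```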

Less is more

The end was in sight. I added all of the files to git, pushed them to a Gitorious repository and used rsync to publish the "_site" folder to my web server. Less is more.
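For the curious, the publish step could be wrapped up roughly like this. The remote target is a placeholder, publish_command is a made-up helper, and the particular rsync flags are my assumption of a sensible set rather than a record of exactly what I typed.

```ruby
# Hypothetical helper: builds the rsync invocation that mirrors the
# generated site to a web server.
def publish_command(remote, site_dir = "_site")
  # -a preserves permissions and times, -z compresses in transit,
  # --delete removes remote files that no longer exist locally.
  # The trailing slash copies the folder's contents, not the folder.
  ["rsync", "-az", "--delete", "#{site_dir}/", remote]
end

# Placeholder destination; substitute your own server and path.
command = publish_command("user@example.com:/var/www/blog/")
puts command.join(" ")
# To actually publish, run: system(*command)
```

Be careful with --delete: pointing it at the wrong remote folder will happily empty it.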

Full statement of the case

This was an interesting exercise. It wasn't completely straightforward and Ruby smells of poo but it wasn't too taxing for a run of the mill, though exceedingly cunning, geek. There are still things to tidy up and think about. I can't decide if having tag pages is worth the effort and I'd like to flesh out some of the design but it's pretty much there.

Running the jekyll command to completely regenerate the site takes about 1 second. This is hardly surprising because I have such a small number of posts and I'm led to believe that having to generate lots of tag and archive pages is something that slows the process down. I also opted not to bother with any code highlighting, which is another area that slows generation.

I'm more than happy with the results of this little sortie. There is a very pleasing feel to the notion of generating my site rather than having an application connected to a database. A hark back to simpler times (save the Disqus JavaScript *cough*).

Stick a fork in me, I'm done.