Sebastian's personal website

Improving org-publish workflow for blogging

Written by Sebastian Dümcke on

As I wrote before, one of my remaining issues after moving to org-publish was the lack of a page listing all posts associated with a given tag. I also really wanted to blog from a subtree, as can be done with ox-hugo. In this post I show how I solved both issues, along with some further improvements I implemented.

Generating tag pages with org-publish

I looked into different options here:

  • generating the tag pages from the information in the subtree of all my posts (without using org-publish, but still using Emacs Lisp). However, my posts are currently split between org files in a folder tree and, for the last few, a subtree in my personal org file.
  • generating the tag pages from the folder structure of all posts (completely outside of Emacs), which would require parsing the org files to get the value of the #+filetags keyword
  • generating a single tag page instead of one page per tag, (ab)using the sitemap feature of org-publish. This turned out to be my final choice.

For this to work we need the following tricks:

  1. format each sitemap entry by concatenating the tags with the remaining post information with specific separators (I used ’|’ and ’:’). This happens in a custom function in :sitemap-format-entry
  2. generate the sitemap page by splitting the info into a hash or association list (I used the latter) then use that to format the final page. This happens in a custom function for :sitemap-function
  3. use a custom publish function that will only publish the tag page, ignoring all other input (which is already published by a previous project configuration). Here we override :publishing-function

Let me show you the code for each below. We start with the function that formats each entry in the sitemap. However, we only use that to generate a string in a format that we can later parse:

(defun samd/generate-tag-entry (entry style project)
  "Generate a template string for ENTRY.
The string starts with the tags of the entry (separated by `:', or
`untagged' if the post has no tags), followed by the post path, title
and date (similar to the index sitemap), all separated by `|'."
  (cond ((not (directory-name-p entry))
         (format "%s|%s|%s|%s"
                 (let ((tags (org-publish-find-property entry :filetags project)))
                   (if tags (string-join tags ":") "untagged"))
                 entry
                 (org-publish-find-title entry project)
                 (format-time-string "%b %e, %Y" (org-publish-find-date entry project))))
        (t entry)))
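
To make the intermediate format concrete, here is a minimal sketch of what one such string looks like and how it splits back into its parts. The entry value is made up for illustration; it is not taken from the actual site.

```elisp
;; Hypothetical entry string in the tags|path|title|date format.
(let* ((entry "emacs:org-mode|my-post/index.org|My post|Mar  3, 2024")
       ;; "|" is a literal character in Emacs regexps, so no escaping is needed
       (fields (split-string entry "|"))
       (tags (split-string (car fields) ":")))
  (list tags (cdr fields)))
;; → (("emacs" "org-mode") ("my-post/index.org" "My post" "Mar  3, 2024"))
```

The first field carries the tags and the remaining fields carry the post information, which is exactly the split the sitemap function relies on later.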

The next function is used for :sitemap-function but really only structures the code, the magic happens in samd/tag-entry-to-alist and samd/format-tag-alist. So first the sitemap function:

(defun samd/generate-tag-page (title list)
  "Generate an org tree with tag name as header and the list of posts with that tag anti-chronologically under it.
TITLE is the title of the page.  LIST is an internal
representation for the files to include, as returned by
`org-list-to-lisp'."
  ;create an alist from tag to file+title
  (let* ((mapping (samd/tag-entry-to-alist list))
         (str (samd/format-tag-alist mapping)))
    (concat "#+TITLE: " title "\n\n"
            str)))

Now onto the really tricky part. We first parse the template string of each entry into an alist (one could also use a hash table here). The list argument is a sort of AST representation of all entries, in our case just a flat list prefixed with the list type symbol, which we discard with the first cdr call in dolist:

(defun samd/tag-entry-to-alist (list)
  "Parse the sitemap LIST into an alist mapping each tag to a list of (PATH TITLE DATE)."
  (let ((res '()))
    (dolist (e (cdr list) res)
      (let* ((fields (split-string (car e) "|"))
             (tags (split-string (car fields) ":")))
        (dolist (tag tags)
          ;; append this post to the tag's list, then replace the old alist entry
          (setq res (cons (cons tag
                                (append (cdr (assoc-string tag res))
                                        (list (cdr fields))))
                          (assoc-delete-all tag res))))))))

Once we have the association list mapping each tag to a list of post information (path, title and date), we can walk through it and generate the final string to be added under the title of the sitemap file:

(defun samd/format-tag-alist (alist)
  "Format ALIST as one org heading per tag, each followed by a list of links to its posts."
  (string-join
   (mapcar (lambda (e)
             (concat
              ;; heading with a CUSTOM_ID so the tag can be used as an anchor
              (format "* %s\n:PROPERTIES:\n:CUSTOM_ID: %s\n:END:\n" (car e) (car e))
              (string-join
               (mapcar (lambda (p)
                         (format "- [[file:%s/%s][%s]] - %s"
                                 (concat "../" samd/posts-publish-directory)
                                 (car p)
                                 (cadr p)
                                 (caddr p)))
                       (cdr e))
               "\n")))
           alist)
   "\n"))

Note how we add a CUSTOM_ID property to have anchor tags on the page. We will use this further on to map tag URLs to the right position on the tags page.
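
As mentioned, a hash table would work in place of the alist. The following is a sketch of that variant, not the code used on this site: my/tag-entries-to-hash is a hypothetical name, and for simplicity it takes plain entry strings rather than the nested list that org-publish passes to the sitemap function.

```elisp
(defun my/tag-entries-to-hash (entries)
  "Group ENTRIES, strings of the form \"tag1:tag2|path|title|date\", by tag.
Return a hash table mapping each tag to a list of (PATH TITLE DATE) lists,
kept in the order the entries were given."
  (let ((table (make-hash-table :test #'equal))) ; `equal' so string keys compare by content
    (dolist (entry entries)
      (let* ((fields (split-string entry "|"))
             (tags (split-string (car fields) ":")))
        (dolist (tag tags)
          ;; append the post info to whatever is already stored for this tag
          (puthash tag
                   (append (gethash tag table) (list (cdr fields)))
                   table))))
    table))

;; Made-up sample data.
(let ((table (my/tag-entries-to-hash
              '("emacs:org|a/index.org|Post A|Jan  2, 2024"
                "emacs|b/index.org|Post B|Jan  1, 2024"))))
  (gethash "emacs" table))
;; → (("a/index.org" "Post A" "Jan  2, 2024") ("b/index.org" "Post B" "Jan  1, 2024"))
```

One trade-off: the order in which a hash table yields its keys is not something to rely on, so the tags would have to be sorted explicitly before formatting the page.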

Last is the custom publish function that will ignore all files but the newly generated sitemap:

(defun samd/publish-tag (plist filename pub-dir)
    "only export if filename is tags.inc"
    (if (equal "tags.inc" (file-name-nondirectory filename))
        (org-html-publish-to-html plist filename pub-dir)))

Now adding the following to the org-publish-project-alist variable brings it all to life:

("tag-page"
 :base-directory "./posts"
 :base-extension "org"
 :publishing-directory "public/"
 :recursive t            ;required because posts in subfolders
 :auto-sitemap t
 :sitemap-filename "tags.inc"
 :sitemap-title "Posts by tags"
 :sitemap-style list
 :sitemap-function samd/generate-tag-page
 :sitemap-format-entry samd/generate-tag-entry
 :sitemap-sort-files anti-chronologically
 :publishing-function samd/publish-tag ;ensures only tags.inc file (i.e. sitemap) get published
 )    
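
To try the project on its own, it can be published by name. This is a small sketch that assumes the "tag-page" entry above has already been added to org-publish-project-alist; the second argument forces republishing even if org-publish considers the files up to date.

```elisp
(require 'ox-publish)
;; Rebuild only the tag page project; t forces publishing regardless of timestamps.
(org-publish "tag-page" t)
```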

Conserve previous behaviour using Apache rewrite rules

In my previous Hugo-based setup, calling sam-d.com/tags/ would show a page with a list of all tags; clicking a tag would navigate to /tags/exampletag/ and show a list of posts.

Currently, we publish the list of all tags on a single page at /tags.html. However, we did generate CUSTOM_ID properties for each tag heading, which can be used as anchors.

With the following .htaccess file we can get that behaviour back. One rule rewrites /tags to /tags.html; the other parses the tag and redirects to the right heading on the tags.html page using the anchor: /tags.html#exampletag. Unknown tags are simply ignored (i.e. they just redirect to the tags page). I also learned that anchors are handled on the client side, so the rewrite rule has to issue a redirect (R flag) and must not escape the # character (NE flag). The order matters, as rules are tried in the order they appear. If we switch them around, the general rule rewrites the request first and the result no longer matches the tag pattern. Here is the final file content:

RewriteEngine On
RewriteRule "^tags/([A-Za-z0-9]+)/?" "http://sam-d.com/tags.html#$1" [NE,R=301,L]
RewriteRule "^tags/?" "tags.html"

Adding tags to post header

With the knowledge gained writing the code from the previous section, I could finally add tags to the post header. For this the function samd/add-author-timestamp was changed as follows:

(defun samd/add-author-timestamp (content backend info)
  "Filter to add Author and timestamp information into the header tag containing the post title"
  (if (and (org-export-derived-backend-p backend 'html)
           (org-export-get-date info))
      (let ((timestamp (org-export-get-date info "%Y-%m-%dT%T"))
            (timestring (org-export-get-date info "%B %e, %Y"))
            (tags (plist-get info :filetags)))
        (replace-regexp-in-string
         "\\(<main.*\\(\n.*\\)*\\)</header>"
         (concat "\\1 <p class=\"info\">Written by "
                 (org-export-data (plist-get info :author) info)
                 " on <time datetime=\"" (org-export-data timestamp info) "\">"
                 (org-export-data timestring info) "</time> "
                 ;; one rel="tag" link per file tag, if the post has any
                 (if tags
                     (concat "<br />Tags: "
                             (string-join
                              (mapcar (lambda (s)
                                        (format "<a rel=\"tag\" href=\"http://sam-d.com/tags/%s\">%s</a>" s s))
                                      tags)
                              " "))
                   "")
                 "</p></header>")
         content))))

Then I also updated the styling in the CSS file to highlight the tags better by changing the link colors and giving each tag a dotted border and some padding.

Blogging from a subtree

ox-hugo has the option to create blog posts in a subtree and export this tree into the required folder structure. I wanted something similar, as I find it convenient to have all posts inside my personal org file instead of in a separate folder structure. I implemented it as follows: posts are subheadings in an org file. Each post can be tagged using the standard org-set-tags-command and must contain the EXPORT_DATE and EXPORT_FILE_NAME properties. Both are conveniently created by an org-capture template. However, since I have “clean URLs” (meaning a post is accessed at /post-title, which is implemented by placing the post in an index.org file inside a folder post-title), the folder post-title in the EXPORT_FILE_NAME path must exist before the export. Creating it is not part of the standard export process, so we need to code it ourselves. The implementation tasks break down as:

  1. generate capture template
  2. ensure the correct working directory is set for linking to media and other posts
  3. ensure the target folder is created before the export
  4. get the tags into filetags when exporting the post/subtree

The capture template is the most straightforward. It interactively queries for the post title and converts it to a URL-safe path. In org, a capture template can be generated by calling a function that returns a string.

(setq org-capture-templates
      '(("p" "Create blog post draft" plain
         (file+headline "personal.org" "Posts")
         (function samd/generate-post-template)
         :jump-to-captured t :clock-in t :clock-resume t)))

With the relevant function as such:

(defun samd/generate-post-template ()
  "Generate a template for a new post, to be exported later with org-export-to-org.
This function changes directory into a dummy subdirectory of posts/ so that relative
links to static media and to other posts can be inserted.

An advice on org-capture-finalize later resets the directory to org-directory.
Thus relative linking works only while in the indirect capture buffer."
  (let* ((title (read-string "Title: "))
         (fname (samd/blog-title-to-fname title))
         (path (file-name-concat
                (expand-file-name blog-directory)
                "posts"
                fname))
         (post (file-name-concat path "index.org"))
         (dummy (file-name-concat
                     (expand-file-name blog-directory)
                     "posts" "dummy"))) ;dummy directory must exist
    (cd dummy) ;change path to dummy folder so that linking between posts works with auto-completion
    (format "**** %s :DRAFT:\n:PROPERTIES:\n:EXPORT_DATE: %%U\n:EXPORT_FILE_NAME: %s\n:EXPORT_OPTIONS: date:t timestamp:nil author:nil \n:END:\n%%?"
            title
            post)))

(require 'subr-x) ; provides thread-last

(defun samd/blog-title-to-fname (title)
  (thread-last
    title
    (replace-regexp-in-string "[[:space:]]" "-")
    (replace-regexp-in-string "-+" "-")
    (replace-regexp-in-string "[^[:alnum:]-]+" "")
    downcase))
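
One edge case worth knowing: because hyphens are collapsed before punctuation is removed, a title such as "Org & Emacs: tips" ends up as "org--emacs-tips" with the function above. If that matters, a variant that strips punctuation first avoids the double hyphen. This is only a sketch under that assumption; my/title-to-slug is a hypothetical name and not part of my setup.

```elisp
(defun my/title-to-slug (title)
  "Return a lowercase, hyphen-separated slug for TITLE.
Hypothetical variant: punctuation is removed before whitespace is collapsed."
  (let* ((lower (downcase title))
         ;; drop everything except letters, digits, whitespace and hyphens
         (clean (replace-regexp-in-string "[^[:alnum:][:space:]-]+" "" lower)))
    ;; turn each run of whitespace and hyphens into a single hyphen
    (replace-regexp-in-string "[[:space:]-]+" "-" clean)))

(my/title-to-slug "Org & Emacs: tips")
;; → "org-emacs-tips"
```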

In order to enable linking to media and other blog posts with the correct relative links in our subtree, the capture template sets the current working directory. Ideally we would set it to the directory containing the blog post. But since this directory has not yet been created (the post has not been exported), I chose, as a workaround, to create a directory named dummy inside my posts directory and set the working directory to it. As this influences the working directory of Emacs, we make the change temporary for the duration of the capture. The working directory gets reverted to the path containing my org files by the following advice on org-capture-finalize. It only acts for the capture template key chosen above ("p"):

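;; go back to org-directory after completing a post
;; (&rest so the advice also works when org-capture-finalize is called without arguments)
(advice-add 'org-capture-finalize :after
            (lambda (&rest _args)
              (when (string= (org-capture-get :key) "p")
                (cd org-directory))))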

The next function creates the parent directory of the post. We then advise org-export-output-file-name with this function:

(defun samd/org-export-create-parent-if-not-exists (orig-fun extension &optional subtreep pub-dir)
  (let* ((target-file (apply orig-fun extension subtreep pub-dir nil))
         (parent-dir (file-name-directory target-file)))
    (when subtreep ;only change path when exporting a subtree (this is how I currently plan to use this). Avoids unexpectedly impacting other exports
      (unless (file-exists-p parent-dir)
        (make-directory parent-dir nil)));create post directory, if all parents exist
    target-file))
(advice-add 'org-export-output-file-name :around #'samd/org-export-create-parent-if-not-exists)

Now, to ensure that the tags of the subtree are used as filetags of the resulting org file, we add the following function to the org-export-before-parsing-functions hook:

(defun samd/org-export-add-file-tags (backend)
  "when exporting subtree to org file add tags on parent header as filetags into the buffer"
  (when (eq backend 'org)
    (let ((tags (org-get-tags)))
      (goto-char (point-min))
      (insert (concat "#+FILETAGS: :" (string-join tags ":") ":\n")))))
(add-hook 'org-export-before-parsing-functions #'samd/org-export-add-file-tags)

Remember: For the option to “export to org” to show up one must load the ox-org package!
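
A minimal way to do that, for example in the init file:

```elisp
;; Load the Org-to-Org export backend so it shows up in the export dispatcher.
(require 'ox-org)
```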

Add syntax highlighting

I took the time to finally tackle syntax highlighting. In my past attempt I encountered an error message when using htmlize. This error message has now vanished, and I can follow the typical steps of including highlighted output in org-publish:

  1. Generate CSS and add to header

    (setq org-html-htmlize-output-type 'css)
    
  2. load htmlize package and all babel libraries for the code used in the post then turn on htmlized-source

The first step needs to be done once, and only repeated when adding programming languages or changing themes. The generated CSS is the one from your current theme. It seems that my current theme only does very subtle highlighting. You need to load the modes for all the programming languages that are used in any src block before you generate the CSS output.
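
As a rough sketch of that first step: load htmlize and the relevant major modes, then let org generate the CSS. The three modes below are only placeholders for whatever languages your posts use, and htmlize is a third-party package that is assumed to be installed.

```elisp
(require 'htmlize)     ; third-party package, assumed to be installed
;; Load the major modes used in src blocks so their faces are defined.
;; These three are only examples; pick the ones your posts actually use.
(require 'python)
(require 'sh-script)
(require 'css-mode)
;; Opens a buffer with CSS for the faces defined in the current session.
(org-html-htmlize-generate-css)
```

The content of the resulting buffer can then be saved to a stylesheet and referenced from the HTML header of the published pages.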

Conclusion and last trick

This was quite an overhaul of my blogging setup. I am very happy with the tag page and the re-write rule trickery. And also happy that I can now generate the posts from a subtree in my personal org files. This helps me e.g. clock into each blog post to track the time it took to write and publish.

As a side note, I can now list all my draft posts with a very nifty dynamic column view block:

#+BEGIN: columnview :hlines t :id local :match "DRAFT" :maxlevel 4 :format "%24ITEM(Draft) %EXPORT_DATE(Created)"
#+END: