
16 February 2017

Setting up a Proper Multilingual Site with GitHub Pages and Jekyll

tl;dr Deploying multilingual sites with custom collections on GitHub Pages can be a bit tricky. This post shows you how to make it work.

Jekyll and GitHub Pages are a match made in heaven. The idea of a “flat-file CMS” whose content is compiled into static files that are pushed to a repository and then served as a public website is simply beautiful. Not just that, but it fixes much of what is wrong with normal content management systems. Versioning comes out of the box. There is no need to worry about PHP vulnerabilities, SQL injections, etc. because - uh - we’re just serving static files here. If you have ever had to maintain a WordPress installation, you know the pain. Performance is a blast because - uh - we’re just serving static files ;-). In fact, when you host your static site (with a correctly configured domain record) on GitHub Pages, everything is backed by their super fast CDN by default!

We’re just serving static files here.

The only downside of the Jekyll + GitHub workflow is that it has a very steep learning curve for non-technical folks. This is probably why most GitHub Pages sites are either tech project pages or personal tech blogs (like this one). However, the capabilities of GitHub Pages go far beyond simple blogs. In fact, there is a plethora of useful Jekyll plugins that enable you to create rather complex websites.

I18N

When I was working on a bilingual website to be hosted on GitHub Pages, I was thrilled to find out there are at least three serious-looking internationalization plugins. I needed some advanced capabilities like translated permalinks, so I went with the jekyll-multiple-languages-plugin, which looked like the most mature solution at the time. It took me about 90% of the way, at which point I had to dive into the plugin code and modify some things to fit my needs. In the following I’ll show my complete, working setup in the hope that you may find it (or parts of it) useful.

Requirements

I had the following requirements for the website:

  1. Every language (including the default!) resides in its own subfolder (/en/, /de/ etc.).
  2. The central index.html redirects to the correct subfolder based on the browser language.
  3. Every post and page can have its own permalink (/en/about-us/, /de/ueber-uns/ etc.).
  4. Custom collections are translated just like the _posts collection.

The jekyll-multiple-languages-plugin was capable of doing most of this, except for 1. and 4.

The Fix

The modified version of the plugin fixes these issues and can be found here, with usage instructions here.

Basic Setup

A clean Jekyll site in English and German should have the following structure:

mysite/
├ _i18n/
  ├ de/
  ├ en/
  ├ de.yml
  └ en.yml
├ _includes/
├ _layouts/
├ _plugins/
├ _posts/
├ assets/
├ _config.yml
├ base.html
├ CNAME
└ index.html

Most of these files and folders should be familiar - _includes, _layouts and _posts are just regular Jekyll folders, assets contains all static assets (images, CSS and JavaScript files) and _plugins contains the modified plugin you downloaded. The CNAME file is needed when you deploy your site to GitHub Pages with a custom domain. Let’s look at the remaining items:

  • _i18n is a folder introduced by the jekyll-multiple-languages-plugin. It contains the translations of individual strings in the *.yml files as well as translated pages and collection documents in the respective subfolders.
  • index.html is the default page that will be put into each of the language subfolders. Caution: It no longer serves as the central index.html for your site.
  • base.html is a file I introduced to fix the absence of a central index.html. The plugin takes this file and makes it the index.html of your generated _site. It can (and should) contain language redirect logic.

A Minimal Site

Let’s work with a small example - an English and German site with a welcome page and a proper redirect.

_config.yml

url: http://www.example.com

languages: ["en", "de"]
exclude_from_localizations: ["assets", "CNAME"]
defaultLang: en
languageNames:
  de: Deutsch
  en: English

...

_i18n/de.yml

site:
  name: Meine Beispiel-Webseite
  tagline: Mit gutem Beispiel voran!
  description: Diese Beispiel-Webseite wird Sie begeistern!
  keywords: Beispiel, Webseite, toll

main:
  welcomeSection:
    anchor: willkommen
    heading: Willkommen auf dieser tollen Webseite!
    p1: Hunderttausende Menschen sind von dieser Seite begeistert. Sie auch?

_i18n/en.yml

site:
  name: My Example Website
  tagline: Setting a good example!
  description: This example website will amaze you!
  keywords: example, website, amazing

main:
  welcomeSection:
    anchor: welcome
    heading: Welcome to this marvelous website.
    p1: Hundreds of thousands of people are amazed by this site. Are you?

index.html

<!DOCTYPE html>
<html lang="{{ site.lang }}">
<head>
    <meta charset="utf-8">
    <meta name="description" content="{% t site.description %}"/>
    <meta name="keywords" content="{% t site.keywords %}"/>
    <title>{% t site.name %} - {% t site.tagline %}</title>
</head>
<body>
    <section id="{% t main.welcomeSection.anchor %}">
        <div>
            <h2>{% t main.welcomeSection.heading %}</h2>
            <p>{% t main.welcomeSection.p1 %}</p>
        </div>
    </section>
</body>
</html>
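The {% t ... %} tags above take dotted keys such as main.welcomeSection.anchor and resolve them against the nested YAML of the current language. Here is a minimal sketch of that idea, which also lists keys that exist in one language file but not in another. It is not the plugin's code (Jekyll plugins are written in Ruby); the helper names lookup, leafKeys and missingKeys are made up for illustration, and the two objects stand in for already-parsed YAML files.

```javascript
// Illustrative only: these objects mimic en.yml / de.yml after YAML parsing.
// All helper names below are hypothetical, not part of the plugin.
const en = {
  main: { welcomeSection: { anchor: 'welcome', heading: 'Welcome!' } },
};
const de = {
  main: { welcomeSection: { anchor: 'willkommen' } }, // "heading" is missing
};

// Resolve a dotted key such as "main.welcomeSection.anchor".
function lookup(strings, dottedKey) {
  return dottedKey.split('.').reduce(
    (node, part) =>
      node !== null && typeof node === 'object' ? node[part] : undefined,
    strings
  );
}

// Collect the dotted path of every leaf value in a nested object.
function leafKeys(node, prefix = '') {
  return Object.entries(node).flatMap(([key, value]) => {
    const path = prefix ? `${prefix}.${key}` : key;
    return value !== null && typeof value === 'object'
      ? leafKeys(value, path)
      : [path];
  });
}

// Keys present in the reference language but absent from the other one.
function missingKeys(reference, other) {
  return leafKeys(reference).filter((key) => lookup(other, key) === undefined);
}

console.log(lookup(de, 'main.welcomeSection.anchor'));
console.log(missingKeys(en, de));
```

A check like this could run before a build to catch a missing translation early, instead of waiting for the compile-time complaint.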

base.html

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="utf-8">
    <title>Amazing Example Website</title>
</head>
<body>
<script>
    // Fall back to an empty string so indexOf never runs on undefined.
    var lang = navigator.language || navigator.userLanguage || '';
    if (lang.indexOf('de') === 0)
        window.location = '/de/';
    else
        window.location = '/en/';
</script>
</body>
</html>

A couple of things are worth noting:

  • Usually the contents of index.html would be modularized into layouts and includes. For simplicity everything is in one file here.
  • Our index.html uses translated strings in both the header (meta data) and the body (content of the welcome page). You can use an arbitrary structure in your YAML files, just make sure it’s the same structure for every language file, otherwise Jekyll will complain about missing strings at compile time.
  • We can translate everything down to anchor IDs, so in the above example you can link to www.example.com/de/#willkommen as well as www.example.com/en/#welcome.
  • Our base.html is excluded from the plugin translation process on purpose, so we have to put in any meta data (title, description etc.) verbatim, in the default language.
  • The redirect in base.html uses JavaScript. A better solution would be to do a server-side redirect based on the Accept-Language request header. Unfortunately, it’s not possible to configure server redirects with GitHub Pages, so JavaScript is the best we can do.
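With more than two languages, the if/else in base.html gets unwieldy. The following is a sketch of one way to generalize it, assuming the browser exposes an ordered preference list in navigator.languages. The function name pickLanguage and its parameters are my own invention for this example, not something the plugin provides.

```javascript
// Hypothetical helper: return the first supported language found in an
// ordered list of browser preferences such as ["de-AT", "en-US"].
function pickLanguage(preferred, supported, fallback) {
  for (const tag of preferred) {
    const primary = String(tag).toLowerCase().split('-')[0]; // "de-AT" -> "de"
    if (supported.includes(primary)) return primary;
  }
  return fallback;
}

// Browser-only part: skipped when there is no window object.
if (typeof window !== 'undefined') {
  const preferred =
    navigator.languages && navigator.languages.length
      ? navigator.languages
      : [navigator.language || navigator.userLanguage || ''];
  // replace() keeps the redirect page out of the back-button history.
  window.location.replace('/' + pickLanguage(preferred, ['en', 'de'], 'en') + '/');
}
```

Keeping the decision in a pure function makes it easy to test outside the browser; only the last few lines touch navigator and window.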

Adding a Page

Now let’s add an “About us” page in English and German.

mysite/
├ _i18n/
  ├ de/
    └ about.md
  ├ en/
    └ about.md
  └ ...
├ about.md
└ ...

about.md

---
layout: page
title: pages.about

namespace: about
permalink: /about-us/
permalink_de: /ueber-uns/
---

{% tf about.md %}

_i18n/de/about.md

Wir sind ein wundervolles Team verrückter Visionäre.

_i18n/en/about.md

We are a wonderful team of crazy visionaries.

Note:

  • The about page is only defined once (in the root directory) and filled with content of the respective language via the {% tf ... %} (translate file) tag on each translation pass.
  • The front-matter title has to be defined in the language-specific *.yml files. This may seem confusing but actually makes sense given that it’s a Liquid variable that may be used in templates.
  • The namespace is optional. Actually, the term ‘namespace’ is a bit misleading. It was introduced by the original plugin, so I didn’t want to override it. You can use it to insert translated links with the {% tl ... %} (translate link) tag.
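To make the permalink convention concrete, here is a small sketch of how a page's URL could be derived for each language from the front matter above. It assumes that a permalink_<lang> entry wins when present and that the plain permalink is used otherwise, which matches the URLs in this post. The function localizedUrl is hypothetical and only models the outcome, not the plugin's internals.

```javascript
// Front matter of about.md from above, written as a plain object.
const about = {
  namespace: 'about',
  permalink: '/about-us/',
  permalink_de: '/ueber-uns/',
};

// Hypothetical helper: prefer "permalink_<lang>", fall back to "permalink",
// then prefix the language subfolder.
function localizedUrl(frontMatter, lang) {
  const permalink = frontMatter['permalink_' + lang] || frontMatter.permalink;
  return '/' + lang + permalink;
}

console.log(localizedUrl(about, 'en'));
console.log(localizedUrl(about, 'de'));
```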

Adding a Language Switcher

Let’s add the possibility to switch the language at the bottom of our index.html:

<!DOCTYPE html>
<html lang="{{ site.lang }}">
<head>...</head>
<body>
    ...
    <footer>
        <a class="active" href="#">{{ site.languageNames[site.lang] }}</a>
        {% for lang in site.languageNames %}
        {% if lang[0] == site.lang %} {% continue %} {% endif %}
        {% if page.namespace %}
        <a href="{% tl {{ page.namespace }} {{ lang[0] }} %}">{{ lang[1] }}</a>
        {% else %}
        <a href="{{ site.baseurl_root }}/{{ lang[0] }}/">{{ lang[1] }}</a>
        {% endif %}
        {% endfor %}
    </footer>
</body>
</html>
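If the Liquid branching is hard to follow, the same decision can be written as a plain function. This is only a sketch: languageNames is copied from the _config.yml example, while pagesByNamespace is a made-up lookup table standing in for whatever the {% tl ... %} tag resolves, and site.baseurl_root is assumed to be empty.

```javascript
// languageNames mirrors _config.yml; pagesByNamespace is hypothetical.
const languageNames = { de: 'Deutsch', en: 'English' };
const pagesByNamespace = {
  about: { en: '/en/about-us/', de: '/de/ueber-uns/' },
};

// Build the switcher entries: the active language first, then one link
// per other language, pointing at the translated page when one is known
// and at the language's index page otherwise.
function switcherLinks(currentLang, namespace) {
  const links = [{ label: languageNames[currentLang], href: '#', active: true }];
  for (const [lang, label] of Object.entries(languageNames)) {
    if (lang === currentLang) continue;
    const translated = namespace && pagesByNamespace[namespace];
    const href = translated && translated[lang] ? translated[lang] : '/' + lang + '/';
    links.push({ label, href, active: false });
  }
  return links;
}

console.log(switcherLinks('en', 'about'));
console.log(switcherLinks('de'));
```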

Note:

  • The switcher code works on any page, not just index.html. For a real website you should put it in a footer.html include file and reference that in your layout.
  • The currently active language is written out first, with a CSS class .active for highlighting.
  • The other languages are enumerated as links, in the order in which they are defined in _config.yml.
  • The links point to the translated version of the current page if it exists or the language’s index.html otherwise.
  • On a real website you probably want to make the switcher a bit prettier. If you’re using Bootstrap I recommend using a dropup menu. Here’s what that looks like on the site I built:

Adding Custom Collections

Now let’s add an FAQ section to our website. Because the collection of questions might grow over time, we don’t want to maintain a long list of questions in one file for each language (these might be hard to keep in sync). Rather, we would like to have an actual collection of question-and-answer documents. These could then be displayed in one file and/or separately on subpages with fully qualified URLs (which might be good for SEO).

First, we must add the custom collection to our _config.yml, otherwise no output will be generated:

...

collections:
  faq:
    output: true

...

Then we create folders for our FAQ collection:

mysite/
├ _faq/
├ _i18n/
  ├ de/
    ├ _faq/
    └ ...
  ├ en/
    ├ _faq/
    └ ...
  └ ...
└ ...

As always, the root _faq folder will contain the actual documents (with front matter), and the language subfolders will contain the translations that are filled in.

Let’s add a question to the collection:

_faq/why-is-this-site-so-awesome.md

---
layout: faq-entry
title: faq.why-is-this-site-so-awesome

namespace: faq.why-is-this-site-so-awesome
permalink: /faq/why-is-this-site-so-awesome
permalink_de: /faq/warum-is-diese-seite-so-toll
---

{% tf _faq/why-is-this-site-so-awesome.md %}

_i18n/de/_faq/why-is-this-site-so-awesome.md

Das ist schwierig zu erklären. Viele Leute finden unsere Seite einfach großartig.

_i18n/en/_faq/why-is-this-site-so-awesome.md

That is hard to explain. Many people just think our website is great.

Having such detailed front matter for each FAQ entry may seem like a lot of boilerplate. However, remember that it allows you to tweak every aspect of how the entry appears, including its permalink, for every language. I find that while the initial setup is a bit tedious, adding new content over time is much more convenient with a custom collection. Other use cases for custom collections are listing the team members of a company or recommending literature in a reading list.

By default, the FAQ entries will be output to separate documents using the specified template and permalinks. To top it off, let’s add a dedicated FAQ page:

faq.md

---
layout: page
title: pages.faq

namespace: faq
permalink: /frequent-questions/
permalink_de: /haeufige-fragen/
---

{% for question in site.faq %}
<div>
    <h4>{% t {{ question.title  }} %}</h4>
    {{ question.content }}
</div>
{% endfor %}

Voilà. Now you can access your FAQ both as a list at www.example.com/de/haeufige-fragen/ and www.example.com/en/frequent-questions/ as well as the individual entries at their own permalinks, e.g. www.example.com/de/faq/warum-is-diese-seite-so-toll/ and www.example.com/en/faq/why-is-this-site-so-awesome/. Note that we didn’t need to create a faq.md file in the language subfolders, because the root file simply outputs translated content in a loop.

Social Sharing with OpenGraph Tags

Our website is almost ready to be published. But what if people actually realize how awesome it is and want to share it on Facebook, LinkedIn, WhatsApp etc.? As you probably know, it’s possible to control the appearance of “sharing previews” of a website by means of Open Graph meta tags.

For any content that resides in the language subfolders, this is easy - we just add a few lines to the header:

index.html

<!DOCTYPE html>
<html lang="{{ site.lang }}">
<head>
    <meta charset="utf-8">
    <meta name="description" content="{% t site.description %}"/>
    <meta name="keywords" content="{% t site.keywords %}"/>
    <title>{% t site.name %} - {% t site.tagline %}</title>
    <meta property="og:title" content="{% t site.og.title %}"/>
    <meta property="og:image" content="{% t site.og.image %}"/>
    <meta property="og:description" content="{% t site.og.description %}"/>
    <meta property="og:url" content="{{ site.baseurl }}{% if page.url %}{{ page.url }}{% endif %}"/>
    <meta property="og:locale" content="{% t site.og.locale %}"/>
</head>
<body>
    ...
</body>
</html>

Then we simply add the desired strings to our language YAML files:

_i18n/de.yml

site:
  ...
  og:
    title: Meine Beispiel-Webseite
    image: "http://www.example.com/assets/img/cool_1200x1200_image.png"
    description: Diese Beispiel-Webseite wird Sie begeistern!
    locale: de_DE

...

_i18n/en.yml

site:
  ...
  og:
    title: My Example Website
    image: "http://www.example.com/assets/img/cool_1200x1200_image.png"
    description: This example website will amaze you!
    locale: en_US

...

Note:

  • Again, for a real website you should move the meta stuff into a header.html include file.
  • After you deploy your site, you can check whether Facebook picks up the correct info with the Sharing Debugger.
  • Obviously you can refine the inclusion of og: tags in your page and post templates, enabling you to include specific images and descriptions for individual posts and pages via front matter.

A Small Flaw

Great - so now everything works, right? Almost. Unfortunately, there is a small blemish with GitHub Pages that cannot be fixed. What happens if people share your main URL (www.example.com)? Facebook has specified a redirect mechanism for this.

The basic idea is this: The main page (index.html) contains the og: tags for the default language (English in our case). However, it can also contain one or several og:locale:alternate tags to indicate that this page is available in other languages. So we would have

<meta property="og:locale:alternate" content="de_DE" />

in our index.html. The Facebook crawler would (on the first visit to the site) fetch the default version (en_US), and then refetch the page with the German locale (de_DE) to get the version preferred by users with a German browser, which would contain the German og: tags. To communicate that it is looking for the German version of the main page, the crawler sends the X-Facebook-Locale header and also attaches the URL parameter fb_locale to the request URL.
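For a site with several languages, the set of locale tags on the default page follows a simple pattern: one og:locale for the default and one og:locale:alternate for every other locale. A small sketch of that pattern, using a hypothetical localeTags helper that is not part of Jekyll or the plugin:

```javascript
// Hypothetical helper: emit og:locale for the default locale and one
// og:locale:alternate tag for each remaining locale.
function localeTags(defaultLocale, allLocales) {
  const tags = [`<meta property="og:locale" content="${defaultLocale}" />`];
  for (const locale of allLocales) {
    if (locale !== defaultLocale) {
      tags.push(`<meta property="og:locale:alternate" content="${locale}" />`);
    }
  }
  return tags.join('\n');
}

console.log(localeTags('en_US', ['en_US', 'de_DE']));
```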

And this is where we are out of luck. Since GitHub Pages only serves flat files and we have no way to configure or react to parameters in the request header, we miss this signal. JavaScript would possibly let us extract the URL parameter, but at that point it’s too late - the meta data is served to the crawler before any piece of JavaScript has even been loaded. I wrote to GitHub support about this and they confirmed that there is no workaround for this problem at the moment:

“That is correct. GitHub Pages is not intended to be a fully configurable host, but I can definitely pass this along as a feature request to the team. We don’t share our roadmap publicly, so I can’t say if or when it will be implemented, but my guess is that it won’t be anytime soon.”

So the best fallback at the moment is to put your default language og: tags verbatim into base.html. While this is definitely a drawback (German users will see the English description and title when sharing your site), I don’t think it’s a show stopper in practice. Whenever users share specific content (residing in one of the language subfolders) they see the correct info.
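For completeness, this is roughly what reading the fb_locale URL parameter mentioned above would look like. The function name fbLocaleFromUrl is made up for this sketch. Keep in mind that it only illustrates the limitation: code like this runs in a browser after the static meta tags have already been delivered, so it cannot change what the crawler sees.

```javascript
// Hypothetical helper: read the fb_locale query parameter from a URL.
// Returns null when the parameter is absent.
function fbLocaleFromUrl(url) {
  return new URL(url).searchParams.get('fb_locale');
}

console.log(fbLocaleFromUrl('http://www.example.com/?fb_locale=de_DE'));
console.log(fbLocaleFromUrl('http://www.example.com/'));
```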

Closing Thoughts

Creating a properly internationalized website is a lot of work, regardless of which toolbox you use. Jekyll and GitHub Pages give you pretty much everything you need. If your requirements are similar to mine (every language in its subfolder, translated permalinks, custom collections etc.) you may find the modified plugin useful. Additionally you may benefit from some of the tips and tricks explained above. Last but not least, I would like to say a big “thank you” to Anthony Gaudino and all contributors for creating the original plugin, which is fantastic and made my life a whole lot easier.