Setting up a Proper Multilingual Site with GitHub Pages and Jekyll
tl;dr Deploying multilingual sites with custom collections on GitHub Pages can be a bit tricky. This post shows you how to make it work.
Jekyll and GitHub Pages are a match made in heaven. The idea of a “flat-file CMS” whose content is compiled into static files that are pushed to a repository and then served as a public website is simply beautiful. Not just that, but it fixes much of what is wrong with normal content management systems. Versioning comes out of the box. There is no need to worry about PHP vulnerabilities, SQL injections etc. because - uh - we’re just serving static files here. If you have ever had to maintain a WordPress installation you know the pain. Performance is a blast because - uh - we’re just serving static files ;-). In fact, when you host your static site (with a correctly configured domain record) on GitHub Pages, everything is backed by their super fast CDN by default!
The only downside of the Jekyll + GitHub workflow is that it has a very steep learning curve for non-technical folks. This is probably why most GitHub Pages sites are either tech project pages or personal tech blogs (like this one). However, the capabilities of GitHub Pages go far beyond simple blogs. In fact, there is a plethora of useful Jekyll plugins that enable you to create rather complex websites.
I18N
When I was working on a bilingual website to be hosted on GitHub Pages, I was thrilled to find out there are at least three serious-looking internationalization plugins. I needed some advanced capabilities like translated permalinks, so I went with the jekyll-multiple-languages-plugin, which looked like the most mature solution at the time. It did take me about 90% of the way, at which point I had to dive into the plugin code and modify some things to fit my needs. In the following I’ll show my complete, working setup in the hope that you may find it (or parts of it) useful.
Requirements
I had the following requirements for the website:
1. Every language (including the default!) resides in its own subfolder (/en/, /de/ etc.).
2. The central index.html redirects to the correct subfolder based on the browser language.
3. Every post and page can have its own permalink (/en/about-us/, /de/ueber-uns/ etc.).
4. Custom collections are translated just like the _posts collection.
The jekyll-multiple-languages-plugin was capable of doing most of this, except for 1. and 4.
The Fix
The modified version of the plugin fixes these issues and can be found here, with usage instructions here.
Basic Setup
A clean Jekyll site in English and German should have the following structure:
mysite/
├ _i18n/
│ ├ de/
│ ├ en/
│ ├ de.yml
│ └ en.yml
├ _includes/
├ _layouts/
├ _plugins/
├ _posts/
├ assets/
├ _config.yml
├ base.html
├ CNAME
└ index.html
Most of these files and folders should be familiar - _includes, _layouts and _posts are just regular Jekyll folders, assets contains all static assets (images, CSS and JavaScript files) and _plugins contains the modified plugin you downloaded. The CNAME file is needed when you deploy your site to GitHub Pages with a custom domain.
Let’s look at the remaining items:
- _i18n is a folder introduced by the jekyll-multiple-languages-plugin. It contains the translations of individual strings in the *.yml files as well as translated pages and collection documents in the respective subfolders.
- index.html is the default page that will be put into each of the language subfolders. Caution: It no longer serves as the central index.html for your site.
- base.html is a file I introduced to fix the absence of a central index.html. The plugin takes this file and makes it the index.html of your generated _site. It can (and should) contain language redirect logic.
A Minimal Site
Let’s work with a small example - an English and German site with a welcome page and a proper redirect.
_config.yml
url: http://www.example.com
languages: ["en", "de"]
exclude_from_localizations: ["assets", "CNAME"]
defaultLang: en
languageNames:
  de: Deutsch
  en: English
...
_i18n/de.yml
site:
  name: Meine Beispiel-Webseite
  tagline: Mit gutem Beispiel voran!
  description: Diese Beispiel-Webseite wird Sie begeistern!
  keywords: Beispiel, Webseite, toll
main:
  welcomeSection:
    anchor: willkommen
    heading: Willkommen auf dieser tollen Webseite!
    p1: Hunderttausende Menschen sind von dieser Seite begeistert. Sie auch?
_i18n/en.yml
site:
  name: My Example Website
  tagline: Setting a good example!
  description: This example website will amaze you!
  keywords: example, website, amazing
main:
  welcomeSection:
    anchor: welcome
    heading: Welcome to this marvelous website.
    p1: Hundreds of thousands of people are amazed by this site. Are you?
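Both language files need to expose the same keys, because missing strings are reported at build time. One way to spot gaps before building is to compare the key paths of two parsed files. The following is only a minimal sketch: keyPaths and missingKeys are hypothetical helpers of my own (not part of the plugin), and the two objects stand in for YAML files that are assumed to have been parsed elsewhere.

```javascript
// Collect every dotted key path ("site.name", "main.welcomeSection.anchor", ...)
// from a nested translation object.
function keyPaths(obj, prefix = "") {
  const paths = [];
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      paths.push(...keyPaths(value, path)); // descend into nested sections
    } else {
      paths.push(path); // leaf: an actual translated string
    }
  }
  return paths;
}

// Return the key paths present in `reference` but absent from `other`.
function missingKeys(reference, other) {
  const present = new Set(keyPaths(other));
  return keyPaths(reference).filter((path) => !present.has(path));
}

// Hypothetical, already-parsed excerpts of en.yml and de.yml.
const en = { site: { name: "My Example Website", tagline: "Setting a good example!" } };
const de = { site: { name: "Meine Beispiel-Webseite" } };

// Lists the keys that the German file still needs.
console.log(missingKeys(en, de));
```

Running the comparison in both directions catches keys that exist in only one of the files.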
index.html
<!DOCTYPE html>
<html lang="{{ site.lang }}">
  <head>
    <meta charset="utf-8">
    <meta name="description" content="{% t site.description %}"/>
    <meta name="keywords" content="{% t site.keywords %}"/>
    <title>{% t site.name %} - {% t site.tagline %}</title>
  </head>
  <body>
    <section id="{% t main.welcomeSection.anchor %}">
      <div>
        <h2>{% t main.welcomeSection.heading %}</h2>
        <p>{% t main.welcomeSection.p1 %}</p>
      </div>
    </section>
  </body>
</html>
base.html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Amazing Example Website</title>
  </head>
  <body>
    <script>
      var lang = navigator.language || navigator.userLanguage;
      if (lang.indexOf('de') == 0)
        window.location = '/de/';
      else
        window.location = '/en/';
    </script>
  </body>
</html>
A couple things are worth noting:
- Usually the contents of index.html would be modularized into layouts and includes. For simplicity everything is in one file here.
- Our index.html uses translated strings in both the header (meta data) and the body (content of the welcome page). You can use an arbitrary structure in your YAML files, just make sure it’s the same structure for every language file, otherwise Jekyll will complain about missing strings at compile time.
- We can translate everything down to anchor IDs, so in the above example you can link to www.example.com/de/#willkommen as well as www.example.com/en/#welcome.
- Our base.html is excluded from the plugin translation process on purpose, so we have to put in any meta data (title, description etc.) verbatim, in the default language.
- The redirect in base.html uses JavaScript. A better solution would be to do a server-side redirect based on the request’s Accept-Language header. Unfortunately, it’s not possible to configure server redirects with GitHub Pages, so JavaScript is the best we can do.
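If you support more than two languages, the redirect logic is easier to maintain as a small function. The following is a sketch rather than the code I deployed: pickLanguage is a hypothetical helper, and it assumes the browser exposes navigator.languages (with navigator.language as a fallback). It walks the visitor's preferences in order and falls back to the default language if none is supported.

```javascript
// Pick the first supported site language from an ordered list of browser
// preferences such as ["de-AT", "en-US"]; fall back to the default language.
function pickLanguage(preferred, supported, fallback) {
  for (const tag of preferred) {
    if (typeof tag !== "string") continue; // skip missing values
    const primary = tag.toLowerCase().split("-")[0]; // "de-AT" -> "de"
    if (supported.includes(primary)) return primary;
  }
  return fallback;
}

// Only redirect when actually running in a browser.
if (typeof window !== "undefined" && typeof navigator !== "undefined") {
  const preferred = navigator.languages && navigator.languages.length
    ? navigator.languages
    : [navigator.language || navigator.userLanguage];
  // replace() keeps the redirect page out of the browser history.
  window.location.replace("/" + pickLanguage(preferred, ["en", "de"], "en") + "/");
}
```

The supported list and the fallback here mirror languages and defaultLang from the example _config.yml.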
Adding a Page
Now let’s add an “About us” page in English and German.
mysite/
├ _i18n/
│ ├ de/
│ │ └ about.md
│ ├ en/
│ │ └ about.md
│ └ ...
├ about.md
└ ...
about.md
---
layout: page
title: pages.about
namespace: about
permalink: /about-us/
permalink_de: /ueber-uns/
---
{% tf about.md %}
_i18n/de/about.md
Wir sind ein wundervolles Team verrückter Visionäre.
_i18n/en/about.md
We are a wonderful team of crazy visionaries.
Note:
- The about page is only defined once (in the root directory) and filled with content of the respective language via the {% tf ... %} (translate file) tag on each translation pass.
- The front-matter title has to be defined in the language-specific *.yml files. This may seem confusing but actually makes sense given that it’s a Liquid variable that may be used in templates.
- The namespace is optional. Actually, the term ‘namespace’ is a bit misleading. It was introduced by the original plugin, so I didn’t want to override it. You can use it to insert translated links with the {% tl ... %} (translate link) tag.
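To make the permalink rules concrete, here is a small model of how a page's final URL comes together. This is an illustration only, not the plugin's code (the plugin is written in Ruby): localizedUrl is a hypothetical function, and it assumes that a language-specific permalink_<lang> wins over the plain permalink and that every language lives in its own subfolder.

```javascript
// Illustrative model only: prefer a language-specific `permalink_<lang>`
// entry and fall back to the plain `permalink` otherwise.
function localizedUrl(frontMatter, lang) {
  const permalink = frontMatter["permalink_" + lang] || frontMatter.permalink;
  return "/" + lang + permalink; // every language lives in its own subfolder
}

// Front matter of about.md from the example above.
const about = {
  namespace: "about",
  permalink: "/about-us/",
  permalink_de: "/ueber-uns/",
};

console.log(localizedUrl(about, "en")); // English URL, built from `permalink`
console.log(localizedUrl(about, "de")); // German URL, built from `permalink_de`
```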
Adding a Language Switcher
Let’s add the possibility to switch the language at the bottom of our index.html:
<!DOCTYPE html>
<html lang="{{ site.lang }}">
  <head>...</head>
  <body>
    ...
    <footer>
      <a class="active" href="#">{{ site.languageNames[site.lang] }}</a>
      {% for lang in site.languageNames %}
        {% if lang[0] == site.lang %} {% continue %} {% endif %}
        {% if page.namespace %}
          <a href="{% tl {{ page.namespace }} {{ lang[0] }} %}">{{ lang[1] }}</a>
        {% else %}
          <a href="{{ site.baseurl_root }}/{{ lang[0] }}/">{{ lang[1] }}</a>
        {% endif %}
      {% endfor %}
    </footer>
  </body>
</html>
Note:
- The switcher code works on any page, not just index.html. For a real website you should put it in a footer.html include file and reference that in your layout.
- The currently active language is written out first, with a CSS class .active for highlighting.
- The other languages are enumerated as links, in the order in which they are defined in _config.yml.
- The links point to the translated version of the current page if it exists or the language’s index.html otherwise.
- On a real website you probably want to make the switcher a bit prettier. If you’re using Bootstrap I recommend using a dropup menu.
Adding Custom Collections
Now let’s add an FAQ section to our website. Because the collection of questions might grow over time, we don’t want to maintain a long list of questions in one file for each language (these might be hard to keep in sync). Rather we would like to have an actual collection of question-and-answer documents. These could then be displayed in one file and/or separately on subpages with fully qualified URLs (which might be good for SEO).
First, we must add the custom collection to our _config.yml, otherwise no output will be generated:
...
collections:
  faq:
    output: true
...
Then we create folders for our FAQ collection:
mysite/
├ _faq/
├ _i18n/
│ ├ de/
│ │ ├ _faq/
│ │ └ ...
│ ├ en/
│ │ ├ _faq/
│ │ └ ...
│ └ ...
└ ...
As always, the root _faq folder will contain the actual documents (with front matter) and the language subfolders will contain the translations that are filled in.
Let’s add a question to the collection:
_faq/why-is-this-site-so-awesome.md
---
layout: faq-entry
title: faq.why-is-this-site-so-awesome
namespace: faq.why-is-this-site-so-awesome
permalink: /faq/why-is-this-site-so-awesome
permalink_de: /faq/warum-is-diese-seite-so-toll
---
{% tf _faq/why-is-this-site-so-awesome.md %}
_i18n/de/_faq/why-is-this-site-so-awesome.md
Das ist schwierig zu erklären. Viele Leute finden unsere Seite einfach großartig.
_i18n/en/_faq/why-is-this-site-so-awesome.md
That is hard to explain. Many people just think our website is great.
Having such detailed front matter for each FAQ entry may seem like a lot of boilerplate. However, remember that it allows you to tweak every aspect of how the entry appears, including its permalink, for every language. I find that while the initial setup is a bit tedious, adding new content over time is much more convenient with a custom collection. Other use cases for custom collections are listing the team members of a company or recommending literature in a reading list.
By default, the FAQ entries will be output to separate documents using the specified template and permalinks. To top it off, let’s add a dedicated FAQ page:
faq.md
---
layout: page
title: pages.faq
namespace: faq
permalink: /frequent-questions/
permalink_de: /haeufige-fragen/
---
{% for question in site.faq %}
<div>
<h4>{% t {{ question.title }} %}</h4>
{{ question.content }}
</div>
{% endfor %}
Voilà. Now you can access your FAQ both as a list at www.example.com/de/haeufige-fragen/ and www.example.com/en/frequent-questions/ as well as the individual entries at their own permalinks, e.g. www.example.com/de/faq/warum-is-diese-seite-so-toll/ and www.example.com/en/faq/why-is-this-site-so-awesome/.
Note that we didn’t need to create a faq.md file in the language subfolders, because the root file simply outputs translated content in a loop.
Social Sharing with OpenGraph Tags
Our website is almost ready to be published. But what if people actually realize how awesome it is and want to share it on Facebook, LinkedIn, WhatsApp etc.? As you probably know, it’s possible to control the appearance of “sharing previews” of a website by means of Open Graph meta tags.
For any content that resides in the language subfolders, this is easy - we just add a few lines to the header:
index.html
<!DOCTYPE html>
<html lang="{{ site.lang }}">
  <head>
    <meta charset="utf-8">
    <meta name="description" content="{% t site.description %}"/>
    <meta name="keywords" content="{% t site.keywords %}"/>
    <title>{% t site.name %} - {% t site.tagline %}</title>
    <meta property="og:title" content="{% t site.og.title %}"/>
    <meta property="og:image" content="{% t site.og.image %}"/>
    <meta property="og:description" content="{% t site.og.description %}"/>
    <meta property="og:url" content="{{ site.baseurl }}{% if page.url %}{{ page.url }}{% endif %}"/>
    <meta property="og:locale" content="{% t site.og.locale %}"/>
  </head>
  <body>
    ...
  </body>
</html>
Then we simply add the desired strings to our language YAML files:
_i18n/de.yml
site:
  ...
  og:
    title: Meine Beispiel-Webseite
    image: "http://www.example.com/assets/img/cool_1200x1200_image.png"
    description: Diese Beispiel-Webseite wird Sie begeistern!
    locale: de_DE
...
_i18n/en.yml
site:
  ...
  og:
    title: My Example Website
    image: "http://www.example.com/assets/img/cool_1200x1200_image.png"
    description: This example website will amaze you!
    locale: en_US
...
Note:
- Again, for a real website you should move the meta stuff into a header.html include file.
- After you deploy your site, you can check whether Facebook picks up the correct info with the Sharing Debugger.
- Obviously you can refine the inclusion of og: tags in your page and post templates, enabling you to include specific images and descriptions for individual posts and pages via front matter.
A Small Flaw
Great - so now everything works, right? Almost. Unfortunately, there is a small blemish with GitHub Pages that cannot be fixed. What happens if people share your main URL (www.example.com)? Facebook has specified a redirect mechanism for this.
The basic idea is this: The main page (index.html) contains the og: tags for the default language (English in our case). However, it can also contain one or several og:locale:alternate tags to indicate that this page is available in other languages. So we would have
<meta property="og:locale:alternate" content="de_DE" />
in our index.html. The Facebook crawler would (on the first visit to the site) fetch the default version (en_US), and then refetch the page with the German locale (de_DE), to get the version preferred by users with a German browser, which would contain the German og: tags. To communicate that it is looking for the German version of the main page, the crawler sends the X-Facebook-Locale header and it also attaches the URL parameter fb_locale to the request URL.
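On a host that lets you run server-side code, honoring that signal could look roughly like the sketch below. Everything in it is hypothetical: resolveOgLocale is a made-up helper, headers is assumed to be a plain object with lower-cased header names, and query is assumed to be a URLSearchParams instance built from the request URL.

```javascript
// Hypothetical server-side helper; GitHub Pages cannot run anything like this.
// Decide which set of og: tags to render for a crawler request.
function resolveOgLocale(headers, query, supported, fallback) {
  // The crawler announces the wanted locale via header and URL parameter.
  const requested = headers["x-facebook-locale"] || query.get("fb_locale");
  return supported.includes(requested) ? requested : fallback;
}

const supported = ["en_US", "de_DE"];

// A refetch for the German version, signalled through the URL parameter.
const query = new URLSearchParams("fb_locale=de_DE");
console.log(resolveOgLocale({}, query, supported, "en_US"));

// A plain first visit without any locale hint gets the default language.
console.log(resolveOgLocale({}, new URLSearchParams(""), supported, "en_US"));
```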
And this is where we are out of luck. Since GitHub Pages only serves flat files and we have no way to configure the server or react to request headers and URL parameters, we miss this signal. JavaScript would possibly let us extract the URL parameter, but at that point it’s too late - the meta data is served to the crawler before any piece of JavaScript has even been loaded. I wrote to GitHub support about this and they confirmed that there is no workaround for this problem at the moment:
“That is correct. GitHub Pages is not intended to be a fully configurable host, but I can definitely pass this along as a feature request to the team. We don’t share our roadmap publicly, so I can’t say if or when it will be implemented, but my guess is that it won’t be anytime soon.”
So the best fallback at the moment is to put your default language og: tags verbatim into base.html. While this is definitely a drawback (German users will see the English description and title when sharing your site), I don’t think it’s a show stopper in practice. Whenever users share specific content (residing in one of the language subfolders) they see the correct info.
Closing Thoughts
Creating a properly internationalized website is a lot of work, regardless of which toolbox you use. Jekyll and GitHub Pages give you pretty much everything you need. If your requirements are similar to mine (every language in its subfolder, translated permalinks, custom collections etc.) you may find the modified plugin useful. Additionally you may benefit from some of the tips and tricks explained above. Last but not least, I would like to say a big “thank you” to Anthony Gaudino and all contributors for creating the original plugin, which is fantastic and made my life a whole lot easier.