Adding `lang="lang"` syntax

Would like to have a way to define the language used in a markdown file. Especially if the page has multiple languages.

The Government of Canada is working a lot on GitHub these days, but we have to supply all our documentation in both English and French. This is usually done by adding content in both languages to a single .md file.

See our gc-da11yn.github.io/README.md for an example.

We also have to make all content accessible to all users, but when someone uses a screen reader on our Markdown pages, the French text is not read in the proper language.

I guess we could put a <div lang="fr"> around the text but it would be nice to have a way to do it in markdown.

I’ve seen a lot of talk about adding other attributes and the idea that markdown is a simplified language but I think adding the lang to blocks of content would be an important step in making sure markdown text is accessible to all users.

I’ve seen discussions on Explicit RTL indication in pure Markdown, as well as on Abbreviations (and acronyms).

I’ve also started a discussion on the GitHub Flavored Markdown repo, Multi language page in GFM #243.

I’d really like to see something like this added to your awesome project.

Thanks,

Shawn Thompson, WAS
Web Accessibility Technical Advisor

When content is presented in multiple languages, each language is usually on its own page. For organizations that work this way, there is an easy solution: a file name or directory structure scheme that uses language codes, for example README.fr.md and README.zh.md, or docs/en/ and docs/sw/. Another solution is to place the language code in a metadata block at the top of each file.
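
As a rough illustration of the file name and directory schemes, here is a minimal Python sketch that guesses a page's language from its path. The function name `lang_from_path` and the `KNOWN_LANGS` set are hypothetical, made up for this example; a real site generator will have its own rules.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical set of language codes the site publishes; adjust as needed.
KNOWN_LANGS = {"en", "fr", "sw"}


def lang_from_path(path: str) -> Optional[str]:
    """Guess a page's language from its file name or directory.

    First looks for a code just before the extension (README.fr.md),
    then for a directory named after a code (docs/en/guide.md).
    Returns None when neither scheme matches.
    """
    p = PurePosixPath(path)
    suffixes = p.suffixes  # e.g. [".fr", ".md"] for README.fr.md
    if len(suffixes) >= 2 and suffixes[-2][1:] in KNOWN_LANGS:
        return suffixes[-2][1:]
    for part in p.parts[:-1]:  # directories only, skip the file name
        if part in KNOWN_LANGS:
            return part
    return None
```

A template could then emit the returned code as the page's `lang` attribute, falling back to a site default when the result is `None`.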

Static site generators that support Markdown and multiple languages usually use one or more of these methods. For example, the static site generator Hugo supports both filename and directory approaches, and this Gatsby plugin uses filenames.

Multiple languages typically appear on the same page only in a language selector or on the home page. The language selector is better if it is language-independent (i.e. using symbols, 🇯🇵), but if words are used (e.g. for screen readers) they’d be placed in the common page template and/or JavaScript. The home page is a unique, special page, so you can just do it in HTML (or HTML embedded in Markdown) – there might be other reasons the home page is too complex for Markdown anyway.

If you have a site design where multiple languages are presented on the same page, it is easy enough to have a common template page (e.g. written in HTML) into which the renderings of individual Markdown files for each language are inserted.
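
A minimal sketch of that template idea, assuming each language's Markdown file has already been rendered to an HTML fragment by whatever tool you use. The function `combine_renderings` is a hypothetical name for illustration, not part of any library.

```python
from html import escape
from typing import Dict


def combine_renderings(fragments: Dict[str, str]) -> str:
    """Wrap each pre-rendered HTML fragment in a section tagged with its language.

    `fragments` maps a language code to the HTML produced from that
    language's Markdown file. The fragments are inserted unchanged.
    """
    sections = []
    for lang, html_fragment in fragments.items():
        # Escape the code so it is safe inside the attribute value.
        safe_lang = escape(lang, quote=True)
        sections.append(f'<section lang="{safe_lang}">\n{html_fragment}\n</section>')
    return "\n".join(sections)
```

The result would be dropped into the body of the common page template, so each block carries its own `lang` attribute regardless of the page-level language.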

Thanks for your comment @vas. All valid ideas. I’d rather tackle this from the source.

This came up because GitHub renders the Markdown in repositories on its site.

Even if you have a README.fr.md, a screen reader would read the French text with English pronunciation, because the GitHub page itself is declared as English with <html lang="en"> in its source.

I raised the issue on the GitHub Flavored Markdown repo (see the link above), and someone suggested raising it here.

In the Web Content Accessibility Guidelines there is Understanding Success Criterion 3.1.2: Language of Parts which talks about switching languages inside of web content and that it should be properly marked up.

If there were a way to declare the language within Markdown, it would solve the GitHub issue, and it would also cover multilingual Markdown pages.

@shawnthompson, now I think I understand what it is you are trying to do (see end of this comment), and why my answer is not useful.

In any case, I should have prefaced my answer with this:

While embedding language metadata in content is obviously useful for a number of reasons, I think it is highly unlikely any new syntax will be added to CommonMark, and syntax for metadata even less so:

  • Markdown is designed specifically for readability by human eyes. I don’t mean that in some bigoted way (i.e. for sighted people only). I mean humans as opposed to machines. For the latter we already have HTML and others. It’s designed so that plain text content can be readable as-is, without having to be published or “rendered”, without the reader having to be trained. Syntax is very simple and sparse, with zero accommodations for metadata, presentation attributes, etc. With one exception:

  • Markdown supports HTML. If you absolutely have to embed metadata, presentation attributes and the like, you simply embed HTML for that. The option you suggest, placing “a <div lang="fr"> around the text” is in fact not a hack, but is the Markdown way. You may think that’s ugly, but introducing any kind of metadata syntax to Markdown is going to be about as ugly.

  • You might be a newcomer, in which case you might be unaware that despite the countless requests on this forum, CommonMark has added no new features to Markdown in its decade of existence besides one: fenced code blocks. I may be forgetting another, but in any case it’s only one or two, and zero since it settled to where it is now between seven and ten years ago. The project’s goal and focus has been to strictly specify the existing Markdown syntax. Notice the age of the other feature requests you mention and the countless others on this forum.

Is there a reason that using <div lang="fr"> is not acceptable for your project’s needs?

This wasn’t clear to me from your original post or your linked GitHub Issue, but your last reply suggests that your goal is for multilingual Markdown content in GitHub repos to be accessible via screen readers when it is consumed directly via GitHub.com’s own web interface. Is this the case?

There are other options for multilingual pages beyond those I previously suggested (which I am happy to give you), but the only one that will work for GitHub-rendered Markdown is using embedded HTML such as <div lang="zh">.
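
For what it's worth, here is a small Python sketch of the embedded-HTML approach: a helper that wraps a block of Markdown in a `<div lang="…">`. The name `wrap_lang` is made up for this example. The blank lines inside the div matter: in CommonMark, a `<div>` line followed by a blank line ends the HTML block, so the text in between is still parsed as Markdown rather than passed through as raw HTML.

```python
def wrap_lang(markdown: str, lang: str) -> str:
    """Wrap a Markdown block in a div that declares its language.

    The blank line after the opening tag and before the closing tag
    lets a CommonMark parser treat the inner text as Markdown.
    `lang` is assumed to be a plain language code such as "fr".
    """
    return f'<div lang="{lang}">\n\n{markdown.strip()}\n\n</div>\n'


if __name__ == "__main__":
    print(wrap_lang("Bonjour **tout le monde**", "fr"))
```

Whether the `lang` attribute survives a particular renderer's HTML sanitizing is something to check on that platform.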

Pandoc’s markdown allows you to use a bracketed span or fenced
div like this:

::: {lang=fr}
Après deux ans de silence et de patience, malgré
mes résolutions, je reprends la plume. Lecteur, suspendez votre
jugement sur les raisons qui m’y forcent : vous n’en pouvez juger
qu’après m’avoir lu.
:::

In French, the word [lapin]{lang=fr} means rabbit.

This isn’t an official commonmark extension (as noted, there are
none). However, pandoc does allow you to specify this extension
with commonmark input:

pandoc -f commonmark+bracketed_spans+fenced_divs
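
If pandoc is not available, the same two notations could in principle be rewritten into embedded HTML before the text reaches a plain CommonMark renderer. The Python sketch below is only an illustration under narrow assumptions: it is not pandoc's implementation, it recognizes only the exact `{lang=xx}` form shown above, it does not handle nested divs or other attributes, and the name `lang_markup_to_html` is hypothetical.

```python
import re

# Matches [text]{lang=xx}; only the bare lang attribute is recognized.
SPAN_RE = re.compile(r"\[([^\]]+)\]\{lang=([A-Za-z-]+)\}")
# Matches an opening fence such as "::: {lang=fr}".
DIV_OPEN_RE = re.compile(r"^:::\s*\{lang=([A-Za-z-]+)\}\s*$")


def lang_markup_to_html(text: str) -> str:
    """Rewrite lang-only fenced divs and bracketed spans as HTML tags."""
    out = []
    in_div = False
    for line in text.splitlines():
        opening = DIV_OPEN_RE.match(line)
        if opening and not in_div:
            # Blank line keeps the div's content parsed as Markdown.
            out.append(f'<div lang="{opening.group(1)}">')
            out.append("")
            in_div = True
        elif line.strip() == ":::" and in_div:
            out.append("")
            out.append("</div>")
            in_div = False
        else:
            out.append(SPAN_RE.sub(r'<span lang="\2">\1</span>', line))
    return "\n".join(out)
```

The output is ordinary Markdown with embedded HTML, so it can be fed to any renderer that passes `div` and `span` through.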
