Explicit RTL indication in pure Markdown

Hi,

I strongly believe that RTL in Markdown should be explicit and should have its own syntax within the Markdown document itself.

##Why?
The current state of writing non-LTR documents in Markdown is simply awful: there’s no way to say that an element or a whole document is RTL. Hence any RTL document ends up as garbled text, especially if there are Latin words within the RTL text. Some think there’s no problem and that this is abstracted away by Unicode or something, but Unicode alone does NOT tell the renderer which base direction a paragraph should have.

Is this implementation/renderer-specific?

Some folks say: just put dir='auto' in the resulting HTML, let the browser do the work, and you’re done! Others suggest making the parser detect the language being parsed and its direction. But neither covers the majority of cases; it just won’t help.

Example:
First, to understand what I mean, compare the following two <code> blocks, the first without dir="rtl" and the second with it:

أسكن في SidiAmar, Annaba, Algeria.
أسكن في SidiAmar, Annaba, Algeria.

The first one is the current state. Some parsers or language detectors (like the one for Facebook comments) detect the content’s language and decide its direction. But the majority of them count the RTL words and the Latin words, and the higher count wins. In our example above, count(Latin words) > count(Arabic words), so the text is treated as LTR, but it is NOT.
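To make the word-counting behaviour concrete, here is a minimal Python sketch of such a detector. It is only my guess at the general approach, not the code any real site uses; the function names `word_direction` and `majority_direction` are made up for illustration.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}  # strong right-to-left bidi classes


def word_direction(word):
    """Classify a word by its first strongly directional character."""
    for ch in word:
        bidi = unicodedata.bidirectional(ch)
        if bidi in RTL_CLASSES:
            return "rtl"
        if bidi == "L":
            return "ltr"
    return None  # only digits, punctuation, etc.


def majority_direction(text):
    """Hypothetical detector: count RTL and LTR words, the higher count wins."""
    counts = {"rtl": 0, "ltr": 0}
    for word in text.split():
        direction = word_direction(word)
        if direction is not None:
            counts[direction] += 1
    return "rtl" if counts["rtl"] > counts["ltr"] else "ltr"


sample = "أسكن في SidiAmar, Annaba, Algeria."
print(majority_direction(sample))  # two Arabic words lose against three Latin words
```

The sentence is Arabic, but the three Latin place names outvote the two Arabic words, which is exactly the failure described above.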

dir="auto" is a bit smarter and it surely helps, but it is unreliable. Here’s why:

Sidi Amar, Annaba, Algeria، هذا مقر سكني

Inspect the above <code> element. It has dir="auto". Did it help? Surely not: it decides the text is LTR because it starts with Latin words, but the text is intended to be RTL. Now change it to dir="rtl" to see what I mean.
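For anyone who wants to see why dir="auto" picks LTR here: roughly speaking, it looks at the first strongly directional character of the text. The Python sketch below is a simplification of that rule (it ignores details such as nested elements and isolates), and `first_strong_direction` is a name I made up.

```python
import unicodedata


def first_strong_direction(text, default="ltr"):
    """Simplified first-strong rule: the first L, R, or AL character decides."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "ltr"
        if bidi in ("R", "AL"):
            return "rtl"
    return default  # no strong character found


starts_latin = "Sidi Amar, Annaba, Algeria، هذا مقر سكني"
starts_arabic = "أسكن في SidiAmar, Annaba, Algeria."

print(first_strong_direction(starts_latin))
print(first_strong_direction(starts_arabic))
```

The first sentence is classified as LTR only because it happens to open with a Latin place name, while the second one is classified as RTL. The author’s intent is the same in both cases, which is why an explicit marker is needed.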

Conclusion

RTL needs an explicit indication in Markdown regardless of the content language. The examples I gave are very narrow; there are other use cases that really need an explicit RTL indication (think about editors as well: how can one write RTL in Markdown editors?).

If we want Markdown to spread to wikis, documentation, books, etc., this needs to be in the spec due to its importance.

Spec example:

<--rtl-- Foo
Bar

Would result in:

Foo Bar

<p dir="rtl">Foo
Bar</p>
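Just to show how little an implementation would need, here is a toy Python sketch that turns the proposed marker into the HTML above. The `<--rtl--` syntax is only my proposal, and `render_paragraphs` is a hypothetical helper that handles plain paragraphs and nothing else; it is not a real Markdown parser.

```python
import html
import re

RTL_MARKER = "<--rtl--"  # the proposed (hypothetical) paragraph prefix


def render_paragraphs(source):
    """Render blank-line-separated paragraphs, honouring the RTL marker."""
    rendered = []
    for para in re.split(r"\n\s*\n", source.strip()):
        if para.startswith(RTL_MARKER):
            # Drop the marker and the whitespace that follows it.
            body = para[len(RTL_MARKER):].lstrip()
            rendered.append('<p dir="rtl">' + html.escape(body, quote=False) + "</p>")
        else:
            rendered.append("<p>" + html.escape(para, quote=False) + "</p>")
    return "\n".join(rendered)


print(render_paragraphs("<--rtl-- Foo\nBar"))
```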

Thanks.

I guess you could always resort to a Unicode directional character, such as the right-to-left mark (U+200F, HTML entity &rlm;) or the right-to-left embedding (U+202B, &#x202b;), although it’s not exactly nice.

Other than that, we’d need attributes on paragraphs, like:

{dir=rtl}
paragraph

or equally:

paragraph
{dir=rtl}

Edit: just discovered that there’s already a topic on RTL.

Yes, we need a clean solution.

Aside from attributes on paragraphs, we need them on the whole document too (for the case where the whole document is RTL).

The topic you mentioned addresses implementation, but I’m addressing the spec. This needs to be resolved at the spec level first.

Hey, I went ahead and added the dir attributes to the code blocks in your post, because they’re allowed by the sanitizer here. Now you have concrete examples without requiring people to edit the HTML with their browser.

Note that this is NOT a Markdown solution, it is a pure HTML solution.

Thanks @riking! Much better :)

@01walid @obeid Did you come up with an agreement?

I naively started outlining an RTL Markdown project, got parts of it translated, and then realized that GitHub doesn’t support RTL. #doh

Does anyone have recommendations on Markdown tools that render RTL?

Also, I see that Dariush Abbasi has a very simple version of RTL markdown.

Also, I’d love to hear @riking’s and @mb21’s opinions on this.

StackEdit has a built-in config option for RTL direction, but that’s not very convenient unless all your docs are RTL.

I’ve been using <div dir=rtl markdown=1></div> around Hebrew docs.
Many tools support that on conversion to HTML (IIRC I used ReText and pandoc).
ReText is convenient for editing mixed-direction source (as long as you put opposite-direction parts on separate lines).

In any case, conversion to non-HTML formats is harder. In the current landscape I’d try writing pandoc filters that understand <div dir=rtl>.
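A rough idea of what such a filter could look like, as a Python sketch that works directly on pandoc’s JSON AST. It assumes the reader has already turned <div dir=rtl> into a Div node carrying a dir key, and that the LaTeX template loads the bidi package (which provides the RTL environment). The helper names are made up, and the sample AST fragment is hand-built for illustration rather than produced by pandoc.

```python
import json


def div_direction(block):
    """Return the dir attribute of a pandoc Div block, or None."""
    if block.get("t") != "Div":
        return None
    (_ident, _classes, keyvals), _content = block["c"]
    return dict(keyvals).get("dir")


def raw_latex(code):
    """Build a pandoc RawBlock node that is passed through to LaTeX output."""
    return {"t": "RawBlock", "c": ["latex", code]}


def wrap_rtl_divs(blocks):
    """Replace each Div with dir=rtl by its content wrapped in an RTL environment."""
    out = []
    for block in blocks:
        if div_direction(block) == "rtl":
            out.append(raw_latex("\\begin{RTL}"))
            out.extend(wrap_rtl_divs(block["c"][1]))
            out.append(raw_latex("\\end{RTL}"))
        elif block.get("t") == "Div":
            attr, content = block["c"]
            out.append({"t": "Div", "c": [attr, wrap_rtl_divs(content)]})
        else:
            out.append(block)
    return out


# Hand-built AST fragment standing in for a parsed <div dir=rtl> block.
sample_blocks = [
    {"t": "Div", "c": [["", [], [["dir", "rtl"]]],
                       [{"t": "Para", "c": [{"t": "Str", "c": "שלום"}]}]]},
]
print(json.dumps(wrap_rtl_divs(sample_blocks), ensure_ascii=False, indent=1))
```

A real filter would read the whole document from stdin, apply this to its "blocks" list, and write the JSON back out. Only block-level Divs are handled here; inline spans would need the same treatment.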

BTW, GitHub does (currently) support dir=rtl in Markdown rendering, but does not support markdown=1.

However, it works fine in github pages processed by jekyll: https://cben.github.io/sandbox/README.html

Okay, I’ve taken a look at the issue of bidirectional text again. For a very accessible introduction, see Unicode Bidirectional Algorithm basics. Quotes below are from Unicode Standard Annex #9: Unicode Bidirectional Algorithm.

If I understand correctly, as of Unicode 6.3 and later, the preferred way to do bidi text is to use two mechanisms only:

  • Implicit Directional Formatting Characters: LRM (LEFT-TO-RIGHT MARK), RLM (RIGHT-TO-LEFT MARK) and ALM (ARABIC LETTER MARK) (ALM behaves the same as RLM, except around numbers). “Their effect on bidirectional ordering is exactly the same as a corresponding strong directional character; the only difference is that they do not appear in the display.”

  • Marking up the direction of ranges of text, using either “Explicit Directional Isolate Formatting Characters” or better yet, when using a markup language like HTML, use the dir attribute or similar. “On web pages, the explicit directional formatting characters […] should be replaced by using the dir attribute and the elements BDI and BDO. This does not apply to the implicit directional formatting characters.” BDI is only used when the directionality is not known, e.g. from user input saved in a db (thus doesn’t apply to markdown), and BDO is used to override the normal bidirectionality rules which seems like an edge-case that can be simulated by wrapping each character in an element with a dir attribute in the worst case (or do you think we’d really need a markdown BDO equivalent?)
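To illustrate the first mechanism, here is a small Python sketch that looks up the three marks in the Unicode character database and shows how a leading RLM changes what a first-strong check sees. `first_strong` is an illustrative helper, not part of any library.

```python
import unicodedata

# The three implicit directional formatting characters.
MARKS = {
    "LRM": "\u200e",  # LEFT-TO-RIGHT MARK
    "RLM": "\u200f",  # RIGHT-TO-LEFT MARK
    "ALM": "\u061c",  # ARABIC LETTER MARK
}

for abbrev, ch in MARKS.items():
    # Print code point, official name, bidi class, and general category.
    print(abbrev, f"U+{ord(ch):04X}", unicodedata.name(ch),
          unicodedata.bidirectional(ch), unicodedata.category(ch))


def first_strong(text):
    """Return the bidi class of the first strong character (L, R, or AL)."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("L", "R", "AL"):
            return bidi
    return None


text = "Sidi Amar, Annaba"
print(first_strong(text))                 # the Latin letter decides
print(first_strong(MARKS["RLM"] + text))  # the invisible mark decides
```

The marks are invisible format characters, yet each one counts as a strong character of its class, which is what the quote above means by “exactly the same as a corresponding strong directional character”.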

Reading through Authoring HTML: Handling Right-to-left Scripts, I tried to reproduce similar ‘funny’ behavior with markdown as well:

[مشس هخصث خهس تخت تخهثز](#العربي)

Note that this is a valid internal link in Markdown. In any bidi-aware browser or editor, the ](# is just displayed mirrored (i.e. RTL), since the surrounding text is RTL as well. But come to think of it, this is probably fine and could even be considered a strength of the Markdown syntax (e.g. as opposed to HTML/XML). Similarly, Markdown numbered lists display naturally for RTL scripts:

1. مشس هخصث خهس تخت تخهثز
2. مشس هخصث خهس تخت تخهثز

So… I think we could get away with “just” supporting the dir attribute, so you would write e.g. The title is [مشس هخصث خهس تخت تخهثز!]{dir=rtl} in Arabic., where the ! would be part of the title. However, this leaves the exclamation mark on the wrong side of the text when editing Markdown, although it will be displayed correctly in a browser upon conversion to HTML. To display it correctly while editing Markdown as well, you’d have to insert an ALM (or RLM) after the ! (of course, you need a text editor that supports bidi for this to work).

Btw, can someone link to a good, up-to-date resource about bidirectional text in LaTeX and ConTeXt? I found this PDF, but it’s from 2001.

Basically the question is: if a Markdown processor like pandoc were to add native span and div syntax with the dir attribute and pass along the Unicode LRM, RLM and ALM characters:

  1. this would already suffice for bidirectional HTML output, right?
  2. what LaTeX and ConTeXt code would need to be emitted?

ConTeXt minimal sample:

\definefontfamily [mainface] [rm] [ALM Fixed] [features=arabic]
\setupbodyfont[mainface,12pt]
\setupdirections[bidi=on,method=two]
\starttext
The title is !مشس هخصث خهس تخت تخهثز in Arabic.
\stoptext

@ousia thanks! Is there some documentation on this? However, while the exclamation mark is visually on the correct side in your example, logically (order of characters in memory) it’s on the wrong side: the ! is supposed to be at the end of the title, which visually happens to be on the left when read from right to left. What is the equivalent in ConTeXt of <span dir="rtl">?

@mb21, you are welcome.

As far as I know, the command is \righttoleft.

Since it is a command switch, it should be enclosed in braces, such as in:

\definefontfamily [mainface] [rm] [ALM Fixed] [features=arabic]
\setupbodyfont[mainface,12pt]
\starttext
The title is {\righttoleft مشس هخصث خهس تخت تخهثز!} in Arabic.
\stoptext

Thanks, I first had to install the font (tlmgr install almfixed), but now it works. Btw, the global direction can be set with \setupalign[r2l].

@mb21, sorry, I didn’t know that TeX Live didn’t include the font (I use the ConTeXt Suite).

This sample may work with TeX Live without extra font installation:

\definefontfamily [mainface] [rm] [FreeSerif] [features=arabic]
\setupbodyfont[mainface,12pt]
\starttext
The title is {\righttoleft مشس هخصث خهس تخت تخهثز!} in Arabic.
\stoptext

BTW, I would avoid using \setupalign[r2l] unless text orientation is explicitly set in the document’s metadata.