Explicit RTL indication in pure Markdown

Hey, I went ahead and added the dir attributes to the code blocks in your post, because they’re allowed by the sanitizer here. Now you have concrete examples without requiring people to edit the HTML with their browser.

Note that this is NOT a Markdown solution, it is a pure HTML solution.

Thanks @riking ! much better :slight_smile:

@01walid @obeid Did you come up with an agreement?

I naively started outlining an RTL markdown project, got parts of it translated, and then realized that GitHub doesn’t support RTL. #doh

Does anyone have recommendations on Markdown tools that render RTL?

Also, I see that Dariush Abbasi has a very simple version of RTL markdown.

Also, I’d love @riking @mb21 opinion on this.

Stackedit has built-in config option of RTL direction, but that’s not very convenient unless all your docs are RTL.

I’ve been using <div dir=rtl markdown=1></div> around Hebrew docs.
Many tools support that on conversion to HTML (IIRC I used retext and pandoc).
retext is convenient for editing mixed-direction source (as long as you put opposite-direction parts on separate lines).

In any case conversion to non-HTML is harder. In current landscape I’d try writing pandoc filters that understand <div dir=rtl>.

BTW, github does (currently) support dir=rtl in markdown rendering — but does not support markdown=1:

However, it works fine in github pages processed by jekyll: https://cben.github.io/sandbox/README.html

Okay, I’ve taken a look at the issue of bidirectional text again. For a very accessible introduction, see Unicode Bidirectional Algorithm basics. Quotes below are from Unicode Standard Annex #9: Unicode Bidirectional Algorithm.

If I understand correctly, as of Unicode 6.3 and later, the preferred way to do bidi text is to use two mechanisms only:

  • Implicit Directional Formatting Characters: LRM (LEFT-TO-RIGHT MARK), RLM (RIGHT-TO-LEFT MARK) and ALM (ARABIC LETTER MARK) (ALM behaves the same as RLM, except around numbers). “Their effect on bidirectional ordering is exactly the same as a corresponding strong directional character; the only difference is that they do not appear in the display.”

  • Marking up the direction of ranges of text, using either “Explicit Directional Isolate Formatting Characters” or better yet, when using a markup language like HTML, use the dir attribute or similar. “On web pages, the explicit directional formatting characters […] should be replaced by using the dir attribute and the elements BDI and BDO. This does not apply to the implicit directional formatting characters.” BDI is only used when the directionality is not known, e.g. from user input saved in a db (thus doesn’t apply to markdown), and BDO is used to override the normal bidirectionality rules which seems like an edge-case that can be simulated by wrapping each character in an element with a dir attribute in the worst case (or do you think we’d really need a markdown BDO equivalent?)
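To make the two mechanisms above a bit more concrete, here is a minimal Python sketch that looks up the bidi class of the implicit marks and the isolate characters using the standard `unicodedata` module. The dictionary and function names are my own, and the result depends on the Unicode version bundled with your Python (ALM and the isolates need Unicode 6.3 or later, i.e. roughly Python 3.4+).

```python
import unicodedata

# Implicit directional marks: invisible, but they behave like a strong character.
IMPLICIT_MARKS = {
    "LRM": "\u200e",  # LEFT-TO-RIGHT MARK
    "RLM": "\u200f",  # RIGHT-TO-LEFT MARK
    "ALM": "\u061c",  # ARABIC LETTER MARK
}

# Explicit directional isolate formatting characters.
ISOLATES = {
    "LRI": "\u2066",  # LEFT-TO-RIGHT ISOLATE
    "RLI": "\u2067",  # RIGHT-TO-LEFT ISOLATE
    "FSI": "\u2068",  # FIRST STRONG ISOLATE
    "PDI": "\u2069",  # POP DIRECTIONAL ISOLATE
}


def bidi_classes(chars):
    """Map each name to the bidi class reported by the Unicode database."""
    return {name: unicodedata.bidirectional(ch) for name, ch in chars.items()}


if __name__ == "__main__":
    print(bidi_classes(IMPLICIT_MARKS))
    print(bidi_classes(ISOLATES))
```

The implicit marks report the same classes as ordinary strong letters (L, R, AL), which is what the quote above means by "exactly the same as a corresponding strong directional character".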

Reading through Authoring HTML: Handling Right-to-left Scripts, I tried to reproduce similar ‘funny’ behavior with markdown as well:

[مشس هخصث خهس تخت تخهثز](#العربي)

Note that this is a valid internal link in markdown; in any bidi-aware browser or editor, the ](# is just displayed mirrored (i.e. rtl) since the surrounding text is rtl as well. But come to think about it, this is probably fine and could even be considered a strength of the markdown syntax (e.g. as opposed to HTML/XML). Similarly, markdown numbered lists display naturally for RTL scripts:

1. مشس هخصث خهس تخت تخهثز
2. مشس هخصث خهس تخت تخهثز

So… I think we could get away with “just” supporting the dir attribute, so you would write e.g. The title is [مشس هخصث خهس تخت تخهثز!]{dir=rtl} in Arabic., where the ! would be part of the title. However, this leaves the exclamation mark on the wrong side of the text when editing markdown, although it will be displayed correctly in a browser upon conversion to HTML. To display it correctly while editing markdown as well, you’d have to insert an ALM (or RLM) after the ! (of course, you need a text editor that supports bidi for this to work).
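As a rough illustration of the RLM trick, the following Python sketch builds such a span and appends an RLM after the title. The `[...]{dir=rtl}` syntax is the one proposed in this post, and `rtl_span` is a hypothetical helper name, not something a markdown processor provides.

```python
RLM = "\u200f"  # RIGHT-TO-LEFT MARK, invisible but strongly RTL


def rtl_span(text):
    """Wrap text in the proposed [text]{dir=rtl} span syntax.

    An RLM is appended after the text so that trailing punctuation such as
    "!" sits between two strong RTL characters and is therefore shown on the
    RTL side in a bidi-aware plain-text editor as well.
    """
    return "[" + text + RLM + "]{dir=rtl}"


if __name__ == "__main__":
    print("The title is " + rtl_span("مشس هخصث خهس تخت تخهثز!") + " in Arabic.")
```

The mark is invisible in the output, so the rendered HTML should look the same with or without it; it only matters for how the source line is laid out while editing.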

btw, can someone link to a good up to date resource about bidirectional text in LaTeX and ConTeXt? I found this PDF, but it’s from 2001.

Basically the question is: if a markdown processor like Pandoc were to add native span and div syntax with the dir attribute and pass along unicode LRM, RLM and ALM chars:

  1. this would already suffice for bidirectional HTML output, right?
  2. what LaTeX and ConTeXt code would need to be emitted?

ConTeXt minimal sample:

\definefontfamily [mainface] [rm] [ALM Fixed] [features=arabic]
\setupbodyfont[mainface,12pt]
\setupdirections[bidi=on,method=two]
\starttext
The title is !مشس هخصث خهس تخت تخهثز in Arabic.
\stoptext

@ousia thanks! is there some documentation on this? However, while the exclamation mark is visually on the correct side in your example, logically (order of characters in memory) it’s on the wrong side—the ! is supposed to be at the end of the title, which visually happens to be on the left when read from right to left. What is the equivalent in ConTeXt of <span dir="rtl">?

@mb21, you are welcome.

As far as I know, the command is \righttoleft.

Since it is a command switch, it should be enclosed in braces, such as in:

\definefontfamily [mainface] [rm] [ALM Fixed] [features=arabic]
\setupbodyfont[mainface,12pt]
\starttext
The title is {\righttoleft مشس هخصث خهس تخت تخهثز!} in Arabic.
\stoptext

Thanks, I first had to install the font (tlmgr install almfixed), but now it works. Btw, the global direction can be set with \setupalign[r2l].

@mb21, sorry, I didn’t know that TeX Live didn’t include the font (I use the ConTeXt Suite).

This sample may work with TeX Live without extra font installation:

\definefontfamily [mainface] [rm] [FreeSerif] [features=arabic]
\setupbodyfont[mainface,12pt]
\starttext
The title is {\righttoleft مشس هخصث خهس تخت تخهثز!} in Arabic.
\stoptext

BTW, I would avoid using \setupalign[r2l] unless text orientation is explicitly set in the document’s metadata.

As far as my research goes, using bidi is the best approach for handling RTL in almost any context, including markdown.

For the editor, you just need to add dir="auto" to the textarea tag. The rest should be handled by the rendering engine, which would simply add a dir="auto" attribute to each top-level element while composing the HTML file.

According to the W3C standard, “auto” should only be used as a last resort:

The heuristic used by this state is very crude (it just looks at the first character with a strong directionality, in a manner analogous to the Paragraph Level determination in the bidirectional algorithm). Authors are urged to only use this value as a last resort when the direction of the text is truly unknown and no better server-side heuristic can be applied.
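For reference, the "first strong character" heuristic described in that quote can be approximated in a few lines of Python. This is only a sketch of the idea: `first_strong_direction` is my own name, and it ignores details of the real algorithm such as skipping over isolated runs.

```python
import unicodedata


def first_strong_direction(text):
    """Return "ltr" or "rtl" based on the first strong character, else None."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    # Only neutral or weak characters (digits, punctuation, spaces) were found.
    return None


if __name__ == "__main__":
    for sample in ["hello مرحبا", "مرحبا hello", "123 !"]:
        print(repr(sample), first_strong_direction(sample))
```

Note how a paragraph that is mostly Arabic but happens to start with a Latin word would be classified as LTR, which is the crudeness the standard warns about.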


There is no other server side solution unless you want to add something extra to the syntax of Markdown which is absolutely not necessary.

Of course one shouldn’t use this method if he/she is sure the direction will be RTL or LTR. But when the text is mixed, this is the right approach.

I have implemented this on FluxBB, and without any modification to the database or the BB syntax, all forums using that software now render new and old texts smartly based on the context.

Can you tell me what is wrong with the implementation I propose?

@ahangarha if you have a text that mixes some ltr and some rtl text, auto is sometimes not good enough. See https://www.w3.org/International/articles/inline-bidi-markup/uba-basics#isolation

btw. this is actually implemented in pandoc now… see https://pandoc.org/MANUAL.html#language-variables


Markdown should remain markdown. It should remain simple. It is not supposed to support complex and rare text formatting.

My native language is Persian and I know the problem well. I don’t want to add some extra code to just make my text direction RTL or LTR.

To me, it is very clear that we need to leave the decision on the direction of the paragraph (any block of text) to the browser by adding dir="auto" to the block tag (like <p>). Then, if you need to do anything else for the rare cases, do it after applying this.

Let us be able to use markdown for RTL. We can handle complex cases later or never.

In my experience and understanding, when we are dealing with mixed RTL and LTR text, dir="auto" solves the problem in 99 percent of cases. Those 99 percent are about determining paragraph direction. The remaining 1 percent are rare cases. Don’t stop just to make a decision for this rare 1 percent!

Any extra action would be in addition to dir="auto", as per my understanding.
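A minimal sketch of what that post-processing step could look like, written in Python with a regular expression. The tag list and the `add_dir_auto` name are my own assumptions; it only handles lowercase tags, touches nested blocks as well as top-level ones, and a real renderer would do this while generating the HTML rather than patching it afterwards.

```python
import re

# Block-level tags that should let the browser pick the direction.
# This list is an assumption for illustration; adjust it to your renderer.
BLOCK_TAGS = ("p", "ul", "ol", "blockquote", "pre", "table",
              "h1", "h2", "h3", "h4", "h5", "h6")

_OPEN_TAG = re.compile(r"<(%s)\b([^>]*)>" % "|".join(BLOCK_TAGS))
_HAS_DIR = re.compile(r"(?<![\w-])dir\s*=")


def add_dir_auto(html):
    """Add dir="auto" to block opening tags that have no dir attribute yet."""
    def patch(match):
        tag, attrs = match.group(1), match.group(2)
        if _HAS_DIR.search(attrs):
            return match.group(0)  # respect an explicit dir=rtl / dir=ltr
        return '<%s dir="auto"%s>' % (tag, attrs)

    return _OPEN_TAG.sub(patch, html)


if __name__ == "__main__":
    print(add_dir_auto('<p>مرحبا</p><p dir="ltr">hello</p>'))
```

Leaving explicit dir attributes alone means an author can still override the automatic choice for the rare paragraphs where the heuristic guesses wrong.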

Use this css rule:

unicode-bidi:plaintext;

on the element that contains the rendered markdown.
I’m from Algeria too :upside_down_face:


It seems to work. I have tried it on some examples and the result was amazing.

I still have to try it on other elements such as lists and tables, and on different mixtures of RTL and LTR text.

In cases where the “first strong character” heuristic is wrong, ideally you want an override that will also be understood by whatever plain-text editor you are using to edit the markdown source.
This is quite important for editing, otherwise the wrong order and “jumping punctuation” make it very annoying to go back and change the text!

Overriding with explicit <div dir=rtl> or some other syntax would probably not do that for any editor? (Dual-pane editors will render it in preview pane, but not source pane.)
Overriding by inserting an LRM / RLM character would work with bidi-friendly editors :tada:
The heuristic usually fails when an RTL sentence starts with an LTR word/quotation or vice versa, so overriding by U+2068 FIRST STRONG ISOLATE … U+2069 POP DIRECTIONAL ISOLATE might work too in an editor (support is less common so far). But it might not translate into a browser(?). That’s what <bdi>, nowadays implied by setting dir, is for.

  • Many plain-text editors don’t auto-detect paragraph direction at all :frowning:. E.g. Windows Notepad doesn’t; it lets you use left/right Ctrl+Shift to set the default direction for the whole window, but that’s not stored in the text at all! Can’t help in this case.

  • Gnome’s gedit does auto-detect direction of each line separately. So does retext, based on same GTK widgets. I don’t know how many such editors exist, but heavy RTL users probably use them :wink:

  • For web <textarea> inputs, default used to depend on browser but unicode-bidi: plaintext makes modern browsers (it’s CSS3, https://caniuse.com/#feat=mdn-css_properties_unicode-bidi_plaintext) auto-detect direction per line :heart_eyes:

    It might also help rendered output, requiring less invasive changes than dir=auto(?) See https://github.com/gingko/client/issues/78#issuecomment-418448612 for an example of a program where it helped without deeper modifications. Though I don’t fully understand its behavior.

Given an editor that auto-detects direction per line, it’s very helpful to use a markdown processor that allows a line break in the source WITHOUT a line break in the output. Then you can split mixed sentences/paragraphs into LTR and RTL fragments on separate lines, making editing much easier.
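The line-per-directional-piece trick could even be automated. Below is a rough Python sketch that breaks a paragraph into source lines wherever the direction of the words flips; `split_by_direction` is a hypothetical helper, it works on whitespace-separated words only, and neutral words (numbers, punctuation) simply stay with the preceding run.

```python
import unicodedata


def word_direction(word):
    """Direction of the first strong character in a word, or None."""
    for ch in word:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return None


def split_by_direction(paragraph):
    """Put each run of same-direction words on its own source line."""
    lines, current, current_dir = [], [], None
    for word in paragraph.split():
        direction = word_direction(word)
        # Start a new line only when a strong word flips the direction.
        if direction and current_dir and direction != current_dir:
            lines.append(" ".join(current))
            current = []
        if direction:
            current_dir = direction
        current.append(word)
    if current:
        lines.append(" ".join(current))
    return "\n".join(lines)


if __name__ == "__main__":
    print(split_by_direction("אאא בבב ccc ddd"))
```

With a processor that treats a single newline as a soft break, the rendered paragraph stays in one piece while each source line has one clear direction in the editor.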


To take this Discourse forum for example, it’s pretty sub-optimal for bidi…

  • textarea source editing is all LTR, but styling it unicode-bidi: plaintext in devtools helps!
  • source line breaks force output line breaks. I actually think that’s a right choice for a commenting interface for non-specialist users (this forum being an exception), but for bidi it precludes the line-per-directional-piece hack that makes editing easier (which few users know).
  • output does not auto-detect direction on blocks.
  • inline elements don’t “isolate”, allowing structure to fragment: :broken_heart:
    אאא בבב ccc ddd
    Source logical order here was (using A=א B=ב notation): AAA `BBB ccc` ddd, note how the “directional run” detected by unicode algorithm crosses the element boundary! That was deliberate in the original algorithm, but mostly wrong, and isolates were invented later to fix that.
    • However, note that the same problem exists in the textarea editor (independent of “plaintext”), as it can’t understand that markdown inline constructs indicate structure.
      I believe forcing isolation for at least some elements eg. literals, is the Right Thing, but it does conflict with the “editor should match output” argument I made above…
  • the only good part is that <div dir=auto> works, but really, how many can you insert manually?