Explicit RTL indication in pure Markdown

Yes, we need a clean solution.

Aside from attributes on paragraphs, we need them on the whole document too (for the case where the whole document is RTL).

The topic you mentioned addresses implementation, but I’m addressing the spec. This needs to be resolved at the spec level first.

Hey, I went ahead and added the dir attributes to the code blocks in your post, because they’re allowed by the sanitizer here. Now you have concrete examples without requiring people to edit the HTML with their browser.

Note that this is NOT a Markdown solution, it is a pure HTML solution.

Thanks @riking! Much better :slight_smile:

@01walid @obeid Did you come to an agreement?

I naively started outlining an RTL markdown project, got parts of it translated, and then realized that GitHub doesn’t support RTL. #doh

Does anyone have recommendations on Markdown tools that render RTL?

Also, I see that Dariush Abbasi has a very simple version of RTL markdown.

Also, I’d love @riking’s and @mb21’s opinions on this.

StackEdit has a built-in config option for RTL direction, but that’s not very convenient unless all your docs are RTL.

I’ve been using <div dir=rtl markdown=1></div> around Hebrew docs.
Many tools support that on conversion to HTML (IIRC I used retext and pandoc).
retext is convenient for editing mixed-direction source (as long as you put opposite-direction parts on separate lines).

In any case, conversion to non-HTML formats is harder. In the current landscape I’d try writing pandoc filters that understand <div dir=rtl>.

BTW, GitHub does (currently) support dir=rtl in Markdown rendering, but does not support markdown=1.

However, it works fine in GitHub Pages processed by Jekyll: https://cben.github.io/sandbox/README.html

Okay, I’ve taken a look at the issue of bidirectional text again. For a very accessible introduction, see Unicode Bidirectional Algorithm basics. Quotes below are from Unicode Standard Annex #9: Unicode Bidirectional Algorithm.

If I understand correctly, as of Unicode 6.3 and later, the preferred way to do bidi text is to use two mechanisms only:

  • Implicit Directional Formatting Characters: LRM (LEFT-TO-RIGHT MARK), RLM (RIGHT-TO-LEFT MARK) and ALM (ARABIC LETTER MARK) (ALM behaves the same as RLM, except around numbers). “Their effect on bidirectional ordering is exactly the same as a corresponding strong directional character; the only difference is that they do not appear in the display.”

  • Marking up the direction of ranges of text, using either “Explicit Directional Isolate Formatting Characters” or, better yet when using a markup language like HTML, the dir attribute or similar. “On web pages, the explicit directional formatting characters […] should be replaced by using the dir attribute and the elements BDI and BDO. This does not apply to the implicit directional formatting characters.” BDI is only used when the directionality is not known, e.g. for user input saved in a database (so it doesn’t apply to Markdown). BDO is used to override the normal bidirectionality rules, which seems like an edge case that, in the worst case, can be simulated by wrapping each character in an element with a dir attribute (or do you think we’d really need a Markdown BDO equivalent?).
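To make the first mechanism concrete, here is a small Python sketch that looks up the bidi class of the three implicit marks using the standard library's unicodedata module. The helper name bidi_class is just for illustration; the point is only that each mark reports the same class as a strong character of its direction.

```python
import unicodedata

# The three implicit directional formatting characters named above.
LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK
ALM = "\u061c"  # ARABIC LETTER MARK

def bidi_class(ch):
    """Return the Unicode bidirectional class of a single character."""
    return unicodedata.bidirectional(ch)

for mark in (LRM, RLM, ALM):
    # Each mark is invisible but carries a strong class: L, R or AL.
    print(f"U+{ord(mark):04X} {unicodedata.name(mark)} -> {bidi_class(mark)}")
```

A Markdown processor would not need to do anything with these characters beyond passing them through unchanged.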

Reading through Authoring HTML: Handling Right-to-left Scripts, I tried to reproduce similar ‘funny’ behavior with markdown as well:

[مشس هخصث خهس تخت تخهثز](#العربي)

Note that this is a valid internal link in Markdown; in any bidi-aware browser or editor, the ](# is just displayed mirrored (i.e. RTL) since the surrounding text is RTL as well. But come to think of it, this is probably fine and could even be considered a strength of the Markdown syntax (e.g. as opposed to HTML/XML). Similarly, Markdown numbered lists display naturally for RTL scripts:

1. مشس هخصث خهس تخت تخهثز
2. مشس هخصث خهس تخت تخهثز

So… I think we could get away with “just” supporting the dir attribute, so you would write e.g. The title is [مشس هخصث خهس تخت تخهثز!]{dir=rtl} in Arabic., where the ! would be part of the title. However, this leaves the exclamation mark on the wrong side of the text when editing Markdown, although it will be displayed correctly in a browser upon conversion to HTML. To display it correctly while editing Markdown as well, you’d have to insert an ALM (or RLM) after the ! (of course, you need a text editor that supports bidi for this to work).

Btw, can someone link to a good, up-to-date resource about bidirectional text in LaTeX and ConTeXt? I found this PDF, but it’s from 2001.
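A rough Python sketch of that idea, purely as an illustration: the [text]{dir=rtl} syntax is the hypothetical one proposed in this post, and the regex and the function names render_dir_spans and add_trailing_rlm are my own assumptions, not any real Markdown parser's behaviour. It only handles the simple non-nested case.

```python
import html
import re

# Hypothetical inline syntax from this post: [text]{dir=rtl} or [text]{dir=ltr}.
SPAN_WITH_DIR = re.compile(r"\[([^\]]+)\]\{dir=(rtl|ltr)\}")

RLM = "\u200f"  # RIGHT-TO-LEFT MARK, invisible when displayed

def render_dir_spans(markdown_line):
    """Replace each [text]{dir=...} with an HTML span carrying a dir attribute."""
    def to_span(match):
        text, direction = match.group(1), match.group(2)
        return f'<span dir="{direction}">{html.escape(text)}</span>'
    return SPAN_WITH_DIR.sub(to_span, markdown_line)

def add_trailing_rlm(title):
    """Append an RLM after trailing punctuation so a bidi-aware editor
    keeps that punctuation on the RTL side while editing the source."""
    if title.endswith(("!", "?", ".")):
        return title + RLM
    return title

print(render_dir_spans("The title is [abc!]{dir=rtl} in Arabic."))
```

The RLM is passed through untouched by render_dir_spans, so it is harmless in the HTML output and only matters for how the source looks in an editor.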

Basically the question is: if a Markdown processor like Pandoc were to add native span and div syntax with the dir attribute and pass along the Unicode LRM, RLM and ALM characters:

  1. this would already suffice for bidirectional HTML output, right?
  2. what LaTeX and ConTeXt code would need to be emitted?

ConTeXt minimal sample:

\definefontfamily [mainface] [rm] [ALM Fixed] [features=arabic]
\setupbodyfont[mainface,12pt]
\setupdirections[bidi=on,method=two]
\starttext
The title is !مشس هخصث خهس تخت تخهثز in Arabic.
\stoptext

@ousia thanks! Is there some documentation on this? However, while the exclamation mark is visually on the correct side in your example, logically (in the order of characters in memory) it’s on the wrong side: the ! is supposed to be at the end of the title, which visually happens to be on the left when read from right to left. What is the equivalent in ConTeXt of <span dir="rtl">?

@mb21, you are welcome.

As far as I know, the command is \righttoleft.

Since it is a command switch, it should be enclosed in braces, such as in:

\definefontfamily [mainface] [rm] [ALM Fixed] [features=arabic]
\setupbodyfont[mainface,12pt]
\starttext
The title is {\righttoleft مشس هخصث خهس تخت تخهثز!} in Arabic.
\stoptext

Thanks, I first had to install the font (tlmgr install almfixed), but now it works. Btw, the global direction can be set with \setupalign[r2l].

@mb21, sorry, I didn’t know that TeX Live didn’t include the font (I use the ConTeXt Suite).

This sample may work with TeX Live without extra font installation:

\definefontfamily [mainface] [rm] [FreeSerif] [features=arabic]
\setupbodyfont[mainface,12pt]
\starttext
The title is {\righttoleft مشس هخصث خهس تخت تخهثز!} in Arabic.
\stoptext

BTW, I would avoid using \setupalign[r2l] unless text orientation is explicitly set in the document’s metadata.

As far as my research goes, using bidi is the best approach for handling RTL in almost any context, including Markdown.

For the editor, you just need to add dir="auto" to the textarea tag. The rest should be handled by the rendering engine, which would simply add a dir="auto" attribute to each top-level element while composing the HTML file.

According to the W3C standard, “auto” should only be used as a last resort:

The heuristic used by this state is very crude (it just looks at the first character with a strong directionality, in a manner analogous to the Paragraph Level determination in the bidirectional algorithm). Authors are urged to only use this value as a last resort when the direction of the text is truly unknown and no better server-side heuristic can be applied.
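To see why the quoted text calls the heuristic crude, here is a minimal Python sketch of a first-strong-character check. It is only an approximation under my own assumptions: the function name first_strong_direction is made up, and it ignores details of the real HTML algorithm such as skipping isolated or bdi content.

```python
import unicodedata

def first_strong_direction(text, default="ltr"):
    """Guess direction from the first character with strong directionality."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "ltr"
        if bidi in ("R", "AL"):
            return "rtl"
    # Only digits, punctuation or whitespace: nothing strong was found.
    return default

print(first_strong_direction("مرحبا world"))       # rtl
print(first_strong_direction("HTML هي لغة ترميز"))  # ltr
```

The second call shows the weak spot: a paragraph that is mostly Arabic but happens to start with a Latin word gets classified as LTR.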

There is no other server-side solution unless you want to add something extra to the syntax of Markdown, which is absolutely not necessary.

Of course, one shouldn’t use this method when sure that the direction will be RTL or LTR. But when the text is mixed, this is the right approach.

I have implemented this on FluxBB, and without any modification to the database or the BB syntax, all forums using that software now render new and old texts smartly based on the context.

Could you tell me what is wrong with the implementation I propose?

@ahangarha if you have a text that mixes some ltr and some rtl text, auto is sometimes not good enough. See https://www.w3.org/International/articles/inline-bidi-markup/uba-basics#isolation

Btw, this is actually implemented in pandoc now… see https://pandoc.org/MANUAL.html#language-variables

Markdown should remain markdown. It should remain simple. It is not supposed to support complex and rare text formatting.

My native language is Persian and I know the problem well. I don’t want to add some extra code to just make my text direction RTL or LTR.

To me, it is very clear that we need to leave the decision on the direction of a paragraph (or any block of text) to the browser by adding dir="auto" to the block tag (like <p>). Then, if you need to do anything else for the rare cases, do it after applying this.

Let us be able to use Markdown for RTL. We can handle complex cases later or never.

In my experience and understanding, when we are dealing with mixed RTL and LTR text, dir="auto" solves the problem in 99 percent of the cases. That 99 percent is about determining the paragraph direction. The remaining 1 percent is the rare cases. Don’t stop everything just to make a decision about that rare 1 percent!

Any extra action would be an addition to dir="auto", as I understand it.
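As a rough sketch of what that post-processing step could look like, here is a regex-based Python version. The function name add_dir_auto and the list of block tags are my own assumptions, and a real renderer would set the attribute while emitting the HTML rather than patching it afterwards.

```python
import re

# Opening block tags that do not already carry a dir attribute.
# The tag list is illustrative, not exhaustive.
BLOCK_TAG = re.compile(
    r"<(p|h[1-6]|ul|ol|blockquote|table)\b(?![^>]*\sdir=)([^>]*)>"
)

def add_dir_auto(rendered_html):
    """Add dir="auto" to block-level opening tags that lack a dir attribute."""
    return BLOCK_TAG.sub(r'<\1 dir="auto"\2>', rendered_html)

print(add_dir_auto("<p>hi</p><pre>x</pre>"))
```

Tags that already have an explicit dir are left alone, so an author who marks a block by hand is not overridden. Note that this sketch touches nested blocks too, not just top-level ones.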

Use this CSS rule:

unicode-bidi:plaintext;

on the element that contains the rendered Markdown.
I’m from Algeria too :upside_down_face:

It seems to work. I have tried it on some examples and the result was amazing.

I still have to try it on other elements like lists and tables, and on different mixtures of RTL and LTR text.