No-markdown islands

Hi, I’ve quickly looked through the spec, and it seems that section 6.8, Raw HTML, finally addresses something that no parser currently handles explicitly, but which I consider a must.

I’ve mentioned it in this discussion and in other places I can’t recall.

The general need is to tell the markdown parser to stop parsing,
so that anything can be entered safely without the risk of being accidentally
transformed by the parser, and then to tell the parser that the
no-markdown island has ended and it can go on with its job.

It would also be nice to have such a thing work inline with surrounding markdown.

This would make it a standard way to support, for example, MathJax. I had
several issues with it when using Redcarpet and other parsers, until I settled
on Kramdown, which claims to work with MathJax specifically but doesn’t aim to
provide a general solution like the one in 6.8.

With Redcarpet, for example, when trying to enter LaTeX inline with surrounding markdown
text, I tried wrapping the LaTeX in SPAN tags, but the parser still transformed it
somehow.

So, can the current mechanisms in the spec really provide such a
"no markdown here, don’t parse" region, and is this validated in the test suite?

Regards

This does seem like a gaping hole in the current spec. The only thing I can think of currently would be to use an HTML tag to trick the parser into ignoring it but I would definitely prefer an explicit mention in the spec. Also, parsing Markdown inside HTML is a popular option in kramdown, so the trick isn’t that reliable.

MediaWiki has a special <nowiki> … </nowiki> tag to allow adding arbitrary stuff that isn’t passed through the parser. I think something like this would work fine for Markdown too.

This feature request amounts to asking for a kind of literal in CommonMark. C++ has a nice mechanism for raw string literals, which I believe could serve as inspiration:

R"("foo""bar")" → "foo""bar"
R"baz("(foo)""(bar)")baz" → "(foo)""(bar)"

Here R"?( and )?" are the delimiters, where ? stands for an optional custom delimiter sequence, chosen so that the delimiters don’t clash with the contents of the raw string literal.

That said, I can see that a simpler, special-purpose markdown-like syntax for this would probably provide functionality equivalent to what Raw HTML could provide.

Text in code blocks and inline code is not interpreted as markdown, which I think is a valid use if you want to show the reader LaTeX code, e.g. 2*7*\pi. If you want the LaTeX to be interpreted for math rendering, usually $...$ is used. Do you have further use cases for no-markdown islands?

@mb21 hi, it’s not a valid solution because code blocks are, AFAIK, rendered as <pre>…</pre>, which would cause the LaTeX code itself to be shown verbatim instead of being rendered.

I’m not following. Do you want raw TeX like in Pandoc?

@mb21 No, I’m suggesting support for something simpler that would also help with MathJax right away, but not limited to it.

By the way, Rust is another programming language that offers raw string literals with clever delimiters. For example, r##"foo #"# bar"## gives foo #"# bar; the number of #s in the delimiters can vary, so as to avoid clashes with the contents.
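
To make the variable-length delimiter idea concrete, here is a minimal Python sketch of how a scanner could read a Rust-style raw literal. The function name `read_raw_literal` is hypothetical, and this only illustrates the matching rule (the closer must carry the same number of #s as the opener); it is not how Rust or any markdown parser actually implements it.

```python
def read_raw_literal(text, start=0):
    """Read a Rust-style raw literal such as r##"..."## at text[start].

    Returns (content, end_index), or None if no well-formed literal
    starts at that position. Hypothetical helper for illustration only.
    """
    if not text.startswith("r", start):
        return None
    i = start + 1
    hashes = 0
    # Count the '#' characters that make up the opening delimiter.
    while i < len(text) and text[i] == "#":
        hashes += 1
        i += 1
    if i >= len(text) or text[i] != '"':
        return None
    body_start = i + 1
    # The closing delimiter is a quote followed by the same number of '#'.
    closer = '"' + "#" * hashes
    end = text.find(closer, body_start)
    if end == -1:
        return None
    return text[body_start:end], end + len(closer)


content, _ = read_raw_literal('r##"foo #"# bar"##')
print(content)  # foo #"# bar
```

The inner `"#` does not end the literal because the closer requires two #s, which is exactly the property a no-markdown island would need.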

Markdown uses _ and * for italic, bold, and bold-italic. Perhaps a run of more than 3 such characters could mean no-markdown (or some other character, counting from 1). For example, ****this won't be parsed****.

I guess what I’m asking is: do we really need no-markdown islands? To me they seem like kind of a hack, since they don’t correspond to any semantic entity like “code” or “math”. I don’t see the analogy to programming languages, since there you always have quotes for strings, like x = "foo", and sometimes you want a delimiter other than quotes to avoid having to escape them. But in markdown and other markup languages like HTML, you don’t have that.

As for your MathJax use case: e.g. Pandoc seems to do quite a good job by using $...$. Or are you saying there are problems when using that markdown in implementations that don’t support math? A basic test turned out not too bad, though.

@mb21, I think there’s some confusion. I’m not talking about support for parsing and rendering LaTeX, or any other language. When using MathJax in a web page, for example, it is the job of the MathJax script to take the LaTeX and render it appropriately. MathJax recognizes its delimiters, $...$ or \(...\) IIRC, and takes what’s inside to produce the rendered math; but what’s inside must be verbatim LaTeX that has not been pre-processed by a markdown parser.
The kind of job MathJax does with JavaScript could also arise in other contexts. I don’t have another example, but one can infer that input intended to be consumed by a script would likewise need to be left unparsed.

If you’re interested, I think I proposed something similar to what you describe, at least for the MathJax example: Mathematics extension

Something like this would be useful for embedding MathJax. We also ran into this problem at Stack Exchange and ended up hard-coding an ignore sequence whenever we saw the MathJax start and end blocks (the dollar signs, etc., as I recall). For example:

http://meta.math.stackexchange.com/questions/tagged/tex?sort=votes
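
For readers who want to see what an "ignore sequence" workaround of this kind can look like, here is a rough Python sketch. It is not the Stack Exchange code; all names are made up, the delimiters are assumed to be $...$ and $$...$$ only, and `toy_markdown` is a stand-in for a real parser that handles nothing but emphasis. The idea is to swap each math span for a placeholder before parsing and put it back afterwards.

```python
import re

# Assumed delimiters: try $$...$$ first so it wins over $...$.
MATH = re.compile(r"\$\$.+?\$\$|\$.+?\$", re.DOTALL)


def protect_math(text):
    """Replace each math span with a numbered placeholder."""
    stash = []

    def hide(match):
        stash.append(match.group(0))
        return f"\x00{len(stash) - 1}\x00"

    return MATH.sub(hide, text), stash


def restore_math(text, stash):
    """Put the original math spans back, byte for byte."""
    return re.sub(r"\x00(\d+)\x00", lambda m: stash[int(m.group(1))], text)


def toy_markdown(text):
    """Stand-in for a real parser: only *...* and _..._ emphasis."""
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    return re.sub(r"_(.+?)_", r"<em>\1</em>", text)


def render(text):
    hidden, stash = protect_math(text)
    return restore_math(toy_markdown(hidden), stash)


print(render("*hi* $a_1 * b_2 * c$"))  # <em>hi</em> $a_1 * b_2 * c$
```

Without the protect step, the toy parser would turn the * and _ inside the formula into emphasis tags. A sketch like this also treats any pair of dollar signs as math (prices, for instance), which is one reason a rule in the spec would be preferable to per-site hacks.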

Yes, I think it would be good to have in the spec that anything between $…$ and $$…$$, or between \(…\) and \[…\], shouldn’t be processed by CommonMark. This is the standard way of indicating LaTeX math blocks. And it would be great if it were always possible to add MathJax support to any CommonMark editor without having to write difficult plugins, like the plugin I wrote for Discourse: https://meta.discourse.org/t/mathjax-plugin-supports-math-notation-using-latex/12826

IMHO this is part of a more generic problem: the format for inline extensions. Block extensions are now being discussed in several threads, but that markup is not suitable for inline elements (in paragraphs and lists).

In practice, inline elements follow a different philosophy: the syntax should be as simple as possible. So it would be more convenient for users to have different markers for each case than one universal marker with a name and params.

Does a <![CDATA[]]> section fit your definition of a no-markdown island?

You can put almost anything you want in one, except maybe ]]>. There seem to be a few fine points to work out around them, but they are already in the spec.

I was testing in the block context and found a spec issue, see: Drawing a distinction between HTML block elements and non-element tags. We may need to pay similar attention to the difference between Raw HTML and non-element tags like CDATA.

No, it doesn’t fit, since the CDATA section ends up in the rendered output, wrapping its contents. MathJax LaTeX, for example, is not expected to be wrapped like this in the output.

I see. So it sounds like this topic belongs in the Extensions category.

Indeed, but I’d like to point out that the no-markdown literal I’m suggesting can encompass MathJax but is not limited to it; it’s a simpler feature for asking the parser not to parse the contents and just output them verbatim. So, let’s say the delimiters were '''; then _foo_ '''$E = mc^2$''' *bar* would make $E = mc^2$ go to the output without being parsed, leaving the $...$ part to MathJax.
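
As a rough illustration of the ''' idea, here is a small Python sketch. The ''' delimiter is only the hypothetical one from the example above, and `toy_inline` is a stand-in for a real inline parser that handles nothing but emphasis; none of this is CommonMark behavior. Text outside the islands goes through the parser, and text inside is emitted verbatim with the delimiters dropped.

```python
import re

# Hypothetical island delimiter from the example: '''...'''
ISLAND = re.compile(r"'''(.*?)'''", re.DOTALL)


def toy_inline(text):
    """Stand-in for a real inline parser: only *...* and _..._ emphasis."""
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    return re.sub(r"_(.+?)_", r"<em>\1</em>", text)


def render_with_islands(text):
    # With one capture group, split() alternates between text outside
    # an island (even indexes) and island contents (odd indexes).
    parts = ISLAND.split(text)
    out = []
    for index, part in enumerate(parts):
        out.append(part if index % 2 else toy_inline(part))
    return "".join(out)


print(render_with_islands("_foo_ '''$E = mc^2$''' *bar*"))
# <em>foo</em> $E = mc^2$ <em>bar</em>
```

The island contents reach the output untouched, so a script such as MathJax can pick up the $...$ on the page. A fixed delimiter like this cannot contain ''' itself, which is where the variable-length delimiters from C++ and Rust would help.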

Maybe, but, at least for me, this is such a must that it belongs right in the spec; I regret that it wasn’t included in Markdown from day one.