No-markdown islands

I’m not following. Do you want raw TeX like in Pandoc?

@mb21 No, I’m suggesting support for something simpler that would also help with MathJax right away, but not limited to it.

By the way, Rust is another programming language that offers raw string literals with clever delimiters. For example, r##"foo #"# bar"## gives foo #"# bar; the number of #s in the delimiters can vary, so as to avoid clashes with the contents.
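
To make the delimiter rule concrete, here is a minimal sketch in Python (chosen only because it is quick to try out). The helper name `rust_raw_literal` is made up for this illustration and is not part of any library. It picks the smallest number of #s that keeps the closing delimiter from appearing inside the text.

```python
import re

def rust_raw_literal(text: str) -> str:
    """Return a Rust-style raw string literal holding `text` verbatim.

    Hypothetical helper for illustration only.
    """
    # A raw literal with N hashes ends at the first `"` followed by N `#`s,
    # so N must exceed the longest run of `#` that directly follows a `"`.
    runs = [len(m.group(1)) for m in re.finditer(r'"(#*)', text)]
    n = max(runs) + 1 if runs else 0
    hashes = "#" * n
    return f'r{hashes}"{text}"{hashes}'

print(rust_raw_literal('foo #"# bar'))  # r##"foo #"# bar"##
```

The same "count the longest clashing run, then use one more" idea would carry over to any no-markdown delimiter built from a repeated character.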

Markdown uses _ and * for italic, bold, and bold-italic. It could be that more than 3 of such characters could mean no-markdown (or some other character counting from 1). For example, ****this won't be parsed****.

I guess what I’m asking is: do we really need no-markdown islands? To me they seem like kind of a hack, since they don’t correspond to any semantic entity like “code” or “math”. I don’t see them as analogous to programming languages, since there you always have quotes for strings, like x = "foo", and sometimes you want a delimiter other than quotes to avoid having to escape them. But in markdown and other markup languages like HTML, you don’t have that.

As for your MathJax use case: e.g. Pandoc seems to do quite a good job by using $...$. Or are you saying there are problems when using that markdown in implementations that don’t support math? A basic test turned out not too bad, though.

@mb21, I think there’s some confusion. I’m not talking about support for parsing LaTeX (or any other language) and taking action to render it. When using MathJax in a web page, for example, it is the job of the MathJax script to take LaTeX and render it appropriately. MathJax will recognize its delimiters, $...$ or \(...\) IIRC, and will take what’s inside to produce rendered LaTeX. But what’s inside should be verbatim LaTeX that has not been pre-processed by some markdown parser.
The kind of job MathJax does with JavaScript could also come up in other contexts. I don’t have another example, but one can infer that any input intended to be consumed by a script should likewise not be parsed.

If you’re interested, I think I proposed something similar to you for the mathjax example at least: Mathematics extension

Something like this would be useful for embedding MathJax. We also ran into this problem at Stack Exchange and ended up hard-coding an ignore sequence whenever we saw the MathJax start and end blocks (the dollar signs, etc., as I recall). For example:

http://meta.math.stackexchange.com/questions/tagged/tex?sort=votes

Yes, I think it would be good to have in the spec that anything between $…$ and $$…$$, or between \(…\) and \[…\], shouldn’t be processed by CommonMark. These are the standard delimiters for LaTeX math. It would also be great if it were always possible to add MathJax support to any CommonMark editor without having to write difficult plugins, like the one I wrote for Discourse: https://meta.discourse.org/t/mathjax-plugin-supports-math-notation-using-latex/12826
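
For anyone wondering what "ignore everything between the math delimiters" might look like in practice, here is a rough Python sketch. It is an assumption-laden illustration, not how Stack Exchange or the Discourse plugin actually does it: the names `protect_math`, `restore_math` and `toy_emphasis` are invented, and `toy_emphasis` is only a stand-in for a real Markdown renderer. The idea is to stash math spans behind placeholders before rendering and put them back afterwards.

```python
import re

# Delimiters named in the thread; `$$` is tried before `$`.
MATH_SPAN = re.compile(
    r"\$\$.+?\$\$"      # $$ ... $$
    r"|\\\[.+?\\\]"     # \[ ... \]
    r"|\\\(.+?\\\)"     # \( ... \)
    r"|\$[^$\n]+?\$",   # $ ... $
    re.DOTALL,
)

def protect_math(text):
    """Swap each math span for a NUL-delimited placeholder (hypothetical scheme)."""
    stash = []
    def hide(match):
        stash.append(match.group(0))
        return f"\x00{len(stash) - 1}\x00"
    return MATH_SPAN.sub(hide, text), stash

def restore_math(text, stash):
    """Put the stashed math spans back, untouched."""
    return re.sub(r"\x00(\d+)\x00", lambda m: stash[int(m.group(1))], text)

def toy_emphasis(text):
    """Stand-in for a Markdown renderer: only handles _emphasis_."""
    return re.sub(r"_(.+?)_", r"<em>\1</em>", text)

src = "$a_1 + b_2$ and _em_"
hidden, stash = protect_math(src)
print(toy_emphasis(src))                          # math gets mangled
print(restore_math(toy_emphasis(hidden), stash))  # math survives
```

Without the protection step the underscores inside the formula are treated as emphasis; with it, the formula reaches the output verbatim for MathJax to pick up.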

IMHO this is part of a more generic problem: the format for inline extensions. Block extensions are now being discussed in several threads, but that markup is not suitable for inline elements (in paragraphs and lists).

In practice, inline elements follow a different philosophy: the syntax should be as simple as possible. So it would be more convenient for users to have a different marker for each case than one universal marker with a name and params.

Does a <![CDATA[]]> section fit your definition of a no-markdown island?

You can put almost anything you want in one, except maybe ]]>. There seem to be a few fine points to work out around them, but they are already in the spec.

I was testing in the block context and found a spec issue, see: Drawing a distinction between HTML block elements and non-element tags. We may need to pay similar attention to the difference between Raw HTML and non-element tags like CDATA.

No, it doesn’t fit since the CDATA section ends up in the rendered output wrapping its contents. MathJax LaTeX, for example, is not expected to be wrapped by this in the output.

I see. So it sounds like this topic belongs in the Extensions category.

Indeed, but I’d like to point out that the no-markdown literal I’m suggesting can encompass MathJax but is not limited to it. It’s a simpler feature for asking the parser not to parse the contents and just output them verbatim. So, let’s say the delimiters were '''. Then in _foo_ '''$E = mc^2$''' *bar*, the delimiters would make $E = mc^2$ go to the output without being parsed, leaving the $...$ part to MathJax.
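
A rough sketch of that behaviour in Python, assuming ''' as the delimiter purely for the sake of the example. `render_with_islands` and `toy_inline` are invented names, and `toy_inline` is just a placeholder for a real inline parser.

```python
import re

def toy_inline(text):
    """Stand-in inline parser: turns _x_ and *x* into <em>x</em>."""
    return re.sub(r"([_*])(.+?)\1", r"<em>\2</em>", text)

def render_with_islands(text, inline=toy_inline):
    """Parse everything except '''...''' islands, which pass through verbatim."""
    # With one capture group, re.split returns the pieces in alternating
    # order: parsed text, island contents, parsed text, ...
    parts = re.split(r"'''(.*?)'''", text, flags=re.DOTALL)
    return "".join(p if i % 2 else inline(p) for i, p in enumerate(parts))

print(render_with_islands("_foo_ '''$E = mc^2$''' *bar*"))
# <em>foo</em> $E = mc^2$ <em>bar</em>
```

Note that the delimiters themselves are dropped and nothing wraps the island in the output, which is the difference from the CDATA and raw HTML approaches discussed above.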

Maybe, but, at least for me, this is such a must that it belongs right in the spec. I regret that it wasn’t included in Markdown from day one.

One approach could be to have a general directive like this in the beginning of the page.

!!!ignore
    latex: $ ... $
    C: /* ... */ 
!!!

$ 1+1 = 2 $
/* hello world */

The good thing about this approach is that it is easy to refer to and can be adapted as needed.

Also, the above is local to the page only, but I’m sure it would be possible to create a sitewide header for declaring these (e.g. for a website hosting maths tutorials). And perhaps a rule could be baked into core if it is used often enough by everyone, like LaTeX’s `$ … $`.
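
As a thought experiment, the directive could be read roughly like this. The Python below is a sketch under my own assumptions about the syntax (one `name: open ... close` rule per line between `!!!ignore` and `!!!`); `parse_ignore_header` and `ignored_spans` are hypothetical names, not an existing API.

```python
import re

def parse_ignore_header(page):
    """Split a page into (list of (opener, closer) pairs, remaining body)."""
    m = re.match(r"!!!ignore\n(.*?)\n!!!\n", page, flags=re.DOTALL)
    if not m:
        return [], page
    pairs = []
    for line in m.group(1).splitlines():
        _name, _, spec = line.partition(":")       # e.g. "latex", " $ ... $"
        opener, _, closer = spec.partition("...")  # text on either side of "..."
        pairs.append((opener.strip(), closer.strip()))
    return pairs, page[m.end():]

def ignored_spans(body, pairs):
    """Return the spans of `body` that the declared delimiters protect."""
    if not pairs:
        return []
    alternatives = [re.escape(o) + r".*?" + re.escape(c) for o, c in pairs]
    pattern = re.compile("|".join(alternatives), re.DOTALL)
    return [m.group(0) for m in pattern.finditer(body)]

page = (
    "!!!ignore\n"
    "    latex: $ ... $\n"
    "    C: /* ... */\n"
    "!!!\n"
    "\n"
    "$ 1+1 = 2 $\n"
    "/* hello world */\n"
)
pairs, body = parse_ignore_header(page)
print(pairs)                       # [('$', '$'), ('/*', '*/')]
print(ignored_spans(body, pairs))  # ['$ 1+1 = 2 $', '/* hello world */']
```

A sitewide header would just mean feeding the same list of pairs to every page instead of parsing it from the top of each one.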

This is the key point: who knows what kind of other parsing scripts will conflict with Markdown in the future? Having a simple “no Markdown” notation would make forward compatibility much simpler and possibly reduce the need for extensions to the Markdown parser.

I still think that the best bet would be to somehow combine this with the raw HTML notation, since the whole point of that is for Markdown to pass it along unaltered.

mightymax, this could be your solution. Use markdown=1 flags in markdown (where all html tags are by default markdown=0)

<div markdown="1"> 
This is *true* markdown text.
</div>

src: PHP Markdown Extra
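
To show the opt-in idea in miniature, here is a Python sketch. It is not PHP Markdown Extra's implementation; `expand_markdown_attr` and `toy_render` are invented names, `toy_render` stands in for a real Markdown renderer, and nested tags of the same name are not handled. Only elements carrying markdown="1" have their contents rendered, and the attribute is dropped from the output.

```python
import re

# Matches <tag ... markdown="1" ...> ... </tag> (no same-name nesting).
OPT_IN = re.compile(
    r'<(\w+)([^>]*?)\s+markdown="1"([^>]*)>(.*?)</\1>',
    re.DOTALL,
)

def toy_render(text):
    """Stand-in renderer: only handles *emphasis*."""
    return re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)

def expand_markdown_attr(html, render=toy_render):
    """Render the contents of opted-in elements; leave other HTML untouched."""
    def repl(m):
        tag, before, after, inner = m.groups()
        return f"<{tag}{before}{after}>{render(inner)}</{tag}>"
    return OPT_IN.sub(repl, html)

src = '<div markdown="1">\nThis is *true* markdown text.\n</div>'
print(expand_markdown_attr(src))
```

An element without the attribute, such as a plain <div>, passes through with its *asterisks* intact.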

I’ll just point out that the markdown="1" trick should be credited to John Gruber. It was his plan to incorporate this into Markdown at some point (and if I recall well, one of the 1.0.2 betas had the feature enabled, but it was removed in later betas because it had issues).

src: Markdown within block-level elements

It’d be a lot cleaner if the parser parsed the guts of capitalised tag names, and converted them to lowercase. This example looks messy, but in practice, you wouldn’t need to use the feature very often.

<SECTION>
# this is a header one
<p> # this is just text <SPAN> but `this is code` </SPAN></p>
</SECTION>

…becomes, more or less…

<section>
<h1> This is a header one </h1>
<p> # this is just text <span> but <code>this is code</code> </span></p>
</section>

This is not exactly backwards compatible, but you could easily convert existing docs by just capitalising every tag name.

interesting approach…

Indeed, that’s a key point. I also think it may be OK to include this in the raw HTML notation section. The only thing I find less than optimal is the requirement to have HTML involved, since the no-markdown idea doesn’t imply any involvement of HTML. If it were just a kind of markdown notation instead of being HTML-based, we could get verbatim content in the output without it having to be wrapped in <span> or <div> tags. When using MathJax inside an HTML file, for example, you can have foo \( math \) bar instead of foo <span> \( math \) </span> bar, which is what you would get with a no-markdown feature based on raw HTML, even though the two display the same.