MathJax extension for LaTeX equations


#1

We’ve talked about this in some other topics, but I think this deserves its own topic.

I think it would be great if there were some kind of extension that makes it possible to load CommonMark and MathJax at the same time, so that LaTeX equations are rendered. Most importantly, this means that any text between the following delimiters should be ignored by the CommonMark parser:

\( ... \)
\[ ... \]
$ ... $
$$ ... $$
\begin{some text} \end{some text}
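
One way to get "ignored by the CommonMark parser" is to swap every math span for a placeholder before parsing and put it back afterwards. The following is only a rough Python sketch of that idea, not what the stmd fork does; `MATH_SPAN`, `protect_math` and `restore_math` are made-up names, and the pattern deliberately has no heuristics and no handling of escaped delimiters such as `\$`.

```python
import re

# Hypothetical pattern for the delimiter pairs listed above.
# $$...$$ is tried before $...$ so display math is not split in two.
MATH_SPAN = re.compile(
    r"\\\(.+?\\\)"          # \( ... \)
    r"|\\\[.+?\\\]"         # \[ ... \]
    r"|\$\$.+?\$\$"         # $$ ... $$
    r"|\$.+?\$"             # $ ... $
    r"|\\begin\{(?P<env>[^}]*)\}.+?\\end\{(?P=env)\}",  # \begin{x} ... \end{x}
    re.DOTALL,
)


def protect_math(text):
    """Replace each math span with a placeholder; return (text, spans)."""
    spans = []

    def stash(match):
        spans.append(match.group(0))
        return f"\x00MATH{len(spans) - 1}\x00"

    return MATH_SPAN.sub(stash, text), spans


def restore_math(text, spans):
    """Put the stashed math spans back, untouched, after Markdown parsing."""
    for i, span in enumerate(spans):
        text = text.replace(f"\x00MATH{i}\x00", span)
    return text
```

The Markdown parser would run between the two calls, so characters like `*` or `_` inside the math never reach it.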

Well, I’ve been trying to get this working, and I’ve now uploaded a stmd fork, with some sort of minimal example to get this working:
http://kasperpeulen.github.io/stmd/js/

Here are the changes that I made:

Github Commit


Mathematics extension
#2

I think it will definitely be important to have a tex math extension at some point. Note, however, that \[, \], \(, and \) already have well defined meanings in Markdown (and CommonMark) – they are escaped [, ], (, and ). It is important that it be possible to escape these characters, so it is better, in my view, to only support the $ style delimiters. In addition, care must be taken to avoid capturing $ characters with ordinary uses ($15.00). Pandoc has heuristics for this that have worked well.


#3

I’m cataloging existing syntaxes for this at https://github.com/cben/mathdown/wiki/math-in-markdown (very incomplete so far, edits welcome - it’s a wiki).

There is a tension between what’s convenient for heavy math users (supporting all 4 latex syntaxes wins on least-surprise and muscle memory grounds), and ordinary users.
It’s fine for some syntaxes to be optional but it would be really sweet if there was some portable syntax that’d work with any parser that supports math at all.
Until your post here I was of the opinion that the common syntax better be \(..\) and \[...\] because $...$ really needs some limiting heuristic but agreement on the exact heuristic seems unlikely (even Pandoc’s docs and implementation seem to disagree ;-)).
Specifically, limiting $...$ to one line is a recurring idea — bad heuristic (unlike any inline construct in markdown) but effective damage control.

However if you say escaping brackets is very important and dollars with heuristic are acceptable, I’ll take your opinion over mine.

But then what do you think of kramdown’s syntax?

  • It uses the more robust $$ for both inline and display math.
  • The automatic treatment of math alone in a paragraph as display is kinda elegantly “markdownish”. The best part of this is that display math must stand alone in the source, making it closer to how it’ll look rendered.
  • It has a simple provision for other environments via $$\begin{foo}...\end{foo}$$ which doesn’t require the markdown parser to look for \end{foo}, just same $$.
  • It’s somewhat compatible with any other parser that understands dollar math — worst case all math becomes display…
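
To make the "math alone in a paragraph is display" rule concrete, here is a small Python sketch of the rule as described in the bullets above. It is an approximation for illustration, not kramdown's real parser; `MATH` and `classify_math` are hypothetical names, and paragraphs are assumed to be separated by blank lines.

```python
import re

# $$...$$ is the only math delimiter in this scheme.
MATH = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)


def classify_math(source):
    """Return (kind, tex) pairs, where kind is 'display' or 'inline'."""
    results = []
    for paragraph in re.split(r"\n\s*\n", source.strip()):
        for match in MATH.finditer(paragraph):
            # Math that makes up the whole paragraph is treated as display math.
            alone = match.group(0) == paragraph.strip()
            kind = "display" if alone else "inline"
            results.append((kind, match.group(1).strip()))
    return results
```

Note that the `\begin{foo}...\end{foo}` case needs no extra work here: the scanner only ever looks for the closing `$$`.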

Alas, no other tool supports kramdown’s syntax now, nor vice versa, so there is currently no portable syntax at all for inline math :(


#4

Yea I agree that $$ feels just as natural as $. And I like how clear it is, which goes well with my personal opinion that the core syntax should be as minimal as possible, with minimal heuristics.


#5

I think it is a bad idea to go for non-standard syntax like kramdown’s syntax or double escaping. I would only go for this if there are no other solutions possible.

Kramdown’s syntax is kinda elegantly ‘markdownish’, and I like the idea, but for some other math language that is still adopting new ideas, like asciimathml. I wouldn’t try to change LaTeX, I’m sure many LaTeX users wouldn’t like it.

I think http://math.stackexchange.com is one of the most active math communities and I think it would be very important that anything written there still works in commonmark + mathjax extension. So at least support $...$ and $$...$$.

Besides that I would really like it if supporting \( and \[ would optionally work. I don’t see exactly in which cases escaping them is essential. I’m sure there are some, but I think very few. And as there are very many cases where a mathematician uses \( ...\) or \[...\], I would rather have some ugly syntax for escaping \[ than have this ugly \\( ...\\) syntax for all my mathematics.

The only case I can see where escaping ( ) [ ] is essential is inside links and link descriptions, like:

[This [] (is!) my link](http://www.google.com/)
This [] (is!) my link works without escaping

[This \]\[ is my link](http://www.google.com/)
This ][ is my link needs escaping

But I would like it if, for a mathjax extension, it were optionally possible to just not support those cases, or to use some other syntax for them, like:

[This \\]\\[ is my link](www.google.com/)

Or maybe using backticks around the link description.


#6

@jgm I’ve been toying with the idea of math support in Markdown that looked like this:

  • Inline math is $$...$$. This avoids ambiguity with phrases like I thought the ticket was $20, but instead it was $25. IMHO inline $...$ is just too prone to mistakes, though I’d be very interested in seeing your heuristic if you could point me to it.
  • $$...$$ would be used for display math. Display math is considered only if there are no blank lines in-between. For example:
$$
this is
display math
$$
$$
this is

not
display math
$$
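
The blank-line rule from the two examples above could be checked line by line. This is just a Python sketch of that one rule under my own assumptions (a fence is a line containing only `$$`; `display_math_blocks` is a made-up name), not a full block parser.

```python
def display_math_blocks(lines):
    """Collect the bodies of $$ ... $$ blocks that contain no blank line."""
    blocks = []
    current = None  # None means "not inside a candidate block"
    for line in lines:
        stripped = line.strip()
        if current is None:
            if stripped == "$$":
                current = []  # opening fence
        elif stripped == "$$":
            blocks.append("\n".join(current))  # closing fence
            current = None
        elif stripped == "":
            current = None  # a blank line cancels the candidate block
        else:
            current.append(line)
    return blocks
```

On the two examples above, only the first one comes back as display math.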

I didn’t even realize Kramdown used the same syntax, so that’s a comforting sign!


#7

Basic heuristic is: opening $ can’t be followed by whitespace, closing $ can’t be followed by a digit or preceded by whitespace. This has been used for a long time in pandoc, and I haven’t had complaints about capturing $s that are being used in the normal way.
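
For readers who want to experiment, the three conditions above fit in one regular expression. This is a Python sketch of the heuristic as stated in this post, not pandoc's actual (Haskell) implementation; `INLINE_MATH` and `find_inline_math` are names invented here, and treating `\$` as an escaped dollar is an extra assumption.

```python
import re

INLINE_MATH = re.compile(
    r"(?<!\\)\$"            # opening $ that is not backslash-escaped
    r"(?=\S)"               # ... and not followed by whitespace
    r"((?:\\.|[^$\\])+?)"   # math body: no unescaped $ inside
    r"(?<=\S)\$"            # closing $ not preceded by whitespace
    r"(?!\d)"               # ... and not followed by a digit
)


def find_inline_math(text):
    """Return the TeX source of every span the heuristic accepts as math."""
    return [match.group(1) for match in INLINE_MATH.finditer(text)]
```

With this sketch, `find_inline_math("the ticket was $20, but instead it was $25.")` finds nothing, while `$1+1=2$` is picked up as math.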


#8

@gjtorikian In pandoc your example doesn’t cause any problems.

http://johnmacfarlane.net/pandoc/try/?text=I+thought+the+ticket+was+%2420%2C+but+instead+it+was+%2425. This+equation+%241%2B1%3D2%24+is+rendered+as+math.&from=markdown&to=html

I really don’t think $...$ is too prone to mistakes. Markdown users very rarely use the dollar symbol outside of code blocks or math blocks. Even at http://money.stackexchange.com/search?q=%24 this symbol is only used 78 times. And if you check those cases, you’ll see that they will all be solved by jgm’s heuristic.

And in the very rare case that the dollar symbol really needs to be escaped, people can just write \$, right?


#9

@jgm, closing $ doesn’t care if it’s preceded by space:

$ echo '$2 + 3 $' | pandoc
<p><span class="math">2 + 3</span></p>

See https://groups.google.com/forum/#!msg/pandoc-discuss/KiyMZn1wFHg/lPUzI8U-KukJ for some more findings; it seems the only requirements are opening notFollowedBy space and closing notFollowedBy digit (but I don’t know Haskell).

Also AFAICT pandoc applies the same heuristic to \(...\) — is that intentional?

$ echo '\( 2 + 3 \)' | pandoc -f markdown+tex_math_single_backslash
<p>( 2 + 3 )</p>

Anyway, pandoc’s README documents a different heuristic and should be corrected.


#10

@cben, you probably have an old version. You can verify here:
http://johnmacfarlane.net/pandoc/try/?text=%242+%2B+3+%24 &from=markdown&to=html

As for \(, pandoc doesn’t follow any heuristic here, because this is
just markdown syntax for an escaped (.


#11

With all due respect, I think the heuristic for $...$ is still prone to failure.

Some countries place currency symbols after the numeric value; I know Quebec does this: “The price was 400$, but I thought it was only 210$!”

And there is the possibility of the same syntax clashing when talking in a derogatory manner: “I don’t know who is greedier, Di$ney or Micro$oft.”

If CommonMark is intended to be used in large online forums (Reddit, GitHub), you have to take the path of least surprises.

This is a very dangerous mentality. If anything, it should take more effort for people writing math than people not writing math. People not writing math shouldn’t need to remember when to escape or not escape dollar signs. It should be an encounter that’s not an accident.

If you know all the rules, great, you can know when to wield $ properly. But I’d like to just keep in mind that CommonMark will be used in some sort of online discussion format where the user quite often won’t care about writing in Markdown, and might be caught off-guard when their post is rendered incorrectly.


#12

Agreed. And if you really need to use the $ form, you can always just declare that in a document declaration (maths vs filmscript mode), or in site-specific settings.


#13

Just to make an explicit distinction here - although the default parser might want to ignore LaTeX, not all converters would want to do that. For instance, if you want to render a document to ODT you would convert LaTeX to open document math markup, so you would not ignore the LaTeX.


#14

mangecoeur, This sounds like we need to define an Abstract Syntax Tree specification, so we can first compile CommonMark to AST, before parsing the AST to multiple different targets like html or pdf.
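
As a toy illustration of that idea, here is a Python sketch in which math is its own node type and each target decides what to do with it. Everything in it is hypothetical (the `Text` and `Math` node classes, the two renderers, the `class="math"` wrapper); it is not taken from any existing CommonMark implementation.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Text:
    value: str


@dataclass
class Math:
    tex: str
    display: bool = False


def render_html(nodes):
    """HTML target: keep the TeX source and leave typesetting to MathJax."""
    out = []
    for node in nodes:
        if isinstance(node, Math):
            left, right = (r"\[", r"\]") if node.display else (r"\(", r"\)")
            out.append(f'<span class="math">{left}{escape(node.tex)}{right}</span>')
        else:
            out.append(escape(node.value))
    return "".join(out)


def render_latex(nodes):
    """LaTeX target: math passes through, literal dollars in text are escaped."""
    out = []
    for node in nodes:
        if isinstance(node, Math):
            out.append(f"\\[{node.tex}\\]" if node.display else f"${node.tex}$")
        else:
            out.append(node.value.replace("$", "\\$"))
    return "".join(out)
```

An ODT writer would be a third function over the same nodes, converting `Math.tex` instead of passing it through.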


#15

I was talking about a mathematics extension here. Something that could be enabled at for example math.stackexchange.com or http://www.reddit.com/r/math. This extension is only needed if people are loading mathjax (or possibly katex) on their website. At math.stackexchange.com the $...$ notation is used, with much worse heuristic, and nobody has ever complained. I’m sure they will complain if math.stackexchange.com would adopt commonmark, and suddenly they can’t type $...$ anymore.

But now I understand that you guys are not talking about an extension, but about something for the default commonmark spec. I wouldn’t mind that as well, and I don’t think that it is prone to problems. If you think so, @gjtorikian, can you hand me one line, out of the millions of markdown lines that are typed on any of those websites where markdown is used and where commonmark could possibly be used, where jgm’s $…$ heuristic would be problematic? I’ve been trying to search for one, and I couldn’t find a single example. Even at http://money.stackexchange.com

Your artificial example, “The price was 400$, but I thought it was only 210$!”, indeed does give a problem in pandoc, and it may be good to extend the heuristic so that a dollar symbol cannot be followed by ,\s|!\s|.\s. Just saying something; I’m sure the heuristic can be improved to also ignore those cases.
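
To show what such an extension might look like, here is a Python sketch that builds the basic heuristic (opening $ not followed by whitespace, closing $ not preceded by whitespace and not followed by a digit) with and without the extra rule. It is my own illustration, not pandoc code; `inline_math_pattern` and its flag are invented names, and I apply the extra rule to the opening $ only.

```python
import re


def inline_math_pattern(reject_trailing_punctuation):
    """Build the inline-math regex, optionally with the extra punctuation rule."""
    # Extra rule: an opening $ may not be followed by ",", "!" or "." plus whitespace.
    extra = r"(?![,!.]\s)" if reject_trailing_punctuation else ""
    return re.compile(
        r"(?<!\\)\$"            # opening $, not backslash-escaped
        + extra
        + r"(?=\S)"             # not followed by whitespace
        r"((?:\\.|[^$\\])+?)"   # math body without unescaped $
        r"(?<=\S)\$(?!\d)"      # closing $: no space before, no digit after
    )


quebec = "The price was 400$, but I thought it was only 210$!"
basic = inline_math_pattern(False).findall(quebec)
extended = inline_math_pattern(True).findall(quebec)
```

Here `basic` captures ", but I thought it was only 210" as math, while `extended` is empty, and ordinary math such as `$1+1=2$` is still recognized with the extra rule on.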

One last point I want to make. Say that someone from Quebec went to some commonmark-enabled website and wanted to talk about the price of something, and did it in such a way that it broke the heuristic. Even then, it would only cause a problem if mathjax is loaded. And if mathjax is loaded, he would be on a math website, and would understand that this could give problems. If mathjax is not loaded, then the only harm is that the part but I thought it was only 210 would be ignored by the markdown parser, which the user wouldn’t notice anyway.


#16

Electrical Engineering Stack Exchange is an interesting case study: to avoid confusion to users that had simply used $ for prices, and specifically to avoid breakage in existing texts, they switched to idiosyncratic \$...\$ for inline math:

IMHO the particular syntax \$ is quite unfortunate. Ideally backslash in MD should only block magic character meanings, not enable them; the only reason \(...\) and \[...\] are used is being familiar to Latex users. But let’s leave that aside and focus on the question of why $...$ proved unacceptable.

  • Reading these Meta discussions, I see there were problems with preview then — math would render extremely slowly or not at all until the user pressed Post. Also, it seems mathjax didn’t support $ escaping yet, so even users who were aware of the math syntax didn’t know how to type a dollar, with the best workaround being the obscure math dollar $\$$
    I wonder if working & fast preview can significantly reduce user surprise. I.e. is $math$ something that users who just want to type a dollar and don’t care about math can easily learn to avoid (given working $ escaping), or is it a significant tripping point?

  • To this day, Pandoc is the only converter implementing $ with a heuristic, so that $200 ... $5 is not considered math.
    I wonder whether Electronics.SE could have stayed with single dollars if a similar heuristic had been easy to implement with MathJax?

@codinghorror, as you argued both sides there — first that

we don’t want different basic usage of MathJax across the network; that would be like Markdown changing essentials like “how to bold” on a site by site basis.

and later settled on \$...\$ — could you weigh in here on what delimiters/heuristics might be appropriate everywhere?


#17

Generally, I use:

  • ${ math }$ for inline equations and
  • $${ math }$$ for block equations

The addition of the { and } delimiters has three advantages:

  1. reduces ambiguities wrt other inline use of dollars, e.g: 100$ on that site and 150$ on the other
  2. keeps asymmetry between opening and closing delimiters, so that it is possible to automatically exclude pending delimiters to avoid errors or to perform massive substitutions with other delimiters
  3. it is still compatible with latex: indeed opening and closing {} are hidden by latex processors as superfluous symbols (but they are not in the meta language!), so it is back compatible with other solutions.

Anyway, in order to have the point 1 fulfilled, the parser has to be aware of the delimiters including {}.
I would like having this choice available in future CommonMark + Latex implementations.
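
A parser that is aware of the braces, as point 1 requires, could be as small as the following Python sketch. It only illustrates the delimiters described above; `BRACED_MATH` and `find_braced_math` are hypothetical names, and no escaping rules are handled.

```python
import re

# "$" or "$$", then "{", the math body, "}", then the same dollar run again.
BRACED_MATH = re.compile(r"(\$\$?)\{(.+?)\}\1", re.DOTALL)


def find_braced_math(text):
    """Return (kind, tex) pairs for ${ ... }$ and $${ ... }$$ spans."""
    return [
        ("display" if match.group(1) == "$$" else "inline", match.group(2).strip())
        for match in BRACED_MATH.finditer(text)
    ]
```

Because a bare `$` is never followed by `{` in ordinary prose like the price example in point 1, those dollars are simply left alone.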

Thanks and regards,
Netsaver, Rome (IT)


#18

I agree that this should not be in the main spec.


#19

Although I think the “$” delimiters with the simple Pandoc parsing rules would work just fine, we could consider adopting MultiMarkdown’s math delimiters as a way of avoiding using “$” and to not invent yet another syntax. Its delimiters are:

  • \\( and \\) for in-line math
  • \\[ and \\] for display math

However, the “$” delimiters are quite widespread and appear to also be supported by MultiMarkdown now as well.

Either way, I think that if source code is a first-class citizen in Markdown, math certainly should be as well. We should define this now, so that parsers at least pass the math text verbatim rather than trying to parse it as markdown. This way, the math text can be syntax highlighted properly, and the math source is not mangled if the renderer doesn’t support MathJax or some other means of rendering the math.


#20

Perhaps fences ```` and ~~~~ should differ slightly from each other: one is always displayed verbatim (plus syntax highlighting etc.), while the other should be parsed into something more useful if safe and possible. Users would specify the language as the first part of the info string, as is common practice for source code.

~~~~ tex
x^2
~~~~
.
<figure class="formula">
<var>x</var><sup>2</sup>
</figure>
```` tex
x^2
````
.
<pre><code class="language-tex">x^2</code></pre>

This does not prevent extensions from defining more specific custom fences or line prefixes for display math.
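
In code, the distinction could be a single dispatch on the fence character. The following Python sketch assumes the fenced block has already been parsed into fence character, info string and body; `render_fence` and the `renderers` table are hypothetical, and the toy `tex` renderer used in the example only wraps the source rather than typesetting it.

```python
import html


def render_fence(fence_char, info, body, renderers):
    """Render a fenced block; only ~ fences may be turned into rich output."""
    parts = info.split()
    language = parts[0] if parts else ""
    if fence_char == "~" and language in renderers:
        return renderers[language](body)
    # Backtick fences, and languages without a renderer, stay verbatim.
    css = f' class="language-{html.escape(language)}"' if language else ""
    return f"<pre><code{css}>{html.escape(body)}</code></pre>"


# Toy renderer, for illustration only.
renderers = {"tex": lambda tex: f'<figure class="formula">{html.escape(tex)}</figure>'}
verbatim = render_fence("`", "tex", "x^2", renderers)
rich = render_fence("~", "tex", "x^2", renderers)
```

A parser without any renderers registered would fall back to the verbatim output for both fence types, so documents stay readable everywhere.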