Math rendering (revisited)

So, we talked about this a while ago, but I wanted to reopen the topic of math rendering.

First, for a catch-up, see the amazing summary of the current state of play from @cben:

Things to note:

  • Math is a special case. We need inline support and not just code block support like we decided to do for things like mermaid.
  • MathJax is the most common client-side library for parsing LaTeX expressions.
  • There are a few ways MathJax can interfere with your rendering pipeline: \[, \], \(, and \) already have well-defined meanings in Markdown (and CommonMark), and a single $, commonly used to delimit inline expressions, is also hard to disambiguate when typing things like
    Cheese is $10.40 + $0.20 tax

What are folks latest opinions on this based on implementations in extensions to CommonMark?


Pandoc has been handling math in markdown for 16 years, and I
haven’t gotten complaints about the way we do it. It’s
already in wide use. So I wouldn’t reinvent the wheel.

We don’t allow \[..\] or \(..\) delimiters, because
these already mean something in Markdown.

Thus, display math is between $$ and $$, and inline math
is between $ and $, with the restriction that the delimiters
for inline math must be left- and right-flanking, respectively.
(This prevents your example $10.40 + $0.20 from being parsed as
math.)

I’ve also added this syntax as an extension to my Haskell
commonmark parser. Rough documentation with test cases here:

Math is parsed before we handle emphasis or links, in the
same parsing pass that handles inline code.
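The delimiter rules above can be sketched as a single scanning pass that picks out code spans and math before any emphasis handling. This is a simplified illustration, not the actual Pandoc or commonmark-hs code. It assumes the inline rule as stated in the Pandoc manual (the opening $ needs a non-space character to its right; the closing $ needs a non-space character to its left and must not be followed by a digit). The function and token names are made up, code-span padding is not stripped, and emphasis/links would be a later pass that only looks inside the "text" tokens.

```python
import re
from typing import List, Tuple

# (kind, content); kind is "text", "code", "inline_math" or "display_math"
Token = Tuple[str, str]

_BACKTICKS = re.compile(r"`+")


def _find_inline_closer(s: str, start: int) -> int:
    """Return the index of a valid closing $ at or after `start`, or -1.

    A closer must have a non-space, non-backslash character immediately
    to its left and must not be followed immediately by a digit.
    """
    j = start
    while True:
        j = s.find("$", j)
        if j == -1:
            return -1
        prev_ok = not s[j - 1].isspace() and s[j - 1] != "\\"
        next_ok = j + 1 == len(s) or not s[j + 1].isdigit()
        if prev_ok and next_ok:
            return j
        j += 1


def scan_inlines(s: str) -> List[Token]:
    """Split one paragraph into text, code spans and math in a single pass."""
    tokens: List[Token] = []
    buf: List[str] = []

    def flush() -> None:
        if buf:
            tokens.append(("text", "".join(buf)))
            buf.clear()

    i, n = 0, len(s)
    while i < n:
        c = s[i]
        if c == "\\" and i + 1 < n and s[i + 1] in "$`":
            buf.append(s[i + 1])  # escaped delimiter: literal character
            i += 2
        elif c == "`":
            run = _BACKTICKS.match(s, i).end() - i
            # the closing run must have exactly the same number of backticks
            m = re.compile(r"(?<!`)" + "`" * run + r"(?!`)").search(s, i + run)
            if m is None:
                buf.append("`" * run)  # unmatched backticks stay literal
                i += run
            else:
                flush()
                tokens.append(("code", s[i + run:m.start()]))
                i = m.end()
        elif s.startswith("$$", i):
            end = s.find("$$", i + 2)
            if end == -1:
                buf.append("$$")
                i += 2
            else:
                flush()
                tokens.append(("display_math", s[i + 2:end].strip()))
                i = end + 2
        elif c == "$" and i + 1 < n and not s[i + 1].isspace():
            end = _find_inline_closer(s, i + 2)
            if end == -1:
                buf.append(c)  # no valid closer: treat $ as text
                i += 1
            else:
                flush()
                tokens.append(("inline_math", s[i + 1:end]))
                i = end + 1
        else:
            buf.append(c)
            i += 1
    flush()
    return tokens
```

With this sketch, `Cheese is $10.40 + $0.20 tax` comes back as a single text token, because the second $ has a space to its left and so cannot close, while `$a*b*c$` becomes math whose asterisks are never seen by the emphasis pass.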

In rendering HTML, the math can be handled in various ways.
The easiest is to pass it through verbatim (or with minimal
necessary HTML escaping), and let a library like MathJax or
KaTeX handle it. It’s possible to speed things up by emitting the
math in a specially marked span tag, so that these libraries
don’t have to parse your whole document looking for math.
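A minimal sketch of the "specially marked span" idea: the math is HTML-escaped only as much as necessary and wrapped in a span that a client-side script can target. The function name, the class names and the \( \) / \[ \] wrappers are assumptions for illustration, not a fixed convention; a site would pick whatever its MathJax or KaTeX setup expects.

```python
import html


def render_math_html(tex: str, display: bool = False) -> str:
    """Wrap raw math in a marked span, escaping only &, < and >.

    Class names and delimiters here are illustrative assumptions.
    """
    body = html.escape(tex, quote=False)  # minimal escaping; TeX is otherwise untouched
    if display:
        return f'<span class="math display">\\[{body}\\]</span>'
    return f'<span class="math inline">\\({body}\\)</span>'
```

A client-side script can then typeset only `span.math` elements instead of scanning the whole page for delimiters.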

But of course it’s also possible to do things like convert
the math to MathML. These are all rendering details which
don’t need to be decided here.


My 2c:

  1. Discussion of the rendering process (real-world demand) can be postponed in favor of focusing on syntax first.
  2. The current syntax does not allow selecting a math dialect (it expects TeX only). There are nice user-friendly alternatives like asciimath.

So:

  • Should we care about alternatives, or push everyone to use TeX?
  • If alternatives are welcome, can we auto-detect the dialect, or should it be declared explicitly?

My personal opinion: it’s worth giving asciimath a chance. TeX is “much better than nothing”, but very complicated unless you use it every day. TeX is a good default, with a guarantee that it can express everything. But support for a simplified alternative would be nice.

If a universal solution cannot be found in reasonable time, I’ll be OK with the existing TeX-only $$ and $ (or anything else of this kind approved by @jgm).


Thanks for this discussion! I just wanted to surface this issue as well, which includes a sample implementation in a fork of commonmarker: https://github.com/commonmark/cmark/issues/439


The current syntax does not allow selecting a math dialect (it expects TeX only). There are nice user-friendly alternatives like asciimath

Yeah, looking at the implementations, and from what most practicing mathematicians I’ve spoken to say, TeX-only support seems to be a feature, not a bug. Therefore I’d be fine with the $$ + $ syntax assuming TeX, but that still leaves room for people to implement a renderer that uses asciimath for code block rendering if they want to support it…

```asciimath
```

Code blocks are not a big deal. The problem is with inlines. If we want to get math right, we need to care about all cases. Or declare explicitly: “we don’t want alternatives in the math extension spec; everything else is rejected, TeX is enough” (at least that’s honest and clear for parser devs and users).

As I said before, math is a special case with high demand. The primary blocker is inline syntax. I’d expect these steps:

  1. Decide whether alternate dialects should be supported by the syntax. I like support for asciimath, but maybe other parties have different opinions.
  2. If yes, update the existing $$/$ syntax so dialects co-exist well (nobody is forced to support all dialects, but parsers should properly decline unknown ones).
    • Example: $am:.....$ is an inline equation marked as asciimath. If a parser does not support asciimath, it should not try to render it as TeX.
  3. Create a draft extension spec, recording who confirmed agreement with the final syntax. That will signal to parser devs: “you can start supporting this without risk of everything breaking in the future”.
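A rough sketch of the "decline unknown dialects" behaviour from step 2, assuming the hypothetical `$am:...$` prefix syntax. The registry contents, the regex and the function name are all illustrative assumptions, not part of any spec.

```python
import re
from typing import Optional, Tuple

# Hypothetical: the dialect tags this particular parser knows how to render.
SUPPORTED = {"tex", "am"}

# Hypothetical prefix syntax from the proposal: $am:...$
_PREFIX = re.compile(r"([A-Za-z]+):")


def resolve_dialect(math: str, default: str = "tex") -> Optional[Tuple[str, str]]:
    """Return (dialect, body), or None if the dialect is unknown.

    Content without a prefix uses the default dialect. An unknown prefix
    is declined rather than silently rendered as TeX.
    """
    m = _PREFIX.match(math)
    if m is None:
        return default, math
    if m.group(1) not in SUPPORTED:
        return None  # leave the source untouched
    return m.group(1), math[m.end():]
```

The important part is the `None` branch: a parser that does not know a dialect leaves the text alone instead of guessing.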

Note: I’m not pushing asciimath in particular, only asking everyone to make a decision about the syntax for the future. If the majority wants TeX only, I will accept that.

Side note: syntax like $<dialect>:...$ would open room for many other inline extensions (not only math). That’s nice.

@jgm what do you think?


I suspect most people who want to include serious math would prefer to use TeX, which they already know, instead of learning a new syntax like asciimath.

And providing optional support with $<dialect>:..$ would lead to a lot of extra verbosity.

So, I’d say just go with TeX. It’s tried and true, known by the
people who will be using this feature, and has the best support.

One more syntax issue that I forgot to mention before. MathJax
supports a certain number of LaTeX environments even outside of
$$..$$. In fact, some of these environments have to occur
outside of $$..$$, for example
\begin{align}...\end{align} or
\begin{equation}...\end{equation}.

Pandoc parses LaTeX environments as raw LaTeX and passes these
through to HTML when MathJax is used. I’m mentioning this
because support for $..$ and $$..$$ by themselves may not
cover everything mathematicians want…
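As a rough sketch (not Pandoc's actual implementation), a parser could recognize a top-level \begin{env} ... \end{env} block with a regular expression and pass it through unchanged. The pattern and function name are hypothetical, and nested environments with the same name are not handled.

```python
import re
from typing import List, Tuple

# Matches a whole \begin{env} ... \end{env} block starting at a line start;
# the backreference (?P=env) makes \end name the same environment as \begin.
ENV_BLOCK = re.compile(
    r"^\\begin\{(?P<env>[A-Za-z]+\*?)\}.*?^\\end\{(?P=env)\}[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def find_raw_environments(text: str) -> List[Tuple[str, str]]:
    """Return (environment name, verbatim block) pairs to pass through as raw LaTeX."""
    return [(m.group("env"), m.group(0)) for m in ENV_BLOCK.finditer(text)]
```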

See my case: ac_sc_grinder/math.md in the speedcontrols/ac_sc_grinder repository on GitHub. For me, writing TeX equations without an external editor is next to impossible (I use it very rarely). IMO, criteria like “serious” or “professional” are out of place for a user-friendly markup.

That verbosity applies only to alternate dialects. TeX would continue to use plain $...$ as the default engine. IMO that’s close to zero cost.

I’m not familiar with this topic. As far as I understand, you’re talking about an outer wrapper for blocks (not inlines), for example to center equations on the page. Why can’t those wrappers be part of each block? Is it just a desire to reduce duplication of wrappers, or something else?

I understand, but we are in a situation where sites “can’t wait anymore”. Each of them rolls its own custom markup. That will produce de facto standards, but with a chaotic result. Even an incomplete spec with “known issues” is better than no spec at all.

IMO, when a problem can’t be solved 100% at once, it can be partitioned:

  • By user needs
  • By sites

If 99.9% of users will be OK, it’s not rational to force them to wait more years until 100% are covered.

I just looked through my own markdown with math for occurrences of the “dialect” notation. It’s common when defining functions, for example $f:A \mapsto B$. I personally think it would be confusing if this were interpreted as “use the language ‘f’”.

I don’t have a position on this question (yet). Your point is good, but it is also easily addressed by coming up with a syntax that won’t clash with TeX, AsciiMath, or other syntaxes.


I have been thinking a little bit about the options being presented. One thing I keep coming back to is that LaTeX math is a type of code being rendered. There are a number of types of math you could want to display: asciimath, LaTeX, etc. There are also choices within these; for example, if you render with MathJax, there are different plugins that collide with each other (AMS and Physics math, for example, are not compatible).

If we choose the LaTeX-style delimiters $math$ or \begin, we are committing to one flavor, such as AMS math, without allowing the user to denote a different kind of math, like physics. Would it be more extensible for the recommended implementation to offer `asciimath 1 + 1 = 2`, where asciimath could be replaced by latex-ams or latex-physics?

Block math could then use the standard ```mathtype fenced syntax, and inline math a single `mathtype code span. What do people think?
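A minimal sketch of how the inline half of this proposal might be classified, assuming a code span whose first word names a registered math type. The registry and function name are hypothetical, loosely following the names in the proposal.

```python
from typing import Optional, Tuple

# Hypothetical registry of math types, loosely following the proposal's names.
MATH_TYPES = {"asciimath", "latex-ams", "latex-physics"}


def classify_code_span(content: str) -> Tuple[str, Optional[str], str]:
    """Classify the text inside a single-backtick code span.

    Returns ("math", math_type, body) when the first word is a registered
    math type followed by a formula, otherwise ("code", None, content).
    """
    first, _, rest = content.partition(" ")
    if first in MATH_TYPES and rest.strip():
        return "math", first, rest.strip()
    return "code", None, content
```

In this sketch only registered words switch a span to math, so ordinary code is unaffected; the flip side is that a literal code snippet that happens to begin with one of those words could no longer be shown as code.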

I agree with @jgm that the Pandoc way is good as $ is common in other parsers, and Pandoc refines the inline syntax as @jgm has noted. I think that TeX is needed for its common use by authors and for typographic features. I think that the basics are not hard to learn for simple use, but I do think that also supporting asciimath would be a good thing for simpler needs and where you want more source text readability (which I think many extension proposals are failing to consider).

Perhaps @jgm can speak to how easy TeX/asciimath auto detect would be; the idea being that we support the two (rather than unlimited) so we could automatically detect which and not have to encumber each equation with text naming the dialect.

Just a question about ```mathtype syntax for block math. How would we distinguish math code we want rendered versus math code we want shown literally as code? Could be something I’ve missed.

Ryan Gray writes:

Perhaps @jgm can speak to how easy TeX/asciimath auto detect would be; the idea being that we support the two (rather than unlimited) so we could automatically detect which and not have to encumber each equation with text naming the dialect.

I don’t think this can be done, unless we’re willing to accept some degree of inaccuracy. Obviously, if the text has backslashes in it, we can assume it’s TeX. [EDIT: Well, actually not, since \ is used in asciimath for set difference.] That will cover many cases. But there are plenty of strings that are ambiguous. For example, the sample on asciimath.org, sum_(i=1)^n i^3=((n(n+1))/2)^2, is valid TeX too, though it has a different meaning in TeX than in asciimath. Another case to consider would be x^23, which in TeX means “x squared times 3”, and in asciimath means “x to the 23rd power.”

So we could guess but the guesses might be wrong…
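To make the ambiguity concrete, here is how the two example strings come out under each reading, written as LaTeX so the two readings can be compared side by side:

```latex
% "sum_(i=1)^n i^3=((n(n+1))/2)^2" read as TeX:
% "sum" is three italic letters, and _ and ^ each take a single token.
sum_{(}i=1)^{n} i^{3} = ((n(n+1))/2)^{2}
% The same string read as asciimath:
% sum is the summation operator, (...) groups, and a/b is a fraction.
\sum_{i=1}^{n} i^{3} = \left(\frac{n(n+1)}{2}\right)^{2}

% "x^23" read as TeX: x squared, followed by 3
x^{2}3
% "x^23" read as asciimath: x to the 23rd power
x^{23}
```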

Just following up on this, I think one approach would be to parse the stuff between $s as math, without specifying what the format is. It would be up to the renderer to interpret it. So, for example, one website might specify that material in $s is asciimath, and render it between backticks so MathJax will interpret it as asciimath; another might say it’s TeX, and render it between $s so MathJax will handle it. A third might let this be user selectable, say, through a flag in metadata.

This would mean that the spec doesn’t say what material between $s means, other than that it’s math.
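A minimal sketch of this split, assuming the parser only produces a dialect-free math node and the renderer picks delimiters from a site default, optionally overridden by document metadata. The `Math` class, `render_math` function and the "math" metadata key are hypothetical; backticks are, as far as I know, MathJax's default asciimath delimiters, and $...$ for TeX assumes MathJax is configured to accept it.

```python
import html
from dataclasses import dataclass
from typing import Optional


@dataclass
class Math:
    """A parsed math node: the spec only says "this is math", not which dialect."""
    source: str
    display: bool = False


def render_math(node: Math, site_default: str, metadata: Optional[dict] = None) -> str:
    """Emit delimiters for whichever dialect this site (or document) chose.

    `site_default` and the "math" metadata key are hypothetical settings.
    """
    dialect = (metadata or {}).get("math", site_default)
    body = html.escape(node.source, quote=False)
    if dialect == "asciimath":
        # Backticks: MathJax's default asciimath delimiters (assumption).
        return f"`{body}`"
    if dialect == "tex":
        # Assumes MathJax is configured to accept $...$ and $$...$$ for TeX.
        return f"$${body}$$" if node.display else f"${body}$"
    raise ValueError(f"unsupported math dialect: {dialect!r}")
```

The parser's output is identical on every site; only the last step decides whether the same source is treated as TeX or asciimath.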

I think this is perfect: a hook for math that doesn’t incorporate any particular format. I like the defaults being site-dependent or specified by metadata, which could include the renderer options as well.

They have to occur outside math in a LaTeX document, where there is a concept of “text mode” (with math nested inside).

But MathJax only processes the math islands, and AFAIK it can process all such environments fine inside dollars (or whatever other delimiter you configure). It’s just an optional bonus that MathJax may detect them standalone in text.
Moreover, a non-buggy integration of math in markdown is unlikely to just let MathJax detect where math starts/ends (that way leads to \$ not preventing math rendering etc.) Normally the markdown processor should mark specifically where it already knows math to be (e.g. with <span class="math"> or similar) and later JS will call MathJax/KaTeX to typeset specific formulas. At that layer, there is no API distinction between “it was inside dollars” and “outside” — it just parses isolated formulas without context.

MathJax can even process \newcommand inside a formula (it has no concept of preamble vs body either).


The question is whether we want to have people write

$$
\begin{equation}
x = y^2
\end{equation}
$$

in their commonmark documents, given that this would produce an
error in LaTeX (the equation environment can’t occur inside math
mode). That might seem counterintuitive to people who
know LaTeX.