Keeping hints about alternate syntax used in the AST

For ordered lists, the “delimiter” used after the number is kept in the abstract syntax tree (AST) as an attribute. None of the other alternative syntaxes are reflected in the AST at all (see [Dingus]), e.g. emphasis (* vs. _), ATX vs. Setext headings, bullet list markers (* vs. + vs. -), thematic break characters (* vs. _ vs. -) and code fences (~ vs. `).

Why are enumerated list markers so special?

Filters working on the AST could be more versatile if they had access to that information in general.

[Dingus]: http://spec.commonmark.org/dingus/?text=1%20asterisk%20_1%20underscore_%202%20asterisks%20__2%20underscores__%203%20asterisks***%20___3%20underscores___%20____4%20underscores____%0A%23%20ATX%0ASetext%0A%3D%3D%3D%3D%3D%3D%0A1.%20Period%0A1)%20Parenthesis%0A%20Asterisk%0A%2B%20Plus%0A-%20Hyphen%0A____%0A----%0A**%0A~~~%0Atildes%0A~~~%0A%60%60%60%0Abackticks%0A%60%60%60%0A%5Binline%5D(%2Flink)%20%5Breference%5D%5Bid%5D%0A%0A%5Bid%5D%3A%20%2Flink%0A


The thought was that one might want to preserve the delimiter on output: LaTeX, HTML, etc. make it possible to have lists numbered with ) instead of .

With emphasis, it was thought, there’s no such need – there aren’t two kinds of emphasis.

That was the thinking. Of course, keeping the information in the AST would allow lots of possibilities. And it would be useful for converting back to CommonMark.
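To make the round-trip argument concrete, here is a minimal Python sketch of serializing a list node back to CommonMark. The `ListNode` class and its fields are hypothetical, not any real library's API: `delimiter` stands in for the attribute the AST already keeps, while `bullet` is the kind of hint that is currently missing, so without it a serializer can only fall back to a default marker.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ListNode:
    # Hypothetical AST node, for illustration only.
    ordered: bool
    items: List[str]
    start: int = 1
    delimiter: Optional[str] = None  # "." or ")" for ordered lists (kept by the AST today)
    bullet: Optional[str] = None     # "*", "+" or "-" (hypothetical hint, not kept today)


def to_commonmark(node: ListNode) -> str:
    """Serialize a list back to CommonMark, reusing the original marker when it is known."""
    lines = []
    for i, text in enumerate(node.items):
        if node.ordered:
            marker = f"{node.start + i}{node.delimiter or '.'}"
        else:
            marker = node.bullet or "-"  # default when no hint survived parsing
        lines.append(f"{marker} {text}")
    return "\n".join(lines)


print(to_commonmark(ListNode(ordered=True, items=["one", "two"], delimiter=")")))
print(to_commonmark(ListNode(ordered=False, items=["a", "b"], bullet="+")))
```

One marker per list node is enough, because in CommonMark a change of delimiter or bullet character starts a new list.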

That’s what I assumed.
The delimiters are not directly supported in HTML, though, only via CSS. But if we take that into account, there’s also a case for the bullets, which could be mapped to the presentational list types of HTML 3/4, e.g. * for disc, + for circle, - for square.
If I remember correctly, LaTeX does not support item styles for enumerate environments out of the box, i.e. a package like enumitem or enumerate is needed, or each item’s label has to be generated explicitly (including the number).
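As a rough sketch of the output side, assuming the hypothetical bullet mapping suggested above and the enumitem package for LaTeX, a renderer could choose the opening tag or environment from the marker. Neither function belongs to an existing renderer; the names are made up for illustration.

```python
# Hypothetical mapping from bullet character to CSS list-style-type,
# as suggested in this thread; not part of any spec or renderer.
BULLET_TO_CSS = {"*": "disc", "+": "circle", "-": "square"}


def html_ul_open(bullet: str) -> str:
    """Opening <ul> tag whose list-style-type reflects the source bullet."""
    style = BULLET_TO_CSS.get(bullet)
    return f'<ul style="list-style-type: {style}">' if style else "<ul>"


def latex_enumerate_open(delimiter: str) -> str:
    """Opening enumerate environment; the label= option needs \\usepackage{enumitem}."""
    if delimiter == ")":
        return r"\begin{enumerate}[label=\arabic*)]"
    return r"\begin{enumerate}"  # LaTeX's default label already ends in "."


print(html_ul_open("+"))
print(latex_enumerate_open(")"))
```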

If a hint about the actual marker used were available in the AST, someone could write filters or extensions that do something more reasonable, like the following:

  • Treat underscore emphasis as i and b in HTML5 output,
    maybe even u (or ins?) for quadruple underscores.
  • Auto-number ATX headings, don’t number Setext headings or
    don’t put Setext headings into an automatically generated TOC or
    generate HTML5 sections for ATX headings only or
    do any combination thereof.
  • Treat one or two of the “thematic break” alternatives as section boundaries in HTML5 output.
    (@Dmitry once suggested to use “boundary” instead of “horizontal rule” or “thematic break”.)
  • Render images or formatted text for certain code blocks with tilde fences, but restrict to syntax-highlighting for back-tick fences. They could also differ just in default values for the parser-dependent info-string “attributes”, e.g. preset language, line numbers, whitespace visibility.
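Two of these ideas can be sketched in Python, assuming hypothetical `marker` and `style` hints on emphasis and heading nodes (no current CommonMark AST provides them). Underscore emphasis becomes i/b, asterisk emphasis stays em/strong, and only ATX headings get numbers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Emph:
    text: str      # assumed to be already HTML-escaped
    strong: bool
    marker: str    # hypothetical hint: "*" or "_"


@dataclass
class Heading:
    level: int     # 1..6
    text: str
    style: str     # hypothetical hint: "atx" or "setext"


def render_emph(node: Emph) -> str:
    """Underscore emphasis -> <i>/<b>; asterisk emphasis -> <em>/<strong>."""
    if node.marker == "_":
        tag = "b" if node.strong else "i"
    else:
        tag = "strong" if node.strong else "em"
    return f"<{tag}>{node.text}</{tag}>"


def number_headings(headings: List[Heading]) -> List[str]:
    """Auto-number ATX headings only; Setext headings are left unnumbered."""
    counters = [0] * 6
    out = []
    for h in headings:
        if h.style != "atx":
            out.append(h.text)
            continue
        counters[h.level - 1] += 1
        for i in range(h.level, 6):  # reset all deeper levels
            counters[i] = 0
        number = ".".join(str(c) for c in counters[:h.level])
        out.append(f"{number} {h.text}")
    return out


print(render_emph(Emph("word", strong=False, marker="_")))
print(number_headings([
    Heading(1, "Intro", "atx"),
    Heading(2, "Details", "atx"),
    Heading(1, "Notes", "setext"),
    Heading(1, "Usage", "atx"),
]))
```

Skipping a level (e.g. an ATX level 3 heading directly under a level 1) would yield numbers like 1.0.1; a real filter would need a policy for that case.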