How to move ahead with extending CommonMark

I think that idea currently works because all block structure is encoded at the start of lines: indentation, list bullets, >, blank lines. Will this be a problem for any attempt to introduce block structure in the middle of a line?

Possibly silly questions: In what way do we want tables to be block structure? Could we consider only table start/end to be block structure, and cell boundaries to be inline structure?

Well, putting the code span and \| problems aside, it makes typographic sense to think of each cell as having a separate inline structure. For example:

| table | head |
|-------|------|
| A*B   | C*D  |
  • GFM and almost all table implementations treat these as unmatched asterisks, not markup.

  • a couple (maruku, s9e/TextFormatter) make B and/or D italic :bug:, but still treat C as a “fresh start of an independent cell”.

  • nobody makes B C italic across cells. Good! :relieved: That would make little sense as AST and would not fit HTML at all…

  • But multimarkdown and cebe/gfm have an interesting alternative: a single cell “AB | CD” where the inner | is NOT a cell separator, just regular text.

    => I guess this is what it means to treat cell boundaries as inline structure.
    I suspect it’s a bit more error-prone than parsing each cell separately, and previews will flicker more during editing… But at least it’s a consistent position!
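
To make the two positions concrete, here is a toy Python sketch. It is only a model of the behaviours listed above, not the code of any real parser: the names `cells_first` and `inline_first` are made up for illustration, and "emphasis" is reduced to a single `*...*` regex with no nesting or flanking rules.

```python
import re

# Toy emphasis rule (an assumption for illustration): *text* with no nesting.
EMPH = re.compile(r"\*([^*]+)\*")

def strip_outer_pipes(row):
    """Drop the optional leading and trailing pipe of a table row."""
    row = row.strip()
    if row.startswith("|"):
        row = row[1:]
    if row.endswith("|"):
        row = row[:-1]
    return row

def cells_first(row):
    """Position 1: cell boundaries are block structure.
    Split on every pipe, then look for emphasis inside each cell."""
    cells = [c.strip() for c in strip_outer_pipes(row).split("|")]
    return [EMPH.sub(r"<em>\1</em>", c) for c in cells]

def inline_first(row):
    """Position 2: cell boundaries are inline structure.
    Match emphasis over the whole row first; a pipe swallowed by an
    emphasis span is hidden behind a NUL placeholder so it no longer splits."""
    body = strip_outer_pipes(row)
    protected = EMPH.sub(
        lambda m: "<em>" + m.group(1).replace("|", "\0") + "</em>", body
    )
    return [c.strip().replace("\0", "|") for c in protected.split("|")]

row = "| A*B   | C*D  |"
print(cells_first(row))   # ['A*B', 'C*D']
print(inline_first(row))  # ['A<em>B   | C</em>D']
```

The first function gives two cells with unmatched asterisks; the second gives the single-cell result, with the inner | kept as text.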

Also, what about escaped \| outside backticks resulting in a single cell with textual “|”?
Well, backslashes can inhibit block AND inline constructs in markdown, so it’s consistent with both positions. And it’s important to have a way to spell “|” inside a cell (other than ugly character references such as &#124; or &vert;) :+1:.
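
As a rough sketch of what this means for a row splitter (hypothetical helper `split_cells`, not taken from any implementation): scan the row character by character, treat \| as a literal pipe, and split only on unescaped pipes. Other backslash escapes are left untouched here.

```python
def split_cells(row):
    """Split a table row on unescaped pipes; a backslash-escaped pipe
    becomes a literal '|' inside the cell text."""
    body = row.strip()
    if body.startswith("|"):
        body = body[1:]
    # Only drop a trailing pipe if it is not itself escaped.
    if body.endswith("|") and not body.endswith("\\|"):
        body = body[:-1]

    cells, current, i = [], [], 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body) and body[i + 1] == "|":
            current.append("|")   # escaped pipe: keep as text
            i += 2
        elif ch == "|":
            cells.append("".join(current).strip())  # real cell boundary
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    cells.append("".join(current).strip())
    return cells

print(split_cells(r"| E*F  \| G*H  |"))  # ['E*F  | G*H']
print(split_cells("| A*B   | C*D  |"))   # ['A*B', 'C*D']
```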

Now let’s talk code spans. I’d think that if we want:

| I`J   | K`L  |

to mean a single cell with “IJ | KL” content, we had better treat “AB | CD” similarly.

Unfortunately, the reality is more fragmented: https://babelmark.github.io/?text=|+table+|+head+| |-------|------| |+A*B+++|+C*D++| |+E*F++\|+G*H++| |+I`J+++|+K`L++| |+M`N++\|+O`P++|

  • github/cmark and a few others are consistent in first parsing cell boundaries, then treating A*B and I`J as an unterminated asterisk and backtick.

  • markdown-it and a few others split A*B into its own cell but produce a single cell with a J | K code span.

  • maruku does the opposite! But its table support is weird in other ways, and apparently it doesn’t allow escaping | in any way: neither \ nor a code span nor even \ inside a code span :-1:

  • multimarkdown consistently treats all 4 combinations as a single cell. But nobody else does.

  • There is more variation in whether \| inside a code span becomes | or \| in the output :confused:
    I’ll post more thoughts about this soon.
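
For the “code span wins over the cell separator” behaviour, here is a minimal sketch of a splitter that skips pipes inside a matched backtick pair. It is a simplified model of the behaviour described above for markdown-it, not its actual code; `find_closing_run` and `split_code_span_aware` are hypothetical names, and backslash escapes are deliberately ignored. The only rule borrowed from CommonMark is that a code span is closed by a backtick run of the same length as the one that opened it.

```python
def find_closing_run(text, pos, n):
    """Return the index just past the next backtick run of exactly
    length n at or after pos, or -1 if there is none."""
    i = pos
    while i < len(text):
        if text[i] == "`":
            j = i
            while j < len(text) and text[j] == "`":
                j += 1
            if j - i == n:
                return j
            i = j
        else:
            i += 1
    return -1

def split_code_span_aware(row):
    """Split a row on pipes, except pipes inside a closed code span."""
    body = row.strip()
    if body.startswith("|"):
        body = body[1:]
    if body.endswith("|"):
        body = body[:-1]

    cells, start, i = [], 0, 0
    while i < len(body):
        if body[i] == "`":
            j = i
            while j < len(body) and body[j] == "`":
                j += 1                      # measure the opening run
            close = find_closing_run(body, j, j - i)
            i = close if close != -1 else j  # skip the span only if it closes
        elif body[i] == "|":
            cells.append(body[start:i].strip())
            start = i + 1
            i += 1
        else:
            i += 1
    cells.append(body[start:].strip())
    return cells

print(split_code_span_aware("| I`J   | K`L  |"))  # ['I`J   | K`L']
print(split_code_span_aware("| A*B   | C*D  |"))  # ['A*B', 'C*D']
```

An unterminated backtick falls back to plain splitting, so a row like “| I`J | K |” still yields two cells.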