Tables in pure Markdown

@ebruchez Any implementation which uses “|” to delimit table cells (or whatever syntax construct), but

  1. does not provide the escape sequence “\|” to “hide” that character, and
  2. does not treat it as data inside a code span

seems pretty broken, in particular since (2.) already holds for any “markup-relevant” character in the very first Markdown description by Gruber.


That said, based on examples I tried in BabelMark, botched escape-sequence recognition seems to be a fairly common problem in Markdown implementations, and using a character reference for “|”, as you did, is in fact the most robust workaround (and sometimes the only one). For a simple example:

*foo \* bar*

will not render as “foo * bar” (wrapped in <EM>) in all implementations, and even less so

*foo * bar*

(though I’m pretty sure that both forms should, by very basic Markdown rules), but

*foo &#42; bar*

will in every implementation employed there (even in some really dumb ones!).
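
To make the workaround concrete, here is a minimal Python sketch of producing and decoding such numeric character references. The helper names `char_ref` and `decode` are made up for illustration; `html.unescape` only stands in for what an HTML consumer does with the reference and is not how any particular Markdown implementation works.

```python
import html

def char_ref(ch: str) -> str:
    """Return the decimal numeric character reference for one character."""
    return f"&#{ord(ch)};"

def decode(text: str) -> str:
    """Decode character references the way an HTML consumer would."""
    return html.unescape(text)

# Hide the inner asterisk from the emphasis parser.
escaped = f"*foo {char_ref('*')} bar*"
# The same trick hides a pipe from a table parser.
pipe_ref = char_ref("|")
```

The reference contains no markup-relevant character, so it passes through the Markdown layer untouched and only becomes “*” or “|” again when the output is rendered.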


@Dmitry Hmm, now that you quote it from the specification, the use of the term “precedence” in this context doesn’t feel quite right—am I the only one having this hunch?

Indeed, select implementations of CommonMark use the term priority, but I don’t see much harm in using precedence in this context.


@tin-pot I am not sure which implementation this is (it’s the one used by gitbook). I agree it’s quite broken. Hopefully CommonMark can make sure these kinds of scenarios are fully covered, and if the core of CommonMark already covers escapes and code spans properly, then even better!

JFTR Gitbook uses kramed, a fork of marked which is supposedly compatible with kramdown. Kramdown is well known to be the table-greediest of them all.


I have started work on making libcmark extensible; see https://github.com/jgm/cmark/issues/100 for the (pretty long) discussion. My test case / use case for this is tables, which seem to be something a lot of other people want too, and I was made aware of the escaping problem through the topic “Parsing strategy for tables?”.

My humble opinion is that, since it seems accepted that block-level rules take precedence over inline rules, the correct approach is to match lines against the table row rules while ignoring all backslash-escaped pipes. If a line matches and a table row block is created, the backslash should be removed from each escaped pipe before inlines are parsed. Thus:

| A cell `\|` with a pipe | another cell |

should be interpreted as a table row with two columns, the content of the first cell being translated to:

A cell `|` with a pipe

before inlines in it are parsed.
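
A rough Python sketch of that rule, for readers who want to experiment with it. This is not the code from the cmark extension branch; `split_table_row` is a hypothetical helper, and it assumes rows carry both a leading and a trailing pipe.

```python
import re
from typing import List, Optional

# A pipe that is not directly preceded by a backslash delimits a cell.
# Simplification: an escaped backslash followed by a pipe is not handled.
_UNESCAPED_PIPE = re.compile(r"(?<!\\)\|")

def split_table_row(line: str) -> Optional[List[str]]:
    """Split a pipe-delimited row into cells, or return None if it is not a row.

    Escaped pipes are ignored while matching the row; the backslash is then
    dropped so that inline parsing later sees a plain pipe.
    """
    parts = _UNESCAPED_PIPE.split(line.strip())
    # A row with leading and trailing pipes yields empty first and last parts.
    if len(parts) < 3 or parts[0] != "" or parts[-1] != "":
        return None
    return [cell.strip().replace("\\|", "|") for cell in parts[1:-1]]

cells = split_table_row(r"| A cell `\|` with a pipe | another cell |")
```

The block-level pass never looks inside code spans here, which is exactly the point: the row is recognised first, and the code span only comes into play once the cell text is handed to the inline parser.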

That’s the behaviour my test extension now implements at https://github.com/MathieuDuponchelle/cmark/commits/extensions_draft_3 .


Table headers should not be required (HTML doesn’t require them… :relaxed:) Instead, a single pipe character should be accepted as a ‘no-header’ table opener. (Bitbucket’s parser does this already FWIW)

The source would look like:

|
|------|------|
| cell | cell |
| cell | cell |

Thoughts? It’s unambiguously a table but most engines parse it as text unless it looks like this:

| | |
|------|------|
| cell | cell |
| cell | cell |

Of course, requiring users to count pipes when they aren’t using a header row seems, well, like a recipe for disaster (or at least for parsing errors).

Why not just

|------|------|
| cell | cell |
| cell | cell |
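
For illustration, a small Python sketch of how a parser might accept a table that opens directly with the delimiter row. The function names are hypothetical and the cell splitting is deliberately naive (no escapes, no code spans); it only shows that the headerless case is easy to tell apart from the usual one.

```python
import re

_DELIM_CELL = re.compile(r":?-+:?")

def split_cells(line):
    """Naive cell split: strip outer pipes, then split on every pipe."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]

def is_delimiter_row(line):
    """True for rows such as |------|------| (optional alignment colons)."""
    stripped = line.strip()
    if "|" not in stripped:  # avoid mistaking a thematic break for a table
        return False
    return all(_DELIM_CELL.fullmatch(cell) for cell in split_cells(stripped))

def parse_table(lines):
    """Return (header, rows); header is None when the table has no header."""
    if lines and is_delimiter_row(lines[0]):
        header, body = None, lines[1:]
    elif len(lines) >= 2 and is_delimiter_row(lines[1]):
        header, body = split_cells(lines[0]), lines[2:]
    else:
        return None
    return header, [split_cells(row) for row in body]
```

With this shape the column count comes from the delimiter row itself, so nobody has to count pipes in an empty header.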

Support for table cells that wrap over multiple lines in the Markdown source would be useful for platforms where files have fixed or limited record (line) length.

For example, on IBM z/OS (mainframes), record length might be limited to 80 characters.

I’m using markdown-it (thank you, Alex and Vitaly!), with GFM table support, to render Markdown on the mainframe. In a file with 80-byte records, this works well for tables with a small number of columns and short cell values - that is, where each row occupies less than 80 bytes - but beyond that, tables are unusable in this context.

I have colleagues who maintain tables as plain-text in files with 80-byte records, where table cells wrap over multiple lines. I’d like to be able to suggest some flavor of Markdown (preferably with an existing markdown-it plugin) to format those tables.

(Vitaly, I’ve read, understood, and agree with your comments about implementing GFM table formatting in markdown-it.)


Pandoc supports grid tables with multiline cells which are suited for fixed width content.

I have recently implemented the same feature in Markdig.

2 Likes

How about supporting Wikitext-style tables, along with the other syntaxes e.g.
pipe tables?

It’s sort of a thin layer on the traditional <table>, <th>, <td>, etc.
markup tags.

It supports declaring “flat” tables; tables where you don’t have to manually
format and reformat stuff like pipes. Also allows you to create big tables
without going over line-length limits.

Here’s an example:

{| class="wikitable"
|+ Table caption; The quick brown fox jumps over the lazy dog.
|-
! Header 1
! Header 2
! Header 3
|-
| row 1, cell 1
| row 1, cell 2
| row 1, cell 3
|-
| row 2, cell 1
| row 2, cell 2
| row 2, cell 3
|- style="text-align: center;"
| row 3, cell 1 || row 3, cell 2 || row 3, cell 3
|-
| The quick brown fox jumps over the lazy dog.
|
The five boxing wizards jump **quickly**.

> Whoo yeah this is a blockquote.

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.
| 
{|
|-
| Tis a nested table
| with two cells per row.
|-
| One two three
| Four five.
|}
|}
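
To give a feel for how thin the layer is, here is a minimal Python sketch that reads only the flat subset of this syntax: row separators, header cells, data cells and the `||` shorthand. It is an illustration under heavy assumptions, not MediaWiki's parser: `parse_flat_wikitable` is a hypothetical name, attributes and captions are ignored, and multi-line cell content and nested tables are skipped entirely.

```python
def parse_flat_wikitable(src):
    """Parse flat wikitext table markup into rows of (kind, text) tuples."""
    rows, current = [], None
    for raw in src.splitlines():
        line = raw.strip()
        # Table open/close and caption lines carry no cells.
        if line.startswith(("{|", "|}", "|+")):
            continue
        if line.startswith("|-"):  # row separator; attributes are ignored
            current = []
            rows.append(current)
            continue
        if line.startswith(("!", "|")):
            if current is None:  # cells before the first |- start a row
                current = []
                rows.append(current)
            kind = "th" if line[0] == "!" else "td"
            separator = "!!" if kind == "th" else "||"
            for cell in line[1:].split(separator):
                current.append((kind, cell.strip()))
        # Any other line would be continued cell content; skipped in this sketch.
    return [row for row in rows if row]
```

Handling the multi-line and nested cases is where most of the real work would be, which is worth keeping in mind when weighing the advantages below.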

Advantages

  • Easy to write and is arguably more maintainable than pipe/grid tables (which
    is basically ASCII art.)
    • Writing long paragraphs is much simpler. No need to fiddle with pipe
      characters.
  • Diff-friendly. Adding or removing a cell (assuming one cell per line) means 1
    line edited in diffs.
  • Large amount of formatting options; again this is due to it being a thin
    layer for HTML.
    • Declaring HTML attributes is dead easy. This makes it possible to use
      rowspan, colspan and friends.
  • Used in one of the largest and most edited (?) sites in the world.

Disadvantages

  • Nesting tables — though less inconvenient compared to other formats — is
    unreadable when you have large tables.
  • Verbose.
  • Looks less like a table and more like markup.

Some modifications for Markdown.

Here are some thoughts to make it better for Markdown.

Starting and closing tags, plus shorthand attributes:

[| {.class1.class2 #royal-pain style="text-align: center"}
// table code here //
|]

Readable nested elements:

[|
|-
| And in the end, the love you take is equal to the love you make.
|
:: Here's a table inside a table.
:: 
:: [|
:: |-
:: | Tis a nested table
:: | with two cells per row.
:: |-
:: |
:: :: Here's another nested table
:: :: [|
:: :: |-
:: :: | One plus || Two plus || Three plus
:: :: |]
:: |]
| Closing thoughts.
|]

Nah, MediaWiki tables are not really something CommonMark should adopt, due to the disadvantages you listed. The ASCII-art look of pipe and grid tables is exactly what makes them fit well with the Markdown spirit. Anyhow, if you like the {| intro, I guess it would mix with code fence info strings like this:

||| .class1 .class2 #royal-pain style="text-align: center"
table code here
|||

Nesting is done by indentation in Markdown, so your proposed :: would be a deviation from that.


I’d be inclined to give a lot of weight to this disadvantage, given the fundamental focus of Markdown on source readability. (Easy-to-read wins over easy-to-write with Markdown; not all lightweight markup formats make that same choice, but it is what distinguishes Markdown.)


Just landing on this website. I read that two years ago the table spec was set aside as not being part of the ‘core features’. Two years on, I don’t see a clear intention of incorporating tables into the core.

This is quite disappointing, as there are major players that have supported tables for years (e.g. GitHub). I would be inclined to take the syntax of one of those major players, like GFM, as the standard.

Is there any serious and clear roadmap for tables?


There’s a clear intention to add tables, yes. But it has proved difficult enough to get all the details ironed out for the core elements, and that should happen first. Also, as you can see above, there are a lot of tricky issues to resolve in deciding on an appropriate table syntax. In the meantime, nothing stops you from using an implementation that supports CommonMark plus an unofficial table extension. (And there are many implementations that offer this, e.g. markdown-it.)


As time passes, de facto standards emerge. Any attempt to reinvent the wheel will cause a bigger conflict. I think it is time for you guys to take the common denominator of the major players and write down a basic proposal for tables. I guess that would be pipes and hyphens.


Welcome to the site!

I think the main reason tables are included in extensions, rather than in the CommonMark core, is that the original Markdown syntax by John Gruber did not include tables at all. Many other flavors and implementations of Markdown don’t include tables either, not even basic ones with pipes and hyphens.

There’s something to that. I suggested something similar as a mostly-least-common-denominator approach. On the other hand, I would worry about trying to impose any sort of table standard where none exists.

CommonMark isn’t quite a “standards committee,” it’s really just yet another Markdown flavor, but even so I feel our main goal is to specify rather than to innovate. Given the huge variety of table syntaxes and potential conflicts, it might be better to step back and let the major players fight it out as the de facto standardization process runs its course.


I indeed like the idea of leaning towards the wikisyntax. One property it has is that it’s relatively easy to write without the help of any sophisticated editor (once you know the syntax, which looks rather difficult).

Optionally put new cell on new line (with leading space)

My point would be to mix in the possibility of writing a new cell on a new line into pipe tables, like this:

  • ^[|].* indicates new row (starts with pipe)
  • ^\s+[|].* indicates new cell (starts with spaces then pipe)

Or rather:

  • replace(/\n^\s+[|]/gm, "|") – remove the newline from lines that start with spaces before the pipe (inside identified tables). See my example at regextester.com
  • then evaluate the table as usual → This means there would be no big changes to pipe tables, I guess.
  • Also works with tables that lack leading or trailing pipes, since the replace rule would only match inside the rows.

For example:

| A1
 | B1
 | C1
 | D1 
|---|---|                       
| A2
 | B2
 | C2
 | D2
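
The same preprocessing step, sketched in Python rather than JavaScript so it can be tried outside a browser. `join_cell_lines` is a hypothetical name, and unlike the `\s+` in the rule above this version only accepts spaces and tabs as indentation, so a blank line inside the table is never swallowed (that restriction is my assumption, not part of the proposal).

```python
import re

# A newline followed by indentation and a pipe continues the current row.
_CONTINUED_CELL = re.compile(r"\n[ \t]+\|")

def join_cell_lines(table_src: str) -> str:
    """Fold indented '| cell' lines back onto the row they belong to."""
    return _CONTINUED_CELL.sub("|", table_src)

source = "\n".join([
    "| A1", " | B1", " | C1", " | D1",
    "|---|---|---|---|",
    "| A2", " | B2", " | C2", " | D2",
])
joined = join_cell_lines(source)
```

After this pass an existing pipe-table parser can take over unchanged, which is the appeal of doing it as a pure text substitution.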

Up to this point, everything in my proposal is optional. There is no need to use it for small tables that are compact enough to fit into the editor’s view anyway. The point is that tables with crowded cells become easier to survey.

| A1 | B1    | C1 | D1 |
|----|----|----|----|
| A2 | B2
    | C2: Lorem ipsum dolor sit amet, qui repudiare dissentias mediocritatem ut, quod consequat ex qui, ius in quaerendum repudiandae. Eu soleat repudiandae quo, est ullamcorper definitiones ut, cu augue sententiae quo. Ea casem nihil scaevola has, eu consul propriae pro. Nec feugait corrumpit te, est ut mollis bonorum.
    | D2 |

You could also space out the table so that it is easy to grasp which cell is in which column, even if the cells have content of different lengths.

| A1 Lorem ipsum dolor sit amet, qui repudiare 
    | B1 dissentias mediocritatem ut, quod consequat ex qui, ius in quaerendum repudiandae. 
        | C1 Eu soleat repudiandae cu augue sententiae
            | D1 Ea casem nihil scaevola has, 
|---|---|---|---
| A2 repudiare dissentias mediocritatem ut, quod 
    | B2 repudiandae quo, est ullamcorper
        | C2 est ut mollis bonorum.
            | D2 Nec feugait corrumpit 

Block level elements

However, if we changed the syntax a bit, it might enable us to use block-level elements inside tables, which would be really awesome!

| A1 | B1    | C1 | D1 |
|----|----|----|----|
| A2 | B2
    | - C2 ex qui
      - Lorem ipsum
      - dolor sit amet,
      - qui repudiare
    | D2 |

Compressed tables

I think this could be nicely combined with the idea of compressed pipe tables mentioned by @mofosyne with - or = to mark <TH>.

|= A1
 |= B1
 |= C1
 |= D1                        
|  A2
 |  B2
 |  C2
 |  D2

I personally prefer the = visually. Plus, maybe the combination |- could be used to start a new row more clearly; then you could even write a table on one single line, like this: |= A1 | B1 |- A2 | B2. (But this would also require more changes to the syntax than just removing newlines.)

The |= might also be used to mark a <TH> anywhere in the table. The most common use case might be flipped tables, where the headers are on the left instead of on top (i.e. A1 and A2 would be headers in the examples above).

Align could be incorporated into the header marker like this: |:= left align, |:=: center align, |=: right align, or like @mofosyne proposed e.g. |:= header =: for center align.
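
A small Python sketch of how the first variant of these markers could be read off a cell. `parse_cell` is a hypothetical helper that receives the text following the pipe; the trailing `=:` form of the other variant is not handled.

```python
import re

# Optional colon, '=', optional colon, then the cell text.
_HEADER_MARKER = re.compile(r"^(:?)=(:?)\s*(.*)$")

_ALIGNMENT = {
    ("", ""): None,
    (":", ""): "left",
    (":", ":"): "center",
    ("", ":"): "right",
}

def parse_cell(cell: str):
    """Return (is_header, alignment, text) for the text following a pipe."""
    stripped = cell.strip()
    match = _HEADER_MARKER.match(stripped)
    if match is None:
        return False, None, stripped
    left, right, text = match.groups()
    return True, _ALIGNMENT[(left, right)], text.strip()
```

One consequence worth noting: a data cell whose content happens to start with “=” would be read as a header, so the marker would need an escape or a required space rule.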

Rowspan

For rowspan I would suggest the “obvious” symbol for continuation, i.e. the three dots that form the ellipsis (...). E.g. in the following example, C2 and C3 would be merged into one cell that spans two rows.

|  A1  |  B1 |  C1  |  D1  |
|------|-----|------|------|
|  A2       ||  C2  |  D2  |
|  A3  |  B3 |  ... |  D3  |
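
Roughly how a parser could resolve this, as a Python sketch. It assumes the rows have already been split into cell strings, ignores colspan, and uses a made-up function name `resolve_rowspans`; each merged cell is represented as None.

```python
def resolve_rowspans(rows):
    """Turn '...' cells into a rowspan on the nearest real cell above them."""
    result = []
    owners = {}  # column index -> row index of the cell that currently spans it
    for r, row in enumerate(rows):
        out = []
        for c, text in enumerate(row):
            if text.strip() == "..." and c in owners:
                # Go back and extend the cell that owns this column.
                result[owners[c]][c]["rowspan"] += 1
                out.append(None)
            else:
                out.append({"text": text.strip(), "rowspan": 1})
                owners[c] = r
        result.append(out)
    return result

grid = resolve_rowspans([
    ["A2", "B2", "C2", "D2"],
    ["A3", "B3", "...", "D3"],
])
```

The span is only known once the lower row has been read, so the earlier cell is patched after the fact; an ellipsis in the very first row has nothing above it and is kept as literal text.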

I just noticed that with my ellipsis suggestion the parser would learn about the rowspan relatively late (only when it reaches cell C3), and only then could it modify the rowspan attribute of cell C2. I don’t know whether this would slow parsing down much, but alternatively we could pick a rowspan symbol that is simply placed in the same position as the repeated bars used for colspan.

Visually, I’d propose the exclamation mark (!), because of its vertical nature and the way the dot below the stroke points downwards like an arrow. It’s also relatively symmetric to the double-bar syntax (repeat the ! if a bigger rowspan is needed). Just like in HTML, the cell below would simply be omitted. The only disadvantage I see is that it’s relatively difficult to spot the difference between the bar and the exclamation mark. (The full stop would probably be used too often as the last symbol of a cell and would therefore trigger unwanted behaviour; I doubt, however, that there’s much shouting in tables.)

|  A1  |  B1 |  C1  |  D1  |
|------|-----|------|------|
|  A2       ||  C2 !|  D2  |
|  A3  |  B3        |  D3  |

A slightly extended Markdown table syntax, proposed by David Wheeler in 2009, includes “continued content”. He proposes the colon (:) to replace the pipe (|) in lines that continue cells from the preceding line. I would propose the apostrophe (') instead, but that’s merely a matter of taste.

|   |   |   |
|---|---|---|
| a | b | c |
: a : b : c :
|   |   |   |

Variant with ':

|   |   |   |
|---|---|---|
| a | b | c |
' a ' b ' c '
|   |   |   |
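
A Python sketch of how such continuation lines could be folded back into their cells. This is my reading of the idea, not Wheeler's specification: `merge_continuations` is a hypothetical name, only body rows are passed in, and a cell that itself contains the marker character would be split wrongly.

```python
def merge_continuations(lines, marker=":"):
    """Merge rows delimited by `marker` into the preceding pipe-delimited row."""
    rows = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue
        lead = stripped[0]
        cells = [cell.strip() for cell in stripped.strip(lead).split(lead)]
        if lead == "|":
            rows.append(cells)
        elif lead == marker and rows:
            # Append each continued cell to the cell above it, keeping the
            # line break so block-level content can be parsed afterwards.
            rows[-1] = [
                "\n".join(part for part in (above, below) if part)
                for above, below in zip(rows[-1], cells)
            ]
    return rows

colon_rows = merge_continuations(["| a | b | c |", ": a : b : c :"])
quote_rows = merge_continuations(["| a | b | c |", "' a ' b ' c '"], marker="'")
```

Because the pieces of one cell are rejoined with newlines, each cell ends up as a small multi-line document of its own.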

The big advantage of this syntax is that it allows block-level-elements.

However, the drawback may be that (just like grid tables) it breaks the text into several parts, separated by unrelated other elements; e.g. if you search the text for “a a” you won’t find it, because the raw text reads “a b c a”. This could be a problem for git/diff or the like.

In comparison, the syntax I proposed above (with space + bar on a new line) is more git/diff-friendly, but doesn’t retain the form of a table as much.
