Tables in pure Markdown

I would like to say at this point that I don’t care much which table format is supported, only that at least one is.

There is already a tables extension as part of the GitHub Flavored Markdown spec (which is a superset of CommonMark). You can use this today. If a tables extension is ever formalised as part of the CommonMark project, it would need to aim for compatibility with GFM since the GFM table syntax is already widely used and the goal of CommonMark is to be highly compatible with existing implementations.


I agree that compatibility with the widely used pipe table syntax is a good idea. Here are some thoughts I wrote up a couple years ago, towards a spec for tables that is largely compatible with existing pipe tables but more flexible:

I tentatively agree with the current syntax’s decision that pipes | create cell structure, and that literal pipes need to be escaped even inside code backticks. This is a departure from the general principle that nothing needs to be escaped inside code backticks, but it conforms to the general CommonMark idea that block structure is discerned prior to inline structure.
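To make the "block structure first" point concrete, here is a minimal Python sketch of splitting a row into cells before any inline parsing happens. The function name `split_cells` is hypothetical, and the rules it encodes (leading and trailing pipe required, `\|` is the only way to get a literal pipe) are just my reading of the paragraph above, not any existing implementation.

```python
def split_cells(line):
    """Split one pipe-table row into cell strings.

    Assumes the row must begin and end with an unescaped pipe, and that
    every unescaped pipe is a cell boundary, even inside backticks.
    """
    text = line.strip()
    if len(text) < 2 or not (text.startswith("|") and text.endswith("|")):
        raise ValueError("row must begin and end with a pipe")
    cells, current = [], []
    i = 1  # skip the leading pipe
    while i < len(text):
        if text.startswith("\\|", i):
            current.append("|")  # escaped pipe: keep it as a literal
            i += 2
        elif text[i] == "|":
            cells.append("".join(current).strip())  # unescaped pipe closes a cell
            current = []
            i += 1
        else:
            current.append(text[i])
            i += 1
    if current:
        # Leftover text means the final pipe was escaped.
        raise ValueError("final pipe is escaped, so the row is not closed")
    return cells
```

Because the split runs before inline parsing, a code span such as `` `x \| y` `` survives as one cell, and only afterwards would an inline parser see the backticks.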

Headers should be optional. So, this should be a table:

| a | b |

This too:

|:--:|--:|
| a  | b |

I think | should be required at the beginning and end of the row. I’d like to reserve this syntax for line blocks:

| 15 Main St.
| Chicago, IL

Note that existing pipe tables allow the leading and trailing pipe to be skipped. However, this makes it harder for parsers to tell right away whether we have a table line, and there’s also the issue flagged above about line blocks. Finally, especially if headers are not required, this increases the probability that a line with a literal | will be wrongly treated as a table.

Alignments should be supported, in the now-standard way:

| right | left | center |
| ---:  | :--- | :----: |
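As a rough illustration of how the alignment markers in a delimiter row like the one above could be read, here is a small Python sketch. `parse_alignments` is a made-up name, and treating any row that fails the pattern as "not a delimiter row" is my assumption.

```python
import re
from typing import List, Optional

# One delimiter cell: optional colon, one or more hyphens, optional colon.
_DELIM_CELL = re.compile(r"^\s*(:?)-+(:?)\s*$")


def parse_alignments(delimiter_row: str) -> Optional[List[Optional[str]]]:
    """Return one alignment per column, or None if this is not a delimiter row."""
    text = delimiter_row.strip()
    if len(text) < 2 or not (text.startswith("|") and text.endswith("|")):
        return None
    alignments: List[Optional[str]] = []
    for cell in text[1:-1].split("|"):
        match = _DELIM_CELL.match(cell)
        if match is None:
            return None  # an ordinary content row, not a delimiter row
        left, right = match.group(1) == ":", match.group(2) == ":"
        if left and right:
            alignments.append("center")
        elif right:
            alignments.append("right")
        elif left:
            alignments.append("left")
        else:
            alignments.append(None)  # no explicit alignment requested
    return alignments
```

For the delimiter row in the example this yields right, left, center, matching the header labels. It also accepts a header-less table that opens directly with `|:--:|--:|`.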

What to do if the rows have different numbers of columns? Presumably just add empty columns to all the rows. But there’s a question whether we should take the headers to determine the number of columns (and truncate body rows if needed) or just take the maximum number of cells in any row.

That would be the minimum.
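The two candidate policies for ragged rows (the header decides the width, or the widest row decides) are easy to compare side by side. A sketch in Python, with a hypothetical `normalize_rows` helper that works on already-split cell lists:

```python
def normalize_rows(rows, header=None):
    """Pad (and, if a header is given, truncate) rows to a common width.

    With a header, its length fixes the column count and longer body rows
    are cut. Without one, the widest row wins and nothing is lost.
    """
    if header is not None:
        width = len(header)
    else:
        width = max((len(row) for row in rows), default=0)
    # A negative pad count multiplies to an empty list, so truncated rows
    # are returned at exactly `width` cells.
    return [list(row[:width]) + [""] * (width - len(row)) for row in rows]
```

With rows `[["a"], ["b", "c", "d"]]` the max-width policy gives three columns, while passing a two-cell header drops the `"d"`. Silent truncation like that is the main argument I can see against letting the header decide.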

I think colspans could be supported thus:

| this spans two columns  || third column |
| this spans three columns              |||
| aaa      | bbb          |  cccc         |

This means that if you want a blank cell you need space between the ||s. This would be a departure from the way pipe tables currently work.
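A minimal sketch of the colspan rule just described, assuming a cell's span is simply the number of pipes that close it. `parse_spanning_row` is a hypothetical name, and escaped pipes are deliberately ignored here to keep the example short.

```python
import re


def parse_spanning_row(line):
    """Return a list of (text, colspan) pairs for one row.

    Assumption: a cell closed by N consecutive pipes spans N columns, so an
    empty cell must contain at least one space between its pipes.
    Escaped pipes are not handled in this sketch.
    """
    text = line.strip()
    if len(text) < 2 or not (text.startswith("|") and text.endswith("|")):
        raise ValueError("row must begin and end with a pipe")
    cells = []
    # After the leading pipe, each cell is a run of non-pipe characters
    # followed by the run of pipes that closes it.
    for match in re.finditer(r"([^|]+)(\|+)", text[1:]):
        cells.append((match.group(1).strip(), len(match.group(2))))
    return cells
```

Note how `| aaa | | ccc |` still produces an empty middle cell, while `| aaa || ccc |` would make `aaa` span two columns. That is exactly the departure from current pipe tables mentioned above.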

Maybe for rowspans:

| this spans two rows |  bbb |
|^                    |  ccc |

These can be combined:

| this spans two rows and two columns|| bbb |
|^                                   || ccc |
| aaa             | bbb               | ccc |
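Here is one way the ^ marker could be resolved once rows have been split into (text, colspan) pairs. This is only a sketch: `resolve_spans` is a hypothetical name, and the rule that a ^ cell must start in the same column and have the same colspan as the cell it extends is my assumption.

```python
def resolve_spans(rows):
    """Turn rows of (text, colspan) pairs into cells with row and column spans.

    A cell whose text is exactly "^" extends the cell directly above it,
    provided that cell starts in the same column with the same colspan.
    Otherwise the "^" is kept as ordinary text.
    """
    cells = []       # resolved cells, in document order
    open_cells = {}  # starting column -> cell that the next row may extend
    for r, row in enumerate(rows):
        col = 0
        next_open = {}
        for text, colspan in row:
            above = open_cells.get(col)
            if text == "^" and above is not None and above["colspan"] == colspan:
                above["rowspan"] += 1
                next_open[col] = above
            else:
                cell = {"row": r, "col": col, "rowspan": 1,
                        "colspan": colspan, "text": text}
                cells.append(cell)
                next_open[col] = cell
            col += colspan
        open_cells = next_open
    return cells
```

Fed the combined example above, the first cell comes out with `rowspan == 2` and `colspan == 2`, and the table has six cells in total.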

One more thing that would be attractive in table syntax is a way to include long paragraphs in cells (or even multiple block elements).

For long paragraphs, one could do something like this:

| aaa | this is really long so I just |
!     | continue down here            |
| new | row here                      |

Here the exclamation marks say: add the contents of these cells to the cells above them. One could even have a list in a table cell using this syntax:

| aaa | - item one         |
!     | - item two         |
| new | row here           |

The ! is very similar to the |, which has its good points and its bad points. Alternatively one could choose a different character with more contrast.

| aaa | - item one         |
+     | - item two         |
| new | row here           |
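The continuation idea can be sketched as a pre-pass that folds marked lines into the row above. Everything here is an assumption for illustration: the name `merge_continuations`, the choice to join with a newline (so that a list inside a cell can later be parsed as block content), and the simple split on every pipe.

```python
def merge_continuations(lines, marker="!"):
    """Fold continuation lines into the cells of the preceding row.

    A line starting with `marker` instead of a pipe adds its cell contents
    to the cells above it. Escaped pipes are not handled in this sketch.
    """
    rows = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        is_continuation = text.startswith(marker)
        if not (is_continuation or text.startswith("|")) or not text.endswith("|"):
            raise ValueError("not a table line: %r" % line)
        cells = [cell.strip() for cell in text[1:-1].split("|")]
        if not is_continuation:
            rows.append(cells)
            continue
        if not rows:
            raise ValueError("continuation line with no row above it")
        previous = rows[-1]
        for i, extra in enumerate(cells):
            if extra and i < len(previous):
                # Join with a newline so multi-line block content survives.
                previous[i] = previous[i] + "\n" + extra if previous[i] else extra
    return rows
```

Passing `marker="+"` handles the alternative spelling without any other change, which suggests the choice of character is purely a readability question.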

Note that allowing long cell contents, and especially block level content inside cells, raises some issues about table layout. In HTML output this isn’t a problem, since browsers compute column widths that (usually) make sense for the content. But in other output formats, like LaTeX, one must explicitly specify column widths. So one question is whether to have something in the syntax that represents relative column widths. Pandoc does this by using the lines of - under the headers, but only in cases where the cells are too long to be represented without wrapping.

EDIT: fixed initial |s in code blocks, which this forum converted to > for some reason… @codinghorror - this seems to be a bug in discourse’s processing of posts received by HTML.


I’ve tried to find an implementation of it to no avail.

Joel Gerber

We should be very careful not to create a Catch-22. Consider a paragraph with many code spans, each containing a pipe. It could easily be misinterpreted as a table. Yes, you can escape the pipes to prevent that. Alas, when the text is not a table, those escapes are no longer escapes but a literal \| inside a normal code span.

IMHO, it would be absolutely awesome if such ideas are included somewhere in CommonMark repo, even in such a short and informal way as your previous post. E.g. something in contrib/ or staging/ subdir which would contain proposals of future CommonMark features.

That could give CommonMark some momentum for discussing them. It would allow PRs that incrementally turn them into more formal, spec-like wording and bring them up to the level required for inclusion in the core spec as a new chapter. Implementations would also be more likely to go in the right direction instead of reinventing something needlessly incompatible.

The activity in the form of PRs or some discussion referring them would also make some natural metrics about demand for those features, allowing some prioritization.


The overriding design goal for Markdown’s formatting syntax is to make it as readable as possible. The idea is that a Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions.

That’s from Gruber’s original Markdown spec, and is also quoted in the second paragraph of the CommonMark spec.

The approach I outline below has the overriding goal of readability/publishability as-is. It supports rich table functionality and is backwards compatible with GFM tables. It riffs off of the ideas of @jgm and others in this thread.

Start with the GFM tables extension. Then add the following:

Allow zero or multiple header rows. If there is no header, the header delimiter is optional (you may want to keep it for column alignment). Support column spans.

| heading 1 |          heading 2          ||
|           |  sub head a  |  sub head b   |
|-----------|--------------|---------------|
| aaa       | bbb          |  cccc         |
| spans two cols          ||  cccc         |
| aaa       | bbb          |  cccc         |
| spans three cols                       |||
| aaa       | bbb          |  cccc         |

Support row spans as well as long paragraphs and multiple block elements as cell content. Normally each text row is a table row, but the inclusion of a row delimiter signals that all rows in the table will be explicitly delimited. Notice that despite the complexity, it is easy to visually parse this as a 3 x 5 table:

| aaa | this is really long so I just   | ccc |
|     | continue down here              |     |
|.....|.................................|.....|
| aaa | bbb                             | ccc |
|.....|.................................|.....|
| this spans 2 (not 3) rows and 2 cols || ccc |
|                                      ||.....|
|                                      || ccc |
|.......................................|.....|
| aaa | bbb                             | ccc |
|.......................................|.....|
| aaa | - item one                      | ccc |
|     | - item two                      |     |

In addition to column-level default alignment specified in the header delimiter (per GFM), granular cell-level alignment can be specified in row delimiters.

|..........|:........:|..........|
|    right |  center  |  center  |
|---------:|:---------|:--------:|
|    right | left     |  center  |
|..........|..........|..........|
|    right | left     |  center  |
|:........:|.........:|:.........|
|  center  |    right | left     |
|..........|..........|..........|
|    right | left     |  center  |

If row headers are important (for accessibility as @selfthinker says above), support identifying them in the header delimiter:

|            |   Small |  Large |
|============|---------|--------|
| Salami     |    8.99 |  10.99 |
| Hawaii     |    9.49 |  11.49 |
| Margherita |    7.99 |   9.99 |

Optionally (if this is not too hard for parsers), column spans can be determined by pipe alignment. This is about as readable and as intuitive for writers as it gets:

| heading |              heading 2              |
|         |      sub head a      |  sub head b  |
|---------|----------------------|--------------|
| aaa     | this is still just   | ccc          |
|         | a single row but I   |              |
|         | talk too much        |              |
|.........|......................|..............|
| aaa     | bbb                  | ccc          |
|.........|......................|..............|
| this spans two rows and two    | ccc          |
| columns                        |..............|
|                                | ccc          |
|.........|......................|..............|
| aaa     | bbb                  | ccc          |
|.........|......................|..............|
| aaa     | - item one           | ccc          |
|         | - item two           |              |

Notice that the header clearly has two rows despite not having an explicit row delimiter because of the differing column spans of the two text rows. If this is too hard for parsers to see, we could require that an explicit row delimiter be used.


@vas - some nice ideas there. Two comments:

  1. In some of pandoc’s simple, multiline, and grid table formats (all very readable), we make use of column alignment. This doesn’t create a big problem for parsing. But it does create a problem for people entering tables in text boxes, which very often (as on this forum) don’t use monospaced fonts. When you’re using a proportionally spaced font, it’s next to impossible to line things up. So it’s probably better if the proposed syntax does not rely on column alignment.

  2. Instead of using periods, it would look better to use hyphens, and use = signs for the headers. Hyphens are in the middle of the line, while periods are on the bottom, and I think this makes them a better separator. I see that this has two drawbacks, though: it loses backwards compatibility, and it requires finding some other way to mark row headers. Still, it might be worth considering.

@mity - your point about the Catch-22 is a great one. Remember, though, that on this proposal an initial and final | is always required, and a table can’t interrupt a paragraph (or so I think it would be reasonable to stipulate). So to get the scenario you describe, you’d need a paragraph that starts with an unescaped |, like:

| some `code with pipes |
| and` more `code with pipes |
|` and a final unescaped |

Here we’d get a table, but this could be fixed easily by backslash-escaping the first | in the paragraph, or by re-wrapping. (The second line can’t start a table because it would be interrupting a paragraph.) It’s also extremely unlikely that anything like this would occur in normal writing: how often do you begin a paragraph with a pipe character?
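The stipulation that a table cannot interrupt a paragraph can be sketched as a tiny line scanner. This is a deliberate simplification: `find_table_starts` is a hypothetical name, and the scanner knows only about blank lines, paragraphs and table rows, none of the other CommonMark block types.

```python
def find_table_starts(lines):
    """Return the indexes of lines that open a table.

    Assumed rules: a table row begins and ends with an unescaped pipe, and
    a row can open a table only at the start of a block, never in the
    middle of a paragraph.
    """
    starts = []
    in_paragraph = False
    in_table = False
    for i, line in enumerate(lines):
        text = line.strip()
        if not text:
            in_paragraph = in_table = False  # a blank line ends any block
            continue
        looks_like_row = (
            len(text) >= 2
            and text.startswith("|")
            and text.endswith("|")
            and not text.endswith("\\|")
        )
        if looks_like_row and in_table:
            continue  # just another row of the current table
        if looks_like_row and not in_paragraph:
            starts.append(i)
            in_table = True
        else:
            # Either plain text, or a row-like line inside a paragraph.
            in_table = False
            in_paragraph = True
    return starts
```

On the three-line example above it reports a table starting at the first line. Once the first pipe is backslash-escaped, no table is found at all, because the later lines would have to interrupt the paragraph.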

I agree that it might be good to put some proposals for tables on GitHub, maybe in the CommonMark repository.


So it’s probably better if the proposed syntax does not rely on column alignment.

Agreed. Would it be a bad idea if the spec supported both means of determining column span? e.g,

| aaa | bbb |
| ccc ||

would be equivalent to

| aaa | bbb |
| ccc       |
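For what it is worth, the alignment-based variant is not much code either. A sketch, assuming the column grid is taken from one reference row (for instance the row with the most pipes) and that pipes in other rows must sit exactly on that grid; `spans_by_alignment` is a made-up name.

```python
def pipe_positions(line):
    """Character offsets of every pipe in the line."""
    return [i for i, ch in enumerate(line) if ch == "|"]


def spans_by_alignment(reference, line):
    """Return (text, colspan) pairs by comparing pipe offsets with a reference row.

    Assumes a monospaced layout: every pipe in `line` must sit at the same
    offset as some pipe in `reference`.
    """
    grid = pipe_positions(reference)
    own = pipe_positions(line)
    if len(grid) < 2 or len(own) < 2:
        raise ValueError("both rows need a leading and a trailing pipe")
    if own[0] != grid[0] or own[-1] != grid[-1]:
        raise ValueError("outer pipes do not line up")
    if any(pos not in grid for pos in own):
        raise ValueError("a pipe is not aligned with the reference row")
    cells = []
    for start, end in zip(own, own[1:]):
        # The span is the number of reference columns between the two pipes.
        span = grid.index(end) - grid.index(start)
        cells.append((line[start + 1:end].strip(), span))
    return cells
```

The strict offset check is also where the proportional-font objection bites: a single misplaced space turns a valid table into an error, whereas the || form stays unambiguous.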

Instead of using periods, it would look better to use hyphens

But only marginally better I think:

| heading |              heading 2              |
|         |      sub head a      |  sub head b  |
|---------|----------------------|--------------|
| aaa     | this is still just   | ccc          |
|         | a single row but I   |              |
|         | talk too much        |              |
|.........|......................|..............|
| aaa     | bbb                  | ccc          |
|.........|......................|..............|
| this spans two rows and two    | ccc          |
| columns                        |..............|
|                                | ccc          |
|.........|:....................:|..............|
| aaa     |      centered        | ccc          |
|.........|......................|..............|
| aaa     | - item one           | ccc          |
|         | - item two           |              |

| heading |              heading 2              |
|         |      sub head a      |  sub head b  |
|=========|======================|==============|
| aaa     | this is still just   | ccc          |
|         | a single row but I   |              |
|         | talk too much        |              |
|---------|----------------------|--------------|
| aaa     | bbb                  | ccc          |
|---------|----------------------|--------------|
| this spans two rows and two    | ccc          |
| columns                        |--------------|
|                                | ccc          |
|---------|:--------------------:|--------------|
| aaa     |      centered        | ccc          |
|---------|----------------------|--------------|
| aaa     | - item one           | ccc          |
|         | - item two           |              |

Options:

  1. - for headers, . for rows, because GFM backwards compatibility is worth more
  2. = for headers, - for rows, because it’s more readable. # or + can be used to mark row headers.
  3. support both, with the occurrence of a = delimiter row changing the meaning of a - delimiter row.

Option 3 might be too complex for users. Not so much worried about parsers.

Also I think @chrisalley might be right above when he says cell alignment is a presentation thing, in which case I’d remove the column alignment stuff from my proposal. It would certainly simplify it.

Not often, of course. However, your counter-example silently assumes that a table cannot interrupt a paragraph the way, e.g., lists can.

Not silently – I say that explicitly: “a table can’t interrupt a paragraph (or so I think it would be reasonable to stipulate)”. I don’t think this is in the crude syntax description I provided, but it should be, for exactly this reason.

@jgm Sorry, overlooked that.


This discussion will soon be four years old. It would be lovely if something were standardized at some point :slight_smile:


A pipe table syntax with lots of features:

|              | Header 1        | Header 2                       || Header 3                       ||
|              | Subheader 1     | Subheader 2.1  | Subheader 2.2  | Subheader 3.1  | Subheader 3.2  |
|==============|-----------------|----------------|----------------|----------------|----------------|
| Row Header 1 | 3row, 3col span                                 ||| Colspan only                   ||
| Row Header 2 |       ^                                         ||| Rowspan only   | Cell           |
| Row Header 3 |       ^                                         |||       ^        | Cell           |
| Row Header 4 |  Row            |  Each cell     |:   Centered   :| Right-aligned :|: Left-aligned  |
:              :  with multiple  :  has room for  :   multi-line   :    multi-line  :  multi-line    :
:              :  lines.         :  more text.    :      text.     :         text.  :  text.         :
|--------------|-----------------|----------------|----------------|----------------|----------------|
[Caption Text]

  1. Multiple rows of headers and subheaders (Thank you vas)

  2. Row headers, which are indicated by replacing dashes - with equals signs = in the first column’s delimiter row (Thank you again vas)

  3. Row spans using a caret/circumflex ^ (Thank you jgm)

  4. Column spans using multiple pipes ||| (MultiMarkdown, Maruku)

  5. Caption surrounded by brackets [ ] on the line just below the table (MultiMarkdown)

  6. Multi-line cell continuation using a colon : in place of a pipe | (like in PostgreSQL’s interactive terminal, as discussed by David Wheeler in RFC: A Simple Markdown Table Format and suggested above by illionas)

  7. Per-cell alignment using colon(s) : inside a cell, to the left/right/both of the cell’s first line of text (inspired by the |:---:| syntax for per-column alignment)
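Per-cell alignment (item 7) is probably the easiest of these features to prototype. A sketch, where `cell_alignment` is a hypothetical helper and the requirement that the colon be separated from the text by whitespace is my assumption, made so that ordinary text such as `key: value` is left alone:

```python
def cell_alignment(raw):
    """Return (alignment, text) for the first line of one raw cell.

    A leading or trailing colon counts as an alignment marker only when it
    is separated from the cell text by whitespace.
    """
    text = raw.strip()
    left = len(text) > 1 and text[0] == ":" and text[1].isspace()
    right = len(text) > 1 and text[-1] == ":" and text[-2].isspace()
    if left:
        text = text[1:]
    if right:
        text = text[:-1]
    if left and right:
        alignment = "center"
    elif right:
        alignment = "right"
    elif left:
        alignment = "left"
    else:
        alignment = None  # fall back to the column's default alignment
    return alignment, text.strip()
```

The three aligned cells in Row Header 4 of the example come out as center, right and left respectively.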


An alternative syntax, for better compatibility with existing pipe tables:

| Caption Text |                 |                |                |                |                |
|--------------|-----------------|----------------|----------------|----------------|----------------|
|              | Header 1        | Header 2       |        <       | Header 3       |        <       |
|              | Subheader 1     | Subheader 2.1  | Subheader 2.2  | Subheader 3.1  | Subheader 3.2  |
|==============|-----------------|----------------|----------------|----------------|----------------|
| Row Header 1 | 3row, 3col span |       <        |        <       | Colspan only   |        <       |
| Row Header 2 |       ^         |       <        |        <       | Rowspan only   | Cell           |
| Row Header 3 |       ^         |       <        |        <       |       ^        | Cell           |
| Row Header 4 |  Row            |  Each cell     |:   Centered   :| Right-aligned :|: Left-aligned  |
|.            .|. with multiple .|. has room for .|.  multi-line  .|.   multi-line .|. multi-line   .|
|.            .|. lines.        .|. more text.   .|.     text.    .|.        text. .|. text.        .|

A future extension could implement this syntax.

For now, in GFM and other existing Markdown flavors, it falls back to an ordinary table whose cells contain symbols that visually represent the additional features:

  1. One row in the rendered table consists of cells containing only dashes - and cells containing only equals signs =. This resembles a “delimiter row”. Cells above this row represent headers and subheaders. Cells below that row represent other table cells.

  2. In the “delimiter row”, if the first cell contains only equals signs =, it indicates that the cells in the first column represent row headers.

  3. A row span is represented by cells containing only a caret/circumflex ^.

  4. A column span is represented by cells containing only a “less-than” symbol <. (Inspired in part by 0x666C697473’s above proposal to use a “greater-than” symbol > for column spans.)

  5. A caption is represented by text in the top-left cell (above all other cells, headers, etc.)

  6. Multi-line cell continuation is represented by row(s) whose cells each start and end with a single dot that is whitespace-separated from the cell’s text content. |. (text) .| Visually, the dots in each column resemble a vertical ellipsis which indicates that the above cell continues downward.

  7. Per-cell text alignment is indicated if a cell (or, in a multiline cell, the first line) starts and/or ends with a colon that is whitespace-separated from the cell’s text content. | (text) :|, |: (text) |, |: (text) :|

In this table, the Markdown text can be compressed to remove extra dashes, equals signs, and whitespace, as long as there remains whitespace next to colons |: :| for alignment and next to dots |. .| for cell continuation.
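To check that the < and ^ conventions are unambiguous, here is a sketch of how a future extension might resolve them, starting from the plain grid of cell strings that today's GFM parsers already produce. The function name and the returned shape are assumptions of mine.

```python
def resolve_marker_cells(grid):
    """Map each origin (row, col) to (text, rowspan, colspan).

    A cell containing only "<" joins the cell to its left, and a cell
    containing only "^" joins the cell above it.
    """
    owner = [[None] * len(row) for row in grid]  # origin of every grid slot
    spans = {}  # origin -> [text, rowspan, colspan]
    for r, row in enumerate(grid):
        for c, raw in enumerate(row):
            text = raw.strip()
            if text == "<" and c > 0:
                origin = owner[r][c - 1]
            elif text == "^" and r > 0 and c < len(owner[r - 1]):
                origin = owner[r - 1][c]
            else:
                origin = (r, c)
                spans[origin] = [text, 1, 1]
            owner[r][c] = origin
            # Grow the origin cell so that it covers this slot.
            entry = spans[origin]
            entry[1] = max(entry[1], r - origin[0] + 1)
            entry[2] = max(entry[2], c - origin[1] + 1)
    return {origin: tuple(entry) for origin, entry in spans.items()}
```

Run on the three "Row Header" body rows of the example, the "3row, 3col span" cell resolves to a 3 by 3 block, and the eighteen grid slots collapse into eight real cells.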

EDIT: I have revised the above syntax proposal to simplify it. The earlier version was as follows:

|==============|                 |                |                |                |                |
|--------------|-----------------|----------------|----------------|----------------|----------------|
|              | Header 1        | Header 2       |        <       | Header 3       |        <       |
|              | Subheader 1     | Subheader 2.1  | Subheader 2.2  | Subheader 3.1  | Subheader 3.2  |
|==============|=================|================|================|================|================|
| Row Header 1 | 3row, 3col span |       <        |        <       | Colspan only   |        <       |
| Row Header 2 |       ^         |       <        |        <       | Rowspan only   | Cell           |
| Row Header 3 |       ^         |       <        |        <       |       ^        | Cell           |
| Row Header 4 |  Row            |  Each cell     |:   Centered   :| Right-aligned :|: Left-aligned  |
|.            .|. with multiple .|. has room for .|.  multi-line  .|.   multi-line .|. multi-line   .|
|.            .|. lines.        .|. more text.   .|.     text.    .|.        text. .|. text.        .|
|--------------|-----------------|----------------|----------------|----------------|----------------|
| Caption Text |

There is a related topic discussing the introduction of figure environments and captions for images: “Image tag should expand to figure when used with title”. It would be nice if a unified caption syntax emerged from these two topics.

Captions could be added as a table row at the bottom (or perhaps top) of a pipe table. But since an image isn’t made up of rows it wouldn’t make sense for image captions to also use this syntax. However, both transcluded images and CSV table content blocks do share the same caption syntax.


Modifying meg’s proposal, here’s a table syntax based on key-value pairs:

title: The Title | name: The Name | ph: The Phone
-|-|-
title: value 1
name:  value 2
ph:    value 3
||
title: value 4
name:  value 5
ph:    value 6

An alternative, using headers as keys:

The Title | The Name | The Phone
-|-|-
The Title: value 1
The Name:  value 2
The Phone: value 3
||
The Title: value 4
The Name:  value 5
The Phone: value 6

A future extension could convert the above code into this table:

[Image: GFM-table-2]

In current GitHub-Flavored Markdown, it falls back to a table with keys/values in the first column:

@aoudad: This syntax violates the Prime Directive. It’s worth reading the discussion in that thread.

Even though it’s spelled out both by Gruber when he introduced Markdown and by @jgm in the introduction to CommonMark, a lot of ideas and proposals on this forum lose sight of it, and it confuses efforts to solidify and advance this standard. Maybe create a topic titled “The Philosophy and Spirit of Markdown” or “The Markdown Prime Directive”, and pin it to the top of the forum? @jgm, @codinghorror, what do you think? I’d be happy to make the initial post (I’ve been drafting something about this), though it might be best if it came from John. I realize, John, that you’ve already done this in What Is Markdown? Maybe just post and pin that at the top of the forum?

I think it is important that the philosophy and spirit stay in the forefront of everyone’s minds as we try to get to v1.0 as well as to v1.1 or v2. Any new directions that ditch the original philosophy are fine, but they shouldn’t be called Markdown.


For better readability, the table could be written as follows:

| The Title | The Name | The Phone |
|-----------|----------|-----------|
| The Title: value 1               |
| The Name:  value 2               |
| The Phone: value 3               |
| ||                               |
| The Title: value 4               |
| The Name:  value 5               |
| The Phone: value 6               |

These additional pipes and dashes make it look more table-like. And as before, it falls back to a valid table in existing GitHub-Flavored Markdown.
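For anyone who wants to experiment with the key-value form, here is a sketch of reading it into ordinary rows. The name `parse_record_table` is hypothetical, and the handling of unknown keys (silently ignored) and missing keys (left as empty cells) is my own guess rather than part of the proposal.

```python
def parse_record_table(lines):
    """Read the "headers as keys" form into (header, rows).

    Works with or without the surrounding pipes. A line containing only
    "||" ends the current record.
    """
    def unwrap(line):
        text = line.strip()
        if len(text) >= 2 and text.startswith("|") and text.endswith("|"):
            text = text[1:-1].strip()
        return text

    header = [name.strip() for name in unwrap(lines[0]).split("|")]
    records, current = [], {}
    for line in lines[2:]:  # lines[1] is the delimiter row
        if line.strip() == "||" or unwrap(line) == "||":
            records.append(current)
            current = {}
            continue
        key, sep, value = unwrap(line).partition(":")
        if sep and key.strip() in header:
            current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return header, [[record.get(name, "") for name in header] for record in records]
```

Both the bare version and the version wrapped in pipes produce the same two rows of three values.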


The question is, does readability take absolute priority over everything else? If so, “Prime Directive” would seem to be an appropriate metaphor.

I would argue, though, that sometimes readability should be weighed against other considerations. For example, compatibility with existing Markdown flavors is essential to CommonMark’s mission of specifying Markdown. Also, CommonMark needs to respect the Principle of Uniformity (i.e. text content has the same meaning whether or not it is inside a container block), since the spec for lists and block quotes presupposes this principle.

In any case, I like your idea for a topic about the philosophy and spirit of Markdown. This is becoming a longer discussion, and I think it’s worthy of its own place on the forum.


I really like this.
My idea is a refinement that aims to be more readable for both machines and humans.

One simple rule: every line always starts with a pipe character (|), for machine-readability.
This rule would avoid conflicts with other syntax.

|######################################## Caption Text ##########################################
|_______________________________________________________________________________________________,
|              | Header 1      || Header 2                     || Header 3                      |
|              | Subheader 1   | Subheader 2.1 | Subheader 2.2 |  Subheader 3.1 | Subheader 3.2 |
|==============|---------------|---------------|---------------|----------------|---------------|
| Row Header 1 ||| 3row, 3col span                             || Colspan only                  |
|______________|                                               |________________|_______________|
| Row Header 2 |^                                              |  Rowspan only  | Cell          |
|______________|                                               |                |_______________|
| Row Header 3 |^                                              |^               | Cell          |
|______________|_______________________________________________|________________|_______________|
| Row Header 4 | Row           | Each cell     |:   Centered  :| Right-aligned :|: Left-aligned |
|~             | with multiple | has room for  |   multi-line  |    multi-line  |  multi-line   |
|~             | lines.        | more text.    |      text.    |         text.  |  text.        |
|______________|_______________|_______________|_______________|________________|_______________/

For human readability, rule lines are sometimes useful. For machines, however, they carry no meaning.
So lines starting with |_ can be introduced and ignored, like comment lines.

Let’s remove the lines starting with |_.

|######################################## Caption Text ##########################################
|              | Header 1      || Header 2                     || Header 3                      |
|              | Subheader 1   | Subheader 2.1 | Subheader 2.2 |  Subheader 3.1 | Subheader 3.2 |
|==============|---------------|---------------|---------------|----------------|---------------|
| Row Header 1 ||| 3row, 3col span                             || Colspan only                  |
| Row Header 2 |^                                              |  Rowspan only  | Cell          |
| Row Header 3 |^                                              |^               | Cell          |
| Row Header 4 | Row           | Each cell     |:   Centered  :| Right-aligned :|: Left-aligned |
|~             | with multiple | has room for  |   multi-line  |    multi-line  |  multi-line   |
|~             | lines.        | more text.    |      text.    |         text.  |  text.        |

I think it is better to put the doubled (or longer) run of pipe characters BEFORE the spanning table cell.
That makes creating the colspan attribute a little easier for the parser.
It also allows us to omit the last pipe character.

|######################################## Caption Text ##########################################
|              | Header 1      || Header 2                     || Header 3
|              | Subheader 1   | Subheader 2.1 | Subheader 2.2 |  Subheader 3.1 | Subheader 3.2
|==============|---------------|---------------|---------------|----------------|---------------
| Row Header 1 ||| 3row, 3col span                             || Colspan only
| Row Header 2 |^                                              |  Rowspan only  | Cell
| Row Header 3 |^                                              |^               | Cell
| Row Header 4 | Row           | Each cell     |:   Centered  :| Right-aligned :|: Left-aligned
|~             | with multiple | has room for  |   multi-line  |    multi-line  |  multi-line
|~             | lines.        | more text.    |      text.    |         text.  |  text.

Lines starting with |# constitute the caption text.
In this example, I put the caption before the table because a <caption> element should be the first child of the <table> element, but that is not important.
As with the lines for h1, h2, …, enclosing the text in # characters should be allowed, but their count does not matter.
The simplest form is |# Caption Text.

The keyword |^ increases the rowspan, but no space should be allowed between | and ^, to simplify the parser.
This no-space rule would also be useful for other keywords.

To describe a row with multiple lines, the keyword |~ can be used at the start of each subsequent line, instead of using : as a column separator.
Optionally, |~ can be used not only at the start but also at each separator, but the first |~ is required.
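The two-character keywords make line classification almost trivial, which is the point of the "always start with a pipe" rule. A sketch with hypothetical names (`classify_line`, `caption_text`), where the test for a delimiter row is my own guess at what would be needed:

```python
def classify_line(line):
    """Classify one line by its first two characters.

    Returns one of: "not-table", "rule", "caption", "continuation",
    "delimiter", "row".
    """
    text = line.strip()
    if not text.startswith("|"):
        return "not-table"
    if text.startswith("|_"):
        return "rule"          # decorative only, ignored like a comment
    if text.startswith("|#"):
        return "caption"
    if text.startswith("|~"):
        return "continuation"  # more lines for the row above
    # Assumption: a delimiter row contains only pipes, '=', '-', ':' and spaces.
    if set(text) <= set("|=-: ") and ("-" in text or "=" in text):
        return "delimiter"
    return "row"


def caption_text(line):
    """Strip the leading pipe and any enclosing '#' characters from a caption line."""
    return line.strip()[1:].strip("# ").strip()
```

Because the last pipe is optional in this variant, nothing here looks at how a line ends.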

Finally, I think that the table syntax can be an extension of CommonMark, but I will be happy if it is released as a formal specification!
