Tables in pure Markdown

@jgm Sorry, overlooked that.


This discussion will soon be four years old; it would be lovely if something were standardized at some point.


A pipe table syntax with lots of features:

|              | Header 1        | Header 2                       || Header 3                       ||
|              | Subheader 1     | Subheader 2.1  | Subheader 2.2  | Subheader 3.1  | Subheader 3.2  |
|==============|-----------------|----------------|----------------|----------------|----------------|
| Row Header 1 | 3row, 3col span                                 ||| Colspan only                   ||
| Row Header 2 |       ^                                         ||| Rowspan only   | Cell           |
| Row Header 3 |       ^                                         |||       ^        | Cell           |
| Row Header 4 |  Row            |  Each cell     |:   Centered   :| Right-aligned :|: Left-aligned  |
:              :  with multiple  :  has room for  :   multi-line   :    multi-line  :  multi-line    :
:              :  lines.         :  more text.    :      text.     :         text.  :  text.         :
|--------------|-----------------|----------------|----------------|----------------|----------------|
[Caption Text]

  1. Multiple rows of headers and subheaders (Thank you vas)

  2. Row headers, which are indicated by replacing dashes - with equals signs = in the first column’s delimiter row (Thank you again vas)

  3. Row spans using a caret/circumflex ^ (Thank you jgm)

  4. Column spans using multiple pipes ||| (MultiMarkdown, Maruku)

  5. Caption surrounded by brackets [ ] on the line just below the table (MultiMarkdown)

  6. Multi-line cell continuation using a colon : in place of a pipe | (like in PostgreSQL’s interactive terminal, as discussed by David Wheeler in “RFC: A Simple Markdown Table Format” and suggested above by illionas)

  7. Per-cell alignment using colon(s) : inside a cell, to the left/right/both of the cell’s first line of text (inspired by the |:---:| syntax for per-column alignment)
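To make item 7 concrete, here is a minimal sketch of how a parser might read the alignment colons from a single cell. The function name `cell_alignment` is hypothetical, and it assumes that a marker colon touches the pipe and is separated from the cell text by whitespace, so that a cell such as `Note:` is not misread.

```python
import re

def cell_alignment(raw):
    """Return (alignment, text) for the raw text found between two pipes."""
    # Assumption: a marker colon is the first/last character of the cell
    # and has whitespace between it and the cell text.
    left = re.match(r":\s", raw) is not None
    right = re.search(r"\s:$", raw) is not None
    start = 1 if left else 0
    end = len(raw) - 1 if right else len(raw)
    text = raw[start:end].strip()
    alignment = {
        (True, True): "center",
        (False, True): "right",
        (True, False): "left",
        (False, False): None,
    }[(left, right)]
    return alignment, text
```

A cell without marker colons comes back with an alignment of `None`, which leaves the per-column default in place.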


An alternative syntax, for better compatibility with existing pipe tables:

| Caption Text |                 |                |                |                |                |
|--------------|-----------------|----------------|----------------|----------------|----------------|
|              | Header 1        | Header 2       |        <       | Header 3       |        <       |
|              | Subheader 1     | Subheader 2.1  | Subheader 2.2  | Subheader 3.1  | Subheader 3.2  |
|==============|-----------------|----------------|----------------|----------------|----------------|
| Row Header 1 | 3row, 3col span |       <        |        <       | Colspan only   |        <       |
| Row Header 2 |       ^         |       <        |        <       | Rowspan only   | Cell           |
| Row Header 3 |       ^         |       <        |        <       |       ^        | Cell           |
| Row Header 4 |  Row            |  Each cell     |:   Centered   :| Right-aligned :|: Left-aligned  |
|.            .|. with multiple .|. has room for .|.  multi-line  .|.   multi-line .|. multi-line   .|
|.            .|. lines.        .|. more text.   .|.     text.    .|.        text. .|. text.        .|

A future extension could implement this syntax.

For now, in GFM and other existing Markdown flavors, it falls back to an ordinary table whose cells contain symbols that visually represent the additional features:

  1. One row in the rendered table consists of cells containing only dashes - and cells containing only equals signs =. This resembles a “delimiter row”. Cells above this row represent headers and subheaders. Cells below that row represent other table cells.

  2. In the “delimiter row”, if the first cell contains only equals signs =, it indicates that the cells in the first column represent row headers.

  3. A row span is represented by cells containing only a caret/circumflex ^.

  4. A column span is represented by cells containing only a “less-than” symbol <. (Inspired in part by 0x666C697473’s above proposal to use a “greater-than” symbol > for column spans.)

  5. A caption is represented by text in the top-left cell (above all other cells, headers, etc.)

  6. Multi-line cell continuation is represented by row(s) whose cells each start and end with a single dot that is whitespace-separated from the cell’s text content. |. (text) .| Visually, the dots in each column resemble a vertical ellipsis which indicates that the above cell continues downward.

  7. Per-cell text alignment is indicated if a cell (or, in a multiline cell, the first line) starts and/or ends with a colon that is whitespace-separated from the cell’s text content. | (text) :|, |: (text) |, |: (text) :|

In this table, the Markdown text can be compressed to remove extra dashes, equals signs, and whitespace, as long as there remains whitespace next to colons |: :| for alignment and next to dots |. .| for cell continuation.
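As a rough illustration of rules 3 and 4, a future extension could post-process the grid of cells that an existing pipe-table parser already produces. The sketch below is only one possible reading of the proposal: the name `resolve_spans` is hypothetical, the grid is assumed to be rectangular, and a `<` or `^` in the first column or first row is treated as ordinary text.

```python
def resolve_spans(grid):
    """Map each anchor cell (row, col) to its (rowspan, colspan).

    `grid` is a list of rows, each a list of cell strings, as an
    ordinary pipe-table parser would return them.
    """
    owner = {}
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            mark = cell.strip()
            if mark == "<" and c > 0:
                # Continues the cell to the left.
                owner[(r, c)] = owner[(r, c - 1)]
            elif mark == "^" and r > 0:
                # Continues the cell above.
                owner[(r, c)] = owner[(r - 1, c)]
            else:
                owner[(r, c)] = (r, c)

    spans = {}
    for (r, c), (r0, c0) in owner.items():
        rowspan, colspan = spans.get((r0, c0), (1, 1))
        spans[(r0, c0)] = (max(rowspan, r - r0 + 1), max(colspan, c - c0 + 1))
    return spans


body = [
    ["3row, 3col span", "<", "<", "Colspan only", "<"],
    ["^", "<", "<", "Rowspan only", "Cell"],
    ["^", "<", "<", "^", "Cell"],
]
spans = resolve_spans(body)
```

Because each marker cell simply inherits the owner of its left or upper neighbour, the 3 x 3 block resolves to a single anchor without any special casing.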

EDIT: I have revised the above syntax proposal to simplify it. The earlier version was as follows:

|==============|                 |                |                |                |                |
|--------------|-----------------|----------------|----------------|----------------|----------------|
|              | Header 1        | Header 2       |        <       | Header 3       |        <       |
|              | Subheader 1     | Subheader 2.1  | Subheader 2.2  | Subheader 3.1  | Subheader 3.2  |
|==============|=================|================|================|================|================|
| Row Header 1 | 3row, 3col span |       <        |        <       | Colspan only   |        <       |
| Row Header 2 |       ^         |       <        |        <       | Rowspan only   | Cell           |
| Row Header 3 |       ^         |       <        |        <       |       ^        | Cell           |
| Row Header 4 |  Row            |  Each cell     |:   Centered   :| Right-aligned :|: Left-aligned  |
|.            .|. with multiple .|. has room for .|.  multi-line  .|.   multi-line .|. multi-line   .|
|.            .|. lines.        .|. more text.   .|.     text.    .|.        text. .|. text.        .|
|--------------|-----------------|----------------|----------------|----------------|----------------|
| Caption Text |

There is a related topic discussing the introduction of figure environments and captions for images: “Image tag should expand to figure when used with title”. It would be nice if a unified caption syntax emerged from these two topics.

Captions could be added as a table row at the bottom (or perhaps top) of a pipe table. But since an image isn’t made up of rows it wouldn’t make sense for image captions to also use this syntax. However, both transcluded images and CSV table content blocks do share the same caption syntax.

Modifying meg’s proposal, here’s a table syntax based on key-value pairs:

title: The Title | name: The Name | ph: The Phone
-|-|-
title: value 1
name:  value 2
ph:    value 3
||
title: value 4
name:  value 5
ph:    value 6

An alternative, using headers as keys:

The Title | The Name | The Phone
-|-|-
The Title: value 1
The Name:  value 2
The Phone: value 3
||
The Title: value 4
The Name:  value 5
The Phone: value 6

A future extension could convert the above code into this table:

[Image: GFM-table-2]
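For what it’s worth, the key-value form is easy to read mechanically. The following is a sketch only, covering the first variant (keys declared in the header row); `parse_keyed_table` is a hypothetical name, and the handling of `||` as a record separator is my reading of the example rather than a specified rule.

```python
def parse_keyed_table(text):
    """Parse the 'key: Label | key: Label' header and 'key: value' records."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    header_line, _delimiter, *body = lines

    headers = {}
    for part in header_line.split("|"):
        key, _, label = part.partition(":")
        headers[key.strip()] = label.strip()

    records, current = [], {}
    for ln in body:
        if ln == "||":
            # Assumed meaning: '||' closes the current record.
            records.append(current)
            current = {}
        elif ln:
            key, _, value = ln.partition(":")
            current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return headers, records


source = """
title: The Title | name: The Name | ph: The Phone
-|-|-
title: value 1
name:  value 2
ph:    value 3
||
title: value 4
name:  value 5
ph:    value 6
"""
headers, records = parse_keyed_table(source)
```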

In current GitHub-Flavored Markdown, it falls back to a table with keys/values in the first column:

@aoudad: This syntax violates the Prime Directive. It’s worth reading the discussion in that thread.

Even though it’s spelled out both by Gruber when he introduced Markdown and by @jgm in the introduction to CommonMark, a lot of ideas and proposals on this forum lose sight of it, and it confuses efforts to solidify and advance this standard. Maybe create a topic titled “The Philosophy and Spirit of Markdown” or “The Markdown Prime Directive”, and pin it to the top of the forum? @jgm, @codinghorror, what do you think? I’d be happy to make the initial post (I’ve been drafting something about this), though it might be best if it came from John. I realize, John, that you’ve already done this in What Is Markdown? Maybe just post and pin that at the top of the forum?

I think it important that the philosophy and spirit stay in the forefront of everyone’s minds as we try to get to v1.0, as well as to v1.1 or v2. Any new directions that ditch the original philosophy are fine, but they shouldn’t be called Markdown.


For better readability, the table could be written as follows:

| The Title | The Name | The Phone |
|-----------|----------|-----------|
| The Title: value 1               |
| The Name:  value 2               |
| The Phone: value 3               |
| ||                               |
| The Title: value 4               |
| The Name:  value 5               |
| The Phone: value 6               |

These additional pipes and dashes make it look more table-like. And as before, it falls back to a valid table in existing GitHub-Flavored Markdown.


The question is, does readability take absolute priority over everything else? If so, “Prime Directive” would seem to be an appropriate metaphor.

I would argue, though, that sometimes readability should be weighed against other considerations. For example, compatibility with existing Markdown flavors is essential to CommonMark’s mission of specifying Markdown. Also, CommonMark needs to respect the Principle of Uniformity (i.e. text content has the same meaning whether or not it is inside a container block), since the spec for lists and block quotes presupposes this principle.

In any case, I like your idea for a topic about the philosophy and spirit of Markdown. This is becoming a longer discussion, and I think it’s worthy of its own place on the forum.


I really like this.
My idea is an improvement that makes it more machine- and human-readable.

One simple rule: every line always starts with a pipe character (|), for machine-readability.
This rule would avoid conflicts with other syntax.

|######################################## Caption Text ##########################################
|_______________________________________________________________________________________________,
|              | Header 1      || Header 2                     || Header 3                      |
|              | Subheader 1   | Subheader 2.1 | Subheader 2.2 |  Subheader 3.1 | Subheader 3.2 |
|==============|---------------|---------------|---------------|----------------|---------------|
| Row Header 1 ||| 3row, 3col span                             || Colspan only                  |
|______________|                                               |________________|_______________|
| Row Header 2 |^                                              |  Rowspan only  | Cell          |
|______________|                                               |                |_______________|
| Row Header 3 |^                                              |^               | Cell          |
|______________|_______________________________________________|________________|_______________|
| Row Header 4 | Row           | Each cell     |:   Centered  :| Right-aligned :|: Left-aligned |
|~             | with multiple | has room for  |   multi-line  |    multi-line  |  multi-line   |
|~             | lines.        | more text.    |      text.    |         text.  |  text.        |
|______________|_______________|_______________|_______________|________________|_______________/

For human-readability, rule lines are sometimes useful. For machines, however, they have no meaning.
So lines starting with |_ can be introduced, which can be ignored like comment lines.

Let’s remove the lines starting with |_.

|######################################## Caption Text ##########################################
|              | Header 1      || Header 2                     || Header 3                      |
|              | Subheader 1   | Subheader 2.1 | Subheader 2.2 |  Subheader 3.1 | Subheader 3.2 |
|==============|---------------|---------------|---------------|----------------|---------------|
| Row Header 1 ||| 3row, 3col span                             || Colspan only                  |
| Row Header 2 |^                                              |  Rowspan only  | Cell          |
| Row Header 3 |^                                              |^               | Cell          |
| Row Header 4 | Row           | Each cell     |:   Centered  :| Right-aligned :|: Left-aligned |
|~             | with multiple | has room for  |   multi-line  |    multi-line  |  multi-line   |
|~             | lines.        | more text.    |      text.    |         text.  |  text.        |

I think that it is better to put the two or more pipe characters BEFORE a table cell.
This makes the parser a little simpler when creating the colspan attribute.
It also allows us to omit the last pipe character.

|######################################## Caption Text ##########################################
|              | Header 1      || Header 2                     || Header 3
|              | Subheader 1   | Subheader 2.1 | Subheader 2.2 |  Subheader 3.1 | Subheader 3.2
|==============|---------------|---------------|---------------|----------------|---------------
| Row Header 1 ||| 3row, 3col span                             || Colspan only
| Row Header 2 |^                                              |  Rowspan only  | Cell
| Row Header 3 |^                                              |^               | Cell
| Row Header 4 | Row           | Each cell     |:   Centered  :| Right-aligned :|: Left-aligned
|~             | with multiple | has room for  |   multi-line  |    multi-line  |  multi-line
|~             | lines.        | more text.    |      text.    |         text.  |  text.
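To show why leading pipes are convenient for a parser, here is a small sketch that reads one row and takes the number of pipes in front of each cell as its colspan. `parse_row` is a hypothetical name; the sketch assumes a lone trailing pipe is just a terminator and does not handle escaped pipes.

```python
import re

def parse_row(line):
    """Return a list of (text, colspan) pairs for one table row."""
    matches = list(re.finditer(r"(\|+)([^|]*)", line.rstrip()))
    cells = []
    for i, m in enumerate(matches):
        pipes, text = m.group(1), m.group(2)
        is_last = i == len(matches) - 1
        if is_last and text == "" and len(matches) > 1:
            # Assumption: pipes at the very end of the line only close the row.
            break
        cells.append((text.strip(), len(pipes)))
    return cells


row = parse_row("| Row Header 1 ||| 3row, 3col span        || Colspan only")
```

The colspans of a row add up to the column count given by the delimiter row, which gives the parser a cheap consistency check.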

Lines starting with |# constitute the caption text.
In this example, I write the caption before the table because a <caption> element should be the first child of the <table> element, but the position is not important.
As with heading lines for h1, h2, …, enclosing the text in # characters should be allowed, but the number of # characters does not matter.
The simplest form is |# Caption Text.
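A sketch of how the |# and |_ prefixes could be handled in a first pass over the table lines. The name `split_caption` is hypothetical, and joining several caption lines with a space is an assumption, since the proposal does not say how multiple |# lines combine.

```python
def split_caption(lines):
    """Separate caption text from table rows, dropping '|_' rule lines."""
    caption_parts, rows = [], []
    for ln in lines:
        if ln.startswith("|_"):
            # Rule lines are decoration only and are ignored.
            continue
        if ln.startswith("|#"):
            # Strip the pipe, then any number of enclosing '#' characters.
            caption_parts.append(ln[1:].strip().strip("#").strip())
        else:
            rows.append(ln)
    return " ".join(caption_parts), rows


caption, rows = split_caption([
    "|######## Caption Text ########",
    "|_____________________________,",
    "| a | b",
    "|___|___/",
])
```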

The keyword |^ increases the rowspan, but no space should be allowed between | and ^, to keep the parser simple.
This no-space rule would also be useful for other keywords.

To describe a row with multiple lines, the keyword |~ can be used at the start of each subsequent line, instead of using : as a column separator.
Optionally, |~ can also be used at each separator and not only at the start of the line, but the first |~ is required.
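Roughly, continuation lines could be folded into the preceding row like this. It is a sketch under simplifying assumptions: `merge_continuations` is a hypothetical name, the trailing pipe is omitted as in the examples, there are no column spans in the rows involved, and the pieces of a cell are joined with a single space.

```python
import re

def merge_continuations(lines):
    """Fold '|~' continuation lines into the row they continue."""
    rows = []
    for ln in lines:
        # Split on '|' or '|~'; the text before the first pipe is discarded.
        cells = [c.strip() for c in re.split(r"\|~?", ln.rstrip())[1:]]
        if ln.startswith("|~") and rows:
            previous = rows[-1]
            for i, piece in enumerate(cells):
                if piece and i < len(previous):
                    previous[i] = (previous[i] + " " + piece).strip()
        else:
            rows.append(cells)
    return rows


merged = merge_continuations([
    "| Row Header 4 | Row           | Each cell",
    "|~             | with multiple | has room for",
    "|~             | lines.        | more text.",
    "| Row Header 5 | a             | b",
])
```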

Finally, I think that table syntax can be an extension of CommonMark, but I will be happy if it is released as a formal specification!


Let’s finish 1.0, then we focus on tables!


Would it be possible to have a single row header for multiple rows? For column headers, this is already implied: Header 2 spans multiple subheaders.

In this example, Row Header 2 would span rows 2 and 3:

|######################################## Caption Text ##########################################
|              | Header 1      || Header 2                     || Header 3
|              | Subheader 1   | Subheader 2.1 | Subheader 2.2 |  Subheader 3.1 | Subheader 3.2
|==============|---------------|---------------|---------------|----------------|---------------
| Row Header 1 ||| 3row, 3col span                             || Colspan only
| Row Header 2 |^                                              |  Rowspan only  | Cell
|^             |^                                              |^               | Cell
| Row Header 4 | Row           | Each cell     |:   Centered  :| Right-aligned :|: Left-aligned
|~             | with multiple | has room for  |   multi-line  |    multi-line  |  multi-line
|~             | lines.        | more text.    |      text.    |         text.  |  text.

How would we do sub-rowheaders?
Here is an example with subheaders:

|######################################## Caption Text ##########################################
|               |           |Header 1      || Header 2                     || Header 3
|               |           | Subheader 1   | Subheader 2.1 | Subheader 2.2 |  Subheader 3.1 | Subheader 3.2
|===============|===========|---------------|---------------|---------------|----------------|---------------
|| Row Header 1             ||| 3row, 3col span                             || Colspan only
|| Row Header 2 | subheader1|^                                              |  Rowspan only  | Cell
||^             | subheader2|^                                              |^               | Cell
|| Row Header 4             | Row           | Each cell     |:   Centered  :| Right-aligned :|: Left-aligned
|~                          | with multiple | has room for  |   multi-line  |    multi-line  |  multi-line
|~                          | lines.        | more text.    |      text.    |         text.  |  text.

But I get the feeling that this gets complicated to read and write quickly…

Yes. Just use |^.
For your example with subheaders, a single pipe (|) should be used instead of a double pipe (||) before “Row Header 2” and the cell below it.

Corrected example:

|######################################## Caption Text ##########################################
|               |            | Header 1      || Header 2                     || Header 3
|               |            | Subheader 1   | Subheader 2.1 | Subheader 2.2 |  Subheader 3.1 | Subheader 3.2
|===============|============|---------------|---------------|---------------|----------------|---------------
|| Row Header 1              ||| 3row, 3col span                             || Colspan only
| Row Header 2  | subheader1 |^                                              |  Rowspan only  | Cell
|^              | subheader2 |^                                              |^               | Cell
|| Row Header 4              | Row           | Each cell     |:   Centered  :| Right-aligned :|: Left-aligned
|~                           | with multiple | has room for  |   multi-line  |    multi-line  |  multi-line
|~                           | lines.        | more text.    |      text.    |         text.  |  text.
  • “Row Header 1” is a merged 1 x 2 header cell, equal to <th colspan="2">Row Header 1</th>.
  • “Row Header 2” is a merged 2 x 1 header cell, equal to <th rowspan="2">Row Header 2</th>.
  • “Row Header 4” is a merged 1 x 2 header cell, equal to <th colspan="2">Row Header 4</th>.

Is it because you thought that the new syntax ||^ is needed?

Row-spanning should allow multiple circumflexes, as in underlined headings and row headers.

| foo | bar |
|-----|-----|
|^^^^^| baz |
|^    |    ^|
| quz | ^^^ |

Column-spans would then naturally use one or more less-than signs.

| foo | bar |<    | <<< |
| --- |-----|-    |    -|
| baz |<<<<<|    <| quz |

Since community input is desired, and most people seem to vote for ‘piped’ tables, I would like to add myself to the list of those who’d prefer a simpler style.

To me, the “first rule” of Markdown is, as expressed by Gruber (emphasis is mine):

The overriding design goal for Markdown’s formatting syntax is to make it as readable as possible. The idea is that a Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions.

To me, that means Pandoc-style ‘simple’ and ‘multiline’ tables:

Simple Table:

  Right     Left     Center     Default
-------     ------ ----------   -------
     12     12        12            12
    123     123       123          123
      1     1          1             1

Table:  Demonstration of simple table syntax.

Multiline Table:

-------------------------------------------------------------
 Centered   Default           Right Left
  Header    Aligned         Aligned Aligned
----------- ------- --------------- -------------------------
   First    row                12.0 Example of a row that
                                    spans multiple lines.

  Second    row                 5.0 Here's another one. Note
                                    the blank line between
                                    rows.
-------------------------------------------------------------

Table: Here's the caption. It, too, may span
multiple lines.

I would add an optional, numeric sequence to the caption, like in the following example.

ID    Name         Street            ZIP      City         Phone
--    ---------    --------------    -----    ---------    ---------
01    Tom Smith    Main St. 324      49322    Supertown    0233-4545
02    Jane Doe     Upper Rd. 3234    23234    Homeplace    0434-1343

Table 1: Caption (optional, numeric count is also optional)

Getting tables to align correctly is difficult enough already! It seems wisest to first add each table row, check which entry is the longest, and then, as a last step, format the rest of the table according to it.
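That last formatting step is the kind of thing a small helper can do. Here is a sketch, assuming left-aligned columns separated by four spaces as in my example above; `format_simple_table` and its `gap` parameter are names I made up for illustration.

```python
def format_simple_table(header, rows, gap="    "):
    """Lay out a simple table: pad every column to its longest entry."""
    table = [header] + rows
    widths = [max(len(r[i]) for r in table) for i in range(len(header))]

    def fmt(cells):
        # Pad each cell to its column width, then drop trailing spaces.
        return gap.join(c.ljust(w) for c, w in zip(cells, widths)).rstrip()

    lines = [fmt(header), gap.join("-" * w for w in widths)]
    lines += [fmt(r) for r in rows]
    return "\n".join(lines)


out = format_simple_table(
    ["ID", "Name", "Street", "ZIP", "City", "Phone"],
    [
        ["01", "Tom Smith", "Main St. 324", "49322", "Supertown", "0233-4545"],
        ["02", "Jane Doe", "Upper Rd. 3234", "23234", "Homeplace", "0434-1343"],
    ],
)
```

The author only types the cells; the widths are computed once from the longest entry in each column.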

I would like this kind of table to go into the “core” spec.

Introducing | or : already contradicts “without looking like it’s been marked up with tags or formatting instructions”. Such tables could be a “blessed extension”. To those I would add the other basic formatting symbols people have come up with.


Where do we track the status of this being accepted into the standard or refused as a wontfix?

This feature has been talked about a lot and it seems that the time has come to move on with something like GHFM’s use of piped tables and call it good for this version and iterate as needed. Otherwise, it just appears to be stalled as a feature request and years later, we still don’t have tables in the spec.

I don’t believe there was ever a plan to add tables to the core CommonMark spec - it is categorised under “extensions” for this reason. The goal has been to finalise the core feature set first, and then consider extensions at a later point (official extensions not being a concrete plan either). GitHub Flavoured Markdown is a well supported third party extension of the core CommonMark spec which can be used today.

Pushing this feature to an extension causes it to not be a feature of the core and then not be supported by folks who are only supporting core. It creates a lot of downstream confusion.

How would a group of folks lobby for this to be pushed into core?


I also would be interested in this.

Tables are so common in many forms of literature that they belong in the core.

That is also why I vote for the simple table type, since it is unobtrusive and totally matches the Ur-Markdown spirit, IMO.

Well, tables wonʼt be part of the v1.0 core specification, but it should probably take care of special rules for the pipe character | already: CM-based GHFM requires it to be backslash-escaped even inside code spans in tables.
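To illustrate the consequence of that rule, here is a sketch of splitting a table row on unescaped pipes only. `split_gfm_row` is a hypothetical helper, not taken from any implementation, and it ignores the corner case of an escaped backslash directly before a pipe.

```python
import re

def split_gfm_row(line):
    """Split a pipe-table row into cells, honouring backslash-escaped pipes."""
    s = line.strip()
    if s.startswith("|"):
        s = s[1:]
    if s.endswith("|") and not s.endswith("\\|"):
        s = s[:-1]
    # Split only on pipes that are not preceded by a backslash.
    cells = re.split(r"(?<!\\)\|", s)
    return [c.strip().replace("\\|", "|") for c in cells]


escaped = split_gfm_row(r"| `a \| b` | c |")
unescaped = split_gfm_row(r"| `a | b` | c |")
```

With the escape, the code span stays in one cell; without it, the row splits into three cells even though the pipe sits inside backticks.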

As soon as we have agreed upon a common way to document extensions, people should start to write specifications for their pet features which are at least as detailed as the core specification, so @jgm does not have to take care of everything.
