Obvious Markdown syntax for Tables

First, I’m new to working on CommonMark, so I’m guessing this is the best way to propose a new feature and consider nuances before writing code and submitting pull requests. If there is a better format, please let me know.

Markdown Tables Should Be Obvious

The original goal of Markdown is to create a rendering for the way people type normally. Someone who never used Markdown might stumble onto emphasized, bold, and underline just by typing things. Similarly, naturally communicating in raw text will lead to headings and lists. So why doesn’t this work with tables?

When I want a table in text, I type it:

        Orcs            1,234           Nasty buggers
        Trolls            394           Regenerating
        Goblins         9,234          Weak, but many
        Humans         16,900          Combined Armies

No fuss, no muss. It looks like a table in text. It has two key features:

  1. It is its own paragraph of two or more lines.
  2. There are columns of spaces at least three wide down the entire table.

So this is not a table:

Orcs 1,234  Nasty Buggers
Humans 16,900 Combined Armies

We can add a rule for headers and footers as well:

  1. A line with only dashes or equal signs designates a header or footer.
  2. If this line is the second line, it designates a header line, and if it is the second from the bottom and not the second line, it designates a footer.

So this is a table:

Race            Size      Comment
--                --      --
Orcs           1,234      Nasty buggers
Trolls           394     Regenerating
Goblins       9,234       Weak, but many
Humans        16,900     Combined Armies
              ======
*Total*       27,762

Rendering tables is straightforward. Everything is put in a TABLE tag, with one THEAD, TR, or TFOOT per line, and each entry is rendered into a TD. Each entry, or ‘cell’, may have formatting within it, e.g., bold, emphasis, or even sup/sub if supported.

What’s missing?

  • Writing the markdown in variable width fonts still works: it just takes a lot of extra spaces.
  • Do people want to allow cells like “Nasty 1”?
  • Formatting does not cross columns: “* Col1 Col2*” is just two asterisks.
  • Complex tables are for embedded HTML; this only handles simple tables.

I humbly ask for your considered opinions,

Charles

I completely agree whitespace delimited columns are the most natural table form for humans. They are what most of us would do without any rules or guidelines telling us what to do.

There are a number of Markdown variants that support such tables, but they require fixed width fonts. For example, Pandoc’s Simple Tables. Your approach avoids that.

I’m working on a project that supports nearly arbitrary plain text table styles. In my analysis all the plain text table styles I’ve encountered or can think of break down into two classes: Separated Value (SV) tables and Aligned Grid (AG) tables. CSV and GFM tables are SV tables. Pandoc Simple Tables are AG tables.

Your table style is an SV table, where the separator is three or more spaces. So it works like a GFM table where vertical alignment of the columns is optional and only for plain text aesthetics. The parser couldn’t care less.
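
To make the "separated value" reading concrete, here is a minimal sketch in Python. The helper name `split_3sp_row` is made up for illustration, and it assumes the only rule is "three or more spaces separate cells"; it is not taken from any existing Markdown implementation.

```python
import re

def split_3sp_row(line: str) -> list[str]:
    """Split one row on runs of three or more spaces (hypothetical helper)."""
    return re.split(r" {3,}", line.strip())

# The same row, once padded for visual alignment and once with minimal spacing.
aligned = "kimchi          fermented cabbage, seasonings               Korea     8"
minimal = "kimchi   fermented cabbage, seasonings   Korea   8"

# Both print ['kimchi', 'fermented cabbage, seasonings', 'Korea', '8']
print(split_3sp_row(aligned))
print(split_3sp_row(minimal))
```

Single spaces inside a cell survive because only runs of three or more spaces count as a separator.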

Thank you for your comments.

Looking at Pandoc’s Simple Tables, it appears to be another case where the table must be perfect to work, meaning writing the table would be fiddly. I think working with variable-width fonts will usually work. Many variable-width fonts render repeated spaces at about half the width of a normal character, so it is possible to muck it up.

I like your breakdown of types of tables. On the project you are working on, have you been running into ambiguous possible tables?

Have a great day!

Your idea, “Three Space Table” (3SP) if you don’t mind my giving it a name, works with variable width fonts and even narrow spaces at table creation time since one could use as many spaces as needed for visual clarity. A problem would only occur when later editing the same table in some other context that uses a font with very different relative widths and spacing. If one is liberal with spaces, you can usually avoid visual ambiguity even if the table looks ragged, but worst case it can end up hard for a human to read. The machine won’t care.

pickle           typical ingredients                                     culture    heat
--------           ------------------------                                     ---------    ------
kimchi          fermented cabbage, seasonings               Korea     8
cornichons   gherkins, vinegar, tarragon                       France    0
avakaya       mango, mustard seeds, oil, seasonings   India       10

When I entered the 3SP table above, all of the columns were aligned in the variable-width font of the reply box:

In this regard, behavior under changing font styles, I don’t think 3SP is much worse or better than a GFM table similarly aligned when entered:

pickle         | typical ingredients                                   | culture  | heat
--------         | ------------------------                                   | ---------  | ------
kimchi        | fermented cabbage, seasonings             | Korea   | 8
cornichons | gherkins, vinegar, tarragon                     | France  | 0
avakaya     | mango, mustard seeds, oil, seasonings | India      | 10

The same GFM table without attempts to align:

pickle | typical ingredients | culture | heat
--- | --- | ---  | ---
kimchi | fermented cabbage, seasonings | Korea | 8
cornichons | gherkins, vinegar, tarragon | France | 0
avakaya | mango, mustard seeds, oil, seasonings | India | 10

The same 3SP table without attempts to align:

pickle   typical ingredients   culture   heat
---   ---   ---    ---
kimchi   fermented cabbage, seasonings   Korea   8
cornichons   gherkins, vinegar, tarragon   France   0
avakaya   mango, mustard seeds, oil, seasonings   India   10

Let’s try the reverse. This 3SP table is spaced to look great in a fixed width font:

pickle      typical ingredients                    culture  heat
------      -------------------                    -------  ----
kimchi      fermented cabbage, seasonings          Korea    8
cornichons  gherkins, vinegar, tarragon            France   0
avakaya     mango, mustard seeds, oil, seasonings  India    10

Here’s how it looked when I entered it:

Technically there is never ambiguity because deterministic machine parsing rules govern. In practice, especially for a format that is designed for humans, there is plenty of ambiguity for the human reader. Some styles are more prone than others. I can’t cover them all here, but I’ll go into one issue: empty cells.

For GFM tables, I think the biggest source of ambiguity is how the optionality of leading | interacts with intended empty cells. What the human eye sees below clashes with how the GFM parser sees it:

column a | column b 
-------- | -------- 
in a     | in b 
         | in a
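
A rough sketch of why the parser disagrees with the eye here. This is a deliberately simplified row splitter written for this post, not the real GFM algorithm: it ignores escaped pipes and code spans, and the name `split_gfm_row` is hypothetical.

```python
def split_gfm_row(line: str) -> list[str]:
    """Simplified GFM-style row split (ignores escaped pipes and code spans)."""
    text = line.strip()
    if text.startswith("|"):
        text = text[1:]   # a leading pipe is optional, so it is simply dropped
    if text.endswith("|"):
        text = text[:-1]  # the same goes for a trailing pipe
    return [cell.strip() for cell in text.split("|")]

print(split_gfm_row("in a     | in b "))  # -> ['in a', 'in b']
print(split_gfm_row("         | in a"))   # -> ['in a'], i.e. one cell in column a
```

Under these assumptions the last row has a single cell, so its content lands in the first column even though it sits visually under the second.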

Of course you can avoid it by consistently using leading pipes. How does a 3SP table handle empty cells?

column a   column b   column c
------------   ------------   ------------
in a            in b           in c
in a                             in c?
                  in b?         in c?
                                   in c?

How it looked when entered:

You might think, comparing both of the above, that there isn’t any ambiguity. But that’s the human eye. What would the deterministic machine rules be? And when humans lazily apply those machine rules (i.e. using the minimal number of spaces), won’t we end up with a visual mess? Below I assumed these 3SP table rules: (1) three spaces between adjacent cells and (2) a single space for an empty cell:

column a   column b   column c
------------   ------------   ------------
in a   in b   in c
in a       in c?
    in b?   in c?
        in c?

What would better 3SP table rules be?

FYI, the “heat” index in my example tables is completely made up :joy:.

Here’s a formatted table with links:

pickle | typical ingredients | culture | heat
--- | --- | --- | ---
kimchi | fermented cabbage, seasonings | Korea | 8
cornichons | gherkins, vinegar, tarragon | France | 0
avakaya | mango, mustard seeds, oil, seasonings | India | 10

The page for avakaya says:

South Indians are known to have a deep attachment to these spicy pickles.

I can personally attest to that :fire::heart:.

Sorry for taking so long…

There is an unsolvable issue: the raw text of a table written in a variable-width font is going to render differently in different fonts. It can get weirder at the edges of Unicode, e.g., multiple glyphs per character or changing text direction within a line. In many editors and fonts, the width of sequences of spaces is so compact that a table written with a monospace font would not be recognizable in a variable-width font, and the other way around. As an example, here are a few lines, all ten characters long:

0123456789
|        |
||||||||||
WWWWWWWWWW

Given this problem, I think I understand the miscommunication. Obvious Tables are not the idea of using at least three spaces as the break between columns, replacing the GFM ‘|’ character. That would lead to frustration when people try to create a table with a blank cell and cannot. A three-space table control would probably offer too little functionality and could be better replaced with an editor pass that just puts in the pipe bars.

Obvious Tables, to differentiate from 3SP Tables, do require the idea of nearly fixed-width characters. This may be a non-starter for some: it would require editor support that rendered sequences of spaces wider than individual spaces. The table format is fairly forgiving and only wants a sequence of spaces to render within about +/- 30% of the width of an equivalent sequence of letters. True monospace fonts would not be required. Entry can be fairly sloppy as long as the columns sort of line up.

You asked for the persnickety rules of Obvious Tables:

  • Obvious Tables fit in one paragraph. A paragraph is a run of consecutive lines that each contain non-blank characters and that is not followed by another non-blank line. If a paragraph meets the criteria for an OT, then it is one.
  • Characters are treated as having a uniform width of ‘one’, after normalization for tabs and multi-glyph Unicode characters. This implies that each character will have an integer column number.
  • The Smushed Line (always wanted to Smush in a spec :slight_smile: ) is a single-line representation of the paragraph with ‘X’ in a column if any line has a character in that column number, or ‘ ’ if no line has a character at that column number.
  • Each Gap is a run of three or more spaces in the Smushed Line that is bounded on both sides by a non-space, i.e., ignoring the beginning and ending parts of the line. Each Span is the tuple of (beginning column number, ending column number) for each contiguous set of non-blank characters in the Smushed Line that abuts a Gap.
  • A Divider is a line in the paragraph that contains only ‘-’ or ‘=’ in addition to whitespace. A Divider on the second line indicates the first line is the Header Line. A Divider that is after the second line and is one line before the end of the paragraph indicates the last line of the paragraph is the Footer Line. Any other Divider is a Misplaced Divider.
  • An Obvious Table contains two or more Spans and no Misplaced Dividers.
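
For anyone who prefers code to prose, the rules above can be sketched roughly as follows. This is an illustration written for this thread, not the Python Markdown implementation mentioned elsewhere. All function names are hypothetical, and three things are assumptions of the sketch: tabs expand to 4-column stops, multi-glyph Unicode is not normalized at all, and a run of fewer than three spaces stays inside a Span.

```python
import re

def smush(lines: list[str]) -> str:
    """Smushed Line: 'X' where any line has a non-space character, else ' '."""
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    return "".join(
        "X" if any(line[col] != " " for line in padded) else " "
        for col in range(width)
    )

def find_spans(smushed: str, min_gap: int = 3) -> list[tuple[int, int]]:
    """Inclusive (start, end) columns of each Span, split at Gaps."""
    body = smushed.rstrip()
    if not body.strip():
        return []
    start = len(body) - len(body.lstrip())  # leading indentation is not a Gap
    spans = []
    for gap in re.finditer(r" {%d,}" % min_gap, body):
        if gap.start() < start:
            continue
        spans.append((start, gap.start() - 1))
        start = gap.end()
    spans.append((start, len(body) - 1))
    return spans

def is_divider(line: str) -> bool:
    """A Divider contains only '-' or '=' in addition to whitespace."""
    marks = "".join(line.split())
    return bool(marks) and set(marks) <= set("-=")

def misplaced_dividers(lines: list[str]) -> list[int]:
    """Indexes of Dividers that mark neither a Header nor a Footer."""
    last = len(lines) - 1
    return [
        i for i, line in enumerate(lines)
        if is_divider(line) and i != 1 and not (i > 1 and i == last - 1)
    ]

def is_obvious_table(paragraph: str) -> bool:
    lines = [line.expandtabs(4).rstrip() for line in paragraph.splitlines()]
    if len(lines) < 2 or any(not line for line in lines):
        return False
    return len(find_spans(smush(lines))) >= 2 and not misplaced_dividers(lines)

print(is_obvious_table("Orcs 1,234  Nasty Buggers\nHumans 16,900 Combined Armies"))
```

Because the Spans come from column positions rather than from separators, a blank cell is simply a line with nothing inside that Span.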

The rendering process is not mysterious. Here are the highlights:

  • The alignment of each column is the default (usually left), but right if all entries in that column are numeric. What counts as ‘numeric’ could be clarified; it should include digits and ,.+-$' and maybe other currency symbols. Is there a Unicode class for this? If someone really wants a numeric column to be left justified, they can insert a ` or not use Obvious Tables.
  • On each line, the beginning and ending spaces within a Span are irrelevant. Blank cells are rendered (<td>&nbsp;</td>).
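
A minimal sketch of those two rendering points, again in Python and again only illustrative. It takes the Spans as given (a list of inclusive column pairs), expects body rows only (a header such as "Size" would make its column non-numeric), ignores blank cells when deciding whether a column is numeric, and does not process inline formatting inside cells. The names and the exact `NUMERIC` pattern are assumptions of this sketch.

```python
import html
import re

# Digits plus the punctuation suggested above; at least one digit is required.
NUMERIC = re.compile(r"[0-9,.+\-$']*[0-9][0-9,.+\-$']*")

def cells_for(line: str, spans: list[tuple[int, int]]) -> list[str]:
    """Slice a line by Span columns; spaces around the text are irrelevant."""
    return [line[start:end + 1].strip() for start, end in spans]

def column_alignments(rows: list[list[str]]) -> list[str]:
    """'right' when every non-blank entry in a column is numeric, else 'left'."""
    aligns = []
    for column in zip(*rows):
        entries = [cell for cell in column if cell]
        numeric = bool(entries) and all(NUMERIC.fullmatch(c) for c in entries)
        aligns.append("right" if numeric else "left")
    return aligns

def render_rows(lines: list[str], spans: list[tuple[int, int]]) -> str:
    """Render body rows as <tr> elements; blank cells become &nbsp;."""
    rows = [cells_for(line, spans) for line in lines]
    aligns = column_alignments(rows)
    out = []
    for row in rows:
        tds = "".join(
            f'<td style="text-align:{a}">{html.escape(cell) or "&nbsp;"}</td>'
            for cell, a in zip(row, aligns)
        )
        out.append(f"<tr>{tds}</tr>")
    return "\n".join(out)

body = [
    "Orcs      1,234    Nasty buggers",
    "Trolls      394",
]
print(render_rows(body, [(0, 5), (10, 14), (19, 31)]))
```

A line that stops short of a Span, like the second row here, just yields a blank cell for that column.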

When I first implemented this, for Python Markdown, I also allowed for tables with multi-line cells. I can write out the persnickety bits for that here as well. It generally requires having two blank lines at the end of a table instead of just one. For example, here, with some sloppy entry, is a table with only a Header and two table rows:

       Creatures            HP            Attacks                    Description
    -----
    Dragon              12D12          3D12 Fire Blast         The big lizard
                                          2D6 Claws

    Orc                  2D6              1D8 Spear            Ugly

Do you want the persnickety rules for these as well?

OK. There seem to be no more comments on the first batch. I will start writing up the idea of adding tables as core markdown, using these tables as an option. Here’s the Overview, which hints at the scope. The entire document is the scope and I’ll post it when done.

Have a wonderful day,

Charles

Overview

The Obvious Tables Project (OTP) adds tables to core CommonMark MarkDown. This is expected to be a long and controversial process. Markdown should support tables in the way people write normally and follow the principle of least surprise. The end goal will be for tables in Markdown to be simple, obvious, ubiquitous and flexible.

The problem has always been that people write tables in text in multiple ways. Also, tables have a fast feature creep as column justification, column spanning, variable cell span, sorting, and collapse features each trade a complication for additional power. Table syntax becomes more difficult to use as it becomes more complicated. The original MarkDown can be learned from one page of examples; learning about tables should require no more than a page of examples.

The scope of adding tables to markdown includes:

  • Examples of how to write and use tables, including Github Tables (GHT, the | delimiter), spaces as columns, and block text tables
  • Details of exactly how tables are parsed, memory issues, and how intermediate forms play to parsing issues
  • CommonMark specification edits and write-ups to support table parsing
  • Full set of examples and compatibility tests for tables
  • Extensions or forks for top four most used markdown processors to show a compatible table system across major implementations
  • Work on the top three on-page editors to minimally support tables, aimed at increasing the display width of sequences of spaces

I’m not really super involved here, but I’ve been talking about tables a lot on this Discourse over the last year or so and, while I appreciate your enthusiasm, I don’t think you should expect anything about tables to become part of core MD. There are many really excellent implementations of tables that already exist outside core, because tables aren’t seen as necessary in all cases. They are classified as an extension that communities using CommonMark can add if they need it.

There have been discussions about adding tables as core since 2014 at least and (from my understanding) there’s no general interest in adding tables to core.

There’s a related discussion about exactly this here.

So, my understanding is, while you’re welcome to create an extension for this purpose, please know that’s as far as it’s likely to go. :slight_smile:

separating core software issues from others is sensible, but non-core features can be essential. for example, the C programming language would not be at the foundation of so many things if it didn’t have a standard library. thus, core specifications reconcile fundamentals while extensions supply invaluable high-level features.

adopting a core-first mindset, i reviewed the 0.30 Milestone issues to estimate the 1.0 release date. unfortunately, my notes quickly became cryptic. in situations like this, i try to categorize the issues i discover. this style of analysis induces me to structure my notes, so table usage is almost inevitable.

at the risk of making this reply far too long, Markdown tables seem even more useful in real life as it were. i use them to record test results produced by various healthcare providers. most recently, i made a table to reconcile two compilations of 78 rpm recordings with an American Music discography. i never generate HTML from these Markdown documents.

creating/maintaining a tabular summary might facilitate monitoring the status of Milestone issues. furthermore, standardizing table syntax might be easier than finalizing the 1.0 core specification? after all, a specification can add value by formalizing common usage, cf. your favorite dictionary. Le mieux est l’ennemi du bien!