Tab-related issues

Here’s a nice case:

>[TAB][TAB]x

http://johnmacfarlane.net/babelmark2/?normalize=1&text=> x

Here is a summary of the different approaches different implementations take:

  1. Treat this exactly as if it’s expanded to spaces, that is, as equivalent to > + 7 spaces + x. In this case the first space is gobbled up as an optional space after >, the next 4 are treated as indentation for a code block, and we get a code block with two leading spaces.

  2. Treat this as if it’s expanded to spaces, but don’t treat the first space as an optional space after >. Then we get a code block with three leading spaces.

  3. Treat the first tab as an optional space character after >, and the second tab as the indentation of a code block. Then we get a code block with just x, with no leading spaces.

  4. Don’t treat the first tab as an optional space character. Treat it as code block indentation, leaving the second tab as part of the code block contents. Then we get a code block with a leading tab.

Now that we’re allowing tabs in code blocks, I find the first two approaches strange. After all, there are no spaces in the Markdown source – only tabs – so why should the code block contain spaces, which aren’t there? (This made more sense when we converted all tabs to spaces before parsing.)

I’m somewhat inclined to favor the 4th approach, but I fear that it might break some existing documents, making regular block quotes into quoted code blocks. So maybe the 3rd approach is best overall.

There is a related issue about lists. How, exactly, do we calculate padding when the list marker is followed by a tab? If we pretend we’ve converted tabs to spaces here, we’ll again get code blocks that have spaces when there are none in the text.

1.[TAB][TAB]hi

http://spec.commonmark.org/dingus/?text=1. hi

Note that this looks like a bug; this should contain a code block of some kind. But I need to get clearer on the spec before this can be fixed.
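For reference, here is a small Python sketch of the column arithmetic involved, assuming 4-column tab stops. The helper name `tab_widths` is hypothetical. It only reports how many columns each tab covers; it does not decide how a parser should interpret them.

```python
def tab_widths(line: str, tab_stop: int = 4) -> list[int]:
    """Return the number of columns each tab in `line` advances, left to right."""
    widths = []
    col = 0
    for ch in line:
        if ch == "\t":
            width = tab_stop - col % tab_stop  # distance to the next tab stop
            widths.append(width)
            col += width
        else:
            col += 1
    return widths


print(tab_widths("1.\t\thi"))  # tabs after a two-character list marker
print(tab_widths(">\t\tx"))    # tabs after a one-character block quote marker
```

After the marker `1.` the first tab covers only 2 columns, while after `>` it covers 3, so the same pair of tabs amounts to 6 columns of whitespace in the list case and 7 in the block quote case.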

Comments welcome.

In approach 4, why would the first tab be treated as a code block indentation? It’s equivalent to only 3 spaces when using the “4 space tab stop” rule, no?

Yes, I suppose you’re right that #4 doesn’t make much sense.
I’d be curious about your thoughts on the others.

I have adjusted the reference implementations to implement approach 1, which I think is the one most consistent with existing processors. (The changes are in the repository but not yet in a released version.) Also added a few more tab-related cases to the spec.

Oh, I thought you made a good argument in favor of option #3, which I preferred (even without @robinst debunking #4) since it comes closest to elastic tab-stops, but backwards compatibility is also a good point and apparently in favor of #1.

Option #2 didn’t make sense at all, because quotations are (now) the only block elements where the space is optional, which is a point that should be hidden as much as possible (if it can’t be changed). The mental model for authors is simpler if at least one space after the (possibly indented) line prefix is always required.

My thought is that if two texts look the same (with tab stop set to 4 spaces), then they should render the same. Also, backwards compatibility is important.

For the record: it seems this has been resolved in the spec in favour of option 1 as well.

That is a real shame, because this issue has bothered me in the implementations I have seen.

Expanding to spaces does not help me when I want to publish code examples for a tab-indented codebase. People are bound to copy/paste the examples at one point or another, causing all kinds of inconsistencies all over the place.

IMO, the “spec” seems overly hostile to codebases with tabs in them, even though the most popular C codebases use them (e.g. the Linux kernel).

I would prefer #3 as well.
