Section 2.2 requires that tabs (HT) are not expanded into spaces (SP); I take this to mean that HT is required to be “preserved” and passed through into the output:
Tabs in lines are not expanded to spaces.
For code block lines, section 4.4 requires that “four spaces of indentation” be removed:
The contents of the code block are the literal contents of the lines, including trailing line endings, minus four spaces of indentation.
Consider a code block input line with (HT, HT, SP) as indentation (visualized here as “^I^I␣”), like the line containing “A” in this input:
12345678 Consider this code block:
␣␣␣␣␣␣Code:
^I^I␣A
The words of the specification require removing “four spaces of indentation” from “^I^I␣A”, but do not say how: there’s only one SP that could be removed, and the preceding HT is not supposed to be “expanded to spaces”, by the clear words of section 2.2.
The actual result produced by the reference implementation is this:
␣␣Code:
^I␣A
But I don’t see how this can be inferred from the specification (without silently, just on this occasion, skipping the “HT is not expanded to SP” rule!). To cover the actual behavior in the given example, the “minus four spaces” rule for code block lines could be rephrased as:
[…] minus four spaces of indentation if the indentation ends in four or more spaces, or else minus the last tab character in the indentation.
This would explain the above case nicely, but would fail for an input line like “^I␣^I␣X” in a code block: the “X” is in column 10 (counting from 1), and removing the last HT gives “^I␣␣X”, which leaves “X” in column 7. That is akin to removing only three SP here: oops! We could instead remove the first HT, producing “␣^I␣X”, where “X” ends up in column 6, as if we had removed “four spaces of indentation”, as required.
The problem here, I would say, is that in the general case, in a sequence of SP and HT:
- not every SP effectively moves the active position to the right by one (if followed by HT), and
- not every HT effectively moves the active position to the right by four (by definition).
So the “no HT expansion” requirement, combined with the “as if expanded” description of behavior, makes things rather messy here: exactly which characters of the indentation string are supposed to be deleted, or expanded and replaced?
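One reading that would cover both examples is to count columns rather than characters: consume leading whitespace until four columns have been passed, and turn a tab into spaces only if it straddles that boundary. The Python sketch below illustrates this reading only. The function name `strip_indent_columns` and the column-based interpretation are assumptions, not taken from the specification, and no claim is made that the reference implementation works this way.

```python
def strip_indent_columns(line: str, width: int = 4, tab_stop: int = 4) -> str:
    """Remove up to `width` columns of leading whitespace from `line`.

    Characters are consumed whole. Only a tab that overshoots the removal
    boundary is replaced, by the spaces needed to keep the remaining text
    in its column. Everything after the boundary is passed through as-is.
    """
    col = 0  # 0-based column reached so far
    i = 0    # index of the next unconsumed character
    while i < len(line) and col < width and line[i] in " \t":
        if line[i] == "\t":
            col += tab_stop - col % tab_stop
        else:
            col += 1
        i += 1
    # With width == tab_stop and a line starting in column 0 there is never
    # any overshoot; it only arises for other combinations (e.g. width=2).
    return " " * max(col - width, 0) + line[i:]


print(repr(strip_indent_columns("      Code:")))  # six SP of indentation
print(repr(strip_indent_columns("\t\t A")))       # ^I^I␣A
print(repr(strip_indent_columns("\t \t X")))      # ^I␣^I␣X
```

For the three inputs this yields `'  Code:'`, `'\t A'` and `' \t X'`, which agrees with the output shown earlier for the first example and with the “remove the first HT” result for the second. Note that the sketch still has to replace a tab with spaces in the overshoot case, so it does not fully escape the tension with the “no HT expansion” rule.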
Which brings me to this question: Why is it that HT must not be expanded into SP, even in a line’s indentation? Are there actual applications where this is crucial? Is it prudent to, say, try to extract a Makefile (where the distinction between SP and HT is obviously of utmost importance) from a parsed code block that comes out of a CommonMark processor? Is this a common usage scenario?
This requirement certainly complicates the specification, as demonstrated above, and I’m sure it complicates implementations too. So what is the rationale behind it, and how is this complication justified?