ATX heading, issue with tabs


Section 4.2 says:

The opening sequence of # characters must be followed by a space or by the end of line. The optional closing sequence of #s must be preceded by a space and may be followed by spaces only.

(Emphasis mine)

Now, I believe “space” in this context means “space or tab”, based on section 2.2 and on differences in behavior between cmark and the JS implementation (though 4.2 links directly to the definition of “space” as U+0020, so perhaps not). Here are a couple of examples of what I mean:

  • # abc\t# gives me <h1>abc</h1> in cmark and <h1>abc\t#</h1> in the JS implementation.
  • # abc #\t gives me <h1>abc</h1> in cmark and <h1>abc #</h1> in the JS implementation.
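To make the two readings concrete, here is a minimal Python sketch of ATX heading content extraction in which the set of characters accepted around the # sequences is a parameter. The function name atx_heading_content and its delim_ws parameter are hypothetical and not taken from cmark or commonmark.js. The sketch ignores leading indentation and the other details of 4.2, and it assumes the raw contents are finally stripped of leading and trailing spaces and tabs.

```python
from typing import Optional


def atx_heading_content(line: str, delim_ws: str) -> Optional[str]:
    """Return the heading text of an ATX heading line, or None if the line
    is not one. delim_ws (hypothetical parameter) is the set of characters
    accepted as "space" around the opening and closing # sequences."""
    # Opening sequence: 1 to 6 '#' characters (leading indentation ignored).
    level = len(line) - len(line.lstrip("#"))
    if not 1 <= level <= 6:
        return None
    rest = line[level:]
    # The opening sequence must be followed by delim_ws or by the end of line.
    if rest and rest[0] not in delim_ws:
        return None
    # Only delim_ws may follow the optional closing sequence.
    content = rest.rstrip(delim_ws)
    without_closer = content.rstrip("#")
    # Drop the closing #s only if they are preceded by delim_ws (or nothing).
    if without_closer != content and (
        not without_closer or without_closer[-1] in delim_ws
    ):
        content = without_closer
    # Assumption: the raw contents are then stripped of leading and
    # trailing spaces and tabs before inline parsing.
    return content.strip(" \t")
```

With delim_ws=" \t" both bullets give "abc" (the cmark output); with delim_ws=" " they give "abc\t#" and "abc #" (the JS output), because a tab is then not a valid separator before or after the closing #.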

cmark seems to favor the interpretation that “spaces” means “spaces and tabs”, while the JS implementation seems to take “spaces” literally. Who’s right?

To add to the confusion, both cmark and the JS implementation allow tabs between the characters of a thematic break, which again suggests that “space” should mean “space or tab”.
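For comparison, a thematic break check that tolerates tabs between the markers might look like the following Python sketch. The name is_thematic_break is hypothetical; it assumes a single line without its trailing newline and treats a tab in the indentation as advancing to the next multiple of 4 columns, so "\t***" is too indented to count.

```python
def is_thematic_break(line: str) -> bool:
    """Hypothetical check: three or more matching '-', '_' or '*' markers,
    optionally separated by spaces or tabs, after at most 3 columns of
    indentation."""
    body = line.lstrip(" \t")
    indent = line[: len(line) - len(body)]
    # A tab in the indentation advances to the next tab stop of 4 columns.
    if len(indent.expandtabs(4)) > 3:
        return False
    # Spaces and tabs between the markers are ignored.
    marks = [c for c in body if c not in " \t"]
    return (
        len(marks) >= 3
        and marks[0] in "-_*"
        and all(c == marks[0] for c in marks)
    )
```

Here "*\t*\t*" is accepted exactly like "* * *", which is the behavior both implementations show.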



cmark is right. From the top of section 2.2 of the spec:

However, in contexts where whitespace helps to define
block structure, tabs behave as if they were replaced by
spaces with a tab stop of 4 characters.
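The tab-stop rule quoted above can be sketched in Python as follows. The helper expand_tabs is hypothetical and behaves like the built-in str.expandtabs(4) on a single line. Applied to the two examples, it turns the tab next to the closing # into ordinary spaces ("# abc\t#" becomes "# abc   #", "# abc #\t" becomes "# abc # "), so even a space-only reading of 4.2 yields <h1>abc</h1>. Expanding every tab on the line is a simplification; the spec only applies this where whitespace defines block structure.

```python
def expand_tabs(line: str, tab_stop: int = 4) -> str:
    """Replace each tab with enough spaces to reach the next tab stop
    (hypothetical helper; assumes a single line with no newline)."""
    out = []
    col = 0
    for ch in line:
        if ch == "\t":
            # Advance to the next multiple of tab_stop.
            n = tab_stop - (col % tab_stop)
            out.append(" " * n)
            col += n
        else:
            out.append(ch)
            col += 1
    return "".join(out)
```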

This is one such context, or so it seems to me.

The spec could definitely be improved in its treatment
of spaces and tabs (see e.g.