In the description of setext headings (section 4.3), the specification states:
A setext heading underline is a sequence of “=” characters or a sequence of “-” characters, with no more than 3 spaces indentation and any number of trailing spaces. If a line containing a single “-” can be interpreted as an empty list items, it should be interpreted this way and not as a setext heading underline.
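To make that reading concrete, here is a minimal Python sketch of the rule exactly as quoted. It does not model what the reference implementation actually does. The function name classify_next_line is hypothetical, and the sketch only looks at the single line that follows a paragraph line. The rest of block parsing is ignored.

```python
import re

# Underline per the quoted rule: up to 3 spaces of indentation, a run of
# "=" or a run of "-", then any number of trailing spaces.
UNDERLINE_RE = re.compile(r"^ {0,3}(=+|-+) *$")

def classify_next_line(line: str) -> str:
    """Hypothetical helper: classify the line that follows a paragraph line,
    following the rule exactly as quoted (not the reference implementation)."""
    m = UNDERLINE_RE.match(line)
    if m is None:
        return "other"
    # Second sentence of the rule: a lone "-" that could be an empty list
    # item is read as a list item, not as a setext heading underline.
    if m.group(1) == "-":
        return "empty list item"
    return "setext underline"
```

Under this reading, the line "-" after "Dolor" is classified as an empty list item, while a lone "=" is still an underline, because "=" can never start a list item.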
In this example:
Dolor
-
the condition written in the second sentence seems to apply: The line below “Dolor” contains a single “-” HYPHEN-MINUS character, and it can be interpreted as “an empty list items” [is that “an empty list item”?]
So the expected result seems to be that the second line of the example is not a setext heading underline, and hence the example is not a setext heading.
In this example the situation is similar, but less ambiguous:
Dolor
1.
The line below “Dolor” here is (and must be) “interpreted as an empty list item”.
The reference implementation translates the first example to
<h2>Dolor</h2>
but the second to
<p>Dolor</p>
<ol>
<li></li>
</ol>
Is this a bug in the reference implementation? Do I misread the specification? Is the specification wrong?
Why can’t we just require more than one - or = on a line to make a setext style header?
That would be a simple and reasonable solution (after all, the code fence line has a minimum count, too).
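A sketch of that tightening in Python, assuming a minimum of two characters. The value 2 and the name is_setext_underline are illustrative and do not come from any spec:

```python
import re

# Hypothetical variant of the underline rule: require at least MIN_UNDERLINE
# "=" or "-" characters, so a lone "-" can never be an underline.
MIN_UNDERLINE = 2
UNDERLINE_RE = re.compile(
    r"^ {0,3}(={%d,}|-{%d,}) *$" % (MIN_UNDERLINE, MIN_UNDERLINE)
)

def is_setext_underline(line: str) -> bool:
    """True if `line` would be a setext underline under the proposed rule."""
    return UNDERLINE_RE.match(line) is not None
```

With a minimum of two, a lone "-" is never an underline candidate, so the precedence clause about empty list items would no longer be needed. The cost is that documents using single-character underlines would stop producing headings.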
It would, however, also be an incompatible solution.
Given that the syntactic construct has two possible interpretations, and that “empty list items” are not exactly common, or “portable”, I honestly see no merit in preferring an “empty list item” interpretation over the “setext heading” interpretation.
If you really want an “empty” list item, you can always write (somewhat more “portably” and clearly, too)
Dolor
- <span></span>
Yes, technically this is not really an empty item. Alternatively, in CommonMark you could write
Dolor
- <!-- empty item -->
Depending on the implementation, this is also not “really empty”.
With my suggested “discardable input tag” (I don’t remember if I actually wrote about it here), the empty comment declaration <!>, this would look like this:
Dolor
- <!>
In this case the <!> would be “seen” in parsing, but discarded when it comes to “inline” text. (The <!> is valid SGML, but not XML or HTML, so discarding it does no harm anyway.) I think <!> could be useful in other places, too.
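A rough Python sketch of how the proposed <!> could behave in a two-pass parser. Both function names are hypothetical, the sketch only handles the simplest bullet form "- content", and it is not how any existing CommonMark implementation works:

```python
DISCARDABLE = "<!>"  # the proposed "discardable input tag" (empty comment declaration)

def is_lone_marker(line: str) -> bool:
    """Block pass (hypothetical): True if the line is just "-" plus trailing
    spaces, i.e. the case that collides with a setext underline."""
    return line.rstrip(" ") == "-"

def render_bullet(line: str) -> str:
    """Inline pass (hypothetical): render a "- content" line, discarding <!>."""
    if not line.startswith("- "):
        raise ValueError("sketch only handles '- content' lines")
    content = line[2:].replace(DISCARDABLE, "").strip()
    return f"<li>{content}</li>"
```

So "- <!>" is not a lone marker during block parsing, which makes it an unambiguous list item, yet it still renders as an empty <li>.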
I’d be in favor of that, but worry about compatibility. I suspect that it’s not too uncommon to use single-character underlines, just because people are lazy. It would be nice if we had data on this.
after a “vanilla” text line (i.e. without a preceding blank line)?
I have no data either, but my wild guess would be that “using a single-character setext underline” is more common than this combination of circumstances.
Cetero censeo: Recognizing list items in a “vanilla paragraph” is a bad idea (and that is what is going on here, after all).
Given that we are already introducing the breaking “#this-is-not-a-header” change, I don’t see any harm in tightening up the rules on setext headers to require more than one - or =.
As I have mentioned before, headers are by definition rare elements in a doc and thus pretty easy to fix, as they are a) obvious and b) big thematic breaks, so they are easy to find.
It’s certainly uncommon as the desired end result, but it happens quite often while typing. I’ve seen implementations with integrated live preview or preview-like syntax highlighting that indicate a heading at first when a line begins with hyphen-minus (and whitespace), but switch to a list item as soon as any other character is typed. It’s annoying, so I support a minimum number of - or = for setext headings that is greater than 1.
As I wrote, I find the whole idea of splitting a “vanilla-starting” paragraph into items and whatnot “after the fact” to be bogus. So I have no horse in this race, one way or the other: this is just another irritating consequence of that idea.
In fact, I agree with you and @codinghorror that introducing one more little incompatibility with all the other Markdown dialects out there (in your case, to tame a particular GUI implementation’s annoying behavior) would likely not do any harm. For an appropriate definition of “harm”, that is.
GitHub .md files would be an excellent source, since they tend to include longer documents, which SO lacks. Since these are public, I suppose we could grab them ourselves, or maybe ask Vicent Marti. I’d love to have this data.
Did anyone manage to find a corpus of notes to analyse?
I would be interested to know how badly breaking it would be:
to require multiple underlines for setext headers
to require setext headers and/or underlines to start at the beginning of lines
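One crude way to start measuring this on a corpus is a line-pair heuristic like the following Python sketch. It is an approximation built on assumptions: it ignores code blocks, lists, block quotes and thematic breaks, so it will both over- and under-count. It could still give a first estimate for each of the two proposals. The name scan is hypothetical.

```python
import re
from collections import Counter

# Setext underline candidate: up to 3 spaces, a run of "=" or "-", trailing spaces.
UNDERLINE_RE = re.compile(r"^( {0,3})(=+|-+) *$")

def scan(text: str) -> Counter:
    """Hypothetical heuristic: count setext underline candidates in `text`
    and how many each proposed rule change would affect."""
    counts = Counter()
    lines = text.splitlines()
    for prev, line in zip(lines, lines[1:]):
        if not prev.strip():
            continue  # an underline has to follow a non-blank text line
        m = UNDERLINE_RE.match(line)
        if m is None:
            continue
        counts["underlines"] += 1
        if len(m.group(2)) == 1:
            counts["single_char"] += 1  # proposal 1: require multiple underline chars
        if m.group(1) or prev.startswith(" "):
            counts["indented"] += 1  # proposal 2: heading/underline must start at column 0
    return counts
```

Summing these counters over many files gives the share of setext headings each change would break.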
I believe the empty, hyphenated list-item is a non-trivial issue, but we don’t see it, because people realize it looks weird, then avoid it. With the rise of interactive markdown pads for group meeting notes, where bullet points are left empty to fill later, solving this could have significant benefits.
Many randomly generated IDs will correspond to private or deleted projects, so significantly more calls will need to be made than the desired sample size.
After that, I should be able to figure out how to do the appropriate regex searching.
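On the sampling point: if only a fraction p of random IDs land on public projects, then each hit takes 1/p lookups on average (a geometric distribution), so collecting n projects needs about n/p calls. Here is a tiny helper for that estimate. The hit rate is an assumed input, since we don't know its real value.

```python
import math

def expected_calls(sample_size: int, hit_rate: float) -> int:
    """Expected number of random-ID lookups needed to collect `sample_size`
    public projects, if a fraction `hit_rate` of IDs are public (assumed)."""
    if not 0 < hit_rate <= 1:
        raise ValueError("hit_rate must be in (0, 1]")
    return math.ceil(sample_size / hit_rate)
```

For example, with an illustrative hit rate of 25%, a sample of 1000 projects would take about 4000 calls.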
I would be interested to hear whether an investigation as I’ve described would have the necessary weight to influence decisions (in any direction), and if not, what would need to be changed to make it so? @jgm @vas (I ask because if the result would be inconsequential in any case, it’s not worth me doing the work.)