Minor comments and unclear points after reading the spec

Hi folks,

I just stumbled upon this new spec and read it through. It’s a great spec. It’s a little verbose in places (sometimes something EBNF-ish might have been more compact, though I don’t think that would work for everything), but it reads very easily. It also feels complete, nice!

There were some things I stumbled upon while reading that struck me as unclear or confusing. I think these are all fairly trivial, so I’ll list them in one topic (except for one issue, which warrants a separate topic).

  • Section 5.2 says “The following rules define list items:”, but the link in that sentence points to an undefined anchor.

  • In section 5.2, rule #2 explicitly mentions “If a line is empty, then it need not be indented”. However, rule #1 doesn’t say this explicitly, even though the examples show that it applies there as well.

  • Section 6 says “Any ASCII punctuation character may be backslash-escaped”. Is this term well enough defined? Shouldn’t there be an explicit list? Just “punctuation character” seems imprecise, since the Wikipedia article on punctuation counts spacing as punctuation as well (but spaces cannot be escaped in CommonMark).

  • Section 6.2 says “… all HTML valid HTML Entities in any context are recognized as such …”, which I understand to mean any context within the Markdown document. However, a bit further down, it says “Entities are recognized in any context besides code spans or code blocks, …”, which contradicts the first sentence. Isn’t that confusing?

  • Section 6.2 also says “all HTML valid HTML Entities in any context are recognized as such and converted into their actual values (i.e. the UTF8 characters representing the entity itself) before they are stored in the AST.” Isn’t it implementation-specific to use UTF8 as the representation? Shouldn’t this spec just say that entities should be interpreted as the corresponding Unicode character / code point and leave it up to the implementation what encoding to use in the AST and when serializing the HTML or other output format?

    Example 338 is related and says “HTML entities in the destination will be parsed into their UTF8 codepoints, as usual, and optionally URL-escaped when written as HTML”. Since this describes the actual serialized representation of the URL, it makes sense to mention UTF-8 explicitly here (though I don’t think a “UTF8 codepoint” actually exists; it would be the UTF-8 representation of the code point the entity stands for, or something like that).

  • In section 6.4, example 288 is:

     ***foo bar***
    

    which is rendered as:

     <p><strong><em>foo bar</em></strong></p>
    

    However, looking at the rules, I can’t see why this puts strong emphasis around emphasis rather than the other way around (i.e. why it isn’t <em><strong>foo bar</strong></em>).

  • Furthermore, example 319 contains:

     **foo*
    

    rendered as:

     <p>**foo*</p>
    

    I can’t see why this is not rendered as:

     <p>*<em>foo</em></p>
    

    I can’t see anything that would prevent rules #1 and #3 from applying to the second and third asterisks in the line, respectively. What am I missing here?

  • Spaces in links seem to be handled inconsistently. Example 399 says “Spaces are not allowed in autolinks”. However, Example 331 says, about regular link destinations, “If the destination contains spaces, it must be enclosed in pointy braces”.

    This seems a bit confusing to me: in regular link destinations I can use <> to explicitly allow spaces in URLs, but in autolinks spaces are explicitly forbidden. I guess the rationale is to prevent false positives for autolinks, a risk that is far smaller for regular link destinations, which makes sense.

  • Section 6.10 says “A regular line break (not in a code span or HTML tag) that is not preceded by two or more spaces is parsed as a softbreak.” Shouldn’t that also exclude line breaks preceded by a backslash?

  • Example 443 says, about regular strings after all formatting is applied, “Internal spaces are preserved verbatim”. However, Example 236 says, about backtick-enclosed strings, “Interior spaces and newlines are collapsed into single spaces, just as they would be by a browser”.

    Isn’t this inconsistent, and even backwards? Backticks are supposed to prevent reformatting, yet they collapse spaces, while in other places spaces are preserved.
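To make the escaping question concrete, here is a rough Python sketch (my own, not taken from the spec or any implementation). It assumes “ASCII punctuation character” means exactly the 32 printable ASCII characters that are neither letters, digits nor space, which is what Python’s `string.punctuation` contains. The helper name `unescape_backslashes` is hypothetical, and the sketch ignores code spans, where escapes don’t apply.

```python
import string

# Assumed reading of "ASCII punctuation character": exactly the 32
# characters in Python's string.punctuation (no space, no letters/digits).
ASCII_PUNCTUATION = frozenset(string.punctuation)

def unescape_backslashes(text: str) -> str:
    """Drop a backslash that precedes an ASCII punctuation character.

    A backslash before anything else (letters, digits, spaces) is kept
    literally. Code spans are not handled here.
    """
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] in ASCII_PUNCTUATION:
            out.append(text[i + 1])  # keep only the escaped character
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

print(len(ASCII_PUNCTUATION))                        # 32
print(unescape_backslashes(r"\*not emphasized\*"))   # *not emphasized*
print(unescape_backslashes(r"\ space \a"))           # \ space \a
```

Under that reading, an explicit list in the spec would be about as short as this set.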
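For the entity point, here is a small sketch of the distinction I mean, using Python’s standard `html.unescape` only as a stand-in decoder (it follows HTML5 rules, which may not match the spec’s entity rules exactly; `decode_entity` is a hypothetical wrapper). Decoding yields a Unicode code point, and picking UTF-8 or any other byte encoding is a separate, later step.

```python
import html

def decode_entity(entity: str) -> str:
    """Resolve one HTML entity reference to the Unicode text it names."""
    return html.unescape(entity)

# Named, decimal and hexadecimal references all denote the same code point.
for ref in ("&auml;", "&#228;", "&#xE4;"):
    ch = decode_entity(ref)
    print(ref, "->", f"U+{ord(ch):04X}")   # U+00E4 each time

# Serializing that code point to bytes is an independent choice.
print("\u00e4".encode("utf-8"))      # b'\xc3\xa4'
print("\u00e4".encode("utf-16-le"))  # b'\xe4\x00'
```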
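And a sketch of the two whitespace rules as I read them from the quoted examples (function names are hypothetical, and trimming the ends of a code span is my assumption about the surrounding rule): code span content has runs of spaces and newlines collapsed to one space, while ordinary text keeps internal spaces verbatim.

```python
import re

def code_span_content(raw: str) -> str:
    """Code span rule as read from Example 236 (trimming is assumed):
    trim the ends, then collapse each run of spaces/newlines to one space."""
    return re.sub(r"[ \n]+", " ", raw.strip(" \n"))

def plain_text_content(raw: str) -> str:
    """Ordinary text rule as quoted from Example 443: internal spaces
    are preserved verbatim."""
    return raw

print(repr(code_span_content(" foo   bar\n  baz ")))  # 'foo bar baz'
print(repr(plain_text_content("foo   bar")))          # 'foo   bar'
```

So the same three spaces survive in plain text but not inside backticks, which is the reversal I find surprising.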

P.S. This forum should be using CommonMark; my comment was actually rendered wrong because of some of the very problems the spec solves ;-p

P.S. 2: This forum tried really hard to dissuade me from being helpful. First it refused more than 4 links per post, then it refused more than a few links to the same host (so I had to replace links with github.com and github.org instead of github.io, causing the github.org links to break…)
Update: Fixed this again now that I have a higher trust level.

+++ Matthijs_Kooijman [Oct 06 14 16:25 ]:

P.S. 2 this forum tried real hard to dissuade me from being helpful:
First it refused more than 4 links per post, then it refused more than
a few links to the same host (so I had to replace links with github.com
and github.org instead of github.io, causing the github.org links to
break…)

Thanks very much for these comments! This is just the kind of feedback we’re hoping to get, and I’m sorry the forum software was discouraging. I’ve bumped up your trust level so that you shouldn’t have problems in the future.

Cool, hope it helps.

Great, I undid my manual hacks so all links should actually work again now :slight_smile:

You should at least offer the option of submitting a post that stays unpublished until a moderator approves it, and give a temporary bump in posting privileges.


I’ll whitelist github.com and github.io as a non-spam link source – sorry about that!

Discourse generally does not trust posts from new users with a lot of links. I hope it is clear why… :wink:

It’s totally understandable, the lesser of two evils though ;-p