I’m developing a syntax plugin for Markdown (currently switching it to CommonMark) that takes a very limited approach to parsing.
My main concern is that some terms are not clearly defined.
In section 2, you say:
“This spec does not specify an encoding”
Nonetheless, the rest of the document specifies several things in terms of ASCII.
In section 2, you say:
“Line endings are replaced by newline characters (LF).”
I have not seriously worked with Windows for years, but is this really clean? Shouldn’t the exact encoding rather be left to the implementation?
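For reference, the rule quoted above amounts to a single pass over the input before parsing. This is a minimal sketch, assuming the text is already decoded to a `str` and that both CR LF and a lone CR count as line endings; `normalize_line_endings` is a hypothetical helper name.

```python
import re

# CR LF must be tried before a lone CR, otherwise "\r\n" would become "\n\n".
_LINE_ENDING = re.compile(r"\r\n|\r")

def normalize_line_endings(text: str) -> str:
    """Replace every CR LF and every lone CR with a single LF."""
    return _LINE_ENDING.sub("\n", text)

print(repr(normalize_line_endings("a\r\nb\rc\nd")))  # 'a\nb\nc\nd'
```

In this sketch the normalization only affects the text handed to the parser; it says nothing about what line endings the output uses.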
In section 6.1, you say:
“Any ASCII punctuation character may be backslash-escaped:”
What is an “ASCII punctuation character”? Is it exactly the set listed below, in example 207, or are there more? And what about punctuation characters from other code spaces?
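One plausible reading of “ASCII punctuation character” is the 32 printable ASCII characters that are neither letters, digits nor space, i.e. U+0021–U+002F, U+003A–U+0040, U+005B–U+0060 and U+007B–U+007E, which happens to be exactly Python’s `string.punctuation`. The sketch below assumes that reading (it is not taken from the spec text quoted above) and turns it into a backslash-escape regex; `unescape` is a hypothetical helper.

```python
import re
import string

# Assumption: "ASCII punctuation" = U+0021-002F, U+003A-0040, U+005B-0060, U+007B-007E.
ASCII_PUNCTUATION = "".join(
    chr(c)
    for c in list(range(0x21, 0x30)) + list(range(0x3A, 0x41))
    + list(range(0x5B, 0x61)) + list(range(0x7B, 0x7F))
)
assert ASCII_PUNCTUATION == string.punctuation  # same 32 characters, same order

# A backslash followed by one of those characters; re.escape keeps the class valid.
BACKSLASH_ESCAPE = re.compile(r"\\([" + re.escape(ASCII_PUNCTUATION) + r"])")

def unescape(text: str) -> str:
    """Drop the backslash before ASCII punctuation; leave other backslashes alone."""
    return BACKSLASH_ESCAPE.sub(r"\1", text)

print(unescape(r"\*not emphasis\* and \a"))  # prints: *not emphasis* and \a
```

Under this assumption a backslash in front of non-ASCII punctuation such as `„` stays literal, which is exactly the case the question above is about.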
In section 6.7, you say:
“… followed by zero or more characters other than ASCII whitespace and control characters, <, and >.”
What are control characters? ASCII 0–31?
Why not simply reference RFC 3986, which defines what a URI is?
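To make the ambiguity concrete, here is a minimal sketch of an autolink regex built for two readings of “control characters”: ASCII 0–31 only, or 0–31 plus DEL (127). Both readings, the simplified RFC 3986-style scheme pattern and the name `autolink_regex` are assumptions for illustration, not quotes from the spec.

```python
import re

def autolink_regex(include_del: bool) -> re.Pattern:
    """Build a pattern for <scheme:rest> under one reading of "control characters".

    Assumptions: the scheme is a letter followed by letters, digits, '+', '.', '-'
    (RFC 3986 syntax); "control characters" are U+0000-U+001F, plus U+007F when
    include_del is True. Space (U+0020) is excluded as ASCII whitespace; tab, LF,
    VT, FF and CR already fall inside U+0000-U+001F.
    """
    controls = r"\x00-\x1f" + (r"\x7f" if include_del else "")
    rest = rf"[^\x20{controls}<>]*"
    return re.compile(rf"<([A-Za-z][A-Za-z0-9+.\-]*):({rest})>")

sample = "<http://example.com/a\x7fb>"  # contains a literal DEL character
print(bool(autolink_regex(include_del=False).fullmatch(sample)))  # True
print(bool(autolink_regex(include_del=True).fullmatch(sample)))   # False
```

The same input is an autolink under one reading and not under the other, which is why a precise definition matters for a regex-based implementation.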
In section 6.7, you specify a list of schemes to be recognized. Wouldn’t it be easier and cleaner to just reference the IANA registry, which you obviously took them from anyway?
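The difference between a hard-coded scheme list and a generic definition can be sketched as follows. `KNOWN_SCHEMES` is a hypothetical, deliberately tiny stand-in for a whitelist (it is neither the spec’s list nor the IANA registry); the alternative uses RFC 3986’s scheme syntax, `ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )`, and needs no list at all.

```python
import re

# Hypothetical excerpt of a whitelist; a real one would be much longer.
KNOWN_SCHEMES = {"http", "https", "mailto", "ftp"}

# RFC 3986, section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
RFC3986_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*")

def is_scheme_whitelisted(scheme: str) -> bool:
    # Schemes are case-insensitive, so compare in lower case.
    return scheme.lower() in KNOWN_SCHEMES

def is_scheme_syntactic(scheme: str) -> bool:
    return RFC3986_SCHEME.fullmatch(scheme) is not None

for s in ["HTTP", "git+ssh", "2http"]:
    print(s, is_scheme_whitelisted(s), is_scheme_syntactic(s))
```

With the syntactic check, a scheme registered later is accepted without anyone having to edit a list.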
Don’t get me wrong: I don’t want to nitpick, and I appreciate having a more formal definition of Mark* than before. But since I have to work entirely with regular expressions, these ambiguities need to be resolved for CommonMark to be truly future-compatible.
Furthermore, a formal definition would be much more useful than a prose one: for example, a regex that pins down what you actually specify, or a grammar of some kind (I don’t know, I’m no computer scientist). Section 6.4 is far too long for what it specifies! This partly overlaps with roop’s thread.
Of course you cannot specify everything with a regex, but as long as the rest is unambiguous, some regexes would really help in understanding the text, especially for list items and blockquotes.
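As an illustration of the kind of regex meant here, this is a rough, hypothetical sketch for recognizing the start of a blockquote line and of a list item. It assumes at most three spaces of indentation, `>` plus an optional space for blockquotes, `-`, `+` or `*` bullets, and ordered markers of 1 to 9 digits followed by `.` or `)`; it ignores tabs, laziness, nesting and everything else the spec says about these blocks.

```python
import re

# Blockquote marker: up to 3 spaces of indentation, '>', then an optional space.
BLOCKQUOTE_START = re.compile(r"^ {0,3}> ?")

# List item marker: up to 3 spaces, a bullet (-, +, *) or 1-9 digits followed by
# '.' or ')', then 1-4 spaces or the end of the line. Tabs are not handled.
LIST_ITEM_START = re.compile(r"^( {0,3})([-+*]|[0-9]{1,9}[.)])( {1,4}|$)")

def classify(line: str) -> str:
    if BLOCKQUOTE_START.match(line):
        return "blockquote"
    if LIST_ITEM_START.match(line):
        return "list item"
    return "other"

for line in ["> quoted", "   - item", "10) ten", "    - code?", "-not a list"]:
    print(repr(line), classify(line))
```

Even this small sketch shows where prose is still needed: it would, for example, misclassify a thematic break such as `* * *` as a list item.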
PS: This Discourse instance does not follow CommonMark. Single newlines are rendered as line breaks, and paragraphs inside list items do not work as intended, as can be seen in this text.
OK, then it’s not an encoding issue. It’s still unnecessarily annoying. Whatever Windows “stubbornly” does, this makes the output of the proposed CommonMark renderers unreadable for Windows users, who are still by far the majority.
Or is there a specific reason why LF vs. CRLF must be in the standard, one that makes a difference beyond the compliance checks?
So Kerry Redshaw from Brisbane, Queensland, Australia is the reference for what ASCII punctuation is? Then maybe that should be noted in the standard.
When you open a new text field, it suggests using “Markdown or BBCode”.
Yes, I did. That is why I’m asking whether this couldn’t be part of the standard, which isn’t formally written anyway. Providing a few lines of regexes is different from pointing to 1,500 lines of code with regexes in it.