IMO there are perfectly valid reasons why backslash escapes (and indeed all other CommonMark-significant markup like `*`, `[`, the backtick `` ` `` itself, and entity references) are not recognized in code blocks, code spans, and raw HTML; in other words, why these fragments are not interpreted in any way.
However, for these fragments of input text there are already rules in place to find their end (rather ingenious in the case of code spans and the “repeated backtick-trick”, or for the most part inherited from HTML in the case of raw HTML).
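To make the “repeated backtick-trick” concrete, here is a minimal sketch of how the end of a code span can be found: the closing delimiter is the next run of backticks with exactly the same length as the opening run. The function name `find_code_span` is made up for illustration, and the sketch deliberately ignores details such as space stripping and line-ending normalization.

```python
def find_code_span(text: str, start: int):
    """Return (content, end_index) for a code span opening at text[start], or None."""
    # Measure the opening backtick run.
    n = 0
    while start + n < len(text) and text[start + n] == "`":
        n += 1
    if n == 0:
        return None  # no code span opens here

    i = start + n
    while i < len(text):
        if text[i] != "`":
            i += 1
            continue
        # Measure the backtick run beginning at i.
        j = i
        while j < len(text) and text[j] == "`":
            j += 1
        if j - i == n:
            # A run of exactly the same length closes the span.
            return text[start + n:i], j
        i = j  # a run of a different length is ordinary content
    return None  # no matching closing run


print(find_code_span("``a ` b`` c", 0))  # ('a ` b', 9)
```

Nothing inside the span needs to be escaped, because the author can always pick an opening run that is longer than any backtick run in the content.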
For autolinks, link titles, and link destinations the rules are a little more complicated. The terminating delimiter can be:
- the first non-escaped QUOTATION MARK or APOSTROPHE, respectively, skipping white space (for link titles),
- the first non-escaped RIGHT PARENTHESIS providing a balanced match, skipping white space, for the opening LEFT PARENTHESIS (for link titles),
- the first non-escaped GREATER-THAN SIGN, without skipping white space (for link destinations),
- the first white space character (for link destinations and autolinks),
- the first GREATER-THAN SIGN, without skipping white space or LESS-THAN SIGN (for autolinks),
- the first LESS-THAN SIGN, without skipping white space or GREATER-THAN SIGN (for autolinks),
- the first non-escaped RIGHT PARENTHESIS providing a balanced match, without skipping white space (for link destinations in inline links).
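As an illustration of just one of these rules (the last one, combined with the white space rule), here is a rough sketch of scanning a link destination that is not enclosed in angle brackets. The name `scan_bare_destination` is hypothetical, and treating a backslash as an escape only before ASCII punctuation is my reading of the rules, not a quote from the spec.

```python
import string


def scan_bare_destination(text: str, start: int):
    """Scan a destination beginning at text[start] (just after the opening '(').

    Returns (destination, end_index), where end_index points at the
    terminating character (white space or the closing parenthesis).
    """
    depth = 0  # nesting level of unescaped parentheses inside the destination
    i = start
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] in string.punctuation:
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch.isspace():
            break  # the first white space character terminates the destination
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                break  # unbalanced ')' closes the inline link
            depth -= 1
        i += 1
    return text[start:i], i


print(scan_bare_destination("foo(bar)baz) rest", 0))  # ('foo(bar)baz', 11)
print(scan_bare_destination(r"a\)b) x", 0))           # ('a\\)b', 4)
```

Even this single case needs escape handling, a nesting counter, and a white space check, which is the complexity I am complaining about.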
Although Gruber's description is in fact silent about all of these, there is “prior art” for this use of `\"`, `\'`, `(`…`)`, `\(`, `\)` in link titles. This might be sufficient to support this use in the specification. (I personally tend to just write `"` if needed in link titles enclosed in `"`, and to forget about all the other rules.)
For handling space or `>` or `\` in link destinations and autolinks (i.e., in URIs), I think there’s much less “common” practice.
Apply the “repeated backtick-trick” here too?
So I wonder whether CommonMark could apply the “repeated backtick-trick” here too: open a link destination or autolink with two or more `<` characters, and the closing delimiter will be the same number of repeated `>` characters.
This would, in one go:
- allow any text, including unescaped space, `>`, `)` and `<` in link destinations and autolinks;
- provide an obvious distinction between an autolink and an HTML tag (which was the reason for excluding space from the former, IIRC!)
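A minimal sketch of what such a scanner could look like, assuming the closing run must have exactly the same length as the opening run (mirroring the code span rule; the proposal itself does not settle this detail). The function name `scan_repeated_angle` is hypothetical and not part of any existing implementation.

```python
def scan_repeated_angle(text: str, start: int):
    """Return (content, end_index) for a '<'-run delimited span at text[start], or None."""
    # Measure the opening run of '<' characters.
    n = 0
    while start + n < len(text) and text[start + n] == "<":
        n += 1
    if n == 0:
        return None

    i = start + n
    while i < len(text):
        if text[i] != ">":
            i += 1  # space, ')', '<' and '\' are all plain content here
            continue
        # Measure the run of '>' characters beginning at i.
        j = i
        while j < len(text) and text[j] == ">":
            j += 1
        if j - i == n:
            return text[start + n:i], j  # matching run closes the span
        i = j  # shorter or longer runs are content
    return None


print(scan_repeated_angle('<<mailto:"2 < 4"@example.com>>', 0)[0])
# mailto:"2 < 4"@example.com
```

No escape mechanism and no nesting counter are needed: the only state is the length of the opening run.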
While `<svn:defs>` looks a lot like `<svg:defs>`, and the latter should be written `<svg:defs >` in CommonMark to tell the two apart, it is perfectly obvious that `<<svn:defs>>` or `<<mailto:"2 < 4"@example.com>>` are not HTML tags (but would be autolinks). Similarly,
[link text](<<http://example.com/with a ) very strange > url>> "How about that?")
is much easier to type, parse, and copy-and-paste for humans and machines than
[link text](<http://example.com/with%20a%20%29%20very%20strange%20%3E%20url> "How about that?")
or the “clearer” (because two fewer characters are percent-encoded)
[link text](<http://example.com/with%20a%20\)%20very%20strange%20\>%20url> "How about that?")
(I found no easier or clearer way to enter this URL in CommonMark.)
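For what it is worth, the fully percent-encoded form is what a standard URL-quoting routine produces from the raw URL. A small sketch using Python's `urllib.parse.quote`, where keeping `:` and `/` unencoded is my choice for this example:

```python
from urllib.parse import quote

raw = "http://example.com/with a ) very strange > url"

# Percent-encode everything except unreserved characters, ':' and '/'.
encoded = quote(raw, safe=":/")

print(encoded)
# http://example.com/with%20a%20%29%20very%20strange%20%3E%20url
```

So a machine can produce the encoded form easily; the point is that a human typing or reading the link should not have to.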