Unicode Character 'BULLET' (U+2022)

derari · February 4, 2016, 2:56pm

Not wishing to interrupt the flamewar between kolen and Kagan_Kayal, I would like to add these two points.

#### CommonMark is already a Unicode specification

> A character is a Unicode code point. Although some code points (for example, combining accents) do not correspond to characters in an intuitive sense, all code points count as characters for purposes of this spec.
>
> This spec does not specify an encoding; it thinks of lines as composed of characters rather than bytes. A conforming parser may be limited to a certain encoding.

So it doesn’t matter whether a parser prefers UTF-7, -8, -16, -32, or some esoteric IBM standard: it has to support Unicode code points anyway. This also becomes apparent a few lines later (emphasis mine):

> A Unicode whitespace character is any code point in the Unicode Zs class, or a tab (U+0009), carriage return (U+000D), newline (U+000A), or form feed (U+000C).

The old ASCII whitespace is only used inside HTML tags, probably for compatibility with the HTML standards.
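
To make the quoted definition concrete, here is a minimal Python sketch of a Unicode whitespace check. The function name `is_unicode_whitespace` is my own, not something from the spec, and the result depends on the Unicode tables shipped with your Python version.

```python
import unicodedata

# The four control characters the quoted definition lists in addition to Zs.
EXTRA_WHITESPACE = {"\t", "\r", "\n", "\f"}


def is_unicode_whitespace(ch: str) -> bool:
    """Return True if the single character `ch` is in the Unicode Zs
    general category or is tab, carriage return, newline, or form feed."""
    return unicodedata.category(ch) == "Zs" or ch in EXTRA_WHITESPACE


print(is_unicode_whitespace("\u00a0"))  # NO-BREAK SPACE, category Zs -> True
print(is_unicode_whitespace("\u2022"))  # BULLET, category Po -> False
```

The point is that the check is written against code points, so it behaves the same regardless of which encoding the input arrived in.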

So if CommonMark uses Unicode anyway, it might as well use all of it. Otherwise we have an ASCII markup language with some Unicode text in between, which seems quite arbitrary.

On the other hand…

#### Unicode bullets can’t be escaped

> Any ASCII punctuation character may be backslash-escaped.
> Backslashes before other characters are treated as literal backslashes.

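So a backslash in front of a Unicode bullet such as U+2022 would stay a literal backslash, and the bullet could not be escaped the way `*` or `-` can. This definitely needs to be looked into, and AFAIK it hasn’t been mentioned yet.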
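
A rough Python sketch of the escaping rule quoted above shows the problem. It is a simplification, not how any real parser is written: it ignores code spans, hard line breaks, and every other context where backslashes behave differently. The function name `unescape_backslashes` is made up, and I am assuming that Python's `string.punctuation` matches the spec's set of ASCII punctuation characters.

```python
import string

# Assumption: string.punctuation is the same set of characters the spec
# calls "ASCII punctuation".
ASCII_PUNCTUATION = set(string.punctuation)


def unescape_backslashes(text: str) -> str:
    """Drop a backslash only when it precedes an ASCII punctuation
    character; every other backslash is kept as a literal backslash."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] in ASCII_PUNCTUATION:
            out.append(text[i + 1])  # escaped punctuation: keep only the character
            i += 2
        else:
            out.append(ch)  # ordinary character, or a literal backslash
            i += 1
    return "".join(out)


print(unescape_backslashes("\\* not a list item"))       # backslash is consumed
print(unescape_backslashes("\\\u2022 not a list item"))  # backslash stays in the output
```

With `*` the backslash disappears and the asterisk is treated as plain text. With U+2022 the backslash survives into the output, so under the current rule there would be no way to write a literal bullet at the start of a line if it ever became a list marker.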