Unicode Character 'BULLET' (U+2022)

instanceofme · September 12, 2014, 4:25pm

Not necessarily. A simple parser that matches significant markers on ASCII chars and blindly outputs all other bytes would still work fine with one-byte encodings like ISO-8859-* and with encodings like UTF-8, where every byte is either an ASCII char or part of a non-ASCII sequence.

Admittedly it won’t work with UTF-16, though. I’m just saying that including Unicode markers raises the bar for implementations.
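
To make that concrete, here is a minimal Python sketch of such a byte-oriented matcher. The function name `split_list_item` and the sample line are made up for illustration; this is not code from any actual Markdown implementation.

```python
# Hypothetical sketch: match list markers on ASCII bytes only and pass
# every other byte through untouched, without decoding the input.
ASCII_MARKERS = (b"*", b"+", b"-")

def split_list_item(line: bytes):
    """Return (marker, rest) if the line starts with an ASCII list marker, else None."""
    stripped = line.lstrip(b" ")
    for marker in ASCII_MARKERS:
        if stripped.startswith(marker + b" "):
            # The rest of the line is copied blindly, whatever its encoding.
            return marker, stripped[len(marker) + 1:]
    return None

text = "- Contact Dan Gøran"
utf8_line = text.encode("utf-8")
latin1_line = text.encode("iso-8859-1")
utf16_line = text.encode("utf-16-le")

print(split_list_item(utf8_line))    # marker found, UTF-8 bytes passed through
print(split_list_item(latin1_line))  # marker found, Latin-1 bytes passed through
print(split_list_item(utf16_line))   # no match: each ASCII char is followed by a NUL byte
```

The parser never needs to know which encoding it was given, as long as ASCII chars keep their one-byte values. A • marker breaks that property, because the parser would have to know how • is encoded.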

instanceofme · September 12, 2014, 4:42pm

Because text is not HTML, you have to make concessions, like requiring two spaces at the end of a line to get a line break, otherwise you cannot have a well-formatted text file OR you get line breaks every other sentence in the output. Markdown already works like this in almost all implementations, except for some that are not destined to be used with text files (e.g. embedded in an app or website).

So when you say “it doesn’t [work]”, you mean that, like most markdown implementations today, it doesn’t work for this quite specific case of both using the • bullet and not wanting to append those two spaces.

Consider the alternatives:

Lose the encoding-agnosticism – backward-incompatible in a major way, plus lots of drawbacks
Always break lines – backward-incompatible in a major way, losing the ability to have clean text files

I think not supporting • as a list item marker is the lesser annoyance.

seantek · September 12, 2014, 9:35pm

Like instanceofme said:

CommonMark (as I understand it) is supposed to smooth out differences between all of the implementations out there, and smooth out the ambiguities in Gruber’s writeup. Its purpose is not necessarily to add new features. Regardless of the character set issue, it looks and smells like a new feature.

kagan · September 13, 2014, 3:06pm

Fenced code blocks are old and list items are new?

seantek · September 13, 2014, 4:06pm

No.

With • (and the Unicode standard), yes.

bjornte · December 17, 2014, 12:01pm

Hi, I’m writing on an iPad, and here the «•» is readily accessible by long-pressing the «-» button. It requires fewer button presses than «*» or «+».

Here’s my list example:

General to do

• Continue sales
• Better presentation material
• Read e-mail to Ian
• Read old draft business plan
• Contact Dan Gøran
• Ask Simen about mail

It was written using the • character, using Notes in the iPad.

This looks good in the preview here on the talk page, but bad in the dingus (http://spec.commonmark.org/dingus.html). The dingus makes it look like so:

• Continue sales • Better presentation material • Read e-mail to Ian • Read old draft business plan • Contact Dan Gøran • Ask Simen about mail

(Why the difference between the talk pages and the dingus?)

JCSalomon · December 17, 2014, 4:37pm

The difference is due to this site respecting arbitrary line breaks; the dingus (following the CommonMark spec) concatenates adjacent lines into a single paragraph.
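
A toy Python sketch of the two behaviours, in case it helps. `render_paragraphs` and its `hard_breaks` flag are hypothetical names; joining with a space only mimics how a browser displays a soft line break, it is not how the dingus or this site is actually implemented.

```python
# Hypothetical sketch: render blank-line-separated blocks as paragraphs,
# either keeping each line break or folding adjacent lines together.
def render_paragraphs(text: str, hard_breaks: bool) -> list:
    paragraphs = []
    for block in text.split("\n\n"):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        # Keep line breaks (like this site) or join lines (like the dingus).
        joiner = "<br>" if hard_breaks else " "
        paragraphs.append("<p>" + joiner.join(lines) + "</p>")
    return paragraphs

sample = "• Continue sales\n• Better presentation material"
print(render_paragraphs(sample, hard_breaks=True))
print(render_paragraphs(sample, hard_breaks=False))
```

In both cases the • lines are ordinary paragraph text; neither mode produces a list.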

(Note that though the list displays nicely here, it has not been translated into an HTML list.)

jgm · December 17, 2014, 7:31pm

See https://github.com/jgm/CommonMark/pull/198

Your point about the ease of typing a bullet on an iPad might count in favor of making this change.

cirosantilli · December 19, 2014, 10:17am

-1: There are already too many ways of making lists. This puts even greater burden on implementors and learning curve on end users. Like in programming languages, it is better to have one single way to do things.

codinghorror · December 19, 2014, 8:45pm

No, I don’t agree, @cirosantilli. There are already so many ways to make lists:

* this is a list
+ this is a list
- this is a list

So why not a unicode bullet? It seems like a rather safe and minor change to me, and it would help real users; I see people doing this in the wild a fair bit. That’s the main thing I care about: the combination of “easy, minor” and “seen in the wild a lot.”

I think we should include it @jgm .

jgm · December 19, 2014, 10:53pm

I think I am also in favor. Note that reStructuredText added unicode bullets a long time back: they allow •, ‣, or ⁃.

Here is their mailing list discussion:
http://thread.gmane.org/gmane.text.docutils.user/2959/focus=2960

And their spec:
http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#bullet-lists

chrisalley · December 19, 2014, 11:44pm

Casual writers who do not know how to insert • may be confused when editing someone else’s unicode bullet list.

The *, + and - characters are all easily accessible from a standard keyboard. Pressing Option+8 on the Mac to insert an additional • list item is not intuitive. If the writer does not know the key combination then they must use copy/paste, search for how to insert •, or rewrite all of the existing list markers.

No other Markdown syntax requires knowledge of key combinations (besides Shift of course).

zzzzBov · December 24, 2014, 6:44am

I’m still on the fence for this change. So I’ll voice my thoughts for and against, and see if anyone can sufficiently shoot them down.

(emphasis mine)

IIRC, all special characters in Markdown are in the ASCII range. Adding support for U+2022 would require that conforming implementations support unicode rather than ASCII, making the standard more restrictive.
Limited support from existing implementations. If they didn’t need it before, why do we need it now?
More divergence from the original markdown.
This particular change seems like it belongs in a unicode extension, which could add all relevant unicode bullet characters to the list of acceptable characters, including (but not limited to):

• (U+2022) bullet
‣ (U+2023) triangular bullet
⁃ (U+2043) hyphen bullet
⁌ (U+204C) black leftwards bullet
⁍ (U+204D) black rightwards bullet
∙ (U+2219) bullet operator
◦ (U+25E6) white bullet

Additionally, unicode numerals beyond 0-9 could be supported for numeric lists.
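
As a rough illustration of what such an extension might look like, here is a Python sketch that accepts the ASCII markers plus the bullets listed above. The names `BULLET_RE` and `parse_bullet` are made up, and the pattern (up to three leading spaces, marker, whitespace, content) is a simplification, not the full CommonMark list-item rule.

```python
import re

# Hypothetical marker sets: the three ASCII markers plus the bullets listed above.
ASCII_BULLETS = "*+-"
UNICODE_BULLETS = "\u2022\u2023\u2043\u204c\u204d\u2219\u25e6"

# Simplified pattern: up to 3 spaces of indent, one marker, whitespace, content.
BULLET_RE = re.compile(
    r"^ {0,3}([" + re.escape(ASCII_BULLETS + UNICODE_BULLETS) + r"])[ \t]+(.*)$"
)

def parse_bullet(line: str):
    """Return (marker, content) for a bullet list item, or None for other lines."""
    match = BULLET_RE.match(line)
    return (match.group(1), match.group(2)) if match else None

print(parse_bullet("• Continue sales"))
print(parse_bullet("- this is a list"))
print(parse_bullet("Plain paragraph text"))
```

Once the input is decoded text, adding markers is only a matter of extending the character set; the open question is which characters belong in it.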

On the other hand:

“making the standard more restrictive” isn’t a strong argument. Unicode support isn’t particularly difficult when the character sets are well defined.
Many existing implementations don’t have support for fenced code blocks either, but utility was favored over popularity to reduce ambiguity.
Divergence from the original markdown is almost unavoidable, as the original is practically abandonware. If the original were to use semantic versioning, it’d be 1.0, and this would be the 2.0 spec due to the known API incompatibility.
Why go through all the extra effort to define the unicode alternatives to the core characters and then leave it up to implementors to choose whether or not they should implement an optional extension? Adding the unicode alternatives to core could help to make markdown more portable between conforming implementations.

Knagis · December 24, 2014, 7:57pm

Just wanted to note that this is not a valid point. There are two cases - either the particular implementation supports some kind of Unicode text encoding (like the C reference implementation supports UTF-8) or not (where it supports only ASCII).

If it supports just ASCII then there is no problem since the input data cannot contain a Unicode bullet char.
If it supports Unicode then the change to support this is trivial (as shown by the pull request).
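
A small Python sketch of both cases for a byte-oriented parser, assuming UTF-8 input. `bullet_marker_length` is a hypothetical helper, not the code from the pull request.

```python
# U+2022 encoded as UTF-8 is the three-byte sequence E2 80 A2.
BULLET_UTF8 = "\u2022".encode("utf-8")

def bullet_marker_length(line: bytes) -> int:
    """Return the byte length of the list marker at the start of line, or 0 if none."""
    if line[:1] in (b"*", b"+", b"-"):
        return 1
    if line.startswith(BULLET_UTF8):
        return len(BULLET_UTF8)
    return 0

print(bullet_marker_length("• Continue sales".encode("utf-8")))
print(bullet_marker_length(b"- this is a list"))
print(bullet_marker_length(b"plain ASCII text"))
```

All three bytes of the encoded bullet are above 0x7F, so pure ASCII input can never contain the sequence and the extra check is harmless there.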

Also note that the specification already contains tests that verify correct handling of Unicode characters (for url-encoding and tab expansion).

lu_zero · December 26, 2014, 2:06am

It would be nice to support all the possible bullet points then. Would you be willing to update the patch accordingly?

Brian_Lalonde · December 31, 2014, 6:34pm

Common “in the wild” use case: Bullets are often generated when pasting text out of certain popular word processing programs.

WRT the awkwardness of input: Mobile OS (tablets, initially) soft keyboards will increasingly be used for Markdown. It’s fewer keystrokes. See http://www.emojitracker.com/ to address any concern about popularity of characters not appearing on hardware keyboards.

lu_zero · January 1, 2015, 3:25pm

See above for a list of possible bullets.

chrisalley · January 3, 2015, 12:13pm

[quote=“Brian_Lalonde, post:36, topic:397”]
WRT the awkwardness of input: Mobile OS (tablets, initially) soft keyboards will increasingly be used for Markdown. It’s fewer keystrokes.[/quote]

It’s true that use of software keyboards is increasing. Hardware keyboards aren’t going anywhere though. For serious writing, hardware keyboards remain an important option.

I had a look at that site. Can you explain how it addresses the issue I raised earlier regarding appending list items to other people’s unicode lists?

Brian_Lalonde · January 5, 2015, 5:20am

Can you explain how it addresses the issue I raised earlier regarding appending list items to other people’s unicode lists?

“Confused” seems like a broad assessment based solely on a very specific presumed configuration of ignorances, I guess. I’d expect someone editing a document with bullet characters to either copy and paste them if they felt insecure about the format, or else replace them with their favorite Markdown bullet characters.

vitaly · January 5, 2015, 6:31am

+1 for extending markers list.