Unicode Character 'BULLET' (U+2022)


#62

Maybe I see this from a more high-level perspective and consider parsers a technicality to be solved, but for me the most important things are

  1. The purpose of Markdown is to add semantic structure to a human-readable plain text, and
  2. CommonMark is a standard based on Unicode.

All these arguments about which editor on which platform uses what special characters is really secondary.

Of course, for someone who writes mostly ASCII text it seems easy to use * et al, but if you write in Chinese* or Arabic or Klingon and you’re using a wide range of Unicode code points anyway it seems quite natural to use bullets as bullets, doesn’t it?

The only reason why we’re not discussing this in reverse (“do we really need * et al if Unicode provides us with actual bullets?”) is that John Gruber and most of us are still using a 140 year old keyboard layout, which really is a shitty argument in the greater scheme of things.

Thus, I think CommonMark should support all Unicode bullet characters, just as it supports all whitespace and line-break characters as such.

*) Disclaimer: I have no idea how one writes Chinese on a less-than-table-sized keyboard, but I assume it involves sacrificing the blood of a goat.


#63

A summary of the discussion thus far:

Pros

  1. Trivial to implement
  2. Produces the expected output when are in the input.

Cons

  1. It is hard to type for some users
  2. cannot be encoded in an ascii string.
  3. It may confuse a novice user when editing a documents with and may make markdown harder to learn.

My 2 Cents:

Cons 1 and 2 are minor. There is some existing useage of the bullet character already, and the markdown spec is encoding agnositc anyway (I believe there is consensus here). Since the feature is trivial to implement. That leaves Con 3 and Pro 2 which both address core aspects of markdown design phillosophy. It is not at all clear to me which is the better / more important arguement (who knows what actual users are like?). Probably either choice is ok.


#64

After thinking about this for a while, and originally being in favor of it, I am opposed to it. I think it ultimately works against the spirit of the ASCII conventions that Markdown is based on.

I understand the desire, to make it “just work”, but it seems to send the wrong message to all users. And that’s a major con.


#65

I think it would make sense to have a UTF-8 extension of CommonMark that supports additional characters such as •, but for CommonMark core, ASCII should be preferred.


#66

I’m not really convinced that the original spirit of Markdown is ASCII-only, but “easy to input”. If people frequently get characters like U+2022 from copy-and-paste operations from rich text environment to markdown / plain text environments, then it could make sense to spec them. However, this is probably not limited to .

Besides other bullet and hyphen or dash characters like U+2023 , U+25E6 , U+2043 and U+2012 , U+2013 , U+2014 , I’m thinking of local-script (mostly Indian/Brahmic) decimal digits for enumerated lists, i.e. Arabic-Indic U+0660–9/F0–9, Devanagari U+0966–F, Bengali U+09E6–F, Gurmukhi U+0A66–F, Gujarati U+0AE6–F, Oriya U+0B66–F, Tamil U+0BE6–F, Telugu U+0C66–F, Kannada U+0CE6–F, Malayalam U+0D66–F, Thai U+0E50–9, Lao U+0ED0–9, Tibetan U+0F20–33, Myanmar U+1040–9 and U+1090–9, Ethiopic U+1369–71, Khmer U+17E0–9, Mongolian U+1810–9, Limbu U+1946–F, Nko U+07C0–9, New Tai Lue U+19D0–9/A and U+1A80–9, Balinese U+1B50–9, Sundanese U+1BB0–9, Lepcha U+1C40–9, Ol Chiki U+1C50–9, Vai U+A620–9, Saurashtra U+A8D0–9, Kayah Li U+A900–9, Cham U+AA50–9, Osmanya U+104A0–9, but not circled or parenthesized European ones and probably neither East-Asian fullwidth and halfwidth nor mathematical stylistic forms. Also see the CSS Counter Styles.


#67

Is there any science or observation behind the “too many bullet types” confusion, or is this a someone’s preference masquerading as an unsubstantiated usability concern?

The real spirit of Markdown is, from my perspective anyway, actually collecting and measuring real text to drive syntax choices.


#68

Sometimes I use characters to avoid creating a list.

• Kind
• of
• like
• this.

This provides more control over styles and layout. I’ve found it useful.


#69

I can’t recall a single time I have seen a user use “real” bullets to make a list.

That observation is across my four (very long, 80+ hour weeks) years as a co-founder and primary architect of Stack Overflow, and my three years as a co-founder Discourse looking at hundreds of communities using Markdown.

It is, at best, very rare to see a unicode bullet typed from a user.

Compare with my list observation here. I see users make that mistake almost every day!


#70

Is that likely to be a broad enough selection across different document types, though? Are you personally looking at all the content for hundreds of communities? It certainly happens when pasting text from Word or Outlook emails or PowerPoint at the very least.


#71

When it comes to Discourse, yes. We have 400+ hosted customers ranging from 8-10 year old kids (hopscotch) to small businesses (taxcycle) to gamers to enthusiasts to moms and everything in between. Wildly different communities. Some specific examples here. And Stack Exchange has ~100 different communities as well.

Sharing Word documents as post attachments might be common but copying and pasting from word into a discussion is not.

I fight for the users. And believe me, I see many daily common problems regular users have with Markdown that we’re trying to address (numbered lists that work for 1) 2) 3) for example, as well as the list thing I cited above) but unicode bullets ain’t even in the top 50.


#72

If users never enter bullet characters, there’s no chance of confusion, right? The risk you’re describing, if I understand, is surprise list behavior.

I don’t have exposure the same quantity of content, so it’s odd that I’ve so often come across content that you haven’t. Even stranger that you haven’t come across any content pasted from Office lists! It doesn’t seem too much of a stretch to imagine that the use cases you are exposed to may not be complete.

If the point of this forum is to legitimize or inform the standard by gathering differing experience, that’s what I’ve tried to help provide.

I’m grateful you’ve started this infrastructure, and you are free to use that privileged position to simply assert a preference, but I wonder, beyond just this issue, whether some of these decisions could use a more scientific approach than the remembered experience of a single person, or small plurality.


#73

Yes, if we put our minds to it we can imagine many things. If you have a corpus of actual data you can point to that proves otherwise, feel free to share it. I’ve shared mine.

I should also point out that the entirety of Stack Exchange is creative commons, and freely downloadable. If you’d like to download every single post ever made (this is many, many gigabytes) and run a scan for unicode bullets, feel free!


#74

I’m in the financial industry, so there’s often restrictions to sharing certain content, but I’ll keep my eye out for sharable examples.

We have a large amount of office content that we often either convert or quote in Markdown. Perhaps most commonly is from either Outlook emails or Word documents. How much example content constitutes proof of existence or merits consideration?


#75

Copying an automatic (created with a hyphen+space) bulleted list from Word 2016 (web version) and pasting it into Atom, Visual Studio Code, Textmate, iA Writer or a plain textarea in Chrome/Firefox does not actually copy the list markers, just the list content (I have not verified this in the desktop version of Word though). Markdown editors often fall into this group of plain text editors that do not handle these Word auto lists.

Also worth noting is that the Option+8 bullets and Word bullets are different. The Option+8 bullets can be pasted into the plain text editors that I listed above.


#76

Just pointing out that I used to be in favor of adding the Unicode bullet, but was convinced otherwise, per the earlier posts in this topic. So I have held both positions on this matter, it is not something I rejected out of hand.

It is also entirely possible to use a CommonMark plugin to add this behavior later if it is required for your use case.


#77

I have definitely encountered Unicode bullets in the wild not produced by me. Not often, but occasionally.

There was a time, 8–13 years ago, when I was regularly manually turning a monthly newsletter I received as a PDF into HTML. (The PDFs had normally been generated from Word through a PDF printer.) I would start by just copying all the text, and bullets were •.

I have also worked with SVG documents with text containing bulleted lists, the bullets represented by •.

I still firmly believe the semantics argument (it’s a bullet, let it be a bullet) is very strong.


#78

Just for the record, Slack fakes lists by putting a Bullet character in front of consecutive paragraphs.


#81

The desktop version of Word (at least as of Office 2013) does include the bullets: a dash if you used a dash to initiate the list, the bullet character being discussed if you used an asterisk.


#82

Since both Google Keep and Slack use Option+8/Alt+Num 7 bullets (it’s also on the Android default keyboard) to do lists, that seems worthy of further discussion to support pasting in that content from these popular sources, or to support users being trained to expect this behavior.

Sadly, it’s nearly impossible to fully survey the usage of U+2022, since it isn’t included in search indexes. Even emojitracker.com doesn’t include it in their Twitter usage stats (since it isn’t an emoji character). It shows up as the • HTML entity on GitHub quite a bit. The CommonCrawl data (posted as an answer to Unicode character usage statistics - Stack Overflow) suggests it’s around the 197th most popular character.

Without more comprehensive stats, we’re each forced to shrug and presume our personal experience is sufficient anecdotal evidence to optimize for. Of course, if there’s no real downside (and I still haven’t heard any, other than wanting to pretend that Unicode doesn’t exist so the spec can live in the relative simplicity of ASCII-land), then including this seems reasonable.


#83

According to every Markdown flavor:

• This is a paragraph. 
• Not a list.

CommonMark needs to stay compatible, and not innovate.

Innovations such as U+2022 bullet list markers can come in future extensions or a successor language.