Unicode Character 'BULLET' (U+2022)

codinghorror · June 4, 2016, 9:31am

After thinking about this for a while, and originally being in favor of it, I am opposed to it. I think it ultimately works against the spirit of the ASCII conventions that Markdown is based on.

I understand the desire, to make it “just work”, but it seems to send the wrong message to all users. And that’s a major con.

zzzzBov · June 6, 2016, 2:57am

I think it would make sense to have a UTF-8 extension of CommonMark that supports additional characters such as •, but for CommonMark core, ASCII should be preferred.

Crissov · June 9, 2016, 10:23am

I’m not really convinced that the original spirit of Markdown is ASCII-only, but “easy to input”. If people frequently get characters like U+2022 from copy-and-paste operations from rich text environment to markdown / plain text environments, then it could make sense to spec them. However, this is probably not limited to •.

Besides other bullet and hyphen or dash characters like U+2023 ‣, U+25E6 ◦, U+2043 ⁃ and U+2012 ‒, U+2013 –, U+2014 —, I’m thinking of local-script (mostly Indian/Brahmic) decimal digits for enumerated lists, i.e. Arabic-Indic U+0660–9/F0–9, Devanagari U+0966–F, Bengali U+09E6–F, Gurmukhi U+0A66–F, Gujarati U+0AE6–F, Oriya U+0B66–F, Tamil U+0BE6–F, Telugu U+0C66–F, Kannada U+0CE6–F, Malayalam U+0D66–F, Thai U+0E50–9, Lao U+0ED0–9, Tibetan U+0F20–33, Myanmar U+1040–9 and U+1090–9, Ethiopic U+1369–71, Khmer U+17E0–9, Mongolian U+1810–9, Limbu U+1946–F, Nko U+07C0–9, New Tai Lue U+19D0–9/A and U+1A80–9, Balinese U+1B50–9, Sundanese U+1BB0–9, Lepcha U+1C40–9, Ol Chiki U+1C50–9, Vai U+A620–9, Saurashtra U+A8D0–9, Kayah Li U+A900–9, Cham U+AA50–9, Osmanya U+104A0–9, but not circled or parenthesized European ones and probably neither East-Asian fullwidth and halfwidth nor mathematical stylistic forms. Also see the CSS Counter Styles.

Brian_Lalonde · July 13, 2016, 12:29am

Is there any science or observation behind the “too many bullet types” confusion, or is this a someone’s preference masquerading as an unsubstantiated usability concern?

The real spirit of Markdown is, from my perspective anyway, actually collecting and measuring real text to drive syntax choices.

jkdev · July 13, 2016, 4:51am

Sometimes I use • characters to avoid creating a list.

• Kind
• of
• like
• this.

This provides more control over styles and layout. I’ve found it useful.

codinghorror · July 14, 2016, 7:52am

I can’t recall a single time I have seen a user use “real” bullets to make a list.

That observation is across my four (very long, 80+ hour weeks) years as a co-founder and primary architect of Stack Overflow, and my three years as a co-founder Discourse looking at hundreds of communities using Markdown.

It is, at best, very rare to see a unicode bullet typed from a user.

Compare with my list observation here. I see users make that mistake almost every day!

Brian_Lalonde · July 14, 2016, 8:48am

Is that likely to be a broad enough selection across different document types, though? Are you personally looking at all the content for hundreds of communities? It certainly happens when pasting text from Word or Outlook emails or PowerPoint at the very least.

codinghorror · July 14, 2016, 9:32am

When it comes to Discourse, yes. We have 400+ hosted customers ranging from 8-10 year old kids (hopscotch) to small businesses (taxcycle) to gamers to enthusiasts to moms and everything in between. Wildly different communities. Some specific examples here. And Stack Exchange has ~100 different communities as well.

Sharing Word documents as post attachments might be common but copying and pasting from word into a discussion is not.

I fight for the users. And believe me, I see many daily common problems regular users have with Markdown that we’re trying to address (numbered lists that work for 1) 2) 3) for example, as well as the list thing I cited above) but unicode bullets ain’t even in the top 50.

Brian_Lalonde · July 14, 2016, 5:40pm

If users never enter bullet characters, there’s no chance of confusion, right? The risk you’re describing, if I understand, is surprise list behavior.

I don’t have exposure the same quantity of content, so it’s odd that I’ve so often come across content that you haven’t. Even stranger that you haven’t come across any content pasted from Office lists! It doesn’t seem too much of a stretch to imagine that the use cases you are exposed to may not be complete.

If the point of this forum is to legitimize or inform the standard by gathering differing experience, that’s what I’ve tried to help provide.

I’m grateful you’ve started this infrastructure, and you are free to use that privileged position to simply assert a preference, but I wonder, beyond just this issue, whether some of these decisions could use a more scientific approach than the remembered experience of a single person, or small plurality.

codinghorror · July 14, 2016, 9:03pm

Yes, if we put our minds to it we can imagine many things. If you have a corpus of actual data you can point to that proves otherwise, feel free to share it. I’ve shared mine.

I should also point out that the entirety of Stack Exchange is creative commons, and freely downloadable. If you’d like to download every single post ever made (this is many, many gigabytes) and run a scan for unicode bullets, feel free!

Brian_Lalonde · July 14, 2016, 10:02pm

I’m in the financial industry, so there’s often restrictions to sharing certain content, but I’ll keep my eye out for sharable examples.

We have a large amount of office content that we often either convert or quote in Markdown. Perhaps most commonly is from either Outlook emails or Word documents. How much example content constitutes proof of existence or merits consideration?

chrisalley · July 15, 2016, 8:57am

Copying an automatic (created with a hyphen+space) bulleted list from Word 2016 (web version) and pasting it into Atom, Visual Studio Code, Textmate, iA Writer or a plain textarea in Chrome/Firefox does not actually copy the list markers, just the list content (I have not verified this in the desktop version of Word though). Markdown editors often fall into this group of plain text editors that do not handle these Word auto lists.

Also worth noting is that the Option+8 bullets and Word bullets are different. The Option+8 bullets can be pasted into the plain text editors that I listed above.

codinghorror · July 22, 2016, 4:45am

Just pointing out that I used to be in favor of adding the Unicode bullet, but was convinced otherwise, per the earlier posts in this topic. So I have held both positions on this matter, it is not something I rejected out of hand.

It is also entirely possible to use a CommonMark plugin to add this behavior later if it is required for your use case.

chris-morgan · May 10, 2018, 12:45am

I have definitely encountered Unicode bullets in the wild not produced by me. Not often, but occasionally.

There was a time, 8–13 years ago, when I was regularly manually turning a monthly newsletter I received as a PDF into HTML. (The PDFs had normally been generated from Word through a PDF printer.) I would start by just copying all the text, and bullets were •.

I have also worked with SVG documents with text containing bulleted lists, the bullets represented by •.

I still firmly believe the semantics argument (it’s a bullet, let it be a bullet) is very strong.

Crissov · September 6, 2018, 7:33pm

Just for the record, Slack fakes lists by putting a Bullet character in front of consecutive paragraphs.

Brian_Lalonde · February 25, 2019, 4:24pm

The desktop version of Word (at least as of Office 2013) does include the bullets: a dash if you used a dash to initiate the list, the bullet character being discussed if you used an asterisk.

Brian_Lalonde · February 27, 2019, 4:20pm

Since both Google Keep and Slack use Option+8/Alt+Num 7 bullets (it’s also on the Android default keyboard) to do lists, that seems worthy of further discussion to support pasting in that content from these popular sources, or to support users being trained to expect this behavior.

Sadly, it’s nearly impossible to fully survey the usage of U+2022, since it isn’t included in search indexes. Even emojitracker.com doesn’t include it in their Twitter usage stats (since it isn’t an emoji character). It shows up as the &#x2022; HTML entity on GitHub quite a bit. The CommonCrawl data (posted as an answer to Unicode character usage statistics - Stack Overflow) suggests it’s around the 197th most popular character.

Without more comprehensive stats, we’re each forced to shrug and presume our personal experience is sufficient anecdotal evidence to optimize for. Of course, if there’s no real downside (and I still haven’t heard any, other than wanting to pretend that Unicode doesn’t exist so the spec can live in the relative simplicity of ASCII-land), then including this seems reasonable.

aoudad · February 28, 2019, 3:05am

According to every Markdown flavor:

• This is a paragraph. 
• Not a list.

CommonMark needs to stay compatible, and not innovate.

Innovations such as U+2022 bullet list markers can come in future extensions or a successor language.

Brian_Lalonde · July 4, 2020, 7:20pm

I’m not sure I’d characterize bullets as an “innovation”. It’s certainly how markdown itself renders unordered lists. I wouldn’t really see a benefit from implementation in a separate language that isn’t supported by any sites, systems, or tools I use.