Unicode Character 'BULLET' (U+2022)

Manually replacing the characters could be quite time-consuming if the list is long. Copy/paste seems more reasonable, although IMO it fails the “easy-to-write as is feasible” aspect of Markdown.

Alternative suggestion: the application could prompt the user to replace • characters with * when pasting them into a document, with a setting to remember the preference in the future.
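A minimal sketch of such a paste hook, in Python. The function name `normalize_pasted_bullets` and the choice to rewrite only line-initial U+2022 bullets are assumptions for illustration; the prompt and the remembered preference would live in the editor's UI and are not shown.

```python
import re

# Hypothetical paste hook: a U+2022 BULLET at the start of a line (after
# optional indentation) becomes an ASCII "* " list marker. Bullets that
# appear mid-line are left alone.
BULLET_LINE = re.compile(r"^([ \t]*)\u2022[ \t]*", re.MULTILINE)

def normalize_pasted_bullets(text: str) -> str:
    """Replace line-initial U+2022 bullets with '* ' markers, keeping indentation."""
    return BULLET_LINE.sub(r"\1* ", text)

pasted = "\u2022 Hello\n\u2022 World!\nA mid-line \u2022 stays."
print(normalize_pasted_bullets(pasted))
# * Hello
# * World!
# A mid-line • stays.
```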


An application can still do that even if the parser supports •.

It just wouldn’t help if the user opens an existing Markdown document that uses •, for example one created with an app that does not convert • to * when pasting. Requiring the *, -, or + character to be what is saved means that all users work only with bullet characters available on a standard keyboard layout: the lowest common denominator.

I think an application should not change anything in an existing plain text source file. If a file already has the bullet character, it should stay so.


“Manually” replacing can also refer to whatever search and replace functionality the editor provides (though stock browser textareas in particular could require copying and pasting into another application for this), which is much less onerous.

There are lots of options for how the user could go about updating the list, some very efficient if you know the right commands and menu options. My concern with all of them is that they require knowledge of text manipulation beyond the characters visible on a standard keyboard layout.

I guess I’d have to see some stats regarding the number of CommonMark-relevant users unaware of basic search and replace functionality before I’d personally consider it a group worth optimizing for.

I think if the dash were the only allowed list item character, I would be against the change. But since -, +, and * are already supported, I don’t see the harm of adding the Unicode bullet(s) as well.

Most authors will stick to one (probably the dash, that’s what I do) and encourage co-authors to use the same symbol, but for casual users it’s great that when they copy’n’paste something from Word or wherever, they get a proper list instead of • Hello• World!.


So it seems that this feature mainly caters to OS X users.

Not only. My system runs Linux. I can easily type a bullet • and a bunch of other typographic characters: —, «», №, etc.

Generally, I support using the Unicode bullet character for lists. Let’s not forget a few other bullets as well:

U+2023 TRIANGULAR BULLET
U+2043 HYPHEN BULLET
U+25E6 WHITE BULLET

As already mentioned, pasting a bullet list copied from Word results in U+2022 characters, so Windows users stand to benefit just as much.

As for other bullet characters, nested Word bullet lists produce two other code points (circle and square), and then round-robin. I’m working on a CommonMark.NET extension that will recognize those as first-class unordered list markers (vanilla CommonMark.NET already has built-in support for U+2022).

Update: Word bullets are , o and (yes, the latter is missing from fonts other than Courier New).

Just my 2 cents: I am against adding other kinds of bullets too.

  • When I first learned Markdown, I was confused by all the variants for doing the same thing, especially - and * in lists and * and _ in emphasis. As someone has said, there are already too many ways to do it. I can’t follow the logic that says: if there’s already more than one way, why not more? I think it should be the opposite: it’s already confusing, and it shouldn’t be made even more confusing (IMHO it was a mistake to have those different ways in the first place, but we can’t go back and change that. The only thing we can do is not make it worse).
  • The main reasons given for adding such bullets are
    1. “compatibility with intuition” (I typed a Unicode bullet, so I expect a bullet) and
    2. compatibility with copying and pasting from other sources.
  • But I think both expectations are wrong:
    1. If “intuition” governs everything (like how you type it on a Mac/iPad), then we don’t need a spec. And that thing is called not Markdown or CommonMark, but plain text.
    2. In the case of copy and paste, the bullet points are really the least of our worries… Who expects Markdown to magically format something correctly once you paste it? That’s called Pandoc… (at least it is very powerful at that kind of conversion)
  • After all, the most important thing is user experience, so which matters more for the learning curve:
    1. Occasionally magic happens when I paste bullet lists and it’s perfect (then I probably get lazy and never learn the syntax properly)
    2. A simpler syntax that you can learn in the first 30 minutes and that’s easy to remember, so I don’t have to look up the options every time
  • P.S.
    1. I think part of the reason for Markdown’s popularity is its simplicity. While CommonMark tries to solve the problem of fragmentation (which makes Markdown ugly), it shouldn’t destroy Markdown’s original beauty (the simplicity). Adding all those different bullets is like trying to please everyone, and in the end it makes the result mediocre.
    2. Regarding typing • with a long press of - on an iPad: you could just type -… A bullet does not have to look like a dot (contrary to what the name suggests). E.g. in LaTeX, at a certain nesting level of an unnumbered list, the bullet also looks like a hyphen. As I recall, this is also true in Word?
    3. One stupid argument I could think of to propose yet another list marker: the / sign. Since +, -, * are already supported and mean the same thing, why don’t we also use / and let it mean the same thing?
      • Saying / means something else is not an acceptable answer, since * also means something else.
      • Saying / doesn’t look like a bullet is not acceptable either: + doesn’t look like a bullet either. And if one claims + looks like a bullet and is therefore one of the list markers, x should be included too.
      • I can also argue that since +, -, * are already legitimate list markers, one should expect new learners to try / too, and therefore I need to cover their cases so as not to disappoint them or defy their expectations.
    4. The long bullet list is intentional, to address the bullet issue.
    5. Please! Do not add more list markers. We already have too many.

In my experience, the bullet character is the first choice for marking list items. That applies to MS Word, MS PowerPoint and all major browsers.

It doesn’t matter whether we like them or not. But I think it is safe to say that these applications dominate our IT world.

Even the CommonMark spec, section 5.2, says

A bullet list marker is a -, +, or * character.

I would like to emphasize the bullet list part. That is the name of the baby. So what could be more natural than using the bullet character for a bullet list? This alone shows how intuitive the bullet character is for a “bullet list”.

My original argument at the beginning of the topic still stands: if the bullet is not included in the list of markers, Markdown converts a perfect plain-text bulleted list into something wrong. Let me copy that part:

• Hello
• World!

becomes

• Hello• World!

In my opinion, this is pretty bad. I would like to quote the original philosophy of Markdown as formulated by John Gruber:

Markdown is intended to be as easy-to-read and easy-to-write as is feasible.

Readability, however, is emphasized above all else. A Markdown-formatted
document should be publishable as-is, as plain text, without looking
like it’s been marked up with tags or formatting instructions.

Now, if Markdown or CommonMark basically turns a simple, correct bulleted list into the rubbish I have shown above, then this is clearly against the philosophy.

After reading so many arguments against the bullet in the bullet list, I get the impression that we are more concerned about simplicity for programmers than simplicity for end users. I believe implementing the bullet sets off alarm bells for some programmers, because they would have to deal with Unicode. I know Unicode is not that easy to implement. I believe that this is the underlying reason for the opposition.

But I would like to remind everyone that there is no such thing as “plain text”. You always have to specify the encoding. Is it windows-xyz, EUC or UTF-8? Without that information, the risk that your application shows rubbish is high. On the World Wide Web, UTF-8 has become the de facto standard.
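To make the encoding point concrete, here is a small Python sketch. Windows-1252 (`cp1252`) is picked only as one example of a “windows-xyz” code page: the same three UTF-8 bytes that encode • come out as rubbish when read with the wrong codec.

```python
# U+2022 BULLET encoded as UTF-8 takes three bytes.
data = "\u2022 Hello".encode("utf-8")
print(data)                      # b'\xe2\x80\xa2 Hello'

# Decoding with the codec that was actually used recovers the bullet.
correct = data.decode("utf-8")
print(correct)                   # • Hello

# Decoding the same bytes as Windows-1252 produces mojibake.
wrong = data.decode("cp1252")
print(wrong)                     # â€¢ Hello
```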

So please, remember the philosophy of Markdown and think of the users first!


@Kagan_Kayal

Basically, what you just said can be summarized in these 3 bullets:

  • Copying and pasting perfect plain text:
    • It is perfect plain text, not perfect Markdown. We are not talking about arbitrary plain text. In my post above I already argued that copying and pasting something and expecting good results is ignorant.
  • John Gruber’s philosophy:
    • As you quoted John Gruber’s philosophy, remember that the philosophy guides the design, but it doesn’t define it.
      • One can create an equally good implementation around this philosophy that is nevertheless not compatible with Markdown (I hope it is obvious, but if it is not I can make up some examples).
      • So quoting a philosophy and saying that anything that agrees with it has to be implemented is absurd.
  • simplicity for users regarding copy and paste:
    • I am all for simplicity for users, but for Markdown users only. I don’t think we should care about Apple Notes users, MS Word users, etc.
    • I have already argued why in my previous bullet list. The short version: when copying and pasting things from another application, the bullet is the last thing one should worry about. A whole lot of features and formatting get stripped or messed up when you do that.
    • My personal experience on migrating from those sources is to use Pandoc.
    • As I have argued, if we aim for simplicity for Markdown users, we should keep the syntax simple. Read what I said up there.
  • P.S.
    1. You should stop pushing. If your idea is good, they have heard you and they are considering it. If you have something to add to the discussion, keep saying it. But if all you do is repeat yourself, which IMHO you have done quite a few times, you are wasting everyone’s time.
    2. Remember,
      • it is not about you, it is (supposedly) about all Markdown users.
      • no decision can please everyone. CommonMark should and can only please the majority. (E.g. I am here reading about CommonMark because I want to see whether it is going in the direction I like; if so, I will adopt it. If not? I’ll pick something else. I can’t control them.)
        • The success of CommonMark depends on it. (Think about it: what makes it the common standard for all future Markdown, rather than just yet another Markdown variant?) So for its own sake, before you push a feature, think about how it will affect the adoption of CommonMark.
        • You argued that they are “more concerned about the simplicity for programmers than…”. While I don’t believe it is true (as I just said above), you can’t ignore the programmers. Think about it: who’s going to decide the adoption of CommonMark? You? Come on, it’s those programmers.
    3. Originally I was just passing by and reading about the progress of CommonMark. But as I explored and read your comments (assuming @kagan is you?), I found that you were pushing it too aggressively, so I decided to voice my opinion. To be polite I didn’t single you out and just laid out (some points new, some already mentioned) why I don’t think it should be included. In a sense I am afraid that, because you’re pushing it so aggressively, the developers might decide it saves them time to implement it rather than argue with you. That’s why I laid out the reasons against it, based not on how easy or difficult it is to implement but on the experience of Markdown users.
    4. Now that I have raised my concern, what if they decide to add the bullet point as you suggested? I could write one more post to clarify why I think that’s wrong, perhaps add some points, or rewrite it more clearly, etc. But I shouldn’t insist (and frankly my insisting won’t matter). Otherwise it will just turn into trolling. (Think about what would happen if someone were as aggressive as you and had exactly the opposite opinion.)
    5. Remember to separate opinions from facts. The developers over there are trying to make the right decision, but what is right or wrong is going to be subjective. We should expect them to be as objective as possible, but it is impossible to be totally objective.
    6. My last analogy is Bible translation. Markdown variants are like different English Bible translations. Some are loose, some are strict, some have a lot of features, some do one thing and only one thing well. CommonMark is like pushing a common standard across all English translations (which in the past was the King James Version); the best effort so far IMO is the ESV (I might be wrong, but let’s say there is always one with the biggest market share). But the best it can do is please the majority and gain the biggest share among Bible readers. Even if the majority likes the ESV, many others don’t (e.g. the controversy over gender neutrality).

When I first learned Markdown, I was confused by all the variants for doing the same thing, especially - and * in lists and * and _ in emphasis. As someone has said, there are already too many ways to do it.

Then you ought to approve the U+2022 ‘•’ suggestion, as it provides a character with exactly one meaning.

One stupid argument I could think of to propose yet another list marker: the / sign. Since +, -, * are already supported and mean the same thing, why don’t we also use / and let it mean the same thing?

The reason +, -, and * are used for bullet points is that they are often used that way in plain-text bulleted lists. Well, here in the 21ˢᵗ century, so are the Unicode bullet characters—particularly U+2022 ‘•’.

Do not add more list markers. We already have too many.

Then ignore them; nothing requires you to use them, and the implementation cost is almost nil.

Not wishing to interrupt the flamewar between kolen and Kagan_Kayal, I would like to add these two points.

#### CommonMark is already a Unicode specification

A character is a Unicode code point. Although some code points (for example, combining accents) do not correspond to characters in an intuitive sense, all code points count as characters for purposes of this spec.
This spec does not specify an encoding; it thinks of lines as composed of characters rather than bytes. A conforming parser may be limited to a certain encoding.
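In other words, the spec counts code points, not bytes. A quick Python illustration (the codec list is just a sample): • is one character, its byte length depends on the chosen encoding, and a parser limited to ASCII cannot represent it at all.

```python
bullet = "\u2022"  # U+2022 BULLET

# The spec counts code points: the bullet is one character.
print(len(bullet))  # 1

# Its byte length, however, depends on the encoding.
sizes = {codec: len(bullet.encode(codec)) for codec in ("utf-8", "utf-16-le", "utf-32-le")}
print(sizes)  # {'utf-8': 3, 'utf-16-le': 2, 'utf-32-le': 4}

# An ASCII-only encoder cannot represent it.
try:
    bullet.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)  # False
```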

So it doesn’t matter whether a parser prefers UTF-7, -8, -16, -32, or some esoteric IBM standard; it has to support Unicode code points anyway. This also becomes apparent a few lines later (emphasis mine):

A Unicode whitespace character is any code point in the Unicode Zs class, or a tab (U+0009), carriage return (U+000D), newline (U+000A), or form feed (U+000C).

The old ASCII whitespace definition is only used inside HTML tags, probably for compatibility with the HTML standards.
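As a rough sketch, the quoted whitespace rule fits in a one-line Python predicate; the function name `is_unicode_whitespace` is hypothetical, and only the rule itself comes from the spec text quoted above. Note that U+2022 fails the test because it sits in a punctuation category, not in Zs.

```python
import unicodedata

def is_unicode_whitespace(ch: str) -> bool:
    """Quoted spec rule: category Zs, or tab, carriage return, newline, form feed."""
    if len(ch) != 1:
        raise ValueError("expected a single character")
    return unicodedata.category(ch) == "Zs" or ch in "\t\r\n\f"

print(is_unicode_whitespace(" "))       # True  (SPACE, Zs)
print(is_unicode_whitespace("\u00a0"))  # True  (NO-BREAK SPACE, Zs)
print(is_unicode_whitespace("\u3000"))  # True  (IDEOGRAPHIC SPACE, Zs)
print(is_unicode_whitespace("\t"))      # True  (listed explicitly; tab is Cc)
print(is_unicode_whitespace("\u2022"))  # False (BULLET is Po)
```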

So if CommonMark uses Unicode anyway, it might as well use all of it. Otherwise we have an ASCII markup language with some Unicode text in between, which seems quite arbitrary.

on the other hand…

#### Unicode bullets can’t be escaped

Any ASCII punctuation character may be backslash-escaped.
Backslashes before other characters are treated as literal backslashes.

This is definitely something that needs to be looked into, and AFAIK it wasn’t mentioned yet.
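To illustrate the problem, here is a simplified Python sketch of the quoted escape rule; `resolve_backslash_escapes` is a hypothetical name, and the sketch ignores code spans, autolinks and everything else a real parser handles. It uses `string.punctuation`, which contains exactly the 32 ASCII punctuation characters. Under this rule `\*` yields a literal `*`, but `\•` keeps its backslash, so a writer would have no way to say “this • is not a list marker”.

```python
import string

def resolve_backslash_escapes(text: str) -> str:
    """A backslash before ASCII punctuation escapes it; before any other
    character the backslash is kept as a literal backslash."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] in string.punctuation:
            out.append(text[i + 1])  # escaped ASCII punctuation: drop the backslash
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

print(resolve_backslash_escapes("\\* not a list"))     # * not a list
print(resolve_backslash_escapes("\\\u2022 not a list"))  # \• not a list
```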


Here’s the implementation cost in Asciidoctor: one alternative added to two regexps in the code, and two copy-and-pasted unit tests; see https://github.com/asciidoctor/asciidoctor/commit/9cb62122. The cost to CommonMark implementations would be similarly trivial.
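For readers who don’t read Ruby, a rough Python analogue of that kind of change (this is not the Asciidoctor code, and the pattern is a simplification of the CommonMark list-item rules, which also handle tabs, thematic breaks and indented code): supporting • amounts to one more character in the marker class.

```python
import re

# Simplified bullet-list-marker patterns (hypothetical, for illustration):
# up to 3 spaces of indentation, a marker, then a space/tab or end of line.
ASCII_MARKER = re.compile(r"^ {0,3}([-+*])(?:[ \t]|$)")
# The proposed change: one more alternative in the character class.
WITH_BULLET = re.compile(r"^ {0,3}([-+*\u2022])(?:[ \t]|$)")

for line in ["- item", "* item", "\u2022 item", "\u2022item"]:
    print(repr(line), bool(ASCII_MARKER.match(line)), bool(WITH_BULLET.match(line)))
# '- item' True True
# '* item' True True
# '• item' False True
# '•item' False False
```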

Hi @JCSalomon, maybe I wasn’t clear, but my point is that what’s done is done. Since we can’t eliminate the use of +, -, *, we at least shouldn’t add more. No matter how intuitive • is (and even if it is the only one that makes sense), we shouldn’t add more to Markdown to do what can already be done.

And it isn’t really about me, or anyone who is reading this, since all of us already know the original syntax and won’t be confused. And this is my whole and only point: don’t make it so complicated that it confuses newcomers. We can see this even in the CommonMark 60-second guide, which doesn’t mention +.

As for the stupid example of +, -, *, /: I think it is safe to assume that when I called it stupid, I meant it wasn’t serious at all and was only there to illustrate the analogy.

Let me also take the chance to clarify that my opposition to “adding support for it” is not as strong as my opposition to “someone who keeps pushing and repeating his arguments in response to every counterargument”. I think I have been very clear on this point.

Going back to the original point: since adding • as an input doesn’t really add a new feature (in terms of output), I think it shouldn’t even be considered as an extension to CommonMark. But I say shouldn’t, not mustn’t. In those terms, IMHO it mustn’t be a core feature of CommonMark, and it shouldn’t be an extension either.

I know some are going to disagree. I just hope some people can learn to agree to disagree. And it is fabulous that whether or not it will be implemented is not up to (some of) you and me.

P.S. I forgot to mention this, and should have mentioned it at the beginning: some might say adding one more bullet to the original three list markers is no big deal and won’t pollute the syntax too much. But the real danger comes after the bullet is added: it could be the beginning of a flood of new bullets. Cf. the post above with the long list of other bullets. I might add that if • is added, the only logical consequence is to add the others too. 2 cents again.

Thanks for the clarification about unicode.

Concerning escaping the punctuation, good catch!

I had a look at the current spec. In Section 2, it says

A punctuation character is an ASCII punctuation character or anything in the Unicode classes Pc, Pd, Pe, Pf, Pi, Po, or Ps

Then I found out that the bullet character is in the class Po, as listed in this source:

http://www.fileformat.info/info/unicode/category/Po/list.htm

The question for me is why section 2 includes so many Unicode characters in the definition of punctuation characters, while section 6.1 is explicit about ASCII punctuation characters, as you have already quoted.
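The mismatch is easy to see side by side in a small Python sketch; the function names `is_punctuation` and `is_escapable` are hypothetical, and each encodes only the rule quoted from its section. The candidate bullets from earlier in the thread are included for comparison.

```python
import string
import unicodedata

# General categories listed in the quoted section 2 definition.
PUNCT_CATEGORIES = {"Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps"}

def is_punctuation(ch: str) -> bool:
    """Section 2: ASCII punctuation or any character in the P* categories above."""
    return ch in string.punctuation or unicodedata.category(ch) in PUNCT_CATEGORIES

def is_escapable(ch: str) -> bool:
    """Section 6.1: only ASCII punctuation may be backslash-escaped."""
    return ch in string.punctuation

for ch in ["*", "\u2022", "\u2023", "\u2043", "\u25e6"]:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch),
          is_punctuation(ch), is_escapable(ch))
```

For U+2022 this reports category Po: it counts as punctuation under section 2, yet `is_escapable` returns False under section 6.1.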

Maybe I see this from a higher-level perspective and consider parsers a technicality to be solved, but for me the most important things are

  1. The purpose of Markdown is to add semantic structure to a human-readable plain text, and
  2. CommonMark is a standard based on Unicode.

All these arguments about which editor on which platform uses which special characters are really secondary.

Of course, for someone who writes mostly ASCII text it seems easy to use * et al, but if you write in Chinese* or Arabic or Klingon and you’re using a wide range of Unicode code points anyway it seems quite natural to use bullets as bullets, doesn’t it?

The only reason we’re not discussing this in reverse (“do we really need * et al. if Unicode provides us with actual bullets?”) is that John Gruber and most of us are still using a 140-year-old keyboard layout, which really is a shitty argument in the greater scheme of things.

Thus, I think CommonMark should support all Unicode bullet characters, just as it supports all whitespace and line-break characters as such.

*) Disclaimer: I have no idea how one writes Chinese on a less-than-table-sized keyboard, but I assume it involves sacrificing the blood of a goat.

A summary of the discussion thus far:

Pros

  1. Trivial to implement
  2. Produces the expected output when • characters are in the input.

Cons

  1. It is hard for some users to type.
  2. • cannot be encoded in an ASCII string.
  3. It may confuse a novice user editing a document that contains •, and may make Markdown harder to learn.

My 2 Cents:

Cons 1 and 2 are minor: there is some existing usage of the bullet character already, and the Markdown spec is encoding-agnostic anyway (I believe there is consensus here). The feature is also trivial to implement (Pro 1). That leaves Con 3 and Pro 2, which both address core aspects of Markdown’s design philosophy. It is not at all clear to me which is the better / more important argument (who knows what actual users are like?). Probably either choice is OK.
