"_" and "*" being synonymous is redundant and confusing

Not sure if this is really within the scope of this standard to change, but this has irked me and several others: * and _ both act exactly the same when producing italics or bold text. This makes markdown a bit more confusing to learn, and to avoid making a mess, the author has to pick one scheme and stick to it.

One solution would be to scrap one of them altogether; using * or _ exclusively. For example, *i* could be the only way to create italics, and **b** the only way to create bold text. Another solution would be to let *b* be bold and _i_ be italics.

As I said, I’m not sure whether it’s within the scope of this standardization effort, but in my opinion, it is a real issue with the markdown standard, and has to be changed at some point. This will break some backwards compatibility, so if it should be changed at all, it has to be changed as soon as possible.

3 Likes

I agree here, but I wonder how disruptive this would be for sites/apps with existing content to adopt SMD … I wonder to what degree backwards compatibility is a concern that should be acknowledged. Making it relatively painless to upgrade would spur adoption.

1 Like

First, remember that according to the HTML spec it’s not italics and bold, it’s emphasis and strong importance. I think what’s more at issue here is a contradiction between the HTML spec and SMD spec:

HTML

The em element represents stress emphasis of its contents… The level of stress that a particular piece of content has is given by its number of ancestor em elements… The em element also isn’t intended to convey importance; for that purpose, the strong element is more appropriate.

SMD

Markdown treats asterisks (*) and underscores (_) as indicators of emphasis.

So while HTML stresses that em and strong are completely different elements, SMD clearly conflates them and considers strong to be “strong emphasis”. I don’t see a clear path here.

  1. Using only * for strong and only _ for em would be more compliant with the HTML standard by separating the two elements. SMD does already allow for nesting of em and strong, so “strong emphasis” could still be conveyed by nesting.
  2. The existing behavior is more consistent with Markdown’s original (though technically incorrect) interpretation of em and strong. I think this is also pretty consistent with those elements’ usage on the net.

When it comes down to it, I think this is really an HTML5 problem. i, b, em, and strong are pretty high on the list of inconsistently used elements, so what should SMD do? Be a stickler and try to follow the HTML5 standard or go for ease-of-use and do what people usually intend?

Markdown is based on the way people used to mark up conversations on USENET. In that environment, people used *emphasis* and _emphasis_ and sometimes also -emphasis- interchangeably. If they wanted to indicate extra emphasis, they’d put **double stars** or __double underscores__, again interchangeably. People generally tended to pick one or the other and stick to it, except sometimes to avoid ambiguity (e.g. I hate *identifiers_with_underscores*).

That’s why it’s interchangeable in Markdown. Making _ and * each do something different would be an unpleasant and annoying break with tradition.

You say that this makes markdown a bit more confusing to learn, and to avoid making a mess, the author has to pick one scheme and stick to it. I don’t see why this is confusing. There are two different ways to write each level of emphasis; how is that more confusing than it would be if * and _ did different things? (And then you would have to remember to nest them properly!)
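
To make the interchangeability concrete, here is a rough Python sketch. It is a toy, not a CommonMark implementation: the name `render_legacy` and the two regexes are made up for illustration, and it ignores nesting, escaping and the intra-word rules entirely.

```python
import re

# Toy patterns (assumptions for illustration only, not the real spec rules).
# A doubled delimiter is tried first so "**x**" is not read as two singles.
STRONG = re.compile(r"(\*\*|__)(.+?)\1")
EM = re.compile(r"(\*|_)(.+?)\1")

def render_legacy(text: str) -> str:
    """Map both delimiter characters to the same tags, as Markdown does today."""
    text = STRONG.sub(r"<strong>\2</strong>", text)
    return EM.sub(r"<em>\2</em>", text)

print(render_legacy("*a* and _a_"))
print(render_legacy("**b** and __b__"))
```

Whichever delimiter the author picks, the output tag is the same, so there is nothing to remember beyond "one for emphasis, two for strong".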

4 Likes

Backwards compatibility was a major goal. Of course, implementations
vary enough that it’s impossible to avoid all breakage. But we want to
minimize it.

3 Likes

In my experience, most people intuitively see _ as denoting emphasis (it’s visually lighter) and * as denoting strong importance (it’s visually heavier).

Gruber had some logical arguments behind his decision, but it ultimately seems to have been his personal preference and stubbornness that won through. Aaron Swartz, Merlin Mann and several others were advocates for _emphasis_ and *strong*, but Gruber was not convinced…

In short, you’re sort of screwed, because that’s how I write, and it’s how I’ve written since around 1992.

Unfortunately, any change to this would break backwards compatibility with legacy Markdown documents. Not to mention the habits of everyone who has used Markdown for years. Trying to get this fixed in SMD will not be easy.

That said, I think it could be done with relatively little breakage in legacy documents.

Have the spec declare that _ denotes emphasis and * denotes strong importance, while still allowing doubles (__, **) to denote the same as their respective single character. Double characters must be used to add emphasis and strong importance in the middle of words.

The breakage in legacy documents would then be:

  • __foo__, intended as strong importance, instead comes out as emphasis
  • *bar*, intended as emphasis, instead comes out as strong importance
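
The two breaking cases above can be checked with a small Python sketch. This is only a toy under stated assumptions: `render`, `LEGACY`, `PROPOSED` and `breaks` are hypothetical names, the single regex ignores nesting, escaping and intra-word handling, and the `PROPOSED` table is just my reading of the rule described above.

```python
import re

# One toy pattern for all four delimiters; doubled forms are tried first.
DELIM = re.compile(r"(\*\*|__|\*|_)(.+?)\1")

# Current behaviour: the count of characters decides the tag.
LEGACY = {"*": "em", "_": "em", "**": "strong", "__": "strong"}
# Proposed behaviour: the character decides the tag, doubles mean the same.
PROPOSED = {"*": "strong", "**": "strong", "_": "em", "__": "em"}

def render(text: str, tags: dict) -> str:
    """Wrap each delimited run in the tag that the given table assigns."""
    def repl(match):
        tag = tags[match.group(1)]
        return f"<{tag}>{match.group(2)}</{tag}>"
    return DELIM.sub(repl, text)

def breaks(text: str) -> bool:
    """True if a legacy document would render differently under the proposal."""
    return render(text, LEGACY) != render(text, PROPOSED)

for sample in ["__foo__", "*bar*", "_foo_", "**bar**"]:
    print(sample, breaks(sample))
```

Under these assumptions only `__foo__` and `*bar*` change; `_foo_` and `**bar**` render the same either way.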

It’s mainly a semantic issue, the degrees of emphasis change. Unless I’ve overlooked something? (Not unlikely. Despite its simple syntax, Markdown is full of quirks and surprises!)

I’d love to see this getting fixed in SMD. But Markdown is in a very different position today than when Gruber made his decision back in 2006. It will not be easy to get a change like this accepted today either.

1 Like

Hm, I didn’t think it through with intra-word emphasis. Examples 280, 282, 284 and 287 would also be broken in legacy documents.

I would be vehemently against “fixing” this, since it would break backwards compatibility in a big way. And not only in existing documents, but it would also change patterns that millions of people might have learned already.

4 Likes

That’s my conclusion as well. Long story short: It’s too late.

Edit: Just as a tease, here’s how I would have liked Markdown to be:

I _really_ like Markdown, but *beware*, stick to the spec.
It is supercali**fragilistic**expialidocious.

Emphasis: _foo_
Strong: *foo*
Intra-word emphasis and strong: foo__bar__baz, foo**bar**baz
No intra-word emphasis and strong: foo_bar_baz, foo*bar*baz
2 Likes

I’d be furious if my italics suddenly started turning into bold text.

1 Like

What if asterisks were used for <em> and <strong>, and underscores for <i> and <b>? This would preserve backwards compatibility visually, while allowing authors to make the appropriate distinction when necessary.

E.g.:
I really like The Big Lebowski!
I *really* like _The Big Lebowski_!

(Note that this may introduce accessibility issues, where things that were properly emphasized in legacy documents suddenly are not properly emphasized. However, I’m guessing legacy documents already had accessibility issues with strings that were emphasized inappropriately.)
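
A minimal Python sketch of that split, assuming a toy regex rather than a real Markdown parser (`render_split` and `TAGS` are made-up names, and nesting, escaping and intra-word rules are ignored):

```python
import re

# Toy pattern for all four delimiters; doubled forms are tried first.
DELIM = re.compile(r"(\*\*|__|\*|_)(.+?)\1")

# Asterisks carry semantics, underscores carry presentation only.
TAGS = {"*": "em", "**": "strong", "_": "i", "__": "b"}

def render_split(text: str) -> str:
    """Render asterisks as <em>/<strong> and underscores as <i>/<b>."""
    def repl(match):
        tag = TAGS[match.group(1)]
        return f"<{tag}>{match.group(2)}</{tag}>"
    return DELIM.sub(repl, text)

print(render_split("I *really* like _The Big Lebowski_!"))
```

In a typical browser <em> and <i> both display as italics, and <strong> and <b> both as bold, which is why the result would look unchanged while the markup becomes more precise.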

Larger list of syntax duplications which I think should also be deprecated from the language and later removed: https://github.com/karlcow/markdown-testsuite/issues/53 . Analogous for code blocks: We should abandon indented code blocks

Generally speaking, if there are 2 ways to achieve the same thing, then it’s confusing to learn. As a novice you don’t know if it’s really the same, if one is different from the other in some way or which you should use for what. Once someone tells you it’s actually the same, it’s not confusing anymore, but you can’t intuitively decide that for yourself (necessarily).

In this case though I’m with you in that 1) it’s already too late and 2) it’s not confusing because these notations have been around forever and they both communicate well what they do. If rewinding time were possible, I’d definitely remove one, though.

While I would like to get rid of underscores (I find them confusing as used for emphasis), I suspect that it would break compatibility with far too many documents to be worth the change.

Even after using markdown for ages, I still think the markdown convention for * and _ is wrong.

I’ve always seen italics to be _ and bold to be *.

Even if it’s not backwards compatible, we really do need to tough this one out, or this mistake will be set in stone even harder (much like the switch from Python 2.x to 3.x).

Maybe you could make:

  • <em> → _em_
  • <strong> → *strong*
  • <i> → __italic__
  • <b> → **bold**

My justification for the above is based on this section about the HTML5 <b> tag, which indicates that <strong> should be preferred over <b> for bold text. HTML b Tag:

Note: According to the HTML 5 specification, the <b> tag should be used as a LAST resort when no other tag is more appropriate. The HTML 5 specification states that headings should be denoted with the <h1> to <h6> tags, emphasized text should be denoted with the <em> tag, important text should be denoted with the <strong> tag, and marked/highlighted text should use the <mark> tag.

1 Like

Your quote highlights why your proposition is a bad idea; it’s semantically incorrect. __text__ is strongly emphasized due to being surrounded by double underscores. The <i> tag does not mean strong emphasis/importance.

It’s not about ‘bold’ or ‘italics’; it’s about emphasis and importance. In which case you could only argue for _ as emphasis and * as importance. This would be incompatible with having double asterisks or double underscores as means of indicating importance, resulting in massive backwards compatibility issues.

Not to mention that your proposal from the beginning already breaks many, many existing documents and implementations. (Everyone’s single asterisk wrapped text would be changed from being in <em> tags to being in <strong> tags, and vice versa for double underscore wrapped text.)

This is one of those topics where the original Markdown spec is quite clear in its definitions… I also think the logic is reasonably justifiable.

And since the goal of CommonMark is to be as true as possible to the original spec, and render existing Markdown docs as faithfully as possible, our path forward is clear here.

2 Likes

Excuse my possible mistakes. This is my first post on this platform.

@codinghorror, with all due respect, this particular issue isn’t about logical consistency (sorry, I’m afraid I haven’t been able to find its justification here).

In my opinion, the main problem for users is that this is visually misleading.

And this isn’t limited to new users; it also affects the readability of CommonMark source for experienced users.

From the philosophy:

Markdown is intended to be as easy-to-read and easy-to-write as is feasible.

Readability, however, is emphasized above all else.

Well, I’m afraid that we can agree on the following:

  • This issue doesn’t make the source easier to write or, mainly, to read.

  • Leaving the compatibility side aside, reserving _ for emphasis and * for strong emphasis is perfectly feasible.

Faithfulness to original Markdown in CommonMark is great. But improving the spec following its design principles may be better.

Markdown already has an installed base of thousands of websites and apps, and millions of documents. Changing the language in backwards-incompatible ways for commonly used syntax is not really on the table. The only thing that would accomplish is a further fragmentation of the language, where the same text renders in different ways depending on what markdown implementation and version is being used.

Sorry to say it, but that ship has sailed.

1 Like

@codinghorror, I have just realized that I misread “reasonably justified” instead of “reasonably justifiable”.

This was the reason why I couldn’t find any justification in the Markdown spec.