"_" and "*" being synonymous is redundant and confusing

First, remember that according to the HTML spec it’s not italics and bold, it’s emphasis and strong importance. I think what’s more at issue here is a contradiction between the HTML spec and SMD spec:

HTML

The em element represents stress emphasis of its contents… The level of stress that a particular piece of content has is given by its number of ancestor em elements… The em element also isn’t intended to convey importance; for that purpose, the strong element is more appropriate.

SMD

Markdown treats asterisks (*) and underscores (_) as indicators of emphasis.

So while HTML stresses that em and strong are completely different elements, SMD clearly conflates them and considers strong to be “strong emphasis”. I don’t see a clear path here.

  1. Using only * for strong and only _ for em would be more compliant with the HTML standard by separating the two elements. SMD does already allow for nesting of em and strong, so “strong emphasis” could still be conveyed by nesting.
  2. The existing behavior is more consistent with Markdown’s original (though technically incorrect) interpretation of em and strong. I think this is also pretty consistent with those elements’ usage on the net.

When it comes down to it, I think this is really an HTML5 problem. i, b, em, and strong are pretty high on the list of inconsistently used elements, so what should SMD do? Be a stickler and try to follow the HTML5 standard or go for ease-of-use and do what people usually intend?

Markdown is based on the way people used to mark up conversations on USENET. In that environment, people used *emphasis* and _emphasis_ and sometimes also -emphasis- interchangeably. If they wanted to indicate extra emphasis, they’d put **double stars** or __double underscores__, again interchangeably. People generally tended to pick one or the other and stick to it, except sometimes to avoid ambiguity (e.g. I hate *identifiers_with_underscores*)

That’s why it’s interchangeable in Markdown. Making _ and * each do something different would be an unpleasant and annoying break with tradition.

You say that this makes Markdown a bit more confusing to learn, and that to avoid making a mess, the author has to pick one scheme and stick to it. I don’t see why this is confusing. There are two different ways to write each level of emphasis; how is that more confusing than it would be if * and _ did different things? (And then you would have to remember to nest them properly!)


Backwards compatibility was a major goal. Of course, implementations
vary enough that it’s impossible to avoid all breakage. But we want to
minimize it.


In my experience, most people intuitively see _ as denoting emphasis (it’s visually lighter) and * as denoting strong importance (it’s visually heavier).

Gruber had some logical arguments behind his decision, but it ultimately seems to have been his personal preference and stubbornness that won through. Aaron Swartz, Merlin Mann and several others were advocates for _emphasis_ and *strong*, but Gruber was not convinced…

In short, you’re sort of screwed, because that’s how I write, and it’s how I’ve written since around 1992.

Unfortunately, any change to this would break backwards compatibility with legacy Markdown documents. Not to mention the habits of everyone who has used Markdown for years. Trying to get this fixed in SMD will not be easy.

That said, I think it could be done with relatively little breakage in legacy documents.

Have the spec declare that _ denotes emphasis and * denotes strong importance, while still allowing doubles (__, **) to denote the same as their respective single character. Double characters must be used to add emphasis and strong importance in the middle of words.

The breakage in legacy documents would then be:

  • __foo__, intended as strong importance, would instead come out as emphasis
  • *bar*, intended as emphasis, would instead come out as strong importance

It’s mainly a semantic issue: the degrees of emphasis change. Unless I’ve overlooked something? (Not unlikely. Despite its simple syntax, Markdown is full of quirks and surprises!)
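
To make that breakage concrete, here is a rough Python sketch that renders the same source under the existing mapping and under the proposed one. It is a toy regex renderer, not a CommonMark parser: it ignores nesting, escapes and the intra-word rules, and the names LEGACY, PROPOSED and render are made up for illustration.

```python
import re

# Delimiter -> HTML tag under each scheme (illustrative tables, not from any spec text).
LEGACY = {"**": "strong", "__": "strong", "*": "em", "_": "em"}
PROPOSED = {"**": "strong", "__": "em", "*": "strong", "_": "em"}


def render(text, scheme):
    """Replace simple, non-nested delimiter runs with the scheme's tags."""
    # Handle double delimiters before single ones so "**x**" is not read as "*" + "*x*" + "*".
    for delim in ("**", "__", "*", "_"):
        tag = scheme[delim]
        d = re.escape(delim)
        # Content must start with a word character and contain no other delimiters.
        pattern = d + r"(\w[^*_]*?)" + d
        text = re.sub(pattern, rf"<{tag}>\1</{tag}>", text)
    return text


if __name__ == "__main__":
    for sample in ("_a_", "**a**", "__foo__", "*bar*"):
        old, new = render(sample, LEGACY), render(sample, PROPOSED)
        status = "same" if old == new else "CHANGED"
        print(f"{sample:10} {old:22} {new:22} {status}")
```

Under these assumptions only the __foo__ and *bar* rows come out differently; _a_ and **a** render the same either way.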

I’d love to see this getting fixed in SMD. But Markdown is in a very different position today than when Gruber made his decision back in 2006. It will not be easy to get a change like this accepted today either.


Hm, I didn’t think it through with intra-word emphasis. Examples 280, 282, 284 and 287 would also be broken in legacy documents.

I would be vehemently against “fixing” this, since it would break backwards compatibility in a big way. And not only in existing documents, but it would also change patterns that millions of people might have learned already.


That’s my conclusion as well. Long story short: It’s too late.

Edit: Just as a tease, here’s how I would have liked Markdown to be:

I _really_ like Markdown, but *beware*, stick to the spec.
It is supercali**fragilistic**expialidocious.

Emphasis: _foo_
Strong: *foo*
Intra-word emphasis and strong: foo__bar__baz, foo**bar**baz
No intra-word emphasis and strong: foo_bar_baz, foo*bar*baz
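
The rules in that wish list can be sketched in a few lines of Python. This is only an illustration of the idea, assuming doubles are tried first and may match inside words while singles need a non-word character on the outside of both delimiters; RULES and render_wish are hypothetical names, and nesting and escapes are not handled.

```python
import re

# Hypothetical rules for the wished-for syntax. Order matters: doubles first.
RULES = [
    (re.compile(r"__(.+?)__"), "em"),            # intra-word allowed
    (re.compile(r"\*\*(.+?)\*\*"), "strong"),    # intra-word allowed
    # Singles: no word character just outside either delimiter.
    (re.compile(r"(?<!\w)_(\S(?:.*?\S)?)_(?!\w)"), "em"),
    (re.compile(r"(?<!\w)\*(\S(?:.*?\S)?)\*(?!\w)"), "strong"),
]


def render_wish(text):
    """Apply each rule in turn; anything left unmatched stays literal."""
    for pattern, tag in RULES:
        text = pattern.sub(rf"<{tag}>\1</{tag}>", text)
    return text


if __name__ == "__main__":
    print(render_wish("I _really_ like Markdown, but *beware*, stick to the spec."))
    print(render_wish("It is supercali**fragilistic**expialidocious."))
    print(render_wish("foo__bar__baz, foo_bar_baz, foo*bar*baz"))
```

With these rules, identifiers_with_underscores and foo*bar*baz pass through untouched, which is the point of requiring doubles inside words.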

I’d be furious if my italics suddenly started turning into bold text.


What if asterisks were used for <em> and <strong>, and underscores for <i> and <b>? This would preserve backwards compatibility visually, while allowing authors to make the appropriate distinction when necessary.

E.g.:
I really like The Big Lebowski!
I *really* like _The Big Lebowski_!

(Note that this may introduce accessibility issues, where things that were properly emphasized in legacy documents suddenly are not properly emphasized. However, I’m guessing legacy documents already had accessibility issues with strings that were emphasized inappropriately.)
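
As a minimal sketch of what this split would produce, assuming the doubled forms map to <strong> and <b> in the obvious way (the post only spells out the single-delimiter case), something like the following Python would do. TAGS and render_split are names invented here, and the regex is a simplification that ignores nesting and intra-word rules.

```python
import re

# Assumed mapping: asterisks carry semantics, underscores are presentational.
TAGS = {"**": "strong", "*": "em", "__": "b", "_": "i"}

# Opening delimiter (doubles tried first), non-space after it,
# shortest content, non-space before the identical closing delimiter.
DELIMITED = re.compile(r"(\*\*|__|\*|_)(?=\S)(.+?)(?<=\S)\1")


def render_split(text):
    """Wrap each delimited run in the tag chosen by its delimiter."""
    def repl(match):
        tag = TAGS[match.group(1)]
        return f"<{tag}>{match.group(2)}</{tag}>"
    return DELIMITED.sub(repl, text)


if __name__ == "__main__":
    print(render_split("I *really* like _The Big Lebowski_!"))
```

Both spans would still typically display as italics in a browser's default styling, so the visual result matches legacy output while the markup distinguishes stress from a title.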

Larger list of syntax duplications which I think should also be deprecated from the language and later removed: https://github.com/karlcow/markdown-testsuite/issues/53 . Analogous for code blocks: We should abandon indented code blocks

Generally speaking, if there are 2 ways to achieve the same thing, then it’s confusing to learn. As a novice you don’t know if it’s really the same, if one is different from the other in some way or which you should use for what. Once someone tells you it’s actually the same, it’s not confusing anymore, but you can’t intuitively decide that for yourself (necessarily).

In this case though I’m with you in that 1) it’s already too late and 2) it’s not confusing because these notations have been around forever and they both communicate well what they do. If rewinding time were possible, I’d definitely remove one, though.

While I would like to get rid of underscores (I find them confusing as used for emphasis), I suspect that it would break compatibility with far too many documents to be worth the change.

Even after using Markdown for ages, I still think the Markdown convention for * and _ is wrong.

I’ve always seen _ as italics and * as bold.

Even if it’s not backwards compatible, we really do need to tough this one out, or this mistake will become even more set in stone (much like the switch from Python 2.x to 3.x).

Maybe you could make:

  • <em>: _em_
  • <strong>: *strong*
  • <i>: __italic__
  • <b>: **bold**

My justification for the above is based on this section about the HTML5 <b> tag, which indicates that <strong> should be preferred over <b> for bold text. HTML b Tag:

Note: According to the HTML 5 specification, the <b> tag should be used as a LAST resort when no other tag is more appropriate. The HTML 5 specification states that headings should be denoted with the <h1> to <h6> tags, emphasized text should be denoted with the <em> tag, important text should be denoted with the <strong> tag, and marked/highlighted text should use the <mark> tag.


Your quote highlights why your proposition is a bad idea; it’s semantically incorrect. __text__ is strongly emphasized due to being surrounded by double underscores. The <i> tag does not mean strong emphasis/importance.

It’s not about ‘bold’ or ‘italics’; it’s about emphasis and importance. In which case you could only argue for _ as emphasis and * as importance. This would be incompatible with having double asterisks or double underscores as a means of indicating importance, resulting in massive backwards compatibility issues.

Not to mention that your proposal from the beginning already breaks many, many existing documents and implementations. (Everyone’s single asterisk wrapped text would be changed from being in <em> tags to being in <strong> tags, and vice versa for double underscore wrapped text.)

This is one of those topics where the original Markdown spec is quite clear in its definitions… I also think the logic is reasonably justifiable.

And since the goal of CommonMark is to be as true as possible to the original spec, and render existing Markdown docs as faithfully as possible, our path forward is clear here.


Excuse my possible mistakes. This is my first post on this platform.

@codinghorror, with all due respect, this particular issue isn’t about logical consistency (sorry, I’m afraid I haven’t been able to find its justification here).

In my opinion, the main problem for users is that this is visually misleading.

And this isn’t limited to new users; it also affects the readability of CommonMark source for experienced users.

From the philosophy:

Markdown is intended to be as easy-to-read and easy-to-write as is feasible.

Readability, however, is emphasized above all else.

Well, I think we can agree on the following:

  • The current behavior doesn’t make the source easier to write or, more importantly, easier to read.

  • Setting compatibility aside, reserving _ for emphasis and * for strong emphasis is perfectly feasible.

Faithfulness to the original Markdown in CommonMark is great. But improving the spec by following its design principles may be better.

Markdown already has an installed base of thousands of websites and apps, and millions of documents. Changing the language in backwards-incompatible ways for commonly used syntax is not really on the table. The only thing that would accomplish is a further fragmentation of the language, where the same text renders in different ways depending on what markdown implementation and version is being used.

Sorry to say it, but that ship has sailed.


@codinghorror, I have just realized that I misread “reasonably justifiable” as “reasonably justified”.

That was why I couldn’t find any justification in the Markdown spec.

On Usenet, it was more like *bold*, _underline_ and /italics/, which seems more intuitive.
