Beyond Markdown

#30

I believe John Gruber was referring to the Markdown syntax only here, rather than Markdown + HTML in the document. Further in the syntax guide he writes (emphasis mine):

Markdown is not a replacement for HTML, or even close to it. Its syntax is very small, corresponding only to a very small subset of HTML tags. The idea is not to create a syntax that makes it easier to insert HTML tags. In my opinion, HTML tags are already easy to insert. The idea for Markdown is to make it easy to read, write, and edit prose. HTML is a publishing format; Markdown is a writing format. Thus, Markdown’s formatting syntax only addresses issues that can be conveyed in plain text.

So, a “Markdown-formatted document” mentioned in the prime directive is referring to just the Markdown syntax in the document. HTML is allowed because Markdown syntax is not intended to solve every problem.

I think this would remove the requirement for raw HTML in a lot of cases (even if it is still desired for the reasons I mentioned in earlier posts). We would need many of the proposed extensions to always be available, such as definition lists, tables, and embedded video. A significant challenge would be coming up with a short syntax for all the different types of phrasing content, while still making the syntax both concise and understandable in plain text.

CommonMark’s goal is to be highly compatible with existing implementations. Changing the emphasis syntax or removing raw HTML would significantly break many documents that cannot be automatically updated (for example, GitHub readmes); even if those documents would benefit from being updated, it’s not something that can realistically be done for every document.

1 Like

#31

To that end I’d suggest [removing these features] from CommonMark 1.0

CommonMark’s goal is to be highly compatible with existing implementations. Changing the emphasis syntax or removing raw HTML would significantly break many documents that cannot be automatically updated (for example, GitHub readmes); even if those documents would benefit from being updated, it’s not something that can realistically be done for every document.

That’s a great point. Maybe this is simply a semantic issue (for me). If the solution to this daydream is called “CommonMark 2.0” rather than “NewLanguage 1.0”, it implies:

  • breaking backwards compatibility
  • explicit support (or not) for specific versions
  • the need for an upgrade utility :smiley:
  • that the new version is better :smiling_imp:

I speak as someone who is comfortable with the future being unevenly distributed, à la Python 2 and Python 3, so YMMV.

1 Like

#32

@jgm have you considered deprecating certain CommonMark features which could conflict with planned or oft-requested features?

A deprecated feature would continue to work in a 1.0-compliant renderer, but might cease to work in any later version.

A minor complication of this approach is that renderers would need to take a CMVer parameter in order to support a document collection which uses more than one spec version.
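A rough sketch of what such a version parameter could look like, purely for illustration. The names `render`, `cm_ver`, `render_v1`, and `render_v2` are made up here and do not come from any existing renderer; the two renderers are stand-ins that only tag their output.

```python
from typing import Callable, Dict


def render_v1(text: str) -> str:
    # Stand-in for a 1.0-compliant renderer (deprecated features still work).
    return f"<!-- CommonMark 1.0 -->{text}"


def render_v2(text: str) -> str:
    # Stand-in for a later renderer in which deprecated features are gone.
    return f"<!-- CommonMark 2.0 -->{text}"


# Hypothetical registry: one renderer per supported spec version.
RENDERERS: Dict[str, Callable[[str], str]] = {
    "1.0": render_v1,
    "2.0": render_v2,
}


def render(text: str, cm_ver: str = "1.0") -> str:
    """Dispatch to the renderer registered for the requested spec version."""
    try:
        renderer = RENDERERS[cm_ver]
    except KeyError:
        raise ValueError(f"unsupported spec version: {cm_ver}") from None
    return renderer(text)
```

A document collection would then store the spec version next to each document and pass it through, for example `render(source, cm_ver="2.0")`.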

1 Like

#33

I like your reform of emphasis. I would add that Creole is particularly impressive in using markup characters that hint at their function.

For example, italics, or what we’re calling simple emphasis, are marked with double slashes //. The rightward lean of the slashes makes the syntax easy to remember, since italic letters lean to the right as well.

And for a numbered list, Creole uses the pound / hash / number symbol #. (Bulleted lists use asterisks.) I think Markdown just uses literal numerals for numbered lists, which makes sense, but sometimes it may be easier to use the pound/number sign so that lists can be modified without having to renumber every item.
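To illustrate why the # marker avoids renumbering, here is a minimal sketch that assigns the numbers at render time. It assumes flat lists whose items start with "# " and is not a Creole implementation; `number_hash_list` is a name invented for this example. (In Markdown itself, a leading # already marks a heading, so this only illustrates the Creole idea.)

```python
def number_hash_list(lines):
    """Replace a leading '# ' marker with the item's position in its list."""
    out = []
    counter = 0
    for line in lines:
        if line.startswith("# "):
            counter += 1
            out.append(f"{counter}. {line[2:]}")
        else:
            counter = 0  # any non-item line ends the current list
            out.append(line)
    return out
```

Inserting or deleting an item in the source never requires touching the other items, because the numbers exist only in the output.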

Changing gears a bit, I think all CMSes and markup languages that render HTML, such as Markdown, should emit minified HTML by default. In fact I think HTML should be minified by definition/specification – it was a big mistake for the HTML spec writers to not specify that production/consumed HTML had to be minified according to XYZ standards. The result is a lot of bloated HTML files and websites, waste of bandwidth, energy, and user time. It’s one thing to sling a ridiculously inefficient text format all over the net, but it’s even worse when there’s so much bloat added to it. I realize that image bloat is a bigger problem, but everything counts and HTML should be minified by default, at birth.

0 Likes

#34

If the “Beyond Markdown” language does take off, at least the proposed emphasis rules would already be compatible with one popular system that uses a “Markdown-like” syntax.

2 Likes

#35

WhatsApp uses almost the same formatting conventions as Slack.

3 Likes

#36

I like this proposal. Markdown has some strengths without which it would not have gained its current popularity, but not everything about it is great.

The mentioned pain points fall into two categories: many ways to do the same thing (1, 3, 4) and missing power/generality (2, 5, 6).

Having more than one way to emphasize a word makes absolutely no sense. What does nested emphasis even mean?

The second point is that, given Markdown’s limited power (basically everything that is discussed in extensions), it probably seemed like an easy fix to just allow inlining arbitrary HTML. Today, Markdown has become so much more than just a fast way of writing HTML (for arbitrary HTML it is not even the best tool), so tying it to HTML does more harm than good. What Markdown needs is to natively support constructs that appear in written text (definition lists, tables, etc.) and then a way of annotating the document so that the document converter can do something smart with it. These annotations should reflect the semantic meaning of parts of the document or add some metadata to it, similar to LaTeX.

Getting 6. right could be tricky though: What if you want to give a few adjacent words or paragraphs a slightly different semantic meaning? Do you have to emphasize the words first and then apply the attribute? Do you have to add the attributes separately to all the paragraphs?

The main selling points of Markdown are its beauty and its flat learning curve. Neither is taken away by simplifying it and adding some general way to extend it.

You can’t make an omelette without breaking some eggs.

3 Likes

#37

I also think it’s worth looking at ArchieML, created by the New York Times IT group for NYT writers/reporters. It’s sort of like Markdown but with a big emphasis on data and types of data embedded in articles. It compiles to JSON. There might be some useful approaches there.

0 Likes

#38

Simpler is better for everyone. CommonMark should stand apart, with no (or minimal) reliance on other languages. IMO, backward compatibility is a goal, not an absolute: where it is possible, go for it, but do not be bound by it. Very probably, not all variants of Markdown can be built into CommonMark. CommonMark needs to “exceed” the other variants so that they go away. Simplicity and unambiguous ways of writing will eventually prevail. Getting everyone to use CommonMark is not likely.

  • I agree with @alehed that CommonMark should provide the ability to create the “normal” features of written documents (tables, footnotes, and so on).

  • Eliminate multiple ways of performing the same task. For example, no shortcut reference links. There are probably others.

  • Emphasis: not sure how to solve bold and strong. Bold = “*”. Strong = “**”. I agree with one character to identify letter format.

  • A truly radical proposal: use words, i.e., this becomes an attribute, so there is no ambiguity (strong is strong; bold is bold). For clarity in human readability, each attribute stands alone; multiple attributes cannot be put in the same “holder”.

  • For attributes: {=…} @adiantwoods.

  • With an unambiguous statement of attributes, HTML is not needed. Not everyone knows HTML or cares to learn it.

  • A list should only be a list (no fancy complications).

  • All code that needs to “pass through” goes inside a code block.

Always open to comments and suggestions.

1 Like

#39

Having to visually parse <strong>words that are not</strong> part of the sentence does <emphasis>not</emphasis> achieve “clarity in human readability”. See what <bold>I</bold> did there?

I think you are confusing *markup* readability with *content* readability, and Markdown is focused on the latter.

2 Likes

#40

I also wonder whether one can expect to get those “normal features” of (more serious) documents, like tables and footnotes, in CommonMark in the form of extensions, or whether it is better to focus on systems supporting other markups (Markdown Extra, rst, AsciiDoc, …)?

In other words, what is the practical meaning of Proposed Extensions? Can one expect to see, e.g., support for footnotes, or is it more probable that we won’t see much of them in the final spec?

0 Likes

#41

1. Emphasis

No personal strong opinion. Will follow best consensus/practices.

2. Reference links

No personal strong opinion. Will follow best consensus/practices.

3. Indented code blocks and lists

+1.0 with your fix.

4. Raw HTML

INLINE FORM

+1.0 with your fix

BLOCK FORM

+0.5 with your fix

  • should be completed with a ::: section marker too, as with pandoc’s fenced_divs.
  • which should now rather be understood as a section marker compatible with the HTML5 section, article, main, etc. elements.

5. Lists and blank lines

No personal strong opinion. Will follow best consensus/practices.

6. Attributes

+1.0 with your fix. Very good work.

However

  • {.class} should be recognized too
  • Attributes at the end rather than at the beginning have appeared here and there.
    Why not relax the conditions on their placement?
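For concreteness, a small sketch of reading such an attribute block once it has been located, wherever it sits relative to the element. It assumes a pandoc-like form `{#id .class key=value}`, which may differ from what the proposal finally specifies; `parse_attributes` and the shape of the returned dict are invented for this example.

```python
import re

# One token is '#id', '.class', or 'key=value' (value optionally double-quoted).
ATTR_TOKEN = re.compile(
    r'#(?P<id>[\w-]+)|\.(?P<cls>[\w-]+)|(?P<key>[\w-]+)=(?P<val>"[^"]*"|\S+)'
)


def parse_attributes(block: str) -> dict:
    """Parse '{#id .class key=value}' into id, classes, and key/value pairs."""
    inner = block.strip()
    if not (inner.startswith("{") and inner.endswith("}")):
        raise ValueError("attribute block must be wrapped in braces")
    attrs = {"id": None, "classes": [], "pairs": {}}
    for m in ATTR_TOKEN.finditer(inner[1:-1]):
        if m.group("id"):
            attrs["id"] = m.group("id")
        elif m.group("cls"):
            attrs["classes"].append(m.group("cls"))
        else:
            attrs["pairs"][m.group("key")] = m.group("val").strip('"')
    return attrs
```

With this, `{.class}` alone is just the case where only the `classes` list is filled in.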

Edited: This link on GitHub goes in this direction too: generic directive extension list

1 Like

#42

An alternate fix for emphasis:

Require an exact match between the opening and closing delimiters. Kind of like inline code spans.

Emphasis would begin with a left-flanking delimiter run of exactly 1, 2, or 3 asterisks or 1, 2, or 3 underscores, and end with a right-flanking delimiter run of exactly the same length and character.

*emphasis*
_emphasis_

**strong emphasis**
__strong emphasis__

***strong plus regular emphasis***
___strong plus regular emphasis___

Four or more sequential asterisks or underscores would render literally.

****no emphasis****
____no emphasis____

Unmatched delimiter runs would not create emphasis at all, and could not divide into emphasis plus literal characters.

**no emphasis*

**no emphasis***

_no emphasis*

Any unmatched delimiters, including within emphasis, would render literally.

**asterisk* within strong emphasis**
__underscore_ within strong emphasis__

*asterisks** within emphasis*
_underscores__ within emphasis_

To create a literal asterisk or underscore next to emphasized text, a character can be escaped…

*asterisk\** inside emphasized text

*asterisk*\* outside emphasized text

…or a different delimiter character can be used.

_asterisk*_ inside emphasized text

_asterisk_* outside emphasized text

Nested emphasis would work.

**strong and *emphasis* within strong**

It would also be possible to nest the same type of emphasis by alternating asterisks and underscores.

*lots _of *emphasized* text_ here*

When using asterisks in a single word, emphasis would start with a left-flanking or both-flanking delimiter run, and end with a both-flanking or right-flanking delimiter run. This would allow intraword emphasis.

*emphasized*
*em*phasized
em*pha*sized
empha*sized*

These rules should be pretty intuitive and easy to learn, and backwards compatible to a large extent.

And they eliminate a huge amount of complexity and ambiguity.
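To make the matching rule concrete, here is a rough Python sketch of it. It is only an approximation under stated assumptions: flanking is reduced to "no whitespace on the relevant side" (the punctuation rules are ignored), underscores are treated exactly like asterisks inside words, only \* and \_ are handled as escapes, and `***` is rendered as `<em><strong>`. The name `render_emphasis` is made up for this example.

```python
import re
from html import escape

# Tokens: escaped delimiter, run of '*', run of '_', plain text, lone backslash.
TOKEN = re.compile(r"\\[*_]|\*+|_+|[^*_\\]+|\\")
TAGS = {
    1: ("<em>", "</em>"),
    2: ("<strong>", "</strong>"),
    3: ("<em><strong>", "</strong></em>"),
}


def render_emphasis(text: str) -> str:
    out = []      # rendered pieces, in order
    openers = []  # (index into out, delimiter char, run length)
    for m in TOKEN.finditer(text):
        tok = m.group()
        if tok[0] == "\\" and len(tok) == 2:
            out.append(tok[1])  # escaped delimiter renders literally
            continue
        if tok[0] not in "*_" or len(tok) > 3:
            out.append(escape(tok, quote=False))  # plain text or run of 4+
            continue
        before = text[m.start() - 1] if m.start() > 0 else " "
        after = text[m.end()] if m.end() < len(text) else " "
        can_open = not after.isspace()    # simplified left-flanking test
        can_close = not before.isspace()  # simplified right-flanking test
        match_at = None
        if can_close:
            # Nearest earlier opener with the same character and length.
            for i in range(len(openers) - 1, -1, -1):
                if openers[i][1:] == (tok[0], len(tok)):
                    match_at = i
                    break
        if match_at is not None:
            open_tag, close_tag = TAGS[len(tok)]
            out[openers[match_at][0]] = open_tag
            out.append(close_tag)
            del openers[match_at:]  # openers skipped over stay literal
        else:
            if can_open:
                openers.append((len(out), tok[0], len(tok)))
            out.append(tok)  # literal unless a later closer matches it
    return "".join(out)
```

For instance, `render_emphasis("**asterisk* within strong emphasis**")` keeps the single asterisk literal inside the strong span, and `render_emphasis("**no emphasis*")` returns its input unchanged, because no run is ever split to find a match.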

3 Likes

#44

I think @aoudad’s suggestions on emphasis are sound, keeping both ease of reading and backward compatibility. On the rest, I tend to agree with @jgm. However, the bit that I find toughest to get behind is getting rid of shortcut reference links. Those are not only convenient, but are very readable as well. [foo][] gives up a bit of that human-readability for parsing convenience.

At any rate, looking forward to CommonMark 1.0.

2 Likes

#45

Some thoughts…

  1. Emphasis

    I like using _emphasis_ and *strong emphasis*. As somebody mentioned already, that’s how it works on WhatsApp, Facebook, and Slack, and it seems very logical. They also support ~strikethrough~, `monospaced`, and triple-backtick code blocks, which are all nice.

    I personally don’t care much for intra-word emphasis and I’d rather keep the single tilde free for strikethrough syntax.

  2. Reference links

    Shortcut reference links are great, and it would be a shame to remove them. They are very intuitive and readable. Just go to a random Hacker News comment thread and you’ll see people instinctively using them, even though they’re not supported there (just Ctrl/Cmd+F [1] to see examples).

    They are also essential for wiki-style and academic writing, which are both extremely rich in references. The extra noise caused by [this style][] would hurt readability.

  3. Indented code blocks

    Big yes to the more logical list indentation style, and to removing indented code blocks. What a pain they are.

  4. Raw HTML

    Not a fan of the specific syntax (why the extra =?), but this sounds good in general.

  5. Lists and blank lines

    I’d rather we just allow the creation of a list even without a blank line separating it from a paragraph. It seems to me this would rarely be a problem in practice. The example given can just be fixed by having 220. be on the previous line. Or maybe one could allow escaping the period like so 200\. to mean that you really do want to write 200. at the beginning of the line, and not start a list. Again, I really doubt this will happen very often.

  6. Attributes

    Why not use a consistent way of creating attributes for headers, like on GitHub? This would avoid having to introduce extra syntax, and keep documents cleaner.
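On point 6, here is a sketch of what deriving header IDs automatically might look like. It only approximates the GitHub-style behaviour (lowercase, drop punctuation, spaces to hyphens, numeric suffix for duplicates) and is not GitHub's actual algorithm; `make_slugger` is a name invented for this example.

```python
import re


def make_slugger():
    """Return a function that turns heading text into unique anchor IDs."""
    seen = {}  # base slug -> number of times it has been used so far

    def slug(heading: str) -> str:
        # Lowercase, drop everything except word chars, hyphens, and spaces.
        base = re.sub(r"[^\w\- ]", "", heading.strip().lower()).replace(" ", "-")
        count = seen.get(base, 0)
        seen[base] = count + 1
        return base if count == 0 else f"{base}-{count}"

    return slug
```

One slugger would be created per document, so that repeated headings get distinct IDs: the first "Raw HTML" becomes `raw-html`, the second `raw-html-1`.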

2 Likes

#46

There is a discussion regarding adding header IDs to CommonMark, but it seems that - at least in some cases - having the ability to manually specify these is useful.

0 Likes

#48

To keep shortcut reference links readable, how about double brackets?

In other words, links would use external brackets for a reference label [foo][bar], and nested brackets if the link text is its own label [[foo]].

That makes it immediately clear there’s a link, not just a span or literal brackets; and it’s more natural-looking than appending empty brackets [foo][].

[[foo]]

[foo]: https://example.com
<a href="https://example.com">foo</a>

If there’s no link reference definition, the double-bracketed text would still be a link. It could fall back to an implicit page link:

[[foo]]
<a href="foo">foo</a>
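A minimal sketch of how this resolution could work, assuming reference definitions sit on their own lines and labels are matched case-insensitively. It handles only the `[[foo]]` form, not `[foo][bar]`, and the function name `render_double_bracket_links` is invented for this example.

```python
import re
from html import escape

# A link reference definition on its own line: [label]: destination
DEFINITION = re.compile(r"^\[([^\]]+)\]:\s*(\S+)\s*$", re.MULTILINE)
# Double-bracketed link text that serves as its own label: [[label]]
DOUBLE_BRACKET = re.compile(r"\[\[([^\[\]]+)\]\]")


def render_double_bracket_links(source: str) -> str:
    # Collect reference definitions, keyed case-insensitively.
    refs = {m.group(1).lower(): m.group(2) for m in DEFINITION.finditer(source)}
    body = DEFINITION.sub("", source).strip()

    def replace(m):
        label = m.group(1)
        # With no definition, fall back to an implicit page link.
        href = refs.get(label.lower(), label)
        return f'<a href="{escape(href)}">{escape(label)}</a>'

    return DOUBLE_BRACKET.sub(replace, body)
```

With the definition present, `[[foo]]` resolves to https://example.com; without it, the label itself becomes the link target.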
0 Likes

#49

That’s a good suggestion. This is similar to the [[text]] format of the Wiki markup used by MediaWiki, and it is human-readable and doesn’t look horrid.

The trouble would be that it isn’t backward-compatible. But it is something I feel one can support, since I feel standardization is more important than backward-compatibility.

0 Likes

#50

I would say MediaWiki is exactly the reason why IMHO Markdown (or derivatives of it) should not use [[foo]] for the ordinary links. Wiki is one very natural application for Markdown-like syntax and it’s therefore better to reserve that syntax for wiki-links to other articles defined by the database of its articles; not to some arbitrary URIs.

2 Likes

#51

UPDATE ABOUT ATTRIBUTES

6. Attributes

+1.0 with your fix. Very good work.

However

  • {.class} should be recognized too
  • Attributes at the end rather than at the beginning have appeared here and there.
    Why not relax the conditions on their placement?

See Also

0 Likes