Beyond Markdown

chrisalley · May 2, 2018, 2:21am

One of the nice features, from a writer’s perspective, is that Markdown doesn’t require special markers or delimiters to start special cases. The proposal here would add this requirement to start HTML blocks or add intraword emphasis. So I think I agree with @piro_or that adding more complex (even if only a little) syntax would take away some of what makes Markdown appealing. It would reduce complexity from the spec and implementations, but seemingly at the writer’s expense.

For complex scenarios, we have the fallback to HTML. If nested emphasis is rare, HTML could be used in just those scenarios, while keeping the normal Markdown syntax simple. If the parser knows about the particular HTML tags used, they could be stored in the AST and then converted into other formats. We also have the concept of extensions, such as what GitHub have done, to use as shorter syntax in more widespread cases, such adding attributes. With these two solutions in mind (HTML fallback and extensions), other than making life easier for spec and parser writers, do we really benefit from a new set of incompatible rules?

jgm · May 2, 2018, 5:00am

I think the proposal offers quite a bit to writers:

the ability to pass through any format, not just HTML
the ability to put attributes on any block element
a simpler mental model for emphasis and lists, with fewer rules to remember
a uniform and predictable extension mechanism, instead of ad hoc syntactic extensions

These things may not be important to you, but there are writers for whom they are important.

chrisalley · May 2, 2018, 10:31am

My use cases for a light weight markup language are primarily web based systems, particularly those where it’s important for the writer to productive quickly without learning new syntax rules or referencing a manual. I wonder if our use cases are different enough that a new language in the Markdown family of languages (like what you’ve proposed) would make sense for only some types of writers, while others would be better off staying with the status quo?

Could you add support for other formats to CommonMark as an extension (with the ````{=latex}` syntax), while keeping raw HTML working by default (as it does now)? Same with attributes as an extension. If so, these two points wouldn’t justify a new language on their own (although I can understand your desire to stay neutral with non-Markdown syntax as well).

On the one hand, I agree that the proposed rules for emphasis are simpler. On the other hand, with the current emphasis rules, the writer can muddle through without looking up the special intra-word emphasis syntax in a manual, by testing the output in a preview window (like the one this forum uses). I suppose the editor could have a button that adds intra-word emphasis, although the writer may not think to use it if they believe they already know how to write emphasised text.

Can you talk a bit more about how this would work in the new language? For example, to add tables, description lists, strikethrough and other common proposed extensions, how would adding these to the new language differ to adding them to CommonMark?

By the way, I don’t want to come across as negative towards the idea of a new language. These are the kinds of objections that others who primarily use web-based systems will likely raise, especially if we’re asking people to change their entrenched habits. Personally I wouldn’t mind writing in a language that’s a bit stricter and more explicit if it makes things simpler overall.

jgm · May 2, 2018, 5:57pm

This may well be the case.

Yes, but this still leaves you with an ugly and complicated set of rules for determining what belongs to an HTML block.

I was just referring to what I said above:

However, this approach is too unwieldy to use for definition lists or strikethrough, and definitely not enough for tables, so some extensions would still be needed.

chrisalley · May 4, 2018, 11:07am

I think this another case where users of web systems, wanting to get some raw HTML working quickly, would find CommonMark simpler, while in more advanced cases explicitly declaring the section of text as HTML would make inserting it simpler. This, and the similar points I brought up earlier, are leading me to believe that a new language project might be worth undertaking as an alternative, rather than as a replacement, to Markdown.

vas · May 4, 2018, 2:50pm

This.

Markdown took the markup world by storm for a reason. It needs to stay true to that reason. It should not and cannot be all things to all people. Other markup languages exist or can be created for people who have different reasons.

Has there been a discussion here or elsewhere that gets to Markdown’s reason?

I believe the reason is as is stated in the CommonMark intro:

What distinguishes Markdown from many other lightweight markup syntaxes, which are often easier to write, is its readability. As Gruber writes:

The overriding design goal for Markdown’s formatting syntax is to make it as readable as possible. The idea is that a Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions. (Daring Fireball: Markdown)

The Prime Directive, so to speak. I would put portability second (the reason for CommonMark’s clarification of Markdown, right?), and then greater expressibility (e.g. supporting tables) third. Greater expressibility must bow to The Prime Directive; we can’t just add stuff willy-nilly. If adding stuff tends to break the Prime Directive, then we have to be very judicious – e.g. don’t add things needed by a a few at the expense of the many (ok, that second Star Trek reference wasn’t intentional, but I like it). If someone comes up with some amazing syntax that supports semantic “divs” and the like while staying true to the Prime Directive, sweet.

[And yeah, I know that some of the greatest Star Trek episodes are where the Prime Directive is broken. But it wasn’t willy nilly. And it certainly wasn’t for self-serving reasons.]

jgm · May 4, 2018, 5:11pm

I agree completely about the Prime Directive; that’s why I put it right at the beginning of the Commonmark spec. But you can’t justify “include HTML anywhere you like without special marking” by appealing to the Prime Directive. Once you include raw HTML, you’ve already gotten away form “publishable as-is, as plain text, without looking like it’s been marked up…”

If we need to choose between

(1) what the original Markdown allows:

<div class="warning">
<p>Don't try this at home.</p>
</div>

(2) what Commonmark allows:

<div class="warning">

Don't try this at home.

</div>

(3), which you can do in pandoc:

::::::::: warning ::::::::::::
Don't try this at home.
::::::::::::::::::::::::::::::

and (4), which the proposal above would allow:

{warning}
Don't try this at home.

then I think it is (3) and (4) that best respect the Prime Directive.

mb21 · May 5, 2018, 10:29am

As a web-dev I understand you guys not liking the thought of not having the power of HTML quickly available. But maybe it would be more productive to talk about the use-cases for raw HTML in markdown documents. Maybe there is a better, more readable way?

Would adding a nice syntax for block containers (divs/aside/etc) and inline-spans not solve 99% of those use-cases?
(See the pandoc manual for examples.)

d3vid · May 9, 2018, 1:42pm

Many of these solutions suggest throwing out some syntax to make parsing easier. The result (from these suggestions) is a subset of Markdown/CommonMark that is easy to write, read and parse.

This seems like an excellent approach. It suggests to me that bad syntax will simply render as regular text, rather than weird markup, which is an easier-to-swallow surprise. In addition, it seems likely this will make it easier to learn from the correction (“Oh, I don’t need to indent code.”)

To that end I’d suggest implementing these particular solutions in CommonMark 1.0. It would mean that some Markdown files that-render-in-some-particular-parser would need to be updated, but imho these are the exact documents that would benefit from ambiguity fixes.

Could a similar approach be taken to remove other ambiguous cases, and instead require writers to write unambiguously. For example, how many of the 17 emphasis rules simply be declared “ambiguous” and rendered in regular text? The author would then need to find a simpler way of marking up their text, benefiting plaintext readers and parsers alike.

chrisalley · May 10, 2018, 12:44pm

I believe John Gruber was referring to the Markdown syntax only here, rather than Markdown + HTML in the document. Further in the syntax guide he writes (emphasis mine):

Markdown is not a replacement for HTML, or even close to it. Its syntax is very small, corresponding only to a very small subset of HTML tags. The idea is not to create a syntax that makes it easier to insert HTML tags. In my opinion, HTML tags are already easy to insert. The idea for Markdown is to make it easy to read, write, and edit prose. HTML is a publishing format; Markdown is a writing format. Thus, Markdown’s formatting syntax only addresses issues that can be conveyed in plain text.

So, a “Markdown-formatted document” mentioned in the prime directive is referring to just the Markdown syntax in the document. HTML is allowed because Markdown syntax is not intended to solve every problem.

I think this would remove the requirement for raw HTML in a lot of cases (even if it is still desired for the reasons I mentioned in earlier posts). We would need many of the proposed extensions to always be available, such as definition lists, tables, and embedded video. A significant challenge would be coming up with a short syntax for all the different types of phrasing content, while still making the syntax both concise and understandable in plain text.

CommonMark’s goal is to be highly compatible with existing implementations. Changing the emphasis syntax or removing raw HTML would significantly break many documents that cannot be automatically updated (for example, GitHub readmes); even if those documents would benefit from being updated, it’s not something that can realistically be done for every document.

d3vid · May 10, 2018, 2:34pm

To that end I’d suggest [removing these features] from CommonMark 1.0

CommonMark’s goal is to be highly compatible with existing implementations. Changing the emphasis syntax or removing raw HTML would significantly break many documents that cannot be automatically updated (for example, GitHub readmes); even if those documents would benefit from being updated, it’s not something that can realistically be done for every document.

That’s a great point. Maybe this is simply a semantic issue (for me). If the solution to this daydream is called “CommonMark 2.0” rather than “NewLanguage 1.0”, it implies:

breaking backwards compatibility
explicit support (or not) for specific versions
the need for an upgrade utility
that the new version is better

I speak as someone who is comfortable with the future being unevenly distributed ala Python 2 and Python 3, so ymmv.

networkimprov · June 16, 2018, 12:27am

@jgm have you considered deprecating certain CommonMark features which could conflict with planned or oft-requested features?

A deprecated feature would continue to work in a 1.0-compliant renderer, but might cease to work in any later version.

A minor complication of this approach is that renderers would need to take a CMVer parameter in order to support a document collection which uses more than one spec version.

JoeDuarte · June 25, 2018, 3:10am

I like your reform of emphasis. I would add that Creole is particularly impressive in using markup characters that hint at their function.

For example, italics, or what we’re calling simple emphasis, are marked with double slashes //. The rightward lean of the slashes makes it easy to remember how to do italics since the letters are rightward leaned.

And for a numbered list, Creole using the pound / hash / number symbol #. (Bulleted lists use asterisks.) I think Markdown just uses literal numerals for numbered lists, which makes sense, but maybe sometimes it’s easier to use the pound/number sign so that lists can be modified without having to renumber every item.

Changing gears a bit, I think all CMSes and markup languages that render HTML, such as Markdown, should emit minified HTML by default. In fact I think HTML should be minified by definition/specification – it was a big mistake for the HTML spec writers to not specify that production/consumed HTML had to be minified according to XYZ standards. The result is a lot of bloated HTML files and websites, waste of bandwidth, energy, and user time. It’s one thing to sling a ridiculously inefficient text format all over the net, but it’s even worse when there’s so much bloat added to it. I realize that image bloat is a bigger problem, but everything counts and HTML should be minified by default, at birth.

chrisalley · June 25, 2018, 3:33am

If the “Beyond Markdown” language does take off, at least the proposed emphasis rules would already be compatible with one popular system that uses a “Markdown-like” syntax.

Crissov · June 25, 2018, 9:10am

WhatsApp uses almost the same formatting conventions as Slack.

alehed · July 2, 2018, 8:41pm

I like this proposal. Markdown has some strengths without which it would not have gained its current popularity, but not everything about it is great.

The mentioned pain points fall into two categories: many ways to do the same thing (1, 3, 4) and missing power/generality (2, 5, 6).

Having more than one way to emphasize a word makes absolutely no sense. What does nested emphasis even mean?

The second point is that given markdowns limited power (basically everything that is discussed in extensions), it probably seemed like an easy fix to just allow in-lining arbitrary HTML. Today, Markdown has become so much more than just a fast way of writing HTML (for arbitrary HTML it is not even the best tool) so tying it to HTML does more harm than help. What Markdown needs is to natively support constructs that appear in written text (definition lists, tables, etc.) and then a way of annotating the document so that the document converter can do something smart with it. These annotations should reflect the semantic meaning of parts of the document or add some metadata to it, similar to LaTeX.

Getting 6. right could be tricky though: What if you want to give a few adjacent words or paragraphs a slightly different semantic meaning? Do you have to emphasize the words first and then apply the attribute? Do you have to add the attributes separately to all the paragraphs?

The main selling point about markdown is its beauty and its flat learning curve. None of that is taken away by simplifying it and adding some general way to extend it.

You can’t make an omelette without breaking some eggs.

JoeDuarte · July 30, 2018, 5:54am

I also think it’s worth looking at ArchieML, created by the New York Times IT group for NYT writers/reporters. It’s sort of like Markdown but with a big emphasis on data and types of data embedded in articles. It compiles to JSON. There might be some useful approaches there.

JCPayne · July 30, 2018, 6:03pm

Simpler is better for everyone. Common Mark should stand apart with no (or minimal) reliance on other other languages. IMO, backward compatibility is a goal not an absolute. Where backward compatibility is possible go for it but do not be bound by it. Very probable not all variants of Markdown can be built into Common Mark. Common Mark needs to “exceed” the other variants so they go away. Simplicity and unambiguous ways of writing will eventually prevail. Getting all to use Common Mark not likely.

I agree with @alehed that Common Mark should provide the ability to create the “normal” features of writing documents (tables; footnotes and so on).
Eliminate multiple ways of performing the same task. For example, no short reference links. There are probable others.
Emphasis: not sure how to solve bold and strong. Bold = " * “. Strong +” ** ". I agree with one character to identify letter format.
A truly radical proposal: use words, ie this becomes an attribute so there is no ambiguity (strong is strong; bold is bold). For clarity in human readability each attribute stands alone; cannot put multiple attributes in the same “holder”.
For attributes: {=…} @adiantwoods.
With a unambiguous statement of attributes HTML not needed. Not all know HTML or care to learn HTML.
A list should only be a list (no fancy complications).
All code that needs to “pass through” inside a code block.

Always open to comments and suggestions.

vas · August 10, 2018, 10:09am

Having to visually parse <strong>words that are not</strong> part of the sentence does <emphasis>not</emphasis> achieve “clarity in human readability”. See what <bold>I</bold> did there?

I think you are confusing *markup* readability with *content* readability, and Markdown is focused on the latter.

gour · October 3, 2018, 9:30am

I also wonder whether one can expect to get those “normal features” (in the form of extensions) of (more serious) documents like tables/footnotes/… in the Common Mark or is it better to focus on the systems supporting other markups (Markdown Extra, rst, Asciidoc,…)?

In other words, what is the practical meaning of Proposed Extensions, can one expect to see e.g. support for footnotes or it is more probable that we won’t see much of them in the final spec?