Beyond Markdown

You had me at removing doubled character delimiters from emphasis :slight_smile:

  1. I really like the changes to simplify emphasis. I’m not entirely sure I like the choice of the ~ delimiter, but I think the key thing you’ve identified is that intraword emphasis should have some kind of special marking so that we can be sure it is intended. I wonder if we could turn this idea into a more general concept, a kind of “anti-escape” if you will.
    Delimiter characters that depend on context (like emphasis) could either be “active”, “passive”, or “inactive”. “Active” ones would always render (the effect of your fan~_tas_~tic), “passive” ones would render if context allows (i.e. the left/right flanking rules are satisfied), and “inactive” ones would just be literal characters. By default all delimiter characters are passive, unless marked as active, or inactive (escaped).
    The immediate problem I see with this is that context also defines whether a delimiter is an opener or a closer (since both use the same character), so this “active” marker would have to allow that to be specified (e.g. as you’ve done by placing the ~ on the left or right of the delimiter).

  2. I really like the recommendations for reference links.

  3. Indented code blocks aren’t very useful, and they do indeed complicate things. Getting rid of them is a good solution.

  4. :+1:

  5. I like the moving of HTML to a proper block and inline syntax. `'s are markdown’s way of allowing odd text to be passed through untouched – allowing custom treatment of the untouched text (to HTML, or other) is a very natural extension.

  6. I like the idea here. Personally, I think a class should have a . prefix for uniformity with CSS, and there are probably some details to work out with 6b and 6c.

    Perhaps there would even be a way of reconciling this {...} syntax with the {=...} proposed in 5. I know these will be distinguishable cases as defined, but is there perhaps more that can be done? E.g. suppose that {=...} was allowed before any block and was a general “argument passing syntax”. We could perhaps use {=(delimiter: roman, start: 6)} before a list to dictate the list counter, for example. Not that we have to do it exactly like this, but it is perhaps something to think about with regard to unifying custom data inputs for behaviours with blocks/inlines, so that we are not limited to doing this kind of thing in code blocks and code spans.
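
A rough Python sketch of the active/passive/inactive idea from point 1 might look like the following. Everything in it is an assumption made for illustration: the names `State` and `classify` are invented, only `_` is handled, and the flanking test is far cruder than CommonMark's real left/right-flanking rules. The `~` placement follows the fan~_tas_~tic example: a `~` before the delimiter forces an opener, a `~` after it forces a closer.

```python
from enum import Enum
from typing import Optional, Tuple


class State(Enum):
    ACTIVE = "active"      # always renders, marked with an adjacent ~
    PASSIVE = "passive"    # renders only if the context allows it
    INACTIVE = "inactive"  # escaped, always a literal character


def classify(text: str, i: int) -> Tuple[State, Optional[str]]:
    """Classify the '_' at index i.

    Returns the state plus 'open', 'close', or None when the delimiter
    cannot act as either (hypothetical, simplified rules).
    """
    if text[i] != "_":
        raise ValueError("index does not point at a '_' delimiter")
    prev = text[i - 1] if i > 0 else ""
    nxt = text[i + 1] if i + 1 < len(text) else ""

    if prev == "\\":
        return State.INACTIVE, None
    if prev == "~":          # ~_ forces an opener
        return State.ACTIVE, "open"
    if nxt == "~":           # _~ forces a closer
        return State.ACTIVE, "close"

    # Passive: a crude stand-in for the flanking rules.
    space_before = prev == "" or prev.isspace()
    space_after = nxt == "" or nxt.isspace()
    if space_before and not space_after:
        return State.PASSIVE, "open"
    if space_after and not space_before:
        return State.PASSIVE, "close"
    return State.PASSIVE, None
```

With this, both delimiters in `snake_case`-style text come out as passive with no role, so they would stay literal, while the marked delimiters in fan~_tas_~tic are active regardless of their surroundings.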

If intraword emphasis needs special syntax, why not use doubled markers only there?

foo _emph_ baz
foo *strong* baz
foo_bar_baz
foo*bar*baz
foo _bar_baz
foo *bar*baz
foo_bar_ baz
foo*bar* baz
foo__emph__baz
foo**strong**baz
foo __emph__baz
foo **strong**baz
foo__emph__ baz
foo**strong** baz
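
To make the suggestion concrete, here is a minimal Python sketch of one possible reading of it. The assumptions are mine, not the poster's: `_` maps to `em` and `*` to `strong` (as the labels in the examples suggest), doubled markers work anywhere, single markers work only when both opener and closer sit at a word boundary, and mixed cases such as `foo _bar_baz` are left as literal text. The function name `render` and the HTML-like output are purely illustrative.

```python
import re

TAGS = {"_": "em", "*": "strong"}

# Doubled markers may open and close anywhere, even inside a word.
DOUBLED = re.compile(r"(__|\*\*)(?=\S)(.+?)(?<=\S)\1")

# Single markers only count at word boundaries: the opener must not follow
# a word character or marker, and the closer must not precede one.
SINGLE = re.compile(r"(?<![\w*])([_*])(?=\S)(.+?)(?<=\S)\1(?![\w*])")


def _wrap(match: "re.Match") -> str:
    """Turn a matched delimiter run into an HTML-like element."""
    tag = TAGS[match.group(1)[0]]
    return f"<{tag}>{match.group(2)}</{tag}>"


def render(text: str) -> str:
    """Apply the doubled (intraword) rule first, then the single rule."""
    text = DOUBLED.sub(_wrap, text)
    return SINGLE.sub(_wrap, text)


if __name__ == "__main__":
    for sample in ["foo _emph_ baz", "foo_bar_baz", "foo__emph__baz"]:
        print(sample, "->", render(sample))
```

Under these assumptions a stray single marker inside a word never triggers emphasis, which is the property the examples above seem to be probing.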

I’m having trouble seeing the use cases for emphasis within emphasis. The main example I found in the spec was for use in bibliographies, but is emphasis the right element to use for these, rather than text offset from normal prose? The original Markdown spec was released before HTML5 reclassified some of the older tags to differentiate emphasis and other alternative prose, but presumably a successor to Markdown would want to take the range of different text elements that are rendered alternatively into account. Previously I wrote that I thought the forward slash would be suitable for marking up alternative voice/mood.

The original Markdown syntax guide does not mention emphasis inside of emphasis being a requirement of Markdown. Perhaps we need to support this in CommonMark because Markdown implementations already support it, but a successor can be freed from these constraints, particularly the behaviour of Markdown.pl.

Were letter-ordered lists added to CommonMark? I couldn’t find mention of them in the latest version of the spec.

The [Book Title]{citation} syntax you mentioned isn’t bad; the main concern I have is that it’s another syntax to learn for an author who already knows HTML. Since Markdown was originally designed as a lightweight syntax for “issues that can be conveyed in plain text”, there was always a way that web authors could fall back to heavier features without learning lots of extra syntax. If the language aims to be more general, it is less targeted at that specific audience. This was, I believe, one of the motivations for encouraging different flavours of Markdown, rather than having one general syntax for everyone.

To give a software analogy, we have Apple and Microsoft, who have vastly different strategies when it comes to user interfaces. Windows 10 has a very general interface that is designed for both touch screens and mouse/trackpad inputs. Apple, on the other hand, has two very distinct user interfaces in iOS and macOS: the former consists of thicker icons, while the latter features thinner UI elements that allow for very precise and subtle movements. Generally, the UI of Windows 10 attempts to reach some kind of middle ground, which arguably makes it not the best UI for either input type, but with the benefit of being more universal and compatible.

So if the successor language is aiming to move away from being a superset of HTML to something more general, that might be less appealing to someone using it for the very specific purpose of web authoring. HTML first, everything else second, already works well for those users. If the goal of the successor format is indeed to become more general and universal, that’s something that should probably be explicit in the goals of the project so that people can decide if it’s the right language for them.

If the goals of the two projects are close enough, it might be worth making this an official successor to CommonMark, a “CommonMark Strict” or “CommonMark Lite”, with regular CommonMark acting as a transitional spec for users coming from the various loosely specified Markdown specifications. But if the goals are indeed fundamentally different (from Markdown), a new language would make more sense.

My mistake. I thought they had been – I remember advocating for them in the early discussions we had – but I guess they weren’t.

I think Markdown should stick to content including its semantic structure (This is a heading. This set of items belong to a list, an ordered as opposed to unordered one.) and not style (Center the H1 heading and underline it. Show order using letters).

This separation of concerns is properly followed between HTML and CSS. One sets up a list with letters in CSS, via the list-style-type property (which supports many options; traditional Katakana iroha numbering, anyone?), not in HTML. It’s also why HTML has strong and emphasis rather than bold and italic.

Very interesting proposal! A few comments:

Emphasis

Couldn’t agree more. The current markdown rules are confusing to explain to new users as well.

Though before making a definite decision, I would love to know current emphasis usage statistics. Are there any usage numbers of markdown documents in the wild?

(Not so sure about fan~_tas_~tic though, but I guess why not?)

Indented code blocks and lists

To remain somewhat more backwards compatible with CommonMark, instead of getting rid of indented code blocks entirely, one could also simply disallow them within lists, blockquotes and the like, but keep allowing them when not nested in another element (which should account for 99% of existing uses).

Raw HTML

{=html} is great, but what about using Markdown inside HTML? Like:

<aside>
  my _great_ text
</aside>

Maybe if we had a generic block container (like the ::: in pandoc), Markdown inside HTML wouldn’t really be needed anymore.

Attributes

As you probably know, I’m all in on attributes. (Attribute discussion on this forum.) The specifics for different elements are a bit trickier to figure out (e.g. paragraphs, lists, list items), but I can see how placing the attribute before block elements, as you propose, might help in parsing.

several may be used (and will then be combined):

Not sure about that, isn’t the following simpler?

{warning #mywarning}

Dreaming?

Interestingly, most of the proposed changes are things only markdown-power-users would notice anyway. For example, most people just fiddle with lists until it’s right in the preview.

However, Emphasis and Raw HTML are two things almost everyone who has come across markdown somewhere is familiar with and would probably find annoying if it doesn’t work the way he/she expects anymore. Not sure what that means though… maybe if it weren’t for those two changes it could even pass through as “CommonMark v2”?

Here you can just do something like

```{=html}
<aside>
```
my _great_ text
```{=html}
</aside>
```
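
As a small illustration of why this form is easy to process, here is a hedged Python sketch of a line scanner that separates raw blocks from ordinary markup. It is only a sketch under my own assumptions: `split_raw_blocks` is a made-up name, fences are exactly three backticks, and nesting or longer fences are ignored. What a renderer then does with each raw line (for instance, pass it through only when the format matches the output format) is left open.

```python
import re
from typing import Iterator, List, Optional, Tuple

FENCE = "`" * 3  # three backticks, built this way to keep the example readable

# An opening fence with a raw-format attribute, e.g. a fence followed by {=html}.
RAW_OPEN = re.compile(r"^" + FENCE + r"\{=(\w+)\}\s*$")
FENCE_CLOSE = re.compile(r"^" + FENCE + r"\s*$")


def split_raw_blocks(lines: List[str]) -> Iterator[Tuple[Optional[str], str]]:
    """Yield (format, line) pairs; format is None for ordinary markup lines.

    The fence lines themselves are consumed and not yielded.
    """
    current: Optional[str] = None  # format of the raw block we are inside
    for line in lines:
        if current is None:
            opened = RAW_OPEN.match(line)
            if opened:
                current = opened.group(1)
            else:
                yield (None, line)
        elif FENCE_CLOSE.match(line):
            current = None
        else:
            yield (current, line)


if __name__ == "__main__":
    doc = [
        FENCE + "{=html}", "<aside>", FENCE,
        "my _great_ text",
        FENCE + "{=html}", "</aside>", FENCE,
    ]
    for fmt, line in split_raw_blocks(doc):
        print(fmt, repr(line))
```

The point of the sketch is that no HTML knowledge is needed to find where the raw content starts and ends; the explicit fence does all the work.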

I didn’t mean to exclude this. You can have several attributes in one attribute block. But, to limit the need for lookahead in parsing, it’s convenient to limit attributes to one line, so if you have a lot of attributes, it’s nice to be able to have several attribute blocks that will be combined.

I know you’re a fan of hard-wrapping, but personally, I would rather have one overlong line with a single attribute block (that’s why we have attributes: to put technical junk in there which isn’t part of the text but still necessary sometimes), rather than several attribute blocks that I have to remember are actually a single one (just doesn’t feel natural to me, although I can see that it would be better for the parser). But either way, it’s probably somewhat of a detail…

Under the proposal, you could still just use one long line with all the attributes. I just want to leave the possibility of having several attribute blocks that are consolidated. That would be nice for people who like to hard-wrap to a certain width. Besides, the parser needs to do something if it encounters multiple attribute blocks in a row, and consolidating them this way seems the most natural choice.
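
For what it's worth, consolidating consecutive one-line attribute blocks could be sketched in Python roughly as below. The details are assumptions on my part, since the thread does not pin them down: a bare word (or `.word`) is treated as a class, `#word` as the identifier, `key=value` as a pair, classes accumulate, and a later identifier or value overrides an earlier one. Quoted values containing spaces are not handled, and all function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# A whole line consisting of one {...} block, with no nested braces.
ATTR_LINE = re.compile(r"^\{([^{}]*)\}$")


def parse_attr_block(line: str) -> Optional[dict]:
    """Parse one attribute line; return None if the line is not one."""
    match = ATTR_LINE.match(line.strip())
    if match is None:
        return None
    attrs = {"id": None, "classes": [], "pairs": {}}
    for token in match.group(1).split():
        if token.startswith("#"):
            attrs["id"] = token[1:]
        elif "=" in token:
            key, value = token.split("=", 1)
            attrs["pairs"][key] = value.strip('"')
        else:
            attrs["classes"].append(token.lstrip("."))
    return attrs


def merge(blocks: List[dict]) -> dict:
    """Combine blocks in order (assumed policy: later values win)."""
    merged = {"id": None, "classes": [], "pairs": {}}
    for block in blocks:
        if block["id"] is not None:
            merged["id"] = block["id"]
        for cls in block["classes"]:
            if cls not in merged["classes"]:
                merged["classes"].append(cls)
        merged["pairs"].update(block["pairs"])
    return merged


def leading_attributes(lines: List[str]) -> Tuple[dict, List[str]]:
    """Consume attribute lines at the start and return (attributes, rest)."""
    blocks = []
    for line in lines:
        attrs = parse_attr_block(line)
        if attrs is None:
            break
        blocks.append(attrs)
    return merge(blocks), lines[len(blocks):]
```

Because each block is confined to one line, the scanner never has to look further ahead than the current line, and `{warning}` followed by `{#mywarning}` ends up equivalent to a single `{warning #mywarning}`.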

I disagree with this idea, because I believe Markdown is loved by many people who were thoroughly tired of overly complex formats: rst, RD, MediaWiki, and more. I believe that any extension of the Markdown format must keep backward compatibility; people who want to extend Markdown without compatibility should instead move to one of the existing complex formats designed for the purpose, for example raw HTML. It is well designed and structured.

Simple format, less expressive, but enough for the most common cases: I believe that was the design strategy of Markdown. As a result, today non-programmers such as writers and designers are also using Markdown. So I think it’s a bad idea to break compatibility for just a few demands; it would destroy Markdown’s value. How many people really want such a complex combination of multiple "*"s?

Thus, for people who suffer from Markdown’s limited expressiveness, it is time to graduate from Markdown and migrate to one of the other major formats. But please don’t drag other people into a complex syntax hell…

One of the nice features, from a writer’s perspective, is that Markdown doesn’t require special markers or delimiters to start special cases. The proposal here would add this requirement to start HTML blocks or add intraword emphasis. So I think I agree with @piro_or that adding more complex (even if only a little) syntax would take away some of what makes Markdown appealing. It would remove complexity from the spec and implementations, but seemingly at the writer’s expense.

For complex scenarios, we have the fallback to HTML. If nested emphasis is rare, HTML could be used in just those scenarios, while keeping the normal Markdown syntax simple. If the parser knows about the particular HTML tags used, they could be stored in the AST and then converted into other formats. We also have the concept of extensions, such as what GitHub has done, to provide shorter syntax for more widespread cases, such as adding attributes. With these two solutions in mind (HTML fallback and extensions), other than making life easier for spec and parser writers, do we really benefit from a new set of incompatible rules?

I think the proposal offers quite a bit to writers:

  • the ability to pass through any format, not just HTML
  • the ability to put attributes on any block element
  • a simpler mental model for emphasis and lists, with fewer rules to remember
  • a uniform and predictable extension mechanism, instead of ad hoc syntactic extensions

These things may not be important to you, but there are writers for whom they are important.

My use cases for a lightweight markup language are primarily web-based systems, particularly those where it’s important for the writer to be productive quickly without learning new syntax rules or referencing a manual. I wonder if our use cases are different enough that a new language in the Markdown family of languages (like what you’ve proposed) would make sense for only some types of writers, while others would be better off staying with the status quo?

Could you add support for other formats to CommonMark as an extension (with the `` ```{=latex} `` syntax), while keeping raw HTML working by default (as it does now)? Same with attributes as an extension. If so, these two points wouldn’t justify a new language on their own (although I can understand your desire to stay neutral with non-Markdown syntax as well).

On the one hand, I agree that the proposed rules for emphasis are simpler. On the other hand, with the current emphasis rules, the writer can muddle through without looking up the special intra-word emphasis syntax in a manual, by testing the output in a preview window (like the one this forum uses). I suppose the editor could have a button that adds intra-word emphasis, although the writer may not think to use it if they believe they already know how to write emphasised text.

Can you talk a bit more about how this would work in the new language? For example, to add tables, description lists, strikethrough and other commonly proposed extensions, how would adding these to the new language differ from adding them to CommonMark?

By the way, I don’t want to come across as negative towards the idea of a new language. These are the kinds of objections that others who primarily use web-based systems will likely raise, especially if we’re asking people to change their entrenched habits. Personally I wouldn’t mind writing in a language that’s a bit stricter and more explicit if it makes things simpler overall.

This may well be the case.

Yes, but this still leaves you with an ugly and complicated set of rules for determining what belongs to an HTML block.

I was just referring to what I said above:

However, this approach is too unwieldy to use for definition lists or strikethrough, and definitely not enough for tables, so some extensions would still be needed.

I think this is another case where users of web systems, wanting to get some raw HTML working quickly, would find CommonMark simpler, while in more advanced cases explicitly declaring the section of text as HTML would make inserting it simpler. This, and the similar points I brought up earlier, are leading me to believe that a new language project might be worth undertaking as an alternative, rather than as a replacement, to Markdown.

This.

Markdown took the markup world by storm for a reason. It needs to stay true to that reason. It should not and cannot be all things to all people. Other markup languages exist or can be created for people who have different reasons.

Has there been a discussion here or elsewhere that gets to Markdown’s reason?

I believe the reason is as is stated in the CommonMark intro:

What distinguishes Markdown from many other lightweight markup syntaxes, which are often easier to write, is its readability. As Gruber writes:

The overriding design goal for Markdown’s formatting syntax is to make it as readable as possible. The idea is that a Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions. (Daring Fireball: Markdown)

The Prime Directive, so to speak. I would put portability second (the reason for CommonMark’s clarification of Markdown, right?), and then greater expressibility (e.g. supporting tables) third. Greater expressibility must bow to the Prime Directive; we can’t just add stuff willy-nilly. If adding stuff tends to break the Prime Directive, then we have to be very judicious – e.g. don’t add things needed by a few at the expense of the many (ok, that second Star Trek reference wasn’t intentional, but I like it). If someone comes up with some amazing syntax that supports semantic “divs” and the like while staying true to the Prime Directive, sweet.

[And yeah, I know that some of the greatest Star Trek episodes are where the Prime Directive is broken. But it wasn’t willy nilly. And it certainly wasn’t for self-serving reasons.]

I agree completely about the Prime Directive; that’s why I put it right at the beginning of the CommonMark spec. But you can’t justify “include HTML anywhere you like without special marking” by appealing to the Prime Directive. Once you include raw HTML, you’ve already gotten away from “publishable as-is, as plain text, without looking like it’s been marked up…”

If we need to choose between

(1) what the original Markdown allows:

<div class="warning">
<p>Don't try this at home.</p>
</div>

(2) what CommonMark allows:

<div class="warning">

Don't try this at home.

</div>

(3), which you can do in pandoc:

::::::::: warning ::::::::::::
Don't try this at home.
::::::::::::::::::::::::::::::

and (4), which the proposal above would allow:

{warning}
Don't try this at home.

then I think it is (3) and (4) that best respect the Prime Directive.

As a web-dev I understand you guys not liking the thought of not having the power of HTML quickly available. But maybe it would be more productive to talk about the use-cases for raw HTML in markdown documents. Maybe there is a better, more readable way?

Would adding a nice syntax for block containers (divs/aside/etc) and inline-spans not solve 99% of those use-cases?
(See the pandoc manual for examples.)

Many of these solutions suggest throwing out some syntax to make parsing easier. The result (from these suggestions) is a subset of Markdown/CommonMark that is easy to write, read and parse.

This seems like an excellent approach. It suggests to me that bad syntax will simply render as regular text, rather than weird markup, which is an easier-to-swallow surprise. In addition, it seems likely this will make it easier to learn from the correction (“Oh, I don’t need to indent code.”)

To that end I’d suggest implementing these particular solutions in CommonMark 1.0. It would mean that some Markdown files that-render-in-some-particular-parser would need to be updated, but imho these are the exact documents that would benefit from ambiguity fixes.

Could a similar approach be taken to remove other ambiguous cases, and instead require writers to write unambiguously? For example, how many of the 17 emphasis rules could simply be declared “ambiguous” and rendered as regular text? The author would then need to find a simpler way of marking up their text, benefiting plaintext readers and parsers alike.

I believe John Gruber was referring to the Markdown syntax only here, rather than Markdown + HTML in the document. Further in the syntax guide he writes (emphasis mine):

Markdown is not a replacement for HTML, or even close to it. Its syntax is very small, corresponding only to a very small subset of HTML tags. The idea is not to create a syntax that makes it easier to insert HTML tags. In my opinion, HTML tags are already easy to insert. The idea for Markdown is to make it easy to read, write, and edit prose. HTML is a publishing format; Markdown is a writing format. Thus, Markdown’s formatting syntax only addresses issues that can be conveyed in plain text.

So, a “Markdown-formatted document” mentioned in the prime directive is referring to just the Markdown syntax in the document. HTML is allowed because Markdown syntax is not intended to solve every problem.

I think this would remove the requirement for raw HTML in a lot of cases (even if it is still desired for the reasons I mentioned in earlier posts). We would need many of the proposed extensions to always be available, such as definition lists, tables, and embedded video. A significant challenge would be coming up with a short syntax for all the different types of phrasing content, while still making the syntax both concise and understandable in plain text.

CommonMark’s goal is to be highly compatible with existing implementations. Changing the emphasis syntax or removing raw HTML would significantly break many documents that cannot be automatically updated (for example, GitHub readmes); even if those documents would benefit from being updated, it’s not something that can realistically be done for every document.
