Beyond Markdown

That was in fact the aim here. I just thought about the features of Markdown (and commonmark) that have made the project of creating an unambiguous and not overly complex spec difficult, and proposed minimal changes to fix these. The result would be very much like commonmark and Markdown except in a few respects.

They’re established in Markdown. But before Markdown, of course, there were all sorts of conventions in plain text email, and one of the most common was to use different characters for *strong* and /regular/ emphasis. Many other light markup formats do the same. I think there’s a very good technical reason for avoiding the “doubled asterisk” syntax, as explained in my post. Anyway, it’s a suggestion I made from long experience fiddling with emphasis rules. It’s quite easy to think of rules that make sense for the cases you happen to have in mind—but then there are always other cases! I don’t think authors are going to want to give up the possibility of doing strong emphasis inside regular emphasis (or the reverse).

I think you’re aware of this already, but in case not: CommonMark already has both significant start numbers and letter-ordered lists.

With the addition of attributes, there’d be a cleaner solution to this problem. One could write [Book Title]{citation}, for example, and have the renderer turn this into something appropriate for the output format. I know some people think of Markdown as primarily a way to write HTML. That’s fine, but I prefer to think of light markup syntaxes as ways of writing documents that can be converted into many different formats. Given those interests, privileging HTML doesn’t make sense.

This is something to consider. Adding this attribute syntax (which I see now is very similar to markua’s, though my own inspiration was pandoc’s attribute syntax) would probably not change the interpretation of existing documents, since you’d be very unlikely to write one of these attribute specifiers for any other reason.

I’m trying to understand how the indentation idea would work. Presumably, the thought is that this would be a paragraph followed by a list:

foo
 - bar

But this would not be:

foo
- bar

Applying the principle of uniformity, then, here you’d have a sublist:

- foo
   - bar

But here you wouldn’t:

- foo
  - bar

Do I have it right? To me this seems a bit unnatural. It looks best to line up the markers with the content of the parent item in a sublist. Of course, I realize that on your proposal there’d still be the option of using a blank line, but I wonder whether it’s worth the extra complexity. After all, people will spontaneously create lists without the indentation; they’ll have to learn that this doesn’t work, and that they need to do something special. Why not just have them learn that they need a blank line?

2 Likes

Attribute support could be extremely useful for RDFa and/or JSONLD support. Currently, the only way to embed structured data in CommonMark is with HTML and RDFa (RDF-in-atttributes).

Schema.org/CreativeWork #examples for document-level metadata would be a great start. And then how to point to a Schema.org/Dataset?

Support for Linked Data from attributes should be a priority.

With indented code blocks gone, the rules get a lot simpler.

paragraph
- paragraph
paragraph
 - list
paragraph
    -    list
paragraph

- list
- list
 - nested list
- list
      -    nested list
- list

- list
- list

 - nested list?
1 Like

You had me at removing doubled character delimiters from emphasis :slight_smile:

  1. I really like the changes to simplify emphasis. I’m not entirely sure I like the choice of the ~ delimiter but I think the key thing you’ve identified is that intraword emphasis should have some kind of special marking so that we ensure that it is intended. I wonder if we could make this idea more a more general concept, a kind of “anti-escape” if you will.
    Delimiter characters that depend on context (like emphasis) could either be “active”, “passive”, or “inactive”. “Active” ones would always render (the effect of your fan~_tas_~tic), “passive” ones would render if context allows (i.e. the left/right flanking rules are satisfied), and “inactive” ones would just be literal characters. By default all delimiter characters are passive, unless marked as active, or inactive (escaped).
    The immediate problem I see with this is that context also defines whether a delimiter is an opener or a closer (since both use the same character), so this “active” marker would have to allow that to be specified (e.g. as you’ve done by placing the ~ on the left or right of the delimiter).

  2. I really like the recommendations for reference links.

  3. Indented code blocks aren’t very useful, and they do indeed complicate things. Getting rid of them is a good solution.

  4. :+1:

  5. I like the moving of HTML to a proper block and inline syntax. `'s are markdown’s way of allowing odd text to be passed through untouched – allowing custom treatment of the untouched text (to HTML, or other) is a very natural extension.

  6. I like the idea here, I think that a class should have a . prefix personally for uniformity with css, and there are probably some details to work out with 6b and 6c.

    Perhaps even some there would be a way of reconciling this {...} syntax with the {=...} proposed in 5. . I know these will be distinguishable cases as defined, but is there perhaps more that can be done? e.g. suppose that {=...} was allowed before any block and was a general “argument passing syntax”. We could perhaps use {=(delimiter: roman, start: 6)} before a list to dictate the list counter for example. Not that we have to do it exactly like this, but perhaps something to think about with regard to unifying custom data inputs for behaviours with blocks/inlines. This so that we are not just limited to doing this kind of thing in code blocks and code spans.

If interword emphasis needs special syntax, why not use doubled markers only there?

foo _emph_ baz
foo *strong* baz
foo_bar_baz
foo*bar*baz
foo _bar_baz
foo *bar*baz
foo_bar_ baz
foo*bar* baz
foo__emph__baz
foo**strong**baz
foo __emph__baz
foo **strong**baz
foo__emph__ baz
foo**strong** baz
1 Like

I’m having trouble seeing the use cases for emphasis within emphasis. The main example I found in the spec was for use in bibliographies, but is emphasis the right element to use for these, rather than text offset from normal prose? The original Markdown spec was released before HTML5 reclassified some of the older tags to differentiate emphasis and other alternative prose, but presumably a successor to Markdown would want to take the range of different text elements that are rendered alternatively into account. Previously I wrote that I thought the forward slash would be suitable for marking up alternative voice/mood.

The original Markdown syntax guide does not mention emphasis inside of emphasis being a requirement of Markdown. Perhaps we need to support this in CommonMark because Markdown implementations already support this but a successor can be freed from these constraints, particularly the behaviour of Markdown.pl.

Were letter-ordered lists added to CommonMark? I couldn’t find mention of them in the latest version of the spec.

The [Book Title]{citation} syntax you mentioned isn’t bad, the main concern I have is that it’s another syntax to learn for an author who already knows HTML. Since Markdown was originally designed as a light weight syntax for “issues that can be conveyed in plain text”, there was always a way that web authors could fall back to heavier features without learning lots of extra syntax. If the language aims to be more general, it is less targeted at that specific audience. This was, I believe, one of the motivations for encouraging different flavours of Markdown, rather than having one general syntax for everyone.

To give a software analogy, we have Apple and Microsoft who have vastly different strategies when it comes to user interfaces. Windows 10 has a very general interface that is designed for both touch screens and mouse/trackpad inputs. Apple on the other hand, has two very distinct user interfaces with iOS and macOS, the former which is consists of thicker icons, the latter featuring thiner UI elements that allow for very precise and subtle movements. Generally, the UI of Windows 10 attempts to reach some kind of middle ground which makes it arguably not the best UI for either input type, but with the benefit of being more universal and compatible.

So if the successor language is aiming to move away from being a superset of HTML to something more general, that might be less appealing to someone using it for the very specific purpose of web authoring. HTML first, everything else second, already works well for those users. If the goal of the successor format is indeed to become more general and universal, that’s something that should probably be explicit in the goals of the project so that people can decide if it’s the right language for them.

If the goals of the two projects are close enough, it might be worth making this an official successor to CommonMark, a “CommonMark Strict” or “CommonMark Lite”, with regular CommonMark acting as a transitional spec for users coming from the various loosely specified Markdown specifications. But if the goals are indeed fundamentally different (from Markdown), a new language would make more sense.

1 Like

My mistake. I thought they had been – I remember advocating for them in the early discussions we had – but I guess they weren’t.

1 Like

I think Markdown should stick to content including its semantic structure (This is a heading. This set of items belong to a list, an ordered as opposed to unordered one.) and not style (Center the H1 heading and underline it. Show order using letters).

This separation of concerns is properly followed between HTML and CSS. One sets a list up with letters in CSS, not in HTML via the CSS list-style-type Property, (which supports many options. Traditional Katakana iroha numbering anyone?). It’s also why in HTML its strong and emphasis not bold and italic.

2 Likes

Very interesting proposal! A few comments:

Emphasis

Couldn’t agree more. The current markdown rules are confusing to explain to new users as well.

Though before making a definite decision, I would love to know current emphasis usage statistics. Are there any usage numbers of markdown documents in the wild?

(Not so sure about fan~_tas_~tic though, but I guess why not?)

Indented code blocks and lists

To remain somewhat more backwards compatible to CommonMark, instead of getting rid of indented code blocks entirely, one could also simply disallow them within lists and blockquotes and such, but keep allowing them when not nested in another element (which should account for 99% of existing uses).

Raw HTML

{=html} is great, but what about using Markdown inside HTML? Like:

<aside>
  my _great_ text
</aside>

Maybe if we had a generic block container (like the ::: in pandoc), Markdown inside HTML wouldn’t really be needed anymore.

Attributes

As you probably know, I’m all in on attributes. (Attribute discussion on this forum.) The specifics for different elements are a bit trickier to figure out (e.g. paragraphs, lists, list items), but I can see how placing the attribute before block elements, as you propose, might help in parsing.

several may be used (and will then be combined):

Not sure about that, isn’t the following simpler?

{warning #mywarning}

Dreaming?

Interestingly, most of the proposed changes are things only markdown-power-users would notice anyway. For example, most people just fiddle with lists until it’s right in the preview.

However, Emphasis and Raw HTML are two things almost everyone who has come across markdown somewhere is familiar with and would probably find annoying if it doesn’t work the way he/she expects anymore. Not sure what that means though… maybe if it weren’t for those two changes it could even pass through as “CommonMark v2”?

Here you can just do something like

```{=html}
<aside>
```
my _great_ text
```{=html}
</aside>
```

I didn’t mean to exclude this. You can have several attributes in one attribute block. But, to limit the need for lookahead in parsing, it’s convenient to limit attributes to one line, so if you have a lot of attributes, it’s nice to be able to have several attribute blocks that will be combined.

I know you’re a fan of hard-wrapping, but personally, I would rather have one overlong line with a single attribute block (that’s why we have attributes: to put technical junk in there which isn’t part of the text but still necessary sometimes), rather than several attribute block that I have to remember are actually a single one (just doesn’t feel natural to me, although I can see that it would be better for the parser). But either way, it’s probably somewhat of a detail…

On the proposal, you could still just use one long line with all the attributes. I just want to leave the possibility of having several attribute blocks that are consolidated. That would be nice for people who like to hard-wrap to a certain width. Besides, the parser needs to do something if it encounters multiple attribute blocks in a row, and consolidating them this way seems the most natural choice.

2 Likes

I disagree this idea, because I believe that Markdown was loved by many people who totally tired from too complex formats - rst, RD, MediaWiki, and more. I believe that extending of the Markdown format itself must keep its backward compatibility, otherwise, people who want to extend Markdown without compatibility should move toward to any other known complex formats designed for the purpose - for example, raw HTML. It is well designed and structured.

Simple format, less expressive, but enough for most popular cases - I believed it was the design strategy of Markdown. As the rsult, today non-programmer people are also using Markdown - writers, designers. So I think it’s a bad idea that breaking compatibility for just few demands - it will lose Markdown’s value. How many people really want such a complex combination of multiple "*"s?

Thus, it is the time to graduate Markdown and migrate to any other known major formats, for people who suffered from Markdown’s less expressiveness. But please don’t get other people involved to complex syntax hell…

2 Likes

One of the nice features, from a writer’s perspective, is that Markdown doesn’t require special markers or delimiters to start special cases. The proposal here would add this requirement to start HTML blocks or add intraword emphasis. So I think I agree with @piro_or that adding more complex (even if only a little) syntax would take away some of what makes Markdown appealing. It would reduce complexity from the spec and implementations, but seemingly at the writer’s expense.

For complex scenarios, we have the fallback to HTML. If nested emphasis is rare, HTML could be used in just those scenarios, while keeping the normal Markdown syntax simple. If the parser knows about the particular HTML tags used, they could be stored in the AST and then converted into other formats. We also have the concept of extensions, such as what GitHub have done, to use as shorter syntax in more widespread cases, such adding attributes. With these two solutions in mind (HTML fallback and extensions), other than making life easier for spec and parser writers, do we really benefit from a new set of incompatible rules?

1 Like

I think the proposal offers quite a bit to writers:

  • the ability to pass through any format, not just HTML
  • the ability to put attributes on any block element
  • a simpler mental model for emphasis and lists, with fewer rules to remember
  • a uniform and predictable extension mechanism, instead of ad hoc syntactic extensions

These things may not be important to you, but there are writers for whom they are important.

3 Likes

My use cases for a light weight markup language are primarily web based systems, particularly those where it’s important for the writer to productive quickly without learning new syntax rules or referencing a manual. I wonder if our use cases are different enough that a new language in the Markdown family of languages (like what you’ve proposed) would make sense for only some types of writers, while others would be better off staying with the status quo?

Could you add support for other formats to CommonMark as an extension (with the ````{=latex}` syntax), while keeping raw HTML working by default (as it does now)? Same with attributes as an extension. If so, these two points wouldn’t justify a new language on their own (although I can understand your desire to stay neutral with non-Markdown syntax as well).

On the one hand, I agree that the proposed rules for emphasis are simpler. On the other hand, with the current emphasis rules, the writer can muddle through without looking up the special intra-word emphasis syntax in a manual, by testing the output in a preview window (like the one this forum uses). I suppose the editor could have a button that adds intra-word emphasis, although the writer may not think to use it if they believe they already know how to write emphasised text.

Can you talk a bit more about how this would work in the new language? For example, to add tables, description lists, strikethrough and other common proposed extensions, how would adding these to the new language differ to adding them to CommonMark?

By the way, I don’t want to come across as negative towards the idea of a new language. These are the kinds of objections that others who primarily use web-based systems will likely raise, especially if we’re asking people to change their entrenched habits. Personally I wouldn’t mind writing in a language that’s a bit stricter and more explicit if it makes things simpler overall.

2 Likes

This may well be the case.

Yes, but this still leaves you with an ugly and complicated set of rules for determining what belongs to an HTML block.

I was just referring to what I said above:

However, this approach is too unwieldy to use for definition lists or strikethrough, and definitely not enough for tables, so some extensions would still be needed.

I think this another case where users of web systems, wanting to get some raw HTML working quickly, would find CommonMark simpler, while in more advanced cases explicitly declaring the section of text as HTML would make inserting it simpler. This, and the similar points I brought up earlier, are leading me to believe that a new language project might be worth undertaking as an alternative, rather than as a replacement, to Markdown.

This.

Markdown took the markup world by storm for a reason. It needs to stay true to that reason. It should not and cannot be all things to all people. Other markup languages exist or can be created for people who have different reasons.

Has there been a discussion here or elsewhere that gets to Markdown’s reason?

I believe the reason is as is stated in the CommonMark intro:

What distinguishes Markdown from many other lightweight markup syntaxes, which are often easier to write, is its readability. As Gruber writes:

The overriding design goal for Markdown’s formatting syntax is to make it as readable as possible. The idea is that a Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions. (Daring Fireball: Markdown)

The Prime Directive, so to speak. I would put portability second (the reason for CommonMark’s clarification of Markdown, right?), and then greater expressibility (e.g. supporting tables) third. Greater expressibility must bow to The Prime Directive; we can’t just add stuff willy-nilly. If adding stuff tends to break the Prime Directive, then we have to be very judicious – e.g. don’t add things needed by a a few at the expense of the many (ok, that second Star Trek reference wasn’t intentional, but I like it). If someone comes up with some amazing syntax that supports semantic “divs” and the like while staying true to the Prime Directive, sweet.

[And yeah, I know that some of the greatest Star Trek episodes are where the Prime Directive is broken. But it wasn’t willy nilly. And it certainly wasn’t for self-serving reasons.]

2 Likes

I agree completely about the Prime Directive; that’s why I put it right at the beginning of the Commonmark spec. But you can’t justify “include HTML anywhere you like without special marking” by appealing to the Prime Directive. Once you include raw HTML, you’ve already gotten away form “publishable as-is, as plain text, without looking like it’s been marked up…”

If we need to choose between

(1) what the original Markdown allows:

<div class="warning">
<p>Don't try this at home.</p>
</div>

(2) what Commonmark allows:

<div class="warning">

Don't try this at home.

</div>

(3), which you can do in pandoc:

::::::::: warning ::::::::::::
Don't try this at home.
::::::::::::::::::::::::::::::

and (4), which the proposal above would allow:

{warning}
Don't try this at home.

then I think it is (3) and (4) that best respect the Prime Directive.

7 Likes