Strict Markdown subset

Hi all – Do you think it would be helpful to define a strict, unambiguous subset of Markdown/CommonMark? What I mean is that Markdown by default allows for several different ways to do lots of things, and the CommonMark spec allows for all sorts of extreme artifacts and syntax.

For example, a strict subset would specify one way to do emphasis (e.g. asterisks only), one way to do headings (e.g. octothorpes, called ATX I think), one way to do unordered lists (e.g. hyphens), and so forth. It could also strictly limit the length of delimiter runs and lock down other syntactic artifacts.
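
To make the idea concrete, here is a rough sketch of a checker that flags constructs a strict subset might disallow. The rule set (ATX headings only, hyphen bullets only, asterisk emphasis only, `---` as the only thematic break) and the name `check_strict` are assumptions for illustration, not part of any spec. The checks are line-based heuristics that ignore code blocks, inline code, and nesting.

```python
import re

# All rules below are hypothetical choices for an illustrative strict subset.
SETEXT_UNDERLINE = re.compile(r"^ {0,3}(=+|-+) *$")
THEMATIC_BREAK = re.compile(r"^ {0,3}([*_-])(?: *\1){2,} *$")
STAR_OR_PLUS_BULLET = re.compile(r"^ {0,3}[*+] +\S")
UNDERSCORE_EMPHASIS = re.compile(r"(?<!\w)_{1,2}[^\s_][^_]*_{1,2}(?!\w)")


def check_strict(text):
    """Return (line_number, message) pairs for constructs outside the subset."""
    problems = []
    previous = ""
    for number, line in enumerate(text.splitlines(), start=1):
        if SETEXT_UNDERLINE.match(line) and previous.strip():
            # An underline directly below a text line reads as a setext heading.
            problems.append((number, "setext heading: use an ATX heading (#)"))
        else:
            rule = THEMATIC_BREAK.match(line)
            if rule:
                # Only the hyphen form of the thematic break is accepted here.
                if rule.group(1) != "-":
                    problems.append((number, "thematic break: use ---"))
            elif STAR_OR_PLUS_BULLET.match(line):
                problems.append((number, "bullet marker: use a hyphen"))
        if UNDERSCORE_EMPHASIS.search(line):
            problems.append((number, "underscore emphasis: use asterisks"))
        previous = line
    return problems
```

Calling `check_strict` on a document returns an empty list when nothing outside the assumed subset is found, so it could double as a crude conformance test.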

I wonder if having a much simpler strict subset would help clarify the full CommonMark spec, or make it easier to write. Certainly writing the spec for the strict subset would be easier than the CommonMark spec. And writing parsers would be easier.

Is there some other way such a project or subproject would help CommonMark? I don’t have any strong opinions here. It’s fine if it’s orthogonal and needs to be its own project.

This might seem similar to a Markdown linter that forces one way of doing each thing, but a linter isn’t a spec, and probably wouldn’t spawn lean standalone parsers.

Those of you who have written parsers – how much of an ease-of-parsing benefit do you think could be realized?

(A good name would be Stark, abbreviating STrict mARKdown, and a nod to the true King in the North. :grinning:)

I think it might be useful as a recommendation for automatic Markdown writers or converters from other formats: their output would then look very similar from tool to tool, which would likely be a good thing.
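
As a rough illustration of what such a recommendation could mean for a writer, here is a minimal sketch that emits one canonical form per construct. The block model (dicts with a `type` key) and the function names are hypothetical, and the canonical choices (ATX headings, hyphen bullets, `---` breaks) are assumptions, not anything CommonMark prescribes.

```python
def render_block(block):
    """Render one block of a hypothetical document model in canonical form."""
    kind = block["type"]
    if kind == "heading":
        # Always ATX, never setext, so every level looks the same.
        return "#" * block["level"] + " " + block["text"]
    if kind == "bullet_list":
        # Always hyphen markers.
        return "\n".join("- " + item for item in block["items"])
    if kind == "thematic_break":
        return "---"
    if kind == "paragraph":
        return block["text"]
    raise ValueError("unsupported block type: " + kind)


def render(blocks):
    """Join blocks with exactly one blank line between them."""
    return "\n\n".join(render_block(block) for block in blocks) + "\n"
```

Two converters built this way would produce byte-identical output for the same document model, which is the kind of convergence a recommendation could encourage without touching parsers.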

Enforced by parsers? IMHO not so much, if at all. Many apps would likely have to keep some backward compatibility with older documents anyway (and why keep two such similar parsers around?).

Also, I very much doubt that a wide consensus could be reached on which of the duplicates to keep and which to remove. Many authors have already chosen whether they prefer, e.g., setext headings over ATX headings or vice versa. If you remove either of the two, the affected people quite likely won’t migrate to such a strict parser at all, so it could actually add to the babel instead of solving anything.

I believe Markdown’s spirit and what makes it successful is the degree to which it is designed for humans as opposed to machines, that is to say the degree to which it is like a natural language and not like a programming language. Markdown is about getting the machine to parse what humans can read without a spec and write with little explanation. If it weren’t for machines we’d not need a spec at all. Humans can read each other’s ad hoc or idiosyncratic plain text styles effortlessly. If we had A.I. today, Markdown would be dead.

In some cases Markdown’s support for multiple styles is all about the above. For example, it recognizes many ways to delimit lists and many styles of thematic break. In other cases Markdown is a compromise: Setext is what an untrained human would write, and is the most readable heading style, but it is limited to two levels. If you took away Setext, Markdown would suddenly stop recognizing the most natural way humans do headings. Most good writing, for humans at least, never uses more than two levels of heading, so the limit is not a limit or is a beneficial limit. Not coincidentally and not without irony, the types of docs that do use 3 or more levels are specifications and legal docs.

Markdown’s complexity both in spec and parsing has little to do with supporting a variety of styles. If it were complexity in service of what I describe above, then so be it. That’s what machines are for. But nearly all of Markdown’s complexity stems from its support for lazy continuation and sloppy structure. The logic in support of these two metastasized throughout the spec in ways you wouldn’t realize until you try to write a parser and a spec. These are the sources of the “extreme artifacts and syntax” to which you allude, not ATX vs Setext. I think both were misguided attempts at being more human. They ended up the opposite. They are in my opinion Markdown’s biggest mistakes.

Just to give one example (I have many more), specifying the following behavior does no service to humans. Quite the opposite. See for yourself. Then write a spec to get all of these different interpretations of laziness and sloppiness in line!

>> everything below stems from the desire
> > to support *this* sloppiness
and *this* laziness as part of a single
block quote.


> at level 1.
>> at level 2.
  >
  > at level 1.
>   > at level 2.
  >   > continuing level 2.
lazily continuing level 2.
> still at level 2.
still at level 2.
>
> at level 1.
lazily at level 1.
>
>>>>> at level 5.
>>> lazily at level 5.
>
not lazy, at root level.

If we were to define a “strict Markdown”, lazy continuation and sloppy structure should be first on the chopping block.

What are @jgm’s thoughts on this?

It’s true that disallowing lazy continuations would simplify creation of parsers. But we have parsers now that handle these things efficiently, and a spec that defines behavior even for crazy things like the above example. (I guarantee that if you eliminated laziness you’d have howls of protest.) Having just one bullet list marker or thematic break style would not simplify the spec or parsers significantly.

I have already put down my thoughts about how some tweaks to Markdown syntax would create a more rational language and simplify the spec and parsers:

and with slight modifications at

https://johnmacfarlane.net/beyond-markdown.html

@vas Thanks for the examples. What is the precise definition of lazy continuation?

The CommonMark specification allows a list item or block quote to continue on the next line even when the author was too lazy to add the `>` (for a block quote) or to indent the line properly (in both block quotes and list items). This only applies when the line continues a paragraph.

For example:

> These two lines together form one
paragraph in a block quote even though there is no `>` at the 2nd one.

* Ditto for a long list item which
can also be broken into multiple lines in a similar way.
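
For block quotes, the rule can be approximated in a few lines. This is only a sketch under simplifying assumptions: it tracks block quote depth and open paragraphs, and ignores list items, code blocks, and lines that cannot continue a paragraph (headings, thematic breaks, and so on). The names `strip_markers` and `find_lazy_lines` are made up for the example.

```python
import re

# One block quote marker: up to three spaces, ">", and an optional space.
MARKER = re.compile(r" {0,3}> ?")


def strip_markers(line):
    """Remove leading block quote markers; return (depth, remaining text)."""
    depth = 0
    rest = line
    while True:
        match = MARKER.match(rest)
        if not match:
            return depth, rest
        depth += 1
        rest = rest[match.end():]


def find_lazy_lines(text):
    """Return 1-based numbers of lines that lazily continue a quoted paragraph."""
    lazy = []
    open_depth = 0  # quote depth of the currently open paragraph, 0 if none
    for number, line in enumerate(text.splitlines(), start=1):
        depth, rest = strip_markers(line)
        if not rest.strip():
            # A blank line (with or without markers) closes the paragraph.
            open_depth = 0
        elif open_depth and depth < open_depth:
            # Fewer markers than the open paragraph needs: lazy continuation.
            lazy.append(number)
        else:
            open_depth = depth
    return lazy
```

On the two-line block quote above, `find_lazy_lines` reports line 2. A strict dialect could reject exactly the lines this function reports, though a real implementation would have to follow the full block structure rather than this approximation.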