Beyond Markdown


#21

One of the nice features, from a writer’s perspective, is that Markdown doesn’t require special markers or delimiters to start special cases. The proposal here would add this requirement to start HTML blocks or add intraword emphasis. So I think I agree with @piro_or that adding more complex (even if only a little) syntax would take away some of what makes Markdown appealing. It would reduce complexity from the spec and implementations, but seemingly at the writer’s expense.

For complex scenarios, we have the fallback to HTML. If nested emphasis is rare, HTML could be used in just those scenarios, while keeping the normal Markdown syntax simple. If the parser knows about the particular HTML tags used, they could be stored in the AST and then converted into other formats. We also have the concept of extensions, such as what GitHub have done, to use as shorter syntax in more widespread cases, such adding attributes. With these two solutions in mind (HTML fallback and extensions), other than making life easier for spec and parser writers, do we really benefit from a new set of incompatible rules?


#22

I think the proposal offers quite a bit to writers:

  • the ability to pass through any format, not just HTML
  • the ability to put attributes on any block element
  • a simpler mental model for emphasis and lists, with fewer rules to remember
  • a uniform and predictable extension mechanism, instead of ad hoc syntactic extensions

These things may not be important to you, but there are writers for whom they are important.


#23

My use cases for a light weight markup language are primarily web based systems, particularly those where it’s important for the writer to productive quickly without learning new syntax rules or referencing a manual. I wonder if our use cases are different enough that a new language in the Markdown family of languages (like what you’ve proposed) would make sense for only some types of writers, while others would be better off staying with the status quo?

Could you add support for other formats to CommonMark as an extension (with the ````{=latex}` syntax), while keeping raw HTML working by default (as it does now)? Same with attributes as an extension. If so, these two points wouldn’t justify a new language on their own (although I can understand your desire to stay neutral with non-Markdown syntax as well).

On the one hand, I agree that the proposed rules for emphasis are simpler. On the other hand, with the current emphasis rules, the writer can muddle through without looking up the special intra-word emphasis syntax in a manual, by testing the output in a preview window (like the one this forum uses). I suppose the editor could have a button that adds intra-word emphasis, although the writer may not think to use it if they believe they already know how to write emphasised text.

Can you talk a bit more about how this would work in the new language? For example, to add tables, description lists, strikethrough and other common proposed extensions, how would adding these to the new language differ to adding them to CommonMark?

By the way, I don’t want to come across as negative towards the idea of a new language. These are the kinds of objections that others who primarily use web-based systems will likely raise, especially if we’re asking people to change their entrenched habits. Personally I wouldn’t mind writing in a language that’s a bit stricter and more explicit if it makes things simpler overall.


#24

This may well be the case.

Yes, but this still leaves you with an ugly and complicated set of rules for determining what belongs to an HTML block.

I was just referring to what I said above:

However, this approach is too unwieldy to use for definition lists or strikethrough, and definitely not enough for tables, so some extensions would still be needed.


#25

I think this another case where users of web systems, wanting to get some raw HTML working quickly, would find CommonMark simpler, while in more advanced cases explicitly declaring the section of text as HTML would make inserting it simpler. This, and the similar points I brought up earlier, are leading me to believe that a new language project might be worth undertaking as an alternative, rather than as a replacement, to Markdown.


#26

This.

Markdown took the markup world by storm for a reason. It needs to stay true to that reason. It should not and cannot be all things to all people. Other markup languages exist or can be created for people who have different reasons.

Has there been a discussion here or elsewhere that gets to Markdown’s reason?

I believe the reason is as is stated in the CommonMark intro:

What distinguishes Markdown from many other lightweight markup syntaxes, which are often easier to write, is its readability. As Gruber writes:

The overriding design goal for Markdown’s formatting syntax is to make it as readable as possible. The idea is that a Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions. (http://daringfireball.net/projects/markdown/)

The Prime Directive, so to speak. I would put portability second (the reason for CommonMark’s clarification of Markdown, right?), and then greater expressibility (e.g. supporting tables) third. Greater expressibility must bow to The Prime Directive; we can’t just add stuff willy-nilly. If adding stuff tends to break the Prime Directive, then we have to be very judicious – e.g. don’t add things needed by a a few at the expense of the many (ok, that second Star Trek reference wasn’t intentional, but I like it). If someone comes up with some amazing syntax that supports semantic “divs” and the like while staying true to the Prime Directive, sweet.

[And yeah, I know that some of the greatest Star Trek episodes are where the Prime Directive is broken. But it wasn’t willy nilly. And it certainly wasn’t for self-serving reasons.]


#27

I agree completely about the Prime Directive; that’s why I put it right at the beginning of the Commonmark spec. But you can’t justify “include HTML anywhere you like without special marking” by appealing to the Prime Directive. Once you include raw HTML, you’ve already gotten away form “publishable as-is, as plain text, without looking like it’s been marked up…”

If we need to choose between

(1) what the original Markdown allows:

<div class="warning">
<p>Don't try this at home.</p>
</div>

(2) what Commonmark allows:

<div class="warning">

Don't try this at home.

</div>

(3), which you can do in pandoc:

::::::::: warning ::::::::::::
Don't try this at home.
::::::::::::::::::::::::::::::

and (4), which the proposal above would allow:

{warning}
Don't try this at home.

then I think it is (3) and (4) that best respect the Prime Directive.


#28

As a web-dev I understand you guys not liking the thought of not having the power of HTML quickly available. But maybe it would be more productive to talk about the use-cases for raw HTML in markdown documents. Maybe there is a better, more readable way?

Would adding a nice syntax for block containers (divs/aside/etc) and inline-spans not solve 99% of those use-cases?
(See the pandoc manual for examples.)


#29

Many of these solutions suggest throwing out some syntax to make parsing easier. The result (from these suggestions) is a subset of Markdown/CommonMark that is easy to write, read and parse.

This seems like an excellent approach. It suggests to me that bad syntax will simply render as regular text, rather than weird markup, which is an easier-to-swallow surprise. In addition, it seems likely this will make it easier to learn from the correction (“Oh, I don’t need to indent code.”)

To that end I’d suggest implementing these particular solutions in CommonMark 1.0. It would mean that some Markdown files that-render-in-some-particular-parser would need to be updated, but imho these are the exact documents that would benefit from ambiguity fixes.

Could a similar approach be taken to remove other ambiguous cases, and instead require writers to write unambiguously. For example, how many of the 17 emphasis rules simply be declared “ambiguous” and rendered in regular text? The author would then need to find a simpler way of marking up their text, benefiting plaintext readers and parsers alike.


#30

I believe John Gruber was referring to the Markdown syntax only here, rather than Markdown + HTML in the document. Further in the syntax guide he writes (emphasis mine):

Markdown is not a replacement for HTML, or even close to it. Its syntax is very small, corresponding only to a very small subset of HTML tags. The idea is not to create a syntax that makes it easier to insert HTML tags. In my opinion, HTML tags are already easy to insert. The idea for Markdown is to make it easy to read, write, and edit prose. HTML is a publishing format; Markdown is a writing format. Thus, Markdown’s formatting syntax only addresses issues that can be conveyed in plain text.

So, a “Markdown-formatted document” mentioned in the prime directive is referring to just the Markdown syntax in the document. HTML is allowed because Markdown syntax is not intended to solve every problem.

I think this would remove the requirement for raw HTML in a lot of cases (even if it is still desired for the reasons I mentioned in earlier posts). We would need many of the proposed extensions to always be available, such as definition lists, tables, and embedded video. A significant challenge would be coming up with a short syntax for all the different types of phrasing content, while still making the syntax both concise and understandable in plain text.

CommonMark’s goal is to be highly compatible with existing implementations. Changing the emphasis syntax or removing raw HTML would significantly break many documents that cannot be automatically updated (for example, GitHub readmes); even if those documents would benefit from being updated, it’s not something that can realistically be done for every document.


#31

To that end I’d suggest [removing these features] from CommonMark 1.0

CommonMark’s goal is to be highly compatible with existing implementations. Changing the emphasis syntax or removing raw HTML would significantly break many documents that cannot be automatically updated (for example, GitHub readmes); even if those documents would benefit from being updated, it’s not something that can realistically be done for every document.

That’s a great point. Maybe this is simply a semantic issue (for me). If the solution to this daydream is called “CommonMark 2.0” rather than “NewLanguage 1.0”, it implies:

  • breaking backwards compatibility
  • explicit support (or not) for specific versions
  • the need for an upgrade utility :smiley:
  • that the new version is better :smiling_imp:

I speak as someone who is comfortable with the future being unevenly distributed ala Python 2 and Python 3, so ymmv.