Support for Extension Token

drobertson123 · March 28, 2018, 8:29pm

As I have used Markdown I constantly run into use cases where I need it to do something beyond the original scope of the Markdown concept. I think this is a common theme that has driven much of the fragmentation of the markdown idea.

I believe it would be very helpful to define something in the markdown spec that allows for the extension of markdown in a consistent way without limiting what that extension would be.

My thought is that this would be a token definition that would specifically not be rendered by a Markdown processor. These tokens would follow a simple basic syntax and could be rendered by extension to a markdown processor.

This is the core concept, using {{{ and }}} to delimit the extension token:

{{{extension directive1 directive2}}}

Markdown would not render this into output text in any way, but a Markdown renderer with an extension specific to that extension name would detect and process it. Anything within the token area could be used as directives for the extension. The extension would be responsible for interpreting and rendering the output.

An example for a hypothetical LatexMD extension token:

{{{LatexMD "$\alpha + b = \Gamma \div D$"}}}

In any normal Markdown rendering engine, this block would get ignored and not be included in the output.

If the rendering engine had an extension that interpreted the LatexMD token it would process the token along with the information/directives and render an output for that location as part of the rendering process.

The important feature of this would be that the token would specifically NOT be rendered unless there was an extension that could provide a legitimate output for the rendering process. At this point, I can’t see anything in the spec that would provide optional functionality like this, but I would be happy to learn it was there.
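To make the default behavior concrete, here is a minimal sketch of what a processor with no extensions installed might do with such tokens. The regular expression and both function names are assumptions made for illustration; nothing here comes from the spec or from any existing parser.

```python
import re

# Hypothetical token syntax from this post: {{{name directive ...}}}
TOKEN = re.compile(r"\{\{\{\s*(\w+)(.*?)\}\}\}", re.DOTALL)


def strip_extension_tokens(text: str) -> str:
    """Drop every extension token, as a processor with no extensions would."""
    return TOKEN.sub("", text)


def list_extension_tokens(text: str) -> list[tuple[str, str]]:
    """Return (extension name, raw directive text) for each token found."""
    return [(m.group(1), m.group(2).strip()) for m in TOKEN.finditer(text)]


source = 'Before {{{LatexMD "$a + b$"}}} after.'
print(strip_extension_tokens(source))
print(list_extension_tokens(source))
```

The sketch scans raw text, so it would also match tokens inside code spans or code blocks; a real implementation would have to decide how the token interacts with the rest of the syntax.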

Aside from the goal of providing an extension point for markdown, I see this as a way to limit the future fragmentation of markdown in an unstructured way. Unique rendering functions could be created that would not cause breaking changes to propagate out to the markdown community.

Notes:
I am indifferent to the way it is implemented. I just feel the core concept is vital. If implemented well this could be the extension point for a wide variety of the requests I am seeing here that seem to fall outside the core concept of markdown but are arguably useful.

jgm · March 28, 2018, 8:47pm

You could use HTML comments for this:

<!-- LatexMD "$\alpha + b == \Gamma \div D$" -->

This will be parsed as a raw HTML node; one can then
transform the AST and convert this to whatever.

Advantage of this over {{{...}}} is that it is already
compatible with every markdown implementation.

However, I think that, in practice, if you want LaTeX math, it’s
going to be too cumbersome to write things like this (either
version); you’ll want an actual parser extension that
handles this.
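A rough sketch of the comment-based approach, assuming the parser hands over the literal text of each raw HTML node. The `<!-- Name "payload" -->` convention, the handler table, and `render_latexmd` are all hypothetical; a real setup would walk the parser's own AST instead of matching strings.

```python
import html
import re

# Assumed convention: <!-- ExtensionName "payload" -->
COMMENT = re.compile(r'<!--\s*(\w+)\s+"(.*?)"\s*-->', re.DOTALL)


def render_latexmd(payload: str) -> str:
    # Hypothetical handler: wrap the math in a span for a client-side renderer.
    return f'<span class="math">{html.escape(payload)}</span>'


HANDLERS = {"LatexMD": render_latexmd}


def transform_html_node(raw_html: str) -> str:
    """Rewrite a raw HTML node if it is a comment addressed to a known handler."""
    match = COMMENT.fullmatch(raw_html.strip())
    if match and match.group(1) in HANDLERS:
        return HANDLERS[match.group(1)](match.group(2))
    return raw_html  # ordinary comments and other HTML pass through untouched


print(transform_html_node(r'<!-- LatexMD "$\alpha + b$" -->'))
print(transform_html_node('<!-- just a note -->'))
```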

drobertson123 · March 28, 2018, 9:57pm

Using HTML Comments could work, good thought. I may use that as a practical solution, but it feels like an improper subversion of something that has its own purpose.

I still think that having some form of extension token would be a better solution.

You may have missed the point of the idea above. I don’t care about Latex, I was just using that as an example of creating an entry point for a Latex extension, or any extension for that matter. LatexMD was just a made up example.

You are right that Latex should have a parser extension. It doesn’t belong in markdown, but could be useful for some people. The issue is that if we don’t have a specified way to inject extension specific markdown then everyone will just implement their own extensions in any way they feel like.

Right now Markdown has so many flavors that it is impossible to trust what you will see in different tools. This suggestion is about providing a way for Markdown to use extensions without polluting Markdown implementations in the wild.

drobertson123 · March 28, 2018, 10:07pm

One additional advantage that some kind of extension marker would have is to facilitate the concept of extensions in Markdown parsers.

It could really improve the Markdown ecosystem if the parsers/renderers implemented hooks for plugin extensions. It would be fairly simple to have the parser detect the marker/token in markdown and pass the information up to an extension hook. If a parser extension could handle it properly the contents of the token would be rendered and sent back down to the Rendering system.

This wouldn’t make sense to do for every HTML comment tag, but would work well for an extension token.
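As a sketch of the hook idea, assuming the {{{name directives}}} syntax proposed earlier in this thread: extensions register a function under a name, and the processor passes each token's directives to the matching hook. The registry, the decorator, and the `Upper` extension are invented for illustration only.

```python
import re
from typing import Callable

TOKEN = re.compile(r"\{\{\{\s*(\w+)(.*?)\}\}\}", re.DOTALL)

# Hypothetical registry: extension name -> function(directives) -> rendered text
HOOKS: dict[str, Callable[[str], str]] = {}


def extension(name: str):
    """Decorator that registers a hook under an extension name."""
    def register(func: Callable[[str], str]) -> Callable[[str], str]:
        HOOKS[name] = func
        return func
    return register


@extension("Upper")
def upper(directives: str) -> str:
    # Toy extension: render the directive text in upper case.
    return directives.upper()


def render_tokens(text: str) -> str:
    """Replace each token with its hook's output, or with nothing if no hook exists."""
    def dispatch(match: re.Match) -> str:
        hook = HOOKS.get(match.group(1))
        return hook(match.group(2).strip()) if hook else ""
    return TOKEN.sub(dispatch, text)


print(render_tokens("a {{{Upper hello}}} b {{{Missing x}}} c"))
```

This runs as a text pass before parsing, which sidesteps the hard part: deciding whether a token is a block or an inline, and whether its contents are raw text or Markdown.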

jgm · March 28, 2018, 10:47pm

DougR noreply@talk.commonmark.org writes:

> It could really improve the Markdown ecosystem if the
> parsers/renderers implemented hooks for plugin extensions.

Some of them do. gfm’s extensions to cmark were implemented through a
plugin system designed by Mathieu Duponchelle.

The javascript parser markdown-it has a plugin system.

The Haskell parser I’m working on has an extension system.

Probably others too; these are just the ones I’m aware of.

> It would be fairly simple to have the parser detect the marker/token in
> markdown and pass the information up to an extension hook.

Here you’re underestimating the complexity. As these examples show,
it’s fairly complex to design a plugin system that works with commonmark.

Note that it’s also possible to customize by re-using the existing
constructs. E.g. one could construe a link with an empty target as
a wikilink, or overload HTML comments in the way described. For this
kind of thing, you don’t need to modify the parser at all; you just
need a way to manipulate the AST after parsing. My lcmark gives a way
to do this with cmark.
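To illustrate the kind of post-parse manipulation meant here, the following sketch treats a link with an empty target as a wikilink. It uses a toy `Node` class as a stand-in for a real parser's AST; it is not lcmark's or cmark's API, and the `/wiki/` base path is an arbitrary assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                  # e.g. "paragraph", "text", "link"
    text: str = ""             # literal text for "text" nodes
    destination: str = ""      # link target for "link" nodes
    children: list["Node"] = field(default_factory=list)


def plain_text(node: Node) -> str:
    """Concatenate the literal text of a node and all of its descendants."""
    return node.text + "".join(plain_text(child) for child in node.children)


def resolve_wikilinks(node: Node, base: str = "/wiki/") -> None:
    """Give every link with an empty target a destination derived from its text."""
    if node.kind == "link" and not node.destination:
        node.destination = base + plain_text(node).strip().replace(" ", "_")
    for child in node.children:
        resolve_wikilinks(child, base)


doc = Node("paragraph", children=[
    Node("text", "See "),
    Node("link", children=[Node("text", "Extension Tokens")]),
    Node("link", destination="https://example.org", children=[Node("text", "site")]),
])
resolve_wikilinks(doc)
print(doc.children[1].destination)
```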

drobertson123 · March 29, 2018, 12:43am

I have no doubt I am underestimating the complexity. This is just based on needs I have had in the past and frustrations I have with markdown.

Without subverting the use of some other element of markdown or effectively breaking how markdown itself works I can’t add functionality I need to markdown.

If there was a clear specification that says “An item inside these characters is a target of extension methods.” it would greatly simplify the process of extending markdown and discourage some of the more random ways people do it now.

I don’t even care if the content of that area is specified in any particular way. I just need to know that if it goes through a markdown processor with no extension that can do something with that token it will not render it.

I just need something that allows it to be a target. We don’t have that now.

Each of the examples you are giving actually uses something else that already has different functionality. That is specifically what I am trying to get away from. And I would expect that some defined token for extensions would actually make it easier to extend markdown in a spec-compliant way.

Markdown shouldn’t make it hard to add needed functionality. That is the entire point of what I am suggesting. Instead of having to hack around the limitations of markdown, let’s add a simple part of the spec that allows us to consistently extend the abilities of markdown without breaking the spec.

So many of the requests for “Could markdown implement XXXX” could be answered by implementing an extension that uses this type of extension target token. Features could be added without expanding or breaking the spec in a significant way.

jgm · March 29, 2018, 4:03am

In fact, there are a few different kinds of extensions you need
to make room for:

1. operate on raw text and yield a block element
2. operate on raw text and yield an inline element
3. operate on commonmark block content and yield a block element
4. operate on commonmark inline content and yield an inline element

In pandoc, we’ve developed a system that works well for all four.

1. fenced code block with structured attributes or a “raw” annotation

````
``` {=dot}
graph graphname {
a -- b -- c;
b -- d;
}
```
````

2. inline code backticks with structured attributes or a “raw” annotation

`$x^2$`{=latex}

3. fenced div with structured attributes

::: warning
1. Don't read this.
2. Or you'll regret it.
:::

4. bracketed inlines (as in links) with structured attributes

```
[This is *colored* text]{color=red}
```

The attributes can be intercepted in a filter which can do
as it likes with the contents.

This has been quite a flexible system, and it degrades well
(in pandoc) when you don’t have the filter.
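As a rough illustration of such a filter, the sketch below walks a fragment of pandoc's JSON AST and rewrites bracketed spans that carry a `color` attribute. The node shapes for `Span` and `RawInline` follow my understanding of pandoc's JSON format and should be checked against the pandoc version in use; the function names and the choice of an inline-style HTML wrapper are assumptions.

```python
import json


def colorize(inlines: list) -> list:
    """Replace Span nodes carrying a color attribute with raw HTML wrappers."""
    out = []
    for node in inlines:
        if isinstance(node, dict) and node.get("t") == "Span":
            (_ident, _classes, keyvals), content = node["c"]
            color = dict(keyvals).get("color")
            if color:
                opening = f'<span style="color: {color}">'
                out.append({"t": "RawInline", "c": ["html", opening]})
                out.extend(colorize(content))
                out.append({"t": "RawInline", "c": ["html", "</span>"]})
                continue
        out.append(walk(node))
    return out


def walk(node):
    """Recurse through the JSON tree, rewriting every list along the way."""
    if isinstance(node, list):
        return colorize(node)
    if isinstance(node, dict):
        return {key: walk(value) for key, value in node.items()}
    return node


# Hand-written fragment standing in for [text]{color=red}
span = {"t": "Span", "c": [["", [], [["color", "red"]]], [{"t": "Str", "c": "text"}]]}
print(json.dumps(walk([span])))
```

Used as an actual filter, the script would read the whole document as JSON from standard input, apply `walk`, and write the result to standard output.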

drobertson123 · March 29, 2018, 7:00am

Great, then let’s get something like that in the spec. The point is that if it isn’t in the spec and you try to do that outside of pandoc you have issues.

The pandoc solution you have sounds good, but it is part of the bigger problem. The pandoc solution doesn’t work outside of pandoc (or other systems that implement it). That means that when you use that pandoc compliant markdown someplace else it doesn’t work correctly.

It may be a great solution, but it contributes to the fragmentation of markdown. Without consistency and a spec that allows for extension we will continue to get this fragmentation problem.

Later, when I try to use the markdown based documents that were compliant with pandoc in a different rendering system everything falls apart.

The goal of the spec should be to prevent that.

pantherse · March 29, 2018, 9:38pm

I myself have been looking into the same issue while working on a side-project. I specifically wanted to use markdown to produce a layout template; but, I digress. If an “extension” syntax is added to the spec, I think there should be some well-defined rules about what goes in the spec. For me, the most critical piece is that the spec should only define the syntax that delimits an “extension” block. It should also provide a fallback behavior for when the processor is unable to render the extension block.

I think an extension mechanism will still cause some form of fragmentation because not every compliant processor will have support for every extension. I feel that if the spec is written with a provision for the expected behavior when the processor cannot render the “extension” block, it shouldn’t be much of an issue. Users can choose the implementation that suits their needs. If they’re forced to switch implementations, the pain point should be the process of switching from implementation A to implementation B. Get past that, and your existing content still stands.

RyanGray · March 30, 2018, 7:41pm

The problem with Markdown really seems to be the disagreements on any new additions. I would argue that there are many cases where I would rather a renderer that can’t handle the extension show the raw source than hide it. Of course, some things make sense to hide, but they should have an alt text attribute, much like an image.
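One way to picture these options is a fallback policy chosen per renderer. The sketch below assumes the {{{name directives}}} syntax from earlier in the thread plus a made-up `alt="..."` directive; neither is part of any spec, and the policy names are mine.

```python
import re
from enum import Enum

TOKEN = re.compile(r"\{\{\{\s*(\w+)(.*?)\}\}\}", re.DOTALL)
ALT = re.compile(r'alt="([^"]*)"')  # hypothetical alt-text directive


class Fallback(Enum):
    HIDE = "hide"   # emit nothing
    RAW = "raw"     # emit the token source unchanged
    ALT = "alt"     # emit the alt text if present, else nothing


def render_unsupported(text: str, policy: Fallback) -> str:
    """Apply a fallback policy to every token, assuming no extension handles it."""
    def replace(match: re.Match) -> str:
        if policy is Fallback.RAW:
            return match.group(0)
        if policy is Fallback.ALT:
            alt = ALT.search(match.group(2))
            return alt.group(1) if alt else ""
        return ""
    return TOKEN.sub(replace, text)


sample = 'x {{{Chart alt="sales chart" data.csv}}} y'
for policy in Fallback:
    print(policy.value, "->", render_unsupported(sample, policy))
```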

Brian_Lalonde · March 30, 2018, 8:15pm

I’m not sure why you consider using HTML comments “subversion”.

Adding new syntax would be a noisy subversion of the low-ceremony philosophy of Markdown if it were used with each extension supported. I’d really like to see tables and the MDwiki Alerts Gimmick (similar to what @jgm called “fenced divs”, but importantly with no additional syntax) added to the core. Everything else seems like extending through dropping back to HTML (comments, script elements, object elements) is already sufficient, since they provide a pretty rich feature set using a mature syntax, rather than gradually turning Markdown into AsciiDoc or reStructuredText or a dozen other lightweight markup languages.

jgm · March 30, 2018, 9:44pm

Brian Lalonde noreply@talk.commonmark.org writes:

> Everything else seems like extending through dropping back to HTML
> (comments, script elements, object elements) is already sufficient,
> since they provide a pretty rich feature set using a mature syntax,
> rather than gradually turning Markdown into AsciiDoc or
> reStructuredText or a dozen other lightweight markup
> languages.

Dropping back to HTML has two big disadvantages:

1. It only works well when you’re targeting HTML. Markdown is
   routinely used to author documents that will be published in
   a wide variety of formats, including LaTeX, docx, and PDF.
2. It compromises the Markdown design goal of having a source
   that is readable as plain text.

I do see Markdown/commonmark as a member of the generic class of
lightweight markup languages. It is distinguished from the other
members of that class by its design goal of source readability
(compare nested lists in Markdown and AsciiDoc, for example)
and its popularity.

But, part of what has made this project difficult is that different
people use Markdown in very different ways. On the one end of the
spectrum, you have people who just want an easy way for users to
write quick comments on a website. On the other end, there are people
writing books and dissertations in Markdown. Many of the debates
that arise about spec issues have their source in these differences.

Brian_Lalonde · March 31, 2018, 6:53am

HTML can be rendered in these formats, too. There’s no real reason to exclude it from them.

The needs of complex or precisely formatted text will never align with a minimalist markup, just by definition. The inertia of popularity alone is a risky reason to invest heavily in a technology, especially when its organizing principle works against your goals.

jgm · April 1, 2018, 12:23am

Brian Lalonde noreply@talk.commonmark.org writes:

> The needs of complex or precisely formatted text will never align with a minimalist markup, just by definition. The inertia of popularity alone is a risky reason to invest heavily in a technology, especially when its organizing principle works against your goals.

It’s not just popularity. Readability of the plain text source is a primary goal of Markdown (and commonmark – see the second paragraph of the spec), and this is where Markdown does better than other light markup formats, in my opinion.

You lose this readability once you drop down to HTML. And that’s why exploring extensions is a reasonable thing to do, even though, as you say, for very fine-grained control you’re probably going to need something more expressive than even extended commonmark.

Which would you rather read, in plain text?

`--help`
  ~ Print this usage message.

`--fragment`
  ~ Create a document fragment rather than a standalone document.

`--output` *FILE*
  ~ Direct output to *FILE*.

or

<dl>
<dt><code>--help</code></dt>
<dd>Print this usage message.
</dd>
<dt><code>--fragment</code></dt>
<dd>Create a document fragment rather than a standalone document.
</dd>
<dt><code>--output</code> <em>FILE</em></dt>
<dd>Direct output to <em>FILE</em>.
</dd>
</dl>

If I’m reading a plain text document and I come across this

| Fruit        | Price   | Season       |
|--------------|--------:|--------------|
| Apple        |    2.33 | Fall         |
| Strawberry   |    3.53 | Summer       |

it’s obvious that it’s a table, and I can read off the information
as easily as in a rendered version. Not so with the HTML version:

<table>
<thead>
<tr>
<th>Fruit</th>
<th style="text-align: right;">Price</th>
<th>Season</th>
</tr>
</thead>
<tbody>
<tr>
<td>Apple</td>
<td style="text-align: right;">2.33</td>
<td>Fall</td>
</tr>
<tr>
<td>Strawberry</td>
<td style="text-align: right;">3.53</td>
<td>Summer</td>
</tr>
</tbody>
</table>

EDIT: Fixed initial pipes in code blocks, which this forum converted to >.

Brian_Lalonde · April 2, 2018, 3:57pm

Perhaps. It would be interesting to see some data to this effect rather than making assumptions. While I’d agree that the examples given are probably easier to read, I’d consider the AsciiDoc-style use of a wide array of hard-to-remember characters much harder to edit than the far more consistent SGML style. That is especially true for any of the complex multiline cell concepts that have been discussed, at which point CommonMark would cease to be a light markup.

Keep in mind, too, that many of those end tags aren’t needed.

<dl>
<dt> <code>--help</code>
<dd> Print this usage message.
<dt> <code>--fragment</code>
<dd> Create a document fragment rather than a standalone document.
<dt> <code>--output</code> <em>FILE</em>
<dd> Direct output to <em>FILE</em>.
</dl>

<table>
<thead>
<tr>
<th> Fruit 
<th align="right"> Price
<th> Season
<tbody>
<tr>
<td> Apple
<td align="right"> 2.33
<td> Fall
</tr>
<tr>
<td> Strawberry
<td align="right"> 3.53
<td> Summer
</table>

drobertson123 · April 19, 2018, 8:19am

I consider it subversion because you are using HTML comments in a way they were not designed or specified for.

I agree that it can be convenient, but Markdown isn’t just used for HTML. Adding HTML to a Markdown document may not fit in many circumstances. It also subverts the intent of HTML comments, which is to add comments to HTML. If we regularly use HTML comments for extending Markdown, what happens when your completely legitimate HTML comments get interpreted as an extension?

It is just a bad fit that we use because it is the most convenient thing we have. I honestly think it is a poor way of doing things. It would prevent us from having a standardized way of extending Markdown and force people to continue to use creative hacks.

drobertson123 · April 19, 2018, 8:37am

Hi Ryan

I am completely open to the idea of a readable extension syntax. Good arguments can be made for showing or not showing the contents of the extension area. In fact, that would be great to configure in the rendering engine based on the use case.

My main goal is to get some form of defined extension syntax into CommonMark so we know what to expect.

Markdown is going to be extended. It already happens in so many ways. Without a defined way to do it we need to deal with each extension strategy separately. It becomes unmanageable chaos over time and markdown becomes less consistent and harder to use.

I don’t want to have to make major changes to my MD files if I decide to use Pandoc vs a different rendering engine.

Ideally, with an extension syntax we could have some managed consistency between rendering engines. One renderer may not support an extension you wish to use, but at least it would know that the extension block was for an extension and have ways of handling it even if it didn’t have that extension available.

Right now we have the issue of proverbial blocks: parts of the document that are tagged for an extension in one rendering system, but that other rendering engines either do not recognize or improperly treat as plain text.

Even HTML comments have similar issues. If the rendering engine doesn’t have a specific way of rendering HTML comments and the output is a non-HTML format, you get a mess. HTML improperly rendered to a PDF file or EPUB is not good.

The goal of an extension syntax should be that every rendering engine should implement strategies for handling extension tokens even if it doesn’t know what the contents of the extension token means.

Brian_Lalonde · April 19, 2018, 7:14pm

Comments have a pretty long history in many languages of becoming an extension point, especially polyglot languages (like Markdown, which is a superset of HTML by design).

Fenced code blocks are also a pretty reasonable choice.