Generic directives/plugins syntax

mb21 · September 6, 2014, 9:50am

This is a proposal for a generic syntax to accommodate custom directives/plugins, which is as simple and markdown-like as possible while still accommodating a lot of use cases. It does not cover all markdown extensions (e.g. tables or custom linebreak behaviour don’t really fit the template), but it works for a lot of them.

There are three different kinds of directives:

Inline, start with one colon (analogous to spans)
Leaf block, start with two colons (analogous to empty divs)
Container block, start with at least three colons (analogous to divs containing further blocks)

This enables parsers to easily distinguish between inline and leaf blocks (even if an inline directive is the only content of a paragraph), and between leaf and container blocks (without a lot of lookahead).

1. Inline Directive Syntax

The syntax for inline directives:

:name[content]{key=val}

Exactly one colon, followed by the name, which is the identifier for the extension and must be a string without spaces. The content may be further inline markdown elements, to be interpreted and then printed in one way or another. The {#myId .myClass key=val key2="val 2"} part contains generic attributes (i.e. key-value pairs) and is optional.

(Originally, the : was an @, but ! and :: are also under discussion. Also, the original post proposed :name[](){} with the () containing a string that’s supposed to contain one or more identifiers, e.g. ids, URLs or filepaths.)
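To make the inline form concrete, here is a minimal Python sketch of how a processor might recognize :name[content]{attrs}. The pattern, the allowed characters in a name, and the function name are all assumptions for illustration; the proposal itself does not prescribe them, and a real parser would also have to handle nested brackets.

```python
import re

# Hypothetical pattern for the proposed inline form :name[content]{attrs}.
# Assumes names consist of letters, digits, '-' or '_', and that the
# content holds no nested square brackets.
INLINE_DIRECTIVE = re.compile(
    r"(?<![:\w])"                  # not preceded by a colon or word character
    r":(?P<name>[A-Za-z][\w-]*)"   # exactly one colon, then the name
    r"\[(?P<content>[^\]]*)\]"     # [content]
    r"(?:\{(?P<attrs>[^}]*)\})?"   # optional {attrs}
)


def find_inline_directives(text):
    """Return (name, content, raw_attrs) for each inline directive in text."""
    return [
        (m.group("name"), m.group("content"), m.group("attrs"))
        for m in INLINE_DIRECTIVE.finditer(text)
    ]
```

The lookbehind keeps a leaf block such as ::toc[...] from being picked up as an inline directive, which is the distinction the colon count is meant to provide.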

A few example applications:

Pandoc’s citation syntax could then consequently be changed to :cit[smith04] and :cit[@doe99, pp. 33-35, 38-39 and *passim*].
spoilers that are hidden by default :spoiler[it's a _happy_ ending]
Wikipedia links: :wikipedia[Foobar]
Small caps: :smallcaps[content]
References to figures: as seen in the :ref[scatter plot]{target=myFigure} could expand to: as seen in the <a href="#myFigure">scatter plot (Figure 15)</a> (when the image with id myFigure is the 15th)
Create tel:-links for phone numbers: :tel[+375255318270]
more on this GitHub wiki page…

2. Leaf Block Directives

The syntax for leaf block directives:

:: name [content] {key=val}

To be recognized as a directive, this has to form an otherwise empty paragraph. But as opposed to inline directives, there are two colons now, the brackets [] are optional as well, and spaces may be interspersed for readability.
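As a rough sketch of the rule above (two colons, optional brackets, optional spaces, nothing else in the paragraph), a leaf block check could look like this in Python. The regular expression and the function name are assumptions made for illustration only.

```python
import re

# Hypothetical pattern for a leaf block directive: exactly two colons,
# a name, then optional [content] and {attrs}, with optional spaces
# between the parts. The line must hold nothing else.
LEAF_DIRECTIVE = re.compile(
    r"^::(?!:)\s*(?P<name>[A-Za-z][\w-]*)"
    r"\s*(?:\[(?P<content>[^\]]*)\])?"
    r"\s*(?:\{(?P<attrs>[^}]*)\})?\s*$"
)


def parse_leaf_directive(paragraph):
    """Return a dict for a leaf directive, or None for ordinary text."""
    lines = paragraph.strip().splitlines()
    if len(lines) != 1:
        return None  # a leaf directive must be the whole paragraph
    m = LEAF_DIRECTIVE.match(lines[0])
    return m.groupdict() if m else None
```

Because the pattern rejects a third colon, a container block opener such as :::form is not mistaken for a leaf block.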

Example applications:

Proposal from this thread for code snippets: ::snip[label]{file=path/to/code.scala}
and from this thread: ::youtube[title]{vid=09jf3ow9jfw}
this thread: ::video[title]{file=filename.mp4}
transclusions: ::include{file=other-file.md}
placing a table of contents: ::toc[Table des matières]

3. Container Block Directives

Container blocks contain further blocks. The proposed syntax for container block directives is:

::: name [inline-content] {key=val}
contents, which are sometimes further block elements
:::

Analogous to fenced code blocks, an arbitrary number of colons greater than or equal to three could be used, as long as the closing line has at least as many colons as the opening line. That way, you can even nest blocks (think divs) by using successively fewer colons for each nested block. Finally, the first line might also have non-significant trailing colons, so you can do things like:

:::::::::::: SPOILER :::::::::::::
We're going to spoil it in three
easy steps:

1. ready
2. steady
3. go
::::::::::::::::::::::::::::::::::
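The fence rule for container blocks can be sketched in a few lines of Python. This is only an illustration under assumptions: a line of nothing but colons is treated as a closing fence when it has at least as many colons as the opener of the current block, trailing colons on the opening line are ignored, and the bracket and attribute parts are not parsed. The function names and the dict layout are made up for this example.

```python
import re

# Assumed line shapes: an opener starts with three or more colons and has
# something after them; a closer consists of colons only.
OPEN = re.compile(r"^(?P<fence>:{3,})\s*(?P<name>[A-Za-z][\w-]*)?")
CLOSE = re.compile(r"^:{3,}\s*$")


def _parse(lines, start, min_close):
    """Parse lines[start:] and return (nodes, index of the next unread line)."""
    nodes = []
    i = start
    while i < len(lines):
        line = lines[i]
        if CLOSE.match(line):
            if min_close is not None and len(line.strip()) >= min_close:
                return nodes, i + 1  # long enough to close the current block
            nodes.append(line)       # too short: keep it as plain text
            i += 1
            continue
        m = OPEN.match(line)
        if m:
            # Recurse; the nested block needs a closer at least this long.
            children, i = _parse(lines, i + 1, len(m.group("fence")))
            nodes.append({"name": m.group("name"), "children": children})
            continue
        nodes.append(line)
        i += 1
    return nodes, i


def parse_container_blocks(text):
    """Return a nested list of text lines and container block dicts."""
    nodes, _ = _parse(text.splitlines(), 0, None)
    return nodes
```

With this rule an inner block opened with three colons is closed by :::, while the outer block opened with four colons stays open until a line with four or more colons appears.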

Further examples include this proposal:

:::eval [label] {.python}
x = 1+1
print x
:::

amending this proposal:

:::form
Choose an email
_____

Choose a password
*****

[submit]
:::

and replacing the .md.html.-example syntax from the CommonMark spec.txt:

:::md-example
- one
- two
:
<ul>
<li>one</li>
<li>two</li>
</ul>
:::

If no name is provided, it falls back naturally to a div with optional attributes:

::: {.myClass}
some markdown
:::

Span element

And because it’s quite similar, a proposal for a native span element with attributes:

hello [world]{.myClass}

which would translate to:

<p>hello <span class="myClass">world</span></p>

Identifier Registry

Finally, we could set up a maintained registry for the extension-identifiers (cit, snip, etc.) and if a processor doesn’t know an extension it could always fall back to a span with class="extension-name" for HTML. Even more useful should be a generic directive representation in the AST which modules or filters could hook into to use a different set of directives for different use cases. That representation then should contain both the parsed and raw strings of the directives’ contents, because some plugins might want to use the parsed markdown while others want to do the parsing of the contents themselves.
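A small Python sketch of what such a registry with a span fallback might look like. Everything here is hypothetical: the Directive class, the register decorator and the handler are illustrative names, and the parsed content is represented as an already rendered HTML string rather than a real AST.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Directive:
    """Generic directive node carrying both raw and parsed content."""
    name: str
    raw_content: str      # the text exactly as written between [ and ]
    parsed_content: str   # stand-in for the rendered inline markdown
    attributes: dict = field(default_factory=dict)


REGISTRY = {}  # maps an extension identifier to its handler


def register(name):
    """Decorator that adds a handler to the registry under `name`."""
    def decorator(func):
        REGISTRY[name] = func
        return func
    return decorator


@register("tel")
def render_tel(d):
    # This handler wants the raw string, not parsed markdown.
    number = escape(d.raw_content, quote=True)
    return f'<a href="tel:{number}">{number}</a>'


def render(d):
    """Use the registered handler, or fall back to a span with a class."""
    handler = REGISTRY.get(d.name)
    if handler is not None:
        return handler(d)
    return f'<span class="{escape(d.name, quote=True)}">{d.parsed_content}</span>'
```

Keeping both raw_content and parsed_content on the node lets each plugin pick whichever it needs, as described above.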

EnCey · September 6, 2014, 12:25pm

I do like this suggestion, however I’m not a fan of the curly braces.

They seem to be redundant and unnecessarily complicate the directive: which options go into parentheses () and which into braces {}?
Would it not be sufficient to only permit options in parentheses? Then we have the common link syntax with the additional @ directive in front of it.

I’d like to add an example for CSS styles:

@style[content](classNames) as in @style[TODO](important) where important is a CSS class name

I realize that the curly braces would allow you to do more advanced CSS styling, however I don’t believe that would be such a great idea. First off, it’s not very Markdown-y, whereas only using class names (as in the example) remains readable. From a maintenance point of view, inline CSS sounds like a recipe for unmaintainable, unreadable documents.

mb21 · September 6, 2014, 1:06pm

The reasoning for the different braces is that:

() contains a simple string that the directive is free to interpret and serves as kind of the default argument (usually a URL or other identifier), and
{} contains generic attributes (i.e. key-value pairs similar to what in some programming languages are called named arguments) that might also be passed on directly to the HTML element, e.g. @video[title](filename.mp4){autoplay=1}. And {#myId .myClass} are simply shortcuts for {id=myId class=myClass} — this syntax is taken from Pandoc.
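The shortcut expansion described above can be illustrated with a short Python sketch. It assumes whitespace-separated tokens with optional double quotes around values, which is how the examples in this thread are written; the function name is made up, and this is not Pandoc’s actual implementation.

```python
import shlex


def parse_attributes(raw):
    """Parse the inside of {...} into a dict, expanding # and . shortcuts."""
    attrs = {}
    classes = []
    # shlex.split honours double quotes, so key2="val 2" stays one token.
    for token in shlex.split(raw):
        if token.startswith("#"):
            attrs["id"] = token[1:]
        elif token.startswith("."):
            classes.append(token[1:])
        elif "=" in token:
            key, value = token.split("=", 1)
            if key == "class":
                classes.extend(value.split())
            else:
                attrs[key] = value
    if classes:
        attrs["class"] = " ".join(classes)
    return attrs
```

For example, parse_attributes('#myId .myClass') and parse_attributes('id=myId class=myClass') produce the same dictionary.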

I’m not sure I get your @style proposal, could you give an example of the generated HTML? If it’s <span class="important">TODO</span>, that would be [TODO]{.important} in my proposal (see the very first code block above).

EnCey · September 6, 2014, 2:51pm

My style proposal was presented as an alternative to the curly brace syntax, you interpreted it correctly.

I’m aware of the power the generic attributes would give to extensions, I’m merely worried about how easy it will be to read for non-developers. It also deviates somewhat from Markdown’s goal of remaining readable without being rendered.

My suggestion thus was to only use parentheses to make it easier for regular users to use and read the syntax. For example to write @video[title](filename.mp4 "autoplay=1") (or perhaps without quotes if possible). This is very similar to the link syntax and should thus be easy to pick up.

For extension authors it shouldn’t matter much either which form is used. Key-value pairs would make it easier to create extensions, but parsing them as a string argument isn’t that much harder. The #myId and .myClass arguments must be parsed as strings too, unless # and . are explicitly defined by the specification, which wouldn’t be a good idea imo as they are very specific and not really generally useful aside from CSS use cases.

Edit: to clarify on the last part, I’m assuming that CSS styling is an extension and not a built-in feature.

mb21 · September 6, 2014, 3:53pm

Yeah, I wouldn’t be really opposed to @name[]() instead of @name[](){} if that should emerge as the consensus for a more limited syntax.

That being said, I guess the appeal of {} to me is that I would like it to become a no-brainer for a user to know what to put in there because it’s the same everywhere. The distinction between values () and key-value pairs {} is also somewhat inspired by ConTeXt (see 1.3 Commands). (btw: ConTeXt, as opposed to LaTeX, also has a notion of a class, so .myClass isn’t entirely HTML-centric.)

Finally, {} is probably also easier to implement in parsers. But yes, as you proposed, I guess we could also have all the attributes in the (), as in

@video[title](filename.mp4 "autoplay=1")

![altText](myImage.png "title" .myClass width=40 height=50)

### myHeader ### #myId

It works okay for everything but headers (although it still works I guess). But some implementations like Pandoc already make use of {} for attributes in some places. And my proposed span syntax [text]{.class} can’t be converted to []() since that is already taken for links.

mofosyne · September 7, 2014, 4:13am

The issue with that is that some existing parsers will process everything in () as a url.

Could something like this be suitable? We would have a semi-JSON { } field (made easier by not needing " around setting names) and just use that for directives.

link style directives:

For standard links:

[ Textual Information ]( hypertext url ){ semi-json style settings}

Which can be omitted most of the time if needed:
    [ Textual Information ]( hypertext url )

For picture or video links, where it is recognized by `.png` or `.mp4` etc... :
    ![ Textual Information ]( hypertext url ){ semi-json style settings}

To which we can either go classic semi-json or html inspired:

![altText](myImage.png){ title="title", class=.myClass, width=40, height=50 }
![altText](myImage.png){ "title" .myClass width=40 height=50}

For shorthand picture or video links

![ Textual Information ]( Image url ){ semi-json style settings}
!video[ Textual Information ]( video url ){ semi-json style settings}
!audio[ Textual Information ]( audio url ){ semi-json style settings}
!base64[ Textual Information ]( embedded binary file ){ semi-json style settings}

The benefit to this approach is that we keep the settings or the ‘markup’ separate from the actual content of ![]()

block style directives

So for directives with semi-json/semi-html settings, sticking to the style of code fencing, we could use this:

@@@latex [title]{fontsize=14}
\frac{1}{2} = 0.5
@@@

Btw, why @@@@ and not $$$$$ or even ~~~~? (Maybe !!!!, since ![](){}, but that looks rather ugly.)

~~~~~latex [title] {fontsize=14}
\frac{1}{2} = 0.5
~~~~~

$$$$$latex [title] {fontsize=14}
\frac{1}{2} = 0.5
$$$$$

{} [] and () are all optional.

e.g. using url for up to date csv input?

@@@graphGen [My Graph Generator] (www.example.com/values.csv) {graphsize=10}
generateBarChart(x,y);
@@@

summary

When choosing directives, please try to keep to the tradition of

[] is only for text title/descriptor
() is only for url.
{} is only for options & markups

As per ![](){} tradition.

EDIT: Modified the example to better match how fenced code blocks are treated, according to mb21. Instead of @@.latex, it’s @@latex.

mb21 · September 7, 2014, 10:17am

@mofosyne While I like your idea of having only a URL/identifier in the (), CommonMark actually already defines images as ![foo](/url "title"), and that’s kind of why I wrote that the part between () “is a string to be interpreted by the custom directive that generally is not going to be directly visible”. We could of course do it “better” for directives than what happened to the image syntax, but then again that’s kind of restricting directives more than really necessary.

I think ![altText](myImage.png){.myClass width=40 height=50} is great and is in fact already proposed in the consistent attribute syntax thread. About the comma-separated variant: personally, I don’t like there being too many ways to write the same thing and think this is already “semi-json” enough.

About container block directives: for fenced code blocks, the standard currently says to use:

```ruby

and not:

```.ruby

So I think that’s consistent with my proposal. I agree that the spacing probably should be allowed there as you suggested and that also the [] might be optional there.

Finally, the @ and @@@ are a rather arbitrary choice. But it should be some special character that is easily distinguishable from ! (to make it easy to see that it’s not an image), and $ is often already used to mark up mathematics.

EnCey · September 7, 2014, 3:35pm

As @mb21 mentioned, images as well as links allow a title to be declared in parenthesis.

That’s why I suspect users may get confused and try to write expressions like the following:

![image](/url){title="My image"} instead of ![image](/url "My image")

Options can be in either () or {}; there is no rule or easy way to guess which option goes where (when you’re new to CommonMark), so you have to find out by trial and error.

I’m all for a generic attribute syntax that is applicable everywhere so it’s easier to learn the language, however sadly we’re already stuck with the options-in-parenthesis due to backwards compatibility. The question that must be answered now is what’s easier for users to learn and continually use.

mofosyne · September 7, 2014, 4:42pm

I don’t see why we can’t support both methods.
You could call options-in-parenthesis inside (/url "My image") as “shorthand” for {title="My image"}.

We need to encourage {} but recognize that in most use cases, people will only need to declare a title, and thus contextually ![image](/url "My image") is enough.

Just make it so that internally the parser automagically converts:

![image](/url "My image") --> ![image](/url){title="My image"}

tl;dr: Treat options-in-parenthesis as syntactic sugar for {}.
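The conversion above could be sketched like this in Python. The regular expression is a simplification (no nested brackets, no escaped quotes, no spaces in the URL) and the function name is hypothetical; it only shows the idea of rewriting the shorthand into the attribute form.

```python
import re

# Simplified image pattern: ![alt](url "optional title"){optional attrs}
IMAGE = re.compile(
    r'!\[(?P<alt>[^\]]*)\]\((?P<url>\S+?)(?:\s+"(?P<title>[^"]*)")?\)'
    r'(?:\{(?P<attrs>[^}]*)\})?'
)


def normalize_image(markdown):
    """Rewrite ![alt](url "title") as ![alt](url){title="title"}."""
    def repl(m):
        attrs = m.group("attrs") or ""
        title = m.group("title")
        if title is not None:
            # Move the shorthand title into the attribute block.
            attrs = f'title="{title}" {attrs}'.strip()
        suffix = f"{{{attrs}}}" if attrs else ""
        return f'![{m.group("alt")}]({m.group("url")}){suffix}'
    return IMAGE.sub(repl, markdown)
```

After this step the rest of the processor only has to deal with one place where options can live.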

mb21 · September 7, 2014, 5:48pm

Since {} is already in use for attributes, I tend to favour it over () for options. Personally I find the title in ![altText](image.png "title") kind of unfortunate, but it’s way too late to change that. But when writing custom directives, the discussion here convinced me that we should discourage the use of multiple things in (). The question is whether we should actually disallow it. I just edited my first post above and changed (options) to (arg) to reflect this.

mofosyne · September 7, 2014, 6:17pm

Depends on how much you hate programmers lol. Hmmm… should we actively disallow the use of multiple things in ()?

For `![]()` media:

The answer is no, since people use it as shorthand ![image](/url "My image") (and that is a good thing). Heck, in normal English, we use shorthand all the time, since we expect others to be able to derive the full meaning with a bit of context. Speed is of the essence in life.

 !mediaType[description](url){#myId .myClass key=val key2="val 2"}

mediaType :~ (e.g. !youtube) Assumed to be image if left blank

For `@[]()` directives:

I think it depends. In some cases it may make sense to have syntactic sugar in []. But since it’s an extension directive, just pass everything in arg as a single string to the extension, and let it deal with that issue. For consistency’s sake, however, {} should only hold key-value pairs to be passed to the extension, plus CSS IDs and classes as you have illustrated already.

@name[content](arg){#myId .myClass key=val key2="val 2"}

name :~ name of extension to call
arg :~ Everything in arg is just passed directly to the extension as a single string. It is up to the extension if it wants to implement any shorthand notations.
{} :~ The parser will read #myId and .myClass for HTML attributes before passing everything in {} to the extension

The {#myId .myClass key=val key2="val 2"} comes from this discussion about keeping attribute syntax consistent. Since we are passing attributes and parameters, I’m using this as reference.

http://talk.commonmark.org/t/consistent-attribute-syntax/

jroper · September 8, 2014, 2:09am

As someone who has written a few syntax extensions, I like this proposal.

In one of the comments, someone noted that parsers that don’t support this syntax will render these as ordinary links. In most of the cases above, that is actually the big advantage of this proposal, since they are effectively links, and rendering a link to them is a very appropriate fallback. We’re currently using a very similar syntax to this for code snippets, have a look at what the GitHub markdown renderer does with them:

https://github.com/playframework/playframework/blob/master/documentation/manual/scalaGuide/main/http/ScalaResults.md

It renders them as links to the actual file, which is a perfect fallback, and is a big reason why we chose that syntax.

mofosyne · September 8, 2014, 2:22am

Sweet, so with your example, if the extension was called “setContent-Type”, it would be:

 @setContent-Type[](code/ScalaResults.scala)

Example mockups for inspiration. What would a Python extension to markdown look like? Obviously the actual Python extension might treat arg and parameters a bit differently.

Scientific document with inline runnable snippets of python.

@@@python[ Test Prog ]( SciPy, NumPy ){ codePreview=yes }
    print("Hello World")
    # More python code for scientist here
@@@

So is there a consensus forming here? Any Objections?

jroper · September 8, 2014, 2:39am

Not quite; in my example, the extension would be called “snip”, and would look like this:

@snip[content-type_text](code/ScalaResults.scala)

snip is the extension (code snipped from an external file), and the bit in the [] is the label of the fragment of code snipped from the file. So in the file, you’ll see stuff like this:

    "A scala result" should {
      "default result Content-Type" in {
        //#content-type_text
        val textResult = Ok("Hello World!")
        //#content-type_text
        testContentType(textResult, "text/plain")
      }

And then what gets rendered is the val textResult = ....
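For illustration, the extraction step could be sketched like this in Python. This is not the actual Play documentation tooling; the function name, the default marker prefix, and the choice to stop at the second marker line are assumptions based on the excerpt above.

```python
import textwrap


def extract_snippet(source, label, marker="//#"):
    """Return the lines between the first pair of `marker + label` lines."""
    tag = marker + label
    collected = []
    inside = False
    for line in source.splitlines():
        if line.strip() == tag:
            if inside:
                break          # second marker: end of the snippet
            inside = True      # first marker: start collecting
            continue
        if inside:
            collected.append(line)
    # Strip the common indentation so the snippet renders flush left.
    return textwrap.dedent("\n".join(collected))
```

Applied to the excerpt above with the label content-type_text, this yields the single val textResult line. If the label is never found, the result is an empty string.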

Though, we might also consider something like this:

@snip[Some label](code/ScalaResults.scala){#content-type_text}

Now the bit in the [] might be an optional title rendered above the code snippet. This syntax I think would be more in line with the spirit of the description of each element in this proposal.

Of course, we’re talking about extensions to the markdown syntax here that are never intended to be part of markdown proper, all of this is just a guide, it’s not prescriptive, there’s no right or wrong way in how to use it.

mofosyne · September 8, 2014, 2:49am

Ah okay. Well, jroper, aside from the specifics of how your “snip” extension works, is there a consensus for the general extension syntax?

@name[content](arg){#myId .myClass key=val key2="val 2"}

@@@name[content](arg){#myId .myClass key=val key2="val 2"}
      \*Content to pass to extension*\
@@@

chrisalley · September 9, 2014, 10:27am

If there are going to be extensions using a JSON-style settings hash, I suggest making the list of attributes look more like JSON/YAML rather than HTML attributes. For example:

!audio[ title ]( url ){ size: 10, duration: 10, cycle: forever }

is more readable than:

!audio[ title ]( url ){ size=10 duration=10 cycle=forever }

The lack of whitespace between the key/value pairs in the latter example makes the syntax feel squashed.

dajare · September 9, 2014, 11:20am

Note also that MarkdownExtra uses curly braces for attributes as well. It has been around a long time (the curly braces since at least the 1.0 of PHP MarkdownExtra, dated 5 September 2005! FWIW…).

mofosyne · September 9, 2014, 11:29am

Yup, and it is our prime inspiration in this thread. Pandoc as always is ahead of the pack. One may say we are pulling up markdown. I say we are catching up to Pandoc.

Any further objection or discussion on syntax? If not, then this can be done as a feature request in.

mofosyne · September 10, 2014, 2:53pm

Most people writing markdown are more familiar with HTML, I would think. Plus CommonMark accepts HTML tags, so it is best to minimise the mental switching cost by keeping the syntax somewhat the same.

Also, {} is not encouraged to be used liberally; it should only really be used when there are no other alternatives. So keeping it short, quick and dirty, and compact will minimize its visual presence in the page.

EnCey · September 10, 2014, 8:44pm

As someone who works a lot with JSON, I can confirm the first version feels more readable to me – I however doubt that this is a generic rule applicable to everyone. I don’t believe that a key-value pair using colons : is generally more readable than one using equality signs =.

I also don’t believe we should use colons just because JSON also happens to use curly braces.

On the other side, I wouldn’t see any harm in allowing either colons : or equality signs = when specifying attributes, or allowing optional commas ,. The values on the right-hand side of a key-value pair shouldn’t contain either symbol and the additional overhead for parsers is minimal.

It makes the syntax more complicated to learn if there are 2 alternate ways to specify attributes, but also more flexible. Either solution has its merits.
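As a rough illustration of how small the parser overhead could be, here is a Python sketch that accepts either : or = between key and value and treats commas as optional. It assumes, as stated above, that unquoted values contain neither symbol; the pattern and function name are made up for this example.

```python
import re

# key, then ':' or '=', then either a double-quoted value or a bare value
# that stops at whitespace, a comma, a colon, '=' or a closing brace.
PAIR = re.compile(
    r'(?P<key>[\w-]+)\s*[:=]\s*'
    r'(?:"(?P<quoted>[^"]*)"|(?P<bare>[^\s,:=}]+))'
)


def parse_lenient_attributes(raw):
    """Parse key/value pairs written with ':' or '=', commas optional."""
    result = {}
    for m in PAIR.finditer(raw):
        value = m.group("quoted")
        if value is None:
            value = m.group("bare")
        result[m.group("key")] = value
    return result
```

Both { size: 10, duration: 10, cycle: forever } and { size=10 duration=10 cycle=forever } come out as the same dictionary under this sketch.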