Generic directives/plugins syntax

Or, a somewhat more coherent version of my last post:

I think inline directives will always smell kinda bad, because plain text doesn’t really do anything with them. So, reusing an existing syntax at least minimizes the additional bad-smell introduced to the language, and increases learnability. So, I think we should stick with !foo for inlines, a la images, rather than introducing another meaningless bit of characters. There’s no functional difference between !foo and ::foo otherwise, just the prefix.

For blocks, I think we can usefully separate them into three categories:

  1. Leaf blocks
  2. Code-containing containers
  3. Markdown-containing containers

#1 is technically a subset of the others, but a lot of languages have special support for leaf blocks, and I think they look decent enough in plain text to be worth supporting.

#2 and #3 should be distinguished so that processors which don’t understand the extension can do something useful - show preformatted text, or continue formatting contents as markdown. This is an important feature of generic extensions.

I think tagged code blocks already fulfill #2. It works for the obvious case of syntax highlighting, and I think is appropriate for “evaluated” code as well - falling back to displaying raw LaTeX, or raw railroad diagram descriptions, is totally fine and useful.

I think the OP’s proposal for using :::foo for #3 is good. It doesn’t clash with existing syntaxes, and it has good plain-text aesthetics, particularly when used with no arguments.

I also think it’s good for #1, for consistency. I assume you tell a leaf apart from a container by the presence of a blank line after it?

So you think !foo[](){} is preferable to just !foo[]{}?

Not hugely important to me. I’m fine if the core language directives use some special syntax; I don’t think we really need to “explain” ref links ([foo][ref]) in this syntax.

On the other hand, consistency is nice - the two existing forms that use () share a syntax for that part: a url, and possibly a string for the title. We could enforce this consistency: allow () on custom directives, but force it to have that same syntax. Not all custom directives need a url, but for those that do, having them all use the same basic syntax (rather than having to remember which key this or that particular custom directive uses) is good. (I liked the examples of !video[foo](/foo.mpeg)/etc, those seemed very easy to understand if you already knew how images worked.)

Using () on a directive that doesn’t use that information is the same as passing a key it doesn’t use - it’s just wasted data.

(Things with more complex needs, like the earlier slideshow example that needs multiple urls, couldn’t use () for that. That’s fine, I think.)


Edit: I expressed an opinion on the syntax inside of {} here, but I was wrong. Matching Pandoc and Markdown Extra is valuable here; we should use their syntax, per http://talk.commonmark.org/t/consistent-attribute-syntax/272.

If 2. is implemented using fenced code blocks with attributes, that makes sense. Otherwise, the implementation could always keep the raw string contents of the container in the AST (in addition to the parsed markdown content) so that plugins etc. can do whatever they want with it.


No, the container blocks are closed with ::: on its own line, while leaf blocks are not. For nesting, a different number of colons can be used, using the same rule as currently for fenced code blocks.

If 2. is implemented using fenced code blocks with attributes, that makes sense. Otherwise, the implementation could always keep the raw string contents of the container in the AST (in addition to the parsed markdown content) so that plugins etc. can do whatever they want with it.

My reason for separating the two was because they have different “ideal fallbacks” when presented in viewers that don’t understand the extension. “code containers” should fallback to preformatted text, while “markdown containers” should fall back to formatting their contents.
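A minimal sketch of those two fallbacks, for a processor that does not recognise the directive name. The kind labels ("code", "markdown") and the render_markdown stand-in are assumptions made for illustration; they are not part of any proposal in this thread.

```python
import html


def render_markdown(text):
    # Stand-in for the processor's normal Markdown pipeline (hypothetical):
    # it only wraps blank-line-separated chunks in <p> tags.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return "\n".join("<p>" + html.escape(p) + "</p>" for p in paragraphs)


def fallback_render(kind, body):
    """Render the body of a directive the processor does not understand."""
    if kind == "code":
        # Code containers fall back to preformatted text.
        return "<pre><code>" + html.escape(body) + "</code></pre>"
    if kind == "markdown":
        # Markdown containers fall back to formatting their contents.
        return render_markdown(body)
    raise ValueError("unknown container kind: " + repr(kind))
```

The point is only that the fallback is chosen from the syntax used (tagged code block versus ::: container), never from the directive name, so an unknown extension still degrades sensibly.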

No, the leaf blocks are closed with ::: on its own line, while container blocks are not. For nesting, a different number of colons can be used, using the same rule as currently for fenced code blocks.

The OP doesn’t have ::: closing any of the leaf directives, and one of the container directive examples (:::::SPOILERS:::::) has ::: on both sides of the initial tag.

Oh, obviously it’s exactly the other way, edited my post to:

No, the container blocks are closed with ::: on its own line, while leaf blocks are not.

btw, I’m the OP :wink:

Oh, obviously it’s exactly the other way,

The problem is that you currently have to search forward an arbitrary number of lines to tell whether the directive is a leaf or a container. That’s not very good for parsing; ideally you have a finite amount of lookahead (right now, parsing MD requires 1 or 2 lines of lookahead, depending on your strategy in some cases). It also means that you won’t be able to use a bare ::: without any further arguments to just mean “div”, as that would also trigger the “I’m a container!” behavior of a preceding block directive.

It would be better if there was some way to distinguish containers from leaves in the first line, or the line after. That’s why I suggested “blank line after” to mean leaf; it seems consistent with how we handle other types of blocks, and is easy to learn. (“end of container block” would also work, obviously, for those blocks where that’s unambiguous.)
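A rough sketch of the "blank line after" rule, to show that it needs only one line of lookahead. The function name is hypothetical, and treating the end of the input as a blank line is an assumption of this sketch.

```python
def classify_block_directive(lines, i):
    """Classify the ':::name' line at index i as 'leaf' or 'container'.

    Only lines[i + 1] is inspected, so the lookahead is bounded.
    """
    if not lines[i].startswith(":::"):
        raise ValueError("not a block directive line: " + repr(lines[i]))
    # Assumption: running off the end of the input counts as a blank line.
    next_line = lines[i + 1] if i + 1 < len(lines) else ""
    return "leaf" if next_line.strip() == "" else "container"
```

For example, classify_block_directive([":::video", "", "text"], 0) gives "leaf", while a directive line followed directly by content gives "container".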

btw, I’m the OP

I know, I was using it to mean “original post”, to specifically point at the first post in the thread for clarity.

I’m no parsing expert, but fenced code blocks currently work the same way. You potentially have to look ahead to the end of the document to make sure that three backticks aren’t starting a code block, but are actually just part of the paragraph text. jgm seems to be fine with the proposal for pandoc as well.

Since you usually don’t want an empty div, I think it makes sense for the div syntax to require a closing line as well:

::: {.my-empty-div}
:::

Without a blank-line-after rule, you can do something like:

::: spoiler

# my title

my very long paragraph...

:::

Hmm, you’re right. The last MD implementation I wrote used ~~~-fenced code blocks (a la PHP Markdown Extra), so I didn’t even think of the issue with code and backtick fenced blocks.

Well, since we already have to deal with arbitrary lookahead (and GitHub-flavored MD being popular enough that we can’t really lose backtick-fenced blocks), I guess it’s not an issue to add an additional instance of arbitrary lookahead.

My question about div was more this example:

:::foo
some text
:::
This is supposed to be wrapped in a div.
:::
This should still be in the "foo" block.
:::

If you have anything in the div starting line, this works as expected - the “foo” block stays open until the end, etc. But if you have just an anonymous div block, it instead looks like a closing line for the “foo” block, and the subsequent ::: line (should be closing the div) instead opens a div.

It may be that you intend to require something on a container’s opening line - a name, some options, anything - so that a bare ::: line is always a closer. That’s fine, if so.

Or maybe you just intend this to be handled by the generic “nesting things inside of a block requires the outer block to have more colons than the inner one; otherwise weird things might happen”. That would be reasonable too.
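To make the ambiguity concrete, here is a small tracer that follows one possible reading of the rules: a bare run of colons closes the innermost open block when it is at least as long as that block's opener (the fenced-code-block rule), and otherwise opens an anonymous div. The function name and the "div" label are assumptions for illustration only.

```python
import re

FENCE = re.compile(r"^(:{3,})\s*(.*?)\s*$")


def trace_fences(lines):
    """Return the open/close events produced by the ':::' lines."""
    stack = []   # (colon_count, name) for each open block
    events = []
    for line in lines:
        m = FENCE.match(line)
        if m is None:
            continue  # ordinary content line
        colons, rest = len(m.group(1)), m.group(2)
        if rest == "" and stack and colons >= stack[-1][0]:
            # A bare fence that is long enough closes the innermost block.
            events.append(("close", stack.pop()[1]))
        else:
            # Anything else opens a block; a bare fence opens an anonymous div.
            name = rest or "div"
            stack.append((colons, name))
            events.append(("open", name))
    return events
```

On the seven-line example above, the first bare ::: closes "foo" and the next one opens a div. Opening the outer block as ::::foo and closing it with :::: nests the anonymous div as intended.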

Yes, I think both approaches are viable. If we’re going with the first, a div with no attributes would still have to look like this:

::: {}
my content
:::

+++ Mb21 [Nov 13 14 19:19 ]:

 I'm no parsing expert, but fenced code blocks currently work the same
 way. You potentially have to look ahead to the end of the document to
 make sure that three backticks aren't starting a code block, but are
 actually just part of the paragraph text.

The way it works in CommonMark, a fenced code block starts with an opening fence and continues until either a matching closing fence or the end of the containing block (which might be the whole document). So you never need to look ahead. Avoiding lookahead that might need to reach to the end of the document is important.

With a fenced syntax for a block container, the following strategy could be used. If no closing fence is parsed before the end of the document (or enclosing container, which might e.g. be a block quote or list item), then we simply emit a paragraph with the raw contents of the opening fence, followed by the blocks parsed. Or, by analogy with the fenced code blocks, we could treat the unclosed container as if it were closed. Either way we’d avoid the need to look ahead.
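A sketch of the second option (an unclosed container is treated as closed at the end of the input), written as a single forward pass with no lookahead. The dict-based tree and the function name are assumptions of this sketch, not how any real CommonMark implementation represents blocks.

```python
import re

OPEN = re.compile(r"^(:{3,})\s*([^:\s].*)$")   # ::: name or ::: {attrs}
CLOSE = re.compile(r"^(:{3,})\s*$")            # bare run of colons


def parse_containers(lines):
    """Build a nested tree of containers in one pass over the lines."""
    root = {"name": "document", "children": []}
    stack = [(0, root)]  # (colon_count, node); the root is never popped
    for line in lines:
        closing = CLOSE.match(line)
        opening = OPEN.match(line)
        if closing and len(stack) > 1 and len(closing.group(1)) >= stack[-1][0]:
            stack.pop()
        elif opening:
            node = {"name": opening.group(2).strip(), "children": []}
            stack[-1][1]["children"].append(node)
            stack.append((len(opening.group(1)), node))
        else:
            # Plain content, or a stray closer with nothing left to close.
            stack[-1][1]["children"].append(line)
    # End of input: containers still on the stack are simply left as they
    # are, which is the same as treating them as closed here.
    return root
```

Each line is examined exactly once, so the cost stays linear even when a closing fence is missing.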

 jgm seems to be fine with the proposal for pandoc as well.

Do you infer that from my silence?

Thank you for the clarification. If I understand correctly, it is possible to efficiently parse and differentiate between

:::foo

as a leaf block directive, and

:::foo
:::

as a container block directive.


What I meant was that you were fine with this div-syntax from an implementation-perspective for pandoc (since you originally posted it on pandoc-discuss), not that it was your favourite of the different proposals. Sorry for the confusion.

It would be better if there was some way to distinguish containers from leaves in the first line, or the line after.

That’s why I think my proposal is better for block directives (you can tell a leaf from a container quickly, from the first line):

leaf

!spoiler: harrys kills voltmort. But spares hermimi

container

!spoiler: toggle visibility on hover
:::
  harrys kills voltmort
  But spares hermimi
:::

More compact

!spoiler::::::::::::::::
  harrys kills voltmort
  But spares hermimi
::::::::::::::::::::::::

A compact version for single paragraph directive.

!spoiler: 
  harrys kills voltmort
  But spares hermimi

where

block = !spoiler:

inline = !spoiler

+++ Mb21 [Nov 13 14 22:14 ]:

 Thank you for the clarification. If I understand correctly, it is
 possible to efficiently parse and differentiate between

:::foo

 as a leaf block directive, and

:::foo
:::

 as a container block directive.

I may have misunderstood the issue; I was only commenting on container blocks. A leaf block would presumably have a sequence of inline elements (rather than block elements) as its contents. I think, if generic leaf blocks were wanted, that it would be better to distinguish them syntactically from the starters for container blocks.

A leaf block would presumably have a sequence of inline elements (rather than block elements) as its contents.

Not quite; while a block leaf can contain inline markdown in its [] part, or as the value of one of its {} options, the “leaf” concept is really just about having a block that doesn’t have any further contents, like a video block.

I just read this interesting discussion and I’d like to chime in with a few remarks:

I don’t really like the ![](){} --> !default[](){} --> !image[](){} idea with redirecting to image or video based on the provided URL. This ties the generated output to a particular implementation of the parser which knows how to recognize if something is an image, a video, or something else. Parsers will have to look into each other’s implementation to figure out how to classify a particular URL to remain consistent. The output would also change if generated before or after a new video or image site or file format appears and is added to the list of known URLs. To me the ![]() image syntax is part of the core spec and as such should be strict and well-defined and always cause the same output to be generated in all cases. This would also make unit testing of implementations easier.

On the other hand, extensions can be completely free to generate anything, and it should be explicit that extensions are not standard syntax. That’s why I propose to distinguish clearly between extensions and core syntax by using a different marker character, for example, as previously suggested, @ instead of !. That way writers will know that a particular feature they’re using is tied to a particular website, CMS, or parser. Allowing an empty extension name would make this possible: @[](){} --> @default[](){} --> !image[](){}. This has the added benefit of avoiding name clashes between extensions and any future evolution of the spec, which might want to define a new name but could not if a widely used extension had already taken it. I also rather like that @ stands out a little more than !, which makes extension uses easier to spot when rapidly scrolling through a document.

Concerning the multiple ways to provide arguments using [](){}, to me it looks like it could be simplified by considering that:

  1. [] and () are both used to provide one or two unnamed ordered arguments
  2. {} is used both to provide unnamed ordered arguments and named arguments

So [], () and unnamed arguments in {} could be used interchangeably. For example, @extension[argument 1](argument 2){argument 3, other: argument 4} and @extension[argument 1][argument 2][argument 3]{other: argument 4} are equivalent. Ultimately one of [] or () seems redundant but it is already widely used so they could be considered as alternative ways to say the same thing, as seems to be the general philosophy of the spec.
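A rough sketch of that equivalence: every [] and () group contributes one positional argument, and {} contributes positional arguments plus key: value pairs. The function name and the comma-separated key: value format inside {} are taken from the example above or assumed for illustration; brackets are not allowed to nest, and a value containing a comma or colon inside {} would be split incorrectly.

```python
import re

HEAD = re.compile(r"^@(\w*)")
GROUP = re.compile(r"\[([^\]]*)\]|\(([^)]*)\)|\{([^}]*)\}")


def parse_extension(text):
    """Return (name, positional_args, named_args) for an @extension call."""
    head = HEAD.match(text)
    if head is None:
        raise ValueError("not an extension call: " + repr(text))
    name = head.group(1) or "default"  # an empty name maps to "default"
    positional, named = [], {}
    pos = head.end()
    while pos < len(text):
        m = GROUP.match(text, pos)
        if m is None:
            break  # anything after the last group is ignored
        square, paren, curly = m.groups()
        if curly is not None:
            for item in curly.split(","):
                item = item.strip()
                if not item:
                    continue
                if ":" in item:
                    key, value = item.split(":", 1)
                    named[key.strip()] = value.strip()
                else:
                    positional.append(item)
        else:
            positional.append(square if square is not None else paren)
        pos = m.end()
    return name, positional, named
```

Both spellings from the paragraph above then parse to the same name, the same three positional arguments, and the same single named argument.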

Right, I guess @tabatkins was on to something and using a different opening-syntax for leaf-blocks and container-blocks would avoid lookahead in almost all cases; e.g. !foo[] for inline directives, :foo[] for leaf block directives, and for container block directives:

::: foo []
:::

I edited the original post accordingly.

While separating leaf and container directives isn’t strictly necessary, it would enable one-liners for leaf blocks, instead of having the closing ::: always needed.
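With three distinct openers, the kind of directive can be read off the line itself. A minimal sketch, assuming directive names are plain word characters and that a bare run of three or more colons is always a closer (both are assumptions, not settled parts of the proposal):

```python
import re

CLOSER = re.compile(r"^:{3,}\s*$")         # :::
CONTAINER = re.compile(r"^:{3,}\s*(\w+)")  # ::: foo []
LEAF = re.compile(r"^:(\w+)")              # :foo[]
INLINE = re.compile(r"!(\w+)\[")           # !foo[] somewhere in the line


def classify_line(line):
    """Return (kind, name) using nothing but the current line."""
    if CLOSER.match(line):
        return ("close", None)
    m = CONTAINER.match(line)
    if m:
        return ("container", m.group(1))
    m = LEAF.match(line)
    if m:
        return ("leaf", m.group(1))
    m = INLINE.search(line)
    if m:
        return ("inline", m.group(1))
    return ("text", None)
```

A plain image such as ![alt](url) is not matched by INLINE because no name follows the !, so core image syntax stays untouched.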

All in all, the goal is to make writing directives dead-simple. For example, with the following file named youtube.html placed in a leaf-block-directives-folder:

<figure>
    <iframe width="420" height="315" src="//www.youtube.com/embed/$vid$" frameborder="0" allowfullscreen></iframe>
    <figcaption>$content$</figcaption>
</figure>

The markdown :youtube[my _funny_ video]{ vid=09jf3ow9jfw } would render to, well, to what you’d expect.
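A sketch of how such a template could be filled in. The $name$ placeholder handling, the function name, and the choice to escape attribute values but not the already-rendered content are assumptions of this sketch.

```python
import html
import re

TEMPLATE = """<figure>
    <iframe width="420" height="315" src="//www.youtube.com/embed/$vid$" frameborder="0" allowfullscreen></iframe>
    <figcaption>$content$</figcaption>
</figure>"""

PLACEHOLDER = re.compile(r"\$(\w+)\$")


def render_leaf(template, content_html, attributes):
    """Fill $name$ placeholders from the {} attributes and the [] content.

    content_html is assumed to be inline Markdown already rendered to HTML;
    attribute values are escaped before insertion.
    """
    values = {key: html.escape(value, quote=True)
              for key, value in attributes.items()}
    values["content"] = content_html
    # Placeholders without a matching value are left in place.
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), template)
```

Calling render_leaf(TEMPLATE, "my <em>funny</em> video", {"vid": "09jf3ow9jfw"}) puts the video id into the iframe src and the rendered caption into the figcaption.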

+1

Yes, that’s what we figured as well. That’s why the (edited) first post currently doesn’t include the () anymore.

That’s an example of how to show a video instead of an image without a syntax change. I use something similar everywhere (but replacing plain links).

I think video from YouTube & Vimeo is the most “popular” request for markdown.

I think we have to be careful here not to make things too complicated. In my view, if we get to the point where the syntax isn’t easily readable and understandable by someone completely unfamiliar with Markdown, then we might as well just use inline HTML.

The syntax proposed in the original post looks okay, in my opinion. This is something we have to be very careful about though.

+++ Andrew Meyer [Jan 21 15 16:45 ]:

I think we have to be careful here not to make things too complicated. In my view, if we get to the point where the syntax isn’t easily readable and understandable by someone completely unfamiliar with Markdown, then we might as well just use inline HTML.

The syntax proposed in the original post looks okay, in my opinion. This is something we have to be very careful about though.

I agree that a lot of care is needed, and that it would be bad to end up with something that looks too markupy and not “markdownish.”

On the other hand, using inline HTML has the big disadvantage that it only works if you’re targeting HTML. In principle someone might want to render a CommonMark document to multiple formats, and then they’d want some format-neutral way of indicating this structure.
