Generic directives/plugins syntax

Maybe it would be worth moving the discussion of block directive markup to a separate thread? I’d like to implement something. It seems that such blocks are independent of inlines, and their syntax can be simpler.


I agree. The inline discussion is stabilising around ![](){} vs @[](){}, but there is still some debate around block syntax. There should be a separate discussion on block syntax.

http://talk.commonmark.org/t/block-directives/802


nexussays: It’s a directive, not a function. It only specifies the user’s intent, not how it looks in the output. So the question about HTML element output makes no sense in this context, because it ignores other output formats like PDF or LaTeX.

@mb21, could you join the Block Directives discussion? I think blocks have a better chance of being stabilized in a short time.

I’ve updated the first post in this thread with my takeaways from the block directives thread…

Upon @jgm’s remark, which echoes concerns that have been voiced before, and after reading through these pandoc div syntax proposals, I’ve tried to simplify the syntax even more and updated the first post yet again. So now the (arg) is gone and wrapped into the attributes.

Having read the entire thread, I find the use of !foo[](){} convincing for inline directives (where !image[]() is the default directive that you get if you don’t specify a name). I don’t think there’s sufficient goodness from other syntaxes to justify introducing something new here; they’ll always look somewhat code-y, so we might as well just reuse the existing code-y smell we’ve already accepted. I also like the use of []{} for generic spans, while we’re at it.

For blocks, I’m still somewhat convinced by “just use tagged code blocks”, again for the “just re-use the syntax we already have” reasons. The html-based input language for the spec preprocessor I maintain has a number of parsing extensions done with <pre class=foo>, which is very similar, and it seems to work fine. However, these extensions are all special-syntaxes, not nestable HTML; in other words, they’re a type of code, not a type of content. It may make sense to keep tagged code blocks for, well, code, even when it’s some specialized language. A block of latex that’s meant to be rendered, for example, might be good for this.

So then, the case in the OP for ::: block directives is relatively convincing. It allows both leaf and container blocks, which is great. It makes nesting of content possible without the processor needing to know what’s going on (it can just render unknown container directives as divs and continue processing the contents). Tagged code blocks can’t do this - if the processor doesn’t know the tag, it has to treat it as a <pre>. And I think the aesthetics are just quite nice, particularly the “spoiler” example without any arguments. That looks like something I might actually write in my plain-text documents, which is a big plus in its favor. None of the other proposed syntaxes have this.

So, in summary, I think we should be consistent with image links and use !foo for inline custom directives. We should use tagged code blocks for code, including cases where the code is interpreted and the result is somehow used in the place of the code block, like inline LaTeX. We should use the :::foo syntax for leaf custom blocks, and container custom blocks that contain more markdown. This introduces a minimum of new syntax, while preserving some imo decent plain-text aesthetics.


Thanks for sifting through the whole discussion! So you think !foo[](){} is preferable to just !foo[]{}? The () were supposed to contain an id, url or path, similar to the image syntax. However, this can always be done with {src=foo}. It’s a tradeoff between a more concise but harder to explain/remember syntax, and a simpler alternative with only contents and attributes.
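To make the tradeoff concrete, here is a minimal Python sketch of how a processor might treat the two forms as equivalent. The grammar in the regex, the function name `parse_inline_directive`, and the choice of `src` as the key that `()` maps to are all assumptions for illustration, not anything agreed in this thread; nested brackets and `.class`/`#id` shorthands are deliberately not handled.

```python
import re

# Hypothetical grammar for illustration: !name[content](target){attrs}
# The (target) and {attrs} parts are optional; nested brackets are not handled.
INLINE_DIRECTIVE = re.compile(
    r"!(?P<name>[A-Za-z][\w-]*)?"   # optional name; a bare "!" means image
    r"\[(?P<content>[^\]]*)\]"      # [content]
    r"(?:\((?P<target>[^)]*)\))?"   # optional (target)
    r"(?:\{(?P<attrs>[^}]*)\})?"    # optional {attrs}
)


def parse_inline_directive(text):
    """Return a dict describing the first inline directive in text, or None."""
    match = INLINE_DIRECTIVE.search(text)
    if match is None:
        return None
    attrs = {}
    # Only key=value attributes are kept in this sketch.
    for item in (match.group("attrs") or "").split():
        key, sep, value = item.partition("=")
        if sep:
            attrs[key] = value
    # Fold the (target) shorthand into the attributes, so that
    # !video[x](/a.mpeg) and !video[x]{src=/a.mpeg} come out identical.
    if match.group("target") is not None:
        attrs.setdefault("src", match.group("target"))
    return {
        "name": match.group("name") or "image",
        "content": match.group("content"),
        "attrs": attrs,
    }
```

Under these assumptions, `!video[foo](/foo.mpeg)` and `!video[foo]{src=/foo.mpeg}` parse to the same structure, which is the sense in which `()` is only a shorthand.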

(btw, I think !foo, @foo and ::foo are all fine for inline directives.)

Or, a somewhat more coherent version of my last post:

I think inline directives will always smell kinda bad, because plain text doesn’t really do anything with them. So, reusing an existing syntax at least minimizes the additional bad-smell introduced to the language, and increases learnability. So, I think we should stick with !foo for inlines, a la images, rather than introducing another meaningless bit of characters. There’s no functional difference between !foo and ::foo otherwise, just the prefix.

For blocks, I think we can usefully separate them into three categories:

  1. Leaf blocks
  2. Code-containing containers
  3. Markdown-containing containers

#1 is technically a subset of the others, but a lot of languages have special support for them, and I think they look decent enough in plain text to be worth supporting.

#2 and #3 should be distinguished so that processors which don’t understand the extension can do something useful - show preformatted text, or continue formatting contents as markdown. This is an important feature of generic extensions.

I think tagged code blocks already fulfill #2. It works for the obvious case of syntax highlighting, and I think is appropriate for “evaluated” code as well - falling back to displaying raw LaTeX, or raw railroad diagram descriptions, is totally fine and useful.

I think the OP’s proposal for using :::foo for #3 is good. It doesn’t clash with existing syntaxes, and it has good plain-text aesthetics, particularly when used with no arguments.

I also think it’s good for #1, for consistency. I assume you tell leaf apart from container by the presence of a blank line after it?


So you think !foo[](){} is preferable to just !foo[]{}?

Not hugely important to me. I’m fine if the core language directives use some special syntax; I don’t think we really need to “explain” ref links ([foo][ref]) in this syntax.

On the other hand, consistency is nice - the two existing forms that use () have a shared syntax for that part: a url, and possibly a string for a title. We could enforce this consistency: allow () on custom directives, but force it to have that same syntax. Not all custom directives need a url, but for those that do, having them all use the same basic syntax (rather than having to remember which key this or that particular custom directive uses) is good. (I liked the examples of !video[foo](/foo.mpeg)/etc, those seemed very easy to understand if you already knew how images worked.)

Using () on a directive that doesn’t use that information is the same as passing a key it doesn’t use - it’s just wasted data.

(Things with more complex needs, like the earlier slideshow example that needs multiple urls, couldn’t use () for that. That’s fine, I think.)


Edit: I expressed an opinion on the syntax inside of {} here, but I was wrong. Matching PanDoc and MarkdownExtra is valuable here; we should use their syntax, per http://talk.commonmark.org/t/consistent-attribute-syntax/272.

If 2. is implemented using fenced code blocks with attributes, that makes sense. Otherwise, the implementation could always keep the raw strings contents of the container in the AST (in addition to the parsed markdown-content) so that plugins etc. can do with it whatever they want.


No, the container blocks are closed with ::: on its own line, while leaf blocks are not. For nesting, a different number of colons can be used, using the same rule as currently for fenced code blocks.
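As a rough illustration of the nesting rule, here is a Python sketch that builds a tree from ::: fences. It assumes that an opener is three or more colons followed by a bare name, that a closer is three or more colons alone, and that a closer must be at least as long as the opener it closes (borrowed from fenced code blocks). Every named opener is treated as a container here; leaf blocks and attributes are left out, and the names `OPENER`, `CLOSER` and `parse_containers` are made up for the example.

```python
import re

# Assumed line shapes, for illustration only:
#   opener: three or more colons followed by a name, e.g. "::::spoiler"
#   closer: three or more colons and nothing else
OPENER = re.compile(r"^(:{3,})\s*([A-Za-z][\w-]*)\s*$")
CLOSER = re.compile(r"^(:{3,})\s*$")


def parse_containers(lines):
    """Build a nested tree of container directives from a list of lines."""
    root = {"name": "document", "fence": 0, "children": []}
    stack = [root]
    for line in lines:
        opener = OPENER.match(line)
        closer = CLOSER.match(line)
        if opener:
            node = {
                "name": opener.group(2),
                "fence": len(opener.group(1)),
                "children": [],
            }
            stack[-1]["children"].append(node)
            stack.append(node)
        elif closer and len(stack) > 1 and len(closer.group(1)) >= stack[-1]["fence"]:
            # Same rule as fenced code: the closer must be at least as
            # long as the opener of the innermost open container.
            stack.pop()
        else:
            # Anything else is kept as a raw text line in this sketch.
            stack[-1]["children"].append(line)
    return root
```

With a four-colon outer block and a three-colon inner block, the bare `:::` closes only the inner one and the bare `::::` closes the outer one. Containers still open at the end of the input are simply left as they are, which amounts to closing them implicitly.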

If 2. is implemented using fenced code blocks with attributes, that makes sense. Otherwise, the implementation could always keep the raw strings contents of the container in the AST (in addition to the parsed markdown-content) so that plugins etc. can do with it whatever they want.

My reason for separating the two was because they have different “ideal fallbacks” when presented in viewers that don’t understand the extension. “code containers” should fallback to preformatted text, while “markdown containers” should fall back to formatting their contents.
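A small Python sketch of those two fallbacks, assuming HTML output purely for illustration (the same split would apply to other output formats). `KNOWN_DIRECTIVES`, `render_markdown` and `render_block` are hypothetical names, and `render_markdown` is only a stand-in for a real Markdown renderer.

```python
import html

# Hypothetical registry of directive names this processor understands.
KNOWN_DIRECTIVES = {}


def render_markdown(text):
    """Stand-in for a real Markdown renderer: one <p> per blank-line-separated chunk."""
    chunks = [c.strip() for c in text.split("\n\n") if c.strip()]
    return "".join("<p>" + html.escape(c) + "</p>" for c in chunks)


def render_block(kind, name, body):
    """Render a block directive, falling back when the name is unknown.

    kind is "code" for a tagged code block or "markdown" for a ::: container.
    """
    handler = KNOWN_DIRECTIVES.get(name)
    if handler is not None:
        return handler(body)
    if kind == "code":
        # Unknown code container: show the raw source as preformatted text.
        return '<pre class="' + html.escape(name) + '">' + html.escape(body) + "</pre>"
    # Unknown markdown container: wrap in a div and keep formatting the contents.
    return '<div class="' + html.escape(name) + '">' + render_markdown(body) + "</div>"
```

The point of the split is that the processor can pick a sensible fallback from the syntax alone, without knowing anything about the directive name.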

No, the leaf blocks are closed with ::: on its own line, while container blocks are not. For nesting, a different number of colons can be used, using the same rule as currently for fenced code blocks.

The OP doesn’t have ::: closing any of the leaf directives, and one of the container directive examples (:::::SPOILERS:::::) has ::: on both sides of the initial tag.

Oh, obviously it’s exactly the other way, edited my post to:

No, the container blocks are closed with ::: on its own line, while leaf blocks are not.

btw, I’m the OP :wink:

Oh, obviously it’s exactly the other way,

The problem is that you currently have to search forward an arbitrary number of lines to tell whether the directive is a leaf or a container. That’s not very good for parsing; ideally you have a finite amount of lookahead (right now, parsing MD requires 1 or 2 lines of lookahead, depending on your strategy in some cases). It also means that you won’t be able to use a bare ::: without any further arguments to just mean “div”, as that would also trigger the “I’m a container!” behavior of a preceding block directive.

It would be better if there was some way to distinguish containers from leaves in the first line, or the line after. That’s why I suggested “blank line after” to mean leaf; it seems consistent with how we handle other types of blocks, and is easy to learn. (“end of container block” would also work, obviously, for those blocks where that’s unambiguous.)

btw, I’m the OP

I know, I was using it to mean “original post”, to specifically point at the first post in the thread for clarity.

I’m no parsing expert, but fenced code blocks currently work the same way. You potentially have to look ahead to the end of the document to make sure that three backticks aren’t starting a code block, but are actually just part of the paragraph text. jgm seems to be fine with the proposal for pandoc as well.

Since you usually don’t want an empty div, I think it makes sense for the div syntax to require a closing fence as well:

::: {.my-empty-div}
:::

Without a blank-line-after rule, you can do something like:

::: spoiler

# my title

my very long paragraph...

:::

Hmm, you’re right. The last MD implementation I wrote used ~~~-fenced code blocks (a la PHP Markdown Extra), so I didn’t even think of the issue with code and backtick fenced blocks.

Well, since we already have to deal with arbitrary lookahead (and GitHub-flavored MD being popular enough that we can’t really lose backtick-fenced blocks), I guess it’s not an issue to add an additional instance of arbitrary lookahead.

My question about div was more this example:

:::foo
some text
:::
This is supposed to be wrapped in a div.
:::
This should still be in the "foo" block.
:::

If you have anything in the div starting line, this works as expected - the “foo” block stays open until the end, etc. But if you have just an anonymous div block, it instead looks like a closing line for the “foo” block, and the subsequent ::: line (should be closing the div) instead opens a div.

It may be that you intend to require something on a container line - a name, some options, anything - so that a bare ::: line is always a closer. That’s fine, if so.

Or maybe you just intend this to be handled by the generic “nesting things inside of a block requires the outer block to have more colons than the inner one; otherwise weird things might happen”. That would be reasonable too.

Yes, I think both approaches are viable. If we’re going with the first, a div with no attributes would still have to look like this:

::: {}
my content
:::
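To see what the first approach does to the anonymous-div example, here is a small Python trace. It assumes exactly the rule discussed: a fence line with anything after the colons opens a container, and a bare fence line always closes the innermost one. Fence lengths are ignored, and `trace_fences` is a made-up helper, not part of any implementation.

```python
def trace_fences(lines):
    """Label each line as open/close/text and report the nesting depth after it.

    Assumed rule: a fence line with anything after the colons opens a
    container; a bare fence line always closes the innermost one.
    """
    depth = 0
    trace = []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith(":::") and stripped.strip(":").strip():
            depth += 1
            trace.append(("open", depth))
        elif stripped.startswith(":::") and depth > 0:
            depth -= 1
            trace.append(("close", depth))
        else:
            # Plain text, or a stray closer with nothing left to close.
            trace.append(("text", depth))
    return trace


bare = [":::foo", "some text", ":::", "wrapped in a div", ":::", "still in foo?", ":::"]
explicit = [":::foo", "some text", "::: {}", "wrapped in a div", ":::", "still in foo", ":::"]
```

Under this rule, `trace_fences(bare)` closes "foo" on the third line and leaves everything after it at depth 0, while `trace_fences(explicit)` keeps "foo" open until the final line, because `::: {}` counts as an opener.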

+++ Mb21 [Nov 13 14 19:19 ]:


 I'm no parsing expert, but fenced code blocks currently work the same
 way. You potentially have to look ahead to the end of the document to
 make sure that three backticks aren't starting a code block, but are
 actually just part of the paragraph text.

The way it works in CommonMark, a fenced code block starts with an opening fence and continues until either a matching closing fence or the end of the containing block (which might be the whole document). So you never need to look ahead. Avoiding lookahead that might need to reach to the end of the document is important.

With a fenced syntax for a block container, the following strategy could be used. If no closing fence is parsed before the end of the document (or enclosing container, which might e.g. be a block quote or list item), then we simply emit a paragraph with the raw contents of the opening fence, followed by the blocks parsed. Or, by analogy with the fenced code blocks, we could treat the unclosed container as if it were closed. Either way we’d avoid the need to look ahead.
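Both strategies can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: every body line becomes its own placeholder "para" block instead of being parsed as real Markdown, nesting is ignored, and `parse_container` with its `implicit_close` flag is a hypothetical name.

```python
def parse_container(lines, start, implicit_close=True):
    """Parse one fenced container whose opening fence is at lines[start].

    Returns (blocks, next_index). Single forward pass, no lookahead.
    """
    opener = lines[start]
    fence = len(opener) - len(opener.lstrip(":"))
    body = []
    i = start + 1
    while i < len(lines):
        line = lines[i].strip()
        if line and set(line) == {":"} and len(line) >= fence:
            # Found a closing fence: the container is complete.
            return [{"type": "container", "opener": opener, "children": body}], i + 1
        # Placeholder for "the blocks parsed": one para per line.
        body.append({"type": "para", "text": lines[i]})
        i += 1
    # Reached the end of the enclosing block without a closing fence.
    if implicit_close:
        # Strategy 2: treat the unclosed container as if it were closed.
        return [{"type": "container", "opener": opener, "children": body}], i
    # Strategy 1: emit the raw opening fence as a paragraph, then the blocks.
    return [{"type": "para", "text": opener}] + body, i
```

In both cases the decision is made only once the end of the enclosing block is reached, using blocks that were already parsed on the way there, so nothing has to be scanned twice.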

 jgm seems to be fine with the proposal for pandoc as well.

Do you infer that from my silence?

Thank you for the clarification. If I understand correctly, it is possible to efficiently parse and differentiate between

:::foo

as a leaf block directive, and

:::foo
:::

as a container block directive.


What I meant was that you were fine with this div-syntax from an implementation-perspective for pandoc (since you originally posted it on pandoc-discuss), not that it was your favourite of the different proposals. Sorry for the confusion.

It would be better if there was some way to distinguish containers from leaves in the first line, or the line after.

That’s why I think my proposal is better for block directives (you can quickly tell a leaf from a container from the first line):

leaf

!spoiler: harrys kills voltmort. But spares hermimi

container

!spoiler: toggle visibility on hover
:::
  harrys kills voltmort
  But spares hermimi
:::

More compact

!spoiler::::::::::::::::
  harrys kills voltmort
  But spares hermimi
::::::::::::::::::::::::

A compact version for a single-paragraph directive:

!spoiler: 
  harrys kills voltmort
  But spares hermimi

where

block = !spoiler:

inline = !spoiler

+++ Mb21 [Nov 13 14 22:14 ]:


 Thank you for the clarification. If I understand correctly, it is
 possible to efficiently parse and differentiate between

:::foo

 as a leaf block directive, and

:::foo
:::

 as a container block directive.

I may have misunderstood the issue; I was only commenting on container blocks. A leaf block would presumably have a sequence of inline elements (rather than block elements) as its contents. I think, if generic leaf blocks were wanted, that it would be better to distinguish them syntactically from the starters for container blocks.