Generic directives/plugins syntax


#104

I’ve updated the first post in this thread with what were my takeaways from the block directives thread…


#105

Upon @jgm’s remark, which echoes concerns that have been voiced before, and after reading through these pandoc div syntax proposals, I’ve tried to simplify the syntax even more and updated the first post yet again. So now the (arg) is gone and wrapped into the attributes.


#106

Having read the entire thread, I find the use of !foo[](){} convincing for inline directives (where !image[]() is the default directive that you get if you don’t specify a name). I don’t think there’s sufficient goodness from other syntaxes to justify introducing something new here; they’ll always look somewhat code-y, so we might as well just reuse the existing code-y smell we’ve already accepted. I also like the use of []{} for generic spans, while we’re at it.

For blocks, I’m still somewhat convinced by “just use tagged code blocks”, again for the “just re-use the syntax we already have” reasons. The html-based input language for the spec preprocessor I maintain has a number of parsing extensions done with <pre class=foo>, which is very similar, and it seems to work fine. However, these extensions are all special-syntaxes, not nestable HTML; in other words, they’re a type of code, not a type of content. It may make sense to keep tagged code blocks for, well, code, even when it’s some specialized language. A block of latex that’s meant to be rendered, for example, might be good for this.

So then, the case in the OP for ::: block directives is relatively convincing. It allows both leaf and container blocks, which is great. It makes nesting of content possible without the processor needing to know what’s going on (it can just render unknown container directives as divs and continue processing the contents). Tagged code blocks can’t do this - if the processor doesn’t know the tag, it has to treat it as a <pre>. And I think the aesthetics are just quite nice, particularly the “spoiler” example without any arguments. That looks like something I might actually write in my plain-text documents, which is a big plus in its favor. None of the other proposed syntaxes have this.

So, in summary, I think we should be consistent with image links and use !foo for inline custom directives. We should use tagged code blocks for code, including cases where the code is interpreted and the result is somehow used in the place of the code block, like inline LaTeX. We should use the :::foo syntax for leaf custom blocks, and container custom blocks that contain more markdown. This introduces a minimum of new syntax, while preserving some imo decent plain-text aesthetics.
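
To make the inline form concrete, here is a minimal Python sketch of recognizing !foo[](){} with a regular expression. The exact grammar is an assumption for illustration (the thread has not settled it): the name, the () part and the {} part are all optional, a missing name falls back to "image", and nested brackets are not handled.

```python
import re

# Hypothetical grammar: !name[content](target){attrs}
INLINE_DIRECTIVE = re.compile(
    r"!(?P<name>[A-Za-z][\w-]*)?"   # optional directive name
    r"\[(?P<content>[^\]]*)\]"      # [content], required
    r"(?:\((?P<target>[^)]*)\))?"   # optional (target)
    r"(?:\{(?P<attrs>[^}]*)\})?"    # optional {attrs}, kept as a raw string
)

def parse_inline(text):
    """Return a dict for the first inline directive in text, or None."""
    m = INLINE_DIRECTIVE.search(text)
    if m is None:
        return None
    parts = m.groupdict()
    # A bare ![]() is treated as the default "image" directive.
    parts["name"] = parts["name"] or "image"
    return parts
```

With this sketch, parse_inline("!video[foo](/foo.mpeg)") gives the name "video", and parse_inline("![alt](a.png)") gives the name "image".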


#107

Thanks for sifting through the whole discussion! So you think !foo[](){} is preferable to just !foo[]{}? The () were supposed to contain an id, url or path, similar to the image syntax. However, this can always be done with {src=foo}. It’s a tradeoff between a more concise but harder to explain/remember syntax, and a simpler alternative with only contents and attributes.

(btw, I think !foo, @foo and ::foo are all fine for inline directives.)


#108

Or, a somewhat more coherent version of my last post:

I think inline directives will always smell kinda bad, because plain text doesn’t really do anything with them. So, reusing an existing syntax at least minimizes the additional bad-smell introduced to the language, and increases learnability. So, I think we should stick with !foo for inlines, a la images, rather than introducing another meaningless bit of characters. There’s no functional difference between !foo and ::foo otherwise, just the prefix.

For blocks, I think we can usefully separate them into three categories:

  1. Leaf blocks
  2. Code-containing containers
  3. Markdown-containing containers

#1 is technically a subset of the others, but a lot of languages have special support for them, and I think they look decent in plain-text to support.

#2 and #3 should be distinguished so that processors which don’t understand the extension can do something useful - show preformatted text, or continue formatting contents as markdown. This is an important feature of generic extensions.

I think tagged code blocks already fulfill #2. It works for the obvious case of syntax highlighting, and I think is appropriate for “evaluated” code as well - falling back to displaying raw LaTeX, or raw railroad diagram descriptions, is totally fine and useful.

I think the OP’s proposal for using :::foo for #3 is good. It doesn’t clash with existing syntaxes, and it has good plain-text aesthetics, particularly when used with no arguments.

I also think it’s good for #1, for consistency. I assume you tell leaf apart from container by the presence of a blank line after it?


#109

So you think !foo[](){} is preferable to just !foo[]{}?

Not hugely important to me. I’m fine if the core language directives use some special syntax; I don’t think we really need to “explain” ref links ([foo][ref]) in this syntax.

On the other hand, consistency is nice - the two existing forms that use () (links and images) share a syntax for that part: a url, and possibly a string for the title. We could enforce this consistency: allow () on custom directives, but force it to have that same syntax. Not all custom directives need a url, but for those that do, having them all use the same basic syntax (rather than having to remember which key this or that particular custom directive uses) is good. (I liked the examples of !video[foo](/foo.mpeg)/etc, those seemed very easy to understand if you already knew how images worked.)
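
As a rough illustration of "same syntax as images" for the () part, this Python sketch splits the inside of the parentheses into a url and an optional double-quoted title. It is a deliberate simplification, not the full CommonMark link-destination grammar (no angle brackets, single quotes or escapes), and the function name is made up for this example.

```python
import re

# Simplified: a run of non-space characters, then an optional "title".
TARGET = re.compile(r'^\s*(?P<url>\S+)(?:\s+"(?P<title>[^"]*)")?\s*$')

def parse_target(inner):
    """Split the text inside (...) into (url, title); title may be None."""
    m = TARGET.match(inner)
    if m is None:
        raise ValueError(f"not a url with optional title: {inner!r}")
    return m["url"], m["title"]
```

Any custom directive that takes a url could then share this one helper instead of inventing its own key.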

Using () on a directive that doesn’t use that information is the same as passing a key it doesn’t use - it’s just wasted data.

(Things with more complex needs, like the earlier slideshow example that needs multiple urls, couldn’t use () for that. That’s fine, I think.)


Edit: I expressed an opinion on the syntax inside of {} here, but I was wrong. Matching Pandoc and Markdown Extra is valuable here; we should use their syntax, per http://talk.commonmark.org/t/consistent-attribute-syntax/272.


#110

If 2. is implemented using fenced code blocks with attributes, that makes sense. Otherwise, the implementation could always keep the raw string contents of the container in the AST (in addition to the parsed markdown content) so that plugins etc. can do with it whatever they want.


No, the container blocks are closed with ::: on its own line, while leaf blocks are not. For nesting, a different number of colons can be used, using the same rule as currently for fenced code blocks.


#111

If 2. is implemented using fenced code blocks with attributes, that makes sense. Otherwise, the implementation could always keep the raw string contents of the container in the AST (in addition to the parsed markdown content) so that plugins etc. can do with it whatever they want.

My reason for separating the two was because they have different “ideal fallbacks” when presented in viewers that don’t understand the extension. “code containers” should fallback to preformatted text, while “markdown containers” should fall back to formatting their contents.
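
A minimal Python sketch of those two fallbacks, for a processor that meets a directive it has no handler for. The function names and the tiny render_markdown stand-in are assumptions for illustration; a real processor would call its own Markdown renderer there.

```python
import html

def render_markdown(text):
    # Stand-in for a real Markdown renderer (assumption): one <p> per
    # blank-line-separated paragraph, with HTML escaped.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return "".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)

def render_unknown(kind, name, body):
    """Fallback output for an unrecognized block directive."""
    cls = html.escape(name, quote=True)
    if kind == "code":
        # Code container: show the source verbatim as preformatted text.
        return f'<pre class="{cls}">{html.escape(body)}</pre>'
    # Markdown container: wrap in a div and keep formatting the contents.
    return f'<div class="{cls}">{render_markdown(body)}</div>'
```

The point is only that the processor needs to know which kind of container it is looking at, not what the directive means.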

No, the leaf blocks are closed with ::: on its own line, while container blocks are not. For nesting, a different number of colons can be used, using the same rule as currently for fenced code blocks.

The OP doesn’t have ::: closing any of the leaf directives, and one of the container directive examples (:::::SPOILERS:::::) has ::: on both sides of the initial tag.


#112

Oh, obviously it’s exactly the other way, edited my post to:

No, the container blocks are closed with ::: on its own line, while leaf blocks are not.

btw, I’m the OP :wink:


#113

Oh, obviously it’s exactly the other way,

The problem is that you currently have to search forward an arbitrary number of lines to tell whether the directive is a leaf or a container. That’s not very good for parsing; ideally you have a finite amount of lookahead (right now, parsing MD requires 1 or 2 lines of lookahead, depending on your strategy in some cases). It also means that you won’t be able to use a bare ::: without any further arguments to just mean “div”, as that would also trigger the “I’m a container!” behavior of a preceding block directive.

It would be better if there was some way to distinguish containers from leaves in the first line, or the line after. That’s why I suggested “blank line after” to mean leaf; it seems consistent with how we handle other types of blocks, and is easy to learn. (“end of container block” would also work, obviously, for those blocks where that’s unambiguous.)

btw, I’m the OP

I know, I was using it to mean “original post”, to specifically point at the first post in the thread for clarity.


#114

I’m no parsing expert, but fenced code blocks currently work the same way. You potentially have to look ahead to the end of the document to make sure that three backticks aren’t starting a code block, but are actually just part of the paragraph text. jgm seems to be fine with the proposal for pandoc as well.

Since you usually don’t want an empty div, I think it makes sense for the div syntax to require a closing line as well:

::: {.my-empty-div}
:::

Without a blank-line-after rule, you can do something like:

::: spoiler

# my title

my very long paragraph...

:::

#115

Hmm, you’re right. The last MD implementation I wrote used ~~~-fenced code blocks (a la PHP Markdown Extra), so I didn’t even think of the issue with code and backtick fenced blocks.

Well, since we already have to deal with arbitrary lookahead (and GitHub-flavored MD is popular enough that we can’t really drop backtick-fenced blocks), I guess it’s not an issue to add another instance of arbitrary lookahead.

My question about div was more this example:

:::foo
some text
:::
This is supposed to be wrapped in a div.
:::
This should still be in the "foo" block.
:::

If you have anything in the div starting line, this works as expected - the “foo” block stays open until the end, etc. But if you have just an anonymous div block, its opening line instead looks like a closing line for the “foo” block, and the subsequent ::: line (which should close the div) instead opens a div.

It may be that you intend to require something on a container line - a name, some options, anything - so that a bare ::: line is always a closer. That’s fine, if so.

Or maybe you just intend this to be handled by the generic “nesting things inside of a block requires the outer block to have more colons than the inner one; otherwise weird things might happen”. That would be reasonable too.


#116

Yes, I think both approaches are viable. If we’re going with the first, a div with no attributes would still have to look like this:

::: {}
my content
:::
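
To check how the first approach plays out on the example from #115, here is a small Python sketch that reports the nesting depth of each text line. The rules are assumptions for illustration: a colon run followed by anything (a name, or {}) opens a container, a bare colon run closes the innermost open container if it has at least as many colons as that container's opener, and a bare run that closes nothing is ignored.

```python
import re

FENCE = re.compile(r"^(:{3,})\s*(.*)$")

def trace_depth(lines):
    """Return (depth, text) for every non-fence line."""
    stack = []   # colon counts of the currently open containers
    out = []
    for line in lines:
        m = FENCE.match(line)
        if m is None:
            out.append((len(stack), line))
        elif m.group(2).strip():
            # Something follows the colons: this line opens a container.
            stack.append(len(m.group(1)))
        elif stack and len(m.group(1)) >= stack[-1]:
            # Bare colons: close the innermost open container.
            stack.pop()
    return out

as_written = [":::foo", "some text", ":::", "wrapped in a div",
              ":::", "still in foo", ":::"]
with_braces = [":::foo", "some text", "::: {}", "wrapped in a div",
               ":::", "still in foo", ":::"]
```

Under these rules the example as written gives depths 1, 0, 0 (the bare ::: closes "foo" early), while the version with ::: {} gives 1, 2, 1, which is the intended nesting.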

#117

+++ Mb21 [Nov 13 14 19:19 ]:

 I'm no parsing expert, but fenced code blocks currently work the same
 way. You potentially have to look ahead to the end of the document to
 make sure that three backticks aren't starting a code block, but are
 actually just part of the paragraph text.

The way it works in CommonMark, a fenced code block starts with an opening fence and continues until either a matching closing fence or the end of the containing block (which might be the whole document). So you never need to look ahead. Avoiding lookahead that might need to reach to the end of the document is important.

With a fenced syntax for a block container, the following strategy could be used. If no closing fence is parsed before the end of the document (or enclosing container, which might e.g. be a block quote or list item), then we simply emit a paragraph with the raw contents of the opening fence, followed by the blocks parsed. Or, by analogy with the fenced code blocks, we could treat the unclosed container as if it were closed. Either way we’d avoid the need to look ahead.
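
Both options can be sketched in a single forward pass, roughly like this in Python. This is an illustration under simplifying assumptions, not how any existing CommonMark implementation works: colon counts are ignored, text lines are kept as plain strings instead of being parsed into blocks, and the names are made up for the example.

```python
import re

FENCE = re.compile(r"^:{3,}\s*(.*)$")

def parse_blocks(lines, unclosed="close"):
    """Parse ::: containers line by line, never scanning ahead.

    unclosed="close":   at end of input, treat open containers as closed.
    unclosed="literal": emit the opening fence as plain text, followed by
                        whatever was parsed after it.
    """
    root = []
    stack = [(None, root)]   # (raw opening line, children list)
    for line in lines:
        children = stack[-1][1]
        m = FENCE.match(line)
        name = m.group(1).strip() if m else ""
        if name:
            node = {"name": name, "children": []}
            children.append(node)
            stack.append((line, node["children"]))
        elif m and len(stack) > 1:
            stack.pop()               # bare fence closes the innermost container
        else:
            children.append(line)     # ordinary text (or a stray closer)
    # End of input: resolve anything still open, innermost first.
    while len(stack) > 1:
        raw, kids = stack.pop()
        if unclosed == "literal":
            parent = stack[-1][1]
            # The open container is always the last child of its parent.
            parent[-1:] = [raw] + kids
    return root
```

Either way the decision about an unclosed container is made only when the end of the input (or enclosing block) is reached, so no line is ever read twice.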

 jgm seems to be fine with the proposal for pandoc as well.

Do you infer that from my silence?


#118

Thank you for the clarification. If I understand correctly, it is possible to efficiently parse and differentiate between

:::foo

as a leaf block directive, and

:::foo
:::

as a container block directive.


What I meant was that you were fine with this div-syntax from an implementation-perspective for pandoc (since you originally posted it on pandoc-discuss), not that it was your favourite of the different proposals. Sorry for the confusion.


#119

It would be better if there was some way to distinguish containers from leaves in the first line, or the line after.

That’s why I think my proposal is better for block directives (you can tell a leaf from a container quickly from the first line):

leaf

!spoiler: harrys kills voltmort. But spares hermimi

container

!spoiler: toggle visibility on hover
:::
  harrys kills voltmort
  But spares hermimi
:::

More compact

!spoiler::::::::::::::::
  harrys kills voltmort
  But spares hermimi
::::::::::::::::::::::::

A compact version for single paragraph directive.

!spoiler: 
  harrys kills voltmort
  But spares hermimi

where

block = !spoiler:

inline = !spoiler

http://talk.commonmark.org/t/block-directives/802?u=mofosyne


#120

+++ Mb21 [Nov 13 14 22:14 ]:

 Thank you for the clarification. If I understand correctly, it is
 possible to efficiently parse and differentiate between

:::foo

 as a leaf block directive, and

:::foo
:::

 as a container block directive.

I may have misunderstood the issue; I was only commenting on container blocks. A leaf block would presumably have a sequence of inline elements (rather than block elements) as its contents. I think, if generic leaf blocks were wanted, that it would be better to distinguish them syntactically from the starters for container blocks.


#121

A leaf block would presumably have a sequence of inline elements (rather than block elements) as its contents.

Not quite; while a block leaf can contain inline markdown in its [] part, or as the value of one of its {} options, the “leaf” concept is really just about having a block that doesn’t have any further contents, like a video block.


#122

I just read this interesting discussion and I’d like to chime in with a few remarks:

I don’t really like the ![](){} --> !default[](){} --> !image[](){} idea with redirecting to image or video based on the provided URL. This ties the generated output to a particular implementation of the parser which knows how to recognize if something is an image, a video, or something else. Parsers will have to look into each other’s implementation to figure out how to classify a particular URL to remain consistent. The output would also change if generated before or after a new video or image site or file format appears and is added to the list of known URLs. To me the ![]() image syntax is part of the core spec and as such should be strict and well-defined and always cause the same output to be generated in all cases. This would also make unit testing of implementations easier.

On the other hand, extensions can be completely free to generate anything, and it should be explicit that extensions are not standard syntax. That’s why I propose to distinguish clearly between extensions and core syntax by using a different marker character, for example, as previously suggested, @ instead of !. That way writers will know that a particular feature they’re using is tied to a particular website, CMS, or parser. Allowing an empty extension name would make this possible: @[](){} --> @default[](){} --> !image[](){}. This has the added benefit of avoiding name clashes between extensions and any future evolution of the spec, which might want to define a new name but could not if a widely used extension had already taken it. I also rather like that @ stands out a little more than !, which makes extension uses easier to spot when rapidly scrolling through a document.

Concerning the multiple ways to provide arguments using [](){}, to me it looks like it could be simplified by considering that:

  1. [] and () are both used to provide one or two unnamed ordered arguments
  2. {} is used to provide both unnamed ordered arguments and named arguments

So [], () and unnamed arguments in {} could be used interchangeably. For example, @extension[argument 1](argument 2){argument 3, other: argument 4} and @extension[argument 1][argument 2][argument 3]{other: argument 4} are equivalent. Ultimately one of [] or () seems redundant but it is already widely used so they could be considered as alternative ways to say the same thing, as seems to be the general philosophy of the spec.
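
The equivalence can be checked with a short Python sketch that normalizes both spellings into a name, a list of positional arguments and a dict of named ones. The parsing details are assumptions for illustration: items inside {} are split on commas, an item containing ":" is treated as named, and nested brackets are not handled, so an unnamed argument that itself contains a colon (such as a url) would be misread.

```python
import re

HEAD = re.compile(r"@([A-Za-z][\w-]*)?")
GROUP = re.compile(r"\[([^\]]*)\]|\(([^)]*)\)|\{([^}]*)\}")

def normalize(directive):
    """Return (name, positional, named) for an @name[...](...){...} string."""
    head = HEAD.match(directive)
    if head is None:
        raise ValueError("not an @ directive")
    name = head.group(1) or "default"   # empty name means the default extension
    positional, named = [], {}
    pos = head.end()
    while True:
        m = GROUP.match(directive, pos)
        if m is None:
            break
        square, paren, brace = m.groups()
        if brace is not None:
            for item in filter(None, (s.strip() for s in brace.split(","))):
                key, sep, value = item.partition(":")
                if sep:
                    named[key.strip()] = value.strip()
                else:
                    positional.append(item)
        else:
            positional.append(square if square is not None else paren)
        pos = m.end()
    return name, positional, named

a = normalize("@extension[argument 1](argument 2){argument 3, other: argument 4}")
b = normalize("@extension[argument 1][argument 2][argument 3]{other: argument 4}")
```

Here a and b come out identical, which is the interchangeability described above.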


#123

Right, I guess @tabatkins was on to something and using a different opening-syntax for leaf-blocks and container-blocks would avoid lookahead in almost all cases; e.g. !foo[] for inline directives, :foo[] for leaf block directives, and for container block directives:

::: foo []
:::

I edited the original post accordingly.

While separating leaf and container directives isn’t strictly necessary, it would enable one-liners for leaf blocks, instead of having the closing ::: always needed.
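
With distinct openers, a block parser can decide what it is looking at from the current line alone. A minimal Python sketch, assuming the three forms above (the category labels and the function name are made up for this example, and inline !foo[] directives are left to the inline parser):

```python
import re

CLOSE = re.compile(r"^:{3,}\s*$")
OPEN = re.compile(r"^:{3,}\s*(?P<name>[A-Za-z][\w-]*)?")
LEAF = re.compile(r"^:(?P<name>[A-Za-z][\w-]*)")

def classify(line):
    """Classify one line as (kind, name) without looking at any other line."""
    if CLOSE.match(line):
        return ("container-close", None)
    m = OPEN.match(line)
    if m:
        # Three or more colons followed by anything opens a container;
        # the name is None for an anonymous one such as "::: {}".
        return ("container-open", m["name"])
    m = LEAF.match(line)
    if m:
        return ("leaf", m["name"])
    return ("paragraph", None)
```

For example, classify("::: foo []") reports a container opener named "foo", and classify(":youtube[my _funny_ video]{ vid=09jf3ow9jfw }") reports a leaf named "youtube".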

All in all, the goal is to make writing directives dead-simple. For example, with the following file named youtube.html placed in a leaf-block-directives-folder:

<figure>
    <iframe width="420" height="315" src="//www.youtube.com/embed/$vid$" frameborder="0" allowfullscreen></iframe>
    <figcaption>$content$</figcaption>
</figure>

The markdown :youtube[my _funny_ video]{ vid=09jf3ow9jfw } would render to, well, to what you’d expect.
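
A minimal Python sketch of how such a template might be filled in. The $name$ substitution rule, the function name and the choice to replace unknown placeholders with an empty string are assumptions for illustration; the content is assumed to have been rendered to inline HTML already.

```python
import html
import re

TEMPLATE = """<figure>
    <iframe width="420" height="315" src="//www.youtube.com/embed/$vid$" frameborder="0" allowfullscreen></iframe>
    <figcaption>$content$</figcaption>
</figure>"""

def fill(template, content_html, attrs):
    """Substitute $name$ placeholders in a leaf-directive template."""
    # Attribute values are escaped; content_html is trusted as rendered HTML.
    values = {k: html.escape(v, quote=True) for k, v in attrs.items()}
    values["content"] = content_html
    return re.sub(r"\$(\w+)\$", lambda m: values.get(m.group(1), ""), template)

out = fill(TEMPLATE, "my <em>funny</em> video", {"vid": "09jf3ow9jfw"})
```

The result embeds the video id in the iframe src and puts the rendered caption inside the figcaption.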