Generic directives/plugins syntax

I’m no parsing expert, but fenced code blocks currently work the same way. You potentially have to look ahead to the end of the document to make sure that three backticks aren’t starting a code block, but are actually just part of the paragraph text. jgm seems to be fine with the proposal for pandoc as well.

Since you usually don’t want an empty div, I think it makes sense for the div syntax to require a closing line as well:

::: {.my-empty-div}
:::

Without a blank-line-after rule, you can do something like:

::: spoiler

# my title

my very long paragraph...

:::

Hmm, you’re right. The last MD implementation I wrote used ~~~-fenced code blocks (a la PHP Markdown Extra), so I didn’t even think of the issue with code and backtick fenced blocks.

Well, since we already have to deal with arbitrary lookahead (and GitHub-flavored MD being popular enough that we can’t really lose backtick-fenced blocks), I guess it’s not an issue to add an additional instance of arbitrary lookahead.

My question about div was more this example:

:::foo
some text
:::
This is supposed to be wrapped in a div.
:::
This should still be in the "foo" block.
:::

If you have anything in the div starting line, this works as expected - the “foo” block stays open until the end, etc. But if you have just an anonymous div block, it instead looks like a closing line for the “foo” block, and the subsequent ::: line (should be closing the div) instead opens a div.

It may be that you intend that you have to provide something on a container line - a name, some options, anything, so that a bare ::: line is always a closer. That’s fine, if so.

Or maybe you just intend this to be handled by the generic “nesting things inside of a block requires the outer block to have more colons than the inner one; otherwise weird things might happen”. That would be reasonable too.

Yes, I think both approaches are viable. If we’re going with the first, a div with no attributes would still have to look like this:

::: {}
my content
:::
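
For what it's worth, here is a rough Python sketch of the first approach, where a fence line carrying anything after the colons (a name, options, or just {}) opens a div and a bare ::: line always closes the innermost one. The name parse_divs and the shape of the returned tree are made up for illustration; this is not taken from any existing implementation.

```python
import re

# A fence is three or more colons, optionally followed by a name/options.
FENCE = re.compile(r"^(:{3,})\s*(.*?)\s*$")


def parse_divs(text):
    """Parse fenced divs where a bare ':::' line is always a closer.

    Returns a list of blocks: plain lines are strings, divs are dicts
    of the form {"info": str, "children": [...]}.
    """
    root = {"info": None, "children": []}
    stack = [root]
    for line in text.splitlines():
        m = FENCE.match(line)
        if m and m.group(2):
            # Anything after the colons (a name, '{}', options) opens a div.
            div = {"info": m.group(2), "children": []}
            stack[-1]["children"].append(div)
            stack.append(div)
        elif m and len(stack) > 1:
            # A bare fence closes the innermost open div.
            stack.pop()
        else:
            # Ordinary text (or a stray closer at top level) is kept as-is.
            stack[-1]["children"].append(line)
    return root["children"]
```

With the anonymous inner div written as ::: {}, the earlier "foo" example nests as intended: the inner div closes at the first bare ::: and the last text line stays inside "foo".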

+++ Mb21 [Nov 13 14 19:19 ]:

 I'm no parsing expert, but fenced code blocks currently work the same
 way. You potentially have to look ahead to the end of the document to
 make sure that three backticks aren't starting a code block, but are
 actually just part of the paragraph text.

The way it works in CommonMark, a fenced code block starts with an opening fence and continues until either a matching closing fence or the end of the containing block (which might be the whole document). So you never need to lookahead. Avoiding lookahead that might need to reach to the end of the document is important.

With a fenced syntax for a block container, the following strategy could be used. If no closing fence is parsed before the end of the document (or enclosing container, which might e.g. be a block quote or list item), then we simply emit a paragraph with the raw contents of the opening fence, followed by the blocks parsed. Or, by analogy with the fenced code blocks, we could treat the unclosed container as if it were closed. Either way we’d avoid the need to lookahead.
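
The two options can be sketched in a single pass over the lines, with the decision about an unclosed container deferred to the end of the input. This is only an illustration under simplifying assumptions (no block quotes or list items, every opening fence carries some text); parse_containers and its unclosed parameter are hypothetical names.

```python
def parse_containers(lines, unclosed="close"):
    """Single pass over lines; never scans ahead for a closing fence.

    unclosed="close":  an unclosed container is treated as closed at the end.
    unclosed="demote": the opening fence becomes plain text, followed by
                       the blocks that were parsed inside it.
    """
    root = {"fence": None, "children": []}
    stack = [root]
    for line in lines:
        stripped = line.strip()
        if stripped.startswith(":::") and stripped.strip(":").strip():
            # Opening fence: colons followed by a name or options.
            node = {"fence": line, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif len(stripped) >= 3 and set(stripped) == {":"} and len(stack) > 1:
            # Bare fence: close the innermost open container.
            stack.pop()
        else:
            stack[-1]["children"].append(line)
    # End of input: resolve whatever is still open, innermost first.
    while len(stack) > 1:
        node = stack.pop()
        if unclosed == "demote":
            siblings = stack[-1]["children"]
            # A still-open node is always the last child of its parent.
            siblings.pop()
            siblings.append(node["fence"])
            siblings.extend(node["children"])
    return root["children"]
```

Either way the work at the end is proportional to the number of containers still open, and the parser never has to rescan the text.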

 jgm seems to be fine with the proposal for pandoc as well.

Do you infer that from my silence?

Thank you for the clarification. If I understand correctly, it is possible to efficiently parse and differentiate between

:::foo

as a leaf block directive, and

:::foo
:::

as a container block directive.


What I meant was that you were fine with this div-syntax from an implementation-perspective for pandoc (since you originally posted it on pandoc-discuss), not that it was your favourite of the different proposals. Sorry for the confusion.

It would be better if there was some way to distinguish containers from leaves in the first line, or the line after.

That’s why I think my proposal is better for block directives (you can tell between a leaf and a block quickly from the first line):

leaf

!spoiler: harrys kills voltmort. But spares hermimi

container

!spoiler: toggle visibility on hover
:::
  harrys kills voltmort
  But spares hermimi
:::

More compact

!spoiler::::::::::::::::
  harrys kills voltmort
  But spares hermimi
::::::::::::::::::::::::

A compact version for a single-paragraph directive.

!spoiler: 
  harrys kills voltmort
  But spares hermimi

where

block = !spoiler:

inline = !spoiler

http://talk.commonmark.org/t/block-directives/802?u=mofosyne

+++ Mb21 [Nov 13 14 22:14 ]:

 Thank you for the clarification. If I understand correctly, it is
 possible to efficiently parse and differentiate between

:::foo

 as a leaf block directive, and

:::foo
:::

 as a container block directive.

I may have misunderstood the issue; I was only commenting on container blocks. A leaf block would presumably have a sequence of inline elements (rather than block elements) as its contents. I think, if generic leaf blocks were wanted, that it would be better to distinguish them syntactically from the starters for container blocks.

A leaf block would presumably have a sequence of inline elements (rather than block elements) as its contents.

Not quite; while a block leaf can contain inline markdown in its [] part, or as the value of one of its {} options, the “leaf” concept is really just about having a block that doesn’t have any further contents, like a video block.

I just read this interesting discussion and I’d like to chime in with a few remarks:

I don’t really like the ![](){} --> !default[](){} --> !image[](){} idea with redirecting to image or video based on the provided URL. This ties the generated output to a particular implementation of the parser which knows how to recognize if something is an image, a video, or something else. Parsers will have to look into each other’s implementation to figure out how to classify a particular URL to remain consistent. The output would also change if generated before or after a new video or image site or file format appears and is added to the list of known URLs. To me the ![]() image syntax is part of the core spec and as such should be strict and well-defined and always cause the same output to be generated in all cases. This would also make unit testing of implementations easier.

On the other hand, extensions can be completely free to generate anything, and it should be explicit that extensions are not standard syntax. That’s why I propose to distinguish clearly between extensions and core syntax by using a different marker character, for example, as previously suggested, @ instead of !. That way writers will know that a particular feature they’re using is tied to a particular website, CMS, or parser. Allowing an empty extension name would make this possible: @[](){} --> @default[](){} --> !image[](){}. This has the added benefit of avoiding name clashes between extensions and any future evolution of the spec, which might want to define a new name but could not if a widely used extension had already taken it. I also rather like that @ stands out a little more than !, which makes extension uses easier to locate when rapidly scrolling through a document.

Concerning the multiple ways to provide arguments using [](){}, to me it looks like it could be simplified by considering that:

  1. [] and () are both used to provide one or two unnamed ordered arguments
  2. {} is used both to provide unnamed ordered arguments and named arguments

So [], () and unnamed arguments in {} could be used interchangeably. For example, @extension[argument 1](argument 2){argument 3, other: argument 4} and @extension[argument 1][argument 2][argument 3]{other: argument 4} are equivalent. Ultimately one of [] or () seems redundant but it is already widely used so they could be considered as alternative ways to say the same thing, as seems to be the general philosophy of the spec.
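
To make the equivalence concrete, here is a small Python sketch that normalizes both spellings into the same (name, positional, named) triple. The function parse_directive, the comma and colon separators inside {}, and the handling of an empty name as "default" are assumptions read off the examples in this post, not a settled grammar. Nested brackets and values containing commas or colons are not handled.

```python
import re

# One bracketed segment: [...], (...) or {...}
SEGMENT = re.compile(r"\[([^\]]*)\]|\(([^)]*)\)|\{([^}]*)\}")


def parse_directive(src):
    """Split '@name[...](...){...}' into (name, positional, named)."""
    m = re.match(r"@(\w*)", src)
    if not m:
        raise ValueError("not a directive: %r" % src)
    name = m.group(1) or "default"  # empty name falls back to 'default'
    positional, named = [], {}
    pos = m.end()
    while pos < len(src):
        seg = SEGMENT.match(src, pos)
        if not seg:
            break
        square, paren, curly = seg.groups()
        if curly is not None:
            # {} may mix unnamed items and 'key: value' items.
            for item in filter(None, (s.strip() for s in curly.split(","))):
                key, sep, value = item.partition(":")
                if sep:
                    named[key.strip()] = value.strip()
                else:
                    positional.append(item)
        else:
            # [] and () each contribute one unnamed, ordered argument.
            positional.append(square if square is not None else paren)
        pos = seg.end()
    return name, positional, named
```

Both @extension[argument 1](argument 2){argument 3, other: argument 4} and @extension[argument 1][argument 2][argument 3]{other: argument 4} come out as the same triple under these rules.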

Right, I guess @tabatkins was on to something and using a different opening-syntax for leaf-blocks and container-blocks would avoid lookahead in almost all cases; e.g. !foo[] for inline directives, :foo[] for leaf block directives, and for container block directives:

::: foo []
:::

I edited the original post accordingly.
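
As a rough illustration of why distinct openers help, the kind of directive can be decided from the first line alone. The regular expressions below are my own guess at the three forms (!foo[] inline, :foo[] leaf block, ::: foo container) and classify is a hypothetical helper, not part of any parser.

```python
import re

CONTAINER = re.compile(r"^:{3,}\s*(\w+)")  # ::: foo [...]
LEAF = re.compile(r"^:(\w+)\[")            # :foo[...]
INLINE = re.compile(r"!(\w+)\[")           # ... !foo[...] ...


def classify(line):
    """Return (kind, name) from the first line alone, without lookahead."""
    m = CONTAINER.match(line)
    if m:
        return ("container", m.group(1))
    m = LEAF.match(line)
    if m:
        return ("leaf", m.group(1))
    m = INLINE.search(line)
    if m:
        return ("inline", m.group(1))
    return ("text", None)
```

A plain image such as ![alt](url) has no name after the ! and so falls through to ordinary text here, leaving it to the core image rule.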

While separating leaf and container directives isn’t strictly necessary, it would enable one-liners for leaf blocks, instead of having the closing ::: always needed.

All in all, the goal is to make writing directives dead-simple. For example, with the following file named youtube.html placed in a leaf-block-directives-folder:

<figure>
    <iframe width="420" height="315" src="//www.youtube.com/embed/$vid$" frameborder="0" allowfullscreen></iframe>
    <figcaption>$content$</figcaption>
</figure>

The markdown :youtube[my _funny_ video]{ vid=09jf3ow9jfw } would render to, well, to what you’d expect.
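
A minimal Python sketch of that rendering step, assuming the template is looked up by directive name and that $name$ placeholders are filled from the {} options plus a special content value taken from the [] part. The helpers render_inline and render_leaf are hypothetical, and render_inline is only a stand-in that handles _emphasis_ rather than full inline Markdown.

```python
import html
import re

TEMPLATE = """<figure>
    <iframe width="420" height="315" src="//www.youtube.com/embed/$vid$" frameborder="0" allowfullscreen></iframe>
    <figcaption>$content$</figcaption>
</figure>"""

LEAF = re.compile(r"^:(\w+)\[([^\]]*)\]\{([^}]*)\}$")


def render_inline(text):
    # Stand-in for a real inline Markdown renderer: only handles _emphasis_.
    return re.sub(r"_(.+?)_", r"<em>\1</em>", html.escape(text))


def render_leaf(line, templates):
    """Render ':name[content]{ key=value ... }' with the template for name."""
    m = LEAF.match(line.strip())
    if not m:
        raise ValueError("not a leaf block directive: %r" % line)
    name, content, attrs = m.groups()
    values = {"content": render_inline(content)}
    for pair in attrs.split():
        key, _, value = pair.partition("=")
        values[key] = html.escape(value, quote=True)
    # Replace every $name$ placeholder; unknown names become empty strings.
    return re.sub(r"\$(\w+)\$", lambda v: values.get(v.group(1), ""), templates[name])


print(render_leaf(":youtube[my _funny_ video]{ vid=09jf3ow9jfw }", {"youtube": TEMPLATE}))
```

In a real setup the templates dictionary would presumably be filled by reading the files in the directives folder.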

+1

Yes, that’s what we figured as well. That’s why the (edited) first post currently doesn’t include the () anymore.

That’s an example of how to show a video instead of an image without a syntax change. I use something similar everywhere (but replace plain links).

I think video from YouTube and Vimeo is the most “popular” request for Markdown.

I think we have to be careful here not to make things too complicated. In my view, if we get to the point where the syntax isn’t easily readable and understandable by someone completely unfamiliar with Markdown, then we might as well just use inline HTML.

The syntax proposed in the original post looks okay, in my opinion. This is something we have to be very careful about though.

+++ Andrew Meyer [Jan 21 15 16:45 ]:

I think we have to be careful here not to make things too complicated. In my view, if we get to the point where the syntax isn’t easily readable and understandable by someone completely unfamiliar with Markdown, then we might as well just use inline HTML.

The syntax proposed in the original post looks okay, in my opinion. This is something we have to be very careful about though.

I agree that a lot of care is needed, and that it would be bad to end up with something that looks too markupy and not “markdownish.”

On the other hand, using inline HTML has the big disadvantage that it only works if you’re targeting HTML. In principle someone might want to render a CommonMark document to multiple formats, and then they’d want some format-neutral way of indicating this structure.

I’m very tempted to borrow rST syntax for that part and see how it feels.

Does anyone know what the process is for getting a common extension syntax into the CommonMark spec proper? I am wanting to implement a syntax, but don’t want to choose the wrong one, and end up having to support two once one is official.

Is there a formal application/proposal process to get something into the spec?

+++ Eric Holscher [Feb 13 15 19:19 ]:

Does anyone know what the process is for getting a common extension
syntax into the CommonMark spec proper? I am wanting to implement a
syntax, but don’t want to choose the wrong one, and end up having to
support two once one is official.

Is there a formal application/proposal process to get something into
the spec?

The current priority is to get a solid spec for core elements. Extensions can be proposed and discussed on this forum, but won’t be a priority until the core is solid. (Things like raw HTML are still in flux.)

In some cases a good option will be to impose some conventions on existing syntax; you can then use the existing parser and make customizations either in the AST or with a custom writer or both. For example, you might treat a blockquote that starts with a level two header marked “Exercise” as a special “exercise” element, and render it in a distinctive way. This method has the advantage that your content would degrade nicely if the extension isn’t enabled.
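
As a sketch of what such an AST customization might look like, the following Python function retags matching block quotes after parsing. The dict-based AST (type, level, text, children) is invented for the example and does not correspond to any particular parser's node types; mark_exercises is likewise a hypothetical name.

```python
def mark_exercises(blocks):
    """Retag block quotes that open with a level-2 'Exercise' heading."""
    out = []
    for block in blocks:
        children = block.get("children", [])
        first = children[0] if children else None
        if (
            block["type"] == "block_quote"
            and first is not None
            and first["type"] == "heading"
            and first.get("level") == 2
            and first.get("text", "").strip().lower().startswith("exercise")
        ):
            # Promote to a dedicated element; the heading becomes its title.
            out.append(
                {"type": "exercise", "title": first["text"], "children": children[1:]}
            )
        else:
            # Anything else passes through unchanged.
            out.append(block)
    return out
```

A writer that does not know about the exercise type never sees it, because without this pass the document is still an ordinary block quote with a heading.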

This approach won’t always work well, but getting a collection of use cases and figuring out what isn’t currently possible to do will help in further development of the spec.

Anyway, let us know what you’re trying to achieve.

I couldn’t agree more with @jgm’s suggestion. As an example, I have been able to implement a convention for floating environments, such as figures, tables and code listings, purely using existing md syntax (aside from attributed images, but that’s another story).

Like John suggested, if you are making an extension that maps to block-level content, making the starting delimiter a special section that starts with a keyword, such as

### Figure: proposed trajectory of Apollo 14 {#apollo14-plan}

gives some useful advantages:

  1. Headers serve as a relatively unambiguous marker, both in the raw Markdown (it’s easy for regex to parse) and the parsed AST (there’s no need to recurse into AST trees since it’s only allowed in the “top level”).

  2. Text editors, even with just the most rudimentary Markdown support, have no problem parsing headers. Most of them will also give you some sort of automatic TOC based on the headers. For my case, it was very nice because I jump between several text editors throughout the day and I was able to have my figures show up (along with their anchor label) in the TOC without having to write a single custom line of editor plugin code.
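
To show how little is needed on the raw Markdown side, here is a Python sketch of the regex for such a heading. The keyword list (Figure, Table, Listing) and the function name parse_float_heading are assumptions for the example; only the Figure line above comes from my actual documents.

```python
import re

# '### Figure: caption text {#anchor}', with the anchor part optional
FLOAT_HEADING = re.compile(
    r"^#{1,6}\s+(Figure|Table|Listing):\s*(.*?)\s*(?:\{#([\w-]+)\})?\s*$"
)


def parse_float_heading(line):
    """Return kind, caption and id for a float heading, or None."""
    m = FLOAT_HEADING.match(line)
    if not m:
        return None
    kind, caption, anchor = m.groups()
    return {"kind": kind.lower(), "caption": caption, "id": anchor}


print(parse_float_heading(
    "### Figure: proposed trajectory of Apollo 14 {#apollo14-plan}"
))
```

Headings that do not start with one of the keywords return None and are left to the normal renderer.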

Thanks for the thoughtful replies.

My current use case is trying to map Markdown to reStructuredText directives. I’m one of the maintainers of Read the Docs and we currently support Sphinx with rST, and are planning to support the current spec. We will be waiting until an extension syntax is defined before adopting one, so we don’t end up supporting two.

We will be adding support to Sphinx for the current CommonMark spec through the CommonMark-Py implementation. For now, we will just implement it with no extensions, but it would be amazing to be able to get a more complete mapping to rST concepts.

My current thought is that rST directives (ref) are a bit underspecified. This leads to situations, like the admonition directive, where the arguments and content are ambiguous. Another instance is the note directive, which has no arguments but has content.

An example of this: http://rst.ninjs.org/?n=fac7927439c29f7eb80db0f255a21427&theme=basic

I’d argue this is an annoying implementation detail of rST, but my main concern is being able to map properly. Having explicit argument, option, and content syntax would solve a lot of it for us from the Markdown side. Then we will just need to figure out how to map it to rST.
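
To illustrate the kind of mapping I mean: if the Markdown side hands over the name, arguments, options and content as separate pieces, emitting the rST directive is mechanical. This Python sketch is not Read the Docs or Sphinx code; to_rst_directive and its parameters are hypothetical, and it only covers the basic directive layout (arguments on the marker line, one :option: per line, blank line, indented content).

```python
def to_rst_directive(name, arguments=(), options=None, content=""):
    """Emit an rST directive from explicit arguments, options and content."""
    # Marker line: '.. name:: arg1 arg2'
    lines = [(".. %s:: %s" % (name, " ".join(arguments))).rstrip()]
    # Options follow directly, indented, as ':key: value'.
    for key, value in (options or {}).items():
        lines.append(("   :%s: %s" % (key, value)).rstrip())
    if content:
        # A blank line separates the options from the indented content.
        lines.append("")
        lines.extend(("   " + row).rstrip() for row in content.splitlines())
    return "\n".join(lines)


print(to_rst_directive("note", content="Be careful."))
print(to_rst_directive(
    "admonition", ["And, by the way..."], {"class": "tip"},
    "You can make up your own.",
))
```

With the three parts kept apart on the Markdown side, a note (content only) and an admonition (argument plus content) no longer need to be told apart by guessing.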

I’d also love to get extensions into CommonMark so that we can begin to support them. We might go with your suggestions of extending current syntax for now – but the main power for us comes from a relatively complete mapping of the markup.

Some Sphinx/docutils features do not map neatly onto the extension syntax we are discussing here.

The best way to avoid awkwardness is to import the directive as-is and then use the generic attribute mechanism for things like roles.

(I’m the guy adapting remarkdown to use commonmark-py, and I have already extended it to support block attributes.)
