I’m no parsing expert, but fenced code blocks currently work the same way. You potentially have to look ahead to the end of the document to make sure that three backticks aren’t starting a code block, but are actually just part of the paragraph text. jgm seems to be fine with the proposal for pandoc as well.
Since you usually don’t want an empty div, I think it makes sense for the div syntax to require a closing line as well:
::: {.my-empty-div}
:::
Without a blank-line-after rule, you can do something like:
::: spoiler
# my title
my very long paragraph...
:::
Hmm, you’re right. The last MD implementation I wrote used ~~~-fenced code blocks (a la PHP Markdown Extra), so I didn’t even think of the issue with code and backtick fenced blocks.
Well, since we already have to deal with arbitrary lookahead (and GitHub-flavored MD being popular enough that we can’t really lose backtick-fenced blocks), I guess it’s not an issue to add an additional instance of arbitrary lookahead.
My question about div was more this example:
:::foo
some text
:::
This is supposed to be wrapped in a div.
:::
This should still be in the "foo" block.
:::
If you have anything in the div starting line, this works as expected - the “foo” block stays open until the end, etc. But if you have just an anonymous div block, it instead looks like a closing line for the “foo” block, and the subsequent ::: line (should be closing the div) instead opens a div.
It may be that you intend to require something on a container's opening line (a name, some options, anything), so that a bare ::: line is always a closer. That’s fine, if so.
Or maybe you just intend this to be handled by the generic “nesting things inside of a block requires the outer block to have more colons than the inner one; otherwise weird things might happen”. That would be reasonable too.
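To make the first option concrete, here is a rough sketch (my own, not anything from a spec or an existing implementation) of a line scanner in which a fence with a label opens a container and a bare ::: always closes the innermost one. The `FENCE` pattern and the dict-based tree are made up for illustration.

```python
import re

# Hypothetical fence pattern: three or more colons, then an optional label.
FENCE = re.compile(r"^(:{3,})\s*(.*?)\s*$")

def parse_containers(lines):
    """Build a nested tree, treating any bare ':::' line as a closer."""
    root = {"label": None, "children": []}
    stack = [root]
    for line in lines:
        m = FENCE.match(line)
        if m and m.group(2):
            # Fence with a label: open a new container inside the current one.
            node = {"label": m.group(2), "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif m and len(stack) > 1:
            # Bare fence: close the innermost open container.
            stack.pop()
        else:
            # Ordinary text, or a stray bare fence with nothing left to close.
            stack[-1]["children"].append(line)
    return root

tree = parse_containers([
    ":::foo", "some text", ":::div", "wrapped", ":::", "still in foo", ":::",
])
```

Under this rule the anonymous inner div from the example above cannot be written; it only nests as intended once the inner opener carries a label (here the made-up `div`).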
I'm no parsing expert, but fenced code blocks currently work the same
way. You potentially have to look ahead to the end of the document to
make sure that three backticks aren't starting a code block, but are
actually just part of the paragraph text.
The way it works in CommonMark, a fenced code block starts with an opening fence and continues until either a matching closing fence or the end of the containing block (which might be the whole document). So you never need to lookahead. Avoiding lookahead that might need to reach to the end of the document is important.
With a fenced syntax for a block container, the following strategy could be used. If no closing fence is parsed before the end of the document (or enclosing container, which might e.g. be a block quote or list item), then we simply emit a paragraph with the raw contents of the opening fence, followed by the blocks parsed. Or, by analogy with the fenced code blocks, we could treat the unclosed container as if it were closed. Either way we’d avoid the need to lookahead.
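As a rough illustration of that strategy (a sketch under my own assumptions, not the reference implementation), the scanner below accepts the opening fence immediately and only decides what to do about a missing closer once it reaches the end of the input. The function names and the `unclosed` switch are hypothetical, and children are kept as raw lines for brevity.

```python
def parse_fenced_container(lines, start):
    """Parse a container whose opening fence is at lines[start].

    Lines are consumed strictly left to right; the opener is accepted
    immediately, with no scan ahead for a closing fence.
    Returns (node, index_of_next_unconsumed_line).
    """
    node = {"opener": lines[start], "children": [], "closed": False}
    i = start + 1
    while i < len(lines):
        if lines[i].strip() == ":::":
            node["closed"] = True
            return node, i + 1
        node["children"].append(lines[i])
        i += 1
    return node, i  # reached the end of the input without a closer


def finish(node, unclosed="close"):
    """Apply one of the two fallbacks for an unclosed container."""
    if node["closed"] or unclosed == "close":
        # By analogy with fenced code blocks: treat it as if it were closed.
        return [{"type": "container", "opener": node["opener"],
                 "children": node["children"]}]
    # Otherwise: emit the opener as a paragraph, followed by the parsed blocks.
    return [{"type": "paragraph", "text": node["opener"]}] + node["children"]


node, next_index = parse_fenced_container(["::: spoiler", "# my title", "text"], 0)
as_container = finish(node)
as_literal = finish(node, unclosed="literal")
```

Either fallback is decided after a single pass, so nothing ever has to be re-read.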
jgm seems to be fine with the proposal for pandoc as well.
Thank you for the clarification. If I understand correctly, it is possible to efficiently parse and differentiate between
:::foo
as a leaf block directive, and
:::foo
:::
as a container block directive.
What I meant was that you were fine with this div-syntax from an implementation-perspective for pandoc (since you originally posted it on pandoc-discuss), not that it was your favourite of the different proposals. Sorry for the confusion.
I may have misunderstood the issue; I was only commenting on container blocks. A leaf block would presumably have a sequence of inline elements (rather than block elements) as its contents. I think, if generic leaf blocks were wanted, that it would be better to distinguish them syntactically from the starters for container blocks.
A leaf block would presumably have a sequence of inline elements (rather than block elements) as its contents.
Not quite; while a block leaf can contain inline markdown in its [] part, or as the value of one of its {} options, the “leaf” concept is really just about having a block that doesn’t have any further contents, like a video block.
I just read this interesting discussion and I’d like to chime in with a few remarks:
I don’t really like the ![](){} --> !default[](){} --> !image[](){} idea with redirecting to image or video based on the provided URL. This ties the generated output to a particular implementation of the parser which knows how to recognize if something is an image, a video, or something else. Parsers will have to look into each other’s implementation to figure out how to classify a particular URL to remain consistent. The output would also change if generated before or after a new video or image site or file format appears and is added to the list of known URLs. To me the ![]() image syntax is part of the core spec and as such should be strict and well-defined and always cause the same output to be generated in all cases. This would also make unit testing of implementations easier.
On the other hand, extensions can be completely free to generate anything, and it should be explicit that extensions are not standard syntax. That’s why I propose to distinguish clearly between extensions and core syntax by using a different marker character, for example, as previously suggested, @ instead of !. That way writers will know that a particular feature they’re using is tied to a particular website, CMS, or parser. Allowing an empty extension name would make this possible: @[](){} --> @default[](){} --> !image[](){}. This has the added benefit of avoiding name clashes between extensions and any future evolution of the spec: the spec might want to define a new name, but could not do so if a widely used extension had already taken it. I also rather like that @ stands out a little more than !, which makes extension uses easier to locate when rapidly scrolling through a document.
Concerning the multiple ways to provide arguments using [](){}, to me it looks like it could be simplified by considering that:
[] and () are both used to provide one or two unnamed ordered arguments
{} is used to provide both unnamed ordered arguments and named arguments
So [], () and unnamed arguments in {} could be used interchangeably. For example, @extension[argument 1](argument 2){argument 3, other: argument 4} and @extension[argument 1][argument 2][argument 3]{other: argument 4} are equivalent. Ultimately one of [] or () seems redundant but it is already widely used so they could be considered as alternative ways to say the same thing, as seems to be the general philosophy of the spec.
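To show what that equivalence could mean in practice, here is a small sketch that normalizes both spellings to the same (name, positional, named) triple. The grammar is my own guess at the proposal, not an agreed syntax: the regexes, the function name, and the comma/colon splitting inside {} are all assumptions, and a value containing a comma or colon (a URL, say) would need quoting rules this sketch does not have.

```python
import re

# Hypothetical grammar: @name, any number of [..] or (..) groups, optional {..}.
DIRECTIVE = re.compile(r"@(\w*)((?:\[[^\]]*\]|\([^)]*\))*)(?:\{([^}]*)\})?")
GROUP = re.compile(r"\[([^\]]*)\]|\(([^)]*)\)")

def parse_directive(text):
    """Return (name, positional_args, named_args), or None if no match."""
    m = DIRECTIVE.fullmatch(text.strip())
    if m is None:
        return None
    name, groups, braces = m.group(1), m.group(2), m.group(3)
    # [] and () are treated identically: each contributes one ordered argument.
    positional = [a or b for a, b in GROUP.findall(groups)]
    named = {}
    for item in (braces or "").split(","):
        item = item.strip()
        if not item:
            continue
        if ":" in item:
            key, value = item.split(":", 1)
            named[key.strip()] = value.strip()
        else:
            # Unnamed entries in {} extend the ordered arguments.
            positional.append(item)
    # An empty extension name falls back to "default", as suggested above.
    return name or "default", positional, named

a = parse_directive("@extension[argument 1](argument 2){argument 3, other: argument 4}")
b = parse_directive("@extension[argument 1][argument 2][argument 3]{other: argument 4}")
```

Both calls produce the same triple, which is all the interchangeability claim requires.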
Right, I guess @tabatkins was on to something and using a different opening-syntax for leaf-blocks and container-blocks would avoid lookahead in almost all cases; e.g. !foo[] for inline directives, :foo[] for leaf block directives, and for container block directives:
While separating leaf and container directives isn’t strictly necessary, it would enable one-liners for leaf blocks, instead of having the closing ::: always needed.
All in all, the goal is to make writing directives dead-simple. For example, with the following file named youtube.html placed in a leaf-block-directives-folder:
I think we have to be careful here not to make things too complicated. In my view, if we get to the point where the syntax isn’t easily readable and understandable by someone completely unfamiliar with Markdown, then we might as well just use inline HTML.
The syntax proposed in the original post looks okay, in my opinion. This is something we have to be very careful about though.
I agree that a lot of care is needed, and that it would be bad to end up with something that looks too markupy and not “markdownish.”
On the other hand, using inline HTML has the big disadvantage that it only works if you’re targeting HTML. In principle someone might want to render a CommonMark document to multiple formats, and then they’d want some format-neutral way of indicating this structure.
Does anyone know what the process is for getting a common extension syntax into the CommonMark spec proper? I am wanting to implement a syntax, but don’t want to choose the wrong one, and end up having to support two once one is official.
Is there a formal application/proposal process to get something into the spec?
The current priority is to get a solid spec for core elements. Extensions can be proposed and discussed on this forum, but won’t be a priority until the core is solid. (Things like raw HTML are still in flux.)
In some cases a good option will be to impose some conventions on existing syntax; you can then use the existing parser and make customizations either in the AST or with a custom writer or both. For example, you might treat a blockquote that starts with a level two header marked “Exercise” as a special “exercise” element, and render it in a distinctive way. This method has the advantage that your content would degrade nicely if the extension isn’t enabled.
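A minimal sketch of that kind of AST customization, assuming a toy AST made of plain dicts rather than any particular parser's node types (the `type`, `level`, `text`, and `children` keys are invented for the example):

```python
def mark_exercises(blocks):
    """Retag block quotes that open with a level-2 'Exercise' heading."""
    out = []
    for block in blocks:
        children = block.get("children", [])
        first = children[0] if children else None
        if (block["type"] == "block_quote"
                and first is not None
                and first["type"] == "heading"
                and first.get("level") == 2
                and first.get("text", "").strip() == "Exercise"):
            # Drop the marker heading and keep the rest as the exercise body.
            out.append({"type": "exercise", "children": children[1:]})
        else:
            out.append(block)
    return out

doc = [
    {"type": "block_quote", "children": [
        {"type": "heading", "level": 2, "text": "Exercise"},
        {"type": "paragraph", "text": "Prove the claim."},
    ]},
    {"type": "block_quote", "children": [
        {"type": "paragraph", "text": "Just a quote."},
    ]},
]
marked = mark_exercises(doc)
```

A renderer that does not know about the `exercise` type never sees it, because the source is still an ordinary block quote with a heading.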
This approach won’t always work well, but getting a collection of use cases and figuring out what isn’t currently possible to do will help in further development of the spec.
Anyway, let us know what you’re trying to achieve.
I couldn’t agree more with @jgm’s suggestion. As an example, I have been able to implement a convention for floating environments, such as figures, table and code listings, purely using existing md syntax (aside from attributed images, but that’s another story).
Like John suggested, if you are making an extension that maps to block-level content, making the starting delimiter a special section that starts with a keyword, such as
### Figure: proposed trajectory of Apollo 14 {#apollo14-plan}
gives some useful advantages:
Headers serve as a relatively unambiguous marker, both in the raw Markdown (it’s easy for a regex to parse) and in the parsed AST (there’s no need to recurse into AST trees since it’s only allowed at the “top level”).
Text editors, even with just the most rudimentary Markdown support, have no problem parsing headers. Most of them will also give you some sort of automatic TOC based on the headers. For my case, it was very nice because I jump between several text editors throughout the day and I was able to have my figures show up (along with their anchor label) in the TOC without having to write a single custom line of editor plugin code.
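For what it's worth, the regex side of this can be sketched in a few lines. This is not my actual implementation, just an illustration: the pattern, the function name, and the keywords `Table` and `Listing` are assumptions (only the `Figure:` form appears in the example above).

```python
import re

# Hypothetical pattern: ATX heading, keyword, caption, optional {#anchor}.
FLOAT_HEADING = re.compile(
    r"^(#{1,6})\s+(Figure|Table|Listing):\s*(.*?)\s*(?:\{#([\w-]+)\})?\s*$"
)

def find_floats(markdown):
    """Collect keyword headings from raw Markdown, one dict per match."""
    floats = []
    for line in markdown.splitlines():
        m = FLOAT_HEADING.match(line)
        if m:
            floats.append({
                "level": len(m.group(1)),
                "kind": m.group(2),
                "caption": m.group(3),
                "id": m.group(4),  # None when no {#anchor} is given
            })
    return floats

floats = find_floats(
    "### Figure: proposed trajectory of Apollo 14 {#apollo14-plan}\n"
    "Some body text.\n"
    "## Table: results\n"
)
```

A line-based scan like this will also match headings inside fenced code blocks, so a real tool would want to work from the parsed AST instead.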
My current use case is trying to map Markdown to reStructuredText directives. I’m one of the maintainers of Read the Docs and we currently support Sphinx with rST, and are planning to support the current spec. We will be waiting until an extension syntax is defined before adopting one, so we don’t end up supporting two.
We will be adding support to Sphinx for the current CommonMark spec through the CommonMark-Py implementation. For now, we will just implement it with no extensions, but it would be amazing to be able to get a more complete mapping to rST concepts.
My current thought is that rST directives (ref) are a bit too undefined. This leads to some situations, like the admonition directive, where the boundary between arguments and content is ambiguous. Another instance is the note directive, which has no arguments but has content.
I’d argue this is an annoying implementation detail of rST, but my main concern is being able to map properly. Having explicit argument, option, and content syntax would solve a lot of it for us from the Markdown side. Then we will just need to figure out how to map it to rST.
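To illustrate why the explicit split would help, here is a sketch of the rST side, assuming the Markdown parser has already handed us a directive name, an argument, options, and content as separate pieces. The function name and its signature are hypothetical; only the emitted directive layout is standard rST.

```python
def to_rst_directive(name, argument="", options=None, content=""):
    """Serialize explicit (argument, options, content) as an rST directive."""
    lines = [f".. {name}:: {argument}".rstrip()]
    for key, value in (options or {}).items():
        # Options are field-list entries indented under the directive line.
        lines.append(f"   :{key}: {value}".rstrip())
    if content:
        lines.append("")  # a blank line separates options from content
        lines.extend(("   " + line).rstrip() for line in content.splitlines())
    return "\n".join(lines) + "\n"

note = to_rst_directive("note", content="Be careful.")
admonition = to_rst_directive(
    "admonition", "Heads up", {"class": "warning"}, "Body text."
)
```

With the three parts kept separate on the Markdown side, the note/admonition ambiguity never arises: `note` simply gets an empty argument.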
I’d also love to get extensions into CommonMark so that we can begin to support them. We might go with your suggestions of extending current syntax for now – but the main power for us comes from a relatively complete mapping of the markup.
Some sphinx/docutils features do not map neatly with the current extension we are discussing here.
Namely, the least awkward course of action is to import the directive as-is and then use the generic attribute syntax for things like roles.
(I’m the guy adapting the remarkdown to use commonmark-py and I extended it already to support block attributes).