A leaf block would presumably have a sequence of inline elements (rather than block elements) as its contents.
Not quite; while a block leaf can contain inline markdown in its [] part, or as the value of one of its {} options, the “leaf” concept is really just about having a block that doesn’t have any further contents, like a video block.
I just read this interesting discussion and I’d like to chime in with a few remarks:
I don’t really like the ![](){} --> !default[](){} --> !image[](){} idea with redirecting to image or video based on the provided URL. This ties the generated output to a particular implementation of the parser which knows how to recognize if something is an image, a video, or something else. Parsers will have to look into each other’s implementation to figure out how to classify a particular URL to remain consistent. The output would also change if generated before or after a new video or image site or file format appears and is added to the list of known URLs. To me the ![]() image syntax is part of the core spec and as such should be strict and well-defined and always cause the same output to be generated in all cases. This would also make unit testing of implementations easier.
On the other hand, extensions can be completely free to generate anything, and it should be explicit that extensions are not standard syntax. That’s why I propose to distinguish clearly between extensions and core syntax by using a different marker character, for example, as previously suggested, @ instead of !. That way writers will know that a particular feature they’re using is tied to a particular website, CMS, or parser. Allowing an empty extension name would make this possible: @[](){} --> @default[](){} --> !image[](){}. This has the added benefit of avoiding name clashes between extensions and any future evolution of the spec, which might want to define a new name but could not do so if a widely used extension had already claimed it. I also rather like that @ stands out a little more than !, which makes extension uses easier to locate when rapidly scrolling through a document.
Concerning the multiple ways to provide arguments using [](){}, to me it looks like it could be simplified by considering that:
[] and () are both used to provide one or two unnamed ordered arguments
{} is used to provide both unnamed ordered arguments and named arguments
So [], () and unnamed arguments in {} could be used interchangeably. For example, @extension[argument 1](argument 2){argument 3, other: argument 4} and @extension[argument 1][argument 2][argument 3]{other: argument 4} are equivalent. Ultimately one of [] or () seems redundant, but both are already widely used, so they could be considered alternative ways to say the same thing, as seems to be the general philosophy of the spec.
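To make the equivalence concrete, here is a minimal Python sketch of how a parser might normalize the two spellings. The grammar, the regular expressions, and the name `parse_directive` are my own assumptions for illustration, not part of any spec or existing implementation. Note that in this sketch any {} item containing a colon is treated as a named argument, which would misclassify an unnamed argument such as a URL.

```python
import re

# Hypothetical grammar: "@name", then any number of [...] or (...) groups,
# then an optional trailing {...} group. Nested brackets are not handled.
DIRECTIVE = re.compile(
    r"@(?P<name>\w+)"
    r"(?P<groups>(?:\[[^\]]*\]|\([^)]*\))*)"
    r"(?:\{(?P<braces>[^}]*)\})?"
)
GROUP = re.compile(r"\[([^\]]*)\]|\(([^)]*)\)")


def parse_directive(text):
    """Return (name, positional_args, named_args) for a single directive."""
    match = DIRECTIVE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a directive: {text!r}")
    # [] and () groups are treated identically: each one is a positional argument.
    positional = [a or b for a, b in GROUP.findall(match.group("groups"))]
    named = {}
    for item in (match.group("braces") or "").split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition(":")
        if sep:
            named[key.strip()] = value.strip()   # "key: value" is a named argument
        else:
            positional.append(item)              # bare item is one more positional
    return match.group("name"), positional, named


first = parse_directive(
    "@extension[argument 1](argument 2){argument 3, other: argument 4}")
second = parse_directive(
    "@extension[argument 1][argument 2][argument 3]{other: argument 4}")
```

Both calls produce the same name, the same three positional arguments, and the same single named argument, which is the sense in which the two forms are interchangeable.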
Right, I guess @tabatkins was on to something and using a different opening-syntax for leaf-blocks and container-blocks would avoid lookahead in almost all cases; e.g. !foo[] for inline directives, :foo[] for leaf block directives, and for container block directives:
While separating leaf and container directives isn’t strictly necessary, it would enable one-liners for leaf blocks, instead of having the closing ::: always needed.
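As a rough illustration of the no-lookahead point, the sketch below classifies a block-level line from its opening characters alone. It assumes a single leading colon opens a leaf directive and that ::: opens and closes a container directive; the function name and patterns are hypothetical.

```python
import re

CONTAINER_OPEN = re.compile(r":::\s*(\w+)")
LEAF = re.compile(r":(\w+)\[")


def classify_line(line):
    """Classify one line by its opening marker only, without reading ahead."""
    stripped = line.strip()
    if stripped == ":::":
        return ("container_close", None)
    match = CONTAINER_OPEN.match(stripped)
    if match:
        return ("container_open", match.group(1))
    match = LEAF.match(stripped)
    if match:
        return ("leaf", match.group(1))
    return ("paragraph", None)
```

Because a leaf directive is recognized from its first line, it never needs a closing ::: and can be written as a one-liner.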
All in all, the goal is to make writing directives dead-simple. For example, with the following file named youtube.html placed in a leaf-block-directives-folder:
I think we have to be careful here not to make things too complicated. In my view, if we get to the point where the syntax isn’t easily readable and understandable by someone completely unfamiliar with Markdown, then we might as well just use inline HTML.
The syntax proposed in the original post looks okay, in my opinion. This is something we have to be very careful about though.
I agree that a lot of care is needed, and that it would be bad to end up with something that looks too markupy and not “markdownish.”
On the other hand, using inline HTML has the big disadvantage that it only works if you’re targeting HTML. In principle someone might want to render a CommonMark document to multiple formats, and then they’d want some format-neutral way of indicating this structure.
Does anyone know what the process is for getting a common extension syntax into the CommonMark spec proper? I am wanting to implement a syntax, but don’t want to choose the wrong one, and end up having to support two once one is official.
Is there a formal application/proposal process to get something into the spec?
The current priority is to get a solid spec for core elements. Extensions can be proposed and discussed on this forum, but won’t be a priority until the core is solid. (Things like raw HTML are still in flux.)
In some cases a good option will be to impose some conventions on existing syntax; you can then use the existing parser and make customizations either in the AST or with a custom writer or both. For example, you might treat a blockquote that starts with a level two header marked “Exercise” as a special “exercise” element, and render it in a distinctive way. This method has the advantage that your content would degrade nicely if the extension isn’t enabled.
This approach won’t always work well, but getting a collection of use cases and figuring out what isn’t currently possible to do will help in further development of the spec.
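A minimal sketch of the “Exercise” convention, working on the raw text rather than a real AST so that it stays self-contained. The function name is hypothetical, and lazy continuation lines and nested blockquotes are deliberately ignored; a real implementation would more likely walk the parser's AST.

```python
import re

EXERCISE_HEADING = re.compile(r"^##\s+Exercise\b")


def find_exercises(markdown):
    """Return the body of every blockquote whose first line is '## Exercise'."""
    exercises = []
    block = []
    # The trailing "" guarantees that a blockquote at the end of the text is flushed.
    for line in markdown.splitlines() + [""]:
        if line.startswith(">"):
            # Drop the ">" marker and one optional following space.
            block.append(line[2:] if line.startswith("> ") else line[1:])
        else:
            if block and EXERCISE_HEADING.match(block[0]):
                exercises.append("\n".join(block[1:]).strip())
            block = []
    return exercises


doc = "Intro\n\n> ## Exercise\n> Prove that 2 + 2 = 4.\n\n> Just a quote.\n"
found = find_exercises(doc)
```

A renderer that does not know the convention still shows an ordinary blockquote with a heading, which is the graceful degradation mentioned above.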
Anyway, let us know what you’re trying to achieve.
I couldn’t agree more with @jgm’s suggestion. As an example, I have been able to implement a convention for floating environments, such as figures, tables and code listings, purely using existing md syntax (aside from attributed images, but that’s another story).
Like John suggested, if you are making an extension that maps to block-level content, making the starting delimiter a special section that starts with a keyword, such as
### Figure: proposed trajectory of Apollo 14 {#apollo14-plan}
gives some useful advantages:
Headers serve as a relatively unambiguous marker, both in the raw Markdown (it’s easy for a regex to parse) and in the parsed AST (there’s no need to recurse into AST trees since it’s only allowed at the “top level”).
Text editors, even with just the most rudimentary Markdown support, have no problem parsing headers. Most of them will also give you some sort of automatic TOC based on the headers. For my case, it was very nice because I jump between several text editors throughout the day and I was able to have my figures show up (along with their anchor label) in the TOC without having to write a single custom line of editor plugin code.
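To show how little is needed on the raw-Markdown side, here is a sketch of a regex that picks the figure heading apart. The pattern and the function name are my own assumptions, and the {#id} part is treated as optional.

```python
import re

FIGURE_HEADING = re.compile(
    r"^(?P<hashes>#{1,6})\s+Figure:\s*"
    r"(?P<caption>.*?)\s*"
    r"(?:\{#(?P<anchor>[\w-]+)\})?\s*$"
)


def parse_figure_heading(line):
    """Return level, caption and anchor for a 'Figure:' heading, else None."""
    match = FIGURE_HEADING.match(line)
    if match is None:
        return None
    return {
        "level": len(match.group("hashes")),
        "caption": match.group("caption"),
        "anchor": match.group("anchor"),  # None when no {#id} is present
    }


figure = parse_figure_heading(
    "### Figure: proposed trajectory of Apollo 14 {#apollo14-plan}")
```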
My current use case is trying to map Markdown to reStructuredText directives. I’m one of the maintainers of Read the Docs and we currently support Sphinx with rST, and are planning to support the current spec. We will be waiting until an extension syntax is defined before adopting one, so we don’t end up supporting two.
We will be adding support to Sphinx for the current CommonMark spec through the CommonMark-Py implementation. For now, we will just implement it with no extensions, but it would be amazing to be able to get a more complete mapping to rst concepts.
My current thought is that rST directives (ref) are a bit too undefined. This leads to some situations, like the admonition, where the arguments and content are ambiguous. Another instance is the note directive, which has no arguments but has content.
I’d argue this is an annoying implementation detail of rST, but my main concern is being able to map properly. Having explicit argument, option, and content syntax would solve a lot of it for us from the Markdown side. Then we will just need to figure out how to map it to rST.
I’d also love to get extensions into CommonMark so that we can begin to support them. We might go with your suggestions of extending current syntax for now – but the main power for us comes from a relatively complete mapping of the markup.
Some sphinx/docutils features do not map neatly with the current extension we are discussing here.
Namely, the least awkward course of action is to import the directive as-is and then use the generic attribute syntax for things like roles.
(I’m the guy adapting the remarkdown to use commonmark-py and I extended it already to support block attributes).
Yeah, the main thing I’m worried about in the deeper mapping is the dependence of the directives on the rst parser. I guess at the end of the day, even if we push the content into the directive as a string, it would be parsed as rst while the author would be writing markdown, so we need to use the generic way of mapping parsed commonmark objects.
Not really, the directive interface itself just takes the text and the option and passes it down to a specialized parser. Usually the parser feeds back docutils nodes to be processed by the generic docutils code.
The problem I have is that the model of those directives is that options can have the same name and appear multiple times, while here we have a dictionary, and thus a single value per key.
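One way around the single-key limitation, sketched here purely as an assumption, is to keep the options as an ordered list of pairs and group repeated names into lists. The option names in the example are illustrative only.

```python
from collections import defaultdict


def group_options(pairs):
    """Group (name, value) pairs so that repeated names keep all their values."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


options = group_options([
    ("raises", "ValueError"),
    ("raises", "TypeError"),
    ("rtype", "int"),
])
```

A plain dictionary built from the same pairs would silently keep only the last `raises` entry.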
Interesting, we first thought about having both arguments (i.e. filenames, urls and other identifiers) and options (i.e. key-value pairs which we call attributes) as well. But then we dropped the former since having both is somewhat redundant—you can simply write :youtube[funny cat]{id=1234 fullscreen=true} instead of :youtube[funny cat](1234){fullscreen=true}—and it has the potential to confuse users unnecessarily.
(Note that you can still have attributes as well as content, the difference being that attribute-key and -values are plain strings, while content is markdown as well.)
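For illustration, a small sketch of parsing the attributes-only form. The pattern and the function name are hypothetical; the content is returned unparsed, while attribute values stay plain strings as described above.

```python
import re
import shlex

LEAF_DIRECTIVE = re.compile(
    r"^:(?P<name>[\w-]+)\[(?P<content>[^\]]*)\](?:\{(?P<attrs>[^}]*)\})?$"
)


def parse_leaf_directive(line):
    """Split ':name[content]{key=value ...}' into its three parts."""
    match = LEAF_DIRECTIVE.match(line.strip())
    if match is None:
        return None
    attributes = {}
    # shlex keeps quoted values such as key="two words" together as one token.
    for token in shlex.split(match.group("attrs") or ""):
        key, _, value = token.partition("=")
        attributes[key] = value  # a bare key with no "=" gets an empty string
    return {
        "name": match.group("name"),
        "content": match.group("content"),  # still markdown, not parsed here
        "attributes": attributes,
    }


video = parse_leaf_directive(":youtube[funny cat]{id=1234 fullscreen=true}")
```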
@lu_zero, are you talking about the markdown directives proposal having no arguments, or?
Do you guys think this poses a major problem for mapping to docutils?
@ericholscher, would love to hear your feedback on the proposal in the first post in this topic, which I’ve updated a few times as the discussion progressed.
I think the proposal generally looks good. My main concern is with the above example, which conflates content with arguments. I.e., I would consider “Foobar” an argument to the wikipedia directive, but “content” is actual content for the smallcaps directive. I believe rst did this same thing with mixing arguments and content, but I don’t know if there is really a way around it, since users will likely do it because it’s a more natural syntax.
For example, this feels kinda off to me:
:wikipedia[**Foobar**]
:smallcaps[**we went for a run**]
Though having it be :wikipedia{Foobar} might be workable, I think that the separation will not be kept by third-party authors. I think it might be an unavoidable side effect of offering this kind of feature, and probably not something a spec can prevent.
I’m not totally set on RST’s differentiation between arguments and options. I think we could do a hacky thing and map {arguments='string of arguments'} or something to produce a more compatible mapping. I think at the end of the day, mapping perfectly is going to be impossible, or at least really awkward.
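A rough sketch of that hacky mapping: a reserved `arguments` attribute becomes the rST directive arguments and every other attribute becomes an option. The function name and the three-space indent are my own choices, and no escaping or validation is attempted.

```python
def to_rst(name, attributes, content=""):
    """Render a parsed Markdown directive as rST directive text."""
    options = dict(attributes)
    # The reserved "arguments" key carries the rST directive arguments.
    arguments = options.pop("arguments", "")
    lines = [f".. {name}:: {arguments}".rstrip()]
    for key, value in options.items():
        lines.append(f"   :{key}: {value}".rstrip())
    if content:
        lines.append("")  # blank line separates options from content
        lines.extend(f"   {line}" if line else "" for line in content.splitlines())
    return "\n".join(lines)


image = to_rst("image", {"arguments": "picture.png", "width": "200px"})
note = to_rst("note", {}, "Remember this.")
```

The note case shows the situation mentioned earlier: a directive with no arguments but with content.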
.. py:function:: send_message(sender, recipient, message_body, [priority=1])

   Send a message to a recipient

   :param str sender: The person sending the message
   :param str recipient: The recipient of the message
   :param str message_body: The body of the message
   :param priority: The priority of the message, can be a number 1-5
   :type priority: integer or None
   :return: the message id
   :rtype: int
   :raises ValueError: if the message_body exceeds 160 characters
   :raises TypeError: if the message_body is not a basestring
The simplest way is to make something like
:::rst
.. py:function:: send_message(sender, recipient, message_body, [priority=1])

   Send a message to a recipient

   :param str sender: The person sending the message
   :param str recipient: The recipient of the message
   :param str message_body: The body of the message
   :param priority: The priority of the message, can be a number 1-5
   :type priority: integer or None
   :return: the message id
   :rtype: int
   :raises ValueError: if the message_body exceeds 160 characters
   :raises TypeError: if the message_body is not a basestring
:::
or
:::py_function[send_message(sender, recipient, message_body, [priority=1])]
Send a message to a recipient
:param[str](sender){c="The person sending the message"}
:param[str](recipient){c="The recipient of the message"}
:param[str](message_body){c="The body of the message"}
:param[](priority){c="The priority of the message, can be a number 1-5"}
:type[priority](integer, None)
:return[](){c="the message id"}
:rtype[int]
:raises[ValueError](if the message_body exceeds 160 characters)
:raises[TypeError](if the message_body is not a basestring)
:::
to use something close to what we discussed so far.
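For the first option, the Markdown side only has to find the :::rst blocks and hand their contents to the rst parser untouched. A minimal sketch follows; the function name is hypothetical and nesting is not supported.

```python
def split_rst_blocks(markdown):
    """Split text into ("markdown", text) and ("rst", text) chunks, in order."""
    chunks = []
    kind = "markdown"
    buffer = []
    for line in markdown.splitlines():
        stripped = line.strip()
        if kind == "markdown" and stripped == ":::rst":
            if buffer:
                chunks.append(("markdown", "\n".join(buffer)))
            kind, buffer = "rst", []
        elif kind == "rst" and stripped == ":::":
            chunks.append(("rst", "\n".join(buffer)))
            kind, buffer = "markdown", []
        else:
            buffer.append(line)
    if buffer:
        # An unterminated :::rst block is kept as rst up to the end of the text.
        chunks.append((kind, "\n".join(buffer)))
    return chunks


parts = split_rst_blocks("Intro\n:::rst\n.. note::\n\n   Hi\n:::\nOutro")
```

Each "rst" chunk could then be passed to docutils as-is, while the "markdown" chunks go through the CommonMark parser.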
Interesting. That is certainly one way to make it work, and likely less hacky than all the other ways I was looking at doing it. It will make the user write rst, but it makes a nice bridge between the two.