Yes, I agree that having a mechanism by which “tagged” code blocks could be processed by more-or-less “external” tools, in a common and general way, would be a good thing. There are several comments on the details I would like to make.
The `!` to mark “external syntax” would be fine with me too. But what would a label on a code block be good for if it had no consequences? Just for documentation? And if it did have consequences (other than in the current `cmark`, if I remember correctly!), how would these consequences come about if not through an “external” processor of the kind we’re talking about, be it for syntax highlighting in the HTML output, or for a completely new and different mark-up syntax?
Thinking about how `cmark` could find the appropriate processor for the tag, and how this “external” processor would be invoked (a plug-in architecture? the standard C library’s `system()`? both? how does the “external” processor’s output come back into the `cmark` process, which is still in the middle of processing an input document?), I felt that this all seemed too complex and brittle to implement, at least using the standard C library only.
And considering that it is a reference implementation after all, I would strongly prefer `cmark` to remain “just” a standard C program, even if the sources are currently half-way between C95 and C99. (Requiring `<stdbool.h>` for example, but no longer requiring C99’s declaration-is-a-statement syntax.) And IMO it should not come to depend on `<dlfcn.h>` on U*IX or on `LoadLibrary()` on Windows, for example.
But I now think that both of us (or rather: both of our approaches and preferences) could have our cake and eat it too, with a more general and simpler implementation strategy:
Instead of invoking an “external” processor to translate the content of such a “tagged” code block, wouldn’t it be much easier, cleaner and more robust if `cmark` simply output the whole content of said code block wrapped inside an SGML/HTML/XML element, say with a special `class` attribute, or even a configurable tag name? This would be trivially easy to implement in `cmark`, I guess.
It would certainly be no problem at all to create a new formatting tool, or adapt an existing one like mine, or say a processor for Mermaid, to scan its input text for just these special elements (resp. tags), and then replace each element with the processed PCDATA content of the element itself, would it? I can even imagine factoring this switch between “copy text outside these elements” and “replace these elements with their processed content” into a kind of post-processor infrastructure library, or post-processor-applying tool, completely independent of `cmark` of course.
Because these elements would exist solely for the communication between `cmark` and an “external” processor (a post-processor this time …), no SGML/HTML/XML document type definition needs to be construed or modified, as long as the tag in use does not conflict with the target document’s DTD (or XML schema, or whatever). This is very simple to guarantee by inventing and using a tag in a made-up namespace like `<commonmark:specialblock class="Z">`, to continue our “Z notation” example. Remember that no one but a post-processor would actually see these elements.
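To make this concrete, the wrapped output for a code block labelled `Z` might look like the following. This is purely hypothetical output: the tag name is just the made-up example from above, not anything `cmark` emits today.

```html
<!-- Hypothetical cmark output for a fenced code block labelled "Z".
     The content stays entity-encoded, so the document remains valid. -->
<commonmark:specialblock class="Z">
x &lt; 10 &amp;&amp; y = f(x)
</commonmark:specialblock>
```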
And for a post-processor to just filter these elements would not even require an XML parser; a simple text search would suffice: we can, after all, rely on the exact spelling of these tags and their attributes.
These elements would be my answer to the question of how (post-)processors are expected to locate “their” content to process.
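Such a post-processor could be as small as the following plain C sketch. Everything here is invented for illustration: the fixed tag spelling, the `process_block` stand-in (which a real tool, say a Z notation formatter, would replace), and the caller-supplied output buffer.

```c
#include <stdio.h>
#include <string.h>

#define OPEN_TAG  "<commonmark:specialblock class=\"Z\">"
#define CLOSE_TAG "</commonmark:specialblock>"

/* Stand-in for a real "external" processor; a Z notation formatter
 * would go here. This one just wraps the content in a div. */
static char *process_block(const char *content, size_t len, char *out)
{
    return out + sprintf(out, "<div class=\"zed\">%.*s</div>",
                         (int)len, content);
}

/* Copy `in` to `out`, replacing each special element with the
 * processed version of its PCDATA content. A plain text search
 * suffices: the spelling of the tags is under our control. */
static void postprocess(const char *in, char *out)
{
    for (;;) {
        const char *open = strstr(in, OPEN_TAG);
        if (open == NULL) {              /* no (more) special elements */
            strcpy(out, in);
            return;
        }
        memcpy(out, in, (size_t)(open - in));  /* copy text before the tag */
        out += open - in;
        const char *body  = open + strlen(OPEN_TAG);
        const char *close = strstr(body, CLOSE_TAG);
        if (close == NULL) {             /* unterminated: copy verbatim */
            strcpy(out, open);
            return;
        }
        out = process_block(body, (size_t)(close - body), out);
        in = close + strlen(CLOSE_TAG);
    }
}
```

The loop is exactly the “copy text outside these elements, replace the elements with their processed content” switch described above, and nothing more.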
In case the author tagged his code block with a completely nonsensical label, for which no processor ever existed, let alone is available in the processing chain, one could restrict `cmark`’s behaviour so that it outputs code block content wrapped in such an element only for a known, given list of code block labels, and for all other labels falls back on the current behaviour as the default: here is the graceful degradation for you!
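In `cmark` itself, that restriction could be as small as an allow-list lookup. A sketch only; the labels and the function name `label_is_known` are invented here:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical allow-list: only these code block labels would get
 * the special wrapper element; all others keep the current output. */
static const char *known_labels[] = { "Z", "mermaid", NULL };

static bool label_is_known(const char *label)
{
    for (size_t i = 0; known_labels[i] != NULL; i++)
        if (strcmp(known_labels[i], label) == 0)
            return true;
    return false;
}
```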
This would in turn obviate the distinction, made with a `!` or similar in the author’s written text, between code blocks to be processed by an “external” processor in another syntax, and code blocks as we all know and use them already. So my desire not to change CommonMark’s or `cmark`’s behaviour in any substantial way is satisfied too.
I’m not sure it would be a good idea not to entity-encode the unformatted content of these code blocks (in order to spare the “external” processor the reversal of this): I would much rather have `cmark` output a valid XML/SGML/HTML document. Simply replacing the `&lt;` and `&amp;` entity references would then be all the “external” processor has to know and achieve regarding its input text stream, while just copying all the rest of the input, outside of these elements, to its output without any processing.
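That reversal is only a handful of lines. A sketch, assuming `&lt;` and `&amp;` really are the only entities used inside these elements (real cmark output encodes a few more characters, which a real tool would have to handle the same way):

```c
#include <string.h>

/* Undo the minimal entity-encoding inside the wrapper element:
 * turn &lt; back into '<' and &amp; back into '&'. Everything
 * else is copied through untouched. */
static void decode_entities(const char *in, char *out)
{
    while (*in != '\0') {
        if (strncmp(in, "&lt;", 4) == 0) {
            *out++ = '<';
            in += 4;
        } else if (strncmp(in, "&amp;", 5) == 0) {
            *out++ = '&';
            in += 5;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```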
I would argue that this approach would work in a transparent and robust manner to have these “tagged” code blocks processed in whichever way you like, while at the same time being completely compatible with the existing CommonMark specification, practice and “feel”.
What would you think about this approach? And what was on your mind regarding the question of how an “external” processor would get invoked, and so on?
[Sorry if this was again a very long post, but there are a lot of details to iron out …]
(The whole topic of pre-processing in the style I use now is independent of this design of “tagged” code blocks, needs no adaptation in `cmark`, and is IMO a matter of taste: as long as `cmark` keeps “supporting” it, I think we can regard it as off-topic now, or rather as a nifty little trick I would recommend, but which you are free to dislike and dismiss.)