Mermaid - Generation of diagrams and flowcharts from text in a similar manner as markdown

There’s the “info string,” which can contain just about anything. So your AST filter just needs to look for a marker in the info string. For example:

``` dot
-- put your dot diagram here
```

So the info string is any text after the ``` but with a space? If so, then there would be a distinction between:

No Space:

```dot
-- put your dot diagram here
```

and With Space

``` dot
-- put your dot diagram here
```

Where the first one would be rendered as just “block code” but with dot syntax markup, and the second as a “dot diagram”? So with that logic, a “dot diagram” with “dot code fallback” would look like this?

```dot dot
-- put your dot diagram here
```

With its many English keywords, Mermaid fails to actually be like Markdown which doesn’t use keywords at all (although several extended flavors do).

I’m not saying a generic graph description language could or should be done without keywords, DOT uses them, too.

The leading space is trimmed. See http://spec.commonmark.org/0.22/#info-string
Dot diagram with dot code fallback could just be

``` dot
...
```

because the first word of the info string is used by default for the code syntax class.
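
To illustrate the two points above (the leading space is trimmed, and the first word becomes the syntax class), here is a minimal Python sketch. The function name `code_class` is made up for this example, and the `language-` prefix follows the convention the spec's examples use rather than anything a renderer is required to do.

```python
from typing import Optional


def code_class(info_string: str) -> Optional[str]:
    """Return the class a typical renderer would put on <code>, or None."""
    # Leading and trailing whitespace of the info string is not significant.
    words = info_string.strip().split()
    if not words:
        return None
    # Only the first word is used for the class; the rest is left
    # for the application to interpret however it likes.
    return "language-" + words[0]


print(code_class("dot"))          # language-dot
print(code_class(" dot"))         # language-dot
print(code_class("dot diagram"))  # language-dot
```

So "```dot" and "``` dot" end up identical, and anything after the first word is free for a filter to use as a marker.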

And if you want just the dot code without the dot diagram? (Say, specifically, the author is illustrating how to write dot code?) Perhaps .dot instead of dot in the infotext could indicate that it must be displayed as code?

This could be a difference between backtick and tilde fences. Tilde fences would be parsed if possible; backtick fences would always be displayed as source code.


It is quite a difficult syntax. Mermaid is overly complicated.
http://bramp.github.io/js-sequence-diagrams/ is much truer to Markdown.

And if you want just the dot code without the dot diagram? (Say, specifically, the author is illustrating how to write dot code?) Perhaps .dot instead of dot in the infotext could indicate that it must be displayed as code?

Well, it’s up to the particular application to define this. (There are so many special purpose things like this you might want to do, it would not make sense to build them into the spec.) An application could use anything it likes in the info string to trigger dot rendering. If you want to mix dot syntax examples with rendered diagrams, then you might want to use a different convention than the one used in gitit.


That sounds rather fair. I can imagine the spec having a section recommending that a trigger word always be provided, e.g. diagram, like:

``` dot diagram
...
```

And if diagram is not in the info string, then the plugin would simply ignore the block and let it fall back to code block mode.

Which would mean that by convention, the plugin will not activate without seeing the diagram keyword in the info-string (could be other keywords as it’s only a recommendation, not a standard).
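
A rough Python sketch of that convention, just to show how little a plugin would need. Everything here is hypothetical: `wants_diagram`, `render_block` and the `renderers` table are names invented for the example, and a real plugin would work on the AST rather than on strings.

```python
import html
from typing import Callable, Dict


def wants_diagram(info_string: str, trigger: str = "diagram") -> bool:
    """True if the trigger word appears after the first (language) word."""
    words = info_string.split()
    return trigger in words[1:]


def render_block(info_string: str, source: str,
                 renderers: Dict[str, Callable[[str], str]]) -> str:
    words = info_string.split()
    lang = words[0] if words else ""
    # Only hand the block to a diagram renderer when the author asked for it
    # AND a renderer for that language is actually installed.
    if wants_diagram(info_string) and lang in renderers:
        return renderers[lang](source)
    # Otherwise fall back to an ordinary code block.
    cls = f' class="language-{html.escape(lang)}"' if lang else ""
    return f"<pre><code{cls}>{html.escape(source)}</code></pre>"
```

With this, "``` dot diagram" goes to the dot renderer if one is registered, while "``` dot" always stays a code block, which also covers the case of an author showing how to write dot code.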

I cannot and don’t want to see CM adopting English trigger words.

However, there’s syntax precedent for embedding instead of more or less verbatim rendering: the exclamation mark in image syntax vs. normal hyperlinks. I could see extensions adopting this letter for the info string, e.g.:

``` dot!

or

```! dot

Those are totally different graph types, though. As an example of syntax, it’s interesting but not very closely related.

I totally agree with embracing the exclamation point as a generic marker of embedding content in commonmark, and this aligns with what I suggested elsewhere to disambiguate image link syntax and convert alt text into a visible caption.


Keep in mind that Mermaid, Viz.js (Graphviz), et al. are among those things that should work fine today as embedded HTML, with not much more syntax burden than fenced code blocks would require.

Whether that translates into usable graphing behavior when exporting to PDF or ePub, e.g., from a given implementation may be a different, more difficult, story.

If web components become a thing, and we use the ! as the trigger character for code blocks, then we could just make it so that:

``` !mermaid A Demo Diagram { .diagramStyle id=demoChart } 
graph TD;
    A-->B;
    A-->C;
    B-->D;
    C-->D;
```

is rendered as:

<mermaid class=diagramStyle id=demoChart alt="A Demo Diagram" >
    graph TD;
        A-->B;
        A-->C;
        B-->D;
        C-->D;
</mermaid>

Essentially a no-markdown island. Of course, you could also have a filter in front of the AST to catch and modify the behaviour before it defaults back to webcomponent mode.
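
Here is a minimal Python sketch of how such an info string could be turned into an element, assuming the layout shown above: `!tag`, then free text for alt, then an optional `{ ... }` with `.class`, `#id` and `key=value` tokens. The names `parse_embed` and `render_embed` are invented for this example; none of this is specified anywhere.

```python
import html
import re
from typing import Dict, Optional, Tuple

# "!tag", optional alt text, optional "{ attributes }".
INFO_RE = re.compile(
    r"^!(?P<tag>[A-Za-z][\w-]*)\s*(?P<alt>[^{]*?)\s*(?:\{(?P<attrs>[^}]*)\})?\s*$"
)


def parse_embed(info_string: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (tag, attributes), or None if this is not an embed request."""
    m = INFO_RE.match(info_string.strip())
    if m is None:
        return None
    classes, attrs = [], {}
    for token in (m.group("attrs") or "").split():
        if token.startswith("."):
            classes.append(token[1:])
        elif token.startswith("#"):
            attrs["id"] = token[1:]
        elif "=" in token:
            key, value = token.split("=", 1)
            attrs[key] = value
    if classes:
        attrs["class"] = " ".join(classes)
    if m.group("alt"):
        attrs["alt"] = m.group("alt")
    return m.group("tag"), attrs


def render_embed(info_string: str, body: str) -> Optional[str]:
    parsed = parse_embed(info_string)
    if parsed is None:
        return None  # caller falls back to a normal code block
    tag, attrs = parsed
    attr_text = "".join(f' {k}="{html.escape(v)}"' for k, v in attrs.items())
    # The body is escaped so it stays plain text inside the element.
    return f"<{tag}{attr_text}>\n{html.escape(body)}</{tag}>"
```

Returning None for anything that does not start with ! keeps ordinary code blocks untouched, so a filter in front of the AST could still override individual tags before this default kicks in.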

Using Concepts from:

  • Consistent attribute syntax
  • The alt field is heavily encouraged by the W3C for accessibility reasons, so the infotext should always be included in the alt field to encourage its usage (e.g. so blind users can understand what the diagram is supposed to be).

Note, until the official mermaid supports webcomponents, if the above approach is adopted… you will need to type like this:

``` !div { .mermaid } 
CHART DEFINITION GOES HERE
```

To render as in this official example:

<div class="mermaid"> 
CHART DEFINITION GOES HERE 
</div>

Alternatively, you can have an official commonmark-mermaid plugin that captures the web component and outputs the correct HTML code for the mermaid engine (at least until they switch to web components).

  • Another approach is to say that “.dot” and “#dot” in the infotext mean a div with class=dot or id=dot, respectively (e.g. .mermaid is equivalent to !div { .mermaid }).

If people like this idea and it has not been discussed before, I could probably paste this into its own thread. Or maybe slip it into the “no markdown island” thread if relevant.

I don’t like the generic attribute syntax with curly braces at all, but I could live with special-casing strings that are introduced by dot . or hash # in places that don’t get rendered. The exact treatment depends on the output format, HTML attributes id and class are obviously the default case. Other places than the info string of a start fence are the location and reference parts of links, but it’s not as compatible with existing implementations, e.g.:

[text](location .class #id)

[text][reference .class #id]
  [reference]: location

[text][reference]
  [reference]: location .class #id

So I presume you mean something like this, Crissov?

``` !mermaid A Demo Diagram { key=value } 
...
```

``` ! .mermaid A Demo Diagram { key=value } 
...
```

``` ! #mermaid A Demo Diagram { key=value } 
...
```

``` ! #mermaid .mermaid A Demo Diagram { key=value } 
...
```

``` !mermaid #mermaid .mermaid A Demo Diagram { key=value } 
...
```

which would map, in order, to:

<mermaid key=value alt="A Demo Diagram" >...</mermaid>
<div class=mermaid key=value alt="A Demo Diagram" >...</div>
<div id=mermaid key=value alt="A Demo Diagram" >...</div>
<div class=mermaid id=mermaid key=value alt="A Demo Diagram" >...</div>
<mermaid class=mermaid id=mermaid key=value alt="A Demo Diagram" >...</mermaid>

In most cases the generic attribute syntax can be omitted if you do not need to set any values in HTML (or plugin/extension settings). Basically, I know it’s ugly, but it helps avoid overloading the most common CommonMark syntax by moving the cruft into an optional {}.

On the question of compatibility. This is what it would generally look like (in current commonmark):

<pre><code class="language-!mermaid">...</code></pre>

Edit: Added ! .mermaid to .mermaid and others, to account for the case that people want to style “code blocks” rather than styling web-components.

I didn’t say anything about alt or rather title, which isn’t such a bad idea but should go inside quotation marks, and I’d leave out the curly braces part, of course.

``` !mermaid "A Demo Diagram" .mermaid #mermaid something else

<mermaid title="A Demo Diagram" class="mermaid" id="mermaid">…</mermaid>

I don’t think it is necessary to handle syntax extensions like Mermaid by extending a CommonMark processor like cmark at all. I use a regular CommonMark (or Markdown) syntax processor in a very similar scenario without difficulties, and I think my approach can be used in a very general way.

I am currently using (a slightly extended clone of) cmark to generate HTML documents, not directly from CommonMark-formatted input, but from original “extended” syntax input after a pre-processing step.

For example, I generate this HTML from that input.

The cmark input is not the hand-written CommonMark text, but the output of another tool of mine which acts as a pre-processor: It reads the original text input (written in the “extended” syntax) and replaces parts of the hand-written text with HTML mark-up, while leaving the rest alone (which is all the regular CommonMark text).

This output is then fed to cmark, and cmark ignores and passes through the inserted HTML (as the CommonMark and Markdown specifications require), while doing its job on the CommonMark text. The output from this step is the final HTML document.

In a Makefile, the two steps look like this (cm2html outputs a whole HTML document, including the document type definition and the <HEAD>, but is in all other respects the CommonMark parser):

$(HTML): $(MARKDOWN)
        zhtml -aUm $(MARKDOWN) >"%TEMP%\$(TMPNAME)"
        cm2html "%TEMP%\$(TMPNAME)" >$(HTML)

This works quite nicely and makes it possible to “extend” the CommonMark syntax with another syntax, which in my case is very similar in spirit, namely the “e-mail mark-up” of the Z notation, as defined in ISO/IEC 13568:2000.

I’m sure that other “extended” syntaxes could be accommodated using the same approach (which is, by the way, very much like the time-honored use of pre-processors for troff), and that the same can be done for Mermaid.

(While this answer was written with cmark in mind, nothing in this process is specific to CommonMark, and works just as well with other Markdown processors.)
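
For readers who want to see the shape of such a pre-processing step, here is a minimal Python sketch. It is not zhtml: `convert_z` is a hypothetical stand-in that merely wraps the paragraph in a pre element, and the only delimiters handled are a %%Z line and a closing %% line. The point is just that everything outside those delimiters passes through untouched.

```python
import html
import re

# A paragraph that starts with a "%%Z" line and ends with a "%%" line.
Z_BLOCK = re.compile(r"^%%Z\n(.*?)^%%\n", re.MULTILINE | re.DOTALL)


def convert_z(source: str) -> str:
    # Hypothetical stand-in for the real conversion: emit an HTML block
    # that a CommonMark processor will pass through unchanged.
    return '<pre class="z-notation">' + html.escape(source) + "</pre>\n"


def preprocess(text: str) -> str:
    """Replace Z paragraphs with HTML and leave all other text alone."""
    return Z_BLOCK.sub(lambda m: convert_z(m.group(1)), text)


if __name__ == "__main__":
    sample = "Some *CommonMark* text.\n\n%%Z\nX == Y %x Z\n%%\n\nMore text.\n"
    print(preprocess(sample))
```

The output of `preprocess` would then be handed to the CommonMark processor as the second step, just like in the Makefile above.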

It is certainly an option to just use a preprocessor (and it can be an effective solution for extensions that are local to a specific website). But it would quickly get unwieldy once you implement more than one preprocessor filter, due to syntax collisions.

Plus, it’s not obvious in the original source that it is Z notation. Hence it’s safer for the interpreter to have an “island” of known non-Markdown content that specifies for itself which extensions can be safely applied to it in a portable manner.

Yes, placing “foreign” syntax in kind-of code block boxes tagged with an indicator to signal how to process them: if I understand you correctly, that would be your approach, and I think it could work. Taking my use case, that would mean a mark-up like this:

... in normal _CommonMark_ text here, but now:

~~~~Z
%%Z
X == Y %x Z
%%
~~~~

Back in _CommonMark_

where cmark would somehow be configured to pass the content of the ~~~~Z block through an external converter like my zhtml, right?
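
A minimal Python sketch of that dispatch, under simplifying assumptions: only tilde fences are handled, the closing fence must be exactly as long as the opening one, and the converters are plain Python callables (in practice one of them could shell out to an external tool). The names `expand_foreign_blocks` and `converters` are invented for this example.

```python
import re
from typing import Callable, Dict

# Opening fence of 3+ tildes, a tag, the body, and a closing fence of
# the same length. (?!~) stops the fence from backtracking to a shorter run.
FENCE = re.compile(
    r"^(~{3,})(?!~)[ \t]*(\S*)[^\n]*\n(.*?)^\1[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def expand_foreign_blocks(text: str,
                          converters: Dict[str, Callable[[str], str]]) -> str:
    """Run tagged blocks through their converter; leave the rest as-is."""
    def replace(match: "re.Match[str]") -> str:
        tag, body = match.group(2), match.group(3)
        converter = converters.get(tag)
        if converter is None:
            # Unknown tag: keep the fence, so the CommonMark processor
            # renders it as an ordinary code block.
            return match.group(0)
        return converter(body)

    return FENCE.sub(replace, text)
```

A block tagged Z would be converted if a Z converter is registered, while a block tagged, say, python stays a normal fenced code block.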

I’ll give you that this would be a relatively clean way to have one main processor, cmark, and one or more subordinate processors for “foreign” syntaxes, like zhtml.

But I see two drawbacks:

  1. It is actually more verbose to enforce “double” mark-up for the Z notation paragraphs in this way, while ^%%Z$ is already one of several perfectly recognizable markers for such paragraphs (and a preprocessor could certainly recognize ~~~~Z too!);

  2. It does not provide a solution for in-line “foreign” syntaxes, which the pre-processor approach naturally does – I use $ to delimit in-line Z notation.

Furthermore

  • I don’t see how this would get unwieldy if combining several pre-processors, as long as each preprocessor leaves HTML marked-up stuff alone (as they do already);

  • whether it is obvious in my example where Z notation begins and ends is somewhat a matter of taste; the standard I mentioned (which specifies this Z mark-up) at least prescribes delimiters like %%Z (and others) to delimit various kinds of paragraphs in Z notation, which I do not find hard to perceive. Remember that the exact same pre-processor approach would work just as well with ^~~~~Z / ^~~~~ delimiters too: it’s just the pre-processor that needs to recognize them.

So no, I’m not really convinced of the merits of modifying or extending cmark, and hence the CommonMark syntax itself, in some way. But using some special kind of code blocks (and inline code spans too?) for this purpose would be a generic approach, yes.

But I do see the advantage in portability if each block of “foreign” syntax is enclosed in a tagged code block: if nothing can be done with the content, cmark could just treat it as an ordinary code block, if that’s what you’re aiming at.

Wouldn’t that be exactly the way to implement syntax highlighting in the existing notation for blocks of code, tagged with a language identifier like PHP, or C++?
