Multiline Image URLs for e.g. PlantUML

Coming to CommonMark from GitHub Markdown, we desperately miss multiline image URLs, e.g. for embedding PlantUML diagrams in our documentation. Here’s an example:

![](http://plantuml.rado0x54.com/png?uml=
@startuml
Hans -> John : Hallo
John -> Carla : Hi
Carla -> Hans : Ola
@enduml
)

This worked on GitHub and still works in some Markdown editors like MacDown, and it would be great to have it back. All additional lines within the parentheses are simply URL-escaped on the fly, which yields the following code for the example above:

![](http://plantuml.rado0x54.com/png?uml=@startuml%0AHans%20-%3E%20John%20%3A%20Hallo%0AJohn%20-%3E%20Carla%20%3A%20Hi%0ACarla%20-%3E%20Hans%20%3A%20Ola%0A@enduml)
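
For anyone who wants this behaviour today, the escaping step can be approximated as a preprocessing pass before the CommonMark parser runs. The following is only a rough sketch, not how GitHub or MacDown actually implemented it: the regular expression and the function name `collapse_multiline_images` are made up for illustration, and the pattern assumes the closing parenthesis sits alone at the start of its own line, as in the example above.

```python
import re
from urllib.parse import quote

# Hypothetical pattern: ![alt](first-line-of-url <newline> body <newline> )
# The first URL line must not contain whitespace or ")".
MULTILINE_IMAGE = re.compile(r"!\[([^\]]*)\]\(([^\s)]*)\n(.*?)\n\)", re.DOTALL)


def collapse_multiline_images(markdown: str) -> str:
    """Percent-encode the extra lines of a multiline image URL into one line."""

    def encode(match: re.Match) -> str:
        alt, prefix, body = match.groups()
        # Keep "@" readable; spaces, ">", ":" and newlines get percent-encoded.
        return f"![{alt}]({prefix}{quote(body, safe='@')})"

    return MULTILINE_IMAGE.sub(encode, markdown)


source = """![](http://plantuml.rado0x54.com/png?uml=
@startuml
Hans -> John : Hallo
John -> Carla : Hi
Carla -> Hans : Ola
@enduml
)"""

single_line = collapse_multiline_images(source)
print(single_line)
```

Note that a pattern like this is exactly the kind of thing that can fire by accident on ordinary text containing `](`, so it is probably only safe on documents you control.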

The original form lets us edit our diagrams in place, and having diagrams as first-class citizens in our documentation was a huge win over using external tools as we did before…

What’s your opinion? Does it hurt to add that to CommonMark?

According to the CommonMark Dingus, you can have multiple lines in an HTML attribute, but the browser turns around and treats them like spaces [](

<img src="http://plantuml.rado0x54.com/png?uml=
@startuml%0A
Hans -> John : Hallo%0A
John -> Carla : Hi%0A
Carla -> Hans : Ola%0A
@enduml
">

This also works for links on GitHub. However, for images specifically, GitHub strips the URL, generating an <img src=""> for the example above. :angry:


As for why CommonMark doesn’t parse multiline URLs like GFM-OG did? Luckily, nobody on Babelmark can be tricked by the Geordi LaForge emoticon []( at the beginning of this post to render 95% of the document as a gigantic link, but some of them were willing to eat a large fraction of this paragraph :-). The idea is that “CommonMark’s syntactical forms should be hard to trigger accidentally.”

Click here to see it.

Edit: the newline isn’t completely ignored

Not quite: While it is true that line breaks (or record ends in SGML parlance) may occur in attribute value literals in SGML, XML, proper HTML (i.e. W3C 4.01 or ISO 15445) and the funny notation called HTML5, they are not “completely ignored” in any of these cases:

  • In SGML and XML, these line breaks (as well as HT characters) inside attribute value literals are equivalent to SPACE characters for CDATA attributes; and for tokenized attributes (i.e. those with a declared value of NMTOKENS etc.) a whole span of multiple white-space characters is equivalent to a single SPACE. This is called attribute-value normalization in XML. (See ISO 8879:1986/A1:1988 section 7.9.3 for SGML and W3C REC-XML section 3.3.3 for XML.)
  • In HTML5, however, these white-space characters are part of the attribute value, i.e. they are taken literally and passed on to the application - at least that’s how I understand the tedious description in W3C TR-HTML51, section 8.2.4.38 ff.

That is, for SGML/HTML/XML those newlines are not represented in the document’s information set (the SGML element information set (ESIS) or the XML Infoset, respectively), and a structure-controlled application therefore never sees them, only SPACE.

However, any HTML5 processor sees them all and can act on them (as would a markup-sensitive SGML application).
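
The difference is easy to observe with two parsers from Python's standard library. This is just a sketch under some assumptions: `xml.etree.ElementTree` stands in for a normalizing XML processor, and `html.parser.HTMLParser` is only a lenient tokenizer rather than a conforming HTML5 parser, but it does hand attribute values to the application as written. The `MARKUP` string and the `SrcCollector` class are made up for the demonstration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Illustrative markup: an attribute value literal that spans several lines.
MARKUP = '<img src="png?uml=\n@startuml\n@enduml\n"/>'


class SrcCollector(HTMLParser):
    """Remembers the src attribute of the last start tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.src = None

    def handle_starttag(self, tag, attrs):
        self.src = dict(attrs).get("src")


# XML route: attribute-value normalization turns each line break into a SPACE.
xml_src = ET.fromstring(MARKUP).get("src")

# HTML route: the tokenizer passes the line breaks through to the application.
collector = SrcCollector()
collector.feed(MARKUP)
collector.close()
html_src = collector.src

print(repr(xml_src))
print(repr(html_src))
```

So an application built on the XML parse has no way of recovering the line structure of the diagram source, while one working at the HTML tokenizer level still could.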