Consistent attribute syntax

Hey,

I’m developing a static site generator named Hugo. We use a Markdown parser/renderer called Blackfriday, which is mostly CommonMark compliant.

I have thought about the problem in this thread. A lot. Often when I think of a new useful feature in Hugo, I’m either blocked by this – or I have to create some ugly workaround.

I have read this and other threads about this. The objections I see are of the type “this is too coupled to HTML/formatting”. I think we need to think beyond just CSS classes. What I really need is

  • a way to add “processing tokens” to a node in the document
  • my own namespace that I can use as I please for my custom processing instructions
  • I guess it would also be natural to create a reserved namespace for CSS classes and possibly some others

My current “cool thing I want to do with Markdown in Hugo” is to allow people to tell Hugo how to process images:

![alt text](/my-image.jpg "Logo Title Text 1") {: hugo:Fill:"300x300" }

Or whatever. The above example is probably not the best, but there are many situations in the world of publishing where you need to pass on some instructions to the renderer. And the “you can use HTML” really doesn’t solve these problems (when the target isn’t even HTML).
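A minimal sketch of how a processor could pick out only the instructions in its own namespace and ignore everything else. It assumes the illustrative `{: namespace:Key:"value" }` form from the example above, which is not a settled syntax; the function names are hypothetical.

```python
import re

# One instruction of the assumed form namespace:Key:"value",
# e.g. hugo:Fill:"300x300". This grammar is an illustration only.
INSTRUCTION = re.compile(r'(?P<ns>\w+):(?P<key>\w+):"(?P<value>[^"]*)"')


def parse_instructions(block):
    """Group the instructions inside a '{: ... }' block by namespace."""
    inner = block.strip()
    if not (inner.startswith("{:") and inner.endswith("}")):
        raise ValueError("not an attribute block: %r" % block)
    grouped = {}
    for m in INSTRUCTION.finditer(inner[2:-1]):
        grouped.setdefault(m.group("ns"), {})[m.group("key")] = m.group("value")
    return grouped


def instructions_for(block, namespace):
    """Return only the instructions owned by `namespace`; others are ignored."""
    return parse_instructions(block).get(namespace, {})
```

A renderer that does not know the `hugo` namespace would call `instructions_for(block, "its-own-name")`, get an empty dict, and render the image as-is.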

2 Likes

Hey @bep! For the use-case you mention, we have had this working in pandoc for quite some time…

1 Like

I could probably “get something working” with Blackfriday, too. But the reason I reach out on this forum is to get this into some common spec so “all” parsers would recognize this pattern. Given the image processing example above, I would expect GitHub to just throw away that instruction and display the image as-is.

1 Like

to get this into some common spec so “all” parsers would recognize this pattern.

Totally agreed, that’s the goal of this thread. Unfortunately, CommonMark hasn’t reached 1.0 yet, so this is going to take some time. All I’m saying is that if you’re going to implement something like that, following pandoc’s precedent might lead to fewer headaches when it comes to standardizing this in CommonMark. I might, of course, be biased…

1 Like

Bjørn Erik Pedersen noreply@talk.commonmark.org
writes:

I could probably “get something working” with Blackfriday, too. But the reason I reach out on this forum is to get this into some common spec so “all” parsers would recognize this pattern. Given the image processing example above, I would expect GitHub to just throw away that instruction and display the image as-is.

Currently your best bet might be to use actual HTML
processing instructions.

![My image](img.png)<?hugo:size=300?>

The special instruction will be ignored by GitHub
(actually it will be changed into an HTML comment, it
appears), but you could look for it in the AST and
modify the image rendering accordingly.

Not a beautiful solution, but it could be
something to consider, and it works right now.
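As a rough illustration of the workaround above, the sketch below scans raw Markdown text for an image immediately followed by a processing instruction. It works on the source string with a regular expression rather than on a parser's AST, and the `namespace:body` layout inside `<?...?>` is an assumption taken from the single example; image destinations with a title or spaces are not handled.

```python
import re

# An inline image immediately followed by a processing instruction,
# e.g. ![My image](img.png)<?hugo:size=300?>
IMAGE_WITH_PI = re.compile(
    r'!\[(?P<alt>[^\]]*)\]\((?P<src>[^)\s]+)\)'    # ![alt](src)
    r'<\?(?P<ns>[A-Za-z][\w-]*):(?P<body>.*?)\?>'  # <?namespace:body?>
)


def find_image_instructions(markdown, namespace="hugo"):
    """Return (alt, src, instruction) for images followed by a PI in `namespace`."""
    results = []
    for m in IMAGE_WITH_PI.finditer(markdown):
        if m.group("ns") == namespace:
            results.append((m.group("alt"), m.group("src"), m.group("body")))
    return results
```

A real implementation would more likely look for the raw-HTML node that follows the image node in the parsed document, as the reply suggests.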

Having read the discussion up to this point, I would propose to not add an attribute syntax that just passes its argument through to HTML. Instead I propose to only include attributes in the specification that actually add value on top of what could be achieved with HTML or LaTeX.

My particular use-case, coming from the “Support for image dimensions” thread, is to have the equivalent of LaTeX’s \includegraphics[width=0.9\textwidth]{file} and to be able to render that into PDF / LaTeX in the same manner as it would be rendered into HTML. Without such an attribute the rendering engine has an impossible task: it cannot automatically decide whether an image should be up/down-scaled or included 1:1 just by looking at the image and the output format.

I agree that forcing more specific sizes, like width=500px height=20px, is out of scope for CommonMark and better left to specialised HTML or LaTeX syntax.


In addition to the above, I would suggest to not conflate the use of IDs (technical implementation) with the need for anchors and links (HTML) or labels and references (LaTeX). Labels and references to them are very much useful when writing any larger document, while I would agree that having control over HTML IDs and HTML classes might be out of scope for Markdown. How labels and references are being rendered into the final document should be the decision of the processor, depending on the desired output (HTML or PDF / LaTeX), while CommonMark should only concern itself with their abstract concept.

And it has the advantage of being ignored by Markdown renderers which don’t have consistent attribute syntax enabled. Even if GitHub never adopts consistent attribute syntax, you’d still have the advantage of an HTML comment being hidden in a GitHub readme. If not everyone wants to adopt the extension, perhaps this is even a preferable syntax?

Just for the record, markdown-it and a number of other implementations convert the < to &lt;.

Babelmark

Howdy folks :wave:

I have been using a custom syntax for attributes for code-blocks for quite some time now in a reasonably large app and I’m interested in making it “more standard”. Is this the right place to chip in with the discussion? Are there any regular meetings where things are planned?

Also, I don’t just want to reply to this thread with a “this is the way that I would like it to look”. I’m more interested in helping to shepherd any specification through than in making sure it’s my personal preference of the syntax :joy:

Yes, I believe it is. You could describe a syntax and invite other people here to share their thoughts.

Keep in mind, there’s already some variation (or is it fragmentation?) in attribute syntax among different Markdown flavors and other writing formats like Markua. For example…

Pandoc / PHP Markdown Extra / Earlier version of CommonMark spec:

{#myId .myClass key=val key2="val 2"}

Maruku / Kramdown:

{:ref-name: #myid .my-class}

Markua:

{key_one: value1, key_two: value_two, key_three: "value three!", key_four: true, key_five: 0, key_six: 3.14}

Since Markdown supports, and indeed encourages more than one way of marking up elements, I think it would be fine to support some or all of these variations. Both key: value and key=value are logically equivalent, and the #id and .class syntax acts as a useful shortcut. So long as they’re within curly braces, I don’t think there should be any issue with parsers supporting a mixture of attribute syntax?
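To make the "shortcut" point concrete, here is a minimal sketch that expands the `#id` and `.class` shortcuts of the pandoc-style form into ordinary key/value pairs. The token grammar is a simplification chosen for illustration, not taken from any specification, and the colon-based forms are not handled here.

```python
import re

# One token inside the braces: #id, .class, or key=value (value optionally
# double-quoted). Simplified grammar, assumed for this sketch.
TOKEN = re.compile(
    r'#(?P<id>[\w-]+)'
    r'|\.(?P<cls>[\w-]+)'
    r'|(?P<key>[\w-]+)=(?:"(?P<quoted>[^"]*)"|(?P<bare>[^\s}"]+))'
)


def expand_shortcuts(block):
    """Turn '{#id .class key=val}' into a list of (key, value) pairs."""
    text = block.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("not an attribute block: %r" % block)
    pairs = []
    for m in TOKEN.finditer(text[1:-1]):
        if m.group("id") is not None:
            pairs.append(("id", m.group("id")))
        elif m.group("cls") is not None:
            pairs.append(("class", m.group("cls")))
        else:
            quoted = m.group("quoted")
            value = quoted if quoted is not None else m.group("bare")
            pairs.append((m.group("key"), value))
    return pairs
```

Returning pairs in source order leaves the question of duplicate keys to a later step, so the parser itself does not have to take a position on it.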

1 Like

Ok cool :+1:

The syntax that we’re utilising is similar to the one you mention in the “Earlier version of CommonMark”:

```handlebars {data-filename="app/templates/components/rental-listing.hbs" data-diff="-15,+16"}
```

This comes from the Ember Documentation here to allow us to render filename and diffs:

As for the syntax, I wonder if we can’t potentially aim for having something a lot simpler in the spec? :thinking: I understand the benefits that people might see in using shortcuts for ids and classes, but I wonder what is wrong with the simpler-to-define:

```hr {id="awesome-rule" class="blue" data-unrelated="all the data"}
```

I am not familiar yet with the CommonMark spec process but I wonder if a simpler definition of a feature is much more likely to make it through?

And yes I agree with @chrisalley that because of the multiple markup situations we could support both key: value and key="value" (and maybe even key=value but I am less bothered about that).

1 Like

We’ll probably need to wrap multi-word values within quotes, e.g. key="multi word value" or key: "multi word value", but single word values could leave off the quotes, e.g. key=value or key: value.

I think we should use double quotes around the value, so that the writer can use values that contain apostrophes, e.g. thought-experiment: "Schrödinger's cat"

Since there may be cases where values contain double quotes, we could alternatively support wrapping the value in single quotes, e.g. key: 'my "useful" value'.
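The quoting rule described in the last two posts can be sketched from the writer's side as follows: prefer double quotes, fall back to single quotes when the value itself contains double quotes. The function names are hypothetical, and values containing both kinds of quote are rejected because the thread does not propose an escaping rule for them.

```python
def quote_value(value):
    """Wrap an attribute value in quotes, preferring double quotes."""
    if '"' not in value:
        return '"' + value + '"'
    if "'" not in value:
        return "'" + value + "'"
    # No escaping rule has been proposed for this case, so refuse to guess.
    raise ValueError("value contains both quote characters: %r" % value)


def unquote_value(text):
    """Strip one matching pair of surrounding quotes, if present."""
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "\"'":
        return text[1:-1]
    return text
```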

1 Like

I thought I had said this earlier, but since I cannot find it, I will mention it possibly again: The implementations of equals vs. colon syntax for key-value pairs differ in that the latter requires a comma (or perhaps semicolon) between pairs, thus it does not need quote marks. Also, the equals sign is usually directly attached to the key and the (possibly quoted) value, without intervening whitespace, whereas the colon is almost always followed (and possibly preceded) by whitespace.

1 Like

I like the idea of leaving off quote marks. But it does raise the question of what happens when there is a colon or comma inside of the value string. What does this produce?

title: Horizon: Zero Dawn

It seems to me that a value containing a colon or comma would be more common than the value that contains double quotes.
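One way to see the problem is to write down the simplest possible rule for the unquoted colon form and try it. The sketch below assumes pairs are separated by commas and each pair is split on its first colon; both rules are assumptions made for this illustration, not something the thread has agreed on.

```python
def parse_unquoted_pairs(text):
    """Parse 'key: value, key: value' written without any quote marks.

    Pairs are split on commas; each pair is split on its FIRST colon.
    """
    pairs = []
    for fragment in text.split(","):
        key, sep, value = fragment.partition(":")
        if not sep:
            raise ValueError("no colon in fragment: %r" % fragment.strip())
        pairs.append((key.strip(), value.strip()))
    return pairs
```

Under these rules `title: Horizon: Zero Dawn` happens to work, because only the first colon is significant. A comma in the value is worse: `title: Hello, World` is rejected, and `title: Dawn, part: two` is silently read as two pairs even if the writer meant a single title.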

Markua requires that multi-word value strings are wrapped in double quotes. I wouldn’t mind staying close to the Markua rules. This would also keep the = and : quote rules consistent. There would be less cognitive overhead switching between the two.

If we wanted to take consistency to the next level we could make the commas between the key/value pairs optional. That way, any of the following four would be valid:

{key_one: value1 key_two: value_two key_three: "value three!" key_four: true key_five: 0 key_six: 3.14}
{key_one: value1, key_two: value_two, key_three: "value three!", key_four: true, key_five: 0, key_six: 3.14}
{key_one=value1 key_two=value_two key_three="value three!" key_four=true key_five=0 key_six=3.14}
{key_one=value1, key_two=value_two, key_three="value three!", key_four=true, key_five=0, key_six=3.14}

The parser could simply ignore the commas, leaving it up to the writer to optionally include them for aesthetic purposes.
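A minimal sketch of such a parser, assuming multi-word values must be double-quoted (as in Markua) and that `:` and `=` are interchangeable. The regular expression is an illustration only; all values are returned as strings, with no typing of `true`, `0`, or `3.14`.

```python
import re

# key, then ':' or '=', then a double-quoted or bare value.
# Commas are never part of a bare value, so they fall between matches
# and are skipped along with whitespace.
PAIR = re.compile(
    r'(?P<key>[A-Za-z_][\w-]*)'
    r'\s*[:=]\s*'
    r'(?:"(?P<quoted>[^"]*)"|(?P<bare>[^\s,}"]+))'
)


def parse_pairs(block):
    """Parse '{k: v, k2=v2 ...}' into a dict, ignoring optional commas."""
    text = block.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("not an attribute block: %r" % block)
    attrs = {}
    for m in PAIR.finditer(text[1:-1]):
        quoted = m.group("quoted")
        attrs[m.group("key")] = quoted if quoted is not None else m.group("bare")
    return attrs
```

With this approach all four lines above parse to the same dictionary, which is the consistency the post is asking for.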

5 Likes

@mb21: Has your draft proposal reached acceptance yet?

I agree with most of the proposal except for the following points:

In the draft I am reading:

[…] For paragraphs, block quotes and tight lists, the attribute block must start on a line that immediately follows the corresponding block […]

I wonder what is the advantage of putting the attribute block after the block? I would personally follow “Beyond Markdown” recommendation and put them before the block and for inline I would leave them after. The reason is that it does not seem natural to “identify” something after it has been declared. Maybe it’s just me. I also think that class and attribute information may be useful not only for HTML but also simply to identify blocks of text in a document, and it would be a lot easier to spot if it came before the blocks. When the attributes are put after, the reader needs to identify the end of the block, which is always trickier than finding the start and less intuitive anyway.

I completely agree. Escaping left curly brackets could be quite problematic.

I wonder what is the advantage of putting the attribute block after the block? I would personally follow “Beyond Markdown” recommendation and put them before the block and for inline I would leave them after

The main reason for this recommendation is to avoid ambiguities. If block attributes can come after a block, then there’s always an ambiguity about whether the attribute goes with the block or the final inline in it.

I completely agree. Escaping left curly brackets could be quite problematic.

Where do curly brackets appear in ordinary text? Of course they appear in computer code, but that should be in code backticks. They also appear in math, but that too should be in a special environment (since a lot of mathematical expressions would otherwise need escaping).

Chicago Manual of Style only mentions these two uses:

1 Like

I agree that they do not appear often… After reconsidering, we can drop this. The syntax {...} is simpler, and for the rare occasion when someone needs curly brackets inside text, it may not be worth penalizing more common use cases with the more complicated syntax {: ... }.

As an implementer, though, regarding the position of attribute blocks (after or before blocks): I need to implement this functionality. What is the approval process for proposals?

When I wrote this proposal, I was (and still am) on the fence on whether the attributes should come before or after paragraphs.

We have mixed precedents: in fenced code blocks it’s before:

``` {.python}
x=1
```

while in headings it’s after:

# my title {.myclass}

But you people have certainly made good arguments in favour of having them come before. Especially:

Regarding:

I thought this could be resolved by requiring the attributes to be on their own line. But it may well be that it’s easier to parse if the attributes come at the beginning of the block.


Currently, there isn’t any. CommonMark hasn’t even reached 1.0 due to some edge cases that need to be resolved. That being said, it’s certainly a valuable forum to have different implementers discuss pros and cons of future extension syntax.

Btw., if you’re interested to see what happens if you bolt-on attributes and some other pandoc extensions on the token-based parser of markdown-it.js (on a least-effort basis), feel free to play around with this bundle of markdown-it plugins: GitHub - mb21/markdown-it-pandoc: Package bundling a few markdown-it plugins to approximate pandoc flavoured markdown.

As for the difference between before and after… For me it is a matter of how we see the attributes…

If we consider them merely like HTML attributes (ignoring all other considerations I make below) then to put them after would make sense since they are just seen as side parameters only there to serve the purpose of being used in the HTML generation process.

I pointed out in my previous comment that putting them before would make them easier to use to identify blocks. In the vision I have for those attributes, yes, they are used to feed the HTML rendering process (or any other kind of generation process…) but I see them mainly as semantic identifiers. They can tell us something about the text we are looking at or we are looking for.

For example, I could have a text like this:


# title level one 

{.content}
Some text. 

…that I want to comment…


# title level one 

{.content}
Some text. 

{.comment}
Author, could you make this sentence longer? 

If I take the same text and I put the parameters blocks after, it gives something like this:


# title level one 

Some text. 
{.content}

Author, could you make this sentence longer? 
{.comment}

For me, the first one is easier to read; in the second, it’s more difficult to know what goes with what.
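The "attributes before the block" reading is also straightforward to process. The sketch below assumes that blocks are separated by blank lines and that an attribute block sits alone on the first line of the block it describes; both are simplifications for illustration and ignore lists, block quotes, and code fences.

```python
import re

# A line consisting of nothing but one {...} group.
ATTR_LINE = re.compile(r'^\{[^{}]*\}$')


def attach_leading_attributes(markdown):
    """Split text into blank-line-separated blocks and pair each block
    with its leading attribute line, if it has one."""
    blocks = []
    for chunk in re.split(r'\n\s*\n', markdown.strip()):
        lines = [line.rstrip() for line in chunk.splitlines()]
        attrs = None
        if len(lines) > 1 and ATTR_LINE.match(lines[0].strip()):
            attrs = lines[0].strip()
            lines = lines[1:]
        blocks.append((attrs, "\n".join(lines)))
    return blocks
```

Because the attribute line is the first thing the parser sees, it knows which attributes apply before it reads the block's content, and there is no question of whether a trailing `{...}` belongs to the block or to its last inline.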

Another point to consider in favor of putting attribute blocks before is consistency. The semantic information in a fenced code block, the parameters, is put before the block, so attribute blocks should follow suit, I think.

Another point regarding the spec. I see that there is no way to specify multiple classes in the attributes blocks:

Markdown authors shouldn’t write multiple key-value pairs with the same key in an attribute block. However, to ease the burden of implementation, the behaviour in such cases is left undefined—although most implementations will probably parse the attributes sequentially and insert them into a map, which would result in a last-one-wins semantic.

On this, I think a syntax like this should be allowed:

{.author .john}

Again, if we see classes as semantic tagging, or meta-information about the block, a bit like parameters in fenced code blocks, the support of multiple classes and the syntax that goes with it should be clearly defined, in my opinion. The parser would simply need to keep an array (or a set, to avoid duplicates…) per map entry and fill it in as parsing proceeds.
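A minimal sketch of that merging step, assuming the attribute block has already been tokenized into (key, value) pairs with `.name` expanded to `("class", name)`. Here only `class` accumulates, while every other key keeps the last-one-wins behaviour quoted from the draft; treating every key as a list would be an equally valid reading of the suggestion.

```python
def merge_attributes(pairs):
    """Fold (key, value) pairs into a map.

    'class' values accumulate in source order without duplicates;
    every other key keeps only its last value.
    """
    merged = {}
    for key, value in pairs:
        if key == "class":
            classes = merged.setdefault("class", [])
            if value not in classes:
                classes.append(value)
        else:
            merged[key] = value
    return merged
```

For `{.author .john}` this yields `{"class": ["author", "john"]}`, so both classes survive instead of the second overwriting the first.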

Thank you very much for the link to the source code: exactly what I needed to get me started! I will look into it for sure, really appreciated!

2 Likes