Consistent attribute syntax

rwzy · September 7, 2014, 5:55am

The attributes would be split from the text which refers to the image (![image](src)), and would instead reside in the CMS’s database or wherever. And from the following I gather that that’s exactly what you’d prefer:

But my point was always that the attributes are not always just presentational; they can also be inherent properties of the content. So they shouldn’t be separated from it, and they should still be copied over, so that the other system can also present the differences specified by the attributes if it so desires or thinks it’s meaningful to do so.

If auto-generation of heading anchors is accepted into the spec, then it’s most imperative that there’s a custom method to override them, in case of conflicts between two same IDs. The syntax proposed here makes the most sense as it’s widely used already, so minimal backwards compatibility issues.

Indicating rel links.
Indicating dir, at least for the code blocks example.

mofosyne · September 7, 2014, 6:33pm

FYI: So far this is the kind of attribute syntax that we are agreeing upon in Generic directives/plugins syntax

Is this what you guys are ultimately talking about in terms of consistency in attribute?

Url Links

 [description](url){#myId .myClass  key=val key2="val 2"}

Embedded Media

!mediaType[description](url){#myId .myClass key=val key2="val 2"}

assumed to be image if mediaType left blank

syntactic sugar ( content of () handled by mediaType handler/extension):

  ![](file.mp4 "video title" 80x10 )
 is equivalent to typing: 

  !video[](file.mp4){title="video title" width=80 height=10}
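
A rough sketch of how that sugar could be expanded into the explicit form. The `MEDIA_TYPES` table and the rule of picking the mediaType from the file extension are my own assumptions for illustration; nothing in this thread fixes how the handler is chosen.

```python
import re

# Hypothetical extension-to-mediaType table (not defined anywhere in the thread).
MEDIA_TYPES = {".mp4": "video", ".mp3": "audio", ".png": "image", ".jpg": "image"}

# Matches ![desc](url "optional title" WxH) where title and size are optional.
SUGAR = re.compile(
    r'!\[(?P<desc>[^\]]*)\]\(\s*(?P<url>\S+)'
    r'(?:\s+"(?P<title>[^"]*)")?'
    r'(?:\s+(?P<w>\d+)x(?P<h>\d+))?\s*\)'
)

def desugar_media(text):
    """Rewrite the shorthand into !mediaType[desc](url){attributes}."""
    def expand(m):
        url = m.group("url")
        ext = url[url.rfind("."):].lower() if "." in url else ""
        media = MEDIA_TYPES.get(ext, "image")  # image when nothing else is known
        attrs = []
        if m.group("title") is not None:
            attrs.append('title="%s"' % m.group("title"))
        if m.group("w"):
            attrs.append("width=%s" % m.group("w"))
            attrs.append("height=%s" % m.group("h"))
        suffix = "{%s}" % " ".join(attrs) if attrs else ""
        return "!%s[%s](%s)%s" % (media, m.group("desc"), url, suffix)
    return SUGAR.sub(expand, text)
```

Forms that already name a mediaType, such as !video[](file.mp4), are left alone because the pattern requires `[` directly after `!`.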

Extension Directives (For extra extensions!):

@name[content](arg){#myId .myClass key=val key2="val 2"}

name :~ extension name

or

@@@name[content](arg){#myId .myClass key=val key2="val 2"}
    >>extension code/content to process here<<
@@@

Code:

`<someCodeHere>`{#myId .myClass key=val key2="val 2"}

Shorthand version:

`<someCodeHere>`myClass

fenced code:

``````{#myId .myClass  key=val key2="val 2"}
     someFunction();
``````

short hand version:

``````myClass 
     someFunction();
``````

Header:

# HeaderTextHere # {#myId .myClass key=val key2="val 2"}

HeaderTextHere 
################## {#myId .myClass key=val key2="val 2"}

HeaderTextHere 
~~~~~~~~~~~~~~~~~~ {#myId .myClass key=val key2="val 2"}

HeaderTextHere 
------------------ {#myId .myClass key=val key2="val 2"}

HeaderTextHere 
================== {#myId .myClass key=val key2="val 2"}

Alt Possibility: wonder if we could include something in [] below… maybe a summary?:

### HeaderTextHere ### [Short Summary of content]{#myId .myClass key=val key2="val 2"}

Edit: RWZY suggested a header example

rwzy · September 8, 2014, 1:15am

As long as the extension directives part is proposed to be an extension and not in the core, I’m okay with it. Except:

Have the same syntax proposed for headings too, to be able to specify custom anchors and such. (You didn’t list headings as an example.)

136 got accepted.

This means that the shorthand inline code example:

`code`myClass

turns into:

<code class="language-myClass">code</code>

instead of:

<code class="myClass">code</code>

Same with the code block example. But I don’t think it’s a problem, since it follows the HTML spec to do that. I’m just clarifying that if you want to actually just indicate myClass instead of language-myClass, you will have to use curly braces as such:

`code`{.myClass}
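
To make the difference concrete, here is a minimal sketch of the two renderings. The function name and its parameters are made up for illustration and are not part of any spec or existing parser.

```python
import html

def render_inline_code(code, shorthand=None, classes=()):
    """Render inline code; `shorthand` is the bare word after the closing backtick,
    `classes` are names given explicitly inside curly braces."""
    names = list(classes)
    if shorthand:
        # The bare shorthand is treated as a language, hence the prefix.
        names.insert(0, "language-" + shorthand)
    attr = ' class="%s"' % " ".join(names) if names else ""
    return "<code%s>%s</code>" % (attr, html.escape(code))

print(render_inline_code("code", shorthand="myClass"))
print(render_inline_code("code", classes=["myClass"]))
```

The first call corresponds to the shorthand form and the second to the curly brace form.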

Also, in your fenced code block examples, you use six backticks, but I’m assuming the same numbering rules apply as already outlined in the spec?

mofosyne · September 8, 2014, 1:53am

Yeah, I’m not really counting backticks, since that is not the point of the discussion.

I think the syntax for the extension directives should be implemented as core as a way for other extensions to cleanly hook into it. However the core should not include any extensions by default.

Your code block HTML representation proposal makes sense. I’m just defining the CommonMark syntax.

if you want to actually just indicate myClass instead of language-myClass, you will have to use curly braces as such:

Oh, and yes this does make sense, since {} can be seen as ‘explicit’ rather than implicit without the curly braces.

  `````ruby

is just syntactic sugar (In human written language: Shorthand notation. E.g. “Don’t” ) for

 `````` {.language-ruby}

ConnorKrammer · September 8, 2014, 5:03am

(Note: this gets rather long, but I hope that it’s a start towards something useful.)

This is an important feature to have, since without it there’s absolutely no way to style individual documents differently. If I want a particular element to act a certain way, I have to change it entirely into HTML instead of just adding a small {attributes} declaration to my existing Markdown. Human-readability is key here, and we all agree that raw HTML within a Markdown document is just plain ugly.

An argument exists that it looks too “developer-ey” (most likely stemming from the fact that it uses curly braces and can be used for CSS), but that’s ignoring one fact: users that don’t need the feature aren’t going to stumble across it accidentally. Nobody’s just going to randomly type # h1 {id="header"} without first needing that syntax and looking it up, and therefore they won’t be bothered by it. Even if they read a document using that syntax, it’s fairly obvious that whatever gets typed within the curly braces doesn’t get visibly rendered to the document (if they even bother to check).

I agree that we need to keep the language specifier for code blocks, and I especially support the notion that we look at it as “syntactic sugar” for the more verbose curly brace format.

A suggested start for the spec:

1. Curly braces define attributes on an element:
   {class="hello" id="greeting"}
2. Classes can be written shorthand by prefixing a word with a dot:
   {.shorthand-class .another-class}
3. IDs can be written shorthand by prefixing a word with a pound sign:
   {#shorthand-id}
4. Shorthand and longhand forms cannot exist for the same attribute at the same time:
   {class="not-allowed" .nope}
5. On a fenced code block, the first word after the opening fence will be considered shorthand for the following (where x is the first word):
   {class="language-x"} or its equivalent: {.language-x}
6. ID uniqueness will not be enforced by the Markdown parser (too problematic; should discuss)
7. An attribute declaration must directly follow the declaration of another element type (inline or block) without any intermediate spaces. Attribute declarations can be placed on any element in a Markdown document. Any attribute declaration not at the end of an inline or block element will be treated as if it had no special meaning (plaintext).

As for point #7, here are some examples of how I think it would work:

This is a paragraph {.some-class}

[This is a link](url){.some-class}

# This is a header {.some-class}

`This is a code block`{.some-class}

This is a larger paragraph `with a code block`{.code-block-class} and more text **here**{.strong-text-class} and more here as well. {.paragraph-class}

It gets a little ugly **like this**{.strong-text-class}. {.paragraph-class}
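
For anyone who wants to experiment, the shorthand rules in the suggested spec above could be parsed roughly like this. It is only a sketch: the return shape is my own choice, the first `#id` wins by assumption, and longhand `class=` / `id=` are deliberately left in `pairs` so that the open question of how they combine with shorthand stays with the caller.

```python
import re

# One token: .class, #id, key=bare or key="quoted value"
TOKEN = re.compile(
    r'\s*(?:'
    r'\.(?P<cls>[\w-]+)'
    r'|#(?P<id>[\w-]+)'
    r'|(?P<key>[\w-]+)=(?:"(?P<quoted>[^"]*)"|(?P<bare>[^\s"}]+))'
    r')'
)

def parse_attributes(block):
    """Parse '{#id .class key=val key2="val 2"}' into id, classes and pairs."""
    if not (block.startswith("{") and block.endswith("}")):
        raise ValueError("not an attribute block: %r" % block)
    body = block[1:-1]
    attrs = {"id": None, "classes": [], "pairs": []}
    pos = 0
    while pos < len(body):
        m = TOKEN.match(body, pos)
        if not m:
            if body[pos:].strip() == "":
                break  # only trailing whitespace left
            raise ValueError("cannot parse %r" % body[pos:])
        if m.group("cls"):
            attrs["classes"].append(m.group("cls"))
        elif m.group("id"):
            if attrs["id"] is None:  # assumption: the first shorthand id wins
                attrs["id"] = m.group("id")
        else:
            value = m.group("quoted") if m.group("quoted") is not None else m.group("bare")
            attrs["pairs"].append((m.group("key"), value))
        pos = m.end()
    return attrs
```

It only looks at the braces themselves; deciding which element the block attaches to (point 7) is a separate problem.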

mofosyne · September 8, 2014, 3:29pm

Agreed on most points of ConnorKrammer’s post above with exceptions of:

(4.) If I recall, the Markdown philosophy is that all input is valid input. (But I do support having warning outputs.) So for:

{class="not-allowed" .nope}

Just accept it, but make class="not-allowed" have higher precedence than .nope .

(6.) I agree on not worrying about ID uniqueness. That is the responsibility of the author. As long as we are not auto-generating ‘anchors’ from header names, writers are unlikely to accidentally create non-unique names (and if they do, it’s their own fault).

(7.) Bit worried about potential problems with putting {} after a paragraph. As for the later examples looking quite ugly with too many {}, it doesn’t matter since in most cases people will not need to use it. But when they do need it, it will be a lifesaver. (Take for instance: customizing a link to look like a CSS based button).

The biggest reason to adopt this into core is that {} is optional, and will often get out of the way, but will save your sanity when you need it.

ConnorKrammer · September 8, 2014, 4:18pm

That’s a good idea for #4 – it fits the Markdown philosophy better. Another idea would be to apply both classes, so that it would have .not-allowed and .nope.
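
The two options for #4 can be compared side by side with a small sketch. The policy names are made up, and I am reading “higher precedence” as “the longhand value replaces the shorthand one”, which is an interpretation rather than something settled here.

```python
def merge_classes(longhand, shorthand, policy):
    """longhand: the value of class="..." (or None); shorthand: list of .names."""
    explicit = longhand.split() if longhand else []
    if policy == "longhand-wins":
        # class="..." overrides any .shorthand when both are present
        return explicit or list(shorthand)
    if policy == "combine":
        # apply both, keeping first-seen order and dropping repeats
        merged = []
        for name in explicit + list(shorthand):
            if name not in merged:
                merged.append(name)
        return merged
    raise ValueError("unknown policy: %r" % policy)

print(merge_classes("not-allowed", ["nope"], "longhand-wins"))
print(merge_classes("not-allowed", ["nope"], "combine"))
```

Either way the input is accepted, which keeps with the idea that all input is valid input.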

And I do agree that there are potential problems with putting {} after a paragraph, but I like the idea enough that I would be willing to put up with that.

I’d really like to get one of the maintainers to look at this. I think we’re off to a good start, but they’re the ones most likely to think up any serious objections. After that we can iterate.

mb21 · September 8, 2014, 4:24pm

Yes, I mostly agree as well. Although as @mofosyne says 4. should be reformulated to either say that:

the first of several attributes with the same name takes precedence: affects also {dir=rtl dir=auto}, or
it’s the author’s responsibility (similar to 6.), which I actually prefer to make implementations easier.
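
The first option is cheap to implement, for what it’s worth. A minimal sketch, assuming the attributes have already been parsed into (key, value) pairs in source order; the list of ignored duplicates is there in case an implementation wants to emit warnings:

```python
def first_wins(pairs):
    """Keep the first value for each key; report later duplicates separately."""
    result = {}
    ignored = []
    for key, value in pairs:
        if key in result:
            ignored.append((key, value))
        else:
            result[key] = value
    return result, ignored

attrs, ignored = first_wins([("dir", "rtl"), ("dir", "auto")])
```

Here `attrs` keeps dir=rtl and `ignored` holds the dir=auto pair.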

I’d probably rather white-list elements that may contain attributes than saying “all of them can”. What about block quotes, lists, list items, hard line breaks or even inline HTML?

We may also want to restrict the {} to its own line at the end of a (nonempty) paragraph (RTL use case) or block quote to enhance readability and make clashes even more unlikely:

This is a paragraph with a lot of text. That's why it's somewhat hard to make out there is an attribute attached to it unless it's on another line.
{.some-class}

> This is a block quote.
> {.some-class}

To attach attributes to the entire list I guess this could work:

- item one
- item two
{.some-class}

But adding attributes to individual list items seems very tricky. How do you know whether the target is the last list item or the entire list? (Or even the last paragraph of the last list item?) It might be possible to distinguish these using indentation, but unless someone comes up with a good use case, I think we should not support attributes on list items to avoid confusion. Of course, you could argue that if the {} doesn’t have to be on its own line (like I just proposed) it would be somewhat easier to attach it to list items, but I’m not sure it’s worth the trouble.

mofosyne · September 9, 2014, 11:23am

One solution could be to treat newline + {.some-class} as an ‘entire block’ operator, and {.some-class} at the end of a line as an individual item operator (with the exception of text-only paragraphs).

Ergo:

Normal list with individual item styling and block styling:

- item one {.some-class}
- item two {.some-class}
{.some-class}

This is a paragraph with an ignored class.

This is a paragraph with a lot of text. That's why it's somewhat hard to make out there is an attribute. {.this-class-is-ignored-and-left-as-it-is-in-text}
{.some-class}
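
A sketch of that rule, to check that it is at least mechanically unambiguous. It works on the raw lines of a single block, only recognises bullet markers (-, * and +), and returns the attribute text unparsed; all of that is simplification on my part.

```python
import re

OWN_LINE = re.compile(r'^\{([^{}]*)\}\s*$')
TRAILING = re.compile(r'^(?P<text>.*\S)\s+\{(?P<attrs>[^{}]*)\}\s*$')

def split_block_attributes(lines):
    """Return (block_attrs, [(line_text, item_attrs), ...]) for one block."""
    block_attrs = None
    items = []
    for line in lines:
        own = OWN_LINE.match(line)
        if own:
            # {…} alone on a line applies to the entire block
            block_attrs = own.group(1)
            continue
        trailing = TRAILING.match(line)
        if trailing and line.lstrip().startswith(("-", "*", "+")):
            # {…} at the end of a list item applies to that item only
            items.append((trailing.group("text"), trailing.group("attrs")))
        else:
            # plain paragraph text: a trailing {…} is left in place as text
            items.append((line, None))
    return block_attrs, items
```

With the list example above, each item gets its own .some-class and the last line styles the whole list; with the paragraph example, the trailing braces stay in the text.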

Is that it? Any objections or refinements? Should this feature request be placed in https://github.com/jgm/stmd/issues now?

mb21 · September 10, 2014, 3:53pm

I have put a draft together and made a pull request (although discussion should stay here). Let’s see where this goes.

Paragraphs, block quotes and lists are left out for the moment as it’s really quite tricky and will probably have to be indentation-sensitive.

mofosyne · September 10, 2014, 4:00pm

kk, issue thread posted in https://github.com/jgm/stmd/issues/102 btw, just make sure it matches. If not, then let me know what needs to be changed (or change yours).

Wonder if the spec should highlight all ‘syntactic sugars’ and keep them separate… since I would imagine a good parser would first scan the page… convert all ‘syntactic sugars’ to their main form (maybe that can be a feature for keeping things consistent), before processing the page properly into HTML. Might help simplify implementation. Should that be a different issue?
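
A sketch of what such a pre-pass might look like for one kind of sugar, the language word on a fenced code block. It is deliberately simplified: any line made only of three or more backticks closes a fence, regardless of the opening length, which is looser than the actual fence rules.

```python
import re

TICKS = "`" * 3

# An opening fence followed by a single bare word, e.g. a language name.
FENCE = re.compile(r'^(?P<indent>\s*)(?P<fence>`{3,})\s*(?P<info>[^\s`{}]+)\s*$')

def desugar_fences(text):
    """Rewrite a bare fence word into the explicit {.language-x} form."""
    out = []
    in_code = False
    for line in text.split("\n"):
        stripped = line.strip()
        if in_code:
            # Inside a code block nothing is rewritten; look only for the closing fence.
            if stripped.startswith(TICKS) and set(stripped) == {"`"}:
                in_code = False
            out.append(line)
            continue
        m = FENCE.match(line)
        if m:
            out.append("%s%s {.language-%s}" % (m.group("indent"), m.group("fence"), m.group("info")))
            in_code = True
        else:
            if stripped.startswith(TICKS):
                in_code = True  # fence with no word, or already in explicit form
            out.append(line)
    return "\n".join(out)
```

After a pass like this the main parser would only ever see the curly brace form.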

ousia · September 10, 2014, 5:12pm

I think there is an essential attribute that I miss from almost all lightweight markup codes: natural languages.

Having language as a special attribute ( :lang [such as in :en, :de or :grc]) enables text hyphenation. Text hyphenation is part of HTML. And it improves text readability in some contexts.

Syntax is taken from Textile. I think the colon makes sense for language, since language in CSS is matched with a pseudo-class, selected with a starting colon plus the language code.

The issue with the consistent attribute syntax is that basic attributes can be only applied to some elements. I think that attributes for class, id and language should be available for all elements, either block elements or inline elements.

Otherwise, it would be impossible to specify that an emphasized element is in a foreign language. Or that it belongs to a class. Or that it has the specified id.

Would it be possible that :lang attribute could be added to the special attributes for the CommonMark proposal? How about extending the attributes to any CommonMark element?

What do you think about these questions?

mikl · September 10, 2014, 6:54pm

I don’t really think additions like these are within the goals of markdown. If you want access to the full spectrum of what you can do with HTML (attributes and whatnot), you should probably just use HTML.

That’s the reason that markdown allows you to mix HTML tags in your markup, after all.

ousia · September 10, 2014, 8:53pm

@mikl, class, identifiers and languages are not specific to HTML. They are essential features for any logical approach to be able to work with texts.

HTML is fine as a source format if HTML is your only output.

Imagine I want the books I write generated in the following formats (I must admit that I’m biased for having used pandoc for some time, although I really need these formats):

high quality PDF (compiled with ConTeXt)
HTML/HTML5
ePub 2/3
Microsoft Word 2007
LibreOffice Writer

Using HTML as source format won’t make it. Because even classes, identifiers and languages would be hardcoded in HTML.

Having a lightweight markup language does serve for this purpose. These three attributes are essential to deal with texts that aren’t only typewritten.

EnCey · September 10, 2014, 9:14pm

Classes are generally applicable, I can agree to that. But identifiers? I wouldn’t even use them when targeting HTML. Why not create a class and use it only once? We’re not HTML developers but CommonMark users. Identifiers, to me, are redundant, unnecessary and complicate the syntax.

Also your example fails to sell your point, as I can imagine what #mainContainer refers to in HTML but I’m clueless as to what it would target in a Word or PDF document.

ousia · September 10, 2014, 9:55pm

@EnCey, many thanks for your reply.

I’m not an HTML expert, but with identifiers you can point to them. This is something not possible with classes (it doesn’t make sense there). And cross-referencing works in any format (not only HTML).

Books are divided into front matter, main matter, back matter (and some even distinguish the appendices from the back matter). This allows different page and title numbering.

In Word or a PDF document, these divisions are relevant. There are way more examples, but I think I made the point of attributes in CommonMark.

BTW, how am I supposed to fake hyphenation in multilingual documents?

Should every text be handled as if it were English? (This doesn’t work in German or Spanish, since hyphenation rules are different.)

Or should the whole text be hyphenated with the rules for the main language? (Sorry, but this doesn’t work either; hyphenation in passages in foreign languages won’t work.)

Many thanks for your reply again.

mofosyne · September 11, 2014, 3:24am

http://address ~= inter webpage/website address. Allows for jumping between pages

  Jump Via: [ text ]( http://address )

#identifiers ~= inter document address. Allows for jumping between sections

  Jump Via: [ text ]( #identifiers )
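
Since ID uniqueness and existence would be the author’s responsibility, a tool could at least report jumps that point nowhere. A minimal sketch, assuming identifiers are declared with the {#id} shorthand discussed in this thread; the regexes are rough and will miss edge cases:

```python
import re

# An #id inside a {...} block, either first or preceded by whitespace.
DEFINED = re.compile(r'\{(?:[^{}]*\s)?#([\w-]+)[^{}]*\}')
# A link whose target is an in-document identifier: [text](#id)
LINKED = re.compile(r'\]\(\s*#([\w-]+)\s*\)')

def unresolved_links(text):
    """Return link targets that have no matching {#id} in the document."""
    defined = set(DEFINED.findall(text))
    return [target for target in LINKED.findall(text) if target not in defined]

doc = "# Intro {#intro}\n\nSee [the intro]( #intro ) and [missing](#nowhere)."
print(unresolved_links(doc))
```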

Both are important in making information more accessible.

EnCey · September 16, 2014, 10:12am

My apologies, I confused the intent of identifiers as anchors with CSS identifiers (purely to style a document). In this sense, I agree that they are universally useful and not HTML-specific.

mofosyne · October 1, 2014, 6:38pm

But adding attributes to individual list items seems very tricky. How do you know whether the target is the last list item or the entire list?

Just thought of a general solution that might work for inline styling within paragraph, and outside of it.

Recall that in [](){} the [] tends to allow styling within it. Then how about [[[ ... ]]]{ .style } for block styling and []{ .style } for inline styling? This should work well for list items:

[[[
But adding attributes [to individual list items seems very tricky.]{ .inlineStyle }
(Or even the last paragraph of the last list item?)
]]]{ .blockStyle }

[[[
- item one
- [item two]{ .itemStyle }
]]]{ .blockStyle }

mb21 · October 1, 2014, 8:00pm

Yes, []{.class} seems like a logical syntax for spans, it is in fact already in the draft.
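
As a quick illustration of how a span could map to HTML (a sketch only, handling just the class shorthand; it is not taken from the draft):

```python
import html
import re

# [text]{.class …}: the brace must directly follow the closing bracket,
# so ordinary links such as [text](url){.class} are not touched.
SPAN = re.compile(r'\[(?P<text>[^\[\]]+)\]\{\s*(?P<classes>(?:\.[\w-]+\s*)+)\}')

def render_spans(line):
    """Replace [text]{.a .b} with <span class="a b">text</span>."""
    def to_html(m):
        names = [name[1:] for name in m.group("classes").split()]
        return '<span class="%s">%s</span>' % (" ".join(names), html.escape(m.group("text")))
    return SPAN.sub(to_html, line)

print(render_spans("- [item two]{ .itemStyle }"))
```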

The block-level [[[...]]]{.class} would certainly get the job done, but it is heavy. I’d prefer to solve this with indentation/line-breaking rules if somehow possible…