The case for a `<!CommonMark>` declaration tag

Let’s discuss the interpretation & advantages of using an optional <!CommonMark> directive at the beginning of a CommonMark document.

  1. First, it should do no harm. It’s already supported as a [declaration](http://jgm.github.io/stmd/spec.html#declaration) by the proposed CommonMark spec. Based on this, any good CommonMark processor will pass it through into the HTML output unaltered. Browsers, not recognizing the name CommonMark, will simply ignore it. The typical human reader would be unimpressed, which is good.

  2. On the other hand, a simple document pre-processor could recognize the declaration and choose to route the document to a CommonMark-compliant processor rather than some legacy document processor like markdown. This provides an upgrade path for applications and web services that want to allow users to move away from some legacy markdown flavor without trying to upgrade the world.

  3. As the CommonMark specification evolves, embedding a version number in a document’s declaration could allow a pre-processor or sophisticated CommonMark processor to customize its parsing for strict compatibility with the older version of CommonMark that a document was written for. Admittedly, there’s work to be done to enable this, but done correctly it will allow us to evolve CommonMark in a planned way without slavish attention to compatibility stifling invention.

  4. Such a CommonMark declaration could specify the flavor of the document containing the declaration. For example, if I make the first line of my document say <!CommonMark GFM/1.0>, my intent would be to reference some (as yet nonexistent) specification of what the GFM flavor of CommonMark does differently.
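To make the routing idea in point 2 concrete, here is a minimal sketch of such a pre-processor. It assumes the declaration must be the very first line of the document, and the function names and the "commonmark" / "legacy-markdown" labels are hypothetical placeholders rather than anything defined by the spec.

```python
import re
from typing import Optional

# Assumption: the declaration is only honored on the first line of the document.
# The optional argument (e.g. "GFM/1.0") is captured but not interpreted here.
DECLARATION = re.compile(
    r"\A<!CommonMark(?:\s+([^>\s]+))?\s*>[ \t]*(?:\r?\n|\Z)"
)


def pick_processor(text: str) -> str:
    """Return a (hypothetical) processor name for the given document text."""
    if DECLARATION.match(text):
        return "commonmark"
    return "legacy-markdown"


def declared_argument(text: str) -> Optional[str]:
    """Return the text after 'CommonMark' in the declaration, if any."""
    m = DECLARATION.match(text)
    return m.group(1) if m else None
```

A document without the declaration falls through to the legacy processor, so existing content keeps rendering the way it always has.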

Next Steps

What we need next is a consistent plan for identifying versions of the specification in a declaration. Such a plan should allow for both officially sanctioned specification versions and unsanctioned/proposed versions. A version should correspond to a URI or URL, and if it’s a URL, there should be an immutable (unchanging) version of the specification retrievable at that location.

Clearly the version identifiers for sanctioned versions should be concise, and we can leverage a default URL base for the version specified in a <!CommonMark> declaration, for example http://www.commonmark.org/v/. [Don’t bother clicking on that; it doesn’t exist yet, it’s just a proposal.]
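As a rough illustration of the default-base idea, the sketch below resolves a short identifier against the proposed base URL and leaves full URLs alone. The base is the one proposed above (it does not exist yet), and `spec_url` is a hypothetical helper name.

```python
from urllib.parse import urljoin, urlparse

# The base proposed in this post; it does not exist yet.
DEFAULT_BASE = "http://www.commonmark.org/v/"


def spec_url(identifier: str, base: str = DEFAULT_BASE) -> str:
    """Resolve a version identifier to the URL of its specification."""
    if urlparse(identifier).scheme in ("http", "https"):
        # Already an absolute URL, e.g. an unsanctioned proposal hosted elsewhere.
        return identifier
    # Short identifiers are resolved relative to the default base.
    return urljoin(base, identifier)
```

Under these assumptions a sanctioned version stays concise in the document, while a proposal can still point at wherever its specification lives.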

There’s more to say about choosing URLs to represent versions, but I’m about written out. What do you think?

If my suggestion below is adopted, how would you include multiple different ‘editions’ of CommonMark, e.g. bare-bones, core, programmer's extension packages, etc.?

http://talk.commonmark.org/t/multiple-levels-of-commonmark-specification/541

Thanks @mofosyne. I’ve thought about it some more and I’m updating my answer.

First, see my new post on the need for flavors. I think your proposal is close to what I’m saying there.

Second, any proposed alternate such as yours needs to be identified with a prerelease identifier, in keeping with Semantic Versioning concepts, before it advances to sanctioned flavor status. The prerelease string identifies that it’s a spec in development. My suggestion would be something that anyone could do for themselves, and that we adopt a convention that helps people find details on any proposal, such as having the prerelease string be like github.username.projectname, naming a fork of the main project files. In this fork, spec.txt should be updated with the prerelease string appended to the version number, and edited to reflect what’s different in the proposed flavor. While it’s in prerelease state, the version numbers used would be arbitrary within that project. During this period, the right declaration tag might look like <!CommonMark 0.1.23-github.username.projectname>.

Then, assuming a community develops that supports the proposed variant, there’s at least one stable implementation that passes all the tests in the new (edited) spec.txt, and consensus is reached, the edited spec.txt is sanctioned by publication as a stable baseline in a subdirectory of a CommonMark.org web site. At this point the variant gets an official name (the name of the subdirectory; let’s say it’s called Foo) and a stable version number, which might be 1.0.0. Then the recommended declaration becomes <!CommonMark Foo/1.0.0>, but implementations are free to still recognize/process the prerelease identifiers if desired.
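To show how the declaration forms mentioned so far might be told apart, here is a small parsing sketch. The grammar is only my reading of the examples in this thread (an optional `Flavor/` prefix, a dotted version, and an optional `-prerelease` suffix); the `Declaration` type and function names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Declaration(NamedTuple):
    flavor: Optional[str]      # e.g. "Foo"; None when no flavor is named
    version: Optional[str]     # e.g. "1.0.0"; None for a bare <!CommonMark>
    prerelease: Optional[str]  # e.g. "github.username.projectname"


# Assumed grammar, inferred from the examples in this thread only.
_PATTERN = re.compile(
    r"<!CommonMark"
    r"(?:\s+(?:(?P<flavor>[A-Za-z][\w-]*)/)?"
    r"(?P<version>\d+(?:\.\d+)*)"
    r"(?:-(?P<prerelease>[0-9A-Za-z.-]+))?)?"
    r"\s*>"
)


def parse_declaration(line: str) -> Optional[Declaration]:
    """Parse one declaration line; return None if it is not a declaration."""
    m = _PATTERN.fullmatch(line.strip())
    if m is None:
        return None
    return Declaration(m["flavor"], m["version"], m["prerelease"])


def is_prerelease(decl: Declaration) -> bool:
    """True when the declaration names a spec that is still in development."""
    return decl.prerelease is not None
```

An implementation could use `is_prerelease` to decide whether to look for the spec in a fork or in the sanctioned location.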

Does that answer your question?

Yes, it makes sense Burt_Harris. I like the flexibility your proposal would give.

Btw, do you think this could be metadata? Because maybe we could include it as part of a document declaration. Or should we still stick with <!CommonMark 0.1.23-github.username.projectname > (or both?)

E.g.

                     My Title

 | !CommonMark: 0.1.23-github.username.projectname
 | title: Title for the top bar of any browser
 | author: average joe
 | layout: resume

 .... Content Here ....

http://talk.commonmark.org/t/metadata-in-documents/721

I’m not sure I understand your question, but I think that metadata and a “document declaration” have distinct purposes.

The concept of a document declaration (either an HTML DOCTYPE declaration or the CommonMark declaration I proposed) takes on special meaning: it is information potentially critical to software’s proper interpretation of the document’s content. Document metadata, however, is typically only advisory, containing information about the document that is not necessary to understanding its content.

I’ve tried to avoid eating up new punctuation characters like the vertical bar, as this is likely to complicate integration of other potentially interesting extensions like tables, in addition to the other cons you mention.

I see, you’ve got a good point. I was aiming to avoid --- & ‘…’ , since I think they’re rather bloated for small amounts of metadata.

And yes, I agree that we should probably keep to <!CommonMark 0.1.23-github.username.projectname > now that you mention how critical it is.

It’s one extra line tho


e.g.

                     My Title

| !CommonMark: 0.1.23-github.username.projectname
| title: Title for the top bar of any browser

vs

<!CommonMark 0.1.23-github.username.projectname >

                     My Title

| title: Title for the top bar of any browser

Is there an agreed upon concept for document declaration? Is that the YAML header?

Also, is there any possibility of getting other standards like AsciiDoc, etc… to come to a common file declaration standard? Might be handy, especially in autodetecting files.

I’m opposed to adding a declaration tag to CommonMark documents. While HTML is a suitable format for machines and developers, Markdown was designed as a writing format for human beings who are not necessarily developers. As the Markdown philosophy states, readability is emphasised above all else. “A Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions.”

I’ll address in turn what I see as the three main arguments in the original post:

The harm done is that it adds extra visual noise to the document which is not meaningful to a typical human reader. The overall aesthetic of Markdown is harmed when we add syntax intended solely for machines.

The web service could flag old posts in a database table as the legacy format and render them accordingly. This would reduce the need to add a versioning switch syntax to CommonMark in a lot of cases.
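As a rough sketch of that database-flag approach, the example below uses an in-memory SQLite table. The table layout, the `format` column, and the two placeholder renderers are all hypothetical; a real service would call its actual legacy and CommonMark processors instead.

```python
import sqlite3


# Placeholder renderers standing in for real Markdown processors.
def render_legacy(body: str) -> str:
    return "legacy:" + body


def render_commonmark(body: str) -> str:
    return "commonmark:" + body


RENDERERS = {"legacy": render_legacy, "commonmark": render_commonmark}


def create_posts_table(conn: sqlite3.Connection) -> None:
    # New posts default to CommonMark; no syntax is added to the posts themselves.
    conn.execute(
        "CREATE TABLE posts ("
        " id INTEGER PRIMARY KEY,"
        " body TEXT NOT NULL,"
        " format TEXT NOT NULL DEFAULT 'commonmark')"
    )


def flag_existing_posts_as_legacy(conn: sqlite3.Connection) -> None:
    # One-off migration, run when the service switches to CommonMark.
    conn.execute("UPDATE posts SET format = 'legacy'")


def render_post(conn: sqlite3.Connection, post_id: int) -> str:
    body, fmt = conn.execute(
        "SELECT body, format FROM posts WHERE id = ?", (post_id,)
    ).fetchone()
    return RENDERERS[fmt](body)
```

The format travels with the database row rather than with the text, which works inside a single service but not for files that leave it.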

I hope that future versions of CommonMark do not break backward compatibility; any releases after 1.0 should be minor fixes and should degrade gracefully. If the goal of CommonMark is to solve the problem of multiple inconsistent implementations of Markdown then changing the spec would be creating yet another flavour of Markdown.

Extensions might create inconsistencies between documents though. In the case of extensions, extra care should be taken so that there is a graceful fallback to core CommonMark, preserving the semantic meaning of the extension. I’ll provide two extension examples:

~~strikethrough text~~

with the extension enabled generates the following HTML:

<s>strikethrough text</s>

and without the extension generates:

<p>~~strikethrough text~~</p>

…which still looks like strikethrough; no major harm done. Another example is the proposed definition list syntax:

Species
: Human

which generates:

<dl>
	<dt>Species</dt>
	<dd>Human</dd>
</dl>

and without the extension generates:

<p>Species: Human</p>

Not a bad fallback either. There will be some extensions that don’t degrade as gracefully. So long as the meaning remains apparent this is an appropriate solution.

As long as it is not mandatory, I think there is merit in having a document declaration.

Line noise can be minimised if we keep the number of settings available to a minimum.

Anyhow, if we adopt my concept of block directives, then this is possible:

!CommonMark: 0.1.23-github.username.projectname
 Author:     Bane Liciea
 Title:      Why you should hire me
 Date:       32-4-2002
 Layout:     Resume

... content blabalba ...

You can see that it really doesn’t add much visual noise, and you can include important information that will help the user contextualize the document. So it is both machine and human readable.
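To illustrate the machine-readable side, here is a minimal sketch that splits such a header from the content. It assumes the header starts with a `!CommonMark:` line and runs until the first blank line, which is just my reading of the example above; `split_header` is a hypothetical name.

```python
from typing import Dict, Tuple


def split_header(text: str) -> Tuple[Dict[str, str], str]:
    """Split a leading 'Key: value' header block from the document body."""
    lines = text.splitlines()
    # Assumption: a header is only present when the first line is the directive.
    if not lines or not lines[0].startswith("!CommonMark:"):
        return {}, text

    header: Dict[str, str] = {}
    i = 0
    # The header runs until the first blank line.
    while i < len(lines) and lines[i].strip():
        key, sep, value = lines[i].strip().partition(":")
        if not sep:
            break  # not a 'Key: value' line, so the header ends here
        header[key.lstrip("!").strip()] = value.strip()
        i += 1

    body = "\n".join(lines[i:]).lstrip("\n")
    return header, body
```

A document that does not start with the directive is returned untouched, so the header stays optional.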

layout: resume :~ It helps people know that the document is a resume. For machines, it is a suggestion on what default CSS to apply. This is important, as the CSS for a resume is vastly different from that for an academic report, for instance.

This has big significance in archival, in that it provides a file descriptor that is carried with the file itself. Think of how a microfiche slide carries a label.


  • It can be minimal line noise. We just need to drop the <>.
  • It can fall back as a directive.
  • File metadescriptors might get lost if not inserted into the file.
  • We cannot predict the future. It’s better to be safe than sorry by being future-proofed, especially once this standard is used in more areas than simply web conversations. It makes archivists’ lives easier.
  • It is optional. Thus it will be used where it makes sense to do so.

Just think of it as an ‘archivist sticker label’. E.g. the ISBN number may be on it.

To avoid derailing the topic, I’ll respond just to the points directly relating to the declaration tag.

No noise is better than minimal noise, so if it can be avoided it should be.

This could potentially be an issue. As mentioned earlier, CommonMark shouldn’t need a meta descriptor if it remains backward compatible.

An issue might arise if another flavour of Markdown was used, say GitHub Flavored Markdown. In that case there’s still a problem because GFM does not have a declaration tag. The rendering software would not know whether to render GFM or CommonMark or some other flavour. Parsing any Markdown file as CommonMark + any available extensions by default would have the same result.

We can plan not to break backward compatibility once CommonMark reaches version 1.0. It’s possible another format will come along in the future that breaks backward compatibility for an unforeseen reason. When that happens the proposal could be reviewed for the new format, with CommonMark remaining the default parser format. We should aim not to get to that point and instead be careful when deciding both core and extension syntax.

The proposal feels very anti-Markdown even if optional. Perhaps others would like to share their thoughts.
