Document titles

kagan · September 20, 2014, 12:28am

I write many technical documents. They have a document title, chapters, subchapters etc. Using Markdown Extra, most of my documents can be written simply as plain text and converted to HTML.

What I am looking for is a way to define the document title, e.g.

                           MY TITLE


# My First Chapter

## Level two header

Lorem ipsum (TBC)

I am now looking for a rule that would turn “MY TITLE” into something like a document title. As far as I know, HTML does not offer such a thing. The h1 tag is not the right thing for that, because it would be used for chapters. The title tag in the head is not visible in the document itself; it is treated like a meta tag.

Preliminary solution:

The HTML Part could look like

<p class="docTitle">MY TITLE</p>

The corresponding CommonMark rule could be similar to

If the first line of a document starts with at least 8 spaces or two tabs and is followed by at least one empty line, its content is translated to HTML as a document title.
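A minimal sketch of how a preprocessor might apply this rule before handing the rest of the text to a Markdown converter. The function name `split_doc_title` is hypothetical, and the sketch assumes the title is emitted as the `<p class="docTitle">` element shown above.

```python
import html


def split_doc_title(text):
    """Apply the proposed rule to the first line of a document.

    Returns (title_html, rest). title_html is None when the rule
    does not apply, and rest is then the unchanged input.
    """
    lines = text.split("\n")
    first = lines[0] if lines else ""
    # Rule: the first line starts with at least 8 spaces or two tabs ...
    indented = first.startswith(" " * 8) or first.startswith("\t\t")
    # ... and is followed by at least one empty line.
    followed_by_blank = len(lines) > 1 and lines[1].strip() == ""
    if not (indented and first.strip() and followed_by_blank):
        return None, text
    title_html = '<p class="docTitle">{}</p>'.format(html.escape(first.strip()))
    # Drop the title line and the blank lines that follow it.
    rest = "\n".join(lines[2:]).lstrip("\n")
    return title_html, rest
```

The remaining text can then be converted as usual, with the title element placed in front of the converter's output.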

Any ideas?

Burt_Harris · September 20, 2014, 1:34am

I’ve got an idea. How about using # for the document title, ## for chapters, and ### for the next level down.

I think this is in keeping with HTML guidelines, that there should only be one <h1> in a document.

chrisalley · September 20, 2014, 1:59am

Your content management system or converter could check for the <h1> tag and reuse the content in the HTML <title> tag.
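As a rough illustration of that idea, the sketch below pulls the text of the first `<h1>` out of already-converted HTML and reuses it in `<title>`. The helper names `title_from_h1` and `wrap_page` and the `"Untitled"` fallback are assumptions for the example; a real converter would more likely work on its parse tree than on regular expressions.

```python
import re

# First <h1> element, with or without attributes.
H1_PATTERN = re.compile(r"<h1\b[^>]*>(.*?)</h1>", re.IGNORECASE | re.DOTALL)
# Any inline tag inside the heading, e.g. <em>.
TAG_PATTERN = re.compile(r"<[^>]+>")


def title_from_h1(body_html, fallback="Untitled"):
    """Return the plain text of the first <h1>, or the fallback."""
    match = H1_PATTERN.search(body_html)
    if match is None:
        return fallback
    text = TAG_PATTERN.sub("", match.group(1))
    # Collapse runs of whitespace left over from line breaks.
    return " ".join(text.split()) or fallback


def wrap_page(body_html):
    """Build a full page whose <title> repeats the first <h1>."""
    title = title_from_h1(body_html)
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n<title>{}</title>\n</head>\n"
        "<body>\n{}\n</body>\n</html>"
    ).format(title, body_html)
```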

Edit: Reading the W3C page on the H1 tag, I see no mention of a one h1 tag per document rule in HTML5.

Rick · September 20, 2014, 3:21am

I think most people use YAML for this?

---
author: William Shakespeare
title: Twelfth Night
---

mofosyne · September 20, 2014, 3:26am

YAML makes sense. I can imagine this would be at the top of the page. Could possibly add settings flags local to the document here as well.

!!!document
   author: William Shakespeare
   title: Twelfth Night
!!!

chrisalley · September 20, 2014, 3:53am

Jekyll solves this by adding triple dashes around the YAML “front matter” at the top of Markdown files.

---
layout: post
title: Blogging Like a Hacker
---

Perhaps we could go with the Jekyll syntax since it is quite widely used?
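For illustration, here is a small sketch of splitting such a front matter block from the rest of the file. It is not Jekyll's implementation: `split_front_matter` is a hypothetical name, and the flat `key: value` parsing is a deliberate simplification that stands in for a real YAML parser.

```python
def split_front_matter(text):
    """Split a leading '---' ... '---' block from the document body.

    Returns (metadata, body). If the first line is not '---' or no
    closing '---' is found, returns ({}, text) unchanged.
    """
    lines = text.split("\n")
    if lines[0].rstrip() != "---":
        return {}, text
    for end in range(1, len(lines)):
        if lines[end].rstrip() == "---":
            break
    else:
        # No closing delimiter: leave the document untouched.
        return {}, text
    metadata = {}
    for line in lines[1:end]:
        # Simplification: only flat "key: value" pairs are understood.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            metadata[key.strip()] = value.strip()
    return metadata, "\n".join(lines[end + 1:])
```

Only a `---` on the very first line opens the block, so a `---` later in the document is left for the Markdown processor to handle.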

Burt_Harris · September 20, 2014, 4:53am

I’m a really big supporter of YAML.

However, a solution that works today and doesn’t require syntactic extensions to the core language has a stronger appeal for me than blue-sky talk of extensions. That said, adding YAML metadata to CommonMark is the single most appealing extension idea I know of.

I think, however, that Rick’s post also illustrates the biggest problem with the proposal: the YAML --- document separator notation has a syntactic overlap with Markdown’s <hr> notation, so what he typed turned into <hr> horizontal rules on display, and they are almost invisible.

There’s already a very good solution to this conflict between YAML and Markdown like languages, illustrated in the CommonMark spec.txt document itself. I’m about typed out for the night, anyone else care to elaborate?

chrisalley · September 20, 2014, 6:33am

Good point. Could we handle --- differently if it is placed at the top of the document? It would be odd to include a horizontal rule on the first line. As mentioned before, Jekyll already does this.

Burt_Harris · September 20, 2014, 8:51pm

Yes, absolutely we could. I’m glad you pointed it out.

I suggest there are several other interesting distinctions that could be made for added clarity of intent to switch languages to YAML for including non-content data, including metadata. There’s also an interesting distinction that can be made about how we switch back to CommonMark.

kagan · September 20, 2014, 9:34pm

I like the idea. I just would like to make sure that the title in the YAML part would become the visible document title in the HTML Body. That is the original intent of my proposal.

chrisalley · September 20, 2014, 10:17pm

I still think you’d be better off using an <h1> tag for that, but a custom parser could grab the value from either place and reuse it in both the <h1> and the <title>.

hobarrera · September 23, 2014, 1:55pm

We could use a new syntax for a YAML separator instead of --- like Jekyll, e.g. “///”, “^^^” or some other character.

This would also open the door for more metadata (<title>, but also meta-author tags, etc).

Burt_Harris · September 23, 2014, 6:19pm

Of course we could, in an extension. But the tension between extensions and compatibility is a very tricky balancing act, which means that the core CommonMark authors are avoiding it for now, delaying the goal of including it.

My thought, however, was that if we choose the metadata syntax carefully (rather than basing it on other YAMF implementations like Jekyll or Pandoc), metadata doesn’t have to be an extension in a strict sense. Note, however, that my thinking on this doesn’t meet @kagan’s explicitly stated desire for the title to become a visible part of the document in the HTML. In fact, I’m aiming for metadata to stay hidden (but present), so I’m hesitant to bring it up on @kagan’s topic (without his invitation). So if you really want to discuss document metadata rather than document titles, either a new topic or an edit to this one’s title is appropriate.

chrisalley · September 24, 2014, 4:23am

I would go with --- for adding metadata to a document for a number of reasons:

It looks like a separator, visually
It is already widely used (in Jekyll/GitHub Pages)
It complements the horizontal rule separator, since the writer is using the same syntax to separate blocks of text later in the document as well.

hobarrera · September 24, 2014, 8:32am

It doesn’t complement it: it’s ambiguous. “Is this a metadata separator, or a rule?”
Unless we define that it’s a separator only when on the first line, but that is an exception and makes CommonMark and Markdown incompatible.

kagan · September 24, 2014, 11:50am

I am afraid we are talking about two topics by now. One is about meta data in general. The other one is about visible document titles.

Concerning meta data, I like the YAML syntax very much. However, @hobarrera has a point about creating a compatibility issue with vanilla Markdown. Even though it seems quite unlikely that someone starts a document with a horizontal line, it is quite possible that someone could write something like

---
MY TITLE
---

in vanilla to create a sort of visual impression of a document title.

So I believe we need a different solution for the separator of the YAML block.

Although I support the idea to add an extension for meta data, I personally would like to focus on the visible document title and make sure it doesn’t conflict with meta data or plain markdown.

How about this?

                  My Title
+++
Author: Myself
+++

In this case the first line of the document starts with at least 8 spaces.
The meta block starts either at the very beginning of the document (if there is no visible title) or after the visible title.

In case the author adds a title as meta data, that one can be translated to the HTML title tag and can be different from the visible title. This way we could solve the question of what to do if the visible title and the meta data title are different.

chrisalley · September 25, 2014, 6:13am

That is what I meant, yes.

Using a horizontal rule on the first line does not seem like common or good practice to me. So having the --- begin the metadata block on the first line only would discourage users from using a horizontal rule separator on the first line.

I’ve created a Gist using this example:

On GitHub at least, the top --- renders as a horizontal rule, while the bottom --- turns the MY TITLE part into an <h1> tag. I suspect other existing Markdown implementations do the same and the behaviour appears consistent with Gruber’s syntax guide.

Burt_Harris · September 26, 2014, 10:16pm

Sorry for delaying describing my suggestion. I can think of three different approaches to metadata I like. Weighing them all, #3 seems most attractive to me, but I’ll go through all three, hoping it makes sense to anyone who cares.

The first is what Pandoc does: use --- for the beginning marker, and ... for the end marker. This aligns well with YAML’s syntax definition, and if you find a --- that doesn’t have a complementary ... you can treat it as an HR for compatibility.

If you look at the source code for the CommonMark Spec you’ll find an example of this.

But it is important to note that Pandoc is alone in treating this combination as metadata. Since this approach clearly doesn’t produce the intended result in most implementations, calling it “Common” seems a big stretch to me.

The second and third approaches avoid trying to overload the triple dash. The second approach uses the HTML/SGML declaration format to encapsulate YAML metadata. Same basic idea, just using delimiters that are far older: <! and >. In general, Markdown-like languages just ignore this and pass it on to the browser, and browsers ignore declarations they don’t recognize, so we make sure the browser won’t recognize it by starting with our very own new word: CommonMark. Then, for example, document metadata could look like this:

<!CommonMark:Metadata
Title: Testing Metadata
Author: foo bar
>

# Testing Metadata

The reaction of existing processors to this is mixed, but it has more visually pleasing variations than the first approach, and both Markdown.pl and stmd generate HTML that displays pretty well in the browsers I tested. Check it out on the preview tag of the above link:

As I see it, stmd already handles that example exactly right. The only problem is that it requires the blank line shown in order to handle the heading correctly. So using this notation for metadata could be considered a convention, and not an extension. Markdown.pl renders it well too, except that it wraps this construct in a <p> element. In practice, the visual result of the extra <p> tag is mostly harmless, especially in context.

Drawbacks to the second approach include not being able to include either > or -- in the metadata, since they are special in declarations. Since -- might be used as part of a document separator in YAML, and there seems to be no provision for escaping a > that isn’t the closing delimiter, this isn’t a great solution in terms of robustness and risk.

3. Possibly the best approach would be use of HTML/SGML Processing Instruction tags with an unrecognized namespace prefix, like CommonMark:… . This is much the same as the previous approach, except it uses <? and ?> as the delimiters. Browsers generally ignore processing instructions they don’t recognize too, and since the close delimiter is an unusual two-character combination, the risks are lower. Similar lack of special-treatment of the double-hyphen makes this the safest option I considered:

<?CommonMark:Metadata
Title: Testing Metadata
Author: foo bar
?>

# Testing Metadata

which seems to be rendered as well as or better than the declaration syntax. This leaves me in a slight quandary, because metadata is semantically closer to a declaration than to a processing instruction, but as in most cases I prefer safety over the semantics of something I know the browser will ignore.

The root cause that drives the required blank line is already under consideration, and I think it’s likely it can go away if the alternative described in Appendix B is invested in.

P.S. Some might suggest there’s a similar fourth alternative, HTML comments. I don’t disagree, but being an old-timer, I consider comments a special case of declarations, so they fit under #2 in the above list. Under HTML 5, that may no longer be technically true, but the difference in practice is nil, so don’t worry about it.
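To make the processing-instruction idea concrete, here is a hedged sketch of how a tool might pull such a block out of a document. The `<?CommonMark:Metadata ... ?>` form is the one proposed above; the function name `extract_pi_metadata` and the flat `Key: value` parsing (in place of a real YAML parser) are assumptions for illustration only.

```python
import re

# Opening '<?CommonMark:Metadata' on its own line, closed by the first '?>'.
PI_PATTERN = re.compile(r"<\?CommonMark:Metadata[ \t]*\n(.*?)\n?\?>", re.DOTALL)


def extract_pi_metadata(text):
    """Return (metadata, text_without_block) for the first metadata block.

    If no block is present, returns ({}, text) unchanged.
    """
    match = PI_PATTERN.search(text)
    if match is None:
        return {}, text
    metadata = {}
    for line in match.group(1).split("\n"):
        # Simplification: only flat "Key: value" pairs are understood.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            metadata[key.strip()] = value.strip()
    # Remove the block; everything around it is left as written.
    return metadata, text[:match.start()] + text[match.end():]
```

Because the block only ends at the two-character `?>`, a value containing a lone `>` or a `--` passes through unharmed, which is the robustness argument made for this option.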

malcook · September 27, 2014, 7:34am

IMO Pandoc being alone in this regard should not be a deterrent to adoption by CommonMark.

Also, Pandoc’s approach is built into the whole R literate programming stack (rmarkdown / knitr), and it would be good to remain compatible here.

kagan · September 27, 2014, 10:03am

It was a pleasure to read this. Good work!

It looks like your options 2 and 3 would not require the meta data to be at the beginning of the document, right? So that would not conflict with my visible document title.

Concerning option 2, you said

I agree with calling this behaviour “harmless”. It may even be “good” that the meta data becomes visible in a <p> tag, because that is how I understand the spirit of Markdown.

I think it would not harm to create <meta> tags for those keywords in the HTML Header in addition to the <p> tag. That could be useful for other software tools that read the meta data from the meta tags.
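A small sketch of what generating those tags could look like, assuming the meta data has already been read into a dictionary of keywords and values. The function name `meta_tags` and the choice to lowercase the keyword for the `name` attribute are assumptions made for the example.

```python
import html


def meta_tags(metadata):
    """Render one <meta name="..." content="..."> tag per keyword."""
    tags = []
    for key, value in metadata.items():
        tags.append(
            '<meta name="{}" content="{}">'.format(
                # Assumption: keywords are lowercased for the name attribute.
                html.escape(key.lower(), quote=True),
                html.escape(value, quote=True),
            )
        )
    return "\n".join(tags)
```

The resulting lines would go into the HTML head, while the visible <p> in the body stays as it is.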

Considering what I just said above, being “semantically closer to declaration” is just another reason for me to vote for option two.