Document titles

I am afraid we are now talking about two separate topics. One is about meta data in general. The other one is about visible document titles.

Concerning meta data, I like the YAML syntax very much. However, @hobarrera has a point about creating a compatibility issue with vanilla Markdown. Even though it seems quite unlikely that someone starts a document with a horizontal line, it is quite possible that someone could write something like

---
MY TITLE
---

in vanilla to create a sort of visual impression of a document title.

So I believe we need a different solution for the separator of the YAML block.

Although I support the idea to add an extension for meta data, I personally would like to focus on the visible document title and make sure it doesn’t conflict with meta data or plain markdown.

How about this?

                  My Title
+++
Author: Myself
+++

In this case the first line of the document starts with at least 8 spaces.
The meta block starts either at the very beginning of the document (if there is no visible title) or after the visible title.

In case the author adds a title as meta data, that one can be translated to the HTML title tag and can be different from the visible title. This way we could solve the question of what to do if the visible title and the meta data title are different.
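
To make the proposal concrete, here is a minimal sketch of how a reader could split such a document. The function name `parse_header` and the simple `key: value` handling are my own assumptions for illustration; nothing here is part of any spec.

```python
import re

MIN_TITLE_INDENT = 8  # the proposal's "at least 8 spaces"


def parse_header(text):
    """Split a document into (visible_title, metadata, body).

    Hypothetical parser for the layout proposed above; not part of any spec.
    """
    lines = text.split("\n")
    title = None
    pos = 0

    # A visible title is only recognised on the very first line.
    if lines and re.match(r" {%d,}\S" % MIN_TITLE_INDENT, lines[0]):
        title = lines[0].strip()
        pos = 1

    metadata = {}
    # The meta block opens with +++ right after the title (or at the start).
    if pos < len(lines) and lines[pos].strip() == "+++":
        end = next(
            (i for i in range(pos + 1, len(lines)) if lines[i].strip() == "+++"),
            None,
        )
        if end is not None:  # an unterminated block is left untouched
            for line in lines[pos + 1:end]:
                key, sep, value = line.partition(":")
                if sep:
                    metadata[key.strip()] = value.strip()
            pos = end + 1

    return title, metadata, "\n".join(lines[pos:])
```

With this split, the HTML title could be chosen as `metadata.get("Title", title)`, so a meta data title wins when present and the visible title is the fallback.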

That is what I meant, yes.

Using a horizontal rule on the first line does not seem like common or good practice to me. So having the --- begin the metadata block on the first line only would discourage users from using a horizontal rule separator on the first line.

I’ve created a Gist using this example:

On GitHub at least, the top --- renders as a horizontal rule, while the bottom --- turns the MY TITLE part into an <h2> tag (a dashed underline is the second-level Setext heading). I suspect other existing Markdown implementations do the same and the behaviour appears consistent with Gruber’s syntax guide.
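
The ambiguity can be sketched in a few lines. This is a deliberately simplified model with only two rules, both assumptions on my part rather than a real Markdown parser: a dash line directly under a text line underlines it as a Setext heading, and any other dash line is a horizontal rule.

```python
import re

DASHES = re.compile(r"^-{3,}\s*$")


def classify(lines):
    """Label lines the way a simplified vanilla Markdown reader might.

    Only two rules are modelled (an assumption, far from a full parser):
    a dash line directly under a text line turns it into a Setext heading;
    any other dash line is a horizontal rule.
    """
    out = []
    for line in lines:
        if DASHES.match(line):
            if out and out[-1][0] == "text":
                out[-1] = ("setext_heading", out[-1][1])
            else:
                out.append(("hr", ""))
        elif line.strip():
            out.append(("text", line.strip()))
        else:
            out.append(("blank", ""))
    return out
```

Feeding it the three lines `---`, `MY TITLE`, `---` gives a rule followed by a heading, which is why a first-line --- cannot be claimed for metadata without changing how such documents render.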

Sorry for delaying describing my suggestion. I can think of three different approaches to metadata I like. Weighing them all #3 seems most attractive to me, but I’ll go through all three hoping it makes sense to anyone who cares.

  1. The first is what Pandoc does: use --- for the beginning marker and ... for the end marker. This aligns well with YAML’s syntax definition, and if you find a --- that doesn’t have a complementary ... you can treat it as an HR for compatibility.

    If you look at the source code for the CommonMark Spec you’ll find an example of this.

    But it is important to note that Pandoc is alone in treating this combination as metadata. Since this approach clearly doesn’t produce the intended result in most implementations, calling it “Common” seems a big stretch to me.

  2. The second and third approaches avoid overloading the triple-dash. The second approach uses the HTML/SGML declaration format to encapsulate YAML metadata. Same basic idea, just using delimiters that are far older: <! and >. In general, Markdown-like languages just ignore this and pass it on to the browser, and browsers ignore declarations they don’t recognize, so we make sure the browser won’t recognize it by starting with our very own new word: CommonMark. Then, for example, document metadata could look like this:

<!CommonMark:Metadata
Title: Testing Metadata
Author: foo bar
>

# Testing Metadata

The reaction of existing processors to this is mixed, but has more visually pleasing variations than the first, and both Markdown.pl and stmd generate HTML that displays pretty well on browsers I tested. Check it out on the preview tag of the above link:

As I see it, stmd already handles that example exactly right. The only problem is that it requires the blank line shown before it will correctly handle the heading. So using this notation for metadata could be considered a convention, and not an extension. Markdown.pl renders it well too, except that it wraps this construct in a <p> element. In practice, the visual result of the extra <p> tag is mostly harmless, especially in context.

Drawbacks to the second approach include not being able to include either > or -- in the metadata, since they are special in declarations. Since -- might be used as part of a document separator in YAML, and there seems to be no provision for escaping a > that isn’t the closing delimiter, this isn’t a great solution in terms of robustness and risk.

  3. Possibly the best approach would be to use HTML/SGML Processing Instruction tags with an unrecognized namespace prefix, like CommonMark:… . This is much the same as the previous approach, except it uses <? and ?> as the delimiters. Browsers generally ignore processing instructions they don’t recognize too, and since the close delimiter is an unusual two-character combination, the risks are lower. The lack of any special treatment of the double hyphen also makes this the safest option I considered:

<?CommonMark:Metadata
Title: Testing Metadata
Author: foo bar
?>

# Testing Metadata

which seems to be rendered as well as or better than the declaration syntax. This leaves me in a slight quandary because metadata is semantically closer to a declaration than to a processing instruction, but as in most cases I prefer safety over the semantics of something I know the browser will ignore.
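
As a rough illustration of option 3, the following sketch pulls such a block out of a document. The function name and the plain `key: value` handling are assumptions of mine (a real implementation would hand the block to a YAML parser), and the target name is simply the one used in the example above.

```python
import re

# Target name taken from the example above; the block ends at a line
# that starts with ?> (an assumption made for this sketch).
PI_BLOCK = re.compile(
    r"<\?CommonMark:Metadata[ \t]*\n(.*?)^\?>",
    re.DOTALL | re.MULTILINE,
)


def extract_pi_metadata(text):
    """Return (metadata, remaining_text) for the first metadata instruction."""
    match = PI_BLOCK.search(text)
    if match is None:
        return {}, text
    metadata = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    remaining = text[:match.start()] + text[match.end():]
    return metadata, remaining
```

Because only the two-character ?> at the start of a line ends the block, values containing > or -- pass through untouched, which is the robustness argument made above.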

The root cause that drives the required blank line is already under consideration, and I think it’s likely it can go away if the alternative described in Appendix B is invested in.

P.S. Some might suggest there’s a similar fourth alternative: HTML comments. I don’t disagree, but being an old-timer, I consider comments a special case of declarations, and thus they fit under #2 in the above list. Under HTML 5, that may no longer be technically true, but the difference in practice is nil, so don’t worry about it.

IMO pandoc being alone in this regard should not be a deterrent to adoption by commonmark.

Also, pandoc’s approach is built into the whole R literate programming stack (rmarkdown / knitr), and it would be good to remain compatible here.
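
For reference, the rule as it was described earlier in this thread (--- opens, ... closes, and an unmatched --- stays a horizontal rule) could be sketched like this. It models only that description; pandoc’s real behaviour has more cases, and the function name is hypothetical.

```python
def split_yaml_block(text):
    """Return (yaml_source, body); yaml_source is None when no block exists.

    Sketch of the rule described in this thread: a first-line --- opens a
    block only if a later ... line closes it; otherwise the text is returned
    unchanged so the --- can still be read as a horizontal rule.
    """
    lines = text.split("\n")
    if not lines or lines[0].rstrip() != "---":
        return None, text
    for i in range(1, len(lines)):
        if lines[i].rstrip() == "...":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    return None, text
```

The returned YAML source would then go to whatever YAML library the implementation already uses.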

It was a pleasure to read this. Good work!

It looks like your option 2 and 3 would not require the meta data to be at the beginning of the document, right? So that would not conflict with my visible document title.

Concerning option 2, you said

I agree with calling this behaviour “harmless”. It may even be “good” that the meta data becomes visible in a <p> tag, because that is how I understand the spirit of Markdown.

I think it would not harm to create <meta> tags for those keywords in the HTML Header in addition to the <p> tag. That could be useful for other software tools that read the meta data from the meta tags.
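
A minimal sketch of that idea, assuming the meta data has already been parsed into a dictionary. The function name and the choice to lowercase the keys are my own assumptions.

```python
from html import escape


def meta_tags(metadata):
    """Render each metadata key/value pair as an HTML <meta> element."""
    return [
        '<meta name="%s" content="%s">'
        % (escape(key.lower(), quote=True), escape(value, quote=True))
        for key, value in metadata.items()
    ]
```

For example, `meta_tags({"Author": "foo bar"})` yields a single `<meta name="author" content="foo bar">` line that a tool could place in the HTML header.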

Considering what I just said above, being “semantically closer to declaration” is just another reason for me to vote for option two.

Document title

I agree with kagan that a run of spaces before a word on the first or second line is probably a title, and should be auto-recognised as such and treated as <h1>. As for why? It’s because I would hate to have to keep typing ## throughout the entire document (as you would see in my metadata via | example below).

e.g.

lazy headers (only works if you add enough spaces)

                 Blogging Like a Hacker

| title: This is a title

... intro ...

# My First Chapter

## Level two header

Lorem ipsum (TBC)

or ( typical markdown Setext-style headers. Recall that ===... is h1 and ---... is h2 ). Not sure if this is a good idea, since it’s not immediately obvious that it is a document title, since it is typically used for subheadings.

Blogging Like a Hacker
=======================
| title: This is a title

... intro ...

# My First Chapter

## Level two header

Lorem ipsum (TBC)

or ( non-standard Setext-style headers, but specifically used to demarcate document titles). Babelmark2 shows that ###... is not supported anywhere, so it is a good candidate for document title declaration:

#################################################
            Blogging Like a Hacker
#################################################
| title: This is a title

... intro ...

# My First Chapter

## Level two header

Lorem ipsum (TBC)

is rendered as (note: usage of header tag ):

<head>
    <title>This is a title</title>
</head>
<body>
   <header>
    <h1>Blogging Like a Hacker</h1>
    <p>... intro ...</p>
   </header>
   <main>
    <h1> My First Chapter </h1>
    <h2> Level two header </h2>
        <p>Lorem ipsum (TBC)</p>
   </main>
</body>

Metadata

Moved to

http://talk.commonmark.org/t/metadata-in-documents/721

While I agree it would cause no harm for an implementation to do this, I think that most implementations typically never generate any <head> tags at all, so it would complicate spec matters significantly to specify correct placement of a <meta> tag.

The fenced (---) YAML approach is shared across jekyll, hyde, plume, nanoc actually.

And it would be better to keep this kind of markup at the application level, IMHO.

Are you suggesting that a single # should be changed to <h2> in the case that a title with a run of spaces is added to the top of the file? If so, I can see this being confusing for people copy/pasting text from different Markdown files that use a single # to mean <h1>.

I’d prefer to keep as much XML-like syntax out of Markdown as possible. Markdown isn’t primarily for parsers, but for human readers (as Gruber writes in the philosophy of Markdown, “Readability, however, is emphasized above all else.”). Just as YAML is more readable than XML because it is closer to something a writer might jot down in a text editor, Markdown should lean towards being less of a programmer’s tool and more of a writer’s tool.

Yea, I should change that post. Best to maintain consistency. Though how would you make the main document title be like <h0> or something?

Agreed on keeping out XML syntax; it’s why we switched to Markdown in the first place. However, I would like to say that for document declaration, we should consider whether we need to go with <!CommonMark 0.1.23-github.username.projectname > to inform the parser which version to use. But I would like to be convinced that placing the document declaration in the meta data would still work (since it’s much nicer to look at).

e.g.

                     My Title

| !CommonMark: 0.1.23-github.username.projectname
| title: Title for the top bar of any browser

vs

<!CommonMark 0.1.23-github.username.projectname >

                     My Title

| title: Title for the top bar of any browser

I would keep that as application-specific and possibly fit within the current target users of the markup.

Namely: hyde, jekyll, Flask-FlatPages, nanoc.

Anything that would make their life or the life of their users harder is counterproductive.

I’m not quite sure what you mean here. There is no <h0> HTML tag, so the main document heading would need to be <h1>.

What I meant is that subsequent headings would need to be ## which translates to <h2> for consistency when copy/pasting between Markdown documents. If we went with an overall heading it would need to be <h1> regardless of the Markdown syntax used.

I see a problem with using spaces to represent a document heading; four spaces are already used to represent a code block. What if the writer wanted to put a code block on the first line?

Ah… that’s so true, and so annoying as well. There are two potential ways to tackle that.

  1. Heuristics. Does it look like a title? e.g. 8 or more spaces, and a newline. (I don’t like this idea)
  2. Only allow lazy document titles ‘before’ a document declaration like below. The code will be placed after the document declaration. As for why? In most cases where you declare a lazy title, it would most likely be for a dedicated document. (I prefer this idea)
  • Best thing about this approach, is you can declare multiple lines to be a header. E.g. like the front page of a report.

e.g.

                 The Report
              By Robert Smithly
                    2042

| !CommonMark: 0.1.23-github.username.projectname
| title: Title for the top bar of any browser
| layout: report

    Function(2+2);

The above function is rendered as a code block, while the title is treated as h1 since it is before the document declaration. This is a simulated paragraph.

the document declaration is metadata located on top of document (via compact metadata syntax):

| !CommonMark: 0.1.23-github.username.projectname
| title: Title for the top bar of any browser
| layout: report

Which if user is lazy enough, can just simply be |

                               The Report
                            By Robert Smithly
                                  2042
|

The above function is rendered as a code block, while the title is treated as h1 since it is before the document declaration. This is a simulated paragraph.

The reason why I really really would like some lazy document title, at least for stand alone documents, is that it looks much prettier.

If you resize the editor window it’s not going to remain centred which could make the heading look a bit odd. At least with an underlined heading (or a heading prefixed by #) it will look consistent. IMO, Markdown doesn’t need to aim to look so close to the actual output, especially if the same document’s presentation differs when reused across different sites (some site headings might be centred, others left aligned).

I guess, but at least it would still be a choice for users. I mean, it could probably mean the same thing to type the following. (Multiple spaces don’t matter, since it is unlikely people want to type code for the title page, but the title can have bold or italic.) (Usually # is implicit in the document title, but it can be overridden by ##.)

The Report
##   By: **Robert Smithly**
## Year: 2042

| !CommonMark: 0.1.23-github.username.projectname
| title: Title for the top bar of any browser
| layout: report

The above function is rendered as a code block, while the title is treated as h1 since it is before the document declaration. This is a simulated paragraph.

This way it is left up to the user how they want to lay out the document title in text form.

The writer might not intend for the first line to be a heading though. For example, the document may just be some casual notes with no need for a heading.

Then you place it under the document declaration. (If document declaration is not done, then there is no cover page, but the first header will be auto detected as the page title.)

<< This area is for the 'Cover Page' and title >>

| !CommonMark: 0.1.23-github.username.projectname
| title: Title for the top bar of any browser
| layout: report

<< This area is for everything else like casual notes >>
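
Roughly, a reader for this layout could work as below. It is only a sketch of the idea: the function name is made up, and it assumes that the first run of lines starting with | is the document declaration, so a document without a declaration that happens to contain a table row starting with | would be misread.

```python
def split_document(text):
    """Split text into (cover_lines, metadata, body_lines).

    Assumes the hypothetical compact syntax from this thread: the first
    run of lines starting with | is the document declaration.
    """
    lines = text.split("\n")
    start = next(
        (i for i, line in enumerate(lines) if line.startswith("|")), None
    )
    if start is None:
        return [], {}, lines  # no declaration: no cover page
    end = start
    while end < len(lines) and lines[end].startswith("|"):
        end += 1

    metadata = {}
    for line in lines[start:end]:
        key, sep, value = line[1:].partition(":")
        if sep:
            metadata[key.strip()] = value.strip()

    # Everything above the declaration is cover-page text, whatever its indent.
    cover = [line.strip() for line in lines[:start] if line.strip()]
    return cover, metadata, lines[end:]
```

Since the cover area is handled separately, the four-space code block rule only applies to the body lines, which sidesteps the conflict with indented titles.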

Markdown doesn’t traditionally support the concept of “pages”; it’s more fluid than that. A piece of Markdown is usually pastable between different parts of different documents. Markdown typically represents a section of an HTML document, rather than the entire document.

That’s not to say that supporting cover page use cases is a bad idea, but I do think any syntax should be defined in more abstract terms that correspond to particular HTML elements. You would typically use a “print” stylesheet in CSS for print documents and apply page break rules to particular HTML elements.

It is common to reuse a document heading in multiple places. For example, the following HTML is common:

<html>
  <head>
    <title>Site Name - Article Name</title>
  </head>
  <body>
    <img src="site-name-logo.png" alt="Site Name's Logo" />
    <h1>Article Name</h1>
  </body>
</html>

As you can see, the article name is reused in two places. This is essentially our “document heading”.

In my view, it does not make sense to require the writer to define the document heading in two places - that’s tedious and violates the DRY principle. It should be the content management system’s job to reuse the contents of the <h1> tag inside the <title> tag. The same approach could apply to a cover page.
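
As a sketch of what that could look like on the content management side, assuming the Markdown has already been rendered to an HTML fragment. The function name and the "Site Name - Article Name" pattern are taken from the example above as assumptions, and a real system would likely use a proper HTML parser instead of regular expressions.

```python
import re
from html import escape, unescape

FIRST_H1 = re.compile(r"<h1[^>]*>(.*?)</h1>", re.IGNORECASE | re.DOTALL)
TAGS = re.compile(r"<[^>]+>")


def page_title(body_html, site_name):
    """Build <title> text from the first <h1> in rendered Markdown."""
    match = FIRST_H1.search(body_html)
    if match is None:
        return escape(site_name)  # no heading: fall back to the site name
    # Strip inline markup such as <em> so only the heading text remains.
    heading = unescape(TAGS.sub("", match.group(1))).strip()
    return escape("%s - %s" % (site_name, heading))
```

The writer then only types the heading once, and the template fills in `<title>` from the returned string.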

I have been playing with a new static web page generator that reads Markdown. It also reads three different kinds of Document Metadata or Front Matter:

YAML, JSON, or the new TOML
