Metadata in documents

Note: I’m not discussing what’s inside the metadata; I think YAML is good enough. (But do encourage support for vCards somehow, most likely as an extension.) I’m mostly discussing how to visually separate metadata from the document.

I don’t mind --- & ... for block metadata. However, for short metadata blocks they are overkill and take up too much visual space. | is a bit more compact, and can be stacked to occupy even less space.

Compare

                        My title  

| layout: post
| title: Blogging Like a Hacker

The primary theme of Porter’s[1] critique of dialectic desublimation is the difference between class and society. Several theories concerning not, in fact, discourse, but subdiscourse exist. Thus, the subject is interpolated into a Sartreist absurdity that includes culture as a reality.

with

                        My title  

-----------------------------------------------------------------
layout: post
title: Blogging Like a Hacker
.................................................................   

The primary theme of Porter’s[1] critique of dialectic desublimation is the difference between class and society. Several theories concerning not, in fact, discourse, but subdiscourse exist. Thus, the subject is interpolated into a Sartreist absurdity that includes culture as a reality.

My initial idea of |||||||| does have a massive visual presence, and perhaps it should be discouraged (except maybe for metadata that is also displayed to the user, e.g. vCards).

However, for a short document declaration it takes up so much visual space that it detracts from the actual content. Each has its merits, but I think we need both a “block metadata” form and a “visually compact” form. Visually compact metadata is mostly hidden from users, but block metadata might be shown (with some sort of switch, perhaps).


Hmmm… in the case of block metadata:

Since ||| has a bigger visual presence than --- & ..., could we use that difference to decide whether the metadata should be displayed or hidden?

For metadata that should be shown

||||||| Burk Drake  |||||||
 name: Burk Drake
 telephone: (702) 932-6523
|||||||||||||||||||||||||||

For metadata that should be hidden from normal view:

----------------
layout: post 
title: Blogging Like a Hacker
................
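To make the shown/hidden distinction concrete, here is a minimal Python sketch of how a parser might recognise both fence styles above and tag each block as visible or hidden. The fence rules (three or more | to open a displayed block, with an optional title, closed by a line of |; three or more - to open a hidden block, closed by three or more .) and all names are assumptions for illustration, and values are read as flat key: value pairs rather than full YAML.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed fence conventions (from the proposal above, not a standard):
#   ||||| Title |||||  ...  |||||||||   -> metadata meant to be displayed
#   -----------        ...  .........   -> metadata hidden from normal view
SHOWN_OPEN = re.compile(r"^\|{3,}\s*(.*?)\s*\|*$")
SHOWN_CLOSE = re.compile(r"^\|{3,}\s*$")
HIDDEN_OPEN = re.compile(r"^-{3,}\s*$")
HIDDEN_CLOSE = re.compile(r"^\.{3,}\s*$")


@dataclass
class MetadataBlock:
    visible: bool
    title: Optional[str]
    data: Dict[str, str] = field(default_factory=dict)


def extract_metadata_blocks(text: str) -> List[MetadataBlock]:
    """Collect fenced metadata blocks, noting whether each should be shown."""
    blocks: List[MetadataBlock] = []
    current: Optional[MetadataBlock] = None
    closer = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if current is None:
            if HIDDEN_OPEN.match(line):
                current, closer = MetadataBlock(False, None), HIDDEN_CLOSE
            else:
                m = SHOWN_OPEN.match(line)
                if m:
                    current, closer = MetadataBlock(True, m.group(1) or None), SHOWN_CLOSE
        elif closer.match(line):
            blocks.append(current)
            current = None
        else:
            # Flat "key: value" pair; only the first colon separates key and value
            key, sep, value = line.partition(":")
            if sep:
                current.data[key.strip()] = value.strip()
    return blocks
```

One caveat this makes visible: a --- line meant as an ordinary thematic break, with no matching ... line, opens a hidden block that never closes, and this sketch silently drops everything after it.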

vCard as an extension

How do we support vCards? I guess it will just have to be an extension. But perhaps generic extensions could be allowed to output metadata as well, so it might look like this:

!!!!!!!!!!!!!!!!!Vcard[Burk Drake's Vcard]
 begin:vcard 
 n:Burk;Drake
 tel;work:(702) 932-6523
 end:vcard
!!!!!!!!!!!!!!!!!

This would call a general directive function that outputs either YAML vCard data (like the example in the previous section) or a microformat hCard (depending on a setting).
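A minimal Python sketch of such a generic directive function, assuming the !!!Name[caption] … !!! form from the example above. The handler table, the vcard_to_yaml handler and the YAML-style keys it produces (e.g. tel_work) are hypothetical; an hCard handler could be registered in the same table and chosen by a setting.

```python
import re
from typing import Callable, Dict, List, Optional

# Assumed syntax: 3+ "!" then a directive name and an optional [caption];
# a line of 3+ "!" on its own closes the directive.
OPEN_RE = re.compile(r"^!{3,}(\w+)(?:\[(.*)\])?\s*$")
CLOSE_RE = re.compile(r"^!{3,}\s*$")


def vcard_to_yaml(body: List[str], caption: Optional[str]) -> str:
    """Hypothetical handler: turn vCard property lines into YAML-style pairs."""
    out = [f"# {caption}"] if caption else []
    for line in body:
        name, sep, value = line.strip().partition(":")
        if not sep or name.lower() in ("begin", "end"):
            continue  # skip BEGIN/END markers and malformed lines
        # "tel;work" -> "tel_work": keep the vCard parameter in the key
        out.append(f"{name.lower().replace(';', '_')}: {value.strip()}")
    return "\n".join(out)


HANDLERS: Dict[str, Callable[[List[str], Optional[str]], str]] = {
    "vcard": vcard_to_yaml,
}


def run_directives(text: str) -> str:
    out: List[str] = []
    body: List[str] = []
    active = None  # (name, caption, opening line) while inside a directive
    for line in text.splitlines():
        if active is None:
            m = OPEN_RE.match(line)
            if m:
                active, body = (m.group(1).lower(), m.group(2), line), []
            else:
                out.append(line)
        elif CLOSE_RE.match(line):
            name, caption, _ = active
            handler = HANDLERS.get(name)
            # Unknown directives pass their body through unchanged
            out.append(handler(body, caption) if handler else "\n".join(body))
            active = None
        else:
            body.append(line)
    if active is not None:
        out.append(active[2])  # unterminated directive: keep it as plain text
        out.extend(body)
    return "\n".join(out)
```

Keeping the dispatch generic means the vCard support lives entirely in one handler; the core only needs to know how to find the start and end of a directive.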

nanoc, plume and probably plenty of other static page generators seem to use the --- fence to mark the metadata. I wouldn’t use something different.

Regarding vCard support, surely it fits the block-level extension syntax well, either

@@@vcard(caption)
...
@@@

or

!!!vcard(caption)
...

Yeah, I agree that vCard would work as an extension within a generic directive framework via @ or !.

Interesting ideas @mofosyne.

Regarding the use of ... as a terminator for metadata blocks, would it not be better to use the same start and end syntax? This would be consistent with the rest of Markdown, for example:

### This is a third level heading ###

or

```ruby
puts "Hello World"
```

With this in mind, I'm still in favour of the Jekyll syntax for placing metadata at the top of a file.
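For comparison, a minimal Python sketch of reading Jekyll-style front matter, where the same --- line both opens and closes the block. It assumes the block starts on the very first line and only handles flat key: value lines; the function name is hypothetical, and anything nested would need a real YAML parser.

```python
from typing import Dict, Tuple


def split_front_matter(text: str) -> Tuple[Dict[str, str], str]:
    """Split '---'-fenced front matter from the rest of a document.

    The same '---' fence opens and closes the block. Only flat
    'key: value' lines are read; lists, nesting and quoting are not."""
    lines = text.splitlines()
    if not lines or lines[0].rstrip() != "---":
        return {}, text
    for end in range(1, len(lines)):
        if lines[end].rstrip() == "---":
            meta: Dict[str, str] = {}
            for line in lines[1:end]:
                key, sep, value = line.partition(":")
                if sep:
                    meta[key.strip()] = value.strip()
            return meta, "\n".join(lines[end + 1:])
    return {}, text  # no closing fence: treat the whole text as content
```

Because the opener and closer are identical, a reader only has to remember one fence, which is the consistency argument made above.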

[quote="mofosyne, post:1, topic:721"]
| is a separator character that declares the following line to be a key:value metadata statement.
Multiple key:value pairs can be stacked on one line, but they must be separated by |, e.g. | key:value : a ; b_ : c | key1:value1 gives {"key":"value : a ; b_ : c", "key1":"value1" }

It is contextually dependent on h1, h2, h3 headings etc...


Pros: cleaner visuals; YAML-like.
Cons: cannot simply copy-paste YAML as with the Jekyll --- notation.
[/quote]
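A minimal Python sketch of the quoted | rule, assuming each line starting with | holds one or more key:value pairs separated by further | characters, and that only the first colon in a pair separates key from value (which is what makes "value : a ; b_ : c" a single value). The function name is hypothetical, and the dependence on the enclosing heading is not modelled.

```python
from typing import Dict


def parse_pipe_metadata(text: str) -> Dict[str, str]:
    """Collect key:value pairs from lines that start with '|'."""
    result: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.lstrip().startswith("|"):
            continue  # ordinary document text
        for segment in line.split("|"):
            segment = segment.strip()
            if not segment:
                continue  # the empty piece before the leading '|'
            # Split on the FIRST colon only; later colons belong to the value
            key, sep, value = segment.partition(":")
            if not sep:
                raise ValueError(f"expected key:value, got {segment!r}")
            result[key.strip()] = value.strip()
    return result
```

One consequence of the rule: because | separates pairs, a value can never contain a literal |.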

This looks nice visually. If I understand correctly, this is for embedding metadata within a section of the document (i.e. not just at the top)? Is there a use case for this?

I’m not sure I can think of a great one. But things like copyright attribution, or maybe a “figure 1” label for code blocks, might be handy. Or making a resume more searchable. Or providing speech-synthesis metadata for the exact pronunciation of certain hard-to-say words. Or citing your sources. Or adding a summary of the section.

This is one of those applications where, if you build it, people will come.

The examples you mentioned sound like they might be displayed visually. Perhaps some are already covered by our discussion in the directives topic? Would they map to particular HTML elements?

Well, I see | as the first character of a line as marking hidden metadata. As for block metadata, I see |||||| as displayed visually, while --- to ... is hidden. The reason is how visually strong each one is: |||| is very strong, so I took that to mean the user wants it displayed. (You can see that in my examples.)

For visual data/blocks, why not just use directives?

Good point. I guess what I’m aiming for with “visual metadata” (as opposed to hidden metadata like the | example) is a CommonMark document that, when formatted correctly, can be read by a program as easily as JSON (e.g. https://jsonresume.org/), while looking as good as it is easy to parse.

For example, formatting this plain-text resume so that it can easily be read as a JSON data structure: resume example in markdown


Approach for visual metadata

Hmmm… I noticed that people often type lists like this:

# header title  (secondary descriptor): description
Loose Key (secondary descriptor): description

List name:
* item name1 (secondary descriptor): description
* item name2 (secondary descriptor): description
* item name3 (secondary descriptor): description

e.g.

# About Animals (Year:1986) : You know you want to know more!   
Written By (Author): Greg
Publisher: Burkank

Animals:
* bob (cat) : barfs furballs
* george (dog) : very lazy
* alex (cat) : likes birds

The common pattern is a “Key(2nd value): Value” structure like YAML,
or a “Key(2nd value) - Value” variant.

Perhaps we can use that?
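The “Key (secondary descriptor): value” pattern can be matched line by line. The Python sketch below is one assumed way to do it, not a worked-out spec: an optional ATX heading (#) or bullet (*, +, -) marker, a key without parentheses or colons, an optional parenthesised descriptor, then a colon before the value. The Entry type and function name are hypothetical, and the " - " separator variant is not handled.

```python
import re
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    kind: str                 # "heading", "item" or "field"
    key: str
    secondary: Optional[str]  # text inside the parentheses, if any
    value: str


ENTRY_RE = re.compile(
    r"^(?P<marker>#{1,6}\s+|[*+-]\s+)?"    # optional heading or bullet marker
    r"(?P<key>[^():]+?)\s*"                # key: no parentheses or colons
    r"(?:\((?P<secondary>[^)]*)\))?\s*"    # optional (secondary descriptor)
    r":\s*(?P<value>.*)$"                  # colon, then the value
)


def parse_entry(line: str) -> Optional[Entry]:
    m = ENTRY_RE.match(line.strip())
    if m is None:
        return None
    marker = (m.group("marker") or "").strip()
    kind = "heading" if marker.startswith("#") else "item" if marker else "field"
    return Entry(kind, m.group("key").strip(), m.group("secondary"), m.group("value").strip())
```

The weakness is false positives: any ordinary sentence containing a colon, such as “Note: we left early.”, also parses as a field.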


Extra example of textual resumes: http://media.wiley.com/Lux/assets/03/126203.08037X%20fg0401.pdf

JSON Resume looks like a cool project. It’s unfortunate that many companies still require CVs to be submitted as Word documents.

Could we just use YAML to represent the metadata (visual or not)? It seems like another syntax is being invented that represents essentially the same thing as YAML. YAML is already quite readable and complements Markdown well visually.

If the metadata is to be visual we should think about which HTML elements would be used to represent the data.

I think metadata in YAML and | format should be hidden. (By the way, my proposal is essentially YAML, but it avoids the --- fences to allow a more compact representation.)

I’m not too sure how metadata in visual form (Key(2nd value): Value, e.g. in standard lists, headers, etc.) should be represented in HTML. But this might give an idea: http://www.w3schools.com/tags/tag_meta.asp

Using that W3Schools example:

<meta charset="UTF-8">
<meta name="description" content="Free Web tutorials">
<meta name="keywords" content="HTML,CSS,XML,JavaScript">
<meta name="author" content="Hege Refsnes">

All of that information is in the page head, so I think using Jekyll-style front matter would be fine in that case:

---
charset: UTF-8
description: Free Web tutorials
keywords: HTML,CSS,XML,JavaScript
author: Hege Refsnes
---

The <meta> tag only ever goes in the <head> section of an HTML document. If you’re putting data in the body, it’s usually represented by a visible HTML element (unless the element is hidden by CSS).
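Going from that front matter to the <meta> tags in the W3Schools example is a direct mapping. A minimal Python sketch, assuming the front matter has already been read into a flat dict; special-casing charset is the only rule here, and generator-only keys such as Jekyll's layout would presumably be filtered out before this step.

```python
from html import escape
from typing import Dict


def meta_tags(front_matter: Dict[str, str]) -> str:
    """Render flat front matter as <meta> tags for the document <head>."""
    tags = []
    for key, value in front_matter.items():
        if key == "charset":
            # charset uses its own attribute rather than name/content
            tags.append(f'<meta charset="{escape(value)}">')
        else:
            tags.append(f'<meta name="{escape(key)}" content="{escape(value)}">')
    return "\n".join(tags)
```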

Definition lists can be used to display matching pairs; I’m not sure about secondary descriptors, though. Perhaps the secondary descriptor could just be another definition. Using your example:

<dl>
  <dt>bob</dt>
  <dd>cat</dd>
  <dd>barfs furballs</dd>
  <dt>george</dt>
  <dd>dog</dd>
  <dd>very lazy</dd>
</dl>

And the Markdown would be:

bob
: cat
: barfs furballs
george
: dog
: very lazy
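The conversion from that Markdown to the <dl> above is mechanical. A minimal Python sketch, assuming a line starting with : is always a definition of the most recent term and every other non-blank line starts a new term (the function name is hypothetical):

```python
from html import escape


def definition_list_to_html(text: str) -> str:
    """Render 'term' / ': definition' lines as an HTML <dl>."""
    out = ["<dl>"]
    have_term = False
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines carry no content here
        if line.startswith(":"):
            if not have_term:
                raise ValueError("definition before any term: " + line)
            out.append(f"  <dd>{escape(line[1:].strip())}</dd>")
        else:
            out.append(f"  <dt>{escape(line.strip())}</dt>")
            have_term = True
    out.append("</dl>")
    return "\n".join(out)
```

With this reading, the secondary descriptor (cat, dog) is simply the first <dd> of each term, as suggested above.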

By the way, I just noticed that AsciiDoc’s way of doing a document declaration is:

Writing Documentation using AsciiDoc
====================================
Joe Bloggs <jbloggs@mymail.com>
v2.0, February 2003:
Rewritten for version 2 release.

Perhaps we can auto-recognize YAML blocks under a header (as metadata) or at the start of a page (as a document declaration).

The first 3 lines are the document declaration for the whole page. There is also local metadata under the first header, “The beginnings of time”.

!CommonMark: 0.1.23-github.username.projectname
 Title:      Title for the top bar of any browser
 Date:       32-4-2002

==========================
The beginnings of time
==========================
Date_Edited:  24th of jan 2043 
Last_Edit_by: Burko Ruffo

In the beginnings there were only darkness. But then with a keystroke, there was light.

With the metadata placed in a div:

<div title="Title for the top bar of any browser" date="32-4-2002" >

 <section>

  <div class="metadata date_edited">24th of jan 2043</div>
  <div class="metadata last_edit_by">Burko Ruffo</div>

  <h1> The beginnings of time </h1>
  <p>
     In the beginnings there were only darkness. 
     But then with a keystroke, there was light.
  </p>
 </section>

</div>

Hmmm… the document declaration metadata would probably be wrapped in meta tags and placed at the top of the HTML page.
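A minimal Python sketch of recognising the document declaration at the start of a page, using the syntax from the example above: a first line !CommonMark: <version>, then indented Key: value lines up to the first blank line. These rules and the function name are assumptions; the returned dict is what would feed the <meta> tags.

```python
import re
from typing import Dict, Tuple

# Assumed syntax, taken from the example: "!CommonMark: <version>" on the
# first line, then indented "Key: value" lines, ended by a blank line.
DECLARATION_RE = re.compile(r"^!CommonMark:\s*(\S+)\s*$")
FIELD_RE = re.compile(r"^\s+([A-Za-z][\w ]*?)\s*:\s*(.*?)\s*$")


def parse_declaration(text: str) -> Tuple[Dict[str, str], str]:
    """Split a leading document declaration from the rest of the text."""
    lines = text.splitlines()
    m = DECLARATION_RE.match(lines[0]) if lines else None
    if m is None:
        return {}, text  # no declaration: the whole text is content
    declaration = {"CommonMark": m.group(1)}
    i = 1
    while i < len(lines):
        fm = FIELD_RE.match(lines[i])
        if fm is None:
            break  # a blank or unindented line ends the declaration
        declaration[fm.group(1)] = fm.group(2)
        i += 1
    return declaration, "\n".join(lines[i:]).lstrip("\n")
```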


This has the advantage of allowing the page to be sectioned by headers or rules, e.g. using rules to split a slideshow into slides.

Should we use <section>, or is a div good enough? For this example, I’ll use the section tag.

----
:id: slide1
:class: slidestyle
note: this is a test slide    

# slide title

normal text here

---

renders as

 <hr>

 <section id="slide1" class="slidestyle" >
  <div class="metadata note" >
   this is a test slide    
  </div>
  <h1> slide title </h1>
  <p>
     normal text here
  </p>
 </section>

 <hr>

Is this valid HTML?

You might be better off implementing something like this with consistent attribute syntax (or whatever is eventually decided for that).

Consistent attribute syntax is only for a single inline or block element, not for a section of elements delimited by a header or rule.


It probably should have a data- prefix, according to http://ejohn.org/blog/html-5-data-attributes/, so good catch; I’ll fix it now. So anything that is not a recognized HTML attribute gets a data- prefix.
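A minimal Python sketch of that data- rule, applied to section metadata such as the slide example's. Which attributes count as “recognized” is an assumption here (a small, non-exhaustive set of global attributes), keys are assumed to arrive without the surrounding colons used in :id:, and the function name is hypothetical.

```python
import re
from html import escape
from typing import Dict

# Illustrative, non-exhaustive set of attributes passed through unchanged.
KNOWN_ATTRIBUTES = {"id", "class", "title", "lang", "dir", "hidden"}


def section_open_tag(metadata: Dict[str, str]) -> str:
    """Build a <section> start tag; unknown keys become data-* attributes."""
    attrs = []
    for key, value in metadata.items():
        name = key.strip().lower()
        if name not in KNOWN_ATTRIBUTES:
            # e.g. "Last Edit by" -> "data-last-edit-by"
            name = "data-" + re.sub(r"\s+", "-", name)
        attrs.append(f' {name}="{escape(value)}"')
    return "<section" + "".join(attrs) + ">"
```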

In that case, I think the discussion on explicit sections is relevant. Whatever is decided there will likely be applicable to your example.

Aye, posted.

My initial thought for the HTML representation of metadata was to put it in attributes. But then I noticed that in resumes, people type address and contact details in the same format. So I switched to div tags, but that seems rather limiting.

I think the best option for representing metadata is the description list (http://www.w3schools.com/tags/tag_dl.asp). It was recommended by HTML5 Doctor. Incidentally, there is a discussion of description lists on this site too. For example:

<h1>Authorship</h1>
<dl class="metadata authorship" >
  <dt>Authors:</dt>
  <dd>Remy Sharp</dd>
  <dd>Rich Clark</dd>
  <dt>Editor:</dt>
  <dd>Brandan Lennox</dd>
  <dt>Category:</dt>
  <dd>Comment</dd>
</dl>

For key/value pairs, definition lists are suitable. See my example earlier in this topic.

Going by what you said about description lists:

==========================
The beginnings of time
==========================
Date Edited:  24th of jan 2043 
Last Edit by: Burko Ruffo

This will not be detected as a description list.

==========================
The beginnings of time
==========================
Date Edited
:  24th of jan 2043 
Last Edit by
: Burko Ruffo

But this one would. However, it is rather verbose line-wise, and most people type like the first example above. Plus, it’s not very YAML-like; I was aiming to stay as close to YAML syntax as possible. There needs to be another way… but alas, I’m out of ideas for today.

(edit: surely an exception could be made for metadata entries that are right after a header?)
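As a sketch of that exception, the Python function below turns the run of Key: value lines directly under a heading into a <dl class="metadata">, in the style of the HTML5 Doctor markup above. It assumes the caller has already located the heading and passes in the lines after it, and it requires a space after the colon so that URLs do not match; the name and rules are hypothetical.

```python
import re
from html import escape
from typing import List, Tuple

# "Key: value" with at least one space after the colon, so that
# "http://..." inside ordinary prose does not match.
FIELD_RE = re.compile(r"^([^:\s][^:]*?)\s*:\s+(.+?)\s*$")


def leading_fields_to_dl(lines: List[str]) -> Tuple[str, List[str]]:
    """Convert leading 'Key: value' lines into a <dl>.

    Returns (html, remaining_lines); html is "" if the first line is not a field."""
    pairs = []
    for line in lines:
        m = FIELD_RE.match(line)
        if m is None:
            break  # the first non-field line ends the metadata run
        pairs.append((m.group(1), m.group(2)))
    if not pairs:
        return "", lines
    body = "\n".join(
        f"  <dt>{escape(key)}</dt>\n  <dd>{escape(value)}</dd>" for key, value in pairs
    )
    return f'<dl class="metadata">\n{body}\n</dl>', lines[len(pairs):]
```

It still accepts an ordinary sentence such as “Note: this is prose.” as a field, which is exactly the problem raised in the reply below.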

The description list marker is similar to the other list markers (for ordered and unordered lists) in that it has to be at the start of the line to avoid clashes with the marker character mid-sentence. Very few lines would use a number+full stop combination, hyphen, asterisk, plus sign, or a colon at the start of a line, so it’s relatively safe to place them there. Even for lines that directly follow a heading there’s a good chance that the list marker characters will be used in the middle of the line. This would lead to all sorts of awkward character escaping which I think we can all agree is unappealing to look at.