Metadata in documents


#1

While others in the Document titles thread were discussing document titles, a side discussion on metadata standards emerged.

http://talk.commonmark.org/t/document-titles/649/21

A document title is metadata, but so is much other critical information, such as the author's name. So we will likely need a general way to express it.

This is especially useful for things like document and processor type declarations, e.g.

 | parser: commonmark core + storywriter
 | layout: resume

That would be handy if you want to select the right CSS for the type of document you are reading. E.g. a resume looks different from a film script.


Anyway, below is what I wrote in the other thread, moved here.


If metadata needs to be explicit, then this is a potential approach:

  1. | is a separator character that declares the rest of the line to be a key:value metadata statement.
  2. Multiple key:value pairs can be stacked on one line, but they must be separated by |.
    e.g. | key:value : a ; b_ : c | key1:value1 gives {"key":"value : a ; b_ : c", "key1":"value1" }
  3. It is scoped to the enclosing h1, h2, h3 (etc.) heading.

Pros: cleaner visuals; YAML-like.
Cons: you cannot simply copy and paste YAML, as you can with Jekyll's --- notation.

e.g.

                        My title  
| layout: post
| title: Blogging Like a Hacker

The primary theme of Porter’s[1] critique of dialectic desublimation is the difference between class and society. Several theories concerning not, in fact, discourse, but subdiscourse exist. Thus, the subject is interpolated into a Sartreist absurdity that includes culture as a reality.

source text: http://www.elsewhere.org/journal/pomo/
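For illustration, a minimal Python sketch of rules 1 and 2 above. The function name parse_pipe_line is hypothetical, and it assumes each |-separated segment is split on its first colon only, which is what keeps "value : a ; b_ : c" together as one value. A value that itself contains | is not supported.

```python
# Sketch only: parse_pipe_line is a hypothetical name, not an existing API.
def parse_pipe_line(line):
    """Parse one '| key:value | key2:value2' line into a dict.

    Each '|'-separated segment is split on its FIRST colon only, so any
    further colons stay in the value. Empty segments (e.g. from a trailing
    '|') are skipped.
    """
    meta = {}
    for segment in line.split("|"):
        segment = segment.strip()
        if not segment:
            continue
        key, _, value = segment.partition(":")
        meta[key.strip()] = value.strip()
    return meta


print(parse_pipe_line("| key:value : a ; b_ : c | key1:value1"))
# {'key': 'value : a ; b_ : c', 'key1': 'value1'}
```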


Hmmmm… if we assume that | marks metadata, then we can extend it to depend contextually on subsections, leading to this:

#                          My title                            #  
| layout: post
| title: Blogging Like a Hacker
| summary: How to blog like a hacker and win at life

... intro content...

##             subheading what to do?                          ##
| summary: You need to do this
| key1:value1
| key2:value2

... subcontent ...

##         subheading what to do? 2                         ##
| summary: You need to do this 2 | key1:value1 | key2:value2

... 2nd subcontent ...

So the metadata might not appear in the rendered output, but it would certainly be visible in the AST. This may allow smarter applications, such as code block metadata. (It should not be used for extension settings: that is not the purpose of |, but of the generic attribute syntax {}.)

```````````````````````````````````````````` {.python requiredPackages=numPy }
    print("hello world")
````````````````````````````````````````````
| author: burk dake | year: 2014 | licence: Public Domain |
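To make the "visible in the AST" idea a bit more concrete, here is a rough Python sketch that attaches each | line to the nearest preceding ATX heading. The function names and the flat list-of-sections output are assumptions, not part of any existing CommonMark parser; it does not handle the code-block case above, ignores fenced code in general, and would misread a table row that starts with |.

```python
import re

# Sketch only: names and output shape are hypothetical.
# ATX heading with an optional closing '#' run, e.g. "##  subheading  ##".
HEADING = re.compile(r"^(#{1,6})\s+(.*?)(?:\s+#+)?\s*$")


def parse_pipe_line(line):
    """Split '| k:v | k2:v2' on '|', then each segment on its first ':'."""
    meta = {}
    for segment in line.split("|"):
        segment = segment.strip()
        if segment:
            key, _, value = segment.partition(":")
            meta[key.strip()] = value.strip()
    return meta


def scoped_metadata(text):
    """Attach each '|' metadata line to the nearest preceding heading.

    Returns a flat list of sections; metadata before the first heading
    belongs to a level-0 'document' section.
    """
    sections = [{"level": 0, "title": None, "meta": {}}]
    for line in text.splitlines():
        heading = HEADING.match(line)
        if heading:
            sections.append({"level": len(heading.group(1)),
                             "title": heading.group(2),
                             "meta": {}})
        elif line.startswith("|"):
            sections[-1]["meta"].update(parse_pipe_line(line))
    return sections
```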

Need block-level metadata as well? Maybe in this form:

             Blogging Like a Hacker

||||||||||||||||||||||||||||||||||||||||||||||||||||
layout: post
title: Blogging Like a Hacker
summary: How to blog like a hacker and win at life
||||||||||||||||||||||||||||||||||||||||||||||||||||

The primary theme of Porter’s[1] critique of dialectic desublimation is the difference between class and society. Several theories concerning not, in fact, discourse, but subdiscourse exist. Thus, the subject is interpolated into a Sartreist absurdity that includes culture as a reality.

Block-level metadata could open up richer metadata structures, e.g. embedded vCards.

||||||||||||||||| Vcard: Forrest Gump |||||||||||||||||
BEGIN:VCARD
VERSION:2.1
N:Gump;Forrest
FN:Forrest Gump
ORG:Bubba Gump Shrimp Co.
TITLE:Shrimp Man
PHOTO;GIF:http://www.example.com/dir_photos/my_photo.gif
TEL;WORK;VOICE:(111) 555-1212
TEL;HOME;VOICE:(404) 555-1212
ADR;WORK:;;100 Waters Edge;Baytown;LA;30314;United States of America
LABEL;WORK;ENCODING=QUOTED-PRINTABLE:100 Waters Edge=0D=0ABaytown, LA 30314=0D=0AUnited States of America
ADR;HOME:;;42 Plantation St.;Baytown;LA;30314;United States of America
LABEL;HOME;ENCODING=QUOTED-PRINTABLE:42 Plantation St.=0D=0ABaytown, LA 30314=0D=0AUnited States of America
EMAIL;PREF;INTERNET:forrestgump@example.com
REV:20080424T195243Z
END:VCARD
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

If the parser cannot handle whatever is in the block, the metadata would simply be saved as a string (e.g. vCard syntax might not be readable by certain parsers).
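A possible Python sketch of that fallback, assuming a ||| fence whose label either names a type before a colon (as in "Vcard: Forrest Gump") or is just a caption. HANDLERS is a hypothetical registry of block types the parser understands; unknown types, and untyped bodies that are not simple key: value lines, are kept as raw strings.

```python
import re

# Sketch only: fence rules, names and HANDLERS are assumptions.
# Opening fence: 3+ '|' with an optional label, e.g.
# "||||| Vcard: Forrest Gump |||||". Closing fence: nothing but '|'.
OPEN = re.compile(r"^\|{3,}\s*(.*?)\s*\|*$")
CLOSE = re.compile(r"^\|{3,}\s*$")

# Hypothetical registry: block type -> function(body_lines) -> parsed value.
HANDLERS = {}


def parse_key_values(lines):
    """Parse 'key: value' lines; return None if any non-blank line lacks ':'."""
    meta = {}
    for line in lines:
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            return None
        meta[key.strip()] = value.strip()
    return meta


def parse_block_metadata(text):
    """Collect |||-fenced blocks, falling back to raw strings when needed."""
    blocks, body, label = [], None, ""
    for line in text.splitlines():
        if body is None:
            opening = OPEN.match(line)
            if opening:
                body, label = [], opening.group(1)
        elif CLOSE.match(line):
            kind, sep, caption = label.partition(":")
            if not sep:  # plain caption such as "Burk Drake", no type
                kind, caption = "", label
            kind = kind.strip().lower()
            raw = "\n".join(body)
            if kind:
                handler = HANDLERS.get(kind)
                # Unknown block type: keep the body as an opaque string.
                value = handler(body) if handler else raw
            else:
                parsed = parse_key_values(body)
                value = raw if parsed is None else parsed
            blocks.append({"type": kind, "caption": caption.strip(),
                           "value": value})
            body = None
        else:
            body.append(line)
    return blocks
```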

The only issue is how to selectively show and hide certain metadata. (E.g. I imagine you would sometimes want to show your vCard in the rendered view.)


#2

What's wrong with YAML sections between --- and ...? Pandoc supports them anywhere in the document if preceded by a blank line: http://johnmacfarlane.net/pandoc/demo/example9/pandocs-markdown.html. It's up to the Markdown processor to make use of these metadata blocks, for instance as you suggested. I'd simply reject introducing another syntax in place of these YAML blocks.
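For comparison, the Pandoc delimiting rules can be sketched in Python as below. The function name is hypothetical and the YAML is returned as raw text rather than parsed (a real implementation would hand it to a YAML library). The rules follow Pandoc's manual: the opening --- must not be followed by a blank line, the block closes with --- or ..., and it must be preceded by a blank line unless it is at the very start.

```python
# Sketch only: find_yaml_blocks is a hypothetical helper, not Pandoc code.
def find_yaml_blocks(text):
    """Return the raw YAML text of each Pandoc-style metadata block."""
    lines = text.splitlines()
    blocks = []
    i = 0
    while i < len(lines):
        opens = (
            lines[i].rstrip() == "---"
            and (i == 0 or not lines[i - 1].strip())   # blank line before
            and i + 1 < len(lines)
            and lines[i + 1].strip() != ""             # no blank line after
        )
        if opens:
            for j in range(i + 1, len(lines)):
                if lines[j].rstrip() in ("---", "..."):
                    blocks.append("\n".join(lines[i + 1:j]))
                    i = j
                    break
        i += 1
    return blocks
```

The blank-line requirement is what stops a --- underneath a paragraph (a setext heading underline) from being taken as metadata.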


#3

Note: I'm not discussing what goes inside the metadata; I think YAML is good enough. (But I do encourage supporting vCards somehow, most likely as an extension.) I'm mostly discussing how to visually separate metadata from the document.

I don't mind --- and ... for block metadata. However, for short metadata blocks they are overkill and take up too much visual space. | is more compact, and can be stacked to take up even less space.

Compare

                        My title  

| layout: post
| title: Blogging Like a Hacker

The primary theme of Porter’s[1] critique of dialectic desublimation is the difference between class and society. Several theories concerning not, in fact, discourse, but subdiscourse exist. Thus, the subject is interpolated into a Sartreist absurdity that includes culture as a reality.

with

                        My title  

-----------------------------------------------------------------
layout: post
title: Blogging Like a Hacker
.................................................................   

The primary theme of Porter’s[1] critique of dialectic desublimation is the difference between class and society. Several theories concerning not, in fact, discourse, but subdiscourse exist. Thus, the subject is interpolated into a Sartreist absurdity that includes culture as a reality.

My initial idea of |||||||| does indeed have a massive visual presence and should perhaps be discouraged (except maybe for metadata that is also displayed to the user, e.g. vCards).

However, for a short document declaration, block metadata takes up so much visual space that it detracts from the actual content. Each has its merits, but I think we need both a "block metadata" form and a "visually compact" form. Visually compact metadata is mostly hidden from users, while block metadata might be shown (perhaps with some sort of switch).


Hmmmmm… in the case of block metadata:

Since ||| has a bigger visual presence than --- and ..., could we use that difference to decide whether the metadata should be displayed or hidden?

For metadata that should be shown:

||||||| Burk Drake  |||||||
 name: Burk Drake
 telephone: (702) 932-6523
|||||||||||||||||||||||||||

And for metadata that should be hidden from normal view:

----------------
layout: post 
title: Blogging Like a Hacker
................
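A rough Python sketch of that rule, with hypothetical names: ||| fences are tagged "shown" and --- ... fences "hidden". It does not try to tell a --- fence apart from an ordinary horizontal rule, which a real parser would have to do.

```python
import re

# Sketch only: fence patterns and names are assumptions from this thread.
SHOWN_OPEN = re.compile(r"^\|{3,}.*$")     # may carry a caption
SHOWN_CLOSE = re.compile(r"^\|{3,}\s*$")
HIDDEN_OPEN = re.compile(r"^-{3,}\s*$")
HIDDEN_CLOSE = re.compile(r"^\.{3,}\s*$")


def classify_metadata_blocks(text):
    """Tag each fenced metadata block as 'shown' (||| fences) or
    'hidden' (--- ... fences), keeping the stripped body lines."""
    blocks, current, closer = [], None, None
    for line in text.splitlines():
        if current is None:
            if SHOWN_OPEN.match(line):
                current, closer = ("shown", []), SHOWN_CLOSE
            elif HIDDEN_OPEN.match(line):
                current, closer = ("hidden", []), HIDDEN_CLOSE
        elif closer.match(line):
            blocks.append(current)
            current = None
        else:
            current[1].append(line.strip())
    return blocks
```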

vCard as an extension

How do we support vCards? I guess that will just have to be an extension. But perhaps generic extensions could be allowed to output metadata as well. It might look like this:

!!!!!!!!!!!!!!!!!Vcard[Burk Drake's Vcard]
 begin:vcard 
 n:Burk;Drake
 tel;work:(702) 932-6523
 end:vcard
!!!!!!!!!!!!!!!!!

This would call a general directive function that outputs either a YAML vCard (like the example in the previous section) or a microformat hCard (depending on a setting).
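As a sketch of the hCard branch only, here is a Python function (hypothetical name) that turns a small vCard body into microformats hCard markup using the hCard class names vcard, n, given-name, family-name, fn, org, tel and email. It ignores property parameters such as ;work and every other vCard property. Note that vCard's N value is ordered Family;Given, so n:Burk;Drake is read with Burk as the family name.

```python
from html import escape


# Sketch only: vcard_to_hcard is hypothetical and handles a tiny subset.
def vcard_to_hcard(vcard_text):
    """Render a minimal vCard body as a microformats hCard <div>.

    Only N, FN, ORG, TEL and EMAIL are handled; parameters such as ';work'
    and all other properties (including BEGIN/END) are ignored.
    """
    parts = []
    for raw_line in vcard_text.splitlines():
        line = raw_line.strip()
        name_and_params, sep, value = line.partition(":")
        if not sep:
            continue
        name = name_and_params.split(";")[0].lower()
        if name == "n":
            # vCard N is Family;Given;Additional;Prefix;Suffix.
            family, given = (value.split(";") + ["", ""])[:2]
            parts.append('<span class="n"><span class="given-name">%s</span> '
                         '<span class="family-name">%s</span></span>'
                         % (escape(given), escape(family)))
        elif name in ("fn", "org", "tel", "email"):
            parts.append('<span class="%s">%s</span>' % (name, escape(value)))
    return '<div class="vcard">' + "".join(parts) + "</div>"
```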


#4

nanoc, plume and probably plenty of other static site generators seem to use the --- fence to mark metadata. I wouldn't use anything different.

Regarding vCard support, it surely fits the block-level extension syntax well, either

@@@vcard(caption)
...
@@@

or

!!!vcard(caption)
...

#5

Yeah, I agree that vCard would work as an extension within a generic directive framework via @ or !.


#6

Interesting ideas @mofosyne.

Regarding the use of ... as a terminator for metadata blocks, would it not be better to use the same start and end syntax? This would be consistent with the rest of Markdown, for example:

### This is a third level heading ###

or

```ruby
puts "Hello World"
```

With this in mind, I'm still in favour of the Jekyll syntax for placing metadata at the top of a file.

[quote="mofosyne, post:1, topic:721"]
| is a separator char that declares following line to be a key:value metadata statement
Can stack multiple key values pairing of metadata, but must be seperated by | .  e.g. | key:value : a ; b_ : c | key1:value1 gives {"key":"value : a ; b_ : c", "key1":"value1" }

It is contextually dependent on h1, h2, h3 headings etc...


pros: Cleaner visuals. YAML like
cons: Cannot simply copy paste YAML like the jekyll ----- notation.
[/quote]

This looks nice visually. If I understand correctly, this is for embedding metadata within a section of the document (i.e. not just at the top)? Is there a use case for this?

#7

I'm not sure I can think of a great one offhand. But things like copyright attribution, or a "Figure 1" label for code blocks, might be handy. Or making a resume more searchable. Or speech-synthesis metadata giving the exact pronunciation of hard-to-say words. Or citing your sources. Or a summary of the section.

This is one of those "build it and they will come" applications.


#8

The examples you mentioned sound like they might be displayed visually. Perhaps some are already covered by our discussion in the directives topic? Would they map to particular HTML elements?


#9

Well, I see | as the first character of a line as marking hidden metadata. As for block metadata, I see |||||| as displayed visually, while --- to ... is hidden. The reason is how visually strong each one is: |||| is very strong, so I thought it might signal that the user wants to display it visually. (You can see that in my examples.)


#10

For visual data/blocks, why not just use directives?


#11

Good point. I guess what I'm aiming for with "visual metadata" (as opposed to hidden metadata like the | example) is a CommonMark document that, if formatted correctly, can be read as easily as JSON (e.g. https://jsonresume.org/), but which looks as good as it is easy to parse.

E.g. formatting this text resume so it can easily be read as a JSON data structure: resume example in markdown


Approach for visual metadata

Hmmmm… I noticed that people often type lists like this:

# header title  (secondary descriptor): description
Loose Key (secondary descriptor): description

List name:
* item name1 (secondary descriptor): description
* item name2 (secondary descriptor): description
* item name3 (secondary descriptor): description

e.g.

# About Animals (Year:1986) : You know you want to know more!   
Written By (Author): Greg
Publisher: Burkank

Animals:
* bob (cat) : barfs furballs
* george (dog) : very lazy
* alex (cat) : likes birds

The common thread is a "Key (2nd value): Value" structure, like YAML,
or a "Key (2nd value) - Value" variant.

Perhaps we can use that?
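A Python sketch of how such "Key (2nd value): Value" lines might be picked up, assuming the # heading, loose-key and list-item shapes from the example above. The regex and output layout are hypothetical; the "Key - Value" variant is not handled, and any prose line containing a colon would be misread as metadata.

```python
import re

# Sketch only: pattern and output layout are assumptions.
# "key (secondary descriptor): description"; the descriptor is optional.
ENTRY = re.compile(
    r"^(?P<key>[^(:]+?)\s*"
    r"(?:\((?P<note>[^)]*)\)\s*)?"
    r":\s*(?P<value>.*?)\s*$"
)


def parse_entry(text):
    match = ENTRY.match(text.strip())
    if match is None:
        return None
    return {"key": match["key"], "note": match["note"], "value": match["value"]}


def parse_visual_metadata(text):
    """Collect the heading entry, loose 'Key: Value' fields, and lists
    introduced by a bare 'Name:' line followed by '* item' lines."""
    doc = {"heading": None, "fields": [], "lists": {}}
    current_list = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("# "):
            doc["heading"] = parse_entry(stripped[2:])
        elif stripped.startswith("* ") and current_list is not None:
            entry = parse_entry(stripped[2:])
            if entry:
                current_list.append(entry)
        else:
            entry = parse_entry(stripped)
            if entry is None:
                continue
            if entry["value"]:
                doc["fields"].append(entry)
            else:  # "Animals:" with no value opens a list
                current_list = doc["lists"].setdefault(entry["key"], [])
    return doc
```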


Another example of a text resume: http://media.wiley.com/Lux/assets/03/126203.08037X%20fg0401.pdf


#12

JSON Resume looks like a cool project. It’s unfortunate that many companies still require CVs to be submitted as Word documents.

Could we just use YAML to represent the metadata (visual or not)? It seems like another syntax is being invented to represent essentially the same thing as YAML. YAML is already quite readable and complements Markdown well visually.

If the metadata is to be visual we should think about which HTML elements would be used to represent the data.


#13

I think metadata in YAML and | format should be hidden. (By the way, my proposal is essentially YAML, but it avoids the --- to keep the representation compact.)

I'm not too sure how metadata in visual form (Key (2nd value): Value, e.g. in standard lists, headers, etc.) should be represented in HTML. But this might give an idea: http://www.w3schools.com/tags/tag_meta.asp


#14

Using that W3Schools example:

<meta charset="UTF-8">
<meta name="description" content="Free Web tutorials">
<meta name="keywords" content="HTML,CSS,XML,JavaScript">
<meta name="author" content="Hege Refsnes">

All of that information is in the page head, so I think using Jekyll-style front matter would be fine in that case:

---
charset: UTF-8
description: Free Web tutorials
keywords: HTML,CSS,XML,JavaScript
author: Hege Refsnes
---

With name or charset attributes, the <meta> tag belongs in the <head> section of an HTML document. If you're putting data in the body, it's usually represented by a visible HTML element (unless the element is hidden by CSS).
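A minimal Python sketch of going from that front matter back to the <meta> tags, assuming flat key: value front matter. The function name is hypothetical, and a real implementation would parse the block with a YAML library; charset gets its own <meta charset> form, everything else becomes <meta name=... content=...>.

```python
from html import escape


# Sketch only: front_matter_to_meta is a hypothetical helper.
def front_matter_to_meta(text):
    """Turn flat Jekyll-style front matter into <meta> tags for <head>."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    tags = []
    for line in lines[1:]:
        if line.strip() == "---":  # end of the front matter block
            break
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "charset":
            tags.append('<meta charset="%s">' % escape(value))
        else:
            tags.append('<meta name="%s" content="%s">'
                        % (escape(key), escape(value)))
    return tags
```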

Definition lists can be used to display matching pairs; I'm not sure about secondary descriptors, though. Perhaps the secondary descriptor could just be another definition. Using your example:

<dl>
  <dt>bob</dt>
  <dd>cat</dd>
  <dd>barfs furballs</dd>
  <dt>george</dt>
  <dd>dog</dd>
  <dd>very lazy</dd>
</dl>

And the Markdown would be:

bob
: cat
: barfs furballs
george
: dog
: very lazy
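This term / : definition syntax is roughly what PHP Markdown Extra and Pandoc accept; CommonMark itself has no definition lists. A minimal Python sketch of the conversion, with a hypothetical function name and only one line per term or definition (no multi-paragraph definitions):

```python
from html import escape


# Sketch only: definition_list_to_html is hypothetical and line-based.
def definition_list_to_html(text):
    """Convert 'term' / ': definition' lines into a <dl> element.

    A line starting with ':' adds a <dd> to the current term; any other
    non-blank line starts a new <dt>.
    """
    out = ["<dl>"]
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith(":"):
            out.append("  <dd>%s</dd>" % escape(stripped[1:].strip()))
        else:
            out.append("  <dt>%s</dt>" % escape(stripped))
    out.append("</dl>")
    return "\n".join(out)
```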

#15

By the way, I just noticed that AsciiDoc's way of doing a document declaration is:

Writing Documentation using AsciiDoc
====================================
Joe Bloggs <jbloggs@mymail.com>
v2.0, February 2003:
Rewritten for version 2 release.

Perhaps we can automatically recognize YAML blocks under a header (as metadata) or at the start of a page (as a document declaration).

The first 3 lines are the document declaration for the whole page. There is also local metadata under the first header, "The beginnings of time".

!CommonMark: 0.1.23-github.username.projectname
 Title:      Title for the top bar of any browser
 Date:       32-4-2002

==========================
The beginnings of time
==========================
Date_Edited:  24th of jan 2043 
Last_Edit_by: Burko Ruffo

In the beginnings there were only darkness. But then with a keystroke, there was light.
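A rough Python sketch of recognizing both parts of this example, with hypothetical names. It assumes the !CommonMark: line starts the file, that the declaration and each section's metadata are key: value lines ending at the first blank line, and that headings use the ===/title/=== flowerbox form shown here.

```python
import re

# Sketch only: names and recognition rules are assumptions from the example.
RULE = re.compile(r"^={3,}\s*$")
KEY_VALUE = re.compile(r"^\s*([A-Za-z_][\w-]*)\s*:\s*(.*?)\s*$")


def parse_page(text):
    """Return (declaration, sections) for the !CommonMark example format."""
    lines = text.splitlines()
    declaration, sections, i = {}, [], 0

    def read_key_values(start, target):
        """Copy key: value lines into target until a blank or other line."""
        j = start
        while j < len(lines) and lines[j].strip():
            match = KEY_VALUE.match(lines[j])
            if match is None:
                break
            target[match.group(1)] = match.group(2)
            j += 1
        return j

    # Page-level declaration: '!CommonMark: <version>' plus following lines.
    if lines and lines[0].startswith("!CommonMark:"):
        declaration["CommonMark"] = lines[0].split(":", 1)[1].strip()
        i = read_key_values(1, declaration)

    # Flowerbox headings (=== / title / ===) with local metadata beneath.
    while i < len(lines):
        if (RULE.match(lines[i]) and i + 2 < len(lines)
                and RULE.match(lines[i + 2])):
            section = {"title": lines[i + 1].strip(), "meta": {}}
            i = read_key_values(i + 3, section["meta"])
            sections.append(section)
        else:
            i += 1
    return declaration, sections
```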

Metadata placed in divs:

<div title="Title for the top bar of any browser" date="32-4-2002" >

 <section>

  <div class="metadata date_edited">24th of jan 2043</div>
  <div class="metadata last_edit_by">Burko Ruffo</div>

  <h1> The beginnings of time </h1>
  <p>
     In the beginnings there were only darkness. 
     But then with a keystroke, there was light.
  </p>
 </section>

</div>

Hmmm… the document declaration metadata would probably be encased in meta tag and placed on top of html page.


This has the advantage of allowing the page to be divided into sections by headers or rules, e.g. using rules to separate the slides of a slideshow.

Should we use <section>, or is <div> good enough? For this example, I'll use the <section> tag.

----
:id: slide1
:class: slidestyle
note: this is a test slide    

# slide title

normal text here

---

renders as:

 <hr>

 <section id="slide1" class="slidestyle" >
  <div class="metadata note" >
   this is a test slide    
  </div>
  <h1> slide title </h1>
  <p>
     normal text here
  </p>
 </section>

 <hr>
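A Python sketch of the sectioning step, assuming sections are delimited only by hyphen rules (---), that leading :name: lines become HTML attributes and that leading key: lines become metadata divs. The names are hypothetical; the remaining Markdown is left for a real CommonMark renderer, and a --- that is really a setext heading underline would be misread.

```python
import re
from html import escape

# Sketch only: delimiters, names and output are assumptions from the example.
RULE = re.compile(r"^-{3,}\s*$")
ATTR = re.compile(r"^:(\w+):\s*(.*?)\s*$")   # ':id: slide1'
META = re.compile(r"^(\w+):\s*(.*?)\s*$")    # 'note: this is a test slide'


def split_sections(text):
    """Split on hyphen rules; leading ':attr:' and 'key:' lines of each
    section become attributes and metadata, the rest stays as Markdown."""
    chunks, current = [], []
    for line in text.splitlines():
        if RULE.match(line):
            chunks.append(current)
            current = []
        else:
            current.append(line)
    chunks.append(current)

    parsed = []
    for lines in chunks:
        attrs, meta, i = {}, {}, 0
        while i < len(lines):
            attr, field = ATTR.match(lines[i]), META.match(lines[i])
            if attr:
                attrs[attr.group(1)] = attr.group(2)
            elif field:
                meta[field.group(1)] = field.group(2)
            else:
                break
            i += 1
        body = "\n".join(lines[i:]).strip()
        if attrs or meta or body:  # skip empty chunks around the rules
            parsed.append({"attrs": attrs, "meta": meta, "body": body})
    return parsed


def open_section(section):
    """Render the opening <section> tag and its metadata <div>s."""
    attr_text = "".join(' %s="%s"' % (k, escape(v))
                        for k, v in section["attrs"].items())
    divs = ['  <div class="metadata %s">%s</div>' % (k, escape(v))
            for k, v in section["meta"].items()]
    return "\n".join(["<section%s>" % attr_text] + divs)
```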

#16

Is this valid HTML?

You might be better off implementing something like this with consistent attribute syntax (or whatever is eventually decided for that).


#17

Consistent attribute syntax is only for a single inline or block element, not for a section of elements delimited by a header or a rule.


It probably should use a data- prefix, according to http://ejohn.org/blog/html-5-data-attributes/, so good catch. Will fix now. So any attribute that is not a recognized HTML attribute gets a data- prefix.
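A Python sketch of that rule. KNOWN_ATTRIBUTES is a small illustrative subset of HTML global attributes, not the full list, and the function names are hypothetical; names are lowercased because data-* attribute names may not contain ASCII uppercase letters.

```python
from html import escape

# Sketch only: an illustrative subset of HTML global attributes.
KNOWN_ATTRIBUTES = {"id", "class", "title", "lang", "dir", "hidden", "style"}


def html_attributes(meta):
    """Keep recognised attribute names; prefix everything else with 'data-'.

    Names are lowercased, since data-* names may not contain ASCII capitals.
    """
    attrs = {}
    for key, value in meta.items():
        name = key.lower()
        attrs[name if name in KNOWN_ATTRIBUTES else "data-" + name] = value
    return attrs


def render_attributes(attrs):
    """Serialize attributes as ' name="value"' pairs, escaping values."""
    return "".join(' %s="%s"' % (name, escape(value))
                   for name, value in attrs.items())
```

Applied to the earlier declaration, title stays as is while date becomes data-date.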


#18

In that case, I think the discussion on explicit sections is relevant. Whatever is decided there will likely be applicable to your example.


#19

Aye, posted.

My initial thought for the HTML representation of metadata was to put it in attributes. But then I noticed that in resumes, people type addresses and contact details in the same format. So I switched to div tags, but that seems rather limiting.

I think the best option for representing metadata is the description list (http://www.w3schools.com/tags/tag_dl.asp). It was recommended by HTML5 Doctor. Incidentally, there is a discussion of description lists here on this site, e.g.

<h1>Authorship</h1>
<dl class="metadata authorship" >
  <dt>Authors:</dt>
  <dd>Remy Sharp</dd>
  <dd>Rich Clark</dd>
  <dt>Editor:</dt>
  <dd>Brandan Lennox</dd>
  <dt>Category:</dt>
  <dd>Comment</dd>
</dl>
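Generating that markup from collected metadata is straightforward. A minimal Python sketch, with a hypothetical function name, assuming each label maps to a list of values:

```python
from html import escape


# Sketch only: metadata_to_dl is a hypothetical helper.
def metadata_to_dl(name, fields):
    """Render {label: [values]} as a description list, one <dt> per label
    and one <dd> per value, in the style of the example above."""
    lines = ['<dl class="metadata %s">' % escape(name)]
    for label, values in fields.items():
        lines.append("  <dt>%s:</dt>" % escape(label))
        lines.extend("  <dd>%s</dd>" % escape(v) for v in values)
    lines.append("</dl>")
    return "\n".join(lines)
```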

#20

For key/value pairs, definition lists are suitable. See my example earlier in this topic.