Metadata in documents


#10

For visual data/blocks, why not just use directives?


#11

Good point. By “visual metadata” (as opposed to hidden metadata, as in the | example) I mean a CommonMark document that, if formatted correctly, can be read as JSON just as easily, e.g. https://jsonresume.org/, but which looks as good as it is easy to parse.

E.g. formatting this plain-text resume so that it can easily be read as a JSON data structure: resume example in markdown


Approach for visual metadata

Hmmm… I noticed that people often type lists like this:

# header title  (secondary descriptor): description
Loose Key (secondary descriptor): description

List name:
* item name1 (secondary descriptor): description
* item name2 (secondary descriptor): description
* item name3 (secondary descriptor): description

e.g.

# About Animals (Year:1986) : You know you want to know more!   
Written By (Author): Greg
Publisher: Burkank

Animals:
* bob (cat) : barfs furballs
* george (dog) : very lazy
* alex (cat) : likes birds

The common pattern is a “Key(2nd value): Value” structure, like YAML,
or “Key(2nd value) - Value” as used in this example.

Perhaps we can use that?
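A minimal Python sketch of how that convention could be turned into JSON-ready data. The regex, the function name `parse_visual_metadata`, and the output shape (a `title` entry, one entry per loose key, a list per `List name:`) are assumptions for illustration, not an agreed format; it accepts either `:` or ` - ` as the separator and treats the parenthesised part as an optional secondary descriptor.

```python
import json
import re

# Hypothetical grammar for the informal layout above:
#   Key (optional secondary descriptor) <":" or " - "> value
LINE = re.compile(
    r"^(?P<key>[^(:]+?)\s*(?:\((?P<secondary>[^)]*)\))?\s*(?::|\s-\s)\s*(?P<value>.*?)\s*$"
)


def parse_visual_metadata(text: str) -> dict:
    """Map a heading, loose keys and '* item' lists to a JSON-ready dict."""
    result: dict = {}
    current_list = None  # list that following '* item' lines are appended to
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        is_heading = line.startswith("#")
        is_item = line.startswith(("* ", "- "))
        if is_heading:
            line = line.lstrip("#").strip()
        elif is_item:
            line = line[2:].strip()
        match = LINE.match(line)
        if match is None:
            continue  # ordinary prose, not metadata
        name = match["key"].strip()
        entry = {"secondary": match["secondary"], "value": match["value"]}
        if is_heading:
            result["title"] = {"name": name, **entry}
            current_list = None
        elif is_item and current_list is not None:
            current_list.append({"name": name, **entry})
        elif not match["value"] and match["secondary"] is None:
            # "List name:" with nothing after the colon opens a new list
            current_list = result.setdefault(name, [])
        else:
            result[name] = entry
            current_list = None
    return result


doc = """# About Animals (Year:1986) : You know you want to know more!
Written By (Author): Greg
Publisher: Burkank

Animals:
* bob (cat) : barfs furballs
* george (dog) : very lazy
* alex (cat) : likes birds
"""
print(json.dumps(parse_visual_metadata(doc), indent=2))
```

Prose lines without a colon are skipped, but a prose line that happens to contain one (a bare URL, say) would still be picked up, so a real implementation would need a stricter trigger.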


Extra example of textual resumes: http://media.wiley.com/Lux/assets/03/126203.08037X%20fg0401.pdf


#12

JSON Resume looks like a cool project. It’s unfortunate that many companies still require CVs to be submitted as Word documents.

Could we just use YAML to represent the metadata (visual or not)? It seems like another syntax is being invented that represents essentially the same thing as YAML. YAML is already quite readable and complements Markdown well visually.

If the metadata is to be visual we should think about which HTML elements would be used to represent the data.


#13

I think metadata in YAML and | format should be hidden. (By the way, my proposal is essentially YAML, but it avoids the --- delimiters to keep the representation compact.)

I’m not too sure how metadata in visual form (Key(2nd value): Value, e.g. in standard lists, headers, etc.) should be represented in HTML. But this might give an idea: http://www.w3schools.com/tags/tag_meta.asp


#14

Using that W3Schools example:

<meta charset="UTF-8">
<meta name="description" content="Free Web tutorials">
<meta name="keywords" content="HTML,CSS,XML,JavaScript">
<meta name="author" content="Hege Refsnes">

All of that information is in the page head, so I think using Jekyll-style front matter would be fine in that case:

---
charset: UTF-8
description: Free Web tutorials
keywords: HTML,CSS,XML,JavaScript
author: Hege Refsnes
---

The <meta> tag only ever goes in the <head> section of an HTML document. If you’re putting data in the body, it’s usually represented by a visible HTML element (unless the element is hidden by CSS).
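To make the front-matter-to-head mapping concrete, here is a small Python sketch that turns already-parsed front matter (assumed to be a flat dict of strings; `meta_tags` is a hypothetical helper) back into the W3Schools tags. Treating `charset` specially mirrors the `<meta charset>` form in that example; every other key becomes a `name`/`content` pair, with values HTML-escaped.

```python
from html import escape


def meta_tags(front_matter: dict) -> str:
    """Render flat front-matter pairs as <meta> tags for the document <head>."""
    tags = []
    for key, value in front_matter.items():
        if key == "charset":
            # charset uses its own attribute rather than name/content
            tags.append(f'<meta charset="{escape(value)}">')
        else:
            tags.append(f'<meta name="{escape(key)}" content="{escape(value)}">')
    return "\n".join(tags)


print(meta_tags({
    "charset": "UTF-8",
    "description": "Free Web tutorials",
    "keywords": "HTML,CSS,XML,JavaScript",
    "author": "Hege Refsnes",
}))
```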

Definition lists can be used to display matching pairs, though I’m not sure about secondary descriptors. Perhaps the secondary descriptor could just be another definition. Using your example:

<dl>
  <dt>bob</dt>
  <dd>cat</dd>
  <dd>barfs furballs</dd>
  <dt>george</dt>
  <dd>dog</dd>
  <dd>very lazy</dd>
</dl>

And the Markdown would be:

bob
: cat
: barfs furballs
george
: dog
: very lazy

#15

By the way, I just noticed that AsciiDoc’s way of declaring a document header is:

Writing Documentation using AsciiDoc
====================================
Joe Bloggs <jbloggs@mymail.com>
v2.0, February 2003:
Rewritten for version 2 release.

Perhaps we can automatically recognize YAML blocks under a header (as metadata) or at the start of a page (as a document declaration).

The first three lines are the document declaration for the whole page. There is also local metadata under the first header, “The beginnings of time”:

!CommonMark: 0.1.23-github.username.projectname
 Title:      Title for the top bar of any browser
 Date:       32-4-2002

==========================
The beginnings of time
==========================
Date_Edited:  24th of jan 2043 
Last_Edit_by: Burko Ruffo

In the beginnings there were only darkness. But then with a keystroke, there was light.
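A rough Python sketch of that auto-recognition, under assumptions that are not part of the proposal itself: the declaration is a `!Format: version` line followed by indented `Key: value` lines, a "flowerbox" header is a title between two `=` rules, and any consecutive `Key: value` lines directly under such a header are that section's metadata. `DECL`, `PAIR`, `RULE`, `read_pairs` and `parse_document` are hypothetical names.

```python
import re

DECL = re.compile(r"^!(\w+):\s*(.*?)\s*$")                     # "!CommonMark: 0.1.23-..."
PAIR = re.compile(r"^\s*([A-Za-z_][\w ]*?)\s*:\s*(.*?)\s*$")   # "Key: value"
RULE = re.compile(r"^=+\s*$")                                   # "=====" flowerbox line


def read_pairs(lines: list, start: int):
    """Collect consecutive 'Key: value' lines; stop at a blank or non-pair line."""
    pairs, i = {}, start
    while i < len(lines):
        match = PAIR.match(lines[i])
        if match is None:
            break
        pairs[match.group(1)] = match.group(2)
        i += 1
    return pairs, i


def parse_document(text: str) -> dict:
    lines = text.splitlines()
    doc = {"declaration": {}, "sections": []}
    i = 0
    decl = DECL.match(lines[0]) if lines else None
    if decl:
        # document declaration: "!Format: version" plus the pairs directly below it
        doc["declaration"] = {"format": decl.group(1), "version": decl.group(2)}
        pairs, i = read_pairs(lines, 1)
        doc["declaration"].update(pairs)
    while i < len(lines):
        is_flowerbox = (
            RULE.match(lines[i]) and i + 2 < len(lines)
            and lines[i + 1].strip() and RULE.match(lines[i + 2])
        )
        if is_flowerbox:
            title = lines[i + 1].strip()
            # pairs right under the header become that section's local metadata
            meta, i = read_pairs(lines, i + 3)
            doc["sections"].append({"title": title, "metadata": meta})
        else:
            i += 1
    return doc


sample = """!CommonMark: 0.1.23-github.username.projectname
 Title:      Title for the top bar of any browser
 Date:       32-4-2002

==========================
The beginnings of time
==========================
Date_Edited:  24th of jan 2043
Last_Edit_by: Burko Ruffo

In the beginnings there were only darkness."""
print(parse_document(sample))
```

Because detection is purely pattern-based, an ordinary first paragraph such as `Note: bring snacks` placed right under a header would also be read as metadata.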

Metadata placed in a div:

<div title="Title for the top bar of any browser" date="32-4-2002" >

 <section>

  <div style="metadata date_edited">24th of jan 2043</div>
  <div style="metadata last_edit_by">Burko Ruffo</div>

  <h1> The beginnings of time </h1>
  <p>
     In the beginnings there were only darkness. 
     But then with a keystroke, there was light.
  </p>
 </section>

</div>

Hmmm… the document declaration metadata would probably be wrapped in <meta> tags and placed at the top of the HTML page.


This has the advantage of allowing the page to be sectioned by headers or rules, e.g. using rules to separate the slides of a slideshow.

Should we use <section>, or is a div good enough? For this example, I’ll use the section tag.

----
:id: slide1
:class: slidestyle
note: this is a test slide    

# slide title

normal text here

---

renders as

 <hr>

 <section id="slide1" style="slidestyle" >
  <div class="metadata note" >
   this is a test slide    
  </div>
  <h1> slide title </h1>
  <p>
     normal text here
  </p>
 </section>

 <hr>
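A Python sketch of the rule-based sectioning, assuming a line of three or more hyphens separates sections, leading `:key: value` lines become attributes of the `<section>`, and leading `key: value` lines become metadata. It emits `:class:` as a `class` attribute (the example HTML above puts it in `style`, which expects CSS declarations, so `class` is assumed to be the intent). The Markdown body is returned unrendered; `split_sections`, `parse_chunk` and `open_tag` are hypothetical names.

```python
import re
from html import escape

RULE = re.compile(r"^-{3,}\s*$")            # "---" or "----" on its own line
ATTR = re.compile(r"^:(\w+):\s*(.*?)\s*$")  # ":id: slide1"
META = re.compile(r"^(\w+):\s*(.*?)\s*$")   # "note: this is a test slide"


def parse_chunk(lines: list) -> dict:
    """Pull leading attribute/metadata lines off a section; the rest is body."""
    attrs, meta, i = {}, {}, 0
    while i < len(lines):
        if m := ATTR.match(lines[i]):
            attrs[m[1]] = m[2]
        elif m := META.match(lines[i]):
            meta[m[1]] = m[2]
        else:
            break
        i += 1
    return {"attrs": attrs, "metadata": meta, "body": "\n".join(lines[i:]).strip()}


def split_sections(text: str) -> list:
    """Split a document on horizontal rules into section dicts."""
    sections, chunk = [], []
    for line in text.splitlines() + ["---"]:  # sentinel rule flushes the last chunk
        if RULE.match(line):
            if any(ln.strip() for ln in chunk):
                sections.append(parse_chunk(chunk))
            chunk = []
        else:
            chunk.append(line)
    return sections


def open_tag(section: dict) -> str:
    """Build the opening <section> tag from the section's attributes."""
    attrs = "".join(f' {k}="{escape(v)}"' for k, v in section["attrs"].items())
    return f"<section{attrs}>"


slide = """----
:id: slide1
:class: slidestyle
note: this is a test slide

# slide title

normal text here

---"""
for section in split_sections(slide):
    print(open_tag(section), section["metadata"], repr(section["body"]))
```

One caveat for a real implementation: in CommonMark a `---` line directly under a paragraph is a setext heading underline rather than a thematic break, so splitting on every `---` line the way this sketch does would clash with existing documents.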

#16

Is this valid HTML?

You might be better off implementing something like this with consistent attribute syntax (or whatever is eventually decided for that).


#17

Consistent attribute syntax only applies to a single inline or block element, not to a section of elements separated by either a header or a rule.


It probably should have the data- prefix, according to http://ejohn.org/blog/html-5-data-attributes/, so good catch; I’ll fix it now. Anything that is not a recognized HTML attribute will be prefixed with data-.
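A sketch of that prefixing rule in Python. `KNOWN_ATTRIBUTES` is a hypothetical, deliberately short stand-in for the full list of HTML attributes, and lower-casing plus replacing whitespace with hyphens is an assumption made because HTML requires custom data attribute names to contain no ASCII uppercase letters, and attribute names cannot contain spaces.

```python
import re

# Hypothetical, deliberately incomplete whitelist of global HTML attributes.
KNOWN_ATTRIBUTES = {"id", "class", "title", "lang", "dir", "hidden", "style"}


def to_html_attributes(metadata: dict) -> dict:
    """Keep recognized attribute names; prefix everything else with 'data-'."""
    out = {}
    for key, value in metadata.items():
        # data-* names may not contain ASCII uppercase, and spaces are not allowed
        name = re.sub(r"\s+", "-", key.strip().lower())
        out[name if name in KNOWN_ATTRIBUTES else "data-" + name] = value
    return out


print(to_html_attributes({
    "title": "Title for the top bar of any browser",
    "date": "32-4-2002",
    "Date Edited": "24th of jan 2043",
}))
```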


#18

In that case, I think the discussion on explicit sections is relevant. Whatever is decided there will likely be applicable to your example.


#19

Aye, posted.

My initial thought for the HTML representation of metadata was to put it in attributes. But then I noticed that in resumes, people type addresses and contact details in the same format, so I switched to div tags, but that seems rather limiting.

I think the best option for representing metadata is the description list (http://www.w3schools.com/tags/tag_dl.asp). It was recommended by HTML5 Doctor. Incidentally, there is a discussion of description lists on this site as well. E.g.:

<h1>Authorship</h1>
<dl class="metadata authorship" >
  <dt>Authors:</dt>
  <dd>Remy Sharp</dd>
  <dd>Rich Clark</dd>
  <dt>Editor:</dt>
  <dd>Brandan Lennox</dd>
  <dt>Category:</dt>
  <dd>Comment</dd>
</dl>

#20

For key/value pairs, definition lists are suitable. See my example earlier in this topic.


#21

Going by what you said about description lists:

==========================
The beginnings of time
==========================
Date Edited:  24th of jan 2043 
Last Edit by: Burko Ruffo

This will not be detected as a description list.

==========================
The beginnings of time
==========================
Date Edited
:  24th of jan 2043 
Last Edit by
: Burko Ruffo

But this one would. However, it is rather verbose line-wise, and most people type like the first example above. Plus, it’s not very YAML-like. I was aiming to stick to YAML syntax (or as close to it as possible). There needs to be another way… but alas, I’m out of ideas for today.

(edit: surely an exception could be made for metadata entries that are right after a header?)


#22

The description list marker is similar to the other list markers (for ordered and unordered lists) in that it has to be at the start of the line to avoid clashes with the marker character mid-sentence. Very few lines start with a number followed by a full stop, a hyphen, an asterisk, a plus sign, or a colon, so it’s relatively safe to place markers there. Even for lines that directly follow a heading, there’s a good chance that the list marker characters will be used in the middle of the line. This would lead to all sorts of awkward character escaping, which I think we can all agree is unappealing to look at.


#23

How about just keeping metadata in the generic block directive syntax instead? Who knows, maybe one day we won’t be using YAML (or YAML will have a newer version), etc.

!YAML: v1.2
::::::::::::::::::::::::::::::::::::::::::::::::::::
--- !clarkevans.com/^invoice
invoice: 34843
date   : 2001-01-23
bill-to: &id001
    given  : Chris
    family : Dumars
    address:
        lines: |
            458 Walkman Dr.
            Suite #292
        city    : Royal Oak
        state   : MI
        postal  : 48046
ship-to: *id001
product:
    - sku         : BL394D
      quantity    : 4
      description : Basketball
      price       : 450.00
    - sku         : BL4438H
      quantity    : 1
      description : Super Hoop
      price       : 2392.00
tax  : 251.42
total: 4443.52
comments: >
    Late afternoon is best.
    Backup contact is Nancy
    Billsmer @ 338-4338.
::::::::::::::::::::::::::::::::::::::::::::::::::::

or for JSON:

!JSON:
::::::::::::::::::::::::::::::::::::::::::::::::::::
{"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}
::::::::::::::::::::::::::::::::::::::::::::::::::::::
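A Python sketch of how a parser could pick these blocks out and hand each body to the matching format parser. It assumes the header is `!NAME: version` (version optional), the fence is three or more colons, and, borrowing CommonMark's rule for fenced code, that the closing fence must be at least as long as the opening one (which also accepts the slightly longer closing fence in the JSON example). Only JSON is decoded here, since Python's standard library has no YAML parser; `parse_directives` and `load` are hypothetical names.

```python
import json
import re

HEADER = re.compile(r"^!(?P<name>\w+):\s*(?P<version>\S*)\s*$")  # "!YAML: v1.2"
FENCE = re.compile(r"^(:{3,})\s*$")                              # ":::..." line


def parse_directives(text: str) -> list:
    """Find '!NAME: version' headers followed by a ':::'-fenced body."""
    lines = text.splitlines()
    blocks, i = [], 0
    while i + 1 < len(lines):
        header, fence = HEADER.match(lines[i]), FENCE.match(lines[i + 1])
        if not (header and fence):
            i += 1
            continue
        width = len(fence.group(1))
        body, j = [], i + 2
        # closing fence: at least as many colons as the opening one
        while j < len(lines):
            close = FENCE.match(lines[j])
            if close and len(close.group(1)) >= width:
                break
            body.append(lines[j])
            j += 1
        blocks.append({"name": header["name"], "version": header["version"],
                       "body": "\n".join(body)})
        i = j + 1
    return blocks


def load(block: dict):
    """Decode bodies we have a stdlib parser for; hand everything else back raw."""
    if block["name"].upper() == "JSON":
        return json.loads(block["body"])
    return block["body"]  # e.g. YAML would need a third-party parser


sample = """!JSON:
::::::
{"menu": {"id": "file", "value": "File"}}
::::::::

!YAML: v1.2
::::::
invoice: 34843
::::::"""
for block in parse_directives(sample):
    print(block["name"], block["version"] or "-", load(block))
```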

#24

I’d like to warn about using YAML. Though YAML is very nice for humans, it’s very difficult to implement correctly. Putting it into the spec “as is” will be a pain in the ass for client-side parsers, mostly because of their large size. I say this as the author of js-yaml, the most popular JS implementation.

It’s worth “officially” restricting some YAML features, like omap, set, anchors, merge, and custom types. That would make it possible to create a faster and more compact parser for the YAML subset. IMHO, it would be enough to support JSON types and (maybe) Date.
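To illustrate what such a restriction could look like in practice, here is a line-based Python lint (hypothetical names throughout) that flags the features listed above before a document reaches a small parser: anchors (`&name`), aliases (`*name`), tags (which covers `!!omap`, `!!set` and custom types) and `<<` merge keys. It is a heuristic, not a YAML parser: it only looks at positions where a node can start (line start, after `--- `, after `- `, or after `key: `), so it can misfire on text inside block scalars.

```python
import re

# Positions where a YAML node can start: beginning of the line (optionally after a
# "--- " document marker or "- " sequence markers), or right after "key: ".
NODE_START = r"(?:^\s*(?:---\s+)?(?:-\s+)*|:\s+)"
FORBIDDEN = {
    "anchor": re.compile(NODE_START + r"&\S+"),
    "alias": re.compile(NODE_START + r"\*\S+"),
    "tag": re.compile(NODE_START + r"!\S*"),  # e.g. !!omap, !!set, !custom/type
    "merge key": re.compile(r"^\s*(?:-\s+)?<<\s*:"),
}


def restricted_yaml_violations(text: str) -> list:
    """Return (line number, feature) pairs for constructs outside the subset."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        for feature, pattern in FORBIDDEN.items():
            if pattern.search(line):
                problems.append((number, feature))
    return problems


sample = """--- !clarkevans.com/^invoice
defaults: &defaults
  currency: USD
order:
  <<: *defaults
  items: !!set {a, b}"""
print(restricted_yaml_violations(sample))
```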


#25

Is there a “slim YAML” version, i.e. a minimal YAML spec for embedded use (e.g. “commonmark core” and Lua)?

Alternatively, we can just define our own restricted metadata syntax. But I think it would be nicer if there were a common standard for small metadata parsing engines. Maybe I’ll shoot them an email (done; sent to their mailing list).


Some naming idea for slim/restricted subset of YAML:

  • uYAML - micro YAML
  • nYAML - nano YAML - not YAML
  • YAMLe - YAML embedded

#26

If I were to build a web app that uses markdown, I would make separate text fields for metadata anyway. I think the YAML metadata header is mainly useful if you work directly with (plain-text/markdown) files instead of web apps with potentially multiple text fields.

As such, metadata blocks should certainly be extensions and not “core spec”, because they don’t really make sense for client-side implementations and, as @vitaly said, are hard to implement. This would save us the confusion of differentiating between full YAML and a mini YAML: either an implementation supports it or it doesn’t, and both are fine.


#27

No. The YAML spec contains definitions of reduced schemas, but those are not practical and do not cover restrictions on anchors and so on. There is some movement towards JSON in the upcoming YAML 2.0, but it has been “upcoming” for more than 2 years and there is no estimate of when it will be finished :)

At first glance, it may be enough if metadata contains only simple pairs:

---
foo: bar
baz: bad
---

If not, then a “restricted” YAML should cover all needs.
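If plain pairs are enough, the parser becomes tiny. A Python sketch, assuming the block opens and closes with `---` lines, each line is exactly `key: value` with no indentation, and values stay strings (no YAML-style type guessing, so `no` is not silently turned into `false` the way YAML 1.1 would). `split_front_matter` is a hypothetical name; anything nested is rejected rather than guessed at, which is the point where a “restricted” YAML would take over.

```python
def split_front_matter(text: str):
    """Return (pairs, body) for a document opening with a '---' block of flat pairs."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no front matter at all
    pairs = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return pairs, "\n".join(lines[i + 1:])
        if not line.strip():
            continue
        if line[0].isspace() or ":" not in line:
            raise ValueError(f"line {i + 1}: only flat 'key: value' pairs are allowed")
        key, value = line.split(":", 1)
        pairs[key.strip()] = value.strip()  # values stay plain strings
    raise ValueError("front matter is not closed with '---'")


pairs, body = split_front_matter("---\nfoo: bar\nbaz: bad\n---\n# Title\n")
print(pairs, repr(body))
```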


#28

I think metadata on the client can sometimes be useful too. Given current trends, it’s not good to assume that software runs only on the server.

I warn against writing anywhere (in the extensions spec too) that “the format is YAML”: that automatically means the full YAML spec must be supported.


#29

Yes, although I would add arrays (both the JSON and list syntax).
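A Python sketch of flat pairs extended with arrays in both spellings, assuming a JSON array is written inline after the colon and a list is written as indented `- item` lines under a key with an empty value (so, as a side effect, an empty value always starts a list). `parse_pairs_with_arrays` is a hypothetical name, and list items are kept as plain strings.

```python
import json


def parse_pairs_with_arrays(block: str) -> dict:
    """Flat 'key: value' pairs whose values may also be arrays, written either
    as JSON (tags: ["a", "b"]) or as '- item' lines under an empty key."""
    result, current = {}, None
    for line in block.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("- "):
            if current is None:
                raise ValueError(f"list item without a key: {stripped!r}")
            result[current].append(stripped[2:].strip())
            continue
        key, sep, value = stripped.partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got {stripped!r}")
        key, value = key.strip(), value.strip()
        if not value:
            result[key], current = [], key  # list syntax follows
        elif value.startswith("["):
            result[key], current = json.loads(value), None  # JSON array syntax
        else:
            result[key], current = value, None
    return result


meta = parse_pairs_with_arrays("""title: Metadata in documents
tags: ["yaml", "json"]
authors:
  - Greg
  - Burko Ruffo""")
print(meta)
```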

Of course, web apps include client-based html/javascript-only apps. I’m just saying that their UI is probably better off when you separate the metadata fields from the “body” markdown field.