Metadata in documents


#14

Using that W3Schools example:

<meta charset="UTF-8">
<meta name="description" content="Free Web tutorials">
<meta name="keywords" content="HTML,CSS,XML,JavaScript">
<meta name="author" content="Hege Refsnes">

All of that information is in the page head, so I think using Jekyll-style front matter would be fine in that case:

---
charset: UTF-8
description: Free Web tutorials
keywords: HTML,CSS,XML,JavaScript
author: Hege Refsnes
---

The <meta> tag only ever goes in the <head> section of an HTML document. If you’re putting data in the body, it’s usually represented by a visible HTML element (unless the element is hidden by CSS).
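As an illustration, here is a minimal Python sketch of that mapping. It turns flat Jekyll-style front matter back into the equivalent <meta> tags. The function name `front_matter_to_meta` is hypothetical. The sketch assumes one `key: value` pair per line and no nested YAML. `charset` is special-cased because it has its own attribute instead of name/content.

```python
from html import escape


def front_matter_to_meta(front_matter: str) -> str:
    """Turn flat `key: value` front matter into <meta> tags.

    Assumes one pair per line between optional `---` fences;
    nested YAML is out of scope for this sketch.
    """
    lines = front_matter.strip().splitlines()
    # Drop the opening and closing `---` fences if present.
    if lines and lines[0].strip() == "---":
        lines = lines[1:]
    if lines and lines[-1].strip() == "---":
        lines = lines[:-1]

    tags = []
    for line in lines:
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a key/value line; ignored in this sketch
        key, value = key.strip(), value.strip()
        if key == "charset":
            # charset uses its own attribute rather than name/content.
            tags.append(f'<meta charset="{escape(value)}">')
        else:
            tags.append(f'<meta name="{escape(key)}" content="{escape(value)}">')
    return "\n".join(tags)
```

Given the front matter above, it should produce the four <meta> lines from the W3Schools example.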

Definition lists can be used to display matching pairs; I’m not sure about secondary descriptors, though. Perhaps the secondary descriptor could just be another definition. Using your example:

<dl>
  <dt>bob</dt>
  <dd>cat</dd>
  <dd>barfs furballs</dd>
  <dt>george</dt>
  <dd>dog</dd>
  <dd>very lazy</dd>
</dl>

And the Markdown would be:

bob
: cat
: barfs furballs
george
: dog
: very lazy
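Here is a rough Python sketch of how the Markdown form could become the HTML above. It assumes a PHP Markdown Extra-style convention: a term sits on its own line, and each definition line starts with `:`. `deflist_to_html` is a hypothetical name, and blank lines are simply skipped. Secondary descriptors become extra `<dd>` elements, as suggested above.

```python
from html import escape


def deflist_to_html(text: str) -> str:
    """Convert term / `: definition` lines into an HTML <dl>.

    Assumed syntax: a term on its own line, followed by one or more
    definition lines that start with ':' at the beginning of the line.
    """
    out = ["<dl>"]
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue  # blank lines are ignored in this sketch
        if line.startswith(":"):
            # The marker only counts at the very start of the line.
            out.append(f"  <dd>{escape(line[1:].strip())}</dd>")
        else:
            out.append(f"  <dt>{escape(line.strip())}</dt>")
    out.append("</dl>")
    return "\n".join(out)
```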

#15

Btw, I just noticed that AsciiDoc’s way of doing a document declaration is:

Writing Documentation using AsciiDoc
====================================
Joe Bloggs <jbloggs@mymail.com>
v2.0, February 2003:
Rewritten for version 2 release.

Perhaps we can auto-recognize YAML blocks under a header (as metadata) or at the start of a page (as a document declaration).

In the example below, the first three lines are the document declaration for the whole page. There is also local metadata under the first header, “The beginnings of time”:

!CommonMark: 0.1.23-github.username.projectname
 Title:      Title for the top bar of any browser
 Date:       32-4-2002

==========================
The beginnings of time
==========================
Date_Edited:  24th of jan 2043 
Last_Edit_by: Burko Ruffo

In the beginnings there were only darkness. But then with a keystroke, there was light.

With the metadata placed in divs, it could render as:

<div title="Title for the top bar of any browser" date="32-4-2002" >

 <section>

  <div class="metadata date_edited">24th of jan 2043</div>
  <div class="metadata last_edit_by">Burko Ruffo</div>

  <h1> The beginnings of time </h1>
  <p>
     In the beginnings there were only darkness. 
     But then with a keystroke, there was light.
  </p>
 </section>

</div>

Hmm… the document declaration metadata would probably be wrapped in <meta> tags and placed at the top of the HTML page.


This has the advantage of letting the page be divided into sections based on headers or rules, e.g. using rules to separate the slides of a slideshow.

Should we use <section>, or is <div> good enough? For this example, I’ll use the <section> tag.

----
:id: slide1
:class: slidestyle
note: this is a test slide    

# slide title

normal text here

---

renders as

 <hr>

 <section id="slide1" class="slidestyle" >
  <div class="metadata note" >
   this is a test slide    
  </div>
  <h1> slide title </h1>
  <p>
     normal text here
  </p>
 </section>

 <hr>

Related topics:
  • Explicit section not possible?
  • Flowerbox Headers on top and bottom of a header
  • Consistent attribute syntax
#16

Is this valid HTML?

You might be better off implementing something like this with consistent attribute syntax (or whatever is eventually decided for that).


#17

Consistent attribute syntax is only for a single inline or block element, not for a section of elements separated by either a header or a rule.


It probably should have the data- prefix, according to http://ejohn.org/blog/html-5-data-attributes/, so good catch. Will fix now. So anything that is not a recognized HTML attribute gets the data- prefix.
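Here is a rough Python sketch of the "unknown attribute gets data-" rule, using the `:id:`, `:class:` and `note:` slide example. The function `to_html_attributes` is hypothetical. Its short whitelist of known attributes is also hypothetical and deliberately incomplete; a real renderer would need the full attribute list from the HTML spec.

```python
from html import escape

# Deliberately small sample of standard HTML attribute names (assumption:
# a real renderer would use the full list from the HTML spec).
KNOWN_ATTRIBUTES = {"id", "class", "title", "lang", "dir", "hidden", "style"}


def to_html_attributes(metadata: dict[str, str]) -> str:
    """Render metadata as HTML attributes, prefixing unknown keys with data-."""
    parts = []
    for key, value in metadata.items():
        # data-* names must not contain uppercase ASCII letters, so lowercase.
        name = key.lower()
        if name not in KNOWN_ATTRIBUTES and not name.startswith("data-"):
            name = "data-" + name
        parts.append(f'{name}="{escape(value)}"')
    return " ".join(parts)
```

For the slide example, `id` and `class` pass through unchanged, and `note` becomes `data-note`.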


#18

In that case, I think the discussion on explicit sections is relevant. Whatever is decided there will likely be applicable to your example.


#19

Aye, posted.

My initial thought for the HTML representation of metadata was to put it in attributes. But then I noticed that in resumes, people type their address and contact details in the same format. So I switched to div tags, but that seems rather limiting.

I think the best option for representing metadata is the description list (<dl>, see http://www.w3schools.com/tags/tag_dl.asp). It was recommended by HTML5 Doctor. Incidentally, there is also a discussion of description lists on this site, e.g.:

<h1>Authorship</h1>
<dl class="metadata authorship" >
  <dt>Authors:</dt>
  <dd>Remy Sharp</dd>
  <dd>Rich Clark</dd>
  <dt>Editor:</dt>
  <dd>Brandan Lennox</dd>
  <dt>Category:</dt>
  <dd>Comment</dd>
</dl>

#20

For key/value pairs, definition lists are suitable. See my example earlier in this topic.


#21

According to what you said about description lists, the following:

==========================
The beginnings of time
==========================
Date Edited:  24th of jan 2043 
Last Edit by: Burko Ruffo

Will not be detected as a description list.

==========================
The beginnings of time
==========================
Date Edited
:  24th of jan 2043 
Last Edit by
: Burko Ruffo

But this one would. However, it seems rather verbose line-wise, and most people type like the first example above. Plus, it’s not very YAML-like. I was aiming to stay as close to YAML syntax as possible. There needs to be another way… but alas, I’m out of ideas for today.

(edit: surely an exception could be made for metadata entries that are right after a header?)


#22

The description list marker is similar to the other list markers (for ordered and unordered lists) in that it has to be at the start of the line, to avoid clashes with the same character appearing mid-sentence. Very few lines start with a number followed by a full stop, a hyphen, an asterisk, a plus sign, or a colon, so it’s relatively safe to place markers there. Even in lines that directly follow a heading, there’s a good chance that these characters will be used in the middle of the line. Treating them as markers there would lead to all sorts of awkward character escaping, which I think we can all agree is unappealing to look at.
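The start-of-line rule can be sketched in Python with a single regular expression. This is only an illustration. Two details are borrowed from CommonMark's list rules as an assumption: up to three spaces of indentation, and a required space after the marker. `:` is added for the proposed description list marker. `starts_list_item` is a hypothetical name.

```python
import re

# List markers recognised only at the start of a line (after up to three
# spaces): "1." ordered, "-", "*", "+" unordered, ":" description.
# A space, a tab, or the end of the line must follow the marker.
LIST_MARKER = re.compile(r"^[ ]{0,3}(?:\d+\.|[-*+:])(?:[ \t]|$)")


def starts_list_item(line: str) -> bool:
    """Return True if the line begins with a list marker."""
    return LIST_MARKER.match(line) is not None
```

With this rule, `Date Edited:  24th of jan 2043` is not a list item, but `: 24th of jan 2043` is. A line such as `2014. A good year` would still count as an ordered item, which is the rare kind of clash described above.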


#23

How about just keeping metadata in a generic block directive syntax instead? Who knows, maybe one day we won’t be using YAML (or maybe YAML will have a newer version), etc.

!YAML: v1.2
::::::::::::::::::::::::::::::::::::::::::::::::::::
--- !clarkevans.com/^invoice
invoice: 34843
date   : 2001-01-23
bill-to: &id001
    given  : Chris
    family : Dumars
    address:
        lines: |
            458 Walkman Dr.
            Suite #292
        city    : Royal Oak
        state   : MI
        postal  : 48046
ship-to: *id001
product:
    - sku         : BL394D
      quantity    : 4
      description : Basketball
      price       : 450.00
    - sku         : BL4438H
      quantity    : 1
      description : Super Hoop
      price       : 2392.00
tax  : 251.42
total: 4443.52
comments: >
    Late afternoon is best.
    Backup contact is Nancy
    Billsmer @ 338-4338.
::::::::::::::::::::::::::::::::::::::::::::::::::::

or for JSON:

!JSON:
::::::::::::::::::::::::::::::::::::::::::::::::::::
{"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}
::::::::::::::::::::::::::::::::::::::::::::::::::::::
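Here is a minimal Python sketch of the generic directive idea. It reads the `!NAME: version` line, takes everything between the two colon fences as the body, and hands the body to a parser chosen by name. The function names are hypothetical. Only JSON is actually parsed here (via the standard `json` module); YAML would need a third-party library or a restricted parser. Any fence of three or more colons is accepted, since fence lengths vary in the examples above.

```python
import json
import re

# Header such as "!YAML: v1.2" or "!JSON:"; the version part is optional.
DIRECTIVE = re.compile(r"^!(?P<name>[A-Za-z][\w-]*):\s*(?P<version>\S*)\s*$")
# Assumption: a fence is any line of three or more colons.
FENCE = re.compile(r"^:{3,}\s*$")


def parse_directive(text: str) -> tuple[str, str, str]:
    """Split a directive block into (name, version, body)."""
    lines = text.strip().splitlines()
    header = DIRECTIVE.match(lines[0]) if lines else None
    if header is None:
        raise ValueError("missing !NAME: directive line")
    if len(lines) < 3 or not FENCE.match(lines[1]) or not FENCE.match(lines[-1]):
        raise ValueError("body must be enclosed in colon fences")
    body = "\n".join(lines[2:-1])
    return header["name"], header["version"], body


def load_directive(text: str):
    """Dispatch the body to a parser chosen by the directive name."""
    name, _version, body = parse_directive(text)
    if name.upper() == "JSON":
        return json.loads(body)
    # YAML and other formats would go to their own (restricted) parsers;
    # this sketch just returns the raw body for them.
    return body
```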

#24

I’d like to warn about using YAML. Though YAML is very nice for humans, it’s very difficult to implement correctly. Putting it into the spec “as is” will be a real pain for client-side parsers, mostly because of the size of a full implementation. I say this as the author of js-yaml, the most popular JS implementation.

It’s worth “officially” restricting some YAML features, like omap, set, anchors, merge and custom types. That would make it possible to create a faster and more compact parser for the YAML subset. IMHO, it would be enough to support JSON types and (maybe) Date.
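To show what "officially restricting" features could mean in practice, here is a rough Python heuristic. It flags anchors, aliases, merge keys and tags line by line; the tag check covers `!!omap`, `!!set` and custom types alike. `find_forbidden_features` is a hypothetical name. The check is regex-based, so quoted strings or trailing comments can be misreported; a real check belongs inside the parser.

```python
import re

# Where a YAML value can start: line start (after indentation), after "key:",
# after a "- " list marker, or after "[", "{" or "," in flow collections.
VALUE_START = r"(?:^[ \t]*|:[ \t]+|-[ \t]+|[\[{,][ \t]*)"

# Heuristic patterns for features a restricted subset might forbid.
FORBIDDEN = {
    "anchor": re.compile(VALUE_START + r"&[^\s,\[\]{}]+"),
    "alias": re.compile(VALUE_START + r"\*[^\s,\[\]{}]+"),
    # Covers !!omap, !!set and custom !tags alike.
    "tag": re.compile(VALUE_START + r"!{1,2}[^\s,\[\]{}]+"),
    "merge key": re.compile(r"^[ \t]*(?:-[ \t]+)?<<[ \t]*:"),
}


def find_forbidden_features(text: str) -> list[tuple[int, str]]:
    """Return (line number, feature) pairs for features outside the subset.

    Regex-based, so quoted strings and comments containing these characters
    can be misreported; a real check would run inside the parser.
    """
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # whole-line comment
        for feature, pattern in FORBIDDEN.items():
            if pattern.search(line):
                hits.append((number, feature))
    return hits
```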


#25

Is there a “slim YAML” version, as in a minimal YAML spec for embedded use (e.g. “CommonMark core” and Lua)?

Alternatively, we can just define our own restricted metadata syntax. But I think it would be nicer if there were a common standard for small metadata parsing engines. Maybe I’ll shoot them an email (done: sent to their mailing list).


Some naming ideas for a slim/restricted subset of YAML:

  • uYAML - micro YAML
  • nYAML - nano YAML - not YAML
  • YAMLe - YAML embedded

#26

If I were to build a web app that uses markdown, I would make separate text fields for metadata anyway. I think the YAML metadata header is mainly useful if you work directly with (plain-text/markdown) files instead of web apps with potentially multiple text fields.

As such, metadata blocks should certainly be extensions and not “core spec”, because they don’t really make sense for client-side implementations and, as @vitaly said, are hard to implement. This would save us the confusion of differentiating between full YAML and mini YAML: either an implementation supports it or it doesn’t, and both are fine.


#27

No. The YAML spec contains definitions of reduced schemas, but those are not practical and do not cover restrictions on anchors and so on. There is some movement towards JSON in the upcoming YAML 2.0, but it has been “upcoming” for more than 2 years, and there is no estimate of when it will be finished. :)

At first glance, it may be enough if metadata contains only simple key/value pairs:

---
foo: bar
baz: bad
---

If not, then a “restricted” YAML should cover all needs.
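For the simple-pairs case, the parser really is tiny. Here is a minimal Python sketch. It assumes exactly the shape shown: a `---` fence, flat `key: value` lines, then a closing `---`. `parse_simple_pairs` is a hypothetical name, and all values stay strings. Splitting only on the first colon keeps values like `12:30` intact.

```python
def parse_simple_pairs(text: str) -> dict[str, str]:
    """Parse a `---` fenced block of flat `key: value` pairs.

    Everything is kept as a string; any line without a ':' is rejected,
    since this sketch deliberately supports nothing beyond flat pairs.
    """
    lines = text.strip().splitlines()
    if len(lines) < 2 or lines[0].strip() != "---" or lines[-1].strip() != "---":
        raise ValueError("metadata must be fenced by '---' lines")
    result = {}
    for line in lines[1:-1]:
        if not line.strip():
            continue  # allow blank lines inside the block
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep or not key.strip():
            raise ValueError(f"not a key/value pair: {line!r}")
        result[key.strip()] = value.strip()
    return result
```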


#28

I think metadata on the client can sometimes be useful too. Given current trends, it’s not good to assume software will run only on the server.

I’d warn against writing anywhere (in the extensions spec too) that “the format is YAML”, because that automatically means the full YAML spec must be supported.


#29

Yes, although I would add arrays (both the JSON and list syntax).

Of course, web apps include client-based HTML/JavaScript-only apps. I’m just saying that their UI is probably better off when you separate the metadata fields from the “body” markdown field.
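To illustrate adding arrays in both syntaxes, here is a Python sketch that extends flat pairs with JSON-style `[a, b]` values and block-style `- item` lists. `parse_pairs_with_arrays` is a hypothetical name. Items stay plain strings. Quoting, nesting and commas inside items are not handled.

```python
def parse_pairs_with_arrays(text: str) -> dict[str, object]:
    """Parse flat `key: value` pairs plus arrays written in two syntaxes.

    Flow (JSON-like):  tags: [a, b, c]
    Block (list):      tags:
                         - a
                         - b
    Items stay plain strings; quoting and nesting are not handled.
    """
    result: dict[str, object] = {}
    current_list = None  # list that receives block-style "- item" lines
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("- ") and current_list is not None:
            current_list.append(line[2:].strip())
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"unexpected line: {raw!r}")
        key, value = key.strip(), value.strip()
        if value.startswith("[") and value.endswith("]"):
            inner = value[1:-1].strip()
            result[key] = [item.strip() for item in inner.split(",")] if inner else []
            current_list = None
        elif value == "":
            # An empty value opens a block list for the "- " lines that follow.
            current_list = []
            result[key] = current_list
        else:
            result[key] = value
            current_list = None
    return result
```

A key with an empty value and no following items ends up as an empty list, which is one of the design choices a real spec would have to settle.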


#30

I guess it depends on what the purpose of including metadata support in documents is.

Maybe the default metadata support should be restricted to the “document declaration” at the top of the page, which would be our own YAML-inspired style of metadata, much like how the <meta> tag is only allowed in the <head> of an HTML document. I think metadata placed haphazardly across the page, without an explicit !YAML: declaration, is bad design.

As for full metadata support, we should include it only as a recommended, but not mandatory, !YAML or !JSON generic directive.


Here is an example of a simple document declaration. You’d put this at the top of the page. Maybe we could put all the restricted-YAML structured data here.

!CommonMark: 0.1.23-github.username.projectname
 author:     Bane Liciea
 title:      Why you should hire me
 layout:     Resume
 arrayName:  [ val1, val2, val3 ]
 listKey: 
             - key:val
               key:val
             - key:val
               key:val
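Here is a rough Python sketch of reading such a declaration. It assumes the first line is `!CommonMark: <version>` and that the fields are the space-indented `key: value` lines after it. `parse_declaration` is a hypothetical name. Nested entries like the `listKey` items are skipped rather than parsed, leaving them to a restricted-YAML parser.

```python
import re

# Assumed first line: "!CommonMark: <version>".
DECLARATION = re.compile(r"^!CommonMark:\s*(?P<version>\S+)\s*$")


def parse_declaration(text: str) -> tuple[str, dict[str, str]]:
    """Return (version, fields) from a document declaration.

    Fields are the space-indented `key: value` lines right after the header.
    The block ends at the first blank or unindented line. Deeper-indented
    lines (nested entries such as the listKey items) are skipped, and values
    stay raw strings for a restricted-YAML parser to interpret later.
    """
    lines = text.splitlines()
    match = DECLARATION.match(lines[0]) if lines else None
    if match is None:
        raise ValueError("document must start with '!CommonMark: <version>'")
    fields: dict[str, str] = {}
    field_indent = None
    for line in lines[1:]:
        if not line.strip():
            break  # blank line ends the declaration
        indent = len(line) - len(line.lstrip(" "))  # spaces only
        if indent == 0:
            break  # unindented line: the document body starts here
        if field_indent is None:
            field_indent = indent
        if indent > field_indent:
            continue  # nested content is left to a fuller parser
        key, sep, value = line.strip().partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return match["version"], fields
```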

#31

Sent an email saying:

I would like to know if there is a restricted subset of YAML, that is
suitable for embedding in other parsers with minimal size for a small
set of core syntax (but extendible).

Got back this reply from Trans:

There is no official support for such a format at this time, but I have started an initial project to create such a standard, code-named “Diet YAML”.

https://github.com/openbohemians/diet-yaml

The idea is simply to take YAML as is and remove the “extraneous” features that are unnecessary for use as a basic configuration file format.


At that link, his EBNF (described as a work in progress) looks like this:

YAML ::= Start Data End
Start ::= ( "\n---" | "" )
End ::= ( "\n..." | "\n---" | "" )
Data ::= (Scalar | Sequence | Mapping )
Scalar ::= (Number | String | Date | Boolean | Nil)
Sequence ::= ( "[" Data ("," Data)* "]" | OptionalTab "-" Data ("\n" OptionalTab "-" Data)* )
Mapping ::= ( "{" Key ":" Data ("," Key ":" Data)* "}" | Tab Key ":" Data ("\n" Tab Key ":" Data)* )
OptionalTab ::= Space*
Tab ::= Space+
String ::= '"' .* '"' | [^-] .+
Number ::= ("+" | "-")? [0-9]* ("." [0-9]+)?
Date ::= [0-9][0-9][0-9][0-9] "-" [0-1][0-9] "-" [0-3][0-9] ( [0-2][0-9] ":" [0-5][0-9] ":" [0-5][0-9] )?
Boolean ::= "true" | "false"
Nil ::= "~"
Space ::= " "

From what I can see, this one supports the --- and ... fencing. Each entry can be a scalar key: value pair, a sequence/list (via -), or a map (presumably with scalar keys, since Key is not defined in the EBNF). It only supports a limited set of core types (String, Number, Date, Boolean and Nil).
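To see how small the scalar part of this grammar is, here is a Python sketch that classifies a single token according to the Scalar rules. It is not Trans' code, and `parse_scalar` is a hypothetical name. Two gaps in the EBNF are filled by assumption, as noted in the comments.

```python
import re
from datetime import date, datetime

# Regex versions of the Scalar rules in the EBNF. Assumptions: Number must
# contain at least one digit (the EBNF as written also accepts an empty
# string), and the optional time in Date is separated from the day by a
# space or "T" (the EBNF shows no separator at all).
NUMBER = re.compile(r"[+-]?(?:\d+(?:\.\d+)?|\.\d+)")
DATE = re.compile(r"(\d{4})-([01]\d)-([0-3]\d)(?:[ T]([0-2]\d):([0-5]\d):([0-5]\d))?")


def parse_scalar(token: str):
    """Convert one scalar token into None, bool, int/float, date/datetime or str."""
    token = token.strip()
    if token == "~":
        return None  # Nil
    if token in ("true", "false"):
        return token == "true"  # Boolean
    if (m := DATE.fullmatch(token)) is not None:
        parts = [int(p) for p in m.groups() if p is not None]
        # date()/datetime() reject impossible values such as month 13.
        return datetime(*parts) if len(parts) == 6 else date(*parts)
    if NUMBER.fullmatch(token):
        return float(token) if "." in token else int(token)
    if len(token) >= 2 and token.startswith('"') and token.endswith('"'):
        return token[1:-1]  # quoted String
    return token  # unquoted String (the "no leading -" rule is not enforced)
```

Under these rules, the `32-4-2002` date used earlier in the thread would be read as a plain string, not as a Date.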


It’s pretty much perfect for use as document declaration metadata, and should cover most use cases of a typical writer (at least mine).


#32

Who is Trans? I don’t know him as a YAML spec developer.


#33

A poster from the yaml-core@lists.sourceforge.net mailing list. I don’t think he is an official YAML spec developer; I think he is just proposing an alternative.

Perhaps we can send Trans’ Diet YAML EBNF to an actual YAML spec developer and see if it makes sense to them. My biggest concern is whether it can ignore data entries it doesn’t recognise (since it’s a restricted subset of YAML).
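One way a restricted parser could ignore entries it doesn’t recognise is to skip them instead of failing. Here is a Python sketch of that behaviour, tested on part of the YAML invoice example from earlier in the thread. Flat `key: value` entries are kept. Entries that need full YAML are skipped and reported: tags, anchors, aliases, block scalars and nested blocks. `read_tolerantly` and its list of unsupported prefixes are hypothetical.

```python
import re

# A top-level `key: value` line (YAML needs whitespace after the colon).
PAIR = re.compile(r"^(?P<key>[A-Za-z_][\w-]*)[ \t]*:(?:[ \t]+(?P<value>.*))?$")
# Value prefixes that signal full-YAML features outside the subset (assumed list).
UNSUPPORTED_PREFIXES = ("&", "*", "!", "|", ">", "<<")


def read_tolerantly(text: str) -> tuple[dict[str, str], list[str]]:
    """Keep flat key/value entries and skip anything outside the subset.

    Returns the recognised entries and the top-level lines that were skipped,
    so a caller can warn instead of failing. Indented lines under a skipped
    entry are dropped with it.
    """
    entries: dict[str, str] = {}
    skipped: list[str] = []
    skipping_block = False
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t":
            # Indented line: expected only under a skipped (nested) entry.
            if not skipping_block:
                skipped.append(line)
            continue
        skipping_block = False
        match = PAIR.match(line)
        value = (match["value"] or "").strip() if match else ""
        if match is None or value == "" or value.startswith(UNSUPPORTED_PREFIXES):
            skipped.append(line)
            skipping_block = True  # also drop any indented children
            continue
        entries[match["key"]] = value
    return entries, skipped
```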