Metadata in documents

Aye, posted.

My initial thought for the HTML representation of metadata was to put it in attributes. But then I noticed that in resumes, people type address and contact details in the same format. So I switched to div tags, but that seems rather limiting.

I think the best option for representing metadata is the description list (<dl>): http://www.w3schools.com/tags/tag_dl.asp . It was recommended on HTML5 Doctor. Incidentally, there is a discussion of description lists on this site. For example:

<h1>Authorship</h1>
<dl class="metadata authorship" >
  <dt>Authors:</dt>
  <dd>Remy Sharp</dd>
  <dd>Rich Clark</dd>
  <dt>Editor:</dt>
  <dd>Brandan Lennox</dd>
  <dt>Category:</dt>
  <dd>Comment</dd>
</dl>

For key/value pairs, definition lists are suitable. See my example earlier in this topic.
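
To make the key/value idea concrete, here is a minimal Python sketch that renders metadata pairs as a description list like the one above. The function name `render_metadata_dl` and the `(term, values)` input shape are made up for illustration; nothing here is part of any spec.

```python
from html import escape


def render_metadata_dl(entries, css_class="metadata"):
    """Render (term, [values]) pairs as an HTML description list.

    Hypothetical helper: one <dt> per term, one <dd> per value.
    """
    lines = [f'<dl class="{escape(css_class, quote=True)}">']
    for term, values in entries:
        lines.append(f"  <dt>{escape(term)}:</dt>")
        for value in values:
            lines.append(f"  <dd>{escape(value)}</dd>")
    lines.append("</dl>")
    return "\n".join(lines)


# Same data as the authorship example in this topic.
authorship = [
    ("Authors", ["Remy Sharp", "Rich Clark"]),
    ("Editor", ["Brandan Lennox"]),
    ("Category", ["Comment"]),
]
html_out = render_metadata_dl(authorship, "metadata authorship")
print(html_out)
```

A term with several values simply gets several `<dd>` elements, which is why a description list fits "Authors" better than a single attribute would.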

According to what you said about description lists, this:

==========================
The beginnings of time
==========================
Date Edited:  24th of jan 2043 
Last Edit by: Burko Ruffo

Will not be detected as a description list.

==========================
The beginnings of time
==========================
Date Edited
:  24th of jan 2043 
Last Edit by
: Burko Ruffo

But this one would. However, it seems rather verbose line-wise, and most people type like the first one above. Plus it’s not very YAML-like. I was aiming to keep to YAML syntax (or as close to it as possible). There needs to be another way… but alas, I’m out of ideas for today.

(edit: surely an exception could be made for metadata entries that are right after a header?)

The description list marker is similar to the other list markers (for ordered and unordered lists) in that it has to be at the start of the line to avoid clashes with the marker character mid-sentence. Very few lines would use a number+full stop combination, hyphen, asterisk, plus sign, or a colon at the start of a line, so it’s relatively safe to place them there. Even for lines that directly follow a heading there’s a good chance that the list marker characters will be used in the middle of the line. This would lead to all sorts of awkward character escaping which I think we can all agree is unappealing to look at.
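
As a rough illustration of the start-of-line rule, here is a small Python sketch. It is not how any CommonMark implementation does it; the function name `parse_description_list` and the simplified rules (a plain line is a candidate term, a line whose first character is `:` is a definition for it) are assumptions made for the example.

```python
def parse_description_list(lines):
    """Group term lines with the ':' definition lines that follow them.

    Simplified sketch: the marker only counts in column 0, so a colon in
    the middle of a line never starts a definition.
    """
    items = []         # list of (term, [definitions])
    term = None        # plain line still waiting for its first definition
    open_item = False  # True while more ':' lines may extend the last item
    for raw in lines:
        line = raw.rstrip()
        if line.startswith(":") and term is not None:
            items.append((term, [line[1:].strip()]))
            term, open_item = None, True
        elif line.startswith(":") and open_item:
            items[-1][1].append(line[1:].strip())
        elif line.strip():
            term, open_item = line.strip(), False
        else:
            term, open_item = None, False
    return items


inline_form = ["Date Edited:  24th of jan 2043", "Last Edit by: Burko Ruffo"]
marker_form = ["Date Edited", ":  24th of jan 2043", "Last Edit by", ": Burko Ruffo"]

print(parse_description_list(inline_form))  # no items: the colons are mid-line
print(parse_description_list(marker_form))
```

The inline form yields nothing because its colons sit mid-line, which is exactly the case the start-of-line restriction is meant to leave alone.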

How about just keeping metadata in a generic block directive syntax instead? Who knows, maybe one day we won’t be using YAML (or maybe YAML will have a newer version), etc…

!YAML: v1.2
::::::::::::::::::::::::::::::::::::::::::::::::::::
--- !clarkevans.com/^invoice
invoice: 34843
date   : 2001-01-23
bill-to: &id001
    given  : Chris
    family : Dumars
    address:
        lines: |
            458 Walkman Dr.
            Suite #292
        city    : Royal Oak
        state   : MI
        postal  : 48046
ship-to: *id001
product:
    - sku         : BL394D
      quantity    : 4
      description : Basketball
      price       : 450.00
    - sku         : BL4438H
      quantity    : 1
      description : Super Hoop
      price       : 2392.00
tax  : 251.42
total: 4443.52
comments: >
    Late afternoon is best.
    Backup contact is Nancy
    Billsmer @ 338-4338.
::::::::::::::::::::::::::::::::::::::::::::::::::::

or for JSON:

!JSON:
::::::::::::::::::::::::::::::::::::::::::::::::::::
{"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}
::::::::::::::::::::::::::::::::::::::::::::::::::::::
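
To show what a generic directive could look like to a parser, here is a rough Python sketch. The `!NAME: argument` line followed by a fence of colons is the syntax proposed above; the function names, the "three or more colons" fence rule, and the decision to decode only JSON (the standard library has no YAML parser) are assumptions for illustration.

```python
import json
import re

DIRECTIVE = re.compile(r"^!(\w+):\s*(.*)$")  # e.g. "!YAML: v1.2" or "!JSON:"
FENCE = re.compile(r"^:{3,}\s*$")            # assumed: a line of 3+ colons


def parse_directive_blocks(text):
    """Return a list of (name, argument, body) for each fenced directive."""
    lines = text.splitlines()
    blocks = []
    i = 0
    while i < len(lines):
        match = DIRECTIVE.match(lines[i])
        if match and i + 1 < len(lines) and FENCE.match(lines[i + 1]):
            body = []
            j = i + 2
            while j < len(lines) and not FENCE.match(lines[j]):
                body.append(lines[j])
                j += 1
            blocks.append((match.group(1), match.group(2), "\n".join(body)))
            i = j + 1  # continue after the closing fence
        else:
            i += 1
    return blocks


def load_block(name, body):
    """Decode a block body; only JSON is decoded in this sketch."""
    if name.upper() == "JSON":
        return json.loads(body)
    return body  # other formats are handed back as raw text


sample = """!JSON:
::::::::
{"menu": {"id": "file", "value": "File"}}
::::::::
"""
name, argument, body = parse_directive_blocks(sample)[0]
menu = load_block(name, body)
print(name, menu["menu"]["value"])
```

The point of the design is that the core parser only needs to find the fences; what happens to the body is decided per directive name, so an implementation without a YAML library can still pass the block through untouched.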

I’d like to warn about YAML use. Though YAML is very nice for humans, it’s very difficult to implement correctly. Putting it into the spec “as is” will be a real pain for client-side parsers, mostly because of the large parser size. I say this as the author of js-yaml, the most popular JS implementation.

It’s worth “officially” restricting some YAML features such as omap, set, anchors, merge, and custom types. That would make it possible to create a faster and more compact parser for a YAML subset. IMHO, it would be enough to support JSON types and (maybe) Date.

Is there a “slim YAML” version? As in a minimal spec YAML for embedded use? (e.g. “commonmark core” and Lua)

Alternatively, we can just define our own restricted metadata syntax. But I think it would be nicer if there is a common standard for a small metadata parsing engine. Maybe I’ll shoot them an email (done: it was sent to their mailing list).


Some naming ideas for a slim/restricted subset of YAML:

  • uYAML - micro YAML
  • nYAML - nano YAML - not YAML
  • YAMLe - YAML embedded

If I were to build a web app that uses markdown, I would make separate text fields for metadata anyway. I think the YAML metadata header is mainly useful if you work directly with (plain-text/markdown) files instead of web apps with potentially multiple text fields.

As such, metadata blocks should certainly be extensions and not “core spec”, because they don’t really make sense for client-side implementations and, as @vitaly said, are hard to implement. This would save us the confusion of differentiating between a full YAML and a mini YAML: either an implementation supports it or it doesn’t, and both are fine.

No. The YAML spec contains definitions of reduced schemas, but those are not practical and do not cover restrictions on anchors and so on. There is some movement toward JSON in the upcoming YAML 2.0, but it has been “upcoming” for more than 2 years and there are no estimates of when it will be finished :)

At first glance, it may be enough if metadata contains plain key/value pairs:

---
foo: bar
baz: bad
---

If not, then a “restricted” YAML should cover all needs.
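
For a sense of how little code the plain-pairs case needs, here is a minimal Python sketch. It is only an illustration of the `---` fenced pairs above, not YAML: the function name `parse_flat_metadata` is made up, every value stays a string, and anything that is not `key: value` is rejected.

```python
def parse_flat_metadata(text):
    """Split a document into (metadata dict, body) using '---' fences.

    Sketch only: values are kept as plain strings and no nesting is allowed.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no metadata block at the top of the document
    metadata = {}
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return metadata, "\n".join(lines[index + 1:])
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep or not key.strip():
            raise ValueError(f"line {index + 1}: expected 'key: value'")
        metadata[key.strip()] = value.strip()
    raise ValueError("metadata block is not closed with '---'")


document = "---\nfoo: bar\nbaz: bad\n---\nHello\n"
meta, body = parse_flat_metadata(document)
print(meta, repr(body))
```

Because the input is also valid YAML, a full parser would read the same block; the sketch just refuses everything outside that tiny overlap.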

I think metadata on the client can sometimes be useful too. With current trends, it’s not good to assume software runs on the server only.

I warn against writing anywhere (in an extensions spec too) that the “format is YAML”: that automatically means the full YAML spec must be supported.

Yes, although I would add arrays (both the JSON and list syntax).

Of course, web apps include client-based html/javascript-only apps. I’m just saying that their UI is probably better off when you separate the metadata fields from the “body” markdown field.

I guess it depends on what the purpose of including metadata support in documents is.

Maybe the default metadata support should be restricted to a “document declaration” at the top of the page, which will be our own YAML-inspired style of metadata. Much like how the <meta> tag is only allowed in the head tag of HTML. This is because I think that metadata placed haphazardly across the page, without an explicit !YAML: declaration, is bad design.

As for full metadata support, we should include it only as a recommended, not mandatory, !YAML or !JSON generic directive.


For example, here is a simple document declaration. You’d put this at the top of the page. Maybe we put all the restricted YAML structured data here.

!CommonMark: 0.1.23-github.username.projectname
 author:     Bane Liciea
 title:      Why you should hire me
 layout:     Resume
 arrayName:  [ val1, val2, val3 ]
 listKey: 
             - key:val
               key:val
             - key:val
               key:val

Sent an email saying:

I would like to know if there is a restricted subset of YAML, that is
suitable for embedding in other parsers with minimal size for a small
set of core syntax (but extendible).

Got back this reply from Trans:

There is no official support for such a format at this time, but I have started an initial project to create such a standard, code named “Diet YAML”.

GitHub - openbohemians/diet-yaml: A Low Calorie YAML Alternative

The idea is simply to take YAML as is and remove the “extraneous” features that are unnecessary for use as a basic configuration file format.


In that link, his EBNF (said to be work in progress) looks like

YAML ::= Start Data End
Start ::= ( "\n---" | "" )
End ::= ( "\n..." | "\n---" | "" )
Data ::= (Scalar | Sequence | Mapping )
Scalar ::= (Number | String | Date | Boolean | Nil)
Sequence ::= ( "[" Data ("," Data)* "]" | OptionalTab "-" Data ("\n" OptionalTab "-" Data)* )
Mapping ::= ( "{" Key ":" Data ("," Key ":" Data)* "}" | Tab Key ":" Data ("\n" Tab Key ":" Data)* )
OptionalTab ::= Space*
Tab ::= Space+
String ::= '"' .* '"' | [^-] .+
Number ::= ("+" | "-")? [0-9]* ("." [0-9]+)?
Date ::= [0-9][0-9][0-9][0-9] "-" [0-1][0-9] "-" [0-3][0-9] ( [0-2][0-9] ":" [0-5][0-9] ":" [0-5][0-9] )?
Boolean ::= "true" | "false"
Nil ::= "~"
Space ::= " "

Based on observation, this one supports the --- ... fencing. Each entry can be either a scalar key: value pair, a sequence/list (via -), or a map (restricted to scalar entries). It only supports a limited set of core types (String, Number, Date, Boolean, Nil).


Pretty much perfect for use as document declaration metadata, and it should cover most use cases of a typical writer (at least for me).
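
To get a feel for the scalar rules in that EBNF, here is a hedged Python sketch of how a value could be typed. It is my reading of the work-in-progress grammar, not Trans’s implementation. Two things are assumptions on my part: the Number rule is tightened to require at least one digit (as written, the EBNF would also match an empty string), and a single space is assumed between the date and the optional time, since the grammar does not say. The function name `parse_scalar` is made up.

```python
import re
from datetime import datetime

NUMBER = re.compile(r"^[+-]?(\d+(\.\d+)?|\.\d+)$")
DATE = re.compile(r"^\d{4}-\d{2}-\d{2}( \d{2}:\d{2}:\d{2})?$")


def parse_scalar(text):
    """Map a scalar string to None, bool, int, float, date/datetime or str."""
    text = text.strip()
    if text == "~":                      # Nil
        return None
    if text in ("true", "false"):        # Boolean
        return text == "true"
    if NUMBER.match(text):               # Number
        return float(text) if "." in text else int(text)
    if DATE.match(text):                 # Date, with optional time
        has_time = " " in text
        fmt = "%Y-%m-%d %H:%M:%S" if has_time else "%Y-%m-%d"
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            return text                  # looks like a date but is not valid
        return parsed if has_time else parsed.date()
    if len(text) >= 2 and text[0] == text[-1] == '"':
        return text[1:-1]                # quoted String
    return text                          # plain String


for raw in ["34843", "450.00", "2001-01-23", "true", "~", '"Super Hoop"']:
    print(repr(raw), "->", repr(parse_scalar(raw)))
```

Anything that fails every rule falls through to a plain string, so an implementation like this never rejects a scalar; it only decides how to type it.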

Who is Trans? I don’t know him as a YAML spec developer.

A poster from the yaml-core@lists.sourceforge.net mailing list. I don’t think he is an official YAML spec developer; I think he is just proposing an alternative.

Perhaps we can send Trans’s Diet YAML EBNF to an actual YAML spec developer and see if it makes sense to them. My biggest concern is whether it can ignore data entries it doesn’t recognise (since it’s a restricted subset of YAML).

+1 on restricting yaml to basic types + arrays.

Maybe you don’t know, but every hobbyist proposes alternatives :) JSON5, TOML, and so on. It’s worth discussing serious things with the YAML spec authors. Everything else is a waste of time.

The best we can do on our side is collect use cases first.

That’s difficult, and I don’t see practical reasons for it, except for text highlighting. There are many ways to “break” YAML with indentation, even without complex types:

a: b
c

Try it here: YAML parser for JavaScript - JS-YAML
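
To illustrate the kind of breakage meant here without a real YAML parser, this is a small Python sketch of a line checker. It is not what js-yaml does internally; the function name `find_broken_lines` and the simplified rule (every non-blank line must be a `key: value` pair, a comment, or be indented deeper than the previous pair) are assumptions for the example.

```python
import re

# A key, then a colon followed by whitespace or end of line.
PAIR = re.compile(r"^\s*[^\s:#][^:]*:(\s|$)")


def find_broken_lines(text):
    """Return 1-based numbers of lines that fit none of the accepted shapes."""
    broken = []
    previous_indent = None  # indent of the most recent 'key: value' line
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        indent = len(line) - len(line.lstrip(" "))
        if PAIR.match(line):
            previous_indent = indent
        elif previous_indent is not None and indent > previous_indent:
            continue  # deeper indentation: treat as a continuation line
        else:
            broken.append(number)
    return broken


print(find_broken_lines("a: b\nc"))    # the unindented 'c' is reported
print(find_broken_lines("a: b\n  c"))  # an indented continuation is accepted
```

Even this toy version shows that a single missing indent changes the verdict, with no anchors or custom types involved.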

+++ Vitaly Puzrin [Nov 12 14 10:27 ]:

I completely understand the worries about the complexity of full YAML.

At first glance, it may be enough if metadata contains plain key/value pairs:

---
foo: bar
baz: bad
---

One reason I ended up supporting both lists and objects in pandoc (which actually just uses a real YAML parser) is that both seem useful in document metadata, esp. for books:

For example, the following maps on to standard EPUB metadata:

+++ mofosyne [Nov 12 14 14:09 ]:

There is no official support for such a format at this time, but I have started an initial project to create such a standard, code named “Diet YAML”.

GitHub - openbohemians/diet-yaml: A Low Calorie YAML Alternative

The idea is simply to take YAML as is and remove the “extraneous” features that are unnecessary for use as a basic configuration file format.


In that link, his EBNF (said to be work in progress) looks like

YAML ::= Start Data End
Start ::= ( "\n---" | "" )
End ::= ( "\n..." | "\n---" | "" )
Data ::= (Scalar | Sequence | Mapping )
Scalar ::= (Number | String | Date | Boolean | Nil)
Sequence ::= ( "[" Data ("," Data)* "]" | OptionalTab "-" Data ("\n" OptionalTab "-" Data)* )
Mapping ::= ( "{" Key ":" Data ("," Key ":" Data)* "}" | Tab Key ":" Data ("\n" Tab Key ":" Data)* )
OptionalTab ::= Space*
Tab ::= Space+
String ::= '"' .* '"' | [^-] .+
Number ::= ("+" | "-")? [0-9]* ("." [0-9]+)?
Date ::= [0-9][0-9][0-9][0-9] "-" [0-1][0-9] "-" [0-3][0-9] ( [0-2][0-9] ":" [0-5][0-9] ":" [0-5][0-9] )?
Boolean ::= "true" | "false"
Nil ::= "~"
Space ::= " "

Based on observation, this one supports the --- ... fencing. Each entry can be either a scalar key: value pair, a sequence/list (via -), or a map (restricted to scalar entries). It only supports a limited set of core types (String, Number, Date, Boolean, Nil).


This would not be hard to parse, but seems to lack what I think is a
crucial feature: the ability to specify multiline strings using |:

title: My Article
abstract: |
  This is the abstract of my
  article.  It can go on and on.

  It can even have two paragraphs,
  or a list:

  - one
  - two
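
As a rough idea of what supporting `|` costs a small parser, here is a Python sketch for flat `key: value` metadata where `key: |` starts a literal block. It is a simplification and not what pandoc does (pandoc uses a real YAML parser): the function name `parse_with_block_scalars` is made up, the block indentation is taken as the smallest indent of its non-blank lines, and only the default behaviour of keeping one trailing newline is handled.

```python
def parse_with_block_scalars(text):
    """Parse top-level 'key: value' lines; 'key: |' starts a literal block
    made of the following indented (or blank) lines."""
    lines = text.splitlines()
    result = {}
    i = 0
    while i < len(lines):
        line = lines[i]
        i += 1
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"line {i}: expected 'key: value'")
        value = value.strip()
        if value != "|":
            result[key.strip()] = value
            continue
        block = []
        while i < len(lines) and (lines[i].startswith(" ") or not lines[i].strip()):
            block.append(lines[i])
            i += 1
        while block and not block[-1].strip():
            block.pop()  # drop trailing blank lines
        if not block:
            result[key.strip()] = ""
            continue
        indent = min(len(b) - len(b.lstrip(" ")) for b in block if b.strip())
        result[key.strip()] = "\n".join(b[indent:] for b in block) + "\n"
    return result


sample = (
    "title: My Article\n"
    "abstract: |\n"
    "  This is the abstract of my\n"
    "  article.  It can go on and on.\n"
    "\n"
    "  It can even have two paragraphs,\n"
    "  or a list:\n"
    "\n"
    "  - one\n"
    "  - two\n"
)
meta = parse_with_block_scalars(sample)
print(meta["abstract"])
```

Note that `or a list:` and the `- one` lines stay inside the abstract text because the block swallows every indented line, which is what makes the body safe to hand to a Markdown parser afterwards.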

Of course, multiline strings and quoting cannot be candidates for removal. That’s why I asked who the guy proposing that notation is. I can prepare a better summary in a couple of days, if you wish. At first glance, these can be removed:

  • omap, pairs,
  • merge
  • anchors
  • custom types
  • binary type
  • binary and octal numbers, ‘infinity’
  • directives
  • writer
  • (? not sure) scientific floats (+1.2e5)
  • (? not sure) explicits (!!float 123, !!str true)

That does not touch the markup at all; it only removes features. And the result can still be parsed by full-weight implementations.
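
One way to picture such a restriction is a checker that flags the removed features before the text is handed to any parser. The Python sketch below is a rough heuristic, not a YAML tokenizer: the regular expressions, the feature names, and the function `find_restricted_features` are all assumptions for illustration, it only covers a few items from the list (directives, merge keys, anchors, aliases, tags), and it can produce false positives inside quoted strings.

```python
import re

# (feature name, pattern) pairs; line-based heuristics only.
RESTRICTED = [
    ("directive", re.compile(r"^%[A-Za-z]+")),
    ("merge key", re.compile(r"^\s*<<\s*:")),
    ("anchor", re.compile(r"(?:^|\s)&[\w-]+")),
    ("alias", re.compile(r"(?:^|\s)\*[\w-]+")),
    ("tag", re.compile(r"(?:^|\s)!{1,2}[\w./^-]+")),
]


def find_restricted_features(text):
    """Return (line number, feature name) for each suspicious line."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in RESTRICTED:
            if pattern.search(line):
                findings.append((number, name))
    return findings


invoice = (
    "--- !clarkevans.com/^invoice\n"
    "bill-to: &id001\n"
    "    given  : Chris\n"
    "ship-to: *id001\n"
)
print(find_restricted_features(invoice))
print(find_restricted_features("foo: bar\nbaz: bad\n"))
```

A document that passes a check like this is still ordinary YAML, so it stays readable by full-weight implementations, which matches the point above.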
