Compact URIs (CURIEs)

Burt_Harris · September 10, 2014, 9:51pm

Imagine for a moment being able to type a shortcut notation like <WP:TL;DR> to generate an external hyperlink that helped readers understand your concise intent. Better yet, try mousing over the link above.

Compact URIs

About 5 years back, the W3C Compact URI (a.k.a. CURIE) textual datatype was standardized. CURIEs are similar to QNames, but resolve some of their shortcomings. CURIEs are the sort of building-block datatype that would have been a great foundation on which to build XML, YAML, etc., but they came a bit too late for those standards. They do seem worth considering for CommonMark.

Leveraging CURIEs in CommonMark could be a great way to help standardize a way of being non-standard. To do so, we need a few things:

A set of locations where text would be interpreted as CURIEs
A syntax for declaring CURIE prefixes
Some conventions on dealing with unrecognized prefixes

Let’s illustrate the first two of these with an example. There are potentially many locations in CommonMark that could benefit from CURIE syntax, but let’s start with the simple case of a hyperlink:

<?prefix wp: <http://en.wikipedia.org/wiki/>?>

[JavaScript]: wp:JavaScript
[prototype-based]: wp:prototype-based

**[JavaScript]** is a [prototype-based] programming language.

Which (under this proposal) should produce HTML output of:

<p><strong><a href="http://en.wikipedia.org/wiki/JavaScript">JavaScript</a></strong> is a <a href="http://en.wikipedia.org/wiki/prototype-based">prototype-based</a> programming language.</p>
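The expansion step behind this output can be sketched in Python. This is only a minimal illustration, not part of the proposal itself: the function name expand_curie and the dictionary of declared prefixes are assumptions made for the example. A value whose prefix has not been declared is returned unchanged, so ordinary URIs pass through untouched.

```python
def expand_curie(value: str, prefixes: dict[str, str]) -> str:
    """Expand 'prefix:reference' when the prefix is declared; otherwise return value as-is."""
    prefix, sep, reference = value.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + reference
    return value


# Hypothetical prefix table, as it might look after reading the declaration above.
prefixes = {"wp": "http://en.wikipedia.org/wiki/"}
print(expand_curie("wp:JavaScript", prefixes))
print(expand_curie("http://example.com/", prefixes))  # "http" is not a declared prefix
```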

To be clear, CURIE processing should only occur in contexts where a CURIE is expected; a naked instance of something that looks like a CURIE in plain text would not invoke the auto-linking mechanism. So a counterexample is in order:

<?prefix wp: <http://en.wikipedia.org/wiki/>?>

The text wp:foo is not treated as a CURIE.

Should produce HTML output of:

<p>The text wp:foo is not treated as a CURIE.</p>

But when angle brackets set in…

Angle brackets are frequently an indication of hyperlink intent. But we don’t want to misinterpret angle brackets needlessly, so we need some clearer indication of the writer’s intent before interpreting something that might be an extension as an extension.

A processor can distinguish between a URI and a CURIE only if the prefix has been declared. Pre-assignment of some prefixes (like wp) might be part of some flavors of CommonMark, but we also need a way for authors to declare prefixes unanticipated by a flavor implementer.

Prefix Processing Instruction

The <?prefix... processing instruction would be passed through to the output verbatim, but would be ignored by a web browser. Such an extension could be implemented in a post-processor, or a more integrated extension could address it during CommonMark processing. A CURIE prefix follows the XML definition of an NCName, and would be followed by the URI prefix definition. As shown in the earlier example, it’s generally useful for the prefix to be defined with any trailing delimiter included.
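As a rough sketch of how a post-processor might collect these declarations, the following Python uses a regular expression. The names NCNAME, PREFIX_PI and collect_prefixes are invented for illustration, and the NCName pattern is a simplified ASCII-only approximation of the XML production, which allows many more characters.

```python
import re

# Simplified, ASCII-only approximation of an XML NCName (an assumption for brevity).
NCNAME = r"[A-Za-z_][A-Za-z0-9._-]*"

# Matches: <?prefix NAME: <URI-BASE> ?>  (whitespace before ?> is optional)
PREFIX_PI = re.compile(
    r"<\?prefix\s+(?P<prefix>" + NCNAME + r"):\s*<(?P<uri>[^<>\s]+)>\s*\?>"
)


def collect_prefixes(text: str) -> dict[str, str]:
    """Map every prefix declared in text to its URI base."""
    return {m.group("prefix"): m.group("uri") for m in PREFIX_PI.finditer(text)}


doc = "<?prefix wp: <http://en.wikipedia.org/wiki/>?>\n\nThe text wp:foo is not treated as a CURIE.\n"
print(collect_prefixes(doc))
```

Because the URI base is stored exactly as written, a declaration that includes the trailing slash lets the expansion be a plain string concatenation.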

CURIEs for CommonMark extensions

The same Compact URI prefixes that can reduce the amount of text needed to specify the URL target of a hyperlink could be leveraged in a framework for CommonMark extensions. In an integrated extension, however, the CURIE would only be treated as an identifier, and generally not de-referenced by the CommonMark processor, only (potentially) referenced by it. For example, let’s say that the @ character is treated as an extension mechanism for link syntax; then something like this might make sense:

<?prefix x: <http://www.sample.com/formspackage/>?>
@x:button[OK]: submit.aspx
@x:button[Cancel]: home.htm
 
Are you sure:<br> [OK] [Cancel]

Now, if the prefix and CURIE were recognized by an integrated CommonMark processor extension, then it could output something like:

<p>Are you sure:<br> <button href='submit.aspx'>OK</button> <button href='home.htm'>Cancel</button></p>

If the URI did not identify a recognized extension, it might make sense to generate links rather than buttons. If use of an extension were semantically ‘necessary’ to the document content, then perhaps a different marker (e.g. !) might be appropriate. The best output from the processor for an unrecognized ! extension might be a warning message.

It’s important to emphasize that the CommonMark processor should never need to de-reference an extension URI generated from a CURIE prefixed with @ or !; it would instead simply match it up with the identifiers specified by available extensions. But the use of a resolvable URL for extensions might be a convenient way to find documentation on the extension.
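The matching behaviour described here could look roughly like the Python sketch below. Everything in it is hypothetical: the KNOWN_EXTENSIONS registry, the resolve_extension function, and the choice to signal a missing ! extension with a Python warning are assumptions for illustration. Note that the expanded URI is only compared as a string and never fetched.

```python
import warnings

# Hypothetical registry mapping extension identifiers (expanded URIs) to handler names.
KNOWN_EXTENSIONS = {"http://www.sample.com/formspackage/button": "button"}


def resolve_extension(marker: str, curie: str, prefixes: dict[str, str]):
    """Match an extension CURIE against the registry without de-referencing it.

    Returns the handler name, or None when the caller should fall back to
    ordinary link output. A missing '!' extension also emits a warning.
    """
    prefix, _, reference = curie.partition(":")
    identifier = prefixes.get(prefix, "") + reference
    handler = KNOWN_EXTENSIONS.get(identifier)
    if handler is None and marker == "!":
        warnings.warn(f"required extension not available: {identifier}")
    return handler


prefixes = {"x": "http://www.sample.com/formspackage/"}
print(resolve_extension("@", "x:button", prefixes))  # known: render as a button
print(resolve_extension("@", "x:slider", prefixes))  # unknown: fall back to a link
```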

Burt_Harris · September 11, 2014, 7:42pm

<?prefix?> and CURIE auto-links

If we consider adopting the principle that CURIEs should be permitted anywhere a URI is, perhaps the first area to explore should be how this could beneficially extend auto-linking. Given:

<?prefix wp: <http://en.wikipedia.org/wiki/> ?>

<wp:Namespace>

It would be possible to refer to a concept like <wp:Namespace> with an external definition. The content of the link should be the literal (unexpanded) content of the angle brackets, while the href and title would be expanded, so it would be rendered as wp:Namespace. The specific HTML output I would target would be:

<p><a class="auto-link prefix-wp" title="http://en.wikipedia.org/wiki/Namespace" href="http://en.wikipedia.org/wiki/Namespace" >wp:Namespace</a></p>

Giving it nice CSS compatibility and hover-over behavior with little effort.
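A minimal Python sketch of producing that anchor, assuming the prefix has already been declared and looked up by the caller (render_autolink is a made-up name, and an undeclared prefix would simply raise KeyError here):

```python
from html import escape


def render_autolink(curie: str, prefixes: dict[str, str]) -> str:
    """Render <prefix:reference> as an anchor; the link text stays unexpanded."""
    prefix, _, reference = curie.partition(":")
    url = escape(prefixes[prefix] + reference, quote=True)
    return (
        f'<a class="auto-link prefix-{escape(prefix)}" '
        f'title="{url}" href="{url}">{escape(curie)}</a>'
    )


print(render_autolink("wp:Namespace", {"wp": "http://en.wikipedia.org/wiki/"}))
```

The prefix-specific class is what would let a stylesheet give, say, all Wikipedia links a distinct look.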

Burt_Harris · September 12, 2014, 11:58pm

Sugar-free CommonMark Extensions

I have edited my previous posts to use the existing processing instruction notation for prefix definition, rather than a percent-sign one that requires a syntax extension. The more I think about it, the more I think this can lead to a framework for sugar-free CommonMark extensions.

Let’s say I was really ambitious, and wanted to add LaTeX’s power to a CommonMark processor. But not everyone using my extended processor wants that power. We declare use of the extension with the following:

<?prefix wp+: <http://en.wikipedia.org/wiki/> !extension> ?>

**CommonMark** markup is respected here.  But when I want to embed some LaTeX, I can:

<?wp+:LaTeX?>
  \maketitle
  \LaTeX{} is a document preparation system  
  for the \TeX{} typesetting program. ...
<?/wp+:LaTeX?>

OK, I’ll be the first to admit it looks a little weird, but I think the cool thing is that it would allow me to implement my hypothetical extension as a pure postprocessor, or pure preprocessor, and not get bogged down adding sugar to CommonMark.
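To make the preprocessor idea a bit more concrete, here is a rough Python sketch that pulls such delimited blocks out of the source text before it reaches the CommonMark processor. The BLOCK pattern and extract_blocks function are assumptions for illustration only; what a real extension would then do with the extracted LaTeX is left open.

```python
import re

# Matches <?name?> ... <?/name?> where the closing name repeats the opening one.
BLOCK = re.compile(
    r"<\?(?P<name>[^\s?/][^\s?]*)\?>\n(?P<body>.*?)\n<\?/(?P=name)\?>",
    re.DOTALL,
)


def extract_blocks(text: str) -> list[tuple[str, str]]:
    """Return (extension name, raw body) for every delimited block in text."""
    return [(m.group("name"), m.group("body")) for m in BLOCK.finditer(text)]


source = "Some text.\n<?wp+:LaTeX?>\n  \\maketitle\n<?/wp+:LaTeX?>\nMore text.\n"
print(extract_blocks(source))
```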

mofosyne · October 12, 2014, 3:37am

http://en.wikipedia.org/wiki/Help:Link#Interwiki_links

This strikes me as similar to the CURIE proposal here. Here is some wiki syntax:

[[abc]] is seen as “abc” in text and links to page “abc”.

Its purpose is to let you jump to pages within a wiki like Wikipedia.

There is also a similar format for linking to a page on another wiki site.

For example, [[m:Help:Link]] links to the “Help:Link” page on Meta, while [[:commons:Athens]] links to page “Athens” on Wikimedia Commons as: commons:Athens.

Btw, maybe a good test of a lightweight markup language is how well it works as a wiki markup language.

chrisalley · October 21, 2014, 6:41am

I’m having trouble seeing the utility of a CURIE extension for CommonMark. Let’s see some real world documents that would benefit from the feature.

For the Wikipedia example, see my much simpler proposal for declaring implicit, relative links.

The example with OK and Cancel buttons in the original post looks like the writer is attempting to add functionality to a Markdown document. That surely goes beyond the scope of Markdown?

mofosyne · October 21, 2014, 7:07am

What a Compact URL might look like

Relative URLs are used in Wikipedia, so using that as a base of reference, here is how compact URLs may work. For relative URLs, go here.

Base Compact URL Declaration (similar to remote referencing for images in Markdown)

[[google:<<search term>>]]: www.google.com/?q=<<search term>>#q=<<search term>> "Search Engine"
[[wikipedia:<<article name>>]]: en.wikipedia.org/wiki/<<article name>> "Encyclopaedia"

Appending name to url base

[[google:cat]] 
   Links to --> https://www.google.com/?q=cat#q=cat
[[wikipedia:Analytical_Review]] 
   Links to --> http://en.wikipedia.org/wiki/Analytical_Review
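A small Python sketch of this placeholder substitution, under a few assumptions: the TEMPLATES table stands in for the parsed declarations, the expand_link name is invented, and the URL scheme is supplied by the caller because the declarations above omit it.

```python
from urllib.parse import quote

# Hypothetical result of parsing the declarations: name -> (URL template, placeholder).
TEMPLATES = {
    "google": ("www.google.com/?q=<<search term>>#q=<<search term>>", "search term"),
    "wikipedia": ("en.wikipedia.org/wiki/<<article name>>", "article name"),
}


def expand_link(link: str, scheme: str = "http://") -> str:
    """Expand '[[name:value]]' by substituting value for the template placeholder."""
    name, _, value = link.strip("[]").partition(":")
    template, placeholder = TEMPLATES[name]
    # Percent-encode the value so spaces and similar characters stay URL-safe.
    return scheme + template.replace(f"<<{placeholder}>>", quote(value.strip()))


print(expand_link("[[google:cat]]", scheme="https://"))
print(expand_link("[[wikipedia:Analytical_Review]]"))
```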

chrisalley · October 21, 2014, 7:14am

I can see how it would work in theory; my question earlier was about real world usage. How often would you need to link to Google search terms like that in a single document? If only one or two times, just write out Markdown links manually; no need to complicate the document with additional syntax.

My concern is that if we start adding lots of variables, Markdown files will quickly feel like programming code.

mofosyne · October 21, 2014, 7:51am

I probably won’t need Google search terms. But I definitely would appreciate being able to easily reference Wikipedia pages in my blog post. Even self notes can be very useful if they include some sort of ‘peeking’ functionality. Other ideas could include [[dictionary:vexations]] for convenient referencing of words (linked to an online dictionary website).

This doesn’t always have to be declared on every document if common enough. (e.g. you could have a user editable list of common root urls to support)

If it seems too programmatic, we can perhaps restrict it to act like this instead. This requires the website URL to be well formed, I guess, but it is less complex and thus less programmatic.

[[wikipedia]]: en.wikipedia.org/wiki/ "Encyclopaedia"

[[wikipedia:Analytical_Review]] 
   Links to --> http://en.wikipedia.org/wiki/Analytical_Review

Burt_Harris · October 22, 2014, 11:10pm

I’m not sure I understand the concern, in particular what it has to do with programming code. Perhaps it’s because you’re considering these to be variables that you see a parallel; I wouldn’t characterize them that way.

Lots of professions (perhaps most of them) have specialized vocabularies (including acronyms and initialisms) they use like code in writing and speech, and such vocabularies tend to cluster. For example, I sat down with an investment advisor today who used a whole set of words from a different vocabulary than I’m used to, e.g. recharacterization, REIT, RMD, and Roth to pick out examples starting with a single letter. All of these are terms that might show up in a financial glossary.

I see nothing particularly programming-like about such specialized vocabularies, and being able to pick short prefixes to clarify terminology in writing with hyperlinks could be quite valuable, particularly when combined with a subject-matter-specific glossary resource on the internet.

...after age 70.5, [[F:RMD]]s could impact your tax bracket.   A [[F:Roth]] is not subject to RMDs during the owner's lifetime.

The practical applications could go well beyond simply shortcutting hyperlinks, however, because additional semantics (beyond the URI base) can be associated with a prefix. As a simple example (and to illustrate why I think special syntax for prefix declaration is justified), perhaps the prefix is declared with something like this:

<?prefix F: http://www.investopedia.com/terms?q= !hideprefix> ?>

Where the !hideprefix indicates the F: prefix should be suppressed in the hyperlink text, so the result might look like:

…after age 70.5, RMDs could impact your tax bracket. A Roth is not subject to RMDs during the owner’s lifetime.
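One way a processor might honor such a flag is sketched below in Python. The render_term function and its hide_prefix parameter are invented for illustration, and the sketch assumes the flag has already been parsed out of the declaration.

```python
from html import escape


def render_term(curie: str, base: str, hide_prefix: bool) -> str:
    """Render a prefixed term as a link, optionally dropping the prefix from the text."""
    prefix, _, reference = curie.partition(":")
    text = reference if hide_prefix else curie
    return f'<a href="{escape(base + reference)}">{escape(text)}</a>'


base = "http://www.investopedia.com/terms?q="
print(render_term("F:RMD", base, hide_prefix=True))
print(render_term("F:Roth", base, hide_prefix=False))
```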

chrisalley · October 24, 2014, 1:11pm

What you’ve described could be considered code as it is foreign to readers. The way in which these vocabularies are used is very similar to how constants are defined in programming code.

Markdown itself does something similar with reference-style links. I’m not sure if this is a good thing. I’ve always used the simpler inline link syntax.

If this proposal goes ahead I suggest we focus on making the syntax as close to the regular link syntax as possible. This will make the extension syntax intuitive for existing Markdown writers.

mofosyne · October 24, 2014, 5:02pm

Btw, this is what a standard markdown reference looks like:

 [id]: http://example.com/  "Optional Title Here"

Here are the possibilities:

reference link style CURIE

 [[wikipedia]]: en.wikipedia.org/wiki/ "Encyclopaedia"

more complex declaration, just add a `:` in `[[]]` and anything after is the placeholder word:

 [[wikipedia:SearchTerm]]: www.google.com/?q=SearchTerm#q=SearchTerm "search engine"

hide or show prefixes (I think by default, you should hide prefixes)

 [[wikipedia]]: en.wikipedia.org/wiki/ "Encyclopaedia" { prefix=show }

 [[wikipedia]]: en.wikipedia.org/wiki/ "Encyclopaedia" 
 { prefix=show }

Generic directive style, root declaration

 !CURIE[ wikipedia ]( en.wikipedia.org/wiki/ ){ prefix=show }

If this proposal goes ahead I suggest we focus on making the syntax as close to the regular link syntax as possible. This will make the extension syntax intuitive for existing Markdown writers.

For declaration of references, it might be better to keep the syntax distinct; otherwise you confuse the link syntax with the declaration of references. [[wikipedia]](en.wikipedia.org/wiki/) would be confused for a standard link. [[wikipedia:BatMan]]

mofosyne · October 24, 2014, 5:26pm

What do I prefer?

This is my preference. To link to this example site, en.wikipedia.org/wiki/TheDog:

[[wikipedia: The Dog ]]

declared via either:

[[wikipedia]]: en.wikipedia.org/wiki/ "Encyclopaedia" { prefix=hide }

[[wikipedia:ArticleName]]: en.wikipedia.org/wiki/ArticleName "Encyclopaedia" 
{ prefix=hide }

The last example uses the ArticleName placeholder to handle more complex root URLs.