Cross-references and citations

DaveJarvis · November 17, 2021, 5:17am

Numerous programs have implemented cross-references and citations on top of Markdown. Similar discussions have not resulted in a standard syntax for cross-references and citations in CommonMark.

Existing implementations have a reasonably consistent syntax, including:

pandoc citations extension and R Studio — [@bibKey] (e.g., [@doe99; @smith2000; @smith2004], [see @doe99, pp. 33-35 and *passim*; @smith04, chap. 1], [-@smith04], @smith04), plus an extensive set of locator terms
PandocCiter — @bibKey or [@bibKey]
zotxt — [@bibKey] (e.g., [@Doe2006], [@Doe:2006])
MultiMarkdown — [][#bibKey] (e.g., [p. 26][#Doe2006])
HTML — <cite>...</cite>
Zotero’s Scannable Cite — {See | Smith, (2012) |p. 45 | for an example |zu:2433:WQVBH98K}, and includes legal types

Citations and cross-references are somewhat similar concepts. Internal cross-references have been discussed at length:

pandoc: internal links to tables and figures

An astute observation from the thread includes:

Though perhaps this mechanism could replace the current mechanism for numbered examples: {#ex:foo}, @ex:foo?

A possible issue with the lengthy discussion on cross-references is that content and presentation logic are intermingled. That is, writing Figure @fig:label is redundant because @fig already denotes that the reference is a figure. How @fig is rendered can be left to the presentation layer. (The pandoc-fignos demo exemplifies this scenario.)

Rather than create a specific form for each reference type, consider the general form:

{#type-name:label}
[@type-name:label]

Where the type name can be any two- to four-letter value (including I18N). Thus the following pairs of anchors (braces) and references (brackets) are valid:

{#fig:cats}
[@fig:cats]
{#図版:猫}
[@図版:猫]
{#eq:mass-energy}
[@eq:mass-energy]
{#eqn:laplace}
[@eqn:laplace]

Note that @type:label could be considered invalid (syntactic sugar?), although it’d be a breaking change because some implementations support it.
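The general form above can be recognised with a pair of regular expressions. This is only a minimal sketch: the pattern names, the function names, and the rule that labels consist of word characters and hyphens are assumptions for illustration, not part of any existing implementation.

```python
import re

# Hypothetical patterns for the proposed syntax. In Python 3, \w matches
# Unicode letters, so [^\W\d_] accepts type names such as 図版.
TYPE = r"[^\W\d_]{2,4}"   # two to four letters
LABEL = r"[\w-]+"         # assumed label alphabet: word characters and hyphens
ANCHOR = re.compile(r"\{#(" + TYPE + r"):(" + LABEL + r")\}")
REFERENCE_GROUP = re.compile(r"\[(@[^\]]+)\]")
REFERENCE = re.compile(r"@(" + TYPE + r"):(" + LABEL + r")")


def find_anchors(text):
    """Return (type, label) pairs for every {#type:label} anchor."""
    return ANCHOR.findall(text)


def find_references(text):
    """Return one list of (type, label) pairs per bracketed reference group."""
    groups = (REFERENCE.findall(group) for group in REFERENCE_GROUP.findall(text))
    return [refs for refs in groups if refs]
```

Because references are only searched for inside brackets, a bare @type:label is ignored by this sketch, which corresponds to treating the unbracketed form as invalid.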

Multiple cross-references could be written as:

see [@fig:cat; @fig:dog; @fig:dolphin; @tab:habitats].

That could be rendered in one of many ways, depending on the presentation logic:

see Figures 1.1 to 1.3 and Table 1.1.
see Figures 1.1–1.3 and Table 1.1.
see Figures 1.1, 1.2, 1.3, and Table 1.1.
see Figure 1.1, Figure 1.2, Figure 1.3, and Table 1.1.
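To illustrate how the same reference list can yield different renderings, here is a rough sketch of a presentation layer. The NAMES table, the numbers lookup, and the two style names are hypothetical; the "range" style also assumes the grouped items are numbered consecutively and does not verify it.

```python
from itertools import groupby

# Hypothetical presentation settings: singular and plural names per type.
NAMES = {"fig": ("Figure", "Figures"), "tab": ("Table", "Tables")}


def join_words(items):
    """Join items as 'a', 'a and b', or 'a, b, and c'."""
    if len(items) <= 2:
        return " and ".join(items)
    return ", ".join(items[:-1]) + ", and " + items[-1]


def render(refs, numbers, style="range"):
    """Render (type, label) pairs using a lookup from pair to number string."""
    parts = []
    # Adjacent references of the same type are grouped together.
    for kind, group in groupby(refs, key=lambda ref: ref[0]):
        nums = [numbers[ref] for ref in group]
        singular, plural = NAMES[kind]
        if style == "range" and len(nums) > 1:
            parts.append(f"{plural} {nums[0]} to {nums[-1]}")
        else:
            parts.extend(f"{singular} {n}" for n in nums)
    return join_words(parts)


refs = [("fig", "cat"), ("fig", "dog"), ("fig", "dolphin"), ("tab", "habitats")]
numbers = {refs[0]: "1.1", refs[1]: "1.2", refs[2]: "1.3", refs[3]: "1.1"}
print(render(refs, numbers))           # Figures 1.1 to 1.3 and Table 1.1
print(render(refs, numbers, "each"))   # Figure 1.1, Figure 1.2, Figure 1.3, and Table 1.1
```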

This allows for:

see [@fig:cat; @fig:dog; @fig:dolphin; @tab:habitats] 
starting on [@page:cat].

The rendering software would have to be flexible enough to allow the user to define the behaviour for the type name (or provide suitable defaults). For example, @page could map to \at{page}[label]. This allows the flexibility of:

[@fig:cat]

to map to see \in[cat] on \at{page}[cat], which could render as:

see Figure 1.1 on page 9
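One way to picture the user-defined behaviour is a table of output templates keyed by type name. The sketch below is an assumption-laden illustration: the TEMPLATES table, DEFAULT_TEMPLATE, and the expand function are hypothetical, and the output fragments are simply the ones quoted above.

```python
# Hypothetical user configuration: one output template per type name.
# Doubled braces are literal braces in str.format; {label} is the placeholder.
TEMPLATES = {
    "fig": r"see \in[{label}] on \at{{page}}[{label}]",
    "page": r"\at{{page}}[{label}]",
}

# Assumed fallback for type names without a user-defined template.
DEFAULT_TEMPLATE = r"\in[{label}]"


def expand(kind, label, templates=TEMPLATES):
    """Expand a reference of the given type into its configured output."""
    template = templates.get(kind, DEFAULT_TEMPLATE)
    return template.format(label=label)


print(expand("fig", "cat"))   # see \in[cat] on \at{page}[cat]
print(expand("page", "cat"))  # \at{page}[cat]
```

Keeping the mapping in data rather than in the syntax is what lets the same document source target a different output format by swapping the table.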

For bibliographic references, labels are cross-referenced against an external database. The renderer must be informed which type name denotes such a reference, meaning both of the following may be valid:

[@bib:doe2021]
[@参考文献:doe2021]

Labels not found in the database may result in a warning by the rendering software.

A locator is the reference followed by a comma, its type, and numbering, such as:

[@bib:doe2021, pp. 33-35, and *passim*; @bib:smith2024, ch. 1]

Here, too, what qualifies as a locator would need to be configurable to allow for I18N, such as:

[@bib:descartes, lv. 2]

Where lv. would be rendered as livre. The default locator type is page, allowing it to be omitted, as per:

[@bib:doe2021, 33-35, and *passim*; @bib:smith2024, ch. 1]
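The locator rules can be sketched as a small parser with a configurable term table. Everything named here (LOCATORS, CITATION, parse_citation, parse_group, and the returned dictionary layout) is hypothetical, and the sketch assumes the locator term is the first space-delimited word after the comma.

```python
import re

# Hypothetical locator table; a renderer would presumably load one per language.
LOCATORS = {"p.": "page", "pp.": "page", "ch.": "chapter", "lv.": "livre"}
CITATION = re.compile(r"@([^\W\d_]{2,4}):([\w-]+)(?:,\s*(.*))?$")


def parse_citation(item, locators=LOCATORS):
    """Split '@type:key, locator' into parts; the locator type defaults to page."""
    match = CITATION.match(item.strip())
    if match is None:
        raise ValueError(f"not a citation: {item!r}")
    kind, key, rest = match.groups()
    locator, detail = None, None
    if rest:
        term, _, remainder = rest.partition(" ")
        if term in locators:
            locator, detail = locators[term], remainder
        else:
            # No recognised term: treat the whole remainder as a page locator.
            locator, detail = "page", rest
    return {"type": kind, "key": key, "locator": locator, "detail": detail}


def parse_group(text, locators=LOCATORS):
    """Parse the text inside [...]; items are separated by semicolons."""
    return [parse_citation(item, locators) for item in text.split(";")]
```

With this table, both "pp. 33-35, and *passim*" and the shortened "33-35, and *passim*" resolve to a page locator, while "lv. 2" resolves to livre.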

Thoughts?

P.S.
This topic stems from a question posed in a KeenWrite discussion. See https://github.com/DaveJarvis/keenwrite/discussions/144 for details.

nichtich · February 3, 2023, 6:21am

Quarto supports cross-references in a syntax based on pandoc-crossref, but with a subtle difference. Quarto:

{#tbl-table}

pandoc-crossref:

{#tbl:table}

Both use a list of mandatory prefixes for tables (tbl), equations (eq), code listings (lst)…

Since version 2.1.13, vanilla Pandoc allows cross-references without any special syntax, but only for the OpenDocument output format.

Frankly speaking, I don’t understand why enforced prefixes such as tbl: or tbl- are needed at all to enable numbered cross-references. It’s just a matter of output format writers supporting this.

mofosyne · May 20, 2023, 10:41am

Pandoc’s approach appears to be mostly to keep citations in front matter or, if inline, to go through the XML-based CSL format (CSL 1.0.2 Specification).

However, I’ve been wanting to do citations for note-taking, and it would be handy if I could write the bibliography inline using the Generic directives/plugins syntax style that we settled on, at least for stuff like HTML IDs etc…

In this context I was trying to link to an academic paper in a non-academic blog post, but would love to also include some citation metadata in an inline manner… maybe like this?

[Understanding KaZaA](https://cse.engineering.nyu.edu/~ross/papers/UnderstandingKaZaA.pdf){author = "Jian Liang, Rakesh Kumar, Keith W. Ross", title = "Understanding KaZaA", year = "2004"}

samwilson · May 22, 2023, 2:21am

Would that result in a link followed by a footnote reference? What would do the formatting of the contents of the footnote/citation?

I’ve been doing similar things, but putting the URL in the directive as well, and using WordPress shortcode syntax (which is much the same but without the comma separator), e.g. {cite url=https://cse.engineering.nyu.edu/~ross/papers/UnderstandingKaZaA.pdf author="Jian Liang, Rakesh Kumar, Keith W. Ross" title = "Understanding KaZaA" year = 2004} — and the cite bit is the name of a template file that does the actual formatting (into HTML or LaTeX as required).

mofosyne · May 22, 2023, 3:00am

I haven’t thought that deeply, but I think what you said makes sense. However, that seems more like an output implementation detail that the renderer could figure out itself (e.g., a tooltip view instead).

It does bring to mind whether it would clash with other uses of the metadata, e.g., the HTML id. In that case, should it be addressed by [name](url){id=<html_id>}{cite <metadata>}? I see you are using {cite <metadata>}, but I’m not sure if we have any similar context for conceptualising this within the currently ad hoc, community-accepted idea of Consistent attribute syntax.

[Understanding KaZaA](https://cse.engineering.nyu.edu/~ross/papers/UnderstandingKaZaA.pdf){id=understanding_kazaa}{cite author = "Jian Liang, Rakesh Kumar, Keith W. Ross", title = "Understanding KaZaA", year = "2004"}

Would that be in keeping with your idea? Not sure if we can intermix cite metadata with attributes.

Recall that I want to be able to associate citation metadata with links at the very least. Not sure what would be sane for paragraphs and sections however.
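To make the two-block idea concrete, here is a rough sketch of how a parser might keep HTML attributes and cite metadata apart. The BLOCK and PAIR patterns, the parse_attributes function, and the result layout are all hypothetical; none of this is an agreed syntax.

```python
import re

# Hypothetical patterns: a brace block optionally starts with the word "cite";
# each pair is key = "quoted value" or key=bare_value.
BLOCK = re.compile(r"\{(cite\s+)?([^{}]*)\}")
PAIR = re.compile(r'([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s,"]+))')


def parse_attributes(text):
    """Collect plain attributes and cite metadata into separate dictionaries."""
    result = {"attributes": {}, "cite": {}}
    for is_cite, body in BLOCK.findall(text):
        target = result["cite" if is_cite else "attributes"]
        for key, quoted, bare in PAIR.findall(body):
            target[key] = quoted or bare
    return result


text = ('{id=understanding_kazaa}'
        '{cite author = "Jian Liang, Rakesh Kumar, Keith W. Ross", '
        'title = "Understanding KaZaA", year = "2004"}')
print(parse_attributes(text)["attributes"])  # {'id': 'understanding_kazaa'}
```

Since the cite block is identified by its leading keyword, the same key could appear in both blocks without clashing.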

samwilson · May 22, 2023, 8:13am

Separating the HTML attributes from others sounds like a useful idea.

And yeah, I was really just wondering whether the implementation details of the renderer are so crucial to making the Markdown syntax make sense that it’s hard to separate them, i.e., that it’d be unlikely that multiple Markdown renderers would do it in the same way, so there’s not that much benefit from a standardized syntax. (Although I’m perhaps not thinking through it properly!)

DaveJarvis · June 11, 2023, 10:18pm

While that works for native English speakers working on Latin-based documents, an internationalized syntax that supports any type of document reference (figure, table, algorithm, listing, equation, page number, heading, bibliographic entry, ad nauseam) would likely have a wider impact and greater adoption.

Certainly, different editors will have different ways to present the information. That’s a benefit to using a plain text format, not a deterrent to having a standard syntax. A standard eases porting documents from one editor to the next (i.e., no translation software needed).

Here’s an animation of how I envision the “tooltip” working in KeenWrite:

[Animation: citations]

In my mind, a cross-reference and a citation are the same thing: they are both ways to reference an item. While it would be convenient to inline the citation metadata, there’s a whole ecosystem of tools dedicated to bibliographic references that would be unusable (such as JabRef, Zotero, and ConTeXt/LaTeX/LuaTeX/SILE modules). Further, it would require two new syntaxes (one for internal document references and a separate one for external references), which is additional development effort for parsers.

For myself, I don’t see the benefit: there’s already a syntax for handling bibliographic references. Why not keep that externalized? Let the renderer handle linking the file and the document, rather than the syntax.

To me, that’s a presentation issue, best left for the renderer.

DaveJarvis · December 17, 2023, 5:30pm

KeenWrite now has a revised caption syntax and cross-references, the latter borrowing from pandoc-crossref. The caption syntax provides a consistent way to add a caption to the main types of objects in a document: tables, figures, and equations. The captions can include cross-references.

See the documentation for details.