Reference link definitions

Shortcut reference links work well for English because “English nouns are only inflected for number and possession” [w]. However, they do not work well for languages where nouns have more forms. For example, Russian nouns have 6 cases. Because of these “variable nouns”, an author writing in Russian has to use explicit link labels very frequently:

... [Льва Толстого][Лев Толстой] ... 
... [Льву Толстому][Лев Толстой] ...
...
[Лев Толстой]: https://en.wikipedia.org/wiki/Leo_Tolstoy

But frequent use of link labels harms the readability of the text.

An alternative approach would be to use multiple link definitions, e.g.:

... [Льва Толстого] ... 
... [Льву Толстому] ...
...
[Лев Толстой]: https://en.wikipedia.org/wiki/Leo_Tolstoy
[Льва Толстого]: https://en.wikipedia.org/wiki/Leo_Tolstoy
[Льву Толстому]: https://en.wikipedia.org/wiki/Leo_Tolstoy
...6 definitions total...

The main text is quite readable, but this approach is not so good because it duplicates link destinations (and link titles).

I would like to have some way to avoid this duplication.

One possible form could be:

[Лев Толстой]: -
[Льва Толстого]: -
...3 more cases...
[Льву Толстому]: https://en.wikipedia.org/wiki/Leo_Tolstoy

I.e. a link definition with a dash (or another dedicated character) in place of the link destination and link title takes its destination and title from the next link definition.

An alternative solution could be to use regular expressions, e.g.:
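To make the dash idea concrete, here is a minimal sketch of how a processor might fill in the placeholders. The function name `resolve_dash_definitions` and the (label, destination) pair representation are my own assumptions for illustration, not part of any existing parser; titles are left out for brevity.

```python
def resolve_dash_definitions(definitions):
    """Fill '-' placeholders with the destination of the next real definition.

    `definitions` is a list of (label, destination) pairs in document order
    (a hypothetical representation, not a real parser API).
    """
    resolved = {}
    pending = []  # labels still waiting for a destination
    for label, destination in definitions:
        if destination == "-":
            pending.append(label)
            continue
        for waiting in pending:
            # setdefault keeps the first definition of a label, as CommonMark does
            resolved.setdefault(waiting, destination)
        pending.clear()
        resolved.setdefault(label, destination)
    if pending:
        raise ValueError(f"no destination follows: {pending}")
    return resolved


definitions = [
    ("Лев Толстой", "-"),
    ("Льва Толстого", "-"),
    ("Льву Толстому", "https://en.wikipedia.org/wiki/Leo_Tolstoy"),
]
links = resolve_dash_definitions(definitions)
```

All three labels end up pointing at the same destination, which is written only once.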

[!Лев Толстой|Льва Толстого|...|Льву Толстому]:
    https://en.wikipedia.org/wiki/Leo_Tolstoy

or

[!Л(ев|...|ьва|ьву) Толст(ой|...|ого|ому)]:
    https://en.wikipedia.org/wiki/Leo_Tolstoy

The exclamation mark (or another dedicated character) denotes that the label is a regular expression, not a literal.
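A rough sketch of how lookup could work with such definitions, assuming literal labels are tried first and patterns second. The class `RegexDefinitions` and its methods are hypothetical, and `normalize` only approximates CommonMark label matching (case folding plus whitespace collapsing). The pattern uses only the forms spelled out above.

```python
import re


def normalize(label):
    # Approximation of CommonMark label matching: case-fold, collapse whitespace.
    return " ".join(label.casefold().split())


class RegexDefinitions:
    """Hypothetical reference map that accepts '!'-prefixed regex labels."""

    def __init__(self):
        self._literal = {}
        self._patterns = []

    def add(self, label, destination):
        if label.startswith("!"):
            pattern = re.compile(label[1:], re.IGNORECASE)
            self._patterns.append((pattern, destination))
        else:
            self._literal.setdefault(normalize(label), destination)

    def lookup(self, label):
        key = normalize(label)
        if key in self._literal:
            return self._literal[key]
        for pattern, destination in self._patterns:
            # The whole label must match, not just a part of it.
            if pattern.fullmatch(key):
                return destination
        return None


defs = RegexDefinitions()
defs.add("!Л(ев|ьва|ьву) Толст(ой|ого|ому)",
         "https://en.wikipedia.org/wiki/Leo_Tolstoy")
url = defs.lookup("Льву Толстому")
```

Note that the factored pattern also accepts mismatched combinations such as "Лев Толстому". That is probably harmless, but the explicit `A|B|C` form avoids it.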

Any comments on that?


BTW, MediaWiki (the Wikipedia engine) takes another approach to the problem. If a link is followed by a word with no space in between, e.g.:

[[a]]b

MediaWiki uses a as the link label, but concatenates a and b to form the link text.

This allows endings to be appended to link labels. It helps a little, but not much. For example, it does not work for my Leo Tolstoy example.
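For illustration, the trailing-letters behaviour can be approximated as below. This is only a sketch: the function `parse_wiki_links` is made up, and using `\w*` for the trailing part is my assumption, since the real rule in MediaWiki depends on the language configuration.

```python
import re

# [[label]] immediately followed by word characters (the "trail").
# \w* is an assumed stand-in for MediaWiki's language-specific rule.
WIKI_LINK = re.compile(r"\[\[([^\[\]|]+)\]\](\w*)")


def parse_wiki_links(text):
    """Return (label, link_text) pairs for every [[label]]trail in text."""
    return [
        (match.group(1), match.group(1) + match.group(2))
        for match in WIKI_LINK.finditer(text)
    ]


simple = parse_wiki_links("[[a]]b")
stem = parse_wiki_links("[[Толст]]ого")
```

The label has to be a literal prefix of every form. That works for a stem like "Толст", but not for Лев, whose other forms (Льва, Льву) do not start with "Лев".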


An interesting case, or, more appropriately, seven cases.

I believe using regular expressions would impact the very text readability you are looking to improve. A more user-centric approach would be to put the burden on the programmer, i.e. automatically resolve declined reference links. Leo Tolstoy from your example poses a challenge due to the exceptional declension of the first name (Лев -> Льва etc.). The last name, however, is conventional:

  1. Толстой
  2. Толстого

Therefore, it should be fairly straightforward to implement an extension that would accept the following:

... [Толстому] ...

[Толстой]: https://en.wikipedia.org/wiki/Leo_Tolstoy

In cases where additional content is required in the link text nested brackets could be used:

[Львом [Толстым]]

That would also enable

о [Л. [Толстом]]

and even

[[Толстой]]

This is in line with the wiki syntax, but could prove counterproductive if applied to an actual wiki scenario. In the latter case, the above label could first be checked against the document’s reference map, and if not found there, resolve to the respective wiki page. Ambiguity between references and wiki links would be resolved by using the full forms, i.e.

[Толстой](Толстой)

for direct links and

[Толстой][Толстой]

for reference links, or the collapsed form, i.e.

[Толстой]

for the latter.

The proposed syntax would obviously prove useless where e.g. both Leo Tolstoy and Alexei Tolstoy need to be referenced in the same document. In fact, it would fail for the latter’s granddaughter:

[Татьяны [Толстой]]

Such scenarios, however, are seemingly rare.

The declension of Лев would still require a solution for e.g. references to Leo the zodiac sign.

Why seven cases rather than six, you might ask. In this regard Russian is very similar to Latin, which also includes the vocative (звательный) case. Although the use of vocative in modern Russian is mostly confined to quoting classics, it should still be taken into account. Consider:

[Врачу], исцелися сам!

[врач]: www.imdb.com/title/tt0412142/

Although the above form is identical to that of the dative, the same does not hold for боже, человече, or (the more challenging) старче.

The latter example also illustrates a case normalization of a different kind, i.e. lower vs. upper case, which is already part of CommonMark.

The expression in the last example is a (commonly used) verbatim translation of the Latin Medice, cura te ipsum! The proposed extension would therefore benefit not only authors writing in Russian, but (with appropriate customizations) the sizable Latin-speaking community.


I got so fascinated with Russian declensions that I started to develop a program to automate those.

This is how it looks at the moment:


Updated UI:

So you think 6 (or 8, depending on whom you ask) cases is too much? Try Finnish!

“I believe using regular expressions would impact the very text readability you are looking to improve.”

I care most about the readability of the text body; the readability of link definitions is less important.

“A more user-centric approach would be to put the burden on the programmer, i.e. automatically resolve declined reference links.”

Great idea, but it is too hard to implement. Regular expressions may look a bit ugly, but they will work for any language, be it Russian or Finnish, while resolving declined references will require a dedicated module for each language. What about multi-language texts? Automatic resolution would have to detect the language first. I do not believe it can be implemented in a reasonable time.

BTW, regular expressions are just one option; I do not insist on them. An explicit list of acceptable variants is also OK with me:

[Лев Толстой|Льва Толстого|...]: 

or

[Лев Толстой]+[Льва Толстого]+[...]: http://...

or

[Лев Толстой] \
[Льва Толстого] \
[...]: http://...

The exact syntax may vary; it is not so important.
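Whatever the exact syntax, the explicit list is trivial to process. A minimal sketch for the `|` variant, assuming a single-line definition and that `|` never occurs inside a label (the function name `expand_definition` is hypothetical):

```python
def expand_definition(line):
    """Expand '[A|B|C]: destination' into one entry per listed label."""
    head, separator, destination = line.partition("]:")
    if not separator or not head.startswith("["):
        raise ValueError(f"not a link definition: {line!r}")
    labels = head[1:].split("|")
    return {label.strip(): destination.strip() for label in labels}


entries = expand_definition(
    "[Лев Толстой|Льва Толстого]: https://en.wikipedia.org/wiki/Leo_Tolstoy"
)
```

No regular expression engine is involved, so the behaviour is the same for any language.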

“I got so fascinated with Russian declensions that I started to develop a program to automate those.”

Are you aware of словарь Зализняка (Zaliznyak's grammatical dictionary of Russian)?

I guess the easiest solution markup-wise would be to reference another reference:

[Bücher]: [Buch]
[Buch]: example.com/buch

The advantage would be that it doesn’t matter where you put it. E.g. in the first paragraph you could use [Buch][], and in another one, you could use [Bücher][].

The only problem that would arise is circular referencing and how to resolve it, i.e. what to output then.

Other than that, I would favor a simple contracting syntax without any new symbols or regular expressions:
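Circular references are easy to detect while following the chain. Here is a minimal sketch, assuming definitions are stored in a plain dict and an alias is written as "[other label]"; the function `resolve_alias` is made up for illustration. One possible answer to "what to output" is to treat a cycle like an undefined reference, so the brackets stay as plain text.

```python
def resolve_alias(label, definitions):
    """Follow '[a]: [b]' aliases until a real destination is found.

    `definitions` maps a label either to a destination or to an alias
    written as '[other label]'. Returns None for an undefined label,
    a dangling alias, or a circular chain.
    """
    seen = set()
    current = label
    while current in definitions:
        if current in seen:
            return None  # circular reference: give up
        seen.add(current)
        target = definitions[current]
        if target.startswith("[") and target.endswith("]"):
            current = target[1:-1]  # alias: keep following
        else:
            return target  # real destination
    return None


definitions = {"Bücher": "[Buch]", "Buch": "example.com/buch"}
destination = resolve_alias("Bücher", definitions)
looped = resolve_alias("a", {"a": "[b]", "b": "[a]"})
```

Because each label is visited at most once, resolution always terminates, and the order of the definitions in the document does not matter.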

[bücher]:
[buch]: example.com/buch

or

[bücher]: [buch]: example.com/buch

PS: I also like the idea of contained references as in [Leo [Tolstoy]], but that would effectively disallow square brackets inside link text. And yes, it might collide with applications built on a wiki, where double brackets actually mean “this is a wiki link”.

PPS: There’s a nice logic in the [Leo [Tolstoy]] syntax: [link text with [printed reference]] vs. [link text][hidden reference]. I like it more and more! It’s also clearer than wiki syntax, where [[CD]]s becomes [CDs][CD]. With the syntax above one could write The company is producing [[wheel]chair]s. and get [wheelchair][wheel]s. An edge case, but totally logical and very easy to see.
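A quick sketch of how the nested form could be recognised, limited to a single level of nesting. The regular expression and the function `parse_nested_links` are my own assumptions, not an existing extension; each match yields the visible link text and the reference label.

```python
import re

# One level of nesting only: [before [label]after]
NESTED_LINK = re.compile(r"\[([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)\]")


def parse_nested_links(text):
    """Return (link_text, label) pairs for every [.. [label] ..] in text."""
    return [
        (match.group(1) + match.group(2) + match.group(3), match.group(2))
        for match in NESTED_LINK.finditer(text)
    ]


wheel = parse_nested_links("The company is producing [[wheel]chair]s.")
tolstoy = parse_nested_links("[Leo [Tolstoy]]")
```

The regular form [CDs][CD] is not touched by this pattern, so both syntaxes could coexist, at the cost of literal square brackets inside link text.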