Extension terminology and rules

Crissov · June 23, 2015, 3:30pm

Continuing the discussion from Guide for syntax extensions:

I wholeheartedly agree with @jroper’s post. Sadly, most replies last year talked about an expressive generic syntax instead, which hasn’t been implemented anywhere before and for the most part just mimics HTML.
I believe we should agree on a common nomenclature and some basic rules and identify innate extension points. Here are some thoughts and ideas of mine.

Terminology

Even now, hardly any Markdown implementation limits itself to the syntax and semantics specified by CommonMark 1.0. Most parse supersets of it, and their grammars are referred to as flavors, variants, dialects etc., or they are described as a set of extensions to the interoperable core. Sometimes, e.g. in the preparation of the CM spec, a preprocessor converts additional idiosyncratic syntax (often specific to a site or project) to standard CM/MD or HTML.

When a parser relaxes a rigid syntax, e.g. with laxer whitespace handling, this is commonly known as (syntactic) sugar, but the term is also applied to minor but often backward-incompatible extensions like [Shortcut Links] without trailing brackets. It also covers invisible extras like the automatic generation of identifiers (IDs) for certain HTML elements, mostly headings, based on their textual content.

A group of related extensions can form a module. Modules should be harmonized with each other, but special-interest modules may be mutually incompatible. Profiles combine a number of required, optional and forbidden modules to suit a specific domain or use case.
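To make the relation between modules and profiles concrete, here is a minimal Python sketch. It assumes that modules are identified by plain string names; the Profile class and the module names tables, heading-ids, footnotes and raw-html are hypothetical and not taken from any existing implementation.

```python
from dataclasses import dataclass


# Hypothetical: no standard list of CommonMark modules exists; names are illustrative.
@dataclass(frozen=True)
class Profile:
    name: str
    required: frozenset[str] = frozenset()
    optional: frozenset[str] = frozenset()   # allowed, but never reported
    forbidden: frozenset[str] = frozenset()

    def check(self, enabled: set[str]) -> list[str]:
        """Report how a set of enabled modules violates this profile."""
        problems = [f"missing required module: {m}" for m in sorted(self.required - enabled)]
        problems += [f"forbidden module enabled: {m}" for m in sorted(self.forbidden & enabled)]
        return problems


wiki = Profile(
    "wiki",
    required=frozenset({"tables", "heading-ids"}),
    optional=frozenset({"footnotes"}),
    forbidden=frozenset({"raw-html"}),
)
print(wiki.check({"tables", "raw-html"}))
# ['missing required module: heading-ids', 'forbidden module enabled: raw-html']
```

In this reading, modules a profile does not mention are simply permitted; a stricter profile could also reject them.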

Some extensions are already deployed and supported by default in several interoperable implementations; these are add-ons. Those that are implemented but off by default are options, and those that are incompatible among implementations are plug-ins. Others are just proposed drafts without a noteworthy implementation.

Rules

Lines have optional indentation, then an optional single alphanumeric attribute belonging to the optional, nestable line prefix that follows it. The prefix must be followed by whitespace before the line content, and the content may be followed by whitespace and the prefix repeated as a line suffix. (Some existing prefixes deviate from this.)
Phrasal affixes (i.e. prefix or postfix or both) never have a space between themselves and their content (inside), but they always require non-alphanumeric characters (or nothing) on the other side (outside).
Phrasal prefix and postfix have the same shape, but may be mirrored or rotated images of each other (e.g. brackets), unless there is strong precedent otherwise (e.g. SGML entity references, which start with an ampersand and end with a semicolon).
A phrasal postfix (or prefix) may be declared optional when it applies to a single alphanumeric word, but there must never be a semantic difference from the double-affix variant.
New syntax should fall back gracefully:
- Content must never disappear depending on parser used.
- Markup characters must not alienate readers if displayed verbatim.
New standardized markup should follow established practice in plain-text media first and in existing implementations second.
A sequence of the same phrasal affix twice or more should be treated as …?…
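The line and phrase rules above are precise enough to check mechanically. The following Python sketch is one possible reading, not a reference implementation: it assumes a line prefix is a run of one repeated printable non-alphanumeric ASCII character, and that only the bracket pairs in the hypothetical MIRRORED table are mirrored. The names parse_line, LineParts and find_phrases are illustrative.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: a line prefix is a run of one repeated printable, non-alphanumeric
# ASCII character ("#", "##", ">", "-", "."). Existing prefixes vary more than this.
LINE_RE = re.compile(
    r"^(?P<indent>[ \t]*)"                          # optional indentation
    r"(?P<attr>[A-Za-z0-9]?)"                       # optional one-character attribute
    r"(?P<prefix>(?P<ch>[!-/:-@\[-`{-~])(?P=ch)*)"  # the line prefix itself
    r"[ \t]+"                                       # mandatory whitespace
    r"(?P<content>.*?)"                             # line content
    r"(?P<suffix>[ \t]+(?P=prefix))?"               # optional prefix repeated as suffix
    r"[ \t]*$"
)


@dataclass
class LineParts:
    indent: str
    attribute: str
    prefix: str
    content: str
    has_suffix: bool


def parse_line(line: str) -> Optional[LineParts]:
    """Split a line into the parts named by the line rule; None means no prefix."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return LineParts(m["indent"], m["attr"], m["prefix"], m["content"], m["suffix"] is not None)


# Assumption: only these bracket-like characters have a mirrored counterpart;
# every other character closes with itself.
MIRRORED = {"(": ")", "[": "]", "{": "}", "<": ">"}


def find_phrases(text: str, prefix: str) -> list[str]:
    """Return the contents of all spans delimited by prefix and its postfix.

    Inside, the affixes must touch non-space content; outside, they must not
    touch an alphanumeric character.
    """
    postfix = "".join(MIRRORED.get(c, c) for c in reversed(prefix))
    pattern = re.compile(
        r"(?<![A-Za-z0-9])" + re.escape(prefix) + r"(?=\S)"
        r"(.+?)"
        r"(?<=\S)" + re.escape(postfix) + r"(?![A-Za-z0-9])"
    )
    return pattern.findall(text)
```

Nesting is handled by calling parse_line again on the returned content: "> > nested" yields the prefix ">" with content "> nested", which parses once more. Likewise, find_phrases("a*b*c", "*") finds nothing because the asterisks touch letters on their outer side.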

Patterns open to extension

The link and embed syntax can be extended in several ways.
- Currently: [link text](URL "optional title") or [link text][reference], with a preceding ! for embedding.
- The exclamation mark can be substituted by another punctuation mark. Implementations that do not understand its meaning should render a normal hyperlink instead (and they all do).
- The title in parenthesized links may be followed by other optional attributes, e.g. image dimensions. (Degrades awfully.)
- The alt text may contain additional markup or information in a predefined format.
- There can be additional meaningful markup in the definition lines of reference links.
- Invalid single-character URLs (e.g. :, ?, #, %) may be used as markers for special treatment. (Does not work well with embeds.)
- Pseudo-protocols may be parsed for special effects. (Discount supports abbr: etc.)
Every visible (= printable) non-alphanumeric character from US-ASCII (= Basic Latin block in Unicode) should be considered potentially active, i.e. it can be either a line prefix or a phrasal affix.
Lines consisting of a single non-alphanumeric ASCII character repeated at least three times may have structural meaning.
- If the characters are separated by whitespace, there must be nothing else on the line, and it is considered some kind of separator.
- Otherwise the line may be a fence, optionally followed by parameters that do not appear in the output but apply to the following block.
- More heading levels may get a Setext-like underline.
Numbered list items could accept more kinds of parameters, i.e. numbering formats.
…
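Two of these patterns can be sketched in Python under stated assumptions. classify_marker_line applies the repeated-character rules; treating a bare run directly below a text line as a Setext-like underline is an assumption about how the heading case would be told apart from a fence. render_links shows the intended graceful fallback for an unknown embed marker: the marker stays as literal text in front of an ordinary hyperlink, so nothing disappears. Both function names and the marker set are hypothetical; HTML escaping and titles are ignored.

```python
import re
import string
from typing import Optional


def classify_marker_line(line: str, previous_line_is_text: bool = False) -> Optional[tuple[str, str]]:
    """Classify a line built from one repeated non-alphanumeric ASCII character.

    Returns (kind, parameters), or None if the line does not qualify.
    """
    stripped = line.strip()
    if not stripped or stripped[0] not in string.punctuation:
        return None
    ch = stripped[0]
    has_inner_space = " " in stripped or "\t" in stripped
    marks = stripped.replace(" ", "").replace("\t", "")
    # With intervening whitespace, nothing but the character may appear: separator.
    if has_inner_space and len(marks) >= 3 and set(marks) == {ch}:
        return ("separator", "")
    # Without it: an unbroken run of at least three, optionally followed by parameters.
    run = len(stripped) - len(stripped.lstrip(ch))
    if run < 3:
        return None
    params = stripped[run:].strip()
    # Assumption: a bare run directly below a text line acts as a Setext-like underline.
    if not params and previous_line_is_text:
        return ("underline", "")
    return ("fence", params)


# Hypothetical set of punctuation marks that may stand in for "!" before a link.
LINK_RE = re.compile(
    r'(?P<marker>[!?%^~=+&$])?'              # optional marker before the link
    r'\[(?P<label>[^\]]*)\]'                 # [link text]
    r'\((?P<url>[^)\s]*)(?:\s+"[^"]*")?\)'   # (URL "optional title"), title ignored
)


def render_links(text: str) -> str:
    """Render inline links; unknown markers degrade to literal text plus a hyperlink."""
    def replace(m: re.Match) -> str:
        marker, label, url = m.group("marker"), m.group("label"), m.group("url")
        if marker == "!":  # the embed marker every implementation already understands
            return f'<img src="{url}" alt="{label}">'
        # Unknown (or absent) marker: keep it verbatim so no content disappears.
        return (marker or "") + f'<a href="{url}">{label}</a>'
    return LINK_RE.sub(replace, text)
```

For example, render_links("?[docs](http://x.org)") keeps the question mark and produces an ordinary link after it, which is exactly what an unaware CommonMark parser would do today.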

Crissov · October 12, 2018, 10:56am

@jgm Can we get a separate repository under https://github.com/commonmark/ that would contain

- a spec defining these terms and
- specs for actual extensions (and modules and flavors) with embedded test cases like the main spec?

Chronological overview of existing meta threads

- Add "plugin" syntax to the spec (2014 Sep/Oct): wants generic processing instructions added to the base spec
- Guide for syntax extensions (2014 Sep): same gist as this thread in the initial post, which was quickly hijacked in the responses
- Including Markdown Flavours Features (2014 Sep): single supportive post
- Optional syntax (2014 Sep): about graceful fallback, coordinated extensions etc.
- Multiple levels of CommonMark specification? (2014 Sep through 2016 May): proposes overly complex spec levels
- Extension distributions (2014 Oct): introduces "distribution" for what I called a module above, i.e. a sensible collection of related and compatible extensions; single post
- Overview of existing MD extensions (common solutions, incompatibilities, …) (2015 Jun/Jul): resulted in wiki entries for Proposed Extensions, existing Flavors and Deployed Extensions
- Flexibility and modularity for the win! and also extensions’ ecosystem (2015 Jun): AST transforms etc.
- Sorely confused about "Extensions" (2015 Sep through 2016 Mar): brief talk about extension (syntax), module (syntax), addon (implementation)
- Can I Use __ ? Markdown Version (w/ CommonMark) (2016 Mar): announcement of an off-site documentation project
- Hooks for extensions (2016 Nov): about extending the reference implementations in particular
- Extension spec as part of the Commonmark spec (2017 May): same gist as this thread, single post
- A convention for flavor declaration (2017 Oct): wants unobtrusive “namespacing”

Crissov · October 19, 2018, 8:44am

Just for the record, I’m keeping my documentation in the extensions branch of my fork of the CommonMark spec repository for now.