Enumerated lists without explicit number, ATX headings with explicit number


#1

Gruber originally described a single way to markdown numbered lists: 1.. Commonmark specifies another alternative: 1). They also differ in that CM supports start values other than 1, but not value overrides (like HTML does): 2. 4. = 2. 3.. A third alternative has been proposed without much objection, but it hasn’t been adopted (yet): (1).

Several flavors of MD support alphabetic (Latin letter) values: a. or A). Some of them also support roman numerals: i. or I) – this is bound to clash if start offsets are to be supported as well, so let’s ignore that part. There has even been a proposal to support lists without line markers for each item.

Since it’s not very intuitive to use 1. or 1) for each and every item if you don’t want to keep track of the progression by hand, and it’s also confusing and a waste of time if you do try to manually number each item correctly, some flavors support the number symbol to replace digits: #. and perhaps #) as well. The CM spec now requires a space after the hash mark(s) when they shall indicate a heading, but many existing parsers don’t enforce that rule, so this syntax is asking for a bit of trouble. It is helpful, nevertheless.

As I see it, the digit is merely the value of an anonymous attribute. The real marker is just the dot . or the right/closing parenthesis ). Could we make that pseudo-attribute optional, i.e. leave it empty?

This way, automatically enumerated lists could look like these:

. Apples
. Oranges
. Bananas

) Pears
) Peaches
) Kiwis

Luckily, both punctuation marks used here are very unlikely to appear accidentally at the start of a line, because they almost always follow alphanumeric characters directly without intervening whitespace. It’s therefore actually more likely to get false matches with the existing syntax!
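To make the idea concrete, here is a minimal Python sketch of how a marker with an optional number could be recognized. The pattern, the limit of three leading spaces and the names MARKER and parse_item are assumptions made for illustration; none of this is taken from the CommonMark spec or from any existing parser.

```python
import re

# Hypothetical pattern: up to three leading spaces, an optional run of
# digits, then "." or ")", at least one space, and some content.
MARKER = re.compile(r"^ {0,3}(?P<num>\d{0,9})(?P<delim>[.)]) +(?P<text>\S.*)$")

def parse_item(line):
    """Return (number or None, delimiter, text), or None if not a list item."""
    m = MARKER.match(line)
    if m is None:
        return None
    # An empty digit run means the author left the number out.
    num = int(m.group("num")) if m.group("num") else None
    return num, m.group("delim"), m.group("text")
```

With this sketch, ". Apples" and ") Pears" parse as items without a number, while a line that merely ends in a period is not matched because the marker has to sit at the start of the line.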

If this should not be incorporated into the main specification, at least not before version 1.0, does it have to make provisions now so that this feature could be included later or in an extension?


#2

IMO #. is just right.

I recall (although I can’t find the question ATM) somebody on Stack Overflow Meta explicitly requesting support for this syntax without ever being aware of Markdown.

Looking back at every implementation in existence wouldn’t get CM very far. #[a-zA-Z] could be a hashtag, but I can’t think of any other reasonable interpretation for #. (or #) or (#)).

How about that (annoying) habit some people have of preceding punctuation with spaces ?

Combine that with wrapping, and voilà!


#3

Ah, right, I forgot about French spacing. It shouldn’t affect periods as much as exclamation and question marks, though, and I’m not sure about parentheses.

It’s not that #. would be misinterpreted as a hashtag, but a heading almost everywhere. (#) would be safe and unambiguous, actually.

I also forgot to mention that Mediawiki syntax has # as line marker for enumerated lists.


#4

Since in Markdown readability is emphasised above all else, I think we should be careful about introducing an additional syntax which is less intuitive without first learning a special syntax. For this reason, I would be hesitant to add . or ). #. maybe, since the # sign is commonly used to represent a number, but it could be confusing since # is also used for headings.


#5

I second this—valid and important—argument and would like to point out that introducing “#” in a role “as-if” it is a numeral would complicate parsing: and I don’t mean just what a “real” CommonMark parser does, but also for example the use of regular expressions in editors (like vi) and simple scripts (like awk). Where you now have something like

marker = (space , space ?) ? , digit + , "." , space +

you’d have to use something akin to this:

marker = (space , space ?) ? , ( "#" | digit + ) , "." , space +
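Translated into Python regular expressions, the two rules might look like the sketch below. The names DIGITS_ONLY and WITH_HASH are made up for this example, and "space" is assumed to mean a plain U+0020.

```python
import re

# marker = (space , space ?) ? , digit + , "." , space +
DIGITS_ONLY = re.compile(r"^ {0,2}[0-9]+\. +")

# marker = (space , space ?) ? , ( "#" | digit + ) , "." , space +
WITH_HASH = re.compile(r"^ {0,2}(?:#|[0-9]+)\. +")
```

The second pattern accepts everything the first one does, plus a single "#" in place of the digits; a run like "##." is rejected by both.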

The current state of affairs can be summarized like this:

  • What numerals the author writes into the “item markers” does not matter; they get replaced by whatever numeral the style sheet or rendering procedure will generate;

  • Except that the numeral in the first item of a list is transmitted in the start attribute of the CommonMark <list> or HTML <OL> element (as a decimal numeral without leading zero digits).

So it is the job of the style sheet or rendering procedure to start the list with the correct counter value, while subsequent items will be marked with the incremented counter values.
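As a small Python sketch of this behaviour (the function name rendered_numbers is hypothetical, and the numerals are assumed to be already parsed into integers):

```python
def rendered_numbers(markers):
    """Given the numerals written in the source, return the numbers displayed."""
    if not markers:
        return []
    start = markers[0]  # only the first numeral is transmitted ("start")
    # Every later numeral is ignored; the renderer simply counts upwards.
    return [start + i for i in range(len(markers))]
```

So a list written as "2." followed by "4." comes out as 2, 3, which is the missing "value override" mentioned in the first post.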


I don’t know what “anonymous attribute” is intended to mean here, but the input “marker” numeral of the first list item in an ordered list is certainly not an “anonymous attribute”: the attribute’s name is “start”.

It should be noted however, that the “start” attribute of <OL> is already deprecated in W3C HTML 4.01, and is absent in ISO 15445:2000 HTML—funny enough, it seems to be back, alive and kicking, in HTML5.

But that’s not a CommonMark definition problem, but a problem of how to map the parsed content into HTML: the “start” counter information is recognized and output by a CommonMark processor in any case.


That said, it is beyond me how one can find that

to use 1. or 1) for each and every item

is not very intuitive, and at the same time can seemingly find the notation

) Pears
) Peaches
) Kiwis

more “intuitive” (while being mute about what the start counter value of such a list should be—presumably “1” …).

Why not just use a convention like maybe

1. Apples
0. Oranges
0. Bananas

Is a DIGIT ZERO still “too much information” and “not intuitive” enough?

In my mind this at least makes absolutely clear that the “0” in the second and subsequent items is not the “real” item number; and for my taste it does look better and easier to understand than using a sole ". " or ") ", let alone "#) ", as an “item marker”.


#6

It’s just as counter-intuitive to use 0 for the third or seventeenth item as would be 1.

As for my concept of attributes: in MD/CM all attributes would be anonymous (although there are proposals and extensions to the contrary). In XML none are. MD/CM line markers could have but a single attribute (info strings of fences may consist of more). Currently no other line marker supports an attribute, but some MD flavors effectively use them, e.g. A> or key:. The exclamation mark that differentiates embedded from normal hyperlinks could also be seen as a single-character attribute value, although there have been proposals that put a more explicit value between itself and the opening square bracket, e.g. !video[](clip.mp4).

Disclosure: I have a generic parser still in the making that works somewhat like this (and therefore cannot support CM perfectly since whitespace after any marker, incl. >, is mandatory for instance):

line:       indent? prefix? contents? suffix? trailing?;
indent:   ( space space? space? | tab );
prefix:   ( open? attribute? close | attribute? affix ) indent;
contents: ( phrase space )* phrase;
suffix:     trailing ( open attribute? close? | affix attribute? );
trailing:   space;
attribute:  alphanum*;
open:       '(' | '[' | '{' | '<';
close:      ')' | ']' | '}' | '>';
affix:      '*' | '-' | '+' | '.' | '#' | …; # all non-paired ASCII punctuation

#7

Okay, so your position is—correct me if I’m wrong:

  1. Using "0) " or "1. " is counter-intuitive,
  2. while ") " or ". " is (somehow) intuitive enough;
  3. but typing the correct individual numeral is too much work in any case,
  4. and there are no tools available or even imaginable to do so,
  5. therefore the CommonMark syntax should be changed.

I can’t answer anything (reasonable) to that …

[And note that I just did the seemingly impossible, and did type “⎵1.⎵”, “⎵2.⎵”, “⎵3.⎵”, “⎵4.⎵”, and “⎵5.⎵” into my browser’s shabby text area box: that’s five different numbers! :stuck_out_tongue_winking_eye: ]

Obviously, “⎵” is meant to represent SPACE.


As far as I can tell from your explanation of what you mean by “attributes” (and I guess that “MD/CM” means “Markdown/CommonMark”, not “1500/900”, right? :wink: ), it seems that attribute is first of all the name of a non-terminal in your grammar; and if I understand correctly, you then go on and say that the matching sub-string for this production (ie, the alphanum*) is the value of this “anonymous attribute”?

And I guess that by “line marker” you must mean a (missing) non-terminal, possibly like this:

line: line-marker contents? suffix? trailing?;
line-marker: indent? prefix?;

Now your grammar’s prefix production obviously generates the strings “.⎵” and “)⎵”, but in an ambiguous way:

  • either attribute is missing (it is marked by ? as always being optional),
  • or attribute is there, but matches an empty sequence of alphanum – (which I guess is the usual character class).

I have no idea what the undefined phrase could be, except that it ought to contain the important stuff.


All in all I honestly fail to see your point here: if the CommonMark specification and your grammar (or parser) disagree: why should the former need to change?

Btw, I’m actually interested in a (context-free or not) grammar that could correctly define CommonMark (and have some unpublished material on this myself), but your grammar so far looks not that similar to what I would expect.


#8

Not quite. The syntax should optionally allow authors to omit the number if they find . more intuitive than repeated 1. and less cumbersome than keeping track of the correct number (e.g. when changing the order) and their setup doesn’t automate list generation. One purpose of CM/MD is that it’s easy to type by hand without dedicated tools.

Line marker wasn’t required for the grammar, it would be prefix || suffix, but for simplicity you may as well assume it’s the same as prefix. Since attributes are always anonymous, attribute indeed represents the value only. You’re right, since attribute is always optional, it should be alphanum+. That doesn’t change much, though, because it can still contain an arbitrary string of characters [A-Za-z0-9] (or something similar to that). I left out phrase because it doesn’t matter here, being the inline contents.

I disclosed that grammar for informative purposes only, to show my mental model of “simple markup languages”, because you seemed confused by my notion of an anonymous pseudo-attribute. It’s not the reason for proposing “naked” . and ) line markers.


#9

Thank you, I had D omitted from my Roman list implementation. Now that I added it, the following test case fails:

c. First item
d. Second item

It should be a Latin list with start=3, but is instead interpreted as a Roman list with start=100. Should I require start=1 in alphabetical lists?


#10

Ok, and yes: I was in fact confused by your use of the term “attribute”, which has a well-defined meaning in the context of XML, and also is used in the CommonMark specification with the exact same meaning. You are of course free to name a non-terminal in a concrete syntax “attribute” (a perfectly reasonable name!), but don’t expect everyone to understand that you mean this from now on when you say “attribute” :wink: – But I think I see now where you’re coming from, thank you.


But I still can’t quite follow your argument

[…] less cumbersome than keeping track of the correct number (e.g. when changing the order) and their setup doesn’t automate list generation. One purpose of CM/MD is that it’s easy to type by hand without dedicated tools.

I’ve been writing “CM/MD” for years now, most of the time using “tools” like Vim (sometimes through the “It’s all text!” or “External editor” Mozilla add-ins, which I do warmly recommend), but often (right now!) I just type into an HTML text area the old-fashioned, stone-age way.

And not once did I miss the ability to re-number items in an ordered list there (but I do this regularly in Vim, using a filter in “:%!command” and similarly).

These two (personal) ways of typing Markdown text correspond IMO to two (objective) roles one can assign to the Markdown text as written:

  1. As just a vehicle to get the content into whatever form is wanted: that’s what I’m doing right now, hacking into the commonmark.org text area, and wrestling as always with the shi^H^H^H capricious Markdown parser used here.

  2. As a plain-text document of its own right: the Markdown syntax was designed with this in mind (too), and one can in fact format Markdown text in a way that the “source code” is quite clean and readable: that’s what I’m commonly aiming at when writing in Vim, and this includes operations like reformatting paragraphs (line breaking and indentation), using UTF-8 instead of character references, renumbering sections and list items, and so on.

You seem to assign neither role to your text: on the one hand, you complain about the “counter-intuitive” look of some part of the syntax, but on the other hand you decline to use the perfectly clean and intuitive-looking alternative (out of laz^H^H^Hconvenience or lack of proper tools).

You said nothing about “continuation lines” or line breaking: do “sloppy” lines (as allowed in Markdown) look “intuitive” in your mind? Why would you be content with the one (line format) but not the other (item numbers), seemingly lacking any tool to improve any of this? Do you ever use section numbers in your section headings—without any support from Markdown?


This reminds me of this discussion here about introducing the backslash-escape sequence “\⎵” (ie REVERSE SOLIDUS followed by SPACE) to denote NO-BREAK SPACE in CommonMark.

Typing “&nbsp;” or “&#160;” (let alone using a decent editor, or a simple pre-processing step like global search/replace) just seemed to be unacceptable for the original poster.


#11

Hehe – but I also could only remember that “D” had a value of maybe 500, but not whether “MD” was valid, or what value it would denote :wink:

Maybe “.1500” would make a nice and geeky file name extension for Markdown texts (and “.900” for CommonMark)? Or, using base 16 for increased nerd factor and at the same time staying inside the (historic) three-character width limit—how about this:

“markdown.5dc” and “commonmark.384”? :slight_smile:


c. First item
d. Second item

It should be a Latin list with start=3, but is instead interpreted as a Roman list with start=100. Should I require start=1 in alphabetical lists?

Gee, what kind of Markdown or CommonMark parser would interpret c. as a roman item numeral denoting 100? Your implementation?

If you ask me, and if you are not kidding, and if you insist on supporting roman numerals in Markdown list item labels: I’d recommend, when in doubt, interpreting those labels alphabetically:

  • A list starting with an item labelled i. or ii) may well be meant to use roman numerals, but

  • the markers c. and d. (say: any syntactically valid roman numeral denoting the number 100 or greater) are probably meant to be just alphabetic item labels.
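A minimal Python sketch of this “when in doubt, alphabetic” rule, applied to the label of the first item only. The threshold of 100 comes from the bullet above; the function names and the decision to reject multi-letter labels that are not small roman numerals are assumptions of this sketch.

```python
import re

# Strict roman numerals up to 3999 (upper case).
ROMAN = re.compile(r"^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$")
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

def roman_value(label):
    """Return the value of a valid roman numeral, else None."""
    s = label.upper()
    if not s or not ROMAN.match(s):
        return None
    total = 0
    # A symbol counts negatively when a larger symbol follows it (IV, XC, ...).
    for ch, nxt in zip(s, s[1:] + " "):
        v = VALUES[ch]
        total += -v if VALUES.get(nxt, 0) > v else v
    return total

def classify_first_label(label):
    """Return ("roman", n), ("alpha", n), or None for the first item's label."""
    value = roman_value(label)
    if value is not None and value < 100:
        return ("roman", value)
    # Anything else is read alphabetically if it is a single ASCII letter.
    if len(label) == 1 and label.isascii() and label.isalpha():
        return ("alpha", ord(label.lower()) - ord("a") + 1)
    return None
```

Under these assumptions “i.” and “ii)” start roman lists, while “c.” yields an alphabetic list with start=3 rather than a roman one with start=100.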

HTH :wink:


#12

Yes, and I am not kidding. My implementation is consistent with CommonMark, which allows arbitrary numerals (albeit in a more restricted sense) to be used as ordered list markers. Roman numerals have precedence over Latin letters, since i. would otherwise be interpreted as <ol type="a" start="9">.

Adding a

e. Third item

would of course make that an unambiguously Latin list.

This is an edge case that will probably never occur in the field, but it’s still a perfectly valid test case (which fails ATM).


#13

It was news to me that CommonMark would allow roman numerals in ordered lists at all, let alone “arbitrary” numerals, even “in a restricted sense”, whatever that means.

Adding a

e. Third item

would of course make that an unambiguously Latin list.

This is an edge case that will probably never occur in the field, but it’s still a perfectly valid test case (which fails ATM).

Did I understand this right? The interpretation of all numerals in all the list’s items would or could change if you add, remove, or re-label one item anywhere in the list?

That’s an “interesting” design decision!

Aside: what does “ATM” stand for? “At the marketplace” maybe? [ Sorry, couldn’t resist :wink: ]


#14

Hmmm. I have been chewing on this “#” usage a bit, and this is what I would like to throw into the discussion now:

  1. Introduce “#” as syntactically equivalent to a decimal digit (that is, the character class digit would include “#” as the 11th member);

  2. Assign a “preliminary meaning” to this “#” character that could be vaguely described as “a placeholder for an unspecified digit or leading space in a numeral that is to be generated by some automatic numbering mechanism”;

  3. Allow items in an “ordered” list to be labelled with “#.⎵” or “#)⎵”, meaning that “#” is to be replaced by a numeral, namely

  • by the numeral “1” in the first item in a list, and
  • by the succeeding numeral to the one used in the previous item in all other list items.

Syntactically, a numeral in the CommonMark syntax would then be any non-empty sequence of “digits” from DIGIT ZERO to DIGIT NINE and including NUMBER SIGN.

Or maybe rather either a sequence of DIGIT ZERO to DIGIT NINE, or a single NUMBER SIGN character?
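The replacement rule in point 3 could be sketched in Python roughly as follows. The function name resolve_list_numbers is hypothetical, and it is assumed that explicit numerals after the first item are ignored, as they are in current CommonMark.

```python
def resolve_list_numbers(labels):
    """Resolve labels such as ["3", "#", "1"] to the numbers to display."""
    resolved = []
    for index, label in enumerate(labels):
        if index == 0:
            # The first item sets the start value; "#" stands for 1.
            number = 1 if label == "#" else int(label)
        else:
            # Every later item gets the successor of the previous number.
            number = resolved[-1] + 1
        resolved.append(number)
    return resolved
```

Here "#" only makes a difference in the first item, where it selects 1 as the start value; everywhere else it is just a more honest way of writing a numeral that gets replaced anyway.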


At first sight this generic meaning assigned to “#” (to exactly which occurrences anyway?) seems to clash head-on with the use of “#” in ATX headings as specified currently for CommonMark. But then, writing section headings with decimal numbers followed by two spaces, in the form like this:

2⎵⎵Overview

Lorem⎵ipsum⎵dolor⎵sit⎵amet,⎵consectetur⎵adipiscing⎵elit.

2.1⎵⎵Preliminaries

Cras⎵ligula⎵velit,⎵imperdiet⎵vitae⎵congue⎵at,⎵pretium⎵et⎵lectus.

is a long-standing convention to highlight sections and subsections used in typescripts.

Now comparing this with the ATX syntax, one could derive the syntax rules:

  1. One or more “#”, followed by a single SPACE, and then a non-whitespace character, starts an ATX heading.

  2. A single number, optionally followed by multiple ( “.” , NUMBER ), and then by two SPACES and a non-whitespace character, starts a “numbered section heading”.

That is, the example above could be input (to request automatic numbering)

#⎵⎵Overview

Lorem⎵ipsum⎵dolor⎵sit⎵amet,⎵consectetur⎵adipiscing⎵elit.

#.#⎵⎵Preliminaries

Cras⎵ligula⎵velit,⎵imperdiet⎵vitae⎵congue⎵at,⎵pretium⎵et⎵lectus.

Or—if the top-level number is fixed and should not be replaced by a generated number—one could mix “#” placeholders and “genuine” numerals:

2⎵⎵Overview

Lorem⎵ipsum⎵dolor⎵sit⎵amet,⎵consectetur⎵adipiscing⎵elit.

2.#⎵⎵Preliminaries

Cras⎵ligula⎵velit,⎵imperdiet⎵vitae⎵congue⎵at,⎵pretium⎵et⎵lectus.

So far, an ATX heading and a “numbered top-level section heading” are distinguished only by the number of SPACEs preceding the title text:

#⎵This is an ATX heading

Lorem⎵ipsum⎵dolor⎵sit⎵amet,⎵consectetur⎵adipiscing⎵elit.

#⎵⎵This is a numbered heading of the first level

Cras⎵ligula⎵velit,⎵imperdiet⎵vitae⎵congue⎵at,⎵pretium⎵et⎵lectus.

Note that “numbered section headings” of subordinate levels are easily recognized by their use of FULL STOP between numerals (or placeholders):

#.#⎵⎵This is a numbered heading of the second level

Cras⎵ligula⎵velit,⎵imperdiet⎵vitae⎵congue⎵at,⎵pretium⎵et⎵lectus.
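The distinction drawn so far can be sketched as a small Python classifier. This is only an illustration of the two rules as stated, with "#" accepted as a placeholder for a number; the names ATX, NUMBERED and classify_heading are invented, the limit of six "#" is borrowed from ATX headings, and plain spaces are used instead of “⎵”.

```python
import re

# Rule 1: one or more "#", exactly one space, then a non-whitespace character.
ATX = re.compile(r"^(#{1,6}) (\S.*)$")

# Rule 2: numbers or "#" placeholders separated by ".", then exactly two
# spaces and a non-whitespace character.
NUMBERED = re.compile(r"^((?:#|\d+)(?:\.(?:#|\d+))*)  (\S.*)$")

def classify_heading(line):
    """Return (kind, level, parts, title) or None if the line is no heading."""
    m = NUMBERED.match(line)
    if m:
        parts = m.group(1).split(".")
        return ("numbered", len(parts), parts, m.group(2))
    m = ATX.match(line)
    if m:
        return ("atx", len(m.group(1)), None, m.group(2))
    return None
```

Note that the only thing separating “# Title” from “#  Title” in this sketch is the second space, which is exactly the visual-similarity concern raised in this post.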

For ordered lists, the “syntactic situation” is simpler:

3.⎵A first-in-list item, setting the 'start' attribute to "3";
#.⎵An "automatically numbered" item.
1.⎵This one would also be "automatically re-numbered" in the output.

A paragraph, just to interrupt the ordered list.

#.⎵Here is the question: should this item be numbered "1", 
⎵⎵⎵or should it *continue* the number sequence of the "interrupted" list?

I’m not sure whether top-level “numbered section headings” with a single placeholder “#” of this form are (in particular: visually) “different enough” from top-level ATX headings using this convention.

Apart from that, I see no reason why this use of “#”

  • in “numbered section headings” and
  • in “numbered list items”

should cause problems (which couldn’t be avoided by an appropriate specification).

At least I would much prefer this use of “#” to the “truncated” labeling of list items with “.⎵” or “)⎵” alone.


#15

The syntax of ATX headings is a good argument in favor of making the number in enumerated lists optional, because headings are usually numbered as well and authors do not have to provide these ordinal numbers manually – they cannot, actually. The difference is that the hierarchic level of a list is determined by its indentation relative to a parent list, whereas the level of a heading – since these are usually not neighbored directly by another heading – is determined by the number of line markers. Other languages, e.g. Mediawiki syntax, adopted the latter convention for both.

For the record, with my generalized anonymous attribute model, overrides for ATX headings would look like this:

# 1st-level heading with implicit number “1.”
3# 1st-level heading with explicit number “3.”
## 2nd-level heading with implicit number “3.1.”
5#0# 2nd-level heading with explicit number “5.0.”
#2# 2nd-level heading with explicitly numbered 2nd level
    and implicitly numbered 1st level “5.2.” (not “6.2.”)
6## 2nd-level heading with explicitly numbered 1st level
    and implicitly numbered 2nd level “6.1.” (not “6.3.”)
82## 2nd-level heading with explicitly numbered 1st level
     and implicitly numbered 2nd level “82.1.” (not “8.2.”)
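For illustration, a minimal Python sketch of how such a heading prefix could be split into its per-level overrides. It assumes that each "#" may be preceded by an optional run of digits and that at least one space follows the last "#"; the names HEADING and parse_heading are hypothetical.

```python
import re

# One or more groups of "optional digits + #", then spaces, then the title.
HEADING = re.compile(r"^((?:\d*#)+) +(\S.*)$")

def parse_heading(line):
    """Return (level, overrides, title); None in overrides means implicit."""
    m = HEADING.match(line)
    if m is None:
        return None
    # Each "#" contributes one level; the digits before it are the override.
    overrides = [int(d) if d else None
                 for d in re.findall(r"(\d*)#", m.group(1))]
    return len(overrides), overrides, m.group(2)
```

Under these assumptions “82## Title” is a 2nd-level heading whose first level is explicitly 82 and whose second level is implicit, matching the last example above.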

Now I’m not sure whether headings, unlike list items, would be allowed to have a lower number than the preceding one:

2# 1st-level heading “2.”
1# 1st-level heading “1.” (explicit) or “3.” (implicit)?

#16

@Crissov:
Hmm:

  1. By overrides for ATX headings you really mean “variant interpretation of ATX heading syntax” here, right? The first line in your example, ie the syntax “#⎵Lorem ipsum”, would be interpreted as an unnumbered “ATX heading” by Markdown and CommonMark rules, while you (or your grammar or implementation) would generate “1.⎵Lorem ipsum” as the character data in the output heading element – did I understand this correctly so far?

  2. Why not preserve the “ATX heading” syntax and interpretation by making the—admittedly hard to see—distinction between input text like “#⎵Lorem ipsum” (an “ordinary” ATX heading) and “#⎵⎵Lorem ipsum” (an automagically numbered section heading)? As I said, the two spaces after the section number is already a well-entrenched typewriting convention.

  3. Regarding the generated section numbers in your example: the “decimal-hierarchical” section numbers should not end in a FULL STOP; the “.” is only used to separate the numerals pertaining to the hierarchical structure. See for example the—still current!—ISO 2145:1978 on “Numbering of divisions and subdivisions in written documents”, or any number of “style guides” for structuring documents like technical reports.

  4. What is the advantage of using “##” instead of “#.#” to indicate a second-level heading? Other than redefining the interpretation of ATX heading syntax, that is?

  5. Why do “##”, “5#0#” and “#2#” all denote 2nd-level headings? If the former is meant to imply “two levels”, then I’d rather see four and three levels in the latter two notations. What is wrong in your opinion with a syntax like the following, with the exact same interpretation as in your examples (and listed in the same order), as far as I understand your terms “implicit” and “explicit” right:

  • “#⎵⎵”,
  • “3⎵⎵”,
  • “#.#⎵⎵”,
  • “5.0⎵⎵”,
  • “#.2⎵⎵”,
  • “6.#⎵⎵”, and finally
  • “82.#⎵⎵” ?

Apart from your re-definition of the “ATX heading” syntax, I find your notation rather hard to recognize, use, and explain from the perspective of a CommonMark author.

Now I’m not sure whether headings, unlike list items, would be allowed to have a lower number than the preceding one:

One obvious example where allowing this would be useful is a table of contents. Consider a document where each section heading appears twice: once in the TOC, once in the document body. With a “dumb” automatic numbering scheme for section heading (which takes decimal numerals “literally”!), and with a distinction between ATX headings and numbered section headings(!), the following “extended” CommonMark input would be processed in a reasonable, meaning: the intended, way—at least as far as I can brain-parse it:

#⎵Contents⎵#

0⎵⎵Introduction
#⎵⎵Section one
#.#⎵⎵Sub-section one
#.#⎵⎵Sub-section two
#⎵⎵Section two

0⎵⎵Introduction

Lorem ipsum dolor sit amet, consectetur adipiscing elit.

#⎵⎵Section one

Donec a diam lectus. Sed sit amet ipsum mauris.

#.#⎵⎵Sub-section one

Maecenas congue ligula ac quam 
viverra nec consectetur ante hendrerit. Donec et mollis dolor. Praesent 
et diam eget libero egestas mattis sit amet vitae augue.

#.#⎵⎵Sub-section two

Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed 
arcu vehicula ultricies a non tortor.

#⎵⎵Section two

Aenean ut gravida lorem. Ut turpis felis, 
pulvinar a semper sed, adipiscing id dolor. Pellentesque auctor nisi id 
magna consequat sagittis.
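One way to read the “dumb” numbering scheme used in this example is sketched below in Python. The rules are assumptions of the sketch, not a specification: a final "#" increments the counter of its level, a non-final "#" keeps the current value of its level, a literal numeral is taken as written, and counters below a heading restart. The function name number_headings is made up.

```python
def number_headings(labels):
    """Resolve labels such as "0", "#", "#.#", "2.#" to section numbers."""
    counters = []
    result = []
    for label in labels:
        parts = label.split(".")
        depth = len(parts)
        # Make sure every level of this heading has a counter.
        counters += [0] * (depth - len(counters))
        for level, part in enumerate(parts):
            is_last = level == depth - 1
            if part == "#":
                if is_last:
                    counters[level] += 1
                # A non-final placeholder keeps the current counter value.
            else:
                value = int(part)
                if not is_last and value != counters[level]:
                    # A different parent number restarts the levels below.
                    counters[level + 1:] = [0] * (len(counters) - level - 1)
                counters[level] = value  # literal numerals are taken as-is
        del counters[depth:]  # deeper levels restart under this heading
        result.append(".".join(str(c) for c in counters))
    return result
```

Fed with the labels of the example ("0", "#", "#.#", "#.#", "#" twice in a row), the literal "0" of the second “Introduction” resets the top-level counter, so the table of contents and the body receive the same numbers.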

#17

Re 1. Whether a heading is numbered in the (rendered) output format is (currently) of no concern to Commonmark. In HTML, for instance, it completely depends on the associated CSS. Overriding the number would, of course, not put a literal number into the textual contents, but fill an attribute in the AST which may or may not be available in the final output format. (May be <h1 data-number=3> in HTML5 or <h1 style="counter-reset: chapter 3"> with inline CSS.) So, no, you did not understand me correctly.

Re 2. The alleged two-spaces convention is hardly less user-unfriendly than the original Markdown line break syntax with two spaces. Many people use the available freedom of indentation and whitespace truncation without applying meaning to it.

Re 3. Yeah, I usually follow that convention (but not everyone does). It probably slipped over from thinking about enumerated lists. My bad.

Re 4. I’m not saying Commonmark should adopt that syntax, but if you treated numbers in enumerated lists as the value of an anonymous attribute (like I proposed) then that would be a logical conclusion. Your #.# would be yet another syntax which could make authors confuse headings and list items if the latter also adopted the #. syntax. With my syntax, # is exclusive to headings and . is exclusive to lists.

Re 5. You see runs of digits as valid line markers. I don’t. I only consider ASCII punctuation marks as possible line markers. Alphanumerics can only be attribute values or proper textual content (and punctuation marks like # cannot). Your mental model is obviously different from mine, thus it’s not surprising that you find my notation “hard to recognize, use, and explain”. It is consistent, though.


#18

This is absolutely correct, but has nothing to do with my argument: by conflating the syntax of “ATX headings” and “numbered section heading”, you change the expressiveness of CommonMark in this regard from formerly

  • there’s only one kind of heading, and it may or may not be rendered with automatic numbering.

(ignoring setext headings for now) to the new situation:

  • there’s only one kind of heading, and it will be rendered with automatic numbering (at least that’s the stated intent).

Maybe you could point to an advantage incurred by making ATX headings and numbered section headings indistinguishable in the input syntax?


The alleged two-spaces convention is hardly less user-unfriendly than the original Markdown line break syntax with two spaces.

If insinuating that I made-up an alleged two-space convention is all you can bring as (or rather in place of) an argument, it primarily shows that you have no clue how section headings were actually type-written for decades, and are printed today.

And your crude assertion that this is hardly less user-unfriendly than two SPACE characters at the end of a line is utterly baffling in another way: do you really fail to see the difference between

  • one or more SPACE characters between graphical characters on the one side, and
  • one or more SPACE characters followed by an—usually invisible—end-of-line control character?

If so, what do you think might be the reason for the existence of a variety of typographical spaces that has been in use not just for decades like more-or-less modern typewriters, but rather for centuries?


You see runs of digits as valid line markers. I don’t. I only consider ASCII punctuation marks as possible line markers. Alphanumerics can only be attribute values or proper textual content (and punctuation marks like # cannot).

Hacking my way through this jungle of jargon, and using the assumptions that

  1. “line marker” is a prefix in an input line that signals a specific “type” or interpretation of this line;

  2. “attribute values” mean again a (maximally?) matching sub-string of characters in the alphanumeric class contained in such a prefix;

  3. “proper textual content” is that portion in the input syntax that ends up as element character content in the output;

I guess my answer is: no, I do not see runs of digits as valid line markers any more than you do. But I would in fact see

numbered section marker = BOL , Digit , { Digit } , { "." , Digit , { Digit } } , SPACE , SPACE ;

as a valid “line marker” in your assumed sense, I think (but without any “anonymous attributes”! :wink: )

[ Here “BOL” means a terminal symbol providing an anchor point at the beginning of an input line, as you may have guessed, SPACE means, well, the terminal symbol U+0020 SPACE, and Digit means the character class containing the ten decimal digits DIGIT ZERO … DIGIT NINE, just to avoid any misunderstandings. Note that NUMBER SIGN plays no role in this simplified syntax rule. I assume you can read and understand EBNF. ]
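For readers who prefer regular expressions to EBNF, the rule above corresponds roughly to the following Python pattern. The name NUMBERED_SECTION_MARKER is invented for this sketch, and "^" is assumed to play the role of BOL when a single line is matched.

```python
import re

# BOL , Digit , { Digit } , { "." , Digit , { Digit } } , SPACE , SPACE
NUMBERED_SECTION_MARKER = re.compile(r"^[0-9]+(?:\.[0-9]+)*  ")
```

It matches the start of “2  Overview” and “2.1  Preliminaries”, but not the list item “2. Foo”, because no FULL STOP may directly precede the two spaces.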


Your mental model is obviously different from mine, […]

That’s my impression too, and maybe one reason for this is that you keep throwing around your home-made terms, like “overriding”, “fill an attribute”, “treat a number as the value of an anonymous attribute”, “line markers”.

[…] thus it’s not surprising that you find my notation “hard to recognize, use, and explain”. It is consistent, though.

Consistent it may well be, but I’m sure I’m not the only one who finds your notation at least “harder to recognize, use, and explain”.

What makes me sure in this regard is the simple observation that it is easier to visually recognize N sub-strings consisting of only decimal digits and NUMBER SIGN when they are separated by N - 1 FULL STOP characters, than it is to recognize N sub-strings when they are represented by a string comprising anything from N up to 2 × N characters (or more!) whose glyphs do all have the same vertical extent—which is what your proposed syntax produces.

In other words, you require one to “find and count the NUMBER SIGNs”, and I require one to “count the numerals separated by FULL STOP”: please note that the latter is precisely what everybody who is reading a decimal numbered section heading is doing day in and day out when reading any document that uses the most common typographical style to show section numbers in current practice …

So on the contrary, I would say it would be “not surprising” if pretty much everybody would see the difference in readability the same way I do.

But maybe we also have very different mental models about human recognition abilities, or would you accept for once my argument here?


[ Edit: Ups, I forgot to answer this one, sorry … ]

I’m not saying CommonMark should adopt that syntax, […]

You are not? Then what are we discussing here after all?

[…] but if you treated numbers in enumerated lists as the value of an anonymous attribute (like I proposed) then that would be a logical conclusion.

Assuming that this implication is true, it seems to form a nice argument why one should not treat numbers in enumerated lists as the value of an anonymous attribute [whatever…] in the way you proposed, just in order to avoid the “logical consequences”.

Your #.# would be yet another syntax which could make authors confuse headings and list items if the latter also adopted the #. syntax. With my syntax, “#” is exclusive to headings and . is exclusive to lists.

I would rather say: my “#.#⎵⎵Foo” is easy to confuse with “#.⎵Foo” precisely to the degree in which, for example, “1.2⎵⎵Foo” and “2.⎵Foo” are easy to “confuse”. And this syntax is “yet another syntax” to precisely the same extent as numbering sections according to ISO 2145 is, compared to numbering list items: both are very common practices in vanilla documents which you can hardly call “obscure” or “easy to confuse”, or can you?

Are you really saying that you have difficulties telling numbered section headings and numbered list items apart, as they appear in everyday (technical, presumably) documents? Or what is the advantage you are trying to achieve when “#” is exclusive to headings and “.” is exclusive to lists?


#19

Some people enter two consecutive spaces in various places deliberately, some of them even expect that to have a certain effect in output (and sometimes it does). Many more people, however, only accidentally type more than a single space and usually (i.e. in current Commonmark) they won’t notice that in rendered output either. (Unlike you and me, they may be using a proportional font for markdown source.) It’s therefore harmful to apply meaning to something like this, especially when it had always been valid to put more than one space between line marker and textual content without consequences.

Think about it, why did you choose to display a literal space by ⎵? Because the difference is too damn hard to see otherwise!

I also see your point that you’re not considering the numbers to be the line markers, but the double space. Alas, I don’t think that makes it any better as explained above.

Finally, I agree that the most natural way to markdown a numbered heading is with numbers separated by single periods in front of the textual content of the heading, even if it can look much like an isolated list item. However, since the number signs in ATX headings are not separated in any way, this would establish a third way to denote headings in Commonmark, except for the top level obviously. I deny the applicability of the double-space convention to Commonmark, therefore – even if something like this was adopted – it depends on the parser settings and output format capabilities whether any of the following headings is actually rendered with a leading number.

A setext heading that may be numbered automatically
====
# An “implicit” ATX heading that may be numbered automatically (spec)
#  An “implicit spaced” ATX heading that should be numbered implicitly but may be not (tin-pot)
4 An “explicit” ATX heading that should be numbered explicitly but may be not (natural)
5  An “explicit spaced” ATX heading that should be numbered explicitly but may be not (tin-pot)
6# An “attributed” ATX heading that should be numbered explicitly but may be not (Crissov)

PS: I’m saying that too many people would not see the difference between 1.2 and 2. introducing a line, same goes for #. As you’ve pointed out earlier, even I did mess it up before, although I’m usually well aware of the conventions.


#20

We’re right in the middle of personal preferences here, and I wouldn’t want to guess what “some people” do or don’t, and why they would, and use my guesstimates about it as an argument.

The one argument not based on personal taste (and personal typing habits or personal visual recognition capabilities for that matter) that I offered—namely the several decades of using exactly this convention in typewritten documents—you either fail to understand or you disregard for some unstated reason.

And I for one fail to see what’s so hard about seeing the difference (while editing text) between one SPACE and more than one SPACE following a Digit or NUMBER SIGN. This has nothing to do with my use of “⎵” to emphasize certain SPACE characters in examples: I can assure you that I can see said difference quite clearly and easily in your included “setext” example, and don’t tell me you could not (while writing that example text) …


However, since the number signs in ATX headings are not separated in any way, this would establish a third way to denote headings in Commonmark, except for the top level obviously.

Well yes, let me count: we would have (1) ATX headings, like before; (2) setext headings, like before; but new (3) decimal-numbered section headings. Each with its own syntax, each with its own, different meaning (well, no: (1) and (2) have the exact same meaning …). It was precisely my point that this would introduce a “third way” to denote headings, and because this would correspond to a different kind of heading, I can’t see where you see a problem here.


I deny the applicability of the double-space convention to CommonMark, therefore – even if something like this was adopted – it depends on the parser settings and output format capabilities […]

I can’t make sense out of this paragraph:

  1. You deny the “applicability” (whatever that means) of said syntax: fine, I do affirm it.

  2. And therefore “it” depends on “parser settings”? You deny something, and this is the reason that something depends on “parser settings”? I’m sorry, you have to help me out here.

[…] whether any of the following headings is actually rendered with a leading number.

Well, yes, it is an established fact that how the output of a CommonMark processor is rendered (with or without a leading number, bold or condensed, red or black, centered or block-justified, and so on) is outside the reach of both the CommonMark specification and the processor.


Maybe you understand my position when I explain it this way: remember that the point (that is, “meaning”, or “reason of existence”) of the—existing, long-established!—input numerals in “ordered” lists is:

  1. as a “strong hint” that “some form of” numbering, or individually labeling, the list items is desired in the presentation (in contrast to items in an “unordered” list);

  2. as a way to specify (the first in the automatically generated sequence of) item numbers.

This is not my opinion, this is simply paraphrasing the CommonMark (and Markdown) specification.
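The second purpose can be sketched in a few lines of Python. This is only an illustration of the rule that the first numeral sets the start value and the following numerals are disregarded; the function name `rendered_numbers` is hypothetical, and how a given renderer actually labels the items is up to the output format.

```python
def rendered_numbers(markers: list[int]) -> list[int]:
    """Numbers one would typically see for a single ordered list.

    `markers` holds the numerals the author typed, in source order.
    Only the first one matters; the others are disregarded.
    """
    if not markers:
        return []
    start = markers[0]
    # Every later item just continues the sequence from the start value.
    return [start + i for i in range(len(markers))]

print(rendered_numbers([2, 4]))  # [2, 3]
```

So a list typed as `2.` followed by `4.` comes out as 2, 3, which is the “no value overrides” behaviour mentioned at the top of this thread.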

And it is exactly the same two purposes, to repeat and paraphrase:

  1. as a “strong hint” that “some form of” numbering, or individually labeling, the section heading is desired in the presentation (in contrast to ATX and setext headings);

  2. as a way to specify (the first in the automatically generated, hierarchical sequence of) section numbers.

So any argument you bring forward against distinguishing (in the input syntax) “numbered” and “plain” section headings immediately translates into an argument against distinguishing (in the input syntax) “ordered” and “unordered” lists.

But you seem to argue that distinguishing two types of lists is good, while distinguishing two kinds of section headings is bad, practically solely by reference to what “some people” supposedly do or can or want?

I’m saying that too many people would not see the difference between 1.2 and 2. introducing a line, […]

if they can see that the line you are talking about is either (1) separated by blank lines, or (2) part of an indented run of text containing multiple lines which start in the same way (maybe one could call this “run of text” a … let’s see: “list”?), if they can see this context and still would not see the difference between 1.2⎵⎵ and 2.⎵ introducing a line – then I’m sorry to say that maybe they are unfit for any kind of text processing.
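For what it is worth, the distinction is also mechanical. The following Python sketch rests on assumptions of mine, not on anything in the CommonMark specification: a “numbered heading” is a label of digits or NUMBER SIGNs joined by full stops, without a trailing full stop, followed by two or more spaces; an ordered list item is one to nine digits, then `.` or `)`, then at least one space. Leading indentation and the blank-line context are ignored here, and the names `NUMBERED_HEADING`, `ORDERED_ITEM`, and `classify` are made up.

```python
import re

# Proposed syntax (an assumption, not CommonMark): label, then 2+ spaces, then text.
NUMBERED_HEADING = re.compile(r"^(?:[0-9]+|#)(?:\.(?:[0-9]+|#))* {2,}\S")

# Simplified ordered list marker: 1-9 digits, '.' or ')', 1+ spaces, then text.
ORDERED_ITEM = re.compile(r"^[0-9]{1,9}[.)] +\S")

def classify(line: str) -> str:
    """Classify one line as 'numbered heading', 'ordered list item', or 'other'."""
    if NUMBERED_HEADING.match(line):
        return "numbered heading"
    if ORDERED_ITEM.match(line):
        return "ordered list item"
    return "other"
```

Under these assumptions, `classify("1.2  Foo")` yields `"numbered heading"` and `classify("2. Foo")` yields `"ordered list item"`, while `"1.2 Foo"` with a single space is neither.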


By the way, I forgot to mention something else—this is important:

The main reason why I started to use “⎵” as a “visible” SPACE was in fact not curiosity of mine nor courtesy for the reader, but rather the shi^H^H^Hspecial Markdown processing employed by this site: try, for example, entering

"`1. `"

that is (QUOTATION MARK, GRAVE ACCENT, DIGIT ONE, FULL STOP, SPACE, GRAVE ACCENT, QUOTATION MARK) not in a code block, but “inline”, like here: “1.” – can you see the SPACE after DIGIT ONE? I can’t either, because it gets discarded by this abomination of a would-be Markdown processor!

Now try the same, this time replacing U+0020 SPACE with U+00A0 NO-BREAK SPACE: this gets discarded too!

You can’t of course use a character reference either here, because of the “backticks”!

But the funny U+23B5 BOTTOM SQUARE BRACKET character (copy-and-pasted into the text out of the fabulous BabelMap application, which you absolutely should know and use!) somehow evades this ruthless parser, and heroically makes its way right to the end, into your user agent and mine, where it just looks like a “visible SPACE”, exactly as if yours truly, the author, had nothing else in mind from the start but the reading pleasure of anyone lurking around this curious site … :wink:

So that’s why, if you ask me.
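If you want to prepare such examples without copy-and-pasting the character by hand, a small Python helper does the substitution. This is just a sketch of the workaround described above; the name `show_spaces` is made up.

```python
VISIBLE_SPACE = "\u23b5"  # U+23B5 BOTTOM SQUARE BRACKET

def show_spaces(text: str) -> str:
    """Replace SPACE and NO-BREAK SPACE with a visible stand-in."""
    return text.replace("\u0020", VISIBLE_SPACE).replace("\u00a0", VISIBLE_SPACE)
```

For example, `show_spaces("1.2  Foo")` returns the string “1.2⎵⎵Foo”.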


Leading and trailing white spaces in code blocks