[I sent a reply by email, but for some reason it didn’t come through. So, apologies if this is a duplicate.]
On Jul 26, 2022, at 11:13 AM, vas via CommonMark Discussion <noreply@talk.commonmark.org> wrote:
If you eliminate manual line wrapping (treating newlines as soft breaks), then it gets considerably better. First, a large source of ambiguity, what @cben calls “Hard-wrapping is a lossy action”, goes away. There simply would be no line break before 9).
I don’t understand your response to my point. Most likely we are understanding some key terms, like “soft break,” differently. As I was using the term here, a “soft break” is a newline in the markdown source that is interpreted semantically as a space. Djot does “treat newlines as soft breaks” in this sense, so I assume you mean something different.
I understood you to be proposing that newlines in paragraphs would produce a hard line break in the rendered output. Assuming that’s right, then you face an ambiguity in
She counted (all the way from 1 to
9) and then went in.
Is this a single paragraph with a hard break,
<p>She counted (all the way from 1 to<br>
9) and then went in.</p>
or is it a paragraph followed by a list?
<p>She counted (all the way from 1 to</p>
<ol start=9><li>and then went in</li></ol>
That’s the ambiguity. It’s just the same as the ambiguity we face now with commonmark, and which we resolve with the unprincipled restriction on start numbers. I guess you like that way of resolving it, but I’ve never been happy with it. In any case, it comes up whether or not you allow hard wrapping (i.e., treat newlines in paragraphs as equivalent to spaces).
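To make the start-number restriction concrete, here is a minimal sketch in Python. It is not taken from any real parser: the name `can_interrupt_paragraph` and the simplified marker regex are assumptions for illustration, and the check ignores the rest of CommonMark's list rules.

```python
import re

# Simplified ordered-list marker (an assumption, not the full spec grammar):
# up to 3 spaces of indent, 1-9 digits, "." or ")", whitespace, some content.
ORDERED_MARKER = re.compile(r"^ {0,3}(\d{1,9})[.)][ \t]+\S")

def can_interrupt_paragraph(line: str) -> bool:
    """Return True if `line` may start an ordered list in mid-paragraph.

    Models only the restriction discussed above: the list must start at 1.
    """
    m = ORDERED_MARKER.match(line)
    return m is not None and int(m.group(1)) == 1

print(can_interrupt_paragraph("9) and then went in."))  # False
print(can_interrupt_paragraph("1) and then went in."))  # True
```

Under this rule the `9)` line stays in the paragraph, but only because of the number, not because the parser understood the sentence. The same line starting with `1)` would still be taken as a list.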
As for your main point: certainly, djot falls a bit to the left of markdown. With markdown and commonmark, the aim was to magically guess what humans intended, as far as possible. My mantra was always “favor what is intuitive to humans, and make the parser more complex if necessary.” That is why the parsing rules are so complex. The problem is, even with all this complexity, there are many cases where we don’t get the results people intuitively intend. So, my choice with djot is to give up trying. Performing the task well would require a high degree of general intelligence. Maybe in the future, some successor of GPT-3 will be used to parse our plain text documents, but for now, I’d rather have a simple set of rules that we can keep in our heads, so the output is predictable.
To elaborate: in my example above, any human can tell that the 9) on the second line is the end of the parenthesized phrase in the preceding line, and not the start of a list, whereas in
A further point is
9) blah blah blah
we have a list. But in doing this we’re relying on our grasp of the meaning of what is written; it’s very hard to predict human intentions in such cases with a set of syntactic rules. Consider the minimal pair:
A more interesting number is
6. This is a "perfect" number.
vs
A more interesting point is
6. This is a "perfect" number.
In the second case a list is probably intended (depending on preceding content), while in the first a list is not intended. We can figure that out because we know the difference between “point” and “number” and we have a grip on what the writer is trying to achieve. Our markdown parsers can’t learn this by being given more and more complex rules. They’d need to have a psychological model of the humans writing the text, and an understanding of the meanings of the words.
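A rough sketch of why shape-based rules cannot separate the minimal pair, in Python. The feature set in `syntactic_features` is hypothetical, chosen only for illustration; a real parser might inspect other things, but anything that looks only at the form of the lines sees the two cases as the same.

```python
import re

# Simplified ordered-list marker (an assumption, not the full spec grammar).
MARKER = re.compile(r"^ {0,3}\d{1,9}[.)][ \t]+\S")

def syntactic_features(prev_line: str, line: str) -> tuple:
    """Collect purely formal features of a line and its predecessor."""
    return (
        bool(MARKER.match(line)),          # does the line look like a list item?
        prev_line.strip() == "",           # is the previous line blank?
        prev_line.rstrip().endswith((".", ":", ";", "!", "?")),  # ends a clause?
    )

number_case = ("A more interesting number is", '6. This is a "perfect" number.')
point_case = ("A more interesting point is", '6. This is a "perfect" number.')

# Both cases yield identical features, so any rule built on them
# must give the same answer for both.
print(syntactic_features(*number_case) == syntactic_features(*point_case))  # True
```

The only difference between the two inputs is the word "number" versus "point", and telling those apart is a matter of meaning rather than syntax.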
The idea behind djot is to keep things simple, uniform and predictable while still achieving most of the aims of markdown.
John