Default line break handling is inconvenient

It’s still hard for me to get the meaning of a foreign language at a glance… But that’s definitely the best choice we can get at this moment. And thanks for pointing that out.

Yeah, maintaining those invisible spaces was once a nightmare for me. For a long time, when I was publishing READMEs to GitHub, I was afraid and had to check every time before I pushed my code.

If there had been a “backslash-newline” syntax back then, it could have been my saviour.
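That kind of pre-push check is easy to script. Here is a minimal sketch in Python, assuming the only thing to look for is a line ending in two or more spaces; the function name `find_trailing_space_breaks` is made up for illustration, and whitespace-only lines are reported too.

```python
import re

# Two or more spaces at the end of a line (the classic Markdown hard break).
TRAILING_BREAK = re.compile(r" {2,}$")


def find_trailing_space_breaks(text):
    """Return 1-based line numbers of lines that end in two or more spaces."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if TRAILING_BREAK.search(line)
    ]


sample = "first line  \nsecond line\nthird line   \n"
print(find_trailing_space_breaks(sample))  # [1, 3]
```

Running something like this over a README before pushing would at least make the invisible breaks visible.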

I agree with the first post in this thread.

With all due respect, IMHO, those two trailing spaces are the biggest problem of the original Markdown. The quote below is from the original Markdown page:

The overriding design goal for Markdown’s formatting syntax is to make
it as readable as possible. The idea is that a Markdown-formatted
document should be publishable as-is, as plain text, without looking
like it’s been marked up with tags or formatting instructions. While
Markdown’s syntax has been influenced by several existing text-to-HTML
filters, the single biggest source of inspiration for Markdown’s
syntax is the format of plain text email.

Unfortunately, the obligation to insert two spaces at the end of a line to get a line break directly contradicts the “overriding design goal” quoted above. Those two spaces are

  • not visible in the good old emails referred to above
  • not intuitive at all. The most intuitive way to insert a line break is obviously to use the line break character, e.g. CR, LF or CRLF.
  • exactly the sort of formatting instructions that, by the design goal, a document should not contain.

Now let’s have a look at the fix in standard markdown. Example 436 introduced backslashes at the end of a line. That is much better than two trailing spaces, because they are visible. However, this is

  • still not as intuitive as the ASCII CR, LF or CRLF character.
  • difficult to type on non-US keyboards. If you need it often, it becomes a pain.

Section 6.10 says:

A conforming parser may render a soft line break in HTML either as a line break or as a space.
A renderer may also provide an option to render soft line breaks as hard line breaks.

I think these two statements contradict the goal of standard markdown itself. Allowing conforming parsers to choose how they want to handle this will create a lot of inconsistencies.

My proposal:

Line breaks are very basic. An ASCII Line break should be directly translated into an HTML Linebreak, unless two or more line breaks start a new paragraph. E.g.

Lorem ipsum
dolor sit amet

consectetur adipiscing elit

should always (i.e. not optionally) translate to

<p>Lorem ipsum<br>
dolor sit amet</p>
<p>consectetur adipiscing elit</p>
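
The rule proposed above can be sketched in a few lines of Python. This is only a toy illustration of the proposal, not how any existing Markdown parser works: the function name `render_breaks_as_br` is hypothetical, and it ignores everything except paragraphs and line breaks (no code blocks, lists or inline markup).

```python
import html
import re


def render_breaks_as_br(text):
    """Toy renderer: blank lines split paragraphs, single newlines become <br>."""
    # Normalise CRLF and CR to LF so all three line endings behave the same.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    paragraphs = [p for p in re.split(r"\n{2,}", text.strip("\n")) if p]
    rendered = []
    for paragraph in paragraphs:
        # Escape each line, then join the lines of one paragraph with <br>.
        lines = [html.escape(line) for line in paragraph.split("\n")]
        rendered.append("<p>" + "<br>\n".join(lines) + "</p>")
    return "\n".join(rendered)


source = "Lorem ipsum\ndolor sit amet\n\nconsectetur adipiscing elit\n"
print(render_breaks_as_br(source))
```

For the sample input this prints the same three lines of HTML as shown above.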

I like this. But there may be a reason why someone would want or need to break lines in the markdown source, but not want that in the rendered version. Perhaps it’s that case that should be handled specially, with some character at the end of the line to indicate that it shouldn’t be a line break:

I like that too. But backward compatibility is always a heavy burden for languages that have been widely used.

I understand the backwards-compatibility concern. However, I also wrote the following post:

http://talk.commonmark.org/t/what-changed-in-standard-markdown/15/12?u=kagan

For me something is broken in the spec if it keeps mentioning trailing spaces and how they should render and I keep having to highlight the sample input text to see that, indeed, there are trailing spaces there. :smile: That seems in direct contradiction to “The overriding design goal for Markdown’s formatting syntax is to make it as readable as possible.”

I would like to add two case studies to strengthen my previous post on this topic:

Case 1: Poems and lyrics have many short sentences grouped into paragraphs.

E.g. a poem by Oscar Wilde (incomplete):

Option A:

I can write no stately proem
As a prelude to my lay;
From a poet to a poem
I would dare to say.

For if of these fallen petals
One to you seem fair,
Love will waft it till it settles
On your hair.

Option B:

I can write no stately proem\
As a prelude to my lay;\
From a poet to a poem\
I would dare to say.

For if of these fallen petals\
One to you seem fair,\
Love will waft it till it settles\
On your hair.

I would not ask a poet or a musician to add two spaces or a backslash at the end of each line; I would go for option A.

Case 2: Bulleted lists using the Unicode character U+2022

• Hello
• World!

Without two spaces or a backslash at the end of the first line, the list would be converted to

<p>• Hello
• World!</p>

which is rendered by the browser to

• Hello • World!

This destroys the original natural plain text list.

I object to this since it’s incompatible with those who like to hardwrap their text (even though I do not myself). They’re pretty much left with no other option and have good reasons for hardwrapping plain text such as displaying in pagers such as less. You could argue that that’s the fault of less itself, but nevertheless, being plain text, the lowest common denominator of displaying plain text falls prey to reduced readability when the actual lines themselves are long.

A list with proper list markers would be converted to <ul> and <li> tags, so your case 2 issue is actually just the same as case 1. U+2022 is not a list marker (yet, at least), so this is exactly the same type of text as your poem example.

That’s why poems are usually suggested to be wrapped in <pre> tags. Poems sometimes also have all these different whitespaces (more than a single space at one time) and weird types of formatting as a form of poetic expression that the browser removes when in a normal <p> tag anyway.

I think you’ve confused the html itself with how the browser treats such soft line breaks in the html and displays it. Just above what you quoted as two contradictory statements (emphasis mine):

A softbreak may be rendered in HTML either as a newline or as a space. The result will be the same in browsers.

So I don’t think these are contradictory since browsers do get rid of that whitespace (or reduce it to a single space), so it doesn’t make a difference in the end, at least in a p tag.

I am not sure I understood this comment correctly. I believe we need a better understanding of what is meant by soft and hard line breaks. My example is not about long lines that don’t fit on the screen, but about short lines, which the user wants to be short.

I will try to illustrate it in a different way. In a simple plain text document, when I type

aa
bb

I get the characters a + a + CR + LF + b + b in memory, assuming an MS Windows based editor. This is what I would call a hard line break, because the user explicitly hits the return key with the full intention to get a line break.

It is NOT OK that the user has to hit the space key twice, followed by the return key, to get a line break.

I think the markdown spec should respect that and translate the above to

<p>aa<br>
bb</p>

Without the <br> tag, the browser will display it as

aa bb

For me, it is obvious that transforming

 aa
 bb

to

aa bb

is simply wrong.

Using two spaces or a backslash at the end of the line might be a good instrument to simplify the implementation of a markdown parser. But this is IMHO too counter-intuitive to be acceptable from the user’s perspective.

I know that what I am proposing has consequences for examples 426 to 438. Therefore, I am formulating it more precisely as follows:

  • Merge sections 6.9 and 6.10 and call it ‘Line breaks’. Rationale: Soft line breaks happen automatically in a text editor or browser if a line is too long. They are not something that the user enters in plain text. Therefore, there is no need to mention them in the spec. The only thing that the user has control over while typing plain text is the hard line break. When we end up with a single type of line break, we can simply call it “line break”.

  • Modify example 426: Remove only the two space characters after ‘foo’. Keep the rest of the example as is.

  • Remove Examples 427, 428

  • I believe the examples 429 and 430 could also go away, since they would not be related to line breaks anymore, but to leading spaces.

  • Example 431 would be equivalent to writing

    foo
    bar

In this case you could argue that I impose something on the user to type it differently. That would be correct. However, I strongly believe that example 431 is very rare compared to straightforward line breaks. I would not sacrifice the simple line break approach and add two spaces everywhere else just to be able to have emphasis over a multi-line text.

  • The same applies to examples 432, 433 and 434. Markdown should not remove the user’s line breaks.
  • Remove the two trailing spaces in example 435 and the backslash in 436 and keep the rest as is, since the user knows that they are actually editing HTML and not plain text, and therefore knows how a browser deals with that.
  • Remove example 437 and 438.

My proposal:

Line breaks are very basic. An ASCII Line break should be directly translated into an HTML Linebreak, unless two or more line breaks start a new paragraph

No, this is incompatible with a large number of MD implementations. People linebreak their text for all sorts of reasons, such as preferring to keep text within a certain line-length, and those kinds of breaks must not turn into hard breaks; it would produce nonsensical breaking in the source.

As noted, the spec allows renderers to offer “all line-breaks are hard breaks” as an option. It can’t (and shouldn’t) mandate it.

You said that the statements:

A conforming parser may render a soft line break in HTML either as a line break or as a space.

A renderer may also provide an option to render soft line breaks as hard line breaks.

contradict each other. They don’t because the browser removes line breaks/multiple spaces down to one space in <p> tags anyway. Therefore, it doesn’t create inconsistencies.

The statement “A softbreak may be rendered in HTML either as a newline or as a space. The result will be the same in browsers” was referring to this:

aaa
bbb

being converted to:

<p>aaa
bbb</p>

or:

<p>aaa bbb</p>

Which is not a problem because the browser renders both in the same way:

aaa bbb
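
To make that concrete, here is a rough sketch in Python that mimics the whitespace collapsing a browser applies inside a <p> element with default styling. It is an approximation for illustration only (the helper `collapse_whitespace` is a made-up name, not a browser API), but it shows why both outputs end up looking the same.

```python
import re


def collapse_whitespace(text):
    """Approximate default <p> layout: runs of spaces, tabs and
    newlines collapse into a single space."""
    return re.sub(r"[ \t\r\n]+", " ", text).strip()


with_newline = "aaa\nbbb"  # softbreak emitted as a newline
with_space = "aaa bbb"     # softbreak emitted as a space

print(collapse_whitespace(with_newline))  # aaa bbb
print(collapse_whitespace(with_newline) == collapse_whitespace(with_space))  # True
```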

The spec is quite clear on this. A hard line break is, quote:

A line break (not in a code span or HTML tag) that is preceded by two or more spaces is parsed as a linebreak

Soft line break (emphasis mine):

A regular line break (not in a code span or HTML tag) that is not preceded by two or more spaces is parsed as a softbreak.

In addition to this, “two or more spaces” can be swapped with “a backslash”; see example 427.
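
As a rough sketch of that rule in Python: the function below labels every line break inside a single paragraph as hard or soft. It only looks at the two conditions quoted above (two or more trailing spaces, or a trailing backslash) and deliberately ignores the “not in a code span or HTML tag” part, so treat it as an illustration rather than a conforming implementation. The name `classify_line_breaks` is hypothetical.

```python
def classify_line_breaks(paragraph):
    """Label each line break inside one paragraph as 'hard' or 'soft'."""
    lines = paragraph.split("\n")
    kinds = []
    # The final line has no following line break, so it is skipped.
    for line in lines[:-1]:
        if line.endswith("  ") or line.endswith("\\"):
            kinds.append("hard")
        else:
            kinds.append("soft")
    return kinds


print(classify_line_breaks("foo  \nbar\\\nbaz\nqux"))  # ['hard', 'hard', 'soft']
```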

Your definition of hard line breaks and soft line breaks:

  • Hard line breaks are when the user hits the enter key (and gets a carriage return/linefeed/or both, depending on system).
  • Soft line breaks are when the displayer/viewer (the browser) decides to wrap lines that are too long.

is completely different. Stmd is always referring to when you hit the enter key, your “hard line break”, as a ‘line break’.

  • Your soft line breaks are not referred to in the spec, since as you say, there is “no need to mention it in the spec”.
  • Stmd then differentiates between your ‘hard line breaks’, and splits them into what it calls a “hard line break”, or a “soft line break”, depending on whether or not they are preceded by a backslash or two or more spaces, provided they are not within a code span or HTML tag.

I know that this is exactly what you are arguing for though, because when all line breaks (whenever you press enter not in a code span/HTML tag) are converted to a <br> tag, there is no need to split up the definition of ‘line break’ into whether or not it is preceded by a backslash or two or more spaces, i.e. what stmd currently refers to as “hard” or “soft” line breaks.

I just wanted to clear up the differences between your definition of soft and hard line breaks and what stmd currently uses.


On that note though, I still disagree with your proposal. Turning all line breaks into <br> tags certainly must not happen, as it would be disastrous for backwards compatibility, to say the least. You still haven’t provided a solution for those that hardwrap their text, other than simply saying it’s wrong.

So that is what I meant. Under your proposal, the only way to have paragraphs without <br> tags forced in it, is to have long lines obviously. The desire to not have <br> tags in the html output is because you might not exactly want to wrap at those points when the paragraph is displayed in the browser (let the browser do the wrapping for you).

(But of course you may still want to have hard line breaks in the original text for better readability in the original text itself. For example, in emails, where plain text emails are much more commonly used than HTML emails.)

This is how it has been for a long time, and changing that would certainly be against the spirit of markdown, since markdown was all about what people already did and creating a syntax from that. I’m all for ‘moving forward’, but this would be the wrong decision and break too many things in my opinion.


No, examples 435 and 436 are about how stmd handles these cases. These examples help clarify it.

OK, I agree, let’s stick to the definition of hard and soft linebreaks as in this spec.

Here we had another misunderstanding. My proposal above does not have such a consequence. I quote myself:

That means an empty line in the plain text, which is the same as two consecutive line feed characters, creates a paragraph. This spec suggests that more than one empty line has the same effect. I am fine with that.

Concerning the backwards compatibility, you wrote:

The issue is that we already have a disastrous compatibility issue among the different flavors. GitHub has without doubt a very large user base. They implemented line breaks the way I propose. It is not my invention. That means there are already tons of GitHub-flavored texts out there. They would be broken if the current spec became as common as we all want it to be.

Here is a quote from GitHub, at “Writing on GitHub” (GitHub Docs):

Newlines

The biggest difference with writing on GitHub is the way we handle linebreaks.
With Markdown, you can hard wrap paragraphs of text to have them combine into a
single paragraph. We find this causes a huge number of unintentional formatting
errors. In comments, GitHub treats newlines in paragraph-like content as real line breaks,
which is usually what you intended.

The next paragraph contains two phrases separated by a single newline character:

Roses are red
Violets are blue


becomes

Roses are red
Violets are blue

I would like to highlight the sentence

We find this causes a huge number of unintentional formatting errors.

I guess they have one of the largest user bases. CommonMark should consider this feedback.

What @rwzy means is that the following piece of Markdown (written by Gruber, straight from the source of the original Markdown spec) contains loads of line breaks. By your reasoning, when this piece of text is converted to HTML it should have br elements added behind every line. That would look really weird. The expected behaviour is that the text becomes a single paragraph with no HTML line breaks added in.

The overriding design goal for Markdown's formatting syntax is to make
it as readable as possible. The idea is that a Markdown-formatted
document should be publishable as-is, as plain text, without looking
like it's been marked up with tags or formatting instructions. While
Markdown's syntax has been influenced by several existing text-to-HTML
filters, the single biggest source of inspiration for Markdown's
syntax is the format of plain text email.

According to Babelmark 2 only 1 out of the 25 tested Markdown implementations will add those line breaks.

I understand the problem of Asian or other languages that have a different way of using line breaks than Latin/Greek-based languages do.

However, I suspect that markdown is impractical for those languages for lots of other reasons as well, and the goal of Markdown is not to be the be-all, end-all markup language, it’s to make writing prose faster.

So, no, it’s not very useful for poems or other things that require very precise formatting.

Additionally, changing the linebreak semantics would break backwards compatibility in a very big and painful to correct way. I have hundreds of .md files I’ve written over the years, where I use linebreaks to make the file easier to read in plain text mode (so, generally, all lines are shorter than 80 characters).

If this change was made, I’d have to go back and re-format all those files manually, because some line breaks are inside code blocks or other places, so reformatting them can’t really be automated without a lot of hassle.

Now scale this challenge up to almost all open source projects on GitHub (almost everyone has a README.md) and to all questions and comments on Stack Overflow. Fixing all those would require many man-years of effort.

So in short, while I have sympathy for the problems with other writing systems, I don’t think this is a wise or reasonable change.

Do you prefer to exclude those who write stuff in other languages?

Not nice!

Also here you exclude a group of users.

Again, not nice!

Additionally, who said markdown is about things that require precise formatting? That idea is against the very spirit of markdown. Those who need precise formatting are better off with markup languages.

I am confused. GitHub is the one implementation that does respect the user’s line breaks. In other words, this would be exactly the group of users who would not have to change anything. See my comment about GitHub above.

Interesting. However, it doesn’t include GitHub.

Additionally, here in this forum, it works the github way. I don’t add two spaces and no backslashes and my line breaks are very well respected.

Why would that be weird? The user doesn’t see any <br> elements. What is weird in my opinion is the fact that the original markdown practically removes my line breaks. When I tell my friends or colleagues that they have to add two spaces at the end of the line to get a line break, they just laugh at me.

This means keep the line breaks. Don’t swallow them;-)

Poems should be wrapped in <pre> tags because they require that their precise formatting be kept and not removed by the browser. Remember, browsers reduce extra whitespace to a single space in <p> tags. <pre> tags don’t require a monospace font; it can be changed with CSS if that is a concern. Or give it a class ‘poem’ and change the font for all pre.poem elements.

That is exactly what mikl meant: “it’s not very useful for poems or other things that require precise formatting.”

Github only does that for things like issues, comments and pull request descriptions. When displaying a README.md file, github does change:

aaa
bbb

to:

aaa bbb

GitHub also strongly suggests that every project have such a file.

The user obviously doesn’t see the characters <br>, but the browser does display the <br> tags in the form of line breaks. In other words, the browser forcefully wraps the text at the place of each <br> tag which is not desired and ‘looks weird’. The reason this would look weird in the browser and not when viewing the plain text is because fixed width fonts are not used when viewing in the browser unlike when viewing plain text.

No, it means exactly that they should be swallowed when producing HTML. Many people hardwrap their text like the following; email is one massive use case, but it is common in general:

The overriding design goal for Markdown's formatting syntax is to make
it as readable as possible. The idea is that a Markdown-formatted
document should be publishable as-is, as plain text, without looking
like it's been marked up with tags or formatting instructions. While
Markdown's syntax has been influenced by several existing text-to-HTML
filters, the single biggest source of inspiration for Markdown's
syntax is the format of plain text email.

The intention here is that when viewed in plain text form, the text will always be wrapped after the words make, formatted, looking and so on from the above example. However, the whole thing should be interpreted as a paragraph when converted to HTML. So it should only be wrapped in <p> tags and no <br> should be inserted after the words make, formatted, looking etc. This is because they want the browser (through CSS or otherwise) to handle such wrapping for them, as it is much more flexible that way.

Any <br> tags placed in those positions have no semantic meaning, and hence should not be there when the browser has more advanced methods to wrap text. The whole point of markdown is to be able to make text ‘look pretty’ or readable in plain text form. Inserting line breaks by hand, usually at column 72 or 80 (hardwrapping), makes paragraphs look better when viewed in plain text.

With your proposal, those who hardwrap their text like that are forced to do:

The overriding design goal for Markdown's formatting syntax is to make it as readable as possible. The idea is that a Markdown-formatted document should be publishable as-is, as plain text, without looking like it's been marked up with tags or formatting instructions. While Markdown's syntax has been influenced by several existing text-to-HTML filters, the single biggest source of inspiration for Markdown's syntax is the format of plain text email.

to get the same effect (i.e. not having any <br> tags inserted inside the paragraph). Notice how it’s one very long line. (If you view the source, you’ll notice the hardwrapping example has \n after the words make, formatted, looking and so on, whereas the above example does not.) Every paragraph would have to be like that.
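
For anyone who wants to see the two forms side by side, here is a small sketch using Python’s standard `textwrap` module. The sample sentence is a shortened version of the paragraph above and the 72-column width is just one common choice; hyphen breaking is switched off so that joining the lines again gives back exactly the one-line form.

```python
import textwrap

long_line = (
    "The overriding design goal for Markdown's formatting syntax is to make "
    "it as readable as possible. The idea is that a Markdown-formatted "
    "document should be publishable as-is, as plain text."
)

# Hard-wrap the paragraph at 72 columns, breaking only at whitespace.
wrapped = textwrap.fill(
    long_line, width=72, break_on_hyphens=False, break_long_words=False
)

# Joining the lines with single spaces recovers the one-line form.
unwrapped = " ".join(wrapped.splitlines())

print(wrapped)
print(unwrapped == long_line)
```

Under current soft-break handling both forms render as the same paragraph; under the proposal only the unwrapped one would.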

One thing I believe no one has mentioned here is that if you don’t hardwrap your paragraphs, the diffs (in Git, Hg, SVN, etc.) are basically impossible to read. Which is an inconvenience for anyone who uses Jekyll, and a show stopper for anyone who cares about auditing changes.
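
A quick way to see the effect is with Python’s standard `difflib`. This is only a sketch with a made-up three-line paragraph and a hypothetical helper `changed_lines`; it assumes a plain line-based diff, and it changes one word without re-wrapping the paragraph.

```python
import difflib


def changed_lines(before, after):
    """Return the added/removed lines of a unified diff, without headers."""
    diff = list(
        difflib.unified_diff(
            before.splitlines(), after.splitlines(), lineterm="", n=0
        )
    )
    # Skip the two file header lines, then keep only "+" and "-" lines.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


wrapped_before = "The quick brown fox\njumps over the lazy dog\nand runs away."
wrapped_after = "The quick brown fox\njumps over the sleepy dog\nand runs away."
long_before = wrapped_before.replace("\n", " ")
long_after = wrapped_after.replace("\n", " ")

# Hardwrapped source: only the one edited line shows up.
for line in changed_lines(wrapped_before, wrapped_after):
    print(line)

# One long line per paragraph: the whole paragraph shows up as changed.
for line in changed_lines(long_before, long_after):
    print(line)
```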

I certainly have no objection to this (hardwrapped lines turning into <br>s) being an optional extension of a CommonMark parser, but I would strongly object to it being turned on by default.
