Default line break handling is inconvenient

People in China like Markdown too. But I don’t think it’s a nice solution for us to create line breaks by putting several spaces before newlines, as described at:

jgm.github.io/stmd/spec.html#hard-line-breaks

A line break (not in a code span or HTML tag) that is preceded by two or more spaces is parsed as a linebreak (rendered in HTML as a <br /> tag)

Yes, there’s an alternative: GFM from GitHub, which treats line breaks just like in normal text. That’s lucky for us.

Personally, there are some reasons I don’t like the standard way of creating line breaks. In China we write in Chinese characters, and paragraphs tend to look like a dense chunk of dots like this. An easy solution is to break paragraphs into more lines to make the text clearer. I do that a lot.

Other people who write in Chinese might not do that as frequently as I do. But there are still a lot of cases where we need the text to be formatted. Here is an example from Ruby China.

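How to sign up: just show up; of course, replying first would be even better~
Fee: free
Time: September 14, 2014 (Sunday), 2:00 pm to 5:30 pm
Venue: Black Jack Café, ground-floor shops of Yinke Building, 38 Haidian Street, Beijing link
Getting there: Subway Line 10, get off at Suzhou Street station and walk north; it is also not far from Chuangye Street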

I can’t say we do that in most cases, but there are a lot of cases where we need line breaks.

I want to list some Chinese programming forums below. Most of them treat line breaks as GFM does when they choose Markdown:

And some do not; they follow standard Markdown and need spaces to create line breaks:

So I think these should be taken into consideration in making the standard Markdown.

I came here to discuss the "two or more spaces followed by a line break means a <br/> tag" rule too, because I personally dislike formatting based on something I can’t see in a normal editor (leading spaces are a different matter). But I see it is in the original Markdown write-up by Gruber (here), and something that entrenched is unlikely to be changed easily, even if other MD parsers do something different.

That said, I wish it wasn’t the behavior because again, I dislike format-controlling syntax based on “invisible” text, at least in something that is supposed to be textual like MD. Of course, it leads to the problem that not everyone wants a hard break placed everywhere there is a hard break in the original text, so how do you deal with that, etc.? Not an easy problem. Perhaps the answer is to just use <br/> when you really want a hard break?

The solution to this is actually mentioned in the spec. What you want is for soft line breaks (§6.10) to be rendered as <br> elements.

It actually explicitly says that:

A renderer may also provide an option to render soft line breaks as hard line breaks.
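
As a rough illustration of what such an option could look like, here is a minimal Python sketch. It handles plain paragraphs only and ignores every other Markdown construct, including the trailing-space and backslash hard breaks. The function name `render_paragraphs` and the `hard_wrap` flag are made up for this example and are not taken from stmd or any real renderer.

```python
import html
import re


def render_paragraphs(source: str, hard_wrap: bool = False) -> str:
    """Render blank-line-separated paragraphs to HTML.

    `hard_wrap` is a hypothetical option: when True, every soft line
    break inside a paragraph is emitted as a <br> element.
    """
    # Normalize CRLF and CR line endings to LF, drop outer blank lines.
    text = source.replace("\r\n", "\n").replace("\r", "\n").strip("\n")
    # One or more blank (or whitespace-only) lines separate paragraphs.
    paragraphs = re.split(r"\n(?:[ \t]*\n)+", text)
    # A soft break becomes "<br>" plus newline, or just a newline.
    separator = "<br>\n" if hard_wrap else "\n"
    rendered = []
    for paragraph in paragraphs:
        lines = [html.escape(line.strip()) for line in paragraph.split("\n")]
        rendered.append("<p>" + separator.join(lines) + "</p>")
    return "\n".join(rendered)
```

With the default setting the newline is passed through unchanged and a browser displays it as a space; with `hard_wrap=True` each newline inside a paragraph turns into a visible line break.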

One thing I see people often do is forget to put a line break between a list ‘heading’ and the list contents:

Does the Soft Line Breaks option allow this to be rendered as a list?

my list of things:

  • thing 1
  • thing 2

vs.

my list of things

  • thing 1
  • thing 2

backslash-newline is an alternative to the trailing spaces syntax: http://jgm.github.io/stmd/spec.html#hard-line-breaks

It is still hard for me to get the meaning of a foreign language at a glance… But that’s definitely the best choice we can get at this moment. And thanks for pointing that out.

Yeah, maintaining those invisible spaces was once a nightmare to me. And for a long time when I was publishing READMEs to GitHub, I was afraid and needed to check every time before I pushed my code.

If there had been a “backslash-newline” syntax back then, it could have been my saviour.

I agree with the first post in this thread.

With all respect, IMHO, those trailing two spaces are the biggest problem of the original markdown. The quote below is from the original markdown page:

The overriding design goal for Markdown’s formatting syntax is to make
it as readable as possible. The idea is that a Markdown-formatted
document should be publishable as-is, as plain text, without looking
like it’s been marked up with tags or formatting instructions. While
Markdown’s syntax has been influenced by several existing text-to-HTML
filters, the single biggest source of inspiration for Markdown’s
syntax is the format of plain text email.

Unfortunately, the obligation to insert two spaces at the end of a line to get a line break is in direct contradiction to the above-quoted “overriding design goal” itself. Those two spaces are

  • not visible in the good old emails referred to above
  • not intuitive at all. The most intuitive choice for inserting a line break is obviously the line break character itself, e.g. CR, LF or CRLF.
  • exactly the sort of formatting instructions that, by the design goal, the text should not contain.

Now let’s have a look at the fix in standard Markdown. Example 436 introduced backslashes at the end of a line. That is much better than two trailing spaces, because they are visible. However, this is

  • still not as intuitive as the ASCII CR, LF or CRLF character.
  • difficult to type on non-US keyboards. If you need it often, it becomes a pain.

Section 6.10 says:

A conforming parser may render a soft line break in HTML either as a line break or as a space.
A renderer may also provide an option to render soft line breaks as hard line breaks.

I think these two statements contradict the goal of standard Markdown itself. Allowing conforming parsers to choose how they want to handle this will create a lot of inconsistencies.

My proposal:

Line breaks are very basic. An ASCII Line break should be directly translated into an HTML Linebreak, unless two or more line breaks start a new paragraph. E.g

Lorem ipsum
dolor sit amet

consectetur adipiscing elit

should always (i.e. not optionally) translate to

<p>Lorem ipsum<br>
dolor sit amet</p>
<p>consectetur adipiscing elit</p>

I like this. But there may be a reason why someone would want or need to break lines in the Markdown source but not want that in the rendered version. Perhaps it’s that case that should be handled specially, with some character at the end of the line to indicate that it shouldn’t be a line break:

I like that too. But backward compatibility is always a heavy burden for languages that have been widely used.

I understand the backwards-compatibility concern. However, I also wrote the following post:

http://talk.commonmark.org/t/what-changed-in-standard-markdown/15/12?u=kagan

For me something is broken in the spec if it keeps mentioning trailing spaces and how they should render and I keep having to highlight the sample input text to see that, indeed, there are trailing spaces there. :smile: That seems in direct contradiction to “The overriding design goal for Markdown’s formatting syntax is to make it as readable as possible.”

I would like to add two case studies to strengthen my previous post on this topic:

Case 1. Poems and lyrics have many short sentences grouped into paragraphs.

E.g. A poem from Oscar Wilde (incomplete):

Option A:

I can write no stately proem
As a prelude to my lay;
From a poet to a poem
I would dare to say.

For if of these fallen petals
One to you seem fair,
Love will waft it till it settles
On your hair.

Option B:

I can write no stately proem\
As a prelude to my lay;\
From a poet to a poem\
I would dare to say.

For if of these fallen petals\
One to you seem fair,\
Love will waft it till it settles\
On your hair.

I would not ask a poet or a musician to add two spaces or a backslash at the end of each line; I would go for option A.

Case 2: Bulleted lists using the Unicode character U+2022

• Hello
• World!

Without the two spaces at the end or without a backslash at the end, the list would be converted to

<p>• Hello
• World!</p>

which is rendered by the browser to

• Hello • World!

This destroys the original natural plain text list.

I object to this since it’s incompatible with those who like to hardwrap their text (even though I do not myself). They’re pretty much left with no other option and have good reasons for hardwrapping plain text such as displaying in pagers such as less. You could argue that that’s the fault of less itself, but nevertheless, being plain text, the lowest common denominator of displaying plain text falls prey to reduced readability when the actual lines themselves are long.
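
For readers who have not met hard wrapping, here is a minimal Python sketch of what it does to the source text, using the standard `textwrap` module. The helper names `hard_wrap` and `unwrap` and the default width of 72 columns are assumptions chosen for illustration.

```python
import textwrap


def unwrap(paragraph: str) -> str:
    """Join a hard-wrapped paragraph back into one long line."""
    return " ".join(paragraph.split())


def hard_wrap(paragraph: str, width: int = 72) -> str:
    """Re-wrap a paragraph so that no line exceeds `width` columns.

    Words are never split, so a single word longer than `width`
    still produces an over-long line.
    """
    return textwrap.fill(
        unwrap(paragraph),
        width=width,
        break_long_words=False,
        break_on_hyphens=False,
    )
```

The newlines that `hard_wrap` inserts carry no meaning for the author; they exist only to keep the source readable in tools such as pagers, which is why turning each of them into a visible break would change the rendered paragraph.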

A list with proper list markers would be converted to <ul> and <li> tags, so your case 2 issue is actually just the same as case 1. U+2022 is not a list marker (yet, at least), so this is exactly the same type of text as your poem example.

That’s why poems are usually suggested to be wrapped in <pre> tags. Poems sometimes also have all these different whitespaces (more than a single space at one time) and weird types of formatting as a form of poetic expression that the browser removes when in a normal <p> tag anyway.

I think you’ve confused the html itself with how the browser treats such soft line breaks in the html and displays it. Just above what you quoted as two contradictory statements (emphasis mine):

A softbreak may be rendered in HTML either as a newline or as a space. The result will be the same in browsers.

So I don’t think these are contradictory since browsers do get rid of that whitespace (or reduce it to a single space), so it doesn’t make a difference in the end, at least in a p tag.

I am not sure I understood this comment correctly. I believe we need a better understanding of what is meant by soft and hard line breaks. My example is not about long lines that don’t fit on the screen, but about short lines, which the user wants to be short.

I will try to illustrate it in a different way. In a simple plain text document, when I type

aa
bb

I get the characters a + a + CR + LF + b + b in memory, assuming an MS Windows based editor. This is what I would call a hard line break, because the user explicitly hits the return key with the full intention to get a line break.
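
To make the byte sequence concrete, here is a small Python sketch. It assumes an ASCII-compatible encoding, and the helper names are invented for this example. It shows the six bytes described above and one common way to normalize the three line-ending conventions before any parsing happens.

```python
def line_bytes(text: str) -> list:
    """Return the byte values of `text`, assuming ASCII-only input."""
    return list(text.encode("ascii"))


def normalize_newlines(text: str) -> str:
    """Map CRLF (Windows) and bare CR (old Mac) line endings to LF."""
    # Replace CRLF first so the pair does not turn into two LFs.
    return text.replace("\r\n", "\n").replace("\r", "\n")


# a + a + CR + LF + b + b
windows_text = "aa\r\nbb"
print(line_bytes(windows_text))
print(repr(normalize_newlines(windows_text)))
```

After normalization the question of whether that single LF should become a <br> is the same on every platform.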

It is NOT OK that the user has to hit the space key twice followed by the return key to get a line break.

I think the markdown spec should respect that and translate the above to

<p>aa<br>
bb</p>

Without the <br> tag, the browser will display it as

aa bb

For me, it is obvious that transforming

aa
bb

to

aa bb

is simply wrong.

Using two spaces or a backslash at the end of the line might be a good instrument to simplify the implementation of a Markdown parser. But this is IMHO too counter-intuitive to be acceptable from the user’s perspective.

I know that what I am proposing has consequences for examples 426 to 438. Therefore, I am formulating it more precisely as follows:

  • Merge sections 6.9 and 6.10 and call the result ‘Line breaks’. Rationale: Soft line breaks happen automatically in a text editor or browser if a line is too long. They are not something that the user enters in plain text. Therefore, there is no need to mention them in the spec. The only thing that the user has control over while typing plain text is the hard line break. When we end up with a single type of line break, we can simply call it “line break”.

  • Modify example 426: Remove only the two space characters after ‘foo’. Keep the rest of the example as is.

  • Remove Examples 427, 428

  • I believe the examples 429 and 430 could also go away, since they would not be related to line breaks anymore, but to leading spaces.

  • Example 431 would be equivalent to writing

    foo
    bar

In this case you could argue that I impose something on the user by making them type it differently. That would be correct. However, I strongly believe that example 431 is very rare compared to straightforward line breaks. I would not sacrifice the simple line break approach and add two spaces everywhere else just to be able to have emphasis over a multi-line text.

  • The same applies to examples 432, 433 and 434. Markdown should not remove the user’s line breaks.
  • Remove the trailing two spaces of example 435 and the backslash in 436 and keep the rest as is, since the user knows that he is actually editing HTML and not plain text, and therefore knows how a browser deals with that.
  • Remove example 437 and 438.

My proposal:

Line breaks are very basic. An ASCII Line break should be directly translated into an HTML Linebreak, unless two or more line breaks start a new paragraph

No, this is incompatible with a large number of MD implementations. People linebreak their text for all sorts of reasons, such as preferring to keep text within a certain line-length, and those kinds of breaks must not turn into hard breaks; it would produce nonsensical breaking in the source.

As noted, the spec allows renderers to offer “all line-breaks are hard breaks” as an option. It can’t (and shouldn’t) mandate it.

You said that the statements:

A conforming parser may render a soft line break in HTML either as a line break or as a space.

A renderer may also provide an option to render soft line breaks as hard line breaks.

contradict each other. They don’t because the browser removes line breaks/multiple spaces down to one space in <p> tags anyway. Therefore, it doesn’t create inconsistencies.

The statement “A softbreak may be rendered in HTML either as a newline or as a space. The result will be the same in browsers” was referring to this:

aaa
bbb

being converted to:

<p>aaa
bbb</p>

or:

<p>aaa bbb</p>

Which is not a problem because the browser renders both in the same way:

aaa bbb
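
The collapsing behaviour can be approximated in a few lines of Python. This is only a sketch of what browsers do for ordinary paragraph text under the default CSS white-space handling; it is not a full implementation of the CSS rules, and the function name is an assumption.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Approximate how a browser displays the text inside a <p> element.

    Runs of spaces, tabs and line breaks collapse to a single space,
    and whitespace at the start and end of the paragraph is dropped.
    """
    return re.sub(r"[ \t\n\r\f]+", " ", text).strip()


# Both renderings of the soft break display identically.
print(collapse_whitespace("aaa\nbbb"))
print(collapse_whitespace("aaa bbb"))
```

Both calls print the same single line, which is the sense in which the choice between a newline and a space makes no visible difference.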

The spec is quite clear on this. A hard line break is, quote:

A line break (not in a code span or HTML tag) that is preceded by two or more spaces is parsed as a linebreak

Soft line break (emphasis mine):

A regular line break (not in a code span or HTML tag) that is not preceded by two or more spaces is parsed as a softbreak.

In addition to this, “two or more spaces” can be swapped with “a backslash”; see example 427.
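
The two definitions quoted above can be sketched as a small classifier in Python. This is a simplified illustration, not the stmd algorithm: it ignores code spans, HTML tags and escaped backslashes, and the name `classify_breaks` is hypothetical.

```python
import re

# A hard break marker: two or more trailing spaces, or one trailing backslash.
_HARD_BREAK = re.compile(r"(?: {2,}|\\)$")


def classify_breaks(paragraph: str) -> list:
    """Return (line_text, break_kind) pairs for one paragraph.

    break_kind is "hard" or "soft" for the line break that follows the
    line, and "end" for the last line, which ends the paragraph.
    """
    lines = paragraph.split("\n")
    result = []
    for line in lines[:-1]:
        kind = "hard" if _HARD_BREAK.search(line) else "soft"
        # Strip the marker and any leftover trailing spaces from the text.
        result.append((_HARD_BREAK.sub("", line).rstrip(" "), kind))
    result.append((lines[-1].rstrip(" "), "end"))
    return result
```

Every pair comes from the user pressing the enter key; the only thing that differs is what precedes the newline.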

Your definition of hard line breaks and soft line breaks:

  • Hard line breaks are when the user hits the enter key (and gets a carriage return/linefeed/or both, depending on system).
  • Soft line breaks are when the displayer/viewer (the browser) decides to wrap lines that are too long.

is completely different. Stmd is always referring to when you hit the enter key, your “hard line break”, as a ‘line break’.

  • Your soft line breaks are not referred to in the spec, since as you say, there is “no need to mention it in the spec”.
  • Stmd then differentiates between your ‘hard line breaks’, and splits them into what it calls a “hard line break”, or a “soft line break”, depending on whether or not they are preceded by a back slash or two or more spaces, provided they are not within a code span or html tag.

I know that this is exactly what you are arguing for though, because when all line breaks (whenever you press enter not in a code span/html tag) are converted to a <br> tag, there is no need to split up the definition of ‘line break’ into whether or not it is preceded by a back slash or two or more spaces, i.e. what stmd currently refers to as “hard” or “soft” line breaks.

I just wanted to clear up the differences between your definition of soft and hard line breaks and what stmd currently uses.


On that note though, I still disagree with your proposal. Turning all line breaks into <br> tags certainly must not happen, as it would be disastrous to backwards compatibility, to say the least. You still haven’t provided a solution for those that hardwrap their text, other than simply saying it’s wrong.

So that is what I meant. Under your proposal, the only way to have paragraphs without <br> tags forced in it, is to have long lines obviously. The desire to not have <br> tags in the html output is because you might not exactly want to wrap at those points when the paragraph is displayed in the browser (let the browser do the wrapping for you).

(But of course you may still want to have hard line breaks in the original text for better readability in the original text itself. For example, in emails, where plain text emails are much more commonly used than HTML emails.)

This is how it has always been for a long time and changing that would certainly be against the spirit of markdown since markdown was all about what people already did and creating a syntax from that. I’m all for ‘moving forward’, but this would be the wrong decision and break too many things in my opinion.


No, example 435/436 is about how stmd handles these cases. These examples help clarify it.

OK, I agree, let’s stick to the definition of hard and soft linebreaks as in this spec.

Here we had another misunderstanding. My proposal above does not have such a consequence. I quote myself:

That means an empty line in the plain text, which is the same as two subsequent line feed characters, creates a paragraph. This spec suggests that more than one empty line has the same effect. I am fine with that.

Concerning the backwards compatibility, you wrote:

The issue is that we already have a disastrous compatibility issue among the different flavors. GitHub without doubt has a very large user base. They implemented line breaks the way I propose. It is not my invention. That means there are already tons of GitHub-flavored text out there. It would be broken if the current spec became as common as we all want it to be.

Here is a quote from GitHub, from “Writing on GitHub - GitHub Docs”:

Newlines

The biggest difference with writing on GitHub is the way we handle linebreaks.
With Markdown, you can hard wrap paragraphs of text to have them combine into a
single paragraph. We find this causes a huge number of unintentional formatting
errors. In comments, GitHub treats newlines in paragraph-like content as real line breaks,
which is usually what you intended.

The next paragraph contains two phrases separated by a single newline character:

Roses are red
Violets are blue


becomes

Roses are red
Violets are blue

I would like to highlight the sentence

We find this causes a huge number of unintentional formatting errors.

I guess they have one of the largest user bases. CommonMark should consider this feedback.

What @rwzy means is that the following piece of Markdown (written by Gruber, straight from the source of the original Markdown spec) contains loads of line breaks. By your reasoning, when this piece of text is converted to HTML it should have br elements added after every line. That would look really weird. The expected behaviour is that the text becomes a single paragraph with no HTML line breaks added in.

The overriding design goal for Markdown's formatting syntax is to make
it as readable as possible. The idea is that a Markdown-formatted
document should be publishable as-is, as plain text, without looking
like it's been marked up with tags or formatting instructions. While
Markdown's syntax has been influenced by several existing text-to-HTML
filters, the single biggest source of inspiration for Markdown's
syntax is the format of plain text email.

According to Babelmark 2 only 1 out of the 25 tested Markdown implementations will add those line breaks.
