Default line break handling is inconvenient

OK, I agree, let’s stick to the definition of hard and soft linebreaks as in this spec.

Here we had another misunderstanding. My proposal above does not have such a consequence. I quote myself:

That means an empty line in the plain text, which is the same as two consecutive line feed characters, creates a paragraph. This spec suggests that more than one empty line has the same effect. I am fine with that.

Concerning the backwards compatibility, you wrote:

The issue is that we already have a disastrous compatibility issue among the different flavors. GitHub has without doubt a very large user base. They implemented linebreaks the way I propose. It is not my invention. That means there are already tons of GitHub-flavored texts out there. They would be broken if the current spec became as common as we all want it to be.

Here is a quote from GitHub at https://help.github.com/articles/writing-on-github

Newlines

The biggest difference with writing on GitHub is the way we handle linebreaks.
With Markdown, you can hard wrap paragraphs of text to have them combine into a
single paragraph. We find this causes a huge number of unintentional formatting
errors. In comments, GitHub treats newlines in paragraph-like content as real line breaks,
which is usually what you intended.

The next paragraph contains two phrases separated by a single newline character:

Roses are red
Violets are blue


becomes

Roses are red
Violets are blue

I would like to highlight the sentence

We find this causes a huge number of unintentional formatting errors.

I guess they have one of the largest user bases. CommonMark should consider this feedback.


What @rwzy means is that the following piece of Markdown (written by Gruber, straight from the source of the original Markdown spec) contains loads of line breaks. By your reasoning, when this piece of text is converted to HTML, it should have <br> elements added after every line. That would look really weird. The expected behaviour is that the text becomes a single paragraph with no HTML line breaks added.

The overriding design goal for Markdown's formatting syntax is to make
it as readable as possible. The idea is that a Markdown-formatted
document should be publishable as-is, as plain text, without looking
like it's been marked up with tags or formatting instructions. While
Markdown's syntax has been influenced by several existing text-to-HTML
filters, the single biggest source of inspiration for Markdown's
syntax is the format of plain text email.

According to Babelmark 2 only 1 out of the 25 tested Markdown implementations will add those line breaks.


I understand the problem for Asian and other languages that use line breaks differently than Latin- or Greek-based languages do.

However, I suspect that markdown is impractical for those languages for lots of other reasons as well. The goal of Markdown is not to be the be-all and end-all markup language; it’s to make writing prose faster.

So, no, it’s not very useful for poems or other things that require very precise formatting.

Additionally, changing the linebreak semantics would break backwards compatibility in a very big and painful to correct way. I have hundreds of .md files I’ve written over the years, where I use linebreaks to make the file easier to read in plain text mode (so, generally, all lines are shorter than 80 characters).

If this change were made, I’d have to go back and re-format all those files manually, because some line breaks are inside code blocks or other places, so reformatting them can’t really be automated without a lot of hassle.

Now scale this challenge up to almost all open source projects on GitHub (almost everyone has a README.md) and to all questions and comments on Stack Overflow. Fixing all those would require many man-years of effort.

So in short, while I have sympathy for the problems of languages written in other scripts, I don’t think this is a wise or reasonable change.


Do you prefer to exclude those who write stuff in other languages?

Not nice!

Also here you exclude a group of users.

Again, not nice!

Additionally, who said markdown is about things that require precise formatting? That idea is against the very spirit of markdown. Those who need precise formatting are better off with markup languages.

I am confused. GitHub is the one implementation that does respect the user’s line breaks. In other words, this would be exactly the group of users who would not have to change anything. See my comment about GitHub above.

Interesting. However, it doesn’t include GitHub.

Additionally, here in this forum, it works the GitHub way. I add neither two spaces nor backslashes, and my line breaks are respected.

Why would that be weird? The user doesn’t see any <br> elements. What is weird, in my opinion, is that the original markdown practically removes my line breaks. When I tell my friends or colleagues that they have to add two spaces at the end of the line to get a line break, they just laugh at me.

This means keep the line breaks. Don’t swallow them;-)


Poems should be wrapped in <pre> tags because they require that their precise formatting be kept and not removed by the browser. Remember, browsers collapse extra whitespace to a single space in <p> tags. <pre> tags don’t require a monospace font; if the browser defaults to one, it can be changed with CSS anyway. Or give it a class ‘poem’ and change the font for all pre.poem elements.

That is exactly what mikl meant: “it’s not very useful for poems or other things that require precise formatting.”

GitHub only does that for things like issues, comments and pull request descriptions. When displaying a README.md file, GitHub does change:

aaa
bbb

to:

aaa bbb

GitHub also strongly suggests that every project has such a file.

The user obviously doesn’t see the characters <br>, but the browser does display the <br> tags in the form of line breaks. In other words, the browser forcefully wraps the text at the place of each <br> tag which is not desired and ‘looks weird’. The reason this would look weird in the browser and not when viewing the plain text is because fixed width fonts are not used when viewing in the browser unlike when viewing plain text.

No, it means exactly to swallow them when producing HTML. Many people hardwrap their text, in email (one massive use case) but also in general, like this:

The overriding design goal for Markdown's formatting syntax is to make
it as readable as possible. The idea is that a Markdown-formatted
document should be publishable as-is, as plain text, without looking
like it's been marked up with tags or formatting instructions. While
Markdown's syntax has been influenced by several existing text-to-HTML
filters, the single biggest source of inspiration for Markdown's
syntax is the format of plain text email.

The intention here is that when viewed in plain text form, the text will always be wrapped after the words make, formatted, looking and so on from the above example. However, the whole thing should be interpreted as a single paragraph when converted to HTML. So it should only be wrapped in <p> tags, and no <br> should be inserted after the words make, formatted, looking etc. This is because the authors want the browser (through CSS or otherwise) to handle such wrapping for them, as it is much more flexible that way.

Any <br> tags placed in those positions have no semantic meaning, and hence should not be there when the browser has more advanced methods to wrap text. The whole point of markdown is to be able to make text ‘look pretty’ or readable in plain text form. Hardwrapping, i.e. inserting line breaks by hand, usually at column 72 or 80, makes paragraphs look better when viewed in plain text.

With your proposal, those who hardwrap their text like that are forced to do:

The overriding design goal for Markdown's formatting syntax is to make it as readable as possible. The idea is that a Markdown-formatted document should be publishable as-is, as plain text, without looking like it's been marked up with tags or formatting instructions. While Markdown's syntax has been influenced by several existing text-to-HTML filters, the single biggest source of inspiration for Markdown's syntax is the format of plain text email.

to get the same effect (i.e. not having any <br> tags inserted inside the paragraph). Notice how it’s one very long line. (If you view the source, you’ll notice the hardwrapping example has \n after the words make, formatted, looking and so on, whereas the above example does not.) Every paragraph would have to be like that.
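To make the conversion described above concrete, here is a minimal Python sketch that joins the hardwrapped lines of each paragraph into one long line. The function name `unwrap_paragraphs` is made up for this illustration, and the sketch only knows about blank lines and fenced code blocks. It does not handle lists, headings, block quotes, indented code, or lines that end in two spaces or a backslash, which is part of why automating this on real documents is more hassle than it looks.

```python
def unwrap_paragraphs(text: str) -> str:
    """Join the hard-wrapped lines of each paragraph into one long line.

    Hypothetical helper for illustration only. Lines inside fenced code
    blocks (``` or ~~~) are copied through unchanged; everything else that
    is not a blank line is treated as paragraph text.
    """
    out = []        # finished output lines
    paragraph = []  # lines of the paragraph currently being collected
    in_fence = False

    def flush():
        # Emit the collected paragraph as a single line, if there is one.
        if paragraph:
            out.append(" ".join(line.strip() for line in paragraph))
            paragraph.clear()

    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(("```", "~~~")):
            flush()
            in_fence = not in_fence
            out.append(line)
        elif in_fence:
            out.append(line)
        elif stripped == "":
            flush()
            out.append("")
        else:
            paragraph.append(line)
    flush()
    return "\n".join(out)


sample = "The overriding design goal is to make\nit as readable as possible.\n\n```\nx = 1\ny = 2\n```"
print(unwrap_paragraphs(sample))
```

With the sample above, the two prose lines come out as one line while the two lines inside the fence stay separate.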


One thing I believe no one has mentioned here is that if you don’t hardwrap your paragraphs, the diffs (in Git, Hg, SVN, etc.) are basically impossible to read. Which is an inconvenience for anyone who uses Jekyll, and a show stopper for anyone who cares about auditing changes.
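The diff problem can be seen with a small Python sketch using the standard library's `difflib`. The helper name `changed_lines` and the sample sentences are made up for this illustration; a line-based diff from Git or another tool would behave similarly, but this is not a claim about any particular tool's output format.

```python
import difflib


def changed_lines(old: str, new: str) -> list[str]:
    """Return only the removed ('-') and added ('+') lines of a unified diff."""
    diff = list(difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm=""))
    body = diff[2:]  # skip the two file header lines, if any diff was produced
    return [line for line in body if line.startswith(("+", "-"))]


wrapped_old = (
    "The overriding design goal is to make\n"
    "it as readable as possible. The idea is\n"
    "that it should be publishable as-is."
)
wrapped_new = wrapped_old.replace("readable", "legible")

# The same text with the paragraph kept on one long line.
long_old = wrapped_old.replace("\n", " ")
long_new = wrapped_new.replace("\n", " ")

for line in changed_lines(wrapped_old, wrapped_new):
    print(line)  # only the short line containing the changed word
for line in changed_lines(long_old, long_new):
    print(line)  # the whole paragraph, once removed and once added
```

In both cases one word changed, but with the long line the reader has to compare two copies of the entire paragraph to find it.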

I certainly have no objection to this (hardwrapped lines turning into <br>s) being an optional extension of a CommonMark parser, but I would strongly object to it being turned on by default.


This kind of thing should be done with some sort of markup language, e.g. HTML, LaTeX etc. Not markdown, which is certainly not made for precise formatting supported by CSS classes to make the basics look right.

I have more examples where it makes sense to use line breaks without creating new paragraphs. E.g. my address could look like

My street 42
95412 My City
My Country
My phone: 123456789

The original markdown would present this like

My street 42 95412 My City My Country My phone: 123456789

Not acceptable to me.

I have already provided many examples and could add more. But will the answer always be “The user shall use code blocks”? No, I completely disagree. That demand reminds me of Apple’s reaction during Antennagate.

The user has to hold the phone differently :wink:


Markdown allows raw html and <pre> tags are block level elements so they don’t have to mix with markdown.

The answer is to end each such line with two spaces or a backslash before the newline. And <pre> tags are not <code> tags; they are for ‘preformatted text’, which is what poems usually are. (Sometimes poets even use multiple spaces as an effect. <br> tags cannot help retain that inside <p> tags.) Paragraph tags should be for prose, not poetry.

You still have not provided a solution for those who hardwrap their paragraphs, forcing them to not hardwrap at all, which is “not acceptable” to many. Adding more examples where the division of lines is significant is not going to help that.


I wonder if we can signify that a line should be treated as “hardwrapped” by a - at end of each line. I see it in some old books that had split words.

This is a hard wrapped text starting from here -
and then continuing here so it is better. -
Traditionally it is used in typewriters, -
to deal with some very long worrrrrr-
rrrrrrds that should be read as one word.

This is an unwrapped text
</br> tags are inserted here automagically.

For multilines hardwrap maybe this

--- This is a hard wrapped text starting from here
and then continuing here so it looks better.

Hyphens normally indicate word breaks, not line breaks and therefore that would be ugly and confusing.

I think a simpler solution would be to have this as an ‘extension’, so that people who desire this and know that they never hardwrap their text can have it while everyone else (who hardwraps or otherwise) can still expect the same as in the original markdown.

Okay, well then if this one does not work, then hard to say what would work:

I often see paragraphs in books looking like

      Lorem ipsum dolor sit amet, consectetur adipiscing elit, 
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. 

      Ut enim ad minim veniam, quis nostrud exercitation ullamco 
laboris nisi ut aliquip ex ea commodo consequat. 

Obviously, in markdown, 1 tab means “code”. But one could perhaps preface it with a colon:

:      Duis aute irure dolor in reprehenderit in voluptate 
velit esse cillum dolore eu fugiat nulla pariatur. 

:      Excepteur sint occaecat cupidatat non proident, 
sunt in culpa qui officia deserunt mollit anim id est laborum.
:      Ut enim ad minim veniam, quis nostrud exercitation ullamco 
laboris nisi ut aliquip ex ea commodo consequat. 

That could be a neat way to signify that it should be explicitly treated as hardwrapped text.


I guess it could be used either to signify hardwrapped text, or non hardwrapped. Whoever wins the fight perhaps? I say we should break tradition though.

It is common practise to indicate a new paragraph with an indented first line.

Wouldn’t that be against the goal of this project, since CommonMark is about being compatible with the original markdown? It is tradition for good reason, and markdown is about respecting tradition, i.e. what people already use.

If it were to signify non-hardwrapped text, I think the backslash syntax is much superior. But what is the objection to having it in the ‘extensions’ category? Doesn’t everyone win in that case without having to change their existing documents?

Could you provide a few precise examples, with some explanation of what kind of text you are trying to save? What would be broken in which way?

In principle yes. The issue is, “what people already use” is different.

The last example: Default line break handling is inconvenient.

OK, I have revisited that example.

When I write a piece of text and expect other people to read it, I have no idea about the size of their screen. It could be as small as a smartphone, where only a few words fit on a line. Another user might use a tablet; a third might have a very large screen with 4000 pixels. As the author, I have no control over that. Even if someone quoted our discussion on a piece of paper or carved it into stone, he or she would wrap the sentences or even break words to fit them into that environment.

Until the original text has a line break.

I have surely used more than 100 different viewers and editors in my life, from the terminal editor vi to MS Word. I haven’t seen a single viewer or editor that is not able to wrap the text to fit the screen or window size.

For this reason, it is not wise to press the Return key somewhere in the middle of a text just to enforce a linebreak after 80 characters in the editor that I am using right now. I never do. I only press Return when I have a reason, e.g. the address or the poem.

If the intent is to have a paragraph, then the solution is to have the long line. There is nothing wrong with the long line. The viewer deals with the best fit. Apart from the very new markdown editors on the market, I have never seen an editor that thinks for me in your way and removes the linebreaks that I entered intentionally.

We may like it or not, but MS Word is probably the most used editor on the planet. It doesn’t remove any line breaks when I paste some text, and this is what people are very well used to. This is definitely the largest user base. And even in the stone-age vi (which I still use, where appropriate), the behaviour is the same. For good reasons. The editor should not change my decisions. It has to work for me, not against me.

The same applies to markdown.

In other words, the solution you asked for in that example is simply not to use hard wrapped text in the first place.

Imagine markdown became, by some miracle, so popular that half of the text worldwide were formatted this way. The other half would still be displayed in other software and would look terrible with all these strange backslashes and invisible spaces. Copy and paste would become a pain.

Again, there is nothing wrong with one long line. It looks best and works best with all software I have ever seen without line breaks within a paragraph of fluid text. It is and shall remain flexible.

However, there is something wrong with software that reverses my decisions and removes the hard line breaks that I inserted intentionally. It is wrong if the software removes my freedom and dictates to me what is right and what is wrong.

Now one might say, what about all the text out there, which has already those line breaks. The ones, which you are concerned about, which is fair to be concerned about. Let’s repeat your example. First the input:

The overriding design goal for Markdown's formatting syntax is to make
it as readable as possible. The idea is that a Markdown-formatted
document should be publishable as-is, as plain text, without looking
like it's been marked up with tags or formatting instructions. While
Markdown's syntax has been influenced by several existing text-to-HTML
filters, the single biggest source of inspiration for Markdown's
syntax is the format of plain text email.

And now, the way it would look after a markdown transformation in the way I propose:

The overriding design goal for Markdown's formatting syntax is to make
it as readable as possible. The idea is that a Markdown-formatted
document should be publishable as-is, as plain text, without looking
like it's been marked up with tags or formatting instructions. While
Markdown's syntax has been influenced by several existing text-to-HTML
filters, the single biggest source of inspiration for Markdown's
syntax is the format of plain text email.

Is there a difference?

No

Is that bad?

No

Does that break anything?

No

It is bad because, quoting you:

The browser forcefully wraps at hardwrapped places via the insertion of <br> tags when it might be too short or too long for the viewport, due to the varieties in ways in which the text can be displayed over various styling, systems and devices. Therefore the wrapping should be left to the browser to handle it.

Guys, there is no “right” answer on this one. However, it is abundantly clear that the original intent of Markdown was to wrap lines à la HTML, e.g.

this text
is written on two lines

(renders as)
this text is written on two lines

and CommonMark is designed to be as true as possible to the original philosophy of Markdown, only making necessary decisions in the many, many places where the spec was ambiguous.

As noted by @riking earlier:

It is an explicit goal to support a common “switch” that allows linebreaks to be treated as <br> since that is what users tend to expect.

That’s how it works
here too. :smile:

The compromise we have is that you can now indicate a linebreak with a backslash character too, e.g.

this text\
is written on two lines

(renders as)
this text
is written on two lines

and at least you can now see why, rather than having to somehow magically infer that there are two invisible spaces at the end of the line…
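The rules discussed here (a plain newline is a soft break, two trailing spaces or a trailing backslash make a hard break, and an optional switch treats every newline as a hard break) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function `render_paragraph` and its `hardbreaks` parameter are hypothetical names, the input is assumed to be a single paragraph of plain text, and cases such as code spans or an escaped backslash at the end of a line are ignored. It is not how any particular CommonMark parser is implemented.

```python
import html


def render_paragraph(text: str, hardbreaks: bool = False) -> str:
    """Render one paragraph of plain text to HTML.

    Hypothetical sketch: `hardbreaks=True` stands in for the "switch" that
    treats every newline as <br />.
    """
    lines = text.split("\n")
    parts = []
    for i, line in enumerate(lines):
        if i == len(lines) - 1:
            # The last line of a paragraph never ends in a line break.
            parts.append(html.escape(line.strip(), quote=False))
            break
        if line.endswith("\\"):
            content, hard = line[:-1], True   # explicit backslash break
        elif line.endswith("  "):
            content, hard = line, True        # two trailing spaces
        else:
            content, hard = line, hardbreaks  # soft break unless switched on
        parts.append(html.escape(content.strip(), quote=False))
        parts.append("<br />\n" if hard else "\n")
    return "<p>" + "".join(parts) + "</p>"


print(render_paragraph("this text\nis written on two lines"))
print(render_paragraph("this text\\\nis written on two lines"))
print(render_paragraph("this text\nis written on two lines", hardbreaks=True))
```

The first call keeps the newline in the HTML source, which a browser displays as a space; the second and third calls both insert a <br /> between the two lines.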


Exclude from what? It’s a bit like saying you’re excluded from using QWERTY keyboards. It’s not “excluding” to say that a tool is designed for a certain task. It’s just a practical realization that a single tool can’t work great for every task. That’s why we have non-QWERTY keyboards, and that’s also why there should probably be a different flavour of Markdown for other kinds of writing systems.

No, you’re deliberately misconstruing what I said. It’s like saying that I’m “excluding” screws when I’ve designed a nail gun to efficiently nail stuff to the wall.

You’re mistaken. While GitHub uses hard linebreaks for comments, it doesn’t use them for Markdown files and the like.
