A handful of small spec issues

These are issues I noticed while reading through the spec. Most of them are pretty minor and, I hope, uncontroversial; some of them are only issues with the presentation of the spec rather than bugs in the spec itself. They’re in the order I noticed them.

  • Consistent with the default behavior of CSS/HTML and the original ASCII specification, tab stops should be 8 spaces apart, not 4 spaces. (More precisely, tab expansion should insert however many spaces are required to make the number of characters so far on the current line be a multiple of 8.) This may also mean reverting to the original Markdown rule of “four spaces OR one tab” to set off a code chunk.

  • The <tag /> notation is a hangover from XML and should not appear in new HTML; therefore, please do not use it in equivalent-HTML examples either (for instance, <hr>, not <hr />).

  • In several places, trailing whitespace is discussed. However, trailing whitespace is invisible in examples. The style sheet for the specification should be adjusted so that trailing whitespace is visible (drawn in a slightly darker color, perhaps).

  • An indented code block inside a quote block should require only four spaces (or a hard tab) after the > if the space after > has been omitted on preceding lines:

      >quoted text
      >
      >    code inside quoted text
      >1234
    

    This is ambiguous only if the code is the first line of the quote block, in which case the five-space rule is fine. (This all enhances consistency with the rules for list items, below.)

  • As a special case, code spans consisting entirely of whitespace (e.g. ` `) should be reduced to a single space character, <code> </code>, rather than to nothing at all. (Given the syntax, we have to choose between allowing authors to write a code span that is a single space character, or a code span that is completely empty. The former is more useful.)
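To make the last item concrete, here is a minimal sketch of the proposed rule. The function name `normalize_code_span` is hypothetical, and the non-special branch (strip the ends, collapse interior whitespace runs) is only an assumption about how ordinary code spans are normalized; the part that matters is the whitespace-only case.

```python
import re


def normalize_code_span(content: str) -> str:
    """Normalize the text found between the backticks of a code span.

    Hypothetical helper: the general branch is an assumed normalization,
    and only the whitespace-only special case reflects the proposal.
    """
    # Special case: a span made up entirely of whitespace becomes one space,
    # so that it renders as <code> </code> instead of an empty element.
    if content and content.strip() == "":
        return " "
    # Assumed general rule: trim both ends and collapse whitespace runs.
    return re.sub(r"\s+", " ", content.strip())


def render_code_span(content: str) -> str:
    """Wrap the normalized content in a <code> element (no HTML escaping)."""
    return "<code>" + normalize_code_span(content) + "</code>"
```

With this rule, a span holding a single space or a run of spaces and tabs renders as <code> </code>, while ordinary spans are unaffected by the special case.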


I disagree, I think this would be actively harmful. Four spaces seems to be chosen specifically to coincide with 1 indentation level for code blocks and lists, and making it 8 would certainly violate the Principle of Least Surprise.

There’s a difference between “should not” and “will not”. There’s plenty of strange HTML being created all the time, like how IntelliJ’s javadoc formatter insists on you using <p /> in the Javadocs even though that tag makes no sense whatsoever.

Agreed.

No strong opinions on the last 2 items.

@zwol said:
Consistent with the default behavior of CSS/HTML and the original ASCII
specification, tab stops should be 8 spaces apart, not 4 spaces.

@riking said:
Four spaces seems to be chosen specifically to coincide with 1 indentation level for code blocks and lists,

I sympathize with this logic; unfortunately, IMNSHO, compatibility with the behavior of <textarea> is an overriding concern. And <textarea> has always used 8-space tabstops, and this is not even configurable in most browsers (Firefox being the exception).

This is not academic; this is a regular cause of mangled indentation in code samples on Stack Overflow. People type

int main(void)
{
    code
    code
    code
}

in the <textarea>, but the posted question comes out as

int main(void)
{
code
code
code
}

because pressing TAB in the <textarea> looks like it inserts 8 spaces but really it inserts a hard tab character, which SO Markdown treats as equivalent to four spaces.

… I think I just talked myself back out of the “four spaces OR one tab” logic; only consistently treating TAB exactly as <textarea> does, i.e., move cursor right to the next multiple of 8 columns, will completely fix this problem.
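For concreteness, "move the cursor right to the next multiple of 8 columns" can be sketched as below. This is an illustration only, not text from the spec; `expand_tabs` and its `tab_width` parameter are names made up for the example, and it counts one column per character.

```python
def expand_tabs(line: str, tab_width: int = 8) -> str:
    """Replace each tab with spaces up to the next multiple of tab_width."""
    out = []
    column = 0  # number of characters emitted so far on this line
    for ch in line:
        if ch == "\t":
            # Always advance at least one column, up to the next tab stop.
            pad = tab_width - (column % tab_width)
            out.append(" " * pad)
            column += pad
        else:
            out.append(ch)
            column += 1
    return "".join(out)
```

A tab at the left margin becomes eight spaces, while a tab after two characters becomes six, so the text that follows lands on column 8 either way. Python's built-in `str.expandtabs(8)` gives the same result for single lines like these.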

and making it 8 would certainly violate the Principle of Least Surprise.

And as far as I’m concerned, any value other than 8 violates the Principle of Least Surprise; I have literally never used any piece of software or hardware in which tab stops did not default to 8 spaces wide. In most of them this wasn’t even changeable.

The <tag /> notation is a hangover from XML and should not appear in new HTML

There’s a difference between “should not” and “will not”.

… I don’t see how this is an argument against removing it from the HTML examples in the spec?

Opposite here. Apart from Git’s command line output, tabs are and always have been 4 spaces to me. And while your textarea example might work, I don’t even know how I can enter a tab into a textarea (without copy-pasting).

I would actually say that if the code span contains only whitespace, everything should be left as is. How else am I supposed to tell you, in Markdown, that you need to add two spaces at the end of a line to insert a hard line break (for example)? I’d say that when an author explicitly puts spaces inside a code span, they likely have a reason for inserting all those whitespace characters.


@poke said:
Apart from Git’s command line output, tabs are and always have been 4 spaces to me.

Which alternate history are you from, and does DOS use forward slashes for pathname separators there? :wink:

And while your textarea example might work, I don’t even know how I can enter a tab into a textarea (without copy-pasting).

When this happens on SO I’m pretty sure it’s because people copy and paste their code (which is indented with hard tabs, because their mommas never taught them better) and then hit the {} button.

When I paste in text with hard tabs and use {} or Ctrl+K, everything I’ve selected gets indented by 4 spaces, regardless of tab characters. So everything is indented as code, and any tab characters still maintain proper indentation within the code (and they are converted to four spaces too).

The more likely problem is that people paste in code, see that everything except the first line is indented and then indent that until a code block appears. The actual problem is that many don’t look at the preview and don’t care enough about their questions.

Well, due to my age, my DOS memory might be a bit skewed. But for as long as I remember caring about tab characters, they have been 4 spaces to me :stuck_out_tongue_winking_eye:

This doesn’t really make sense. In HTML, the trailing slash is literally useless; it’s thrown away by the HTML parser at parse time. Writing <p /> is exactly identical to writing <p> (which is why you can’t use self-closing elements in HTML, as it just opens the element). Using the self-closing slash on a void element in HTML is luckily harmless, as void elements have only start tags, but, as noted above, it’s still useless.

OK, I did an experiment here: https://meta.stackexchange.com/questions/3122/formatting-sandbox/238942#238942 You’re right, but you’re right in a way that means I am also right. :slight_smile:

If you copy and paste code formatted with hard tabs, and then you use the {} button, everything is hunky-dory, except that all the tabs get converted to four spaces on render, which is likely to ruin the formatting of code that was written on the assumption that tabs are eight spaces. But that’s not important right now.

What’s important is that, as you speculated, if you copy and paste code formatted with hard tabs and then you manually type four spaces at the beginning of each line that used to be at the left margin, the code looks fine in the <textarea> – visually indistinguishable from what {} does – but is malindented when rendered. This is a loaded footgun handed to new users.
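Here is a rough model of that footgun. It is a deliberately simplified stand-in for an indented-code-block renderer, not Stack Exchange's actual Markdown implementation: `render_indented_code` is a hypothetical helper that expands tabs to a given tab stop and then strips the four-space code-block indent.

```python
def render_indented_code(source: str, tab_width: int) -> list[str]:
    """Expand tabs, then strip the 4-space code-block indent from each line.

    Simplified model for illustration only; real Markdown processors
    handle many more cases.
    """
    rendered = []
    for line in source.split("\n"):
        expanded = line.expandtabs(tab_width)
        if expanded.startswith("    "):
            rendered.append(expanded[4:])
        else:
            rendered.append(expanded)  # not indented enough; kept unchanged
    return rendered


# Pasted code whose body was indented with a hard tab; the user then typed
# four spaces in front of the lines that used to be at the left margin.
pasted = "\n".join([
    "    int main(void)",
    "    {",
    "\tcode",
    "    }",
])

with_4_col_tabs = render_indented_code(pasted, 4)
with_8_col_tabs = render_indented_code(pasted, 8)
```

In this model, with 4-column tab stops the tab is consumed entirely by the code-block indent, so `code` ends up flush left next to the braces. With 8-column stops it keeps four spaces of indentation, which matches what the <textarea> showed.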

(Stack Exchange’s CSS specifies -moz-tab-size:4 for the <textarea>, so if you visit the site in Firefox, the code will not look fine in the <textarea> and you can figure out that you need to put four spaces at the beginning of each line without having to pay attention to the preview. No other browser supports this property.)

Do not blame the user for the failings of their tools. It looks right in the <textarea>, why should they have to notice that it’s wrong in the preview? Maybe they don’t trust the preview. Previews are often wrong.

This isn’t a historical thing. Tab stops are eight spaces apart by default in every browser. They are eight spaces apart by default in vi, Emacs, Sublime Text, and Xcode. They are eight spaces apart by default in every terminal emulator I have ever seen, starting now and counting backwards to 1995ish. (I am not confident of my own memory of such things prior to 1995.) I never did much of anything with glass or printing terminals, but my vague understanding is that many if not most of them had fixed tabs eight spaces apart – not just the default, you could not change it. And, perhaps most relevant in this context, one of the only things the K&R and GNU code formatting standards agree on is that tab stops are eight spaces apart.

I wasn’t at all kidding when I asked you what alternate history you were from. A less silly way of phrasing the question is: please specify exactly which editors, terminal emulators, etc. you are familiar with in which tab stops aren’t eight spaces apart by default, because I don’t believe they exist.

A positive advantage of using XML notation for HTML (XHTML) is that it is then possible to further process documents with XML tools without having to resort to HTML tools.

Use case: indexed items, generating index.
TOC, linking through to all targets.

If you really mean that then you are apparently using inconsistent indentation in your code. If I use tabs for indentation, I absolutely don’t care how many spaces those tabs are rendered with. To indent by one, I add one tab; to remove an indent, I remove a tab. So when you set it to 8 spaces, and I set it to 4 spaces, it will still be the same thing except that your code expands further to the right. But when pasting code that uses tabs, it absolutely should not be an issue that SO’s “tab interpreter” renders it with 4 spaces instead of your 8.


A positive advantage of using XML notation for HTML (XHTML) is that it is then possible to further process documents with XML tools without having to resort to HTML tools.

No you can’t, unless you’re much stricter about actually outputting XML syntax. Writing HTML that just uses the slash on void elements isn’t actually XML, and your pipeline will fail as soon as anything more complicated comes along.

XML is far more complicated in its requirements than most people realize.
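A quick way to see where the line falls is to feed a few fragments to a strict XML parser. This is only a sketch using Python's standard `xml.etree.ElementTree`; the helper `parses_as_xml` and the sample fragments are made up for the illustration.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(fragment: str) -> bool:
    """Return True if a strict XML parser accepts the fragment."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


samples = {
    "slash on void element": "<p>one<br />two</p>",
    "bare void element": "<p>one<br>two</p>",
    "HTML named entity": "<p>a&nbsp;b</p>",
    "omitted end tags": "<ul><li>a<li>b</ul>",
}

results = {name: parses_as_xml(text) for name, text in samples.items()}
```

Only the first fragment is accepted. Closing void elements is necessary for XML tooling, but ordinary HTML features such as named entities and omitted end tags still stop the parser unless the generator avoids them as well.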

Yes you can, and no, it isn’t complicated. It’s far more logical.

Btw, it’s a simple statement: close all elements. No more, no less. I didn’t ask for XML; I asked for the capability of being processed by XML tools.

This is true. All terminal emulators emulate the 1978 VT100, which implements an ANSI standard (X3.64), and the standard says that the default is 8 columns. Even the 1970 VT05 had 8-column tab stops (source).

I concur with @poke. I use 2 spaces for my tabs, and in certain syntaxes, my tabs are converted to spaces for my personal ease of editing. The fact is, we’re rendering HTML as the end goal, and we have CSS that handles the presentation of tab size. So why force the output to be of another spec’s format when that does not reflect the intentions of the writer?

+++ tabatkins [Sep 04 14 02:02 ]:

This doesn’t really make sense. In HTML, the trailing slash is literally useless; it’s thrown away by the HTML parser at parse time. Writing <p /> is exactly identical to writing <p> (which is why you can’t use self-closing elements in HTML, as it just opens the element). Using the self-closing slash on a void element in HTML is luckily harmless, as void elements have only start tags, but, as noted above, it’s still useless.

Not useless – the EPUB format requires that its HTML content be XML.
It seems useful to me for cmark to generate HTML that can be used in an
EPUB.

If the browsers don’t care about the self-closing slash, but including
it makes it possible to use XML tools and use the HTML in epubs, then
I see no reason against including it and a couple reasons for…