Why is there no syntax for subscript and superscript?

mlorant · September 13, 2014, 9:00am

Hi,

I noticed there is no syntax for subscript and superscript. I know StackOverflow allows the HTML tags <sub> and <sup> for that, but native support could be nice in this format to avoid conflicts again. I know one French website which uses the syntax 1^st^ for 1ˢᵗ and H~2~O for H₂O.

I’m not saying it is the best or that it should be the syntax (it is just an example, actually), but I think having some markup for this would be really helpful (not only for math and physics but also for some abbreviations where superscript is used).
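The delimited syntax described above can be sketched in a few lines. This is only an illustration, not part of any CommonMark implementation: the function name `render_sub_sup` and the rule that the delimited content contains no whitespace are assumptions made for the example.

```python
import re

# Hypothetical patterns for the delimited syntax from the post:
# ^text^ for superscript and ~text~ for subscript. This sketch assumes
# the content contains neither whitespace nor the delimiter itself.
SUPERSCRIPT = re.compile(r"\^([^\s^]+)\^")
SUBSCRIPT = re.compile(r"~([^\s~]+)~")


def render_sub_sup(text: str) -> str:
    """Replace ^x^ with <sup>x</sup> and ~x~ with <sub>x</sub>."""
    text = SUPERSCRIPT.sub(r"<sup>\1</sup>", text)
    return SUBSCRIPT.sub(r"<sub>\1</sub>", text)


print(render_sub_sup("1^st^ and H~2~O"))
```

Because both delimiters are required, a lone caret such as the one in E = mc^2 is left untouched by this sketch.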

rwzy · September 13, 2014, 9:08am

I think that would be the best syntax since pandoc already uses it. Displaying math should be left to other things like mathjax though.

I think there is no syntax for superscripts and subscripts because Gruber’s original markdown syntax did not contain it and I believe one of the goals of this project is to address the flaws, bugs and ambiguities of the original Markdown.pl implementation.

Therefore, I think it’s best to have this in the ‘extensions’ category.

mlorant · September 13, 2014, 9:11am

I changed the category, indeed, ‘Extensions’ sounds better for this request.

Bengt_Luers · September 13, 2014, 9:44am

At least surrounding sub/superscripts are better than LaTeX-style sub/superscripts (H_2O; E = mc^2), because they conflict less with URL-encoded titles: http://myblog.com/why_commonmark_is_awesome/a94b3
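The conflict is easy to reproduce. The sketch below compares a naive LaTeX-style rule (the single character after `_` or `^` becomes the script) with a surrounding-delimiter rule. Both rules and both function names are hypothetical, written only to illustrate the point about URLs.

```python
import re


def latex_style(text: str) -> str:
    """Naive rule: the one character after _ or ^ becomes the script."""
    text = re.sub(r"_(\w)", r"<sub>\1</sub>", text)
    return re.sub(r"\^(\w)", r"<sup>\1</sup>", text)


def surrounding_style(text: str) -> str:
    """Delimited rule: ~x~ is a subscript, ^x^ is a superscript."""
    text = re.sub(r"~([^\s~]+)~", r"<sub>\1</sub>", text)
    return re.sub(r"\^([^\s^]+)\^", r"<sup>\1</sup>", text)


url = "http://myblog.com/why_commonmark_is_awesome/a94b3"
print(latex_style(url))        # underscores in the URL are rewritten
print(surrounding_style(url))  # URL comes back unchanged
```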

rwzy · September 13, 2014, 9:53am

It appears as though your link is dead.

LaTeX style is specifically for use between math delimiters, which is why it is allowed to be shorter. By the way, I proposed dollar signs for math delimiters in that case.

Bengt_Luers · September 13, 2014, 10:04am

It was for illustrative purposes only.

Yes, but it mangles up other areas, too. To use that link in a footnote one would have to escape the underscores:

\footnote{http://myblog.com/why\_commonmark\_is\_awesome/a94b3}

Or is that another problem entirely?
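For reference, the escaping shown above is mechanical. A minimal sketch, assuming only underscores need escaping (real LaTeX text has other special characters, such as % and #, which this hypothetical helper ignores):

```python
def escape_underscores(text: str) -> str:
    """Backslash-escape underscores for LaTeX text mode."""
    return text.replace("_", r"\_")


url = "http://myblog.com/why_commonmark_is_awesome/a94b3"
print(r"\footnote{" + escape_underscores(url) + "}")
```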

rwzy · September 13, 2014, 10:15am

My bad!

I do not understand, I was in agreement with you. I was only attempting to clarify why (because I thought that was an actual link reference which had failed).

What I meant was that the LaTeX style syntax is for use only within math delimiters, where it very rarely, if ever, conflicts with normal text such as your example with underscores between words. This is why it is suitable in that context but not suitable for free text like in markdown, and hence why I agree with @mlorant’s proposed syntax (and also why I proposed dollar signs as math delimiters so that LaTeX style math can be embedded easily).

LaTeX has the \textsuperscript{} command (and \textsubscript{} with the fixltx2e package) for ‘free text’, i.e. text-mode, cases.

Bengt_Luers · September 13, 2014, 10:59am

Yes, in LaTeX underscores can only be used to denote subscripts in math mode, but in text mode they are still special / reserved characters and cause errors like this:

! Missing $ inserted.
<inserted text> 
                $
l.5 test_
         text

The syntax proposed by @mlorant would avoid both problems by not reserving underscore characters and allowing them to be used in text without further meaning.

rwzy · September 13, 2014, 11:13am

Correct. This can be avoided by use of the underscore package though.

Burt_Harris · September 13, 2014, 5:29pm

My opinion: superscripts and underlines are two fine examples of where resorting to new syntactic sugar extensions is not justified in the core CommonMark language.

That’s because simple existing HTML formatting tags like <sup> (and <u>) are compatible with many existing markdown implementations, and don’t look too out-of-place in plain text.

On the other hand, defining a namespace qualified language-escape extension mechanism is the kind of topic worthy of a CommonMark extension. I’m thinking about ways you could indicate to an extended CommonMark processor that you wanted to switch for a while into a math-savvy language, chemistry-savvy language, or even table-savvy language.

See the topic on CURIEs and see if you see what I mean.

rwzy · September 14, 2014, 2:04am

I think it is easier to allow the implementation itself to provide options for enabling extensions, than needing to change the document itself.

Burt_Harris · September 14, 2014, 10:40pm

Sure, it might be easier, but it’s likely to be less portable, extensible, and sustainable…

Eventually you might get to the point where guys who want to use LaTeX notation want to interact with guys who prefer a lightweight extension that only supports subscripts and superscripts on the same (Discourse or like) web site. The only way that works is if each posting is tagged with what extensions it uses.

Now that doesn’t mean that the site doesn’t remember what each user’s preferences are, and have a custom template for their posts that includes the declarations.

rwzy · September 15, 2014, 12:28am

Wouldn’t it be easier for the user if a particular site specified what extensions (if any) they support, and then provided options to enable/disable them? It feels extremely clunky to have to mark up the text just to use a syntax that is already supported; it is probably best to leave activation of the syntax to the site itself. If another site also allows activation of that particular syntax, there should not be any conflicts/portability issues since it is the same specification being used. And why would there be a need to include syntaxes on sites that do not support them? For example, why would you need to have portability for LaTeX-like math equations on a site where nobody would use that feature anyway?

Point is that a “namespace qualified language-escape extension mechanism” does not look like it fits within a markdown document, i.e. it does not look like something the average reader would really understand the meaning of. Remember from the philosophy section of Gruber’s original markdown:

Readability, however, is emphasized above all else. A Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions.

To be fair, LaTeX style equations aren’t really that easy to read in their source form either; something like the syntax proposed by @Kasper in Mathematics extension would be much more readable. But LaTeX style math equations are already established as a much-needed feature to be able to embed (and not to be handled directly by CommonMark), and at least the reader sort of has an understanding of what it is or what it does (I wouldn’t call it that difficult to interpret, just not readable/pretty at a glance). And it should just be the math part of LaTeX, not general LaTeX commands. (Similar to how MathJax doesn’t convert general LaTeX commands and instead promotes the use of native (HTML) elements for text-mode formatting.)

Besides why would there be a conflict with guys who want to use LaTeX notation with those who do not prefer it? If they are not discussing math/equations, why the need for the first group to use LaTeX notation/extensions at all? Shouldn’t they just be using the common syntax they can both understand and communicate to each other in?

Burt_Harris · September 15, 2014, 3:00am

I suggest the concepts and notation of namespace qualified extensions are no more complex than Gruber’s reference-style links (called link reference definitions in CommonMark). It’s really the same principle applied to extensions. The real complexity difference is because Gruber didn’t support extensions well, and isn’t that what led us here?

I think the extension model you are suggesting is oversimple. It doesn’t support evolution well, and leads us down the same path that allowed Gruber’s original to evolve into a fragmented ecosystem of incompatible implementations, dialects and versions. What he wrote was pretty good, but his approach didn’t address a community-based evolution of the language very successfully.

To answer the original question: to keep CommonMark as simple as possible. For the occasional guy who wants to tell his friends that E = mc², the sub/sup tags are fine. I agree it’s not the simplest possible notation, but it’s the simplest solution: a few angle brackets really aren’t that unreadable in plain text. (It’s the deep nesting of angle brackets where SGML style markup fails.)

Everything should be as simple as it can be, but not simpler
– Albert Einstein

Going beyond sub/sup, however, to include topics worthy of extension (e.g. heavier math) has to be at least a little less simple than “read like a plain text email”, or it will fail, either for lack of utility or for failing the test of evolution without version/flavor declarations.

mlorant · September 15, 2014, 6:16am

Please also consider the fact that <sup> is also used in some languages, where it is needed to write correctly. For example, the # 13 in English is nᵒ 13 in French (note it is an o in superscript, not a degree symbol). It is not just about math and physics.

In this case, if it is not implemented as a syntax, should the specification recommend some sort of list of HTML tags not to escape, or something similar?

jericson · September 15, 2014, 8:02am

I agree with you on underlines, because they are just another dreary typewriter habit.

As for sub- (and super-) script, I think those are far more reasonable as anticipated extensions. One of the goals of Markdown is to allow readable text both in the input and output. We lose nothing by not offering syntactic sugar for underlines (in fact we gain quite a bit). But we are left with some uncomfortable compromises with superscript. An author must choose between:

...the 28<sup>th</sup> of the month...

and:

…the 28th of the month…

From personal experience, the latter is just acceptable enough for a reader that few writers bother with the former. That’s a shame because little touches like superscripts go a long way to making a document look professional. The proposed simpler syntax seems like it would push the balance in the other direction, which would make me happy.

Burt_Harris · September 15, 2014, 3:02pm

I don’t know French, but I suggest a better way to deal with that would be to avoid superscripts entirely. I think you are talking about the numero symbol, which in CommonMark can be written as an HTML entity, giving №.

For example, the # 13 in English is № 13 in French…

The same thing works in markdown, pandoc, etc.

Also, if someone writing in French has a numero symbol key on their keyboard (which seems likely), then they can just press it, emit the correct Unicode symbol into the plain text, and get the effect that way. The entity notation is just for someone limited to ASCII characters.
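For readers limited to ASCII, a character reference can stand in for the numero sign (U+2116). A minimal sketch using Python’s standard `html` module to show what three such references decode to. Which spelling the post originally used is not known, so all three are shown purely as illustrations.

```python
import html

# Named, decimal, and hexadecimal character references for U+2116.
for source in ("&numero; 13", "&#8470; 13", "&#x2116; 13"):
    print(source, "->", html.unescape(source))
```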

I disagree. What we lose is compatibility with Gruber’s notation. CommonMark has been very careful to retain that compatibility as much as possible.

If we want better formatting for ordinal numbers like the 28th of the month, that’s a good case for an extension! But we should consider one that takes the commonly accepted plain-text version as input, recognizes the author’s intent, and generates 28ᵗʰ on output. That’s completely sugar-free, meaning that the same input works OK in markdown and better in the extended language.
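A sugar-free ordinal rule of this kind might look roughly like the sketch below. It is an illustration under simple assumptions: English suffixes only, no check that the suffix matches the number (so 1th would also be converted), and a hypothetical function name.

```python
import re

# Digits followed directly by an English ordinal suffix, as a whole word.
ORDINAL = re.compile(r"\b(\d+)(st|nd|rd|th)\b")


def superscript_ordinals(text: str) -> str:
    """Wrap the suffix of plain-text ordinals in <sup> tags."""
    return ORDINAL.sub(r"\1<sup>\2</sup>", text)


print(superscript_ordinals("the 28th of the month"))
```

The input stays ordinary plain text, so a processor without the extension simply leaves 28th as written.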

mlorant · September 15, 2014, 4:29pm

As I said, the degree symbol ° is not the same as ᵒ. Yes, it is just a detail, but as @jericson highlights, this is what makes a document professional or not. I can live without it, but if a readable syntax in both input and output can be added, the CommonMark spec would be more awesome.

Burt_Harris · September 15, 2014, 5:03pm

Are you confusing the degree symbol with the numero symbol? The degree symbol seems completely off-topic to me.

What I did was draw a clear distinction between formatting markup and glyphs. That’s a separation of duties that needs to be maintained: the Unicode consortium is the authority on codifying glyphs, but stays completely neutral on markup; similarly, w3.org stays out of defining new glyphs. See http://www.w3.org/International/questions/qa-chars-vs-markup.

I find I was wrong in guessing that the numero symbol might be on a French AZERTY keyboard. The lack of one there suggests to me that No. is the best plain-text form of №, and that the right approach is a sugar-free extension technique for beautifying that notation, rather than using it as a justification for a sugary N^o^ notation or similar.

Similarly if writing degree symbols really is your concern, a similar sugar-free extension approach could be used. But all of that has to do with better ways of writing well-defined glyphs, and nothing to do with superscripts.

If there’s well-accepted e-mail friendly notation for something, say x^2 for x², then that (no trailing ^ delimiter) might be an interesting idea for an extension, but I don’t know of one for subscript, and I dislike the idea of using tilde for it without requiring some declaration of intent in documents doing so.

jericson · September 16, 2014, 2:19am

I agree that, at least initially, this particular feature should not go into the core of the language. But it should be, like the US Bill of Rights, among the first amendments to the core.

On the other hand, I think we should keep those sorts of DWIM features out of the language entirely. Consider the not unlikely case of someone wishing to turn the superscript ordinals off. Likely we’d need a way to escape the superscript: 28\th or some such.
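To make the escaping concern concrete, here is a rough sketch of an automatic ordinal rule with a backslash opt-out. The 28\th escape form comes from the post; the function name and the exact matching rule are assumptions for illustration only.

```python
import re

# Digits, an optional backslash escape, then an English ordinal suffix.
ORDINAL = re.compile(r"\b(\d+)(\\?)(st|nd|rd|th)\b")


def render_ordinals(text: str) -> str:
    """Superscript ordinal suffixes unless escaped with a backslash."""
    def replace(match: re.Match) -> str:
        number, escape, suffix = match.groups()
        if escape:
            # Escaped: drop the backslash and keep the plain-text ordinal.
            return number + suffix
        return f"{number}<sup>{suffix}</sup>"

    return ORDINAL.sub(replace, text)


print(render_ordinals(r"the 28th, but not the 28\th"))
```

The opt-out works, but the escaped source no longer reads like ordinary plain text, which is the cost being pointed out.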