Emphasis inside Strong broken in JS implementation when parentheses are involved

Why does the following markdown string:

**Gomphocarpus (*Gomphocarpus physocarpus*, syn. *Asclepias physocarpa*)**

produce:

Gomphocarpus (Gomphocarpus physocarpus, syn. Asclepias physocarpa)**

i.e.
<p><em><em>Gomphocarpus (</em>Gomphocarpus physocarpus</em>, syn. <em>Asclepias physocarpa</em>)**</p>

I see this on the Try page of this site and on my own website, where I installed the prebuilt JS parser. Strangely, it is formatted correctly in the Preview pane to the right of the box where I typed this message… (the same happens on Stack Overflow, so I guess a different parser is used here for the WMD preview).

It is because of rule 3 for the emphasis parsing:

3. A single * character can close emphasis iff it is not preceded by whitespace.
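To make the failure concrete, here is a minimal Python sketch of rule 3 exactly as quoted above. The helper name `can_close_old` is hypothetical and it checks only this one rule, not the rest of the emphasis algorithm.

```python
def can_close_old(text, i):
    """Rule 3 as quoted: a single * can close iff it is not preceded by whitespace."""
    return i > 0 and not text[i - 1].isspace()

s = "**Gomphocarpus (*Gomphocarpus physocarpus*, syn. *Asclepias physocarpa*)**"

after_paren = s.index("(*") + 1   # the * right after "("
after_space = s.index(" *A") + 1  # the * right before "Asclepias"

print(can_close_old(s, after_paren))  # True: "(" is not whitespace
print(can_close_old(s, after_space))  # False: preceded by a space
```

Under this rule the `*` right after `(` counts as a possible closer, although the author meant it as an opener.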

This rule should probably be extended to cover an opening parenthesis as well, unless someone has a good counterexample?

You’re right. Thanks.
It would be great to include parentheses. Writing my sentence as I did seemed so natural (and this is the right way to write in botany).

May I add an unrelated question? Is the source of this WMD editor available somewhere? As is done here and on Stack Overflow, I’d like to be able to plug in my image upload service.

Yes, it’s probably a good idea to extend the rules so that closers can’t be preceded by ( (or [ or {?) and openers can’t be followed by ) (or ] or }?). Does anyone see problems with this?

Of course, it would mean you’d need to write

 *`)`*

if you wanted an italicized parenthesis.

By the way, you can work around this for now by using _ for the inner emphasis markers.

:smile: but I’m still in the dev process right now, so there is no hurry…

@jgm - There could be a better approach for handling cases such as this - skip all symbols and punctuation before checking for whitespace before or after the emphasis char.

This would also handle cases like:

foo*. bar -- only closer because followed by space (skipping the dot)
**foo "*bar*" foo** -- skips quotes
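A rough Python sketch of this "skip punctuation, then check for whitespace" idea follows. The helper names are hypothetical, "symbols and punctuation" is assumed to mean Unicode categories starting with `P` or `S`, and the start and end of the text are assumed to behave like whitespace.

```python
import unicodedata

def is_punct_or_symbol(ch):
    # Assumption: "symbols and punctuation" = Unicode P* and S* categories.
    return unicodedata.category(ch)[0] in ("P", "S")

def nearest_significant(text, pos, step):
    """Walk from pos in direction step, skipping punctuation and symbols."""
    while 0 <= pos < len(text) and is_punct_or_symbol(text[pos]):
        pos += step
    # Assumption: running off either end counts as whitespace.
    return text[pos] if 0 <= pos < len(text) else " "

def can_open(text, i):
    """The delimiter at text[i] may open if non-whitespace follows the skipped run."""
    return not nearest_significant(text, i + 1, +1).isspace()

def can_close(text, i):
    """The delimiter at text[i] may close if non-whitespace precedes the skipped run."""
    return not nearest_significant(text, i - 1, -1).isspace()

t = "foo*. bar"
print(can_open(t, 3), can_close(t, 3))  # False True: closer only
```

The skipping loop has no fixed bound, so a long run of punctuation next to a delimiter means scanning the whole run.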

There are some good ideas here. It looks hairy, but if I understand correctly, the basic idea is fairly simple:

  1. Strings of * or _ are divided into “left flanking” and “right flanking,” based on two things: the character immediately before them and the character immediately after.
  2. Left-flanking delimiters can open emphasis, right flanking can close, and non-flanking delimiters are just regular text.
  3. A delimiter is left-flanking if the character to the left has a lower rank than the character to the right, according to the following ranking: spaces and newlines are 0, punctuation (unicode categories Pc, Pd, Ps, Pe, Pi, Pf, Po, Sc, Sk, Sm or So) is 1, the rest 2. And similarly a delimiter is right-flanking if the character to the left has a higher rank than the character to the right.
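The ranking in the three points above can be sketched in a few lines of Python. This is only an illustration of the description in this post, not the code from any actual parser. The names `rank` and `classify` are hypothetical, and treating the start and end of the text as rank 0 is an assumption.

```python
import unicodedata

# Categories the post counts as "punctuation" for ranking purposes.
PUNCT_CATEGORIES = {"Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po", "Sc", "Sk", "Sm", "So"}

def rank(ch):
    """0 for whitespace (or, by assumption, text boundaries), 1 for punctuation, 2 otherwise."""
    if ch == "" or ch.isspace():
        return 0
    if unicodedata.category(ch) in PUNCT_CATEGORIES:
        return 1
    return 2

def classify(text, start, end):
    """Return (left_flanking, right_flanking) for the delimiter run text[start:end]."""
    before = text[start - 1] if start > 0 else ""
    after = text[end] if end < len(text) else ""
    return rank(before) < rank(after), rank(before) > rank(after)

s = "**Gomphocarpus (*Gomphocarpus physocarpus*, syn. *Asclepias physocarpa*)**"
i = s.index("(*") + 1
print(classify(s, i, i + 1))            # (True, False): "(" ranks below "G", so opener only
print(classify(s, len(s) - 2, len(s)))  # (False, True): ")" ranks above end of text, so closer only
```

Only the single characters on either side of the run are inspected, so no lookahead or lookbehind beyond one character is needed.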

@Knagis, your idea is also a good one, I just wanted to mention another alternative that might be worth looking into. Both require recognizing unicode punctuation, which adds another level of complexity to the C parser. And your suggestion might require indefinite lookbehind or lookahead in cases where you have a whole pile of punctuation characters before or after the delimiter.

I’ve implemented a solution on the newemph branch – still need to update the spec and the JS parser. But the examples in this thread are now handled well. If anyone is interested, the commit is here:

I’ve polished this change and merged it into master. Spec, C, and JS implementations have all been updated.

```
% ./cmark
**Gomphocarpus (*Gomphocarpus physocarpus*, syn. *Asclepias physocarpa*)**
^D
<p><strong>Gomphocarpus (<em>Gomphocarpus physocarpus</em>, syn. <em>Asclepias physocarpa</em>)</strong></p>
```

Do you have the commit link handy? I’d like to sync up the Python implementation later.