I experience this on the Try page of this site and on my own website, where I installed the prebuilt JS parser. Strangely, it is formatted correctly in the Preview pane to the right of the box where I typed this message… (the same happens on Stack Overflow, so I guess a different parser is used here for the WMD preview).
You’re right. Thanks.
It would be great to include parentheses. Writing my sentence the way I did seemed so natural (and this is the correct way to write in botany).
May I add an unrelated question? Is the source of this WMD editor available somewhere? As here and on Stack Overflow, I'd like to be able to plug in my own image upload service.
Yes, it’s probably a good idea to extend the rules so that closers can’t be preceded by ( (or [ or {?) and openers can’t be followed by ) (or ] or }?). Does anyone see problems with this?
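A minimal sketch of the restriction proposed above, assuming the characters next to the delimiter run are already known (an empty string stands for line start or end). The helper names are hypothetical, and the optional [ ] { } brackets from the question marks are included.

```python
# Hypothetical helpers for the proposed restriction: a closer may not
# directly follow an opening bracket, and an opener may not directly
# precede a closing bracket. "" means start or end of line.
OPENING_BRACKETS = "([{"
CLOSING_BRACKETS = ")]}"


def closer_forbidden(before: str) -> bool:
    """True if the character before the delimiter run rules out closing."""
    # Check for "" explicitly: in Python, `"" in "([{"` is True.
    return before != "" and before in OPENING_BRACKETS


def opener_forbidden(after: str) -> bool:
    """True if the character after the delimiter run rules out opening."""
    return after != "" and after in CLOSING_BRACKETS


# In "(*foo*)", the first * cannot close and the last * cannot open.
text = "(*foo*)"
print(closer_forbidden(text[0]))  # True: first * is preceded by "("
print(opener_forbidden(text[6]))  # True: last * is followed by ")"
```

These checks would only veto a candidate; the existing whitespace-based rules still decide whether a run can open or close in the first place.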
@jgm There might be a better approach for handling cases like this: skip all symbols and punctuation before checking for whitespace before or after the emphasis character.
This would also handle cases like:
foo*. bar (only a closer, because once the dot is skipped it is followed by a space)
**foo "*bar*" foo** (the quotes are skipped)
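A rough sketch of the skip-punctuation idea, assuming "symbols and punctuation" means the Unicode P* and S* general categories and that a delimiter run is given by its start and end indices. All function names are hypothetical.

```python
import unicodedata


def is_punct_or_symbol(ch: str) -> bool:
    # Unicode general categories P* (punctuation) and S* (symbols)
    return unicodedata.category(ch)[0] in "PS"


def is_space(ch: str) -> bool:
    # "" stands for the start or end of the line
    return ch == "" or ch.isspace()


def prev_significant(text: str, i: int) -> str:
    """Scan left from index i, skipping punctuation and symbols."""
    while i >= 0 and is_punct_or_symbol(text[i]):
        i -= 1
    return text[i] if i >= 0 else ""


def next_significant(text: str, j: int) -> str:
    """Scan right from index j, skipping punctuation and symbols."""
    while j < len(text) and is_punct_or_symbol(text[j]):
        j += 1
    return text[j] if j < len(text) else ""


def open_close(text: str, start: int, end: int) -> tuple[bool, bool]:
    """Return (can_open, can_close) for the delimiter run text[start:end]."""
    before = prev_significant(text, start - 1)
    after = next_significant(text, end)
    return (not is_space(after), not is_space(before))


print(open_close("foo*. bar", 3, 4))            # (False, True): closer only
print(open_close('**foo "*bar*" foo**', 7, 8))  # (True, False): quote skipped
```

Since * and _ are themselves punctuation, this naive scan also skips over neighbouring delimiter runs, and in a long stretch of punctuation the scan is unbounded.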
There are some good ideas here. It looks hairy, but if I understand correctly, the basic idea is fairly simple:
Strings of * or _ are divided into “left flanking” and “right flanking,” based on two things: the character immediately before them and the character immediately after.
Left-flanking delimiters can open emphasis, right-flanking ones can close it, and non-flanking delimiters are just regular text.
A delimiter is left-flanking if the character to its left has a lower rank than the character to its right, according to the following ranking: spaces and newlines are 0, punctuation (Unicode categories Pc, Pd, Ps, Pe, Pi, Pf, Po, Sc, Sk, Sm or So) is 1, and everything else is 2. Similarly, a delimiter is right-flanking if the character to its left has a higher rank than the character to its right.
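The ranking rule fits in a few lines. This is a sketch of the idea as stated, assuming "" stands for line start or end (ranked like a space) and using Python's str.isspace for "spaces and newlines".

```python
import unicodedata

# Categories the proposal ranks as "punctuation" (rank 1).
PUNCTUATION = {"Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po", "Sc", "Sk", "Sm", "So"}


def rank(ch: str) -> int:
    """0 for spaces/newlines (and line start/end, written ""), 1 for punctuation, 2 otherwise."""
    if ch == "" or ch.isspace():
        return 0
    if unicodedata.category(ch) in PUNCTUATION:
        return 1
    return 2


def flanking(before: str, after: str) -> str:
    """Classify a delimiter run from the characters immediately around it."""
    if rank(before) < rank(after):
        return "left"   # may open emphasis
    if rank(before) > rank(after):
        return "right"  # may close emphasis
    return "none"       # plain text


print(flanking(" ", "f"))  # left
print(flanking("o", "."))  # right
print(flanking("o", "o"))  # none (intraword)
```

Only the immediate neighbours are inspected, so no lookbehind or lookahead beyond one character is needed.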
@Knagis, your idea is also a good one; I just wanted to mention another alternative that might be worth looking into. Both require recognizing Unicode punctuation, which adds another level of complexity to the C parser. And your suggestion might require indefinite lookbehind or lookahead in cases where there is a whole pile of punctuation characters before or after the delimiter.
I’ve implemented a solution on the newemph branch. I still need to update the spec and the JS parser, but the examples in this thread are now handled well. If anyone is interested, the commit is here:
In an issue comment on GitHub, I suggested that it may be useful, especially for East Asian languages and others written without inter-word spacing, to treat the different Unicode punctuation classes differently for flanking (i.e. can-open/can-close emphasis) behavior.
Open/Start Ps, e.g. (
Initial Pi, e.g. “
Close/End Pe, e.g. )
Final Pf, e.g. ”
Connector Pc, e.g. _
Dash Pd, e.g. -
Other Po, e.g. ,
Current classification of delimiter runs
␠: any (Unicode) whitespace, including line start and end
.: any (Unicode) punctuation
a: neither punctuation nor whitespace
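| Before | After | Flanking | Open | Close |
|---|---|---|---|---|
| ␠ | ␠ | none | | |
| a | a | both (2a) | * | * |
| . | . | both (2b2) | *, _ (b) | *, _ (b) |
| . | a | left (2a) | *, _ (a) | |
| ␠ | a | left (2a) | *, _ (a) | |
| ␠ | . | left (2b1) | *, _ (a) | |
| a | . | right (2a) | | *, _ (a) |
| a | ␠ | right (2a) | | *, _ (a) |
| . | ␠ | right (2b1) | | *, _ (a) |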
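The rows above can be reproduced mechanically. This sketch is a reading of the table rather than the spec text: "punctuation" is assumed to mean ASCII punctuation plus the Unicode P* categories, "" stands for line start or end, and _ additionally needs (a) not to be flanking on the other side, or (b) punctuation on the side it opens or closes from, as the Open/Close annotations indicate.

```python
import string
import unicodedata


def is_ws(ch: str) -> bool:
    return ch == "" or ch.isspace()  # "" = line start or end


def is_punct(ch: str) -> bool:
    return ch != "" and (ch in string.punctuation
                         or unicodedata.category(ch).startswith("P"))


def left_flanking(before: str, after: str) -> bool:
    # (1) not followed by whitespace, and (2a) not followed by punctuation
    # or (2b) followed by punctuation and preceded by whitespace/punctuation
    return not is_ws(after) and (not is_punct(after) or is_ws(before) or is_punct(before))


def right_flanking(before: str, after: str) -> bool:
    # Mirror image of left_flanking
    return not is_ws(before) and (not is_punct(before) or is_ws(after) or is_punct(after))


def can_open(delim: str, before: str, after: str) -> bool:
    left = left_flanking(before, after)
    if delim == "*":
        return left
    # _: (a) not right-flanking, or (b) preceded by punctuation
    return left and (not right_flanking(before, after) or is_punct(before))


def can_close(delim: str, before: str, after: str) -> bool:
    right = right_flanking(before, after)
    if delim == "*":
        return right
    # _: (a) not left-flanking, or (b) followed by punctuation
    return right and (not left_flanking(before, after) or is_punct(after))


print(can_open("*", "a", "a"), can_open("_", "a", "a"))  # True False
```

The "a a" row is the interesting one: both runs are flanking on both sides, so * can open and close intraword while _ cannot.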
Proposed differentiation of punctuation classes
(: Ps or Pi
): Pe or Pf
,: Pc, Pd or Po
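| Before | After | Flanking | Open | Close |
|---|---|---|---|---|
| , | , | both | *, _ | *, _ |
| ( | ( | left (both) | *, _ | *, _ |
| ) | ) | right (both) | *, _ | *, _ |
| ( | ) | both | *, _? | *, _? |
| ) | ( | both (none?) | *, _ | *, _ |
| ( | , | both (left?) | *, _ | *, _ |
| ) | , | right (both) | *, _ | *, _ |
| , | ( | left (both) | *, _ | *, _ |
| , | ) | both (right?) | *, _ | *, _ |
| ( | a | left | *, _ | |
| ␠ | ( | left | *, _ | |
| a | ( | both (right) | *, _ | *, _ |
| ( | ␠ | right | | *, _ |
| ) | a | both (left) | *, _ | *, _ |
| ␠ | ) | left | *, _ | |
| a | ) | right | | *, _ |
| ) | ␠ | right | | *, _ |
| , | a | left | *, _ | |
| ␠ | , | left | *, _ | |
| a | , | right | | *, _ |
| , | ␠ | right | | *, _ |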
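The three punctuation groups in the legend map directly onto Unicode general categories. A small sketch of that mapping, assuming "" stands for line start or end and that symbols (S*), which the legend does not mention, fall into the "a" class:

```python
import unicodedata


def proposed_class(ch: str) -> str:
    """Map a neighbouring character to the symbols used in the proposed table."""
    if ch == "" or ch.isspace():   # "" = line start or end
        return "␠"
    cat = unicodedata.category(ch)
    if cat in ("Ps", "Pi"):
        return "("                 # opening / initial punctuation
    if cat in ("Pe", "Pf"):
        return ")"                 # closing / final punctuation
    if cat in ("Pc", "Pd", "Po"):
        return ","                 # connector, dash, other punctuation
    return "a"                     # everything else, including symbols (assumption)


print([proposed_class(c) for c in "(“)”_-,x "])
# ['(', '(', ')', ')', ',', ',', ',', 'a', '␠']
```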
A left-flanking delimiter run is a delimiter run that is
(1) not followed by whitespace, and
either (2a) not followed by punctuation,
or (2b) followed by non-starting punctuation and preceded by whitespace or non-ending punctuation,
or (2c) followed by starting punctuation.
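The left-flanking rule above can be checked against every row of the proposed table. In this sketch, left_flanking follows clauses (1) and (2a) to (2c) as written; right_flanking is an assumed mirror image (starting and ending swapped, before and after swapped), since that part of the proposal is not spelled out here. "Punctuation" is taken to mean the Unicode P* categories.

```python
import unicodedata

STARTING = {"Ps", "Pi"}  # "(" in the table legend
ENDING = {"Pe", "Pf"}    # ")" in the table legend


def is_ws(ch: str) -> bool:
    return ch == "" or ch.isspace()  # "" = line start or end


def is_punct(ch: str) -> bool:
    return ch != "" and unicodedata.category(ch).startswith("P")


def is_starting(ch: str) -> bool:
    return ch != "" and unicodedata.category(ch) in STARTING


def is_ending(ch: str) -> bool:
    return ch != "" and unicodedata.category(ch) in ENDING


def left_flanking(before: str, after: str) -> bool:
    if is_ws(after):                               # (1)
        return False
    if not is_punct(after) or is_starting(after):  # (2a), (2c)
        return True
    # (2b): followed by non-starting punctuation
    return is_ws(before) or (is_punct(before) and not is_ending(before))


def right_flanking(before: str, after: str) -> bool:
    # Assumed mirror image of left_flanking.
    if is_ws(before):
        return False
    if not is_punct(before) or is_ending(before):
        return True
    return is_ws(after) or (is_punct(after) and not is_starting(after))


def flanking(before: str, after: str) -> str:
    key = (left_flanking(before, after), right_flanking(before, after))
    return {(True, True): "both", (True, False): "left",
            (False, True): "right", (False, False): "none"}[key]


print(flanking("a", "("))  # both (currently right-flanking only)
print(flanking("(", "("))  # left
```

With this mirror image, the main classification of all 21 rows matches the table (the entries without parentheses).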
A strongly left-flanking delimiter run is a left-flanking delimiter run that is
either (a) not part of a right-flanking delimiter run
or (b) part of a right-flanking delimiter run preceded by punctuation
(…)
A single * character can open emphasis
iff (if and only if) it is part of a left-flanking delimiter run.
A single _ character can open emphasis
iff it is part of a strongly left-flanking delimiter run.
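The opening rules only need three facts about a run, so they can be sketched independently of which flanking definition is used. The function names are hypothetical; the flags would come from whatever left/right-flanking test the parser implements.

```python
def strongly_left_flanking(left: bool, right: bool, preceded_by_punct: bool) -> bool:
    """Left-flanking and either (a) not right-flanking or (b) preceded by punctuation."""
    return left and (not right or preceded_by_punct)


def can_open(delim: str, left: bool, right: bool, preceded_by_punct: bool) -> bool:
    if delim == "*":
        return left
    if delim == "_":
        return strongly_left_flanking(left, right, preceded_by_punct)
    raise ValueError(f"not an emphasis delimiter: {delim!r}")


# An intraword run (flanking on both sides, preceded by a letter):
print(can_open("*", True, True, False))  # True
print(can_open("_", True, True, False))  # False
```

Keeping the flags as parameters lets the same opening rule be combined with either the current or the proposed flanking classification.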