I see this on the Try page of this site and on my own web site, where I installed the prebuilt JS parser. Strangely, it is well formatted in the Preview pane to the right of the box where I typed this message… (the same happens on Stack Overflow, so I guess a different parser is used here for the WMD preview).
There are some good ideas here. It looks hairy, but if I understand correctly, the basic idea is fairly simple:
Strings of * or _ are divided into “left-flanking” and “right-flanking,” based on two things: the character immediately before them and the character immediately after.
Left-flanking delimiters can open emphasis, right-flanking delimiters can close it, and non-flanking delimiters are just regular text.
A delimiter is left-flanking if the character to the left has a lower rank than the character to the right, according to the following ranking: spaces and newlines are 0, punctuation (Unicode categories Pc, Pd, Ps, Pe, Pi, Pf, Po, Sc, Sk, Sm or So) is 1, and everything else is 2. Similarly, a delimiter is right-flanking if the character to the left has a higher rank than the character to the right.
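To check my understanding, here is a rough Python sketch of that ranking rule. It is only an illustration, not code from any of the parsers: the names `rank` and `classify` are made up, and I am assuming that the start or end of the text counts as rank 0 and that any whitespace character (not just spaces and newlines) counts as rank 0 too.

```python
import unicodedata

# Unicode categories treated as "punctuation" by the ranking described above.
PUNCT_CATEGORIES = {"Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po", "Sc", "Sk", "Sm", "So"}


def rank(ch):
    """Rank a neighbouring character: 0 = whitespace, 1 = punctuation, 2 = the rest.

    Assumption: an empty string (start or end of the text) is ranked 0,
    and every whitespace character is ranked like a space or newline.
    """
    if ch == "" or ch.isspace():
        return 0
    if unicodedata.category(ch) in PUNCT_CATEGORIES:
        return 1
    return 2


def classify(text, start, end):
    """Classify the delimiter run text[start:end] by comparing its neighbours."""
    before = text[start - 1] if start > 0 else ""
    after = text[end] if end < len(text) else ""
    left, right = rank(before), rank(after)
    if left < right:
        return "left-flanking"   # may open emphasis
    if left > right:
        return "right-flanking"  # may close emphasis
    return "non-flanking"        # plain text


print(classify("*foo*", 0, 1))    # opening delimiter
print(classify("*foo*", 4, 5))    # closing delimiter
print(classify("a * b", 2, 3))    # surrounded by spaces
print(classify("(*foo*)", 1, 2))  # punctuation on the left, letter on the right
```

Note that only the single character on each side is inspected, so no lookbehind or lookahead past the immediate neighbours is needed. One consequence of comparing ranks strictly is that a delimiter between two letters, as in `foo*bar`, comes out as non-flanking.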
@Knagis, your idea is also a good one; I just wanted to mention another alternative that might be worth looking into. Both require recognizing Unicode punctuation, which adds another level of complexity to the C parser. And your suggestion might require indefinite lookbehind or lookahead in cases where you have a whole pile of punctuation characters before or after the delimiter.