Absolutely. I shortened my example for brevity’s sake - <rb> and <rp> tags should of course be used in practice.
That raises another thought, by the way - the <rp> tags should probably contain 【 and 】, unless there’s a common application for ruby tags where those aren’t appropriate.
Edit: Here’s some more info: <ruby> tags, along with <rt> and <rp>, are part of the living HTML standard. <rb> and <rtc>, however, are part of the HTML5 spec. This could mean that support for them is worse (although I don’t know). For the purposes of this, however, that doesn’t matter: <rtc> isn’t used at all, and <rb> doesn’t actually do anything (and will thus work in browsers that don’t support it) unless tags are out of order (e.g. <rb>1</rb> <rb>2</rb> <rb>3</rb> <rt>1</rt> <rt>2</rt> <rt>3</rt>, which this extension wouldn’t do).
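For reference, here is a minimal sketch of the markup being discussed, written in Python purely for illustration. The function name `ruby_html` and its parameters are my own assumptions, not part of any plugin; it only shows where <rb>, <rt>, and the <rp> fallbacks would sit, with 【 and 】 as the fallback delimiters.

```python
from html import escape


def ruby_html(base: str, reading: str, left: str = "【", right: str = "】") -> str:
    """Build one <ruby> element with an <rb> base, an <rt> annotation,
    and <rp> fallbacks for browsers without ruby support.

    Hypothetical helper for illustration; the delimiters are parameters
    because the thread has not settled on them.
    """
    return (
        "<ruby>"
        f"<rb>{escape(base)}</rb>"
        f"<rp>{escape(left)}</rp>"
        f"<rt>{escape(reading)}</rt>"
        f"<rp>{escape(right)}</rp>"
        "</ruby>"
    )


print(ruby_html("提案", "ていあん"))
```

A browser that ignores ruby markup would show the base followed by the reading wrapped in the <rp> contents, which is why the choice of delimiter matters.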
I’d be inclined to agree, though my only exposure to ruby text is for Japanese. If 【】 is common across other languages, and won’t lead to ambiguity with other uses of 【】, then yeah. The standard mentions how ( ) can be ambiguous in some situations, so the same logic applies to whatever delimiter is chosen.
I think it’s safe to say that bare ( and ) are bad choices in any case, as they would look ridiculous in most text. Space-padded " (" and ") " are better (notice the space?), but not too friendly toward full-width text. Since the spec specifically mentions that ruby is “primarily used in East Asian typography as a guide for pronunciation or to include other annotations”, I think 【 and 】 are a fairly safe bet.
Went ahead and wrote a tangentially related proposal for full-width formatting characters. It would make it possible to write furigana within entirely full-width Japanese text (e.g. いい[提案]【ていあん】ですね。).
I can’t think of a case where I would want to display that. It is an exception, but it makes it much easier to add ruby text for compound words. Using the same example, it would be denoted 振【ふ】り向【む】く
or [振]{ふ}り[向]{む}く
This requires typing out the compound to get it to appear in the IME, then backtracking to add the ruby text. It might be worth it as an optional feature.
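To make the two notations concrete, here is a rough Python sketch that accepts both the explicit [base]{text} form and the implicit kanji【text】 form. Everything in it is an assumption for illustration: the name `render`, the use of regexes, and especially the approximation of "kanji" as the CJK Unified Ideographs block plus 々.

```python
import re
from html import escape

# Explicit form: [base]{text}
EXPLICIT = r"\[(?P<ebase>[^\[\]]+)\]\{(?P<etext>[^{}]+)\}"
# Implicit form: a run of kanji directly followed by 【text】.
# Assumption: "kanji" is approximated by U+4E00..U+9FFF plus the iteration mark.
IMPLICIT = r"(?P<ibase>[\u4e00-\u9fff々]+)【(?P<itext>[^【】]+)】"
PATTERN = re.compile(EXPLICIT + "|" + IMPLICIT)


def _replace(match: re.Match) -> str:
    # Exactly one of the two alternatives matched, so pick its groups.
    base = match.group("ebase") or match.group("ibase")
    text = match.group("etext") or match.group("itext")
    return (
        f"<ruby>{escape(base)}<rp>【</rp>"
        f"<rt>{escape(text)}</rt><rp>】</rp></ruby>"
    )


def render(source: str) -> str:
    """Replace both ruby notations with <ruby> markup in a single pass."""
    return PATTERN.sub(_replace, source)


print(render("振【ふ】り向【む】く"))
print(render("[振]{ふ}り[向]{む}く"))
```

Both calls produce the same markup. The implicit form only works because the kana between the kanji ends each base, which is the text-type dependence that makes it risky as a general rule.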
In Discourse, you normally can escape brackets with a backslash (applies to {}, (), []). In my plugin I got rid of all backslashes before 【】 in the baked text and ignored the ones with backslashes, but it might be worth allowing them to be escaped like any other set of brackets. The other full-width brackets would also be good candidates.
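The escaping behaviour described here could look roughly like the following Python sketch. It is not the plugin's code; the names `bake`, `RUBY`, and `ESCAPED` are hypothetical, and the kanji range is the same rough approximation (U+4E00..U+9FFF plus 々).

```python
import re

# Assumption: a ruby annotation is a kanji run directly followed by 【...】.
# A backslash between the kanji and 【 breaks the adjacency, so an escaped
# bracket is never treated as ruby.
RUBY = re.compile(r"([\u4e00-\u9fff々]+)【([^【】\\]+)】")
# Backslash-escaped full-width brackets, stripped after ruby conversion.
ESCAPED = re.compile(r"\\([【】])")


def bake(text: str) -> str:
    """Convert unescaped annotations, then drop the escaping backslashes."""
    baked = RUBY.sub(r"<ruby>\1<rp>【</rp><rt>\2</rt><rp>】</rp></ruby>", text)
    return ESCAPED.sub(r"\1", baked)


print(bake("漢字【かんじ】"))      # converted to ruby markup
print(bake("漢字\\【かんじ\\】"))  # left as literal brackets
```

Running the unescape step last keeps the literal brackets from being picked up as ruby on the same pass.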
I will update it to the markdown-it engine. It shouldn’t be difficult since it is a preprocessor on the whole text.
The new engine does not like this, for very good reason. When you add rules you need to find the right place to inject them; in this case it would be an inline rule, so you could probably just push it to the end of the stack.
I made the []{} version on the markdown-it engine as an inline rule at the top of the stack. It should run a lot better than large numbers of unreadably large regexes.
The inline rule will work with the character-separated syntax for multiple ruby tags ([図書館]^(と しょ かん)). It can also be switched over to the []^() syntax if necessary.
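As an illustration of the character-separated idea, here is a small Python sketch. It is an assumption about the behaviour, not the actual inline rule: if the number of whitespace-separated readings equals the number of base characters, each character gets its own reading; otherwise the whole reading is attached to the whole base. The names `render`, `_pair`, and `GROUP` are made up for the example.

```python
import re
from html import escape

# Matches [base]^(readings) as written in the example above.
GROUP = re.compile(r"\[([^\[\]]+)\]\^\(([^()]+)\)")


def _pair(base: str, reading: str) -> str:
    """One base/annotation pair with 【】 fallbacks."""
    return (
        f"<rb>{escape(base)}</rb><rp>【</rp>"
        f"<rt>{escape(reading)}</rt><rp>】</rp>"
    )


def _replace(match: re.Match) -> str:
    base = match.group(1)
    readings = match.group(2).split()  # split() also splits on U+3000
    if len(readings) == len(base):
        # One reading per base character.
        body = "".join(_pair(ch, rd) for ch, rd in zip(base, readings))
    else:
        # Counts differ: annotate the whole base with the reading as typed.
        body = _pair(base, match.group(2))
    return f"<ruby>{body}</ruby>"


def render(source: str) -> str:
    return GROUP.sub(_replace, source)


print(render("[図書館]^(と しょ かん)"))
```

The fallback branch is a design guess; an implementation could just as reasonably reject mismatched counts.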
Supporting full-width brackets in an inline rule right now is problematic because markdown-it does not stop on them. However, it shouldn’t be an issue with []【】.
I can add the same type of pattern matching as the old version I wrote but it might be better as an optional feature rather than a part of the spec.
What syntax would be best for CommonMark? I think []{} is the easiest to type in most cases.
The 【】 syntax is probably too dependent on the text type for the spec. It uses non-CJK characters to determine where to place the ruby tag, and only saves a couple keystrokes. It could also have unintended consequences on existing plaintext documents (【】 are used in headings/titles).
Yes. I was referring to the 漢字【かんじ】 syntax only. The unambiguous one would be completely fine.
Without full support for full-width brackets in CommonMark though, []【】 and other syntax with only full-width brackets would be a bit of an issue.
The implementation Discourse uses, markdown-it, skips over sections of text not in a specific set of characters. This set does not include any full-width characters since they are not used elsewhere in the spec.
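A toy Python model of that behaviour may help. The character set below is illustrative only (I have not copied it from markdown-it's source), and `stop_positions` is a hypothetical helper; the point is just that inline rules only get a chance to run at positions holding one of these characters.

```python
# Assumption: an illustrative set of ASCII "stop" characters, not the
# exact list used by markdown-it.
TERMINATORS = set("\n$%&*+-:<=>@[\\]^_`{}~")


def stop_positions(source: str) -> list[int]:
    """Return the indexes where a text scanner would stop and let
    inline rules try to match. Everything else is consumed as plain text."""
    return [i for i, ch in enumerate(source) if ch in TERMINATORS]


print(stop_positions("[提案]{ていあん}"))  # stops at [, ], { and }
print(stop_positions("提案【ていあん】"))  # no stops: 【 and 】 are skipped
```

With no stop at 【 or 】, a rule keyed on those characters never gets called.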
There would be no way to escape it.
My plugin uses markdown-it so adding a syntax with only full-width brackets will require some hacks to address the above.
The best solution for adding something like []【】 would be full-width support in the spec as you proposed.
The full-width issue relates to the list of stop chars for inlines. It’s an implementation detail that makes it annoying to build this plugin, but my new text post processor helps a bit (though if you want bold or italic ruby text you would be stuck).
Whether you use furigana (the Japanese use case for ruby) or not is really dependent on text type and audience. If you were inclined to use furigana at all, though, 爨 is most definitely a kanji you would use it on, as it’s both hyōgai (non-standard) and very complicated. For example, you’d write [炊爨]【すいさん】 or [爨]【かし】ぐ to make things readable.
I’ve never seen anyone use any formatting within Ruby text, and both Japanese and Chinese traditionally lack both bold and italic, so it most definitely isn’t too important.
I don’t think there’s a reason to explicitly forbid it, though. Since formatting is available in regular CJK text (あいう), it might as well be valid within the base text of a Ruby tag. I can also imagine a scenario where someone would want to emphasize a certain part of pronunciation ("She actually says [寂]【さ**み**】しい in this case"), in which case formatting in the actual Ruby text would be useful. None of those would be particularly common, I don’t think, but it could become a minor thing people would stumble over every now and then.
The one thing that might get weird is ending a formatting block within a Ruby block. For example, *Italic outside[italic base]【italic ruby* normal ruby】. Either formatting could be disallowed within Ruby tags, or this could result in something like:
I suppose it could also introduce a new “context” where new formatting tags can be started but earlier ones can’t be completed… I’m sure you’re more experienced with that kind of thing than I am, though – it’s bound to have come up with other elements before. Stick to what CommonMark usually does, I guess.
In the Discourse context, the main reason is that I cannot make this an inline rule; it would have to be a post-process rule that walks through text nodes. ]【 etc. are skipped in inlines, so you would get no formatting in these tags.
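Here is a simplified Python model of such a post-process step. The `Token` class is a stand-in I made up and is much simpler than a real markdown-it token; `split_text` and `post_process` are likewise hypothetical names. The sketch walks the token list, splits each text node around [base]【text】 matches, and leaves every other token alone.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    """Simplified stand-in for a parser token (assumption, not markdown-it's)."""
    type: str          # "text", "ruby", or any other type such as "strong_open"
    content: str = ""  # text content, or the ruby base
    ruby: str = ""     # annotation text for "ruby" tokens

RUBY = re.compile(r"\[([^\[\]]+)\]【([^【】]+)】")


def split_text(token: Token) -> list[Token]:
    """Split one text token into text and ruby tokens."""
    out, pos = [], 0
    for m in RUBY.finditer(token.content):
        if m.start() > pos:
            out.append(Token("text", token.content[pos:m.start()]))
        out.append(Token("ruby", m.group(1), m.group(2)))
        pos = m.end()
    if pos < len(token.content):
        out.append(Token("text", token.content[pos:]))
    return out


def post_process(tokens: list[Token]) -> list[Token]:
    """Walk the token list and rewrite only the text nodes."""
    result = []
    for tok in tokens:
        result.extend(split_text(tok) if tok.type == "text" else [tok])
    return result


print(post_process([Token("text", "いい[提案]【ていあん】ですね。")]))
```

Because each text node is examined on its own, an annotation interrupted by formatting tokens (as in [寂]【さ**み**】しい) is spread over several nodes and never matches, which is the limitation described above.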