Proper ruby text (<rb>) syntax support in Markdown

Whether you use furigana (the Japanese use case for ruby) or not is really dependent on text type and audience. If you were inclined to use furigana at all, though, 爨 is most definitely a kanji you would use it on, as it’s both hyōgai (non-standard) and very complicated. For example, you’d write [炊爨]【すいさん】 or [爨]【かし】ぐ to make things readable.

I am warming up to just adding this syntax to Discourse core, provided it is behind a site setting.

Can you clarify whether you need formatting inside the brackets, e.g. [*爨*]【*かし*】?

I’ve never seen anyone use any formatting within Ruby text, and both Japanese and Chinese traditionally lack both bold and italic, so it most definitely isn’t too important.

I don’t think there’s a reason to explicitly forbid it, though. Since formatting is available in regular CJK text, it might as well be valid within the base text of a Ruby tag. I can also imagine a scenario where someone would want to emphasize a certain part of pronunciation ("She actually says [寂]【さ**み**】しい in this case"), in which case formatting in the actual Ruby text would be useful. None of those would be particularly common, I don’t think, but it could become a minor thing people would stumble over every now and then.

The one thing that might get weird is ending a formatting block within a Ruby block. For example, *Italic outside[italic base]【italic ruby* normal ruby】. Either formatting could be disallowed within Ruby tags, or this could result in something like:

<i>Italic outside</i>
<ruby>
    <rb>
        <i>italic base</i>
    </rb>
    <rt>
        <i>italic ruby</i> normal ruby
    </rt>
</ruby>

I suppose it could also introduce a new “context” where new formatting tags can be started but earlier ones can’t be completed… I’m sure you’re more experienced with that kind of thing than I am, though – it’s bound to have come up with other elements before. Stick to what CommonMark usually does, I guess.

In the Discourse context, the main reason I ask is that I cannot make this an inline rule; it would have to be a post-processing rule that walks through text nodes. Sequences like ]【 are skipped in inlines, so you would get no formatting inside these tags.
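To illustrate the post-processing idea, here is a rough sketch in Python. It is not Discourse's actual pipeline: the function name `ruby_postprocess` and the regex are assumptions made for illustration. It rewrites `[base]【reading】` pairs inside a single, already-extracted text node and does not parse any formatting inside the brackets, which matches the limitation described above.

```python
import html
import re

# Hypothetical pattern: [base]【reading】, where neither part contains
# its own kind of bracket.
RUBY_PATTERN = re.compile(r"\[([^\[\]]+)\]【([^【】]+)】")


def ruby_postprocess(text_node: str) -> str:
    """Rewrite [base]【reading】 pairs in one plain text node as <ruby> HTML."""
    parts = []
    last = 0
    for m in RUBY_PATTERN.finditer(text_node):
        # Text between matches is escaped and passed through unchanged.
        parts.append(html.escape(text_node[last:m.start()]))
        base, reading = (html.escape(g) for g in m.groups())
        parts.append(f"<ruby>{base}<rt>{reading}</rt></ruby>")
        last = m.end()
    parts.append(html.escape(text_node[last:]))
    return "".join(parts)


print(ruby_postprocess("[爨]【かし】ぐ"))
```

Because the rule only ever sees text nodes, something like `[*爨*]【*かし*】` would have been split up by the inline parser before this step, so it would simply not match.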

I see. In that case, it definitely isn’t a very big deal at all.

What about inclusion in CommonMark itself? It’d certainly be useful outside of Discourse too.

My vote is yes, it should be included, but I have no real say here at all; deciding what is or is not included is up to @jgm.

Unlikely, as we’re still trying to get the basics nailed down for 1.0 before adding extensions like this.

I’m mainly afraid that sites/software/services will implement CommonMark 1.0 and be done with it, with most never migrating to later versions as the first “complete” version is “good enough”. In that case, features like this would be relegated to relative obscurity, which is a terribly sad thought.

Not really. Tables, for example, are a popular extension but are not part of the spec at the current time.

Japanese here. This is my opinion:

  • 【】 is commonly used for titles or categories (e.g. 【速報】(news)) in Japanese text, so I feel that using 【】 for ruby is a bad idea. I don’t know why Japanese StackExchange chose it. I would seriously like to ask StackOverflow’s developers about this decision.
  • Aozora Bunko also has a ruby syntax for plain text: 《》. (description and example.) This syntax is literally the biggest example of a ruby syntax in the world. 《》 was originally used by the Books for the Blind Association in Japan, and Aozora Bunko follows it. If you say “let’s use 【】 for ruby”, then I will also say “why not 《》? ;-)”. Aozora Bunko doesn’t adopt Markdown, though.
  • 【】 and 《》 are hard to type on normal keyboards, and even a bit hard with a Japanese IME. Currently all Markdown syntax is composed of single-byte characters that are easy to type on an English keyboard. I hope this de facto standard will be kept.
  • I believe ruby is not only for Japanese text. It is a form of expression available to text in any language, and someday someone may find a great use for it in some other language. I mean that we don’t need to tie it to a specific language like Japanese.

Did I stop the discussion? :thinking:
Sorry if it was offensive. :cry:
I only gave the above opinion :loudspeaker: as one person in Japan. :jp:
Of course, it is just my personal opinion :loudspeaker:, and I’m not attached to it. :tipping_hand_woman:
I hope that the discussion :speaking_head: will resume again. :pray::pray::pray:

Discussion is rather slow; don’t take it personally. That is the normal pace of discussion here sometimes :wink: :turtle:

I’d say it depends on our intended use case.

If we were strictly implementing a friendlier way to mark CJK pronunciations, then any of the syntaxes proposed by @tom-n would work.

Taiwanese Bopomofo also adopts a convention like 國語辭典(ㄍㄨㄛˊ ㄩˇ ㄘˊ ㄉㄧㄢˇ), so a syntax that counts N space-separated components in parentheses and applies them back to the N preceding kanji characters (\p{Han} in regexp) would be decent. The only problem is the potential complexity this introduces to the parser.
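As a rough illustration of that counting rule, here is a Python sketch. Everything in it is an assumption for the sake of the example: the function name `annotate` is made up, and because Python's built-in `re` module has no `\p{Han}`, the check is approximated with the basic CJK Unified Ideographs block only. HTML escaping is left out to keep it short.

```python
import re

# Assumption: approximate \p{Han} with U+4E00..U+9FFF only.
HAN = re.compile(r"[\u4e00-\u9fff]")
# A parenthesised group with no nested parentheses.
PAREN = re.compile(r"\(([^()]+)\)")


def annotate(text: str) -> str:
    """Apply N space-separated readings to the N Han characters before them."""
    out = []
    last = 0
    for m in PAREN.finditer(text):
        readings = m.group(1).split()
        n = len(readings)
        before = text[last:m.start()]
        base = before[-n:] if n else ""
        if n and len(base) == n and all(HAN.fullmatch(ch) for ch in base):
            # Keep everything before the annotated run, then emit the ruby.
            out.append(before[:-n])
            pairs = "".join(f"{ch}<rt>{r}</rt>" for ch, r in zip(base, readings))
            out.append(f"<ruby>{pairs}</ruby>")
        else:
            # Not a reading group: leave the parentheses untouched.
            out.append(text[last:m.end()])
        last = m.end()
    out.append(text[last:])
    return "".join(out)


print(annotate("國語辭典(ㄍㄨㄛˊ ㄩˇ ㄘˊ ㄉㄧㄢˇ)"))
```

Even this small version has to look backwards from each parenthesis and decide whether it is an annotation or ordinary text, which hints at the parser complexity mentioned above.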

On the other hand, we could stick to a language-neutral syntax, given that people are using ruby text more expressively nowadays. Good examples are marking katakana words with the word in their original language (アクセラレータaccelerator), or supplementing aboriginal place names with their literal meaning (Iranmeylek天神的女兒), neither of which is necessarily about kanji.

Something like {漢字}(かん-じ) might be a good general syntax, and we may suggest a full-width alternative as a standard that individual implementations are free to adopt.
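A minimal Python sketch of how `{base}(reading)` could be handled, under one assumption that the post does not spell out: that the hyphen in かん-じ splits the reading per base character. The function name `render_brace_ruby` is hypothetical, and HTML escaping is omitted for brevity.

```python
import re

# {base}(reading), with no nested braces or parentheses.
BRACE_RUBY = re.compile(r"\{([^{}]+)\}\(([^()]+)\)")


def render_brace_ruby(text: str) -> str:
    """Render {base}(reading); hyphens split the reading per base character."""
    def repl(m: re.Match) -> str:
        base, reading = m.group(1), m.group(2)
        parts = reading.split("-")
        if len(parts) == len(base):
            # One reading segment per base character.
            body = "".join(f"{c}<rt>{p}</rt>" for c, p in zip(base, parts))
        else:
            # Counts differ: annotate the whole base with the whole reading.
            body = f"{base}<rt>{reading}</rt>"
        return f"<ruby>{body}</ruby>"

    return BRACE_RUBY.sub(repl, text)


print(render_brace_ruby("{漢字}(かん-じ)"))
print(render_brace_ruby("{アクセラレータ}(accelerator)"))
```

The fallback branch is what keeps the syntax language-neutral: a base like アクセラレータ gets one annotation for the whole word, with no per-character logic involved.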

Any thoughts? :slight_smile:

I hope that we can make the syntax for ruby annotation as semantically general as possible.
Actually, I’m considering using ruby annotation for annotating classic texts, for example, to give a better reading experience for texts with many inline annotations or commentaries. For example,

1 From Paul, an ·apostle [messenger] of Christ Jesus. ·I am an apostle because that is what God 
wanted [L …by the will of God].

To ·God’s holy people [T the saints] living in Ephesus[a] [C a prominent city in the Roman province 
of Asia, present-day western Turkey; Acts 19], ·believers in [or who are faithful to] Christ Jesus:

I’m looking for a Markdown implementation with which I can convert the annotations in the above text into ruby annotations, similar to the examples above.

I think that for my interest, the syntax {漢字}(かん-じ) would be great!

Please refer me to any potential lead. Thanks in advance!

An interesting feature of the HTML markup is the <rp> tag, designed to add parentheses or other fallback rendering for old browsers that don’t understand the ruby tags.
In a similar spirit, whatever syntax extension you invent for Markdown, many Markdown converters will not understand it and will likely dump the syntax as-is to the output. Is that a consideration? Which, if any, of these syntaxes are acceptable fallbacks? For example, the caret in syntaxes like “[図書館]^(としょかん)”, while it has a nice geeky meaning, probably looks silly if dumped as-is to the output?
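To make the comparison concrete, here is a small Python sketch with two hypothetical helpers: `ruby_html` emits ruby markup with `<rp>` parentheses, and `plain_fallback` shows what a browser without ruby support would effectively display. A Markdown syntax that degrades to something close to the second form would arguably be the friendliest fallback.

```python
import html


def ruby_html(base: str, reading: str) -> str:
    """Ruby markup with <rp> parentheses for browsers without ruby support."""
    b, r = html.escape(base), html.escape(reading)
    return f"<ruby>{b}<rp>(</rp><rt>{r}</rt><rp>)</rp></ruby>"


def plain_fallback(base: str, reading: str) -> str:
    """What a reader sees when the ruby tags are ignored but <rp> is shown."""
    return f"{base}({reading})"


print(ruby_html("図書館", "としょかん"))
print(plain_fallback("図書館", "としょかん"))
```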

Hello everyone,

I published a Ruby library that supports the ruby element in Markdown.

The syntax I chose is

[漢字(かんじ)]

Reasons being:

  • It looks like Markdown
  • You do not need to type any special brackets such as 【】
  • You can annotate each character
  • Linking ruby text works the same way as linking in Markdown

An example:
[漢字(かんじ)](https://jisho.org/search/漢字)

# Annotate each character
[漢(かん)][字(じ)]

# Link separately
[漢(かん)](https://jisho.org/search/漢)[字(じ)](https://jisho.org/search/字)

# Link together
[[漢(かん)][字(じ)]](https://jisho.org/search/漢字)

You can see how the above renders here. I have used this to write 150 posts involving ruby markup, and it works fine.

Hope this helps!
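For readers who want to experiment with the [base(reading)] syntax above, here is a rough Python sketch. It is not the Ruby library's actual code: the function name `render` and both regexes are assumptions for illustration, and escaping is omitted. It converts ruby pairs first, then treats any remaining `[...](url)` as an ordinary link, which is enough to cover the "link together" case.

```python
import re

# [base(reading)], optionally followed directly by (url).
RUBY = re.compile(r"\[([^\[\]()]+)\(([^\[\]()]+)\)\](?:\(([^()\s]+)\))?")
# Ordinary [text](url) link, applied after ruby pairs are converted.
LINK = re.compile(r"\[([^\[\]]+)\]\(([^()\s]+)\)")


def _ruby(m: re.Match) -> str:
    base, reading, url = m.groups()
    ruby = f"<ruby>{base}<rt>{reading}</rt></ruby>"
    return f'<a href="{url}">{ruby}</a>' if url else ruby


def render(text: str) -> str:
    """Convert [base(reading)] pairs, then wrap any remaining links."""
    text = RUBY.sub(_ruby, text)
    return LINK.sub(lambda m: f'<a href="{m.group(2)}">{m.group(1)}</a>', text)


print(render("[漢(かん)][字(じ)]"))
print(render("[[漢(かん)][字(じ)]](https://jisho.org/search/漢字)"))
```

One trade-off worth noting: ordinary text such as `[foo(bar)]` would also be read as ruby, so a real implementation would probably need an escape mechanism.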

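The elements involved in ruby are ruby, rt, and rp. First use ruby to specify a particular expression, then use rt to provide the annotation. The rt part is displayed above the expression.

In the example below, the pinyin is displayed above the characters.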

<ruby>
温<rt>wēn</rt>
习<rt>xí</rt>
饭<rt>fàn</rt>
</ruby>

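The result is 温习饭 with wēn, xí, and fàn displayed above the corresponding characters.

However, in browsers that do not support ruby, rp is needed to visually separate the two parts.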

<ruby>
   温<rp>(</rp><rt>wēn</rt><rp>)</rp>
   习<rp>(</rp><rt>xí</rt><rp>)</rp>
   饭<rp>(</rp><rt>fàn</rt><rp>)</rp>
</ruby>

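The <ruby> tag wraps all the characters that need annotation, while the tag that actually places the annotation above the text is <rt>. Because this is an HTML5 tag, many older browsers do not support it, and when a browser lacks support the text is displayed like “温wēn习xí饭fàn”, which looks quite unattractive. With <rp> added, a browser without support displays 温(wēn)习(xí)饭(fàn) instead.

Source: HTML5 汉字上方添加拼音标注 ruby、rp、rt_晨风的专栏-CSDN博客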


I use Ruby-Text as shown below…

I think your implementation is fantastic and the example at Markdown Extension: Ruby Markup — Ruby — Juanito Fatas explains wonderfully why this is such a cool extension.

Curious what @jgm thinks here. This issue is very foreign to me as a non-Japanese speaker, but it feels incredibly common for the 120 million or so people who would find this very handy.

The [rb(rt)] syntax looks nice! The one thing I would like to say is that half-width brackets are difficult to type with a Japanese IME active, necessitating either switching your input method mid-sentence and then back again, or paging through suggestions for a while to get to the correct brackets. For this to actually facilitate comfortable typing, full-width formatting characters will also need to be supported.
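One way this could look, as a hedged Python sketch: a single pattern that accepts either half-width or full-width brackets in each position, so nobody has to switch input methods. The name `render_ruby` and the regex are assumptions for illustration, not part of any existing implementation, and escaping is omitted.

```python
import re

# Accept [ or ［, ( or （, ) or ）, ] or ］ in each position.
RUBY_ANY_WIDTH = re.compile(
    r"[\[［]([^\[\]［］()（）]+)[(（]([^\[\]［］()（）]+)[)）][\]］]"
)


def render_ruby(text: str) -> str:
    """Render [base(reading)] written with half- or full-width brackets."""
    return RUBY_ANY_WIDTH.sub(
        lambda m: f"<ruby>{m.group(1)}<rt>{m.group(2)}</rt></ruby>", text
    )


print(render_ruby("［漢字（かんじ）］"))
print(render_ruby("[漢字(かんじ)]"))
```

Matching both widths inside the pattern, rather than normalizing the whole text first, leaves full-width parentheses in ordinary prose untouched.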

I would love to see a common Markdown syntax for this! It would be ideal if that syntax could support a wide range of use cases. W3C has a nice collection of them here: Use Cases - Interlinear Text Layout Community Group