Proper ruby text (<rb>) syntax support in Markdown


#21

The full-width issue relates to the list of stop chars for inlines. It’s an implementation detail that makes this plugin annoying to build, but my new text post-processor helps a bit (though if you wanted bold or italic ruby text you would be stuck).

Be sure to read


#22

Would you use ruby text for 爨? Are people able to handwrite that?


#23

Whether you use furigana (the Japanese use case for ruby) or not is really dependent on text type and audience. If you were inclined to use furigana at all, though, 爨 is most definitely a kanji you would use it on, as it’s both hyōgai (non-standard) and very complicated. For example, you’d write [炊爨]【すいさん】 or [爨]【かし】ぐ to make things readable.


#24

I am warming up to just adding this syntax to Discourse core, provided it is behind a site setting.

Can you clarify if you require formatting in the brackets, eg: [*爨*]【*かし*】 ?


#25

I’ve never seen anyone use any formatting within Ruby text, and both Japanese and Chinese traditionally lack both bold and italic, so it most definitely isn’t too important.

I don’t think there’s a reason to explicitly forbid it, though. Since formatting is available in regular CJK text, it might as well be valid within the base text of a Ruby tag. I can also imagine a scenario where someone would want to emphasize a certain part of pronunciation ("She actually says [寂]【さ**み**】しい in this case"), in which case formatting in the actual Ruby text would be useful. None of those would be particularly common, I don’t think, but it could become a minor thing people would stumble over every now and then.

The one thing that might get weird is ending a formatting block within a Ruby block. For example, *Italic outside[italic base]【italic ruby* normal ruby】. Either formatting could be disallowed within Ruby tags, or this could result in something like:

<i>Italic outside</i>
<ruby>
    <rb>
        <i>italic base</i>
    </rb>
    <rt>
        <i>italic ruby</i> normal ruby
    </rt>
</ruby>

I suppose it could also introduce a new “context” where new formatting tags can be started but earlier ones can’t be completed… I’m sure you’re more experienced with that kind of thing than I am, though – it’s bound to have come up with other elements before. Stick to what CommonMark usually does, I guess.


#26

In the Discourse context, the main reason is that I cannot make this an inline rule; it would have to be a post-process rule that walks through text nodes. Characters like ]【 are skipped in inlines, so you would get no formatting inside these tags.
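
As a rough illustration of what such a post-process rule might look like (a Python sketch, not the actual Discourse code; the `(kind, content)` node list and both function names are made up for this example), the pass only touches plain-text nodes and leaves everything else alone:

```python
import html
import re

# Hypothetical pattern for the [base]【ruby】 syntax discussed in this thread.
RUBY_PATTERN = re.compile(r"\[([^\[\]]+)\]【([^【】]+)】")


def ruby_in_text(text: str) -> str:
    """Convert every [base]【ruby】 in one plain-text node to <ruby> HTML."""
    out = []
    pos = 0
    for m in RUBY_PATTERN.finditer(text):
        # Text between matches is escaped and passed through unchanged.
        out.append(html.escape(text[pos:m.start()]))
        base, ruby = html.escape(m.group(1)), html.escape(m.group(2))
        out.append(f"<ruby><rb>{base}</rb><rt>{ruby}</rt></ruby>")
        pos = m.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)


def postprocess(nodes):
    """nodes: list of (kind, content) pairs; only 'text' nodes are rewritten."""
    return "".join(
        ruby_in_text(content) if kind == "text" else content
        for kind, content in nodes
    )


print(ruby_in_text("[爨]【かし】ぐ"))
```

Because the pattern runs over text nodes only, any `*` inside the brackets stays literal text, which is the no-formatting limitation described above.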


#27

I see. In that case, it definitely isn’t a very big deal at all.

What about inclusion in CommonMark itself? It’d certainly be useful outside of Discourse too.


#28

My vote is yes, it should be included, but I have no real say here at all. Deciding what is or is not included is up to @jgm


#29

Unlikely, as we’re still trying to get the basics nailed down for 1.0 before adding extensions like this.


#30

I’m mainly afraid that sites/software/services will implement CommonMark 1.0 and be done with it, with most never migrating to later versions as the first “complete” version is “good enough”. In that case, features like this would be relegated to relative obscurity, which is a terribly sad thought.


#31

Not really. Tables, for example, are a popular extension but not part of the spec at the current time.


#32

Japanese here. This is my opinion:

  • 【】 is commonly used for titles or categories (e.g. 【速報】 (news)) in Japanese text, so I feel 【】 for ruby is a bad idea. I don’t know why Japanese StackExchange chose it. Seriously, I want to ask StackOverflow’s developers about this decision.
  • Aozora Bunko also has a ruby syntax for plain text. It is 《》. (description and example.) This syntax is literally the biggest example of ruby syntax in the world. Originally, 《》 was used by the Books for the Blind Association in Japan, and Aozora Bunko follows it. If you say “let’s use 【】 for ruby”, then I will also say “why not 《》? ;-)”. Aozora Bunko doesn’t adopt Markdown though.
  • 【】 and 《》 are hard to type on normal keyboards. It’s also a bit hard even with a Japanese IME. Currently all Markdown syntax is composed of single-byte characters, which are easy to type on an English keyboard. I hope this de facto standard will be kept.
  • I believe ruby is not only for Japanese text. It is a form of expression for text in any language. Someday someone may have a great idea for using it in some other language. I mean we don’t need to stick to a specific language like Japanese.
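
For comparison, here is a minimal Python sketch of how the Aozora Bunko notation could be turned into HTML. It assumes the usual Aozora convention that the base is the run of kanji immediately before 《, or whatever follows an explicit ｜ start marker. The Unicode ranges and the function name `aozora_to_html` are simplifications for this example, not part of any spec.

```python
import re

# Simplified "kanji run": CJK Unified Ideographs, Extension A, and 々.
KANJI_RUN = r"[\u3400-\u4DBF\u4E00-\u9FFF々]+"

# Either ｜base《ruby》 (explicit start marker) or kanji-run《ruby》.
AOZORA = re.compile(rf"(?:｜([^｜《》]+)|({KANJI_RUN}))《([^《》]+)》")


def aozora_to_html(text: str) -> str:
    """Rewrite Aozora-style ruby as <ruby> HTML (no HTML escaping is done)."""
    def repl(m: re.Match) -> str:
        base = m.group(1) or m.group(2)
        return f"<ruby><rb>{base}</rb><rt>{m.group(3)}</rt></ruby>"

    return AOZORA.sub(repl, text)


print(aozora_to_html("爨《かし》ぐ"))
```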

#33

Did I stop the discussion? :thinking:
Sorry if it was offensive. :cry:
I just gave the above opinion :loudspeaker: as just a person in Japan. :jp:
Of course, it is just my personal opinion :loudspeaker:, and I’m not attached to it. :tipping_hand_woman:
I hope that the discussion :speaking_head: will resume again. :pray::pray::pray:


#34

Discussion is rather slow; don’t take it personally. That is the normal pace of discussion here sometimes :wink: :turtle:


#35

Hi, I was looking for info on doing rubies in Markdown, and I found this thread.

I just wanted to mention that I have developed an extension to make rubies above kanji using the Madoko markdown engine. (I’d love to use CommonMark, but Madoko was the only way I could figure out how to implement rubies.)

Basically, the syntax looks like this:

永禄(えい・ろく)十二年 己巳(つちのとのみ)の歳(とし)より、翌年(よく・どし)午(うま)七月まで、天(あま)に烟(けむり)の出(で)る星(ほし)出(で)たり。[^1]

and the HTML rendered output looks like this:

[image: 永禄十二年]

and the PDF output rendered with XeLaTeX looks like this:

<apparently new users can only have one image per post, so I guess you’ll just have to imagine how beautiful the LaTeX output is>

In this example, I use full-width parentheses (i.e. the characters （ and ）) to enclose the ‘ruby’ characters because, when typing Japanese using the OS X IME (and probably most other IMEs), that is what it prints by default when you hit shift+9 and shift+0 (at least on my machine). So basically, this choice was made for ease of typing over aesthetics, though it can use “half-width” characters (i.e. ( and )) or any other characters just as easily. In my opinion, this is the most intuitive syntax, and despite being full-width it is still quite readable as plain text, certainly more so than I imagine 《》 or 【】 would be, though I think maintaining compatibility with the Aozora Bunko style sheet would be a good idea.

Also, notice that there is no need in this syntax for enclosing a base kanji. That is, I don’t need to write something like [永禄]{えいろく}. This is because it uses regular expressions, built into Madoko, and assumes that any sequence of [Chinese characters]([ruby characters]) is a string that is meant to be converted to Furigana. If necessary, however, the opening and closing parentheses can be backslash-escaped.

Finally, notice that I use the “naka-guro” 中黒 middle dot (・) to optionally separate groups of ruby characters within ruby-word-groups. A hyphen - can also be used (i.e. 中黒(なか-ぐろ)). This was implemented because sometimes the spacing of ruby characters above two or more Chinese characters is not necessarily uniform. For example, in the word shokuji 食事(しょく・じ) “meal”, the shoku character 食 is read with three hiragana characters (しょく), while the second character ji 事 uses only one (じ). Oftentimes the difference is negligible, but it is occasionally important.

Basically, my method works by using regular expressions to distinguish Chinese characters, but I don’t know if this is possible with CommonMark. If anyone would like to see the source code, or hear more of the details, I’d be happy to share it.
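
The Madoko source isn’t shown in this post, so the following is only a guess at the general approach, written in Python for illustration. It assumes a run of kanji followed directly by parentheses (full- or half-width) is a furigana group, and that ・ or - splits the reading per character when the counts line up. The Unicode ranges and the names `FURIGANA`, `to_ruby` and `furigana_to_html` are assumptions made for this sketch.

```python
import re

# Simplified kanji class: CJK Unified Ideographs, Extension A,
# compatibility ideographs, and the iteration mark 々.
KANJI = r"[\u3400-\u4DBF\u4E00-\u9FFF\uF900-\uFAFF々]"

# kanji-run followed immediately by (reading) or （reading）.
FURIGANA = re.compile(rf"({KANJI}+)[(（]([^()（）]+)[)）]")


def to_ruby(m: re.Match) -> str:
    base, reading = m.group(1), m.group(2)
    parts = re.split(r"[・-]", reading)
    if len(parts) == len(base):
        # One reading per kanji: pair them up character by character.
        pairs = list(zip(base, parts))
    else:
        # Otherwise treat the whole word as a single group.
        pairs = [(base, "".join(parts))]
    body = "".join(f"<rb>{b}</rb><rt>{r}</rt>" for b, r in pairs)
    return f"<ruby>{body}</ruby>"


def furigana_to_html(text: str) -> str:
    """Rewrite kanji(reading) groups as <ruby> HTML (no HTML escaping)."""
    return FURIGANA.sub(to_ruby, text)


print(furigana_to_html("食事(しょく・じ)"))
```

A backslash between the kanji and the parenthesis, as in 漢字\(かんじ\), simply fails to match here, so escaped text passes through untouched. This sketch does not strip the backslashes.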

Thanks,
Tom

[^1]: This text is from a classical Japanese document called Kōyō gunkan 甲陽軍鑑 and means “From the 12th year of Eiroku (tsuchinoto no mi) to the seventh month of the following year, the year of the Horse, a star that let off smoke (i.e. a comet) appeared in the heavens.”