Can highly compatible implementations in other languages also become a part of CommonMark?

Vilx_ · September 7, 2014, 3:34pm

Currently there are only C and JavaScript implementations of CommonMark. Obviously, a lot of people want to have (and/or write) an implementation in their own favorite language. Are there any plans to expand the “official” implementations to include more languages (and can I contribute?)? Or is there at least going to be a site with links to “certified” implementations?

I really think it would be good if CommonMark also coordinated implementations in other languages, so that there aren’t a zillion competing implementations of varying quality but rather a few implementations of good quality. A standardized API across all languages would also be interesting, although with different paradigms (OOP, imperative, functional, etc.) it can get somewhat tricky.

codinghorror · September 7, 2014, 11:56pm

I think any implementation that passes the tests embedded in the spec is “certified” CommonMark compatible.

It’s simply a matter of making the tests fully complete. What we have now is good, but it can always be improved!
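To make "passes the tests" concrete, here is a minimal sketch of such a check. It assumes the spec's examples have already been extracted into (Markdown, expected HTML) pairs; `failing_examples`, `toy_convert`, and the two sample pairs are hypothetical names and data for illustration, not part of any CommonMark tooling.

```python
from typing import Callable, Iterable, List, Tuple

Example = Tuple[str, str]  # (markdown input, expected HTML)


def failing_examples(
    convert: Callable[[str], str], examples: Iterable[Example]
) -> List[Example]:
    """Return the examples whose converted output differs from the expected HTML."""
    return [(md, html) for md, html in examples if convert(md) != html]


def toy_convert(markdown: str) -> str:
    """Hypothetical stand-in converter: wraps any input in a single paragraph."""
    return "<p>" + markdown.strip() + "</p>\n"


# Illustrative pairs only; a real run would use every example in the spec.
EXAMPLES = [
    ("hello\n", "<p>hello</p>\n"),
    ("# title\n", "<h1>title</h1>\n"),
]

failures = failing_examples(toy_convert, EXAMPLES)
for md, expected in failures:
    print("FAIL:", repr(md), "expected", repr(expected))
```

Under this view, an implementation counts as compatible exactly when the list of failures is empty, so the strength of the claim depends on how complete the example set is.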

balpha · September 8, 2014, 5:56am

I think there are two ways of looking at this.

One of them is compatibility of the conversion – given the same Markdown, the implementation produces the same HTML as the reference implementation(s). In the optimal case, this is equivalent to “the implementation matches the spec” and “the implementation passes all the tests”.

If this is the case, then by all means the implementation should be considered compatible, as @codinghorror says.

The second view is a similarity in how it’s implemented compared to the reference implementation. For example, you could probably create a CommonMark-compatible converter that works similarly to John Gruber’s original Markdown implementation, which is a bunch of regex replacements with some extra logic inside it. And you could also create a CommonMark-compatible converter whose inner logic works similarly to the CM reference implementation, including an identically structured document tree (see “Appendix A: A parsing strategy” in the spec).*

Such a similarity would have the advantage that extensions could be ported between implementations in a fairly straightforward way. Say you have a CommonMark implementation in LISP that creates the same kind of document tree for its intermediate representation, and there’s some sort of plugin that processes this tree in some way before it’s passed to the HTML renderer. This plugin could probably be easily ported to, say, the JavaScript reference implementation. On the other hand, you could have a CommonMark implementation in Fortran that uses regular expression replacements similar to Gruber’s Perl original. Since this Fortran implementation has no comparable intermediate document tree, the plugin couldn’t easily be ported to work in it.
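As a rough sketch of what such a tree-processing plugin might look like: the `Node` shape, the node kinds, and the `shout_plugin` and `render_html` functions below are all hypothetical and much simpler than the tree described in the spec's appendix. The point is only that the plugin touches the intermediate tree, never the Markdown source or the HTML output.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List


@dataclass
class Node:
    """Hypothetical document-tree node; real implementations carry more detail."""
    kind: str  # e.g. "document", "paragraph", "text"
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def shout_plugin(node: Node) -> None:
    """Example plugin: upper-case every text node, in place."""
    if node.kind == "text":
        node.text = node.text.upper()
    for child in node.children:
        shout_plugin(child)


def render_html(node: Node) -> str:
    """Tiny renderer covering only the node kinds used in this sketch."""
    if node.kind == "text":
        return escape(node.text)
    inner = "".join(render_html(child) for child in node.children)
    if node.kind == "paragraph":
        return "<p>" + inner + "</p>\n"
    return inner  # "document" and unknown kinds just render their children


doc = Node("document", children=[
    Node("paragraph", children=[Node("text", "hello & bye")]),
])
shout_plugin(doc)           # the plugin runs between parsing and rendering
html_out = render_html(doc)
print(html_out, end="")
```

Porting `shout_plugin` to another implementation with the same tree shape is a near-mechanical translation; a regex-replacement converter has no equivalent place to hook it in.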

So the Fortran implementation should by all means be considered CM-compatible, but the LISP implementation has an additional nice-to-have level of compatibility. I don’t know that we need to have a name for this additional level, but it’s something to keep in mind.

Of course a plugin written for a Ruby implementation that uses a similar regex replacement technique could be easily ported to the Fortran version, so this kind of “compatibility” doesn’t only apply to the reference implementations.

*or however else the final CM reference implementation may eventually work

Vilx_ · September 8, 2014, 6:11am

No, this isn’t what I meant. Sure, anyone can create a CM-compliant implementation and then proudly present their “compliant with the spec” badge. That’s pretty much why there is a CM in the first place - so people could make “compliant” implementations.

But “compliant” is one thing and “officially endorsed” another. Anyone can make the first one, but the second one implies that the very fathers of CM have looked this implementation over and agreed it to be of superb quality. It’s a mark of recognition that sets an implementation apart from all the rest. It says “This particular implementation is preferred over all the others in the market. Consider it to be the one and only canonical reference implementation for [insert language here]. This is GUARANTEED to be good. You can’t go wrong with it.”

And this is what I’d like: to spare me (and other developers) the choice of picking among a zillion competing implementations by pointing out one which is known to be good, is unlikely to be abandoned (unless CM itself is abandoned), and will always follow the standard.

Vilx_ · September 12, 2014, 9:56am

Ermm… hello? Bump? Did I say something wrong?

mb21 · September 12, 2014, 10:09am

I guess most people here (myself included) feel that a “best implementation for language X” award shouldn’t be in the scope of CommonMark. W3C doesn’t pick a “best HTML browser for operating system X” either. As long as implementations can show they pass all the tests, they are CommonMark compatible and that’s all there is to it. Beyond that, a lot of subjective (interface and documentation styles) and hard-to-measure (like performance) criteria come into play.

Vilx_ · September 12, 2014, 3:08pm

Well… sort of… but they already have reference C and JavaScript implementations. Why not other languages as well then?

mikl · September 12, 2014, 7:46pm

Well, each reference implementation requires careful maintenance, much more so than any other implementation (higher standards), and for every update to the spec, all reference implementations will have to be updated too.

So every reference implementation is a maintenance burden, especially since you’d need an expert developer for each of those languages to maintain each one. If we had one for every programming language that implements Markdown, we’d need dozens of maintainers, and updating the spec would be a huge effort of coordinating all those people.

I’d assume the reason that there are two reference implementations rather than one is to have one for dynamic and one for static languages. And C and JavaScript have probably been carefully chosen as the most widespread and generally useful of those two families. Almost any other language can have C extensions, so you could easily use stmd to create a Python or PHP module that would use the C implementation. And JavaScript scarcely needs an explanation. Lingua franca of the web, need I say more?

So, in short, adding more reference implementations would not gain us much (any developer looking to implement Markdown probably understands enough of either C or JavaScript to use the reference implementation to make his own – that is the purpose of a reference implementation, after all), and would add a substantial maintenance burden.

Vilx_ · September 12, 2014, 8:34pm

Well, all right then.