Hello folks.
I would like to introduce a new C implementation, MD4C. It is very new, but I hope it already deserves some attention.
Why?
- Because I needed something that supports Windows Unicode (UTF-16 LE) natively, and it would be too much work to incorporate that into existing solutions, which are tightly coupled to UTF-8 code throughout.
- Because cmark is too big for my liking (according to `cloc`, it is 26.2K lines of code; MD4C currently has 3.7K lines of code).
- Because cmark, given that it is the reference implementation, could never accept any extensions, and maintaining forks is often too much work.
- Because I disliked Hoedown's API. (So many callbacks? Should an API force its own kind of buffer on the application using it? Etc.)
- And last but not least, because I thought it could not be difficult. Here I was really wrong. Especially `pathological_tests.py` from cmark made my head almost explode.
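To illustrate the first point, here is a minimal sketch of how parsing code can stay independent of the text encoding by working on a single character typedef chosen at compile time. All names here (`SKETCH_USE_UTF16`, `sketch_char`, `sketch_atx_level`) are hypothetical and made up for this example; they are not MD4C's actual API, and the heading check is deliberately simplified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration only; not MD4C's actual API. */
#ifdef SKETCH_USE_UTF16
typedef uint16_t sketch_char;   /* one UTF-16 code unit */
#else
typedef char sketch_char;       /* one UTF-8 code unit (a byte) */
#endif

/* Return the ATX heading level (1..6) of a line, or 0 if the line is not
 * a heading. Simplified: leading indentation and closing '#' sequences
 * are ignored, and only a space or the end of line may follow the
 * opening '#' run. */
static unsigned sketch_atx_level(const sketch_char *line, size_t len)
{
    size_t n = 0;

    assert(line != NULL || len == 0);

    /* Markdown syntax characters are ASCII, so the same comparison works
     * for 8-bit and 16-bit code units alike. */
    while (n < len && line[n] == (sketch_char)'#')
        n++;

    if (n == 0 || n > 6)
        return 0;
    if (n < len && line[n] != (sketch_char)' ')
        return 0;
    return (unsigned)n;
}
```

In the default build, `sketch_atx_level("## Title", 8)` returns 2. With `SKETCH_USE_UTF16` defined, the same function would take a buffer of 16-bit code units instead, so no conversion to UTF-8 is needed before parsing.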
Current status:
- All major CommonMark features are implemented. Of the 622 tests provided by CommonMark specification 0.27, 568 pass and 54 fail. I hope this will improve further in the foreseeable future.
- MD4C deals fine with cmark's infamous `pathological_tests.py` test suite.
- A few extensions, such as tables and permissive autolinks, are implemented. (They have to be explicitly enabled.)
Performance:
I have not tested it thoroughly yet, but on my Windows machine, with `benchinput.md` and `pathological_tests.py` from cmark, I get these results:
cmark:
$ time build/src/cmark bench/benchinput.md >/dev/null
real 0m1.634s
user 0m0.015s
sys 0m0.000s
$ time test/pathological_tests.py --program build/src/cmark.exe >/dev/null
real 0m0.718s
user 0m0.015s
sys 0m0.045s
MD4C:
$ time md2html/md2html.exe ../../cmark/bench/benchinput.md >/dev/null
real 0m1.391s
user 0m0.000s
sys 0m0.000s
$ time test/pathological_tests.py --program ../md4c/bin/md2html/md2html.exe >/dev/null
real 0m0.382s
user 0m0.015s
sys 0m0.030s
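To put the `real` (wall-clock) times above in perspective, here is a small sketch that turns two timings into a speed ratio. The helper name `speedup` is made up for this example. Keep in mind that these are single runs on one machine, so the ratios are only indicative.

```c
#include <assert.h>

/* How many times faster the candidate is than the baseline, judged by
 * wall-clock time. Both arguments are in seconds and must be positive. */
static double speedup(double baseline_seconds, double candidate_seconds)
{
    assert(baseline_seconds > 0.0 && candidate_seconds > 0.0);
    return baseline_seconds / candidate_seconds;
}
```

With the numbers above, `speedup(1.634, 1.391)` is roughly 1.17 for `benchinput.md`, and `speedup(0.718, 0.382)` is roughly 1.88 for the pathological tests.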
Disclaimer:
Please note that it is still quite a fresh piece of code and needs more testing, so I am pretty sure there are still some bugs lurking.
If you encounter any, please report them, ideally as GitHub issues.
Big thanks:
Big thanks go to all the people working on the CommonMark specification, the cmark test suite and other related work. MD4C would be much worse than it currently is without them. I especially appreciate that the CommonMark specification tests are responsible for most of the great code coverage.