Well, this thread made me finally do what I had been postponing for a long time: some better benchmarking of MD4C. For now, I compared it only against Cmark, as it is probably the most relevant competitor.
As these numbers provide some hard data for the discussion above, let me publish them here.
The testing was done on a 64-bit Linux machine (Slackware 14.2). All input files were placed in a tmpfs filesystem to mitigate any I/O impact. The script used for the testing can be found in this gist, but note that it is not easily reusable without some manual tweaking and that it uses some scripts from Cmark's repo.
A fresh release build of the current master head was used for both MD4C and Cmark.
The test is composed of several samples. Each sample usually targets predominantly one particular aspect of the parser's implementation. For example, the test many-paragraphs.md contains just 1,000,000 trivial paragraphs and mainly examines how the block parser behaves. Similarly, all the tests are made of huge repetitions of some very simple pattern that tends to occur frequently in any Markdown document. (Some tests use a different repetition count to give measurable numbers.)
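The repetitive samples described above can be generated with a few lines of code. This is only a hypothetical sketch of such a generator; the actual scripts live in the gist and may differ:

```python
# Hypothetical generator for a few of the synthetic samples
# (pattern and repetition count taken from the table below).
samples = {
    "many-paragraphs.md": ("foo\n\n", 1_000_000),
    "many-atx-headers.md": ("###### foo\n", 1_000_000),
    "many-emphasis.md": ("*foo* ", 1_000_000),
}

for name, (pattern, count) in samples.items():
    with open(name, "w") as f:
        f.write(pattern * count)
```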
Only the sample cmark-benchinput.md is different: it is a compilation of multiple language versions of the Pro Git book, as generated by make bench from Cmark's repo. Unlike the other samples, it can be seen as representative of "normal" input.
Each sample was run 10 times. Given that the standard deviation was always negligible, the table below contains only mean times in seconds. (The complete output of the script is in a comment on the gist.)
The benchmarking also helped to find one nasty bug in MD4C (the results below are after the fix was applied).
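The measurement loop can be assumed to look roughly like this; `bench` is a hypothetical helper, not the actual gist script:

```python
# Assumed shape of the measurement: run a command N times over a
# tmpfs-resident input file and report mean and standard deviation.
import statistics
import subprocess
import time

def bench(cmd, runs=10):
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
        times.append(time.perf_counter() - start)
    return statistics.mean(times), statistics.stdev(times)
```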
| Test name | Sample input | MD4C (seconds) | Cmark (seconds) |
|---|---|---|---|
| cmark-benchinput.md | (benchmark from Cmark) | 0.3650 | 0.7060 |
| long-block-multiline.md | `"foo\n" * 1000000` | 0.0400 | 0.2300 |
| long-block-oneline.md | `"foo " * 10 * 1000000` | 0.0700 | 0.1000 |
| many-atx-headers.md | `"###### foo\n" * 1000000` | 0.0900 | 0.4670 |
| many-blanks.md | `"\n" * 10 * 1000000` | 0.0700 | 0.3110 |
| many-emphasis.md | `"*foo* " * 1000000` | 0.1100 | 0.8460 |
| many-fenced-code-blocks.md | `"~~~\nfoo\n~~~\n\n" * 1000000` | 0.1600 | 0.4010 |
| many-links.md | `"[a](/url) " * 1000000` | 0.2100 | 0.5110 |
| many-paragraphs.md | `"foo\n\n" * 1000000` | 0.0900 | 0.4860 |
I find it quite surprising that the performance ratio between the two competitors varies so much among the samples.
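For illustration, the per-sample ratios can be derived directly from the table above:

```python
# Cmark/MD4C time ratios computed from the (MD4C, Cmark) mean times
# in the table above.
times = {
    "cmark-benchinput.md":        (0.3650, 0.7060),
    "long-block-multiline.md":    (0.0400, 0.2300),
    "long-block-oneline.md":      (0.0700, 0.1000),
    "many-atx-headers.md":        (0.0900, 0.4670),
    "many-blanks.md":             (0.0700, 0.3110),
    "many-emphasis.md":           (0.1100, 0.8460),
    "many-fenced-code-blocks.md": (0.1600, 0.4010),
    "many-links.md":              (0.2100, 0.5110),
    "many-paragraphs.md":         (0.0900, 0.4860),
}
ratios = {name: cmark / md4c for name, (md4c, cmark) in times.items()}
# The ratio spans roughly 1.4x (long-block-oneline.md)
# to 7.7x (many-emphasis.md).
print(min(ratios.values()), max(ratios.values()))
```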
For the cmark-benchinput.md test, I also compared memory consumption with the memusage(1) utility. Just a few numbers from it:
| | MD4C | Cmark |
|---|---|---|
| Count of malloc() calls | 5 | 3 |
| Count of realloc() calls | 36 | 1304578 |
| Count of calloc() calls | 1 | 1587507 |
| Count of free() calls | 10 | 2369043 |
| Heap peak | 275504032 bytes (~262.74 MB) | 495063570 bytes (~472.13 MB) |
| Heap total | 275508128 bytes (~262.75 MB) | 504309058 bytes (~480.94 MB) |
Given that the size of the input document is 110648441 bytes (~105.52 MB) and that MD4C's simplistic md2html utility renders the output into one big growing memory buffer before writing it out, MD4C's overhead (approximately heap peak − 2 × document size) is pretty low.
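Spelled out, that overhead estimate works out as follows (assuming, as the formula does, that the output buffer grows to roughly the size of the input document):

```python
# Overhead estimate: heap peak minus the input document and the
# output buffer (assumed to be roughly the same size as the input).
doc_size = 110648441    # bytes, cmark-benchinput.md
heap_peak = 275504032   # bytes, MD4C heap peak from the table above

overhead = heap_peak - 2 * doc_size
print(overhead)  # 54207150 bytes, i.e. roughly 51.7 MB
```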
But keep in mind that the memory comparison is very unfair, as MD4C is a SAX-style parser and does not build any AST representation of the document. That gives MD4C a huge advantage in this regard. Given the number of allocation calls in Cmark, this surely also plays some role in the performance difference.