Performance of CommonMark reference implementations

John MacFarlane says he has designed the CommonMark reference implementations (in C and JavaScript) to be fast.

I’ll let him explain.


I have designed the implementations to be fast – e.g. avoiding backtracking whenever possible. I have optimized them to the best of my ability, but I’m not a js or C wizard, so I’m sure they can still be improved.

I can give you some data on performance.

make benchjs on my MacBook Air says:

  • std markdown → html x 226 ops/sec ±6.74% (88 runs sampled)

  • showdown.js markdown → html x 123 ops/sec ±1.22% (81 runs sampled)

  • marked.js markdown → html x 415 ops/sec ±0.76% (94 runs sampled)

So there may be room for improvement; but of course, marked isn’t as accurate a converter, so some of its speed may be due to shortcuts it takes.
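For context, ops/sec figures like the ones above can be approximated with a plain timing loop (make benchjs uses Benchmark.js, which adds statistical sampling on top). A minimal sketch, where `convert` is a stand-in for any markdown-to-html function under test, not the real parser:

```javascript
// Minimal ops/sec measurement - a rough stand-in for what Benchmark.js
// does with proper statistical sampling. `convert` is a placeholder for
// any markdown -> html function under test (e.g. a real parser's API).
function convert(src) {
  // Toy "parser": just turns a leading ATX heading into <h1>.
  return src.replace(/^# (.*)$/m, '<h1>$1</h1>');
}

function opsPerSec(fn, input, durationMs = 500) {
  let ops = 0;
  const start = Date.now();
  while (Date.now() - start < durationMs) {
    fn(input);
    ops++;
  }
  return ops / ((Date.now() - start) / 1000);
}

console.log(opsPerSec(convert, '# hello\nworld').toFixed(0) + ' ops/sec');
```

The numbers in the thread depend heavily on the input, which is why the posts below compare several inputs (progit, spec.txt, README.md).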

My tests of the C implementation suggest that its performance is about the same as discount’s. For example, on my laptop it takes 0.03s (user) to convert a 179K manual. It seems to me that this should be fast enough. sundown is considerably faster, though, so again there may be room for improvement – though again, I worry that sundown achieves the performance by making many shortcuts that make proper parsing impossible. Anyway, I am sure C experts will be able to improve the performance quite a bit.

Discussion is welcome here, of course, but if you have specific performance improvements to contribute, open pull requests at

Just to update this thread: after many performance improvements by Vicent Marti, cmark is now about 6 times faster than discount and just a tad slower than sundown.

Some rough benchmarks here


I’ve spent the last couple days optimizing commonmark.js, which is now just a little slower than marked.js:

commonmark.js markdown->html x 709 ops/sec ±1.28% (95 runs sampled)
showdown.js markdown->html x 248 ops/sec ±1.90% (87 runs sampled)
marked.js markdown->html x 729 ops/sec ±2.20% (94 runs sampled)
markdown-it markdown->html x 986 ops/sec ±1.15% (94 runs sampled)

Note that the benchmarks are highly dependent on the specific input used; I used a 10 MB Markdown text composed of twenty concatenated copies of the first edition of Pro Git. (make benchjs will run the benchmark above; make benchjs BENCHINP=foo.txt will use foo.txt for the source.)

It seems your js benchmarks are for README.md, not for the Pro Git book.

Below are the numbers on my MacBook Pro Retina:

progit x 1

commonmark.js markdown->html x 47.04 ops/sec ±3.77% (64 runs sampled)
showdown.js markdown->html x 6.22 ops/sec ±0.53% (20 runs sampled)
marked.js markdown->html x 43.74 ops/sec ±1.21% (59 runs sampled)
markdown-it markdown->html x 54.75 ops/sec ±2.03% (59 runs sampled)

progit x 20

commonmark.js markdown->html x 1.55 ops/sec ±4.00% (8 runs sampled)
showdown.js markdown->html x 0.55 ops/sec ±0.71% (6 runs sampled)
marked.js markdown->html x 1.79 ops/sec ±2.08% (9 runs sampled)
markdown-it markdown->html x 2.10 ops/sec ±4.14% (10 runs sampled)

README.md

commonmark.js markdown->html x 1,058 ops/sec ±0.52% (98 runs sampled)
showdown.js markdown->html x 370 ops/sec ±1.46% (92 runs sampled)
marked.js markdown->html x 1,125 ops/sec ±0.52% (99 runs sampled)
markdown-it markdown->html x 1,659 ops/sec ±0.61% (98 runs sampled)

Looks like after the next markdown-it renderer update I will no longer be able to say that it’s faster than the reference js implementation :slight_smile: .

My mistake! Well, I’m happy that commonmark.js does even better on the progit benchmark. It would probably be good to put in place a better benchmark suite that tests a variety of different inputs, as you have in markdown-it.

Seems you were right in your earlier measurements: even markdown-it is still 5x slower than the C implementation on big files, instead of 2-3x as I expected :frowning: . I have no more ideas about what to optimize.

I even tried to unify the token classes’ first-level properties, but didn’t notice a speed gain.

Here is the result of our benchmark on the 0.6 spec:

Sample: spec.txt (110610 bytes)
 > commonmark-reference x 53.76 ops/sec ±6.42% (68 runs sampled)
 > current x 77.20 ops/sec ±4.07% (67 runs sampled)
 > current-commonmark x 101 ops/sec ±0.62% (75 runs sampled)
 > markdown-it-2.2.1-commonmark x 102 ops/sec ±0.31% (74 runs sampled)
 > marked-0.3.2 x 22.97 ops/sec ±0.63% (41 runs sampled)

Previous score for commonmark.js 0.15 was ~ 40 ops/sec.

By the way, I don’t think spec.txt is a good benchmark text. The reason is that it’s a very special kind of file – not Markdown but a combination of Markdown and some special conventions.

So, for example, all of the raw HTML in the right-hand panes of the examples will come through as raw HTML (since it’s not in a real code block in spec.txt), giving you a document that’s much heavier in raw HTML than in any other kind of construct – and quite atypical. The . delimiters for the example blocks will also prevent many of the Markdown examples from having their normal meanings, for example,

.
1. a list
2. not
.

will not be parsed as a list, but as a regular paragraph, because of the dots.

Ideally we’d want a benchmark text that (a) contains every kind of construct and (b) reflects ordinary documents in the relative frequencies of things.
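One rough way to sanity-check a candidate benchmark text against criterion (b) is to count construct frequencies with simple regexes. This is a hypothetical sketch, not part of any of the parsers discussed, and the patterns are deliberately crude approximations, useful only for comparing relative frequencies between candidate inputs:

```javascript
// Rough construct-frequency counter for a candidate benchmark text.
// The regexes are crude approximations of each Markdown construct -
// good enough to compare the relative mix between candidate inputs,
// not for actual parsing.
function constructFrequencies(src) {
  const patterns = {
    headings: /^#{1,6} /gm,
    listItems: /^[ \t]*([-*+]|\d+\.) /gm,
    blockquotes: /^[ \t]*>/gm,
    fencedCode: /^```/gm,
    links: /\[[^\]]*\]\([^)]*\)/g,
    rawHtml: /<\/?[a-zA-Z][^>]*>/g,
  };
  const counts = {};
  for (const [name, re] of Object.entries(patterns)) {
    counts[name] = (src.match(re) || []).length;
  }
  return counts;
}

console.log(constructFrequencies('# Title\n\n- a\n- b\n\n> quote\n'));
```

Running this over spec.txt would show the skew described above: a raw-HTML count far out of proportion to ordinary documents.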

You are right, the spec is not a very good benchmark input. It would be better to replace it with something else.

I don’t care much about absolute speed or comparisons with other parsers. The most important thing for me is detecting anomalies and regressions. High markup density is preferable for my goals, but the spec is really not ideal because of the dots.

For markdown-it the most expensive case will be very deeply nested blockquotes (they cause stacked remapping of line ranges). Inlines are roughly similar to the reference. It’s probably possible to rewrite the block level to be similar to the reference, but designing an optimal shared state machine is very tedious; neither Alex nor I wish to do it again. Ideally, the reference parser would support markup extensions in some “linear” way (not via tons of pre/post hooks).
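To illustrate the pathological case, here is a sketch of a stress-input generator; `nestedBlockquote` is a hypothetical helper, not part of markdown-it:

```javascript
// Generate a deeply nested blockquote document - the kind of input
// that forces a line-based parser to remap line ranges once per
// nesting level. `nestedBlockquote` is a hypothetical helper for
// building benchmark stress inputs.
function nestedBlockquote(depth, line = 'some text') {
  let src = '';
  for (let i = 1; i <= depth; i++) {
    src += '> '.repeat(i) + line + '\n';
  }
  return src;
}

console.log(nestedBlockquote(3));
// > some text
// > > some text
// > > > some text
```

Feeding nestedBlockquote(1000) to different parsers is a quick way to see whether blockquote handling scales linearly with nesting depth.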

+++ Vitaly Puzrin [Jan 12 15 04:55 ]:

Ideally, the reference parser would support markup extensions in some “linear” way (not via tons of pre/post hooks).

When I can find the time, I’d like to investigate making it more modular and providing a way to do this.

An interesting aspect I noticed when testing the performance of various .NET implementations: some fail really hard (>30 sec) when given a huge HTML file such as the IMDB homepage. If you create a test suite for performance, such a test might be worth including.

I also used the Pro Git book for benchmarking, but instead of using just the English text concatenated 10 times, I merged all the languages together, so there is much more Unicode content involved, which might be useful to test.

+++ Kārlis Gaņģis [Jan 12 15 15:50 ]:

An interesting aspect I noticed when testing the performance of various .NET implementations: some fail really hard (>30 sec) when given a huge HTML file such as the IMDB homepage. If you create a test suite for performance, such a test might be worth including.

This must be something specific to these implementations. Both the
reference implementations (C and JS) parse that page in an instant.

I also used the Pro Git book for benchmarking, but instead of using just the English text concatenated 10 times, I merged all the languages together, so there is much more Unicode content involved, which might be useful to test.

Nice idea. I think I’ll do that too.


In the upcoming 4.0 of markdown-it, rewriting the renderer to support attribute lists cost ~20% of performance, much better than I expected. The new renderer uses a similar approach to the reference parser: attributes are stored in an array, so the output can be extended without changing renderer functions.
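The attribute-array idea can be sketched roughly as follows; the names here are illustrative, not markdown-it’s actual API:

```javascript
// Sketch of tokens carrying attributes as [name, value] pairs, so a
// plugin can add attributes without replacing the renderer function.
// Names are illustrative, not markdown-it's real token API.
function escapeHtml(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;')
          .replace(/>/g, '&gt;').replace(/"/g, '&quot;');
}

function renderAttrs(attrs) {
  return attrs.map(([name, value]) => ` ${name}="${escapeHtml(value)}"`).join('');
}

const token = { tag: 'a', attrs: [['href', '/x'], ['class', 'ext']] };
// A plugin can extend the output without touching the renderer:
token.attrs.push(['rel', 'nofollow']);

const html = `<${token.tag}${renderAttrs(token.attrs)}>`;
console.log(html); // <a href="/x" class="ext" rel="nofollow">
```

Because the renderer loops over the array generically, a plugin that pushes a new pair changes the output with no renderer override at all.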
