Performance of CommonMark reference implementations

John MacFarlane says he has designed the CommonMark reference implementations (in C and JavaScript) to be fast.

I’ll let him explain.


I have designed the implementations to be fast – e.g. avoiding backtracking whenever possible. I have optimized them to the best of my ability, but I’m not a js or C wizard, so I’m sure they can still be improved.

I can give you some data on performance.

make benchjs on my MacBook Air says:

  • std markdown → html x 226 ops/sec ±6.74% (88 runs sampled)

  • showdown.js markdown → html x 123 ops/sec ±1.22% (81 runs sampled)

  • marked.js markdown → html x 415 ops/sec ±0.76% (94 runs sampled)

So there may be room for improvement; but of course, marked isn’t as accurate a converter, so some of its speed may be due to shortcuts it takes.
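For context, ops/sec figures like the ones above can be approximated with a plain timing loop (make benchjs uses Benchmark.js, which adds statistical sampling on top). A minimal sketch, where `convert` is a stand-in for any markdown-to-html function under test, not the real parser:

```javascript
// Minimal ops/sec measurement - a rough stand-in for what Benchmark.js
// does with proper statistical sampling. `convert` is a placeholder for
// any markdown -> html function under test (e.g. a real parser's API).
function convert(src) {
  // Toy "parser": just turns a leading ATX heading into <h1>.
  return src.replace(/^# (.*)$/m, '<h1>$1</h1>');
}

function opsPerSec(fn, input, durationMs = 500) {
  let ops = 0;
  const start = Date.now();
  while (Date.now() - start < durationMs) {
    fn(input);
    ops++;
  }
  return ops / ((Date.now() - start) / 1000);
}

console.log(opsPerSec(convert, '# hello\nworld').toFixed(0) + ' ops/sec');
```

The numbers in the thread depend heavily on the input, which is why the posts below compare several inputs (progit, spec.txt, README.md).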

My tests of the C implementation suggest that its performance is about the same as discount’s. For example, on my laptop it takes 0.03s (user) to convert a 179K manual. It seems to me that this should be fast enough. sundown is considerably faster, though, so again there may be room for improvement – though again, I worry that sundown achieves the performance by making many shortcuts that make proper parsing impossible. Anyway, I am sure C experts will be able to improve the performance quite a bit.

Discussion is welcome here, of course, but if you have specific performance improvements to contribute, open pull requests at

Just to update this thread: after many performance improvements by Vicent Marti, cmark is now about 6 times faster than discount and just a tad slower than sundown.

Some rough benchmarks here


I’ve spent the last couple days optimizing commonmark.js, which is now just a little slower than marked.js:

commonmark.js markdown->html x 709 ops/sec ±1.28% (95 runs sampled)
showdown.js markdown->html x 248 ops/sec ±1.90% (87 runs sampled)
marked.js markdown->html x 729 ops/sec ±2.20% (94 runs sampled)
markdown-it markdown->html x 986 ops/sec ±1.15% (94 runs sampled)

Note that the benchmarks are highly dependent on the specific input used; I used a 10 MB Markdown text composed of twenty concatenated copies of the first edition of Pro Git. (make benchjs will run the benchmark above; make benchjs BENCHINP=foo.txt will use foo.txt for the source.)

It seems your js benchmarks are for README.md, not for the Pro Git book.

Below are the numbers on my MacBook Pro Retina:

progit x 1

commonmark.js markdown->html x 47.04 ops/sec ±3.77% (64 runs sampled)
showdown.js markdown->html x 6.22 ops/sec ±0.53% (20 runs sampled)
marked.js markdown->html x 43.74 ops/sec ±1.21% (59 runs sampled)
markdown-it markdown->html x 54.75 ops/sec ±2.03% (59 runs sampled)

progit x 20

commonmark.js markdown->html x 1.55 ops/sec ±4.00% (8 runs sampled)
showdown.js markdown->html x 0.55 ops/sec ±0.71% (6 runs sampled)
marked.js markdown->html x 1.79 ops/sec ±2.08% (9 runs sampled)
markdown-it markdown->html x 2.10 ops/sec ±4.14% (10 runs sampled)

README.md

commonmark.js markdown->html x 1,058 ops/sec ±0.52% (98 runs sampled)
showdown.js markdown->html x 370 ops/sec ±1.46% (92 runs sampled)
marked.js markdown->html x 1,125 ops/sec ±0.52% (99 runs sampled)
markdown-it markdown->html x 1,659 ops/sec ±0.61% (98 runs sampled)

Looks like after the next markdown-it renderer update I will no longer be able to say that it’s faster than the reference js implementation :slight_smile: .

My mistake! Well, I’m happy that commonmark.js does even better on the progit benchmark. It would probably be good to put in place a better benchmark suite that tests a variety of different inputs, as you have in markdown-it.

Seems you were right in your earlier measurements: even markdown-it is still 5x slower than the C implementation on big files, instead of 2-3x as I expected :frowning: . I have no more ideas about what to optimize.

I even tried to unify the token classes’ first-level properties, but didn’t notice a speed gain.

Here is the result of our benchmark on the 0.6 spec:

Sample: spec.txt (110610 bytes)
 > commonmark-reference x 53.76 ops/sec ±6.42% (68 runs sampled)
 > current x 77.20 ops/sec ±4.07% (67 runs sampled)
 > current-commonmark x 101 ops/sec ±0.62% (75 runs sampled)
 > markdown-it-2.2.1-commonmark x 102 ops/sec ±0.31% (74 runs sampled)
 > marked-0.3.2 x 22.97 ops/sec ±0.63% (41 runs sampled)

Previous score for commonmark.js 0.15 was ~ 40 ops/sec.

By the way, I don’t think spec.txt is a good benchmark text. The reason is that it’s a very special kind of file – not Markdown but a combination of Markdown and some special conventions.

So, for example, all of the raw HTML in the right-hand panes of the examples will come through as raw HTML (since it’s not in a real code block in spec.txt), giving you a document that’s much heavier in raw HTML than in any other kind of construct – and quite atypical. The . delimiters for the example blocks will also prevent many of the Markdown examples from having their normal meanings, for example,

.
1. a list
2. not
.

will not be parsed as a list, but as a regular paragraph, because of the dots.

Ideally we’d want a benchmark text that (a) contains every kind of construct and (b) reflects ordinary documents in the relative frequencies of things.
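One rough way to sanity-check a candidate benchmark text against criterion (b) is to count construct frequencies with simple regexes. This is a hypothetical sketch, not part of any of the parsers discussed, and the patterns are deliberately crude approximations, useful only for comparing relative frequencies between candidate inputs:

```javascript
// Rough construct-frequency counter for a candidate benchmark text.
// The regexes are crude approximations of each Markdown construct -
// good enough to compare the relative mix between candidate inputs,
// not for actual parsing.
function constructFrequencies(src) {
  const patterns = {
    headings: /^#{1,6} /gm,
    listItems: /^[ \t]*([-*+]|\d+\.) /gm,
    blockquotes: /^[ \t]*>/gm,
    fencedCode: /^```/gm,
    links: /\[[^\]]*\]\([^)]*\)/g,
    rawHtml: /<\/?[a-zA-Z][^>]*>/g,
  };
  const counts = {};
  for (const [name, re] of Object.entries(patterns)) {
    counts[name] = (src.match(re) || []).length;
  }
  return counts;
}

console.log(constructFrequencies('# Title\n\n- a\n- b\n\n> quote\n'));
```

Running this over spec.txt would show the skew described above: a raw-HTML count far out of proportion to ordinary documents.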

You are right, the spec is not a very good benchmark input. It would be better to replace it with something else.

I don’t care much about absolute speed or comparisons with other parsers. The most important thing for me is detecting anomalies and regressions. High markup density is preferable for my goals, but the spec is really not ideal because of the dots.

For markdown-it the most expensive case will be very deeply nested blockquotes (they cause stacked remapping of line ranges). Inlines are roughly similar to the reference. It’s probably possible to rewrite the block level to be similar to the reference, but designing an optimal shared state machine is very tedious; neither Alex nor I wish to do it again. Ideally, the reference parser would support markup extensions in some “linear” way (not via tons of pre/post hooks).
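To illustrate the pathological case, here is a sketch of a stress-input generator; `nestedBlockquote` is a hypothetical helper, not part of markdown-it:

```javascript
// Generate a deeply nested blockquote document - the kind of input
// that forces a line-based parser to remap line ranges once per
// nesting level. `nestedBlockquote` is a hypothetical helper for
// building benchmark stress inputs.
function nestedBlockquote(depth, line = 'some text') {
  let src = '';
  for (let i = 1; i <= depth; i++) {
    src += '> '.repeat(i) + line + '\n';
  }
  return src;
}

console.log(nestedBlockquote(3));
// > some text
// > > some text
// > > > some text
```

Feeding nestedBlockquote(1000) to different parsers is a quick way to see whether blockquote handling scales linearly with nesting depth.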

+++ Vitaly Puzrin [Jan 12 15 04:55 ]:

Ideally, the reference parser would support markup extensions in some “linear” way (not via tons of pre/post hooks).

When I can find the time, I’d like to investigate making it more modular and providing a way to do this.

An interesting aspect I noticed when testing the performance of various .NET implementations: some fail really hard (>30 sec) when given a huge HTML file such as the IMDB homepage. If you create a test suite for performance, such a test might be worth including.

I also used the Pro Git book for benchmarking, but instead of using just the English text concatenated 10 times, I merged all the languages together, so there is much more Unicode content involved, which might be useful to test.

+++ Kārlis Gaņģis [Jan 12 15 15:50 ]:

An interesting aspect I noticed when testing the performance of various .NET implementations: some fail really hard (>30 sec) when given a huge HTML file such as the IMDB homepage. If you create a test suite for performance, such a test might be worth including.

This must be something specific to these implementations. Both the
reference implementations (C and JS) parse that page in an instant.

I also used the Pro Git book for benchmarking, but instead of using just the English text concatenated 10 times, I merged all the languages together, so there is much more Unicode content involved, which might be useful to test.

Nice idea. I think I’ll do that too.


In the upcoming 4.0 of markdown-it, rewriting the renderer to support attribute lists cost ~20% of performance, much better than I expected. The new renderer uses a similar approach to the reference parser: attributes are stored in an array, so the output can be extended without changing renderer functions.
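The attribute-array idea can be sketched roughly as follows; the names here are illustrative, not markdown-it’s actual API:

```javascript
// Sketch of tokens carrying attributes as [name, value] pairs, so a
// plugin can add attributes without replacing the renderer function.
// Names are illustrative, not markdown-it's real token API.
function escapeHtml(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;')
          .replace(/>/g, '&gt;').replace(/"/g, '&quot;');
}

function renderAttrs(attrs) {
  return attrs.map(([name, value]) => ` ${name}="${escapeHtml(value)}"`).join('');
}

const token = { tag: 'a', attrs: [['href', '/x'], ['class', 'ext']] };
// A plugin can extend the output without touching the renderer:
token.attrs.push(['rel', 'nofollow']);

const html = `<${token.tag}${renderAttrs(token.attrs)}>`;
console.log(html); // <a href="/x" class="ext" rel="nofollow">
```

Because the renderer loops over the array generically, a plugin that pushes a new pair changes the output with no renderer override at all.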
