MD4C: New implementation in C

I’m looking forward to porting it and adding my own front end (i.e., a replacement for md2html), but the latest interface changes have kept me on my toes adapting my only slightly different clone of md2html, so this has been a bit daunting.

I can imagine changes of these kinds:

  • New constants may be added to the enums (e.g. MD_TEXTTYPE, MD_SPANTYPE, MD_BLOCKTYPE) as new extensions (or new CommonMark features) are added, but this should keep source-level compatibility. Other than that, the API of the parser is more or less as flexible as we need it to be.

  • Related new ‘detail’ structures might be added.

  • Current ‘detail’ structures might be enriched where it is useful.

All such changes should be mostly source-level compatible, and I think it is already time to start taking this into account.
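As a rough illustration of what "taking this into account" could mean on the caller's side, a callback can treat any enum value it does not know as something to skip rather than as an error. The sketch below uses a made-up enum, DEMO_BLOCKTYPE, and a made-up helper, demo_block_tag(); both are hypothetical stand-ins and not MD4C's real definitions.

```c
#include <stddef.h>

/* Hypothetical stand-in for an enum such as MD_BLOCKTYPE.
 * These names are invented for the example, not taken from md4c.h. */
typedef enum {
    DEMO_BLOCK_DOC,
    DEMO_BLOCK_P,
    DEMO_BLOCK_CODE,
    DEMO_BLOCK_FUTURE   /* pretend a later release added this constant */
} DEMO_BLOCKTYPE;

/* Returns the HTML tag name to emit for a block type,
 * or NULL when the type is unknown to this front end. */
static const char *demo_block_tag(DEMO_BLOCKTYPE type)
{
    switch (type) {
        case DEMO_BLOCK_DOC:  return "body";
        case DEMO_BLOCK_P:    return "p";
        case DEMO_BLOCK_CODE: return "pre";
        default:              return NULL;  /* unknown constant: skip it, do not fail */
    }
}
```

With a default branch like this, a front end written against an older header should still compile and run when new constants appear; it just ignores the blocks it does not understand.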

Other than that, the way the interface looks and how flexible it is should hopefully be sufficient for a long time. (Fingers crossed.)

And would you still incorporate some code improvements (in my view),
like using table lookup for character classification instead of complex
comparison and logical expressions (packaged as macros), before making
an “official” release?

According to my analysis with gprof, there are three functions which deserve some optimization:

  • md_collect_marks()
  • md_analyze_line()
  • render_html_escaped() (but this is on the HTML renderer side)

Each of these takes between about 20% and 30% of the time.

An optimization based on table lookup has already been in md_collect_marks() for some time, and it helped its performance a lot.
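For readers unfamiliar with the technique, here is a minimal sketch of table-driven character classification. It is not the actual code of md_collect_marks(); the names demo_mark_map, demo_build_mark_map() and demo_skip_to_mark(), as well as the set of characters in the table, are assumptions made up for the example.

```c
#include <stddef.h>

/* One entry per byte value; nonzero means "this byte may start a mark".
 * Hypothetical table, filled once before parsing. */
static unsigned char demo_mark_map[256];

static void demo_build_mark_map(void)
{
    /* Illustrative character set only; a real parser would derive it
     * from the enabled syntax and extensions. */
    const char *marks = "*_`[]!<>&\\";
    const char *p;

    for (p = marks; *p != '\0'; p++)
        demo_mark_map[(unsigned char)*p] = 1;
}

/* Returns the offset of the first potential mark character in
 * text[off..size), or size if there is none. */
static size_t demo_skip_to_mark(const char *text, size_t off, size_t size)
{
    /* A single table load per byte replaces a chain of comparisons. */
    while (off < size && !demo_mark_map[(unsigned char)text[off]])
        off++;
    return off;
}
```

The gain comes from the many ordinary characters between marks: each one costs a single indexed load instead of a series of comparisons.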

I tried implementing something similar for md_analyze_line(), but there was no measurable gain, so I did not commit it. I believe that’s because the function spends most of its time in just two lines that scan for the end of the line (just after the label done:).
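To make that explanation concrete, an end-of-line scan typically has roughly the following shape. This is a sketch with a hypothetical name, demo_find_line_end(), and not the code from md_analyze_line().

```c
#include <stddef.h>

/* Returns the offset of the first '\n' or '\r' in text[off..size),
 * or size if the line runs to the end of the input. */
static size_t demo_find_line_end(const char *text, size_t off, size_t size)
{
    /* Only two cheap comparisons per byte, so there is little
     * for a lookup table to save here. */
    while (off < size && text[off] != '\n' && text[off] != '\r')
        off++;
    return off;
}
```

If the hot loop looks like this, swapping the two comparisons for a table load would plausibly change very little, which would be consistent with the lack of a measurable gain.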

I have not paid much attention to the renderer side yet, and I am unsure whether render_html_escaped() could be made considerably faster than it is.
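For reference, a common shape for an HTML-escaping routine is to pass unescaped runs to the output in bulk and to handle only the special characters one at a time. The sketch below shows that idea under stated assumptions: DEMO_OUT, demo_out_append(), demo_escape_for() and demo_html_escaped() are hypothetical names, and nothing here is meant to describe how render_html_escaped() is actually written.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical output sink: a small fixed buffer, kept NUL-terminated. */
typedef struct {
    char buf[256];
    size_t len;
} DEMO_OUT;

static void demo_out_append(DEMO_OUT *out, const char *data, size_t size)
{
    size_t room = sizeof(out->buf) - 1 - out->len;

    if (size > room)
        size = room;            /* truncate rather than overflow */
    memcpy(out->buf + out->len, data, size);
    out->len += size;
    out->buf[out->len] = '\0';
}

/* Returns the entity for a character that needs escaping, or NULL. */
static const char *demo_escape_for(unsigned char ch)
{
    switch (ch) {
        case '&': return "&amp;";
        case '<': return "&lt;";
        case '>': return "&gt;";
        case '"': return "&quot;";
        default:  return NULL;
    }
}

static void demo_html_escaped(DEMO_OUT *out, const char *text, size_t size)
{
    size_t beg = 0;   /* start of the pending unescaped run */
    size_t off;

    for (off = 0; off < size; off++) {
        const char *esc = demo_escape_for((unsigned char)text[off]);
        if (esc != NULL) {
            demo_out_append(out, text + beg, off - beg);  /* flush the run */
            demo_out_append(out, esc, strlen(esc));       /* emit the entity */
            beg = off + 1;
        }
    }
    demo_out_append(out, text + beg, size - beg);         /* flush the tail */
}
```

Whether a routine of this kind can be sped up much further depends mostly on how expensive each output call is, so profiling the output path would be the first thing to check.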

The impact of all other functions is more or less negligible. It’s IMHO not worth making them more complicated.

BTW, speaking of performance, I also tried profile-guided optimization with gcc, using a long calibration input created by concatenating MD4C’s README.md 100 times. The result was another 30% performance gain over a normal Release build without PGO (measured with Cmark’s make bench).

I don’t know what optimizations gcc did, although such an analysis would be interesting. I would guess there was some aggressive loop unrolling and/or inlining, because the binary was about 10% or 20% larger than without PGO.