@jgm I tried to cook something. It is still very buggy.
From the standard spec suite, 191 tests still fail.
It seems I do not really understand how the AST should look in many cases. Maybe you could take a look at it and give me some advice. The most relevant code for you should be in md2html+ast/build_ast.c.
For example, code spans, code inlines and tight lists do not work at all. I guess you could tell at first sight what I should do differently to produce an AST palatable to cmark_render_html().
Nevertheless, I tried a few tests where it already seems to work, using an updated test script. The script now also collects some heap info for all the tests. (But many tests are commented out as they are broken for md2html+ast.)
A few notes:
- Cmark 0.28 is used with a few additions: node setters that accept strings not terminated with a zero byte (they take an extra size argument). This way the MD4C callbacks do not need to create temporary buffers just to append the zero and then call a Cmark function which in turn uses the terminator only to call strlen().
- There is still some slowdown caused by the fact that cmark_render_html() returns a zero-terminated string, forcing the caller to call strlen() on it, which effectively means an extra pass over the whole output. This slows md2html+ast down on its own. It would be good to get rid of it.
- IMHO, most users of the API likely have to face the same problems.
- I did not study whether Cmark internally uses strlen() as extensively on its own. If it does, it may play a big role in the performance difference.
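
To illustrate the first two notes, here is a minimal, self-contained sketch of the idea. It deliberately does not use the real Cmark types: toy_node, toy_node_set_literal_n(), toy_node_set_literal() and toy_node_get_literal() are all hypothetical names made up for this example, and the actual additions in build_ast.c may look different.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an AST node. This is NOT the real cmark_node. */
typedef struct {
    char  *literal;  /* owned copy, always zero-terminated */
    size_t len;      /* length without the terminator */
} toy_node;

/* Hypothetical length-taking setter: copies exactly `size` bytes and adds
 * the terminator itself, so the caller needs no temporary buffer and no
 * strlen() is involved. Returns 1 on success, 0 on allocation failure. */
static int toy_node_set_literal_n(toy_node *node, const char *text, size_t size)
{
    char *copy;

    assert(node != NULL);
    copy = malloc(size + 1);
    if (copy == NULL)
        return 0;
    if (size > 0)
        memcpy(copy, text, size);
    copy[size] = '\0';

    free(node->literal);
    node->literal = copy;
    node->len = size;
    return 1;
}

/* Classic zero-terminated setter: it has to scan the whole input with
 * strlen() before it can do anything else. */
static int toy_node_set_literal(toy_node *node, const char *text)
{
    return toy_node_set_literal_n(node, text, strlen(text));
}

/* Getter that also reports the stored length, so the caller does not need
 * an extra strlen() pass over the result. */
static const char *toy_node_get_literal(const toy_node *node, size_t *out_len)
{
    if (out_len != NULL)
        *out_len = node->len;
    return node->literal;
}
```

An MD4C text callback receives a (text, size) pair that is not zero-terminated, so it could pass both straight to a setter like toy_node_set_literal_n(). The same "return the length alongside the pointer" idea, applied to the renderer output, would avoid the extra strlen() pass mentioned in the second note.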
So the results gathered so far:
/home/mity/prj/md4c/bin/md2html/md2html [performance]:
samples/empty.md: mean = 0.0000, median = 0.0000, stdev = 0.0000
samples/long-block-oneline.md: mean = 0.0700, median = 0.0700, stdev = 0.0000
samples/many-atx-headers.md: mean = 0.0800, median = 0.0800, stdev = 0.0000
samples/many-paragraphs.md: mean = 0.0800, median = 0.0800, stdev = 0.0000
/home/mity/prj/md4c+ast/bin-release/md2html+ast/md2html+ast [performance]:
samples/empty.md: mean = 0.0000, median = 0.0000, stdev = 0.0000
samples/long-block-oneline.md: mean = 0.0700, median = 0.0700, stdev = 0.0000
samples/many-atx-headers.md: mean = 0.3200, median = 0.3200, stdev = 0.0000
samples/many-paragraphs.md: mean = 0.3170, median = 0.3200, stdev = 0.0048
/home/mity/prj/cmark/build/src/cmark [performance]:
samples/empty.md: mean = 0.0000, median = 0.0000, stdev = 0.0000
samples/long-block-oneline.md: mean = 0.1060, median = 0.1100, stdev = 0.0052
samples/many-atx-headers.md: mean = 0.4610, median = 0.4600, stdev = 0.0032
samples/many-paragraphs.md: mean = 0.4800, median = 0.4800, stdev = 0.0000
/home/mity/prj/md4c/bin/md2html/md2html [memory consumption]:
samples/empty.md: heap total: 37480, heap peak: 37480, stack peak: 464
samples/long-block-oneline.md: heap total: 112119465, heap peak: 112117673, stack peak: 1600
samples/many-atx-headers.md: heap total: 58314592, heap peak: 58310496, stack peak: 1600
samples/many-paragraphs.md: heap total: 36425988, heap peak: 36421892, stack peak: 1600
/home/mity/prj/md4c+ast/bin-release/md2html+ast/md2html+ast [memory consumption]:
samples/empty.md: heap total: 37601, heap peak: 37601, stack peak: 1104
samples/long-block-oneline.md: heap total: 167119864, heap peak: 167113976, stack peak: 1568
samples/many-atx-headers.md: heap total: 324175128, heap peak: 309560496, stack peak: 1568
samples/many-paragraphs.md: heap total: 317397400, heap peak: 301171888, stack peak: 1568
/home/mity/prj/cmark/build/src/cmark [memory consumption]:
samples/empty.md: heap total: 5649, heap peak: 5504, stack peak: 4608
samples/long-block-oneline.md: heap total: 229714616, heap peak: 169710232, stack peak: 8640
samples/many-atx-headers.md: heap total: 342620248, heap peak: 342614784, stack peak: 8640
samples/many-paragraphs.md: heap total: 344231120, heap peak: 344225664, stack peak: 8640
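So if these preliminary results can be trusted, the performance of md2html+ast is somewhere between md2html and cmark: on many-atx-headers.md and many-paragraphs.md it is about 4x slower than md2html and roughly 1.4x to 1.5x faster than cmark, while on long-block-oneline.md it matches md2html.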
EDIT: I have added the bench.sh script to the https://github.com/mity/md4c-ast repo in case someone wants to play with it. Follow its README to run the tests on your machine.