Questions about libcmark

Hello,

I am in the process of creating a Common Lisp binding library for the reference implementation, and I have a few questions about libcmark. I am using the cmark(3) man page for reference. While I am at it, I also want to thank you for providing a man page; I hate how in recent years many projects host their documentation exclusively online instead.

First of all, there are a few node types which are just aliases for other types:

CMARK_NODE_FIRST_BLOCK = CMARK_NODE_DOCUMENT,
CMARK_NODE_LAST_BLOCK = CMARK_NODE_THEMATIC_BREAK,
CMARK_NODE_FIRST_INLINE = CMARK_NODE_TEXT,
CMARK_NODE_LAST_INLINE = CMARK_NODE_IMAGE

What is up with these? Are they deprecated and only there for backwards compatibility? Are they subclasses? Which ones should I expose in my library?

Second, from the description of the iterator it sounds like depth-first order, starting at the root. Am I correct in my understanding?

Third, if I understand the streaming parser correctly, it differs from the simple parser only in that it allows feeding the document to the parser piecewise. But we still need to feed the entire document into the parser before we can start parsing, right?

Fourth, what does node consolidation (cmark_consolidate_text_nodes) do?

Fifth, when does CMARK_EVENT_NONE even occur? It is defined as one of the iterator events, but there is no description. Is it the initial event value of the iterator?

I think that’s all for now; I might have more questions later. The library is actually mostly done; I just need to write more tests for the parser and iron out some details. There are two systems (what other languages call packages): libcmark consists of low-level 1:1 bindings with all the advantages and disadvantages, while cmark is a lispy high-level library. To make memory management easier, cmark uses native classes to represent nodes rather than wrapping C pointers. If it wrapped C pointers, I would have to free the corresponding foreign node whenever a native node gets garbage-collected, but there is no standard way of specifying finalizers in Common Lisp. There are ways around that, but in the end I found it less messy to just use libcmark for its parser and keep the rest native. If someone really wants a wrapper around foreign pointers, he can build his own high-level system on top of libcmark and ignore cmark, which is why I split the system in two.


Alejandro Sanchez via CommonMark Discussion
noreply@talk.commonmark.org writes:

CMARK_NODE_FIRST_BLOCK = CMARK_NODE_DOCUMENT,
CMARK_NODE_LAST_BLOCK = CMARK_NODE_THEMATIC_BREAK,
CMARK_NODE_FIRST_INLINE = CMARK_NODE_TEXT,
CMARK_NODE_LAST_INLINE = CMARK_NODE_IMAGE

What is up with these? Are they deprecated and only there for backwards compatibility? Are they subclasses? Which ones should I expose in my library?

They’re used to determine whether a node is block-level or
inline-level. See e.g. S_is_block in src/node.c.
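
To illustrate the idea, here is a minimal, self-contained sketch of such a range check. The `node_type` enum, its `NODE_*` names, and the `is_block`/`is_inline` helpers are made up and trimmed down for this example; they are not the definitions from cmark.h or src/node.c. The point is only that the FIRST/LAST aliases mark the two ends of a contiguous range of values, so they act as range bounds rather than as node types of their own.

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down enum for illustration only (not cmark.h). */
typedef enum {
  NODE_NONE,

  /* block-level types occupy one contiguous range */
  NODE_DOCUMENT,
  NODE_PARAGRAPH,
  NODE_THEMATIC_BREAK,

  /* inline-level types occupy the next contiguous range */
  NODE_TEXT,
  NODE_EMPH,
  NODE_IMAGE,

  /* aliases marking the first and last member of each range */
  NODE_FIRST_BLOCK = NODE_DOCUMENT,
  NODE_LAST_BLOCK = NODE_THEMATIC_BREAK,
  NODE_FIRST_INLINE = NODE_TEXT,
  NODE_LAST_INLINE = NODE_IMAGE
} node_type;

/* A type is block-level if its value lies within the block range. */
static bool is_block(node_type t) {
  return t >= NODE_FIRST_BLOCK && t <= NODE_LAST_BLOCK;
}

/* A type is inline-level if its value lies within the inline range. */
static bool is_inline(node_type t) {
  return t >= NODE_FIRST_INLINE && t <= NODE_LAST_INLINE;
}
```

With this layout, adding a new block type between the first and last block members needs no change to the predicates, which is presumably why the aliases exist.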

Second, from the description of the iterator it sounds like depth-first order, starting at the root. Am I correct in my understanding?

Yes, basically, depending on what you mean by that.
For each node you’ll get a CMARK_EVENT_ENTER event on that
node, then it will descend to any children, and then when
the children are all done you’ll get CMARK_EVENT_EXIT on the
original node.
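
The order of events can be sketched with a small, self-contained tree walk. This is not libcmark's iterator: the `node` struct and the `walk` function are hypothetical, and the events are just strings appended to a buffer. The sketch also assumes that a node without children produces only an "enter" event, which matches how cmark(3) describes leaf nodes as far as can be told, but that detail is worth checking against the man page.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical node type for illustration only (not cmark's struct). */
typedef struct node {
  const char *name;
  struct node *first_child;
  struct node *next; /* next sibling */
} node;

/* Append "enter NAME;" and "exit NAME;" events to out in traversal order.
 * out must hold a NUL-terminated string and have room for cap bytes.
 * In this sketch a node without children produces only an "enter" event. */
static void walk(const node *n, char *out, size_t cap) {
  size_t len = strlen(out);
  snprintf(out + len, cap - len, "enter %s;", n->name);

  if (n->first_child == NULL)
    return; /* leaf: no matching "exit" event */

  /* descend into the children, left to right */
  for (const node *c = n->first_child; c != NULL; c = c->next)
    walk(c, out, cap);

  /* all children are done, so leave the original node */
  len = strlen(out);
  snprintf(out + len, cap - len, "exit %s;", n->name);
}
```

For a document containing one paragraph with one text node, the buffer ends up holding the events for document, paragraph and text on the way down, followed by the exits for paragraph and document on the way back up.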

Third, if I understand the streaming parser correctly, it differs from the simple parser only in that it allows feeding the document to the parser piecewise. But we still need to feed the entire document into the parser before we can start parsing, right?

Well, with the streaming parser, some block-level parsing does actually
get done every time you push in a chunk of text. But you won’t
get a final result until you call cmark_parser_finish. This is
because inline parsing can’t be done until we know what all the reference
definitions are, and we don’t know this until we’ve seen the whole
document.

Fourth, what does node consolidation (cmark_consolidate_text_nodes) do?

It will combine two adjacent text nodes, e.g. with contents “FOO”
and “BAR”, into a single text node with “FOOBAR”.
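
As a rough picture of what such a merge involves, here is a self-contained sketch over a plain sibling list. The `snode` struct and the `snode_new` and `consolidate` helpers are hypothetical and are not libcmark code. Keeping the first node of each run and freeing the rest is a choice made for this sketch, not a statement about what cmark_consolidate_text_nodes does internally.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sibling-list node for illustration only (not cmark's struct). */
typedef struct snode {
  bool is_text;
  char *content;      /* heap-allocated, owned by the node */
  struct snode *next; /* next sibling */
} snode;

/* Allocate a node holding a private copy of content. */
static snode *snode_new(bool is_text, const char *content, snode *next) {
  snode *n = malloc(sizeof *n);
  if (n == NULL) abort();
  n->is_text = is_text;
  n->content = malloc(strlen(content) + 1);
  if (n->content == NULL) abort();
  strcpy(n->content, content);
  n->next = next;
  return n;
}

/* Merge every run of adjacent text siblings into the first node of the run.
 * The absorbed nodes are freed, so in this sketch any pointer to them that
 * is held elsewhere becomes dangling. */
static void consolidate(snode *head) {
  for (snode *cur = head; cur != NULL; cur = cur->next) {
    if (!cur->is_text) continue;
    while (cur->next != NULL && cur->next->is_text) {
      snode *gone = cur->next;
      size_t a = strlen(cur->content), b = strlen(gone->content);
      char *merged = realloc(cur->content, a + b + 1);
      if (merged == NULL) abort();
      memcpy(merged + a, gone->content, b + 1); /* copies the NUL too */
      cur->content = merged;
      cur->next = gone->next; /* unlink the absorbed node */
      free(gone->content);
      free(gone);
    }
  }
}
```

Given three text siblings “FOO”, “BAR” and “BAZ” followed by a non-text node, the first node ends up with “FOOBARBAZ” and points directly at the non-text node.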

Fifth, when does CMARK_EVENT_NONE even occur? It is defined as one of the iterator events, but there is no description. Is it the initial event value of the iterator?

Exactly.

I think that’s all for now; I might have more questions later. The library is actually mostly done; I just need to write more tests for the parser and iron out some details. There are two systems (what other languages call packages): libcmark consists of low-level 1:1 bindings with all the advantages and disadvantages, while cmark is a lispy high-level library. To make memory management easier, cmark uses native classes to represent nodes rather than wrapping C pointers. If it wrapped C pointers, I would have to free the corresponding foreign node whenever a native node gets garbage-collected, but there is no standard way of specifying finalizers in Common Lisp. There are ways around that, but in the end I found it less messy to just use libcmark for its parser and keep the rest native. If someone really wants a wrapper around foreign pointers, he can build his own high-level system on top of libcmark and ignore cmark, which is why I split the system in two.

Sounds good!

Thank you for your answers.

After reading the source code, it seems that cmark_consolidate_text_nodes mutates the tree as its iterator walks over it, and that it merges any number of adjacent sibling text nodes, not just two. The last node remains and the other ones are dropped from the tree. What happens if someone still holds references to those dropped nodes? Do they become dangling pointers?