Should the spec address nesting limits?

Some Markdown implementations (like Sundown) will stop parsing inlines and blocks once the parser hits a certain nesting level. There are a couple reasons for this, but I think the most common one is preventing user-supplied Markdown from causing a stack overflow.

For example, here’s what the reference C implementation does if you supply it with 50000 nested emphasis inlines:

$ python -c 'print ("_*" * 25000) + "foo" + ("*_" * 25000)' | valgrind ./stmd > /dev/null
==8528== Memcheck, a memory error detector
==8528== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==8528== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info
==8528== Command: ./stmd
==8528== 
==8528== Stack overflow in thread 1: can't grow stack to 0xfe116fec

Obviously not ideal when dealing with untrusted input, so should the spec address this by adding a “reasonable” limit on nesting levels for inlines / blocks, or should it be left up to implementers to decide? What should a parser do once it hits its nesting limit?

Apart from maliciously constructed input, I think it’s even more likely that non-Markdown input such as binary files incorrectly interpreted as Markdown will cause implementations to fail in ways that no spec could ever predict.

In practice there will always be limits for implementations. If it’s not a nesting limit, then it’s a document size limit. It depends on the implementation whether and when such a limit is hit. Some implementations might use tail-recursion or some other optimization and will never hit a nesting limit. Other implementations might be constrained by their hardware, platform or libraries and hit some limit much sooner. Most limits aren’t clear-cut and depend on too many factors.

Therefore any limit that would be specified in the spec is too restrictive for some, and too liberal for others. In my opinion implementations should set their own limits, if possible at all, and simply fail if a limit is hit or a document can’t be rendered. Very few documents will ever hit these limits in any implementation.

I think the behavior when a document can’t be rendered must not be part of the spec either. Some implementations can decide to produce the part of the document that did succeed up to now, other implementations throw an exception or return an error code. There is no one-size-fits-all solution.

5 Likes

True, but the latter can be dealt with outside the implementation proper by checking the length of the input.

That seems reasonable to me, although I think we should be explicit about what a parser may take into account when deciding it can’t render the document as provided. Something that decides to stop parsing inlines after a certain depth a la Sundown should still be considered compliant, but something that refuses to render any documents containing "f"s shouldn’t.

Even better if a certain baseline level of functionality is guaranteed like “Parser behaviour is undefined when the nesting level passes 100.”

But how would you decide to put the limit at say 100? Why not 150, or 50? How can an implementation guarantee it can reach that limit of 100? For example, 75 inlines in 50 nested lists in 25 block quotes could be within spec limits, yet unrenderable. Or 50 inlines in a pretty large document could push the application over its limits, whereas it would be fine in a smaller document. Bottom line is that it would be impossible to predict all the possible interactions between inputs and the implementations.

You are assuming perfect implementations, that can identify all possible corner cases and react appropriately. This is not reasonable. Instead, I propose to simply let an implementation fail whenever it happens, and let it handle that failure however it appropriate for the implementation.

Additionally this would make implementations perform a little better, as they don’t have to be checking limits all the time.

[quote=“Virtlink, post:4, topic:602”]
But how would you decide to put the limit at say 100? Why not 150, or 50?[/quote]

It’s an arbitrary limit, but it would be pretty silly to call an implementation that couldn’t handle triply-nested <em>s compliant. You need some point where you can say “if it can’t handle this, it doesn’t handle CommonMark.”

[quote=“Virtlink, post:4, topic:602”]
How can an implementation guarantee it can reach that limit of 100? […] You are assuming perfect implementations, that can identify all possible corner cases and react appropriately. This is not reasonable.[/quote]

It don’t know that they’d need to formally prove they properly handle every possible range of input, but if it was shown that they could not it would be considered a spec compliance issue. Having all of these corner cases specified is why I like CommonMark.

What’s “it” here? The nesting issue or just general errors?

They needn’t check anything if they don’t want to, I’m saying that after a certain point the behaviour could be implementation-defined. It could continue normally, crash and burn, or make dragons fly out of your nose.

The CommonMark spec’s goal is to remove any differences and ambiguities in how input is handled by different implementations. While the nesting limit might indeed be different for different implementations, you’re not proposing how to handle them. Continue until an error occurs isn’t handling, and will still result in different outputs in different implementations.

And putting nesting limits in the spec still wouldn’t allow you to make statements about an implementation’s conformance to the spec, as implementations can’t guarantee these limits in all circumstances. Bugs, oversights, or unexpected resource constraints may cause the implementation to bail out both before and after the spec’s limit was reached.

Fatal errors will always occur. If it isn’t simply because of excess nesting, then it occurs due to something even more shady or a combination of factors. There will always be a difference between the theoretical ideal implementation with unlimited memory and processing power that is described by the spec, and the implementation in practice that is limited in different complicated ways. I wouldn’t add rules to the spec for only those theoretical ideal implementations, which actual implementations don’t check, cannot prove, and won’t guarantee.