Copied from Issues · commonmark/commonmark-spec · GitHub
Currently the tab-to-space conversion is done before parsing, so the source column number stored in the AST does not match the character position in the original input.
True. That’s unfortunate, though it shouldn’t affect block elements, as you point out.
This probably makes the stored positions unsuitable for precise mapping between the output and the source. This might not be a significant issue for block elements, but it would certainly create issues for inlines.
My current idea is for the untabify function to create a linked list of all tab characters and how many spaces it added for each, so that when a node is created, it can adjust the column number.
We could do this, but it’s more memory and more processing, and will slow things down. Not sure it’s worth it (see below).
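To make the trade-off concrete, here is a minimal sketch of the bookkeeping idea in Python. It is not the C implementation's code: the names `untabify` and `source_column` as written here, the 4-column tab stop, and the use of a plain list instead of a linked list are all assumptions made for illustration.

```python
TAB_STOP = 4  # assumption: a tab advances to the next multiple of 4 columns


def untabify(line, tab_stop=TAB_STOP):
    """Expand tabs and remember where each one was.

    Returns (expanded_text, records). Each record is (start, width):
    the 0-based column in the expanded text where the tab's spaces
    begin, and how many spaces replaced that tab.
    """
    out = []
    records = []
    col = 0
    for ch in line:
        if ch == "\t":
            width = tab_stop - (col % tab_stop)
            records.append((col, width))
            out.append(" " * width)
            col += width
        else:
            out.append(ch)
            col += 1
    return "".join(out), records


def source_column(expanded_col, records):
    """Map a 0-based column in the expanded text back to the
    0-based character index in the original line."""
    offset = 0
    for start, width in records:
        if expanded_col >= start + width:
            # Tab lies entirely before this column: it was 1 character
            # in the source but occupies `width` columns after expansion.
            offset += width - 1
        elif expanded_col >= start:
            # Column falls inside the tab's spaces: report the tab itself.
            return start - offset
        else:
            break
    return expanded_col - offset


expanded, records = untabify("a\tb\tc")
print(repr(expanded), records)
print(source_column(expanded.index("b"), records))
```

The extra cost mentioned in the reply is visible here: one record per tab, plus a walk over the records for every position that has to be translated back to the source.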
Also, not to create another issue - the inlines do not track the start/end line/column numbers (at least in the C implementation) - is that intentional?
Yes, I figured that the biggest use of source location would be to locate the block element (so that, e.g., clicking on a block in the HTML pane will move the cursor to that block in the Markdown pane). I don’t know how much added value you’d get from doing this for inlines. (And there are some technical difficulties with doing this for inlines, with the current separation of block and inline parsing.)
So this should be open for discussion - is there any real need for the parsers to calculate the exact line/char each inline starts and ends? For example, to do markdown syntax highlighting or maybe something completely different?