Should parsers save line/column information for inlines?


#1

Copied from https://github.com/jgm/CommonMark/issues/170:

Currently the tab-to-space conversion is done before parsing so when the source column number is stored in the AST it does not match the character position in the original input.

True. That’s unfortunate, though it shouldn’t affect block elements, as you point out.

This probably makes it not suitable for precise mapping between the output and the source. This might not be a significant issue for block elements but it would certainly create issues for inlines.

My current idea is for the untabify function to create a linked list of all tab characters and how many spaces it added so that when a node is created, it can adjust the number.

We could do this, but it’s more memory and more processing, and will slow things down. Not sure it’s worth it (see below).
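To make the trade-off concrete, here is a minimal sketch of the bookkeeping being proposed. It is not cmark's code: the names `untabify` and `original_column`, the tab stop of 4, and the use of a plain Python list instead of a linked list are all assumptions made for illustration.

```python
TAB_STOP = 4  # assumed tab stop for this sketch


def untabify(line, tab_stop=TAB_STOP):
    """Expand tabs and remember where each one was.

    Returns (expanded_text, records). Each record is (start, width):
    the 0-based column where the tab begins in the expanded text and
    the number of spaces it was replaced with.
    """
    out = []
    records = []
    col = 0
    for ch in line:
        if ch == "\t":
            width = tab_stop - (col % tab_stop)
            records.append((col, width))
            out.append(" " * width)
            col += width
        else:
            out.append(ch)
            col += 1
    return "".join(out), records


def original_column(expanded_col, records):
    """Map a 0-based column in the expanded text to the 0-based
    character index in the original line."""
    shift = 0
    for start, width in records:
        if expanded_col >= start + width:
            # The whole tab lies to the left: it added width - 1 columns.
            shift += width - 1
        elif expanded_col >= start:
            # The column falls inside the expanded tab itself.
            return start - shift
        else:
            break
    return expanded_col - shift
```

For the line `"\tfoo\tbar"` the expanded text is `"    foo bar"`, and column 8 (the `b`) maps back to index 5 in the original. The extra cost mentioned above is visible here: one record per tab, plus a scan of the records on every lookup.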

Also, not to create another issue - the inlines do not track the start/end line/column numbers (at least in the C implementation) - is that intentional?

Yes, I figured that the biggest use of source location would be to locate the block element (so that, e.g., clicking on a block in the HTML pane will move the cursor to that block in the Markdown pane). I don’t know how much added value you’d get from doing this for inlines. (And there are some technical difficulties with doing this for inlines, with the current separation of block and inline parsing.)
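One way to picture the difficulty: once the block phase has stripped container markers such as `> ` and joined the remaining text, an offset inside that text no longer says which source line or column it came from. The sketch below assumes the block phase keeps one record per content line. It is a toy with hypothetical names (`strip_blockquote`, `source_position`), handling only a single level of `> ` markers, and is not how cmark is structured.

```python
from bisect import bisect_right


def strip_blockquote(source_lines, first_line_no=1):
    """Toy block phase: strip a leading '> ' marker from each line.

    Returns the joined paragraph text plus one record per line:
    (offset_in_text, source_line, source_col), with 1-based line
    and column numbers.
    """
    parts, records, offset = [], [], 0
    for i, raw in enumerate(source_lines):
        prefix = 2 if raw.startswith("> ") else 0
        content = raw[prefix:]
        records.append((offset, first_line_no + i, prefix + 1))
        parts.append(content)
        offset += len(content) + 1  # +1 for the joining newline
    return "\n".join(parts), records


def source_position(offset, records):
    """Map an offset in the paragraph text back to (line, column)."""
    starts = [r[0] for r in records]
    start, line, col = records[bisect_right(starts, offset) - 1]
    return line, col + (offset - start)
```

With the input lines `> hello *world*` and `> and more`, the inline parser sees `hello *world*\nand more`. The `*` at offset 6 maps back to line 1, column 9, and `and` at offset 14 maps to line 2, column 3.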


So this should be open for discussion - is there any real need for the parsers to calculate the exact line/char each inline starts and ends? For example, to do markdown syntax highlighting or maybe something completely different?


#2

It’s not clear to me either. We skipped column calculation entirely until someone can explain why it is needed at all. Lines are required in blocks for sync scroll, that’s obvious.


#3

+++ Vitaly Puzrin [Oct 28 14 17:57 ]:

It’s not clear to me either. We skipped column calculation entirely
until someone can explain why it is needed at all. Lines are required
in blocks for sync scroll, that’s obvious.

Syntax highlighting has been mentioned.


Syntax Highlighting Source CommonMark for Editors
#4

Ah, yes, I was thinking about abstract web editors and completely forgot about editor programs. Sorry.


#5

I wanted to open up this discussion again. Since tabs are no longer expanded, is there anything that keeps us from implementing correct values for sourcepos in cmark? Is there maybe some work already going on to add that?

My use case is for syntax highlighting the source. I would love to see this added into markdown-it as well.


#6

See comments here and in the linked issue.

Having a sourcepos attribute on elements is not the way to go, I’ve concluded. Some kind of separate source map (which might be an element of the AST) is needed. Unfortunately it’s not trivial to add this to the current reference implementation, but when I find time, I plan to do it.
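The post does not say what such a source map would look like, so the following is only one possible reading: a side table that maps nodes to source spans, kept apart from the AST nodes themselves. Every name here (`Span`, `Node`, `SourceMap`, `node_at`) is hypothetical and none of it comes from cmark.

```python
from dataclasses import dataclass, field
from typing import NamedTuple


class Span(NamedTuple):
    """1-based, inclusive source range."""
    start_line: int
    start_col: int
    end_line: int
    end_col: int


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can be dict keys
class Node:
    kind: str
    children: list = field(default_factory=list)


class SourceMap:
    """Side table of positions; the nodes carry no position fields."""

    def __init__(self):
        self._spans = {}

    def record(self, node, span):
        self._spans[node] = span

    def lookup(self, node):
        return self._spans.get(node)

    def node_at(self, root, line, col):
        """Return the deepest node whose span contains (line, col), or None."""
        span = self.lookup(root)
        if span is None:
            return None
        start = (span.start_line, span.start_col)
        end = (span.end_line, span.end_col)
        if not (start <= (line, col) <= end):
            return None
        for child in root.children:
            hit = self.node_at(child, line, col)
            if hit is not None:
                return hit
        return root
```

A syntax highlighter could walk the tree and call `lookup` for each node, while an editor could call `node_at` with the cursor position. Renderers that do not care about positions never touch the table.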