Can we have formal escaping rules?

kenlyon · October 20, 2017, 8:49pm

Are there any plans to formalize a set of rules that could be used for escaping text to ensure no markdown formatting occurs? Here is my use case:

Data is retrieved from a data source and is assumed to contain plain text.
Data is presented to the user by combining it with a template written in markdown that contains placeholders for field names.

I want to ensure that the data is presented without any markdown applied to it.

Here are the steps that I’ve come up with thus far. I’d be interested to know if I’ve missed anything:

Escape known special characters:
1. \ backslash
2. ` backtick
3. * asterisk
4. _ underscore
5. {} curly braces
6. [] square brackets
7. () parentheses
8. # hash mark
9. + plus sign
10. - minus sign
11. . dot
12. ! exclamation mark
13. > greater-than
14. ~ tilde
Replace four leading spaces (/^[ ]{4}/g) with non-breaking spaces (&nbsp;) to prevent being interpreted as a code block.
Replace 0-3 leading spaces + tab (/^[ ]{0,3}[\t]{1}/g) with non-breaking spaces (&nbsp;) to prevent being interpreted as a code block.
Replace two trailing spaces (/[ ]{2}$/g) with non-breaking spaces (&nbsp;) to prevent being interpreted as a line break.
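
The steps above could be sketched roughly as follows. This is only an illustration, not a vetted escaping routine: the function name `escape_markdown` is hypothetical, the use of `&nbsp;` entities as the replacement for leading and trailing spaces is an assumption, and the regexes are applied per line (multiline mode), which the patterns above seem to intend.

```python
import re

# The characters listed above; each one gets a backslash prefix.
SPECIAL_CHARS = "\\`*_{}[]()#+-.!>~"
SPECIAL_RE = re.compile("[" + re.escape(SPECIAL_CHARS) + "]")

NBSP = "&nbsp;"  # assumed replacement for significant spaces


def escape_markdown(text: str) -> str:
    """Hypothetical helper: escape plain text for use inside Markdown."""
    # Step 1: backslash-escape every listed special character.
    out = SPECIAL_RE.sub(lambda m: "\\" + m.group(0), text)
    # Step 2: four leading spaces would start an indented code block.
    out = re.sub(r"^[ ]{4}", NBSP * 4, out, flags=re.MULTILINE)
    # Step 3: 0-3 leading spaces followed by a tab would do the same.
    out = re.sub(r"^[ ]{0,3}\t", NBSP * 4, out, flags=re.MULTILINE)
    # Step 4: two trailing spaces would produce a hard line break.
    out = re.sub(r"[ ]{2}$", NBSP * 2, out, flags=re.MULTILINE)
    return out


print(escape_markdown("1. *Not* a list"))  # prints: 1\. \*Not\* a list
```

Escaping every listed character everywhere is more aggressive than strictly necessary (a dot only matters after digits at the start of a line, for example), but it keeps the rule set simple.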

zzzzBov · October 20, 2017, 10:48pm

You could probably wrap it in a <div> and HTML encode it.

Crissov · October 21, 2017, 8:18am

```plain text
There will be no Markdown processing in here, 
but the fences should consist of more backticks (or tildes) 
than their longest run in the pasted text.
```
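
A minimal sketch of the fence-length rule described in the block above, assuming a hypothetical helper named `fence`: it finds the longest run of the fence character in the pasted text and uses one more than that, with a minimum of three.

```python
import re


def fence(text: str, info: str = "plain text", fence_char: str = "`") -> str:
    """Hypothetical helper: wrap text in a fence longer than any run inside it."""
    # Find every run of the fence character in the pasted text.
    runs = re.findall(re.escape(fence_char) + "+", text)
    longest = max((len(run) for run in runs), default=0)
    # A fence needs at least three characters and must exceed the longest run.
    marker = fence_char * max(3, longest + 1)
    return f"{marker}{info}\n{text}\n{marker}"


print(fence("no backticks here"))
```

Passing `fence_char="~"` gives the tilde variant. The info string is kept free of backticks here because backtick fences do not allow them in the info string.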

kenlyon · October 23, 2017, 3:18pm

Thanks for the suggestion. It’s certainly a simpler way to avoid having to escape everything. I did try this at the time but the downside is that everything’s preformatted, so it’s still not quite the same end result. I don’t want to preserve indenting/formatting of the source text.

It would be handy if markdown had some kind of “not markdown” tag that you could wrap stuff in that it would just pass through without wrapping in any tags or converting anything.

kenlyon · October 23, 2017, 3:22pm

The HTML encoding idea sounds interesting, although I don’t want a <div> around it. The source text needs to be added inline, in the context of markdown before and after.

A basic example: ## {Title}
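
One way the inline HTML-encoding idea might look for a template like this is sketched below. It assumes a CommonMark-compliant renderer, where numeric character references are rendered as literal characters and are not treated as structure; other Markdown flavours may behave differently. The names `encode_as_references` and `fill_template` are hypothetical, and field values are assumed to be single-line.

```python
import re
import string


def encode_as_references(text: str) -> str:
    """Replace each ASCII punctuation character with a numeric character reference."""
    return "".join(
        f"&#{ord(ch)};" if ch in string.punctuation else ch for ch in text
    )


def fill_template(template: str, fields: dict) -> str:
    """Substitute {Name} placeholders with encoded field values."""
    return re.sub(
        r"\{(\w+)\}",
        lambda m: encode_as_references(fields[m.group(1)]),
        template,
    )


print(fill_template("## {Title}", {"Title": "*Hot* & #1"}))
# prints: ## &#42;Hot&#42; &#38; &#35;1
```

This does not cover leading or trailing whitespace in the data, which would still need separate handling.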

zzzzBov · October 23, 2017, 3:51pm

I think in that case we need to formalize what it would mean to “Markdown Encode” text (and likely “Markdown Decode” for symmetry).

Based on your earlier posts, I considered suggesting we leverage <![CDATA[]]> blocks, but given your example of using a template that shouldn’t have additional formatting I think discussing what it would mean to encode text to markdown is appropriate.

Crissov · October 23, 2017, 9:44pm

The presentation, i.e. font and line breaking behavior, depends on your output format. It is simple to change in HTML+CSS for instance, assuming the parser passes the info string value somehow to the class attribute.