Can we have formal escaping rules?


#1

Are there any plans to formalize a set of rules that could be used for escaping text to ensure no markdown formatting occurs? Here is my use case:

  1. Data is retrieved from a data source and is assumed to contain plain text.
  2. Data is presented to the user by combining it with a template written in markdown that contains placeholders for field names.

I want to ensure that the data is presented without any markdown applied to it.

Here are the steps that I’ve come up with thus far. I’d be interested to know if I’ve missed anything:

  1. Escape known special characters:
    1. \ backslash
    2. ` backtick
    3. * asterisk
    4. _ underscore
    5. {} curly braces
    6. [] square brackets
    7. () parentheses
    8. # hash mark
    9. + plus sign
    10. - minus sign
    11. . dot
    12. ! exclamation mark
    13. > greater-than
    14. ~ tilde
  2. Replace four leading spaces (/^[ ]{4}/g) with non-breaking spaces (&nbsp;) to prevent the line from being interpreted as a code block.
  3. Replace 0-3 leading spaces + tab (/^[ ]{0,3}[\t]{1}/g) with non-breaking spaces (&nbsp;) to prevent the line from being interpreted as a code block.
  4. Replace two trailing spaces (/[ ]{2}$/g) with non-breaking spaces (&nbsp;) to prevent them from being interpreted as a line break.
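
For reference, the steps above could be sketched in Python roughly as follows. This is only an illustration of the list as written, not a complete escaping scheme: the function name `escape_markdown` is made up, the `&nbsp;` replacement text is an assumption, and characters that are not in the list (such as `<` and `&`) are left untouched.

```python
import re

# Step 1: the characters listed above (hypothetical constant name).
SPECIAL_CHARS = r"\`*_{}[]()#+-.!>~"
_SPECIAL_RE = re.compile("[" + re.escape(SPECIAL_CHARS) + "]")


def escape_markdown(text: str) -> str:
    """Backslash-escape the listed characters and neutralize significant whitespace."""
    out = []
    for line in text.split("\n"):
        # Step 1: put a backslash in front of every listed character.
        line = _SPECIAL_RE.sub(lambda m: "\\" + m.group(0), line)
        # Steps 2 and 3: four leading spaces, or 0-3 spaces followed by a tab.
        # Assumption: four &nbsp; entities are an acceptable replacement.
        line = re.sub(r"^(?: {0,3}\t| {4})", "&nbsp;" * 4, line)
        # Step 4: two or more trailing spaces would become a hard line break.
        line = re.sub(r" {2,}$", "&nbsp;&nbsp;", line)
        out.append(line)
    return "\n".join(out)


print(escape_markdown("# Not a heading\n1. *not* a list item"))
```

Escaping every listed character everywhere is heavier than strictly necessary (a `.` only matters after a leading number, for instance), but it keeps the sketch simple.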

#2

You could probably wrap it in a <div> and HTML encode it.


#3
```plain text
There will be no Markdown processing in here, 
but the fences should consist of more backticks (or tildes) 
than their longest run in the pasted text.
```
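
Picking a fence that is long enough can be automated. A minimal sketch in Python, assuming backtick fences and a hypothetical helper name `fence_plain_text`:

```python
import re


def fence_plain_text(text: str, info: str = "") -> str:
    """Wrap text in a backtick fence longer than any backtick run inside it."""
    runs = re.findall(r"`+", text)
    longest = max((len(run) for run in runs), default=0)
    # A fence needs at least three backticks and must exceed the longest run.
    fence = "`" * max(3, longest + 1)
    return f"{fence}{info}\n{text}\n{fence}"


print(fence_plain_text("no *Markdown* processing in here"))
```

The same idea works with tildes; the sketch does not validate the info string.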

#4

Thanks for the suggestion. It’s certainly a simpler way to avoid having to escape everything. I did try this at the time but the downside is that everything’s preformatted, so it’s still not quite the same end result. I don’t want to preserve indenting/formatting of the source text.

It would be handy if markdown had some kind of “not markdown” tag that you could wrap stuff in that it would just pass through without wrapping in any tags or converting anything.


#5

The HTML encoding idea sounds interesting, although I don’t want a <div> around it. The source text needs to be added inline, in the context of markdown before and after.

A basic example: ## {Title}
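
One way to try the encoding idea inline, without a wrapper element, is to turn every ASCII punctuation character in the field value into a numeric character reference. CommonMark states that character references cannot stand in for symbols that indicate structure, so the encoded value should come out as literal text. A rough Python sketch, where `entity_encode` and `render` are made-up names and single-line field values are assumed:

```python
import string
from typing import Mapping


def entity_encode(text: str) -> str:
    """Replace ASCII punctuation with numeric character references."""
    return "".join(
        f"&#{ord(ch)};" if ch in string.punctuation else ch for ch in text
    )


def render(template: str, fields: Mapping[str, str]) -> str:
    """Fill {Name} placeholders in a markdown template with encoded values."""
    for name, value in fields.items():
        template = template.replace("{" + name + "}", entity_encode(value))
    return template


print(render("## {Title}", {"Title": "*Hi* #1"}))
# prints: ## &#42;Hi&#42; &#35;1
```

Leading and trailing whitespace in the value is not handled here and would still need separate treatment.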


#6

I think in that case we need to formalize what it would mean to “Markdown Encode” text (and likely “Markdown Decode” for symmetry).

Based on your earlier posts, I considered suggesting we leverage <![CDATA[]]> blocks, but given your example of a template whose inserted data shouldn’t pick up additional formatting, I think discussing what it would mean to encode text to markdown is appropriate.


#7

The presentation, i.e. font and line-breaking behavior, depends on your output format. It is simple to change in HTML+CSS, for instance, assuming the parser passes the info string value to the class attribute in some form.