This should probably be fixed so no one is actually using the same insecure approach.
+++ fransrosen [Sep 03 14 20:16 ]:
I noticed that the try.standardmarkdown.com implementation is not
result in an XSS:
This should probably be fixed so no one is actually using the same
insecure approach.
I don’t think it’s a problem for the online dingus, since you’re only
going to get this list if you yourself type the corresponding markdown
in the text box.
There’s a broader question about whether a stmd implementation should
include sanitization of potentially unsafe HTML. I do this in my
Haskell implementation, cheapskate. But it’s probably safer in most
cases to run the entire output of stmd through a stock HTML sanitizer.
That way the people who maintain the sanitization library can worry
about keeping it up to date, and we don’t need to duplicate efforts.
Thoughts about this would be welcome.
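As a rough illustration of the "sanitize the rendered output" approach described above, here is a minimal allow-list sketch using only the Python standard library. Everything in it is an assumption made for illustration: the `ALLOWED_TAGS`, `ALLOWED_ATTRS`, and `SAFE_URL_PREFIXES` policy, the `AllowListSanitizer` class, and the `sanitize` helper are hypothetical names, and `rendered` stands in for whatever HTML a Markdown renderer produced. It is not a replacement for a maintained sanitization library, which is the whole point of the suggestion.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical policy, for illustration only. Every site will want its own list.
ALLOWED_TAGS = {"p", "em", "strong", "code", "pre", "blockquote",
                "ul", "ol", "li", "a"}
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_URL_PREFIXES = ("http://", "https://", "mailto:")
# Elements whose text content is dropped together with the tags.
DROP_CONTENT = {"script", "style"}


class AllowListSanitizer(HTMLParser):
    """Re-emits only allow-listed tags and attributes; escapes all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tags are dropped, their text is kept
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            # Only absolute URLs with a known-safe scheme survive.
            if name == "href" and not value.strip().lower().startswith(
                    SAFE_URL_PREFIXES):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html_text):
    """Run rendered HTML through the allow-list and return the result."""
    parser = AllowListSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


# `rendered` is a stand-in for the output of a Markdown renderer.
rendered = '<p>Hello <em>world</em></p><script>alert(1)</script>'
print(sanitize(rendered))
```

The design choice worth noting is that the sanitizer runs on the final HTML, so the Markdown renderer can stay unaware of any policy. Relative links are dropped by this particular sketch, which a real policy might not want.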
True, I wasn’t worried about the actual page it was on, but more about the way it was implemented, as an example for others to adapt.
I think many people still believe that using Markdown is a replacement for allowing real HTML, not only because of convenience (easier tags, readability) but also because it supposedly lets only certain tags pass through and therefore acts as a sort of sanitizer.
It is not the first time I’ve seen insecure Markdown implementations, which is why I wanted to discuss it. I think mentioning that a sanitizer is also needed would be a good thing, so that people do not believe this is a one-stop shop for allowing user-generated content through.
However, there are secure implementations of Markdown, so maybe this affects only the stmd.js version rather than all parsers?
Markdown allows approximately every HTML tag to pass through, and even tags which don’t exist in HTML. The Markdown spec does not include anything about sanitizing.
Where secure implementations exist, this is because they already apply some sanitizing for you.
Now, for places where Markdown is being used for user-generated content, sanitizing is necessary. But everyone has different requirements here. We could provide some sort of default sanitizer alongside (Stack Exchange does this with PageDown), but in the end everyone will have their own list of things they want to allow or prevent.
Thanks for the input. Your points make sense. I missed that it passed through all HTML as well; I just tried the one thing people often forget to sanitize in Markdown. That would probably make devs use a sanitizer combined with Standard Markdown, since nothing is done at all with regard to sanitization. I wonder why this keeps happening though. If people ran a sanitizer after Markdown those issues would be mitigated, but that doesn’t seem to be the case. Ah, well. Thanks for clarifying!
I filed https://github.com/jgm/stmd/issues/61 before I noticed this topic. While I understand why one would want to keep the spec and sanitization separate, this issue needs to be addressed in some way - people will use the reference library and that will cause security issues. At the very least the spec and reference library documentation should warn that additional sanitization is required whenever the Markdown code comes from an untrusted source.
I tried sanitising CommonMark with the OWASP Java HTML Sanitizer. The Markdown failed to render because:
- The backtick character is escaped to its HTML entity, so code blocks no longer work
- The sanitizer automatically closed `<p>` tags (in unexpected places), affecting the layout
In addition to failing to render, the escaped HTML entities modify the original Markdown and make it much more difficult to read.
My preference is to sanitise the Markdown and persist that, rather than storing the Markdown, converting it to HTML and sanitizing the HTML each time it needs to be rendered.
In hindsight I think my approach of attempting to sanitise CommonMark with an HTML sanitizer was flawed. Sharing so others can hopefully avoid making the same mistake.
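The same class of failure can be reproduced in a few lines. This is only a sketch under stated assumptions: it uses Python's standard `html.escape` rather than the OWASP Java HTML Sanitizer, and `escape_source` is a hypothetical helper name. `html.escape` leaves backticks alone, so it does not reproduce the code-block problem exactly, but it shows how HTML-escaping Markdown source rewrites characters that Markdown itself treats as syntax.

```python
from html import escape


def escape_source(markdown_source):
    """HTML-escape raw Markdown *before* rendering (the flawed approach)."""
    return escape(markdown_source, quote=False)


# A blockquote marker and an autolink, both of which rely on < and >.
source = "> quoted text\n\nSee <http://example.com> for details."

escaped = escape_source(source)
print(escaped)

# The line no longer starts with the blockquote marker, so a Markdown
# renderer would treat it as an ordinary paragraph.
print(escaped.startswith(">"))
```

Once the source has been rewritten this way, the stored text is no longer the Markdown the user typed, which matches the readability problem described above.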