Cross Site Scripting issue in Standard Markdown example at try.standardmarkdown.com

Hi,
I noticed that the try.commonmark.com implementation is not sanitizing links with the javascript: scheme. The following PoC link will result in an XSS:
[xxx](javascript:alert(1))
This should probably be fixed so no one is actually using the same insecure approach.
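
A minimal sketch of the kind of check that would block this PoC, assuming the renderer can call a helper on every link destination before writing the href. The helper name `is_safe_link` and the scheme allowlist are illustrative assumptions, not anything stmd provides:

```python
from urllib.parse import urlsplit

# Assumption: illustrative allowlist; choose the schemes your site actually needs.
SAFE_SCHEMES = {"http", "https", "mailto"}


def is_safe_link(url: str) -> bool:
    """Return True if the link destination is relative or uses an allowed scheme."""
    # Browsers ignore whitespace and control characters inside a scheme,
    # so drop them before parsing ("java\nscript:" is still javascript:).
    cleaned = "".join(ch for ch in url if ch > " " and ch != "\x7f")
    try:
        scheme = urlsplit(cleaned).scheme.lower()
    except ValueError:
        # Malformed URLs are rejected rather than guessed at.
        return False
    return scheme == "" or scheme in SAFE_SCHEMES


if __name__ == "__main__":
    for candidate in ("javascript:alert(1)", "https://example.com", "/docs/page"):
        print(candidate, is_safe_link(candidate))
```

An allowlist is used here rather than a blocklist of known-bad schemes, so anything not explicitly permitted (for example data: or vbscript:) is rejected as well.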

+++ fransrosen [Sep 03 14 20:16 ]:

I noticed that the try.standardmarkdown.com implementation is not
sanitizing links with the javascript: scheme. The following PoC link will
result in an XSS:
[xxx](javascript:alert(1))
This should probably be fixed so no one is actually using the same
insecure approach.

I don’t think it’s a problem for the online dingus, since you’re only
going to get this link if you yourself type the corresponding markdown
in the text box.

There’s a broader question about whether a stmd implementation should
include sanitization of potentially unsafe HTML. I do this in my
Haskell implementation, cheapskate. But it’s probably safer in most
cases to run the entire output of stmd through a stock HTML sanitizer.
That way the people who maintain the sanitization library can worry
about keeping it up to date, and we don’t need to duplicate efforts.
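
To make the "render first, then sanitize the output" pipeline concrete, here is a toy allowlist pass written with only the Python standard library. It is a sketch of the shape of the approach, not a replacement for a maintained sanitizer library; the tag list, attribute list, and all names are assumptions made up for illustration:

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Assumptions: illustrative allowlists, not part of stmd or any spec.
ALLOWED_TAGS = {"p", "em", "strong", "code", "pre", "a", "ul", "ol", "li", "blockquote"}
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_SCHEMES = {"", "http", "https", "mailto"}
DROP_WITH_CONTENT = {"script", "style"}


def _safe_url(url: str) -> bool:
    """Accept relative URLs and URLs whose scheme is on the allowlist."""
    cleaned = "".join(ch for ch in url if ch > " ")
    try:
        return urlsplit(cleaned).scheme.lower() in SAFE_SCHEMES
    except ValueError:
        return False


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tags are dropped, their text content is kept
        kept = []
        for name, value in attrs:
            value = value or ""
            if name not in ALLOWED_ATTRS.get(tag, ()):
                continue
            if name == "href" and not _safe_url(value):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html_text: str) -> str:
    """Sanitize HTML that a Markdown renderer has already produced."""
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

The point of the sketch is where the step sits: it runs on the renderer's HTML output, so the Markdown parser itself stays free of sanitization logic.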

Thoughts about this would be welcome.

True, I didn’t worry about the actual page it was on, but more of the way it was implemented as a use case for others to adapt.

I think there’s still a belief that using Markdown is a replacement for allowing real HTML, not only for convenience (easier tags, readability) but also because it supposedly lets only certain tags pass through and therefore acts as a sort of sanitizer.
It is not the first time I’ve seen insecure Markdown implementations, which is why I wanted to discuss it. I think mentioning that a sanitizer is also needed would be a good thing, so that people do not believe this is a one-stop shop for allowing user-generated content through.
However, there are secure implementations of Markdown, so maybe this doesn’t affect all parsers, only the stmd.js version?

It allows approximately every HTML tag to pass through, and even tags which don’t exist in HTML. The Markdown spec does not include anything about sanitizing.

This is because those implementations already apply some sanitizing for you.

I have always been and continue to be of the opinion that it’s not Markdown’s job to sanitize anything. Markdown should allow you to create anything you want, including script tags. I’m writing my own blog in Markdown, and I want to be able to write any JavaScript I want there.

Now, for places where Markdown is being used for user-generated content, sanitizing is necessary. But everyone has different requirements here. We could provide some sort of default sanitizer alongside (Stack Exchange does this with PageDown), but in the end everyone will have their own list of things they want to allow or prevent.

Thanks for the input. Your points make sense. I missed that it passes through all HTML as well; I just tried the one thing people often forget to sanitize in Markdown. That would probably make devs combine a sanitizer with Standard Markdown, since nothing at all is done in regard to sanitization. I wonder why this keeps happening though. If people ran a sanitizer after Markdown those issues would be mitigated, but that doesn’t seem to be the case. Ah, well. Thanks for clarifying!

I filed https://github.com/jgm/stmd/issues/61 before I noticed this topic. While I understand why one would want to keep the spec and sanitization separate, this issue needs to be addressed in some way - people will use the reference library and that will cause security issues. At the very least the spec and reference library documentation should warn that additional sanitization is required whenever the Markdown code comes from an untrusted source.

I tried sanitising commonmark with the OWASP Java HTML Sanitizer. The markdown failed to render because:

  • ` is escaped to &#96;, so code blocks no longer work
  • The sanitizer automatically closed <p> tags (in unexpected places), affecting the layout

In addition to failing to render, all the escaped HTML entities modify the original markdown and make it much more difficult to read.
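
A small illustration of the effect, using Python's standard `html.escape` as a stand-in for an HTML sanitizer applied to Markdown source (an assumption for the demo; the OWASP Java sanitizer escapes a different set of characters, but the failure mode is the same). The function name `escape_source` is hypothetical:

```python
from html import escape


def escape_source(markdown_text: str) -> str:
    """Apply HTML escaping directly to Markdown source (the flawed approach)."""
    return escape(markdown_text, quote=False)


if __name__ == "__main__":
    source = "> quoted reply\n\nUse `a < b` here."
    print(escape_source(source))
```

After escaping, the first line starts with `&gt;` instead of `>`, so it is no longer a blockquote marker, and the code span now contains a literal `&lt;`, which a Markdown renderer will show as-is because entities are not interpreted inside code.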

My preference is to sanitise the markdown and persist that, rather than storing the raw markdown, converting it to HTML, and sanitizing the HTML each time it needs to be rendered.

In hindsight I think my approach of attempting to sanitise commonmark with an HTML sanitizer was flawed. Sharing so others can hopefully avoid making the same mistake.

See the JS-YAML demo (a YAML parser for JavaScript). It allows passing content via a link (sharing data with zero storage). If someone does something similar for Markdown, users will be in trouble.

IMHO, it is worth making dangerous modes like HTML blocks and HTML tags opt-in. At least for the JS implementation.
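
A rough sketch of what opt-in could look like, assuming a renderer that routes every raw HTML block or inline tag through a single function. `RenderOptions`, `allow_raw_html`, and `render_raw_html` are hypothetical names, not options of any existing implementation:

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class RenderOptions:
    # Hypothetical flag: raw HTML passes through only when explicitly enabled.
    allow_raw_html: bool = False


def render_raw_html(fragment: str, options: RenderOptions = RenderOptions()) -> str:
    """Emit a raw HTML fragment from the Markdown source, escaped by default."""
    if options.allow_raw_html:
        return fragment  # trusted input, e.g. the author's own blog
    return escape(fragment, quote=False)  # shown as text, never executed


if __name__ == "__main__":
    print(render_raw_html("<script>alert(1)</script>"))
    print(render_raw_html("<script>alert(1)</script>", RenderOptions(allow_raw_html=True)))
```

With a safe default, someone writing their own blog can still turn raw HTML on, while a site that forgets to configure anything does not pass user-supplied tags through.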