Statistical Analysis of Markdown Usage?

There are a few topics that focus on whether we should keep/add features available in different variants/flavors of Markdown. I’m wondering if we can/should do a statistical analysis of how Markdown is being used across major websites, to see what impact adding features from common flavors, and/or removing features of the original Markdown, would have.
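As a rough illustration of the kind of measurement this could involve, here is a minimal Python sketch that reports the fraction of documents in a corpus that use a few constructs at least once. The feature names and regular expressions are hypothetical heuristics chosen for this example, not a Markdown parser. They will misfire inside code blocks, inline code, and other edge cases, so a real study would be better off running a spec-compliant parser over each document and counting node types in the resulting syntax tree.

```python
import re
from collections import Counter

# Hypothetical, regex-based heuristics for a few constructs. They are NOT a
# Markdown parser and will misfire on edge cases (e.g. text inside code blocks).
FEATURE_PATTERNS = {
    # A line that opens or closes a fenced code block: 3+ backticks or tildes,
    # indented by at most three spaces.
    "fenced_code": re.compile(r"^ {0,3}(?:`{3,}|~{3,})", re.MULTILINE),
    # Underscore emphasis like _word_ or __word__, but not the underscores
    # inside identifiers such as snake_case_name.
    "underscore_emphasis": re.compile(r"(?<!\w)_{1,2}[^_\s][^_\n]*?(?<!\s)_{1,2}(?!\w)"),
    # Asterisk emphasis like *word* or **word**.
    "asterisk_emphasis": re.compile(r"(?<![\w*])\*{1,2}[^*\s][^*\n]*?(?<!\s)\*{1,2}(?![\w*])"),
}


def feature_shares(documents):
    """Return, for each feature, the fraction of documents using it at least once."""
    docs = list(documents)
    if not docs:
        return {name: 0.0 for name in FEATURE_PATTERNS}
    counts = Counter()
    for text in docs:
        for name, pattern in FEATURE_PATTERNS.items():
            # Count each document at most once per feature.
            if pattern.search(text):
                counts[name] += 1
    return {name: counts[name] / len(docs) for name in FEATURE_PATTERNS}


if __name__ == "__main__":
    sample = [
        "Some _emphasis_ next to a snake_case_name.",
        "~~~\nprint('hi')\n~~~",
        "Plain **bold** text.",
    ]
    print(feature_shares(sample))
```

Because each document counts at most once per feature, the result reads as “share of documents that use X”, which is arguably closer to the adoption question than raw occurrence counts would be.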

That would leave out the many people who use Markdown not only on websites but also for personal, offline documents. I think any analysis should focus on implementations rather than websites, because it’s difficult, if not impossible, to gauge how popular offline usage is. Nevertheless, an analysis across websites could still be meaningful, since web usage may shape, or at least reflect, how Markdown is used offline.

Either way, removing underscore emphasis is unlikely to happen. Markdown generally adopts conventions people already use, and underscores for emphasis are still common. But I do understand that’s not exactly the point here.

John’s babelmark2 is probably related here.
See also: Tables vs fenced code blocks, when is something “common”?


I know Vicent mentioned that he was granted permission to use the entire corpus of GitHub documents in Markdown as a test set for his performance improvements to CommonMark’s C reference implementation.

We can also get every Stack Exchange question and answer, since all of that content is Creative Commons licensed.

If the goal is to “prove” that tables are more needed than preformatted code, I suspect you will have a hard time doing that: Markdown has never supported tables, so whatever hacks people used to format tables cannot be detected. (GitHub does support tables, though, so detection might be possible in that particular case.)

(And also, based on my experience in Discourse, the need for a table in social discussion is basically nil. I suspect embedding a table or mini spreadsheet would be more effective.)
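On the GitHub point: GitHub Flavored Markdown tables are fairly mechanical to detect, because the GFM table extension requires a header row followed by a delimiter row of hyphen cells (optionally with colons for alignment), with the same number of cells in both. The Python sketch below approximates that rule. The function names are hypothetical, and it deliberately ignores escaped pipes, pipes inside code spans, and fenced code blocks, so a real survey should prefer a GFM-compliant parser and count table nodes.

```python
import re

# One delimiter cell: optional colon, one or more hyphens, optional colon.
_DELIM_CELL = re.compile(r"^\s*:?-+:?\s*$")


def _split_cells(line):
    """Split a row on pipes, dropping one optional leading/trailing pipe.

    Simplification: escaped pipes and pipes inside code spans are not handled.
    """
    row = line.strip()
    if row.startswith("|"):
        row = row[1:]
    if row.endswith("|"):
        row = row[:-1]
    return row.split("|")


def looks_like_gfm_table(text):
    """Heuristically detect a GFM-style pipe table: a non-empty header row
    followed by a delimiter row with the same number of cells."""
    lines = text.splitlines()
    for header, delim in zip(lines, lines[1:]):
        # Require a pipe in the delimiter row so that a plain "---" under a
        # line (a setext heading underline) is not mistaken for a table.
        if "|" not in delim:
            continue
        delim_cells = _split_cells(delim)
        if not all(_DELIM_CELL.match(cell) for cell in delim_cells):
            continue
        if header.strip() and len(_split_cells(header)) == len(delim_cells):
            return True
    return False
```

Run over a corpus, the fraction of documents for which `looks_like_gfm_table` returns `True` would give a rough, lower-bound-style signal of table usage on GitHub, subject to the simplifications above.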

I think it would be fantastic if we could do a statistical analysis of whatever usage data we can get our hands on, including Stack Exchange and GitHub. Markdown is used more broadly than just on “social” sites, so I think it’d be great if we could expand our analysis beyond the “social” aspect.

For instance, Markdown is used quite a bit on blogs (which aren’t entirely social), and we could look to WordPress.com for usage data. WordPress.com has Markdown support: an implementation of MarkdownExtra, which famously supports tables and footnotes and appears to be under somewhat active development. Apparently 43.7 million posts are made on WordPress.com monthly, and while I don’t know how many of them use the “extra” features, we could ask. A search of WordPress.org’s plugin repository shows that plugins supporting Markdown have hundreds of thousands of downloads. Again, there’s no way to know how those downloads are used, but we could make some rough estimates.

As an aside: I’m not trying to “prove” the need for tables or footnotes (although I personally find them valuable). Rather, I’m hoping that we can base our implementation on common real-world usage and use data to make our decisions about what to include. As you note, fenced code blocks were not in the original implementation but are incredibly useful. Just because something wasn’t imagined when Markdown was created doesn’t mean that a great solution hasn’t been found for many users.

EDIT: MarkdownExtra is discussed more here.