Is a reverse conversion (HTML to Markdown) possible?

I’m currently investigating whether the conversion can be done in reverse: not Markdown to HTML, but HTML to Markdown.

I found this JavaScript implementation, which doesn’t work very well.

So my question is:

Are you aware of a working HTML-to-Markdown converter?

Pandoc works quite well. The command would be:

$ pandoc file.html -o file.md

Thanks! You probably mean johnmacfarlane.net/pandoc - right?

Exactly. (link fixed)

I even found this one to make it work in C#.

Unfortunately, starting a process seems way too slow for what I want to use it for.

I’ll try to see whether I can migrate the related parts to C#…

I don’t know your use case, but starting a process is really not as heavy as it used to be on old hardware (test it). If you work in Haskell, you can of course use Pandoc as a library as well, without starting a process.
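
One way to test this is to time process startup directly. The sketch below, in Python, measures the average wall-clock cost of spawning a short-lived child process. The function name `average_spawn_seconds` and the stand-in command are illustrative assumptions; on a machine with the converter installed you would substitute its real command (for example `["pandoc", "--version"]`).

```python
import subprocess
import sys
import time


def average_spawn_seconds(command, runs=10):
    """Return the mean wall-clock time in seconds to start `command` and wait for it to exit."""
    start = time.perf_counter()
    for _ in range(runs):
        # check=True raises CalledProcessError if the child exits with a non-zero status.
        subprocess.run(
            command,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    # Stand-in command: a Python interpreter that exits immediately.
    # Replace it with e.g. ["pandoc", "--version"] to measure the real tool.
    stand_in = [sys.executable, "-c", "pass"]
    print(f"{average_spawn_seconds(stand_in) * 1000:.1f} ms per process start")
```

The number only covers startup and exit, not conversion time, so it gives a lower bound on the per-call cost.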

Just saw that it is released under the GPL, so it would not work in my case anyway (using it in commercial software).

Too bad…

kramdown works as well.

Porting either of those to C# might be a major task; wrapping them just to call them seems better.
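
A minimal sketch of such a wrapper, written in Python rather than C# and assuming a `pandoc` executable is on the PATH. The helper names `pandoc_command` and `html_to_markdown` are hypothetical; the HTML is passed on standard input and the Markdown is read back from standard output, so no temporary files are needed.

```python
import subprocess


def pandoc_command(executable="pandoc"):
    """Build the argument list for an HTML-to-Markdown conversion via stdin/stdout."""
    return [executable, "--from=html", "--to=markdown"]


def html_to_markdown(html, executable="pandoc"):
    """Pipe `html` through the converter process and return its stdout as text."""
    try:
        result = subprocess.run(
            pandoc_command(executable),
            input=html,
            capture_output=True,
            encoding="utf-8",  # text mode with explicit UTF-8 for both pipes
            check=True,        # raise CalledProcessError on a non-zero exit status
        )
    except FileNotFoundError as exc:
        # Raised when the executable cannot be located or started.
        raise RuntimeError(f"could not start {executable!r}") from exc
    return result.stdout


if __name__ == "__main__":
    print(html_to_markdown("<h1>Title</h1><p>Some <em>text</em>.</p>"))
```

Each call starts a new process, so whether this is fast enough depends on how many conversions you need to run.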

You can use a GPL tool with a process call in a commercial application, just not with a library call.

Are you sure? As I understand it, I am also not allowed to bundle it with my installer.

I’m not sure about the installer, but calling a GPL program as a separate process is OK as far as I know (I’m no lawyer). See here: http://programmers.stackexchange.com/questions/50118/avoid-gpl-violation-by-moving-library-out-of-process

I’ve also recently had success with Reverse Markdown for Ruby.

It attacked all of my 2008-2011 TinyMCE-based WYSIWYG content, which was garbled and horrendous, and output some really basic Markdown. It was about 99% spot on when parsed by Jekyll + kramdown.

I use ittyeditor’s JavaScript implementation. It does both MD (one flavour thereof, of course) to HTML and vice versa.

I maintain a library written in C# to achieve this.

It is actively maintained and I happily accept contributions.

There is this CSS file called Markdown.css that you can apply to your HTML to make it render like Markdown.

Automate That Shit works pretty well in my experience.