Is a reverse conversion (HTML to Markdown) possible?


#1

I’m currently investigating whether it’s possible to convert in the other direction: not Markdown to HTML, but HTML to Markdown.

I found this JavaScript implementation, but it doesn’t work very well.

So my question is:

Are you aware of a working HTML-to-Markdown converter?


#2

Pandoc works quite well. The command would be:

$ pandoc file.html -o file.md
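If you want to call pandoc from a program instead of the shell, one option is to pipe the HTML into pandoc’s stdin and read the Markdown back from its stdout. Below is a minimal Python sketch that assumes pandoc is installed and on the PATH. `html_to_markdown` and `PANDOC_CMD` are illustrative names, and `-f html -t markdown` simply state the input and output formats explicitly rather than relying on file extensions.

```python
import subprocess

# Assumption: pandoc is installed and on PATH. The command is a parameter so a
# different converter (or a stand-in for testing) can be swapped in.
PANDOC_CMD = ("pandoc", "-f", "html", "-t", "markdown")


def html_to_markdown(html: str, cmd: tuple[str, ...] = PANDOC_CMD) -> str:
    """Pipe an HTML string through an external converter and return its stdout."""
    result = subprocess.run(
        cmd,
        input=html,           # the HTML is written to the converter's stdin
        capture_output=True,  # the Markdown is read back from its stdout
        text=True,            # exchange str instead of bytes
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

For example, `html_to_markdown("<h1>Title</h1>")` should return a Markdown heading. The exact output (ATX vs. setext headings, for instance) depends on your pandoc version and on the flavour you pick with `-t`.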

#3

Thanks! You probably mean johnmacfarlane.net/pandoc, right?


#4

Exactly. (link fixed)


#5

I even found this one for calling it from C#.

Unfortunately, starting a process seems way too slow for what I want to use it for.

I’ll see whether I can port the relevant parts to C#…


#6

I don’t know your use case, but starting a process is really not as heavy as it used to be on old hardware (test it). If you work in Haskell, you can of course use Pandoc as a library as well, without starting a process.
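Measuring this is straightforward. Here is a rough Python sketch (the helper name `mean_startup_seconds` is hypothetical) that launches a command several times and reports the average wall-clock time per launch. It uses an idle Python interpreter as a baseline. To approximate the startup cost of the converter you actually plan to call, you could swap in something like `["pandoc", "--version"]`, which does very little work beyond starting up.

```python
import subprocess
import sys
import time


def mean_startup_seconds(cmd, runs: int = 10) -> float:
    """Launch `cmd` `runs` times and return the mean wall-clock seconds per launch."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    start = time.perf_counter()
    for _ in range(runs):
        # Output is discarded; only the time to launch, run and exit matters here.
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=True)
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    # Baseline: an interpreter that does nothing. Replace with the real command
    # (e.g. ["pandoc", "--version"]) to measure the tool you intend to wrap.
    baseline = [sys.executable, "-c", "pass"]
    print(f"{mean_startup_seconds(baseline):.4f} s per launch")
```

If the per-launch figure is small compared with how often you need to convert, a process call is likely good enough; if not, batching several documents per call reduces the overhead.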


#7

I just saw that it’s released under the GPL, so it wouldn’t work in my case anyway (I’d be using it in commercial software).

Too bad…


#8

kramdown works as well.

Porting either of those to C# would likely be a major task; wrapping them and calling them as external processes seems better.


#9

You can use a GPL tool with a process call in a commercial application, just not with a library call.


#10

Are you sure? My understanding is that I’m also not allowed to bundle it with my installer.


#11

I’m not sure about the installer, but calling a GPL program as a separate process is OK as far as I know (I’m no lawyer). See here: http://programmers.stackexchange.com/questions/50118/avoid-gpl-violation-by-moving-library-out-of-process


#12

I’ve also recently had success with Reverse Markdown for Ruby.

It attacked all of my 2008–2011 TinyMCE-based WYSIWYG content, which was garbled and horrendous, and output some really basic Markdown. It was about 99% spot on when parsed by Jekyll + kramdown.


#13

I use ittyeditor’s JavaScript implementation. It converts Markdown (one flavour of it, of course) to HTML and vice versa.


#14

I maintain a library written in C# to achieve this.

It is actively maintained and I happily accept contributions.


#15

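There is a CSS file called Markdown.css that you can apply to your HTML so that it renders looking like Markdown. Note that it only changes how the page is displayed; it doesn’t produce Markdown text.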


#16

Automate That Shit works pretty well, in my experience.