Implementation in C#

mlu · September 7, 2014, 3:31pm

Being inspired by this project I have started implementing a parser in C#. At this moment I have 103 tests succeeding.

I will be publishing on GitHub and NuGet, but before I do, I want to avoid any controversy with regard to naming.

I have tentatively called it CommonMarkSharp. Does anyone have an issue with that name?

Vilx_ · September 7, 2014, 3:51pm

Hah! You posted, like, 3 minutes before me! Can highly compatible implementations in other languages also become a part of CommonMark?

codinghorror · September 8, 2014, 4:56am

The guideline is this: you are free to use the name CommonMark in any way you like, provided you pass all tests in the current version of the spec.

mlu · September 8, 2014, 6:12am

Thanks for the clarification. Passing all the tests is certainly the goal.

Knagis · September 8, 2014, 7:23am

I have also started a port of the existing C code to C#. I have a similar level of completeness, but creating two different implementations does not seem to be the best idea. Any suggestions on how we could merge the efforts?

adiekmann · September 8, 2014, 2:59pm

You should look at this blog post:
http://blog.codefluententities.com/2014/09/05/commonmarkdown-for-c-using-codefluent-runtime/
This is not a C# implementation, but it explains how to call the JS CommonMark engine using the CodeFluent Script Engine. This approach is more maintainable because it uses the JavaScript implementation.

Knagis · September 8, 2014, 3:09pm

"how to call the JS CommonMark engine using the CodeFluent Script Engine"

The main problem with this approach would be performance.
The next problem is cross-platform compatibility since nowadays .NET libraries like these must run on Mono and .NET Framework and on Android/iOS/Windows.

An update on my progress: I have only 6 tests still failing, and 4 of those are just problems with the Perl test runner (it does not work properly with UTF on Windows under Strawberry Perl, and there is one issue with different newlines that a simple regex does not correct for some reason).

Knagis · September 8, 2014, 8:01pm

I finished the port to C#: https://github.com/Knagis/CommonMark.NET

It passes all the tests in the current specification.

The next steps are performance and memory profiling and some refactoring for the public interface (the syntax tree elements).

arthur_peka · September 9, 2014, 5:24pm

The code violates C# naming conventions and has some incomprehensible method names (e.g. cr). I understand that this is due to the C roots of the project. Will you accept pull requests fixing this?

Knagis · September 9, 2014, 6:11pm

Yes, the naming is this way only because it was ported directly from C code. The same is true for why there are so many ref parameters - those should go away as well.

I will of course accept pull requests to fix this although we should be careful to not change the structure too much and also note the original names in comments. The idea behind this is so that when the specification or tests change, we could implement the changes by looking at those made to the reference implementation, instead of reinventing the wheel.

Before the specification reaches version 1.0, I believe such a pragmatic approach would be better.

arthur_peka · September 9, 2014, 6:28pm

Fair enough, I’ll note methods’ original names in comments. Although usually it’s just removing the underscore and camel casing the name.
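
As a rough illustration of that rename rule, here is a small sketch that drops the underscores and capitalizes the first letter of each part. `NameConverter`, `ToPascalCase` and the sample name `add_line` are hypothetical and not taken from the CommonMark.NET code base; names like cr would still need a manual rename, since casing alone does not make them clearer.

```csharp
using System;
using System.Text;

public static class NameConverter
{
    // Converts a C-style name such as "add_line" to "AddLine".
    // Hypothetical helper for illustration only.
    public static string ToPascalCase(string cName)
    {
        var result = new StringBuilder(cName.Length);
        bool upperNext = true; // the first character is always capitalized

        foreach (char c in cName)
        {
            if (c == '_')
            {
                // Drop the underscore and capitalize whatever follows it.
                upperNext = true;
                continue;
            }

            result.Append(upperNext ? char.ToUpperInvariant(c) : c);
            upperNext = false;
        }

        return result.ToString();
    }
}
```

A renamed method would then carry the old name in a comment, for example `// Original C name: add_line` above `AddLine`.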

arthur_peka · September 9, 2014, 9:12pm

What about automated testing btw? I think that’s a rather high priority, to get some normal QA (perl script testing is so 90s).

Knagis · September 9, 2014, 9:21pm

Yes, I agree that unit tests within VS should be used instead of the Perl script (especially since the currently failing tests only fail because Perl on Windows and/or the Windows console does not properly handle Unicode symbols).
The one thing we should keep is that the source of the tests is the spec.txt file so that for new versions it is just a matter of replacing that one file.
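
A minimal sketch of how the examples could be pulled out of spec.txt for such unit tests. It assumes the layout used by the spec at the time of this thread: a line containing only "." opens an example, a second such line separates the Markdown from the expected HTML, and a third closes it. `SpecExample` and `SpecExampleReader` are hypothetical names, not part of CommonMark.NET.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public sealed class SpecExample
{
    public string Markdown { get; set; }
    public string Html { get; set; }
}

public static class SpecExampleReader
{
    // Assumed layout: ".", Markdown lines, ".", expected HTML lines, ".".
    public static List<SpecExample> Read(IEnumerable<string> lines)
    {
        var examples = new List<SpecExample>();
        var markdown = new StringBuilder();
        var html = new StringBuilder();
        int state = 0; // 0 = prose, 1 = Markdown source, 2 = expected HTML

        foreach (string line in lines)
        {
            if (line == ".")
            {
                state = (state + 1) % 3;
                if (state == 0)
                {
                    // Third dot: the example is complete.
                    examples.Add(new SpecExample
                    {
                        Markdown = markdown.ToString(),
                        Html = html.ToString()
                    });
                    markdown.Clear();
                    html.Clear();
                }
                continue;
            }

            if (state == 1) markdown.Append(line).Append('\n');
            else if (state == 2) html.Append(line).Append('\n');
        }

        return examples;
    }
}
```

Feeding it `System.IO.File.ReadLines("spec.txt")` would give one test case per example, so picking up a new spec version stays a matter of replacing that one file.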

Knagis · September 11, 2014, 7:30pm

The first beta version of the CommonMark.NET implementation has been published to NuGet: https://www.nuget.org/packages/CommonMark.NET/.

Edit: since then a new version (0.1.1) has been uploaded, and it is no longer marked as pre-release. A very simple benchmark shows that for processing the spec.txt document, CommonMark.NET now outperforms MarkdownSharp by 50%. Now just a little more to beat Markdown.Deep…

Knagis · September 13, 2014, 9:47pm

Yet another update - 0.1.3 has been optimized to perform just as fast as Markdown.Deep which is the fastest alternative on .NET currently (that I know of).

 CommonMark.NET 0.1.3      7 ms   11%     (current release for this library)
 CommonMark.NET 0.1.2     15 ms   23%
 CommonMark.NET 0.1.1     27 ms   42%
 CommonMark.NET 0.1.0     56 ms   84%     (first public release)
   MarkdownSharp 1.13     55 ms   84%     (MS and MD might not conform to 
     MarkdownDeep 1.5      7 ms   11%     CommonMark specification)
CommonMarkSharp 0.1.1     91 ms   140%
             Baseline     65 ms   100%    (used to compare results on different machines)

vitaly · October 9, 2014, 9:21pm

I have always found it interesting to compare modern JS JITs with statically typed languages.

./benchmark/benchmark.js spec
Selected samples: (1 of 26)
 > spec

Sample: spec.txt (109764 bytes)
 > current x 77.77 ops/sec ±1.44% (68 runs sampled)
 > marked-0.3.2 x 23.10 ops/sec ±0.66% (42 runs sampled)
 > stmd x 39.92 ops/sec ±4.07% (51 runs sampled)

13ms, lol. (mbp retina). current = remarkable in strict commonmark mode.

Knagis · October 15, 2014, 6:57pm

In case there is someone who is interested but is not following on GitHub:

- CommonMark.NET has had 5 more releases since Sep 14
- the performance is now even better (~4 ms where the 0.1.3 release had ~7 ms for spec.txt)
- the implementation has been updated to version 2 of the specification (updates to entity handling, URL encoding and emphasis parsing)

vitaly · October 16, 2014, 12:39am

@Knagis, what computer/CPU do you use for benchmarking?

Knagis · October 17, 2014, 6:05pm

Core i5-2500 @ 3.3 GHz. Using spec.txt version 1 (114 782 bytes).

Knagis · October 29, 2014, 8:11pm

Found a sundown wrapper for .NET, MoonShine, and added it to the comparison.

Unfortunately, it seems that most of the performance gain is lost, probably due to string interop, so it is actually slower (~2x) for very small inputs. But for parsing the 112 KB spec.txt it performs just 17% faster than CommonMark.NET (on average 3 ms vs 4 ms).

A better comparison would be progit.md (I concatenated all languages together, resulting in a 10 MB file), where sundown/MoonShine finishes in 277 ms while CommonMark.NET takes 534 ms. Still a very good result in my opinion (if only .NET would give access to string internals…).

progit.md    10,6 MB   (3 iterations)

             Library    Total   Each   vs Baseline
--------------------------------------------------
            Baseline    17329   5776   100%
      CommonMark.NET     1601    534   9%
     CommonMarkSharp     5960   1987   34%
       MarkdownSharp    16271   5424   94%
        MarkdownDeep     1080    360   6%
 MoonShine (sundown)      830    277   5%
--------------------------------------------------
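
For anyone who wants to reproduce the columns above, here is a minimal sketch of how Total, Each and vs Baseline relate to each other. The `SimpleBenchmark` class is hypothetical and is not the harness that produced these numbers; the warm-up run is an assumption, and what the Baseline row actually executes is not specified in this thread.

```csharp
using System;
using System.Diagnostics;

public static class SimpleBenchmark
{
    // Runs the action `iterations` times and returns the total elapsed milliseconds.
    public static long MeasureTotalMs(Action action, int iterations)
    {
        action(); // warm-up so JIT compilation is not measured (assumption)

        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    // "Each" column: average milliseconds per iteration, rounded.
    public static long Each(long totalMs, int iterations)
    {
        return (long)Math.Round((double)totalMs / iterations, MidpointRounding.AwayFromZero);
    }

    // "vs Baseline" column: per-iteration time as a percentage of the baseline.
    public static long PercentOfBaseline(long eachMs, long baselineEachMs)
    {
        return (long)Math.Round(100.0 * eachMs / baselineEachMs, MidpointRounding.AwayFromZero);
    }
}
```

With the CommonMark.NET row, `Each(1601, 3)` gives 534 and `PercentOfBaseline(534, 5776)` gives 9, which matches the table.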
