Style Corpus Reference?

mofosyne · September 27, 2014, 11:33am

Should we compile a corpus of the different writing styles found in existing historical text archives, perhaps sorted by application or field? Kind of like babelmark, but for assessing how ordinary people write when there is no processor involved.

E.g. it might be interesting to see whether there is a difference between how a filmmaker would write a film script in plain text, how a news reporter might type up their documents, and how a police officer might write a report.

Where would this help? It would help us work out how minimal we can make the core processor, and which mix of extensions is best bundled for various fields, e.g. CommonMark (core) + (coder's bundle).

This might require manual work in classifying stylistic choices.
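
To make the idea a bit more concrete, here is a rough Python sketch of how a first automated pass over such a corpus could look before any manual classification. Everything in it is an assumption for illustration: the marker names, the regular expressions, and the `(field, text)` sample format are made up for this post and are not taken from CommonMark or babelmark.

```python
import re
from collections import Counter

# Hypothetical stylistic markers. The patterns are illustrative guesses,
# not a specification of any markup dialect.
MARKERS = {
    "hash_heading": re.compile(r"^#{1,6} \S", re.MULTILINE),
    "underlined_heading": re.compile(r"^[^\n]+\n(?:={3,}|-{3,})[ \t]*$", re.MULTILINE),
    "caps_heading": re.compile(r"^[A-Z][A-Z0-9 .\-]{3,}$", re.MULTILINE),
    "dash_bullet": re.compile(r"^[ \t]*- \S", re.MULTILINE),
    "star_bullet": re.compile(r"^[ \t]*\* \S", re.MULTILINE),
    "asterisk_emphasis": re.compile(r"\*\w[^*\n]*\*"),
}


def tally_markers(text):
    """Count how often each hypothetical marker appears in one document."""
    return Counter({name: len(p.findall(text)) for name, p in MARKERS.items()})


def tally_by_field(samples):
    """Sum marker counts per field from an iterable of (field, text) pairs."""
    totals = {}
    for field, text in samples:
        totals.setdefault(field, Counter()).update(tally_markers(text))
    return totals
```

Feeding it a list such as `[("film", script_text), ("code", readme_text)]` gives one counter per field, which could hint at which conventions belong in the core and which in a field-specific bundle. The regexes will misfire on plenty of real text, so the manual classification step would still be needed to check the results.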