Perhaps even in the 'release early, release often' school of thought there's such a thing as 'too early.' If so, this project is an example of it.
Marko is a simple toolset that builds Markov chain databases from a corpus (or two) of text and then lets you compare unknown texts against those databases. For any two marko databases, you can calculate the probability that the unknown text is related to one rather than the other.
Possible applications include intelligent mail filtering, plagiarism detection and historical research.
Marko started life as an implementation of Paul Graham's spam filter, but it quickly generalized into a tool for comparing the affinity of a piece of text against two other bodies of text.
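To make the idea of comparing an unknown text against two databases concrete, here is a minimal sketch in Python. It is not marko's actual code or file format: the function names (`build_chain`, `affinity`), the first-order word model, the add-one smoothing and the equal-priors assumption are all illustrative choices made for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_chain(text):
    """Count first-order word transitions: chain[prev][next] -> count."""
    chain = defaultdict(Counter)
    words = tokenize(text)
    for prev, nxt in zip(words, words[1:]):
        chain[prev][nxt] += 1
    return chain


def log_likelihood(chain, words, vocab_size):
    """Sum of add-one-smoothed log transition probabilities (an assumption)."""
    total = 0.0
    for prev, nxt in zip(words, words[1:]):
        followers = chain.get(prev, Counter())
        count = followers[nxt]  # a Counter returns 0 for unseen words
        total += math.log((count + 1) / (sum(followers.values()) + vocab_size))
    return total


def affinity(chain_a, chain_b, unknown_text):
    """Probability, assuming equal priors, that the text fits A rather than B."""
    words = tokenize(unknown_text)
    vocab = set(words)
    for chain in (chain_a, chain_b):
        for prev, followers in chain.items():
            vocab.add(prev)
            vocab.update(followers)
    size = max(len(vocab), 1)
    diff = log_likelihood(chain_b, words, size) - log_likelihood(chain_a, words, size)
    # Logistic of the log-likelihood difference, clamped to avoid overflow.
    diff = max(min(diff, 700.0), -700.0)
    return 1.0 / (1.0 + math.exp(diff))
```

Calling `affinity(build_chain(corpus_a), build_chain(corpus_b), unknown)` returns a value between 0 and 1. Values near 1 suggest the unknown text looks more like the first corpus, values near 0 the second, and a text with no word pairs at all scores exactly 0.5.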
A little more detail can be found in the doc/README file in the distribution. Or you can read the CHANGELOG.
Well, it isn't very easy to use marko right now. It serves mostly as a proof of concept for whether such a comparison is useful at all.
Each package has a GPG signature signed by the distributor, Caskey Dickson. You can get a copy of my key for verification here. For those without GPG, below are the checksums of the distribution. To help with verification, they are also available in the file checksums.txt in the download directory.
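For those checking a download by checksum rather than by GPG signature, the comparison could be sketched as follows in Python. This page does not say which hash algorithm the checksums use or how checksums.txt is laid out, so both are assumptions here: the algorithm is passed in by the caller, and each line is assumed to be a hex digest followed by a file name. The function names are hypothetical.

```python
import hashlib


def file_digest(path, algorithm, chunk_size=65536):
    """Hex digest of a file, read in chunks so large archives stay out of memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksums(text):
    """Parse lines of the assumed form '<hex digest>  <file name>'."""
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            digest, name = parts
            # Some checksum tools prefix the name with '*' for binary mode.
            entries[name.lstrip("*")] = digest.lower()
    return entries


def matches_checksum(path, name, checksums, algorithm):
    """True if the file at path hashes to the digest listed for name."""
    expected = checksums.get(name)
    return expected is not None and file_digest(path, algorithm) == expected
```

A checksum only guards against a corrupted download. The GPG signature is still the stronger check, since it also ties the package to the distributor's key.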
You can contact the author, Caskey, via email, or use the IRC channel.
The marko IRC channel is #marko on irc.freenode.net:6667.
http://www.technocage.com/~caskey/marko
