Naive-Bayes Text Classifier

Text classifiers automatically assign a category to a document. That category can indicate whether an email is spam, which queue an incoming Help Desk ticket belongs in, or the answer to any number of other categorization tasks. Naive-Bayes classifiers are built on Bayes' rule:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

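This rule gives the probability of event A given that event B has occurred. In text classification, A is a category and B is a document: the classifier estimates the probability that document B belongs to category A from the frequencies of the words in the document. The "naive" part is the assumption that, given the category, the words occur independently of one another, so the probability of the document given the category is the product of the individual word probabilities.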
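As a rough illustration of how word frequencies turn into a classification, here is a minimal sketch of a multinomial Naive-Bayes classifier. It is not the code from this project: the class name `NaiveBayesSketch`, the simple tokenizer, the add-one (Laplace) smoothing, and the choice to skip words never seen in training are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: multinomial Naive-Bayes with add-one smoothing.
public class NaiveBayesSketch
{
    // Number of training documents seen per category.
    private readonly Dictionary<string, int> docCounts = new Dictionary<string, int>();
    // Word frequencies per category.
    private readonly Dictionary<string, Dictionary<string, int>> wordCounts =
        new Dictionary<string, Dictionary<string, int>>();
    // Total number of words seen per category.
    private readonly Dictionary<string, int> totalWords = new Dictionary<string, int>();
    // Every distinct word seen during training.
    private readonly HashSet<string> vocabulary = new HashSet<string>();
    private int totalDocs;

    // Assumed tokenizer: lowercase, split on whitespace and basic punctuation.
    private static IEnumerable<string> Tokenize(string text)
    {
        return text.ToLowerInvariant().Split(
            new[] { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?' },
            StringSplitOptions.RemoveEmptyEntries);
    }

    public void Train(string category, string text)
    {
        if (!docCounts.ContainsKey(category))
        {
            docCounts[category] = 0;
            totalWords[category] = 0;
            wordCounts[category] = new Dictionary<string, int>();
        }
        docCounts[category]++;
        totalDocs++;
        foreach (string word in Tokenize(text))
        {
            wordCounts[category].TryGetValue(word, out int count);
            wordCounts[category][word] = count + 1;
            totalWords[category]++;
            vocabulary.Add(word);
        }
    }

    public string Classify(string text)
    {
        if (totalDocs == 0)
            throw new InvalidOperationException("Train the classifier first.");

        List<string> words = Tokenize(text).ToList();
        string best = string.Empty;
        double bestScore = double.NegativeInfinity;
        foreach (string category in docCounts.Keys)
        {
            // Start from the log prior, log P(A).
            double score = Math.Log((double)docCounts[category] / totalDocs);
            foreach (string word in words)
            {
                // Words never seen in training carry no evidence here.
                if (!vocabulary.Contains(word)) continue;
                wordCounts[category].TryGetValue(word, out int count);
                // Add log P(word | A) with add-one smoothing.
                score += Math.Log((count + 1.0) /
                                  (totalWords[category] + vocabulary.Count));
            }
            if (score > bestScore) { bestScore = score; best = category; }
        }
        return best;
    }
}
```

The sketch sums log probabilities instead of multiplying raw probabilities, which avoids numeric underflow on long documents. It also drops the denominator P(B), since that term is the same for every category and does not change which one scores highest.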

In this project, I developed a Naive-Bayes text classifier in C#. The classifier was tested on documents from Tom Mitchell’s repository of newsgroup articles, available at http://www.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html