An example of a probabilistic classifier. Commonly used in spam filters (classifies an e-mail as spam if the probability of spam is higher than the probability of not spam)
To model this, it uses Bayes rule:

$$p(\text{spam} \mid \text{words}) = \frac{p(\text{words} \mid \text{spam}) \, p(\text{spam})}{p(\text{words})}$$

Where
- $p(\text{spam})$ is the marginal probability that an e-mail is spam
- $p(\text{words})$ is the marginal probability that an e-mail has the set of words
    - Hard to approximate (lots of ways to combine words)
- $p(\text{words} \mid \text{spam})$ is the conditional probability that a spam e-mail has the words
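A minimal numeric sketch of the Bayes-rule computation for one e-mail (all probabilities below are made-up values, just to show the arithmetic):

```python
# Toy Bayes-rule computation for one e-mail; all numbers are invented.
p_spam = 0.3                     # p(spam): marginal probability an e-mail is spam
p_words_given_spam = 0.004       # p(words | spam)
p_words_given_not_spam = 0.0001  # p(words | not spam)

# Expand p(words) with the law of total probability over the two classes.
p_words = p_words_given_spam * p_spam + p_words_given_not_spam * (1 - p_spam)

p_spam_given_words = p_words_given_spam * p_spam / p_words
print(f"p(spam | words) = {p_spam_given_words:.3f}")  # ~0.945, so classify as spam
```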
Optimizations
Denominator doesn’t matter
We can actually reframe this to avoid calculating $p(\text{words})$, as Naive Bayes just returns spam if

$$p(\text{words} \mid \text{spam}) \, p(\text{spam}) > p(\text{words} \mid \text{not spam}) \, p(\text{not spam})$$

Roughly, the denominator doesn't matter: both posteriors are divided by the same $p(\text{words})$, so the comparison is unchanged.
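A sketch of the same decision without the denominator (same made-up numbers as above):

```python
# Both posteriors share the denominator p(words), so comparing the
# unnormalized joint probabilities gives the same decision.
p_spam = 0.3
p_words_given_spam = 0.004
p_words_given_not_spam = 0.0001

score_spam = p_words_given_spam * p_spam                # p(words | spam) p(spam)
score_not_spam = p_words_given_not_spam * (1 - p_spam)  # p(words | not spam) p(not spam)

label = "spam" if score_spam > score_not_spam else "not spam"
print(label)  # spam
```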
Conditional Independence Assumption
Additionally, we assume that all features (words) are conditionally independent given the label, so we can decompose the likelihood into a product of per-word probabilities:

$$p(\text{words} \mid \text{spam}) \approx \prod_{j=1}^{d} p(\text{word}_j \mid \text{spam})$$
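A sketch of the decomposition, with hypothetical per-word probabilities (the dictionary values are assumptions, not estimates from real data); logs are summed to avoid underflow on long e-mails:

```python
import math

# Hypothetical per-word conditionals p(word_j present | spam).
p_word_given_spam = {"vicodin": 0.8, "lactase": 0.1, "meeting": 0.05}

email_words = ["vicodin", "lactase"]

# Conditional independence turns the joint likelihood into a product;
# summing log-probabilities avoids floating-point underflow.
log_likelihood = sum(math.log(p_word_given_spam[w]) for w in email_words)
print(math.exp(log_likelihood))  # 0.8 * 0.1 = 0.08 ~ p(words | spam)
```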
Laplace Smoothing
If we have no spam messages with lactase, then $p(\text{lactase} \mid \text{spam}) = 0$, so spam messages with lactase automatically get through!
Our estimate of $p(\text{lactase} \mid \text{spam})$ is

$$\frac{\#\,\text{spam messages with lactase}}{\#\,\text{spam messages}}$$
We can add $\beta$ to the numerator and $\beta k$ to the denominator, which effectively adds fake examples: $\beta$ for each of the $k$ possible values the feature can take ($k = 2$ for a binary word-present/word-absent feature)
So for our binary spam classifier (with $\beta = 1$):

$$p(\text{lactase} \mid \text{spam}) = \frac{\#\,\text{spam messages with lactase} + 1}{\#\,\text{spam messages} + 2}$$
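A sketch of the smoothed estimate (the function name and counts are hypothetical):

```python
def smoothed_estimate(n_spam_with_word, n_spam, beta=1.0, k=2):
    """Laplace-smoothed estimate of p(word | spam): beta fake examples
    are added for each of the k possible feature values (k = 2 for a
    binary word-present/word-absent feature)."""
    return (n_spam_with_word + beta) / (n_spam + beta * k)

# With 0 of 100 spam messages containing "lactase", the estimate is
# 1/102 ~ 0.0098 rather than 0, so such e-mails can still be caught.
print(smoothed_estimate(0, 100))
```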