Naive Bayes is an example of a probabilistic classifier, commonly used in spam filters (it classifies an e-mail as spam if the probability of spam is higher than the probability of not spam).

To model this, it uses Bayes rule:

$$p(\text{spam} \mid \text{words}) = \frac{p(\text{words} \mid \text{spam})\, p(\text{spam})}{p(\text{words})}$$

Where

  • $p(\text{spam})$ is the marginal probability that an e-mail is spam
  • $p(\text{words})$ is the marginal probability that an e-mail has that set of words
    • Hard to approximate directly (lots of ways to combine words)
  • $p(\text{words} \mid \text{spam})$ is the conditional probability that a spam e-mail has the words
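To make the pieces concrete, here is a minimal numeric sketch of the rule above; all of the probabilities are made-up values for illustration, not estimates from real data. Note how $p(\text{words})$ can come from the law of total probability rather than a direct estimate:

```python
# Minimal sketch of Bayes rule for spam filtering.
# All probabilities below are made-up values for illustration.

p_spam = 0.4                # p(spam): marginal probability an e-mail is spam
p_words_given_spam = 0.05   # p(words | spam)
p_words_given_ham = 0.001   # p(words | not spam)

# p(words) via the law of total probability, sidestepping the
# hard-to-approximate direct estimate:
p_words = p_words_given_spam * p_spam + p_words_given_ham * (1 - p_spam)

# Bayes rule: p(spam | words)
p_spam_given_words = p_words_given_spam * p_spam / p_words
print(f"p(spam | words) = {p_spam_given_words:.3f}")  # ~0.971
```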

Optimizations

Denominator doesn’t matter

We can actually reframe this to avoid calculating $p(\text{words})$, as Naive Bayes just returns spam if

$$\frac{p(\text{words} \mid \text{spam})\, p(\text{spam})}{p(\text{words})} > \frac{p(\text{words} \mid \text{not spam})\, p(\text{not spam})}{p(\text{words})}$$

Both sides are divided by the same $p(\text{words})$, so it cancels. Roughly, the denominator doesn’t matter:

$$p(\text{spam} \mid \text{words}) \propto p(\text{words} \mid \text{spam})\, p(\text{spam})$$
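As a sketch (again with assumed toy probabilities), the decision only needs the two unnormalized scores:

```python
# Sketch: decide spam vs. not spam from unnormalized posteriors,
# never computing p(words). Toy values, assumed for illustration.

p_spam, p_ham = 0.4, 0.6
p_words_given_spam, p_words_given_ham = 0.05, 0.001

score_spam = p_words_given_spam * p_spam  # proportional to p(spam | words)
score_ham = p_words_given_ham * p_ham     # proportional to p(not spam | words)

# Comparing these scores gives the same decision as comparing the full
# posteriors, since both would be divided by the identical p(words).
print("spam" if score_spam > score_ham else "not spam")  # spam
```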

Conditional Independence Assumption

Additionally, we assume that all features (words) are conditionally independent given the label, so we can decompose the likelihood into a product of per-word terms:

$$p(\text{words} \mid \text{spam}) = \prod_{j=1}^{d} p(\text{word}_j \mid \text{spam})$$
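A short sketch of the decomposition, using a hypothetical table of per-word conditionals (the probabilities are assumptions, not learned values); in practice the product is computed as a sum of logs to avoid underflow:

```python
import math

# Sketch of the conditional independence decomposition.
# word_probs_given_spam is a hypothetical table of p(word_j | spam)
# values, assumed here rather than estimated from data.
word_probs_given_spam = {"lottery": 0.30, "winner": 0.20, "hello": 0.05}

# p(words | spam) decomposes into a product over individual words:
p_words_given_spam = 1.0
for p in word_probs_given_spam.values():
    p_words_given_spam *= p
print(p_words_given_spam)  # 0.30 * 0.20 * 0.05 = 0.003

# With many words this product underflows, so in practice we sum logs:
log_p = sum(math.log(p) for p in word_probs_given_spam.values())
print(math.exp(log_p))     # same quantity, computed more stably
```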

Laplace Smoothing

If we have no spam messages with lactase, then $\hat{p}(\text{lactase} \mid \text{spam}) = 0$, so the whole product is zero and spam messages with lactase automatically get through!

Our estimate of $p(\text{word}_j \mid \text{spam})$ is

$$\hat{p}(\text{word}_j \mid \text{spam}) = \frac{\#\,\text{spam messages with word}_j}{\#\,\text{spam messages}}$$

We can add $\beta$ to the numerator and $\beta k$ to the denominator, which effectively adds fake examples: $\beta$ examples of each possible value of the feature, where $k$ is the number of possible values (2 for a binary classifier):

$$\hat{p}(\text{word}_j \mid \text{spam}) = \frac{(\#\,\text{spam messages with word}_j) + \beta}{(\#\,\text{spam messages}) + \beta k}$$

So for our binary spam classifier (with $\beta = 1$):

$$\hat{p}(\text{word}_j \mid \text{spam}) = \frac{(\#\,\text{spam messages with word}_j) + 1}{(\#\,\text{spam messages}) + 2}$$
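Here is a small sketch of the smoothed estimate; the message counts are made-up, and `smoothed_estimate` is a hypothetical helper, not a library function:

```python
# Sketch of Laplace smoothing for the per-word conditionals.
# Counts are made-up; smoothed_estimate is a hypothetical helper.

def smoothed_estimate(count_with_word, count_total, beta=1, k=2):
    """(count + beta) / (total + beta * k); k = 2 for a binary feature."""
    return (count_with_word + beta) / (count_total + beta * k)

n_spam = 100              # hypothetical number of spam messages
n_spam_with_lactase = 0   # no spam message contains "lactase"

# The unsmoothed estimate is 0, which zeroes out the whole product:
print(n_spam_with_lactase / n_spam)                    # 0.0
# The smoothed estimate stays strictly positive:
print(smoothed_estimate(n_spam_with_lactase, n_spam))  # 1/102 ≈ 0.0098
```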