Naive Bayes is an example of a probabilistic classifier, commonly used in spam filters (it classifies an e-mail as spam if the probability of spam is higher than the probability of not spam).

To model this, it uses Bayes rule:

$$p(\text{spam} \mid \text{words}) = \frac{p(\text{words} \mid \text{spam})\, p(\text{spam})}{p(\text{words})}$$

Where

  • $p(\text{spam})$ is the marginal probability that an e-mail is spam
  • $p(\text{words})$ is the marginal probability that an e-mail has that set of words
    • Hard to approximate directly (lots of ways to combine words)
  • $p(\text{words} \mid \text{spam})$ is the conditional probability that a spam e-mail has the words
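To make the pieces concrete, here is a minimal numeric sketch of the rule above; all of the probabilities are made-up values for illustration, not estimates from real data. Note how $p(\text{words})$ can come from the law of total probability rather than a direct estimate:

```python
# Minimal sketch of Bayes rule for spam filtering.
# All probabilities below are made-up values for illustration.

p_spam = 0.4                # p(spam): marginal probability an e-mail is spam
p_words_given_spam = 0.05   # p(words | spam)
p_words_given_ham = 0.001   # p(words | not spam)

# p(words) via the law of total probability, sidestepping the
# hard-to-approximate direct estimate:
p_words = p_words_given_spam * p_spam + p_words_given_ham * (1 - p_spam)

# Bayes rule: p(spam | words)
p_spam_given_words = p_words_given_spam * p_spam / p_words
print(f"p(spam | words) = {p_spam_given_words:.3f}")  # ~0.971
```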

Optimizations

Denominator doesn’t matter

We can actually reframe this to avoid calculating $p(\text{words})$, as Naive Bayes just returns spam if

$$\frac{p(\text{words} \mid \text{spam})\, p(\text{spam})}{p(\text{words})} > \frac{p(\text{words} \mid \text{not spam})\, p(\text{not spam})}{p(\text{words})}$$

Both sides are divided by the same $p(\text{words})$, so it cancels. Roughly, the denominator doesn’t matter:

$$p(\text{spam} \mid \text{words}) \propto p(\text{words} \mid \text{spam})\, p(\text{spam})$$
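As a sketch (again with assumed toy probabilities), the decision only needs the two unnormalized scores:

```python
# Sketch: decide spam vs. not spam from unnormalized posteriors,
# never computing p(words). Toy values, assumed for illustration.

p_spam, p_ham = 0.4, 0.6
p_words_given_spam, p_words_given_ham = 0.05, 0.001

score_spam = p_words_given_spam * p_spam  # proportional to p(spam | words)
score_ham = p_words_given_ham * p_ham     # proportional to p(not spam | words)

# Comparing these scores gives the same decision as comparing the full
# posteriors, since both would be divided by the identical p(words).
print("spam" if score_spam > score_ham else "not spam")  # spam
```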

Conditional Independence Assumption

Additionally, we assume that all features (words) are conditionally independent given the label, so we can decompose the likelihood into a product of per-word terms:

$$p(\text{words} \mid \text{spam}) = \prod_{j=1}^{d} p(\text{word}_j \mid \text{spam})$$
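A short sketch of the decomposition, using a hypothetical table of per-word conditionals (the probabilities are assumptions, not learned values); in practice the product is computed as a sum of logs to avoid underflow:

```python
import math

# Sketch of the conditional independence decomposition.
# word_probs_given_spam is a hypothetical table of p(word_j | spam)
# values, assumed here rather than estimated from data.
word_probs_given_spam = {"lottery": 0.30, "winner": 0.20, "hello": 0.05}

# p(words | spam) decomposes into a product over individual words:
p_words_given_spam = 1.0
for p in word_probs_given_spam.values():
    p_words_given_spam *= p
print(p_words_given_spam)  # 0.30 * 0.20 * 0.05 = 0.003

# With many words this product underflows, so in practice we sum logs:
log_p = sum(math.log(p) for p in word_probs_given_spam.values())
print(math.exp(log_p))     # same quantity, computed more stably
```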

Laplace Smoothing

If we have no spam messages with lactase, then $\hat{p}(\text{lactase} \mid \text{spam}) = 0$, so the whole product is zero and spam messages with lactase automatically get through!

Our estimate of $p(\text{word}_j \mid \text{spam})$ is

$$\hat{p}(\text{word}_j \mid \text{spam}) = \frac{\#\,\text{spam messages with word}_j}{\#\,\text{spam messages}}$$

We can add $\beta$ to the numerator and $\beta k$ to the denominator, which effectively adds fake examples: $\beta$ examples of each possible value of the feature, where $k$ is the number of possible values (2 for a binary classifier):

$$\hat{p}(\text{word}_j \mid \text{spam}) = \frac{(\#\,\text{spam messages with word}_j) + \beta}{(\#\,\text{spam messages}) + \beta k}$$

So for our binary spam classifier (with $\beta = 1$):

$$\hat{p}(\text{word}_j \mid \text{spam}) = \frac{(\#\,\text{spam messages with word}_j) + 1}{(\#\,\text{spam messages}) + 2}$$
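Here is a small sketch of the smoothed estimate; the message counts are made-up, and `smoothed_estimate` is a hypothetical helper, not a library function:

```python
# Sketch of Laplace smoothing for the per-word conditionals.
# Counts are made-up; smoothed_estimate is a hypothetical helper.

def smoothed_estimate(count_with_word, count_total, beta=1, k=2):
    """(count + beta) / (total + beta * k); k = 2 for a binary feature."""
    return (count_with_word + beta) / (count_total + beta * k)

n_spam = 100              # hypothetical number of spam messages
n_spam_with_lactase = 0   # no spam message contains "lactase"

# The unsmoothed estimate is 0, which zeroes out the whole product:
print(n_spam_with_lactase / n_spam)                    # 0.0
# The smoothed estimate stays strictly positive:
print(smoothed_estimate(n_spam_with_lactase, n_spam))  # 1/102 ≈ 0.0098
```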