Probabilistic Classifier

We want a model of $P(y_{i} = \text{important} \mid x_{i})$ for use in decision theory.

For predictions, we generally map $w^{T} x_{i}$ to class labels (for binary prediction, we used $\text{sign}(w^{T} x_{i})$).
For probabilities, we want to map $w^{T} x_{i}$ to the range $[0, 1]$.

The most common choice is the sigmoid function, applied to $z_{i} = w^{T} x_{i}$:

$h(z_{i}) = \frac{1}{1 + \exp(-z_{i})}$
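
As a rough illustration of the mapping above, here is a minimal Python sketch. The function names `sigmoid` and `predict_proba` and the example weights are hypothetical, not from the original notes. The branch on the sign of `z` is an added numerical-stability detail (an assumption of this sketch), chosen so that `exp` is never called on a large positive argument.

```python
import math

def sigmoid(z: float) -> float:
    """Map a real-valued score z to a value in [0, 1]."""
    if z >= 0:
        # Direct form of h(z) = 1 / (1 + exp(-z)); exp(-z) <= 1 here.
        return 1.0 / (1.0 + math.exp(-z))
    # Algebraically equivalent form exp(z) / (1 + exp(z)),
    # used for negative z so exp never overflows.
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(w, x):
    """Hypothetical helper: P(y = important | x) for a linear model w."""
    z = sum(w_j * x_j for w_j, x_j in zip(w, x))  # z = w^T x
    return sigmoid(z)

# Example with made-up weights and features: the score is 0,
# so the predicted probability sits exactly at the midpoint.
p = predict_proba([1.0, -1.0], [2.0, 2.0])
```

A score of zero maps to 0.5, large positive scores approach 1, and large negative scores approach 0, which is what lets the output be read as a probability.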

Multi-class Probabilities

The softmax function allows us to map $k$ real numbers $z_{c} = w_{c}^{T} x_{i}$ (one score per class $c$) to probabilities.

$P(y \mid z_{1}, z_{2}, \dots, z_{k}) = \frac{\exp(z_{y})}{\sum_{c=1}^{k} \exp(z_{c})}$

The ‘harder’ alternative to softmax is the argmax function, which finds the maximum value, sets it to 1.0, and assigns 0.0 to all other values.

In contrast, softmax serves as a “softer” version of argmax. Because of the exponentiation, the largest value is emphasized and pushed towards 1.0, while the output remains a probability distribution over all input values. This gives a more nuanced representation that captures not only the most likely option but also the relative likelihood of the other options.
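
To make the contrast concrete, here is a small Python sketch of both operations. The names `softmax` and `hard_argmax` and the example scores are hypothetical. Subtracting the maximum score before exponentiating is an assumption of this sketch, not something stated in the notes: it leaves the result unchanged because the common factor cancels in the ratio, and it avoids overflow for large scores.

```python
import math

def softmax(z):
    """Map k real-valued scores to a probability distribution."""
    m = max(z)
    # Shift by the max so every exponent is <= 0 (no overflow).
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def hard_argmax(z):
    """'Hard' version: 1.0 at the largest score, 0.0 elsewhere."""
    best = max(range(len(z)), key=lambda c: z[c])
    return [1.0 if c == best else 0.0 for c in range(len(z))]

scores = [2.0, 1.0, 0.1]      # made-up class scores z_1..z_3
soft = softmax(scores)        # largest score gets the most mass
hard = hard_argmax(scores)    # all mass on the largest score
```

Both outputs sum to 1, but only the softmax output preserves the ordering and relative sizes of the non-maximal scores.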
