jzhao.xyz

Recent Writing

2024: Centering
Dec 23, 2024
Taste is a guide for what is worthwhile
Jan 14, 2024
Agentic Computing
Nov 29, 2022
Building a BFT JSON CRDT
Nov 16, 2022

See 21 more →

Recent Notes

TrueTime
May 26, 2025
Concurrency control
May 26, 2025

See 735 more →

Outlier detection

Oct 03, 2022 · 2 min read

seed
CPSC340

Find observations that are unusually different from the others (aka anomaly detection).

Why? We may want to remove outliers, or we may be interested in the outliers themselves (e.g. security).

Outlier detection generally does not work well. It can be hard to decide when to report an outlier, and there are always new ways to make outliers!

5 Types of outlier detection

Model-based methods
- See if z-score is past a certain threshold
- Unfortunately, z-score assumes uni-modal data
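
A minimal sketch of the z-score check, using only the Python standard library. The function name `zscore_outliers` and the default threshold are illustrative assumptions, not part of the course notes. The second example shows the uni-modal caveat: a point sitting between two modes has a z-score near zero and is never flagged.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=3.0):
    """Return indices of values whose |z-score| exceeds `threshold`.

    `threshold` is a hypothetical tuning knob; 3 is a common choice,
    but small samples may need a lower value.
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Uni-modal data with one extreme value: the last point is flagged.
unimodal = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
print(zscore_outliers(unimodal, threshold=2.5))  # [9]

# Bi-modal data: 50 lies between the two modes, far from every other
# point, yet it equals the mean, so its z-score is 0.
bimodal = [0, 1, 0, 1, 0, 100, 99, 100, 99, 100, 50]
print(zscore_outliers(bimodal, threshold=2.5))  # []
```

A single extreme value also inflates the standard deviation it is measured against, so in a small sample the threshold may have to sit below 3 for anything to be reported.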
Graphical approaches
- Look at a plot, human decides if data is an outlier
- Unfortunately, this only works in at most 2-3 dimensions
Cluster-based methods
- Cluster the data
- Find points that do not belong to clusters
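
One way to make "does not belong to a cluster" concrete is the noise rule from density-based clustering (DBSCAN-style). This is an assumed choice of clustering method, since the notes do not name one, and `noise_points`, `eps`, and `min_pts` are illustrative names. A point is noise if it is not a core point and is not within `eps` of any core point.

```python
from math import dist

def noise_points(points, eps, min_pts):
    """Return indices of points that a DBSCAN-style rule would label as noise."""
    n = len(points)
    # Neighbourhood of i: every point within eps of it, including i itself.
    neighbours = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # Core points have at least min_pts points in their neighbourhood.
    core = {i for i in range(n) if len(neighbours[i]) >= min_pts}
    # Noise: neither a core point nor within eps of a core point.
    return [
        i for i in range(n)
        if i not in core and not any(j in core for j in neighbours[i])
    ]

# Four points forming a tight cluster, plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(noise_points(points, eps=1.5, min_pts=3))  # [4]
```

The result depends heavily on `eps` and `min_pts`, which is one instance of the difficulty of deciding when to report an outlier.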
Distance-based methods
- How many points lie within a radius $\epsilon$ of a given point?
- Global outliers
  - For each point, compute the average distance to its KNN
  - Outliers are points that are far from their KNNs
- Local outliers
  - Outlierness ratio of example $i$ is the average distance of $i$ to its KNN over the average distance of neighbours of $i$ to their KNNs
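
The global score and the outlierness ratio above can be sketched as follows, in plain Python with Euclidean distance. The function names and the toy data are assumptions made for illustration. The data has a tight cluster, a loose cluster, and one point near the tight cluster, which is the situation where the global and local views disagree.

```python
from math import dist

def knn_indices(points, i, k):
    """Indices of the k nearest neighbours of points[i] (excluding i)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: dist(points[i], points[j]))
    return others[:k]

def avg_knn_distance(points, i, k):
    """Global score: average distance from points[i] to its k nearest neighbours."""
    return sum(dist(points[i], points[j]) for j in knn_indices(points, i, k)) / k

def outlierness_ratio(points, i, k):
    """Local score: own average KNN distance over the neighbours' average KNN distance."""
    own = avg_knn_distance(points, i, k)
    neighbours = knn_indices(points, i, k)
    neighbour_avg = sum(avg_knn_distance(points, j, k) for j in neighbours) / k
    if neighbour_avg == 0:
        # Neighbours are duplicates of each other; treat any gap as extreme.
        return float("inf") if own > 0 else 1.0
    return own / neighbour_avg

points = [
    (0, 0), (0, 1), (1, 0), (1, 1),          # tight cluster
    (20, 20), (20, 30), (30, 20), (30, 30),  # loose cluster
    (4, 4),                                  # near the tight cluster, but apart from it
]
k = 2
global_scores = [avg_knn_distance(points, i, k) for i in range(len(points))]
ratios = [outlierness_ratio(points, i, k) for i in range(len(points))]

# The loose-cluster points get a larger global score than (4, 4) ...
print(global_scores[4] > global_scores[8])                 # True
# ... but (4, 4) has by far the largest outlierness ratio.
print(max(range(len(points)), key=ratios.__getitem__))     # 8
```

A ratio close to 1 means a point is about as far from its neighbours as they are from theirs; a ratio well above 1 marks a point that is isolated relative to its local density.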
Supervised-learning methods
- Use supervised learning: $y_{i} = 1$ if $x_{i}$ is an outlier, $y_{i} = 0$ if $x_{i}$ is a regular point
- Needs supervision: we need to know what outliers look like

Local vs global outliers

It’s hard to precisely define “outliers”

In the first case it was a “global” outlier.
In this second case it’s a “local” outlier:
- Within normal data range, but far from other points.

Backlinks

Latent-Factor Models
Unsupervised learning
