jzhao.xyz

Recent Writing

2024: Centering
Dec 23, 2024
Taste is a guide for what is worthwhile
Jan 14, 2024
Agentic Computing
Nov 29, 2022
Building a BFT JSON CRDT
Nov 16, 2022

See 21 more →

Recent Notes

TrueTime
May 26, 2025
Concurrency control
May 26, 2025

See 735 more →

Exploratory data analysis (EDA)

Sep 09, 2022 · 1 min read

seed
CPSC340

How do we “look” at features and high-dimensional examples?

Summary statistics
- Categorical Features
  - Frequencies
  - Mode
  - Quantiles
- Numerical Features
  - Location
    - Mean
    - Median
    - Quantiles
  - Spread
    - Range
    - Variance
    - Interquartile ranges
- Entropy
- Not always representative! Very different datasets can share the same summary statistics, so don’t mistake the map for the territory
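
The statistics listed above can be sketched with Python's standard library. This is a minimal illustration, not the course's reference code: the helper names (`categorical_summary`, `numerical_summary`, `entropy`) and the sample data are made up for this example, and entropy is assumed to mean Shannon entropy of the empirical distribution, measured in bits.

```python
import math
import statistics
from collections import Counter


def categorical_summary(values):
    """Frequencies and mode of a categorical feature."""
    counts = Counter(values)
    mode, _ = counts.most_common(1)[0]  # ties resolve to the first value seen
    return {"frequencies": dict(counts), "mode": mode}


def numerical_summary(values):
    """Location and spread of a numerical feature."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # the three quartiles
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "quartiles": (q1, q2, q3),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "iqr": q3 - q1,
    }


def entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Made-up data, purely for illustration.
colours = ["red", "blue", "red", "green", "red"]
heights = [150, 160, 165, 170, 205]

cat = categorical_summary(colours)
num = numerical_summary(heights)
```

In the made-up `heights` data the single large value pulls the mean above the median, a small instance of a summary statistic not being representative.
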
Distance or similarities
- Hamming distance: number of positions at which two equal-length sequences differ
- Euclidean distance: how far apart the vectors are (square root of the sum of squared differences)
- Correlation
- Jaccard coefficient: set similarity, size of the intersection over size of the union (one minus this gives a distance)
- Edit distance: for strings, the minimum number of character insertions, deletions, or substitutions needed to turn one into the other
- Distance in latent space
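
Several of the measures above are short enough to write out directly. The sketch below is illustrative only: the function names are hypothetical, edit distance is assumed to mean Levenshtein distance (insertions, deletions, and substitutions each cost 1), and returning 1.0 for two empty sets in `jaccard` is a convention chosen here.

```python
import math


def hamming(a, b):
    """Number of positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def euclidean(a, b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def jaccard(a, b):
    """Size of the intersection over size of the union (a similarity)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumption: two empty sets count as identical
    return len(a & b) / len(a | b)


def edit_distance(s, t):
    """Levenshtein distance via dynamic programming, one row at a time."""
    prev = list(range(len(t) + 1))  # distance from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(
                prev[j] + 1,         # delete cs
                curr[j - 1] + 1,     # insert ct
                prev[j - 1] + cost,  # substitute, or keep a match
            ))
        prev = curr
    return prev[-1]
```

For example, `hamming("karolin", "kathrin")` counts three differing positions, and `edit_distance("kitten", "sitting")` needs two substitutions and one insertion.
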
Visualizations
- Basic line plots
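- Matrix plot: visualize the data matrix (examples by features) as an image
- Correlation plot: a matrix plot of the pairwise correlations between features
- Scatterplot: plot two features against each other
  - Can add colour to show a third feature (usually categorical)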
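
As a rough sketch of what sits behind a correlation plot, the snippet below computes the matrix of pairwise Pearson correlations that such a plot would colour in. Pearson correlation is an assumption (the note does not say which correlation is meant), the names `pearson` and `correlation_matrix` and the data are invented for this example, and the drawing step itself is left out.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Entry [i][j] is the correlation between feature i and feature j."""
    return [[pearson(a, b) for b in columns] for a in columns]


# Made-up features, one list per feature (column of the data matrix).
features = [
    [1, 2, 3, 4],
    [2, 4, 6, 8],  # perfectly correlated with the first feature
    [4, 3, 2, 1],  # perfectly anti-correlated with the first feature
]
corr = correlation_matrix(features)
```

Rendering `corr` as an image, with one coloured cell per pair of features, gives the correlation plot.
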



Created with Quartz v4.5.1 © 2025
