# Data Distributions
- Machine learning and AI systems exclude the tail ends of data distributions
- This is where minorities live, so these systems end up reproducing existing systems of power (re: "To Live in Their Utopia", the Matthew Effect)
- Synthetic, generative, and federated models handle these tail cases poorly
- The most interesting cases often ARE the outliers (especially in medical AI)
# Contextual Data
Should data and information be contextualized all the time?
- Context is important when dealing with historical data: knowing why certain decisions were made is essential
- At the same time, we want data to be anonymized to some extent. Exposing patient data, for example, is a major risk.
How do we choose what context to include and what not to include?