Computer vision (CV), broadly speaking, is a research field that aims to enable computers to process and interpret visual data, as sighted humans can.
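It can also be thought of as the inverse of computer graphics: graphics renders images from a description of a scene, while vision tries to recover a description of the scene from images.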
Typically, a CV system is a pipeline with these stages:
- Image
- Sensing Device
- Interpreting Device
- Interpretation
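The pipeline above can be sketched as a chain of functions. This is a toy illustration under assumptions that are not in the notes: the visual input is a small grid of light intensities in [0, 1], the hypothetical `sensing_device` quantizes it to 8-bit pixels, and the hypothetical `interpreting_device` produces an interpretation from mean brightness alone. Real interpreting devices are, of course, far more complex.

```python
from typing import List

# Hypothetical types: the visual input is a grid of continuous light
# intensities in [0.0, 1.0]; the sensed image is a grid of 8-bit pixels.
Scene = List[List[float]]
Image = List[List[int]]


def sensing_device(scene: Scene) -> Image:
    """Toy camera (hypothetical): quantize each intensity to an 8-bit value."""
    return [[min(255, max(0, round(v * 255))) for v in row] for row in scene]


def interpreting_device(image: Image) -> str:
    """Toy interpreter (hypothetical): label the image by mean brightness."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return "bright scene" if mean >= 128 else "dark scene"


def pipeline(scene: Scene) -> str:
    # visual input -> sensing device -> pixels -> interpreting device -> interpretation
    image = sensing_device(scene)
    return interpreting_device(image)


print(pipeline([[0.9, 0.8], [0.7, 1.0]]))  # bright scene
print(pipeline([[0.1, 0.0], [0.2, 0.1]]))  # dark scene
```

Keeping each stage as a separate function mirrors the list: either stage can be swapped out (a different sensor, a learned interpreter) without touching the other.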
Problems in CV
- Measurement. Algorithms for computing properties of the 3D world from visual data. Exactly inverting the image formation process is impossible, because projecting the 3D world onto a 2D image discards information such as depth. The best we can do is make an informed guess.
- Perception and interpretation. Algorithms and representations that allow a machine to recognize objects, people, scenes, and activities. We do not yet fully understand how human visual processing works!
- Search and organization. Algorithms to mine, search, and interact with visual data. The scale of the data involved is enormous.
- Visual imagination. Algorithms for manipulating or creating image or video content.
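To see why the measurement problem cannot be solved exactly, here is a minimal sketch assuming an idealized pinhole camera at the origin looking down the +Z axis with focal length `f` (the notes do not specify a camera model). Perspective projection maps a 3D point (X, Y, Z) to the image point (fX/Z, fY/Z), so every point along the same ray through the camera centre lands on the same pixel.

```python
from typing import Tuple


def project(point: Tuple[float, float, float], f: float = 1.0) -> Tuple[float, float]:
    """Ideal pinhole projection: (X, Y, Z) -> (f*X/Z, f*Y/Z)."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    return (f * X / Z, f * Y / Z)


near = (1.0, 2.0, 4.0)
far = (2.0, 4.0, 8.0)   # twice as far away, along the same ray
print(project(near))    # (0.25, 0.5)
print(project(far))     # (0.25, 0.5)
```

Given only the pixel (0.25, 0.5), nothing in the image distinguishes `near` from `far`, or from any other point k·(1, 2, 4) with k > 0; the depth has been discarded, so recovering it requires extra assumptions or extra views.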
Problem subtypes
- Categorization
- Detection
- Segmentation
- Instance segmentation
- Image captioning
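The subtypes differ mainly in what they output. The sketch below contrasts those outputs on a hypothetical 4×4 image containing two cats, assuming we already know which pixels belong to which object (the ground-truth instance map stands in for a model's predictions). "Segmentation" is taken here to mean semantic segmentation; image captioning is not sketched, since its output is simply a sentence of text.

```python
from collections import Counter
from typing import Dict, List, Tuple

Grid = List[List[int]]
Box = Tuple[int, int, int, int]  # (row_min, col_min, row_max, col_max), inclusive

# Hypothetical toy image: 0 = background, 1 and 2 are two separate cats.
INSTANCES: Grid = [
    [1, 1, 0, 0],
    [1, 1, 0, 2],
    [0, 0, 0, 2],
    [0, 0, 0, 2],
]
CLASS_OF_INSTANCE: Dict[int, str] = {1: "cat", 2: "cat"}


def categorization(instances: Grid, classes: Dict[int, str]) -> str:
    """One label for the whole image: the class covering the most pixels."""
    counts = Counter(classes[i] for row in instances for i in row if i != 0)
    return counts.most_common(1)[0][0] if counts else "background"


def detection(instances: Grid, classes: Dict[int, str]) -> List[Tuple[str, Box]]:
    """One (label, bounding box) pair per object."""
    boxes: Dict[int, Box] = {}
    for r, row in enumerate(instances):
        for c, i in enumerate(row):
            if i == 0:
                continue
            r0, c0, r1, c1 = boxes.get(i, (r, c, r, c))
            boxes[i] = (min(r0, r), min(c0, c), max(r1, r), max(c1, c))
    return [(classes[i], box) for i, box in sorted(boxes.items())]


def semantic_segmentation(instances: Grid, classes: Dict[int, str]) -> List[List[str]]:
    """A class label per pixel; objects of the same class are not told apart."""
    return [[classes[i] if i != 0 else "background" for i in row] for row in instances]


def instance_segmentation(instances: Grid) -> Grid:
    """An object id per pixel; the two cats keep distinct ids."""
    return [row[:] for row in instances]


print(categorization(INSTANCES, CLASS_OF_INSTANCE))  # cat
print(detection(INSTANCES, CLASS_OF_INSTANCE))       # [('cat', (0, 0, 1, 1)), ('cat', (1, 3, 3, 3))]
print(semantic_segmentation(INSTANCES, CLASS_OF_INSTANCE)[1])  # ['cat', 'cat', 'background', 'cat']
print(instance_segmentation(INSTANCES)[1])           # [1, 1, 0, 2]
```

Going down the list, the output becomes more detailed: one label, then one box per object, then one label per pixel, and finally a per-pixel label that also separates objects of the same class.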
Subnotes: