Optical Flow

How do we determine how objects (and/or the camera itself) move in the 3D world? The difficulty is that motion is geometric, whereas optical flow is radiometric: we only observe how image brightness changes, and that need not match the true motion.

Constraint Equation

Let image intensity be denoted by $I(x, y, t)$. Applying the chain rule, we obtain $\frac{dI(x, y, t)}{dt} = I_{x} \frac{dx}{dt} + I_{y} \frac{dy}{dt} + I_{t}$.

Let $u = \frac{dx}{dt}$ and $v = \frac{dy}{dt}$. Then $[u, v]$ is the 2D velocity of the point in the image plane (the optical flow).

If we set $\frac{dI(x, y, t)}{dt} = 0$, we get the optical flow constraint equation: $I_{x} u + I_{y} v + I_{t} = 0$.

This rests on the brightness constancy assumption: a point keeps the same intensity as it moves, i.e. $I(x(t), y(t), t) = C$, so its total time derivative is zero.
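As a sanity check of the constraint equation, the NumPy sketch below translates a smooth synthetic pattern by a known flow $(u, v)$, estimates $I_{x}$, $I_{y}$, $I_{t}$ with central differences, and checks that $I_{x} u + I_{y} v + I_{t}$ is close to zero. The pattern, the flow values, and the helper name `pattern` are made up for illustration.

```python
import numpy as np


def pattern(x, y, t, u, v):
    """Smooth test image translating with velocity (u, v): I(x, y, t) = f(x - u t, y - v t)."""
    return np.sin(0.3 * (x - u * t)) + np.cos(0.2 * (y - v * t))


u, v = 0.5, -0.25                              # true flow in pixels per frame (made up)
y, x = np.mgrid[0:32, 0:32].astype(float)
frames = [pattern(x, y, t, u, v) for t in (-1.0, 0.0, 1.0)]

I_y, I_x = np.gradient(frames[1])              # np.gradient returns d/d(row), d/d(col)
I_t = (frames[2] - frames[0]) / 2.0            # central difference in time

residual = I_x * u + I_y * v + I_t             # near 0 if the constraint holds
interior = residual[1:-1, 1:-1]                # np.gradient is one-sided on the border
print(np.abs(interior).max())
```

The residual is small but not exactly zero, because finite differences only approximate the derivatives; the approximation worsens as the motion per frame grows.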

The constraint involves the following quantities:

- Spatial derivatives: $I_{x} = \frac{\partial I}{\partial x}$, $I_{y} = \frac{\partial I}{\partial y}$, measured with, e.g.:
  - Forward difference
  - Sobel filter
  - Scharr filter
- Optical flow: $u = \frac{dx}{dt}$, $v = \frac{dy}{dt}$
  - This is the unknown we need to solve for. The constraint gives one equation per pixel for two unknowns, so $[u, v]$ cannot be recovered from a single pixel alone.
- Temporal derivative: $I_{t} = \frac{\partial I}{\partial t}$, measured with:
  - Frame difference
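A minimal NumPy sketch of the derivative estimators listed above. The function names are illustrative, not from any library; the kernels use the usual Sobel and Scharr weights, and image borders are simply cropped.

```python
import numpy as np

# 3x3 derivative kernels for d/dx; the d/dy kernels are their transposes.
# Dividing by 8 (Sobel) or 32 (Scharr) rescales a unit-slope ramp's response to 1.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SCHARR_X = np.array([[-3, 0, 3], [-10, 0, 10], [-3, 0, 3]], dtype=float)


def correlate3x3(img, kernel):
    """Correlate img with a 3x3 kernel; returns the 'valid' region (H-2, W-2)."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * img[dy:dy + h - 2, dx:dx + w - 2]
    return out


def forward_diff(img):
    """I_x = I(x+1, y) - I(x, y) and I_y = I(x, y+1) - I(x, y), cropped to (H-1, W-1)."""
    img = np.asarray(img, dtype=float)
    i_x = img[:-1, 1:] - img[:-1, :-1]
    i_y = img[1:, :-1] - img[:-1, :-1]
    return i_x, i_y


def filter_grad(img, kernel_x, norm):
    """I_x, I_y from a 3x3 kernel (Sobel: norm=8, Scharr: norm=32); y points down."""
    return correlate3x3(img, kernel_x) / norm, correlate3x3(img, kernel_x.T) / norm


def frame_diff(frame0, frame1):
    """I_t as the difference of consecutive frames."""
    return np.asarray(frame1, dtype=float) - np.asarray(frame0, dtype=float)
```

For example, on a linear ramp $I = 2x + 3y$ all three spatial estimators return $I_{x} = 2$ and $I_{y} = 3$ everywhere.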

Lucas-Kanade

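A local method: it computes motion $[u, v]$ at a point from the pixels in a small window around it. It can be run at every location in an image, but the estimate is only reliable where the window is well textured.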

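Where can movement be computed effectively? A corner! There the intensity gradient varies in two directions. Along a straight edge only the motion perpendicular to the edge can be recovered (the aperture problem), and in a flat region no motion can be recovered.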

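Stacking the constraint equation $I_{x} u + I_{y} v + I_{t} = 0$ for the $n$ pixels $q_{1}, \dots, q_{n}$ in a window gives an overdetermined system $A \mathbf{v} = b$, which we solve in the least-squares sense: $\mathbf{v} = (A^{T} A)^{-1} A^{T} b$. Here $\mathbf{v}$ is the 2-by-1 column vector of $u$ and $v$; $A$ is the $n$-by-2 matrix whose rows are the spatial derivatives $I_{x}(q_{i})$, $I_{y}(q_{i})$ evaluated at each point $q_{i}$; and $b$ is the $n$-by-1 column vector of the negated temporal derivatives $-I_{t}(q_{i})$.

$$A = \begin{bmatrix} I_{x}(q_{1}) & I_{y}(q_{1}) \\ I_{x}(q_{2}) & I_{y}(q_{2}) \\ \vdots & \vdots \\ I_{x}(q_{n}) & I_{y}(q_{n}) \end{bmatrix}, \qquad \mathbf{v} = \begin{bmatrix} u \\ v \end{bmatrix}, \qquad b = \begin{bmatrix} -I_{t}(q_{1}) \\ -I_{t}(q_{2}) \\ \vdots \\ -I_{t}(q_{n}) \end{bmatrix}$$

$A^{T} A$ is the same second-moment matrix used in Harris corner detection:

$$A^{T} A = \begin{bmatrix} \sum_{i} I_{x}(q_{i})^{2} & \sum_{i} I_{x}(q_{i}) I_{y}(q_{i}) \\ \sum_{i} I_{x}(q_{i}) I_{y}(q_{i}) & \sum_{i} I_{y}(q_{i})^{2} \end{bmatrix}$$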
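A minimal NumPy sketch of the least-squares solve for one window, assuming the derivatives at the window's $n$ pixels have already been computed. The function name and the singularity threshold `eps` are illustrative assumptions.

```python
import numpy as np


def lucas_kanade_window(I_x, I_y, I_t, eps=1e-6):
    """Least-squares flow [u, v] for one window (a sketch, not a library API).

    I_x, I_y, I_t hold the derivatives at the window's n pixels q_1..q_n.
    Returns None when A^T A is (nearly) singular, i.e. effectively rank < 2.
    """
    A = np.stack([np.ravel(I_x), np.ravel(I_y)], axis=1)  # n x 2
    b = -np.ravel(I_t)                                    # length n
    AtA = A.T @ A                                         # 2 x 2 second-moment matrix
    if np.linalg.eigvalsh(AtA)[0] < eps:                  # smallest eigenvalue; eps is arbitrary
        return None
    return np.linalg.solve(AtA, A.T @ b)                  # (A^T A)^{-1} A^T b


# Usage on synthetic derivatives that satisfy I_x u + I_y v + I_t = 0 exactly.
rng = np.random.default_rng(0)
gx, gy = rng.normal(size=(5, 5)), rng.normal(size=(5, 5))
gt = -(gx * 1.5 + gy * -0.5)             # true flow (u, v) = (1.5, -0.5)
print(lucas_kanade_window(gx, gy, gt))   # close to [1.5, -0.5]
```

Using `np.linalg.solve` on the 2-by-2 normal equations avoids forming $(A^{T} A)^{-1}$ explicitly; `np.linalg.lstsq(A, b)` would be an equivalent alternative.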

Assumptions

- Motion between frames is small, so the finite-difference estimates of $I_{x}$, $I_{y}$, and $I_{t}$ are meaningful and the first-order (chain-rule) approximation holds
- Brightness constancy holds, so the optical flow constraint equation is valid ($\frac{dI(x, y, t)}{dt} = 0$)
- Window size is chosen so that motion $[u, v]$ is constant within the window
- Window size is chosen so that $A^{T} A$ has rank 2 for the window (so the required inverse exists)
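The rank condition can be checked per window through the eigenvalues of $A^{T} A$: two clearly positive eigenvalues (a corner) mean the solve is well-posed, one near-zero eigenvalue (an edge) means only the motion across the edge is constrained, and two near-zero eigenvalues (a flat patch) mean nothing is. A NumPy sketch on three tiny synthetic windows, all made up for illustration:

```python
import numpy as np


def structure_eigs(window):
    """Eigenvalues (ascending) of A^T A for one window, using central differences."""
    I_y, I_x = np.gradient(np.asarray(window, dtype=float))
    A = np.stack([I_x.ravel(), I_y.ravel()], axis=1)
    return np.linalg.eigvalsh(A.T @ A)


flat = np.zeros((9, 9))
edge = np.zeros((9, 9))
edge[:, 4:] = 1.0        # vertical edge: intensity varies only along x
corner = np.zeros((9, 9))
corner[4:, 4:] = 1.0     # corner: intensity varies along x and y

for name, w in [("flat", flat), ("edge", edge), ("corner", corner)]:
    print(name, structure_eigs(w))
```

The edge window yields one zero eigenvalue because its vertical gradient $I_{y}$ is zero everywhere, so $A$ has rank 1; the corner window yields two positive eigenvalues.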

Stereo

Computing depth from multiple images. This is formulated as a correspondence problem: determine the match between the location of a scene point in one image and its location in another.

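Disparity: $d = x - x^{'} = \frac{b f}{Z}$, where $b$ is the baseline (the distance between the camera centers $O$ and $O^{'}$), $f$ is the focal length, $x$ and $x^{'}$ are the horizontal image coordinates of the projections of scene point $X$ in the two images (each measured from its own image center), and $Z$ is the depth of $X$, i.e. its distance from the baseline. Depth is therefore inversely proportional to disparity.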
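The disparity formula follows from similar triangles. A short derivation, assuming rectified cameras with centers $O$ and $O^{'}$ separated by the baseline $b$ along the $x$-axis, and a scene point $X$ at lateral offset $h$ from $O$ (the symbol $h$ is introduced here for the derivation) and depth $Z$:

$$\frac{x}{f} = \frac{h}{Z}, \qquad \frac{x^{'}}{f} = \frac{h - b}{Z}$$

$$d = x - x^{'} = \frac{f h}{Z} - \frac{f (h - b)}{Z} = \frac{b f}{Z} \quad \Longrightarrow \quad Z = \frac{b f}{d}$$

For example, with illustrative values $b = 0.1$ m, $f = 700$ px, and $d = 35$ px, $Z = 0.1 \cdot 700 / 35 = 2$ m. Because $Z$ varies as $1/d$, a one-pixel disparity error matters much more for distant points (small $d$) than for near ones.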

Simple stereo algorithm

1. Rectify images (make epipolar lines horizontal). Rectified images have these properties:
   1. Image planes of the cameras are parallel
   2. Focal points are at the same height
   3. Focal lengths are the same
   4. Epipolar lines fall along the horizontal scan lines
2. For each pixel:
   1. Find the epipolar line
   2. Scan the line for the best match
   3. Compute depth from disparity

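The naive, pixel-based approach often fails because a single pixel carries too little information to identify its match reliably. A better option is a window-based approach: for each pixel, pick the disparity that minimizes the sum of squared differences (SSD) between a window around the pixel and a same-sized window on the other image's scan line.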
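The simple stereo algorithm with a window-based SSD match can be sketched in NumPy as winner-take-all block matching. It assumes the images are already rectified (so the epipolar line is the same row) and that the match in the right image lies to the left, consistent with $d = x - x^{'}$. The names `disparity_ssd`, `depth_from_disparity`, `max_disp`, and `half` (window half-width) are illustrative.

```python
import numpy as np


def disparity_ssd(left, right, max_disp, half=2):
    """Winner-take-all SSD block matching on a rectified grayscale pair.

    For each left pixel (y, x), compare its (2*half+1)^2 window with the window
    centered at (y, x - d) in the right image for d = 0..max_disp, and keep the
    d with the smallest SSD (so d = x - x'). Border pixels are left at 0.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    h, w = left.shape
    disp = np.zeros((h, w), dtype=int)
    for y in range(half, h - half):
        for x in range(half, w - half):
            patch = left[y - half:y + half + 1, x - half:x + half + 1]
            best_d, best_ssd = 0, np.inf
            # Only search disparities whose window stays inside the right image.
            for d in range(min(max_disp, x - half) + 1):
                xr = x - d
                cand = right[y - half:y + half + 1, xr - half:xr + half + 1]
                ssd = np.sum((patch - cand) ** 2)
                if ssd < best_ssd:
                    best_d, best_ssd = d, ssd
            disp[y, x] = best_d
    return disp


def depth_from_disparity(disp, baseline, focal):
    """Z = b f / d; zero disparity maps to infinite depth."""
    disp = np.asarray(disp, dtype=float)
    with np.errstate(divide="ignore"):
        return np.where(disp > 0, baseline * focal / disp, np.inf)
```

The nested loops favor clarity over speed; practical implementations vectorize the SSD over all pixels for each candidate disparity. For example, shifting a random texture by 4 pixels (`right = np.roll(left, -4, axis=1)`) recovers a disparity of 4 away from the borders.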

Another approach is to match edges instead of raw intensities, e.g. the zero-crossings of the Laplacian-of-Gaussian-filtered images, at several different scales.
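A sketch of the first step of edge-based matching, assuming rectified images so matching can run along a single scan line. It uses the second derivative of a 1D Gaussian as a stand-in for the 2D Laplacian of Gaussian and reports zero-crossings at several scales; matching the crossings between the two images is not shown. The function names are hypothetical.

```python
import numpy as np


def gauss_second_deriv(sigma):
    """Sampled second derivative of a 1D Gaussian (unnormalized), truncated at 4 sigma."""
    r = int(np.ceil(4 * sigma))
    t = np.arange(-r, r + 1, dtype=float)
    return (t**2 / sigma**4 - 1.0 / sigma**2) * np.exp(-t**2 / (2 * sigma**2))


def zero_crossings(scanline, sigma):
    """Indices i where the filtered scan line changes sign between i and i + 1."""
    resp = np.convolve(np.asarray(scanline, dtype=float), gauss_second_deriv(sigma), mode="same")
    return np.nonzero(np.sign(resp[:-1]) * np.sign(resp[1:]) < 0)[0]


# A step edge between samples 29 and 30, examined at several scales.
step = np.r_[np.zeros(30), np.ones(30)]
for sigma in (1.0, 2.0, 3.0):
    print(sigma, zero_crossings(step, sigma))
```

For an isolated step edge, the zero-crossing stays at the same location at every scale, which is what makes crossings usable as features to match across scales.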

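Note: sum of squared differences (SSD) and normalized cross-correlation (NCC) are not the same in general, but they are closely related: if each window is first normalized to zero mean and unit norm, then SSD $= 2 - 2\,\text{NCC}$, so minimizing SSD is equivalent to maximizing NCC. Unlike raw SSD, NCC is insensitive to gain and offset changes in brightness between the two images.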
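The relationship between SSD and NCC can be checked numerically. A small NumPy sketch; the helper names `normalize`, `ssd`, and `ncc` are illustrative:

```python
import numpy as np


def normalize(w):
    """Zero-mean, unit-norm version of a window, flattened to 1D."""
    w = np.asarray(w, dtype=float).ravel()
    w = w - w.mean()
    return w / np.linalg.norm(w)


def ssd(a, b):
    """Sum of squared differences between two same-shaped windows."""
    return float(np.sum((np.asarray(a, dtype=float) - np.asarray(b, dtype=float)) ** 2))


def ncc(a, b):
    """Normalized cross-correlation: dot product of the normalized windows."""
    return float(np.dot(normalize(a), normalize(b)))


a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = 2.0 * a + 10.0                     # same pattern, different gain and offset

print(ssd(a, b))                       # large: raw SSD is sensitive to gain and offset
print(ncc(a, b))                       # close to 1: NCC ignores them
print(ssd(normalize(a), normalize(b)), 2 - 2 * ncc(a, b))  # equal up to rounding
```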

jzhao.xyz
