How do we determine how objects (and/or the camera itself) move in the 3D world? The difficulty is that motion is geometric, whereas optical flow is radiometric (it is measured from changes in image brightness, not from geometry directly).

See also: aperture problem

### Constraint Equation

Let image intensity be denoted by $I(x,y,t)$. Then, applying the chain rule, we obtain $\frac{dI(x,y,t)}{dt} = I_{x}\frac{dx}{dt} + I_{y}\frac{dy}{dt} + I_{t}$.

Let $u=\frac{dx}{dt}$ and $v=\frac{dy}{dt}$. Then $[u,v]$ is the 2D velocity of the image point (the optical flow).

If we set $\frac{dI(x,y,t)}{dt}=0$, then we get the **optical flow constraint equation**: $I_{x}u+I_{y}v+I_{t}=0$.
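A quick numerical check of the constraint equation (a sketch with synthetic data; the translating Gaussian blob and the choice of central differences are illustrative, not from the notes):

```python
import numpy as np

# Synthetic sequence: a smooth 2D Gaussian blob translating by (u, v) = (1, 0)
# pixels per frame. Constant brightness holds by construction.
def blob(cx, cy, size=64, sigma=6.0):
    y, x = np.mgrid[0:size, 0:size]
    return np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))

I0 = blob(30.0, 32.0)
I1 = blob(31.0, 32.0)  # same blob moved 1 px in x

# Central differences for spatial derivatives, frame difference for I_t
Ix = (np.roll(I0, -1, axis=1) - np.roll(I0, 1, axis=1)) / 2.0
Iy = (np.roll(I0, -1, axis=0) - np.roll(I0, 1, axis=0)) / 2.0
It = I1 - I0

u, v = 1.0, 0.0
residual = Ix * u + Iy * v + It

# I_x u + I_y v + I_t should be ~0 everywhere, up to discretization error
print(np.abs(residual).max())
```

The residual is not exactly zero because the derivatives are finite-difference estimates; it shrinks as the image gets smoother relative to the motion.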

We assume constant brightness for this, meaning $I(x(t),y(t),t)=C$.

We measure or solve for each of the following:

- Spatial derivatives: $I_{x}=\frac{\partial I}{\partial x}$, $I_{y}=\frac{\partial I}{\partial y}$, measured with:
    - Forward difference
    - Sobel filter
    - Scharr filter
- Optical flow: $u=\frac{dx}{dt}$, $v=\frac{dy}{dt}$
    - We need to solve for this! (the unknown in the optical flow problem)
- Temporal derivative: $I_{t}=\frac{\partial I}{\partial t}$, measured with:
    - Frame difference
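The spatial-derivative kernels above can be written out explicitly (a minimal numpy sketch; the hand-rolled `conv2_valid` helper is just for illustration, and the kernels are applied as cross-correlation, i.e. without flipping):

```python
import numpy as np

# Derivative kernels from the list above, oriented for cross-correlation.
forward_diff_x = np.array([[-1.0, 1.0]])          # forward difference in x
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)     # Sobel: smooth, then differentiate
scharr_x = np.array([[ -3, 0,  3],
                     [-10, 0, 10],
                     [ -3, 0,  3]], dtype=float)  # Scharr: better rotational symmetry

def conv2_valid(img, k):
    """Minimal 'valid'-mode 2D cross-correlation."""
    kh, kw = k.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

# On a linear ramp I(x, y) = 2x, every estimator should report dI/dx = 2
ramp = np.tile(2.0 * np.arange(8), (8, 1))
print(conv2_valid(ramp, forward_diff_x)[0, 0])   # 2.0
print(conv2_valid(ramp, sobel_x)[0, 0] / 8.0)    # 2.0 (Sobel has total weight 8)
print(conv2_valid(ramp, scharr_x)[0, 0] / 32.0)  # 2.0 (Scharr has total weight 32)
```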

### Lucas-Kanade

A sparse, window-based method to compute motion $[u,v]$ at selected image locations, assuming the flow is constant within a small window around each location.

Where can you see movement that can be effectively computed? A corner!

Solve for $v$ in $v=(A^{T}A)^{-1}A^{T}b$, where $v$ is the 2-by-1 column vector of $u$ and $v$. $A$ is the $n$-by-2 matrix of partial derivatives $I_{x}(q_{i})$, $I_{y}(q_{i})$ evaluated at each point $q_{i}$ in the window ($A^{T}A$ is the same second-moment matrix used in Harris corner detection). $b$ is the $n$-by-1 vector consisting of the negated temporal partial derivative at each point.

$$A=\begin{bmatrix}I_{x}(q_{1}) & I_{y}(q_{1})\\I_{x}(q_{2}) & I_{y}(q_{2})\\\vdots & \vdots\\I_{x}(q_{n}) & I_{y}(q_{n})\end{bmatrix}\qquad v=\begin{bmatrix}V_{x}\\V_{y}\end{bmatrix}\qquad b=\begin{bmatrix}-I_{t}(q_{1})\\-I_{t}(q_{2})\\\vdots\\-I_{t}(q_{n})\end{bmatrix}$$
*Lucas-Kanade Method*

Assumptions

- Motion is slow enough that partial derivatives $I_{x}$, $I_{y}$, and $I_{t}$ are well-defined
- The optical flow constraint equation holds ($\frac{dI(x,y,t)}{dt}=0$)
- Window size is chosen so that motion $[u,v]$ is constant in the window
- Window size is chosen so that the rank of $A^{T}A$ is 2 for the window (so the required inverse exists)
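Under these assumptions, the least-squares solve is a few lines of numpy. A sketch on synthetic data (the window's derivatives, ground-truth flow, and random gradients are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
true_uv = np.array([0.5, -0.25])       # ground-truth flow in the window

# Fabricated derivatives at n points q_i inside the window. Varied gradient
# directions ensure rank(A^T A) = 2, as the assumptions require.
n = 100
Ix = rng.normal(size=n)
Iy = rng.normal(size=n)
# Constraint at each q_i: Ix*u + Iy*v + It = 0  =>  It = -(Ix*u + Iy*v)
It = -(Ix * true_uv[0] + Iy * true_uv[1])

A = np.column_stack([Ix, Iy])          # n-by-2
b = -It                                # n-by-1
v = np.linalg.solve(A.T @ A, A.T @ b)  # v = (A^T A)^{-1} A^T b

print(v)  # recovers the true flow [0.5, -0.25]
```

With noiseless derivatives the true flow is recovered exactly; with noise, the least-squares solution is the best fit over the window.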

## Stereo

Computing depth from multiple images. Formulated as a correspondence problem: determine the match between the location of a scene point in one image and its location in another.

Disparity: $d=x-x'=\frac{bf}{Z}$, where $b$ is the baseline (the distance between camera centers $O$ and $O'$), $f$ is the focal length, $x$ and $x'$ are the image coordinates of the scene point in the two images, and $Z$ is the depth of the scene point $X$.
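Rearranging gives depth directly: $Z = \frac{bf}{d}$. A worked example with hypothetical calibration values (the baseline, focal length, and disparity below are invented for illustration):

```python
# Depth from disparity: d = x - x' = b*f / Z  =>  Z = b*f / d
b = 0.1    # baseline in meters (hypothetical)
f = 700.0  # focal length in pixels (hypothetical)

d = 14.0   # measured disparity in pixels
Z = b * f / d
print(Z)   # 5.0 meters

# Disparity is inversely proportional to depth: nearby points have
# large disparity, distant points have small disparity.
print(b * f / 28.0)  # 2.5 meters: doubling d halves Z
```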

Simple stereo algorithm

- Rectify images (make epipolar lines horizontal)
    - Rectified images have these properties:
        - Image planes of the cameras are parallel
        - Focal points are at the same height
        - Focal lengths are the same
        - Epipolar lines fall along the horizontal scan lines
- For each pixel:
    - Find the epipolar line
    - Scan the line for the best match
    - Compute depth from disparity

The naive pixel-based approach often lacks context and produces ambiguous matches. What we can try instead is window-based matching: minimize the SSD error between a window around the pixel and candidate windows along the scan line.
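The window-based SSD search can be sketched on a simulated rectified pair (a minimal 1D-scanline example; the scanline data, window size, and disparity range are all invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate one rectified scanline pair: the right scanline sees each point
# shifted left by a true disparity of 4 pixels.
true_d = 4
left = rng.normal(size=64)
right = np.roll(left, -true_d)  # right[i] == left[i + 4]

def best_disparity(left, right, x, w=5, max_d=10):
    """Min-SSD match for the window centered at x on the left scanline."""
    patch = left[x - w: x + w + 1]
    errs = []
    for d in range(max_d + 1):
        cand = right[x - d - w: x - d + w + 1]  # candidate window at disparity d
        errs.append(np.sum((patch - cand) ** 2))
    return int(np.argmin(errs))  # disparity with minimum SSD error

print(best_disparity(left, right, x=30))  # 4
```

In a real implementation the same search runs for every pixel of every scan line, and the window size trades off robustness (larger windows) against spatial resolution (smaller windows).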

Another approach is to match the edges (the zero-crossings) at different scales.

Note: minimizing sum of squared differences (SSD) is equivalent to maximizing normalized cross correlation (NCC) when the windows are normalized (zero mean, unit norm); in general, NCC is more robust to brightness and contrast changes between the two images.
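The equivalence for normalized windows follows from expanding the square: if $\|a\|=\|c\|=1$, then $\mathrm{SSD}(a,c)=\|a-c\|^{2}=2-2\,a\cdot c$, so minimizing SSD and maximizing the correlation $a\cdot c$ pick the same match. A sketch with made-up candidate windows:

```python
import numpy as np

rng = np.random.default_rng(2)

def normalize(w):
    """Zero-mean, unit-norm window (the normalization NCC assumes)."""
    w = w - w.mean()
    return w / np.linalg.norm(w)

a = normalize(rng.normal(size=25))                        # reference window
candidates = [normalize(rng.normal(size=25)) for _ in range(10)]

ssd = [np.sum((a - c) ** 2) for c in candidates]          # sum of squared diffs
ncc = [np.dot(a, c) for c in candidates]                  # normalized correlation

# For unit-norm, zero-mean windows: SSD = 2 - 2*NCC, so both criteria
# select the same candidate.
print(int(np.argmin(ssd)) == int(np.argmax(ncc)))  # True
```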