Rather than picking from fixed convolutions, we learn the elements of the filters. A convolution is a linear filter that measures the effect one signal has on another signal.
If $I$ is the input signal (image) and $f$ is the filter, then the 2D convolution is given by

$$(I * f)(x, y) = \sum_{i} \sum_{j} f(i, j)\, I(x - i,\; y - j)$$
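A minimal sketch of a direct 2D convolution in NumPy may help make the definition concrete. The function name `conv2d_valid` and the choice to compute only positions where the filter fits entirely inside the image are assumptions made for illustration.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Direct 2D convolution, computed only where the kernel fits inside the image."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    flipped = kernel[::-1, ::-1]  # convolution flips the filter; correlation would not
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            # Weighted sum of the image patch under the flipped filter
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * flipped)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
# An impulse one step to the right of the centre shifts the image by one pixel
kernel = np.array([[0.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0],
                   [0.0, 0.0, 0.0]])
result = conv2d_valid(image, kernel)
```

Each output pixel here takes the value of its left neighbour in the input, which is the shift that the flipped filter implies.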
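The standard input layout is a volume of size $D \times W \times H$ (depth, width, height).
$K$ is the number of filters, $F$ is the spatial extent of the filters (kernel size), $S$ is the stride, and $P$ is the padding.
Total number of learnable parameters: $(F \cdot F \cdot D) \cdot K + K$, i.e. $F \cdot F \cdot D$ weights per filter plus one bias per filter.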
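As a rough sketch, the parameter count of a convolutional layer can be computed from the number of filters, the kernel size, and the input depth. The helper names below are hypothetical, and the output-size rule `(W - F + 2P) / S + 1` is the commonly used convention, assumed here rather than taken from these notes.

```python
def conv_layer_params(num_filters, kernel_size, in_depth):
    """Weights (kernel_size * kernel_size * in_depth per filter) plus one bias per filter."""
    weights = kernel_size * kernel_size * in_depth * num_filters
    biases = num_filters
    return weights + biases

def conv_output_size(in_size, kernel_size, stride, padding):
    """Output size along one spatial dimension, assuming (W - F + 2P) / S + 1."""
    return (in_size - kernel_size + 2 * padding) // stride + 1

# Example: ten 5x5 filters over a 3-channel, 32x32 input with stride 1 and padding 2
n_params = conv_layer_params(num_filters=10, kernel_size=5, in_depth=3)
out_size = conv_output_size(in_size=32, kernel_size=5, stride=1, padding=2)
```

With these values the layer has 760 parameters and keeps the 32x32 spatial size.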
Pooling makes the representation smaller, more manageable, and more spatially invariant.
Total number of learnable parameters: 0.
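A small sketch of why pooling needs no parameters, assuming 2x2 max pooling with stride 2 (the notes do not say which pooling operation is meant, so max pooling is an assumption here):

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2; a fixed operation with nothing to learn."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    # Group pixels into non-overlapping 2x2 blocks, then take the max of each block
    blocks = feature_map[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2)
    return blocks.max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 1],
                 [0, 0, 5, 6],
                 [1, 2, 7, 8]])
pooled = max_pool_2x2(fmap)
```

The 4x4 input is reduced to a 2x2 output, and small shifts of a feature within a block leave the result unchanged.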
- Convolutional Layer: applies a set of learnable filters
- Pooling Layer: performs spatial downsampling
- Fully-connected Layer: same as any regular neural network
A CNN then just learns a hierarchy of filters
Properties of Convolution
Convolution is associative: $(f * g) * h = f * (g * h)$. Correlation, on the other hand, is generally not associative.
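The difference can be checked numerically on 1D signals. This is only an illustrative sketch using `np.convolve` and `np.correlate` in "full" mode; the arrays are arbitrary, with the last one deliberately asymmetric.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([0.0, 1.0, 0.5])
c = np.array([1.0, 2.0, 3.0])  # asymmetric on purpose

# Convolution: the grouping does not matter
conv_left = np.convolve(np.convolve(a, b), c)
conv_right = np.convolve(a, np.convolve(b, c))

# Correlation: the grouping changes the result
corr_left = np.correlate(np.correlate(a, b, mode="full"), c, mode="full")
corr_right = np.correlate(a, np.correlate(b, c, mode="full"), mode="full")

conv_associative = np.allclose(conv_left, conv_right)
corr_associative = np.allclose(corr_left, corr_right)
```

For a symmetric filter the two correlation groupings would agree, since correlation and convolution coincide in that case.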
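For 1D Gaussians, we note $G_{\sigma_1} * G_{\sigma_2} = G_{\sqrt{\sigma_1^2 + \sigma_2^2}}$. Convolving with $G_\sigma$ twice is therefore equivalent to convolving once with $G_{\sqrt{2}\,\sigma}$.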
- Ignore these locations: make the computation undefined for the outside rows/columns
- Pad with zeroes: return zero whenever a pixel value is required at a position outside the image
- Assume periodicity: wrap image around
- Reflect border
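Three of these boundary strategies map onto `np.pad` modes. The sketch below pads a single image row by two pixels on each side; treating "reflect border" as NumPy's `reflect` mode (which does not repeat the edge pixel) is an assumption, and `symmetric` would be the variant that does repeat it.

```python
import numpy as np

row = np.array([1, 2, 3, 4])

zero_padded = np.pad(row, 2, mode="constant", constant_values=0)  # pad with zeroes
periodic = np.pad(row, 2, mode="wrap")                            # assume periodicity
reflected = np.pad(row, 2, mode="reflect")                        # reflect border
```

The "ignore these locations" option needs no padding at all: the output is simply computed only where the filter fits inside the image.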
A 2D pillbox is rotationally invariant but not separable
An efficient implementation would represent a 2D box filter as the sum of a 2D pillbox and some “extra corner bits”
- Box filter doesn’t apply well for lens defocus. A circular pillbox is a much better model for defocus
- Gaussian is a good general smoothing model for phenomena that are the sum of many small effects, i.e. whenever the central limit theorem (CLT) applies
Gaussian filters are rotationally invariant.
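We get $G_\sigma(x, y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right)$ where $\sigma$ is the standard deviation.
For a 3x3 filter, we then need to quantize and truncate it, evaluating $G_\sigma(x, y)$ at each position $(x, y)$ in the filter, with $x, y \in \{-1, 0, 1\}$. Increasing $\sigma$ means more blur. The problem with a 3x3 filter is that it truncates too much of the distribution (the weights do not sum to one), which can cause unintentional darkening.
In general, the Gaussian filter should capture $\pm 3\sigma$; for $\sigma = 1$ this gives us a 7x7 filter.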
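The truncation problem is easy to see numerically. The sketch below samples the standard 2D Gaussian density on an integer grid; the helper name `gaussian_kernel` is hypothetical, and no renormalization is applied so that the lost mass stays visible.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Sample the 2D Gaussian density on a size x size integer grid centred at 0."""
    half = size // 2
    coords = np.arange(-half, half + 1)
    x, y = np.meshgrid(coords, coords)
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)

# With sigma = 1, a 3x3 filter keeps only about 78% of the total weight,
# while a 7x7 filter (covering +/- 3 sigma) keeps more than 99.9%.
sum_3x3 = gaussian_kernel(3, sigma=1.0).sum()
sum_7x7 = gaussian_kernel(7, sigma=1.0).sum()
```

In practice a truncated kernel is often divided by its own sum so that the weights add up to one, which avoids the darkening at the cost of slightly distorting the Gaussian shape.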
As both the 2D box filter and the 2D Gaussian filter are separable, each can be implemented as two 1D convolutions: one along each row, followed by one along each column.
A 2D filter is separable if it can be expressed as an outer product of two 1D filters
A separable 2D Gaussian of size $m \times m$ only does $2m$ multiplications at each pixel ($m$ for each 1D filter), instead of $m^2$. Considering the image has $n \times n$ pixels, this is a total of $2m \cdot n^2$ multiplications. Assuming $m \approx n$, this is $O(n^3)$.
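A quick numerical check of separability, as a sketch: a 2D kernel built as the outer product of a 1D filter with itself gives the same result as filtering the rows and then the columns with the 1D filter. The small binomial filter and the helper `conv2d_full` (zero-padded, full-size output) are illustrative choices, not part of the notes.

```python
import numpy as np

def conv2d_full(image, kernel):
    """Direct 2D convolution with zero padding ('full' output size)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih + kh - 1, iw + kw - 1))
    for i in range(kh):
        for j in range(kw):
            # Each kernel entry adds a shifted, scaled copy of the image
            out[i:i + ih, j:j + iw] += kernel[i, j] * image
    return out

g = np.array([1.0, 2.0, 1.0]) / 4.0   # 1D smoothing filter
kernel_2d = np.outer(g, g)            # separable 3x3 filter

rng = np.random.default_rng(0)
image = rng.random((6, 6))

direct = conv2d_full(image, kernel_2d)                     # 9 multiplications per pixel
rows = np.apply_along_axis(np.convolve, 1, image, g)       # convolve each row
separable = np.apply_along_axis(np.convolve, 0, rows, g)   # then each column

same_result = np.allclose(direct, separable)
```

For this 3x3 filter the saving is modest (6 versus 9 multiplications per pixel), but it grows linearly with the filter width.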
The basic building block of the Fourier transform is the periodic function

$$A \sin(\omega x + \phi)$$

where $A$ is the amplitude, $\omega$ is the angular frequency and $\phi$ is the phase. Fourier's claim was that you could add enough of these to get any periodic signal!
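As an illustration of Fourier's claim, a square wave can be approximated by summing sinusoids. The sketch below uses the well-known Fourier series of the square wave (odd harmonics with amplitude $4 / (\pi k)$ and zero phase); the choice of signal and the error measure are assumptions made for the example.

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 1000, endpoint=False)
square = np.sign(np.sin(x))  # target periodic signal

def partial_sum(x, n_terms):
    """Sum of the first n_terms odd-harmonic sinusoids of the square wave."""
    total = np.zeros_like(x)
    for k in range(n_terms):
        freq = 2 * k + 1                       # angular frequency of this term
        amplitude = 4 / (np.pi * freq)         # amplitude shrinks with frequency
        total += amplitude * np.sin(freq * x)  # phase is zero for every term
    return total

# The mean absolute error shrinks as more sinusoids are added
err_few = np.mean(np.abs(square - partial_sum(x, 3)))
err_many = np.mean(np.abs(square - partial_sum(x, 50)))
```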
The Convolution Theorem
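Let $I' = I * f$ be the convolution of the image $I$ with the filter $f$.
Then $\hat{I'} = \hat{I} \cdot \hat{f}$, where the hat denotes the Fourier transform. This is just a simple element-wise multiplication after applying a Fourier transform to each.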
At the expense of two Fourier transforms and one inverse Fourier transform, convolution can be reduced to (complex) multiplication. Using the FFT/IFFT, transforming an $n \times n$ image and an $m \times m$ filter costs $O(n^2 \log n)$ and $O(m^2 \log m)$ respectively, dropping the total cost of convolution to $O(n^2 \log n)$.
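The convolution theorem can be verified with `np.fft.fft2` and `np.fft.ifft2`. In this sketch both inputs are zero-padded to the full output size so that the FFT's circular convolution matches ordinary (linear) convolution; the helper `conv2d_full` and the random test data are assumptions for illustration.

```python
import numpy as np

def conv2d_full(image, kernel):
    """Direct 2D convolution with zero padding ('full' output size)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih + kh - 1, iw + kw - 1))
    for i in range(kh):
        for j in range(kw):
            out[i:i + ih, j:j + iw] += kernel[i, j] * image
    return out

rng = np.random.default_rng(0)
image = rng.random((8, 8))
kernel = rng.random((3, 3))

# Pad both signals to the full output size before transforming
out_shape = (image.shape[0] + kernel.shape[0] - 1,
             image.shape[1] + kernel.shape[1] - 1)

# Two forward transforms, one element-wise product, one inverse transform
spectrum = np.fft.fft2(image, s=out_shape) * np.fft.fft2(kernel, s=out_shape)
via_fft = np.real(np.fft.ifft2(spectrum))

direct = conv2d_full(image, kernel)
matches = np.allclose(via_fft, direct)
```

For a filter this small the direct method is likely faster; the FFT route pays off as the filter size approaches the image size.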
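Convolving two filters of size $m \times m$ and $n \times n$ results in a filter of size $(m + n - 1) \times (m + n - 1)$.
More broadly, for a set of $K$ filters of sizes $m_k \times m_k$, the resulting filter will have size $\left(\sum_{k=1}^{K} m_k - (K - 1)\right) \times \left(\sum_{k=1}^{K} m_k - (K - 1)\right)$.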
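The size rule can be checked in 1D, where the same reasoning applies along each axis. This sketch assumes full (untruncated) convolution, for which combining filters of lengths $m$ and $n$ gives length $m + n - 1$; the helper `composed_size` is a hypothetical name.

```python
import numpy as np

def composed_size(sizes):
    """Side length of the filter obtained by convolving filters of the given sizes."""
    return sum(sizes) - (len(sizes) - 1)

f3 = np.ones(3) / 3   # 1D box filter of length 3
f5 = np.ones(5) / 5   # 1D box filter of length 5
combined = np.convolve(f3, f5)  # full convolution of the two filters

two_filters = composed_size([3, 5])
three_filters = composed_size([3, 3, 3])
```

Both examples yield a filter of side length 7, so three 3x3 filters applied in sequence cover the same support as a single 7x7 filter.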