Rather than picking from fixed convolutions, we learn the elements of the filters. A convolution is a linear filter that measures the effect one signal has on another signal.

If is the input signal (image) and is the filter, then the 2D convolution is given by

Convolutional Layer

Standard is DxWxH

is the number of filters, is the spatial extent of filters (kernel size), is the stride, and is the padding

Total number of learnable parameters: .

Pooling Layer

Makes representation smaller, more manageable and spatially invariant.

Total number of learnable parameters: 0.

Layer Summary

  • Convolutional Layer: applies a set of learnable filters
  • Pooling Layer: performs spatial downsampling
  • Fully-connected Layer: same as any regular neural network

A CNN then just learns a hierarchy of filters

Properties of Convolution

  1. Associative.
  2. Symmetric.

Correlation, on the other hand, is generally not associative.

For 1D Gaussians, we note . Convolving with

Boundary Effects

  1. Ignore these locations: make the computation undefined for the outsize rows/columns
  2. Pad with zeroes: return zero whenever of value of is required at some position outside the image
  3. Assume periodicity: wrap image around
  4. Reflect border


A 2D pillbox is rotationally invariant but not separable

An efficient implementation would represent a 2D box filter as the sum of a 2D pillbox and some “extra corner bits”

Gaussian Filters

  • Box filter doesn’t apply well for lens defocus. A circular pillbox is a much better model for defocus
  • Gaussian is a good general smoothing model
    • for phenomena
    • whenever the CLT applies

Gaussian filters are rotationally invariant.

We get where is the standard deviation

For a 3x3, we then need to quantize and truncate it, evaluating wherever in the filter. Increasing means more blur. Problem with 3x3 is that it truncates too much of the distribution (does not sum up to one), this can cause unintentional darkening.

In general, the Gaussian filter should capture for which gives us a 7x7 filter.


As both the 2D box filter and 2D Gaussian filter are separable, it can be implemented as two 1D convolutions which convolve each row and then each column separately.

A 2D filter is separable if it can be expressed as an outer product of two 1D filters

A seperable 2D Gaussian only does multiplications at each pixel (one for each 1D filter). Considering the image has pixels, then this is a multiplications. Assuming , this is

Fourier Transform

The basic building block of the fourier transform is the periodic function.

where is the amplitude, is the angular frequency and is the phase. Fourier’s claim was that you could add enough of these to get any periodic signal!

The Convolution Theorem

Let be the convolution.

Then, which is just a simple element-wise multiplication after applying a Fourier transform to each.

At the expense of two Fourier transforms and one inverse Fourier transform, convolution can be reduced to (complex) multiplication. This speeds up the cost of FFT/IFFT for the image and filter to and respectively, dropping the total cost of convolution to

Convolution Sizing

Convolving two filters of size and results in a filter of size

More broadly for a set of filters of sizes the resulting filter will have size