How do we efficiently find the “best” hyper-parameters?

More complicated models have even more hyper-parameters. This makes searching all values expensive (increases over-fitting risk)

Simplest approaches:

  • Exhaustive search: try all combinations among a fixed set of and values.
  • Random search: try random values
  • Stochastic local search: Generic global optimization methods (simulated annealing, genetic algorithms, and so on)
  • Coordinate search: Optimize one hyper-parameter at a time, keeping the others fixed. Repeatedly go through the hyper-parameters