A solution to a learning problem is usually found by an optimization procedure: maximizing some performance measure (e.g. classification accuracy) or minimizing a loss function (e.g. the number of misclassified examples) over a dataset. Optimization, however, is a much broader problem, namely that of finding the parameters that maximize (or minimize) the value of a given function. It applies to a wide variety of problems, such as designing efficient engines or minimizing costs, provided there exists a function that measures the fitness of candidate solutions.
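As a concrete illustration, the following sketch casts a small learning problem as an optimization problem: the loss function counts misclassified examples on a toy dataset, and we search for the threshold parameter that minimizes it. The dataset and the threshold classifier are invented here purely for illustration.

```python
# A minimal sketch: casting a learning problem as an optimization problem.
# The toy dataset and threshold classifier below are illustrative assumptions.

# Toy dataset: (feature, label) pairs.
data = [(0.2, 0), (0.8, 1), (0.5, 0), (0.9, 1), (0.4, 0), (0.7, 1)]

def loss(threshold):
    """Number of misclassified examples when predicting 1 iff x >= threshold."""
    return sum(1 for x, y in data if (x >= threshold) != (y == 1))

# A brute-force search over candidate thresholds plays the role of the
# "optimization procedure" that solves the learning problem.
candidates = [i / 100 for i in range(101)]
best = min(candidates, key=loss)
print(f"best threshold = {best:.2f}, misclassified examples = {loss(best)}")
```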
In this course, we start with a definition of what constitutes an optimization problem and discuss the important issue of local minima. We then consider the special case of convex functions, for which every local minimizer is a global one, and that of continuously differentiable functions, for which optima are found among the points where the derivative of the objective function is zero. Next, we present the gradient descent algorithm, a widely used method in ML. Finally, we introduce the black-box optimization setting, in which the inner workings of the objective function are assumed unknown, and present estimation-of-distribution algorithms (EDAs), which are especially suited to this context.
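Since gradient descent plays a central role later in the course, here is a minimal sketch of the idea on a simple differentiable objective; the quadratic function, step size, and iteration count are assumptions chosen only for illustration.

```python
# A minimal gradient descent sketch on an assumed quadratic objective
# f(w) = (w - 3)^2, whose derivative is f'(w) = 2 * (w - 3).

def grad(w):
    return 2.0 * (w - 3.0)

w = 0.0            # initial guess
step_size = 0.1    # learning rate (assumed)
for _ in range(100):
    w = w - step_size * grad(w)   # take a step against the gradient

print(f"w after gradient descent: {w:.4f}")  # close to the minimizer w* = 3
```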