Continuously differentiable functions

$
\DeclareMathOperator*{\argmax}{arg\,max}
\DeclareMathOperator*{\argmin}{arg\,min}
\DeclareMathOperator{\dom}{dom}
\DeclareMathOperator{\sigm}{sigm}
\DeclareMathOperator{\softmax}{softmax}
\DeclareMathOperator{\sign}{sign}
$

In the case of continuously differentiable functions, every local optimum satisfies the so-called 1st order necessary condition: the gradient at that point must be $0$. Note that not every point satisfying this condition is a local optimum; for example, the function $x^{3}$ has derivative $0$ at $x=0$, yet $0$ is neither a local minimum nor a local maximum.

Definition 1.4 (1st order necessary optimality condition)
Let $\mathbf{x}^{*}$ be a local optimum of $f$. Then $\nabla f(\mathbf{x}^{*})=0$.

This means that at every local optimum the graph of $f$ has a horizontal tangent (a horizontal tangent hyperplane when the dimension of $\dom f$ is larger than $1$).
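To make the condition concrete, here is a minimal sketch in Python (not part of the course material) that checks $\nabla f(\mathbf{x}^{*}) \approx 0$ numerically with a central-difference approximation. The test function $f(x_1, x_2) = x_1^{2} + 2x_2^{2}$ and the helper `numerical_gradient` are assumptions chosen purely for illustration.

```python
# Minimal sketch (illustrative only): verify the 1st order necessary condition
# at a candidate point by approximating the gradient with central differences.
import numpy as np

def f(x):
    # Assumed test function: f(x1, x2) = x1^2 + 2*x2^2, minimized at the origin.
    return x[0] ** 2 + 2.0 * x[1] ** 2

def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        e = np.zeros_like(x, dtype=float)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return grad

x_star = np.array([0.0, 0.0])         # candidate local minimum
print(numerical_gradient(f, x_star))  # ~ [0. 0.], consistent with Definition 1.4
```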

Additionally, if $f$ is twice continuously differentiable, the second-order derivative, i.e. the Hessian, can be used to state both a necessary condition and a sufficient condition for local optimality.

Definition 1.5 (2nd order necessary optimality condition)
Let $\mathbf{x}^{*}$ be a local minimum of $f$. Then $\nabla f(\mathbf{x}^{*})=0$ and $\nabla^{2}f(\mathbf{x}^{*})$ is positive semidefinite.
Definition 1.6 (2nd order sufficient optimality condition)
Suppose that $f$ is twice continuously differentiable, $\nabla f(\mathbf{x}^{*})=0$, and $\nabla^{2}f(\mathbf{x}^{*})$ is positive definite. Then $\mathbf{x}^{*}$ is a strict local minimum.
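The following sketch (again an assumption for illustration, not part of the course) shows how the 2nd order conditions can be checked in practice: at a stationary point, the eigenvalues of the Hessian decide between the cases of Definitions 1.5 and 1.6.

```python
# Minimal sketch (illustrative only): classify a stationary point by the
# eigenvalues of the Hessian.
#   all eigenvalues >  0  -> positive definite      -> strict local minimum (Def. 1.6)
#   all eigenvalues >= 0  -> positive semidefinite  -> necessary condition holds (Def. 1.5)
#   mixed signs           -> saddle point, not a local optimum
import numpy as np

def classify_stationary_point(hessian, tol=1e-10):
    eigvals = np.linalg.eigvalsh(hessian)  # Hessian is symmetric
    if np.all(eigvals > tol):
        return "strict local minimum"
    if np.all(eigvals >= -tol):
        return "2nd order necessary condition holds (inconclusive)"
    if np.all(eigvals < -tol):
        return "strict local maximum"
    return "saddle point"

# Hessian of f(x) = x1^2 + 2*x2^2 (constant, since f is quadratic)
H = np.array([[2.0, 0.0],
              [0.0, 4.0]])
print(classify_stationary_point(H))  # strict local minimum
```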

The above conditions can be used to prove, for instance, that the function $f: x \mapsto x^{2}+x-1$ has a global minimum at $x^{*}=-\tfrac{1}{2}$: the derivative $f'(x)=2x+1$ vanishes only at $x=-\tfrac{1}{2}$, and the second derivative $f''(x)=2$ is positive, so $x^{*}$ is a strict local minimum; since it is the only stationary point and $f(x)\to\infty$ as $|x|\to\infty$, it is in fact the global minimum. The principle generalizes to arbitrarily complicated functions in high-dimensional spaces, provided the gradient and the Hessian can be computed analytically.
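A short symbolic check of this example, sketched here with SymPy as an assumed tool (not prescribed by the course): solve $f'(x)=0$ and inspect the sign of $f''(x)$.

```python
# Minimal sketch (illustrative only): apply the optimality conditions to
# f(x) = x^2 + x - 1 symbolically.
import sympy as sp

x = sp.symbols("x")
f = x ** 2 + x - 1

stationary_points = sp.solve(sp.diff(f, x), x)  # [-1/2], the only stationary point
second_derivative = sp.diff(f, x, 2)            # 2 > 0, so a strict local minimum
print(stationary_points, second_derivative)
```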

Next: Gradient descent