The curse of dimensionality

$
\DeclareMathOperator*{\argmax}{arg\,max}
\DeclareMathOperator*{\argmin}{arg\,min}
\DeclareMathOperator{\dom}{dom}
\DeclareMathOperator{\sigm}{sigm}
\DeclareMathOperator{\softmax}{softmax}
\DeclareMathOperator{\sign}{sign}
$

Optimization in high-dimensional spaces can be quite counterintuitive. Consider for instance the unit cube in dimension $n$: the inner cube of side $0.99$ occupies a fraction $0.99^{n}$ of its volume. While this is $99\%$ in dimension $1$, it drops to about $36\%$ in $100$ dimensions and to about $4.3\times10^{-3}\%$ in dimension $1000$. Thus, almost all the volume of a $1000$-dimensional hypercube is concentrated in a thin shell near its boundary, with almost no volume in the center.
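As a quick sanity check, the following minimal Python sketch (an illustration added here, not part of the course text) evaluates the fraction $0.99^{n}$ for these dimensions:

# Fraction of the unit cube's volume occupied by the inner cube of side 0.99;
# in dimension n the inner cube has volume 0.99**n.
for n in [1, 100, 1000]:
    inner = 0.99 ** n
    print(f"n = {n:4d}: fraction = {inner:.3e} ({100 * inner:.1e} %)")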

This reflects a fundamental difference in the behavior of distances in high-dimensional spaces. The example above is one manifestation of the fact that almost no points are close together in high dimension, because they have so many ways of being dissimilar.
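To see this numerically, here is a short sketch (an illustration under the assumption that points are drawn uniformly at random from the unit cube, not part of the original text) showing that pairwise distances concentrate around their mean as the dimension grows:

import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
for n in [1, 100, 1000]:
    pts = rng.random((200, n))   # 200 points sampled uniformly from [0, 1]^n
    d = pdist(pts)               # Euclidean distances between all pairs
    print(f"n = {n:4d}: mean distance = {d.mean():.2f}, "
          f"std/mean = {d.std() / d.mean():.3f}")

The relative spread std/mean shrinks as $n$ grows: distances become nearly uniform, so no pair of points is much closer together than any other.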

In the context of optimization, this unusual behavior of high-dimensional spaces can become very problematic. For instance, in dimension $1$ it could seem reasonable to use a grid search approach (see the algorithm below), i.e. testing the fitness of $100$ values of $x$ at intervals of length $0.01$. However, in a $1000$-dimensional space, the number of function evaluations needed to form such a grid would be $100^{1000}$: intractable even for trivial problems.

These issues are by no means insurmountable but can turn up in many situations. We will try to address them when they do.

Algorithm (Grid Search)
Inputs: $f$, a function with $\dom f=[0,1]^{D}$.
$d$, the step between grid points.
Outputs: $\hat{\mathbf{x}}$, approximation of a global minimum.
begin
$\hat{\mathbf{x}}$ := $(0,0,\dots,0)$
for $u_{1}$ from $0$ to $1$ by step $d$:
    for $u_{2}$ from $0$ to $1$ by step $d$:
        $\vdots$
        for $u_{D}$ from $0$ to $1$ by step $d$ :
            $\mathbf{x}_{t}$ := $(u_{1},u_{2},\dots,u_{D})$
            if $f(\mathbf{x}_{t})$ < $f(\hat{\mathbf{x}})$
            then $\hat{\mathbf{x}}$ := $\mathbf{x}_{t}$
return $\hat{\mathbf{x}}$
end
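For concreteness, a direct Python transcription of the algorithm might look as follows (a sketch, not a reference implementation from the course); itertools.product plays the role of the $D$ nested loops, and the total number of evaluations grows exponentially in $D$:

import itertools
import numpy as np

def grid_search(f, D, d):
    # Exhaustive search for a minimum of f over [0, 1]^D with grid step d.
    axis = np.arange(0.0, 1.0 + d / 2, d)   # grid values 0, d, 2d, ..., 1
    x_hat = np.zeros(D)
    best = f(x_hat)
    # itertools.product generates the D nested loops of the pseudocode,
    # i.e. roughly (1/d + 1)**D candidate points in total.
    for u in itertools.product(axis, repeat=D):
        x_t = np.asarray(u)
        f_t = f(x_t)
        if f_t < best:
            best, x_hat = f_t, x_t
    return x_hat

# Tractable in dimension 2 (101**2 evaluations)...
print(grid_search(lambda x: ((x - 0.3) ** 2).sum(), D=2, d=0.01))
# ...but the same call with D = 1000 would require 101**1000 evaluations.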

Next: Convex functions