In many cases, the objective function is not given by an explicit formula but only through an evaluation process. For example, if the objective is to find the number of pistons $k$ in an engine that maximizes the mileage per gallon $f(k)$, then $f$ has no analytical formulation. In this example, the objective function presents the additional difficulty of having a discrete domain, namely the set of positive natural numbers $k\in\mathbb{N}^{*}$. In discrete optimization, necessary conditions, sufficient conditions and gradient-based algorithms are inapplicable.
Nevertheless, $f$ can still be evaluated for any candidate solution that might be considered. In the previous example, $f(k)$ can be evaluated by actually building an engine with $k$ pistons and measuring the mileage per gallon empirically, clearly a costly process.
More formally, consider the problem
$$\min_{\mathbf{x}\in\dom f}f(\mathbf{x}),$$
where the only available operation on $f$ is evaluation, namely obtaining $f(\mathbf{x})$ from any given value $\mathbf{x}$.
Then the above problem is referred to as a black-box optimization problem.
The efficiency of black-box optimization strategies is usually measured in terms of the number of evaluations needed to reach a target value or, conversely, in terms of the target value attainable within a given number of evaluations.
In this context, stochastic optimization algorithms iteratively propose new candidate solutions drawn from a probability distribution over the domain of $f$. Consider, for instance, uniform random search, which consists of repeatedly sampling candidate solutions from a uniform distribution over $\dom f$, i.e. $\mathbf{x}_{t}\sim\mathcal{U}(\dom f)$, and returning the best candidate found so far (see the algorithm below).
Inputs: $f$, a function.
Outputs: $\hat{\mathbf{x}}$, approximation of a global minimum.
Variables: $\mathbf{x}_{t}$, candidate solution of the algorithm at time $t$.
$\mathbf{x}_{0}\sim\mathcal{U}(\dom f)$
$\hat{\mathbf{x}}:=\mathbf{x}_{0}$
for $t=0,1,2,\dots$ do
$\quad\mathbf{x}_{t+1}\sim\mathcal{U}(\dom f)$
$\quad$if $f(\mathbf{x}_{t+1}) < f(\hat{\mathbf{x}})$ then $\hat{\mathbf{x}}:=\mathbf{x}_{t+1}$
end
return best guess so far $\hat{\mathbf{x}}$.
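To make the procedure concrete, here is a minimal Python sketch of uniform random search over a box-shaped continuous domain; the box bounds, the evaluation budget, and the function names are illustrative assumptions, since the pseudocode above leaves the domain and the stopping criterion abstract.

```python
import numpy as np

def uniform_random_search(f, low, high, budget=1000, seed=None):
    """Uniform random search over the box [low, high]: repeatedly sample
    candidates uniformly and keep the best one found so far."""
    rng = np.random.default_rng(seed)
    low, high = np.asarray(low, float), np.asarray(high, float)
    x_hat = rng.uniform(low, high)      # x_0 ~ U(dom f)
    f_hat = f(x_hat)
    for _ in range(budget - 1):
        x = rng.uniform(low, high)      # x_{t+1} ~ U(dom f)
        fx = f(x)
        if fx < f_hat:                  # the incumbent can only improve
            x_hat, f_hat = x, fx
    return x_hat, f_hat

# Example: minimize the sphere function on [-5, 5]^2.
x_hat, f_hat = uniform_random_search(lambda x: float(np.sum(x**2)),
                                     low=[-5, -5], high=[5, 5],
                                     budget=500, seed=0)
```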
Although this algorithm does not seem very efficient (statements about the superiority of a particular optimization method should always be considered in the light of the no free lunch theorem [Wolpert1997], which states that no optimization algorithm is strictly better than another on all problems), it has a number of interesting properties. First, it is a global optimization method and never gets stuck in local minima; second, the best value found is guaranteed to improve or stay the same at each step. Finally, this algorithm is capable of ignoring dimensions that are not relevant to the problem and can therefore be much more efficient than a grid search approach [Bergstra2012], as the toy comparison below illustrates.
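The test function, budget, and grid resolution in this comparison are arbitrary choices made for the example: with a budget of 25 evaluations, a $5\times 5$ grid probes only five distinct values of the relevant coordinate, whereas 25 random samples probe 25 of them.

```python
import numpy as np

# Toy objective: only the first coordinate matters, the second is irrelevant.
f = lambda x: (x[0] - 0.3) ** 2

budget = 25

# Grid search: a 5 x 5 grid spends 5 evaluations on each distinct x[0].
grid = [np.array([a, b]) for a in np.linspace(0, 1, 5)
                         for b in np.linspace(0, 1, 5)]
best_grid = min(f(x) for x in grid)

# Random search: all 25 evaluations probe distinct values of x[0].
rng = np.random.default_rng(0)
best_rand = min(f(rng.uniform(0, 1, size=2)) for _ in range(budget))

print(best_grid, best_rand)  # random search typically finds a lower value
```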
An interesting improvement of this method concerns the possibility of adapting the probability distribution based on past observations. In essence, the goal is then to learn, from previous attempts, where better values might be located in the search space.
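As a minimal sketch of this idea, assuming a continuous domain, one can replace the uniform distribution with a normal distribution centered on the best candidate found so far, whose spread widens after a success and shrinks after a failure, in the spirit of a simple (1+1) evolution strategy; the adaptation factors below are illustrative choices, not prescribed values.

```python
import numpy as np

def adaptive_random_search(f, x0, sigma=1.0, budget=1000, seed=None):
    """Sample around the best candidate so far from a normal distribution
    whose spread adapts to past successes and failures."""
    rng = np.random.default_rng(seed)
    x_hat = np.asarray(x0, dtype=float)
    f_hat = f(x_hat)
    for _ in range(budget - 1):
        x = x_hat + sigma * rng.standard_normal(x_hat.shape)
        fx = f(x)
        if fx < f_hat:
            x_hat, f_hat = x, fx
            sigma *= 1.5    # success: widen the search around the new best
        else:
            sigma *= 0.9    # failure: concentrate near the incumbent
    return x_hat, f_hat
```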