Estimation of Distribution Algorithms (EDAs)
EDAs are mathematically principled evolutionary algorithms that do away with the biological analogy. As the name suggests, EDAs are based on an unsupervised learning topic, density estimation, which is reviewed thoroughly in the machine learning section. EDAs achieve state-of-the-art performance in the black-box optimization setting, where the goal is to optimize a function without any knowledge of how it computes its values.
EDAs propose to represent individuals as samples from a probability distribution: the so-called proposal distribution. A population is then a set of i.i.d. samples from this distribution. In the preceding section, we saw that the mutation and cross-over operators served to define small possible movements around a current population. The EDA approach has the advantage of transforming the problem of moving towards better populations in the input space (which may not be well behaved) into a proxy problem which is usually well behaved: moving towards better proposal distributions.
Formally, the algorithm generates a new population by taking $\lambda$ samples from a proposal distribution $p_{\theta_t}$. These individuals are then ranked according to their $f$-values, and the $\mu$ best individuals $x_1, \dots, x_\mu$ are used to update the proposal distribution with a log-likelihood gradient ascent step, i.e.

$$\theta_{t+1} = \theta_t + \eta \, \nabla_\theta \left. \frac{1}{\mu} \sum_{i=1}^{\mu} \log p_\theta(x_i) \right|_{\theta = \theta_t},$$

where $\eta$ is the step size of the gradient ascent update. The algorithm can then run for a number of steps, until a satisfactory solution is found. The figure below gives an example of an EDA update for a Gaussian proposal distribution.
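As a concrete illustration, here is a minimal NumPy sketch of one such update for a Gaussian proposal with fixed isotropic variance, minimizing a toy quadratic. The parameter values and the quadratic objective are illustrative, not prescribed by the text:

```python
import numpy as np

def eda_step(f, mean, sigma, lam, mu, eta, rng):
    """One EDA update for a Gaussian proposal N(mean, sigma^2 I), minimizing f."""
    samples = rng.normal(mean, sigma, size=(lam, mean.size))  # draw lam samples
    best = samples[np.argsort([f(x) for x in samples])[:mu]]  # keep the mu best
    # gradient of the mean log-likelihood of the selected samples w.r.t. the
    # Gaussian mean: grad_m log N(x | m, sigma^2 I) = (x - m) / sigma^2
    grad = (best - mean).mean(axis=0) / sigma**2
    return mean + eta * grad  # log-likelihood gradient ascent step

rng = np.random.default_rng(0)
mean = np.array([5.0, -3.0])
for _ in range(100):
    mean = eda_step(lambda x: float(np.sum(x**2)), mean, sigma=1.0,
                    lam=50, mu=10, eta=0.5, rng=rng)
# after enough steps, the proposal mean drifts towards the minimizer (0, 0)
```

Note that only the mean is adapted here; a full method such as CMA-ES also adapts the covariance of the proposal.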
Although the purpose of the algorithm is to maximize $f$, it consists in the maximization of a surrogate objective, $J(\theta) = \mathbb{E}_{x \sim p_\theta}[w(x)]$, where the weight function $w$ is equal to $1/\mu$ for the $\mu$ best individuals, and $0$ otherwise. This has the advantage of making the approach invariant w.r.t. monotone transformations of the objective function $f$.
The maximization of the surrogate objective is done with gradient ascent:

$$\nabla_\theta \, \mathbb{E}_{x \sim p_\theta}[w(x)] = \mathbb{E}_{x \sim p_\theta}\left[w(x) \, \nabla_\theta \log p_\theta(x)\right],$$

where taking samples from $w(x)\, p_\theta(x)$ can be done by taking $\lambda$ samples from $p_\theta$ and keeping only the $\mu$ best according to $f$.
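The equivalence between weighting and selection can be checked numerically. The sketch below (plain NumPy, Gaussian proposal, illustrative sizes) verifies that the weighted gradient summed over all samples coincides with the plain average of the log-likelihood gradients of the selected samples:

```python
import numpy as np

rng = np.random.default_rng(1)
lam, mu, d = 20, 5, 3
theta, sigma = np.zeros(d), 1.0          # Gaussian proposal N(theta, sigma^2 I)
f = lambda x: float(np.sum(x**2))        # objective to minimize

x = rng.normal(theta, sigma, size=(lam, d))
order = np.argsort([f(xi) for xi in x])

# weight function: 1/mu for the mu best samples, 0 otherwise
w = np.zeros(lam)
w[order[:mu]] = 1.0 / mu

# grad_theta log N(x | theta, sigma^2 I) = (x - theta) / sigma^2
grad_log = (x - theta) / sigma**2

g_weighted = (w[:, None] * grad_log).sum(axis=0)   # weighted sum over all samples
g_selected = grad_log[order[:mu]].mean(axis=0)     # plain average over the mu best
assert np.allclose(g_weighted, g_selected)
```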
This general framework can be adapted to optimize functions in $\mathbb{R}^d$, e.g. with CMA-ES [Hansen2008], or in discrete spaces such as $\{0,1\}^d$ with PBIL [Baluja1994]. It has the advantage of allowing a move towards several proposed solutions at once, contrary to methods such as hill climbing. The PBIL algorithm for optimization over $\{0,1\}^d$ is given below.

Figure 5: The steps of an EDA given a fitness function $f$ (a) and a Gaussian proposal distribution $p_{\theta_t}$ at time $t$ (b). First, $\lambda$ samples are drawn from the proposal distribution (c). The $\mu$ best samples are then selected according to their $f$-values (d). Finally, the likelihood of the selected samples w.r.t. the parameters $\theta$ (e) can be increased with a step of log-likelihood gradient ascent, leading to a new proposal distribution $p_{\theta_{t+1}}$ at time $t+1$ (f).
Algorithm (PBIL)
Inputs:
$f$, a function with $f : \{0,1\}^d \to \mathbb{R}$.
$\lambda$, number of proposal samples.
$\mu$, number of samples to select at each step.
$\eta$, the step-size.
$p_m$, the probability of a mutation.
$m_s$, the mutation shift.
Outputs:
$x^*$, approximation of a global minimum.
Variables:
$\theta_t \in [0,1]^d$, the parameters of an independent Bernoulli proposal distribution at time $t$: $\theta_t(j)$ for the probability that $x(j) = 1$ at time $t$.
$x_1, \dots, x_\lambda$, the proposal samples at time $t$, not to be confused with the components $x(1), \dots, x(d)$ of a vector $x$.
begin
$\theta \leftarrow (1/2, \dots, 1/2)$
until satisfied:
    draw $\lambda$ samples $x_1, \dots, x_\lambda$ from $p_\theta$
    rank samples ensuring $f(x_1) \le \dots \le f(x_\lambda)$
    // update probability vector
    for $i$ from $1$ to $\mu$:
        for $j$ from $1$ to $d$:
            $\theta(j) \leftarrow (1 - \eta)\,\theta(j) + \eta\, x_i(j)$
    // mutate probability vector
    for $j$ from $1$ to $d$:
        if $\mathrm{rand}([0,1]) < p_m$:
            then $\theta(j) \leftarrow (1 - m_s)\,\theta(j) + m_s \cdot \mathrm{rand}(\{0,1\})$
return best solution $x_1$.
end
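The loop above can be turned into a short runnable sketch. The NumPy translation below follows the same structure (Bernoulli proposal, truncation selection, probability-vector update, component-wise mutation); the one-max test objective and the parameter values are illustrative choices, not part of the original algorithm statement:

```python
import numpy as np

def pbil(f, d, lam=50, mu=10, eta=0.1, p_mut=0.02, shift=0.05, steps=200, seed=0):
    """PBIL minimizing f over {0,1}^d with an independent Bernoulli proposal."""
    rng = np.random.default_rng(seed)
    theta = np.full(d, 0.5)                       # theta[j] = P(x(j) = 1)
    best_x, best_f = None, np.inf
    for _ in range(steps):
        # draw lam samples from the Bernoulli proposal
        x = (rng.random((lam, d)) < theta).astype(int)
        fvals = [f(xi) for xi in x]
        order = np.argsort(fvals)                 # f(x_1) <= ... <= f(x_lam)
        if fvals[order[0]] < best_f:              # track best solution seen
            best_x, best_f = x[order[0]].copy(), fvals[order[0]]
        # update probability vector towards the mu best samples
        for xi in x[order[:mu]]:
            theta = (1 - eta) * theta + eta * xi
        # mutate probability vector
        for j in range(d):
            if rng.random() < p_mut:
                theta[j] = (1 - shift) * theta[j] + shift * rng.integers(2)
    return best_x

# one-max as a minimization problem: count the zero bits
sol = pbil(lambda x: x.size - int(x.sum()), d=20)
```

Returning the best solution seen over all generations (rather than the last sample) guards against losing a good individual to later mutation of the probability vector.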
Next: Summary