Evolutionary algorithms propose to optimize an objective function by using an artificial evolution process to select relevant parameters. In this setting, parameter values are seen as organisms evolving in an environment which only allows survival and reproduction of the fittest (the fittest being the best parameter values according to the objective function). The algorithm starts with a set of proposal solutions which is seen as a population of individuals. Based on the evaluation of these individuals according to the function $f$, the best individuals are then selected for reproduction. Reproduction serves to generate new individuals using a mutation operator, and sometimes a cross-over operator. Finally, the new individuals replace the previous population thus creating a new generation. This process is summarized in the figure below.
Figure 4: Visualization of an evolutionary algorithm on a 2D toy problem. The dotted lines represent level sets of the fitness function. At generation n (a), the best individuals according to the fitness function are selected (b). These individuals are then the basis of a reproduction process (c) leading to a new generation (d). The process can then continue with step (b) for the new generation. Notice that generation n+1 has progressed towards better values of the fitness function.
The mutation operator is meant to propose variations of current individuals to allow exploration of the search space over time. However, the mutation operator takes the best individuals of the previous population as input, thus it should only propose small variations, exploiting the fact that previously selected points have (relatively speaking) a higher fitness. Preferring larger variations (to move faster in parameter space) or smaller variations (to better exploit the fitness of the best current individuals) is a recurring problem commonly referred to as the exploration-exploitation dilemma.
The mutation operator can be interpreted as defining a neighborhood on individuals, where the function $f$ is assumed to have small variations. Possible mutations then correspond to nearby individuals and are thus guaranteed to have similar $f$-values (if the assumption holds). This assumption that the $f$-values do not differ to much after mutation is what makes the so-called exploitation possible.
However, w.r.t. the algorithm, the important notion of neighborhood is on populations, not on individuals. If the reproduction step creates a few unfit individuals, they will not be selected for reproduction and therefore cannot impact later generations. What matters is that the reproduction process generates a population close to the previous one as a whole.
The mutation operator implicitly defines a neighborhood on populations by allowing a small variation of each individual. However, there are some settings where nearby populations can also be obtained by allowing the combination of several individuals into a new one. when such an operator exists, it is called a cross-over operator.
This implicit definition of neighborhood in parameter space makes evolutionary algorithms particularly efficient when the parameter space is discrete such as, sequences, trees or graphs, where the usual notion of distance is rarely helpful.
However, despite their effectiveness, evolutionary algorithms have long been criticized for their lack of theoretical justification. A point which is addressed in the next section.