## 14 启发式算法

2019-06-07 13:35 CST
2019-08-20 21:27 CST
CC BY-NC 4.0

## 6.1 Introduction

A heuristiiic algorithm in a very general sense is a consitent algorithm for an optimization problem that is based on some transparent (usually simple) strategy (idea) of searching in the set of all feasible solutions, and that does not guarantee finding any optimal solution.

In a narrow sense a heuristic is a technique providing a cosistent algorithm for which nobody is able to prove that it provides feasible solutions of a reasonable quality in a reasonable (for instance, polynomial) time, but the idea of the heuristic seems to promise good behavior for typical instances of the optimization problem considered.

Another general property of heuristics (independent of in which sense one specified this term) is that they are very robust. This means that one can apply any heuristic technique to a large class of optimization problems, even if these problems have very different combinatorial structures.

In this book we consider the following very narrow definition of the term heuristic:

A heuristic is a robust technique for the design of randomized algorithms for optimization problems, and it provides randomized algorithms for which one is not able to guarantee at once the efficiency and the quality of the computed feasible solutions, even not with any bounded constant probability $p > 0$.

## 6.2 Simulated Annealing

### 6.2.1 Basic Concept

In condensed matter physics, annealing is a process for obtaining lowenergy states of a solid in a heat bath. The physical process of annealing consists of the following two steps:

1. The temperature of the heat bath is increased to a maximum value at which the solid melts. This causes all the particles to arrange themselves randomly.
2. The temperature of the heat bath is slowly decreased according to a given cooling schedule until a low-energy state of the solid (a perfect crystal structure) is achieved.

If one has fixed a neighborhood $Neigh$ for a minimization problem $U = \Sigma_O, \Sigma_O, L, L_I, \mathcal{M}, cost, \texttt{minimum})$, then the simulated annealing algorithm can be described as follows.

### Theory and Experience

Simulated annealing has become a successful, widely used method in combinatorial optimization. The main reasons for this success are:

• the simulated annealing algorithm is based on a simple, transparent idea and it is easy to implement,
• simulated annealing is very robust and can be applied for almost all (combinatorial) optimization problems,
• simulated annealing behaves better than local search (that also has the above formulated positive properties - simplicity and robustness) due to the replacement of the deterministic acceptance criterion of local search by a stochastic criterion.

Theorem 6.2.2.1 Let $U$ be a minimization problem and let $Neigh$ be a neighborhood for $U$. The asymptotic covergence of the simulated annealing algorithm for an input $x$ is guaranteed if the following conditions are satisfied.

1. Every feasible solution from $M(x)$ is reachable from every feasible solution from $M(x)$, and
2. the initial temperature $T$ is at least as large as the depth of the deepest local, nonglobal minimum.

The asymptotic convergence means that one reaches a global optimum with probability tending to 1 with the number of iteration steps. Simulated annealing will find itself at a global optimum an increasing percentage of the time as time grows.

There exist several further theoretical results saying under which assumption about the neighborhoods and cooling schemes the simulated annealing algorithm asymptotically converges to the set of optimal solutions. Typical requirements are the symmetricity of the neighborhoods, the uniformity of neighborhoods, $$\vert Neigh_x(\alpha) \vert = \vert Neigh_x(\beta) \vert \text{ for all } \alpha, \beta \in \mathcal{M}(x),$$ and slow increase of $T$ by specific cooling schedules.

In contrast to local search, simulated annealing can leave local optima, one prefers small (simply definable) neighbor hoods. This enables an efficient execution of one iterative step and so this strategy enabling the performance of many iterative steps usually provides better solutions than the strategy of using large neighborhoods and a small number of iterative steps. Theoretical results show that it is prudent to avoid neighborhoods with the following structural properties:

• spiky structure (topography),
• deep troughs,
• large plateau-like areas.

In general, it is not easy to assure nice structures of solution spaces, because for every neighborhood there may exist a pathological problem instance that has a very wild topology.

The rest of this section is devoted to cooling schedules. This part can be viewed as a production of a sequence of almost random feasible solutions. If one speaks about cooling schedule it means that the following parameters have to be specified:

• an initial temperature $T$,
• a temperature reduction function $f(T, t)$ (also called decrement function or cooling ratio), and
• a termination condition (a value term such that simulated annealing stops for $T < term$).

#### Choice of the Initial Temperature

The inital value $T$ has to be large enough to allow all transitions to be accepted. One possibility is to take $T$ as the maximal difference in the cost between any two neighboring solutions.

Another pragmatic possibility is to start with any value $T$ and, after choosing a neighbor ($\beta$ of the initial situation $\alpha$, to increase $T$ in such a way that ($\beta$ will be accepted with a probability of almost 1. Doing so for a number of first iterative steps one can get a reasonable initial value $T$.

#### Choosing the Temperature Reduction Function

The typical choice of the temperature reduction function is to multiply $T$ by some constant $r$, $0.8 < r < 0.99$.

Another frequent choice of the temperature reduction is $$T_k = \dfrac{T}{\log_2 (k + 2)}.$$

The number $d$ (the number of iterations for any fixed temperature) is usually chosen as the size of the neighborhood.

#### Termination Condition

One possible termination condition is independent of $T$ and says that simulated annealing can stop when there is no change in the cost of the solutions for a time. Another possibility is to take a fixed constant term and to stop when $T \leqslant term$.

In thermodynamics one considers $$term \leqslant \dfrac{\varepsilon}{\ln \left[ (\vert \mathcal{M}(x) \vert - 1) / p \right]},$$ where $p$ is the probability of getting a feasible solution with an $\varepsilon$-approximation ratio.

One does not need to spend too much time with the choice between cooling schemes with the same cooling rate, but one should concentrate on the choice of the initial temperature $T$ and on the choice of a sufficiently large number d of iterations with a fixed temperature.

#### General Observations

• Simulated annealing has the potential to provide feasible solutions of high quality but at the cost of enormous computational efforts.
• The quality of the output does not essentially depend on the choice of the initial feasible solution (which is the case for local search).
• The essential parameters for the success of simulated annealing are the reduction rate and the neighborhood. A lot of experimental work is necessary to adjust the parameters for a particular problem well.
• The average case time complexity of simulated annealing is close to the worst case time complexity.
• Considering the same neighborhood, simulated annealing behaves substantially better (higher quality in the same time) than local search or multistart local search.

#### Applications to Specific Optimization Problems

• For graph positioning problems (Max-Cut, Min-Cut, etc.), the simulated annealing performs better, with respect to both quality of the solutions and time complexity, than the Kernighan-Lin’s variable-depth search algorithm.
• For several engineering problems, for instance from the area of VLSI design, the simulated annealing method seems to be the best approach and it even beats nontrivial approximation algorithms.
• For TSP and similar problems, simulated annealing is weak and there are usually several different approaches that provide substantially better solutions in a shorter time.

## 6.3 Genetic Algorithms

### 6.3.1 Basic Concept

Genetic algorithms are based on the analogy to optimization in the process of evolution of a population of individuals. The main paradigms of the considered evolutionary process are the following ones:

1. The individual of the population can be represented by strings (or vectors) over a finite alphabet, where a string is the code of the genetic structure of a chromosome (DNA sequence).
2. To create a new individual one needs two parents on which the well-known genetic operation crossover is applied. The operation crossover performs an exchange of some sections of the parents' chromosomes, i.e., the “child” gets parts of the genetic information from both parents (for instance, the first half of the chromosomes of the first parent and the second half from the second parent).
3. There exists a fitness value for every individual, which judges the quality (evolutionary maturity) of the individuals. Parents are randomly chosen from the population, and the individuals with a high fitness value have higher probabilities of being chosen to be parents than the individuals with a low fitness value.
4. The genetic operation of mutation is randomly applied to the individuals. This operation causes a random modification of a local part of the chromosomes.
5. The individuals may pass away. Their lifetime is strongly connected to their fitness values.

There are two main differences between this scenario and simulated annealing.

• The first one is that one works with sets of feasible solutions instead of working with one feasible solution only. This may be viewed as an advantage of genetic algorithms because they may produce a collection of high-quality feasible solutions and there are applications that require the production of several different solutions.
• The second difference is connected with the first one. Genetic algorithms cannot be viewed as pure local search because of the creation of new individuals by the crossover operation. If the new individual gets the genetic code from both parents in a balanced way, then there is no small (local) neighborhood that enables its generation.

The first thing that has to be done before applying the concept of genetic algorithms is to choose a suitable representation of feasible solutions as well as to determine the crossover operation on this representation in such a way that it maps two feasible solutions into two feasible solutions.

The mutation operation can be viewed as a random choice from a probably small neighborhood and the randomized choice of individuals for the next generation can be viewed as a randomized acceptance criterion on the new individuals. By suitable choices of all these probabilities there is a possibility to prove similar asymptotic convergence results as for simulated annealing.

The second result analyzing the behavior of genetic algorithms is the socalled Schema Theorem that tries to find practically formal arguments for explaining the intuition of a reasonable behavior of genetic algorithms.

Our aim is now to show that the cardinality of $Schema(s) \cap P$ grows relative to $\vert P \vert$ if $s$ has a high (at least greater than 1) fitness ratio.

Theorem 6.3.1.6 (The Schema Theorem for GAS). For every schema $s$ and every $t = 1, 2, \dots$, the expected number of individuals from $Schema(s)$ in the $(t + 1)$-st population $P_{t+1}$ is $$E[Z_{t+1}] \geqslant Fit-ratio(s, P_t) \cdot \dfrac{n - length(s)}{n} \cdot \left( 1 - order(s) \cdot pr_m \right) \cdot \vert P_t \cap Schema(s) \vert.$$

Thus, the ideal situations for a genetic algorithm are those where short schemata combine to form better and better solutions. The assumption that this corresponds to reality is known as the so-called building-block hypothesis.

### 6.3.2 Adjustment of Free Parameters

We discuss the following free parameters of a genetic algorithm:

• population size,
• selection of the initial population,
• fitness estimation and selection mechanism for parents,
• representation of individuals and the crossover operation,
• probability of mutation,
• selection mechanism for a new population,
• stop criterion.

#### Population Size

It is obvious (like in the case of multistart simulated annealing) that

1. a small population size increases the risk of converging to a local optimum whose fitness may be essentially smaller than the fitness of a global optimum, and
2. large population sizes increase the probability to reach a global optimum.

On the other hand, the amount of computer work grows with the size of the population.

Several practitioners believe that the size of 30 individuals is quite adequate for several problems, while other experimental works suggest taking the population size from the interval $[n,2n]$ for problem instances of size $n$. In general, practitioners prefer small population sizes in order to be competitive with other approaches with respect to the time complexity.

#### Selection of the Initial Population

One frequently used possibility is to take a random selection, where every individual (feasible solution) has the same probability of being chosen. Experience shows that it can be helpful to take some precomputed high-quality feasible solutions into the initial population, because they can speed up the convergence of genetic algorithms. This precomputation can be done by local search or any other fadt heuristic. But if the selection is done completely deterministically, one risks a fadt convergence to some local optimum. Probably the best possibility is to combine random choice of individuals with a precomputation of some individuals with high fitness.