(0)]. The state of the vehicle at any time instant can be computed conclusively once the initial orientation $\phi(0)$ is specified. The location of the vehicle used as measurement data, $[x(k), y(k), \phi(k)]$, can also be computed from the absolute sensor readings at all times. Since the data obtainable from the gyroscope is the rate of change of the vehicle orientation, it must be integrated over time to obtain an estimate
132
T. Furukawa, et al.
Fig. 6.4. Flowchart of simulation.
of the vehicle state. Due to noise present in the sensors as well as inaccuracies of the vehicle model, the error in this estimate gradually increases. Thus, information from external sensors is measured periodically, and the errors accumulated during each period are incorporated into the state estimate using a Kalman filter based estimator, yielding a corrected, more accurate state estimate. Note that this information may not be available for extended periods of time, depending on the environment in which the vehicle operates; GPS signals, for instance, are prone to blackout near buildings and other structures that obstruct or reflect radio signals. In such situations, vehicle navigation relies purely on the estimates obtained from the internal sensors and the vehicle model. Therefore, the availability of an accurate vehicle model with accurate kinematic parameters is extremely valuable for the proper functioning of an autonomous vehicle navigation system.
Application of MOEAs in Autonomous Vehicles Navigation
133
Fig. 6.5. Navigation system of an autonomous vehicle.
6.3. Parameter Identification of Autonomous Vehicles

6.3.1. Problem Formulation
By observing the autonomous vehicles in the last section, the parameter identification problem of concern can be characterized as follows:
• Parameters to be identified are $\mathbf{x}^T = [c_1, c_2, c_3, l, b, \phi(0), r, \delta] \in \mathbb{R}^8$.
• Errors in position and orientation of the vehicle must be minimized to identify the parameters.
• For the Kalman filter based estimator, the predictor model needs to be accurate only over each short time period between the receipt of external sensor readings.
Consideration of all these characteristics yields the following formulation as a multi-objective optimization problem:

$\mathbf{f}(\mathbf{x})^T = [f_{pos}(\mathbf{x}), f_{ori}(\mathbf{x})] \to \min,$   (8)
where, to be accurate over each short time period, the objective functions $\mathbf{f}(\mathbf{x}): \mathbb{R}^8 \to \mathbb{R}^2$ are given by

$f_{pos}(\mathbf{x}) = \sum_{i=1}^{n_p} \sum_{j=1}^{k_f'} \left\| \hat{x}(i \cdot k_f' + j) - x(i \cdot k_f' + j) \right\|^2 + \left\| \hat{y}(i \cdot k_f' + j) - y(i \cdot k_f' + j) \right\|^2,$

$f_{ori}(\mathbf{x}) = \sum_{i=1}^{n_p} \sum_{j=1}^{k_f'} \left\| \hat{\phi}(i \cdot k_f' + j) - \phi(i \cdot k_f' + j) \right\|^2,$   (9)
134
T. Furukawa, et al.
and

$\hat{x}(i \cdot k_f') = x(i \cdot k_f'), \quad \hat{y}(i \cdot k_f') = y(i \cdot k_f'), \quad \hat{\phi}(i \cdot k_f') = \phi(i \cdot k_f'), \quad i = 1, \ldots, n_p,$   (10)
where $k_f'$ is the number of iterations in each period used for further autonomous navigation, and $n_p$ is the number of partitions of the vehicle operation. The total number of iterations is given by $k_f = k_f' \cdot n_p$.
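For concreteness, the two objective functions of Eq. (9) can be sketched in Python as follows. This is an illustrative reading of the formulation, not the authors' code; the row layout of the state arrays ([x, y, phi]) and the 0-based indexing are assumptions.

```python
def position_orientation_errors(sim, meas, kf_prime, n_p):
    """Sketch of Eq. (9): accumulate squared position and orientation
    errors over n_p partitions of kf_prime iterations each.
    `sim` and `meas` are lists of [x, y, phi] states (simulated vs.
    measured), with at least n_p * kf_prime entries."""
    f_pos, f_ori = 0.0, 0.0
    for i in range(n_p):              # partitions (i = 1..n_p in the text)
        for j in range(kf_prime):     # iterations within a partition
            k = i * kf_prime + j      # 0-based index into the data
            dx = sim[k][0] - meas[k][0]
            dy = sim[k][1] - meas[k][1]
            dphi = sim[k][2] - meas[k][2]
            f_pos += dx ** 2 + dy ** 2   # position error term
            f_ori += dphi ** 2           # orientation error term
    return f_pos, f_ori
```

Per Eq. (10), the simulated state would be reset to the measured state at the start of each partition before `sim` is generated.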
6.3.2. A General Framework for Searching Pareto-Optimal Solutions
Fig. 6.6 shows the flowchart of the framework of the multi-objective optimization proposed in this chapter. In order to find multiple solutions, the multi-objective optimization searches with $\lambda$ multiple points, i.e.,

$X(K) = \{\mathbf{x}_1^K, \ldots, \mathbf{x}_\lambda^K\} \in (\mathbb{R}^n)^\lambda,$   (11)

where $\mathbf{x}_i^K$ is the $i$th search point at the $K$th generation. The initial population, $X(0)$, is generated randomly within a specified range $[\mathbf{x}_{min}, \mathbf{x}_{max}]$. Each objective function value $f_j(\mathbf{x}_i^K)$ is then calculated with each parameter set $\mathbf{x}_i^K$, finally yielding

$F(K) = \{\mathbf{f}(\mathbf{x}_1^K), \ldots, \mathbf{f}(\mathbf{x}_\lambda^K)\}.$   (12)
Unlike other MOEAs, two scalar criteria are evaluated for each search point in the proposed framework. One is the rank in Pareto-optimality, as usual:

$\Theta(K) = \{\theta(\mathbf{x}_1^K), \ldots, \theta(\mathbf{x}_\lambda^K)\},$   (13)

where $\theta: \mathbb{R}^m \to \mathbb{N}$, and the other is a positive real-valued scalar objective function, or fitness, which is derived by taking the rank into account:

$\Phi(K) = \{\phi(\mathbf{x}_1^K), \ldots, \phi(\mathbf{x}_\lambda^K)\},$   (14)
where the fitness is given as a function of the rank and the objective values:

$\phi(\mathbf{x}_i^K) = \phi\left(\theta(\mathbf{x}_i^K), \mathbf{f}(\mathbf{x}_i^K)\right),$   (15)

its concrete form being given by Eqs. (18)-(22) in Section 6.4.
Fig. 6.6. Flowchart of multi-objective evolutionary algorithms.
where $s$ is the search operator, which creates the new population $X(K+1)$ from $X(K)$; the concrete search methods are presented in Section 6.4.2. Once the iterative computation is enabled, we want to find as many effective Pareto-optimal solutions as possible, so that the solution space can be mapped out. Another technique proposed here is a Pareto pooling strategy, in which the set of Pareto-optimal solutions created in the past is pooled as $P(K)$ alongside the population of search points $X(K)$. The process of the Pareto pooling technique is as follows. All the Pareto-optimal solutions obtained in the first generation are saved in this storage, i.e., $P(0) = X(0)$. From the second generation, the newly created Pareto-optimal solutions in the optimization loop, $X(K+1)$, are compared to the stored Pareto-optimal solutions $P(K)$, and the new set of Pareto-optimal solutions $P(K+1)$ is saved in the storage, as illustrated in Fig. 6.7. Some Pareto-optimal solutions may be identical or very close to an existing point. The storage of such solutions is simply a waste of memory, so they are discarded if they are closer than a resolution set a priori. The creation of the new population and the Pareto-optimal solutions is repeated until a terminal condition is satisfied.
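The Pareto pooling strategy just described can be sketched as follows. This is a hypothetical minimal implementation, not the authors' code; the entry layout (parameter vector paired with objective vector) and the use of an infinity-norm closeness test for the resolution check are assumptions.

```python
def dominates(fa, fb):
    """fa dominates fb (minimization): no worse in every objective,
    strictly better in at least one."""
    return all(a <= b for a, b in zip(fa, fb)) and any(a < b for a, b in zip(fa, fb))


def update_pareto_pool(pool, candidates, resolution=1e-6):
    """Sketch of the Pareto pooling strategy: merge newly found points
    into the stored set P(K), keep only non-dominated ones, and discard
    points closer than `resolution` to an already kept point.
    Each entry is a pair (x, f): parameters and objective vector."""
    merged = pool + candidates
    kept = []
    for x, f in merged:
        # drop points dominated by any other stored or new point
        if any(dominates(g, f) for _, g in merged if g is not f):
            continue
        # drop near-duplicates of already kept points (a priori resolution)
        if any(max(abs(a - b) for a, b in zip(x, y)) < resolution
               for y, _ in kept):
            continue
        kept.append((x, f))
    return kept
```

In this sketch `P(K+1) = update_pareto_pool(P(K), X(K+1))`, matching the merge-compare-store cycle of Fig. 6.7.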
Fig. 6.7. Creation of Pareto-optimal solutions.
6.3.3. Selection of a Single Solution by CoGM

Fig. 6.8 illustrates Pareto-optimal solutions where two objective functions $\mathbf{f} = [f_1, f_2]^T$ are minimized to identify three parameters $\mathbf{x} = [x_1, x_2, x_3]^T$. As a two-dimensional function space and a three-dimensional parameter space are still easy to visualize, one may incorporate human knowledge into computational knowledge-based techniques, such as expert systems and fuzzy logic²⁴, for automatic selection of a single solution. However, if the numbers of objective functions and parameters are considerably large, the knowledge to be constructed is immense, and such techniques are no longer practical. In this case, one prominent approach is to select the solution residing at the center of the solution space, since this solution is robust. The authors here propose a technique in which the solution closest to the center of gravity is chosen. Let the Pareto-optimal solutions finally obtained
be $\mathbf{x}^i, \forall i \in \{1, \ldots, q\}$. If each solution is evaluated in a scalar manner, i.e., $\varphi(\mathbf{x}^i)$, the center of gravity is in general given by

$\bar{\mathbf{x}} = \frac{\sum_{i=1}^{q} \varphi(\mathbf{x}^i)\,\mathbf{x}^i}{\sum_{i=1}^{q} \varphi(\mathbf{x}^i)}.$   (16)

As the Pareto-optimal solutions must be evaluated equally, we can consider that all the Pareto-optimal solutions possess the same scalar value, i.e., $\varphi(\mathbf{x}^1) = \cdots = \varphi(\mathbf{x}^q)$. No matter what the value is, the center of gravity then takes the form:

$\bar{\mathbf{x}} = \frac{\sum_{i=1}^{q} \mathbf{x}^i}{q}.$   (17)
The effectiveness of the center-of-gravity method cannot be proved theoretically, but it is widely accepted, as it has been commonly used in fuzzy logic²⁴ to find a solution from a solution space described by fuzzy sets. The adoption of a clustering algorithm can further increase the reliability of the solution²⁵.
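Under the equal-weight assumption of Eq. (17), CoGM reduces to a mean-and-nearest-neighbour rule, sketched below. This is illustrative only; the use of Euclidean distance for "closest" is our assumption.

```python
def cogm_select(pareto_x):
    """Sketch of CoGM: compute the center of gravity of the Pareto-optimal
    parameter vectors (Eq. (17), equal weights) and return the stored
    solution closest to it, together with the centroid itself."""
    q = len(pareto_x)
    n = len(pareto_x[0])
    # Eq. (17): plain component-wise mean of the solutions
    centroid = [sum(x[d] for x in pareto_x) / q for d in range(n)]

    def dist2(x):  # squared Euclidean distance to the centroid
        return sum((a - b) ** 2 for a, b in zip(x, centroid))

    return min(pareto_x, key=dist2), centroid
```

Returning an actual Pareto-optimal solution (rather than the centroid itself) matters: the centroid of a non-convex Pareto set need not be Pareto-optimal.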
Fig. 6.8. Process of deriving a single solution.
6.4. Multi-Objective Optimization

6.4.1. Evaluation of Functions

6.4.1.1. Rank Function

Fig. 6.9 depicts the process used to rank the search points and thereby derive $\Theta(K)$ in Eq. (13). The process is based purely on an elimination rule. First, every objective function value at every search point, $f_j(\mathbf{x}_i^K), \forall i \in \{1, \ldots, \lambda\}, \forall j \in \{1, \ldots, m\}$, is calculated, and the Pareto-optimal set in the population is ranked No. 1, i.e., $\theta(\mathbf{x}_i^K) = 1$ if the search point $\mathbf{x}_i^K$ is in the Pareto-optimal set. The group of search points ranked No. 1 is denoted as $G(1)$ in the figure. The points with rank No. 1 are then eliminated from the population, and the Pareto-optimal set in the remaining population is ranked No. 2, $\theta(\mathbf{x}_i^K) = 2$. Ranking continues in the same fashion until all the points are ranked²⁶.
Fig. 6.9. Ranking process.
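The elimination rule can be sketched as repeated extraction of non-dominated fronts (minimization assumed; this is a hypothetical helper, not the authors' code):

```python
def rank_by_elimination(F):
    """Sketch of the ranking rule: find the Pareto-optimal set of the
    current population, assign it the current rank, remove it, and repeat
    until every point is ranked. `F` is a list of objective vectors
    (minimization); returns the rank of each point."""
    def dominates(fa, fb):
        return (all(a <= b for a, b in zip(fa, fb))
                and any(a < b for a, b in zip(fa, fb)))

    ranks = [0] * len(F)
    remaining = set(range(len(F)))
    rank = 1
    while remaining:
        # non-dominated set within the points still remaining
        front = {i for i in remaining
                 if not any(dominates(F[j], F[i]) for j in remaining if j != i)}
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks
```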
6.4.1.2. Fitness Function

The evaluation of the fitness of each search point starts with finding the best and worst values of each objective function among the population:

$f_{best_j} = \min\{f_j(\mathbf{x}_i^K) \mid \forall i \in \{1, \ldots, \lambda\}\}, \quad f_{worst_j} = \max\{f_j(\mathbf{x}_i^K) \mid \forall i \in \{1, \ldots, \lambda\}\}.$   (18)

If we temporarily define the fitness as

$\phi_j'(\mathbf{x}_i^K) = \frac{f_{worst_j} - f_j(\mathbf{x}_i^K)}{f_{worst_j} - f_{best_j}},$   (19)

we get the normalized condition:

$0 \le \phi_j'(\mathbf{x}_i^K) \le 1,$   (20)
and this allows us to treat the fitness of each objective function on the same scale. The fitness of points with the same rank has to be the same, and the true fitness of each objective function is thus defined by averaging over the points of equal rank:

$\phi_j(\mathbf{x}_i^K) = \frac{1}{|G(r)|} \sum_{\mathbf{x} \in G(r)} \phi_j'(\mathbf{x}), \quad \mathbf{x}_i^K \in G(r),$   (21)
The fitness of each individual can be conclusively calculated as:

$\phi(\mathbf{x}_i^K) = \sum_{j=1}^{m} w_j \phi_j(\mathbf{x}_i^K),$   (22)
where $w_j \in [0, 1]$ is a weighting factor, the value of which varies depending on the search methods presented in the next subsection. The fitness value then lies within the range:

$0 \le \phi(\mathbf{x}_i^K) \le \sum_{j=1}^{m} w_j.$   (23)
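The fitness evaluation of Eqs. (18)-(22) can be sketched as below. Averaging the normalized values over points of equal rank is used here as one plausible realization of the requirement that points with the same rank receive the same fitness; the function and variable names are our own.

```python
def fitness_values(F, ranks, weights):
    """Sketch of Eqs. (18)-(22): normalize each objective between its
    best and worst value in the population (Eq. (19)), give equal-rank
    points the same per-objective fitness by averaging within each rank
    group, then aggregate with the weights w_j (Eq. (22))."""
    m, lam = len(F[0]), len(F)
    fit = [0.0] * lam
    for j in range(m):
        col = [f[j] for f in F]
        best, worst = min(col), max(col)
        span = (worst - best) or 1.0          # guard against a flat objective
        phi_prime = [(worst - c) / span for c in col]   # Eq. (19), in [0, 1]
        for r in set(ranks):                  # equal fitness within a rank
            idx = [i for i in range(lam) if ranks[i] == r]
            mean = sum(phi_prime[i] for i in idx) / len(idx)
            for i in idx:
                fit[i] += weights[j] * mean   # Eq. (22)
    return fit
```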
6.4.2. Search Methods

6.4.2.1. MCEA

The process of finding $\mathbf{x}_i^{K+1}$ from $\mathbf{x}_i^K$ in this approach is conducted algorithmically in an evolutionary manner through two evolutionary operators, reproduction and selection²³. The reproduction, consisting of recombination and mutation, contributes to the creation of new search points inheriting some information from the old search points, whereas the selection guarantees that the search points as a whole move towards Pareto-optimal solutions.
As the approach is concerned with the identification of continuous parameters, the recombination and mutation adopt a continuous formulation. After all the search points are paired randomly, a pair of search points, $\mathbf{x}_\alpha^K$ and $\mathbf{x}_\beta^K$, go through the following recombination operation:

$\mathbf{x}_\alpha^K := (1 - \mu)\mathbf{x}_\alpha^K + \mu \mathbf{x}_\beta^K, \quad \mathbf{x}_\beta^K := (1 - \mu)\mathbf{x}_\beta^K + \mu \mathbf{x}_\alpha^K,$   (24)
where the parameter $\mu$ may be drawn from the normal distribution with mean 0 and standard deviation $\sigma$:

$\mu = N(0, \sigma^2),$   (25)

or simply from a uniform distribution:

$\mu = \mathrm{rand}(-\mu_{max}, \mu_{max}),$   (26)
with typically $0 < \mu_{max} \le 0.3$. The 'rand' operator returns a uniformly random value within the range specified by its arguments. The mutation can be achieved simply by implementing

$\mathbf{x}_i^K := \mathrm{rand}_{\{P_m\}}(\mathbf{x}_{min}, \mathbf{x}_{max})$   (27)

with small probability $P_m$²⁷. Note that the mutation may not be necessary when the parameter $\mu$ follows a normal distribution, since individuals can then be altered largely with a small probability, namely when the coefficient $\mu$ happens to be large. The new search points $\mathbf{x}_i^{K+1}$ are finally determined through the selection operation, which selects individuals of higher fitness proportionally more often than those of lower fitness for the next iteration, with the selection probability of each search point proportional to its fitness:

$p_i = \frac{\phi(\mathbf{x}_i^K)}{\sum_{l=1}^{\lambda} \phi(\mathbf{x}_l^K)}.$   (28)
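The recombination and mutation operators of Eqs. (24), (26) and (27) can be sketched as follows. The default values `mu_max=0.3` and `p_m=0.1` are assumptions for illustration (the text only bounds $\mu_{max}$ and calls $P_m$ "small").

```python
import random


def recombine(xa, xb, mu_max=0.3):
    """Sketch of Eq. (24) with the uniform choice of Eq. (26): blend a
    paired couple of search points with one random mixing coefficient mu."""
    mu = random.uniform(-mu_max, mu_max)
    xa_new = [(1 - mu) * a + mu * b for a, b in zip(xa, xb)]
    xb_new = [(1 - mu) * b + mu * a for a, b in zip(xa, xb)]
    return xa_new, xb_new


def mutate(x, x_min, x_max, p_m=0.1):
    """Sketch of Eq. (27): with small probability p_m, reset each
    parameter uniformly within its search range [x_min, x_max]."""
    return [random.uniform(lo, hi) if random.random() < p_m else xi
            for xi, lo, hi in zip(x, x_min, x_max)]
```

Note that for any $\mu$ the recombined pair preserves the component-wise sum of the parents, a property of the blend in Eq. (24).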
6.4.2.2. MOGM

The strength of the proposed framework is the introduction of the fitness $\Phi(K)$, as gradient-based search methods, which operate on a continuous function value and search much more efficiently than conventional evolutionary algorithms, can then be implemented. In order to yield well-distributed solutions, the weighting factor $w_j$ in Eq. (22) is chosen randomly in the range $[0, 1]$:

$w_j = \mathrm{rand}(0, 1).$   (29)
With this $\phi(\mathbf{x}_i^K)$, the next state of a search point is given by

$\mathbf{x}_i^{K+1} = \mathbf{x}_i^K + \Delta\mathbf{x}_i^K,$   (30)

where the step of the search point is determined by

$\Delta\mathbf{x}_i^K = \alpha\, \mathbf{d}(\mathbf{x}_i^K, \nabla\phi(\mathbf{x}_i^K)).$   (31)
In the equation, $\alpha$ is the search step length, searched iteratively as a subproblem by Wolfe's algorithm²⁹, whereas the mapping $\mathbf{d}$ outputs the direction of the search step. In the steepest descent (SD) method, the mapping is defined as

$\mathbf{d}_{SD}(\mathbf{x}_i^K, \nabla\phi(\mathbf{x}_i^K)) = \nabla\phi(\mathbf{x}_i^K).$   (32)

In the quasi-Newton (QN) method, the mapping is defined as

$\mathbf{d}_{QN}(\mathbf{x}_i^K, \nabla\phi(\mathbf{x}_i^K)) = A_K^{-1} \nabla\phi(\mathbf{x}_i^K),$   (33)
where $A_K \approx \nabla^2\phi(\mathbf{x}_i^K)$. The effectiveness of MCEA and MOGM is not investigated in this chapter owing to length limitations; it was previously demonstrated with various numerical examples, and the reader is referred to the report by Furukawa et al.³⁰ for details.

6.5. Application to Parameter Identification of an Autonomous Vehicle

The proposed technique was applied to the identification of the parameter set of the autonomous vehicle developed by the authors, where the vehicle tracked a path in a flat parking area for 100 seconds. The vehicle path created from GPS readings is shown in Fig. 6.10. Note that the x and y coordinate data are collected independently with respect to time. The gyroscope, steering encoder and velocity encoder readings are shown in Figs. 6.11-6.13. The information from these sensors was sub-sampled at 4 Hz to obtain a synchronous sequence of data for parameter identification. Although the parameters were well identified by both methods, this section shows the results obtained with MCEA. Table 6.29 lists the parameters used for simulation and for MCEA to execute the identification,
and the initial guess of the search space of the parameters to be identified is listed in Table 6.30 in the form of lower and upper boundaries, i.e., $\mathbf{x}_{min}$ and $\mathbf{x}_{max}$. The search space was chosen so as to place the original calibration value of each parameter at the center of the space, and the range was determined based on its reliability.
Fig. 6.10. Vehicle path created from GPS readings (x [m] - y [m]).
Fig. 6.11. Gyroscope readings (x: Time [sec] - y: Rate of change of orientation [rad/s]).
Fig. 6.14 shows the Pareto-optimal solutions in function space after 100
Fig. 6.12. Steering encoder readings (x: Time [sec] - y: Steering encoder counts).
Fig. 6.13. Velocity encoder readings (x: Time [sec] - y: Velocity encoder counts).
Table 6.29. Parameters for autonomous vehicle parameter identification.

Parameter                  Value
No. of generations         2500
Population (λ)             10
Mutation rate P_m          0.10
No. of partitions (n_p)    20
Time step                  0.05
generations. It is readily seen that the orientation objective is much smaller than the position objective in value, but that the solutions are well distributed, showing a smooth convex-shaped curve despite such a different
Table 6.30. Initial search space of parameters to be identified.

Parameter    l       b       c_1          c_2
x_min        3.10    0.85    4.50·10⁻⁴    -0.925
x_max        3.20    1.05    4.60·10⁻⁴    -0.900

Parameter    c_3          φ_0     r       δ
x_min        4.90·10⁻⁴    1.96    3.65    0.160
x_max        5.00·10⁻⁴    1.98    3.67    0.190
scale. Next, Pareto-optimal solutions in parameter space are depicted in Figs. 6.15-6.18. Although the parameter scales also differ from each other, the solutions in each graph show a characteristic distribution: from upper-left to lower-right in Figs. 6.15-6.17 and from lower-left to upper-right in Fig. 6.18.
Fig. 6.14. Pareto-optimal solutions of identification in function space (x: f_pos - y: f_ori).
Because of the high dimensionality of the parameter space, the final solution cannot be selected in parameter space manually. The CoGM was therefore used to select the final solution, which is listed in Table 6.31. Together with the parameters identified by the original calibration, the solutions having the minimum position and minimum orientation errors are also shown in the table for comparison. Since rather monotonic distributions of solutions in parameter space are obtained for this example, the solution
Fig. 6.15. Pareto-optimal solutions of identification in parameter space (x: l [m] - y: b [m]).

Fig. 6.16. Pareto-optimal solutions of identification in parameter space (x: c_3 - y: c_1).
chosen by the center-of-gravity method is well within the solution space.

Table 6.31. Parameters identified.

Parameter         l        b         c_1           c_2
Chosen            3.152    0.9441    4.536·10⁻⁴    -0.9149
Min. pos. err.    3.145    0.9641    4.513·10⁻⁴    -0.9203
Min. ori. err.    3.160    0.8831    4.549·10⁻⁴    -0.9154
Original          3.15     0.95      4.55·10⁻⁴     -0.9125

Parameter         c_3           φ_0      r        δ
Chosen            4.955·10⁻⁴    1.967    3.675    0.1801
Min. pos. err.    4.963·10⁻⁴    1.973    3.695    0.1807
Min. ori. err.    4.946·10⁻⁴    1.963    3.650    0.1770
Original          4.95·10⁻⁴     1.97     3.66     0.175
Fig. 6.17. Pareto-optimal solutions of identification in parameter space (x: c_2 - y: φ_0).

Fig. 6.18. Pareto-optimal solutions of identification in parameter space (x: r [m] - y: δ [rad]).
The simulation result using the chosen parameter set, which was used to calculate the objective function values, is shown in Fig. 6.19 together with the GPS data, denoted as 'Experiment'. The simulated path shows some accumulated errors, but it follows the GPS data well, clearly indicating that an appropriate parameter set has been identified. The remaining errors may be caused by slip of the vehicle and other inaccuracies of the model rather than by the parameters themselves. Since there is no way to investigate these errors with the current vehicle set-up, they are not discussed further in this chapter. To assess the appropriateness of this solution relative to the other Pareto-optimal solutions, simulation without correcting the path at every partition was conducted with the three Pareto-optimal solutions in Table 6.31. In order to see how robust the parameter set chosen through the proposed
Fig. 6.19. Simulation results with parameters chosen (x [m] - y [m]).
technique is, the simulation was conducted not only for the first 100 seconds, during which GPS data were used to find the solution, but also for the next 100 seconds. The simulation results with the three solutions are depicted in Figs. 6.20-6.25. The solution with the minimum position error and the chosen solution correlate well with the GPS data in comparison to the solution with the minimum orientation error. The orientation accuracy must also be investigated to find the most appropriate solution, and, in order to examine the results in more detail, the error values computed with the three Pareto-optimal solutions and with the original parameter set are listed in Table 6.32. It is first seen that, among the three Pareto-optimal solutions, the worst position and orientation errors in both the first and second 100 seconds belong to the solutions with the minimum orientation error and with the minimum position error, respectively. In particular, the orientation error of the minimum position error solution and the position error of the minimum orientation error solution, both in the second 100 seconds, are significantly large compared to the others. This is clearly caused by the fact that the other objective function was not taken into account; to obtain the minimum position error solution, for example, the objective function describing the orientation errors is ignored. The parameter set from the original calibration shows the largest errors in almost all items. This indicates the importance of parameter identification during vehicle operation.
The chosen solution is not the worst in any criterion, and it is even better than the minimum position error solution in the position error of the second 100 seconds. This characteristic persisted across different numerical examples. The fact that an accurate orientation of the vehicle at each iteration contributes to its accurate positioning may explain the positional accuracy of the chosen solution.
Fig. 6.20. Non-partitioned simulation results with parameters chosen (1st 100 seconds, (x [m] - y [m])).
Table 6.32. Position and orientation errors.

Solution          Pos. err. (1st 100 s)   Pos. err. (2nd 100 s)   Ori. err. (1st 100 s)   Ori. err. (2nd 100 s)
Chosen            710.66                  23,361                  1.5656                  316.52
Min. pos. err.    706.165                 37,992                  2.5941                  2,673.7
Min. ori. err.    719.83                  253,920                 1.2630                  8.5044
Original          721.45                  57,351                  2.7214                  3,172.4
6.6. Conclusions

A technique for identifying the parameter set of an autonomous vehicle during normal operation has been proposed. Formulating the parameter identification problem in a multi-objective fashion, where position and orientation errors are minimized, a multi-objective optimization method is
Fig. 6.21. Non-partitioned simulation results with parameters chosen (2nd 100 seconds, (x [m] - y [m])).
Fig. 6.22. Non-partitioned simulation results with minimum position error parameters (1st 100 seconds, (x [m] - y [m])).
first used to find Pareto-optimal parameter sets that minimize the two error functions. A framework for multi-objective optimization and two search methods, MCEA and MOGM, which find Pareto-optimal solutions of this class of multi-objective optimization problems efficiently, have been proposed.

Fig. 6.23. Non-partitioned simulation results with minimum position error parameters (2nd 100 seconds, (x [m] - y [m])).

Fig. 6.24. Non-partitioned simulation results with minimum orientation error parameters (1st 100 seconds, (x [m] - y [m])).

Finally, CoGM has been proposed to select a final parameter set from the Pareto-optimal solutions. The proposed technique was applied to parameter identification of an autonomous vehicle developed by the authors, and a solution was chosen
Fig. 6.25. Non-partitioned simulation results with minimum orientation error parameters (2nd 100 seconds, (x [m] - y [m])).

from the Pareto-optimal solutions derived by MCEA. The solution was compared to the original parameter set and to the other Pareto-optimal solutions, and its appropriateness in accuracy has been demonstrated. The parameter set identified by the proposed technique has further proven to increase the accuracy of simultaneous localization and map-building of an autonomous vehicle by an average of 11.3% in comparison to navigation with the original parameter set. This result indicates the overall effectiveness of the proposed technique for the parameter identification of an autonomous vehicle.

6.7. Acknowledgement

This work is supported by the ARC Centre of Excellence program, funded by the Australian Research Council (ARC) and the New South Wales State Government.

References

1. H. F. Durrant-Whyte, International Journal of Robotics Research 15(5), 407-440 (1986).
2. R. Madhavan, G. Dissanayake, H. F. Durrant-Whyte, J. M. Roberts, P. I. Corke and J. Cunningham, Mineral Resources Engineering 8(3), 313-323 (1994).
3. T. Pilarski, M. Happold, H. Pangels, M. Ollis, K. Fitzpatrick and A. Stentz, Proceedings of the 8th International Topical Meeting on Robotics and Remote Systems, April (1999). 4. S. Scheding, G. Dissanayake, E. Nebot and H. F. Durrant-Whyte, IEEE Transactions on Robotics and Automation 15(1), 85-95 (1999). 5. J. Borenstein and L. Feng, IEEE Transactions on Robotics and Automation 12(6), 869-880 (1996). 6. S. Singh and D. H. Shin, Vision and Navigation: The CMU Navlab C. E. Thorpe, Ed., Kluwer Press, 365 (1990). 7. T. Furukawa and G. Dissanayake, Engineering Optimization 34(4), 22-48 (2002). 8. C. A. C. Coello, International Journal of Knowledge and Information Systems 1(3), 269-308, (1999). 9. D. V. Van Veldhuizen and G. B. Lamont, Evolutionary Computation 8(2), 125-147, (2000). 10. R. Kumar and P. Rockett, Evolutionary Computation 10(3), 283-314, (2002). 11. A. Toffolo and E. Benini, Evolutionary Computation 11(2), 151-167, (2003). 12. L. Costa and P. Oliveira, Evolutionary Computation 11(4), 417-438, (2003). 13. Y. Bard, Nonlinear Parameter Estimation (Academic Press, New York, 1976). 14. L. C. W. Dixon, Nonlinear Optimisation (The English Universities Press, London, 1972). 15. W. H. Press, B. P. Flannery, S. A. Teukolsky and W. T. Vetterling, Numerical Recipes in C (Cambridge University Press, Cambridge, 1988). 16. G. L. Nemhauser, A. H. G. Rinnooy Kan and M. J. Todd, Handbooks in Operations Management Science Vol. 1 Optimization (Elsevier Science Publishers B.V., Amsterdam, 1989). 17. F. Hoffmeister and T. Baeck, Genetic Algorithms and Evolution Strategies: Similarities and Differences (Technical Report, University of Dortmund, Germany, Sys-1/92, 1992). 18. T. Baek and H.-P. Schwefel, International Journal of Evolutionary Computation 1(1), 1-23, (1993). 19. T. Furukawa and G. Dissanayake, Proceedings of the 71st JSME Annual Meeting 930(71), 509-510 (1993). 20. T. Furukawa and G. 
Yagawa, International Journal for Numerical Methods in Engineering 40, 1071-1090 (1997). 21. C. M. Fonseca and P. J. Fleming, Proceedings of the Fifth International Conference on Genetic Algorithms, (S. Forrest, Ed., Morgan Kaufmann, San Mateo, CA, 416-423, 1993). 22. C. M. Fonseca and P. J. Fleming, International Journal of Evolutionary Computation, 3(1), 1-16, (1993). 23. T. Furukawa, International Journal for Numerical Methods in Engineering 52, 219-238, (2001). 24. J. H. Holland, Adaptation in Natural and Artificial Systems (The University of Michigan Press, Michigan, 1975).
25. M. J. Jeong, S. Yoshimura, T. Furukawa, G. Yagawa and Y. J. Kim, Proceedings of Computational Engineering Conference 5, 231-234, (2000). 26. D. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning (Addison-Wesley, Reading, MA, 1989). 27. T. Kowaltczyk, T. Furukawa, S. Yoshimura and G. Yagawa, Inverse Problems in Engineering Mechanics, Eds. M. Tanaka and G. S. Dulikravich (Elsevier Science Publishers B.V., Amsterdam, 1998), 541-550. 28. J. E. Baker, Proceedings of the First International Conference on Genetic Algorithms and Their Applications, J. J. Grefenstette, Ed., 101-111, (1985). 29. P. Wolfe, Econometrica 27, 382-398, (1959). 30. T. Furukawa, S. Yoshimura and H. Kawai, Proceedings of the Fifth World Congress on Computational Mechanics (Eds. H.A. Mang, F.G. Rammerstorfer and J. Eberhardsteiner, Vienna University of Technology, 2002), 1-11.
CHAPTER 7 AUTOMATING CONTROL SYSTEM DESIGN VIA A MULTIOBJECTIVE EVOLUTIONARY ALGORITHM
K.C. Tan* and Y. Li** * Department of Electrical and Computer Engineering National University of Singapore 4 Engineering Drive 3, Singapore 117576 Republic of Singapore E-mail: eletankc@nus.edu.sg ** Center for Systems and Control & Dept. of Electronics and Electrical Engineering University of Glasgow Glasgow G12 8LT, UK
This chapter presents a performance-prioritized computer aided control system design (CACSD) methodology using a multi-objective evolutionary algorithm. The evolutionary CACSD approach unifies different control laws in both the time and frequency domains based upon performance satisfaction, without the need to aggregate different design criteria into a compromise function. It is shown that control engineers' expertise, as well as settings on goals or priorities expressing the preference on each performance requirement, can easily be included and modified on-line according to the evolving trade-offs, which makes the controller design interactive, transparent and simple for real-time implementation. Advantages of the evolutionary CACSD methodology are illustrated on a non-minimum phase plant control system, yielding a set of low-order Pareto-optimal controllers satisfying all the conflicting performance requirements in the face of system constraints.

7.1. Introduction

With rapid developments in linear time-invariant (LTI) control theories and algorithms in the past few decades, many control schemes, ranging from the most straightforward proportional plus integral plus derivative (PID), phase lead/lag and pole-placement schemes to more sophisticated optimal, adaptive and robust control algorithms, have become available to control engineers. Each of these control schemes, however, employs a different control characteristic or design technique that is often restricted ad hoc to one particular problem or addresses only a limited subset of performance issues. To design an optimal controller using these methods, control engineers need to select an appropriate control law that best suits the application at hand, and to determine a practical control structure with a set of optimal controller parameters that best satisfies the usually conflicting performance specifications in both the time and frequency domains. An effective design approach is to formulate the linear controller synthesis as meeting all types of performance requirements and constraints via numerical optimization, instead of via a specific control scheme or in a narrow problem domain. This approach of simultaneously addressing design specifications in both the time and frequency domains is, however, semi-infinite and generally not everywhere differentiable¹⁻⁶. Therefore conventional numerical approaches, which often rely on a smooth and differentiable performance index, can only address a small subset of the problem or must limit the type of the design specifications for convex optimization⁷,⁸, which forms the major obstacle to the development of a generalized numerical optimization package for practical control applications. In this chapter, a uniform CACSD methodology is presented to accommodate LTI control laws based on performance requirements and practical design constraints in both the time and frequency domains, without the need of linear parameterization or confining the design to a particular domain for convex optimization.
Unlike existing mutually independent and individual LTI control schemes, control engineers can easily address practical performance requirements such as rise time or overshoot in the time domain, and formulate robustness specifications such as disturbance rejection or plant uncertainty according to the well developed robustness theorems in the frequency domain, as desired. Developing such an optimal unified linear time-invariant control (ULTIC) system, however, requires a powerful and global multi-objective optimization technique to determine the multiple controller parameters simultaneously, in order to satisfy a set of usually conflicting design specifications in a multi-modal multi-objective design space. Complexity, nonlinearity and constraints in practical systems, such as voltage/current limits, saturation, transportation delays, noise or disturbance, cause the design problem space to be discontinuous and difficult to solve using conventional analytical methods or CACSD software packages. Current numerical methods employed in existing CACSD tools are based upon a priori gradient-guided approaches, which are often applicable only to a subset of design problems or useful only for control system analysis and simulation²,³. These tools are computationally intractable because, in the worst case, their computation time grows exponentially with the number of design parameters. They are incapable of delivering a global, high-dimensional and automated multi-objective design solution for an optimal ULTIC system. Since practical design specifications and constraints are often mixed or compete with each other, using such a CACSD package for optimal ULTIC designs often requires control engineers to go through numerous heuristic simulations and analyses before a 'satisfactory' design emerges. The simulation and analytical power of modern CACSD can, however, be utilized to achieve design automation of ULTIC systems if it is interfaced and coupled with powerful evolutionary based intelligent search tools. Sedgewick⁹ pointed out that one way to extend the power of a digital computer is to endow it with the power of intelligent non-determinism: to assert that when an algorithm is faced with a choice of search options, it has the power to intelligently 'guess' the right one. Artificially emulating Darwin's principle of 'survival of the fittest' in natural selection and genetics¹⁰, the evolutionary algorithm is such a non-deterministic polynomial (NP) computing technique, with the ability to replace the human 'trial-and-error' based iterative process by intelligent computer-automated design. Using such an evolutionary design optimization approach, control engineers' expertise can also be easily incorporated into the initial design 'database' for intelligent design-reuse to achieve faster convergence¹¹.
More importantly, such an evolutionary CACSD approach allows any mixed or sophisticated conflicting specifications and constraints in practical applications to be unified and addressed easily under one design banner: performance satisfaction. This chapter presents an MOEA application to CACSD design automation in ULTIC systems by unifying all LTI approaches under performance satisfaction in both the time and frequency domains. Unlike existing multi-objective optimization methods that linearly combine multiple attributes to form a composite scalar objective function, the MOEA incorporates the concept of Pareto domination to evolve a family of non-dominated solutions along the Pareto optimal frontier. Further, each of the objective components can have a different priority or preference to guide the optimization from individual design specifications, rather than manually pre-weighting the objective functions. Besides the flexibility of specifying a low-order controller structure to simplify the design and implementation tasks, the design approach also allows control engineers to examine and interplay different trade-offs among the multiple performance requirements. Such an evolutionary 'intelligent' CACSD methodology for optimal ULTIC design has been successfully applied to many control engineering applications [2-4]. The overall architecture of the evolutionary CACSD methodology for optimal ULTIC systems is presented in Section 2, which includes the ULTIC system formulation and the formation of various design specifications commonly adopted in practical applications. Validation of the methodology against a practical ULTIC design problem for a single-input single-output (SISO) non-minimum phase plant is given in Section 3. Conclusions are drawn in Section 4.

K.C. Tan and Y. Li
7.2. Performance Based Design Unification and Automation

Almost all types of LTI controllers take the form of a transfer function matrix, or its bijective state-space equation, when the design is eventually complete. The order and the coefficients of the transfer function, however, vary with the control law or with a compromise design objective chosen so as to satisfy certain design specifications. For example, a controller designed from the linear quadratic regulator (LQR) scheme tends to offer a minimized quadratic error with some minimal control effort, while an H-infinity controller provides robust performance with a minimal value of the mixed sensitivity function. Although the coefficients or orders of these two types of controllers may differ, the common purpose of both control laws is to devise an LTI controller that guarantees a closed-loop performance meeting certain customer specifications in either the time or the frequency domain. Therefore, a step towards the unification of LTI control laws is to cast the controller design as meeting practical performance specifications via a CACSD optimization approach, instead of via a particular control scheme or within a confined problem domain. This unified CACSD approach should eliminate the need to pre-select a specific control scheme for a given application, so as to form a performance-prioritized unified design that is easily understood by and applicable to practical control engineers. Further, it should be capable of incorporating performance specifications in both the time and frequency domains that engineers are familiar with, and of taking into account various system constraints [12-14].
7.2.1. The Overall Design Architecture
The overall evolutionary CACSD paradigm for ULTIC systems is illustrated in Fig. 7.1. As highlighted in the Introduction, design unification of LTI control systems can be formulated as an interactive multi-objective optimization problem that searches for a set of Pareto optimal controllers satisfying the often-conflicting practical performance requirements. Such a design optimization cycle accommodates three different modules: the interactive human decision-making module (control engineer), the optimization module (MOEA toolbox [15]) and the control module (system and specifications). According to the system performance requirements, as well as any a-priori knowledge of the problem at hand, control engineers may specify or select a set of desired specifications from a template [15] and form a multiple-cost function in the control module, which need not be convex or confined to a particular control scheme. These ULTIC design specifications can also incorporate different performances in both the time and frequency domains, or other system characteristics such as poles and zeros, as desired. Based on these performance specifications, the responses of the control system, which consists of the set of input/output signals, the plant model and the candidate controller recommended by the optimization module, are evaluated so as to determine the cost value for each design specification in the multiple-cost function. According to the evaluation result of the cost function in the control module, together with any design guidance such as goal and priority information from the decision-making module, the optimization module (MOEA toolbox [15]) automates the ULTIC design process and intelligently searches for the 'optimal' controller parameters that best satisfy the set of performance specifications.
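The three-module cycle just described can be sketched as a simple loop. Everything below (the toy cost function, the random search standing in for the MOEA toolbox, the goal test) is an illustrative assumption, not the toolbox's actual interface.

```python
import random

def evaluate_specs(controller_params):
    # Control module (toy stand-in): return one cost per design
    # specification. A real implementation would simulate the closed loop;
    # here synthetic quadratic costs are used purely for illustration.
    return [sum((p - t) ** 2 for p, t in zip(controller_params, target))
            for target in ([0.5, 0.5], [1.0, 0.0])]

def meets_goals(costs, goals):
    # Decision-making module: engineer-supplied goal vector.
    return all(c <= g for c, g in zip(costs, goals))

def design_cycle(goals, generations=200, seed=1):
    # Optimization module (stand-in for the MOEA): keep the best candidate
    # found so far and stop once every goal is satisfied.
    rng = random.Random(seed)
    best, best_costs = None, None
    for _ in range(generations):
        cand = [rng.uniform(0, 1), rng.uniform(0, 1)]
        costs = evaluate_specs(cand)
        if best is None or sum(costs) < sum(best_costs):
            best, best_costs = cand, costs
        if meets_goals(best_costs, goals):
            break
    return best, best_costs
```

In the chapter's setting, the engineer supervises this loop interactively, tightening or relaxing the goal vector between runs.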
On-line optimization progress and simulation results, such as the design trade-offs or convergence trace, can be displayed graphically and fed back to the decision-making module. In this way, the overall ULTIC design environment can be supervised and monitored effectively, which helps control engineers in taking any further actions, such as examining the competing design trade-offs, altering the design specifications, adjusting goal settings that are too stringent or too generous, or even modifying the control and system structure if necessary. This man-machine interactive design and optimization process may proceed until all design specifications have been met or the control engineer is satisfied with the control performance. One merit of such an approach is that the design problem, as well as the interaction with the optimization process, is closely linked to the environment of that particular application. A control engineer, in most cases, is not required to deal with any details related to the optimization algorithm, or to worry about any possible ill-conditioning problems in the design [1].

Fig. 7.1. A general CACSD architecture for evolutionary ULTIC systems.

7.2.2. Control System Formulation
A general control system configuration for posing performance specifications is shown in the control module of Fig. 7.1. The operator G is a 2x2 block transfer matrix mapping the inputs w and u to the outputs z and y:

    [z]   [G11  G12] [w]
    [y] = [G21  G22] [u]        (1)
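Numerically, eqn (1) is just a block product; the sketch below evaluates the two outputs at a fixed complex frequency, treating each block as a scalar transfer function (the four toy transfer functions are illustrative assumptions, not the chapter's plant).

```python
# Evaluate z and y of eqn (1) at one complex frequency s, with each block
# G_ij taken as a scalar transfer function (toy first-order examples).
G11 = lambda s: 1.0 / (s + 1.0)
G12 = lambda s: 2.0 / (s + 2.0)
G21 = lambda s: 1.0 / (s + 3.0)
G22 = lambda s: 1.0 / (s + 4.0)     # plays the role of the nominal plant G0

def outputs(s, w, u):
    # [z, y]^T = [[G11, G12], [G21, G22]] [w, u]^T
    z = G11(s) * w + G12(s) * u
    y = G21(s) * w + G22(s) * u
    return z, y

z, y = outputs(1j, 1.0, 0.5)
```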
The actual process or plant is represented by the sub-matrix G22, i.e. the nominal model G0, which is linear time-invariant and may be unspecified except for the constraint of lying within a given set P ('uncertainty modelling'). H is the ULTIC controller to be designed in order to satisfy all specifications and constraints in the system, as given by

    H_ij = (p_ij,n s^(n-m-1) + ... + p_ij,m+2 s + p_ij,m+1) / (p_ij,m s^m + ... + p_ij,1 s + p_ij,0)        (2)
where i, j denote the respective elements in the transfer matrix and p_ij,k are real positive coefficients, for all k in {0, 1, ..., n}, to be determined in the design; y is the signal that the controller has access to, and u is the output of the controller, usually subject to a hard saturation constraint such as a limited drive voltage or current. The mapping from the exogenous inputs w (disturbances, noise, reference commands, etc.) to the regulated outputs z (tracking errors, control inputs, measured outputs, etc.) contains all the input-output maps of interest [12]. As illustrated in Fig. 7.1, the evolutionary CACSD task for ULTIC systems is to find an optimal controller H that minimizes a set of performance requirements, in terms of the magnitude or norm of the map from w to z in both the time and frequency domains, subject to certain constraints on the behavior of the system.

7.2.3. Performance Specifications

In developing ULTIC systems, a set of objectives or specifications is often formed to reflect the various performance requirements needed in designing a practical control system. Existing CACSD approaches require the performance index for these design objectives to be within a convex set or restricted to a confined problem domain, which may be impractical. In contrast, no hard limitation or objective transformation is needed in evolutionary ULTIC system design. This advantage allows many system constraints or conflicting specifications in both the time and frequency domains to be easily incorporated in the design, which is unmatched by conventional CACSD methods. To guide the a-posteriori non-deterministic evolution towards the global optimum, the evolutionary approach merely requires a performance index to indicate the relative strength of each candidate design, which is naturally available or can be easily formulated for most practical control applications.
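As a concrete illustration of such performance indices, the sketch below computes two of the costs defined in the subsections that follow, the unstable-pole count and a steady-state error estimate, for a candidate design. The function names and the toy step response are assumptions, not part of the authors' toolbox.

```python
import numpy as np

def stability_cost(char_poly):
    # Nr{Re(eig) > 0}: number of closed-loop poles in the right half-plane,
    # from the characteristic-polynomial coefficients (highest order first).
    return int(np.sum(np.roots(char_poly).real > 0))

def steady_state_cost(step_response):
    # e_ss = |1 - y(t)| as t -> infinity, approximated by the tail of a
    # sampled unit-step response.
    return abs(1.0 - np.mean(step_response[-10:]))

# Toy check: a stable characteristic polynomial and a step response that
# settles near 1 (hypothetical closed-loop behaviour).
t = np.linspace(0, 10, 200)
y = 1 - np.exp(-t)
costs = [stability_cost([1, 3, 2]), steady_state_cost(y)]
```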
In order to address the various design specifications commonly accommodated in practical control applications, it is essential that the design objectives formulated in ULTIC systems should at least reflect the following performance requirements.

7.2.3.1. Stability

Stability is often the first concern in any control system design, and can be determined by solving for the roots of the characteristic polynomial. The cost of stability can then be defined as the total number of unstable closed-loop poles, i.e. poles on the right-hand side of the s-plane, as given by Nr{Re(eig) > 0}; a count of zero right-half-plane poles indicates that the system is stable, and vice versa.

7.2.3.2. Step Response Specifications

Practical control engineers often address system transient and steady-state performance in terms of time domain specifications. These time domain performances are specified upon the step response, since it gives a good indication of the response of the controlled variable to command inputs that are constant for long periods and occasionally change quickly to a new value. For a SISO system, the performance requirement of steady-state accuracy can be defined as e_ss = |1 - y(t)| as t tends to infinity, i.e. the residual difference between the command and the actual response of the controlled variable after the system has settled down.

7.2.3.3. Disturbance Rejection

The disturbance rejection problem is defined as follows: find a feedback controller that minimizes the maximum amplitude (H-infinity norm) of the regulated output over all possible disturbances of bounded magnitude. A general structure representing disturbance rejection for a broad class of control problems is given in Fig. 7.2, which depicts the particular case where the disturbance enters the system at the plant output. The mathematical representation is given by

    y = z = G0 u + W1 d
    z = W1 (I + G0 H)^-1 d = W1 S d        (3)
The matrix S is known as the sensitivity function, and the maximum singular value of S determines the disturbance attenuation, since S is in fact the closed-loop transfer function from the disturbance d to the measured output y. W1 is the desired disturbance attenuation factor, which is often a function of frequency so as to allow a different attenuation factor at each frequency. The disturbance attenuation specification may thus be given as
    sigma_max(S) < |W1^-1|   =>   ||W1 S||inf < 1        (4)
where sigma_max(.) denotes the largest singular value of a matrix.

Fig. 7.2. A disturbance rejection problem.

7.2.3.4. Robust Stability

It is important that the designed closed-loop system is stable and provides guaranteed bounds on the performance deterioration, even for 'large' plant variations that may occur in practical applications. Roughly speaking, a robust stability specification requires certain design specifications to hold even if the plant G0 is replaced by any Gpert from the specified set P of possible perturbed plants.

Small Gain Theorem: Suppose the nominal plant G0 in Fig. 7.3 is stable with the multiplicative uncertainty Delta being zero. Then the size of the smallest stable Delta for which the system becomes unstable is [16]

    sigma_max(Delta) = 1 / sigma_max(T),   where T = G0 H (I + G0 H)^-1        (5)
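For a SISO loop the quantities in eqns (4) and (5) reduce to scalar magnitudes, so the robustness costs can be estimated on a frequency grid. The toy loop and weight below are illustrative assumptions, not the chapter's design example.

```python
import numpy as np

# Toy SISO loop: G0 = 1/(s+1) with a static controller H = 4.
L = lambda s: 4.0 / (s + 1.0)
S = lambda s: 1.0 / (1.0 + L(s))        # sensitivity function
T = lambda s: L(s) / (1.0 + L(s))       # complementary sensitivity
W1 = lambda s: 2.0 / (s + 2.0)          # hypothetical attenuation weight

w = np.logspace(-3, 3, 400)
# Grid estimate of ||W1 S||inf, the disturbance rejection cost of eqn (4).
W1S_inf = max(abs(W1(1j * x) * S(1j * x)) for x in w)
# Size of the smallest destabilizing multiplicative Delta, per eqn (5).
T_inf = max(abs(T(1j * x)) for x in w)
margin = 1.0 / T_inf
```

A design satisfying eqn (4) keeps `W1S_inf` below 1, and a larger `margin` means a larger tolerated multiplicative perturbation.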
Therefore the singular value Bode plot of the complementary sensitivity function T can be used to measure the stability margins of the feedback system in the face of multiplicative plant uncertainties. The multiplicative stability margin is, by definition, the 'size' of the smallest stable Delta that destabilizes the system, as shown in Fig. 7.3. According to the small gain theorem, the smaller sigma_max(T) is, the greater the size of the smallest destabilizing multiplicative perturbation will be and, hence, the greater the stability margins of the system. The stability margin of a closed-loop system can thus be specified via singular value inequalities such as

    sigma_max(T) < |W2^-1|   =>   ||W2 T||inf < 1        (6)
where |W2^-1| is the size of the largest anticipated multiplicative plant uncertainty.

Fig. 7.3. Stability robustness problem with multiplicative perturbation.

7.2.3.5. Actuator Saturation

In a practical control system, the size of the actuator signal should be limited, since a large actuator signal may be associated with excessive power consumption or resource usage, apart from acting as a disturbance to other parts of the system if not subject to hardware limitation. A general structure for saturation nonlinearities at the input of the plant is shown in Fig. 7.4. To pose this problem, a saturation function is defined:

    Sat(u) = u,               |u| <= Umax
             Umax sgn(u),     |u| > Umax        (7)
Let the saturated plant be described as G0' u = G0 Sat(u); the objective is then to design an optimal ULTIC controller H that satisfies all the design specifications with an allowable control effort of max(u) <= Umax, so as to stay in the linear region of operation. Note that closed-loop performances such as tracking accuracy and disturbance attenuation are bounded by the actuator saturation specification, i.e. a smaller control effort often results in poorer tracking and disturbance rejection performance, due to the limited control gain required to operate the system in the linear region. In addition, stability for such a system means local stability of the nonlinear system.
Fig. 7.4. Saturation nonlinearities at the plant input
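A direct transcription of the saturation function of eqn (7); the name `sat` and the argument layout are mine.

```python
def sat(u, u_max):
    # Saturation nonlinearity of eqn (7): pass-through inside the linear
    # region, clipped to u_max * sign(u) outside it.
    if abs(u) <= u_max:
        return u
    return u_max if u > 0 else -u_max
```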
7.2.3.6. Minimal Controller Order

It is often desired that the controller designed for a practical control system be as simple as possible, since a simple controller requires less computation and implementation effort than a higher-order controller [17]. It is thus useful to include the order of the ULTIC controller as one of the design specifications, in order to find the smallest-order controller that satisfies all the performance requirements and system constraints.

The performance and robustness specifications formulated above cover the usual design requirements in many practical control applications. Note that other design specifications, such as phase/gain margin, time delay or noise rejection, could also be added to the ULTIC system in a similar way, if desired. As addressed in the Introduction, designing an optimal ULTIC system requires simultaneously optimizing multiple controller coefficients to satisfy the set of conflicting design specifications. This leads to a multi-dimensional and multi-modal design problem characterized by multi-objective performance indices, which can be tackled via a multi-objective evolutionary algorithm.

7.3. An Evolutionary ULTIC Design Application

In this section, the control system design of a non-minimum phase SISO plant using an MOEA toolbox [15] is presented to illustrate the effectiveness of the evolutionary ULTIC design methodology. Consider the following non-minimum phase plant, as studied in [18]:
    G0(s) = -1.3 (s - 5.5307)(s + 4.9083) / [ s (s + 0.3565 - 5.27j)(s + 0.3565 + 5.27j)(s + 0.0007) ]        (8)

This nominal model has a non-minimum phase zero at z = 5.5307 and a nearly unstable pole at p = -0.0007, which makes it an interesting robust control design problem. Here, the aim is to design a ULTIC controller that meets a set of time and frequency domain performance requirements while satisfying certain system constraints such as actuator saturation. Fig. 7.5 shows the overall design block diagram of the ULTIC system, which includes eight design objectives and one hard actuator constraint to be satisfied, as listed in Table 7.33. The underlying aim of setting the priority vector in the second-to-last column of Table 7.33 is to obtain a controller that first stabilizes the system within the actuator saturation limit for hardware implementation. Note that the actuator saturation is set as a hard constraint reflecting the hard limit of this performance requirement, which requires no further minimization once the control action u is within the saturation limit. Further, the system must be robust to plant uncertainty and attenuate disturbances to within the level of tolerance defined by the weighting functions W1 and W2 in Fig. 7.6 [18]. Having fulfilled these requirements, the system should also satisfy some time domain specifications, as defined by the transient and steady-state responses. Although determination of the objective and priority settings may be a subjective matter depending on the performance requirements, ranking the priorities may be unnecessary and can be ignored for a 'minimum-commitment' design [19]. If, however, an engineer commits himself to prioritizing the objectives, it is a much easier task than weighting the different objectives, which is compulsory in objective function aggregation approaches [6].
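The zero/pole structure claimed for eqn (8) can be checked numerically; the sketch below reads the zeros, poles and gain off the factored form as transcribed above.

```python
import numpy as np

# Zeros, poles and gain of the nominal plant of eqn (8), following the
# sign convention of the factored form in the text.
zeros = np.array([5.5307, -4.9083])
poles = np.array([0.0, -0.3565 + 5.27j, -0.3565 - 5.27j, -0.0007])
gain = -1.3

def G0(s):
    # Evaluate the nominal plant response at a complex frequency s.
    return gain * np.prod(s - zeros) / np.prod(s - poles)
```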
Fig. 7.5. Block diagram of the ULTIC system design.

Fig. 7.6. Frequency responses of W1 and W2.
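The goal and priority columns of Table 7.33 steer a Pareto-based comparison rather than a weighted sum. Below is a minimal sketch of the two ingredients, standard Pareto domination and a goal check that weights the hard constraint more heavily; the weighting rule is a simplified illustration, not the toolbox's actual ranking scheme.

```python
def dominates(a, b):
    # Standard Pareto domination for minimization: a dominates b if it is
    # no worse in every objective and strictly better in at least one.
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def unmet_goals(costs, goals, hard):
    # Count unmet specifications, penalizing violated hard constraints
    # more heavily (an illustrative simplification).
    return sum((10 if h else 1)
               for c, g, h in zip(costs, goals, hard) if c > g)
```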
The order of the candidate controllers is not fixed, but its maximum is limited to third order.

Table 7.33. Time and frequency domain design specifications for the non-minimum phase plant.

    Design specification                 Objective              Goal    Priority  Constraint
    Frequency domain
    1. Stability (closed-loop poles)     Nr{Re(eig) > 0} (Sta)  0       1         soft
    2. Disturbance rejection             S                      1       3         soft
    3. Plant uncertainty                 T                      1       3         soft
    4. Controller order                  Co                     3rd     5         soft
    5. Actuator saturation               Act                    0.5 V   2         hard
    Time domain
    6. Rise time                         Tr                     4 s     4         soft
    7. Overshoots                        Mp                     0.05    4         soft
    8. 5% settling time                  Ts                     fs      4         soft
    9. Steady-state error                e_ss                   0.01    4         soft

Parameter settings of the MOEA toolbox [15]
are shown in Fig. 7.7. The design took less than 2 hours on a Pentium II 350 MHz processor, with a population size and generation number of 100.

Fig. 7.7. Quick setups of the MOEA toolbox for the ULTIC problem.

At the end of the evolution, all ULTIC controllers recommended by the toolbox had met the nine design specifications listed in Table 7.33. Among these controllers, 88 are of second order and 12 are of third order. The system closed-loop responses for these ULTIC controllers are shown in Fig. 7.8, where all the responses lie within the clear area, showing good performance against the time domain specifications.

Fig. 7.8. The MOEA optimized output responses for the SISO system.

Fig. 7.9 shows the frequency responses of both W1 S and W2 T for all the Pareto optimal controllers, in which the gains of the responses are satisfactorily below the required magnitude of 0 dB. To illustrate the robustness of the evolutionary designed ULTIC system with respect to disturbance rejection, a sinusoidal disturbance signal was applied to the system, with an amplitude of 1 volt and an angular frequency of 0.05 rad/s. The sinusoid and its attenuated signal for all Pareto optimal ULTIC controllers are shown by the dashed and solid lines in Fig. 7.10, respectively. Clearly, the disturbance has been attenuated successfully, as required by the 2nd objective in Table 7.33, resulting in a 10-fold gain reduction of the original sinusoidal signal. Fig. 7.11 shows the output responses for one randomly chosen Pareto optimal controller with a perturbed nominal model of eqn. (8), so as to study the system robustness in terms of plant uncertainties. The plant
is perturbed simultaneously in both the zeros and the poles of the nominal model, in the range

    z <= z' <= z1,   p <= p' <= p1        (9)

Fig. 7.9. Frequency responses of the non-minimum phase system.
Fig. 7.10. The sinusoidal disturbance and its attenuated signal
where z1 = 2z and p1 = 1.1p, with z and p being the zeros and poles of the nominal plant, respectively. It was observed that the system is much more sensitive to perturbations of the poles than of the zeros, due to the 'almost unstable' pole located very near the imaginary axis, i.e. p = -0.0007. As shown in Fig. 7.11, the ULTIC system is able to maintain relatively good response and stability performance despite the various perturbations made to the nominal plant. Apart from the flexibility in analyzing the control performance, the evolutionary design also allows on-line examination of different trade-offs among the multiple conflicting specifications, modification of existing objectives and constraints, or zooming into any region of interest before selecting one final controller for real-time implementation. The trade-off graph of the resulting 100 ULTIC controllers is shown in Fig. 7.12, where each line represents a solution found by the evolutionary optimization. The x-axis shows the design specifications, the y-axis shows the normalized cost for each objective, and the cross-marks show the desired goal setting for each specification. Clearly, a trade-off between adjacent specifications results in the crossing of the lines between them, whereas concurrent lines that do not cross each other indicate specifications that do not compete with one another. For example, the specifications of tracking error (e_ss) and controller
Fig. 7.11. Output responses of the ULTIC system with plant uncertainties.
order (Co) do not directly compete against each other, whereas the sensitivity function (S) and complementary sensitivity function (T) appear to compete heavily, as expected. The information contained in the trade-off graph of Fig. 7.12 also suggests that lower goal settings for rise time and settling time are possible, and these objectives could be further optimized to arrive at even better transient performance if desired.

Fig. 7.12. Trade-off graph of the final evolutionary designed ULTIC system.

A powerful feature of designing ULTIC systems using an MOEA is that all the goal and priority settings can be conveniently examined and modified at any time during the evolution process. For example, the designer may change his preference and decide to set a goal of 2nd order, instead of 3rd order, for the controller order specification after a certain number of generations. Fig. 7.13 illustrates the behavior of the evolution upon on-line modification of this goal setting after the design in Fig. 7.12. Due to the sudden change to a tighter goal setting, none of the individuals manages to meet all the required specifications, as shown in Fig. 7.13(a). After continuing the evolution for 5 generations, the trade-offs move towards satisfying the controller order specification at the expense of the performance of the other objectives, as shown in Fig. 7.13(b). In Fig. 7.13(c), the evolution continues and again leads to the satisfaction of all the goal settings, including the controller order specification, at the price of less room for further improvement of the other design objectives and fewer Pareto optimal solutions compared to the design in Fig. 7.12. Clearly, this man-machine interactive design approach enables control engineers to divert the evolution into any trade-off region of interest, as well as to modify certain specifications or preferences on-line, without the need to restart the entire design cycle as required by conventional methods.

7.4. Conclusions

This chapter has presented an automated CACSD design methodology for unified LTI control systems using an MOEA, which is capable of unifying different LTI design schemes under performance satisfaction and eliminating the need to pre-select a specific control law. Unlike conventional methods, control engineers' expertise, as well as goal or priority settings expressing different preferences on each design specification, can be easily incorporated and modified on-line according to the evolving trade-offs, without the need to repeat the whole design process. In principle, any number or combination of constraints and performance specifications can be included in the evolutionary ULTIC design if desired. Validation results on a non-minimum phase control system illustrate the efficiency and effectiveness of the methodology.
Fig. 7.13. Effects of the evolution upon the on-line modification of goal setting: (a) reducing the goal setting of controller order from 3rd to 2nd order; (b) after 5 generations; (c) after another 5 generations.
References

1. W.T. Nye and A.L. Tits, An application-oriented, optimization-based methodology for interactive design of engineering systems, Int. J. Contr., vol. 43, pp. 1693-1721 (1986).
2. Y. Li, K.C. Tan, K.C. Ng and D.J. Murray-Smith, Performance based linear control system design by genetic algorithm with simulated annealing, Proc. 34th Conf. on Decision and Contr., New Orleans, pp. 731-736 (1995).
3. Y. Li, K.C. Tan and C. Marionneau, Direct design of linear control systems from plant I/O data using parallel evolutionary algorithms, Int. Conf. on Control'96, Special Session on Evolutionary Algorithms for Contr. Eng., University of Exeter, UK, pp. 680-686 (1996).
4. K.C. Tan and Y. Li, Multi-objective genetic algorithm based time and frequency domain design unification of control systems, IFAC Int. Sym. on Artificial Intelligence in Real-Time Contr., Kuala Lumpur, Malaysia, pp. 61-66 (1997).
5. P.J. Fleming and A.P. Pashkevich, Application of multi-objective optimization to compensator design for SISO control systems, Electronics Letters, vol. 22, no. 5, pp. 258-259 (1986).
6. W.Y. Ng, Interactive Multi-objective Programming as a Framework for Computer-aided Control System Design, Lecture Notes in Control and Information Sciences (Springer-Verlag, 1989).
7. R.G. Becker, A.J. Heunis and D.Q. Mayne, Computer-aided design of control systems via optimization, IEE Proc. Pt. D, vol. 126, no. 6, pp. 573-578 (1979).
8. E. Polak, D.Q. Mayne and D.M. Stimler, Control system design via semi-infinite optimization: A review, Proc. IEEE, vol. 72, no. 12, pp. 1777-1794 (1984).
9. R. Sedgewick, Algorithms, 2nd Edition (Addison-Wesley, Reading, MA, 1988).
10. Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, 2nd Edition (Springer-Verlag, Berlin, 1994).
11. K.J. MacCallum, Design reuse - Design concepts in new engineering contexts, Proc. Control, Design and Production Research Conf., Heriot-Watt University, pp. 51-57 (1995).
12. M.A. Dahleh and I. Diaz-Bobillo, Control of Uncertain Systems: A Linear Programming Approach (Prentice Hall, Englewood Cliffs, NJ, 1995).
13. W.S. Levine and M.B. Tischler, CONDUIT - Control Designer's Unified Interface, IEEE Int. Conf. Contr. Appl. and Sys. Design, Hawaii, pp. 422-427 (1999).
14. H.A. Barker, Open environments and object-oriented methods for computer-aided control system design, Contr. Eng. Practice, vol. 3, no. 3, pp. 347-356 (1995).
15. K.C. Tan, T.H. Lee, D. Khoo and E.F. Khor, A multi-objective evolutionary algorithm toolbox for computer-aided multi-objective optimization, IEEE Transactions on Systems, Man and Cybernetics: Part B (Cybernetics), vol. 31, no. 4, pp. 537-556 (2001).
16. G. Zames, On the input-output stability of time-varying non-linear feedback systems, Parts I and II, IEEE Trans. Auto. Contr., AC-11, 2 & 3, pp. 228-238 & 465-476 (1966).
17. P. Schroder, A.J. Chipperfield, P.J. Fleming and N. Grum, Multiobjective optimization of distributed active magnetic bearing controllers, Conf. on Genetic Algorithms in Engineering Systems: Innovations and Applications, pp. 13-18 (1997).
18. J.C. Doyle, B. Francis and A. Tannenbaum, Feedback Control Theory (Macmillan Publishing Company, New York, 1992).
19. K.X. Guan and K.J. MacCallum, Adopting a minimum commitment principle for computer aided geometric design systems, Artificial Intelligence in Design '96 (Gero, J.S. and Sudweeks, F., eds) (Kluwer Academic Publishers, 1996), pp. 623-639.
CHAPTER 8

THE USE OF EVOLUTIONARY ALGORITHMS TO SOLVE PRACTICAL PROBLEMS IN POLYMER EXTRUSION
Antonio Gaspar-Cunha and Jose A. Covas
IPC - Institute for Polymers and Composites
Dept. of Polymer Engineering
University of Minho, 4800-058 Guimaraes, Portugal
E-mail: gaspar,[email protected]

This work aims at selecting the operating conditions and designing screws that optimize the performance of single-screw and co-rotating twin-screw extruders, machines widely used by the polymer processing industry. A special MOEA, denoted the Reduced Pareto Set Genetic Algorithm, RPSGAe, is presented and used to solve these multi-objective combinatorial problems. Twin-screw design is formulated as a Travelling Salesman Problem (TSP), given its discrete nature. Various case studies are analyzed and their validity is discussed, thus demonstrating the potential practical usefulness of this approach.

8.1. Introduction

Polymer extrusion is a major plastics processing technology used for the manufacture of a wide range of plastics products (such as pipes and profiles, film, sheet, filaments, fibers, and electrical wires and cables) and also for the production of raw materials (e.g., modified polymers, polymer blends, fiber/polymer matrix composites, biodegradable systems) [1, 2]. The essential unit of an extrusion line is the extruder, which is composed of one (single screw extruder) or more screws (the most common being the co-rotating twin screw extruder) rotating at constant speed inside a heated barrel. Solid polymer (in pellet or powder form) is supplied to the screw channel either by gravity flow from a hopper or by a feeder set at a prescribed rate. The solid progresses along the screw and melts due to the combined effect of conducted and dissipated heat. This (highly viscous, non-Newtonian) melt is subsequently homogenized (via both dispersive and distributive mixing),
pressurized and forced to pass through the die, where it is shaped into the required cross-section before being quenched [1-3]. Mathematical modelling of the global process involves coupling a sequence of numerical routines, each valid for a process stage where specific physical/rheological phenomena develop (namely solids conveying, melting, melt conveying, dispersive-distributive mixing, and devolatilization) [1-3]. In other words, each zone is described by the relevant governing equations (mass conservation, momentum and energy), together with constitutive equations describing the rheological and thermal responses of the material, linked to the adjacent zones through the appropriate boundary conditions. The relative simplicity of the screw extruder geometry masks the complexity of the flow developed. In practice, setting the operating conditions and/or designing screws for new applications are usually carried out by a trial-and-error procedure, where tentative extrusion experiments, or machining of screws, are performed until satisfactory results (i.e., the desirable performance) are obtained. Since the above targets correspond to multi-objective problems, and given their typology, they can instead be solved by adopting a scientific methodology based on Multi-Objective Evolutionary Algorithms (MOEAs) [4, 5]. The present work focuses on the application of this optimization methodology to single- and twin-screw polymer extrusion. For this purpose, a special MOEA, denoted the Reduced Pareto Set Genetic Algorithm with elitism (RPSGAe), is proposed [6, 7]. This algorithm uses a clustering technique to reduce the number of solutions on the efficient frontier. Fitness is determined through a ranking function, the individuals being sorted using the same clustering technique. Thus, Section 2 presents the main functional process features and discusses the characteristics of the optimization problems.
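The clustering-based reduction of the efficient frontier can be illustrated with a greedy farthest-point sketch; this is a simplified stand-in for the clustering step described for RPSGAe, not the published algorithm.

```python
def reduce_pareto_set(points, n_keep):
    # Thin a list of objective vectors down to n_keep well-spread
    # representatives by greedy farthest-point selection: start from the
    # first point, then repeatedly keep the point farthest from all
    # representatives chosen so far.
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    kept = [points[0]]
    while len(kept) < min(n_keep, len(points)):
        cand = max(points, key=lambda p: min(dist(p, k) for k in kept))
        kept.append(cand)
    return kept
```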
The RPSGAe is presented and described in detail in section 3, where a specific screw design methodology is also proposed. Evolutionary algorithms are then used in section 4 to set the operating conditions and to design screws for single- and twin-screw extruders.
8.2. Polymer Extrusion
8.2.1. Single Screw Extrusion
A conventional plasticating single-screw extrusion unit uses an Archimedes-type screw (with at least three distinct geometrical zones in terms of channel depth), rotating at constant speed inside a heated barrel. As illustrated in Fig. 1.A, intensive experimental research demonstrated that the material
The Use of EAs to Solve Practical Problems in Polymer Extrusion
deposited in the hopper passes through various sequential functional zones, each inducing a certain thermo-mechanical environment1,7. Flow in the hopper is due to gravity, while that in the first screw turns results from friction dragging (solids conveying). Soon, a melt film will form near the inner barrel wall (delay zone), followed by the creation and growth of a melt pool (melting zone). Eventually, all fluid elements will progress along the screw channel following a helicoidal path (melt conveying) and pressure flow will take place in the die. Figure 2 shows the physical assumptions underlying the mathematical model of the global process. Calculations are performed in small screw channel increments, a detailed description being available elsewhere7-9. For a given polymer / system geometry / operating conditions set, the program not only predicts the evolution of important process variables along the screw (as shown in Fig. 1.B for pressure and melting rate), but also yields the values of parameters which, altogether, describe the overall process performance (these include - see Fig. 1.C - mass output, mechanical power consumption, length of screw required for melting, melt temperature, degree of mixing - WATS - and viscous dissipation, quantified by the ratio of maximum temperature to barrel temperature)7. The process is quite sensitive to changes in geometry and/or operating conditions. As can be observed in the example of Fig. 1.C, an increase in screw speed produces an increase in mass output, but at the cost of more power consumption, higher melt temperatures - due to viscous dissipation - and lower mixing quality. In fact, WATS generally decreases with increasing screw speed, as there is less channel length available for mixing (due to lower melting rates) and shorter residence times. Therefore, setting the operating conditions requires establishing a compromise between the relative satisfaction of the above parameters.
The same reasoning could be applied to screw design.
8.2.2. Co-Rotating Twin-Screw Extrusion
The limitations of single-screw extruders in terms of the interdependence between output, die resistance and mixing quality, as well as of their limited capability of producing effective random distributive and dispersive mixing, stimulated the use of co-rotating twin-screw extruders for compounding operations1,2. In these machines two parallel intermeshing screws rotate in the same direction inside a cavity whose cross-section has a figure-of-eight shape. Since the screws are generally of modular construction, it is possible to
Fig. 8.1. Single-screw extruder: A) geometry; B) melt pressure and melting profiles; C) performance measures.
Fig. 8.2. Physical models for single-screw extrusion.
build profiles where the location of melting, the mixing intensity and the average residence time can be estimated a priori. Also, the barrel can contain apertures for secondary feeding (e.g., additives, fillers), devolatilization (e.g., removal of water vapor or of reaction volatiles), etc. In the case of the extruder of Fig. 3.A, the material is supplied at a prescribed rate, so that the conveying sections are only partially filled. Melting will occur at the staggering kneading block upstream (by the combined effect of heat conducted and dissipated from the mechanical smearing of the solid pellets), while the third kneading block will provide the adequate seal for devolatilization. Although these extruders have also attracted a significant amount of experimental and theoretical work in the last decades10-13, the understanding of certain process stages, such as melting, is still far from complete14-16. Consequently, for modelling purposes melting is often considered instantaneous, taking place before the first restrictive element upstream. From the melting location to the die exit, computations of melt flow are performed separately for each type of screw element (right-handed or left-handed screw elements, staggered kneading disks), as illustrated in Fig. 4. This is also the concept of the LUDOVIC software17, whose predictions have been shown to be within 10% of the experimental values17,18. As for single-screw extrusion, for a given polymer / system geometry / operating conditions set,
the software predicts the evolution along the screw of variables such as temperature, melt pressure, shear rate, viscosity, residence time, specific energy and filling ratio (Fig. 3.B), and the values of global performance parameters (e.g., average residence time, average strain, mechanical power consumption, maximum melt temperature, outlet temperature, as in Fig. 3.C). The response of these machines is also sensitive to the operating conditions, in this case output, screw rotation speed and temperature. The effect of output is illustrated in Fig. 3. Output influences mainly the number of fully filled channels, hence mechanical power consumption, average residence time and strain. However, the level of shear stresses at the kneading disks remains the same, hence the maximum temperatures attained are not affected.

Fig. 8.3. Twin-screw extruder: A) geometry; B) pressure and cumulative residence time; C) performance measures.
Fig. 8.4. Physical models for co-rotating twin-screw extrusion.
8.2.3. Optimization Characteristics
As discussed above, for each application the performance of single- and twin-screw extruders is determined by the operating conditions and by the machine geometry. The former include screw speed (N) and barrel temperature profiles (Tbi), and mass output (Q) in the case of twin-screw extruders. As illustrated in Fig. 5, which identifies the parameters to be optimized for each type of machine, N, Tbi and Q can vary continuously within a prescribed range, which is dictated by the characteristics of the motor and the thermal
stability of the polymer. In the case of the twin-screw machine, N and Q are not independent, since for each N there is a maximum attainable Q (as the screws become fully filled along their axis). This limit is detected by the LUDOVIC software17, which does not converge if the two values are incompatible. The geometric parameters of single-screw extruders can also vary continuously within a preset interval. As shown in Fig. 5, if one is aiming at designing a new screw for an existing extruder, then consideration should be given to the definition of the screw lengths of the feed (L1) and compression (L2) zones, their corresponding internal diameters (D1 and D3, respectively), the flight thickness (e) and the screw pitch (P). The variation intervals are defined by a number of considerations, such as excessive mechanical work on the polymer (maximum D1/D3 ratio), mechanical resistance of the screw (minimum D1) and polymer conveying characteristics (minimum L1). Conversely, screws for twin-screw extruders are built by selecting the required number of elements from a set of available geometries and then defining their relative positions. As Fig. 5 shows, if a screw is made of 14 elements and the aim is to define the relative position of 10 of them (of which 5 are transport elements, 4 are kneading blocks and 1 is a reverse element), there are 10! possible combinations, i.e., a complex discrete combinatorial problem must be solved. Although less common, one could also envisage optimizing the geometry of the individual elements, which would entail the continuous variation of parameters within a prescribed interval. Despite the obvious practical importance of the topic, there is limited experience on the use of an optimization approach to define the operating conditions or to design screws for polymer extrusion.
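The size of the combinatorial search space quoted above (10! arrangements of 10 distinct screw elements) can be checked directly; a minimal sketch:

```python
from math import factorial

def tscp_arrangements(n_positions: int) -> int:
    """Number of ways to place n distinct screw elements in n slots."""
    return factorial(n_positions)

# 10 elements to position, as in the example above:
print(tscp_arrangements(10))  # 3628800 candidate screw configurations
```

Even at this modest element count, exhaustive evaluation of every configuration with a flow simulator is impractical, which motivates the evolutionary search described next.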
Most effort has been concentrated on single-screw extrusion19,20, although Potente et al.21 have recently suggested the use of a quality function to optimize the geometry of specific screw elements for twin-screw extruders.
8.3. Optimization Algorithm
8.3.1. Multi-Objective Optimization
As with most real-world optimization problems, the optimization of polymer extrusion is multi-objective. This can be dealt with in two ways, depending on when the decision about the relative importance of the various criteria is taken. If that importance can be established before the search takes place, the individual objectives can be aggregated into a single function, yielding a single-objective optimization problem. However, if the relative weight of each criterion is changed, a new
Fig. 8.5. Parameters to be optimized.
optimization run needs to be carried out. When the relative value of the criteria is not known a priori, it is possible to take advantage of the fact that Genetic Algorithms work with a population of points to optimize all criteria simultaneously. This is performed with
a Multi-Objective Evolutionary Algorithm (MOEA). The result will be a set of non-dominated vectors, denoted as Pareto-optimal solutions, evidencing the trade-off between the criteria and the parameters to be optimized. Thus, the decision maker can choose a solution resulting from a specific compromise between the relative satisfaction of the individual criteria.
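The non-dominance relation behind the Pareto-optimal set can be made concrete with a small sketch (criteria taken here as all-minimized; the comparison directions would be flipped for maximized criteria):

```python
def dominates(a, b):
    """a dominates b if it is no worse in every criterion and strictly
    better in at least one (all criteria to be minimized here)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Return the non-dominated subset of a list of criteria vectors."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

pts = [(1.0, 5.0), (2.0, 2.0), (3.0, 4.0), (4.0, 1.0)]
print(pareto_front(pts))  # → [(1.0, 5.0), (2.0, 2.0), (4.0, 1.0)]
```

Here (3.0, 4.0) is dominated by (2.0, 2.0); the remaining three points are the trade-off set the decision maker chooses from.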
8.3.2. Reduced Pareto Set Genetic Algorithm with Elitism (RPSGAe)
In MOEAs the selection phase of a traditional Evolutionary Algorithm is replaced by a routine able to deal with multiple objectives. Usually, this is done by applying fitness assignment, density estimation and archiving operators, various methods being available for this purpose4,5. In this work, the Reduced Pareto Set Genetic Algorithm with Elitism (RPSGAe)6 is adopted, which involves the application of a clustering technique to reduce the number of solutions on the efficient frontier while keeping its characteristics intact. The clustering technique, proposed by Roseman and Gero22 and known as the complete-linkage method, compares the proximity of solutions in the hyper-space using a measure of the distance between them. Solutions closer than a pre-defined distance are aggregated. Fitness is determined through a ranking function, the individuals being sorted with the same clustering technique. In order to incorporate these techniques into the EA, Algorithm 1 was developed. The RPSGAe follows the steps of a traditional EA, except that it defines an external (elitist) population and uses a specific fitness evaluation. It starts with the random definition of an internal population of size N and with the creation of an empty external population. At each generation, the following operations are carried out:
• The internal population is evaluated using the modelling package;
• Fitness is calculated using the clustering technique (see Algorithm 2 below6);
• A fixed number of the best individuals are copied to the external population until this becomes full;
• Algorithm 2 is applied again, to sort the individuals of the external population;
• A pre-defined number of the best individuals is incorporated into the internal population, replacing the lowest-fitness individuals;
• The reproduction, crossover and mutation operators are applied.
Algorithm 1 (RPSGAe):
  Random initial population (internal)
  Empty external population
  while not Stop-Condition do
    Evaluate internal population
    Calculate the fitness of all individuals using Algorithm 2
    Copy the best individuals to the external population
    if the external population becomes full
      Apply Algorithm 2 to this population
      Copy the best individuals to the internal population
    end if
    Select the individuals for reproduction
    Crossover
    Mutation
  end while

Algorithm 2 starts with the definition of the number of ranks, NRanks, and the rank of each individual, Rank[i], is set to 0. For each rank r, the population is reduced to NR individuals (where NR is the number of individuals of each rank), using the clustering technique. Then, rank r is attributed to these NR individuals. The algorithm ends when the number of pre-defined ranks is reached. Finally, the fitness of individual i (Fi) is calculated using the following linear ranking function:
Fi = 2 - SP + 2 (SP - 1) (NRanks + 1 - Rank[i]) / NRanks        (1)
where SP is the selection pressure (1 < SP < 2). Detailed information on these algorithms can be found elsewhere6,7.
8.3.3. Travelling Salesman Problem
The above RPSGAe can easily be adapted to the various extrusion optimization problems involving continuous variables, i.e., setting the operating conditions for both single- and twin-screw extruders and designing screws for single-screw extruders. When the aim is to optimize the screw configuration of twin-screw extruders, a discrete combinatorial problem must be solved (the Twin-Screw Configuration Problem, TSCP). However, TSCP can
Algorithm 2 (Clustering):
  Definition of NRanks
  Rank[i] = 0
  r = 1
  do
    NR = r (N / NRanks)
    Reduce the population down to NR individuals
    r = r + 1
  while (r < NRanks)
  Calculate fitness
End

be formulated as a Travelling Salesman Problem (TSP), as illustrated in Fig. 6. In the TSP the salesman needs to visit n cities, the aim being to select the visiting sequence that minimizes the distance travelled and/or the total cost (two alternative routes are suggested). In the TSCP the polymer is the travelling salesman and the screw elements are the cities. In this case, the polymer must flow through the different elements, whose location along the screw has to be determined in order to maximize the global process performance.
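The linear ranking function of Eq. (1) translates directly into code; a minimal sketch (the function and variable names are mine):

```python
def linear_ranking_fitness(rank: int, n_ranks: int, sp: float) -> float:
    """Fitness of an individual from its rank, following Eq. (1).
    sp is the selection pressure, with 1 < sp <= 2; rank 1 is best."""
    return 2.0 - sp + 2.0 * (sp - 1.0) * (n_ranks + 1 - rank) / n_ranks

# With SP = 2 and 30 ranks (the settings used later in the chapter),
# rank-1 individuals get the maximum fitness:
print(linear_ranking_fitness(1, 30, 2.0))  # 2.0
```

Fitness decreases linearly from SP (best rank) down to roughly 2 - SP (worst rank), so SP directly controls how strongly selection favors low-rank, i.e. less crowded and less dominated, individuals.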
Fig. 8.6. Twin-screw configuration problem (TSCP) formulated as a TSP.
Formulating the TSCP as a TSP makes it possible to use the vast number of algorithms available for the latter. In fact, single-objective
TSPs have been solved using EAs23,24 but, apparently, only Zhenyu25 has approached multi-objective TSPs. The difficulty of using a MOEA arises from the fact that the traditional crossover and mutation operators are not capable of ensuring a positive and rapid evolution of the population along the generations26. Thus, a specific TSP reproduction operator incorporating crossover and mutation, and able to make full use of the heuristic information contained in the population, the inver-over, has been suggested. It has been shown to outperform other evolutionary operators in the resolution of single-objective TSPs26. Consequently, a MOEA for solving the multi-objective TSP (or, equivalently, the TSCP) was developed (Algorithm 3). It starts with the random generation of the N individuals of the internal population and an empty external population of size 2N. After evaluating the former using the LUDOVIC routine, the following actions are taken at each generation:
• The individuals are ranked using Algorithm 2;
• The entire internal population is copied to the elitist population;
• The inver-over operator is applied in order to generate the remaining N individuals of the elitist population;
• The new individuals are evaluated;
• The non-domination test and Algorithm 2 are applied to the elitist population to rank its 2N individuals;
• The best N individuals of the elitist population are copied to the main population.
The algorithm terminates when the prescribed number of generations is reached. The solutions are the non-dominated individuals of the last internal population.
8.4. Results and Discussion
The optimization algorithms discussed in the previous section will now be used to solve the situations depicted in Fig. 5. Single- and twin-screw extrusion will be studied separately and, for each, the operating conditions and the screw geometry will be optimized.
8.4.1. Single Screw Extrusion
Operating conditions
The aim is to determine the operating conditions, i.e., screw speed (N) and barrel temperature profile (T1, T2 and T3), which may vary continuously within the ranges defined between square brackets in Fig. 5, that maximize the performance described by the six criteria presented in Table 1. Thus, the global objective is to maximize mass output and degree of mixing (WATS), while minimizing the length of screw required for melting, the melt temperature, the power consumption and the viscous dissipation; these aims are obviously conflicting. The prescribed range of variation of each criterion is also stated in Table 1. The polymer properties (a commercial high-density polyethylene extrusion grade) and the extruder geometry (a Leistritz LSM 36 laboratory machine) are known7. The following GA parameters were used: 50 generations, crossover rate of 0.8, mutation rate of 0.05, internal and external populations of 100 individuals, limit of the clustering algorithm set at 0.2 and NRanks equal to 30.

Algorithm 3 (MOEA for TSP):
  Random initial population (internal)
  Empty external population
  Evaluate internal population
  while not Stop-Condition do
    Calculate the fitness of all individuals using Algorithm 2
    Copy the N individuals to the external population
    Apply the inver-over operator to generate N new individuals
    Evaluate the new N individuals
    Apply Algorithm 2 to the external population
    Copy the best N individuals to the internal population
  end while

Table 8.34. Criteria for optimizing single-screw operating conditions and corresponding range of variation.

  Criteria                                        Aim        Range of variation
  C1 - Output (kg/hr)                             Maximize   1 - 20
  C2 - Length of screw required for melting (m)   Minimize   0.2 - 0.9
  C3 - Melt temperature (°C)                      Minimize   150 - 210
  C4 - Power consumption (W)                      Minimize   0 - 9200
  C5 - WATS                                       Maximize   0 - 1300
  C6 - Viscous dissipation - Tmax/Tb              Minimize   0.5 - 1.5
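The prescribed ranges in Table 1 suggest normalizing each raw criterion value onto [0, 1] before any aggregation; a minimal sketch, assuming linear scaling with clipping and direction handling (my choices, not necessarily what the authors' code does):

```python
def normalize(value, lo, hi, maximize):
    """Map a raw criterion value onto [0, 1], where 1 is always 'good'.
    (lo, hi) is the prescribed range of variation; values outside it
    are clipped before scaling."""
    t = (min(max(value, lo), hi) - lo) / (hi - lo)
    return t if maximize else 1.0 - t

# Output of 10.5 kg/hr on the 1-20 range (C1, maximize):
print(normalize(10.5, 1.0, 20.0, True))       # 0.5
# Melt temperature of 180 °C on the 150-210 range (C3, minimize):
print(normalize(180.0, 150.0, 210.0, False))  # 0.5
```

With all criteria mapped to a common "larger is better" scale, the conflicting maximize/minimize aims of Table 1 can be compared and combined directly.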
Figure 7 shows some of the optimal Pareto plots obtained for the simultaneous optimization of all six criteria, both in the criteria domain (Fig. 7.A) and in the parameters-to-optimize domain (Fig. 7.B). As expected, in this six-dimensional space the distinction between dominated and non-dominated solutions is difficult, since points that appear dominated in one Pareto frontier are probably non-dominated in another, i.e., selecting a solution is not easy. One alternative consists in quantifying the relative importance of the criteria using a conventional quality function, such as the weighted sum, applied to the final population:
Fi = Σ (j = 1 to q) wj fj        (2)
Here, Fi is the fitness of individual i, q is the number of criteria, fj is the objective function of criterion j and wj is the corresponding weight (0 < wj < 1). The decision maker defines the weight of each criterion and applies this function to the non-dominated solutions, thus finding the best result. Using output (C1 in Table 1) as a basis of comparison, Table 2 shows the operating conditions proposed when its weight (w1) varies between 0.1 and 0.5. As output becomes more relevant to the global performance, N increases due to their direct relationship. However, as illustrated in Fig. 7, the remaining criteria will be progressively less satisfied. The results of this methodology have been validated experimentally7.

Table 8.35. Best operating conditions for single-screw extrusion.

  Weights             Operating conditions
  w1    w2 to w5      N (rpm)    T1/T2/T3 (°C)
  0.1   0.9/4         13         207/155/150
  0.2   0.8/4         23.0       185/183/153
  0.3   0.7/4         23.0       185/183/153
  0.4   0.6/4         48.5       161/199/195
  0.5   0.5/4         48.5       161/199/195
Screw design
As identified in Fig. 5, the aim is to define the values of L1, L2, D1, D3, P and e that, for the same polymer and for fixed operating conditions (N = 50 rpm and Ti = 170 °C), will again optimize the criteria identified in Table 1. Since this involves, as above, a six-dimensional space in the criteria or
Fig. 8.7. Optimal Pareto plots: A) criteria domain; B) parameters-to-optimize domain.
in the parameters-to-optimize domains, following the same procedure yields the results shown in Table 3. As illustrated in Fig. 8, two quite different screw profiles are proposed: one when output is not relevant, the other when it is at least as important as the remaining criteria. The former has a high D3/D1 ratio and a shallow pumping section, favoring melting and mixing but opposing high throughputs. Conversely, the second screw profile possesses a larger channel cross-section, inducing higher flows.

Table 8.36. Best screw geometries for single-screw extrusion.

  Weights           Screw geometry (mm)
  w1    w2 to w5    L1      L2      D1      D3      P       e
  0.1   0.9/4       UD      8.15    22.6    31.9    38.9    3.2
  0.2   0.8/4       7.5D    7.1D    25.1    26.9    36.2    3.7
  0.3   0.7/4       7.5D    7.1D    25.1    26.9    36.2    3.7
  0.4   0.6/4       7.5D    7.1D    25.1    26.9    36.2    3.7
  0.5   0.5/4       7.5D    7.1D    25.1    26.9    36.2    3.7
Fig. 8.8. Best screw profiles: A) w1 = 0.1; B) 0.2 ≤ w1 ≤ 0.5 (see Table 3).
In industrial practice screws must be flexible, i.e., they must exhibit good performance for a range of materials and operating conditions. This requirement may be included in the design routine by studying the sensitivity of the designs proposed by the optimization algorithm to limited changes in relevant parameters, such as polymer rheology, operating conditions and even the relative importance of the weights9. More specifically, assuming w1 = 0.2, the five best screws proposed by the optimization algorithm are those of Table 4. When these are subjected to a sensitivity analysis, the data of Fig. 9 are obtained, where the black bars represent the average global performance and the white bars the respective standard deviation. Thus, screw 1 can be chosen if global performance is of paramount importance, or screw 2 may be selected when process stability has priority.

Table 8.37. Best screws considered for a sensitivity analysis (w1 = 0.2).

  Screw     L1      L2      L3       D1 (mm)   D3 (mm)
  Screw 1   7.5D    7.1D    11.4D    26.9      36.2
  Screw 2   6.3D    8.4D    11.3D    31.9      38.9
  Screw 3   6.3D    8.4D    11.3D    31.9      39.4
  Screw 4   6.3D    8.4D    11.4D    31.8      40.6
  Screw 5   5.9D    8.4D    11.6D    30.8      32.3
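The robustness screening described above can be sketched as follows; `perturbed_performance` stands in for re-running the process model under slightly changed conditions (a hypothetical hook, not the authors' code):

```python
from statistics import mean, stdev

def sensitivity(screw, perturbed_performance, perturbations):
    """Average global performance and its spread over small perturbations
    of operating conditions, rheology and criteria weights."""
    values = [perturbed_performance(screw, p) for p in perturbations]
    return mean(values), stdev(values)

# Toy stand-in model: performance degrades linearly with the perturbation.
perf = lambda screw, p: screw["base"] - 0.5 * p
avg, sd = sensitivity({"base": 0.6}, perf, [0.0, 0.1, 0.2])
print(round(avg, 3), round(sd, 3))  # 0.55 0.05
```

Ranking screws by `avg` reproduces the "global performance" choice (screw 1 above), while ranking by `sd` reproduces the "process stability" choice (screw 2), mirroring the black and white bars of Fig. 9.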
Fig. 8.9. Global sensitivity to small changes in operating conditions, rheological properties and criteria importance of the 5 best screws of Table 4.
8.4.2. Twin-Screw Extrusion
Operating conditions
As shown in Fig. 5, this problem involves determining the screw speed (N), the barrel temperature profile (T1, T2 and T3) and the flow rate (Q). The detailed screw geometry is given in Table 5, while Table 6 presents the criteria and their corresponding aim and range of variation. Since Q is imposed by a volumetric/gravimetric feeder but, simultaneously, it is convenient to maximize it, it is taken both as a parameter and as an optimization criterion. The RPSGAe was applied using the following parameters: 50 generations, crossover rate of 0.8, mutation rate of 0.05, internal and external populations with 100 individuals, limit of the clustering algorithm set at 0.2 and NRanks = 30.

Table 8.38. Screw configuration: L - Length (mm); P - Pitch (mm). (Thirteen screw elements with lengths between 30 and 150 mm, comprising conveying elements with pitches of 20 to 60 mm, KB-60, KB90 and KB-30 kneading blocks and a -30 reverse element.)
Figure 10 shows the Pareto frontiers in the criteria domain, plotted against output, while Table 7 presents the results obtained when the set of weights of Table 2 is used upon application of equation (2). As the importance of Q increases, the best solutions (represented in Fig. 10 from 1 to 5) change radically. Therefore, the decision depends entirely on the (somewhat subjective) definition of the relative importance of the criteria.

Table 8.39. Criteria for optimizing twin-screw operating conditions and corresponding range of variation.

  Criteria                             Aim                 Range of variation
  C1 - Output (kg/hr)                  Maximize            3 - 20
  C2 - Average strain                  Maximize            1000 - 15000
  C3 - Melt temp. at die exit (°C)     Stay within range   180-210 / 220-240
  C4 - Power consumption (W)           Minimize            0 - 9200
  C5 - Average residence time (s)      Minimize            10 - 300
Fig. 8.10. Pareto frontiers in the criteria domain after the optimization of the operating conditions.
Screw configuration
Finally, Algorithm 3 will be used to optimize the screw configuration, i.e., to define the best location of 10 screw elements (comprising 5 transport elements, 4 kneading blocks and 1 reverse element), as illustrated in Fig. 5. Two criteria, melt temperature and mechanical power consumption -
Table 8.40. Best operating conditions for twin-screw extrusion.

  Weights             Operating conditions
  w1    w2 to w5      N (rpm)    Q* (kg/hr)    T1 (°C)    T2 (°C)    T3 (°C)
  0.1   0.9/4         184        3             200        167        194
  0.2   0.8/4         184        3             200        167        194
  0.3   0.7/4         193        25            205        172        205
  0.4   0.6/4         193        25            205        172        205
  0.5   0.5/4         193        25            205        172        205
  0.6   0.4/4         193        25            205        172        205
which are particularly dependent on the screw geometry - should be minimized. Output, screw speed and barrel temperature are kept constant at 10 kg/hr, 100 rpm and 200 °C, respectively. The same genetic parameters were used, with the exception of the population size (200 external and 100 internal individuals). Figure 11 (top) shows the Pareto curves in the criteria domain for the initial and final populations. The improvement provided by the MOEA is significant. Since the two criteria are conflicting, solutions 1, 2 and 3, corresponding to different relative degrees of satisfaction of each criterion, are considered, the corresponding screw profiles being represented in Fig. 11 (bottom). Screw 1 produces the highest power consumption, but the lowest outlet temperature. Its kneading and reverse elements are located further upstream, making the screw less restrictive downstream. Thus, the polymer melts earlier (increasing energy consumption, as melt flow requires more power than solids flow) and the melt has time to recover from the early viscous dissipation (hence the low melt temperature). The profile - and thus the behavior - of screw 3 is the opposite, while screw 2 exhibits a geometry that is a compromise between the other two, although more similar to that of screw 1. These results are in general agreement with practical experience, although a formal experimental validation remains to be carried out.
8.5. Conclusions
An elitist multi-objective genetic algorithm, denoted RPSGAe, was used to select the operating conditions and to design screws that optimize the performance of single-screw and co-rotating twin-screw extrusion, which are important industrial processing technologies. These correspond to complex multi-objective, combinatorial and not always continuous problems. The examples studied demonstrated that the MOEA is sensitive to the type and relative
Fig. 8.11. Twin-screw configuration results: Top - Pareto curve; Bottom - optimal screws.
importance of the individual criteria, that the method proposed yields solutions with physical meaning, and that it is possible to incorporate important empirical knowledge through constraints/prescribed variation ranges of both the criteria and the process parameters.
Acknowledgments
This work was supported by the Portuguese Fundação para a Ciência e Tecnologia under grant POCTI/34569/CTM/2000.
References
1. C. Rauwendaal, Polymer Extrusion (Hanser Publishers, Munich, 1986).
2. J.F. Agassant, P. Avenas and J. Sergent, La Mise en Forme des Matières Plastiques (Lavoisier, 3rd edition, Paris, 1996).
3. Z. Tadmor and I. Klein, Engineering Principles of Plasticating Extrusion (Van Nostrand Reinhold, New York, 1970).
4. C.A. Coello Coello, D.A. Van Veldhuizen and G.B. Lamont, Evolutionary Algorithms for Solving Multi-Objective Problems (Kluwer, 2002).
5. K. Deb, Multi-Objective Optimization using Evolutionary Algorithms (Wiley, 2001).
6. A. Gaspar-Cunha and J.A. Covas, RPSGAe - A Multiobjective Genetic Algorithm with Elitism: Application to Polymer Extrusion, in Metaheuristics for Multiobjective Optimisation, Lecture Notes in Economics and Mathematical Systems, Eds. X. Gandibleux, M. Sevaux, K. Sorensen, V. T'kindt (Springer, 2004).
7. A. Gaspar-Cunha, Modelling and Optimisation of Single Screw Extrusion (Ph.D. Thesis, University of Minho, Braga, 2000).
8. J.A. Covas, A. Gaspar-Cunha and P. Oliveira, An Optimization Approach to Practical Problems in Plasticating Single Screw Extrusion, Polym. Eng. and Sci. 39, 3, p. 443 (1999).
9. A. Gaspar-Cunha and J.A. Covas, The Design of Extrusion Screws: An Optimisation Approach, Intern. Polym. Process. 16, p. 229 (2001).
10. J.L. White, Twin Screw Extrusion: Technology and Principles (Hanser, Munich, 1990).
11. J.L. White, A.Y. Coran and A. Moet, Polymer Mixing: Technology and Engineering (Hanser, Munich, 2001).
12. H. Potente, J. Ansahl and R. Wittemeier, Throughput Characteristics of Tightly Intermeshing Co-rotating Twin Screw Extruders, Intern. Polym. Proc. 5, p. 208 (1990).
13. H. Potente, J. Ansahl and B. Klarholtz, Design of Tightly Intermeshing Co-Rotating Twin Screw Extruders, Intern. Polym. Proc. 9, p. 11 (1994).
14. B. Vergnes, G. Souveton, M.L. Delacour and A. Ainser, Experimental and Theoretical Study of Polymer Melting in a Co-rotating Twin Screw Extruder, Intern. Polym. Proc. 16, p. 351 (2001).
15. H. Potente and U. Melish, A Physico-Mathematical Model for Solids Conveying in Co-rotating Twin Screw Extruders, Intern. Polym. Proc. 11, p. 101 (1996).
16. S. Bawiskar and J.L. White, A Composite Model for Solid Conveying, Melting, Pressure and Fill Factor Profiles in Modular Co-Rotating Twin Screw Extruders, Intern. Polym. Proc.
12, p. 331 (1997).
17. B. Vergnes, G. Della Valle and L. Delamare, A Global Computer Software for Polymer Flows in Corotating Twin Screw Extruders, Polym. Eng. Sci. 38, p. 1781 (1998).
18. O.S. Carneiro, J.A. Covas and B. Vergnes, Experimental and Theoretical Study of Twin-Screw Extrusion of Polypropylene, J. Appl. Polym. Sci. 78, p. 1419 (2000).
19. F. Fassihi-Tash and N. Sherkat, In-exact Decision Support for the Design of Plasticating Extruder Screws, Proceedings of Polymat'94, p. 434 (1994).
20. C.A. Thibodeau and P.G. Lafleur, Computer Design and Screw Optimization, Proceedings of the PPS Annual Meeting, Shanghai, China, p. 15 (2000).
21. H. Potente, A. Müller and K. Kretschmer, Development and Verification of a Method to Optimize Individual Screw Elements for Co-rotating Twin Screw
Extruders, Proceedings of the ANTEC 2003 conference, USA (2003).
22. M.A. Roseman and J.S. Gero, Reducing the Pareto Optimal Set in Multicriteria Optimization, Eng. Optim. 8, p. 189 (1985).
23. Y. Nagata and S. Kobayashi, Edge Assembly Crossover: A High-power GA for the TSP, Seventh Int. Conf. on Genetic Algorithms, Michigan, USA, p. 450 (1997).
24. C.L. Valenzuela and L.P. Williams, Improving Simple Heuristic Algorithms for the TSP using a GA, Seventh Int. Conf. on Genetic Algorithms, Michigan, USA, p. 458 (1997).
25. Y. Zhenyu, L. Zhang, K. Lishan and L. Guangming, A New MOEA for Multi-objective TSP and Its Convergence Property Analysis, Proceedings of the Second Int. Conf. on Evol. Multi-Objective Optimization (EMO 2003), Faro, Portugal, p. 342 (2003).
26. G. Tao and Z. Michalewicz, Inver-over Operator for the TSP, Proceedings of the 5th Parallel Problem Solving from Nature, Amsterdam, p. 803 (1998).
CHAPTER 9 EVOLUTIONARY MULTI-OBJECTIVE OPTIMIZATION OF TRUSSES
Arturo Hernandez Aguirre and Salvador Botello Rionda Center for Research in Mathematics (CIMAT) Department of Computer Science A.P. 402, Guanajuato, Gto. C.P. 36000 MEXICO E-mail: artha,[email protected] In this chapter, we introduce the ISPAES evolutionary computation algorithm for truss optimization. The ISPAES algorithm needs little or no modification to solve single-objective or multi-objective problems with a large number of constraints, in either discrete or continuous search space. We present a detailed description of the ISPAES algorithm and solve several truss optimization problems. Different modalities are illustrated, that is, continuous/discrete and single/multiple objective. Pareto fronts in both continuous and discrete space are shown.
9.1. Introduction

Evolutionary algorithms (EAs) are search and optimization techniques inspired by the natural selection principle. Thus, they are global optimization techniques naturally well suited for unconstrained problems. Since most real-life problems involve constraints, a considerable amount of research has recently been devoted to augmenting EAs with constraint-handling techniques. A common constraint-handling technique in EAs is the use of penalty functions. In this approach, the amount of constraint violation is used to penalize infeasible individuals so that feasible individuals are favored by the selection process 25,28. Nonetheless, the use of multi-objective optimization concepts has proved more promising for constraint handling. In this chapter we introduce the Inverted and Shrinkable Pareto Archived Evolutionary Strategy (ISPAES), which is an extension of the PAES algorithm 17. ISPAES does not present the scalability problems that prevented its
predecessor from solving larger problems. The ISPAES algorithm 2,3 has successfully solved the well-known Michalewicz benchmark 19, which consists of a set of 13 single-objective constrained optimization problems in continuous search space. It is uncommon to find in the specialized literature an evolutionary optimization algorithm that has been applied to problems of such different nature as those we present in this chapter. We show solutions for single- and multi-objective problems with constraints, in continuous and discrete search space. For multi-objective problems we depict the results as a Pareto front; for discrete optimization problems we use the catalog of Altos Hornos de Mexico. The organization of the chapter is the following: Section 9.2 presents the three most popular ways in which the Pareto concept has been incorporated into EAs; the constraint-handling approach is also explained. Section 9.3 presents a detailed description of the ISPAES algorithm (for continuous search space) and the simple changes needed for solving problems in discrete search space. Section 9.4 describes engineering optimization problems taken from the standard literature. Finally, Section 9.5 draws our conclusions and provides some paths for future research.

9.2. Related Work

Since our approach belongs to the group of techniques in which multiobjective optimization concepts are adopted to handle constraints, we will briefly discuss some of the most relevant work done in this area. The main idea of adopting multiobjective optimization concepts to handle constraints is to redefine the single-objective optimization of f(x) as a multiobjective optimization problem in which we have m + 1 objectives, where m is the total number of constraints. Then, we can apply any multiobjective optimization technique 11 to the new vector v = (f(x), f1(x), ..., fm(x)), where f1(x), ..., fm(x) are the original constraints of the problem.
An ideal solution x would thus have fi(x) = 0 for 1 <= i <= m and f(x) <= f(y) for all feasible y (assuming minimization). Three mechanisms taken from evolutionary multiobjective optimization are most frequently incorporated into constraint-handling techniques 18:

(1) Use of Pareto dominance as a selection criterion. Examples of this type of approach are given in 6,16,8.
(2) Use of Pareto ranking 14 to assign fitness in such a way that nondominated individuals (i.e., feasible individuals in this case) are
assigned a higher fitness value. Examples of this type of approach are given in 22,23.
(3) Split the population into subpopulations that are evaluated either with respect to the objective function or with respect to a single constraint of the problem. This is the selection mechanism adopted in the Vector Evaluated Genetic Algorithm (VEGA) 26. Examples of this type of approach are given in 29,20,9.
In order to sample the feasible region of the search space widely enough to reach the global optimum, it is necessary to maintain a balance between feasible and infeasible solutions. If this diversity is not maintained, the search will focus on only one area of the feasible region and will thus lead to a local optimum. A multiobjective optimization technique aims to find a set of trade-off solutions which are considered good in all the objectives to be optimized. In global nonlinear optimization, however, the main goal is to find the global optimum, so some changes must be made to those approaches in order to adapt them to the new goal. Our main concern is that feasibility takes precedence over nondominance: good "trade-off" solutions that are not feasible cannot be considered as good as bad "trade-off" solutions that are feasible. Furthermore, a mechanism to maintain diversity must normally be added to any evolutionary multiobjective optimization technique. Tied to the constraint-handling mechanism of ISPAES is an enhanced selection operator. A desirable selection operator provides a blend of feasible and infeasible individuals at any generation of the evolutionary process; higher population diversity enhances exploration and prevents premature convergence. A robust evolutionary algorithm for constrained optimization must provide a selection mechanism with two clear objectives: to keep diversity, and to provide promising individuals (approaching the optimum). These goals are difficult to reach when the selection mechanism is driven by "greedy rules" that fail to cooperate. A poor selection mechanism can undo the effort of the diversity mechanism if only best-and-feasible individuals are favored. Similarly, a poor diversity preservation mechanism may never provide interesting individuals to the Pareto dominance-based selection operator, and thus never create a promising blend of individuals for the next generation.
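The reformulation above, together with the feasibility-first comparison rule just discussed, fits in a few lines of Python. This is an illustrative sketch, not the authors' code; `objective_vector`, `violation` and `dominates` are names we introduce here.

```python
# Sketch: recast a constrained single-objective problem as the vector
# v = (f(x), f1(x), ..., fm(x)) and compare candidates with
# feasibility-first Pareto dominance.

def objective_vector(f, constraints, x):
    """Build v = (f(x), f1(x), ..., fm(x)); fi(x) <= 0 means satisfied."""
    return [f(x)] + [g(x) for g in constraints]

def violation(v):
    # Total constraint violation; 0.0 means the individual is feasible.
    return sum(max(0.0, gi) for gi in v[1:])

def dominates(va, vb):
    """Feasibility takes precedence; otherwise standard Pareto dominance."""
    viol_a, viol_b = violation(va), violation(vb)
    if viol_a == 0.0 and viol_b > 0.0:
        return True
    if viol_a > 0.0 and viol_b == 0.0:
        return False
    # Both feasible or both infeasible: Pareto dominance (minimization).
    return all(a <= b for a, b in zip(va, vb)) and \
           any(a < b for a, b in zip(va, vb))

# Example: minimize f(x) = x0 + x1 subject to g(x) = 1 - x0 <= 0.
f = lambda x: x[0] + x[1]
g = [lambda x: 1.0 - x[0]]
va = objective_vector(f, g, [1.5, 0.5])   # feasible
vb = objective_vector(f, g, [0.2, 0.1])   # infeasible (g > 0)
print(dominates(va, vb))  # -> True: a feasible point beats an infeasible one
```

Note how the feasible candidate wins even though its objective value is worse, which is exactly the precedence rule argued for above.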
9.3. ISPAES Algorithm

All of the approaches discussed in the previous section have drawbacks that keep them from producing results competitive with the constraint-handling techniques that represent the state of the art in evolutionary optimization. In a recent technical report 18, four of the existing techniques based on multiobjective optimization concepts (i.e., COMOGA 29, VEGA 9, MOGA 8 and NPGA 7) were compared using Michalewicz's benchmark 19 and some additional engineering optimization problems. Although inconclusive, the results indicate that the use of Pareto dominance as a selection criterion gives better results than Pareto ranking or the use of a population-based approach. However, in all cases, the approaches analyzed were unable to reach the global optimum of problems with either high dimensionality, large feasible regions or many nonlinear equality constraints 18. In contrast, the approach proposed in this chapter uses Pareto dominance as the selection criterion but, unlike previous work in the area, employs a secondary population. The approach, a relatively simple extension of PAES 17, nevertheless provides very good results, highly competitive with those generated by an approach that represents the state of the art in constrained evolutionary optimization. The structure of the ISPAES algorithm is shown in Figure 9.1. Notice the two loops operating over the Pareto set (in the external storage): the right loop aims for exploration of the search space, while the left loop aims for population diversity and exploitation. ISPAES has been implemented as an extension of the Pareto Archived Evolution Strategy (PAES) proposed by Knowles and Corne 17 for multiobjective optimization. PAES's main feature is the use of an adaptive grid that imposes a coordinate system on objective function space. This grid is the diversity maintenance mechanism of PAES and constitutes the main feature of that algorithm.
The grid is created by bisecting k times the function space of dimension d (d is the number of objective functions of the problem; in our case, d = n + p + 1, where n is the number of inequality constraints and p the number of equality constraints, with one added for the original objective function of the problem). The control of 2^(kd) grid cells means the allocation of a large amount of physical memory for even small problems: for instance, 10 functions and 5 bisections of the space produce 2^50 cells. Thus, the first feature introduced
Fig. 9.1. The logical structure of the ISPAES algorithm
in ISPAES is the "inverted" part of the algorithm, which deals with this space-usage problem. ISPAES's fitness function is mainly driven by a feasibility criterion. Global information carried by the individuals surrounding the feasible region is used to concentrate the search effort on smaller areas as the evolutionary process takes place. In consequence, the search space being explored is "shrunk" over time. Eventually, upon termination, the size of the search space being inspected will be very small and will contain the desired solution (in the case of single-objective problems; for multi-objective problems, it will contain the feasible region). The main algorithm of ISPAES is shown in Figure 9.2. Its goal is the construction of the Pareto front, which is stored in an external memory (called file). The algorithm performs MaxNew loops, generating a child h from a random parent c in every loop. Therefore, the ISPAES algorithm introduced here is based on a (1+1)-ES. If the child is better than the
parent, that is, the child dominates its parent, then it is inserted in file and its position is recorded. A child is generated by introducing random mutations to the parent; thus, h = mutate(c) alters the parent with increments whose standard deviation is governed by Equation 1.

maxsize: maximum size of file
c: current parent, c in X (decision variable space)
h: child of c, h in X
a_p: individual in file that dominates h
a_d: individual in file dominated by h
current: current number of individuals in file
cnew: number of individuals generated thus far
g: pick a new parent from a less densely populated region every g new individuals
r: shrink the space every r new individuals

current = 1; cnew = 0;
c = newindividual(); add(c);
While cnew < MaxNew do
  h = mutate(c); cnew += 1;
  test(h, c, file);
  if (cnew mod g == 0) then c = pick parent from a less crowded region of file;
  if (cnew mod r == 0) then shrinkspace(file);
End While
Fig. 9.2. Main algorithm of ISPAES
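The main loop of Figure 9.2 can be rendered as a runnable skeleton. This is a sketch under our own naming, with the operators passed in as plain functions; it shows only the (1+1)-ES control flow, not the real ISPAES operators.

```python
# Illustrative skeleton of the ISPAES main loop (Fig. 9.2): mutate one
# parent per iteration, let test() update the archive, re-pick the parent
# every g children and shrink the search space every r children.
import random

def ispaes_main(new_individual, mutate, test, pick_less_crowded,
                shrinkspace, max_new, g, r):
    archive = []                    # the external memory ("file")
    c = new_individual()
    archive.append(c)
    cnew = 0
    while cnew < max_new:
        h = mutate(c)
        cnew += 1
        c = test(h, c, archive)     # test() may also insert h into archive
        if cnew % g == 0:
            c = pick_less_crowded(archive)
        if cnew % r == 0:
            shrinkspace(archive)
    return archive

# Toy run: minimize x^2 with trivial stand-ins for the ISPAES operators.
random.seed(7)
result = ispaes_main(
    new_individual=lambda: random.uniform(-10.0, 10.0),
    mutate=lambda c: c + random.gauss(0.0, 1.0),
    test=lambda h, c, arc: (arc.append(h) or h) if h * h < c * c else c,
    pick_less_crowded=lambda arc: min(arc, key=lambda x: x * x),
    shrinkspace=lambda arc: None,
    max_new=2000, g=10, r=200)
print(round(min(x * x for x in result), 3))
```

The `g` and `r` schedules are the two hooks that the rest of this section fills in with the adaptive grid and the shrinking mechanism.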
Most of main and of the function test(h,c,file) in ISPAES is devoted to three decisions: (1) whether a new child should be inserted in file; if so, (2) how to make room for the new member; and (3) who becomes the new parent. Every time g new children have been created, a new parent is randomly
picked from file for this purpose. Also, every r children generated, the space is shrunk around the current Pareto front represented by the individuals of the external memory. Here we introduce the following notation: x1 ◁ x2 means x1 is located in a less populated region of the grid than x2. The pseudo-code of this function is depicted in Figure 9.3.

if (current < maxsize) then {
  add(h);
  if (h ◁ c) then c = h;
}
else if (exists a_p in file such that h dominates a_p) then {
  remove(a_p); add(h);
  if (h ◁ c) then c = h;
}

Fig. 9.3. Pseudo-code of test(h,c,file) (called by main of ISPAES)
9.3.1. Inverted "ownership"
As noted before, ISPAES keeps the location of every individual in the grid, whereas PAES keeps the occupancy of every grid location. The advantage of the inverted relationship is clear, since in the worst case there are only as many occupied grid locations to record as the population size (one individual per grid location).

9.3.2. Shrinking the Objective Space
Shrinkspace(file) is the most important function of ISPAES, since its task is the reduction of the search space. The space is reduced every r generations. The pseudo-code of Shrinkspace(file) is shown in Figure 9.4. In the following we describe the four tasks performed by shrinkspace.

• The function select(file) returns a list whose elements are the best individuals found in file. The size of the list is set to 15% of maxsize. Thus, the goal of select(file) is to create a list with: a) only the best feasible individuals, b) a combination of feasible and partially feasible individuals, or c) the "most promising" infeasible individuals. The selection algorithm is shown in Figure 9.5. Note that validconstraints (a list of indexes to the problem constraints)
x_pob: vector containing the smallest value of each x_i in X found in the population
x̄_pob: vector containing the largest value of each x_i in X found in the population

select(file);
getMinMax(file, x_pob, x̄_pob);
trim(x_pob, x̄_pob);
adjustparameters(file);

Fig. 9.4. Pseudo-code of Shrinkspace(file) (called by main of ISPAES)

indicates the order in which constraints are tested. The loop steps over the constraints, removing only one (the worst) individual for each constraint until there is none to delete (all feasible) or 15% of the file size is reached (in other words, 85% of the Pareto set will be generated anew using the best 15% of individuals as parents). Also, in order to keep diversity, a new parent is randomly chosen from the less populated region of the grid after placing g new individuals on it.

• The function getMinMax(file) takes the list produced by select (last step in Figure 9.5) and finds the extreme values of the decision variables represented by those individuals. Thus, the vectors x_pob and x̄_pob are found.

• The function trim(x_pob, x̄_pob) shrinks the feasible space around the potential solutions enclosed in the hypervolume defined by the vectors x_pob and x̄_pob; that is, it determines the new boundaries for the decision variables. The value of β is the rate by which the boundary values of each x_i in X are reduced such that the resulting hypervolume H is a fraction α of its previous value. The function trim first finds in the population the boundary values of each decision variable, x̄_pob,i and x_pob,i. Then the new bounds x̄_i and x_i are updated by deltaMin, the reduction in each variable that overall reflects a change in the volume by a factor β. In ISPAES all decision variables are reduced at the same rate β; therefore, β can be deduced from α as discussed next. Since we need the new hypervolume to be a fraction α of the previous one,
$$H_{new} \geq \alpha\, H_{old}$$
m: number of constraints
i: constraint index
maxsize: max size of file
listsize: 50% of maxsize
constraintvalue(x,i): value of individual x at constraint i
sortfile(file): sort file by objective function
worst(file,i): worst individual in file for constraint i

validconstraints = {1, 2, 3, ..., m};
i = firstin(validconstraints);
While (size(file) > listsize and size(validconstraints) > 0) {
  x = worst(file, i)
  if (x violates constraint i)
    file = delete(file, x)
  else
    validconstraints = removeindex(validconstraints, i)
  if (size(validconstraints) > 0) i = nextin(validconstraints)
}
if (size(file) == listsize)
  list = file
else
  file = sortfile(file)
  list = copy(file, listsize)   * pick the best listsize elements *

Fig. 9.5. Pseudo-code of select(file) (called by shrinkspace)
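For concreteness, the pseudo-code of Figure 9.5 can be transcribed into Python roughly as below; the helper `violation_of` and the tuple encoding of individuals are our assumptions.

```python
# Compact rendering of select(file) from Fig. 9.5: drop the worst violator
# of each still-violated constraint until everyone left is feasible or the
# target list size is reached; if all are feasible, keep the best listsize
# individuals by objective value.

def select(file, listsize, n_constraints, violation_of):
    """violation_of(x, i) -> amount x violates constraint i (<= 0 is ok)."""
    valid = list(range(n_constraints))
    while len(file) > listsize and valid:
        for i in list(valid):
            if len(file) <= listsize:
                break
            worst = max(file, key=lambda x: violation_of(x, i))
            if violation_of(worst, i) > 0:
                file.remove(worst)      # one removal per constraint pass
            else:
                valid.remove(i)         # constraint i satisfied by everyone
    if len(file) > listsize:
        # All feasible: keep the best by objective (assumed stored at x[0]).
        file = sorted(file, key=lambda x: x[0])[:listsize]
    return file

# Individuals as (objective, violation); one constraint whose violation is x[1].
pool = [(1.0, 0.5), (4.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
print(select(pool, 2, 1, lambda x, i: x[1]))
# -> [(2.0, 0.0), (3.0, 0.0)]: the violator is dropped first, then the
#    worst feasible individual by objective value.
```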
$$\prod_{i=1}^{n}\left(\bar{x}_i^{t+1}-\underline{x}_i^{t+1}\right) = \alpha \prod_{i=1}^{n}\left(\bar{x}_i^{t}-\underline{x}_i^{t}\right)$$

Since every $x_i$ is reduced at the same rate $\beta$,

$$\prod_{i=1}^{n}\beta\left(\bar{x}_i^{t}-\underline{x}_i^{t}\right) = \alpha \prod_{i=1}^{n}\left(\bar{x}_i^{t}-\underline{x}_i^{t}\right) \quad\Longrightarrow\quad \beta^{n}=\alpha$$

In short, the new search interval of each decision variable $x_i$ is
n: size of the decision vector
x̄_i^t: current upper bound of the ith decision variable
x_i^t: current lower bound of the ith decision variable
x̄_pob,i: upper bound of the ith decision variable in the population
x_pob,i: lower bound of the ith decision variable in the population

for all i in {1, ..., n}:
  slack_i = 0.05 × (x̄_pob,i − x_pob,i)
  width_pob,i = x̄_pob,i − x_pob,i;  width_i^t = x̄_i^t − x_i^t
  deltaMin_i = (β × width_i^t − width_pob,i) / 2
  delta_i = max(slack_i, deltaMin_i)
  x̄_i^{t+1} = x̄_pob,i + delta_i
  x_i^{t+1} = x_pob,i − delta_i
  if (x̄_i^{t+1} > x̄_original,i) then x̄_i^{t+1} = x̄_original,i
  if (x_i^{t+1} < x_original,i) then x_i^{t+1} = x_original,i

Fig. 9.6. Pseudo-code of trim (called by shrinkspace)
adjusted as follows (the complete algorithm is shown in Figure 9.4):

$$width_{new} \geq \beta \times width_{old}$$

It should be noted that the value of α has an important impact on the performance of ISPAES because it controls the shrinking speed. In order to determine a range within which this parameter could be set for a large variety of problems, we studied the effect of α on the performance of our algorithm for many test problems. From this analysis we found that, in all cases, a range of α between 85% and 97% was able to generate the best possible solutions to each problem. Values smaller than 0.80 make the algorithm prone to converge to local minima; values of α too near 100% slow down convergence, although they increase the probability of success. In order to avoid fine-tuning α for each test function, we set its value to 0.90, which we considered a good compromise based on our analysis. As we will see later on, this value of α provided good results in all the problems solved.
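The relation β^n = α derived above is easy to check numerically; the interval widths below are arbitrary illustrative values.

```python
# Numeric check of the shrinking rule beta**n = alpha: reducing every
# decision-variable interval by beta = alpha**(1/n) shrinks the
# hypervolume by exactly alpha.
import math

n = 16                       # e.g. the 16 design variables of the 72-bar truss
alpha = 0.90                 # hypervolume retained per shrink (authors' setting)
beta = alpha ** (1.0 / n)

widths = [1.0 + i for i in range(n)]             # arbitrary interval widths
old_volume = math.prod(widths)
new_volume = math.prod(beta * w for w in widths)

print(round(new_volume / old_volume, 10))        # -> 0.9
```

Because each of the n widths is scaled by the same β, the volume ratio is β^n regardless of the individual widths, which is why a single global α suffices.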
Note also that the parameter r (see Figure 9.2), which controls the shrinking rate, plays an important role in the algorithm. To set the value of r, we performed an analysis similar to the one described for α, relating the behavior of r to that of α and to the performance of ISPAES. Our results indicated that a value of r = 2 × maxsize provided convergence to the optimum in most of the problems (maxsize is the number of elements allowed in the Pareto set, stored in the external file). Thus, we used r = 200 and maxsize = 100 in all the experiments reported in this chapter. The variable slack is computed once every new search interval is determined (it is usually set to 5% of the interval); its role is simply to prevent, up to some extent, overly fast decreasing rates of the search interval.

• The last step of shrinkspace() is a call to adjustparameters(file), whose goal is to reset the control variables σ_i through:

$$\sigma_i = \left(\bar{x}_i - \underline{x}_i\right)/\sqrt{n}\,, \qquad i \in \{1,\ldots,n\} \qquad (1)$$
This expression is also used during the generation of the initial population; in that case, the upper and lower bounds take the initial values of the search space indicated by the problem. The variation of the mutation probability follows the exponential behavior suggested by Bäck 4.

Elitism. A special form of elitism is implemented in ISPAES to prevent the loss of the best individual. Elitism is implemented as follows: the best individual of the generation is marked and only replaced by another one that is in the feasible region and has a better objective function value.

ISPAES for Optimization Problems in Discrete Search Space. Simple modifications are required for discrete optimization problems. The initial value of every objective variable is a random integer drawn from a uniform distribution bounded by the upper and lower limits stated by the specific problem. Mutation of objective variables is performed as follows:

$$x_i^{t+1} = x_i^t + rand(\sigma_i)$$

where σ_i is the control variable of the corresponding objective variable, and rand(σ_i) is a uniformly distributed random number whose range is governed by σ_i.
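The discrete-space operators described here can be sketched as follows. Note that the exact mutation interval is not fully legible in the source, so the symmetric range (−σ, σ) used below is our assumption, as are all the function names.

```python
# Sketch of the discrete-search-space operators: objective variables move
# by a rounded uniform step governed by sigma, and each sigma drifts by
# +/-1 with a slight bias. The interval (-sigma, sigma) is an assumption.
import random

def mutate_discrete(x, sigma, lower, upper):
    """Integer mutation: rounded uniform step, clamped to the variable bounds."""
    step = round(random.uniform(-sigma, sigma))
    return min(max(x + step, lower), upper)

def mutate_sigma(sigma):
    """Self-adapt sigma: +1 with probability 0.45, otherwise -1 (floor at 1)."""
    return sigma + 1 if random.random() < 0.45 else max(1, sigma - 1)

random.seed(3)
x = 10
for _ in range(100):
    x = mutate_discrete(x, 4, 0, 64)
print(0 <= x <= 64)  # -> True
```

Clamping to the catalog index range and flooring σ at 1 mirror the rounding and minimum-slack rules the text states for the discrete case.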
Control variables σ_i are mutated as follows:

if (random() < 0.45) then σ = σ + 1; else σ = σ − 1;

that is, with probability slightly greater than 0.5, the control variables diminish their value by 1. The reduction of the search space is performed as shown in Figure 9.6 for the real-space case, except that all results of the computations are rounded up to the next integer. The variable slack is also computed as depicted in Figure 9.6; it must also be rounded up, and its smallest possible value is 1.

9.4. Optimization Examples

The parameters used by the ISPAES algorithm for solving all the following problems were:

• maxsize = 100, the size of the Pareto set.
• r = 200, the reduction rate: perform shrinkspace each time 200 children have been generated.
• listsize = 50% of maxsize. Thus, when infeasible individuals are removed from the Pareto set, at least 50% of the original size is kept.
• α = 0.9. Thus, the hypervolume preserved after a shrink is at least 90%.
• A total of 50,000 fitness function evaluations are performed.

9.4.1. Optimization of a 49-bar Plane Truss
The first engineering optimization problem chosen is the optimization of the 49-bar plane truss shown in Figure 9.7. The solutions to this problem were computed in discrete search space using the catalog of Altos Hornos de Mexico. Both single-objective and multi-objective versions of the problem are described next.

9.4.1.1. The 49-bar Plane Truss as a Single-Objective Optimization Problem with Constraints

The goal is to find the cross-sectional area of each member of the truss such that the overall weight is minimized, subject to stress and displacement constraints. The weight of the truss is given by $F(x) = \sum_{j=1}^{49} \gamma A_j L_j$, where
A_j is the cross-sectional area of the jth member, L_j is the corresponding length of the bar, and γ is the volumetric density of the material.
Fig. 9.7. 49-bar plane truss used as the first engineering optimization example
We used the catalog of Altos Hornos de Mexico, S.A., with 65 entries for the cross-sectional areas available for the design. Other relevant data are the following: Young's modulus = 2.1 × 10^6 kg/cm², maximum allowable stress = 3500.00 kg/cm², γ = 7.4250 × 10^-3 kg/cm³, and a horizontal load of 4994.00 kg applied to nodes 3, 5, 7, 9, 12, 14, 16, 19, 21, 23, 25 and 27. We solved this problem for three cases:

(1) Case 1. Stress constraints only: maximum allowable stress = 3500.00 kg/cm². A total of 49 constraints, thus 50 objective functions.
(2) Case 2. Stress and displacement constraints: maximum allowable stress = 3500.00 kg/cm², maximum displacement per node = 10 cm. There are 72 constraints, thus 73 objective functions.
(3) Case 3. Real-world problem: the design considers traction and compression stresses on the bars, as well as their self-weight. Maximum allowable stress = 3500.00 kg/cm², maximum displacement per node = 10 cm. A total of 72 constraints, thus 73
objective functions.

The average results of 30 runs for each case are shown in Tables 9.41, 9.42 and 9.43. We compare ISPAES with previous results reported by Botello et al. 5 using other heuristics with a penalty function 25 (SA: Simulated Annealing; GA50: Genetic Algorithm with a population of 50; GSSA: General Stochastic Search Algorithm with populations of 50 and 5).

Table 9.41. Comparison of different algorithms on the 49-bar truss, case 1

Algorithm | Average Weight (Kg)
ISPAES    | 610
SA        | 627
GA50      | 649
GSSA50    | 619
GSSA5     | 625

Table 9.42. Comparison of different algorithms on the 49-bar truss, case 2

Algorithm | Average Weight (Kg)
ISPAES    | 725
SA        | 737
GA50      | 817
GSSA50    | 748
GSSA5     | 769

Table 9.43. Comparison of different algorithms on the 49-bar truss, case 3

Algorithm | Average Weight (Kg)
ISPAES    | 2603
SA        | 2724
GA50      | 2784
GSSA50    | 2570
GSSA5     | 2716
We can see that ISPAES produced the lowest average weight in the first two cases; in case 3 only GSSA50 obtained a lower average weight.
9.4.1.2. The 49-bar Plane Truss as a Multi-Objective Optimization Problem with Constraints

The statement of this problem is similar to case 3 in Section 9.4.1.1, but now we consider two objective functions for simultaneous optimization. The first objective is the minimization of the structure weight; the second is the minimization of the horizontal displacement of the node at the upper right corner of the structure. The Pareto front of these two objectives, subject to 71 constraints, is shown in Figure 9.8.
Fig. 9.8. Pareto front for 49-bar plane truss for two objective optimization problem (See Section 9.4.1.2)
9.4.2. Optimization of a 10-bar Plane Truss

The second engineering optimization problem chosen is the optimization of the 10-bar plane truss shown in Figure 9.9. This problem has been solved by several authors in real search space; thus, for the sake of comparison, we solved it in real space for both single-objective and multi-objective optimization.
9.4.2.1. The 10-bar Plane Truss as a Single-Objective Optimization Problem with Constraints The goal is to find the cross-sectional area of each bar of this truss such that its weight is minimized, subject to stress and displacement constraints. The weight of the truss is given by:
$$F(x) = \sum_{j=1}^{10} \gamma A_j L_j \qquad (2)$$
where x is a candidate solution, A_j is the cross-sectional area of the jth member, L_j is the length of member j, and γ is the volumetric weight of the material. The maximum allowable displacement for each node (vertical and horizontal) is assumed to be 5.08 cm. There are 10 stress constraints and 8 displacement constraints in total. The minimum and maximum allowable values for the cross-sectional areas are 0.5062 cm² and 999.0 cm², respectively. The remaining assumed data are: Young's modulus E = 7.3 × 10^5 kg/cm², maximum allowable stress = 1742.11 kg/cm², γ = 7.4239 × 10^-3 kg/cm³, and a vertical load of −45454.0 kg applied at nodes 2 and 4. Table 9.44 shows the minimum value found for this problem by different heuristic algorithms 5: GSSA (General Stochastic Search Algorithm with a population size of five, a crossover rate of zero, a mutation rate of 10/(number_of_bars), and simulated annealing with α = 1.001), VGA (the variable-length genetic algorithm of Rajeev and Krishnamoorthy 21, with a population size of 50), MC (the Monte Carlo annealing algorithm of Elperin 12), SAARSR (Simulated Annealing with Automatic Reduction of Search Range, proposed by Tzan and Pantelides 30), ISA (Iterated Simulated Annealing, of Ackley 1), and SSO (State Space Optimal 15). We can see in Table 9.44 that ISPAES found better results than any of the other methods. Note that MC found a solution with a lower weight than ISPAES, but that solution violates stress and displacement constraints, as can be seen in Tables 9.45 and 9.46. The convergence of the algorithm is shown in Figure 9.10 (for a random run). Note that the algorithm reaches the neighborhood of the optimum at 25,000 fitness function evaluations.
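Equation (2) itself is a plain weighted sum; as a quick illustration, with hypothetical member lengths (the real lengths follow from the truss geometry in Figure 9.9):

```python
# Eq. (2) as code: weight = sum over members of gamma * A_j * L_j.
# Areas are the first three ISPAES entries from Table 9.44; the lengths
# are illustrative placeholders, not the actual truss dimensions.
gamma = 7.4239e-3                     # volumetric weight, kg/cm^3
areas = [190.53, 0.6466, 146.33]      # cm^2
lengths = [914.4, 914.4, 1293.0]      # cm (hypothetical values)

weight = sum(gamma * a * L for a, L in zip(areas, lengths))
print(weight > 0.0)  # -> True
```

In the full problem the optimizer varies only the areas; the lengths and γ are fixed by the geometry and material, so the objective is linear in the design variables.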
Fig. 9.9. 10-bar plane truss used as the second engineering optimization example
9.4.2.2. The 10-bar Plane Truss as a Multi-Objective Optimization Problem with Constraints

Now we approach the 10-bar truss design as a multiobjective optimization problem with constraints. The first objective is the minimization of the structure weight, and the second is the vertical displacement of node number 2. The Pareto front of these functions is shown in Figure 9.11.

9.4.3. Optimization of a 72-bar 3D Structure

The next problem is the design of the 72-bar 3D structure shown in Figure 9.12, which has been addressed elsewhere in the literature 10.
Table 9.44. Comparison of weights for the 10-bar plane truss of the second engineering example

Element     | ISPAES   | GSSA   | VGA    | MC     | SSO    | ISA     | SAARSR
1           | 190.53   | 205.17 | 206.46 | 200.01 | 193.75 | 269.48  | 201.35
2           | 0.6466   | 0.6452 | 0.6452 | 0.6452 | 0.6452 | 79.810  | 0.6452
3           | 146.33   | 134.20 | 151.62 | 129.04 | 150.15 | 178.45  | 161.55
4           | 95.07    | 90.973 | 103.23 | 90.328 | 98.62  | 152.90  | 95.68
5           | 0.6452   | 0.6452 | 0.6452 | 0.6452 | 0.6452 | 70.390  | 0.6452
6           | 3.0166   | 0.6452 | 0.6452 | 0.6452 | 3.23   | 10.260  | 4.19
7           | 47.677   | 55.487 | 54.84  | 51.616 | 48.18  | 147.87  | 49.16
8           | 129.826  | 127.75 | 129.04 | 145.17 | 136.64 | 14.710  | 131.55
9           | 133.282  | 133.56 | 132.27 | 96.78  | 139.47 | 156.06  | 134.32
10          | 0.6452   | 0.6452 | 0.6452 | 0.6452 | 0.6452 | 87.740  | 0.6452
Vol. (cm³)  | 801624.5 | 805777 | 833258 | 765710 | 828956 | 1313131 | 833258
Weight (kg) | 5951     | 6186   | 6186   | 5685   | 6155   | 9750    | 6187
Table 9.45. Comparison of stresses for the 10-bar plane truss of the second engineering example. Boldface in the original marks elements in which the stress constraints are violated.

Element | IS-PAES | GSSA     | VGA      | MC       | SSO      | ISA     | SAARSR
1       | 483.27  | -447.65  | -444.75  | -460.10  | -475.31  | -209.75 | -476.58
2       | -73.37  | 0.41     | 3.41     | -15.30   | 91.98    | -111.35 | 43.99
3       | -613.26 | 670.31   | 593.43   | 695.72   | 597.46   | 449.90  | 569.04
4       | -478.62 | 499.60   | 440.30   | 503.06   | 461.46   | 239.13  | 485.80
5       | 1741.30 | -1464.09 | -1428.68 | -1757.16 | -1754.88 | 362.13  | -1641.04
6       | -15.72  | 0.41     | 3.41     | -15.30   | 18.37    | -866.13 | 14.83
7       | 1313.54 | -1134.31 | -1148.24 | -1214.48 | -1299.10 | -763.45 | -1311.60
8       | -507.89 | 513.60   | 508.24   | 453.71   | 482.74   | 1064.60 | 528.83
9       | 482.80  | -481.25  | -485.97  | -664.00  | -461.46  | -331.34 | -492.79
10      | 103.985 | 0.58     | -4.82    | 21.64    | -130.07  | 143.23  | -65.61
Table 9.46. Comparison of displacements for the 10-bar plane truss of the second engineering example. Boldface in the original marks elements in which the displacement constraints are violated.

Element | IS-PAES | GSSA    | VGA     | MC      | SSO     | ISA     | SAARSR
1       | 0.5134  | 0.5602  | 0.5528  | 0.5954  | 0.4802  | 0.4022  | 0.5419
2       | -5.080  | -5.0798 | -4.9040 | -5.4352 | -4.9056 | -3.8008 | 5.0889
3       | -1.368  | -1.4654 | -1.2948 | -1.5016 | -1.3264 | -0.8631 | -1.3213
4       | -5.060  | -5.0792 | -4.8997 | 5.4543  | -4.8826 | -4.8857 | -5.0746
5       | 0.6053  | 0.5607  | 0.5571  | 0.5763  | 0.5954  | 0.2627  | 0.5970
6       | -1.878  | -1.8474 | -1.8303 | -1.7130 | -1.8047 | -2.9298 | -1.9303
7       | -0.768  | -0.8396 | -0.7433 | -0.8715 | -0.7484 | -0.5636 | -0.7129
8       | -4.059  | -3.6813 | -3.6199 | -3.9140 | -4.0030 | -2.4762 | -3.9901
The truss is subject to two distinct loading conditions and has sixteen independent design variables. All nodes are subject to the displacement constraint
Fig. 9.10. Typical convergence of ISPAES for 10-bar truss problem as a single objective optimization (Section 9.4.2.1)
Δ ≤ 0.25 inches in the x and y directions. All bars have the stress constraint −1759.25 kg/cm² ≤ (σ_a)_i ≤ 1759.25 kg/cm², i = 1, 2, ..., 72. The minimum size constraint is 0.254 cm² ≤ A_i, i = 1, 2, ..., 72. The properties of the material are: modulus of elasticity 7.031 × 10^6 kg/cm², volumetric weight 2.77 × 10^-3 kg/cm³. The first loading condition has a point load at node 1 of 2270 kg in the x direction, 2270 kg in the y direction and −2270 kg in the z direction. The second loading condition has four load points at nodes 1, 2, 3 and 4, with −2270 kg in the z direction. The problem consists of designing the truss for both loading conditions. In Table 9.47 we give the group description of the truss. We solved this problem as a single-objective optimization case in both continuous and discrete search spaces.
9.4.3.1. The 72-bar 3D Structure in Continuous Search Space as a Single-Objective Optimization Problem with Constraints

As noted, the design problem is the minimization of the structure weight subject to both loading conditions. We compare ISPAES against several results
A. Hernandez and S. Botello
Fig. 9.11. Pareto front for 10-bar truss optimization as a multiobjective optimization problem

Table 9.47. 72-bar 3D cross sections by group

Group | Members
    1 | A1-A4
    2 | A5-A12
    3 | A13-A16
    4 | A17-A18
    5 | A19-A22
    6 | A23-A30
    7 | A31-A34
    8 | A35-A36
    9 | A37-A40
   10 | A41-A48
   11 | A49-A52
   12 | A53-A54
   13 | A55-A58
   14 | A59-A66
   15 | A67-A70
   16 | A71-A72
Fig. 9.12. Optimization of 72-bar 3D structure
of other authors in Table 9.48; as can be observed, IS-PAES provides the best solution. In Table 9.49 we show basic statistics over 30 runs.
Table 9.48. IS-PAES vs. results of several authors for the 72-bar 3D structure in continuous search space

Algorithm   | Best Minimum Weight (Kg)
IS-PAES     | 172.02
Venkayya 31 | 173.06
Gellatly 15 | 179.77
Renwei 24   | 172.36
Schmit 27   | 176.44
Xicheng 32  | 172.90
GAPS 5      | 173.94
Table 9.49. IS-PAES statistics for the 72-bar 3D structure in continuous search space

Parameter  | Weight (Kg)
Best       | 172.02
Worst      | 172.09
Mean       | 172.05
Std. dev.  | 0.015
Median     | 172.04
Fact. Sol. | 30
9.4.3.2. The 72-bar 3D Structure in Discrete Search Space as a Single-Objective Optimization Problem with Constraints

We solved three cases of this problem using the catalog of Altos Hornos de Mexico, S.A., with 65 entries for the cross-sectional areas: 1) stress constraints only; 2) stress and displacement constraints; 3) displacement constraints, considering bar tension and compression stresses as well as the self-weight of the bars. The values of the material properties and constraints remain unchanged for all three cases. Statistics of the best solutions over 30 runs for the three cases are shown in Table 9.50.

Table 9.50. IS-PAES solutions to the 72-bar 3D structure using a catalog with 65 entries (discrete search space)

Parameter  | Case 1 (Kg) | Case 2 (Kg) | Case 3 (Kg)
Best       |     92.3295 |    192.7194 |    630.4000
Worst      |     92.3295 |    193.4353 |    640.3640
Mean       |     92.3295 |    192.9098 |    633.2354
Std. dev.  |         0.0 |      0.3060 |      2.7371
Median     |     92.3295 |    192.7194 |    632.9665
Fact. Sol. |          30 |          30 |          30
9.5. Final Remarks and Future Work

We have introduced the IS-PAES evolutionary algorithm, which combines the following four ideas: 1) a constraint-handling mechanism based on multiobjective optimization concepts; 2) a Pareto dominance-based selection operator that promotes diversity and a desired blend of promising and "best infeasible" individuals; 3) a population-driven (and thus self-adaptive) search-space reduction mechanism that directs the search towards potential areas of the space; and 4) an external memory to store the latest Pareto set.
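To make the search-reduction idea concrete, here is one plausible reading of it, sketched in Python. Everything here (the function name, the `beta` retention factor, shrinking each variable's interval around the external Pareto memory) is our own illustration of the general mechanism, not the actual IS-PAES update rule.

```python
# Illustrative only: shrink each decision variable's interval around the
# designs currently held in the Pareto memory, never narrowing the interval
# by more than a factor `beta` per call. Names and details are assumptions.

def shrink_bounds(bounds, pareto_set, beta=0.9):
    """bounds: list of (lo, hi) per variable; pareto_set: list of designs."""
    new_bounds = []
    for k, (lo, hi) in enumerate(bounds):
        xs = [design[k] for design in pareto_set]
        lo_p, hi_p = min(xs), max(xs)
        # Keep at least the spread of the Pareto designs, and at least
        # beta times the old width, so the search space shrinks gradually.
        width = max(hi_p - lo_p, beta * (hi - lo))
        center = 0.5 * (lo_p + hi_p)
        new_bounds.append((max(lo, center - 0.5 * width),
                           min(hi, center + 0.5 * width)))
    return new_bounds

nb = shrink_bounds([(0.0, 10.0)], [[4.0], [6.0]])
```

Repeated over generations, this concentrates sampling where the current non-dominated designs live while retaining some slack for exploration.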
The algorithm in its basic form is used to solve single- and multi-objective problems, in discrete and continuous search spaces. Pareto fronts in these spaces have been computed during the experiments. This double capability of solving both single- and multi-objective optimization problems is not common for an evolutionary algorithm. IS-PAES requires decisions on four parameters (described in Section 9.4): the size of the Pareto set; the shrinkspace rate; the size of the new hypervolume after reduction; and the percentage of "best infeasible" individuals retained in the Pareto set. These parameters are not hard to set; they are no easier or harder to set than the parameters of a standard Genetic Algorithm. After a few trials the proper combination comes easily, since the parameters are related in a logical manner. Any approximate set of parameters works well for the algorithm, so robustness across several kinds of problems is one advantage of IS-PAES. Scaling to large problems is of course required, and scaling has been the major weakness of evolutionary algorithms. Nonetheless, we showed here how IS-PAES, using both the real representation inherent to evolution strategies and multi-objective optimization concepts, is able to handle a large number of constraints. Future work on this algorithm includes the development of a multiparent approach, which should improve diversity and exploration.

Acknowledgments

The authors recognize support for this work from CONACyT project No. 40721-Y, Mexico.

References

1. D. Ackley. An empirical study of bit vector function optimization. In Lawrence Davis, editor, Genetic Algorithms and Simulated Annealing, pages 170-271. Morgan Kaufmann Publishers, Los Altos, California, 1987.

2. Arturo Hernandez Aguirre, S. Botello, C. Coello, and G. Lizarraga. Use of Multiobjective Optimization Concepts to Handle Constraints in Single-Objective Optimization.
In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2003), pages 573-584, Berlin, Germany, July 2003. Springer-Verlag, Lecture Notes in Computer Science No. 2723.

3. Arturo Hernandez Aguirre, S. Botello, G. Lizarraga, and C. Coello. IS-PAES: A Constraint-Handling Technique Based on Multiobjective Optimization Concepts. In Proceedings of the 2nd International Conference on Evolutionary Multi-Criterion Optimization (EMO 2003), pages 73-87, Berlin, Germany, April 2003. Springer-Verlag, Lecture Notes in Computer Science No. 2632.
4. Thomas Bäck. Evolutionary Algorithms in Theory and Practice. Oxford University Press, New York, 1996.

5. Salvador Botello, Jose Luis Marroquín, Eugenio Oñate, and Johan Van Horebeek. Solving Structural Optimization Problems with Genetic Algorithms and Simulated Annealing. International Journal for Numerical Methods in Engineering, 45(8):1069-1084, July 1999.

6. Eduardo Camponogara and Sarosh N. Talukdar. A Genetic Algorithm for Constrained and Multiobjective Optimization. In Jarmo T. Alander, editor, 3rd Nordic Workshop on Genetic Algorithms and Their Applications (3NWGA), pages 49-62, Vaasa, Finland, August 1997. University of Vaasa.

7. Carlos A. Coello Coello and Efren Mezura-Montes. Handling Constraints in Genetic Algorithms Using Dominance-Based Tournaments. In I.C. Parmee, editor, Proceedings of the Fifth International Conference on Adaptive Computing in Design and Manufacture (ACDM 2002), volume 5, pages 273-284, University of Exeter, Devon, UK, April 2002. Springer-Verlag.

8. Carlos A. Coello Coello. Constraint-handling using an evolutionary multiobjective optimization technique. Civil Engineering and Environmental Systems, 17:319-346, 2000.

9. Carlos A. Coello Coello. Treating Constraints as Objectives for Single-Objective Evolutionary Optimization. Engineering Optimization, 32(3):275-308, 2000.

10. Carlos A. Coello Coello and Alan D. Christiansen. Multiobjective optimization of trusses using genetic algorithms. Computers and Structures, 75(6):647-660, May 2000.

11. Carlos A. Coello Coello, David A. Van Veldhuizen, and Gary B. Lamont. Evolutionary Algorithms for Solving Multi-Objective Problems. Kluwer Academic Publishers, New York, May 2002. ISBN 0-3064-6762-3.

12. T. Elperin. Monte-Carlo structural optimization in discrete variables with annealing algorithm. International Journal for Numerical Methods in Engineering, 26:815-821, 1988.

13. R. A. Gellatly and L. Berke. Optimal structural design. Technical Report AFFDL-TR-70-165, Air Force Flight Dynamics Laboratory, 1971.

14. David E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Publishing Company, Reading, Massachusetts, 1989.

15. Edward J. Haug and Jasbir S. Arora. Applied Optimal Design: Mechanical and Structural Systems. Wiley, New York, 1979.

16. F. Jimenez, A.F. Gomez-Skarmeta, and G. Sanchez. How Evolutionary Multiobjective Optimization can be used for Goals and Priorities based Optimization. In E. Alba, F. Fernandez, J.A. Gomez, F. Herrera, J.I. Hidalgo, J. Lanchares, J.J. Merelo, and J.M. Sanchez, editors, Primer Congreso Español de Algoritmos Evolutivos y Bioinspirados (AEB'02), pages 460-465, Mérida, España, 2002. Universidad de Extremadura, España.

17. Joshua D. Knowles and David W. Corne. Approximating the Nondominated Front Using the Pareto Archived Evolution Strategy. Evolutionary Computation, 8(2):149-172, 2000.
18. Efren Mezura-Montes and Carlos A. Coello Coello. A Numerical Comparison of some Multiobjective-based Techniques to Handle Constraints in Genetic Algorithms. Technical Report EVOCINV-03-2002, Evolutionary Computation Group at CINVESTAV-IPN, Mexico, D.F. 07300, September 2002. Available at: http://www.cs.cinvestav.mx/~EVOCINV/.

19. Zbigniew Michalewicz and Marc Schoenauer. Evolutionary Algorithms for Constrained Parameter Optimization Problems. Evolutionary Computation, 4(1):1-32, 1996.

20. I. C. Parmee and G. Purchase. The development of a directed genetic search technique for heavily constrained design spaces. In I. C. Parmee, editor, Adaptive Computing in Engineering Design and Control '94, pages 97-102, Plymouth, UK, 1994. University of Plymouth.

21. S. Rajeev and C.S. Krishnamoorthy. Genetic Algorithms-Based Methodologies for Design Optimization of Trusses. Journal of Structural Engineering, 123(3):350-358, 1997.

22. Tapabrata Ray, Tai Kang, and Seow Kian Chye. An Evolutionary Algorithm for Constrained Optimization. In Darrell Whitley et al., editors, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2000), pages 771-777, San Francisco, California, 2000. Morgan Kaufmann.

23. Tapabrata Ray and K.M. Liew. A Swarm Metaphor for Multiobjective Design Optimization. Engineering Optimization, 34(2):141-153, March 2002.

24. X. Renwei and L. Peng. Structural optimization based on second order approximations of functions and dual theory. Computer Methods in Applied Mechanics and Engineering, 65:101-104, 1987.

25. Jon T. Richardson, Mark R. Palmer, Gunar Liepins, and Mike Hilliard. Some Guidelines for Genetic Algorithms with Penalty Functions. In J. David Schaffer, editor, Proceedings of the Third International Conference on Genetic Algorithms (ICGA-89), pages 191-197, San Mateo, California, June 1989. George Mason University, Morgan Kaufmann Publishers.

26. J. David Schaffer. Multiple Objective Optimization with Vector Evaluated Genetic Algorithms. In Genetic Algorithms and their Applications: Proceedings of the First International Conference on Genetic Algorithms, pages 93-100. Lawrence Erlbaum, 1985.

27. L.A. Schmit and B. Farshi. Some Approximation Concepts for Structural Synthesis. Journal of the American Institute of Aeronautics and Astronautics, 12:231-233, 1974.

28. Alice E. Smith and David W. Coit. Constraint Handling Techniques: Penalty Functions. In Thomas Bäck, David B. Fogel, and Zbigniew Michalewicz, editors, Handbook of Evolutionary Computation, chapter C5.2. Oxford University Press and Institute of Physics Publishing, 1997.

29. Patrick D. Surry and Nicholas J. Radcliffe. The COMOGA Method: Constrained Optimisation by Multiobjective Genetic Algorithms. Control and Cybernetics, 26(3):391-412, 1997.

30. S. Tzan and C.P. Pantelides. Annealing strategy for optimal structural design. Journal of Structural Engineering, 122(7):815-827, 1996.

31. V.B. Venkayya. Design of Optimum Structures. Computers & Structures,
1:265-309, 1971.

32. W. Xicheng and M. Guixu. A parallel iterative algorithm for structural optimization. Computer Methods in Applied Mechanics and Engineering, 96:25-32, 1992.
CHAPTER 10 CITY AND REGIONAL PLANNING VIA A MOEA: LESSONS LEARNED
Richard Balling
Department of Civil and Environmental Engineering, Brigham Young University, Provo, Utah, USA
E-mail: [email protected]

The traditional approach to city and regional land-use and transportation planning is described. The traditional approach depends entirely on the preferences and past experiences of planners, and does not objectively optimize over the very large search space. Furthermore, cities in a metropolitan region often plan independently, neglecting regional goals. The planning problem is also fraught with multiple competing objectives and interests. This chapter describes a multi-year research project to apply a MOEA to city and regional planning in general, and to planning of the Wasatch Front Metropolitan Region in Utah, USA in particular. The problem formulation at both the city level and the regional level is described. The choice of MOEA is explained. Results were presented to professional planners and elected officials. Based on their suggestions, modifications were made both to the problem formulation and to the algorithm. Although non-dominated plans generated by the MOEA have not been adopted per se, the results have influenced planning thinking and trends. The obstacles to adopting this radically new way of city planning are described.

10.1. The Traditional Approach

Seventy-six percent of the population of the state of Utah in the United States lives in the Wasatch Front Metropolitan Region (WFMR). The population of the WFMR in 2000 was 1,702,450. The WFMR encompasses Weber, Davis, Salt Lake, and Utah counties and includes Salt Lake City as well as 70 additional cities. The WFMR is approximately 100 miles long in the north-south direction and 20 miles wide in the east-west direction. The
WFMR is bounded along the eastern side by the Wasatch Front mountain range, which rises abruptly from the valley floor. On the west, the WFMR is bounded by the Great Salt Lake and Utah Lake. The natural beauty and recreational opportunities of the WFMR have been key reasons for the rapid growth that has occurred in recent decades. The population grew 27% from 1990 to 2000. The WFMR was brought into world view during the Winter Olympic Games of 2002. This has attracted even more growth to the region. Planners project the population to jump by another 41% by the year 2020¹. In previous decades, unfettered development and sprawl were allowed to occur in the WFMR. However, the growth surge of the past decade has brought anxiety to the residents of the WFMR. Opinion polls have shown the management of growth to be among the top concerns of the people2. The state of Utah is politically very conservative, and the rights of businesses, including developers, are highly esteemed. Nevertheless, most residents feel that growth cannot be allowed to continue unmanaged. In December 1995, the governor convened a three-day Growth Summit that was aired on all three major television stations. Issues were discussed, and ideas were explored. The Growth Summit further heightened public awareness of this complex problem3. A local citizens' action group, the Coalition for Utah's Future, obtained studies from other metropolitan regions in the western United States that were experiencing high growth. In particular, studies from Denver, Colorado and Portland, Oregon were obtained. Following the approach used in these studies, the group, under a program named Envision Utah, developed four contrasting land use and transportation plans for the region4. These plans were made available to citizens via the local newspapers and public meetings. The public response was mixed.
Many felt that the choice of plans was severely limited and somewhat biased in the sense that one of the plans was made to seem clearly superior5. This approach to planning is typical of the traditional approach to regional and city planning. In the traditional approach, a handful of plans are developed by planners and presented to decision-makers for selection. The development of the candidate plans is largely subjective and highly dependent on the experience and preferences of the planners. The subjectivity of the process can induce skepticism among elected officials and the citizenry. Researchers at Brigham Young University have received two grants from the National Science Foundation to study the possibility of increasing objectivity in the land use and transportation planning process through the
use of formal optimization algorithms. The first grant was devoted to city planning6, and the second grant was devoted to regional planning7. A Multi-Objective Evolutionary Algorithm (MOEA) was selected as the optimization algorithm because it can: 1) objectively search large spaces, 2) handle discrete variables and discontinuous functions, and 3) rationally treat multiple competing objectives. The number of possible land use and transportation plans for a city or region is huge, making the search space very large. The design variables are discrete-valued choices between different land uses and street classifications. The objectives and constraints are evaluated from empirical models that may be discontinuous functions of the design variables. There are several competing objectives in the problem, and it is very difficult, if not impossible, to get stakeholders to agree on the relative importance of each objective.

10.2. The MOEA Approach

Let nsize be the generation size and let ngener be the number of generations. The MOEA begins by randomly generating a starting generation of nsize plans. The objectives and constraints are then evaluated for each plan. The maximin fitness8 is calculated for each feasible plan (a plan satisfying all the constraints). Let nobj be the number of objectives, and let f_k^i be the scaled value of the kth objective for the ith plan in the generation. Assuming that all objectives are minimized, and assuming that all plans in the generation are distinct in objective space, plan j dominates plan i if:

    f_k^j < f_k^i  for all k from 1 to nobj    (1)

This is equivalent to:

    min_{k=1,...,nobj} ( f_k^i - f_k^j ) > 0    (2)

The ith plan will be dominated if:

    max_{j=1,...,nsize, j≠i} ( min_{k=1,...,nobj} ( f_k^i - f_k^j ) ) > 0    (3)

The maximin fitness of the ith plan is:

    fitness^i = max_{j=1,...,nsize, j≠i} ( min_{k=1,...,nobj} ( f_k^i - f_k^j ) )    (4)
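The maximin fitness of Eq. (4) translates directly into code. The following Python sketch (ours, for illustration) computes the fitness of every plan in a generation from its scaled objective values:

```python
# Maximin fitness per Eq. (4): for each plan i, take the max over all other
# plans j of the min over objectives k of (f_k^i - f_k^j). All objectives
# are assumed minimized and already scaled.

def maximin_fitness(objs):
    """objs[i][k] = scaled kth objective of the ith plan."""
    n = len(objs)
    return [max(min(objs[i][k] - objs[j][k] for k in range(len(objs[i])))
                for j in range(n) if j != i)
            for i in range(n)]

# Plan (1, 1) is dominated by (0, 0), so it gets a nonnegative fitness;
# the two nondominated plans get negative fitnesses.
f = maximin_fitness([[0.0, 0.0], [1.0, 1.0], [0.5, -0.5]])
```

As the text observes, dominated plans come out with fitness greater than or equal to zero and nondominated plans with fitness less than zero, with more isolated nondominated plans scoring more negatively.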
The maximin fitness of dominated plans will be greater than or equal to zero, and the maximin fitness of nondominated plans will be less than zero. The maximin fitness is minimized. The maximin fitness has another important property: it does not treat all nondominated plans equally. Nondominated plans that are widely separated from other plans in objective space will have more negative (better) maximin fitness than nondominated plans that are clustered in objective space. In the limit as clustering increases, two nondominated plans that have the same values for all objectives will have zero maximin fitness. The maximin fitness was chosen for its simplicity, and for the fact that it rewards both dominance and diversity.

Initially, infeasible plans were deleted from the generation and replaced with newly generated feasible plans. However, we later decided that it would be better to allow infeasible plans in the generation in order to increase genetic diversity in the search process. We assume that all constraints are scaled so that they are satisfied when less than zero and violated when greater than zero. Let maxfeas be the maximum fitness over all feasible plans in the generation. The fitness of an infeasible plan is taken as maxfeas plus the maximum value of all constraints for the plan. Thus, the fitness of any infeasible plan is always greater than the fitnesses of all feasible plans in the generation.

After the fitnesses of all plans in the starting generation are evaluated, the starting generation becomes the parent generation, and the processes of tournament selection, single-point crossover, and gene-wise mutation are employed to produce a child generation of nsize plans. We used a tournament size of 3, a crossover probability of 0.7, and a mutation probability of 0.01. The values of the objectives and constraints are then evaluated for all plans in the child generation.
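The fitness assignment for infeasible plans described above can be sketched as follows (our own minimal version; maxfeas and the scaled-constraint convention are as in the text):

```python
# An infeasible plan's fitness is maxfeas (the worst fitness among feasible
# plans) plus its maximum scaled constraint value, so every infeasible plan
# ranks behind every feasible plan when fitness is minimized.

def infeasible_fitnesses(feasible_fits, constraint_values):
    """constraint_values[i]: scaled constraints of the ith infeasible plan
    (satisfied when < 0, violated when > 0)."""
    maxfeas = max(feasible_fits)
    return [maxfeas + max(g) for g in constraint_values]

fits = infeasible_fitnesses([-0.5, -0.1], [[0.2, -0.3], [1.5, 0.4]])
```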
Elitism is employed by combining the parent generation and the child generation into a combined generation of 2*nsize plans. The maximin fitness is evaluated for each feasible plan in this combined generation, maxfeas is evaluated as the maximum fitness over all feasible plans, and the fitness of each infeasible plan is evaluated as maxfeas plus the maximum scaled constraint value for the plan. The nsize plans with the lowest fitnesses
from the combined generation become the next parent generation, and the remaining nsize plans are discarded. The processes of selection, crossover, mutation, fitness evaluation, and elitism are repeated using the new parent generation.

10.3. City Planning: Provo and Orem

We began by applying the MOEA approach to two adjacent cities in the WFMR, the cities of Provo and Orem. The combined population of both cities in 2002 was 189,490, and the projected combined population in 2025 is 316,200. We obtained current zoning maps from the planning departments of both cities. The combined cities were divided into 195 zones. The zoning maps specified one of the following 11 land uses for each zone:

FARM farm land
VLDR very low density residential
LDR low density residential
MDR medium density residential
HDR high density residential
CDB central business district
GC general commercial
SC shopping center
LI light industrial
HI heavy industrial
UNIV university

In our optimization problem, we decided to allow all zones to change land use except for the university zones (there are two universities in Provo and Orem). Thus, we assigned one base-ten gene to each non-university zone. The value of this gene for a particular zone specifies the future land use for that zone from among the ten non-university land uses.

We also identified 45 major streets in the cities. Each street is currently assigned one of the following classifications in the status quo plan:

C2 2-lane collector
C3 3-lane collector
C4 4-lane collector
C5 5-lane collector
A2 2-lane arterial
A3 3-lane arterial
A4 4-lane arterial
A5 5-lane arterial
A6 6-lane arterial
A7 7-lane arterial
F6 6-lane freeway

Speeds on arterial streets are generally higher than speeds on collector streets, and access is more limited. Speeds on freeways are significantly higher than speeds on arterial streets, and access is significantly more limited. In our optimization problem, we decided to allow all streets to change classification except for the freeways. Thus, we assigned one base-ten gene to each non-freeway street. The value of this gene for a particular street specifies the future classification for that street from among the ten non-freeway classifications.

Various objectives and constraints were considered during the course of the research project. From the beginning, it was clear that the minimization of traffic congestion had to be included. Initially, a commercial traffic analysis model, MinUTP9, was used. The model had many capabilities that were not needed, and lacked other capabilities that were needed. We decided to develop our own traffic analysis model. The model analyzes traffic during the peak commute period as well as during the rest of the day. Based on the land uses assigned to the zones for a particular future plan, the model generates the number of trips originating from each zone. These trips include home-to-work trips, home-to-non-work trips, and business trips between workplaces. The destination zone for each trip is then determined via the gravity model, which takes into account the travel time between origin and destination as well as the relative attractiveness of zones as destinations. Trips are then assigned to streets via a multi-path assignment model. As streets reach their capacity, their average speed is lowered. This may cause trips to be re-routed to other streets. The traffic congestion objective is the minimization of the total travel time of all trips in a 24-hour day.
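The gravity model's destination choice can be illustrated with a toy distribution rule (our own minimal form, not the authors' calibrated model):

```python
# Share of an origin's trips sent to each destination zone: proportional to
# the destination's attractiveness and inversely related to travel time.
# The exponent `alpha` is an assumed calibration parameter.

def gravity_split(attractiveness, travel_time, alpha=2.0):
    weights = [a / (t ** alpha) for a, t in zip(attractiveness, travel_time)]
    total = sum(weights)
    return [w / total for w in weights]

# Two equally attractive destinations, the second twice as far away:
shares = gravity_split([100.0, 100.0], [10.0, 20.0])
```

With alpha = 2, doubling the travel time cuts a destination's weight by a factor of four, so the nearer zone receives the dominant share of trips.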
The MinUTP program required 105 seconds to analyze a single plan. Our own traffic model required only 10 seconds. The second objective that was considered was the minimization of change from the status quo. The change required by a future plan is a measure of its political acceptability. Change is measured in terms of number of people affected. The change is the sum over the zones of the number
of people currently living in the zone times a change factor, plus the sum over the streets of the number of people currently living on the street times a change factor. Change factors range from zero to one based on the degree of change between the status quo land use or street classification and the future planned land use or street classification. The change factor is zero if there is no change in land use or street classification, close to zero for small changes (e.g. a change from VLDR to LDR or from C3 to C4), and close to one for large changes (e.g. a change from VLDR to HI or from C3 to A7).

Three constraints were considered: a housing constraint, an employment constraint, and a green space constraint. The housing capacity of the status quo zoning plan is 249,035 people based on housing densities in the residential zones. Since the status quo population is 189,490, the build-out rate for Provo and Orem is currently only 76%. Assuming a 97% build-out rate for the year 2025, when the population is projected to be 316,200, the housing constraint on future plans requires a minimum capacity of 316,200/0.97 = 327,000 people. The employment capacity of the status quo zoning plan is 196,188 jobs based on employment densities in the commercial zones. The ratio of housing capacity to employment capacity for the status quo zoning plan is 249,035/196,188 = 1.27 people per job. Assuming this same ratio in the year 2025, the employment constraint on future plans requires a minimum capacity of 327,000/1.27 = 257,600 jobs. Green space is the amount of land zoned as FARM or UNIV. The status quo zoning plan has 5980 acres of green space. The green space constraint on future plans for the year 2025 requires a minimum of 4000 acres. During the course of the research project, other objectives and constraints were considered.
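The change objective and the three constraints reduce to simple sums and threshold checks. The sketch below uses the capacity thresholds derived in the text; the populations and change factors in the example call are made up:

```python
# Change objective: people affected, weighted by a 0-to-1 change factor per
# zone and per street. Constraint thresholds are the values derived above.

def change_objective(zone_pops, zone_factors, street_pops, street_factors):
    return (sum(p * f for p, f in zip(zone_pops, zone_factors)) +
            sum(p * f for p, f in zip(street_pops, street_factors)))

def constraints_ok(housing_cap, employment_cap, green_acres):
    return (housing_cap >= 327_000 and      # people, from 316,200 / 0.97
            employment_cap >= 257_600 and   # jobs, from 327,000 / 1.27
            green_acres >= 4_000)           # acres of FARM or UNIV land

change = change_objective([1200, 300], [0.0, 0.8], [450], [0.1])
# The status quo capacities (249,035 people, 196,188 jobs, 5,980 acres)
# fail the housing and employment checks:
status_quo_ok = constraints_ok(249_035, 196_188, 5_980)
```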
The minimization of air pollution was considered, but it was concluded that this objective would be nearly equivalent to the minimization of traffic congestion. The minimization of new infrastructure costs was considered, but this was found to be strongly correlated with the minimization of change from the status quo. Constraints on minimum utilities, schools, and emergency services were deemed equivalent to the minimum housing and employment constraints. At one point we tried to split the housing constraint into three constraints corresponding to low-income, medium-income, and high-income housing. However, it was too difficult to obtain reliable data.

The MOEA was executed for the city of Provo separately, and then for the combined cities of Provo and Orem simultaneously. The benefits of simultaneous planning over separate planning of the cities became evident. Simultaneous planning led to increased capacities on east-west streets connecting the two cities. It also allowed the cities to develop cooperative roles. For example, traffic congestion was reduced and green space was preserved by assigning more high-density residential land to Provo and more commercial land to Orem. Of course, such a plan has serious property tax implications and may not be politically acceptable to both cities.

After the MOEA was executed for the combined cities, the results were presented to the planning departments of both cities. The results consisted of several feasible non-dominated plans. The planners rejected most of the plans as unrealistic. Even though a plan may satisfy the constraints and have low traffic congestion and low overall change, it may prescribe changes to specific zones and streets that the planners knew would be politically unacceptable. It became clear in our conversations that it was unrealistic to leave the search space completely unrestrained, where any zone could be rezoned to any land use and any street could be reclassified to any street classification. Therefore, we narrowed the search space. We went through each zone in both cities with the planners, and they told us the politically acceptable land uses for that zone. There were several zones where the only acceptable land use was the status quo land use. A new land use titled MIX was also added as an acceptable land use for some zones. The MIX land use is residential with some commercial use permitted. The intent of this land use is to promote walkable communities. It was also decided that the only acceptable street classifications for any street were classifications with higher capacity, in terms of vehicles per hour, than the status quo classification. Thus, streets could be upgraded in capacity, but not downgraded. With these restrictions, the search space dropped from 10^237 possible plans to 10^86 possible plans.
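The magnitudes 10^237 and 10^86 are consistent with one base-ten gene per changeable zone and street (roughly 237 genes with 10 options each); restricting the options per gene shrinks the exponent. A quick way to check such counts is to sum log10 of the per-gene choice counts (the restricted per-gene counts below are hypothetical, chosen only to land near the reported magnitude):

```python
import math

# log10 of the number of possible plans = sum over genes of log10(number of
# allowed choices for that gene).

def search_space_log10(choices_per_gene):
    return sum(math.log10(c) for c in choices_per_gene)

# Unrestricted: about 237 genes (zones plus streets), 10 choices each.
log_unrestricted = search_space_log10([10] * 237)

# Restricted (hypothetical split): many frozen genes, fewer options elsewhere.
log_restricted = search_space_log10([3] * 100 + [1] * 80 + [5] * 57)
```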
The MOEA was re-executed with a generation size of nsize = 100 for ngener = 100 generations. The total execution time was 26 hours on a laptop computer with a Pentium III processor. Results from two executions are shown in Figures 10.1 and 10.2. In the first execution, the starting generation consisted of 100 randomly generated plans. In the second execution, the starting generation consisted of 98 randomly generated plans and 2 seeded plans. The first seeded plan was a plan obtained by executing the MOEA with traffic congestion as the single objective, and the second seeded plan was a plan obtained by executing the MOEA with change as the single objective. Note that the seeded starting generation in Figure 10.2 produced a more diverse final generation than the unseeded starting generation in Figure 10.1.
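The executions just described can be reproduced in skeleton form. The sketch below is our own condensation of Section 10.2's generational cycle: tournament selection of size 3, single-point crossover with probability 0.7, gene-wise mutation with probability 0.01, and elitism over the combined parent and child generations. The fitness function here is a toy stand-in; the real one would be the maximin/constraint-based fitness.

```python
import random

# Our own minimal skeleton of one generation of the MOEA described in
# Section 10.2. `evaluate` stands in for the maximin-based fitness; lower
# fitness is better, matching the chapter's convention.

def next_generation(parents, fitness, evaluate, gene_values, rng):
    nsize = len(parents)

    def tournament():
        picks = [rng.randrange(nsize) for _ in range(3)]
        return parents[min(picks, key=lambda i: fitness[i])]

    children = []
    while len(children) < nsize:
        a, b = list(tournament()), list(tournament())
        if rng.random() < 0.7:                    # single-point crossover
            cut = rng.randrange(1, len(a))
            a[cut:], b[cut:] = b[cut:], a[cut:]
        for child in (a, b):
            for g in range(len(child)):           # gene-wise mutation
                if rng.random() < 0.01:
                    child[g] = rng.choice(gene_values)
            children.append(child)
    children = children[:nsize]

    combined = parents + children                 # elitism over 2*nsize plans
    fits = evaluate(combined)
    keep = sorted(range(len(combined)), key=lambda i: fits[i])[:nsize]
    return [combined[i] for i in keep], [fits[i] for i in keep]

rng = random.Random(0)
parents = [[rng.randrange(10) for _ in range(6)] for _ in range(8)]
evaluate = lambda pop: [float(sum(p)) for p in pop]   # toy fitness
fits = evaluate(parents)
new_parents, new_fits = next_generation(parents, fits, evaluate, range(10), rng)
```

Because the parent generation is always part of the combined pool, the best plan found so far can never be lost, which is the point of the elitism step.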
The results were again shared with the planning departments of both cities. Although they were overwhelmed by 100 non-dominated feasible plans in the final generation, they were able to pick out interesting observations. The most glaring observation was that the status quo zoning plan was infeasible: it did not have nearly enough housing or employment capacity to meet the needs of the projected future population. The feasible plan with the lowest value of change converted many FARM zones to residential and commercial zones in order to meet the housing and employment constraints, and it left the non-FARM zones and the streets unchanged from the status quo. Since few people currently live on FARM land, this kind of change affects the fewest number of people. This is precisely what has happened in recent decades in both cities as fruit orchards and agricultural fields have been gobbled up by development. Thus, it appears that people have subconsciously opted for the minimization of change. On the other hand, the feasible plan with the lowest value of traffic congestion rezoned many FARM and non-FARM zones throughout both cities, and reclassified virtually all streets to A7. Most of the land use changes involved mixing residential and commercial land use more uniformly throughout the cities in order to shorten trips. Although planners from both cities did not select a plan from the results as their master city plan, they did regard our study as objective evidence for the observations that emerged. Such evidence is needed for the persuasion of elected officials and citizens.

10.4. Regional Planning: The WFMR

The MOEA was applied to the WFMR as a whole. To do so, we first identified the developable land. Developable land excluded national forest land, lakes, wetlands, and steep land on mountainsides. The developable land was then divided into 343 districts. Districts are much larger in area than the zones in a city.
In fact, a single district may represent an entire town or borough. Because districts are larger than city zones, a future plan cannot assign a single land use to an entire district. Instead, a future plan may assign a single "scenario" to a district. A scenario is a set of land use percentages. Clustering analysis was used to identify 17 scenarios currently existing in the WFMR. These 17 scenarios are listed in Table 10.51. The leftmost column of Table 10.51 lists following different land uses: Rl single family residential
Richard Balling
R2 duplex and four-plex residential
R3 multi-family apartment residential
R4 mobile home residential
R5 high density apartment residential
C1 retail commercial
C2 industrial commercial
C3 warehouse commercial
C4 office commercial
AU airports and universities
AG agricultural
PA parks
VA vacant

Each column in Table 10.51 lists the land use percentages for a particular scenario. Thus, scenarios 1-3 and 14 are primarily open space scenarios, scenarios 4-7 are primarily residential scenarios, scenarios 8-10 are mixed usage scenarios, scenarios 11-15 are primarily commercial scenarios, and scenarios 16-17 are primarily airport and university scenarios. One integer-valued gene was assigned to each district; its integer value from 1 to 17 corresponds to the planned future scenario for the district.

We learned in the Provo-Orem city planning problem that it was unwise to let every district change to any possible scenario without restraint. Table 10.52 indicates the allowed scenario changes. A row in Table 10.52 corresponds to the status quo scenario, and the X's indicate the allowed future scenarios. Note that scenarios 15, 16, and 17 were not allowed to change from the status quo. Scenario 15 is primarily heavy industrial, scenario 16 is primarily university, and scenario 17 is primarily airport. In general, districts were allowed to change to scenarios that are slightly more developed than their status quo scenarios.

We identified 260 inter-district streets in the WFMR. We used the same street classifications that were used in the Provo-Orem city planning problem. Streets that are currently arterial were allowed to change to arterial classifications with an equal or greater number of lanes. Streets that are currently collector were allowed to change to collector or arterial classifications with an equal or greater number of lanes. Streets that are currently freeways were not allowed to change.
We used the same objectives and constraints that were used in the Provo-Orem city planning problem. Specifically, we minimized the total travel time of all trips in a 24-hour day, and we minimized the change
City and Regional Planning via a MOEA: Lessons Learned
from the status quo. The population of the WFMR in 2000 was 1,702,450. The housing constraint required enough housing capacity for the projected 2,401,000 residents in the year 2020. The employment constraint required enough employment capacity for the projected 1,210,000 jobs needed by the year 2020. The green space constraint required future plans to have at least 165,000 acres of green space, which is 20% of the area of the developable land.

We ignored the change objective and executed the genetic algorithm to find the minimum travel time plan (the constraints were included). Then we ignored the travel time objective and executed the genetic algorithm to find the minimum change plan. These two seed plans were added to 98 randomly generated plans to form the starting generation of 100 plans. The genetic algorithm was then executed for 100 generations with both objectives. The execution required four days on a desktop computer with a 1.7 GHz dual processor and 1 gigabyte of RAM.

Objective function values for the starting and final generations are plotted in Figure 10.3. Table 10.53 gives numerical results for four selected plans: the status quo plan, the minimum change plan, the minimum travel time plan, and a compromise plan selected from the final generation. Note that the status quo plan does not satisfy the minimum constraints for housing and employment, while the other three plans do. Note also that the travel time for the minimum change plan is more than double the travel time for the minimum travel time plan, and that the change for the minimum travel time plan is more than 18 times greater than the change for the minimum change plan. Finally, note that the compromise plan represents a good compromise between travel time and change. As expected, the minimum change plan did not change any of the streets, and the minimum travel time plan reclassified all streets to A7. The land use of the plans is more interesting.
Figure 10.4 shows land use maps for the status quo, minimum change, and minimum travel time plans. The land use is divided into four maps for each plan. The first map shows the districts that are predominantly open space, the second map shows the districts that are predominantly residential, the third map shows the districts that are predominantly commercial, and the fourth map shows the districts that are mixed residential/commercial usage. All three plans have about the same amount of predominantly commercial land. The minimum change and minimum travel time plans have significantly less open space than the status quo. The minimum change plan converted open space land to predominantly residential land, and the minimum travel time plan converted both
open space and residential land to mixed usage land. These observations are similar to what was observed in the Provo-Orem city planning problem.

The approach and the results were presented to: 1) planners from the Utah Governor's Office of Planning and Budget, 2) planners from Envision Utah, 3) mayors and officials serving on the Utah Quality Growth Commission, 4) mayors serving on the Mountainlands Association of Governments, and 5) planners from the Wasatch Front Regional Growth Commission. All of these people found the work interesting and relevant. However, none were anxious to select one of the non-dominated plans produced by the work right away. We believe there are two reasons for this. First, we believe they were intimidated by the fact that this approach is so radically different from the traditional approach. Second, we believe that the number of non-dominated plans produced is overwhelming. Further work is needed to reduce the non-dominated plans down to a handful that decision-makers can assimilate. This reduction process must be done objectively rather than subjectively, and the resulting handful of plans must represent distinct conceptual ideas. Finally, many of these groups expressed the need to include mass transit in our modeling.

10.5. Coordinating Regional and City Planning

One of the problems we wanted to address in our research project was the coordination of planning at the regional and city levels. Regional planning should not attempt to micromanage city planning by taking over the development of zoning plans and street plans for each city in the region. Cities must be given autonomy to develop their own zoning and street plans. On the other hand, if cities plan independently of one another and ignore regional planning altogether, then regional goals will not be achieved, resulting in a chaotic and inefficient situation. A proper balance between regional and city planning must be sought.
The regional planning approach we have described thus far is consistent with this balance. At the regional level, scenarios are selected for each district, and street classifications are selected for each inter-district street. Inter-district streets are major streets that run through multiple districts, as opposed to intra-district streets that lie within a district. Remember that districts are fairly large in size and may represent an entire town or borough. After a regional plan is selected, the results are sent down to the cities. The cities then subdivide the districts into zones and determine the land use for each zone. The inter-district streets are held fixed to the classifications specified by the regional plan. The cities are allowed to determine the classifications of the intra-district streets.

The objectives and constraints used at the city planning level need not be the same as the objectives and constraints used at the regional planning level. However, the cities must include the minimization of land use deviation among their objective functions. Land use deviation is a measure of the mismatch between the zoning plan of the city and the scenarios specified for each district by the region. Recall that scenarios specify the percentages of each land use in each district. For a particular zoning plan, one can determine the actual percentages of each land use in each district. The land use deviation is the sum over the districts of the differences between specified and actual percentages of land use, multiplied by the area of the district. With this objective function, cities try to match the regional land use scenarios passed down to them.

We demonstrated this approach to regional and city coordination by selecting the compromise plan in Table 10.53 as the regional plan and passing the results down to the cities of Provo and Orem. There are 35 districts in Provo and Orem, which are divided into the same 195 zones as before. We re-executed the MOEA for Provo-Orem with the same objectives and constraints as before, but with the addition of land use deviation as the third objective. Table 10.54 matches the land uses from the city level to the land uses from the regional level; note that each row of the table adds up to 100%. Twenty-seven of the 45 major streets are inter-district streets, while the rest are intra-district streets. The inter-district streets were fixed to the classifications specified in the regional plan. The MOEA was executed for 100 generations with 100 plans in each generation. The starting generation was again seeded with plans where each of the three objectives was minimized individually.
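The land use deviation objective just defined can be sketched in a few lines. The function and data names here are hypothetical, and the deviation is assumed to sum absolute differences between specified and actual percentages before weighting by district area.

```python
def land_use_deviation(districts):
    """Sum over districts of (area x total mismatch between the regional
    scenario's specified land use percentages and the zoning plan's actual ones).

    districts: list of (area, specified, actual), where specified and actual
    map a land use label to a percentage of the district.
    """
    total = 0.0
    for area, specified, actual in districts:
        mismatch = sum(abs(specified[use] - actual.get(use, 0.0)) for use in specified)
        total += area * mismatch
    return total

districts = [
    (100.0, {"R1": 50.0, "C1": 50.0}, {"R1": 40.0, "C1": 60.0}),  # 20 points of mismatch
    (50.0, {"R1": 100.0}, {"R1": 100.0}),                         # perfect match
]
assert land_use_deviation(districts) == 2000.0
```

A city whose zoning plan reproduces the regional percentages exactly would score zero on this objective.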
The MOEA produced a non-dominated set of plans for the decision-makers. We noted that the land use deviation could not be reduced to zero because the discreteness of the zones and the limitations on allowable land uses made it impossible to match the regional percentages exactly.

10.6. Conclusions

An MOEA was developed and applied to city and regional planning. At the city level, the MOEA determined land uses for each zone and classifications for each street. At the regional level, the MOEA determined land use percentages for each district and classifications for each inter-district street.
At both levels, traffic congestion and change from the status quo were minimized while constraints on housing capacity, employment capacity, and green space were satisfied. Coordination between the city and regional levels was demonstrated by re-executing the MOEA at the city level with a third objective, which minimized land use deviation from the regional plan. Inter-district streets were also fixed to the classifications specified by the regional plan, but the city was permitted to determine the classifications of intra-district streets.

The MOEA was executed for the Wasatch Front Metropolitan Region (WFMR) in the state of Utah in the USA. It was also executed for the cities of Provo and Orem in the WFMR. Results were presented to both planners and elected officials. After discussions with these people, the MOEA approach was modified and re-executed. Because the MOEA approach is so radically different from the traditional approach to planning, and because it produced an overwhelming number of non-dominated plans, the planners and elected officials were reluctant to select one of the plans produced by the MOEA approach. Nevertheless, they recognized the objectivity of the MOEA approach, and they were able to utilize the planning trends and ideas it produced. Many of these people view the MOEA approach as the future way to plan, and encourage its further development.

Acknowledgments

This work was funded by the USA National Science Foundation under Grant CMS-9817690, for which the author is grateful.

References

1. Utah Governor's Office of Planning and Budget, http://www.governor.utah.gov/dea/LongTermProjections.html.
2. Growth Summit Survey Results, Dan Jones and Associates, Salt Lake City, Utah, 1995.
3. Fouhy, E., "Utah Growth Summit Attracts Half-Million Citizens", Civic Catalyst Newsletter, Winter, 1996.
4. Envision Utah Quality Growth Strategy, Envision Utah, Salt Lake City, Utah, November, 1999.
5. Simmons, D.R., Simmons, R.T., and Staley, S.R., Growth Issues in Utah: Facts, Fallacies, and Recommendations for Quality Growth, The Sutherland Institute, Salt Lake City, Utah, October, 1999.
6. Balling, R.J., Taber, J.T., Day, K., and Wilson, S., "Land-Use and Transportation Planning for Twin Cities Using a Genetic Algorithm", Transportation Research Record, Vol. 1722, pp. 67-74, December, 2000.
7. Balling, R.J., Lowry, M., and Saito, M., "Regional Land-Use and Transportation Planning Using a Genetic Algorithm", Transportation Research Board Annual Meeting, Washington, DC, January 12-16, 2003.
8. Balling, R.J., "The Maximin Fitness Function; Multi-Objective City and Regional Planning", Second International Conference on Evolutionary Multi-Criterion Optimization, Faro, Portugal, April 8-11, 2003.
9. MinUTP, Comsis Corporation, Silver Springs, Maryland, 1994.

Table 10.51. Land Use Percentages for Each Scenario.
[Table 10.51 body: the land use percentages (R1-R5, C1-C4, AU, AG, PA, VA) for each of the 17 scenarios; the individual values are not recoverable from the scanned source.]
Table 10.52. Allowed Scenario Changes.

[Table 10.52 body: rows are the 17 status quo scenarios, columns are the 17 future scenarios, and an X marks an allowed change; the individual entries are not recoverable from the scanned source. Scenarios 15, 16, and 17 may not change.]

Table 10.53. Data for Four Selected Regional Plans.

                               status quo   min. change   min. travel time   compromise
change in persons affected              0        59,934          1,119,385      273,753
travel time in hours            1,349,617     2,025,681            984,436    1,493,006
housing capacity in people      1,742,914     2,401,937          2,401,360    2,410,032
employment capacity in jobs       995,293     1,210,048          1,466,150    1,376,804
green space in acres              349,583       248,541            247,840      228,256

Table 10.54. Regional and City Land Use Match.

[Table 10.54 body: rows are the city land uses FARM, VLDR, LDR, MDR, HDR, CBD, SC, GC, LI, HI, MIX, and UNIV; columns are the regional land uses R1-R5, C1-C4, AU, AG, and PA; each row sums to 100%. The individual percentages are not recoverable from the scanned source.]
Fig. 10.1. Provo-Orem Results with Random Starting Generation.
Fig. 10.2. Provo-Orem Results with Seeded Starting Generation.
Fig. 10.3. WFMR Results with Seeded Starting Generation.
Fig. 10.4. WFMR Land Uses for Three Plans.
CHAPTER 11

A MULTI-OBJECTIVE EVOLUTIONARY ALGORITHM FOR THE COVERING TOUR PROBLEM
Nicolas Jozefowiez
Laboratoire d'Informatique Fondamentale de Lille, Universite de Lille, France
E-mail: [email protected]

Frederic Semet
Laboratoire d'Automatique, de Mecanique et d'Informatique industrielles et Humaines, Universite de Valenciennes, France
E-mail: frederic.semet@univ-valenciennes.fr

El-Ghazali Talbi
Laboratoire d'Informatique Fondamentale de Lille, Universite de Lille, France
E-mail: [email protected]

In this chapter, we present a multi-objective evolutionary algorithm for the bi-objective covering tour problem, which is a generalization of the single-objective covering tour problem. In the latter, the objective is to find a minimal length tour on a set of vertices V so that every vertex in a set W lies within a given distance c of a visited node. In the bi-objective covering tour problem, the parameter c is omitted and replaced by an objective. Our evolutionary algorithm employs specially designed genetic operators, and exploits special features of the problem to improve its efficiency. The evolutionary algorithm is compared with an exact algorithm on generated benchmarks.
N. Jozefowiez, F. Semet and E-G. Talbi
11.1. Introduction

Problems related to vehicle routing are among the most studied combinatorial optimization problems. They deal with the need to identify a tour, or a set of tours, on a set of nodes or a set of arcs, while taking resources into account. Famous examples of this class of problems are the well-known traveling salesman problem (TSP), the vehicle routing problem (VRP) 32 when the routing is realised on the nodes, and the arc routing problem (ARP) when it is done on the arcs.

Academic routing problems often need adaptations for practical applications. These adaptations usually take the form of new constraints incorporated into the model of the problem. For instance, the basic version of the VRP deals with the construction of a minimum length collection of routes for a fleet of vehicles among customers, who demand goods delivered from a depot, so that the demands on a route do not exceed the vehicle capacity. To consider practical aspects, several variants of this problem including additional constraints have been proposed 32, such as the VRP with time windows, in which each customer must be served during its time window.

Another strategy to improve the practical aspect of a problem is to consider several objectives. In the last decade, a growing number of studies explore this opportunity in different application areas. In the case of routing problems, the objectives can be classified according to three aspects: the route (work, profit, makespan, balance, ...), the nodes or arcs (time windows, customer service, ...), and the resources (vehicles, product, ...). Tables 11.55 and 11.56 present different papers devoted to multi-objective routing problems, with their objectives classified according to these three categories. The problems presented in Tables 11.55 and 11.56 are clearly stated and solved as multi-objective problems by the authors.
However, according to Boffey 4, some single-objective routing problems are used as surrogates for multi-objective routing problems. In those problems, either only one objective is optimized while the others appear as constraints, or the different objectives are combined into a single objective, for example by means of a weighted aggregating method. Feillet et al. 10 survey the class of problems called traveling salesman problems with profit, in which not all the customers need be visited, but where a profit is associated with each customer, and can be collected by visiting it. Clearly, two opposite objectives can be defined:
Table 11.55. Multi-objective routing problems in the literature (part 1).

[Table 11.55 body garbled in the scanned source. For each study it lists the problem, the solution method, and the objectives classified by route, nodes, and resources. The entries include Keller and Keller and Goodchild (TSP with profit), Sutcliffe and Board (VRP), Lee and Ueng (VRP), Hong and Park (VRPTW), El-Sherbeny (VRPTW for a Belgian firm), Park and Koelling (VRP), and Sessomboon et al. (VRPTW); the methods include lexicographic and goal programming approaches, heuristics, multi-objective simulated annealing, constrained MOLP, and a MOEA.]

Table 11.56. Multi-objective routing problems in the literature (part 2).

[Table 11.56 body garbled in the scanned source. In the same format as Table 11.55, the entries include Geiger (VRPTW), Rahoual et al. (VRPTW), Ribeiro and Lourenco (VRPTW), Baran and Schaerer (VRPTW), Pacheco and Marti and Corberan et al. (school bus routing), Jozefowiez et al. (VRP), Lacomme et al. (CARP), Paquete and Stutzle (MOTSP), Yan et al. (periodic VRP), and this study (covering tour problem); the methods include MOEAs, tabu search, scatter search, ant colony systems, iterated local search, local search, and an e-constraint method.]
(1) Collect the maximum profit, and thus visit the maximum number of customers, at the price of increasing the length of the tour.
(2) Minimize the travel length by excluding customers, and thus forgo the profit those customers would generate.

When the objective is the maximization of profit while restricting the length of the route, the problem is called the selective traveling salesman problem. On the other hand, when the objective is the minimization of the length while ensuring a minimal profit, the problem is called the quota traveling salesman problem. But, when it is considered from a bi-objective point of view, a unique problem is defined. The only attempt to solve the bi-objective problem, by means of a lexicographic method, was made by Keller 18, and Keller and Goodchild 19.

In the present chapter, we are interested in another problem pointed out by Boffey 4 as an implicit multi-objective routing problem: the covering tour problem (CTP) 13. We define a multi-objective evolutionary algorithm (MOEA) for the bi-objective model of this problem. MOEAs have generated great interest among researchers for solving multi-objective problems (MOPs) since they possess interesting features, such as working on a population of solutions, which is helpful because the solution of a MOP is itself a set of solutions. Several works and applications 5, including studies on multi-objective routing problems 30,11,17, use MOEAs.

This chapter is organized as follows. Section 11.2 presents the CTP and its bi-objective generalization, as well as a heuristic and an exact method from the literature 13 used in the meta-heuristic and for experiments. Section 11.3 introduces our MOEA for the CTP. In Section 11.4, we assess the efficiency of the MOEA on generated benchmarks. Conclusions are drawn in Section 11.5.

11.2. The Covering Tour Problem

11.2.1. The Mono-Objective Covering Tour Problem
The covering tour problem (CTP) is defined as follows. Let G = (V ∪ W, E) be an undirected graph, where V ∪ W is the vertex set, and E = {(vi, vj) | vi, vj ∈ V ∪ W, i < j} is the edge set. Vertex v1 is a depot, V is the set of vertices that can be visited, T ⊆ V is the set of vertices that must be visited (v1 ∈ T), and W is the set of vertices that must be covered. A distance matrix C = (cij) satisfying the triangle inequality is defined on E. The CTP consists in determining a minimum length tour, or Hamiltonian cycle, over a subset of V so that the tour contains all vertices of T, and
every vertex of W is covered by the tour, i.e., it lies within a distance c of a vertex of the tour. Such a tour may not always exist.

A generic application of the CTP is the design of a tour on a network where the vertices of V represent points that can be reached by a vehicle, and all the points not on that route are easily reachable from it 7. An example is the selection of locations for post boxes among a set of candidate sites so that all users are located within reasonable distance of a post box, and the cost of a collection route is minimized 20. Another application, proposed by Hodgson et al. 15, is the provision of adequate primary health care in developing countries. In this study, a medical mobile facility cannot reach all the villages, or the whole population. Therefore, the goal is to determine a minimal length tour for the mobile facility on the practicable roads, while ensuring that the maximal distance a patient, who often walks, must travel to the nearest visited point does not exceed a given length. This problem was applied to real data from the Suhum district in Ghana.

Few papers on the CTP can be found in the literature. Gendreau et al. 13 proposed a model, a heuristic, and a branch-and-cut algorithm. The branch-and-cut algorithm is used in the study presented above on the routing of a medical mobile facility in Ghana 15. Another model and three scatter search algorithms were proposed by Maniezzo et al. 23. Hachicha et al. 14 studied an extension of the CTP, the m-CTP, where m tours must be defined so that every vertex of V belongs to at most one tour, and the length of a tour does not exceed a given constraint. The authors defined three heuristics for the m-CTP and applied them to generated instances and to the Suhum district data. Finally, Motta et al. 24 proposed a GRASP meta-heuristic for a generalized covering tour problem where the nodes from W can also be visited.

11.2.2. The Bi-Objective Covering Tour Problem

In the present chapter, we are not interested in the solution of the CTP as described above, but as a bi-objective problem. The bi-objective covering tour problem (BOCTP) corresponds to the CTP in which the constraints imposing that for every node w ∈ W there exists at least one visited node v such that cvw is smaller than c have been removed and replaced by an objective. All the other constraints are maintained. The objectives of the BOCTP are the minimization of:

(1) the tour length;
(2) the cover.
We define the cover of a solution as the greatest distance between a node w ∈ W and the nearest visited node v ∈ V from w. We will now present a criterion to compute the covers that correspond to feasible solutions. This criterion will be used later in our MOEA. From the definition of the cover, for every couple (v, w) ∈ V × W, cvw is a candidate cover. However, not every candidate cover corresponds to a feasible solution. To evaluate the feasibility of a cover, we use the following criterion. Given v ∈ V and w ∈ W, cvw is a feasible cover if and only if:

1) ∀v' ∈ T, v' ≠ v : cv'w ≥ cvw, and
2) ∀w' ∈ W, w' ≠ w, ∃v' ∈ V such that cv'w' ≤ cvw ≤ cv'w.
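A minimal sketch of the cover computation and of this feasibility test, assuming the distances cvw are stored in a dict keyed by (v, w) pairs; the function names and the small instance below are illustrative.

```python
def cover(visited, W, c):
    """Greatest distance from any w in W to its nearest visited node of V."""
    return max(min(c[(v, w)] for v in visited) for w in W)

def is_feasible_cover(v, w, V, W, T, c):
    """Test whether c[v, w] is a feasible cover, following the criterion above:
    1) no mandatory node v' in T lies closer to w than v does, and
    2) every other w' can be covered within c[v, w] by some v' that itself
       lies no closer than c[v, w] to w."""
    cvw = c[(v, w)]
    if any(c[(vp, w)] < cvw for vp in T if vp != v):
        return False
    return all(
        any(c[(vp, wp)] <= cvw <= c[(vp, w)] for vp in V)
        for wp in W if wp != w
    )

V, W, T = {"a", "b"}, {"w1", "w2"}, {"a"}  # "a" is the depot and must be visited
c = {("a", "w1"): 5, ("a", "w2"): 3, ("b", "w1"): 2, ("b", "w2"): 10}
assert cover({"a", "b"}, W, c) == 3
assert is_feasible_cover("a", "w2", V, W, T, c)      # cover 3 is achievable
assert not is_feasible_cover("b", "w1", V, W, T, c)  # no tour can achieve cover 2
```

In the last assertion, cover 2 is infeasible because no vertex of V lies within distance 2 of w2.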
The validity of this criterion is immediate: in one direction, the implication is the definition of the cover; in the other direction, following the two rules is equivalent to building a solution.

11.2.3. Optimization Methods
11.2.3.1. A Heuristic Method

In this paragraph, we describe the heuristic designed by Gendreau et al. 13, which is used in our meta-heuristic. This heuristic combines the GENIUS heuristic for the TSP 12 with the PRIMAL1 set covering heuristic by Balas and Ho 1 for the set covering problem (SCP). The PRIMAL1 heuristic gradually includes nodes v in the solution according to a greedy criterion, in order to minimize a function f(cv, bv), where, at each step, cv is the cost of including node v ∈ V in the solution, and bv is the number of nodes of W covered by v. The cost cv is expressed by the value of the minimum spanning tree built upon the edges defined by the vertices present in the solution and v. The minimum spanning tree is built using Prim's algorithm. The three following functions suggested by Balas and Ho are used:

(1) f(cv, bv) = cv / bv
(2) f(cv, bv) = cv / log2(bv)
(3) f(cv, bv) = cv
PRIMAL1 first applies criterion 1 in a greedy fashion until all the vertices of W are covered. Then, the nodes which cover an over-covered node of W are removed from the solution. After that, the solution is completed using criterion 2, and nodes which cover over-covered nodes of W are removed.
The process is iterated with criterion 3. A second solution is constructed by applying the criteria this time in the order 1, 3, 2. The best of these two solutions is retained. The following heuristic is run twice, as in PRIMAL1, with the two sequences of criteria.

STEP 1 Initialization. Set H ← T and z* = ∞. The current covering criterion is 1.

STEP 2 Termination rules. If at least one vertex of W is not covered by a vertex of H, go to STEP 3. Construct a Hamiltonian cycle over all the vertices of H using GENIUS, and let z be the length of the cycle. If z < z*, then set z* ← z and H* ← H. If the current covering criterion is the last one, stop: the best solution is given by the tour on H*, and its cost is z*. Otherwise, remove the vertices of H associated with over-covered nodes of W and consider the next covering criterion.

STEP 3 Vertex selection. Compute for every v ∈ V \ H the coefficients cv and bv with respect to the current set H. Determine the best vertex v* to include in H according to the current covering criterion. Set H ← H ∪ {v*}. Go to STEP 2.

11.2.3.2. An Exact Method

We propose a multi-objective exact method based on the mono-objective branch-and-cut algorithm for the CTP developed by Gendreau et al. 13, which is able to solve instances where the size of V is up to 100 and the size of W is up to 500. We employ the branch-and-cut algorithm in an e-constraint strategy to generate the optimal Pareto set for the BOCTP. The e-constraint method is easy to design here since the problem is usually solved with the cover considered as a constraint, and since we are able to compute all the possible cover values by means of the criterion described earlier. The method is as follows:

STEP 1 Compute the feasible covers using the previous criterion, and sort them in decreasing order in a list l = (c0, c1, ..., ck). Set current_cover ← c0.

STEP 2 Solve the CTP with current_cover as a parameter using the branch-and-cut algorithm.

STEP 3 Compute the cover cs of the solution s provided by the branch-and-cut algorithm, and save s as a solution of the optimal Pareto set. Search for c' ∈ l such that c' = max{ci ∈ l : ci < cs}. If such a c' exists, then
set current_cover ← c' and go to STEP 2; otherwise stop.

11.3. A Multi-Objective Evolutionary Algorithm for the Bi-Objective Covering Tour Problem

This section is organized as follows. In subsection 11.3.1, we present the general framework of our MOEA. In subsection 11.3.2, we explain the choices we made for the encoding of the solution. Finally, the genetic operators are described in subsection 11.3.3.

11.3.1. General Framework

Our approach is based on a steady-state variant of NSGA-II 8. The main mechanisms and features of our MOEA are:

(1) The ranking function is the same as in NSGA-II. The population is sorted into different non-domination levels. The non-dominated individuals obtain rank 1 and form the subset E1. Rank k is given to the solutions dominated only by individuals belonging to the subset E1 ∪ E2 ∪ ... ∪ Ek-1. Then a fitness equal to its rank (1 being the best level) is assigned to each solution. Since we work on a bi-objective problem, this phase can be done efficiently in O(n log n).

(2) A crowding distance metric is used to provide diversity during the search. This metric gives an estimate of the density of the solutions surrounding a solution i in the population. This estimate is expressed by an approximation of the perimeter of the cuboid formed by the nearest neighbors of i.

(3) We added an archive whose purpose is to store the non-dominated solutions as they are found. Doing so ensures that no non-dominated solution is lost due to the stochasticity of the algorithm. The archive is also used for the stopping criterion of the MOEA: if it has not been updated for M generations in a row, the MOEA is stopped.

(4) The initial population is built as follows. First, the feasible cover values are computed using the criterion explained earlier. Then, we select several values among the feasible covers, and apply the heuristic for the CTP with the selected covers as parameter. By using the heuristic, we begin with good quality solutions.
Furthermore, by selecting the starting covers, and notably the highest and lowest values, we can obtain information about the frontier such as
N. Jozefowiez, F. Semet and E-G. Talbi
a first approximation of the extremities of the Pareto set. N + 1 solutions are generated during this phase. (5) Our approach is steady-state. A generation runs as follows. The rank and crowding distance are computed for the N + 1 solutions belonging to the population. The solution with the worst rank and the worst crowding distance is discarded. Two parents are chosen from the N remaining solutions by means of a binary tournament, during which two solutions are compared by means of a crowded tournament selection operator. According to this operator, a solution i wins a tournament against another solution j if any of the following conditions is true: (a) r_i < r_j, where r_i is the rank of solution i; (b) r_i = r_j and d_i > d_j, where d_i is the crowding distance of solution i. The first condition makes sure that the chosen solution lies in a better non-dominated set. The second condition breaks ties when both solutions belong to the same non-domination level by selecting the less crowded individual. An offspring is generated from the two selected parents by means of the genetic operators described in subsection 11.3.3. We do not allow multiple occurrences of a solution in the population. If the offspring already appears in the population, we generate a new offspring with the same parents. This process is repeated until an offspring not already present in the population is created, or until 50 offspring have been generated unsuccessfully. If an offspring is successfully generated, it is inserted into the population in place of the discarded individual; otherwise the population does not change.

11.3.2. Solution Coding

Two components can be identified in a CTP solution. The first one corresponds to a set covering problem (SCP) solution, i.e., the vertices that will appear in the tour; the other component is the tour over the chosen vertices.
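The NSGA-II-style mechanisms of the framework above (non-domination ranking, the crowding distance, and the crowded tournament of item (5)) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the ranking loop below is the simple O(n²) version rather than the O(n log n) bi-objective procedure mentioned in the text, and the crowding distance is computed over the whole population instead of front by front.

```python
def dominates(a, b):
    """True if point a dominates point b (minimization of both objectives)."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b

def fast_ranks(points):
    """Assign non-domination ranks: 1 is the best level, as in NSGA-II."""
    n = len(points)
    ranks = [0] * n
    remaining = set(range(n))
    level = 1
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)]
        for i in front:
            ranks[i] = level
        remaining -= set(front)
        level += 1
    return ranks

def crowding_distances(points):
    """Perimeter-of-the-cuboid estimate built from each solution's neighbors."""
    n = len(points)
    dist = [0.0] * n
    for m in range(2):                                    # two objectives
        order = sorted(range(n), key=lambda i: points[i][m])
        dist[order[0]] = dist[order[-1]] = float('inf')   # boundary solutions
        span = (points[order[-1]][m] - points[order[0]][m]) or 1.0
        for k in range(1, n - 1):
            dist[order[k]] += (points[order[k + 1]][m]
                               - points[order[k - 1]][m]) / span
    return dist

def crowded_tournament(i, j, ranks, dist):
    """Solution i wins on a strictly better rank, or on crowding at equal rank."""
    if ranks[i] < ranks[j]:
        return i
    if ranks[i] == ranks[j] and dist[i] > dist[j]:
        return i
    return j
```

For instance, on the points (1,5), (2,4), (3,3), (5,1), (4,4), the first four are mutually non-dominated and obtain rank 1, while (4,4) is dominated by (3,3) and obtains rank 2.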
It may be noted that the second objective is independent of the tour and depends only on the vertices appearing in the solution. In the genetic operators, we are therefore only interested in the SCP aspect of the solution. Indeed, problems are encountered for the TSP aspect when designing the genetic operators: the two parents may be so different with regard to the nodes visited and the edges used that not enough
information can be passed from the parents to build a tour for the offspring without resorting to a method providing a solution of the TSP. Therefore, it has been decided that the genetic operators choose the nodes visited, and that the tour is built by an embedded method dedicated to the TSP. In our implementation, we use the GENIUS heuristic 12. Since GENIUS is a heuristic, it will not always solve the TSP optimally over the selected nodes. However, our hypothesis is: if V_1 and V_2 are two subsets of V, and the optimal tour on V_1 is better than the optimal tour on V_2, then, in most cases, the tour generated by GENIUS on V_1 will be better than the tour generated by GENIUS on V_2. Considering this hypothesis, we designed the MOEA to identify covers, and their associated sets of vertices, which are good candidates for optimal Pareto solutions. Then, if needed, TSP-dedicated methods may be applied to all the solutions of the generated approximation, or to the solutions selected by a decision maker, to improve their length. The task of the retained method will be simplified, since good candidate sets will have been identified, and since it appears from the experiments conducted by Gendreau et al. 13 that the number of vertices in the optimal solutions is usually small compared to |V|.

11.3.3. Genetic
Operators
11.3.3.1. The Crossover Operator

The crossover we have developed builds a solution by inserting one vertex at a time. The goal is to minimize the number of vertices in the solution at the end of the crossover. To do so, we avoid adding a vertex that has no effect on the cover value. Indeed, the insertion of a vertex in a solution can have only two consequences: either the cover value decreases, or it stays unchanged. The first case occurs if the cover value is given by a couple (v, w), and a vertex v' such that c_{v'w} < c_{vw} is added. Taking this remark into account, the crossover was designed as follows: STEP 1 Initialization. Set H ← T. STEP 2 Identify the couple (v, w) that gives the current cover value. Build the set H' ← {v' ∈ V \ H | c_{v'w} < c_{vw}}.
STEP 3 If H' = ∅ then go to STEP 4. Choose a node v' ∈ H' and remove it from H'. Include v' into H with a probability p computed according to the parent sets of visited nodes. If v' is included into H, go to STEP 2, otherwise repeat STEP 3.
STEP 4 Build a subset U of the vertices of H such that the cover is unchanged by the removal of an element of U from H. If U is empty, exit. Otherwise, select u ∈ U so that the value of the minimum spanning tree on H \ {u} is minimal, and set H ← H \ {u}. Reiterate STEP 4. The probability p in STEP 3 is computed by the same rules as in the fusion crossover 3 for the SCP. Let i and j be the two parents, and let v be the vertex we wish to include into the offspring; then p is computed as follows: (1) if v appears in i and j, then p = 1; (2) if v does not appear in either parent, then p = 0; (3) if v appears in one parent only, let p' = r_j / (r_i + r_j) if r_i ≠ r_j, otherwise let p' = d_i / (d_i + d_j). If v is used in i, then we set p = p', otherwise we set p = 1 - p'.

11.3.3.2. The Mutation Operator

During the mutation phase, we change the status of each vertex v ∈ V \ T with a probability 1 / |V \ T|. To change the status of a vertex means that if the vertex is in the solution, we remove it even if this increases the cover value; and if the vertex is not in the solution, we add it even if this does not improve the cover value.

11.4. Computational Results

The MOEA was tested on a series of randomly generated instances. To generate the vertex set, |V| + |W| points were generated in a [0, 100] x [0, 100] square, according to a uniform distribution. The sets T and V were defined by taking the first |T| and |V| points respectively, and W was defined as the set of the remaining points. |V| was set to 50, 75, 100, and 120; |T| to 1, ⌊0.10|V|⌋, and ⌊0.20|V|⌋; and |W| to |V|, 2|V|, and 3|V|. For each combination, 5 instances were generated. The MOEA was run 5 times on each instance. The parameters of the MOEA were the following: the population size N was set to 256, and the stopping criterion parameter M to 5000. The MOEA was coded in C. Optimal Pareto sets were generated by means of the ε-constraint method. The branch-and-cut algorithm was implemented in C with CPLEX 8.1.
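The probabilistic ingredients of the two operators above can be sketched in Python as follows. The exact expressions for p' are an assumption here (the printed formulas are hard to read): a rank-based ratio favouring the better parent, with a crowding-distance tie-break. The set-based encoding and function names are illustrative, not from the original C implementation, and boundary solutions with infinite crowding distance would need special handling in the tie-break.

```python
import random

def inclusion_probability(v, set_i, set_j, r_i, r_j, d_i, d_j):
    """Probability of including vertex v in the offspring (STEP 3).

    set_i, set_j: vertex sets of parents i and j; r_*: their ranks;
    d_*: their crowding distances (assumed finite here).
    """
    in_i, in_j = v in set_i, v in set_j
    if in_i and in_j:           # rule (1): vertex appears in both parents
        return 1.0
    if not in_i and not in_j:   # rule (2): vertex appears in neither parent
        return 0.0
    # Rule (3): vertex appears in exactly one parent.
    if r_i != r_j:
        p = r_j / (r_i + r_j)   # favours the parent with the better (lower) rank
    else:
        p = d_i / (d_i + d_j)   # tie broken by crowding distance (larger is better)
    return p if in_i else 1.0 - p

def mutate(solution, all_vertices, mandatory):
    """Flip the status of each free vertex with probability 1/|V \\ T|."""
    free = all_vertices - mandatory
    rate = 1.0 / len(free)
    result = set(solution)
    for v in free:
        if random.random() < rate:
            result ^= {v}       # add the vertex if absent, remove it if present
    return result
```

Note that vertices of T are never touched by the mutation, so a mutated solution always remains feasible with respect to the mandatory vertices.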
Runs were executed on a Pentium IV at 2.67 GHz with 512 MB of RAM, under Debian Linux 3.0. Tables 11.57, 11.58, 11.59, and 11.60 provide three types of information. First, they give the computational times for the ε-constraint method
and the MOEA. They also indicate the number of optimal Pareto solutions found by the exact method, and the maximum, average, minimum, and standard deviation of the ratio of optimal Pareto solutions reached by the MOEA. The maximum, average, minimum, and standard deviation of the generational distance 33 of the approximation generated by the MOEA with respect to the optimal Pareto set are also provided. The generational distance is expressed as follows: GD = (√(Σ_{i=1}^{n} d_i²)) / n, where n is the number of solutions in the approximation set, and d_i is the distance in objective space between solution i and the nearest solution of the optimal Pareto set. The coordinates of the solutions in the objective space were normalized, and therefore all values x in the tables related to this metric must be read as x × 10^{-4}. The column headings are the following:

NB Number of optimal Pareto solutions.
time ε Computational time for the ε-constraint method.
Max % Maximal ratio.
Avg % Average ratio.
Min % Minimal ratio.
S.d. % Standard deviation of the ratio.
Max GD Maximal generational distance.
Avg GD Average generational distance.
Min GD Minimal generational distance.
S.d. GD Standard deviation of the generational distance.
time EA Average computational time of the MOEA.

Now, we discuss these results according to two points: the quality of the approximations and the computational times. First, the MOEA is able to generate good-quality approximations. As a matter of fact, it is almost always able to generate a significant part of the optimal Pareto sets. Furthermore, the generational distance indicates that the non-optimal solutions found are not far from the optimal solutions. When the size of V increases, the approximations remain of good quality. The sizes of T and W do not seem to have a significant impact on the quality of the approximations. Concerning the computational times, the exact method is faster for |V| = 50. This can be explained mainly by two facts.
First, it runs only a number of times equal to the number of optimal Pareto solutions, which, on average, is small, whereas the MOEA must run for at least M = 5000 generations before stopping. Furthermore, for each
branch-and-cut run, the cover is fixed, which allows the application of rules to simplify the problem 13. Due to these rules, the problems to solve for the branch-and-cut are simpler, notably when |T| is large, as can be seen in Table 11.57. For the MOEA, on the other hand, since we search for the complete optimal Pareto set, the cover is not fixed, and the problem must be solved without simplifications. For |V| = 75, the MOEA is significantly faster than the exact approach when |T| = 1. This can be explained by the fact that the simplification rules do not reduce the problem size enough to provide a significant advantage to the exact algorithm. When |T| increases, the simplification rules are more efficient, and the exact algorithm is faster. Note that for some results with |T| = 7, and notably |W| = 225, the simplifications are not important enough. These remarks are confirmed when |V| = 100. As a matter of fact, when |T| = 1, the difference between the two methods is really significant in favor of the MOEA. Furthermore, for |T| = 10 and |W| = 200 or 300, the MOEA is always faster due to the fact that the reductions are less important. For |T| = 10 and |W| = 100, the computational times are roughly the same. For |T| = 20, the reductions are still important enough for the exact algorithm to be faster, but it may be noted that when |W| increases, the advantage of the exact method is not as marked as previously. From these remarks, we can deduce that with the growth of the size of V, the reduction rules will be less and less able to improve the computational times of the exact algorithm, whereas the computational times of the MOEA increase moderately. This is confirmed when |V| = 120. Note that results for |V| = 120 and |T| = 1 are not reported due to the prohibitive computational times required by the ε-constraint method.

11.5.
Conclusions and Outlooks

In this chapter, we have proposed a study of the bi-objective covering tour problem, which is a generalization of the covering tour problem. In this generalization, a constraint of the original problem is expressed quite naturally as a second objective. For the bi-objective covering tour problem, a multi-objective evolutionary algorithm, incorporating special features for the problem, has been designed and implemented. This meta-heuristic has been tested on a set of generated instances, and compared with an exact algorithm able to generate the complete optimal Pareto set. Results show that the multi-objective evolutionary algorithm is able to generate good-quality approximations, while its execution time does not increase dramatically when the size of the instances increases, which is the case for the exact method. Future work includes the study of other genetic operators to decrease the computational time while keeping good-quality approximations, and the adaptation of the meta-heuristic to the real case of the Suhum district in Ghana.

Acknowledgement

This work was partially supported by the Nord-Pas-de-Calais Region. This support is gratefully acknowledged. Thanks are also due to the referee for his valuable comments.

References

1. E. Balas and A. Ho. Set covering algorithms using cutting planes, heuristics, and subgradient optimization: A computational study. Mathematical Programming, 12:37-60, 1980.
2. B. Baran and M. Schaerer. A multiobjective ant colony system for vehicle routing problem with time windows. In IASTED International Conference on Applied Informatics, pages 97-102, Innsbruck, Austria, 2003.
3. J. E. Beasley and P. C. Chu. A genetic algorithm for the set covering problem. European Journal of Operational Research, 94:392-404, 1996.
4. B. Boffey. Multiobjective routing problems. Top, 3(2):167-220, 1995.
5. C. A. Coello Coello, D. A. Van Veldhuizen, and G. B. Lamont. Evolutionary Algorithms for Solving Multi-Objective Problems. Kluwer Academic Publishers, New York, May 2002. ISBN 0-3064-6762-3.
6. A. Corberan, E. Fernandez, M. Laguna, and R. Marti. Heuristic solutions to the problem of routing school buses with multiple objectives. Journal of the Operational Research Society, 53:427-435, 2002.
7. J. R. Current and D. A. Schilling. The median tour and maximal covering tour problems. European Journal of Operational Research, 73:114-126, 1994.
8. K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6:182-197, 2002.
9. N. El-Sherbeny. Resolution of a vehicle routing problem with multi-objective simulated annealing method. PhD thesis, Faculte Polytechnique de Mons, 2001.
10.
D. Feillet, P. Dejax, and M. Gendreau. Traveling salesman problem with profits. Transportation Science, 2003 (to be published).
11. M. J. Geiger. Genetic algorithms for multiple objective routing. In MIC'2001 - 4th Metaheuristics International Conference, pages 349-353, Porto, Portugal, July 2001.
12. M. Gendreau, A. Hertz, and G. Laporte. New insertion and postoptimization procedures for the traveling salesman problem. Operations Research, 40:1086-1094, 1992.
13. M. Gendreau, G. Laporte, and F. Semet. The covering tour problem. Operations Research, 45:568-576, 1997.
14. M. Hachicha, M. J. Hodgson, G. Laporte, and F. Semet. Heuristics for the multi-vehicle covering tour problem. Computers and Operations Research, 27:29-42, 2000.
15. M. J. Hodgson, G. Laporte, and F. Semet. A covering tour model for planning mobile health care facilities in Suhum District, Ghana. Journal of Regional Science, 38:621-638, 1998.
16. S-C. Hong and Y-B. Park. A heuristic for a bi-objective vehicle routing with time window constraints. International Journal of Production Economics, 62:249-258, 1999.
17. N. Jozefowiez, F. Semet, and E-G. Talbi. Parallel and hybrid models for multi-objective optimization: Application to the vehicle routing problem. In J. J. Merelo Guervos et al., editors, PPSN VII, volume 2439 of Lecture Notes in Computer Science, pages 271-280. Springer-Verlag, September 2002.
18. C. P. Keller. Multiobjective routing through space and time: The MVP and TDVP problems. PhD thesis, Department of Geography, The University of Western Ontario, London, Ontario, Canada, 1985. Unpublished thesis.
19. C. P. Keller and M. Goodchild. The multiobjective vending problem: A generalization of the traveling salesman problem. Environment and Planning B: Planning and Design, 15:447-460, 1988.
20. M. Labbe and G. Laporte. Maximizing user convenience and postal service efficiency in post box location. Belgian Journal of Operations Research, Statistics, and Computer Science, 26:21-35, 1986.
21. P. Lacomme, C. Prins, and M. Sevaux. Multiobjective capacitated arc routing problem. In C. M. Fonseca et al., editors, Evolutionary Multi-Criterion Optimization, volume 2632 of LNCS, pages 550-564. Springer, 2003.
22. T-R. Lee and J-H. Ueng. A study of vehicle routing problem with load balancing. International Journal of Physical Distribution and Logistics Management, 29:646-648, 1998.
23. V. Maniezzo, R. Baldacci, M. Boschetti, and M. Zamboni.
Scatter search methods for the covering tour problem. Technical report, Scienze dell'Informazione, University of Bologna, Italy, June 1999.
24. L. Motta, L. S. Ochi, and C. Martinhon. GRASP metaheuristics for the generalized covering tour problem. In MIC'2001 - 4th Metaheuristics International Conference, pages 387-391, Porto, Portugal, July 2001.
25. J. Pacheco and R. Marti. Tabu search for a multi-objective routing problem. Technical Report TR09-2003, University of Valencia, 2003.
26. L. Paquete and T. Stützle. A two-phase local search for the biobjective traveling salesman problem. In C. M. Fonseca, P. J. Fleming, E. Zitzler, K. Deb, and L. Thiele, editors, Evolutionary Multi-Criterion Optimization. Second International Conference, EMO 2003, volume 2632 of Lecture Notes in Computer Science, pages 479-493, Faro, Portugal, April 2003. Springer-Verlag.
27. Y. Park and C. Koelling. A solution of vehicle routing problems in multiple objective environment. Engineering Costs and Production Economics,
10:121-132, 1986.
28. Y. Park and C. Koelling. An interactive computerized algorithm for multicriteria vehicle routing problems. Computers and Industrial Engineering, 16:477-490, 1989.
29. R. Ribeiro and H. R. Lourenço. A multi-objective model for a multi-period distribution management problem. In MIC'2001 - 4th Metaheuristics International Conference, pages 97-102, Porto, Portugal, July 2001.
30. W. Sessomboon, K. Watanabe, T. Irohara, and K. Yoshimoto. A study on multi-objective vehicle routing problem considering customer satisfaction with due-time (the creation of Pareto optimal solutions by hybrid genetic algorithm). Transactions of the Japan Society of Mechanical Engineers, 1998.
31. C. Sutcliffe and J. Board. Optimal solution of a vehicle routing problem: Transporting mentally handicapped adults to an adult training centre. Journal of the Operational Research Society, 41:61-67, 1990.
32. P. Toth and D. Vigo, editors. The Vehicle Routing Problem, volume 9 of SIAM Monographs on Discrete Mathematics and Applications. SIAM, December 2001.
33. D. A. Van Veldhuizen. Multiobjective evolutionary algorithms: Classifications, analyses, and new innovations. PhD thesis, Department of Electrical and Computer Engineering, Graduate School of Engineering, Air Force Institute of Technology, Wright-Patterson AFB, Ohio, May 1999.
34. Z. Yan, L. Zhang, L. Kang, and G. Lin. A new MOEA for multi-objective TSP and its convergence property analysis. In C. M. Fonseca, P. J. Fleming, E. Zitzler, K. Deb, and L. Thiele, editors, Evolutionary Multi-Criterion Optimization. Second International Conference, EMO 2003, volume 2632 of Lecture Notes in Computer Science, pages 342-354, Faro, Portugal, April 2003. Springer-Verlag.
Table 11.57. Results for |V| = 50.
|T| |W| NB 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
50 52 50 47 50 36 50 40 50 63 100 73 100 40 100 47 100 94 100 41 150 55 150 50 150 44 150 85 150 60 50 46 50 30 50 49 50 45 50 25 100 34 100 33 100 31 100 29 100 33 150 38 150 29 150 38 150 29 150 65 50 17 50 19 50 16 50 23 50 22 100 21 100 12 100 29 100 29 100 20 150 19 150 18 150 27 150 19 150 39
time e 28 15 39 23 65 86 46 28 137 34 60 39 33 97 69 26 11 41 22 5 21 10 16 18 10 18 7 49 7 37 3 4 12 8 6 5 2 17 10 8 4 3 8 5 17
Max % 0.90 0.98 0.94 0.95 0.95 0.90 0.95 0.91 0.86 0.98 0.96 1.00 0.95 0.85 0.93 0.93 0.97 1.00 0.82 0.96 0.94 1.00 0.90 1.00 0.94 0.97 1.00 0.97 1.00 1.00 1.00 1.00 0.75 1.00 0.95 1.00 1.00 0.97 0.86 1.00 1.00 1.00 1.00 0.90 0.85
Avg % 0.89 0.98 0.93 0.93 0.93 0.85 0.95 0.90 0.82 0.95 0.96 0.97 0.93 0.77 0.89 0.91 0.88 0.97 0.77 0.93 0.91 0.98 0.90 0.99 0.93 0.96 0.96 0.96 1.00 0.97 1.00 1.00 0.69 0.94 0.95 0.99 1.00 0.97 0.86 1.00 1.00 0.92 1.00 0.90 0.84
Min % 0.87 0.98 0.89 0.93 0.89 0.71 0.95 0.87 0.72 0.90 0.95 0.90 0.89 0.70 0.85 0.87 0.83 0.94 0.71 0.92 0.91 0.94 0.87 0.97 0.91 0.95 0.93 0.94 1.00 0.94 1.00 1.00 0.63 0.87 0.95 0.95 1.00 0.97 0.86 1.00 1.00 0.83 1.00 0.90 0.79
S.d. % 0.01 0.00 0.02 0.01 0.02 0.07 0.00 0.02 0.05 0.02 0.01 0.03 0.03 0.04 0.03 0.02 0.05 0.02 0.04 0.02 0.01 0.02 0.01 0.02 0.01 0.01 0.02 0.01 0.00 0.02 0.00 0.00 0.04 0.06 0.00 0.02 0.00 0.00 0.00 0.00 0.00 0.08 0.00 0.00 0.02
Max GD 1.85 0.00 1.25 2.26 0.67 1.71 0.00 3.94 2.84 1.66 0.11 1.27 4.43 2.53 1.25 1.58 4.07 0.12 3.88 1.72 0.00 0.00 0.00 1.65 0.82 0.00 1.16 0.13 0.00 0.47 0.00 0.00 4.26 8.54 2.03 0.00 0.00 0.00 0.00 0.00 0.00 0.99 0.00 0.00 0.48
Avg GD 0.80 0.00 0.25 0.81 0.18 1.52 0.00 3.54 0.90 0.33 0.02 0.51 4.40 2.03 1.17 1.39 3.81 0.07 2.74 1.38 0.00 0.00 0.00 0.33 0.33 0.00 0.46 0.03 0.00 0.11 0.00 0.00 4.25 3.42 2.03 0.00 0.00 0.00 0.00 0.00 0.00 0.40 0.00 0.00 0.28
Min GD 0.23 0.00 0.00 0.38 0.00 1.15 0.00 3.41 0.08 0.00 0.00 0.00 4.38 0.66 1.11 0.77 3.03 0.00 2.09 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 3.78 0.00 2.03 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.09
S.d. GD 0.55 0.00 0.50 0.73 0.25 0.23 0.00 0.20 0.99 0.66 0.04 0.48 0.02 0.71 0.05 0.31 0.39 0.06 0.61 0.69 0.00 0.00 0.00 0.66 0.40 0.00 0.57 0.05 0.00 0.18 0.00 0.00 0.71 4.10 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.48 0.00 0.00 0.12
time EA 61 37 51 47 93 150 38 56 178 56 79 81 61 228 95 131 68 111 124 30 76 75 40 61 39 91 44 101 26 181 74 78 83 127 70 93 72 102 74 64 82 96 47 60 144
Table 11.58. Results for |V| = 75.
|T| |W| NB time ε 1 75 74 368 1 75 82 804 1 75 80 542 1 75 92 962 1 75 96 629 1 150 68 457 1 150 81 1859 1 150 98 1449 1 150 105 2606 1 150 82 752 1 225 124 7701 1 225 100 1874 1 225 67 292 1 225 103 912 1 225 108 2493 7 75 58 196 7 75 72 164 7 75 91 5027 7 75 60 173 7 75 44 74 7 150 63 144 7 150 40 39 7 150 56 427 7 150 26 19 7 150 62 546 7 225 45 84 7 225 52 323 7 225 42 93 7 225 64 1611 7 225 77 805 15 75 14 4 15 75 33 56 15 75 29 14 15 75 43 123 15 75 28 21 15 150 19 12 15 150 34 54 15 150 28 59 15 150 25 215 15 150 36 60 15 225 19 6 15 225 25 22 15 225 50 85 15 225 38 56 15 225 39 43
Max Avg Min S.d. Max Avg Min S.d. time % % % % GD GD GD GD EA 0.89 0.86 0.84 0.02 1.17 0.76 0.60 0.21 128 0.79 0.73 0.67 0.04 3.95 2.42 1.23 0.96 183 0.94 0.91 0.82 0.03 4.95 4.82 4.72 0.08 199 0.83 0.79 0.76 0.02 1.74 1.24 0.33 0.52 274 0.89 0.80 0.71 0.06 2.11 1.73 1.44 0.26 243 0.88 0.87 0.82 0.02 3.10 2.82 2.68 0.15 123 0.69 0.67 0.60 0.03 4.01 2.71 1.70 0.80 267 0.87 0.83 0.81 0.02 3.22 2.46 0.76 0.92 508 0.69 0.66 0.60 0.03 2.79 2.19 1.71 0.35 336 0.80 0.72 0.66 0.05 1.84 1.48 1.22 0.27 171 0.55 0.51 0.48 0.03 2.42 2.03 1.76 0.24 526 0.83 0.78 0.70 0.05 2.93 1.64 0.99 0.71 305 0.90 0.87 0.85 0.01 1.42 0.71 0.26 0.50 105 0.83 0.76 0.71 0.04 0.86 0.53 0.31 0.20 268 0.76 0.70 0.65 0.04 2.88 1.74 1.24 0.59 350 0.97 0.91 0.83 0.05 2.52 0.99 0.00 0.85 258 0.63 0.56 0.47 0.07 4.36 3.38 2.36 0.72 464 0.71 0.63 0.55 0.05 3.02 2.15 1.20 0.58 697 0.85 0.84 0.83 0.01 0.93 0.42 0.19 0.26 269 1.00 0.93 0.82 0.07 4.86 1.29 0.00 1.82 190 0.92 0.85 0.79 0.04 2.10 1.56 1.10 0.36 287 0.58 0.50 0.43 0.05 1.40 1.08 0.66 0.28 83 0.98 0.87 0.82 0.06 2.34 0.76 0.00 0.85 265 1.00 0.99 0.96 0.02 0.00 0.00 0.00 0.00 60 0.89 0.87 0.85 0.02 0.87 0.54 0.16 0.32 160 1.00 0.98 0.98 0.01 4.49 1.82 0.00 2.18 183 0.58 0.56 0.51 0.02 6.31 5.86 5.57 0.25 134 0.98 0.96 0.93 0.02 1.60 0.68 0.18 0.52 176 0.75 0.71 0.63 0.04 2.64 2.28 2.03 0.23 395 0.78 0.72 0.69 0.03 6.27 4.71 3.83 0.85 560 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 85 0.88 0.76 0.70 0.07 0.39 0.30 0.12 0.11 205 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 146 0.65 0.65 0.63 0.01 5.11 3.94 2.78 0.75 319 0.79 0.76 0.75 0.01 1.97 1.91 1.83 0.46 147 1.00 0.92 0.79 0.09 3.18 1.24 0.00 1.52 116 0.91 0.88 0.85 0.02 1.99 1.26 0.37 0.72 221 0.89 0.86 0.82 0.03 0.69 0.40 0.21 0.23 156 0.92 0.92 0.92 0.00 0.48 0.48 0.48 0.00 141 0.94 0.92 0.89 0.03 1.64 1.35 0.91 0.24 182 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 103 0.88 0.82 0.76 0.04 4.62 3.43 2.93 0.61 215 0.72 0.68 0.64 0.03 5.77 5.38 4.71 0.36 354 0.97 0.94 0.89 0.03 4.27 4.01 3.94 0.13 272 1.00 
0.97 0.95 0.02 2.27 0.92 0.00 1.10 149
Table 11.59. Results for |V| = 100.
|T| |W| NB 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
100 129 100 98 100 129 100 127 100 141 200 128 200 159 200 112 200 116 200 142 300 143 300 98 300 102 300 123 300 146 100 63 100 50 100 70 100 38 100 86 200 60 200 61 200 48 200 72 200 42 300 60 300 72 300 107 300 98 300 75 100 44 100 12 100 48 100 12 100 41 200 16 200 49 200 44 200 71 200 41 300 38 300 35 300 29 300 54 300 16
time e 29776 6592 35743 6758 19242 28448 39325 15690 6995 19426 15347 8750 6335 37622 41065 3193 157 410 103 896 495 492 555 2582 556 649 1897 17554 7426 2020 362 4 149 6 157 13 323 144 1507 400 98 812 390 148 17
Max % 0.62 0.59 0.60 0.78 0.63 0.73 0.54 0.83 0.78 0.61 0.62 0.79 0.82 0.76 0.56 0.73 0.50 0.80 0.97 0.80 0.98 0.84 0.90 0.81 0.95 0.95 0.88 0.63 0.63 0.84 0.82 1.00 0.60 0.50 0.88 0.94 0.67 0.61 0.66 0.68 0.79 0.80 0.38 0.59 0.88
Avg % 0.60 0.59 0.57 0.73 0.59 0.63 0.53 0.76 0.74 0.56 0.52 0.73 0.77 0.73 0.54 0.69 0.56 0.79 0.97 0.78 0.79 0.82 0.77 0.76 0.90 0.91 0.80 0.59 0.58 0.80 0.75 1.00 0.56 0.50 0.77 0.94 0.63 0.59 0.62 0.67 0.70 0.75 0.38 0.57 0.88
Min % 0.57 0.58 0.55 0.69 0.54 0.55 0.52 0.66 0.69 0.50 0.43 0.68 0.67 0.69 0.50 0.65 0.48 0.77 0.97 0.76 0.62 0.75 0.73 0.72 0.85 0.85 0.74 0.56 0.52 0.76 0.61 1.00 0.48 0.50 0.61 0.94 0.59 0.55 0.55 0.66 0.58 0.69 0.38 0.54 0.88
S.d. % 0.02 0.01 0.02 0.04 0.04 0.06 0.01 0.07 0.04 0.04 0.08 0.04 0.06 0.02 0.02 0.03 0.04 0.01 0.00 0.02 0.13 0.03 0.06 0.03 0.03 0.04 0.05 0.02 0.04 0.03 0.08 0.00 0.05 0.00 0.10 0.00 0.03 0.03 0.04 0.01 0.09 0.04 0.00 0.02 0.00
Max GD 4.95 6.07 2.76 2.03 3.74 3.81 4.41 4.91 3.94 2.65 3.35 2.44 2.49 2.57 4.30 3.02 9.34 4.56 0.00 3.15 4.23 5.98 3.37 1.95 5.42 0.61 1.84 2.19 3.61 3.61 4.16 0.00 4.00 5.90 3.73 0.00 8.56 5.20 4.12 6.39 7.01 4.97 13.11 2.28 1.03
Avg GD 3.28 5.49 2.01 1.76 3.12 2.96 3.08 3.56 2.32 2.12 2.72 1.83 1.04 2.14 2.98 2.63 7.39 3.04 0.00 1.97 2.02 2.80 2.55 1.23 2.12 0.49 1.24 1.79 3.39 0.94 2.48 0.00 2.87 5.25 3.08 0.00 8.30 4.70 3.07 6.36 2.87 3.82 8.40 1.65 1.03
Min GD 2.15 4.75 1.49 0.90 2.44 1.69 2.29 1.99 1.12 1.81 1.96 0.73 0.31 1.94 2.01 1.42 6.36 1.58 0.00 0.85 0.00 0.48 0.07 0.74 0.00 0.43 0.55 1.44 3.12 0.21 1.68 0.00 1.43 4.43 2.50 0.00 8.11 4.55 2.28 6.31 1.15 2.31 6.83 1.13 1.03
S.d. time GD EA 0.99 623 0.57 415 0.51 740 0.43 446 0.51 572 0.76 703 0.78 842 1.09 534 0.94 447 0.30 803 0.58 1414 0.61 500 0.91 389 0.23 773 0.76 1121 0.61 414 1.10 176 0.98 543 0.00 157 0.83 856 1.79 400 1.87 447 1.25 375 0.40 678 2.06 287 0.07 415 0.43 633 0.29 1086 0.16 1089 1.34 714 0.90 710 0.00 152 0.94 700 0.62 519 0.46 574 0.00 135 0.15 632 0.25 355 0.62 1500 0.04 324 2.11 835 0.95 557 2.36 470 0.47 526 0.00 240
Table 11.60. Results for |V| = 120.
|T| |W| NB time ε Max % Avg % Min % S.d. % Max GD Avg GD Min GD S.d. GD time EA 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 24 24 24 24 24 24 24 24 24 24 24 24 24 24 24
120 120 120 120 120 240 240 240 240 240 360 360 360 360 360 120 120 120 120 120 240 240 240 240 240 360 360 360 360 360
50 69 19 47 76 85 70 88 89 84 99 102 117 82 68 46 60 9 38 23 35 57 59 48 64 59 62 30 48 51
609 1814 53 2072 24776 5192 777 11446 9467 32239 11154 34350 8083 10009 13811 566 829 29 85 163 257 1498 12264 194 1354 2557 839 1204 406 313
0.88 0.75 1.00 0.91 0.72 0.69 0.83 0.72 0.75 0.81 0.39 0.55 0.53 0.55 0.71 0.87 0.77 0.11 0.89 0.91 0.86 0.95 0.71 0.71 0.41 0.61 0.82 1.00 0.77 0.88
0.80 0.70 0.97 0.85 0.64 0.67 0.77 0.64 0.69 0.72 0.35 0.46 0.49 0.52 0.65 0.83 0.72 0.11 0.83 0.87 0.72 0.89 0.62 0.68 0.40 0.58 0.76 0.99 0.68 0.83
0.70 0.64 0.84 0.81 0.54 0.65 0.73 0.57 0.66 0.62 0.31 0.35 0.42 0.49 0.57 0.80 0.68 0.11 0.68 0.83 0.66 0.74 0.54 0.65 0.39 0.54 0.68 0.97 0.64 0.76
0.07 0.05 0.06 0.05 0.06 0.02 0.03 0.05 0.03 0.08 0.03 0.06 0.04 0.02 0.05 0.02 0.03 0.00 0.08 0.03 0.07 0.08 0.06 0.03 0.01 0.02 0.05 0.02 0.05 0.05
4.94 3.58 0.08 3.96 5.12 5.28 3.57 2.73 7.33 1.75 4.82 3.30 5.59 2.57 4.44 7.87 3.71 10.85 6.91 1.40 4.43 3.42 1.55 8.16 7.04 3.57 1.81 0.55 3.08 1.83
1.12 2.47 0.02 1.98 3.90 4.64 1.16 2.48 3.76 1.40 4.42 2.17 4.29 2.79 3.89 4.53 2.49 10.62 4.04 0.32 2.38 1.59 1.26 7.72 5.70 3.47 1.43 0.11 1.93 0.89
0.04 1.95 0.00 0.66 2.02 2.94 0.25 2.22 1.65 0.77 3.94 1.19 2.66 2.29 3.49 0.28 2.04 9.71 1.48 0.05 0.91 0.37 0.86 7.39 4.76 3.35 1.07 0.00 1.33 0.38
1.92 0.58 0.03 1.19 1.03 0.88 1.23 0.20 2.10 0.36 0.35 0.76 1.06 1.77 0.32 2.78 0.62 4.53 2.40 0.54 1.67 1.12 0.30 0.29 0.83 0.08 0.32 0.22 0.62 0.52
569 489 98 368 673 1129 502 963 1424 1149 1233 1881 1746 744 613 590 1135 182 365 229 437 748 1038 634 1145 1276 1208 407 744 572
CHAPTER 12

A COMPUTER ENGINEERING BENCHMARK APPLICATION FOR MULTIOBJECTIVE OPTIMIZERS
Simon Künzli, Stefan Bleuler, Lothar Thiele, and Eckart Zitzler
Department of Information Technology and Electrical Engineering
Swiss Federal Institute of Technology (ETH) Zurich
Gloriastrasse 35, CH-8092 Zurich, Switzerland
E-mail: {kuenzli,bleuler,thiele,zitzler}@tik.ee.ethz.ch

Among the various benchmark problems designed to compare and evaluate the performance of multiobjective optimizers, there is a lack of real-world applications that are commonly accepted and, even more importantly, are easy to use by different research groups. The main reason is, in our opinion, the high effort required to re-implement or adapt the corresponding programs. This chapter addresses this problem by presenting a demanding packet processor application with a platform- and programming-language-independent interface. The text-based interface has two advantages: it allows (i) distributing the application as a binary executable pre-compiled for different platforms, and (ii) easily coupling the application with arbitrary optimization methods without any modifications on the application side. Furthermore, the design space exploration application presented here is a complex optimization problem that is representative of many other computer engineering applications. For these reasons, it can serve as a computer engineering benchmark application for multiobjective optimizers. The program can be downloaded together with different multiobjective evolutionary algorithms and further benchmark problems from http://www.tik.ee.ethz.ch/pisa/.

12.1. Introduction

The field of evolutionary multiobjective optimization (EMO) has been growing rapidly since the first pioneering works in the mid-1980s and early 1990s. Meanwhile numerous methods and algorithmic components are available, and accordingly there is a need for representative benchmark
S. Künzli, S. Bleuler, L. Thiele and E. Zitzler
problems to compare and evaluate the different techniques. Most test problems that have been suggested in the literature are artificial and abstract from real-world scenarios. Some authors considered multiobjective extensions of NP-hard problems such as the knapsack problem 27, the set covering problem 13, and the quadratic assignment problem 14. Other benchmark problem examples are the pseudo-Boolean functions introduced by Thierens 23 and Laumanns et al. 15 that were designed mainly for theoretical investigations. Most popular, though, are real-valued functions 7,6. For instance, several test functions representing different types of problem difficulties were proposed by Zitzler et al. 24 and Deb et al. 9. Although there exists no commonly accepted set of benchmark problems as, e.g., the SPEC benchmarks in computer engineering, most of the aforementioned functions are used by different researchers within the EMO community. The reason is that the corresponding problem formulations are simple, which in turn keeps the implementation effort low. However, the simplicity and the high abstraction level come along with a loss of information: various features and characteristics of real-world applications cannot be captured by these artificial optimization problems. As a consequence, one has to test algorithms also on actual applications in order to obtain more reliable results. Complex problems in various areas have been tackled using multiobjective evolutionary algorithms, and many studies even compare two or several algorithms on a specific application 7,6. The restricted reusability, though, has so far prevented any application from establishing itself as a benchmark problem used by different research groups.
Re-implementation is usually too labor-intensive and error-prone, while re-compilation is often not possible because either the source code is not available, e.g., due to intellectual property issues, or particular software packages are needed that are not publicly available. To solve this problem, we present a computer engineering application, namely the design space exploration of packet processor architectures, that
• provides a platform- and programming-language-independent interface that allows the usage of pre-compiled and executable programs and therefore circumvents the problem mentioned above,
• is scalable in terms of complexity, i.e., problem instances of different levels of difficulty are available, and
• is representative for several other applications in the area of computer design [3,10,26].
The application will be described in terms of the underlying optimization model in Section 12.2, while Section 12.3 focuses on the overall software architecture and in particular on the interface. Section 12.4 demonstrates the application of four EMO techniques on several problem instances and compares their performance on the basis of a recently proposed quality measure [28]. The last section summarizes the main results of this chapter.

12.2. Packet Processor Design

Packet processors are high-performance, programmable devices with special architectural features that are optimized for network packet processing. They are mostly embedded within network routers and switches and are designed to implement complex packet processing tasks at high line speeds such as routing and forwarding, firewalls, network address translators, means for implementing quality-of-service (QoS) guarantees to different packet flows, and also pricing mechanisms. Other examples of packet processors would be media processors which have network interfaces. Such processors have audio, video and packet-processing capabilities and serve as a bridge between a network and a source/sink audio/video device. They are used to distribute (real-time) multimedia streams over a packet network like wired or wireless Ethernet. This involves receiving packets from a network, followed by processing in the protocol stack, forwarding to different audio/video devices and applying functions like decryption and decompression of multimedia streams. Similarly, at the source end, this involves receiving multimedia streams from audio/video devices (e.g. video camera, microphone, stereo systems), probably encrypting, compressing and packetizing them, and finally sending them over a network.
Following the above discussion, there are major constraints to satisfy and conflicting goals to optimize in the design of packet processors (in this area of application, the term network processor is also used):
• Delay Constraints: In case of packets belonging to a multimedia stream, there is very often a constraint on the maximal time a packet is allowed to stay within the packet processor. This upper delay bound must be satisfied under all possible load conditions imposed by other packet streams that are processed simultaneously by the same device.
• Throughput Maximization: The goal is to maximize the maximum possible throughput of the packet processing device in terms of the number of packets per second.
• Cost Minimization: One is interested in a design that uses a small amount of resources, e.g., single processing units, memory and communication networks.
• Conflicting Usage Scenarios: Usually, a packet processor is used in several different systems. For example, one processor will be implemented within a router, another one is built into a consumer device for multimedia processing. The requirements from these different applications in terms of throughput and delay are typically in conflict with each other.
All of the above constraints and conflicting goals will be taken into account in the benchmark application.

12.2.1. Design Space Exploration

Complex embedded systems like packet processors often comprise a heterogeneous combination of different hardware and software components such as CPU cores, dedicated hardware blocks, different kinds of memory modules and caches, various interconnections and I/O interfaces, run-time environment and drivers, see, e.g., Figure 12.1. They are integrated on a single chip and they run specialized software to perform the application.
Fig. 12.1. Template of a packet processor architecture as used in the benchmark application
Typically, the analysis questions faced by a designer during a system-level design process are:
• Allocation: Determine the hardware components of the packet processor like microprocessors, dedicated hardware blocks for computationally intensive application tasks, memory and busses.
• Binding: For each task of the software application choose an allocated hardware unit which executes it.
• Scheduling Policy: For the set of tasks that are mapped onto a specific hardware resource choose a scheduling policy from the available run-time environment, e.g., a fixed priority.
Most of the available design methodologies start with an abstract specification of the application and the performance requirements. These specifications are used to drive a system-level design space exploration [17], which iterates between performance evaluation and exploration steps, see also Thiele et al. [19,20] and Blickle et al. [3]. Finally, appropriate allocations, bindings, and scheduling strategies are identified. The methodology used in the benchmark application of this chapter is shown in Figure 12.2.
Fig. 12.2. Design space exploration methodology used in the benchmark application
One of the major challenges in design exploration is to estimate the essential characteristics of the final implementation in an early design stage. In the case of packet processor design, the performance analysis has to cope with two major problems which make any kind of compositional analysis difficult: (1) the architecture of such systems is highly heterogeneous: the different architectural components have different computing capabilities and use different arbitration and resource sharing strategies; (2) the packets of one or different packet streams interact on the various resources, i.e., if a resource is busy processing one packet, others have to wait.
This interaction between packet streams is of a tremendous complexity and influences packet delays and memory usage. There is a large body of work devoted to system-level performance analysis of embedded system architectures, see Gajski et al. [12] and the references therein. Currently, the analysis of such heterogeneous systems is mainly based on simulation. The main advantage of using simulation as a means for performance evaluation is that many dynamic and complex interactions in an architecture can be taken into account, which are otherwise difficult to model analytically. On the other hand, simulation-based tools suffer from high running times, incomplete coverage, and failure to identify corner cases. Analytical performance models for DSP systems and embedded processors were proposed in, e.g., Agarwal [1], or Franklin and Wolf [11]. These models may be classified under what can be called a "static analytical model". Here, the computation, communication, and memory resources of a processor are all described using simple algebraic equations that do not take into account the dynamics of the application, i.e., variations in resource loads and shared resources. In contrast to this class of approaches, the models we will use here may be classified under "dynamic analytical models", where the dynamic behavior of the computation and communication resources (such as the effects of different scheduling or bus arbitration schemes) are also modeled, see, e.g., Thiele and co-workers [16,22,4]. Applications to stream processing have been reported in different publications [21,18,19,5].

12.2.2. Basic Models and Methods

According to Figure 12.2, basic prerequisites of the design space exploration are models for the architecture, the application, the run-time scheduling, and the application scenarios. Based on these models, we will describe the method for performance analysis.
Architecture Template and Allocation

Following Figure 12.1, the model for a packet processor consists of a set of computation units or processing elements which perform operations on the individual packets. In order to simplify the discussion and the benchmark application, we will not model the communication between the processing elements, i.e., packets can be moved from one memory element to the next one without constraints.

Definition 1: We define a set of resources R. To each resource r ∈ R we associate a relative implementation cost cost(r) > 0. The allocation of resources is described by the function alloc(r) ∈ {0, 1}. To each resource r there are associated two functions β_r^u(Δ) ≥ 0 and β_r^l(Δ) ≥ 0, denoted as upper and lower service curves, respectively.

Initially, we specify all available processing units as our resource set R and associate the corresponding costs to them. For example we may have the resources R = {ARM9, MEngine, Classifier, DSP, Cipher, LookUp, CheckSum, PowerPC}. During the allocation step (see Figure 12.2), we select those which will be in a specific architecture, i.e., if alloc(r) = 1, then resource r ∈ R will be implemented in the packet processor architecture. The upper and lower service curves specify the available computing units of a resource r in a relative measure, e.g., processor cycles or instructions. In particular, β_r^u(Δ) and β_r^l(Δ) are the maximum and minimum number of available processor cycles in any time interval of length Δ. In other words, the service curves of a resource determine the best-case and worst-case computing capabilities. For details, see, e.g., Thiele et al. [18].

Software Application and Binding

The purpose of a packet processor is to simultaneously process several streams of packets. For example, one stream may contain packets that store audio samples and another one contains packets from an FTP application. Whereas the different streams may be processed differently, each packet of a particular stream is processed identically, i.e., each packet is processed by the same sequence of tasks.

Definition 2: We define a set of streams s ∈ S and a set of tasks t ∈ T. To each stream s there is an ordered sequence of tasks V(s) = [t0, ..., tn] associated. Each packet of the stream is first processed by task t0 ∈ T, then successively by all other tasks until tn ∈ T.

As an example we may have five streams S = {RTSend, NRTDecrypt, NRTEncrypt, RTRecv, NRTForward}.
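The resource and stream model of Definitions 1 and 2 might be captured as follows. This is a minimal Python sketch: the class names and cost values are illustrative assumptions, and the service curves are omitted.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    """A processing element with a relative implementation cost (cf. Definition 1)."""
    name: str
    cost: float
    allocated: bool = False   # alloc(r) in {0, 1}

@dataclass
class Stream:
    """A packet stream with its ordered task chain V(s) (cf. Definition 2)."""
    name: str
    tasks: list               # ordered sequence [t0, ..., tn]

# A few of the resources and one stream named in the text; the cost
# values are made up for illustration.
resources = [Resource("ARM9", 10.0), Resource("DSP", 6.0), Resource("Cipher", 4.0)]
forward = Stream("NRTForward", ["LinkRX", "VerifyIPHeader", "Classify", "LinkTx"])

# Cost of an architecture: sum of alloc(r) * cost(r) over all resources.
total_cost = sum(r.cost for r in resources if r.allocated)
```

With nothing allocated yet, the cost is zero; setting `allocated = True` for a subset of resources yields the cost term used later in Definition 7.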
According to Figure 12.3, the packets of these streams when entering the packet processor undergo different sequences of tasks, i.e., the packets follow the paths shown. For example, for stream s = NRTForward we have the sequence of tasks V(s) = [LinkRX, VerifyIPHeader, ProcessIPHeader, Classify, RouteLookUp, ..., Schedule, LinkTx].

Definition 3: The mapping relation M ⊆ T × R defines all possible bindings of tasks, i.e., if (t, r) ∈ M, then task t could be executed on resource r. This execution of t for one packet would use w(r, t) > 0 computing units of
Fig. 12.3. Task graph of a packet processing application
r. The binding B ⊆ M of tasks to resources is a subset of the mapping such that every task t ∈ T is bound to exactly one allocated resource r ∈ R, alloc(r) = 1. We also write r = bind(t) in a functional notation. In a similar way as alloc describes the selection of architectural components, bind defines a selection of the possible mappings. Both alloc and bind will be encoded using an appropriate representation described later. The 'load' that a task t puts onto its resource r = bind(t) is denoted as w(r, t). Figure 12.4 represents an example of a mapping between tasks and resources. For example, task 'Classify' could be bound to resource 'ARM9' or 'DSP'.

Definition 4: To each stream s ∈ S there is associated a fixed priority prio(s) > 0. There are no streams with equal priority.
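A mapping relation and the repair of an infeasible binding could be sketched as below. The mapping sets follow the 'Classify' example from the text; the load values w(r, t), the second task's mapping, and the repair policy (pick the first allocated candidate) are illustrative assumptions.

```python
# Mapping relation M: task -> set of resources that could execute it (Definition 3).
mapping = {
    "Classify":    {"ARM9", "DSP"},
    "RouteLookUp": {"ARM9", "LookUp"},
}
# w(r, t): computing units used when task t runs on resource r (hypothetical values).
w = {("ARM9", "Classify"): 50, ("DSP", "Classify"): 12,
     ("ARM9", "RouteLookUp"): 30, ("LookUp", "RouteLookUp"): 5}

def repair(binding, allocated):
    """Rebind every task whose resource is not allocated to some allocated
    resource from its mapping set -- a minimal repair sketch."""
    fixed = {}
    for task, res in binding.items():
        if res in allocated:
            fixed[task] = res
        else:
            candidates = mapping[task] & allocated
            fixed[task] = sorted(candidates)[0]   # deterministic pick
    return fixed

binding = {"Classify": "DSP", "RouteLookUp": "LookUp"}
fixed = repair(binding, allocated={"ARM9"})
# Both tasks fall back to the only allocated resource, ARM9.
```

The actual repair method in the benchmark is more involved (it also has to keep the allocation consistent); the sketch only shows the invariant it restores: every task is bound to an allocated resource from its mapping set.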
Fig. 12.4. Example of a mapping of tasks to resources
In the benchmark application, we suppose that only preemptive fixed-priority scheduling is available on each resource. To this end, we need to associate to each stream s a fixed priority prio(s) > 0, i.e., all packets of s receive this priority. From all packets that wait to be executed in a memory, the run-time environment chooses one for processing that has the highest priority among all waiting packets. If several packets from one stream are waiting, then it prefers those that are earlier in the task chain V(s).

Application Scenarios

A packet processor will be used in several, possibly conflicting application scenarios. Such a scenario is described by the properties of the input streams, the allowable end-to-end delay (deadline) for each stream and the available total memory for all packets (sum of all individual memories of the processing elements).

Definition 5: The properties of each stream s are described by upper and lower arrival curves α_s^u(Δ) and α_s^l(Δ). To each stream s ∈ S there is associated the maximal total packet memory m(s) > 0 and an end-to-end deadline d(s) > 0, denoting the maximal time by which any packet of the stream has to be processed by all associated tasks V(s) after its arrival.

The upper and lower arrival curves specify upper and lower bounds on the number of packets that arrive at the packet processor. In particular, α_s^u(Δ) and α_s^l(Δ) are the maximum and minimum number of packets in any time interval of length Δ. For details, see, e.g., Thiele et al. [21].

Definition 6: The packet processor is evaluated for a set of scenarios
b ∈ B. The quantities of Definition 5 are defined for each scenario independently. In addition, whereas the allocation alloc defines a particular hardware architecture, the quantities that are specific for a software application are also specific for each scenario b ∈ B and must be determined independently, for example the binding bind of tasks to processing elements and the stream priorities prio.

Performance Analysis

It is not obvious how to determine for any memory module the maximum number of stored packets in it waiting to be processed at any point in time. Neither is it clear how to determine the maximum end-to-end delays experienced by the packets, since all packet flows share common resources. As the packets may flow from one resource to the next one, there may be intermediate bursts and packet jams, making the computations of the packet delays and the memory requirements non-trivial. Interestingly, there exists a computationally efficient method to derive worst-case estimates on the end-to-end delays of packets and the required memory for each computation and communication. In short, we construct a scheduling network and apply the real-time calculus (based on arrival and service curves) in order to derive the desired bounds. The description of this method is beyond the scope of this chapter but can be found in Thiele et al. [21,18,19]. As we know for each scenario the delay and memory in comparison to the allowed values d(b, s) and m(b, s), we can increase the input traffic until the constraints are just about satisfied. In particular, we do not use the arrival curves α_{b,s}^u and α_{b,s}^l directly in the scheduling network, but the linearly scaled curves ψ_b · α_{b,s}^u and ψ_b · α_{b,s}^l, where the scaling factor ψ_b is different for each scenario. Now, binary search is applied to determine the maximal throughput such that the constraints on delay and memory are just about satisfied.
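The throughput search described above can be sketched as a plain binary search over the scaling factor ψ_b. Here `meets_constraints` is a hypothetical stand-in for the scheduling-network/real-time-calculus analysis, which is not part of this sketch; the search only relies on feasibility being monotone in the load.

```python
def max_scaling(meets_constraints, lo=0.0, hi=1024.0, tol=1e-6):
    """Binary search for the largest factor psi such that the scaled input
    traffic psi * alpha still satisfies all delay and memory constraints.
    `meets_constraints(psi)` abstracts the real scheduling-network analysis."""
    assert meets_constraints(lo), "even zero load violates the constraints"
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if meets_constraints(mid):
            lo = mid   # feasible: try a larger load
        else:
            hi = mid   # infeasible: back off
    return lo

# Toy stand-in: suppose the constraints hold up to a load factor of 3.5.
psi = max_scaling(lambda p: p <= 3.5)
```

Because feasibility is monotone (more traffic can only increase delays and buffer fill levels), the search converges to the maximal feasible ψ_b within the chosen tolerance.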
For the following discussion, it is sufficient to state the following fact:
• Given the specification of a packet processing design problem by the set of resources r ∈ R, the cost function for each resource cost(r), the service curves β_r^u and β_r^l, a set of streams s ∈ S, a set of application tasks t ∈ T, the ordered sequence of tasks for each stream V(s), and the computing requirement w(r, t) for task t on resource r;
• given a set of application scenarios b ∈ B with associated arrival curves for each stream α_{b,s}^u and α_{b,s}^l, and a maximum delay and memory for each stream d(b, s) and m(b, s);
• given a specific HW/SW architecture defined by the allocation of hardware resources alloc(r), for each scenario b a specific priority of each stream prio(b, s) and a specific binding bind(b, t) of tasks t to resources;
• then we can determine, using the concepts of scheduling network, real-time calculus and binary search, the maximal scaling factor ψ_b such that under the input arrival curves ψ_b · α_{b,s}^u and ψ_b · α_{b,s}^l the maximal delay of each packet and the maximal number of stored packets is not larger than d(b, s) and m(b, s), respectively.
As a result, we can define the criteria for the optimization of packet processors.

Definition 7: The quality measures for packet processors are the associated cost cost = Σ_{r∈R} alloc(r) · cost(r) and the throughput ψ_b for each scenario b ∈ B. These quantities can be computed from the specification of a HW/SW architecture, i.e., alloc(r), prio(b, s) and bind(b, t) for all streams s ∈ S and tasks t ∈ T.

Now, the benchmark application is defined formally in terms of an optimization problem. In the following, we will describe the two aspects, representation and variation operators, that are specific to the evolutionary algorithm implementation.

Representation

Following Figure 12.2 and Definition 7, a specific HW/SW architecture is defined by alloc(r), prio(b, s) and bind(b, t) for all resources r ∈ R, streams s ∈ S and tasks t ∈ T. For the representation of architectures, we number the available resources from 1 to |R|; the tasks are numbered from 1 to |T|, and each stream is assigned a number between 1 and |S|. The allocation of resources can then be represented as a binary vector A ∈ {0, 1}^|R|, where A[i] = 1 denotes that resource i is allocated. To represent the binding of tasks to resources, we use a two-dimensional vector Z ∈ {1, ..., |R|}^(|B|×|T|), where for all scenarios b ∈ B it is stored which task is bound to which resource. Z[i][j] = k means that in scenario i task j is bound to resource k. Priorities of flows are represented as a two-dimensional vector P ∈ {1, ..., |S|}^(|B|×|S|), where we store the streams according to their priorities, e.g., P[i][j] = k means that in scenario i, stream k has priority j, with 1 being the highest priority. Obviously, not all possible encodings A,
Z, P represent feasible architectures. Therefore, a repair method has been developed that converts infeasible solutions into feasible ones.

Recombination

The first step in recombining two individuals is creating exact copies of the parent individuals. With probability 1 − P_cross, these individuals are returned as offspring and no recombination takes place. Otherwise, crossing over is performed on either the allocation, the task binding or the priority assignment of flows. With probability P_cross-alloc, a one-point crossover operation is applied to the allocation vectors A1 and A2 of the parents: first we randomly define the position j where to perform the crossover, then we create the allocation vector A_new1 for the first offspring as follows:

A_new1[i] = A1[i], if 1 ≤ i ≤ j
A_new1[i] = A2[i], if j < i ≤ |R|

Similarly, A_new2 is created. After this exchange in the allocation of resources, the repair method is called to ensure that for all tasks there is at least one resource allocated on which the task can be performed. If the crossover is not done within the allocation vector, it is performed with probability P_cross-bind within the binding of tasks to resources. In detail, a scenario b ∈ B is randomly determined, for which the crossover of the binding vectors should happen. Then, a one-point crossover for the binding vectors Z1[b] and Z2[b] of the parents is performed according to the following procedure, where j is a random value in the interval [1, |T|]:

Z_new1[b][i] = Z1[b][i], if 1 ≤ i ≤ j
Z_new1[b][i] = Z2[b][i], if j < i ≤ |T|
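The one-point crossover on the allocation vector, together with the allocation mutation described further below, can be sketched as follows. This is a minimal illustration: the random seed is arbitrary and the repair step that would normally follow is omitted.

```python
import random

def one_point_crossover(a1, a2, rng):
    """One-point crossover of two allocation vectors A1, A2 at position j."""
    j = rng.randint(1, len(a1) - 1)
    return a1[:j] + a2[j:], a2[:j] + a1[j:]

def mutate_allocation(a, p_zero, rng):
    """Set a randomly chosen allocation bit to 0 with probability p_zero,
    otherwise to 1 (the repair of dangling bindings would follow)."""
    a = a[:]
    i = rng.randrange(len(a))
    a[i] = 0 if rng.random() < p_zero else 1
    return a

rng = random.Random(1)
c1, c2 = one_point_crossover([1, 1, 0, 0], [0, 0, 1, 1], rng)
# each position of the offspring comes from exactly one parent
```

With complementary parents as above, the two offspring stay complementary at every position, which is a quick sanity check for a correct one-point exchange.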
The binding Z_new2 can be determined accordingly. Finally, if the crossover is neither in the allocation nor in the binding of tasks to resources, the crossover happens in the priority vector. For a randomly selected scenario b, the priority vectors P1[b] and P2[b] are crossed in one point to produce new priority vectors P_new1 and P_new2 following a similar procedure as described above.

Mutation

First, an exact copy of the individual to be mutated is created. With probability 1 − P_mut, no mutation takes place and the copy is returned. Otherwise, the copy is modified with respect to either the allocation, the task binding, or the priority assignment. We mutate the allocation vector with probability P_mut-alloc. To this end, we randomly select a resource i and set A_new[i] = 0 with probability
P_mut-alloc-zero, otherwise we set A_new[i] = 1. After this change in the allocation vector, the repair method is called, which changes infeasible bindings such that they all map tasks to allocated resources only. In case the mutation does not affect the allocation vector, with probability P_mut-bind we mutate the binding vector Z_new[b] for a randomly determined scenario b ∈ B. That is, we randomly select a task and map it to a resource randomly selected from the specification. If the resource is not yet allocated in this solution, we additionally allocate it. If we neither mutate the allocation nor the binding, we mutate the priority vector for a randomly selected scenario b. We just exchange two flows within the priority list P_new[b].

12.3. Software Architecture

So far, we have discussed evaluation, representation and variation for the proposed benchmark application. In the following, we will discuss how these components are combined with the fitness assignment and selection components. The overall software architecture is shown in Figure 12.5. It depicts how the application itself is separated from the multiobjective optimizer via a text-based interface. In this context, the two main questions are: (i) which elements of the optimization process should be part of the implementation of the benchmark application, and (ii) how to establish the communication between the two parts?

12.3.1. General Considerations

Essentially, most proposed multiobjective optimizers differ only in their selection operators: how promising individuals are selected for variation and how it is decided which individuals are removed from the population. Accordingly, most studies comparing different optimizers keep the representation of individuals and the variation operators fixed in order to assess the performance of the selection operators, which form the problem-independent part of the optimization process [7,6]. Consistently with this approach, the packet processor benchmark module consists of the individual handling including their representation and the objective function evaluation, as well as the variation operators (see Figure 12.2 and its extension in Figure 12.5). The division of the optimization process into two parts raises the problem of communication between the benchmark application and the optimizer. Several options can be considered: restricting oneself to a specific
Fig. 12.5. Overview of the separation between benchmark and optimizer
programming language that is available on many platforms, e.g., C or Java, would allow providing the modules as library functions. However, coupling two modules which are written in different programming languages would then be difficult, and it would in any case be necessary to re-compile or at least re-link the program for each optimizer. Alternatively, special communication mechanisms like UNIX sockets, which are independent of the programming language, could be used. The drawback is, though, that these mechanisms are not supported on all platforms. We have therefore decided to implement the benchmark and the optimizer as separate programs which communicate through text files. The use of text files guarantees that any optimizer can be coupled to the benchmark even if the two programs are written in different programming languages and run on different machines with different operating systems, as long as both have access to a common file system. This flexibility does certainly not come for free. There is an additional overhead in running time; however, it
is minimal and can be neglected in the case of real-world applications, as a series of tests have shown [2]. The interface developed for this benchmark application has been proposed and described in detail in Bleuler et al. [2]. This concept, named PISA†, is applicable in a much wider range of scenarios since the chosen separation between selection and variation is suitable for most evolutionary multiobjective optimizers as well as for many other stochastic search algorithms. Additionally, the interface definition provides means for extensions to adjust it to specific needs. In the following we describe the main characteristics of the interface and especially its communication protocol.

12.3.2. Interface Description
As mentioned above, the two parts shown in Figure 12.5 are implemented as separate programs. Since the two programs run as independent processes, they need a method for synchronization. The procedure is based on a handshake protocol which can be described using two state machines (see Figure 12.6). In general only one process is active at a time. When it reaches a new state it writes this state, encoded as a number, to a text file. During that time the other process has been polling this state file and now becomes
Fig. 12.6. Handshake protocol: The two processes can be modeled as finite state machines using the state file to synchronize. The data files are not shown
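The state-file polling underlying the handshake of Figure 12.6 could be sketched as below. The file name, state numbers and timing values are hypothetical; the actual file formats and state encoding are specified in Bleuler et al. [2].

```python
import os
import time

STATE_FILE = "PISA_state"   # hypothetical name, stands in for the real state file

def write_state(n):
    """The active process publishes its new state number."""
    with open(STATE_FILE, "w") as f:
        f.write(str(n))

def poll_state(expected, delay=0.01, timeout=1.0):
    """The passive process busy-waits until the other side writes the
    expected state number, then becomes active itself."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with open(STATE_FILE) as f:
                if f.read().strip() == str(expected):
                    return True
        except FileNotFoundError:
            pass          # other process has not created the file yet
        time.sleep(delay)
    return False

write_state(1)        # e.g., the application signals "initial population ready"
ok = poll_state(1)    # the optimizer observes the state change
os.remove(STATE_FILE)
```

Because only the currently active process writes the state file, a plain text file on a shared file system is sufficient for synchronization, which is exactly what makes the scheme platform- and language-independent.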
† PISA stands for 'Platform- and programming-language-independent Interface for Search Algorithms'.
active, while the first process starts polling the state file. Specifically, it works as follows. Both programs independently perform some initialization, e.g., reading of parameter files. During this step, the benchmark application generates the initial population and evaluates the individuals. It then writes the IDs of all individuals and their objective values to a file and changes the state number. As soon as the optimizer is done with its initialization, it starts polling the state file ('wait' state in Figure 12.6). When the state number is changed by the benchmark application, the optimizer reads the data file and selects promising parents for variation (in 'select'). Their IDs are written to another text file. The optimizer can maintain a pool of IDs it might consider again for selection in future iterations. The list of these archived IDs is also written to a text file. Then the state number is changed back. The benchmark program, which had been polling the state file, now (in 'variate') reads the list of archived IDs and deletes all individuals which are not on this list. Then the benchmark reads the IDs of the parents and produces offspring by variation of the respective individuals. They are evaluated, IDs and objective values are written to the text file, and the cycle can start again. Since the optimizer only operates in the objective space, it is not necessary to communicate the actual representation of the individuals to the optimizer. The amount of data exchanged is thus small. For the exact specification of the protocol and the format of the text files see Bleuler et al. [2].

12.4. Test Cases

The packet processor benchmark application has been implemented in Java. The corresponding tool EXPO, which has been tested under Solaris, Linux and Microsoft Windows, provides a graphical user interface that allows to control the program execution (cf.
Figure 12.7): the optimization run can be halted, the current population can be plotted and individual processor designs can be inspected graphically. In the following, we will present three problem instances for the packet processor application and demonstrate how to compare different evolutionary multiobjective optimizers on these instances. 12.4.1. Problem Instances We consider three problem instances that are based on the packet processing application depicted in Figure 12.3. The set of available resources is the
Fig. 12.7. The user interface of the benchmark application: the main control window in the upper left, a plot of the current population in the upper right and a graphical representation of a network processor configuration in the lower part
same for all the problem instances. The three problem instances differ in the number of objectives: we have defined a problem with 2 objectives, one with 3, and a scenario including 4 objectives. For all the different instances one objective is in common, the total cost of the allocated resources. The remaining objectives in a problem instance are the performance ψ_b of the solution packet processor under a given load scenario b ∈ B. In Table 12.61 the different load characteristics for the remaining objectives are shown. Overall, three different loads can be identified: in Load 1 all flows have to be processed; in Load 2 there are only the three flows real-time voice receive and send, and non-real-time (NRT) packet forwarding present; in Load 3, the packet processor has to forward packets of flow 'NRT forward' and encrypt/decrypt packets of flows 'NRT encryption' and 'NRT decryption', respectively. The size of the search space for the given instance can be computed
Table 12.61. Loads for the different scenarios for which the architecture should be optimized.

Instance       Load scenario   RT send   RT receive   NRT encrypt   NRT decrypt   NRT forward
2 Objectives   Load 1          √         √            √             √             √
3 Objectives   Load 2          √         √            -             -             √
               Load 3          -         -            √             √             √
4 Objectives   Load 1          √         √            √             √             √
               Load 2          √         √            -             -             √
               Load 3          -         -            √             √             √
as follows. In the problem setting, there are 4 resource types on which all the tasks can be performed. Therefore, we have more than 4^25 possibilities to map the tasks on the resources. Furthermore, the solution contains a priority assignment to the different flows; there are 5! possibilities to assign priorities to the flows. So, if we take into account that there are other specialized resources available, the size of the search space is S > 4^25 × 5! > 10^17 already for the problem instance with 2 objectives, and even larger for the instances with 3 or 4 objectives. As an example, an approximated Pareto front for the 3-objective instance is shown in Figure 12.8; the front has been generated by the optimizer SPEA2 [25]. The x-axis shows the objective value corresponding to ψ_Load2 under Load 2 (as defined in Table 12.61), the y-axis shows the objective value corresponding to ψ_Load3, whereas the z-axis shows the normalized total cost of the allocated resources. The two example architectures shown in Figure 12.8 differ only in the allocation of the resource 'Cipher', which is a specialized hardware unit for encryption and decryption of packets. The performance of the two architectures for the load scenario with real-time flows to be processed is more or less the same. However, the architecture with a cipher unit performs around 30 times better for the encryption/decryption scenario, at increased cost for the cipher unit. So, a designer of a packet processor that should have the capability of encryption/decryption would go for the solution with a cipher unit (solution on the left in Figure 12.8), whereas one would decide for the cheaper solution on the right if there is no need for encryption.

Fig. 12.8. Two solution packet processor architectures annotated with loads on resources for the different loads specified in Table 12.61

12.4.2. Simulation Results

To evaluate the difficulty of the proposed benchmark application, we compared the performance of four evolutionary multiobjective optimizers, namely SPEA2 [25], NSGA-II [8], SEMO [15] and FEMO [15], on the three aforementioned problem instances. For each algorithm, 10 runs were performed using the parameter settings listed in Tables 12.62 and 12.63; these parameters were determined based on extensive, preliminary simulations. Furthermore, all objective functions were scaled such that the corresponding values lie within the interval [0,1]. Note that all objectives are to be minimized, i.e., the performance values are reversed (smaller values correspond to better performance). The different runs were carried out on a Sun Ultra 60. A single run for 3 objectives and a population size of 150 individuals in conjunction with SPEA2 takes about
S. Künzli, S. Bleuler, L. Thiele and E. Zitzler

Table 12.62. Parameters for population size and duration of runs dependent on the number of objectives.

# of objectives | population size | # of generations
2 | 100 | 200
3 | 150 | 300
4 | 200 | 400

Table 12.63. Probabilities for mutation and crossover (cf. Section 12.2).

Mutation: p_mut = 0.8
Mutation -> Allocation: p_mut-alloc = 0.3
Mutation -> Allocation -> zero: p_mut-alloc-zero = 0.5
Mutation -> Binding: p_mut-bind = 0.5
Crossover: p_cross = 0.5
Crossover -> Allocation: p_cross-alloc = 0.3
Crossover -> Binding: p_cross-bind = 0.5
20 minutes to complete. In the following, we have used two binary performance measures for the comparison of the EMO techniques: (1) the additive ε-quality measure 28, and (2) the coverage measure 27. The ε-quality measure I_ε+(A, B) returns the maximum value d which can be subtracted from all objective values of all points in the solution set A, such that the solutions in the shifted set A' equal or dominate any solution in set B in terms of the objective values. If the value is negative, the solution set A entirely dominates the solution set B. Formally, this measure can be stated as follows:

I_ε+(A, B) = max_{b ∈ B} min_{a ∈ A} max_{1 ≤ i ≤ n} (a_i − b_i)
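The two measures can be sketched in a few lines of Python. This is an illustrative implementation, not the authors' code; solution sets are lists of objective tuples, and all objectives are assumed to be minimized:

```python
def eps_indicator(A, B):
    """Additive epsilon-quality I_eps+(A, B): the smallest d such that every
    b in B is weakly dominated by some a in A shifted by -d. A negative
    value means that A entirely dominates B."""
    return max(min(max(a_i - b_i for a_i, b_i in zip(a, b)) for a in A) for b in B)

def coverage(A, B):
    """Coverage measure C(A, B): fraction of points in B that are weakly
    dominated by at least one point in A."""
    def weakly_dominates(a, b):
        return all(a_i <= b_i for a_i, b_i in zip(a, b))
    return sum(any(weakly_dominates(a, b) for a in A) for b in B) / len(B)
```

For instance, with A = {(0, 0)} and B = {(1, 1)}, eps_indicator returns −1 and coverage returns 1, reflecting that A entirely dominates B.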
Fig. 12.9. Illustration of the additive ε-quality measure I_ε+; here I_ε+(A, B) = d with d < 0, as A entirely dominates B
objectives with respect to the chosen performance measures; this confirms results presented in a study by Zitzler, Laumanns, and Thiele 25. The distributions of the coverage and the additive ε-quality measure values between SPEA2 and NSGA-II are depicted in Figure 12.10. In both cases, for 2 and 4 objectives, NSGA-II achieves a better value for coverage over SPEA2, but for more objectives, SPEA2 finds solutions that lead to smaller values for I_ε+ than NSGA-II. Furthermore, we can see that FEMO, a simple evolutionary optimizer with a fair selection strategy, performs worse than the other algorithms for all the test cases (see Figure 12.11 for details), again with respect to the two performance measures under consideration. Note that because of the implementation of selection in FEMO, it is possible that a solution is selected for reproduction multiple times in sequence. This behavior can decrease the impact of recombination in the search process with FEMO. With respect to both coverage and especially the additive ε-quality measure, SPEA2 is superior to FEMO. SEMO, in contrast, performs similarly to FEMO for the case with 2 objectives; however, SEMO shows improved performance with increasing number of objectives.

12.5. Summary

This chapter presented EXPO, a computer engineering application that addresses the design space exploration of packet processor architectures. The underlying optimization problem is complex and involves allocating
Fig. 12.10. Comparison of SPEA2 and NSGA-II for 2 and 4 objectives
resources, binding tasks to resources, and determining schedules for the usage scenario under consideration. The goal is to minimize the cost of the allocated resources and to maximize the estimated performance of the corresponding packet processor architecture for each distinct usage scenario. The last aspect in particular, performance estimation, is highly involved and makes the use of black-box optimization methods necessary. As shown in the previous section, the application reveals performance differences between
Fig. 12.11. Comparison of SPEA2 and FEMO for 2 and 4 objectives
four selected multiobjective optimizers with several problem instances. This suggests that EXPO is well suited as a multiobjective benchmark application. Moreover, the EXPO implementation provides a text-based interface that follows the PISA specification 2. PISA stands for 'platform- and programming-language-independent interface for search algorithms' and allows application-specific parts (representation, variation,
objective function calculation) to be implemented separately from the actual search strategy (fitness assignment, selection). Therefore, EXPO can be downloaded as a ready-to-use package, i.e., pre-compiled for different platforms; no modifications are necessary to combine it with arbitrary search algorithms. In addition, several multiobjective evolutionary algorithms including SPEA2 25 and NSGA-II 8 as well as other well-known benchmark problems such as the knapsack problem and a set of continuous test functions 24,9 are available for download at the PISA website http://www.tik.ee.ethz.ch/pisa/. All PISA-compliant components, benchmarks and search algorithms, can be combined arbitrarily without further implementation effort, and therefore this interface may be attractive for other researchers who would like to provide their algorithms and applications to the community.

Acknowledgments

This work has been supported by the Swiss Innovation Promotion Agency (KTI/CTI) under project number KTI 5500.2 and the SEP program at ETH Zurich under the poly project TH-8/02-2.

References

1. A. Agarwal. Performance tradeoffs in multithreaded processors. IEEE Transactions on Parallel and Distributed Systems, 3(5):525-539, September 1992.
2. S. Bleuler, M. Laumanns, L. Thiele, and E. Zitzler. PISA — a platform and programming language independent interface for search algorithms. In C. M. Fonseca, P. J. Fleming, E. Zitzler, K. Deb, and L. Thiele, editors, Evolutionary Multi-Criterion Optimization (EMO 2003), Lecture Notes in Computer Science, pages 494-508, Berlin, 2003. Springer.
3. T. Blickle, J. Teich, and L. Thiele. System-level synthesis using evolutionary algorithms. Journal on Design Automation for Embedded Systems, 3(8):23-58, 1998.
4. S. Chakraborty, S. Künzli, and L. Thiele. A general framework for analysing system properties in platform-based embedded system designs. In Proc. 6th Design, Automation and Test in Europe (DATE), Munich, Germany, March 2003.
5. S. Chakraborty, S. Künzli, L. Thiele, A. Herkersdorf, and P. Sagmeister. Performance evaluation of network processor architectures: Combining simulation with analytical estimation. Computer Networks, 41(5):641-665, April 2003.
6. C. A. Coello Coello, D. A. Van Veldhuizen, and G. B. Lamont. Evolutionary Algorithms for Solving Multi-Objective Problems. Kluwer Academic Publishers, New York, 2002.
7. K. Deb. Multi-objective optimization using evolutionary algorithms. Wiley, Chichester, UK, 2001.
8. K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2):182-197, April 2002.
9. K. Deb, L. Thiele, M. Laumanns, and E. Zitzler. Scalable multi-objective optimization test problems. In Congress on Evolutionary Computation (CEC), pages 825-830. IEEE Press, 2002.
10. R. P. Dick and N. K. Jha. MOGAC: A Multiobjective Genetic Algorithm for Hardware-Software Co-synthesis of Hierarchical Heterogeneous Distributed Embedded Systems. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 17(10):920-935, 1998.
11. M. Franklin and T. Wolf. A network processor performance and design model with benchmark parameterization. In P. Crowley, M. Franklin, H. Hadimioglu, and P. Onufryk, editors, Network Processor Design: Issues and Practices, Volume 1, chapter 6, pages 117-140. Morgan Kaufmann Publishers, 2003.
12. D. D. Gajski, F. Vahid, S. Narayan, and J. Gong. Specification and Design of Embedded Systems. Prentice Hall, Englewood Cliffs, N.J., 1994.
13. A. Jaszkiewicz. Do multiple-objective metaheuristics deliver on their promises? A computational experiment on the set-covering problem. IEEE Transactions on Evolutionary Computation, 7(2):133-143, 2003.
14. J. Knowles and D. Corne. Instance generators and test suites for the multiobjective quadratic assignment problem. In C. M. Fonseca, P. J. Fleming, E. Zitzler, K. Deb, and L. Thiele, editors, Evolutionary Multi-Criterion Optimization (EMO 2003), Lecture Notes in Computer Science, pages 295-310, Berlin, 2003. Springer.
15. M. Laumanns, L. Thiele, and E. Zitzler. Running time analysis of multiobjective evolutionary algorithms on pseudo-boolean functions. IEEE Transactions on Evolutionary Computation, 2004. Accepted for publication.
16. M. Naedele, L. Thiele, and M. Eisenring. Characterising variable task releases and processor capacities. In 14th IFAC World Congress 1999, pages 251-256, Beijing, July 1999.
17. A. Pimentel, P. Lieverse, P. van der Wolf, L. Hertzberger, and E. Deprettere. Exploring embedded-systems architectures with Artemis. IEEE Computer, 34(11):57-63, November 2001.
18. L. Thiele, S. Chakraborty, M. Gries, and S. Künzli. Design space exploration of network processor architectures. In First Workshop on Network Processors at the 8th International Symposium on High-Performance Computer Architecture (HPCA8), pages 30-41, Cambridge MA, USA, February 2002.
19. L. Thiele, S. Chakraborty, M. Gries, and S. Künzli. A framework for evaluating design tradeoffs in packet processing architectures. In Proc. 39th Design Automation Conference (DAC), pages 880-885, New Orleans, LA, June 2002. ACM Press.
20. L. Thiele, S. Chakraborty, M. Gries, and S. Künzli. Design space exploration of network processor architectures. In Network Processor Design: Issues and Practices, volume 1, chapter 4, pages 55-90. Morgan Kaufmann Publishers, 2003.
21. L. Thiele, S. Chakraborty, M. Gries, A. Maxiaguine, and J. Greutert. Embedded software in network processors - models and algorithms. In Proc. 1st Workshop on Embedded Software (EMSOFT), Lecture Notes in Computer Science 2211, pages 416-434, Lake Tahoe, CA, USA, 2001. Springer Verlag.
22. L. Thiele, S. Chakraborty, and M. Naedele. Real-time calculus for scheduling hard real-time systems. In Proc. IEEE International Symposium on Circuits and Systems (ISCAS), volume 4, pages 101-104, 2000.
23. D. Thierens. Convergence time analysis for the multi-objective counting ones problem. In C. M. Fonseca, P. J. Fleming, E. Zitzler, K. Deb, and L. Thiele, editors, Evolutionary Multi-Criterion Optimization (EMO 2003), Lecture Notes in Computer Science, pages 355-364, Berlin, 2003. Springer.
24. E. Zitzler, K. Deb, and L. Thiele. Comparison of multiobjective evolutionary algorithms: Empirical results. Evolutionary Computation, 8(2):173-195, 2000.
25. E. Zitzler, M. Laumanns, and L. Thiele. SPEA2: Improving the Strength Pareto Evolutionary Algorithm for Multiobjective Optimization. In K. Giannakoglou et al., editors, Evolutionary Methods for Design, Optimisation and Control with Application to Industrial Problems (EUROGEN 2001), pages 95-100. International Center for Numerical Methods in Engineering (CIMNE), 2002.
26. E. Zitzler, J. Teich, and S. S. Bhattacharyya. Multidimensional exploration of software implementations for DSP algorithms. Journal of VLSI Signal Processing, 24(1):83-98, February 2000.
27. E. Zitzler and L. Thiele. Multiobjective Evolutionary Algorithms: A Comparative Case Study and the Strength Pareto Approach. IEEE Transactions on Evolutionary Computation, 3(4):257-271, 1999.
28. E. Zitzler, L. Thiele, M. Laumanns, C. M. Fonseca, and V. G. da Fonseca. Performance assessment of multiobjective optimizers: An analysis and review. IEEE Transactions on Evolutionary Computation, 7(2):117-132, 2003.
CHAPTER 13

MULTIOBJECTIVE AERODYNAMIC DESIGN AND VISUALIZATION OF SUPERSONIC WINGS BY USING ADAPTIVE RANGE MULTIOBJECTIVE GENETIC ALGORITHMS

Shigeru Obayashi and Daisuke Sasaki

Institute of Fluid Science, Tohoku University, 2-1-1, Katahira, Sendai, 980-8577 JAPAN
E-mail: [email protected], [email protected]

This paper describes an application of Adaptive Range Multiobjective Genetic Algorithms (ARMOGAs) to aerodynamic wing optimization. ARMOGAs are an extension of MOGAs with range adaptation. The objectives are to minimize the transonic and supersonic drag coefficients as well as the bending and twisting moments of the wings for a supersonic airplane. A total of 72 design variables are employed to describe the wing geometry in terms of the wing's planform, thickness distribution, and warp shape. The four-objective optimization successfully produced 766 non-dominated solutions. These solutions are first compared with the non-dominated wings obtained by the three-objective optimization and with a wing designed by the National Aerospace Laboratory (NAL). To analyze the present non-dominated solutions further, Self-Organizing Maps (SOMs) have been used to visualize tradeoffs among objective function values. The design variables are also mapped onto a separate SOM. The resulting SOM generates clusters of design variables, which indicate the roles of the design variables for design improvements and tradeoffs. These processes can be considered as data mining of the engineering design.

13.1. Introduction

Multiobjective Evolutionary Algorithms (MOEAs) are becoming popular in many fields because they provide a unique opportunity to address global tradeoffs between multiple objectives by sampling a number of Pareto solutions. Beyond performing the multiobjective optimization itself, it is increasingly important to analyze the resulting tradeoffs. To understand tradeoffs, visualization is essential.
Although it is trivial to understand tradeoffs between two objectives, tradeoff analysis in more than three dimensions is not trivial, as shown in Fig. 13.1. To visualize higher dimensions, the Self-Organizing Map (SOM) by Kohonen 1,2 is applied herein to the Pareto solutions obtained by the multiobjective design optimization. In this paper, the design target is the wing for a Supersonic Transport (SST). Many research activities have been performed for SSTs worldwide 3-14. In Japan, the National Aerospace Laboratory (NAL) conducted the scaled supersonic experimental airplane project 5-9. For a new SST design, there exist many technical difficulties to overcome. The lift-to-drag ratio must be improved, and the sonic boom should be minimized. However, there is a severe tradeoff between reducing the drag and the boom. As a result, a new SST is expected to cruise at a supersonic speed only over the sea and at a transonic speed over the ground. This means the important design objectives are not only to improve the supersonic cruise performance but also the transonic one. For example, a large sweep angle can reduce the wave drag, but it limits the low-speed aerodynamic performance. Therefore, there are many tradeoffs to be addressed in designing an SST. The multipoint aerodynamic optimization of a wing shape for an SST at both supersonic and transonic cruise conditions was conducted by using the Adaptive Range Multiobjective Genetic Algorithm (ARMOGA) 13. Both aerodynamic drags were to be minimized under lift constraints, and the bending and pitching moments of the wing were also minimized instead of imposing constraints on structure and stability. A high-fidelity Computational Fluid Dynamics (CFD) code, a Navier-Stokes code, was used to evaluate the wing performance at both conditions. In this design optimization, planform shapes, camber, thickness distributions and twist distributions were parameterized with a total of 72 design variables. To alleviate the required computational time, parallel computing was performed for the function evaluations.
The resulting 766 non-dominated solutions are analyzed to reveal tradeoffs in this paper. In addition, SOM is used to understand the four-dimensional objective functions and the 72-dimensional design variables. SOM is a neural network model whose algorithm is based on unsupervised, competitive learning. It provides a topology-preserving mapping from the high-dimensional space to map units. Map units, or neurons, usually form a two-dimensional lattice, and thus SOM is a mapping from the high dimensions onto two dimensions. The topology-preserving property means that nearby points in the input space are mapped to nearby units in SOM. SOM can thus serve as a cluster-analysis tool for high-dimensional data. The cluster analysis of the objective function values will help to identify design tradeoffs. Regarding the four design objectives as a codebook vector,
SOM is first applied to visualize the design tradeoffs. Design is a process of finding a point in the design variable space that matches a given point in the objective function space. This is, however, very difficult. For example, the design variable space considered here has 72 dimensions. One way of overcoming the high dimensionality is to group some of the design variables together. To do so, the cluster analysis based on SOM can be applied again. Extracting a specific design variable from the non-dominated solutions to form a codebook vector, the design variable space can be mapped onto another SOM. The resulting SOM generates clusters of design variables. Design variables in such a cluster behave similarly to each other, and thus a typical design variable in the cluster indicates the behavior/role of the cluster. A designer may extract design information from this cluster analysis. These processes can be considered as data mining for the engineering design.
Fig. 13.1. Visualization of Pareto front.
13.2. Adaptive Range Multiobjective Genetic Algorithms Genetic Algorithms (GAs) search from multiple points in the design space simultaneously and stochastically, instead of moving from a single point deterministically like gradient-based methods. This feature prevents design candidates from settling in a local optimum. Moreover, GAs do not require computing gradients of the objective function. These characteristics lead
to the following advantages of GAs coupled with CFD: (1) GAs have the capability of finding global optimal solutions; (2) GAs can be processed in parallel; (3) high-fidelity CFD codes can easily be adapted to GAs without any modification; (4) GAs are not sensitive to any noise that might be present in the results; (5) GAs are less prone to premature failure. GAs have been extended to solve multiobjective problems successfully 18,19. GAs use a population to seek optimal solutions in parallel. This feature can be extended to seek Pareto solutions in parallel without specifying weights between the objective functions. The resultant Pareto solutions represent global tradeoffs. As high-fidelity CFD solvers need a large computational time, an efficient MOEA is required for aerodynamic optimization. ARMOGA has been developed for this purpose. In the traditional binary coding, a large string length is necessary for real-parameter problems, which may result in slow convergence to a global optimum. The Adaptive Range Genetic Algorithm (ARGA), originally proposed by Arakawa and Hagiwara, is a quite unique approach to solving such problems efficiently 15,16. Oyama developed a real-coded ARGA and applied it to transonic wing optimization 17. ARMOGA has been developed based on ARGA to deal with multiple Pareto solutions for multiobjective optimization. The main difference between ARMOGA and a conventional Multi-Objective Genetic Algorithm (MOGA) is the introduction of range adaptation. The flowchart of ARMOGA is shown in Fig. 13.2. The population is reinitialized every M generations for the range adaptation so that the population advances toward promising regions. The basis of ARMOGA is the same as that of ARGA, but a straightforward extension may cause a problem with the diversity of the population. To better preserve the diversity of solution candidates, the normal distribution for encoding is changed. Figure 13.3 shows the search range with the distribution of the probability.
Plateau regions are defined by the design ranges of the selected solutions. Then the normal distribution is considered at both sides of the plateau. The advantages of ARMOGA are the following: it is possible to obtain Pareto solutions efficiently because of the concentrated search of the probable design space, and it also produces diversified solutions. On the other hand, it may be difficult to avoid local minima if global solutions are not included in the present search region. The genetic operators adopted in ARMOGA are based on MOGAs 20.
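The plateau-with-Gaussian-tails sampling described above can be sketched as follows. This is a hypothetical illustration: the mixing weight p_plateau and the tail width sigma are assumptions for the sketch, not values from the chapter:

```python
import random

def sample_range_adapted(lo, hi, sigma, p_plateau=0.8, rng=random):
    """Draw one design variable for re-initialization: uniform on the
    plateau [lo, hi] spanned by the selected solutions, with normal
    (Gaussian) tails attached to both sides of the plateau."""
    if rng.random() < p_plateau:
        return rng.uniform(lo, hi)           # inside the plateau
    offset = abs(rng.gauss(0.0, sigma))      # Gaussian tail sample
    return lo - offset if rng.random() < 0.5 else hi + offset
```

Samples concentrate on the promising range while the tails keep some probability of exploring just outside it, which is how the re-initialization preserves diversity.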
Selection is based on the Pareto ranking method and fitness sharing 21. Each individual is assigned a rank according to the number of individuals that dominate it. A standard fitness sharing function is used to maintain the diversity of the population. To find the Pareto solutions more effectively, the so-called best-N selection 21 is also adopted. Blended crossover (BLX-α) 22, described below, is adopted. This operator generates children on a segment defined by two parents and a user-specified parameter α. A disturbance is added to the new design variables at a mutation rate of 20%. If the mutation occurs, the new design variables are obtained as

Child1 = γ × Parent1 + (1 − γ) × Parent2 + m × (ran2 − 0.5)
Child2 = (1 − γ) × Parent1 + γ × Parent2 + m × (ran3 − 0.5)
γ = (1 + 2α) × ran1 − α    (1)

where α = 0.5, and Child1,2 and Parent1,2 denote the encoded design variables of the children (members of the new population) and parents (a mated pair of the old generation), respectively. The random numbers ran1-3 are uniform random numbers in [0,1], and m is set to 10% of the given range of each design variable.
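Equation (1) translates directly into code. The sketch below applies the disturbance unconditionally for brevity, whereas the text adds it only at the 20% mutation rate; m_frac and var_range are illustrative names, not from the chapter:

```python
import random

def blx_crossover(parent1, parent2, alpha=0.5, m_frac=0.1, var_range=1.0, rng=random):
    """Blended crossover (BLX-alpha) for one encoded design variable, Eq. (1)."""
    ran1, ran2, ran3 = rng.random(), rng.random(), rng.random()
    gamma = (1.0 + 2.0 * alpha) * ran1 - alpha
    m = m_frac * var_range   # m is 10% of the given range of the variable
    child1 = gamma * parent1 + (1.0 - gamma) * parent2 + m * (ran2 - 0.5)
    child2 = (1.0 - gamma) * parent1 + gamma * parent2 + m * (ran3 - 0.5)
    return child1, child2
```

With alpha = 0 and no disturbance, both children lie on the segment between the parents; alpha = 0.5 extends that segment by half its length on each side.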
Fig. 13.2. Flowchart of ARMOGA.
Fig. 13.3. Sketch of range adaptation.
13.3. Multiobjective Aerodynamic Optimization

13.3.1. Formulation of Optimization

Four objective functions are used here:
(i) Drag coefficient at transonic cruise, C_D,t
(ii) Drag coefficient at supersonic cruise, C_D,s
(iii) Bending moment at the wing root at the supersonic cruise condition, M_B
(iv) Pitching moment at the supersonic cruise condition, M_P

In the present optimization, these objective functions are to be minimized. The transonic drag minimization corresponds to the cruise over land; the supersonic drag minimization corresponds to the cruise over sea. Lower bending moments allow less structural weight to support the wing. Lower pitching moments mean less trim drag. The present optimization is performed at two design points for the transonic and supersonic cruises. The corresponding flow conditions and target lift coefficients are:
(i) Transonic cruising Mach number, M∞,t = 0.9
(ii) Supersonic cruising Mach number, M∞,s = 2.0
(iii) Target lift coefficient at the transonic cruising condition, C_L,t = 0.15
(iv) Target lift coefficient at the supersonic cruising condition, C_L,s = 0.10
(v) Reynolds number based on the root chord length at both conditions, Re = 1.0 × 10^7

The Reynolds number is taken from the wind tunnel condition. The flight altitude is assumed to be 10 km for the transonic cruise and 15 km for the supersonic cruise. To maintain the lift constraints, the angle of attack is computed for each configuration by using dC_L/dα obtained from a finite difference. Thus, three Navier-Stokes computations per evaluation are required. During the aerodynamic optimization, the wing area is frozen at a constant value. Design variables are categorized into planform, airfoil shapes and the wing twist. The planform shape is defined by six design variables, allowing one kink in the spanwise direction. The definition is shown in Fig. 13.4 and the constraints for the planform are summarized in Table 1. The chord length at the wing tip is determined accordingly because of the fixed wing area. Airfoil shapes are composed of a thickness distribution and a camber line. The thickness distribution is represented by a Bezier curve defined by 11 polygons as shown in Fig. 13.5. The wing thickness is constrained for structural strength as summarized in Table 1. The thickness distributions are defined at the wing root, kink and tip, and then linearly interpolated in the spanwise direction. Two camber surfaces composed of the airfoil camber lines are defined at the inboard and outboard of the wing separately. Each surface is represented by a Bezier surface defined by four polygons in the chordwise direction and three in the spanwise direction. Finally, the wing twist is represented by a B-spline curve with six polygons. In total, 72 design variables are used to define a whole wing shape. A three-dimensional wing with the computational structured grid is shown in Fig. 13.6. See Ref. 13 for more details of the geometry definition and CFD information.
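The lift-constraint handling described above amounts to a Newton-type correction of the angle of attack using the finite-difference derivative. A schematic sketch follows; cl_of_alpha stands in for a full Navier-Stokes evaluation, and the function names are hypothetical:

```python
def alpha_for_target_lift(cl_of_alpha, cl_target, alpha0, d_alpha=0.1):
    """One correction of the angle of attack towards the target lift.
    The derivative dCL/dalpha comes from a finite difference, which is
    why each design evaluation needs an extra flow solution."""
    cl0 = cl_of_alpha(alpha0)
    dcl_dalpha = (cl_of_alpha(alpha0 + d_alpha) - cl0) / d_alpha
    return alpha0 + (cl_target - cl0) / dcl_dalpha
```

In the chapter's setting, this step is carried out at both the transonic and supersonic design points, which accounts for the three flow computations per evaluation.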
Fig. 13.4. Wing planform definition and schematic view of moment axes.
Table 1. Summary of constraints.

(a) Constraints for planform shape

Chord length at root: 10 < C_root < 20
Chord length at kink: 3 < C_kink < 15
Inboard span length: 2 < b_in < 7
Outboard span length: 2 < b_out < 7
Inboard sweep angle (deg): 35 < α_root < 70
Outboard sweep angle (deg): 35 < α_kink < 70
Wing area: S = 60
Chord length at tip: 1 < C_tip < 10
Chord length: C_tip < C_kink < C_root
Span length: b_out < b_in
Sweep angle: α_kink < α_root

(b) Constraints for thickness distribution

Maximum thickness: 3 < Z_P5 < 4
Maximum thickness location: 15 < X_P5 < 70
Continuous first derivative at P5: Z_P4 = Z_P5 = Z_P6
Continuous second derivative at P5: X_P5 − X_P4 = X_P6 − X_P5
Continuous first derivative at the leading edge: X_P0 = X_P1
13.3.2. CFD Evaluation To evaluate the design, a high fidelity Euler/Navier-Stokes code was used. Taking advantage of the characteristics of GAs, the present optimization was parallelized on SGI ORIGIN2000 at the Institute of Fluid Science, Tohoku University. The system has 640 Processing Elements (PE's) with peak performance of 384 GFLOPS and 640 GB of memory. A simple master-slave strategy was employed: The master PE manages the optimization process, while the slave PE's compute the Navier-Stokes code. The parallelization rate became almost 100% because almost all the CPU time was dominated by CFD computations. The population size used in this study was set to 64 so that the process was parallelized with 32-128 PE's depending on the availability of job classes. The present optimization requires about six hours per generation for the supersonic wing case when parallelized on 128 PE's.
Fig. 13.5. Thickness definition.
Fig. 13.6. Computational grid around a wing in C-H topology.
13.3.3. Overview of Non-Dominated Solutions

The evolution was computed for 75 generations. After the computation, all the solutions evolved were sorted again to find the final non-dominated solutions. The non-dominated solutions were obtained in the four-dimensional objective function space. To understand the distribution of the non-dominated solutions, all non-dominated solutions are projected into
the two-dimensional objective function space between transonic and supersonic drag coefficients, as shown in Fig. 13.7. In Fig. 13.7, Surface I shows the tradeoff between the aerodynamic performances. The wings near Surface I have impractically large aspect ratios. The planform shapes of the extreme non-dominated solutions that minimize the respective objective functions appear physically reasonable, as shown in Fig. 13.8. A wing with the minimal transonic cruising drag has a smaller leading-edge sweep and a large aspect ratio. On the contrary, a wing with the lowest supersonic drag coefficient has a large leading-edge sweep so as to remain inside the Mach cone. The pitching moment is reduced by lowering the sweep angle and the wing chord length. All the present non-dominated solutions in Fig. 13.7 are labeled by the bending and pitching moments, respectively, as shown in Fig. 13.9. The wings near the tradeoff surface between the transonic and supersonic drag coefficients (tradeoff surface I in Fig. 13.7) have impractically large bending moments, as shown in Fig. 13.9(a). The bending moment is closely related to both transonic and supersonic drag coefficients. On the other hand, the pitching moment has an influence only on the supersonic drag coefficient, as shown in Fig. 13.9(b).
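The re-sorting of all evolved solutions mentioned above is a plain non-dominated filter. A minimal sketch for minimized objective vectors:

```python
def nondominated(solutions):
    """Return the solutions not dominated by any other (all objectives minimized)."""
    def dominates(a, b):
        # a dominates b: no worse in every objective, strictly better in at least one
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]
```

This quadratic scan is adequate at the scale discussed here (a few thousand candidates and four objectives).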
Fig. 13.7. Projection of non-dominated solutions into two-dimensional plane between transonic and supersonic drag coefficients.
Fig. 13.8. Planform shapes of the extreme non-dominated solutions.
13.4. Data Mining by Self-Organizing Map

13.4.1. Neural Network and SOM

SOM 1,2 is a two-dimensional array of neurons:

M = {m_1, ..., m_{p×q}}    (2)

One neuron is a vector called the codebook vector:

m_i = [m_i1, ..., m_in]    (3)
This has the same dimension as the input vectors (n-dimensional). The neurons are connected to adjacent neurons by a neighborhood relation, which dictates the topology, or structure, of the map. Usually, the neurons are connected to each other via a rectangular or hexagonal topology. One can also define a distance between the map units according to their topology relations. The training consists of drawing sample vectors from the input data set and "teaching" them to SOM. The teaching consists of choosing a winner unit by means of a similarity measure and updating the values of the codebook vectors in the neighborhood of the winner unit. This process is repeated a number of times. In one training step, one sample vector is drawn randomly from the input data set. This vector is fed to all units in the network, and a similarity measure is calculated between the input data sample and all the codebook vectors. The best-matching unit is chosen to be the codebook vector with the greatest similarity to the input sample. The similarity is usually defined
(a) Labeled according to bending moment
(b) Labeled according to pitching moment

Fig. 13.9. Projection of non-dominated front to supersonic and transonic drag tradeoffs labeled according to bending and pitching moments.
by means of a distance measure; for example, in the case of the Euclidean distance, the best-matching unit is the closest neuron to the sample in the input space. The best-matching unit, usually denoted m_c, is the codebook vector that matches a given input vector x best. It is defined formally as the neuron for which

||x − m_c|| = min_i ||x − m_i||    (4)
After finding the best-matching unit, the units in SOM are updated. During the update procedure, the best-matching unit is updated to be a little closer to the sample vector in the input space. The topological neighbours of the best-matching unit are also similarly updated. This update procedure stretches the best-matching unit and its topological neighbours towards the sample vector. The neighbourhood function should be a decreasing function of time. In the following, SOMs were generated with more advanced techniques by using Viscovery® SOMine 4.0 Plus 23.
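One training step as described (find the best-matching unit of Eq. (4), then pull it and its map neighbours towards the sample) can be sketched as follows. The Gaussian neighbourhood function and the fixed learning rate are simplifying assumptions of the sketch:

```python
import math

def som_train_step(codebooks, positions, x, lr, radius):
    """Update a list of codebook vectors in place for one input sample x.
    positions[i] is the fixed lattice coordinate of unit i on the map."""
    def dist2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    # Best-matching unit, Eq. (4): the codebook vector closest to x
    c = min(range(len(codebooks)), key=lambda i: dist2(codebooks[i], x))
    for i, m in enumerate(codebooks):
        # Gaussian neighbourhood: units near the BMU on the map move more
        h = math.exp(-dist2(positions[i], positions[c]) / (2.0 * radius ** 2))
        codebooks[i] = [m_j + lr * h * (x_j - m_j) for m_j, x_j in zip(m, x)]
    return c
```

In full training, both lr and radius would decrease over time, as the text notes for the neighbourhood function.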
13.4.2. Cluster Analysis

Once SOM projects the input space onto a low-dimensional regular grid, the map can be utilized to visualize and explore properties of the data. When the number of SOM units is large, similar units need to be grouped, i.e., clustered, to facilitate quantitative analysis of the map and the data. The two-stage procedure, first using SOM to produce the prototypes, which are then clustered in the second stage, was reported to perform well when compared to direct clustering of the data 24. A hierarchical agglomerative algorithm is used for clustering here. The algorithm starts with a clustering where each node by itself forms a cluster. In each step of the algorithm, two clusters are merged: those with minimal distance according to a special distance measure, the SOM-Ward distance 23. This measure takes into account whether two clusters are adjacent in the map, which means that the process of merging clusters is restricted to topologically neighbored clusters. The number of clusters will differ according to the hierarchical sequence of clustering. A relatively small number will be chosen for visualization (Sec. 13.4.3), while a large number will be used for the generation of codebook vectors for the respective design variables (Sec. 13.4.4).
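A single merge step of the topology-restricted agglomerative clustering might look like this. It is a toy sketch: clusters maps a cluster id to its member list, adjacency is a set of id pairs for map-neighbouring clusters, and centroid_dist is a caller-supplied stand-in for the SOM-Ward distance:

```python
def merge_once(clusters, adjacency, centroid_dist):
    """Merge the pair of adjacent clusters with the smallest distance;
    only pairs that are neighbours on the map are considered."""
    _, a, b = min((centroid_dist(u, v), u, v) for u, v in adjacency)
    clusters[a].extend(clusters.pop(b))  # absorb b into a
    # the merged cluster inherits b's map neighbours
    relabel = {(a if u == b else u, a if v == b else v) for u, v in adjacency}
    return clusters, {(u, v) for u, v in relabel if u != v}
```

Repeating this step yields the hierarchical sequence of clusterings from which the desired number of clusters is picked.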
S. Obayashi and D. Sasaki
13.4.3. Visualization of Design Tradeoffs: SOM of Tradeoffs

As discussed above, a total of 766 non-dominated solutions were obtained after 75 generations, forming a three-dimensional surface in the four-dimensional objective function space as shown in Figs. 7 and 9. By examining the extreme non-dominated solutions, the archive was found to represent the non-dominated front qualitatively. The present non-dominated solutions of supersonic wing designs have four design objectives. First, the resulting non-dominated front is projected onto the two-dimensional map. Figure 10 shows the resulting SOM with seven clusters. For better understanding, the typical planform shapes of the wings are also plotted in the figure. The lower right corner of the map corresponds to highly swept, high-aspect-ratio wings good for supersonic aerodynamics. The lower left corner corresponds to moderate sweep angles good for reducing the pitching moment. The upper right corner corresponds to small aspect ratios good for reducing the bending moment. The upper left corner thus reduces both pitching and bending moments.
Fig. 13.10. SOM of the objective function values and typical wing planform shapes.
Figure 11 shows the same SOM contoured by the four design objective values. All the objective function values are scaled between 0 and 1. The low supersonic drag region corresponds to the high pitching moment region, primarily because of high sweep angles. The low supersonic drag region also corresponds to the high bending moment region because of high aspect ratios. The combination of high sweep angle and high aspect ratio confirms that supersonic wing design is highly constrained.
Fig. 13.11. SOM contoured by each design objective.
13.4.4. Data Mining of Design Space: SOM of Design Variables

The previous SOM provides clusters based on the similarity in the objective function values. The next step is to find similarity in the design variables that corresponds to the previous clusters. To visualize this, the previous SOM is first revised by using 49 clusters as shown in Fig. 12. Then, all the design variables are averaged in each cluster, respectively. Now each design variable has a codebook vector of 49 cluster-averaged values. This codebook vector may be regarded as representing focal areas in the design variable space. Finally, a new SOM is generated from these codebook vectors as shown in Fig. 13.

This process can be done for encoded design variables (genotype) and decoded design variables (phenotype). The genotype and phenotype generated completely different SOMs. A possible reason is the varied scaling that appears in the phenotype. For example, one design variable lies between 0 and 1 and another between 35 and 70. The difference in order of magnitude of the design variables may lead to different clusters. To avoid such confusion, the genotype is used for the SOM here.

In Fig. 13, the labels indicate the 72 design variables. DVs 00 to 05 correspond to the planform design variables. These variables have a dominant influence on the wing performance. DVs 00 and 01 determine the span lengths of the inboard and outboard wing panels, respectively. DVs 02 and 03 correspond to leading-edge sweep angles. DVs 04 and 05 are root-side chord lengths. DVs 06 to 25 define the wing camber. DVs 26 to 32 determine the wing twist. Figure 13 contains seven clusters, and thus seven design variables are chosen, one from each cluster, as indicated. Figure 14 shows SOMs of Fig. 10 contoured by these design variables. The sweep angles, DVs 02 and 03, make a cluster in the lower left corner of the map in Fig. 13, and the corresponding plots in Fig. 14 confirm that the wing sweep has a large impact on the aerodynamic performance.
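The cluster-averaging step that turns the 49-cluster SOM into one codebook vector per design variable can be sketched as follows (a minimal sketch with hypothetical names):

```python
import numpy as np

def cluster_averaged_codebooks(design_vars, cluster_of):
    """Build one codebook vector per design variable (a minimal sketch).

    design_vars: (n_solutions, n_vars) array of (genotype) design variables.
    cluster_of:  length-n_solutions assignment of each solution to one of
                 the k SOM clusters (k = 49 in the text).
    Returns an (n_vars, k) array: row j is the codebook vector of design
    variable j, i.e. its average value in each cluster. A new SOM is then
    trained on these rows to group similarly behaving design variables.
    """
    cluster_of = np.asarray(cluster_of)
    k = cluster_of.max() + 1
    n_vars = design_vars.shape[1]
    codebooks = np.zeros((n_vars, k))
    for c in range(k):
        codebooks[:, c] = design_vars[cluster_of == c].mean(axis=0)
    return codebooks
```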
DVs 11 and 51 in Fig. 14 do not appear influential on any particular objective. Comparing Figs. 14 and 10, DV 01 has a distribution similar to that of the bending moment Mb, indicating that the wing outboard span has an impact on the wing bending moment. On the other hand, DV 00, the wing inboard span, has an impact on the pitching moment. DV 28 is related to transonic drag. DVs 04 and 05 are in the same cluster; both have an impact on the transonic drag because their reduction means an increase in aspect ratio. Several features of the wing planform design variables and the corresponding clusters are identified in the SOMs, and they are consistent with existing aerodynamic knowledge.
Fig. 13.12. SOM of objective function values with 49 clusters.
13.5. Conclusions

The multipoint design optimization of a wing for an SST has been performed by using ARMOGA. Four objective functions were used to minimize the supersonic and transonic drag and the bending and pitching moments. The complete wing shape was represented by a total of 72 design variables. The Navier-Stokes solver was used to evaluate the aerodynamic performance. Successful optimization results were obtained. The planforms of the extreme non-dominated solutions appear physically reasonable. Global tradeoffs between the objectives are presented.

In addition, design tradeoffs have been investigated for the design problem of supersonic wings by using visualization and cluster analysis of the non-dominated solutions based on SOMs. SOM is applied to visualize tradeoffs between design objectives. The three-dimensional non-dominated front in the objective function space has been mapped onto the two-dimensional SOM, where global tradeoffs are successfully visualized. The resulting SOMs
Fig. 13.13. SOM of cluster-averaged design variables.
are further contoured by each objective, which provides better insights into design tradeoffs.

Furthermore, based on the codebook vectors of cluster-averaged values for the respective design variables obtained from the SOMs, the design variable space is mapped onto another SOM. Design variables in the same cluster are considered to have similar influences on the design tradeoffs. Therefore, by selecting a member (design variable) from a cluster, the original SOM in the objective function space can be contoured by that particular design variable. This reveals the correlation of the cluster of design variables with the objective functions and their relative importance. Because each cluster of design variables can be identified as influential or not to a particular design objective, the optimization problem may be divided into subproblems in which the optimization is more likely to lead to better solutions. These processes may be considered as data mining of the engineering design. The present work demonstrates that MOEAs and SOMs are versatile design tools for engineering design.

Fig. 13.14. SOM contoured by design variables selected from clusters in Fig. 13.

Acknowledgments

The present computation was carried out in parallel using ORIGIN2000 at the Institute of Fluid Science, Tohoku University. The authors would like to thank the National Aerospace Laboratory's SST Design Team for providing
many useful data.

References

1. T. Kohonen, Self-Organizing Maps, Springer, Berlin, Heidelberg (1995).
2. J. Hollmen, Self-Organizing Map, http://www.cis.hut.fi/~jhollnien/dippa/node7.html, last access on October 3 (2002).
3. S. E. Cliff, J. J. Reuter, D. A. Saunders and R. M. Hicks, "Single-Point and Multipoint Aerodynamic Shape Optimization of High-Speed Civil Transport," J. of Aircraft, 38, 6 (2001), pp. 997-1005.
4. J. J. Alonso, I. M. Kroo and A. Jameson, "Advanced Algorithms for Design and Optimization of Quiet Supersonic Platforms," AIAA Paper 2002-0144 (2002).
5. K. Sakata, "Supersonic Experimental Airplane Program in NAL and its CFD-Design Research Demand," Proc. of 2nd SST-CFD Workshop, (2000), pp. 53-56.
6. K. Sakata, "Supersonic Experimental Airplane (NEXST) for Next Generation SST Technology," AIAA Paper 2002-0527 (2002).
7. Y. Shimbo, K. Yoshida, T. Iwamiya, R. Takaki and K. Matsushima, "Aerodynamic Design of Scaled Supersonic Experimental Airplane," Proc. of 1st SST-CFD Workshop, (1998), pp. 62-67.
8. T. Iwamiya, K. Yoshida, Y. Shimbo, Y. Makino and K. Matsuhima, "Aerodynamic Design of Supersonic Experimental Airplane," Proc. of 2nd SST-CFD Workshop, (2000), pp. 79-84.
9. Y. Makino and T. Iwamiya, "Aerodynamic Nacelle Shape Optimization for NAL's Experimental Airplane," Proc. of 2nd SST-CFD Workshop, (2000), pp. 115-120.
10. R. Grenon, "Numerical Optimization in Aerodynamic Design with Application to a Supersonic Transport Aircraft," Proc. of 1st SST-CFD Workshop, (1998), pp. 83-104.
11. H.-J. Kim, D. Sasaki, S. Obayashi and K. Nakahashi, "Aerodynamic Optimization of Supersonic Transport Wing Using Unstructured Adjoint Method," AIAA J., 39, 6 (2001), pp. 1011-1020.
12. S. Obayashi, D. Sasaki, Y. Takeguchi and N. Hirose, "Multiobjective Evolutionary Computation for Supersonic Wing-Shape Optimization," IEEE Transactions on Evolutionary Computation, 4, 2 (2000), pp. 182-187.
13. D. Sasaki, S. Obayashi and K. Nakahashi, "Navier-Stokes Optimization of Supersonic Wings with Four Objectives Using Evolutionary Algorithm," J. of Aircraft, 39, 4 (2002), pp. 621-629.
14. D. Sasaki, G. Yang and S. Obayashi, "Automated Aerodynamic Optimization System for SST Wing-Body Configuration," AIAA Paper 2002-5549 (2002).
15. M. Arakawa and I. Hagiwara, "Development of Adaptive Real Range (ARRange) Genetic Algorithms," JSME Int. J., Series C, 41, 4 (1998), pp. 969-977.
16. M. Arakawa and I. Hagiwara, "Nonlinear Integer, Discrete and Continuous Optimization Using Adaptive Range Genetic Algorithms," Proc. of 1997 ASME Design Engineering Technical Conferences (1997).
17. A. Oyama, S. Obayashi and T. Nakamura, "Real-Coded Adaptive Range Genetic Algorithm Applied to Transonic Wing Optimization," Applied Soft Computing, 1, 3 (2001), pp. 179-187.
18. K. Deb, Multi-Objective Optimization using Evolutionary Algorithms, John Wiley & Sons, Ltd., Chichester (2001).
19. C. A. Coello Coello, D. A. Van Veldhuizen and G. B. Lamont, Evolutionary Algorithms for Solving Multi-Objective Problems, Kluwer Academic Publishers, New York (2002).
20. C. M. Fonseca and P. J. Fleming, "Genetic Algorithms for Multiobjective Optimization: Formulation, Discussion and Generalization," Proc. of 5th ICGA, (1993), pp. 416-423.
21. S. Obayashi, S. Takahashi and Y. Takeguchi, "Niching and Elitist Models for MOGAs," Parallel Problem Solving from Nature - PPSN V, Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York (1998), pp. 260-269.
22. L. J. Eshelman and J. D. Schaffer, "Real-coded genetic algorithms and interval schemata," Foundations of Genetic Algorithms 2, Morgan Kaufmann Publishers, Inc., San Mateo (1993), pp. 187-202.
23. Eudaptics software gmbh, http://www.eudaptics.com/technology/somine4.html, last access on October 3 (2002).
24. J. Vesanto and E. Alhoniemi, "Clustering of the Self-Organizing Map," IEEE Transactions on Neural Networks, 11, 3 (2000), pp. 586-600.
CHAPTER 14

APPLICATIONS OF A MULTI-OBJECTIVE GENETIC ALGORITHM IN CHEMICAL AND ENVIRONMENTAL ENGINEERING
Madhumita B. Ray
Department of Chemical and Biomolecular Engineering
National University of Singapore
4 Engineering Drive 4
Singapore 117576
E-mail: [email protected]
Multiobjective optimization, involving the simultaneous optimization of more than one objective function, is quite commonly encountered in chemical and environmental engineering processes. With the implementation of stringent regulations on the discharge of particulate pollutants to air and water, the development of efficient fluid-solid separation devices integrating the physico-chemical processes with economic parameters is of significant commercial importance. In this chapter, multiobjective optimization using conflicting objectives, such as maximization of the overall collection efficiency and minimization of both the pressure drop and the cost, in two commonly used fluid-particulate separation devices, namely the cyclone separator and the venturi scrubber, is illustrated using the Non-dominated Sorting Genetic Algorithm (NSGA).

14.1. Introduction

Genetic algorithms (GAs), a nontraditional search and optimization method introduced by Holland1 in 1975, mimic the principles of genetics and natural selection. This is done by the creation of a population of solutions referred to as chromosomes or strings. Each chromosome is represented as a set of several binary numbers generated randomly, and encodes the values of the different parameters (decision variables) being optimized. These are equivalent to the chromosomes in DNA in biological systems. The chromosomes then go through a process of simulated "evolution". Bit-manipulation operators then implement "reproduction",
"crossover", "mutation" and other biological operators of natural evolution, to improve their "fitness". GAs have several advantages over conventional optimization techniques: i) objective functions can be multimodal or discontinuous, ii) they require information only on the objective function; gradient evaluation is not required, iii) a starting solution is not required, iv) search is carried out using a population of several points simultaneously, rather than a single point, v) they are better suited to handle problems involving several design or operating variables (decision variables). Simple genetic algorithms (SGAs) are suitable for optimization problems involving single-objective functions. In such problems, a SGA usually reaches the global optimum. However, for problems involving multipleobjective functions, unique optimal solutions rarely exist. Rather, a set of several equally desirable, trade-off points may exist. These solutions are called non-dominated and conform the so-called Pareto optimal set. None of these non-dominated solutions are superior to any of the other points, and indeed, any one of them could be selected for design or operation. The choice of a desired solution among the Pareto set of points requires additional knowledge about the problem, information which may be intuitive and hence, non-measurable. Statistical techniques using the estimations of several decision-makers are often used to decide the preferred solution. However, the Pareto optimal set assists in narrowing down the choices to be considered for a decision-maker, and thus is of great importance. The Nondominated Sorting Genetic Algorithm (NSGA)2, an adaptation of SGA can be used for multi-objective optimization. Many of the current chemical engineering problems in core areas such as reaction engineering, transport phenomena, separation science, and biological systems require multi-objective optimization. 
A recent review presents various applications of multi-objective optimization in diverse chemical engineering problems.3 Previously, the NSGA has been successfully used for various chemical engineering problems including the optimization of a nylon-6 semi-batch reactor,4,5 the optimization of a side-fired steam reformer,6 a wiped-film PET reactor,7 and the dialysis of beer in hollow-fiber membranes.8 In addition to the major goal of achieving economic efficiency, most of the recent chemical engineering examples include reliability, safety, hazard analysis, control performance, and environmental pollution. With the implementation of stringent regulations on the discharge of particulate pollutants to air and water, the development of efficient fluid-solid separation devices integrating the physico-chemical processes with economic parameters is of
significant practical importance. In this chapter, multiobjective optimization involving conflicting objectives, such as maximization of the overall collection efficiency and minimization of the pressure drop and, in some cases, the cost in commonly used fluid-particulate separation devices, is dealt with using the NSGA.
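The notion of a non-dominated (Pareto optimal) set used throughout this chapter can be made concrete with a short sketch. This is illustrative only; both objectives are written as minimizations, so a maximized quantity such as the collection efficiency would enter with its sign flipped, and the function names are hypothetical.

```python
def dominates(f1, f2):
    """True if objective vector f1 dominates f2: no worse in every
    objective and strictly better in at least one (all minimised)."""
    return (all(a <= b for a, b in zip(f1, f2)) and
            any(a < b for a, b in zip(f1, f2)))

def pareto_set(solutions):
    """Return the non-dominated subset of a list of objective vectors."""
    return [f for i, f in enumerate(solutions)
            if not any(dominates(g, f)
                       for j, g in enumerate(solutions) if j != i)]
```

For example, with objective vectors (-efficiency, pressure drop), a design with lower efficiency and higher pressure drop than another is dominated and would be excluded from the Pareto set.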
14.2. Physical Problem

Control of particulate matter is a major aspect of industrial air pollution engineering. Current federal regulations in developed countries call for "invisible stack emissions" from industries. These regulations necessitate the development of new, efficient systems for particulate removal as well as the improvement of existing designs. Particles are separated from the conveying fluids by a combination of several mechanisms, namely gravitational settling, centrifugal impaction, inertial impaction, direct interception, diffusion and electrostatic attraction. Industrial fluid-particle separation devices such as cyclones and hydrocyclones are centrifugal or inertial separators that use centrifugal force to remove fine particles from air and water, respectively. Since their inception in the nineteenth century, cyclone separators have been immensely popular in industry due to their simple and compact design and low manufacturing and maintenance costs. However, industrial-scale cyclones are not efficient in removing particles smaller than 10 microns, and the operating cost increases considerably with decreasing particle size.

Venturi scrubbers, another type of predominantly inertial separator, are reasonably efficient for the removal of submicron particles, and are also able to handle wet and corrosive gases. The major particle collection mechanism in venturi scrubbers is inertial impaction of the particulates with liquid (mainly water) droplets. The large power requirement for their operation due to high pressure drop is the main drawback of venturi scrubbers, and the pressure drop increases with increasing collection efficiency. Naturally, reduction of the operating pressure drop without compromising the particle collection efficiency is the most desired objective in the design and operation of such separators. In fact, these two are the most important practical objective functions relevant to fluid-particle separation.
In this chapter, we will see the application of the NSGA to the optimal design of cyclone separators and venturi scrubbers through several examples.
14.3. Genetic Algorithm

A genetic algorithm (GA) imitates biological evolution for optimization problems. It consists of a set of individual elements (the population) and a set of biologically inspired operators that are used to change these individuals. In the simple genetic algorithm (SGA), the binary information on the decision variables is first mapped into real values using prescribed bounds, and the fitness (objective) functions of the chromosomes are evaluated using an appropriate model. Using the Darwinian principle of survival of the fittest, a new population (generation) is created by performing reproduction of the chromosomes in the current population. This is done by copying the chromosomes of the earlier generation into a gene pool, with the number of copies made being proportional to their fitness functions. The chromosomes in the gene pool then undergo pair-wise random crossover and mutation operations in order to provide the members of the next generation. Over the course of several generations, the fitness of the chromosomes improves, and fitter sets of strings emerge.1,9,10

The Non-dominated Sorting Genetic Algorithm (NSGA) was developed by Deb and Srinivas2 (1994) to solve optimization problems involving multiple objectives. Principally, NSGA differs from the SGA in the selection of the population. In NSGA-I, an initial population of chromosomes is generated randomly. As mentioned earlier, a chromosome (or gene) is a string of numbers (often binaries) encoding information about the decision variables. The substrings in any chromosome associated with the different decision variables are then mapped into real values lying between the corresponding specified bounds. A model of the process is then used to evaluate the values of the fitness (objective) functions.
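The mapping of binary substrings into real decision-variable values can be sketched as follows (a minimal linear decoding; the function names are illustrative):

```python
def decode(bits, lo, hi):
    """Map one binary substring of a chromosome to a real value in
    [lo, hi]: the integer value of the bits is scaled linearly onto
    the prescribed bounds."""
    value = int(bits, 2)
    return lo + value * (hi - lo) / (2 ** len(bits) - 1)

def decode_chromosome(chromosome, bounds, bits_per_var):
    """Split a chromosome into equal-length substrings, one per decision
    variable, and decode each against its (lo, hi) bounds."""
    return [decode(chromosome[i * bits_per_var:(i + 1) * bits_per_var],
                   lo, hi)
            for i, (lo, hi) in enumerate(bounds)]
```

For instance, a 4-bit substring '0000' maps to the lower bound and '1111' to the upper bound of its decision variable.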
Thereafter, a set of good non-dominated chromosomes is identified by testing each chromosome in the population against all the others (pair-wise comparison), which involves a large number of computational steps. Two solutions are non-dominating when, on moving from one solution to the other, an improvement in one of the objective functions occurs at the cost of deterioration in one (or more) of the other objective function(s). A chromosome is checked for dominance once during the computation for that generation. After testing all the chromosomes in this manner, a subset of the best non-dominated chromosomes is identified. This is assigned a front number of unity (Front No. = 1). The remaining solutions are again compared as before, and the next set of non-dominated solutions is identified and assigned a Front No. of 2. This procedure is repeated for all the remaining chromosomes. Fronts with lower values of the front number are superior (non-dominated) sets compared to those with a higher front number.

A high fitness value (usually the number of chromosomes, Np, although any other arbitrarily selected large value could be used instead) is assigned arbitrarily to all the solutions in Front No. 1. The fitness values of the individual chromosomes in this front are then modified based on their "degree of crowding", a sharing procedure in which the fitness value is divided by the niche count of the chromosome. The niche count is a parameter proportional to the number of chromosomes/neighbors in its neighborhood (in the decision variable space) within the same front, with distant neighbors contributing less than those nearby. The niche count is obtained, e.g., for the ith chromosome, by computing its distance, dij, from each other chromosome, j, in the solution space, and using a sharing function, Sh, as given below:

Sh(dij) = 1 - (dij / σshare)^α,   dij < σshare
Sh(dij) = 0,                      otherwise
(1)

In equation 1, σshare, a computational parameter, is the maximum distance allowed between two chromosomes to qualify as neighbors, and α is the dimensionless exponent of the sharing function. Thus, if dij is larger than σshare, its contribution to Sh is zero, while for dij = 0 its contribution to Sh is 1, and for intermediate distances Sh(dij) lies between 0 and 1. By summing up Sh(dij) over all values of j in any front comprising non-dominated chromosomes, the degree of crowding of the ith chromosome can be found. This summation is referred to as the niche count of chromosome i. The shared fitness value of chromosome i is the ratio of the common dummy fitness assigned earlier to its niche count. Use of the shared fitness value for reproduction helps to spread out the chromosomes in the front.

This procedure is repeated for all the members of the first front. Once this is done, these chromosomes are set aside for the time being, and all the remaining chromosomes are tested for non-dominance. The non-dominated chromosomes in this round are classified into the next front (Front No. = 2). The common fitness value assigned to all members of this front is a little lower than the lowest shared fitness value of the previous front (Front No. = 1). Thereafter, sharing is performed. This procedure is continued until all the chromosomes in the population have been assigned shared fitness values. This step is followed by reproduction. The chromosomes are copied stochastically (the best chromosomes having a higher probability) into a mating
pool. Non-dominated members of the first front that have fewer neighbors get the highest representation in the mating pool. Dominated members of the later fronts, instead of getting "killed", are assigned low fitness values in order to maintain the diversity of the gene pool. There are numerous selection techniques for the copying of the chromosomes, e.g., roulette wheel, tournament selection (popular), normalized geometric ranking, expected value and linear normalization.

In the next stage, crossover and mutation are performed on these copies to produce daughter chromosomes (and complete a generation). Crossover is a genetic operator used to recombine the genetic material of the population by selecting two chromosomes (randomly) and swapping part of their genetic information to produce new chromosomes. For example, a pair of binary-coded chromosomes, 101001 and 010110, after crossover at the third (randomly selected) location, will give two new chromosomes, 101110 and 010001. The mutation operator moves a chromosome locally in the solution space to create a fitter chromosome. Each binary digit in every chromosome is changed with a specified mutation probability, using a random number code. The mutation probability is kept small so as to avoid oscillatory behavior. The above procedure is repeated for several generations until a satisfactory set of Pareto optimal solutions, with a reasonable spread of points, is obtained in the gene pool. A flowchart of NSGA-I is shown in Figure 14.1.

14.4. Problem Formulation

Example 1

The major criteria used to determine the performance of gas-cleaning devices are the collection efficiency, ηo, and the pressure drop, Δp. In the first example, we evaluate both the pressure drop, Δp, and the total annual cost, Co, for an industrial operation treating 165 m3/s of air using N parallel cyclones (multiclones).
The train of N cyclones considered at the design stage was for a paper mill.11 The average size of the particles in the stream is 10 μm with a log-normal size distribution, and the average particle density, ρs, is 1600 kg/m3. The viscosity of the gas is 24.8 × 10^-6 Pa·s. There are several collection efficiency and pressure drop models available for cyclones in the literature. The details of the models adopted in this example are available in Ravi et al. (2001).11 A schematic diagram showing all the
Fig. 14.1. Flow-chart of the NSGA (adapted from Mitra et al., 1998).4
dimensional parameters of a standard reverse-flow cyclone is presented in Figure 14.2. The methodology of optimization is illustrated by selecting two objective functions, I1 and I2, for simplicity. However, this is often sufficient for optimization involving fluid-particle separation devices, where maximization of the overall collection efficiency, ηo, is desired along with minimization of the pressure drop, Δp. Here we will discuss several problems involving all the decision variables and constraints commonly used in cyclone design. Problem 1 can thus be described mathematically as:

Problem 1:
Fig. 14.2. Schematic diagram of the test cyclone.
Max  I1(u) = I1(N, D, De/D, B/D, H/D, S/D, h/D, a/D, b/D) = ηo   (a)
Min  I2(u) = I2(N, D, De/D, B/D, H/D, S/D, h/D, a/D, b/D) = Δp   (b)
subject to (s.t.):
15.0 ≤ vi ≤ 30.0 m/s   (c)
ui^L ≤ ui ≤ ui^U;  i = 1, 2, ..., 9   (d)
(2)
An alternative two-objective optimization problem (Problem 2) will also be discussed. Here, ηo is maximized while the cost, Co, is minimized:
Problem 2:
Max  I1(u) = I1(N, D, De/D, B/D, H/D, S/D, h/D, a/D, b/D) = ηo   (a)
Min  I3(u) = I3(N, D, De/D, B/D, H/D, S/D, h/D, a/D, b/D) = Co   (b)
s.t.:
15.0 ≤ vi ≤ 30.0 m/s   (c)
ui^L ≤ ui ≤ ui^U;  i = 1, 2, ..., 9   (d)
(3)

For the NSGA, the objectives are recast as fitness functions, F1 and F2:
Max  F1 = I1   (a)
Min  F2 = I2 (or I3)   (b)
s.t. all earlier constraints (equations 2 or 3, c-e)   (c)
(4)
The procedure used to take care of the constraint on the range of values of vi (equations 2c and 3c) is to penalize chromosomes violating these constraints by adding an arbitrarily large number, Pe, to the two fitness functions, F1 and F2, so that such chromosomes become unfit and die out.

Bounds or limits on the variables

Nine decision variables, ui; i = 1, 2, ..., 9, have been used in these problems. These variables are the number, the diameter and the seven geometric ratios (shape) of the cyclones. The bounds used (first-level or a-priori bounds) on the nine decision variables, u, for both Problems 1 and 2 are given in Table 14.6.
Table 14.6. Bounds on decision variables, u [Problem 1 (reference case) and Problem 2].

FIRST-LEVEL (a-priori) BOUNDS:

 i   ui       ui^L    ui^U    Stairmand12 high-efficiency
 1   N        1       2048    -
 2   D, m     0.3     0.7     -
 3   De/D     0.4     0.6     0.5
 4   B/D      0.325   0.425   0.375
 5   H/D      3.5     4.5     4.0
 6   S/D      0.4     0.6     0.5
 7   h/D      1.1     1.3     1.2
 8   a/D      0.4     0.6     0.5
 9   b/D      0.15    0.25    0.2

SECOND-LEVEL (over-riding) BOUNDS:
(i) 0.4 ≤ a/D ≤ S/D in any chromosome
(ii) If 0.5 ≤ De/D ≤ 0.6 in any chromosome, then 0.15 ≤ b/D ≤ (1 - De/D)/2
The bounds have been chosen to encompass the values corresponding to the standard high-efficiency cyclone of the Stairmand12 design, which are also provided for comparison in the table. The number, N, of cyclones is to be taken as an integer. A reasonably large range (common for multiclones) is provided for the first decision variable, N. The constraints on the inlet velocity, vi, are those normally used in industrial practice. The lower bound on vi helps ensure reasonably high values of ηo, while the upper bound helps reduce problems of erosion, excessively high values of Δp, and re-entrainment of solids. Similarly, a small range of 0.3-0.7 m has been taken for the cyclone diameter, D. The lower limit helps prevent re-entrainment of the collected solids from the cyclone wall. The upper bound on D, as for the case of N, is somewhat arbitrarily selected, and has to be relaxed, at least to some extent, if the optimal solution lies at the upper bound. However, in multiclone operation, large cyclone diameters are seldom encountered.

In order to avoid violating the physics of gas-solid separation, several additional bounds and constraints need to be added to over-ride the random choice of decision variables. These bounds are chromosome-specific and are referred to as second-level or over-riding constraints. Two over-riding constraints (also shown in the table) are operative in the given example for the selections made for the a-priori bounds. For example, both
S/D and a/D have been selected to lie between 0.4 and 0.6. However, a well-known practice is to have a ≤ S, since this minimizes short-circuiting of the feed stream to the outlet before it passes through the separation space. The presence of these kinds of over-riding bounds necessitates adaptation of the procedures currently used in the NSGA for mapping the binary chromosomes into real numbers. One must, for example, first map ui; i = 1, 2, ..., 7, using the normal techniques. The value of S/D chosen for any chromosome must then be used to decide the bounds to be used for that chromosome while mapping the binary values of a/D into the real-number domain. Over-riding bounds assist in the numerical convergence of the problem.

Optimum Cyclone Design

The seven geometric parameters shown in Figure 14.2 are important for the performance of the cyclone. The problems (equations 2 and 3) were solved on a CRAY J916 computer. The average CPU time required for the solution of the system is about 0.24 s. The several decision variables and constraints may interact in a complex manner, making it difficult to obtain meaningful solutions for the various decision variables (the presence of many decision variables makes interpretation of the results quite difficult). Thus, it is necessary to solve several simpler cases before attempting the solution of the general problems.

The simpler cases for Problem 1 consider one decision variable at a time (the remaining variables are fixed). For example, in Case 1, the only decision variable is the total number, N, of cyclones. All other variables are taken as constants, at the values suggested by Stairmand,12 while the diameter is fixed at 0.5 m. Cases 2 and 3 involve two decision variables, of which one is N and the other is either D or De/D. Case 4 is the more general problem with all nine decision variables used, and is referred to as the reference (ref) case. The results of Cases 1-3 are shown in Figure 14.3.
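The chromosome-specific over-riding bounds and the velocity penalty Pe described above can be sketched as follows. This is a minimal sketch: the helper names and the dict-based representation are illustrative, clamping is used as a simple stand-in for re-mapping the binary value into the narrower range, and the inlet velocity vi is assumed to be supplied by the cyclone model.

```python
def apply_overrides(u):
    """Enforce the second-level (over-riding) bounds on already-decoded
    first-level values of a chromosome `u` (a dict of ratios).
    (i)  0.4 <= a/D <= S/D, using this chromosome's own S/D value;
    (ii) if 0.5 <= De/D <= 0.6, then 0.15 <= b/D <= (1 - De/D)/2."""
    u['a/D'] = min(max(u['a/D'], 0.4), u['S/D'])
    if 0.5 <= u['De/D'] <= 0.6:
        u['b/D'] = min(max(u['b/D'], 0.15), (1.0 - u['De/D']) / 2.0)
    return u

def penalize(f1, f2, v_i, pe=1.0e6):
    """Add the arbitrarily large penalty Pe to both fitness functions if
    the inlet velocity violates 15 <= v_i <= 30 m/s, so the chromosome
    becomes unfit and dies out."""
    if not 15.0 <= v_i <= 30.0:
        return f1 + pe, f2 + pe
    return f1, f2
```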
In Case 1, only N is allowed to vary. A Pareto optimal set is obtained, as shown in Figure 14.3a. The optimal value of N corresponding to different points on the Pareto set is observed to decrease with increasing values of η₀ (Figure 14.3b). The highest value of N (at low η₀) is determined by the lower bound on the inlet velocity, vᵢ. Typically, the collection efficiency in cyclones increases with increasing inlet velocity. Thus, higher values of η₀ are associated with lower values of N and higher values of vᵢ (Figure 14.3c), as expected. However, the
328
Madhumita B. Ray
maximum value of the overall collection efficiency is quite low (about 60%). Case 2 involves both N (= u₁) and D (= u₂) as the decision variables. The Pareto optimal set for this case (Figure 14.3a) is shifted to the right as compared to that for Case 1, implying that higher collection efficiencies than in Case 1 are obtained. The highest value of η₀ is observed to be limited by the upper bound of 30 m/s on the inlet velocity. It can be seen that the diameter, D, slowly moves to its lower bound of 0.3 m (see Figure 14.3d). In Case 3, N (= u₁) and De/D (= u₃) are taken as the decision variables (while keeping D at 0.5 m). The increase in the value of D from 0.3 m in Case 2 to 0.5 m in Case 3 reduces the optimal values of η₀ in the Pareto optimal set. The difference in the qualitative behavior of the decision variables can be observed by comparing Cases 1 and 2 (Figures 14.3b and c). It is observed in this case that N and vᵢ are almost constant up to some value of η₀, while De/D decreases (Figure 14.3e) to give higher η₀. Once De/D reaches its lower limit of 0.4, η₀ increases further only through a sudden increase in De/D and vᵢ, and a decrease in N. Thereafter, N and vᵢ again stay constant, and De/D decreases continuously with increasing η₀. The points of change in the vᵢ and N curves coincide with the change in the De/D curve. The efficiency increases considerably with decreasing diameter (De) of the outlet (vortex finder), although the trend indicates the presence of an optimum value. The optimum value of De/D lies between 0.33 and 0.5. Between N and De/D, the latter is more important in deciding the Pareto optimal set for this case. The inference that De/D predominates over N in deciding the Pareto optimal set in the simplified Case 3 would not have been so evident had we started out solving the more complex problems from the very beginning.
The methodology of first obtaining solutions for simplified cases with only one or two decision variables is thus highly recommended for all real-life, complex multiobjective optimization problems. For the cases involving N and any one of the other geometrical parameters (viz. B/D, S/D, H/D, h/D, a/D and b/D) as the decision variables, N is found to be the principal decision variable controlling the shape and characteristics of the Pareto optimal set. The results of the more general reference Problem 1 are shown in Figures 14.3 and 14.4 (filled circles). It is clear from Figure 14.3 that the simultaneous use of several decision variables leads to Pareto optimal sets with much higher values of the overall collection efficiency. Figure 14.4 gives the solution of Problem 2 (equation (3)). In this problem, the annual cost, C₀, is minimized while η₀ is maximized. The bounds
Fig. 14.3. Results for the simplified Cases 1-4 for Problem 1: (a) Pareto optimal sets showing Δp vs η₀ for Cases 1-4; (b) N vs η₀ for Cases 1-4; (c) vᵢ vs η₀ for Cases 1-4; (d) D vs η₀ for Case 2; (e) De/D vs η₀ for Case 3.
of the decision variables are the same as in Table 14.64. The cost Pareto optimal set in Figure 14.4a is found to extend over a lower range of values of η₀. The importance of both N and De/D as decision variables controlling the Pareto optimal set for Problem 2 is again observed. Figure 14.4e shows the calculated values of Δp corresponding to the different points on the C₀ vs. η₀ Pareto optimal set. It is interesting to observe (Figure 14.4e) that the reference Pareto optimal set (Case 4, Problem 1) is almost indistinguishable from the computed Δp vs. η₀ curve corresponding to the cost Pareto optimal set over the range where their values of η₀ are similar.
Similarly, the other decision variables superimpose in this range. The parallelism between these two curves suggests that three-dimensional Pareto optimal sets (maximize η₀, minimize Δp, and minimize C₀) will not lead to substantially different results. Some numerical scatter is observed in the optimal values of the decision variables. Such scatter is common in GAs; it can possibly be reduced somewhat, but not eliminated, by changing the computational parameters, and will be discussed later.

Example 2

The performance of a venturi scrubber depends largely on the manner of liquid injection, the drop size, the liquid flux distribution and the initial liquid momenta. Most of the particle collection occurs in the throat (see Figure 14.5) because of the high degree of turbulence in this region, caused by large relative velocities between the drops and the particles. In this example, we will see the application of the NSGA to determine the optimum nozzle distribution in a pilot-scale venturi scrubber so as to improve the droplet flux distribution. We will consider two optimization problems involving the venturi scrubber in this section. In the first problem, a two-dimensional approach was used for the determination of the collection efficiency. Three decision variables, the liquid-gas ratio (L/G), the gas velocity in the throat (Vg,th), and the aspect ratio of the throat, Z, were used. Optimal design curves were obtained for the pilot-scale scrubber. In the second optimization problem, a three-dimensional model was used to determine the collection efficiency. The three-dimensional approach produces results on the optimum nozzle arrangement in the venturi scrubber, a very important design variable dictating the flux distribution in the scrubber. Three decision variables, the liquid-gas ratio (L/G), the gas velocity in the throat (Vg,th), and the nozzle configuration, Nc, were used in the second problem.
The models for the collection efficiency and the pressure drop of the venturi scrubber can be found in Ravi et al. (2002)¹³.

Problem Formulation

Problem 3 is thus described mathematically by
Fig. 14.4. Results of Problem 2 and Case 4, Problem 1: (a) Pareto optimal sets showing cost vs η₀ for both cases; (b) N vs η₀ for both cases; (c) vᵢ vs η₀ for both cases; (d) De/D vs η₀ for both cases; (e) Pareto optimal sets showing calculated Δp vs η₀ for both cases.

Max I₁(u) = I₁(L/G, Vg,th, Z) = η₀    (a)
Min I₂(u) = I₂(L/G, Vg,th, Z) = Δp    (b)
subject to (s.t.): uᵢᴸ ≤ uᵢ ≤ uᵢᵁ, i = 1, 2, 3    (c)
model equations    (d)
(5)
Dust with a log-normal distribution of sizes (mass median diameter = 5.0 μm, standard deviation σₚ = 1.5) was used for the collection efficiency
calculation in the venturi scrubber. The dimensions of the venturi scrubber are presented in Figure 14.5. As before, the problem was converted into a pure minimization problem by defining fitness functions, F₁ and F₂ (similar to equation (4)), both of which are to be minimized. The bounds used for the three decision variables in this problem are presented in Table 14.65. Additional constraints can also be incorporated using a penalty function, Pe. For example, constraints can be imposed such that the collection efficiency is never below 75% and the pressure drop never exceeds 5000 Pa. Pe, an arbitrarily large number, is added to both fitness functions, F₁ and F₂, for every chromosome violating this requirement, which ensures that such chromosomes become unfit and die out almost instantaneously (referred to as instant killing).

Optimum Scrubber Design

The results of Problem 3 are shown in Figure 14.6a (filled circles and squares). Optimal solutions for Problem 3 using three decision variables are represented by the filled circles, while the unfilled circles represent the optimal solutions using the NSGA with only a single objective function (maximization of η₀) and the same three decision variables. The plot of η₀ vs. Δp shown in Figure 14.6a has the characteristics of a Pareto optimal set, wherein an improvement (increase) in η₀ is accompanied by a worsening (increase) of Δp. Plots of the three decision variables corresponding to the different points on the Pareto optimal set are shown in Figures 14.6b-d. It is observed that the gas velocity at the throat, Vg,th, varies along the points on the Pareto optimal set. The values of L/G that provide optimal operating conditions are found to be almost constant, varying in a narrow range of about 0.8×10⁻³ - 1.1×10⁻³ (Figure 14.6b). Similarly, the aspect ratio, Z (Figure 14.6d), needs to be maintained at around 2.5.
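The instant-killing penalty described above amounts to adding a large constant to both fitness values of any infeasible chromosome. A hedged sketch (the function name, the numerical value of Pe and the argument names are illustrative, not taken from the text):

```python
PE = 1e9  # the arbitrarily large penalty, Pe

def penalised_fitness(f1, f2, eta0, dp, eta_min=0.75, dp_max=5000.0):
    """Add Pe to both minimised fitness values, F1 and F2, whenever a
    chromosome violates eta0 >= 75% or dp <= 5000 Pa, so that it
    becomes unfit and dies out almost instantaneously."""
    if eta0 < eta_min or dp > dp_max:
        return f1 + PE, f2 + PE
    return f1, f2
```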
A new dimensionless number, the venturi number VN [= L R₀ Z/(G d₀)], which characterizes the non-uniformity in the flux distribution and the overall collection efficiency, is observed to lie in the range 1.0×10⁻³ - 1.3×10⁻³. Earlier, Ananthanarayanan and Viswanathan (1998)¹⁴, who used the 'simulation procedure' and considered only the maximization of η₀, found that this number lies in the range 1.0×10⁻³ - 1.5×10⁻³ at the optimal point, for an optimization problem involving only one objective function (maximization of the collection efficiency). The venturi numbers for the present problem are shown in Figure 14.6e. The computed values of the pressure drop for the single-objective function problem are higher than the values on the Pareto optimal set corresponding to two objective functions, and we could possibly select better points (similar η₀ but lower Δp) using the Pareto optimal solutions with two objective functions.

Table 14.65. Bounds on the Decision Variables (Problem 3)

Decision Variable          Lower Bound   Upper Bound
L/G (m³ liquid/m³ gas)     0.5×10⁻³      2.0×10⁻³
Vg,th (m/s)                20            110
Z                          0.5           2.5

A variation of Problem 3, in which a 3-D collection efficiency model is used to consider an important design variable, the nozzle arrangement in the throat of the scrubber (Problem 4), is discussed here. Previously, for Problem 3, a slice containing a single nozzle was taken along the axial direction and the process was simulated in that plane, giving rise to a two-dimensional problem. In the three-dimensional problem, the entire scrubber was divided into grids and the simulation was carried out throughout the scrubber for all the nozzles (Ravi et al., 2003)¹⁵. As before, two decision variables are selected, namely, the liquid-to-gas flow ratio, L/G, and the gas velocity at the throat, Vg,th. In addition, a new variable, the nozzle configuration, Nc, is also selected. The bounds used for the three decision variables are presented in Table 14.66. From the experience of Problem 3, a range of 0.3×10⁻³ - 1.4×10⁻³ m³ of liquid/m³ of air has been chosen as the bounds for L/G. Bounds for the gas velocity, Vg,th (40 - 120 m/s), were decided based on industrial practice.

Table 14.66. Bounds on the Decision Variables (Problem 4)

Decision Variable          Lower Bound   Upper Bound
L/G (m³ liquid/m³ gas)     0.3×10⁻³      1.4×10⁻³
Vg,th (m/s)                40            120
Nc                         1             5
The third decision variable, the nozzle configuration, Nc, is the arrangement of the nozzles for the injection of the droplets. Five different nozzle configurations were considered. The optimum nozzle configuration is found to be the one in which the nozzles are arranged in a staggered triangular pitch. If we now compare the results of Problems 3 and 4, we see that the basic characteristics of the Pareto optimal sets remain the same for the two problems, but the values are slightly different because of the difference in the rigor
Fig. 14.5. Schematic diagram of the venturi scrubber used in the optimization (dimensions are in cm).
involved in the models used. Typically, a higher collection efficiency occurs at a lower pressure drop in the 3-D case than in the 2-D case, which may be due to better liquid distribution in the scrubber arising from the nozzle arrangement in a staggered triangular pitch. The optimal L/G ratio in the 3-D case varied in the range 0.4×10⁻³ - 1.0×10⁻³ depending on the nozzle arrangement, while it was about 1.1×10⁻³ for the 2-D case (Problem 3). This reduction in the optimum L/G value gives a considerable saving in scrubbing liquid, and subsequently reduces the pressure drop by 43%.

Computational Parameters

The number of generations required for convergence in the NSGA is problem-specific. In all the problems discussed above, an essentially random distribution of feasible solutions occurs at the first generation (Ng = 1). However, by the end of the tenth generation (Ng = 10), most of the undesired solutions have died out and a Pareto optimal set begins to emerge, although considerable scatter is present at this stage (and dies out quite slowly). By about the 100th generation, the Pareto optimal set has normally been reached. Further generations do not affect the nondominated solutions
Fig. 14.6. Optimal solutions (filled circles and squares) for the reference case (Problem 3) using three decision variables. Unfilled circles represent the optimal solutions using the NSGA with only a single objective function (maximization of η₀), with three decision variables. Δp for the latter are computed values.
much. The most important numerical parameters in the NSGA are the crossover (pc) and mutation (pm) probabilities, and the spreading parameter, σ. Unfortunately, the choice of these parameters is also problem-specific, and hence prior knowledge of them is rather limited. For Problems 1-4, pc did not have much effect on the Pareto optimal set. However, the same cannot be said of the mutation probability, pm. Higher values of pm result in large gaps in the Pareto optimal set, as well as some scatter in it. On the other hand, solutions obtained with lower values of pm also show scatter, particularly at high values of η₀. The best value of this computational parameter has to be established by trial. The spreading parameter, σ, determines the range covered by the Pareto optimal set, and the best value of this parameter, too, has to be obtained by trial. The numerical parameters used in Problems 1-4 are listed in Table 14.67.
Table 14.67. Numerical Parameters Used in Optimization of Gas-Solid Separation Devices

Computational Parameter                        Problem 1   Problem 3   Problem 4
Maximum number of generations, maxgen          500         80          100
Population size, Np                            100         ~50         100
Probability of crossover, pc                   0.65        0.65        0.55
Probability of mutation, pm                    0.001       0.001       0.001
Random seed                                    0.87619     0.87619     0.87619
Spreading parameter, σ                         0.015       0.015       0.005
Exponent controlling the sharing effect, α     2           2           2
Computational time (CRAY J916)                 0.24 s      1.24 s      1.35 s
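The spreading parameter σ and the sharing exponent α of Table 14.67 enter the NSGA through a sharing function that discounts the fitness of crowded individuals. A sketch under the common power-law form sh(d) = 1 - (d/σ_share)^α for d < σ_share (the exact form used by the authors is not stated in the text; the names are illustrative):

```python
def sharing(d, sigma_share, alpha=2.0):
    """Sharing value: 1 at d = 0, falling to 0 at d = sigma_share."""
    return 1.0 - (d / sigma_share) ** alpha if d < sigma_share else 0.0

def niche_count(i, population, sigma_share, alpha=2.0):
    """Niche count of individual i: the sum of sharing values over the
    whole population (including i itself, which contributes 1)."""
    xi = population[i]
    return sum(sharing(abs(xi - xj), sigma_share, alpha) for xj in population)
```

Dividing the raw fitness of each individual by its niche count penalises clustered solutions and so spreads the population along the Pareto optimal set.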
In this work, NSGA-I, which has been used extensively in earlier chemical engineering problems, was employed. However, NSGA-I has some disadvantages. For example, the sharing function used to evaluate the niche count of any chromosome requires the values of two parameters, which are difficult to assign a priori in NSGA-I. The total computational complexity of NSGA-I is O(MNp³), where M is the number of objective functions and Np is the number of chromosomes in the population. In addition, NSGA-I does not use any elite-preserving operator, and so good parents may be lost over time. Deb et al. (2002) have developed an elitist non-dominated sorting genetic algorithm (NSGA-II) to overcome these limitations.¹⁶
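The complexity cost mentioned above comes from dominance checking: every chromosome must be compared against every other in all M objectives. A naive sketch of extracting the first non-dominated front, assuming all objectives are to be minimised (NSGA-II replaces repeated scans like this with faster bookkeeping):

```python
def dominates(a, b):
    """a dominates b: no worse in every objective, strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated_front(points):
    """Return the non-dominated subset of a list of objective vectors."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]
```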
14.5. Conclusions

The NSGA has been successfully applied to the optimization of gas-solid separation devices used for particulate removal from air. Pareto optimal solutions relating different process variables were obtained. The algorithm is quite robust for generating non-inferior solutions for large-scale complex problems of industrial significance. A better understanding of the values of the computational parameters is required to increase the speed of convergence.

Index

a      height of the cyclone inlet (m)
b      width of the cyclone inlet (m)
B      diameter of the base of the cyclone (m)
C₀     total cost ($/yr)
d₀     orifice diameter of the nozzle in scrubber (mm)
D      diameter of the cyclone (m)
De     diameter of the exit pipe (m)
Dp     mass-mean diameter of solids (μm)
h      height of the cylindrical portion of the cyclone (m)
H      total height of the cyclone (m)
I      objective function (dimensionless)
L      length of the venturi scrubber (m)
L/G    liquid to gas flow ratio, dimensionless
N      number of cyclones
Ng     generation number, dimensionless
pc     probability of crossover, dimensionless
pm     probability of mutation, dimensionless
Pe     penalty function, dimensionless
S      depth of the exit pipe of the cyclone (m)
R₀     half-width of the venturi throat parallel to water injection (m)
u      decision variable, dimensionless
vᵢ     inlet velocity in the cyclone (m/s)
Vg,th  gas velocity at the throat of the scrubber (m/s)
VN     venturi number, dimensionless
W₀     width of venturi throat perpendicular to water injection (m)
Z      aspect ratio, dimensionless
Greek symbols

Δp     pressure drop (Pa)
η₀     overall collection efficiency
σ      spreading parameter
References

1. J. H. Holland, Adaptation in Natural and Artificial Systems (University of Michigan Press, Ann Arbor, MI, 1975).
2. N. Srinivas and K. Deb, "Multiobjective optimization using nondominated sorting in Genetic Algorithms", Evol. Comp. 2, 221-248 (1994).
3. V. Bhaskar, S. K. Gupta and A. K. Ray, "Applications of multi-objective optimization in chemical engineering", Reviews in Chemical Engineering 16(1), 1-54 (2000).
4. K. Mitra, K. Deb and S. K. Gupta, "Multiobjective Dynamic Optimization of an Industrial Nylon 6 Semibatch Reactor Using Genetic Algorithm", J. App. Polym. Sci. 69, 69-87 (1998).
5. R. R. Gupta and S. K. Gupta, "Multiobjective Optimization of an Industrial Nylon 6 Semibatch Reactor Using Genetic Algorithm", J. App. Polym. Sci. 73, 729-739 (1999).
6. J. K. Rajesh, S. K. Gupta, G. P. Rangaiah and A. K. Ray, "Multiobjective Optimization of Steam Reformer Performance using Genetic Algorithm", Ind. Eng. Chem. Res. 39, 706-717 (2000).
7. V. Bhaskar, S. K. Gupta and A. K. Ray, "Multiobjective optimization of an industrial wiped film PET reactor", AIChE J. 46, 1046-1058 (2000).
8. C. C. Yuen, Aatmeeyata, S. K. Gupta and A. K. Ray, "Multiobjective optimization of membrane separation modules using genetic algorithm", J. Memb. Sci. 176(2), 177-196 (2000).
9. D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning (Addison-Wesley, Reading, MA, 1989).
10. K. Deb, Optimization for Engineering Design: Algorithms and Examples (Prentice Hall of India, New Delhi, 1995).
11. G. Ravi, S. K. Gupta and M. B. Ray, "Multiobjective Optimization of Cyclone Separators using Genetic Algorithm", Ind. Eng. Chem. Res. 39, 4272-4286 (2000).
12. C. J. Stairmand, "The Design and Performance of Cyclone Separators", Trans. Inst. Chem. Eng. 29, 356-383 (1951).
13. G. Ravi, S. K. Gupta, S. Viswanathan and M. B. Ray, "Optimization of Venturi Scrubbers using Genetic Algorithm", Ind. Eng. Chem. Res. 41, 2988-3002 (2002).
14. N. V. Ananthanarayanan and S. Viswanathan, "Estimating Maximum Removal Efficiency in Venturi Scrubbers", AIChE J. 44, 2549-2560 (1998).
15. G. Ravi, S. K. Gupta, S. Viswanathan and M. B. Ray, "Multiobjective Optimization of Venturi Scrubbers Using a Three-dimensional Model For Collection Efficiency", Journal of Chemical Technology and Biotechnology 78(2-3), 308-313 (2003).
16. K. Deb, A. Pratap, S. Agarwal and T. Meyarivan, "A Fast and Elitist Multi-Objective Genetic Algorithm: NSGA-II", IEEE Transactions on Evolutionary Computation 6(2), 182-197 (2002).
CHAPTER 15

MULTI-OBJECTIVE SPECTROSCOPIC DATA ANALYSIS OF INERTIAL CONFINEMENT FUSION IMPLOSION CORES: PLASMA GRADIENT DETERMINATION
R. C. Mancini(a), S. J. Louis(b), I. E. Golovkin(c), L. A. Welser(a), Y. Ochi(d), K. Fujita(d), H. Nishimura(d), J. A. Koch(e), R. W. Lee(e), J. A. Delettrez(f), F. J. Marshall(f), I. Uschmann(g), E. Foerster(g), L. Klein(h)

(a) Department of Physics, University of Nevada, Reno
(b) Department of Computer Science, University of Nevada, Reno
(c) Prism Computational Sciences, Madison, Wisconsin
(d) Institute of Laser Engineering, Osaka University, Osaka, Japan
(e) Lawrence Livermore National Laboratory, Livermore, California
(f) Laboratory for Laser Energetics, University of Rochester, Rochester, New York
(g) Institute of Optics and Quantum Electronics, Jena University, Jena, Germany
(h) Department of Physics and Astronomy, Howard University, Washington, D.C.

We report on a spectroscopic method for the characterization of the spatial structure of inertial confinement fusion implosion cores based on the self-consistent analysis of simultaneous narrow-band X-ray images and X-ray line spectra. The method performs a search in multi-dimensional parameter space for the temperature and density gradients that simultaneously yield the best fits to narrow-band spatial emissivity profiles obtained from X-ray images, and to spectral line shapes recorded with crystal spectrometers. A multi-objective Niched Pareto Genetic Algorithm (NPGA) was developed to efficiently implement the multi-criteria data analysis. The availability of the NPGA is critical for the practical implementation of this analysis method, since NPGA-driven searches in parameter space typically find suitable solutions in approximately 10⁵ evaluations of the spectral model out of a total of 10¹⁸ possible cases (i.e. the size of the parameter space). Furthermore, analysis of solutions on the Pareto front permits us to address the issue of the uniqueness of the solution and the uncertainty of the optimal solution.
The performance of the NPGA is illustrated with spectroscopic data recorded in a series of stable and spherically symmetric implosion experiments in which argon-doped, deuterium-filled plastic shells were driven with the GEKKO XII (Institute of Laser Engineering, Japan) and OMEGA (Laboratory for
Laser Energetics, USA) laser systems. This measurement is relevant for understanding the spectral formation and plasma dynamics associated with the implosion process. In addition, since the results are independent of hydrodynamic simulations, they are important for the verification and benchmarking of detailed hydrodynamic simulations of high-energy-density plasmas.
15.1. Introduction

The determination of plasma core parameters during the implosion of an Inertial Confinement Fusion (ICF) deuterium-filled plastic microballoon can provide critical information about the plasma state in the final stages of the implosion. A wide variety of particle- and radiation-based instruments are currently used to diagnose implosion dynamics and, in particular, to determine core conditions. In this connection, X-ray spectroscopic measurements have proven to be a very powerful diagnostic for these high-energy-density plasmas¹. Over the years, and with the aid of tracer elements, several types of emission and absorption X-ray spectral features have been observed and modeled in detail, with the goals of understanding their plasma density and temperature sensitivity and, in turn, investigating their potential for plasma spectroscopy diagnostics. Mostly, these X-ray spectroscopic diagnostics have relied only on the analysis of spectra and have been applied to deduce emissivity-averaged or effective temperatures and densities. However, when the line emission extends over the whole plasma source (i.e. the implosion core), the usefulness of such measurements is questionable. We discuss a method for the determination of spatial gradients in ICF implosion cores based on multi-objective analysis of X-ray spectroscopic data. Previous X-ray spectroscopy studies of implosion cores have failed to address the gradient problem, even though hydrodynamic simulations predicted significant gradients in these plasma conditions. The work that we describe here is a unique attempt to address this problem through the simultaneous analysis of time-resolved X-ray images and X-ray line spectra of the implosion process. As implosion dynamics are an essential component of ICF and important to applications involving X-ray spectroscopy diagnostics, the quantitative measurement of the hydrodynamic behavior of implosions is a significant step in developing a predictive capability.
The evolution of the plasma gradients is also of basic relevance to atomic physics studies of level population kinetics, electron thermal conduction, radiation energy coupling and transfer, and spectral line formation. The idea of the method is based on performing systematic searches in
a multi-dimensional parameter space for the temperature and density spatial gradients that simultaneously yield the best fits to narrow-band spatial emissivity profiles obtained from X-ray images and to spatially-integrated spectral line shapes recorded with crystal spectrometers. A multi-objective Niched Pareto Genetic Algorithm (NPGA) was developed to efficiently implement this multi-objective data analysis technique. Indeed, the availability of the NPGA is critical for the practical implementation of this analysis method, since NPGA-driven searches in parameter space typically find suitable solutions in approximately 10⁵ evaluations of the spectral model (and spatial gradient selections) out of a total of 10¹⁸ possible cases. Although spatially resolved measurements of plasma conditions at these high energy densities have not been carried out in the past, the study of the average properties of ICF implosion cores has progressed from experimental measurements of peak electron densities in the compressed core²⁻⁶ to the determination of spatially averaged, but temporally resolved, electron temperatures,
15.2. Self-Consistent Analysis of Data from X-ray Images and Line Spectra

The determination of plasma temperature and density gradients in the core of an implosion requires the self-consistent modeling and analysis of time-resolved X-ray line spectra and time-resolved X-ray narrow-band images. This can be accomplished by doping the deuterium-gas fill with small amounts of a suitable tracer element (e.g. argon) that can provide adequate line radiation signals for the analysis without affecting the hydrodynamics. The argon concentration should also be low enough to ensure that the argon line emission is as optically thin as possible. Analysis of spatially-integrated time-resolved line spectra usually results in spatially averaged values of plasma density and temperature, and can provide no information concerning the gradients. As an illustration of this problem, Fig. 15.1 displays three combinations of one-dimensional (1-D) temperature and density gradients that result in almost identical space-integrated spectra of the 1s3p ¹P - 1s² ¹S Heβ line and associated Li-like satellite transitions in argon. Note that although, for illustrative purposes, these calculations were performed for linear gradients, more complicated gradients may occur in actuality. Hence, to determine and characterize core gradients unambiguously, additional information has to be taken into account in the analysis. Additional information for the characterization of core gradients can be obtained from the analysis of time-resolved X-ray narrow-band images. The two-dimensional narrow-band images provide a spatial map of the emissivity that is dependent on the spatial gradients in temperature and density. In the implosion data discussed here, the narrow-band X-ray images are dominated by line-transition emission and have a negligible contribution from continuum emission.
Although emissivity maps provide important spatially-resolved information about the plasma source, they do not impose a sufficient constraint to provide spatial information on both temperature and density gradients. On the other hand, using the constraints imposed by self-consistently fitting the spectra (whose overall broadening depends on the density) and the spatially-resolved relative distribution of narrow-band emissivities at a given time provides, for that time, both the temperature and the density as a function of the spatial coordinate. Thus, we can extract the electron temperature Te(r,t) and density Ne(r,t) from this analysis¹⁸,¹⁹. The conceptual idea of the analysis method and its implementation is schematically illustrated in Fig. 15.2.

Fig. 15.1. Argon Heβ line space-integrated spectra (I, bottom right) and spatial emissivity profiles (E, bottom left) for three combinations of electron temperature (Te, top left) and density (Ne, top right) gradients. Note, importantly, that the emission spectra in the lower right would be observably the same for all cases.

Fig. 15.2. Schematic illustration of the spectroscopic method for the determination of electron temperature and density gradients in the core, based on the self-consistent analysis of data from X-ray images and line spectra.

Time-resolved argon Heβ, Heγ (1s4p ¹P - 1s² ¹S) and Lyβ (3p ²P - 1s ²S) line spectra, and their associated He- and Li-like satellite transitions, were recorded and used for the analysis. Opacity effects on these lines are smaller than for the Heα (1s2p ¹P - 1s² ¹S) and Lyα (2p ²P - 1s ²S) lines; in addition, using a small concentration of dopant (argon) further reduces possible opacity effects. To ensure that this condition is satisfied, the spectra were analyzed with a detailed argon K-shell spectral model and code that can consider both uniform and non-uniform plasmas, and both optically thin and optically thick approximations. Opacity effects in the model were taken into account by self-consistently solving the radiation transport equation and a set of collisional-radiative population kinetics equations²⁰. For a given set of data (Fig. 15.2), we first perform the analysis of the spectra considering a uniform plasma approximation. This allows us to extract the emissivity-averaged electron temperature and density of the core, under the assumption that the lines are optically thin²⁰. To check this assumption, the same spectra are also analyzed considering instead a uniform plasma in the optically thick approximation, and the temperature and density extracted in this way are compared with the optically thin results. The difference between the optically thin and thick analysis results can then be used as a measure of the importance of opacity effects in the spectra. This idea was previously employed to systematically study opacity effects in the spectra from argon-doped implosions at the NOVA laser facility of Lawrence Livermore National Laboratory²¹. Next, we analyze the spectra using temperature and density gradients that are subject to the constraint of reproducing the correct values of the emissivity-averaged temperature and density obtained with the uniform-model analysis. Further, the same gradients are also used to fit the spatially-dependent emissivity extracted from the analysis of the argon Heβ X-ray narrow-band image.
In this way, a set of self-consistent electron temperature and density gradients is extracted from the data that simultaneously yields fits to the line spectrum and the narrow-band emissivity spatial profiles, further subject to the emissivity-average constraint. This procedure is schematically illustrated by the self-consistency iteration loop in Fig. 15.2. The search for suitable gradient functions is performed with the aid of a niched Pareto genetic algorithm technique. Emissivity profiles in the plasma source (i.e. implosion core) are obtained from narrow-band X-ray images using the Abel inversion procedure²². Although usually discussed for cases of cylindrical geometry, the Abel inversion method can also be applied to spherical geometry²³, as well as to generalized cylindrical geometry cases. Finally, we note that in argon-doped deuterium plasmas the electron and ion
Plasma Gradient Determination
347
number densities can be considered equal since most of the electrons come from the ionization of deuterium. Hence, the extracted electron density gradients are the same as the ion number density (and mass density) gradients.

15.3. A Niched Pareto Genetic Algorithm for Multi-Objective Spectroscopic Data Analysis

Genetic algorithms are search and optimization algorithms based on the mechanics of natural selection [24]. They are capable of finding solutions in poorly understood search spaces while exploring only a small fraction of the space, and can robustly deal with complex non-linear problems. We have shown that in the case of spectroscopic analysis of implosion-core spatially-integrated line spectra (i.e. a single-objective optimization problem), genetic algorithms efficiently search a two-dimensional parameter space to find the electron temperature and density values that yield best fits to the data assuming a uniform plasma approximation [25]. However, as is illustrated in Fig. 15.2, the problem of core gradient determination requires multi-objective data analysis since several pieces of data (i.e. spatially-integrated line spectra and spatially-resolved emissivity profiles) have to be simultaneously and self-consistently approximated with a single selection of electron temperature and density gradients. Furthermore, flexible-enough encoding algorithms for plasma gradients result in large, multi-dimensional search spaces. Thus, an efficient and robust algorithm is required to effectively implement the spectroscopic analysis illustrated in Fig. 15.2. Our strategy is to use the principle of Pareto optimality in designing a Pareto optimal genetic algorithm [26-28]. At each generation, there is a set of non-dominated solutions in fitness space that form a surface known as the Pareto optimal front (or the Pareto front). The goal of a Pareto optimal genetic algorithm is to find and maintain a representative sampling of the solutions on the Pareto front.
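The notion of a non-dominated set can be made concrete in a few lines. This is a generic sketch (helper names are my own; objectives are treated as higher-is-better fitness tuples):

```python
def dominates(a, b):
    """a dominates b: a is no worse than b in every objective and strictly
    better in at least one (objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(population):
    """Non-dominated members of a list of fitness tuples.
    (dominates(p, p) is False, so self-comparison is harmless.)"""
    return [p for p in population
            if not any(dominates(q, p) for q in population)]
```

For example, with two objectives, the front of {(1,5), (2,4), (3,3), (2,2), (1,1)} is {(1,5), (2,4), (3,3)}: the last two points are each dominated by some other member.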
If the criteria are not self-contradictory, then there should be a point on the final convex front that satisfies all criteria well (see Fig. 15.3). In our case, this solution will be considered the solution to the multi-objective spectral analysis problem [18]. If there is no such point (concave front), an expert decision has to be made about which (if any) solution on the concave Pareto optimal front represents the most acceptable, physically sound solution. The result of the analysis in this case may not be reliable. Tracing the Pareto optimal front also helps to address the issue of solution uniqueness. The idea of the analysis is to simultaneously minimize for each of the
348
R.C. Mancini, et al.
Fig. 15.3. Illustration of a two-objective search for successful (convex front, a) and unsuccessful (concave front, b) cases.
objectives the difference χ² between the experimental and synthetic data computed with the spectral model, defined by

χ² = Σ_i ω_i (I_i^exp − I_i^theor)²
where I_i represents either intensity or emissivity and ω_i is a weight factor. A particular choice of the weight factor may have an impact on the performance of the algorithm. It may also be important for the estimation of uncertainty intervals [29]. Since our primary goal was to develop and study the performance of a niched Pareto genetic algorithm, we set the weight factors to 1 for the spectra, and to (1/I^exp)² for the emissivities. This was done to compensate for possible large changes in the range of values of the emissivity profile. We therefore measure the performance or fitness of each candidate as 1/χ² (the higher the performance, the better the fit). The crucial difference between a canonical genetic algorithm and the niched Pareto genetic algorithm is in the implementation of selection. We implemented Pareto domination tournament selection, where two candidates are picked at random from the population. A comparison set of individuals is also picked randomly from the population. Each of the candidates is then compared against each individual in the comparison set. If one candidate is dominated by the comparison set and the other is not, the latter is selected for reproduction. If neither or both are dominated by the comparison set, then sharing is used to choose a winner. The equivalence-class sharing implemented in our model defines the winner as the individual that has the smallest number of other individuals inside its niche. This technique helps to maintain diversity along the Pareto front. The niche size is adjusted automatically for each generation based on the average area of the front. We also normalize the objective function for each generation so that the objective function for each criterion ranges from 0 to 1. In order to increase selection pressure we use an elitist scheme where:
1) members of the current generation and offspring are combined in a common pool in each generation; 2) the solutions along the Pareto front are selected for the next generation and removed from the pool; 3) the procedure is repeated until the next generation is filled. We have found empirically that elitism combined with uniform crossover provides reliable and rapid convergence for our problem of spectroscopic analysis [18]. The size of the comparison set controls selection pressure. However, when using an elitist scheme, the algorithm is not very sensitive to this size. In our implementation we compare each candidate against 5 individuals. The probabilities of crossover and mutation are 0.95 and 0.05, respectively. A systematic study was performed by varying the genetic algorithm parameters in order to ensure reliability and optimize the performance of the algorithm [18,30]. The implementation discussed above turned out to be the best for our purposes.

15.4. Test Cases

Before applying the spectroscopic analysis and its niched Pareto genetic algorithm implementation to actual data, we tested it on a number of cases using "synthetic" data where the solution was known. First, we considered the case of parabolic (with missing linear term) temperature and density gradients. As shown in Fig. 15.4, each gradient formula was determined by two coefficients that were computed from the values of two parameters: the values of the gradients at the center (i.e. T(0) and N(0)) and edge (i.e. T(R) and N(R)) of the implosion core. Each of these parameters was allowed to take values in a suitable finite range and was encoded using five bits. Thus, the chromosome length in this case was 20. The population size was 100 and we ran the NPGA code for 150 generations.
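The selection machinery of Section 15.3 — Pareto domination tournaments with equivalence-class sharing, plus the three-step elitist scheme — might be sketched as follows. This is a simplified illustration, not the authors' code: niche membership is judged here with a fixed Chebyshev radius in fitness space rather than the adaptive niche size described in the text.

```python
import random

def dominates(a, b):
    """a dominates b: no worse in every objective, better in at least one
    (objectives are higher-is-better fitness values)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(pool):
    """Non-dominated members of the pool."""
    return [p for p in pool if not any(dominates(q, p) for q in pool)]

def niche_count(candidate, population, sigma_share):
    """Individuals within a Chebyshev radius sigma_share of the candidate."""
    return sum(1 for q in population
               if max(abs(a - b) for a, b in zip(candidate, q)) < sigma_share)

def pareto_tournament(population, t_dom=5, sigma_share=0.1):
    """Of two random candidates, the one not dominated by a random
    comparison set wins; ties are broken by sharing (smaller niche
    count wins)."""
    c1, c2 = random.sample(population, 2)
    comp = random.sample(population, min(t_dom, len(population)))
    d1 = any(dominates(q, c1) for q in comp)
    d2 = any(dominates(q, c2) for q in comp)
    if d1 != d2:
        return c2 if d1 else c1
    return c1 if niche_count(c1, population, sigma_share) \
              <= niche_count(c2, population, sigma_share) else c2

def elitist_selection(parents, offspring, pop_size):
    """Steps 1)-3): pool parents and offspring, then repeatedly move the
    current Pareto front into the next generation until it is full."""
    pool = list(parents) + list(offspring)
    next_gen = []
    while len(next_gen) < pop_size and pool:
        front = pareto_front(pool)
        for p in front:
            pool.remove(p)
        next_gen.extend(front[:pop_size - len(next_gen)])
    return next_gen
```

Peeling off successive fronts guarantees that no member of the next generation is dominated by a discarded one, which is what gives the scheme its elitist selection pressure.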
Fig. 15.4. Encoding and chromosome of parabolic temperature and density gradients. One electron volt (eV) of temperature is equivalent to 11,605 K.
With this gradient encoding and chromosome length and structure, the size of the parameter space (i.e. the total number of possible temperature and density gradient combinations) is 32⁴ = 1,048,576. Hence, an exhaustive search of the parameter space would require 32⁴ evaluations of the spectral model. The NPGA code finds the right solution after evaluating the spectral model for 3,000 to 5,000 temperature and density gradients, a small fraction of the total parameter space for this problem. This is also important because spectral model evaluations can be computationally expensive and, in order to investigate the uniqueness of the solution, analysis runs are repeated several times for the same dataset but with a different random initialization of the first generation. To illustrate the performance of the algorithm, Fig. 15.5 shows results for a typical run of this test case using two objectives. The emissivity distribution and the line spectrum in Fig. 15.5 are those of the argon Heβ spectral feature [20]. This is one of the spectral signatures used in the spectroscopic analysis of X-ray line emission from implosion cores, and the electron temperature and number density conditions are typical of those achieved at the collapse of laser-driven implosion experiments [10,13-15]. The "synthetic" data are characterized by six spatial zones, and 10% noise has been added to them in order to approximate real data. Good fits to the line spectrum are found first, since the spectrum is spatially integrated; however, as the run progresses, good fits to the spatially-resolved emissivity are also found. Eventually, a convex Pareto front begins to develop, leading to gradients that simultaneously and self-consistently fit well both the line spectrum and the emissivity profile. The optimal solution is extracted from the upper right corner of the convex Pareto front, and it coincides with the right solution (see Fig. 15.5).
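To make the 20-bit parabolic-gradient encoding concrete, here is a minimal decoding sketch. The parameter ranges below are placeholders for illustration, not the ranges used in the actual analysis:

```python
def decode_5bit(bits, lo, hi):
    """Map a 5-bit gene (list of 0/1) to one of 32 evenly spaced values in [lo, hi]."""
    k = int("".join(str(b) for b in bits), 2)
    return lo + (hi - lo) * k / 31

def parabolic_profile(center, edge, r, R):
    """Parabolic gradient with missing linear term:
    f(r) = f(0) + (f(R) - f(0)) * (r/R)**2."""
    return center + (edge - center) * (r / R) ** 2

def decode_chromosome(chrom, t_range=(300.0, 1200.0), n_range=(1e23, 1e24)):
    """Split a 20-bit chromosome into the four 5-bit genes encoding
    T(0), T(R), N(0) and N(R) (ranges are illustrative placeholders)."""
    genes = [chrom[i:i + 5] for i in range(0, 20, 5)]
    return (decode_5bit(genes[0], *t_range), decode_5bit(genes[1], *t_range),
            decode_5bit(genes[2], *n_range), decode_5bit(genes[3], *n_range))
```

With 4 parameters of 5 bits each, the search space is indeed (2⁵)⁴ = 32⁴ = 1,048,576 distinct gradient pairs.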
Points along the Pareto front but away from the optimal solution do not satisfy both objectives simultaneously and can clearly be discarded. It is also important to investigate points in the vicinity of the optimal solution. If the gradients associated with these points are similar to those of the optimal solution, then the solution is unique and the spread in solutions can be used as part of the uncertainty estimation. On the other hand, if there are points that satisfy all objectives well but nevertheless have different gradients, then we have alternative solutions and the analysis is ambiguous. It is expected that the higher the level of noise in the data, the larger the chance of finding alternative solutions. For noise levels up to 10%, we have found that the solution is unique. However, starting at noise levels of 15% to 20%, alternative solutions are found and the results of the analysis
Fig. 15.5. Parabolic gradients test case results. Top: early development and propagation of Pareto front in fitness space. Middle: self-consistent fits to spatially-resolved emissivity profile (left) and spatially-integrated line spectrum (right). Bottom: self-consistent density (left) and temperature (right) gradients. Gradients in the vicinity of the optimal solution are also displayed.
become ambiguous. Next, we consider a test case based on "synthetic" data computed using plasma core gradients from a laser-driven implosion numerical simulation performed with the one-dimensional Lagrangian radiation-hydrodynamics code LILAC [14]. Fig. 15.6 displays the time-history of core electron temperature and density gradients calculated by LILAC through the collapse of the implosion. In this case, the simulation was done for a plastic target of
937 μm initial exterior diameter, 24 μm wall thickness, filled with 20 atm of deuterium, doped with 0.1% of argon, and irradiated with a square laser pulse of 500 ps duration and 15 kJ of UV (1/3 μm wavelength) laser energy. The main shock wave hits the center of the target at t = 1.9 ns. Gradients are shown every 200 ps for a period of 1 ns. At t = 3.0 ns the core radius reaches a minimum value of about 53 μm, which corresponds to a convergence ratio for the core of about 8.
Fig. 15.6. Time-history of core temperature (top) and density (bottom) gradients through the implosion collapse computed with the one-dimensional radiationhydrodynamics code LILAC.
The functional dependence of these gradients on radius cannot be well approximated by the simple parabolic gradients considered in the previous test case. Thus, a more general algorithm for encoding implosion core gradients is needed. First, we work with six spatial zones, since the implosion cores are about 60 μm in diameter and current X-ray imagers have a spatial resolution of 10 μm. In each spatial zone, temperature and density can take values within a suitable range. Using a 5-bit encoding, this results in 32 uniformly-spaced values for both the temperature and the
density. Further, maximum relative changes between adjacent spatial zones are bounded (typically by 60%), and to avoid unrealistic changes in these zone-by-zone gradients a polynomial fit is performed. Thus, the total number of temperature and density gradients described by this algorithm (i.e. the size of the search parameter space) is (32⁶)² ≈ 1.2×10¹⁸. Now the chromosome length is 60, and we work with populations of 200 to 300 members. In our experience, this gradient-generating algorithm has proven thorough and flexible enough to accommodate core gradients through the collapse of the implosion. However, it may need to be extended or changed to deal with other application problems. As another test-case illustration of the performance of the spectroscopic analysis driven by the NPGA code, Figs. 15.7 and 15.8 show the comparison between LILAC simulation gradients (see Fig. 15.6) and the corresponding gradients found by the NPGA. Again, the analysis is based on "synthetic data" associated with the argon Heβ X-ray image and line spectrum. For all times, the NPGA code consistently finds the correct gradients and approximates quite well their functional dependence on the spatial coordinate.
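A sketch of this more general zone-based encoding under assumed value ranges (six zones, five bits per zone, adjacent-zone changes capped at 60%, followed by a low-order polynomial fit; the clipping strategy and fit degree are my own choices for illustration):

```python
import numpy as np

def zone_profile(genes, lo, hi, max_step=0.6):
    """Decode six 5-bit genes into zone values in [lo, hi], clipping the
    relative change between adjacent zones to at most max_step (60%)."""
    vals = [lo + (hi - lo) * int("".join(str(b) for b in g), 2) / 31
            for g in genes]
    for j in range(1, len(vals)):
        low = vals[j - 1] * (1 - max_step)
        high = vals[j - 1] * (1 + max_step)
        vals[j] = min(max(vals[j], low), high)
    return np.array(vals)

def smooth_profile(radii, zone_values, degree=3):
    """Low-order polynomial fit over the zone-by-zone values, to avoid
    unrealistic zone-to-zone jumps; returns a callable polynomial."""
    return np.poly1d(np.polyfit(radii, zone_values, degree))
```

With 6 zones × 5 bits for each of temperature and density, a chromosome has 60 bits, and the raw search space is (32⁶)² ≈ 1.2×10¹⁸ gradient pairs, as stated in the text.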
Fig. 15.7. Comparison of LILAC simulation and NPGA-found gradients for t=2.6ns and t=2.8ns.
Fig. 15.8. Comparison of LILAC simulation and NPGA-found gradients for t=3.0ns and t=3.2ns.
Finally, we note that a parallel version of the NPGA code for spectral analysis production runs was also developed [31]. The evaluation of the spectral model is computationally more expensive than the execution of the NPGA logic, and the evaluations of the spectral model are independent of each other. Hence, the parallelization of the spectral model evaluations in the NPGA code is quite straightforward and relatively easy to implement. It should result in significant speed-up so long as the cost of population member evaluation dominates the communication cost. Indeed, we studied the parallel NPGA performance as a function of the number of nodes in a PC cluster and found a linear speed-up with up to 20 nodes.

15.5. Application to Direct-Drive Implosions at GEKKO XII

The laser-driven direct-drive implosion experiments were performed at the Osaka University GEKKO XII laser system. The array of diagnostic instrumentation included a monochromatic X-ray framing camera, essential for the spatial resolution of the plasma. The X-ray monochromatic images were used to obtain, for the first time in an implosion experiment, spatially
and temporally resolved data on the collapsing core. In addition, time-resolved but spatially averaged streak spectrograph data were recorded for the usual spatially averaged diagnostic. The drive consisted of a 12-beam, 2.55 kJ Nd:glass laser operating at 526 nm. Random phase plates were used to smooth individual laser beams. The laser pulse was composed of a 0.2 ns pre-pulse followed by a 1.6 ns square pulse with a rise time of 0.05 ns. Targets were plastic shells, 500 μm in diameter with 8 μm wall thickness, filled with 30 atm of deuterium and doped with 0.075 atm of argon (for diagnostic purposes). The implosion was diagnosed by recording both the compressed core image and the argon line spectrum. In particular, time-resolved X-ray monochromatic images and simultaneous spatially integrated X-ray spectra of the Heβ spectral feature were recorded. For the spatial information, a two-dimensional X-ray monochromatic framing camera imaging the central 19 eV of the Heβ line emitted by the argon in the core was employed [32]. This X-ray imager monitors the implosion symmetry and provides up to 5 frames with Δt = 40 ps duration, 50 ps interframe time, and 10 μm spatial resolution. The X-ray spectrometer consisted of a flat RbAP (100) crystal coupled to an X-ray streak camera with 10 ps time resolution and a resolving power, λ/Δλ, of 600. It is important that the entire core of the implosion be in the field of view of the spectrograph so that a spatial average over the imploded core radius is recorded for the determination of both
Fig. 15.9. Argon Heβ image (top left) and line spectrum (top right) data for GEKKO XII shot 22091. Fits yielded by the self-consistent gradients found with the NPGA code for the emissivity profile (middle left) and line spectrum (middle right). Also shown is the line spectrum fit based on a uniform model analysis. Bottom: self-consistent gradients found with the NPGA-driven spectral analysis.
of parameter space). The self-consistent gradients, and the fits they yield to the data, are displayed in Fig. 15.9. Their emissivity-weighted averages are 600 eV for the electron temperature and 3×10²³ cm⁻³ for the electron number density. These values are consistent to within 4% with those obtained from the uniform model analysis. The gradients' uncertainties indicated in Fig. 15.9 account for deviations from (perfect) spherical symmetry and the spread of solutions about the optimal Pareto front solution. Finally, Fig. 15.10 shows the time-history of core gradients extracted by the NPGA code from a sequence of four consecutive framed images and their corresponding line spectra. In all cases we found that temperature and density are counter-correlated. This is consistent with the idea of an isobaric core, and it is also in good agreement with one-dimensional hydrodynamic code simulations [17].
Fig. 15.10. Time-history of core gradients for GEKKO XII shot 22091 determined by the NPGA-driven spectral analysis. Each time interval represents a time-integration over 40 ps.
15.6. Application to Indirect-Drive Implosions at OMEGA

The indirect-drive implosion experiments were performed at the OMEGA laser facility of the Laboratory for Laser Energetics at the University of Rochester, with support from the National Laser Users' Facility program and in collaboration with Lawrence Livermore National Laboratory. The target consisted of a gold cylindrical hohlraum with a plastic capsule inside. The gold hohlraum was 2500 μm long, 1600 μm in diameter, and had 1200 μm laser entrance holes (LEH). The plastic capsule had an external diameter of 510 μm with a wall thickness of 35 μm, i.e., the initial core diameter was 440 μm, and it was placed at the center of the hohlraum. The core was filled with 50 atm of deuterium and 0.1 atm of argon. The argon tracer is added to the core fill for diagnostic purposes and resulted
in a typical optical depth of 0.3 for the argon Heβ line, and less for the Lyβ line. The capsules were designed so that the hohlraum X-ray radiation did not directly penetrate to the fill gas. These hohlraum targets were irradiated with 30 UV OMEGA beams, split into 15 beams per LEH that were arranged in two cones of 5 and 10 beams each. The beam cones were pointed to produce two rings of beams on each end of the hohlraum. The laser energy per beam was 500 J, for a total UV laser energy of 15 kJ, producing a hohlraum radiation temperature of 210 eV. Three diagnostic holes placed on the side of the hohlraum provided lines-of-sight for the MMI-2 and GMXI X-ray imagers, and for a streaked X-ray crystal spectrometer. MMI-2 is a pinhole-array, flat multi-layer mirror Bragg reflector instrument that records numerous narrow-band (~75 eV/image) X-ray images in the photon energy range from 3000 eV to 5000 eV with ~10 μm spatial resolution. The pinhole array is comprised of 1280 pinholes, each 5 μm in diameter and separated by 70 μm, and is attached to the hohlraum target 15 mm from the capsule. MMI-2 operates with a magnification of 8. The wavelength dispersion is provided by a WBC4 multi-layer mirror with a layer thickness of approximately 15 Å. Data from MMI-2 can be used to construct narrow-band images from several lines as well as continuum images. In addition, a space-integrated X-ray line spectrum can be extracted from MMI-2 data. The line spectrum covers the spectral range of the Heβ and Lyβ lines and their associated Li- and He-like satellite transitions [33-36]. We focus on the analysis of data recorded with MMI-2. Fig. 15.11 shows the image data recorded by MMI-2 in OMEGA shot 26787. A characteristic of MMI-2 core image data is that while both the horizontal (x) and vertical (y) axes represent spatial resolution, the y-axis is also a spectral resolution axis.
Thus, several groups of adjacent core images display narrow bands of bright (line) emission covering different portions of the image. Working with groups of images (see Fig. 15.11), core X-ray images associated with different narrow-band ranges can be extracted from the data [33,35]. In addition, a wide horizontal integration of the image data produces the spatially-integrated line spectrum. Given the narrow-band spatial emissivity profiles obtained from the reconstructed core images and the spatially-integrated line spectrum, we performed the spectroscopic analysis of core gradients using the NPGA code. Each dataset is analyzed at least ten times to check for solution uniqueness, each time starting with a different random initialization of the first generation. We found that the NPGA code also works very well for data from indirect-drive implosions, and finds the solution after approximately 10⁵ model evaluations (i.e. gradient selections).
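The repeated-run uniqueness check can be expressed as a simple driver. In this sketch, `run_analysis` is a hypothetical callable mapping a random seed to a solution tuple, and the 5% agreement tolerance is my own choice, not a value from the text:

```python
def solutions_agree(solutions, tol=0.05):
    """True if every solution's parameters lie within a relative tolerance
    of the first solution's (a crude uniqueness criterion)."""
    ref = solutions[0]
    return all(abs(s[k] - ref[k]) <= tol * abs(ref[k])
               for s in solutions[1:] for k in range(len(ref)))

def check_uniqueness(run_analysis, n_runs=10, tol=0.05):
    """Re-run the NPGA-driven analysis with a different random
    initialization (seed) each run and compare the solutions found."""
    return solutions_agree([run_analysis(seed) for seed in range(n_runs)], tol)
```

If the runs disagree beyond tolerance, the data admit alternative solutions and the analysis should be treated as ambiguous, as discussed for the noisy test cases above.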
Fig. 15.11. MMI-2 data from OMEGA shot 26787. Top: image data covering the spectral region of Lyα, Heβ and Lyβ line emissions. Middle: extraction of six sub-images that contain Lyβ line emission. Bottom: Lyβ core image constructed from the addition of the six Lyβ sub-images and the subtraction of a nearby continuum image.
The self-consistent gradients, and the fits they yield to the data, are shown in Fig. 15.12. The comparison between emissivity-weighted averages and uniform model analysis results is also good [36].

15.7. Conclusions

We have discussed the use of a niched Pareto genetic algorithm for the determination of plasma temperature and density gradients through the collapse of laser-driven implosion cores, based on the self-consistent analysis of simultaneous spatially-resolved X-ray image data and the spatially-integrated X-ray line spectrum. The idea is to explore a parameter space of gradient functions, searching for gradients that yield the best fits to X-ray narrow-band emissivity profiles and line spectra. In addition, the emissivity-weighted averages of these gradients are subject to matching the results of the uniform model analysis. Since the analysis involves satisfying (simultaneously) multiple objectives and searching in multi-dimensional parameter spaces, it does require an efficient and robust algorithmic implementation in a computer code. In this connection, we have developed a niched Pareto genetic algorithm that drives the spectroscopic analysis and successfully finds self-consistent temperature and density gradients. The method was
Fig. 15.12. Self-consistent gradients for OMEGA shot 26787 (bottom), and the fits they yield to the emissivity spatial profile (Heβ, top right) and the spatially-integrated line spectrum (top left), including the argon Heβ (3680 eV) and Lyβ (3935 eV) line emissions. Emissivity-weighted averages of the gradients and uniform model analysis results both yield 950 eV for the electron temperature and 1.3×10²⁴ cm⁻³ for the electron number density.
first applied to a number of test cases based on "synthetic" data where the gradient solution was known, and subsequently applied to real-world cases with data recorded in direct- and indirect-drive laser-driven implosion cores. In all cases, the NPGA code successfully solved the inversion problem posed by the spectroscopic analysis. An important component of the analysis method is the gradient encoding algorithm. In an effort to be thorough and flexible, our current gradient encoding algorithm results in large parameter spaces. An exhaustive search of these parameter spaces would involve evaluating the spectral model for about 10¹⁸ different combinations of temperature and density gradients. In this regard, the performance of the NPGA code is impressive since it finds the solution in approximately 10⁵ model evaluations. This is important since spectral model evaluations can be (depending upon the amount of physics included in the model) computationally expensive. We emphasize that the NPGA code we have developed is general and thus can be applied to other problems of spectroscopic analysis. Recent new applications of the NPGA technique to plasma spectroscopy include three-objective analysis of several X-ray narrow-band images and line spectrum, and multi-objective analysis of continuum-based X-ray images for diagnosis of cryogenic implosions [37]. Preliminary results look promising and suggest yet another spectroscopic analysis inversion problem where the NPGA technique can also play a critical role.

Acknowledgments

This work was supported by DOE-NLUF Grants DE-FG03-01SF22225 and DE-FG03-03SF22696, NSF Grant 9624130, ONR contract 00014-03-1-0104, LLNL under the auspices of DOE contract W-7405-ENG-48, DOE-HEDS DE-FG03-98DP00213, and the Japan-Germany and Japan-US collaboration program of JSPS.

References

1. Plasma spectroscopy in inertial confinement fusion and soft x-ray laser research, H. Griem, Phys. Fluids B 4, 2346 (1992). 2. Direct measurement of compression of laser-imploded targets using x-ray spectroscopy, B. Yaakobi, D. Steel, E. Thorsos, A. Hauer and B. Perry, Phys. Rev. Lett. 39, 1526 (1977). 3. Compression measurements of neon-seeded glass microballoons irradiated by CO2 laser light, K.B. Mitchell, D.B. van Husteyn, G.H. McCall, P. Lee and H.R. Griem, Phys. Rev. Lett. 42, 232 (1979). 4. X-ray spectroscopic diagnosis of laser-produced plasmas, with emphasis on line broadening, J.D. Kilkenny, R.W. Lee, M.H. Key and J.G. Lunney, Phys. Rev. A 22, 2746 (1980). 5. Time-resolved spectroscopic measurement of high density in argon-filled microballoon implosions, C.F. Hooper, Jr., D.P. Kilcrease, R.C. Mancini, L.A. Woltz, D.K. Bradley, P.A. Jaanimagi and M.C. Richardson, Phys. Rev. Lett. 63, 267 (1989). 6. High Z x-ray spectroscopy of laser-imploded capsules, B.A. Hammel, P. Bell, C.J. Keane, R.W. Lee and C.L.S. Lewis, Rev. Sci. Instrum. 61, 2774 (1990). 7. X-ray spectroscopic measurements of high densities and temperatures from indirectly driven inertial confinement fusion capsules, B. Hammel, C.J. Keane, M.D. Cable, D.R. Kania, J.D. Kilkenny, R.W. Lee and R. Pasha, Phys. Rev. Lett. 70, 1263 (1993). 8. Study of indirectly driven implosion by x-ray spectroscopic measurements, H.
Nishimura, T. Kiso, H. Shiraga, T. Endo, K. Fujita, A. Sunahara, H. Takabe, Y. Kato and S. Nakai, Phys. Plasmas 2, 2063 (1995). 9. Spectroscopic analysis of hot dense plasmas: a focus on ion dynamics, C.F. Hooper, Jr., D.A. Haynes, Jr., D.T. Garber, R.C. Mancini, Y.T. Lee, D.K. Bradley, J. Delettrez, R. Epstein and P.A. Jaanimagi, Laser Part. Beams 14, 713 (1996).
10. The effects of ion dynamics and opacity on Stark broadened argon line profiles, D.A. Haynes, Jr., D.T. Garber, C.F. Hooper, Jr., R.C. Mancini, Y.T. Lee, D.K. Bradley, J. Delettrez, R. Epstein and P.A. Jaanimagi, Phys. Rev. E 53, 1042 (1996). 11. Spectroscopy of compressed high energy density matter, N. Woolsey, A. Asfaw, B. Hammel, C. Keane, C.A. Back, A. Calisti, C. Mosse, R. Stamm, B. Talin, J.S. Wark, R.W. Lee and L. Klein, Phys. Rev. E 53, 6396 (1996). 12. Evolution of electron temperature and electron density in indirectly driven spherical implosions, N. Woolsey, B.A. Hammel, C.J. Keane, A. Asfaw, C.A. Back, J.C. Moreno, J.K. Nash, A. Calisti, C. Mosse, R. Stamm, B. Talin, L. Klein and R.W. Lee, Phys. Rev. E 56, 2314 (1997). 13. Competing effects of collisional ionization and radiative cooling in inertially confined plasmas, N. Woolsey, B.A. Hammel, C.J. Keane, C.A. Back, J.C. Moreno, J.K. Nash, A. Calisti, C. Mosse, R. Stamm, B. Talin, A. Asfaw, L.S. Klein and R.W. Lee, Phys. Rev. E 57, 4650 (1998). 14. Characterization of direct-drive-implosion core conditions on OMEGA with time-resolved argon K-shell spectroscopy, S.P. Regan, J.A. Delettrez, R. Epstein, P.A. Jaanimagi, B. Yaakobi, V.A. Smalyuk, F.J. Marshall, D.D. Meyerhofer, W. Seka, D.A. Haynes, Jr., I.E. Golovkin and C.F. Hooper, Jr., Phys. Plasmas 9, 1357 (2002). 15. Time- and space-resolved x-ray spectroscopy for observation of the hot compressed core region in a laser driven implosion, Y. Ochi, K. Fujita, I. Niki, H. Nishimura, N. Izumi, A. Sunahara, S. Naruo, T. Kawamura, M. Fukao, H. Shiraga, H. Takabe, K. Mima, S. Nakai, I. Uschmann, R. Butzbach and E. Forster, J. Quant Spectrosc. Radiat. Transfer 65, 393 (2000). 16. The effects of gradients on the diagnostic use of spectral features from laser compressed plasmas, R.W. Lee, J. Quant. Spectrosc. Radiat. Transfer 2, 87 (1982). 17. Temporal evolution of temperature and density profiles of a laser compressed core, Y. Ochi, I. Golovkin, R. 
Mancini, I. Uschmann, A. Sunahara, H. Nishimura, K. Fujita, S. Louis, M. Nakai, H. Shiraga, N. Miyanaga, H. Azechi, R. Butzbach, E. Forster, J. Delettrez, J. Koch, R.W. Lee, and L. Klein, Rev. Sci. Instrum. 74, 1683 (2003). 18. Spectroscopic modeling and analysis of plasma conditions in implosion cores, I.E. Golovkin, PhD Dissertation, University of Nevada, Reno (2000). 19. Spectroscopic determination of dynamic plasma gradients in implosion cores, I. Golovkin, R. Mancini, S. Louis, Y. Ochi, K. Fujita, H. Nishimura, H. Shiraga, N. Miyanaga, H. Azechi, R. Butzbach, I. Uschmann, E. Forster, J. Delettrez, J. Koch, R.W. Lee, L. Klein, Phys. Rev. Lett. 88, 045002 (2002). 20. High-order satellites and plasma gradient effects on the argon Heβ line opacity and intensity distribution, I.E. Golovkin and R.C. Mancini, J. Quant. Spectrosc. Radiat. Transfer 65, 273 (2000). 21. Opacity analysis of the Heβ line in argon-doped indirect drive implosions at NOVA, I.E. Golovkin, R.C. Mancini, N.C. Woolsey, C.A. Back, R.W. Lee and L. Klein, Proceedings of the First International Conference on Inertial Fusion and Science Applications, page 1123 (Elsevier Sc. Pub., 2000).
22. Transformation of observed radiances into radial distribution of the emission of a plasma, K. Bockasten, Journal of the Optical Society of America 51, 943 (1961). 23. Abel inversion of cryogenic laser target images, B. Yaakobi, F.J. Marshall and J. Delettrez, Optics Comm. 133, 43 (1997). 24. D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning (Addison-Wesley Pub., 1989). 25. Analysis of x-ray spectral data with genetic algorithms, I.E. Golovkin, R.C. Mancini, S.J. Louis, R.W. Lee and L. Klein, J. Quant. Spectrosc. Radiat. Transfer 75, 625 (2002). 26. Multi-objective optimization using the niched Pareto genetic algorithm, J. Horn, N. Nafpliotis and D.E. Goldberg, Proceedings of the First IEEE Conference on Evolutionary Computation, p. 82 (1994). 27. C.A. Coello-Coello, D.A. Van Veldhuizen and G.B. Lamont, Evolutionary Algorithms for Solving Multi-Objective Problems (Kluwer Academic Pub., 2002). 28. K. Deb, Multi-Objective Optimization using Evolutionary Algorithms (J. Wiley Pub., 2001). 29. R.L. Coldwell and G.I. Bamford, Theory and Operation of Spectral Analysis using ROBFIT (American Institute of Physics Pub., 1991). 30. Multi-criteria search and optimization: an application to x-ray plasma spectroscopy, I.E. Golovkin, R.C. Mancini, S.J. Louis, R.W. Lee and L. Klein, Proceedings of the 2000 Congress on Evolutionary Computation, p. 1521 (IEEE Press, 2000). 31. Parallel implementation of niched Pareto genetic algorithm code for x-ray plasma spectroscopy, I.E. Golovkin, R.C. Mancini and S.J. Louis, Proceedings of the 2002 Congress on Evolutionary Computation (IEEE Press, 2002). 32. Time-resolved ten-channel monochromatic imaging of inertial confinement fusion plasmas, I. Uschmann, K. Fujita, I. Niki, R. Butzbach, H. Nishimura, J. Funakura, M. Nakai, E. Forster and K. Mima, Applied Optics 39, 5865 (2000). 33. Processing and analysis of x-ray line spectra and multi-monochromatic x-ray images for implosion core gradient determination, L.A.
Welser, MS Thesis, University of Nevada, Reno (2003). 34. Spectroscopic determination of core gradients in inertial confinement fusion implosions, L.A. Welser, R.C. Mancini, I.E. Golovkin, J.A. Koch, H.E. Dalhed, R.W. Lee, F.J. Marshall, J.A. Delettrez and L. Klein, American Institute of Physics Conf. Proc. 635, 61 (2002). 35. Processing of multi-monochromatic x-ray images from indirect drive implosions at OMEGA, L.A. Welser, R.C. Mancini, J.A. Koch, S. Dalhed, R.W. Lee, I.E. Golovkin, F. Marshall, J. Delettrez and L. Klein, Rev. Sci. Instrum. 74, 1951 (2003). 36. Analysis of the spatial structure of inertial confinement fusion implosion cores at OMEGA, L.A. Welser, R.C. Mancini, J.A. Koch, N. Izumi, H.E. Dalhed, H. Scott, T.W. Barbee, Jr., R.W. Lee, I.E. Golovkin, F. Marshall, J. Delettrez and L. Klein, J. Quant. Spectrosc. Radiat. Transfer 81, 487 (2003).
364
R.C. Mancini, et. al.
37. Multi-spectral imaging of continuum emission for determination of temperature and density profiles inside implosion plasmas, J.A. Koch, S. Haan and R.C. Mancini, J. Quant. Spectrosc. Radiat. Transfer (2004, in press).
CHAPTER 16

APPLICATION OF MULTIOBJECTIVE EVOLUTIONARY OPTIMIZATION ALGORITHMS IN MEDICINE
Michael Lahanas
Department of Medical Physics and Engineering
Klinikum Offenbach, Starkenburgring 66
63069 Offenbach am Main, Germany
E-mail: [email protected]

We present an overview of the application of multiobjective evolutionary algorithms (MOEAs) in medicine. We describe how MOEAs are used for image processing, computer-aided diagnosis, treatment planning and data mining tasks. The benefits of the use of MOEAs in comparison to conventional methods are discussed.
16.1. Introduction

Multiple objectives have to be considered for many real-world problems, and an increasing number of such problems can be solved by multiobjective evolutionary optimization algorithms (MOEAs).1 Previously, the multiobjective (MO) problem was transformed, by using a weighted sum of the individual objectives, into a specific single-objective (SO) optimization problem solved by SO optimization methods. A review of the applications of evolutionary algorithms in medicine has been presented by Pena-Reyes et al.2 We consider here the application of MOEAs in medicine. MOEAs are now used for the solution of many medical problems, from the reconstruction of medical images from projections and the analysis of data for the diagnosis of symptoms up to treatment optimization. MOEAs can be used in scheduling optimization, where hospital resources have to be used optimally under various constraints. For data mining tasks, such as partial classification, MOEAs can be applied for the discovery of rules in medical databases. MOEAs are also used in medicine for the solution of inverse problems. What
is meant, in simplistic terms, is: first you know the ideal answer, and second you take into account any constraints and mathematically determine the optimum parameter values that produce the ideal answer. In other words, you have the result, and the inverse problem is to determine the cause of this result. We have to solve inverse problems in order to determine the internal structure of the human body using measurements performed outside it, that is, non-invasively. This can be done, e.g., by measuring radiation (X-ray, ultrasound, etc.) that passes through the body.

Three strategies can be used for solving MO optimization problems:

• An a priori method. The decision making (DCM) is specified in terms of a scalar function, and an optimization engine is used to obtain the corresponding solution. This approach requires knowledge of the optimal weights (importance factors). Often such knowledge does not exist, and the optimization procedure has to be repeated, by trial and error, with different sets of weights until a satisfactory solution is found.

• An a posteriori method. An optimization engine finds all solutions. Decision making is applied at the end of the optimization, manually or using a decision engine. This method decouples the optimization from the decision-making process; a new decision is possible without having to repeat the optimization.

• A mixture of a priori and a posteriori methods. Information obtained periodically during the optimization may be used to reformulate the goals, as some of them cannot be achieved. Such a method is used by Yan Yu3 for the solution of radiotherapy treatment planning problems.

We consider only applications using the a posteriori method. The two main tasks of this approach are:

• Obtaining a representative set of non-dominated solutions.
• Using the trade-off information to select a solution from this set, i.e. the decision-making process.

16.2.
Medical Image Processing

Three-dimensional imaging modalities such as x-ray computed tomography (CT), 3D ultrasound or magnetic resonance (MR) imaging provide important information to the physician. Image reconstruction from projections is a key problem in medical image analysis. The image to be reconstructed
should reproduce the measured projections as closely as possible. The reconstruction can be viewed as an optimization process that has to consider the effects of noise in the measured projections. An important task in medicine is the segmentation of the images into anatomical structures, which can provide information such as the volume of a tumor. Multiple characteristics, so-called features, such as moments, fractal dimensions, etc., are used for the segmentation process. MOEAs were applied for the reconstruction of 3D objects such as the left ventricle4 and for the selection of an optimal subset of features for optimal segmentation results for MR brain images.5

16.2.1. Medical Image Reconstruction
A MOEA has been proposed by Xiaodong Li et al.6 for the CT image reconstruction problem. Three objectives were considered:

• Minimize the sum of the squared error between the original projection data and the re-projection data:

  f_1(x) = (G − Hx)^T (G − Hx)   (1)

where x is an n-dimensional projection data vector, H is an m × n projection matrix, and G is an m-dimensional projection data vector.

• Maximize the entropy as a measure of the global image smoothness. This is important if the image is contaminated with noise:

  f_2(x) = − Σ_{j=1}^{n} x_j log(x_j)   (2)

• Optimize the local smoothness in the neighborhood N_j of the j-th pixel of the reconstructed image:

  f_3(x) = Σ_j Σ_{x_i ∈ N_j} v(x_j, x_i),   v(x_i, x_j) = (1/2)(x_i − x_j)^2   (3)
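As an illustration, the three objectives can be evaluated with NumPy as follows (a sketch, not the authors' code; the data are hypothetical, and f_3 is specialized to 4-connected pixel neighborhoods):

```python
import numpy as np

def f1_projection_error(x, H, G):
    # Eq. (1): squared error between measured and re-projected data
    r = G - H @ x
    return float(r @ r)

def f2_entropy(x, eps=1e-12):
    # Eq. (2): image entropy, larger for globally smoother images
    x = np.clip(x, eps, None)
    return float(-np.sum(x * np.log(x)))

def f3_local_roughness(img):
    # Eq. (3): half squared differences over 4-connected neighbors
    dx = img[:, 1:] - img[:, :-1]
    dy = img[1:, :] - img[:-1, :]
    return 0.5 * float(np.sum(dx ** 2) + np.sum(dy ** 2))
```

A perfectly smooth image gives f_3 = 0, while f_1 = 0 only when the re-projection matches the measured data exactly; the MOEA trades these off against each other.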
For the optimization the MO genetic local search algorithm MOGLS7 was used. The selection probability P(x^i) for the i-th solution x^i from N solutions is given by

  P(x^i) = (f_max − f(x^i)) / Σ_{j=1}^{N} (f_max − f(x^j))   (4)

where f_max is the maximum (worst) value of f(x) in the current population, and
where the weight vector w = (w_1, w_2, w_3) is used for f(x):

  f(x) = Σ_{j=1}^{3} w_j f_j(x)   (5)
Initially a small value of the parameter A is used, which increases towards the end of the optimization. A window crossover is used and a local search is applied; the number of steps and the step size are varied during the optimization. The parameters to be optimized are the grey values of the reconstructed image. A neighborhood for two images is defined using a parameter δ. A large step size δ is used initially for the local search, with only a few optimization steps. The step size decreases and the number of steps increases during the evolution in order to improve the convergence of the algorithm and the quality of the obtained solutions. The algorithm can be described by the following steps:

(1) Initialize N_g solutions with random images.
(2) Find the non-dominated set.
(3) For each individual:
    (a) Produce a direction specified by the weights (w_1, w_2, w_3).
    (b) Select a pair of parent solutions using Eq. 4.
    (c) Apply crossover.
    (d) Select the best children from the crossover and apply a local search in the specified direction.
(4) Select N_e elite solutions using Eq. 4 and apply a local search to each.
(5) Combine the N_g and N_e solutions.
(6) If the termination conditions are satisfied, stop; else go to step (2).

The reconstruction results using the Shepp-Logan phantom were compared with the fast backprojection algorithm of Matlab 5.2, with and without random noise added to the data. The authors report that the use of MOGLS provides better control of the reconstructed image, especially for noisy data, than the conventional algorithm. The reconstructed image depends on how the single objectives are weighted. The MO approach provides, in principle, a set of all possible images out of which the optimal one can be selected.
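The loop above can be sketched as follows (a simplified sketch under stated assumptions: random objective weights per direction, a uniform-crossover stand-in, and a Gaussian perturbation in place of the paper's window crossover and tailored local search):

```python
import numpy as np

rng = np.random.default_rng(0)

def roulette_probs(f):
    # Eq. (4)-style roulette wheel for minimization:
    # selection probability proportional to f_max - f(x)
    gap = f.max() - f
    return np.full(len(f), 1.0 / len(f)) if gap.sum() == 0 else gap / gap.sum()

def nondominated(F):
    # indices of solutions not dominated by any other (all objectives minimized)
    return [i for i, fi in enumerate(F)
            if not any(np.all(fj <= fi) and np.any(fj < fi)
                       for j, fj in enumerate(F) if j != i)]

def mogls_like(evaluate, pop, generations=20):
    archive = []
    for _ in range(generations):
        F = np.array([evaluate(x) for x in pop])
        archive = [pop[i] for i in nondominated(F)]          # step (2)
        children = []
        for _ in range(len(pop)):                            # step (3)
            w = rng.dirichlet(np.ones(F.shape[1]))           # (a) search direction
            p = roulette_probs(F @ w)                        # Eq. (5), then Eq. (4)
            i, j = rng.choice(len(pop), size=2, p=p)         # (b) parents
            mask = rng.random(pop[0].shape) < 0.5            # (c) crossover stand-in
            child = np.where(mask, pop[i], pop[j])
            children.append(child + rng.normal(0, 0.01, child.shape))  # (d) local move
        pop = children                                       # elitism of steps (4)-(5) omitted
    return archive
```

The sketch returns the non-dominated set of the final population; the published algorithm additionally re-injects elite solutions each generation.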
16.3. Computer Aided Diagnosis

We can define computer-aided diagnosis (CAD) as the diagnosis a physician makes using output from a computerized analysis of medical data. Such information could be the malignancy likelihood from a breast mammogram. A data set of features from both normal (without disease) and abnormal (with disease) cases is used for "training" the classifier, i.e. determining the classifier parameter values so that it correctly classifies other data sets of unknown pathology. The training of a classifier can be viewed as an optimization process where the quantity to be optimized is the performance of the classifier on an independent data set. Binary classifiers8 consider two objective functions: the sensitivity, describing how well they classify the abnormal cases, and the specificity, describing how well they classify the normal cases. There is a trade-off between these two objectives. Traditional methods of classifier training combine these two objective functions, or analogous class performance measures, into a single scalar objective function optimized by SO optimization techniques; various combination functions are tried until a suitable objective function is found.9 Most classifiers do not aggregate sensitivity and specificity directly, such as artificial neural networks (ANN) that use a sum-of-squares error function.10 A binary classifier separates two classes of observations and assigns new observations to one of the two classes, the normal (no disease evident) and abnormal (indicative of disease) class, denoted by p_n and p_a, respectively. The set of features corresponding to an observation can be expressed as a vector x = [x_1, x_2, ..., x_p]. The space spanned by the feature vector is denoted by S. An automated classifier uses a parameter vector w to partition S completely into two disjoint sets of observations: the set C_n(w) that belongs to class p_n and the set C_a(w) belonging to class p_a, i.e. C_n(w) ∪ C_a(w) = S and C_n(w) ∩ C_a(w) = ∅.
The vector w can represent, for example, the weights of an ANN or the threshold values in a rule-based classifier. Given a measurement x, the classifier assigns x to class p_n if x ∈ C_n(w) or to class p_a if x ∈ C_a(w). For MO diagnostic classification the members of the Pareto-optimal set correspond to operating points on an optimal receiver operating characteristic (ROC) curve11, whose performances describe the limiting sensitivity-specificity trade-offs that the classifier can provide for the given training data set.
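The sensitivity-specificity trade-off can be made concrete with a toy threshold classifier (an illustration only; the score values are invented):

```python
def sensitivity_specificity(scores_abnormal, scores_normal, threshold):
    """Classify a case as abnormal when its score exceeds the threshold.

    Returns (sensitivity, specificity): the fractions of abnormal and
    normal cases that are classified correctly.
    """
    tp = sum(s > threshold for s in scores_abnormal)
    tn = sum(s <= threshold for s in scores_normal)
    return tp / len(scores_abnormal), tn / len(scores_normal)

# Sweeping the threshold traces an ROC curve: raising it improves
# specificity at the cost of sensitivity, and vice versa.
abnormal = [0.9, 0.8, 0.6, 0.4]
normal = [0.7, 0.5, 0.3, 0.1]
low = sensitivity_specificity(abnormal, normal, 0.35)
high = sensitivity_specificity(abnormal, normal, 0.75)
```

Each threshold is one operating point; a MO optimizer treats the whole set of such points as candidate solutions instead of collapsing them into one scalar score.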
16.3.1. Optimization of Diagnostic Classifiers
Kupinski et al.11 applied the niched Pareto genetic algorithm NPGA12 to optimize the performance of two diagnostic classifiers and to generate ROC curves. A linear classifier and an ANN were trained both by conventional methods and by NPGA. For NPGA a binary representation was used for the parameter vector w. The optimization was performed using 100 generations. A training set of 100 normal and 100 abnormal cases was used, and for the evaluation a set of 10000 normal and 10000 abnormal cases, respectively. The population size was 500 members. A single-point crossover and standard mutation were used. The crossover and mutation probabilities were 0.3 and 0.05, respectively. These parameters were found to be suitable for the problems studied, even if the crossover rate seems unusually small. The NPGA convergence depends strongly on the tournament size, and a value of four was used. The sharing parameter was 0.1, or 10% of the range of each objective. For the linear classifier three parameters are optimized. For the training of the ANN, with two inputs, two hidden units and one output, nine parameters were optimized, and a very large population of 3000 solutions was used. For both cases, in general, the results using NPGA were superior to the results obtained with conventional training methods. The conventional method for the ANN optimization was sensitive to local minima, much more so than NPGA. The tasks of classifier optimization and ROC curve generation are combined with NPGA into a single task. It was demonstrated that constructing the ROC curve in this way may result in a better ROC curve than is produced by conventional methods. The optimization with NPGA requires more computation time than conventional non-stochastic optimization methods. For the conventional ROC curve generation 20 scalar optimizations were used, producing solutions that were not evenly distributed; NPGA produced a much larger number of uniformly distributed points on the ROC curve.
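The way a final MOEA population yields an ROC curve can be sketched as a non-dominated filter over (sensitivity, specificity) pairs (illustrative values, not Kupinski et al.'s data):

```python
def pareto_roc(points):
    # keep points not dominated in both sensitivity and specificity
    # (both maximized); the survivors are the operating points of the
    # empirical ROC curve
    return sorted(p for p in points
                  if not any(q[0] >= p[0] and q[1] >= p[1] and q != p
                             for q in points))

population = [(0.9, 0.4), (0.8, 0.6), (0.7, 0.5), (0.6, 0.8), (0.5, 0.7)]
roc_points = pareto_roc(population)
```

Dominated operating points such as (0.7, 0.5) are discarded because another point is at least as good in both objectives; the rest trace the limiting trade-off curve directly, with no repeated scalar optimizations.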
If more classes have to be considered in the classification, then the advantage of NPGA over the conventional methods in terms of optimization time will be even greater.

16.3.2. Rules-Based Atrial Disease Diagnosis

An example of CAD using MOEAs for the diagnosis of Paroxysmal Atrial Fibrillation (PAF) was presented by F. Toro et al.13 The heart arrhythmia
most frequently causes embolic events that can generate cerebrovascular accidents. The diagnosis of patients that suffer PAF used the analysis of electrocardiogram (ECG) traces with no explicit fibrillation episode. This non-invasive examination can be used to decide whether more specific and complex diagnostic testing is required. A database for PAF diagnosis applications was used that included registers obtained from 25 healthy individuals and 25 patients diagnosed with PAF. The ECG register was described by 48 parameters (p_1 ... p_48) that characterize each subject. The diagnosis was based on weighted threshold-dependent decision rules determined by a MOEA, applied to improve the ability to automatically discriminate registers of the two groups with a certain degree of accuracy. For each parameter four different decision rules were used:

(1) If p_i < U_i(Low_1) then C_PAF = C_PAF + W_i1
(2) If p_i < U_i(Low_2) then C_PAF = C_PAF − W_i2
(3) If p_i > U_i(High_1) then C_PAF = C_PAF + W_i3
(4) If p_i > U_i(High_2) then C_PAF = C_PAF − W_i4
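These rule templates and the two objectives can be sketched as follows (a toy illustration; the parameter values, thresholds and weights are invented, and only one parameter is shown instead of 48):

```python
def classify_paf(params, thresholds, weights, F=1.0):
    """Apply the four weighted threshold rule templates to one subject.

    params: {i: p_i}; thresholds: {i: (low1, low2, high1, high2)};
    weights: {i: (w1, w2, w3, w4)} with each w in [0, 1].
    Returns +1 (PAF), -1 (healthy) or 0 (undiagnosed: C_PAF in [-F, F]).
    """
    c_paf = 0.0
    for i, p in params.items():
        low1, low2, high1, high2 = thresholds[i]
        w1, w2, w3, w4 = weights[i]
        if p < low1:
            c_paf += w1
        if p < low2:
            c_paf -= w2
        if p > high1:
            c_paf += w3
        if p > high2:
            c_paf -= w4
    if c_paf > F:
        return 1
    if c_paf < -F:
        return -1
    return 0

def cr_cl(predictions, truths):
    """The two optimization objectives: classification rate and coverage."""
    diagnosed = [(p, t) for p, t in zip(predictions, truths) if p != 0]
    cl = len(diagnosed) / len(predictions)
    cr = sum(p == t for p, t in diagnosed) / len(diagnosed) if diagnosed else 0.0
    return cr, cl
```

Widening the security interval [-F, F] leaves more cases undiagnosed (lower CL) but tends to raise the classification rate CR; the MOEA explores exactly this trade-off through the weights and thresholds.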
where U_i represents the different thresholds, and the weights W_ij ∈ [0,1]. For the 48 parameters we have a total of 192 weights and 192 threshold parameters. C_PAF is a level that determines the final diagnosis. After a statistical study, a subset of 32 rules and their associated 32 thresholds was selected that maximizes the discrimination power of the classifier. If the C_PAF level is within a security interval [−F, F], then there is not enough certainty about the diagnosis and the case is left undiagnosed. The diagnosis is positive (a PAF patient) if C_PAF > F and negative if C_PAF < −F. The MO procedure uses two optimization objectives, the classification rate CR and the coverage level CL:

(1) CR = number of correctly diagnosed cases / number of diagnosed cases
(2) CL = number of diagnosed cases / total number of cases

Two PAF diagnosis cases have been considered:

• Optimization of the weights of decision rules given by an expert. In this case the chromosome length is 32.
• Optimization of the thresholds and weights of the decision rules given by an expert. In this case the chromosome length is 64.

Three MOEA algorithms were tested: the Strength Pareto Evolutionary
Algorithm SPEA,14 the Single Front Genetic Algorithm SFGA15 and the New Single Front Genetic Algorithm NSFGA.16 Mutation and crossover probabilities of 0.01 and 0.6, respectively, were used, and each algorithm evolved for 1000 generations with a population size of 200. NSFGA, SFGA and SPEA showed a similar performance. The best results were obtained when both thresholds and weights were optimized. The results are similar to other results using classic schemes, but the MO optimization leads to multiple solutions; this can be of interest for certain patients who suffer from other disorders, for whom certain solutions could be more suitable.

16.4. Treatment Planning

Every year more than one million patients in the United States alone will be diagnosed with cancer. More than 500000 of these will be treated with radiation therapy.17 Cancer cells have a smaller probability than healthy normal cells to survive the radiation damage. The dose is the amount of energy deposited per unit of mass. The physical and biological characteristics of the patient anatomy and of the source, such as intensity and geometry, are used for the calculation of the dose function, i.e. the absorbed dose at a point in the treatment volume. The dose distribution specifies the corresponding three-dimensional non-negative scalar field. A dose distribution is feasible if there is a source distribution which is able to generate it. A physician prescribes the so-called desired dose function, i.e. the absorbed dose as a function of the location in the body. The objectives of dose optimization are:

• Deliver a sufficiently high dose in the Planning Target Volume (PTV), which includes, besides the Gross Tumor Volume (GTV), an additional margin accounting for position inaccuracies, patient movements, etc.
• Protect the surrounding normal tissue (NT) and organs at risk (OARs) from excessive radiation. The dose should be smaller than a critical dose D_cr specific for each OAR.
Radiation oncologists use, for the evaluation of the dose distribution quality, a cumulative dose volume histogram (DVH) for each structure (PTV, NT or OARs), which displays the fraction of the structure that receives at least a specified dose level. The objectives are called DVH-based objectives
if expressed in terms of DVH-related values. The determination of the dose distribution for a given source distribution, the so-called forward problem, is possible and a unique solution exists. The inverse problem, i.e. the determination of the source distribution for a given dose distribution, is not always solvable, or the solution is not unique. Optimization algorithms are therefore used to minimize the difference between the desired and the obtained dose function.

16.4.1. Brachytherapy

High dose rate (HDR) brachytherapy is a treatment method for cancer where empty catheters are inserted within the tumor volume. A single 192Ir source is moved inside the catheters at discrete positions (source dwell positions, SDPs) using a computer-controlled machine. The dose optimization problem considers the determination of the n dwell times (or simply weights) for which the source is at rest and delivers radiation at each of the n dwell positions, resulting in a dose distribution which is as close as possible to the desired dose function. The range of n varies from 20 to 300. If the positions and number of catheters and the SDPs are given after the implantation of the catheters, we term the process postplanning, and the optimization process to obtain an optimal dose distribution is called dose optimization. The additional determination of an optimal number of catheters and their positions, so-called inverse planning, is important, as a reduction of the number of catheters simplifies the treatment plan in terms of time and complexity, reduces the possibility of treatment errors and is less invasive for the patient. Dose optimization can be considered as a special type of inverse planning where the positions and number of catheters and the SDPs are fixed.

16.4.1.1. Dose Optimization for High Dose Rate Brachytherapy

MO dose optimization for HDR brachytherapy was first applied by Lahanas et al.18 using NPGA,12 NSGA19 and NRGA20 with a real encoding for the SDP weights.
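The cumulative DVH described above is straightforward to compute (a minimal sketch; the dose samples and dose grid are hypothetical, in units of the prescription dose):

```python
import numpy as np

def cumulative_dvh(dose_samples, dose_grid):
    """Cumulative DVH: for each dose level, the fraction of the
    structure's sampling points receiving at least that dose."""
    d = np.asarray(dose_samples, dtype=float)
    return np.array([(d >= level).mean() for level in dose_grid])

ptv_doses = [1.1, 0.95, 1.3, 0.8, 1.0]      # sampled doses inside the PTV
dvh = cumulative_dvh(ptv_doses, [0.0, 1.0, 1.5])
```

The DVH value at the prescription dose (level 1.0 here) is exactly the PTV coverage used later as a plan-selection constraint.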
A number of 3-5 DVH-derived objectives, depending on the number of OARs, was used. The results were superior to optimization results using a commercial treatment planning system. More effective was the application of SPEA21 using dose variance-based objectives, which enables support from deterministic algorithms that provide 10-20 solutions with which the population is initialized. A faster optimization than with SPEA was possible using NSGA-II.22,23 Both SPEA and NSGA-II require the support of a deterministic algorithm, which significantly improves the optimization results and the convergence speed. Pareto global optimal solutions can be obtained with L-BFGS, which allows the performance of MOEAs for the HDR dose optimization problem to be evaluated. The optimization with 100 individuals and 100 generations requires less than one minute, which is the time required to obtain a single solution with simulated annealing. NSGA-II was used for dose optimization using DVH-based objectives, for which deterministic algorithms cannot be used as multiple local minima exist.25 The DVH-based objectives provide a larger spectrum of solutions than the dose variance-based objectives. The archiving method of PAES26 was included and the algorithm was supported by L-BFGS solutions using variance-based objectives.23 An SBX crossover27 and polynomial mutation28 were used. Best results were obtained for a crossover probability in the range 0.7-1.0 and a mutation probability of 0.001-0.01.

16.4.1.2. Inverse Planning for HDR Brachytherapy

The NSGA-II algorithm was applied to the HDR brachytherapy inverse planning problem,29 where the optimal position and number of catheters have to be found in addition to the dwell position weights of the selected catheters. A two-component chromosome is used. The first part, W, contains the dwell weight of each SDP for each catheter with a double-precision floating-point representation. The second part, C, is a binary string which represents which catheters have been selected: the so-called active catheters. The inverse planning algorithm is described by the following steps:

(1) Determine geometrically the set of all allowed catheters.
(2) Initialize individuals with solutions from a global optimization algorithm.
(3) Perform a selection based on constrained domination ranking.
(4) Perform an SBX crossover for the SDP weights chromosome and a one-point crossover for the catheter chromosome, with rescaled dwell times.
(5) Perform a polynomial mutation for the SDP weights chromosome and a flip mutation for the catheter chromosome, with rescaled dwell times.
(6) Perform a repair mechanism to set the number of used catheters of each solution within a given range.
(7) Reset the scaling according to the number of active SDPs.
(8) Evaluate the dosimetry for each individual.
(9) If the termination criteria are satisfied, output the set of non-dominated archived solutions; else go to (3).

Inverse planning considers a range of solutions with different numbers of active SDPs. Therefore the dwell weights of the parents before crossover are divided by the number of active SDPs, to be independent of this number. After mutation the weights of each offspring are multiplied by the number of SDPs in the active catheters encoded in the C chromosome. For dose optimization and inverse planning, decision-making (DCM) tools are necessary to filter a single solution30,31 from the non-dominated set that best matches the goals of the treatment planner. Dose optimization and inverse planning with MOEAs, together with DCM tools, were implemented in the commercial Real-Time HDR prostate planning system SWIFT™ (Nucletron B.V., Veenendaal, The Netherlands), and patients are now treated by this system. A display table of a list of values for all solutions of the objectives, the DVHs for all OARs, the NT and the PTV of each solution is provided. Other parameters are D90 (the dose that covers 90% of the PTV), V150 (the percentage of the PTV that receives more than 150% of the prescription dose) and the extreme dose values. The entire table for every such quantity can then be sorted, and solutions can be selected and highlighted by the treatment planner. Constraints can be applied, such as to show only solutions with a PTV coverage, i.e. the percentage of the PTV that receives at least 100% of the prescription dose, larger than a specified value. Solutions that do not satisfy the constraints are removed from the list. This reduces the number of solutions and simplifies the selection of an optimal solution. The DVHs of all selected solutions can be displayed and compared, see Fig. 16.1.
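The dwell-weight rescaling around the variation operators (steps (4), (5) and (7)) can be sketched as follows (hypothetical helper names, not the planning system's API):

```python
def active_sdp_count(catheter_bits, sdps_per_catheter):
    # number of source dwell positions contributed by the active catheters
    return sum(sdps_per_catheter[i] for i, bit in enumerate(catheter_bits) if bit)

def rescale_before_crossover(weights, n_active):
    # divide dwell weights by the number of active SDPs so that parents
    # with different catheter counts are comparable during crossover
    return [w / n_active for w in weights]

def rescale_after_mutation(weights, n_active):
    # restore the scale implied by the offspring's own active catheters
    return [w * n_active for w in weights]
```

Dividing by the active-SDP count before crossover and multiplying back afterwards keeps the total delivered dose roughly invariant when the catheter chromosome C changes between parent and offspring.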
Other decision-making tools are projections of the Pareto front onto a pair of selected objectives. For M objectives the number of such projections is M(M−1)/2. The position of selected solutions can be seen in these projections. This helps to identify their position in the multidimensional Pareto front and to quantify the degree of correlation between the objectives and the possibilities provided by the non-dominated set. The Pareto front provides information such as the range of values for each objective and the trade-off
Fig. 16.1. Example of DVHs (a) for the PTV and (b) for the urethra of a representative set of non-dominated solutions. A single solution selected by a treatment planner is shown.
between other DVH-derived quantities, see Fig. 16.2.
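The constraint-based filtering and sorting of the solution table can be sketched as follows (a toy example; the plan dictionaries and key names are invented):

```python
def filter_and_rank(solutions, min_coverage, sort_key="d90"):
    """Drop solutions violating the PTV-coverage constraint, then sort
    the remaining table by a chosen quantity (keys are illustrative)."""
    kept = [s for s in solutions if s["coverage"] >= min_coverage]
    return sorted(kept, key=lambda s: s[sort_key], reverse=True)

plans = [
    {"coverage": 0.95, "d90": 1.05, "v150": 0.32},
    {"coverage": 0.80, "d90": 1.20, "v150": 0.45},
    {"coverage": 0.91, "d90": 0.98, "v150": 0.25},
]
shortlist = filter_and_rank(plans, min_coverage=0.90)
```

As in the SWIFT display table, the constraint removes non-conforming solutions outright, and sorting by any column lets the treatment planner inspect the survivors one quantity at a time.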
Fig. 16.2. Example of a trade-off between the percentage of the PTV that is covered at least with the prescribed dose DVH(D_ref) and the percentage of volume with a dose higher than a critical dose limit (a) for the urethra and (b) for the rectum. For the urethra a rapidly increasing fraction receives an overdosage as the coverage of the PTV increases above 80%.
With MOEAs the best possible solution can be obtained, considering the objective functions and the implant geometry, and this increases the probability of treatment success.

16.4.2. External Beam Radiotherapy
In external beam radiotherapy, or teletherapy, high-energy photon beams are emitted from a source on a rotating gantry, with the patient placed so that the tumor is at the center of the rotation axis. Haas et al.32,33
proposed the use of MOEAs for the solution of the two main problems in radiotherapy treatment planning: first, find an optimal number of beams and their orientations; second, determine the optimum intensity distribution for each beam. Both problems are considered separately. A beam configuration is selected based on experience or using geometric methods; then the intensity distributions of these beams are optimized. In the last few years mostly SO algorithms have been proposed for the simultaneous solution of both problems.

16.4.2.1. Geometrical Optimization of Beam Orientations

The aim of beam orientation optimization is to find a configuration of beams such that a desired dose distribution can be achieved. A single beam would deposit a very high dose in the NT. Using more beams it is possible to increase the dose in the tumor while keeping the dose in the surrounding healthy tissue at a sufficiently low level, but the treatment complexity increases. The idea of using geometrical considerations in the cost function was first proposed by Haas et al.,34 with NPGA used to obtain an optimum beam configuration. Simplifications such as a limitation to 2D, using the most representative 2D computed tomography slice in the plan, have been used. The geometric objective functions to be minimized are:

(1) The difference between the area where all M beams overlap and the area of the PTA:

  f_PTA = area(B_1 ∩ B_2 ∩ ... ∩ B_M) − area(PTA)   (6)

(2) The overlap area between each beam and the j-th OAR:

  f_OAR_j = Σ_{i=1}^{M} p_i · area(B_i ∩ OAR_j) / M   (7)

  p_i = β(S_PTA − S_OAR)   if S_OAR < S_PTA
  p_i = 1                  if S_OAR ≥ S_PTA   (8)

where S_PTA and S_OAR are distances shown in Fig. 16.3 and β is a parameter that favors beam entry points further away from OARs.

(3) The overlap from pairwise beam intersections, to minimize hot spots:

  f_NT = Σ_{i=1}^{M−1} Σ_{j=i+1}^{M} area(B_i ∩ B_j)   (9)
Fig. 16.3. Geometric parameters used for the solution of the beam orientation problem. The gantry angle θ of a field (beam) is shown. The patient body, including the normal tissue NT, one organ at risk (OAR) and the planning target area (PTA), which includes the tumor, is shown.
An example of the geometry of a radiation field and the parameters used by Haas et al. is shown in Fig. 16.3. An integer representation for the beam gantry angle was used. The length of each chromosome is equal to the number of beams involved in the plan. A particular solution, i.e. a chromosome, is represented as a vector C = (θ_1, ..., θ_M), where θ_i is the i-th individual beam gantry angle. For the integer representation an intermediate recombination is used,35 such that the parents C_P1 and C_P2 produce the offspring C_O:

  C_O = round(C_P1 + γ(C_P2 − C_P1))   (10)

where γ is a random number in the interval [−0.25, 1.25].36 A mutation operator is used to introduce new beam angles into the population by generating integers that lie in the range [0 ... 359°]. Important was the inclusion of problem-specific operators which attempt to replicate the approach followed by experienced treatment planners. One such operator is used to generate k equispaced beams, as this distribution will reduce the area of overlap between the beams: one gantry angle from a particular chromosome is selected randomly, and the k − 1 remaining beams are positioned evenly. A further mutation operator is used to perform a local search by shifting one of the selected beam gantry angles randomly by a small amount (less than 15°).
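The recombination of Eq. (10) and the two problem-specific mutation operators can be sketched as follows (illustrative function names; angle handling is simplified to modular arithmetic):

```python
import random

random.seed(1)

def intermediate_recombination(parent1, parent2):
    # Eq. (10): offspring angle = round(p1 + gamma * (p2 - p1)),
    # gamma drawn from [-0.25, 1.25], wrapped into [0, 360)
    child = []
    for a, b in zip(parent1, parent2):
        gamma = random.uniform(-0.25, 1.25)
        child.append(round(a + gamma * (b - a)) % 360)
    return child

def equispace_mutation(chromosome):
    # problem-specific operator: keep one randomly chosen gantry angle
    # and place the remaining k-1 beams evenly around the gantry circle
    k = len(chromosome)
    anchor = random.choice(chromosome)
    return [(anchor + i * 360 // k) % 360 for i in range(k)]

def local_shift_mutation(chromosome, max_shift=15):
    # local search: shift one randomly selected angle by less than max_shift degrees
    out = list(chromosome)
    i = random.randrange(len(out))
    out[i] = (out[i] + random.randint(-max_shift + 1, max_shift - 1)) % 360
    return out
```

The equispacing operator encodes planner experience directly in the search, while the small local shift refines a promising configuration without destroying it.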
16.4.2.2. Intensity Modulated Beam Radiotherapy Dose Optimization

In IMRT each beam is divided into a number of small beamlets (bixels), see Fig. 16.4. The intensity of each beamlet can be individually adjusted. A sparse dose matrix is precalculated; it contains the dose value at each sampling point from each bixel with a unit radiation intensity. The intensity (weight) of each beamlet has to be determined such that the produced dose distribution is "optimal". The number of parameters can be as large as 10000.
Fig. 16.4. Principle of IMRT dose optimization. The contours of the body, the PTV and one OAR are shown. The problem is to determine the intensities of the tiny subdivisions (bixels) of each beam, so that the resulting dose distribution is optimal.
Lahanas et al. used the NSGA-IIc37 algorithm for the optimization of the intensity distribution in IMRT, where the orientation and the number of beams are fixed.38 The dose variance-based objective functions are: for the PTV, the dose variance f_PTV around the prescription dose D_ref; for the NT, the sum of the squared dose values f_NT; and for each OAR, the variance f_OAR for dose values above a specific critical dose value D_cr^OAR:

  f_PTV = (1/N_PTV) Σ_{j=1}^{N_PTV} (d_j^PTV − D_ref)^2   (11)

  f_NT = (1/N_NT) Σ_{j=1}^{N_NT} (d_j^NT)^2   (12)

  f_OAR = (1/N_OAR) Σ_{j=1}^{N_OAR} H(d_j^OAR − D_cr^OAR) (d_j^OAR − D_cr^OAR)^2   (13)
H(x) is the Heaviside step function. d_j^PTV, d_j^NT and d_j^OAR are the calculated dose values at the j-th sampling point for the PTV, the NT and each OAR, respectively. N_PTV, N_NT and N_OAR are the corresponding numbers of sampling points. Depending on the number of OARs we have 3-6 objectives. For the multidimensional problem it was required to use supported solutions,39 i.e. solutions initialized by another optimization algorithm. Even if constraints can be used for some of the objectives, a large number of non-dominated solutions is required to obtain a representative set of the multidimensional Pareto front. An archive was used, similar to the PAES algorithm, in which all non-dominated solutions are archived. This allows the population size to be kept in the range 200-500 and the optimization time below one hour. Tests show that NSGA-IIc and SPEA alone are not able to produce high-quality solutions: only a very small local Pareto-optimal front can be found, far away from the very extended global Pareto front that can be obtained by the gradient-based optimization algorithm L-BFGS.24 Strong correlations exist between the optimization parameters, which could be the reason for the efficiency of the L-BFGS algorithm, which uses gradient information not available to the genetic algorithms. Using a fraction of solutions initialized by L-BFGS and an arithmetic crossover, NSGA-IIc is able to produce a representative set of non-dominated solutions in less time than is required by running the L-BFGS algorithm sequentially, each time with a different set of importance factors. Previous methods used in IMRT include simulated annealing,40 which is very slow, iterative approaches and filtered back-projection.41 The large number of objectives and the non-linear mapping from decision to objective space require a very large number of solutions to obtain a representative non-dominated set with SO optimization algorithms.44
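Eqs. (11)-(13) translate directly into code (a sketch; the dose samples and critical dose are hypothetical):

```python
import numpy as np

def f_ptv(d_ptv, d_ref):
    # Eq. (11): dose variance around the prescription dose in the PTV
    d = np.asarray(d_ptv, dtype=float)
    return float(np.mean((d - d_ref) ** 2))

def f_nt(d_nt):
    # Eq. (12): mean squared dose in the normal tissue
    d = np.asarray(d_nt, dtype=float)
    return float(np.mean(d ** 2))

def f_oar(d_oar, d_cr):
    # Eq. (13): variance of OAR doses above the critical dose; the
    # Heaviside factor keeps only the overdosed sampling points
    d = np.asarray(d_oar, dtype=float)
    over = np.maximum(d - d_cr, 0.0)
    return float(np.mean(over ** 2))
```

All three are smooth in the dose values (and hence in the bixel weights through the precalculated dose matrix), which is what makes gradient-based support by L-BFGS possible for these objectives.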
The benefit of using MOEAs 44 is the information about the trade-off between the objectives, which is essential for selecting an optimal solution. E. Schreibmann et al 45 applied NSGA-IIc to IMRT inverse planning. The user specifies a minimum and maximum number of beams, usually 3-9, to be considered. Constraints can be applied by using the constraint domination relation. A two-component chromosome is used, with one part for weights and one for beams, similar to inverse planning in brachytherapy. After mutation, L-BFGS is applied with 30 iterations to optimize the intensity
Application of Multiobjective Evolutionary Optimization Algorithms in Medicine 381
distributions of each solution. The number of iterations increases during the evolution. Clinically acceptable results can be obtained in one hour. More than 5000 archived solutions are obtained after 200 generations using a population size of 200 solutions. Arithmetic crossover is used with a random mixing parameter α ∈ [0,1], together with a flip mutation. Mutation and crossover probabilities of 0.01 and 0.9, respectively, are used.

16.4.3. Cancer Chemotherapy

Petrovski et al 46 applied MOEAs to the cancer chemotherapy treatment problem. Anti-cancer drugs are given to a patient in n doses at times t_1, ..., t_n. Each dose is a cocktail of d drugs characterized by the concentrations C_ij, i = 1, ..., n, j = 1, ..., d. The problem is the optimization of the concentrations C_ij. The response of the tumor to the chemotherapy treatment is modelled analytically by:
dN/dt = N(t) [ λ ln(Θ/N(t)) − Σ_{j=1}^{d} κ_j Σ_{i=1}^{n} C_ij (H(t − t_i) − H(t − t_{i+1})) ]    (14)
where N(t) is the number of tumor cells at time t, λ and Θ are tumor growth parameters, H(t) is the Heaviside step function, and κ_j denotes the efficacy of the j-th anti-cancer drug. The objectives of the MO optimization are:

(1) Maximization of the tumor eradication:
f_1(c) = ln(Θ / N(t_n))    (15)
(2) Prolongation of the patient survival time T:

f_2(c, t_1, ..., t_n) = T    (16)
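To make the model concrete, the tumour-response equation (14) can be integrated numerically; the Euler scheme, the parameter values and the function names below are illustrative assumptions, not part of the original formulation:

```python
import math

def simulate_tumour(n0, lam, theta, kappa, conc, times, t_end, dt=0.01):
    """Euler integration of dN/dt = N [lam*ln(theta/N) - sum_j kappa_j*C_ij],
    where dose row i of conc is active on the interval [t_i, t_{i+1})."""
    n, t = n0, 0.0
    while t < t_end:
        drug = 0.0
        for i in range(len(times) - 1):
            if times[i] <= t < times[i + 1]:
                drug = sum(k * c for k, c in zip(kappa, conc[i]))
                break
        n += dt * n * (lam * math.log(theta / n) - drug)
        t += dt
    return n

# Hypothetical single-drug schedule with two dose intervals over [0, 10):
untreated = simulate_tumour(1e9, 0.1, 1e12, [0.2], [[0.0], [0.0]],
                            [0.0, 5.0, 10.0], 10.0)
treated = simulate_tumour(1e9, 0.1, 1e12, [0.2], [[10.0], [10.0]],
                          [0.0, 5.0, 10.0], 10.0)
```

With a sufficiently strong κ_j the concentration term outweighs the Gompertz growth term λ ln(Θ/N) and the simulated cell count shrinks; this is exactly the trade-off that objectives (15) and (16) and the toxicity constraints below negotiate.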
The toxic nature of the drugs places limits on the single and combined concentrations used in the treatment. The concentrations C_ij have to satisfy various constraints:

• Maximum instantaneous dose C_max,j for each drug:

g_1(c) = { C_max,j − C_ij ≥ 0, ∀ i ∈ [1,...,n], ∀ j ∈ [1,...,d] }    (17)
382
M. Lahanas
• Maximum cumulative dose C_cum,j for each drug:

g_2(c) = { C_cum,j − Σ_{i=1}^{n} C_ij ≥ 0, ∀ j ∈ [1,...,d] }    (18)
• Maximum allowed tumor size:

g_3(c) = { N_max − N(t_i) ≥ 0, ∀ i ∈ [1,...,n] }    (19)
• Restriction of the toxic side effects of the chemotherapy:

g_4(c) = { C_s-eff,k − Σ_{j=1}^{d} η_kj C_ij ≥ 0, ∀ i ∈ [1,...,n], ∀ k ∈ [1,...,m] }    (20)

Here η_kj represents the risk of damaging the k-th organ or tissue by the j-th drug. SPEA was used for the MO chemotherapy treatment optimization with a maximum of 10000 generations. The crossover probability was 0.6 and a large mutation probability of 0.1 was used. The population size was N = 50 and the external archive size of SPEA was 5. A binary encoding was used for the decision variables C_ij: each individual is represented by n d-dimensional vectors, each entry encoded with 4 bytes corresponding to 25 possible concentration units for each drug. For the optimization the constraints are added to the objective functions as penalties:

Σ_{j=1}^{m} P_j max^2(−g_j(c), 0)    (21)
where P_j are penalty parameters. For a breast cancer chemotherapy treatment case, d = 3 drugs were considered (Taxotere, Adriamycin and Cisplatinum) with n = 10. New treatment scenarios were found by the application of SPEA. The representative set contains a number of treatment strategies, some of which were not found by SO optimization algorithms. This provides the therapists with a larger repertoire of treatment strategies, out of which the most suitable for a given case can be chosen.

16.5. Data Mining

Knowledge Discovery can be seen as the process of identifying novel, useful and understandable patterns in large data sets.
The goal of classification 47 is to predict the value (the class) of a user-specified goal attribute based on the values of other attributes, the so-called predicting attributes. Classification rules can be considered a particular kind of prediction rules where the rule antecedent ("IF part") contains a combination - typically, a conjunction - of conditions on predicting attribute values, and the rule consequent ("THEN part") contains a predicted value for the goal attribute. Complete classification may be infeasible when there is a very large number of class attributes. Partial classification, known as nugget discovery, seeks to find patterns that represent a "strong" description of a particular class. The consequent is fixed to be a particular named class. Given a record t, antecedent(t) is true if t satisfies the predicate antecedent. Similarly, consequent(t) is true if t satisfies the predicate consequent. The subsets defined by the antecedent or consequent are the sets of records for which the relevant predicate is true. Three sets of records are defined 47:

A = {t ∈ D | antecedent(t)}, i.e. the set of records defined by the antecedent,
B = {t ∈ D | consequent(t)}, i.e. the set of records defined by the consequent,
C = {t ∈ D | antecedent(t) ∧ consequent(t)}.

The cardinalities of these sets are a, b and c respectively. The confidence conf(r) and the coverage cov(r) of a rule r are:

• conf(r) = c/a
• cov(r) = c/b

A strong rule may be defined as one that meets certain confidence and coverage thresholds, normally set by a user.

16.5.1. Partial Classification

Iglesia et al 47 used NSGA-II for nugget discovery. An alternative algorithm, ARAC, which can deliver the global Pareto-optimal front of all partial classification rules above a specified confidence/coverage threshold, was used for the analysis of the NSGA-II results. The objectives used are conf(r) and cov(r). The antecedent comprises a conjunction of Attribute Tests, ATs.
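The confidence and coverage of a candidate rule follow directly from the counts a, b and c defined above; a minimal sketch, in which the records, predicates and values are invented for illustration:

```python
def conf_cov(records, antecedent, consequent):
    """Return (confidence, coverage) = (c/a, c/b) for a rule over records."""
    a = sum(1 for t in records if antecedent(t))           # |A|
    b = sum(1 for t in records if consequent(t))           # |B|
    c = sum(1 for t in records if antecedent(t) and consequent(t))  # |C|
    return (c / a if a else 0.0, c / b if b else 0.0)

# Toy database of (attribute, class) pairs
db = [(1, "pos"), (1, "neg"), (0, "pos"), (1, "pos")]
rule = conf_cov(db, lambda t: t[0] == 1, lambda t: t[1] == "pos")
# a = 3, b = 3, c = 2, so rule == (2/3, 2/3)
```

A nugget-discovery MOEA would treat the two returned values as the objectives to maximize jointly.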
A binary encoded string is used to represent the solution as a conjunctive rule. The first part of the string represents the m numeric attributes; each numeric attribute is represented by a set of Gray-coded lower and upper limits
using 10 bits. For all attributes, when the data is loaded, the maximum and minimum values are calculated and stored. The second part of the string represents the categorical attributes, with as many bits for each attribute as distinct values the categorical attribute can take. If a bit assigned to a categorical attribute is set to 0 then the corresponding label is included as an inequality in one of the conjuncts. To evaluate a solution, the bit string is first decoded, and the data in the database is scanned. For a database with n attributes, the ATs for nominal attributes can be expressed in various forms, such as: ATj = v, where v is a value from the domain of ATj, for some 1 ≤ j ≤ n; a database record x meets this simple value test if x[ATj] = v. ATj ≠ v for some 1 ≤ j ≤ n; a record x meets this inequality test if x[ATj] ≠ v. A decoded bit string thus represents a rule of the form IF (conjunction of ATs) THEN (named class).
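Decoding one 10-bit Gray-coded limit back to an attribute value can be sketched as follows; the linear scaling between the stored minimum and maximum is an assumption about the encoding, not stated explicitly in the text:

```python
def gray_to_int(bits):
    """Decode a Gray-coded bit sequence (most significant bit first)."""
    b = bits[0]          # first binary bit equals first Gray bit
    n = b
    for g in bits[1:]:
        b ^= g           # next binary bit = previous binary bit XOR Gray bit
        n = (n << 1) | b
    return n

def decode_limit(bits, lo, hi):
    """Map a Gray-coded integer linearly onto the attribute range [lo, hi]."""
    return lo + (hi - lo) * gray_to_int(bits) / (2 ** len(bits) - 1)

# Gray code 111 decodes to binary 101 = 5;
# decode_limit([1] * 10, 0.0, 8.0) maps Gray 1111111111 (= 682) into [0, 8]
```

Gray coding is a common choice here because mutating a single bit changes the decoded limit by only one step, which keeps the genotype-to-phenotype mapping smooth.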
they were at the time of the interview. The problem is to predict the current contraceptive method choice (42% "no use", 23% "long-term methods", or 35% "short-term methods" cases) of a woman based on demographic and socio-economic characteristics. The results of NSGA-II were compared with results obtained with ARAC, which is able to find all non-dominated solutions, and thus the global Pareto-optimal front. For large databases, however, the computational time of ARAC increases rapidly, and the complexity is such that its actual computational time is unknown. The classification time of the MOEA is proportional to the database size, but the number of rules found is limited by the population size. The results showed that NSGA-II reproduced fairly well the non-dominated front obtained by ARAC. For large databases, the MOEA can be used to find a good approximation of the Pareto-optimal set of rules. ARAC can be used to find an initial set of rules with which to initialize the MOEA; for large databases this initial set may be a constrained set, but the MOEA can then be used to drive the search further without constraints. NSGA-II and ARAC can thus be used in combination for knowledge discovery in large databases for the partial classification task.

16.5.2. Identification of Multiple Gene Subsets

Data mining has proven to be an important tool in DNA microarray data analysis by uncovering patterns and relationships in gene expression data. Microarrays have revolutionized the way in which researchers analyze gene expression patterns. It is possible to screen a large number of genes and to observe their activity under various conditions. This gene-expression profiling is expected to revolutionize cancer diagnosis. Reddy and Deb 48 applied the NSGA-II algorithm to the identification of gene subsets for the classification of samples. Microarray data for three types of cancer (leukemia, lymphoma and colon) were analyzed.
The data sets contain expression levels for a few thousand genes. The data is divided into a training and a test subset, and the objectives are:

(1) Minimize the gene subset size.
(2) Minimize the number of misclassifications in the training set.
(3) Minimize the number of misclassifications in the test set.

A binary string was used whose length corresponds to the number of genes to be considered. Genes with a corresponding bit set to 1 are
included in the gene subset of the individual. The population is initialized with only 10% of the bits set to 1. A single-point crossover and a bit-wise mutation operator were used. For the leukemia case 6817 genes are considered, and a population size of 1000 individuals with 2000 generations was used. The lymphoma and colon data sets have 4026 and 2000 genes respectively. The results showed that a 100% classification for leukemia and lymphoma can be obtained with only a few genes. A similar result is obtained for the colon data set, with a smaller classification rate. The NSGA-II algorithm was modified to accumulate solutions that have different phenotypes but identical objective values. This multi-modal NSGA-II version discovered, for example, 630 different three-gene combinations for the leukemia set that achieve a perfect classification. A small number of genes were found to recur frequently in the various combinations; their role has to be examined from a biological point of view. The analysis shows that MOEAs are able to provide subsets of genes that are important for a high level of classification.

16.6. Conclusions

In medicine, MOEAs provide for image reconstruction problems, classification and CAD a range of solutions out of which an optimal solution can be selected. These solutions are in general better than solutions obtained by SO optimization algorithms. The advantage of MOEAs over conventional scalar-objective optimization is more pronounced for complex problems with many objectives, where a very large number of SO optimization runs is necessary to obtain a representative non-dominated set. With conventional methods, the mapping from decision to objective space produces solutions that are clustered and not necessarily uniformly distributed over the entire Pareto front. MOEAs can produce more uniformly distributed solutions, including in regions not accessible by conventional convex weighted scalar optimization.
MOEAs are now used increasingly in radiotherapy treatment planning in clinical practice, especially in HDR brachytherapy. The possibility exists to use MOEAs also for low dose rate (LDR) brachytherapy treatment, where currently SO genetic algorithms or evolutionary MO methods guided by artificial intelligence are used. 3 MOEAs have been applied to IMRT dose optimization and inverse planning, where the number of parameters is very
large. MOEAs, with support from deterministic algorithms, can provide a representative set of non-dominated solutions of clinically acceptable quality. Inverse planning with MOEAs determines optimal beam directions and numbers of fields for specific types of cancer. MO optimization determines a representative set of the entire, sometimes unexpectedly complex, Pareto front. While the optimization aspects have been discussed in detail, the decision-making process for the selection of an optimal solution is not considered in most of the presented MOEA applications. For HDR brachytherapy treatment planning with MOEAs, tools have been developed that allow the planner to determine an optimal solution via visualization methods. For some high-dimensional problems MOEAs alone fail to produce sufficiently good solutions: the solutions are far from the global Pareto-optimal front, and even with a large number of generations the population converges prematurely. Initialization of the population with solutions provided by other methods, and the inclusion of problem knowledge, significantly improves the performance of MOEAs. Such a case is the hybrid algorithms applied in IMRT, which require the optimization of as many as 5000 or more parameters; in cooperation with deterministic gradient-based optimization algorithms they produce clinically acceptable solutions sufficiently fast.
References

1. C. A. Coello Coello, D. A. Van Veldhuizen and G. B. Lamont, Evolutionary Algorithms for Solving Multi-Objective Problems, (Kluwer Academic Publishers, New York, 2002).
2. C. A. Pena-Reyes and M. Sipper, Evolutionary Computation in Medicine: An Overview, Artificial Intelligence in Medicine 19, 1-23 (2000).
3. Yan Yu, Multiobjective decision theory for computational optimization in radiation therapy, Med. Phys. 24, 1445-1454 (1997).
4. J. Aguilar and P. Miranda, Resolution of the left ventricle 3D reconstruction problem using approaches based on genetic algorithm for multiobjective problems, in Proceedings of the 1999 Conference on Evolutionary Computation (eds. P. Angeline, Z. Michalewicz, M. Schoenauer, X. Yao and A. Zalzala), pp. 913-920, Vol. 2, Washington, D.C., July 1999.
5. H. E. Rickard, Feature selection for self-organizing feature map neural networks with applications in medical image segmentation, Masters Thesis, Department of Electrical Engineering, University of Louisville, Dec. 2001.
6. X. Li, T. Jiang and D. J. Evans, Medical image reconstruction using a multi-objective genetic local search algorithm, Intern. J. Computer Math. 74, 301-314 (2000).
7. H. Ishibuchi and T. Murata, A multi-objective genetic local search algorithm and its application to flowshop scheduling, IEEE Trans. Systems, Man and Cybernetics, Part C: Applications and Reviews 28, 392-403 (1998).
8. L. Devroye, L. Györfi and G. Lugosi, A Probabilistic Theory of Pattern Recognition, (Springer Verlag, New York, 1996).
9. M. A. Anastasio, H. Yoshida, R. Nagel, R. M. Nishikawa and K. Doi, A genetic algorithm-based method for optimizing the performance of a computer-aided diagnosis scheme for detection of clustered microcalcifications in mammograms, Med. Phys. 25, 1613-1620 (1998).
10. C. Bishop, Neural Networks for Pattern Recognition, (Oxford Univ. Press, Oxford, UK, 1995).
11. M. A. Kupinski and M. A. Anastasio, Multiobjective genetic optimization of diagnostic classifiers with implications for generating receiver operating characteristic curves, IEEE Transactions on Medical Imaging 18, 675-685 (1999).
12. J. Horn, N. Nafpliotis and D. E. Goldberg, A Niched Pareto Genetic Algorithm for Multiobjective Optimization, in Proceedings of the First IEEE Conference on Evolutionary Computation, IEEE World Congress on Computational Intelligence, IEEE Service Center, Vol. 1, pp. 82-87, June 1994.
13. F. de Toro, E. Ros, S. Mota and J. Ortega, Non-invasive Atrial Disease Diagnosis Using Decision Rules: A Multi-objective Optimization Approach, in C. M. Fonseca, P. J. Fleming, E. Zitzler, K. Deb and L. Thiele (eds.), Evolutionary Multi-Criterion Optimization, Second International Conference, EMO 2003, pp. 638-647, Springer, Lecture Notes in Computer Science, Vol. 2632, Faro, Portugal, April 2003.
14. E. Zitzler and L. Thiele, Multiobjective Evolutionary Algorithms: A Comparative Case Study and the Strength Pareto Approach, IEEE Transactions on Evolutionary Computation 3, 257-271 (1999).
15. F. de Toro, J. Ortega, J. Fernandez and A. F. Diaz, PSFGA: A Parallel Genetic Algorithm for Multiobjective Optimization, in 10th Euromicro Workshop on Parallel, Distributed and Network-Based Processing, F. Vajda and N. Podhorszki (eds.), IEEE, pp. 384-391, 2002.
16. F. de Toro, E. Ros, S. Mota and J. Ortega, Multi-objective Optimization Evolutionary Algorithms Applied to Paroxysmal Atrial Fibrillation Diagnosis Based on the k-Nearest Neighbours Classifier, pp. 313-318 in F. J. Garijo, J. C. R. Santos and M. Toro (eds.), Advances in Artificial Intelligence - IBERAMIA 2002 Proceedings, Lecture Notes in Computer Science 2527, Springer, 2002.
17. C. A. Peres and L. W. Brady, Principles and Practice of Radiotherapy, (Lippincott-Raven, Philadelphia, 3rd edition, 1998).
18. M. Lahanas, D. Baltas and N. Zamboglou, Anatomy-based three-dimensional dose optimization in brachytherapy using multiobjective genetic algorithms, Med. Phys. 26, 1904-1918 (1999).
19. N. Srinivas and K. Deb, Multiobjective Optimization Using Nondominated Sorting in Genetic Algorithms, Evolutionary Computation 2, 221-248 (1994).
20. C. M. Fonseca and P. J. Fleming, Multiobjective optimization and multiple constraint handling with evolutionary algorithms I: A unified formulation, Research Report 564, Dept. Automatic Control and Systems Eng., University of Sheffield, Sheffield, U.K., Jan. 1995.
21. N. Milickovic, M. Lahanas, D. Baltas and N. Zamboglou, Comparison of evolutionary and deterministic multiobjective algorithms for dose optimization in brachytherapy, in Proceedings of the First International Conference, EMO 2001, Zurich, Switzerland, edited by E. Zitzler, K. Deb, L. Thiele, C. A. Coello Coello and D. Corne, Lecture Notes in Computer Science, Vol. 1993, Springer, pp. 167-180, 2001.
22. K. Deb, A. Pratap, S. Agarwal and T. Meyarivan, A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II, IEEE Transactions on Evolutionary Computation 6, 182-197 (2002).
23. M. Lahanas, D. Baltas and N. Zamboglou, A hybrid evolutionary multiobjective algorithm for anatomy-based dose optimisation in HDR brachytherapy, Phys. Med. Biol. 48, 399-415 (2003).
24. D. C. Liu and J. Nocedal, On the limited memory BFGS method for large scale optimization, Mathematical Programming 45, 503-528 (1989).
25. J. O. Deasy, Multiple local minima in radiotherapy optimization problems with dose-volume constraints, Med. Phys. 24, 1157-1161 (1997).
26. J. D. Knowles and D. W. Corne, Approximating the nondominated front using the Pareto Archived Evolution Strategy, Evolutionary Computation 8, 149-172 (2000).
27. K. Deb and R. B. Agrawal, Simulated binary crossover for continuous search space, Complex Systems 9, 115-148 (1995).
28. K. Deb and M. Goyal, A combined genetic adaptive search (GeneAS) for engineering design, Computer Science and Informatics 26, 30-45 (1996).
29. M. Lahanas, K. Karouzakis, S. Giannouli, R. F. Mould and D. Baltas, Inverse planning in brachytherapy: Radium to High Dose Rate 192-Iridium Afterloading, to be published in Nowotwory Journal of Oncology, 2004.
30. C. A. Coello Coello, Handling Preferences in Evolutionary Multiobjective Optimization: A Survey, in Congress on Evolutionary Computation, IEEE Service Center, Vol. 1, pp. 30-37, Piscataway, New Jersey, July 2000.
31. D. Cvetkovic and I. C. Parmee, Preferences and their Application in Evolutionary Multiobjective Optimisation, IEEE Transactions on Evolutionary Computation 6, 42-57 (2002).
32. O. C. L. Haas, K. J. Burnham and J. A. Mills, On improving the selectivity in the treatment of cancer: a systems modelling and optimization approach, Control Engineering Practice 5, 1739-1745 (1997).
33. O. C. L. Haas, Radiotherapy Treatment Planning: New System Approaches, Advances in Industrial Control Monograph, (Springer Verlag, London, 1999).
34. O. C. L. Haas, K. J. Burnham and J. A. Mills, Optimization of beam orientation in radiotherapy using planar geometry, Phys. Med. Biol. 43, 2179-2193 (1998).
35. O. C. L. Haas, Optimisation and control systems modelling in radiotherapy treatment planning, PhD Thesis, Coventry University, 1997.
36. A. Chipperfield, P. Fleming, H. Polheim and C. Fonseca, Genetic Algorithm Toolbox User's Guide, Research Report 512, University of Sheffield, Department of Automatic Control and Systems Engineering, 1995.
37. K. Deb and T. Goel, Controlled elitist non-dominated sorting genetic algorithms for better convergence, in Proceedings of the First International Conference, EMO 2001, Zurich, Switzerland, edited by E. Zitzler, K. Deb, L. Thiele, C. A. Coello Coello and D. Corne, Lecture Notes in Computer Science, Vol. 1993, Springer, pp. 67-81, 2001.
38. M. Lahanas, E. Schreibmann, N. Milickovic and D. Baltas, Intensity modulated beam radiation therapy dose optimization with multiobjective evolutionary algorithms, in C. M. Fonseca, P. J. Fleming, E. Zitzler, K. Deb and L. Thiele (eds.), Evolutionary Multi-Criterion Optimization, Second International Conference, EMO 2003, pp. 648-661, Springer, Lecture Notes in Computer Science, Vol. 2632, Faro, Portugal, April 2003.
39. X. Gandibleaux, H. Morita and N. Katoh, The supported solutions used as a genetic information in population heuristic, in Proceedings of the First International Conference, EMO 2001, Zurich, Switzerland, edited by E. Zitzler, K. Deb, L. Thiele, C. A. Coello Coello and D. Corne, Lecture Notes in Computer Science, Vol. 1993, Springer, pp. 429-442, 2001.
40. S. Webb, Optimization of conformal radiotherapy dose distributions by simulated annealing, Phys. Med. Biol. 34, 1349-1370 (1989).
41. T. Bortfeld, J. Bürkelbach, R. Boesecke and W. Schlegel, Methods of image reconstruction from projections applied to conformation therapy, Phys. Med. Biol. 35, 1423-1434 (1990).
42. J. D. Knowles, D. Corne and J. M. Bishop, Evolutionary Training of Artificial Neural Networks for Radiotherapy Treatment of Cancers, in Proceedings of the 1998 IEEE International Conference on Evolutionary Computation, IEEE Neural Networks Council, 0-7803-4871-0, pp. 398-403, 1998.
43. J. D. Knowles and D. Corne, Evolving neural networks for cancer radiotherapy, in L. Chambers (ed.), Practical Handbook of Genetic Algorithms: Applications, 2nd Edition, Chapman Hall/CRC Press, pp. 443-448, ISBN 1-58488-240-9, 2000.
44. M. Lahanas, E. Schreibmann and D. Baltas, Multiobjective inverse planning for intensity modulated radiotherapy with constraint-free gradient-based optimization algorithms, Phys. Med. Biol. 48, 2843-2871 (2003).
45. E. Schreibmann, M. Lahanas, L. Xing and D. Baltas, Multiobjective evolutionary optimization of the number of beams, their orientation and weights for IMRT, Phys. Med. Biol. 49, 747-770 (2004).
46. A. Petrovski and J. McCall, Multiobjective optimization of cancer chemotherapy using evolutionary algorithms, in Proceedings of the First International Conference, EMO 2001, Zurich, Switzerland, edited by E. Zitzler, K. Deb, L. Thiele, C. A. Coello Coello and D. Corne, Lecture Notes in Computer Science, Vol. 1993, Springer, pp. 531-545, 2001.
47. B. de la Iglesia, G. Richards, M. S. Philpott and V. J. Rayward-Smith, The application and effectiveness of a multi-objective metaheuristic algorithm with respect to the data mining task of partial classification, submitted for publication to European Journal of Operational Research, 2003.
48. A. R. Reddy and K. Deb, Identification of multiple gene subsets using multiobjective evolutionary algorithms, in C. M. Fonseca, P. J. Fleming, E. Zitzler, K. Deb and L. Thiele (eds.), Evolutionary Multi-Criterion Optimization, Second International Conference, EMO 2003, pp. 623-637, Springer, Lecture Notes in Computer Science, Vol. 2632, Faro, Portugal, April 2003.
CHAPTER 17

ON MACHINE LEARNING WITH MULTIOBJECTIVE GENETIC OPTIMIZATION
Rajeev Kumar
Department of Computer Science & Engineering
Indian Institute of Technology Kharagpur
Kharagpur, WB 721 302, India
E-mail: [email protected]

We describe a generic framework for solving high-dimensional and complex domains of machine learning. We use multiobjective genetic optimization as a pre-processor for partitioning the learning tasks into simpler domains, which can then be solved by traditional learning approaches. We define two main objectives for partitioning: minimization of learning costs and minimization of errors. To further improve generalization, we use multiple machine learning algorithms for each partition and label a partition by a vector of learning costs and confidence levels, thus providing multiple views of the solution space through a set of machine learners. This is a multiobjective optimization problem whose solutions are not known a priori. Therefore, we use a multiobjective evolutionary algorithm implementation which produces diverse solutions and monitors convergence without needing a priori knowledge of the partitions emerging out of the optimization criteria.
17.1. Introduction

An essential attribute of an intelligent machine is its ability to learn from examples, produce general hypotheses from training data, and make effective decisions when presented with unseen data. During the process of learning from examples, the machine learner not only approximates the functional relationship of the restricted domain covered by the training set but also attempts to understand the wider, unseen sampling of the parent function. This understanding of unseen samples may result from interpolation and extrapolation. Most real-world applications (RWA) do not give
precise input-output mappings; data-sets may be noisy, containing distorted patterns; they may have partially occluded high-dimensional images; and decision boundaries may be non-linear. Achieving good generalization is a non-trivial task for most machine learning models because they induce varying degrees of freedom while learning such real-life patterns 24. There are many approaches to machine learning - logical & fuzzy rules, decision trees & Bayes decision theory, supervised & non-supervised clustering, connectionist intelligence & statistical inference, and genetic search & stochastic methods 42,44. At an abstraction level, a machine learner desires that the learning errors in prediction, classification or approximation be minimized for a given finite set of known patterns. Complementary to this, for unseen patterns, machine learning aims at achieving a good performance; such performance may not be achieved simply by optimizing a single value of an error-function. A learning model mainly minimizes errors on the training data, while generalization is influenced by many factors such as the learning model's architecture and design parameters, and the data-sets used for training, validation and testing. Many machine learning algorithms and models work on the principle of iterative refinement. The generalization of such models and algorithms mainly aims at avoiding underfitting as well as overfitting while approximating functions and demarcating decision boundaries. This is related to the well-known bias-variance dilemma 21. There exists another dimension to the problem of generalization, which relates to the scaling of learning models for solving arbitrarily complex problems. Scaling models to larger systems is a difficult problem, because larger models require increasing amounts of training time and data, and eventually the complexity of the optimization task reaches computationally unmanageable proportions.
Simply increasing the complexity of the model is a popular solution, but it may unjustifiably increase the number of free parameters of the learner's architecture, which can lead to poor generalization 10. In the context of addressing complex learning domains, two basic approaches have emerged as possible solutions to the poor scalability of intelligent models: ensemble-based 22 and modular systems 54. The family of ensemble-based approaches relies on combining the predictions of multiple models, each of which is trained on the same database; in general, the emphasis is on improving accuracy for better generalization, not on simplifying the function approximators. Another advantage of decomposing the input space into multiple partitions is that each of the partitions can be learnt through multiple machine
learning algorithms, thus yielding multiple views of each of the partitions. Each view can be labeled with a vector of multiple costs - the learning cost, and the learning and validation errors. This representation has the advantage that a user may pick a view from the vector and select a machine learning algorithm based on the resource availability and the learning accuracy needed for a particular application. Hence, we label each partition with multiple costs. Decomposing a pattern-space into multiple partitions using a set of multiple objectives is an NP-hard problem 20,27. Therefore, we use randomized search heuristics such as evolutionary algorithms (EAs) for the partitioning task. In recent years, EAs have emerged as a powerful black-box optimization tool for solving NP-hard combinatorial optimization problems. In the multiobjective scenario, EAs often find an effective set of diverse and mutually competitive solutions without applying much problem-specific information. Additionally, achieving proper diversity in the solutions while approaching convergence is a challenge in multiobjective optimization, especially for unknown problems in black-box optimization. There are many implementations of multiobjective EAs, for example, MOGA 18, NSGA 16,57, PAES 32 and SPEA 66,65. These implementations achieve diverse and equivalent solutions by some diversity-preserving mechanism, and they do not address convergence. However, some recent studies have combined convergence with diversity 5,43 for problems whose optimal Pareto-set is known. Kumar & Rockett 41 proposed the use of rank-histograms for monitoring convergence of the Pareto-front while maintaining diversity without any explicit diversity-preserving operator. The pattern space partitioning problem belongs to the class of problems for which the optimal solution space is not known a priori.
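The non-dominated solutions such algorithms maintain are defined by the Pareto dominance relation; a minimal sketch for objective vectors that are all to be minimized (function names are illustrative):

```python
def dominates(p, q):
    """p dominates q if p is no worse in every objective and better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def non_dominated(points):
    """Keep only the points that no other point dominates (the current front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# (3, 3) and (4, 4) are dominated by (2, 2):
# non_dominated([(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)])
#   -> [(1, 5), (2, 2), (5, 1)]
```

In the two-objective partitioning setting of this chapter, each point would be a (learning cost, error) pair for a candidate partition.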
Therefore, in this work, we use the Pareto Converging Genetic Algorithm (PCGA) 41, which has been demonstrated to work effectively across complex problems and achieves diversity without needing a priori knowledge of the solution space. PCGA excludes any explicit mechanism to preserve diversity and allows a natural selection process to maintain diversity. Thus, multiple, equally good final solutions to the problem are generated. We use PCGA to partition the task in a generic but near-optimal manner as a pre-processor to the learning domain. We argue that separating the task of decomposition from the regime of modular learning simplifies the overall learning architecture, and this strategy of processing the data before its submission to a classifier considerably reduces the learning complexity. Additionally, only those patterns which lie close to the decision boundaries possibly warrant multiple learning efforts in order to improve the prediction accuracy, and the clusters which contain only one data class are implicitly labeled without ambiguity. The rest of the chapter is organized as follows. The next section presents an overview of some issues in machine learning (sub-section 17.2.1), generalization (sub-section 17.2.2) and the application of our multiobjective evolutionary algorithm to solve real-world problems (sub-section 17.2.3). Section 17.3 formulates the partitioning problem. Section 17.4 describes the implementation of the multiobjective evolutionary algorithm used for pattern space partitioning. A summary of the results is presented in section 17.5. Finally, we conclude this chapter with a summary in section 17.6.

17.2. An Overview

17.2.1. Machine Learning
A learning model aims at capturing the global nature of approximations. In the case of iterative refinement, the model incrementally adapts in the direction of a decreasing error function based on some learning rules. At the same time, a global error surface may have extensive flat areas and significant variations in local minima. Such situations are very common with high-dimensional inputs, which often require very long learning times or result in unsuccessful training. Another major phenomenon contributing to the problem of slow/difficult training is cross-talk, i.e., the presence of conflicting information in the training data that retards learning. Cross-talk is classified as temporal58 or spatial47. In temporal cross-talk a neural network receives inconsistent training information at different times in the training cycle; receiving inconsistent information at a single instant in time is treated as spatial cross-talk. By analogy, catastrophic interference55 results from sequential training when disjoint blocks of training data are presented in sequence. In the context of addressing complex learning domains, the 'divide-and-conquer' algorithm divides a complex task into a number of simpler subtasks such that each subspace is learnt by an expert, and then combines the knowledge acquired by the experts to arrive at an overall decision. It is believed that the partitioning of a space into sensible subspaces and the subsequent learning of the subspaces by correspondingly simpler experts reduces the overall computational/learning complexity. Analogously, it is a general practice to decompose a k-class classification problem into k two-class problems. The simplest form of divide-and-conquer algorithm is a tree-structured
On Machine Learning with Multiobjective Genetic Optimization
397
classifier. Such algorithms have their origin in statistics, e.g., the Classification and Regression Tree (CART) algorithm of Breiman et al.8, and the ID3 & C4.5 induction tree algorithms of Quinlan49. These algorithms fit surfaces to data by explicitly dividing the input space into a nested sequence of regions (a tree) and by fitting surfaces within these regions. In a simple example, a hierarchical partitioning of feature space using hyperplanes parallel to the feature axes results in a binary decision tree; non-leaf nodes are decision nodes and the leaf ones are terminals. With stronger connections between statistics and neural networks, many researchers have combined the tree-structured concept of partitioning an input space with the non-linear, non-parametric functional approximation capabilities of neural networks9. In this approach, a classification tree is grown and simple neural networks are employed at each decision node of the tree. This hybrid approach significantly decreases error rates compared with a decision tree having a constant function in each decision node, but at the expense of increased training time. The tree structure combined with smaller multi-neural nets located at each decision node yields error rates comparable to a single large neural network with shorter training time, though these observations vary with the application and the approach. In general, modularity is attractive for problem solving in complex domains of machine learning, driven by a desire for:
• less computation at each stage than in a single unpartitioned model,
• a more constrained and solvable problem, with faster convergence,
• fewer misclassification errors, more accurate predictions and superior approximations,
• reduced spatial/temporal cross-talk and catastrophic interference,
• a model with more structured sub-models and components, and
• a model that can be suitably mapped onto a multiprocessor system.
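The k-class-to-two-class decomposition mentioned above can be sketched as a one-vs-rest relabeling of the training set (illustrative Python, not taken from the chapter's implementation):

```python
def one_vs_rest(samples):
    """Decompose a k-class dataset into k binary (class vs. rest) datasets."""
    classes = sorted({label for _, label in samples})
    return {c: [(x, 1 if label == c else 0) for x, label in samples]
            for c in classes}

data = [([0.1], 'a'), ([0.9], 'b'), ([0.5], 'c')]
binary = one_vs_rest(data)
print(binary['a'])  # → [([0.1], 1), ([0.9], 0), ([0.5], 0)]
```

Each binary problem is then handed to its own expert, and the k expert outputs are combined at decision time.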
Nonetheless, decomposition has its own difficulties - partitions in the absence of a priori knowledge of the pattern space are not unique. There are additional requirements and complexities in distributed learning and its integration into the next hierarchy. The credit assignment problem (i.e., the problem of getting the right training information to the right module so as to improve overall system performance) becomes more complex and adds another dimension to the problem. This implies that modularity can serve meaningfully only if both inter-module as well as intra-module assignments are beneficial
to credit assignment. Modularity assigns a set of function approximators to each sub-problem so that the modules learn to specialize in different tasks and then combine their individual solutions. Jacobs et al.29 inferred that function decomposition is an under-constrained problem and different modular architectures may decompose a function in different ways, which is certainly not a happy situation from the generalization point of view, with too many degrees of freedom. They also concluded that the modular architecture could be restricted to a well-suited solution if domain knowledge is incorporated for a desirable decomposition. For those problems where one has some prior knowledge of the pattern space and the decomposition into subtasks is explicit, this is a trivial task, e.g., image corner labeling38 and phonemic classification61. In the absence of any prior knowledge of the pattern space, Jordan & Jacobs demonstrated the decomposition-through-competition31 approach, where the decomposition and learning phases are combined. They designed the hierarchical mixtures of experts, where expert networks compete to learn training patterns and a gating network mediates the competition. Their architecture performs task decomposition in the sense that it learns to partition a task into functionally independent subtasks and allocates a distinct network to learn each task. In subsequent years, different researchers have used different ways of incorporating a priori knowledge into modular architectures, with task decomposition and modular learning integrated together. In this work, we adopt a different approach and separate the partitioning and learning phases. 17.2.2. Generalization Another dimension of work in modular systems is to combine the predictions of multiple learners (in an ensemble approach, each partition can be learnt by multiple learners) to improve accuracy10,28,54.
The emphasis, in general, is not on partitioning the input data but on improving the accuracy. The ability of a learner is judged on how correctly the learner responds to unseen data, i.e., how well it generalizes. Geman et al.21 have shown that given infinite training data, consistent learners approximate the Bayesian decision boundaries to arbitrary precision, thus providing similar generalization. However, finite and noisy training data sets are the reality, and different learners trained with such data provide different generalization. In this context, Hansen and Salamon22 suggested the use of an ensemble where each model is trained on the same database. However, a winner-takes-all strategy may not be an ideal choice, since potentially valuable information may be wasted by discarding the results of less successful models63. This observation motivates combining the outputs of several experts for making a decision. This approach is particularly suited to difficult problems with limited training data and high-dimensional patterns. Another term analogous to a combiner is meta-learning, which is defined as learning from information generated by a set of learners. It can be viewed as the learning of meta-knowledge on the learned information10. The concept of combining has been studied in recent years in several forms10,28,54. A weighted averaging of the outputs of several learners, voting schemes and arbiters have been suggested as alternatives to selecting the best model. Other developments in the statistics community include stacked regression6, bootstrap aggregation7, and stacked generalization63. Ali & Pazzani1 modeled the degree of error reduction due to the use of multiple models. There are many other references too. A rule of thumb for obtaining good generalization is to use the smallest model that fits the data. Unfortunately, it is not obvious which size is best; a model that is not sufficiently complex is very sensitive to initial conditions and learning parameters. A small neural network learns extremely fast but has a high probability of getting trapped in a local minimum and thus may fail to train, leading to underfitting. On the other hand, larger networks have more functional flexibility than small networks and so are able to fit the data better. A network that is too large may fit the noise and not just the signal, and this leads to overfitting. Overfitting produces excessive variance whereas underfitting produces excessive bias in the outputs; bias and variance are complementary terms and the best generalization is obtained with the optimum balance between them, i.e., increasing model bias in order to reduce model variance and avoid overfitting21,52.
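The variance-reducing effect of averaging several learners, central to the bias/variance trade-off discussed above, can be demonstrated with a toy simulation in which each "learner" is a stand-in predictor corrupted by independent noise (illustrative only, not a real trained model):

```python
import random
random.seed(0)

TRUE_VALUE = 1.0

def learner_prediction():
    # a stand-in for one trained model: unbiased but noisy
    return TRUE_VALUE + random.gauss(0, 0.5)

single_errs, ensemble_errs = [], []
for _ in range(2000):
    preds = [learner_prediction() for _ in range(10)]
    single_errs.append((preds[0] - TRUE_VALUE) ** 2)
    mean_pred = sum(preds) / len(preds)
    ensemble_errs.append((mean_pred - TRUE_VALUE) ** 2)

mse_single = sum(single_errs) / len(single_errs)        # close to 0.25 (one learner's variance)
mse_ensemble = sum(ensemble_errs) / len(ensemble_errs)  # close to 0.025 (variance / 10)
assert mse_ensemble < mse_single
```

With independent, unbiased errors the ensemble variance falls as 1/m for m members; correlated real learners gain less, which is why combining schemes such as weighted averaging and stacking matter in practice.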
It would be wrong to infer that only large models overfit; small networks can start overfitting before they have learnt all they could. Viewed in abstract terms of bias and variance, we can say that for good generalization we need to control the effective complexity of the learner for an optimum mix of both bias and variance. For example, in the case of connectionist architectures, the network complexity can be defined simply in terms of the size of the weights, the number of connections, the number of hidden units and the number of layers24. Analogously, several methods have been proposed for controlling network complexity: there are approaches where one starts with a relatively large network and prunes out the least significant connections or drives them to insignificance. Similarly, one can start with a small network and add units during the learning process
with the goal of arriving at an optimal network. There are other dependencies as well, e.g., initial network conditions3, learning rate, cross-validation, the stopping criterion and, more importantly, the curse of dimensionality24. One major contributor to model complexity is the model size, and it is always desirable to minimize the number of free parameters. Many studies have been carried out for selecting a proper size; nonetheless, it remains an unresolved problem. Some theoretical studies have established upper bounds on the number of hidden nodes for connectionist architectures, but a priori knowledge of the upper bounds can neither provide a practical guess at the number of hidden nodes required for mapping a training set involving a large number of samples nor minimize the free parameters. Some researchers also defined theoretical lower bounds based on the Vapnik-Chervonenkis (VC) dimension, assuming that the future test samples are drawn from the distribution of the training samples. Weigend62 avoided overfitting by letting the eigenvalue spectra guide the network size, and others advocated the use of the effective number of parameters in a non-linear system for achieving better generalization. But how to decide the effective dimensionality or the number of parameters remains an open issue46,62. Another promising approach to avoiding under-/over-fitting and increasing the flexibility of learning is to start with a large model and, through regularization or pruning, improve generalization50. In the case of neural networks, weight decay is a subset of regularization methods which adds a penalty term to the objective function. The penalty term penalizes large weights and thus the complexity; large weights can cause excessive variance in the output. Different researchers have defined different penalty terms for weight decay/elimination.
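A minimal sketch of weight decay, assuming the common L2 penalty (decay/2)·Σw² added to the error function; the `decay` coefficient here is exactly the constant whose proper value, as noted in the text, is not known a priori:

```python
def sgd_step_with_weight_decay(weights, grads, lr=0.1, decay=0.01):
    """One gradient step on E + (decay/2) * sum(w^2).
    The penalty's gradient is decay * w, so every weight shrinks toward zero."""
    return [w - lr * (g + decay * w) for w, g in zip(weights, grads)]

w = [2.0, -3.0]
w = sgd_step_with_weight_decay(w, grads=[0.0, 0.0])  # pure decay, no data gradient
# w is approximately [1.998, -2.997]: larger weights are penalized more strongly
```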
A fundamental problem with weight decay is that the proper coefficient for this term is not known a priori, and different types of weights in the network require different decay constants for good generalization. Other types of approaches are based on pruning out the least significant connections, either by removing individual weights or by removing complete units, e.g., optimal brain damage/surgeon. Many researchers have also proposed correlation- and heuristics-based pruning/merging methods for model simplification. These approaches have been found effective on a few problem sets. Pruning-based generalization demands the selection of algorithms, effective parameters and the setting of a stopping criterion. Furthermore, it has been shown by many researchers that pruning is not always beneficial and some algorithms may not be effective. Pruning has been applied to many other models too, e.g., decision trees and rule-based systems. Early stopping monitors the errors on a validation set and halts learning when the error on the validation set starts increasing. Here the selection of the model is not guided by the convergence of the training process; rather, the training process is used to perform a search to find a model with superior generalization performance. The objective of this approach is to stop training before the model starts fitting noise. The results of many researchers have provided strong evidence for the efficiency of stopped training. At the same time, it has been shown that for a finite validation set there is a dispersion of stopping points around the best stopping point, and this increases the expected generalization error. Other obvious problems are that there is no guarantee that the validation curve passes through the optimal point; it may go up and down many times during training. The validation set is again a limited sampling and may not represent the universe. It also requires crucial decisions regarding the selection of training and validation sets, and the number of examples to be divided into these two sets. The selection of which strategy to follow - leave-one-out, cross-validation, bootstrapping, or bagging - is another issue. In accordance with the No Free Lunch theorems64, there is no reason, in the absence of prior information about the problem, to prefer one learning algorithm or model to another. This theorem also establishes that, for any algorithm, any elevated performance over one class of problems is offset by the performance over another class. On the other hand, given a finite set of feature values, the Ugly Duckling theorem17 states that in the absence of assumptions there is no privileged feature representation. However, the Minimal Description Length principle17 prefers one type over another, specifically, simpler models over complex ones. It is widely accepted that the simpler the model and the algorithm, the better the generalization.
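Early stopping as described above can be sketched as monitoring the validation error with a patience window (illustrative Python; `patience`, the allowed number of non-improving epochs, is our own name):

```python
def early_stop_epoch(val_errors, patience=3):
    """Return the epoch with the best validation error seen so far,
    halting once it has not improved for `patience` consecutive epochs."""
    best, best_epoch = float('inf'), 0
    for epoch, err in enumerate(val_errors):
        if err < best:
            best, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:
            break  # stop training here
    return best_epoch

# the validation curve goes up and down; the late dip at epoch 7 is never reached
errs = [0.9, 0.7, 0.6, 0.65, 0.63, 0.7, 0.8, 0.5]
print(early_stop_epoch(errs))  # → 2
```

This illustrates the caveat in the text: with a finite validation set and a non-monotonic validation curve, the stopping point need not be the true optimum.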
Therefore, in this work, we facilitate (i) data partitioning into simpler domains, (ii) smoother decision boundaries by (possibly) excluding variations due to noise, and (iii) the use of simpler features for decomposition into sensible partitions, to minimize the model and learning complexity, which is expected to yield improved prediction accuracy and offer better generalization. 17.2.3. Multiobjective Evolutionary Algorithms (MOEA) & Real-World Applications (RWA) In this sub-section, we briefly review the issues related to the use of multiobjective evolutionary algorithms in solving real-world applications. Since the whole volume is compiled for real-world applications, we mention only those factors which we address and use in solving the partitioning problem.
(For detailed coverage of multiobjective genetic optimization, see Deb15 and Coello et al.13; a current list of references is maintained by Coello12). We classify multiobjective optimization problems into the following three distinct classes: (i) Class A - Optimization problems which can be represented by analytical functions. (ii) Class B - Combinatorial optimization problems in NP which can be verified in polynomial time, e.g., the 0-1 knapsack, Hamiltonian path and k-clique problems. For this class of problems, many (1 + ε)-approximation algorithms exist; ε is usually a small quantity27. (iii) Class C - Combinatorial optimization NP-hard problems for which polynomial-time good approximation algorithms are not known. It is difficult to approximate the Pareto-front for this class of problems. We consider the partitioning problem to be in this class. Problems falling in Class A are the most studied, and numerous studies have been done on many functions to examine various aspects of these problems, including multi-modality and deception - see, for example, Deb14. The solution space of such problems is known a priori, or can otherwise be obtained by many off-the-shelf tools. Many such problems have become de facto standards for benchmarking and comparing the performance of newer MOEAs with other well-known algorithms. In fact, they serve as a fitting between the obtained solution space and the desired one rather than solving a problem. Therefore, such problems have been extensively researched to evaluate the efficacy of (i) the genetic operators used in exploring the search space, (ii) producing/preserving diversity across the Pareto-front, and (iii) assessing convergence by measuring the closeness of the obtained solutions to the real Pareto-front. Many metrics have been proposed for quantitative evaluation of the quality of solutions23,33,59,66,67.
Essentially, these metrics are divided into two classes:
• Diversity metrics: coverage and sampling of the obtained solutions across the front, and
• Convergence metrics: distance of the obtained solution-front from the (known) optimal Pareto-front.
Some of these metrics (e.g. generational distance, volume of space covered, error-ratio measures of closeness of the obtained front to the true Pareto-front) are only applicable where the solution is known. Other metrics (e.g., the ratio
of non-dominated individuals, uniform distribution) quantify the Pareto-front and can only be used to assess diversity. 17.2.3.1. Achieving Diversity Many techniques and operators have been proposed to achieve diversity13,15. The commonly used techniques for preventing genetic drift and promoting diversity are: sharing, mating restrictions, density count (crowding) and pre-selection operators. These approaches can be grouped into two classes: parameter-based and parameter-less. Niching/sharing techniques have been commonly employed to find a diverse set of solutions, although such techniques work best when one has a priori knowledge of the solution. Knowing the number of niches, a sharing function using some user-defined parameters computes the extent of sharing and produces multiple (near-) optimal solutions. Some work has been done on parameter-less MOO too. Most of the work has been done to test the efficacy of EAs in solving known problems rather than solving the problem per se. In summary, most explicit diversity preserving methods need prior knowledge of many parameters, and the efficacy of such mechanisms depends on successful fine-tuning of these parameters. Interestingly, in a recent study, Purshouse & Fleming48 studied the effect of sharing on a wide range of two-criteria benchmark problems using a range of performance measures and concluded that sharing can be beneficial, but can prove surprisingly ineffective if the parameters are not properly tuned. They statistically observed that parameter-less sharing is more robust than parameter-based equivalents (including those with automatic fine-tuning during program execution). Nonetheless, most MOO algorithms, e.g., MOGA18, NSGA-II16, PAES32 and SPEA265, use some diversity promoting mechanism in one form or another, and have been successfully applied to many problems which can be represented by analytical functions.
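The sharing mechanism just described can be sketched with the classic triangular sharing function sh(d) = 1 - (d/σ_share)^α for d < σ_share, else 0; σ_share and α are precisely the user-defined parameters whose tuning the text warns about (one-dimensional, illustrative Python):

```python
def shared_fitness(fitnesses, positions, sigma_share=1.0, alpha=1.0):
    """Degrade raw fitness in crowded regions: f_i divided by its niche count
    (maximization assumed). Each individual contributes sh(0) = 1 to itself."""
    def sh(d):
        return 1.0 - (d / sigma_share) ** alpha if d < sigma_share else 0.0
    return [f / sum(sh(abs(x - y)) for y in positions)
            for f, x in zip(fitnesses, positions)]

# two individuals at the same point split their niche; the isolated one keeps full fitness
print(shared_fitness([4.0, 4.0, 4.0], [0.0, 0.0, 5.0]))  # → [2.0, 2.0, 4.0]
```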
Some recent work includes treating diversity as another objective to be optimized60. 17.2.3.2. Monitoring Convergence We have classified real-world problems into two groups - Class B and Class C - the first being those whose solution is known a priori or can be approximated by some means, and the second those for which the solution space is unknown. For Class B problems, tolerance limits or achievable percentages of defined goals can give some indication of solutions moving towards goal-convergence, and
thus the solutions obtained by genetic optimization can be compared. For example, the 0-1 knapsack problem has been attempted by many EA researchers and several approximate Pareto-fronts have been obtained, e.g., Zitzler & Thiele66. Many metrics are available in the literature, e.g., Zitzler & Thiele66 and Tan et al.59, which measure the diversity of the obtained Pareto-front and the distance between the obtained front and the desired one. Thus, the efficacy of the genetic implementation may be measured and the results obtained by the genetic optimization verified. However, for problems where we have neither prior knowledge nor any approximation of the solution space, convergence is an important issue. Such real-world problems cannot be recast in the form of an analytical function, prior visualization of the solution set is not possible, and proper selection of the niche parameters is difficult. Secondly, species formation in high-dimensional domains does not scale well and is a computationally intensive task. Moreover, although the sharing/mating restrictions employed by various authors partly solve the problem of premature convergence, they do not necessarily guarantee overall convergence. Some recent studies have been done on combining convergence with diversity. Laumanns et al.43 proposed ε-dominance for obtaining an ε-approximate Pareto-front for problems whose optimal Pareto-set is known. Their technique does not work for unknown problems. Similarly, Bosman & Thierens attempted to combine diversity and convergence too5. In real-world search problems belonging to Class C, the location of the actual Pareto-front is, by definition, unknown and the identification of the 'best value' of some criterion does not necessarily mean global convergence.
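The ε-dominance idea of Laumanns et al. can be sketched for a minimization problem as follows (our own simplified formulation, not their exact archiving algorithm): a candidate enters the archive only if no archived point is already within a factor (1 + ε) of it in every objective.

```python
def eps_dominates(a, b, eps=0.1):
    """Minimization: a ε-dominates b if a_i <= (1 + eps) * b_i in every objective."""
    return all(x <= (1 + eps) * y for x, y in zip(a, b))

def update_archive(archive, candidate, eps=0.1):
    """Accept the candidate only if no archived point ε-dominates it,
    then drop any archived points the candidate ε-dominates."""
    if any(eps_dominates(a, candidate, eps) for a in archive):
        return archive
    return [a for a in archive if not eps_dominates(candidate, a, eps)] + [candidate]

arch = []
for point in [(1.0, 1.0), (1.05, 1.05), (0.5, 2.0)]:
    arch = update_archive(arch, point)
print(arch)  # → [(1.0, 1.0), (0.5, 2.0)]: the near-duplicate was filtered out
```

The ε threshold keeps the archive bounded; as the text notes, however, the approach presumes knowledge of the optimal Pareto-set and does not help for unknown problems.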
In problem domains of low objective dimensionality, the Pareto-front can be examined for genetic diversity but not for convergence; high-dimensional objective spaces cannot generally be visualized for either diversity or convergence. (Some performance metrics, e.g., volume of space covered and distribution (Tan et al.59), can provide information about diversity alone.) Knowledge of the propagation of the solution front through successive generations of the population, however, can serve as a clue for convergence. Viewed as a statistical sampling problem over the objective space, just because a given solution point dominates all others in the (finite) samples does not imply that it is drawn from the Pareto-optimal set - the given non-dominated point could itself be dominated by another, yet undiscovered, solution which, in turn, need not necessarily be drawn from the Pareto-optimal set. In the past, a simple upper bound on the number of generations/iterations has been used as a stopping point, while others have employed the production of some percentage of non-dominated individuals in the total population as a stopping criterion. The first of these is unsatisfactory since a large amount of processor time could be wasted producing further generations for an optimization which has already converged; alternatively, there is no way of knowing that a particularly stubborn problem is still far from convergence. The second option is ill-conceived since solutions are non-dominated relative to the population sample, not the universe of optimal solutions. In this context, the rank-histograms proposed by Kumar & Rockett39,41 monitor convergence of the Pareto-front for problems of unknown nature; assessing convergence does not need any a priori knowledge for monitoring movement towards the Pareto-front. 17.2.3.3. Avoiding Local Convergence
In solving unknown problems there is a common concern whether the obtained solution is close to the true Pareto-front or not. Due to the finite size of the population, from any initialization of the population there is a finite set of genetic material to be permuted and combined. Towards the end of an EA run, most of the chromosomes will have become rather similar, and so crossover becomes a weak driver for population advancement; most of the gains will effectively be made by the random walk due to mutation. At some stage, the rate at which the population improves slows down and few further gains of significance are achieved. Hopefully, the EA will by then have converged to the Pareto-front, but conceivably it may have become stuck at some sub-optimal point. This can be the case even with simple analytical problems. While working on a known bimodal problem recast into a two-objective one, Deb14 concluded that he could not prevent the population getting stuck at a local Pareto-front in spite of fine-tuning the diversity preserving operators and continuing the optimization for a very large number of generations. Kumar & Rockett41 investigated the same problem and obtained identical results without any diversity promoting mechanism. For such cases, there is little point in continuing the optimization and it should be terminated. We argue that there is always a certain inheritance of genetic material belonging to one population, and there may not be much appreciable evolutionary gain beyond a certain number of generations. This implies that the genetic precursors available within a finite population may be inherently incapable of evolving to the true Pareto-front. Instead, we suggest that alternative genetic material should be acquired
in the form of another population. Each population sample is run to its own convergence; the obtained solutions are then merged and tested across populations. We therefore suggest this strategy of EA optimization through independently initialized populations - as a test of convergence - to be particularly suited to harder problems of unknown nature37,41. 17.3. Problem Formulation If we consider partitioning of a pattern space as a mapping P from an N-dimensional space to j subspaces of dimensionality n_j, then the formulation is an N-dimensional function decomposition into many n_j-dimensional sub-functions subject to certain criteria, Obj_i(X). Since n_j represents (hopefully) a less complex domain, a learner can approximate such a sub-domain with less effort; one measure of complexity is the local intrinsic dimensionality, which is computed by Principal Component Analysis (PCA)46. (We refer to the intrinsic dimensionality, rather than the true dimensionality, when referring to learning complexity.)
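The local intrinsic dimensionality via PCA can be estimated as the number of leading principal components needed to capture a fixed fraction of the total variance; a sketch assuming NumPy, with the 0.95 proportion matching the conservative criterion used for the learning-cost objective in this chapter:

```python
import numpy as np

def intrinsic_dimensionality(patterns, var_fraction=0.95):
    """Number of principal components needed to retain `var_fraction`
    of the total variance of the patterns in one hypersphere."""
    X = np.asarray(patterns, dtype=float)
    X = X - X.mean(axis=0)
    eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]  # largest first
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(cumulative, var_fraction) + 1)

# 200 points along a noisy straight line in 3D: intrinsically ~1-dimensional
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = np.hstack([t, 2 * t, -t]) + 0.01 * rng.normal(size=(200, 3))
print(intrinsic_dimensionality(X))  # → 1
```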
Fig. 17.1. Approximation of a 2D function by a set of (approximately) linear functions.
Figure 17.1 illustrates a simple case of decomposing a 2D function into many sub-functions. Here the 2D function is parameterized by a series of intrinsically 1D functions, and although in this illustrative example the reduction in dimensionality is trivial, this will not be the case for higher dimensional spaces. Linear functions represent the simplest problem domains to be learnt by a machine learning algorithm, and such functions can be learnt by a simple input-output mapping. Alternatively - in Figure 17.2 - hyperspherical clusters can enclose pattern blobs, and two situations can arise in practice: (i) all the patterns in each hypersphere belong to a single class, i.e., partitioning alone demarcates the decision boundaries and no explicit classification stage is needed, and (ii) the patterns belong to multiple classes, necessitating the use of some
Fig. 17.2. Partitioning of the pattern space into (a) disjoint clusters, and (b) overlapped clusters. Outliers may be excluded.
post-partitioning classifier. Outliers may still be excluded. Partitioning is not unique; many probable partitioned blobs can exist. For example, we show two exemplar cases of hyperspheres enclosing patterns in Figure 17.3.
Fig. 17.3. Partitioning of pattern space into clusters. Enclosing hyperspheres can be located in many ways. Two situations are depicted.
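Membership of patterns in the (possibly overlapping) hyperspheres of Figures 17.2 and 17.3 reduces to a distance test against each (centre, radius) pair; patterns inside no sphere are the excluded outliers (illustrative Python, not the chapter's implementation):

```python
import math

def assign_to_partitions(patterns, spheres):
    """Map each pattern index to the hyperspheres containing it.
    spheres: list of (centre, radius) pairs; overlap is permitted."""
    membership = {i: [j for j, (c, r) in enumerate(spheres)
                      if math.dist(x, c) <= r]
                  for i, x in enumerate(patterns)}
    outliers = [i for i, hits in membership.items() if not hits]
    return membership, outliers

pats = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
sph = [((0.0, 0.0), 1.5), ((0.5, 0.0), 1.0)]
membership, outliers = assign_to_partitions(pats, sph)
print(membership[0], outliers)  # → [0, 1] [2]
```

The first two patterns fall in both spheres, illustrating the permitted overlap; the third lies in no sphere and is treated as an outlier.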
The basis for such a partitioning is that the clusters are generated on the basis of fitness for purpose, i.e., they are explicitly optimized for the subsequent mapping onto a machine learner for subspace learning. This approach transforms the problem-dependent partitioning task, in a generic manner, by dividing the pattern space into a set of hyperspherical regions guided by a set of objectives for optimizing the performance; the data within each sphere are learned by individual learners, which are then combined. We perform optimization on a vector space of objectives and explore the search space for a set of equally viable partitions of the pattern space. We identify the following three subsets of objectives for partitioning the pattern space, optimizing learning efforts, and improving generalization.
I. Maximize Modularity i. Minimize learning cost: we consider learning cost as a function of intrinsic dimensionality46,62. For example, for the partitioning case shown in Fig. 17.1, the intrinsic dimensionality is effectively one. We have taken a conservative estimate in determining the intrinsic dimensionality and included the components up to some proportion, say 0.95, of the total variance within a hypersphere as the determining criterion. Thus, our objective is to minimize the average intrinsic dimensionality of the subspaces. Assuming the learning complexity of a machine learner to be a quadratic or higher-order function, this yields a substantial reduction in computation; for example, in a feedforward network it is of the order of O(N^3) - see Hinton26. ii. Minimize number of partitions: we wish to maximize the modularity, but it should be based on minimizing the overall training effort. Alternatively, this objective can be withdrawn and the number of partitioning hyperspheres specified in advance based on some prior knowledge of the problem domain. iii. Minimize overlap of partitions: we do not insist on completely disjoint partitions emerging from the partitioning. However, we aim to avoid repeating learning effort on similar sets of patterns in different modules, while allowing some overlap of hyperspheres to prevent the formation of a no-man's land between the partitions. II. Maximize Generalization Ability iv. Maximize data density: this measure attempts to produce compact solutions, and thus minimizes the probability of decision surfaces taking a random walk in no-man's land. We consider the number of patterns included within a partition, normalized by the surface content of the hypersphere, as the data density measure. v. Maximize regularity of decision surfaces: this objective aims at increasing the learning accuracy by regularizing decision surfaces, which is, however, difficult to quantify.
For this, we consider the nearest neighbor classification error to indicate how well the partitions preserve the total structure of the
pattern space as a separability measure. vi. Minimize validation errors of multiple learners: this aims at obtaining the validation errors for a few machine learning algorithms and confirming the suitability of the machine learners for the resultant partition. III. Generic Measures for Search vii. Maximize fraction of included patterns of each class: we aim
to include within the partitions as many training patterns as possible from each class. Hopefully, outliers within the pattern space can be excluded, because the objective does not aim to include all patterns. The purpose is that the decision surfaces should not be formed by simply omitting the patterns belonging to the minority class(es). viii. Maximize inclusion of patterns of a single class in a single
partition : for such partitions, there is no need for any postpartition learning efforts; the patterns can be unambiguously labeled just by an inclusion membership. ix. Maximize equitable distribution of patterns : this aims at having a balanced training set for post-partition learning. We wish to minimize partitioning where partitions have imbalanced training set; (generalization ability is usually poor for an imbalanced training set). The above elements in the objective vector are distinct, competing and complimentary. From the obtained set of solutions, a small subset based on subranking of objectives is picked for subsequent learning. In our technique clusters are explicitly optimized for their subsequent mapping onto the machine learner - rather than emerging as some implicit property of the clustering algorithm30. Most traditional clustering algorithms rely on some similarity measure; they may also fail to converge to a local minimum53. Some work has been done, e.g., Chang & Lippman11 and Srikanth et al.56, for obtaining optimal clusters, however, we did not aim to get optimal trade-offs but tried to find good approximations. In conceptualization, the proposed strategy for feature space decomposition has strong links to the recursive partitioning algorithms of Henrichon & Fu25 and Friedman19 for non-parametric classification using hyperplanes parallel to feature axes. Similarly, we aim to partition feature spaces into subspaces and their corresponding mapping onto multiple machine learners.
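The two quantitative measures in objectives i and iv above, intrinsic dimensionality by a cumulative-variance criterion and surface-normalized data density, can be sketched as follows (a minimal illustration; the function names are ours, not from the chapter):

```python
import numpy as np
from math import pi, gamma

def intrinsic_dimensionality(patterns, variance_fraction=0.95):
    """Conservative estimate: the number of principal components needed
    to account for a given fraction (say 0.95) of the total variance."""
    centered = patterns - patterns.mean(axis=0)
    # Eigenvalues of the covariance matrix, largest first.
    eigvals = np.sort(np.linalg.eigvalsh(np.cov(centered.T)))[::-1]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(cumulative, variance_fraction) + 1)

def data_density(patterns, radius):
    """Number of included patterns normalized by the surface content
    of the enclosing hypersphere."""
    n, d = patterns.shape
    # Surface content of a hypersphere of this radius in d dimensions.
    surface = 2 * pi ** (d / 2) / gamma(d / 2) * radius ** (d - 1)
    return n / surface
```

For a blob whose variance lies almost entirely on one axis, `intrinsic_dimensionality` returns 1 even though the ambient space has more dimensions, which is the behaviour objective i relies on.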
Rajeev Kumar
17.4. MOEA for Partitioning

The partitioning problem as formulated in the previous section belongs to the class C category of problems as per the classification scheme defined in Section 17.2.3. From an evolutionary algorithm's point of view, the problem is characterized by the following features:

• No a priori knowledge of the solution space is available. For the partitioning problem involving multiple objectives, the solution space is mostly discrete, and there may not exist any well-defined, uniformly distributed spread of solutions across the Pareto-front.
• The bounds of the objective space are not known. Neither any objective value nor any optimal point in the objective space is known. There exists no information regarding niches or local/global minima/maxima.
• This is an NP-hard combinatorial optimization problem, and no polynomial-time good approximation algorithm is known.
• The problem formulation given in Section 17.3 is new; no previously obtained result for the partitioning problem is available for comparison or validation.

While solving an optimization problem using an evolutionary algorithm, most EAs need the following:

• For diversity: some information about the niches or the shape of the Pareto-front, to make the diversity-preserving mechanism effective.
• For convergence: some information about the final solutions, which usually serves as the stopping criterion. In the absence of such information, the algorithm is run for a very large number of iterations, thus wasting CPU time.
• For avoidance of local optima: information about the existence of any local minima/maxima in the Pareto-front, which is not available here.

Therefore, in this work, we use an algorithm implementation which needs neither an explicit sharing mechanism nor an approximate Pareto-front to check for convergence.
We choose an algorithm whose features are as follows:

• It implicitly achieves diversity by varying the selection pressure, without any knowledge of the problem domain.
• It monitors convergence (again without problem-domain knowledge) to terminate the further evolution of generations, thus avoiding wastage of computing resources.
• It preserves known, good solutions.
• It checks that a run has not become stuck at a local optimum by using independently initialized populations. This is essentially a test for (near-) global convergence.

To the best of our knowledge, the Pareto Converging Genetic Algorithm (PCGA) [41] is the only multiobjective optimization algorithm which needs no problem-dependent knowledge and monitors convergence for an unknown solution space.

17.4.1. The Algorithm

The Pareto Converging Genetic Algorithm (PCGA) used in this work is a steady-state algorithm and can be seen as an example of a (μ + 2) evolutionary strategy in terms of its selection mechanism. In this algorithm, individuals are compared against the total population set according to a tied Pareto-ranking scheme [18], and the population is selectively moved towards convergence by discarding the lowest-ranked individuals in each evolution. In doing so, we require no parameters such as the size of the sub-population in tournament selection or sharing/niching parameters. Initially, the whole population of size N is ranked and fitness is assigned by interpolating from the best individual (rank = 1) to the lowest (rank <= N) according to some simple monotonic function. A pair of mates is chosen randomly, biased by the sizes of the roulette-wheel segments, and crossed over and/or mutated to produce offspring. The offspring are inserted into the population set according to their ranks against the whole population, and the two lowest-ranked individuals are eliminated to restore the population size to N. The process is iterated until a convergence criterion based on rank histograms is achieved. For details of the algorithm see Kumar & Rockett [39, 41]. If two individuals have the same objective vector, we lower the rank of one of the pair by one; in this way, we are able to remove duplicates from the set of non-dominated solutions without loss of generality.
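One steady-state step of this selection mechanism can be sketched as follows. This is a simplified illustration with our own function names, assuming minimization; it omits the fitness interpolation and rank-histogram convergence machinery of the full PCGA [41]:

```python
import random

def dominates(a, b):
    """Pareto dominance for minimization objective vectors."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_rank(objs):
    """Tied Pareto-ranking: rank 1 = non-dominated. A duplicate of an
    already-seen objective vector (at restricted precision) is demoted
    by one rank, as described in the text."""
    ranks = [1 + sum(dominates(oj, oi) for oj in objs) for oi in objs]
    seen = set()
    for i, o in enumerate(objs):
        key = tuple(round(v, 3) for v in o)  # a few units of precision
        if key in seen:
            ranks[i] += 1
        seen.add(key)
    return ranks

def pcga_step(population, evaluate, crossover, mutate):
    """One (mu + 2) evolution: breed two offspring, merge, re-rank
    against the whole set, drop the two lowest-ranked individuals."""
    ranks = pareto_rank([evaluate(p) for p in population])
    # Roulette-wheel selection biased towards good (low) ranks.
    weights = [1.0 / r for r in ranks]
    p1, p2 = random.choices(population, weights=weights, k=2)
    children = [mutate(c) for c in crossover(p1, p2)]
    merged = population + children
    merged_ranks = pareto_rank([evaluate(p) for p in merged])
    keep = sorted(zip(merged_ranks, range(len(merged))))[:len(population)]
    return [merged[i] for _, i in keep]
```

Iterating `pcga_step` keeps the population size constant while the non-dominated set gradually stabilizes, which is what the rank histograms of Section 17.4.5 monitor.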
For a meaningful comparison of two real numbers during ranking, we restrict the floating-point precision of the objective values to a few units of precision. This algorithm does not explicitly use any diversity-preserving mechanism; however, lowering the rank of an individual having an identical objective vector (at the restricted precision) is analogous to a sort of sharing/niching mechanism in objective space, which effectively controls the selection pressure and thus partly contributes to diversity (for other factors that contribute to diversity, see PCGA [41]). The algorithm has been tested on many benchmark analytic functions as well as on many complex real-world problems, producing diverse solutions on the Pareto-front; some results are reported in Kumar et al. [40, 37].

17.4.2. Chromosome Representation
To represent sub-space partitions, we use variable-length individuals in which each sub-block encodes a hypersphere centre and radius. Each unit of a chromosome is a real number, and (N + 1) such units form a block, where N is the dimensionality of the pattern space. Variable-length individuals are necessitated by the fact that the number of clusters emerging from the search is unknown; the number of clusters in the (near-) optimal solution is therefore also evolved genetically, with the number of blocks forming a chromosome representing the number of partitions of the pattern space.

17.4.3. Genetic Operators
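The variable-length block representation, together with the boundary-respecting crossover and Gaussian mutation of this section, can be sketched like this (a minimal sketch; the names are ours):

```python
import random

def decode(chromosome, dim):
    """Split a flat list of reals into (centre, radius) blocks of
    dim + 1 units each, one block per hypersphere."""
    block = dim + 1
    assert len(chromosome) % block == 0
    return [(chromosome[i:i + dim], chromosome[i + dim])
            for i in range(0, len(chromosome), block)]

def block_crossover(a, b, dim):
    """Single-point crossover with cut points restricted to block
    boundaries, so offspring are always legal chromosomes."""
    block = dim + 1
    ca = random.randrange(1, len(a) // block) * block if len(a) > block else len(a)
    cb = random.randrange(1, len(b) // block) * block if len(b) > block else len(b)
    return a[:ca] + b[cb:], b[:cb] + a[ca:]

def gaussian_mutate(chromosome, sigma=0.05):
    """Add zero-mean Gaussian noise to every centre coordinate and radius."""
    return [g + random.gauss(0.0, sigma) for g in chromosome]
```

Because cut points fall only between blocks, whole hyperspheres (good clusters) survive recombination intact but get shuffled among solutions, and offspring lengths may differ from both parents, which is how the number of partitions evolves.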
For the genetic search we employed a single-point crossover operation on the hypersphere limits and a Gaussian mutation. The crossover point is taken on the boundaries between hypersphere description records, to prevent the formation of illegal chromosomes. Apart from meaningful recombination, this has the added advantage that good clusters can be retained but shuffled among solutions, which is needed to obtain (near-) optimal partitions. For mutation, we add zero-mean Gaussian noise to the centre coordinates and the radii of the hyperspheres.

17.4.4. Constraints & Heuristics

We draw the constraints on the decision variables from the bounds of the pattern space. We have investigated two approaches to initializing the chromosomes: in one approach the cluster centres were randomly initialized, and in the other a hypersphere was centred on a randomly selected data pattern. The second approach of seeding a chromosome proved particularly well-suited to sparsely populated pattern spaces and thus significantly reduced the search effort.

The search operations are further reduced using a few heuristics. One heuristic acts on the upper bound of the radius of a hypersphere: the upper bound is divided by the square root of the dimensionality of the space. This heuristic prevents the potential inclusion of all the patterns in every partition of the feature space. Complementary to this, another heuristic limits the minimum fraction of the patterns included in a hypersphere: partitions containing less than some fraction of the total patterns are prevented from forming a separate cluster. A further heuristic acts on the maximum number of clusters forming a solution, since preventing the number of partitions from becoming arbitrarily large helps restrict the search in the space of many (partially or wholly overlapping) clusters.

In spite of the constraints and heuristics, exploring such a search space for a set of equally viable partitions of the pattern space is a complex optimization. One simplification is to look for some predetermined number of clusters. This can be useful if one has prior knowledge of the pattern space from, say, viewing the data with standard ordination techniques, or one can tune the computation to some fixed number of partitions after becoming acquainted with the nature of the solutions obtained during initial EA runs.

17.4.5. Convergence
We use Intra-island rank histograms to monitor the rate of convergence within a single population, and Inter-island rank histograms to combine evidence about the satisfactory convergence of a series of EA runs [41]. The two types of histogram together do not guarantee true convergence; however, they do help in approximating the convergence and in avoiding wastage of computing resources.
Fig. 17.4. Two sets of Intra-Island Rank Histograms. The decreasing tail of (b) indicates the movement of total population towards convergence.
Intra-island rank histogram entries are generated from the ratio of the number of individuals at a given rank in the current population to that in the combined and re-ranked populations of the current and the preceding epochs. For convergence, we are interested in the shift of the set of non-dominated solutions between epochs; hence the ratioing of rank entries. Two typical Intra-island rank histograms are shown in Figure 17.4, in which a decreasing length of the histogram tail denotes the movement of the total population towards convergence. In an ideal converged state, the rank-ratio of the bin belonging to rank unity remains at 0.5, which shows that the whole population in two successive epochs remains non-dominated without any shuffling, though this alone is not an indicator of convergence (Figure 17.4(b)).

For complex multiobjective optimization problems there is a common concern as to whether the obtained solution is close to the true Pareto-front. We argue that there is always a certain inheritance of genetic material belonging to one independent run, and there may not be much appreciable evolutionary gain beyond a certain number of generations. We therefore run each population sample to an approximation of (intra-island) convergence; the obtained solutions are then merged across islands and compared through Pareto-ranking. The shift of the Pareto-front is monitored with an inter-island rank histogram. Inter-island rank histograms for a scenario in which some of the non-dominated solutions of either contributing island are demoted to being dominated are depicted in Figure 17.5. The smaller the entry in the bin corresponding to unity rank, and the wider the histogram tail, the larger the shift in the set of best solutions and the greater the reshuffling that has taken place. The desired outcome from merging the non-dominated members of two or more islands is that none of the non-dominated solutions is downgraded to dominated status and all solutions combine to form a similarly or better sampled Pareto-front; the inter-island rank histogram of the combined solutions then indicates unity in the bin corresponding to the non-dominated rank.
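The intra-island ratio just described can be sketched as follows (a simplified Pareto ranking and our own function names; the real PCGA histograms are defined in [41]):

```python
from collections import Counter

def simple_rank(objs):
    """Rank 1 = non-dominated, assuming minimization objectives."""
    def dom(a, b):
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))
    return [1 + sum(dom(o, v) for o in objs) for v in objs]

def intra_island_histogram(prev_objs, curr_objs, rank_fn=simple_rank):
    """Ratio of the count at each rank in the current population to the
    count at that rank in the combined, re-ranked current + preceding
    populations."""
    curr_counts = Counter(rank_fn(curr_objs))
    merged_counts = Counter(rank_fn(prev_objs + curr_objs))
    return {r: curr_counts[r] / merged_counts[r] for r in sorted(merged_counts)}
```

When two successive epochs hold the same fully non-dominated set, the merged population is also fully non-dominated and the current epoch contributes exactly half of it, so the rank-1 bin sits at the ideal value of 0.5.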
Fig. 17.5. Two samples of Inter-Island Rank Histograms. The shift of Pareto-front is indicated by a tail and improved diversity by a larger value in the bin corresponding to unity rank.
We use the PCGA, which naturally performs good sampling of the solution space and ensures population advancement towards the Pareto-front. We compute both Intra-island and Inter-island rank histograms to monitor convergence. From the obtained set of solutions, a small subset based on sub-ranking of objectives was picked for learning. The ANCHOR connectionist architecture [35], which was developed for integrating multiple heterogeneous learners, is particularly suitable for hierarchical learning of subspaces. For this, the Net Definition Language (NDL) [36] can be used to specify the interfaces needed for the connectionist architecture. This framework can easily be extended to include other paradigms of machine learning.

17.5. Results and Discussion

In this section, we include a few representative results as a proof of concept that the partitioning strategy proposed here works well for complex problems of machine learning. More importantly, we show that an optimization problem of unknown nature in a high-dimensional objective space can be solved effectively, yielding quality solutions, by proper design of a multiobjective evolutionary algorithm and without needing a priori knowledge of the solution space.

From a machine learning perspective, the efficacy of the partitioning strategy coupled with the genetic search should result in partitions that address (at least some of) the following:

• Are the partitioned pattern subspaces compact?
• Do the partitions contain patterns of a single class alone?
• Are we able to exclude outliers and maximize the included patterns?
• Is the total training time over multiple partitions less than the time needed for the monolithic data?
• Are the validation errors minimized, resulting in superior generalization?

And, from a genetic optimization perspective, the efficacy of the algorithm implies obtaining diversity along the high-dimensional objective surface while ensuring (near-) optimal convergence.
For the above, we applied the partitioning strategy to a range of synthetic problems of known characteristics as well as to real machine learning problems of unknown characteristics. We generated synthetic data with known structure so that the resulting partitions could be evaluated. We generated two types of data: one in which all blobs of patterns belonging to different classes were separated, and another in which they overlap. We generated many sets of data by varying the following parameters: true dimensionality of the data (six to thirty-six dimensions), effective dimensionality (three to twelve dimensions), separation of blobs (well separated to just separated), number of classes (two to four), and degree of overlap among blobs. For each of the datasets, we generated two sets of data: a training set used for partitioning, and a validation set used for validating the partitions and the machine learning results.

First, we partitioned synthetic data from two 3-dimensional Gaussian blobs embedded in a 6-dimensional space, and within this we examined two cases: one where the two blobs are just separated and the other where they overlap. For the just-separated data, a large number of equivalent solutions evolved, most of which comprised two clusters of three intrinsic dimensions, each containing data from only one of the classes, although some solutions contained exemplars from both classes. From the point of view of the EA, all non-dominated solutions are equivalent, but some may be more desirable in practice. For the overlapped Gaussian blobs, the EA produced partitions of intrinsic dimensionality 3 or 4 containing a fraction of data from the other class, both of which would be expected for this dataset. Positioning two hyperspheres on the (known) Gaussian centres and carrying out an exhaustive search for the two best hypersphere radii produced partitions comparable to the typical EA results, indicating that the EA was finding close-to-optimal clusters in both cases. This exercise was aimed at proof of principle that the genetic optimization technique works, and that the learning effort can be reduced with near-optimal partitions.
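Data of this kind can be generated along the following lines (a sketch with invented parameter names, not the authors' generator): a blob's intrinsic dimensionality comes from giving only a few axes appreciable variance, and the centre offset controls whether classes are just separated or overlapping.

```python
import numpy as np

def make_blob(n, centre, intrinsic_dims, total_dims,
              spread=1.0, noise=0.01, rng=None):
    """A Gaussian blob of a given intrinsic dimensionality embedded in
    a higher-dimensional space: appreciable variance on the first
    `intrinsic_dims` axes, near-zero variance elsewhere."""
    rng = rng if rng is not None else np.random.default_rng(0)
    scales = np.full(total_dims, noise)
    scales[:intrinsic_dims] = spread
    return centre + rng.normal(size=(n, total_dims)) * scales

rng = np.random.default_rng(42)
# Two 3-D blobs in a 6-D space; shrinking the second centre's offset
# moves the classes from just-separated towards overlapping.
class_a = make_blob(200, np.zeros(6), 3, 6, rng=rng)
class_b = make_blob(200, np.r_[8.0, 8.0, 8.0, 0.0, 0.0, 0.0], 3, 6, rng=rng)
```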
We also considered the partitioning of a four-class synthetic problem in twelve variables; here each Gaussian blob was of three (mutually exclusive) dimensions and just separated from the others. This proved somewhat harder than the six-dimensional problem. Most of the family of equivalent solutions produced comprised four clusters of three intrinsic dimensions, with some overlap among them. Nonetheless, we were not aiming at perfect solutions; we are interested in (minimally) overlapping solutions so that, potentially, the entire volume of interest can be mapped for subsequent classification. A number of Pareto-equivalent solutions, however, contained seven- or eight-dimensional hyperspheres. We experimented with other datasets as well.

Next, we considered several high-dimensional datasets taken from the UCI repository of machine learning databases [4]. We investigated the behavior of the partitioning algorithm in two modes: (i) a variable number of clusters emerging from the genetic search, and (ii) the number of clusters fixed to two, four, six or eight in the respective clustering runs, guided by viewing the data with standard ordination techniques. In the following paragraphs, we briefly summarize the characteristics of the partitions that emerged from the genetic search. (A detailed description of the results obtained from land-use classification of multispectral image data can be found in Kumar & Rockett [40]; in that work, a seven-element objective vector was designed for feedforward neural learning.)

Typically, the following types of partitions emerged from the partitioning algorithm (Figure 17.6): (i) a Type I partition contains members of one class only; (ii) a Type II partition contains a roughly equal split of members of each class, mostly in the range of a 40:60 split for two-class data; (iii) a Type III partition contains mostly members of a single class and a few, say 5%-10%, members of the other classes; and finally (iv) outliers (non-clustered members) are not included in any partition. This is the general spectrum of solutions obtained by genetic optimization across a range of high-dimensional machine learning datasets. (We did not experiment with datasets of lower dimensions.)
Fig. 17.6. Three types of clusters emerging from the partitioning algorithm. Some patterns may not be included in any hypersphere.
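A partition's type can be read off from its class composition, for instance as below (a sketch; the thresholds are illustrative choices of ours, not values from the chapter):

```python
from collections import Counter

def partition_type(labels, minority_fraction=0.10):
    """Classify a partition by class composition:
    Type I   - members of a single class only,
    Type II  - a roughly even split of classes,
    Type III - one dominant class plus a small minority."""
    counts = Counter(labels)
    if len(counts) == 1:
        return "I"
    dominant_share = max(counts.values()) / len(labels)
    return "III" if dominant_share >= 1.0 - minority_fraction else "II"
```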
The majority of the partitions contained samples of a single class, so once inclusion within a hypersphere was established, labeling an unknown datum was trivial. This category of clusters does not require any post-partitioning learning effort, since to label an unknown point it is sufficient to determine in which cluster it is included. The effectiveness of such labeling was confirmed by the fact that in all the several thousand Pareto-optimal clusters we examined, we did not find any case where a cluster containing a single class of training data was subsequently found to include even a single member of another class from the test data. Thus, a pattern belonging to such a cluster is implicitly labeled without ambiguity.

For the second category of partitions, mapping the clustered data onto a certain type of machine learning algorithm, e.g., a connectionist architecture, is fairly straightforward, since the roughly equal numbers of exemplars from each class together with the reduced size of the subset to be learned both simplify training. This also confirmed that for most such clusters the validation errors are fewer. For learning with a nearest neighbor (NN) type of classifier, classification of an unknown point required far fewer nearest-neighbor distance calculations than classification based on the whole training set. Thus, in k-NN classification, our partitioned dataset gave error rates that were not degraded relative to a monolithic classifier, but the time to compute a label was reduced significantly.

For the third category of cluster, some machine learning algorithms are well known to have problems learning such an unbalanced dataset. For example, a feedforward network could be used, but special measures are required to accommodate the unbalanced training set [2]. Alternatively, a k-NN classifier could be employed to decide the final classification within a hypersphere with less computation than would be required for nearest-neighbor classification on the whole training set, although clearly, unless at least k members of the minority class are included, the classification effectively degenerates to the first category.
As a further option, a small fraction of examples within a hypersphere could be ignored, at the cost of a minute increase in error rate, by treating all included patterns as belonging to the grossly dominant class. The objective was not to include all training data within the clusters, so some data points were excluded from the solutions; these are potentially outliers. The presence of outliers in a training set is known to pose problems and can degrade performance. In our strategy, isolated outliers may well be discarded, since the relevant objective tries only to maximize the number of patterns utilized. Similarly, clusters of outliers caused by some systematic measurement failure are likely to generate their own hypersphere, which may well be significantly separated from other patterns with the same class label.
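The recall-time use of the partitions discussed above, membership labeling for pure clusters, cluster-local k-NN for mixed ones, and outlier flagging otherwise, might be sketched as follows (our own names; a minimal illustration):

```python
import numpy as np

def label_point(x, clusters, k=3):
    """Label an unknown point from partitioned training data.
    Each cluster is (centre, radius, patterns, labels)."""
    for centre, radius, patterns, labels in clusters:
        if np.linalg.norm(x - centre) <= radius:
            unique = set(labels)
            if len(unique) == 1:
                # Type I cluster: membership alone labels the point.
                return next(iter(unique))
            # Mixed cluster: k-NN over this cluster only, far fewer
            # distance computations than over the monolithic set.
            d = np.linalg.norm(patterns - x, axis=1)
            votes = {}
            for i in np.argsort(d)[:k]:
                votes[labels[i]] = votes.get(labels[i], 0) + 1
            return max(votes, key=votes.get)
    return None  # outside every hypersphere: a potential outlier
```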
The strategy adopted in this work also supports the concept of ensemble-based approaches. In an ensemble approach, we suggest that only those clusters in which more than one class is represented need to be multiply mapped onto suitable classifiers. Thus, the principal advantage of our partitioning approach is that only those patterns which lie near the decision boundaries warrant learning effort. We also propose a way of dealing with multiple instances of patterns (which is possible because one of the objectives directly promotes some overlap): the clusters can be assigned priorities based on the inclusion of patterns of a single class, and if a pattern is included in more than one cluster having different priorities, it can safely be assigned to the class of the higher-priority cluster. These additions may well enhance the accuracy of ensemble-based approaches and the functional simplicity of modular systems.

In general, it is difficult to compare the overall performance of machine learning algorithms in absolute terms of misclassification errors; most learning algorithms give different error rates with different sets of parameters on the same dataset. Therefore, we do not include a quantitative comparison of error rates. In our work we have observed, in general, that the classification rate was not reduced, which is highly significant. In most cases classification accuracy was in fact improved, though the improvement may not be statistically significant.

Another major advantage of this strategy is that the total computational effort can be divided into off-line and on-line efforts. The computation for the genetic search is significant, but it is performed off-line, and the off-line learning of the individual partitions is reduced. Moreover, each partition can be learnt multiple times by different algorithms and labeled with a confidence label.
During recall, which is on-line, this information is used, and an unknown datum is labeled with higher confidence and much reduced computational effort. For example, if the unknown pattern belongs to the first category of cluster, it is unambiguously labeled in constant time just by determining its inclusion membership. Thus, the computational effort for training and recall is significantly reduced, and the generalization ability is improved too.

17.6. Summary & Future Work

We have discussed the issues in solving difficult problems of machine learning. High-dimensional data from real-world applications is mostly noisy, containing distorted and overlapping patterns. Scaling machine learning algorithms and models, and achieving good generalization of learnt mappings for real-world data, are challenging tasks. As a result, modular systems and ensemble-based approaches have emerged as potential solutions for complex domains of intelligent models. Whereas modularity addresses the drawback of scalability, ensemble-based approaches emphasize improving accuracy for superior generalization.

In this work, we have presented a generic framework for solving complex problems of machine learning by addressing both modularity, for simplification of learning complexity, and ensembles, for multiple learning efforts to improve prediction accuracy. We identified a set of objectives and formulated the partitioning problem, without needing any application-specific knowledge, as a multi-criteria optimization problem involving competing and conflicting objectives. Such a partitioning problem is hard. We used a MOEA for partitioning and adopted an algorithm which produces diverse solutions and monitors convergence without needing a priori knowledge of the partitions emerging from the optimization criteria. We also employed a distributed version of the algorithm and generated solutions from multiple tribes to ensure convergence. However, we did not aim to obtain optimal trade-offs but tried to find a good approximation, i.e., a set of solutions whose objective values are hopefully not too far from the (unknown) optimal objective values.

We tested this approach first on synthetic data of known characteristics and of varying complexity, to assess the efficacy of our approach and of the MOEA implementation. We observed that the implementation used in this work was able to find partitions with the desired trade-offs. We then tested our approach on many other datasets taken from real applications.
The partitioning strategy adopted here is a divide-and-conquer strategy for scalability that explicitly optimizes the patterns for subsequent mapping onto multiple learners, and it was found to reduce the learning complexity. Another merit of this work is having multiple views of a partition, which supports the concept of ensemble-based approaches to improve generalization ability. We observed, while working with many datasets, that data partitions which contain only one class of data do not require any further processing and are explicitly labeled without any ambiguity. In a partitioned ensemble-based approach, we suggest that only those clusters in which more than one class is represented need to be multiply mapped onto suitable classifiers. Thus, the principal advantage of our partitioning approach is that only those patterns which lie near decision boundaries warrant the learning effort, possibly multiple efforts for enhanced accuracy. This evolutionary approach, coupled with need-based multiple learning efforts, simplifies the functional mapping, enhances the accuracy and offers better generalization.

There are many possible extensions of this work. For example, we may use hyperellipsoids instead of hyperspheres; in this work we used hyperspheres since their chromosomal representation is quite compact compared to other geometric primitives. Another extension may be to explore the already partitioned pattern space for further partitioning; such recursively partitioned pattern spaces can be mapped directly to hierarchical classifiers.

Acknowledgments

The author gratefully acknowledges discussions with Peter Rockett and Partha Chakrabarti during different stages of this work. A part of the work was supported by a project grant from the Ministry of Human Resource Development, Government of India.

References

1. K. Ali and M. Pazzani. Error reduction through learning multiple descriptions. Machine Learning, 24(3): 173 - 202, 1996.
2. R. Anand, K. G. Mehrotra, C. K. Mohan, and S. Ranka. An improved algorithm for neural network classification of imbalanced training sets. IEEE Trans. Neural Networks, 4(6): 962 - 969, 1993.
3. A. Atiya and C. Ji. How initial conditions affect generalization performance in large networks. IEEE Trans. Neural Networks, 8(2): 448 - 451, 1997.
4. C. L. Blake and C. J. Merz. UCI Repository of Machine Learning Databases [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science, 1998.
5. P. A. N. Bosman and D. Thierens. The balance between proximity and diversity in multiobjective evolutionary algorithms. IEEE Trans. Evolutionary Computation, 7(2): 174 - 188, 2003.
6. L. Breiman. Stacked regressions. Machine Learning, 24(1): 49 - 64, 1996.
7. L. Breiman. Bagging predictors. Machine Learning, 24(2): 123 - 140, 1996.
8. L. Breiman, J. H. Friedman, R. A. Olshen, and C. J.
Stone. Classification and Regression Trees, 1984. New York, NY: Chapman & Hall.
9. W. Buntine. Learning classification trees. Statistics and Computing, 2: 63 - 73, 1992.
10. P. Chan, S. Stolfo, and D. Wolpert. Working Notes of the AAAI Workshop on Integrating Multiple Learned Models for Improving and Scaling Machine Learning Algorithms, 1996. Menlo Park, CA: AAAI Press.
11. E. I. Chang and R. P. Lippmann. Using genetic algorithms to improve pattern classification performance. In Advances in Neural Information Processing Systems, R. P. Lippmann, J. E. Moody and D. S. Touretzky, Eds., 3: 797 - 803, 1991. San Mateo, CA: Morgan Kaufmann.
12. C. A. C. Coello. List of References on Evolutionary Multiobjective Optimization. [http://www.lania.mx/~ccoello/EMOO/EMOObib.html].
13. C. A. C. Coello, D. A. Van Veldhuizen, and G. B. Lamont. Evolutionary Algorithms for Solving Multi-Objective Problems, 2002. Boston, MA: Kluwer.
14. K. Deb. Multi-objective genetic algorithms: problem difficulties and construction of test problems. Evolutionary Computation, 7(3): 205 - 230, 1999.
15. K. Deb. Multiobjective Optimization Using Evolutionary Algorithms, 2001. Chichester, UK: Wiley.
16. K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evolutionary Computation, 6(2): 182 - 197, 2002.
17. R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification, 2nd Edition, 2001. New York, NY: Wiley.
18. C. M. Fonseca and P. J. Fleming. Multiobjective optimization and multiple constraint handling with evolutionary algorithms - Part I: a unified formulation. IEEE Trans. Systems, Man and Cybernetics - Part A: Systems and Humans, 28(1): 26 - 37, 1998.
19. J. H. Friedman. A recursive partitioning decision rule for nonparametric classification. IEEE Trans. Computers, 26(4): 404 - 408, 1977.
20. M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness, 1979. San Francisco, CA: Freeman.
21. S. Geman, E. Bienenstock, and R. Doursat. Neural networks and the bias/variance dilemma. Neural Computation, 4(1): 1 - 58, 1992.
22. L. Hansen and P. Salamon. Neural network ensembles. IEEE Trans. Pattern Analysis & Machine Intelligence, 12(10): 993 - 1001, 1990.
23. M. P. Hansen and A. Jaszkiewicz. Evaluating the quality of approximations of the nondominated set. Tech. Rep. IMM-REP-1998-7, Institute of Mathematical Modelling, Technical University of Denmark, 1998.
24. S. Haykin. Neural Networks - A Comprehensive Foundation, 2nd Edition, 1999. Englewood Cliffs, NJ: Prentice Hall.
25. E. G. Henrichon and K. S. Fu. A nonparametric partitioning procedure for pattern classification. IEEE Trans. Computers, 18(7): 614 - 624, 1969.
26. G. E. Hinton. Connectionist learning procedures. Artificial Intelligence, 40: 185 - 234, 1989.
27. D. Hochbaum (Ed.). Approximation Algorithms for NP-Hard Problems, 1997. Boston, MA: PWS.
28. R. A. Jacobs. Methods for combining experts' probability assessments. Neural Computation, 450 - 463, 1995.
29. R. A. Jacobs and M. I. Jordan. Learning piecewise control strategies in a modular neural network architecture. IEEE Trans. Systems, Man & Cybernetics, 23: 337 - 345, 1993.
30. A. K. Jain and R. C. Dubes. Algorithms for Clustering Data, 1988. Englewood Cliffs, NJ: Prentice-Hall.
31. M. I. Jordan and R. A. Jacobs. Hierarchical mixtures of experts and the
On Machine Learning with Multiobjective Genetic Optimization
423
EM algorithm. Neural Computation, 6(2): 181 - 214, 1994. 32. J. D. Knowles and D. W. Corne. Approximating the non-dominated front using the Pareto Archived Evolution Strategy. Evolutionary Computation, 8(2): 149 - 172, 2000. 33. J. D. Knowles and D. W. Corne. On metrics for comparing nondominated sets. In Proc. Congress on Evolutionary Computation (CEC-02), 711 - 716, 2002. Piscataway, NJ: IEEE Press. 34. R. Kumar, On generalization of machine learning with neural-evolutionary computations. In Proc. 3rd Int. Conf. Computational Intelligence & Multimedia Applications (ICCIMA-99), 112 - 116. 1999. Los Alamitos, CA: IEEE Computer Society Press. 35. R. Kumar. ANCHOR - A connectionist architecture for partitioning feature spaces and hierarchical nesting of neural nets. Int. Journal Artificial Intelligence Tools, 9(3): 397 - 416, 2000. 36. R. Kumar. A neural network compiler system for hierarchical organization. ACM SIGPLAN Notices, 36(2): 26 - 36, 2001. 37. R. Kumar. Multicriteria network design using distributed evolutionary algorithm. In Proc. Int. Conf. High Performance Computing (HiPC), LNCS 2913: 343 - 352, 2003. Berlin Heildberg: Springer-Verlag. 38. R. Kumar, W. C. Chen, and P. I. Rockett. Bayesian labeling of image corner features using a grey-level corner model with a bootstrapped modular neural network. In Proc. IEE Int. Conf. Artificial Neural Networks, 440: 82 - 97, 1997. London: IEE Conference Publication. 39. R. Kumar and P. I. Rockett. Assessing the Convergence of Rank-Based Multiobjective Genetic Algorithms. In Proc. IEE-IEEE 2nd Int. Conf. Genetic Algorithms in Engineering Systems: Innovations & Applications, 446: 19 - 23, 1997. London: IEE Conference Publication. 40. R. Kumar and P. I. Rockett. Multiobjective genetic algorithm partitioning for hierarchical learning of high-dimensional pattern spaces : a learningfollows-decomposition strategy. IEEE Trans. Neural Networks, 9(5): 822 830, 1998. 41. R. Kumar and P. I. Rockett. 
Improved sampling of the Pareto-front in multiobjective genetic optimizations by steady-state evolution : a Pareto converging genetic algorithm. Evolutionary Computation, 10(3): 283 - 314, 2002. 42. P. Langley. Elements of Machine Learning, 1996. Morgan Kaufmann. 43. M. Laumanns, L. Thiele, K. Deo and E. Zitzler. Combining convergence and diversity in evolutionary multiobjective optimization. Evolutionary Computation, 10(3): 263 - 182, 2002. 44. T. M. Mitchell. Machine Learning, 1997. New York, NY: McGraw-Hill. 45. C. A. Murthy and N. Chowdhury. In search of optimal clusters using genetic algorithms. Pattern Recognition Letters 17(8): 825 - 832, 1996. 46. E. Oja. Neural networks, principal components and subspaces. Int. J. Neural Systems, 1(1): 61 - 68, 1989. 47. D. C. Plaut and G. E. Hinton. Learning sets of filters using backpropagation. Computer, Speech & Languages, 2: 35 - 61, 1987.
424
Rajeev Kumar
48. R. C. Purshouse and P. J. Fleming. Elitism, sharing and ranking choices in evolutionary multi-criterion optimization. Research Report No. 815, Dept. Automatic Control &; Systems Engineering, University of Sheffield, 2002. 49. J. R. Quinlan. C4-5: Programs for Machine Learning, 1993. New York, NY: McGraw Hill. 50. R. Reed. Pruning algorithms - a survey. IEEE Trans. Neural Networks, 4(5): 740 - 747, 1993. 51. G. Rudolph and A. Agapie. Convergence properties in some multiobjective evolutionary algorithms. In Proc. Congress of Evolutionary Computation (CEC-00), 1010 - 1016, 2000. Piscataway, NJ: IEEE Press. 52. C. Schaffer. Overfitting avoidance as a bias. Machine Learning, 10(2): 153 178, 1993. 53. S. Z. Selim and M. A. Ismail. K-means type algorithms: a generalized convergence theorem and characterization of local optimality. IEEE Trans. Pattern Analysis & Machine Intelligence, 6(1): 81 - 87, 1984. 54. A. J. C. Sharkey. Combining Artificial Neural Nets: Ensemble and Modular Multi-Net System: Perspective in Neural Computing. Springer-Verlag, London, 1999. 55. N. E. Sharkey and A. J. C. Sharkey. An analysis of catastrophic interference. Connection Science, 7(3-4): 301 - 329, 1995. 56. R. Srikanth et al. A variable length genetic algorithm for clustering and classification. Pattern Recognition Letters, 16(8): 789 - 800, 1995. 57. N. Srinivas and K. Deb. Multiobjective Optimization using Non-dominated Sorting in Genetic Algorithms. Evolutionary Computation, 2: 221 - 248, 1994. 58. R. S. Sutton. Two problems with backpropagation and other steepestdescent learning procedure for networks. In Proc. Annual Conf. Cognitive Society, 828 - 831, 1986. Hillsdale: Lawrence Erlbaum. 59. K. C. Tan, T. H. Lee, and E. F. Khor. Evolutionary algorithms for multiobjective optimization: performance assessment and comparisons. In Proc. Congress on Evolutionary Computation (CEC-01), 979 - 986, 2001. Piscataway, NJ: IEEE Press. 60. A. Toffolo and E. Benini. 
Genetic diversity as an objective in multiobjective evolutionary algorithms. Evolutionary Computation, 11(2): 151 - 167, 2003. 61. A. Waibel, H. Sawai, and K. Shikano. Modularity and scaling in large phonemic neural networks. IEEE Trans. Acoustics, Speech & Signal Processing, 37(12): 1888 - 1897, 1989. 62. A. S. Weigend. On overfitting and the effective number of hidden units. In Proc. 1993 Connectionist Models Summer School, M. C. Mozer, P. Smolensky, D. S. Touretzky, J. L. Elman, and A. S. Weigend, Eds., 335 - 342, 1994. Hillsdale, NJ: Lawrence Erlbaum. 63. D. H. Wolpert. Stacked Generalization. Neural Networks, 5(2): 241 - 259, 1992. 64. D. H. Wolpert and W. G. Macready. No free lunch theorems for optimization. IEEE Trans. Evolutionary Computation, 1(1): 67 - 82, 1997. 65. E. Zitzler, M. Laumanns, and L. Thiele. SPEA2: Improving the strength
On Machine Learning with Multiobjective Genetic Optimization
425
Pareto evolutionary algorithm. In Proc. Evolutionary Methods for Design,
Optimization and Control with Applications to Industrial Problems (EUROGEN), 2001. 66. E. Zitzler and L. Thiele. Multiobjective Evolutionary Algorithms: a Comparative Case Study and the Strength Pareto Approach. IEEE Trans. Evolutionary Computation, 3: 257 - 271, 1999. 67. E. Zitzler, L. Thiele, M. Laumanns, C. M. Fonseca, and V. G. da-Fonseca. Performance assessment of multiobjective optimizers : an analysis and review. IEEE Trans. Evolutionary Computation, 7(2): 117 - 132, 2003.
CHAPTER 18
GENERALIZED ANALYSIS OF PROMOTERS: A METHOD FOR DNA SEQUENCE DESCRIPTION
R. Romero Zaliz (a), I. Zwir (b) and E. Ruspini (c)

(a) Department of Computer Science, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina. E-mail: [email protected]
(b) Department of Molecular Microbiology, Washington University School of Medicine, St. Louis, Missouri, U.S.A.
(c) Artificial Intelligence Center, SRI International, Menlo Park, California, U.S.A. E-mail: [email protected]

Recent advances in the accessibility of databases containing representations of complex objects (exemplified by repositories of time-series data, information about biological macromolecules, or knowledge about metabolic pathways) have not been matched by the availability of tools that facilitate the retrieval of objects of particular interest while helping users understand their structure and relations. In applications such as the analysis of DNA sequences, on the other hand, requirements to retrieve objects on the basis of qualitative characteristics are poorly met by descriptions that emphasize precision and detail rather than structural features. This chapter presents a method for the identification of interesting qualitative features in biological sequences. Our approach relies on a generalized clustering methodology, where the features being sought correspond to the solutions of a multivariable, multiobjective optimization

Corresponding author. Currently at Departamento de Ciencias de la Computación e I.A., E.T.S. Ingeniería Informática, C/ Daniel Saucedo Aranda s/n, 18071 Granada, España. Tel: (34) 958-240469, E-mail: [email protected].
problem and generally correspond to fuzzy subsets of the object being represented. Foremost among the optimization objectives being considered are measures of the degree to which features resemble prototypical structures deemed to be interesting by database users. Other objectives include feature distance and, in some cases, performance criteria related to domain-specific constraints. Genetic-algorithm methods are employed to solve the multiobjective optimization problem. These optimization algorithms discover candidate features as subsets of the object being described that lie in the set of all Pareto-optimal solutions of that problem. These candidate features are then inter-related employing domain-specific relations of interest to the end users. We present results of the application of a method termed Generalized Analysis of Promoters (GAP) to identify one of the most important factors involved in the gene regulation problem in bacteria, which is crucial for detecting regulatory behaviors or genetic pathways as well as gene transcription: the RNA polymerase motif. The RNA polymerase or promoter motif presents vague submotifs linked by different distances, thus making its recognition in DNA sequences difficult. Moreover, multiple promoter motifs can be present in the same regulatory region, and all of them are potential candidates until experimental mutagenesis is performed. GAP is available for public use at http://soar-tools.wustl.edu.
18.1. Introduction

One of the big challenges of the post-genomic era is determining when, where and for how long genes are turned on or off 4. Gene expression is determined by protein-protein interactions among regulatory proteins and with RNA polymerase, and by protein-DNA interactions of these trans-acting factors with cis-acting DNA sequences in the promoters of regulated genes 22,7. Therefore, identifying these protein-DNA interactions, by means of the DNA motifs that characterize the regulatory factors operating in the transcription of a gene 12,23, becomes crucial for determining which genes participate in a regulation process, how they behave, and how they are connected to build genetic networks. The RNA polymerase or promoter is an enzyme that transcribes a gene or recruits other regulatory factors to interact with it, producing cooperative regulations 22. Different computational methods have been applied to discover promoter motifs or patterns 9,11,15,2,12. However, most of them fail to provide accurate predictions in prokaryotic promoters because of the variability of the pattern, which comprises more than one vague submotif and variable distances between them. Moreover, multiple occurrences of promoters in the same regulatory region of one gene can be found (e.g., different promoters can be used for gene
activation and repression, or can interact with different regulatory factors from the same regulatory pathway 19,17). This paper presents a method termed Generalized Analysis of Promoters (GAP), which applies generalized clustering techniques 27,35 to the discovery of qualitative features in complex biological sequences, particularly multiple promoters in bacterial genomes. The motivation for the development of this methodology is provided by requirements to search and interpret databases containing representations of this type of object in terms that are close to the needs and experience of the users of those data-based descriptions. These qualitative features include both interesting substructures and interesting relations between those structures, where the notion of interestingness is provided by domain experts by means of abstract qualitative models, or is learned from available databases. The GAP method represents promoter features as fuzzy logic expressions with fuzzy predicates, whose membership functions are learned from probabilistic distributions 28,21,36. The proposed method takes advantage of a newly developed Multi-Objective Scatter Search (MOSS) algorithm to identify multiple promoter occurrences within genomic regulatory regions by optimizing the multiple criteria that features describing promoters should satisfy. This methodology formalizes previous attempts to produce exhaustive searches of promoters 12, most of which emphasize the processing of detailed system measurements rather than that of qualitative features of direct meaning to users (called perceptions by Zadeh) 32. Therefore, this chapter is organized as follows: Section 18.2 describes the generalized clustering framework; Section 18.3 explains the problem of discovering and describing bacterial promoters; Section 18.4 applies the GAP method to the promoter discovery problem in the Escherichia coli (E. coli) genome; Section 18.5 shows the results obtained by the proposed method and its evaluation; and Section 18.6 summarizes the concluding remarks.
18.2. Generalized Clustering

The method presented in this paper belongs to a family of techniques for the discovery of interesting structures in datasets by classification of their points into a finite number of fuzzy subsets, or fuzzy clustering. Fuzzy clustering methods were introduced by Ruspini 25 to provide a richer representation scheme, based on a flexible notion of partition, for the summarization of dataset structure, and to take advantage of the ability of
continuous-analysis techniques to express and treat classification problems in a formal manner. In Ruspini's original formulation the clustering problem was posed as a continuous-variable optimization problem over the space of fuzzy partitions of the dataset. This formulation of the clustering problem as an optimization problem has been largely retained in various extensions of the approach, which differ primarily in the nature of the functionals being optimized and in the constraints that the partition must satisfy 3. The original approach proposed by Ruspini, however, focused on the determination of the clustering as a whole, i.e., a family of fuzzy subsets of the dataset providing a disjoint, exhaustive partition of the set into interesting structures. Recent developments, however, have emphasized the determination of individual clusters as fuzzy subsets having certain optimal properties. From this perspective, a fuzzy clustering is a collection of optimal fuzzy clusters (each cluster is optimal in some sense and the partition satisfies certain conditions) rather than an optimal partition (the partition, as a whole, minimizes some predefined functional defining classification quality). Redirecting the focus of the clustering process to the isolation of individual subsets having certain desirable properties also provides a better foundation for the direct characterization of interesting structure, while freeing the clustering process from the requirement that clusters be disjoint and that partitions be exhaustive. In the context of image-processing applications, for example, features may correspond to certain interesting prototypical shapes. In these applications not every image element may belong to an interesting feature, while some points might belong to more than one cluster (e.g., the intersection of two linear structures).
It was, indeed, in the context of image-processing applications that Krishnapuram and Keller 14 reformulated the fuzzy clustering problem so as to permit the sequential isolation of clusters. This methodology, called possibilistic clustering, does not rely, as previous approaches do, on prior knowledge about the number of clusters, while permitting full advantage to be taken of clustering methods based on the idea of a prototype. Prototype-based classification methods 3 are based on the idea that a dataset can be represented, in a compact manner, by a number of prototypical points. The well-known fuzzy c-means method of Bezdek, the earliest fuzzy-clustering approach exploiting this idea, seeks to describe a dataset by a number of prototypical points lying in the same domain as the members of that dataset. Extensions of this basic idea based on a
generalization of the notion of prototypical structure in a variety of ways (e.g., as line or curve segments in some euclidean space) are the basis for methods that seek to represent datasets in terms of structures that have been predefined as being of particular interest to those seeking to understand the underlying physical systems being studied. Generally speaking, however, these methods require that prototypical structures belong to certain restricted families of objects so as to exploit their structural properties (e.g., the linear structure of line segments or hyperplane patches). The generalized clustering methodology presented in this paper belongs to this type of approach, extending it by allowing arbitrary definitions of interesting structures provided by users by means of a family of parameterized models M = {Mα} and a set of relations between them 26,35. In addition to a variety of geometric structures, these models may also be described by means of structures (e.g., neural networks) learned from significant examples of the features being defined, or in terms of very general constraints that features must satisfy to some degree (soft or fuzzy constraints). As is the case with possibilistic clustering methods, our approach is based on the formulation of the qualitative-feature identification problem in terms of the optimization of a continuous functional Q(F, Mα) that measures the degree of matching between a fuzzy subset F of the dataset and some instantiation Mα of the family of interesting models 27. Our approach recognizes, however, that simple reliance on the optimization of a single performance index Q would typically result in the generation of a large number of features with small extent and poor generalization, as it is usually easier to match smaller subsets of the dataset than significant portions of it.
For this reason, it is also necessary to consider, in addition to measures Q of representation quality, additional criteria S gauging the size of the structure being represented. In addition, it may also be necessary to consider application-specific criteria introduced to assure that the resulting features are valid and meaningful (e.g., constraints preventing selective picking of sample points so that they lie, for example, close to a line in sample space). This multiobjective problem might be treated by aggregation of the multiple measures of feature desirability into a global measure of cluster quality. A problem with this type of approach, which is close in spirit to minimum description length methods 24, is the requirement to provide a priori relative weights for each of the objectives being aggregated. It should be clear that assignment of a larger weight to measures Q of representation quality would lead to small features with higher degrees of matching to
models in the prototype families while, conversely, assigning higher weights to measures S of cluster extent would tend to produce larger clusters, albeit with poor modeling ability. Ideally, a family of optimization problems, each similar in character to the others but with different weights assigned to each of the aggregated objectives, should be solved so as to produce a full spectrum of candidate clusters. Rather than following such a path, involving the solution of multiple problems, our approach relies instead on a reformulation of the generalized clustering problem as a multiobjective optimization problem involving several measures of cluster desirability 27. In this formulation, subsets of the dataset of potential interest are locally optimal in the Pareto sense, i.e., they are locally nondominated solutions of the optimization problem. Locally nondominated solutions of a multiobjective optimization problem are those points in feature space such that no neighbor has better objective values for all objectives while being strictly superior in at least one of them (i.e., a better value, for a neighbor, of some objective implies a lower value of another). The set of these solutions is called the local Pareto-optimal or local effective frontier. We employ a multiobjective genetic algorithm (MGA) 27 based on an extension of methods originally proposed by Martí and Laguna 18,8 to solve this problem. This method is a particularly attractive tool for such complex optimization problems because of its generality and its capability, stemming from the application of multimodal optimization procedures, to isolate local optima.

18.3. Problem: Discovering Promoters in DNA Sequences

Biological sequences, such as DNA or protein sequences, are a good example of the type of complex objects that may be described in terms of meaningful structural patterns.
Availability of tools to discover these structures and to annotate the sequences on the basis of those discoveries would greatly improve the usefulness of these repositories, which currently rely on methods developed for computational efficiency and representation accuracy rather than for the structural and functional properties deemed important by molecular biologists. An important example of biological sequences is the prokaryotic promoter data gathered and analyzed by many compilations 10,9,16, which reveal the presence of two well conserved sequences or submotifs separated by variable distances, and a less conserved sequence.

h The notions of proximity and neighborhood in feature space are application dependent.

The variability of the distance between submotifs and their fuzziness, in the sense that they present several mismatches, hinder the existence of a clear model of prokaryotic core promoters. The most representative promoters in E. coli (i.e., σ70 subunits) are described by the following conserved patterns:

(1) TTGACA: This pattern is a hexanucleotide conserved sequence whose middle nucleotide is located approximately 35 base pairs upstream of the transcription start site. The consensus sequence for this pattern is TTGACA and the nucleotides reported in 16 reveal the following nucleotide distribution: T69 T79 G61 A56 C54 A54, where, for instance, the first T is the nucleotide most often seen in the first position of the pattern and is present in 69% of the cases. This pattern is often called the -35 region.

(2) TATAAT: This pattern is also a hexanucleotide conserved sequence, whose middle nucleotide is located approximately 10 base pairs upstream of the transcription start site. The consensus sequence is TATAAT and the nucleotide distribution in this pattern is T77 A76 T60 A61 A56 T82; it is often called the -10 region.

(3) CAP Signal: In general, a pyrimidine (C or T) followed by a purine (A or G) composes the CAP signal. This signal constitutes the transcription start site (TSS) of a gene.

(4) Distance(TTGACA, TATAAT): The distance between the TTGACA and TATAAT consensus submotifs follows a data distribution between 15 and 21 base pairs. This distance is critical in holding the two sites at the appropriate separation for the geometry of RNA polymerase 10.

The identification of the former RNA polymerase or promoter sites becomes crucial to detect gene activation or repression through the way in which such promoters interact with different regulatory proteins (e.g., overlapping suggests repression, and distances of approximately 40 base pairs suggest typical activation).
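The per-position frequencies above can be read as discrete fuzzy sets over the four nucleotides. The following is a minimal sketch of how a candidate hexamer might be scored against the -10 distribution; the quoted dominant-nucleotide frequencies come from the text, while averaging as the aggregation rule is an illustrative assumption, not necessarily the choice made by GAP:

```python
# Score a hexamer against the -10 (TATAAT) per-position nucleotide
# frequencies quoted in the text: T77 A76 T60 A61 A56 T82.
# The remaining probability mass per position is not given in the text,
# so only the dominant nucleotide of each position is modeled here.

TATAAT_FREQS = [
    {"T": 0.77}, {"A": 0.76}, {"T": 0.60},
    {"A": 0.61}, {"A": 0.56}, {"T": 0.82},
]

def match_degree(hexamer, freqs):
    """Average per-position membership; 0.0 for unlisted nucleotides."""
    assert len(hexamer) == len(freqs)
    return sum(f.get(n, 0.0) for n, f in zip(hexamer, freqs)) / len(freqs)

consensus = match_degree("TATAAT", TATAAT_FREQS)  # perfect consensus match
one_off = match_degree("TATGAT", TATAAT_FREQS)    # one mismatch, position 4
print(consensus, one_off)
```

Under this scheme a sequence with "several mismatches" still receives a graded, nonzero degree of matching, which is exactly the flexibility a crisp consensus pattern lacks.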
Moreover, combining the promoter sites with other regulatory sites 37 can reveal different types of regulation: RNA polymerase acting alone, RNA polymerase recruiting another regulatory protein, or cooperative regulation among more than one regulator 22. Different methods have been used to identify promoters 30,15,2,9, but several fail to perform accurate predictions because of their lack of flexibility, using crisp instead of fuzzy models for the submotifs (e.g., TATAAT or TTGACA 31), or restricting distances between submotifs to fixed values (e.g., 17 base pairs 12). The vagueness of the compound promoter motifs, and the uncertainty about which of the predicted sites correspond to a functional promoter, can be completely resolved only by performing mutagenesis experiments 22. Thus more accurate and interpretable predictions would be useful in order to reduce experiment costs and ease the researchers' work.

18.4. Biological Sequence Description Methods

In this paper we present results of the application of GAP to the discovery of interesting qualitative features in DNA sequences based on the ideas discussed in Section 18.2. The notion of an interesting feature is formally defined by means of a family of parameterized models M = {Mα} specified by domain experts 27 who are interested in finding patterns such as epoch descriptors of individual or multiple DNA sequences. These idealized versions of prototypical models are the basis for a characterization of clusters as cohesive sets that is more general than their customary interpretation as "subsets of close points." To address the promoter prediction problem we take advantage of the ability to represent imprecise and incomplete motifs, the flexibility and interpretability of fuzzy set representations, and the ability of multi-objective genetic algorithms to obtain optimal solutions using different criteria. Our proposed method GAP represents each promoter submotif (i.e., the -10 and -35 regions and the distance that separates them) as fuzzy models, whose membership functions are learned from data distributions 13,21. In addition, as a generalized clustering method, GAP considers the quality of matching with each promoter submotif model (Q), as well as the size of the promoter extent (S), by means of the distance between submotifs, as the multiple objectives to be optimized. To do so, we used a Multi-objective Scatter Search (MOSS) optimization algorithm 18,8, which obtains a set of multiple and optimal promoter descriptions for each promoter region.
Moreover, the former matching is also treated by MOSS as a multimodal problem, since there is more than one solution for each region. GAP, by using MOSS, outperforms other methods used for DNA motif discovery, such as Consensus/Patser, which is based on probabilistic weight matrices (see Section 18.5), and provides the desired trade-off between accurate and interpretable solutions, which is particularly desirable for end users. The extension of the original Scatter Search (SS) heuristic 18 uses the DNA regions where promoters should be detected as inputs and finds all optimal
relationships among promoter submotifs and distance models. In order to extend the original SS algorithm to a multi-objective environment we need to introduce some concepts 6,5. A multi-objective optimization problem is defined as:

Maximize Q_m(x, Mα),          m = 1, 2, ..., |M|;
subject to g_j(x) >= 0,       j = 1, 2, ..., J;
           h_k(x) = 0,        k = 1, 2, ..., K;
           x_i^(L) <= x_i <= x_i^(U),   i = 1, 2, ..., n;

where Mα is a generalized clustering model, |M| corresponds to the number of models and Q_m to the objectives to optimize, J to the number of inequality constraints, K to the number of equality constraints, and finally n is the number of decision variables. The last set of constraints restricts each decision variable x_i to take a value within a lower bound x_i^(L) and an upper bound x_i^(U). Specifically, we consider the following instantiations:

• |M| = 3. We have three models: Mα^1 and Mα^2 are the models for the two boxes, the TTGACA-box and the TATAAT-box, respectively, and Mα^3 corresponds to the distance between these two boxes (recall Equations 1 and 2, and Figure 18.1).
• |Q| = 3. We have three objectives consisting of maximizing the degree of matching to the fuzzy models (fuzzy membership): Q_1(x, Mα^1), Q_2(x, Mα^2) and Q_3(x, Mα^3).
• J = 1. We have just one constraint g_1: the distance between boxes can be no less than 15 and no more than 21 base pairs.
• K = 0. No equality constraints are needed.
• Only valid solutions are kept in each generation.
• The boxes cannot be located outside the searched sequence; that is, a box cannot start at a negative position or at a position greater than the length of the query sequence.

Definition 8: A solution x is said to dominate a solution y (x ≺ y) if both conditions 1 and 2 are true: (1) the solution x is no worse than y in all objectives: f_i(x) >= f_i(y) for all i = 1, 2, ..., M; (2) the solution x is strictly better than y in at least one objective: f_j(x) > f_j(y) for at least one j ∈ {1, 2, ..., M}. If x dominates the solution y, it is also customary to write that x is nondominated by y.
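For maximized objectives, the dominance test of Definition 8 translates directly into code. The sketch below is an illustrative helper (not part of GAP's published implementation), with hypothetical objective vectors standing in for the (Q_1, Q_2, Q_3) memberships of candidate promoter placements:

```python
def dominates(x_objs, y_objs):
    """True if x dominates y under maximization: x is no worse than y
    in every objective and strictly better in at least one (Definition 8)."""
    assert len(x_objs) == len(y_objs)
    no_worse = all(xo >= yo for xo, yo in zip(x_objs, y_objs))
    strictly_better = any(xo > yo for xo, yo in zip(x_objs, y_objs))
    return no_worse and strictly_better

# Hypothetical (Q1, Q2, Q3) vectors: memberships to the TTGACA, TATAAT
# and distance models of three candidate promoter placements.
a = (0.58, 0.80, 1.00)
b = (0.58, 0.75, 1.00)
c = (0.60, 0.70, 1.00)
print(dominates(a, b))  # a is no worse everywhere and better in Q2
print(dominates(a, c))  # a and c trade Q1 against Q2: neither dominates
```

Note that a solution never dominates itself: equality in every objective fails the "strictly better in at least one" condition.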
436
R. Romero Zaliz et al.
In order to code the algorithm, three different models were developed. Both submotif models were implemented by using their nucleotide consensus frequencies as discrete fuzzy sets, whose membership functions have been learned from distributions 13. The first model, corresponding to the TATAAT-box, was formulated as:

Mα^2 = μ_tataat(x) = μ_1^2(x_1) ∪ ... ∪ μ_6^2(x_6)    (1)

where the discrete fuzzy set corresponding to the first nucleotide of the submotif T0.77 A0.76 T0.60 A0.61 A0.56 T0.82 was defined as μ_1^2(x_1) = A/0.08 + T/0.77 + G/0.12 + C/0.05, and the fuzzy sets corresponding to positions 2-6 were calculated in a similar way according to data distributions from 16. The second model, corresponding to the TTGACA-box, was described as:

Mα^1 = μ_ttgaca(x) = μ_1^1(x_1) ∪ ... ∪ μ_6^1(x_6)    (2)

where the discrete fuzzy set corresponding to the first nucleotide of the submotif T0.69 T0.79 G0.51 A0.56 C0.54 A0.54 was defined as μ_1^1(x_1) = A/0.12 + T/0.69 + G/0.13 + C/0.06, and the fuzzy sets corresponding to positions 2-6 were calculated in a similar way according to data distributions from 16. The union operation corresponds to fuzzy set operations 21,13. The third model, i.e., the distance between the previous submotifs, was built as a fuzzy set whose triangular membership function Mα^3 (see Figure 18.1) was learned from data distributions 9, centered at 17, where the best value (one) is achieved. Therefore, the objective functions Q_m correspond to the memberships to the former fuzzy models Mα.
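The triangular distance model Mα^3 can be sketched as follows. The peak at 17 comes from the text; taking the support to be the feasible range [15, 21] from constraint g_1 is an assumption about the exact shape, which Figure 18.1 depicts but the text does not fully specify:

```python
def distance_membership(d, left=15, peak=17, right=21):
    """Triangular fuzzy membership for the TTGACA-TATAAT spacing:
    0 outside [left, right], rising linearly to 1 at the peak and
    falling linearly back to 0 at the right edge."""
    if d < left or d > right:
        return 0.0
    if d <= peak:
        return (d - left) / (peak - left)
    return (right - d) / (right - peak)

print(distance_membership(17))  # 1.0, the consensus spacing
print(distance_membership(19))  # 0.5, halfway down the right slope
print(distance_membership(14))  # 0.0, violates constraint g1
```

This makes Q_3 a graded objective: spacings near 17 base pairs score close to one, while spacings near the feasibility limits score close to zero rather than being rejected outright.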
Fig. 18.1. Graphical representation of Mα^3.

Combination Operator and Local Search. We used a block representation
to code each individual, where each block corresponds to one of the promoter submotifs (i.e., the TATAAT-box or the TTGACA-box). In particular, each block was represented by two integers, where the first number corresponds to the starting point of the submotif and the second one represents the size of the box (see Figure 18.2).

Fig. 18.2. Example of the representation of an individual. Phenotype: the sequence gtttatttaatgtttacccccataaccacataatcgcgttacact, with the TTGACA-box starting at character 6 and the TATAAT-box at character 29. Genotype: gene 0 = [(6,6)], gene 1 = [(29,6)]. Objective values: f1 = 0.578595, f2 = 0.800000, f3 = 1.000000.

The combination process was implemented as
a one-point combine operator, where the cut point is always located between the two blocks. For example, given chromosomes with two blocks A and B, the parents P = A1B1 and P' = A2B2 produce the siblings S = A1B2 and S' = A2B1. The local search was implemented as a search for nondominated solutions in a certain neighborhood; for example, a local search over the chromosome space involves a specified number of nucleotides located to the left or right of the blocks composing the chromosome. The selection process works as follows: a new mutated chromosome that dominates its parent replaces it, but if it is dominated by its parent no modification is performed. Otherwise, if the new individual is not dominated by the nondominated population found so far, it replaces its parent only if it is located in a less crowded region (see Figure 18.3).

Algorithm. We modified the original SS algorithm to allow multiobjective solutions by adding the nondominance criterion to the solution ranking6. Thus, nondominated solutions were added to the set in any order, but dominated solutions were only added if no more nondominated solutions could be found. In addition to maintaining a good set of nondominated solutions, and to avoid one of the most common problems of multi-objective algorithms such as multi-modality6, we also kept track of the diversity of
438
R. Romero Zaliz et al.
the available solutions through all generations. Finally, the initial populations were created randomly, and infeasible solutions, i.e., those outside the allowed distance ranges between promoter submotifs (gi), were checked at each generation. Figure 18.4 illustrates the MOSS algorithm proposed in GAP.

1: Randomly select a block g in the representation of the individual c to which local search is applied.
2: Randomly select a number n in [-neighbor, neighbor] and move the block g by n nucleotides. Note that it can be moved upstream or downstream. The resulting block is g' and the resulting individual is called c'.
3: if c' meets the restrictions then
4:   if c' dominates c then
5:     Replace c with c'
6:   end if
7:   if c' does not dominate c, c' is not dominated by c, and c' is not dominated by any solution in the nondominated set then
8:     Replace c with c' if crowd(c') < crowd(c).
9:   end if
10: end if

Fig. 18.3. Local search
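The block-level operators above can be sketched as follows. This is a minimal sketch, not the chapter's implementation: `evaluate` and `feasible` are assumed callbacks (objective vector and distance-restriction check), and the crowding tie-break of step 8 is omitted.

```python
import random

def dominates(u, v):
    """Pareto dominance for maximized objective vectors."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def combine(p, q):
    """One-point combination with the cut fixed between the two blocks:
    parents A1B1 and A2B2 yield siblings A1B2 and A2B1."""
    return [p[0], q[1]], [q[0], p[1]]

def local_search_step(chrom, evaluate, feasible, neighbor=3, rng=random):
    """One Fig. 18.3 move: shift one randomly chosen (start, size) block by
    up to `neighbor` nucleotides, up- or downstream, and keep the mutant
    only if it is feasible and dominates the original."""
    g = rng.randrange(len(chrom))          # step 1: pick a block
    n = rng.randint(-neighbor, neighbor)   # step 2: pick a shift
    cand = list(chrom)
    start, size = cand[g]
    cand[g] = (start + n, size)
    if feasible(cand) and dominates(evaluate(cand), evaluate(chrom)):
        return cand
    return chrom
```

For the Figure 18.2 genotype, `combine([(6, 6), (29, 6)], [(10, 6), (33, 6)])` swaps the second blocks of the two parents.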
18.5. Experimental Algorithm Evaluation

The GAP method was applied to a set of known promoter sequences reported in9. In this work, 261 promoter regions and 68 alternative solutions (multiple promoters) defined in9 for the corresponding sequences (329 regions in total) constituted the input of the method. To evaluate the performance of GAP, we first compare the results obtained with those retrieved by a typical DNA sequence analysis method, Consensus/Patser11. Then, we compare MOSS with two other Multiobjective Evolutionary Algorithms (MOEAs): the Strength Pareto Evolutionary Algorithm (SPEA)33 and the (μ + λ) Multi-Objective Evolutionary Algorithm (MuLambda)20. All of these MOEA algorithms share the following properties:
• They store the optimal solutions found during the search in an external set.
• They use the concept of Pareto dominance to assign fitness values to the individuals of the population.
1: Start with P = ∅. Use the generation method to build a solution and the local search method to improve it. If x ∉ P then add x to P; else, reject x. Repeat until P has the user-specified size.
2: Create a reference set RefSet with b/2 nondominated solutions of P and the b/2 solutions of P most diverse from the other b/2. If there are not enough nondominated solutions to fill the b/2, complete the set with dominated solutions.
3: NewSolution ← true
4: while there exists a solution not yet explored (NewSolution = true) do
5:   NewSolution ← false
6:   Generate subsets of RefSet such that there is at least one nondominated solution in each one.
7:   Generate an empty subset N to store nondominated solutions.
8:   while there are subsets to examine do
9:     Select a subset and mark it as examined.
10:    Apply combination operators to the solutions in the set.
11:    Apply local search to each new solution x found after the combination process, as explained in Figure 18.3, and name it xb.
12:    if xb is not dominated by any x ∈ N and xb ∉ N then
13:      Add xb to N.
14:    end if
15:   end while
16:   Add solutions y ∈ N to P if there is no solution z ∈ P that dominates y.
17:   NewSolution ← true.
18: end while

Fig. 18.4. MOSS algorithm
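The nondominance bookkeeping used when moving solutions into P can be sketched as a generic Pareto archive update (a sketch of the idea, not the exact MOSS data structures):

```python
def dominates(u, v):
    """Pareto dominance for maximized objective vectors."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def add_nondominated(archive, x):
    """Insert objective vector x unless an archive member dominates it, and
    drop members that x dominates (the spirit of step 16 of the MOSS loop)."""
    if any(dominates(z, x) for z in archive):
        return archive
    return [z for z in archive if not dominates(x, z)] + [x]
```

A new point that dominates existing members displaces them; a dominated point is rejected; mutually nondominated points accumulate.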
In particular, SPEA is a well-known algorithm with some special features33, including:
• The combination of the above techniques in a single algorithm.
• The determination of the fitness value of an individual using the solutions stored in the external population, so that dominance within the current population becomes irrelevant.
• All individuals of the external set participate in the selection procedure.
• A niching method to preserve diversity in the population. This method is based on Pareto optimality and does not require a distance parameter (e.g., the niche radius in a sharing function6).
MuLambda is a relatively new algorithm with a design very different from other Pareto approaches. This algorithm has the following characteristics20:
• It does not use any information from the dominated individuals of the population. Only nondominated individuals are kept from generation to generation.
• The population size is variable.
• It applies clustering to reduce the number of nondominated solutions stored without destroying the features of the optimal Pareto front.
As explained earlier, the MOSS approach has the following properties:
• Local search is used to improve the solutions found during the execution of the algorithm.
• The diversity of the solutions is kept by including in every generation a set of diverse solutions in the current population.
To compare the results obtained by the three algorithms, we use the same objective functions described in Section 18.4 and execute each algorithm 20 times with different seeds for each input sequence. A promoter is said to be found if it appears in at least one of the execution result sets. The parameters used in the experiments are listed in Table 18.68.

Table 18.68. Parameters for the algorithms
Parameter                        Value
Number of generations            200
RefSet                           16
Non-Dominated population size    300
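The detection criterion used in these runs (a promoter counts as found if it appears in at least one of the 20 per-seed result sets) can be expressed as a one-liner; the set-based representation of result sets is an assumption for illustration.

```python
def found(promoter, result_sets):
    """A promoter is 'found' if any per-seed result set contains it."""
    return any(promoter in rs for rs in result_sets)
```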
Our method outperforms Consensus/Patser11 by detecting 93.1% of the available promoters, while the latter, based on weight matrices, identifies 74%. Moreover, GAP, by using MOSS, also outperforms the other MOEA algorithms, as illustrated in Table 18.69.
Table 18.69. Results with different Multi-Objective Genetic Algorithms for all sequences. The Original column indicates the number of conserved promoter locations reported in the literature; the Alternative column indicates alternative locations also reported in the literature.

            Original  Alternative  %originals  %alternatives  Total  %total
MOSS         243       59           93.10%      86.76%         302    91.79%
SPEA         217       43           83.14%      63.24%         260    79.03%
(μ + λ) GA   223       52           85.44%      76.47%         275    83.59%
We should note that more than one possible description may exist for each promoter region, as illustrated in Figure 18.5 for the Ada gene reported in the Harley & Reynolds compilation9. These alternative descriptions were also found by MOSS in a higher percentage than by the other methods (86.76%). The complete set of results is given in the Appendix.

gttggtttttgcgtgatggtgaccgggcagcctaaaggctatcctt

Fig. 18.5. Different solutions for the Ada sequence. Three different alternative locations for the preserved sequences were included in the final set of the MOSS method, matching the three alternatives reported in the literature.
In addition to the number of promoters detected by the different MOEA algorithms, we use two other functions, C34 and D (see Equations 3 and 4), to get a better understanding of each algorithm's performance.

Definition 9: Let X', X'' ⊆ X be two sets of decision vectors. The function C maps the ordered pair (X', X'') to the interval [0, 1]:

C(X', X'') = |{a'' ∈ X''; ∃ a' ∈ X' : a' ⪰ a''}| / |X''|    (3)

D(X', X'') = |{a' ∈ X'; ∄ a'' ∈ X'' : a'' ⪰ a' ∧ a' ≠ a''}|    (4)
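Equations (3) and (4) translate directly into code. Below, `covers(a, b)` is weak Pareto dominance (at least as good in every maximized objective), which is an assumption about the ⪰ relation of34; the objective-vector representation is likewise illustrative.

```python
def covers(a, b):
    """Weak Pareto dominance: a >= b in every (maximized) objective."""
    return all(x >= y for x, y in zip(a, b))

def metric_C(Xp, Xpp):
    """Equation (3): fraction of X'' covered by at least one member of X'."""
    return sum(any(covers(a, b) for a in Xp) for b in Xpp) / len(Xpp)

def metric_D(Xp, Xpp):
    """Equation (4): members of X' covered by no distinct member of X''."""
    return sum(1 for a in Xp
               if not any(covers(b, a) and b != a for b in Xpp))
```

Note the asymmetry: metric_C(Xp, Xpp) and metric_C(Xpp, Xp) must both be computed, as the text goes on to stress.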
The value C(X', X'') = 1 in the former definition means that all solutions in X'' are equal to or dominated by solutions in X'. The opposite value, C(X', X'') = 0, represents the situation where no solution in X'' is covered by any solution in X'. Both C(X', X'') and C(X'', X') must be considered, since C(X', X'') is not necessarily equal to 1 - C(X'', X'). The function D(X', X'') counts the number of individuals in X' that are not dominated by any solution in X'' and are not found in X''. We show in Table 18.70 the average results obtained for the comparisons among the MOEA algorithms; the first part of the table reports C(X', X'') and the second part D(X', X''). These numbers were obtained by executing the algorithms 20 times with different seeds and averaging both functions over runs and sequences. As previously noted, function D counts the number of nondominated individuals of an algorithm that were not found by the other two MOEAs. The MOSS algorithm achieves the best value of D in all experiments, while SPEA and MuLambda present lower values. Moreover, those
Table 18.70. Sequence results

C(X', X'')    MOSS     SPEA     μ + λ
MOSS           -       0.538    0.360
SPEA          0.013     -       0.054
μ + λ         0.029    0.349     -

D(X', X'')    MOSS     SPEA     μ + λ
MOSS           -      14.204   12.977
SPEA          0.170     -       0.876
μ + λ         1.066    2.284     -
results obtained by MOSS do not fluctuate much between different sequences. MOSS leads the ranking, followed by MuLambda, with SPEA in the last position of the table. In addition, the diversity of solutions found by MOSS is considerably better than that of the other two algorithms (approximately seven times better according to the D value). Finally, MOSS is the most robust algorithm, finding a given promoter, on average, in 16.81 of the 20 runs. In contrast, SPEA obtains a promoter in 6.48 of the 20 runs, and MuLambda in 9.33.

18.6. Concluding Remarks

Generalized-clustering algorithms, which solve multivariable, multiobjective optimization problems, provide effective tools to identify interesting features that help to understand complex objects such as DNA sequences. We have proposed GAP, a promoter recognition method that was tested by predicting E. coli promoters. This method combines the advantages of a feature representation based on fuzzy sets with the searching abilities of multiobjective genetic algorithms to obtain accurate as well as interpretable solutions. Such solutions are the most useful ones for end users: they allow the detection of multiple occurrences of promoters, shedding light on different putative transcription start sites. The ability to find multiple promoters becomes even more useful when whole intergenic regions are considered, allowing the prediction of distinct regulatory activities, harboring activation or repression. The present approach can be extended to identify other DNA motifs that are also connected by variable distances, such as binding sites of transcriptional regulators (e.g., direct or inverted repeats). Therefore, by combining multiple and heterogeneous DNA motifs (e.g., promoters, binding sites, etc.), we can obtain different descriptions of the cis-acting regions and, thus, different regulatory environments.
The present implementation of GAP is available for academic use at the SOARTOOLS web site (http://soar-tools.wustl.edu) and will soon be updated with a new dataset from the RegulonDB database29 (in process).
Appendix

Tables 18.71 through 18.74 show the set of solutions found by GAP on the set of promoter examples published in9. The last column of the tables indicates whether GAP recognized the promoter or not, by the symbols ✓ and ☐, respectively. The first column corresponds to the name of the sequence, the second column shows the character position at which the TTGACA-box begins, and the third column shows the character position at which the TATAAT-box begins; these positions are the ones recognized by GAP. Only one result per sequence is shown due to space limitations. The fourth column corresponds to the sequence itself with each of the boxes clearly marked.
References

1. T. Bäck, D. Fogel, and Z. Michalewicz, Eds. Handbook of Evolutionary Computation. Institute of Physics Publishing and Oxford University Press, 1997.
2. T. L. Bailey and C. Elkan. The value of prior knowledge in discovering motifs with MEME. In Proc Int Conf Intell Syst Mol Biol, volume 3, pages 21-29, 1995.
3. J. C. Bezdek. Fuzzy clustering. In Handbook of Fuzzy Computation, E. H. Ruspini, P. P. Bonissone, and W. Pedrycz, Eds.: F6.2. Institute of Physics Press, 1998.
4. S. Brenner. Genomics: the end of the beginning. Science, 287(5461):2173-2179, 2000.
5. C. Coello Coello, D. Van Veldhuizen, and G. Lamont. Evolutionary Algorithms for Solving Multi-Objective Problems. Kluwer Academic Publishers, New York, May 2002.
6. K. Deb. Multi-Objective Optimization using Evolutionary Algorithms. John Wiley & Sons, 2001.
7. M. Gibson and E. Mjolsness. Computational Modeling of Genetic and Biochemical Networks, chapter: Modeling the Activity of Single Genes. The MIT Press, 2001.
8. D. E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, 1989.
9. C. B. Harley and R. P. Reynolds. Analysis of E. coli promoter sequences. Nucleic Acids Research, 15(5):2343-2361, 1987.
10. D. K. Hawley and W. R. McClure. Compilation and analysis of Escherichia coli promoter DNA sequences. Nucleic Acids Research, 11(8):2237-2255, 1983.
11. G. Z. Hertz and G. D. Stormo. Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics, 15(7/8):563-577, 1999.
12. A. M. Huerta and J. Collado-Vides. Sigma70 promoters in Escherichia coli: specific transcription in dense regions of overlapping promoter-like signals. J Mol Biol, 333(2):261-278, 2003.
13. G. J. Klir and T. A. Folger. Fuzzy Sets, Uncertainty, and Information. Prentice Hall International, 1988.
14. R. Krishnapuram and J. Keller. A possibilistic approach to clustering. IEEE Transactions on Fuzzy Systems, 98-110, 1993.
15. C. E. Lawrence, S. F. Altschul, M. S. Boguski, J. S. Liu, A. F. Neuwald, and J. C. Wootton. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science, 262(5131):208-214, 1993.
16. S. Lisser and H. Margalit. Compilation of E. coli mRNA promoter sequences. Nucleic Acids Research, 21(7):1507-1516, 1993.
17. J. Collado-Vides, B. Magasanik, and J. D. Gralla. Control site location and transcriptional regulation in Escherichia coli. Microbiol Rev, 55(3):371-394, 1991.
18. M. Laguna and R. Martí. Scatter Search: Methodology and Implementations in C. Kluwer Academic Publishers, Boston, 2003.
19. C. Mouslim, T. Latifi, and E. A. Groisman. Signal-dependent requirement for the co-activator protein RcsA in transcription of the RcsB-regulated ugd gene. J Biol Chem, 278(50):50588-50595, 2003.
20. R. Sarker, K.-H. Liang, and C. Newton. A new multiobjective evolutionary algorithm. European Journal of Operational Research, 140:12-23, 2002.
21. W. Pedrycz, P. P. Bonissone, and E. H. Ruspini, Eds. Handbook of Fuzzy Computation. Institute of Physics, 1998.
22. M. Ptashne and A. Gann. Genes and Signals. Cold Spring Harbor Laboratory Press, 2002.
23. M. G. Reese. Application of a time-delay neural network to promoter annotation in the Drosophila melanogaster genome. Computers & Chemistry, 26(1):51-56, 2002.
24. J. Rissanen. Stochastic Complexity in Statistical Inquiry. World Scientific, 1989.
25. E. H. Ruspini. A new approach to clustering. Information and Control, 15(1):22-32, 1969.
26. E. H. Ruspini and I. Zwir. Automated qualitative description of measurements. In Proc. 16th IEEE Instrumentation and Measurement Technology Conference, 1999.
27. E. H. Ruspini and I. Zwir. Automated generation of qualitative representations of complex objects by hybrid soft-computing methods. In Pattern Recognition: From Classical to Modern Approaches, S. K. Pal and A. Pal, Eds. World Scientific, Singapore, 2001.
28. E. H. Ruspini and I. Zwir. Automated generation of qualitative representations of complex objects by hybrid soft-computing methods. In S. K. Pal and A. Pal, editors, Lecture Notes in Pattern Recognition. World Scientific Company, 2001.
29. H. Salgado et al. RegulonDB (version 3.2): transcriptional regulation and operon organisation in Escherichia coli K-12. Nucleic Acids Research, 29:72-74, 2001.
30. A. Ulyanov and G. Stormo. Multi-alphabet consensus algorithm for identification of low specificity protein-DNA interactions. Nucleic Acids Research, 23(8):1434-1440, 1995.
31. J. van Helden, B. André, and J. Collado-Vides. A web site for the computational analysis of yeast regulatory sequences. Yeast, 16(2):177-187, 2000.
32. L. A. Zadeh. Outline of a computational theory of perceptions based on computing with words. In Soft Computing and Intelligent Systems: Theory and Applications, N. K. Sinha, M. M. Gupta, and L. A. Zadeh, Eds., pages 3-22. Academic Press, San Diego, 2000.
33. E. Zitzler and L. Thiele. Multiobjective evolutionary algorithms: a comparative case study and the Strength Pareto approach. IEEE Transactions on Evolutionary Computation, 3(4):257-271, 1999.
34. E. Zitzler, L. Thiele, and K. Deb. Comparison of multiobjective evolutionary algorithms: empirical results. Evolutionary Computation, 8(2):173-195, 2000.
35. I. Zwir and E. H. Ruspini. Qualitative object description: initial reports of the exploration of the frontier. In Proc. EUROFUSE-SIC'99, Budapest, Hungary, pages 485-490, 1999.
36. I. Zwir, R. Romero Zaliz, and E. H. Ruspini. Automated biological sequence description by genetic multiobjective generalized clustering. Ann N Y Acad Sci, 980:65-82, 2002.
37. I. Zwir, P. Traverso, and E. A. Groisman. Semantic-oriented analysis of regulation: the PhoP regulon as a model network. In Proceedings of the 3rd International Conference on Systems Biology (ICSB), 2003.
Table 18.71. Results for the training sequences, ada through divE. For each sequence the table lists the TTGACA-box start position, the TATAAT-box start position, the annotated promoter sequence with both boxes marked, and whether GAP found the promoter.
Table 18.72. Results for the training sequences, dnaA-1p through metF, in the same format as Table 18.71.
Table 18.73. Results for the training sequences, micF through rrnABP1, in the same format as Table 18.71.
Generalized Analysis of Promoters: A Method for DNA Sequence Description 449 sequence rrnABP2 rrnB-P3 rrnB-P4 rrnDEXP2 rrnD-Pl rrnE-Pl rrnG-Pl rrnG-P2 rrnXl RSFprimer RSFrnal S10 sdh-Pl sdh-P2 spc spot42r ssb str SucAB supB-E T7-A1 T7-A3 T7-C T7-D T7A2 T7E TAC16 TnlOPin TnlOPout TnlOtetA TnlOtetR TnlOtetR* TnlOxxxPl Tnl0xxxP2 Tnl0xxxP3 Tn2660bla-F3 Tn2661bla-Pa Tn2661bla-Pb Tn501mer Tn501merR Tn5TR Tn5neo Tn7-PLE tnaA tonB trfA trfB trp trpP2 trpR trpS trxA tufB tyrT tyrT/109 tyrT/140 tyrT/178 tyrT/212 tyrT/6 tyrT/77 uncl uvrB-Pl uvrB-P2 uvrB-P3 uvrC uvrD 434PR 434PRM
ttgaca 15 14 15 15 15 15 15 15 15 15 15 15 14 15 15 15 15 15 15 15 15 15 15 15 15 11 10 9 15 15 15 11 15 15 11 15 15 5 14 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 13 2 13 15 15 15 15 15 15 15 15
tataat 37 40 36 37 37 37 37 37 37 38 39 37 37 37 38 37 39 38 39 38 38 38 38 38 39 34 32 33 38 39 39 34 37 38 38 38 38 28 39 37 38 38 38 37 39 39 38 38 38 39 38 39 38 37 39 34 24 38 37 38 39 38 38 37 38 38
promoter GCAAAAATAAATGC CTATGATAAGGAT GCGTATCCGGTCAC CCTGAAATTCAGGG GATCAAAAAAATAC CTGCAATTTTTCTA TTTATATTTTTCGC AAGCAAAGAAATGC ATGCATTTTTCCGC GGAATAGCTGTTCG TAGAGGAGTTTGTC TACTAGCAATACGC ATATGTAGGTTAA AGCTTCCGCGATTA CCGTTTATTTTTTC TTACAAAAAGTGCT TAGTAAAAGCGCTA TCGTTGTATATTTC AAATGCAGGAAATC CCTTGAAAAAGAGG TATCAAAAAGAGTA GTGAAACAAAACGG CATTGATAAGCAAC CTTTAAGATAGGCG ACGAAAAACAGGTA CTTACGGATG AATGAGCTG TCATTAAG AGTGTAATTCGGGG ATTCCTAATTTTTG TATTCATTTCACTT TGATAGGGAG TTAAAATTTTCTTG AAATGTTCTTAAGA CCATGATAGA TTTTTCTAAATACA GGTTTATAAAATTC CCTC TTTTCCATATCGC CATGCGCTTGTCCT TCCAGGATCTGATC CAAGCGAACCGGAA ACTAGACAGAATAG AAACAATTTCAGAA ATCGTCTTGCCTTA AGCCGCTAAAGTTC AGCGGCTAAAGGTG TCTGAAATGAGCTG ACCGGAAGAAAACC TGGGGACGTCGTTA CGGCGAGGCTATCG CAGCTTACTATTGC ATGCAATTTTTTAG TCTCAACGTAACAC ACAGCGCGTCTTTG TTAAGTCGTCACTA TGCGCGCAGGTC C ATTTTTCTCAAC ATTATTCTTTAA TGGCTACTTATTGT TCGAGTATAATTTG TCAGAAATATTATG ACAGTTATCCACTA GCCCATTTGCCAGT TGGAAATTTCCCGC AAGAAAAACTGTAT ACAATGTATCTTGT
TTGACT TACTCA CTCTCA TTGACT TTGTGC TTGCGG TTGTCA TTGACT TTGTCT TTGACT TTGAAG TTGCGT TTGTAA TGGGCA TACCCA TTCTGA TTGGTA TTGACA TTTAAA TTGACG TTGACT TTGACA TTGACG TTGACT TTGACA ATGATA TTGACA TTAAGG CAGAAT TTGACA TTCTCT TGGTAA TTGATG TTGTCA TTTAAA TTCAAA TTGAAG GTGATA TTGACT TTCGAA TTCCAT TTGCCA TTGTAA TAGACA TTGAAT TTGACA TTGACG TTGACA GTGACA CTGATC ATCTCA TTTACG TTGCAT TTTACA TTTACG TACAAA GTGACG ATGTCG GTAACA TCGCCA TTGAAA TTGGCA GTGATG TTCCTG TTGTCT TTGGCA TTGACA TTGTCA
CTGTAGCG GGAAGGCG TCTTATCCTT ATCAAACCGT CCTGACA GTTCGTGG CTGAAAGA GGAAAGCG AAAAAATT GGGATCCC CCTGCGGA GAACTCCC GGCCGGAA TAACTCCC CTGTAGCG GGAAGGCG TCCTGAGC CGACTCCC TGATAGAC CGATTGATT TTATGCACC TGTTAAGGC TCGGTGGT TAAGTATG TGATTTTG TGAACAGCC GCTTCTTC GTCAAATT TATCCTTG AAGCGGTGT ACTGAACA AAAAAGAG ATGGTACAA TCGCGCGTT CCTTTTCG GCATCGCCC AACTGCCCC TGACACTAA CTGCAAGG CTCTATACG TAAAGTCT AACCTATAG ACATGAAG TAAACACGG CAATGTTA ATGGGCTGA TGATGGGT CTTTAGGTG ACATGAAGT AACATGCAG TTTACACA TTACAGTGA ATTAATCA TCGGCTCG TGGATACAC ATCTTGTCA TGGTAAAG AGAGTCGTG CTCTATCAT TGATAGAGT ATCACTGAT AGGGAGTGG AATAACTC TATCAATGA ATTTTTAT TTCCATGA CGACCACA TCATCATGA ATAACATACCGTCAGTATGTT TATGTATC CGCTCATGA ACGAAAGG GCCTCGTGA CGCTTATT TTTATAGGT CCGTAGATG AGTAGGGAAG TTGAAATT GGATAGCG GTGACCTC CTAACATGG GCTGGGGC GCCCTCTGG ACTGAAAT CAGTCCAGT AAAACTCT GAGTGTAA ATGATTGCT ATTTGCATT GCGGAACCA ATGTTTAGC TGCGAGAA ATGTTTAGC ATTAATCA TCGAACTAG TTTTAACA CGTTTGTTA CGCACGTTT ATGATATGC GCCAGCCT GATGTAATT AAAGCGTAT CCGGTGAAA GAACTCGC ATGTCTCCA GCGGCGCG TCATTTGA GTAATCGAA CGATTATTC GTACTGGCA CAGCGGGTC TCGAGAA AAACGTCT ATCATACC TACA.CAGC CTTTACAG GCGCGTCA GCAAAAATA ACTGGTTACC TCACGGGG GCGCACCG TAATTAAG TACGACGAG AACTGTTTT TTTATCCAG TGGATAAC CATGTGTAT GAACGTGA ATTGCAGAT TCTCTGAC CTCGCTGA AACAAGAT ACATTGTAT AATACAGT TTTTCTTGT
TATTAT TAAAAT TAAAAT TAATAT TATAAT TATAAT TATAAT TATTAT TATAAT CATCAT TAAACT TATAAT TATACT TATCAT TATAAT TAAAGT TACACT TAAAAT GACAGT CATAAT GAT ACT TACGAT TAGTCT TAGGCT TAAGAT TATACT TATAAT TATGAT TAAAAT TATTTT TAAAAT TAGAGT TAGATT TACCAT TATGGT GACAAT TACGCT TAATGT TAAGGT TAACCT TAACGT TAAGGT TATGCT TAATGT TAAAAT TAAACT TAAACT TTAACT CAAGGT TATCGT TATCAG TAAAGT TAGAAT TATGAT TTTAAT TTTGTT TAAGTC TGAAGA. TTTGAT TTTAAT TATAAT TAAAAT TATAAT TAGAGT TATGCT TATAAT GAAAAT GAAGAT
Table 7. Results for the training sequences
GCACACCCCGCGCCGC GGGCGGTGTGAGCTTG AGCCAACCTGTTCGACA ACGCCACCTCGCGACAG GCGCCTCCGTTGAGACG GCGCCTCCATCGACACG GCGCCACCACTGACACG GCACACCGCCGCGCCG GCGCCTCCATCGACACG CTCATAAATAAAGAA GAAAGAACAGATTTTG GCGCGGGCTTGTCGT GCCGCCAGTCTCCGGAA GTGGGGCATCCTTACCG GCCGCGCCCTCGATA TAGTCGCGTAGGGTACA TATTCAGAACGATTTT TCGGCGTCCTCATAT TTTAAAAGGTTCCTT GCGCCCCGCAACGCCGA TACAGCCATCGAGAGGG GTACCACATGAAACGAC TATCTTACAGGTCATC TTAGGTGTTGGCTTTA ACAAATCGCTAGGTAAC CAAGGCGACTACAGATA GTGTGGAATTGTG CAAATGGTTTCGCGAAA ATCGAGTTCGCACATC ACCACTCCCTATCAGT AACTCTATCAATGATA GTCAACAAAAATTAGG TAAAATAACATACC AAACATACTGACGG ATCATGATGATGTGGTC AACCCTGATAAATGCT TATTTTTATAGGTTAA C ATGATA AT AATGGTTT TACGCTATCCfcATTTC TACTTCCGTACTCA TCATGATAACTTCTGCT TGGGAAGCCCTGCAA GTGAAAAAGCAT AGCCTCGTGTCTTGCG CGAGACCTGGTTT AGAGTCTCCTT TCTCTCATGTG AGTACGCAAGTTCACGT AAAGGCGACGCCGCCC ACTCTTTAGCGAGTACA TCTATAAATGACC CAACTAGTTGGTTAA GCGCGCTACTTGATGCC GCGCCCCGCTTCCCGAT CGCCAGCAAAAATAA TACGGTAATCG GTGCACTATACA TATGATGCGGGCAGGTCGTGACG ATGATGCGCCCCGCTTC CCGTTACGGATGAAAAT TTGACCGCTTTTTGAT TACATACCTGCCCGC TTGTTGGCATAATTAA TAGAAAACACGAGGCA GATGATCACCAAGG CAGCAAATCTGTATAT ACAAGAAAGTTTGTTGA TGGGGGTAAATAACAGA
found 7 •/ •/ S •/ •/ •/ •/ S v^ •/ •" S •/ S •/ •/ •/ •/ J S •/ •/ •/" •/ •/ •/ S •/ ./" •/ •/ •" •/ •/ S •/ •/ S S •/ S • •/ •/ •/ •/ •/ y •/ •/ S S •/" •/ • S J • S S •/ S •/ S •/ •/ •/
CHAPTER 19 MULTI-OBJECTIVE EVOLUTIONARY ALGORITHMS FOR COMPUTER SCIENCE APPLICATIONS
Gary B. Lamont, Mark P. Kleeman, Richard O. Day
Genetic Computational Techniques Research Group
Department of Electrical and Computer Engineering
Air Force Institute of Technology
Wright Patterson Air Force Base, Dayton, OH 45433, USA
E-mail: [email protected], mkleeman@afit.edu, [email protected]

In this chapter, we apply the multi-objective Messy Genetic Algorithm-II (MOMGA-II) to two NP-Complete multi-objective optimization problems. The MOMGA-II is a multi-objective version of the fast messy Genetic Algorithm (fmGA), an explicit building-block method. First, the MOMGA-II is used to determine 'good' formations of unmanned aerial vehicles (UAVs) in order to limit the amount of communication flow. The multi-objective Quadratic Assignment Problem (mQAP) is used to model the problem. Then, the MOMGA-II is applied to the Modified Multi-objective Knapsack Problem (MMOKP), a constrained integer-based decision-variable MOP. The empirical results indicate that the MOMGA-II is an effective algorithm to implement for similar NP-Complete problems of high complexity.
19.1. Introduction

Multi-objective evolutionary algorithms (MOEAs) are stochastic computational tools available to researchers for solving a variety of multi-objective problems (MOPs). Of course, there are many pedagogical polynomial-complexity MOPs that can be solved optimally using deterministic algorithms. However, there are also many real-world MOPs that are too computationally intensive for an optimal answer to be obtained in a reasonable amount of time. These problems are considered NP-complete (NPC) problems 1. The use of MOEAs to find "good" solutions to these high-dimension, exponential-complexity problems is of great utility. We address the general category of NPC problems as defined in a MOP structure. Two NPC MOP examples are solved in depth using MOEAs in order to provide specific insight. Some generic comments are presented that address MOEA approaches for NPC combinatoric problems.

19.2. Combinatorial MOP Functions

Multi-objective Optimization Problems (MOPs), a variation of Combinatorial Optimization Problems, are a highly researched area in the computer science and operations research fields. MOPs generally model real-world problems better than their single-objective counterparts, as most real-world problems have competing objectives that need to be optimized. MOPs are used to solve many NP-Complete problems. The detailed notational symbology for these problems, as well as for Pareto optimality, is described in Chapter 1. Table 19.75 lists just a few of these NP-Complete (NPC) types of problems. In essence, NPC combinatoric MOP problems are constrained minimization problems with the additional constraint on x such that it is only able to take on discrete values (e.g., integers). The use of these combinatorial MOPs in any proposed MOEA test suite should also be considered. On one hand, EAs often employ specialized representations and operators when solving these NPC problems, which usually prevents a general comparison between various MOEA implementations. On the other hand, the inherent difficulty of NPC problems should present desired algorithmic challenges and complement other test-suite MOPs. Databases such as TSPLIB 29, MP-Testdata 32, and OR Library 2 exist for these NP-Complete problems. On another note, the fitness landscapes of various NP-Complete problems vary over a wide range. For example, the knapsack problem reflects a somewhat smooth landscape, while the TSP exhibits a many-faceted landscape.
The latter is then more difficult to search for an "optimal" Pareto front. Other NP-Complete problem databases are also available 25,6,34. As an example, for the multi-objective 0/1 knapsack problem with n
1 By an "NP" problem is of course meant one solvable by a nondeterministic Turing machine in polynomial execution time. "C" refers to the polynomial mapping of various NP combinatoric problems to each other; i.e., a complete set 8.
Table 19.75. Possible Multi-objective NP-Complete Functions

  NP-Complete Problem                Example
  Travelling Salesperson             Min energy, time, and/or distance; Max expansion
  Coloring                           Min number of colors, number of each color
  Set/Vertex Covering                Min total cost, over-covering
  Maximum Independent Set (Clique)   Max set size; Min geometry
  Vehicle Routing                    Min time, energy, and/or geometry
  Scheduling                         Min time, missed deadlines, waiting time, resource use
  Layout                             Min space, overlap, costs
  NP-Complete Problem Combinations   Vehicle scheduling and routing
  0/1 Knapsack - Bin Packing         Max profit; Min weight
  Quadratic Assignment               Max flow; Min cost
knapsacks and m items, the objective is to maximize

    f(x) = (f_1(x), ..., f_n(x))                                (B.1)

where

    f_i(x) = \sum_{j=1}^{m} p_{i,j} x_j                         (B.2)

and where p_{i,j} is the profit of item j in knapsack i and x_j = 1 if item j is selected. The constraint is

    \sum_{j=1}^{m} w_{i,j} x_j \le c_i   for all i              (B.3)
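As a concrete sketch of evaluating this formulation, the snippet below computes the objective vector (B.2) and checks the constraint (B.3) for a shared selection vector x. The profit, weight, and capacity values are hypothetical illustration data, not taken from the chapter.

```python
# Sketch of evaluating the multi-objective 0/1 knapsack formulation
# (B.1)-(B.3): n knapsacks, m items, shared selection vector x with
# x_j = 1 if item j is selected. All numeric data here is hypothetical.

def evaluate(x, profits, weights, capacities):
    """Return the objective vector f(x) and whether x is feasible.

    profits[i][j]  -> p_ij, profit of item j in knapsack i    (B.2)
    weights[i][j]  -> w_ij, weight of item j in knapsack i    (B.3)
    capacities[i]  -> c_i,  capacity of knapsack i
    """
    f = [sum(p * xj for p, xj in zip(row, x)) for row in profits]
    feasible = all(
        sum(w * xj for w, xj in zip(row, x)) <= c
        for row, c in zip(weights, capacities)
    )
    return f, feasible

# Two knapsacks (n = 2), three items (m = 3).
profits = [[10, 5, 8], [4, 9, 3]]
weights = [[2, 3, 4], [3, 2, 2]]
capacities = [6, 5]

f, ok = evaluate([1, 0, 1], profits, weights, capacities)
# f == [18, 7]; the loads are 6 <= 6 and 5 <= 5, so the selection is feasible
```

Note that a single x is scored against all n profit rows at once, which is what makes the problem multi-objective rather than n independent knapsacks.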
where w_{i,j} is the weight of item j in knapsack i and c_i is the capacity of knapsack i.

19.3. MOP NPC Examples

To gain insight into applying multi-objective evolutionary algorithms (MOEAs) to NPC MOP problems, the multi-objective quadratic assignment problem and a modified MOP knapsack problem are discussed. This insight provides a general understanding of MOEA development for NPC MOPs.

19.3.1. Multi-Objective Quadratic Assignment Problem
The standard quadratic assignment problem (QAP) and the multi-objective quadratic assignment problem (mQAP) are NP-complete problems. Such
problems arise in real-world applications such as facilities placement, scheduling, data analysis, manufacturing, and resource use. Most QAP examples can be thought of as minimizing the product of two matrices, for example a distance matrix times a flow-matrix cost objective. Many approaches to solving large-dimensional QAPs involve hybrid algorithms, including GA integration with local search methods such as tabu search and simulated annealing. Here we examine the mQAP as mapped to a heterogeneous mix of unmanned aerial vehicles (UAVs) using a MOEA. Our model concentrates on minimizing communication flow and maximizing mission success by positioning UAVs in selected positions within a strict formation. Various experiments are conducted using a MOEA approach. The specific algorithm used was the multi-objective Messy Genetic Algorithm-II (MOMGA-II), an explicit building-block method. Solutions are then compared to deterministic results (where applicable). The symbolic problem description is initially discussed to provide problem-domain insight. Regarding a specific application, consider UAVs flying in large groups. One possible scenario is to have a heterogeneous group of UAVs flying together to meet a specific objective. There could be some in the group that are doing reconnaissance and reporting the information for security purposes. In a large heterogeneous group such as this, one UAV's position with respect to the other UAVs is important. For example, it would be best to place some UAVs around the outside of the group in order to protect the group as a whole. It would also be advantageous to have the reconnaissance UAVs nearer to the ground in order to allow them an unobstructed field of view. While location in the formation for their particular part of the mission is important, they also need to be in a position where they can communicate effectively with other UAVs.
For example, the reconnaissance UAVs need to communicate coordinates to enable them to find their target. Other UAVs need to communicate with all of the other UAVs when they sense approaching aircraft, so that the group can take evasive action (like fish behavior). All of this communication may saturate one communication channel, so multiple communication channels are used. All of these channels of communication can also dictate where the best location in the group may be for each UAV type. The UAV communication and mission success problem is a natural extension of the mQAP. The mQAP comes from the quadratic assignment problem (QAP) and was introduced by Knowles and Corne 17. The scalar quadratic assignment problem was introduced in 1957 by Koopmans and
Beckmann, when they used it to model a plant location problem 3. It is defined as follows.

19.3.1.1. Literary QAP Definition

The QAP definition is based on a fixed number of locations, where each location is a fixed distance apart from the others. In addition to the locations, there is an equal number of facilities. Each facility has a fixed flow to each other facility. A solution consists of placing each facility in one and only one location. The goal is to place all facilities in such a way as to minimize the cost of the solution, where the cost is defined as the summation of each flow multiplied by the corresponding distance.

19.3.1.2. Mathematical QAP Definition

    minimize C(\pi) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{\pi_i \pi_j} b_{ij}        (C.1)

where n is the number of objects/locations, a_{ij} is the distance between location i and location j, b_{ij} is the flow from object i to object j, and \pi_i gives the location of object i in permutation \pi \in P(n), where P(n) is the QAP search space, the set of all permutations of {1, 2, ..., n} 18. This problem is not only NP-hard and NP-hard to approximate, but almost intractable: it is generally considered impossible to solve optimally any QAP instance of size 20 or more within a reasonable time frame 3,27.

19.3.1.3. General mQAP

The mQAP is similar to the scalar QAP, with the exception that there are multiple flow matrices, each of which needs to be minimized. For example, the UAVs may use one communication channel for passing reconnaissance information, another channel for target information, and yet another channel for status messages. The goal is to minimize all the communication flows between the UAVs.

19.3.1.4. Mathematical mQAP

The mQAP is defined in mathematical terms in equations C.2 and C.3:
    minimize {C(\pi)} = {C^1(\pi), C^2(\pi), ..., C^m(\pi)}                        (C.2)

    C^k(\pi) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{\pi_i \pi_j} b^k_{ij},   k \in 1..m        (C.3)

where n is the number of objects/locations, a_{ij} is the distance between location i and location j, b^k_{ij} is the kth flow from object i to object j, \pi_i gives the location of object i in permutation \pi \in P(n), and 'minimize' means to obtain the Pareto front 18.

Much work has been done on classifying solutions found in the fitness landscape of QAP instances. Knowles and Corne 18 identified two metrics for use with the mQAP: diameter and entropy. The diameter of the population is defined by Bachelet 1 and is shown in Equation C.4:
    dmm(P) = \frac{1}{|P|^2 (n-1)} \sum_{\pi \in P} \sum_{\mu \in P} dist(\pi, \mu)        (C.4)
where dist(\pi, \mu) is a distance measure giving the smallest number of two-swaps that need to be performed in order to transform one solution, \pi, into another solution, \mu. The distance measure has a range of [0, n-1]. The entropy metric measures the dispersion of the solutions. It is shown in equation C.5:
    Ent(P) = -\frac{1}{n \log n} \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{n_{ij}}{|P|} \log \frac{n_{ij}}{|P|}        (C.5)

where n_{ij} is a measure of the number of times object i is assigned to location j in the population. Many approaches have been tried to solve the QAP. Researchers interested in finding the optimal solution can usually do so only for problems of size 20 or less; moreover, even problem sizes of 15 are considered difficult 3. In cases where it is feasible to find the optimal solution (size less than 20), branch and bound methods are typically used 10,28,3. Unfortunately, most real-world problems are larger than size 20 and thus require other solution methods in order to find a good solution in a reasonable time. For instance, the use of ant colonies has been explored
and is found to be effective when compared to other available heuristics 7,31,21. Evolutionary algorithms have also been applied 23,20,11,26. A good source where researchers compare performances of different search methods when solving the QAP can be found at 33,22.

19.3.1.5. Mapping QAP to MOEA

Table 19.76. Test Suite

  Test Name              Instance Category   # of locations   # of flows
  KC10-2fl-[1,2,3]uni    Uniform             10               2
  KC20-2fl-[1,2,3]uni    Uniform             20               2
  KC30-3fl-[1,2,3]uni    Uniform             30               3
  KC10-2fl-[1,...,5]rl   Real-like           10               2
  KC20-2fl-[1,...,5]rl   Real-like           20               2
  KC30-3fl-[1,2,3]rl     Real-like           30               3

Table 19.77. MOMGA-II settings

  Parameter           Value
  GA-type             fast messy GA
  Representation      Binary
  Eras                10
  BB Sizes            1-10
  P_cut               2%
  P_splice            100%
  P_mutation          0%
  String length       100, 200, 300
  Total Generations   100
  Thresholding        No
  Tiebreaking         No
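Following the cost definition in equations C.2 and C.3, a candidate permutation's cost vector can be evaluated with one pass per flow matrix. The sketch below uses small hypothetical distance and flow matrices, not actual KC test-suite instances.

```python
# Sketch of the mQAP cost vector C(pi) = (C^1(pi), ..., C^m(pi)) from
# equations C.2 and C.3. pi[i] is the location of object i; dist is the
# location-distance matrix a; flows holds the m object-flow matrices b^k.
# The matrices below are hypothetical, not KC test-suite data.

def mqap_costs(pi, dist, flows):
    n = len(pi)
    return [
        sum(dist[pi[i]][pi[j]] * flow[i][j]
            for i in range(n) for j in range(n))
        for flow in flows
    ]

dist = [[0, 1, 2],
        [1, 0, 1],
        [2, 1, 0]]
flows = [
    [[0, 3, 0], [3, 0, 1], [0, 1, 0]],   # channel 1 (b^1)
    [[0, 0, 2], [0, 0, 0], [2, 0, 0]],   # channel 2 (b^2)
]

print(mqap_costs([0, 1, 2], dist, flows))    # [8, 8]
print(mqap_costs([1, 0, 2], dist, flows))    # [10, 4]
```

Neither of the two sample permutations dominates the other ([8, 8] versus [10, 4]), which illustrates why 'minimize' in equation C.2 means obtaining a Pareto front rather than a single optimum.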
The Multi-objective messy Genetic Algorithm-II (MOMGA-II) program is based on the concept of the Building Block Hypothesis (BBH). The MOMGA-II is based on the earlier MOMGA algorithm 38. The MOMGA implements a deterministic process to generate an enumeration of all possible BBs, of a user-specified size, for the initial population. This process is referred to as Partially Enumerative Initialization (PEI). Thus, the MOMGA explicitly uses these building blocks in combination to attempt to solve for the optimal solutions in multi-objective problems.
The original messy GA consists of three distinct phases: the Initialization Phase, the Primordial Phase, and the Juxtapositional Phase. The MOMGA uses these concepts and extends them where necessary to handle k > 1 objective functions. In the initialization phase, the MOMGA produces all building blocks of a user-specified size. The primordial phase performs tournament selection on the population and reduces the population size if necessary. The population size is adjusted based on the percentage of "high" fitness BBs that exist. In some cases, the "lower" fitness BBs may be removed from the population to increase this percentage. In the juxtapositional phase, BBs are combined through the use of a cut-and-splice recombination operator. Cut-and-splice is a recombination (crossover) operator used with variable-string-length chromosomes. The cut-and-splice operator is used with tournament thresholding selection to generate the next population. A probabilistic approach is used in initializing the population of the fmGA. The approach is referred to as Probabilistically Complete Initialization (PCI) 9. PCI initializes the population by creating a controlled number of BBs based on the user-specified BB size and string length. The fmGA's initial population size is smaller than that of the mGA (and the MOMGA by extension) and grows at a smaller rate, as a total enumeration of all BBs of size o is not necessary. These BBs are then "filtered", through a Building Block Filtering (BBF) phase, to probabilistically ensure that all of the desired good BBs from the initial population are retained in the population. The BBF approach effectively reduces the computational bottlenecks encountered with PEI by reducing the initial population size required to obtain "good" statistical results. The fmGA concludes by executing a number of juxtapositional-phase generations in which the BBs are recombined to create strings of potentially better fitness.
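The cut-and-splice operator just described can be sketched minimally as follows. Cut points are chosen at random here, and the thresholding, tie-breaking, and gene-tagging details of an actual messy GA are omitted; this is an illustrative simplification, not the MOMGA-II implementation.

```python
import random

# Minimal sketch of messy-GA cut-and-splice for variable-length strings:
# cut each parent at a random point, then splice the complementary pieces.
# Illustrative simplification only; not the MOMGA-II implementation.

def cut_and_splice(parent_a, parent_b, rng=random):
    cut_a = rng.randint(0, len(parent_a))   # cut position in parent A
    cut_b = rng.randint(0, len(parent_b))   # cut position in parent B
    child_1 = parent_a[:cut_a] + parent_b[cut_b:]
    child_2 = parent_b[:cut_b] + parent_a[cut_a:]
    return child_1, child_2

a = [1, 0, 1, 1, 0, 1]
b = [0, 0, 1]
c1, c2 = cut_and_splice(a, b)
# Child lengths vary from the parents', but the total number of genes
# is conserved: len(c1) + len(c2) == len(a) + len(b)
```

The variable-length children are what distinguish cut-and-splice from fixed-point crossover in a simple GA: a child can be longer or shorter than either parent.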
The MOMGA-II mirrors the fast messy Genetic Algorithm (fmGA) and consists of the following phases: Initialization, Building Block Filtering, and Juxtapositional. The MOMGA-II differs from the MOMGA in the Initialization and Primordial phases, the latter of which is referred to as the Building Block Filtering phase. The initialization phase of the MOMGA-II uses PCI instead of the PEI implementation used in the MOMGA and randomly creates the initial population. The application of an MOEA to a class of MOP containing few feasible points creates difficulties that an MOEA must surpass in order to generate any feasible points throughout the search process. A random initialization of an MOEA's population may not generate any feasible points in a constrained MOP. Without any feasible solutions in the population, one must question whether or not the MOEA can even conduct a worthwhile search. In problems where the feasible region is greatly restricted, it may be impossible to create a complete initial population of feasible solutions randomly. Without feasible population members, any MOEA is destined to fail. Feasible population members contain the BBs necessary to generate good solutions. It is possible for an infeasible population member to contain a BB that is also present in a feasible solution, and it is also possible for mutation to generate a feasible population member from an infeasible one. However, typically feasible population members contain BBs that are not present in infeasible population members. Evolutionary operations (EVOPs) applied to feasible members tend to yield better results than EVOPs applied to infeasible population members. Therefore, it is critical to initialize and maintain a population of feasible individuals.

19.3.2. MOEA mQAP Results and Analysis

19.3.2.1. Design of mQAP Experiments and Testing

The goal of the experiments was to compare the MOMGA-II results with other programs that have solved the mQAP. In order to do this, a benchmark data set was needed for comparison purposes. The test suite chosen was created by Knowles 16, and for smaller-sized problems a deterministic search program was used to get definitive results. See Table 19.76 for an entire listing of the test-suite problems. Table 19.77 lists the MOMGA-II default parameter settings used during the mQAP experiments. Building block sizes 1 through 10 were used. Each building block size was run in a separate iteration, or era, of the program. Population sizes were created using the Probabilistically Complete Initialization method referred to earlier in the paper.
Specifically, the population for each era was determined using the population formula found in Equation C.6:

    PopSize = NumCopies x AlleleCombo x Choose(ProbLen, Order)        (C.6)

where PopSize is the population size to be found, NumCopies is the number of duplicate copies of alleles that we want to have, AlleleCombo is 2^Order where Order is the building block size, and Choose(ProbLen, Order) is a combination that takes the problem length and the building block size
as its variables. These settings were chosen based on previous settings used when the MOMGA-II was applied to the multi-objective knapsack problem. Due to the extended length of time it took to generate data, other settings could not be evaluated as well. It is recommended that future experiments be run with different settings in order to determine the best settings for these particular problems. The MOMGA-II results are taken over 30 data runs. The MOMGA-II was run on a Beowulf PC cluster consisting of 32 dual-processor machines, each with 1 GB of memory and two 1-GHz Pentium III processors (using Red Hat Linux version 7.3 and MPI version 1.2.7.1). The MOMGA-II code was run in two different manners. One method started with a randomized competitive template and passed the improved competitive template to larger building block sizes. The other method had separate competitive templates for each building block size. The first method allows the algorithm to exploit "good" solutions as the algorithm runs. The second method allows the larger building block sizes to explore the search space more. The MOMGA-II code was run in order to generate a population with good (low) fitness values for the flows, and the non-dominated points were found. After the unique Pareto points for each of the runs were found, the results were combined, one at a time, and pareto.enum was used to pull out the unique Pareto points for each round. A simple MatLab program was then used that showed how the data values improved as more runs were run.

19.3.2.2. QAP Analysis

Table 19.78 compares our original results (competitive template passed to larger building block sizes) to those found by Knowles and Corne 18 and to the optimal results, when applicable, obtained using a simple program that goes through all possible permutations. Abbreviations used in the table are as follows: Non-Dominated (ND), Diameter (Dia), Entropy (Ent), and Deterministic (Det).
For all of the instances with 10 locations and 10 facilities, Knowles and Corne used a deterministic algorithm. For all the instances with 20 locations and 20 facilities, they used local search measures which employed 1000 local searches from each of 100 different λ vectors. For the instances with 30 locations and 30 facilities, they employed a similar local search measure which used 1000 local searches from each of 105 different λ vectors 17.
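The per-run extraction of non-dominated points described above (the role played by pareto.enum, whose actual implementation the chapter does not give) can be sketched for minimization as:

```python
# Sketch of extracting non-dominated (Pareto) points from a set of cost
# vectors, minimizing every objective. This illustrates the filtering
# role of pareto.enum; it is not that program's actual implementation.

def dominates(u, v):
    """True if u dominates v: no worse in every objective, better in one."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))

def nondominated(points):
    return [p for p in points if not any(dominates(q, p) for q in points)]

pts = [(3, 5), (4, 4), (5, 3), (4, 6), (6, 6)]
print(nondominated(pts))   # [(3, 5), (4, 4), (5, 3)]
```

Applying this filter once per run and again after merging runs yields the cumulative unique Pareto points reported in the tables that follow.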
Table 19.78. Comparison of QAP Results
  [The body of Table 19.78 is too garbled in this copy to reproduce. Its columns were: Test Name; Knowles' results (# ND pts, Dia, Ent); our results (# ND pts, Dia, Ent); PFtrue points (EA, Det); and % found, covering the twenty KC10/KC20/KC30 uniform and real-like instances.]
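The diameter and entropy statistics reported per instance rest on the two-swap distance dist(π, μ) of Equation C.4. For permutations this distance equals n minus the number of cycles in the relative permutation; the sketch below is an assumed implementation, since the chapter gives only the definition.

```python
# Sketch of the two-swap distance dist(pi, mu): the minimum number of
# pairwise swaps turning permutation pi into mu, computed as n minus
# the number of cycles of the relative permutation pi^-1 o mu.
# Assumed implementation; the chapter states only the definition.

def two_swap_dist(pi, mu):
    pos = {v: i for i, v in enumerate(pi)}   # position of each value in pi
    rel = [pos[v] for v in mu]               # relative permutation
    seen, cycles = set(), 0
    for start in range(len(rel)):
        if start not in seen:
            cycles += 1
            i = start
            while i not in seen:
                seen.add(i)
                i = rel[i]
    return len(rel) - cycles

print(two_swap_dist([0, 1, 2], [1, 0, 2]))        # 1 (one swap suffices)
print(two_swap_dist([0, 1, 2, 3], [1, 2, 3, 0]))  # 3 (a single 4-cycle)
```

The result always falls in [0, n-1], matching the range stated for the distance measure in the text.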
Comparing the initial MOMGA-II results for the instances with 10 locations and 10 facilities shows that the results did not equal the Pareto optimal results (found deterministically). It can also be assumed that the MOMGA-II results for problems with 20 and 30 locations and facilities did not find all the true Pareto front members. When compared to Knowles and Corne's test results, the MOMGA-II results might be deficient, depending on whether they indeed found true Pareto front points. Figures 19.1 and 19.2 illustrate results for 30 locations and 30 facilities. We then ran the MOMGA-II with the exact same settings, but randomized our competitive template for each building block size. Table 19.79 shows the outcome of those results with respect to our initial run and the optimal results. The old method refers to using the same competitive templates throughout all the building block sizes, and the new method randomizes the competitive template before each building block size. As can be seen, by allowing more exploration of the search space, we were able to find more PFtrue points. Figures 19.3 and 19.4 show some of the results of these runs. The results show that the new method performs much better than the old method on
Fig. 19.1. Pareto front found for the KC30-3fl-2rl test instance
Fig. 19.2. Pareto front found for the KC30-3fl-3uni test instance
all instances except one. The one time the old method performs better is when there is only one data point as a solution. These results show that, with the exception of one instance, the new method is more effective than
Table 19.79. Comparison of MOMGA-II Methods to PFtrue

                                         True Pareto Front Points
  Test Name                Total PFtrue   Old Method   % Found   New Method   % Found
  KC10-2fl-1uni            13             9            69        11           85
  KC10-2fl-2uni            1              1            100       0            0
  KC10-2fl-3uni            130            40           31        122          94
  KC10-2fl-1rl             58             21           36        56           97
  KC10-2fl-2rl             15             5            33        11           73
  KC10-2fl-3rl             55             23           42        50           91
  KC10-2fl-4rl             53             24           45        47           89
  KC10-2fl-5rl             49             36           73        49           100
  Mean                                                 53.76                  78.49
  Std. Dev.                                            24.59                  32.75
  Mean (w/o anomaly)                                   47.16                  89.70
  Std. Dev. (w/o anomaly)                              17.28                  8.82
Fig. 19.3. Comparison of MOMGA-II methods to optimal results on KC10-2fl-1rl test instance
the old method. These results suggest that randomizing the competitive template allows the algorithm to explore the objective space more effectively and yield better results. It is believed that the MOMGA-II suffers from "speciation". This can be overcome by adding some competitive templates near the center of the Pareto front. See 13,14 for more detailed analysis of these results.
Fig. 19.4. Comparison of MOMGA-II methods to optimal results on KC10-2fl-1uni test instance
The results from Table 19.79 support these findings. Whenever there were many points to find, the new method always found more than the old method. The reason the old method performed better than the new method when there was only one point to find is that both competitive templates point at the same location. This directs the search in the same direction, as opposed to dividing the search into two directions. Since the new method does not have this directed search passed on to the larger building block sizes, they start at a disadvantage when trying to find one or two points. Additional experiments were done to see if building block size played a role in where the points were located along the Pareto front. We found that, on average, about twice as many large building blocks populate the outside of the Pareto front as the smaller building block sizes do. This is due to more bits being set in the genotype domain, which allows for a better solution in the phenotype domain. These results support the results that are discussed in Section 19.4 of this chapter.
Multi-Objective Evolutionary Algorithms for Computer Science Applications
More in-depth analysis of the results can be found in 5,14,13.

19.3.3. Modified Multi-Objective Knapsack Problem (MMOKP)

The generic multiple knapsack problem (MKP) consists of maximizing the profit (amount of items) for all knapsacks while adhering to the maximum weight (constraint) of each knapsack. The MOMGA-II is applied to the Modified Multi-objective Knapsack Problem (MMOKP), also a constrained, integer-based decision variable MOP. The formulation contains a large number of decision variables. MOEAs are suited to attempting problems of high dimensionality, and hence the MOMGA-II is suited for this application. The MMOKP is formulated in 100, 250, 500, and 750 item formulations with integer-based decision variables and real-valued fitness functions. The MMOKP formulation used does not reflect the true multi-objective formulation of the multiple knapsack problem (MKP), due to the constraint that any item placed into one of the knapsacks must also be placed into all of the knapsacks. However, the MMOKP remains a good test problem due to the large number of decision variables and the difficulty associated with generating solutions on the Pareto front. Many researchers have selected this MOP to test their non-explicit building-block MOEAs 12,15,19,30,35,36,37. The MMOKP has been selected for testing due to the difficulty of finding good solutions to this problem and to evaluate the performance of an explicit BB-based MOEA approach as applied to this MOP. Since the MMOKP has similar characteristics to other real-world MOPs, it is a good test problem to use. The specific MOMGA-II settings used are presented in Table 19.80. Results are taken over 30 data runs in order to compare the results of the MOMGA-II to other MOEAs also executed over 30 data runs. The MOMGA-II was run on a SUN Ultra 10 with a single 440 MHz processor, 1024 MB of RAM, and the Solaris 8 operating system.
Table 19.80. MOMGA-II settings

    Parameter            Value
    Eras                 10
    BB Sizes             1-10
    Pcut                 2%
    Psplice              100%
    String Length        100, 250, 500, 750
    Total Generations    100
Gary B. Lamont et al.
The overall goal is to maximize the profit obtained from each of the knapsacks simultaneously while meeting the imposed weight constraints. The MOP formulation follows for m items and n knapsacks, where

    p_{i,j} = profit of item j according to knapsack i,
    w_{i,j} = weight of item j according to knapsack i,
    c_i     = capacity of knapsack i.

For the MMOKP problem with n knapsacks and m items, the objectives are to maximize

    f(x) = (f_1(x), ..., f_n(x))                                (C.7)

where

    f_i(x) = \sum_{j=1}^{m} p_{i,j} x_j                         (C.8)

and x_j = 1 if item j is selected, 0 otherwise 37. The constraints are:

    \sum_{j=1}^{m} w_{i,j} x_j <= c_i,   i = 1, ..., n          (C.9)
where x_j = 1 if item j is selected, 0 otherwise. The MMOKP has similar characteristics to those of the ALP problem. Both problems are formulated with a large number of decision variables, the decision variables are integer based, and the constraints are linear. Since both problems have similar characteristics, one may expect similar issues to arise as in the ALP testing. In fact, the initial application of the MOMGA-II to the MMOKP yielded performance similar to the ALP results. In the initial application of the MOMGA-II to the MMOKP, a constraint handling approach was not used and the MOMGA-II generated few feasible solutions, all of inferior quality when compared to the results of other algorithms. These initial results necessitated the use of a repair mechanism in order to provide the MOMGA-II an increased probability of identifying good BBs in the population. An analysis of the initial results of the MOMGA-II illustrates that, without the repair method, the infeasible results generated are far away from the feasible solutions in phenotype space. This is a different result than was realized from the application of the MOMGA-II to the ALP. Hence the repair mechanism is used in all three phases of execution. The population members are repaired following the random initialization of the population, during the BBF phase, and during the juxtapositional phase.
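As an illustrative sketch of the formulation in Equations (C.7)-(C.9), the objective and constraint evaluation can be written as follows; the data layout (profit and weight matrices as n x m nested lists) and all names here are assumptions for illustration, not the authors' implementation:

```python
# Illustrative sketch of evaluating the MMOKP formulation of Equations
# (C.7)-(C.9): n objectives (one per knapsack) and n weight constraints.
# The data layout and every name are assumptions for illustration.

def evaluate(x, profit):
    """Objective values f_i(x) = sum_j p[i][j] * x[j] (Equation C.8)."""
    return [sum(p_ij * x_j for p_ij, x_j in zip(row, x)) for row in profit]

def is_feasible(x, weight, capacity):
    """Constraints sum_j w[i][j] * x[j] <= c[i] (Equation C.9)."""
    return all(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) <= c
               for row, c in zip(weight, capacity))

# Two knapsacks, three items; x selects items 0 and 2 in both knapsacks.
profit = [[10, 4, 7], [3, 9, 2]]
weight = [[2, 3, 4], [5, 1, 3]]
capacity = [7, 9]
x = [1, 0, 1]
print(evaluate(x, profit))                 # [17, 5]
print(is_feasible(x, weight, capacity))    # True
```

Note how the MMOKP constraint that an item is either in all knapsacks or in none is reflected by the single selection vector x shared by every knapsack.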
Additionally, the competitive templates are repaired prior to use. The population members are not repaired after each specific cut-and-splice operation of the juxtapositional phase, in order to avoid convergence to a suboptimal solution and to reduce the overhead associated with the numerous recombinations that take place within a single generation. However, at the end of each generation of the juxtapositional phase, all of the population members are repaired. The population is stored to an external archive to ensure that the feasible population members generated by the MOMGA-II are not lost. The next juxtapositional generation begins with feasible population members and the process repeats itself. Once the termination criteria are met, the results are presented to the researcher. At the conclusion of execution of the three phases, all of the population members are feasible and the competitive templates are updated for the next BB execution. The repair process ensures that subsequent BB size executions begin with feasible competitive templates. The initial repair mechanism selected for use is identical to the repair mechanism used in the application to the ALP. Since other researchers have attempted to solve the MMOKP, one must consider whether or not other repair approaches may have merit. Zitzler and Thiele 37 proposed a greedy approach that is an extension of a single objective repair mechanism used by Michalewicz and Arabas 24. In this multi-objective greedy repair approach, items are removed from the knapsacks based on their profit-to-weight (ptw) ratio. Recall that each item has an associated profit and weight with respect to the knapsack the item is placed into. If the knapsack capacity is exceeded, then items with the lowest profit-to-weight ratio with respect to the violated knapsack constraint are removed first. The profit-to-weight ratio is defined in Equation (C.10).

    ptw_{i,j} = p_{i,j} / w_{i,j}                               (C.10)
Zitzler et al. state that this repair method performs well when used in their MOEA, the SPEA. Initial testing of this method applied to the MOMGA-II was not as successful, and hence a different way of repairing the population members became necessary. Since the MOMGA-II operators and the process it follows are substantially different from those of the SPEA, it is accepted that a method that performs well when implemented in one MOEA may not perform well when implemented in other MOEAs. Zitzler's repair mechanism is a simple approach to extending a single objective repair mechanism to a multi-objective repair mechanism. The problem is that Zitzler's repair assumes that all of the constraints are violated, which may not be the case 37.
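A minimal sketch of this style of greedy repair follows, under the same assumed matrix layout used for the formulation; `greedy_repair` is a hypothetical name, the first violated knapsack is repaired first, and removing items is assumed to eventually restore feasibility:

```python
# Sketch of the greedy repair described above: while some knapsack
# constraint is violated, remove the selected item with the lowest
# profit-to-weight ratio (Equation C.10) with respect to a violated
# knapsack. Names and data layout are illustrative assumptions.

def greedy_repair(x, profit, weight, capacity):
    x = list(x)
    while True:
        violated = [i for i in range(len(weight))
                    if sum(w * s for w, s in zip(weight[i], x)) > capacity[i]]
        if not violated:
            return x
        i = violated[0]  # repair with respect to the first violated knapsack
        # selected item with the lowest ptw_{i,j} = p[i][j] / w[i][j]
        j = min((j for j in range(len(x)) if x[j]),
                key=lambda j: profit[i][j] / weight[i][j])
        x[j] = 0  # drop the least profitable item per unit of weight

profit = [[10, 4, 7], [3, 9, 2]]
weight = [[2, 3, 4], [5, 1, 3]]
print(greedy_repair([1, 1, 1], profit, weight, [5, 9]))  # [1, 0, 0]
```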
A potentially better repair mechanism for use in the MOMGA-II removes items from the knapsacks based only on the knapsack constraint that is violated. If only the knapsack 1 constraint is violated, then the ptw ratio of each item based on knapsack 1 is compared. In testing this method in the MOMGA-II, however, the new proposed repair method obtains slightly better results in some cases than Zitzler's method, but overall the performance is similar to that obtained with Zitzler's repair mechanism. In Jaszkiewicz's repair approach 12, used in the IMMOGLS MOEA, the ptw ratio of an item takes the multiple objectives into account in a better manner than the other mentioned approaches. Only one ptw ratio is calculated per item, but the ptw ratio takes into account all knapsacks, unlike the previously mentioned methods that calculate a ptw ratio per item, per knapsack. Jaszkiewicz's method sums the profits over all of the knapsacks and sums the weights over all of the knapsacks for a single item. The ptw ratio is calculated as the ratio of these sums and is presented in Equation (C.11).

    ptw_j = (\sum_{i=1}^{n} p_{i,j}) / (\sum_{i=1}^{n} w_{i,j})    (C.11)
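The aggregated ratio of Equation (C.11) can be sketched as follows; the function name and data layout are assumptions for illustration:

```python
# Sketch of Jaszkiewicz's aggregated ratio (Equation C.11): one ptw value
# per item, summing profit and weight over all knapsacks, in contrast to
# the per-item, per-knapsack ratio of Equation (C.10). Names are
# illustrative assumptions.

def ptw_jaszkiewicz(j, profit, weight):
    """ptw_j = (sum_i p[i][j]) / (sum_i w[i][j])."""
    return sum(row[j] for row in profit) / sum(row[j] for row in weight)

profit = [[10, 4, 7], [3, 9, 2]]
weight = [[2, 3, 4], [5, 1, 3]]
# A repair based on this ratio removes selected items in increasing
# ratio order until all knapsack constraints are satisfied.
order = sorted(range(3), key=lambda j: ptw_jaszkiewicz(j, profit, weight))
print(order)  # [2, 0, 1]
```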
Jaszkiewicz's repair method yields the best performance when used in the MOMGA-II. All of the results presented for the MOMGA-II use Jaszkiewicz's repair method. The improved performance of Jaszkiewicz's repair method over the other repair methods (applied to the MMOKP using the MOMGA-II) is attributed to a calculation of the lowest ptw ratio that more accurately takes into account all of the knapsacks simultaneously. The MOMGA-II is applied to the 100 item, 2 knapsack; 250 item, 2 knapsack; 500 item, 2 knapsack; and 750 item, 2 knapsack MMOKP presented in Zitzler 37. Any proposed change to an MOEA should be tested in order to determine whether it behaves correctly and whether the effect of the change is worthwhile or as anticipated. The effect of the repair method selected (Jaszkiewicz's repair method), as applied to the initial population of a single run of the MOMGA-II, is illustrated in Figure 19.5. The black (.)s represent the results of applying the repair mechanism to the infeasible population members. Feasible population members are not repaired, and hence the black (.)s represent only the members that are repaired. One can see that the repaired population members have fitness values that are lower than those of the initial population members, since the repair mechanism removes item types from the knapsack if a constraint is violated. Hence the repaired
members move towards the lower left portion of the figure as item types are removed from their knapsacks, and hence the fitness values of these repaired members are decremented. Such a result is what one would expect after repairing the population, and this validates that the repair mechanism performs correctly. The repaired members are all feasible, there are feasible BBs in the population, and the MOMGA-II can proceed to the BBF phase with feasible BBs present in the population. This type of comparison should always be performed by researchers attempting to improve the performance of their MOEAs. It is crucial to validate the performance of a repair operator.
Fig. 19.5. Initial Population for 100 Item 2 Knapsack MMOKP, MOMGA-II
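The validation just described can be made mechanical; a minimal sketch follows, with a toy single-knapsack problem, in which every name and helper is an assumption for illustration:

```python
# Minimal sketch of validating a repair operator, in the spirit of the
# check described above: every repaired member must be feasible, and a
# removal-only repair can never raise an objective value. All names and
# the toy single-knapsack problem are assumptions for illustration.

def validate_repair(population, repair, evaluate, is_feasible):
    for x in population:
        y = repair(x)
        assert is_feasible(y), "repaired member must be feasible"
        assert all(fy <= fx for fy, fx in zip(evaluate(y), evaluate(x))), \
            "removal-only repair should only decrement fitness values"

# Toy problem: one knapsack, capacity 5, profits equal to weights.
items_w = [2, 3, 4]
def is_feasible(x): return sum(w * s for w, s in zip(items_w, x)) <= 5
def evaluate(x): return [sum(w * s for w, s in zip(items_w, x))]
def repair(x):
    x = list(x)
    while not is_feasible(x):
        x[max(j for j in range(len(x)) if x[j])] = 0  # drop last selected
    return x

validate_repair([[1, 1, 1], [1, 0, 0]], repair, evaluate, is_feasible)
print("repair operator validated")
```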
Prior to execution of the MMOKP tests, a modification to the MOMGA-II is completed to increase the selection pressure. The MOMGA-II uses an elitist selection mechanism in which the current Pareto front is passed from one generation to the next each time that selection occurs. This mechanism performs well in the testing. The elitist method used maintains PFcurrent in the population from generation to generation but does not guarantee that a member of PFknown is not destroyed or removed from the population during a particular generation. Instead, the nondominated members are stored to an external archive each generation and, if still nondominated at the conclusion of MOEA execution, appear in the final PFknown set. Maintaining PFcurrent through the selection mechanism is anticipated to increase the effectiveness of the MOMGA-II, as the good BBs present in PFcurrent remain in the population. In order to increase the selection pressure and increase the convergence to a good solution set, this elitist scheme is implemented for the MMOKP. The elitist selection mechanism selects all of the population members that are elements of PFcurrent to be placed in the next generation. If the population is not full, additional members are selected through the use of an elitist-based tournament selection mechanism that selects two individuals at random to compete against each other. The tournament selection process repeats until the required population size is achieved. This new elitist routine is expected to increase the effectiveness of the MOMGA-II. The results of this elitist scheme are presented in Figure 19.6 as (*)s, and the results of the normal tournament selection without elitism are represented by (.)s. In the 100 item, 2 knapsack MMOKP, a slight improvement is noted in the results of this new elitist selection scheme. Since an improvement is realized in the limited testing and anticipated to improve performance, all of the MMOKP tests use this new elitist selection mechanism.
Fig. 19.6. Elitist 100 Item 2 Knapsack MMOKP
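The elitist selection scheme described above can be sketched as follows; dominance is for maximization, ties in a tournament go to the second contestant, and all names are assumptions for illustration rather than the authors' code:

```python
# Sketch of the elitist selection described above (maximization): every
# member of PFcurrent is carried into the next generation, and the rest
# of the population is filled by binary tournaments.
import random

def dominates(a, b):
    """a Pareto-dominates b: no worse in every objective, better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pf_current(pop, fitness):
    """Nondominated members of the current population."""
    return [x for x in pop
            if not any(dominates(fitness(y), fitness(x)) for y in pop)]

def elitist_select(pop, fitness, size):
    nxt = pf_current(pop, fitness)       # elitism: keep PFcurrent intact
    while len(nxt) < size:               # fill by binary tournament
        a, b = random.sample(pop, 2)
        nxt.append(a if dominates(fitness(a), fitness(b)) else b)
    return nxt[:size]

pop = [(1, 2), (2, 1), (0, 0), (1, 1)]
print(sorted(pf_current(pop, lambda s: s)))  # [(1, 2), (2, 1)]
```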
19.3.4. MOEA MMOKP Testing and Analysis

In order to compare the MOMGA-II with a different MOEA as applied to the MMOKP, the SPEA is selected. The data results of the SPEA are available and hence allow a comparison to be conducted. A small difference exists between the results of the MOMGA-II as compared to the SPEA. The MOMGA-II obtains a limited number of points that dominate those the SPEA generates, and vice versa, but overall both algorithms generate numerous identical points. The SPEA appears to obtain a few points that dominate the MOMGA-II results in the upper left end of the Pareto front. Overall the results from both MOEAs are similar, with the exception of slightly better performance by the SPEA at the end of the front. The smallest instantiation of the MMOKP represents a difficult problem, but one in which both MOEAs find good solutions and a good distribution of points across the front. The next instantiation tested is the 250 item, 2 knapsack MMOKP. The MOMGA-II achieves better performance than the SPEA across the entire center section of the front, as well as the lower right end of the front. However, the SPEA obtains points in the upper left end of the front unpopulated by MOMGA-II results. Better performance is realized by the MOMGA-II for most of the Pareto front, but the SPEA continues to obtain better solutions in the upper left end. Increasing the dimensionality of the MMOKP leads to the next instantiation tested, the 500 item, 2 knapsack formulation. Results are generated by both the MOMGA-II and the SPEA for the 500 item formulation. The MOMGA-II obtains many more points than the SPEA, and its results dominate those of the SPEA. The results illustrate a considerable distance between the Pareto front generated by the SPEA and the better front generated by the MOMGA-II. All of the points generated by the SPEA, with the exception of two, are dominated. The MOMGA-II also obtains a good spread of solutions across the front.
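The front-versus-front statements above can be made concrete by counting how many points of one front are dominated by the other; the sample points below are invented for illustration, and dominance is for maximization:

```python
# Sketch of the informal front-versus-front comparisons above: count how
# many points of one front are dominated by at least one point of the
# other (maximization). The sample points are invented for illustration.

def dominates(a, b):
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def dominated_count(front_a, front_b):
    """Points of front_b dominated by some point of front_a."""
    return sum(any(dominates(a, b) for a in front_a) for b in front_b)

front_1 = [(4, 9), (6, 7), (8, 4)]
front_2 = [(3, 8), (5, 6), (8, 4)]
print(dominated_count(front_1, front_2))  # 2
print(dominated_count(front_2, front_1))  # 0
```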
While the MOMGA-II does not find solutions in one small area of the front, the solutions found dominate those found by the SPEA. This illustrates much better performance on this more difficult problem. Note that comparisons based on the number of function evaluations between explicit and implicit MOEAs, as utilized here, are not appropriate due to the extensive differences in algorithm structure. The largest instantiation of the 2 knapsack problem is also tested, the 750 item MMOKP. The MOMGA-II results exhibit much better performance across the entire front as compared to the SPEA. Figure 19.7 shows the results of the MOMGA-II as compared to the SPEA. It is easily seen that the entire Pareto front generated by the MOMGA-II dominates the entire front generated by the SPEA in this 750 item MMOKP. The MOMGA-II does not obtain the same spread of solutions as the SPEA, but all of the MOMGA-II solutions are of higher quality than (i.e., dominate) those of the SPEA.
Fig. 19.7. 750 Item 2 Knapsack MMOKP
Results of calculating metric values for the 100, 250, 500, and 750 item, 2 fitness function MMOKP instantiations are presented in Table 19.81. The ONVG and spacing metrics are used along with the visualizations of the Pareto fronts presented. The selection of these metrics is discussed in detail in 38. For each MOEA, the mean and standard deviation results are presented for each metric.

Table 19.81. 2 Knapsack MMOKP Results

    Number of             ONVG                Spacing
    Items      MOEA       Mean      SD        Mean      SD
    100        SPEA       49.267    6.291     17.797    3.841
    100        MOMGA-II   44.333    8.976     18.331    7.361
    250        SPEA       55.567    6.377     26.163    3.956
    250        MOMGA-II   41.167    10.986    22.855    9.343
    500        SPEA       34.533    5.594     46.798    10.778
    500        MOMGA-II   35.733    10.866    24.703    16.146
    750        SPEA       34.200    6.408     71.340    20.733
    750        MOMGA-II   30.200    12.090    34.096    24.750

The MOMGA-II and SPEA obtain similar performance for the ONVG and spacing metrics as applied to the 100 item knapsack MMOKP. The 100 item knapsack MMOKP is the smallest instantiation of the MMOKP, resulting in similar PFknown sets. Table 19.81 shows that the SPEA, on average, finds a larger number of points in PFknown and obtains a slightly better spacing value for the 250 item MMOKP. However, the MOMGA-II obtains a much better distribution of points, and points of higher quality over most of the front, with the exception of the upper left section, for the 250 item MMOKP. Overall, the MOMGA-II generates mostly points of equivalent or better quality, but the SPEA obtains a better spread of points. Since the quality of the results is typically a driving factor for use of an MOEA, one would deem both algorithms as performing well on this MOP instantiation. The 500 item MMOKP is a more challenging instantiation of the MMOKP, and the formulation specifies a large number of decision variables. Results show that the MOMGA-II solutions dominate all but two of the solutions found by the SPEA. Additionally, Table 19.81 illustrates that the average number of solutions and the spacing values generated by both MOEAs are comparable, though the SPEA metric results tend to be slightly better than those of the MOMGA-II. However, the MOMGA-II obtains a good distribution of points across most of the front, and the MOMGA-II points dominate those of the SPEA. Overall the MOMGA-II obtains better results on the 500 item MMOKP. The results presented in Table 19.81 show that the MOMGA-II obtains a similar number of vectors on average as the SPEA but obtains a much better spacing value for the 750 item MMOKP. The results are shown graphically in Figure 19.7, which also illustrates that the results of the MOMGA-II dominate all of the points found by the SPEA. Overall the MOMGA-II obtains better performance than the SPEA on this instantiation of the MMOKP. The results presented illustrate a trend: as the number of decision variables increases, the improvement in performance as compared to the SPEA increases, and the MOMGA-II typically generates solutions of higher quality. Jaszkiewicz proposes a different metric for comparing the results of different MOEAs. His method involves calculating the average range of fitness values of each Pareto front curve or surface; he then compares the average values among the MOEAs 12. To determine these averages, one must find the minimum value generated for each fitness function in PFknown for
each data run. The minimum values for each fitness function are then averaged across the number of data runs conducted. The same process is used to calculate the average maximum values for each fitness function.

Table 19.82. MMOKP Knapsack Problem Results

    Instance   MOMGA-II                IMMOGLS                 SPEA
    2-250      [8876.47, 9405.57]      [8520.15, 9537.85]      [8407.63, 9460.87]
               [8956.97, 9546.53]      [8614.15, 9629.90]      [8809.47, 9747.23]
    2-500      [18387.17, 18906.07]    [17684.70, 19047.50]    [17697.50, 18900.40]
               [18830.70, 19311.60]    [18114.70, 19459.30]    [18455.20, 19460.70]
    2-750      [27001.57, 27506.87]    [25902.60, 27868.50]    [26374.90, 27924.00]
               [27110.57, 27580.47]    [25738.10, 27904.30]    [26152.20, 27720.10]
Table 19.83. MMOKP Knapsack Problem Results (Cont.)

    Instance   MOMGA-II                M-PAES                  MOGLS
    2-250      [8876.47, 9405.57]      [8742.50, 9473.05]      [7332.55, 9883.90]
               [8956.97, 9546.53]      [8866.95, 9593.00]      [7747.65, 10093.2]
    2-500      [18387.17, 18906.07]    [18198.50, 19174.70]    [16148.60, 20029.20]
               [18830.70, 19311.60]    [18541.80, 19514.00]    [16766.30, 20444.20]
    2-750      [27001.57, 27506.87]    [27100.30, 28661.80]    [23728.90, 29938.80]
               [27110.57, 27580.47]    [26460.30, 28255.90]    [23458.20, 29883.10]
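The two metrics reported in Table 19.81 can be sketched as follows; ONVG is simply the cardinality of PFknown, and for spacing we assume Schott's spacing metric (the chapter cites 38 for the metric definitions), so both the metric variant and all names here are assumptions for illustration:

```python
# Sketch of the metrics of Table 19.81: ONVG is the number of vectors in
# PFknown; spacing (assumed here to be Schott's variant) is the standard
# deviation of nearest-neighbor Manhattan distances in objective space.
import math

def onvg(front):
    return len(front)

def spacing(front):
    d = [min(sum(abs(a - b) for a, b in zip(p, q))
             for j, q in enumerate(front) if j != i)
         for i, p in enumerate(front)]
    mean = sum(d) / len(d)
    return math.sqrt(sum((mean - di) ** 2 for di in d) / (len(d) - 1))

front = [(1.0, 9.0), (4.0, 6.0), (8.0, 2.0)]
print(onvg(front))                # 3
print(round(spacing(front), 4))   # 1.1547
```

A lower spacing value indicates more evenly distributed points, which is why the large spacing means for the SPEA on the 500 and 750 item instances in Table 19.81 signal an uneven spread.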
The results of the IMMOGLS, SPEA, M-PAES, MOGLS, and MOMGA-II 12 are presented in Tables 19.82 and 19.83. Jaszkiewicz uses this average range metric to state how well the MOEA solutions are spread out over PFknown. However, the data presentation format he used does not accurately reflect the spread or concentration of points. In most attempts to solve an MOP with an MOEA, researchers conduct numerous data runs and then combine the PFknown sets generated for each run. This combination of the data and subsequent analysis yields the overall PFknown solution set generated by an MOEA over a course of data runs. While one run may generate solutions in the lower right portion of the front (consider the characteristic of the MMOKP fronts as an example, Figure 19.7), another may generate solutions exclusively in the upper left portion of the front, and the remaining runs may only generate solutions in the center of the front. A researcher solving real-world MOPs is interested in the final overall result and not necessarily in the averages. The averages can be deceiving. In the
previous example, most of the data runs generated solutions in the center of the front. Calculating the maximum and minimum average values using the data runs containing results at the endpoints of the Pareto front, and combining the results with the two runs that generated solutions in a different area of the front, may yield a value closer to the center than to the end of the Pareto front. An analysis would then lead one to believe that the MOEA did not generate a good spread of solutions but instead a cluster around the center of the front. In comparison testing, an MOEA that consistently generated solutions only at the two ends of the front would be shown as generating a poor distribution of points across the front. However, a different MOEA may not have generated any solutions in the center portion of the front, and therefore does not obtain as good a spread of solutions, but the average range table would show otherwise. While the average range data presentation format may be useful, it must be used in conjunction with other metrics or with a graphical presentation of the Pareto front in order to avoid misinterpreting the results. As stated earlier, in some cases a visual representation may be better than the results of a specific metric, as metrics are lossy evaluations and lose information in mapping a Pareto front of multiple data points to a single value. However, Tables 19.82 and 19.83 can be useful if used in conjunction with the graphical representation of PFknown presented in the figures. It is important to realize that the MOGLS MOEA is executed on the relaxed formulation of the MMOKP, and not on the formulation of the MMOKP that the MOMGA-II and the SPEA use. Due to this fact, one must question whether Jaszkiewicz's comparison of the MOGLS to the IMMOGLS, SPEA, and M-PAES is valid. In Table 19.83 the results of MOGLS are included, but a direct comparison between the MOMGA-II and the SPEA is the main focus.
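The average range computation critiqued above can be sketched as follows; the data layout (one PFknown list of fitness tuples per data run) and all names are assumptions for illustration:

```python
# Sketch of Jaszkiewicz's average range computation as described above:
# for each objective, average the per-run minima and the per-run maxima
# of PFknown over all data runs. Data layout and names are illustrative
# assumptions.

def average_range(runs, k):
    """runs: one PFknown list of fitness tuples per data run;
    k: objective index. Returns [mean of minima, mean of maxima]."""
    minima = [min(p[k] for p in front) for front in runs]
    maxima = [max(p[k] for p in front) for front in runs]
    return [sum(minima) / len(runs), sum(maxima) / len(runs)]

runs = [[(8900, 9400), (9000, 9300)],   # PFknown of run 1
        [(8800, 9500), (9100, 9200)]]   # PFknown of run 2
print(average_range(runs, 0))  # [8850.0, 9050.0]
```

Note that the averaging over runs is exactly what hides per-run clustering: two runs covering opposite ends of the front produce the same average range as two runs each covering the whole front.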
A comparison is made to the SPEA as it achieves the best published performance when applied to the identical formulation of the MMOKP. The results presented in Tables 19.82 and 19.83 indicate that, for the most part, the MOMGA-II does not generate as effective results as the other MOEAs when comparing the spread of solutions. This interpretation of the results is not necessarily correct. For example, consider the 2 knapsack, 750 item instantiation of the MMOKP. Table 19.82 indicates that the SPEA has a wider range of values and leads to better solutions, but in actuality Figure 19.7 illustrates that the front generated by the MOMGA-II dominates the entire front generated by the SPEA. Therefore this presentation format is not recommended for use when one Pareto front completely dominates another. The average range metric can mislead a researcher's analysis of the data if one is not careful to see the whole picture. The results of the MOMGA-II are very good when applied to the real-world NPC application MOPs. The MOMGA-II finds PFtrue for the 60 bit formulation of the ALP and appears to find good solutions for the larger 120 bit formulation. The MOMGA-II also performs favorably when applied to the MMOKP and compared to other MOEAs. In particular, as the problem size increases, so does the effectiveness of the MOMGA-II in terms of dominating the solutions found by other MOEAs. Additionally, detailed descriptions of possible repair mechanisms, and the best found MOMGA-II repair mechanism to use with MOPs formulated with integer-based decision variables, are presented. The MOMGA-II performs well when using repair mechanisms to attempt to solve these discrete constrained MOPs.

19.4. MOEA BB Conjectures for NPC Problems

Since one seeks to find the best solution in any optimization problem, identification of the good BB(s) is critical to generating good, and hopefully the best, solutions. In the search for the optimal solution, it is possible that the identification of only one good BB is necessary to generate all of the good solutions on the Pareto front or, the more likely case, that there exist multiple BBs that must be identified in order to generate the multiple solutions on the Pareto front. If the identification of more than one good BB is necessary to find the entire front, then the MOEA must find the multiple good BBs. The size of the BBs that are necessary to find points on the Pareto front has yet to be addressed in the literature. Assuming a worst case situation in which multiple BBs are contained within each point in PFtrue, the possibility also exists for the good BBs necessary to generate the points in PFtrue to be of varying sizes.
In general, the identification of multiple BBs is necessary to generate PFtrue for many MOPs, as multiple solutions exist in PFtrue. Since multiple solutions exist in PFtrue, and multiple BBs are typically necessary to generate these solutions, there is a high probability that multiple BB sizes are also required to generate all of the solutions in PFtrue. The identification of multiple BB sizes by an MOEA results in the generation of multiple fronts of different ranks through the search process. As good BBs are identified and recombined by the MOMGA-II, solutions on inferior fronts (fronts of rank 1, 2, etc.) are generated as the population progresses towards PFtrue. Once all of the necessary good BBs are generated, assuming a large enough population size to combat the noise present in the evolutionary process, an MOEA generates all of the points in PFtrue plus portions of the inferior fronts. Some researchers have identified, in explicit statements or through the results they have presented, a difficulty for MOEAs in generating points at the extremes of the front 12,19,37. The extremes are the endpoints of the curve or k-dimensional surface as dictated by the k objective functions. The difficulty of generating the extreme points of the Pareto front is attributed to the necessary identification of multiple BBs of different sizes. Implicit BB-based MOEAs may only generate BBs of a single size, or may not be executed with a population size large enough to statistically generate the multiple good BBs of various sizes necessary to generate PFtrue. Various examples can illustrate the effect that different BB sizes may have in finding various points in the ranked fronts and in PFtrue. Since many MOEA researchers conduct their research efforts with implicit BB-based MOEAs, building block concepts and the effects of the identification of good BBs are not readily noticeable. Through research conducted using the MOMGA-II and the theoretical development of population sizing equations 38 based upon the Building Block Hypothesis, the need for different sized BBs to generate PFtrue becomes apparent. Many of the existing MOEAs are not effective at finding all of the points on the Pareto front, and more specifically points at the endpoints or end sections of the Pareto front, when applied to test suite and real-world MOPs 12,19,37. While generating any points on the Pareto front may be useful for real-world applications in which potential solutions have not been found, it would be even more useful if a researcher could generate a good distribution of points across the entire front.
This has been identified by researchers utilizing MOEAs as an important issue 4,12,19. A question that the MOEA community should answer is: why do various MOEAs fail to find the endpoints of the Pareto front, or, if they do find some of the points, why does this typically occur only with larger population sizes?
When using an explicit BB-based MOEA, the implication of Van Veldhuizen's theorem is that one must use a BB of the same order as the largest order BB required to solve each of the functions in the MOP, leading to various conjectures 38.
19.5. Future Directions
The mQAP and the MMOKP are examples of NPC MOP problems that are difficult to solve deterministically for relatively large problem sizes. Stochastic algorithms, like MOEAs, take a long time to get a "good" answer for a large number of locations simply because the solution space is so large and of exponential complexity. It is imperative to ensure that the proper building block sizes are used in order to populate PFknown with enough members to get as close to PFtrue as possible. Thus, in applying MOEAs to large dimensional NPC MOPs, one should consider possible problem relaxation, analysis of building block structures, use of a variety of MOEAs and operators, parallel computation, and finally an extensive design of experiments with appropriate metric selection, parameter sensitivity analysis, and comparison. We plan on looking at how chromosome sizing affects the mQAP results. By changing the bit representation, we can cut the chromosome size down from 10 bits per location to 4, substantially reducing the genotype space relative to previous experiments. This should produce better results since the search space is reduced. This concept can also improve the efficiency in solving other NP-Complete MOPs.

References

1. Vincent Bachelet. Metaheuristiques Paralleles Hybrides: Application au Probleme D'affectation Quadratique. PhD thesis, Universite des Sciences et Technologies de Lille, December 1999.
2. John E. Beasley. OR-Library. 12 May 2003. http://mscmga.ms.ic.ac.uk/info.html.
3. Eranda Cela. The Quadratic Assignment Problem - Theory and Algorithms. Kluwer Academic Publishers, Boston, MA, 1998.
4. Carlos A. Coello Coello, David A. Van Veldhuizen, and Gary B. Lamont. Evolutionary Algorithms for Solving Multi-Objective Problems. Kluwer Academic Publishers, New York, May 2002.
5. Richard O. Day, Mark P. Kleeman, and Gary B. Lamont. Solving the Multiobjective Quadratic Assignment Problem Using a fast messy Genetic Algorithm.
In Congress on Evolutionary Computation (CEC'2003), volume 4, pages 2277-2283, Piscataway, New Jersey, December 2003. IEEE Service Center.
6. Eranda Cela. QAPLIB - a quadratic assignment problem library. 8 June 2004. http://www.opt.math.tu-graz.ac.at/qaplib/.
7. L. M. Gambardella, E. D. Taillard, and M. Dorigo. Ant colonies for the quadratic assignment problems. Journal of the Operational Research Society, 50:167-176, 1999.
8. M. R. Garey and D. S. Johnson. Computers and Intractability - A Guide to the Theory of NP-Completeness. Freeman, San Francisco, 1979.
9. David E. Goldberg, Kalyanmoy Deb, Hillol Kargupta, and Georges Harik. Rapid, accurate optimization of difficult problems using fast messy genetic algorithms. In Stephanie Forrest, editor, Proceedings of the Fifth International Conference on Genetic Algorithms, pages 56-64. Morgan Kauffmann Publishers, 1993.
10. Peter Hahn, Nat Hall, and Thomas Grant. A branch-and-bound algorithm for the quadratic assignment problem based on the hungarian method. European Journal of Operational Research, August 1998.
11. Jorng-Tzong Horng, Chien-Chin Chen, Baw-Jhiune Liu, and Cheng-Yen Kao. Resolution of quadratic assignment problems using an evolutionary algorithm. In Proceedings of the 2000 Congress on Evolutionary Computation, volume 2, pages 902-909. IEEE, 2000.
12. Andrzej Jaszkiewicz. On the performance of multiple-objective genetic local search on the 0/1 knapsack problem - a comparative experiment. IEEE Transactions on Evolutionary Computation, 6(4):402-412, August 2002.
13. Mark P. Kleeman. Optimization of heterogeneous UAV communications using the multiobjective quadratic assignment problem. Master's thesis, Air Force Institute of Technology, Wright-Patterson AFB, OH, March 2004.
14. Mark P. Kleeman, Richard O. Day, and Gary B. Lamont. Multi-objective evolutionary search performance with explicit building-block sizes for NPC problems. In Congress on Evolutionary Computation (CEC2004), volume 4, Piscataway, New Jersey, May 2004. IEEE Service Center.
15. Joshua Knowles and David Corne. M-PAES: A Memetic Algorithm for Multiobjective Optimization. In 2000 Congress on Evolutionary Computation, volume 1, pages 325-332, Piscataway, New Jersey, July 2000. IEEE Service Center.
16. Joshua Knowles and David Corne. Instance generators and test suites for the multiobjective quadratic assignment problem. Technical Report TR/IRIDIA/2002-25, IRIDIA, 2002. (Accepted for presentation/publication at the 2003 Evolutionary Multi-criterion Optimization Conference (EMO 2003), Faro, Portugal.)
17. Joshua Knowles and David Corne. Towards Landscape Analyses to Inform the Design of Hybrid Local Search for the Multiobjective Quadratic Assignment Problem. In A. Abraham, J. Ruiz del Solar, and M. Koppen, editors, Soft Computing Systems: Design, Management and Applications, pages 271-279, Amsterdam, 2002. IOS Press. ISBN 1-58603-297-6.
18. Joshua Knowles and David Corne. Instance generators and test suites for the multiobjective quadratic assignment problem. In Carlos Fonseca, Peter Fleming, Eckart Zitzler, Kalyanmoy Deb, and Lothar Thiele, editors, Evolutionary Multi-Criterion Optimization, Second International Conference, EMO 2003, Faro, Portugal, April 2003, Proceedings, number 2632 in LNCS, pages 295-310. Springer, 2003.
19. Marco Laumanns, Lothar Thiele, Eckart Zitzler, and Kalyanmoy Deb. Archiving with Guaranteed Convergence and Diversity in Multi-Objective
480
20. 21. 22.
23. 24. 25. 26. 27. 28.
29. 30.
31. 32. 33. 34.
Gary B. Lamont et al.
Optimization. In W.B. Langdon and et. al., editors, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO '2002), pages 439447, San Francisco, California, July 2002. Morgan Kaufmann Publishers. In Lee, Riyaz Sikora, and Michael J. Shaw. A genetic algorithm-based approach to flexible flow-line scheduling with variable lot sizes. IEEE Transactions on Systems, Man and Cybernetics - Part B, 27:36-54, February 1997. Vittorio Maniezzo and Alberto Colorni. The ant system applied to the quadratic assignment problem. IEEE Transactions on Knowledge and Data Engineering, 11:769-778, 1999. Peter Merz and Bernd Freisleben. A comparison of memetic algorithms, tabu search, and ant colonies for the quadratic assignment problem. In Proceedings of the 1999 Congress on Evolutionary Computation, 1999. CEC 99, volume 3, pages 1999-2070. IEEE, IEEE, 1999. Peter Merz and Bernd Freisleben. Fitness landscape analysis and memetic algorithms for the quadratic assignment problem. IEEE Transactions on Evolutionary Computation, 4:337-352, 2000. Zbigniew Michalewicz. Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag, 2nd edition, 1994. Arnold Neumaier. Global optimization test problems. 8 June 2004 http: //www.mat.univie.ac.at/~neum/glopt/test.html. Volker Nissen. Solving the quadratic assignment problem with clues from nature. IEEE Transactions on Neural Networks, 5:66-72, 1994. Panos M. Pardalos and Henry Wolkowicz. Quadratic assignment and related problems. In Panos M. Pardalos and Henry Wolkowicz, editors, Proceedings of the DIM ACS Workshop on Quadratic Assignment Problems, 1994. K. G. Ramakrishnan, M. G. C. RESENDE, and P. M. PARDALOS. A branch and bound algorithm for the quadratic assignment problem using a lower bound based on linear programming. In C. Floudas and P. M. PARDALOS, editors, State of the Art in Global Optimization: Computational Methods and Applications. Kluwer Academic Publishers, 1995. Gerhard Reinelt. Tsplib. 
4 May 2003 http://www.iwr.uni-heidelberg.de/ groups/comopt/software/TSPLIB95/. Masatoshi Sakawa, Kosuke Kato, and Toshihiro Shibano. An interactive fuzzy satisficing method for multiobjective multidimensional 0-1 knapsack problems through genetic algorithms. In Proceedings of the 1996 International Conference on Evolutionary Computation (ICEC'96), pages 243-246, 1996. Kwang Mong Sim and Weng Hong Sun. Multiple ant-colony optimization for network routing. In First International Symposium on Cyber Worlds (CW'02), volume 2241, pages 277-281. IEEE, IEEE, 2002. G. Skorobohatyj. Mp-testdata. 20 May 2003 http://elib.zib.de/pub/ Packages/mp-testdata/. Eric D. Taillard. Comparison of iterative searches for the quadratic assignment problem. Location science, 3:87-105, 1995. Ke Xu. Bhoslib: Benchmarks with hidden optimum solutions for graph problems. 8 June 2004 http://www.nlsde.buaa.edu.cn/~kexu/benchmarks/ graph-benchmarks.htm.
Multi-Objective Evolutionary Algorithms for Computer Science Applications
481
35. Eckart Zitzler. Evolutionary Algorithms for Multiobjective Optimization: Methods and Applications. PhD thesis, Swiss Federal Institute of Technology (ETH), Zurich, Switzerland, November 1999. 36. Eckart Zitzler, Marco Laumanns, and Lothar Thiele. SPEA2: Improving the Strength Pareto Evolutionary Algorithm. Technical Report 103, Computer Engineering and Networks Laboratory (TIK), Swiss Federal Institute of Technology (ETH) Zurich, Gloriastrasse 35, CH-8092 Zurich, Switzerland, May 2001. 37. Eckart Zitzler and Lothar Thiele. Multiobjective Evolutionary Algorithms: A Comparative Case Study and the Strength Pareto Approach. IEEE Transactions on Evolutionary Computation, 3 (4): 257-271, November 1999. 38. Jesse Zydallis. Explicit Building-Block Multiobjective Genetic Algorithms: Theory, Analysis, and Development. PhD thesis, Air Force Institute of Technology, Wright Patterson AFB, OH, March 2003.
CHAPTER 20 DESIGN OF FLUID POWER SYSTEMS USING A MULTI OBJECTIVE GENETIC ALGORITHM
Johan Andersson
Department of Mechanical Engineering, Linkoping University
SE-581 83 Linkoping, Sweden
E-mail: [email protected]

Within this chapter the multi-objective struggle genetic algorithm is employed to support the design of hydraulic actuation systems. Two concepts, a valve-controlled and a pump-controlled system for hydraulic actuation, are evaluated using Pareto optimization. The actuation systems are analyzed using comprehensive dynamic simulation models to which the optimization algorithm is coupled. The outcome from the Pareto optimization is a set of Pareto optimal solutions, which allows visualization of the trade-off between the objectives. Both systems are optimized, resulting in two Pareto fronts that visualize the trade-off between system performance and system cost. By comparing the two Pareto fronts, it can be seen under which preferences a valve system is to be preferred to a pump system. Thus, optimization is employed in order to support concept selection. Furthermore, general design problems usually constitute a mixture of determining continuous parameters as well as selecting individual components from catalogs or databases. Therefore the optimization is extended to handle a mixture of continuous parameters and discrete selections from catalogs. The valve-controlled system is again studied, but this time with cylinders and valves arranged in hierarchical catalogs, resulting in a discrete Pareto optimal front.

20.1. Introduction

Design is an iterative feedback process where the performance of the system is compared with the specification; see for example Pahl and Beitz12 and Rozenburg and Eekels13. Usually this is a manual process where the designer makes a prototype system, which is tested and modified until satisfactory.
Fig. 20.1. A system design process including simulation and optimization.
The 'problem definition' in Figure 20.1 results in a requirements list, which is used to generate different solution principles/concepts. Once the concepts have reached a sufficient degree of refinement, modeling and simulation are employed in order to predict the properties of particular system solutions. Each solution is evaluated with the help of an objective function, which acts as a figure of merit. Optimization is then employed in order to automate the evaluation of system solutions and to generate new system proposals. The process continues until the optimization has converged and a set of optimal systems is found. One part of the optimization is the evaluation of design proposals; the second part is the generation of new and hopefully better designs. Thus, optimization consists of both analysis (evaluation) and synthesis (generation of new solutions). Often the first optimization run does not result in the final design. If the optimization does not converge to a desired system, the concept has to be modified or the problem reformulated, which results in new objectives. In Figure 20.1 this is visualized by the two outer loops back to 'generation of solution principles' and 'problem definition' respectively. Naturally the activity 'generation of solution principles' produces a number of conceivable concepts, each of which is optimized. Thus each concept is brought to maximum performance; optimization thereby provides a solid basis for concept selection. This will be illustrated later in a study of hydraulic actuation systems.

One essential aspect of using modeling and simulation is to understand the system we are designing. The other aspect is to understand our expectations of the system, and our priorities among the objectives. Both aspects are equally important. It is essential to engineering design to manage the dialog between specification and prototype. Often simulations confirm that what we wish for is unrealistic or ill-conceived. Conversely, they can also reveal that our wishes are not imaginative enough.

However, engineering design problems are often characterized by the presence of several conflicting objectives. When using optimization to support engineering design, these objectives are usually aggregated into one overall objective function. Optimization is then conducted with one optimal design as the result. Another way of handling the problem of multiple objectives is to employ the concept of Pareto optimality. The outcome from a Pareto optimization is a set of Pareto optimal solutions, which visualizes the trade-off between the objectives. In order to choose the final design, the decision-maker then has to trade the competing objectives against each other.

General design problems also consist of a mixture of determining continuous parameters as well as selecting individual components from catalogs or databases. Thus, an optimization strategy suited for engineering design problems has to be able to handle a mixture of continuous parameters as well as discrete selections of components from catalogs.

This chapter continues with a nomenclature for the general multi-objective design problem. Thereafter, multi-objective genetic algorithms are discussed and the proposed multi-objective struggle GA is described together with the genetic operators used.
The optimization method is then connected to the HOPSAN simulation program and applied to support the design of two concepts for hydraulic actuation. Thus, it is shown how optimization can be employed in order to support concept selection. The simulation model is then extended to include component catalogs for valves and cylinders. The optimization strategy is modified accordingly and the problem is solved as a mixed discrete/continuous optimization problem.

20.2. The Multi-Objective Optimization Problem

A general multi-objective design problem is expressed by equation (B.1), where f_1(x), f_2(x), ..., f_k(x) are the k objective functions, (x_1, x_2, ..., x_n) are the n optimization parameters, and S ⊆ R^n is the solution or parameter space. Obtainable objective vectors, {F(x) | x ∈ S}, are denoted by Y. Y ⊆ R^k is usually referred to as the attribute space, where ∂Y is the boundary of Y. For a general design problem, F is non-linear and multi-modal and S might be defined by non-linear constraints containing both continuous and discrete member variables.

    min F(x) = [f_1(x), f_2(x), ..., f_k(x)]
    s.t. x ∈ S,  x = (x_1, x_2, ..., x_n)                                (B.1)

The Pareto subset of ∂Y is of particular interest to the rational decision-maker. The Pareto set is defined by equation (B.2). Considering a minimization problem and two solution vectors x, y ∈ S, x is said to dominate y, denoted x ≻ y, if:

    ∀i ∈ {1, 2, ..., k}: f_i(x) ≤ f_i(y)  and  ∃j ∈ {1, 2, ..., k}: f_j(x) < f_j(y)    (B.2)

If the final solution is selected from the set of Pareto optimal solutions, there would not exist any solutions that are better in all attributes. It is clear that any final design solution should preferably be a member of the Pareto optimal set. If the solution is not in the Pareto optimal set, it could be improved without degradation in any of the objectives, and thus it is not a rational choice. This is true as long as the selection is done based on the objectives only. The presented nomenclature is visualized in Figure 20.2 below.

20.3. Multi-Objective Genetic Algorithms

Genetic algorithms are modeled after the mechanisms of natural selection. Each optimization parameter (x_n) is encoded by a gene using an appropriate representation, such as a real number or a string of bits. The corresponding genes for all parameters x_1, ..., x_n form a chromosome capable of describing an individual design solution. A set of chromosomes representing several individual design solutions comprises a population, where the most fit are selected to reproduce. Mating is performed using crossover to combine genes from different parents to produce children. The children are inserted into the population and the procedure starts over again, thus creating an artificial Darwinian environment. For a general introduction to genetic algorithms, see the work by Goldberg8.
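The dominance test in equation (B.2) can be sketched as a short function; this is a minimal illustration for a minimization problem, not tied to any particular optimization library:

```python
def dominates(fx, fy):
    """Pareto dominance per equation (B.2), minimization assumed:
    fx dominates fy if it is no worse in every objective and
    strictly better in at least one."""
    no_worse = all(a <= b for a, b in zip(fx, fy))
    strictly_better = any(a < b for a, b in zip(fx, fy))
    return no_worse and strictly_better
```

A solution is then Pareto optimal within a set if no other member of the set dominates it.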
Fig. 20.2. Solution and attribute space nomenclature for a problem with two design variables and two objectives.
When the population of an ordinary genetic algorithm evolves, it usually converges to one optimal point. It is however tempting to adjust the algorithm so that it spreads the population over the entire Pareto optimal front instead. As this idea is quite natural, there are many different types of multi-objective genetic algorithms. For a review of genetic algorithms applied to multi-objective optimization, readers are referred to the work done by Deb3. Literature surveys and comparative studies on multi-objective genetic algorithms are also provided by several other authors; see for example Coello4, Horn10 and Zitzler and Thiele15.

20.3.1. The Multi-Objective Struggle GA

In this chapter the multi-objective struggle genetic algorithm (MOSGA)1,3 is used for the Pareto optimization. MOSGA combines the struggle crowding genetic algorithm presented by Grueninger and Wallace9 with Pareto-based ranking as devised by Fonseca and Fleming7. As there is no single objective function to determine the fitness of the different individuals in a Pareto optimization, the ranking scheme presented by Fonseca and Fleming is employed, and the "degree of dominance" in attribute space is used to rank the population. Each individual is given a rank based on the number of individuals in the population that are preferred to it, i.e. for each individual the algorithm loops through the whole population counting the number of preferred individuals. "Preferred to" is implemented in a strict Pareto sense, according to equation (B.2), but one could also combine Pareto optimality with the satisfaction of objective goal levels, as discussed in ref. 7. The principle of the MOSGA algorithm is outlined below.
Step 1: Initialize the population.
Step 2: Select parents using uniform selection, i.e. each individual has the same probability of being chosen.
Step 3: Perform crossover and mutation to create a child.
Step 4: Calculate the rank of the new child.
Step 5: Find the individual in the entire population that is most similar to the child. Replace that individual with the new child if the child's ranking is better, or if the child dominates it.
Step 6: Update the ranking of the population if the child has been inserted.
Step 7: Perform steps 2-6 according to the population size.
Step 8: If the stop criterion is not met, go to step 2 and start a new generation.

Step 5 implies that the new child is only inserted into the population if it dominates the most similar individual, or if it has a lower ranking, i.e. a lower "degree of dominance". Since the ranking of the population does not consider the presence of the new child, it is possible for the child to dominate an individual and still have the same ranking. This restricted replacement scheme counteracts genetic drift and is the only mechanism needed in order to preserve population diversity. Furthermore, it does not need any specific parameter tuning. The replacement scheme also constitutes an extreme form of elitism, as the only way of replacing a non-dominated individual is to create a child that dominates it. The similarity of two individuals is measured using a distance function. The method has been tested with distance functions based upon the Euclidean distance in both attribute and parameter space. A mixed distance function combining both the attribute and parameter distance has been evaluated as well. The result presented here was obtained using an attribute based distance function. An inherent property of the crowding method is the capability to identify and maintain multiple Pareto fronts, i.e. global and local Pareto fronts in multi-modal search spaces, see refs. 1, 2, 3.
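One generation of the loop above (Steps 2-7) can be sketched as follows. This is a simplified reading, not the authors' implementation: `evaluate`, `crossover`, `mutate` and `distance` are assumed problem-specific helpers, and the ranks are recomputed on demand rather than updated incrementally as in Step 6.

```python
import random

def dominates(fx, fy):
    # Pareto dominance per equation (B.2), minimization assumed.
    return (all(a <= b for a, b in zip(fx, fy))
            and any(a < b for a, b in zip(fx, fy)))

def rank(fx, objectives):
    # "Degree of dominance": number of members preferred to fx.
    return sum(1 for fy in objectives if dominates(fy, fx))

def mosga_generation(pop, evaluate, crossover, mutate, distance):
    """One MOSGA generation over pop, a list of parameter vectors."""
    objs = [evaluate(x) for x in pop]
    for _ in range(len(pop)):
        mother, father = random.sample(pop, 2)     # Step 2: uniform selection
        child = mutate(crossover(mother, father))  # Step 3
        fc = evaluate(child)                       # Step 4 (via rank below)
        # Step 5: find the most similar individual; replace it if the
        # child dominates it or has a better (lower) rank.
        i = min(range(len(pop)), key=lambda j: distance(child, pop[j]))
        if dominates(fc, objs[i]) or rank(fc, objs) < rank(objs[i], objs):
            pop[i], objs[i] = child, fc
    return pop
```

A usage example on a scalar two-objective toy problem would supply `evaluate = lambda x: (x*x, (x - 2.0)**2)` together with averaging crossover and small random mutation.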
In real-world applications, only parts of the true problem can be reflected in the formulation of the optimization problem. Therefore it is valuable to know about the existence of local optima, as they might possess other properties, such as robustness, that are important to the decision-maker but not reflected in the objective functions.
In single objective optimization, niching techniques have been introduced in order to facilitate the identification of both global and local optima. As can be seen from the description of the method, there are no algorithm parameters that have to be set by the user. The inputs are only: population size, number of generations, genome representation, and crossover and mutation methods, as in every genetic algorithm.

20.3.2. Genome Representation
The genome encodes design variables in a form suitable for the GA to operate upon. Design variables may be values of parameters (real or integer) or represent individual components selected from catalogs or databases. Thus, the genome is a hybrid list of real numbers (for continuous parameters), integers and references to catalog selections, see Figure 20.3. A catalog could be either a straight list of elements, or the elements could be arranged in a hierarchy. Each element of a catalog represents an individual component. The characteristics of catalogs will be discussed further on and exemplified by the design example.

Fig. 20.3. Example of the genome encoding. The first two elements represent real variables (e.g. 4.237 and 6.87e-3) and the last two elements catalog selections (e.g. the 12th element of the 1st catalog).
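Such a hybrid genome can be sketched as a small data structure; the class name and the `(catalog, element)` encoding are illustrative assumptions, not the chapter's implementation:

```python
from dataclasses import dataclass
from typing import List, Tuple

# A catalog selection referenced by (catalog index, element index),
# e.g. (1, 12) for the 12th element of the 1st catalog as in Fig. 20.3.
CatalogRef = Tuple[int, int]

@dataclass
class Genome:
    # Hybrid list: continuous parameters plus catalog selections.
    reals: List[float]
    selections: List[CatalogRef]

# Example genome matching Fig. 20.3.
g = Genome(reals=[4.237, 6.87e-3], selections=[(1, 12), (2, 37)])
```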
20.3.3. Similarity Measures

Speciating GAs require a measure of likeness between individuals, a so-called similarity measure. The similarity measure is usually based on a distance function that calculates the distance between two genomes. The similarity could be based on the distance in either the attribute space (between the objectives), the phenotype space (between the design parameters) or the genotype space (in the genome encoding). As direct encoding is used (not a conversion to a string of bits), a phenotype and a genotype distance function would yield the same result. It is shown in references 2, 3 that the choice between an attribute based and a parameter based distance function might have a great influence on the outcome of the optimization. To summarize: an attribute space distance measure gives fast and precise convergence on the global Pareto optimal front, whereas a parameter based distance function does not converge as fast but has the advantage of identifying and maintaining both global and local Pareto optimal fronts.

20.3.3.1. Attribute Based Distance Function

One way of comparing two individual designs is to calculate their distance in attribute space. As we want the population to spread evenly on the Pareto front (in attribute space), it seems a good idea to use an attribute based distance measure. The distance between two solutions (genomes) in attribute space is calculated using the normalized Euclidean distance, see equation (C.1).
    Distance(a, b) = √( Σ_{i=1}^{k} ( (f_i^a − f_i^b) / (f_i^max − f_i^min) )^2 )    (C.1)
where f_i^a and f_i^b are the objective values of the i-th objective for a and b respectively, f_i^max and f_i^min are the maximum and minimum of the i-th objective in the current population, and k is the number of objectives. Thus, the distance function will vary between 0, indicating that the individuals are identical, and 1 for the very extremes.

20.3.3.2. Phenotype Based Distance Function

Another way of calculating the distance between solutions is to use the distance in parameter (phenotype) space. As the genome might be a hybrid mixture of real numbers and catalog selections, we have to define different distance functions to work on different types of elements. The methods described here build on the framework presented by Senin et al.14. In order to obtain the similarity between two individuals, the distance between each pair of design variables is calculated. The overall similarity is then obtained by summing up the distances for each design variable.

20.3.3.3. Real Number Distance

A natural distance measure between two real numbers is the normalized Euclidean distance, see equation (C.2).
    Distance(a, b) = √( ( (a − b) / max distance )^2 )    (C.2)
where a and b are the values of the two real numbers and max distance is the maximum possible distance between the two values (i.e. the search boundaries).

20.3.3.4. Catalog Distance

Distance between two catalog selections could be measured through relative positions in a catalog or a catalog hierarchy. The relative position is only meaningful if the catalog is ordered, see Figure 20.4.
Fig. 20.4. Examples of ordered and unordered catalogs.
The dimensionless distance between two elements within the same catalog is expressed by equation (C.3) and exemplified in Figure 20.5.

    Distance(a, b) = |pos(a) − pos(b)| / max distance    (C.3)

Fig. 20.5. Distance evaluation for two elements of an ordered catalog.
For catalog hierarchies, equation (C.3) has to be generalized as exemplified in Figure 20.6. For elements belonging to the same sub-catalog, the distance is evaluated using the relative position within that sub-catalog. Otherwise, the maximum length of the path connecting the different sub-catalogs is used. This implies that for two given sub-catalogs, an element in one catalog is equally distant from every element in the other catalog. The length of the path is calculated as the maximal distance within the smallest common hierarchy. In both cases, the distance is normalized by dividing by the maximum distance (i.e. the catalog size).
Fig. 20.6. Exemplification of distances between different catalog elements in a hierarchical catalog.
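For the simple case of a two-level hierarchy (one parent catalog holding ordered sub-catalogs), the rule can be sketched as follows. The function name and the `(sub_catalog, position)` encoding are illustrative assumptions; deeper hierarchies would need a recursive path-length computation.

```python
def catalog_distance(a, b, sub_sizes):
    """Distance between two catalog selections a and b, each encoded
    as (sub_catalog_index, position). sub_sizes lists the number of
    elements in each sub-catalog; the normalizing maximum distance is
    the size of the whole hierarchy minus one."""
    max_distance = sum(sub_sizes) - 1
    (sub_a, pos_a), (sub_b, pos_b) = a, b
    if sub_a == sub_b:
        # Same sub-catalog: normalized relative position.
        return abs(pos_a - pos_b) / max_distance
    # Different sub-catalogs: the maximal distance within the smallest
    # common hierarchy, which in a two-level tree is the whole catalog.
    return 1.0
```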
20.3.3.5. Overall Distance

So far, distance measures for individual design variables have been developed. An overall distance measure for comparing two genomes is obtained by aggregating the distances for the individual design variables, see equation (C.4).
    Distance(a, b) = Σ_{i=1}^{n} Distance(DV_i)    (C.4)

where a and b are the two designs being compared, and n is the number of design variables (DV) encoded by the genome. Thus, the phenotype distance between two individual designs is calculated by summing up the individual distances for each element of the genome.
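For genomes holding only real parameters, equations (C.2) and (C.4) combine into a short sketch (catalog variables would use the catalog distance instead); the function names and the `bounds` argument are assumptions for illustration:

```python
def real_distance(a, b, lo, hi):
    # Equation (C.2): normalized distance between two real parameters,
    # with (lo, hi) as the search boundaries.
    return abs(a - b) / (hi - lo)

def genome_distance(a, b, bounds):
    # Equation (C.4): sum of the per-variable distances.
    return sum(real_distance(a[i], b[i], *bounds[i])
               for i in range(len(a)))
```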
20.3.4. Crossover Operators
As the genome is a hybrid mix of continuous variables and catalog selections, we define different operators to work on different types of elements. Uniform crossover is used, which implies that each element of the father's genome is crossed with the corresponding element from the mother's genome. For real numbers, BLX crossover6 is used, see the exemplification in Figure 20.7. For catalog selections, an analogous crossover scheme is employed, as illustrated in Figure 20.8.
Fig. 20.7. The outcome of a BLX crossover between two real numbers a and b is randomly selected from an interval of width 2d centered on the average M.
Fig. 20.8. An exemplification of the catalog crossover. The outcome of a crossover of individuals within the same catalog (a and b) is randomly selected from the interval between them. For individuals from different sub-catalogs (c and d), the outcome is randomly selected within the smallest common hierarchy.
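Both operators can be sketched as follows. The BLX spread factor `alpha` and the `(sub_catalog, position)` encoding are assumptions (the text does not give the width d of Fig. 20.7 explicitly), and the cross-sub-catalog case is simplified to choosing either parent.

```python
import random

def blx_crossover(a, b, alpha=0.5):
    """BLX crossover of two reals (Fig. 20.7): sample uniformly from an
    interval centred on the parents' midpoint, widened by d = alpha *
    |a - b| on each side (width 2d beyond the parents' interval)."""
    lo, hi = min(a, b), max(a, b)
    d = alpha * (hi - lo)
    return random.uniform(lo - d, hi + d)

def catalog_crossover(a, b):
    """Catalog crossover (Fig. 20.8) for (sub_catalog, position) pairs:
    within one sub-catalog, pick a position between the parents;
    across sub-catalogs, pick within the smallest common hierarchy
    (simplified here to either parent's selection)."""
    (sub_a, pos_a), (sub_b, pos_b) = a, b
    if sub_a == sub_b:
        return (sub_a, random.randint(min(pos_a, pos_b), max(pos_a, pos_b)))
    return random.choice([a, b])
```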
20.4. Fluid Power System Design

The objects of study are two different concepts of hydraulic actuation systems. Both systems consist of a hydraulic cylinder that is connected to a mass of 1000 kilograms. The objective is to follow a pulse in the position command with a small control error and simultaneously obtain low energy consumption. Naturally, these two objectives are in conflict with each other, as a low control error implies large accelerations, which consume more energy. The problem is thus to minimize both the control error and the energy consumption from a Pareto optimal perspective.
Fig. 20.9. The valve concept for hydraulic actuation.
Two different ways of controlling the cylinder are studied. In the first, more conventional system, the cylinder is controlled by a directional valve, which is powered from a constant pressure system. In the second concept, the cylinder is controlled by a servo pump. Thus, the systems have different properties. The valve concept has all that is required for a low control error, as the valve has a small mass and thus a very high bandwidth. On the other hand, the valve system is associated with higher losses, as the valve constantly throttles fluid to the tank. The different concepts have been modeled in the simulation package HOPSAN, see ref. 11. The system models are depicted in Figures 20.9 and 20.10 respectively.
The models of each component of the systems consist of a set of algebraic and differential equations considering effects such as friction, leakage and non-linearities, for example limited stroke distances and stroke speeds. HOPSAN uses a distributed simulation technique where each component contains its own numerical solver. The components are then connected using transmission line elements, as described in ref. 11. The distributed simulation technique has the advantage that the components are numerically separated from each other, which promotes stability. Furthermore, the computational time grows linearly with the size of the problem, which is not true for centralized solvers. The HOPSAN simulation software can be freely downloaded from the web. The valve system consists of the mass and the hydraulic cylinder, the directional valve and a P-controller to control the motion. The directional valve is powered by a constant pressure pump and an accumulator, which keeps the system pressure at a constant level. The optimization parameters are the sizes of the cylinder, the valve and the pump, the pressure level, and the feedback gain. Furthermore, a leakage parameter is added to both systems in order to guarantee sufficient damping. Thus, this problem consists of six optimization parameters and two objectives.
Fig. 20.10. The pump concept of hydraulic actuation.
The pump concept contains fewer components: the cylinder and the mass, the controller and the pump. A second order low-pass filter is added in order to model the dynamics of the pump. The pump system consists of only four optimization parameters. The performance of a relatively fast pump system is depicted in Figure 20.11.
Fig. 20.11. Typical pulse response for a pump system.
20.4.1. Optimization Results

Both systems were optimized in order to simultaneously minimize the control error f_1 and the energy consumption f_2. The control error is obtained by integrating the absolute value of the control error and adding a penalty for overshoots, see equation (D.1). The energy consumption is calculated by integrating the hydraulic power, expressed as the pressure times the flow, see equation (D.2).

    f_1 = ∫_0^4 |x_ref − x| dt + a ( ∫_0^2 (x > x_ref) dt + ∫_2^4 (x < x_ref) dt )    (D.1)

    f_2 = ∫_0^4 q_pump · p_pump dt    (D.2)
The optimization was conducted with a population size of 30 individuals over 200 generations. The parameters are real-encoded, BLX crossover is used to produce new offspring, and the Euclidean distance in attribute space is used as the similarity measure.
As a Pareto optimization searches for all non-dominated individuals, the final population will contain individuals with a very high control error, as they have low energy consumption. It is possible to obtain an energy consumption close to zero, if the cylinder does not move at all. However, these solutions are not of interest, as we want the system to follow the pulse. Therefore, a goal level/constraint on the control error is introduced. The optimization strategy is modified so that solutions below the goal level on the control error are always preferred to solutions that are above it regardless of their energy consumption. In this manner, the population is focused on the relevant part of the Pareto front. The obtained Pareto optimal fronts for both systems are depicted in Figure 20.12. In order to achieve fast systems, and thereby low control errors, large pumps and valves are chosen by the optimization strategy. A large pump delivers more fluid, which enables higher speed of the cylinder. However, bigger components consume more energy, which explains the shape of the Pareto fronts.
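The modified preference rule can be sketched as follows; the goal applies to the control-error objective (taken here as index 0), and falling back to plain Pareto dominance when both solutions are on the same side of the goal is an assumption about the tie-breaking:

```python
def dominates(fx, fy):
    # Pareto dominance per equation (B.2), minimization assumed.
    return (all(a <= b for a, b in zip(fx, fy))
            and any(a < b for a, b in zip(fx, fy)))

def preferred(fa, fb, goal):
    """Goal-level preference: a solution meeting the control-error goal
    always beats one that does not, regardless of energy consumption;
    otherwise compare by ordinary Pareto dominance."""
    a_ok, b_ok = fa[0] <= goal, fb[0] <= goal
    if a_ok != b_ok:
        return a_ok              # meeting the goal wins outright
    return dominates(fa, fb)     # both sides equal w.r.t. the goal
```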
Fig. 20.12. Pareto fronts showing the trade-off between energy consumption and control error for the two concepts. The graph on the right shows a slow pulse response, whereas the graph on the left shows a fast pulse response.
When the Pareto fronts for different concepts are drawn within the same graph, as in Figure 20.12, an overall Pareto optimal front could be obtained by identifying the non-dominated set from all Pareto optimal solutions obtained. It is then evident that the final design should preferably be on the overall Pareto front, which elucidates when it is rational to switch between concepts. The servo pump system consumes less energy and is preferred if a control error larger than 0.05 ms is acceptable. The servo valve system is fast but consumes more energy. If a lower control error than 0.05 ms is desired, the final design should preferably be a servo valve system. In order to choose the final design, the decision-maker has to select a concept and then study the trade-off between the control error and the energy consumption and select a solution point on the Pareto front. This application shows how Pareto optimization can be employed to support concept selection, by visualizing the pros and cons of each concept.
20.5. Mixed Variable Design Problem

Real design problems usually show a mixture of determining continuous parameters as well as selecting existing components from catalogs or databases, see Senin et al.14. Therefore, the multi-objective genetic algorithm has been extended to handle a mixture of continuous variables as well as discrete catalog selections. The object of study for the mixed variable design problem is the valve actuation system depicted in Figure 20.9. The objective is again to design a system with good controllability, but this time at low cost. When designing the system, cylinders and valves are selected from catalogs of existing components. To achieve good controllability we can choose a fast servo valve, which is more expensive than a slower proportional valve. Therefore, there is a trade-off between cost and controllability. The cost for a particular design is composed of the cost for the individual components as well as the cost induced by the energy consumption. Other parameters such as the control parameter, the leakage coefficient and the pump size have to be determined as well. Thus the problem is multi-objective with two objectives and five optimization variables, of which two are discrete catalog selections and three are continuous variables. For this optimization the pressure level is not an optimization parameter, as it is determined by the choice of the cylinder.
Design of Fluid Power Systems Using a Multi Objective Genetic Algorithm
20.5.1. Component Catalogs

For the catalog selections, catalogs of valves and cylinders have been included in the HOPSAN simulation program. For the directional valve, the choice is between a slow but inexpensive proportional valve and an expensive but fast servo valve. Valves from different suppliers have been arranged in two ordered sub-catalogs, as depicted in Figure 20.13. The same structure applies to the cylinders, which are divided into sub-catalogs based on their maximum pressure level. The pressure in the system has to be controlled so that the maximum pressure for the cylinder is not exceeded. A low-pressure system is cheaper but has inferior performance compared to a high-pressure system. Each catalog element contains a complete description of that particular component, i.e. the parameters that describe the dynamics of the component, which are needed by the simulation model, as well as information on cost, weight, etc.
Fig. 20.13. The catalog of directional valves is divided into proportional valves and servo valves. Each sub-catalog is ordered based on the valve size. For each component, a set of parameters describing the component is stored together with information on cost and weight.
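An ordered hierarchical catalog of this kind maps naturally onto nested lists. The sketch below uses invented component data; the nearest-size lookup illustrates the kind of similarity notion that ordered sub-catalogs make possible, since neighbouring indices hold similar components:

```python
# Hierarchical catalog sketch: top level = valve type, sub-catalogs ordered by size.
# All component data here is invented for illustration.
CATALOG = {
    "proportional": [  # ordered by valve size
        {"name": "P10", "size": 10, "cost": 90.0},
        {"name": "P25", "size": 25, "cost": 120.0},
        {"name": "P40", "size": 40, "cost": 160.0},
    ],
    "servo": [
        {"name": "S10", "size": 10, "cost": 400.0},
        {"name": "S25", "size": 25, "cost": 520.0},
    ],
}

def closest_by_size(sub_catalog, target_size):
    """Similarity lookup: the element whose size is nearest to target_size.
    Because sub-catalogs are ordered, catalog-aware crossover can recombine
    two parents by picking an element between their indices."""
    return min(sub_catalog, key=lambda e: abs(e["size"] - target_size))
```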
20.5.2. Optimization Results
The system has been optimized using a population of 40 individuals over 400 generations. In order to limit the Pareto front, a goal level on the control error was introduced for this problem as well. The results can be divided into three distinct regions depending on valve type and pressure level, see Figure 20.14. As can be seen from Figure 20.14, there is a trade-off between system performance (control error) and system cost: by accepting a higher cost, better performance can be achieved. The cheapest designs consist of small proportional valves and low-pressure cylinders. By choosing larger proportional valves and high-pressure cylinders, the performance can be increased at the expense of higher cost. If still better performance is desired, a servo valve has to be chosen, which is more expensive but has better dynamics. The continuous parameters, such as the control parameter, tend to smooth out the Pareto front. For a given valve and cylinder, different settings of the continuous parameters affect the pulse response. A faster response results in a lower control error, but also a higher energy consumption and thereby a higher cost. Therefore, there is a local trade-off between cost and performance for each catalog selection.

Johan Andersson

Fig. 20.14. Optimization results. In (a) the obtained Pareto optimal front is shown in the objective space. Different regions have been identified based on valve and cylinder selections, which is shown in the parameter space in (b).

20.6. Discussion and Conclusions

Modelling and simulation are very powerful tools that can support the engineering design process and facilitate a better and deeper understanding of the systems being developed. When an optimization strategy is connected to the simulation model, the knowledge acquisition process can be sped up further, as the optimization searches through the simulation model in an efficient manner. Furthermore, the optimization frequently identifies loopholes and shortcomings of the model, since it is unbiased in its search for an optimal design. As a system designer or model developer, it is hard to conduct as thorough an inspection of the model as the optimization does. Thus even more information can be extracted from the simulation models when they are combined with an optimization strategy. As has been shown in this chapter, optimization also elucidates how the preferences among the objectives impact the final design. Thus optimization facilitates an understanding of the system being developed, as well as of our expectations on the system and our priorities among the objectives.

In this chapter the multi-objective struggle genetic algorithm is connected to the HOPSAN simulation program in order to support the design of fluid power systems. The method has been applied to two concepts of hydraulic actuation systems, a valve-controlled system and a pump-controlled system, both modeled in the HOPSAN simulation environment. Both systems were optimized in order to minimize the control error and the energy consumption. Naturally, these two objectives are in conflict with each other, and thus the resulting Pareto fronts visualize the trade-off between control error and energy consumption for each concept. The existence of the trade-off was known beforehand, but with the support of the Pareto optimization the trade-off could be quantified, and the performance of designs in different regions of the Pareto front could be visualized in order to point out the effects of the trade-off. When the Pareto optimal fronts for different concepts are drawn in the same graph, the advantages of the concepts are clearly elucidated. An overall Pareto optimal front can be obtained by identifying the non-dominated set from all Pareto optimal fronts. The rational choice is naturally to select the final design from this overall Pareto optimal set. Thus the decision-maker is advised which concept to choose depending on his or her preferences, and hence Pareto optimization can be a valuable support for concept selection. In this application it was recognized that the concepts had different properties, i.e.
one concept is faster but consumes more energy, but it was not known under which preferences one concept was better than the other, i.e. where the Pareto fronts intersect. Therefore, Pareto optimization contributed to elucidating the benefits of the different concepts. The conception of an overall Pareto front, and thereby the support for concept selection, is one of the main contributions of this chapter. Subsequently, the method has been extended to handle the selection of individual components from catalogs, and thus the problem is transformed into a mixed discrete/continuous optimization problem. Component catalogs have therefore been added to the simulation program, where each catalog element contains all data needed by the simulation program as well as properties such as cost and weight. Furthermore, the GA has been extended with genomes able to represent hierarchical catalogs, as well as with operators for similarity measures and crossover between catalog elements. The valve-controlled system was again optimized, resulting in a discrete Pareto front that visualizes the trade-off between system cost and system performance based on discrete selections of valves and cylinders.

For future work, the catalogs could be exchanged for databases, where each element could be extended to contain the entire simulation model for a particular component. These models could either be made by the system designer, or be provided by the supplier in such a form that proprietary information is not jeopardized. In this way, the supplier does not only supply a component for the final system, but also the simulation model describing the component. Furthermore, optimization is transformed from being a system model operator into a system model creator.

References

1. Andersson J., Multiobjective Optimization in Engineering Design - Applications to Fluid Power Systems, Dissertation No. 675, Linkoping Studies in Science and Technology, Linkoping University, Linkoping, Sweden, 2001.
2. Andersson J. and Krus P., "Multiobjective Optimization of Mixed Variable Design Problems", in Proceedings of the 1st International Conference on Evolutionary Multi-Criterion Optimization, Zitzler E. et al. (editors), Springer-Verlag, Lecture Notes in Computer Science No. 1993, pp. 624-638, 2001.
3. Andersson J. and Wallace D., "Pareto optimization using the struggle genetic crowding algorithm", Engineering Optimization, Vol. 34, No. 6, pp. 623-643, 2002.
4. Coello Coello C., An empirical study of evolutionary techniques for multiobjective optimization in engineering design, PhD thesis, Department of Computer Science, Tulane University, 1996.
5. Deb K., Multi-Objective Optimization using Evolutionary Algorithms, John Wiley and Sons Ltd, 2001.
6. Eshelman L. J. and Schaffer J. D., "Real-Coded Genetic Algorithms and Interval-Schemata," in Foundations of Genetic Algorithms 2, L. D. Whitley, Ed., San Mateo, CA, Morgan Kaufmann, pp. 187-202, 1993.
7. Fonseca C. M. and Fleming P. J., "Multiobjective optimization and multiple constraint handling with evolutionary algorithms - Part I: a unified formulation," IEEE Transactions on Systems, Man, and Cybernetics Part A: Systems and Humans, vol. 28, pp. 26-37, 1998.
8. Goldberg D. E., Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, Reading, 1989.
9. Grueninger T. and Wallace D., "Multi-modal optimization using genetic algorithms," Technical Report 96.02, CADlab, Massachusetts Institute of Technology, Cambridge, 1996.
10. Horn J., "Multicriterion decision making," in Handbook of Evolutionary Computation, T. Back, D. Fogel, and Z. Michalewicz, Eds., IOP Publishing Ltd and Oxford University Press, pp. F1.9:1 - F1.9:15, 1997.
11. Jansson A. and Krus P., Hopsan - a Simulation Package, User's Guide, Technical Report LITHIKPR-704, Dept. of Mech. Eng., Linkoping University, Sweden, 1991. http://hydra.ikp.liu.se/hopsan.html.
12. Pahl G. and Beitz W., Engineering Design - A Systematic Approach, Springer-Verlag, London, 1996.
13. Roozenburg N. and Eekels J., Product Design: Fundamentals and Methods, John Wiley & Sons Inc, 1995.
14. Senin N., Wallace D. R., and Borland N., "Mixed continuous and discrete catalog-based design modeling and optimization," in Proceedings of the 1999 CIRP International Design Seminar, University of Twente, Enschede, The Netherlands, 1999.
15. Zitzler E. and Thiele L., "Multiobjective Evolutionary Algorithms: A Comparative Case Study and the Strength Pareto Approach," IEEE Transactions on Evolutionary Computation, vol. 3, pp. 257-271, 1999.
CHAPTER 21

ELIMINATION OF EXCEPTIONAL ELEMENTS IN CELLULAR MANUFACTURING SYSTEMS USING MULTI-OBJECTIVE GENETIC ALGORITHMS
S. Afshin Mansouri
Industrial Engineering Department, Amirkabir University of Technology, P.O. Box 15875-4413, Tehran, Iran
E-mail: [email protected]

Cellular manufacturing is an application of group technology that exploits the similarity of part processing features to improve productivity. Application of a cellular manufacturing system (CMS) is recommended for mid-volume, mid-variety production environments where traditional job shop and flow shop systems are not technically and/or economically justifiable. In a CMS, collections of similar parts (part families) are processed on dedicated clusters of dissimilar machines or manufacturing processes (cells). A totally independent CMS with no intercellular parts movement can rarely be found, due to the existence of exceptional elements (EEs). An EE is either a bottleneck machine allocated to one cell while being required in other cells at the same time, or a part in a family that requires the processing capabilities of machines in other cells. Despite the simplicity of production planning and control functions in a totally independent CMS, such independence cannot be achieved without machine duplication and/or part subcontracting, which have their own side effects: these actions deteriorate other performance aspects of the production system regarding cost, utilization and workload balance. In this chapter, tackling the EEs in a CMS is formulated as a multi-objective optimization problem (MOP) that simultaneously takes into account the optimization of four conflicting objectives regarding intercellular movements, cost, utilization, and workload balance. Due to the complexity of the developed MOP, neither exact optimization techniques nor total enumeration are applicable for large problems.
For this, a multi-objective genetic algorithm (MOGA) solution approach is proposed, which makes use of the non-dominated sorting idea in conjunction with an elitism scheme to provide manufacturing system designers with a set of near Pareto-optimal solutions. Application of the model and the solution approach to a number of test problems shows its suitability for real-world instances.
21.1. Introduction

The majority of manufacturing industries employ three basic designs to organize their production equipment: job shop, flow shop and cellular designs. A job shop process is characterized by the organization of similar equipment by function (such as milling, drilling, turning, forging, and assembly). As jobs flow from work centre to work centre, or department to department, a different type of operation is performed in each centre or department. Orders may follow similar or different paths through the plant, suggesting one or several dominant flows. The layout is intended to support a manufacturing environment in which there can be a great diversity of flow among products. Fig. 21.1 depicts a job shop design.
Fig. 21.1. A job shop design.
The flow shop is sometimes called a product layout because the products always follow the same sequential steps of production. Fig. 21.2 shows a typical flow shop system.

Fig. 21.2. A flow shop design.

In a cellular manufacturing system, machines are divided into manufacturing cells, which are in turn dedicated to processing a group of similar parts called a part family. Cellular manufacturing strives to bring the benefits of mass production to high-variety, medium-to-low volume production. It has several benefits, such as reduced material handling, work-in-process inventory, setup time and manufacturing lead time, and simplified planning, routing and scheduling activities. Fig. 21.3 shows a cellular configuration. Each of the above-mentioned systems has its own rational range of application. Fig. 21.4 illustrates the relative position of these systems in terms of production volume and product variety. Identification of part families and machine groups in the design of a CMS is commonly referred to as cell design/formation. Many solution approaches for the cell design problem have been proposed over the last three decades. Mansouri et al.1 and Offodile et al.2 provide comprehensive reviews of these approaches. There are occasions where all of the machines/parts cannot be exclusively assigned to a machine cell/part family. These are known as exceptional elements (EEs). The EEs cause a number of problems in the operation of CMSs, e.g. intercellular part movements and workload imbalance across the cells. Dealing with the EEs has also been a subject of research.
Fig. 21.3. A cellular design.
Fig. 21.4. Relative position of the three manufacturing systems.
For instance, Logendran and Puvanunt3, Moattar-Husseini and Mansouri4, Shafer et al.5, Sule6 and Seifoddini7 develop solution approaches for this problem. Fig. 21.5(a) shows the initial machine-part incidence matrix of a problem with 4 machines and 6 parts. A "1" entry in the matrix indicates that there is a relationship between the associated part and machine, i.e. the part requires that particular machine in its process route. In Fig. 21.5(b), the sorted matrix is shown along with a decomposition scheme which separates all the machines and parts into two interdependent cells: Cell 1: {(M2, M1), (P1, P3, P6)} and Cell 2: {(M4, M3), (P2, P4, P5)}, where M and P stand for Machine and Part, respectively. There are two exceptional parts (P1 and P4) and two exceptional (bottleneck) machines (M2 and M4) in the CMS proposed in Fig. 21.5(b).
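The decomposition can be checked mechanically. The sketch below is a minimal illustration with data layout and function name of our own choosing; the incidence pattern is the one implied by the example's processing-time data. It recovers the exceptional parts and bottleneck machines of Fig. 21.5(b):

```python
# Machine-part incidence from the 4-machine, 6-part example (part -> machines it uses).
USES = {
    "P1": {"M1", "M2", "M4"}, "P2": {"M3", "M4"}, "P3": {"M1", "M2"},
    "P4": {"M2", "M3", "M4"}, "P5": {"M3", "M4"}, "P6": {"M1", "M2"},
}
CELLS = {1: {"machines": {"M2", "M1"}, "parts": {"P1", "P3", "P6"}},
         2: {"machines": {"M4", "M3"}, "parts": {"P2", "P4", "P5"}}}

def exceptional_elements(uses, cells):
    """Return (exceptional parts, bottleneck machines) for a given decomposition."""
    exc_parts, bottlenecks = set(), set()
    for cell in cells.values():
        for p in cell["parts"]:
            outside = uses[p] - cell["machines"]  # machines needed from other cells
            if outside:
                exc_parts.add(p)
                bottlenecks |= outside
    return exc_parts, bottlenecks
```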
Fig. 21.5. Initial and final machine-part incidence matrices: (a) the initial matrix; (b) a decomposition scheme on the sorted matrix.
In deciding which parts to subcontract and which machines to duplicate, one should take into account the associated side effects, e.g. cost increment and utilization decrement. Any effort to decrease intercellular part movements (as a performance measure) may degrade other measures of performance. In other words, there are multiple objectives to be considered in tackling the EEs; hence the task can be formulated as a multi-objective optimization problem (MOP). In this chapter, a MOP model is introduced for dealing with the EEs in a CMS, together with a MOGA-based solution approach to find locally non-dominated, or near Pareto-optimal, solutions. The remaining sections are organized as follows. An overview of multi-objective optimization is given in Section 2. Section 3 formulates the problem of dealing with the EEs as a MOP model. The developed MOGA-based solution approach for the model is introduced in Section 4, and its parameters are set in Section 5. Experiments on a number of test problems are conducted in Section 6. Finally, concluding remarks are summarized in Section 7.
21.2. Multiple Objective Optimization

A MOP can be defined as determining a vector of design variables within a feasible region that minimizes a vector of objective functions which usually conflict with each other. Such a problem takes the form:

Minimize \{f_1(X), f_2(X), \ldots, f_m(X)\} subject to g(X) \le 0   (B.1)

where X is the vector of decision variables, f_i(X) is the i-th objective function, and g(X) is a constraint vector. Usually there is no single optimal solution to B.1, but rather a set of alternative solutions. These solutions are optimal in the wider sense that no other solution in the search space is superior to them when all objectives are considered. A decision vector X is said to dominate a decision vector Y (also written X \succ Y) iff:

f_i(X) \le f_i(Y) \quad \text{for all } i \in \{1, 2, \ldots, m\}, and   (B.2)

f_i(X) < f_i(Y) \quad \text{for at least one } i \in \{1, 2, \ldots, m\}   (B.3)

There are various solution approaches for solving a MOP. Among the most widely adopted techniques are sequential optimization, the ε-constraint method, the weighting method, goal programming, goal attainment, the distance-based method and the direction-based method. For a comprehensive study of these approaches, readers may refer to Szidarovszky et al.8. Evolutionary algorithms (EAs) seem particularly desirable for solving multi-objective optimization problems because they deal simultaneously with a set of possible solutions (the so-called population), which allows an entire set of Pareto-optimal solutions to be found in a single run of the algorithm, instead of having to perform a series of separate runs as in the case of traditional mathematical programming techniques. Additionally, EAs are less susceptible to the shape or continuity of the Pareto-optimal frontier, whereas these two issues are a real concern for mathematical programming techniques. However, EAs usually contain several parameters that need to be tuned for each particular application, which is in many cases highly time-consuming. In addition, since EAs are stochastic optimizers, different runs tend to produce different results; therefore, multiple runs of the same algorithm on a given problem are needed to describe its performance on that problem statistically. These are the most challenging issues in using EAs for solving MOPs. For a detailed discussion of the application of EAs in multi-objective optimization see Coello et al.9 and Deb10.
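The dominance conditions (B.2) and (B.3) translate directly into code. A minimal sketch for minimization problems (the function names are ours):

```python
def dominates(fx, fy):
    """fx dominates fy (minimization): no worse in every objective -- condition
    (B.2) -- and strictly better in at least one -- condition (B.3)."""
    return all(a <= b for a, b in zip(fx, fy)) and any(a < b for a, b in zip(fx, fy))

def non_dominated(points):
    """Filter a list of objective vectors down to the non-dominated set."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]
```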
21.3. Development of the Multi-Objective Model for Elimination of EEs

21.3.1. Assumptions

It is assumed that part subcontracting and machine duplication are the two possible alternatives for the elimination of EEs from a CMS, as proposed by Shafer et al.5. It is also assumed that partial subcontracting is not allowed; in other words, the whole demand for a part must be supplied by subcontractors once it has been decided to subcontract it.

21.3.2. The Set of Decision Criteria

The following set of criteria is considered for the development of the MOP model:
• Minimizing intercellular parts movements,
• Minimizing total cost of machine duplication and part subcontracting,
• Minimizing under-utilization of machines in the system, and
• Minimizing deviations among the levels of the cells' utilization.

Among these objectives, minimizing intercellular parts movement is of special importance, as it is the key factor in making cells independent. However, any effort to reduce intercellular parts movement by means of machine duplication and part subcontracting increases cost, deteriorates the overall utilization of machinery, and unbalances the levels of utilization among the cells. The other objectives are considered to overcome these side effects.

21.3.3. Problem Formulation
21.3.3.1. Notation

Indices:
i : index for machine types, i = 1, ..., m
j : index for part types, j = 1, ..., p
k : index for cells, k = 1, ..., c

Decision variables:
Two binary decision variables are defined to formulate the problem: x_j = 1 if part j is subcontracted and x_j = 0 otherwise; y_{i,k} = 1 if machine i is duplicated in cell k and y_{i,k} = 0 otherwise.

Parameters:
D_j : annual demand for part j;
S_j : incremental cost of subcontracting a unit of part j;
t_{i,j} : processing time of a unit of part j on machine i;
PM_{j,i} : number of intercellular transfers required by part j as a result of machine type i not being available within the part's manufacturing cell;
M_i : annual cost of acquiring an additional machine of type i;
CM_i : annual machining capacity of each unit of machine i (minutes);
HF_k : set of parts assigned to cell k;
MC_k : set of machines assigned to cell k;
GF_k : set of parts assigned to cells other than k while requiring some of the machines in cell k;
BM_k : set of bottleneck machines required by the parts in cell k;
EP_k : set of exceptional parts in cell k;
EM_j : set of bottleneck machines required by the exceptional part j;
CS_k : number of machines assigned to cell k;
MCS : maximum cell size;
c : number of cells;
UC_k : utilization of cell k; and
OU : overall utilization of the CMS.

21.3.3.2. The Objective Functions

We define the solution vector X = (x_j's, y_{i,k}'s), which consists of the binary decision variables. The objectives considered for dealing with the EEs are described as follows:

Objective 1: minimizing intercellular parts movement

Intercellular movement of parts is one of the major problems associated with the EEs in a CMS, as it complicates production and inventory management functions. Minimization of the intercellular parts movement is sought through the following objective function:

f_1(X) = \sum_{k=1}^{c} \sum_{j \in EP_k} (1 - x_j) \sum_{i \in EM_j} PM_{j,i} (1 - y_{i,k})   (C.1)

Objective 2: minimizing total cost of machine duplication and part subcontracting

Any reduction in intercellular parts movement by machine duplication and/or part subcontracting results in a cost increment. Hence minimization of the total part subcontracting and machine duplication cost is included in the model:

f_2(X) = \sum_{k=1}^{c} \sum_{j \in EP_k} \Bigl( D_j S_j x_j + \sum_{i \in EM_j} M_i y_{i,k} \Bigr)   (C.2)

Objective 3: minimizing overall machine under-utilization

Since machine duplication and/or part subcontracting deteriorates machinery utilization, minimization of the overall machine under-utilization, which is equivalent to maximization of the overall utilization, is taken into account through the following objective function:

f_3(X) = 1 - OU = 1 - \frac{\sum_{k=1}^{c} UC_k \bigl( CS_k + \sum_{i \in BM_k} y_{i,k} \bigr)}{\sum_{k=1}^{c} \bigl( CS_k + \sum_{i \in BM_k} y_{i,k} \bigr)}   (C.3)

where the UC_k's can be calculated as:

UC_k = \frac{\sum_{i \in MC_k} \Bigl( \sum_{j \in HF_k} D_j t_{i,j} - \sum_{j \in EP_k} D_j t_{i,j} x_j + \sum_{j \in GF_k} D_j t_{i,j} (1 - x_j) \Bigr)}{\sum_{i \in MC_k} CM_i + \sum_{i \in BM_k} y_{i,k} CM_i}   (C.4)

Objective 4: minimizing deviations among utilization of the cells
Significant differences in the cells' utilization may result in major problems in managerial functions, e.g. different overtime payments to the operators as a result of their differing workloads. Hence the following objective function is included to minimize deviations among the cells' levels of utilization:

f_4(X) = \frac{\sum_{k=1}^{c} (UC_k - OU)^2}{c - 1}   (C.5)
According to Bowker and Lieberman11, in calculating the standard deviation of a small sample of size N, the sum of the squared differences from the sample mean should be divided by N - 1 rather than N. That is why the denominator in equation C.5 is c - 1 instead of c. Among the above-mentioned objectives, objective 1 is of special importance, since intercellular movements are the main cause of cell interdependencies. However, any effort to reduce intercellular parts movement by means of machine duplication and part subcontracting increases cost, deteriorates the overall utilization of machinery, and unbalances the levels of utilization among the cells. Objectives 2, 3 and 4 have been included in the model to overcome these side effects, respectively.

21.3.3.3. The Constraints

The solution space is restricted by the following constraints:

CS_k + \sum_{i \in BM_k} y_{i,k} \le MCS, \quad k = 1, \ldots, c   (C.6)

x_j, y_{i,k} \in \{0, 1\}   (C.7)
Constraints C.6 prevent cell sizes from exceeding a pre-determined upper bound. Relations C.7 restrict the decision variables to take either a '0' or a '1'.

21.3.3.4. The Multi-Objective Optimization Problem (MOP)

The set of objectives and constraints stated above constitutes the MOP:

Minimize \{f_1(X), f_2(X), f_3(X), f_4(X)\}, subject to (C.6) and (C.7)   (C.8)
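To illustrate how the notation maps to code, objective (C.1) can be evaluated as follows. This is a minimal sketch under our own choice of data layout (dictionaries keyed by part, machine and cell identifiers), not the chapter's implementation:

```python
def f1_intercellular(EP, EM, PM, x, y):
    """Objective (C.1): intercellular part movements remaining after
    subcontracting (x[j] = 1) or machine duplication (y[i, k] = 1).
    EP[k]: exceptional parts of cell k; EM[j]: bottleneck machines of part j;
    PM[j, i]: transfers of part j caused by machine i; x, y: binary decisions."""
    return sum((1 - x[j]) * sum(PM[j, i] * (1 - y[i, k]) for i in EM[j])
               for k in EP for j in EP[k])
```

The remaining objectives (C.2)-(C.5) follow the same pattern of nested sums over the index sets.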
21.3.4. A Numerical Example

Consider the problem shown in Fig. 21.5(a) and the CMS proposed in Fig. 21.5(b). There are 4 machines and 6 parts grouped in 2 cells, along with 4 exceptional elements. The values of t_{i,j}, D_j, S_j and M_i are given in Table 21.84, wherein the entries of the incidence matrix represent the t_{i,j}'s. It is also assumed that CM_i = 210000 minutes for all machines. The upper bound on the cell sizes is assumed to be 3.

Table 21.84. The values of t_{i,j}, D_j, S_j and M_i for the example.

          P1     P2     P3     P4     P5     P6  |   M_i
M1         8             7                    1  | 14000
M2         7             3      2             8  |  6000
M3                3             6      5         | 19000
M4         9     10             1      9         |  7000
D_j    10000   7000   8000   1000   4000   2000  |
S_j        3      3      1      4                |

(The blank S_j entries are not used in the model, since parts 5 and 6 are not exceptional.) The data sets for the two cells proposed by Fig. 21.5(b) are as follows.
Cell 1: HF_1 = {P1, P3, P6}, MC_1 = {M2, M1}, GF_1 = {P4}, BM_1 = {M4}, EP_1 = {P1}, EM_1 = {M4}, CS_1 = 2, PM_{1,4} = D_1 = 10000.

Cell 2: HF_2 = {P2, P4, P5}, MC_2 = {M4, M3}, GF_2 = {P1}, BM_2 = {M2}, EP_2 = {P4}, EM_4 = {M2}, CS_2 = 2, PM_{4,2} = D_4 = 1000.

The decision vector for this problem is X = {x_1, x_4, y_{4,1}, y_{2,2}}. The objective functions and constraints of the problem are as follows:

Minimize \{f_1(X), f_2(X), f_3(X), f_4(X)\}   (C.9)

where:

f_1(X) = 10000 (1 - x_1)(1 - y_{4,1}) + 1000 (1 - x_4)(1 - y_{2,2})   (C.10)

f_2(X) = 10000 \cdot 3 \cdot x_1 + 7000 \, y_{4,1} + 1000 \cdot 4 \cdot x_4 + 6000 \, y_{2,2}   (C.11)

f_3(X) = 1 - OU   (C.12)

f_4(X) = \frac{(UC_1 - OU)^2 + (UC_2 - OU)^2}{2 - 1}   (C.13)

where:

OU = \frac{UC_1 (2 + y_{4,1}) + UC_2 (2 + y_{2,2})}{(2 + y_{4,1}) + (2 + y_{2,2})}   (C.14)

UC_1 = \frac{10000 (7 + 8) + 8000 (3 + 7) + 2000 (8 + 1) - 10000 (7 + 8) x_1 + 1000 \cdot 2 \cdot (1 - x_4)}{210000 + 210000 + 210000 \, y_{4,1}}   (C.15)

UC_2 = \frac{7000 (10 + 3) + 1000 (1 + 6) + 4000 (9 + 5) - 1000 (1 + 6) x_4 + 10000 \cdot 9 \cdot (1 - x_1)}{210000 + 210000 + 210000 \, y_{2,2}}   (C.16)

subject to:

2 + y_{4,1} \le 3   (C.17)

2 + y_{2,2} \le 3   (C.18)

x_1, x_4, y_{4,1}, y_{2,2} \in \{0, 1\}   (C.19)
In Table 21.85, the set of feasible solutions for the example is summarized along with their objective values. The utilization level of each cell, as well as the overall utilization of the system, which are required in the calculation of the objective values f_3 and f_4, are also presented as complementary information. The results show that X3 dominates X11 (X3 ≻ X11). Moreover, X5 ≻ X2 ≻ X6, and X4 ≻ (X8, X9, X12, X13, X15, X16). The set of non-dominated, or Pareto-optimal, solutions for the example problem comprises: X1, X3, X4, X5, X7, X10 and X14.

Table 21.85. Total solutions of the numerical example.

Decision vector        f1     f2      f3       f4      UC1    UC2     OU
X1  = {0,0,0,0}     11000      0  0.41190  0.00010  0.595  0.581  0.588
X2  = {0,0,0,1}*    10000   6000  0.52952  0.02248  0.595  0.387  0.470
X3  = {0,0,1,0}      1000   7000  0.52952  0.01763  0.397  0.581  0.470
X4  = {0,0,1,1}         0  13000  0.60794  0.00005  0.397  0.387  0.392
X5  = {0,1,0,0}     10000   4000  0.42262  0.00034  0.590  0.564  0.577
X6  = {0,1,0,1}*    10000  10000  0.53810  0.02388  0.590  0.376  0.462
X7  = {0,1,1,0}         0  11000  0.53810  0.01514  0.394  0.564  0.462
X8  = {0,1,1,1}*        0  17000  0.61508  0.00015  0.394  0.376  0.385
X9  = {1,0,0,0}*     1000  30000  0.69762  0.00827  0.238  0.367  0.302
X10 = {1,0,0,1}         0  36000  0.75810  0.00002  0.238  0.244  0.242
X11 = {1,0,1,0}*     1000  37000  0.75810  0.02248  0.159  0.367  0.242
X12 = {1,0,1,1}*        0  43000  0.79841  0.00367  0.159  0.244  0.202
X13 = {1,1,0,0}*        0  34000  0.70833  0.00681  0.233  0.350  0.292
X14 = {1,1,0,1}         0  40000  0.76667  0.00000  0.233  0.233  0.233
X15 = {1,1,1,0}*        0  41000  0.76667  0.01966  0.156  0.350  0.233
X16 = {1,1,1,1}*        0  47000  0.80556  0.00302  0.156  0.233  0.194
* dominated solution
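The enumeration in Table 21.85 can be reproduced by direct computation. The sketch below hard-codes the coefficients of equations (C.10)-(C.16) (function names are ours; note S_4 = 4, as implied by the f_2 values in the table) and recovers the non-dominated set:

```python
from itertools import product

def objectives(x1, x4, y41, y22):
    """Objective values (C.10)-(C.16) for the 4-machine, 6-part example."""
    f1 = 10000*(1 - x1)*(1 - y41) + 1000*(1 - x4)*(1 - y22)
    f2 = 10000*3*x1 + 7000*y41 + 1000*4*x4 + 6000*y22
    uc1 = (250000 - 150000*x1 - 2000*x4) / (420000 + 210000*y41)
    uc2 = (244000 - 90000*x1 - 7000*x4) / (420000 + 210000*y22)
    ou = (uc1*(2 + y41) + uc2*(2 + y22)) / ((2 + y41) + (2 + y22))
    f3 = 1 - ou
    f4 = ((uc1 - ou)**2 + (uc2 - ou)**2) / (2 - 1)   # c - 1 = 1
    return f1, f2, f3, f4

def dominates(a, b):
    """a dominates b (minimization): no worse everywhere, strictly better once."""
    return all(p <= q for p, q in zip(a, b)) and any(p < q for p, q in zip(a, b))

# Enumerate all 2^4 decision vectors (x1, x4, y41, y22) and filter.
solutions = {x: objectives(*x) for x in product((0, 1), repeat=4)}
pareto = [x for x, fx in solutions.items()
          if not any(dominates(fy, fx) for y, fy in solutions.items() if y != x)]
```

Running this reproduces the seven Pareto-optimal vectors X1, X3, X4, X5, X7, X10 and X14 of Table 21.85.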
The solution space of the above example, with 4 exceptional elements, consists of 2^4 = 16 solutions. In general, the total number of solutions for a problem having n exceptional elements is 2^n, so the size of the solution space increases exponentially with the number of exceptional elements. Hence the application of exact optimization techniques is prohibitively expensive (computationally speaking) for large problems. For this reason, a MOGA-based solution approach is developed, which is described in the subsequent section.

21.4. The Proposed MOGA

In simple GAs, a candidate solution is represented by a sequence of genes and is known as a chromosome. A chromosome's potential as a solution is determined by its fitness function, which evaluates a chromosome with respect to the objective function of the optimization problem at hand. A judiciously selected set of chromosomes is called a population, and the population at a given time is a generation. The population size remains constant from generation to generation and has a significant impact on the performance of the GA. The mechanism of GAs generally operates on a generation through three main operators: (1) reproduction (selection of copies of chromosomes according to their fitness values), (2) crossover (an exchange of portions of chromosomes), and (3) mutation (a random modification of a chromosome). The chromosomes resulting from these three operations form the next generation's population. The process is then iterated a desired number of times, usually up to the point where the system ceases to improve or the population has converged to a few well-performing sequences. In order to apply genetic algorithms to the developed MOP in a problem with n decision variables, a chromosomal structure consisting of n genes is considered. Each gene in the chromosome can take either a value of '0' or '1', reflecting the value of its corresponding binary decision variable. The objective values are normalized so that they lie between 0 and 1 by means of the following formula:

F_i = \frac{C_i}{C_i + f_i}, \quad i = 1, \ldots, 4   (D.1)

where F_i is the fitness value, f_i is the objective value and C_i is the normalizing factor for objective i.

21.4.1. Pseudocode for the Proposed MOGA
The following pseudocode details steps of the proposed MOGA: Initialize Search Parameters Randomly Generate Initial Solutions i= 1 do for j = 1 to Population Size Calculate Dummy Fitness Value for chromosome^ if chromosome^ is Infeasible then Let Dummy Fitness Value of chromosome^- to be 0 end if next j
Cellular Manufacturing Systems Using Multi-Objective Genetic Algorithms
        do
            Select chromosome to form Mating Pool using RSSWR-UE scheme
        while (Members of Mating Pool are less than Population Size)
        Shuffle Mating Pool
        Produce offspring using Reproduction, Crossover and Mutation operators
        for j = 1 to Population Size
            if chromosome_j is Infeasible then
                Set Dummy Fitness Value of chromosome_j to 0
            end if
        next j
        i = i + 1
    while ((i < Max Generations) and (Successive Nondominated Frontiers is less than Min. Successive Nondominated Frontiers))
    Report Resultant Nondominated Frontier

It should be noted that Successive Nondominated Frontiers refers to the number of successive generations in which all non-dominated solutions of the current generation have remained non-dominated when compared against the non-dominated frontiers of previous generations. Some steps of the algorithm are discussed in more detail in the following sub-sections.

21.4.2. Fitness Calculation
The fitness values are calculated using the non-dominated sorting method of Srinivas and Deb12. The idea behind the non-dominated sorting procedure is that a ranking method is used to emphasize good solutions and a niche method is used to maintain stable subpopulations of good solutions. In this procedure, the population is ranked on the basis of an individual's non-domination. The non-dominated individuals present in the population are first identified from the current population. Then, all these individuals are assumed to constitute the first non-dominated frontier in the population and assigned a large Dummy Fitness Value. The same fitness value is assigned to give an equal reproductive potential to all these non-dominated individuals. To maintain diversity in the population, these classified individuals are then shared with their dummy fitness values. Sharing is achieved by performing a selection operation using degraded fitness values that are obtained by dividing the original fitness value of an individual by a quantity proportional to the number of individuals around it. This causes multiple Pareto-optimal solutions to co-exist in the population. After sharing, these non-dominated individuals are ignored temporarily to process the rest of
the population in the same way to identify individuals for the second non-dominated frontier. These non-dominated solutions are then assigned a new dummy fitness value that is kept smaller than the minimum shared dummy fitness of the previous frontier. This process is continued until the entire population is classified into several frontiers.

21.4.3. Selection

For selection, a novel scheme called RSSWR-UE was developed, using Remainder Stochastic Sampling Without Replacement in conjunction with a new Elitism operator. For details of this scheme, readers may refer to Mansouri et al.13.

21.4.4. Recombination
All selected chromosomes in the Mating Pool are shuffled and then mutually recombined, according to the Crossover Rate, via single-point crossover, wherein the two selected parents are cut at a random point along their length into two sections. Section 1 of parent 1 (2) attached to section 2 of parent 2 (1) forms offspring 1 (2). A small portion of genes in the population is then mutated, according to the Mutation Rate, from '1' to '0' and vice versa through the mutation operator.

21.4.5. Updating the Elite Set

The updating mechanism of the Elite Set and how to keep its size from exceeding an upper limit are important factors that affect the performance of the MOGA. In the current MOGA, a niche mechanism is employed in updating the elite set so that the diversity of the members of the set is improved.

21.4.6. Stopping Criteria
The algorithm terminates as soon as either it converges to a robust non-dominated frontier or a predetermined number of generations has been completed. To determine whether a robust non-dominated frontier has been achieved, the members of the non-dominated frontiers of successive generations are mutually compared. If the individuals of the frontiers remain non-dominated for a predetermined number of generations, say the Minimum Successive Nondominated Frontiers (Min. SNDF), then it can be asserted that the obtained frontier is robust, and the algorithm terminates.
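The frontier-identification step of the non-dominated sorting used for fitness calculation (Section 21.4.2) can be sketched as follows. This is a minimal illustration assuming all objectives are minimized; the sharing and dummy-fitness assignment of Srinivas and Deb's full procedure is omitted:

```python
def dominates(a, b):
    """True if objective vector a dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated_fronts(objs):
    """Partition objective vectors into successive non-dominated frontiers."""
    remaining = list(range(len(objs)))
    fronts = []
    while remaining:
        # A solution joins the current frontier if nothing left dominates it.
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

fronts = nondominated_fronts([(1, 4), (2, 2), (4, 1), (3, 3), (4, 4)])
assert fronts[0] == [0, 1, 2]   # (3,3) and (4,4) fall into later frontiers
```

In the MOGA, all members of the first frontier would receive the same large dummy fitness, the second frontier a smaller one, and so on.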
21.5. Parameter Setting

In order to find a good set of parameters for the MOGA, two medium-sized problems with n = 10 and n = 15 were selected from the literature. True non-dominated frontiers of these problems, found via total enumeration, were employed as the references for evaluation. Three measures were used to judge the effectiveness of a set of parameters:

• MP1: Quality of non-dominated solutions; the ratio of true non-dominated solutions in the final non-dominated frontier of the algorithm.
• MP2: Diversity of solutions in the final non-dominated frontier, measured by the number of solutions in the frontier.
• MP3: CPU time.

The experiments were conducted in two stages. In the first stage, the parameters were examined individually, i.e. changing the value of a given parameter while keeping the values of the remaining parameters at a constant level. At each level, both problems were solved twenty times. Considering the three performance measures, an appropriate value for the given parameter was selected according to the average of these runs. The examined parameter was given this value and the procedure was repeated for another parameter until all parameters were assigned an initial value. Figures 21.6 to 21.8 illustrate sample results of this stage for the test problem with n = 15.
Fig. 21.6. The effect of mutation rate on quality.
In order to examine the interaction between parameters, the parameters were examined jointly in the second stage on the problem set with n = 15. A pair of parameters was selected at the beginning, with various combinations of values. Twenty runs were conducted using each combination and the best
Fig. 21.7. The effect of mutation rate on diversity.
Fig. 21.8. The effect of mutation rate on CPU time.
value for one of them was selected. The best value for the other parameter was then determined in the same way with a new parameter. The procedure was iterated until all parameters were examined. The measure for selecting appropriate values in this stage was the product of Quality (MP1) and Diversity (MP2), i.e. MP1 × MP2. This measure reflects the number of true non-dominated solutions found by the MOGA. Figures 21.9 and 21.10 depict the effect of crossover rate in two joint tests. Considering the results of the above experiments, the following parameter set was found to be promising in terms of the three performance measures: Population Size = 150, Crossover Rate = 0.50, Mutation Rate = 0.03, Niching Parameter = 0.60, Min. Successive Nondominated Frontiers (Min. SNDF) = 15, Elitism Prob. = 1.00, Initial Transfer Prob. = 0.10, Epsilon Niche = 0.30, Elite Set Size = 50 and Degrading Factor = 0.80.

21.6. Experimentation

In order to evaluate the MOGA, data sets for 5 cellular manufacturing systems of differing sizes were selected from the literature. The major characteristics of the test problems are presented in Table 21.86.
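The selected parameter set can be collected into a single configuration mapping, which is how such settings would typically be passed to a MOGA run. The key names below are paraphrased from the text and are illustrative:

```python
# Parameter set found promising in the two-stage tuning experiments.
moga_params = {
    "population_size": 150,
    "crossover_rate": 0.50,
    "mutation_rate": 0.03,
    "niching_parameter": 0.60,
    "min_sndf": 15,                # Min. Successive Nondominated Frontiers
    "elitism_prob": 1.00,
    "initial_transfer_prob": 0.10,
    "epsilon_niche": 0.30,
    "elite_set_size": 50,
    "degrading_factor": 0.80,
}
assert moga_params["population_size"] == 150
```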
Fig. 21.9. The joint effect of crossover rate and min. SNDF.
The size of the solution space associated with the test problems ranges from 2^10 = 1024 solutions to 2^43 = 8.796×10^12 solutions. Moreover, the maximum number of mutual comparisons required for total enumeration of these problems ranges from (2^10)!/[2!(2^10 - 2)!] = 523,776 to (2^43)!/[2!(2^43 - 2)!] = 3.869×10^25, where complete enumeration is impossible.
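The figures above follow directly from the binomial coefficient C(2^n, 2), the number of pairwise dominance comparisons over a space of 2^n solutions. A quick check (function name illustrative):

```python
from math import comb

def enumeration_cost(n_vars):
    """Solution-space size 2^n and the number of pairwise comparisons
    needed for total enumeration of a problem with n_vars 0/1 variables."""
    size = 2 ** n_vars
    return size, comb(size, 2)

size, pairs = enumeration_cost(10)
assert (size, pairs) == (1024, 523776)

_, pairs = enumeration_cost(43)
assert f"{pairs:.3e}" == "3.869e+25"
```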
Table 21.86. Main characteristics of the test problems.
Problem                        Decision variables   Machines   Parts   Cells
Venugopal and Narendran14              10               15        30      3
Burbidge15                             15               20        35      5
Askin et al.16                         30               12        19      3
Seifoddini7                            35               16        43      5
Boe and Cheng17                        43               20        35      4
Fig. 21.10. The joint effect of crossover rate and elite set size.
The algorithm was coded in C++ and run on a Pentium II (Celeron) CPU at 333 MHz with 64 MB of RAM under Windows 2000. To evaluate the quality (MP1) of the non-dominated frontier found by the algorithm, a reference set was formed for every problem. For the small to medium-sized problems, i.e. the problems with at most 15 decision variables, the reference sets were created through total enumeration. For the large problems, i.e. the problems with more than 15 variables, where total enumeration was practically impossible, a refining scheme was devised to establish a set of near non-dominated frontiers. In the refining scheme, successive runs of the MOGA were conducted, each run using a randomly selected set of parameters. The non-dominated solutions of the first run were adopted as an initial reference set. When adding the non-dominated solutions of the next run into the reference set, a mutual dominance check was performed between the old members of the set and the new entrants. Dominated solutions were removed and the remaining solutions formed the new reference set. This procedure was iterated 50 times for each problem and the final reference set was adopted for
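One refining step of this scheme can be sketched as follows, assuming all objectives are minimized (an illustrative implementation; the names are not from the chapter):

```python
def dominates(a, b):
    """True if objective vector a dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def refine_reference_set(reference, new_entrants):
    """Merge a run's non-dominated solutions into the reference set, then
    drop every solution dominated by another member (mutual dominance check)."""
    merged = list(dict.fromkeys(list(reference) + list(new_entrants)))
    return [s for s in merged if not any(dominates(t, s) for t in merged)]

ref = [(1, 5), (3, 3)]
ref = refine_reference_set(ref, [(2, 2), (5, 1)])
assert set(ref) == {(1, 5), (2, 2), (5, 1)}   # (3, 3) is dominated by (2, 2)
```

Iterated 50 times with differently parameterized runs, this produces the near non-dominated reference frontier used in place of total enumeration.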
later comparisons of the algorithm's results. The quality (MP1) of each run of an algorithm is then calculated by comparing the final results against the corresponding reference set. Diversity was simply measured by the number of non-dominated solutions found, represented by MP2. CPU time (MP3) was also measured and used as the third measure. Each problem was then solved 20 times by each algorithm. The average results of the 20 conducted test runs are presented in Table 21.87.

Table 21.87. The average results for the test problems.

Problem                       MP1      MP2     MP3
Venugopal and Narendran14    0.983     30.6    18.9
Burbidge15                   0.789     75.3    22.8
Askin et al.16               0.599    101.4    22.5
Seifoddini7                  0.695    106.5    30.7
Boe and Cheng17              0.596     97.1    24.8
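One plausible reading of the quality measure MP1 is the fraction of an algorithm's final frontier that also belongs to the reference set; a sketch (the function name and this exact reading are assumptions, not from the chapter):

```python
def quality_mp1(final_frontier, reference_set):
    """MP1: ratio of solutions in the final non-dominated frontier that
    are also members of the (true or refined) reference set."""
    reference = set(reference_set)
    return sum(1 for s in final_frontier if s in reference) / len(final_frontier)

# Two of the three frontier solutions appear in the reference set.
assert quality_mp1([(1, 2), (2, 1), (3, 3)], [(1, 2), (2, 1), (0, 4)]) == 2 / 3
```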
21.7. Conclusion

In this chapter a multi-objective optimization model was presented, along with a solution approach based on genetic algorithms, to decide which parts to subcontract and which machines to duplicate in a CMS in which exceptional elements exist. The set of objectives considered includes: (1) minimizing inter-cellular movements, (2) minimizing the total cost of machine duplication and part subcontracting, (3) minimizing the overall under-utilization of the cells, and (4) minimizing the imbalance of workloads among the cells. The proposed MOGA seeks non-dominated or Pareto-optimal solutions to this model. The MOGA was tested on a number of problems and was observed to be capable of producing good solutions in terms of a threefold measure of effectiveness concerning quality, diversity and CPU time. Simplicity of use, together with an acceptable level of effectiveness in a short amount of computation time, is the key advantage of the proposed MOGA over exact optimization and total enumeration methods, which are only applicable to small problem instances. This, together with the fact that
the majority of cellular manufacturing systems are implemented in small to medium enterprises (SMEs), where the application of sophisticated optimization schemes is impractical even for small problems, further justifies the use of the MOGA for real-world problems.

References

1. S. A. Mansouri, S. M. Moattar-Husseini and S. T. Newman, A review of the modern approaches to multi-criteria cell design, International Journal of Production Research, 38 (2), 1201-1218 (2000).
2. F. Offodile, A. Mehrez and J. Grznar, Cellular manufacturing: a taxonomic review framework, Journal of Manufacturing Systems, 13, 196-220 (1994).
3. R. Logendran and V. Puvanunt, Duplication of machines and subcontracting of parts in the presence of alternative cell locations, Computers and Industrial Engineering, 33 (3-4), 235-238 (1997).
4. S. M. Moattar-Husseini and S. A. Mansouri, A cost based part machine grouping method for group technology, Proceedings of the 5th International Conference on Flexible Automation and Intelligent Manufacturing (FAIM'95), Stuttgart, Germany, 415-423 (1995).
5. S. M. Shafer, G. M. Kern and J. C. Wei, A mathematical programming approach for dealing with exceptional elements in cellular manufacturing, International Journal of Production Research, 30, 1029-1036 (1992).
6. D. R. Sule, Machine capacity planning in group technology, International Journal of Production Research, 29 (6), 1909-1922 (1991).
7. H. Seifoddini, Duplication process in machine cells formation in group technology, IIE Transactions, 21 (1), 382-388 (1989).
8. F. Szidarovsky, M. E. Gershon and L. Duckstein, Techniques for Multiobjective Decision Making in Systems Management, Elsevier, New York (1986).
9. C. A. C. Coello, D. A. Van Veldhuizen and G. B. Lamont, Evolutionary Algorithms for Solving Multi-Objective Problems, Kluwer Academic Publishers, New York (2002).
10. K. Deb, Multi-Objective Optimization using Evolutionary Algorithms, John Wiley & Sons, Chichester, UK (2001).
11. A. H. Bowker and G. J. Lieberman, Engineering Statistics (2nd ed), Prentice-Hall (1972).
12. N. Srinivas and K. Deb, Multiobjective optimization using nondominated sorting in genetic algorithms, Evolutionary Computation, 2 (3), 221-248 (1994).
13. S. A. Mansouri, S. M. Moattar-Husseini and S. H. Zegordi, A genetic algorithm for multiple objective dealing with exceptional elements in cellular manufacturing, Production Planning & Control, 14 (2), 437-446 (2003).
14. V. Venugopal and T. T. Narendran, A genetic algorithm approach to the machine-component grouping problem with multiple objectives, Computers and Industrial Engineering, 22 (1), 469-480 (1992).
15. J. L. Burbidge, An introduction of group technology, Proceedings of the
Seminar on GT, Turin, Italy (1969).
16. R. G. Askin, S. H. Cresswell, J. B. Goldberg and A. J. Vakharia, A Hamiltonian path approach to reordering the part-machine matrix for cellular manufacturing, International Journal of Production Research, 29 (3), 1081-1100 (1991).
17. W. J. Boe and C. H. Cheng, A close neighbor algorithm for designing cellular manufacturing systems, International Journal of Production Research, 29 (10), 2097-2116 (1991).
CHAPTER 22

SINGLE-OBJECTIVE AND MULTI-OBJECTIVE EVOLUTIONARY FLOWSHOP SCHEDULING
Hisao Ishibuchi and Youhei Shibata
Department of Industrial Engineering, Osaka Prefecture University
1-1 Gakuen-cho, Sakai, Osaka 599-8531, Japan
E-mail: {hisaoi, shibata}@ie.osakafu-u.ac.jp

This chapter explains how evolutionary algorithms can be applied to single-objective and multi-objective permutation flowshop scheduling problems. In permutation flowshop scheduling, each solution is represented by an order (i.e., permutation) of given jobs, which are processed on given machines in that order. Such a permutation is handled as an individual in genetic algorithms, as in the case of traveling salesman problems. We first examine various genetic operations designed for permutation-type strings (i.e., order-based coding) through computational experiments on single-objective problems. Next we compare genetic algorithms with multi-start local search and genetic local search. It is shown that multi-start local search and genetic local search are more efficient than genetic algorithms for single-objective problems. Then we discuss the application of genetic algorithms to multi-objective problems. It is shown that multi-objective genetic algorithms outperform multiple runs of multi-start local search and single-objective genetic algorithms. This is because a large number of various non-dominated solutions can be simultaneously obtained by a single run of multi-objective genetic algorithms. We also suggest some tricks for improving the performance of multi-objective genetic algorithms.

22.1. Introduction

Permutation flowshop scheduling is one of the most frequently studied scheduling problems in the literature1. Since Johnson's pioneering work2, various criteria have been considered, such as makespan, total flowtime, maximum flowtime, maximum tardiness, and total tardiness3. In general, it is impractical to try to find optimal schedules for large permutation
H. Ishibuchi and Y. Shibata
flowshop scheduling problems. Thus metaheuristic approaches such as simulated annealing4,5, taboo search6,7 and genetic algorithms8-11, as well as heuristic approaches12-15, have been proposed for efficiently finding near-optimal solutions. By simultaneously considering multiple criteria, single-objective permutation flowshop scheduling problems have been extended to multi-objective ones16, where evolutionary algorithms have been frequently used17-25. Evolutionary algorithms, which are population-based search techniques, are suitable for multi-objective optimization because a large number of various non-dominated solutions can be simultaneously obtained by a single run26-29. On the other hand, only a single solution is usually obtained by a single run of other heuristic and metaheuristic algorithms. Since Schaffer's proposal of the first multi-objective genetic algorithm30, a number of evolutionary multi-objective optimization (EMO) algorithms have been proposed in the literature (e.g., MOGA31, NPGA32, NSGA33, SPEA34, PAES35, MOGLS36, NSGA-II37). Those algorithms have been compared with each other in some comparative studies34,38,39, where function optimization problems and 0/1 knapsack problems have been frequently used as test problems. One of the main characteristic features of the application of evolutionary algorithms to permutation flowshop scheduling is the use of the order-based coding, where an order (i.e., permutation) of given jobs is used as an individual. As a result, standard genetic operations for binary strings are not directly applicable. Thus we first examine various crossover and mutation operations for the order-based coding through computational experiments on single-objective permutation flowshop scheduling problems. Next we examine the performance of single-objective genetic algorithms in comparison with multi-start local search and genetic local search.
It is shown that multi-start local search and genetic local search are more efficient than genetic algorithms for single-objective problems. Then we discuss the application to multi-objective problems, where we use a well-known multi-objective genetic algorithm: NSGA-II37 (the elitist non-dominated sorting genetic algorithm). Of course, we can use other EMO algorithms because they are usually general-purpose algorithms. It is shown that multi-objective genetic algorithms outperform multiple runs of multi-start local search and single-objective genetic algorithms. We also discuss some tricks for improving multi-objective genetic algorithms for permutation flowshop scheduling, such as hybridization with local search and the introduction of mating restriction. Through computational experiments, it is shown that the search ability of the NSGA-II can be improved by those tricks.
22.2. Permutation Flowshop Scheduling Problems

In this section, we briefly explain permutation flowshop scheduling. For details of various scheduling problems, see Brucker40. Let us assume that we have n jobs {J1, J2, ..., Jn} that are processed on m machines {M1, M2, ..., Mm} in the same order. We also have an n × m matrix whose (i, j) element is the processing time of the i-th job on the j-th machine, and an n-dimensional vector whose i-th element di is the due date of the i-th job. A single-objective permutation flowshop scheduling problem is to obtain an optimal permutation of {J1, J2, ..., Jn} with respect to a single scheduling criterion. In this chapter, we consider the maximum completion time (i.e., makespan) and the maximum tardiness. Let Ci be the completion time of the i-th job at the last machine (i.e., the m-th machine), which is calculated from the n × m matrix as shown in Fig. 1. The makespan is defined as max{Ci | i = 1, 2, ..., n}, while the maximum tardiness is defined as max{max{(Ci - di), 0} | i = 1, 2, ..., n}.
Fig. 22.1. A schedule of four jobs on three machines. In this figure, the four jobs are processed on the three machines in the order of Job 1, Job 2, Job 3 and Job 4. The length of each rectangle shows the processing time of the corresponding job on the corresponding machine. The completion time Ci of the i-th job is the right limit of the corresponding rectangle at the last machine (i.e., the third machine).
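The completion-time recurrence behind Fig. 22.1 can be written directly: a job starts on machine k only after it finishes on machine k-1 and after the preceding job leaves machine k. The sketch below (function names are illustrative, not from the chapter) computes the Ci values, the makespan, and the maximum tardiness:

```python
def completion_times(order, proc):
    """Completion time of each job (in processing order) on the last
    machine; proc[j][k] is the processing time of job j on machine k."""
    m = len(proc[0])
    prev = [0] * m      # completion times of the preceding job on each machine
    result = []
    for j in order:
        t = 0
        row = []
        for k in range(m):
            t = max(t, prev[k]) + proc[j][k]
            row.append(t)
        prev = row
        result.append(row[-1])
    return result

def makespan(order, proc):
    return max(completion_times(order, proc))

def max_tardiness(order, proc, due):
    return max(max(c - due[j], 0)
               for j, c in zip(order, completion_times(order, proc)))

proc = [[2, 3], [1, 2]]     # a tiny 2-job, 2-machine example
assert makespan([0, 1], proc) == 7
assert makespan([1, 0], proc) == 6
```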
When our aim is to minimize the makespan, the optimal schedule for the four jobs on the three machines in Fig. 1 is obtained as shown in Fig. 2. Since all permutations of the given n jobs are feasible schedules, the total number of possible solutions (i.e., the size of the search space) is n!. When n is small (e.g., n < 10), the optimal schedule can easily be obtained by examining all permutations. On the other hand, it is impractical to try to find the optimal schedule of a large problem with many jobs. Thus metaheuristic approaches such as simulated annealing, taboo search and genetic algorithms, as well as various heuristic approaches, have been proposed for permutation flowshop
Fig. 22.2. The optimal schedule of the four jobs in Fig. 1 with respect to the minimization of the makespan.
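For small n, the exhaustive search mentioned above is straightforward to sketch; the helper below evaluates the makespan recurrence of Fig. 22.1 and both function names are illustrative assumptions:

```python
from itertools import permutations

def last_machine_makespan(order, proc):
    """Makespan of a permutation schedule; proc[j][k] is the processing
    time of job j on machine k."""
    prev = [0] * len(proc[0])
    for j in order:
        t = 0
        for k, p in enumerate(proc[j]):
            t = max(t, prev[k]) + p     # wait for machine k-1 and for machine k
            prev[k] = t
    return prev[-1]

def brute_force_schedule(proc):
    """Examine all n! permutations and return one minimizing the makespan."""
    return min(permutations(range(len(proc))),
               key=lambda order: last_machine_makespan(order, proc))

best = brute_force_schedule([[2, 3], [1, 2]])
assert best == (1, 0)       # processing the shorter job first is optimal here
```

With n = 40 or n = 80 jobs, as in the test problems below, n! permutations cannot be examined, which motivates the genetic algorithms of the following sections.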
scheduling problems in the literature. As test problems, we use a 20-machine 40-job problem and a 20-machine 80-job problem, which were generated in our former study25. The processing time of each job on each machine was specified as a random integer in the interval [1, 99]. The due date of each job was specified by adding a random integer in the interval [-100, 100] to its actual completion time in a randomly generated schedule. In this chapter, the makespan and the maximum tardiness are separately optimized by single-objective genetic algorithms, while they are simultaneously optimized by multi-objective genetic algorithms. This means that we have four single-objective and two two-objective test problems.

22.3. Single-Objective Genetic Algorithms

In this section, we discuss several issues related to the implementation of genetic algorithms for single-objective permutation flowshop scheduling problems. Multi-objective genetic algorithms are discussed in the next section.

22.3.1. Implementation of Genetic Algorithms
We use the order-based coding where a permutation of the given n jobs is directly handled as a chromosome (i.e., individual) in genetic algorithms. We examine seven crossover operations: three versions of one-point order crossover, three versions of two-point order crossover, and one version of uniform order crossover. The one-point order crossover is illustrated in Fig. 3 where one parent is divided into two parts by a randomly chosen cutting point. The left-hand side of the parent is inherited to the offspring with no changes in Version 1 as shown in Fig. 3 (a). The remaining jobs are placed into the remaining positions of the offspring in the order of those jobs in
the other parent. On the other hand, the right-hand side of one parent is inherited in Version 2 as shown in Fig. 3 (b). Version 3 uses Version 1 in Fig. 3 (a) and Version 2 in Fig. 3 (b) with the same probability (i.e., the probability of 0.5) when this crossover operation is invoked.
Fig. 22.3. One-point order crossover. The left-hand side is inherited to the offspring in Version 1 in (a) while the right-hand side is inherited in Version 2 in (b). In Version 3, these two crossover operations are used with the same probability.
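Version 1 of the one-point order crossover in Fig. 22.3 (a) can be sketched as follows; with the cutting point after the third job, it reproduces the offspring shown in the figure. In the genetic algorithm the cut position would be drawn at random; it is passed explicitly here for reproducibility, and the function name is illustrative:

```python
def one_point_order_crossover_v1(parent1, parent2, cut):
    """Keep parent1's left-hand side up to the cutting point; fill the
    remaining positions with the leftover jobs in parent2's order."""
    head = parent1[:cut]
    return head + [job for job in parent2 if job not in head]

child = one_point_order_crossover_v1([1, 2, 3, 4, 5, 6, 7],
                                     [3, 6, 4, 2, 7, 5, 1], cut=3)
assert child == [1, 2, 3, 6, 4, 7, 5]   # the offspring shown in Fig. 22.3 (a)
```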
The two-point order crossover has two cutting points as shown in Fig. 4. The outer parts of one parent are inherited to the offspring as shown in Fig. 4 (a) in Version 1 of the two-point order crossover. On the other hand, the inner part is inherited in Version 2 as shown in Fig. 4 (b). In Version 3, Version 1 and Version 2 are used with the same probability. By increasing the number of cutting points, we may have the uniform order crossover in Fig. 5 where each job in one parent is inherited to the offspring with the probability of 0.5. The remaining jobs are placed into the remaining positions of the offspring in the order of those jobs in the other parent.
Fig. 22.4. Two-point order crossover. The outer parts are inherited to the offspring in Version 1 in (a) while the inner part is inherited in Version 2 in (b). Version 3 uses Version 1 and Version 2 with the same probability (i.e., the probability of 0.5).
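Version 1 of the two-point order crossover keeps the outer parts of one parent; a sketch, again with explicit cutting points for reproducibility (function name illustrative). With cuts after the first and fourth positions it reproduces the offspring of Fig. 22.4 (a):

```python
def two_point_order_crossover_v1(parent1, parent2, cut1, cut2):
    """Keep parent1's jobs outside [cut1, cut2); fill the inner section
    with the remaining jobs in the order they appear in parent2."""
    outer = parent1[:cut1] + parent1[cut2:]
    inner = [job for job in parent2 if job not in outer]
    return parent1[:cut1] + inner + parent1[cut2:]

child = two_point_order_crossover_v1([1, 2, 3, 4, 5, 6, 7],
                                     [3, 6, 4, 2, 7, 5, 1], cut1=1, cut2=4)
assert child == [1, 3, 4, 2, 5, 6, 7]   # the offspring shown in Fig. 22.4 (a)
```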
Fig. 22.5. Uniform order crossover. Each job in one parent is inherited to the same position of the offspring with the probability of 0.5. The remaining jobs are placed into the remaining positions in the order of those jobs in the other parent.
While other crossover operations (e.g., edge recombination41, enhanced edge recombination42, partially matched43, cycle44, precedence preservation45 and one segment46) were examined in comparative studies9,24, it was reported that better results were obtained by the order crossover for permutation flowshop scheduling. Thus we examine the above-mentioned seven versions of the order crossover. It should be noted that the order crossover operations are not suitable for traveling salesman problems, while they work well for permutation flowshop scheduling.

We also examine four mutation operations shown in Fig. 6: adjacent two-job change (i.e., switch), arbitrary two-job change (i.e., swap), arbitrary three-job change, and insertion (i.e., shift). It should be noted that the arbitrary two-job change and the insertion include the adjacent two-job change. The arbitrary three-job change does not include the adjacent or arbitrary two-job change as its special case.

Fig. 22.6. Four mutation operations examined in this chapter: (a) adjacent two-job change; (b) arbitrary two-job change; (c) arbitrary three-job change; (d) insertion.

For selecting a pair of parents from the current population, we use the standard binary tournament selection. First, two individuals are randomly chosen from the current population with replacement. Next the better one is chosen as a parent. The other parent is also chosen from the current population in the same manner. One of the above-mentioned seven crossover operations is applied to the pair of selected parents with a pre-specified crossover probability for generating an offspring. When the crossover operation is not applied, one of the two parents is randomly chosen. Then one of the above-mentioned four mutation operations is applied to the newly generated offspring (or to the randomly chosen parent when the crossover operation is not applied) with a pre-specified mutation probability. These genetic operations (i.e., selection, crossover and mutation) are iterated for generating a pre-specified number of offspring.

In our implementation of a single-objective genetic algorithm, we generate Npop offspring, where Npop is the population size (i.e., the number of strings in each population). The next population is constructed by choosing the best Npop strings from the current population with Npop strings and the offspring population with Npop strings. This generation update scheme is similar to that of the elitist non-dominated sorting genetic algorithm (NSGA-II37). In this generation update scheme, the number of elite solutions can be viewed as the population size (i.e., all strings in each population can be viewed as elite solutions). We also examine the standard generation update scheme with a single elite solution.

22.3.2. Comparison of Various Genetic Operations
Through computational experiments, we examine the performance of the seven crossover operations and the four mutation operations. We use the following parameter specification in our single-objective genetic algorithm with Npop elite solutions:

Population size (Npop): 100,
Crossover probability: 1.0,
Mutation probability: 1.0,
Stopping condition: evaluation of 100,000 solutions.

The performance of each crossover operation is examined by applying
our genetic algorithm to each of the four single-objective test problems 100 times. In this computational experiment, we use the insertion mutation. The average value of the makespan is shown in Table 1, together with the corresponding standard deviation in parentheses. The best (i.e., smallest) average value in each column is highlighted by boldface in Table 1. From this table, we can see that Version 2 of the one-point order crossover and Version 1 of the two-point order crossover work well for all four test problems. This observation suggests that the utilization of the right-hand side of a string as a building block (i.e., the inheritance of jobs processed later in a schedule) is important in the implementation of efficient genetic algorithms. This is also supported by the poor performance of Version 1 of the one-point order crossover and Version 2 of the two-point order crossover on the 80-job maximum tardiness minimization problem. We can also see from Table 1 that the uniform order crossover does not work well for the makespan minimization problems.

Table 22.88. Comparison of the seven crossover operations on the four single-objective test problems.
Crossover operation        Makespan                    Maximum tardiness
                        40-job        80-job         40-job       80-job
One-point, Version 1  3336 (13.3)  5490 (18.7)    87 (54.9)   291 (67.2)
One-point, Version 2  3335 (13.1)  5487 (17.4)    76 (47.0)   207 (85.7)
One-point, Version 3  3332 (11.9)  5484 (17.2)    75 (48.0)   231 (76.1)
Two-point, Version 1  3333 (12.9)  5484 (15.4)    75 (49.2)   213 (74.3)
Two-point, Version 2  3338 (17.2)  5490 (18.2)    78 (52.9)   252 (74.1)
Two-point, Version 3  3334 (14.5)  5487 (20.0)    82 (50.9)   222 (75.9)
Uniform               3347 (17.1)  5515 (23.7)    74 (43.8)   212 (64.8)
In the same manner as in Table 1, we examine the performance of each mutation operation. We use Version 1 of the two-point order crossover in this computational experiment. Experimental results are summarized in Table 2. From this table, we can see that the best results are obtained by the insertion mutation for all four test problems. From the comparison between Table 1 and Table 2, we can see that the choice of a mutation operation has a much larger effect on the performance of our genetic algorithm than the choice of a crossover operation. Based on our experimental results in Table 1 and Table 2, we decide to use Version 1 of the two-point order crossover and the insertion mutation in this chapter. We also compare the two generation update schemes with each other:
Single-Objective and Multi-Objective Evolutionary Flowshop Scheduling
537
Table 22.89. Comparison of the four mutation operations on the four single-objective test problems.

Mutation operation     Makespan                     Maximum tardiness
                       40-job        80-job         40-job        80-job
Adjacent two-job       3490 (38.1)   5787 (65.4)    723 (143.9)   1962 (263)
Arbitrary two-job      3354 (19.6)   5515 (25.9)    135 (42.8)     336 (82.5)
Arbitrary three-job    3434 (29.2)   5616 (36.5)    206 (48.1)     700 (143.6)
Insertion              3333 (12.9)   5484 (15.4)     75 (49.2)     213 (74.3)
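The insertion mutation, which gives the best results in Table 2, removes one job from the permutation and reinserts it at a different position. A minimal sketch:

```python
import random

def insertion_mutation(perm):
    """Insertion mutation: remove one randomly chosen job and reinsert it
    at a different randomly chosen position."""
    perm = list(perm)
    n = len(perm)
    src = random.randrange(n)
    job = perm.pop(src)
    dst = random.randrange(n - 1)
    if dst >= src:          # skip the original position so the job moves
        dst += 1
    perm.insert(dst, job)
    return perm
```

The adjustment of `dst` guarantees that the mutated schedule always differs from its parent.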
One genetic algorithm used in the above computational experiments has Npop elite solutions while the other has a single elite solution. Since the appropriate specifications of the crossover and mutation probabilities are different between these two algorithms, we examine 10 × 10 combinations of the crossover probability PC and the mutation probability PM: PC = 0.1, 0.2, ..., 1.0 and PM = 0.1, 0.2, ..., 1.0. Using each combination, each genetic algorithm is applied to each test problem 20 times. Experimental results on the 80-job makespan minimization problem are shown in Fig. 7. From this figure, we can see that better results are obtained by the genetic algorithm with Npop elite solutions than by that with a single elite solution. When the number of elite solutions is small (e.g., a single elite solution), high mutation probabilities lead to poor search ability, as shown in Fig. 7 (b). On the other hand, with many elite solutions, the higher the mutation probability is, the higher the search ability is in Fig. 7 (a). Similar observations are obtained from computational experiments on the 80-job maximum tardiness minimization problem in Fig. 8. In Table 3, we summarize the best result by each algorithm for each test problem over the 100 combinations of the crossover and mutation probabilities. Table 3 shows the average result over 20 runs with the best combination of these two parameters. From this table, we can see that much better results are obtained by the single-objective genetic algorithm with Npop elite solutions. In general, a large number of elite solutions have a negative effect on the diversity of solutions while they have a positive effect on the convergence speed of solutions. This negative effect is observed in Fig. 7 and Fig. 8 when the mutation probability PM is small. In Fig. 9, we show the distribution of the values of the makespan of 1,000,000 randomly generated schedules for the 80-job problem together with the best (i.e., smallest) value of the makespan obtained in the above computational experiments using the genetic algorithms. From this figure, we can see that the obtained
H. Ishibuchi and Y. Shibata
Fig. 22.7. Comparison between the two generation update schemes on the 80-job test problem with the objective of minimizing the makespan.
Fig. 22.8. Comparison between the two generation update schemes on the 80-job test problem with the objective of minimizing the maximum tardiness.
best solution is far from randomly generated schedules. This means that the convergence speed to the optimal solution is very important. As a result, the positive effect of a large number of elite solutions on the convergence speed overwhelms their negative effect on the diversity of solutions in a wide range of parameter values in Fig. 7 and Fig. 8. In this chapter, we use the generation update scheme with Npop elite solutions.
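The two generation update schemes can be sketched as follows. Minimization is assumed, and the exact replacement rule of the single-elite scheme is our reading of the description (the best solution found so far is preserved while the offspring replace the rest):

```python
def update_npop_elites(parents, offspring, n_pop, cost):
    """Npop-elite scheme: merge parents and offspring and keep the best
    n_pop solutions ((mu + lambda)-style survival)."""
    return sorted(parents + offspring, key=cost)[:n_pop]

def update_single_elite(parents, offspring, cost):
    """Single-elite scheme (our interpretation): the offspring replace the
    population, except that the worst offspring is overwritten by the best
    solution of the merged populations."""
    elite = min(parents + offspring, key=cost)
    worst = max(range(len(offspring)), key=lambda i: cost(offspring[i]))
    next_pop = list(offspring)
    next_pop[worst] = elite
    return next_pop
```

With many elite solutions, selection pressure stays high even under an aggressive mutation probability, which matches the behavior observed in Fig. 7 (a).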
Table 22.90. Comparison of the two generation update schemes on the four single-objective test problems using the best combination of the crossover and mutation probabilities for each algorithm and each test problem.

Algorithm                Makespan                     Maximum tardiness
                         40-job        80-job         40-job       80-job
Npop elite solutions     3330 (13.1)   5479 (11.3)    49 (36.2)    185 (60.0)
Single elite solution    3354 (16.8)   5503 (22.9)    80 (41.7)    230 (44.0)
Fig. 22.9. Distribution of the values of the makespan of 1,000,000 randomly generated schedules of the 80-job test problem. The best (i.e., smallest) value of the makespan obtained in our computational experiments using the genetic algorithms is also shown.
22.3.3. Performance Evaluation of Genetic Algorithms

In some comparative studies9,10, it was reported that genetic algorithms were outperformed by local search methods (e.g., simulated annealing, tabu search and multi-start local search) in their applications to permutation flowshop scheduling. In this subsection, we compare our single-objective genetic algorithm with a multi-start local search algorithm where local search is repeated from randomly generated initial solutions. In local search, we use the insertion mutation for generating neighboring solutions. Thus the size of the neighborhood structure is (n-1)² for n-job permutation flowshop scheduling problems (i.e., 1521 for the 40-job test problem and 6241 for the 80-job test problem). We use the first-move strategy in local search. That is, neighboring solutions are examined in a random order and the current solution is replaced with the first solution that improves the current one. As a stopping condition of local search, we use a parameter Lfails. When Lfails solutions have already been examined in the neighborhood of the current solution without any improvement (i.e., when local moves have successively failed Lfails times), local search is terminated. In this case, an initial solution is randomly generated for restarting local search. We compare our genetic algorithms with the multi-start local search algorithm under the same computation load (i.e., the examination of 100,000 solutions). We examine the multi-start local search algorithm using various values of Lfails: Lfails = 10, 20, 50, 100, 200, 500, 1000. Using each value of Lfails, the multi-start local search algorithm is applied to each test problem 100 times. Average results over the 100 runs are summarized in Table 4, where the experimental results of our genetic algorithms are cited from Table 3. We can see from Table 4 that our genetic algorithms are outperformed by the multi-start local search algorithm except for the 40-job makespan minimization problem.

Table 22.91. Comparison between the multi-start local search algorithm and our genetic algorithms. The best result over the 100 combinations of the crossover and mutation probabilities is cited from Table 3 as the result of our genetic algorithms for each test problem.
Lfails               Makespan                     Maximum tardiness
                     40-job        80-job         40-job        80-job
10                   3532 (19.3)   5876 (26.5)    556 (54.0)    1513 (169)
20                   3477 (15.7)   5752 (28.0)    294 (31.9)     787 (97.8)
50                   3419 (13.3)   5633 (23.0)    139 (13.0)     330 (34.1)
100                  3387 (11.9)   5551 (22.7)     88 (19.3)     218 (46.7)
200                  3363 (10.1)   5496 (15.6)     35 (20.1)     176 (60.2)
500                  3345 (11.8)   5475 (10.6)     27 (16.0)     171 (61.8)
1000                 3338 (12.2)   5475 (12.0)     27 (16.0)     171 (61.8)
GA (Npop elites)     3330 (13.1)   5479 (11.3)     49 (36.2)     185 (60.0)
GA (single elite)    3354 (16.8)   5503 (22.9)     80 (41.7)     230 (44.0)
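The multi-start local search procedure can be sketched as follows. Here `cost` stands for the objective (e.g. makespan), and for brevity the sketch samples insertion moves at random with replacement rather than enumerating the whole (n-1)² neighborhood once per move:

```python
import random

def multi_start_local_search(cost, n_jobs, max_evals, l_fails):
    """Repeat first-move local search in the insertion neighborhood from
    random initial schedules until max_evals solutions have been examined.
    Local search restarts after l_fails consecutive failed moves."""
    best, best_cost, evals = None, float("inf"), 0
    while evals < max_evals:
        current = random.sample(range(n_jobs), n_jobs)   # random restart
        current_cost = cost(current)
        evals += 1
        fails = 0
        while fails < l_fails and evals < max_evals:
            i, j = random.sample(range(n_jobs), 2)       # i != j
            neighbor = list(current)
            neighbor.insert(j, neighbor.pop(i))          # insertion move
            neighbor_cost = cost(neighbor)
            evals += 1
            if neighbor_cost < current_cost:             # first improving move
                current, current_cost, fails = neighbor, neighbor_cost, 0
            else:
                fails += 1
        if current_cost < best_cost:
            best, best_cost = current, current_cost
    return best, best_cost
```

Small values of Lfails cause frequent restarts (broad but shallow search), while large values let each descent run close to a local optimum, which is why the results in Table 4 improve as Lfails grows.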
In some comparative studies9,10, it was also reported that very good results were obtained by genetic local search algorithms (i.e., hybrid algorithms of genetic algorithms and local search). Thus we implement a genetic local search algorithm based on our genetic algorithm with Npop elite solutions. We use the same local search procedure as in the above-mentioned multi-start local search algorithm. The local search procedure is applied to each offspring generated by the genetic operations with a pre-specified local search application probability PLS. We examine 10 × 11 combinations of the local search termination parameter Lfails and the local search application probability PLS: Lfails = 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000 and PLS = 0, 0.001, 0.002, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1.0. It should be noted
that the genetic local search algorithm with PLS = 0 is the same as our genetic algorithm. Using each combination of Lfails and PLS, the genetic local search algorithm is applied to each test problem 20 times using the evaluation of 100,000 solutions as the stopping condition. Experimental results are summarized in Fig. 10 (a) for the 80-job makespan minimization problem and Fig. 10 (b) for the 80-job maximum tardiness minimization problem. In this computational experiment, we use the best combination of the crossover and mutation probabilities found in the previous computational experiments for our genetic algorithm on each test problem. That is, these parameters are not tuned for the genetic local search algorithm but for the genetic algorithm (i.e., for the case of PLS = 0 in Fig. 10). From Fig. 10, we can see that the hybridization with local search improves the performance of our genetic algorithm when Lfails and PLS are appropriately specified. The genetic local search algorithm also outperforms the multi-start local search algorithm. For example, the best average result for the 80-job maximum tardiness minimization problem is 171 by the multi-start local search algorithm in Table 4 while it is 151 by the genetic local search algorithm in Fig. 10. Similar results are obtained for the two 40-job test problems as shown in Fig. 11. Moreover, the hybridization with local search has a positive effect on the efficiency of genetic algorithms. In Fig. 12, we show the average CPU time for each combination of Lfails and PLS corresponding to Fig. 11. From Fig. 12, we can see that the hybridization with local search decreases the CPU time of genetic algorithms.
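The hybrid step can be sketched as follows: each offspring produced by the genetic operators undergoes local search with probability PLS, and each local search run stops after Lfails consecutive failed neighbor evaluations (function and variable names here are ours):

```python
import random

def insertion_local_search(perm, cost, l_fails):
    """First-move local search in the insertion neighborhood, terminated
    after l_fails consecutive non-improving neighbor evaluations."""
    current, current_cost, fails = list(perm), cost(perm), 0
    while fails < l_fails:
        i, j = random.sample(range(len(current)), 2)
        neighbor = list(current)
        neighbor.insert(j, neighbor.pop(i))     # insertion move
        neighbor_cost = cost(neighbor)
        if neighbor_cost < current_cost:
            current, current_cost, fails = neighbor, neighbor_cost, 0
        else:
            fails += 1
    return current

def genetic_local_search_step(offspring, cost, p_ls, l_fails):
    """Apply local search to each offspring with probability p_ls."""
    return [insertion_local_search(s, cost, l_fails)
            if random.random() < p_ls else s for s in offspring]
```

Since only improving moves are accepted, a refined offspring is never worse than the one produced by the genetic operators alone; the trade-off is the extra solution evaluations charged against the 100,000-evaluation budget.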
22.4. Multi-Objective Genetic Algorithms

While genetic algorithms are often outperformed by local search in their applications to single-objective permutation flowshop scheduling, they have an inherent advantage in their applications to multi-objective problems. That is, a large number of various non-dominated solutions can be simultaneously obtained in a single run, whereas a single run of other search algorithms usually yields a single solution. In this chapter, we use the elitist non-dominated sorting genetic algorithm (NSGA-II37) because its high search ability has been frequently reported in the literature and its implementation is relatively easy. Of course, other EMO (evolutionary multi-objective optimization) algorithms can also be applied to multi-objective permutation flowshop scheduling problems.
(a) Minimization of the makespan.
(b) Minimization of the maximum tardiness.
Fig. 22.10. Performance of the genetic local search algorithm on the two 80-job test problems. Experimental results with PLS = 0 show the performance of the non-hybrid genetic algorithm.
(a) Minimization of the makespan.
(b) Minimization of the maximum tardiness.
Fig. 22.11. Performance of the genetic local search algorithm on the two 40-job test problems. Experimental results with PLS = 0 show the performance of the non-hybrid genetic algorithm.
22.4.1. NSGA-II Algorithm

The basic framework of the NSGA-II is the same as that of our single-objective genetic algorithm in the previous section. This is because we used the same framework as the NSGA-II when we implemented our single-objective genetic algorithm for comparison. In the NSGA-II, Npop offspring are generated from the current population with Npop solutions. Then the best Npop
(a) Minimization of the makespan.
(b) Minimization of the maximum tardiness.
Fig. 22.12. Average CPU time of the genetic local search algorithm with each combination of PLS and Lfails for the two 40-job test problems.
solutions are chosen from the current and offspring populations for constructing the next population. The point is how to evaluate each solution because multiple objectives are involved. The NSGA-II uses the Pareto dominance relation and the concept of crowding for evaluating each solution. When the next population is constructed by choosing the best Npop solutions, first the current and offspring populations are merged to form a tentative population. Then a rank is assigned to each solution in the tentative population using the concept of Pareto ranking. That is, the first rank is assigned to all the non-dominated solutions in the tentative population. All solutions with the first rank are removed from the tentative population and added to the next population. The second rank is assigned to all the non-dominated solutions in the reduced tentative population. All solutions with the second rank are removed from the reduced tentative population and added to the next population. In this manner, good solutions with respect to multiple objectives are chosen and added to the next population. If the number of solutions in the next population exceeds the pre-specified population size (i.e., Npop), solutions with the worst rank in the next population are sorted using the concept of crowding. The crowding measure of each solution is calculated as the sum of the distances from its adjacent solutions with the same rank. More specifically, two adjacent solutions of each solution are identified with respect to each objective. Then the distance between those adjacent solutions is calculated on each objective and summed up over all the objectives to obtain the crowding measure. To each extreme solution with the maximum or minimum value of at least one objective among the same-rank solutions, an infinitely large value is assigned as the crowding measure because one of its two adjacent solutions cannot be identified. Solutions with larger values of the crowding measure are viewed as being better because they are not located in crowded regions of the objective space. Solutions with the worst rank are removed from the next population in increasing order of the crowding measure until the number of remaining solutions in the next population becomes the pre-specified population size. When a pair of parent solutions is to be selected from the current population by the binary tournament selection, each solution is also evaluated in the same manner (i.e., using its rank as the primary criterion and the crowding measure as the secondary criterion).

22.4.2. Performance Evaluation of the NSGA-II Algorithm
We apply the NSGA-II to the two-objective 40-job and 80-job test problems in Section 2 using the same parameter specification as in Section 3 (i.e., population size: 100, crossover probability: 1.0, mutation probability: 1.0, stopping condition: evaluation of 100,000 solutions). In Fig. 13, we show experimental results of a single run of the NSGA-II on each test problem. Each figure shows the initial population, an intermediate population at the 50th generation and the final population at the 1000th generation. From Fig. 13, we can see that the NSGA-II simultaneously minimizes both objectives while maintaining the diversity of solutions. For evaluating the performance of the NSGA-II, we compare solution sets obtained by the NSGA-II with those obtained by multiple runs of our single-objective genetic algorithm with Npop elite solutions. More specifically, we apply the NSGA-II to each two-objective test problem 10 times. From this computational experiment, 10 solution sets are obtained. Our single-objective genetic algorithm is also applied to each of the corresponding single-objective test problems 10 times. From this computational experiment, 20 solutions are obtained (10 solutions from each single-objective test problem). Obtained solutions are shown in Fig. 14. From this figure, we can see that a variety of solutions cannot be obtained by multiple runs of our single-objective genetic algorithm. We can also see that better results are obtained by our single-objective genetic algorithm if we consider only
(a) 40-job test problem.
(b) 80-job test problem.
Fig. 22.13. The initial population, an intermediate population at the 50th generation and the final population at the 1000th generation in a single run of the NSGA-II on the two-objective 40-job and 80-job test problems.
(a) 40-job test problem.
(b) 80-job test problem.
Fig. 22.14. Comparison between solutions obtained by the NSGA-II (small closed circles) and those by our single-objective genetic algorithm (open circles).

a single objective. We also use the following weighted scalar objective function in our single-objective genetic algorithm:

    f(x) = w1 × f1(x) + w2 × f2(x),    (1)

where x denotes a solution, f1(x) is the makespan, f2(x) is the maximum
tardiness, and w1 and w2 are non-negative weights. For simultaneously minimizing both objectives, we specify the weight vector w = (w1, w2) as w = (0.5, 0.5). Our single-objective genetic algorithm is applied to each two-objective test problem 10 times for minimizing the weighted scalar objective function in (1) with w = (0.5, 0.5). In this computational experiment, we use the same stopping condition as the NSGA-II: evaluation of 100,000 solutions. Experimental results are summarized in Fig. 15. From this figure, we can see that a variety of solutions cannot be obtained by our single-objective genetic algorithm for minimizing the weighted scalar objective function with the fixed weight vector. We can also see that our single-objective genetic algorithm slightly outperforms the NSGA-II when our objective is to minimize the weighted scalar objective function with the fixed weight vector w = (0.5, 0.5).
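Both objectives follow from the standard permutation-flowshop completion-time recurrence, so the weighted scalar objective in (1) can be evaluated as below. The names `ptime[i][j]` (processing time of job j on machine i) and `due[j]` (due date of job j) are ours, not the chapter's:

```python
def evaluate_schedule(order, ptime, due):
    """Compute makespan f1 and maximum tardiness f2 of a job permutation
    on an m-machine permutation flowshop using the recurrence
    C[i][j] = max(C[i-1][j], C[i][j-1]) + p[i][j]."""
    m = len(ptime)
    comp = [0.0] * m                      # completion time on each machine
    tard = 0.0
    for j in order:
        for i in range(m):
            prev = comp[i - 1] if i > 0 else 0.0
            comp[i] = max(comp[i], prev) + ptime[i][j]
        tard = max(tard, comp[-1] - due[j])
    return comp[-1], max(tard, 0.0)

def weighted_scalar(x, ptime, due, w=(0.5, 0.5)):
    """Weighted scalar objective f(x) = w1*f1(x) + w2*f2(x) of Eq. (1)."""
    f1, f2 = evaluate_schedule(x, ptime, due)
    return w[0] * f1 + w[1] * f2
```

For example, with `ptime = [[3, 2], [2, 4]]` and `due = [5, 10]`, the schedule `[0, 1]` has makespan 9 and maximum tardiness 0, so f = 4.5 under w = (0.5, 0.5).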
(a) 40-job test problem.
(b) 80-job test problem.
Fig. 22.15. Comparison between solutions obtained by the NSGA-II (small closed circles) and those by our single-objective genetic algorithm for minimizing the weighted scalar objective function (open circles).
For finding a variety of solutions by our single-objective genetic algorithm, we use the following five weight vectors in the weighted scalar objective function: w = (1, 0), (0.75, 0.25), (0.5, 0.5), (0.25, 0.75), (0, 1). Our single-objective genetic algorithm is applied to each test problem for minimizing the weighted scalar objective function with each weight vector. For comparing the NSGA-II with our single-objective genetic algorithm under the same computation load, we use the evaluation of 20,000 solutions (i.e.,
1/5 of 100,000 solutions in the case of the NSGA-II) as the stopping condition of our single-objective genetic algorithm. This is because our single-objective genetic algorithm is applied to each test problem five times for obtaining five solutions, each of which corresponds to the minimization of the weighted scalar objective function with one weight vector. Five solutions are obtained by our single-objective genetic algorithm. Those solutions are compared with a solution set obtained by a single run of the NSGA-II in Fig. 16. From this figure, we can see that the quality of the solutions obtained by our single-objective genetic algorithm is not high because the available computation load for each of its runs is specified as 1/5 of that of the NSGA-II in order to allow the multiple runs needed to obtain multiple solutions. On the other hand, a large number of non-dominated solutions can be obtained by a single run of the NSGA-II.
(a) 40-job test problem.
(b) 80-job test problem.
Fig. 22.16. Comparison between solutions obtained by the NSGA-II (small closed circles) and those by multiple runs of our single-objective genetic algorithm for minimizing the weighted scalar objective function with various weight values (open circles).
We also perform the same computational experiment as Fig. 16 using the multi-start local search algorithm with Lfails = 1000 in Section 3. Experimental results are shown in Fig. 17. In contrast to Fig. 16, we cannot say that the NSGA-II outperforms multiple runs of the multi-start local search algorithm in Fig. 17. This is because the multi-start local search algorithm is more efficient than genetic algorithms for single-objective permutation flowshop scheduling. Advantages of multi-objective genetic algorithms over
local search, however, become clearer when we apply them to permutation flowshop scheduling problems with many objectives. This is because local search should be executed many times in order to obtain a variety of nondominated solutions in high-dimensional objective spaces (i.e., because we cannot use long CPU time for a single run of local search). On the other hand, a large number of non-dominated solutions can be obtained by a single run of multi-objective genetic algorithms.
(a) 40-job test problem.
(b) 80-job test problem.
Fig. 22.17. Comparison between solutions obtained by the NSGA-II (small closed circles) and those by multiple runs of the multi-start local search algorithm for minimizing the weighted scalar objective function with various weight values (open circles).
22.4.3. Extensions to Multi-Objective Genetic Algorithms

In the design of evolutionary multi-objective optimization (EMO) algorithms, there exist two conflicting requirements: one is to increase the convergence speed to the Pareto front and the other is to increase the diversity of solutions. We demonstrate that the choice of parent solutions has positive and negative effects on these two requirements. We also demonstrate the effect of the hybridization with local search on the performance of EMO algorithms. We use a similarity-based mating scheme47, which is illustrated in Fig. 18. First, α candidates are chosen by iterating the binary tournament selection α times. Then the most extreme solution among them is selected as one parent (say, Parent A). This selection is based on the distance from each
candidate in the objective space to the average vector of the α candidates. For choosing the other parent (say, Parent B), β candidates are chosen by iterating the binary tournament selection β times. Then the candidate most similar to Parent A is selected as Parent B. This selection is based on the distance in the objective space from Parent A to each candidate.
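The scheme can be sketched as follows, with `tournament` standing for one binary tournament selection and `objectives` returning a solution's objective vector (both names are ours):

```python
import math
import random

def similarity_based_mating(population, objectives, alpha, beta, tournament):
    """Parent A: of alpha tournament winners, the one farthest from their
    average objective vector (the most extreme candidate). Parent B: of
    beta tournament winners, the one closest to Parent A in objective space."""
    cand_a = [tournament(population) for _ in range(alpha)]
    n_obj = len(objectives(cand_a[0]))
    avg = [sum(objectives(c)[k] for c in cand_a) / alpha for k in range(n_obj)]
    parent_a = max(cand_a, key=lambda c: math.dist(objectives(c), avg))

    cand_b = [tournament(population) for _ in range(beta)]
    parent_b = min(cand_b, key=lambda c: math.dist(objectives(c),
                                                   objectives(parent_a)))
    return parent_a, parent_b
```

With (α, β) = (1, 1) this reduces to the standard binary tournament selection, which matches the remark below about Fig. 19.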
Fig. 22.18. Similarity-based mating scheme.
In the similarity-based mating scheme, the diversity of solutions is increased by the use of a large value of α while the convergence speed to the Pareto front is increased by a large value of β. These effects are demonstrated in Fig. 19, where we depict a solution set obtained for the 40-job test problem by a single run of the NSGA-II using the similarity-based mating scheme with each combination of α and β. For comparison, experimental results by the original NSGA-II are also depicted in Fig. 19. It should be noted that the similarity-based mating scheme with (α, β) = (1, 1) is exactly the same as the standard binary tournament selection. Fig. 19 (a) shows two extreme cases (i.e., (α, β) = (10, 1), (1, 10)) where we can observe the above-mentioned effects of the similarity-based mating scheme. On the other hand, α and β are appropriately specified in Fig. 19 (b), where the convergence speed to the Pareto front is improved without degrading the diversity of obtained solutions. The efficiency of evolutionary multi-objective optimization (EMO) algorithms can be improved by the hybridization with local search19,25,39. The hybridization, however, is not straightforward compared with the case of single-objective optimization. This is because local search is a single-
(a) Extreme parameter specifications.
(b) Appropriate parameter specifications.
Fig. 22.19. Solution sets obtained for the 40-job test problem by the NSGA-II with the similarity-based mating scheme.
objective optimization technique. We implement a hybrid EMO algorithm by combining local search with the NSGA-II in the following manner. As we have already explained, an offspring population with Npop solutions is generated from the current (i.e., parent) population with Npop solutions in the NSGA-II. An initial solution for local search is chosen from the offspring population using the binary tournament selection with replacement. This selection is based on the weighted scalar objective function in (1), where the weight vector w = (w1, w2) is randomly specified whenever an initial solution is to be chosen. Then local search is applied to a copy of the selected initial solution for improving the weighted scalar objective function with the current weight vector in the same manner as in the previous subsection. The execution of local search is terminated based on the termination parameter Lfails. When the initial solution is improved by local search, the improved solution is added to the offspring population. The selection of an initial solution and the application of local search to the selected initial solution are iterated Npop × PLS times, where PLS is the local search application probability. The next population is constructed from the parent population and the offspring population in the same manner as in the NSGA-II. In Fig. 20, we show solution sets obtained for the 80-job test problem by a single run of our hybrid EMO algorithm with each combination of PLS and Lfails: (PLS, Lfails) = (1, 500), (0.2, 1). In the case of (PLS, Lfails) = (1, 500), almost all solutions are examined in the local search part. Actually
the number of updated generations in the EMO part is 1 in this case. As a result, the diversity of solutions is degraded by the hybridization with local search while the convergence speed is not degraded. Since local search can be executed more efficiently than genetic search, the average CPU time is decreased from 10.6 seconds for the non-hybrid NSGA-II to 7.6 seconds for our hybrid EMO algorithm with (PLS, Lfails) = (1, 500). On the other hand, a good balance between local search and genetic search is realized in Fig. 20 (b), where (PLS, Lfails) = (0.2, 1). In this case, the number of updated generations is 806.
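The key difference from the single-objective hybrid is that the local search part scalarizes the objectives with a freshly drawn random weight vector each time a starting solution is picked. A sketch of that selection step (the normalization w1 + w2 = 1 is our assumption; the chapter only requires non-negative weights):

```python
import random

def choose_ls_start(offspring, f1, f2):
    """Binary tournament selection (with replacement) on the weighted
    scalar objective of Eq. (1) with a randomly drawn weight vector."""
    w1 = random.random()
    w = (w1, 1.0 - w1)                   # random non-negative weights
    scalar = lambda s: w[0] * f1(s) + w[1] * f2(s)
    a, b = random.choice(offspring), random.choice(offspring)
    return (a if scalar(a) <= scalar(b) else b), w
```

In the full hybrid, this selection plus one local search run on a copy of the chosen solution is repeated Npop × PLS times per generation, and any improved solution is added back to the offspring population; varying the weights across iterations spreads the local search effort along the current non-dominated front.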
(a) Too much local search.
(b) Appropriate parameter specifications.
Fig. 22.20. Solution sets obtained for the 80-job test problem by our hybrid EMO algorithm and the non-hybrid original NSGA-II.
22.5. Conclusions

In this chapter, we illustrated how genetic algorithms can be applied to single-objective and multi-objective permutation flowshop scheduling. We showed through computational experiments that the order-based crossover and the insertion mutation work well for permutation flowshop scheduling. We also showed that multi-objective genetic algorithms are superior to multiple runs of single-objective optimization techniques in terms of the diversity of solutions, while single-objective genetic algorithms are inferior to single-objective local search in many cases. While we used the NSGA-II in our computational experiments, we can use other evolutionary multi-objective optimization (EMO) algorithms for permutation flowshop
scheduling. For improving the performance of those EMO algorithms, we suggested the use of a similarity-based mating scheme and the hybridization with local search.

References
1. R. A. Dudek, S. S. Panwalkar and M. L. Smith, "The lessons of flowshop scheduling research," Operations Research 40 (1992) 7-13.
2. S. M. Johnson, "Optimal two- and three-stage production schedules with setup times included," Naval Research Logistics Quarterly 1 (1954) 61-68.
3. K. R. Baker and G. D. Scudder, "Sequencing with earliness and tardiness penalties: A review," Operations Research 38 (1990) 22-36.
4. I. H. Osman and C. N. Potts, "Simulated annealing for permutation flow-shop scheduling," OMEGA 17 (1989) 551-557.
5. H. Ishibuchi, S. Misaki and H. Tanaka, "Modified simulated annealing algorithms for the flow shop sequencing problem," European Journal of Operational Research 81 (1995) 388-398.
6. E. Taillard, "Some efficient heuristic methods for the flow shop sequencing problem," European Journal of Operational Research 47 (1990) 65-74.
7. M. Ben-Daya and M. Al-Fawzan, "A tabu search approach for the flow shop scheduling problem," European Journal of Operational Research 109 (1998) 88-95.
8. C. R. Reeves, "A genetic algorithm for flowshop sequencing," Computers and Operations Research 22 (1995) 5-13.
9. T. Murata, H. Ishibuchi and H. Tanaka, "Genetic algorithms for flowshop scheduling problems," Computers and Industrial Engineering 30 (1996) 1061-1071.
10. C. A. Glass and C. N. Potts, "A comparison of local search methods for flow shop scheduling," Annals of Operations Research 63 (1996) 489-509.
11. C. Dimopoulos and A. M. S. Zalzala, "Recent developments in evolutionary computation for manufacturing optimization: Problems, solutions, and comparisons," IEEE Trans. on Evolutionary Computation 4 (2000) 93-113.
12. M. Nawaz, E. Enscore and I. Ham, "A heuristic algorithm for the m-machine, n-job flow-shop sequencing problem," OMEGA 11 (1983) 91-95.
13. Y. B. Park, C. D. Pegden and E. E. Enscore, "A survey and evaluation of static flowshop scheduling heuristics," International Journal of Production Research 22 (1984) 127-141.
14. T. C. Lai, "A note on heuristics of flow-shop scheduling," Operations Research 44 (1996) 648-652.
15. B. Chen, C. A. Glass, C. N. Potts and V. A. Strusevich, "A new heuristic for three-machine flow shop scheduling," Operations Research 44 (1996) 891-898.
16. R. L. Daniels and R. J. Chambers, "Multiobjective flow shop scheduling," Naval Research Logistics 37 (1990) 981-995.
17. J. Sridhar and C. Rajendran, "Scheduling in flowshop and cellular manufacturing systems with multiple objectives - A genetic algorithmic approach," Production Planning and Control 7 (1996) 374-382.
18. T. Murata, H. Ishibuchi and H. Tanaka, "Multi-objective genetic algorithm and its applications to flowshop scheduling," Computers and Industrial Engineering 30 (1996) 957-968.
19. H. Ishibuchi and T. Murata, "A multi-objective genetic local search algorithm and its application to flowshop scheduling," IEEE Trans. on Systems, Man, and Cybernetics - Part C: Applications and Reviews 28 (1998) 392-403.
20. T. P. Bagchi, Multiobjective Scheduling by Genetic Algorithms (Kluwer Academic Publishers, Boston, 1999).
21. T. P. Bagchi, "Pareto-optimal solutions for multi-objective production scheduling problems," Lecture Notes in Computer Science 1993 (2001) 458-471.
22. E. Talbi, M. Rahoual, M. H. Mabed and C. Dhaenens, "A hybrid evolutionary approach for multicriteria optimization problems: Application to the flow shop," Lecture Notes in Computer Science 1993 (2001) 416-428.
23. M. Basseur, F. Seynhaeve and E. G. Talbi, "Design of multi-objective evolutionary algorithms: Application to the flow-shop scheduling problem," Proc. of 2002 Congress on Evolutionary Computation (2002) 1151-1156.
24. C. A. Brizuela and R. Aceves, "Experimental genetic operators analysis for the multi-objective permutation flowshop," Lecture Notes in Computer Science 2632 (2003) 578-592.
25. H. Ishibuchi, T. Yoshida and T. Murata, "Balance between genetic search and local search in memetic algorithms for multiobjective permutation flowshop scheduling," IEEE Trans. on Evolutionary Computation 7 (2003) 204-223.
26. C. A. Coello Coello, "A comprehensive survey of evolutionary-based multiobjective optimization techniques," Knowledge and Information Systems 1 (1999) 269-308.
27. D. A. van Veldhuizen and G. B. Lamont, "Multiobjective evolutionary algorithms: Analyzing the state-of-the-art," Evolutionary Computation 8 (2000) 125-147.
28. K. Deb, Multi-Objective Optimization Using Evolutionary Algorithms (John Wiley & Sons, Chichester, 2001).
29. C. A. Coello Coello, D. A. van Veldhuizen and G. B. Lamont, Evolutionary Algorithms for Solving Multi-Objective Problems (Kluwer Academic Publishers, Boston, 2002).
30. J. D. Schaffer, "Multi-objective optimization with vector evaluated genetic algorithms," Proc. of 1st International Conference on Genetic Algorithms and Their Applications (1985) 93-100.
31. C. M. Fonseca and P. J. Fleming, "Genetic algorithms for multiobjective optimization: Formulation, discussion and generalization," Proc. of 5th International Conference on Genetic Algorithms (1993) 416-423.
32. J. Horn, N. Nafpliotis and D. E. Goldberg, "A niched Pareto genetic algorithm for multi-objective optimization," Proc. of 1st IEEE International Conference on Evolutionary Computation (1994) 82-87.
33. N. Srinivas and K. Deb, "Multiobjective optimization using nondominated sorting in genetic algorithms," Evolutionary Computation 2 (1994) 221-248.
34. E. Zitzler and L. Thiele, "Multiobjective evolutionary algorithms: A comparative case study and the strength Pareto approach," IEEE Trans. on Evolutionary Computation 3 (1999) 257-271.
35. J. D. Knowles and D. W. Corne, "Approximating the nondominated front using the Pareto archived evolution strategy," Evolutionary Computation 8 (2000) 149-172.
36. A. Jaszkiewicz, "Genetic local search for multi-objective combinatorial optimization," European Journal of Operational Research 137 (2002) 50-71.
37. K. Deb, A. Pratap, S. Agarwal and T. Meyarivan, "A fast and elitist multiobjective genetic algorithm: NSGA-II," IEEE Trans. on Evolutionary Computation 6 (2002) 182-197.
38. E. Zitzler, K. Deb and L. Thiele, "Comparison of multiobjective evolutionary algorithms: Empirical results," Evolutionary Computation 8 (2000) 173-195.
39. A. Jaszkiewicz, "On the performance of multiple-objective genetic local search on the 0/1 knapsack problem - A comparative experiment," IEEE Trans. on Evolutionary Computation 6 (2002) 402-412.
40. P. Brucker, Scheduling Algorithms (Springer, Berlin, 1998).
41. D. Whitley, T. Starkweather and D. Fuquay, "Scheduling problems and traveling salesmen: the genetic edge recombination operator," Proc. of 3rd International Conference on Genetic Algorithms (1989) 133-140.
42. T. Starkweather, S. McDaniel, D. Mathias, D. Whitley and C. Whitley, "A comparison of genetic sequence operators," Proc. of 4th International Conference on Genetic Algorithms (1991) 69-76.
43. D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning (Addison-Wesley, Reading, 1989).
44. I. Oliver, D. Smith and J. Holland, "A study of permutation crossover operators on the travelling salesman problem," Proc. of 2nd International Conference on Genetic Algorithms (1987) 224-230.
45. C. Bierwirth, D. C. Mattfeld and H. Kopfer, "On permutation representations for scheduling problems," Lecture Notes in Computer Science 1141 (1996) 960-970.
46. M. Gen and R. Cheng, Genetic Algorithms and Engineering Design (John Wiley & Sons, New York, 1997).
47. H. Ishibuchi and Y.
Shibata, "A similarity-based mating scheme for evolutionary multiobjective optimization," Lecture Notes in Computer Science 2723 (2003) 1065-1076.
CHAPTER 23

EVOLUTIONARY OPERATORS BASED ON ELITE SOLUTIONS FOR BI-OBJECTIVE COMBINATORIAL OPTIMIZATION
Xavier Gandibleux
LAMIH/ROI - UMR CNRS 8530, Universite de Valenciennes
Le Mont Houy - F59313 Valenciennes cedex 9, France
E-mail: Xavier.[email protected]

Hiroyuki Morita
Faculty of Economics, Osaka Prefecture University
Sakai, Osaka 599-8231, Japan
E-mail: morita@eco.osakafu-u.ac.jp

Naoki Katoh
Graduate School of Engineering, Kyoto University
Kyoto 606-8501, Japan
E-mail: [email protected]

Combinatorial optimization problems with multiple objectives represent an important area of mathematical programming. Yet with two objectives, these problems are difficult to solve, even when, for example, the problems considered are structured according to their constraint matrix. Sometimes subsets of non-dominated solutions can be computed or approximated easily. Such solutions can set up an initial elite solution set that can be used advantageously in an evolutionary algorithm with the appropriate operators. This chapter describes a population-based method, using evolutionary operators based on elite solutions, to approximate the efficient solutions of bi-objective combinatorial optimization problems. The operators used are a crossover, a path-relinking and a local search on elite solutions. The method has been applied to the bi-objective assignment problem and the bi-objective knapsack problem. These two fundamental problems are encountered in practical applications, such as resource assignment and portfolio design, and are sub-problems of other more complicated problems, such as transportation problems.
23.1. Introduction

Combinatorial optimization is studied extensively in operational research. Due to its potential for application to real-world problems (vehicle routing, bin-packing, timetabling, etc.), this field of study has prospered over the last few decades. However, real-world decision-making involves dealing with several, usually conflicting, objectives. For example, a decision-maker faced with a portfolio problem has to balance the risks and the returns of an investment, taking both objectives into account simultaneously. Obviously, there is generally no single optimal solution, and thus decision-makers have to deal with the efficient solutions that best meet their multiple objectives. For these reasons, the increasing interest of many researchers in the field of multi-objective combinatorial optimization (MOCO) in recent years is hardly surprising. Since 1990, specific methodologies have been developed, and the number of papers in the field has increased considerably8. Still, the theoretical complexity of MOCO problems7 is certainly a major obstacle to the development of exact methods. Even with only two objectives, computing all efficient solutions is generally difficult, even if the single-objective version of the problem is polynomially solvable. Thus, as is the case for single-objective problems, a reasonable alternative to exact methods for solving large-scale instances of MOCO problems is to derive an approximation method. The challenge for these methods in multi-objective programming is to find "good" solutions which approximate all efficient solutions of the problem. Here, multiple-objective heuristics and metaheuristics are powerful methods aiming to provide a good tradeoff between the "quality" of the set of elite solutions and the time and memory required to produce them. The approximation is called the set of potential efficient solutions or, in the context of evolutionary algorithms, the set of elite solutions.
When a solution is included in this set, no other solution computed by the procedure at this step dominates that solution. The approximation methods for multi-objective problems first appeared in 1984. Since then, pioneering methods have been introduced: genetic algorithms (Schaffer 1984)20, artificial neural networks (Malakooti 1990)18, simulated annealing (Serafini 1992)21, and tabu search (Gandibleux 1996)9. Two characteristics are common to these pioneering methods. First,
they are inspired either by evolutionary algorithms or by neighborhood search algorithms. Second, the first methods were direct derivations of single-objective optimization metaheuristics, adapted to integrate the concept of efficient solutions for optimizing multiple objectives. Recent multi-objective metaheuristics are often hybridized. For example, some methods based on neighborhood search algorithms handle a population of solutions4,16. For some MOCO problems, subsets of efficient solutions can be computed or approximated easily. Such solutions can set up an initial solution set that can be used advantageously in an evolutionary algorithm with the appropriate operators. This is the primary concern of this chapter. The generic principle of a population-based method that identifies the efficient frontier of bi-objective combinatorial optimization problems is described. The operators used are a crossover, a path-relinking and a local search on elite solutions. Numerical experiments underline the method's effectiveness for quickly obtaining an approximation of the exact efficient frontier for two bi-objective combinatorial optimization problems: the assignment problem and the knapsack problem.
23.2. MOCO Problems and Solution Sets

Given a finite set X and Q ≥ 1 objective functions zq : X → R, q = 1, ..., Q, a multi-objective combinatorial optimization (MOCO) problem is defined as7:

    "min" (z1(x), ..., zQ(x))        (MOCO)

A solution x ∈ X is a feasible decision, where X is called the decision space. A vector z(x) = (z1(x), ..., zQ(x)), z(x) ∈ Z, is a performance, where Z is called the objective space. Typically, two types of objective functions are considered, namely the sum and the bottleneck objectives. The problem is then to solve (MOCO), where the meaning of "min" has still to be defined. Often the minimization in (MOCO) is understood in the sense of efficiency, also called Pareto optimality. Since we are interested in the bi-objective case (denoted biCO), Q is set to 2 in the continuation. A solution x ∈ X is called efficient if there is no other feasible solution x' ∈ X such that zq(x') ≤ zq(x) for all q = 1, 2, with at least one strict inequality. The corresponding vector z(x) is called a non-dominated point in the objective space Z. The set of all efficient solutions is denoted by E, and the representation of E in Z is called the efficient frontier, or the Pareto front.
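The dominance test underlying this definition can be sketched in code. A minimal illustration (Python; the example points are hypothetical 2-objective cost vectors, minimization):

```python
def dominates(za, zb):
    """True if point za dominates point zb (minimization):
    za is no worse on every objective and strictly better on at least one."""
    return all(a <= b for a, b in zip(za, zb)) and za != zb

def nondominated(points):
    """Keep the points of a finite set that no other point dominates."""
    return [z for z in points if not any(dominates(o, z) for o in points)]

# (3, 3) is dominated by (2, 3); the other three points are non-dominated.
print(nondominated([(1, 5), (2, 3), (3, 3), (4, 1)]))  # → [(1, 5), (2, 3), (4, 1)]
```

The surviving points are exactly the non-dominated points of the finite image set, i.e. the image of the efficient solutions.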
For a (biCO) problem, the efficient solution set is generally partitioned into two subsets. The set SE of supported efficient solutions is a subset of E such that, for x ∈ SE, z(x) is an optimal solution of the following parametrized single-objective problem for some λ = (λ1, λ2) with λ1, λ2 > 0:

    min λ1 z1(x) + λ2 z2(x)        (biCOλ)

If the convex hull C = conv{z(x) : x ∈ E} is computed in the objective space, z(x) for any supported efficient solution x ∈ SE belongs to the boundary of this convex hull. SE is composed of SE1 and SE2; SE1 is the set of SE solutions x such that z(x) is a vertex of the convex hull C, and SE2 = SE \ SE1. Computing the SE2 set is generally more difficult than computing SE1, because the former requires the enumeration of all optimal solutions that minimize (biCOλ) for a given λ. A solution x in the set NE = E \ SE of non-supported efficient solutions is one for which z(x) is not on the boundary of the convex hull. In the bi-objective case, NE solutions are located in the triangles drawn on two successive supported efficient solutions in the objective space. There is no theoretical characterization leading to the efficient computation of NE solutions. Generally, several distinct efficient solutions x1, x2, x3 can correspond to the same non-dominated point z(x1) = z(x2) = z(x3) in the objective space. The solutions x1, x2, x3 are then said to be equivalent in the objective space. The number of such equivalent solutions is generally quite large, and so the enumeration of all of them may be intractable. In such a situation, it is impossible to design an efficient algorithm that can compute all efficient solutions. All the introduced sets are then redefined restrictively according to the notion of a minimal complete set17 of efficient solutions. A set of efficient solutions is minimal if and only if no two of its efficient solutions are equivalent. The application of this definition to the introduced sets gives rise to the Em, SEm, SE1m, SE2m, and NEm minimal complete sets. Figure 23.1, which summarizes the inclusion relationship among these sets, illustrates, for example, that SE1m ⊆ SEm ⊆ SE. The published papers are sometimes unclear about the abilities of the algorithms that they present.
Some authors claim that their algorithm can enumerate "all" efficient solutions in terms of the set E. However, as mentioned before, it is generally difficult to compute this set. Thus, it is important to clearly define the class of efficient solutions handled by the algorithm.
Fig. 23.1. Classification of efficient solutions
23.3. An Evolutionary Heuristic for Solving biCO Problems

The principle of our heuristic11,12,13 is based on the intensive use of three operators applied to a population composed uniquely of elite solutions. The following sections present the main features of the heuristic. Its algorithmic framework is shown in Algorithm 1.

23.3.1. Overview of the Heuristic

Let us introduce PE, which denotes the set of elite solutions. PE is first initialized with a subset of supported solutions (routine detectPEinit). Three operators are used: a crossover (routine crossoverWithElites), a path-relinking (routine pathRelinkingWithElites), and a local search (routine localSearchOverNewElites). Upper and lower bound sets defined in the objective space (routine buildBoundSets) provide acceptable limits for performing a local search. A genetic map, derived from the elite solutions (routine elaborateGeneticInformation), provides useful information to the crossover operator for fixing certain bits. This genetic information is refreshed periodically. Each new elite solution is noted (routine noteNewSolutions). Three rules, which can be used separately or in combination, define a stopping condition (routine isTheEnd?). Basically, the heuristic can be stopped after a predefined effort (rule 1, with parameter iterationMax) or after an elapsed time (rule 2, with parameter timeMax). Rule 3 concerns the detection of unfruitful iterations: it allows the heuristic to be stopped when no new elite solutions have been produced after a certain number of iterations (parameters seekChangesFrequency and noChangeMax). Each iteration of the algorithm performs one crossover operation, which generates one solution, and one path-relinking operation, which generates
Algorithm 1 The entry point
Require: input data, which determines the objective functions and constraints; parameter(s) for the chosen stopping condition(s).
Ensure: PE
  --| Compute the initial elite population set PEinit
  detectPEinit( data ↓, peInit ↑ ) ; pe ← peInit
  --| Compute the lower and the upper bound sets
  buildBoundSets( data ↓, pe ↓, lowerB ↑, upperB ↑ )
  --| A first local search on the PEinit solution set
  localSearchOverNewElites( pe ↕ )
  --| Identify the genetic heritage and elaborate the genetic map
  elaborateGeneticInformation( pe ↓, map ↑ )
  --| Initialize the running indicators
  iteration ← 1 ; elapsedTime ← 0 ; changes ← 0 ; noMore ← 0
  repeat
    --| Elaborate a solution by crossover
    crossoverWithElites( pe ↕, map ↓, lowerB ↓, upperB ↓ )
    --| Elaborate a series of solutions by path-relinking
    pathRelinkingWithElites( pe ↕, lowerB ↓, upperB ↓ )
    --| Apply a local search to the new elite solutions in PE
    localSearchOverNewElites( pe ↕ )
    --| Refresh the genetic heritage by integrating genetic information
    --| from the new PE into the existing map
    if (iteration MOD refreshMapFrequency = 0) then
      elaborateGeneticInformation( pe ↓, map ↑ )
    end if
    --| Identify the producers of new potential solutions and note
    --| a series of iterations without production of new PE
    noteNewSolutions( pe ↓, changes ↑ )
    if (iteration MOD seekChangesFrequency = 0) then
      noMore ← (changes = 0 ? noMore + 1 : 0) ; changes ← 0
    end if
    --| Check the stopping condition(s)
  until isTheEnd?( iteration++ ↓, elapsedTime ↓, noMore ↓ )
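As a reading aid, the control flow of Algorithm 1 can be rendered in Python. This is only a skeleton under stated assumptions: the operator routines are injected as callables (their real behavior is described in sections 23.3.3 to 23.3.7), each is assumed to return the number of new elite solutions it produced, and rule 2 (timeout) is omitted for brevity; only the bookkeeping of rules 1 and 3 is concrete:

```python
def run_heuristic(pe, crossover, path_relinking, local_search, rebuild_map,
                  iteration_max=250_000, refresh_map_frequency=100_000,
                  seek_changes_frequency=100_000, no_change_max=2):
    """Main loop skeleton: the callables mutate the elite set `pe` in place
    and report how many new elite solutions they added."""
    genetic_map = rebuild_map(pe)                  # initial genetic heritage
    iteration, changes, no_more = 0, 0, 0
    while True:
        iteration += 1
        changes += crossover(pe, genetic_map)      # one offspring
        changes += path_relinking(pe)              # a series of solutions
        changes += local_search(pe)                # improve the new elites
        if iteration % refresh_map_frequency == 0:
            genetic_map = rebuild_map(pe)          # periodic refreshment
        if iteration % seek_changes_frequency == 0:
            no_more = no_more + 1 if changes == 0 else 0   # rule 3 counter
            changes = 0
        if iteration >= iteration_max or no_more >= no_change_max:  # rules 1, 3
            return pe
```

With all-unfruitful operators and small cycle parameters, rule 3 stops the loop after no_change_max empty cycles, exactly as in the pseudocode.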
a list of solutions. For each of these generated solutions, a local search is performed (section 23.3.7) if and only if the solution is promising, meaning that it falls into the "admissible area" (section 23.3.3). All solutions potentially efficient in this neighborhood are added to the existing set PE. The iteration ends by once again performing a local search operation for each newly included elite solution in the PE set. Due to the simplicity of the method, performing one iteration consumes little CPU time, especially when the individual is not promising (in which case no local search is performed). This allows the heuristic to be very aggressive, implementing a generation process that performs many iterations. In addition, the approximation set contains only elite solutions. The algorithm maintains PE and iteratively improves it, moving it towards the set of exact efficient solutions. Thus, a poor solution, one that is far from the exact efficient frontier, will never be introduced into the approximation. At any time, the heuristic will produce only good approximations of the efficient frontier. Unlike other multi-objective evolutionary algorithms2, our heuristic performs no direction searches to drive the approximation process, and it requires no ranking method (there is no fitness measure). This is important, given that direction searches and ranking are often criticized: the former for its difficulty in guiding the heuristic search along the efficient frontier, and the latter for requiring increased computing effort.

23.3.2. The Initial Population
The initial population is the set SE1m. When a polynomial-time algorithm is available for the single-objective combinatorial optimization problem, (biCOλ) can be solved efficiently for a fixed λ. Clearly, this initial population set gives a complete description of the efficient frontier (figure 23.2). When such a polynomial-time algorithm is not available, a heuristic can be used to obtain an approximate description of the efficient frontier10. In any case, the exact or approximate set SE1m can be obtained by solving the parametric problem (biCOλ) for all possible λ. By applying a dichotomic scheme in the objective space, solving this problem is possible in time proportional to the size of |SE1m| multiplied by the running time needed to solve (biCOλ) for a fixed λ. In this chapter, we assume that an efficient algorithm exists for computing the exact set SE1m. Clearly, some efficient solutions belonging to SE2m can be obtained using this computation principle as a byproduct. Obviously, these solutions are
integrated into the initial population set, denoted by PEinit. However, no specific algorithm has been developed for computing solutions belonging to SE2m.
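The dichotomic scheme can be sketched as follows (Python). Here `solve_weighted` stands for a hypothetical exact solver of (biCOλ): given weights (λ1, λ2), it returns a point z = (z1, z2) minimizing λ1 z1 + λ2 z2 over the feasible image set. Each pair of adjacent supported points defines weights orthogonal to the segment joining them; a point found strictly below that segment splits the pair recursively:

```python
def supported_points(solve_weighted):
    """Dichotomic scheme returning supported non-dominated points of a
    bi-objective minimization problem, ordered by increasing z1."""
    a = solve_weighted(1, 0)                    # best point on objective 1
    b = solve_weighted(0, 1)                    # best point on objective 2
    if a == b:
        return [a]

    def explore(a, b):
        # weights making the scalarized costs of a and b equal
        l1, l2 = a[1] - b[1], b[0] - a[0]
        c = solve_weighted(l1, l2)
        if l1 * c[0] + l2 * c[1] < l1 * a[0] + l2 * a[1]:
            return explore(a, c) + [c] + explore(c, b)  # new supported point
        return []                                       # the pair is settled

    return [a] + explore(a, b) + [b]

# Hypothetical feasible image set; (5, 4) is efficient but not supported.
pts = [(1, 9), (2, 5), (5, 4), (9, 1), (6, 6)]
solver = lambda l1, l2: min(pts, key=lambda z: l1 * z[0] + l2 * z[1])
print(supported_points(solver))  # → [(1, 9), (2, 5), (9, 1)]
```

Note that with λ = (1, 0) the solver may return any minimizer of z1; in an exact implementation a lexicographic tie-break would be used to guarantee the extreme points of the frontier.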
Fig. 23.2. Example of an initial solution set (squares). Bullets are solutions (Em \ PEinit) that will be approximated by the heuristic
23.3.3. Bound Sets and Admissible Areas

The upper bound set is defined by the set of "local nadir points", where one nadir point is derived from two adjacent supported solutions. Specifically, if x1 and x2 are two adjacent supported solutions in the objective space, the corresponding nadir is the point in Z with the coordinates:

    ( max(z1(x1), z1(x2)), max(z2(x1), z2(x2)) )

Figure 23.3 illustrates the upper bound set. Using these points, an initial area (S1) is derived inside the efficient frontier. A point in the area S1 can correspond to either a feasible or an infeasible solution. The lower bound set is defined in a symmetrical manner, by computing:

    ( min(z1(x1), z1(x2)), min(z2(x1), z2(x2)) )
Fig. 23.3. Efficient solutions (squares), upper bound set (bullets), lower bound set (stars) and the admissible areas where a local search procedure will be performed on new solutions. Filled squares are supported solutions, empty squares are non-supported ones. Grey triangles denote areas where efficient solutions can exist
for all adjacent supported solutions x1 and x2 in the objective space. Obviously, this bound also defines a second admissible area (S2), outside the efficient frontier, which is composed only of infeasible solutions. The use of areas S1 and S2 allows the design of an oscillation strategy15 between the feasible and infeasible parts of the search space along the efficient frontier. These bound sets are used in a heuristic strategy to determine whether a solution is a candidate for an intensive search in its neighborhood. All solutions in both areas are considered promising for finding new elite solutions. A local search is performed, beginning with such promising solutions. Because no effort is wasted on solutions outside of the admissible areas, this heuristic strategy helps to save computing effort.

23.3.4. The Genetic Map

The mechanism used here is inspired by the principle of pheromones in artificial ant colonies6. Assuming that similarities exist between efficient solution vectors, we compute the occurrence frequency of the values of the
elite solutions for each component (which is often a variable). A roulette wheel is built to store those occurrence frequencies, which provide the genetic information. A genetic map, comprised of the roulette wheels of each solution vector's components, contains the genetic heritage of our population. This information is used extensively by the crossover operator (section 23.3.5). The genetic information is always derived from elite solutions. The initial map is thus derived only from exact supported solutions. Periodically, the genetic map is refreshed. Once PE (the current set of elite solutions) has been significantly renewed, it is used to rebuild the roulette wheels. This activity indicates an important evolution in the population. In the current version, refreshment occurs after a predefined number of generations (parameter refreshMapFrequency) has been performed. This parameter value has been experimentally set to 100 000 generations.

23.3.5. The Crossover Operator

For each crossover operation, two parent individuals, x1 and x2, are randomly selected from the current elite population, and one offspring x3 is produced. Genes common to the parents are replicated in the child, and the other genes are determined using the genetic map, on the basis of the occurrence frequencies stored in the roulette wheels (figure 23.4).
Fig. 23.4. Crossover operator principle
Suppose the values of a component j for the two parents are different. The value of component j for the offspring can be randomly determined according to the probability values stored in roulette wheel j. However, the solution so obtained may not be feasible in general. To ensure feasibility, the value of component j is determined randomly from a list of feasible selections. Another option allows infeasible solutions to also be considered as candidates for a local search (depending on whether or not they are located in the admissible area).
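A compact sketch of a roulette-wheel genetic map and the crossover that consults it (Python). Solutions are assumed to be coded as vectors of discrete values, as in the permutation coding of section 23.5; the feasibility repair discussed above is deliberately omitted:

```python
import random
from collections import Counter

def build_genetic_map(elites):
    """One roulette wheel per component: the occurrence frequency of each
    value observed at that position over the elite solutions."""
    return [Counter(sol[j] for sol in elites) for j in range(len(elites[0]))]

def crossover(parent1, parent2, genetic_map, rng=random):
    """Genes common to both parents are replicated in the child; the other
    genes are drawn from the roulette wheel of their component."""
    child = []
    for j, (g1, g2) in enumerate(zip(parent1, parent2)):
        if g1 == g2:
            child.append(g1)                 # common gene: keep it
        else:
            wheel = genetic_map[j]
            values = list(wheel)
            weights = [wheel[v] for v in values]
            child.append(rng.choices(values, weights=weights)[0])
    return child
```

On the assignment problem, a repair step would then restrict the draw to values not already used in the child, so that the offspring stays a valid permutation.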
23.3.6. The Path-Relinking Operator
Path-relinking generates new solutions by exploring the trajectories that connect elite solutions. Starting from one solution (the initiating solution), a path is generated through the neighborhood space that leads to the other solution (the guiding solution)15. Because the population contains only elite solutions, the presence of a path-relinking operator in our heuristic is a natural development. A path-relinking operation starts by randomly selecting IA and IB, two individuals from the current elite population (figure 23.5). Because both individuals are elite, both could potentially be the guiding solution. Let IA be the initiating solution and IB the guiding solution. The path-relinking operation generates a path IA(= I0), I1, ..., IB, such that the distance between Ii and IB decreases monotonically in i, where the distance is defined as the number of positions to which different values are assigned in Ii and IB.
Fig. 23.5. Path-relinking operator principle
Although many such paths may exist, one path is chosen using random moves based on a swap operator. (Details are provided in Ref. 13.) Such randomness introduces a form of diversity into the solutions generated along the path. For every intermediate solution Ii, a single solution is generated in the neighborhood (figure 23.6). As with the crossover operation, the bound set is used to determine whether the solution produced falls into the admissible area. If so, the solution is compared with the current list of elite solutions, and a local search is performed. Otherwise, no improvement strategy is triggered, and the solution is simply ignored.
Fig. 23.6. Illustration of a possible path construction. IA and IB are two individuals randomly selected from the current elite population (small bullets). IA is the initiating solution, and IB is the guiding solution. N(IA) is the feasible neighborhood according to the move defined. IA - I1 - I2 - I3 - I4 - IB is the path that is built
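The path construction above can be sketched as follows (Python, on permutation-coded solutions as in section 23.5). The `visit` callback is a hypothetical stand-in for the admissible-area test and the neighborhood sampling applied to each intermediate solution; each swap puts the guiding value into one randomly chosen disagreeing position, so the distance to the guide strictly decreases at every step:

```python
import random

def path_relinking(initiating, guiding, visit, rng=random):
    """Walk from the initiating permutation to the guiding one by swap moves,
    calling `visit` on every intermediate solution along the path."""
    current, guiding = list(initiating), list(guiding)
    while current != guiding:
        # positions where the two permutations still disagree
        diff = [j for j, (a, b) in enumerate(zip(current, guiding)) if a != b]
        j = rng.choice(diff)                  # random move: a form of diversity
        k = current.index(guiding[j])         # bring the guide's value into j
        current[j], current[k] = current[k], current[j]
        visit(list(current))
    return current
```

Since position j is fixed by every swap and no already-matching position is ever disturbed, the walk is guaranteed to terminate at the guiding solution.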
23.3.7. The Local Search Operator

A classic neighborhood structure based on a swap move has been adopted for implementing the local search operator. Let us consider two positions j1 and j2 of a solution x, where xj1 and xj2 are the values in positions j1 and j2, respectively. Then, using the swap move (j1, j2), the set of pairwise exchanges (xj1, xj2) with j1 = 1, ..., n-1 and j2 = j1+1, ..., n defines an associated neighborhood N(x) of the current solution x. N(x) may contain infeasible solutions. Such a solution y ∈ N(x) may be considered if y is located in area S2, or if y is located in area S1 and z(y) is not dominated by a solution from PE. Because the computational cost of a local search can be significant (O(n²)), the local search is only triggered on the promising candidate solutions that result from the crossover and path-relinking operators.
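A sketch of the swap-based neighborhood exploration combined with the admissible-area filter of section 23.3.3 (Python; `evaluate`, `in_admissible_area` and `is_new_elite` are hypothetical callables standing for the objective evaluation, the bound-set test, and the comparison with PE):

```python
def swap_neighborhood_search(solution, evaluate, in_admissible_area, is_new_elite):
    """Enumerate the O(n^2) swap neighborhood of `solution` and collect the
    neighbors that fall into an admissible area and pass the elite test."""
    n, found = len(solution), []
    for j1 in range(n - 1):
        for j2 in range(j1 + 1, n):
            neighbor = list(solution)
            neighbor[j1], neighbor[j2] = neighbor[j2], neighbor[j1]
            z = evaluate(neighbor)
            if in_admissible_area(z) and is_new_elite(neighbor, z):
                found.append(neighbor)
    return found
```

Filtering on the admissible area before the elite comparison mirrors the chapter's strategy of spending effort only on promising neighbors.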
23.4. Application to Assignment and Knapsack Problems with Two Objectives

The general principle of our population-based heuristic has been applied to two classic bi-objective (MOCO) problems: the assignment problem (biAP) and the knapsack problem (biKP).

23.4.1. Problem Formulation

The assignment problem with two objectives (biAP) can be formulated as follows, where the cqij are non-negative integers and x = (x11, ..., xnn):

    "min" zq(x) = Σi=1..n Σj=1..n cqij xij ,   q = 1, 2
    Σj=1..n xij = 1 ,   i = 1, ..., n
    Σi=1..n xij = 1 ,   j = 1, ..., n
    xij ∈ {0, 1}

The (single-objective) assignment problem (AP) is a well-known fundamental combinatorial optimization problem. The goal is to find an optimal assignment of n tasks to n positions so that every task is assigned to exactly one position, and no two tasks are assigned to the same position; cqij denotes the cost incurred by assigning task i to position j for objective q. Efficient specific algorithms exist to solve the single-objective assignment problem, such as the Hungarian method or the successive shortest path method1.

The 0-1 knapsack problem with two objectives (biKP) can be formulated as follows, where the coefficients cqi, the weights wi and the capacity ω are non-negative constants and x = (x1, ..., xn):

    "max" zq(x) = Σi=1..n cqi xi ,   q = 1, 2
    Σi=1..n wi xi ≤ ω
    xi ∈ {0, 1}

The single-objective 0-1 knapsack problem is also a well-known combinatorial optimization problem. Although it is known to be NP-hard, it
can be solved efficiently in a practical sense by a branch and bound method or by using dynamic programming. (See the book by Martello and Toth19 for details about knapsack problems.) In addition, a fully polynomial-time approximation scheme exists.

23.4.2. Experimental Protocol

A library of numerical instances for MOCO problems is available online at www.terry.uga.edu/mcdm/. This library contains data for both the assignment and the knapsack problems. A series of fifteen instances was used for our (biAP) experiments. The objective coefficients cqij were generated randomly in the range [1, 20], with the problem size n ranging from 5 to 100. For the biKP, we used nine randomly generated problem instances, where the problem size n ranges from 100 to 500. The objective coefficients cqi and the weights wi were also randomly generated, in the range [1, 100]. The capacity ω on the right-hand side of the constraint Σi=1..n wi xi ≤ ω is correlated with the weight vector w as follows: ω = 0.5 × Σi=1..n wi. For these problems, the minimal complete set of efficient solutions Em was computed using Cplex on a mainframe5, and broken down into the characteristic subsets, especially SE1m and SE2m. The computer used for the experiments was a desktop equipped with a Pentium 4 2.6 GHz processor with 1 GB of RAM installed. The operating system was Redhat Linux, version 9, and the algorithms were implemented in the C language. The compilation was done using gcc-2.95.3 with the optimizer option -O3. The following three stopping rules were used, both separately and in combination.

Rule 1: number of iterations. The heuristic is stopped after a predetermined effort (parameter iterationMax).

Rule 2: timeout. The heuristic is stopped after a predefined elapsed time (parameter timeMax).

Rule 3: unfruitful iterations. After a cycle of a predetermined number of iterations (parameter seekChangesFrequency), the rule checks whether new elite solutions were added during the cycle.
Consecutive cycles without change are counted, and the heuristic is stopped when a predetermined number of cycles has been recorded (parameter noChangeMax). The default parameter value of iterationMax in the heuristic is 250 000
iterations, and the genetic map is refreshed every 100 000 iterations. The suggested value for rule 3 is 2 cycles of 100 000 iterations. For each problem size, we repeated the experiments five times, using different random seeds. We used M1 (introduced by Ulungu23) for measuring the ratio of exact efficient solutions contained in the elite solution set PE, i.e., M1 = |PE ∩ Em| / |Em|. Minimal, average, and maximal values of M1 for each input size have been recorded.

23.5. Numerical Experiments with the Bi-Objective Assignment Problem

The initial population set PEinit is computed by solving a series of parametric assignment problems according to a dichotomic scheme. Each single-objective assignment problem is solved by the successive shortest path algorithm. By using such a dichotomic scheme to generate SE1m, some solutions belonging to SE2m can also be generated. Thus, PEinit contains all SE1m solutions and some SE2m solutions. Assignments are coded as permutations of the n tasks instead of the n²-dimensional 0-1 vector. For example, the coded solution x = (4, 2, 1, 5, 3) means that task i = 4 is assigned to position j = 1, task i = 2 to j = 2, etc. A neighborhood N(x), associated with the current solution x, is the set of permutations obtained by applying pairwise exchanges (xj1, xj2) to the current solution, where j1 = 1, ..., n-1 and j2 = j1+1, ..., n. Any move resulting from a pairwise exchange preserves the feasibility of the assignment. Consequently, no oscillation is designed for this problem. The genetic map is composed of n roulette wheels corresponding to the n positions, each of which represents the occurrence frequency of the assignment "i in position j" in the vector x over the elite solutions. The crossover and path-relinking operators are designed as described in sections 23.3.5 and 23.3.6.

23.5.1. Minimal Complete Solution Sets and Initial Elite Solution Set

Figure 23.7 shows how Em, SEm, NEm, SE1m, SE2m and PEinit grow as the problem size increases.
The CPU time needed for computing PEinu is small compared to the time needed to run the heuristic. According to figure 23.7, the number of solutions in each subset increases linearly with the input size for these instances. Interestingly, the difference between SElm and SE2m seems to decrease with input size. Examining PEinn column confirms that PEinu contains all SElm so-
570
X. Gandibleux, H. Morita, N. Katoh
Table 23.92. Number of solutions in the minimal complete set Em and its distribution in the subsets. Initial set of elite solutions (number and CPUt). PEinit computed using a successive shortest path (SSP) method instance n
Em
NEm
SEm
SElm
5 10 15 20 25 30 35 40 45 50 60 70 80 90 100
8 16 39 55 74 88 81 127 114 163 128 174 195 191 223
5 10 27 42 49 61 54 73 71 96 84 114 126 108 122
3 6 12 13 25 27 27 54 43 67 44 60 69 83 101
3 6 12 13 20 24 25 34 32 39 39 42 47 51 50
Fig. 23.7.
Cplex SE2m
PEinH
SSP CPUt
0 0 0 0 5 3 2 20 11 28 5 18 22 32 51
3 6 12 13 21 24 25 38 33 43 41 46 50 60 59
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 2.0 3.0 5.0 6.0
Minimal complete sets and the initial elite sets
{PEinit)
571
Evolutionary Operators Based on Elite Solutions for biCO
lutions, plus some additional solutions from SE2m. For instances with a large input size, it is not surprising to observe an increasing difference between SElm and the PEinit. Because solutions belonging to SE2m increase with input size, the probability of generating a SE2m solution using the dichotomic principle is increased. 23.5.2. Our Results Compared with Those Existing in the Literature The reported computational results in column two of table 23.93 are taken from the results published in Tuyttens et al.22. They were obtained for one run only, using the improved MOSA method with the same set of numerical instances. Figure 23.8 shows that the MOSA method performs poorly in Table 23.93.
Table 23.93. MOSA vs the population-based heuristic (rules 1 and 3 enabled).

instance    MOSA    population-based heuristic (M1)    CPUt     #iter
 n x n       M1       min      avg      max             avg       avg
    5       87.5    100.0    100.0    100.0             0.0   100 000
   10       56.2    100.0    100.0    100.0             1.0   100 000
   15       25.6    100.0     99.5    100.0             1.0   100 000
   20        3.7     96.4     99.3    100.0             4.0   190 000
   25        0.0     93.2     94.1     98.7             4.2   250 000
   30        3.4     94.4     96.9     97.8             7.2   250 000
   35        0.0     95.1     96.1     96.3             7.4   250 000
   40        0.0     86.7     90.8     93.0             8.8   250 000
   45        0.0     84.2     87.7     89.5            10.4   250 000
   50        0.0     86.7     87.5     88.5            14.2   250 000
   60       N.A.     71.1     73.6     76.6            14.4   250 000
   70       N.A.     67.8     69.4     70.1            20.4   250 000
   80       N.A.     78.5     80.0     84.5            27.4   250 000
   90       N.A.     66.0     71.7     74.4            31.6   250 000
  100       N.A.     58.7     61.4     66.4            43.8   250 000
terms of the quality measure M1, especially as the size of the instances increases. Since the computer used to obtain these results differs from the one used in our experiments (the MOSA results were computed on a DEC3000 alpha), a discussion of CPUt is not possible. The columns on the right-hand side of Table 23.93 report the results of our heuristic when rules 1 and 3 were activated. The number of generations was tuned to 250 000 iterations, and the genetic map was refreshed every 100 000 iterations. The CPUt indicated is an average value for five complete
Fig. 23.8. The population-based heuristic with rules 1 and 3 activated, compared with MOSA (x-axis: instance size n x n).
runs of the heuristic. The time reported includes both the time for computing the initial population of elite solutions and the time used for the approximation. The rightmost column of the table gives the average number of iterations performed by the algorithm during one generation; any value other than 250 000 indicates that rule 3 was triggered before rule 1. The comparison of the MOSA results with those produced by the population-based heuristic proposed in this chapter clearly shows that our heuristic outperforms MOSA (Figure 23.8). We presume that our heuristic consumes more time than the MOSA method (CPUt for MOSA were reported to be 5s and 246s for instances 5 x 5 and 50 x 50, respectively). However, our heuristic shows two important features that convince us that our method would outperform the MOSA method in tests run on the same computer. First, solution detection evolves very quickly during the early iterations of the generation13. Despite the brief time allowed for the generation, the quality of the approximations is already good. Second, our heuristic is able to improve its approximation when more time is allowed for the generation process, which does not seem to be possible for MOSA, whose approximation did not improve even when given more time. Also,
according to the values reported for the quality indicator M2 in Tuyttens et al.22, the MOSA method rapidly has difficulty in detecting good approximations of the efficient frontier. Because our heuristic uses SE1m and manages only elite solutions, the M2 quality indicator is meaningless here (M2 = 100% at all times).

23.6. Numerical Experiments with the Bi-Objective Knapsack Problem

In this second set of numerical experiments, the initial population set PEinit is also computed by solving a series of parametric knapsack problems. As with the biAP, this is done using the dichotomic scheme. The single-objective knapsack problems are solved using the branch and bound procedure19. A solution is coded as an n-dimensional 0-1 vector. A neighborhood N(x), associated with the current solution x, is the set of solutions obtained by applying pairwise exchanges (i_j1, i_j2), with j1 = 1,...,n-1 and j2 = j1+1,...,n, to the current solution. An infeasible solution x can result from such a pairwise exchange, due to violation of the knapsack constraint. Such a solution x is not deleted when its performance vector z(x) either belongs to area S2, or belongs to area S1 with z(x) not dominated by a PE solution. (S1 and S2 are defined in Section 23.3.3, and are considered here in the maximization case.) The principle is to admit promising infeasible solutions located near the efficient frontier in the objective space. All accepted infeasible solutions are subject to a local search. Infeasible solutions are used as trials for jumping between the feasible and infeasible domains in the decision space; however, only feasible solutions are considered subsequently. This use of infeasible solutions defines an oscillation strategy that seeks the neighborhoods of promising infeasible solutions. The strategy is also triggered for infeasible solutions resulting from the crossover and path-relinking operations.
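The acceptance rule for infeasible knapsack solutions can be sketched as follows. This is a hedged illustration: the predicates for areas S1 and S2 (defined in Section 23.3.3 and not reproduced here) are passed in as assumed parameters rather than implemented.

```python
def dominates(z1, z2):
    # Pareto dominance for a bi-objective maximization problem
    return z1[0] >= z2[0] and z1[1] >= z2[1] and z1 != z2

def keep_infeasible(z, elite_performances, in_s1, in_s2):
    """Oscillation acceptance test (sketch): keep an infeasible solution
    with performance vector z if z lies in area S2, or lies in area S1
    without being dominated by any elite (PE) solution.
    in_s1 / in_s2 are assumed predicates implementing the S1/S2 geometry."""
    if in_s2(z):
        return True
    return in_s1(z) and not any(dominates(ze, z) for ze in elite_performances)
```

Solutions accepted by this test would then be handed to the local search, as described above.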
The path-relinking operator adopted for the (biKP) follows the principle described in Section 23.3.6. However, the path between two solutions γ1 and γ2 may be long, and the number of solutions elaborated along the path may be quite large, requiring many operations for each path-relinking built. To reduce the effort required, more than one swap is performed at each step of the path, which speeds up path construction. Specifically, if δ is the number of different genes between γ1 and γ2, a value p ∈ [1, δ] is selected
at random, and only p solutions are built along the path. According to this strategy, the path elaborated is a sample of the solutions lying between γ1 and γ2.
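A simplified sketch of this sampled path-relinking between two permutations follows. It is an illustration only: here the full swap path is built and then thinned to p solutions, whereas the chapter's operator performs several swaps per step precisely to avoid building the full path.

```python
import random

def sampled_path_relinking(g1, g2, rng=random):
    """Walk from permutation g1 toward g2 with repairing swaps, then
    return only p sampled intermediate solutions, where p is drawn
    uniformly from [1, delta] and delta is the number of differing genes."""
    delta = sum(a != b for a, b in zip(g1, g2))
    if delta == 0:
        return []
    p = rng.randint(1, delta)
    path, cur = [], list(g1)
    for i, target in enumerate(g2):
        if cur[i] != target:
            j = cur.index(target)            # target sits at some j > i
            cur[i], cur[j] = cur[j], cur[i]  # one repairing swap
            path.append(list(cur))
    # thin the full path down to at most p sampled solutions
    return rng.sample(path, min(p, len(path)))
```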
23.6.1. Minimal Complete Solution Sets and the Initial Elite Solution Set

Figure 23.9 shows how the sizes of Em, SEm, NEm, SE1m, SE2m and PEinit grow as the problem size increases. Although the knapsack problem is known to be NP-hard, the running time required by the branch and bound algorithm is reasonable for these instances. If the CPUt for computing PEinit were to become large for other instances, an approximate algorithm (like a greedy one) could advantageously be used instead of branch and bound. (The impact of the initial solution set on the detection ratio is discussed in Gandibleux et al.10.)
Table 23.94. Distribution of efficient solutions Em, its subsets, and PEinit computed with a Branch & Bound-based method.

instance n     Em    NEm   SEm  SE1m  SE2m    PEinit    B&B CPUt
   100        172    149    23    23     0        23        0.0
   150        244    216    28    28     0        28        0.0
   200        439    400    39    38     1        38        0.0
   250        629    579    50    50     0        50        0.0
   300        713    658    55    54     1        54        0.0
   350        871    805    66    63     3        63        0.0
   400       1000    930    70    69     1        69        0.0
   450       1450   1369    81    73     8        75        1.0
   500       1451   1353    98    96     2        96        1.0
As shown in Figure 23.9, the number of solutions in each subset except SE2m increases linearly with the input size for these instances. Unlike the situation with the biAP, SE2m is very small and remains insignificant even when the input size increases. As for the assignment problem, PEinit is composed of all solutions belonging to SE1m, plus some additional solutions from SE2m. Table 23.94 presents all the solutions computed. As shown, |NEm| grows quickly as the instance size increases, while |SEm| stays small. These results contrast sharply with those produced for the biAP.
Fig. 23.9. Minimal complete sets and initial elite set (PEinit).
23.6.2. Our Results Compared with Those Existing in the Literature

The results of the proposed population-based method have been compared with those obtained using the MGK algorithm10, a genetic algorithm that uses crossover, mutation and local search operators. The results of the comparison are presented in Table 23.95 and Figure 23.10. (Both algorithms were implemented on the same computer, described in Section 23.4.2.) This comparison highlights the advantages of our proposed heuristic over MGK: the proposed algorithm produces a better approximation of the efficient solutions E with less CPUt. As with the biAP, the approximation PE improves when more CPU time is allowed, which means that when rule 1 is triggered, the approximation has not yet saturated.

23.7. Conclusion and Perspectives

We have described a population-based heuristic for solving bi-objective combinatorial optimization problems. This heuristic uses three operators - crossover, path-relinking and local search - on a population composed
Table 23.95. M1 (avg) and CPUt (avg, including the computation of the initial elite set). In this experiment, iterationMax of stopping rule 1 is 600,000; the elapsedTime of stopping rule 2 is shown in the rightmost column.

instance    EMO01              rule 1             rule 2
   n         M1      CPUt      M1      CPUt       M1    elapsedTime
  100       98.8    110.6     99.9     42.0      99.8         50
  150       98.6    225.0     99.6    148.8      99.6        200
  200       96.7    495.2     99.0    274.6      99.2        400
  250       95.3    740.2     97.9    379.6      98.4        700
  300       94.8   1693.0     97.1    665.6      97.8       1500
  350       95.7   2336.0     96.9    860.8      98.0       2000
  400       92.8   2949.8     95.1    925.4      97.3       2500
  450       91.6   5896.4     94.3   1935.2      96.9       5500
  500       91.8   6297.6     92.2   1633.2      96.3       6000

Fig. 23.10. Comparison with EMO01.
uniquely of elite solutions. Based on simple principles, our heuristic is easy to implement and involves only two parameters, which require no tuning phase. In the first step, a set of supported efficient solutions is computed; this set composes the initial population of elite solutions. In the second step, this population undergoes a generation process involving the three opera-
tors. Through computational experiments with the bi-objective assignment problem and the bi-objective knapsack problem, we have verified that, in comparison with other heuristics, our proposed heuristic is able to produce an excellent approximation of the efficient frontier even given a short computing time (i.e. a small number of iterations). Several perspectives for further research exist. The first concerns the elaboration of the genetic information. In order to handle larger instances more efficiently, it would be useful to divide the genetic information into sectors along the efficient frontier, using a region-based principle3. Such a technique would help maintain a locally representative genetic map and avoid a genetic map sterilized by too many diverse solutions. A second promising path would be to design a less random path-relinking operator: generating several neighbors according to a given characteristic, and selecting the best neighbor from among them, could make path-relinking even more powerful. A third possibility concerns the local search. In the current version, a local search is systematically applied to a solution located in the promising zone; the same solution can be visited several times and thus repeatedly generate the same neighbors. Filtering solutions before starting the local search would help to reduce computation time. Lastly, further experimentation with a broader class of combinatorial optimization problems would help to confirm the performance of our heuristic.

References

1. R. K. Ahuja, T. L. Magnanti, and J. B. Orlin. Network Flows: Theory, Algorithms, and Applications. Prentice-Hall, 1993.
2. C. Coello, D. Van Veldhuizen and G. Lamont. Evolutionary Algorithms for Solving Multi-Objective Problems. Kluwer Academic Publishers, New York, 2002.
3. D.W. Corne, N.R. Jerram, J.D. Knowles, and M.J. Oates. PESA-II: Region-based Selection in Evolutionary Multiobjective Optimization.
In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2001), pp. 283-290, Morgan Kaufmann Publishers, 2001.
4. P. Czyzak and A. Jaszkiewicz. A multiobjective metaheuristic approach to the localization of a chain of petrol stations by the capital budgeting model. Control and Cybernetics, 25(1):177-187, 1996.
5. F. Degoutin and X. Gandibleux. Un retour d'experiences sur la resolution de problemes combinatoires bi-objectifs. 5e journee du groupe de travail Programmation Mathematique MultiObjectif (PM2O), Angers, France, May 2002.
6. M. Dorigo and G. Di Caro. The Ant Colony Optimization Meta-Heuristic. In D. Corne, M. Dorigo and F. Glover, editors, New Ideas in Optimization,
McGraw-Hill, 11-32, 1999.
7. M. Ehrgott. Multiple Criteria Optimization - Classification and Methodology. Shaker Verlag, Aachen, 1997.
8. M. Ehrgott and X. Gandibleux. Multiobjective Combinatorial Optimization. In Multiple Criteria Optimization: State of the Art Annotated Bibliographic Surveys (M. Ehrgott and X. Gandibleux, Eds.), pp. 369-444, Kluwer's International Series in Operations Research and Management Science, Volume 52, Kluwer Academic Publishers, Boston, 2002.
9. X. Gandibleux, N. Mezdaoui, and A. Freville. A tabu search procedure to solve multiobjective combinatorial optimization problems. In R. Caballero, F. Ruiz, and R. Steuer, editors, Advances in Multiple Objective and Goal Programming, volume 455 of Lecture Notes in Economics and Mathematical Systems, pages 291-300. Springer Verlag, Berlin, 1997.
10. X. Gandibleux, H. Morita, N. Katoh. The Supported Solutions Used as a Genetic Information in a Population Heuristic. In Evolutionary Multi-Criterion Optimization (E. Zitzler, K. Deb, L. Thiele, C. Coello, D. Corne, Eds.). Lecture Notes in Computer Science 1993, pp. 429-442, Springer, 2001.
11. X. Gandibleux, H. Morita, N. Katoh. Use of a genetic heritage for solving the assignment problem with two objectives. In Evolutionary Multi-Criterion Optimization (C. Fonseca, P. Fleming, E. Zitzler, K. Deb, L. Thiele, Eds.). Lecture Notes in Computer Science 2632, pp. 43-57, Springer, 2003.
12. X. Gandibleux, H. Morita, N. Katoh. Impact of clusters, path-relinking and mutation operators on the heuristic using a genetic heritage for solving assignment problems with two objectives. MIC2003 Fifth Metaheuristics International Conference, Kyoto, Japan, August 25-28, 2003.
13. X. Gandibleux, H. Morita, and N. Katoh. A population-based metaheuristic for solving assignment problems with two objectives. Technical Report no. 7/2003/ROI, LAMIH, Universite de Valenciennes, 2003. To appear in Journal of Mathematical Modelling and Algorithms.
14. X. Gandibleux, M. Sevaux, K.
Sorensen and V. T'kindt (Eds.). Metaheuristics for Multiobjective Optimisation. Proceedings of the workshop "MOMH: Multiple Objective MetaHeuristics", November 4-5, 2002, Carre des Sciences, Paris. Lecture Notes in Economics and Mathematical Systems 535, 249 pages, Springer, Berlin.
15. F. Glover and M. Laguna. Tabu Search. Kluwer Academic Publishers, Boston, 1997.
16. M.P. Hansen. Tabu search for multiobjective combinatorial optimization: TAMOCO. Control and Cybernetics, 29(3):799-818, 2000.
17. P. Hansen. Bicriterion path problems. In Multiple Criteria Decision Making Theory and Application (G. Fandel and T. Gal, Eds.). Lecture Notes in Economics and Mathematical Systems 177, pp. 109-127. Springer Verlag, Berlin, 1979.
18. B. Malakooti, J. Wang, and E.C. Tandler. A sensor-based accelerated approach for multi-attribute machinability and tool life evaluation. International Journal of Production Research, 28:2373, 1990.
19. S. Martello and P. Toth. Knapsack Problems: Algorithms and Computer Im-
plementations. John Wiley & Sons, Chichester, 1990.
20. J.D. Schaffer. Multiple objective optimization with vector evaluated genetic algorithms. In J.J. Grefenstette, editor, Genetic Algorithms and their Applications: Proceedings of the First International Conference on Genetic Algorithms, 93-100. Lawrence Erlbaum, Pittsburgh, 1985.
21. P. Serafini. Simulated annealing for multiobjective optimization problems. In Proceedings of the 10th International Conference on Multiple Criteria Decision Making, Taipei, Taiwan, volume I, pp. 87-96, 1992.
22. D. Tuyttens, J. Teghem, Ph. Fortemps and K. Van Nieuwenhuyse. Performance of the MOSA method for the bicriteria assignment problem. Journal of Heuristics, 6:295-310, 2000.
23. E.L. Ulungu. Optimisation combinatoire multicritere: Determination de l'ensemble des solutions efficaces et methodes interactives. Universite de Mons-Hainaut, Faculte des Sciences, 313 pages, 1993.
CHAPTER 24

MULTI-OBJECTIVE RECTANGULAR PACKING PROBLEM
Shinya Watanabe
Department of Human and Computational Intelligence, Ritsumeikan University, 1-1-1 Nojihigashi, Kusatsu, Shiga 525-8577, Japan. E-mail: sin@sys.ci.ritsumei.ac.jp

Tomoyuki Hiroyasu
Department of Engineering, Doshisha University, 1-3 Tatara Miyakodani, Kyo-tanabe, Kyoto 610-0321, Japan. E-mail: [email protected]

This chapter describes an implementation of a Multi-Objective Genetic Algorithm (MOGA) for the Multi-Objective Rectangular Packing Problem (RP). RP is a well-known discrete combinatorial optimization problem arising in many applications, such as floor-planning in LSI design, the truck packing problem, etc. Over the last 20 years, Evolutionary Algorithms (EAs), including Genetic Algorithms (GAs), have been applied to RP, as EAs are well suited to pattern generation. Moreover, many instances of RP have become multi-objective optimization problems; for example, floor-planning problems should take care of the minimum layout area, the minimum length of wires, etc. Therefore, RP is a very important problem as an application of MOGA. In this chapter, we describe the application of MOGA to the Multi-Objective RP. We treat RP as a two-objective optimization problem in order to achieve several critical layout patterns, which have different aspect ratios of the packing area. We used the Neighborhood Cultivation GA (NCGA) as the MOGA algorithm. NCGA includes not only the mechanisms of effective algorithms such as NSGA-II and SPEA2, but also a mechanism of neighborhood crossover. The results were compared to those obtained using other methods. Through numerical examples, we found that MOGA is a very effective method for RP. In particular, NCGA can provide the best
S. Watanabe and T. Hiroyasu
solutions as compared to other methods.
24.1. Introduction

In this chapter, we describe the implementation of a Multi-Objective Genetic Algorithm (MOGA) for Multi-Objective Rectangular Packing (RP). RP is a well-known discrete combinatorial optimization problem arising in many applications, such as the VLSI layout problem2,4,10,12,15,16 and the truck packing problem14. As RP is a well-known NP-hard discrete problem, good heuristic methods, such as Genetic Algorithms (GAs) or Simulated Annealing (SA), are generally applied. The VLSI layout problem is one of the most important RP applications, because many VLSI layout problems, such as chip floor planning, standard cell, macro cell digital placement, and analog placement, share the same goal of optimally packing arbitrarily sized blocks. In addition, layout complexity is becoming an important design consideration, as VLSI device integration doubles every two to three years. Moreover, a floor-layout problem is essentially a multi-objective optimization problem involving the minimum layout area, the minimum length of wires, the minimum overlapping area, etc. Therefore, RP is a very important problem as an application of a MOGA. As the variety of packings is infinite, the key to successful optimization is the introduction of a finite solution space that includes an optimal solution. We used a sequence-pair to represent a solution of rectangular packing; sequence-pair schemes can represent not only slicing structures but also non-slicing structures. In this chapter, we describe the application of MOGA to Multi-Objective RP. We treat RP as a two-objective optimization problem to achieve several critical layout patterns, which have different aspect ratios of the packing area. We used the Neighborhood Cultivation GA (NCGA) as the MOGA algorithm17. NCGA includes not only the mechanisms of effective algorithms, such as NSGA-II and SPEA2, but also the mechanism of neighborhood crossover.
This model can derive good nondominated solutions on typical multi-objective optimization test problems. The results were compared to those obtained with other methods: SPEA2, NSGA-II, and non-NCGA (NCGA without neighborhood crossover). Through numerical examples, we found that MOGA is a very effective method for RP, because several good solutions were found with a small number of iterations in one trial. NCGA obtained the best solutions, with a small layout area, as compared
Multi-Objective Rectangular Packing Problem
to the other methods. Following this section, we introduce the formulation of RP (Section 24.2) and then discuss the application of GA to RP (Section 24.3). Section 24.4 introduces our proposed NCGA. Finally, Section 24.5 presents the results of the experiments on test data.

24.2. Formulation of Layout Problems

Many layout problems in the real world can be treated as rectangular packing problems (RP). RP involves the placement of a given set of rectangular blocks of arbitrary size, without overlap, on a plane within a rectangle of minimum area, and is a well-known discrete combinatorial optimization problem in many applications, such as VLSI layout problems10.
There have been a number of previous studies of multi-objective RP, for example in the structural synthesis of cell-based VLSI circuits1, the placement of power electronic devices on liquid-cooled heat sinks6, and the truck packing problem14. In this chapter, we treat RP as a bi-objective optimization problem. This multi-objective RP aims to minimize not only the packing area but also both the width and the height of the packing area. With this formulation, we can obtain various Pareto solutions that have different aspect ratios by performing a single search; therefore, a decision maker can select the aspect ratio of the packing area.
Fig. 24.1. Example of placement.
Next, we will describe the formulation of the multi-objective RP adopted in this chapter.
min f1(x) = width of packing area of blocks
min f2(x) = length of packing area of blocks

These two objectives are in a tradeoff relation with each other.

24.3. Genetic Layout Optimization

GAs have been applied to various aspects of digital VLSI design. Examples include cell placement (layout)4,10,11,12,15,16, channel routing9, test pattern generation13, etc. There are two key issues in the use of a GA for RP:

• Representation of individuals
• GA operators

These issues have a strong influence on the search ability of the GA; if they are not carefully considered, it is not possible to obtain good results in a realistic time. The following sections describe these considerations in some detail.
Fig. 24.2. Slicing structure and Polish expression.
24.3.1. Representations

There are two distinct spatial representations of placement configurations2. The first is the so-called flat or absolute representation, used in earlier studies4. In this method, block positions are defined in terms of absolute coordinates on a gridless plane. As this method allows blocks to overlap in possibly illegal ways, it uses a weighted penalty cost term associated with infeasible overlaps, and this penalty must be driven to zero during the optimization process. However, the total overlap in the final placement solution is not necessarily zero. In addition, the weighted penalty cost must be chosen carefully: if it is too small, the blocks may tend to collapse onto one another, while if it is too large we may not obtain good search ability. Moreover, the packing variety of this method is infinite. In contrast to the flat representation, in topological representations block positions are specified in a relative manner. The most common representations are based on the slicing model, which assumes the blocks are organized in a set of slices that recursively bisect the layout horizontally and vertically. The direction and nesting of the slices is recorded in a slicing tree or, equivalently, in a normalized Polish expression15. In this method blocks cannot overlap, which may lead to improved efficiency in placement optimization. However, this representation is restricted to slicing floor-plan topologies: if the optimal solution is non-slicing, it cannot be obtained. Fig. 24.2 shows an example of a slicing structure. Recently, the sequence-pair, first suggested by Murata et al.10, and the bounded-sliceline grid (BSG), proposed by Nakatake et al.11, have been
proposed as solutions to this problem. The sequence-pair encodes the "left-right" and "up-down" positioning relations between blocks using two sequences of blocks. BSG can define orthogonal relations between blocks without physical dimensions. These methods are particularly suitable for stochastic algorithms such as GA and simulated annealing (SA), and these encoding schemes can represent not only slicing structures but also non-slicing structures. In this chapter, we used the sequence-pair as the representation of a solution, as it allows more effective searches than BSG: the number of all sequence-pair combinations is smaller than that of BSG configurations.

24.3.1.1. Sequence-Pair

The sequence-pair is used to represent the solution of rectangular packing. Each block has a sequence-pair (Γ−, Γ+). Fig. 24.3 shows an example of a sequence-pair. To express relative positions, blocks are located on the sequence-pair surface. This surface consists of two axes, Γ− and Γ+, which are not laid out vertically and horizontally but are rotated 45 degrees. The relative position of two blocks is determined by comparing their sequence-pair coordinates. Let blocks A and B have the sequence pairs (xa−, ya+) and (xb−, yb+), respectively. Then the positions of the blocks relate to the sequence pairs as follows:

when xa− < xb− and ya+ < yb+, A is on the left side of B;
when xa− > xb− and ya+ > yb+, A is on the right side of B;
when xa− < xb− and ya+ > yb+, A is on the upper side of B;
when xa− > xb− and ya+ < yb+, A is on the bottom side of B.

In addition to the sequence-pair, each block has orientation information θ, which indicates the direction of the block arrangement.

24.3.1.2. Encoding System

A gene of the GA consists of three parts: Γ−, Γ+, and θ. Fig. 24.3 shows the encoding for 6 blocks. The relative position (b) is derived from the encoding information (Fig. 24.3(c)); this position yields the floor plan (a).
In this chapter, each block is placed either lengthwise or breadthwise; therefore, θ takes a value of 0 or 1.
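The positioning rules above translate directly into code. The sketch below is an illustration only; it assumes that the index of a block in each of the two sequences serves as its coordinate on the oblique grid:

```python
def relative_position(gamma_minus, gamma_plus, a, b):
    """Relative position of block a with respect to block b under the
    sequence-pair (gamma_minus, gamma_plus), given as lists of block ids.
    The index of a block in each sequence plays the role of its
    coordinate on the 45-degree oblique grid."""
    xa, xb = gamma_minus.index(a), gamma_minus.index(b)
    ya, yb = gamma_plus.index(a), gamma_plus.index(b)
    if xa < xb and ya < yb:
        return "left"     # a is on the left side of b
    if xa > xb and ya > yb:
        return "right"
    if xa < xb and ya > yb:
        return "above"
    return "below"
```

Applying this test to every pair of blocks recovers the full relative placement encoded by a sequence-pair.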
Fig. 24.3. Encoding example of sequence-pair.
24.3.2. GA Operators

For an effective search, it is necessary to choose appropriate operators. In particular crossover, the primary engine of the optimization, must be chosen carefully. The traditional one-point crossover cannot be applied without modification to combinatorial problems such as RP or the TSP. Several crossovers for combinatorial problems have been proposed previously. Three of the most commonly used crossover methods for combinatorial problems16 can be described as follows (Fig. 24.4 illustrates these crossovers):

Order crossover (OX): Pass the left segment from parent 1. Construct the right segment by taking the remaining blocks from parent 2 in the same order.

Partially mapped crossover (PMX): The right segments of both parents act as a partial mapping of pairwise exchanges to be performed on parent 1 to generate the offspring.

Cycle crossover (CX): Start with the cell in location 1 of parent 1 (or
Fig. 24.4. Crossover operators: (a) order crossover, (b) PMX crossover, (c) cycle crossover.
any other reference point) and copy it to location 1 of the offspring. The block found at location 1 of parent 2 is then searched for in parent 1 and passed on to the offspring from its position there. This process continues until a cycle is completed, i.e., until we reach a block that has already been passed. However, these crossovers cannot provide an efficient search using sequence-pairs, as they do not take into account the features of sequence-pairs*. Therefore, an effective crossover must consider a position on the oblique grid (Γ−, Γ+) that is set by the two sequences of blocks. Nakaya et al. proposed a new crossover for the sequence-pair, known as Placement-based Partially Exchanging Crossover (PPEX)12.
* In this chapter, we do not describe the performance of these crossover operators. In our previous experience using sequence-pairs, however, OX obtains the best solutions compared to the other methods, while CX does not provide good solutions.
24.3.2.1. Placement-Based Partially Exchanging Crossover

Here, we used the Placement-based Partially Exchanging Crossover (PPEX)12. PPEX builds a window-territory located in the neighborhood of blocks chosen at random. This window-territory is a contiguous part of the oblique grid defined by the sequence-pair. PPEX performs a crossover that exchanges blocks within this window-territory; it can therefore only exchange blocks in neighboring positions. The PPEX procedure is as follows.

Step 1: Two blocks are chosen randomly as parent blocks.
Step 2: The window-territory is created in the neighborhood of the chosen blocks. Let Mc be the set of blocks within the window-territory and Mnc be the rest of the blocks.
Step 3: Each block of Mc is exchanged according to the sequence of its partner parent and is copied to the child.
Step 4: The blocks of Mnc are copied directly to the child.

Fig. 24.5 illustrates PPEX with a window-territory of size 4. In Parent 2, blocks a and e are chosen for Mc, and the blocks of Mc are exchanged. In this exchange, the relative position in the other parent is referenced, and these blocks are copied to the child: with the location information of Parent 1, a, e and f are moved and then copied to child 2.

24.3.2.2. Mutation Operator

As the mutation operator, we use a bit flip of the block orientation θ: if θ is 1, it becomes 0, and if θ is 0, it becomes 1.

24.4. Multi-Objective Optimization Problems by Genetic Algorithms and Neighborhood Cultivation GA

24.4.1. Multi-Objective Optimization Problems and Genetic Algorithm
Several objectives are considered in multi-objective optimization problems. These objectives usually cannot all be minimized or maximized at the same time, due to tradeoff relationships among them7. Therefore, one of the goals of multi-objective optimization is to find the set of Pareto optimal solutions.
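The Pareto-optimal subset of a finite set of objective vectors can be extracted with a simple dominance filter; the sketch below assumes the bi-objective minimization case, purely for illustration:

```python
def nondominated(points):
    """Return the Pareto-optimal subset of bi-objective vectors
    (minimization assumed): points not dominated by any other point."""
    def dominates(a, b):  # a dominates b
        return a[0] <= b[0] and a[1] <= b[1] and a != b
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

This quadratic filter is enough for small populations; MOGAs such as NSGA-II and SPEA2 embed faster, more elaborate versions of the same test.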
Fig. 24.5. Placement-based Partially Exchanging Crossover (PPEX).
The Genetic Algorithm simulates the heredity and evolution of living things7. As it is a multi-point search method, an optimum solution can be found even when the landscape of the objective function is multi-modal, and a Pareto optimal set can be found in a single trial in multi-objective optimization. As a result, the GA is a very effective tool for multi-objective optimization problems. There is a great deal of research concerned with multi-objective GAs, and many new evolutionary algorithms for multi-objective optimization have been developed recently3,5,7,8,18. Multi-objective genetic algorithms can be roughly divided into two categories: algorithms that treat Pareto optimal solutions implicitly, and those that treat Pareto optimal solutions explicitly7. Many of the newest methods treat Pareto optimal solutions explicitly. Typical algorithms that treat Pareto optimal solutions explicitly include
NSGA-II5 and SPEA218. These algorithms share the following schemes:

1) A mechanism for retaining nondominated solutions
2) A cut-down (sharing) method for maintaining diversity among the retained nondominated solutions
3) A unification mechanism for the values of each objective

These mechanisms derive good Pareto optimal solutions; consequently, a competitive multi-objective genetic algorithm should have all of them.

24.4.2. Neighborhood Cultivation Genetic Algorithm
In this section, we describe the mechanism of a new algorithm called the Neighborhood Cultivation Genetic Algorithm (NCGA). NCGA has a neighborhood crossover mechanism in addition to the mechanisms of GAs explained in the previous section. In GAs, exploration and exploitation are both very important: by exploitation, optimum solutions are sought around the elite solutions, while by exploration, optimum solutions are sought in the global area. In NCGA, the exploitation effect of the crossover is reinforced: a pair of individuals for crossover is not chosen randomly, but individuals that are close to each other are chosen. As a result of this operation, the child individuals generated by the crossover are likely to be close to the parent individuals, and therefore precise exploitation is expected. Let Pt denote the search population at generation t, and At the archive population at generation t. Using this notation, the overall flow of NCGA can be described as follows.
Step 1: Initialization: Generate an initial population P0 of size N. Set t = 0. Calculate the fitness values of the initial individuals in P0. Copy P0 into A0; the archive size is also N.
Step 2: Start new generation: set t = t + 1.
Step 3: Generate new search population: Pt = A(t-1).
Step 4: Sorting: The individuals of Pt are sorted according to the values of the focused objective. The focused objective is changed at every generation. For example, when there are three objectives, the first objective is focused in the first generation and the third objective is focused in the third generation; the first objective is focused again in the fourth generation.
Step 5: Grouping: Pt is divided into groups consisting of two individuals.
Fig. 24.6. Neighborhood crossover.
These two individuals are chosen from the top to the bottom of the sorted individuals.
Step 6: Crossover and Mutation: In each group, crossover and mutation operations are performed. From the two parent individuals, two child individuals are generated, and the parent individuals are eliminated.
Step 7: Evaluation: All of the objectives of the individuals are derived.
Step 8: Assembling: All the individuals are assembled into one group, and this becomes the new Pt.
Step 9: Renewing archives: Assemble Pt and A(t-1) together. N individuals are chosen from these 2N individuals. To reduce the number of individuals, the environment selection of SPEA2 is performed. In NCGA, this environment selection acts as the selection operation.
Step 10: Termination: Check the terminal condition. If it is satisfied, the simulation is terminated; if not, the simulation returns to Step 2.
In NCGA, most of the genetic operations are performed in groups consisting of two individuals. The neighborhood crossover performs crossover on a population sorted according to the values of the focused objective. Since two adjacent individuals of the sorted population are relatively close to each other in objective space, a "neighborhood crossover" is realized by pairing adjacent individuals. The concept of neighborhood crossover is shown in Fig. 24.6.
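Steps 2-9 above can be condensed into a single generation function. The sketch below is illustrative only: `evaluate`, `crossover`, `mutate` and `environment_selection` are placeholder callables standing in for the operators described in the text (PPEX, the bit-flip mutation, and SPEA2's environment selection).

```python
def ncga_generation(archive, t, num_objectives, evaluate, crossover, mutate,
                    environment_selection):
    """One NCGA generation, as a sketch.

    archive: list of individuals from generation t-1 (the A(t-1) population).
    evaluate(ind) -> tuple of objective values (minimized).
    """
    focused = t % num_objectives                 # Step 4: rotate the objective
    pop = sorted(archive, key=lambda ind: evaluate(ind)[focused])
    next_pop = []
    for i in range(0, len(pop) - 1, 2):          # Step 5: adjacent pairs
        c1, c2 = crossover(pop[i], pop[i + 1])   # Step 6: neighborhood crossover
        next_pop += [mutate(c1), mutate(c2)]     # parents are discarded
    if len(pop) % 2:                             # odd leftover individual
        next_pop.append(mutate(pop[-1]))
    # Step 9: choose N survivors from Pt + A(t-1) (SPEA2-style selection)
    return environment_selection(next_pop + archive, len(archive))
```

Rotating `focused` with the generation counter is what realizes the "focused objective is changed at every generation" rule of Step 4.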
However, if the focused objective has completely converged, applying crossover to a pair of individuals may cause no change in the final stages of the search. Therefore, we use the following techniques within our crossover operator:
1) The focused objective is changed at every generation.
2) The sorted population is slightly disturbed by a "neighborhood shuffle".
The focused objective is rotated at every generation. For example, when there are three objectives, the first objective is focused in the first generation and the third objective is focused in the third generation; the first objective is focused again in the fourth generation. The "neighborhood shuffle" randomly shuffles the population within a definite range, defined as 10 percent of the population size. For example, when the population size is 100, the population is randomly shuffled within a range of size 10. With these techniques, the parents subject to crossover change at every generation even if the population itself stays unchanged, and the exchange of individuals becomes more active. The following features of NCGA distinguish it from SPEA2 and NSGA-II:
1) NCGA has a neighborhood crossover mechanism.
2) NCGA has only environment selection and does not have mating selection 15.
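The neighborhood shuffle can be sketched as below. The chapter only specifies the range size (10 percent of the population); shuffling within consecutive non-overlapping windows of that size is our interpretation, not necessarily the authors' exact scheme.

```python
import random

def neighborhood_shuffle(sorted_pop, fraction=0.1):
    """Randomly shuffle a sorted population within windows whose size is
    fraction * population size (10% in the chapter)."""
    pop = list(sorted_pop)
    w = max(1, int(len(pop) * fraction))
    for start in range(0, len(pop), w):
        window = pop[start:start + w]
        random.shuffle(window)            # disturb order only locally
        pop[start:start + w] = window
    return pop
```

Because each individual stays inside its window, the population remains approximately sorted by the focused objective, so neighborhood crossover still pairs near neighbors.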
24.5. Numerical Examples

In this chapter, we describe the application of NCGA to some numerical experiments. We used four instances of this problem: ami33, ami49, rdm100, and rdm500. The instances ami33 and ami49, whose data are in the MCNC benchmark, consist of 33 and 49 blocks (rectangles), respectively. The instances rdm100 and rdm500 were randomly generated and have 100 and 500 rectangles, respectively. The results were compared with those of SPEA2 18, NSGA-II 5, and non-NCGA. Non-NCGA is the same algorithm as NCGA but without neighborhood crossover. If there are diverse solutions that have the same design variables, neighborhood crossover may not perform effectively. Therefore, the search population (Pt) is produced by making a copy of the archive population (At).
Table 24.96. GA parameters

  population size      200
  crossover rate       1.0
  mutation rate        1/bit length
  terminal generation  400
Fig. 24.7. Sampling of the Pareto frontier lines of intersection
24.5.1. Parameters of GAs
Table 24.96 displays the GA parameters used. We used the previously described GA operators: PPEX and the bit flip of block orientation. The length of the chromosome is three times the number of blocks.

24.5.2. Evaluation Methods

To compare the results obtained by each algorithm, the following evaluation methods were used.

24.5.2.1. Sampling of the Pareto Frontier Lines of Intersection (ILI)

This comparison method was reported by Knowles and Corne 8. The concept of this method is shown in Fig. 24.7, which illustrates two solution sets X and Y derived by different methods. The comparison proceeds in three steps. First, the attainment surfaces defined by the approximation sets are calculated. Second, uniform sampling lines covering the Pareto tradeoff area are defined; for each line, the intersections of the line with the attainment surfaces of the derived sets are obtained and compared. Finally, the Indication of Lines of Intersection (ILI) is derived. When the
Fig. 24.8. Example of IMMA.
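The IMMA measure illustrated in Fig. 24.8 (maximum, minimum and average value of each objective over the derived solutions, described in Sec. 24.5.2.2) reduces to simple per-objective statistics. A minimal sketch:

```python
def imma(solutions):
    """Maximum, minimum and average of each objective over a set of
    solutions (objective vectors), as used by the IMMA measure."""
    return [{"min": min(vals), "max": max(vals), "avg": sum(vals) / len(vals)}
            for vals in zip(*solutions)]   # zip(*...) groups values by objective
```

For example, `imma([(1, 4), (3, 2), (2, 3)])` reports min 1, max 3, average 2.0 for the first objective and min 2, max 4, average 3.0 for the second.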
two approximation sets X and Y are considered, ILI(X,Y) indicates the average number of sampling lines on which points of X are ranked higher than points of Y. Therefore, the most significant outcome would be ILI(X,Y) = 1.0 and ILI(Y,X) = 0.0. To focus only on the Pareto tradeoff area defined by the approximation sets and to derive an intuitive evaluation value, the following terms are considered:
• The objective values of the approximation sets are normalized.
• The sampling lines are located in the area where the approximation sets exist.
• Many sampling lines are prepared. In the following experiment, 1000 lines were used.

24.5.2.2. Maximum, Minimum and Average Values of Each Objective of the Derived Solutions (IMMA)

To evaluate the derived solutions, not only the accuracy but also the spread of the solutions is important. To discuss the spread of the solutions, the maximum, minimum, and average values of each objective are considered. Figure 24.8 shows an example of this measurement: the maximum and minimum values of the objective function are illustrated, and the average value is shown as a circle.

24.5.3. Results

In this chapter, we examined four types of problem: ami33, ami49, rdm100, and rdm500. In this section, we discuss only the instances ami33 and rdm500. The proposed NCGA, SPEA2, NSGA-II, and non-NCGA (NCGA without
Fig. 24.9. Placement of the blocks (ami33).
neighborhood crossover) were applied to these problems. Thirty trials were performed, and all results shown are the averages of the 30 trials.

24.5.3.1. Layout of the Solution

It should be verified whether the solutions derived by the algorithm are appropriate placements of blocks. In this section, we focus on ami33, which consists of 33 blocks. The placement of ami33 obtained from solutions of NCGA is shown in Fig. 24.9, where some typical solutions are illustrated. As this is a combinatorial problem with N! x N! x 2^N candidate solutions for N blocks, the true optimum solutions could not be derived. In this experiment, 80,000 function calls (200 individuals and 400 generations) were performed. These results appear reasonable, as there are very few blank spaces. We also used the sequence-pair and PPEX to derive good solutions, as these techniques are very suitable
Fig. 24.11. IMMA of ami33.
for GAs and RP.

24.5.3.2. ami33

The results of ami33 for ILI are shown in Fig. 24.10, and those for IMMA are shown in Fig. 24.11. Fig. 24.12 shows the nondominated solutions of each algorithm; all nondominated solutions derived from the 30 trials are plotted. The ILI values in Fig. 24.10 indicate that the solutions of NCGA are closer to the real Pareto solutions than those obtained by the other methods. This is also confirmed by the plots of the nondominated solutions (Fig. 24.12). It is also clear from the IMMA values in Fig. 24.11 that NCGA and non-NCGA can find more widely spread nondominated solutions than the other methods. Non-NCGA can obtain widely spread nondominated solutions; however, compared to the real Pareto solutions, non-NCGA is not ideal. This result
Fig. 24.12. Nondominated solutions (ami33).
Fig. 24.13. Results of ILI (rdm500).
shows that neighborhood crossover can derive good solutions in RP.

24.5.3.3. rdm500

The results of rdm500 are shown in Fig. 24.13 and Fig. 24.14, and Fig. 24.15 illustrates the nondominated solutions of the different algorithms. The results for this problem show a similar trend to those of the previous problem. From Fig. 24.13 and Fig. 24.15, it is clear that NCGA
Fig. 24.14. IMMA of rdm500
Fig. 24.15. Nondominated solutions (rdm500).
obtained a better value of ILI, i.e., the solutions of NCGA were much better than those of the other methods. As in the previous problem, the solutions of non-NCGA were far from the real Pareto front. Therefore, neighborhood crossover was very effective in deriving good solutions in RP, irrespective of the number of blocks. On the other hand, in this problem, the solutions of SPEA2 and NSGA-II were gathered around the center of the Pareto front. These observations indicate that SPEA2 and NSGA-II tend to concentrate on one part of the Pareto front when the number of blocks is very large. On the other hand,
Fig. 24.14 and Fig. 24.15 indicate that NCGA and non-NCGA maintained a high degree of diversity in their solutions during the search, even when the number of blocks was very large.

24.6. Conclusion

In this chapter, we described the implementation of MOGA for the Multi-Objective Rectangular Packing Problem (RP). We described the formulation of RP, the implementation of GA for RP, and our experience with RP using GA. The main issues associated with applying GA to RP are the representation of a solution and the appropriate GA operators. In this chapter, we explained the sequence-pair as an effective representation of placement, and PPEX as an effective crossover when using the sequence-pair. In addition, based on our experience using GA for RP, the Neighborhood Cultivation GA (NCGA), which has not only the important mechanisms of the other methods but also the mechanism of neighborhood crossover, was applied to multi-objective RP. We confirmed that MOGA is a very effective method for RP, and that NCGA obtains the best solutions compared to the other methods. Through numerical examples, the following points were clarified.
1) The RP described in this chapter is a large-scale problem. For this problem, a reasonable solution was derived at a small calculation cost. It is assumed that the sequence-pair and PPEX work well in this problem.
2) In almost all the test functions, the results of NCGA were superior to those of the other methods. From this result, it can be concluded that NCGA is a good method for RP.
3) NCGA was clearly superior to NCGA without neighborhood crossover in all problems. The results emphasize that neighborhood crossover allows the derivation of good solutions in RP.
4) When the number of blocks is very large, the solutions of SPEA2 and NSGA-II tend to concentrate in the center of the Pareto front, whereas NCGA and non-NCGA retain diversity of the solutions.

References

1. T. Arslan, D. H. Horrocks, and E. Ozdemir.
Structural synthesis of cell-based VLSI circuits using a multi-objective genetic algorithm. In IEE Electronics Letters, volume 32, pages 651-652, 1996.
2. F. Balasa and K. Lampaert. Symmetry within the sequence-pair representation in the context of placement for analog design. In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, volume 19, pages 721-731, 2000. 3. C. A. Coello Coello, D. A. Van Veldhuizen, and G. B. Lamont. Evolutionary Algorithms for Solving Multi-Objective Problems. Kluwer Academic Publishers, New York, May 2002. ISBN 0-3064-6762-3. 4. J. P. Cohoon and W. D. Paris. Genetic placement. In Proceedings of the IEEE International Conference on Computer-Aided Design, pages 422-425, 1986. 5. K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2):182-197, April 2002. 6. D. Gopinath, Y. K. Joshi, and S. Azarm. Multi-objective placement optimization of power electronic devices on liquid cooled heat sinks. In the Seventeenth Annual IEEE Symposium on Semiconductor Thermal Measurement and Management, pages 117-119, 2001. 7. K. Deb. Multi-Objective Optimization using Evolutionary Algorithms. Chichester, UK: Wiley, 2001. 8. J. D. Knowles and D. W. Corne. Approximating the Nondominated Front Using the Pareto Archived Evolution Strategy. In Evolutionary Computation, volume 8, pages 149-172, 2000. 9. J. Lienig and K. Thulasiraman. A genetic algorithm for channel routing in VLSI circuits. In Evolutionary Computation, volume 1, pages 293-311, 1994. 10. H. Murata, K. Fujiyoshi, S. Nakatake, and Y. Kajitani. VLSI Module Placement Based on Rectangle-Packing by the Sequence-Pair. In IEEE Transactions on Computer-Aided Design, volume 15, pages 1518-1524, 1996. 11. S. Nakatake, H. Murata, K. Fujiyoshi, and Y. Kajitani. Module Placement on BSG-Structure and IC Layout Applications. In Proc. of International Conference on Computer-Aided Design '96, pages 484-491, 1996. 12. S. Nakaya, S. Wakabayashi, and T. Koide. An adaptive genetic algorithm for VLSI floorplanning based on sequence-pair.
In 2000 IEEE International Symposium on Circuits and Systems (ISCAS 2000), volume 3, pages 65-68, 2000. 13. M. J. O'Dare and T. Arslan. Generating test patterns for VLSI circuits using a genetic algorithm. In IEE Electronics Letters, volume 30, pages 778-779, 1994. 14. P. Grignon, J. Wodziack, and G. M. Fadel. Bi-objective optimization of components packing using a genetic algorithm. In NASA/AIAA/ISSMO Multidisciplinary Design and Optimization Conference, pages 352-362, 1996. 15. V. Schnecke and O. Vornberger. An adaptive parallel genetic algorithm for VLSI-layout optimization. In 4th Conf. Parallel Problem Solving from Nature (PPSN IV), pages 859-868, 1996. 16. K. Shahookar and P. Mazumder. A genetic approach to standard cell placement using meta-genetic parameter optimization. In IEEE Transactions on Computer-Aided Design, volume 9, pages 500-511, 1990. 17. S. Watanabe, T. Hiroyasu, and M. Miki. Neighborhood cultivation genetic
algorithm for multi-objective optimization problems. In Proceedings of the 4th Asia-Pacific Conference on Simulated Evolution And Learning (SEAL2002), pages 198-202, 2002. 18. E. Zitzler, M. Laumanns, and L. Thiele. SPEA2: Improving the Strength Pareto Evolutionary Algorithm. In K. Giannakoglou, D. Tsahalis, J. Periaux, P. Papailou, and T. Fogarty, editors, EUROGEN 2001. Evolutionary Methods for Design, Optimization and Control with Applications to Industrial Problems, pages 95-100, Athens, Greece, 2002.
CHAPTER 25 MULTI-OBJECTIVE ALGORITHMS FOR ATTRIBUTE SELECTION IN DATA MINING
Gisele L. Pappa and Alex A. Freitas
Computing Laboratory, University of Kent, Canterbury CT2 7NF, UK
E-mail: {glp6,A.A.Freitas}@kent.ac.uk
http://www.cs.kent.ac.uk/people/staff/aaf

Celso A. A. Kaestner
Graduate Program in Applied Computer Science
Pontificia Universidade Catolica do Parana (PUCPR)
Rua Imaculada Conceicao, 1155, 80215-901 Curitiba - PR - Brazil
E-mail: [email protected]

Attribute selection is an important preprocessing task for the application of a classification algorithm to a given data set. This task often involves the simultaneous optimization of two or more objectives. In order to solve this problem, this chapter describes two multi-objective methods: a genetic algorithm and a forward sequential feature selection method. Both methods are based on the wrapper approach for attribute selection and were used to find the best subset of attributes that minimizes the classification error rate and the size of the decision tree built by a well-known classification algorithm, namely C4.5.
25.1. Introduction

Attribute selection is one of the most important preprocessing tasks to be performed before the application of data mining techniques. In essence, it consists of selecting a subset of attributes relevant to the target data mining task, out of all original attributes. In this work the target task is classification, where the goal is to predict the class of an example (record) given the values of the attributes describing that example. Attribute selection became essential when researchers discovered that it can improve the data mining algorithm's performance (with respect to learning speed, classification rate and/or rule set simplicity) and at the same time remove noise and decrease data dimensionality. Given the importance of attribute selection, a variety of methods have been used to find a small attribute subset capable of obtaining a better classification rate than that obtained with the entire attribute set. These methods include sequential search 1, ranking techniques 2 and evolutionary algorithms 3. Independent of the method used, solving the attribute selection problem often requires the minimization of at least two objectives: the classification error rate and a measure of size — which can be a measure of the size of the selected data (typically the number of selected attributes) and/or a measure of the size of the classifier (say, a rule set) learned from the selected data. Many attribute selection methods optimize these objectives by setting weights for each one and combining them into a single function. However, the study of multi-objective optimization has shown that, in some tasks, a weighted combination of the objectives into a single function is not the most effective approach. Particularly in tasks that deal with the optimization of conflicting objectives, such as attribute selection, the use of the Pareto dominance concept during optimization can be the best choice. Optimization based on the Pareto concept 4 suggests that, for each trade-off among the conflicting objectives to be optimized, there exists an optimal solution, so the final response of the optimization system is a set of optimal solutions instead of a single solution. This is in contrast with systems that optimize a single objective.
Hence, it is left to the user to decide which of the optimal solutions he/she considers the best to solve his/her problem, using his/her background knowledge about the problem. In this spirit, this work presents two multi-objective attribute selection algorithms based on the Pareto's dominance concept. One of them is a multi-objective genetic algorithm, and the other one is a multi-objective version of the well-known forward sequential feature selection method. Both methods use the wrapper approach (see next section) in order to minimize the error rate and the size of the decision tree built by a well-known classifier, namely C4.5. We report the results of extensive computational experiments with 18 public domain real-world data sets, comparing the performance of these
two methods. The results show that both methods effectively select good attribute subsets — by comparison with the original set of all attributes — and, somewhat surprisingly, the multi-objective forward sequential selection method is competitive with the multi-objective genetic algorithm.

25.2. Attribute Selection

As mentioned earlier, attribute selection is an important step in the knowledge discovery process and aims to select a subset of attributes that are relevant for a target data mining task. In the classification task, which is the task addressed in this work, an attribute is considered relevant if it is useful for discriminating examples belonging to different classes. The literature contains many attribute selection methods. These methods differ mainly in the search strategy they use to explore the space of candidate attribute subsets and in the way they measure the quality of a candidate attribute subset. With respect to the search strategy, the methods can be classified as exponential (e.g. exhaustive search), randomized (e.g. genetic algorithms) and sequential. The exponential methods are usually too computationally expensive, and so are not further discussed here. The sequential methods include the well-known FSS (forward sequential selection) and BSS (backward sequential selection) 5. FSS starts with an empty set of attributes (features) and iteratively selects one attribute at a time — the attribute considered most relevant for classification at the current step — until classification accuracy cannot be improved by selecting another attribute. BSS starts with the full set of original attributes and iteratively removes one attribute at a time — the attribute considered least relevant for classification at the current step — as long as classification accuracy is not decreased. We have developed a multi-objective version of the FSS method, which will be described later.
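The single-objective FSS loop just described can be sketched as follows; `accuracy` is a hypothetical wrapper callback (e.g. training the target classifier on the candidate subset and measuring validation accuracy), not part of the chapter.

```python
def forward_sequential_selection(attributes, accuracy):
    """Greedy FSS sketch: add one attribute at a time, keeping the one
    that most improves accuracy, until no addition improves it."""
    selected, best = set(), accuracy(set())
    while True:
        candidates = [(accuracy(selected | {a}), a)
                      for a in attributes if a not in selected]
        if not candidates:
            break
        score, attr = max(candidates)      # most relevant attribute this step
        if score <= best:                  # no attribute improves accuracy: stop
            break
        selected.add(attr)
        best = score
    return selected
```

BSS is the mirror image: start from the full set and greedily drop the attribute whose removal least hurts (or most helps) accuracy.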
With respect to randomized methods, in this chapter we are particularly interested in genetic algorithms, due to their ability to perform a global search in the solution space. In our case, this means that they tend to cope better with attribute interaction than greedy, local-search methods (such as the sequential methods) 3. We have also developed a multi-objective genetic algorithm (GA) for attribute selection, which will be described later. The evaluation of the quality of each candidate attribute subset can be based on two approaches: the filter or the wrapper approach. The main difference between them is that in the wrapper approach the evaluation
function uses the target classification algorithm to evaluate the quality of a candidate attribute subset. This is not the case in the filter approach, where the evaluation function is specified in a generic way, regardless of the classification algorithm. That is, in the wrapper approach the quality of a candidate attribute subset depends on the performance of the classification algorithm trained only with the selected attributes. This performance can be measured with respect to several factors, such as classification accuracy and the size of the classifier learned from the selected data. Indeed, these are the two performance measures used in this work, as will be seen later. Although the wrapper approach tends to be more computationally expensive than the filter approach, it usually obtains better predictive accuracy than the filter approach, since it finds an attribute subset "customized" for the target classification algorithm. The vast majority of GAs for attribute selection follow the wrapper approach. Table 25.97, adapted from Freitas 3, shows the criteria used in the fitness functions of a number of GAs for attribute selection following the wrapper approach. As can be observed in Table 25.97, many criteria can be used in the fitness function of a GA for attribute selection, but all the GAs mentioned in the table use classification accuracy, and many use either the number of selected attributes or the size of the classifier learned from the data. Note that only one of the GAs mentioned in Table 25.97 is a multi-objective method — all the others either try to optimize a single objective (predictive accuracy) or use some method (typically a weighted formula) to combine two or more objectives into a single objective to be optimized.

25.3. Multi-Objective Optimization

Real world problems are usually complex and require the optimization of many objectives to reach a good solution.
Unfortunately, many projects that should involve the simultaneous optimization of multiple objectives avoid the complexities of such optimization and adopt the simpler approach of weighting and combining the objectives into a single function. This simpler approach is not very effective in many cases, for at least two reasons. First, the objectives are often in conflict with each other. Second, the objectives often represent different and non-commensurate aspects of a candidate solution's quality, so that mixing them into a single formula is not semantically meaningful. Indeed, both reasons hold in our case, where the
Table 25.97. Main aspects of fitness functions of GAs for attribute selection

Reference                            Criteria used in fitness function
[Bala et al. 1995] 6                 predictive accuracy, number of selected attributes
[Bala et al. 1996] 7                 predictive accuracy, information content, number of selected attributes
[Chen et al. 1999] 8                 based first on predictive accuracy, and then on number of selected attributes
[Guerra-Salcedo & Whitley 1998] 9    predictive accuracy
[Guerra-Salcedo et al. 1999] 10      predictive accuracy
[Cherkauer & Shavlik 1996] 11        predictive accuracy, number of selected attributes, decision-tree size
[Terano & Ishino 1998] 12            subjective evaluation, predictive accuracy, rule set size
[Vafaie & DeJong 1998] 13            predictive accuracy
[Yang & Honavar 1997, 1998] 14,15    predictive accuracy, attribute cost
[Moser & Murty 2000] 16              predictive accuracy, number of selected attributes
[Ishibuchi & Nakashima 2000] 17      predictive accuracy, number of selected instances, number of selected attributes (attribute and instance selection)
[Emmanouilidis et al. 2000] 18       predictive accuracy, number of selected attributes (multi-objective evaluation)
[Rozsypal & Kubat 2003] 19           predictive accuracy, number of selected instances, number of selected attributes (attribute and instance selection)
[Llora & Garrell 2003] 20            predictive accuracy
two objectives to be minimized — classification error rate and decision-tree size — are to some extent conflicting and entirely non-commensurate. According to the multi-objective optimization concept, when many objectives are simultaneously optimized, there is no single optimal solution. Rather, there is a set of optimal solutions, each one representing a certain trade-off among the objectives 21. In this way, a system developed to solve this kind of problem returns a set of optimal solutions, and it can be left to the user to choose the one that best solves his/her specific problem. This means that the user has the opportunity of choosing the solution that represents the best trade-off among the conflicting objectives after examining several high-quality solutions. Intuitively, this is better than forcing the user to define a single trade-off before the search is performed, which is what happens when the multi-objective problem is transformed into a single-objective one.
The Pareto multi-objective optimization concept is used to find this set of optimal solutions. According to this concept, a solution S1 dominates a solution S2 if and only if 4:
• Solution S1 is not worse than solution S2 in any of the objectives;
• Solution S1 is strictly better than solution S2 in at least one of the objectives.
Figure 25.1 shows an example of possible solutions found for a multi-objective attribute selection problem. The solutions that are not dominated by any other solution are considered Pareto-optimal solutions, and they are represented by the dotted line in Figure 25.1.
Fig. 25.1. Example of Pareto dominance in a two-objective problem
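To make the dominance definition concrete, here is a small sketch with hypothetical (error rate, decision-tree size) values loosely echoing solutions A, C and D of Fig. 25.1; the actual coordinates are illustrative assumptions, not taken from the figure.

```python
def dominates(s1, s2):
    """s1 dominates s2 under minimization: s1 is not worse in any
    objective and strictly better in at least one."""
    return (all(a <= b for a, b in zip(s1, s2)) and
            any(a < b for a, b in zip(s1, s2)))

# hypothetical (error rate, decision-tree size) values
A = (0.30, 5)    # small tree, large error rate
C = (0.25, 40)   # dominated solution
D = (0.10, 30)   # small error rate, large tree
```

Here `dominates(D, C)` is True, while neither `dominates(A, D)` nor `dominates(D, A)` holds: A and D are incomparable, so both belong to the Pareto-optimal set while C does not.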
Note that Solution A has a small decision-tree size but a large error rate. Solution D has a large decision-tree size but a small error rate. Assuming that minimizing both objectives is important, one cannot say that solution A is better than D, nor vice-versa. On the other hand, solution C is clearly not a good solution, since it is dominated, for instance, by D. 25.4. The Proposed Multi-Objective Methods for Attribute Selection In the last few years, the use of multi-objective optimization has led to improved solutions for many different kinds of problems21. So, in order to evaluate the effectiveness of the multi-objective framework in the attribute selection problem for the classification task, we proposed a multi-objective
genetic algorithm 22 (MOGA) that returns a set of non-dominated solutions. We also proposed a multi-objective version of the forward sequential selection (FSS) method 23. The goal of these proposed algorithms is to find a subset of relevant attributes that leads to a reduction in both the classification error rate and the complexity (size) of the decision tree built by a data mining algorithm. The classification algorithm used in this chapter is C4.5 25, a well-known decision-tree induction algorithm. The proposed methods are based on the wrapper approach, which means they use the target data mining algorithm (C4.5) to evaluate the quality of the candidate attribute subsets. Hence, the methods' evaluation functions are based on the error rate and on the size of the decision tree built by C4.5. These two criteria (objectives) are to be minimized according to the concept of Pareto dominance. The next subsections present the main aspects of the proposed methods. The reader is referred to Pappa 22,23 for further details.
25.4.1. The Multi-Objective Genetic Algorithm (MOGA)
A genetic algorithm (GA) is a search algorithm inspired by the principle of natural selection. It works by evolving a population of individuals, where each individual is a candidate solution to a given problem. Each individual is evaluated by a fitness function, which measures the quality of its corresponding solution. At each generation (iteration) the fittest (best) individuals of the current population survive and produce offspring resembling them, so that the population gradually contains fitter and fitter individuals, i.e., better and better candidate solutions to the underlying problem. For a comprehensive review of GAs in general, the reader is referred to Michalewicz 24; for a comprehensive review of GAs applied to data mining, to Freitas 3. The motivation for developing a multi-objective GA for attribute selection was that: (a) GAs are a robust search method, capable of effectively exploring the large search spaces often associated with attribute selection problems; (b) GAs perform a global search, so they tend to cope better with attribute interaction than greedy search methods, which is also an important advantage in attribute selection; and (c) GAs already work with a population of candidate solutions, which makes them naturally suitable for multi-objective problem solving 4, where the search algorithm is required to consider a set of optimal solutions at each iteration.
G.L. Pappa, A.A. Freitas and C.A.A. Kaestner
25.4.1.1. Individual Encoding

In the proposed GA, each individual represents a candidate subset of selected attributes, out of all original attributes. Each individual consists of M genes, where M is the number of original attributes in the data being mined. Each gene can take on the value 1 or 0, indicating that the corresponding attribute occurs or not (respectively) in the candidate subset of selected attributes.

25.4.1.2. Fitness Function

The fitness (evaluation) function measures the quality of the candidate attribute subset represented by an individual. Following the principle of multi-objective optimization, the fitness of an individual consists of two quality measures: (a) the error rate of C4.5; and (b) the size of the decision tree built by C4.5. Both (a) and (b) are computed by running C4.5 with the individual's attribute subset only, using a hold-out method to estimate C4.5's error rate, as follows. First, the training data is partitioned into two mutually exclusive subsets, the building subset and the validation subset. Then C4.5 is run using as its training set only the examples (records) in the building subset. Once the decision tree has been built, it is used to classify the examples in the validation subset.

25.4.1.3. Selection Methods and Genetic Operators

At each generation (iteration) of the GA, the next population of individuals is formed as follows. First the GA selects all the non-dominated individuals of the current generation, which are passed unaltered to the next generation by elitism26. Elitism is a common procedure in MOGAs; it prevents non-dominated individuals from disappearing from the population due to the stochastic nature of the selection operators. However, a maximum number of elitist individuals has to be fixed, to prevent the next population from consisting only of elitist individuals, which would block the creation of new individuals and stop the evolutionary process.
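The encoding and the two-objective hold-out evaluation described above can be sketched as follows. Here `build_tree` is a hypothetical stand-in for C4.5 (any callable that trains a classifier on the building subset and reports its tree size), and the 2/3-1/3 split is an assumed ratio, since the chapter does not state the exact partition sizes:

```python
import random

def decode(individual, attribute_names):
    """Gene i is 1 iff attribute i is in the candidate subset."""
    return [a for a, bit in zip(attribute_names, individual) if bit]

def fitness(individual, records, attribute_names, build_tree, seed=0):
    """Return (error_rate, tree_size), both to be minimized.

    `build_tree(build_records, attrs)` stands in for C4.5 and must
    return (classify_fn, tree_size); records are (features, label)
    pairs.  The 2/3 building / 1/3 validation split is an assumption."""
    rng = random.Random(seed)
    rows = list(records)
    rng.shuffle(rows)
    cut = (2 * len(rows)) // 3          # hold-out: 2/3 build, 1/3 validate
    build, valid = rows[:cut], rows[cut:]
    classify, tree_size = build_tree(build,
                                     decode(individual, attribute_names))
    errors = sum(1 for features, label in valid
                 if classify(features) != label)
    return errors / len(valid), tree_size
```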
This maximum number of elitist individuals was set to half the population size. If the number of non-dominated individuals is larger than half the population size, that number of elitist individuals is chosen by the tie-breaking criterion explained later. Once elitist reproduction has been performed, the remainder of the next generation's population is filled with new "children" individuals, generated from "parent" individuals of the current generation. The parent
Multi-Objective Algorithms for Attribute Selection in Data Mining
individuals are chosen by tournament selection with a tournament size of 2. Children are then generated from the parents by applying conventional uniform crossover and bit-flip mutation. The tournament selection procedure is adapted for multi-objective search as follows. The fitness of an individual is a vector with values for the two objectives: the error rate and the decision-tree size associated with the attribute subset represented by the individual. The selection of the best individual is based on the concept of Pareto dominance, taking into account the two objectives to be minimized. Given two individuals I1 and I2 playing a tournament, there are two possible situations. The first is that one of the individuals dominates the other; in this case the former is selected as the winner of the tournament. The second is that neither individual dominates the other. In this case, we use the following tie-breaking criterion to determine the fittest individual. For each of the two individuals Ii, i = 1, 2, the GA computes Xi as the number of individuals in the current population that are dominated by Ii, and Yi as the number of individuals in the current population that dominate Ii. The GA then selects as the best the individual Ii with the largest value of the formula Xi - Yi. Finally, if I1 and I2 have the same value of Xi - Yi (which is rarely the case), the tournament winner is simply chosen at random. In all our experiments the probabilities of crossover and mutation were set to 80% and 1%, respectively, which are relatively common values in the literature. The population size was set to 100 individuals, which evolve for 50 generations.

25.4.2. The Multi-Objective Forward Sequential Selection Method (MOFSS)

A single-objective optimization method and a multi-objective optimization method differ mainly in the number of optimal solutions that they return.
Hence, the first step in converting the traditional FSS into a multi-objective method is to make it return a set of optimal solutions instead of a single solution. This was achieved by creating a list of all non-dominated solutions generated by MOFSS up to the current iteration of the algorithm. This concept of an external list of non-dominated solutions was inspired by MOGAs in the literature, such as SPEA27, that maintain all the non-dominated individuals in an external population.
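A sketch of the Pareto machinery shared by the two methods: the dominance test, the Xi - Yi tie-break used by the MOGA's tournament, the external-list update just described, and the attribute-wise expansion of list members. All function names are illustrative; objective vectors are (error rate, tree size) pairs, both minimized:

```python
def dominates(a, b):
    """True iff objective vector a Pareto-dominates b (minimization)."""
    return (all(x <= y for x, y in zip(a, b)) and
            any(x < y for x, y in zip(a, b)))

def tiebreak(obj, population_objs):
    """MOGA tie-break Xi - Yi: number of solutions obj dominates,
    minus the number of solutions that dominate obj."""
    x = sum(dominates(obj, o) for o in population_objs)
    y = sum(dominates(o, obj) for o in population_objs)
    return x - y

def update_list(nondominated, candidates):
    """MOFSS external-list update: keep a candidate only if no list
    member dominates it; drop list members the candidate dominates."""
    for subset, obj in candidates:
        if any(dominates(o, obj) for _, o in nondominated):
            continue                       # candidate is dominated
        nondominated = [(s, o) for s, o in nondominated
                        if not dominates(obj, o)]
        nondominated.append((subset, obj))
    return nondominated

def expand(subsets, all_attributes):
    """MOFSS candidate generation: extend each list member with every
    attribute it does not yet contain."""
    out = []
    for subset in subsets:
        for attr in all_attributes:
            cand = subset | {attr}
            if attr not in subset and cand not in out:
                out.append(cand)
    return out
```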
The proposed MOFSS starts as the traditional FSS: a set of candidate solutions is created and evaluated. The evaluation of each solution considers both the error rate and the size of the decision tree generated by C4.5 during training. As in the proposed MOGA, the values of these two objectives to be minimized are stored and later used to judge a solution as better or worse than another. Each new solution of the current iteration is compared with every other solution of the current iteration, in order to find all non-dominated solutions of that iteration. Then the non-dominated solution list, L, is updated. This update consists in comparing, through the concept of Pareto dominance, the solutions in the list with the non-dominated solutions of the current iteration. More precisely, each non-dominated solution S of the current iteration is added to the list L only if S is not dominated by any solution in L. It is also possible that S dominates some solution(s) in L; in this case those dominated solutions are, of course, removed from L. The non-dominated solution list is the starting point for generating new candidate solutions. At each iteration, each solution in the current list is extended with each attribute that does not yet occur in it, and the process starts again, until no more updates can be made to the non-dominated solution list.

25.5. Computational Results

Experiments were carried out on 18 public-domain, real-world data sets obtained from the UCI (University of California at Irvine) data set repository28. The number of examples, attributes and classes of these data sets is shown in Table 25.98. All the experiments were performed with the well-known stratified 10-fold cross-validation procedure.
For each iteration of the cross-validation procedure, once the MOGA/MOFSS run is over we compare the performance of C4.5 using all the original attributes (the "baseline" solution) with the performance of C4.5 using only the attributes selected by MOGA/MOFSS. Recall that MOGA/MOFSS can be considered successful to the extent that the attribute subsets they select lead to a reduction in the error rate and in the size of the tree built by C4.5, by comparison with the use of all original attributes. As explained before, the solution for a multi-objective optimization problem consists of all non-dominated solutions (the Pareto front) found. Hence, each run of the MOGA outputs the set of all non-dominated solutions (attribute subsets) present in the last generation's population, and each run of the MOFSS outputs the solutions stored in the non-dominated solution list at the last iteration.

Table 25.98. Main characteristics of the data sets used in the experiments

Data set                  # examples  # attributes  # classes
Arrhythmia                       452           269         16
Balance-Scale                    625             4          3
Bupa                             345             6          2
Car                             1717             6          4
Crx                              690            15          2
Dermatology                      366            34          6
Glass                            214            10          7
Ionosphere                       351            34          2
Iris                             150             4          3
Mushroom                        8124            22          2
Pima                             768             8          2
Promoters                        106            57          2
Sick-euthyroid                  3163            25          2
Tic tac toe                      958             9          2
Vehicle                          846            18          4
Votes                            435            16          2
Wine                             178            13          3
Wisconsin breast-cancer          699             9          2

In a real-world application, the final choice of the non-dominated solution to be used in practice would be left to the user. However, in our research-oriented work, involving many different public-domain data sets, no user was available. Hence, we needed to evaluate the quality of the non-dominated attribute subsets returned by MOGA/MOFSS in an automatic, data-driven manner. We have done that in two different ways, reflecting two different (but both valid) perspectives, as follows. The first approach to evaluating the set of non-dominated solutions returned by MOGA and MOFSS is called Return All Non-Dominated Solutions. The basic idea is that we return all the non-dominated solutions found by the method and compare each of them, one at a time, with the baseline solution, which consists of the set of all original attributes. Then we count the number of solutions returned by MOGA and MOFSS that dominate or are dominated by the baseline solution, in the Pareto sense, with respect to the objectives of minimizing error rate and decision-tree size, as explained above. The second approach, called Return the "Best" Non-Dominated Solution,
consists of selecting a single solution to be returned to the user, using the tie-breaking criterion described earlier. From a user's point of view this is a practical approach, since the user often wants a single solution. Moreover, this decision-making process makes the solution of the multi-objective problem complete, following its three potential stages of development: measurement, search and decision making29. There are many ways of setting preferences in a decision-making process, as shown in Coello-Coello29, but we did not follow any of those approaches. For both MOGA and MOFSS we return the solution in the non-dominated set of the last generation (or iteration) with the highest value of the tie-breaking criterion, a decision-making criterion tailored for our algorithms and underlying application. Note that, since the number of solutions that dominate the solutions in the non-dominated set is zero, the formula of the tie-breaking criterion reduces to Xi. Therefore, instead of explicitly ranking the objectives, we rank the non-dominated solutions according to the number of individuals they dominate in the last generation. The solution chosen through this method was compared with the baseline solution. There is one caveat when using this criterion in MOFSS. For this algorithm, we recalculate the tie-breaking criterion considering all the solutions generated in all the iterations of the method. That is, we calculate the number of solutions that are dominated by each of the solutions in the non-dominated solution list of the last iteration, considering all solutions generated by the method. The tie-breaking criterion was recalculated because, for some data sets, the number of solutions in the non-dominated list at the beginning of the last iteration was small. As a result, few new solutions were generated in the last iteration.
It would not be fair to compare the solutions in that list only with those few solutions generated in the last iteration, because the small number of solutions would lead to low statistical confidence in the result. To solve this problem, the tie-breaking criterion is recalculated using all solutions generated since the algorithm starts. There was no need to apply this procedure to MOGA, because that method has a larger number of solutions in the last generation, providing enough solutions for a reliable computation of the tie-breaking criterion.
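The single-solution selection rule described above can be sketched as follows (names are illustrative; for non-dominated list members Yi is zero, so Xi - Yi reduces to Xi):

```python
def dominates(a, b):
    """Pareto dominance for minimized objective vectors."""
    return (all(x <= y for x, y in zip(a, b)) and
            any(x < y for x, y in zip(a, b)))

def best_single_solution(nondominated, pool):
    """Return the non-dominated objective vector that dominates the
    most members of `pool`: the last population for MOGA-1, and every
    solution ever generated for MOFSS-1 (the recalculation described
    in the text)."""
    return max(nondominated,
               key=lambda s: sum(dominates(s, o) for o in pool))
```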
25.5.1. Results for the "Return All Non-Dominated Solutions" Approach

As explained earlier, the basic idea of this approach is that MOGA and MOFSS return all non-dominated solutions that they have found, and then we count the number of solutions returned by each of these methods that dominate or are dominated by the baseline solution. Tables 25.99 and 25.100 show, respectively, the results found by MOGA and MOFSS returning all the non-dominated solutions of the last generation (or iteration). Hereafter these versions of the algorithms are called MOGA-all and MOFSS-all. In Tables 25.99 and 25.100 the second column shows the total number of solutions found by the method; the numbers after the "±" are standard deviations. The next columns show the relative frequency of the found solutions that dominate the baseline solution (column Fdominate), the relative frequency of the found solutions that are dominated by the baseline solution (column Fdominated), and the relative frequency of the found solutions that neither dominate nor are dominated by the baseline solution (column Fneutral).

Table 25.99. Results found with MOGA-all

Data set          Total           Fdominate  Fdominated  Fneutral
Arrhythmia        3.9 ± 0.54      0.21       0.33        0.46
Balance-Scale     1.0 ± 0.0       0.7        0           0.3
Bupa              6.1 ± 0.38      0.31       0           0.69
Car               38.3 ± 0.76     0.002      0           0.998
Crx               4.55 ± 0.67     0.56       0.05        0.39
Dermatology       1.11 ± 0.11     0.8        0           0.2
Glass             46.9 ± 1.03     0          0.06        0.94
Ionosphere        1.14 ± 0.14     0.37       0.12        0.5
Iris              4.4 ± 0.16      0.8        0.02        0.18
Mushroom          1.9 ± 0.18      0.68       0           0.32
Pima              18.3 ± 1.15     0.34       0           0.66
Promoters         1.5 ± 0.16      0.33       0           0.67
Sick-euthyroid    25.4 ± 0.93     0.02       0.02        0.96
Tic tac toe       16.5 ± 1.0      0          0           1
Vehicle           6.1 ± 0.76      0.25       0.18        0.57
Votes             26.6 ± 1.63     0.6        0           0.4
Wine              4.66 ± 1.21     0.48       0.31        0.21
Wisconsin         9.3 ± 0.4       0.5        0.2         0.3
As can be observed in Table 25.99, there are 6 data sets where the value
of Fdominate is greater than 0.5 (shown in bold), which means that more than 50% of MOGA-all's solutions dominated the baseline solution. In 9 out of the 18 data sets, no MOGA-all solution was dominated by the baseline solution. There are only two data sets, namely arrhythmia and glass, where the value of Fdominate is smaller than the value of Fdominated (shown in bold), indicating that the MOGA was not successful in these two data sets. In any case, in these two data sets the difference between Fdominate and Fdominated is relatively small (particularly in the case of glass), and the value of Fneutral is greater than the values of both Fdominate and Fdominated.
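The Fdominate/Fdominated/Fneutral frequencies reported in Tables 25.99 and 25.100 can be computed as in this sketch (illustrative names; objective vectors are (error rate, tree size) pairs, both minimized):

```python
def dominance_frequencies(solutions, baseline):
    """Relative frequencies of solutions that dominate, are dominated
    by, or are neutral with respect to the baseline objective vector."""
    def dominates(a, b):
        return (all(x <= y for x, y in zip(a, b)) and
                any(x < y for x, y in zip(a, b)))
    n = len(solutions)
    f_dom = sum(dominates(s, baseline) for s in solutions) / n
    f_dom_ed = sum(dominates(baseline, s) for s in solutions) / n
    return f_dom, f_dom_ed, 1.0 - f_dom - f_dom_ed
```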
In summary, in 14 out of the 18 data sets the value of Fdominate is greater than the value of Fdominated, indicating that overall MOGA-all was successful in the majority of the data sets. MOGA-all was very successful in 6 data sets, where the value of Fdominate was larger than 0.5 and much greater than the value of Fdominated. In Table 25.100, we can see that there are 7 data sets where the value of Fdominate is greater than 0.5 (shown in bold), which means that more than 50% of MOFSS-all's solutions dominated the baseline solution. Remarkably, there are only two data sets, namely wine and Wisconsin breast cancer, where the number of MOFSS-all solutions dominated by the baseline solution was greater than zero, and in the case of wine that number is very close to zero anyway. There are two data sets where all MOFSS-all solutions are neutral, namely dermatology and mushroom. In summary, in 16 out of the 18 data sets the value of Fdominate is greater than the value of Fdominated, indicating that overall MOFSS was successful in the vast majority of the data sets. MOFSS was very successful in 7 data sets, as mentioned above.

25.5.2. Results for the "Return the 'Best' Non-Dominated Solution" Approach
Table 25.100. Results found with MOFSS-all

Data set          Total            Fdominate  Fdominated  Fneutral
Arrhythmia        32.2 ± 10.82     0.54       0           0.46
Balance-Scale     1.8 ± 0.2        0.5        0           0.5
Bupa              2.9 ± 0.31       0.65       0           0.35
Car               4.3 ± 0.33       0.07       0           0.93
Crx               84.1 ± 2.05      0.89       0           0.11
Dermatology       76.5 ± 10.3      0          0           1
Glass             94.1 ± 5.24      0.99       0           0.01
Ionosphere        12.9 ± 6.23      0.14       0           0.86
Iris              3.5 ± 0.34       0.86       0           0.14
Mushroom          51.9 ± 11.88     0          0           1
Pima              11.1 ± 1.88      0.95       0           0.05
Promoters         66.6 ± 12.66     0.27       0           0.73
Sick-euthyroid    50.3 ± 6.44      0.1        0           0.9
Tic tac toe       8.1 ± 1.54       0.11       0           0.89
Vehicle           3.6 ± 0.16       0.17       0           0.83
Votes             98.4 ± 0.37      0.1        0           0.9
Wine              8.3 ± 6.1        0.92       0.01        0.07
Wisconsin         10.1 ± 4.76      0.45       0.37        0.18

Tables 25.101 and 25.102 show the results obtained by following this approach. These tables show results for error rate and tree size separately, as usual in the machine learning and data mining literature. Later in this section we show results (in Table 25.103) involving Pareto dominance, which consider the simultaneous minimization of error rate and tree size. In Tables 25.101 and 25.102 the column titled C4.5 contains the results for C4.5 run with the baseline solution (all original attributes), whereas the columns titled MOGA-1 and MOFSS-1 contain the results for C4.5 run with the single "best" non-dominated solution found by MOGA and MOFSS, using the criterion for choosing the "best" solution explained earlier. The figures in the tables are averages over the 10 iterations of the cross-validation procedure. The values after the "±" symbol represent standard deviations, and the figures in bold indicate the smallest error rates/tree sizes obtained among the three methods. In the columns MOGA-1 and MOFSS-1, the symbol "+" ("-") denotes that the result (error rate or tree size) of the corresponding method is significantly better (worse) than the result obtained with the baseline solution. The difference in error rate or tree size between the columns MOGA-1/MOFSS-1 and C4.5 is considered significant if the corresponding error rate or tree size intervals, taking the standard deviations into account, do not overlap. The last two lines of Tables 25.101 and 25.102 summarize these results, indicating in how many data sets MOGA-1/MOFSS-1 obtained a significant win/loss over the baseline solution of C4.5 with all original attributes.

Table 25.101. Error rates (%) obtained with C4.5, MOGA-1 and MOFSS-1

Data set          C4.5            MOGA-1               MOFSS-1
Arrhythmia        32.93 ± 3.11    26.38 ± 1.47 (+)     N/A
Balance-Scale     36.34 ± 1.08    28.32 ± 0.71 (+)     36.47 ± 1.84
Bupa              37.07 ± 2.99    30.14 ± 1.85 (+)     40.85 ± 1.45
Car               7.49 ± 0.70     16.65 ± 0.4 (-)      18.5 ± 0.70 (-)
Crx               15.95 ± 1.43    12.44 ± 1.84         15.04 ± 1.35
Dermatology       6.0 ± 0.98      2.19 ± 0.36 (+)      11.15 ± 1.60 (-)
Glass             1.86 ± 0.76     1.43 ± 0.73          1.86 ± 0.76
Ionosphere        10.2 ± 1.25     5.13 ± 1.27 (+)      7.98 ± 1.37
Iris              6.0 ± 2.32      2.68 ± 1.1 (+)       6.01 ± 2.09
Mushroom          0.0 ± 0.0       0.0 ± 0.0            0.18 ± 0.07 (-)
Pima              26.07 ± 1.03    23.07 ± 1.16         28.16 ± 1.72
Promoters         16.83 ± 2.55    11.33 ± 1.92 (+)     33.5 ± 6.49 (-)
Sick-euthyroid    2.02 ± 0.12     2.22 ± 0.18          2.32 ± 0.23
Tic tac toe       15.75 ± 1.4     22.65 ± 1.19 (-)     31.19 ± 1.69 (-)
Vehicle           26.03 ± 1.78    23.16 ± 1.29         33.74 ± 1.78 (-)
Votes             3.2 ± 0.91      2.97 ± 0.75          4.57 ± 0.89
Wine              6.69 ± 1.82     0.56 ± 0.56 (+)      6.07 ± 1.69
Wisconsin         5.28 ± 0.95     3.84 ± 0.67          7.16 ± 0.77 (-)
Wins over C4.5                    8                    0
Losses over C4.5                  2                    7

In Tables 25.101 and 25.102, the results of MOFSS-1 for the data set Arrhythmia are not available due to the large number of attributes in this data set (269). This leads to a very large number of solutions generated along all the iterations of the algorithm, so that re-calculating the tie-breaking criterion considering all the generated solutions was impractical with the machine
used in the experiments (a dual-PC with 1.1 GHz clock rate and 3 Gbytes of memory). The results in Table 25.101 show that MOGA-1 obtained significantly better error rates than the baseline solution (column "C4.5") in 8 data sets. In contrast, the baseline solution obtained significantly better results than MOGA-1 in just two data sets. MOFSS-1 did not find solutions with significantly better error rates than the baseline solution in any data set; on the contrary, it found solutions with significantly worse error rates than the baseline solution in 7 data sets. As can be observed in Table 25.102, the tree sizes obtained with the solutions found by MOGA-1 and MOFSS-1 are significantly better than those obtained with the baseline solution in 15 out of 18 data sets. In the other three data sets the difference is not significant. In summary, both MOGA-1 and MOFSS-1 are very successful in finding solutions that lead to a significant reduction in tree size, by comparison with the baseline solution of all attributes. The solutions found by MOGA-1 were also quite successful in reducing error rate, unlike the solutions found by MOFSS-1, which unfortunately led to a significant increase in error rate in a number of data sets. Hence, these results suggest that MOGA-1 has effectively found a good trade-off between the objectives of minimizing error rate and tree size, whereas MOFSS-1 minimized tree size at the expense of increasing error rate in a number of data sets.

Table 25.102. Tree sizes (number of nodes) obtained with C4.5, MOGA-1 and MOFSS-1

Data set          C4.5            MOGA-1            MOFSS-1
Arrhythmia        80.2 ± 2.1      65.4 ± 1.15 (+)   N/A
Balance-Scale     41.0 ± 1.29     16.5 ± 3.45 (+)   7.5 ± 1.5 (+)
Bupa              44.2 ± 3.75     7.4 ± 1.36 (+)    11.4 ± 2.78 (+)
Car               165.3 ± 2.79    29.4 ± 5.2 (+)    17.7 ± 1.07 (+)
Crx               29.0 ± 3.65     11.2 ± 3.86 (+)   24.6 ± 8.27
Dermatology       34.0 ± 1.89     25.2 ± 0.96 (+)   23.2 ± 2.84 (+)
Glass             11.0 ± 0.0      11.0 ± 0.0        11.0 ± 0.0
Ionosphere        26.2 ± 1.74     13.0 ± 1.4 (+)    14.2 ± 2.23 (+)
Iris              8.2 ± 0.44      5.8 ± 0.53 (+)    6.0 ± 0.68 (+)
Mushroom          32.7 ± 0.67     30.0 ± 0.89 (+)   27.2 ± 1.76 (+)
Pima              45.0 ± 2.89     11.0 ± 2.6 (+)    9.2 ± 1.85 (+)
Promoters         23.8 ± 1.04     11.4 ± 2.47 (+)   9.0 ± 1.2 (+)
Sick-euthyroid    24.8 ± 0.69     11.2 ± 1.35 (+)   9.6 ± 0.79 (+)
Tic tac toe       130.3 ± 4.25    21.1 ± 4.54 (+)   10.6 ± 1.4 (+)
Vehicle           134.0 ± 6.17    95 ± 3.13 (+)     72.8 ± 10.98 (+)
Votes             10.6 ± 0.26     5.4 ± 0.88 (+)    5.6 ± 1.07 (+)
Wine              10.2 ± 0.68     9.4 ± 0.26        8.6 ± 0.26 (+)
Wisconsin         28.0 ± 2.13     25 ± 3.71         18 ± 1.53 (+)
Wins over C4.5                    15                15
Losses over C4.5                  0                 0

Table 25.103. Number of significant Pareto dominance relations

           C4.5   MOGA-1   MOFSS-1
C4.5         X       0        0
MOGA-1      14       X        7
MOFSS-1      8       0        X
Table 25.103 compares the performance of MOGA-1, MOFSS-1 and C4.5 using all attributes considering both the error rate and the tree size at the same time, according to the concept of significant Pareto dominance. This is a modified version of conventional Pareto dominance tailored for the classification task of data mining, where we want to find solutions that are not only better, but significantly better, taking into account the standard deviations (as explained earlier for Tables 25.101 and 25.102). Hence, each cell of Table 25.103 shows the number of data sets in which the solution
found by the method indicated in the table row significantly dominates the solution found by the method indicated in the table column. A solution S1 significantly dominates a solution S2 if and only if:

obj1(S1) + sd1(S1) < obj1(S2) - sd1(S2), and
not [ obj2(S2) + sd2(S2) < obj2(S1) - sd2(S1) ]

where obj1(Si) and sd1(Si) denote the average value and the standard deviation of objective 1 associated with solution Si, and similarly for the other variables. Objective 1 and objective 2 can be instantiated with error rate and tree size, or vice-versa. For example, in the bupa data set we can say that the solution found by MOGA-1 significantly dominates the solution found by MOFSS-1 because: (a) in Table 25.101, MOGA-1's error rate plus standard deviation (30.14 + 1.85) is smaller than MOFSS-1's error rate minus standard deviation (40.85 - 1.45); and (b) concerning the tree size (Table 25.102), the condition "not (11.4 + 2.78 < 7.4 - 1.36)" holds. So both conditions for significant dominance are satisfied. As shown in Table 25.103, the baseline solution (column "C4.5") did not significantly dominate the solutions found by MOGA-1 and MOFSS-1 in any data set. The best results were obtained by MOGA-1, whose solutions significantly dominated the baseline solution in 14 out of the 18 data sets and significantly dominated MOFSS-1's solutions in 7 data sets. MOFSS-1 obtained a reasonably good result, significantly dominating the baseline solution in 8 data sets, but it did not dominate MOGA-1 in any data set. A more detailed analysis of these results, at the level of individual data sets, can be observed later in Tables 25.104 and 25.105.

25.5.3.
On the Effectiveness of the Criterion to Choose the "Best" Solution

Analyzing the results in Tables 25.99, 25.100, 25.101 and 25.102, we can evaluate whether the criterion used to choose a single solution out of all non-dominated ones (i.e., the criterion used to generate the results of Tables 25.101 and 25.102) is really able to choose the "best" solution for each data set. We can do this by analyzing the dominance relationship (involving error rate and tree size) between the single returned solution and the baseline solution. That is, we can observe whether the single solution returned by MOGA-1 and MOFSS-1 dominates, is dominated by, or is neutral with respect to the baseline solution. Once we have this information, we can compare it with the corresponding relative frequencies associated with the
solutions found by MOGA-all/MOFSS-all (columns Fdominate, Fdominated, Fneutral of Tables 25.99 and 25.100). This comparison is performed in Tables 25.104 and 25.105, which refer to MOGA and MOFSS, respectively. In these two tables the first column contains the data set names, the next three columns are copied from the last three columns of Tables 25.99 and 25.100, respectively, and the last three columns are computed from the results in Tables 25.101 and 25.102, by applying the above-explained concept of significant Pareto dominance between the MOGA-1's/MOFSS-1's solution and the baseline solution. Table 25.104.
Performance of MOGA-all versus MOGA-1

                  MOGA-all's solutions         MOGA-1's solution
                  wrt baseline solution        wrt baseline solution
Data set          Fdom   Fdom_ed  Fneut        Dom  Dom_ed  Neut
Arrhythmia        0.21   0.33     0.46          X
Balance-Scale     0.7    0        0.3           X
Bupa              0.31   0        0.69          X
Car               0.002  0        0.998                      X
Crx               0.56   0.05     0.39          X
Dermatology       0.8    0        0.2           X
Glass             0      0.06     0.94                       X
Ionosphere        0.37   0.12     0.5           X
Iris              0.8    0.02     0.18          X
Mushroom          0.68   0        0.32          X
Pima              0.34   0        0.66          X
Promoters         0.33   0        0.67          X
Sick-euthyroid    0.02   0.02     0.96          X
Tic tac toe       0      0        1                          X
Vehicle           0.25   0.18     0.57          X
Votes             0.6    0        0.4           X
Wine              0.48   0.31     0.21          X
Wisconsin         0.5    0.2      0.3                        X
As can be observed in Table 25.104, there are only 4 data sets in which the solution found by MOGA-1 does not dominate the baseline solution: car, glass, tic-tac-toe and Wisconsin. For these 4 data sets the solutions found by MOGA-1 were neutral (last column of Table 25.104), and the value of Fneutral was, respectively, 0.998, 0.94, 1 and 0.3. Therefore, in the first three of those data sets it was expected that the single solution chosen by MOGA-1 would be neutral, so the criterion used for choosing a single solution cannot be blamed for returning a neutral solution. Only in the Wisconsin data set did the criterion do badly, because 50% of the found solutions dominated the baseline solution but a neutral solution was chosen. The criterion was very successful, managing to choose a solution that dominated the baseline, in all the other 14 data sets, even though in 8 of those data sets less than 50% of the solutions found by MOGA-all dominated the baseline. The effectiveness of the criterion can be observed, for instance, in arrhythmia and sick-euthyroid. Although in arrhythmia the value of Fdominate was quite small (0.21), the solution returned by MOGA-1 dominated the baseline solution. In sick-euthyroid, 96% of the solutions found by MOGA-all were neutral, but a solution that dominates the baseline solution was again returned by MOGA-1.

Table 25.105. Performance of MOFSS-all versus MOFSS-1

                  MOFSS-all's solutions        MOFSS-1's solution
                  wrt baseline solution        wrt baseline solution
Data set          Fdom   Fdom_ed  Fneut        Dom  Dom_ed  Neut
Arrhythmia        0.54   0        0.46         N/A
Balance-Scale     0.5    0        0.5           X
Bupa              0.65   0        0.35          X
Car               0.07   0        0.93                       X
Crx               0.89   0        0.11                       X
Dermatology       0      0        1                          X
Glass             0.99   0        0.01                       X
Ionosphere        0.14   0        0.86          X
Iris              0.86   0        0.14          X
Mushroom          0      0        1                          X
Pima              0.95   0        0.05          X
Promoters         0.27   0        0.73                       X
Sick-euthyroid    0.1    0        0.9           X
Tic tac toe       0.11   0        0.89                       X
Vehicle           0.17   0        0.83                       X
Votes             0.1    0        0.9           X
Wine              0.92   0.01     0.07          X
Wisconsin         0.45   0.37     0.18                       X

With respect to the effectiveness of the criterion when used by MOFSS-1, unexpected negative results were found in 2 data sets of Table 25.105, namely crx and glass. For both data sets, despite the high values of Fdominate, the solutions chosen by MOFSS-1 were neutral. The opposite happened in ionosphere, sick-euthyroid and votes, where Fneutral had high values but single solutions better than the baseline solution were chosen by MOFSS-1. The relatively large number of neutral solutions chosen by MOFSS-1 happened because, in many data sets, the tree size associated with the solution chosen by MOFSS-1 was smaller than the tree size associated with
the baseline solution, whilst the error rates of the former were larger than the error rates of the latter. Overall, the criterion for choosing a single solution was moderately successful when used by MOFSS-1, and much more successful when used by MOGA-1. A possible explanation for this result is that the procedure used for tailoring the criterion to MOFSS, described earlier, is not working very well. An improvement in that procedure can be tried in future research. It is important to note that, remarkably, the criterion for choosing a single solution did not choose a solution dominated by the baseline solution in any data set. This result holds for both MOGA-1 and MOFSS-1.

25.6. Conclusions and Future Work

This chapter has discussed two multi-objective algorithms for attribute selection in data mining, namely a multi-objective genetic algorithm (MOGA) and a multi-objective forward sequential selection (MOFSS) method. The effectiveness of both algorithms was extensively evaluated on 18 real-world data sets. Two major sets of experiments were performed, as follows. The first set of experiments compared each of the non-dominated solutions (attribute subsets) found by MOGA and MOFSS with the baseline solution (consisting of all the original attributes). The comparison aimed at counting how many of the solutions found by MOGA and MOFSS dominated (in the Pareto sense) or were dominated by the baseline solution, in terms of classification error rate and decision tree size. Overall, the results (see Tables 25.99 and 25.100) show that both MOGA and MOFSS are successful in the sense that they return solutions that dominate the baseline solution much more often than vice-versa. The second set of experiments consisted of selecting a single "best" solution out of all the non-dominated solutions found by each multi-objective attribute selection method (MOGA and MOFSS) and then comparing this solution with the baseline solution.
Although this kind of experiment is not often performed in the multi-objective literature, it is important because in practice the user often wants a single solution to be suggested by the system, to relieve the user of the cognitive burden and difficult responsibility of choosing one solution out of all non-dominated solutions. In order to perform this set of experiments, this work proposed a simple way to choose a single solution to be returned from the set of non-dominated solutions generated by MOGA and MOFSS. The effectiveness of the proposed criterion was analyzed by comparing the results of the two different
G.L. Pappa, A.A. Freitas and C.A.A. Kaestner
versions of MOGA and MOFSS, one version returning all non-dominated solutions (results of the first set of experiments) and another version returning a single chosen non-dominated solution. Despite its simplicity, the proposed criterion worked well in practice, particularly when used in the MOGA method. It could be improved when used in the MOFSS method, as discussed earlier. In the future we intend to analyze the characteristics of the data sets where each of the proposed methods obtained its best results, in order to find patterns that describe the data sets where each method can be applied with greater success.
CHAPTER 26 FINANCIAL APPLICATIONS OF MULTI-OBJECTIVE EVOLUTIONARY ALGORITHMS: RECENT DEVELOPMENTS AND FUTURE RESEARCH DIRECTIONS
Frank Schlottmann(1,2) and Detlef Seese(2)

(1) GILLARDON AG financial software, Research Department, Alte Wilhelmstr. 4, D-75015 Bretten, Germany. E-mail: [email protected]

(2) Institute AIFB, University Karlsruhe (TH), D-76128 Karlsruhe, Germany. E-mail: [email protected]
The area of finance contains many algorithmic problems of large practical interest whose complexity prevents finding efficient solutions. The range of applications in this area covers e. g. portfolio selection and risk management, and reaches from questions of real-world financial intermediation to sophisticated research problems. Since there is an urgent need to solve these complex problems, heuristic approaches like Evolutionary Algorithms are a potential toolbox. The application of Multi-Objective Evolutionary Algorithm concepts to this area has started more recently compared to the vast majority of other application areas, e. g. design and engineering. We give a brief survey of promising developments within this field and discuss potential future research directions.

26.1. Introduction

It is one of the goals of computational finance to develop methods and algorithms to support decision making. Unfortunately, many problems of practical or theoretical interest are too complex to be solvable exactly by a deterministic algorithm in reasonable computing time, e. g. using a method that applies a simple closed-form analytical expression. Such problems require approximation procedures which provide sufficiently good solutions while requiring less computational effort compared to an exact algorithm. Heuristic approaches are a class of algorithms which have been developed
to fulfil these requirements in many problem contexts, see e. g. Fogel & Michalewicz1 or Hromkovic2 for the general methodology and Schlottmann & Seese3 for an overview of heuristic algorithm applications to financial problems. Moreover, Chen's book4 contains a selection of mainly single-objective Evolutionary Algorithm applications in the finance area. In contrast to these more general literature surveys, we concentrate solely on Multi-Objective Evolutionary Algorithm (MOEA) applications in finance in the following text. The main advantage of MOEAs is their ability to investigate many objectives/goals at the same time. Hence, they offer many possibilities to support decision making particularly in finance, where a majority of naturally multi-criteria problems have been considered only in a simplified single-objective manner for a long time. Since we do not address general concepts and details of standard MOEAs, we refer the reader e. g. to Deb5, Coello et al.6 and Osyczka7 for an introduction as well as a thorough coverage of this methodology. The rest of this chapter is structured as follows: In the next section, we point out the complexity of financial problems. Afterwards, we give an introduction to portfolio selection problems in the standard Markowitz setting. We discuss some successful MOEA applications from the literature which solve different problems related to portfolio selection. The chapter ends with a conclusion and potential future research directions.

26.2. A Justification for MOEAs in Financial Applications

Many decision problems in the area of finance can be viewed as some of the hardest problems in economics. This is caused by their strong interrelation with many other difficult decision problems in business life, by the huge number of parameters which are usually involved in these problems, and often by the intrinsic complexity of some of these problems themselves. Complexity influences financial decision making in many forms.
There is the huge number of parameters often involved in financial decision problems, financial systems are often highly dynamic, and there are recent proofs that even financial decision problems for simple models have a high algorithmic complexity preventing the existence of efficient algorithmic solutions. Often such complexity results give insights into the structural reasons for the difficulties that prevent supporting decision making with the help of computers. For instance, Aspnes et al.8 showed that already for a very simple model of a stock market, the complexity of the problem of predicting the market price depends essentially on the number of trading strategies in comparison
to the number of traders. If there is a large number of traders but they employ a relatively small number of strategies, then there is a polynomial-time algorithm for predicting future price movements with high accuracy, and if the number of trading strategies is large, market prediction becomes complex. Of course, such complexity results require a precise definition of complexity. A widely accepted formal definition of complex problems results from comparing the asymptotic computation times of their solution algorithms. Here the computation time, measured in elementary computation steps, is a function defining, for each size of the input (measured e. g. as the number of input variables of the given problem), the number of steps the algorithm needs to compute the result in the worst case. Such computation time functions can be compared with respect to their asymptotic growth rates, i.e. comparing the growth of the functions while neglecting constant factors and possibly a finite number of input sizes. The observation is that most problems which can be solved efficiently in practice can be solved via algorithms whose asymptotic growth rate is at most polynomial in the input size $n$, i.e. at most $n^k$ for a constant $k$, and in most cases $k$ is small. All these problems are gathered in the class P. Unfortunately, for almost all problems of practical or theoretical importance no polynomial-time algorithms are known; instead, only exponential-time solutions are found, e.g. with an exponential number $2^n$ of necessary calculation steps for $n$ given input variables of the considered problem. Another observation is that almost all of these problems can be computed in polynomial time by a nondeterministic algorithm. Such algorithms can perform a certain number of computation steps in parallel and choose the shortest computation path at the end.
An equivalent way to describe such algorithms is to allow guessing a solution; the only thing the algorithm then has to do in polynomial time is to verify that the guessed solution is correct. All problems computable in polynomial time via a nondeterministic algorithm form the class NP, and almost all problems of practical importance are included in this class. It is an outstanding open problem in computer science to decide whether P = NP holds. In an attempt to answer this question the class of NP-complete problems was defined and investigated. A problem is defined to be NP-complete if it is in NP and each problem in NP can be reduced to it via a deterministic polynomial-time algorithm. NP-complete problems are thus the hardest problems in the class NP. If one finds an algorithm which solves one of these problems in polynomial time, then all problems in NP can be solved
in polynomial time, hence P = NP. It is widely conjectured that no such algorithm exists, and this conjecture is one of the most famous open problems in computer science, open already for more than two decades. The really surprising fact is not the existence of NP-complete problems, but the fact that almost all problems of practical interest belong to this class of problems: until now there are thousands of NP-complete problems in all areas of application, and there is no known algorithm which requires only a polynomial number of computational steps depending on the input size $n$ for an arbitrarily chosen problem that belongs to this class. So it is not surprising that recently it could be proved that many problems in finance also belong to the class of NP-complete problems, since they have a combinatorial structure which is equivalent (with respect to polynomial-time reductions) to well-known NP-complete problems; e. g. constrained portfolio selection and related questions of asset allocation (which are considered later in this chapter) are equivalent to the following problem, which is known to be NP-complete.

KNAPSACK: Given a finite set $U$ together with positive integers $s(u)$ (the size of $u$) and $v(u)$ (the value of $u$) for each element $u \in U$, a positive integer $B$ as size constraint and a positive integer $K$ as value goal, find a subset $U' \subseteq U$ such that $\sum_{u \in U'} s(u) \le B$ and $\sum_{u \in U'} v(u) \ge K$.

More details on knapsack problems and a large collection of further complexity results can be found e. g. in Garey & Johnson9, see also Papadimitriou10 for the formal definitions of computational complexity and Kellerer et al.11 for a contemporary monograph on knapsack problems. An illustrative formulation of knapsack problems in portfolio selection is given in the next section, whereas e. g. Seese & Schlottmann12,13 provide corresponding complexity results.
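To make the KNAPSACK definition concrete, the following sketch solves a toy instance exactly by dynamic programming. Its running time is O(|U| · B), which is only pseudo-polynomial (B enters the input in binary), so an exact solver like this does not contradict the NP-completeness of the problem. All names and numbers are illustrative, not from the chapter.

```python
def knapsack(sizes, values, B):
    """Return the maximum total value of a subset whose total size is <= B.

    Classic 0/1 knapsack dynamic program: best[b] holds the best value
    achievable within capacity b after processing some prefix of the items.
    """
    best = [0] * (B + 1)
    for s, v in zip(sizes, values):
        # iterate capacities downwards so each item is used at most once
        for b in range(B, s - 1, -1):
            best[b] = max(best[b], best[b - s] + v)
    return best[B]

# Toy instance: with size budget B = 10, the best subset is the items
# of size 4 and 3 (values 40 and 50), so the optimum value is 90.
print(knapsack([5, 4, 6, 3], [10, 40, 30, 50], 10))  # 90
```

Deciding whether the optimum reaches a given value goal K is then a single comparison, matching the decision-problem form stated above.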
The main consequence of the above mentioned complexity results is that we require approximation algorithms that yield sufficiently good solutions for complex finance problems and consume only polynomial computational resources measured by the size of the respective problem instance (e. g. the number of independent variables). For some complex problem settings and under certain assumptions, particularly linearity or convexity of target functions in optimization problems, there are analytical approximation algorithms which provide a fast method of finding solutions having a guaranteed quality of lying within an $\varepsilon$-region around the globally best solution(s). If the considered problem instance allows the necessary restrictions for the application of such algorithms, these are the preferred choice, see Ausiello et al.14 for such considerations. However, some applications in finance require non-linear, non-convex functions (e. g. valuation of financial instruments using non-linear functions), and sometimes we know only the data (parameters) but not the functional dependency between them, so there is nevertheless a need for methods that search for good solutions in difficult problem settings while spending only relatively small computational cost. This is the justification for heuristic approaches like MOEAs, which, unlike conventional algorithms, allow imprecision, uncertainty as well as partial truth, and can handle multiple objectives in a very natural manner. These requirements match many real-world search and optimization problems. Thanks to their evolutionary part, MOEAs offer adaptability as one of their characteristic features and thus permit the tracking of a problem through a changing environment. Moreover, on the basis of their multi-objective part they allow flexible decisions of the management on the basis of the actually available information. Hence they are an interesting tool in a complex and dynamically changing world. The next section contains some examples of such heuristic approaches to complex financial problems.
26.3. Selected Financial Applications of MOEAs

26.3.1. Portfolio Selection Problems

All MOEA approaches which will be discussed later in this subsection focus on portfolio selection problems or related questions. To give a brief introduction to this application context, we concentrate on standard Markowitz15 portfolio selection problems first. Given is a set of $n \in \mathbb{N}$ financial assets, e. g. exchange traded stocks. At time $t_0 \in \mathbb{R}$, each asset $i$ has certain characteristics describing its future payoff: each asset $i$ has an expected rate of return $\mu_i$ per monetary unit (e. g. dollars) which is paid at time $t_1 \in \mathbb{R}$, $t_1 > t_0$. This means if we take a position of $y \in \mathbb{R}$ units of asset $i$ at time $t_0$, our expected payoff at $t_1$ will be $\mu_i y$ units. Moreover, the covariances between the rates of return of all assets are given by a symmetric matrix $\Sigma := (\sigma_{ij})_{i,j \in \{1,\dots,n\}}$. In this straightforward notation, $\sigma_{ii}$ is the variance of asset $i$'s rate of return and $\sigma_{ij}$ is the covariance between asset $i$'s rate of return and asset $j$'s rate of return. A portfolio is defined by a vector $x := (x_1, \dots, x_n) \in \mathbb{R}^n$ which contains the weight $x_i \in \mathbb{R}$ of asset $i \in \{1, \dots, n\}$ in its $i$-th component. In the standard problem formulation, the weights of a portfolio are
normalized as follows:

$$\sum_{i=1}^{n} x_i = 1 \qquad \text{(C.1)}$$
Depending on the specific problem context, there are additional restrictions on the weights, e. g. lower bounds (a common constraint is $x_i \ge 0$), upper bounds and/or integrality constraints. This topic will be addressed in more detail later. At this point, it is sufficient to denote the set of all unconstrained portfolios by $S \subseteq \mathbb{R}^n$ and the set of feasible portfolios which satisfy the required constraints by $F \subseteq S$. If the specific portfolio selection problem is unconstrained, one can simply assume $F = S$. Usually, at least two conflicting target functions are considered: a return function $f_{return}(x)$ which is to be maximized and a risk function $f_{risk}(x)$ which is to be minimized. In the standard Markowitz setting these functions are defined as follows:

$$f_{return}(x) := \sum_{i=1}^{n} x_i \mu_i \qquad \text{(C.2)}$$

$$f_{risk}(x) := \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{n} x_i x_j \sigma_{ij}} \qquad \text{(C.3)}$$
The above definition of $f_{return}$ reflects the fact that the expected rate of return of a portfolio is the weighted sum of the assets' expected rates of return. And in the above specification of $f_{risk}$, the standard deviation of the portfolio rate of return is chosen as a risk measure which describes the level of uncertainty about the future payoff at time $t_1$. In the context of portfolio management, a feasible portfolio $x \in F$ is dominated by a feasible portfolio $y \in F$ iff at least one of the following two conditions is met:

$$f_{return}(x) < f_{return}(y) \;\wedge\; f_{risk}(x) \ge f_{risk}(y) \qquad \text{(C.4)}$$

$$f_{return}(x) \le f_{return}(y) \;\wedge\; f_{risk}(x) > f_{risk}(y) \qquad \text{(C.5)}$$
As rational investors prefer non-dominated portfolios over dominated portfolios, one is usually interested in finding an approximation of the so-called efficient frontier, which is identical to the set of all feasible non-dominated portfolios. In the standard finance literature, this is formulated as a constrained single-objective problem: given a rate of return $r^*$, find a feasible portfolio $x^* \in F$ satisfying

$$f_{return}(x^*) = r^* \;\wedge\; f_{risk}(x^*) = \min_{x \in F} \{ f_{risk}(x) \} \qquad \text{(C.6)}$$
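The objective functions (C.2), (C.3) and the dominance conditions (C.4), (C.5) translate directly into code. The sketch below uses plain Python lists for $\mu$ and $\Sigma$ and filters a candidate set down to its non-dominated portfolios with a naive quadratic pass; the asset data and portfolio weights are invented toy values.

```python
import math

def f_return(x, mu):
    # expected portfolio rate of return: weighted sum of asset returns (C.2)
    return sum(xi * mi for xi, mi in zip(x, mu))

def f_risk(x, sigma):
    # portfolio standard deviation (C.3)
    n = len(x)
    var = sum(x[i] * x[j] * sigma[i][j] for i in range(n) for j in range(n))
    return math.sqrt(var)

def dominates(y, x, mu, sigma):
    # y dominates x: at least as good in both objectives, strictly better in one
    ry, rx = f_return(y, mu), f_return(x, mu)
    qy, qx = f_risk(y, sigma), f_risk(x, sigma)
    return ry >= rx and qy <= qx and (ry > rx or qy < qx)

def non_dominated(portfolios, mu, sigma):
    # naive O(k^2) filter: keep portfolios not dominated by any candidate
    return [p for p in portfolios
            if not any(dominates(q, p, mu, sigma) for q in portfolios)]

# two toy assets: high return/high risk vs. low return/low risk
mu = [0.10, 0.05]
sigma = [[0.04, 0.00], [0.00, 0.01]]
candidates = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(len(non_dominated(candidates, mu, sigma)))  # 3: mutually non-dominated
```

With uncorrelated assets like these, mixing trades return against risk, so all three candidates sit on the approximated frontier.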
If there are no integrality constraints or other restrictions which raise the complexity, such problems can be solved using standard Quadratic Programming algorithms (under the assumption that $\Sigma$ is positive definite). From a computational complexity point of view, this is equivalent to solving a knapsack-like problem using real-valued decision variables; in the knapsack problem formulation in section 26.2 we considered binary decision variables, hence the complexity is different (lower) here although the objective function is not linear. By considering two objective functions instead of modelling an objective function constraint, we obtain a quite natural problem formulation for a MOEA approach which allows more flexibility concerning both the objective functions and the constraints on the portfolios. And we point out that the question of finding non-dominated portfolios raised above can easily be extended to a multi-period problem where the payoff at each additional future point of time $t_2, t_3, \dots, t_m$, $m \in \mathbb{N}$, $\forall i \in \{1, \dots, m\}: t_i \in \mathbb{R}$, is considered separately. This results in $2m$ objective functions to be optimized. In the following subsections we will summarize several applications of MOEAs in this context. Particularly, we will describe the deviation from the above Markowitz problem setting, the genetic modelling, the chosen genetic variation operators and the parameter sets used in empirical tests of the methodology.
26.3.2. Vederajan et al.

The article by Vederajan et al.16 contains different applications of Genetic Algorithm (GA) methodology to portfolio selection problems in the Markowitz context. At first, the authors consider the standard problem of portfolio selection from the previous section and add the constraint

$$\forall i \in \{1, \dots, n\}: 0 \le x_i \le x_{max} \qquad \text{(C.7)}$$

where $x_{max} \in \mathbb{R}_+$ is a constant. Besides a single-objective GA approach using a weighted sum of the $f_{risk}$ and $f_{return}$ objective functions from section 26.3.1, which we do not consider here, Vederajan et al. also propose a MOEA approach searching for non-dominated feasible individuals with respect to the two objectives. They use the Non-dominated Sorting Genetic Algorithm (NSGA) from Srinivas & Deb17 based on the following genetic representation of the $x_i$ variables: each decision variable is represented by a binary string of fixed length $l_{const}$, which represents the weight of the asset in the portfolio. The strings of all decision variables are concatenated such that the resulting genotype of each
individual consists of a binary gene string of length $n \cdot l_{const}$. It has to be emphasized here that this genetic modelling restricts the search space to a discrete subset of $\mathbb{R}^n$:

$$F := \left\{ x \in \left\{ 0 \cdot c_i,\; 1 \cdot c_i,\; \dots,\; (2^{l_{const}} - 1) \cdot c_i \right\}^n \;\middle|\; \sum_{i=1}^{n} x_i = 1 \right\} \qquad \text{(C.8)}$$
Here the constants $c_i > 0$ are chosen together with $l_{const}$ such that $x_i \ge 0$ (trivial) and $x_i \le x_{max}$ is assured. To incorporate the summation constraint from equation (C.1) into the algorithm, Vederajan et al. propose a repairing procedure for infeasible individuals derived from Bean18: the $x_i$ values of an infeasible individual are sorted in descending order to obtain a permutation $\pi(i)$ of the decision variables. Using this permutation one starts with the highest value given by $x_{\pi(k)}$ for $k := 1$ and raises $k$ successively until $\sum_{i=1}^{k} x_{\pi(i)} \ge 1$ for the minimum $k$. Knowing this value $k$ one sets

$$x_{\pi(j)} := \begin{cases} x_{\pi(j)} & \text{if } j < k, \\ 1 - \sum_{i=1}^{k-1} x_{\pi(i)} & \text{if } j = k, \\ 0 & \text{otherwise.} \end{cases} \qquad \text{(C.9)}$$

This repairing operation is applied each time an infeasible individual is generated (e. g. after random initialization of the first population). The selection operator used for reproduction of individuals in the NSGA is standard binary tournament, and the genetic variation operators are one-point crossover with crossover probability $p_{cross} := 0.9$ and a standard binary complement mutation operator applied with probability $p_{mut} := 0.01$ to each single bit in the gene string. Diversity preservation in the population is achieved by a niching approach using the sharing function

$$Sh(d_{xy}) := \begin{cases} 1 - \left( \dfrac{d_{xy}}{s_{const}} \right) & \text{if } d_{xy} < s_{const}, \\ 0 & \text{otherwise.} \end{cases} \qquad \text{(C.10)}$$

Here, $d_{xy}$ is the Euclidean distance between the fitness function values of a given individual $x$ and a given individual $y$, and $s_{const}$ is the maximum accepted value of $d_{xy}$ for two arbitrary individuals which belong to the same niche. Vederajan et al. perform several experiments with stock market data, particularly consisting of the historical asset price means and covariances for Boeing, Disney, Exxon, McDonald's and Microsoft stocks from January 1991 to December 1995. Their NSGA application to the Markowitz problem
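Our reading of the repairing procedure (C.9) can be sketched as follows. The function assumes, as in the text, that the infeasible weights sum to more than 1; the variable names are ours, not the authors'.

```python
def repair(x):
    """Repair weights so they sum to 1, following equation (C.9).

    Assumes sum(x) >= 1: keep the largest weights unchanged while the
    running sum stays below 1, truncate the weight at the cut-off index k,
    and leave all remaining (smaller) weights at zero.
    """
    # permutation pi: indices of x sorted by descending weight
    pi = sorted(range(len(x)), key=lambda i: x[i], reverse=True)
    repaired = [0.0] * len(x)
    total = 0.0
    for idx in pi:
        if total + x[idx] < 1.0:
            repaired[idx] = x[idx]        # case j < k: keep as-is
            total += x[idx]
        else:
            repaired[idx] = 1.0 - total   # case j = k: fill the remainder
            break                         # case j > k: weights stay zero
    return repaired

# keeps 0.6, truncates 0.5 down to 0.4, zeroes out 0.3
print(repair([0.6, 0.5, 0.3]))
```

Because larger weights are processed first, the repaired vector keeps as much of the original allocation as possible while restoring the normalization constraint (C.1).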
described above yielded a well-converged approximation of many Pareto-optimal solutions within 100 population steps. Each population contained 1000 individuals. Concerning their application of a MOEA to a quadratic optimization problem instead of using standard quadratic programming approaches, Vederajan et al. point out an interesting fact which the authors of this chapter also encountered when it came to real-world portfolio selection problems: as has already been mentioned in section 26.3.1, the covariance matrix $\Sigma$ is required to be positive definite to apply standard quadratic programming algorithms. If there are numerical issues (e. g. numerical imprecision due to rounding and/or floating-point arithmetic), this assumption might be violated. Moreover, a violation is not unlikely for real-world data, particularly when $n$ gets large, since the covariances are estimated from real asset price time series which do not necessarily satisfy a priori given restrictions of the mathematical tool for portfolio analysis. Thus, a MOEA approach is even suitable for such a standard problem setting. In addition to the above results, Vederajan et al. also consider a variant of their Markowitz problem setting where transaction costs due to changes in a portfolio (rebalancing) are an additional ingredient which causes problems for standard quadratic programming algorithms. Thus, the authors apply their NSGA approach again using a third objective function which is to be minimized:

$$f_{cost}(x) := \sum_{i=1}^{n} c_i \,(x_i - \bar{x}_i)^2 \qquad \text{(C.11)}$$

where $\bar{x}_i \in \mathbb{R}_+$ is the given initial weight of asset $i$ in the portfolio that is potentially to be changed due to rebalancing transactions, and the constant $c_i \in \mathbb{R}$ is the transaction cost for asset $i$. The above NSGA approach is again applied to the given five-asset problem instance, just the number of individuals per population is raised to 1500. Vederajan et al. illustrate the three-dimensional boundary of the approximated solutions in the objective function space and give a reasonable interpretation for the shape of the approximated Pareto front. As a summary, the work by Vederajan et al. contains an early application of the MOEA methodology to portfolio selection problems and provides even a practical justification for the application of this methodology to standard Markowitz problem settings where quadratic programming approaches are often considered to be mandatory. Furthermore, an interesting application of a MOEA to portfolio selection problems with transaction cost
is illustrated.
26.3.3. Lin et al.

In their study, Lin et al.19 consider the following variation of the standard Markowitz problem from section 26.3.1: each asset can only be held in non-negative integer units, i. e.

$$S := \{ x := (x_1, \dots, x_n) \mid \forall i \in \{1, \dots, n\}: x_i \in \mathbb{N} \cup \{0\} \} \qquad \text{(C.12)}$$
The market price of one unit of asset $i$ which can be bought is $p_i$. There is an upper limit $u_i$ on the maximum monetary value which is invested into each asset $i$, i. e.

$$\forall i \in \{1, \dots, n\}: p_i x_i \le u_i \qquad \text{(C.13)}$$
Furthermore, there are capital budget thresholds $C_0, C_1 \in \mathbb{R}_+$, and the total capital to be invested is required to satisfy the condition

$$C_0 \le \sum_{i=1}^{n} p_i x_i \le C_1 \qquad \text{(C.14)}$$
Summarizing the above constraints, we obtain

$$F := \left\{ (x_1, \dots, x_n) \in S \;\middle|\; C_0 \le \sum_{i=1}^{n} p_i x_i \le C_1 \;\wedge\; \forall i \in \{1, \dots, n\}: p_i x_i \le u_i \right\} \qquad \text{(C.15)}$$

For each asset $i \in \{1, \dots, n\}$ there is a variable transaction cost of $c_i \in \mathbb{R}$ which is proportional per unit of asset $i$ that is to be bought. In addition, there is a fixed transaction fee of $F_i$ which is also due in case that asset $i$ is put into the portfolio. The transaction cost considerations yield a different return objective function:

$$f_{return}(x) := \frac{1}{\sum_{i=1}^{n} p_i x_i} \left( \sum_{i=1}^{n} (\mu_i - c_i)\, p_i x_i \;-\; \sum_{i=1}^{n} F_i\, 1_{\{x_i > 0\}} \right) \qquad \text{(C.16)}$$
where the characteristic function is defined as

$$1_{\{x_i > 0\}} := \begin{cases} 1 & \text{if } x_i > 0, \\ 0 & \text{otherwise.} \end{cases} \qquad \text{(C.17)}$$

The risk function is the variance as in the standard portfolio selection setting, but written in terms of the integer $x_i$ variables and the prices $p_i$:

$$f_{risk}(x) := \frac{1}{\left( \sum_{i=1}^{n} p_i x_i \right)^2} \sum_{i=1}^{n} \sum_{j=1}^{n} p_i x_i \, p_j x_j \, \sigma_{ij} \qquad \text{(C.18)}$$
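Equations (C.16)–(C.18) can be checked with a short sketch. The prices, proportional costs, fixed fees and covariances below are invented toy values, not data from the study.

```python
def lin_return(x, mu, p, c, F):
    # (C.16): expected net return per unit of invested capital, with a
    # proportional cost c_i per unit and a fixed fee F_i for each held asset
    invested = sum(pi * xi for pi, xi in zip(p, x))
    gross = sum((mu[i] - c[i]) * p[i] * x[i] for i in range(len(x)))
    fees = sum(F[i] for i in range(len(x)) if x[i] > 0)
    return (gross - fees) / invested

def lin_risk(x, p, sigma):
    # (C.18): portfolio variance in terms of integer units x_i and prices p_i,
    # normalized by the squared invested capital
    invested = sum(pi * xi for pi, xi in zip(p, x))
    var = sum(p[i] * x[i] * p[j] * x[j] * sigma[i][j]
              for i in range(len(x)) for j in range(len(x)))
    return var / invested ** 2

# one unit of asset 0, none of asset 1
x, p = [1, 0], [10.0, 20.0]
print(lin_return(x, [0.10, 0.05], p, [0.01, 0.01], [1.0, 1.0]))  # (0.9 - 1) / 10
print(lin_risk(x, p, [[0.04, 0.0], [0.0, 0.01]]))
```

Note how the fixed fee $F_i$ makes the return function discontinuous in $x_i$ at zero, which is one of the features that puts this problem out of reach of standard quadratic programming.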
Lin et al. suggest a hybrid algorithm for the approximation of feasible non-dominated portfolios which combines elements of the MOEA NSGA-II by Deb et al.20 with concepts used in the single-objective EA GENOCOP by Michalewicz & Janikow21. Their hybrid algorithm is structured as follows:

(1) Run NSGA-II to determine a feasible initial population Pop(0) for the succeeding steps. The NSGA-II run is terminated if all individuals are feasible.
(2) Use NSGA-II to determine $\min_{x \in F}\{f_{risk}(x)\}$ and $\max_{x \in F}\{f_{return}(x)\}$ and insert the corresponding individuals $x$ into Pop(0).
(3) Apply NSGA-II using Pop(0) and stop if $pop_{max}$ populations have been processed.

The basis is the NSGA-II using integer-valued genes to represent the $x_i$ decision variables. Due to the feasibility constraints, Lin et al. suggest two special preprocessing stages to obtain a completely feasible initial population, particularly including the boundary solution(s) having minimum risk and maximum return objective function value. For clarity, we omit the additional optimization problems which arise in the first two steps and concentrate on the algorithmic elements used by Lin et al. in their NSGA-II implementation. The standard binary tournament selection implemented in the NSGA-II is performed on each population. To incorporate the level of constraint violation into the standard NSGA-II tournament rule, which allows the comparison between individuals both by domination and by degree of constraint violation, the following natural definition of the degree of constraint violation $g(x)$ is used:
$$g(x) := \begin{cases} 0 & \text{if } C_0 \le \sum_{i=1}^{n} p_i x_i \le C_1, \\ C_0 - \sum_{i=1}^{n} p_i x_i & \text{if } \sum_{i=1}^{n} p_i x_i < C_0, \\ \sum_{i=1}^{n} p_i x_i - C_1 & \text{if } \sum_{i=1}^{n} p_i x_i > C_1. \end{cases} \qquad \text{(C.19)}$$
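In code, the constraint violation degree is a three-way case split on the invested capital; the budget window and prices below are invented for illustration.

```python
def violation(x, p, c0, c1):
    # degree of violation g(x) of the budget constraint (C.14):
    # zero when feasible, otherwise the distance to the violated threshold
    invested = sum(pi * xi for pi, xi in zip(p, x))
    if invested < c0:
        return c0 - invested
    if invested > c1:
        return invested - c1
    return 0.0

# budget window [50, 150], unit prices 10 and 20
print(violation([2, 1], [10.0, 20.0], 50.0, 150.0))  # 10.0 (40 invested, 10 short)
print(violation([5, 5], [10.0, 20.0], 50.0, 150.0))  # 0.0 (150 invested, feasible)
```

In the tournament rule, a feasible individual beats an infeasible one, and between two infeasible individuals the smaller $g(x)$ wins, so this scalar is all the selection operator needs.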
As a crossover operator, a modified simulated binary crossover (SBX, cf. Deb & Agarwal22) is chosen. It works as follows: given are two parent individuals $x, y \in S$ which are used to create two offspring individuals $a, b \in S$. A random scaling factor $\beta_i$ is drawn independently and identically distributed for each gene (for more details, see e. g. Deb5, pp. 109-110), and then the offspring are determined by

$$a_i := 0.5\,\big((1 + \beta_i)\, x_i + (1 - \beta_i)\, y_i\big) \qquad \text{(C.20)}$$
$$b_i := 0.5\,\big((1 - \beta_i)\, x_i + (1 + \beta_i)\, y_i\big) \qquad \text{(C.21)}$$
After performing this SBX operation, the offspring individuals do not necessarily consist of integer-valued genes. Thus, Lin et al. use the following strategy for each gene to obtain integer allele values:
a_i := ⌊a_i⌋ or ⌊a_i⌋ + 1, chosen randomly,   if p_i (⌊a_i⌋ + 1) < u_i,
a_i := ⌊a_i⌋                                   otherwise.            (C.22)
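Combining the rounding in (C.22) with the GENOCOP-style feasibility repair that the text describes next might be sketched as follows; the budget-style feasibility test and all helper names are illustrative assumptions:

```python
import random

def round_gene(a_i, p_i, u_i):
    # (C.22): round down, or randomly up by one share if the extra
    # share still fits under the investment limit u_i.
    floor_i = int(a_i // 1)
    if p_i * (floor_i + 1) < u_i and random.random() < 0.5:
        return floor_i + 1
    return floor_i

def feasible(x, prices, capital):
    return sum(p_i * x_i for p_i, x_i in zip(prices, x)) <= capital

def crossover_with_repair(x, y, crossover, prices, upper, capital, max_tries=100):
    # Repeat crossover + rounding until both offspring satisfy the
    # budget constraint, falling back to the parents if repair fails.
    for _ in range(max_tries):
        a, b = crossover(x, y)
        a = [round_gene(a_i, p_i, u_i) for a_i, p_i, u_i in zip(a, prices, upper)]
        b = [round_gene(b_i, p_i, u_i) for b_i, p_i, u_i in zip(b, prices, upper)]
        if feasible(a, prices, capital) and feasible(b, prices, capital):
            return a, b
    return list(x), list(y)
```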
The same applies to b_i after performing the above SBX operator. Using the ideas from GENOCOP, the two offspring are checked for violation of constraint (C.14). If both are feasible, they are accepted, otherwise they are dropped and the whole crossover procedure is repeated using the two given parents x and y. Moreover, the adaptation from formula (C.22) is also applied to each gene after performing the mutation operator, for which the standard parameter-based mutation for real-valued genes is used. Lin et al. point out that they have found empirical evidence that, in the context of their problem, restoring the feasibility of individuals after performing the genetic variation operators is essential for improving the efficiency of the evolutionary search process.
In their empirical test a sample portfolio consisting of 31 stocks from the Hang Seng index is considered. The constrained portfolio selection problem is solved using the following parameters for the NSGA-II in the algorithmic steps (1) and (2): 200 individuals per population, crossover probability p_cross := 0.95, mutation probability p_mut := 1/(number of genes). In the third run of the NSGA-II, they use p_cross := 0.4, p_mut := 0.2 and pop_max := 3000 populations.
Lin et al. compare the final individuals found by a run of the MOEA to the globally optimal solutions for the corresponding unconstrained portfolio selection problem instance without transaction cost and with real-valued variables (i. e. the standard Markowitz problem instance), and the exact solution is approximated well by the MOEA. For the constrained problem, they illustrate the deviation of the approximations found by the MOEA for the Hang Seng instance from the globally optimal solutions of the corresponding unconstrained problem, which is due to the integrality constraints and the transaction cost. Summarizing the study, Lin et al. have considered a variant of the Markowitz portfolio selection problem where additional constraints raised the complexity such that a MOEA approach seems reasonable. They constructed a hybrid algorithm which combined ideas from the single-objective
Financial Applications of MOEAs: Recent Developments and Future Research
GENOCOP algorithm to repair infeasible solutions with the NSGA-II algorithmic framework.

26.3.4. Fieldsend & Singh

The article by Fieldsend & Singh23 transfers the concepts of finding a whole Pareto front of non-dominated individuals concerning a risk and a return objective function to the prediction of time series by Artificial Neural Networks (ANNs). As the main focus in this volume is on MOEAs, we refer the reader to Schlottmann & Seese3 for a short introduction to ANNs as well as a recent overview of financial applications of the latter methodology and further references. For our considerations below, it is sufficient to know that an ANN is used as a non-linear regression tool by Fieldsend & Singh to perform asset price time series prediction: Before asset trading commences at day t − 1 (t ∈ ℕ), the following data is considered for a single risky asset: the unknown opening asset price p_open(t − 1) at day t − 1 and the unknown highest daily asset price which will occur at day t, denoted by p_high(t). The goal is to predict
y(t) := p_high(t) / (c · p_open(t − 1))                              (C.23)
where c := 0.993 is a constant which is derived using a hypothetical trading strategy for the asset incorporating transaction cost; this is described in more detail later. Since the prediction for y(t) is made before knowing the realizations of p_open(t − 1) and p_high(t), the a priori prediction ŷ(t) does not necessarily match the true ex post outcome y(t) at day t. This yields a forecast error which is measured by the commonly used Root Mean Squared Error (RMSE) over k ∈ ℕ observations:

RMSE := sqrt( (1/k) Σ_{t=1}^k (ŷ(t) − y(t))² )                       (C.24)
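Formula (C.24) in code (the list encoding and names are assumptions):

```python
import math

def rmse(predictions, outcomes):
    # Root Mean Squared Error over k observations, formula (C.24).
    k = len(outcomes)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predictions, outcomes)) / k)
```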
To obtain a prediction ŷ(t), the known data from the previous ten trading days before t − 1 is used to calculate y(t − 2), y(t − 3), ..., y(t − 11), and these values together with the five previously made predictions ŷ(t − 1), ŷ(t − 2), ..., ŷ(t − 5) are the input for the ANN which performs a non-linear regression to predict the dependent variable y(t).
The key idea of Fieldsend & Singh is to consider two separate functions describing the prediction error: The first function f_return measures the return of a trading strategy that buys and sells the asset depending on the
value of ŷ(t), the last asset price movements on day t and the riskless interest rate for a bank deposit which is considered as an investment alternative to the risky asset. The calculation of the f_return function value can be summarized as follows: The risky asset is bought if the forecast of the next day's highest asset price is 1.7% or more above 99.3% of today's opening price, and if today's lowest price is ≤ 99.3% of today's opening price. If ŷ(t + 1) > 1.017 then the asset is sold as soon as the market price is 1.7% above the price paid for buying the asset, thus realizing a profit. Otherwise, the asset is sold at the end of the same trading day it was bought, and the outcome of the trade can be either profit or loss. If no trade is made, the risk-free interest rate is earned by the investor. The function value of f_return represents the outcome of the trading strategy depending on the prediction ŷ(t) and on the ex post realized asset prices. The second function f_risk covers the prediction risk and is identical to the RMSE from formula (C.24).
Based on the two objective functions, Fieldsend & Singh are interested in finding a front of non-dominated ANN models which predict the asset price time series. In their study, they highlight the analogy of their considerations to the Capital Asset Pricing Model (CAPM) proposed independently by Sharpe, Lintner and Mossin (see e. g. Sharpe24), which is an extension of the Markowitz work described in section 26.3.1. The bottom line is that they use a Markowitz-like definition of risk and return objective functions and search for a non-dominated Pareto front of non-linear regression models rather than searching a capital market equilibrium as stated by the CAPM. Thus, their study fits exactly into the picture from section 26.3.1.
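One illustrative reading of the trading rule above as a one-day payoff function; the order handling (a limit buy at 99.3% of the open, an intraday profit target) is our simplifying assumption, not the authors' exact backtest:

```python
def one_day_return(y_hat_next, open_px, low_px, high_px, close_px, riskless=0.0):
    # Buy at 99.3% of the open if the forecast ratio exceeds 1.017 and the
    # day's low lets the limit order fill; sell at +1.7% if the high reaches
    # the target, otherwise at the close.  No trade earns the riskless rate.
    buy = 0.993 * open_px
    if y_hat_next > 1.017 and low_px <= buy:
        target = 1.017 * buy
        sell = target if high_px >= target else close_px
        return sell / buy - 1.0
    return riskless
```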
The non-dominated solutions concerning the two objective functions are approximated by a combination of ANN methodology and the Strength Pareto Evolutionary Algorithm (SPEA) which is described in Zitzler & Thiele25. Fieldsend & Singh report the results of an empirical study with the following settings for SPEA: The search population (standard population) contains 80 individuals (ANNs), and there are up to 20 individuals chosen from a secondary unconstrained elite population for performing SPEA's binary tournament selection within each generation. A total of pop_max := 2000 population steps is performed. The standard one-point-crossover is chosen as the first variation operator, applied with probability p_cross := 0.8 to two selected individuals. In addition, the mutation operator is performed by adding the product Z_mut of three independent random variables Z_1 ∈ [0,1], Z_2 ∈ [0,1] and Z_3 ∈ ℝ to the mutated gene, such that Z_mut := Z_1 · Z_2 · Z_3 and Z_1, Z_2 are distributed
uniformly and Z_3 is normally distributed with zero mean and variance 10%. The random variable Z_mut is symmetric and has zero mean as well as a high degree of kurtosis (≈ 10). The mutation probability is 10%.
The data used in the empirical study is the Dow Jones Industrial Average index within 2500 trading days between 28/2/1986 and 3/1/2000. This data is divided into 25 time windows, each of which consists of the first 1000 trading days that are used in the ANN training (i. e. for calibrating parameters in the non-linear regression) and of the following 100 trading days used for an out-of-sample prediction performance test. The objective function values are calculated separately on the training and on the test data for each of the 25 time windows.
Fieldsend & Singh compare the resulting profit of the hybrid ANN-MOEA algorithm to the profit of a naive random walk prediction (ŷ(t) := y(t − 2)) and to the compounded daily asset return which reflects the outcome of a buy-and-hold strategy for the asset over the chosen time period. They conclude that the respective hybrid algorithm's prediction model outperforms the buy-and-hold strategy in terms of profit on the given data set if one chooses e. g. the model from the approximated Pareto front which has the maximum objective function value of f_return. The same holds for the model from the approximated Pareto front which has the minimum objective function value of f_risk, and this also applies to the 'middle' model, which is the model in the middle of the respective approximated Pareto front. An interesting additional observation is that while both the first and the last mentioned model also outperform the naive random walk prediction on the data set, the prediction risk minimizing model does not.
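The mutation perturbation Z_mut = Z_1 · Z_2 · Z_3 described above is straightforward to sample; reading "variance 10%" as Var(Z_3) = 0.1 is our interpretation:

```python
import random

def z_mut(sigma=0.1 ** 0.5):
    # Z1, Z2 ~ U[0,1] i.i.d., Z3 ~ N(0, sigma^2); the product is symmetric
    # around zero and heavy-tailed, matching the reported kurtosis of ~10.
    return random.random() * random.random() * random.gauss(0.0, sigma)
```

Most draws are tiny (the factor Z_1 · Z_2 has mean 1/4), with occasional large perturbations, which is exactly the leptokurtic mutation behaviour the authors describe.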
This means that strictly minimizing f_risk does not necessarily lead to excess values of f_return compared to more risky strategies, which is an analogy to the views of neo-classical capital market theory (of course in a different context here, since the prediction's risk and return are considered).
Summarizing their study, Fieldsend & Singh apply a hybrid algorithm to an asset price time series. The goal is to predict the asset price movements by different non-linear ANN regression models which are non-dominated concerning the return resulting from the prediction as well as the prediction risk. As the shape of the set of non-dominated solutions is a priori unknown and depends heavily on the empirical data which changes over time (cf. the extreme movements in stock prices which happened recently), a MOEA-based approach (here: SPEA) is appropriate. A heuristic approach is also reasonable here due to the results of Aspnes et al. which were mentioned in section 26.2: In the real world stock market there are many traders having
many different strategies, thus stock market prediction is a complex task.

26.3.5. Schlottmann & Seese

In Schlottmann & Seese26,27,28,29 we have developed a hybrid algorithm for solving a portfolio selection problem which is relevant to real-world banking. It is substantially different from the original Markowitz problem: Given is a bank which has a fixed supervisory capital budget C ∈ ℝ. This is an upper limit for investments into a portfolio consisting of a subset of n given assets (e. g. n loans to be given to different customers of the bank), each of which is subject to the risk of default (credit risk). Besides an expected rate of return μ_i ∈ ℝ similar to the Markowitz problem setting, each asset i also has an a priori expected default probability p_i ∈ (0,1) and a net exposure e_i ∈ ℝ+ within a fixed risk horizon [0, T]. The expected rate of return μ_i is not adjusted for the default risk (this will be addressed later). If asset i defaults until time T, the bank will lose the amount of e_i, and this loss event is expected to happen with probability p_i. Otherwise, if asset i does not enter default within the period of time [0, T], the bank's loss from this asset will be equal to 0. The search space of potential investment decisions for the bank without respecting the capital budget C is given by

S := { x := (x_1, ..., x_n) | ∀ i ∈ {1, ..., n} : x_i ∈ {0, e_i} }.  (C.25)
Thus, we consider binary-style decision variables, since the bank has to decide whether the whole net exposure is to be held in the portfolio. Furthermore, if and only if asset i is held in the portfolio, the bank has to allocate a supervisory capital amount of w_i · e_i (w_i ∈ ℝ+ is the supervisory capital weight) from its scarce resource C. This implies the constrained search space

F := { x ∈ S | Σ_{i=1}^n w_i x_i ≤ C }.                              (C.26)
In contrast to the standard Markowitz setting, the return objective function has to be adjusted for default risk, and we consider a monetary objective function value rather than a relative (percentage) value due to real-world banking objectives:

f_return(x) := Σ_{i=1}^n μ_i x_i − Σ_{i=1}^n p_i x_i = Σ_{i=1}^n (μ_i − p_i) x_i        (C.27)
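Formula (C.27) in code; each x_i is either 0 or the exposure e_i, and the list encoding is our assumption:

```python
def f_return(x, mu, p):
    # Net expected return (C.27): sum_i (mu_i - p_i) * x_i, i.e. the
    # expected return minus the expected default loss of the portfolio.
    return sum((mu_i - p_i) * x_i for mu_i, p_i, x_i in zip(mu, p, x))
```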
The objective function measures the net expected return of the portfolio x, which is adjusted for the whole expected loss from x, i. e. Σ_{i=1}^n p_i x_i.
Since the portfolio loss due to defaults has a highly skewed and asymmetric probability distribution, the variance risk measure used in the Markowitz setting is not appropriate here. Instead, in many banks, the following Credit-Value-at-Risk measure is used to quantify the unexpected loss for the bank due to the default risk: For a given portfolio structure x, the Credit-Value-at-Risk (CVaR) at the arbitrary, but fixed confidence level α ∈ (0.5, 1) is obtained by calculating

CVaR(x) := ψ^{−1}(α) − Σ_{i=1}^n p_i x_i                             (C.28)
where ψ^{−1}(α) is the α-percentile (inverse) of the cumulative distribution function of aggregated losses calculated from the portfolio x from given parameters (i. e. e_i, p_i and further parameters not relevant here, see e. g. Schlottmann & Seese29 for more details). To account for the real-world objective of banks, we set f_risk(x) := CVaR(x).
From a computational perspective, this immediately raises problems for standard non-linear optimization methods, since it can be shown easily that CVaR(x) is a non-convex function (cf. e. g. Schlottmann30, p. 127 f.). Moreover, in conjunction with the binary-style decision variables and the knapsack-like constraint (cf. formula (C.26)), one obtains a discrete constrained search space which contains many local optima and two conflicting objective functions. Hence, a MOEA-based approach is appropriate here.
The decision variables x_i are concatenated to form a gene string consisting of real-valued alleles. Of course, the binary-style decision variables could also be represented by binary digits, but for the sake of simplicity, we assume real-valued genes coding the decision variables in the following text. In the genetic representation, the decision variables are ordered such that the variables for assets which are highly correlated concerning default events are located very close to each other, see Schlottmann & Seese26 for more details and an example. The main goal of the permutation of the decision variables in the gene string is a better performance of the chosen one-point-crossover variation operator, which has a lower probability of destroying good partial solutions for the highly correlated decision variables if there are very few cut positions between them. We combine elements from different MOEAs in our approach with a
gradient-based search method. The basic algorithm is similar to the NSGA-II by Deb et al.20 for binary-style decision variables, but we added an external elite population which contains the best non-dominated feasible individuals found so far at each population step, to obtain more approximation solutions after termination of the algorithm without the need to raise the normal population size. The individuals are selected from the population for reproduction in a standard binary tournament comparable to NSGA-II. We use the standard one-point-crossover with probability p_cross := 0.9, and the standard mutation operator for binary-style variables (i. e. x_i = e_i is mutated to x_i = 0 and vice versa) with a mutation probability of p_mut := 1/n per gene.
In addition to these standard MOEA elements, we apply a third local search variation operator with probability 0 < p_local < 0.2 (this parameter choice will be discussed later) to each selected individual after finishing the crossover and mutation operation:

If x ∉ F Then Direction := −1
Else choose Direction ∈ {1, −1} with uniform probability 0.5
∀ i ∈ {1, ..., n} : x̃_i := x_i
Step := 0
Do
    ∀ i ∈ {1, ..., n} : x_i := x̃_i
    f_return^(old) := f_return(x), f_risk^(old) := f_risk(x)
    For each x_j calculate the partial derivative d_j := ∂/∂x_j ( f_return(x) / f_risk(x) )
    If Direction = −1 Then
        Choose the minimal gradient component i := argmin {d_j | x_j > 0}
        x̃_i := 0
    Else
        Choose the maximal gradient component i := argmax {d_j | x_j = 0}
        x̃_i := e_i
    End If
    f_return^(new) := f_return(x̃), f_risk^(new) := f_risk(x̃)
    Step := Step + 1
While (Step < Step_max) ∧ (∃ i : x̃_i > 0) ∧ (∃ j : x̃_j = 0)
    ∧ (x̃ ∉ current population) ∧ (x̃ ∉ elite population)
    ∧ [ (Direction = −1 ∧ x̃ ∉ F)
        ∨ ((Direction = 1 ∧ x̃ ∈ F) ∧ (f_return^(new) > f_return^(old) ∨ f_risk^(new) < f_risk^(old))) ]
∀ i ∈ {1, ..., n} : x_i := x̃_i
At first, the feasibility of the current individual x is checked: In case of an infeasible individual, the knapsack constraint (cf. formula (C.26)) is violated, thus the local search procedure has to remove assets from x to move into the direction of F. Otherwise the Direction of the local search is selected randomly. Within each iteration of the Do loop, the partial derivatives of the quotient f_return(x)/f_risk(x) are exploited to obtain the decision variable which is to be modified. This quotient is known in finance applications as a Risk-Adjusted Performance Measure, and maximizing this measure implies maximizing f_return and minimizing f_risk.
We use a computationally efficient credit portfolio risk measurement model (CreditRisk+ from CreditSuisse Financial Products31) which yields a fast approximation of all partial derivatives d_1, ..., d_n for a given portfolio x within a total computation time of only O(n). This saves time, since the exploration of the neighbourhood of a solution x that can be obtained by evaluation of all individuals which differ from x only in one allele value would require O(n²) computational steps in our setting. Of course, one has to point out that the exploitation of the gradient is a heuristic approach here, since we calculate the change of f_return(x)/f_risk(x) for infinitely small changes of the decision variables and actually change the decision variables in a large discrete step afterwards.
The local search operation is iterated at most Step_max ∈ ℕ times, and from empirical tests we recommend a value of 0 < Step_max < 4 depending on the problem instance (again, this parameter setting will be discussed later). Moreover, the local search iteration is terminated if no decision variable is left to be changed, if an a priori infeasible solution has become feasible, or if an a priori feasible solution would either become infeasible or could not be improved at least in one of the two objectives.
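A compressed sketch of the operator; the finite-difference gradient here stands in for the analytic O(n) CreditRisk+ derivatives, and the population-membership termination tests are omitted, so this is illustrative rather than the implementation used in the studies:

```python
import random

def local_search(x, e, f_return, f_risk, feasible, step_max=4, eps=1e-6, rng=random):
    # Gradient components of the RAPM quotient f_return/f_risk,
    # approximated by forward differences (stand-in for CreditRisk+).
    def grad(v):
        base = f_return(v) / f_risk(v)
        out = []
        for j in range(len(v)):
            w = list(v)
            w[j] += eps
            out.append((f_return(w) / f_risk(w) - base) / eps)
        return out

    direction = -1 if not feasible(x) else rng.choice((1, -1))
    x = list(x)
    for _ in range(step_max):
        d = grad(x)
        if direction == -1:                      # repair: remove assets
            held = [j for j in range(len(x)) if x[j] > 0]
            if not held:
                break
            x[min(held, key=d.__getitem__)] = 0  # drop the least useful asset
            if feasible(x):
                break
        else:                                    # improve: add assets
            free = [j for j in range(len(x)) if x[j] == 0]
            if not free:
                break
            i = max(free, key=d.__getitem__)     # add the most promising asset
            old_ret, old_risk = f_return(x), f_risk(x)
            x[i] = e[i]
            if not feasible(x) or not (f_return(x) > old_ret or f_risk(x) < old_risk):
                x[i] = 0                         # reject the step and stop
                break
    return x
```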
Besides the fact that the gradient-based local search is rather heuristic, the empirical results from using this problem-specific variation operator are striking: In our empirical studies, we applied the hybrid MOEA approach to several portfolios, with the number of assets ranging within 20 ≤ n ≤ 386. The portfolios correspond to typical real-world loan portfolios of German banks. To obtain a true judgement of the hybrid algorithm's performance, we compared the approximated feasible non-dominated solutions to the globally optimal solutions provided by a complete enumeration of the search space for small problem instances. In these comparisons, the hybrid algorithm found a well-converged approximation of the globally optimal solutions within seconds or a few minutes, while the enumeration took hours. Moreover, we compared the hybrid MOEA ceteris paribus to
its MOEA counterpart without the additional local search variation operator (p_local = 0) and performed 50 independent runs for the respective problem instance to obtain e. g. average performance measures for each algorithm. A summarizing observation is that the hybridization yielded an average improvement of about 17% to 95% for the set coverage metric criterion (cf. Zitzler25), which measures the dominance between solutions in two approximation sets in relation to their cardinality, and the maximum spread of solutions in the objective function space was raised by about 1% to 6% on average. Moreover, the standard deviation of the performance measures over the 50 respective runs was lower for the hybrid approach.
For setting the parameters p_local and Step_max we recommend the following guideline: If the problem is very discrete in its nature (i. e. n ≈ 30 or lower), p_local should be chosen very low (e. g. p_local = 0.01), since the gradient is a rather imprecise measure for the actual changes in objective function values depending on variations of the decision variables (this is also indicated by empirical results). For higher dimensionality, the problem typically gets smoother in the objective function values even if the decision variables are binary, thus the gradient-based local search should be applied to more individuals in the population. For such instances, we obtained good results with p_local values up to 0.2. The considerations for Step_max are similar, but in general this parameter should be kept small compared to n, otherwise the local search operator might run into the same local optima again and again, even if it starts from portfolios which are far away from each other both in the objective function and in the decision variable space.
Our approach can be summarized as a problem-specific MOEA hybridized with gradient-based local search which is applied to a discrete non-convex constrained multi-criteria problem from the context of portfolio credit risk management. It contains elements and ideas from different MOEAs as well as from classical quantitative methods in finance.

26.4. Conclusion and Future Research Directions

In the previous sections we discussed several successful applications of MOEAs to portfolio management problems which are very common in finance, particularly in real-world asset management and trading. All the approaches discussed so far in this chapter incorporate problem-specific knowledge - besides the standard MOEA elements - to obtain a more powerful algorithm compared to a straightforward MOEA application. From a theoretical point of view this is not surprising: It is a well-known
result from complexity theory that there is no uniform problem solving approach which can be successfully applied to all possible algorithmic problems while guaranteeing an efficient solution. Hence, MOEAs cannot be such a tool, either. Also from a practitioner's perspective, one cannot expect that a simple straightforward application of the basic ideas underlying MOEAs necessarily leads to success. Instead, it is highly recommended to hybridize MOEAs with other search algorithms and problem-specific methods, since both the multi-objective as well as the evolutionary approach offer high potential for a successful hybridization. The applications summarized in this chapter are good examples of such algorithms. We refer the reader also to Takada et al.32 and Mukerjee et al.33 for further interesting results in the context of portfolio selection.
Of course, reasonable applications of MOEAs to financial problems assume that the respective problem under consideration is actually multi-dimensional in its objectives. At first, this might seem to be a trivial statement, but particularly in economics and finance, this requirement is contrary to the tendency to build models as simple as possible. As a consequence of this tendency, many financial problems are modelled in a single-objective way even if they are naturally multi-objective. The portfolio selection problem defined in section 26.3.1 is a typical example. Thus, to extend the focus of MOEAs to a broader range of financial applications beyond portfolio selection problems - this is what we consider to be mission-critical for a better acceptance of MOEAs from finance researchers' point of view - one has to look for problems which are not sufficiently solved by considering simple (i. e. single-objective) models and/or standard algorithms. Potential candidates are particularly those problems which require simultaneous calibration of several input parameters or multiple model outputs, e. g.
• the calibration of a multi-factor model for the term structure of interest rates to empirical data given for all maturities (e. g. 1 year, 2 years, ..., 10 years),
• the calibration of the parameters of a non-linear, non-smooth valuation function for an exotic option contract to given empirical asset price histories,
• building valuation models from basic building blocks for complex structured finance products for which a closed-form solution has not been found yet.

Such problems are really multi-dimensional in the decision variables and the
objective functions. Moreover, they typically incorporate constraints, non-linear and/or non-convex functions as well as other issues which cannot be handled by standard methods. Hence, more flexible solution methods like MOEAs which do not assume strict mathematical properties offer interesting potential not only to solve a single given instance of a problem, but also to enhance our understanding of the problem's structure. For instance, if we obtained a whole set of Pareto-efficient solutions to a difficult parameter calibration problem, we might better understand the dependencies and trade-offs between the parameters and the objective functions. And this is very valuable support for the finance model architect to design better models.
Due to the flexibility of MOEAs one can potentially build more realistic financial models or, respectively, models which are more suitable for practical applications compared to classroom-tailored approaches that are constrained to analytical solution procedures (cf. similar considerations in Goldberg34). For instance, compare our approach from section 26.3.5 to the standard Markowitz problem. While our MOEA-based approach allows the use of the Credit-Value-at-Risk, which is a very common risk measure in current practice, the latter approach based on the portfolio variance is not suitable in the context of asymmetric asset return probability distributions occurring in credit risk.
Identifying appropriate problems, building corresponding applications and finding the added value beyond solving the respective problem instance in the above sense will be a key challenge both for MOEA researchers and the finance community in the near future. Moreover, we think that it is mission-critical for the success of MOEAs in the long run that their theoretical foundation can be provably enhanced.
Cotta and Moscato35 point out some central questions for general evolutionary computation, which of course also apply to MOEAs in finance, so they should be stated in our context, too:
• Identify multi-objective NP-complete finance problems for which MOEAs have proved not to be competitive against the best other heuristic or approximation algorithm known for those problems.
• Identify which financial problems can be approached using a MOEA paradigm and identify the reasons.
• For the problems matching the previously listed two items, it will be important to find links to the theory of computational complexity and to the complexity classes these problems belong to. Of special
interest here are approximability and, in particular, parameterized complexity (see Downey & Fellows36 and Fellows37).
It is very important for real-world applications to know some conditions under which MOEAs can be applied successfully. Especially in the area of finance, where often deep, mathematically founded theories are applied, it is important to know why a specific heuristic method succeeds. Furthermore, it is important to prove performance guarantees. An important question in this context is how the success depends on structural parameters. In Seese & Schlottmann12,13,38,39 we discussed a structural criterion implying high complexity of algorithmic problems. This criterion, denoted as the ABC-criterion, is also relevant in the area of finance and especially risk management, and we think that it could be of interest to investigate how the structural parameters identified in our studies - size of embeddable grids, homogeneity and flow of information - could influence the performance of MOEAs for different problems.
We hope that the interaction between the MOEA community and finance researchers as well as professionals will be intensified by this survey on selected financial MOEA applications and by the potential future research activities pointed out above. Hopefully, more applications will be developed in a broader range of financial problem contexts compared to the nevertheless promising results obtained so far.

26.5. Acknowledgement

The authors would like to thank GILLARDON AG financial software for partial support of their work. Nevertheless, the views expressed in this chapter reflect the personal opinion of the authors and are neither official statements of GILLARDON AG financial software nor of its partners or its clients.

References

1. David Fogel and Zbigniew Michalewicz. How to solve it: Modern heuristics. Springer, Heidelberg, 2000.
2. J. Hromkovic. Algorithmics for hard problems. Springer, Heidelberg, 2001.
3. Frank Schlottmann and Detlef Seese.
Modern heuristics for finance problems: a survey of selected methods and applications. In S. Rachev and C. Marinelli, editors, Handbook on Numerical Methods in Finance. Springer, Berlin, 2004.
4. S. Chen. Evolutionary computation in economics and finance. Springer, Heidelberg, 2002.
5. Kalyanmoy Deb. Multi-objective optimisation using evolutionary algorithms. John Wiley & Sons, Chichester, 2001.
6. C. Coello, D. Van Veldhuizen, and G. Lamont. Evolutionary algorithms for solving multi-objective problems. Kluwer, New York, 2002.
7. Andrzej Osyczka. Evolutionary algorithms for single and multicriteria design optimization. Physica, Heidelberg, 2002.
8. J. Aspnes, D. Fischer, M. Fischer, M. Kao, and A. Kumar. Towards understanding the predictability of stock markets. In Proceedings of the 12th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 745-754, 2001.
9. Michael Garey and David Johnson. Computers and intractability. W. H. Freeman & Company, New York, 1979.
10. Christos Papadimitriou. Computational complexity. Addison-Wesley, Reading, 1994.
11. H. Kellerer, U. Pferschy, and D. Pisinger. Knapsack problems. Springer, Heidelberg, 2004.
12. Detlef Seese and Frank Schlottmann. The building blocks of complexity: a unified criterion and selected problems in economics and finance. Sydney Financial Mathematics Workshop, 2002. http://www.qgroup.org.au/SFMW.
13. Detlef Seese and Frank Schlottmann. The building blocks of complexity: a unified criterion and selected applications in risk management. Complexity 2003: Complex behaviour in economics, Aix-en-Provence, 2003. http://zai.ini.unizh.ch/www_complexity2003/doc/Paper_Seese.pdf.
14. Giorgio Ausiello, Pierluigi Crescenzi, Giorgio Gambosi, and Viggo Kann. Complexity and approximation. Springer, Heidelberg, 1999.
15. Harry Markowitz. Portfolio selection. Journal of Finance, 7:77ff, 1952.
16. Ganesh Vedarajan, Louis Chan, and David Goldberg. Investment portfolio optimization using genetic algorithms. In John Koza, editor, Late Breaking Papers of the Genetic Programming 1997 Conference, pages 255-263. Stanford University, California, 1997.
17. N. Srinivas and K. Deb. Multiobjective function optimization using nondominated sorting genetic algorithms. Evolutionary Computation, 2(3):221-248, 1995.
18.
J. Bean. Genetic algorithms and random keys for sequencing and optimization. ORSA Journal on Computing, 6:154-160, 1994.
19. Dan Lin, Shouyang Wang, and Hong Yan. A multiobjective genetic algorithm for portfolio selection. Working Paper, Institute of Systems Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China, 2001.
20. Kalyanmoy Deb, Samir Agrawal, Amrit Pratap, and T. Meyarivan. A fast elitist non-dominated sorting genetic algorithm for multi-objective optimisation: NSGA-II. In M. Schoenauer, K. Deb, G. Rudolph, X. Yao, E. Lutton, J. Merelo, and H. Schwefel, editors, Parallel problem solving from nature, LNCS 1917, pages 849-858. Springer, Berlin, 2000.
21. Z. Michalewicz and C. Janikow. Genocop: A genetic algorithm for numerical optimization problems with linear constraints. Communications of the ACM, (12):118, 1996.
Financial Applications of MOEAs: Recent Developments and Future Research 651 22. K. Deb and R. Agarwal. Simulated binary crossover for continuous search space. Complex Systems, 9(2):115-148, 1995. 23. Jonathan Fieldsend and Sameer Singh. Pareto multi-objective non-linear regression modelling to aid capm analogous forecasting. In Proceedings of the IEEE/INNS Joint International Conference on Neural Networks (INCNN'02), World Congress on Computational Intelligence, Vol. 1, pages 388393. Honolulu, Hawaii, 2002. 24. William Sharpe. Capital asset prices, a theory of market equilibrium under conditions of risk. Journal of Finance, 19:425-442, 1964. 25. Eckart Zitzler and Lothar Thiele. An evolutionary algorithm for multiobjective optimization: the strength pareto approach. Technical Report 43, Eidgenoessisch-Technische Hochschule, Zuerich, 1998. 26. Frank Schlottmann and Detlef Seese. A hybrid genetic-quantitative method for risk-return optimisation of credit portfolios. In Carl Chiarella and Eckhard Platen, editors, Quantitative Methods in Finance 2001 Conference abstracts, page 55. University of Technology, Sydney, 2001. Full paper: http://www.business.uts.edu.au/finance/resources/qmf2001/Schlottmann_F.pdf. 27. Frank Schlottmann and Detlef Seese. Hybrid multi-objective evolutionary computation of constrained downside risk-return efficient sets for credit portfolios. 2002. 8th International Conference of the Society for Computational Economics: Computing in Economics and Finance, Aix-en-Provence, http://www.cepremap.ens.fr/sce2002/papers/paper78.pdf. 28. Frank Schlottmann and Detlef Seese. Finding constrained downside riskreturn efficient credit portfolio structures using hybrid multi-objective evolutionary computation. In G. Bol, G. Nakhaeizadeh, S. Rachev, T. Ridder, and K. Vollmer, editors, Credit risk, pages 231-266. Physica, Heidelberg, 2003. 29. Frank Schlottmann and Detlef Seese. A hybrid heuristic approach to discrete portfolio optimization. 
Computational Statistics and Data Analysis, 2004. To appear. 30. Frank Schlottmann. Komplexitaet und hybride quantitativ-evolutionaere Ansaetze im Kreditportfoliorisikomanagement (in German). PhD thesis, University Karlsruhe, Karlsruhe, 2003. 31. CreditSuisse Financial Products. CreditRisk+(tm). 1997. http://www.csfp.co.uk/creditrisk/assets/creditrisk.pdf. 32. Y. Takada, M. Yamamura, and S. Kobayashi. An approach to portfolio selection problems using multi-objective genetic algorithms. In In Proceedings of the 23rd Symposium on Intelligent Systems, pages 103-108. 1996. 33. A. Mukerjee, R. Biswas, K. Deb, and A. Mathur. Multi-objective evolutionary algorithm for the risk return trade-off in bank loan management. Technical Report 2001005, KanGAL, Kanpur, India, 2002. 34. David Goldberg. Genetic and evolutionary algorithms in the real world. Technical Report 99013, Department of General Engineering, University of Illinois, Urbana, March 1999. 35. Carlos Cotta and Pablo Moscato. Evolutionary computation: challenges and duties, pages 1-15, 2002. Preprint of Dept. Lenguajes y Ciencias de la Com-
652
36. 37. 38.
39.
F. Schlottmann and D. Seese
putation, Universidad de Malaga and The School of Electrical Engineering and Computer Science, University of Newcastle. Rod Downey and Mike Fellows. Parameterized complexity theory. Springer, Berlin, 1999. Mike Fellows. Parameterized complexity: The main ideas and connections to practical computing. Electronic Notes in Theoretical Computer Science, 61, 2002. http://www.elsevier.nl/locate/entcs/volume61.html. Detlef Seese and Frank Schlottmann. Large grids and local information flow as a reason for high complexity. In Gerry Frizelle and Huw Richards, editors, Tackling industrial complexity: the ideas that make a difference, Proceedings of the 2002 Conference of the Manufacturing Complexity Network, pages 193207. University of Cambridge, Cambridge, UK, 2002. Detlef Seese and Frank Schlottmann. Structural reasons for high complexity: A survey on results and problems, pages 1-160, 2003. University Karlsruhe, unpublished manuscript.
CHAPTER 27 EVOLUTIONARY MULTI-OBJECTIVE OPTIMIZATION APPROACH TO CONSTRUCTING NEURAL NETWORK ENSEMBLES FOR REGRESSION
Yaochu Jin, Tatsuya Okabe and Bernhard Sendhoff
Honda Research Institute Europe
Carl-Legien-Str. 30, 63073 Offenbach, Germany
E-mail: [email protected]

Neural network ensembles have been shown to be very effective in improving the performance of neural networks used for classification or regression. One essential issue in constructing a neural network ensemble is to ensure that the ensemble members show sufficient diversity in their behavior. This chapter suggests a multi-objective approach to generating diverse ensemble members. A genetic algorithm is used to evolve both the weights and the structure of the neural networks, and the Rprop learning algorithm is employed for efficient life-time learning of the weights. Different complexity criteria, namely the number of connections, the sum of absolute weights and the sum of squared weights, are adopted as a second objective in addition to the approximation accuracy. Ensembles are constructed using either the whole set or a subset of the non-dominated solutions found, and various methods for selecting such a subset are compared. The proposed multi-objective approach is compared to the random approach on two regression test problems. It is found that using a neural network ensemble can significantly improve the regression accuracy, especially when a single network is not able to predict reliably; in this case, the multi-objective approach is effective in finding diverse neural networks for constructing the ensemble.
27.1. Introduction

It has been shown that neural network ensembles are able to improve the generalization performance both for classification and regression17,3. The benefit of using a neural network ensemble originates from the diversity in the behavior of the ensemble members. Basically, diversity of ensemble
members can be enhanced by using various initial random weights, varying the network architecture, employing different training algorithms, or supplying different training data20. In some cases, it is also possible to increase network diversity by generating training data from different sources. For example, the geometry of an object can be represented by parametric or non-parametric methods, so that different sources of training data can be obtained for describing a given property of the same object.

Compared to the above-mentioned methods, which achieve diversity implicitly, methods for explicitly encouraging diversity among ensemble members have been widely studied in recent years. Measures for increasing diversity include a diversity index16, the degree of decorrelation19, and the degree of negative correlation14,15 between the outputs of the candidate networks.

Individual neural networks in an ensemble can be trained independently, sequentially, or simultaneously11. In the first case, neural networks are generated separately and no interaction between the networks is taken into account in training. In the second case, neural networks are generated sequentially, and the correlation between the current network and the existing ones is taken into account to encourage diversity. In the third case, neural networks are trained simultaneously, not only minimizing the approximation error, but also encouraging diversity among the individual networks. Obviously, in the latter two approaches diversity is taken into account explicitly. One possible disadvantage of simultaneous training is that the networks in the population could become competitive11.

In training single neural networks, regularization techniques have been widely employed to improve the generalization performance3.
A general idea is to include an additional term, often known as the regularizer, in the cost function of the learning algorithm to avoid overfitting the training data. In fact, most diversity-based methods for generating ensembles can also be seen as a kind of regularization technique.

From the multi-objective optimization point of view, adding a regularization term to the cost function is equivalent to combining two objectives using a weighted aggregation formulation. Thus, it is straightforward to re-formulate regularization techniques as multi-objective optimization problems. Such ideas have been reported4, where a variation of the ε-constraint algorithm was adopted to obtain a single Pareto-optimal solution that simultaneously minimizes the training error and the norm of the weights. Similar work has also been reported1, where a multi-objective evolutionary algorithm is used to minimize the approximation error and the
number of hidden nodes of the neural network. Again, only the solution with the minimal approximation error was selected for final use. In addition, multi-objective optimization has been employed to evolve neural network modules in a cooperative co-evolution framework so as to increase the diversity of the modules7.

This chapter presents a method for generating a set of Pareto-optimal neural networks for constructing neural network ensembles. The genetic algorithm with Lamarckian inheritance for evolving neural networks9 is adapted to the multi-objective optimization purpose. To this end, the elitist non-dominated sorting and the crowded tournament selection suggested in reference 6 are adopted for fitness assignment and selection. Either the whole non-dominated set obtained or a subset of it is used to construct the ensemble. The performance of the resulting ensembles is compared on two test problems. Ensembles whose members are generated using the multi-objective approach are also compared to ensembles whose member networks are generated independently. It is shown that the performance of the ensembles depends to a large degree on the features of the training, validation and test data.

27.2. Multi-Objective Optimization of Neural Networks

27.2.1. Parameter and Structure Representation of the Network
A connection matrix and a weight matrix are employed to describe the structure and the weights of a neural network. The connection matrix specifies the structure of the network, whereas the weight matrix determines the strength of each connection. Assume that a neural network consists of M neurons in total, including the input and output neurons; then the size of the connection matrix is M × (M + 1), where an element in the last column indicates whether a neuron is connected to a bias value. If element c_ij (i = 1, ..., M, j = 1, ..., M) equals 1, there is a connection between the i-th and j-th neuron and the signal flows from neuron j to neuron i. If j = M + 1, the entry indicates whether there is a bias in the i-th neuron. Obviously, for a purely feedforward network, the upper triangle of the matrix, except for the (M + 1)-th column, is always zero. Fig. 27.1 illustrates a connection matrix and the corresponding network structure. It can be seen from the figure that the network has one input neuron, two hidden neurons, and one output neuron, and that both hidden neurons have a bias. The strength (weight) of the connections is defined in the weight matrix.
Fig. 27.1. A connection matrix and the corresponding network structure.
Accordingly, if c_ij in the connection matrix equals zero, the corresponding element in the weight matrix must be zero too.
27.2.2. Objectives in Network Optimization
The most common objective function (also known as the error function or the cost function) in training or evolving neural networks is the mean squared error (MSE):

E = \frac{1}{N} \sum_{i=1}^{N} \left( y_d(i) - y(i) \right)^2,   (B.1)

where N is the number of training samples, y_d(i) is the desired output of the i-th sample, and y(i) is the network output for the i-th sample. For the sake of clarity, we assume here that the neural network has only one output. Other error functions, such as the Minkowski error or cross-entropy, can also be used.3

It has been found that neural networks can often over-fit the training data, which means that the network has a very good approximation accuracy on the training data, but a very poor one on unseen data. Many methods have been developed to improve the generalization performance of neural networks.3 A very popular technique is regularization, which adds a penalty term to the error function:

J = E + \lambda \Omega,   (B.2)

where \lambda is a coefficient that controls the extent to which the regularization influences the optimal solution, and \Omega is known as the regularizer. A simple class of regularizers penalizes the sum of squared weights, also known
as the Gaussian regularizer, which favors a smooth output of the network:

\Omega = \sum_{k} w_k^2,   (B.3)
where k is an index running over all weights. Alternatively, the sum of absolute weights, also known as the Laplace regularizer, can be used:

\Omega = \sum_{i} |w_i|.   (B.4)
The Gaussian regularizer and the Laplace regularizer are also known as weight decay in neural network training. When training neural networks with regularization, it is often a matter of trial-and-error to determine the coefficient \lambda, although methods have been developed to optimize the coefficient based on empirical, algebraic or Bayesian estimation of the generalization error on the validation data.21 This situation is easy to understand from the multi-objective point of view: for each given \lambda, one single Pareto-optimal solution is obtained. Obviously, the regularization technique in equation (B.2) can be reformulated as a bi-objective optimization problem:

\min \{ f_1, f_2 \},   (B.5)
f_1 = E,   (B.6)
f_2 = \Omega,   (B.7)

where E is defined in equation (B.1) and \Omega is one of the regularization terms defined in equation (B.3) or (B.4).

Basically, the weight decay techniques try to reach a good trade-off between the complexity of the neural network and the approximation accuracy in order to avoid overfitting the training data. Another straightforward index for measuring the complexity of a neural network is the number of connections in the network:

\Omega = \sum_{i} \sum_{j} c_{ij},   (B.8)
Obviously, the smaller the number of connections in a network, the less complex the network. Note that this regularizer is well suited for evolutionary optimization, although it is not applicable to gradient-based learning algorithms due to its discrete nature. For this reason, we term it evolutionary regularization. In the following study, the number of connections, the sum of absolute weights and the sum of squared weights are employed as the second objective in optimization.
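All three candidate second objectives can be computed directly from the connection and weight matrices of Section 27.2.1. A minimal sketch (the function name is ours, not from the chapter):

```python
def complexity_objectives(connect, weights):
    """Return the three regularizers used as second objectives here:
    number of connections (B.8), sum of absolute weights (B.4), and
    sum of squared weights (B.3). Matrix layout follows Section 27.2.1,
    so weights are only counted where c_ij = 1.
    """
    n_conn = sum(c for row in connect for c in row)
    flat = [w for i, row in enumerate(weights)
            for j, w in enumerate(row) if connect[i][j]]
    return n_conn, sum(abs(w) for w in flat), sum(w * w for w in flat)

# Tiny illustrative 2 x 3 example (one bias column):
C = [[0, 0, 1], [1, 0, 1]]
W = [[0, 0, 0.5], [-2.0, 0, 1.0]]
objs = complexity_objectives(C, W)   # (3 connections, |w| sum 3.5, w^2 sum 5.25)
```

Only the first of the three is discrete, which is why it suits the evolutionary search but not the gradient-based life-time learning.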
27.2.3. Mutation and Learning

A genetic algorithm with a hybrid of binary and real-valued coding has been used for optimizing the structure and weights of the neural networks, and its genetic operators are quite specific. Four mutation operators are implemented on the chromosome encoding the connection matrix, namely insertion of a hidden neuron, deletion of a hidden neuron, insertion of a connection, and deletion of a connection.9 A Gaussian-type mutation is applied to the chromosome encoding the weight matrix. One of the five mutation operators is randomly selected and performed on each individual; no crossover is employed in this algorithm.

After mutation, an improved version of the Rprop algorithm10 is carried out to train the weights. This can be seen as life-time learning within a generation. After learning, the fitness of each individual with regard to the approximation error (f_1) is updated. In addition, the weights modified during the life-time learning are encoded back into the chromosome, which is known as Lamarckian inheritance2. During the life-time learning, only the first objective, i.e., the approximation error, is minimized.

The Rprop learning algorithm is employed in this work because it is believed to be faster and more robust than other gradient-based learning algorithms. Let w_{ij} denote the weight connecting neuron j and neuron i; then the change of the weight, \Delta w_{ij}, in each iteration is as follows:
\Delta w_{ij}^{(t)} = -\Delta_{ij}^{(t)} \, \mathrm{sign}\!\left( \frac{\partial E^{(t)}}{\partial w_{ij}} \right),   (B.9)

where sign(·) is the sign function and \Delta_{ij}^{(t)} > 0 is the step-size, which is initialized to \Delta_0 for all weights. The step-size for each weight is adjusted as follows:

\Delta_{ij}^{(t)} = \begin{cases} \xi^{+} \cdot \Delta_{ij}^{(t-1)}, & \text{if } \frac{\partial E^{(t-1)}}{\partial w_{ij}} \cdot \frac{\partial E^{(t)}}{\partial w_{ij}} > 0 \\ \xi^{-} \cdot \Delta_{ij}^{(t-1)}, & \text{if } \frac{\partial E^{(t-1)}}{\partial w_{ij}} \cdot \frac{\partial E^{(t)}}{\partial w_{ij}} < 0 \\ \Delta_{ij}^{(t-1)}, & \text{otherwise} \end{cases}   (B.10)
where 0 < \xi^{-} < 1 < \xi^{+}. To prevent the step-sizes from becoming too large or too small, they are bounded by \Delta_{\min} \le \Delta_{ij} \le \Delta_{\max}.

One exception must be considered. After the weights are updated, it is necessary to check whether the partial derivative changes sign, which indicates that the previous step might have been too large and thus a minimum has been
missed. In this case, the previous weight change should be retracted:

\Delta w_{ij}^{(t)} = -\Delta w_{ij}^{(t-1)}, \quad \text{if } \frac{\partial E^{(t-1)}}{\partial w_{ij}} \cdot \frac{\partial E^{(t)}}{\partial w_{ij}} < 0.   (B.11)
Recall that if the weight change is retracted in the t-th iteration, \partial E^{(t)} / \partial w_{ij} should be set to 0. It has been argued that the condition for weight retraction in equation (B.11) is not always reasonable.10 The weight change should be retracted only if the partial derivative changes sign and the approximation error increases. Thus, the weight retraction condition in equation (B.11) is modified as follows:

\Delta w_{ij}^{(t)} = -\Delta w_{ij}^{(t-1)}, \quad \text{if } \frac{\partial E^{(t-1)}}{\partial w_{ij}} \cdot \frac{\partial E^{(t)}}{\partial w_{ij}} < 0 \text{ and } E^{(t)} > E^{(t-1)}.   (B.12)
It has been shown on several benchmark problems that the modified Rprop (termed Rprop+) exhibits consistently better performance than the original Rprop.10
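The per-weight update just described can be sketched as follows. The function layout and the handling of the zero-gradient case are our reading of the published Rprop+ description, not code from the chapter; the hyperparameter defaults match the values quoted later in Section 27.4.1.

```python
def rprop_plus_step(w, grad, grad_prev, step, dw_prev, E, E_prev,
                    xi_minus=0.2, xi_plus=1.2, step_min=0.0, step_max=50.0):
    """One Rprop+ update for a single weight, following (B.9)-(B.12).

    Returns (new_w, grad_to_store_for_next_iter, new_step, dw).
    """
    sign = lambda v: (v > 0) - (v < 0)
    prod = grad_prev * grad
    if prod > 0:                          # same gradient sign: accelerate
        step = min(step * xi_plus, step_max)
        dw = -step * sign(grad)           # equation (B.9)
    elif prod < 0:                        # sign change: decelerate
        step = max(step * xi_minus, step_min)
        if E > E_prev:                    # Rprop+: retract only if the
            dw = -dw_prev                 # error also increased (B.12)
        else:
            dw = 0.0
        grad = 0.0                        # skip adaptation in the next step
    else:                                 # prod == 0 (e.g. after retraction)
        dw = -step * sign(grad)
    return w + dw, grad, step, dw
```

In the evolutionary loop, 50 such iterations over all weights constitute the life-time learning of one individual.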
27.2.4. Elitist Non-Dominated Sorting and Crowded Tournament Selection

After mutation and life-time learning, the offspring and parent populations are combined. Then, a non-domination rank (r_i) and a local crowding distance (d_i) are assigned to each individual in the combined population, as suggested in reference 6. After that, the crowded tournament selection6 is implemented: two individuals are randomly picked from the combined population; if individual A has a higher (better) rank than individual B, individual A is selected; if they have the same rank, the one with the better crowding distance (i.e., the one located in a less crowded area) is selected. Compared to fitness sharing techniques, the crowded tournament selection guarantees that the individual with the better rank is selected. The crowding distance can be calculated either in the parameter or the objective space; in this work, it is computed in the objective space.
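The tournament rule above can be sketched in a few lines; the data layout (rank and crowding distance precomputed per individual) is our illustrative choice:

```python
import random

def crowded_tournament(pop, rng=random):
    """Pick one individual via crowded tournament selection: the lower
    non-domination rank wins; ties are broken by the larger crowding
    distance. Each individual is a dict with 'rank' and 'dist'.
    """
    a, b = rng.sample(pop, 2)
    if a['rank'] != b['rank']:
        return a if a['rank'] < b['rank'] else b
    return a if a['dist'] >= b['dist'] else b

pop = [{'rank': 0, 'dist': 0.5}, {'rank': 1, 'dist': 9.9}]
winner = crowded_tournament(pop)   # the rank-0 individual always wins here
```

Because rank dominates the comparison, a well-converged individual can never lose to a dominated one, which is the guarantee mentioned above.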
27.3. Selecting Ensemble Members

So far, the size of an ensemble has usually been determined empirically, with a few exceptions.22,23 In one approach, a genetic algorithm is used to select a subset of the final population as ensemble members.22 In another23, genetic programming has been employed to search for an optimal ensemble size.
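The member-selection strategies compared later in this chapter (keep all non-dominated networks, or keep only those below an error threshold on the training or validation data) amount to a simple filter. The function name and data layout below are illustrative, not from the chapter:

```python
def select_members(networks, strategy="all", threshold=None):
    """Filter candidate ensemble members.

    Each network is a dict with precomputed 'train_mse' and 'valid_mse'.
    strategy: 'all' keeps every candidate; 'train' / 'valid' keep only
    those whose MSE on that data set is below `threshold`.
    """
    if strategy == "all":
        return list(networks)
    key = {"train": "train_mse", "valid": "valid_mse"}[strategy]
    return [n for n in networks if n[key] < threshold]

nets = [{"train_mse": 0.05, "valid_mse": 0.3},
        {"train_mse": 0.20, "valid_mse": 0.1}]
chosen = select_members(nets, "valid", threshold=0.2)
```

The hand-picked "representative subset" used in the experiments has no such simple rule; it corresponds to choosing well-spread points along the non-dominated front by inspection.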
Selecting a subset from a given number of networks can also be cast as finding the optimal weight for each candidate network based on a certain criterion. Given N neural networks, the final output of the ensemble can be obtained by averaging the weighted outputs of the ensemble members:

y^{EN} = \sum_{k=1}^{N} a^{(k)} y^{(k)},   (C.1)

where y^{(k)} and a^{(k)} are the output and the weight of the k-th neural network in the ensemble. Usually, all weights are equally set to 1/N, and the overall output is known as the simple average. If the weights are optimized based on a certain criterion, the overall output is called the weighted average. Given a set of validation data, the expected error of the weighted output of the ensemble can be calculated by:

E^{EN} = \sum_{i=1}^{N} \sum_{j=1}^{N} a^{(i)} a^{(j)} C_{ij},   (C.2)
where C_{ij} is the error correlation between network i and network j in the ensemble:

C_{ij} = E\left[ (y^{(i)} - y_d)(y^{(j)} - y_d) \right],   (C.3)

where E(·) denotes the mathematical expectation and y_d is the desired output. It has been shown17 that there exists an optimal set of weights that minimizes the expected prediction error of the ensemble:
a^{(k)} = \frac{ \sum_{j=1}^{N} (C^{-1})_{kj} }{ \sum_{i=1}^{N} \sum_{j=1}^{N} (C^{-1})_{ij} },   (C.4)

where 1 \le k \le N.
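The closed-form weights of equation (C.4) can be computed directly once the correlation matrix is estimated. The sketch below uses a plain Gauss-Jordan inverse to stay dependency-free; the function name is ours, and the chapter itself avoids this route in favor of an evolution strategy precisely because the estimated C is often ill-conditioned.

```python
def optimal_ensemble_weights(C):
    """Combination weights from equation (C.4):
    a_k = sum_j (C^-1)_kj / sum_ij (C^-1)_ij,
    with C the (assumed invertible) error correlation matrix.
    """
    n = len(C)
    # Build the augmented matrix [C | I] and run Gauss-Jordan elimination.
    aug = [row[:] + [float(i == j) for j in range(n)]
           for i, row in enumerate(C)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(aug[r][col]))  # pivot row
        aug[col], aug[p] = aug[p], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [v - f * u for v, u in zip(aug[r], aug[col])]
    inv = [row[n:] for row in aug]
    row_sums = [sum(row) for row in inv]
    total = sum(row_sums)
    return [s / total for s in row_sums]

# Uncorrelated members with equal error variance -> equal weights 1/N.
w_equal = optimal_ensemble_weights([[1.0, 0.0], [0.0, 1.0]])
# A lower-variance (more reliable) member receives a larger weight.
w_skewed = optimal_ensemble_weights([[2.0, 0.0], [0.0, 1.0]])
```

For a 2-member example with error variances 2 and 1, the second member correctly receives twice the weight of the first.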
However, a reliable estimation of the error correlation matrix is not straightforward, because the prediction errors of different networks in an ensemble are often strongly correlated. Alternatively, the recursive least-squares method can be employed to search for the optimal weights.22 Other methods have also been proposed to solve this problem.12,24 In this investigation, a canonical evolution strategy is employed to find the optimal weights that minimize the expected error in equation (C.2).

In the multi-objective optimization approach to generating neural network ensemble members, the easiest way is to select all non-dominated solutions found in the optimization as ensemble members. In the following
empirical investigations, we compare three cases. In the first case, all non-dominated solutions found in the final population are used to construct an ensemble. In the second case, a well-distributed subset of the non-dominated solutions is selected by hand. In the third case, the criterion in equation (C.2) is minimized using an evolution strategy on a validation data set.

27.4. Case Studies

27.4.1. Experimental Settings
The population size of the GA used for evolving the neural networks is 100 and the optimization is run for 200 generations. In mutating the weights, the standard deviation of the Gaussian noise is set to 0.05. The weights of the network are initialized randomly in the interval [−0.2, 0.2] and the maximal number of hidden neurons is set to 10. In the Rprop+ algorithm, the step-sizes are initialized to 0.0125 and bounded to [0, 50] during adaptation, and \xi^{-} = 0.2, \xi^{+} = 1.2. Note that a number of parameters need to be specified in the Rprop+ algorithm; however, the performance of the algorithm is not very sensitive to these values.10 In our work, we use the default values suggested in reference 10, and 50 iterations are carried out in each life-time learning phase.

A standard (15,100)-ES is used to optimize the ensemble weights in equation (C.1) based on the expected error on the validation data. The initial step-sizes of the evolution strategy are set to 0.0001 and the weights are initialized randomly between 0.005 and 0.01. The weight optimization is run for 200 generations.

27.4.2. Results on the Ackley Function

The simulation study is first conducted on the 3-dimensional Ackley function.13 100 samples are generated randomly in [−3, 3]; the first 80 samples are used as training data, another 10 as validation data, and the remaining 10 as test data. In the first case, the approximation error and the number of connections described in equation (B.8) are used as the two objectives in evolving the neural networks. The non-dominated solutions in the 200-th generation are plotted in Fig. 27.2.

The most straightforward approach is to use all obtained non-dominated solutions to construct the ensemble. In the final generation, 40 solutions were found to be non-dominated. The MSE of the best and worst single networks among the 40 solutions, the MSE of the simple average ensemble,
Fig. 27.2. Non-dominated solutions when the number of connections is used as the second objective.
and the MSE with the weights optimized using the algorithm presented in Section 27.3, are given in Table 27.106. Notice that in calculating the MSE of the ensemble on the test data, the weights are those optimized on the basis of the validation data.

Table 27.106. MSE of the ensemble consisting of all 40 non-dominated solutions.

             best single   worst single   simple average   weighted average
validation   0.121         2.29           0.409            0.118
test         0.348         2.07           0.179            0.361
It has been suggested that it might be better to use a subset of the available neural networks than to use all of them.17,22 For this purpose, different strategies have been tried. For example, we can select a "representative" subset from the non-dominated solutions to construct the ensemble. Another possibility is to select the non-dominated solutions whose MSE on the training data is smaller than a specified value, or those whose MSE on the validation data is smaller than a given value. Fig. 27.3 shows the 14 heuristically selected representative solutions (filled circles). The MSE of the best and worst single networks, and the MSE of the ensemble using the simple average and the weighted average of the 14 representatives, on the validation as well as the test data, are shown in Table 27.107.
Fig. 27.3. 14 selected representatives.

Table 27.107. MSE of the ensemble consisting of 14 heuristically selected members.

             best single   worst single   simple average   weighted average
validation   0.160         2.28           0.279            0.074
test         0.468         2.07           0.236            0.449
Some observations can be made from these results. First, the MSE of the ensemble using the simple average of the 14 selected representatives on the test data is worse than that using all non-dominated solutions. Second, the ensemble with weights optimized on the basis of the validation data exhibits better performance on the validation data than the one using the simple average; unfortunately, its MSE on the test data is larger than that of the ensemble using the simple average. This implies that the validation data set and the test data set might not have the same statistical characteristics, in which case it might not be practical to optimize the weights on the validation data for predicting unseen data. The results for ensemble members selected according to the MSE on the training and validation data, respectively, are shown in Tables 27.108 and 27.109, from which it can be seen that the MSE on the test data of both ensembles is larger than that of the ensemble consisting of the 14 representative networks. Furthermore, good performance on the training or validation data does not imply good performance on the test data.
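The two combination rules being compared throughout these tables follow the ensemble output formula of Section 27.3; a minimal sketch (function name ours):

```python
def combine(outputs, weights=None):
    """Combine member outputs: simple average when weights is None,
    weighted average otherwise (weights assumed to sum to 1).
    """
    n = len(outputs)
    if weights is None:
        weights = [1.0 / n] * n       # simple average: a_k = 1/N
    return sum(a * y for a, y in zip(weights, outputs))

y_simple = combine([1.0, 2.0, 3.0])             # simple average -> 2.0
y_weighted = combine([1.0, 2.0], [0.25, 0.75])  # weighted average -> 1.75
```

The tables report the MSE of these two combined predictors over the validation and test sets, alongside the best and worst individual members.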
Table 27.108. MSE of the ensemble consisting of 14 networks whose MSE on the training data is smaller than 0.01.

             best single   worst single   simple average   weighted average
validation   0.57          1.09           0.84             0.50
test         0.34          0.61           0.43             0.42

Table 27.109. MSE of the ensemble consisting of 12 networks whose MSE on the validation data is smaller than 0.50.

             best single   worst single   simple average   weighted average
validation   0.12          0.57           0.073            0.038
test         0.50          1.40           0.535            0.618
Fig. 27.4. Non-dominated solutions when the sum of absolute weights is used as the second objective. The shaded circles denote those selected as a subset for constructing an ensemble.
Optimizing the weights of the ensemble members does not necessarily reduce the error on the test data.

Next, the sum of absolute weights in equation (B.4) is adopted as the second objective in the evolution. The obtained non-dominated solutions are shown in Fig. 27.4. As in the above simulations, we calculate the best and worst MSE of a single network, and the MSE of the ensemble with simple and weighted averaging, over all 32 non-dominated solutions, over a heuristically selected representative subset (the circles filled with a star in Fig. 27.4),
a subset whose MSE on the training data is smaller than 0.1, and a subset whose MSE on the validation data is smaller than 0.5. The results are presented in Tables 27.110, 27.111, 27.112 and 27.113, respectively.

Table 27.110. MSE of the ensemble using all 32 non-dominated solutions.

             best single   worst single   simple average   weighted average
validation   0.174         1.52           0.336            0.152
test         0.637         1.91           0.491            0.558
Table 27.111. MSE of the ensemble consisting of 14 heuristically selected members.

             best single   worst single   simple average   weighted average
validation   0.410         1.52           0.363            0.152
test         0.637         1.91           0.361            0.453
Table 27.112. MSE of the ensemble consisting of 15 networks whose MSE on the training data is smaller than 0.1.

             best single   worst single   simple average   weighted average
validation   0.460         0.62           0.524            0.460
test         0.636         1.72           1.21             1.38

Table 27.113. MSE of the ensemble consisting of 9 networks whose MSE on the validation data is smaller than 0.5.

             best single   worst single   simple average   weighted average
validation   0.174         0.46           0.172            0.150
test         0.770         1.46           0.560            0.570
From the above results, it can be seen that using an ensemble is a reliable way to reduce the prediction error, although the ensemble need not be better than the best single member in it. The results also suggest that it is still an open question how to properly select an optimal subset from a set of obtained non-dominated solutions to construct an ensemble. Networks with good performance on the training or validation data are not necessarily good candidates for the test data. The optimization algorithm presented in Section 27.3 is very effective in minimizing the ensemble prediction error on the validation data. However, this does not
imply that the MSE on the test data will also be reduced when using the optimal weights obtained on the validation data.

Next, a single-objective optimization has been run 14 times, with the MSE on the training data as the fitness function. The individual networks are generated randomly and no interaction between the networks is considered. In generating the networks, all parameter settings are the same as in the multi-objective case. These 14 neural networks are then used to construct an ensemble, and the results on the validation and test data are presented in Table 27.114. The results seem worse than those of the ensembles consisting of 14 networks generated using multi-objective optimization, as shown in Tables 27.107 and 27.111.

Table 27.114. MSE of the ensemble consisting of 14 networks randomly generated using single-objective optimization.

             best single   worst single   simple average   weighted average
validation   0.270         1.79           0.320            0.220
test         0.655         1.81           0.532            0.595
Finally, a number of non-dominated solutions have been obtained using the MSE and the sum of squared weights as the two objectives; they are shown in Fig. 27.5. Simulations have been conducted to study the different methods for selecting ensemble members, and very similar results were obtained, so these results are not presented in detail here.

27.4.3. Results on the Mackey-Glass Function
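The Mackey-Glass data used in this subsection are obtained by numerically integrating the delay differential equation given below. The sketch here uses simple Euler integration; the unit step size and the constant initial history are our assumptions, as the chapter does not state its integration details.

```python
def mackey_glass(n, alpha=0.2, beta=0.1, tau=17, dt=1.0, x0=1.2):
    """Generate n points of the Mackey-Glass series by Euler integration:
    dx/dt = alpha * x(t - tau) / (1 + x(t - tau)^10) - beta * x(t).
    A constant history x(t) = x0 is assumed for t <= 0.
    """
    hist = [x0] * (tau + 1)
    for _ in range(n):
        x_t, x_lag = hist[-1], hist[-(tau + 1)]
        dx = alpha * x_lag / (1.0 + x_lag ** 10) - beta * x_t
        hist.append(x_t + dt * dx)
    return hist[tau + 1:]

series = mackey_glass(1200)   # enough for 500/250/250 train/valid/test splits
```

From such a series, each training pattern pairs the inputs x(t), x(t − 6), x(t − 12), x(t − 18) with the target x(t + 85).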
In this subsection, neural network ensembles are used to predict the output of the Mackey-Glass series: (D1) " " ' 1 ^ - ' , ) - * where a = 0.2, /? = 0.1, r = 17. The task of the neural ensemble is to predict x(t + 85) using x(t), x(t — 6), x(t — 12), anda:(t —18). According to reference 8, 500 samples are generated for training, 250 samples for validation and the another 250 samples for test. In the 200-th generation, 34 non-dominated solutions have been found, which are illustrated in Fig. 27.6. All the non-dominated solutions are used to construct an ensemble. The results from the best and the worst single networks, and those from simple average and weighted average of the ensemble members are provided in Table 27.115. From these results, we notice
Fig. 27.5. Non-dominated solutions when the sum of squared weights is used as the second objective.
first that the performance of the simple average ensemble is better than the worst member, but worse than the best one. Another important factor is that the performance of the ensemble using weighted averaging exhibits better performance than the one with simple averaging not only on validation data, but also on the test data. This indicates that in this example, the validation data are able to reflect the feature of the test data. Table 27.115. MSE of the ensemble consisting of all 34 non-dominated solutions. validation test
best single 0.0111 0.0097
worst single 0.0488 0.0518
simple average 0.0134 0.0118
weighted average 0.0117 0.0104
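The simple and weighted averages reported in these tables can be sketched as follows. The chapter does not state how the combination weights are optimized, so this illustration fits them by unconstrained least squares on validation outputs; that choice, and all function names, are my own stand-ins.

```python
import numpy as np

def simple_average(preds):
    """Mean of member outputs; preds has shape (n_members, n_samples)."""
    return preds.mean(axis=0)

def fit_weights(preds_val, y_val):
    """Combination weights minimizing MSE on validation data via
    unconstrained least squares (a stand-in for the chapter's method)."""
    w, *_ = np.linalg.lstsq(preds_val.T, y_val, rcond=None)
    return w

def weighted_average(preds, w):
    """Weighted combination of member outputs."""
    return w @ preds

# Toy check: two members straddling the target cancel out exactly.
y = np.array([1.0, 2.0, 3.0])
preds = np.vstack([y + 0.5, y - 0.5])
mse = lambda p: float(np.mean((p - y) ** 2))
w = fit_weights(preds, y)
```

Because the weights are fitted to minimize validation error, the weighted ensemble can never do worse than the simple average on the data it was fitted on; whether that carries over to test data is exactly what the tables examine.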
As done in the previous section, a second ensemble is constructed by selecting 14 representative solutions from the 34 non-dominated solutions; these are the filled circles in Fig. 27.6. The results of this ensemble are presented in Table 27.116. Next, we construct another two ensembles by selecting the networks whose MSE is smaller than 0.012 on the training data and on the validation data, respectively. According to this criterion, 6 and 7 networks have been selected, and the results are given in Tables 27.117 and 27.118. From these results, it can be seen that no big differences exist between
668
Y. Jin et al
Fig. 27.6. Non-dominated solutions when the number of connections is used as the second objective. The filled circles are the representatives.

Table 27.116. MSE of the ensemble consisting of 14 representatives.
                  validation  test
single best 0.0112 0.0097
single worst 0.0488 0.0518
simple average 0.0129 0.0111
weighted average 0.0112 0.0099
Table 27.117. MSE of the ensemble consisting of 6 networks whose MSE on the training data is smaller than 0.012.
                  validation  test
best single       0.0112      0.0097
worst single      0.0116      0.0105
simple average    0.0113      0.0102
weighted average  0.0112      0.0097
the various methods for selecting ensemble members. Besides, the ensemble with weighted average shows consistently better performance than the one using simple average. However, the performance of the best single network is better than that of the ensemble with simple average, and thus the performance of the ensemble with optimized weighted average is almost the same

Table 27.118. MSE of the ensemble consisting of 7 networks whose MSE on the validation data is smaller than 0.012.
                  validation  test
best single       0.0112      0.0097
worst single      0.0117      0.0109
simple average    0.0114      0.0102
weighted average  0.0112      0.0097
as that of the single best, which makes sense. Nevertheless, the ensembles with simple average or optimized weighted average show consistently better performance than the single worst network. Furthermore, ensembles consisting of the networks selected on the basis of training or validation error are better than those consisting of all, or a heuristically selected subset of, the non-dominated solutions. This implies that no significant overfitting occurs during the training. Finally, 14 networks are generated randomly using single-objective optimization. The results of this ensemble are shown in Table 27.119. It can be seen that the performance of the ensemble is better than that of the single worst network but worse than that of the single best. Evidently, diversity does not help to improve the performance of the ensemble if no significant overfitting occurs.

Table 27.119. MSE of the ensemble consisting of 14 randomly generated networks.
                  validation  test
best single 0.01 0.0095
worst single 0.0143 0.0133
simple average 0.0115 0.0111
weighted average 0.01 0.0095
Simulations have also been conducted with the sum of absolute weights or the sum of squared weights serving as the second objective on the Mackey-Glass series data. The non-dominated solutions from these optimization runs are plotted in Fig. 27.7 and Fig. 27.8, respectively. Notice that non-dominated solutions whose MSE on the training data is larger than 0.05 are missing in the 200-th generation. This does not mean that such solutions do not exist; rather, it is due to the randomness of the multi-objective optimization algorithm introduced by the crowded tournament selection. As discussed in reference 5, such randomness occurs when the number of non-dominated solutions in the combined population is larger than the population size. The prediction results of the ensembles constructed from these solutions are omitted here because they are very similar to those presented above when the number of connections is used as the second objective.

27.5. Discussions and Conclusions

Approximation accuracy and complexity have been used as two objectives to generate neural networks for constructing ensembles. In the algorithm, ad hoc mutations such as node/connection addition and deletion are employed
Fig. 27.7. Non-dominated solutions when the sum of absolute weights is used as the second objective.
Fig. 27.8. Non-dominated solutions when the sum of squared weights is used as the second objective.
without crossover. The Rprop learning algorithm is adopted in life-time learning and a Lamarckian inheritance has been implemented. In selection, the elitist non-dominated sorting and the crowded tournament selection techniques have been used. This algorithm has proved to be effective in generating neural networks trading off between accuracy and complexity
Fig. 27.9. Trade-off between the MSE on the validation data and the complexity of the neural networks.
through two test problems. Whereas an ensemble whose members trade off complexity against accuracy can improve performance when overfitting occurs, no performance improvement can be expected from network ensembles when the networks do not overfit the training data. In fact, it seems that in this case, the network that has the best accuracy on the training data also exhibits the best performance on the test data. Thus, combining members with different degrees of accuracy will degrade the ensemble's performance. Note that the proposed method for individual network training belongs to the simultaneous approach. Due to the explicit trade-off between complexity and accuracy, the individuals in a population are competitive, which is harmful to the performance of the ensemble if the test data have the same features as the training data. This can easily be observed by plotting the relationship between the MSE on the validation data and the complexity, as shown in Fig. 27.9: the higher the complexity of the network, the better its accuracy on the validation data. As argued in reference 11, it is equally important that the ensemble members cooperate with each other. To this end, the concept of cooperative coevolution18 could play a significant role in generating ensemble members. This will be our next research direction.
Acknowledgments

The authors would like to thank Edgar Körner and Andreas Richter for their kind support.

References

1. H.A. Abbass. Speeding up back-propagation using multiobjective evolutionary algorithms. Neural Computation, 15(11):2705-2726, 2003.
2. D.H. Ackley and M.L. Littman. A case for Lamarckian evolution. In C.G. Langton, editor, Artificial Life, volume 3, pages 3-10. Addison-Wesley, Reading, Mass., 1994.
3. C.M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, Oxford, UK, 1995.
4. R. de A. Teixeira, A.P. Braga, R.H.C. Takahashi, and R.R. Saldanha. Improving generalization of MLPs with multi-objective optimization. Neurocomputing, 35:189-194, 2000.
5. K. Deb. Multi-objective Optimization Using Evolutionary Algorithms. Wiley, Chichester, 2001.
6. K. Deb, S. Agrawal, A. Pratap, and T. Meyarivan. A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II. In Parallel Problem Solving from Nature, volume VI, pages 849-858, 2000.
7. N. Garcia-Pedrajas, C. Hervas-Martinez, and J. Munoz-Perez. Multi-objective cooperative co-evolution of artificial neural networks (multi-objective cooperative networks). Neural Networks, 15:1259-1278, 2003.
8. E. Hartmann and J.D. Keeler. Predicting the future: Advantages of semilocal units. Neural Computation, 3(4):566-578, 1991.
9. M. Hüsken, J.E. Gayko, and B. Sendhoff. Optimization for problem classes: Neural networks that learn to learn. In Xin Yao and David B. Fogel, editors, IEEE Symposium on Combinations of Evolutionary Computation and Neural Networks (ECNN 2000), pages 98-109. IEEE Press, 2000.
10. C. Igel and M. Hüsken. Improving the Rprop learning algorithm. In Proceedings of the 2nd ICSC International Symposium on Neural Computation, pages 115-121, 2000.
11. Md. M. Islam, X. Yao, and K. Murase. A constructive algorithm for training cooperative neural network ensembles. IEEE Transactions on Neural Networks, 14(4):820-834, 2003.
12. D. Jimenez. Dynamically weighted ensemble neural networks for classification. In Proceedings of International Joint Conference on Neural Networks, pages 753-756, Anchorage, 1998. IEEE Press.
13. Y. Jin, M. Olhofer, and B. Sendhoff. A framework for evolutionary optimization with approximate fitness functions. IEEE Transactions on Evolutionary Computation, 6(5):481-494, 2002.
14. Y. Liu and X. Yao. Negatively correlated neural networks can produce best ensemble. Australian Journal of Intelligent Information Processing Systems, 4(3-4):176-185, 1997.
15. Y. Liu, X. Yao, and T. Higuchi. Evolutionary ensembles with negative correlation learning. IEEE Transactions on Evolutionary Computation, 4(4):380-387, 2000.
16. D.W. Opitz and J.W. Shavlik. Generating accurate and diverse members of a neural network ensemble. In Advances in Neural Information Processing Systems, volume 8, pages 535-541, Cambridge, MA, 1996. MIT Press.
17. M.P. Perrone and L.N. Cooper. When networks disagree: Ensemble methods for hybrid neural networks. In R.J. Mammone, editor, Artificial Neural Networks for Speech and Vision, pages 126-142. Chapman & Hall, London, 1993.
18. M.A. Potter and K.A. De Jong. Cooperative coevolution: An architecture for evolving coadapted subcomponents. Evolutionary Computation, 8(1):1-29, 2000.
19. B.E. Rosen. Ensemble learning using decorrelated neural networks. Connection Science, 8(3-4):373-384, 1996.
20. A.J.C. Sharkey and N.E. Sharkey. Diversity, selection and ensembles of artificial neural nets. In Proceedings of Third International Conference on Neural Networks and their Applications, pages 205-212, March 1997.
21. S. Sigurdsson, J. Larsen, and L.K. Hansen. On comparison of adaptive regularization methods. In Proceedings of the IEEE Workshop on Neural Networks for Signal Processing, volume 10, pages 221-230, 2000.
22. X. Yao and Y. Liu. Making use of population information in evolutionary artificial neural networks. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 28(3):417-425, 1998.
23. B.-T. Zhang and J.G. Joung. Building optimal committee of genetic programs. In Parallel Problem Solving from Nature, volume VI, pages 231-240. Springer, 2000.
24. Z.-H. Zhou, J.-X. Wu, Y. Jiang, and S.-F. Chen. Genetic algorithm based selective neural network ensemble. In Proceedings of the 17th International Joint Conference on Artificial Intelligence, pages 797-802, Seattle, 2001. Morgan Kaufmann.
CHAPTER 28 OPTIMIZING FORECAST MODEL COMPLEXITY USING MULTI-OBJECTIVE EVOLUTIONARY ALGORITHMS
Jonathan E. Fieldsend and Sameer Singh
Department of Computer Science, University of Exeter
North Park Road, Exeter, EX4 4QF, UK
E-mail: [email protected]

When inducing a time series forecasting model there has always been the problem of defining a model that is complex enough to describe the process, yet not so complex as to promote data 'overfitting' - the so-called bias/variance trade-off. In the sphere of neural network forecast models this is commonly confronted by weight decay regularization, or by combining a complexity penalty term in the optimizing function. The correct degree of regularization, or penalty value, to use for any particular problem is, however, difficult, if not impossible, to know a priori. This chapter presents the use of multi-objective optimization techniques, specifically those of an evolutionary nature, as a potential solution to this problem. This is achieved by representing forecast model 'complexity' and 'accuracy' as two separate objectives to be optimized. In doing this one can obtain problem-specific information with regard to the accuracy/complexity trade-off of any particular problem and, given the shape of the front on a set of validation data, ascertain an appropriate operating point. Examples are provided on a forecasting problem with varying levels of noise.

28.1. Introduction

The use of neural networks (NNs), specifically multi-layer perceptrons (MLPs), for classification and regression is widespread, and their continuing popularity seemingly undiminished. This is not least due to their much-vaunted ability to act as a 'universal approximator': given sufficient network size, any deterministic function mapping can be modelled. This is typically done where the process (function) is unknown, but where example data have been collected, from which the estimated model is induced.
676
J.E. Fieldsend & S. Singh
Seasoned practitioners will, however, know that the great amenability of NNs is a double-edged sword. It is difficult, if not impossible, to tell a priori how complex the function you wish to emulate is; therefore it is difficult to know how complex your NN design should be. With too complex a model design (too many transformation nodes/weights and/or large synaptic weight values) the NN may overfit its function approximation: it may start modelling the noise on the examples as opposed to generalizing the process, or it may find an overly complex mapping given the data provided. With too few nodes the NN may only be able to model a subset of the causal processes in the data. Both of these effects can lead a NN to produce substandard results in its future application. Various approaches to confront this problem have been proposed since NNs became widely applied, such as weight decay regularization to push the NN weights to smaller values (which keeps them in the linear mapping space),5,27 pruning algorithms to remove nodes,21 complexity loss functions31 and topology selection based on cross validation.29 More recently the field of evolutionary neural networks (ENNs) has also been addressing this problem. As the evolutionary approach to training is not susceptible to the local minima trapping of gradient descent approaches, a large number of studies have investigated this approach to NN training; a review of a substantial number of these can be found in Yao.33 A number of studies enable the evolution of different-sized ENNs, with some including size penalization22 similar to the complexity loss functions used in gradient descent approaches. However, this leads to the problem of how you define the penalization, as it implicitly means making assumptions about the interaction of model complexity and accuracy of the ENN for your problem (the trade-off between the two).
Through using the formulation and methods developed in the evolutionary multi-objective optimization (EMOO) domain,6,8,14,30 the set of solutions that describe the trade-off surface for two or more objectives of a design problem can be discovered. This approach can equally be applied to ENN training in order to discover the set of estimated Pareto optimal ENNs for a function modelling problem, where accuracy of function emulation and complexity of model are the two competing objectives. Previous studies by Abbass2,3 have tackled this by formulating complexity in terms of the number of transfer units in an ENN; however, his model does not easily permit the use of other measures of complexity. As such this chapter will introduce a general and widely applicable methodology for EMOO training of NNs, for discovering the complexity/accuracy trade-off for NN
Optimizing Forecast Model Complexity Using MOEAs
677
modelling problems. The chapter will proceed as follows: a basic outline of the MLP NN model is provided in Section 28.2 for those unfamiliar with NNs. In Section 28.3 the traditional approaches for coping with the bias/variance trade-off are discussed, along with their perceived drawbacks. Section 28.4 presents the general evolutionary algorithm approach to NN training, along with recent work on trading off network size and accuracy, and a new model to encompass many definitions of complexity. In Section 28.5 a set of experiments to validate this new approach is described, using time series exhibiting differing levels of noise. Results from these experiments are reported in Section 28.6. The chapter concludes with a discussion in Section 28.7.

28.2. Artificial Neural Networks

The original motivation behind artificial NNs was the observation that the human brain computes in a completely different manner than the standard digital computer,16 which enables it to perform tasks such as pattern recognition and motor control far faster and more accurately than standard computation. This ability is derived from the fact that the human brain is complex, nonlinear and parallel, and has the additional ability to adapt to the environment it finds itself in (referred to as plasticity). Artificial NNs developed as a method to mimic these properties, and terms relating to NN design (neurons, synaptic weights) are taken from the biological description of brain function. However, it is generally the case that NNs in popular use by researchers use only the concepts of parallelism, non-linearity and plasticity within a mathematical framework, and do not attempt to copy exactly the functions of the brain (which are still not fully understood). The most popular NN model has been the multi-layer perceptron (MLP) since the formalization of the backpropagation (BP) learning algorithm in the early 1980s. The basic design of an MLP is shown in Figure 28.1.
The input signal of an MLP (or feature vector) is propagated through the network (neuron by neuron), and transformed during its passage by the combination of the synaptic weights and the mathematical properties of the neurons, until on the final layer an output signal is generated. In the example shown in Figure 28.1 the network is defined as being fully connected, each neuron (or node) being connected to every neuron in the layers directly preceding and following it, and having an I : 3 : 2 : 1 topological design. That is, it has I input nodes, followed by two hidden layers,
Fig. 28.1. Generic multi-layer perceptron, showing the forward flow of the input signal (function signal) from the input layer through the first and second hidden layers to the output layer, and the backward flow of the error signal.
the first containing 3 nodes and the second 2 nodes, with a single output node. The two middle layers are referred to as hidden due to the fact that the user does not commonly observe the inputs to or outputs from these nodes (unlike the input layer, where the feature vector is known, and the output layer, where the output is observed). The most common transfer function used in the MLP is the sigmoid function φ(). For the jth hidden node of a network with an input vector z, its logistic form is defined as:
\varphi_j(\mathbf{z}) = \frac{1}{1 + \exp\left(-\left(B_j + \sum_i w_{ij} z_i\right)\right)} \qquad (R1)

where w_{ij} is the ith input weight between node j and the previous layer, z_i is the output of the ith node in the layer preceding node j, and B_j is the weight of the bias input to the jth node. The bias is similar to the intercept term used in linear regression and has a fixed value for all patterns. The adjustment of the synaptic weight parameter variables within an MLP is most commonly performed in a supervised learning manner using the fast backpropagation algorithm. Sequences of inputs and resultant outputs are collected from an undefined functional process f(a) → b. This set of patterns is then presented to the MLP in order for it to emulate the unknown function. The kth input pattern a(k) is fed through the network
generating an output b̂(k), an approximation of the desired output b (illustrated with the arrows pointing to the right in Figure 28.1). The difference between the desired output b and the actual output b̂(k) is calculated (usually as the Euclidean distance between the vectors), and this error term, E, is then propagated back through the network, proportional to the partial derivative of the error at that node (illustrated with the dashed arrows pointing to the left in Figure 28.1). An in-depth discussion of the history and derivation of the backpropagation algorithm (and associated delta rule), through the calculus chain rule, can be found in Bishop5 and Haykin.16 Each pattern in turn is presented to the MLP, with its weights adjusted using the delta rule at each iteration. Only a fraction of the change demanded by the delta rule is usually applied, to avoid rapid weight changes from one pattern to the next; this fraction is known as the learning rate. A momentum term (additionally updating weights with a fraction of their previous update) is also commonly applied. The passing of an entire pattern set through the MLP is called a training epoch. MLPs are usually trained, epoch by epoch, until the observed average error of the function approximation reaches a plateau. The generalization ability of the approximated function is then assessed on another set of collected data on which the NN has not been trained.
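The two building blocks just described, the logistic transfer of Eq. (R1) and the fractional delta-rule update with momentum, can be sketched as follows. This is an illustrative fragment, not the chapter's implementation; all names are my own.

```python
import math

def node_output(z, w, bias):
    """Logistic transfer of one node, Eq. (R1):
    phi(z) = 1 / (1 + exp(-(B_j + sum_i w_ij * z_i)))."""
    activation = bias + sum(wi * zi for wi, zi in zip(w, z))
    return 1.0 / (1.0 + math.exp(-activation))

def delta_rule_step(w, grad, prev_update, learning_rate=0.1, momentum=0.9):
    """Apply only a fraction (the learning rate) of the change demanded by
    the delta rule, plus a fraction of the previous update (momentum)."""
    update = -learning_rate * grad + momentum * prev_update
    return w + update, update

# A node with zero net activation sits exactly at the sigmoid midpoint.
assert node_output([0.0, 0.0], [1.0, 1.0], 0.0) == 0.5
```

Iterating `delta_rule_step` over every weight for every pattern in the set constitutes one training epoch.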
In recent years there has been increasing interest in the use of evolutionary computation methods for NN training.33 In these ENNs the adjustable parameters of an NN (weights and also sometimes nodes) are represented as a string of floating point and/or binary numbers, the most popular representation being the direct encoding form.33 Given a maximum size for a three-layer feed-forward MLP ENN of I input units (features), H hidden units in the hidden layer, and O output units, the vector length used to represent this network within an MOEA is of size:

(I + 1) \cdot H + (H + 1) \cdot O + I + H \qquad (B.2)
The first (I + 1) · H + (H + 1) · O genes are floating point and store the weight parameters (plus biases) of the ENN; the next I + H are bit-represented genes, whose value (0 or 1) denotes the presence or absence of a unit (in the hidden layer and input layer). These decision vectors are manipulated over time using the tools of evolutionary computation (usually evolution strategies (ESs) or genetic algorithms (GAs)). At each time step (known as a generation) the ENNs represented by the new decision vectors are evaluated on the training data, and selection for parameter adjustment in the next generation is typically based on their relative error on this
data. The popularity of these approaches to NN training stems from the fact that they are not susceptible to the trapping in local minima that gradient-descent-based learning algorithms are, and in addition, they can use quite complex problem-specific error functions which may be difficult to propagate using derivatives. Because of the high function complexity that NNs can emulate, there is always a risk that the NN will simply map the input and output vectors directly without recourse to creating an internal representation of their generation process. An illustration of this is shown in Figure 28.2.
Fig. 28.2. Overfitting illustration. Explanatory variable a and dependent variable b with noise. Top: generating function. Bottom: overfitted signal.
In the illustration the model approximator is too complex, and therefore fits exactly to the noisy data points instead of modelling the smoother generating process.
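Returning to the direct encoding of Eq. (B.2), the genome length for a given maximum topology is straightforward to compute. The sketch below is illustrative; the function name is my own.

```python
def genome_length(n_inputs, n_hidden, n_outputs):
    """Direct-encoding vector length from Eq. (B.2):
    (I+1)*H + (H+1)*O floating-point weight/bias genes, plus one
    presence bit per input and hidden unit (I + H bits)."""
    n_weight_genes = (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs
    n_presence_bits = n_inputs + n_hidden
    return n_weight_genes + n_presence_bits

# e.g. a 4-input, 10-hidden, 1-output network:
# (4+1)*10 + (10+1)*1 + 4 + 10 = 50 + 11 + 14 = 75 genes
assert genome_length(4, 10, 1) == 75
```

Because the presence bits can switch units off, a single fixed-length genome can represent every subnetwork of the maximum topology, which is what lets the MOEA search over structures as well as weights.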
28.3. Optimal Model Complexity

Procedures to prevent the over-fitting of NNs can be categorized as falling into two broad camps. The first group of methods takes the approach that the model used may be over-specified (have more complexity than is needed to model the problem), but that by judicious use of more than one data set in the training process the risk of over-fitting can be minimized. The type most frequently used is the so-called 'early stopping' method, where an additional validation data set is used in the training process;25 other more advanced methods based on bootstrapping27 are also in use. The second group of methods tackles over-fitting with conscious attempts to restrict the complexity of the NN during its training process, sometimes in conjunction with early stopping methods.

28.3.1. Early Stopping

There are a number of different approaches to early stopping.25 The traditional approach is to train a network, monitor its generalization error on a validation set, and stop training when the error on this set is seen to rise. The general problem with this approach is that the generalization curve may exhibit a number of local minima, so the early stopping may in fact be too 'early'. In order to overcome this the NN is trained as normal, without stopping, until the training error reaches a plateau; at the same time, however, the generalization error on a validation set is checked, and the network parameters at the point where this is lowest are recorded and used.

28.3.2. Weight Decay Regularization and Summed Penalty Terms

One of the most common approaches to prevent over-fitting through complexity minimization is that of weight decay regularization.
This approach attempts to inhibit the complexity of a particular model by restricting the size of its weights, as it is known that larger weight values are needed to model functions with a greater degree of curvature (and therefore complexity).5 In its standard form the sum of the squares of the NN weights is used as a penalty term within the error function, such that

E_{new} = E + \beta \Omega \qquad (C.1)

where E is the default error function (commonly Euclidean error), \Omega is the sum of squares of the NN weights, \beta is a weighting term and E_{new} is the new error term to be propagated through the NN.
Other approaches developed by researchers in the ENN field use slightly different summed penalty terms in NN training; for example, Liu and Yao22 include a penalty for the size of the network in their composite error function.

28.3.3. Node and Weight Addition/Deletion

Node pruning/addition techniques ignore the complexity-through-weight-value approach of weight decay regularization and some of the other complexity penalty term approaches, and instead couch the complexity of a NN solely in terms of the number of transformation nodes. The simplest methodology of this approach is exhaustive search: training many different NNs with different numbers of hidden units and comparing their performance against each other. The computational cost of this approach is obviously prohibitive; however, it can be constrained to a certain degree by simply adding an additional node to a previously trained NN, using the weights of the previous network as a starting point. This method is described as a growing algorithm approach,5 cascade correlation being another. In Kameyama and Kosugi17 the opposite approach is taken, with a large NN initially specified, followed by the selective pruning of NN units. LeCun et al.21 take a different approach to pruning, again citing that the best generalization is obtained by trading off the training error and network complexity; their method, called optimal brain damage (OBD), focuses on removing NN weights. The basic idea is to choose a reasonable network architecture, train the network until a reasonable solution is obtained using gradient descent methods, and compute the second derivative for each parameter (NN weight). The parameters are then sorted by this saliency, and those with low saliency are deleted. Ragg and Gutjahr26 in contrast use mutual information in their routine for topology determination.

28.3.4. Problems with These Methods

Network growing and pruning methods are usually characterized as being slow to converge, with long training times, and, for those that use gradient descent training techniques, susceptible to becoming stuck in local minima.3 The main criticism directed at weight decay regularization and other penalty term approaches to training is the problem of how to specify the weighting terms needed by these methods. Just as it is difficult to
ascertain the correct complexity for a model a priori, so the correct degree of penalization to include in these adjusted error values is difficult to know beforehand. In addition, the weighted sum approach is only able to recover all points from a Pareto front when the front is convex.7 A demonstration of the problem of composite error weight specification is illustrated in Figure 28.3.
Fig. 28.3. Illustration of the problems inherent in using composite error functions to determine an operating point.
Figure 28.3 shows three different fronts describing the trade-off between accuracy and complexity, each with a line tangential to it at the point where the two values are weighted equally (equivalent to β = 1 in Equation C.1). As can be seen in the illustration, depending upon the actual interaction of complexity and accuracy exhibited by the process, as described by the curves, three very different models will be returned by using this composite error weighting: one with high error and low complexity (e1,c3), one with intermediate complexity and error (e2,c2), and a third with low error and high complexity (e3,c1). Again, it must be noted that these results are dependent on the front shape, which is unknown a priori, but which must be implicitly
guessed at if the composite error approach is used. Of course it is feasible to run the composite error algorithm a number of times to discover the shape of the front; however, the algorithm will need to be run as many times as the number of points desired, which is very time-consuming, and even then non-convex portions of the front will remain undiscovered.
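The claim that non-convex portions of a front are unreachable by composite error weighting can be checked numerically. The three front points below are invented for illustration: the middle point lies in a non-convex dent, and no positive weighting ever selects it.

```python
def weighted_sum_choice(front, beta):
    """Return the front point minimizing the composite error e + beta*c,
    for a front given as (error, complexity) pairs."""
    return min(front, key=lambda p: p[0] + beta * p[1])

# Middle point (2.6, 2.6) sits above the line joining the two extremes
# (which passes through (2.6, 2.4)), i.e. in a non-convex dent.
front = [(1.0, 4.0), (2.6, 2.6), (4.0, 1.0)]

# Sweep beta over a wide range of weightings.
picked = {weighted_sum_choice(front, b / 10) for b in range(1, 100)}
```

For every beta the minimum jumps straight between the two extreme points; the dented point would need beta simultaneously above about 1.14 and below about 0.88, which is impossible, so it is never returned.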
28.4. Using Evolutionary Algorithms to Discover the Complexity/Accuracy Trade-Off

As discussed in Section 28.3, until recently researchers interested in constraining the complexity of their models had to assign one or more variables whose values were known to greatly affect the end model, but which were difficult, if not impossible, to set without knowing how model complexity and accuracy interact for the specific problem. Instead of trying to simultaneously optimize these separate objectives by combining complexity and accuracy into a single error value, which has been shown to be problematic, they can be optimized as two separate objectives through the use of EMOO techniques. By using this methodology a set of ENNs can be produced showing the realized complexity/accuracy trade-off for each problem. Before discussing this approach further, however, the concept of Pareto optimality needs to be briefly described.
28.4.1. Pareto Optimality
Most recent work in EMOO is formulated in terms of non-dominance and Pareto optimality. The multi-objective optimization problem seeks to simultaneously extremize D objectives:

y_i = f_i(\mathbf{x}), \quad i = 1, \ldots, D \qquad (D.1)

where each objective depends upon a vector \mathbf{x} of P parameters or decision variables, in the case of this chapter, ENN weights and nodes. The parameters may also be subject to the J constraints:

e_j(\mathbf{x}) \ge 0, \quad j = 1, \ldots, J. \qquad (D.2)
Without loss of generality it is assumed that the objectives are to be minimized, so that the multi-objective optimization problem may be expressed as:

minimize \mathbf{y} = \mathbf{f}(\mathbf{x}) = (f_1(\mathbf{x}), \ldots, f_D(\mathbf{x})) \qquad (D.3)

subject to \mathbf{e}(\mathbf{x}) = (e_1(\mathbf{x}), \ldots, e_J(\mathbf{x})) \ge 0 \qquad (D.4)
where x = (x_1, ..., x_P) and y = (y_1, ..., y_D). When faced with only a single objective an optimal solution is one which minimizes the objective given the model constraints. However, when there is more than one objective to be minimized, solutions may exist for which performance on one objective cannot be improved without sacrificing performance on at least one other. Such solutions are said to be Pareto optimal [30], after the 19th century engineer, economist and sociologist Vilfredo Pareto, whose work on the distribution of wealth led to the development of these trade-off surfaces [24]. The set of all Pareto optimal solutions is said to form the true Pareto front. The notion of dominance may be used to make Pareto optimality clearer. A decision vector u is said to strictly dominate another v (denoted u ≺ v) iff

f_i(u) ≤ f_i(v)  ∀ i = 1, ..., D  ∧  ∃ i : f_i(u) < f_i(v)   (D.5)
Less stringently, u weakly dominates v (denoted u ⪯ v) iff

f_i(u) ≤ f_i(v)  ∀ i = 1, ..., D.   (D.6)
A set of M decision vectors {w_i} is said to be a non-dominated set (an estimated Pareto front) if no member of the set is dominated by any other member:

w_i ⊀ w_j  ∀ i, j = 1, ..., M.   (D.7)
28.4.2. Extent, Resolution and Density of Estimated Pareto Set

There are a number of properties of estimated Pareto fronts that researchers strive for their algorithms to produce. These can be broadly described as high accuracy, representative extent, minimum resolution and equal density. The first, accuracy, simply requires that the estimated solutions be as close as possible to the true Pareto front. As illustrated in Figure 28.4, the estimated front of Algorithm A is clearly more accurate than that of Algorithm B; the comparison of A and C is more difficult to quantify, as some members of A dominate members of C, but the reverse is also true.
J.E. Fieldsend & S. Singh

Fig. 28.4. Illustration of the true Pareto front and estimates of it from algorithms A, B and C: the estimate of algorithm A is clearly more accurate than that of B, but the comparison of A and C is not as easy to quantify.
Ideally the Pareto solutions returned (or estimates of them) should lie across the entire surface of the true Pareto front, and not simply be concentrated on a small subsection of it. Minimum resolution is a common requirement, as in many applications the end user may wish the separation between potential solutions to be no bigger than a fixed value (of course, in discontinuous Pareto problems this requirement is not entirely realistic). Much emphasis has been placed by researchers on the non-dominated solutions returned by the search algorithm being equally distributed (of even density) [9]; however, it is arguable that this should only be of concern if the generating process itself results in evenly distributed solutions. In an actual application the generating process may well produce an unbalanced Pareto front, and this information may itself be very pertinent to the decision maker. By forcing multi-objective evolutionary algorithms (MOEAs) to misrepresent this fact, penalizing any representation other than equal density, there may well be negative repercussions for the final user of the information. An illustration of this is provided in Figure 28.5.

Fig. 28.5. Comparing the density of estimated Pareto fronts. Illustration of an underlying true Pareto front (a), and its approximation by an MOEA that is designed to return equal density along the front (b) and by one that does not (c).

Figure 28.5a shows the true Pareto front, with Figures 28.5b and 28.5c illustrating the returned sets of two MOEAs, one which focuses on equal density and one which does not; Figure 28.5b gives no indication to the end user of the density of solutions to the lower left of the front.

28.4.3. The Use of EMOO

Abbass [2, 3] and Abbass and Sarker [1] have recently applied EMOO techniques to trading off the number of hidden units against the accuracy of the NN, where each point on the Pareto frontier is therefore represented by an ENN with a different number of hidden units to any other set member. A description of their memetic Pareto artificial neural network (MPANN) model can be found in Algorithm 2.
Algorithm 2 The memetic Pareto artificial neural network algorithm [1, 2, 3].

M: Size of initial random population of solutions.
N: Maximum number of EA generations.
E: Maximum number of backpropagation epochs.

1: Generate random NN population, S, of size M, such that each parameter (weight) of the NN, x_i ~ N(0,1), and the binary part of the decision vector is either initialized at one or ~ U(0,1).
2: Initialize generation counter t := 0.
3: while t < N
4:   Find the set of solutions within S which are dominated, S'; S := S \ S'.
5:   if |S| < 3
6:     Insert members from S' until |S| = 3.
7:   end if
8:   Randomly mark 20% of training data as validation data.
9:   while |S| < M
10:    Select random representatives from S: x, y and z.
11:    x_new := crossover(x, y, z)
12:    x_new := mutate(x_new)
13:    x_new := backpropagation(x_new, E)
14:    if x_new ≺ x
15:      S := S + x_new
16:    end if
17:  end while
18:  t := t + 1
19: end while
20: end
The algorithm presented by Abbass is sufficient when NN complexity is defined as the number of transfer units, but is insufficient when complexity is defined as the number of weights or the sum of the squared weight values. This is because the algorithm internalises the estimated Pareto front F within the search population, and needs the maximum size of the Pareto front to be less than that of the search population. This can be seen at line 4 of Algorithm 2, where the dominated members of the search population S are removed. If none of the search population members are dominated (it is a mutually non-dominating set) then no further search will be promoted (line 9) and the algorithm will simply
do nothing until the maximum number of generations is reached. As the second objective in MPANN [1, 2, 3] is discrete, with a maximum limit of H_max and a minimum limit of 1, the maximum size of F equals H_max. As in its empirical applications [1, 2, 3] the maximum number of hidden units was 10 and the search population size 50, this problem was not encountered. However, Algorithm 2 cannot easily be applied to situations where the second objective is to minimize the number of weights, as the maximum size of F (for a single hidden layer MLP with I inputs) would be H_max × I + H_max + H_max + 1, and the search population would therefore need to be significantly larger than this. In the case of the sum of squared weights there is essentially no limit on the size of the Pareto set, and therefore no search population in Algorithm 2 would be large enough. The method of search population update [1, 2, 3] is essentially a variant of the conservative replacement scheme described by Hanne [15], where an individual in the search population is only replaced if it is dominated by a perturbed copy of itself. In this chapter a more generally applicable algorithm is described for the multi-objective evolution of NNs, with the emphasis placed on ease of encoding, for the trade-off of complexity and accuracy.
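For concreteness (an arithmetic aside, not part of the original text), the bound above is easy to evaluate: the weight count of a single hidden layer MLP is H_max × I input-to-hidden weights, plus H_max hidden-to-output weights, plus H_max hidden biases and 1 output bias. With the settings used later in this chapter (I = 40 lagged inputs, H_max = 10), the potential front size already dwarfs a search population of 50:

```python
def max_front_size(n_inputs, h_max):
    """Upper bound on the weight-count Pareto front size for a single
    hidden layer MLP: one front member per possible number of weights."""
    return h_max * n_inputs + h_max + h_max + 1

print(max_front_size(40, 10))  # → 421
```

So a search population internalising the front, as MPANN does, would need well over 400 members, which motivates the external elite archive used below.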
28.4.4. A General Model

Perhaps the simplest EA in common use is the ES, where the parameters are perturbed to adjust their value from one generation to the next. Its popularity is probably derived in part from its ease of encoding and use; however, it has also formed the base of a number of successful algorithms in the MOEA domain, not least the Pareto archived evolution strategy (PAES) of Knowles and Corne [18, 19]. Due to its simplicity and previous success it is also used as the base of Algorithm 3, which is used here to search for the complexity/accuracy trade-off front.

28.4.4.1. mutate()

In an ES, the weight space of a network is perturbed by a set of values drawn at each generation from a known distribution, as shown in Equation D.8:

x_i := x_i + γ · θ   (D.8)

where x_i is the ith decision parameter of a vector, θ is a random value drawn from some (pre-determined) distribution and γ is some multiplier. A (μ, λ)-ES process is one in which μ decision vectors are available at the start of a
Algorithm 3 The ES based MO training algorithm for complexity/accuracy trade-off discovery.

M: Size of initial random population of solutions.
N: Maximum number of EA generations.
p_mut: Probability of weight perturbation.
p_w: Probability of weight removal.
p_u: Probability of unit removal.
E: Maximum number of backpropagation epochs.

1: Initialise NN individual z.
2: z := backpropagation(z, E)
3: Generate random NN population, S, of size M, such that each parameter (weight) of the NN, x_i ~ N(z_i, 1), and the binary part of the decision vector is either initialised at one or ~ U(0,1).
4: F := ∅
5: Update F with the non-dominated solutions from S ∪ z.
6: Initialise generation counter t := 0.
7: while t < N
8:   Create a copy of the search population, S' := S.
9:   for i = 1 : M
10:    S'_i := mutate(S'_i, p_mut)
11:    S'_i := weightadjust(S'_i, p_w)
12:    S'_i := unitadjust(S'_i, p_u)
13:  end for
14:  Update F with the non-dominated solutions from S'.
15:  for i = 1 : M
16:    if S'_i ≺ S_i
17:      S_i := S'_i
18:    else if S'_i ⊀ S_i and S_i ⊀ S'_i
19:      if 0.5 > U(0,1)
20:        S_i := S'_i
21:      end if
22:    end if
23:  end for
24:  S := replace(S, F, M/5)
25:  t := t + 1
26: end while
27: end
generation (called parents), which are then perturbed to generate λ variants of themselves (called children or offspring). This set of λ children is then truncated to provide the μ parents of the following iteration (generation). The selection of which children should form the set of parents in the next iteration is usually dependent on their evaluated fitness (the fitter being more likely to 'survive'). A (μ + λ)-ES process denotes one where the parents compete with the children in the selection process for the next generation parent set, which is the method used in Algorithm 3. Recent work in the field of EAs has shown that the use of heavier tailed distributions can speed up algorithm performance [34], and as such in this chapter θ is drawn from a Laplacian distribution with width parameter 1, and γ = 0.1. In Algorithm 3, mutate(x, p_mut) perturbs the weight parameters of the decision vector x with a probability of p_mut.

28.4.4.2. weightadjust()

In order for partially connected ENNs to lie within the search space of the algorithm, the weightadjust() method is used (the weightadjust step of Algorithm 3). weightadjust(x, p_w) acts upon the weight parameters of x, setting them to 0 with a probability of p_w (effectively removing them).

28.4.4.3. unitadjust()

Topology and input feature selection are implemented within the model by bit mutation of the section of the decision vector representing the network architecture. This is facilitated by first determining a super-set of input features and a maximum hidden layer size. Once this is determined, any individual has a fixed maximum representation capability. Manipulation of structure is stochastic. Randomly bit-flipping members of the first I genes of the binary section of the decision vector adjusts the set of input features used by the network, and flipping the following H genes affects the hidden layer.

28.4.4.4. The Elite Archive

In addition to the search population S, Algorithm 3 also maintains an elite archive F of the non-dominated solutions (ENNs) found so far in the search process. No truncation is used on this set, as that process can have negative repercussions; it can cause some members of F to be dominated by members of F from an earlier generation (empirical proof of this can
be found in Fieldsend [11] and theoretical justification in Hanne [15]). It also means that the final front discovered should be distributed in a way more indicative of the underlying process, as discussed in Section 28.4.2. Time concerns can be addressed by the efficient use of data structures [10, 11, 12, 23]; however, if growth is significant then some form of truncation may be worth considering [20].

28.4.4.5. replace()

In order to promote additional search pressure on S in Algorithm 3, the replace(S, F, M/5) function updates S by randomly replacing a fifth (M/5) of its decision vectors with copies of individuals from F. These copies are selected using partitioned quasi-random selection (PQRS) [12], which ensures that a good spread of solutions is selected from the estimated Pareto front.

28.4.5. Implementation and Generalization
A number of recent approaches to training ENNs while simultaneously adjusting topology have done so using a hybrid approach, where training with EA methods and gradient descent techniques is interleaved [1, 2, 3, 22]. Justification for this approach has been made for the very sensible reason of computational efficiency: by using a hybrid learning approach as opposed to a purely EA training methodology the training time is typically reduced. However, this is not to say that hybrid training does not create problems of its own; if the problem at hand demands a 'hand crafted' error function, as many in financial forecasting applications do [4, 11, 13, 28, 32], it may be difficult to propagate through gradient descent learning methods. Recent work has highlighted that the most profitable model is not necessarily the one that minimizes forecast Euclidean error [11, 13]. As such, the method described in this chapter uses traditional gradient descent methods to seed the search process (the seeding step of Algorithm 3), but thereafter is exclusively EA driven, meaning it is easily applicable to the widest range of time series forecast problems with minimal modification. Algorithm 3 deals solely with fitting the ENNs to a set of training data, which leads to the question of how to minimize generalization error with this information. The approach advocated in this chapter is disarmingly simple. Instead of convoluted training and validation during the training process, validation error/complexity is compared to the Pareto training error/complexity after training, and a suitable operating ENN is chosen using this comparison. An illustration of this is provided in Figure 28.6.
Fig. 28.6. Illustration of the complexity/error trade-off front. Left: training data Pareto front. Right: the same ENNs evaluated on validation data; from point 'p' onwards the ENNs are overfitted and should not be used.
The curve in Figure 28.6a illustrates the complexity/accuracy trade-off curve discovered on a set of training data, and Figure 28.6b illustrates the same ENNs evaluated on some validation data. The second curve is non-Pareto, as it curves back on itself at high complexity, showing that those networks have been overfitted. The practitioner should therefore operate with the ENN at point 'p' if they wish to minimize generalization error, or at a complexity below that if they have constraints on the distributed complexity of their model (if, for instance, they are content with a lower accuracy provided they can reduce the number of transfer units/weights in the network). The actual generalization error can then be assessed on some additional unseen test data to confirm the choice of complexity. This approach has an advantage over the common early stopping method described earlier, in that it does not have the potential to be trapped in local minima, and it promotes search in areas which are not confined to the gradient descent weight trajectory.
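The selection rule at point 'p' amounts to re-evaluating the training front on validation data and taking the minimum. A sketch (the data and function name are illustrative, not from the chapter):

```python
def select_operating_point(front_eval):
    """Given the training-front ENNs re-evaluated on validation data as
    (complexity, validation_error) pairs, return the index of point 'p':
    the network minimizing validation error. Networks of higher
    complexity than 'p' are taken to be overfitted."""
    return min(range(len(front_eval)), key=lambda i: front_eval[i][1])

# Hypothetical validation evaluations, ordered by increasing complexity;
# the curve folds back (error rises again) beyond the fourth model.
validation = [(2, 9.1), (5, 7.4), (9, 6.2), (14, 5.8), (21, 6.6), (30, 8.0)]
p = select_operating_point(validation)
print(p, validation[p])  # → 3 (14, 5.8)
```

A user with complexity constraints would instead pick any index below p, trading a known amount of validation error for a simpler network.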
28.5. Empirical Validation

The methodology introduced in the previous section will now be validated. Two different measures of complexity will be modelled: the sum of the squared weights, and the number of weights used. Results from choosing the model at point 'p' will be compared to the traditional approach of early stopping on time series problems with various degrees of noise, and example fronts produced to support the general methodology.
28.5.1. Data

The data used is of a physical process, the oscillation of a laser,* where an underlying function is thought to drive the observations, but where there is also a degree of measurement noise. Additional noise will be added to this process in order to promote lower complexity representations and penalize high complexity representations. A plot of the training data is shown in Figure 28.7; Figure 28.8a shows the scatter plot of the time series versus its first lag, and Figure 28.8b shows the correlation coefficient values for different lags of this data.
Fig. 28.7. Laser oscillation time series.
On inspection of the correlation coefficient values, it was decided to use 40 lags to model the process, resulting in 960 input/output pairings. This data was then randomly partitioned into an ENN training set of 640 samples and a validation set of 220 samples. The unseen test set consisted of 9053 input/output pairings. Ten different variants of the series were subsequently made with different degrees of additional noise, drawn from a Gaussian, to mimic different levels of measurement corruption, making a total of eleven time series, each with a different propensity to overfitting.

28.5.2. Model Parameters

MOEA training was applied through the process described in Algorithm 3, with the following parameters: H_max = 10, γ = 0.1, p_mut = 0.2, p_w = 0.02,

*The time series data, with full descriptions, can be found at http://wwwpsych.stanford.edu/~andreas/Time-Series/SantaFe.html .
Fig. 28.8. Laser oscillation time series. Left: Scatter plot of current value against previous value. Right: Correlation coefficient values for different lags of time series.
p_u = 0.02, N = 5000, E = 5000 and |S| = 100. In addition, a NN was trained for each of the time series using the more advanced early stopping method described in subsection 28.3.1, for 20000 epochs. The learning rate for all algorithms using backpropagation was 0.05, with a momentum term of 0.5. In addition, the MOEA with the sum of squared weights minimization objective updated F during the initial training of the seed neural network; this was found to improve training as it gave a good first estimate of the trade-off front.

28.6. Results

Figure 28.9 is an indicative plot of the realized fronts created by the set of optimal ENNs F evaluated on the training, validation and test sets. Although the set is mutually non-dominating on the training data, the validation and test data sets both exhibit the folding-back predicted in the previous section, indicating the ENN to select if the user is solely concerned with minimising generalization error. The point at which this fold-back occurs is observed to move lower as the amount of noise increases (see Table 28.1). Figure 28.10, on the other hand, shows the evaluation of ENNs trained with the second objective of minimizing the number of weights. The size of F at the end of this process is substantially smaller than that of squared sum of weights minimization (averaging around 100 as opposed to over 10000); however, this form of training can be viewed as more useful to the practitioner who is concerned with the trade-off of accuracy versus actual NN size, as it shows that the NN can be drastically reduced with only a marginal increase in error, if they wish to distribute a far simpler model. Table 28.1 gives the error and 'complexity' of the different models
Fig. 28.9. Training, validation and test set fronts for the error and Σw² minimization training process, with additional noise N(0,4). The phenomenon of the validation and test set fronts folding back on themselves can be clearly seen.
Fig. 28.10. Training, validation and test set fronts for the error and #w minimization training process, without additional noise. The user now gains the information that the number of active weights (connectivity) of the NN can be drastically reduced with only a marginal increase in error, if they wish to distribute a far simpler model.
selected at 'p' by the MOEAs with the different complexity objectives for the 11 data sets, along with that of a NN trained in the traditional early stopping fashion. The error rates can be seen to be equivalent, with the MOEAs seeming to perform slightly better as the amount of noise increases. The MOEA models minimizing the sum of the squared weights can also be seen
to have much lower weight values compared to the early stopping approach as the noise increases.

Table 28.1. Results of the single ENN model selected at point 'p' on the validation front, and an early stopping backpropagation NN.

Added noise    'p', Σw² min.        'p', #weight min.    Backprop
               Error     Σw²        Error     #w         Error     Σw²
None           6.9       459.3      6.9       410        6.9       459.3
N(0,1)         7.7       444.6      7.7       410        7.7       444.6
N(0,2)         8.9       450.7      8.9       404        8.9       450.7
N(0,3)         15.8      342.5      16.6      388        15.8      345.5
N(0,4)         23.0      332.2      28.2      78         23.0      346.9
N(0,5)         33.4      297.0      37.9      33         33.4      297.1
N(0,6)         46.2      106.9      46.3      75         46.3      154.9
N(0,7)         58.9      42.7       58.9      52         59.7      145.6
N(0,8)         74.1      19.4       74.6      19         75.3      141.5
N(0,9)         89.7      18.6       90.3      23         89.8      142.5
N(0,10)        107.3     10.39      108.3     24         110.8     138.9
28.7. Discussion

EMOO approaches to NN training have already proved useful in providing trade-off fronts between competing error objectives in financial time series forecasting [11, 13], and a methodology already exists for learning the trade-off front between NN accuracy and the number of hidden units [1, 2, 3]. The methodology described in this chapter takes this further and presents a process for encapsulating other definitions of NN complexity within a MOEA training process. These have been shown to be equivalent or better than the popular early stopping approach to NN training on a physical time series process with many different degrees of noise, and therefore overfitting propensity, for the selection of a single 'best' NN in terms of generalization error. More importantly, however, by using the assessment of a set of estimated Pareto optimal ENNs on validation data, the non-dominated ENNs can give the user a good representation of the complexity/accuracy trade-off of their problem, such that NNs with very low complexity may feasibly be used. In the example series used in this chapter the cost of this approach in terms of realized error was surprisingly low.

Acknowledgements

Jonathan Fieldsend gratefully acknowledges support from EPSRC grant number GR/R24357/01 during the writing of this chapter.
References

1. H. Abbass and R. Sarker. Simultaneous evolution of architectures and connection weights in ANNs. In Artificial Neural Networks and Expert Systems Conference, pages 16-21, Dunedin, New Zealand, 2001.
2. H.A. Abbass. A Memetic Pareto Evolutionary Approach to Artificial Neural Networks. In The Australian Joint Conference on Artificial Intelligence, pages 1-12. Springer, 2001.
3. H.A. Abbass. An Evolutionary Artificial Neural Networks Approach for Breast Cancer Diagnosis. Artificial Intelligence in Medicine, 25(3):265-281, 2002.
4. J.S. Armstrong and F. Collopy. Error measures for generalizing about forecasting methods: Empirical comparisons. International Journal of Forecasting, 8(1):69-80, 1992.
5. C.M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1998.
6. C.A.C. Coello. A Comprehensive Survey of Evolutionary-Based Multiobjective Optimization Techniques. Knowledge and Information Systems. An International Journal, 1(3):269-308, 1999.
7. I. Das and J. Dennis. A closer look at drawbacks of minimizing weighted sums of objectives for Pareto set generation in multicriteria optimization problems. Structural Optimization, 14(1):63-69, 1997.
8. K. Deb. Multi-objective genetic algorithms: Problem difficulties and construction of test problems. Evolutionary Computation, 7(3):205-230, 1999.
9. K. Deb, L. Thiele, M. Laumanns, and E. Zitzler. Scalable Multi-Objective Optimization Test Problems. In Congress on Evolutionary Computation (CEC 2002), volume 1, pages 825-830, Piscataway, New Jersey, May 2002. IEEE Service Center.
10. R.M. Everson, J.E. Fieldsend, and S. Singh. Full Elite Sets for Multi-Objective Optimisation. In I.C. Parmee, editor, Adaptive Computing in Design and Manufacture V, pages 343-354. Springer, 2002.
11. J.E. Fieldsend. Novel Algorithms for Multi-Objective Search and their application in Multi-Objective Evolutionary Neural Network Training. PhD thesis, Department of Computer Science, University of Exeter, June 2003.
12. J.E. Fieldsend, R.M. Everson, and S. Singh. Using Unconstrained Elite Archives for Multi-Objective Optimisation. IEEE Transactions on Evolutionary Computation, 7(3):305-323, 2003.
13. J.E. Fieldsend and S. Singh. Pareto Multi-Objective Non-Linear Regression Modelling to Aid CAPM Analogous Forecasting. In Proceedings of the 2002 IEEE International Joint Conference on Neural Networks, pages 388-393, Hawaii, May 12-17, 2002. IEEE Press.
14. C.M. Fonseca and P.J. Fleming. An Overview of Evolutionary Algorithms in Multiobjective Optimization. Evolutionary Computation, 3(1):1-16, 1995.
15. T. Hanne. On the convergence of multiobjective evolutionary algorithms. European Journal of Operational Research, 117:553-564, 1999.
16. S. Haykin. Neural Networks: A Comprehensive Foundation. Prentice Hall, 2nd edition, 1999.
17. K. Kameyama and Y. Kosugi. Automatic fusion and splitting of artificial neural elements in optimizing the network size. In Proceedings of the International Conference on Systems, Man and Cybernetics, volume 3, pages 1633-1638, 1991.
18. J. Knowles and D. Corne. The Pareto archived evolution strategy: A new baseline algorithm for Pareto multiobjective optimisation. In Proceedings of the 1999 Congress on Evolutionary Computation, pages 98-105, Piscataway, NJ, 1999. IEEE Service Center.
19. J.D. Knowles and D. Corne. Approximating the Nondominated Front Using the Pareto Archived Evolution Strategy. Evolutionary Computation, 8(2):149-172, 2000.
20. J.D. Knowles and D. Corne. Properties of an Adaptive Archiving Algorithm for Storing Nondominated Vectors. IEEE Transactions on Evolutionary Computation, 7(2):100-116, 2003.
21. Y. LeCun, J. Denker, S. Solla, R.E. Howard, and L.D. Jackel. Optimal brain damage. In D.S. Touretzky, editor, Advances in Neural Information Processing Systems II, pages 598-605, San Mateo, CA, 1990. Morgan Kaufmann.
22. Y. Liu and X. Yao. Towards designing neural network ensembles by evolution. Lecture Notes in Computer Science, 1498:623-632, 1998.
23. S. Mostaghim, J. Teich, and A. Tyagi. Comparison of Data Structures for Storing Pareto-sets in MOEAs. In Congress on Evolutionary Computation (CEC 2002), volume 1, pages 843-848, Piscataway, New Jersey, May 2002. IEEE Press.
24. V. Pareto. Manuel d'Economie Politique. Marcel Giard, Paris, 2nd edition, 1927.
25. L. Prechelt. Automatic early stopping using cross validation: quantifying the criteria. Neural Networks, 11(4):761-767, 1998.
26. T. Ragg and S. Gutjahr. Automatic determination of optimal network topologies based on information theory and evolution. In IEEE Proceedings of the 23rd EUROMICRO Conference, pages 549-555. IEEE Computer Society Press, 1997.
27. Y. Raviv and N. Intrator. Bootstrapping with noise: An effective regularization technique. Connection Science, 8:356-372, 1996.
28. E. Saad, D. Prokhorov, and D. Wunsch. Comparative Study of Stock Trend Prediction Using Time Delay, Recurrent and Probabilistic Neural Networks. IEEE Transactions on Neural Networks, 9(6):1456-1470, 1998.
29. J. Utans and J. Moody. Selecting neural network architectures via the prediction risk: application to corporate bond rating prediction. In Proc. of the First Int. Conf. on AI Applications on Wall Street, pages 35-41, Los Alamitos, CA, 1991. IEEE Computer Society Press.
30. D. Van Veldhuizen and G. Lamont. Multiobjective Evolutionary Algorithms: Analyzing the State-of-the-Art. Evolutionary Computation, 8(2):125-147, 2000.
31. D. Wolpert. On bias plus variance. Neural Computation, 9(6):1211-1243, 1997.
32. J. Yao and C.L. Tan. Time Dependent Directional Profit Model for Financial Time Series Forecasting. In Proceedings of IJCNN 2000, Como, IEEE/INNS/ENNS, volume 5, pages 291-296, 2000.
33. X. Yao. Evolving Artificial Neural Networks. Proceedings of the IEEE, 87(9):1423-1447, 1999.
34. X. Yao, Y. Liu, and G. Lin. Evolutionary Programming Made Faster. IEEE Transactions on Evolutionary Computation, 3(2):82-102, 1999.
CHAPTER 29

EVEN FLOW SCHEDULING PROBLEMS IN FOREST MANAGEMENT
E.I. Ducheyne¹, B. De Baets² and R.R. De Wulf³

¹Dept. of Veterinary Science, Institute of Tropical Medicine, Nationalestraat 155, B-2000 Antwerpen, Belgium. E-mail: [email protected]
²Dept. of Applied Mathematics, Biometrics and Process Control, Ghent University, Coupure links 653, B-9000 Gent, Belgium
³Lab. of Forest Management and Spatial Information Techniques, Ghent University, Coupure links 653, B-9000 Gent, Belgium

The performance of the Multiple Objective Genetic Algorithm (MOGA) and of the Non-dominated Sorting Genetic Algorithm (NSGA-II) are compared on a forest scheduling problem with a known Pareto-optimal front. The most performant algorithm is then applied to a harvest scheduling problem with integer decision variables for each of the harvest periods that can be assigned to the forest stands of Kirkhill forest. This Model I harvest schedule attempts to simultaneously maximize the present value derived from timber production and minimize the deviations of timber volume between successive harvesting periods. Both the optimal encoding of the decision variables and the best population size for this type of scheduling problem are determined. The results of the multiple objective approach are compared with the maximum value that can be attained without the even flow constraint, and also to the result obtained using a single objective genetic algorithm. Finally, proportional fitness inheritance is applied to this problem. The inheritance technique and the regular genetic algorithm are compared by studying the evolution of various performance indices.

29.1. Benchmark Problem

29.1.1. Introduction

Multiple objective genetic algorithms have not yet been used in forest management. Hence, there is no information available as to which algorithms perform well in that domain. In order to gather this information, a comparative study was conducted on a forest management problem with a known Pareto-optimal front. Two multiple objective genetic algorithms were tested: the Multiple Objective Genetic Algorithm (MOGA) implemented by [9], because this has earned its merit in a land use planning problem [18, 17], and the Non-dominated Sorting Genetic Algorithm II (NSGA-II) [5], because of its reported efficiency. These two algorithms are also recommended by [22] as starting points. The outcomes of these algorithms were compared to a random search strategy in order to judge the efficiency of a genetic approach. Because the Pareto-optimal front is unknown for real-world harvesting problems, a forest management problem defined by [11] was chosen as a forest benchmark problem. In the forest, four different activities need to be scheduled: passive recreation, active recreation, habitat conservation and harvesting timber. There are two different management decisions that can be assigned to a forest stand: clear felling the stand, or leaving the stand untouched. Each stand can receive only one set of management activities over the complete planning horizon. For this benchmark problem, the first objective is to maximize the harvest volume V. Since harvest volume is negatively correlated with the standing volume left in the forest, the first objective can be written as in Eq. A.1:

Maximize f_1 = Σ_{i=1}^{N} v(i, a) ~ −V_stand   (A.1)
where N is the number of forest stands, a a management decision, v(i, a) the harvest volume associated with stand i under decision a, and v_stand the volume left standing. The second objective is to maximize the benefit that people obtain from the standing forest, measured by a utility function U. Following the law of diminishing returns, this function can be modelled using the square root of the stand volume [11]. The more standing volume left in the forest, the more trees there are for people to enjoy; however, the marginal benefit of leaving an extra tree declines as the forested area grows. The second objective can therefore be written as in Eq. A.2:

Maximize f2 = Σ_{i=1}^{N} u(i), with u(i) = √v_stand(i)    (A.2)

where N is the number of forest stands, u(i) the utility associated with stand i and v_stand(i) its standing volume.
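The two benchmark objectives can be evaluated directly from a candidate plan. The sketch below is a minimal Python illustration, assuming (following the text) that the utility of a retained stand is the square root of its standing volume; the function name and the three-stand volumes are hypothetical.

```python
import math

def benchmark_objectives(decisions, harvest_volume, stand_volume):
    """Evaluate the two benchmark objectives (Eqs. A.1-A.2) for one plan.
    decisions[i] is 1 if stand i is clear felled, 0 if left standing."""
    # f1: total harvest volume over all clear-felled stands (Eq. A.1).
    f1 = sum(harvest_volume[i] for i, d in enumerate(decisions) if d == 1)
    # f2: utility of the standing forest; modelled here as the square
    # root of each retained stand's volume (an assumption drawn from the
    # "law of diminishing returns" remark in the text, Eq. A.2).
    f2 = sum(math.sqrt(stand_volume[i]) for i, d in enumerate(decisions) if d == 0)
    return f1, f2

# Hypothetical three-stand forest; volumes are illustrative only.
harvest = [120.0, 80.0, 60.0]
standing = [150.0, 100.0, 75.0]
print(benchmark_objectives([1, 0, 0], harvest, standing))
```

Note how the two objectives pull in opposite directions: felling a stand adds to f1 but removes that stand's contribution to f2.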
Even Flow Scheduling Problems in Forest Management
As follows from Eqs. A.1 and A.2, the two objectives are conflicting. Moreover, the Pareto-front between the two objectives is non-convex (Fig. 29.1). This prohibits the use of weighted sum formulations, which led [11] to revert to a complex dynamic programming formulation in order to solve it.

29.1.2. Methodology

The forest activities are mapped using one bit per forest stand. Of the 399 stands, 295 were retained; the others were excluded because they were either unplanted or not yet productive during the initial period. A simple land block assignment procedure as in [17] was applied. In this procedure each gene represents a land block, and together the genes make up a linear chromosome, even though in reality the forest layout is two-dimensional. For the two genetic algorithms, the population size was set to 100, the number of generations to 50, the crossover probability to 0.8 and the mutation probability to 0.01. All algorithms were repeated 30 times and for each run the following performance indices were determined: error ratio, generational distance, spacing, spread and hypervolume. The results of the performance indices were analyzed using both statistical tests (One Way ANOVA or Kruskal-Wallis) and a distribution-free bootstrapping method [20]. The significance level for all statistical tests was set to 95%. The 50% attainment surface was derived for visual comparison and was also used as input for the Mann-Whitney test statistics provided by [15].

29.1.3. Results and Discussion

29.1.3.1. Visual Interpretation

In Fig. 29.1 the median attainment surfaces over 30 runs are presented for MOGA and NSGA-II. The median attainment surface of the random simulation is not shown because it is too small; instead, the solutions found by the random simulations over the 30 runs are presented. Fig. 29.1 shows that both MOGA and NSGA-II perform much better than the random search strategy. NSGA-II approximates the Pareto-optimal front better than MOGA.
MOGA, on the other hand, is capable of finding more extreme solutions, and this results in a better spread along the Pareto-front. Neither algorithm is capable of finding the extreme solutions themselves. The lack of spread in the forest management problem is not caused by implementation errors but is most likely caused by the discreteness of the problem [6].
E.I. Ducheyne et al.
Fig. 29.1. Comparison of the median attainment surface between random search, MOGA and NSGA-II versus the non-convex Pareto-optimal front
29.1.3.2. Performance Indices

1. Testing closeness to the Pareto-optimal front. The generational distance (GD) [23] and the error ratio [4], with a threshold of 0.05, were calculated for both genetic algorithms. The output was normalized using the respective minimum and maximum values of each objective dimension in the Pareto-optimal front, because both the error ratio and the generational distance are scaling dependent: a difference in magnitude between the objectives suppresses the effect of the objective with the lowest magnitude. In Fig. 29.2 the result of the bootstrapping method is shown. A non-parametric Kruskal-Wallis test in combination with the bootstrapping method was applied for the statistical analysis because neither normality nor homoscedasticity are fulfilled. From both the Kruskal-Wallis test (p = 0.0 < 0.05) and the bootstrapping method (Fig. 29.2) it follows that NSGA-II performs better than MOGA at a confidence level of 95%. In Fig. 29.2 the test measure is positive, indicating that the mean generational distance as well as the mean error ratio is higher for MOGA than for NSGA-II. As lower values are better for these two indices, it follows that NSGA-II performs better in terms of closeness to the Pareto-optimal front. The standard deviation of the generational distance is smaller for NSGA-II than for MOGA, which can be interpreted as more robust behavior of the NSGA-II algorithm.

2. Testing spread. The spread was measured by the spacing measure and by the spread measure. The spacing for NSGA-II is lower than the spacing for MOGA, indicating that the crowding distance function of NSGA-II spreads the solutions better than the sharing function. For spacing and
spread the variances are not equal (p = 0.001 < 0.05), and this is confirmed by the bootstrapping method (Fig. 29.3). The test indices for the spacing are in the 5% tails of the histogram. Once more lower values are better; therefore NSGA-II has more evenly spaced solutions than MOGA.
(a)
(b)
Fig. 29.2. Bootstrapping results for the generational distance (29.2(a)) and error ratio (29.2(b)). The confidence boundaries are marked in dark grey bullets (α = 95%), the test measure is marked in a light grey bullet. Both test indices are outside the boundaries
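The two closeness indices used above can be sketched as follows. This is a minimal Python illustration: fronts are lists of (already normalized) objective tuples, and the 0.05 threshold in `error_ratio` is an assumption matching the value quoted in the text, not necessarily the exact definition used by the chapter's authors.

```python
def generational_distance(front, pareto):
    """Mean Euclidean distance from each solution in `front` to its
    nearest point on the known Pareto-optimal set (cf. [23])."""
    dists = []
    for p in front:
        d = min(sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5 for q in pareto)
        dists.append(d)
    return sum(dists) / len(dists)

def error_ratio(front, pareto, tol=0.05):
    """Fraction of `front` lying farther than `tol` from the
    Pareto-optimal set (one common reading of the error ratio [4])."""
    misses = 0
    for p in front:
        d = min(sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5 for q in pareto)
        if d > tol:
            misses += 1
    return misses / len(front)
```

For both indices, lower values indicate a front closer to the Pareto-optimal set, which is why the positive test measures in Fig. 29.2 favour NSGA-II.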
If the distance to the most extreme solutions is included, as in the spread measure of [4], however, MOGA performs better because it reaches the extremes better. From the bootstrapping results (Fig. 29.3) it follows that there is no difference between the two algorithms, although the test value is only just inside the boundaries and the boundaries differ between the two bootstrapping procedures. The Kruskal-Wallis test cannot detect any significant differences at the 95% level (p = 0.095 > 0.05).

3. Combining spread and closeness. The hypervolume measure calculates the size of the dominated space and is a combined measure of both spread and closeness. For this measure the data was not normalized, as the measure is scaling independent. Both the normality (p > 0.05 for all groups) and the homogeneity of variances (p = 0.426 > 0.05) assumptions are fulfilled, and from the statistical analysis (One Way ANOVA) it follows that NSGA-II is significantly better than MOGA at the 95% level, and that both genetic algorithms are significantly better than the random strategy. From the bootstrapping
method the same conclusion can be drawn when comparing MOGA and NSGA-II.
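A distribution-free bootstrap test on the difference of two means, in the spirit of the procedure of [20], can be sketched as below. The exact resampling scheme used in the chapter is not specified, so this pooled-resampling variant is an assumption; it builds a null distribution of mean differences and checks whether the observed difference falls outside the central 1 − α region.

```python
import random

def bootstrap_mean_difference(sample_a, sample_b, n_boot=10000, alpha=0.05, seed=1):
    """Bootstrap test on mean(sample_a) - mean(sample_b).
    Returns (observed difference, (lo, hi) null bounds, significant?)."""
    rng = random.Random(seed)
    observed = sum(sample_a) / len(sample_a) - sum(sample_b) / len(sample_b)
    pooled = list(sample_a) + list(sample_b)          # null: one population
    diffs = []
    for _ in range(n_boot):
        a = [rng.choice(pooled) for _ in sample_a]    # resample with replacement
        b = [rng.choice(pooled) for _ in sample_b]
        diffs.append(sum(a) / len(a) - sum(b) / len(b))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return observed, (lo, hi), not (lo <= observed <= hi)
```

The dark grey bullets in Figs. 29.2-29.4 correspond to `lo` and `hi`; a test measure outside that interval signals a significant difference.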
(a)
(b)
Fig. 29.3. Bootstrapping results for spacing (29.3(a)) and spread (29.3(b)). The confidence boundaries are marked in dark grey bullets (α = 95%), the test measure is marked in a light grey bullet. Both test indices are outside the boundaries
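The spacing index reported above can be sketched as follows; this is one common definition (the standard deviation of nearest-neighbour distances in objective space), offered as an illustration since the chapter does not reproduce the formula.

```python
def spacing(front):
    """Standard deviation of the distance from each solution to its
    nearest neighbour (Manhattan distance in objective space).
    Lower values indicate a more evenly spaced front."""
    n = len(front)
    d = []
    for i, p in enumerate(front):
        nearest = min(
            sum(abs(a - b) for a, b in zip(p, q))
            for j, q in enumerate(front) if j != i
        )
        d.append(nearest)
    mean = sum(d) / n
    return (sum((x - mean) ** 2 for x in d) / (n - 1)) ** 0.5
```

A perfectly evenly spaced front has spacing 0, which is why the lower NSGA-II values indicate a better distribution of solutions.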
Fig. 29.4. Bootstrapping results for hypervolume. The confidence boundaries are marked in dark grey bullets (α = 95%), the test measure is marked in a light grey bullet
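For a bi-objective maximization problem such as this one, the hypervolume (size of the dominated space) can be computed with a simple sweep; the sketch below is a minimal illustration and assumes a reference point below and to the left of the front.

```python
def hypervolume_2d(front, ref):
    """Area of objective space dominated by `front` and bounded by the
    reference point `ref`, with both objectives maximized.  Sort by the
    first objective descending and sum the rectangular slices."""
    pts = sorted(front, key=lambda p: (-p[0], -p[1]))
    volume = 0.0
    prev_f2 = ref[1]
    for f1, f2 in pts:
        if f2 > prev_f2:                     # contributes a new slice;
            volume += (f1 - ref[0]) * (f2 - prev_f2)
            prev_f2 = f2                     # dominated points are skipped
    return volume
```

Because the measure rewards both closeness to the front and coverage of its extremes, it serves as the combined spread-and-closeness index used above.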
29.1.3.3. Statistical Approaches

The previous comparisons are based on performance indices and require the transformation of n solutions into a single performance index. One way of avoiding this transformation is through the attainment surfaces proposed by [10]. They describe the output from multiple objective
genetic algorithms. The median attainment surfaces for MOGA and NSGA-II were already shown in Fig. 29.1, and these surfaces can be used as input for statistical comparison. [15] provided a test measure based on attainment surfaces showing where algorithm A outperforms algorithm B and vice versa. From this measure it follows that NSGA-II dominates MOGA in 79.4% of the covered search space, while MOGA dominates NSGA-II in 16.4% of the cases. These statistics can be explained as follows: MOGA reaches the extreme solutions better than NSGA-II does and therefore beats NSGA-II in part of the search space, but in the largest part of the search space the solutions from MOGA are dominated by NSGA-II because NSGA-II is closer to the Pareto-optimal front.

29.1.3.4. Implications for Forest Management Problems
Both MOGA and NSGA-II have shown better performance than a random search strategy. They both approximate the Pareto-optimal front well, but suffer from a lack of spread. In particular, NSGA-II is not capable of finding the more extreme solutions. This lack of spread, however, is not caused by implementation errors: both algorithms achieve a very good spread over the complete Pareto-front for a non-convex test function commonly used as a reference function in the specialized literature. NSGA-II approximates the Pareto-optimal front faster than MOGA and has more evenly spaced solutions. If the distance from the extreme solutions in the Pareto-front to the extremes of the Pareto-optimal front is included in the spread measure, MOGA scores better than NSGA-II; this difference, however, is not significant. The variance in generational distance between runs is smaller for NSGA-II than for MOGA, highlighting that NSGA-II is more robust in terms of approximating the Pareto-optimal front. When the algorithms are compared in terms of both spread and closeness, the hypervolume measure indicates that NSGA-II dominates a larger portion of the search space than MOGA does. Using the attainment surfaces, similar conclusions can be drawn: the Mann-Whitney test procedure shows that NSGA-II beats MOGA in the larger portion of the search space. Overall, NSGA-II shows a better performance for the forest management problem, and therefore this algorithm will be used in the subsequent case studies.
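All of the comparisons above ultimately rest on the Pareto-dominance relation. A minimal sketch for the maximization case, with a helper that filters a set of objective vectors down to its non-dominated subset:

```python
def dominates(p, q):
    """True if objective vector p Pareto-dominates q (all objectives
    maximized): p is no worse everywhere and strictly better somewhere."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

def non_dominated(front):
    """Return the mutually non-dominated subset of `front`."""
    return [p for p in front if not any(dominates(q, p) for q in front if q != p)]
```

NSGA-II's non-dominated sorting repeatedly peels off such subsets to rank a population into fronts; MOGA instead ranks each individual by the number of solutions dominating it.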
29.2. Applying Single Objective Genetic Algorithms to a Real-World Problem

29.2.1. Introduction

Harvest scheduling has been studied extensively in the past [8,13,16,21]. Forest managers need to schedule management treatments over a planning horizon. The two objectives most used in the harvest scheduling literature are (1) to maximize net present worth, and (2) to minimize the deviations in harvest volume between the different cutting periods. Using a Model I harvest scheduling formulation this can be written as [14]:

Maximize f = Σ_{i=1}^{N} Σ_{j=1}^{M} c_ij x_ij    (B.1)

Minimize g = Σ_{j=1}^{M} |V_j − V̄|    (B.2)

where N is the number of stands, M is the number of time periods, c_ij is the present value obtained when the management treatment is applied to stand i in period j, x_ij indicates whether stand i is cut in period j, V_j is the total volume (m³) summed over all stands cut in period j, and V̄ is the average volume over all cutting periods. Eq. B.1 expresses the management objective of maximizing the net present value, while Eq. B.2 expresses the objective of minimizing the deviations in timber volume between the different cutting periods, also referred to as even flow. For N stands and k cutting periods, the harvest scheduling problem has k^N possible combinations, because each forest stand can be scheduled in any of the harvesting periods. Constraining the problem to an integer program is needed because a stand can be scheduled for felling only once, and this requires N × k decision variables. This increases the number of decision variables to such an extent that the problem can only be solved using heuristics. As a consequence the global optimum of the bi-objective problem is unknown. The global optimum under no even flow assumption, on the other hand, can be derived: in that case all felling activities are postponed to the end of the planning horizon, and the maximum present value obtainable under that scenario amounts to €914232. This value can be used as a benchmark for the solutions found with the genetic algorithm. As the encoding strategy might influence the results of the genetic algorithm, the effect of three encoding strategies will also be investigated.
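The Model I objectives can be evaluated directly from a schedule vector. The sketch below is an illustration with hypothetical inputs; the absolute-deviation form of the even flow objective is an assumption chosen to match the deviation columns tabulated later in the chapter.

```python
def model_one_objectives(schedule, present_value, volume):
    """Evaluate Eqs. B.1-B.2 for a Model I plan.  schedule[i] is the
    period j in which stand i is felled; present_value[i][j] and
    volume[i][j] give the discounted revenue and harvested volume of
    stand i when cut in period j."""
    periods = len(volume[0])
    # B.1: total net present value of the plan.
    f = sum(present_value[i][j] for i, j in enumerate(schedule))
    # Volume cut in each period, summed over all stands.
    vol_per_period = [0.0] * periods
    for i, j in enumerate(schedule):
        vol_per_period[j] += volume[i][j]
    v_bar = sum(vol_per_period) / periods
    # B.2: even flow, here the summed absolute deviation from the
    # average per-period volume.
    g = sum(abs(v - v_bar) for v in vol_per_period)
    return f, g
```

Encoding the plan as one period index per stand (rather than N × k binary variables) automatically enforces the "fell each stand only once" constraint.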
29.2.2. Methodology

29.2.2.1. Input Data

For each of the stands the yield class is known. This information was used as input for the production tables of the Forestry Commission [12], and from these forecast tables the cumulative volume from thinning and felling activities can be derived. To simplify the problem, it was assumed that all timber, both from thinning and felling, was sold at the felling date, even though this is not very realistic. Prices were real prices per diameter class published in 2000 [1]. The diameter class at each period was derived from [12]. The discount rate was 3% and the present value was calculated as in Eq. B.3:

V_0 = V_t / (1 + i)^t    (B.3)

where V_t is the net revenue obtained after t periods, i is the discount rate and V_0 is the value today. The difficult task of assigning values to weights can be simplified by working with relative weights. If one is indifferent between the objectives, they should be rescaled between 0 and 1 in order to remove the differences in scale magnitude [2]. However, this might lead to numeric imprecision, and therefore the weights are usually multiplied by a fixed factor. As the present value is roughly 100 times larger in magnitude than the harvest volumes, the present value was divided by 100. The objective function can thus be written using the notations of Eqs. B.1 and B.2:
Maximize f = Σ_{i=1}^{N} Σ_{j=1}^{M} c_ij x_ij − w · Σ_{j=1}^{M} |V_j − V̄|    (B.4)
with w the weight of the second objective.

29.2.2.2. Implementation

A genetic algorithm with binary tournament selection, one-point crossover with a probability of 0.8 and uniform mutation with a probability of 0.01 (> 1/ℓ, with ℓ the chromosome length) was implemented. The population size was 100 and the number of generations was set to 50. No fitness scaling was implemented, as binary tournament selection is insensitive to differences in objective value magnitude. Because genetic algorithms can lose good solutions during the optimization process, elitism was applied: the parent and child populations were merged and sorted according to their fitness values, and the best N individuals were used to continue the search
process. Binary, gray and integer encodings were initially tested with equal weights. For the binary and gray codes, 3 bits were used for each harvesting period, as there are 8 periods in total over the complete planning horizon. This was repeated 10 times [19]. After selection of the representation that led to the best solution, the weights were varied. The weight w was initially linearly distributed on the half-open interval ]0,1] in steps of 0.2. If the weight 0 were included, the optimization problem would be unbounded and all felling activities would be planned at the end of the planning horizon (period 8). Two additional weights (0.01; 0.05) were evaluated in a later phase to obtain more information on the Pareto-front between the two objectives. For each weight the genetic algorithm was repeated 10 times.

29.2.3. Results and Discussion

1. Encoding strategy. Initially, the influence of the encoding strategy on the solution quality was tested. In Table 29.1 the mean objective function value, mean present value and mean sum of volume deviations for binary, gray and integer coding are presented. As the data is not normally distributed, the Kruskal-Wallis test was used. Table 29.1 shows that gray

Table 29.1. Influence of binary, gray and integer coding on the performance of the genetic algorithm. The experiment was repeated 10 times for each encoding. OV is the combined objective value, PV is the present value (*€100), V_j is the total volume (m³) summed over all stands cut in period j and V̄ is the average volume over all cutting periods.

Encoding type   mean OV   mean PV   Σ_{j=1}^{M} |V_j − V̄|
binary          2265.50   3308.00   1043.10
gray            2684.50   3534.00    850.00
integer         2573.50   3326.68    753.58
codes result in a higher objective value than integer and binary codes. The Kruskal-Wallis test statistic (p = 0.263 > 0.05) indicates, however, that there is no significant difference between the three groups. As [3] recommends using the most natural representation of any given problem, integer coding will be used for this forest management problem.

2. Changing the weight. In Table 29.2 and Fig. 29.5 the mean values per weight combination for integer encoding are presented. A first observation is that linearly distributing the weights on a small interval does not
result in evenly spaced solutions along the Pareto-front. This has some implications: if a forest manager decides to investigate only weights in the interval ]0,1] in steps of 0.2, a considerable amount of information on the shape of the Pareto-front is lost. Changing the weight from w = 0.2 to w = 0.05 and then to w = 0.01 changes the slope of the Pareto-front substantially. Beyond w = 0.02 a small increase in present value results in a large increase in the sum of all deviations. The effect of the even flow objective is small even when the present value is deemed 5 times more important, but this changes drastically once the present value is considered 100 times more important than even flow. The present value obtained with a weight of 0.01 amounts to €666811, which is 72.9% of the maximum value obtainable when there is no even flow assumption.

Table 29.2. Results for the different weight combinations. The experiment was repeated 10 times. OV is the combined objective value, PV is the present value (*€100), V_j is the total volume (m³) summed over all stands cut in period j and V̄ is the average volume over all cutting periods.

weight   mean OV   mean PV   Σ_{j=1}^{M} |V_j − V̄|
1        2573.00   3326.58     753.58
0.8      2744.60   3396.14     814.43
0.6      3151.30   3659.14     846.40
0.4      3434.80   3752.19     790.98
0.2      4042.70   4211.20    1984.93
0.05     5482.10   5508.46    2636.38
0.01     6469.80   6668.11   19830.60
Fig. 29.5. Results for the different weight combinations. The experiment was repeated 10 times. On the x-axis the present value (*€100), on the y-axis the sum of all deviations in harvest volume
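One generation of the GA described in Sec. 29.2.2.2 (binary tournament selection, one-point crossover, uniform mutation, and merge-and-truncate elitism) can be sketched as below. This is an illustrative implementation, not the authors' code; the function name and the toy fitness are hypothetical.

```python
import random

def elitist_ga_step(population, fitness, crossover_p=0.8, mutation_p=0.01, rng=None):
    """One generation: tournament selection, one-point crossover,
    uniform mutation, then elitism by merging parents and children and
    keeping the best N.  Chromosomes are lists of period indices (0-7);
    `fitness` maps a chromosome to a scalar to be maximized."""
    rng = rng or random.Random()
    n, length = len(population), len(population[0])

    def tournament():
        a, b = rng.choice(population), rng.choice(population)
        return a if fitness(a) >= fitness(b) else b

    children = []
    while len(children) < n:
        p1, p2 = tournament()[:], tournament()[:]
        if rng.random() < crossover_p:               # one-point crossover
            cut = rng.randrange(1, length)
            p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
        for child in (p1, p2):
            for k in range(length):                  # uniform mutation
                if rng.random() < mutation_p:
                    child[k] = rng.randrange(8)      # 8 cutting periods
            children.append(child)

    # Elitism: merge parents and children, keep the N fittest.
    merged = population + children[:n]
    merged.sort(key=fitness, reverse=True)
    return merged[:n]
```

Because the parents are merged back in before truncation, the best solution found so far can never be lost between generations.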
3. Validity. In Fig. 29.6 the volumes per cutting period are presented for all weights. From Fig. 29.6(a) to Fig. 29.6(c) the even flow constraint is strengthened. If this constraint is strengthened, the average volume obtained over all periods declines as w increases from 0.01 to 0.2 (Fig. 29.6(d)). For a weight w = 0.01 the volume harvested amounts to 6.70 m³/ha/yr; for equal weights this is only 6.30 m³/ha/yr. By comparison, a typical Flemish forest yields on average 4 m³/ha/yr, illustrating that Kirkhill forest was managed as a productive forest. The other weights produce similar average volumes. The even flow constraint therefore not only affects the present value but also has a negative side-effect on the average volume over all periods. The felling ages of the forest stands under equal weights (Fig. 29.7) indicate that the rotation age should be increased in order to obtain a normal age distribution: up to 1/3 of the stands have a felling age over 80 years. The figure also shows that some stands are cut very young in order to obtain an even flow of timber volume. The age distribution is affected by the harvesting plan. Under the plan in which the two objectives were equally important, the age distribution almost resembles that of a normal forest (Fig. 29.8(a)). This is caused implicitly by the even flow objective: it controls the volume, and the age distribution is implicitly adjusted towards a normal state. This is confirmed in Fig. 29.8(b): if the even flow objective is relaxed, the forest still tends towards a normal state, but to a lesser extent. Running the genetic algorithm for another planning horizon stabilizes the age distribution even more if the objectives are equally important (Fig. 29.9(a)). Starting from the age distribution obtained with equal weights in the first planning horizon and running the algorithm again for a second planning horizon with a weight of 0.01 affects the age distribution slightly: it becomes less stable (Fig. 29.9(b)).

29.2.4. Conclusion

Genetic algorithms are capable of solving a harvest scheduling problem. The encoding strategy did not affect the quality of the solutions; there was no significant difference between the different codes. As there is no difference, integer codes were used. In order to find the Pareto-front, the weights were initially linearly distributed on the interval ]0,1]. It was found that this did not lead to evenly spaced solutions along the Pareto-front. To gain more information, two additional weights were chosen: w = 0.01 and w = 0.05. When the present value was 100 times more important, the
(a) w = 0.01
(b) w = 0.05
(c) w = 1
(d) Effect of w on V
Fig. 29.6. The influence of the weight w on the variation in volume between different cutting periods. From 29.6(a) to 29.6(c) the even flow constraint is strengthened; in 29.6(d) the effect on V̄ is shown
Fig. 29.7. Felling age of the forest stands
slope of the Pareto-front changed substantially. This implies that a user without prior knowledge of the problem might lose a lot of information on the Pareto-front if the weights are linearly distributed on a small interval. Both the age distribution and the average volume are affected by the
(a)
(b)
Fig. 29.8. Age distribution of the forest before and after the harvest scheduling plan with 29.8(a) two equally important objectives and 29.8(b) with the present value 100 times more important than the even flow objective
(a)
(b)
Fig. 29.9. Age distribution of the forest before and after the harvest scheduling plan after a second planning horizon with 29.9(a) two equally important objectives and 29.9(b) with the present value 100 times more important than the even flow objective
even flow objective. Running the genetic algorithm to maximize the present value and minimize the deviations between periods produces harvesting plans that enforce a balanced age distribution. Relaxing the even flow objective still has an effect on the age distribution, but some variation in frequency between the different age classes remains. The present value obtained with a relaxed even flow constraint amounts to 72.9% of the maximum attainable present value. The even flow objective also influences the harvested volume: this declines as the even flow objective becomes more important.
A practical drawback of using weights is that it is very cumbersome: rerunning the genetic algorithm, or any single objective optimizer, for several weight combinations is a tedious job and requires large amounts of computing time.

29.3. Applying NSGA-II: A Truly Bi-Objective Approach

29.3.1. Introduction

The harvest scheduling problem as defined in the single objective case will now be solved using the multiple objective genetic algorithm NSGA-II. The original objective functions (Eqs. B.1 and B.2) are the direct input for the genetic algorithm and do not have to be combined beforehand. The same data and production forecast tables as in the single objective case are used to allow comparison of both approaches.

29.3.2. Methodology

As for the single objective case, the effect of the encoding was investigated. To allow comparison with the single objective case, the same settings as in the previous experiment were used. Again binary, gray and integer encodings were used to represent the 8 different cutting periods per management unit. The effect of the encoding strategies was inspected visually as well as with the hypervolume measure and the statistical analysis via the attainment surfaces. Other indices for closeness could not be applied because the Pareto-optimal front is not known. The spacing measure was also used. The output is compared to that of the single objective case study. Later on, the population size was increased from 500 up to 1000 individuals in steps of 250. For each of these population sizes, the effect on convergence and spread was determined. For each encoding type and population size, the experiment was repeated 10 times. For all experiments binary tournament selection with the non-dominance selection criterion was used, together with one-point crossover with a crossover probability of 0.8 and uniform mutation with a probability of 0.01. Once more the elitist strategy was applied.

29.3.3. Results

29.3.3.1. Effect of Encoding on Spread and Pareto-Optimality

1. Visual interpretation. Integer encoding proves to be the best encoding strategy in terms of approximating the Pareto-optimal front
(Fig. 29.10), but again gray encoding is a close second. The three encodings show a similar spread.
Fig. 29.10. Median attainment surfaces for binary, gray and integer encoding
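The chapter does not reproduce the gray coding itself, so as an illustration, the standard reflected Gray code (and its inverse) can be written as below; its appeal for GAs is that adjacent integers differ in a single bit, so a one-bit mutation moves a gene to a neighbouring cutting period.

```python
def binary_to_gray(b):
    """Convert a non-negative integer to its reflected Gray code."""
    return b ^ (b >> 1)

def gray_to_binary(g):
    """Invert the Gray coding by cumulatively XOR-ing the shifted bits."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b

# With 3 bits per cutting period (8 periods), adjacent periods differ
# in a single bit under Gray coding, e.g. periods 3 and 4:
print(format(binary_to_gray(3), '03b'), format(binary_to_gray(4), '03b'))  # → 010 110
```

Under plain binary coding the same transition (011 to 100) flips all three bits, which helps explain why gray codes compete with integer codes here while binary codes lag behind.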
2. Performance indices. The performance of the integer encoding is confirmed by a Kruskal-Wallis test for the spacing measure and a One Way ANOVA for the hypervolume. The bootstrapping procedure is used for both performance indices. There is a significant difference between the groups for the spacing measure. Again gray and integer codes score best, and both are significantly better than binary codes according to a non-parametric post-hoc test. The bootstrapping procedure confirms this: in both Figs. 29.11(a) and 29.11(e) the test value is outside the confidence interval, indicating a significant difference between integer and binary codes and between gray and binary codes. In Fig. 29.11(c) the test value is within the boundaries of the interval, showing that there is no difference between gray and integer codes. The One Way ANOVA test statistic for the hypervolume measure indicates that there are no significant differences (p = 0.656 > 0.05), and this is also confirmed by the bootstrapping results (Figs. 29.11(b), 29.11(d) and 29.11(f)). Integer codes will be used to solve the harvest scheduling problems because they are the most natural representation for the problem.

29.3.3.2. Comparing the Single and Multiple Objective Genetic Algorithm

Running NSGA-II with integer encoding and a population size of 100 enables comparison of the results from the single and multiple objective optimizers. The multiple objective genetic algorithm was also run for the same
(a) Spacing for Int-Bin
(b) Hypervolume for Int-Bin
(c) Spacing for Int-Gray
(d) Hypervolume for Int-Gray
(e) Spacing for Bin-Gray
(f) Hypervolume for Bin-Gray
Fig. 29.11. Bootstrapping results for the difference in mean spacing (left column) and mean hypervolume (right column) for integer, binary and gray encodings
number of generations. The overlay of the median attainment surface from the multiple objective optimization runs with that from the single objective genetic algorithm is depicted in Fig. 29.12. The
Fig. 29.12. Median attainment surfaces for the single and the multiple objective genetic algorithm
two median attainment surfaces are very similar. Only the most extreme solution is missing from the Pareto-front found by NSGA-II. Running the multiple objective genetic algorithm has particular benefits in terms of computational efficiency. For both algorithms the same population size and number of generations were chosen, and the product of the two yields the number of function evaluations. In the case of the single objective optimizer, this total must be multiplied by five, as the genetic algorithm has to be run five times to obtain points along the Pareto-front. With the same population size and number of generations, the multiple objective genetic algorithm thus needs only one fifth of the function evaluations required by the single objective optimizer.

29.3.3.3. Effect of Population Size on Solution Quality

1. Visual interpretation. In a last phase the effect of the population size on solution quality and spread was investigated. The population size was increased from 500 to 1000 in steps of 250, resulting in the median attainment surfaces of Fig. 29.13. The surfaces for the three population sizes are very similar: they approximate the Pareto-front in the same way and the solutions are evenly spread along the attainment surface. The fact that they approximate the same front indicates that they are very close to the (unknown) Pareto-optimal front.

Fig. 29.13. Median attainment surfaces for population sizes 500, 750 and 1000

2. Performance indices. The spacing and hypervolume measures were determined for the different population sizes. The mean hypervolume is almost the same for the three population sizes, and the spacing along the Pareto-front is also very similar across the different population sizes. For the spacing measure the data is normally distributed but not homoscedastic, and therefore the Kruskal-Wallis procedure is applied. From the test value (p = 0.0 < 0.05) it follows that there is a significant difference. According to a non-parametric post-hoc test, these differences are found between the population size of 500 and the population sizes of 750 and 1000. This is also confirmed by the bootstrapping results (Figs. 29.14(a), 29.14(c), 29.14(e)). The statistical analysis of the hypervolume (One Way ANOVA) shows a significant difference between the population sizes (p = 0.001 < 0.05). According to Tukey's post-hoc test and the bootstrapping procedure (Figs. 29.14(b), 29.14(d), 29.14(f)), this difference lies between the population size of 500 on the one hand and the population sizes of 750 and 1000 on the other. A population size of 750 is thus sufficiently large to solve the harvest scheduling problem.

29.3.3.4. Validity of the Plans

Running NSGA-II with a population size of 750 for 50 generations yields the Pareto-front shown in Fig. 29.15. The maximum present value attained in 50% of the repetitions amounts to €670300 (73.3% of the maximum attainable present value), with a total sum of volume deviations of 398991 m³. For a weight of 0.01 the single objective optimizer reached €667435 (73% of the maximum attainable present value) with a total sum of volume deviations of 21535.5 m³. The median values are similar for the single and multiple objective optimizers. Two plans will be investigated more closely as to their validity: the harvest schedule with the strictest even flow objective and the plan with the best present value. The objective values are listed in Table 29.3. The volume per period, the age distribution and the harvest
(a) Spacing for 500-750
(b) Hypervolume for 500-750
(c) Spacing for 500-1000
(d) Hypervolume for 500-1000
(e) Spacing for 750-1000
(f) Hypervolume for 750-1000
Fig. 29.14. Bootstrapping results for the difference in mean spacing (left column) and mean hypervolume (right column) for population sizes 500, 750 and 1000
pattern in the forest are illustrated in Figs. 29.16, 29.17 and 29.18. From these figures a conclusion similar to that of the single objective case follows: the proposed plans force the age distribution towards a normal age
Fig. 29.15. Overlay of the Pareto-front for a population size of 750 and 50 generations versus the best solutions found by the single objective optimizer

Table 29.3. The present value PV (*€100) and the summed volume deviation Σ_{j=1}^{M} |V_j − V̄| (m³) for the best even flow plan and the best present value plan.

plan                 mean PV   Σ_{j=1}^{M} |V_j − V̄|
best even flow          5878    ~600
best present value      6851   28514

(a)
(b)
Fig. 29.16. The variation in total deviation in volume (m3) between the different cutting periods. From 29.16(a) to 29.16(b), the even flow constraint is strengthened
distribution. If the even flow objective becomes more important, this effect is stronger than when the present value objective is more important. Again the average volume attained with the relaxed even flow objective (6.80 m³/ha/yr) is higher than when the even flow objective becomes
Fig. 29.17. The effect of the even flow objective on the age distribution. From 29.17(a) to 29.17(b), the even flow constraint is strengthened.

Fig. 29.18. The effect of the even flow objective on the harvest pattern. From 29.18(a) to 29.18(b), the even flow constraint is strengthened.
more important (6.56 m3/ha/yr). From the harvest pattern it follows that, in order to obtain a better present value, more stands are scheduled for cutting in the later planning periods than when the even flow objective is important. From the detailed Pareto-front it follows that there is only a very narrow range in which low deviations from the average volume can be obtained. This shows that forest managers need to design their plans very carefully so as to avoid too large deviations.
29.3.4. Conclusion
The encoding strategy is important in terms of approximation of the Pareto-front. The best encoding strategies are gray and integer encoding. As is
suggested in the literature, binary encoding does not perform very well. Using multiple objective genetic algorithms instead of a single objective genetic algorithm has a particular benefit: a single run suffices to find solutions linearly distributed along the Pareto-front. There is an effect when the population size is increased from 500 to 750: the Pareto-optimal front is approximated more closely. This effect is no longer present when the size is increased further to 1000. For both optimizers the effect of the plans on the age structure of the forest is the same: as the even flow objective becomes more and more important, the age structure resembles that of a normal forest due to the volume control, even though this is not explicitly mentioned in the objective functions. If the even flow objective is relaxed, the stands are scheduled in later planning periods than when the even flow objective is very important. Finally, the Pareto-front is very steep, indicating that forest managers have to design their plans carefully to meet their objectives.
29.4. Speeding Up the Optimization Process
29.4.1. Introduction
Finally, fitness inheritance is used to speed up the optimization process for the bi-objective harvest scheduling problem. As this problem is convex, fitness inheritance should be a feasible approach7.
29.4.2. Methodology
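Proportional fitness inheritance can be sketched as follows (a minimal illustration, not the authors' implementation: an offspring's fitness is estimated as the parents' fitness values weighted by the share of genes inherited from each, so the true objective function is only evaluated for part of the population; the function name is ours):

```python
def inherited_fitness(parent_a, parent_b, child, fit_a, fit_b):
    """Estimate the child's fitness as the parents' fitness values weighted
    by the fraction of genes the child shares with each parent. Genes equal
    in both parents count towards parent_a -- a crude but simple proxy."""
    share_a = sum(1 for g, h in zip(child, parent_a) if g == h) / len(child)
    return share_a * fit_a + (1 - share_a) * fit_b
```

For instance, a child inheriting half its genes from each parent receives the plain average of the two parental fitness values.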
The same input data and Forestry Commission production tables are used. The population size is set to 100; the number of generations is 200 without fitness inheritance and 400 with proportional inheritance, so that the same number of function evaluations is maintained. Average inheritance was not tested, as it was shown in Ref. 7 that its behavior is either similar to or worse than that of proportional inheritance. One-point crossover is used with a probability of 0.8 and uniform mutation with a probability of 0.01. Integer encoding is used together with binary tournament selection and the crowding distance operator.
29.4.3. Results and Discussion
From Fig. 29.19 it follows that, after the same number of function evaluations, the attainment surface of the inheritance approach equals that of the non-inheritance approach. This is confirmed by calculating the hypervolume
measure. From a Student t-test it follows that there is no significant difference.
Fig. 29.19. Attainment surfaces for the harvest scheduling problem for non-inheritance and proportional inheritance approaches
29.4.4. Conclusions
The behavior of the inheritance approach is similar to that of the standard genetic algorithm. However, this result should be put into perspective, because in reality the same number of function evaluations is necessary to obtain the same Pareto-front.
Acknowledgements
This work was funded under the Research Fund of Ghent University. The authors would like to thank Dr. Cameron for the data on Kirkhill Forest. They would also like to thank the anonymous reviewer for the useful comments.
References
1. Anonymous. Gemiddelde Prijzen van Hout op Stam. Houthandel en Nijverheid, 5:5, 2000.
2. J. G. Buongiorno and J. K. Gilles. Forest Management and Economics. MacMillan Publishing Company, USA, New York, 1987.
3. L. Davis. Handbook of Genetic Algorithms. Van Nostrand Reinhold, USA, New York, 1991.
4. K. Deb. Multi-Objective Optimization using Evolutionary Algorithms. Wiley and Sons, UK, Chichester, 2001.
5. K. Deb, S. Agrawal, A. Pratab, and T. Meyarivan. A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. IEEE T. Evolut. Comput., 6(2):182-197, 2002.
6. E. I. Ducheyne. Multiple objective forest management using GIS and genetic optimisation techniques. PhD thesis, Ghent University, 2003.
7. E. I. Ducheyne, B. De Baets, and R. R. De Wulf. Is Fitness Inheritance Really Useful for Real-World Applications? Lect. Notes Comput. Sci., 2632:31-43, 2003.
8. A. O. Falcao and J. G. Borges. Designing an Evolution Program for Solving Integer Forest Management Scheduling Models: An Application in Portugal. Forest Sci., 47(2):158-168, 2001.
9. C. M. Fonseca and P. J. Fleming. Genetic Algorithms for Multiobjective Optimization: Formulation, Discussion and Generalization. In Proc. of the Fifth Internat. Conference on Genetic Algorithms, pages 416-423, USA, San Mateo, 1993. Kauffmann Publishers.
10. C. M. Fonseca and P. J. Fleming. On the Performance Assessment and Comparison of Stochastic Multiobjective Optimizers. In Parallel Problem Solving from Nature (PPSN) - IV, pages 584-593, Germany, Berlin, 1996. Springer-Verlag.
11. P. Gong. Multiobjective Dynamic Programming for Forest Resource Management. Forest Ecol. Manag., 48:43-54, 1992.
12. G. J. Hamilton and J. M. Christie. Forest Management Tables. Forestry Commission Booklet No. 34. Her Majesty's Stationery Office, UK, London, 1971.
13. H. M. Hoganson and D. W. Rose. A Simulation Approach for Optimal Harvest Scheduling. Forest Sci., 34(4):526-538, 1994.
14. K. N. Johnson and H. L. Scheurman. Techniques for Prescribing Optimal Timber Harvest and Investments under Different Objectives - Discussion and Synthesis. Forest Sci. Mon., 18:31, 1977.
15. J. D. Knowles and D. W. Corne. Approximating the Nondominated Front using the Pareto Archived Evolution Strategy. Evol. Comput., 8(2):149-172, 2000.
16. C. Lockwood and T. Moore. Harvest Scheduling with Spatial Constraints: A Simulated Annealing Approach. Can. J. Forest Res., 8(2):149-172, 1993.
17. K. B. Matthews, S. Craw, A. R. Sibbald, and I. MacKenzie. Applying Genetic Algorithms to Multi-Objective Land Use Planning. In Proc. of the Genetic and Evolutionary Computation Conference - GECCO 2001, pages 519-526, San Francisco, 2000. Morgan Kauffman.
18. K. B. Matthews, A. R. Sibbald, and S. Craw. Implementation of a Spatial Decision Support System for Rural Land Use Planning: Integrating Geographic Information System and Environmental Models with Search and Optimisation Models. Comput. Electron. Agr., 23:9-26, 1999.
19. A. Osyczka. Evolutionary Algorithms for Single and Multicriteria Design Optimization. Physica-Verlag, USA, New York, 2002.
20. R. C. Purshouse and P. J. Fleming. Why use elitism and sharing in a multiobjective genetic algorithm? In Proc. of the Genetic and Evolutionary Computation Conference - GECCO 2002, pages 520-527, New York, 2002. Morgan Kauffman.
21. P. Tarp and F. Helles. Spatial Optimization by Simulated Annealing and Linear Programming. Scand. J. Forest Res., 12:390-402, 1997.
22. D. A. Van Veldhuizen and G. B. Lamont. Multiobjective Evolutionary Algorithms: Analyzing the State-of-the-Art. Evol. Comput., 8(2):125-147, 2000.
23. D. A. Van Veldhuizen and G. B. Lamont. Multiobjective Optimization with Messy Genetic Algorithms. In Proc. of the 2000 ACM Symposium on Applied Computing, pages 470-476, Italy, 2000. ACM.
CHAPTER 30 USING DIVERSITY TO GUIDE THE SEARCH IN MULTI-OBJECTIVE OPTIMIZATION
J.D. Landa Silva and E.K. Burke
Automated Scheduling, Optimisation and Planning Research Group
School of Computer Science and Information Technology
University of Nottingham, UK
E-mail: [email protected], [email protected]

The overall aim in multi-objective optimization is to aid the decision-making process when tackling multi-criteria optimization problems. In an a posteriori approach, the strategy is to produce a set of non-dominated solutions that represent a good approximation to the Pareto optimal front so that the decision-makers can select the most appropriate solution. In this paper we propose the use of diversity measures to guide the search and hence, to enhance the performance of the multi-objective search algorithm. We propose the use of diversity measures to guide the search in two different ways. First, the diversity in the objective space is used as a helper objective when evaluating candidate solutions. Secondly, the diversity in the solution space is used to choose the most promising strategy to approximate the Pareto optimal front. If the diversity is low, the emphasis is on exploration. If the diversity is high, the emphasis is on exploitation. We carry out our experiments on a two-objective optimization problem, namely space allocation in academic institutions. This is a real-world problem in which the decision-makers want to see a set of alternative diverse solutions in order to compare them and select the most appropriate allocation.

30.1. Introduction

This paper is concerned with the application of the class of approaches known as meta-heuristics to tackle multi-objective optimization problems. We assume that the reader is familiar with the fields of multi-criteria decision-making2,39 and multi-objective optimization7,10. Recent surveys on the application of meta-heuristics to multi-objective optimization problems are those provided by Jones et al.19, Tan et al.42 and Van Veldhuizen and Lamont45. Multi-objective optimization is a very active research area that has received increased attention from the scientific community and from practitioners in the last ten years or so. One main reason for this is that many real-world problems are multi-criteria optimization problems. This means that in these problems, the quality of solutions is measured taking into account several criteria that are in partial or total conflict. Therefore, there is no single global optimum solution but a number of solutions that represent a trade-off between the various criteria. It is also commonly the case that more than one decision-maker is involved in the selection of the most appropriate solution to the multi-criteria problem. The overall aim in multi-objective optimization is then to aid the decision-makers in tackling this type of problem. One strategy for this is to produce a set of solutions that represent a good approximation to the trade-off surface. The decision-makers can then decide which of the solutions in this set is the most adequate for the problem at hand. In general terms, a good approximation set should be as close as possible to the optimal front and it should also give a good coverage of the optimal front. The goal of achieving a good coverage of the trade-off surface, i.e. maintaining the diversity and spread of solutions, is of particular interest in multi-objective optimization. A number of techniques to accomplish this goal have been proposed in the literature, e.g. weighted vectors, clustering or niching methods (fitness sharing, cellular structures, adaptive grids, etc.), restricted mating, relaxed forms of dominance, helper objectives, and objective-driven heuristic selection (hyper-heuristics). Most of these techniques are targeted towards maintaining diversity in the objective space.
However, in some scenarios the decision-makers are also concerned with the diversity of solutions in the solution space. To serve as a useful tool in tackling multi-criteria optimization problems, the multi-objective optimization algorithm should therefore have mechanisms to find the set of solutions that satisfies the requirements of the decision-makers; that is, solutions that are close to the optimal front and have the desired diversity in the objective space, the solution space or both. One goal in this paper is to present an overview of a number of techniques that have been proposed in the literature to maintain a diverse set of solutions when tackling multi-objective optimization problems. Another goal is to describe some mechanisms that we implemented to help a multi-objective search algorithm obtain a diverse set of solutions for a real-world optimization problem with two objectives. These mechanisms consist of using diversity measures, in both the objective space and
the solution space, to guide the search and enhance the performance of the multi-objective search algorithm. We carry out experiments on three test instances of the space allocation problem in academic institutions. In this problem, a set of entities (staff, computer rooms, teaching rooms, etc.) must be allocated into a set of available areas of space or offices, and a number of additional constraints should also be satisfied. In the space allocation problem, the decision-makers are interested in the diversity of solutions in both the objective space and the solution space. The results of our experiments show that the proposed mechanisms help the algorithm to produce a set of compromise solutions that better satisfies the requirements of the decision-makers. The rest of this paper is organized as follows. Section 30.2 discusses the issue of diversity in the context of multi-objective optimization. Section 30.3 gives an overview of some of the mechanisms incorporated into modern multi-objective search algorithms to achieve a good coverage of the trade-off surface. A description of the two-objective space allocation problem and the way in which diversity in the objective space and diversity in the solution space are measured in this problem are the subject of Sec. 30.4. The diversity control mechanisms implemented to guide the search, and the algorithm in which these mechanisms were incorporated, are described in Sec. 30.5. The experiments and results are presented and discussed in Sec. 30.6, while Sec. 30.7 gives a summary of this paper.

30.2. Diversity in Multi-Objective Optimization

Given two solutions x and y for a k-criteria optimization problem, x is said to weakly dominate y if x is as good as y in all the k criteria and better in at least one of them. In the case that x is better than y in all the k criteria, x is said to strictly dominate y. In the following, we refer to weak dominance simply as dominance.
A solution x is said to be non-dominated with respect to a set of solutions S if there is no solution in S that dominates x. The Pareto optimal front, denoted SP, is the set of all non-dominated solutions with respect to the whole set of feasible solutions SF. The goal of a multi-objective search algorithm is then to find a set SND of non-dominated solutions for a given multi-criteria optimization problem. The non-dominated set SND should represent a good approximation to the Pareto optimal front SP. This means that the solutions in SND should be:
• as close as possible to the Pareto optimal front SP,
• widely spread across the entire trade-off surface, and
• uniformly distributed across the entire trade-off surface.
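These dominance relations can be sketched in code as follows (a minimal illustration for a minimization problem; the function names are ours, not the chapter's):

```python
def weakly_dominates(x, y):
    """x weakly dominates y: x is at least as good as y in every criterion
    and strictly better in at least one (minimization assumed)."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))

def strictly_dominates(x, y):
    """x strictly dominates y: x is strictly better in every criterion."""
    return all(a < b for a, b in zip(x, y))

def non_dominated(solutions):
    """Return the subset of objective vectors not dominated by any other."""
    return [x for x in solutions
            if not any(weakly_dominates(y, x) for y in solutions if y is not x)]
```

For example, `non_dominated([(1, 5), (2, 2), (3, 1), (4, 4)])` discards (4, 4), which is dominated by (2, 2), and keeps the other three mutually non-dominated vectors.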
730
J.D. Landa Silva and E.K. Burke
The closeness of SND to the Pareto optimal front SP gives an indication of how good the convergence towards the optimal front is. The spread and distribution of SND give an indication of how good the coverage of the Pareto optimal front SP is. This is illustrated in Fig. 30.1, where various non-dominated sets are depicted for a two-objective minimization problem. Using the notation in Fig. 30.1, it is clear that an effective multi-objective search algorithm should find an approximation set with the characteristics of S1(c+,s+,d+). Moreover, in some real-world scenarios the decision-makers are interested in a set of alternative solutions like those in S1 but, at the same time, they want to see solutions that have a certain diversity with respect to the solution space. This is the case for the problem tackled in this paper, space allocation in academic institutions, as will be explained later. In order to achieve the aim of assisting the decision-making process, a multi-objective search algorithm must therefore also take into account the diversity of SND with respect to the solution space.
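The distribution aspect can be quantified; Schott's spacing metric, shown below, is one common indicator (our choice for illustration, not one prescribed by this chapter):

```python
import math

def spacing(front):
    """Schott's spacing metric: the standard deviation of each point's
    Manhattan distance to its nearest neighbour in objective space.
    Values close to 0 indicate a near-uniform distribution."""
    dists = [min(sum(abs(a - b) for a, b in zip(x, y))
                 for j, y in enumerate(front) if j != i)
             for i, x in enumerate(front)]
    mean = sum(dists) / len(dists)
    return math.sqrt(sum((d - mean) ** 2 for d in dists) / (len(dists) - 1))
```

A perfectly uniform front such as [(0, 4), (1, 3), (2, 2), (3, 1), (4, 0)] yields a spacing of 0, while an unevenly distributed front yields a positive value.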
Fig. 30.1. The quality of a non-dominated set is given by the closeness to the Pareto optimal front (c+ is close, c- is far), the spread of solutions (s+ is good spread, s- is poor spread) and the distribution of solutions (d+ is good distribution, d- is poor distribution). The quality of the non-dominated sets in this figure can thus be described as follows: S1(c+,s+,d+), S2(c-,s+,d+), S3(c+,s+,d-), S4(c-,s+,d-), S5(c+,s-,d+), S6(c-,s-,d+), S7(c+,s-,d-), and S8(c-,s-,d-).
30.3. Maintaining Diversity in Multi-Objective Optimization The majority of meta-heuristics proposed for multi-objective optimization incorporate a specialized mechanism to help achieving a good diversity with
Using Diversity to Guide the Search in Multi-Objective Optimization
731
respect to the objective space. As was pointed out by Laumanns et al.28, this is not a straightforward task, because many algorithms that implement specific mechanisms to maintain diversity suffer from deterioration, which affects their convergence ability. This section gives an overview of a number of strategies that have been proposed in the literature to maintain diversity in multi-objective optimization. For more references to multi-objective optimization algorithms that incorporate mechanisms for diversification not discussed here, see the survey by Tan et al.42 and also the books by Coello Coello et al.7 and Deb10.

30.3.1. Weighted Vectors
One of the first techniques proposed to achieve a better diversity of solutions in multi-objective optimization is the use of weighted vectors to specify the search direction and hence aim at a better coverage of the trade-off surface. This method consists of setting a vector of k weights W = [w1, w2, ..., wk], where 0 ≤ wi ≤ 1, k is the number of objectives in the problem, and the sum of all wi equals 1. The fitness of a solution x is calculated as f(x) = w1 f1(x) + w2 f2(x) + ... + wk fk(x), where fi(x) measures the quality of x with respect to the ith criterion. The strategy is to systematically generate a set of vectors in order to approach the trade-off surface from all directions. Weighted vectors is a popular technique that has been used in a number of algorithms, like the multi-objective cellular genetic algorithm of Murata et al.35 and the multi-objective simulated annealing algorithm of Ulungu et al.43. Another approach that uses weighted vectors to encourage diversity is the Pareto simulated annealing algorithm of Czyzak and Jaszkiewicz8. Their strategy is to modify the weights for a solution x so that x is moved away from its closest neighbor xcn, by increasing the weights of those objectives in which x is better than xcn and decreasing the weights of those objectives in which x is worse than xcn. In another approach, implemented by Gandibleux et al.14, the set of supported solutions is first computed. Then, the information obtained from these solutions is used to guide the search and improve the performance of a population heuristic. Ishibuchi et al.15 used weight vectors in a different way to encourage diversity. Instead of generating a weighted vector to specify a search direction for a solution, they choose an appropriate solution for a randomly specified weight vector. The selection of the solution for a given vector is based on the position of the solution in the objective space.
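The weighted-sum fitness described above can be sketched as follows (a minimal illustration; the systematic weight-vector generation shown is a simple sketch for two objectives, not the exact scheme of any cited algorithm):

```python
def weighted_fitness(x, weights, objectives):
    """f(x) = w1*f1(x) + w2*f2(x) + ... + wk*fk(x),
    with the weights non-negative and summing to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * f(x) for w, f in zip(weights, objectives))

# A systematically generated set of weight vectors for k = 2 objectives,
# approaching the trade-off surface from several directions:
vectors = [(i / 4, 1 - i / 4) for i in range(5)]  # (0,1), (0.25,0.75), ..., (1,0)
```

Each vector turns the multi-objective problem into a scalar one; sweeping over the vectors probes different regions of the trade-off surface.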
That is, they attempt to set an appropriate search direction for each new solution
in order to achieve a better approximation set.

30.3.2. Fitness Sharing

In this mechanism the idea is to decrease the fitness of individuals located in crowded regions in order to favor the proliferation of solutions in sparse regions. Usually, the fitness of an individual is reduced if the distance to its closest neighbor is smaller than a predefined value. Fitness sharing can be implemented in the objective space or in the solution space; however, most of the implementations of fitness sharing reported in the literature are in the objective space. For example, Zhu and Leung47 implemented fitness sharing in their asynchronous self-adjustable island genetic algorithm. Talbi et al.41 implemented fitness sharing mechanisms in both the objective space and the solution space. In their experiments, they observed that fitness sharing in the objective space appears to have a stronger influence on the search than fitness sharing in the solution space, but they also noted that the combination of both fitness sharing mechanisms improved the search.

30.3.3. Crowding/Clustering Methods
These methods attempt to control the number of solutions in each region of the trade-off surface. The general idea here is to limit the proliferation of solutions in crowded or over-populated areas and, at the same time, to encourage the proliferation of solutions in sparse or under-populated areas. An example of this type of mechanism is the adaptive grid implemented by Knowles and Corne22 in their Pareto archived evolution strategy. They divide the k-objective space into 2^(lk) regions, where l is the number of bisections in each of the k dimensions. Then, based on the crowdedness of the region in which the new solution lies, a heuristic procedure is used to decide if the new solution is accepted or not. Lu and Yen29,30 used a modified version of the adaptive grid of Knowles and Corne. In their algorithm they modify the fitness of solutions based on the density value of the population. They also associate an age indicator to each solution x in the population in order to control its life span. An agent-based crowding mechanism was proposed by Socha and Kisiel-Dorohinicki40, in which agents interact in order to encourage the elimination of too similar solutions or agents. Each agent in the population contains an amount of energy, and the crowding mechanism seeks to maintain a uniform agent distribution along the trade-off surface and to prevent agent clustering in particular areas by discouraging agents from
creating groups of similar solutions. In their mechanism, an agent A communicates with another agent B and the solutions of both agents, xA and xB respectively, are compared. If the similarity between xA and xB (measured with a distance metric) is smaller than a predefined value, an amount of energy is transferred from agent A to agent B. The amount of energy transferred depends on the degree of similarity between xA and xB. This is similar to fitness sharing, but here one agent receives and the other provides.

30.3.4. Restricted Mating
Restricted mating is a mechanism that prevents the recombination of individuals that do not satisfy a predefined criterion. Most of the time, this criterion is that mating individuals should not be too close to each other in the objective space or in the solution space. In this sense, restricted mating can be regarded as a mechanism similar to crowding/clustering. An example of restricted mating is the strategy implemented by Kumar and Rockett21 in their Pareto converging genetic algorithm. That algorithm is an island-based approach in which the genetic operations are restricted to individuals within the same island. There is no migration between islands and no cross-fertilization between individuals in two different islands. However, two islands can be merged into one island in order to test convergence during the search. Their algorithm is a steady-state approach that produces only one offspring in each iteration. Kumar and Rockett argue that the steady-state nature of the algorithm helps maintain diversity because genetic drift, which is inherent in generational genetic algorithms, is less likely to occur. Other examples of restricted mating mechanisms are found in the approaches implemented by Lu and Yen29,30 and in the cellular genetic algorithm of Murata et al.35.

30.3.5. Relaxed Forms of Dominance

Another strategy to encourage diversity that has been explored recently by several researchers is to use relaxed forms of the dominance relation to assess the fitness of individuals. As described in Sec. 30.2, in the standard dominance relation a solution x is considered better than another solution y only if x is not worse than y in all the objectives and x is better than y in at least one of the objectives. In the relaxed forms of dominance, the basic idea is to consider a solution x as better than a solution y even if x is worse than y in some objective(s). Usually, the condition is that such
deterioration must be compensated by a good improvement in the value of other objective(s). The idea is that by using relaxed forms of dominance, the algorithm will be capable of exploring more solutions and hence maintaining a better diversity. For example, Laumanns et al.28 proposed the use of ε-dominance to implement archiving/selection strategies that make it possible to achieve a better convergence and distribution of the non-dominated approximation set. Burke and Landa Silva3 used a variant of α-dominance, which is also a relaxed form of dominance, to improve the convergence ability of two multi-objective search algorithms. Mostaghim and Teich34 compared the performance of a multi-objective optimization algorithm when using a clustering technique and when using the ε-dominance method. They observed in their experiments that using ε-dominance to update the archive of non-dominated solutions was beneficial, because it helped to reduce the computation time and to achieve a better convergence with comparable diversity. Another interesting aspect of using relaxed forms of the dominance relation is that they can help to identify those solutions that are more attractive to the decision-makers out of the set of solutions in the trade-off surface, which can be of considerable size. As was pointed out by Farina and Amato13, the number of solutions that can be considered equal or incomparable (based on standard dominance) to the current solution increases considerably with the number of objectives. They developed the notion of k-dominance, in which the number of incomparable or equal objectives and the normalized size of the improvement achieved in the other objectives are also taken into consideration. In k-dominance, v1 k-dominates v2 if and only if:

ne < M   and   nb ≥ (M - ne) / (k + 1),   where 0 ≤ k ≤ 1.

In the above, M is the number of objectives, nb is the number of objectives in which v1 is better than v2, and ne is the number of objectives in which v1 and v2 are equal. Farina and Amato also extended k-dominance by evaluating nb and ne in a fuzzy rather than a crisp way, introducing a tolerance on the ith objective, that is, the interval within which an improvement on objective i is meaningless. Jin and Wong17 also investigated archiving techniques based on their own concept of relaxed dominance. The main feature of their archiving mechanism is that it adapts according to the solutions that have been found. It also includes the concept of hyper-rectangles to enclose the search space even considering unseen solutions. This gives their technique the advantage of not requiring prior knowledge
of the objective space (objective values).

30.3.6. Helper Objectives

The specification of helper objectives is a strategy that has been used to aid the search not only in multi-objective optimization but also in single-objective optimization. For example, this mechanism can be used to handle constraints by treating each constraint as an additional objective to be optimised. In single-objective optimization, the aim of helper objectives is to help maintain diversity and escape from local optima. For example, Jensen16 and Knowles et al.20 proposed the 'multi-objectivization' of single-objective optimization problems, which involves decomposing the single-objective problem into subcomponents by considering multiple objectives. In this way, 'multi-objectivization' can help to remove local optima, because for the search process to be stuck it is required that all objectives are stuck. The helper objectives should be chosen so that they are in conflict with the main objective, at least partially.

30.3.7. Objective-Oriented Heuristic Selection
Another idea that has been proposed to help maintain diversity in multi-objective optimization is to adapt the local search strategy according to the current distribution of solutions in the objective and/or the solution space. For example, Knowles and Corne23 proposed to adapt the focus of the search on exploration or exploitation when approximating the Pareto front by selecting the most adequate of three search strategies: 1) use a population-based method that tries to improve in all objectives at once in order to approach the Pareto front from all directions, 2) generate a weighted vector which is used to specify a specific search direction, or 3) use a single-solution local search method that tries to move along the Pareto front by perturbing one solution to obtain a nearby point on the front. The selected strategy depends on the correlation between distance in the solution space and distance in the objective space. This strategy was also investigated by Jin and Sendhoff18 for some continuous test problems. Adapting the local search heuristic according to the value of the objectives in the solutions has also been proposed as a mechanism to maintain diversity while converging to the Pareto front. For example, Burke et al. implemented an approach that has been termed a 'hyper-heuristic'. The idea is to use a guiding/learning method that chooses the most promising heuristic in order to push solutions towards the desired area in the objectives of
interest. This technique takes into consideration the localization of the solution in the objective space and the ability of each local search heuristic to achieve improvements on each objective. The idea is to try to improve poor objectives while maintaining the rich ones. Adapting the local search heuristic is interesting when using hybrid approaches that use local search in an efficient way. Then, the analysis or pre-sampling of the fitness landscape can be useful to design a good hybrid32.

30.3.8. Using Diversity to Guide the Search

Various evolutionary algorithms for multi-objective optimization use estimators of density in the objective space to bias the selection operator in order to maintain diversity in the population. Laumanns et al.27 noted that the accuracy of the density estimator has a strong effect on the performance of the selection strategy; hence, the density estimator must be good for the diversity maintenance strategy to be effective. Also, Knowles et al.25 proposed a bounded archiving technique that attempts to maximize the hypervolume covered by the approximation set. They compared the performance of their archiving technique against other methods and obtained promising results. However, they pointed out that the computational cost is considerable for more than three objectives. In single-objective optimization, some researchers have also made efforts towards designing evolutionary algorithms that maintain diversity in an adaptive fashion by using diversity measures to guide the search. For example, Ursem44 proposed a diversity-guided evolutionary algorithm that alternates between phases of exploration and exploitation according to a measure of the diversity in the population, given by the average distance to the population's average point. If the diversity is below a threshold dlow, the algorithm applies mutation in an exploration mode. If the diversity is above a threshold dhigh, the algorithm uses selection and recombination in an exploitation mode.
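This kind of diversity-triggered mode switching can be sketched as follows (a minimal illustration, not Ursem's implementation; the thresholds are placeholders, and the diversity measure is the average Euclidean distance to the population's average point, with low diversity triggering exploration and high diversity triggering exploitation):

```python
import math

def diversity(population):
    """Distance-to-average-point measure: the mean Euclidean distance
    from each individual to the population's average point."""
    k = len(population[0])
    avg = [sum(ind[i] for ind in population) / len(population) for i in range(k)]
    return sum(math.dist(ind, avg) for ind in population) / len(population)

def select_mode(population, d_low=0.05, d_high=0.25):
    """Low diversity -> exploration (mutation-driven);
    high diversity -> exploitation (selection and recombination).
    The threshold values here are purely illustrative."""
    d = diversity(population)
    if d < d_low:
        return "exploration"
    if d > d_high:
        return "exploitation"
    return "current"  # in between, keep the current mode
```

A collapsed population (all individuals identical) yields zero diversity and triggers exploration, while a widely scattered population triggers exploitation.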
Another approach that uses diversity measures to guide the search is the diversity-control-oriented genetic algorithm of Shimodaira38, in which the probability that an individual survives depends on the Hamming distance between the individual and the best individual in the population.

30.4. The Two-Objective Space Allocation Problem

The management of physical space in universities is an important and difficult issue, as discussed by Burke and Varley6. With the continuous increase in the number of students and staff, it must be ensured that the
Using Diversity to Guide the Search in Multi-Objective Optimization
available estate is used as efficiently as possible while simultaneously satisfying a considerable number of constraints. The allocation of office space to staff, postgraduate students, teaching rooms, computer rooms, etc. is carried out manually in most universities. This is a process that takes a considerable amount of time and effort from the space administrators. More importantly, this manual distribution usually provokes an inefficient utilization of the available estate.

30.4.1. Problem Description
The space allocation problem can be briefly described as follows. Given a set of n entities (people, teaching rooms, computer rooms, etc.) and a set of m available rooms, the problem is to allocate all the n entities into the m rooms in such a way that the office space is used as efficiently as possible and the additional constraints are satisfied. Each entity requires a certain amount of space^a according to university regulations and each room has a given capacity. It is very unlikely that the capacity of a room matches exactly the amount of space required by the entities allocated to the room. Let c_i be the capacity of the ith room and let s_i be the space required by all the entities allocated to the room. Then, if c_i > s_i, space is said to be wasted, while if c_i < s_i, space is said to be overused. It is less desirable to overuse space than to waste it. The overall space utilization efficiency is measured by the amount of space that is being misused, i.e. space wasted plus space overused over all rooms (space misuse is represented by F1). In addition to this, space administrators should ensure that certain constraints are satisfied. Some constraints are hard, i.e. they must be satisfied, while other constraints are soft, i.e. their violation should be minimized. The number of different types of constraints varies considerably between problem instances but in general, the constraints limit the ways in which the entities can be allocated to rooms. For example, two professors must not share a room, a computer room should be allocated on the ground floor and adjacent to a seminar room, teaching rooms must be away from noisy areas, postgraduate students in the same research group should be grouped together, etc. The penalty applied when a constraint is violated depends on the type of constraint and it may also vary from one problem instance to another (soft constraints violation is represented by F2).
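As a minimal illustration of the soft-constraint penalty F2, each constraint can be represented as a pair of a predicate over the allocation and a penalty value (this encoding and the function name are our assumptions, not the authors' implementation):

```python
def soft_constraints_penalty(solution, soft_constraints):
    """F2: total penalty of the violated soft constraints.

    soft_constraints is a list of (predicate, penalty) pairs; a
    predicate returns True when the allocation satisfies it.
    """
    return sum(pen for pred, pen in soft_constraints if not pred(solution))
```

For example, "entities 0 and 1 must not share a room" becomes the predicate `lambda s: s[0] != s[1]` with its associated penalty.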
A solution or allocation is represented here by a vector Π = [π1, π2, ..., πn] where each πj ∈ {1, 2, ..., m} for j = 1, 2, ..., n

^a Note that here, space is the floor area, usually measured in m².
indicates the room to which the jth entity has been allocated. In a multi-criteria optimization problem, the criteria can be conflicting, harmonious or independent, and this has an influence on the difficulty of achieving a good approximation to the Pareto front, as discussed by Purshouse and Fleming37. The existence of conflicting criteria makes it more difficult to achieve good convergence. If the criteria are harmonious, convergence is not affected but achieving good diversity may be more difficult because it is very probable that solutions will have similar values in the harmonious criteria. If the criteria are independent, it is possible to decompose the problem and then use a divide-and-conquer strategy to solve it. An investigation into the conflicting nature of the criteria in the space allocation problem was carried out by Landa Silva26. In that investigation it was found that in general, the minimization of space wastage is not in conflict with the minimization of space overuse, and that the satisfaction of different types of soft constraints is not in conflict with each other. However, it was also found that the minimization of space misuse (overuse and wastage) is in strong conflict with the minimization of soft constraints violation. Therefore, we consider two objectives in the space allocation problem: (1) minimization of space misuse, i.e. minimization of F1; (2) minimization of soft constraints violation, i.e. minimization of F2. In this problem, space administrators often know of additional constraints which are not (or, for political reasons, cannot be) explicitly built into the objectives. For example, when two members of staff have a personality clash and cannot be allocated in the same room. Another common example is when people have a preference for certain types of rooms.
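The misuse objective F1 and the representation Π translate into a short sketch (illustrative names; rooms are 0-indexed here for convenience, and wastage and overuse are simply summed, although the text notes that overuse is less desirable and could be weighted more heavily):

```python
def space_misuse(capacity, space_req, pi):
    """F1: space wasted plus space overused over all rooms.

    capacity[i]  - capacity c_i of room i
    space_req[j] - space required by entity j
    pi[j]        - room to which entity j is allocated (vector Pi)
    """
    used = [0.0] * len(capacity)
    for j, room in enumerate(pi):
        used[room] += space_req[j]
    # |c_i - s_i| contributes wastage when c_i > s_i, overuse when c_i < s_i
    return sum(abs(c - s) for c, s in zip(capacity, used))
```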
Therefore, in this context, the aim of a multi-objective optimization algorithm is to aid the space administrators by finding a set of alternative high-quality allocations. Space administrators usually want to see a set of allocations which are very similar in certain aspects while being very different in other aspects. For example, administrators may want to see two or more alternative solutions in which the teaching areas are allocated to the same rooms in each of the allocations but with different ways of distributing offices to people. Another example is when the allocation needs to be re-organized and the space administrators want to explore alternative non-dominated solutions that are very similar to the existing distribution in order to avoid major disruptions. Then, in the space allocation problem it is important to take into consideration the diversity of the set of allocations with respect
to the solution space. Besides its practical interest, the space allocation problem as described here is of scientific importance because it can be formulated as a variant of the multiple knapsack problem, which is an important problem in combinatorial optimization (see Dawande et al.9 and Martello and Toth31).

30.4.2. Measuring Diversity of Non-Dominated Sets
There are various papers in the literature that propose, compare and discuss indicators to assess the performance of multi-objective optimization algorithms. These include those by Knowles and Corne24, Ang et al.1, Farhang-Mehr and Azarm12, Tan et al.42, Okabe et al.36 and others. Assessing the diversity (in the solution space or in the objective space) of a non-dominated set is a difficult task because, as discussed in Sec. 30.2, the diversity should be measured in terms of the distribution and the spread of solutions in the set. Some of the indicators proposed in the literature seek to evaluate the quality of the spread and the distribution of solutions. For example, the S metric of Zitzler and Thiele48 calculates the hypervolume of the k-dimensional region covered by the approximation set. But a reference point must be given in order to compute the hypervolume, and the location of this reference point may have an influence on how two or more non-dominated sets compare. Deb et al.10 proposed a spacing metric designed to measure how evenly points are distributed. That metric is based on computing the Euclidean distance between each pair of non-dominated solutions and it also requires the boundary solutions. Another spacing metric which is also based on the Euclidean distance between pairs of non-dominated solutions is the one described by Van Veldhuizen and Lamont46. Other metrics that have been proposed to estimate the diversity of a population of solutions are based on entropy, as proposed by Farhang-Mehr and Azarm11. These metrics require the division of the objective space into a cellular structure. A high entropy value indicates a better distribution of solutions across the trade-off surface because it measures the flatness of the distribution of solutions or points. In this paper, diversity in the objective space is measured using a population metric proposed by Morrison and De Jong33.
We have selected this metric because it does not require reference solutions and it is also related to the Hamming and Euclidean distances between solutions. The metric by Morrison and De Jong is inspired by concepts of mechanical engineering, specifically the moment of inertia, which measures the mass distribution of
an object. The centroid of a set of p points in a k-dimensional space has coordinates given by eq. D.1, where x_{ij} is the value of the ith dimension in the jth point. The measure of diversity for the population of p points, based on their moment of inertia, is given by eq. D.2. The higher the value of I, the higher the diversity of the set of p points.

$$c_i = \frac{\sum_{j=1}^{p} x_{ij}}{p} \quad \text{for } i = 1, 2, \ldots, k \qquad (D.1)$$

$$I = \sum_{i=1}^{k} \sum_{j=1}^{p} (x_{ij} - c_i)^2 \qquad (D.2)$$
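Eqs. D.1 and D.2 translate directly into code (a minimal sketch; the function name is ours):

```python
def moment_of_inertia(points):
    """Diversity I (eq. D.2) of p points in k-dimensional objective
    space: sum of squared coordinate deviations from the centroid."""
    p, k = len(points), len(points[0])
    centroid = [sum(pt[i] for pt in points) / p for i in range(k)]   # eq. D.1
    return sum((pt[i] - centroid[i]) ** 2
               for pt in points for i in range(k))                   # eq. D.2
```

A tight cluster of points yields a value near zero; spreading the points out over the trade-off surface increases I.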
To measure diversity in the solution space, the metric used should provide a meaningful way to express the similarity between solutions for the problem at hand. Therefore, we have designed a specific way of measuring diversity in the solution space for the space allocation problem. Equation D.3 gives the percentage of non-similarity or variety used here as a measure of diversity for a set of allocations, where D(j) is the number of different values in the jth position over all the p vectors representing the solutions. Figure 30.2 illustrates how the percentage of variety is calculated for a set of p = 5 allocations.

$$V = \frac{\sum_{j=1}^{n} \frac{D(j) - 1}{p - 1}}{n} \times 100 \qquad (D.3)$$
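Eq. D.3 can be sketched directly in code (the function name is ours; allocations are given as sequences over the room identifiers):

```python
def variety(allocations):
    """Percentage of variety V (eq. D.3) for a set of p allocation
    vectors of length n; D(j) counts the distinct values appearing in
    position j across the set."""
    p, n = len(allocations), len(allocations[0])
    total = sum((len({a[j] for a in allocations}) - 1) / (p - 1)
                for j in range(n))
    return total / n * 100
```

Applied to the p = 5 strings of Fig. 30.2, this yields (3.25/7) × 100 ≈ 46.42%.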
30.5. Using Diversity to Guide the Search

In this section we describe the strategies that we implemented in order to obtain approximation sets that better satisfy the requirements of the decision makers in the space allocation problem. The diversity indicators I (eq. D.2) and V (eq. D.3) described above are used to guide the search and find sets of non-dominated solutions that are diverse with respect to both the solution space and the objective space.
30.5.1. Diversity as a Helper Objective

We use the diversity in the objective space as a helper objective in order to decide when a candidate solution is considered attractive. Let P be a population of solutions from which a solution x is used to generate a candidate solution x'. Then, I (eq. D.2) indicates the diversity of the set P
Five strings representing allocations (p = 5 strings, n = 7 entities, rooms A-E):

    position j:        1     2     3     4     5     6     7
    string 1:          A     A     A     A     A     A     A
    string 2:          A     A     B     B     A     B     B
    string 3:          A     B     B     C     B     C     C
    string 4:          A     B     B     C     B     D     D
    string 5:          A     B     B     C     C     D     E
    D(j):              1     2     2     3     3     4     5
    (D(j)-1)/(p-1):    0.00  0.25  0.25  0.50  0.50  0.75  1.00

    V = (3.25/7) × 100 = 46.42%

Fig. 30.2. Calculation of the percentage of variety V for a set of p = 5 allocations. The number of entities is n = 7 and the number of rooms is m = 5.
while I' indicates the diversity of the set P' in which x is replaced by x'. We use the expression u dominates(c1, c2, ...) v to indicate that the criteria c1, c2, ... are used to determine dominance between the vectors u and v. Then, a candidate solution x' is considered attractive if x' dominates(FT, I) x, where FT = F1 + F2. That is, x' is considered better than x if FT(x') < FT(x) and I' ≥ I, or if FT(x') = FT(x) and I' > I. Note that we use the aggregated value FT instead of the individual criteria F1 and F2. This is because in our previous research we observed that the aggregation method was more beneficial than the dominance relation^b for the overall performance of our algorithm over the full set of instances (see Burke and Landa Silva4). Then, a candidate solution is accepted if it has better fitness (FT) without worsening the diversity in the objective space (I), or if it has the same fitness value but improves the diversity in the objective space.

^b We also found that using relaxed forms of dominance (see Sec. 30.3.5) seems to improve the performance of our algorithm, but only in some problem instances.

30.5.2. Diversity to Control Exploration and Exploitation

We use the diversity measure in the solution space to alternate between the phases of exploration and exploitation in our algorithm. This is similar to the strategy implemented by Ursem44 in single-objective optimization. As discussed above, the measure V (eq. D.3) is an indication of how diverse a set of allocations is considered by the space administrators. The value of V(PND) is used to control the algorithm's search strategy, where PND is the current set of non-dominated solutions. First, two threshold values are set: Vgood is the diversity that is considered 'good' in the obtained set of non-dominated solutions and Vmin is the minimum diversity that is accepted in the obtained set of non-dominated solutions. Then, when V(PND) > Vgood the algorithm is in exploitation mode and when V(PND) < Vmin the algorithm enters exploration mode. In exploitation mode, the algorithm attempts to find better solutions by using local search only. In exploration mode, the algorithm uses local search and a specialized mutation operator in order to increase the diversity V(PND) of the current set of non-dominated solutions. Based on our previous experience26 with the space allocation problem, we set Vgood = 70% and Vmin = 30% in our experiments.

30.5.3. The Population-Based Hybrid Annealing Algorithm
Our algorithm is a population-based approach in which each individual is evolved by means of local search and a specialized mutation operator. The algorithm is shown in pseudocode 1 and is a modified version of our previous approach described elsewhere4. The modification consists of adding the mechanisms described above to guide the search based on the diversity measures. The population PC contains the current solution for each individual. The population PB contains the best solution (in terms of FT) found by each individual so far. The population PND is the external archive of non-dominated solutions. A common annealing schedule is used to control the evolution process of the whole population by means of the global acceptance probability p (steps 6.3 and 6.4). The local search heuristic HLS selects the type of move from relocate, swap, and interchange if all the n entities are allocated. If there are unallocated entities (this occurs when the specialized mutation operator is applied as described below), then HLS employs the allocate move. Relocate moves an entity from one area to another, swap exchanges the assigned areas between two entities, interchange exchanges all the allocated entities between two areas, and allocate finds a suitable area to allocate an unallocated entity. The local search heuristic HLS incorporates a cooperation mechanism to encourage information sharing between individuals in the population. This cooperation mechanism maintains two matrices MT and MA of size n × m in which the cell (j, i) indicates the allocation of the jth entity to the ith area. MT stores pairs (entity, area) that are considered tabu for a number of iterations while MA stores those that are considered attractive during the search.
Pseudocode 1. The Population-Based Hybrid Annealing Algorithm.

1. Generate the initial current population of solutions PC
2. Copy PC to the population of best solutions PB
3. Set acceptance probability p ← 0, cooling factor 0 < α < 1, decrement step η, re-heating step ψ, and re-heating counter τ ← 0 (η, ψ and τ are numbers of iterations)
4. For η iterations, apply the local search heuristic HLS to each individual in PC
5. Set p ← 1, mode ← exploitation
6. For each XC in PC [...]
   a) if [...] a randomly generated number in [0, 1] is smaller than p, then XC ← X'C
   b) if p ≈ 0 (in our setting, if p < 0.0001), then τ ← τ + 1, and if τ ≥ [...]
7. If V(PND) < Vmin, then mode ← exploration
8. If V(PND) > Vgood, then mode ← exploitation
9. If mode = exploration, then apply the specialized mutation operator to each individual in PC
10. If the stopping criterion has not been satisfied, go to Step 6
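The two diversity-driven decisions in the algorithm, the acceptance test described in Sec. 30.5.1 and the mode switch of steps 7-9, can be sketched as follows (an illustrative sketch; the function names and the string mode values are our assumptions):

```python
V_GOOD, V_MIN = 70.0, 30.0   # thresholds from Sec. 30.5.2 (percent)

def attractive(ft_old, ft_new, i_old, i_new):
    """x' dominates(FT, I) x: better aggregated fitness without
    worsening objective-space diversity, or equal fitness with
    strictly higher diversity."""
    return (ft_new < ft_old and i_new >= i_old) or \
           (ft_new == ft_old and i_new > i_old)

def update_mode(v_pnd, mode):
    """Steps 7-8: choose exploration when the solution-space diversity
    V(PND) is too low, exploitation when it is good."""
    if v_pnd < V_MIN:
        return "exploration"    # mutation in addition to local search
    if v_pnd > V_GOOD:
        return "exploitation"   # local search only
    return mode                 # between thresholds: keep current mode
```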
When a move produces a detriment in the fitness of the solution, MT is updated as MT(j, i) = iterations + tenure, which indicates that moves involving that pair are considered tabu for tenure ≈ n iterations. When a move produces an improvement in the fitness of the solution, MA is updated as MA(j, i) = MA(j, i) + 1, indicating that the higher the value of the cell, the more attractive the moves involving that pair are considered. The purpose of the specialized mutation operator is to disturb solutions in a controlled way in order to promote exploration. This operator unallocates a maximum of n/5 entities from their assigned area of space. The entities to be unallocated are selected in decreasing order of their associated penalty (violation of the soft constraints associated to the entity). The entities that are unallocated by the mutation operator are re-allocated by the heuristic HLS.
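The matrix updates and the specialized mutation operator can be sketched as follows (illustrative names; the use of None to mark an unallocated entity is our convention):

```python
def punish_move(m_t, j, i, iteration, tenure):
    """A move worsened fitness: pair (entity j, area i) becomes tabu
    until iteration + tenure (the text uses tenure of roughly n)."""
    m_t[j][i] = iteration + tenure

def reward_move(m_a, j, i):
    """A move improved fitness: pair (entity j, area i) becomes more
    attractive; higher cell values mean more attractive moves."""
    m_a[j][i] += 1

def specialized_mutation(allocation, penalty):
    """Unallocate at most n/5 entities, chosen in decreasing order of
    their soft-constraint penalty; HLS later re-allocates them."""
    n = len(allocation)
    worst = sorted(range(n), key=lambda j: penalty[j],
                   reverse=True)[:max(1, n // 5)]
    mutated = list(allocation)
    for j in worst:
        mutated[j] = None      # None marks an unallocated entity
    return mutated
```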
In the algorithm presented in pseudocode 1, the diversity I is used as a helper objective in steps 6.2 and 6.3, while the diversity V(PND) is used to guide the search in steps 7-9. In our previous approach4, the preference of the candidate solution X'C over XC and XB in steps 6.2 and 6.3 is based solely on the value of FT. The other difference in our previous implementation is that the specialized mutation operator (steps 7-9) is applied when no individual in PB has achieved an improvement for η iterations, instead of being controlled by the diversity in the solution space as proposed here.

30.6. Experiments and Results

The purpose of our experiments was to investigate whether the mechanisms described above to guide the search based on the diversity measures I (eq. D.2) and V (eq. D.3) help our algorithm to find better sets of non-dominated solutions. Here, we are interested in finding sets of non-dominated allocations that have a good spread and distribution in the objective space but also have high diversity in the solution space. We compared the performance of the algorithm presented in pseudocode 1 to our previous implementation using the same real-world data sets nott1, nott1b and trent1 described in that paper4 (these test instances are available from http://www.cs.nott.ac.uk/~jds/research/spacedata.html).

30.6.1. Experimental Setting
For each test instance, we executed 10 runs of our algorithm described in pseudocode 1 and 10 runs of the previous implementation. In each pair of runs, the same initial set of solutions was used for the two algorithms. In each run, the stopping criterion was a maximum number of solution evaluations, set to 100000, 80000 and 50000 for nott1, nott1b and trent1 respectively. The parameters for the algorithm were set as in our previous paper4: |PC| = |PB| = 20, α = 0.95 and η = n. To compare the two algorithms, we used the diversity in the objective space I (eq. D.2), the diversity in the solution
space V (eq. D.3), the number of non-dominated solutions found |PND| and the C metric of Zitzler et al.49, which is given by eq. F.1. If C(A, B) = 1, all solutions in set B are dominated by at least one solution in set A. If C(A, B) = 0, no solution in set B is dominated by a solution in set A. We used the C metric because it directly compares the quality of two non-dominated sets, it is simple to compute and it does not require knowledge of the Pareto optimal front.

$$C(A, B) = \frac{|\{\, b \in B : \exists\, a \in A,\ a \text{ dominates } b \,\}|}{|B|} \qquad (F.1)$$
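Eq. F.1 in code, for a minimisation problem (a sketch; the function names are ours, and we use weak dominance, i.e. no worse in every objective):

```python
def c_metric(a_set, b_set):
    """Coverage C(A, B): fraction of the points in B that are weakly
    dominated by at least one point in A (minimisation assumed)."""
    def weakly_dominates(u, v):
        return all(ui <= vi for ui, vi in zip(u, v))
    covered = sum(1 for b in b_set
                  if any(weakly_dominates(a, b) for a in a_set))
    return covered / len(b_set)
```

Note that C is not symmetric, which is why the tables below report both C(A, B) and C(B, A).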
We carried out our experiments on a PC with a 3.0GHz processor, 768MB of memory and running Windows XP. The algorithms were coded in MS Visual C++ version 6.0.

30.6.2. Discussion of Obtained Results

The results of the experiments described above are shown in tables 30.124 to 30.126. Each table presents the results obtained for one test instance. DGPBAA refers to the implementation described in pseudocode 1 (with the diversity control mechanisms) and PBAA refers to the previous version (without the diversity control mechanisms). The values in the columns I, V are computed for the set PND. For the values in the column C(A, B), A represents the non-dominated set obtained by DGPBAA and B represents the non-dominated set obtained by PBAA. It can be observed that the use of the diversity control mechanisms helps to improve the performance of the search algorithm. For example, for the nott1 instance we can see in table 30.124 that in each of the 10 runs the non-dominated set obtained by DGPBAA is better than the non-dominated set obtained by PBAA. That is, the approximation sets obtained when the diversity measures are used to guide the search have higher diversity in the objective space (I), higher diversity in the solution space (V), more non-dominated solutions (size) and also compare (slightly) better when using the C metric. Similar observations can be made for the test problems nott1b and trent1 in tables 30.125 and 30.126 respectively. It is important to highlight that in every single run, the diversity in the solution space of the non-dominated set obtained when using the diversity control mechanisms is greater than Vgood. On the contrary, when the mechanisms to control diversity are not used, the diversity in the solution space of the
obtained non-dominated set is below Vgood, except for a few runs in the test instance trent1 as shown in table 30.126. When using the C metric it is not clear whether the DGPBAA implementation finds better non-dominated sets. However, we should emphasize that the main contribution of the implemented mechanisms appears to be that they help the algorithm to maintain diversity in both the objective and the solution space. This is precisely the aim in the space allocation problem tackled here: to provide a set of non-dominated solutions that better satisfies the requirements of the space administrators.

Table 30.124. Results for the test instance nott1.

                   DGPBAA                        PBAA
    run       I     V    size  C(A,B)      I     V    size  C(B,A)
    1       4.70  76.3    23    0.71     3.45  61.6    21    0.46
    2       4.95  74.7    28    0.63     3.83  61.4    16    0.37
    3       4.56  79.3    25    0.69     3.39  56.2    18    0.42
    4       4.87  81.6    24    0.57     3.47  49.4    15    0.47
    5       4.91  76.1    27    0.60     3.76  62.1    20    0.47
    6       4.52  75.9    29    0.73     3.51  56.3    18    0.37
    7       4.59  73.6    28    0.64     3.36  59.2    15    0.39
    8       5.03  77.4    22    0.62     3.28  53.7    19    0.41
    9       4.86  80.2    26    0.66     3.52  58.3    17    0.49
    10      4.77  81.6    25    0.64     3.41  52.5    19    0.43
    offline 6.43  73.2    37    0.62     5.12  47.4    24    0.37
Table 30.125. Results for the test instance nott1b.

                   DGPBAA                        PBAA
    run       I     V    size  C(A,B)      I     V    size  C(B,A)
    1       4.31  72.5    21    0.59     3.13  65.2    20    0.46
    2       4.48  74.2    18    0.61     3.51  61.6    15    0.51
    3       4.87  75.6    19    0.57     3.04  62.7    18    0.48
    4       4.22  71.8    22    0.46     3.76  60.5    16    0.48
    5       4.95  74.3    17    0.62     3.28  58.4    19    0.56
    6       5.04  75.1    24    0.59     3.16  57.3    18    0.49
    7       4.69  73.5    18    0.63     2.94  61.3    17    0.44
    8       4.27  71.6    19    0.71     3.45  55.7    21    0.31
    9       4.63  74.8    22    0.66     3.31  57.3    14    0.48
    10      4.91  73.5    21    0.57     3.34  61.6    20    0.41
    offline 5.63  67.2    34    0.72     4.12  41.4    21    0.38
Table 30.126. Results for the test instance trent1.

                   DGPBAA                        PBAA
    run       I     V    size  C(A,B)      I     V    size  C(B,A)
    1       5.45  82.6    25    0.64     4.02  61.2    21    0.56
    2       5.51  75.4    23    0.53     4.62  63.6    16    0.48
    3       5.34  80.2    27    0.48     3.56  71.2    18    0.37
    4       5.16  77.5    22    0.51     4.23  64.9    16    0.41
    5       5.46  74.9    25    0.47     4.56  69.5    14    0.39
    6       5.62  79.4    29    0.40     4.18  62.7    22    0.36
    7       5.39  81.0    31    0.59     4.39  64.6    23    0.46
    8       5.26  75.8    24    0.57     4.40  61.1    19    0.51
    9       5.11  79.4    21    0.46     4.04  73.7    16    0.39
    10      5.74  82.6    25    0.49     3.87  64.2    20    0.35
    offline 6.76  72.4    43    0.68     5.12  52.4    28    0.37
30.7. Summary

In this paper we have shown that diversity measures can be used to guide the search in multi-objective optimization in order to achieve sets of non-dominated solutions that better satisfy the requirements of the decision-makers. We carried out experiments for a real-world problem with two objectives: the problem of space allocation in academic institutions. In this problem, the decision-makers are interested in obtaining a good approximation set that is also diverse with respect to the solution space. We used the moment of inertia to measure diversity in the objective space and a problem-specific indicator to measure diversity in the solution space. The algorithm used in our experiments is a population-based approach in which each individual in the population is improved by local search, and a specialized mutation operator is used to disturb a solution in a controlled fashion. Two diversity control mechanisms were incorporated into the algorithm: one based on diversity in the objective space and another based on diversity in the solution space. In the first mechanism, the diversity in the objective space is used as a helper objective in order to determine whether candidate solutions generated by local search are accepted or not. In the second mechanism, the diversity in the solution space is used to alternate between the phases of exploitation and exploration. During exploitation, the algorithm employs local search only. During exploration, the specialized mutation operator is also applied in addition to local search. In order to assess the contribution of the diversity control mechanisms, we carried out experiments on three real-world test instances of the space allocation problem in academic institutions. The results obtained in our experiments show that
the algorithm produces better sets of non-dominated solutions when the diversity control mechanisms are used to guide the search. In particular, these non-dominated sets have higher diversity in the solution space, which is a common requirement of space administrators.

References

1. Ang K.H., Chong G., Li Y., Preliminary statement on the current progress of multi-objective evolutionary algorithm performance measurement, Proceedings of the 2002 congress on evolutionary computation (CEC 2002), IEEE press, 1139-1144, (2002).
2. Belton V., Stewart T.J., Multiple criteria decision analysis - an integrated approach, Kluwer academic publishers, (2002).
3. Burke E.K., Landa Silva J.D., Improving the performance of multiobjective optimizers by using relaxed dominance, Proceedings of the 4th asia-pacific conference on simulated evolution and learning (SEAL 2002), 203-207, (2002).
4. Burke E.K., Landa Silva J.D., The effect of the fitness evaluation method on the performance of multiobjective search algorithms, to appear in European journal of operational research, (2004).
5. Burke E.K., Landa Silva J.D., Soubeiga E., Hyperheuristic approaches for multiobjective optimisation, Proceedings of the 5th metaheuristics international conference (MIC 2003), (2003). Extended version available from the authors.
6. Burke E.K., Varley D.B., Space allocation: an analysis of higher education requirements, The practice and theory of automated timetabling II: Selected papers from the 2nd international conference on the practice and theory of automated timetabling (PATAT 97), Lecture notes in computer science, 1408, Springer, 20-33, (1998).
7. Coello Coello C.A., Van Veldhuizen D.A., Lamont G.B., Evolutionary algorithms for solving multi-objective problems, Kluwer academic publishers, (2002).
8. Czyzak P., Jaszkiewicz A., Pareto simulated annealing - a metaheuristic for multiple-objective combinatorial optimization, Journal of multicriteria decision analysis, 7(1), 34-47, (1998).
9. Dawande M., Kalagnanam J., Keskinocak P., Ravi R., Salman F.S., Approximation algorithms for the multiple knapsack problem with assignment restrictions, Journal of combinatorial optimization, 4(2), 171-186, (2000).
10. Deb K., Multi-objective optimization using evolutionary algorithms, Wiley, (2001).
11. Farhang-Mehr A., Azarm S., Diversity assessment of pareto optimal solution sets: an entropy approach, Proceedings of the 2002 congress on evolutionary computation (CEC 2002), IEEE press, 723-728, (2002).
12. Farhang-Mehr A., Azarm S., Minimal sets of quality metrics, Proceedings of the 2nd international conference on evolutionary multi-criterion optimization (EMO 2003), Lecture notes in computer science, 2632, Springer, 405-417, (2003).
13. Farina M., Amato P., Fuzzy optimality and evolutionary multiobjective optimization, Proceedings of the 2nd international conference on evolutionary multi-criterion optimization (EMO 2003), Lecture notes in computer science, 2632, Springer, 58-72, (2003).
14. Gandibleux X., Morita H., Katoh N., The supported solutions used as a genetic information in a population heuristics, Proceedings of the 1st international conference on evolutionary multi-criterion optimization (EMO 2001), Lecture notes in computer science, 1993, Springer, 429-442, (2001).
15. Ishibuchi H., Yoshida T., Murata T., Balance between genetic search and local search in memetic algorithms for multiobjective permutation flowshop scheduling, IEEE transactions on evolutionary computation, 7(2), IEEE press, 204-223, (2003).
16. Jensen M.T., Guiding single-objective optimization using multi-objective methods, Applications of evolutionary computing, Proceedings of the EvoWorkshops 2003, Lecture notes in computer science, 2611, Springer, 268-279, (2003).
17. Jin H., Wong M.L., Adaptive diversity maintenance and convergence guarantee in multiobjective evolutionary algorithms, Proceedings of the 2003 congress on evolutionary computation (CEC 2003), IEEE press, 2498-2505, (2003).
18. Jin Y., Sendhoff B., Connectedness, regularity and the success of local search in evolutionary multi-objective optimization, Proceedings of the 2003 congress on evolutionary computation (CEC 2003), IEEE press, 1910-1917, (2003).
19. Jones D.F., Mirrazavi S.K., Tamiz M., Multiobjective meta-heuristics: an overview of the current state-of-the-art, European journal of operational research, 137(1), 1-9, (2001).
20. Knowles J.D., Watson R.A., Corne D.W., Reducing local optima in single-objective problems by multi-objectivization, Proceedings of the 1st international conference on evolutionary multi-criterion optimization (EMO 2001), Lecture notes in computer science, 1993, Springer, 269-283, (2001).
21. Kumar R., Rockett P., Improved sampling of the pareto-front in multiobjective genetic optimization by steady-state evolution: a pareto converging genetic algorithm, Evolutionary computation, 10(3), 283-314, (2002).
22. Knowles J., Corne D.C., Approximating the nondominated front using the pareto archived evolution strategy, Evolutionary computation, 8(2), MIT press, 149-172, (2000).
23. Knowles J.D., Corne D.W., Towards landscape analyses to inform the design of a hybrid local search for the multiobjective quadratic assignment problem, Soft computing systems: design, management and applications, IOS press, 271-279, (2002).
24. Knowles J., Corne D., On metrics for comparing nondominated sets, Proceedings of the 2002 congress on evolutionary computation (CEC 2002), IEEE press, 711-716, (2002).
Using Diversity to Guide the Search in Multi-Objective Optimization
INDEX
α-dominance, 734 ε-constraint method, 254 ε-dominance, 5, 17, 87, 734 k-dominance, 734 (1+1)-ES, 55 NP-complete problem, 629 NP-hard, 395, 402, 410
absolute sensors, 126 Adaptive Range Multiobjective Genetic Algorithm, 19 additive ε-quality measure, 288 admissible solutions, 4 aerodynamic optimization, 296, 298, 301 aggregating functions, 6 allocation, 272 ANN, see artificial neural network ARAC, 383 arc routing problem, 248 ARMOGA, see Adaptive Range Multiobjective Genetic Algorithm artificial neural network, 639, 677 attribute based distance function, 490 attribute selection, 603, 605 automated parameterization, 81 autonomous vehicle, 125
backward sequential selection, 605 Bayesian decision, 398 beam orientation, 377 benchmark application, 273 best-N selection, 299 bi-objective assignment problem, 21, 567 bi-objective covering tour problem, 19, 252 bi-objective harvest scheduling, 715 bi-objective knapsack problem, 21, 567 bias-variance dilemma, 394 binary classifier, 369 binding, 273 biological sequences, 20 black-box optimization, 395 blended crossover, 299 BLX, see blended crossover bottleneck machine, 505 bound sets, 559, 562 brachytherapy, 373 branch-and-cut, 19
C4.5, 610 CACSD computer aided control system design, 155 unified approach, 158 Capital Asset Pricing Model, 640 CAPM, see Capital Asset Pricing Model cartesian genetic programming, 104 catalog, 483, 489, 500 cell design/formation, 507 cellular manufacturing system, 20, 505 Center-of-Gravity Method, 17, 127 chemical engineering, 19
chemotherapy, 381 cis-acting DNA sequences, 428 city planning, 18, 227 class NP, 629 class P, 629 classification, 383, 603 classifier, 395, 407, 418 cluster, 396, 406, 412, 418, 419 cluster analysis, 311 CoGM, see Center-of-Gravity Method combinational logic circuit design, 101 combinatorial optimization, 395, 402, 410 compact genetic algorithm, 10 complexity, 628 in neural networks, 22 component catalog, 500 computational finance, 627 computational fluid dynamics, 296 computational parameters, 334 computer engineering application, 270 computer science applications, 451 computer-aided diagnosis, 19, 369 confidence, 383 confidence level, 393 connection matrix, 655 connectionist architecture, 400, 418 constraint, 412 convergence, 402, 404, 406, 410, 420 cooperation mechanism, 742 coverage, 383 coverage measure, 288 covering tour problem, 247 credit assignment, 397 credit risk, 642 Credit-Value-at-Risk, 643 CreditRisk+, 645 cross-talk, 396, 397 cross-validation, 612 crossover operator, 322, 493, 557, 564, 566 crowding, 543 clustering, 732 curse of dimensionality, 400 CVaR, see Credit-Value-at-Risk CX, see Cycle Crossover
Cycle Crossover, 587 cyclone separator, 317, 319, 322-330 cyclone separators design, 19 data density, 408 data mining, 19, 21, 295, 297, 312, 603 decision boundary, 396, 406, 419 decision making process, 614 decision tree, 397, 400, 604 decomposition, 395, 397, 398 decomposition-through-competition, 398 definition of complexity, 629 delay, 271 design of a valve actuation system, 498 design of fluid power systems, 20, 483 design process, 485 design space exploration, 272 design unification and automation, 158 dimensionality, 400, 406, 412 direct-drive implosion, 354 distance function, 489-491 diversification, 731 diversity, 402, 404, 410, 730, 739 estimators, 736, 739, 740 guided search, 736, 740, 741 measures, 22 divide-and-conquer, 396, 420 dominated portfolio, 632 dose distribution, 372 optimization, 372 volume histogram, 372 dynamic population sizing, 86 efficiency goals, 15 efficient frontier, 557, 562, 632 efficient solutions, 4, 557 eigen-value, 400 electrocardiogram, 371 electrodynamical effects, 62 elite archives, 691
elite solutions, 21, 535 elitism, 489, 520 engineering design, 17, 29 ensemble based, 394, 419, 420 entropy, 5 environmental engineering, 79, 317 equivalent solutions, 558 error ratio, 11 error-function, 394, 396 error-surface, 396 Espresso, 103 evolutionary algorithms, 1, 177, 178, 557 evolutionary multi-objective optimization, 2 evolutionary neural networks, 679 evolutionary regularization, 657 evolutionary strategy, 411 evolvable hardware, 103 expected loss, 643 expert network, 398 expert systems, 136 explicit building block, 451 EXPO, 284 extrapolation, 393 extrinsic evolution, 103 fast messy Genetic Algorithm, 458 feasible solution, 31 features, 367 feedforward network, 408, 418 FEMO, 286 filter approach, 605 finance, 628 financial applications, 21 financial problems, 628 finite-element solver, 63 fitness inheritance, 22 fitness sharing, 299, 732 flowshop, 506 flowshop scheduling, 20, 529, 531 fluid power, 494 fluid power system design, 494 fmGA, see fast messy Genetic Algorithm forest
benchmark problem, 702 management problems, 22 scheduling problems, 22 forward sequential selection, 605 fringing field, 64 fuzzy logic, 136, 429 GAP, see Generalized Analysis of Promoters gating network, 398 Gaussian blobs, 416 mutation, 412 noise, 412 regularization, 657 gene expression, 385 general multi-objective optimization problem, 3 General Multi-Objective Program, 10 generalization, 393, 394, 398, 400, 407, 420, 421 generalization error, 679, 692 Generalized Analysis of Promoters, 20, 429 generational distance, 12 generational nondominated vector generation, 14 generic framework, 393, 420 genetic algorithm, 185, 317, 320, 529 drift, 403 heritage, 564 layout optimization, 584 local search, 529, 540 map, 563 networks, 428 optimization, 393, 402, 404 GENMOP, see General Multi-Objective Program GENOCOP, 637 genome representation, 489 global minimum, 3 Global Positioning System, 126 gradient-based local search, 646 gross tumor volume, 372 groundwater monitoring, 80
group technology, 505 handshake protocol, 283 helper objectives, 735, 740 heuristic, 412 hierarchical Bayesian network, 10 hierarchical classifier, 421 hierarchical partitioning, 397 high dose rate brachytherapy, 373 high-dimensional, 393, 394, 396, 404, 415, 417 hillclimbing, 22 hybrid algorithm, 637, 641, 642 hybrid annealing algorithm, 742, 743 hybrid EMO algorithm, 550 hybrid strategies, 58 hyperarea and ratio, 12 hyperplane, 397, 409 hypersphere, 407, 408, 412, 417, 418 image reconstruction, 366, 367 imbalanced training set, 409 implosion core plasma gradients, 342 indirect-drive implosion, 357 induction heating, 69 Inertial Confinement Fusion, 342 Inertial Measurement Unit, 126 infeasible solution, 32 insertion, 534 intelligent machine, 393 intensity modulated beam radiotherapy, 379 inter-island rank histogram, 413-415 inter-module, 397 intercellular parts movement, 512 interface benchmark-optimizer, 281 interpolation, 393 intra-island rank histogram, 413, 414 intra-module, 397 intrinsic dimensionality, 406, 408, 416 intrinsic evolution, 103 inverse planning, 373 inverse problem, 63, 365 Inverted and Shrinkable Pareto Archived Evolutionary Strategy, 18
ISPAES, see Inverted and Shrinkable Pareto Archived Evolutionary Strategy algorithm, 204 iterative refinement, 394, 396 job shop, 506 Karnaugh maps, 103 knapsack problems, 630 knowledge discovery, 382 Lamarckian evolution, 658 Laplace regularization, 657 LDR, see low dose rate brachytherapy learning architecture, 395 complexity, 395, 406, 408, 420 cost, 393, 395, 408 error, 395 linear time-invariant, 155 linear variable differential transformer, 128 linkage learning algorithm, 10 local search, 20, 21 local search methods, 605 local search operator, 557, 561, 566, 573 local search variation operator, 644 low dose rate brachytherapy, 386 LTI, see linear time-invariant machine duplication, 511, 513 machine learning, 19, 393, 394, 406, 415, 417, 419 machine under-utilization, 513 magnetic reactor, 62 magnetic shield, 62 makespan, 531 manufacturing cells, 507 Markowitz portfolio selection, 631 material cost, 64 mating restrictions, 5, 20, 403, 404 mating scheme, 548 maximin fitness function, 18, 229
MCEA, see Multi-objective Continuous Evolutionary Algorithm MDESTRA, see Multi Directional Evolution Strategy Algorithm, 61 mechanical engineering, 483 medical image processing, 19 medicine, 19, 365 messy Genetic Algorithm, 9, 458 meta-knowledge, 399 meta-learning, 399 metrics for MOEAs, 11 mGA, see messy Genetic Algorithm MGK algorithm, 575 Micro-Genetic Algorithm for Multi-Objective Optimization, 10 microarray, 385 minimal complete set, 558, 568, 569 Minimal Description Length, 401 minimum description length, 431 MisII, 103 mixed variable design problem, 20
MMOKP, see Modified Multi-objective Knapsack Problem model complexity, 681, 682, 684 Modified Multi-objective Knapsack Problem, 20, 451 modular system, 394, 398, 419 MOEA, 177, 186 MOEA performance measures, 11 MOEA toolbox, 17 control module, 159 decision-making module, 159 optimization module, 159 parameter settings, 167 specification template, 159 MOEAs, see Multi-Objective Evolutionary Algorithms MOEAs in design of combinational logic circuits, 101 MOGA, see Multi-Objective Genetic Algorithm, 19, 22 MOGM, see Multi-Objective Gradient-based Method MOMGA, see Multi-Objective Messy Genetic Algorithm MOMGA-II, see Multi-Objective Messy Genetic Algorithm - II MOSA method, 571 MOSGA, see multi-objective struggle genetic algorithm MOSS, see Multi-Objective Scatter Search mQAP, see multi-objective Quadratic Assignment Problem, 451
Multi Directional Evolution Strategy Algorithm, 17, 61 multi-objective combinatorial optimization, 7, 20, 556, 557 multi-objective combinatorial problems, 177 Multi-objective Continuous Evolutionary Algorithm, 17, 127 Multi-Objective Evolutionary Algorithms, 4, 125, 155, 295 Multi-Objective Forward Sequential Selection, 21, 611 Multi-Objective Genetic Algorithm, 8 Multi-Objective Gradient-based Method, 17, 127 Multi-Objective Hierarchical Bayesian Optimization Algorithm, 10 Multi-Objective Messy Genetic Algorithm, 9, 458 Multi-Objective Messy Genetic Algorithm - II, 20, 451 multi-objective optimization, 30, 156, 432 multi-objective optimization problem, 2 multi-objective particle swarm optimization, 107 multi-objective Quadratic Assignment Problem, 20, 451
multi-objective rectangular packing problem, 21, 581, 583 Multi-Objective Scatter Search, 20, 429 multi-objective spectroscopic data analysis, 341, 347 multi-objective struggle genetic algorithm, 20, 483, 487 multi-objective truss optimization, 201 multi-start local search, 529, 539 multiple costs, 395 Multiple Objective Heuristics, 556 Multiple Objective MetaHeuristics, 556
multiple views, 393, 395, 420 mutation, 322 nadir point, 562 Navier-Stokes, 296, 301, 302, 311 NCGA, see Neighborhood Cultivation Genetic Algorithm nearest neighbor, 408, 418 Neighborhood Cultivation Genetic Algorithm, 21, 591 neighborhood search algorithms, 557 neural network, 21, 22, 296, 396, 397 neural network ensemble, 21, 659 Niched Pareto Genetic Algorithm, 9, 343, 344, 347 niching, 403, 411 No Free Lunch Theorem, 401 Non-dominated Sorting Evolutionary Strategy Algorithm, 17, 55 non-inferior solutions, 4 non-linear regression, 639 non-minimum phase, 165 non-supported efficient solutions, 558, 563 nondeterministic algorithm, 629 nondominated portfolio, 632 nondominated solutions, 30, 44 nondominated sorting, 519 Nondominated Sorting Genetic Algorithm, 8, 32
Nondominated Sorting Genetic Algorithm-II, 6, 9 nondominated vector addition, 14 normal tissue, 372 NPGA, see Niched Pareto Genetic Algorithm NSESA, see Non-dominated Sorting Evolutionary Strategy Algorithm, 55 NSGA, 17, 19, 20, see Nondominated Sorting Genetic Algorithm, 317, 318, 323, 336, 519, 633 NSGA-II, see Nondominated Sorting Genetic Algorithm-II, 19-22, 80, 255, 286, 542, 637, 644, 659 nugget discovery, 383 objective oriented search, 735 optimization, 29, 178 optimization problems, 630 Order Crossover, 532, 587
order-based coding, 529, 532 organs at risk, 372 oscillation strategy, 563, 573 outlier, 407, 409, 417, 418 overall nondominated vector generation ratio, 13 overfitting, 394, 399, 400 OX, see Order Crossover packet processor design, 19 packet processors, 271 PAES, see Pareto Archived Evolution Strategy, 21 PAF, see Paroxysmal Atrial Fibrillation Pareto Archived Evolution Strategy, 10 Pareto Converging Genetic Algorithm, 20, 395, 411 Pareto dominance, 488, 543 Pareto dominance selection, 202 Pareto Evolution Strategy Algorithm, 17, 59
Pareto front, 4, 347, 395, 402, 404, 405, 412 Pareto frontier sampling, 594 Pareto Gradient Based Algorithm, 17, 58 Pareto optimal controller, 155 Pareto optimal designs, 29 Pareto optimal set, 4, 318, 329, 331, 335 Pareto optimal solutions, 127 Pareto optimality, 3 Pareto ranking, 8, 59, 411, 414 Pareto ranking method, 299 Pareto-based approaches, 8 Paroxysmal Atrial Fibrillation, 370 part families, 505 part subcontracting, 511, 513 Partially Mapped Crossover, 587 particle swarm optimization, 101 multiobjective, 107 partition, 393, 407, 412, 415, 419 partitioning, 393, 395, 396, 406, 415, 420 path-relinking operator, 21, 557, 559, 565, 566 pattern space, 395, 396, 406, 407 PCA, see Principal Component Analysis PCGA, see Pareto Converging Genetic Algorithm performance measures, 11 performance satisfaction, 157 performance specification, 161 actuator saturation, 163 disturbance rejection, 162 minimal controller order, 164 robust stability, 162 stability, 161 step response specifications, 162 permutation, 529, 531 PESTRA, see Pareto Evolution Strategy Algorithm
PGBA, see Pareto Gradient Based Algorithm phenotype based distance function, 490 physics, 19 PISA, 283 Placement-based Partially Exchanging Crossover, 589 planning target volume, 372 plant uncertainty, 167 PMX, see Partially Mapped Crossover polymer extrusion, 177, 178, 184 population-based approaches, 7 Population-based Hybrid Annealing Algorithm, 22 portfolio management problems, 21 portfolio selection, 631, 642 potential efficient solutions, 556, 561 PPEX, see Placement-based Partially Exchanging Crossover pre-processor, 393, 395 predictive accuracy, 606 Principal Component Analysis, 406 progress measure, 13 pruning, 400 QAP, see quadratic assignment problem quadratic assignment problem, 454 quadratic programming, 633 qualitative features, 427 quality-of-service, 271 quasi-Newton method, 141 Quine-McCluskey method, 103 quota traveling salesman problem, 251 radiotherapy, 376 randomized search, 395 rank-histogram, 395, 405, 411 real-world application, 393, 401, 419 receiver operating characteristic, 369 rectangular packing problems, 583
Reduced Pareto Set Genetic Algorithm with Elitism, 18 regional planning, 18 regression, 21 regularization, 656 relative sensors, 126 relaxed dominance, 733 replacement scheme, 489 restricted mating, 733 return function, 632 risk function, 632 measure, 632 risk-adjusted performance measure, 645 RMSE, see Root Mean Squared Error RNA polymerase, 433 ROC, see receiver operating characteristic Root Mean Squared Error, 639 Rprop, 658 Rprop+, 659 RPSGAe, see Reduced Pareto Set Genetic Algorithm with Elitism, 177, 178, 186, 187, 194, 196 SBX, see Simulated Binary Crossover scalability, 394, 420 scalar formulation, 65 scalar objective function, 545 scheduling, 273 search space reduction, 207 secondary population, 5 seeded starting generation, 234 selective traveling salesman problem, 251 Self-Organizing Map, 19, 296 SEMO, 286 sensitivity, 369 sequence-pair, 585, 586 shape design, 62 sharing, 403, 404, 411 sharing function, 321 significant Pareto dominance, 619 similarity measure, 489, 490 simulated annealing, 22
Simulated Binary Crossover, 637 simulation, 495, 500 single objective harvest scheduling, 708 space allocation, 22, 736 spacing, 13 spatial cross-talk, 396, 397 SPEA, see Strength Pareto Evolutionary Algorithm, 611, 640 SPEA2, see Strength Pareto Evolutionary Algorithm 2, 286 specificity, 369 spectroscopic data analysis, 19 steady-state algorithm, 411 steepest descent method, 141 stock market, 628 stream, 275 Strength Pareto Evolutionary Algorithm, 9 Strength Pareto Evolutionary Algorithm 2, 9 structural criterion of complexity, 649 struggle crowding, 488 stud genetic algorithm, 10 subspace learning, 407 supersonic transport, 296 supported efficient solutions, 558, 562, 564 swap, 534 switch, 534 tabu search, 22 tardiness, 531 task, 275 teletherapy, 376 temporal cross-talk, 396, 397 term structure of interest rates, 647 testing important issues, 15 TFH, 73 throughput, 271 time series forecasting, 21, 693 time series prediction, 639 trade-off, 155, 158, 170 trading strategy, 639 training, 369
transaction cost, 635, 636 transportation planning, 227 transverse-flux heating, 62 traveling salesman problem, 18, 177, 187, 188, 248 traveling salesman problems with profit, 248 treatment planning, 19, 372 tree classifier, 397 truss optimization, 18, 201 TSP, see traveling salesman problem two set coverage, 11 UAVs, see Unmanned Aerial Vehicles Ugly Duckling Theorem, 401 ULTIC, see unified linear time-invariant control evolutionary CACSD paradigm, 159 evolutionary design application, 165 optimal design, 158 system formulation, 160 underfitting, 394, 399, 400 unexpected loss, 643 unified linear time-invariant control, 17, 156 uniform crossover, 493 uniform selection, 488 Unmanned Aerial Vehicles, 451 usage scenarios, 272 validation error, 395, 409, 418 valuation models for financial products, 647 Vector Evaluated Genetic Algorithm, 7, 102 VEGA, see Vector Evaluated Genetic Algorithm vehicle routing, 248 vehicle routing problem, 248 venturi scrubber, 317, 319, 331-334 venturi scrubbers design, 19 weak dominance, 729
weight decay, 400 weight decay regularization, 681 weight vector, 546 weighted vectors, 731 winner-takes-all, 398 wrapper approach, 605