{1, 2}^n, mapping the truth assignments for Monotone2SAT onto the bundles for 2Cons2: ψ(B(v)) := x with x_i = 1 ⟺ v_i = false. ψ is one-to-one, and B(v) is a satisfying truth assignment for F iff ψ(B(v)) is a consistent bundle for A. Details of the proof can be found in [FS97]. •

Since mCons2 is a generalization of 2Cons2, we conclude that we cannot expect polynomial time algorithms for the counting problem considered here.
3.3
mCons3
Next, we will show that the problem of finding the most consistent bundle according to a consistency matrix A is NP-complete even if we restrict our problems to have binary key factors only. This is done by reducing the following well-known problem to 2Cons3:

Problem SimpleMaxCut:
In: k ∈ ℕ, an undirected graph G = (V, E).
Out: Is there a partition V₁ ∪ V₂ of V such that the number of edges between V₁ and V₂ is at least k?
SimpleMaxCut is known to be NP-complete ([GJ79, p. 210]).

Theorem 3 2Cons3 is NP-complete.
Sketch of proof: Let (k, G) be an instance of SimpleMaxCut. Define S := V, n := |V|, m := 2 and A ∈ K^{2n×2n} by

    A_{(s,b),(t,d)} := 2 if {s,t} ∈ E and b ≠ d,
                       1 otherwise.

Define c := k + ½n(n − 1). This transformation can be computed in polynomial time. We refer to [FS97] for the technical details of the proof that G has a cut of size at least k iff there exists a consistent bundle x for A having consistency value γ(x) ≥ c. Observing that 2Cons3 ∈ NP, we have the desired result.
•
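The construction used in this proof sketch is easy to state programmatically. The following Python sketch builds the consistency matrix A and the threshold c from a SimpleMaxCut instance under the assumptions stated above (n = |V|, m = 2, entries indexed by (key factor, projection) pairs); the function name and the data layout are illustrative choices, not taken from [FS97].

def simple_max_cut_to_2cons3(vertices, edges, k):
    """edges: set of frozensets {u, v}; returns the matrix A (as a dict) and threshold c."""
    n, m = len(vertices), 2
    A = {}
    for s in range(n):
        for t in range(n):
            if s == t:
                continue
            for b in (1, 2):
                for d in (1, 2):
                    # A[(s,b),(t,d)] = 2 if {s,t} is an edge and the projections differ, else 1
                    cut_pair = frozenset((vertices[s], vertices[t])) in edges and b != d
                    A[(s, b), (t, d)] = 2 if cut_pair else 1
    c = k + n * (n - 1) // 2   # every one of the n(n-1)/2 pairs contributes at least 1
    return A, c

# usage: A, c = simple_max_cut_to_2cons3(['u', 'v', 'w'], {frozenset(('u', 'v'))}, 1)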
It follows that mCons3 is NP-complete for m ≥ 2.
4
Algorithms
In this section we will present algorithms for the three problems mCons1, mCons2 and mCons3. Although we know from Section 3 that polynomial time algorithms cannot be expected, we will show that, for instances appearing in practice, fast algorithms can be developed. These algorithms are based on standard techniques for solving NP-hard problems.
4.1
mCons1
From Theorem 1 we know that mCons1 is computationally equivalent to mSAT. To develop an algorithm for mCons1 we use the reduction function from Lemma 2 to generate an instance for mSAT from an instance for mCons1. The instances for mSAT are then solved using the public domain SAT-solver SATO [Zha93]. Since a large fraction of the clauses generated by the reduction function are 2-clauses, SATO efficiently solves even large instances.
Figure 1: Run times to solve mCons1 with SATO

To test the performance of our approach we generated three classes of random consistency matrices. Figure 1 shows the average performance of SATO on these matrices. We fixed n = 20 for all three classes and varied m from 2 to 4. For each class, random matrices with up to 600 total inconsistencies have been generated. The inconsistencies have been distributed uniformly over the matrices. In addition, we included the performance of SATO on two real scenario projects performed recently by the authors of [GFS95]. These measurements are marked A and B in the figure. The consistency matrices of these projects are shown in the appendix of [FS97]. The experiments were run on a Pentium 133MHz processor. Note that SATO solves all the problem instances in a fraction of a second. Thus we have established our first goal: to solve practically relevant instances of mCons1 in real time.
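The exact reduction function of Lemma 2 is not reproduced here, but a natural encoding with the same flavour is sketched below in Python: one Boolean variable per projection, "exactly one projection per key factor" constraints, and one 2-clause per total inconsistency, which is why most clauses are 2-clauses. The variable numbering and clause representation are illustrative assumptions of the sketch.

def cons1_to_cnf(n, m, zero_entries):
    """zero_entries: iterable of ((s, b), (t, d)) pairs with A[(s,b),(t,d)] = 0."""
    var = lambda s, b: s * m + b + 1                           # projection b of key factor s
    clauses = []
    for s in range(n):
        clauses.append([var(s, b) for b in range(m)])          # at least one projection
        for b in range(m):
            for d in range(b + 1, m):
                clauses.append([-var(s, b), -var(s, d)])       # at most one projection
    for (s, b), (t, d) in zero_entries:
        clauses.append([-var(s, b), -var(t, d)])               # forbid totally inconsistent pair
    return clauses

A satisfying assignment of the resulting CNF corresponds to a consistent bundle, and conversely.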
4.2
mCons2
In Section 3 we have shown that mCons2 is at least as hard as NP-complete problems. Therefore we cannot expect to find polynomial time algorithms for the problem. In [FS97] we describe two algorithms to count the number of consistent bundles for a consistency matrix A. Both have exponential run times in the worst case but are fast enough for real time requirements on problem instances of practical relevance. The first algorithm is a backtracking algorithm counting the number of consistent bundles. The algorithm successively constructs all bundles by fixing the key factors one after another. Thus, a tree is generated with the leaves of the tree corresponding to single bundles. The algorithm cuts off the search at a tree node if the last key factor fixed is totally inconsistent with one of the key factors fixed in levels above. The second algorithm is based on the inclusion-exclusion principle and successively generates all subsets of the set of total inconsistencies. The run time of this algorithm is O(2^|I|), where I is the set of total inconsistencies in A. In [FS97] we found that, on randomly generated instances, the first algorithm is more efficient than the second one if the number of inconsistencies exceeds roughly ½nm². Our solution applies the backtracking algorithm if this threshold is exceeded and the inclusion-exclusion algorithm otherwise. Due to the limited space we refer the reader to [FS97] for details.
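A minimal sketch of the first (backtracking) counting algorithm, assuming the consistency matrix is given as a Python dictionary A with A[(s,b),(t,d)] = 0 marking a total inconsistency; the pruning rule is the one described above, everything else (data layout, recursion order) is an illustrative choice.

def count_consistent_bundles(n, m, A):
    def extend(prefix):
        s = len(prefix)
        if s == n:
            return 1
        total = 0
        for b in range(1, m + 1):
            # cut off as soon as the new projection is totally inconsistent
            # with a projection fixed on an earlier level
            if all(A[(t, prefix[t]), (s, b)] != 0 for t in range(s)):
                total += extend(prefix + [b])
        return total
    return extend([])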
4.3
mCons3
In this section we will develop a Branch&Bound algorithm which solves the optimization problem

Problem mCons3a:
In: n, m ∈ ℕ, and a consistency matrix A ∈ K^{nm×nm}.
Out: A consistent bundle x with γ(x) ≥ γ(y) for all consistent y ∈ {1, ..., m}^S.

In order to keep the optimization process from suppressing projections, the scenario managers are often interested in a set of constant size c of optimal bundles for each projection, as formalized in

Problem c-mCons3b:
In: n, m ∈ ℕ, and a consistency matrix A ∈ K^{nm×nm}.
Out: For each projection (s, b) a set X_{(s,b)} of consistent bundles with |X_{(s,b)}| = c such that for all x ∈ X_{(s,b)} and all consistent y ∈ {1, ..., m}^S \ X_{(s,b)} with y_s = b we have x_s = b and γ(x) ≥ γ(y). If there are fewer than c such bundles, all consistent bundles x with x_s = b should be delivered.

The latter problem is especially important when many different scenarios are under investigation even if they are considered to be unlikely, e.g. if the effect of a crisis shall be studied. Variations of c-mCons3b with different additional constraints are sometimes used but are not considered here.
Both problems are computationally equivalent to mCons3 and are therefore known to be NP-complete from Theorem 3. In what follows we will first present a Branch&Bound algorithm for mCons3a. Later we will describe some modifications which allow the algorithm to solve c-mCons3b. The general structure of the algorithm is presented in Figure 2.
Branch&Bound(consMatrix A, int n, int m) {
  bundle opt, x, y;
  opt = initialSolution(A, n, m);
  lower = γ(opt);
  x = ();                              // empty bundle, no key factor fixed yet
  Open = {(x, MAXINT)};
  while (!empty(Open)) {
    x = best entry from Open;
    y = x;
    next = first unfixed key factor of x;
    for (b = 1; b <= m; b++) {
      y[next] = b;
      k = Bound(y);
      if (k > lower) {
        if (next < n-1)
          Open = Open ∪ {(y, k)};
        else {
          opt = y;
          lower = k;
          Open = Open \ {(z, p) | p < lower};
        }
      }
    }
  }
  return opt;
}
Figure 2: A Branch&Bound algorithm for mCons3a

Some questions remain open in this general approach, which will be investigated in more detail in the next paragraphs.
Finding an initial solution

To find a good initial solution we use a state-of-the-art Hill Climbing algorithm: for HCRounds rounds the algorithm starts with a random bundle. It then tries to improve on the given bundle by testing all neighbors (bundles differing in exactly one component) and proceeding with the one having the largest consistency value. This is done as long as there is a neighbor having a better consistency value than the current bundle.
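A compact Python sketch of this restart hill-climbing procedure, assuming a consistency function gamma(bundle, A) that sums the matrix entries of all fixed pairs; the names and the handling of ties are illustrative, not the authors' implementation.

import random

def gamma(bundle, A):
    return sum(A[(s, bundle[s]), (t, bundle[t])]
               for s in range(len(bundle)) for t in range(s + 1, len(bundle)))

def initial_solution(n, m, A, hc_rounds=5):
    best = None
    for _ in range(hc_rounds):
        x = [random.randint(1, m) for _ in range(n)]
        while True:
            # all neighbours: bundles differing from x in exactly one component
            nbs = [x[:s] + [b] + x[s+1:] for s in range(n)
                   for b in range(1, m + 1) if b != x[s]]
            y = max(nbs, key=lambda z: gamma(z, A))
            if gamma(y, A) <= gamma(x, A):
                break
            x = y
        if best is None or gamma(x, A) > gamma(best, A):
            best = x
    return best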
We tested the Hill Climbing algorithm on 6000 random inputs with the same parameters as a recent scenario project: we generated matrices corresponding to 20 key factors, 9 of them having two projections, the other 11 having three projections. The consistency matrix contains 26 total inconsistencies, uniformly distributed over the set of entries; all other entries were randomly chosen from the set of consistency values without zero. The quality of the solution returned by the function "initialSolution" is excellent. Even with HCRounds = 20, a best solution is found in more than 80% of the instances. With HCRounds = 100, a best solution is found in more than 99% of the instances. The run time of the function is smaller than one second even for HCRounds = 100. In our experiments we obtained the best results with HCRounds = 5. These excellent results may well be due to the fact that the instances used are random instances. However, the algorithm behaves equally well on instances appearing in practice.
Computing a lower bound

The second question is how to compute a bound β(x) for a bundle x that is fixed only in the first h key factors; β(x) has to be an upper bound on the consistency value of every completion of x, and it is compared with the current lower bound "lower" in Figure 2. We are interested only in monotone bounds, i.e. if x' is a bundle resulting from x by fixing one more key factor, then β(x') should not be larger than β(x). This is guaranteed by the following:

Definition 3 Let n, m ∈ ℕ and A be an n×m consistency matrix. A function β, defined on all partially fixed bundles, is called a bound function for A iff for each x ∈ {1, ..., m}^h we have β(x) ≥ γ(y) for all y ∈ {1, ..., m}^n of which x is a prefix.

Consider the following function:

Definition 4 Let n, m ∈ ℕ, and S = {0, ..., n−1} be the set of key factors. Let A be an n×m consistency matrix. Let x be a bundle with the key factors s ∈ {0, ..., h} fixed and all the others unfixed. We define

    β₁(x) := Σ_{s=0}^{h−1} Σ_{t=s+1}^{h} A_{(s,x_s),(t,x_t)}                                (s, t fixed)

    β_{s,b}(x) := Σ_{t=0}^{h} A_{(t,x_t),(s,b)}                                              (t fixed)

    β̄_{s,b}(x) := Σ_{t=h+1}^{s−1} max_{d ∈ {1,...,m}} { A_{(t,d),(s,b)} }                    (t unfixed)

    β(x) := β₁(x) + Σ_{s=h+1}^{n−1} max_{b ∈ {1,...,m}} ( β_{s,b}(x) + β̄_{s,b}(x) )
β₁ sums up the consistency values of all pairs of projections that have already been fixed. For any unfixed key factor s and all projections (s, b), the consistency values obtained from the key factors already fixed and the maximum possible consistency values that can be expected from the unfixed key factors are computed under the assumption that x_s will become b. For each unfixed key factor s the maximum over all projections (s, b) is added to obtain the value β(x). Therefore, β is clearly a bound function. In [Sen97] we compared β to other functions. Our experiments show that β is one of the best functions considered there, but that there are several plausible alternatives.

In order to compute β efficiently during the search, it is updated incrementally with the help of several numbers as described in the following definition.

Definition 5 Let n ∈ ℕ, m ∈ ℕ and let A be an n×m consistency matrix. Let x be a bundle with the key factors s ∈ {0, ..., h} fixed and all the others unfixed. We define

    λ₁(x, h) := Σ_{s=0}^{h−1} Σ_{t=s+1}^{h} A_{(s,x_s),(t,x_t)}

    λ₂(x, h, s, b) := Σ_{t=0}^{h} A_{(t,x_t),(s,b)}

    λ₃(h, s, b) := Σ_{t=h+1}^{s−1} max_{d ∈ {1,...,m}} { A_{(t,d),(s,b)} }

for all h ∈ {0, ..., n−1}, s ∈ {h+1, ..., n−1}, b ∈ {1, ..., m}.

It is now easy to see that

    β(x) = λ₁(x, h) + Σ_{s=h+1}^{n−1} max_{b ∈ {1,...,m}} ( λ₂(x, h, s, b) + λ₃(h, s, b) )

for a bundle x having the first key factors {0, ..., h} already fixed.
The values of λ₃ can be computed offline from the entries of A in time O(n² · m) and can be stored in a table of size O(n² · m). If x is a bundle having the first key factors s ∈ {0, ..., h} already fixed and y is a bundle resulting from x by fixing y_{h+1}, then the values of λ₁ and λ₂ can be computed efficiently during the search using the equations

    λ₁(y, h+1) = λ₁(x, h) + λ₂(y, h, h+1, y_{h+1})
    λ₂(y, h+1, s, b) = λ₂(x, h, s, b) + A_{(h+1,y_{h+1}),(s,b)}

Thus, the bound function β can be computed in O(n · m) time for any new bundle y in algorithm Branch&Bound of Figure 2.
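The following Python sketch mirrors Definition 5 and the update equations above: λ₃ is tabulated once, λ₁ and λ₂ are carried along while key factors are fixed, and the bound is assembled as in the displayed formula. The indexing conventions (key factors 0..n−1, projections 1..m, A defined for every ordered pair of projections) follow the text; everything else is an illustrative assumption.

def precompute_lambda3(n, m, A):
    # lam3[(h, s, b)] = sum over t in {h+1, ..., s-1} of max_d A[(t,d),(s,b)]
    lam3 = {}
    for h in range(n):
        for s in range(h + 1, n):
            for b in range(1, m + 1):
                lam3[h, s, b] = sum(max(A[(t, d), (s, b)] for d in range(1, m + 1))
                                    for t in range(h + 1, s))
    return lam3

def init_state(n, m):
    # lam1 for the empty bundle, lam2[(s, b)] = 0 for all projections
    return 0, {(s, b): 0 for s in range(n) for b in range(1, m + 1)}

def bound(h, lam1, lam2, lam3, n, m):
    # beta = lam1 + sum over unfixed s of max_b (lam2[s,b] + lam3[h,s,b])
    return lam1 + sum(max(lam2[s, b] + lam3[h, s, b] for b in range(1, m + 1))
                      for s in range(h + 1, n))

def fix_next(x, b_new, h, lam1, lam2, A, n, m):
    # incremental update when projection b_new is chosen for key factor h+1
    y = x + [b_new]
    lam1 += lam2[h + 1, b_new]
    for s in range(h + 2, n):
        for b in range(1, m + 1):
            lam2[s, b] += A[(h + 1, b_new), (s, b)]
    return y, lam1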
Some more implementation details are contained in [FS97].
Finding a set of optimal bundles

The scenario manager is not only interested in the optimal bundle. He is much more interested in computing, for each projection (s, b), a set of solutions. All the bundles in this set must have key factor s fixed to b. The cardinality c of these sets of solutions is the same for all projections. These constraints have been described as problem c-mCons3b. The Branch&Bound algorithm presented above can easily be modified to solve this problem:

1. Initial solution: Computing an initial solution is no longer done. Although the function for computing an initial solution can easily be modified to compute a set of solutions, it turned out that for the instances considered here this computation is too time consuming. This inefficiency is caused by the effort to manage the set of solutions found so far, since the Hill Climbing algorithm may visit solutions twice or even more often. Thus, the Branch&Bound algorithm without computing an initial solution outperforms the one with an initial solution on instances of moderate size.

2. Set of optima: The Branch&Bound algorithm keeps a set of optimal solutions instead of a single solution only. This is easily done since the Branch&Bound algorithm delivers every solution at most once. A small heap is used to keep the set of optima, with the worst optimum located in the root. The mechanism of heap sort is used to exchange the root entry for a better bundle found during the search.

3. Lower bound: If the solution heap already contains c solutions, the lower bound can easily be found as the smallest consistency value of a bundle in the heap. As described above, this is the consistency value of the root of the heap. If less than c bundles are contained in the heap, then the lower bound is set to 0.

4. One set for each projection: In order to deliver a set of optimum bundles for each projection (s, b), we introduce a set of heaps for the optima found so far, one for each projection. Each complete bundle found during the search is compared with the root of each of the solution heaps and inserted into the heap if its consistency value is larger than the consistency value of the root. When generating an incomplete bundle x during the search, its upper bound has to be compared with the solution heaps. For the projections already fixed, the bound is compared to the root entries of the corresponding heaps. For the projections still unfixed, all possible solution heaps must be checked. The bundle may be pruned if it is no longer relevant for any solution heap, i.e. its upper bound is smaller than the consistency values of all the roots of the solution heaps. To speed up the comparisons of the upper bound with the solution heap roots of the unfixed key factors, a list of bounds is maintained, one for each key factor. For a key factor s, the corresponding entry contains the minimal root value of the solution heaps corresponding to projections of s. The list entry for s is updated whenever the consistency value of a root entry changes in a solution heap corresponding to a projection of s. After branching an incomplete bundle, the upper bound of the newly generated bundle is then compared only with the minimum bound for the last key factor fixed.

5. Deleting obsolete entries from the Open list: In the regular Branch&Bound algorithm, entries may be deleted from the Open list whenever their upper bound is smaller than the best solution found so far. In the modified algorithm this would be an inefficient operation, since each entry has to be compared with many roots. This makes the deletion operation so inefficient that we decided to omit it in the modified algorithm, at the cost of an increase in memory requirement. Instead of deleting obsolete entries from the Open list, we compare an entry with the roots of the corresponding solution heaps immediately before it is branched. With this we compare all entries that are branched twice, but avoid the comparisons with the solution heap roots.
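A sketch of the bookkeeping described in items 2 to 4, assuming Python's heapq: one min-heap of size at most c per projection, with the weakest stored solution at the root, plus the per-key-factor list of minimal root values used for fast relevance tests. This illustrates the data structures only; it is not the authors' code and omits the full pruning logic of item 5.

import heapq

class SolutionHeaps:
    def __init__(self, n, m, c):
        self.n, self.m, self.c = n, m, c
        self.heaps = {(s, b): [] for s in range(n) for b in range(1, m + 1)}
        self.min_root = [0] * n          # minimal root value over the projections of key factor s

    def root_value(self, s, b):
        h = self.heaps[s, b]
        return h[0][0] if len(h) == self.c else 0    # lower bound is 0 until the heap is full

    def insert(self, bundle, value):
        # a complete bundle competes in the heap of every projection it realizes
        for s in range(self.n):
            b = bundle[s]
            h = self.heaps[s, b]
            if len(h) < self.c:
                heapq.heappush(h, (value, tuple(bundle)))
            elif value > h[0][0]:
                heapq.heapreplace(h, (value, tuple(bundle)))
            self.min_root[s] = min(self.root_value(s, d) for d in range(1, self.m + 1))

    def relevant(self, upper_bound):
        # conservative test: can the bundle still beat at least one stored root?
        return any(upper_bound > r for r in self.min_root)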
Experimental Results

We again generated random matrices to evaluate the performance of our algorithm. In addition we tested the algorithm on instances from real scenario projects. The random matrices were constructed as consistency matrices for 20 key factors, 10 of them having 2 projections and the other 10 having 3 projections. The experiments were run on a Pentium 133MHz processor.
Figure 3: Run times to solve c-mCons3b depending on the number of inconsistencies
Figure 4: Run times to solve c-mCons3b depending on the number of bundles to compute
In Figures 3 and 4 we present the run times of the optimization algorithm depending on the number of total inconsistencies in the matrix and depending on the number of bundles to compute. In the first experiment we fixed the number of bundles to compute to 100; in the latter experiment we fixed the number of inconsistencies in the matrix to 10. Again we included the performance on the three real scenario projects in the figures. As can be seen from the graphs, the run times are always smaller than 10 seconds. The experiments show that the problem mCons3 can be solved in real time for input instances that are of the same magnitude as the instances generated by real existing scenario projects.
5
Conclusions
In our paper we addressed the topic of consistency analysis in scenario projects. We analyzed the three most important problems posed to us by the scenario managers. For all three problems we showed that in their general form they are at least NP-hard. Nevertheless, we presented algorithms that solve all three problems very quickly. Moreover, for instances occurring in practice, the solution times are usually in the order of some seconds on a state-of-the-art notebook. This means that

• the work of the human experts to determine the key factors, projections and the consistency matrix can be efficiently supported by the computer,
• the time consuming consistency check can be done in real time, and
• mistakes in early stages of a scenario project can be corrected online.

Therefore, our algorithms are an important step forward towards a real time scenario management system.
References

[AL83] R. Amara and A.J. Lipinski. (1983), Business Planning for an Uncertain Future: Scenarios & Strategies, Pergamon Press, New York, USA.
[Bec83] H.S. Becker. (1983), "Scenarios: A Tool of Growing Importance to Policy Analysts in Government and Industry," Technological Forecasting & Social Change, Vol. 23/83, pp 95-120.
[DL80] C. Ducot and G.J. Lubben. (1980), "A Typology for Scenarios," Futures, Vol. 12, pp 51-57.
[FS97] R. Feldmann and N. Sensen. (1997), "Efficient Algorithms for the Consistency Analysis in Scenario Projects," Technical Report, Univ. of Paderborn, Germany.
[Fos90] M.J. Foster. (1993), "Scenario Planning for Small Business," Long Range Planning, Vol. 26, pp 123-129.
[Fre84] R.E. Freeman. (1984), Strategic Management: A Stakeholder Approach, Pitman Publishers, Boston, USA.
[GJ79] M.R. Garey and D.S. Johnson. (1979), Computers and Intractability, Freeman and Company, New York, USA.
[GFS95] J. Gausemeier, A. Fink, and O. Schlake. (1995), Szenario Management, Carl Hanser Verlag, München, Wien.
[God87] M. Godet. (1987), Scenarios and Strategic Management, Butterworths, London, England.
[Goe93] U. Götze. (1993), Szenario-Technik in der strategischen Unternehmensplanung, Deutscher Universitäts Verlag, Wiesbaden, Germany.
[Mey92] M. Meyer-Schönherr. (1992), Szenario-Technik als Instrument der strategischen Planung, Verlag Wissenschaft & Praxis, Ludwigsburg, Germany.
[Schae78] T.J. Schaefer. (1978), "The complexity of satisfiability problems," Proc. 10th Ann. ACM Symp. on Theory of Computing, pp 216-226, Association for Computing Machinery.
[Sen97] N. Sensen. (1997), Konsistenzanalyse beim Szenario Management: Ein Existenz-, Zähl- und Optimierungsproblem, Master thesis (in German), Univ. of Paderborn, Germany.
[Val79a] L.G. Valiant. (1979), "The complexity of computing the permanent," Theoretical Computer Science, Vol. 8, pp 189-201.
[Val79b] L.G. Valiant. (1979), "The complexity of enumeration and reliability problems," SIAM Journal on Computing, Vol. 8, pp 410-421.
[Zha93] H. Zhang. (1993), "SATO: a decision procedure for propositional logic," Association for Automated Reasoning Newsletter, Vol. 22, 1993.
Combinatorial and Global Optimization, pp. 75-95. P.M. Pardalos, A. Migdalas and R. Burkard, Editors. © 2002 World Scientific Publishing Co.
Assignment of Reusable and Non-Reusable Frequencies

Dimitris A. Fotakis (1,2) (fotakis@cti.gr)
Paul G. Spirakis (1,2) (spirakis@cti.gr)

(1) Department of Computer Engineering and Informatics, University of Patras, 265 00 Rion, Greece
(2) Computer Technology Institute, Kolokotroni 3, 262 21 Patras, Greece
Abstract

Graph radio coloring and graph radio labelling are combinatorial models for two interesting cases of Frequency Assignment. In both problems positive integer labels (channels) must be assigned to all the vertices of a graph such that adjacent vertices get labels at distance at least two. In radio labelling all the labels must be distinct, while in radio coloring only the vertices at distance at most two in the input graph must be assigned distinct labels. For both problems the objective is to minimize the maximum label used. We first prove that both radio coloring and radio labelling remain NP-complete for graphs of diameter two, and we show that a 2α-approximation algorithm for radio coloring can be obtained from any α-approximation algorithm for coloring squares of graphs. We also show that radio labelling is equivalent to Hamiltonian Path with distances one and two (HP(1,2)), and we present a polynomial-time algorithm for computing an optimal radio labelling, given a coloring of the input graph with a constant number of colors. Thus we prove that radio labelling is in P for planar graphs. Additionally, we present competitive algorithms and a lower bound for on-line radio labelling.

Keywords: Frequency Assignment, Hamiltonian Cycle, Polynomial-Time Algorithms, On-line Algorithms.
1
Introduction
The Frequency Assignment Problem (FAP) arises from the fact that radio transmitters operating at the same or closely related frequency channels have the potential to interfere with each other. FAP can be formulated as an optimization problem as follows: Given a collection of transmitters to be assigned operating channels and a set of interference constraints on transmitter pairs, find an assignment that fulfills all the interference constraints and minimizes the allocated bandwidth. A common model for FAP is the interference graph. Each vertex of an interference graph represents a transmitter, while each edge represents an interference constraint between the adjacent transmitters. The frequency channels are usually assumed to be uniformly spaced in the spectrum and are labelled using positive integers. Frequency channels with adjacent integer labels are assumed adjacent in the spectrum [6, 7, 10]. Clearly, FAP is a generalization of graph coloring [7].

Instead of specifying the interference constraints for each pair of transmitters, FAP can be defined by specifying a minimum allowed spatial distance for each channel/spectral separation that is a potential source of interference [6, 7]. In [7] these are called Frequency-Distance (FD) constraints. FD-constraints can be given as a set of distances {D_0, D_1, ..., D_κ}, D_0 ≥ D_1 ≥ ... ≥ D_κ, where the distance D_x, x ∈ {0, 1, ..., κ}, is the minimum distance between transmitters using channels at distance x.

The FD(κ)-coloring problem has been proposed as a model for FAP in unweighted interference graphs [10, 17]. In FD(κ)-coloring we seek a function χ_κ : V → {1, ..., v} that fulfills the FD-constraints with respect to the graph distances D_0 = κ + 1, D_1 = κ, ..., D_κ = 1, and minimizes the largest color v used (color span). Alternatively, v, u ∈ V are only allowed to get colors at distance x, |χ_κ(v) − χ_κ(u)| = x, x ∈ {0, 1, ..., κ}, if v and u are at distance at least κ − x + 1 from each other in the interference graph. A polynomial-time exact algorithm for a variant of FD(κ)-coloring in lattices is presented in [10]. FD(2)-coloring, which is also called radio coloring, is a combinatorial model for the widely used "co-channel" and "adjacent-channel" interference constraints. A problem similar to FD(2)-coloring is studied in [13] in the context of mobile networks, where each vertex may require more than one color (multicoloring). A polynomial-time approximation algorithm for triangular lattices is presented in [13].

In some practical applications the transmitters cover a local or metropolitan area. Hence, the transmitters are not allowed to operate at the same channel (D_0 = ∞, non-reusable frequency channels). The problem of radio labelling is the equivalent of radio coloring in the context of non-reusable frequency assignment. In particular, a valid radio labelling fulfills the FD-constraints with respect to D_0 = ∞, D_1 = 2, D_2 = 1. In radio labelling we seek an assignment of distinct integer labels to all the vertices of a graph, such that adjacent vertices get labels at distance at least two. The objective is to minimize the maximum label used (label span). The definition of radio labelling was communicated to us by [9]. Radio labelling and similar combinatorial models for non-reusable frequency assignment are used for obtaining lower bounds on the optimal values of general FAP instances [2, 17]. Since general instances of FAP are intractable, there exist many attempts to develop heuristic approximation algorithms [1]. Hence, some lower bounding techniques are necessary for assessing the quality of the solutions found by these algorithms.
1.1
Summary of Results
We start by proving that both radio coloring and radio labelling remain NP-complete for graphs of diameter two. We also show that a polynomial-time 2α-approximation algorithm for radio coloring can be obtained from any polynomial-time α-approximation algorithm for coloring squares of graphs. We proceed to study radio labelling, which is shown equivalent to Hamiltonian Path with distances one and two (HP(1,2)). Hence, radio labelling is MAX-SNP-hard and approximable in polynomial time within 7/6 [16]. Next, we present a polynomial-time algorithm for computing an optimal radio labelling, given a coloring of the graph with a constant number of colors. Therefore, we prove that radio labelling is in P for planar graphs and graphs colorable with a constant number of colors in polynomial time. As a side effect, we show that, given a partition of the vertices of a graph into a constant number of cliques, we can decide if the graph is Hamiltonian in polynomial time. We are not aware of another algorithm that exploits a partition into cliques for deciding Hamiltonicity. Motivated by the practical applications of on-line frequency assignment in mobile networks [15], we define two on-line variations of radio labelling, and we prove that the greedy algorithm achieves a competitive ratio of two for both variations. We also obtain a lower bound of 3/2 on the competitive ratio of any on-line algorithm for the first variation.
2
Definitions and Techniques
Given a graph G(V, E), d(v, u) denotes the length of the shortest path between v, u ∈ V, and diam(G) denotes the diameter of G, defined as diam(G) = max_{v,u ∈ V} {d(v, u)}.
The square of G, denoted G², is a graph on the vertex set V that contains an edge between v, u ∈ V if v and u are at distance at most two in G. The complementary graph Ḡ is a graph on the vertex set V that contains an edge {v, u} iff {v, u} ∉ E.

Hamiltonian Path with distances one and two (HP(1,2)) is the problem of finding a Hamiltonian path of minimum length in a complete graph where all the edge lengths are either one or two. A Hamiltonian path is a simple path visiting each vertex of a graph exactly once. An instance of HP(1,2) can also be defined by an unweighted graph G(V, E), if the edges of E are considered of length one, and the edges not in E (non-edges) of length two. Therefore, HP(1,2) is a generalization of the Hamiltonian path problem. HP(1,2) is MAX-SNP-hard and approximable in polynomial time within a factor of 7/6 [16].

In the graph coloring problem, we seek an assignment of colors to the vertices of a graph such that no pair of adjacent vertices get the same color. The objective is to minimize the number of colors used. Given a graph G, the value of an optimal coloring is called the chromatic number of G and is denoted by χ(G). It is NP-hard even to approximate the chromatic number of general graphs efficiently [14]. However, there exist polynomial-time approximation algorithms whose performance guarantees are not so good [8, 12].

The following generalized version of graph coloring is derived by applying the FD-constraints to unweighted graphs with respect to the distances D_0 = κ + 1, D_1 = κ, ..., D_κ = 1.

Definition 2.1 (FD(κ)-coloring)
INSTANCE: A graph G(V, E).
SOLUTION: A valid FD(κ)-coloring, i.e. a function χ_κ : V → {1, ..., v} such that, for all v, u ∈ V, |χ_κ(v) − χ_κ(u)| = x, x ∈ {0, 1, ..., κ}, only if d(v, u) ≥ κ − x + 1.
OBJECTIVE: Minimize the maximum color v used (color span).

Clearly, the FD(κ)-coloring problem allows a pair of non-adjacent vertices to be assigned the same color/channel, provided they are located far apart. In the sequel, we shall concentrate on the study of FD(2)-coloring, also called radio coloring, modelling the widely used "co-channel" and "adjacent-channel" interference constraints. We remark that the problem of radio coloring a graph G is not equivalent to the problem of coloring G². In particular, the objective in coloring G² is to minimize the number of different colors used, while the objective in radio coloring G is to minimize the maximum color assigned to a vertex. For example, if G is K_{m,m} (i.e. the complete bipartite graph with m vertices on each class), then G² is the complete graph on 2m vertices. Therefore, χ(G²) = 2m, while χ₂(G) = 2m + 1.
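As a concrete reading of Definition 2.1 for κ = 2, the following Python sketch checks whether an assignment is a valid radio coloring: vertices at distance one need colors at distance at least two, vertices at distance two need distinct colors. The adjacency-set representation is an assumption of the sketch.

def is_radio_coloring(adj, color):
    """adj: dict vertex -> set of neighbours; color: dict vertex -> positive integer."""
    vs = list(adj)
    for i, v in enumerate(vs):
        for u in vs[i + 1:]:
            if u in adj[v] and abs(color[v] - color[u]) < 2:
                return False               # adjacent: colors must differ by at least 2
            if u not in adj[v] and (adj[v] & adj[u]) and color[v] == color[u]:
                return False               # distance two: colors must be distinct
    return True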
A valid FD(κ)-labelling fulfills the FD-constraints with respect to the graph distances D_0 = ∞, D_1 = κ, ..., D_κ = 1.

Definition 2.2 (FD(κ)-labelling)
INSTANCE: A graph G(V, E).
SOLUTION: A valid FD(κ)-labelling, i.e. a function L_κ : V → {1, ..., v} such that, for all v, u ∈ V, |L_κ(v) − L_κ(u)| = x, x ∈ {1, ..., κ}, only if d(v, u) ≥ κ − x + 1.
OBJECTIVE: Minimize the maximum label v used (label span).

In the sequel, we shall concentrate on the FD(2)-labelling problem, also called radio labelling. Given a graph G(V, E), RL(G) denotes the value of an optimal radio labelling for G and, for any v ∈ V, RL(v) denotes the label assigned to v by a valid radio labelling. Obviously, given a coloring of G with χ colors, it is easy to find a radio labelling of value at most |V| + χ − 1. Therefore,

    |V| ≤ RL(G) ≤ |V| + χ(G) − 1.

It is not hard to verify that RL(G) = |V| for all graphs G such that the complementary graph Ḡ contains a Hamiltonian path. On the other hand, RL(G) = |V| + χ(G) − 1 for any complete r-partite graph G, r ≥ 2.
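The upper bound RL(G) ≤ |V| + χ(G) − 1 is constructive. A sketch, assuming a proper coloring is given as a dict: each color class receives consecutive labels and one label is left unused between classes, so adjacent vertices (which lie in different classes) always get labels at distance at least two.

def labelling_from_coloring(coloring):
    """coloring: dict vertex -> color; returns dict vertex -> radio label."""
    classes = {}
    for v, col in coloring.items():
        classes.setdefault(col, []).append(v)
    labels, next_label = {}, 1
    for i, col in enumerate(sorted(classes)):
        if i > 0:
            next_label += 1                  # leave one unused label between color classes
        for v in classes[col]:
            labels[v] = next_label
            next_label += 1
    return labels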
2.1
On-line Radio Labelling
We proceed to define the problem of on-line radio labelling (cf. Chapter 13 of [11] for the basic definitions concerning on-line computation). In the on-line setting of the problem, an induced subgraph of the interference graph appears to the on-line algorithm in a vertex-by-vertex fashion. In case of on-line radio labelling, a request is a newly arrived vertex that has to be assigned a radio label. A request is accepted if the vertex actually gets a valid label. Otherwise, the request is rejected. We define two variations of on-line radio labelling.

Definition 2.3 (On-line Radio Labelling - Benefit Version)
INSTANCE: A graph G(V, E) and an integer bound β > 0. A new vertex v ∈ V is presented to the on-line algorithm at every step. The adversary may choose to request any subset V' ⊆ V in any order.
SOLUTION: The algorithm has to decide to either accept v by assigning a label not greater than β to it, or reject v. At any step, the labels of the set V_a of accepted vertices must form a valid radio labelling for the subgraph of G induced by V_a.
OBJECTIVE: Maximize the number of accepted vertices.

Definition 2.4 (On-line Radio Labelling - Assignment Version)
INSTANCE: A graph G(V, E). At every step, a new vertex v ∈ V is presented to the on-line algorithm. The adversary may choose to request any subset V' ⊆ V in any order.
SOLUTION: The algorithm must assign labels to all the vertices requested. At any step, the labels of the set V_p of the vertices presented so far must form a valid radio labelling for the subgraph of G induced by V_p.
OBJECTIVE: Minimize the maximum label used.
2.2
Competitive Analysis and Lower Bounds
The usual approach to analyzing the performance of on-line algorithms is competitive analysis [18]. In competitive analysis, the performance of the on-line algorithm is compared to the performance of the optimal off-line algorithm on every sequence of requests, and the worst-case ratio is considered. Let A be an on-line algorithm for a benefit (maximization) problem and σ any sequence of requests. Also, let A(σ) denote the benefit accrued by A when presented with σ, and OPT(σ) denote the benefit accrued by the optimal off-line algorithm OPT on the same sequence. We say that the on-line algorithm A is c-competitive if there exists a constant b such that on every request sequence σ,

    c · A(σ) + b ≥ OPT(σ).

The competitive ratio for cost (minimization) problems is defined similarly. In case of randomized on-line algorithms, the competitive ratio is defined with respect to the expected benefit of the algorithm. A standard technique used in competitive analysis is to employ an adversary which plays against the algorithm A and constructs an input which incurs a high cost for A and a low cost for OPT. We say that an adversary is oblivious if it constructs the request sequence in advance, before A starts working on the sequence. However, the oblivious adversary knows the probability distribution of the actions taken by the algorithm A. A common technique for proving lower bounds on the competitive ratio of a randomized on-line algorithm against the oblivious adversary is to bound the performance of the "best" deterministic on-line algorithm on inputs generated from the "worst" probability distribution. In particular, let D be a probability distribution over sequences of requests σ, let E_D[A(σ)] denote the expected value of the benefit of an algorithm A, and E_D[OPT(σ)] be the expected benefit of the optimal off-line algorithm on inputs generated from D. An algorithm A (for a benefit problem) is c-competitive against D if there exists a constant b such that

    c · E_D[A(σ)] + b ≥ E_D[OPT(σ)].

Lemma 2.5 ([4]) A real number c is a lower bound on the competitive ratio of randomized on-line algorithms against the oblivious adversary if and only if there exists a probability distribution D such that c is a lower bound on the competitive ratio of any deterministic on-line algorithm against D.
3
The Complexity of Radio Coloring and Radio Labelling
We proceed to show that both radio coloring and radio labelling remain NP-complete even for a very restricted class of instances, namely graphs of diameter two.

Lemma 3.1 Radio coloring and radio labelling restricted to graphs of diameter two are NP-complete.

Proof. Let G(V, E), |V| = n, be any graph of diameter two. Since, for all v, u ∈ V, d(v, u) ≤ 2, any valid radio coloring must assign distinct colors to all the vertices of G. Moreover, if {v, u} ∈ E, then |χ₂(v) − χ₂(u)| ≥ 2. Therefore, if diam(G) = 2, the problem of radio coloring G is equivalent to the problem of radio labelling G. Hence, it suffices to show that radio labelling is NP-complete for graphs of diameter two.

Clearly, radio labelling is in NP. Additionally, it is not hard to verify (see also Lemma 3.3) that RL(G) ≤ |V| if and only if the complementary graph Ḡ contains a Hamiltonian path. Thus, in order to show that radio labelling is NP-complete for graphs G of diameter two, it suffices to show that the Hamiltonian Path problem remains NP-complete for complements Ḡ of graphs G of diameter two.

Let G'(V', E') be any graph and let s, t ∈ V' be any pair of non-adjacent vertices. The problem of deciding if G' contains a Hamiltonian path starting from s and ending at t is NP-complete (Hamiltonian Path between Two Vertices [5]). Let G^(c)(V' ∪ {v_s, v_t}, E' ∪ {{s, v_s}, {t, v_t}}) be the graph obtained from G' by adding two non-adjacent vertices v_s, v_t and connecting v_s to s and v_t to t (Figure 1). The graph G^(c) is the complement of a graph of diameter two. In particular, the following observations justify that diam(Ḡ^(c)) = 2, since all the vertices are at distance at most two from each other in Ḡ^(c).

Figure 1: The complementary graph of G^(c) has diameter two.

1. The vertex pairs (s, t), (v_s, v_t), (s, v_t) and (t, v_s) are connected by edges in Ḡ^(c).
2. Any pair of vertices u, w ∈ V' − {s, t} are at distance at most two from each other, because they are connected to both v_s, v_t.
3. Any vertex u ∈ V' ∪ {v_s} − {s, t} is at distance at most two from s, because both u and s are connected to v_t.
4. Any vertex u ∈ V' ∪ {v_t} − {s, t} is at distance at most two from t, because both u and t are connected to v_s.

Additionally, G^(c) contains a Hamiltonian path if and only if G' contains a Hamiltonian path from s to t. Therefore, Hamiltonian Path is NP-complete for complements of graphs of diameter two. •

Notice that a set C ⊆ V is a valid color class of G² iff it is a valid radio color class of G, because any pair of vertices in C is at distance at least three in G. Moreover, if we assign the colors 1, 3, ..., 2χ(G²) − 1 to the color classes of G², we obtain a valid radio coloring of G. Therefore,

    χ(G²) ≤ χ₂(G) ≤ 2χ(G²) − 1.

Additionally, if A is a polynomial-time α-approximation algorithm for coloring G² and |A(G²)| denotes the number of colors used by A, then we can easily compute a valid radio coloring of G of value no more than 2|A(G²)| − 1. Since

    χ₂(G) ≤ 2|A(G²)| − 1 ≤ 2αχ(G²) − 1 ≤ 2αχ₂(G) − 1,

this is a 2α-approximation for radio coloring in G.
Lemma 3.2 For any graph G and real number α ≥ 1, a polynomial-time 2α-approximation algorithm for radio coloring in G can be obtained from any polynomial-time α-approximation algorithm for coloring G².

Since radio labelling assigns distinct integer labels to all the vertices of a graph, it is a vertex arrangement problem. In particular, we show that radio labelling is equivalent to HP(1,2) in the complementary graph.

Lemma 3.3 Graph radio labelling and HP(1,2) are equivalent.

Proof. Given an instance of radio labelling, i.e. a graph G(V, E), the corresponding instance of HP(1,2) is a complete graph Ḡ on the vertex set V, and the distance function d is defined for all v, u ∈ V, v ≠ u, by

    d(v, u) = 2 if {v, u} ∈ E,
              1 otherwise.

Given any valid radio labelling L (for G) of value RL(L), we can obtain a Hamiltonian path H for Ḡ by traversing all the vertices in increasing order of their labels. Moreover, the following claim implies that the length of the Hamiltonian path H is exactly RL(L) − 1.

Claim 1 The length of the path up to any vertex of label i, i = 1, ..., RL(L), is exactly i − 1.

Proof of the Claim. We prove the claim by induction on i. Clearly, it is true for the first vertex, where i = 1. Assume inductively that it is true for any vertex v of label i ≥ 1, and let u be the next vertex in the Hamiltonian path. We proceed by case analysis:

1. If the label of u is i + 1, the edge {v, u} is not present in G and, by construction, d(v, u) = 1. Thus, the length of the path up to vertex u is exactly i.
2. If the label of u is i + 2, by the construction of the Hamiltonian path, there does not exist a vertex of label i + 1. Therefore, the edge {v, u} ∈ E, and d(v, u) = 2. Consequently, the path up to u has length exactly i + 1. □

Conversely, given an instance (Ḡ, d) of HP(1,2) on the vertex set V, |V| = n, an instance G(V, E) of radio labelling can be obtained by connecting exactly the vertex pairs that are at distance 2. Furthermore, given a Hamiltonian path H = (v_1, v_2, ..., v_n) for Ḡ of length l(H), we obtain a valid radio labelling L (for G) as follows:

1. RL(v_1) = 1.
2. For i = 1, ..., n − 1,
   (a) RL(v_{i+1}) = RL(v_i) + 1, if d(v_i, v_{i+1}) = 1. By the construction of G(V, E), in this case {v_i, v_{i+1}} ∉ E.
   (b) RL(v_{i+1}) = RL(v_i) + 2, if d(v_i, v_{i+1}) = 2. By the construction of G(V, E), in this case {v_i, v_{i+1}} ∈ E.

By construction, all the vertices get distinct labels, while, if an edge {v_i, v_{i+1}} is present in E, the vertices v_i and v_{i+1} are assigned non-adjacent labels. Therefore, the resulting radio labelling is a valid one. Additionally, the last vertex of the path v_n is assigned the label l(H) + 1, which is the largest label used. Hence, RL(L) = l(H) + 1. •
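The converse construction in the proof (steps 1, 2a, 2b above) translates directly to code. A sketch, assuming the Hamiltonian path is given as a vertex sequence and G as adjacency sets, so that consecutive non-adjacent vertices receive consecutive labels and consecutive adjacent vertices leave one label unused:

def labelling_from_path(path, adj):
    """path: Hamiltonian path of the HP(1,2) instance; adj: adjacency sets of G."""
    labels = {path[0]: 1}
    for prev, cur in zip(path, path[1:]):
        step = 2 if cur in adj[prev] else 1   # an edge of G has distance 2 in the HP(1,2) instance
        labels[cur] = labels[prev] + step
    return labels                             # maximum label = length of the path + 1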
4
An Exact Algorithm for Constant Number of Colors
Since radio labelling is equivalent to HP(1,2) and a coloring corresponds to a partition into cliques of the complementary graph, we present the technical part of the proof in the context of Hamiltonian paths/cycles and partitions into cliques. In particular, we prove that, given a partition of a graph G(V, E) into constant number of cliques, we can decide if G is Hamiltonian in polynomial time.
4.1
Hamiltonian Cycles and Partitions into Cliques
Theorem 4.1 Given a graph G(V,E), \V\ = n, and a partition of V into K > 1 cliques, there exists a deterministic algorithm that runs in time 0(nK^2li~1A and decides ifG is Hamiltonian. IfG is Hamiltonian, the algorithm outputs a Hamiltonian cycle. Proof. A set of inter-clique edges M C E is an HC-set if M can be extended to a Hamiltonian cycle using only clique-edges, i.e. there exists an M' c ' C E of clique edges such that M U M^"1 is a Hamiltonian cycle. We first show that, given a set of inter-clique edges M, we can decide if M is an HCset and construct a Hamiltonian cycle from M in poly(n, K) time (Proposition 4.2).
85 Then, we prove that G is Hamiltonian iff there exists an HC-set of cardinality at most K(K — 1) (Lemmas 4.3 and 4.4). The algorithm exhaustively searches all the sets of inter-clique edges of cardinality at most K(K — 1) for an HC-set. Additionally, we conjecture that, if G is Hamiltonian, then there exists an HC-set of cardinality at most 2(K — 1). We prove this conjecture for a special case (Lemma 4.5) and we use Lemma 4.3 to show the equivalence to Conjecture 1. Let C — {C\,... ,CK} be a partition of V into K > 1 cliques. Given a set M C E of inter-clique edges, the clique graph T(C, M) contains exactly K vertices, that correspond to the cliques of C, and represents how the edges of M connect the different cliques. If M is an HC-set, then the corresponding clique graph T(C, M) is connected and eulerian. However, the converse is not always true. Given a set of inter-clique edges M, we color an edge RED, if it shares a vertex of G with another edge of M. Otherwise, we color it BLUE. The corresponding edges of G are colored with the same colors, while the remaining edges (E — M) are colored BLACK. Additionally, we color RED each vertex v 6 V, which is the common end vertex of two or more RED edges. We color BLUE each vertex » e l ^ t o which exactly one edge of M (RED or BLUE) is incident. The remaining vertices of G are colored BLACK (Figure 2).
Let H be any Hamiltonian cycle of G and let M be the corresponding set of interclique edges. Obviously, RED vertices cannot be exploited for visiting any BLACK vertices belonging to the same clique. If H visits a clique C; through a vertex v, and leaves C, through a vertex u, then v,u € Ct consist of a BLUE vertex pair. A BLUE pass through a clique Ct is a simple path of length at least one, that entirely consists of non-RED vertices of Cj. A clique d is covered by M, if all the vertices of C; have degree at most two in M, and the existence of a non-RED vertex implies the existence of at least one BLUE vertex pair. The following proposition characterizes HC-sets. P r o p o s i t i o n 4.2 A set of inter-clique edges M is an HC-set iff the corresponding clique graph T(C, M) is connected, eulerian, and, (a) For all i = 1 , . . . , K, Ci is covered by M; and (b) There exists an eulerian trail R for T such that: For any RED vertex v G V, R passes through v exactly once using the corresponding RED edge pair. Proof. Any HC-set corresponds to a connected, eulerian clique graph T(C, M) that fulfills both (a) and (b). Conversely, we can extend M into a Hamiltonian cycle H following the eulerian trail R. All the RED vertices can be included in H with degree two because of (b). Moreover, since R is an eulerian trail and (a) holds for M, all the BLUE and the BLACK vertices can be included in H with degree two. Therefore, H is a Hamiltonian cycle. •
86
(a) A graph partitioned into 3 cliques
0 9 ^
(b) An HC-set with 5
(c) A RED Hamiltonian Cycle
A black vertex A red vertex A blue vertex
-
— A blue edge A red edge A black edge A black edge of the cycle - - - • An edge to be removed
(d) A cycle using less RED edges
(e) An HC-set with 3 edges
Figure 2: An application of Lemmas 4.3 and 4.4. The proof of Proposition 4.2 implies a deterministic procedure for deciding if a set of inter-clique edges is an HC-set in poly(n, K) time. Moreover, in case that M is an HC-set, this procedure outputs a Hamiltonian cycle.
Lemma 4.3 Let BK > 2 be some integer only depending on K such that, for any graph G(V,E) and any partition of V into K cliques, if G is Hamiltonian and \V\ > BK, then G contains at least one Hamiltonian cycle not entirely consisting of inter-clique edges (RED vertices). Then, for any graph G(V, E) and any partition of V into K cliques, G is Hamiltonian iff it contains a Hamiltonian cycle with at most BK inter-clique edges.
Proof. Let H be the Hamiltonian cycle of G containing the minimum number of interclique edges and let M be the corresponding set of inter-clique edges. Assume that \M\ > BK. The hypothesis implies that H cannot entirely consist of RED vertices. Therefore, H should contain at least one BLUE vertex pair.
87 We substitute any BLUE pass of H through a clique C* with a single RED super-vertex v, that also belongs to the clique C*. Hence, v is connected to all the remaining vertices of C, using BLACK edges. These substitutions result in a cycle H' that entirely consists of RED vertices, and contains exactly the same set M of inter-clique edges with H. Obviously, the substitutions of all the BLUE passes of H with RED super-vertices result in a graph G'(V, E') that is also Hamiltonian, |V'| > BK, and V is partitioned into K cliques. Moreover, for any Hamiltonian cycle He of G', the reverse substitutions of all the RED super-vertices v with the corresponding BLUE passes result in a Hamiltonian cycle of G that contains exactly the same set of inter-clique edges with He (Figure 2). Since H' is a Hamiltonian cycle that entirely consists of inter-clique edges and \V'\ > BK, the hypothesis implies that there exists another Hamiltonian cycle of G' that contains strictly less inter-clique edges than H'. Therefore, there exists a Hamiltonian cycle of G that contains less inter-clique edges than H. • Lemma 4.3 implies that, in order to prove the upper bound on the cardinality of a minimum HC-set, it suffices to prove the same upper bound on the number of vertices of Hamiltonian graphs that (i) can be partitioned into K cliques, and (ii) only contain Hamiltonian cycles entirely consisting of inter-clique edges. It should be intuitively clear that such graphs cannot contain an arbitrarily large number of vertices. L e m m a 4.4 Given a graph G(V, E) and a partition of V into K cliques, G is Hamiltonian iff there exists an HC-set M such that \M\ < K(K — 1). Proof. By definition, the existence of an HC-set implies that G is Hamiltonian. Conversely, let H be the Hamiltonian cycle of G that contains the minimum number of inter-clique edges, and let M be the corresponding HC-set. If \M\ < K(K - 1), then we are done. Otherwise, Lemma 4.3 implies that it suffices to prove the same upper bound on |V\ for graphs G(V, E) that only contain Hamiltonian cycles entirely consisting of inter-clique edges. Assume that \V\ = \M\ and the coloring of V under M entirely consists of RED vertices, and consider an arbitrary orientation of the Hamiltonian cycle H (e.g. a traversal of the edges of H in the clockwise direction). If there exist a pair of cliques C; and Cj and four vertices V\, v-i e Q and u\, %2 € Cj, such that both vx are followed by ux (x = 1, 2) in a traversal of H, then the BLACK edges {vi, v2} and {ui, M2} can be used instead of {vi, u^} and {^2, W2} in order to obtain a Hamiltonian cycle containing less inter-clique edges than H. The previous situation can be avoided only if, for all i = 1 , . . . , K, and j = 1 , . . . , K, j / i, at most one vertex u, € Cj is followed by a vertex u € Cj in any traversal of H. Hence, if |V| > K(K— 1), then G contains at least one Hamiltonian cycle not entirely consisting of inter-clique edges. Alternatively, any HC-set M of minimum cardinality contains at most two edges between any pair of cliques Cj and Cj. Thus, M contains at most K(K — 1) inter-clique edges. D
88
Figure 3: An HC-set containing exactly 2(K — 1) edges. Therefore, we can decide if G is Hamiltonian in time 0[n^2K~r>\ because the number of the different edge sets containing at most K(K — 1) inter-clique edges is at most n 2 "'" - 1 ), and we can decide if a set of inter-clique edges is an HC-set in time poly(n, K) = 0(nK). • A BLUE Hamiltonian cycle is a Hamiltonian cycle that does not contain any RED vertices or edges. We can substantially improve the bound of K(K — 1) for graphs containing a BLUE Hamiltonian cycle. Lemma 4.5 Given a graph G(V, E) and a partition ofV into K cliques, ifG contains a BLUE Hamiltonian cycle, then there exists an HC-set M entirely consisting of BLUE edges, such that \M\ < 2(K — 1). Proof. Let H be the BLUE Hamiltonian cycle of G that contains the minimum number of inter-clique edges and let M be the corresponding HC-set. Assume that \M\ > 2(K — 1), otherwise we are done. Notice that, since RED vertices cannot be created by removing edges, any M' C M that corresponds to an eulerian, connected, clique graph T(C, M') is an HC-set that only contains BLUE edges. Let ST{C,MS) be any spanning tree of T(C,M). Since \Ms\ = K — 1, the graph T< S '(C, M-Ms), which is obtained by removing the edges of the spanning tree from T, contains at least K edges. Therefore, T ' s ' contains a simple cycle L. The removal of the edges of L does not affect connectivity (the edges of L do not touch the spanning tree ST), and subtracts two from the degrees of the involved vertices/cliques. Clearly, the clique graph T'(C, M — L) is connected and eulerian, and M — L, \M — L\ < \M\, is an HC-set. • Figure 3 shows an HC-set of cardinality exactly 2(K— 1) that corresponds to a Hamiltonian cycle using the minimum number of inter-clique edges. Therefore, the bound of 2(K — 1) is tight. However, we are not able to construct HC-sets that contain more than 2(K — 1) edges and correspond to Hamiltonian cycles using the minimum number of inter-clique edges. Hence, we conjecture that the bound of 2(K — 1) holds for any graph and any partition into K cliques. An inductive (on |V|) application of Lemma 4.3 suggests that this conjecture is equivalent to the following: Conjecture 1 For any Hamiltonian graph G(V,E) of 2K — 1 vertices and any partition of V into K cliques, there exists at least one Hamiltonian cycle not entirely
89 consisting of inter-clique edges. The previous conjecture is very similar to a theorem proved by C.A.B. Smith in 1946. This theorem states that the number of Hamiltonian cycles that contain any given edge of a cubic graph (i.e. a simple regular graph of degree 3) is even. A simple algebraic proof of this theorem can be found in [3]. Smith's Theorem implies Conjecture 1 in case that the partition C contains at least k — 1 cliques of even cardinality and at most 1 clique of odd cardinality. If we only consider Hamiltonian graphs, Smith's Theorem can be applied to graphs G(V, E) consisting of a Hamiltonian cycle and a perfect matching. Therefore, Smith's Theorem is applicable to graphs that contain a RED Hamiltonian cycle, such that V is partitioned into n=^ cliques of two vertices (perfect matching). A more general class of graphs fulfills the hypothesis of Conjecture 1. In particular, Conjecture 1 is applicable,to the graphs consisting of a RED Hamiltonian cycle and K cliques of arbitrary cardinalities. However, the conclusion is weaker than the conclusion of Smith's Theorem, in the sense that it only claims the existence of a Hamiltonian cycle avoiding at least one inter-clique (RED) edge.
4.2
A Reduction from Radio Labelling to Hamiltonian Cycle
L e m m a 4.6 Given a graph G(V,E), \V\ = n, and a coloring of G with K colors, an optimal radio labelling can be computed in O l n l l ( & + 1 ' ) time.
Proof. Let G(V,E) be the complement of the input graph G. Obviously, RL(G) < n + K — 1. Therefore, at most K — 1 labels remain unused by an optimal radio labelling. Hence, any optimal solution to the corresponding HP(1,2) instance (see also the proof of Lemma 3.3) contains at most K — 1 non-edges (of G), and there exists a Hamiltonian cycle containing at most K non-edges (of G). Then, we show how to compute a Hamiltonian cycle containing the minimum number of non-edges to the complementary graph G. Let A be the algorithm of Theorem 4.1. We call A at most M = 0(n2li) times, with input the graphs G;(V, E U Ni), i = 1 , . . . , M. The sets N, are all possible subsets of non-edges of G with at most K elements, including the empty one. Let Gi be a Hamiltonian graph that corresponds to a set N, of minimum cardinality. Obviously, the Hamiltonian cycle produced by A(Gi) is a Hamiltonian cycle with the minimum number of non-edges for G. D Since any planar graph can be colored with constant number of colors in polynomial time, the following theorem is an immediate consequence of Lemma 4.6.
90 T h e o r e m 4.7 An optimal radio labelling of a planar graph can be computed in polynomial time. Remark. Conjecture 1 implies that, given a graph G(V,E) and a coloring with K colors, an optimal radio labelling can be computed in time noi-K\
5 5.1
Algorithms for On-line Radio Labelling On-line Radio Labelling - Benefit Version
We first analyze the performance of the greedy algorithm B G R E E D Y . B G R E E D Y assigns to a newly arrived request the least integer j , such that j has not been assigned to any previously accepted request, and the resultant radio labelling is a valid one. B G R E E D Y accepts the request, if and only if j is no more than the bound
L e m m a 5.1 The competitive ratio of B G R E E D Y is /S § Proof. The B G R E E D Y algorithm always accepts at least [f 1 requests, because, for i = 1, 2 , . . . , 111, B G R E E D Y can always assign a label no more than 2i — 1 to the i-th request. Since the optimal off-line algorithm cannot accept more than (5 requests, an upper bound of /?[fl on the competitive ratio of B G R E E D Y can be established. We prove that this ratio is precisely /3 | by exhibiting a special sequence of requests. Let Sm be any graph on m vertices containing a Hamiltonian path and Hm = Sm be the complementary graph. Lemma 3.3 implies that RL(Hm) = m. Additionally, let Lm be the graph obtained from the complete graph on y vertices, denoted Km/2, and Hm/2 by connecting any vertex of Km/2 to any vertex of Hm/2. By construction, RL(L m ) = ^ . For some bound /? > 0, let the request sequence consist of the vertices of the graph L2p, such that all the /3 vertices of Kp are requested before all the vertices of Hp. Clearly, B G R E E D Y can only accept the first f requests, while the optimal algorithm accepts the last j3 requests. This establishes the competitive ratio of B GREEDY. • Notice that B G R E E D Y does not take into account the structure of the optimal solution on the current set of requests. The previous instance shows that unless a deterministic on-line algorithm takes into account the optimal solution on the current set of requests, it cannot achieve competitive ratio better than /3 [ | 1 . Therefore
91 B G R E E D Y is optimal among on-line algorithms that do not reject by choice. We next prove that no randomized on-line algorithm can achieve a competitive ratio less than | against any adversary. Lemma 5.2 No randomized algorithm for the benefit version of on-line radio labelling can achieve competitive ratio less than | . Proof. We prove the lower bound against the oblivious adversary; since the oblivious adversary is the least powerful one, this implies a lower bound of § against any adversary. The proof consists of defining an appropriate probability distribution on the request sequences and applying Lemma 2.5. Let { 1 , . . . ,/3} be the set of available labels. We consider the following probability distribution on the request sequences: 1. With probability p, the request sequence consists of the vertices of the graph K0. 2. With probability 1 — p, the request sequence consists of the vertices of the graph L2g, such that the vertices of Kg precede the vertices of Hg. If the input is Kg, the optimal off-line algorithm accepts exactly f 1 requests, while if the input is L2/3, it accepts exactly /3 requests. Thus, the expected number of accepted requests for the optimal algorithm is
E(OPT) = p⌈β/2⌉ + (1 − p)β.
Let A be any deterministic algorithm. On input K_β, the algorithm A will accept x requests using labels from the set {1, . . . , 2x − 1}, where x is an integer between 1 and ⌈β/2⌉. The value of x is fixed after the choice of the input distribution, in order for A to be the deterministic on-line algorithm maximizing the expected number of accepted requests with respect to the specific probability distribution on the request sequences, i.e. for A to be the "best" deterministic algorithm. Thus, with probability p, A accepts x requests. Additionally, A has to accept exactly x requests from the graph K_β on input L_{2β}, because A is a deterministic on-line algorithm and the vertices of K_β precede the vertices of H_β. Consequently, A can accept at most (β − 2x) requests from the subgraph H_β. Thus, the expected number of accepted requests of the best deterministic algorithm on this distribution is at most
E(A) = px + (1 − p)(β − x).
For p = 1/2, this reduces to E(A) = β/2, and
E(OPT) = (1/2)(⌈β/2⌉ + β) ≥ 3β/4.
Thus, no deterministic on-line algorithm can achieve a competitive ratio less than 3/2 against this probability distribution. Consequently, Lemma 2.5 implies that 3/2 is a lower bound on the performance of any randomized on-line algorithm against the oblivious adversary. □ In the on-line setting, we mainly face information-theoretic questions that have to do with the value of information in the computation of a minimum cost labelling. Thus, the lower bound holds for any on-line algorithm A, and it does not depend on the running time of A. On the other hand, even if we know the entire request sequence, radio labelling is NP-complete, and, therefore, not expected to be solvable optimally in polynomial time. However, the proof of Lemma 5.2 does not take into account the NP-completeness of radio labelling in H_β. Consequently, a polynomial-time on-line algorithm with competitive ratio 3/2 is unlikely to exist.
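The expected values derived above can be checked numerically. The small script below simply evaluates the reconstructed expressions for E(OPT) and E(A) and confirms that their ratio approaches 3/2; it assumes the form of the two formulas as given above.

```python
from math import ceil

def ratio_lower_bound(beta, p=0.5):
    """With probability p the input is K_beta, otherwise L_{2*beta}; the best
    deterministic algorithm accepts x requests from K_beta and at most
    beta - 2x further requests from H_beta."""
    e_opt = p * ceil(beta / 2) + (1 - p) * beta
    e_best = max(p * x + (1 - p) * (beta - x)
                 for x in range(1, ceil(beta / 2) + 1))
    return e_opt / e_best

print(ratio_lower_bound(1000))   # -> 1.5 (the bound of Lemma 5.2)
```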
5.2 On-line Radio Labelling - Assignment Version
We next analyze the performance of the greedy algorithm AGREEDY for the assignment version of on-line radio labelling, where the algorithm is not allowed to reject requests. AGREEDY assigns to a newly arrived request i, i = 1, . . . , n, the least integer j such that j has not been assigned to a previous request and the resulting radio labelling is a valid one. Since there is no upper bound on the labels used, such a label j always exists.
Lemma 5.3 The competitive ratio c_A of AGREEDY satisfies 2 − 3/n ≤ c_A ≤ 2 − 1/n.
Proof. For any sequence of n requests, the value of a labelling computed by AGREEDY is at most 2n − 1, since the labelling 1, 3, . . . , 2n − 1 is always a valid one. The upper bound on the competitive ratio follows from the fact that the value of an optimal labelling cannot be less than n. To show the lower bound, for any even integer n > 2, let U_n be the graph on n vertices obtained from K_n by removing the edges of a Hamiltonian path. Hence, the complement of U_n is a Hamiltonian path on n vertices, and RL(U_n) = n. Let u_1, u_2, . . . , u_n be an ordering of the vertices of U_n according to their appearance in the Hamiltonian path contained in the complementary graph; that is, for i = 1, . . . , n − 1, the vertices u_i and u_{i+1} are not adjacent in U_n. The adversary requests
all the vertices of U_n in the following order: for i = 0, . . . , n/2 − 1, it requests the vertex u_{i+1} followed by the vertex u_{n−i}. Claim 2 For i = 0, . . . , n/2 − 2, AGREEDY assigns to the vertex u_{i+1} the label 4(i+1) − 3 and to u_{n−i} the label 4(i+1) − 1. Moreover, AGREEDY computes a radio labelling of U_n of value 2n − 3. Proof of the Claim. We prove the first part of the claim by induction on i. Clearly, for i = 0, the vertex u_1 gets the label 1, and the vertex u_n gets the label 3, since u_1 and u_n are adjacent in U_n. Assume inductively that the claim holds for any i, 0 ≤ i < n/2 − 2. Then, for i + 1, the adversary requests the vertices u_{i+2} and u_{n−(i+1)}. In U_n, the vertex u_{i+2} is adjacent to all the previously requested vertices except u_{i+1}. Since the vertex u_{n−i} has been assigned the label 4(i+1) − 1, AGREEDY assigns to u_{i+2} the label 4(i+1) + 1 = 4(i+2) − 3. Similarly, since all the previously requested vertices except u_{n−i} are adjacent to the vertex u_{n−(i+1)}, AGREEDY assigns to u_{n−(i+1)} the label 4(i+2) − 1. Therefore, for i = n/2 − 2, the AGREEDY algorithm assigns to the vertex u_{n/2−1} the label 2n − 7, and to u_{n/2+2} the label 2n − 5. Then, the adversary requests the vertices u_{n/2} and u_{n/2+1}, which get the labels 2n − 3 and 2n − 4 respectively. □ Since the optimal off-line algorithm can easily compute a radio labelling of value exactly n, we obtain a lower bound of 2 − 3/n on the competitive ratio of the algorithm AGREEDY. □
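The adversarial ordering in the proof of Claim 2 is easy to replay. The sketch below is an illustrative implementation of AGREEDY (the names and data layout are ours) applied to U_n with the interleaved request order u_1, u_n, u_2, u_{n−1}, . . .

```python
def agreedy(order, adjacent):
    """Sketch of AGREEDY (assignment version, no rejections): every request
    gets the least unused label that keeps the radio labelling valid."""
    labels = {}
    for v in order:
        j = 1
        while j in labels.values() or any(
                abs(labels[u] - j) < 2 for u in labels if adjacent(u, v)):
            j += 1
        labels[v] = j
    return labels

n = 10                                   # U_n: complete graph minus a Hamiltonian path
adjacent = lambda u, v: abs(u - v) != 1  # only u_i and u_{i+1} are non-adjacent
order = [w for i in range(n // 2) for w in (i + 1, n - i)]   # u_1, u_n, u_2, u_{n-1}, ...
print(max(agreedy(order, adjacent).values()))   # -> 2*n - 3 = 17, while RL(U_n) = n
```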
6 Open Problems
An interesting direction for further research is to obtain a polynomial-time approximation algorithm for radio coloring in planar graphs using our exact algorithm for radio labelling. One approach may be to decompose the planar graph into subgraphs, such that almost all vertices of each subgraph get distinct colors by an optimal (or near optimal) assignment. Then, our exact algorithm can be used for computing an optimal radio labelling for each of the resulting subgraphs. Obviously, the decomposition of the planar graph and the combination of the partial assignments into a near optimal solution require an appropriate planar separator theorem. Another research direction is the conjecture that, given a graph G(V, E) and a partition of V into K cliques, G is Hamiltonian iff there exists a Hamiltonian cycle containing at most 2(K − 1) inter-clique edges. This would imply an n^{O(K)} exact algorithm for radio labelling in the complementary graph G, given a coloring of G with K
colors. In this paper, we prove the conjectured bound for graphs and partitions that contain at least one Hamiltonian cycle H, such that any vertex v ∈ V has degree at most one in the set of inter-clique edges of H. Additionally, it may be possible to improve the complexity of the algorithm of Theorem 4.1 to O(f(K)p(n)), where f(K) is a fixed function of K, e.g. f(K) = 2^K, and p(n) is a fixed polynomial in n of degree not depending on K, e.g. p(n) = n.
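A small experimental check of the conjectured bound can be carried out by exhaustive search on tiny instances. The following sketch (exponential, intended for experimentation only, with names of our own choosing) enumerates Hamiltonian cycles and counts inter-clique edges.

```python
from itertools import permutations

def conjecture_holds(n, edges, clique_of):
    """Check, by brute force, whether a Hamiltonian graph G with the given
    partition into K cliques admits a Hamiltonian cycle with at most
    2*(K-1) inter-clique edges.  `clique_of` maps vertex -> clique index."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    K = len(set(clique_of.values()))
    best = None
    for perm in permutations(range(1, n)):
        cycle = (0,) + perm
        if all(cycle[(i + 1) % n] in adj[cycle[i]] for i in range(n)):
            inter = sum(clique_of[cycle[i]] != clique_of[cycle[(i + 1) % n]]
                        for i in range(n))
            best = inter if best is None else min(best, inter)
    if best is None:
        return True               # G is not Hamiltonian: nothing to check
    return best <= 2 * (K - 1)
```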
Image Space Analysis for Vector Optimization and Variational Inequalities. Scalarization

F. Giannessi, Department of Mathematics, University of Pisa, Via F. Buonarroti 2, 56127 Pisa, Italy
L. Pellegrini Institute of Mathematics, University of Verona, Via dell'Artigliere 19 37129 Verona, Italy
Abstract
Vector constrained extremum problems and Vector Variational Inequalities are considered, and a separation scheme is introduced. Starting from such a scheme, several theoretical aspects can be developed as well as methods of solution; for instance, scalarization of Vector Optimization is analysed and a method is described which does not require any assumption on the problem. The analysis is extended to a Vector Variational Inequality; this turns out to be equivalent to a scalar Quasi-Variational Inequality. K e y w o r d s : Vector Optimization, Vector Variational Inequalities, Separation, Scalarization.
1 Introduction
In the last few years there has been a growing interest in vector problems, both from a theoretical point of view and as concerns applications to real problems. This paper aims to contribute to the definition of a general scheme for carrying out the analysis of vector problems as well as for finding methods of solution. Such a scheme was proposed in [10]; it has produced some developments in the field of Vector Variational Inequalities [13] besides that of Vector Optimization [1,7]. As far as Vector Optimization is concerned, some papers have introduced aspects of the image space analysis proposed in [10], independently from each other and from [10]. In the next section we will briefly recall the general scheme which we pose as the basis for analysing a vector extremum problem. In Sect. 3, with reference to scalarization, it will be shown how the above scheme can help us in carrying out the analysis of several topics. In Sect. 4 we discuss how to extend the approach to Variational Inequalities. It is shown that a Vector Variational Inequality is equivalent to a scalar Quasi-Variational Inequality. Let the positive integers ℓ, m, n and the cone C ⊆ R^ℓ be given. In the sequel it will be assumed that C is convex, closed and pointed with apex at the origin and with int C ≠ ∅, i.e. with nonempty interior.¹ Consider the vector-valued functions f : R^n → R^ℓ, g : R^n → R^m, and the subset X ⊆ R^n. We will consider the following vector minimization problem, which is called generalized Pareto problem:

(1.1)    min_{C\{0}} f(x) , subject to x ∈ R := {x ∈ X : g(x) ≥ 0},
where min_{C\{0}} marks vector minimum with respect to the cone C\{0}: y ∈ R is a (global) vector minimum point (in short, v.m.p.) of (1.1) iff²

(1.2)    f(y) ≱_{C\{0}} f(x) , ∀x ∈ R,

where the inequality means f(y) − f(x) ∉ C\{0}. At C = R^ℓ_+, (1.1) becomes the classic Pareto vector problem. A vector minimization problem which is often associated with (1.1) is the following one³, called weak vector problem:

(1.3)    min_{int C} f(x) , s.t. x ∈ R,
where min_{int C} marks vector minimum with respect to the cone int C: y ∈ R is a (global) v.m.p. of (1.3) iff

(1.4)    f(y) ≱_{int C} f(x) , ∀x ∈ R,

where the inequality means f(y) − f(x) ∉ int C. The term weak comes from the following tradition. Notwithstanding the fact that (1.1) and (1.3) are distinct problems, since the solutions of (1.1) are solutions also of (1.3) (but not necessarily vice versa), the solutions of (1.3) are often called "weak solutions" of (1.1). At C = R^ℓ_+, (1.3) is called the weak vector Pareto problem. In the sequel we will outline a separation scheme for (1.1). It would also be interesting to carry out such a scheme for (1.3) too. Connections between the two schemes might be useful.

¹ Some of the propositions which will be established do not require all these assumptions on C.
² Without cutting off the apex of C, (1.2) would require that y be the unique v.m.p. of (1.1): at any x ∈ R with f(x) = f(y), inequality (1.2) - with C instead of C\{0} - would become 0 ∉ C, which is false.
³ Different from (1.1), since different cones obviously identify different vector problems.
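For C = R^ℓ_+ the two definitions can be tested directly on a finite sample of feasible points. The following sketch is our own illustration (not taken from the paper): it checks conditions (1.2) and (1.4) for each sampled objective vector.

```python
import numpy as np

def is_vmp(fy, F):
    """y is a v.m.p. for C = R^l_+ iff no feasible x has f(y) - f(x) in C\\{0},
    i.e. no x is at least as good in every objective and strictly better in one."""
    d = fy - F                                   # rows: f(y) - f(x)
    return not np.any(np.all(d >= 0, axis=1) & np.any(d > 0, axis=1))

def is_weak_vmp(fy, F):
    """Weak v.m.p. (problem (1.3)): no feasible x with f(y) - f(x) in int C,
    i.e. no x strictly better in every objective."""
    return not np.any(np.all(fy - F > 0, axis=1))

F = np.array([[1.0, 4.0], [2.0, 2.0], [2.0, 4.0], [3.0, 1.0]])  # f over a finite sample of R
print([(is_vmp(fy, F), is_weak_vmp(fy, F)) for fy in F])
# [(True, True), (True, True), (False, True), (True, True)] -- (2,4) is only a weak solution
```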
2 A Separation Scheme
Let us now consider problem (1.1). It is trivial to note that (1.2) is satisfied iff the system (in the unknown x):

(2.1)    f(y) − f(x) ∈ C , f(y) − f(x) ≠ 0 , g(x) ≥ 0 , x ∈ X

is impossible. Consider the sets:

H := (C\{0}) × R^m_+ , K := {(u,v) ∈ R^ℓ × R^m : u = f(y) − f(x), v = g(x), x ∈ X}.

System (2.1) is impossible iff

(2.2)    H ∩ K = ∅.
Hence, to prove that y is a v.m.p. of (1.1) it is equivalent to show (2.2). Since in general this is too difficult, an approach for proving (2.2) consists in obtaining the existence of a hyperplane, such that H belongs to an open halfspace (defined by the hyperplane) and K to its complement. The above hyperplane (if any) separates (in a disjunctive way) H and K. More generally, we can look for a functional, such that H belongs to its positive level set and K to the nonpositive one. H and K are subsets of the space R^{ℓ+m}, which is called image space; K is called the image of problem (1.1). It is easy to see that (1.1) is equivalent to the vector maximization problem:

(2.3)    max_{C\{0}} u , s.t. (u,v) ∈ K ∩ (R^ℓ × R^m_+),

where max_{C\{0}} marks vector maximum with respect to the cone C\{0}; more precisely, (ū, v̄) ∈ K ∩ (R^ℓ × R^m_+) is a vector maximum point of (2.3) iff

(2.4)    u ≱_{C\{0}} ū , ∀(u,v) ∈ K ∩ (R^ℓ × R^m_+).
(2.3) is called the image problem associated to (1.1). In [10] the above approach was proposed both for Scalar (ℓ = 1) and Vector (C = R^ℓ_+, ℓ ≥ 1) Optimization as well as for Variational Inequalities. Such a proposal has led to some developments in the scalar case (see, e.g., [2,6,11,14,21] and references therein), and to some initial results [1,7,13] also in the vector case. Recently, some papers (see, e.g., [3,8]) have been published in the field of Vector Optimization which introduce, more or less explicitly and independently from each other, the image space (called objective or balance space). The authors of such papers seem not to know the proposal made in [10] and the subsequent developments. For this reason, the present paper aims to contribute to avoiding the re-introduction of already existing theories and to stimulate research in the field. Now, we will briefly outline a way of deriving several topics from the separation scheme; for details one is referred to [11]. Let C* := {c* ∈ R^ℓ : ⟨c*, c⟩ ≥ 0, ∀c ∈ C} denote the positive polar of C. Consider the function w : R^ℓ × R^m → R given by w = w(u,v; θ,λ) = ⟨θ,u⟩ + ⟨λ,v⟩, where θ ∈ R^ℓ and λ ∈ R^m are parameters. We then have

(u,v) ∈ H , θ ∈ int C* , λ ≥ 0 ⇒ ⟨θ,u⟩ + ⟨λ,v⟩ > 0,

or

(2.5)    θ ∈ int C* , λ ∈ R^m_+ ⇒ H ⊆ {(u,v) : w(u,v; θ,λ) > 0}.

Therefore, it is easy to see that the existence of θ ∈ int C* and λ ∈ R^m_+, such that

(2.6)    ⟨θ, f(y) − f(x)⟩ + ⟨λ, g(x)⟩ ≤ 0 , ∀x ∈ X,
is a sufficient condition for (2.2), and thus for the impossibility of (2.1), and hence for y to be a v.m.p. of (1.1). Under suitable assumptions, it is also a necessary condition. As has been done in [11], from (2.6) we can derive, besides optimality conditions, duality, penalty and gap function theory. In fact, (2.6) can be considered as coming from the minimization of the scalar function ⟨θ, f(x)⟩ over R, θ being a (scalarizing) parameter; indeed, this is precisely the classic way of scalarizing (1.1). Of course, the above way of separating H and K is not the only one. Instead of a linear separation function, like the above w, we can introduce nonlinear ones; among these, a particularly interesting one is a piecewise linear separation function, which has been studied in [4]. If this piecewise linear separation scheme is adopted, then we recover the classic vector duality theory [22].
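Condition (2.6) lends itself to a direct numerical check over a sampled set X. The sketch below uses a toy instance of our own choosing (not one of the paper's examples) and verifies the sufficient condition for a given θ ∈ int C* and λ ≥ 0.

```python
import numpy as np

def check_separation(f, g, X_samples, y, theta, lam):
    """Test condition (2.6) on the sample: with theta in int C* and lam >= 0,
    <theta, f(y) - f(x)> + <lam, g(x)> <= 0 for all sampled x in X certifies
    (on the sample) that the image K misses H, hence that y is a v.m.p."""
    fy = f(y)
    w = np.array([theta @ (fy - f(x)) + lam @ g(x) for x in X_samples])
    return np.all(w <= 1e-12)

# toy data: l = 2, m = 1, C = R^2_+; every feasible point of this f is a v.m.p.
f = lambda x: np.array([x[0], 1.0 - x[0]])
g = lambda x: np.array([x[0] * (1.0 - x[0])])          # feasible region is [0, 1]
X = [np.array([t]) for t in np.linspace(-0.5, 1.5, 201)]
print(check_separation(f, g, X, np.array([0.3]),
                       np.array([1.0, 1.0]), np.array([0.0])))   # -> True
```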
3 On the Scalarization of Vector Optimization
Now, let us discuss one of the most analysed topics in Vector Optimization: scalarization of (1.1), namely how to set up a scalar minimization problem which leads to detecting all the solutions of (1.1), or at least one. Assume that X is convex. Let us recall that f is called a C-function iff ∀x′, x″ ∈ X we have [11]:

(3.1)    (1 − α)f(x′) + αf(x″) − f((1 − α)x′ + αx″) ∈ C , ∀α ∈ [0,1].
When C ⊇ R^ℓ_+ or C ⊆ R^ℓ_+, then f is called C-convex. At ℓ = 1 and C = R_+ we recover the classic definition of convexity. Now, ∀y ∈ X, consider the sets⁴:

S(y) := {x ∈ X : f(x) ∈ f(y) − C} , S_p(y) := {x ∈ X : ⟨p, f(x)⟩ ≤ ⟨p, f(y)⟩},
where p ∈ C*. When X = R^n and C = R^ℓ_+, the above sets are level sets of f and of ⟨p, f⟩, respectively. If f is linear, then S(y) is a cone with apex at y, and S_p(y) a supporting halfspace of S(y) at its apex. Proposition 1. If f is a C-function, then S(y) is convex ∀y ∈ X. Proof. x′, x″ ∈ S(y) ⇒ ∃c′, c″ ∈ C such that f(x′) = f(y) − c′ and f(x″) = f(y) − c″. From these equalities, since the convexity of C implies c̄ := (1 − α)c′ + αc″ ∈ C, ∀α ∈ [0,1], we find:

(3.2)    (1 − α)f(x′) + αf(x″) = f(y) − c̄ , ∀α ∈ [0,1].
From (3.1) we have that ∃c ∈ C, such that: f((1 − α)x′ + αx″) = (1 − α)f(x′) + αf(x″) − c = f(y) − c̄ − c = f(y) − ĉ, where (1 − α)x′ + αx″ ∈ X (since X is convex), ĉ := c̄ + c ∈ C because C is a convex cone, and the last but one equality comes from (3.2). It follows that (1 − α)x′ + αx″ ∈ S(y), ∀α ∈ [0,1], ∀x′, x″ ∈ S(y). Now, consider any fixed p ∈ C*, and introduce the (scalar) minimization problem: (3.3)
min ⟨p, f(x)⟩ , s.t. x ∈ R ∩ S(y),
which depends on the parameter y.⁴

⁴ In the sequel, p will not play the role of a parameter and will be considered fixed.
102 Proposition 2. Let X be convex. If / is a C-function, g is concave and p G C*, then (3.3) is convex. Proof. We have to show that (p, / ) and RnS(y)
W,x"
are convex, p e C* and (3.1) imply,
eX, (p, (1 - a ) / ( a 0 + a / ( i " ) - / ( ( l - a ) z ' + ax")) > 0 , Va 6 [0,1],
or (p, / ( ( l - a)a:' + ax")) < (1 - a)(p, /(*')) + a(p, / ( * " ) } , Va G [0,1], which expresses the convexity of (p, f{x)). The convexity of X and the concavity of g give the convexity of R. Because of Proposition 1 we obtain the convexity of S(y) and hence that of R n S(y). D
Proposition 3 . If p € C*, then (3.4)
S(y)CSp(y)
, y e S(y)nSp(y)
, Vy e R» .
Proof, a; G 5(y) => 3c € C such that f(x) = f(y) — c. From this equality, taken into account that p G C* and c G C imply (p, c) > 0, we have: (P,f(x))
= (p,/(l/)) -,
The 1st of (3.4) follows. O e C o j e (3.4) holds.
S(y);y
G S p (y) is trivial; hence the 2nd of O
Now, let us state some properties; they might be useful in defining a method for finding one or all the solutions of (1.1) by solving (3.3). Proposition 4. Let p G intC* be fixed. Then, (2.1) is impossible - and hence y is a solution of (1.1) - iff the system (in the unknown x): (3.5)
xeX
is impossible. Furthermore, the impossibility of (3.5) is a necessary and sufficient condition for y to be a (scalar) minimum point of (3.3). Proof. The 1st of (3.5)=> f(y) - f(x) / 0, so that the possibility of (3.5) implies that of (2.1). The 1st of (2.1) and p G intC* imply the 1st of (3.5), so that the possibility of (2.1) implies that of (3.5). By replacing the 1st of (3.5) equivalently with (p,f(x)) < (p,f(y)), we immediately see the triviality of the 2nd part of the statement. Dl
103 Proposition 4 shows that y° is a v.m.p. of (1.1) iff it is a (scalar) minimum point of (3.3) at y = y°. For every y G R let A(y) denote the set of solutions of (3.3); Proposition 4 states that y° is a v.m.p. of (1.1) iff it is a fixed point of the point-toset map A : X =t X. The following results show how it is possible to determine such a fixed point. Proposition 5. We have: (3.6)
x° G S(y°)
=• S(x°) C S(y°).
Proof. x° G S(y°) => 3c° 6 C such that f(x°) = f{y°) - c°. x € S(x°) => 3c G C such that / ( £ ) = /(a; 0 ) — c. Summing side by side the two equalities we obtain f(x) = f(y°) — c, where c := c° + c G C since C is a convex cone. It follows that x G S{y°) and hence 5(a:°) C S(y°). O
Proposition 6. If x° is a (global) minimum point of (3.3) at y — y°, then a;0 is a (global) minimum point of (3.3) also at y = x°. Proof. Ab absurdo, suppose that a;0 is not a (global) minimum point of (3.3) at y = x°. Then (3.7)
3xeRnS(x°)
s.t.
(pj(x))<(p,f(x0)).
Because of Proposition 5, x° G S{y°) =^> S(x°) C S{y°). This inclusion and (3.7) imply xeRnS(y°) and (p,f(x))<(p,f(x0)), which contradict the assumption.
D
Proposition 6 suggests a method for finding a v.m.p. of (1.1). Let us choose any p G intC*; p will remain fixed in the sequel. Then, we choose any y° G R and solve the (scalar) problem (3.3) at y = y°. We find a solution a;0 (if any). According to Proposition 6, a:0 is a fixed point of the point-to-set map A and then it is a v.m.p. of (1.1). If we want to find all the solutions of (1.1), we must look at (3.3) as a parametric problem with respect to y; Propositions 4 and 6 guarantee that all the solutions of (1.1) will be reached. Note that such a scalarization method does not require any assumption on (1.1). Example 1. Let us set I = 2, m = 2, n = 1, X = R, C = R+, and /i(x) = 2 x - a ; 2 , f2{x) = 1 - x 2 , gi{x) = x, g2(x) = l - x , / = ( / i , / 2 ) , g= {gi,g2).
104 We find S(y) = {y} Vy £ [0,1]. Hence, the unique solution of (3.3) is y itself. By varying y, (3.3) gives, with its solutions, the interval [0,1], which is the set of v.m.p. of (1.1), as it is trivial to check. Now, let us use the classic scalarization [5,21], i.e. the scalar parametric problem which, here, becomes: (3.8)
min[cifi(x)
+ c2f2(x) = - ( c i + c2)x2 + 2cxx + c2], s.t. x G [0,1],
where (ci,C2) € intC* = int R^. are parameters. Such a scalarization aims to find all the v.m.p. of (1.1) by solving (3.8) with respect to all possible pairs of parameters (ci,c 2 ). In the present example it is easy to see that the only solutions of (3.8) are x = 0, or a; = 0 and x = 1, or x = 1, according to respectively c2 < C\, or c2 = ci, or c2 > C\. Hence, the scalarized problem (3.8) does not detect all the solutions of (1.1). In order to stress the differences between the classic scalarization of a Vector Optimization Problem and the present one, let us consider the following: Example 2. Let us set £ = 2,m = l , n = 2,X = R 2 , C = R ^ , x = {x1,x2),y (2/1,2/2), and
=
fi(x) = xi+ 2x 2 , f2(x) = 4xx + 2x2 , g{x) = ~\xx\ + x2. Choose p = (1,1) and y° = (0,1). Then (3.3) becomes: (3.9)
min(5xi + 4x 2 ) , s.t. - |xi| + x2 > 0, xx + 1x2 < 2,2x\ + x2 < 1.
The (unique) solution of (3.9) is easily found to be x° = (—2, 2). Because of Proposition 6 x° is a v.m.p. of (1.1) in the present case. Furthermore, we have R!~\ S(x°) = {x0}, namely the parametric system (in the unknown x): (3.10)
-|a;i I + x2 > 0 , x1 + 2x2
+ 2y2 , 2xx + x2 < 2yx + y2
has the (unique) solution a;0. In order to find all the v.m.p. of (1.1) we have to search for all y € R such that (3.10) has y itself as the (unique) solution. (3.10) is equivalent to /3
11 - )
/ kil < 1
x2 < ~^xx + ^(yi + 2y2) x2< - 2 x i + 2j/i + y2.
With xi > 0, (3.11) cannot have y as (unique) solution. Hence, we consider the case Xi < 0; by using Motzkin elimination method and by requiring a unique solution, (3.11) becomes: ~~2Xl
=
2^Vl
+ 2y2
^
'
Xl = 2yi
+ V2
'
Xl
~ °
105 and leads us to y\ + y2 = 0, y\ < 0 or
y = (yi = -t, V2 = t) , te [o, +oo[, which gives us all the v.m.p. of (1.1). Now, let us use the classic scalarization [5,21], i.e. the scalar parametric problem which, here, becomes: (3.12) min[ci/i(a:) + c2f2{x) = ( d + 4c 2 )zi + (2c : + 2c 2 )x 2 ], s.t.
-\x1\+x2>0,
where (c₁,c₂) ∈ C*\{0} = R²₊\{0} are parameters. Such a scalarization aims to detect all the v.m.p. of (1.1) by solving (3.12) with respect to all possible pairs of parameters (c₁,c₂). In the present example, it is trivial to see that the minimum of (3.12) exists iff −½c₁ ≤ c₂ ≤ ½c₁, and that at c₂ = ½c₁ the minimum points of (3.12) are all the v.m.p. of (1.1).
4 Vector Variational Inequalities
The approach outlined in Sect.2 in reference to (1.1) can be adopted also in fields other than Optimization. Indeed, the starting point is the impossibility of a system; (2.1) is a special case. Now, let F : R " —> R £ x n be a matrix-valued function, and consider the following Vector Variational Inequality: find y G K := {x £ X : g(x) > 0} such that (4.1)
F(y){x - y) £c\{0} 0 , Vz e K.
At £ = 1 (4.1) becomes the classic Stampacchia Variational Inequality 5 [15]. At £ > 1 and C = R ^ , the study of (4.1) has been proposed in [10]. Obviously, y is a solution of (4.1) iff the system (in the unknown x): (4.2)
F(y)(y - x) 6 C , F(y){y - x) # 0 , g(x) > 0 , x € X
is impossible. Starting from this point several topics can be developed; see, for instance, [13]. Here, we take into consideration the scalarization of (4.1). To this aim consider the sets: T(y) := { i 6 R " : F(y)x e F{y)y-C}
, Tp{y) := {x 6 R" :
(PF(y),y)},
where p G C* is considered a row-vector. When C = R^_, then the above sets are the level sets of the vector function F{y)x and of the scalar function (pF(y),x). Tp(y) is a supporting halfspace of T(y) at y, as Proposition 11 will show. If F(y) is a constant matrix and C is polyhedral, then T(y) is a polyhedron. 5
It would be interesting to extend the present analysis to the Minty Vector Variational Inequality
[12,16].
106 Proposition 7. T(y) is convex \/y e X. Proof. x',x" e T(y) => 3c', c" € C such that F{y)x' F(y)y — c". From these equalities, Va € [0,1] we have:
= F{y)y - d,F(y)x"
=
F(y)[(l - a)x' + ax"] = F{y)y - c(a), where c(a) := (1 — a)d + ad' e C since C is convex. Hence (1 — a)x' + ax" € T{y) for each a 6 [0,1]. O
Now, let us introduce the (scalar) Quasi-Variational Inequality which consists in finding y e K(y) := K II T(y) such that (4.3)
{Fp(y),x-y)>0,VxeK(y),
where Fp(y) := pF(y) and p has to be considered fixed; (4.3) is a scalarization of (4.1). F will be called C-operator iff (4.4)
[F{x') - F{x")]{x' - x") € C , Va:', x" G R n .
When C D R+ or C C R^_, then F will be called C'-monotone; when £ = 1, the notion of C-operator collapses to classic ones: F becomes monotone or antitone, according to C = R + or C = R_, respectively. Proposition 8. If X is convex, F is a C-operator, g is concave, and p e C*, then (4.3) is monotone. Proof. We have to show that K(y) is convex and Fp monotone. The assumptions on X and g imply the convexity of K; because of Proposition 7, T(y) is convex Vy € R n . Hence K(y) is convex Vy G R". p € C* and (4.4) imply (p, [ i V ) - F{x")}{x' - x")) > 0 W,x"
€ Rn,
or (Fp(x') - Fp{x"),x' - x") > 0 Va;',x" e E " .
Proposition 9. If p £ C*, then (4-5)
T(y)CTp(y)
, T(y)nTp(u)
D {y}.
D
107 Proof, x 6 T(y) =>• 3c € C such that F(y)x = F(y)y — c. From this equality, account being taken that p € C* and c € C imply (p, c) > 0, we have: (PF(y),x)
= (pF(y),y)
- (p,c) <
(pF(y),y).
The 1st of (4.5) follows. The 2nd of (4.5) is a consequence of the obvious relations V £ T(y) (due to the closure of C) and y € Tp(y). D
Now, let us state some preliminary properties, which might help in finding methods for solving (4.1) through (4.3). Proposition 10. If p € intC*, then (4.2) is impossible - and hence y is a solution of (4.1) - iff the system (in the unknown x): (4.6)
(PF(y), y-x)>0
, F{y){y - i ) e C , g(x) > 0 , x € X
is impossible. Furthermore, the impossibility of (4.6) is a necessary and sufficient condition for y to be a solution of (4.3). Proof. The 1st of (4.6)=> F(y)(y — x) =f= 0, so that the possibility of (4.6) implies that of (4.2). The 1st of (4.2) and p e intC* imply the 1st of (4.6). The last part of statement is trivial. D Note that Proposition 10 shows that a Vector Variational Inequality can be equival e n t ^ replaced with a scalar Quasi-Variational one. Consider, now, the function
(Fp{y),x-y),
and the problem (where y € K(y) is a parameter): (4.7)
mm(p(x;y)
, s.t. x £ K{y).
Since ip(y; y) = 0, for any fixed y the minimum in (4.7) is < 0. Proposition 11. y € K is a solution of (4.1) iff it is a global minimum point of (4.7). Proof. Let y be a global minimum point of (4.7), so that tp(y; y) = 0 and tp(x; y) > 0 for each x e K(y). Hence, because of Proposition 10, y is a solution of (4.1). Now, let y be a solution of (4.1). Because of Proposition 10, system (4.6) is impossible, so that y is a solution of (4.3); this implies (p(x; y) > 0 Mx e K(y). Since tp(y; y) = 0, y is a global minimum point of (4.7). D
108 Example 3. Let us set I = 2, m = 1, n = 2, C = R2+ = C, x — (xi,x2), g(x) = Xi + x2 — 2, and F ( v ) = w
(201-2 V2y: + 2
X = K2+, y = {yu y2),
2y2 + 2 2y2~2j-
We choose p = (pi,P2) = (1> l)i so that (fi(x; y) = 4(2/13;! + y2x2 - y\ - y^), f (1 - Vl)Xl - (1 + 2/2)^2 + y? + 2/1 - 2/i + 2/2 > 0 •^(2/) : S - ( 1 + yi)a:i + (1 - 2/2)^2 + 2/1 + y\ + 2/1 - y2 > 0 [ X\ + x2 > 2 , i i > 0 , x2 > 0. It is easy to check that any y° which belongs to the segment ](2,0), (0, 2)[c R 2 is a solution of (4.7) at y = y° and hence of (4.1). At y = y = (2,0), problem (4.7) becomes min4(xi — 2) , s.t. x\ + x2 = 2, 3 ^ ! — x2 < 6, x\ > 0, x2 > 0, whose unique global minimum point is x = (0, 2) / y. Hence, because of Proposition 11, y is not solution of (4.1). A quite analogous conclusions can be drawn at y = (0, 2). Note that in Example 3 the operator F(y) is the Jacobian matrix of the vector function / : R 2 -> R 2 , given by fr
, A _ ( / i ( 2 / h _ ( ( 2 / i - l ) 2 + (2/2 + l ) 2 \
/W
~ U ( 2 / ) J ~ V ( 2 / i + l) 2 + (2/ 2 -l) 2 J '
Consider (1.1) with the above function and with R equal to K of Example 3, i.e. K = {x 6 R ^ : Xi + x2 > 2}. It is easy to check that all the v.m.p. of (1.1) are given by the segment [(2,0), (0, 2)], whose extrema are not solutions of the related Vector Variational Inequality (4.1). Hence, such inequality is a sufficient (but not necessary) condition for y to be v.m.p. of (1.1).
References 1 G. Bigi and M. Pappalardo, "Regularity conditions in Vector Optimization". Jou. Optimization Theory and Appls., Plenum, New York, to appear. 2 R. Conti et al. (Eds.), "Optimization and related fields". Lecture Notes in Mathematics No. 1190, Springer-Verlag, Berlin, 1986, pp. 57-93.
109 3 J.P. Dauer, "Analysis of the objective space in multiple objective linear programming". Jou. Mathematical Analysis and Appls.", Vol. 126, 1987, pp. 579-593. 4 P.H. Dien, G. Mastroeni, M. Pappalardo and P.H. Quang, "Regularity conditions for constrained extremum problems via image space: the linear case". In Lecture Notes in Sc. and Mathem. Systems, No. 405, Komlosi, Rapcsack, Schaible Eds., Springer-Verlag, 1994, pp. 145-152. 5 Dinh The Luc, "Theory of Vector Optimization". Lecture Notes in Ec. and Math. Systems No. 319, Springer-Verlag, Berlin, 1989. 6 G. Di Pillo et al. (Eds.), "Nonlinear optimization and Applications". Plenum, New York, 1996, pp. 13-26 and 171-179. 7 P. Favati and M. Pappalardo, "On the reciprocal Vector Optimization problems". Jou. Optimization Theory Appls., Plenum, New York, Vol. 47, No. 2, 1985, pp. 181-193. 8 P.A.V. Ferreira and M.E.S. Machado, "Solving multiple-objective problems in the objectice space". Jou. Optimization Theory Appls., Plenum, New York, Vol. 89, No. 3, 1996, pp. 659-680. 9 E.A. Galperin, "Nonscalarized multiobjective global optimization". Jou. Optimization Theory Appls., Plenum, New York, Vol. 75, No. 1, 1992, pp. 69-85. 10 F. Giannessi, "Theorems of the alternative, quadratic programs and complementarity problems". In "Variational Inequalities and complementarity problems", R.W. Cottle et al. Eds., J. Wiley, 1980, pp. 151-186. 11 F. Giannessi, "Theorems of the alternative and optimality conditions". Jou. Optimization Theory Appls., Plenum, New York, Vol. 42, No. 11, 1984. pp. 331-365. 12 F. Giannessi, "On Minty variational principle". In "New trends in mathematical programming", F. Giannessi, S. Komlosi and T. Rapcsack Eds., Kluwer Acad. Publ., Dordrecht, 1998, pp. 93-99. 13 F. Giannessi Ed., "Vector Variational Inequalities and Vector equilibria. Mathematical Theories". Kluwer Acad. Publ., New York 1999. 14 F. Giannessi and A. Maugeri (Eds.), "Variational Inequalities and Networks equilibrium problems". Plenum, New York, 1995, pp. 1-7, 21-31, 101-121 and 195-211. 15 D. Kinderleherer and G. Stampacchia, "An introduction to Variational inequalities". Academic Press, New York, 1980.
110
16 S. Komlosi and L. Pellegrini, "On the Stampacchia and Minty Vector Variational Inequalities". Submitted to [13]. 17 G. Mastroeni, "Separation methods for Vector Variational Inequalities. Saddle point and gap function". To appear in "Nonlinear Optimization and Applications", G. Di Pillo at al. (Eds.), Kluwer Acad. Publ., Dordrecht, 1999. 18 M. Pappalardo, "Stationarity in Vector Optimization". Rendiconti del Circolo Matematico di Palermo, Serie II, No. 48, 1997, pp. 195-200. 19 A. Pascoletti and P. Serafini, "Scalarizing Vector Optimization problems". Jou. Optimization Theory Appls., Plenum, New York, Vol. 42, No. 4, 1984, pp. 499-523. 20 L. Pellegrini, "On a general approach to Vector Optimization. Duality". To appear in "Nonlinear Optimization and Applications", G. Di Pillo et al. (Eds.), Kluwer Acad. Publ., Dordrecht, 1999. 21 T. Rapcsack, "Smooth Nonlinear Optimization in R n " . Series "Nonconvex Optimization and its Applications", No. 19, Kluwer Acad. Publ., Dordrecht, 1997. 22 S. Wang and Z. Li, "Scalarization and Lagrange duality in multiobjective optimization". Optimization, Vol. 26, Gordon and Breach Publ., 1992, pp. 315324.
Solving Quadratic Knapsack Problems by Reformulation and Tabu Search. Single Constraint Case

Fred Glover, University of Colorado at Boulder
Gary Kochenberger, University of Colorado at Denver
Bahram Alidaee, University of Mississippi
Mohammad Amini, University of Memphis
Abstract Several recent papers have presented new approaches to special cases of the quadratic knapsack (QK) problem. Despite the advances reported, these problems remain, in general, very challenging to solve. Models containing more than 100 variables tax current methods to their limits. Moreover, current methods can only handle special instances of quadratic objective functions, leaving a large range of important problems unconsidered. In this paper, we show that it is possible to handle quadratic knapsack problems of dramatically greater size than those within the capacity of previous methods, and in addition to handle entirely general quadratic objectives. This outcome results by the device of reformulating the QK problem as an unconstrained binary quadratic problem, and applying a recently developed metaheuristic to this latter version. We report computational experience that discloses the viability of our approach. Keywords: quadratic knapsack problem, unconstrained binary quadratic problem, tabu search
1 Introduction
The Quadratic (multidimensional) Knapsack problem is one of several classic NPhard problems that has been the object of considerable research over the years. The general problem can be stated as QK : max xQx st Ax < b xe{o,i} We assume the coefficients of the matrix A are all non-negative, and those of b are all positive. No sign conditions are assumed for Q. Note that any linear term originally appearing in the objective function can be absorbed into the diagonal of Q, since Xj = x? for any 0-1 variable Xj. In this paper we focus on the single knapsack case, where A consists of a single row vector, although the approach taken can be applied to the more general case with multiple knapsack constraints. Program QK has many applications, having been shown to address problems arising in capital budgeting, flexible manufacturing, earth station site selection for satellite communication systems, task assignments in host satellite systems, and a variety of other problems. The references given, and particularly the paper by Gallo, Hammer and Simeone [4], give discussions of various applications. A significant number of these applications involve the single knapsack case we examine here, and most of the algorithms developed for QK also focus on this case. It is also possible, via the types of re-formulations recently highlighted by Kochenberger, Alidaee, and Amini [12], to recast many traditional models into the form of QK. Thus, the applicability of QK is quite broad in its scope. Several notable papers have recently appeared in the literature advancing our ability to solve special cases of the single knapsack QK. Billionnet and Calmels [2] discuss a branch and bound algorithm based on linearizations while Michelon and Veilleax [14] present a branch and bound approach based on Lagrangian decomposition and Lagrangian relaxation. Each of these algorithms assumes the matrix Q is positive. Both approaches employ variable fixing routines and both report computational experience with random problems of size up to 40 variables. More recently, Hammer and Rader [8] gave a branch and bound algorithm for the single knapsack QK based on repeated use of the L2 -best linear approximation. Variable fixing procedures, based on Lagrangian relaxation, are explored and computational experience on problems with up to 100 variables is reported. The Hammer and Rader algorithm, like the two methods cited above, assumes a positive Q matrix. Since QK is NP-Hard, exact algorithms are likely to degrade in performance as the
113 problem size grows. As expected, the experience reported in the three algorithms referenced above illustrates the rapid growth in computation times as problem size grows. The state of the art, even for this special class of problems (positive Q) with a single knapsack constraint, is rather limited in terms of the size of problem that can be solved by exact methods. Larger problems, and those involving a general Q matrix, are computationally demanding and more amenable to heuristics than exact methods. The approach we take in this paper is to reformulate QK as an unconstrained binary quadratic program (QP) and then solve the reformulated, equivalent model by recently developed heuristics designed to solve QP. This approach is motivated by the success reported on new solution approaches for QP. See for example, the recent papers by Pardalos and Rodgers [15], Chardaire and Sutter [3], Glover, Kochenberger and Alidaee [5], Glover, Kochenberger, Alidaee, and Amini [6], Hammer, Boros, and Sun [7], and Lodi, Allemand, and Liebling [13]. In this case we make use of the method of [6]. Although heuristic, this method has been demonstrated to be so effective that it has obtained optimal solutions to all benchmark problems that exact methods have been able to solve, while requiring greatly reduced computational time by comparison to the exact methods. In the sections that follow, we present the transformation used to convert QK into QP. We then present our computational experience followed by some summary comments.
2 Reformulation
Transformations for reformulating integer programs in the manner employed here have been proposed by several authors. Discussions relevant to our work, and upon which we draw, are given by Bazarra and Goode [1], Hammer and Rudeanu [9], Hansen [10], Hansen et al [11], and Sinclair [16]. For a recent paper highlighting such reformulations in a variety of model settings, the reader is referred to Kochenberger, Alidaee, and Amini [12]. In the work reported here, we convert Quadratic Knapsack models into an unconstrained quadratic program (QP) by employing a quadratic infeasibility penalty. While other representations are possible, we restrict our efforts to quadratic penalties in order to take advantage of our ability to solve QP. Adding slack variables (in binary expansion form), we can write the quadratic knapsack problem as: QK : max Xo = xQx st Ax = b xe{0,l}
where x, A, and Q have been augmented to include the slack variables. Then, the equality constraints can be accommodated by introducing a positive penalty, PEN, and the associated penalty term PEN(Ax − b)′(Ax − b). Subtracting this term from the objective function yields the equivalent penalized program

QK(PEN) : max x_0 = xQx − PEN(Ax − b)′(Ax − b)
                  = xQx − xDx + cx + constant
          st x ∈ {0,1}

where D = PEN · A′A, c = 2 PEN · bA, and the constant = −PEN · b′b. The additive constant does not affect the optimization and can be ignored for the time being. The remaining three terms can be combined into a single term, allowing us to represent the equivalent unconstrained quadratic program simply as

QK(PEN) : max xQ̂x
          st x ∈ {0,1}

where q̂_ij = q_ij − d_ij for i ≠ j and q̂_ii = q_ii − d_ii + c_i for the diagonal terms. As previously remarked, we solve QK(PEN) using the Tabu Search (TS) approach reported in Glover, Kochenberger, Alidaee, and Amini [6]. An overview of our TS method is given in the appendix. Example 1. To illustrate the reformulation procedure, consider the example used by Hammer and Rader [8]: maximize x_0 =
st 8xi + 6x2 + 5x 3 + 3x4 < 16 Introducing a slack activity, bounded above by 3, as lx 5 + 2x6, the transformation can be used to give an equivalent unconstrained problem in 6 binary variables. (The upper bound on the slack is selected to be large enough to admit an optimal solution. This can be done by several approaches, which we do not address here.) Choosing PEN to be 10, we get QP(PEN) with Q given by " 1922 -476 -397 -235 -80 -476 1565 -299 -177 -60 -397 -299 1352 -148 -50 874 -30 -235 -177 -148
-80
-60
-50
-160 -120 -100
-30 310 -60 -20
-160 " -120 -100 -60 -20 600
115 and an additive constant of —2560. Solving this unconstrained problem, QP(PEN), gives the solution x 0 = 2588 which is found at x = {1,0,1,1,0,0} . Adjusting for the additive constant, we have the original objective function value of 28. In this example, both slack variables are equal to zero and thus the knapsack constraint is tight at the optimal solution given. We now turn to the computational work we carried out and the results obtained.
3
Computational Experiments
To test our approach to quadratic knapsack problems, 44 random quadratic knapsack problems were generated, re-formulated as unconstrained quadratic binary programs, and solved using our tabu search algorithm. Each problem was generated with qy between 0 and 25, aj between 1 and 10, and b chosen between 10 and X) a j- Problem sizes ranged from 10 to 500 variables with densities varying from 25% to 100 %. Each problem had a single knapsack constraint that was converted to an equality with the addition of six additional binary slack variables. This allowed for a slack activity up to size 63 (which proved to more than sufficient). Tables 1 and 2 summarize our results. For each problem, we report problem characteristics, along with the solution obtained in the allotted number of Tabu Search oscillation cycles and computation times. The solutions shown in Table 1, with the exception of problem C03, are in fact optimal, as verified by the branch and bound algorithm of Pardalos and Rodgers (P&R) [15]. Problem C03 could not be solved to completion by the exact method of [15] and thus an optimal solution to this problem is not known. Likewise, optimal solutions for the problems of Table 2 are yet to be established. These problems, while readily solved by our Tabu Search heuristic, proved to be beyond the capability of the branch and bound algorithm. (Branch and bound (P&R) run times in excess of six hours failed to produce optimal solutions for the smallest of these problems.) The results shown in Table 2 are the best values found within the arbitrary cycle limit imposed in our tabu search approach. Certainly the choice of the parameter PEN is key to the approach we are taking. Theoretically, it is sufficient to choose PEN to be greater than the sum of the absolute values of the elements of Q. Our experience indicates, however, that feasible solutions can be obtained with much smaller values for PEN. PEN needs to be large enough to make infeasible choices unattractive to the search/optimization process. Yet choosing PEN somewhat larger than necessary may cause the elements of Q to become unnecessarily large, masking the relevance of Q and making the problem, in principle, harder to solve. This potential difficulty, as discussed later in this section, did not prove to be a problem for us. The approach taken in the work reported here was to start with a small value for PEN and raise it as needed in order to produce
116
Problem ID
A01 A02 A03 A04 A05 A06 A07 A08 B01 B02 B03 B04 B05 B06 B07 B08 C01 C02 C03 C04 C05 C06 C07 C08
Table 1: COMPUTATIONAL RESULTS # variables Density PEN Solution # TS Cycles 10 0.25 50 100 6220 10 50 100 0.25 24373 10 100 0.5 50 1357 10 0.5 50 45342 100 10 50 3408 100 0.75 100 10 0.75 50 34220 1 58612 100 10 50 10 1 50 42462 100 50 180947 100 20 0.25 20 50 76794 100 0.25 20 50 73279 100 0.5 20 0.5 50 226432 100 20 50 347331 100 0.75 20 100 0.75 50 55570 20 1 50 194715 100 20 1 50 32605 100 30 100 0.25 250 156990 30 50 100 0.25 2635 30 250 1262937 100 0.5 30 250 814340 100 0.5 30 50 4631 100 0.75 100 30 0.75 250 273512 100 30 1 250 422510 6472 100 30 1 50
Time(sec)
<1 <1 <1 <1 <1 <1 <1 <1 <1 <1 <1 <1 <1 <1 <1 <1 <2 <2 <2 <2 <2 <2 <2 <2
feasible solutions. We arbitrarily took PEN to be 50, 250, 500, 750, 1000, or 1500. For the "A", " B " , "C", "D", and "E" problems, PEN was initially set to 50. Each problem was then solved and the solutions obtained were checked for feasibility. Problems that were infeasible were solved again with successively larger values of PEN until the knapsack constraint was satisfied. The " F " problems, based on what we learned from A-E, were solved with PEN = 1500 from the outset. Clearly PEN = 1500 (or something larger) would have worked for all the problems. Tables 1 and 2 report the values of PEN used to yield the solutions shown. In all cases, the solutions listed are feasible with respect to the original knapsack constraints. Choosing PEN too large, as mentioned above, has the potential to make a problem more difficult to solve than an appropriately chosen smaller value of PEN. For the testing we have done to date, this potential difficulty did not materialize. This is illustrated in Table 3 where we report results obtained by solving a particular problem,
117
Problem ID
D01 D02 D03 D04 D05 D06 D07 D08 E01 E02 E03 E04 E05 E06 E07 E08 F01 F02 F03 F04
Table 2: COMPUTATIONAL EXPERIENCE (CON'T) # variables Density PEN Solution # T S cycles Time(sec) 40 0.25 50 888033 100 <4 40 50 1607975 0.25 100 <4 40 0.5 50 177190 100 <4 40 50 1272126 100 <4 0.5 40 0.75 50 1076775 100 <4 40 0.75 50 1389799 100 <4 1 40 50 1549279 100 <4 1 <4 40 50 40931 100 100 50 2659975 0.25 100 <9 100 100 0.25 50 1953310 <9 100 100 0.5 500 4241440 <9 100 0.5 750 10993576 100 <9 100 500 2119812 100 <9 0.75 100 0.75 1000 11040166 100 <9 100 1 750 13696762 100 <9 100 1 1000 5943174 100 <9 500 0.25 1500 35491046 500 240 500 0.5 1500 35544448 500 240 500 0.75 1500 76078805 500 240 500 1 1500 76145674 500 240
C08, with various values of PEN ranging from 250 to 3000. Table 3 lists the value of PEN, the optimal solution value, and the Tabu Search cycle at which the optimal solution was obtained for these runs. In all cases, the solutions shown in Table 3 are feasible and optimal. Note that the performance of the heuristic was fairly uniform over the range of PEN values. While there is some variation in the cycle count required to reach the optimal solution, no discernible pattern is suggested by the table. Similar computations on other problems have also yielded only small variations in solution burden associated with differing PEN values. This is a somewhat surprising discovery since the folklore about using " infeasibility penalties" in discrete and nonlinear optimization is that their size can materially effect computational efficiency. The absence of this effect here suggest that our TS method is highly robust. Our approach had no trouble finding high quality, feasible solutions to all problems in the test set. And, compared to alternative methods from the literature, these solutions were found very quickly. The papers by Michelon and Veilleux [14] and Billionnet and Calmels [2] report computational experience on problems up to size n = 40 only. Their problems are comparable in size and density to our "A", " B " , "C", and "D"
118
Table 3: PERFORMANCE ON PROBLEM C08 FOR VARIOUS VALUES OF PEN

PEN    Optimal Value    Cycles Yielding Optimal
250    30500            16
500    60750            17
750    91000            29
1000   121250           10
1500   181750           19
3000   363250           29
problems. For the 40 variable problems, Michelon and Veilleax report solution times of more than 900 seconds (SUN 4.0) while Billionet and Calmels report solutions times of approximately 500 seconds (HP 9000) for their 40 variable problems. Hammer and Rader (H&R) [8] report computational experience with problems up to size n = 100. They report solving completely dense, 40 variable problems in roughly 20 seconds (SPARC station 1+). For their 100 variable problems, H&R solve low density problems with admirable efficiency. For instance, 25% problems are solved in little more than 2 minutes. However, 100 % dense problems, of size 100, were solved in roughly 24 minutes. None of these authors report experience on larger problems. Since our approach is heuristic in nature, comparing the solution times we report in Tables 1 and 2 with the branch and bound times given above must be done with considerable care. Such comparisons are further confounded by the fact that different computers were used in the various studies and that our Tabu Search approach was run for an arbitrary number of oscillation cycles. Nonetheless, a quick comparison gives the reader some indication of the speed with which our approach generates high quality solutions to these problems. As shown in Tables 1 and 2, we produce optimal solutions to test problems up to 30 variables (and varying densities) in about 2 seconds on a Pentium 200 computer. Thus, as in the case of unconstrained problems, we obtain optimal solutions to all problems where such solutions are known, and we obtain such solutions in at least an order of magnitude faster than the exact approach. In many cases we are faster by two orders of magnitude. The remaining problems belong to the group that could not be solved by the exact methods. We produced high quality feasible solutions to the 40 variable and 100 variable problems in roughly 4 and 9 seconds, respectively. For our 500 variable problems, which we ran for 500 oscillation cycles, run times were approximately 4 minutes. To the best of our knowledge, ours is the first study to report on problems as large as 500 variables. The previous limit is represented by the 100 variable problems in the Hammer and Rader paper.
119 It is interesting to note that our solution times for a given problem size are not very sensitive to problem density. Problems with 500 variables are solved in about 4 minutes whether they are 25% dense or 100% dense. This behavior is in sharp contrast to the branch and bound algorithms whose performance degrades considerably with density. For example, on their 100 variable problems, H&R report approximate times of 2, 3, 8, and 24 minutes, respectively, for densities 25, 50, 75, and 100%. The time invariant nature of our approach is due to the fact that regardless of the density of the original matrix Q, the transformation produces a matrix Q which is very dense. This is illustrated in the example presented earlier where Q was fairly sparse and Q was 100 % dense. It should also be pointed out that the transformation destroys any nice properties (like all non-negative elements) that Q may have exhibited. Neither the absence of special structure nor presence of high density pose a problem for our approach.
4 Summary and Conclusions
For quadratic knapsack problems of modest size and special properties, efficient exact methods, like that of Hammer and Rader, exist. However, quadratic knapsack problems with hundreds of variables pose computational difficulties in excess of what can reasonably be handled by exact methods. Current exact methods also are unable to handle QK problems that have negative elements in their objective functions. For these larger and/or more general problems, heuristic methods must be employed. In this paper we have demonstrated that the unconstrained binary quadratic program, QP, is a viable alternative approach for modeling and solving QK problems. The reformulated problems are readily solved by our tabu search algorithm to yield high quality feasible solutions, which are optimal in all cases that can be solved by exact methods. Our results illustrate that large instances can be readily solved. For the class of problems considered here, choosing an appropriate value for the parameter PEN was straightforward. An appropriate choice, which for any problem depends on both problem size and problem data, can always be found by a minimum of experimentation, or, more formally, by simple target analysis. However, we found (surprisingly) that taking PEN somewhat larger than necessary did not significantly affect computational performance. It should be noted that no attempt was made to tailor our Tabu Search algorithm for the work carried out here. We used the algorithm of [6] for general QP problems with no modifications. (The method is also not tuned to obtain best results for these QP problems.) For instance, employing an improvement routine, designed specifically for QK problems, at critical points in the search process would likely lead to better results. Our approach likewise does not depend on any particular property of the
120 original Q matrix. Future research will address the solution of QK problems that include multiple knapsack constraints.
APPENDIX: Overview of our Tabu Search Algorithm

Our algorithm is built around strategic oscillations that alternate between constructive phases (progressively setting variables to 1, ADDS) and destructive phases (progressively setting variables to 0, DROPS). To control the underlying search process, we use a memory structure that is updated at locally optimal solutions called critical events. A parameter span is used to indicate the amplitude of oscillation about a critical event. We begin with span equal to 1 and gradually increase it to some limiting value. For each value of span, a series of alternating constructive and destructive phases is executed before progressing to the next value. At the limiting point, we begin to gradually decrease span, allowing again for a series of alternating constructive and destructive phases. When span reaches the value of 1, a complete span cycle has been completed and the next cycle is launched. Information stored at critical events is used to influence the search process by penalizing adds (during constructive phases) and favoring drops (during destructive phases) for variables that have been in recent critical solutions at a level of one. Cumulative critical event information is also used to introduce a subtle long-term bias into the search process. A complete description of the approach is given in the references cited.
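The strategic oscillation can be summarised schematically. The Python sketch below is only one reading of the description above: the gain computation, the length of each phase and the way critical-event frequencies enter the ADD/DROP choices are illustrative guesses, not the authors' implementation in [6].

```python
import random

def oscillation_cycle(Q, span_max=5, alternations=3):
    """Schematic strategic oscillation for max x'Qx over binary x: span rises
    from 1 to span_max and back, alternating ADD and DROP phases; a simple
    frequency memory of recent solutions biases later choices."""
    n = len(Q)
    x = [0] * n
    freq = [0] * n                       # critical-event memory
    value = lambda z: sum(Q[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    best, best_val = x[:], value(x)

    def phase(add):
        nonlocal best, best_val
        for _ in range(random.randint(1, span)):
            moves = [i for i in range(n) if x[i] != int(add)]
            if not moves:
                break
            bias = -1 if add else 1      # penalise adds / favour drops of frequent variables
            i = max(moves, key=lambda i: value(x[:i] + [int(add)] + x[i + 1:])
                    - value(x) + bias * freq[i])
            x[i] = int(add)
        v = value(x)
        if v > best_val:
            best, best_val = x[:], v
        for i in range(n):               # record the (approximate) critical solution
            freq[i] += x[i]

    for span in list(range(1, span_max + 1)) + list(range(span_max, 0, -1)):
        for _ in range(alternations):
            phase(add=True)
            phase(add=False)
    return best, best_val
```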
References

[1] Bazaraa, M. and J. Goode, "A Cutting-Plane Algorithm for the Quadratic Set Covering Problem," OR, Vol. 23, No. 1, Jan-Feb (1975), pp. 150-158.
[2] Billionnet, A., and Calmels, A., "Linear Programming for the 0-1 Quadratic Knapsack Problem," EJOR, 92 (1996), 310-325.
[3] Chardaire, P., and Sutter, A., "A Decomposition Method for Quadratic Zero-One Programming," Management Science, 41 (1995), 704-712.
[4] Gallo, G., Hammer, P., and Simeone, B., "Quadratic Knapsack Problems," Mathematical Programming, 12 (1980), 132-149.
[5] Glover, F., Kochenberger, G., and Alidaee, B., "Adaptive Memory Tabu Search for Binary Quadratic Programs," Management Science, 44 (1998), 336-345.
[6] Glover, F., Kochenberger, G., Alidaee, B., and Amini, M., "Tabu Search With Critical Event Memory: An Enhanced Application for Binary Quadratic Programs," (1997), 2nd International Conference on MetaHeuristics.
[7] Hammer, P., E. Boros, and X. Sun, "On Quadratic Unconstrained Binary Optimization," paper given at the Seattle INFORMS Meeting, Fall 1998.
[8] Hammer, P., and Rader, D., "Efficient Methods for Solving Quadratic 0-1 Knapsack Problems," INFOR, 35 (1997), 170-182.
[9] Hammer, P., and Rudeanu, S., Boolean Methods in OR and Related Areas, Springer-Verlag, New York, (1968).
[10] Hansen, P., "Methods of Nonlinear 0-1 Programming," Annals of Discrete Mathematics, 5 (1979), 53-70.
[11] Hansen, P., Jaumard, F., and Mathon, V., "Constrained Nonlinear 0-1 Programming," ORSA Journal on Computing, 5 (1993), 97-119.
[12] Kochenberger, G., Alidaee, B., and Amini, M., "Applications of the Unconstrained Binary Quadratic Program," Working Paper, University of Colorado at Denver, (1998).
[13] Lodi, A., Allemand, K., and Liebling, T., "An Evolutionary Heuristic for Quadratic 0-1 Programming," EJOR (to appear).
[14] Michelon, P., and Veilleau, L., "Lagrangean Methods for the 0-1 Quadratic Knapsack Problem," EJOR, 92 (1996), 326-341.
[15] Pardalos, P., and Rodgers, G., "Computational Aspects of a Branch and Bound Algorithm for Quadratic Zero-One Programming," Computing, 45 (1990), 131-144.
[16] Sinclair, M., "An Exact Penalty Function Approach for Nonlinear Integer Programming Problems," EJOR, 27 (1986), 50-56.
Combinatorial and Global Optimization, pp. 123-132 P.M. Pardalos, A. Migdalas and R. Burkard, Editors © 2002 World Scientific Publishing Co.
Global optimization using dynamic search trajectories

Albert A. Groenwold ([email protected])
Dept. of Mechanical and Aeronautical Engineering, University of Pretoria, Pretoria, 0002, South Africa.

J.A. Snyman ([email protected])
Dept. of Mechanical and Aeronautical Engineering, University of Pretoria, Pretoria, 0002, South Africa.
Abstract

Two global optimization algorithms are presented. Both algorithms attempt to minimize an unconstrained objective function through the modeling of dynamic search trajectories. The first, namely the Snyman-Fatti algorithm, originated in the 1980's and still appears to be an effective global optimization algorithm. The second algorithm is currently under development, and is denoted the modified bouncing ball algorithm. For both algorithms, the search trajectories are modified to increase the likelihood of convergence to a low local minimum. Numerical results illustrate the effectiveness of both algorithms.

Keywords: Global optimization, Dynamic search trajectories.
1 Introduction
The problem of globally optimizing a real valued function is inherently intractable (unless hard restrictions are imposed on the objective function) in that no practically useful characterization of the global optimum is available. Indeed, the problem of determining an accurate estimate of the global optimum is mathematically ill-posed in the sense that very similar objective functions may have global optima very distant from each other [1]. Nevertheless, the need in practice to find a relatively low local minimum has resulted in considerable research over the last decade to develop algorithms that attempt to find such a low minimum, e.g. see [2]. The general global optimization problem may be formulated as follows. Given a real valued objective function f(x) defined on the set x ∈ D in ℝⁿ, find the point x* and the corresponding function value f* such that

f* = f(x*) = minimum { f(x) | x ∈ D }     (1)
if such a point x* exists. If the objective function and/or the feasible domain D are non-convex, then there may be many local minima which are not global. If D corresponds to all of ℝⁿ, the optimization problem is unconstrained. Alternatively, simple bounds may be imposed, with D now corresponding to the hyper box (or domain or region of interest) defined by

D = { x | ℓ ≤ x ≤ u }     (2)
where ℓ and u are n-vectors defining the respective lower and upper bounds on x. From a mathematical point of view, Problem (1) is essentially unsolvable, due to a lack of mathematical conditions characterizing the global optimum, as opposed to the local optimum of a smooth continuous function, which is characterized by the behavior of the problem function (Hessians and gradients) at the minimum [3] (viz. the Karush-Kuhn-Tucker conditions). Therefore, the global optimum f* can only be obtained by an exhaustive search, except if the objective function satisfies certain subsidiary conditions [4], which mostly are of limited practical use [5]. Typically, the conditions are that f should satisfy a Lipschitz condition with known constant L and that the search area is bounded, e.g. for all x, x̄ ∈ X

|f(x) − f(x̄)| ≤ L ||x − x̄||     (3)
So-called space-covering deterministic techniques have been developed [6] under these special conditions. These techniques are expensive and, due to the need to know L, of limited practical use. Global optimization algorithms are divided into two major classes: deterministic and stochastic (from the Greek word stokhastikos, i.e. 'governed by the laws of probability') [6]. Deterministic methods can be used to determine the global optimum
125 through exhaustive search. These methods are typically extremely expensive. With the introduction of a stochastic element into deterministic algorithms, the deterministic guarantee that the global optimum can be found is relaxed into a confidence measure. Stochastic methods can be used to assess the probability of having obtained the global minimum. Stochastic ideas are mostly used for the development of stopping criteria, or to approximate the regions of attraction as used by some methods [3]. The stochastic algorithms presented herein, namely the Snyman-Fatti algorithm and the modified bouncing ball algorithm, both depend on dynamic search trajectories to minimize the objective function. The respective trajectories, namely the motion of a particle of unit mass in an n-dimensional conservative force field, and the trajectory of a projectile in a conservative gravitational field, are modified to increase the likelihood of convergence to a low local minimum.
2 The Snyman-Fatti trajectory method
The essentials of the original SF algorithm [5] using dynamic search trajectories for unconstrained global minimization will now be discussed. The algorithm is based on the local algorithms presented in [7, 8]. For more details concerning the motivation of the method, its detailed construction, convergence theorems, computational aspects and some of the more obscure heuristics employed, the reader is referred to the original paper.
2.1 Dynamic trajectories

In the SF algorithm successive sample points x_0^j, j = 1, 2, ..., are selected at random from the box D defined by (2). For each sample point x_0^j, a sequence of trajectories T^i, i = 1, 2, ..., is computed by numerically solving the successive initial value problems:

ẍ(t) = −∇f(x(t)),   x(0) = x_0^i,   ẋ(0) = ẋ_0^i     (4)

This trajectory represents the motion of a particle of unit mass in an n-dimensional conservative force field, where the function to be minimized represents the potential energy. Trajectory T^i is terminated when x(t) reaches a point where f(x(t)) is arbitrarily close to the value f(x_0^i) while moving "uphill", or more precisely, if x(t) satisfies the
conditions

f(x(t)) > f(x_0^i) − ε_u   and   ẋ(t)ᵀ ∇f(x(t)) > 0     (5)
where ε_u is an arbitrarily small prescribed positive value. An argument is presented in [5] to show that when the level set {x | f(x) ≤ f(x_0^i)} is bounded and ∇f(x_0^i) ≠ 0, then conditions (5) above will be satisfied at some finite point in time. Each computed step along trajectory T^i is monitored so that at termination the point x_m^i at which the minimum value was achieved is recorded, together with the associated velocity ẋ_m^i and function value f_m^i. The values of x_m^i and ẋ_m^i are used to determine the initial values for the next trajectory T^{i+1}. From a comparison of the minimum values, the best point x_b^i for the current j over all trajectories to date is also recorded. In more detail, the minimization procedure for a given sample point x_0^j, in computing the sequence x_b^i, i = 1, 2, ..., is as follows.
2.2 Minimization procedure MP1

1. For given sample point x_0^j, set x_0^1 := x_0^j and compute T^1 subject to ẋ_0^1 := 0; record x_m^1, ẋ_m^1 and f_m^1; set x_b^1 := x_m^1 and i := 2,

2. compute trajectory T^i with x_0^i := ½(x_b^{i−1} + x_m^{i−1}) and ẋ_0^i := ½ ẋ_m^{i−1}; record x_m^i, ẋ_m^i and f_m^i,

3. if f_m^i < f(x_b^{i−1}) then x_b^i := x_m^i; else x_b^i := x_b^{i−1},

4. set i := i + 1 and go to 2.

In the original paper [5] an argument is presented to indicate that under normal conditions on the continuity of f and its derivatives, x_b^i will converge to a local minimum. Procedure MP1, for a given j, is accordingly terminated at step 3 above if ||∇f(x_b^i)|| < ε, for some small prescribed positive value ε, and x_b^i is taken as the local minimizer x̃^j, i.e. set x̃^j := x_b^i with corresponding function value f̃_j := f(x̃^j). Reflecting on the overall approach outlined above, involving the computation of energy conserving trajectories and the minimization procedure, it should be evident that, in the presence of many local minima, the probability of convergence to a relatively low local minimum is increased. This is expected because, with a small value of ε_u (see conditions (5)), it is likely that the particle will move through a trough associated with a relatively high local minimum, and move over a ridge to record a lower function value at a point beyond. Since we assume that the level set associated
with the starting point function value is bounded, termination of the search trajectory will occur as the particle eventually moves to a region of higher function values.
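As an illustration of the trajectories (4), the termination test (5) and the MP1-style restarts, the following sketch uses a plain fixed-step velocity-Verlet integrator in place of the dynamic leap-frog method of [7, 8]; the test function, step size and all parameter values are our own choices, and the heuristics of the full SF algorithm are omitted.

```python
import numpy as np

def sf_trajectory(f, grad, x0, v0, eps_u=1e-2, dt=0.05, max_steps=10_000):
    """Integrate x'' = -grad f(x) from (x0, v0) and record the trajectory minimum.

    The trajectory is stopped when f has climbed back to within eps_u of the
    starting value while still moving uphill, i.e. conditions (5).
    """
    x, v = np.asarray(x0, float), np.asarray(v0, float)
    f0 = f(x)
    x_m, v_m, f_m = x.copy(), v.copy(), f0
    a = -grad(x)
    for _ in range(max_steps):
        x = x + dt * v + 0.5 * dt**2 * a          # velocity-Verlet step
        a_new = -grad(x)
        v = v + 0.5 * dt * (a + a_new)
        a = a_new
        fx = f(x)
        if fx < f_m:                              # best point along the trajectory
            x_m, v_m, f_m = x.copy(), v.copy(), fx
        if fx > f0 - eps_u and np.dot(v, grad(x)) > 0:
            break                                 # conditions (5): uphill near start level
    return x_m, v_m, f_m

# MP1-style sweep from one random sample point on a Rastrigin-like function
rng = np.random.default_rng(0)
f = lambda x: np.sum(x**2) + 10 * np.sum(1 - np.cos(2 * np.pi * x))
grad = lambda x: 2 * x + 20 * np.pi * np.sin(2 * np.pi * x)
x_b = rng.uniform(-5, 5, size=2)
x_m, v_m = x_b, np.zeros(2)
for i in range(5):
    x0 = x_b if i == 0 else 0.5 * (x_b + x_m)     # halved initial conditions, step 2 of MP1
    v0 = np.zeros(2) if i == 0 else 0.5 * v_m
    x_m, v_m, f_m = sf_trajectory(f, grad, x0, v0)
    if f_m < f(x_b):
        x_b = x_m
print(x_b, f(x_b))
```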
3 The modified bouncing ball trajectory method

The essentials of the modified bouncing ball algorithm using dynamic search trajectories for unconstrained global minimization are now presented. The algorithm is in an experimental stage, and details concerning the motivation of the method, its detailed construction, and computational aspects will be presented in the future.
3.1 Dynamic trajectories

In the MBB algorithm successive sample points x^j, j = 1, 2, ..., are selected at random from the box D defined by (2). For each sample point x^j, a sequence of trajectory steps Δx^i and associated projection points x^{i+1}, i = 1, 2, ..., are computed from the successive analytical relationships (with x^1 := x^j and prescribed V_{0_1} > 0):
Δx^i = V_{0_i} t_i cos θ_i ∇f(x^i) / ||∇f(x^i)||     (6)

where

θ_i = tan⁻¹(||∇f(x^i)||) + π/2     (7)

t_i = (1/g) [ V_{0_i} sin θ_i + { (V_{0_i} sin θ_i)² + 2g h(x^i) }^{1/2} ]     (8)

h(x^i) = f(x^i) + k     (9)

with k a constant chosen such that h(x) > 0 ∀ x ∈ D, g a positive constant, and

x^{i+1} = x^i + Δx^i     (10)
For the next step, select V_{0_{i+1}} < V_{0_i}. Each step Δx^i represents the ground or horizontal displacement obtained by projecting a particle in a vertical gravitational field (constant g) at an elevation h(x^i) and speed V_{0_i} at an inclination θ_i. The angle θ_i represents the angle that the outward normal n to the hypersurface represented by y = h(x) makes, at x^i in n + 1 dimensional space, with the horizontal. The time of flight t_i is the time taken to reach the ground corresponding to y = 0. More formally, the minimization trajectory for a given sample point x^j and some initial prescribed speed V_0 is obtained by computing the sequence x^i, i = 1, 2, ..., as follows.
3.2 Minimization procedure MP2

1. For given sample point x^j, set x^1 := x^j and compute trajectory step Δx^1 according to (6)–(9) and subject to V_{0_1} := V_0; record x^2 := x^1 + Δx^1, set i := 2 and V_{0_2} := αV_{0_1} (α < 1),

2. compute Δx^i according to (6)–(9) to give x^{i+1} := x^i + Δx^i; record x^{i+1} and set V_{0_{i+1}} := αV_{0_i},

3. set i := i + 1 and go to 2.

In the vicinity of a local minimum x̂ the sequence of projection points x^i, i = 1, 2, ..., constituting the search trajectory for starting point x^j will converge, since Δx^i → 0 (see (6)). In the presence of many local minima, the probability of convergence to a relatively low local minimum is increased, since the kinetic energy can only decrease for α < 1. Procedure MP2, for a given j, is successfully terminated if ||∇f(x^i)|| < ε for some small prescribed positive value ε, or when αV_{0_i} < βV_{0_1}, and x^i is taken as the local minimizer x̃^j with corresponding function value f̃_j := h(x̃^j) − k. Clearly, the condition αV_{0_i} < βV_{0_1} will always occur for 0 < β < α and 0 < α < 1.
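A minimal sketch of procedure MP2 built directly on relations (6)–(9) is given below (our illustration; the parameter values, the simple termination test and the test function are assumptions, and the local refinements of the full MBB algorithm mentioned later are omitted).

```python
import numpy as np

def mbb_search(f, grad, x0, V0=1.0, alpha=0.99, beta=0.01, g=9.81, k=0.0,
               eps=1e-4, max_iter=10_000):
    """Modified bouncing ball trajectory from sample point x0 (sketch).

    Each step is the horizontal range of a projectile launched from height
    h(x) = f(x) + k with speed V at inclination theta, relations (6)-(9);
    V is reduced by the factor alpha after every bounce.
    """
    x, V = np.asarray(x0, float), V0
    for _ in range(max_iter):
        gvec = grad(x)
        gnorm = np.linalg.norm(gvec)
        if gnorm < eps or V < beta * V0:                  # MP2 termination (simplified)
            break
        theta = np.arctan(gnorm) + 0.5 * np.pi            # (7)
        h = f(x) + k                                      # (9)
        t = (V * np.sin(theta)
             + np.sqrt((V * np.sin(theta)) ** 2 + 2.0 * g * max(h, 0.0))) / g  # (8)
        x = x + V * t * np.cos(theta) * gvec / gnorm      # (6) and (10); cos(theta) < 0
        V *= alpha
    return x, f(x)

f = lambda x: np.sum(x**2) + 10 * np.sum(1 - np.cos(2 * np.pi * x))
grad = lambda x: 2 * x + 20 * np.pi * np.sin(2 * np.pi * x)
print(mbb_search(f, grad, np.array([3.3, -2.1]), V0=2.0))
```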
4 Global stopping criterion
The above methods require a termination rule for deciding when to end the sampling and to take the current overall minimum function value f̃, i.e.

f̃ = minimum { f̃_j, over all j to date }     (11)
as the global minimum value f*. Define the region of convergence of the dynamic methods for a local minimum x̂ as the set of all points x which, used as starting points for the above procedures, converge to x̂. One may reasonably expect that, in the case where the regions of attraction (for the usual gradient-descent methods, see [9]) of the local minima are more or less equal, the region of convergence of the global minimum will be relatively increased. Let R_k denote the region of convergence for the above minimization procedures MP1 and MP2 of local minimum x̂^k, and let α_k be the associated probability that a sample point be selected in R_k. The region of convergence and the associated probability for the global minimum x* are denoted by R* and α* respectively. The following basic
assumption, which is probably true for many functions of practical interest, is now made.

A. Basic assumption: α* ≥ α_k for all local minima x̂^k.

The following theorem may be proved.

B. Theorem (Ref. [5]): Let r be the number of sample points falling within the region of convergence of the current overall minimum f̃ after n points have been sampled. Then, under assumption A and a statistically non-informative prior distribution, the probability that f̃ corresponds to f* may be obtained from

Pr [ f̃ = f* ] ≥ q(n, r) = 1 − (n + 1)! (2n − r)! / [ (2n + 1)! (n − r)! ]     (12)

On the basis of this theorem the stopping rule becomes: STOP when Pr [ f̃ = f* ] ≥ q*, where q* is some prescribed desired confidence level, typically chosen as 0.99.
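The stopping rule is cheap to evaluate. The sketch below computes the bound q(n, r) of (12), as reconstructed above, using log-gamma for numerical stability, and stops once it exceeds q*; the driver loop, which pretends every start converges to the same minimum, is purely illustrative.

```python
from math import lgamma, exp

def confidence(n, r):
    """q(n, r) = 1 - (n+1)! (2n-r)! / ((2n+1)! (n-r)!), via log-gamma."""
    log_complement = (lgamma(n + 2) + lgamma(2 * n - r + 1)
                      - lgamma(2 * n + 2) - lgamma(n - r + 1))
    return 1.0 - exp(log_complement)

q_star, n, r = 0.99, 0, 0
while confidence(n, r) < q_star:
    n += 1
    r += 1        # illustrative: assume every start reached the current best minimum
print(n, confidence(n, r))   # stops at n = r = 4, q = 0.992..., consistent with the 4/4 entries
```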
5 Numerical results

No.  Name                  ID   n   Ref.
1    Griewank G1           G1   2   [2, 4]
2    Griewank G2           G2   10  [2, 4]
3    Goldstein-Price       GP   2   [2, 10]
4    Six-hump Camelback    C6   2   [2, 11]
5    Shubert, Levi No. 4   SH   2   [12]
6    Branin                BR   2   [2, 13]
7    Rastrigin             RA   2   [2]
8    Hartman 3             H3   3   [2, 10]
9    Hartman 6             H6   6   [2, 10]
10   Shekel 5              S5   4   [2, 10]
11   Shekel 7              S7   4   [2, 10]
12   Shekel 10             S10  4   [2, 10]

Table 1: The Test Functions

The test functions used are tabulated in Table 1, and tabulated numerical results are presented in Tables 2 and 3. In the tables, the reported numbers of function values N_f are the average of 10 independent (random) starts of each algorithm. Unless otherwise stated, the following settings were used in the SF algorithm (see [5]): γ = 2.0, α = 0.95, ε = 10⁻², w = 10⁻², δ = 0.0, q* = 0.99, and Δt = 1.0. For
Table 2: Numerical Results. [For each test function G1–S10 the table reports, for the SF algorithm of this study, the average number of function evaluations N_f and the best and worst ratios (r/n)_b and (r/n)_w over 10 runs; the corresponding N_f and r/n values reported in Ref. [5]; and N_f, (r/n)_b and (r/n)_w for the MBB algorithm. The individual entries were scrambled in extraction and are not reproduced here.]
Method   C6   GP   RA    SH    BR    H3
TRUST    55   31   103   59    72    58
MBB      25   29   74    168   171   24

Table 3: Cost (N_f) using a priori Stopping Condition
the MBB algorithm, α = 0.99, ε = 10⁻⁴, and q* = 0.99 were used. For each problem, the initial velocity V_0 was chosen such that Δx^1 was equal to half the 'radius' of the domain D. A local search strategy was implemented with varying α in the vicinity of local minima. In Table 2, (r/n)_b and (r/n)_w respectively indicate the best and worst r/n ratios (see equation (12)) observed during 10 independent optimization runs of both algorithms. The SF results compare well with the previously published results by Snyman and Fatti, who reported values for a single run only. For the Shubert, Branin and Rastrigin functions, the MBB algorithm is superior to the SF algorithm. For the Shekel functions (S5, S7 and S10), the SF algorithm is superior. As a result of the stopping criterion (12), the SF and MBB algorithms found the global optimum between 4 and 6 times for each problem. The results for the trying Griewank functions (Table 2) are encouraging. G1 has some 500 local minima in the region of interest, and G2 several thousand. The values used
for the parameters are as specified, with Δt = 5.0 for G1 and G2 in the SF algorithm. It appears that both the SF and MBB algorithms are highly effective for problems with a large number of local minima in D, and problems with a large number of design variables. In Table 3 the MBB algorithm is compared with the recently published deterministic TRUST algorithm [14]. Since the TRUST algorithm was terminated when the global approximation was within a specified tolerance of the (known) global optimum, a similar criterion was used for the MBB algorithm. The table reveals that the two algorithms compare well. Note, however, that the highest dimension of the test problems used in [14] is 3. It is unclear if the deterministic TRUST algorithm will perform well for problems of large dimension, or problems with a large number of local minima in D.
6 Conclusions
Two stochastic global optimization methods based on dynamic search trajectories are presented. The algorithms are the Snyman-Fatti trajectory method and the modified bouncing ball trajectory method. Numerical results indicate that both algorithms are effective in finding the global optimum efficiently. In particular, the results for the trying Griewank functions are encouraging. Both algorithms appear effective for problems with a large number of local minima in the domain, and problems with a large number of design variables. A salient feature of the algorithms is the availability of an apparently effective global stopping criterion.
References

[1] Schoen, F. (1991), "Stochastic Techniques for Global Optimization: A Survey of Recent Advances," J. Global Optim., 1, 207-228.
[2] Torn, A. and Zilinskas, A. (1989), Global Optimization: Lecture Notes in Computer Science, 350, Springer-Verlag, Berlin.
[3] Arora, J.S., El-wakeil, O.A., Chahande, A.I. and Hsieh, C.C. (1995), "Global Optimization Methods for Engineering Applications: A Review," Preprint.
[4] Griewank, A.O. (1981), "Generalized Descent for Global Optimization," J. Optim. Theory Appl., 34, 11-39.
[5] Snyman, J.A. and Fatti, L.P. (1987), "A Multi-start Global Minimization Algorithm with Dynamic Search Trajectories," J. Optim. Theory Appl., 54, 121-141.
[6] Dixon, L.C.W., Gomulka, J. and Szego, G.P. (1975), "Towards a Global Optimization Technique," Towards Global Optimization, (Dixon, L.C.W. and Szego, G.P., Eds.), North-Holland, Amsterdam, 29-54.
[7] Snyman, J.A. (1982), "A New and Dynamic Method for Unconstrained Minimization," Appl. Math. Modeling, 6, 449-462.
[8] Snyman, J.A. (1983), "An Improved Version of the Original Leap-Frog Dynamic Method for Unconstrained Minimization: LFOP1(b)," Appl. Math. Modeling, 7, 216-218.
[9] Dixon, L.C.W., Gomulka, J. and Hersom, S.E. (1976), "Reflections on the Global Optimization Problem," Optimization in Action, (Dixon, L.C.W., Ed.), Academic Press, London, 398-435.
[10] Dixon, L.C.W. and Szego, G.P. (1978), "The Global Optimization Problem: An Introduction," Towards Global Optimization 2, (Dixon, L.C.W. and Szego, G.P., Eds.), North-Holland, Amsterdam, 1-15.
[11] Branin, F.H. (1972), "Widely Used Convergent Methods for Finding Multiple Solutions of Simultaneous Equations," IBM J. Research Develop., 504-522.
[12] Lucidi, S. and Piccioni, M. (1989), "Random Tunneling by means of Acceptance-rejection Sampling for Global Optimization," J. Opt. Theory Appl., 62, 255-277.
[13] Branin, F.H. and Hoo, S.K. (1972), "A Method for Finding Multiple Extrema of a Function of n Variables," Numerical Methods of Nonlinear Optimization, (Lootsma, F.A., Ed.), Academic Press, London, 231-237.
[14] Barhen, J., Protopopescu, V. and Reister, D. (16 May 1997), "TRUST: A Deterministic Algorithm for Global Optimization," Science, 276, 1094-1097.
Combinatorial and Global Optimization, pp. 133-144 P.M. Pardalos, A. Migdalas and R. Burkard, Editors © 2002 World Scientific Publishing Co.
On Pareto Efficiency. A General Constructive Existence Principle

G. Isac ([email protected])
Department of Mathematics and Computer Science, Royal Military College of Canada, P.O. Box 17000 STN Forces, Kingston, Ontario, K7K 7B4, Canada
Abstract We present in this paper a general constructive existence principle for Pareto efficiency. This principle is presented in sequentially complete locally convex spaces. In the second part of the paper we present several realizations of this general principle in some particular cases. This paper can be considered as a starting point for new investigations. Keywords: Constructive existence principle and Pareto efficiency.
1 Introduction
The importance of Pareto efficiency, in vector optimization theory, with respect to a cone, is well known. An important problem in vector optimization is the identification of efficient points of a set. Consider the general case, i.e., the case of infinite dimensional vector spaces. Let F and E be topological vector spaces. Suppose that E is ordered by a closed pointed
convex cone K ⊂ E. Given a non-empty subset X ⊂ F and a function f : X → E, consider the vector optimization problem

maximize f(x),   x ∈ X     (1)

For problem (1), we are interested in finding all solutions that are efficient, i.e., the elements of the set

X_e = eff(f(X), K) = { x_0 ∈ X | f(x) − f(x_0) ∈ K for some x ∈ X implies f(x) = f(x_0) }.

We can show that x_0 ∈ X_e if and only if

f(X) ∩ (K + f(x_0)) = { f(x_0) }     (2)
We have that f(x_0) is a K-support point for f(X) in the sense of Bishop and Phelps [4]. There also exist two important variants of efficiency. For any set A ⊂ E we denote by cone(A) = ∪_{t≥0} tA and by cl[cone(A)] the closure of cone(A). A point x_0 is said to be a properly efficient solution of (1) if x_0 ∈ X_e and cl[cone(f(X) − K − {f(x_0)})] ∩ K = {0}. If we denote by X_pe the set of all properly efficient solutions of (1), we have that X_pe ⊆ X_e. For more details about proper efficiency the reader is referred to [1], [2], [3], [15], [16] and [20]. If E* is the topological dual of E, we denote by K^# = { φ ∈ E* | φ(k) > 0 ∀ k ∈ K \ {0} }. Suppose that K is such that K^# is non-empty. We say that x_0 ∈ X is a positive proper efficient solution of (1) if there exists φ ∈ K^# such that φ(f(x_0)) ≥ φ(f(x)) for all x ∈ X.
2 Preliminaries
Let E(τ) be a topological vector space. We say that a subset K ⊂ E is a pointed convex cone if the following properties are satisfied:

(k1) λK ⊆ K for all λ ∈ ℝ₊,
(k2) K + K ⊆ K,
(k3) K ∩ (−K) = {0}.
The topological dual of E will be denoted by E*. By definition, the dual cone of K is K* = { y ∈ E* | ⟨x, y⟩ ≥ 0 for all x ∈ K }, where ⟨·,·⟩ is a duality between E and E*. We say that the cone K is well based if there exists a bounded convex set B such that 0 ∉ B̄ and K = ∪_{λ≥0} λB. We denote by B̄ the topological closure of B. If the cone K is well based, we can show that K has a bounded base, i.e., there exists a bounded convex set B_0 ⊂ K such that for every x ∈ K \ {0} there exist a unique real number λ_x > 0 and a unique element b_x ∈ B_0 such that x = λ_x b_x. We say that E(τ) is a locally convex space if the topology τ is defined by a sufficient family of seminorms {p_λ}_{λ∈Λ}, i.e. the family {p_λ}_{λ∈Λ} satisfies the following properties: (s1) for any x ∈ E, x ≠ 0, there exists λ ∈ Λ such that p_λ(x) ≠ 0, and (s2) for any λ_1, λ_2 ∈ Λ there exists λ_3 ∈ Λ such that p_{λ_1}(x), p_{λ_2}(x) ≤ p_{λ_3}(x) for all x ∈ E. If (E(τ), {p_λ}_{λ∈Λ}) is a locally convex space, then a closed pointed convex cone K is well based if and only if there exists f ∈ K* such that for each λ ∈ Λ there exists a real number δ_λ > 0 such that δ_λ p_λ(x) ≤ f(x) for all x ∈ K. We say that the cone K is solid if its topological interior is non-empty. A non-empty subset D of E is said to be K-bounded if D ⊆ y − K for some y ∈ E. Denote by ≤ the ordering defined by K, i.e., x ≤ y if and only if y − x ∈ K. If A is a non-empty subset of E, we say that an element x_0 ∈ A is a K-efficient point (or a K-conical support point) if and only if A ∩ (K + x_0) = {x_0}. Let D ⊆ E be a non-empty subset. We say that a set-valued mapping Γ : D → 2^D is a generalized dynamical system if for each x ∈ D, Γ(x) is non-empty. A point x_* ∈ D is said to be a critical point for Γ if Γ(x_*) = {x_*}. We observe that x_* ∈ D is a K-efficient point if and only if x_* is a critical point for the generalized dynamical system Γ(x) = (x + K) ∩ D, x ∈ D. We say that a locally convex space (E(τ), {p_λ}_{λ∈Λ}) is sequentially complete if every Cauchy sequence in E is convergent.
3 The main result
The main result in this paper is the following general constructive existence test for K-efficient points of a non-empty closed subset.

Theorem 1 Let (E(τ), {p_λ}_{λ∈Λ}) be a sequentially complete locally convex space, K ⊂ E a pointed, closed convex cone and D ⊂ E a non-empty closed subset. Consider the generalized dynamical system Γ : D → 2^D defined by Γ(x) = (x + K) ∩ D. Given x_0 ∈ D, if there exists a sequence {x_n}_{n=0}^∞ ⊂ D such that x_{n+1} ∈ Γ(x_n) for all n = 0, 1, 2, ..., and lim_{n→∞} δ_λ(Γ(x_n)) = 0 for all λ ∈ Λ, where δ_λ(A) = sup{ p_λ(x − y) | x, y ∈ A }, then D has a K-efficient point x_* such that x_0 ≤ x_*. Moreover, ∩_{n=0}^∞ Γ(x_n) = {x_*} and
lim_{n→∞} x_n = x_*.

Proof. First we remark that Γ(x_{n+1}) ⊆ Γ(x_n) for all n = 0, 1, 2, .... Indeed, if x ∈ Γ(x_{n+1}) = (x_{n+1} + K) ∩ D, then x = x_{n+1} + k_1, with k_1 ∈ K and x ∈ D. Since x_{n+1} ∈ Γ(x_n), there exists k_2 ∈ K such that x_{n+1} = x_n + k_2. Thus, we have x ∈ D and x = x_n + k_1 + k_2 ∈ x_n + K, that is x ∈ Γ(x_n). Therefore {Γ(x_n)}_{n=0}^∞ is decreasing. The sequence {x_n}_{n=0}^∞ is a Cauchy sequence. Indeed, since lim_{n→∞} δ_λ(Γ(x_n)) = 0 for all λ ∈ Λ, we have that for each ε > 0 and λ ∈ Λ there exists n_ε ∈ ℕ such that sup_{x,y ∈ Γ(x_n)} p_λ(x − y) < ε for all n ≥ n_ε. Because x_m ∈ Γ(x_{m−1}) ⊆ Γ(x_n) for all m, n ∈ ℕ such that m > n, we have p_λ(x_m − x_n) < ε whenever m > n ≥ n_ε. This implies that {x_n}_{n=0}^∞ is a Cauchy sequence in E(τ). By the sequential completeness of E there exists a point x_* ∈ E such that lim_{n→∞} x_n = x_*. Notice that x_m ∈ Γ(x_n) for all m > n, n = 0, 1, 2, .... Hence we have x_* ∈ Γ(x_n) for all n = 0, 1, 2, ..., i.e., x_* ∈ ∩_{n=0}^∞ Γ(x_n). (We also used the fact that Γ(x_n) is closed for each n = 0, 1, 2, ....) The element x_* is unique. Indeed, assume that there exists u_* ∈ ∩_{n=0}^∞ Γ(x_n) with u_* ≠ x_*. Since the family of seminorms {p_λ}_{λ∈Λ} is sufficient, we have that p_λ(x) = 0 for all λ ∈ Λ implies x = 0. Hence, because u_* − x_* ≠ 0, there exists some λ ∈ Λ such that p_λ(u_* − x_*) = α > 0. Since lim_{n→∞} δ_λ(Γ(x_n)) = 0 for all λ ∈ Λ, there exists n_0 ∈ ℕ such that δ_λ(Γ(x_{n_0})) < α. In view of x_*, u_* ∈ Γ(x_{n_0}) we have

p_λ(u_* − x_*) ≤ δ_λ(Γ(x_{n_0})) < α = p_λ(u_* − x_*),

which is a contradiction. Therefore we have that ∩_{n=0}^∞ Γ(x_n) = {x_*}. We must show that Γ(x_*) = {x_*}. Indeed, suppose that x ∈ Γ(x_*). We have x = x_* + k, with k ∈ K, and x ∈ D. Since x_* ∈ Γ(x_n) for all n = 0, 1, 2, ..., we deduce that x_* − x_n ∈ K for all n = 0, 1, 2, ..., which implies that x = x_* + k = x_n + k + k_1 ∈ Γ(x_n) for all n = 0, 1, 2, ... (where k_1 ∈ K). It follows that x ∈ ∩_{n=0}^∞ Γ(x_n) = {x_*}. Evidently, we have x = x_*. Since in particular x_* ∈ x_0 + K, we have x_0 ≤ x_* and the proof is complete. □

To apply Theorem 1 to particular problems, the art is the ability to construct a sequence with the properties indicated in Theorem 1.
Realizations of Theorem 1
We preset now in this section some realizations of the general principle expressed by Theorem 1. Let (E, || ||) be a Banach space and K C E a solid, closed pointed convex cone. For any ko 6 Int(K) and A E]0, 1[ we can define the set K(A, ko) = {x € E\x — A||x||fe0 € K}. We can show that K(A, ko) is a closed pointed convex cone. Because, for every x G K(A, k0) there exists k € K such that x—A||x||A;o = k, it follows that x = A||a;||fco € K, i. e., K(A, k0) C K. The cone K(A, k0) was defined and used in [14]. The next result, proved by A. Gopfert and Chr. Tammer in [14], can be considered in particular as a realization of Theorem 1. Denote by
Let (E, || ||) be a Banach space, K C E a solid, closed pointed convex € Int(K) and A G]0,1[. If D C E is an arbitrary closed K.-bounded any XQ £ D there exists x, 6 Eff(Z3; K(A, k0)) such that x0
Proof We consider on D the generalized dynamical system F(x) = (K(A; k0) + x)nD, for all x e D. Let the sequence be obtained by the construction proposed by Gopfert and Tammer in [14], i.e., (i) Xi = x0 (ii) if defined we choose xn+\ G T(xn) such that x G T(xn) does not exist with x G xn+i + ^nk0 + (K\{0}).
138 In [14] it is proved that this sequence is well defined and for every n € N, T(a;r,+i) C r(xn). If x e r ( x „ + i ) is an arbitrary element we have that x € xn+i +K(A, k0), which implies that ( i — xn+i) — X\\x — a;n+i||A;o G K. We have x
€
x„ + i + A||;r-:r n+1 ||fco + K
(3)
Since r(ai„ + i) C T(xn), we have that x e T(xn) and considering (ii) we deduce x
$ xn+1 + - ^ — k0 + ( K \ { 0 » n+ 1
(4)
Hence, taking into account (4) we have X\\x — xn+i\\ < ^ , which implies that 5(T(xn+i)) < w 2 + 1 ) , where <5(r(a;n+i)) = diam{T(xn+i)). Applying Theorem 1 the theorem is proved. • We will show now that a result similar to Theorem 2 follows from a more general construction, certainly, modulo another definition of the sequence {xn}neN . T h e o r e m 3 Let (E,\\ ||) be a Banach space, e a strictly positive real number and K C E a closed pointed convex cone such that K C {x € E\^(x) + e\\x\\ < 0}, where "J : E —> R is a subadditive continuous mapping. Let D be a non-empty closed subset of E. Then, for every xa € D such that $> is bounded from below on (xa + K)DD, there exists x* e Eff(D;K) such that xa
€.T(xn).
If (i) is the case, we have that xn = xt. Suppose that (ii) is true. Since K C {x 6 E\ib(x) + e\\x\\ < 0} we deduce from (ii) and using the fact that * is subadditive, tf (a;) - * ( x n ) < *(a; -
xn),
•${x) - *(a; n ) + e||ar - xn\\ < * ( i - xn) + e\\x - xn\\ < 0, and
*(a; n ).
(5)
139 Thus, *(xn) -
inf
*(j/) > * ( i n ) - *(x) > 0.
We pick a point xn+\ G T(a;n) such that *(xn+1)
<
inf
*(!/) + - * ( i B ) -
"gr(3:„)
inf
Z
*(i/)
(6)
v€T(x„)
and we have T(xn+i) = (x n + 1 + K) n D, and so on. Since K is a convex cone we can show that T(xn+i) C r(a; n ) for all n = 0 , 1 , 2 , . . . . The sequence {r(z 7 l )}^ = 0 (if it is not stopped after a finite number of steps) is decreasing. From (6) we obtain 9(Xn+1)
-
inf *(!/) < *(Xn+1) ^6r(i„+i)
-
inf 9( ) < - * ( ! „ ) - inf mv) fsr(x„) V z ver(xn)
(7)
Let z G T(a;„), then applying (7) n-times we obtain *(*„)-tf(z)
<
#(z„)-
inf
*(i/)
^€r(x„)
< -
1 - *(i„_i)2
< -
1 —n * ( * « ) 2
*(i/) < A
inf "6r(i„_i)
inf
*(i/)
Since K C {x € £|\t(a;) + e||z|| < 0} and r(a;„) = (xn + K) n D, we have for any z G r ( x n ) , *(z) - W(xn) < # ( 2 - z„), which implies - # ( 2 - xn) < *(a;„) - ^ ( z ) and finally, |iB-z||
< M*(a;„)-*(«)] <
1 ¥ ( ! . ) - inf *(«/) e2n "/er(i0)
¥ ( i a ) - inf *(«/) which implies that <5(r(:rn)) -4 0 e2" as n —>• 00. The theorem is now a consequence of Theorem 1 and the proof is complete. • Therefore 5(T{xn))
<
In [11] J. D. Dauer and O. A. Saleh proved the following result. Theorem 4 Let E(T) be a topological vector space and K a solid pointed convex cone in E. If y € Int{K) then the mapping ^v(x) = inf{a G R|a; G ({-ay} + K ) } , for all x G E, is a continuous suhlinear functional on E which satisfies the following properties:
140 (1) Int(K)
= { i E E\
(2) K = {x(E E\Vy{x)
< 0},
(3) ^y is K-decreasing on E. Applying Theorem 4 we obtain the following variant of Theorem 3. Theorem 5 Let (E, \\ ||) be a Banach space, K C E a solid, closed pointed convex cone and \f „ the continuous sublinear mapping defined in Theorem 4- Let e be a strictly positive (eventually sufficiently small) real number. Consider the closed convex cone K.(^y) = {x £ E\^y(x) + e||a;|| < 0}. If D C E is an arbitrary K-bounded set, then for any xa € D there exists xt € Eff(Z); K(\E,!,)) such that xa < K ( * „ ) X„. Moreover xm is obtained as the limit of a sequence {xn}%L0 C D defined by the method given in Theorem 3. Proof We observe that K(\t ! / ) C K. Since D is K-bounded, we have that ^y is bounded from below on (xa + K($j / )) n D. (This is a consequence of property (3) of mapping tyy. The theorem is now a consequence of Theorem 3.) • Remark Because for a very small e > 0 the cone K(^ , !/ ) is, in some sense, very close to K, the K(3 , !/ )-emciency obtained by Theorem 5 is a kind of e-emciency. For e-efficiency the reader is referred to [19], [25] and [32], Now, we consider the case of locally convex spaces. Let [E(T), {px}xe\) D e a locally convex space as denned in this paper. If K c E is a well based pointed closed convex cone, then there exists / 6 K* with the property that for every A e A there exists ex > 0 such that K C {x € E\f(x) + EAPA(^) < 0}. We have the following result. Theorem 6 Let (E(T),{PX}X6A) be a sequentially complete locally convex space and K C E a closed pointed convex cone. Suppose that there exists a subadditive continuous mapping $ : E —> R with the property that for every A £ A there exists ex > 0 such that K C {x e E\V(x) + txPx(x) < 0}. Let D be a non-empty closed subset of E. Then, for every xa £ D such that \f is bounded from below on (xa + K ) n D , there exists x„ G Eff(£); K) such that xa < K 1 , . Moreover, i» is obtained as the limit of a sequence {xn}n£ff C D, well defined. Proof We consider on D the generalized dynamical system r(a;) = (x + K)flD, x £ D and we define a sequence {xn}^=0 C D inductively as in the proof of Theorem 3. We take x0 = xa. Suppose that xn £ D is defined and r(xfc +1 ) C T{xk) for all k = 0 , 1 , . . . , n — 1. We have two possibilities:
141 (i) either T(xn) = (xn + K) n D = {xn} or (ii) there exists x ^ xn with x e
T(xn).
If (i) is the case, we have that xn = x*. Suppose that (ii) is true. Because the family of seminorms {p\}\eh is sufficient, there exists Ao € A such that p\0(x — xn) / 0 (i.e., P\a{x — xn) > 0). Because ty is subadditive we have $(x) - tf (z„) + cXoPx0(x - xn) < m(x - xn) + eXo{x - xn) < 0, which implies * ( i ) < * ( x n ) - ex0Px0(x - xn) < * ( x n ) . Thus, V(xn) -
inf
V(v) > V(xn) - V(x) > 0.
We pick a point xn+i € T(xn) such that *(xn+i)<
inf * ( » + - * ( * „ ) ^er(o:n) ^
inf
*M
^6r(x„)
We have F(a; n+ i) = (xn+i + K) n Z? and r ( x n + 1 ) C T(xn) for all n = 0,1, 2, It is evident that *0»+i)-
inf
*(i/)
- *(iB)-
<
i/er(x„+i)
/
inf *(i/) "er(i n )
(8)
Let a; G T(a; n ). Applying (8) n-times we obtain * ( ! „ ) - * ( ! )
<
^
tffxJ
-
inf
VM
ver(x„)
Since, for every A £ A we have K
C
{x € E\V(x)
+ £APA(X) < 0}
we can show as in the proof of Theorem 3 that px{xn - x)
<
1 ex2n
*(*„)-
inf
Mu)
v£T{xa)
which implies that Sx(T{xn))
<
e A 2"
^er(a:0)
We have that <5A(r(£n)) -> 0 as n -> oo for all A e A. The theorem is now a consequence of Theorem l.D
142 Corollary 7 Let {E(T), {P\}\£A) be a sequentially complete locally convex space and K C E a well based closed pointed convex cone. Let D be a non-empty closed subset of E. Then, for every xa £ D such that (xa + K) n D is bounded, there exists x, € Eff(£);K) such that xa
References [1] BENSON, H. P.: The vector maximization problem: proper efficiency and stability, SIAM J. Appl. Math. 32 (1977), 64-72. [2] BENSON, H. P.: An improved definition of proper efficiency for vector minimization with respect to cones, J. Math. Anal. Appl. 79 (1979), 232-241. [3] BENSON, H. P.: Efficiency and proper efficiency in vector maximization with respect to cones, J. Math. Anal. Appl. 93 (1983), 273-289. [4] BISHOP, E. and PHELPS, R. R.: The support functionals of a convex set, In.: Proc. Symp. Pure Math., Amer. Math. Soc, Providence, R. I. (1962), 27-35. [5] BITRAN, G. R. and MAGNANTI, T. L.: The structure of admissible points with respect to cone dominance, J. Optim. Theory Appl. 29 Nr. 4 (1978), 573-614. [6] BORWEIN, J. M : On the existence of Pareto efficient points, Math. Oper. Res. 8 Nr. 1 (1983), 64-73. [7] CESARY, L and SURYANARAYANA, M. B.: Existence theorems for Pareto optimization, multivalued and Banach space valued functionals, Trans. Amer. Math. Soc. 244 (1978), 37-65. [8] CHEW, K. L.: Maximal points with respect to cone dominance in Banach spaces and their existence, J. Optim Theory Appl. 44 Nr. 1 (1984), 1-53.
[9] CORLEY, H. W.: An existence result for maximization with respect to cones, J. Optim. Theory Appl. 31 Nr. 2 (1980), 277-281.
[10] DAUER, J. P. and GALLAGHER, R. J.: Positive proper efficient points and related cone results in vector optimization theory, SIAM J. Control Optim. 28 Nr. 1 (1990), 158-172.
[11] DAUER, J. P. and SALEH, O. A.: A characterization of proper minimal points as solutions of sublinear optimization problems, J. Math. Anal. Appl. 178 (1993), 227-246.
[12] DAUER, J. P. and STADLER, W.: A survey of vector optimization in infinite-dimensional spaces, Part II, J. Optim. Theory Appl. 51 (1986), 205-241.
[13] GOPFERT, A. and TAMMER, CHR.: A new maximal point theorem (To appear).
[14] GOPFERT, A. and TAMMER, CHR.: ε-Approximate solutions and conical support points. A new maximal point theorem, (Preprint (1997)).
[15] HARTLEY, R.: On cone-efficiency, cone-convexity and cone-compactness, SIAM J. Appl. Math. 34 Nr. 2 (1978), 211-222.
[16] HENIG, M. I.: Proper efficiency with respect to cones, J. Optim. Theory Appl. 36 (1982), 387-407.
[17] ISAC, G.: Sur l'existence de l'optimum de Pareto, Riv. Math. Univ. Parma (4) 9 (1983), 303-325.
[18] ISAC, G.: Pareto optimization in infinite dimensional spaces: the importance of nuclear cones, J. Math. Anal. Appl. 182 Nr. 2 (1994), 393-404.
[19] ISAC, G.: The Ekeland's principle and the Pareto ε-efficiency, In: M. Tamiz (Ed.): Multi-Objective Programming and Goal Programming, Lect. Notes in Econom. Math. Systems Nr. 432, Springer-Verlag (1996), 148-162.
[20] ISAC, G. and POSTOLICA, V.: The Best Approximation and Optimization in Locally Convex Spaces, Verlag Peter Lang, Frankfurt (1993).
[21] JAHN, J.: Existence theorems in vector optimization, J. Optim. Theory Appl. 50 Nr. 3 (1986), 397-406.
[22] JAHN, J.: Mathematical Vector Optimization in Partially Ordered Linear Spaces, Peter Lang, Frankfurt.
[23] LUC, D. T.: An existence theorem in vector optimization, Math. Oper. Res. 14 Nr. 4 (1989), 693-699.
144 [24] LUC, D. T.: Theory of Vector Optimization, Lecture Notes in Economics and Math. Systems, Vol. 319, Springer-Verlag, New York, Berlin (1989). [25] NEMETH, A. B.: Between Pareto efficiency and Pareto e-efficiency, Optimization 20 Nr. 5 (1989), 615-637. [26] PENOT, J. P.: L'optimization la Pareto: Deux ou trois choses que je sais d'elle, Publ. Math. Univ. Pau (1978). [27] PERESSINI, A. L.: Ordered Topological Vector Spaces, Harper and Row New York, Evanston and London, (1967). [28] PHELPS, R. R.: Support cones in Banach spaces and their applications, Advances in Math. 13 (1974), 1-19. [29] POSTOLICA, V.: Some existence results concerning the efficient points in locally convex spaces, Babes-Bolyai Univ. Faculty of Math. Seminar on Math. Anal. (1987), 75-80. [30] STAIB, T.: On two generalizations of Pareto minimality, J. Optim. Theory Appl. 59 Nr. 2 (1988), 289-306. [31] STERNA-KARWAT, A.: On existence of cone-maximal points in real topological linear spaces, Israel J. Math. 54 Nr. 1 (1986), 33-41. [32] TAMMER, CHR.: Existence results and necessary conditions for e-efficient elements, In: B. Brosowski, J. Ester, S. Helding and R. Nehse (Eds.), Multicriteria Deci- sion Proc. 14th Meeting of the German Working Group "Mehrkriterielle Entsheidung" , Peter Lang, Frankfurt (1993), 97-110. [33] WAGNER, D. H.: Semi-compactness with respect to an euclidean cone, Canad. J. Math. 29 Nr. 1 (1977), 29-36. [34] YU, P. L.: Cone convexity, cone extreme points and nondominated solutions in decision problems with multiobjectives, J. Optim. Theory Appl. 14 (1974), 319-377.
Combinatorial and Global Optimization, pp. 145-160 P.M. Pardalos, A. Migdalas and R. Burkard, Editors © 2002 World Scientific Publishing Co.
Piecewise Linear Network Flow Problems

Dukwon Kim ([email protected])
Center for Applied Optimization, Department of Industrial and Systems Engineering, 303 Weil Hall, University of Florida, Gainesville, FL 32611 USA

Panos M. Pardalos ([email protected])
Center for Applied Optimization, Department of Industrial and Systems Engineering, 303 Weil Hall, University of Florida, Gainesville, FL 32611 USA
Abstract

We consider the minimum cost network flow problem, and in particular the case with piecewise linear arc costs. We identify special subclasses of minimum cost network flow problems according to the type of piecewise cost functions, discuss their relationships, and present transformations and reformulations. The general piecewise linear minimum cost network flow problem (PLNFP) is formulated as a fixed charge network flow problem (FCNFP), and it is demonstrated that the indefinite, nonconvex PLNFP is equivalent to a FCNFP with a reduced feasible region. A major advantage of the formulations introduced here is that solutions can be found by solving fixed charge problems instead of solving difficult nonconvex optimization problems.

Keywords: minimum concave cost network flow problem, piecewise linear minimum cost network flow problem, fixed charge network flow problem, minimum indefinite cost network flow problem, mixed integer formulations
1 Introduction
In this article, the minimum cost network flow problem [39] is considered with piecewise linear arc costs, the so-called Piecewise Linear minimum cost Network Flow Problem (PLNFP). As a special subclass of minimum cost network flow problems, general piecewise linear network problems can be classified according to the type of piecewise cost functions. Using general network flow constraints, PLNFP can be stated as follows: Given a directed graph G = (N, A) consisting of a set N of m nodes and a set A of n arcs, then solve

[PLNFP]   min f(x) = Σ_{(i,j)∈A} f_ij(x_ij)

subject to

Σ_{(k,i)∈A} x_ki − Σ_{(i,k)∈A} x_ik = b_i,   ∀ i ∈ N     (1)

0 ≤ x_ij ≤ u_ij,   ∀ (i,j) ∈ A     (2)
where f is separable and each f_ij is piecewise linear. For instance, the arc cost f_ij(x_ij) can be defined as follows:

f_ij(x_ij) = c_ij^k x_ij + s_ij^k,   if λ_ij^{k−1} ≤ x_ij ≤ λ_ij^k,  k = 1, ..., r_ij,

where λ_ij^k for k = 1 to r_ij − 1 are breakpoints in the given interval [0, u_ij] (with λ_ij^0 = 0 and λ_ij^{r_ij} = u_ij). The constraints in (1) are called the conservation of flow equations. The constraints in (2) are called capacity constraints on the arc flows. The problem is uncapacitated if u_ij = ∞, ∀ (i,j) ∈ A.
Three specific classes of PLNFP are identified based on the arc costs f_ij, as follows:

1. Convex PLNFP
2. Concave PLNFP
3. Indefinite PLNFP

In some cases, the indefinite PLNFP is called discontinuous PLNFP, since it usually results from a set of discontinuities in the arc cost functions. Since the Fixed Charge Network Flow Problem (FCNFP) has a very close relation to the PLNFP, it is important to understand the special structure of FCNFP in order to solve PLNFP. Due to the global optimality property of concave minimization [32, 34], global solutions can be obtained at extreme points of the feasible region in the cases of FCNFP and concave PLNFP. A survey on minimum concave-cost network flow problems can be found in [15].
1.1 Convex PLNFP
Suppose the constraints are all linear, and the cost function to be minimized is separable and piecewise linear. Then the proportionality assumption is violated for the cost function. However, if the piecewise linear function is convex, the problem can still be modeled as an LP [28]. Let f_ij(x_ij) denote the contribution of x_ij to the separable objective function. Suppose that there are r_ij − 1 breakpoints at which f_ij(x_ij) changes slope, such that 0 = λ_ij^0 < λ_ij^1 < ... < λ_ij^{r_ij−1} < λ_ij^{r_ij} = u_ij.
Let the slope in the subinterval λ_ij^{k−1} ≤ x_ij ≤ λ_ij^k be c_ij^k for k = 1 to r_ij, and let y_ij^k be the portion of x_ij lying in the kth subinterval, λ_ij^{k−1} to λ_ij^k (i.e., y_ij^k is the length of the overlap of the interval 0 to x_ij with the subinterval λ_ij^{k−1} to λ_ij^k), k = 1 to r_ij. When defined in this manner, the new variables y_ij^1, y_ij^2, ..., y_ij^{r_ij} partition x_ij such that

x_ij = y_ij^1 + y_ij^2 + ... + y_ij^{r_ij}.     (3)

These variables are subject to the constraints:

0 ≤ y_ij^1 ≤ λ_ij^1
0 ≤ y_ij^2 ≤ λ_ij^2 − λ_ij^1
...
0 ≤ y_ij^{r_ij} ≤ λ_ij^{r_ij} − λ_ij^{r_ij−1}     (4)

and, for every t, if y_ij^t > 0, then each of the y_ij^k is equal to its upper bound

λ_ij^k − λ_ij^{k−1},   ∀ k < t.     (5)
4 <4 <...<#,
(6)
constraints on the new variables of the type (5) can be ignored in the transformed model. Since convex PLNFP is reformulated as an LP using the above technique and has specially structured network constraints, it can be solved efficiently in polynomial time. If fij is not continuous or the slopes do not satisfy the condition (6), then the constraints (5) must be specifically included in the model. Since these constraints are not linear, the transformed model is no longer an LP.
1.2
FCNFP
Due to the similarity of its structure with the piecewise linear case, the Fixed Charge Network Flow Problem (FCNFP) has close relations to PLNFP. It is very important and useful to investigate some features of FCNFP even if FCNFP does not belong to the class of PLNFP. The FCNFP is a special case of Minimum Concave Cost Network Flow Problems (MCNFP) [32], whose arc cost function has a discontinuity at the origin. The arc cost function fij(xij) of the FCNFP has a form
f.(x-
)= I
%i
i = ° >0,
(7) y !
where s y > 0 is a fixed cost for arc (i,j) G A. In many practical problems, the cost of an activity is the sum of a fixed cost and a cost proportional to the level of the activity. FCNFP is obtained by imposing a fixed cost of s y > 0 if there is positive flow on arc (i,j) and a variable cost c v . Due to the discontinuity of / y , the problem can be transformed to a 0-1 mixed integer programming problem by introducing n binary variables, indicating whether the corresponding activity is being carried out or not. Assuming Sy > 0, /y- can be replaced by
149 Jij
CijX^j ~h SijVij
with
Xii
> 0 and !/« = J J ff *jj ~ J
(8)
The above condition (8) can be incorporated into the capacity constraints to yield 0 < Xij < UijVij, yij 6 {0,1}. Hence we obtain the following formulation of the fixed charge network flow problem: [FCNFPJMIP
min
Yl (c
subject to Mx = b
(9)
0 < x^ < Uijyij,
(i,j)€A
(10)
!/ye{0,i},
(iJ)eA,
(ll)
where M is an m x n node-arc incidence matrix and 6 is an m-dimensional column vector. This P C N F P M / P can be solved by using any type of classical branch and bound algorithms that use LP relaxations [30]. These LP relaxations can be solved efficiently by existing linear network algorithms exploiting the special structure of their feasible domain [18]. As we can see in later sections, many concave and indefinite PLNFPs can be reduced to a FCNFP model by introducing new variables and modifying problem structures. It is noticed that FCNFP models reduced from original PLNFPs can be also transformed to a 0-1 mixed integer programming problem. Consequently, the size of the resulting model grows very fast even if the original PLNFP is just of medium size. This stimulates the reason that many researchers have developed new efficient schemes to improve their exact methods (especially the branch and bound method). Indeed, the computational effort and memory requirement to solve large-scale FCNFP models have been gradually reduced in various application areas. Yet, since there is a limitation for improving exact solution methods to solve the problem in practical sense, developing an effective approximate method is still in need.
1.3
Concave P L N F P
As different from the convex case, concave PLNFPs are more difficult to solve since we cannot use the same technique in the convex case to reduce the problem into an LP.
150
V (= ««)
*«
Figure 1: Example of a concave piecewise linear arc cost function. However, a concave PLNFP can be transformed to a fixed charge network problem in an extended network. The size of the extended network depends on the number of linear pieces in each arc cost function. An Arc Separation Procedure (ASP) is required to solve the problem in this way and ASP can be valid due to the concavity of arc cost functions. Let us consider an arc (i,j) £ A and its arc cost / „ , and suppose JV, has r „ linear pieces as defined previously. Then arc (i,j) can be separated into rtj arcs between nodes i and j for (i,j) 6 A. Each separated arc (i, j)k for k = 1 to r^ has a fixed charge cost function ft (see Figure 1) defined by
Jij\*l,)
| 4+4-Iy
if
Xij
= 0 > 0.
This extended network is denoted by Ge(N, Ae) where the number of arcs \Ae\ is given by
ne = \Ae\ = J2 rnAfter the ASP modification shown in Figure 2, the original concave piecewise linear objective function can be expressed as a sum of fixed charge arc cost functions as follows:
/(*) =
E (i,3)eA
fa(xii)
151
fi'W
ffrtl)
S>
<
Figure 2: Arc separation procedure.
= £ E/«(*«) (i,j)eA fc=i
= E X K 4 + 4)-
(12)
It is easy to see that the equality in (12) can not be true in general cases without a set of constraints to restrict a domain for each separated arc cost function. However, due to the following property from the concavity of arc cost functions:
4 > 4 > ... > c$ > 0,
(13)
the equality holds true in this case. More precisely, fij(xij) is equal to at most one arc cost among all separated arc costs between node i and j at the optimality of minimization problems. This argument can be generalized as the following theorem.
Theorem 1 Given an extended network described above, if a positive flow a;*,- is optimal for a minimum concave PLNFP and if Xjf1 < X\A < Ay- for 1 < q < rij, then it takes only one arc, {i,j)q (i.e. q-th arc) among all separated arcs between node i and j , (i,j)k for k = 1 to rtj. Proof: The same notation as the above is used in the proof. Let us consider an arc (i,j) e A with r y pieces of concave piecewise linear cost function / y . And denote that c y a n < i s% a r e slope and fixed charge of the kth piece of linear function for k = 1,2, ...,r y -, respectively. Notice that these slopes hold the property in (13). After ASP, there are r y separated arcs the flow can take. So, let us denote that flowk is the optimal flow of the fcth separated arc (i, j)k for K — i, z , . . . , r^j.
152 Firstly, it is clear that if x*j is less than or equal to the first breakpoint Ay, then the optimal flow cannot be split (i.e. flow1 = a;*.) since the cost of arc (i, j)1, f}j < f£, V Xij < X\j for k = 2 , 3 , . . . , r „ . Secondly, consider the following two cases when A?- < A^"1 < x*j < Xfj, and suppose flowp and flow9 are optimal flows in the extended network problem. Case I : flowp — e and flow9 = x*j — e > 0 for any positive e. Using a contradiction, suppose that the optimal flow is split into two arcs (i,j)p and (hj)9j 1 < V < 1 < rtj- Now, let us evaluate the difference of cost function values in the original PLNFP with x^ and the extended network problem with flowp and flow9:
x x [fm+fU h-c)]-fv( w £ c x £
= [4 + 4 + 4 + U ij - )1 ~ (4 + cijx*ij) >
0.
The inequality in the last term above is true since 4 contradicts to the optimality of flowp and flow9. Case I I : flowp = e = x*j and flow9 = 0.
^
an
d 4
*"- 4 "
Thus, it
Again, using a contradiction, let us evaluate the difference of cost function value between node i and j :
[fm+fiM-uMi) =
\sij + Hje) ~ \Sij + cijxij) lSij ' Hj^ijl ~ \Sij "T" cijxij)
=
>
0.
With the same reason in Case I, we have a contradiction to the optimality of flowp and flow9. Therefore, flow9 = x^ and the proof is completed. D Based on the theorem above, the original concave PLNFP can be reduced to a FCNFP with the objective function given in (12) and some extended network constraints corresponding to the constraints in (1). The resulting formulation of a concave PLNFP is shown as follows: [PLNFP]FC
min /(*)= £ £ 4 ( 4 ) (ij)€A *=1
subject to
£
£ 4 -
(l,i)€Ak=l
£ 5>« = fc> vi 6 jv (i,l)eAk=l
(i4)
153
4>0, o < i y < Uy,
\/{i,j)£A, fc = l,2,...,r y V(i, j ) e A
In the formulation, it is clear that any other set of constraints is not necessary to specify lower and upper bounds for separated arcs {i,j)k £ Ae, V i,j G N due to the above theorem. As a result, the solution of concave PLNFP can be found by solving the fixed charge network problem formulated as above. Thus, developing an efficient algorithm to solve a FCNFP (exactly or approximately) is a key to solve PLNFPs in this approach.
1.4
Indefinite P L N F P
Lastly, we consider a PLNFP with indefinite arc cost functions, which is the most difficult case in this class. The major difficulty to find exact solutions for indefinite PLNFPs is originated from the structure of their arc cost function. Obviously such cost functions are neither convex nor concave, and possibly have a finite set of discontinuities. However, due to the nature of real world applications of the model, we focus on two certain types of arc cost function in indefinite PLNFP models, so called staircase [6, 17, 24] and sawtooth [24] arc cost functions, respectively. Both arc cost functions have a very similar structure in overall shape, however, they have a different aspect at breakpoints (see Figure 1 and 2). It can be described in mathematical form as follows: • "Staircase" arc cost function: fif1^'1)
< fiMif1 + £)>
for
any £ > 0 and fc = 2, 3 , . . . , r y .
• "Sawtooth" arc cost function: / ^ ( A * - 1 - e) > /'(Ay" 1 ), for any e > 0 and k = 2, 3 , . . . , r y . Moreover, it is assumed that the property of slopes shown in (13) is still valid since it is a very general phenomenon in real applications. Note that extreme point solutions are not guaranteed in this case since objective functions are no longer concave. Now, we introduce an equivalent MIP formulation for the problem in a FCNFP model with some additional parameters and binary variables. Let us define the size of interval between adjacent breakpoints as AA*. = A*. - A*ri, V ( U ) € A ,
k = 1, 2 , . . . , r y ,
(15)
154
A / ( = u,j)
Figure 3: Staircase arc cost function.
Figure 4: Sawtooth arc cost function.
*J
155 and define the gap of function values at each breakpoint in arc cost functions as
A 4 = (4A*r1 + 4)-(c^ 1 A* r 1 + s*r1) =
4 - 4 " 1 + (c* - c*ri)A*-\
V(i, j ) e A, Vfc,
(16)
where sy- = 0 and Cy = 0 (also clearly Ady = sy-)- We now let Xy be the part of Xy that lies within level k (i.e. fcth subinterval), in the following sense: (0 4
ifXy
= 1 ^y-^y"1 [ AA*.
ifAy-^XylA^. if xy > A*-,
(17)
and we obtain the following equation for substitution into [ F C N F P ] M I P model. We then introduce new binary variables defined by *
f 1 \ 0
^
if A*"1 < Xy otherwise.
( V
} ;
Using (15) - (18), the indefinite PLNFP under consideration can be formulated as an MIP version of FCNFP as follows: mhl
£
m $>yXy.+Ady4)
(ij)6-4 fc=l
subject to constraints in (1) and V
(M) V(t, j ) , k = 1, . . . , Ty
(19) (20)
xy-i > AAy- 1 ^., V ( i J ) , fc = 2 , . . . , ry
(21)
2y = E ^ l 4 , Xy < AAy»y, 4>°> »y 6 {0,1},
V(t,j), A = l , . . . , r y V(t,j), fc = l , . . . , r y .
(22) (23)
It is noticed that combining one constraint from (20) and one from (21) yields AAy- 1 !^ < x ^ < AAf- 1 ^- 1 , which implies y% < y*" 1 , V(i, j ) , fc > 1. There is another approach to formulate the problem as a concave Minimum Cost Network Flow Problem model. In [24], Lamar described an equivalent formulation of MCNFP with general nonlinear arc costs (including the problem considered in this section) as a concave MCNFP on an extended network. The equivalence between the problems is based on converting each arc with an arbitrary cost function in the original problem into an arc with a concave piecewise linear cost function in series with a set of parallel arcs, each with a linear arc cost function (see [24] for details). Thus, the resulting problem is a concave MCNFP, which is different from the FCNFP formulation model shown above.
156
2
Applications
Piecewise linear network models have a number of applications in various areas such as transportation problems, location problems, distribution problems, communication network design problems, and economic lot-sizing problems. Due to the structure of objective functions (especially in concave and indefinite PLNFP models), many realworld situations listed above can be modeled as PLNFP. There are major applications in the following two fields which have been studied extensively by many researchers.
2.1  Transportation Problems
The first major field is transportation-related problems with concave cost functions, including fixed charge cost functions. The concave cost functions in this field are usually assumed to be piecewise linear. A number of algorithms developed with different schemes and their computational results have been reported. These can be found in a limited list of references [2, 3, 4, 7, 11, 16, 19, 20, 23, 25, 26, 31, 35, 36]. The category of exact solution approaches contains diverse techniques based on extreme point ranking, branch and bound, and dynamic programming methods. Since the problems can be formulated as MIP models, branch and bound approaches with various branching schemes have been a major interest in the literature. Murty [29] introduced an extreme point ranking method for solving fixed charge problems. McKeown [27] extended Murty's method to avoid some degeneracy of the problem. Recently, Pardalos [33] discussed a range of enumerative techniques based on vertex enumeration or extreme point ranking. Rech and Barton [35] investigated a nonconvex transportation algorithm using a branch and bound approach for problems with piecewise linear cost functions. These functions are approximated by a convex envelope and solved using the out-of-kilter method. Bornstein and Rust [6] specialized this approach to the problem with staircase cost functions, using successive linearizations of the objective function. Thach [38] proposed a method for decomposing the problem with a staircase structure into a sequence of much smaller subproblems. Lamar [25] developed a branch and bound approach for cases of capacitated MCNFP in which the costs consist of piecewise linear segments. The problem is formulated as an MIP, with the branching variables determining which linear cost region an arc flow falls into. Recently, Kim and Pardalos [22] developed a heuristic procedure for solving general FCNFP without formulating it as an MIP. The procedure consists of solving a series of LPs to update slopes and searching extreme points of the convex feasible region with the updated slopes. This approach offers the possibility of parallel implementation with different initial solutions to improve the quality of solutions. Some heuristic approaches can be found in [8, 11, 20, 23].
2.2  Location Problems
Another major application area is location problems. Since the problem in this field is to locate facilities and determine their size so as to minimize total cost [13], it naturally involves fixed costs and/or piecewise linear costs. Since solution methods in this field have used network formulations in many cases, they are quite similar to those for solving FCNFP or PLNFP [1, 10, 12, 21, 37]. There are, however, also Lagrangian approaches [5, 14] that exploit specific problem structures. Recently, Holmberg [17] proposed a decomposition and linearization approach for solving the facility location problem with staircase costs. A comparison of heuristic and relaxation approaches in this field can be found in [9].
3  Concluding Remarks
In this article, three categories of PLNFP are identified and formulated in general form. Some properties of the problems in each category are investigated to give insight into the problems, including FCNFP. The concave PLNFP is formulated as a FCNFP in MIP form by exploiting the concavity of the arc cost functions (see Section 2.3). Moreover, the indefinite (nonconvex) PLNFP is also transformed into a FCNFP with a reduced feasible region. This implies that an extreme point solution of the transformed FCNFP may not be an extreme point solution of the original indefinite PLNFP. A major advantage of the formulations introduced here is that solutions can be found by solving fixed charge problems instead of solving difficult nonconvex optimization problems. As we can see in the transformation to FCNFP, the size of the FCNFP is usually quite large because of the new binary variables introduced in the model. Thus, developing an efficient algorithm for large-scale FCNFP can provide a key to solving concave and indefinite PLNFP in practice.
References

[1] Akinc, U., and Khumawala, B.M.: 'An Efficient Branch and Bound Algorithm for the Capacitated Warehouse Location Problem', Management Science 23 (1977), 585-594.
[2] Balakrishnan, A., and Graves, S.C.: 'A Composite Algorithm for a Concave-Cost Network Flow Problem', Networks 19 (1989), 175-202.
[3] Balinski, M.L.: 'Fixed-Cost Transportation Problems', Naval Research Logistics Quarterly 8 (1961), 41-54.
[4] Barr, R.S., Glover, F., and Klingman, D.: 'A New Optimization Method for Large Scale Fixed Charge Transportation Problems', Operations Research 29 (1981), 448-463.
[5] Beasley, J.E.: 'Lagrangean Heuristics for Location Problems', European Journal of Operational Research 65 (1993), 383-399.
[6] Bornstein, C.T., and Rust, R.: 'Minimizing a Sum of Staircase Functions under Linear Constraints', Optimization 19 (1988), 181-190.
[7] Cabot, A.V., and Erenguc, S.S.: 'Some Branch-and-Bound Procedures for Fixed-Cost Transportation Problems', Naval Research Logistics Quarterly 31 (1984), 145-154.
[8] Cooper, L., and Drebes, C.: 'An Approximate Solution Method for the Fixed Charge Problem', Naval Research Logistics Quarterly 14 (1967), 101-113.
[9] Cornuejols, G., Sridharan, R., and Thizy, J.M.: 'A Comparison of Heuristics and Relaxations for the Capacitated Plant Location Problem', European Journal of Operational Research 50 (1991), 280-297.
[10] Davis, P.S., and Ray, T.L.: 'A Branch-and-Bound Algorithm for the Capacitated Facilities Location Problem', Naval Research Logistics Quarterly 16 (1969), 331-344.
[11] Diaby, M.: 'Successive Linear Approximation Procedure for Generalized Fixed-Charge Transportation Problem', Journal of the Operational Research Society 42 (1991), 991-1001.
[12] Efroymson, M.A., and Ray, T.L.: 'A Branch-and-Bound Algorithm for Plant Location', Operations Research 14 (1966), 361-368.
[13] Francis, R.L., McGinnis, L.F., and White, J.A.: Facility Layout and Location: An Analytical Approach, 2nd Ed., Prentice Hall, 1992.
[14] Geoffrion, A., and McBride, R.: 'Lagrangean Relaxation applied to Capacitated Facility Location Problems', AIIE Transactions 10 (1978), 40-47.
[15] Guisewite, G.M., and Pardalos, P.M.: 'Minimum Concave-cost Network Flow Problems: Applications, Complexity, and Algorithms', Annals of Operations Research 25 (1990), 75-100.
[16] Gray, P.: 'Exact Solution of the Fixed-Charge Transportation Problem', Operations Research 19 (1971), 1529-1538.
[17] Holmberg, K.: 'Solving the staircase cost facility location problem with decomposition and piecewise linearization', European Journal of Operational Research 75 (1994), 41-61.
[18] Horst, R., and Pardalos, P.M. (eds.): Handbook of Global Optimization, Kluwer Academic Publishers, 1995.
[19] Kennington, J., and Unger, E.: 'A New Branch-and-Bound Algorithm for the Fixed-Charge Transportation Problem', Management Science 22 (1976), 1116-1126.
[20] Khang, D.B., and Fujiwara, O.: 'Approximate Solutions of Capacitated Fixed-Charge Minimum Cost Network Flow Problems', Networks 21 (1991), 689-704.
[21] Khumawala, B.M.: 'An Efficient Branch-and-Bound Algorithm for the Warehouse Location Problem', Management Science 18 (1972), B718-B731.
[22] Kim, D., and Pardalos, P.M.: 'A Solution Approach to the Fixed Charge Network Flow Problem Using a Dynamic Slope Scaling Procedure', to appear in Operations Research Letters, 1997.
[23] Kuhn, H., and Baumol, W.: 'An Approximative Algorithm for the Fixed Charge Transportation Problem', Naval Research Logistics Quarterly 9 (1962), 1-15.
[24] Lamar, B.W.: 'A Method for Solving Network Flow Problems with General Nonlinear Arc Costs', in Du, D.-Z. and Pardalos, P.M. (eds.), Network Optimization Problems, World Scientific Publishing Co., 1993.
[25] Lamar, B.W.: 'An Improved Branch and Bound Algorithm for Minimum Concave Cost Network Flow Problems', Journal of Global Optimization 3 (1993), 261-287.
[26] Larsson, T., Migdalas, A., and Ronnqvist, M.: 'A Lagrangean Relaxation Approach to the Capacitated Minimum Concave Cost Flow Problem', European Journal of Operational Research 78 (1994), 116-129.
[27] McKeown, P.: 'A Vertex Ranking Procedure for Solving the Linear Fixed-Charge Problem', Operations Research 23 (1975), 1183-1191.
[28] Murty, K.G.: Linear and Combinatorial Programming, John Wiley and Sons, 1976.
[29] Murty, K.G.: 'Solving the Fixed Charge Problem by Ranking the Extreme Points', Operations Research 16 (1968), 268-279.
[30] Nemhauser, G.L., and Wolsey, L.A.: Integer and Combinatorial Optimization, John Wiley and Sons, 1988.
[31] Palekar, U.S., Karwan, M.H., and Zionts, S.: 'A Branch-and-Bound Method for the Fixed Charge Transportation Problem', Management Science 36 (1990), 1092-1105.
[32] Horst, R., Pardalos, P.M., and Thoai, N.V.: Introduction to Global Optimization, Kluwer Academic Publishers, 1995.
[33] Pardalos, P.M.: 'Enumerative Techniques for Solving some Nonconvex Global Optimization Problems', OR Spectrum 10 (1988), 29-35.
[34] Pardalos, P.M., and Rosen, J.B.: Constrained Global Optimization: Algorithms and Applications, Lecture Notes in Computer Science 268, Springer Verlag, 1987.
[35] Rech, P., and Barton, L.G.: 'A Non-Convex Transportation Algorithm', in Beale, E.M.L. (ed.), Applications of Mathematical Programming Techniques, The English Universities Press, 1970.
[36] Sa, G.: 'Concave Programming in Transportation Networks', Ph.D. dissertation, Sloan School of Management, M.I.T., 1968.
[37] Soland, R.M.: 'Optimal Facility Location with Concave Costs', Operations Research 22 (1974), 373-382.
[38] Thach, P.T.: 'A Decomposition Method for the Min Concave Cost Flow Problem with a Staircase Structure', Japan J. Appl. Math. 7 (1990), 103-120.
[39] Zangwill, W.I.: 'Minimum Concave-Cost Flows in Certain Networks', Management Science 14 (1968), 429-450.
Combinatorial and Global Optimization, pp. 161-176 P.M. Pardalos, A. Migdalas and R. Burkard, Editors © 2002 World Scientific Publishing Co.
Semidefinite Programming Approaches for MAX-2-SAT and MAX-3-SAT: computational perspectives

E. de Klerk (e.deklerk@twi.tudelft.nl)
Department of Technical Mathematics and Informatics, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands.

J.P. Warners (j.p.warners@twi.tudelft.nl)
Department of Technical Mathematics and Informatics, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands.
Abstract

Semidefinite programming (SDP) relaxations - in conjunction with randomized rounding schemes - yield 7/8 and 0.931 approximation algorithms for MAX-3-SAT and MAX-2-SAT respectively. In spite of these powerful theoretical results, it is not clear if SDP can be used as a practical tool for solving MAX-SAT problems to optimality. In this regard, the usefulness of the SDP approach will ultimately depend on the ability to exploit sparsity in the SDP relaxations of large dimension. (The dimension corresponds to the number of variables and the sparsity is related to the number of clauses.) We present an investigation of sparsity issues for the SDP relaxations of Goemans and Williamson [7], and Feige and Goemans [6] for MAX-2-SAT. Moreover, we test a branch and cut procedure to solve MAX-2-SAT to optimality, where the dual of the SDP relaxation is solved by interior point methods in order to exploit sparsity. The idea of exploiting sparsity in this way was first investigated for other combinatorial optimization problems by Benson, Ye, and Zhang [2]. Finally, based on this numerical experience we discuss possible extensions to MAX-3-SAT using the 7/8 relaxation of Karloff and Zwick [12] for MAX-3-SAT.

Keywords: Semidefinite programming, satisfiability, interior point algorithms
1  Introduction
An instance of the MAX-SAT problem is defined by a collection of Boolean clauses $\{C_1, \ldots, C_k\}$, where each clause is a disjunction of literals drawn from a set of variables $\{x_1, \ldots, x_n\}$. A literal is either a variable $x_i$ or its negation $\neg x_i$ for some i. Each clause has an associated nonnegative weight, and an optimal solution to a MAX-SAT instance is an assignment of truth values to the variables which maximizes the total weight of the satisfied clauses. MAX-p-SAT is a special case of MAX-SAT where each clause contains at most p literals. There are many algorithms available for the MAX-SAT problem, even though the MAX-2-SAT problem is already NP-complete. Algorithms for MAX-SAT include well-known procedures such as EDPL, and branch and cut; approximation algorithms include GRASP, GSAT, and recently semidefinite programming (SDP) (see e.g. [10] and the references therein). SDP became a powerful tool for MAX-2-SAT problems when Goemans and Williamson [7] proved that a 0.87856 approximation algorithm could be obtained by rewriting the MAX-2-SAT problem as a Boolean quadratic programming problem, and solving a convex SDP relaxation of the resulting problem, followed by a randomized rounding procedure. This rounding procedure may be seen as a heuristic which gives a solution with a quality guarantee, while the optimal value of the SDP gives a bound on the sub-optimality of this heuristic solution. If the heuristic solution is not shown to be optimal by the bound, then the relaxation must be tightened by adding suitable cuts. Recent numerical studies indicate that it is very hard to prove optimality for MAX-2-SAT by only tightening the SDP relaxation [10]. It therefore seems necessary to use the SDP relaxations in some branch and cut framework. Up to now, the bottleneck for such an approach has been that it is hard to exploit sparsity in the solution procedure of the SDP relaxations. In this paper, we show that one can solve the dual of the SDP relaxations efficiently, using the dual interior point method of Benson, Ye and Zhang [2]. Moreover, we show that the relaxed solutions can be used in a branch and cut scheme to solve MAX-2-SAT problems with 50 variables and up to 500 clauses to optimality in a few minutes on a workstation.
2  The SDP relaxation of MAX-2-SAT
The key in the SDP reformulation is to introduce new Boolean variables $(y_1, \ldots, y_n) \in \{-1,1\}^n$ and to express the number of unsatisfied clauses as a quadratic function of these new variables. Note that a clause $x_i \vee x_j$ is satisfied if $y_i + y_j \ge 0$, etc. Thus we can represent the satisfiability problem as a feasibility problem: find $y \in \{-1,1\}^n$ so that $Ay \ge 0$, where A is a suitable $k \times n$ matrix. Note that

$$(Ay)_i - 1 = \begin{cases} 1 \text{ or } -1 & \text{if clause } i \text{ is satisfied,} \\ -3 & \text{otherwise.} \end{cases} \qquad (1)$$
Letting e denote the vector of all-ones, it follows that $(Ay - e)^T(Ay - e) = 9\,\mathrm{unsat} + (k - \mathrm{unsat})$, where unsat denotes the number of unsatisfied clauses for the assignment y of truth values. Using $e^Te = k$, we have that

$$\mathrm{unsat} = \frac{1}{8}\left(y^TA^TAy - 2e^TAy\right),$$

and the MAX-2-SAT problem is simply to minimize this quantity over $y \in \{-1,1\}^n$. Thus we have rewritten MAX-2-SAT as a Boolean quadratic programming problem:

$$\mathrm{unsat}^* := \min\left\{\frac{1}{8}\left(y^TA^TAy - 2e^TAy\right) : y \in \{-1,1\}^n\right\}.$$

Such problems have standard semidefinite relaxations with provable quality bounds [16]. The first step in deriving the relaxation is to remove the linear term in the objective by adding an additional Boolean variable $y_{n+1} \in \{-1,1\}$ to obtain

$$\mathrm{unsat}^* = \min\left\{\frac{1}{8}\left(y^TA^TAy - 2y_{n+1}e^TAy\right) : (y, y_{n+1}) \in \{-1,1\}^{n+1}\right\}. \qquad (2)$$

Note that the introduction of the auxiliary variable $y_{n+1}$ does not change the optimal value of the optimization problem. Problem (2) can be further simplified by using the structure of A: note that one has

$$\sum_{i=1}^{n}\left(A^TA\right)_{ii} = 2k,$$

which shows that

$$\mathrm{unsat}^* = \min\left\{\frac{1}{8}\left(y^T\left[A^TA - \mathrm{diag}(A^TA)\right]y - 2y_{n+1}e^TAy + 2k\right) : (y, y_{n+1}) \in \{-1,1\}^{n+1}\right\}.$$
[ATA - diag (ATA)) y - 2yn+1eTAy
+ 2k) \ (y, yn+1) e { - 1 , l } n + 1 }
We can rewrite this as: ]-k + mm{yTWy
: y e {-1,1}-+'} ,
where y := [j/i,..., yn, 2/n+i] and W is the (n + 1) x (n + 1) matrix: ATA - diag (ATA) W:=\
-eTA
-ATe 0
(3)
Note that $W_{ij}$ can only be nonzero if $x_i$ and $x_j$ appear together in some clause. Thus the fraction of nonzeros of W will never exceed the ratio $(k+n) : \frac{1}{2}n(n+1)$. For example, for a MAX-2-SAT instance with n = 100 variables and k = 400 clauses the upper bound on the density of W is 9.8%. We will show in later sections why this ratio is an important consideration when choosing an algorithm for solving the SDP relaxation. The SDP relaxation of (3) can be derived by rewriting it as

$$\mathrm{unsat}^* := \frac{1}{4}k + \min\left\{\mathrm{Tr}\left(W\bar y\bar y^T\right) : \mathrm{diag}\left(\bar y\bar y^T\right) = e\right\},$$
: diag \yyT] = e | ,
where "Tr' denotes the trace operator. The SDP relaxation is now obtained by replacing the rank one, positive semidefinite matrix yyT by a positive semidefinite matrix X which is not restricted to have rank one. The relaxation therefore takes the form: unsat* > SDP* := -k + min {Tr (WX) : diag {X) = e, X > 0} , (4) 4 x where X > 0 means X is symmetric positive semidefinite. Note that all products j/ij/j are replaced by the matrix entries Xtj. Given a Choleski decomposition of X, say X = VTV, one can write Xtj = (vl)Tv^ where the vl's are the columns of V. This means that the product y^- is in fact relaxed to an inner product (vl)Tv^. This type of relaxation was originally suggested by Lovasz and Schrijver [13]. Goemans and Williamson [7] proved that k — unsat* •k^SDp-* * °'87856-
,% (5)
(We should note that the derivation of the model presented here is different from that in [7], and follows that of Van Maaren and Warners [14].) Also note that Wtj = 0 V(i, j) if all possible clauses are included for the n variables. In this case exactly | of the clauses are satisfiable, i.e. unsat* = \k. In this case one trivially has SDP* = unsat*, i.e. the SDP relaxation is exact. It is therefore reasonable to expect that the SDP relaxation will become even tighter as the ratio k/n grows. This can be observed from numerical experiments shown in Table 1, where the average ratio for the left hand side of expression (5) is given as a function of the number of clauses (for random MAX-2-SAT instances with 50 variables). Goemans and Williamson also proposed the following heuristic for use in conjunction with the SDP relaxation: • Solve the SDP relaxation (4) to obtain an e-optimal X =
VTV.
165
# clauses    (k - unsat*)/(k - SDP*)
50           0.95658
100          0.97502
200          0.98538
500          0.98911
1000         0.99128
2000         0.99258
3000         0.99400
4000         0.99503
5000         0.99592

Table 1: The average quality of the SDP relaxation improves as the number of clauses grows (50 variables).
• Choose a random $r \in R^{n+1}$ and normalize r.

• Set $y_i = 1$ if $r^Tv^i > 0$, or set $y_i = -1$ otherwise.

This randomized algorithm yields an approximate solution to MAX-2-SAT with expected objective value at least 0.87 times the optimal value.
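A minimal sketch of this random-hyperplane rounding step is given below (not the authors' implementation; the function name and the sign normalization at the end are assumptions).

```python
import numpy as np

def gw_round(V, rng=None):
    """Round an SDP solution X = V^T V to a +/-1 assignment via a random hyperplane.

    V holds the vectors v^i as columns (e.g. a Cholesky factor of X)."""
    rng = np.random.default_rng() if rng is None else rng
    r = rng.standard_normal(V.shape[0])
    r /= np.linalg.norm(r)
    y = np.where(V.T @ r > 0, 1, -1)
    # y[-1] plays the role of y_{n+1}; flipping all signs leaves the model
    # invariant, so normalize to y[-1] = +1 before reading off truth values.
    return y if y[-1] == 1 else -y
```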
3  Additional valid inequalities
There exist instances of MAX-2-SAT where the ratio in (5) is no better than 0.88889 [6]. The SDP relaxation (4) can be strengthened by adding a number of valid inequalities (cuts) proposed by Feige and Goemans [6]. The first set of inequalities (called triangle inequalities) is based on the observation that for any pair of indices (i, j) there holds:

$$y_{n+1}y_i + y_{n+1}y_j + y_iy_j \ge -1$$
$$-y_{n+1}y_i - y_{n+1}y_j + y_iy_j \ge -1$$
$$-y_{n+1}y_i + y_{n+1}y_j - y_iy_j \ge -1.$$
In the SDP relaxation these inequalities correspond to $\frac{3}{2}n(n-1)$ additional linear constraints:

$$X_{n+1,i} + X_{n+1,j} + X_{ij} \ge -1$$
$$-X_{n+1,i} - X_{n+1,j} + X_{ij} \ge -1 \qquad (6)$$
$$-X_{n+1,i} + X_{n+1,j} - X_{ij} \ge -1.$$

Note that the first of these constraints can be rewritten as $\frac{1}{2}\mathrm{Tr}\left(e_{ij,n+1}e_{ij,n+1}^TX\right) - \frac{3}{2} \ge -1$, where the vector $e_{ij,n+1}$ has ones in the positions i, j and n+1, and zeros elsewhere. In other words, we have a constraint of the form

$$\mathrm{Tr}(A_iX) \ge 1, \qquad (7)$$
where $A_i$ is a rank one matrix containing only zeros and ±1. The other two constraints in (6) can be represented similarly. Feige and Goemans have shown that the addition of these inequalities improves the quality guarantee of the SDP relaxation from 0.87856 to 0.93109 (see (5)). A bound on the worst-case approximation is 0.94513, i.e. there exist problems where the ratio (5) is no larger than 0.94513. The constraints (6) form a subset of an even larger set of $\frac{2}{3}n(n-1)(n-2)$ valid inequalities which follow from

$$y_ky_i + y_ky_j + y_iy_j \ge -1$$
$$y_ky_i - y_ky_j - y_iy_j \ge -1$$
$$-y_ky_i + y_ky_j - y_iy_j \ge -1$$
$$-y_ky_i - y_ky_j + y_iy_j \ge -1$$

for each distinct triple of indices (i, j, k). The corresponding constraints in the SDP relaxation once again take the form (7). The quality guarantee for the additional cuts remains 0.93109, but the worst-known behaviour now becomes 0.98462. In practice all these inequalities cannot be added beforehand because of the increase in problem size; it is more feasible to re-solve the SDP relaxation after having added (some of) the violated inequalities.
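A simple separation sketch for the triangle inequalities (6) is shown below (not the authors' routine; the tolerance and function name are assumptions). Given a relaxed solution X, it returns the index pairs whose cuts could be added before re-solving.

```python
def violated_triangle_cuts(X, tol=1e-6):
    """Return pairs (i, j) whose triangle inequalities (6) are violated by X.

    X is the (n+1) x (n+1) primal matrix; row/column n (0-based) plays the
    role of index n+1."""
    n = X.shape[0] - 1
    cuts = []
    for i in range(n):
        for j in range(i + 1, n):
            a, b, c = X[n, i], X[n, j], X[i, j]
            if (a + b + c < -1 - tol or
                    -a - b + c < -1 - tol or
                    -a + b - c < -1 - tol):
                cuts.append((i, j))
    return cuts
```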
4  Solving the SDP relaxation of MAX-2-SAT
The SDP relaxations mentioned so far can be cast in the generic form

$$\min\ \mathrm{Tr}(WX)$$

subject to

$$\mathrm{diag}(X) = e,$$
$$\mathrm{Tr}(A_iX) \ge 1, \qquad i = 1,\ldots,m,$$
$$X \succeq 0,$$

where the $A_i$'s are rank one matrices corresponding to the valid inequalities in Section 3. The associated dual problem is

$$\max\ \sum_{i=1}^{n+1}\gamma_i + \sum_{i=1}^{m}y_i$$

subject to

$$D(\gamma) + \sum_{i=1}^{m}y_iA_i + S = W,$$
$$y \ge 0,\ \gamma \in R^{n+1},\ S \succeq 0,$$

where $D(\gamma)$ denotes the diagonal matrix with the vector $\gamma \in R^{n+1}$ on its diagonal. Note that the dual matrix S will have more or less the same sparsity structure as W if the number of cuts m is small. Recall further that W will be sparse in general, as discussed in Section 2. This suggests solving the dual problem instead of the primal in order to exploit this sparsity structure. Dual interior point methods are based on the dual logarithmic barrier function

$$f_b(S, y) = \log\det(S) + \sum_{i=1}^{m}\log(y_i),$$

which can be added to the dual objective function in order to replace the (matrix) inequality constraints $S \succeq 0$ and $y \ge 0$. Thus one can solve a sequence of problems of the form

$$\max\left\{\sum_{i=1}^{m}y_i + \sum_{i=1}^{n+1}\gamma_i + \mu f_b(S, y)\right\} \qquad (8)$$

subject to

$$D(\gamma) + \sum_{i=1}^{m}y_iA_i + S = W$$
for decreasing values of $\mu > 0$. The projected Newton direction for this problem can be calculated from a positive definite linear system with coefficient matrix M consisting of four blocks (see the collected works [5, 1, 8, 4, 2]):

$$M = \begin{pmatrix} S^{-1}\circ S^{-1} & B^T \\ B & C \end{pmatrix},$$
where '∘' indicates the Hadamard (componentwise) product, and the blocks B and C are respectively of the form

$$B_{ij} = \mathrm{Tr}\left(A_iS^{-1}e_je_j^TS^{-1}\right) = e_j^TS^{-1}A_iS^{-1}e_j,$$

where $e_j$ is the $j$th standard unit vector, and

$$C_{ij} = \mathrm{Tr}\left(A_iS^{-1}A_jS^{-1}\right).$$

Once $S^{-1}$ is known, the computation of $[S^{-1}\circ S^{-1}]_{ij}$, $B_{ij}$ and $C_{ij}$ all require only one multiplication and some additions.
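As an illustration of this observation, the sketch below (not the code of [2]; all names are assumptions) assembles the three blocks from $S^{-1}$ when every $A_i$ is supplied as $u_iu_i^T$ with $u_i$ a 0/±1 vector, so that $B_{ij} = (u_i^TS^{-1}e_j)^2$ and $C_{ij} = (u_i^TS^{-1}u_j)^2$.

```python
import numpy as np

def newton_blocks(S_inv, U):
    """Assemble the blocks of the Newton coefficient matrix from S^{-1}.

    U has one column u_i per cut, with A_i = u_i u_i^T (rank one, entries 0, +-1)."""
    top_left = S_inv * S_inv        # Hadamard product S^{-1} o S^{-1}
    P = S_inv @ U                   # column i is S^{-1} u_i
    B = (P.T) ** 2                  # B_ij = (u_i^T S^{-1} e_j)^2, shape m x (n+1)
    C = (U.T @ P) ** 2              # C_ij = (u_i^T S^{-1} u_j)^2, shape m x m
    return top_left, B, C
```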
The matrix M can therefore be assembled quickly once the inverse $S^{-1}$ has been computed. Details of how to compute $S^{-1}$ efficiently in general are given by Benson, Ye, and Zhang in [2]. They have implemented a dual scaling method (using the search direction described above; the method chooses the parameter $\mu$ in (8) dynamically by monitoring the progress of the algorithm via the so-called Tanabe-Todd-Ye potential function, see [2] for details) which requires $O(\sqrt{m+n})$ iterations for convergence. The computation per iteration is dominated by the solution of the linear system with coefficient matrix M. This algorithm is used in the numerical experiments below. To give an impression of how fast the relaxed problem can be solved using this approach, the CPU-times (in seconds) for MAX-2-SAT relaxations of some of the largest benchmark problems from [9] are given in Table 2. The computation was done on a HP 9000/715 workstation. The column 'solution' gives the best obtained solution for the Goemans-Williamson heuristic, 'upper' gives the bound $k - SDP^*$, and 'ratio' indicates the ratio of the best obtained heuristic solution to $k - SDP^*$.
5  A branch and cut framework
The SDP relaxations can be used in any branch and cut framework. The framework we have used for our numerical experiments will be described in this section, with reference to Figure 1.
problem    clauses  variables  solution  upper  ratio   time sdp  time heuristic
p2300-1    300      99         285       292    0.9762  4.09      0.32
p2300-2    300      101        285       294    0.9710  4.08      0.35
p2300-3    300      101        287       293    0.9808  3.91      0.34
p2300-4    300      101        286       293    0.9772  4.02      0.35
p2300-5    300      101        284       293    0.9694  4.90      0.34
p2300-6    300      101        283       290    0.9774  4.38      0.35
p2300-7    300      101        279       288    0.9709  4.74      0.36
p2300-8    300      101        283       291    0.9757  4.11      0.36
p2300-9    300      101        284       291    0.9761  4.26      0.34
p2400-1    400      101        369       379    0.9761  5.26      0.40
p2400-2    400      101        371       380    0.9780  5.13      0.41
p2400-3    400      101        373       383    0.9764  4.37      0.39
p2400-4    400      101        371       378    0.9827  4.65      0.40
p2400-5    400      101        370       379    0.9780  5.22      0.39
p2400-6    400      101        372       380    0.9796  4.82      0.41
p2400-7    400      101        373       382    0.9788  4.44      0.40
p2400-8    400      101        366       376    0.9748  4.77      0.40

Table 2: Solution times for the SDP relaxation of MAX-2-SAT and for the Goemans-Williamson heuristic.
At any node in the branching tree, the current set of clauses (obtained after partial assignment of the variables) is denoted by Φ, and lb and ub contain lower and upper bounds on the minimal number of unsatisfiable clauses, respectively. The value unsat is a counter for the number of clauses unsatisfied by the current partial assignment. Note that lb and unsat are local variables that are only valid in the current branch; on the other hand, ub is a global variable which is valid for the whole search tree. At termination of the procedure, ub contains the optimal value of the instance. Before calling node_procedure the values lb, unsat, and ub must be initialized. One can take lb := 0, unsat := 0, ub := k. Following Borchers and Furman, unit resolution is applied if ub - unsat = 1. Subsequently, the semidefinite relaxation of the current
procedure node_procedure(Φ, lb, unsat);
    if (ub - unsat = 1) unit_resolution(Φ);
    (lb_sdp, ub_sdp) := SDP_relaxation(Φ);            (*)
    ub := min{ub, unsat + ub_sdp};                    (*)
    lb := max{lb, unsat + max{0, lb_sdp}};            (*)
    if (ub - lb < 1) return;
    x := branch_rule(Φ);
    Set x ← TRUE and update Φ, unsat;
    if (ub - unsat > 1) node_procedure(Φ, lb, unsat);
    Set x ← FALSE and update Φ, unsat;
    if (ub - unsat > 1) node_procedure(Φ, lb, unsat);
    return;

Figure 1: Branch and cut framework for MAX-2-SAT
formula is solved to obtain upper and lower bounds ub_sdp and lb_sdp. The current bounds ub and lb are then updated (taking unsat into account), and if ub - lb < 1, then the best known solution so far cannot be improved upon in the current branch, so we backtrack. Otherwise a variable x is determined to branch on, which is set to TRUE and FALSE respectively. The branching rule for fixing variables is as follows: choose the variable with the highest occurrence in the longest clauses. The formula Φ and unsat are subsequently updated; if ub - unsat < 1 this branch need not be explored further. In the other case node_procedure is called recursively.
In each node, the steps marked (*) can be repeated adding violated cuts from Section 3 to the relaxation, to obtain tighter bounds.
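A small sketch of the branching rule described above is given here (not the authors' code; the clause representation is an assumption).

```python
def branch_rule(clauses):
    """Pick the variable with the highest occurrence among the longest clauses.

    'clauses' is a list of tuples of signed integer literals, e.g. (1, -3)."""
    longest = max(len(c) for c in clauses)
    counts = {}
    for c in clauses:
        if len(c) == longest:
            for lit in c:
                counts[abs(lit)] = counts.get(abs(lit), 0) + 1
    return max(counts, key=counts.get)
```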
6  Numerical experiments
In this section we present some numerical results for the branch and cut SDP algorithm of the previous section. The results presented here are of a preliminary nature, and were obtained without adding extra cuts. The MAX-2-SAT benchmark problems are taken from Borchers and Furman [3] (except for the two largest instances). As before, all reported CPU-times are in seconds on a HP-9000/715 workstation with 160MB internal memory. The SDP branch and cut method presented here is compared to a modified EDPL algorithm [3] and to the mixed integer linear programming approach using the commercial solver Minto [15]. The respective CPU-times are shown in Table 3. It is immediately clear that the SDP approach is distinctly superior to the other two approaches if the clauses/variables ratio exceeds 4:1. The reason is that the SDP relaxation becomes tighter as this ratio grows, as discussed in Section 2. Note also that the SDP branch and cut algorithm solved each of the problems in a few minutes. It therefore has a very robust performance in comparison to the other two methods. The second set of test problems are weighted MAX-2-SAT problems from Borchers and Furman [3]. The same observations hold as for the unweighted problems, although the difference is now somewhat less pronounced. All the algorithms fare somewhat better on these problems. The results are shown in Table 4.
# clauses  SDP (nodes)  EDPL     Minto
100        84 (82)      1.36     12.9
150        69 (64)      5.1      18.0
200        91 (70)      395      67.3
250        118 (92)     2218     128
300        170 (128)    29794    687
350        127 (91)     >12hr    2339
400        56 (40)      >12hr    1550
450        276 (210)    >12hr    12634
500        205 (144)    >12hr    8677
2500       331 (184)    not run  not run
5000       663 (399)    not run  not run

Table 3: Solution times (in seconds) of MAX-2-SAT benchmark problems (n = 50) for different algorithms
7  Future work

7.1  Cutting planes
The results from the previous section for MAX-2-SAT can probably be improved upon significantly by adding some of the cuts described in Section 3 to the relaxations. The influence of added cuts is illustrated in Table 5. These results were obtained using the SDP solver CUTSDP [11] in the branching scheme described in Section 5. The solution times are for proving optimality only, and are given for two MAX-2-SAT instances from Table 3 and two from Table 4 (weighted). The solver CUTSDP uses a primal-dual predictor-corrector algorithm based on the so-called XS direction. This direction also results in a sparse Newton system at each iteration of the solution of the MAX-2-SAT relaxation, but the algorithm still requires additional computation involving the dense primal matrix variable. For this reason, it is not as efficient as the dual scaling method. However, the CUTSDP software automatically adds (some of) the violated triangle inequalities described in Section 3, and therefore gives an indication of the effect of cutting planes on the branching procedure.
# clauses  SDP (nodes)  EDPL     Minto
100        101 (125)    1.36     12.9
150        101 (108)    2.04     16.3
200        58 (61)      23.5     34.1
250        137 (117)    235      171
300        61 (44)      874      149
350        161 (122)    40285    2155
400        100 (82)     20233    579
450        53 (44)      >12hr    1420
500        118 (76)     >12hr    3153

Table 4: Solution times (in seconds) for weighted MAX-2-SAT benchmark problems (n = 50) for different algorithms
# clauses        CUTSDP with cuts (nodes)  CUTSDP without cuts (nodes)
100              283 (25)                  206 (78)
450              396 (38)                  403 (191)
100 (weighted)   180 (20)                  228 (116)
450 (weighted)   270 (28)                  80 (40)

Table 5: Solution times (in seconds) for MAX-2-SAT benchmark problems (n = 50) for the CUTSDP method (with and without cuts) in a branching framework
It is clear from Table 5 that the introduction of cuts reduces the number of nodes in the branching tree significantly, but increases the solution time of the relaxations at the nodes. The total solution time is not improved in general, and all the solution times are worse than those reported in Table 3 and Table 4 for the dual scaling method without cuts. Nevertheless, it is clear that the number of branching nodes can be reduced significantly; the challenge is therefore to extend the dual scaling method to use cuts and to
find the optimal trade-off between stronger relaxations and increased solution times.
7.2  Extension to MAX-3-SAT
Another challenging problem is to extend the approach in this paper to MAX-3-SAT problems. There are two possibilities in this regard: (1) One can rewrite MAX-3-SAT as a MAX-2-SAT problem in such a way that the SDP relaxation of the MAX-2-SAT problem yields a 0.801 approximation algorithm for MAX-3-SAT (see [17]); the resulting MAX-2-SAT problem can then be solved using the approach in this paper. (2) One can use the branch and cut formulation of this paper in conjunction with the MAX-3-SAT relaxation of Karloff and Zwick [12]. This relaxation guarantees a 7/8 approximation.

The difficulty with approach (1) is that one has seven 2-clauses for each clause of MAX-3-SAT. For example, the clause $a \vee b \vee c$ is replaced by the seven (weighted) clauses

$$a \vee z,\ \neg b \vee z,\ b \vee \neg z,\ \neg c \vee z,\ c \vee \neg z,\ b \vee c,\ \neg b \vee \neg c,$$

where z is an auxiliary variable. If the MAX-3-SAT instance therefore has k clauses and n variables, the associated MAX-2-SAT instance has 7k clauses and n + k variables. The MAX-2-SAT instance is also highly structured, and it remains to be seen if the SDP relaxations are as effective in this case as for random instances.

The approach (2) involves the following relaxation of MAX-3-SAT: the clause $x_i \vee x_j \vee x_k$ will be true if and only if

$$1 = \min\left\{1 - \tfrac{1}{4}(x_{n+1} + x_i)(x_j + x_k),\ 1 - \tfrac{1}{4}(x_{n+1} + x_k)(x_i + x_j),\ 1 - \tfrac{1}{4}(x_{n+1} + x_j)(x_i + x_k),\ 1\right\}.$$
We can relax the left hand side to four linear matrix inequalities by replacing $x_i$ by a vector $v_i$ of norm one, etc., and replacing the products by inner products, as before:

$$t \le 1 - \tfrac{1}{4}(v_{n+1} + v_i)^T(v_j + v_k)$$
$$t \le 1 - \tfrac{1}{4}(v_{n+1} + v_j)^T(v_i + v_k)$$
$$t \le 1 - \tfrac{1}{4}(v_{n+1} + v_k)^T(v_i + v_j)$$
$$t \le 1.$$
The SDP relaxation involves maximizing the sum of the values t over all the clauses. It is easy to check that one obtains an SDP with 4k inequality constraints where the coefficient matrices of the constraints are of rank 3. One can still solve the resulting problem by the dual scaling method (see [2]), but the assembly of the linear system at each iteration becomes more expensive, and its coefficient matrix becomes more dense. The question is therefore whether these relaxations can be solved quickly enough to allow incorporation in a branch and cut scheme.
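For illustration, the following sketch (not from the paper; the 0-based index convention and function name are assumptions) evaluates the relaxed clause value t above for one 3-clause, given a relaxed matrix X with $X_{pq} = v_p^Tv_q$.

```python
def clause_relaxation_value(X, i, j, k):
    """Relaxed value t of the clause x_i v x_j v x_k, per the four inequalities above.

    X is the (n+1) x (n+1) matrix of inner products; row/column n (0-based)
    plays the role of index n+1."""
    n = X.shape[0] - 1
    def bound(a, b, c):
        # 1 - (1/4)(v_{n+1} + v_a)^T (v_b + v_c)
        return 1.0 - 0.25 * (X[n, b] + X[n, c] + X[a, b] + X[a, c])
    return min(bound(i, j, k), bound(j, i, k), bound(k, i, j), 1.0)
```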
References

[1] K.M. Anstreicher and M. Fampa. A long-step path following algorithm for semidefinite programming problems. Working Paper, Department of Management Sciences, University of Iowa, Iowa City, USA, 1996.
[2] S.J. Benson, Y. Ye, and X. Zhang. Solving large-scale sparse semidefinite programs for combinatorial optimization. Working paper, Computational Optimization Lab, Dept. of Management Science, University of Iowa, Iowa City, USA, 1997.
[3] B. Borchers and J. Furman. A two-phase exact algorithm for MAX-SAT and weighted MAX-SAT problems. Manuscript, 1997. (To appear in Journal of Combinatorial Optimization.)
[4] E. de Klerk. Interior Point Methods for Semidefinite Programming. PhD thesis, Delft University of Technology, Delft, The Netherlands, 1997.
[5] L. Faybusovich. Semi-definite programming: a path-following algorithm for a linear-quadratic functional. SIAM Journal on Optimization, 6(4):1007-1024, 1996.
[6] U. Feige and M. Goemans. Approximating the value of two prover proof systems with applications to MAX 2SAT and MAX DICUT. In Proc. Third Israel Symposium on Theory of Computing and Systems, 1995.
[7] M.X. Goemans and D.P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM, 42(6):1115-1145, 1995.
[8] B. He, E. de Klerk, C. Roos, and T. Terlaky. Method of approximate centers for semi-definite programming. Optimization Methods and Software, 7:291-309, 1997.
[9] S. Joy, J. Mitchell, and B. Borchers. A branch and cut algorithm for MAX-SAT and weighted MAX-SAT. In M.A. Trick and D.S. Johnson, editors, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, volume 26. American Mathematical Society.
[10] S. Joy, J. Mitchell, and B. Borchers. Solving MAX-SAT and weighted MAX-SAT problems using branch-and-cut. Manuscript, 1998.
[11] S. Karisch. CUTSDP - A toolbox for a cutting-plane approach based on semidefinite programming. User's Guide, Version 1.0. Technical Report IMM-REP-1998-10, Dept. Mathematical Modelling, Technical University of Denmark, 1998.
[12] H. Karloff and U. Zwick. A 7/8-approximation algorithm for MAX 3SAT? Manuscript, 1997.
[13] L. Lovasz and A. Schrijver. Cones of matrices and set-functions and 0-1 optimization. SIAM Journal on Optimization, 1(2):166-190, 1991.
[14] H. Van Maaren and J.P. Warners. Bounds and fast approximation algorithms for binary quadratic optimization problems with application to MAX 2SAT and MAX CUT. Technical Report 97-35, Delft University of Technology, The Netherlands, 1997.
[15] G.L. Nemhauser, M.W.P. Savelsbergh, and G.C. Sigismondi. MINTO, a Mixed INTeger Optimizer. Operations Research Letters, 15(1):47-58, 1994.
[16] Yu. Nesterov. Quality of semidefinite relaxation for nonconvex quadratic optimization. CORE Discussion Paper 9719, Belgium, March 1997.
[17] L. Trevisan, G.B. Sorkin, M. Sudan, and D.P. Williamson. Gadgets, approximation and linear programming. In Proc. of the 37th Annual IEEE Symposium on Foundations of Computer Science, 1996.
Combinatorial and Global Optimization, pp. 177—204 P.M. Pardalos, A. Migdalas and R. Burkard, Editors © 2002 World Scientific Publishing Co.
On a Data Structure in a Global Description of Sequences

Victor Korotkich (v.korotkich@cqu.edu.au)
Department of Mathematics and Computing, Central Queensland University, Rockhampton, Queensland 4702, Australia
Abstract A data structure in a global description of sequences is presented. The structure consists of infinite hierarchical levels. We can view the elements as those composed of elements from the lower levels. The key interest in the structure is that it has two distinctive representations, that is, algebraic and geometric, which complement each other. In the first one the elements are integer relations suggesting that ultimate building blocks are just integers from which the structure develops as one undivided whole. In the second one the elements are two-dimensional geometric patterns which upon visualization give a picture of hierarchical formations. The picture contains nonlocal order and large symmetry. Global optimization problems formulated in terms of the structure are given to show its descriptive potentialities. Well-known geometric objects appear in a new way as solutions to the problems. Keywords: data structures, binary sequences, global optimization
1
Introduction
Frequently the design of new algorithms in combinatorial and global optimization involves the development of new data structures. In computations the basic carriers of information are finite binary sequences. The paper presents a data structure in a
178 global description of sequences. The structure consists of infinite hierarchical levels. We can view the elements as those composed of elements from the lower levels. The key interest in the structure is that it has two distinctive representations, that is, algebraic and geometric, which complement each other. In the first one the elements are integer relations suggesting that ultimate building blocks are just integers from which the structure develops as one undivided whole. In the second one the elements are two-dimensional geometric patterns which upon visualization give a picture of hierarchical formations. The picture contains nonlocal order and large symmetry. The structure begins to emerge when sequences are described in global terms as presented in Sect. 2. To realize a global description of a sequence we associate it with numbers, called "structural" numbers, that are assumed to capture its global properties. Structural numbers are combinatorial objects in nature. There exists a geometric interpretation of structural numbers which visualizes them and gives their meaning. The geometric interpretation plays the crucial role in the study. The next step in Sect. 3 leads us to consider the ability of structural numbers to completely specify sequences. In particular, it turns out that we should be concerned to find out whether they can be viewed as coordinates of a space. We show that a complete specification of a sequence with respect to another one in terms of structural numbers is reduced to a system of linear equations. These systems of linear equations constitute the main ingredient for the definition of the structure. Our approach to the system is to focus on its geometric understanding. In general terms geometric understanding occurs by visual pictures that address insight as a force for generating new knowledge. In our case this means that we do not algebraically manipulate the system, but instead through visualization we try to understand phenomena involved. It turns out that the geometric interpretation reveals interesting phenomena. With this focus on visualization two-dimensional geometric objects, called integer patterns, are presented in Sect. 4. They are used as a means to unfold the systems of linear equations into visible images. In Sect. 5 for a particular example we show how the system of linear equations can be visualized to generate a heirarchical formation of integer patterns and integer relations. By visualization the formation has order and large symmetry. This order is nonlocal as it concerns integer patterns of each level and across all levels simultaneously. The aim of the example is to demonstrate a language of integer patterns and hopefully provide a visual access to the structure when it is defined formally. Figuratively, such examples are a window through which we can view properties of the structure and see answers that may not be possible to obtain in the formal way. The example helps to understand that the formation constitutes one possible formation among many others admitted by a whole structure. In Sect. 6 we consider and define this whole structure. Firstly, the system of linear equations is associated with a hierarchical set of integer relations. However, transformations of integer relations
179 are not viewed as a good way to describe how things change. They are more perceived visibly through changes in geometric objects. Secondly, we show that there is an isomorphism between the set of integer relations and a hierarchical set of integer patterns. This puts the integer relations in a geometric form and gives a picture for their transformations. The language of integer patterns in this picture implies phenomena with changes. We define a structure that incorporates all these formations into one whole and call the structure a "web of relations" to informally capture its properties. Due to the isomorphism, elements of the web of relations can be equally seen as integer patterns or integer relations. In Sect. 7 we show descriptive potentialities of the structure. The descriptions arise as hierarchical formations of the elements of the structure and the geometric interpretation enables us to actually see them. To make the potentialities more accessible simple global optimization problems formulated in terms of the structure are considered. We demonstrate that well-known geometric objects appear in a new way as solutions to the problems.
2
Structural Numbers and their Geometric Interpretation
The structure begins to emerge when sequences are described in global terms as presented in this section. It is known that sequences occur when physical systems are analysed in terms of Cartesian order. Specifically, in this analysis they give a simple description concerning the dynamics of the systems in a finite time. It is convenient here to have this interpretation of sequences. Let 6 and e be respective spacings of a spacetime lattice (5, e) in 1 + 1 dimensions. Let I be an integer alphabet and In = {s: s = sx...sn, Si £ I, i = 1,..., n} be the set of all sequences of length n > 2. If I = { - 1 , +1} then In is the set of all binary sequences of length n, denoted by the alternative notation Bn. A sequence s = Si...sn € In encodes on a lattice (S, e) the dynamics of a physical system that has a space position Sj<5 at a time instant ie,i = l,...,n. This encoding can always be made clear from a context and explicit when 5 = 1 and e = 1. A global description of a sequence is its representation that uses information about the sequence as a whole object. To realize a global description of a sequence s = Si...sn € In we associate it with numbers #i(s),..., $/t(s),... , called "structural" numbers, that are assumed to capture its global properties. Definition 1. Structural numbers ^ ( s ) , k = 1, 2,... of a sequence s = Si...sn 6 In
180 on a lattice (5, e) are defined by the formula k-l
tik(s) = Y, akmi((m + nfsi
+ (m + n-
lfsi
+ ... + (m + iysn)ek6,
(1)
t=0
where m is an integer, akmi = (1/fc!) ( * ) ( ( - l ) * - ' " 1 ^ + I)*"' + ( - l ) * - ^ * - ' ) - / 0,
i = 0,..., k - 1,
and (A is the binomial coefficient. Let us briefly analyse the formula (1). Firstly, we see that a sequence s = s\...sn comes into play in (1) as a whole object, because all its components Si,i = 1, ...,n are presented in it. Secondly, we observe that structural numbers are combinatorial objects as binomial coefficients are involved in (1). The formula also shows that structural numbers are defined by using powers of integers m + 1, ...,m + n but it does not give immediately a style of thinking based on intuition. Fortunately, there exists a geometric interpretation of structural numbers which visualizes them and gives their meaning. The geometric interpretation plays the crucial role in the study and can be presented by two parts involved. Firstly, finite sequences admit a natural representation in terms of piecewise constant functions. Namely, let As[tm,tm+n], \m\ = 0,1,... denote the partition of an interval [tm,tm+n] C R[e] with n + 1 equally spaced points ti,i = m,...,m + n such that tm = me,
ti+i -U = e, i = m, ...,m + n-
1,
n = {tm+n -
tm)/e
and where R[e] means that R is measured in terms of e. Let Wj(Ae[tm, tm+n}) be the set of all piecewise constant functions / : [tm, tm+n] —» R^j such that / is constant in [tm, tm+i], (ti} ti+i], i = m + 1,..., m + n — 1, where R^j means that R is measured in terms of S. A code of a function / 6 Ws(As[tm,tm+n]), denoted c ( / ) , is a sequence s = S\...sn, Si € R, i = 1, ...,n such that Si6 is the value of the function in (U-i, ti],i = m + 1,..., m + n. Let a mapping pm(E take a sequence s = S\...sn, s, G R, i = 1,..., n to a function / G Ws(Ae[tm,tm+n\), denoted / = p m fe(s), such that c(f) = s and whose kth integral /'*! satisfies the condition fik\tm) = 0, k = 1, 2,... (see Fig. 1). Secondly, the following result is of central importance to the geometric interpretation. In particular, an integer code series for the value of the kth k = 1,2,... integral /'*' of a function / G Wd(Ae[tm, tm+n\), f = pms£{s), s = Si...sn G /„ at the point tm+n in terms of the code gives [1] fc-i
f[k\tm+n)
= Y, akmi((m + n / s i + (m + n - 1 ) % + - + {m + lYsn)ekS. i=0
(2)
181
f
t 1
. i
i
r-— r - —hi
i
i——*t
Figure 1: Graph of a function $f = p_{011}(s)$, $m = 0$, $\delta = 1$, $\epsilon = 1$, $s = +1 -1 -1 +1$.
(3)
Consequently, the geometric interpretation lies in the fact that the kth k = 1,2,... structural number of a sequence s € /„ is the value of the kth integral of a function / = Pm5e{s) at the point tm+n. Actually, Definition 1 is chosen to ensure the validity of this condition. It follows easily from the geometric interpretation that structural numbers are independent of m, which specifies a "frame of reference" of their consideration. Clearly, structural numbers can be seen in a way as global properties of the sequence since, as definite integrals, they carry information about it as a whole. To illustrate the geometric interpretation, consider a function / € Wi(Ai[to, U]) such that / = pon(s),m = 0,5 = l , e = l , s = + l — 1 — 1 + 1 (see Fig. 1). In this example the symmetry of the function is easily seen and we can sketch, thinking in geometric images, the graph of its first /M and second /t 2 l integral as shown in Fig. 2. From the figures we conclude that f[l]{U) = 0, fl2]{U) = 0, / [ 3 ] (t 4 ) > 0 and, by using (3), 0i(s) = O,
02(«) = 0,
tf3(s)>0.
(4)
Note that the visualization gives much more than formula (1) by blind computation. For example, from Figs. 1 and 2 condition (4) is immediate to our intuition.
3
Structural Numbers as Coordinates of a Space and a System of Linear Equations
In Sect. 2 we started to define the structure by presenting structural numbers which provide a global description for sequences. The next step leads us to consider the ability of structural numbers to completely specify sequences. In particular, it turns
182
Figure 2: Graphs of the first Z'1' and second /' 2 ' integral(sketched) of the function / depicted in Fig. 1. The visualization gives much more about fW(U),k = 1,2,... than formula (2) by blind computation. out that we should be concerned to find out whether they can be viewed as coordinates of a space. Definition 2. If for a pair of distinct sequences s,s' G /„ there is a finite integer k > 1 such that their first (k — 1) structural numbers are equal
«i(') = < i M
^ i W ^ w M
whereas the fcth structural numbers are not $A>(S) 7^ $fc(s') when k > 2 and #i(s) 7^ $i(s') when k = 1 then C(s, s') = k. Definition 2 implies that by using C(s, s') structural numbers a sequence s € /„ can be completely specified with respect to another sequence s' e In- Note that the special order is used in the specification as structural numbers are considered successively starting with the first ones. The following theorem clarifies the question about the ability of structural numbers to completely specify sequences. Theorem 1. If s = si...sn,
s' = s[...s'n £ In is a pair of distinct sequences then C{s,s')
Proof. Consider a sequence s = Si...sn € In and let N*(s,m) = {m + n)'si + (m + n-
l ) ' s 2 + ... + (m + l)'s„,
= 0,1,.
(5)
Then (1) for k = 1,2,... takes the form jt-i
fik(s) =
E i=0
^akmiKi(s,m)ek6.
(6)
183 It can be easily shown by using (6) that structural numbers i?i(s),i = l,...,k numbers Nj_i(s,m),i = l,...,k for A; > 1; and if
Ms)=Ms'),-,A-i(s)
define
= i)k-i(s'), M')*MJ)
(7)
then N 0 (s, m) = N 0 (s', m),..., Nfc_2(s, m) = Nfc_2(s', m),
(8)
K f c _i(*,ro)^K t _ 1 (s' ) m)
(9)
for A; > 2 and vice versa. Condition (8) for k > 2 by taking numbers N;(s',m),i = 0, ...,fc — 2 from the righthand side to the left-hand side of the equations and using (5) can be brought to a system of (k — 1) linear equations (m + l)°(s„ - s'n) + {m + 2) 0 (s n _! - s'^)
(m + l)*" 2 ( S n - s'n) + (m + 2)k-2(sn^
- s'^)
+ ... + (m + n)°{Sl - s[) = 0
+ ... + (m + n) fc ~ 2 ( Sl - s'J = 0 (10)
in n integer unknowns st — sj, i = l,...,n. Note that if the system (10) has n equations then at least one of its equations is nonzero. Indeed, the determinant of the Vandermonde matrix /
(m + l)° (m + 1)1
V(m+l)n_1
(m + 2)° (m + 2) 1
... ...
(m + n)° (m + n)1
\
(m + 2 ) " - 1 ... {m + n)"-1
j
is known to be written in the form det(V) = ±Il{i - j : 1 < j < i < n) ^ 0
(11)
and is nonzero. Suppose all linear equations of the system are zero, then we conclude, with the help of (11), that the unique solution to the system is the trivial one si — s'i = 0,i = l,...,n. This contradicts that the sequences s,s' are distinct. Thus not all equations of the system (10) are zero and there exists an integer 1 < k < n such that (m + i y ( s „ - s'n) + ... + (m + n)*(si - s[) = ^ ( s , m) - Xt(s', m) = 0 where i = 0,..., k — 2 and (m + \y-\sn
- s'J + ... + (m + n ) * " 1 ^ - si) = N fc -i(s, m) - N t _i(s', m) ^ 0
when 2 < fc < n, and (m + l)°(s„ - s'n) + ... + (m + n)°(si - s'x) = N 0 (s, m) - N 0 (s', m) =f= 0
184 when k = 1. The integer k specifies the nonzero equation of (10) with the minimum power of integers m 4-1,..., m + n. This implies that N 0 (s, rn) = K 0 (s', m),..., N*-2(s, m) = Kfc_2(s', m),
N*_i(s, m) / N*_i(s', m)
when 2 < fc < n and Njt_i(s, m) ^ Njfc_i(s', m) when k = 1. Then, using (8) and (9), we have (7) with ^ ( s ) = ^ ( A - A - i ( s ) = ^ - i ( s ' ) , flb(a) ^ ^ ( s ' ) when 2 < A; < n and i?i(s) ^ ^i(s') when fc = 1. Therefore, a sequence s e J n can be completely specified with respect to a sequence s' e I„ by at most n of its structural numbers. Recalling Definition 2, we conclude that C(s, s') < n. This completes the proof of the theorem. • As a result of Theorem 1 we see that numbers Nj_!(s, m),i = 1, ...,n can be equivalently considered instead of structural numbers ??i(s),z = l,...,n. We can say that there are two related sets of coordinates with the merit of structural numbers being their geometric interpretation. However, our main interest in Theorem 1 is connected with the fact established during the course of its proof. Namely, we have shown that iff k = C(s,s') > 2 then we have (8) and (9), which by taking numbers N;(s', m), i = 0,..., k — 1 from the right-hand side to the left-hand side of (8) and (9) can be brought to a system of (k — 1) linear equations (m + n)°{Sl - s'J + ... + (m + 1)°(«„ - s'n) = 0 (m + n)k-2(Sl
- s[) + ... + (rn + l)*" 2 ( S n - 0 = 0
(12)
- s[) + ... + (m + l ) * " 1 ^ - s'n) / 0.
(13)
and linear inequality (m + n)k-\Sl
We also call the system (12) a system of integer relations, because its equations can be seen as relations between 0 < i < k — 2 powers of integers m + 1,..., m + n. The system of linear equations (12) and inequality (13) constitute the main ingredient for the definition of the structure. Definition 3. Let C(s,In)
=
max
C(s,s')
< n.
Definition 3 reads that C(s, In) of a sequence s G In is the maximal number of its successive structural numbers by which it can be completely specified in /„. In Sect. 5 we present a notion which captures that C(s, s') is connected with an interesting visible phenomenon. It is conjectured that for s £ Bn [2]
C{s,Bn)< Llog2nJ+l, and the upper bound is realized by the celebrated Prouhet-Thue-Morse (PTM) sequence rj [3], [4], [5], where [log 2 nJ is the integer part of log2 n.
185
\'/ F/
v y^ A
\
\
'
*
V
Figure 3: The pattern P(f, [a, b]) (shaded) of a function / . The shadow gives to the pattern impression of a spatially-localized object.
4
Integer Patterns: Means for the Visualization of the System
In Sect. 3 we showed that a complete specification of a sequence with respect to another one in terms of structural numbers was reduced to the system of linear equations (12). These systems constitute the main ingredient for the definition of the structure. Our approach to the system is to focus on its geometric understanding. In general terms geometric understanding occurs by visual pictures that address insight as a force for generating new knowledge. In our case this means that we do not algebraically manipulate the system, but instead through visualization we try to understand phenomena involved. It turns out that the geometric interpretation reveals interesting visible phenomena. With this focus on visualization we need to use proper geometric objects to unfold the system (12) and inequality (13) into visible images. Namely, identify a function / : [a, b] —> 5R1 with a two-dimensional geometric pattern P(f, [a, b]) C 5ft2 as shown in Fig. 3, where the pattern of a function / is brought out by shading (see [6] for more details). Next, we describe geometric patterns of a special character. The consideration of these patterns arises naturally as they give a mental image of spatially-localized objects. However, for the most part these patterns attract particular interest because of their connection with integers and integer relations. To be specific, set / = /M and for a function / e Ws{Ae[tm, tm+n\), f = pm6s{s), s = S\...Sn
£
ln
• e t ^(/Jim,tm+n]) be a set of intervals [* m + t,Wt+i] C \tm,tm+n} such that si+1 / 0, i = 0,..., n - 1. Let T ( / M , [tm, tm+n]), k = 1, 2,... be a set of intervals [U,tj] C [tm,tm+n], m < i, i + 1 < j , j < m + n such that /W(
[u.tj]), A: = 0,1,... of a function f
eW6{Ae[tm,tm+n})
186 is called an integer pattern of the kth integral of the function if [ti,tj]eT(f^,[tm,tm+n])^0.
For instance, P{f,[UM), P(f, [tio,tn]) 1 - 1 e In, P(f[1],[h,t7}),
Fig. 4 shows integer patterns P(f, [t0, £1]), P{f, [t\, t2]), P{f, [t2, £3]), Ptf.foM* P(f,[U,t7]), P{f,[t7,t8}), P(f,[ts,t9}) P(/,[t 9 > tio]), of a function / = pon(s), s = + 1 - 1 - 1 0 + 1 + 1 - 1 + 1 - 1 I = { - 1 , 0 , + 1 } and integer patterns P ( / W , [t0,t2]), P(f[l],[t2,h\), [1] P(f ,[*rM) of its first integral /W.
t 1
p*1
-T
'A .t ' "
"V-
R •
vi-n M t
Figure 4: Integer patterns(shaded) of a function / and its first integral /W, where f = Pou(s), s = + l - l - 1 0 + l + l - l + l - l - l - l e / u , 7 = { - l , 0 , + l } .
W h a t Picture Appears when the System is Visualized: an Illustrative Example In this section for a particular example we show how the system of linear equations (12) and inequality (13) can be visualized to generate a heirarchical formation of integer patterns and integer relations. By visualization the formation has nonlocal order and large symmetry. The aim of the example is to demonstrate a language of integer patterns and hopefully provide a visual access to the structure when it is defined formally. Figuratively, such examples are a window through which we can view properties of the structure and see answers that may not be possible to obtain in the formal way. Consider a sequence ? e B , connected with the Fibonacci numbers, tiling and quasicrystals [7]. The sequence
187 1 - 1,... . Let s=+1-1+1+1-1+1-1+1+1-1+1+1-1+1-1 be the initial segment of length 15 of <;. There is a sequence s' = - 1 + 1 + 1 + 1 + 1 - 1 - 1 + 1 + 1 + 1 - 1 + 1 - 1 - 1 + 1 such that C(s, s') = 4, i.e., Ms)
= Ms'),
k = 1,2,3,
Ms)*
Ms1),
(14)
where e = 1,5 = 1. Clearly, condition (14) is equivalent to a system of linear equations 15°( Sl - si) + ... + l°(s 1 5 - s'15) = 0 151(s1-s'1) + ...+ l 1 ( s 1 5 - s ' 1 5 ) = 0 15 2 (s x - s[) + ... + l 2 ( S l
= 0,
(15)
153{Sl-s[)
^0.
(16)
and inequality + ... + l3{st
We are interested to show what phenomena are connected with (15) and (16).
/\S
s^s/^
/v\
' \ T 1 > t t i i 4 i'o
Figure 5: Graphs of f^ rately.
1'1 12 13 U
15
'
and // ' exhibit no visible properties when regarded sepa-
By using the geometric interpretation we graph the first integrals /W, /, of functions / = poii(s),/< = Pon(s') as shown in Fig. 5. When regarded separately the graphs exhibit no visible properties, but depicted together, properties emerge as in Fig. 6. These properties are manifested explicitly in the graph of a function
188
f ky
XX / D^ 1 4
10
s\ >
1 12 13 14 15
Figure 6: Graphs of /W and / / ' depicted together uncover visible properties. Consequently /W(ti 5 ) = / / % i 5 ) , * = 1,2,3, /W(t 15 ) / / / % s ) , or by (3)
Ms) = 0*(A * = 1,2,3,
04(«) 7^ Ms'),
and this leads us exactly to (15) and (16). Thus system of linear equations (15) and inequality (16) can be associated with a hierarchical formation of the integer patterns, that is amenable to explanation in terms of visible things. In particular, Figs. 7 and 8 may be viewed as though they were experimental results successively representing levels of this formation. It is interesting to observe that as we integrate, it appears as if corresponding integer patterns merge together in pairs to form new integer patterns belonging to the next level. Since g^k\t) > 0, t e (*o, *is], k = 4, 5,... then there are no integer patterns belonging to the (k + l)th level. Moreover, the levels combined together form a unified geometric object, in which a tree diagram can be noticed (see Fig. 9). This object displays many interesting features that address our insight. Firstly, in Fig. 9 we see immediately large symmetry involved in the formation of the integer patterns. It is worthy of note that the symmetry is exhibited without any need for further explanation. The eye at once sees the crucial image that yields it. Secondly, we witness order involved in the formation of these patterns. This order is very much connected with the symmetry and of a nonlocal character. It concerns all integer patterns of a level and across all the levels simultaneously. In particular, the order specifies how integer patterns of a level have to make the exactly right connections between themselves in terms of their positions to grow from the level to the next one. One may say that the order enfolds integer patterns of a level into integer patterns that they produce at the next level. The order is very rigid as a minor change in it leads to the collapse of the whole object. Thirdly, Fig. 9 demonstrates self-similarity that accompanies the formation. It arises as at each level topologically we have the same situation when the integer patterns join or branch. The difference is in scale and shape. But the most important is
189
Figure 7: Graphs of g and gM. As we integrate, it appears as if the integer patterns merge together in pairs to form new integer patterns belonging to the next level.
Figure 8: Graphs of g^ and g^ (sketched). that Fig. 9 give us a clear understanding that the integer patterns uncover themselves through their connections to an undivided whole as its integrated parts. The undivided whole consists of infinite hierarchical levels involving integer patterns, all of them interconnected. For example, the integer pattern of the forth level, which can be viewed as the result of the formation, is just an integrated building block of the level. This naturally motivates us to make this whole a subject of special interest and the structure is defined in Sect. 6 to formally capture this whole. We can think the integer patterns of each coming level as next in order of complexity to the integer patterns of the previous level, since every integer pattern of the successive level is formed from integer patterns of the previous one. Therefore, condition C(s, s') = 4 admits interpretation in terms of the maximum level or complexity of the integer pattern that the sequences s, s' produce. More precisely, all these four levels contain only integer patterns while next ones do not. This phenomenon may be seen to be close to intuitive understanding of what complexity is about and motivates us to use a special notion "structural complexity" as the name for C(s, s'). It is possible to translate these geometric considerations into algebraic form. This is
190
Figure 9: The hierarchical formation of integer patterns corresponding to (15) and (16). This formation has order and large symmetry. The order is nonlocal as it concerns all integer patterns of a level and across all levels simultaneously. The unified geometric object of these integer patterns is invisibly controlled by relations between integers 1,2,5,6,10,11,14,15. The structural complexity C(s,s') = 4 is the highest level of this formation. a place where a hierarchical formation of integer relations comes into view. Namely, substituting Si, s[, i = 1,..., 15 explicitly in (15) and (16), we obtain a system of integer relations, which rearranged for some elegance with 2 cancelled out gives a system of Prouhet's type identities [3] 1° + 6° + 11° + 14° = 2° + 5° + 10° + 15° 1 1 + 6 1 + l l 1 + 141 = 2 1 + 5 1 + 101 + 151 1 2 + 6 2 + l l 2 + 142 = 2 2 + 5 2 + 102 + 152
(17)
1 3 + 6 3 + l l 3 + 143 ^ 2 3 + 5 3 + 103 + 153
(18)
and an inequality
between integers 1,2,5,6,10,11,14,15. Inequality (18) may be viewed to suggest that relations between the integers presented in (17) are exhausted. System of integer relations (15) viewed by itself does not show that in fact we have a hierarchical formation of these relations behind it. The geometric interpretation helps to identify this formation shown in Fig. 10. In Figs. 9 and 10 we observe a match between the integer patterns and the integer relations, and that the formation of integer patterns corresponds to the formation of integer relations. Overall, in these figures we witness a connection between two different modes of mathematical
191
Figure 10: The hierarchical formation of integer relations corresponding to (15) and (16). There is a one-to-one correspondence between this formation and the formation of integer patterns in Fig. 9. The figure gives us a clear understanding that everything develops from the integers, i.e., "ground" level, as one undivided whole and the formation is its integrated part. understanding, that is, geometry and integers, which comes to light in an interesting manner. In particular, it tells us that geometric patterns may be just relations between integers and the integers themselves can be considered as the ultimate building blocks from which hierarchical formations start. Consequently, the system of linear equations (15) and inequality (16) have two faces, one is geometric and the other is algebraic. With this example in mind we proceed with formal descriptions of the structure.
6
Definition of the Structure and its Isomorphic Representations : Web of Relations
In Sect. 5 for a particular case we have seen how the system of linear equations (12) and inequality (13) can be visualized to generate a heirarchical formation. In the formation an element of each level can be viewed as composed of elements from lower levels and a building block for elements of higher levels. The formation permits two distinctive representations. In the first one elements are integer relations suggesting that ultimate building blocks are just integers from which everything develops as one undivided whole. In the second one elements are two-dimensional geometric patterns. This gives the possibility to visually inspect level by level compositions of these elements and reveals nonlocal order and large symmetry. The example helps to
192 understand that the formation constitutes one possible formation among many others admitted by a whole structure. This naturally motivates us to consider and define this whole structure. It is shown in [8] that the system (12) and inequality (13) can be associated with a heirarchical set WR(s, s', m, In) of k = C(s, s') levels whose elements are integers and integer relations. Elements of the set belonging to a level may be viewed to transform into elements of the neighbouring levels. For example Fig. 11 shows a hierarchical set of integer relations WR(r)(16),f}(16),0,Bie), where 77(16) = + 1 - 1 - 1 + 1 - 1 + 1 + 1 - 1 - 1 + 1 + 1 - 1 + 1 - 1 - 1 + 1 is the initial segment of length 16 of the PTM sequence T] and for convenience the common multiple 2 is omitted in the elements' representation. However, transformations of integer relations are not viewed as a good way to describe how things change. They are more perceived visibly through changes in geometric objects. A result in [8] demonstrates that the set WR(s, s', m, In) is isomorphic to a set of integer patterns WPse(s, s', m, In). This puts the integer relations in a geometric form and gives a picture for their transformations. The language of integer patterns in this picture implies phenomena with changes (see Fig. 12). Interestingly, the integer relation acquires a spatial shape through the isomorphism and can be measured by its area.
Figure 11: A hierarchical set of integer relations W R(r/(16), rj (16), 0, B 1 6 ). Note that relations +16 1 - 15 1 - 141 + 13 1 = 0 and - 1 6 1 + 15 1 + 141 - 13 1 = 0 are different elements. Therefore, based on the system of linear equations (12) and inequality (13) two heirarchical sets of integer relations WR(s, s', m, In) and integer patterns WPSe(s, s', m, In)
193 are defined. We can view elements of the sets as those composed of elements from the lower levels. There is an isomorphism ipSe • WR(s, s', m, /„) ->• WP5e{s, s', m, /„).
(19)
The system (12) viewed by itself exhibits not much grounds to conclude that it is related with a hierarchical formation of two-dimensional geometric patterns. Without the geometric context (12) looks like a curious result of algebra that expresses a property of integers. It seems reasonable to suggest that formal methods have limited applications in deriving properties of the formations such as nonlocal order and large symmetry, because algebraic manipulations of symbols cannot generate their visible images. At the same time when the formations are visualized these properties can be observed directly and each time for a particular pair of sequences we witnessed them. In particular, the system (12) can be easily brought into a form which is well-known in many areas. For simplicity, let s — s\...sn, s' = s[...s'n 6 Bn, s ^ s' then Si — s\, i = l,...,n takes values —2,0,2. This gives that powers (m + n + 1 —i)J', j = 0,..., k — 2 of an integer m+n+1—i, i = 1, ...,n are included in (12) with -I- sign if s; — s[ = +2, with — sign if Si — s[ = —2, and can be seen not to be included if S,— s\ = 0, i = 1,..., n. Let xt € {m + 1,..., m + n} and j/j e {m + 1,..., m + n}, 1 < i < n denote an integer that is the zth (counting from left) integer included in (12) with + and — sign respectively. Then by using the notation and cancelling 2 out, the system (12) can be turned into x\ + ...+xlp = y\ + ... + ylp
xk1-2 + ... + xkp-2 = yk1-2 + - + ykp-2, (20) where in addition it is supposed that C(s, s') = k > 3 and p = q, 1 < p < [f J • The system (20) showed up in many different contexts, under many different guises (for example [9], [10], [11]). However, as known to the authors, there exist no geometric interpretations of (20) leading to the connection with a hierarchical formation of two-dimensional geometric patterns that, when visualized, is believed to exhibit nonlocal order and large symmetry. In this respect the geometric interpretation presented in Sect. 2 indeed plays the crucial role in understanding the system (12).
Now, we finally define a structure that constitutes the main subject of the paper. This structure incorporates all these integer relations into one whole. In particular, the whole structure of these integer relations, denoted WR, is defined by Definition 5. Let
WR{In)= U U U m£Z s£ln
s'£ln
WR(s,s',m,In),
194
Figure 12: Two hierarchical formations of integer patterns and integer relations are displayed together to underline the unified character of elements of the web of relations. The figure allows us to see how integer relations transform. Note that the integer relations can be measured by the area of the integer patterns.
WR{I)
= lim WP6e{In),
WR =
WR{Z),
where Z is the set of integers. By using the isomorphism (19), a structure that for given 6 and e incorporates all integer patterns into one whole can be defined by Definition 6. Let
rPss : WR - • WPSe.
We call WR and WPg£ a web of integer relations and integer patterns accordingly, to informally capture their character. For convenience, due to the isomorphism, we can view both these structures as one structure, called a web of relations, whose elements can be equally seen as integer relations or integer patterns.
195
7
On Descriptive Potentialities of the Structure: Simple Examples of Global Optimization Problems
In this section we show descriptive potentialities of the structure. The descriptions appear as hierarchical formations of the elements of the structure. On one side these formations are represented by trees whose vertices are two-dimensional geometric patterns, i.e., integer patterns. Therefore, means of specification of these trees can be seen as parameters of the descriptions. For example, the area of the integer pattern naturally appears as an important parameter. To make the potentialities more accessible, simple global optimization problems formulated in terms of the structure are considered. We show that well-known geometric objects appear in a new way as solutions to the problems. Due to the simplicity, the geometric interpretation demonstrates how the hierarchical formations arise and allows us to actually see them. Namely, we consider configurations of points in threedimensional cartesian space when they are a pair, equilateral triangle and regular tetrahedron. It is known that these geometric objects are connected with global minimum energy conformations of the scaled Lennard-Jones pair potential of 2,3,4 atoms [12]. This conveniently enables us to view them as molecules with points being van der Waals atoms. We are interested to know how descriptions of these configurations given by the distances between points may be connected with their descriptions in the web of relations. It is supposed that in the web of relations a configuration can be described by using structural numbers ifc-i
Ms)
= T,akmi({m
+ n)is1 + ... + {m + l)isn)ek060,
A; = 1,2,...,
(21)
i=0
where rn,n are integers, s = si...sn € In and <5o,£o are appropriate scales of the ground level of a web of integer patterns WPsoeo- The structural complexity C(s, s'), s' € /„ encodes hierarchical formations corresponding to the configuration. In finding a connection between the descriptions the idea is to identify in the right-hand side of (21) constituents whose mathematical form can also be revealed in the cartesian description. Let h(s) = # { i : st =£ 0, i = 1,..., n} and r(j) denote the value of index i such that the condition Sj ^ 0 is the j t h occurrence when components of s = Si...sn € In are nonzero as the index takes successive values 1, ...,n. For simplicity the dependence of the function r on s in its notation is implicit. Assume that for s = Si...sn £ In in (21) we have s, > 0, i = 1,..., n and Sj ^ 0 for all the indices at once. Then we can transform (21) into the following form
196
h(s) = £ otkmi{x[ + ... + 4)e^o,
(22)
i=0
where for j = 1,..., h(s) we have Xi = (m + n — r(j) + 1) when j-\ 1
j
+ J2 ST(1)
h(s) S
T(1)
and
V = Y. sr(i)-
1=0
j=l
For each integer i = 0,1,... the ith power-sum of x\, ...,xp is 7rjp(i) = x\ + ... + xj, and then (22) can be written as fc-i
^*( s ) = Yl akmiKip(x)£a50-
(23)
i=0
From (23), recalling (6) and Theorem 1, we have a correspondence i?i(s),... ) i? J t(s)^7ro p (x),...,7r t _i J ,(i),
A; = 1,2,....
(24)
Thus, instead of structural numbers of a configuration, we can be interested, due to (24), in their related power-sums. This fact is used as the key as we look for power-sums associated with the cartesian description of a configuration. In particular, let D = {dy, i,j = 1,...,N} be a square symmetric matrix of order N with nonnegative entries, where d^ is the distance between points i and j of a configuration. It follows directly that the trace of D, due to da = 0,i = 1,..., JV, is zero and if A is an eigenvalue of D , then A is real, since the matrix is real and symmetric. D is a matrix and its eigenvalues are rather complicated functions of its entries. The elementary symmetric functions of the eigenvalues however, being the coefficients of the characteristic polynomial P(A) = dei(AI - D) = \N + aiX"-1 + ... + aN,
(25)
are polynomials in the entries of D. Namely, if we denote the roots of (25) by Ax,..., Aw, which are the eigenvalues of D , so that (A - Ai)...(A - Aw) = XN + a i A " - 1 + ... + aN, then as well-known OI,...,OAT are the polynomial functions in \i,...,\N, called the elementary symmetric polynomials, a\ = auv(A), ...,a^ —CT/vw(A)where
a homogeneous polynomial of degree k in Ai,..., A^ and k = 1,..., N. The power-sum 7r*:jv(A) = A* + ... + A^, fc = l,2,...
197 is clearly a symmetric polynomial and it can be expressed by the fundamental theorem on symmetric polynomials as a unique polynomial in the elementary symmetric polynomials. Moreover, the elementary symmetric functions and the power-sums are connected by Newton's identities, which provide a recursive scheme for expressing Tfcjv^), k = 1,..., N in terms ofCTIAT(A),..., o-fcjv(A) [13]. Thus O-lJv(A), ..., OAW(A) <"> TTliv(A), ..., TTjVjvM and there is a connection between D and the power-sums D -> P(A) <-• CTIJV(A), ..., aNN{X) O 7r1JV(A),..., irNN{\).
(26)
, 3
-2
0
-1 4
3
1 2
2 1
h
3 0
4 -
1
5
-
2
Figure 13: The spectral function encodes information about the spectrum (—2, — 1, —1, —1, + 1 , +4) of a matrix. This suggests, by using 7TIAT(A), ..., 7TJVJV(A) and (24) as the key, to define numbers that can be interpreted as structural numbers of the configuration. They are specified by eigenvalues of the distance matrix D , i.e., the spectrum Spec(D) of D. Therefore, the spectrum of the distance matrix is proposed as a means to describe a configuration in the web of relations. We consider configurations when eigenvalues are integers in terms of a length scale. These configurations are a pair, equilateral triangle and regular tetrahedron with distance 1 apart. In these examples, the geometric interpretation serves as the main tool to show how hierarchical formations corresponding to these configurations arise from the spectrum. As the result a connection between the descriptions is found. In particular, interestingly, it turns out that the area of an integer pattern corresponding to a configuration equals the absolute value of the determinant of its distance matrix. Now, we show how structural numbers can be developed from an integer spectrum. It is possible to make it with the help of a spectral function. In particular, given a
198 spectrum Spec(D) = (Ai£,..., A^e) of a distance matrix D describing a configuration of N points, where e > 0 is a length scale of distances in D and Xt £ Z, i = 1,..., N, we define a function, called spectral and denoted S ( D , t ) , that represents information about it. Consider the partition Ae[tm,tm+n] of an interval [tm,tm+n], where n=
max A; — min A; + 1 > 2, t=l,...,AT
i=l,...,AT
m=
_
min A; — 1, i=l,...,JV
then the spectral function S(D,t)
: [tm,tm+n]
—>• 5ft
of D is defined in an interval [t2m+n-k,hm+n-k+i),
k = max Aj, max Aj - 1,..., min A* 1=1,...,N
1=1,. ..,N
1=1,.. .,N
as the number of eigenvalues that are equal to k. The value of the function at the point tm+n is equal to its value in the interval [tm+n-i,tm+n). For example, in Fig. 13 a spectral function S(D,t) of a matrix D such that Spec(D) = ( - 2 , - l , - l , - l , + l , + 4 ) is shown. It follows from the definition that S(D, t) £ H / i(A e [i m , tm+n]) and the code of the spectral function is a sequence s = S\...sn € In given by S(D,t) = pmu(s), where 8 is dimensionless and equals 1. Then structural numbers of the sequence s = si...sn € /„ are *-i
Ms)
= Y,akrni{si(m
+ n)i + ... + sn(m + l)i)£k,
k = l,2,...
(27)
i=0
where S; is the number of eigenvalues which are equal to (m + n — i + 1), i = 1,..., n. This implies that (27) can be rearranged to give fc-l
Ms)
= E
k-l
a
kmi(K + - + XN)ek = X) akmTriN{X)ek,
i=0
k = 1,2,... .
i=0
Therefore, by taking into account (26), we have D -> P(A) ^am(\),...,aNN{\) where <7ON{^) = ^ON{^)
^7r0N(\),...,7rNN(X)
«t? 1 (s),...,i) J V + i(s),
= ^V- This leads to a connection between the descriptions.
Let the length scale of distances between points be 1, i.e., e = 1. We have the following matrix of distances between two points of the pair
199
Figure 14: Two different descriptions in one picture. The structural complexity of the sequence s = 101 uncovers a heirarchical formation corresponding to the pair.
and eigenvalues of the matrix are Ai = —1,A2 = + 1 . The graph of the spectral function 5 ( D 2 , t), t € [—2,1] is given in Fig. 14 and its code is a sequence s = 101 £ ^3, where / = {0,1,2}. Consider hierarchical formations that in the web of relations WP\\ correspond to D2 and are encoded by the structural complexity C(s,s') = max C(s,s")
(28)
s"£h
of the sequence s. It can be found by calculations that a sequence s' = 020 € J3 is the unique solution to the global optimization problem (28). This sequence s' together with the sequence s give the hierarchical formation shown in Fig. 14. The figure exhibits an integer pattern P ( / ' 2 ' , [—2,1]), / = p_ 2 n(s — s') which can be seen as the result of the formation. The system (12) in this case takes the form + l ° - 2 * 0 ° + (-l)° = 0 + l 1 - 2 * 0 1 + ( - l ) 1 = 0.
(29)
We show that the descriptions are closely related in this case. Note, that there is an interesting connection between the product of the eigenvalues, i.e., the determinant of the distance matrix D 2 , and the area of P ( / [ 2 ] , [-2,1]). Namely, we have det(D 2 ) = AiA2 = (-1)(+1) = - 1 .
200 The area of P(f^2\ [—2,1]) in its turn is the third structural number of a sequence a" = s — s' = s'ls'^s'z = + 1 — 2 + 1 which, according to (1), equals 2
Figure 15: This hierarchical formation arises from pinpointing the structural complexity of the sequence s = 1002. This is a global optimization problem. It looks like the sequences s, s' are defined by their correspondence to the resulting integer pattern in the web of relations. Making use of (29), we obtain
Ms") = «3,-2,2(i2 + o2 * (-2) + (-i)2) = I Q ( ( - I A - I ) 1 + ( - I ) 1 ^ ) 1 ) = i. Thus |dct(D 2 )| = Ms")
= area of(P(/M, [-2,1])).
We have the following matrix of distances triangle /0 D3 = 1 \ i
(30)
between three points of the equilateral 1 1\ 0 1 i o;
and eigenvalues of the matrix are Ai = —1, A2 = —1, A3 = 2. The graph of the spectral function 5 ( D 3 , i ) , i G [—2,2] is given in Fig. 15. The code of the spectral function is a sequence s = 1002 € 74, where I = {0,1,2,3}. Consider hierarchical formations
201 that in the web of relations WPu correspond to D 3 and are encoded by the structural complexity C{s,s') = max C(s,s") (31) of the sequence s. It can be found by calculations that a sequence s' = 0111 G I\ is the unique solution to the global optimization problem (31) and together with the sequence s gives the hierarchical formation shown in Fig. 15. The figure exhibits an integer pattern .P(/[ 2 ', [—2, 2]), / = p_2ii(s — s')> which can be seen as the result of the formation. The system (12) in the case takes the form + 2
°-l0-0
0
+ (-l)0 = 0
+ 2 1 - l 1 - 0 1 + ( - l ) 1 = 0. We have det(D3) = AiA2A3 = ( - l ) ( - l ) ( + 2 ) = 2. The area of P ( / ' 2 ' , [—2,2]) is the third structural number of a sequence s" = s - s' = s s s s 'i 2 3 4 = + 1 - 1 - 1 + 1, which equals
Us") =J2a3^2,i(2's'l + 1*4 + 0V3' + (-l)X') = 2. 2=0
Thus |det(D 3 )| = Ms")
= area of(P(/l 2 l, [-2,2])).
(32)
Along similar lines, we have the following matrix of distances between four points of the tetrahedron f0111 \ 11 10 D4 1 1 0 1 Vi i i o / and eigenvalues of the matrix are Ai = —1, A2 = —1, A3 = —1, A4 = 3. The graph of the spectral function 5(D4, t ) , i £ [—2,3] is given in Fig. 16. The code of the spectral function is a sequence s = 10003 € ^5, where / = {0,1,2,3, 4}. Consider hierarchical formations that in the web of relations WPu correspond to D 4 and are encoded by the structural complexity C(s,s') = rriax C(s,s") (33) of the sequence s. It can be found by calculations that a sequence s' = 01012 e IA is a solution to the global optimization problem (33) and together with the sequence s gives the hierarchical formation shown in Fig. 16. The figure exhibits an integer
202 pattern P(fl2\ [—2,3]), / = p_2ii(s — s'), which can be seen as the result of the formation. The system (12) in this case takes the form +3° - 2° - 0° + (-1)° = 0 + 3 1 - 2 1 - 0 1 + (-l)1 =0.
Figure 16: The picture shows a connection between the tetrahedron, i.e., the geometric object, and the relation + 3 1 — 2 1 — 0 1 + (—l) 1 = 0. In all these cases the area of the upper integer pattern equals the absolute value of the determinant of the distance matrix. We have det(D 4 ) = A!A2A3A4 = - 3 . 2
The area of P(p ', [—2,3]) is the third structural number of a sequence s" = s — s' = s'(s'i4s'l4 = + 1 - 1 0 - 1 + 1, which equals Ms")
= £ a3,-2,i(3V; + ?4 + 1*4' + Ws'l + (-1)V 5 ') = 3.
Thus \det(D4)\ = Ms")
= area of(P(/M, [-2,3])).
(34)
In all these cases we find an interesting connection between the descriptions. The connection consists of the fact that the important general invariant of the distance matrix of a configuration, i.e., the determinant, is expressed, as we observed in (30), (32) and (34), in terms of the web of relations. Namely, the area of an integer pattern belonging to the highest level equals the absolute value of the determinant of the distance matrix.
203
Acknowledgements This work was supported in part by CQU URG Grant "Computational Optimisation of Large Molecular Structures".
References V. Korotkich, Integer Code Series: Applications in Dynamic Systems and Complexity, The Computing Center of Russian Academy of Sciences, Moscow, 1993. V. Korotkich, Multicriteria Analysis in Problem Solving and Structural Complexity, in Advances in Multicriteria Analysis, P. Pardalos, Y. Siskos and C Zopounidis (editors), Kluwer Academic Publishers, 1995, pp. 81-90. E. Prouhet, Memoire sur Quelques Relations entre Its Puissances des Nombres, C. R. Acad. Sci. Paris, 33, 1851, p. 225. A. Thue, Uber unendliche Zeichenreihen, Norske vid. Selsk. Skr. I. Mat. Nat. Kl. Christiana, 7, 1906, p. 1 (Reprinted in Selected Mathematical Papers of Axel Thue, T. Nagell (editor), Universitetaforlaget, Oslo, 1977. M. Morse, Recurrent Geodesies on a Surface of Negative Curvature, Trans. Amer. Math. Soc, 22, 1921, p. 84. V. Korotkich, Symmetry in Structural Complexity and a Model of Formation, in From Local Interactions to Global Phenomena, IOS Press, Amsterdam, 1996, pp. 84-95. M. Gardner, Penrose Tiles to Trapdoor Ciphers, W. H. Freeman and Company, New York, 1989. V. Korotkich, On a Structure of Integer Relations and its Geometric Interpretation, Central Queensland University, Technical Report 97-004, 1997. S. Ramanujan, Note on a Set of Simultaneous Equations, J. Indian Math. Soc, 4, 1912, pp. 94-96. L. Mordell, On a Sum Analogous to a Gauss's Sum, Quart. J. Math., 3, 1932, pp. 161-169. N. Korobov, Trigonometric Sums and their Applications, Nauka, Moscow, 1989. M. Hoare, Structure and Dynamics of Simple Microclusters, Advances in Chemical Physics, 40, (1979), p. 49.
204
[13] I. MacDonald, Symmetric Functional and Hall Polynomials, The Camelot Press Ltd., Southhampton, 1979.
Combinatorial and Global Optimization, pp. 205-236 P.M. Pardalos, A. Migdalas and R. Burkard, Editors © 2002 World Scientific Publishing Co.
Heuristic Solutions of Vehicle Routing Problems in Supply Chain Management Yannis Marinakis ([email protected]) DSS Laboratory Department of Production Engineering and Management Technical University of Crete 73100 Chania, Greece
Athanasios Migdalas ( s a k i s @ v e r e n i k e . e r g a s y a . t u c . g r ) DSS Laboratory Department of Production Engineering and Management Technical University of Crete 73100 Chania, Greece
Abstract The distribution of commodities, known by the generic name vehicle routing problem, is one of the most important components of supply chain. The vehicle routing problem, which is a hard combinatorial problem, has therefore attracted considerable research attention and a number of algorithms has been proposed for its solution. In this paper we present an extensive review of heuristic solution techniques for the vehicle routing and traveling salesman problems. Keywords: Supply Chain Management, Vehicle Routing Problem, Traveling Salesman Problem, Distribution Problem, Combinatorial Optimization, Heuristics
206
1
Introduction
In this paper we present a review of heuristic algorithms for vehicle routing and traveling salesman problems. We focus on heuristic algorithms because no exact algorithm can be guaranteed to find optimal tours within reasonable computing time when the number of cities is large. This is due to NP-hardness of the problems. The structure of the paper is as follows: In Section 2, we present some basic characteristics of the supply chain management and we focused on the role of transportation problems. In Section 3, we present the traveling salesman problem, the vehicle routing problem and a few extensions of the latter. In Section 4, we describe classic heuristics for the solution of the mentioned problems. In Section 5, we review modern techniques. Section 6 is devoted to computational comparisons between the reviewed algorithms.
2
Supply Chain Management
A complete logistics [5, 25] system covers the entire process of shipping raw materials and input requirements from suppliers to plants, the conversion of the inputs into products at certain plants, the shipping of the products to various warehouses or depots, and the eventual delivery of these products to the final customers. The distribution activities of a firm comprise all shipping and storage of goods downstream from the plants. We classify the decisions for supply chain management into two broad categories — strategic and operational. Strategic decisions are made typically over a longer time horizon. On the other hand, operational decisions are short term, and focus on activities over a day-to-day basis. There are four major decision areas in supply chain management, and there are both strategic and operational elements in each of these areas: 1. Location. 2. Production. 3. Inventory. 4. Transportation — Distribution.
2.1
Transportation decisions
Transport decisions [5] can involve mode selection, shipment size, and routing and scheduling. These decisions are influenced by the proximity of warehouses to cus-
207 tomers and plants, which, in turn, influence warehouse location. Inventory levels also respond to transport decisions through shipment size. Transportation often represents the most important single element in logistics costs for most firms. Transportation is a key decision area within the logistics mix. Except for the cost of purchased goods, transportation absorbs, on the average, a higher percentage of logistics costs than any other relevant activity. Because transportation costs typically range between one third and two thirds of total logistics costs, improving efficiency through the maximum utilization of transportation equipment and personnel is of a major interest. The time that goods are transit reflects on the number of shipments that can be made with a vehicle within a given period of time and on the total transportation costs for all shipments. To reduce transportation costs and also to improve customer services finding the best — in terms of time or distance minimization — routes that a vehicle should follow through a network of roads, rail and other shipping lines is frequently an important decision problem.
3
The Vehicle Routing Problem
The distribution or vehicle routing problem (VRP) is often described as the problem in which vehicles based on a central depot are required to visit geographically dispersed customers in order to fulfill known customer demands. The problem is to construct a low cost, feasible set of routes — one for each vehicle. A route is a sequence of locations that a vehicle must visit along with the indication of the serve it provides [9]. The vehicle must start and finish its tour at the depot. We can say that the problem arises as a generalization of the traveling salesman problem. The traveling salesman problem (TSP) requires the determination of a minimal cost cycle that passes through each node of a given graph exactly once [46]. If costs are symmetric, that is, if the cost of traveling between two locations does not depend on the direction of travel, we have a symmetric TSP, otherwise we have an asymmetric TSP. The multiple traveling salesman problem arises if many salesmen or vehicles in the fleet are to leave from and return to the same depot. There are no restrictions on the number of nodes that each vehicle must visit except that each vehicle must visit at least one node. The vehicle routing problem has a variety of additional constraints and extensions that are often found in real-world problems. These include the following [16] • Each vehicle can operate on more than one route, provided that the total time spent on these routes is less than a given bound T. • Each customer must be visited within a specific time interval, known as time
208 window. • The problem may involve both deliveries to and collections from customers. In addition, it may be possible to mix deliveries and collections on a single route, or alternatively, it may be required from a vehicle to first perform all the deliveries in the route before performing the collections. • Vehicles may also be associated with time windows within which they are allowed to operate.
3.1
Variants of the vehicle routing problem
In this section we present some variants of the vehicle routing problem, namely, the multiple depot vehicle routing problem, the multiple commodities vehicle routing problem, the stochastic vehicle routing problem, the vehicle scheduling problem and others. Many applications [50, 60] involve pickup and delivery services between the depot and peripheral locations (warehouses, stores, stations). 'Delivery' refers to transportation of goods from the depot to customers, and 'pickup' means shipment in the opposite direction (to the depot). In the literature this problem is known as pick-up and delivery problem (PDP) or vehicle routing problem with back-hauls (VRPB) [63, 65]. The objective is to find a set of vehicle routes that serve the delivery and backhaul customers so that vehicle capacities are not violated and the total distance traveled is minimized. In companies [8, 16] with more than one depot, it is often the case that each depot is autonomous, with its own fleet of vehicles and its own geographical customer area to serve. In such cases, the company would simply face a number of similar single depot vehicle routing problems. In other cases depot operations are interdependent and vehicles leaving one depot may, after delivering to customers, end up at another depot. These problems are called multiple depot vehicle routing problem. In some cases [16], the vehicles are compartmented so that different commodities are stored in segregated compartments. Each customer may require specific quantities of different types of commodities. The problem is then called multiple commodities vehicle routing problem. The problem of managing routing and dispatch operations under conditions of random demand fluctuations, is often referred to as the stochastic routing problem [34]. The stochastic vehicle routing problem is usually formulated under the following conditions [9]: 1. Customer demand is a random variable with a known probability distribution.
209 2. Routes must be designed before the actual demands become known. 3. The objective is to minimize expected travel distance. Vehicle scheduling problems can be thought of as routing problems with additional constraints imposed by time periods during which various activities may be carried out [9]. Three constraints commonly determine the complexity of the vehicle scheduling problems: • The length of the time that a vehicle may be in operation before it must return to the depot for service or refueling. • The fact that certain tasks can only be carried out by certain vehicle types. • The presence of a number of depots where vehicles may be housed. In the school bus routing and scheduling problem [8, 10], there are a number of schools, each having being assigned a set of bus stops, with a given number of students assigned to each stop, and time windows for the pickup and the delivery of the students. The problem is to minimize the number of buses used or total transportation costs while serving all the students and satisfying all the time windows. In the problem of routing and scheduling with full loads and time windows, a set of demands is specified for a number of origin-destination pairs. Each demand is a full trailer which must be loaded onto a tractor at an origin and unloaded at a destination. These stops must satisfy prespecified time window constraints and the goal is to design routes and schedules for the fleet of tractors. In most cases, the objective is to minimize total transportation costs or the number of tractors used. In the multi-vehicle covering tour problem [33] we are given two sets of locations. The first set, V, consists of potential locations at which some vehicles may stop, and the second set, W are locations not actually on the vehicle routes but within an acceptable distance of a vehicle route. The problem is to construct several vehicle routes through a subset of V, all starting and ending at the same locations, subject to some side constraints, having a total minimum length, and such that every location of W is within a reasonable distance of a route. An important aspect [7] of the vehicle routing problem that has been largely overlooked is the use of satellite facilities to replenish vehicles during a route. When possible, satellite replenishment allows the drivers to continue making deliveries until the close of their shift without necessarily returning to the central depot. This situation arises primarily in the distribution of fuel and certain retail items. The vehicle routing problem with trailers [28] is concerned with the case where a vehicle consists of a truck and a trailer. Both can carry goods. The use of truck and
210 trailer may cause problems when serving customers that are located in the center of a city or customers who have little space nearby for maneuvering the vehicle. Time and trouble could be saved if these customers were served by the trucks only.
4
Classic Heuristics for the Traveling Salesman and t h e Vehicle Routing Problems
In this section we examine three different classes of heuristics — tour construction procedures, two phases algorithms and tour improvement procedures. Tour construction procedures generate an approximately optimal tour for the distance matrix. Two phases algorithms, like cluster first - route second procedures, which during the first phase group or cluster demand nodes and then, during a second phase, design economical routes. Tour improvement procedures attempt to find a better tour given an initial one.
4.1 4.1.1
Constructive methods Nearest Neighbor Procedure
In this heuristic, the salesman starts at some city and then visits the city nearest to the starting one. From there, he visits the nearest unvisited city, until all cities are visited, and then returns to the starting city [54, 59]. Step 1. Start with any node as the beginning of a path. Step 2. Find a node, not already on the path, which is nearest to the last added node. Add it to the path. Step 3. Repeat Step 2 until all nodes belong to the path. Then, join the first and the last nodes of the path. For symmetric, complete graphs, the worst case behavior of the algorithm is length of nearest neighbor tour 1 length of optimal tour ~ 21
0g2(n)l +
1 2'
where \x] is the smallest integer > x, and n is the number of nodes in the network. The nearest neighbor algorithm is of 0(n2) time complexity.
211 4.1.2
Insertion Procedures
The insertion procedures [30, 59] takes a subtour of k nodes and attempts to determine which node (not already in the subtour) should join the subtour next (the selection step) and then determines where in the subtour it should be inserted (the insertion step). The most known of these algorithms is the nearest insertion algorithm: Step 1 Start with a subgraph consisting of only one node, say i. Step 2 Find node k such that en- is minimal and form the subtour i — k — i. Step 3 (Selection) Given a subtour, find node k, not already in the subtour, closest to any subtour node. Step 4 (Insertion) Find the arc (i, j) in the subtour which minimizes Cik + c^j — ci}•. Insert k between i and j . Step 5 Go to Step 3 unless a Hamiltonian cycle has been formed. The worst case behavior of the algorithm is length of nearest insertion tour length of optimal tour ~~ and its time complexity
0(n2).
Similar to the nearest insertion procedure are the cheapest insertion [54], the arbitrary insertion [9], the farthest insertion [54], the quick insertion [9], and the convex hull insertion [9] algorithm.
4.1.3
Clarke and Wright Algorithm
This savings algorithm is an exchange procedure [17, 30, 44], that was originally developed for the VRP. It can, however, be applied to the TSP as well: Step 1. Select any node as the central depot and denote it as node 1. Step 2. Compute the savings Sy = Cy + cu — Cij for i, j = 2,3, • • •, n. Step 3. Order the savings from largest to smallest. Step 4. Starting at the top of the savings list and moving downwards, form larger subtours by linking appropriate nodes i and j . Repeat until a tour is formed.
212 Next the Clarke and Wright algorithm for the solution of the VRP [16] is given: Step 1. Calculate the savings s,j = cy + Cu — Cij for all pairs of customers i and j . Note that s^ is the saving in cost that would result if the link (i,j) is made to produce the route (l,i,j,l) instead of supplying i and j on two routes (1, i, 1) and ( l , j , 1). Step 2. Order the savings in descending order. Step 3. Starting at the top of the list do the following. Parallel version Step 4. If a given link results in a feasible route according to the constraints of the VRP, then append this link to the solution, if not reject the link. Step 5. Try the next link in the list and repeat Step 4 until no more links can be chosen. Sequential version Step 4. Find the first feasible link in the list which can be used to extend one of the two ends of the currently constructed route. Step 5. If the route cannot be expanded further, terminate the route. Choose the first feasible link in the list to start a new route. Step 6. Repeat Steps 4 and 5 until no more links can be chosen. The worst case behavior for this approach is bounded by a linear function in log2(n). The calculation of the matrix sy- in Step 2 requires about en2 computations for some constant c. Next, in Step 3 savings can be sorted into nonincreasing order via Heapsort method in a maximum of cn2lg(n) computations. Step 4 involves at most n 2 computations. Thus, the Clark and Wright savings procedure requires an order of n 2 log 2 (n) computations. 4.1.4
Nearest Merger Algorithm
The nearest merger method [59] when applied to a TSP of n nodes constructs a sequence Si, • • •, Sn such that each Si is a set of n — i + 1 disjoint subtours covering all the nodes. Step 1. Si consists of n subtours, each containing a single node.
213 Step 2. For each i < n, find an edge (a^bi) such that caj&; = min{cxy for x and y in different subtours in Si}. Then S i + 1 is obtained from S* by merging the subtours containing aj and 6j.
The worst case behavior of the algorithm is length of nearest merger tour length of optimal tour ~~ This approach requires an order of n 2 computations.
4.1.5
Christofides Algorithm
Christofides [9, 54] suggested a method of transforming spanning trees to Eulerian graphs. For this, it is sufficient to add a perfect matching on the odd-degree nodes of the tree. After adding the perfect matching edges, all node degrees are even and hence the graph is Eulerian. The following procedure for the solution of the TSP can be based on this:
Step 1. Find a minimal spanning tree T of the given graph G. Step 2. Identify all the odd degree nodes in T. Solve a minimum cost perfect matching on the odd degree nodes using the original cost matrix. Add the edges from the matching to the edges of T to obtain an Euler cycle. In this subgraph every node is of even degree although some nodes may have degree greater than 2. Step 3. Remove polygons over the nodes with degree greater than 2 and transform the Euler cycle into a Hamiltonian cycle.
The worst case behavior of the algorithm is length of Christofides tour length of optimal tour ~~ Since most of the computation time is consumed by the minimum matching subroutine, this heuristic is 0(n3). However, the number of odd nodes may be considerable less than n.
214
4.2
2-phase Algorithms
In two-phase methods, the first phase consists of clustering the customers by assigning them to vehicles, and the second phase routes these clusters. The best known algorithms of this category are the sweep algorithm, the Mole and Jameson algorithm and the two-phase method of Fisher and Jaikumar.
4.2.1
The Sweep Algorithm
The sweep algorithm was originally devises by Gillet and Miller [9]. This approach constructs a solution in two stages. First, it assigns nodes to vehicles and then it sequences the order in which each vehicle visits the nodes assigned to it. The procedure is as follows [17, 16, 44]: Phase I Step 1. Choose an unused vehicle k. Step 2. Starting from the unrouted customer i with smallest angle #;, include consecutive customers i + 1, i + 2, • • • in the route until the capacity constraint of the vehicle k is reached. Step 3. If all customers are swept or if all vehicles have been used, go to phase II, else return to Step 1. Phase II Step 4. Solve a TSP for every set of customers assigned to a vehicle to form the final routes.
4.2.2
M e t h o d of Mole & Jameson
The algorithm of Mole and Jameson is a sequential procedure [17] in which, for a given value of two parameters A and fi, the following two criteria are used to expand a route under construction: e(M,J) a{i,l,j) The algorithm proceeds as follows:
= ca + ctj - \xcij = Xcoi-e(i,l,j)
215 Step 1. For each unrouted customer x; compute the feasible insertion in the emerging route R as e(ii,l,ji)
= min[e(r,/, s)]
for all adjacent customers xr,xs £ R, where Xj, and Xjt are customers between which xi results in the best insertion. Step 2. The best customer xi* to be inserted in the route is computed as the one for which <7(*i*.**.>)
=
max[a(ii,l,ji)},
where xi is unrouted and feasible. Step 3 . Insert xi* in route R between x^ and Xj*. Step 4. Optimize route R using r-optimal methods (§4.3). Step 5. Return to Step 1 to start a new route JR either until all customers are routed or no more customers can be routed.
4.2.3
M e t h o d of Fisher & Jaikumar
The first [22, 23] phase of this algorithm performs a parallel clustering by solving optimally a generalized assignment problem. Phase I Step 1. Choose m customers to be seeds of clusters and allocate a vehicle to each. Step 2. For each customer i and for each cluster k, compute an insertion cost dik relative to the seed of the cluster. Step 3. Solve a corresponding generalized assignment problem (GAP) to obtain clusters. Phase II Step 4. Solve a TSP for every set of customers in the clusters implied by the solution to GAP.
216
4.3
Improving heuristics or local search heuristics
A local search algorithm [1] is built around a neighborhood search procedure. Given a tour the algorithm examines all the tours that are "neighboring" to it and tries to find a shorter one. The definition of "neighbor" varies with the details of the particular local search heuristic [39]. The overall process goes as follows: Starting with some initial tour, chosen arbitrary or generated by some other heuristic, if there is no neighboring tour which is shorter than the original one, the process halts. The initial tour is a local optimum with respect to the chosen neighborhood. Otherwise, a shorter neighbor of the original tour is used as a new starting point, and the process is repeated. The method must eventually terminate as there is only a finite number of possible tours. A neighborhood N for a problem instance (5, / ) , where S is the set of feasible points and / the objective function, can be defined as a mapping from S to its power set, i.e., N : S —» 2 5 . N(s) is called the neighborhood of s € 5 , and contains all the solutions that can be reached from s by a single move. Here, the meaning of a move is that of an operator which transforms one solution to another with small modifications. A solution x is called a local minimum of / with respect to the neighborhood N if
f(x)
4.3.1
2-opt Method
A 2-opt procedure [47, 54], in general, consists of eliminating two edges and reconnecting the two resulting paths in order to obtain a new tour. Note that there is only one way to reconnect the paths. The 2-opt procedure was introduced by Lin (1965) for the TSP: Step 1. Let T be the current tour. Step 2. For every node i = 1, 2, • • •, n: Examine all 2-opt moves involving the edge between i and its successor in the tour. If it is possible to decrease the tour length this way, then choose the best such 2-opt move and update T.
217 Step 3. If no improving move could be found, then stop. In the worst case, it can only be guaranteed that an improving move decreases the tour length by at least one unit. No polynomial worst case bound for the number of iterations to reach a local optimum can be given. Checking whether an improving 2-opt move exists takes 0(n2) time.
4.3.2
3-opt Method
The 3-opt heuristic is quite similar to the 2-opt but it introduces more flexibility in modifying the current tour, because it uses a larger neighborhood. The tour breaks into three parts instead of only two. There are eight ways to connect the resulting three paths in order to form a tour. There are I „ I ways to remove three edges from a tour.
4.3.3
Lin - Kerningham Algorithm
This algorithm was developed by Lin and Kernigham (LK) [48, 35] and for many years was considered to be the best heuristic for the TSP. The LK-algorithm decides dynamically at each iteration the number of edges that should be exchanged. The procedure works as follows [32, 48]: Step 1. Generate a random initial solution T. Step 2. Step 2.1. Set the iteration counter i = l. Step 2.2. Select xt and yt as the most out of place pair at the i-th iteration. This generally means that X{ and yi are chosen to maximize the improvement when X\, • • • ,Xi are exchanged with j/i, • • •, y,. Xi is chosen from T — {xi, • • •, x^} and y, from S — T — {x%, • • •, X(}. Step 2.3. If it appears that no more gain can be made, go to Step 3. Otherwise, set i = i + 1 and go back to Step 2.2. Step 3. If the best improvement is found for i = k, exchange x\,---,Xk with V\i' •• iVki to obtain a new T, and go to Step 2. If no improvement was found, go to Step 4. Step 4 (Multistart). Repeat from Step 1 if desired. Otherwise, terminate.
218 4.3.4
Or-opt
The Or-opt procedure, well known as node exchange heuristic, was first introduced by Or [66]. It removes a sequence of up-to-three adjacent nodes and inserts it at another location within the same route. Or-opt can be considered as a special case of 3-opt (three arcs exchanges) where three arcs are removed and substituted by three other arcs. When removing a chain of consecutive nodes in Or-opt, two arcs are deleted, and the third is eliminated when inserting the chain back into the route. However, the number of possible Or-exchanges is far less than that of possible 3-exchanges. Or-opt is also shown to produce improved solutions of comparable quality to those produced by 3-opt, while requiring significantly less computational time. Or-opt algorithm can be described as follows [43]: Step 1. Consider an initial tour and set t = 1 and s = 3. Step 2. Remove from the tour a chain of s consecutive vertices, starting with the vertex in position t, and tentatively insert it between all remaining pairs of consecutive vertices on the tour. Step 2 . 1 . If the tentative insertion decreases the cost of the tour, implement it immediately, thus defining a new tour. Set t = 1 and repeat Step 2. Step 2.2. If no tentative insertion decreases the cost of the tour, set t = t + 1. If £ = n + 1 then proceed to Step 3, otherwise repeat Step 2. Step 3 . Set t = 1 and s = s — 1. If s > 0 go to Step 2, otherwise stop.
4.3.5
G E N I and G E N I U S
The GENI algorithm was presented by Gendreau, Hertz and Laporte (1992) [40]. GENI is a hybrid of tour construction and local optimization. Suppose that cx, Ci, • • •, c/v is an arbitrary ordering of the cities. Starting with the partial tour consisting of the first three cities, ci, C2, c3, new cities are added to the current tour in the order given, starting with c 4 . To add city ct, possible ways of inserting it into the tour are considered as well as a 3-opt or 4-opt move with the C; as the endpoint of one of the deleted edges. The range of possibilities is restricted by requiring that certain of the inserted edges link cities to members of their nearest neighbor lists, where only cities currently in the tour qualify to be in such lists. Also the list lengths are constrained to a maximum p. GENIUS is a true local optimization algorithm based on principles similar to GENI's. Given a starting tour generated by GENI, it cycles through the cities looking for improvements.
219
5
5.1
Metaheuristics for the Traveling Salesman and t h e Vehicle Routing Problems Simulated annealing
Simulated annealing (SA) belongs [19] to a class of local search algorithms that are known as threshold algorithms. These algorithms play a special role within local search for two reasons. First, they appear to be quite successful when applied to a broad range of practical problems. Second, some threshold algorithms such as SA have a stochastic component, which facilitates a theoretical analysis of their asymptotic convergence. The approach of SA originates from theoretical physics, where Monte-Carlo methods are employed to simulate phenomena in statistical mechanics. Its predecessor is the so-called Metropolis filter. This simulation method can be motivated as follows. Consider a huge number of particles of fixed volume at some temperature #. Since the particles move, the system can be in various states. The probability that the system is in a state of certain energy E is given by the Boltzmann distribution f(E) = —S& 5 , where z($) is a normalization factor and KB is the Boltzmann constant. This distribution characterizes the statistical equilibrium of the system at temperature i9. In the following we present a simulated annealing procedure for the TSP [54]: Step 1. Compute an initial tour T and choose an initial temperature d > 0 and a repetition factor r. Step 2. If the stopping criterion is not satisfied: Step 2.1. Do the following r times. Step 2.1.1. Perform a random modification of the current tour to obtain the tour (V) and let A = c(T') - c(T). Step 2.1.2. Compute a random number x, 0 < x < 1. Step 2.1.3. If A < 0 or x < e x p ( ^ ) . Step 2.2. Update d and r. Step 3. Output the current tour T as solution. 5.1.1
Algorithm by Alfa, Heragu and Chen
The method described by Alfa, Heragu and Chen [26] can be viewed as a route-first, cluster-second algorithm. A giant tour is first constructed without considering weights and vehicle capacity, and then it is partitioned into segments of consecutive vertices whose total weight does not exceed a given capacity Q. It is implicitly assumed that the number of vehicles is not fixed a priori. Starting with the initial tour, at iteration t, three edges are randomly selected and removed from the tour, as in classic 3-opt.
5.1.2 Osman's Simulated Annealing Algorithm
The simulated annealing implementation proposed by Osman [26] is substantially more involved in several respects:
1. It uses a better starting solution (obtained with the Clarke and Wright algorithm).
2. Some parameters of the algorithm are adjusted in a trial phase.
3. Richer solution neighborhoods are explored using λ-interchanges (see Section 5.2.2).
4. The cooling schedule is more sophisticated.
The reader can find other algorithms based on simulated annealing in the articles [11, 12, 41, 64].
5.2
Tabu search
Tabu search (TS) was introduced by Glover [29]. Computational experience has shown that TS is a well-established approximation technique [36] which can compete with almost all known techniques and which, by its flexibility, can beat many classic procedures. TS combines [1] the deterministic iterative improvement algorithm with the possibility of accepting cost-increasing solutions. Thus, the search is directed away from local minima so that other parts of the search space can be explored. The next solution visited is always chosen to be a legal neighbor of the current solution with the best cost, even if that cost is worse than that of the current solution. The set of legal neighbors is restricted by a tabu list designed to prevent the search from cycling. The tabu list is dynamically updated during the execution of the algorithm and defines solutions that are not acceptable in the next few iterations. However, a solution in the tabu list may be accepted if its quality is in some sense good enough, in which case it is said to attain a certain aspiration level. A general description of TS is the following:
Step 1. Choose an initial solution x. Set x* = x (the best solution so far) and k = 0.
Step 2. Set k = k + 1 and generate a subset V* of solutions in N(x, k), the neighborhood of x at iteration k.
Step 3. Choose a best y in V* with respect to the objective function f (or some modified objective criterion) and set x = y.
Step 4. If f(x) < f(x*), then set x* = x.
Step 5. If a stopping condition is met, then stop. Else go to Step 2.
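A minimal skeleton of these five steps is sketched below; the attribute-based tabu list keyed on moves, the fixed tenure and the candidate-sampling scheme are assumptions made for the sake of the example rather than features mandated by the general description.

```python
import random

def tabu_search(x0, neighbors, f, tenure=7, max_iter=200, sample_size=30):
    """neighbors(x) must yield (move, candidate_solution) pairs; moves must be hashable."""
    x, x_best = x0, x0
    tabu = {}                                  # move attribute -> iteration until which it is tabu
    for k in range(1, max_iter + 1):           # Steps 2-5
        candidates = list(neighbors(x))
        random.shuffle(candidates)
        candidates = candidates[:sample_size]  # subset V* of N(x, k)
        best_move, best_y = None, None
        for move, y in candidates:
            is_tabu = tabu.get(move, 0) >= k
            aspiration = f(y) < f(x_best)      # tabu move allowed if it beats the incumbent
            if (not is_tabu or aspiration) and (best_y is None or f(y) < f(best_y)):
                best_move, best_y = move, y
        if best_y is None:
            break                              # no admissible neighbor
        x = best_y                             # Step 3: best admissible neighbor, even if worse
        tabu[best_move] = k + tenure           # forbid reversing this move for `tenure` iterations
        if f(x) < f(x_best):                   # Step 4
            x_best = x
    return x_best                              # Step 5 handled by max_iter
```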
5.2.1
Willard's Algorithm
One of the first attempts to apply tabu search to the VRP is due to Willard [26]. Here the solution is first transformed into a giant tour by replication of the depot, and neighborhoods are defined as all feasible solutions that can be reached from the current solution using 2-opt or 3-opt exchanges. The next solution is determined by the best non-tabu move.
5.2.2
Osman's Algorithm
Osman [26] again defines neighborhoods using a λ-interchange scheme. This includes a combination of 2-opt moves, vertex reassignments to different routes, and vertex interchanges between two routes. In one version of the algorithm, called best-admissible, the whole neighborhood is explored and the best non-tabu feasible move is selected. In another version, called first-best-admissible, the first admissible improving move is selected if one exists.
5.2.3
TABUROUTE
The TABUROUTE [26] algorithm involves several innovative features:
• The neighborhood structure is defined by all solutions that can be reached from the current solution by removing a vertex and inserting it into another route in its neighborhood using GENI (see Section 4.3.5).
• The search procedure examines solutions that may be infeasible with respect to the capacity or maximum route length constraints.
• TABUROUTE does not actually use a tabu list, but random tabu tags.
• TABUROUTE uses a diversification strategy. This is achieved by penalizing vertices that have been moved frequently, in order to increase the probability of considering vertices that have seldom been moved.
5.2.4 Taillard's Algorithm
Taillard's tabu search implementation [26] contains some of the features of the algorithm presented in Section 5.2.3, namely random tabu durations and diversification. It defines neighborhoods using λ-interchanges, like the algorithm presented in Section 5.2.2. Rather than executing insertions with GENI, the algorithm uses standard insertions.
5.2.5 The Xu and Kelly Algorithm
Xu and Kelly [45] used a more sophisticated neighborhood structure. They consider swaps of vertices between two routes, a global repositioning of some vertices into other routes, and local route improvements. The global repositioning strategy solves a network flow model to optimally relocate given numbers of vertices into different routes. Route re-optimizations are performed by means of 3-opt exchanges and a TS improvement routine. The algorithm is governed by several parameters which are dynamically adjusted throughout the search. A pool of best solutions is memorized and periodically used to re-initiate the search with new parameter values.
5.2.6
Adaptive Memory Procedure
One of the most interesting developments to have occurred in the area of TS in recent years is the concept of Adaptive Memory developed by Rochat and Taillard [31, 45, 57]. It is mostly used in TS, but its applicability is not limited to this type of metaheuristic. An adaptive memory is a pool of good solutions that is dynamically updated throughout the search process. Periodically, some elements of these solutions are extracted from the pool and combined differently to produce new good solutions. In the VRP, vehicle routes selected from several solutions will be used as a starting point. The extraction process gives a larger weight to those routes belonging to the best solutions.
5.2.7
Other Algorithms Based on Tabu Search
Many algorithms based on TS have been proposed during the last five years for the solution of VRPs. Gendreau, Laporte, Musaraganyi and Taillard proposed an algorithm [27] for the solution of the heterogeneous vehicle routing problem. The algorithm first makes use of GENIUS (Section 4.3.5), then applies TS, and is itself embedded within a so-called Adaptive Memory Procedure (Section 5.2.6). Renaud, Laporte and Boctor proposed an algorithm [55] for the solution of the multi-depot vehicle routing problem. The algorithm consists of three phases: fast improvement, intensification and diversification. The reader can find other algorithms based on tabu search in the articles [3, 24, 12, 14, 52].
5.3
Genetic algorithms
This search strategy uses concepts from population genetics and evolution theory to construct algorithms [1, 51, 62] that try to optimize the fitness of a population of elements through recombination and mutation of their genes. The general idea of genetic local search is given by the following procedure:
Step 1 (Initialize): Construct an initial population of n solutions.
Step 2 (Improve): Use local search to replace the n solutions in the population by n local optima.
Step 3 (Recombine): Augment the population by adding m offspring solutions; the population size now equals n + m.
Step 4 (Improve): Use local search to replace the m offspring solutions by m local optima.
Step 5 (Select): Reduce the population to its original size by selecting n solutions from the current population.
Step 6 (Evolve): Repeat Steps 3 through 5 until a stop criterion is satisfied.
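The procedure above translates directly into the skeleton below; the user-supplied operators (`local_search`, `recombine`, `fitness`, with lower fitness taken as better) and the truncation selection used in Step 5 are placeholders chosen for illustration.

```python
import random

def genetic_local_search(init_pop, local_search, recombine, fitness,
                         m=10, generations=50):
    # Steps 1-2: initial population, each member improved to a local optimum
    population = [local_search(s) for s in init_pop]
    n = len(population)
    for _ in range(generations):                            # Step 6: evolve
        offspring = []
        for _ in range(m):                                  # Step 3: recombine two parents
            p1, p2 = random.sample(population, 2)
            offspring.append(recombine(p1, p2))
        offspring = [local_search(s) for s in offspring]    # Step 4: improve offspring
        # Step 5: keep the n best solutions (minimization, lower fitness is better)
        population = sorted(population + offspring, key=fitness)[:n]
    return min(population, key=fitness)
```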
5.3.1
GIDEON
GIDEON [26, 61] is a genetic algorithm for VRPs with time windows and capacity constraints, based on a cluster-first, route-second strategy. The genetic algorithm is only applied during the clustering phase: a procedure called genetic sectoring partitions the vertices into sectors (clusters) centered at the depot, as in the sweep algorithm (Section 4.2.1).
5.3.2
GENEROUS
GENEROUS [26] is a genetic algorithm for VRPs with time windows. This algorithm avoids the difficulties related to the encoding of a solution into a chromosome by applying the crossover and mutation operators on the solutions themselves. In this algorithm, a new solution is created from two parent solutions 1 and 2 by linking the first customers on a route of parent 1 to the last customers on a route of parent 2,
as in the 2-opt exchange procedure. The new route replaces the old one in parent solution 1. A second offspring solution can be created by inverting the roles of the parent solutions.
5.4
Neural nets algorithms
The use of artificial neural networks to find good solutions to combinatorial optimization problems has recently caught some attention. A neural network consists of a network [1, 15] of elementary nodes (neurons) that are linked through weighted connections. The nodes represent computational units, which are capable of performing a simple computation, consisting of a summation of the weighted inputs, followed by the addition of a constant called the threshold or bias, and the application of a nonlinear response function. The result of the computation of a unit constitutes its output. This output is used as an input for the nodes to which it is linked through an outgoing connection. The overall task of the network is to achieve a certain network configuration, for instance a required input-output relation, by means of the collective computation of the nodes. This process is often called self-organization.
5.4.1
Hopfield Neural Nets
The first attempt to produce a solution for the TSP with neural nets was based on a Hopfield Neural Network [2]. Hopfield networks can be used as associative memories for information storage and retrieval, and to solve combinatorial problems. They belong to the class of recurrent neural networks, that is, outputs of the network are fed back to inputs of previous layers of the network. The implementation of a Hopfield Neural Network requires:
• a transformation of the cost and the constraints of the problem into one function, known as the Hopfield energy function;
• the determination of the Lagrange parameters.
These steps are often the decisive factors in the success of the application, and they are also the reasons why researchers have difficulty in duplicating Hopfield's results for the traveling salesman problem. To map the traveling salesman problem onto the Hopfield framework, a scheme is needed to represent the final state of the network as a tour list. Hopfield and Tank [2] adopted a representation scheme in which the precedence (sequence) of cities in a tour list is encoded by the final states of a set of neurons. For example, for an n-city problem, the network needs n^2 neurons — one neuron for each possible tour position of each city. Since there are n cities, each with n possible tour positions, n × n neurons are needed.
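For concreteness, one commonly quoted form of the Hopfield–Tank energy function for this n² neuron encoding is reproduced below; v_{X,i} denotes the output of the neuron for city X in tour position i, d_{XY} the distance between cities X and Y, position indices are taken modulo n, and the penalty weights A, B, C, D are assumed parameters that must be tuned for each instance.

```latex
E = \frac{A}{2}\sum_{X}\sum_{i}\sum_{j\neq i} v_{X,i}\,v_{X,j}
  + \frac{B}{2}\sum_{i}\sum_{X}\sum_{Y\neq X} v_{X,i}\,v_{Y,i}
  + \frac{C}{2}\Bigl(\sum_{X}\sum_{i} v_{X,i}-n\Bigr)^{2}
  + \frac{D}{2}\sum_{X}\sum_{Y\neq X}\sum_{i} d_{XY}\,v_{X,i}\bigl(v_{Y,i+1}+v_{Y,i-1}\bigr)
```

The first three terms penalize invalid assignments (a city in two positions, two cities in the same position, a wrong number of active neurons), while the last term measures the tour length.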
5.4.2 Self-Organized Maps
Self-organized maps are instances of so-called competitive neural network models [26, 49]. They self-organize by gradually adjusting the weights on their connections.
5.5
Ant algorithms
The ant system, introduced by Colorni, Dorigo and Maniezzo [13, 18], is a new distributed metaheuristic for hard combinatorial optimization problems and was first used on the traveling salesman problem. Observations on real ants searching for food were the inspiration to imitate the behavior of ant colonies. Real ants are able to communicate information concerning food sources via an aromatic essence, called pheromone. They mark the path they walk on by laying down pheromone in a quantity that depends on the length of the path and the quality of the discovered food source. Other ants can observe the pheromone trail and are attracted to follow it. The described behavior of real ant colonies can be used to solve combinatorial optimization problems by simulation: artificial ants searching the solution space simulate real ants searching their environment, the objective values correspond to the quality of the food sources and an adaptive memory corresponds to the pheromone trails. In addition, the artificial ants are equipped with a local heuristic function to guide their search through the set of feasible solutions.
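The tour-construction rule used by the artificial ants is usually summarized as follows: from city i, an ant chooses the next unvisited city j with probability proportional to τ_ij^α · η_ij^β, where τ is the pheromone trail, η = 1/d_ij the heuristic visibility, and α, β weight their influence. The sketch below implements only this selection step; the parameter values and data structures are illustrative assumptions, and the pheromone evaporation and update rules are omitted.

```python
import random

def choose_next_city(current, unvisited, tau, dist, alpha=1.0, beta=2.0):
    """Roulette-wheel selection of the next city (assumes strictly positive distances)."""
    weights = []
    for j in unvisited:
        visibility = 1.0 / dist[current][j]                    # eta_ij
        weights.append((tau[current][j] ** alpha) * (visibility ** beta))
    total = sum(weights)
    r = random.uniform(0.0, total)
    acc = 0.0
    for j, w in zip(unvisited, weights):
        acc += w
        if acc >= r:
            return j
    return unvisited[-1]
```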
5.6
Iterated Lin-Kernighan algorithm
The Iterated Lin-Kernighan (ILK) algorithm [67, 53] was proposed by Johnson [40] and is considered to be one of the best heuristics for the TSP. ILK uses LK to obtain a first local minimum. To improve this local minimum, the algorithm examines other locally minimal tours 'near' the current local minimum. To generate these tours, ILK first applies a random and unbiased nonsequential 4-opt exchange to the current local minimum and then optimizes this 4-opt neighbor using the LK algorithm. If the tour obtained by this process is better than the current local minimum, then Iterated LK makes this tour the current local minimum and continues from there using the same neighbor-generation process. Otherwise, the current local minimum remains as it is and further random 4-opt moves are tried. The algorithm stops when a stopping criterion based either on the number of iterations or on the computational time is satisfied. The random 4-opt exchange performed by ILK is called the double-bridge move and plays a diversification role for the search process; it tries to propel the algorithm into a different area of the search space while preserving at the same time large parts of the structure of the current local minimum. The ILK procedure is as follows:
Step 1. Generate a random tour T.
Step 2. Do the following for some prespecified number, M, of iterations:
Step 2.1. Perform an (unbiased) random 4-opt move on T, obtaining T1.
Step 2.2. Run LK on T1, obtaining T2.
Step 2.3. If length(T2) < length(T), set T = T2.
Step 3. Return T.
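The double-bridge move of Step 2.1 is easy to state directly: cut the tour into four consecutive segments and reconnect them in a different order. The sketch below assumes the tour is a list of city indices (with at least four cities).

```python
import random

def double_bridge(tour):
    """Random nonsequential 4-opt move: reconnect four segments as A-C-B-D."""
    n = len(tour)
    # three random cut points splitting the tour into segments A, B, C, D
    i, j, k = sorted(random.sample(range(1, n), 3))
    a, b, c, d = tour[:i], tour[i:j], tour[j:k], tour[k:]
    return a + c + b + d
```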
5.7
Guided local search
Guided local search (GLS), originally proposed by Voudouris and Tsang [67, 4], is a general optimization technique suitable for a wide range of combinatorial optimization problems. The main focus is on the exploitation of problem- and search-related information to effectively guide local search heuristics in the vast search spaces of NP-hard optimization problems. This is achieved by augmenting the objective function of the problem to be minimized with a set of penalty terms which are dynamically manipulated during the search process to steer the local search heuristic. GLS thus augments the cost function of the problem with a set of penalty terms and passes this augmented function, instead of the original one, for minimization by the local search procedure. Local search is confined by the penalty terms and focuses attention on promising regions of the search space. Iterative calls are made to local search. Each time local search gets caught in a local minimum, the penalties are modified and local search is called again to minimize the modified cost function.
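In most GLS descriptions the augmented cost has the form h(s) = g(s) + λ Σ_i p_i I_i(s), where I_i indicates whether feature i is present in solution s, p_i is its penalty counter and λ a scaling parameter; when local search stalls, the features present in the local minimum that maximize the utility c_i/(1 + p_i) are penalized. The sketch below follows this common formulation; the feature model and the value of λ are assumptions of the example, not details taken from the text above.

```python
def guided_local_search(s0, local_search, g, features, feature_cost,
                        lam=0.3, iterations=100):
    """features(s) -> set of feature ids in s; feature_cost(i) -> c_i;
    local_search(s, objective) -> local minimum of `objective` starting from s."""
    penalties = {}                                    # p_i, initially zero

    def h(s):                                         # augmented objective
        return g(s) + lam * sum(penalties.get(i, 0) for i in features(s))

    best = s = local_search(s0, h)
    for _ in range(iterations):
        present = features(s)
        if not present:
            break
        # penalize the features of the local minimum with maximum utility
        max_util = max(feature_cost(i) / (1 + penalties.get(i, 0)) for i in present)
        for i in present:
            if feature_cost(i) / (1 + penalties.get(i, 0)) == max_util:
                penalties[i] = penalties.get(i, 0) + 1
        s = local_search(s, h)                        # restart local search on the modified cost
        if g(s) < g(best):                            # keep the best solution w.r.t. the true cost
            best = s
    return best
```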
5.8
Fast local search
Fast local search (FLS) was also originally proposed by Voudouris and Tsang [67]. FLS works as follows. The current neighborhood is broken down into a number of small sub-neighborhoods and an activation bit is attached to each of them. The idea is to scan the sub-neighborhoods continuously in a given order, searching only those with the activation bit set to 1. These sub-neighborhoods are called active sub-neighborhoods. Sub-neighborhoods with the bit set to 0 are called inactive sub-neighborhoods and they are not searched. The neighborhood search process does not restart whenever a better solution is found but continues with the next sub-neighborhood in the given order. This order may be static or dynamic.
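A minimal rendering of this activation-bit scheme is shown below; the way sub-neighborhoods are enumerated (`sub_moves`) and the rule that keeps a sub-neighborhood active only while it yields improvements are assumptions of the sketch.

```python
def fast_local_search(s, cost, num_subs, sub_moves):
    """sub_moves(s, b) yields candidate solutions from sub-neighborhood b."""
    active = [True] * num_subs                 # activation bits, all set to 1 initially
    while any(active):
        for b in range(num_subs):              # scan sub-neighborhoods in a fixed order
            if not active[b]:
                continue
            improved = False
            for cand in sub_moves(s, b):
                if cost(cand) < cost(s):
                    s = cand                   # accept and keep scanning, no restart
                    improved = True
            # an improving sub-neighborhood stays active, otherwise it is switched off
            active[b] = improved
    return s
```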
5.9
GRASP
The Greedy Randomized Adaptive Search Procedure (GRASP) [20, 21, 38, 37, 56] is a two-phase local search. This randomized technique provides a feasible solution within every iteration; the final result is simply the best solution found over all iterations (multi-start local search). Each iteration consists of two phases, a construction phase and a local search procedure. In the construction phase a randomized greedy function is used to build up an initial solution, which is then exposed to improvement attempts in the local search phase. The construction phase can be described as adding one element at a time to the partial (incomplete) solution. The choice of the next element to be added is determined by ordering all elements in a candidate list with respect to a greedy function. The heuristic is adaptive because the benefits associated with every element are updated at each iteration of the construction phase to reflect the changes brought about by the selection of the previous element. The probabilistic component of a GRASP comes from randomly choosing one of the best candidates in the list, but not necessarily the top candidate. A generic GRASP algorithm is given below:
Step 1. Input the problem instance.
Step 2. Do until some GRASP stopping criterion is satisfied:
Step 2.1. Execute the construction phase of GRASP in order to obtain a greedy randomized solution.
Step 2.2. Apply the local search phase to the obtained solution.
Step 2.3. Update the best solution if needed.
Step 3. Return the best solution found.
Although there are many GRASP applications to combinatorial optimization, the only known application in the context of the vehicle routing problem is that of Kontoravdis et al. ([6, 42]). Kontoravdis and Bard addressed the problem of finding the minimum number of vehicles needed to serve n customers subject to time windows and capacity constraints (a sketch of the generic GRASP loop is given after their procedure below):
Step 1. Find a lower bound u on the number of routes needed.
Step 2. Select u seed customers to form a set of initial routes.
Step 3. Calculate the cost of inserting each customer into these routes.
Step 4. Calculate penalties for each insertion.
Step 5. Construct a list L that contains the r largest penalties. From L, randomly choose one customer.
Step 6. Try to insert the customer into the corresponding route. If successful, then go to Step 7. If insertion leads to a time or capacity violation, start a new route between the depot and the customer and go to Step 4.
Step 7. If every customer is routed, then stop, otherwise update the costs and go to Step 5.
Step 8. Repeat Steps 1 to 7 a predetermined number, M, of times and save the best results. During these M iterations, run a post-processor every N (N < M) times and keep the best results.
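The generic GRASP loop (Steps 1-3 of the first procedure above) is sketched below; the restricted-candidate-list rule with parameter `rcl_size` and the user-supplied `greedy_candidates`, `add_element` and `local_search` functions are illustrative assumptions.

```python
import random

def grasp(greedy_candidates, build_start, add_element, is_complete,
          local_search, cost, iterations=50, rcl_size=3):
    best = None
    for _ in range(iterations):                          # Step 2: multi-start loop
        partial = build_start()
        while not is_complete(partial):                  # Step 2.1: randomized greedy construction
            ranked = greedy_candidates(partial)          # candidates ordered by greedy value
            choice = random.choice(ranked[:rcl_size])    # restricted candidate list
            partial = add_element(partial, choice)
        solution = local_search(partial)                 # Step 2.2: improvement phase
        if best is None or cost(solution) < cost(best):  # Step 2.3: keep the incumbent
            best = solution
    return best                                          # Step 3
```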
6
Computational Results
In order to compare the various algorithms we report the best solution values obtained using each algorithm on the 14 benchmark problems described by Christofides et al. [17]. The algorithms for which we have comparable results are:
• A. The Clarke and Wright algorithm [17].
• B. The Mole and Jameson algorithm [17].
• C. The sweep algorithm [17].
• D. The Fisher and Jaikumar algorithm [45].
• TA. Willard's tabu search algorithm [26].
• TB. Osman's tabu search algorithm — best-admissible strategy [26].
• TC. Osman's tabu search algorithm — first-best-admissible strategy [26].
• TD. TABUROUTE [26].
• TE. Taillard's tabu search algorithm [26].
• TF. The Xu and Kelly tabu search algorithm [45].
• SAO. Osman's simulated annealing algorithm [45].
In Table 1 the numbers in each box give the total mileage; the best known solutions are indicated with an asterisk. The computational results suggest that the best known solutions for these benchmark problems are always found using Taillard's algorithm, followed closely by the Xu and Kelly algorithm and TABUROUTE. The classic heuristic algorithms give sufficiently good solutions, but none of them is competitive with the tabu search implementations. SA implementations produce good-quality results, although not as good as the results obtained by Taillard's algorithm. Concerning genetic algorithms, GIDEON minimizes the total distance and GENEROUS minimizes the total route time [26]. GENEROUS is computationally more expensive than GIDEON as it handles a population of solutions rather than a population of sector angles. For both algorithms the results are less favorable when compared to TS algorithms. Besides the 14 benchmark problems by Christofides et al. [17], a large number of problem instances are available in the TSPLIB library (www.elib.zib.de) and others may be found at the VRP site (www.geocities.com/ResearchTriangle/7279/vrp.html).
[Table 1. Best known solution values (total mileage) for the 14 benchmark problems of Christofides et al. [17] under algorithms A-D, TA-TF and SAO; best known solutions marked with an asterisk.]
References
[1] E. Aarts and J.K. Lenstra. Local Search in Combinatorial Optimization. Wiley and Sons, 1997. [2] N. Ansari and E. Hou. Computational intelligence for optimization. Kluwer Academic Publishers, first edition, 1997.
[3] P. Augerat, J.M. Belenguer, E. Benavent, A. Corberan, and D. Nannef. Separating capacity constraints in the cvrp using tabu search. European Journal of Operational Research, 106:546-557, 1998. [4] B.D. Baker, V. Furnon, P. Shaw, P. Kilby, and P. Prosser. Solving vehicle routing problems using constraint programming and metaheuristics. Journal of Heuristics, 6:501-523, 2000. [5] R. Ballou. Business Logistics Management, Planning, Organizing and Controlling the Suply Chain. Prentice-Hall International, Inc., fourth edition, 1999. [6] J. Bard, L.Huang, P. Jaillet, and M. Dror. A decomposition approach to the inventory routing problem with satelite facilities. Transportation science, 32(2):189-203, 1998. [7] J.F. Bard, L. Huang, M. Dror, and P. Jaillet. A branch and cut algorithm for the vrp with satelite facilities. HE Transactions, 30:821-834, 1998. [8] L.D. Bodin and B.L. Golden. Classification in vehicle routing and scheduling. In B. Golden and L. Bodin, editors, Proceedings of the International Workshop on Current and Future Directions in the Routing and Scheduling of Vehicles and Crews, pages 97-108. Wiley and Sons, 1979. [9] L.D. Bodin, B.L. Golden, A. A. Assad, and Michael O. Ball. Routing and scheduling of vehicles and crews. Computers and Operations Research, 10:63-211, 1983. [10] J. Braca, J. Bramel, B. Posner, and D. Simchi Levi. A computerized approach to the new york city school bus routing problem. Technical report, Columbia University, 1994. [11] A. Van Breedam. Improvement heuristics for the vehicle routing problem based on simulated annealing. European Journal of Operational Research, 86:480-490, 1995. [12] A. Van Breedam. Comparing descent heuristics and metaheuristics for the vehicle routing problem. Computers and Operations Research, 28:289-315, 2001.
232 [13] B. Bullnheimer, R. Hartl, and C. Strauss. An improved ant system algorithm for the vehicle routing problem, pages 1-11, 1997. preprint. [14] V. Campos and E. Mota. Heuristics procedures for the capacitaded vehicle routing problem. Computational Optimization and Applications, 16:265-277, 2000. [15] Bo Sodererg Carsten Peterson. Artificial neural networks. In E. Aarts and J.K. Lenstra, editors, Local Search in Combinatorial Optimization, pages 173-214. Wiley and Sons, 1997. [16] N. Christofides. Vehicle routing. In E.L. Lawer, J.K. Lenstra, A.H.G. Rinnoy Kan, and D.B. Shmoys, editors, The Travelling Salesman Problem: A Guided Tour of Combinatorial Optimization, pages 431-448. Wiley and Sons, 1985. [17] N. Christofides, A. Mignozzi, and P. Toth. The vehicle routing problem. In N. Christofides, editor, Combinatorial Optimization, pages 315-338. Wiley and Sons, 1979. [18] M. Dorigo, V. Maniezzo, and A. Colorni. The ant system: Optimization by a colony of cooperating agents. IEEE Transactions on Systems, 26:1-13, 1996. [19] Peter J.M. van Laarhoven Emile H.L. Aarts, Jan H. M. Korst. Simulated annealing. In E. Aarts and J.K. Lenstra, editors, Local Search in Combinatorial Optimization, pages 91-120. Wiley and Sons, 1997. [20] T.A. Feo and M.G.C. Resende. Greedy randomized adaptive search procedure. Journal of Global Optimization, 6:109-133, 1995. [21] P. Festa and M.G.C. Resende. Grasp: An annotated bibliography. Look again, 2000. [22] M. Fisher. The langrangean relaxation method for solving integer programming problems. Management Science, 27(1):1-18, 1981. [23] M.L. Fisher and R. Jaikumar. A generalized assignment heuristic for vehicle routing. In B. Golden and L. Bodin, editors, Proceedings of the International Workshop on Current and Future Directions in the Routing and Scheduling of Vehicles and Crews, pages 109-124. Wiley and Sons, 1979. [24] D. Ozgur G. Barbarosoglu. A tabu search algorithm for the vehicle routing problem. Computers and Operations Research, 26:255-270, 1999. [25] R. Ganeshan. An introduction to supply chain management, pages 1-7, 2001.
233 [26] M. Gendrau, G. Laporte, and J.Y. Potvin. Vehicle routing: Modern heuristics. In E. Aarts and J.K. Lenstra, editors, Local Search in Combinatorial Optimization, pages 311-336. Wiley and Sons, 1997. [27] M. Gendreau, G. Laporte, C. Musaraganyi, and E.D. Taillard. A tabu search heuristic for the heterogeneous fleet vehicle routing problem. Computers and Operations Research, 26:1153-1173, 1999. [28] J.H. Gerdessen. Vehicle routing problem with trailers. European Journal of Operational Research, 93:135-147, 1996. [29] F. Glover. Tabu search: A tutorial, pages 1-47, 1989. [30] B.L. Golden. Introduction to and recent advances in vehicle routing methods. In M. Florian, editor, Transportation Planning Models, pages 383-418. Elsevier Science Publishers, 1991. [31] B.L. Golden, G. Laporte, and E.D. Taillard. An adaptive memory heuristic for a class of vehicle routing problems with minmax objective. Computers and Operations Research, 24:445-452, 1997. [32] B.L. Golden and W. Stewart. Empirical analysis of heuristic. In E.L. Lawer, J.K. Lenstra, A.H.G. Rinnoy Kan, and D.B. Shmoys, editors, The Travelling Salesman Problem: A Guided Tour of Combinatorial Optimization, pages 207250. Wiley and Sons, 1985. [33] M. Hachicha, M.J. Hodgson, G. Laporte, and Frederic Semet. Heuristics for the multi-vehicle covering tour problem. Computers and Operations research, 27:29-42, 2000. [34] M. Haughton and A. Stenger. Semi-variable delivery routes and the efficiency of outbound logistics. International Journal of Physical Distribution and Logistics Management, 27:459-474, 1997. [35] K. Helsgaum. An effective implementation of the lin-kernigham travelling salesman heuristic. European Journal of Operational Research, 126:106-130, 2000. [36] Alan Hertz, Eric Taillard, and Dominique de Werra. Tabu search. In E. Aarts and J.K. Lenstra, editors, Local Search in Combinatorial Optimization, pages 121-136. Wiley and Sons, 1997. [37] K. Holmqvist, A. Migdalas, and P.M. Pardalos. Parallel continuous non-convex optimization. In A. Migdalas, P.M. Pardalos, and S. Stor0y, editors, Parallel Computing in Optimization, pages 471-528. Kluwer Academic Publishers, 1997.
234 [38] K. Holmqvist, A. Migdalas, and P.M. Pardalos. Parallelized heuristics for combinatorial search. In A. Migdalas, P.M. Pardalos, and S. Stor0y, editors, Parallel Computing in Optimization, pages 269-294. Kluwer Academic Publishers, 1997. [39] D. Johnson and C. Papadimitriou. Performance guarantees of heuristic. In E.L. Lawer, J.K. Lenstra, A.H.G. Rinnoy Kan, and D.B. Shmoys, editors, The Travelling Salesman Problem: A Guided Tour of Combinatorial Optimization, pages 145-180. Wiley and Sons, 1985. [40] D.S. Johnson and L.A. McGeoch. The travelling salesman problem: A case study. In E. Aarts and Lenstra J.K, editors, Local Search in Combinatorial Optimization, pages 215-310. Wiley and Sons, 1997. [41] H. Kokubugata, H. Itoyama, and H. Kawashima. Vehicle routing methods for city logistics operations. In IFAC Transportation Systems, 1997. [42] G. Kontoravdis and J.F. Bard. A grasp for the vehicle routing problem with time windows. ORSA Journal on Computing, 7(l):10-23, 1995. [43] G. Laporte. The travelling salesman problem: An overview of exact and approximate algorithms. European Journal of Operational Research, 59:231-247, 1992. [44] G. Laporte. The vehicle routing problem: An overview of exact and approximate algorithms. European Journal of Operational Research, 59:345-358, 1992. [45] G. Laporte, M. Gendreau, J.Y. Potvin, and F. Semet. Classical and modern heuristics for the vehicle routing problem. International Transactions in Operational Research, 7:285-300, 2000. [46] E.L. Lawer, J.K. Lenstra, A.H.G. Rinnoy Kan, and D.B.-Shmoys. The Travelling Salesman Problem: A Guided Tour of Combinatorial Optimization. Wiley and Sons, 1985. [47] S. Lin. Computer solutions of the travelling salesman problem. Bell Technical Journal, 44:2245-2269, 1965.
[48] S. Lin and B.W. Kernigham. An effective implementation for the traveling salesman problem. Operations Research, 21:498-516, 1973. [49] A. Modares, S. Somhom, and T. Enwaka. A self - organizing neural network approach for multiple travelling salesman and vehicle routing problems. International Transactions in Operational Research, 6:591-606, 1999. [50] G. Mosheiov. Vehicle routing with pickup and delivery: Tour - partitioning heuristics. Computers and Industrial Engineering, 34:669-684, 1998.
235 [51] Heinz Muhlenbein. Genetic algorithms. In E. Aarts and J.K. Lenstra, editors, Local Search in Combinatorial Optimization, pages 137-172. Wiley and Sons, 1997. [52] W.P. Nanry and J.W. Barnes. Solving the pickup and delivery problem with time windows using reactive tabu search. Transportation Research Part B, 34:107-121, 2000. [53] David Neto. Efficient cluster compensation for Lin - Kernigham heuristics. PhD thesis, Computer science university of Toronto, 1999. [54] G. Reinelt. The Travelling Salesman Problem, Computational solutions for TSP Applications. Springer-Verlag, 1994. [55] J. Renaud, G. Laporte, and F. Boctor. A tabu search heuristic for the multi depot vehicle routing problem. Computers and Operations Research, 23:229-235, 1996. [56] M.G.C. Resende. Greedy randomized adaptive search procedure. Technical report, 1998. [57] Y. Rochat and I. Taillard. Probabilistic diversification and intensification in local search for vehicle routing problem. Journal of Heuristics, 1:147-167, 1995. [58] P. Rodriguez, M. Nusbaum, R. Baeza, G. Leon, M. Sepulveda, and A. Cobian. Using global search heuristics for the capacity vehicle routing problem. Computers and Operations Research, 25(5):407-417, 1998. [59] D.J. Rosenkratz, R.E. Stearns, and P.M. Lewis. An analysis of several heuristics for the travelling salesman problem. SIAM Journal on Computing, 6:563-581, 1977. [60] K.S. Ruland and E.Y. Rodin. The pickup and delivery problem: Faces and branch-and-cut algorithm. Computers Mathematical Applications, 33(12):1-13, 1997. [61] S. Thangiah. Vehicle routing with time windows using genetic algorithms. Technical Report, pages 1-23, 1993. [62] S.R. Thangiah, I. Osman, T. Sung, and R. Vinayagamoorthy. Algorithms for the vehicle routing problems with deadlines. American Journal of Mathematical and Management Science, 13:323-355, 1995. [63] S.R. Thangiah, J.Y. Potvin, and T. Sung. Heuristics approaches to vehicle routing with backhauls and time windows. Computers and Operations Research, 23:1043-1057, 1996.
236 [64] P. Tian, J. Ma, and D.M. Zhang. Application of the simulated annealing algorithm to the combinatorial optimisation problem with permutation property: An investigation of generation mechanism. European Journal of Operational Research, 118:81-94, 1999. [65] P. Toth and D. Vigo. An exact algorithm for the vehicle routing problem with backhauls. Transportation Science, 31(4):372-385, 1997. [66] D.V. Tung and A. Pinnoi. Vehicle routing-scheduling for waste collection in hanoi. European Journal of Operational Research, 125:449-468, 2000. [67] C. Voudouris and E. Tsang. Guided local search and its application to the travelling salesman problem. European Journal of Operational Research, 113:469-499, 1999.
Combinatorial and Global Optimization, pp. 237-249 P.M. Pardalos, A. Migdalas and R. Burkard, Editors © 2002 World Scientific Publishing Co.
A New Finite Cone Covering Algorithm for Concave Minimization
Christophe Meyer ([email protected]), École Polytechnique de Montréal, Département de Mathématiques et de Génie Industriel, C.P. 6079, succ. Centre-ville, Montréal (Québec), Canada H3C 3A7
Brigitte Jaumard (brigitt@crt.umontreal.ca), GERAD and École Polytechnique de Montréal, Département de Mathématiques et de Génie Industriel, C.P. 6079, succ. Centre-ville, Montréal (Québec), Canada H3C 3A7
Abstract We propose a new finite cone covering algorithm for concave minimization over a polytope, in which the cones are defined by extreme points of the polytope. The main novelties are the use of cones defined by an arbitrary number of edges, and the subdivision process. This latter is shown to have a "descent property", i.e., all subcones are strictly better in some sense than the subdivided cone, which eliminates the possibility of cycling. The main task in the subdivision process consists in expressing a given point of the polytope as a convex combination of extreme points of a face of the polytope. Keywords: concave minimization, cone covering, finite convergence
1
Introduction
We consider the following concave minimization problem

(CP)    min { f(x) : x ∈ P },
where f is a concave function defined on R^n and P is a full-dimensional polytope of R^n. The first conical algorithm for concave minimization was proposed by Tuy [17] in 1964. The main idea was to cover the polytope by polyhedral cones, each of them having exactly n edges corresponding to extreme points of the polytope. If it can be shown that a cone cannot contain a better solution than the one at hand, the cone is fathomed; otherwise it is replaced by a set of subcones that cover it. Unfortunately, Zwart [22] showed that this algorithm is not finitely convergent, as was first expected, by exhibiting a small example on which the algorithm cycles. In order to avoid the possibility of cycling, Bali [1] and Zwart [23] proposed a small modification which has the effect of transforming Tuy's algorithm into a cone partitioning one. During the next two decades, numerous cone partitioning algorithms were developed and shown convergent ([4, 5, 6, 8, 16, 18, 19, 20, 21] to cite only a few; for additional references, see the surveys of Horst and Tuy [7] and Benson [2]). Note however that up to now only infinite convergence could be shown when an exact solution is sought; the fact that the cones are no longer defined by extreme points of the original polytope complicates significantly the search for a finitely convergent cone partitioning algorithm. In contrast, the only paper on cone covering is due to Gallo [3], who proposed a modification of Tuy's 1964 algorithm that results in a search tree which is a subtree of Tuy's; however, no convergence proof was given. It is only recently that Meyer [12] proved the infinite convergence of Gallo's [3] and Tuy's [17] algorithms. Only infinite convergence can be shown because cycling can occur in both algorithms. Two methods were proposed to transform these algorithms into finite ones. These methods are passive with respect to cycling: the first one does not attempt to identify whether cycling occurs but still has a stopping criterion that guarantees an optimal solution, after however a very large number of iterations. The second ensures that a cone is not generated twice by keeping a list of all generated cones. In this paper, we propose a finite cone covering algorithm which prevents cycling. The key point is a subdivision process: it was initially proposed in [11] and used to develop a cone covering algorithm that can be shown to be finitely convergent under the (rather unrealistic) assumption that the best simplicial lower bound is computed exactly for each cone (the notion of best simplicial lower bound was introduced in [4]; the computation of the best simplicial lower bound amounts to solving a convex program as shown in [11]). The nonexistence of cycles is proved by defining a function whose value depends only on the cone and on the current incumbent value, and that is shown to decrease strictly when going from one cone to another. This function is essentially equal to the optimal
239 value of the linear program that is solved in cone partitioning algorithms in order to check if the cone can be fathomed. The paper is organized as follows. In Section 2, the basic operations of the cone covering algorithm are recalled, i.e., the construction of an initial cover, the fathoming test, and the subdivision procedure with its descent property. In Section 3, we give the algorithm and prove its finite convergence. Conclusions are drawn in the last section.
2
Basic operations
In this section, we recall the basic operations needed to define the algorithm, namely the construction of an initial conical cover (Section 2.1), the fathoming test (Section 2.2) and the subdivision procedure (Section 2.3). The descent property satisfied by the subdivision process is given in Section 2.3.3.
2.1
Initial cover
Assume that a nondegenerate vertex of P is available. By performing a change of variables if necessary, we may assume that this vertex is the origin O. Let K^0 be the polyhedral cone of origin O defined by the n vertices adjacent to O. This cone is used to define the initial cover of P. If no nondegenerate vertex is available, we compute an interior point of P and use it to decompose P into n + 1 subpolytopes; this point defines a nondegenerate vertex for each of the n + 1 subpolytopes (see, e.g., Meyer [12]). In this paper, we consider polyhedral cones of origin O that may have more than n edges. Moreover, each edge corresponds to an extreme point of P, i.e., the edge intersects the boundary of P at an extreme point. The set of cones of origin O whose edges correspond to extreme points of P will be denoted by 𝒦. If u^1, ..., u^p are the directions of a cone, this cone will be denoted cone{u^1, u^2, ..., u^p} (since the cone has vertex O, u^j can be viewed indifferently as a point or as a vector of R^n, for j = 1, ..., p). For a subset X of R^n, we denote by conv(X) the convex hull of X.
2.2
Fathoming of a cone
In order to define the fathoming procedure, we first recall the definition of γ-extensions. Basically, the γ-extension along a halfline of origin O (with γ ≤ f(O)) is the farthest point y on the halfline with value f(y) ≥ γ. More precisely, if u denotes the direction of the halfline, the γ-extension is the point y = θu with θ = max{θ | f(θu) ≥ γ, θu ∈ C}, where C is a large set containing the polytope P whose aim is to ensure that the γ-extensions are at finite distance. The notion of γ-extension was introduced in Tuy [17] (see also Horst and Tuy [7]). Now consider a cone K = cone{u^1, ..., u^p}, let γ = f̄ be the value of the best known solution, and let y^j, j = 1, ..., p be the γ-extensions along the directions u^1, ..., u^p respectively. Consider the following pair of primal-dual linear programs
PLP(K):   max  Σ_{j=1}^p λ_j
          s.t.  a^i ( Σ_{j=1}^p λ_j y^j ) ≤ b_i,   i = 1, ..., m,
                λ ≥ 0,

DLP(K):   min  Σ_{i=1}^m μ_i b_i
          s.t.  Σ_{i=1}^m μ_i a^i y^j ≥ 1,   j = 1, ..., p,
                μ ≥ 0,

where P = {x ∈ R^n : a^i x ≤ b_i, i = 1, ..., m}. It can be shown that problem PLP(K) has a finite optimal value, which, due to linear programming duality (see, e.g., Luenberger [10]), is also the optimal value of problem DLP(K). Let ρ denote this optimal value. Consider the optimal solutions of problem PLP(K). We denote by β the number of edges of K that are involved with a strictly positive coefficient in at least one optimal solution. Note that a convex combination of optimal solutions is still an optimal solution, therefore there exists an optimal solution (not necessarily basic) with exactly β components λ_j that are strictly positive. Moreover, for a given optimal solution λ̄, we define

ω = Σ_{j=1}^p λ̄_j y^j.    (1)

Now, let μ be an optimal solution of problem DLP(K). We define

a = Σ_{i=1}^m μ_i a^i    (2)

and let H be the hyperplane of equation a x = ρ. The following property holds for H.

Proposition 1  H ∩ P is a face of P.

Proof: In order to prove that H ∩ P is a face of P, we have to show that P is entirely contained in one of the halfspaces defined by H and that H ∩ P ≠ ∅. We first show that P ⊆ {x ∈ R^n : a x ≤ ρ}. Indeed, let x ∈ P: then a^i x ≤ b_i for i = 1, ..., m. Multiplying each inequality by μ_i and summing, we obtain a x = Σ_{i=1}^m μ_i a^i x ≤ Σ_{i=1}^m μ_i b_i = ρ. Now, by the complementary slackness conditions, we have μ_i ( a^i Σ_{j=1}^p λ̄_j y^j − b_i ) = 0 for i = 1, ..., m. After summing, we obtain a ω = ρ, which shows that ω ∈ H ∩ P.
•
The following result is well known when the cone has exactly n edges (see, e.g., Horst and Tuy [7]):

Proposition 2  If ρ ≤ 1, then min { f(x) : x ∈ K ∩ P } ≥ f̄.

Proof: This result is usually proved by reasoning on the primal problem (see, e.g., Horst and Tuy [7]). Here we give a proof using problem DLP(K). By Proposition 1, H supports P. Furthermore, H intersects the edges of K at the points z^j = (ρ / a y^j) y^j for j = 1, ..., p. Thus K ∩ P ⊆ S = conv({O, z^1, ..., z^p}). But since ρ ≤ 1 and a y^j ≥ 1 by feasibility of μ and definition of a, we have z^j ∈ [O y^j] for j = 1, ..., p. Hence, by concavity of f, f(z^j) ≥ min{ f(O), f(y^j) } ≥ f̄ for j = 1, ..., p. Using again the concavity of f, we deduce that min_{x ∈ S} f(x) = min{ f(O), f(z^1), ..., f(z^p) } ≥ f̄. Using the inclusion K ∩ P ⊆ S, we then obtain min_{x ∈ K ∩ P} f(x) ≥ f̄.
•
This result implies that if ρ ≤ 1, the portion of the polytope P contained in the cone K cannot contain a point that improves the best known solution, hence the cone can be fathomed.
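The fathoming test therefore reduces to one linear program per cone. The sketch below solves PLP(K) with scipy.optimize.linprog; the dense data format (A, b describing the polytope, the columns of Y holding the γ-extensions y^j) and the use of the HiGHS method are assumptions of the example, not part of the algorithm's description.

```python
import numpy as np
from scipy.optimize import linprog

def fathoming_test(A, b, Y):
    """PLP(K): max sum(lambda) s.t. A (Y lambda) <= b, lambda >= 0.

    A: (m, n) array and b: (m,) array describing P = {x : A x <= b}.
    Y: (n, p) array whose columns are the gamma-extensions y^j of the cone.
    Returns (rho, lam); the cone can be fathomed when rho <= 1.
    """
    p = Y.shape[1]
    res = linprog(c=-np.ones(p),          # linprog minimizes, so negate the objective
                  A_ub=A @ Y, b_ub=b,
                  bounds=[(0, None)] * p,
                  method="highs")
    rho = -res.fun
    return rho, res.x
```

Recent SciPy versions also expose the dual multipliers of the inequality constraints (e.g. via res.ineqlin.marginals), from which the vector a and the hyperplane H of Proposition 1 can be assembled.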
2.3
Subdivision of a cone
2.3.1
ω-subdivision
Let K = cone{u^1, ..., u^p} be a cone to be subdivided and let w = Σ_{j=1}^p λ_j y^j be a point of R^n distinct from O. Let J_> = {j | λ_j > 0}. Assume that |J_>| ≥ 1 (note that this assumption is satisfied if Σ_{j=1}^p λ_j > 0, as is the case if λ is an optimal solution of problem PLP(K)). For each j ∈ J_>, define K^j as the cone obtained from K by replacing the j-th edge of K by the halfline of origin O passing through w. We have the following result.
K C (J JP
Proof: Let i be a point of K.
There exists at least a vector v > 0 such that
p
UjU3. Now let £ such that — = min - ^ We have &J Xe
3=1
i > Xj
p
/
e
j=ij^
,
A/
p
\
i=ij&
/
\
)
A*
By definition of £, Vj•, — — > 0 for j = 1 , . . . ,p, j ^ £ and — > 0. Hence x belongs to Ke, which proves the inclusion K C Uj^j>K3. 3
•
By Proposition 3, the set of cones {K }jej> defines a cover of the cone K. This subdivision process was proposed by Tuy [17] for the case where w £ K (in which case we obtain a partition of K), and extended by Gallo [3] to the case where w can lie outside K. The cones K3 are said subcones of K. If A = A is an optimal solution of problem PLP(K), then w = ui and the subdivision process is referred to as an UJ-subdivision.
243 2.3.2
Extension of a cone p
Since w = \~] ^jV' may not be an extreme point of P, a subcone K' of a cone K 6 K, 3=1
via w-subdi vision is not in general a cone of /C. The purpose of this Section is to construct a cone K" € K, such that K' C K". In Section 2.3.3, we will show that the proposed construction satisfies a descent property, which will be used to show the convergence of the algorithm. Recall that H is the hyperplane constructed from an optimal solution of the dual DLP(K). Denote by z3\ j = 1 , . . . ,p the intersection of the edges of K with the hyperplane H. By renumbering the edges of K if necessary, assume that K' is the cone obtained from K by replacing the first edge by the halfliine passing through w, so that K' = cone{uj, z2,..., zp}. Let X = {x1,..., xr} be a set of extreme points of H D P such that Q 6 conv(X). Since w € H (1 P, such points always exist by Caratheodory Theorem (see, e.g., Rockafellar [14]) which in addition states that there exists a set X with cardinality r < dim(P n H) + 1 < n. Note that since H n P is a face of P, extreme points of HOP are actually extreme points of P. The cone K" is then simply defined as K" = cone{z2,...,
zp, x\ ...,
xr}.
It may happen that xe = z1' for some £ and C: in this case, we simply remove the redundant edge from K". The following proposition shows that if this situation occurs, we may sometimes improve the current best solution. Proposition 4 If p > 1 and if xl — ze for some £ and £' with (! such that A^ > 0,
then f{xl) < f. Proof: Since V > 0, the complementary slackness conditions imply z1 — pye, hence xe = z1' lies after ye' on the edge. Since xe is an extreme point of P, it follows that y1' £ P C C, hence f(xe) < f{ye') = f by definition of the 7-extension. • The following result is immediate. Proposition 5 The inclusion K' C K" holds. v Proof:
VjZj. On
Let x be a point of K'\ there exists v > 0 such that x = v\Z) + ^ 3=2 r
the other hand, since w 6 conv(X), there exists 77 > 0 satisfying ^.Ve i=\
=
1 s u c h that
244
w= y
rftx . By replacing UJ, we obtain x = y
e=i
Vji? + N viVi^i which shows that
3=2
i=\
x belongs to K".
•
A set X corresponding to Caratheodory Theorem can be constructed by the following procedure, which is based on one of the numerous proofs of this theorem (Scherk [15]): Step 1 (initialization) : Let / be the set of indices of the constraints of P satisfied at equality by u>. Define the polytope P(I) = {x e R" : a'x < bi(i <£ I)\alx = bi(i € I)}. Set I1 4- I, co1 4- Co and k 4- 1. Step 2 (extreme point) : find an extreme point xk of
P(Ik).
Step 3 (update of w) : compute the intersection point ojk+1 of the halfline [xkuik) with the boundary of P(Ik). If uik+l is an extreme point of P(Ik), stop: a) is a convex combination of the extreme points x1,..., xk, uik+1. Step 4 (update of / ) : let Ik+1 be the set of indices of the constraints of P satisfied at equality by u>k+1. Increment k and return to Step 2. If Step 2 is implemented by solving a linear program with an arbitrary objective function, the set X is obtained after solving at most n linear programs. These solutions can be more efficiently computed with the observation that the optimal solution at iteration k is dual-feasible for the linear program solved at step k + 1. Also note that the dimension of the polytope P(I) decreases by at least 1 at each iteration. 2.3.3
Descent property
Let K be a cone of /C, and K" another one obtained from K by ^-subdivision and extension. We show in this section that cr(K") -< a(K) where a is a function from K to I P x N that maps a cone K to the 2-dimensional vector (p(K), J3(K)) (the quantities p(K) and P(K) were defined in Section 2.2). The symbol -< is the less-than symbol for the lexicographic order. We first give a general result.
Proposition 6 Let K be a cone such that p(K) > 1. Let UJ = V ^ Xjy-' be an .7 = 1
solution of problem PLP(K). Let J> = {j : Xj > 0}. Let H = {x 6 K™ : ax = p] be the hyperplane associated to an optimal solution of problem DLP(K). Denote by Y the set of f-extensions of K, and by Y> = {\p 6 Y;j € J>} its restriction to J>. Similarly define K' and Y'. Assume that ay' > 1 for all y' G Y'. Then p(K')
<
p(K).
(3)
245 Furthermore, (i) if (3) is satisfied at equality, (ii) ay' > 1 for all y' e Y'\Y, Y> <£ Y', then P(K')
and (Hi)
< p(K).
(4)
Proof: Recall that a = V ^ pna1 where \x is an optimal solution of problem Since ay'j > 1 for j = 1 , . . . ,p, p is a feasible solution for DLP(K'), p{K).
DLP(K).
hence p(K') <
Now let us show the second part. Recall that fi(K') is the number of edges that can be part of an optimal solution of problem PLP(K') with a strictly positive coefficient. n
Let A' be such an optimal solution and consider &>' = 2\2 W^' ^
e
nave
®y'* — *
for j = l , . . . , p with strict inequality for the new edges by assumption (ii), i.e., those that are not already edges of cone K. Observe that A^ = 0 for all j such that ay'i > 1. Indeed, if not, we would have aCj' > p, which is impossible since UJ' e P and P C {x £ K n | a x < p). Hence u' is a positive combination of points of Y f~l Y', which means that A' is a feasible solution for PLP(K). Since p(K') = p(K), it is actually an optimal solution, hence /3(K') C /3(K). This inclusion is strict since by assumption (iii) at least one edge that is involved with a strictly positive component in an optimal solution of PLP(K) is absent from the decomposition of ui'. • From this result, we conclude that Corollary 1 If p > 1, then either we find a point x1' of PC\H satisfying f(xe') ora(K")
< f,
The cone K" differs from K by:
• the removing of an edge [Oyi) corresponding to j e J>, that counts for 1 in P{K). • the addition of edges corresponding to extreme points xl of P(~)H. Assume that none of these points improves the best known solution. Then their /-extension satisfies ay1 > 1. These 7-extensions correspond to the set Y'\Y in Proposition 6. Clearly for the edges common to K and K", we have ayi > 1 by definition of fi. Hence p(K") < p(K) by Proposition 6.
246 If p(K") < p{K), we are done. Hence assume that p{K") = p{K). Since we already know that ay' > 1 for y' e Y'\Y , it suffices to show (iii) Y> (/L V. This is true if the best known solution is not improved since an element of Y> was removed and could not be added again in the extension process by Proposition 4. • Note that Proposition 6 can also be applied to the cone K', obtained from K by w-subdivision. We then conclude that the cone partitioning algorithm (the modified version of Tuy's 1964 [17] algorithm by Bali[l] and Zwart [23]; see also Jaumard and Meyer [8] [9]) also shows the descent property. However since no upper bound on the number of possible cones is available, we cannot use the same argument to show the finite convergence of the cone partitioning algorithm.
3
Algorithm
The proposed algorithm is a two-phase algorithm: the first phase consists in a local search while the second phase aims to prove that the current best point is optimal or to find a better point. Phase 1 (local search) : starting from a point z of P, find an extreme point x of P satisfying f{x) < f(z). Let / = f(x). Go to Phase 2. P h a s e 2 (transcending the incumbent) : Step 1 (initialization): construct an initial conical cover C of P as indicated in Section 2.1. For each cone of K of C, solve the linear problem PLP(K), obtaining the optimal value p(K), the point Co{K) and the hyperplane H(K). Let C be the set of cones of C for which p(K) > 1. Step 2 (optimality test and selection): if £ = 0, stop: x is an optimal solution of problem (CP) with value / . Otherwise select K* € argmax{/?(i;sr)|.ft' G £ } . In case of equality, select the cone that maximizes P{K). Subsequent ties can be broken by defining an order on the extreme points defining a cone and by selecting the smallest with respect to this order. Remove K* from C. Step 3 (subdivision): w-subdivide the cone K* via the point Ci{K*) as indicated in Section 2.3.1 and extend the subcones using a set X of extreme points of the face P n H(K*) as explained in Section 2.3.2. Let C be the set of subcones. Step 4 (update of the incumbent): if for some extreme point x of X, f(x) < / , go to Phase 1 with z = x.
247 Step 5 (fathoming): for all cone K in C, construct the linear program PLP(K). Let p(K) be its optimal value, and ui(K) and H{K) be respectively the point and hyperplane associated with the primal and dual optimal solution. Add to C all cones K of C for which p{K) > 1 and return to Step 2.
Theorem 1 After a finite number of iterations, the algorithm terminates optimal solution x of problem (CP).
with an
Proof: Since the number of extreme points of a polytope is finite, and since at each occurrence of Phase 1 we have an extreme point with strictly smaller value, the number of occurrences of Phase 1 is finite. Hence we have only to show that Phase 2 is finite. Let Kh be the cone selected at iteration h of Phase 2, ph = p{Kh) and $h = p{Kh). By definition of the selection rule (Step 2) and by the descent property (Corollary 1), the function h >-> (ph, f5h, Kh) is decreasing when considering the lexicographic order (by Kh, we mean for example the vector obtained by concatenating the vectors corresponding to the extreme points defining the cone). But since the cones are defined by extreme points of P and since the incumbent is updated with extreme points of P (see Step 4), there are a finite number of distinct pairs (K, f), and hence a finite number of values (p,J3,K). It follows that the algorithm terminates after a finite number of iterations. •
4
Conclusions
In this paper we have presented a new finite cone covering algorithm for concave minimization. As for previous cone covering algorithms, the cones are defined by extreme points of the polytope, which is a desired property since the optimal solution belongs to the set of extreme points. But contrary to these algorithms, cycling does not occur due to the descent property of the subdivision process. In particular, it is not necessary to maintain a list of all generated cones.
References [1] S. Bali (1973), "Minimization of a Concave Function on a Bounded Convex", PhD thesis, University of California at Los Angeles.
248 [2] H. P. Benson (1996), "Concave minimization: Theory, applications and algorithms", In Reiner Horst and Panos M. Pardalos, editors, Handbook of Global Optimization, Kluwer Academic Publishers. [3] G. Gallo (1975), "On Hoang Tui's concave programming algorithm", Nota scientifica S-76-1, Instituto di Scienze dell'Informazione, University of Pisa, Italy. [4] P. Hansen, B. Jaumard, C. Meyer, and H. Tuy (1996), "Best simplicial and double-simplicial bounds for concave minimization", Les Cahiers du GERAD G-96-17, GERAD, Montreal, Canada. Submitted for publication. [5] R. Horst and N. V. Thoai (1989), "Modification, implementation and comparison of three algorithms for globally solving linearly constrained concave minimization problems", Computing, 42, 271-289. [6] R. Horst, N. V. Thoai, and H. P. Benson (1991), "Concave minimization via conical partitions and polyhedral outer approximation", Mathematical Programming, 50, 259-274. [7] R. Horst and H. Tuy (1996), Global Optimization (Deterministic Springer-Verlag, Berlin, third, revised and enlarged edition.
Approaches),
[8] B. Jaumard and C. Meyer (1996), "On the convergence of cone splitting algorithms with w-subdivisions", Les Cahiers du GERAD G-96-36, GERAD. Submitted for publication. [9] B. Jaumard and C. Meyer (1998), "A Simplified Convergence Proof for the Cone Partitioning Algorithm", Les Cahiers du GERAD G-98-07, GERAD. Submitted for publication. [10] D. G. Luenberger (1973), Linear and Nonlinear Programming, Addison-Wesley Publishing Company, second edition. [11] C. Meyer (1996), Algorithmes coniques pour la minimisation quasiconcave, PhD thesis, Ecole Polytechnique de Montreal. [12] C. Meyer (1997), "On Tuy's 1964 cone splitting algorithm for concave minimization", Les Cahiers du GERAD G-97-48, GERAD, To appear in A. Migdalas et al., editors, From Local to Global Optimization, proceedings of the workshop in honor of Professor Tuy's 70th birthday (Linkoping), Kluwer Academic Publishers. [13] M. Nast (1996), "Subdivision of simplices relative to a cutting plane and finite concave minimization", Journal of Global Optimization, 9, 65-93. [14] R. T. Rockafellar (1970), Convex Analysis, Princeton University Press, Princeton, New Jersey.
249 [15] P. Scherk (1966), "On Caratheodory's theorem", Canadian Mathematical letin, 9(4), 463-465.
Bul-
[16] N. V. Thoai and Hoang Tuy (1980), "Convergent algorithms for minimizing a concave function", Mathematics of Operations Research, 5, 556-566. [17] H. Tuy (1964), "Concave programming under linear constraints", Soviet Mathematics, 5, 1437-1440. [18] H. Tuy (1991), "Effect of the subdivision strategy on convergence and efficiency of some global optimization algorithms", Journal of Global Optimization, 1, 2 3 36. [19] H. Tuy (1991), "Normal conical algorithm for concave minimization over polytopes", Mathematical Programming, 51, 229-245. [20] H. Tuy, V. Khatchaturov, and S. Utkin (1987), "A class of exhaustive cone splitting procedures in conical algorithms for concave minimization", Optimization, 18(6), 791-807. [21] H. Tuy, T. V. Thieu, and Ng. Q. Thai (1985), "A conical algorithm for globally minimizing a concave function over a closed convex set", Mathematics of Operations Research, 10, 498-514. [22] P. B. Zwart (1973), "Nonlinear programming: Counterexamples to two global optimization algorithms", Operations Research, 21, 1260-1266. [23] P. B. Zwart (1974), "Global maximization of a convex function with linear inequality constraints", Operations Research, 22, 602-609.
Combinatorial and Global Optimization, pp. 251-263 P.M. Pardalos, A. Migdalas and R. Burkard, Editors © 2002 World Scientific Publishing Co.
A diagonal global optimization method

Anna Molinaro (anna@unical.it)
Dip. Elettronica, Informatica e Sistemistica, Università della Calabria, 87030 Rende (CS), Italy

Clara Pizzuti (pizzuti@si.deis.unical.it)
Istituto per la Sistemistica e l'Informatica - C.N.R., c/o D.E.I.S., Università della Calabria, 87030 Rende (CS), Italy

Yaroslav D. Sergeyev (1) (yaro@si.deis.unical.it)
Istituto per la Sistemistica e l'Informatica - C.N.R., c/o D.E.I.S., Università della Calabria, 87030 Rende (CS), Italy, and University of Nizhni Novgorod, Gagarin Av. 23, Nizhni Novgorod, Russia
Abstract

In this paper we consider the classical global optimization problem of searching the global minimum of a multiextremal multidimensional Lipschitz function over a hyperinterval. We present a new multidimensional diagonal algorithm belonging to the class of information global optimization methods. It is supposed that the Lipschitz constant is unknown and only the values of the objective function can be evaluated. The new method uses local tuning on the behaviour of the objective function to accelerate the search, adaptively estimating local Lipschitz constants during minimization. Sufficient convergence conditions are established for the proposed technique. Numerical examples are also reported.

Keywords: Global optimization, diagonal approach, information methods, acceleration, convergence.
(1) Corresponding author.
1  Introduction
In this paper we consider the following problem:

min f(x),   x ∈ D,    (1)

where f(x) is a multidimensional multiextremal function satisfying the Lipschitz condition with a constant 0 < L < ∞ over a hyperinterval D ⊂ R^n,

D = [a, b] = {x : a(i) ≤ x(i) ≤ b(i), 1 ≤ i ≤ n}.    (2)
Many numerical algorithms have been proposed to solve this problem (see e.g. [4, 5, 6, 8, 14, 23]). One of the possible ways to tackle the problem (1), (2) consists in generalizing fast univariate methods to the multidimensional case. Information algorithms proposed in [19, 20] have shown a good performance in comparison with the other methods using only the values of the objective function in their work (see [19, 2, 16]). These algorithms are derived as optimal statistical decision functions within the framework of a stochastic model representing the objective function as a sample of some random function. The diagonal approach has been introduced in [10, 11, 13] to extend a class of one-dimensional methods to the multidimensional case. The diagonally extended information algorithm presented in [11, 13] generalizes the information method from [19, 20]. This method has been widely used in such applications as nonlinear approximation, data classification, globally optimized calibration of complex system models, etc. (numerous examples of real-life problems solved by the diagonal global optimization algorithms are presented in [13]). Since the diagonal approach is actively used in applications where the evaluation of every value of f(x) usually takes a lot of time, the problem of its acceleration arises. Using local information during the global search is one of the ways to speed up the search. It has been shown (see [15, 16]) for the one-dimensional information global optimization algorithms that adaptively estimating the local Lipschitz constants instead of the global one can accelerate the search significantly. In this paper new sufficient global convergence conditions are established for the diagonally extended information algorithm from [11, 13]. This result makes it possible to accelerate the method by introducing a new parameter in its scheme. Then, a new diagonal information algorithm using local information about the behaviour of the objective function over small subintervals of the search region [a, b] is proposed. Theoretical results and numerical experiments show that the introduced acceleration tools permit a significant speed-up in comparison with the original diagonal information method.
2  Diagonal information global optimization algorithm and its new convergence conditions
Let us describe the diagonally extended information algorithm (DEIA) from [11, 13] generalizing the univariate information method from [19, 20] to the multidimensional case. Given a point x ∈ D, let l be the current iteration number, m = m(l) the current number of generated subintervals, and ε > 0 a given accuracy of the search. DEIA sequentially subdivides the domain D into adaptively generated n-dimensional subintervals D_i = [a_i, b_i], 1 ≤ i ≤ m, having vertices a_i, b_i. The objective function f(x) is evaluated only at the vertices a_i, b_i. At each step the 'characteristic value' is evaluated for every subinterval D_i. A new point is generated at the subinterval having the highest characteristic value by means of a point selection function S_t.

Step 0. (Initialization) Set l = 1; m = 1; x^0 = a; x^1 = b; z_0 = f(x^0), z_1 = f(x^1). The initial estimate of the global optimum is taken as z*_1 = min{z_0, z_1}.

Suppose now that l ≥ 1 iterations of the method have already been executed. The iteration l + 1 consists of the following steps.

Step 1. (Calculating characteristics) For each hyperinterval D_i = [a_i, b_i], 1 ≤ i ≤ m, calculate its characteristic
R_i = K ||a_i − b_i|| + (f(b_i) − f(a_i))² / (K ||a_i − b_i||) − 2(f(a_i) + f(b_i)),    (3)

where the value K estimates the global Lipschitz constant L of the objective function and is determined as follows

K = K(l) = (4 + C) max_{1≤i≤m} |f(a_i) − f(b_i)| / ||a_i − b_i||.    (4)
The constant C > 0 is the reliability parameter of the method.

Step 2. (Interval selection) Choose a hyperinterval D_t such that

R_t = max_{1≤i≤m} R_i.    (5)
Step 3. (Stopping rule) If ||D_t|| > ε, where ||·|| is the Euclidean norm, then go to Step 4; otherwise take the value z*_l as an estimate of the global optimum of the problem (1), (2) and Stop.

Step 4. (Point selection) By using the function S_t choose a new point
x^{l+1} = S_t = (a_t + b_t)/2 − [f(b_t) − f(a_t)] (b_t − a_t) / (2K ||b_t − a_t||),    (6)

belonging to the main diagonal (the diagonal joining the vertices a_t and b_t) of the subinterval D_t, where t is from (5).

Step 5. (Partitioning) Subdivide the hyperinterval D_t into 2^n new subintervals generated by the intersection of the boundary of D_t and the hyperplanes that contain x^{l+1} and are parallel to the boundary hypersurfaces of D_t.

Step 6. (Executing new trials) Denote by x_i, i = 1, ..., s, the vertices of the new subintervals generated during Step 5 where f(x) must be evaluated. The number s = 2 × 2^n − 3 because the new 2^n subintervals are identified by their two vertices, but the points a_t and b_t come from the subdivided hyperinterval D_t (f(x) has already been evaluated at its vertices during the previous iterations) and x^{l+1} is common to two hyperintervals. Evaluate f(x) at the points x_i, i = 1, ..., s. Take the value z*_{l+1} = min{z*_l, z_1, ..., z_s} as an estimate of the global optimum of the problem (1), (2) after executing the (l + 1)-th iteration. Set l = l + 1, m = m + 2^n − 1 and go to Step 1.

Let Y' be the set of limit points (points of accumulation) of the infinite (ε = 0 in the stopping rule) sequence {y^k} generated by DEIA during minimization of the function f(x) from (1), (2) and let X* be the set of global minimizers of f(x) over D. It has been proved in [11] (see also [12, 13]) that if, starting from an iteration number l*, for K = K(l) from (4) the inequality

K(l) ≥ 4L,   l ≥ l*,    (7)
holds, then the set Y' of the limit points coincides with the set X* of the global minimizers. The following theorem presents new convergence conditions for DEIA.

Theorem 1 Let there exist an iteration number l* such that for a hyperinterval D_j, j = j(l), containing a global minimizer x* of f(x) during the l-th iteration of DEIA the following inequality takes place

K(l) ≥ 2H_j + √(4H_j² − λ_j²),   l ≥ l*,    (8)
where K(l) is from (4),

λ_j = |f(b_j) − f(a_j)| / ||b_j − a_j||,    (9)

H_j = max{ (f(a_j) − f(x*)) / ||x* − a_j||, (f(b_j) − f(x*)) / ||b_j − x*|| }.    (10)
Then, x* is a limit point of the trial sequence {y^k} generated by DEIA.
Proof. Suppose that there exists a limit point y' ≠ x* of the trial sequence {y^k}. It follows from (3) that for a hyperinterval D_i, i = i(l), containing y' during the l-th iteration of DEIA, the following result takes place

lim_{l→∞} R_i(l) = −4f(y').    (11)

Consider now the hyperinterval D_j, j = j(l), containing the global minimizer x* and suppose that x* is not a limit point of {y^k}. This signifies that there exists an iteration number m such that for all l > m

x^{l+1} ∉ D_j,  j = j(l).

Estimate now the characteristic R_j(l), l > m, of the hyperinterval D_j. It follows from (10) and the fact that x* ∈ D_j that

f(a_j) − f(x*) ≤ H_j ||a_j − x*||,   f(b_j) − f(x*) ≤ H_j ||b_j − x*||.

Then, summarizing these inequalities we obtain

f(a_j) + f(b_j) ≤ 2f(x*) + H_j (||a_j − x*|| + ||b_j − x*||) ≤ 2(f(x*) + H_j ||a_j − b_j||).

From this inequality and (8) we can deduce for all iteration numbers l ≥ l* that

R_j(l) = ||b_j − a_j|| (K(l) + λ_j² K(l)^{-1}) − 2(f(a_j) + f(b_j)) ≥
||b_j − a_j|| (K(l) + λ_j² K(l)^{-1} − 4H_j) − 4f(x*) ≥ −4f(x*).    (12)
Since x* is a global minimizer, it follows from (11) and (12) that an iteration number m* > max{l*, m} will exist such that R_j(m*) > R_i(m*). But this means that during the m*-th iteration trials will fall in the hyperinterval D_j. Thus, our assumption that x* is not a limit point of {y^k} is not true and the theorem has been proved. •

The obtained result is quite different in comparison with the corresponding conditions proved in [11, 13] for DEIA. First, it has been demonstrated that to have convergence to a global minimizer x* it is not necessary to estimate correctly the global Lipschitz constant over the whole region D (this value may be underestimated). It is enough that the value K(l) satisfies condition (8) on a subinterval containing the point x* during the l-th iteration. Second, the new condition (8) is weaker than (7).
Since the number 4 in (4) is the direct consequence of condition (7), these considerations suggest modifying DEIA by introducing (cf. (4))

K = K(l) = (r + C) max_{1≤i≤m} |f(a_i) − f(b_i)| / ||a_i − b_i||.    (13)

The number 4 is substituted here by the reliability parameter r > 1. Increasing r we increase the reliability of the method but slow down the search. Thus, for problems where the choice of the parameter 1 < r < 4 is enough to satisfy (8), the global convergence will be maintained and an acceleration of the search will be obtained. These theoretical results are confirmed by numerical experiments presented in Section 4.
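To make the scheme above concrete, the following Python sketch performs one iteration of the characteristic computation (3), the interval selection (5) and the point selection (6), with the global estimate computed with the reliability parameter r as in (13). The data layout (a list of dictionaries storing the diagonal vertices and the stored function values) and the helper names are assumptions of this sketch, not part of the original algorithm.

import math

def deia_iteration(intervals, r=1.45, C=10.0, eps=1e-2):
    """One DEIA-style iteration on stored subintervals.

    Each subinterval is a dict with vertices 'a', 'b' (coordinate lists) and
    the stored values 'fa' = f(a), 'fb' = f(b).  Returns the selected
    subinterval and the new diagonal point, or None when the stopping rule
    ||b - a|| <= eps is satisfied.
    """
    def dist(u, v):
        return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))

    # Estimate of the global Lipschitz constant, cf. (4) with 4 replaced by r as in (13).
    slopes = [abs(d['fa'] - d['fb']) / dist(d['a'], d['b']) for d in intervals]
    K = max((r + C) * max(slopes), 1e-6)   # small floor for the degenerate constant case

    # Characteristic (3) of every subinterval and interval selection (5).
    def R(d):
        h = dist(d['a'], d['b'])
        return K * h + (d['fb'] - d['fa']) ** 2 / (K * h) - 2.0 * (d['fa'] + d['fb'])

    t = max(intervals, key=R)
    h = dist(t['a'], t['b'])
    if h <= eps:                            # stopping rule of Step 3
        return None

    # Point selection (6): a point on the main diagonal of the chosen subinterval.
    shift = (t['fb'] - t['fa']) / (2.0 * K)
    x_new = [0.5 * (ai + bi) - shift * (bi - ai) / h
             for ai, bi in zip(t['a'], t['b'])]
    return t, x_new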
3  A new diagonal information algorithm
The value K(l) estimates the global Lipschitz constant L of the objective function. This estimate is used in the whole region D. Thus, in a subinterval D_i where the local Lipschitz constant L_i is significantly less than L this method uses the same global estimate K. It has been shown for the one-dimensional information global optimization algorithms (see [15, 16]) that using estimates of the local Lipschitz constants can accelerate the global search significantly. The same effect takes place for the information methods using the Peano type space-filling curves (see [16]). The idea of the new method presented in this Section is to tune the diagonal global optimization algorithm on the local behaviour of f(x) in every hyperinterval D_i by estimating the corresponding local Lipschitz constants L_i. Since the new algorithm belongs to the class of diagonal information global optimization methods, it is enough for its description to introduce the characteristic and point selection functions R_i and S_t to be used during Step 1 and Step 4. Let us introduce these Steps of the New Algorithm (NA).

Step 1. (Calculating characteristics) For each hyperinterval D_i = [a_i, b_i], 1 ≤ i ≤ m, calculate its characteristic
R_i = K_i ||a_i − b_i|| + (f(a_i) − f(b_i))² / (K_i ||a_i − b_i||) − 2(f(a_i) + f(b_i)),    (14)

where the local estimate K_i of the local Lipschitz constant L_i is calculated by the formulae

K_i = K_i(l) = max{ 0.5 (r + C)(λ_i + M), ξ },    (15)

M = max_{1≤j≤m} |f(a_j) − f(b_j)| / ||a_j − b_j||,    (16)
the value λ_i is from (9), and ξ > 0 is a small number needed for the correct work of the method in the degenerate case f(x_i) = const for all trial points x_i. The constants r > 1 and C > 0 are the reliability parameters of the method.

Step 4. (Point selection) By using the function S_t choose a new point

x^{l+1} = S_t = (a_t + b_t)/2 − [f(b_t) − f(a_t)] (b_t − a_t) / (2K_t ||b_t − a_t||),    (17)
belonging to the main diagonal of the subinterval D_t, where t is from (5).
Note that NA uses local information about the objective function over the whole search region D during the global search, in contrast with techniques which do it only in a neighborhood of local minima after stopping their global procedures (see e.g. [5]). Let us designate by X' the set of limit points of the infinite (ε = 0 in the stopping rule) sequence {x^k} generated by NA in the course of minimization of the function f(x) from (1), (2). The introduced method belongs to the class of diagonally extended information algorithms and also to the more general classes of adaptive partition and divide-the-best algorithms (see [12, 13] and [18], respectively). Since the following convergence results can be proved for {x^k} by using these general frameworks, we omit their proofs.

Theorem 2 Let x' be a limit point of the sequence {x^k}; then, for all trial points x^k ∈ {x^k} it follows that f(x^k) ≥ f(x').
Corollary 1 If alongside with x' there exists another limit point x'' ∈ X', then f(x') = f(x'').

Let us now establish the global convergence conditions for the introduced method.
Theorem 3 Let there exist an iteration number l* such that for a hyperinterval D_j, j = j(l), containing a global minimizer x* of f(x) during the l-th iteration of NA the following inequality takes place

K_j(l) ≥ 2H_j + √(4H_j² − λ_j²),   l ≥ l*,    (18)

where K_j(l) is calculated by (15), λ_j is from (9), and H_j is from (10). Then, x* is a limit point of the trial sequence {x^k} generated by NA.
Proof. To prove the theorem it is enough to show that the estimates K_i(l) of the local Lipschitz constants L_i from (15) are bounded values. In fact, since the global Lipschitz constant L < ∞ and the constants r > 1, C > 0, and ξ > 0, it follows that

0 < K_i(l) ≤ (r + C) max{L, ξ} < ∞,   l ≥ 1.

The rest of the proof repeats the arguments produced to prove Theorem 1. •

Corollary 2 Given the conditions of Theorem 3, all the limit points of the sequence {x^k} are global minimizers of f(x).

Proof. This corollary is a straightforward consequence of Corollary 1. •

Corollary 2 ensures the inclusion X' ⊆ X*. These sets are identical when the conditions established by Corollary 3 hold.

Corollary 3 If condition (18) is fulfilled for all the points x* ∈ X*, then the set of limit points of {x^k} coincides with the set of global minimizers of the objective function f(x), i.e. X' = X*.

Proof. The corollary follows immediately from Theorem 3 and Corollary 1. •
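The local tuning of Step 1 can be illustrated with a short Python sketch. The rule by which the local slope λ_i of (9) and the global slope M of (16) are combined follows the reading of (15) given above and should be treated as an assumption of this sketch, as should the data layout and the parameter values.

import math

def local_estimates(intervals, r=1.45, C=10.0, xi=1e-6):
    """Local Lipschitz estimates K_i in the spirit of (15)-(16).

    'intervals' is a list of dicts with vertices 'a', 'b' and the stored
    values 'fa' = f(a), 'fb' = f(b).  The averaging of the local and the
    global slope below is an assumed reading of (15).
    """
    def dist(u, v):
        return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))

    # Local slope over each subinterval, cf. (9).
    lam = [abs(d['fa'] - d['fb']) / dist(d['a'], d['b']) for d in intervals]

    # Global slope over all current subintervals, cf. (16).
    M = max(lam)

    # Local estimate, cf. (15); xi keeps K_i positive when f is locally constant.
    return [max(0.5 * (r + C) * (lam_i + M), xi) for lam_i in lam]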
4  Numerical results
In this Section we present numerical experiments executed to test DEIA and NA. The following test functions taken from the literature have been used.

Problem 1. (Six-Hump Camelback Function)

f(x_1, x_2) = (4 − 2.1x_1² + x_1⁴/3) x_1² + x_1 x_2 + (−4 + 4x_2²) x_2²,   −2.5 ≤ x_1 ≤ 2.5,  −1.5 ≤ x_2 ≤ 1.5.
This function has six local minimizers, two of which, x*_1 ≈ (0.0898, −0.7126) and x*_2 ≈ (−0.0898, 0.7126), are global. The global minimum z* ≈ −1.0316285.

Problem 2. (Quartic Function)

f(x_1, x_2) = x_1⁴/4 − x_1²/2 + x_1/10 + x_2²/2,

where −10 ≤ x_i ≤ 10, i = 1, 2. This function has two local minimizers; one of them, x* ≈ (−1.04668, 0), z* ≈ −0.3523865, is global.
Problem 3. [7]

f(x) = (π/n){10 sin²(πy_1) + Σ_{i=1}^{n−1} (y_i − 1)²[1 + 10 sin²(πy_{i+1})] + (y_n − 1)²},   y_i = 1 + (x_i − 1)/4,

where −10 ≤ x_i ≤ 10, i = 1, ..., n. This function has roughly 5^n local minimizers and a unique global optimizer at x*_i = 1, i = 1, ..., n; the global minimum z* = 0.

Problem 4. [7]

f(x) = 0.1{sin²(3πx_1) + Σ_{i=1}^{n−1} (x_i − 1)²[1 + sin²(3πx_{i+1})]} + 0.1(x_n − 1)²[1 + sin²(2πx_n)],
where −10 ≤ x_i ≤ 10, i = 1, ..., n. This function has roughly 15^n local minimizers and a unique global optimizer at x*_i = 1, i = 1, ..., n; the global minimum z* = 0.

To show the influence of the reliability parameter r on the search speed we test both methods choosing r = 1.45 and r = 1.7. We used C = 10, ε = 0.01 for the two-dimensional functions and C = 100, ε = 0.02 for the three-dimensional ones. In all the experiments ξ = 10⁻⁶. The results are reported in Tables 1 and 2, respectively, where the following designations are used: PN - problem number; PD - problem dimension; GMV - global minimum value; VDEIA - the best value found by DEIA; VNA - the best value found by the new algorithm; DEIA - number of function evaluations (trials) executed by DEIA; NA - number of trials executed by the new algorithm; Average - average number of trials executed by the methods.
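For reference, Problems 1 and 2 can be coded directly in Python; the quartic function below is written in the standard form that reproduces the minimizer and minimum value quoted above, and this form (like the small check at the end) is an assumption of this sketch.

def camelback(x1, x2):
    """Problem 1: six-hump camelback function on [-2.5, 2.5] x [-1.5, 1.5]."""
    return ((4.0 - 2.1 * x1 ** 2 + x1 ** 4 / 3.0) * x1 ** 2
            + x1 * x2 + (-4.0 + 4.0 * x2 ** 2) * x2 ** 2)

def quartic(x1, x2):
    """Problem 2: quartic function on [-10, 10]^2 (standard form, assumed)."""
    return x1 ** 4 / 4.0 - x1 ** 2 / 2.0 + x1 / 10.0 + x2 ** 2 / 2.0

# Quick check against the values quoted in the text.
print(camelback(0.0898, -0.7126))   # approximately -1.03163
print(quartic(-1.04668, 0.0))       # approximately -0.35239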
Table 1: Results of the numerical experiments executed with r = 1.45.

PN   PD   GMV          VDEIA       VNA         DEIA     NA
1    2    -1.031629    -1.031209   -1.031526   2372     797
2    2    -0.3523865   -0.345443   -0.345042   10657    3767
3    2    0.0          0.001009    0.001111    3462     1917
4    2    0.0          0.000677    0.000240    1222     542
3    3    0.0          0.000012    0.000287    33997    25118
4    3    0.0          0.000384    0.003591    8218     2264
Average                                        9998.0   5734.1
We have not tested DEIA with the original value r = 4 because (see Tables 1, 2) further increasing r leads to a strong slowing down of the search. Figures 1 and 2 present trial points generated by DEIA and NA. It is seen from the Figures that the density of the points produced by NA is significantly lower, both in the neighborhood of the global minimizer and very far from it.

Table 2: Results of the numerical experiments executed with r = 1.7.
PN   PD   GMV          VDEIA       VNA         DEIA     NA
1    2    -1.031629    -1.030856   -1.031385   5117     1402
2    2    -0.3523865   -0.344359   -0.345005   12017    4497
3    2    0.0          0.001004    0.001128    4207     2082
4    2    0.0          0.000668    0.000048    1332     827
3    3    0.0          0.000023    0.000157    51976    32879
4    3    0.0          0.000324    0.002991    10181    2654
Average                                        14136.8  7390.1

5  Conclusions
In this paper the class of diagonal information global optimization algorithms, widely used in many real-life applications, has been chosen to tackle the problem of finding the global minimum of a multiextremal multidimensional Lipschitz function defined over a hyperinterval. New global convergence conditions have been established for the diagonally extended information algorithm proposed in [11, 13] and generalizing the one-dimensional information method from [19, 20]. The new result has made it possible to modify the multidimensional method and, as a consequence, to improve its speed characteristics.
Figure 1: Level curves of problem 3 with the trial points generated by DEIA with r = 1.45
Figure 2: Level curves of problem 3 with the trial points generated by the new algorithm with r = 1.45
A new diagonal information algorithm using local tuning on the behaviour of the objective function during the global search has been proposed. The method adaptively estimates the local Lipschitz constants over different subintervals of the search region, applying the local tuning everywhere, in contrast with traditional methods which use local information only near the global minimizer. Global convergence conditions have been established for the new technique. The algorithm demonstrates a satisfactory performance in comparison with the information diagonal method using only global information about the Lipschitz constant.
References

[1] Butz, A.R. (1968), Space filling curves and mathematical programming, Inform. Control, 12(4), 314-330.
[2] Grishagin, V.A. (1978), Operation characteristics of some global optimization algorithms, Problems of Stochastic Search, 7, 198-206.
[3] Grishagin, V.A., Ya.D. Sergeyev, and R.G. Strongin (1997), Parallel characteristical global optimization algorithms, J. of Global Optimization, 10, 185-206.
[4] Floudas, C.A. and P.M. Pardalos, Eds. (1996), State of the Art in Global Optimization, Kluwer Academic Publishers, Dordrecht.
[5] Horst, R. and P.M. Pardalos, Eds. (1995), Handbook of Global Optimization, Kluwer Academic Publishers, Dordrecht.
[6] Horst, R. and H. Tuy (1996), Global Optimization - Deterministic Approaches, 3rd ed., Springer-Verlag, Berlin.
[7] Lucidi, S. and M. Piccioni (1989), Random tunneling by means of acceptance-rejection sampling for global optimization, Journal of Optimization Theory and Applications, 62(2), 255-277.
[8] Pardalos, P.M. and J.B. Rosen, Eds. (1990), Computational Methods in Global Optimization, Annals of Operations Research, 25.
[9] Pijavskii, S.A. (1972), An algorithm for finding the absolute extremum of a function, USSR Comput. Math. Math. Physics, 12, 57-67.
[10] Pinter, J. (1983), A unified approach to globally convergent one-dimensional optimization algorithms, Techn. Report IAMI CNR 83-51.
[11] Pinter, J. (1986), Extended univariate algorithms for n-dimensional global optimization, Computing, 36, 91-103.
[12] Pinter, J. (1992), Convergence qualification of adaptive partition algorithms in global optimization, Math. Programming, 56, 343-360.
[13] Pinter, J. (1996), Global Optimization in Action, Kluwer Academic Publishers, Dordrecht.
[14] Torn, A. and A. Zilinskas (1989), Global Optimization, Springer-Verlag, Lecture Notes in Computer Science, 350.
[15] Sergeyev, Ya.D. (1994), Using local information for acceleration of the global search, Lecture, Operations Research Seminar, Nizhni Novgorod State University, Nizhni Novgorod.
[16] Sergeyev, Ya.D. (1995), An information global optimization algorithm with local tuning, SIAM J. Optimization, 5(4), 858-870.
[17] Sergeyev, Ya.D. (1995), A one-dimensional deterministic global minimization algorithm, Comp. Mathematics and Mathematical Physics, 35(5), 553-562.
[18] Sergeyev, Ya.D., On convergence of "Divide the Best" global optimization algorithms, to appear in Optimization.
[19] Strongin, R.G. (1978), Numerical Methods on Multiextremal Problems, Nauka, Moscow.
[20] Strongin, R.G. (1989), The information approach to multiextremal optimization problems, Stochastics & Stochastics Reports, 27, 65-82.
[21] Strongin, R.G. (1992), Algorithms for multiextremal mathematical programming problems employing the set of joint space-filling curves, J. of Global Optimization, 2, 357-378.
[22] Strongin, R.G. and Ya.D. Sergeyev (1992), Global multidimensional optimization on parallel computer, Parallel Computing, 18, 1259-1273.
[23] Zhigljavsky, A.A. (1991), Theory of Global Random Search, Kluwer Academic Publishers, Dordrecht.
Combinatorial and Global Optimization, pp. 265-282 P.M. Pardalos, A. Migdalas and R. Burkard, Editors © 2002 World Scientific Publishing Co.
Frequency Assignment For Very Large, Sparse Networks

Robert A. Murphey (murphey@eglin.af.mil)
Air Force Research Laboratory, Munitions Directorate, Eglin AFB, FL 32542 USA
Abstract

The frequency assignment problem (FAP) concerns the allocation of a limited frequency spectrum to a group of transmitters. Although the frequency spectrum is continuous in nature, it is typically partitioned into discrete channels, each with a unique label. Often the FAP is presented as an interference graph whose vertices represent transmitters and where an edge between a pair of vertices represents the potential for interference between the corresponding transmitter pair. The objective is then to minimize the number of unique channels used, termed a minimum order assignment, or else to minimize the difference between the maximum and minimum channels used, termed a minimum span assignment. Clearly the FAP is closely related to graph coloring and consequently is found to be a difficult problem in practice. An approximate algorithm for finding minimum order assignments is discussed. The algorithm relies on a transformation of the interference graph to an alternate graph of greatly fewer vertices. This transformation is shown to be especially useful for graphs that are large and sparse. Experimental results demonstrate that the alternate graph algorithm finds low order assignments much faster than direct methods, especially for large, sparse graphs. In addition, a sequential coloring technique which constructs "short" Hamiltonian paths in the interference graph is adapted to the FAP. Results indicate that this technique combined with the transformation method works very well. Local search neighborhoods are defined which offer good improvements on the primal solutions with reasonable added running time.

Keywords: frequency assignment, sparse.
1  Introduction
Consider the graph G with vertex set V(G) and a function φ that maps the vertices of the graph to points in Euclidean j-space: φ(v) ∈ R^j, ∀v ∈ V(G). Define the edge set E(G) such that

(v_i, v_j) ∈ E(G) ⟺ ||φ(v_i) − φ(v_j)|| ≤ D(i, j)   for all v_i, v_j ∈ V,
where D(i, j) ∈ R⁺ and ||·|| denotes the Euclidean distance operation. The graph G is called a j-unit sphere graph. If the vertices of the graph represent transmitters, the FAP may then be stated as:

Instance: G(V, E), d.
Find f : V → Z⁺ \ {0} s.t. (v_i, v_j) ∈ E ⟹ |f(v_i) − f(v_j)| ≥ d_ij, d_ij ∈ Z⁺.
Hale [8] defines two figures of merit for any feasible frequency assignment: the number of unique channels used in the assignment, which he calls order, and the largest channel assigned minus the smallest channel assigned, which he calls span. Cozzens and Roberts [5] define the notation for minimum order and minimum span to be χ_T(G) and sp_T(G) respectively. Hale points out that when d_ij = 1, the FAP is equivalent to ordinary graph coloring and termed this the co-channel FAP. He proved
that the minimum span and minimum order co-channel FAP are equivalent to graph K-colorability, which is known to be NP-complete [7], so we expect that the unit disk FAP is at least as hard. Hale further showed that for the co-channel problem, sp_T(G) + 1 = χ_T(G) = χ(G), where χ(G) denotes the chromatic number of graph G.
The remainder of the paper is organized as follows. In Section 2 exact and approximate methods for finding minimum order and minimum span frequency assignments will be discussed. Since exact methods are limited to very small problems, the emphasis will be on sequential heuristics. An established heuristic named DSATUR will be reviewed and a heuristic which, at one point, constructs approximate traveling salesman tours will be introduced. In Section 3 a new approach is discussed which transforms the interference graph into a much smaller alternate graph. A frequency assignment is made on the alternate graph which is feasible to the original interference graph. Local search methods offer the ability to improve feasible solutions obtained by any method and are discussed in Section 4. Finally, in Section 5 some experimental results are discussed.
2  Minimum Order and Minimum Span Assignments
As was stated earlier, solving the easiest FAP, the co-channel problem, is equivalent to K-colorability, which is known to be NP-complete. Nonetheless, it is this similarity to graph coloring that is often exploited when designing algorithms for the FAP. It is an especially simple matter to adapt a graph coloring algorithm to the unit disk FAP; simply replace the coloring constraint

(v_i, v_j) ∈ E ⟹ |f(v_i) − f(v_j)| ≥ 1

with

(v_i, v_j) ∈ E ⟹ |f(v_i) − f(v_j)| ≥ d_ij.

However, since graph coloring algorithms try to find colorings with χ colors, these approaches are typically much better for finding minimum order assignments and usually don't do as well for minimum span objectives.
2.1  Exact Methods
Several exact methods exist for coloring graphs. Brelaz [2] developed a widely used method, later improved by Peemoller [17], that uses a depth-first backtracking scheme to improve a solution obtained by some primal heuristic. Lanfear [11] presents an adaptation of Peemoller's algorithm for the FAP. Kubale and Jackowski [10] present a generalized implicit enumeration scheme. Cameron [4] developed an algorithm that solves a minimum set covering problem to test the hypothesis that a graph
has a given chromatic number. When the test is repeated for all chromatic numbers between an upper bound (as found by a heuristic) and a lower bound (the clique number) the resulting algorithm is exact. None of the exact techniques work reliably on large FAPs since, in the worst case, they may enumerate an exponential number (in n = |V|) of solutions.
2.2  Sequential Heuristics
Since exact methods are, in general, not efficient for most frequency assignment problems, approximate techniques have usually been the focus of the research. The most popular approximation approach is based upon what are known as sequential heuristics for graph coloring. Of course all sequential methods designed for obtaining graph colorings may also be used for obtaining frequency assignments by changing the conditions of feasibility to those of the FAP. Sequential heuristics are so named because they color the vertices of a graph one at a time according to some sequential order. For every sequential heuristic, two rules must be defined; the first specifies how the vertices will be ordered and, once a vertex is selected, the second specifies how to select the color to assign to it. Metzger [14] describes three possible ways to select the color:
Frequency Exhaustive: Given an ordering of the vertices, attempt to color each vertex, sequentially, the smallest feasible color. This approach is also called a greedy coloring.

Requirement Exhaustive: Given an ordering of the vertices, attempt to color each vertex, sequentially, color 1. Then repeat for color 2 and so on.

Uniform: Given an ordering of the vertices, attempt to color each vertex, sequentially, the color that has been least used.
Zoellner and Beall [19] studied the performance of sequential techniques by comparing the performance of three different vertex ordering rules, each applied to greedy (frequency exhaustive) and uniform sequential methods. Their results indicated that greedy assignment rules typically outperform uniform assignment rules, regardless of the vertex order specified. Given a vertex order, it can easily be shown that greedy runs in O(n²t) time for problems with unrestricted spectrum, where n = |V(G)| and t is the number of unique values in d [5]. When the channels in the spectrum are restricted to a finite domain (domains may also be different for each vertex), the run time improves. In this work, unrestricted spectrum (no domains) will be assumed.
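As a concrete illustration of the frequency exhaustive (greedy) rule just discussed, the following Python sketch assigns channels along a given vertex order; the adjacency and separation data structures are assumptions of this sketch.

def greedy_assignment(order, adj, d):
    """Greedy (frequency exhaustive) sequential assignment.

    order : list of vertices in the order they are to be assigned
    adj   : dict mapping each vertex to the set of its neighbours
    d     : dict mapping an edge (u, v) to the required channel separation d_uv
    Returns a dict f with f[v] the smallest feasible channel for v.
    """
    def sep(u, v):
        return d.get((u, v), d.get((v, u), 1))

    f = {}
    for v in order:
        c = 1
        # Increase c until it is far enough from every already assigned neighbour.
        while any(u in f and abs(c - f[u]) < sep(u, v) for u in adj[v]):
            c += 1
        f[v] = c
    return f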
Many vertex ordering schemes have been studied for the greedy heuristic, with mixed results. Two schemes that have been well studied, the order that is the shortest Hamiltonian path and the ordering by saturation degree, will now be described.
2.2.1  Shortest Hamiltonian path as a vertex order
A path in a graph G(V, E) is an ordered sequence of distinct vertices and edges in G such that each edge is incident to the vertices listed immediately before and after it. A cycle is a path that begins and ends at the same vertex. A path that includes every vertex in V exactly once is called a Hamiltonian path. A cycle that includes every vertex in V exactly once is called a Hamiltonian cycle. It is important to recognize that an ordering of the n vertices of G induces a Hamiltonian path in G and vice-versa. Consider the edge set E(G) of graph G to have weights w: w(v_i, v_j) = d_ij for all (v_i, v_j) ∈ E(G). Then the length of a path (cycle) is simply the sum of the weights on the edges of the path (cycle). The problem of finding a cycle or "tour" of shortest length which visits each vertex exactly once is the traveling salesman problem (TSP). TSP is a generalization of the Hamiltonian cycle problem which seeks to find a Hamiltonian cycle (with or without weights). Consider a slightly modified TSP which seeks a path of shortest length that visits each vertex exactly once. This problem will be referred to as the shortest Hamiltonian path problem (SHPP). Raychaudhuri uncovered a relationship between the span of a greedy sequential assignment for problems with two unique weight values and the shortest Hamiltonian path in G.

Theorem 1 (Raychaudhuri [18]). Consider an interference graph G which is complete with channel distances constrained to d_ij ∈ {1, 2}. Consider the edge set E(G) of graph G to have weights w: w(v_i, v_j) = d_ij for all (v_i, v_j) ∈ E(G). Then sp_T(G) = length of the shortest Hamiltonian path in G. A greedy sequential assignment that follows the vertex order induced by the shortest Hamiltonian path on G will obtain a minimum span assignment for G.

Inspired by Raychaudhuri's work with Hamiltonian graphs, Lanfear [11] describes an approximate algorithm that finds assignments for d_ij ∈ {1, 2} and G not complete. Observe that a shortest Hamiltonian path need not be unique. Indeed, in a complete graph that has only two unique weights, that is d_ij ∈ {1, 2}, the existence of a single, unique shortest Hamiltonian path would often be unusual. Based upon experimental evidence, Murphey conjectures that Theorem 1 may be generalized to any number of unique weight values [15].
Conjecture 1 (Murphey [15]). Consider an interference graph G which is complete with channel distances defined by the arbitrary integer array d = {d_ij}. Consider the edge set E(G) of graph G to have weights w: w(v_i, v_j) = d_ij for all (v_i, v_j) ∈ E(G). Consider the set H that contains all vertex orders of V(G) that induce Hamiltonian path lengths in G equal to that of the shortest Hamiltonian path in G. Then there exists at least one vertex order in H that, when followed by the greedy sequential algorithm, will obtain a minimum span assignment for G.
Importantly, Conjecture 1 does not claim that any Hamiltonian path in H will obtain a minimum span assignment, only that there is at least one in H that will. Of course, this is not the panacea it appears to be, for two reasons. First, Conjecture 1 requires G to be complete, which is not very realistic. Nonetheless, this problem can be overcome since the main algorithm in Section 3 depends upon a technique that transforms any interference graph into one that is complete. The second and more compelling reason is that finding a Traveling Salesman tour is NP-complete, hence we expect that SHPP should be as difficult. However, for n < 100, there are many approximate techniques that are able to find good solutions and occasionally optimal solutions in polynomial time for TSP [12]. While n < 100 is unrealistically low for large FAPs, this approach is shown to be quite useful in Section 3 since the algorithm presented there transforms the large interference graph to an alternate graph often with fewer than 20 vertices. Furthermore, in [15] Murphey presents experimental evidence that suggests that Hamiltonian path length and span are well correlated for sequential greedy assignments on unit disk graphs. Thus "short" Hamiltonian paths will, on average, induce smaller span assignments than "long" Hamiltonian paths will.
The Short Hamiltonian Path Heuristic (ISHP). Pursuing the heuristic notion that "short" Hamiltonian paths will have, on average, small spans, an iterated short Hamiltonian path heuristic (ISHP) results.

ISHP
1. Input G, d, K;
2. i = 1, best = {0};
3. Do until i = K
4.    Construct a "short" Hamiltonian path p^i;
5.    Obtain a frequency assignment f on (G, d) using a sequential greedy assignment with vertex order specified by p^i;
6.    If f is "better" than best, then best ← f;
7.    i = i + 1;
8. Return best;

As mentioned previously, there are many efficient ways to construct short Hamiltonian paths as required of ISHP in Step 4. ISHP uses an edge ordering or greedy feasible technique to construct a path on G. A vertex 2-exchange (2-opt) local search is used to improve the primal solution [16]. For Step 5, frequencies are assigned to vertices using a greedy rule following the vertex order induced by the Hamiltonian path p^i. The best solution of K iterations is retained as the global solution. The edge ordering technique is an O(n log n) operation and the 2-opt local search has a neighborhood with (n choose 2) elements, thus is an O(n²) task. Therefore, ISHP has worst case complexity of O(n²K), but in practice the local search is often much better.
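The following Python sketch shows one way such a path could be built: a nearest-neighbour construction standing in for the edge ordering technique, followed by a 2-opt (vertex 2-exchange) improvement pass. The construction rule and the data layout are assumptions of this sketch.

def short_path(vertices, w):
    """Construct a 'short' Hamiltonian path and improve it by 2-opt.

    w(u, v) returns the weight (channel separation) of the pair u, v.
    """
    # Greedy construction: always append the closest unvisited vertex.
    path = [vertices[0]]
    remaining = set(vertices[1:])
    while remaining:
        nxt = min(remaining, key=lambda v: w(path[-1], v))
        path.append(nxt)
        remaining.remove(nxt)

    def length(p):
        return sum(w(p[i], p[i + 1]) for i in range(len(p) - 1))

    # 2-opt: reverse a segment whenever doing so shortens the path.
    improved = True
    while improved:
        improved = False
        for i in range(len(path) - 2):
            for j in range(i + 2, len(path)):
                cand = path[:i + 1] + path[i + 1:j + 1][::-1] + path[j + 1:]
                if length(cand) < length(path):
                    path, improved = cand, True
    return path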
2.2.2  Saturation degree as a vertex order
Consider a graph G(V, E) with a coloring f. That is, f is a co-channel frequency assignment on G. The saturation degree of a vertex v ∈ V(G) is defined to be the number of unique colors (equivalently channels) that exist on the vertices that are adjacent to v. During a sequential assignment, of all the unassigned vertices, the vertex with the highest saturation degree is considered to be the "most denied" since there are fewer colors to choose from. Based upon this notion, Brelaz [2] developed the DSATUR algorithm. DSATUR is a greedy sequential heuristic that uses the saturation degree of the vertices to specify an ordering. The algorithm begins (step 1) by randomly selecting a vertex from V(G) and assigning it the smallest color. At step i, the saturation degree of the n − i uncolored vertices is computed and the vertex with the largest saturation degree is selected to be colored the smallest feasible color. DSATUR terminates when all n vertices have been colored. The algorithm may construct K such solutions using K random starts and select the best one as the global solution. DSATUR has been successfully applied to the FAP [9], [3]. In particular, Borndorfer, Eisenblatter, Grotschel and Martin [3] find good minimum interference solutions for infeasible FAPs by using DSATUR as a starting heuristic and then improve the solutions with a local search consisting of an iterated 1-opt (vertex 2-exchange). The level of improvement that the local search yields is often slight due to the high quality of the DSATUR solution. Because DSATUR tends to color denser regions of the graph first, it has been found to obtain excellent solutions for unit disk graphs, also known as geometric graphs [6], because they tend to have large cliques. DSATUR runs in O(n²) time for graph coloring. For the minimum order or minimum span problem, DSATUR may be implemented for general unit disk graph FAPs with unconstrained spectrum in O(n²t) time by using a binary heap to update the saturation degree of the assigned vertices. In this study DSATUR will be discussed in two contexts: for frequency assignment and for graph coloring. To avoid confusion, the DSATUR used for frequency assignment will henceforth be denoted as DSATURFA while the graph coloring version will be called simply DSATUR.
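A compact Python sketch of this saturation-degree ordering is given below; the tie-breaking by ordinary degree and the data layout are assumptions of this sketch rather than details fixed by the algorithm described above.

def dsatur_coloring(vertices, adj):
    """Greedy coloring with a saturation-degree vertex order (DSATUR-style).

    adj maps each vertex to the set of its neighbours.
    """
    color = {}
    while len(color) < len(vertices):
        # Saturation degree: number of distinct colors on assigned neighbours.
        def saturation(v):
            return len({color[u] for u in adj[v] if u in color})

        uncolored = [v for v in vertices if v not in color]
        v = max(uncolored, key=lambda u: (saturation(u), len(adj[u])))

        # Smallest color not used by any neighbour of v.
        used = {color[u] for u in adj[v] if u in color}
        c = 1
        while c in used:
            c += 1
        color[v] = c
    return color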
3  Alternate graph approach
A new approximate approach is presented which promises to find good low order assignments for interference graphs which are sparse. The technique constructs an alternate graph G_A from G using the DSATUR algorithm and then finds an assignment on G_A which is feasible to G using either ISHP or DSATURFA. The advantages of the new method are that it can construct solutions faster than methods which find assignments directly on G and that the solutions have low orders. A local search is also investigated which offers slight improvements on the order of solutions that exceed χ(G) and greatly improves the span for all solutions. The construction of an alternate graph is now presented. Begin with the interference graph G(V, E) and corresponding channel distance matrix d.
Alternate Graph Construction.
1. Input G(V, E), d.

2. Partition V(G) into independent sets y_1, y_2, ..., y_γ by finding a γ-coloring of G. A γ-coloring is an assignment f of positive, nonzero integers to V(G) subject to the coloring constraint (v_i, v_j) ∈ E ⟹ |f(v_i) − f(v_j)| ≥ 1. The vertices of the alternate graph will be labeled with the resulting colors: V(G_A) = y_1, y_2, ..., y_γ.

3. Compute the channel distances d^A for G_A: d^A_ij = max{ d_kl : for all k ∈ y_i, l ∈ y_j }.

4. The edges of the alternate graph are (y_i, y_j) ∈ E(G_A) if d^A_ij ≠ 0, for all y_i, y_j ∈ V(G_A), i ≠ j.
The result of Step 2 is a collection of independent sets y_1, y_2, ..., y_γ which partitions V(G). Suppose that sets y_i and y_j are mutually independent for i ≠ j, that is, there exists no vertex in y_i that is adjacent to any vertex in y_j. If this is true, then y_i and y_j may be combined as a single, larger independent set. It is easy to see that if sets y_i and y_j cannot be combined for all i ≠ j, then G_A is complete. If G_A is complete, then the ISHP heuristic may be used to find a frequency assignment on G_A. Furthermore, if the size of G_A is small, then ISHP will find good solutions within reasonable run-times. Clearly, if γ = χ(G), then G_A must be complete and the size of G_A is minimized. If γ > χ(G), then G_A may still be complete (and often will be) but it cannot be guaranteed. For these reasons, the graph coloring algorithm in Step 2 of the alternate graph construction should attempt to obtain a coloring with γ = χ(G) colors. DSATUR obtains reasonable estimates of γ ≈ χ(G) and was used in this study. In practice, for all of the test problems investigated in this study, DSATUR with the number of random starts equal to at least one percent of n always yielded a complete alternate graph and often obtained a coloring with γ = χ(G) colors.
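The core of the construction (Steps 2-4) can be sketched in Python as follows; the dictionary-based data layout is an assumption of this sketch.

def alternate_graph(coloring, d):
    """Build the channel distances of the alternate graph G_A from a coloring.

    coloring maps each vertex of G to its color (independent set label) and
    d maps a vertex pair of G to its channel separation.  The returned
    dA[(ci, cj)] is the maximum separation required between any vertex
    colored ci and any vertex colored cj, as in Step 3.
    """
    dA = {}
    for (u, v), sep in d.items():
        ci, cj = coloring[u], coloring[v]
        if ci == cj:
            continue
        key = (min(ci, cj), max(ci, cj))
        dA[key] = max(dA.get(key, 0), sep)
    # Edges of G_A are exactly the pairs with a nonzero entry in dA (Step 4).
    edges = [key for key, sep in dA.items() if sep > 0]
    return dA, edges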
3.1  The Alternate Algorithm
Consider the following theorem.
Theorem 2. A feasible assignment for G_A is always feasible for G.

Proof. Channels assigned as f(y_1), f(y_2), ..., f(y_γ) are in turn assigned to all v ∈ V, since if v ∈ y_i then f(v) = f(y_i) and because y_1, y_2, ..., y_γ partitions V(G). Feasibility of f(v) for all v ∈ V follows from use of the maximum operator in d^A_ij = max{ d_kl : for all k ∈ y_i, l ∈ y_j }. □
Observe that an alternate graph with γ vertices will have order ≤ γ. The result is that reducing G to G_A guarantees that if a feasible assignment exists, it will have an order which does not exceed the size of the alternate graph. This fact and Theorem 2 suggest an algorithm for obtaining a low order frequency assignment on G by finding an assignment on G_A. The alternate frequency assignment (AFA) algorithm is described below.
AFA
1. Construct G_A using DSATUR.
2. Find frequency assignment f on G_A using a sequential FA heuristic.
3. Map f back to V(G).
It is observed that |V(G_A)| is approximately constant for constant |E(G)|/|V(G)|. In other words, no matter how large G becomes, G_A will remain approximately the same size as long as the density of G remains the same. This means that if G is sparse and very short Hamiltonian paths are being constructed, AFA should, on average, find very low order frequency assignments.
In step 2 of AFA a frequency assignment is made on the alternate graph. Two sequential FAP heuristics are investigated for this step: ISHP and DSATURFA. The ISHP heuristic is used since there is some indication that it will obtain assignments with tighter spans than those made using DSATURFA. However, any assignment technique could be used in step 2 and still obtain low order solutions, since the order is more a function of the size of G_A than of how G_A is assigned channels. An analysis of the complexity of AFA is as follows. In step 1, graph coloring of G using DSATUR is an O(n²) task, where n = |V(G)|. For step 2, frequency assignment on G_A using ISHP was shown to be an O(γ²K) operation, where γ = |V(G_A)| and K is the number of path constructions allowed. If γ is quite small, K may be of the same magnitude as γ. Frequency assignment on G_A using DSATURFA is an O(γ²t)
operation, where t is the number of unique channel separations in d. In any event, when G is sparse, γ << n. For example, γ is several orders of magnitude smaller than n when |E(G)|/|V(G)| ≈ 6, which commonly occurs and is deemed realistic [1]. No matter which FA method is used in step 2, the construction step dominates the running time and the main algorithm runs in O(n²) in the worst case. Since the run-time of DSATURFA on G is O(n²t), the main algorithm is expected to be faster than DSATURFA for problems with a large number of unique values in d.
4  Local Search
Local search is an effective way to improve solutions obtained by primal (sequential) heuristics. DSATURFA and AFA yield solutions of low order but large span. Hence, in this section, four local search methods are introduced for decreasing the span of a feasible assignment obtained by either of these primal methods. The effectiveness of a local search strategy is greatly attributed to the neighborhood of solutions it is restricted to. A desirable neighborhood contains good solutions but is not so large as to be difficult to search. The neighborhoods studied in this paper are all motivated by the observation that the only way to decrease the span of an assignment is to reassign those vertices with the largest (highest) channel.
Neighborhood N1. Begin with an assignment f obtained for G, d with highest channel F = max{f} and order X. Define the set V to be all vertices with channel F assigned. Consider all of the solutions that result when all of the vertices in V are reassigned to lower channels such that the order of the new solutions is maintained at X. In this procedure, if |V| > 1, then the vertex of largest degree is chosen to be reassigned first. Ties of degree are broken arbitrarily. This procedure is repeated on the best new solution and iterates in this manner until reassignment of all vertices in V is not possible. All of the solutions constructed during this process together comprise the neighborhood N1. This is the smallest of all the neighborhoods studied.
Neighborhood N2. This neighborhood is identical to N1 with the exception that if reassignment of all vertices in V is not possible, then those vertices that cause the conflict are backtracked to and an attempt is made to reassign them. The partial backtracking scheme used here is that of Leung [13], where only one level of backtracking is permitted.
Neighborhood N3. The third neighborhood is the same as N1 with the exception that the order is not restricted to remain at X. This neighborhood is much larger than N1 but should contain solutions with much lower spans. The obvious disadvantage is that the orders may not be as small. This trade-off between the two objectives is well modeled as a multi-criteria optimization problem, the subject of a future paper.
Neighborhood N4. The fourth neighborhood incorporates the partial backtracking of neighborhood N2 into the unrestricted order neighborhood of N3. This is the largest of all the neighborhoods studied.
The restricted order neighborhoods N1 and N2 are easily applied to AFA primal solutions by searching the unrestricted neighborhoods N3 and N4 respectively on G_A. That is, if in AFA at the end of step 2 the neighborhoods N3 or N4 are searched, the assignment at the end of step 3 will be equivalent to the local minimum of neighborhoods N1 or N2 respectively. The restricted neighborhoods are not so easily searched for DSATURFA primal solutions. To do so requires a technique that will force exactly one new channel into the new assignment. The difficulty is that there is no obvious rule for which channel should enter the solution or which vertices should use the new channel.
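To make these neighborhood definitions concrete, the following Python sketch performs one reassignment pass of the kind used in N1 (order kept fixed) or N3 (order allowed to change); no backtracking (N2/N4) is attempted, and the vertex selection details and data layout are assumptions of this sketch.

def reduce_span_step(f, adj, d, keep_order=True):
    """One reassignment pass in the spirit of neighbourhoods N1/N3.

    f   : dict mapping vertices to their current channels
    adj : dict mapping each vertex to the set of its neighbours
    d   : dict mapping a vertex pair to its required channel separation
    With keep_order=True only channels already in use may be reused, so no
    new channel is introduced; otherwise any channel below the maximum may
    be used.
    """
    def sep(u, v):
        return d.get((u, v), d.get((v, u), 1))

    F = max(f.values())
    top = sorted((v for v, c in f.items() if c == F),
                 key=lambda v: len(adj[v]), reverse=True)

    allowed = sorted(set(f.values()) - {F}) if keep_order else list(range(1, F))
    for v in top:
        for c in allowed:
            if all(abs(c - f[u]) >= sep(u, v) for u in adj[v]):
                f[v] = c
                break
        else:
            return False    # some vertex could not leave the highest channel
    return True             # the highest channel F is no longer used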
5  Experimental Results
The two primal algorithms, AFA and DSATURFA, and several local searches were applied to four test problems. The test problems used possess unit-disk interference graphs with no domain constraints. Few modifications are necessary to the algorithms for handling domain constrained problems, but it was decided that the relationship between span and order could be ascertained more clearly without the domains. The test problems were obtained from the European Cooperation on the Long term in Defense (EUCLID) program Combinatorial Algorithms for Military Applications (CALMA) project at web site ftp://ftp.win.tue.nl/pub/techreports/CALMA/index.html. The test instances are all from the DUTtestl.tar.gz file with the domains removed and all equality constraints replaced with ≥ constraints. This should result in problems with much larger feasible search spaces than those of the domain constrained instances. Each instance in the DUTtestl.tar.gz file was developed using the GRAPH (Generating Radio link frequency Assignment Problems Heuristically) generator developed by Benthem et al. [1]. However, the largest instances provided in this set have 916 vertices. Consequently, the GRAPH generator concept was used to develop an additional instance with 1832 vertices. The test problems are summarized in Table 1.
Table 1: Test Instances

Instance     |V(G)|   |E(G)|
TUD200.3     200      1160
TUD916.1     916      5177
TUD916.3     916      5262
TUD1832.1    1832     10515
Three primal algorithms were tested:

• DFA. The DSATURFA sequential heuristic applied directly to G.
• AFA1. The AFA sequential heuristic with ISHP used in step 2.
• AFA2. The AFA sequential heuristic with DSATURFA used in step 2.

The number of random starts P used for DSATUR in step 1 of AFA1 and AFA2, for ISHP and DSATURFA in step 2 of AFA1 and AFA2 respectively, and for DSATURFA in the DFA primal algorithm was, in all cases, set experimentally to be 1% of the total number of vertices. Higher values of P (4% and 5%) were found to improve the primal solutions by only small amounts while requiring substantial additional computing time. In the case of the DFA primal, this additional time could be quite significant. A local search was added to each of the primal solutions. It quickly became apparent that the neighborhoods without backtracking (N1 and N3) were too small and often gave little or no improvement to the primal solution. Results for these neighborhoods are not given. A local search using the N2 neighborhood was performed for the AFA1 and AFA2 primal solutions. Quite often this local search did not improve the solution greatly. The N4 neighborhood was searched for the DSATURFA, AFA1 and AFA2 primal solutions with and without a prior N2 search and was found to reduce the spans substantially but at the cost of increasing the order. Ten replications were run for each instance/algorithm combination and the sample mean values recorded in Tables 2, 3, 4, and 5. All algorithms were coded in Matlab 5.1 and run on a Pentium 166 with 24 Mbytes of memory.
Primal Results. The experimental results indicate that, in practice, AFA1 will always run faster than AFA2 and certainly faster than DFA. Both AFA1 and AFA2

Table 2: Solutions For TUD200.3 Test Instances

algorithm              order   span   run-time (sec)
DFA                    145     941    178
DFA with N4            140     590    248
AFA1                   12      1637   41
AFA1 with N2           12      1566   117
AFA1 with N4           109     543    233
AFA1 with N2 and N4    108     537    292
AFA2                   12      1673   85
AFA2 with N2           12      1673   95
AFA2 with N4           104     567    258
AFA2 with N2 and N4    106     540    279
Table 3: Solutions For TUD916.1 Test Instances

algorithm              order   span   run-time (sec)
DFA                    396     1040   3305
DFA with N4            375     659    3583
AFA1                   18      2665   396
AFA1 with N2           18      2632   476
AFA1 with N4           313     657    1414
AFA1 with N2 and N4    313     657    1479
AFA2                   18      3158   651
AFA2 with N2           18      3142   796
AFA2 with N4           285     690    1542
AFA2 with N2 and N4    285     666    1658
Table 4: Solutions For TUD916.3 Test Instances

algorithm              order   span   run-time (sec)
DFA                    393     1038   3126
DFA with N4            376     682    3324
AFA1                   18      2765   390
AFA1 with N2           18      2584   471
AFA1 with N4           314     669    1349
AFA1 with N2 and N4    297     675    1400
AFA2                   18      2989   544
AFA2 with N2           18      2989   778
AFA2 with N4           282     706    1414
AFA2 with N2 and N4    249     709    1658
Table 5: Solutions For TUD1832.1 Test Instances

algorithm              order   span   run-time (sec)
DFA                    525     1145   18013
DFA with N4            496     678    18593
AFA1                   18      3477   2070
AFA1 with N2           18      3305   2219
AFA1 with N4           414     661    4697
AFA1 with N2 and N4    414     661    4790
are an order of magnitude faster than DFA. This data confirms one of the major advantages of the AFA methods: alternate graph transformations take advantage of the sparsity of the interference graphs and significantly reduce the amount of time necessary to obtain a low order solution. The second major advantage of the AFA methods should also be clear from the data: the orders obtained by AFA1 and AFA2 are extremely low. Indeed, AFA1 and AFA2 appear to be equally capable of obtaining low order assignments. However, as expected, the low order assignments are obtained at a cost. The spans for these assignments are very high. Note that the spans for AFA1 are always lower than for AFA2, which lends credence to the conclusions of Conjecture 1; that is, a sequential assignment that follows a short Hamiltonian path is more likely to obtain a low span solution than a sequential assignment that uses the saturation degree. Indeed, AFA2 has no advantage in order or span for instances TUD200.3, TUD916.1, and TUD916.3 and always requires more time than AFA1. Consequently, AFA2 was not applied to instance TUD1832.1.
Local Search Results. Searching neighborhood N2 (fixed order) appears to achieve moderate reductions in span for AFA1 solutions but does little or nothing for the primal solutions obtained by AFA2. Why this is true is not yet understood. Searching the N4 neighborhood on DFA solutions works well at reducing span; however, the orders tend to remain quite large. For AFA1 and AFA2 solutions the N4 neighborhood achieves the best reductions in span, with somewhat better reductions for AFA1 than for AFA2. As with the N2 results, why this is true remains to be determined. Recall that the N4 neighborhood allows increases in order for reductions in span, which clearly results in the tests. Nonetheless, the orders obtained by searching N4 on AFA1 and AFA2 solutions are always smaller than those obtained using DFA with or without the N4 local search. The key disadvantage of N4 is that it can be time-intensive. Even so, when combined with an alternate graph method (AFA), it obtains better solutions than the direct DFA method in much less time. This time advantage increases significantly with the size of the graph. Searching N2 requires very little time and so is recommended even though it often makes only small improvements.
6  Conclusions
Alternate graph transformations take advantage of the sparsity of the interference graphs and reduce by an order of magnitude the amount of time necessary to obtain a low order solution. Indeed, the orders obtained by alternate graph methods are extremely low, though the respective spans are very high. An alternate graph method that uses DSATUR for the frequency assignment on the alternate graph (i.e. AFA2) has no advantage in order or span and always requires more time than an alternate graph method that uses a short Hamiltonian path heuristic for the frequency assignment (AFA1). Local search neighborhoods were defined which permit reductions in order and span. Searching these neighborhoods is most likely worth the extra time for reducing order and is certainly worthwhile for reducing span. For an unknown reason, the neighborhoods do not work as well for AFA2 as for AFA1. These observations recommend that AFA1 be used to obtain initial solutions and that these solutions be improved by searching the N2 neighborhood, which will reduce the span while keeping the order small. If further reductions in span are needed, the N4 local neighborhood always performs well but at the cost of increased order. This discussion underlines the fact that at some point, it becomes impossible to achieve reductions in both span and order. This trade-off should be relatively easy to model and analyze as a multi-criteria optimization model. There is a clear time advantage in using the alternate graph methods over conventional sequential heuristics when sparsity exists. The dependency of this time advantage on sparsity should be investigated. It is expected that for more sparse interference graphs the time advantage will increase, and that it will decrease for less sparse graphs.
References

[1] H. van Benthem, A. Hipolito, B. Jansen, C. Roos, T. Terlaky and J. Warners (1995), "EUCLID CALMA technical report, GRAPH: a test case generator, generating radio link frequency assignment problems heuristically," EUCLID CALMA Project, Delft and Eindhoven Universities of Technology, The Netherlands.
[2] D. Brelaz (1979), "New methods to color the vertices of a graph," Communications of the ACM, Vol. 22, 251-256.
[3] R. Borndorfer, A. Eisenblatter, M. Grotschel and A. Martin (1998), "Frequency assignment in cellular phone networks," Annals of Operations Research, 76, 73-93.
[4] S.H. Cameron (1973), "The solution of the graph coloring problem as a set covering problem," IEEE Transactions on Electromagnetic Compatibility, Vol. EMC-19, 320-322.
[5] M.B. Cozzens and F.S. Roberts (1982), "T-colorings of graphs and the channel assignment problem," Congressus Numerantium, Vol. 35, 191-208.
[6] J. Culberson, A. Beacham and D. Papp (1995), "Hiding our colors," CP'95 Workshop, Studying and Solving Really Hard Problems, Cassis, France, September 1995, 31-42.
[7] M.R. Garey and D.S. Johnson (1979), Computers and Intractability, A Guide To The Theory of NP-Completeness, W.H. Freeman and Company, New York, NY.
[8] W.K. Hale (1980), "Frequency assignment: theory and applications," Proc. of The IEEE, Vol. 68, No. 12, 1497-1514.
[9] W.K. Hale (1981), "New spectrum management tools," IEEE International Symposium on Electromagnetic Compatibility, Boulder, CO, 47-53.
[10] M. Kubale and B. Jackowski (1985), "A generalized implicit enumeration algorithm for graph coloring," Communications of the ACM, 28, 412-418.
[11] T.A. Lanfear (1989), Graph Theory and Radio Frequency Assignment, Technical Report, Allied Radio Frequency Agency, NATO Headquarters, B-1110, Brussels, Belgium.
[12] E.L. Lawler, J.K. Lenstra, A.H.G. Rinnooy Kan, and D.B. Shmoys (1985), The Traveling Salesman Problem, John Wiley and Sons Ltd.
[13] D.S.P. Leung (1981), "Application of the partial backtracking technique to the frequency assignment problem," IEEE International Symposium on Electromagnetic Compatibility, Boulder, CO, 70-74.
[14] B.H. Metzger (1970), "Spectrum management technique," paper presented at the 38th National ORSA Meeting, Detroit, MI.
[15] R. Murphey (1997), Multi graphs, weighted graphs and the frequency assignment problem, Engr Thesis, Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL.
[16] G.L. Nemhauser and L.A. Wolsey (1998), Integer and Combinatorial Optimization, John Wiley and Sons, Inc.
[17] J. Peemoller (1983), "A correction to Brelaz's modification of Brown's coloring algorithm," Communications of the ACM, Vol. 26, No. 8, 595-597.
[18] A. Raychaudhuri (1985), Intersection assignments, T-coloring, and powers of graphs, Ph.D. Thesis, Department of Mathematics, Rutgers University, New Brunswick, NJ.
[19] J.A. Zoellner and C.L. Beall (1977), "A breakthrough in spectrum conserving frequency assignment technology," IEEE Transactions on Electromagnetic Compatibility, Vol. EMC-19, 313-319.
Combinatorial and Global Optimization, pp. 283-296 P.M. Pardalos, A. Migdalas and R. Burkard, Editors © 2002 World Scientific Publishing Co.
A Derivative Free Minimization Method For Noisy Functions V.P. Plagianakos (vppQmath.upatras.gr) Department of Mathematics, University of Patras, University of Patras Artificial Intelligence Research Center, GR-26110 Patras, Greece.
M.N. Vrahatis ([email protected]) Department of Mathematics, University of Patras, University of Patras Artificial Intelligence Research Center, GR-26110 Patras, Greece.
Abstract An unconstrained minimization method which is based on Powell's derivative free method is presented. The proposed method retains the termination properties of Powell's method and it can be successfully applied to problems with imprecise function values. The ability of this method to cope with imprecise or noisy problems is due to the fact that it proceeds solely by comparing the relative size of the function values. The method has been implemented and tested, and performance information is given. Keywords: Powell's method, Rosenbrock's method, bisection method, unconstrained optimization, imprecise problems, noisy functions.
1
Introduction
Several methods for finding the extrema of a function / : V C R" —> R, where V is open and bounded, have been proposed, with many applications in different scientific fields (mathematics, physics, engineering, computer science etc.). Most of them
284 require precise function and gradient values. In many applications though, precise values are either impossible or time consuming to obtain. For example, when the function and gradient values depend on the results of numerical simulations, then it may be difficult or impossible to get very precise values. Or, in other cases, it may be necessary to integrate numerically a system of differential equations in order to obtain a function value, so that the precision of the computed value is limited. Furthermore, in many problems the values of the function to be minimized are computationally expensive. Such problems are common in real life applications as in the optimization of parameters in chemical experiments or finite element calculations, where a single measurement (function evaluation) takes several hours or even days. With such applications in mind, robust methods are needed which make good progress with the fewest possible number of function evaluations. In this contribution a new method is presented for the computation of a minimum x* of an n-dimensional real valued function / . The proposed algorithm is based on Powell's method (see [14, 28]), which minimizes a function without calculating derivatives, and proceeds solely by comparing the relative size of the function values. Thus, although this method retains the termination properties of Powell's method, it can be successfully applied to problems with imprecise function values. In Section 2 we give a discussion of optimization of noisy functions as well as a simulation of the influence of noise (proportional to a Gaussian distributed random number with zero mean and various variances). In Section 3 a brief overview of Powell's method and a detailed description of the new method are presented, while in Section 4 numerical results are presented. Finally, in Section 5, we give some concluding remarks.
2
Optimization of noisy functions
The problem of optimization of noisy or imprecise (not exactly known) functions occurs in various applications, as for instance, in the task of experimental optimization. Also, the problem of locating local maxima and minima of a function from approximate measurement results is vital for many physical applications. In spectral analysis, chemical species are identified by locating local maxima of the spectra. In radioastronomy, sources of celestial radio emission and their subcomponents are identified by locating local maxima of the measured brightness of the radio sky. Elementary particles are identified by locating local maxima of the experimental curves. The theory of local optimization provides a large variety of efficient and effective methods for the computation of an optimizer of a smooth function / . For example, Newton-type and quasi-Newton methods show superlinear convergence in the vicinity of a nondegenerate optimizer. However, these methods require the gradient or the
285 Hessian, respectively, in contrast to other optimization procedures, like the simplex method [12], the method of Hook and Jeeves [9], the direction set method of Powell (see [7, pp.87-92]), or some other recently proposed methods [5,6,23]. In some applications, however, the function to be minimized is only known within some (often unknown and low) precision. This might be due to the fact that evaluation of the function means measuring some physical or chemical quantity or performing a finite element calculation in order to solve partial differential equations. The function values obtained are corrupted by noise, namely stochastic measurement errors or discretization errors. This means that, although the underlying function is smooth, the function values available show a discontinuous behavior. Moreover, no gradient information is available. For small variations in a neighborhood of a point the corresponding function values reflect the local behavior of the noise rather than that of the function. Thus, a finite-difference procedure to estimate the gradient fails [5]. The traditional method for optimizing noisy functions is the simplex or polytope method proposed by Nelder and Mead [12] (cf. [7, p.18], [13, p.202], [15, p.230]). This method surpasses other well-known optimization methods when dealing with the large noise case. However, this is not valid in the noiseless case. The ability of this method to cope with noise, is due to the fact that it proceeds solely by comparing the relative size of the function values, as the proposed method does. The simplex method does not use a local model of the function / and works without the assumption of continuity. Although this method has poor convergence properties (for a convergence proof of a modified version see [18]), yet it has been proved to be a useful method in many applications. The simplex method converges whenever the standard deviation of the function at the vertices of the current simplex is smaller than some prescribed small quantity. This method can be deficient when the current simplex is very "fiat". This can be avoided by suitable variants (see for example [18]). More sophisticated methods in this direction are discussed by Powell [16]. To study the influence of the imprecise information (regarding the values of the objective function and the gradient), we simulate imprecisions with the following approach. Information about f(x) is obtained in the form of f{x), where f[x) is an approximation to the true function value f(x), contaminated by a small amount of noise 77. Specifically, the function values are obtained as [5,6]:
r(x)
= f(x)(l
+ r,),
2
V~N(0,a
),
(1)
where iV(0, a2) denotes a Gaussian distributed random number with zero mean and variance a2, i.e., relative stochastic errors are used for the test problems. To obtain 77 we apply the method of Box and Muller [1] using various variances a.
286
3
A derivative-free minimization method for imprecise problems and its convergence
In this section we briefly describe Powell's algorithm for solving the nonlinear unconstrained minimization problem without calculating derivatives. Also, we propose a derivative free minimization method which is based on Powell's method and we study the termination properties of the new method. Powell's method [14] is based on the use of conjugate directions and the main idea of his approach is that the minimum of a positive-definite quadratic form can be found by performing at most n successive line searches along mutually conjugate directions, where n is the number of variables. Also, this procedure can be applied to non-quadratic functions by adding a new composite direction at the end of each cycle of n line searches. In this case finite termination is no longer expected. One iteration of Powell's basic procedure consists of the following steps, where x° is the initial approximation to the minimum, and Wj, i = 1,2, . . . , n determine the initial set of directions which are equal to the basis vectors e;, i = 1, 2 , . . . , n: Step 1. Step 2. Step 3. Step 4.
For i = 1,2,..., n compute A; to minimize f{xl~l + A;Uj) and define xl = xl~x + \iUt. For i = 1, 2 , . . . , n — 1, replace M; by u;+i. Replace un by xn — x°. Compute A to minimize f(xn + Xun), and set x° = xn + Xun.
For a general (non-quadratic) function, the iteration is repeated until some stopping criterion is satisfied. If / is quadratic we minimize along n conjugate directions ui,v,2, • •. ,un, and the minimum is reached if the u, are all nonzero. This is true if Ai ^ 0 at each iteration, for then the directions «i, Ui, • • •, un cannot become linearly dependent. The problem is that Powell's method has a tendency to choose search directions that are linearly dependent on each other, especially in ill-conditioned problems. There are various procedures to cope with the linear dependence in Powell's algorithm. The simpler way to avoid linear dependence of the search directions is to reset the set of directions u,, to the basis vectors e, after n or (n +1) iterations of the basic procedure. This is the procedure we will follow in our approach and it retains the quadratic termination property if \x ^ 0. From the above discussion, it is evident, that the only computable information required by Powell's method is the values of / when the one-dimensional subminimization procedures used to minimize f(x + AM) are performed without utilizing the derivatives of / . One such subminimization procedure has been proposed by Powell [14].
287 Next, we propose a modification of Powell's method for the numerical computation of a minimizer of / utilizing only the relative size of the function values. To do this we have to minimize f(x + Xu) by comparing the relative size of the values of / . In our case, we have to minimize f(x° + Xu) along the direction u. One way to do this, by applying one-dimensional rootfmding methods, is to compute the value of A ^ 0 such that: f(x° + Xu) - f(x°) = 0. (2) Now, if A is the solution of the above equation, then, of course, the point x° = x° + Xu possesses the same function value as the point x°, so it belongs to the contour line of x°. Then, we can choose a point which belongs to the line segment with endpoints a;0 and x° possessing a smaller function value than these endpoints. With this fact in mind we can now choose such a point, say for example: X1 = X°+'Y(X0-X0),
76(0,1).
To solve the one-dimensional Eq. (2), we use our modification of the bisection method which is briefly described below. A solution of the equation f(x) = 0, where the function / : [o, 6] C R —¥ R is continuous, is guaranteed to exist in the interval (a, b) if the following criterion is fulfilled: f(a) f(b) < 0,
or
sgn/(o) sgn/(6) = - 1 ,
where sgn is the well known three valued sign function. This criterion is known as Bolzano's existence criterion (for a generalization of this criterion to higher dimensions see [21]). Based on this criterion various rootfmding methods, as for example the bisection method, were created. Here we shall use the bisection method which has been modified to the following simplified version described in [19,20,25]. There it is reported that, in order to compute a root of f{x) = 0 where / : [a, b] C R —¥ R is continuous, a simplified version of the bisection method leads to the following iterative formula: rP+i =rP
+ c
. S gn/(r p )/2P + 1 ,
p = 0 , 1 , . . . , [log2(6 - a) e " 1 ] ,
(3)
with c = sgn/(r°)(6 — a), r° = a and where e is the required accuracy. Of course the iterations (3) converge to a root r* e (a, b) such that \rp+1 — r*\ < e if for some rp, p = 1,2,..., the following holds: sgn/(r°)sgn/(r p ) = - l . Instead of the iterative formula (3) we can also use the following rV+l = rV-c-Sgnf{r*)l2V+\ with c = sgn/(r°)(6 — a), r° = b.
p = 0,1,.. .,\\og2(b - a)£-'],
(4)
288 The one-dimensional rootfinding portion of our method employs the above modified bisection method. We use this method since it is a globally convergent method, it always converges within the given interval and it is optimal, i.e. it possesses asymptotically the best possible rate of convergence [17]. In addition, it has a known behavior concerning the number of iterations required, when we seek a root with a predetermined accuracy e, and last, but not least, it is the only method that can be applied to problems with imprecise function values, as it is evident from (3) and (4), where the only computable information required by the bisection method is the algebraic signs of the function / . As a consequence, for problems where the function value is given as a result of an infinite series (e.g. Bessel or Airy functions), it can be shown [24,26] that the sign stabilizes after a relatively small number of terms of the series and the calculations can be speed up considerably. In our case to solve Eq. (2) for A, along the direction u, the above modified bisection method assumes the form: AP+1 = \P + C • sgn [/ (a + \"u) - f {x°)]/v+\
p = 0 , 1 , . . . , [log 2 (&-a) e " 1 ] , (5)
with c = sgn [/(a) — f{x0)] (b — a), A0 = 0, h = b — a and where h indicates the stepsize. Of course, utilizing our process we are able to obtain also local maxima along the line u. But if we choose the endpoints a and 6 in a proper way, then this method deals with a minimum. This can be easily handled by applying the iterative scheme (5) and taking the endpoints a and b from the following relations: a = x° — s(3 — — (1 + s) h
and
b = a + h,
(6)
where s = sgn [f(x° + /3u) — f{x0)] and j3 is a small positive real number (e.g. /3 ex y/epsmch where epsmch denotes the relative machine precision). To study the termination properties of our approach we give the following result that states that any search method involving minimization along a set of n linearly independent conjugate directions has the quadratic termination property. Theorem 3.1 [14,28]. If a quadratic function f(x) of dimension n is minimized sequentially, once along each direction of a set of n linearly independent, conjugate directions, the global minimum of f will be located in n or less cycles independent of the starting point as well as the order in which the minimization directions are used. Remark 3.1 A method that minimizes f according to the requirements of the above theorem has the property known as quadratic termination. Also, the order in which the directions are used is immaterial. Theorem 3.2
[14,28]. The directions generated in Powell's method are conjugate.
289 Theorem 3.3 The proposed method locates the minimum of an n-dimensional quadratic function f{x), in n or less iterations, utilizing only the relative size of the function values of f, independent of the starting point as well as the order in which the minimization directions are used. Proof: Since the proposed method uses the direction set m of Powell's method, by Theorem 3.2 these directions are conjugate. Also, the starting search directions are coordinate directions and hence they are linearly independent. Furthermore, the proposed method avoids linear dependence of the search directions by resetting the set of directions Wj, to the coordinate directions after n or (n + 1) iterations of the basic procedure. Thus the assumptions of Theorem 3.1 are fulfilled and the result follows. As it is well known the quadratic termination property is very powerful because most of the general (non-quadratic) functions can be approximated very closely by a quadratic one near their minima. Thus, this property is expected to speed up the convergence even for general functions. On the other hand Powell's method as well as the proposed method require generally more than, the theoretically estimated number, of n cycles for quadratic functions. The proof of the quadratic termination property has been established with the assumption that the exact minimum is found in each of the one-dimensional subminimizations. In practice, the actual subminimizer is approximated and hence the subsequent directions will not be conjugate. Thus, the methods require more iterations for achieving the overall convergence. For a proof of convergence, when the directions of the basis vectors are used and inexact one-dimensional subminimization procedures are applied, see [22, p.377].
4
Numerical applications
The above procedures were implemented using a new portable Fortran program named SIGNOPT, which has been applied to several test functions and has been compared with the well known Powell's and Rosenbrock's minimization methods. Rosenbrock's method is a simple iterative procedure that can be utilized to locate the minimum of a function of one variable. An initial point a;0 and an initial step length h are chosen and the function f(x) evaluated at a;0 and (x° + h). If f(x° + h) < f(x°), the step is called a success, otherwise it is called a failure. The successive steps are taken according to the following simple algorithm: Step 1. Step 2.
Choose a;0, h. Repeat
290 Step 3.
Iff(x°
+
h)
Then x° = x° + ft and ft = 2ft. Step 4.
Else ft = - f t / 2 . Until converged.
The method described above can also be used to find the minimum of a multi-variable function, if the function f(x) is minimized sequentially, once along each direction of a set of n linearly independent, conjugate directions. We have used as such directions the same directions we used for the SIGNOPT method, namely the Powell's directions. SIGNOPT was tested on various test problems and our experience is that it behaves predictably and reliably. Some typical computational results, for some classical test problems, are given below. For each of the following examples, the reported parameters are: — n dimension, — a2 the value of the variance of the simulated noise, — x° = (xi, X2, • • •, xn) starting point, — ft the stepsize for the bisection and the Rosenbrock's method. In the following tables the number of function evaluations needed for the methods to converge is given. If the field for the number of function evaluations contains a hyphen, the method failed to converge. Here we exhibit results from the following test problems. E X A M P L E 1. Himmelblau function, [8]. In this case / is given by: /Or) = (x2 + x2-
l l ) 2 + (Xl +x2-
7) 2 .
The starting point was x° = (1,1) and the steplength for the bisection and the Rosenbrock's method were ft = 9 and ft = 1 respectively. In Table 1 we exhibit the corresponding results obtained by the three methods. Powell's method was faster in the noiseless case, but when we added noise to the function values, SIGNOPT exhibited better performance. The Rosenbrock's method needed much more function evaluations in order to converge. E X A M P L E 2. Kearfott function, [10]. The objective function / is given by:
f(x) = (x2 +
2 x
-2)2+(xl-x22-l)2.
291 Table 1: Function Evaluations for solving Example 1. a Powell Rosenbrock SIGNOPT
0 0.01 0.10 0.20 0.30
321 726 845 927 1900
432 645 645 645 751
727 10283 14963 18809 21649
Table 2: Function Evaluations for solving Example 2. a Powell Rosenbrock SIGNOPT
0 0.01 0.10 0.20 0.30
144 598 -
292
325 324 324 324 324
5763 14534 13286 31260
In this problem we started the methods from the point x° = (1,1) and the steplength for the bisection was h = 9 and for the Rosenbrock's method was h = 1. SIGNOPT required the same function evaluations, even when we have used noisy function values, while Powell's method failed to converge for a > 0.01 and Rosenbrock's method needed many function evaluations. In Table 2 we exhibit the obtained results. E X A M P L E 3. Weber- Werner singular function, [27]. In this example the function is given by: f(x) = (x\ - 2 i i + ^ + | J + (x\ -
Xlx2
-2Xl
+ ^ + -j
(7)
As our starting values we utilized, x° = (10,10). The steplength of the bisection method was h = 9 and the steplength for Rosenbrock's method was h = 1. The results are shown in Table 3. Once again Powell's method failed to converge when we increased the variance of the noise. Rosenbrock's method converged even when we applied noisy function values, but once again it needed many function evaluations. On the other hand, SIGNOPT converged to the minimum within at most 1404 function evaluations, in all cases. E X A M P L E 4. Broyden banded function, [11]. In this example / is given by:
/(*)=£/< 2 (s),
(8)
292 Table 3: Function Evaluations for solving Example 3. Powell a Rosenbrock SIGNOPT
0
733
2352 4433 4036
0.01 0.10 0.20 0.30
511 898
10397 14002 31233 47460
-
1404 1404 1404
with: fi{x)=xi{2
+ 5x21) + l-Y,xj{l
+ xj),
(9)
jeJi
where: Ji = {J '• J 7^ *> max(l, i — mi) < j < min(n, i + mu)}
and
mi = 5, mu = 1. (10)
In this problem, for n = 2, we used as starting values x° = (1,1), h = 9 for the bisection and ft = 1 for Rosenbrock's method. For n = 3 we started the methods from the point x° = (1,1,1), using the same steplengths. In Table 4 we exhibit the Table 4: Function Evaluations for solving Example 4. Powell Rosenbrock SIGNOPT n a
2
3
0
246
599
0.01 0.10 0.20 0.30
1549 9369
12437 20524 15819 30756
0
2182
0.01 0.10 0.20 0.30
-
1426 43654 42231 90159
-
536 964 1289 1180 1824 1002 3429 2574 4000 9289
obtained results. SIGNOPT converged in all cases and had predictable performance, while Powell's method diverged as we increased a and Rosenbrock's method once again exhibited slow convergence. E X A M P L E 5. Hilbert function, [2]. In this example / is given by: f{x) = xTAx,
(11)
293 where A is an n by n Hilbert matrix, i.e.: %• = — — — r , l
0 0.01 0.02 0.03 0.04 0.05 0.10
0 0.01 0.02 0.03 0.04 0.05 0.10
5
106
205
1151
1213 2411 38882 65861
439
507
2523 3370 11370 15221 15598 24533
3553 3814 3514 3623 4343 5040
882
717
6160 7174 4297 9757 1624 4505
3810 4426 7136 6659 9373 11332
322 432 538 324 1825 428 716 716 716 858 858 4225 533 359 359 359 359 538 4824
Concluding remarks
A method for the computation of a minimum of an n-dimensional noisy or inexact objective function / is presented. The proposed method is based on Powell's derivative-
294 free minimization method and utilizes a modification of the one-dimensional rootfinding bisection procedure. The method proceeds solely by comparing the relative size of the function values. Thus, although it retains the termination properties of Powell's method, it can be successfully applied to problems with imprecise function values. Our experimental results clearly show that the proposed method outperforms Powell's and Rosenbrock's methods when noisy function values were provided. Powell's method utilizing exact one-dimensional line search was faster when accurate function values were used. In the presence of noise, Powell's method either needed many function evaluations or failed to converge and it behaved rather randomly. On the other hand, the performance of SIGNOPT, when noisy function values were provided, was satisfactory and predictable, and its number of function evaluations seemed to increase with n in a smooth and predictable way, which makes it a reliable method for minimizing imprecise or noisy functions.
References [1] Box G.E.P. and Muller M.E., A note on the generation of random normal deviates, Ann. Math. Statistics, 29, 610-611, (1958). [2] Brent R.P., Algorithms for minimization glewood Cliffs, NJ, (1973).
without derivatives, Prentice-Hall, En-
[3] Dennis J.E. and Schnabel R.B., Numerical methods for unconstrained optimization and nonlinear equations, Prentice-Hall, Englewood Cliffs, NJ, (1983). [4] Dennis J.E. and Torczon V., Direct search methods on parallel machines, SIAM J. Optimization, 1, No. 4, 448-474, (1991). [5] Elster C. and Neumaier A., A grid algorithm for bound constrained optimization of noisy functions, IMA J. Numer. Anal., 15, 585-608, (1995). [6] Elster C. and Neumaier A., A method of trust region type for minimizing noisy functions, Computing, 58, 31-46, (1997). [7] Fletcher R., Practical Methods of Optimization, Wiley, New York, (1987). [8] Himmelblau D.M., Applied nonlinear programming, McGraw-Hill, New York, (1972). [9] Hook R. and Jeeves T., Direct search solution of numerical and statistical problems, J. ACM, 7, 212-229, (1969). [10] Kearfott R.B., An efficient degree-computation method for a generalized method of bisection, Numer. Math., 32, 109-127, (1979). [11] More B.J., Garbow B.S. and Hillstrom K.E., Testing unconstrained optimization, ACM Trans. Math. Software 7, 17-41, (1981).
295 Nelder J.A. and Mead Ft., A simplex method for function minimization. Computer J., 7, 308-313, (1965). Nocedal J., Theory of algorithms for unconstrained optimization, In: Acta Numerica 1992, Iserles A., ed., Cambridge University Press, Cambridge, 199-242, (1992). Powell M. J.D., An efficient method for finding the minimum of a function of several variables without calculating derivatives, Computer J., 7, 155-162, (1964). Powell M.J.D., A review of algorithms for nonlinear equations and unconstrained optimization, In: ICIAM 1987 Proceedings, McKenna J. and Temam R., eds., SIAM, Philadelphia, 220-232, (1988). Powell M.J.D., A direct search optimization method that models the objective and constraint functions by linear interpolation, Numerical Analysis Reports, DAMTP 1992/NA5, Department of Applied Mathematics and Theoretical Physics, University of Cambridge, England, April (1992). Sikorski K., Bisection is optimal, Numer. Math., 40, 111-117, (1982). Torczon V., On the convergence of the multidimensional search algorithm, SIAM J. Optimization, 1, No. 1, 123-145, (1991). Vrahatis M.N., Solving systems of nonlinear equations using the nonzero value of the topological degree, ACM Trans. Math. Software, 14, 312-329, (1988). Vrahatis M.N., CHABIS: A mathematical software package for locating and evaluating roots of systems of nonlinear equations, ACM Trans. Math. Software, 14, 330-336, (1988). Vrahatis M.N., A short proof and a generalization of Miranda's existence theorem, Proc. Amer. Math. Soc, 107, 701-703, (1989). Vrahatis M.N., Androulakis G.S., Lambrinos J.N. and Magoulas G.D., A class of gradient unconstrained minimization algorithms with adaptive stepsize, J. Comput. Appl. Math., 114, 367-386, (2000). Vrahatis M.N., Androulakis G.S. and Manoussakis M.E., A new unconstrained optimization method for imprecise function and gradient values, J. Math. Anal. Appl., 197, 586-607, (1996). Vrahatis M.N, Grapsa T.N., Ragos O. and Zafiropoulos F.A., On the localization and computation of zeros of Bessel functions, Z. Angew. Math. Mech., 77, 467475, (1997). Vrahatis M.N. and Iordanidis K.I., A rapid generalized method of bisection for solving systems of nonlinear equations, Numer. Math., 49, 123-138, (1986). Vrahatis M.N, Ragos O., Zafiropoulos F.A. and Grapsa T.N., Locating and computing zeros of Airy functions, Z. Angew. Math. Mech., 76, 419-422, (1996). Weber H. and Werner W., On the accurate determination of nonisolated solutions of nonlinear equations, Computing, 26, 315-326, (1981).
296 [28] Zangwill W.I., Minimizing a function without calculating derivatives, Computer J., 10, 293-296, (1967).
Combinatorial and Global Optimization, pp. 297-303 P.M. Pardalos, A. Migdalas and R. Burkard, Editors © 2002 World Scientific Publishing Co.
Tight Q A P bounds via linear programming K.G. Ramakrishnan ([email protected]) Mathematical Sciences Research Center, Bell Laboratories M.G.C. Resende ([email protected]) Information Sciences Research Center, AT&T Labs Research B. Ramachandran ([email protected]) Department of Chemical Engineering, Purdue University J.F. Pekny ([email protected]) Department of Chemical Engineering, Purdue University
Abstract Lower bounds for the quadratic assignment problem (QAP) tend to deteriorate rapidly with the size of the QAP. Recently, Resende, Ramakrishnan, and Drezner (1995) computed a linear programming based lower bound for the QAP using an interior point algorithm for linear programming to solve the linear programming relaxation of a classical integer programming formulation of the QAP. That linear program can be viewed as a two-body interaction formulation. Those bounds were found to be the tightest for a large number of instances from QAPLIB, a library of QAP test problems. In this paper, we apply the same interior point approach to compute lower bounds derived from the three-body interaction formulation of Ramachandran and Pekny (1996). All instances from QAPLIB, having dimension up to n = 12, were solved. The new approach produces tight lower bounds (lower bounds equal to the optimal solution) for all instances tested. Attempts to solve the linear programming relaxations with CPLEX (primal simplex, dual simplex, and barrier interior point method) were successful only for the smallest instances (n < 6 for the barrier method, n < 7 for the primal simplex method, and n < 8 for the dual simplex method). Keywords: Quadratic assignment, linear programming bounds.
298
1
LP-based lower bounds for the QAP
In this section we briefly review integer are useful for producing lower bounds. assignment of facility i to location j and to location j and facility k to location I. integer quadratic program: n
n
programming formulations of the QAP that Let the binary variables xy- represent the denote by Cy^ the cost of assigning facility i The QAP can be formulated as the following n
n
Y^cioki Xij xkl
min J^5ZS
(1)
i = i j = i k=i i=i
subject to n
Y,xij
= 1. i = l , . . . , r a ,
(2)
Y,xa = !» i = l,...,n,
(3)
n
i=l
^
€ {0,1}, i , j = l , . . . , n .
(4)
Linear programming based bounds for the QAP [1, 9, 7] have relied on the following mixed integer formulation obtained by the linearization of the quadratic objective with the introduction of continuous variables yijki = Xij Xki- The resulting linear integer program is min
n
n
HHZ^5ZcyW VUki
(5)
i—l j=l k^i l>j
subject to (2, 3, 4) and X^ Vijkl = xkh
V(j, k, I), I > j
(6)
V(j, k,l),l
(7)
&k
Y,Vkiij = xu,
<j
i^k
^2yijki + ^2ykiij = xki, j
V(i,k,l),k^i
(8)
i>i
yijki>0,
V{i,j,k,l),ij=k,j^l.
(9)
Though lower bounds obtained from the linear programming relaxation of this formulation are, in general, better than previously known lower bounds [9], there is still a significant gap between the optimal solution and the lower bound for problems as small as dimension n = 8. For example, problem nug08 from QAPLIB [2] has an optimal solution of 214 and an LP-based lower bound of 204. This gap deteriorates with the increase in the size of the problem, necessitating the solution of a large number of linear programs in branch and bound algorithms [8], For example, nug30 has a best known solution of 6124 and an LP-based lower bound of 4805.
299 Ramachandran and Pekny [7] have recently proposed a higher-order formulation of the QAP based on the application of lifting procedures to (5-9). Defining three-body interaction coefficients as Cijkipq = (Hjki + Ckipq + Cijpq, the QAP can be formulated as:
min
n
n
n
mzsi] Y Y
n
Cijkipq %ijklpq i—l j—1 k^-i l>j p^i,fc q>l
(io) i=l j=l k^i l>j
subject to (2-4), (6-9) and Y
z
mpq ~ Vkipq>
V
0>fc>1>V,
j , q>l,
(11)
i^k,p
Y
z
Wpq = Vkipi' V{hk,l,p,q),p^k,i>l,q>
j,
(12)
i^k,p
Y
z
kLpqi] = Vkipg, V{j,k,l,p,q),p
^k,q>l,j
> q,
(13)
i^k,p
/ . Zijklpq ~T / . Zklijpq "T" / , Zktpqij j
Kj
Vklpqi
Kq<j
V(i,h,l,p,q), Zijklpq > 0.
p^k^i,q>
I,
(14) (15)
It can be shown that the optimal objective function of (10-15) is (n — 1) times that of QAP. Prior to our study, this formulation had been tested only for small instances of QAP of size at most n = 8 [7], showing that the LP relaxations were 100% tight in those cases. Larger instances of quadratic assignment problems could not be solved due to the limitations of CPLEX, the LP solver used. Decomposition methods based on this formulation have also yielded better lower bounds than the LP based lower bounds using the formulation (5-9) for a number of problems [7]. In this paper, our main objective is to use the interior point code ADP to obtain superior lower bounds using (10-15).
2
Experimental results
In this section, we describe computational results. Because of the size of the linear programs, we have limited this study to all QAPLIB instances having dimension n < 12. ADP requires about 1.2 Gbytes of memory to run the largest instances in the test set, which have 299,256 variables and 177,432 constraints. The experiments were done on a 250MHz Silicon Graphics Challenge. The ADP code is written in C and Fortran. It was compiled with the cc and f77 compilers
300
name nug05 nug06 nug07 nug08 nugl2
Table 1: QAPLIB instances of dimension n < 12 LP-based lower bound linear programming relaxation n BKS R R D 9 5 3-body rows cols NZ(A) 1410 5850 5 50 50 50 825 86 86 3972 20232 6 86 2886 9422 57134 148 148 8281 7 148 204 214 19728 8 214 20448 139008 177432 299256 1954944 12 578 523 578
esc08a esc08b esc08c esc08d esc08e esc08f
8 8 8 8 8 8
roulO roul2
2 8 32 6 7 18
2 8 32 6 7 18
19728 19728 19728 19728 19728 19728
10 174220 170384 12 235528 224278
174220 235528
66620 90550 177432 299256
601400 1954944
scrlO scrl2
10 26992 12 31410
26874 29827
26992 31410
66620 90550 177432 299256
601400 1954944
lipalOa lipalOb
10 473 10 2008
473
473
2008
2008
66620 66620
601400 601400
0 2 22 2 0 18
20448 20448 20448 20448 20448 20448
90550 90550
139008 139008 139008 139008 139008 139008
using compiler flags CFLAGS = -0 -DVAX - c c k r -p and FFLAGS = -02 -p - t r a p u v . Running times were measured by making the system call times and converting to seconds, using the HZ defined in sys/param.h. ADP requires many parameters to be set. We used the parameter setting described in [9]. Table 2 summarizes these instances, listing for each instance, its name, dimension (n), best known solution (BKS), the lower bound computed by Resende, Ramakrishnan, and Drezner [9] by solving (5-9) (RRD95 bound), the lower bound resulting from the 3-body formulation (3-body), and the dimension of the 3-body linear programming formulation (rows, columns, and number of nonzeros in the coefficient matrix). Note that of the 17 instances, the lower bounds computed in [9] were tight for only 6 instances, whereas all 3-body lower bounds were tight. Table 2 summarizes the ADP runs. For each instance, the table lists its name, the number of interior point iterations (ipitr), number of conjugate gradient iterations (cgitr), maximum number of conjugate gradient iterations in a single interior point iteration (max-cgitr), average number of conjugate gradient iterations per interior point iteration (avg-cgitr), and number of preconditioners computed (#-precond),
301
name nug05 nug06 nug07 nug08 nugl2
Table 2: ADP interior ipitr cgitr max-cgitr 48 37 311 557 32 55 721 54 59 63 1036 83 91 4655 201
point solution statistics avg-cgitr #-precnd time (se 3.2s 36 6 12.2s 48 9 43.3s 51 12 139.1s 56 16 6504.2s 86 50
esc08a esc08b esc08c esc08d esc08e esc08f
57 65 61 61 66 59
1130 4682 953 1110 2472 879
75 425 79 251 101 53
19 70 15 17 36 14
53 60 51 51 62 53
146.4s 435.8s 130.6s 138.6s 275.5s 121.5s
roulO roul2
69 80
1476 2736
101 173
21 33
59 72
800.8s 4222.0s
scrlO scrl2
71 83
1788 3240
101 201
24 38
60 78
950.0s 5038.8s
lipalOa lipalOb
66 66
943 900
66 62
14 13
58 54
603.1s 580.1s
and the total CPU time in seconds. We make the following observations regarding the experimental results: • The lower bounds computed are tight for all instances tested. • No other lower bounding technique for the QAP has produced tight bounds for all instances from this set of problems. • CPU times ranged from a little over 3 seconds on the smallest instance to a little under 2 hours for the longest n = 12 run. In the concluding remarks we discuss the relevance of this to branch and bound methods.
3
Concluding remarks
In this paper, we used an interior point algorithm [5] that uses a preconditioned conjugate gradient algorithm to compute lower bounds for the QAP by solving a linear programming relaxation of the 3-body interaction formulation of Ramachandran and
302 Pekny [7]. On all QAPLIB [2] instances of dimension n < 12, the computed lower bounds were tight, i.e. they equaled the optimal objective function value. A good lower bound by itself is of little use. However, in a branch and bound algorithm, a good lower bound can make a significant difference. Ramakrishnan, Resende, and Pardalos [8] showed that the weaker LP-based lower bound (QAPLP) studied in [9] can reduce substantially the number of nodes of the branch and bound tree that need to be scanned. Though the solution time for computing those bounds is significantly greater than the time needed to compute the classical Gilmore-Lawler bound [3, 6], the large number of scanned nodes for a Gilmore-Lawler based branch and bound algorithm makes the LP-based branch and bound method more attractive, specially for large quadratic assignment problems. For example, using the branch and bound code described in [8], QAPLIB instance chr/18a was solved after scanning 18 level-1 nodes of the search tree and 17 level-2 nodes in about 1600s, while on the same machine the identical branch and bound code using the Gilmore-Lawler lower bound in place of the LP-based lower bound had not solved the problem after having scanned over 1636 million nodes in over 12 days of CPU time. To this date, there exist QAPLIB instances of dimension n = 16 that remain unsolved. Though solving a 3-body interaction lower bound for n = 16 is beyond the capabilities of today's LP solvers, one can use this bound deeper in the search tree, where the subproblems solved have smaller dimension. A practical approach is to combine the QAPLP lower bound to compute bounds for shallow search tree nodes, with the 3-body interaction lower bound to compute bounds for deeper nodes. Since the 3-body interaction LP contains the entire set of constraints of the LP used for the QAPLP bound, the 3-body bound will always be at least as good as the QAPLP lower bound. Lower bounds that are better than QAPLP but not as good as the 3-body bound can be computed by considering a subset of the constraints (11-14). The number of constraints used should be a function of the depth of the node being scanned in the search tree. Linear programming formulations of the QAP have been shown to produce tight bounds. Further understanding of structural properties of the QAP polytope will hopefully provide yet tighter bounds. For two recent investigations in this direction, see Rijal [10] and Jiinger and Kaibel [4].
References [1] W.P. Adams and T.A. Johnson. Improved linear programming-based lower bounds for the quadratic assignment problem. In P.M. Pardalos and H. Wolkowicz, editors, Quadratic assignment and related problems, volume 16 of DIM ACS
303 Series on Discrete Mathematics and Theoretical Computer Science, pages 43-75. American Mathematical Society, 1994. [2] R.E. Burkard, S. Karisch, and F. Rendl. QAPLIB - a quadratic assignment problem library. European Journal of Operational Research, 55:115-119, 1991. Updated version - Feb. 1994. [3] P.C. Gilmore. Optimal and suboptimal algorithms for the quadratic assignment problem. J. SIAM, 10:305-313, 1962. [4] M. Jiinger and V. Kaibel. A basic study of the QAP-polytope. Technical Report 96.215, Institut fur Informatik, Universitat zu Koln, Koln, Germany, 1996. [5] N.K. Karmarkar and K.G. Ramakrishnan. Computational results of an interior point algorithm for large scale linear programming. Mathematical Programming, 52:555-586, 1991. [6] E.L. Lawler. The quadratic assignment problem. Management Science, 9:586599, 1963. [7] B. Ramachandran and J.F Pekny. Higher order lifting techniques in the solution of the quadratic assignment problem. In State of the Art in Global Optimization: Computational Methods and Applications, pages 75-92. Kluwer Academic Publishers, 1996. [8] K.G. Ramakrishnan, M.G.C. Resende, and P.M. Pardalos. A branch and bound algorithm for the quadratic assignment problem using a lower bound based on linear programming. In State of the Art in Global Optimization: Computational Methods and Applications, pages 57-73. Kluwer Academic Publishers, 1996. [9] M.G.C. Resende, K.G. Ramakrishnan, and Z. Drezner. Computing lower bounds for the quadratic assignment problem with an interior point algorithm for linear programming. Operations Research, 43:781-791, 1995. [10] M. Rijal. Scheduling, design and assigment problems with quadratic costs. PhD thesis, New York University, 1995.
This page is intentionally left blank
Combinatorial and Global Optimization, pp. 305-316 P.M. Pardalos, A. Migdalas and R. Burkard, Editors © 2002 World Scientific Publishing Co.
G P S Network Design: An Application of the Simulated Annealing Heuristic Technique Hussain A. Saleh ( h . s a l e h 9 u e l . a c . u k ) School of Surveying, University of East London, UK.
Peter J. Dare ( p e t e r 9 u e l . a c . u k ) School of Surveying, University of East London, UK.
Abstract
In Global Positioning System network design, a given set of sessions is to be observed consecutively (known as a schedule) with a resultant total cost for the schedule. The objective is to minimize the total cost of the schedule by searching for the best order in which these sessions can be observed. Generally, solving large networks to optimality requires impracticable computational time. In this paper, a new Simulated Annealing heuristic technique is developed to provide near- optimal solutions for large networks within an acceptable amount of computational effort. The technique is validated and illustrated by comparing its outputs with existing construction methods and a branch and bound exact algorithm on a set of case studies. In this paper a Simulated Annealing heuristic algorithm, which takes into account many GPS design requirements, is applied to a network created for the Republic of Malta. The proposed algorithm has performed very well in terms of solution quality and computation time. K e y w o r d s : Combinatorial Optimisation Problem (COP), Global Positioning System (GPS), Heuristic and Simulated Annealing (SA).
306
1
Introduction
The Global Positioning System GPS is a world-wide navigation and positioning resource developed by the US Department of Defence for both military and civil use. This system is based on a constellation of 24 satellites orbiting the earth at an altitude of approximately twenty thousand kilometres (see Fig. 1). These satellites, which act as reference points from which receivers on the ground determine their position, provide the user with a 24-hour highly accurate three dimensional position and timing system at any location on the globe. The processes which are involved in GPS technology in surveying are becoming a major source in providing unprecedented accuracy at an economical rate compared to other techniques. On the other hand, GPS equipment is very expensive compared with other traditional methods, and this becomes crucial in large networks. Generally, solving these large networks to optimality requires an impracticable computation time. To avoid this, a new area of research that tries to provide near-optimal solutions for large networks within an acceptable amount of computational effort have been implemented. This research is based on the heuristic techniques within the field of Operational Research (OR) (Saleh and Dare 1997a, 1997b) (Dare and Saleh 1997). These techniques have been developed which allow the formulation of a strategy for designing GPS networks which maximise the GPS technology benefit by reducing the total cost of carrying out the work.
Figure 1: GPS satellite constellation (from Elliott 1996) In this paper, the Simulated Annealing algorithm (SA), which is based on the concept of exploring the vicinity of current solution, is introduced and applied for a network in the Republic of Malta. This technique was validated and illustrated by comparing its outputs with existing known optimal solutions previously computed by Dare (1995). The proposed SA heuristic has performed extremely well in terms of solution quality and computation time. The organisation of this paper is as follows. Section 2 describes the basic simulated annealing structure. In Section 3 the GPS surveying network is formulated as a
307 Combinatorial Optimisation Problem (COP). Section 4 defines the implementation of the GPS Simulated Annealing procedure. The numerical results and solution comparison are presented in Section 5. The paper ends with conclusions and some remarks.
2
Simulated Annealing technique
The SA approach imposes different randomized search, acceptance and stopping criteria on the local search method in order to escape poor quality local minima inherent in a Local Search (LS) descent methods (Dowsland, 1995), (Aarts and Van Laarhoven, 1987), (Rene, 1993), (Osman and Potts, 1989). This technique has its origins in statistical mechanics (Metropolis et al., 1953). The interest in SA began with the work of Kirkpatrick, Gelatt and Vecchi (1983) and Cerny (1985). This technique starts with an initial solution (within this context, an initial schedule). It then attempts to improve upon the current solution by a series of small local changes generated by a suitably defined neighbourhood mechanism until a stopping criterion is met. A neighbourhood I of a solution V is a set of solutions (Vl,..,Vn) that can be reached from V by a simple operation such as removing or adding an element (session) to V. Producing different solutions is achieved by perturbing the initial solution using a specific move type (Lenstra and Aarts, 1995). In this problem, a special move type has been designed to fit and match with the GPS technology (a move is a pair exchange i.e., a swapping of two sessions in a given schedule). These generated solutions (Vl,...,Vn) from a given solution S are called the neighbourhood of V. The aim is to minimise the objective function (cost) by sampling and examining the solution space (Q) by moving from one schedule to another. The introduction of the successful application of SA to problems in many different areas, such as, statistics, engineering, mathematical programming and operational research, has broadened the scope of local search and has led to new algorithmic templates such as, Tabu Search and variants of Genetic Algorithms (Thangish et al., 1994), (Zanakis et al., 1989) and (Lin et a l , 1993).
3
Formulation of the GPS surveying problem
The network in GPS can be defined as a set of stations (a,b,c,d,e,f, etc) which are co-ordinated by placing receivers (X, Y, Z, etc) on them to determine sessions (ab, ac, etc) between them as shown in Fig. (2). A session can be defined as a period of time during which two or more receivers are simultaneously recording satellite signals. The problem addressed is to search for the best order in which these sessions can be
308 organised to give the best schedule. Thus, the schedule can be defined as a sequence of sessions to be observed consecutively. In practise this means determining how each GPS receiver should be moved between stations to be surveyed in an efficient manner taking into account some important factors such as time, cost etc. From the above, a proper design to a network GPS survey can be guaranteed by good organisation of field-work components (Dare, 1995) such as: location of stations to be surveyed; type and number of used receivers; required sessions to be observed to make the network solvable and the length of session observations which is dependent on number of factors including the geometry of satellites, day or night observation and distance between stations etc. (Saleh, 1996). The formulation of a GPS network as a Combinatorial Optimization Problem (COP) is generally specified by a universe of the solution space Q. This solution space is the number of possible schedules, V!, where each V represents a schedule consisting of the required number of sessions U. The goal is to try to determine the most suitable receiver schedule over a combinatorial set of these feasible schedules giving the lowest cost. To represent the GPS surveying problem within the frame of heuristics, the following notations are used: N n R r PRi Sj V C(V) Q Oab I
: : : : : : : : : : :
the set of stations N = { 1 , . . . , n}; the number of stations; the set of receivers R = 1 , . . . , r; the number of receivers; the initial position of receiver i ; the set of stations visited by receiver i; a feasible solution which is defined as V = {Si, , S r }; the cost of schedule V; the universe of potential schedules Q = {1,., V}; the cost of moving reciever i from station a to station b; the set of sessions satisfying the imposed constraints (size of the neighbourhood).
The aim is to determine an optimal solution that minimizes the total cost of observing the whole network and satisfies the requirements of GPS surveying i.e., Minimize: C(V) subject to : V€l,
ICQ;
U S, > N; C(V) = £ C ( S , )
VieR;
ieSi
The original cost matrix represents the cost of moving a receiver from one point
309 to another and computed based upon the criteria to be minimized. For example as shown in Fig. (2), the cost of observing session (ac) after observing session (ab) is obtained by the cost associated by moving Rec.Y from (b) to (c) while Rec.X remains at (a). Subsequently, the cost of observing session (dc) is obtained by the cost associated by moving Rec. X from (a) to (d) while Rec.Y remains at (c). The cost could be evaluated purely upon time or purely upon distance, for more details about the evaluation of the cost matrix see Dare (1995).
4
The GPS-Simulated annealing algorithm
In order to implement SA for GPS design network, a number of decisions and choices have to be made. Firstly, the problem specific choices which determine the way in which the GPS network is modelled in order to fit into the SA framework. In other words, it involves the definition of the solution space (Q) and its neighbourhood (I) structure, the form of the cost function C(V) and the way in which a starting solution (V) is obtained. Secondly, the generic choices which govern the workings of the algorithm itself, are mainly concerned with the components of the cooling schedule: control parameter (T) and its initial starting value, the cooling rate (F) and the temperature update function, the number of iterations between decreases (L) and the condition under which the system will be terminated (Reeves, 1993). The GPS-SA procedure used in this work is designed and developed essentially from practical experience and the requirement of the GPS technique. A simple constructive procedure is proposed to obtain an initial feasible schedule (V) for the GPS network. The aim of this simple procedure, which was implemented as a greedy local search method, is to obtain quickly an initial schedule, for more details about this procedure see (Saleh and Dare, 1997c). The structure of the Simulated Annealing algorithm is shown in Fig.(3) below:
5
Computational results
The effectiveness of any iterative technique is partly determined by the efficiency of the generation mechanism and the way in which a neighbourhood is searched for a better solution. Lin (1965) suggested a generating mechanism called the -optimal ( -opt, is the number of interchange and in this case is 2) procedure based on arcs-exchange for the TSP. In this paper a similar procedure based on sessions-interchange for GPS network is adopted. It is preferable in heuristics to evaluate the proposed algorithm by comparison with an optimal solution. Fortunately, there are such known optimal solutions using exact algorithms for relatively small GPS surveying networks (Dare,
310 1995). Hence, the best known solution obtained by TSP algorithm is used as the known optimal value. The near-optimal schedules obtained using SA had the same result as the known optimal solutions (Saleh and Dare 1997c, 1998). To generalise the above SA procedure and work with larger networks with more objectives and constraints, the actual data set for network in Malta has been implemented. The GPS data and operational requirements for the session's observations were obtained from Dare (1994). The data set consisted of 37 sessions, which comprised two weeks of session observations during July-September 1993. One bench mark, which was available for this data set, allowed comparisons to be made as to the effectiveness and computational efficiency of the proposed simulated annealing procedure. This benchmark was the actual operating schedule used by the Mapping Unit of Malta as a part of establishing a new primary network spanning the islands of Malta, Gozo and Comino using GPS and reported in Dare (1994). This schedule, with cost of 2264 minutes, was manually generated and the time required to create it was created using intuition and experience on a day-to-day basis. At the end of one working day, a schedule for the following day was created. This observed schedule (Dare 1994) was adjusted to be an initial schedule used by the SA-GPS technique. Given the manually generated solution of the cost 1405 minutes as a starting solution, the overall cost was reduced to 1339 minutes in approximately two seconds on a Viglen p5/133 using software written in visual C + + . Fig. (5) shows a graphical depiction of the rapid convergence of the SA heuristic for this solution. From the above, the ability of the developed SA to generate rapidly high-quality solutions for designing the GPS networks can be seen. The swap neighbourhood structure is implemented, the geometric cooling scale is adopted. The SA performance was relatively insensitive to changes in T, L, and F, however, the selected cooling parameters are T = 6, an inner loop size of L=380 and a cooling factor of F=0.85.
6
Further work and conclusion
A special method based on Simulated Annealing for minimizing the cost of a GPS surveying schedule have been presented. Good approximate solutions may be obtained, however, by viewing the session scheduling problem as a (COP). The time required to produce this solution is of the order of seconds rather than hours or days. The presented method has been tested on several problems and appears to be a powerful technique. It is extremely fast; the calculation can be terminated as soon as the solution obtained is deemed to be sufficiently cheap for the wanted purpose and it can be adapted to solve other kinds of problems. In addition to the above advantages of using this procedure, the implementation of the session pairing model has greatly in-
311 creased the flexibility of the surveyors sessions scheduling process. The logistics (the most difficult part in observing the GPS network as shown in Dare 1994) is easily solved using the above heuristic procedure. As no optimal solution will be available for these large networks, the general applicability of other heuristic techniques such as Tabu Search (TS) to GPS surveying design is a matter for further research to provide solution comparisons.
Acknowledgement This research was supported by the Ministry of Higher Education, Syria. The authors wish to express their gratitude to Prof. Barry Gorham, Dr Brian Whiting, Dr Michael Peel and Richard Latham, also all at University of East London, for their support and help in achieving this result.
References
[I] Aarts, E. H. L. and Van Laarhoven, P. J. M. (1985). "Statistical cooling: a general approach to combinatorial optimization problems". Philips Journal of Research, Vol. 40, pp. 193-226.
[II] Cerny, V. (1985). "A thermodynamical approach to the travelling salesman problem: an efficient simulated annealing algorithm". Journal of Optimization Theory and Applications, 45, 41-51.
[III] Dare, P. J. (1994). "Project Malta '93: The Establishment of a New Primary Network for the Republic of Malta by use of the Global Positioning System". Report for Mapping Unit, Planning Directorate, Floriana, Malta.
[IV] Dare, P. J. (1995). "Optimal Design of GPS Networks: Operational Procedures". Ph.D. thesis, School of Surveying, University of East London, UK.
[V] Dare, P. J. and Saleh, H. A. (1997). "The use of heuristics in the design of GPS networks". Paper presented at the Scientific Assembly of the International Association of Geodesy (IAG97), Riocentro, Rio de Janeiro, Brazil, 3-9 September 1997.
[VI] Dowsland, K. (1995). "Variants of Simulated Annealing for Practical Problem Solving". In: Applications of Modern Heuristic Methods, Ed. V. Rayward-Smith, Alfred Waller Ltd, in association with UNICOM, Henley-on-Thames.
[VII] Elliott, D. (1996). "Understanding GPS: Principles & Applications". Artech House, Boston, Mass., USA.
[VIII] Kirkpatrick, S., Gelatt, C. D. and Vecchi, M. P. (1983). "Optimization by Simulated Annealing". Science, 220, 671-680.
[IX] Lenstra, J. K. and Aarts, E. H. L. (1995). "Local Search Algorithms". John Wiley & Sons, Chichester, UK.
[X] Lin, S. (1965). "Computer solutions of the travelling salesman problem". Bell System Technical Journal, Vol. 44, pp. 2245-2269.
[XI] Lin, F. T., Kao, C. Y. and Hsu, C. C. (1993). "Applying the genetic approach to simulated annealing in solving some NP-hard problems". IEEE Transactions on Systems, Man and Cybernetics, 23, 1752-1767.
[XII] Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. and Teller, E. (1953). "Equation of state calculations by fast computing machines". Journal of Chemical Physics, 21, 1087-1092.
[XIII] Osman, I. H. and Potts, C. N. (1989). "Simulated Annealing for permutation flow-shop scheduling". Omega, 17, 551-557.
[XIV] Vidal, R. V. V. (editor) (1993). "Applied Simulated Annealing". Springer-Verlag, Berlin, Heidelberg, Germany.
[XV] Reeves, C. R. (editor) (1993). "Modern Heuristic Techniques for Combinatorial Problems". Blackwell Scientific Publications, Oxford, England.
[XVI] Saleh, H. A. and Dare, P. J. (1998). "Efficient Simulated Annealing Heuristic Techniques for Designing GPS Surveying Networks". Paper presented at the Young Operational Research Conference, University of Surrey, Guildford, UK, 31 March to 2 April.
[XVII] Saleh, H. A. and Dare, P. J. (1997a). "A Heuristic Approach to the Design of GPS Networks". Paper presented at the United Kingdom Geophysical Assembly-21, University of Southampton, UK, 2-4 April.
[XVIII] Saleh, H. A. and Dare, P. J. (1997b). "The Design of GPS Networks using Heuristic Techniques". Paper presented at the Young Researchers Forum in Operational Research and Management Science, University of Southampton, UK, 17-18 April.
[XIX] Saleh, H. A. and Dare, P. J. (1997c). "A Simulated Annealing Approach for Designing a GPS Surveying Network". Paper presented at the 2nd Metaheuristics International Conference (MIC'97), Sophia-Antipolis, France, 21-24 July 1997.
[XX] Saleh, H. A. (1996). "Improvements to The GPSdemoUCL Simulation Software". MSc Dissertation in Surveying, Department of Geomatic Engineering, University College London, London, UK.
[XXI] Thangiah, S. R., Osman, I. H., Vinayagamoorthy, R. and Sun (1994). "Algorithms for vehicle routing problems with time deadlines". American Journal of Mathematical and Management Sciences, 13, 323-354.
[XXII] Zanakis, H., Evans, J. and Vazacopoulos, A. (1989). "Heuristic methods and applications: a categorized survey". European Journal of Operational Research, 43, 88-110.
Figure 2: Sessions observation using GPS receivers (data transmission by GPS satellites; sessions observed by receivers X, Y and Z at the stations, with per-session costs summing to the total cost of the schedule).
[I] THE PROBLEM SPECIFIC DECISIONS
(A) FORMULATING the original cost matrix:
Step 1. Insert the set of stations N. Insert the estimated cost C_ab for each receiver's move.
(B) CREATING the actual cost matrix (solution representation):
Step 2. Insert the set of receivers R. Define the sessions to be observed U. Insert the constraints which have to be met.
(C) DETERMINING an initial schedule:
Step 3. Generate a feasible solution V with cost C(V) using the actual cost matrix.
[II] THE PROBLEM GENERIC DECISIONS
(D) INITIALIZING the cooling schedule parameters:
Step 4. Set the initial starting value of the temperature parameter, T > 0. Set the temperature length, L. Set the cooling ratio, F.
[III] THE GENERATION MECHANISM
(E) SELECTING an acceptance strategy for generated neighbours:
Step 5. Select a neighbour V' of V, where V' ∈ I(V). Let C(V') be the cost of schedule V'. Compute the move value Δ = C(V') - C(V).
Step 6. If Δ < 0, accept V' as a new solution and set V = V'. Else (Δ ≥ 0): if e^(-Δ/T) > θ set V = V', where θ is a uniform random number, 0 < θ < 1; otherwise retain the current solution V.
(F) UPDATING the temperature:
Step 7. Update the annealing schedule parameters using the geometric cooling schedule T_(k+1) = F·T_k (k = 0, 1, 2, ...).
(G) TERMINATING the solution:
Step 8. Stop if the stopping criterion is met.
Step 9. Show the output. Declare the best solution. Declare the computation time. Otherwise, go to (E).
END.
Figure 3: The general framework for the SA-GPS procedure
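The framework of Figure 3 maps directly onto a short simulated-annealing loop. The sketch below is only an illustration, not the authors' implementation: the session representation, the cost function and the swap move are hypothetical stand-ins, while the default cooling parameters (T = 6, L = 380, F = 0.85) are the values quoted in the text.

```python
import math
import random

def simulated_annealing(schedule, cost, T=6.0, L=380, F=0.85, T_min=0.01):
    """Geometric-cooling SA with a swap neighbourhood (illustrative sketch).

    schedule : list of sessions (any orderable representation)
    cost     : function mapping a schedule to its total cost in minutes
    """
    current, best = schedule[:], schedule[:]
    while T > T_min:                      # outer loop: cooling
        for _ in range(L):                # inner loop of length L at fixed T
            neighbour = current[:]
            i, j = random.sample(range(len(neighbour)), 2)
            neighbour[i], neighbour[j] = neighbour[j], neighbour[i]   # swap move
            delta = cost(neighbour) - cost(current)
            # accept improving moves, and worsening moves with prob. exp(-delta/T)
            if delta <= 0 or math.exp(-delta / T) > random.random():
                current = neighbour
                if cost(current) < cost(best):
                    best = current[:]
        T *= F                            # geometric cooling T_(k+1) = F * T_k
    return best
```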
Figure 4: Malta GPS surveying network design (map of the islands of Malta, Gozo and Comino).
Figure 5: Current and best found solution quality versus iteration number for a Simulated Annealing heuristic, as it visits five local optima.
Combinatorial and Global Optimization, pp. 317-331 P.M. Pardalos, A. Migdalas and R. Burkard, Editors © 2002 World Scientific Publishing Co.
Global optimization for crack identification: impact-echo experiments
Georgios E. Stavroulakis ([email protected])
Institute of Applied Mechanics, Carolo Wilhelmina Technical University, Spielmannstraße 11, P.O. Box 3329, D-38106 Braunschweig, Germany
Abstract Model-based crack identification problems are considered for static and dynamic loadings, with an emphasis on unilaterally working cracks. The inverse problem is formulated as an output error (for instance, least squares) minimization problem. If only restricted knowledge of the position and the properties of the crack is available, the latter nonlinear least squares problem is nonconvex and, due to the unilateral contact effects, possibly nondifferentiable as well. The computational mechanics modelling of the problem at hand is outlined in this paper, previous attempts using optimization and neural networks are briefly described, and some numerical results obtained with genetic optimization techniques are presented. Keywords: Inverse problems, structural analysis, mechanics, quality control, crack identification.
1 Introduction
Flaw, crack or damage identification problems belong to the class of the so-called inverse problems in mechanics. The technological question arises during nondestructive testing and quality control of structures or structural elements. From the optimization point of view a model-based identification problem can be written as an error minimization problem. In fact, an appropriate parametrized model of the structure
with all expected defects is considered (i.e., the number of cracks, their position and other characteristic values). After solution of the direct mechanical problem, this model provides the structural response (i.e. stresses, displacements of the structure). Obviously the structural response depends on the values of the considered fault parameters. On the assumption that the adopted model includes (or may approximate with sufficient accuracy) the expected crack or flaw, one tries to find the latter deficiency by minimizing the error (i.e., the difference, in a specific norm, usually the Euclidean metric) between the parametrized structural response and the corresponding experimental measurements. Nondestructive testing methods which use static or dynamic loadings or wave propagation effects are formulated within the previously outlined, general framework. The application of the appropriate model depends on the studied problem. On the other hand, both discrete (e.g., the existence and the number of the cracks) and continuous variables (e.g., the crack length) may be involved. Inverse problems are generally ill-conditioned and require appropriate numerical techniques to avoid numerical instabilities during their solution. This point is investigated in several works concerning inverse problems in mechanics, acoustics etc. On the contrary, the global nature of the arising error minimization problems is mentioned only in a few publications. In particular, the nonconvexity of the problem is explained as follows. Let us consider the measurement error minimization problem as a composite function. The external component of this composite function may be a quadratic (cf., least square) error measure, thus a convex function. The internal component is the mapping between crack (or flaw) variables and the corresponding structural response. But this latter function is, in general, nonlinear. Thus the whole composite function is nonconvex, being a combination of a convex function with a nonlinear mapping. In several applications of practical interest, for example in the modelling of unilateral contact effects, the previously mentioned structural mapping is, in addition, nondifferentiable, which makes the composite error function nondifferentiable as well. Fortunately, engineering experience and the combination of several complementing inspection methods usually reduce the difficulties arising in practical applications. Actually, good initial choices of the involved variables are provided and, thus, classical local optimization techniques are successfully used. This means that one solves efficiently inverse problems for one isolated crack, in a structure of simple geometry, for instance in an infinite medium, by using appropriately chosen test loadings and local optimization techniques. The same inverse problem concerning a crack lying in a complicated, real-life plate with several external boundaries and loaded by an external dynamical test loading may be very difficult, since complicated wave propagation, reflection and phase transition phenomena take place. For all these cases global optimization techniques must be used.
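The composite-function argument can be made tangible with a minimal sketch. The toy forward map below is an assumed stand-in (not one of the mechanical models used in this paper); it shows how a convex squared-error measure wrapped around a nonlinear response mapping yields an error function with several local minima.

```python
import numpy as np

# Toy forward map: "crack parameter" z -> simulated response (nonlinear in z).
def response(z):
    return np.sin(3.0 * z) + 0.3 * z

z_true = 1.2
measured = response(z_true)

# Convex outer function (squared error) composed with the nonlinear map:
def error(z):
    return (response(z) - measured) ** 2

# Coarse scan exposing several local minima of the composite error function.
zs = np.linspace(-2.0, 4.0, 2001)
errs = error(zs)
local_minima = zs[1:-1][(errs[1:-1] < errs[:-2]) & (errs[1:-1] < errs[2:])]
print("local minima near:", np.round(local_minima, 2))  # several; only one is z_true
```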
It should be emphasized here that the calculation of the global optimum is crucial for inverse problems. A local minimum, which could be an acceptable solution for an optimal structural design or an optimal control application, may have no practical use if the values of the involved variables, for instance the prediction about the existence and the position of a crack, have an unacceptable error. In this paper results of recent computational investigations of crack identification problems are discussed. Modelling of the direct problem through boundary element techniques and solution of the inverse problem by means of some optimization methods are used. In particular, an impact-echo test is numerically modelled and the arising inverse problem is formulated as a global minimization problem. The purpose of this paper is to make people of the global optimization community aware of an interesting application of global optimization which is of importance for modern industrial applications. Several relevant publications are cited. A demonstration example is treated. Nevertheless, neither the review part of this paper nor the application is complete. The paper is organized as follows: a short bibliographical review of global optimization approaches for inverse problems in mechanics is given in the next section. The mechanical modelling of the direct, wave propagation problem which is used here is outlined in section three. The inverse problem is formulated in the subsequent section. Finally, some results obtained by genetic optimization techniques are presented in the last section.
2 Global optimization for inverse problems
From the published literature, it seems that searching for cracks or other flaws within elastic structural components by means of static test loadings leads to inverse problems which can be tackled by means of classical optimization tools. The ill-posedness of the problem, which leads to numerical instabilities, can usually be resolved by appropriate smoothing or regularization techniques. Flaw detection using static analysis results and classical optimization has been considered in [18]. For error-tolerant or online applications one has used Kalman filter techniques or neural network models. Nevertheless, usually no contact conditions between the crack sides are considered. Their introduction leads to additional difficulties due to nonconvexity and nondifferentiability of the arising problem. The analogous mathematical model within the framework of optimal structural design has recently been considered in [7], [11] (see also the previous works of the author in [14], [20]). More details on unilateral crack identification using static analysis and neural networks are given in our previous paper [15]. Frequency domain crack and flaw identification leads to more delicate error optimization problems.
The structural analysis operator is more complicated and depends on the frequency of the applied loading. Extensions of the previously outlined methods of the static case have been reported. It seems that the necessity of using global optimization techniques for this class of problems has been accepted in recent works (cf. the interval analysis application of [10] and the author's experience with genetic algorithms for flaw detection problems in [18]). One should mention here that a frequency domain formulation is a compromise between the complete, but demanding, dynamic modelling and the relatively simple, but sometimes of reduced practical importance, static model. Moreover, not all problems can be treated in the frequency domain, whose applicability is restricted to linear elastodynamic problems. Damage and crack identification using vibration measurements and, in particular, the eigenvalues and eigenmodes of the structure is another interesting area which will not be considered here (see, e.g., [9]). Real dynamic analysis problems usually concern transient dynamic loadings of short duration. A simple tool like a hammer may be used to produce this pulse. A wave propagation problem is a typical example for this class of problems, which form the basis for several nondestructive evaluation techniques. From the interaction of a suitable dynamic loading with cracks or flaws one may see the interior of a structure and identify possible defects there. One of these applications is the impact-echo technique which is modelled in this paper. Clearly, wave interaction with elastic media provides a wealth of information which can be used for inverse analysis purposes. For simple configurations appropriate features of the dynamical signals may be extracted. For instance, in the impact-echo method, the time between sending a dynamical signal and receiving its echo, which is reflected from a crack or an interface in front of the excitation device, gives information about the covered distance (thus, the position of the reflector). The inverse problem is then simplified to an association between these features and the unknown parameters (e.g., the crack position). Unfortunately, for complicated fault patterns and specimen geometries the results are too complicated. Their interpretation is sometimes an exercise for experienced personnel. It is not strange that recent investigations in this direction use soft computing tools for the solution of the underlying error optimization problem. Neural networks [22] and genetic optimization tools [23], [6], [5] have been used in several works.
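As a simple illustration of the feature extraction mentioned above, the lines below compute the classical impact-echo depth estimate from an echo travel time; the wave speed and travel time are assumed illustrative values, not data from this paper.

```python
# Classical impact-echo depth estimate: an echo arriving after time t from a
# reflector, with wave speed v, has travelled to the reflector and back,
# so its depth is roughly v * t / 2.  The values below are illustrative only.
v = 4000.0        # assumed P-wave speed in m/s (concrete-like material)
t = 1.25e-4       # assumed two-way travel time of the echo in seconds
depth = v * t / 2.0
print(f"estimated reflector depth: {depth:.3f} m")   # 0.250 m
```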
3 Mechanical modelling
The Linear Complementarity Problem (LCP) formulation of unilateral problems for elastic structures discretized by the Boundary Element Method (BEM) will be outlined in this section. Static and dynamic problems will be considered. More details on nonsmooth mechanics problems can be found, among others, in [12], [2], [4], [8].
3.1 Static problems
The starting point is the matrix formulation of the boundary element equations in elastostatics: Hu = Gt. (1)
In (1) u is the vector of nodal displacements at the boundary, t is the boundary traction vector and the nonsymmetric matrices H and G are appropriate influence matrices which are based on the used fundamental solution and the adopted boundary element discretization. Here we assume that on a part of the boundary unilateral contact relations hold. Thus both displacements and tractions of that boundary must remain, after appropriate rearrangement, as unknowns for the solution of the nonlinear, unilateral problem. To this end let u_cn, u_ct, t_cn, t_ct be the boundary nodal displacements and the boundary nodal tractions respectively along the normal (n) and the tangential (t) direction at the unilateral (contact) boundary. After partitioning we arrive from (1) at:

[ H_ff  H_fcn  H_fct ] [ x    ]        [ G_fcn ]         [ G_fct ]
[ H_cf  H_ccn  H_cct ] [ u_cn ]  = f + [ G_ccn ] t_cn +  [ G_cct ] t_ct.     (2)
                       [ u_ct ]
Here the BEM equations are written for the two parts of the boundary. These groups correspond to the two rows of the supermatrices H and G, i.e., the part of free (classical) boundaries (first subscript f) and the part of the contact interface (first subscript c). Finally, the vectors of boundary displacements and boundary tractions are composed of the parts which correspond to the bilaterally connected interface (with unknowns gathered in vector x, as usual, and given values used for the construction of vector f) and the contributions normally and tangentially to the unilateral part (with subscripts cn and ct respectively). The introduction of unilateral constraints leads to a LCP. For a frictionless unilateral contact problem (with t_ct = 0 and u_ct unconstrained in (2)) the unilateral contact relations of the boundary lying at an initial distance (gap) d from a rigid support read:

y_n = d - u_cn ≥ 0,   t_n = -t_cn ≥ 0,   y_n^T t_n = 0.     (3)

Solving (2) with respect to vector u_cn, one gets the flexibility relation:

y_n = d + y^0 + F_nn t_n.     (4)
Here vector d is the initial distance vector, y^0 is the deformation of the boundary due to the external loading or imposed displacements and classical boundary conditions, and F_nn is the flexibility matrix constructed from the BEM equation (2). The LCP is composed of relations (3), (4). For the frictional unilateral contact problem an analogous LCP reformulation is possible. Briefly speaking, one may first write, as previously, a flexibility relation:

[ y_n ]       [ y_n^0 ]   [ F_nn  F_nt ] [ t_n ]
[ y_t ] = d + [ y_t^0 ] + [ F_tn  F_tt ] [ t_t ]     (5)

Here the notation y_t = u_ct and t_t = t_ct has been used. Moreover, the Coulomb friction relations may be assumed in the form:

y_t = 0  iff |t_t| < μ |t_n|,
y_t > 0  for t_t = μ t_n,
y_t < 0  for t_t = -μ t_n,     (6)

where μ is the friction coefficient. A LCP formulation is obtained from (5), (3) and (6) by using additional slack variables, as described in [1], [8].
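For readers who want to experiment with the frictionless case, the sketch below solves the LCP (3)-(4) with a simple projected Gauss-Seidel iteration; the flexibility matrix, gap vector and load term are small made-up stand-ins, not data from the BEM model of this paper.

```python
import numpy as np

def solve_contact_lcp(F, d, y0, iters=200):
    """Projected Gauss-Seidel for y = d + y0 + F t, y >= 0, t >= 0, y.t = 0."""
    n = len(d)
    t = np.zeros(n)
    for _ in range(iters):
        for i in range(n):
            # residual of y_i with the contribution of t_i temporarily removed
            r = d[i] + y0[i] + F[i] @ t - F[i, i] * t[i]
            t[i] = max(0.0, -r / F[i, i])      # complementarity projection
    y = d + y0 + F @ t
    return t, y

# Illustrative data (assumed, not from the paper): 3 contact nodes.
F = np.array([[2.0, 0.3, 0.1], [0.3, 2.0, 0.3], [0.1, 0.3, 2.0]])  # SPD flexibility
d = np.array([0.0, 0.05, 0.0])          # initial gaps
y0 = np.array([-0.1, 0.02, -0.04])      # deformation due to the external loading
t, y = solve_contact_lcp(F, d, y0)
print("contact tractions:", np.round(t, 4))
print("complementarity y*t:", np.round(y * t, 8))
```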
3.2 Dynamic problems
In this case, after time discretization, for simplicity with a constant time step, the matrix formulation of the boundary element method in the k-th time step reads (cf. [2], [19]):

H^(1) u^(k) = G^(1) t^(k) + Σ_{m=1}^{k-1} [ G^(k-m+1) t^(m) - H^(k-m+1) u^(m) ].     (7)
In (7) the superscripts of the displacement and traction variables denote the time step in which the respective quantities are calculated, while for the influence matrices they denote the difference between the current (k) and the considered (m) time step. Moreover, matrix H is appropriately modified to take into account the dynamic effects, while the contribution of all previous steps is taken into account by the last terms in (7). From (7) an LCP problem can be formulated, in analogy to the previous static problem, for each time step k.
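A minimal sketch of the time-marching structure of (7) is given below; the influence matrices and prescribed tractions are random placeholders standing in for the actual BEM matrices, and each step simply solves the linear system without the unilateral (LCP) modification.

```python
import numpy as np

def march(H, G, t_hist, steps):
    """Time-marching for H^(1) u^(k) = G^(1) t^(k) + sum_{m=1}^{k-1}
    [G^(k-m+1) t^(m) - H^(k-m+1) u^(m)]; H[j], G[j] hold the matrices of
    time-difference j+1 (0-based storage)."""
    u_hist = []
    for k in range(1, steps + 1):
        rhs = G[0] @ t_hist[k - 1]
        for m in range(1, k):                       # history (convolution) terms
            j = k - m                               # index of G^(k-m+1), H^(k-m+1)
            rhs += G[j] @ t_hist[m - 1] - H[j] @ u_hist[m - 1]
        u_hist.append(np.linalg.solve(H[0], rhs))   # solve for u^(k)
    return u_hist

# Assumed toy data: 4 boundary unknowns, 5 stored influence matrices, 5 steps.
rng = np.random.default_rng(0)
H = [np.eye(4) + 0.1 * rng.standard_normal((4, 4)) for _ in range(5)]
G = [0.1 * rng.standard_normal((4, 4)) for _ in range(5)]
t_hist = [rng.standard_normal(4) for _ in range(5)]
u = march(H, G, t_hist, steps=5)
print("u at final step:", np.round(u[-1], 3))
```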
3.3 Impact-echo example
An elastic layer glued onto a rigid sublayer (support) is considered. The possibility of finding an interlayer crack by using impact-echo measurements on the accessible surface of the structure is investigated. The modelling of the direct mechanical problem with the previously outlined theory is described here.
Figure 1: A layer with a crack. BEM discretization.
A two-dimensional elastic plate under plane-stress conditions is studied. The configuration of the plate is shown in Fig. 1. A time dependent impact-like loading is considered on the free boundary AF. The opposite side (parts BC and DE of side BE) is considered to be fixed at a rigid support. A potential subsurface crack is assumed at the part CD of the side BE. Either a classical, bilaterally working, unpressurized crack is assumed, or a unilateral crack with zero initial gap is considered. The modelling follows the lines of the previous section. An elastic material with modulus of elasticity G = 100000.0, Poisson's ratio ν = 0.30 and mass density ρ = 1000.0 is assumed. All variables are given in compatible units in this section. A plate with depth equal to d is assumed. The spatial boundary element discretization of the problem with quadratic boundary elements is shown in Fig. 1. A total of 80 quadratic boundary elements, i.e., 160 boundary nodes, have been used. The impact loading has a zig-zag form, is of one period duration and takes a maximum value equal to 100.00 at time point 0.06, a minimum value equal to -100.00 at time instance 0.18 and becomes zero at time 0.24. The position of the loading is determined by the length of segments AH = 12.08 and AG = 12.91, the total length of boundary AF being equal to 25.00. With a time step Δt = 0.02025, a total of 100 time steps is analysed. The existence and the position of the hidden crack influence the impact-echo results, which therefore can be used for inverse crack analysis purposes. For a plate of thickness equal to d = 2.50, a crack of length equal to 7.50 and a loading perpendicular to the free boundary AF (i.e., along the Ox direction), the displacements of the free boundary AF in the calculated time interval are plotted in Figure 2. Normal u_x and tangential u_y displacement components are plotted for the 60 boundary nodes and time steps 80 to 100.
Figure 2: Influence of a classical or unilateral sublayer crack on the displacement history of the free surface (u_x and u_y components for the no-crack, classical crack and unilateral crack cases).
Figure 3: Effect of the position of a sublayer crack on the impact-echo waveform (difference waveforms for the crack positions a-d, plotted against the time step).
For an impulsive loading parallel to the plate's boundary (i.e., along the Oy direction), the difference of the displacement responses between the crack and the no-crack cases in the same (Oy) direction is plotted in Fig. 3. A crack of length l_c = 7.50 is considered. The center of the crack lies at y_c = 9.99 for case a, at y_c = 11.66 for case b, at y_c = 13.33 for case c, and at y_c = 15.00 for case d. The distance y_c is measured from point B of the boundary BE (see Figure 1). A unilateral contact crack with a friction coefficient equal to μ = 0.30 is assumed for the plots of Fig. 3. Finally, for the same crack length and for several values of the thickness of the plate (the depth at which the crack lies) a parametric investigation has been performed. The corresponding waveforms at the point where the external dynamic loading has been applied are plotted in Figure 4. Both the loading and the measured displacements (waveforms) are perpendicular to the free boundary of the plate. The upper set of results in Figure 4 corresponds to classical, unpressurized cracks and (from the top to the bottom) to plate thicknesses equal to d = 1.00, 1.25, 1.50, 1.75, 2.00, 2.25, 2.50 respectively. The lower set of results corresponds to unilateral cracks with zero initial crack opening. The effect of the different wave inversion and reflection phenomena due to the two mechanisms is obvious. A detailed investigation of the previously outlined impact-echo problem can be found in [16].
Figure 4: Influence of layer thickness and crack properties on the waveforms.
4 Inverse problem
Let us consider that the sought crack can be fully characterized by a set of variables grouped together in the vector z. The existence or not of a crack at a given position (i.e., a discrete variable) and the length of a crack or its position (continuous variables) are examples of possible elements of z. The mechanical model outlined in the previous section includes the parameters z. Accordingly, the response of the structure (i.e. the displacements u, the boundary tractions t, the stresses etc.) is parametrized by z and depends on its current value. Let us assume that the waveforms at given points of the structure j = 1, ..., n, i.e., the time histories of displacements u_j^(i)(z), i = 1, ..., m, are computed with this model. Here the notation of (7) is used with a given time discretization. Moreover, n measurement points and m time steps are considered. For the real structure with an existing crack the corresponding waveforms (for instance, the experimental measurements) read, after discretization, ū_j^(i). The inverse problem leads to the following least square error minimization:

min_z  Σ_{j=1,...,n} Σ_{i=1,...,m} [ u_j^(i)(z) - ū_j^(i) ]^2.     (8)
Problem (8) is actually a waveform matching technique. It is a nonlinear least squares, and thus a global optimization, problem, due to the nonlinear relation z → u(z), as has been discussed previously. The solution of problem (8) provides an estimate of the sought crack. It should be noted that waveform or stress-wave matching techniques are powerful methods for inverse nondestructive evaluation. Due to their difficulty, very few recent investigations try to formulate them in a rigorous mathematical optimization framework. Related studies which use neural network techniques have been reported recently. Among others, [13] use waveform matching of an impact-echo signal in layered media and [21] use stress-wave matching for pile capacity prediction.
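The error measure (8) is straightforward to code once simulated and measured waveforms are available; the sketch below is only a schematic illustration in which the waveform arrays are assumed inputs (for example produced by a BEM model or read from experiments).

```python
import numpy as np

def matching_error(simulated, measured):
    """Least-squares waveform-matching error of eq. (8).

    simulated, measured : arrays of shape (n_points, m_steps)
    """
    return float(np.sum((simulated - measured) ** 2))

def log_error(simulated, measured, eps=1e-12):
    # The logarithm of the error is often better behaved numerically
    # (cf. the logarithmic error function plotted in the paper).
    return float(np.log(matching_error(simulated, measured) + eps))

# Assumed toy waveforms: 3 measurement points, 100 time steps.
rng = np.random.default_rng(1)
measured = rng.standard_normal((3, 100))
simulated = measured + 0.05 * rng.standard_normal((3, 100))
print(matching_error(simulated, measured), log_error(simulated, measured))
```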
4.1 Impact-echo example
Let us consider the different plate depths of Figure 4 and the corresponding impact-echo waveforms for the case with a classical crack, a unilateral crack without friction and a unilateral crack with friction coefficient equal to μ = 0.30 (the latter case is not shown in Fig. 4). On the assumption that the measured waveform corresponds to a unilateral crack without friction behind a layer of depth equal to d = 1.75, the minimization problem (8) is used for the inverse analysis. For clarity of the presentation, but also for better numerical performance (cf. [15]), the logarithm of this function is used in the minimization procedure. For this case the error function is plotted in Figure 5. Here crack type 1 corresponds to the classical crack case, crack type 2 is the unilateral crack case and crack type 3 is the frictional crack case. The seven different layer depths are used in the other direction of this plot. In a preliminary investigation the problem has been solved by a genetic optimization algorithm. In order to reduce the time used in the computer implementation, the results of the parametric investigation (cf. Fig. 4) are first produced and stored off-line. All other results for intermediate thicknesses are produced by interpolation between the tabulated, stored waveforms (a strategy which is analogous to the response function approximation widely used in complicated structural optimization tasks). The genetic optimization method is the same as the one used in [18] with respect to flaw identification problems using frequency domain data. The method is described in [3] and a FORTRAN implementation is also available. It should be mentioned here that a back-propagation neural network approach has been able to solve this problem only partially. In fact, for a given crack type the question of determining the depth of the layer is a trivial task for the neural network. The problem is due to overtraining of the neural network. Analogous deficiencies of the neural network approach, which have been observed in neural crack identification in frequency domain elastodynamics (see [17]), led us to use global, genetic algorithms (see also [18]).
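The off-line tabulation strategy described above can be mimicked as follows; the table of stored waveforms, the depth grid and the objective call are hypothetical placeholders, meant only to show how a surrogate of the error function over (crack type, layer depth) might be assembled for a genetic or other global optimizer.

```python
import numpy as np

# Assumed tabulated data: for each crack type, waveforms at 7 stored depths.
depths = np.array([1.00, 1.25, 1.50, 1.75, 2.00, 2.25, 2.50])
rng = np.random.default_rng(2)
tables = {ct: rng.standard_normal((7, 100)) for ct in (1, 2, 3)}  # (depth, time)
measured = tables[2][3]      # pretend the "measurement" is crack type 2, d = 1.75

def waveform(crack_type, d):
    """Linear interpolation between the stored waveforms (response surrogate)."""
    tab = tables[crack_type]
    d = np.clip(d, depths[0], depths[-1])
    k = np.clip(np.searchsorted(depths, d) - 1, 0, len(depths) - 2)
    w = (d - depths[k]) / (depths[k + 1] - depths[k])
    return (1 - w) * tab[k] + w * tab[k + 1]

def objective(crack_type, d):
    return np.log(np.sum((waveform(crack_type, d) - measured) ** 2) + 1e-12)

print(objective(2, 1.75), objective(1, 2.4))   # the true combination scores lowest
```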
Figure 5: Plot of the logarithm of the error function over crack type and layer depth.
5 Conclusion
The natural framework to formulate and solve crack identification problems and, in general, inverse problems in mechanics is the use of global error optimization. It should be emphasized that a local minimum may be of no use in this context. The author's recent experience in this area is briefly described here. A simple impact-echo problem is presented in some detail, in order to give the reader a feeling of what kind of data may be involved in these studies. More complicated structural identification problems arise in real-life applications. The use of appropriate global optimization algorithms in connection with powerful structural analysis techniques will help automate the solution of the corresponding inverse problems. In this area, by restricting the interest to more elementary applications and by taking advantage of existing engineering experience, one often tries to avoid the use of global optimization. The approach outlined in this paper and recent advances in both structural analysis and instrumentation technology will make a larger class of inverse problems in mechanics tractable.
Acknowledgements The author would like to thank Prof. Dr. Heinz Antes, from the Technical University of Braunschweig, for helpful discussions and support.
References [1] A.M. Al-Fahed, G.E.Stavroulakis, P.D. Panagiotopoulos (1991), "Hard and soft fingered robot grippers", Zeitschrift fur Angew. Mathematik und Mechanik ZAMM 71 (7/8), 257-266. [2] H. Antes, P. D. Panagiotopoulos (1992), The boundary integral approach to static and dynamic contact problems. Equality and inequality methods, Birkhauser, Basel, Boston, Stuttgart. [3] D.L. Carroll (1996), "Chemical laser modeling with genetic algorithms", AIAA Journal, 34(2) 338-346. [4] V.F. Dem'yanov, G.E. Stavroulakis, L.N. Polyakova, P.D. Panagiotopoulos (1996), Quasidifferentiability and nonsmooth modelling in mechanics, engineering and economics, Kluwer Academic, Dordrecht, Boston, London. [5] O. Hunaidi (1998), "Evolution-based genetic algorithms for analysis of nondestructive surface wave tests on pavements", NDT & E 31(4), 273-280. [6] L.E. Kannal, J.F. Doyle (1997), "Combining spectral super-elements, genetic algorithms, and massive parallelism for computationally efficient flaw detection in complex structures", Computational Mechanics 20, 67-74. [7] Z.Q. Luo, J.S. Pang, D. Ralph, D. (1996), Mathematical programs with equilibrium constraints, Cambridge University Press, Cambridge. [8] E.S. Mistakidis, G.E. Stavroulakis (1998), Nonconvex optimization in mechanics. Algorithms, heuristics and engineering applications by the F.E.M., Kluwer Academic, Dordrecht, Boston, London. [9] H.G. Natke, C. Cempel (1997), Model-aided diagnosis of mechanical Springer Verlag, Berlin, Heidelberg.
systems,
[10] M. Oeljeklaus, H.G. Natke, H.G. (1996), "Parallel interval algorithm for parameter identification in the frequency domain". Inverse Problems in Engineering, 3, 305-325.
330 [11] J. Outrata, M. Kocvara, J. Zowe (1998), Nonsmooth approach to optimization problems with equilibrium constraints. Theory, applications and numerical results. Kluwer Academic, Dordrecht, 1998. [12] P. D. Panagiotopoulos (1985), Inequality problems in mechanics and applications. Convex and nonconvex energy functions, Birkhauser, Basel, Boston, Stuttgart. [13] D. Pratt, M. Sansalone (1992), "Impact-echo signal interpretation using artificial intelligence", ACI Materials Journal 89(2), 179-187. [14] G.E. Stavroulakis (1995), "Optimal prestress of cracked unilateral structures: finite element analysis of an optimal control problem for variational inequalities," Computer Methods in Applied Mechanics and Engineering, 123 231-246. [15] G.E. Stavroulakis, H. Antes (1997), "Nondestructive static unilateral crack identification. A BEM - Neural Network approach," Computational Mechanics, 20(5), 439-451. [16] G.E. Stavroulakis (1998), "Impact-echo from a unilateral interlayer crack. LCPBEM modelling and neural identification", Engineering Fracture Mechanics (submitted) . [17] G.E. Stavroulakis, H. Antes (1998), "Neural crack identification in steady state elastodynamics", Computer Methods in Applied Mechanics and Engineering (in press). [18] G.E. Stavroulakis, H. Antes (1998), "Flaw identification in elastomechanics: BEM simulation with local and genetic optimization", Structural Optimization (in press). [19] G.E. Stavroulakis, H. Antes, P.D. Panagiotopoulos (1998), "Transient elastodynamics around cracks including contact and friction", Computer Methods in Applied Mechanics and Engineering, (in press). [20] G. E. Stavroulakis, H. Giinzel (1998), "Optimal structural design in nonsmooth mechanics", In: Multilevel Optimization: Algorithms and Applications, A. Migdalas, P. Pardalos, P. Varbrand (eds.), Kluwer Academic, Dordrecht, Chapter 4 pp. 91-115. [21] C.I. Teh, K.S. Wong, A.T.C. Goh, S. Jarithojam (1997), "Prediction of pile capacity using neural networks", ASCE J. of Computing in Civil Engineering 11(2), 129-138. [22] G. Yagawa, H. Okuda (1996), "Neural networks in computational mechanics", Archives of Computational Methods in Engineering 3(4), 435-512.
[23] H. Yamanaka, H. Ishida (1996), "Application of genetic algorithms to an inversion of surface-wave dispersion data", Bulletin of the Seismological Society of America, 86(2), 436-444.
Combinatorial and Global Optimization, pp. 333-355 P.M. Pardalos, A. Migdalas and R. Burkard, Editors © 2002 World Scientific Publishing Co.
Normal Branch and Bound Algorithms for General Nonconvex Quadratic Programming
Hoang Tuy (htuy@ioit.ncst.ac.vn)
Institute of Mathematics, P.O. Box 631, Bo Ho, Hanoi, Vietnam
Abstract We discuss a general class of branch and bound (BB) algorithms for solving nonconvex quadratic programs. Bounds in these algorithms satisfy a condition of normality, characterized by the property that convergence of the algorithm is ensured whenever a suitable branching rule (called "normal") is used. Most of the currently best known bounding methods satisfy this normality condition and so can be systematically incorporated into normal BB algorithms. A general method for generating normal bounds is proposed which consists in converting the problem into an equivalent one with all the nonconvexity concentrated in a number of coupling constraints, such that when the latter are properly relaxed the problem becomes easily solvable. Existing and new relaxations are discussed within this framework of variables decoupling. K e y w o r d s : Nonconvex quadratic programming, branch and bound algorithm, normal bounding, normal branching, variables decoupling, normal linear relaxations, normal SDP relaxations.
1 Introduction
We are concerned with the general quadratic programming problem

(QP)    f_* := min{ f_0(x) | f_k(x) ≤ 0, k = 1, ..., m, x ∈ X }     (1)

where X is a polyhedron in R^n and f_k(x), k = 0, 1, ..., m, are quadratic functions:

f_k(x) = (1/2)⟨x, Q_k x⟩ + ⟨c_k, x⟩ + b_k     (2)
(c* € Rn, bk G i?, and Q* being n x n symmetric real matrices). Due to numerous applications ([20], [7], [14], [2],[17], [24], [8] and references therein) this problem has been a subject of extensive investigation in recent years. Several approaches have been proposed for solving it, most of which are of branch and bound type and differ by the way of computing bounds (bounding) or/and the way of partitioning (branching). Let QP(M) be the problem QP restricted to a partition set M, i.e. the problem obtained from replacing X by XDM. Usually, a (lower) bound on M is obtained as the optimal value in an auxiliary problem RP(M) called a relaxation of QP(M). For instance, a relaxation can be defined by replacing the feasible domain with some larger polyhedron or convex set, and/or replacing the objective function with a minorant whose minimum over the relaxed domain can easily be computed. Several different relaxations along this line have been proposed in the literature. Let us mention the Lagrangian relaxation providing dual estimates [20], the linear relaxation [2], the reformulation-linearization and convexification [17], and more recently, the SDP (semidefinite programming) relaxation ([13], [8],[14], [15])... To incorporate the corresponding bounds into a BB algorithm one must devise a branching rule consistent with the bounding in order to ensure convergence. So far most BB algorithms use exhaustive rectangular partition, i.e. the space is divided into (hyper)rectangles in such a way that any nested sequence of partition sets generated by the subdivision process shrinks to a single point. The disadvantage of such exhaustive subdivision processes is that the guaranteed convergence cannot usually be very fast, because information on the current problem situation is not suitably exploited, and the diameter of the current most promising partition set often decreases slowly. The purpose of the present paper is to discuss a general class of branch and bound algorithms for solving QP called normal. In these algorithms bounds satisfy a certain tightness condition, while branching exploits this property to accelerate the convergence. As it turns out, the normality property is satisfied by many bounds so far developed by different authors. Therefore, each of these boundings can be systematically incorporated into a normal BB algorithm, whose convergence is guaranteed and usually better than exhaustive procedures. A general method for generating normal bounds is by variables decoupling. The essential idea of this method is to convert the problem into an equivalent one with all the nonconvexity concentrated in a number of "coupling constraints". When the latter are properly relaxed, the problem should become linear, or convex, or easily solvable. Since a certain amount of coupling information is contained in the linear constraints, to exploit this information a device is proposed which consists in transforming a set of linear constraints into equivalent quadratic ones. Using this transformation one can
335 develop various normal boundings, and, in particular, convert any given problem with mixed linear and quadratic constraints into an all-quadratic program, which may be more convenient for certain efficient relaxation methods. These results shed new light on the relationship between various relaxation procedures and may lead to a more efficient use of these relaxations for solving nonconvex quadratic programs. The paper is organized as follows. After the Introduction, we shall introduce in Sections 2 and 3 the concepts of normal bounds and normal branching, which are the two basic constituents of a normal BB algorithm for solving QP. Examples will also be given to illustrate these concepts and show that some existing BB algorithms for quadratic programming use actually normal bounds and could therefore be improved by "normalizing" the branching operation. The next sections are mainly devoted to normal bounding methods based on decoupling relaxation. As applied to QP these methods often require certain sets of linear constraints to be transformed into equivalent sets of nontrivial quadratic constraints. Section 4 discusses this transformation in a general setting. The general decoupling scheme is described in Section 5, and a hierarchy of normal relaxations developed within this scheme is discussed in the last Sections.
2 A generic BB algorithm
In this section we present a generic branch and bound algorithm for (QP). A branch and bound (BB) algorithm involves two basic operations: branching, i.e. partitioning the space into polyhedral domains of the same kind ((hyper)rectangles, simplices or cones); and bounding, i.e. for each partition set M estimates a lower bound P(M) for the objective function values over the feasible points in M. For definiteness, below we shall assume that the partition sets are (hyper) rectangles of the form M = [p, q] := {x € Rn\ p < x < q}.
• NORMAL BOUNDING
To each partition set M is associated with the subproblem: (QP(M))
f*M := min{/ 0 (x)| fox) < 0{i = 1 , . . . ,m), x e X n M}
The bounding operation consists of assigning M a number /3(M) (lower bound) satisfying
W) < rM. 0) This is done by considering a relaxation RP(M) of QP(M) which is easier to solve and such that its optimal value yields /3 = (M) satisfying (3). The problem RP(M)
336 will be referred to as the bounding problem for M. The bounding (or the relaxation) is said to be normal if it is such that: (*) Any corner (extreme point) of M which is feasible to RP(M) is also feasible to QP(M) and for any nested sequence {M„} of partition sets if a corner x* of M» = n ^ M „ is the limit of a sequence x" such that x" is feasible to RP(M„), then x* is feasible to QP. One usual method of relaxation is by means of convex minorants. A function
(4)
Proof We need only prove the second part of condition (*) because the first part is obvious. Let x* be a vertex of M, = nf=1Mv such that x* = lima;", where each x" is feasible to RP(M„). Denote the vertex set of M„ by V(M„) = {v"- 1 ,..., w"'2"}. By taking subsequences if necessary, we may assume that vv'x —> v*''. It is easy to see that M, is just the rectangle with vertices v*'1,..., i>*'2". Indeed, if a; € M* then for every v one must have x = YA=I ^v,ivV'\ 'with \ „ t i > 0, Y^li K,i = 1- We may assume K,i -> K,i with XUii > 0, E 2 =i K,iV*'i = 1. Thus, M* C [v*'1,... ,v*'2"}. Since any v*'% obviously belongs to M», the converse is also true, proving our assertion. It then follows that x* = limi;", with v" e V(MV). Then v" — x" —> 0, hence, by hypothesis, for every k : ^"{vv) - ^"(x") -> 0. Since
• NORMAL BRANCHING
The algorithm starts with an initial rectangle M_0 known to contain an optimal solution. At iteration k of the algorithm, a number of partition sets may have proved to be nonpromising and hence have been discarded from further consideration. The search is then continued by further dividing the partition set M_k that corresponds to the smallest lower bound among all partition sets still of interest. For this operation many existing BB algorithms use bisection, i.e. M_k is subdivided into two equal subrectangles by a hyperplane perpendicular to an edge of M_k at its midpoint. This edge is chosen so as to ensure that any infinite nested sequence {M_{k_ν}} generated by
the subdivision process collapses to a point, i.e. satisfies diam M_{k_ν} → 0 as ν → +∞. Such a property, called exhaustiveness of the subdivision process, is usually sufficient to guarantee convergence of the algorithm, but the convergence achieved by exhaustiveness may be very slow, because it may take many iterations to significantly reduce diam M_k. To better exploit the normality property of the bounding we use a more flexible subdivision rule called "normal subdivision" and defined as follows. At iteration k let ω(M_k) be an optimal solution of the bounding problem for M_k. If ω(M_k) happens to be a corner of M_k then, by normality of the bounding, ω(M_k) is feasible to QP(M_k), hence min QP(M_k) = β(M_k) ≤ min (QP) (because β(M_k) ≤ β(M) for all partition sets M generated up to this stage), hence β(M_k) = min (QP), i.e. ω(M_k) solves (QP). So M_k = [p^k, q^k] is a candidate for further subdivision only when ω^k := ω(M_k) is not a corner of M_k. Intuitively, to ensure fast convergence the branching should help to bring the point ω(M_k) as fast as possible to a corner of the partition set. This motivates the following

Normal Subdivision Rule: Define

η_i^k = min{ ω_i^k - p_i^k, q_i^k - ω_i^k },   i_k ∈ argmax{ η_i^k | i = 1, ..., n }

and divide M_k into two subrectangles

M_{k,1} = M_k ∩ {x | x_{i_k} ≥ ω_{i_k}^k},   M_{k,2} = M_k ∩ {x | x_{i_k} ≤ ω_{i_k}^k}

(we say that M_k is bisected via (ω^k, i_k)). Once the branching and bounding have been specified, the BB algorithm proceeds according to a standard scheme in which, at a general iteration k:
- If for some partition set M the optimal solution ω(M) of the associated bounding subproblem happens to be a corner of M then ω(M) is a feasible solution and can thus be used to update the incumbent (current best solution) x^k.
- The candidate for branching is M_k ∈ argmin{β(M) | M ∈ R_k}, where R_k denotes the collection of all partition sets M still of interest at this iteration, i.e. such that β(M) < f(x^k).
- The algorithm terminates when R_k = ∅ (then x^k is a global optimal solution).
A BB algorithm using normal bounding and normal branching is called normal.

Theorem 1 A normal BB algorithm, if infinite, generates an infinite sequence {x^k}, every cluster point of which is an optimal solution of (QP).
Proof The subdivision ensures that, when infinite, the algorithm generates a nested sequence of rectangles {M_ν = [p^ν, q^ν]} such that ω^ν := ω(M_ν) → x* and x* is a corner of [p*, q*] = ∩_ν [p^ν, q^ν] (see [22], Theorem 5.5). Let M_* = [p*, q*]. From the definition of normal bounding, x* is feasible to (QP) and f(x*) = lim f(ω^ν) ≤ f_*. Hence f(x*) = f_*, i.e. x* is an optimal solution. • Thus, each normal bounding can be systematically combined with a normal branching rule to yield a convergent BB algorithm.

Remark 1 (Simplicial Subdivision) Sometimes it is more convenient to partition the space into simplices. For simplicial subdivision the above concept of normal bound applies, but a normal subdivision rule is defined as follows ([22]): Let α(M) be the generation index of a partition set M, defined inductively so that the initial simplex has index 1 and, if α(M) = ν, then any son of M has index ν + 1. Let A be a given infinite sequence of natural numbers. If α(M_k) ∈ A, then bisect M_k; otherwise divide M_k via the point ω(M_k). A subdivision via ω(M_k) is called an ω-subdivision. The choice of the sequence A is up to the user. It is designed to ensure that any infinite nested sequence of partition sets involves infinitely many bisections. In practical implementation, it suffices to perform ω-subdivision in most iterations, and to use bisection only occasionally, as a means to overcome jams when the algorithm seems to slow down.
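To make the selection and normal-bisection logic concrete, here is a schematic sketch of the branch-and-bound loop; the bounding routine relax_bound is an assumed black box returning a lower bound β(M) and a relaxation optimizer ω(M), so this illustrates only the bookkeeping, not any particular bounding method from the paper.

```python
import heapq

def bisect_normal(box, omega):
    """Bisect box = (p, q) via (omega, i_k) as in the Normal Subdivision Rule."""
    p, q = box
    # eta_i = min(omega_i - p_i, q_i - omega_i); branch on the index maximizing it
    i_k = max(range(len(p)), key=lambda i: min(omega[i] - p[i], q[i] - omega[i]))
    lower = (p, tuple(omega[i] if i == i_k else q[i] for i in range(len(q))))
    upper = (tuple(omega[i] if i == i_k else p[i] for i in range(len(p))), q)
    return lower, upper

def normal_bb(box0, relax_bound, f0, tol=1e-6, max_iter=1000):
    """relax_bound(box) -> (beta, omega) with omega a tuple; f0(x) -> objective of x."""
    best_val, best_x = float("inf"), None
    beta0, omega0 = relax_bound(box0)
    heap = [(beta0, omega0, box0)]
    for _ in range(max_iter):
        if not heap:
            break                                  # no promising boxes left
        beta, omega, (p, q) = heapq.heappop(heap)
        if beta >= best_val - tol:
            continue                               # cannot improve the incumbent
        if all(w in (lo, hi) for w, lo, hi in zip(omega, p, q)):
            best_val, best_x = f0(omega), omega    # omega is a corner, hence feasible
            continue
        for child in bisect_normal((p, q), omega):
            b, w = relax_bound(child)
            if b < best_val - tol:
                heapq.heappush(heap, (b, w, child))
    return best_val, best_x
```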
3 Examples
It turned out that many bounds for QP(M) developed in the literature are normal. We give here two simple examples.
Example 1 Relaxation by Tight Convex Minorants. Let us first recall some concepts and results from the theory of semi-definite programming which will also be needed later. Denote by S_n the set of all symmetric n × n matrices and for any two A, B ∈ S_n write A ⪰ B (A ≻ B, resp.) to mean that the matrix A - B is positive semidefinite (definite, resp.). In particular: A ⪰ 0 ⇔ ⟨y, Ay⟩ ≥ 0 for all y ∈ R^n, and A ≻ 0 ⇔ ⟨y, Ay⟩ > 0 for all y ≠ 0. A Linear Matrix Inequality (LMI) is a matrix inequality of the form
A{x) := A0 + J2 XjAj y 0,
339 where A0,Ai,... ,An € Sn. Since {x\ A(x) >z 0} = nyeRn{x\ (y,A(x)y) > 0} a LMI is a convex inequality of a special form. A finite system of LMIs A^(x) y 0,..., A<-h\x) >: 0 can be expressed as a single LMI diag(A (1) (x),..., A^(x)) >r 0. A semidefinite program (SDP) is a linear program under a LMI constraint, i.e. a problem of the form min{(c,:r)| A{x) ^ 0}. SDPs are convex programs efficiently solvable by interior point methods (see e.g. [13]; [24]). Now let p = supt;=0 j
m
pk where pk denotes the spectral radius of Qk. Clearly
p = min{t\ Qk+tl>z0,
k = 0 , 1 , . . . , m},
so p can be computed by solving a SDP. Then every function gk(x) := fk(x) + p\\x\\2, k = l,...,m,is convex and since the convex envelope of — ||a;||2 on M = [p,q] is — J2]=\[(Pj + 1j)xj — PjQj], a convex minorant of fk(x) = gk{x) — p\\x\\2 on M is n
n
~ PJ)(XJ - %)• (5)
3=1
Since any corner x of M satisfies Xj e {Pj,
s t
-k = 1,..., m
(6)
x e XnM.
By Theorem 1, using a normal branching rule a convergent BB algorithm for solving (QP) can thus be obtained with bounding based on the relaxation (6). In a BB algorithm earlier proposed by Androulakis, Maranas and Floudas [3] this bounding was used but with a bisection rather than a normal branching rule. Example 2 Sandwiching Relaxation We can rewrite QP(M) as min (x,y°) + (c°,x) s.t. (x,yk) + {ck,x) + bk < 0 k= yk = Qkx, p< x < q, x G X
l,...,m
340 For k = 0 , 1 , . . . , m and i = 1 , . . . , n define rf = mm{yk = Qkx\ p < x < q} = E"=i min^-p,-, a* ^ } sk = max{j/f = gfx\ p<x
(or v € {7,5}). Therefore, for every k =
Vx S \p, q],
where
k = l,...,m k = l,...,m
(>
Again by Theorem 1, a convergent BB algorithm for solving (QP) can be devised with the above bounding and a normal branching rule. The BB algorithm by AlKhayyal et al. in [2] uses the same bounding but with a bisection rather than a normal branching rule.
4
Quadratic system equivalent to a linear system
In the rest of this paper we will discuss normal bounds obtained by variables decoupling. Since this method uses a transformation of a set of linear constraints into an equivalent set of quadratic constraints, we first study this transformation. From elementary mathematics we know that a set of two linear inequalities of one real variable x, such as 1 < x < 2, is equivalent to the quadratic inequality (x — l)(x — 2) <
341 0, i.e. x2 — 3x + 2 < 0. Generalizing this fact we will show that under mild conditions a set of linear inequalities in x £ Rn is equivalent to a set of quadratic inequalities. Consider a system of linear inequalities (S0)
p, < x{ < qi i = 1 , . . . , rh
(Si) where n\ < n,x,o? £ Rn,aj
a?x
j — l,...,h
(8) (9)
€ R and o?x stands for (a? ,x). Assume that:
(i) pi < qi for at least one i € { 1 , . . . , n i } (ii) There is no x satisfying a?x — aj > 0 Vj = 1 , . . . , h, or equivalently, Mx G Rn
min (ajx - a,) < 0.
(10)
j=l,...,h
Obviously condition (10) is fulfilled if the system (9) includes an inequality of the form - o o < rio < xio < sio < +oo (11) for some i0 and rio < sio. Proposition 2 Under the above assumptions, the linear system (So) (Si)
Pi < xt < qt i = 1,.. .,71! a?x < aj
j = 1,....,h
(12) (13)
is equivalent to the quadratic system (a3x-aj)(xi-pi)<0 (a3x-
.=
1
ctj)(qi - Xi) < 0
, n i ; j = l,...,h,
(14)
where we agree that A x (+oo) < 0 means A < 0. Proof It suffices to show that (14) implies (12)-(13) because the converse is evident. Let x satisfy (14). If pt = —oo or qt = +oo for some i < n\ then for every j we have a?x — «j < 0 from (14). Otherwise, taking i < n\ such that —oo < pi < qi < +oo we have (a^x — ctj)(xi — pi + qi — x^ = (a^x — ctj)(qi — Pi) < 0, hence o?x < aj. Furthermore, for the given x there exists by (10) an index j 0 e { 1 , . . . , h] such that ai°x — aj0 < 0. But (a^x — aj0)(xi —Pi) < 0, hence Xi—pi > 0 for every i = 1 , . . . , n\. a
Thus, under mild conditions a system of linear inequalities in x can be transformed into an equivalent system of quadratic inequalities.
342 Now for every quadratic function f(x) denote by f(x, w) the affine function of (x, w) that results by replacing in the expanded form of f(x) every product xtXj by wtj. For instance, if f(x) = (ax — a)(bx — /?) where a e Rn, b G -R", a, /3 G R then n
n a
i^3Wii ~ abx — Pax + aP-
f(x, w) = ] P £ »=1 j ' = l
Sometimes we will find it more convenient to denote f(x, w) by [/(a:)]/ following Sherali and Tuncbilek (see e.g. [17]). The system (12-13) can then be rewritten as [(a? x - oij)(xi - pi)}t < 0; [(a? x - aj)(qi - Xi)]e < 0 Vi,j Wij = XiXj
i = l,...,ni,j
= l,...,h,
(15) (16)
where the inequalities (15) are affine in x, w. Proposition 3 / / the system (13) is consistent and implies a°x — a^ < 0, then the system (15) implies [(a°x - a0)(xi - pi)]e < 0,
[(a0 a; - Q 0 ) ( % - Xi)]i < 0, i = l,...,m.
(17)
Proof The hypothesis of the Proposition means that the system a?x — aj < 0
(j = 1 , . . . , h),
a°x — OIQ > 0
is inconsistent. Since (13) is consistent it follows from Farkas-Minkowski Theorem (see e.g. [22]) that there exist Xj > 0, j = 1 . . . , h satisfying h
(a°x — Qo) — 5Z ^j(a''x ~ aj) = 03=1
Then for any i = 1 , . . . , n\ : [(a°x - a0)(xi - pi)]e = =
[££=i Xj(ajx - a})(xi - pi)]e E?=i \[{ajx - aj){xi - Pi)]e < 0.
The other inequalities (17) are proved analogously.
•
Now define o~i = sup{|:Cj| | a?x — aj < 0, j = 1 , . . . , h} and assume that / = {i\ o-i < +00} £ 0. For every i e / let r*, Sj be two real numbers such that r, < Sj, cr, € [r;, s,].
(18)
343 Corollary 1 The system (15) implies, for all i = 1 , . . . , n\ and all j £ I: \{XJ - Sj){xi - pi)]e < 0,
[(rj - Xj)(xi - pi)]t < 0;
(19)
[{XJ - Sj)(qi - Xi)]e < 0,
[(r, - Xj)(qi - Xi)]e < 0;
(20)
Proof This follows because Xj — Sj < 0, r,- — Xj < 0, j € I, are implied by (13).
•
Proposition 4 Under assumptions (10) and (18): (i) / / (x, w) satisfies (15) then x satisfies (12)-(13) and conversely. In other words, x is a feasible solution of (12)-(13) if and only if there exists w = {IDJJ} such that (x,w) satisfies (15). (ii) / / (x, w) satisfies (15) and in addition x is a corner of the rectangle {(x\,..., xni) | Pi < Xi < qi (i = 1 , . . . , ni)} then w^ = XiXj for alii = 1 , . . . , ni and all j € / . Proof (i) That (15) is implied by (12)-(13) is obvious. Suppose now that (x,w) satisfies (15). If p, = —oo or qt = +oo for some i = 1 , . . . , nx then a^x - ctj < 0 Vj. Otherwise, let i € { 1 , . . . , ni} be such that —oo < Pi < qi < +oo. For any j we have from the first inequality in (15): n
0 > [(ajx)xi - ctjXi - Pi(ajx - aj)]e = ^2 aiwki - ctjXi - pi(ajx - otj) and from the second inequality n
0 > [—(ajx)xi + ajXi + qi(ajx — aj)]i = - ^Z aiwki + ajxi
+ Qi{a3x — aj)-
k=\
Consequently, n
ajXi + pi{a'x — aj) >^2
a w
i ki
> ajxi + Qi{a3x — aj).
k=\ j
This yields (qt - Pi)(a x - aj) < 0, and hence, a?x - aj < 0, proving (13). On the other hand, for j € I and i = 1 , . . . , n\, by Corollary 1, (a;, w) satisfies (19), i.e. Wij - PiXj - rj(xi - Pi) > 0,
-Wij + PiXj + Sj(xi - p^ > 0,
hence PiXj + rj(xi - Pi) < w^ < PiXj + Sj(xi - p^.
(21)
This implies that (SJ — rj)(xt —pi) > 0 and hence xt > Pi- The derivation of the other inequalities (12) is analogous.
344 (ii) Suppose now that (x,w) satisfies (15) and, in addition, that £ is a corner of {(xi,...,xni)\ Pi <Xi< ft (i = l , . . . , n i ) } , i.e. x{ 6 {ft, ft} for every i = 1,. . . , n i . Ii j € I then for every i 6 { 1 , . . . , n{\ such that Xi = Pi we have from (21): PiXj < Wij < PiXj Similarly, from (20) we have qiXj + Sj(xi - ft) < Wij < qtXj + Tj(xi -
ft)
hence for every i such that Xj = ft : ftXj < Wy < qiXj
i.e. again Wy = XiXj. This completes the proof.
•
Remark 2 (Equality Constraints) Proposition 3 shows that in forming the quadratic system (15) equivalent to (12)-(13) it suffices to consider only the maximal number of non redundant inequalities in (13). Note that if the system (13) includes an equality a>°x — aj0 = 0 then the quadratic system (14) should include the equalities (aJ0x - ak)(xi
- Pi) = 0,
(aJ0x - a J0 )(ft - xt) = 0
where we agree that A x (+oo) = 0 implies A = 0. Also the linear system (15) should include the equalities [(aJ0x - ajo){xi - Pi)]t = 0,
[(a^x - a J0 )(ft - Xi)]t = 0.
That a quadratic system may result from pairwise multiplying linear constraints has been long known and has been exploited in nonconvex quadratic programming by Shor and Stetsenko [20] and Sherali and Tuncbilek [16]. However, so far little notice has been taken of the equivalence between the mentioned systems and properties such as the one expressed in Proposition 4. As will be shown in Section 5 these properties serve as the basis for constructing a full hierarchy of normal boundings which can be incorporated into normal efficient branch and bound algorithms.
5 General decoupling scheme
As we saw in Section 2, the central issue in devising a normal BB algorithm for solving a quadratic program (QP) is: given a rectangle $M = [p, q]$, $p \ne q$, to construct a normal relaxation for
$$QP(M)\qquad \min f_0(x) \quad \text{subject to} \quad f_k(x) \le 0,\ k = 1,\dots,m, \qquad x \in X \cap M.$$
A general scheme for generating normal relaxations is the following.
DECOUPLING RELAXATION SCHEME:
(i) In the expanded form of each quadratic function replace every product $x_ix_j$ by a new variable $w_{ij}$. Then $f_k(x)$ becomes an affine function of $(x,w)$:
$$\tilde f_k(x, w) = \sum_{i \le j} a^k_{ij} w_{ij} + \sum_j c_{kj} x_j + b_k,$$
and the problem QP(M) becomes
$$\widetilde{QP}(M)\qquad \min \tilde f_0(x,w) \quad\text{subject to}\quad \tilde f_k(x,w) \le 0,\ k = 1,\dots,m, \qquad (x,w) \in E(M),$$
where $E(M)$ is the set of all $(x, w)$ satisfying
$$w_{ij} = x_ix_j \qquad \forall i,j \tag{22}$$
$$x \in X, \qquad p \le x \le q. \tag{23}$$
(ii) Replace the nonconvex set $E(M)$ by a normal relaxation, i.e. a convex set $C(M) \supset E(M)$ such that:
(**) Any corner of $M$ which belongs to $C(M)$ also belongs to $E(M)$, and for any nested sequence $\{M_\nu\}$, whenever a corner $x^*$ of $M_* = \cap_\nu M_\nu$ is the limit of a sequence $\{x^\nu\}$ of points $x^\nu \in C(M_\nu)$, then $x^* \in E(M_*)$.

In view of condition (**) the convex problem
$$RP(M)\qquad \min \tilde f_0(x,w) \quad\text{subject to}\quad \tilde f_k(x,w) \le 0,\ k = 1,\dots,m, \qquad (x,w) \in C(M)$$
will be a normal relaxation of QP(M). Thus, depending on how the set $C(M)$ satisfying condition (**) is constructed, we will have different normal boundings, and hence different normal BB algorithms for (QP).
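As a concrete illustration of step (i) of the scheme above (a sketch of ours with hypothetical names, not the authors' code), the following routine decouples a single quadratic function $f(x) = x^TQx + c^Tx + b$ with symmetric $Q$ into the affine function $\tilde f(x,w)$ and checks that both agree whenever $w_{ij} = x_ix_j$.

```python
import numpy as np

def decouple(Q, c, b):
    """Build f~(x, w) from f(x) = x^T Q x + c^T x + b (Q symmetric) by
    replacing each product x_i x_j (i <= j) with a new variable w_ij."""
    n = Q.shape[0]
    # coefficient of w_ij: Q_ii on the diagonal, 2 Q_ij for i < j
    a = {(i, j): (Q[i, i] if i == j else 2.0 * Q[i, j])
         for i in range(n) for j in range(i, n)}
    def f_tilde(x, w):
        return sum(a[ij] * w[ij] for ij in a) + c @ x + b
    return f_tilde

# consistency check: f~(x, w) == f(x) whenever w_ij = x_i x_j
rng = np.random.default_rng(1)
n = 3
Q = rng.normal(size=(n, n)); Q = 0.5 * (Q + Q.T)
c, b = rng.normal(size=n), 0.7
x = rng.normal(size=n)
w = {(i, j): x[i] * x[j] for i in range(n) for j in range(i, n)}
assert abs(decouple(Q, c, b)(x, w) - (x @ Q @ x + c @ x + b)) < 1e-12
```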
6 Linear relaxations
On the basis of the above results a full hierarchy of normal relaxations can be developed.
RELAXATION RQ$_0$
Denote
$$C = \bigcap_{i \le j} C_{ij}, \qquad C_{ij} = \{(x,w) \mid w_{ij} = x_ix_j,\ p_i \le x_i \le q_i,\ p_j \le x_j \le q_j\}, \tag{24}$$
so that
$$E(M) = \{(x, w) \in C \mid x \in X\}. \tag{25}$$
By Proposition 4, where $n_1 = n$ and (13) is taken to be identical to (12), a point $x$ satisfies
$$p_i \le x_i \le q_i \qquad (i = 1,\dots,n)$$
if and only if there exists $w = \{w_{ij}\}$ satisfying
$$[(x_i - p_i)(p_j - x_j)]_\ell \le 0, \qquad [(x_i - p_i)(q_j - x_j)]_\ell \le 0 \qquad \forall i \le j,$$
i.e.
$$g_{ij}(x_i, x_j) := \max\{p_ix_j + p_jx_i - p_ip_j,\ q_ix_j + q_jx_i - q_iq_j\} \le w_{ij} \le \min\{q_ix_j + p_jx_i - p_jq_i,\ p_ix_j + q_jx_i - p_iq_j\} =: h_{ij}(x_i, x_j). \tag{26}$$
Furthermore, if a corner $x$ of the rectangle $M = [p, q]$ is a solution of the latter system then $w_{ij} = x_ix_j$ for all $i,j$. It then easily follows that the set
$$C(M) = \{(x,w) \mid x \in X,\ g_{ij}(x_i,x_j) \le w_{ij} \le h_{ij}(x_i,x_j)\ \forall i \le j\}$$
satisfies condition (**), i.e. is a normal polyhedral relaxation of $E(M)$. Hence a normal linear relaxation of QP(M) is
$$RQ_0(M)\qquad \min \tilde f_0(x,w) \tag{27}$$
$$\text{s.t.}\quad \tilde f_k(x,w) \le 0 \qquad k = 1,\dots,m, \tag{28}$$
$$g_{ij}(x_i,x_j) \le w_{ij} \le h_{ij}(x_i,x_j) \qquad \forall i \le j, \tag{29}$$
$$x \in X. \tag{30}$$
Note that $g_{ij}(x_i,x_j)$ and $h_{ij}(x_i,x_j)$ are the convex and concave envelopes, respectively, of the function $x_ix_j$ on the box $\{(x_i,x_j) \mid p_i \le x_i \le q_i,\ p_j \le x_j \le q_j\}$ (see e.g. [22], Proposition 3.16). Therefore
$$\operatorname{conv} C_{ij} = \{(x,w) \mid g_{ij}(x_i,x_j) \le w_{ij} \le h_{ij}(x_i,x_j)\},$$
and the above relaxation amounts to replacing the nonconvex set $C = \bigcap_{i\le j}C_{ij} \supset E(M) = \{(x,w) \in C \mid x \in X\}$ by the convex set $\bigcap_{i \le j}\operatorname{conv} C_{ij}$. It can be proved that the relaxation RQ$_0$(M) is tighter than the sandwiching relaxation (Example 2).
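As a small numerical illustration (ours, not from the paper), the envelopes of (26) can be coded directly; the check below confirms that they sandwich the product $x_ix_j$ on the box.

```python
import numpy as np

def g_ij(xi, xj, pi, qi, pj, qj):
    """Convex envelope of x_i*x_j on [pi, qi] x [pj, qj] (lower bound on w_ij)."""
    return max(pi * xj + pj * xi - pi * pj, qi * xj + qj * xi - qi * qj)

def h_ij(xi, xj, pi, qi, pj, qj):
    """Concave envelope of x_i*x_j on [pi, qi] x [pj, qj] (upper bound on w_ij)."""
    return min(qi * xj + pj * xi - pj * qi, pi * xj + qj * xi - pi * qj)

rng = np.random.default_rng(2)
pi_, qi_, pj_, qj_ = -1.0, 2.0, 0.5, 3.0
for _ in range(1000):
    xi, xj = rng.uniform(pi_, qi_), rng.uniform(pj_, qj_)
    assert g_ij(xi, xj, pi_, qi_, pj_, qj_) <= xi * xj + 1e-12
    assert xi * xj <= h_ij(xi, xj, pi_, qi_, pj_, qj_) + 1e-12
```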
Remark 3 (Complicating Variables) Let $Q^k = [a^k_{ij}]$. Of course, a coupling constraint (22) should be considered if and only if the corresponding product is present in the expanded form of at least one function $f_k(x)$, i.e. if and only if $a^k_{ij} \ne 0$ for at least one $k \in \{0,1,\dots,m\}$. Define
$$T = \{(i,j) \mid a^k_{ij} \ne 0 \text{ for some } k = 0,1,\dots,m\}, \qquad I_1 = \{i \mid (i,j) \in T \text{ for some } j = 1,\dots,n\}.$$
Then instead of (22) one should consider only
$$w_{ij} = x_ix_j \qquad \forall (i,j) \in T \tag{31}$$
and these nonconvex constraints should be relaxed to the constraints
$$g_{ij}(x_i,x_j) \le w_{ij} \le h_{ij}(x_i,x_j) \qquad \forall (i,j) \in T.$$
If $|I_1| < n$ we may assume, without loss of generality, that $I_1 = \{1,\dots,n_1\}$ for some $n_1 < n$. Since the problem becomes linear when $x_1,\dots,x_{n_1}$ are fixed, these variables can be considered as "complicating". So the branching should be performed upon $x_1,\dots,x_{n_1}$; in other words, the partition sets should be rectangles of the form $M = \{x \in M_0 \mid p_i \le x_i \le q_i\ (i = 1,\dots,n_1)\}$, where $M_0$ is an initial rectangle. For any such rectangle $M$ denote $M^B = \{(x_1,\dots,x_{n_1}) \mid p_i \le x_i \le q_i\ (i = 1,\dots,n_1)\}$. The normality of the relaxation implies that for any partition set $M$ any corner of $M^B$ which is feasible to RP(M) is also feasible to (QP). At iteration $k$, if an optimal solution $\omega^k = \omega(M_k)$ of RP($M_k$) is such that $(\omega^k_1,\dots,\omega^k_{n_1})$ is a corner of $M^B_k$ then it is an optimal solution of (QP). Otherwise, $M_k$ is bisected via $(\omega^k, i_k)$, where $i_k$ is defined by
$$\eta^k_i = \min\{\omega^k_i - p^k_i,\ q^k_i - \omega^k_i\}, \qquad i_k \in \arg\max\{\eta^k_i \mid i = 1,\dots,n_1\},$$
as sketched below. Thus, when $|I_1| < n$ the above approach allows the BB algorithm to operate essentially in a space of smaller dimension than the original one. This may be an important advantage, since the computational burden of a BB algorithm increases rapidly with the dimension of the space in which branching is performed.
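A minimal sketch (ours) of this bisection rule; here `omega` stands for the components $\omega^k_1,\dots,\omega^k_{n_1}$ of an optimal solution of the current relaxation, and the two returned boxes are the children of $M$.

```python
import numpy as np

def bisect(p, q, omega):
    """Pick i_k in argmax_i min(omega_i - p_i, q_i - omega_i) and split the
    rectangle [p, q] at omega[i_k] into two child rectangles."""
    p, q, omega = (np.asarray(v, dtype=float) for v in (p, q, omega))
    eta = np.minimum(omega - p, q - omega)
    i_k = int(np.argmax(eta))
    left = (p.copy(), q.copy())    # child with x_{i_k} <= omega_{i_k}
    right = (p.copy(), q.copy())   # child with x_{i_k} >= omega_{i_k}
    left[1][i_k] = omega[i_k]
    right[0][i_k] = omega[i_k]
    return i_k, left, right

# example: the component farthest from the faces of [0,1]^3 is selected
i_k, (pl, ql), (pr, qr) = bisect([0, 0, 0], [1, 1, 1], [0.2, 0.5, 0.9])
print(i_k, ql, pr)   # 1 [1.  0.5 1. ] [0.  0.5 0. ]
```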
RELAXATION RQ$_1$
Denote $E_{ij}(M) = \{(x, w) \mid w_{ij} = x_ix_j,\ x \in X \cap M\}$, so that $E(M) = \bigcap_{i,j}E_{ij}(M)$. Since $E_{ij}(M) \subset C_{ij}$, a tighter relaxation than RQ$_0$(M) could be obtained by replacing the set $E(M)$ with $\bigcap_{(i,j)}\operatorname{conv} E_{ij}(M)$. Unfortunately, while $\operatorname{conv} C_{ij}$ is easy to compute, it seems difficult to obtain an explicit description of $\operatorname{conv} E_{ij}(M)$. On the other hand, using the results of Section 4, we can easily construct a normal polyhedral relaxation of $E(M)$, i.e. a polyhedron $C(M)$ containing $E(M)$ and satisfying (**). Assume that the polyhedron $X$ is defined by the linear inequalities
$$g^jx - d_j \le 0 \qquad j = 1,\dots,l, \tag{32}$$
where $g^j \in \mathbb{R}^n$, $d_j \in \mathbb{R}$ and, as usual, $g^jx$ stands for $\langle g^j, x\rangle$. Let us rewrite the inequalities (12) and (32) as a single set
$$a^jx - \alpha_j \le 0 \qquad j = 1,\dots,2n + l, \tag{33}$$
where the $2n$ first inequalities are (12). Applying then Proposition 4, we obtain the normal relaxation
$$RQ_1(M)\qquad \min \tilde f_0(x,w) \tag{34}$$
$$\text{s.t.}\quad \tilde f_k(x,w) \le 0 \qquad k = 1,\dots,m, \tag{35}$$
$$g_{ij}(x_i,x_j) \le w_{ij} \le h_{ij}(x_i,x_j) \qquad \forall i \le j, \tag{36}$$
$$[(g^jx - d_j)(x_i - p_i)]_\ell \le 0, \quad [(g^jx - d_j)(q_i - x_i)]_\ell \le 0 \qquad i = 1,\dots,n,\ j = 1,\dots,l. \tag{37}$$
Remark 4 (Redundant Constraints) In the context of a BB algorithm, a partition set $M$ at an advanced iteration may be entirely contained in some halfspaces $\{x \mid g^jx - d_j \le 0\}$. Let $J_M = \{j = 1,\dots,l \mid g^jx - d_j \le 0\ \forall x \in M\}$, so that
$$J_M = \Bigl\{j = 1,\dots,l \,\Bigm|\, \max_{x \in V(M)}(g^jx - d_j) \le 0\Bigr\}, \tag{38}$$
where $V(M)$ denotes the vertex set of $M$. The inequalities (32) with $j \in J_M$ are then implied by (12) and, if $J_M = \{j \mid j > l_1\}$, say, then it follows from Proposition 3 that the inequalities (37) with $j > l_1$ are redundant and hence can be omitted without any harm to the tightness and normality of the relaxation.
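Since the maximum of a linear function over the rectangle $M = [p,q]$ is attained coordinatewise, the test (38) costs only $O(n)$ arithmetic operations per constraint; a sketch (ours, with hypothetical data) follows.

```python
import numpy as np

def redundant_on_box(G, d, p, q, tol=0.0):
    """Return J_M: the indices j with g_j^T x <= d_j on the whole box [p, q],
    using max_{p <= x <= q} g^T x = sum_i max(g_i * p_i, g_i * q_i)."""
    G, d, p, q = (np.asarray(v, dtype=float) for v in (G, d, p, q))
    max_over_box = np.maximum(G * p, G * q).sum(axis=1)
    return [j for j in range(len(d)) if max_over_box[j] - d[j] <= tol]

# example on the box [0,1]^2: x_1 <= 2 holds everywhere, x_1 + x_2 <= 1.5 does not
G = np.array([[1.0, 1.0], [1.0, 0.0]])
d = np.array([1.5, 2.0])
print(redundant_on_box(G, d, [0, 0], [1, 1]))   # -> [1]
```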
Remark 5 (Reformulation-Linearization) By adding to RQ$_1$(M) any subset of the set of inequalities
$$[(g^ix - d_i)(g^jx - d_j)]_\ell \ge 0 \qquad i,j = 1,\dots,l,\ i \le j, \tag{39}$$
one obtains the Reformulation-Linearization (RL) bounding method, which was primarily proposed for linearly constrained quadratic programs in Sherali and Tuncbilek [16] (see also [17], [18]). It is not known to what extent the addition of the constraints (39) may improve the bound. In any case, there is a trade-off between the advantage of a more refined bound and the computational cost incurred when adding too many constraints to RQ$_1$(M).
There are in total $\frac{1}{2}l(l+1)$ constraints (39), and in the general case an RL relaxation could involve up to $\frac{1}{2}(l + 2n)(l + 2n + 1)$ linear constraints. From (38) it is easy to determine the redundant constraints, which should be omitted. In a sense the system (36)-(37) is minimal to guarantee the normality of the relaxation with a reasonable quality of the bound.
7 Semidefinite relaxation
Following the scheme described in Section 5, a normal relaxation of QP(M) is obtained via a normal relaxation of the set $E(M)$, i.e. the set of all $(x, w)$ satisfying
$$w_{ij} = x_ix_j \qquad 1 \le i \le j \le n, \tag{40}$$
$$p \le x \le q, \qquad x \in X. \tag{41}$$
In the previous linear relaxations, $E(M)$ was approximated by a polyhedron containing $E(M)$. We now try to construct a convex set $C(M) \supset E(M)$ satisfying (**).

RELAXATION RQ$_2$

First, observe the following

Proposition 5 The system (41) can be rewritten equivalently as a quadratic system of the form
$$f_k(x) \le 0 \qquad k = m+1,\dots,N, \tag{42}$$
where $f_k(x)$, $k = m+1,\dots,N$, are quadratic functions such that a vector $x$ satisfies (41) if and only if there exists $w$ satisfying
$$\tilde f_k(x,w) \le 0 \qquad k = m+1,\dots,N, \tag{43}$$
and any vertex $x$ of the box $M = [p,q]$ for which there exists $w$ satisfying (43) must also satisfy (40).

Proof Recall that $\tilde f(x,w) = [f(x)]_\ell$. We show that the system (42) can be chosen in many different ways. For instance, writing (41) as the two sets of inequalities
$$p_i \le x_i \le q_i \qquad i = 1,\dots,n, \tag{44}$$
$$g^jx - d_j \le 0 \qquad j = 1,\dots,l, \tag{45}$$
one can take (42) to consist of the inequalities
$$(x_i - p_i)(p_j - x_j) \le 0, \quad (x_i - p_i)(q_j - x_j) \le 0 \qquad \forall i \le j, \tag{46}$$
$$(x_i - p_i)(g^jx - d_j) \le 0, \quad (q_i - x_i)(g^jx - d_j) \le 0 \qquad i = 1,\dots,n,\ j = 1,\dots,l. \tag{47}$$
The second part of the Proposition follows from Proposition 4. •

As was mentioned in Remarks 3 and 4, certain inequalities in (46)-(47) can be dropped because they are useless or redundant. Furthermore, by Proposition 2 with (44) as $(S_0)$ and (32) as $(S_1)$, the set of inequalities (47) alone is still equivalent to (41) and satisfies all conditions mentioned in Proposition 5. Thus, the problem QP can always be converted into an all-quadratic program
$$\min\{f_0(x) \mid f_k(x) \le 0,\ k = 1,\dots,N\}, \tag{48}$$
which in turn can be rewritten as
$$\min \tilde f_0(x,w) \tag{49}$$
$$\text{s.t.}\quad \tilde f_k(x,w) \le 0 \qquad k = 1,\dots,N, \tag{50}$$
$$w_{ij} = x_ix_j \qquad i,j = 1,\dots,n. \tag{51}$$
Let us now represent the vector $w$ with components $w_{ij}$ $(i,j = 1,\dots,n)$ by an $n \times n$ symmetric matrix $W$ with elements $w_{ij}$. Rewriting the constraints (40) in matrix form,
$$W = xx^T \tag{52}$$
($x$ is a column vector, i.e. an $n \times 1$ matrix), we see from (52) that $W \succeq 0$ and $\operatorname{rank} W = 1$, so the constraint (40) can be relaxed to the linear matrix inequality $W \succeq xx^T$. Since by Proposition 5 the linear program (49)-(50) is already a normal relaxation of QP(M), a tighter normal relaxation is given by
$$RQ_2(M)\qquad \min \tilde f_0(x,w) \tag{53}$$
$$\text{s.t.}\quad \tilde f_k(x,w) \le 0 \qquad k = 1,\dots,N, \tag{54}$$
$$W \succeq xx^T. \tag{55}$$
Substituting $f_k(x) = \langle x, Q^kx\rangle + \langle c^k, x\rangle + b_k$ in (53) and (54) yields
$$RQ_2(M)\qquad \min\ \operatorname{Tr} WQ^0 + (c^0)^Tx$$
$$\text{s.t.}\quad \operatorname{Tr} WQ^k + (c^k)^Tx + b_k \le 0 \qquad k = 1,\dots,N,$$
$$\begin{bmatrix} W & x \\ x^T & 1 \end{bmatrix} \succeq 0.$$
This is an SDP which can be solved in practice by several currently available interior point methods, see e.g. [9], [23]. In many applications, aside from the quadratic constraints, the problem may involve one or more LMI constraints, see e.g. [6], [24]. The above relaxation is then very convenient since it is still an SDP.
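As an illustration (ours, not part of the paper), RQ$_2$(M) can be set up with an off-the-shelf modelling tool. The sketch below assumes the Python package cvxpy with an SDP-capable solver (e.g. SCS) installed; the data Qs, cs, bs are hypothetical stand-ins for $(Q^k, c^k, b_k)$, $k = 0,\dots,N$, and the constraint $W \succeq xx^T$ is imposed through the Schur-complement block above.

```python
import numpy as np
import cvxpy as cp

def rq2_relaxation(Qs, cs, bs):
    """min Tr(W Q^0) + c^0.x + b_0  s.t.  Tr(W Q^k) + c^k.x + b_k <= 0 (k >= 1)
    and [[W, x], [x^T, 1]] >> 0  -- the semidefinite relaxation RQ2."""
    n = Qs[0].shape[0]
    Z = cp.Variable((n + 1, n + 1), PSD=True)   # Z plays the role of [[W, x], [x^T, 1]]
    W, x = Z[:n, :n], Z[:n, n]
    cons = [Z[n, n] == 1]
    cons += [cp.trace(Qs[k] @ W) + cs[k] @ x + bs[k] <= 0 for k in range(1, len(Qs))]
    prob = cp.Problem(cp.Minimize(cp.trace(Qs[0] @ W) + cs[0] @ x + bs[0]), cons)
    prob.solve()
    return prob.value, x.value

# tiny hypothetical instance: minimize -|x|^2 subject to |x|^2 <= 1
Qs = [-np.eye(2), np.eye(2)]
cs = [np.zeros(2), np.zeros(2)]
bs = [0.0, -1.0]
print(rq2_relaxation(Qs, cs, bs))   # relaxation value is -1
```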
Remark 6 Setting
$$\Omega = \{(y_0, y_1,\dots,y_N) \mid y_k = f_k(x),\ k = 0,1,\dots,N,\ x \in \mathbb{R}^n,\ (y_1,\dots,y_N) \le 0\},$$
the problem is $\min\{y_0 \mid (y_0, y_1,\dots,y_N) \in \Omega\}$, or equivalently,
$$\min\{y_0 \mid (y_0, y_1,\dots,y_N) \in \operatorname{conv}\Omega\}. \tag{56}$$
Obviously, $\Omega = F \cap (\mathbb{R} \times \mathbb{R}^N_-)$, where
$$F := \{(y_0, y_1,\dots,y_N) \mid y_k = f_k(x),\ k = 0,1,\dots,N,\ x \in \mathbb{R}^n\}.$$
From a result of Ramana [14] it follows that
$$\operatorname{conv} F = \{(y_0, y_1,\dots,y_N) \mid y_k = \tilde f_k(x,w),\ k = 0,1,\dots,N,\ W \succeq xx^T,\ x \in \mathbb{R}^n\},$$
so RQ$_2$(M) amounts to replacing the feasible set $\Omega = F \cap (\mathbb{R} \times \mathbb{R}^N_-)$ by $(\operatorname{conv} F) \cap (\mathbb{R} \times \mathbb{R}^N_-)$, which obviously contains $\operatorname{conv}\Omega$ though it may be larger.

Remark 7 Relaxations similar to the above have been used in the literature. However, in most published results the linear inequalities are treated as quadratic only formally, i.e. with a zero quadratic part ($Q^k = 0$), so that, when turned into linear inequalities in $(x, w)$, they do not involve $w$. In contrast, in the method presented above a system of $l + 2n$ linear inequalities (12)-(13) is replaced by a system of $2nl$ quadratic inequalities which, when turned into inequalities in $(x, w)$, do involve $w$, i.e. will set restrictions not only on $x$ but also on $w$.
RELAXATION RQ$_3$
As was observed in [4]:

Proposition 6 We have $W = xx^T$ if and only if
$$\begin{bmatrix} W & x \\ x^T & 1 \end{bmatrix} \succeq 0, \tag{57}$$
$$\operatorname{Tr}(W - xx^T) \le 0. \tag{58}$$
Proof It suffices to prove the "if" part. From (57) it follows that $W - xx^T \succeq 0$. Then $\operatorname{Tr}(W - xx^T) \ge 0$ and hence, by (58), $\operatorname{Tr}(W - xx^T) = 0$. Therefore $W = xx^T$. •

Note that $\operatorname{Tr}(W - xx^T) = \sum_{i=1}^{n}(w_{ii} - x_i^2)$ is a concave function of $(x, w)$.
Proposition 7 The convex envelope of the function $\operatorname{Tr}(W - xx^T)$ on the set $\{(x,w) \mid p \le x \le q\}$ is
$$\sum_{i=1}^{n}[w_{ii} - (p_i + q_i)x_i + p_iq_i]. \tag{59}$$
Proof For fixed $i = 1,\dots,n$ and fixed $w_{ii}$, the function $x_i \mapsto w_{ii} - (p_i+q_i)x_i + p_iq_i$ is the unique affine function that matches the concave function $w_{ii} - x_i^2$ at the endpoints of the segment $[p_i, q_i]$. Therefore (59) is the unique affine function that matches $\operatorname{Tr}(W - xx^T) = \sum_{i=1}^{n}(w_{ii} - x_i^2)$ at the corners of the rectangle $[p, q]$. •
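A quick numerical check (ours) of the underestimation behind Proposition 7: the difference between $\operatorname{Tr}(W - xx^T)$ and (59) does not depend on the $w_{ii}$ and equals $-\sum_i(x_i - p_i)(x_i - q_i)$, which is nonnegative on the box and vanishes at its corners.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
p, q = -np.ones(n), 2.0 * np.ones(n)

def gap(x):
    # Tr(W - x x^T) - sum_i [w_ii - (p_i + q_i) x_i + p_i q_i]
    #   = -sum_i (x_i - p_i)(x_i - q_i)   (the w_ii cancel)
    return -np.sum((x - p) * (x - q))

for _ in range(1000):
    x = rng.uniform(p, q)
    assert gap(x) >= -1e-12              # (59) lies below Tr(W - x x^T) on the box
corner = np.where(rng.random(n) < 0.5, p, q)
assert abs(gap(corner)) < 1e-12          # and coincides with it at every corner
```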
Corollary 2 A normal relaxation of QP(M) is
$$RQ_3(M)\qquad \min\ \operatorname{Tr} WQ^0 + (c^0)^Tx \tag{60}$$
$$\text{s.t.}\quad \operatorname{Tr} WQ^k + (c^k)^Tx + b_k \le 0 \qquad k = 1,\dots,m, \tag{61}$$
$$x \in X \cap M, \tag{62}$$
$$\begin{bmatrix} W & x \\ x^T & 1 \end{bmatrix} \succeq 0, \tag{63}$$
$$\sum_{i=1}^{n}[w_{ii} - (p_i + q_i)x_i + p_iq_i] \le 0. \tag{64}$$
Proof Clearly RQ$_3$(M) is a relaxation of QP(M). Furthermore, if $(x, w)$ satisfies (63)-(64) and $x$ is a corner of the rectangle $M = [p,q]$, i.e. $x_i \in \{p_i, q_i\}$ for every $i$, then obviously $(p_i + q_i)x_i - p_iq_i = x_i^2$ for all $i$, hence $\operatorname{Tr}(W - xx^T) \le 0$. This, together with the inequality $W - xx^T \succeq 0$, implies $w_{ij} = x_ix_j$ for every $(i, j)$, i.e. $W = xx^T$. The continuity condition in (**) is obvious. •

In view of this Corollary, and since the constraints (63) and (64) are especially designed to approximate the constraint $W = xx^T$, one may wonder whether they can subsume the constraints on $(x, w)$ that are derived from the quadratic inequalities equivalent to the linear system $x \in X \cap M$. In other words, one may wonder whether the following relaxation is any better than RQ$_3$(M):
$$\min\ \operatorname{Tr} WQ^0 + (c^0)^Tx \tag{65}$$
$$\text{s.t.}\quad \operatorname{Tr} WQ^k + (c^k)^Tx + b_k \le 0 \qquad k = 1,\dots,m, \tag{66}$$
$$[(x_i - p_i)(g^jx - d_j)]_\ell \le 0, \quad [(q_i - x_i)(g^jx - d_j)]_\ell \le 0 \qquad i = 1,\dots,n,\ j = 1,\dots,l, \tag{67}$$
$$\begin{bmatrix} W & x \\ x^T & 1 \end{bmatrix} \succeq 0, \tag{68}$$
$$\sum_{i=1}^{n}[w_{ii} - (p_i + q_i)x_i + p_iq_i] \le 0.$$
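For comparison, RQ$_3$(M) of Corollary 2 can be written in the same hypothetical cvxpy style as the earlier sketch of RQ$_2$(M); here G and dvec describe the polyhedron $X$ and p, q the box $M$.

```python
import numpy as np
import cvxpy as cp

def rq3_relaxation(Qs, cs, bs, G, dvec, p, q):
    """RQ3: quadratic constraints k = 1,...,m, the linear system G x <= d,
    the box p <= x <= q, the LMI [[W, x],[x^T, 1]] >> 0 and the trace cut
    sum_i (W_ii - (p_i + q_i) x_i + p_i q_i) <= 0."""
    G, dvec, p, q = (np.asarray(v, dtype=float) for v in (G, dvec, p, q))
    n = Qs[0].shape[0]
    Z = cp.Variable((n + 1, n + 1), PSD=True)
    W, x = Z[:n, :n], Z[:n, n]
    cons = [Z[n, n] == 1, G @ x <= dvec, x >= p, x <= q,
            cp.sum(cp.diag(W) - cp.multiply(p + q, x)) + p @ q <= 0]
    cons += [cp.trace(Qs[k] @ W) + cs[k] @ x + bs[k] <= 0 for k in range(1, len(Qs))]
    prob = cp.Problem(cp.Minimize(cp.trace(Qs[0] @ W) + cs[0] @ x + bs[0]), cons)
    prob.solve()
    return prob.value, x.value
```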
Concluding Remark A hierarchy of normal relaxations has been obtained by variable decoupling. While by Theorem 1 any one of these relaxations can be incorporated into a normal BB algorithm with guaranteed convergence, the choice of a proper relaxation for solving a given problem (QP) must be decided on the basis of a trade-off between the necessary computational effort and the desired quality of the bounds. It is also possible, and perhaps advisable, to use relatively inexpensive bounds at a first stage of the algorithm, then switch to more refined (hence more expensive) bounds at a later stage, when coming close to the optimum. The availability of a full range of normal relaxations should help to make this strategy easier to implement.
References

[1] F.A. Al-Khayyal and J.E. Falk, "Jointly constrained biconvex programming", Mathematics of Operations Research, 8 (1983), 273-286.
[2] F.A. Al-Khayyal, C. Larsen and T. Van Voorhis, "A relaxation method for nonconvex quadratically constrained quadratic programs", Journal of Global Optimization, 6 (1995), 215-230.
[3] I.P. Androulakis, C.D. Maranas and C.A. Floudas, "αBB: A global optimization method for general constrained nonconvex problems", Journal of Global Optimization, 7 (1995), 337-363.
[4] P. Apkarian and H.D. Tuan, "Robust control via concave optimization: local and global algorithms", Proceedings of the 37th Conference on Decision and Control, 1998.
[5] A. Ben-Tal, G. Eiger and V. Gershovitz, "Global minimization by reducing the duality gap", Mathematical Programming, 63 (1994), 193-212.
[6] S. Boyd, L. El Ghaoui, E. Feron and V. Balakrishnan, Linear Matrix Inequalities in System and Control Theory, SIAM, Philadelphia, 1994.
[7] C. Floudas and V. Visweswaran, "Quadratic Optimization", in R. Horst and P. Pardalos (eds.), Handbook of Global Optimization, Kluwer, 1995, 217-269.
[8] T. Fujie and M. Kojima, "Semidefinite programming relaxation for nonconvex quadratic programs", Journal of Global Optimization, 10 (1997), 367-380.
[9] P. Gahinet and A. Nemirovski, "The projective method for solving linear matrix inequalities", Mathematical Programming, 77 (1997), 163-190.
[10] K.C. Goh, M.G. Safonov and G.P. Papavassilopoulos, "Global optimization for the Biaffine Matrix Inequality Problem", Journal of Global Optimization, 7 (1995), 365-380.
[11] G.P. McCormick, Nonlinear Programming: Theory, Algorithms and Applications, John Wiley and Sons, New York, 1982.
[12] L.D. Muu and W. Oettli, "Method for minimizing a convex concave function over a convex set", Journal of Optimization Theory and Applications, 70 (1991), 377-384.
[13] Ju.E. Nesterov and A.S. Nemirovski, Interior Point Polynomial Methods in Convex Programming: Theory and Applications, SIAM, Philadelphia, 1994.
[14] M.V. Ramana, An Algorithmic Analysis of Multiquadratic and Semidefinite Programming Problems, Ph.D. Thesis, Johns Hopkins University, Baltimore, 1993.
[15] M.V. Ramana and P.M. Pardalos, "Semidefinite programming", in T. Terlaky (ed.), Interior Point Algorithms, Kluwer, 1996, 369-398.
[16] H.D. Sherali and C.H. Tuncbilek, "A global optimization algorithm for polynomial programming problems using a reformulation-linearization technique", Journal of Global Optimization, 2 (1992), 101-112.
[17] H.D. Sherali and C.H. Tuncbilek, "A reformulation-convexification approach to solving nonconvex quadratic programming problems", Journal of Global Optimization, 7 (1995), 1-31.
[18] H.D. Sherali and C.H. Tuncbilek, "New reformulation linearization/convexification relaxations for univariate and multivariate polynomial programming problems", Operations Research Letters, 21 (1997), 1-9.
[19] N.Z. Shor, "Dual quadratic estimates in polynomial and boolean programming", Annals of Operations Research, 25 (1990), 163-168.
[20] N.Z. Shor and S.I. Stetsenko, Quadratic Extremal Problems and Nondifferentiable Optimization (in Russian), Naukova Dumka, Kiev, 1989.
[21] H. Tuy, "D.C. Optimization: Theory, Methods and Algorithms", in R. Horst and P. Pardalos (eds.), Handbook of Global Optimization, Kluwer, 1995, 149-216.
[22] H. Tuy, Convex Analysis and Global Optimization, Kluwer, 1998.
[23] L. Vandenberghe and S. Boyd, "A primal-dual potential reduction method for problems involving matrix inequalities", Mathematical Programming, Series B, 69 (1995), 205-236.
[24] L. Vandenberghe and S. Boyd, "Semidefinite Programming", SIAM Review, 38 (1996), 49-95.
[25] V. Visweswaran and C.A. Floudas, "New properties and computational improvements of the GOP algorithm for problems with quadratic objective functions and constraints", Journal of Global Optimization, 3 (1993), 439-462.
Series on Applied Mathematics - Vol. 14
COMBINATORIAL AND GLOBAL OPTIMIZATION
Combinatorial and global optimization problems appear in a wide range of applications in operations research, engineering, biological science, and computer science. In combinatorial optimization and graph theory, many approaches have been developed that link the discrete universe to the continuous universe through geometric, analytic, and algebraic techniques. Such techniques include global optimization formulations, semidefinite programming, and spectral theory. Recent major successes based on these approaches include interior point algorithms for linear and discrete problems, the celebrated Goemans-Williamson relaxation of the maximum cut problem, and the Du-Hwang solution of the Gilbert-Pollak conjecture. Since integer constraints are equivalent to nonconvex constraints, the fundamental difference between classes of optimization problems is not between discrete and continuous problems but between convex and nonconvex optimization problems. This volume is a selection of refereed papers based on talks presented at a conference on "Combinatorial and Global Optimization" held in Crete, Greece.