ANNALS OF DISCRETE MATHEMATICS
annals of discrete mathematics Managing Editor Peter L. HAMMER, University of Waterloo, Ont., Canada
Advisory Editors C. BERGE, UniversitC de Paris, France M A . HARRISON, University of California, Berkeley, CA, U.S.A. V. KLEE, University of Washington, Seattle, WA, U.S.A. J.H. VAN LINT, California Institute of Technology, Pasadena, CA, U.S.A. G.-C. ROTA, Massachusetts lnstitute of Technology, Cambridge, MA, U.S.A.
NORTH-HOLLAND PUBLISHING COMPANY
-
AMSTERDAM NEW YORK. OXFORD
ANNALS OF DISCRETE MATHEMATICS
DISCRETE OPTIMIZATION II Proceedings of the ADVANCED RESEARCH INSTITUTE ON DISCRETE OPTIMIZATION AND SYSTEMS APPLICATIONS of the Systems Science Panel of NATO and of the DISCRETE OPTIMIZATION SYMPOSIUM co-sponsored by IBM Canada and SIAM &nfX Alta. and Vancouver, B.C. Canada, August 1977
AKADEMIE 2 Edited by
P.L. IHAMMER, University of Waterloo, Ont., Canada E.L. JOHNSON, IBM Research, Yorktown Heights, NY, U.S.A. B.H. KORTE, University of Bonn, Federal Republic of Germany
1979 NORTH-HOLLAND PUBLISHING COMPANY - AMSTERDAM NEW YORK OXFORD
5
0 NORTH-HOLLAND PUBLISHING COMPANY - 1979
All rights reserved No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission o f the copyright owner. Submission to this journal of a paper entails the author's irrevocable and exclusive authorization of the publisher to collect any sums or considerationsfor copying or reproduction payable by third parties (as mentioned in article 17paragraph 2 of the Dutch Copyright Act of 1912 and in the Royal Decree ofJune 20, 1974 (S.351) pursuant to article 16 b of the Dutch Copyright Act of 1912) andfor to act in or out of Court in connection therewith.
PRINTED IN THE NETHERLANDS
CONTENTS 1
3. Methodology Surveys E. BUS, Disjunctive programming P. HANSEN, Methods of nonlinear 0-1 programming R.G. JEROSLOW, An introduction to the theory of cutting planes E.L. JOHNSON, On the group problem and a subadditive approach to integer programming A survey of lagrangian techniques for discrete optimization J.F. SHAPIRO, K. SPIELBERG, Enumerative methods in integer programming
3 53 71 97 113 139
Reports Branch and bound/implicit enumeration (E. BUS) Cutting planes (M. GONDRAN) Group theoretic and lagrangean methods (M.W. PADBERG)
185 193 195
4. Computer codes
199
Surveys E.M.L. BEALE,Branch and bound methods for mathematical programming systems A. LANDand S. POWELL, Computer codes for problems of integer programming
201 221
Reports Current state of computer codes for discrete optimization (J.J.H. FORREST) 271 Codes for special problems (F. GIANNESSI) 275 Current computer codes (S. POWELL) 279
5. Applications
285
Surveys R.L. GRAHAM,E.L. LAWLER, J.K. LENSTRA and A.H.G. RINNOOY KAN, Optimization and approximation in deterministic sequencing and schedul287 ing: a survey V
vi
Contents
J. KRARUPand P. PRUZAN, Selected families of location problems S. ZIONTS, A survey of multiple criteria integer programming methods
327 3 89
Reports Industrial applications (E.M.L. BEALE) Modeling (D. KLINGMAN) Location and distribution problems (J. KRARUP) Communication and electrical networks (M. SEGAL) KAN) Scheduling (A.H.G. RINNOOY
399 405 411 417 423
Conclusive remarks
427
PART 3
METHODOLOGY
CONTENTS OF PART 3
Sumeys
E. BALAS,Disjunctive programming 3 53 P. HANSEN, Methods of nonlinear 0-1 programming 71 R. G. JEROSLOW,An introduction to the theory of cutting planes E. L. JOHNSON, On the group problem and a subadditive approach to integer programming 97 J. F. SHAPIRO, A survey of lagrangian techniques for discrete optimization 113 139 K. SPIELBERG, Enumerative methods in integer programming
Reports Branch and bound/implicit enumeration (E. BALAS) Cutting planes (M. GONDRAN) Group theoretic and lagrangean methods (M. W. PADBERG)
185 193 195
Annals of Discrete Mathematics 5 (1979) 3-51 @ North-Holland Publishing Company
DISJUNCTIVE PROGRAMMING Egon BALAS Carnegie-Mellon University, Pittsburgh, PA 15213, U.S.A.
1. Introduction This paper reviews some recent developments in the convex analysis approach to integer programming. A product of the last five years, these developments are based on viewing integer programs as disjunctive programs, i.e., linear programs with disjunctive constraints. Apart from the fact that this is the most natural and straightforward way of stating many problems involving logical conditions (dichotomies, implications, etc.), the disjunctive programming approach seems to be fruitful for zero-one programming both theoretically and practically. On the theoretical side, it provides neat structural characterizations which offer new insights. On the practical side, it produces a variety of cutting planes with desirable properties, and offers several ways of combining cutting planes with branch and bound. The line of research that has led to the disjunctive programming approach originates in the work on intersection or convexity cuts by Young [40], Balas [2], Glover [23], Owen [36] and others (see also [13, 24, 421). This geometrically motivated work can be described in terms of intersecting the edges of the cone originating at the linear programming optimum X with the boundary of some convex set S, whose interior contains X but no feasible integer points, and using the hyperplane defined by the intersection points as a cutting plane. An early forerunner of this kind of approach was a paper by Hoang Tuy [29]. In the early ~ O ’ S research , on intersection or convexity cuts was pursued in two main directions. One, typified by [23, 18, 41, was aimed at obtaining stronger cuts by including into S some explicitly or implicitly enumerated feasible integer points. The other, initiated by [3], brought into play polarity, in particular outer polars (i.e., polars of the feasible set, scaled so as to contain all feasible 0-1 points in their boundary), and related concepts of convex analysis, like maximal convex extensions, support and gauge functions, etc. (see also [5, 15, 191). Besides cutting planes, this second direction has also produced (see [5, 61) a “constraint activating” method (computationally untested to date) based on the idea of “burying” the feasible set into the outer polar (without using cuts), by activating the problem constraints one by one, as needed. This research has yielded certain insights and produced reasonably strong cutting planes; but those procedures that were
4
E . Balm
implemented (and, by the way, very few were) turned out to be computationally too expensive to be practically useful. In 1973 Glover [25] discovered that intersection cuts derived from a convex polyhedron S can be strengthened by rotating the facets of S in certain ways, a procedure he called polyhedral annexation. This was an important step toward the development of the techniques discussed in this paper. The same conclusions were reached independently (and concomitantly) in a somewhat different context by Balas [7]. The new context was given by the recognition that intersection cuts could be viewed as derived from a disjunction. Indeed, requiring that no feasible integer point be contained in the interior of S, is the same as requiring every feasible integer point to satisfy at least one of the inequalities whose complements define the supporting halfspaces of S. This seemingly innocuous change of perspective proved to be extremely fruitful. For one thing, it led naturally and immediately to the consideration of disjunctive programs in their generality [S, 91, and hence to a characterization of all the valid inequalities for an integer program. By the same token, it offered the new possibility of generating cuts specially tailored for problems with a given structure. Besides, it offered a unified theoretical perspective on cutting planes and enumeration, as well as practical ways of combining the two approaches. Finally, it vastly simplified the proofs of many earlier results and opene‘d the way to the subsequent developments to be discussed below. Besides the above antecedents of the disjunctive programming approach there have been a few other papers concerned with linear (or nonlinear) programs with disjunctive constraints [26, 26a, 271. The paper by Owen [36] deserves special mention, as the first occurrence of a cut with coefficients of different signs. However, these efforts were focused on special cases. The main conceptual tool used in studying the structural properties of disjunctive programs is polarity. Besides the classical polar sets, we use a natural generalization of the latter which we call reverse polar sets [lo]. This connects our research to the work of Fulkerson [20, 21, 221, whose blocking and antiblocking polyhedra are close relatives of the polar and reverse polar sets. There are also connections with the work of Tind [39] and Araoz [l]. One crucial difference in the way polars and reverse polars of a polyhedron are used in our work, versus that of the above mentioned authors, is the fact that we “dualize” the reverse polar S# of a (disjunctive) set S, by representing it in terms of the inequalities (of the disjunction) defining S, rather than in terms of the points of S. It is precisely this element which leads to a linear programming characterization of the convex hull of feasible points of a disjunctive program. Except for the specific applications described in Sections 7 and 8, which are new, the results reviewed here are from [8-12, 141. We tried to make this review self-contained, giving complete proofs for most of the statements. Many of the results are illustrated on numerical examples. For further details and related developments the reader is referred to the above papers, as well as those of
Disjunctive programming
5
Glover [ 2 5 ] and Jeroslow [32]. Related theoretical developments are also to be found in Jeroslow [30, 311, Blair and Jeroslow [17], Borwein [17a], Zemel [41], while some related computational procedures are discussed in Balas [121. This paper is organized as follows. Section 2 introduces some basic concepts and terminology, and discusses ways of formulating disjunctive programs (DP). Section 3 contains the basic characterization of the family of valid inequalities for a DP, and discusses some of the implications of this general principle for deriving cutting planes. Section 4 extends the duality theorem of linear programming to DP. Section 5 discusses the basic properties of reverse polars and uses them to characterize the convex hull of a disjunctive set (the feasible set of a DP). It shows how the facets of the convex hull of the feasible set of a DP in y1 variables, defined by a disjunction with q terms, can be obtained from a linear program with q(n 1) constraints. In Section 6 we address the intriguing question of whether the convex hull of a disjunctive set can be generated sequentially, by imposing one by one the disjunctions of the conjunctive normal form, and producing at each step the convex hull of a disjunctive set with one elementary disjunction. We answer the question in the negative for the case of a general D P or a general integer program, but in the positive for a class of DP called facial, which subsumes the general (pure or mixed) zero-one program, the linear complementarity problem, the nonconvex (linearly constrained) quadratic program, etc. The first 6 sections treat the integrality constraints of an (otherwise) linear program as disjunctions. When it comes to generating cutting planes from a particular noninteger simplex tableau, the disjunctions that can be used effectively are those involving the basic variables. Section 7 discusses a principle for strengthening cutting planes derived from a disjunction, by using the integrality conditions (if any) on the nonbasic variables. Section 8 illustrates on the case of multiple choice constraints how the procedures of Sections 3 and 7 can take advantage of problem structure. Section 9 discusses some ways in which disjunctive programming can be used to combine branch and bound with cutting planes. In particular, if LPk, k E Q, are the subproblems associated with the active nodes of the search tree at a given stage of a branch and bound procedure applied to a mixed integer program P, it is shown that a cut can be derived from the cost rows of the simplex tableaux associated with the problems LPk, which provides a bound on the value of P, often considerably better than the one available from the branch and bound process. Finally, Section 10 deals with techniques for deriving disjunctions from conditional bounds. Cutting planes obtained from such disjunctions have been used with good results on large sparse set covering problems.
+
2. Linear programs with logical conditions By disjunctive programming we mean linear programming with disjunctive constraints. Integer programs (pure and mixed) and a host of other nonconvex
6
E. Balas
programming problems (the linear complementarity problem, the general quadratic program, separable nonlinear programs, bimatrix games, etc.) can be stated as linear programs with logical conditions. In the present context “logical conditions” means statements about linear inequalities, involving the operations “and” ( A , conjunction -sometimes denoted by juxtaposition), “or” ( v , disjunction), “complement of” (1, negation). The operation “if * * * then” (3, implication) is known to be equivalent to a disjunction. The negation of a conjunctive set of inequalities is a disjunction whose terms are the same inequalities. The operation of conjunction applied to linear inequalities gives rise to (convex) polyhedral sets. The disjunctions are thus the crucial elements of a logical condition (the ones that make the constraint set nonconvex), and that is why we call this type of problem a disjunctive program. A disjunctive program. (DP) is then a problem of the form
I
min {cx Ax 3 a,, x 2 0 , x E L }, where A is an m x n matrix, a, is an m-vector, and L is a set of logical conditions. Since the latter can be expressed in many ways, there are many different forms in which a DP can be stated. Two of them are fundamental. The constraint set of a D P (and the DP itself) is said to be in disjunctive normal form if it is defined by a disjunction whose terms do not contain further disjunctions; and in conjunctive normal form, if it is defined by a conjunction whose terms do not contain further conjunctions. The disjunctive normal form is then
while the conjunctive normal form is
V (d’x 3 die),
j ES
iEQ,
or, alternatively, (2.2’) Here each d’ is an n-vector and each di, a scalar, while the sets Q and Q,, j E S, may or may not be finite. The connection between the two forms is that each term of the disjunction (2.1) has, besides the m + n inequalities of the system Ax 2 a,, x a O , precisely one inequality d’x a dio, i E Qi, from each disjunction j E S of (2.2), and that all distinct systems Ahx> a ; , x 2 0 with this property are present among the terms of (2.1); so that, if Q (and hence each Q,, j E S ) is finite, then IQl= Illjes where ll stands for Cartesian product. Since the operations A and v are distributive with respect to each other [i.e., if A, B, C are inequalities,
ail,
7
Disjunctiue programming
A A (B v C) = AB v AC, and A v (BC)= (A v B) A (A v C)], any logical condition involving these operations can be brought to any of the two fundamental forms, and each of the latter can be obtained from the other one. We illustrate the meaning of these two forms on the case when the DP is a zero-one program in n variables. Then the disjunctive normal form (2.1) is
v (Ax a ~ ,x,a O , x = x h )
hsQ
where x', . .. ,xlQl is the set of all 0-1 points, and IQI = 2" ; whereas the conjunctive normal form is
AxaO,
xa0,
(xi=O)v(xi=l),
j=1, . . . , n.
Once the inequalities occurring in the conjunctions and/or disjunctions of a DP are given, the disjunctive and conjunctive normal forms are unique. It is a fact of crucial practical importance, however, that the inequalities expressing the conditions of a given problem can be chosen in more than one way. For instance, the constraint set
- 2x3 + x4 1, X I + x2 + x3 + xq s 1,
3x1
+
x2
x i = O o r 1,
j = 1 , . . . , 4,
when put in disjunctive normal form, becomes a disjunction with Z4 = 16 terms; but the same constraint set can also be expressed as 3x1 +x,-
2x3 + x 4 s 1,
which gives rise to a disjunction with only 5 terms.
3. The basic principle of disjunctive programming A constraint B is said to be a consequence of, or implied by, a constraint A, if every x that satisfies A also satisfies B. We are interested in the family of inequalities implied by the constraint set of a general disjunctive program (DP). All valid cutting planes for a DP belong of course to this family. On the other hand, the set of points satisfying all the inequalities in the family is precisely the convex hull of the set of feasible solutions to the DP. A characterization of this family is given in the next theorem, which is an easy but important generalization of a classical result. Let x ER",(Y ER",( Y ~R, E A h E R"."", u! E R"., h E Q (not necessarily finite) and let ur be the jth column of Ah, h E Q, j E N = (1, . . . , n}.
E. B R ~ R S
8
Theorem 3.1. The inequality a x 3 a. is a consequence of the constraint
v
(3.1)
heQ
if and only if there exists a set of OhERmh,O h Z O , h e Q*, satisfying aaOhAh and
aosOhagh, b'hEQ*,
(3.2)
where Q* is the set of those h E Q such that the system Ahx3 a,,, x 3 0 is consistent.
Proof. a x 5 a. is a consequence of (3.1) if and only if it is a consequence of each term h E Q* of (3.1). But according to a classical result on linear inequalities (see, for instance, Theorem 1.4.4 of [38], or Theorem 22.3 of [37]), this is the case if and only if the conditions stated in the theorem hold. 0
Remark 3.1.1. If the ith inequality of a system h E Q* of (3.1) is replaced by an equation, the ith component of Oh is to be made unconstrained. If the variable xi in (3.1) is allowed to be unconstrained, the jth inequality of each system a 2 OhAh,h E Q*, is to be replaced by the corresponding equation in the "if" part of the statement. With these changes, Theorem 3.1 remains true. An alternative way of stating (3.2) is aia sup Ohap, h=Q*
j E N,
(3.3)
ao< inf 6"':. heQ*
Since Q* c Q, the if part of the Theorem remains of course valid if Q* is replaced by Q. Since (3.3) defines all the valid inequalities for (3.1), every valid cutting plane for a disjunctive program can be obtained from (3.3) by choosing suitable multipliers 01. If we think of (3.1) as being expressed in terms of the nonbasic variables of a basic optimal solution to the linear program associated with a DP, then a valid inequality a x 2 a o cuts off the optimal linear programming solution (corresponding to xi = 0, j E N) if and only if a()>0; hence a. will have to be fixed at a positive value. Inequalities with aoSO may still cut off parts of the linear programming feasible set, but not the optimal solution x = 0. The special case when each system A h x a a ; , h c Q , consists of the single inequality ahx 3 ah(,( a h a vector, ahoa positive scalar) deserves special mention. In this case, choosing multipliers Oh = l/ah,,,h E Q, we obtain the inequality (3.4)
Disjunctive programming
9
which (for Q finite) is Owen’s cut [36]. It can also be viewed as a slightly improved version of the intersection cut from the convex set
which has the same coefficients as (3.4) except for those (if any) j c J such that a:
ahO); in the presence of negative coefficients, however, (3.4) can sometimes be further strengthened. Due to the generality of the family of inequalities defined by (3.3), not only can the earlier cuts of the literature be easily recovered by an appropriate choice of the multipliers Oh (see [8] for details), but putting them in the form (3.3) indicates, by the same token, ways in which they can be strengthened by appropriate changes in the multipliers. A significant new feature of the cutting planes defined by (3.3) consists in the fact that they can have coefficients of different signs. The classical cutting planes, as well as the early intersectionlconvexity cuts and the group theoretic cutting planes (including those corresponding to facets of the corner polyhedron), are all restricted to positive coefficients (when stated in the form a, in terms of the nonbasic variables of the tableau from which they were derived). This important limitation, which tends to produce degeneracy in dual cutting plane algorithms, can often be overcome in the case of the cuts obtained from (3.3) by an appropriate choice of multipliers. Another important feature of the principle expressed in Theorem 3.1 for generating cutting planes is the fact that in formulating a given integer program as a disjunctive program, one can take advantage of any particular structure the problem may have. In Section 7 we will illustrate this on some frequently occurring structures. We finish this section by an example of a cut for a general mixed integer program.
Example 3.1. Consider the mixed integer program whose constraint set is x, =0.2+0.4(-x,)+1.3(-x~)-0.01(-x,)+0.07(-x,) x2 = 0.9 - 0.3(- xg) + 0.4(- x4) -0.04( - x,) + 0.1(- Xg) xj20,
j = l , . . .,6,
xi integer,
j = l , . . . ,4.
This problem is taken from Johnson’s paper [35], which also lists six cutting planes derived from the extreme valid inequalities for the associated group
E . Balm
10
problem : ~
.
~
~
X
~
~
~1,
.
8
~
~
X
~
0.778x3+0.444x4+0.40x, +0.111x6a1, 0.333~3+0.667~4+0.033~, +O.35X,52 1, 0.50x,+x4+0.40x,+0.25x,~ 1,
+ 0.478X63 1, 0.394~3 +0.636x4+0.346x, + O.155X63 I. 0.444X,+0.333X4+0.055X5
The first two of these inequalities are the mixed-integer Gomory cuts derived from the row of x1 and xz respectively. To show how they can be improved, we first derive them as they are. To do this, for a row of the form
xi =aio+Caii(-Xi), i€J
with xj integer-constrained for j E J1, continuous for j € J 2 , one defines aij-[aijl, jEJU{O}, and qio=fio, jE
~t= {jE Jl I fio a
iEJ;={jEJ1
fij
=
fij),
Ifio
j E Jz.
Then every x which satisfies the above equation and the integrality constraints on xi,j E J1U { i } , also satisfies the condition yi = qio+C q i i ( - x i ) ,
yi integer.
jcJ
For the two equations of the example, the resulting conditions are y1 =0.2-0.6(-x3)-0.7(-x4)-0.01(-x5)+0.07(-x6),
y1 integer,
y2=o.9+o.7(-x3)+o.4(-x4)-o.04(-x5)+o.1(-x6),
y2 integer.
Since each yi is integer-constrained, they have to satisfy the disjunction yi 4 0 v yi a 1. Applying the above theorem with multipliers 0, = l/uio then gives the cut (3.4) which in the two cases i = 1 , 2 is
and
These are precisely the first two inequalities of the above list. Since all cuts discussed here are stated in the form a l , the smaller the jth
~
~
Disjwnctiue programming
11
coefficient, the stronger is the cut in the direction j . We would thus like to reduce the size of the coefficients as much as possible. Now suppose that instead of y1 S Ov y, 3 1, we use the disjunction
which of course is also satisfied by every feasible x. Then, applying Theorem 3.1 with multipliers 5, 5 and 15 for y1 S O , y1 3 1 and x1 3 0 respectively, we obtain the cut whose coefficients are
- 0.6) 5 0.6 + 15 ( 0.4) {5 5 (X0.2 ’ 5 X0.8+ 15 X(-0.2) 5 (-0.7) 5 0.7+ 15 (- 1.3) max { 5 0.2 ’ 5 0.8+ 15 (-0.2) 5 (-0.01) 5 15 XO.01 max { 5 X0.2 ’ 5 X0.8+ 15 (-0.2)
max
X
X
X -
X
X
X
X
X
X
X
XO.Ol+
X
I I I
= -3, = -3.5,
5 ~ 0 . 0 7 5X(-0.07)+15~(-0.07) ’ 5 X 0.8+ 15 X (-0.2)
= 0.2,
I
= 0.35,
that is
The sum of coefficients on the left hand side has been reduced from 1.9875 to -5.95. Similarly, for the second cut, if instead of y2 S 0 v y2 3 1we use the disjunction
with multipliers 10, 40 and 10 for y2 S 0, x, 3 0 and y2 3 1 respectively, we obtain the cut - 7x,
- 4x,
+ 0.4x5- xg 3 1.
Here the sum of left hand side coefficients has been reduced from 1.733 to - 11.6. 4. Duality
In this section we state a duality theorem for disjunctive programs, which generalizes to this class of problems the duality theorem of linear programming.
E . Balm
12
Consider the disjunctive program zo = min cx,
v
(P)
hcQ
where A h is a matrix and bh a vector, V h E Q. We define the dual of ( P ) to be the problem
w o= max w,
A
(D)
hsQ
[
w - Uhbh S O uhtl;;}.
The constraint set of (D) requires each uh, h E Q, to satisfy the corresponding bracketed system, and w to satisfy each of them. Let Xh={X)AhXaO,XaO};
Xh={X)AhXabh,XaO}, Uh={Uh
1UhAhSC,
UhbO},
uh={Uh
I UhAh
Further, let Q*={hEQ(&f@},
Q**={hEQ(
uh#@}.
We will assume the following
Regularity condition:
(a*#@, Q\Q**#@)+Q*\Q**;*@; i.e., if (P) is feasible and (D) is infeasible, then there exists h E Q such that X,, f @, uh =
@.
Theorem 4.1. Assume that (P) and ( D ) satisfy the regularity condition. Then exactly one of the following two situations holds. ( 1 ) Both problems are feasible; each has an optimal solution and zo = wg. (2) One of the problems is infeasible; the other one either is infeasible or has no finite optimum.
Proof. (1) Assume that both (P) and (D) are feasible. If (P) has no finite minimum, then there exists h E Q such that j z h # @ and f E % . , such that cX < 0. But then u h = @, i.e., (D) is infeasible; a contradiction. Thus (P) has an optimal solution, say X. Then the inequality cx>zo is a consequence of the constraint set of (P); i.e., x E x h implies cx 3 zo. V h E Q. But then for all h E Q*, there exists uh E u h such that uhbhb to.Further, since (D) is feasible, for each h E Q \ Q" there exists fih E u h ; and since xh = @ (for h E 0 \ Q*), there also exists iih E u h such that iihbh>O, Vh E 0 \ Q*. But
Disjunctive programming
13
then, defining u h ( A )= C h +Aiih,
h E Q \ Q",
for A sufficiently large, u h ( A )E U,, uh(A)bh3 zo, V h E Q \ Q". Hence for all h E Q, there exist vectors u h satisfying the constraints of (D) for w = zo. To show that this is the maximal value of w, we note that since X is optimal for (P), there exists h E Q such that cx = min {cx I x
Ex h } .
But then by linear programming duality,
cx =max{uhbh 1 u h E uh)
I
=max{w w-uhbh
U
~ u Eh}
I A (w-uhbh60, tf".u")) hsQ
i.e., w 6 zo, and hence the maximum value of w is w,, = zo. (2) Assume that at least one of (P) and (D) is infeasible. If (P) is infeasible, x h = 8, V h E Q; hence for all h E Q, there exists iih E u h such that iihbh2 0. If (D) is also infeasible, we are done. Otherwise, for each h E Q there exists C E uh. But then defining uh(A)=Ch+Aiih,
hEO,
u h ( A )E uh, h E Q, for all A >0, and since iihbh>0, V h E Q, w can be made arbitrarily large by increasing A ; i.e., (D) has no finite optimum. Conversely, if (D) is infeasible, then either (P) is infeasible and we are done, or else, from the regularity condition, Q" \ Q"" # 8; and for h E Q" \ Q"* there h such that cx < 0. But then exists P E xh and x E z x ( p ) = P +/-&I
is a feasible solution to (P) for any p>O, and since ci.0, z can be made i.e., (P) has no finite optimum. arbitrarily small by increasing /-&; The above theorem asserts that either situation 1 or situation 2 holds for (P) and (D) if the regularity condition is satisfied. The following Corollary shows that the condition is not only sufficient but also necessary.
Corollary 4.1.1. If the regularity condition does not hold, then if (P) is feasible and (D) is infeasible, (P) has a finite minimum (i.e., there is a "duality gap"). Proof. Let (P) be feasible, (D) infeasible, and Q" \ Q**=@, i.e., for every h E Q*,let u h # 8. Then for each h E Q", min {cx I x E xh} is finite, hence (P) has a finite minimum. 0
E. Balm
14
Remark. The theorem remains true if some of the variables of (P) [of (D)] are unconstrained, and the corresponding constraints of (D) [of (P)] are equalities. The regularity condition can be expected to hold in all but some rather peculiar situations. In linear programming duality, the case when both the primal and the dual problem is infeasible only occurs for problems whose coefficient matrix A has the very special property that there exists x f 0 , u#O, satisfying the homogeneous system
AxaO,
x30;
uAs0,
USO.
In this context, our regularity condition requires that, if the primal problem is feasible and the dual is infeasible, then at least one of the matrices A h whose associated sets U,, are infeasible, should not have the above mentioned special property. Though most problems satisfy this requirement, nevertheless there are situations when the regularity condition breaks down, as illustrated by the following example. Consider the disjunctive program min - x1- 2x2,
- x1+ x2 3 0
(PI and its dual max w,
-u:-
(D)
u:W
s 0,
+2u:
w
u:
s-1,
u:
s -2, -u;so, - u;+
u ; s - 1, u:- u ; s -2, u: 3 0, i=l,2;
k=l,2.
The primal problem (P) has an optimal solution X = (0,2), with c f = -4; whereas the dual problem (D) is infeasible. This is due to the fact that Q*={l},Q\Q**={2} and X , = @ , U,=@,i.e., Q * \ Q * * = @ , hence the regularity condition is violated. Here
Disjunctive programming
15
5. The convex hull of a disjunctive set
Having described the family of all valid inequalities, one is of course interested in identifying the strongest ones among the latter, i.e., the facets of the convex hull of feasible points of a disjunctive program. If we denote the feasible set of a DP by
then for a given scalar ao,the family of inequalities ax 3 a. satisfied by all x E F, i.e., the family of valid inequalities for the DP, is obviously isomorphic to the family of vectors a E F&, where
F&,=(YER"I YXZW,,
VXEF},
in the sense that a x 3 a o is a valid inequality if and only if a E FE,,. In view of its relationship with ordinary polar sets, we call FE,, the reverse polar of F. Indeed, the ordinary polar set of F is
F ' = ( ~ E R "I yxS1, VXEF), and if we denote by Fa,, the polar of F scaled by ao, (i.e., the set obtained by replacing 1 with a. in F), then F$,) = -F;I)-+ The size (as opposed to the sign) of a. is of no interest here. Therefore we will distinguish only between the 3 cases ao>O (or ao= l), ao=O and ao 1, since this is the only case when the inequality ax 3 a. cuts off the point x = 0. This is why we need the concept of reverse polars. For an arbitrary set S c R " we will denote by cl S, conv S, cone S, int S and dim S, the closure, the convex hull, the conical hull, the interior and the dimension of S, respectively. For a polyhedral set S c R" we will denote by vert S and dir S the set of vertices (extreme points) and the set of extreme direction vectors of S, respectively. For definitions and background material on these and related concepts (including ordinary polar sets), the reader is referred to [37] or [38] (see also [28]). In [lo] we showed that while some of the basic properties of polar sets carry over to reverse polars, others can only be recovered in a modified form. In the first category we mention (a) (AS)#=(l/A)S#; (b) S s T 3 S#?T#; (c) ( S U T ) # = S # n T#, properties which follow from the definitions. In the second one we state a few theorems, which are from [lo] (see also [l]).
Theorem 5.1. (i) If aoSO, then S # # Q, and 0 E int cl conv S e S# is bounded
16
E. Balas
(ii) If a,,> 0, then
0 E cl conv S ($ S#
=@
e S#
is bounded.
Proof. (i) follows from the corresponding property of the ordinary polar So of S and the fact that Sgo)= - S o(a,,). (ii) For a,> 0, if S# # @ there exists y E R" such that xy ao, Vx E cl conv S. But 0 . y 0 and) ax 3 a,,,Vx E cl conv S ; which implies a E S#, i.e., S # # @ . It also implies ha E S#, Vh > 1, i.e., S# is unbounded. 0 From this point on, we restrict our attention to sets S whose convex hull is polyhedral and pointed. For the disjunctive set F, this condition is satisfied if Q is finite. Most of the results carry over to the general case, but proofs are simpler with the above assumptions. Theorem 5.2. If clconv S is polyhedral, so is S#. Proof. Let u,, . . . , u, and vl,. . . ,v, be the vertices and extreme direction vectors, respectively, of cl conv S. Then for every y E S there exist scalars hi 5 0, i = l , ..., p , p j z = O , j = l,..., q,withC:=,hi=1,such that i=l
j=1
and it can easily be seen that for arbitrary x, xy>a,,, VycS, if and only if x y a a , , i = 1,.. . ,p , and xuj >0, j = 1,.. . ,q. Thus
S# = {x E R"I
i = 1 , . . . ,p j=l, ...,q
xui aa,, X U j 3 0 ,
i.e., S# is polyhedral. 0 The next result describes the crucial involutory property of polar and reverse polar sets.
Theorem 5.3. Assume S# # pl. Then cl conv S + cl cone S, if a. > 0, if a. = 0, cl conv ( S U(0)).
Proof. S##
= {x E R"
if a,
I x y 3 a,,,Vy E S#}
x y 3 ao, for all y U ~ Y ~ C Ki = ~ l,
uiy>O,
,
i = 1,.. . , q
17
Disjunctiue programming
But xy >ao is a consequence of the system y y >ao, i = 1, . . . ,p and viy 3 0 , i = 1,.. . ,q (consistent, since S # # g), if and only if there exists a set of 4 3 0 , i = 1,.. . ,p, ui 3 0 , i = 1,., . , q , such that P
x=
8it4 i=l
with
+ if= l upi,
(5.1)
P
C 8iao 3
(YO.
i=l
Since S# is polyhedral, so is S##. thus S## is the closed set of points x E R" of the f o r m ( 5 . 1 ) w i t h 8 i 3 0 , i = 1 ,..., p u i 3 0 , i = 1,..., q , a n d
f.
i=l
4(
if ao>O, 3 0 , if ao=O, s 1, if ao
But these are precisely the expressions for the three sets claimed in the theorem to be equal to S## in the respective cases. 0
Corollary 5.3.1. cl conv S = SET nSET). Proof. Follows immediately from the proof of Theorem 5.3, where SET nSF?) corresponds to 8, = 1. 0
z,,
Example 5.1. Consider the disjunctive set -XI
xi
-x2==-2
1vx* 3 1
illustrated in Fig. l(a). Its reverse polars for a. = 1 and a0= -1 are the sets
and
-1
2y12 y 2 3 - 1
shown in Fig. l(b) and (c).
XE
18
E . Balas
(b)
(a)
Fig. 1.
{
- x x 1 - x 2 3 -2
x 1 + x 2 ~ ~ }FETl={x~R2 ,
F$p= x € R 2 x 1
a 0
x1
x230
x230
}
The next theorem is needed to prove some other essential properties of reverse polars.
Proof. If a0S 0, this follows from the corresponding property of ordinary polars. If ao>0 and 0 E cl conv S, then S# = 8, S## = R",and S### = 9 = S#. Finally, if a . > 0 and 0 $ cl conv S, then S###
= (cl conv S = {y E R"
+cl cone S)#
(from Theorem 5.3)
I xy a a0,Vx E (cl conv S + cl cone S)}
= { y € R " I x y ~ c r , , V x ~ S ) = S # .0
Fig. 2.
Disjunctive programming
19
The above results can be used to characterize the facets of cl conv S. To simplify the exposition, we will assume that S is a full-dimensional pointed polyhedron, which implies that S## is also full-dimensional. For the general case the reader is referred to [lo]. We recall that an inequality T X 3 r 0defines a facet of a convex polyhedral set C if T X 3 r 0 ,Vxe C, and T X = .rro for exactly d affinely independent points x of C, where d =dim C. The facet of C defined by T X a m0 is then {x E C 1 m = no}; but as customary in the literature, for the sake of brevity we call T X 3 r 0itself a facet. We proceed in two steps, the first of which concerns the facets of S##.
Theorem 5.5. Let dim S = n, and a,,# 0. Then a x 3 a. is a facet of S## if and only if a # 0 is a vertex of S#.
Proof. From Theorem 5.4,a ER" is a vertex of S# if and only if u y a a o , V u E vert S## u y 3 0, V u E dir S## and a satisfies with equality a subset of rank n of the system defining S#. Further, a # 0 if and only if this subset of inequalities is not homogeneous (i.e., at least one right hand side coefficient is a. # 0). On the other hand, a x 3 a . is a facet of S## if and only if (i) a x 3 ao,Vx E S##, i.e., a E S # ; and (ii) a x = a. for exactly n affinely independent points of S##. But (ii) holds if and only if au = a o for r vertices u of S##, and au = 0 for s extreme direction vectors v of S##, with r 3 1 (since a # 0) and r + s 3 n, such that the system of these equations is of rank n. Thus the two sets of conditions (for a x 3 a . to be a facet of S## and for a to be a vertex of S#) are identical. 0 By arguments similar to the above proof one shows that a x 3 0 is a facet of S## if and only if a is an extreme direction vector of S#. Unlike for ao#O, the homogeneous inequality ax 2 0 is a facet of SzP if and only if it is also a facet of Scf)[of Si"op], due to the fact that every extreme direction vector of S& is also an extreme direction vector of S c l ) [of S&], and vice-versa.
Theorem 5.6. Let dim S = n, and a. # 0. Then a x 3 a. is a facet of cl conv S i f and only i f it is a facet of S& Proof. (i) If ao>0, the halfspace a x >ao contains cl conv S if and only .if it contains cl conv S +cl cone $. If a . < 0, the halfspace a x b a . contains cl conv S if and only if it contains cl conv (S U(0)). From Theorem 4.3, in both cases ax b a. is a supporting halfspace for cl conv S if and only if it is a supporting halfspace for
E. Balas
20
(ii) Next we show that
{ x ~ c conv 1 S 1 ax = a,,}={xE ~
$ 1 2ax = a,,}.
The relation c follows from c l c o n v S ~ S E 5(Theorem 5.3). To show the converse, assume it to be false, and let x E Sg$ \ cl conv S satisfy ax = ao. From Theorem 5.3, x = Au for some u ECI conv S, and A > 1 if ao>O, O< A < 1 if a,,
where Q is assumed to be finite, and F to be full-dimensional (for the general case see [lo]).
i
y 2 BhAh, h E Q*
h E FEU,, = y ER"for some 0h >O, satisfying t9hal;3a,,
I
Q * ,
Proof. From Theorem 3.1, the set F:,, = { y E R" I x y 3 ao, V x E F} is of the form claimed above. The rest is a direct application of Theorems 5.5 and 5.6. 0 As for the case a,,= 0, from the above comments it follows that if ax 2 0 is a facet of cl conv F, then a # 0 is an extreme direction vector of FE,, for all a,,.The converse is not true, but if a f 0 is an extreme direction vector of F&>,for some a,,(hence for all a,,), then ax S O is either a facet of cl conv F, or the intersection of two facets, a'x3at and a 2 x 2 a &with a ' + a 2 = a and a:= -a;#O (see [lo] for the details). Since the facets of cl conv F are vertices (in the nonhomogeneous case) or extreme direction vectors (in the homogeneous case) of the convex polyhedron F#, they can be found by maximizing or minimizing some properly chosen linear
Disjunctive programming
21
function on F#, i.e., by solving a linear program of the form
or its dual
c eh
= g,]
hcQ*
~g[,h - A hchS 0,
1
h E Q".
tgz-0, p a 0 .
From Theorem 5.1, if aoSO then F&,, f $4, i.e., PT(g, a,,)is always feasible; whereas if a. > 0, then PT(g, ao) is feasible if and only if 0 $ cl conv F. This latter condition expresses the obvious fact that an inequality which cuts off the origin can only be derived from a disjunction which itself cuts off the origin. Two problems arise in connection with the use of the above linear programs to generate facets of cl conv F. The first one is that sometimes only Q is known, but Q" is not. This can be taken care of by working with Q rather than Q". Let P,(g,aJ denote the problem obtained by replacing Q" with Q in Pz(g,a,,), k = 1, 2. It was shown in [lo], that if P2(g,a") has an optimal solution $such that
(a==, PfO)+ h E Q * , then every optimal solution of P , ( g , a,,)is an optimal solution of PT(g, ao).Thus, one can start by solving P,(g, a,,).If the above condition is violated for some h E Q \ Q", then h can be removed from Q and P,(g, ao)solved for Q redefined in this way. When necessary, this procedure can be repeated. The second problem is that, since the facets of cl conv F of primary interest are > 0, since they cut off the the nonhomogeneous ones (in particular those with a,, origin), one would like to identify the class of vectors g for which PT(g, a,) has a finite minimum. It was shown in [lo], that PT(g, ao)has a finite minimum if and only if AgEclconvF for some A > O ; and that, should g satisfy this condition, a x 2 a,,is a facet of cl conv F (where F is again assumed full-dimensional) if and only if a = 7 for every optimal solution (7, to PT(g, ao). As a result of these characterizations, facets of the convex hull of the feasible set F can be computed by solving the linear program Pl(g, a,,)or its dual. If the disjunction defining F has many terms, like in the case when F comes from the disjunctive programming formulation of a 0-1 program with a sizeable number of
e)
E. Balas
22
0-1 conditions, Pl(g, ao>is too large to be worth solving. If, however, F is made to correspond to a relaxation of the original zero-one program, involving zero-one conditions for only a few well chosen variables, then Pl(g, ao) or its dual is practically solvable and provides the strongest possible cuts obtainable from those particular zero-one conditions. On the other hand, since the constraint set of P2(g,a,,)consists of 1 0 1 more or less loosely connected subsystems, one is tempted to try to approximate an optimal solution to P2(g,a,,)-and thereby to Pl(g, ao)-by solving the subsystems independently. Early computational experience indicates that these approximations are quite good. We now give a numerical example for a facet calculation.
Example 5.2. Find all those facets of cl conv F which cut off the origin (i.e., all facets of the form a x 3 l), where F c R2 is the disjunctive set F=F, v F ~ v F ~ v F ~ , with
F l = { x I - x 1 + 2 x 2 2 6 , O G ~ l c 1 x2>0}, ,
F 2 = { x ( 4 x 1 + 2 x 2 3 1 1 , 1Sx1=S2.5, x2Z30}, F ~ = { Ix - ~ l + ~ 2 ~ - 2 , 2 . 5 Q ~ l ~ 4 , ~ i ~ O } , F4={x ~ X ~ + X ~ ~ ~ , ~ ~ X ~ Q ~ , X ~ ~ O } , (see Fig. 3). After removing some redundant constraints, F can be restated as the set of those x E R: satisfying
and the corresponding problem Pl(g, 1) is
23
Disjwnctiue programming
Fig. 3.
Solving this linear program for g = (1, l),yields the optimal points
(y ; 6) = (f, f; t, $,
t, 81,
1 1 1 1 1 1 3 ; S, 9, g , 3),
and (Y;6) = (3,
which have the same y-component: (i, f). These points are optimal (and the associated y is unique) for all g > O such that g, < 5 g 2 . For g, = 5g,, in addition to the above points, which are still optimal, the points 1 7 1 2 1 3 1
(y; 6) = (5, S;
6, 9, 18,
d
and (Y;0) = (6,
7 2 1 3 1 2 E, g, i ~ d, ,
a),
which again have the same y-component y = (t, also become optimal; and they are the only optimal solutions for all g > 0 such that g, > 5g2. We have thus found that the convex hull of F has two facets which cut off the origin, corresponding to the two vertices y = (5, 4) and y2 = (2, g) of F$,.
’
iXl
+$,a 1,
$ x , +ax,> 1.
6 . Facial disjunctive programs In this section we discuss the following problem [lo]. Given a disjunctive program in the conjunctive normal form (2.2), is it possible to generate the convex hull of feasible points by imposing the disjunctions j e S one by one, at each step calculating a “partial” convex hull, i.e., the convex hull of the set defined by the inequalities generated earlier, plus one of the disjunctions? For instance, in the case of an integer program, is it possible to generate the convex hull of feasible points by first producing all the facets of the convex hull of points satisfying the linear inequalities, plus the integrality condition on, say, x , ; then adding all these facet-inequalities to the constraint set and generating the facets of the convex hull of points satisfying this amended set of inequalities, plus the integrality condition on x,; etc.? The question has obvious practical importance, since calculating facets of the convex hull of points satisfying one disjunction is a considerably easier task, as shown in the previous section, then calculating facets of the convex hull of the full disjunctive set. The answer to the above question is negative in general, but positive for a very important class of disjunctive programs, which we term facial. The class includes (pure or mixed) 0-1 programs.
E . Balm
24
The fact that in the general case the above procedure does not produce the convex hull of the feasible points can be illustrated on the following 2-variable problem.
Example 6.1. Given the set F,, = {X E R~ I - 2 ~+,2x2s 1, 2 ~-,2 ~ s, 1, 0 s
s 2, 0 s xz s 2)
find F = cl conv (F,,fl {x 1 x,, x2 integer}). Denoting
F, = cl conv (F,, n{x I x1 integer}),
F2 = cl conv (F, n { x I x2 integer)),
the question is whether F2 = F. As shown in Fig. 4,the answer is no, since
If the order in which the integrality constraints are imposed is reversed, the outcome remains the same. Consider a disjunctive program stated in the conjunctive normal form (2.2), and denote
F,={xER" IAx>a,, ~ 2 0 ) . The disjunctive program (and its constraint set) is called facial if every
(0,O)
(2,O)
(0,O)
(2,O)
(b) F = c l c o n v (Fan { x l x , , x2 integer))
(a)
Fp = c l c o n v ( FIn {xlxeinteger))
I
Fig. 4.
Disjunctive programming
25
inequality d i x 3 din that appears in a disjunction of (2.2), defines a face of Fn: i.e., if for all i E Q,, j E S, the set
F n n { xI d'xsdio} is a face of FO.(A face of a polyhedron P is the intersection of P with some of its boundary planes.) Clearly, DP is facial if and only if for every i E Qi, j E S, F o g {x 1 d'x d die}. The class of disjunctive programs that have the facial property includes the most important cases of disjunctive programming, like the 0-1 programming (pure or mixed), nonconvex quadratic programming, separable programming, the linear complementarity problem, etc. ; but not the general integer programming problem. In all the above mentioned cases the inequalities d ' x s d , , of each disjunction actually define facets, i.e., (d - 1)-dimensional faces of F,, where d is the dimension of F,. Another property that we need is the boundedness of Fo. Since this can always be achieved, if necessary, by regularizing Fn, its assumption does not represent a practical limitation.
Theorem 6.1. Let the constraint set of a DP be represented as F=[xEF,
1V
iEQ,
(dixz-di,),jES},
where F,={xeR" I A x z - a o , x>O}. For an arbitrary ordering of S , define recursively
V (F,-,n{ x I d'x
ieQ,
die})],
j
= 1, . . . , IS[.
If F is facial and FO is bounded, then Flsl= conv F. The proof of this theorem [lo] uses the following auxiliary result.
Lemma 6.1.1. Let PI, . . . ,P, be a finite set of polytopes (bounded polyhedra), and P= Ph. Let H' = {x E R" I d'x 4 dio} be an arbitrary halfspace, and H - = {x E R" I d'x 3 die}. I f P G H + , then
u;*=,
H - nconv P = conv ( H - nP ) . Proof. Let H - n conv P # $3 (otherwise the Lemma holds trivially). Clearly, ( H - n P ) G ( H - nconv P ) , and therefore conv ( H - n P ) G conv ( H - nconv P ) = H -
nconv P.
E. Balas
26
To prove =, , let u l , . . . , u, be the vertices of all the polytopes Ph, h = 1, . . . , r. Obviously, p is finite, conv P is closed, and vert conv P G (Ukz1 vert Ph). Then P
x E H7-I conv P5$ d'x 3 dio and
X
hkUk,
= k=l
with P
k = l , ..., p .
xhk=1, k=l
Further, Pc H + implies d'u, S die, k = 1 , . . . ,p. We claim that in the above expression for x , if hk > 0 then d'uk 3 dio. To show this, we assume there exist h k > 0 such that diUk < dio. Then d'x=d'(~lAkuk)
f:
k=l
=
a contradiction. Hence x is the convex combination of points uk E H - n P, or x E conv ( H - n P). 0 Another relation to be used in the proof of the theorem, is the fact that for arbitrary S , , S,sR", conv (conv S , U conv S,) = conv ( S , U SJ.
Proof of Theorem 6.1. For j = 1, the statement is true from the definitions and the obvious relation
V
( F o n { xI d ' x 2 d i o } ) = X E F ,
ieQl
v (d'xad,,)).
licQ,
To prove the theorem by induction on 1, suppose the statement is true for j = 1 , ..., k. Then
V
1
(Fk n{x 1 d'x a die}) (by definition)
iEQkt,
V
=conv[
1
( { x ( d i x a d i o } f l c o n v ( x E F O '€0, V ( d i x 2 = d i o )j, = 1 , ...,
icQk+l
(from the assumption)
(from Lemma 6.1.1, since DP is facial, hence F 0 r { x 1 d ' x s d i o } , and Fo is bounded.) =conv[(
V ieQk+I
{ x I d ' x z = d i o } n X E F , V ( d ' x a d i o ) ,j = 1 , . . . ,
) {
IieQ,
21
Disjunctive programming
(from (6.1) and the distributivity of
n with respect to V)
(d’x a d i o ) , j = 1, . . . , k + I], i.e., the statement is also true for j = k + 1. 0 Theorem 6.1 implies that for a bounded facial disjunctive program with feasible set F, the convex hull of F can be generated in )SJstages, (where S is as in (2.2)), by generating at each stage a “partial” convex hull, namely the convex hull of a disjunctive program with only one disjunction. In terms of a 0-1 program, for instance, the above result means that the problem min{cxIAxab, O S x S e , xj=O or 1, j = 1 , ..., n } , where e = (1,.. . , l), is equivalent to (has the same convex hull of its feasible points, as) min{cxIAxab, O S x S e , aixacuio, iEH,, xj=O or 1, j = 2 , . . . , n } (6.2) where a i x 3 a i o , i € H I , are the facets of F,=conv{x I A x a b , O q x S e , x , = O o r l } . In other words, x1 is guaranteed to be integer-valued in a solution of (6.2) although the condition x, = 0 or 1 is not present among the constraints of (6.2). A 0-1 program in n variables can thus be replaced by one in n - 1 variables at the cost of introducing new linear inequalities. The inequalities themselves are not expensive to generate, since the disjunction that gives rise to them (xl = Ovxl = 1) has only two terms. The difficulty lies rather in the number of facets that one would have to generate, were one to use this approach for solving 0-1 programs. However, by using some information as to which inequalities (facets of a “partial” convex hull) are likely to be binding at the optimum, one might be able to make the above approach efficient by generating only a few facets of the “partial” convex hull at each iteration. This question requires further investigation. For additional results on facial disjunctive programs see [33, 341.
7. Disjunctive programs with explicit integrality constraints The theory reviewed in the previous sections derives cutting planes from disjunctions. In this context, 0-1 or integrality conditions are viewed as disjunctions, and the disjunction to be used for deriving a cut usually applies to the basic variables. In this section we discuss a principle for strengthening cutting planes derived from disjunctions in the case when, besides the disjunction which applies to the
E. Balm
28
basic variables, there are also integrality constraints on some of the nonbasic variables. In [14] we first proved this principle for arbitrary cuts, by using subadditive functions, then applied it to cuts from disjunctions. Here we prove the principle directly for the latter case, without recourse to concepts outside the framework of disjunctive programming. Let a DP be stated in the disjunctive normal form (2.1), and assume in addition that some components of x are integer-constrained. In order for the principle that we are going to discuss to be applicable, it is necessary for each Ahx, h E Q, to have a lower bound, say b,h. With these additional features, and denoting by J the index set for the components of x (IJI = n ) , the constraint set of the DP can be stated as A'x 2 b:,, i E Q, (7.1) x 20,
V (A'x2 ah),
(7.2)
i s 0
and
xi integer,
~EJ c J, ,
(7.3)
where
ahabf,
iEQ.
(7.4)
Let Q = (1, . . . , q } , and let a; stand for the j-th column of A', j
E J,
i E Q.
Theorem 7.1. [14] Define
mi integer,
iEQ
(7.5)
Then every x ER"that satisfies (7.1), (7.2), (7.3), also satisfies the inequality
1"pi 2
(7.6)
"0,
jsJ
where
.={
inf maxB'[aj+rq(ab-bb)],
jEJ1
m o M ieQ
(7.7) max
e%;,
j E J \ J , = J2
io0
and = min
. .
eta;.
(7.8)
'€0
To prove this theorem we will use the following auxiliary result.
Lemma 7.1. Let mj EM, mi = (mii), j E J1.Then for every x E R" satisfying (7.3) and x 3 0, either
2 miixi= 0,
id,
vi E Q
(7.9)
Disjunctioe programming
29
or (7.10)
holds.
Proof. If the statement is false, there exists 2 3 0 satisfying (7.3) and such that
c c m-x.-,<0. *I
is0 i c J ,
On the other hand, from X 3 0 and the definition of M,
a contradiction. 0
Proof of Theorem 7.1. We first show that every x which satisfies (7.1), (7.2) and (7.3), also satisfies (7.2') for any set of mi EM, j EJ,.To see this, write (7.2') as (7.2") From Lemma 7.1, either (7.9) or (7.10) holds for every x 3 0 satisfying (7.3). If (7.9) holds, then (7.2") is the same as (7.2) which holds by assumption. If (7.10) holds, there exists k E Q such that CieJ,rn,,x, = 1+ A for some A L 0. But then the kth term of (7.2") becomes
1a xi 2 bb - A (a;- b fJ) jeJ
which is satisfied since h(af;-bi)Z=O and x satisfies (7.1). This proves that every feasible x satisfies (7.2'). Applying to (7.2') Theorem 3.1 then produces the cut (7.6) with coefficients defined by (7.7), (7.8). Taking the infimum over M is justified by the fact that (7.6) is valid with a. as in (7.8), ai as in (7.7) for j € J 2 , and ai=max Oi[aj+mii(ab-bb)] isQ
for j E J l r for arbitrary miEM. 0
Corollary 7.1.1. [14]. Let the vectors cri(a6- b;) = 1,
dab > 0.
cri,
i E Q, satisfy (7.11)
E . Balas
30
Then every X ER"that satisfies (7.1), (7.2) and (7.3), also satisfies (7.6')
where f
(7.7')
Proof. Given any ai, i E Q, satisfying (7.11), if we apply Theorem 7.1 by setting 6' = ( a i / u i a ; )i, E Q, in (7.7) and (7.8), we obtain the cut (7.6'), with Pi defined by (7.7'), j € J . 0 Note that the cut-strengthening procedure of Theorem 7.1 requires, in order to be applicable, the existence of lower bounds on each component of A'x, V i E Q. This is a genuine restriction, but one that is satisfied in many practical instances. Thus, if x is the vector of nonbasic variables associated with a given basis, assuming that A'x is bounded below for each i E Q amounts to assuming that the basic variables are bounded below and/or above. In the case of a 0-1 program, for instance, such bounds not only exist but are quite tight.
Example 7.1. Consider again the mixed-integer program of Example 3.1 (taken from [35]), and assume this time that x1 and x2 are 0-1 variables rather than just integer constrained, i.e., let the constraint set of the problem be given by x1 =
x2
0.2+0.4( -x,)+ 1 . 3- ~ ~ ) - 0 . 0 1 ( - ~ g ) + o . 0 7 ( - ~ 6 )
= 0.9-0.3( -x,)
+ 0.4(-x4)-0.04(
-xg)
+ 0.1( -x6)
x j a O , j = l , ..., 6; x j = O o r 1 , j = 1 , 2 ; xi integer,j=3,4. This change does not affect the Gomory cuts or the cuts obtainable from extreme valid inequalities for the group problem, which remain the same as listed in Example 3.1. Now let us derive a cut, strengthened by the above procedure, from the disjunction
Since xl, - x2 and x2 are bounded below by 0, - 1 and 0 respectively, we have
a ; = ( - 0.2 o.9),
b;=(-o.l); - 0.2
ag=0.1,
b:= -0.9.
31
Disjunctive programming
Applying Corollary 7.1.1, we choose u1= (4, l), u 2= 1, which is easily seen to satisfy (7.11). Since Q has only 2 elements, the set M of (7.5) becomes
~ = { m = ( m , , m , ) ~ m , +m,, m , m~2~integer} ; and, since at the optimum we may assume equality,
I
~ = { =(m,, m -ml) m, integer}. The coefficients defined by (7.7‘) become
’3
min max
4 X (-0.4)+ 1X (-0.3)+ m, 1X 0.3- m, 4X(-0.2)+1X0.9 ’ lXO.l
min max
4 X (- 1.3)+ 1X 0.4+ m, 1X (-0.4)- m, = -24 (with m, 4~(-0.2)+1X0.9 ’ lXO.l
= m , integer
p4=m,integer
{
1-
ps = max
4XO.Ol+lX(-O.O4) 1XO.04 4 x (-0.2)+ 1X 0.9 ’ 1X 0.1
= 0.4,
p6 = max
4 X (-0.07)+ 1X 0.1 1X (-0.1) 4X(-0.2)+1X0.9 ’ l X O . l
I
=
=
-7
(with m , = l), = 2),
-1,
and the cut is
+
- 7x, - 24x4 0 . 4 ~ xg ~ 2 1,
which has a smaller coefficient for x4 (and hence is stronger) than the cut derived in Example 3.1. In the above example, the integers mi were chosen by inspection. Derivation of an optimal set of mi requires the solution of a special type of optimization problem. Two efficient algorithms are available [14] for doing this when the multipliers u iare fixed. Overall optimization would of course require the simultaneous choice of the ui and the mi, but a good method for doing that is not yet available. The following algorithm, which is one of the two procedures given in [14], works with fixed ui, i E Q. It first finds optimal noninteger values for the mi, i E Q, and rounds them down to produce an initial set of integer values. The optimal integer values, and the corresponding value of pi, are then found by applying an iterative step k times, where k 6 IQI- 1, IQ1 being the number of terms in the disjunction from which the cut is derived.
Algorithm for calculating
Pi,j E J 1 , of (7.7’)
Denote (y.
=
. . (Tq,
hi =
(7.12)
E. Balm
32
and (7.13) Calculate m"=-ai,
(7.14)
iEQ,
hi
set mi= [m?]],i E Q, define k
=
-CieO [rn?],
and apply k times the following
Iterative Step. Find &(a,+m, + l ) = m i n hi(ai+mi +1) isQ
and set
m, + m, + 1,
mi
i E Q \ {s}.
+ mi,
This algorithm was shown in [14] to find an optimal set of mi (and the associated value of Pi) in k steps, where k = -Ciso [m*]sIQI- 1.
Example 7.2. Consider the integer program with the constraint set -1
+5( - x5)-$x - X6) +d(-
x7), x2 = $+$( - Xg) +&(- X6) -&( - x7), 1 -6
+t( xg) -g - x7),
x3 = $-%(- x5) 4
XI
-1
-6+
-
2( - x5) +3- X6) - i x
-
x7),
+ x2 + x3 + x4 3 1,
x j = O o r l , j = l ,..., 4;
x i 3 0 integer, j = 5 , 6 , 7 .
We wish to generate a strengthened cut from the disjunction
x, 3 1v x* 3 1v x33 1v x 43 1. If we apply Theorem 3.1 without strengthening and choose 8' = (1 -aio)-',
i = 1 , 2 , 3 , 4 , we obtain the cut $ X S + ~ X 6 + f X1,7 ~ whose jth coefficient is - a,
ai= max -. i~{1,2,3,4)1- a,(]
To apply the strengthening procedure, we note that each xi, j
= 1,2,3,4
is
Disjunctiue proRramm inR
33
bounded below by 0. Using mi =1 (which satisfies (6.11) since a'(ab-bb)= . . m i[1 - aio- (-aio)]= 1, i = 1,2,3,4, and m'af, = m i(1- aio)> 0, we obtain
p7 = min max{$(-d+ m,), $Q+m2),$($+m,), $($+m4)}. msM
Next we apply the above Algorithm for calculating For j = 5 : y =
-#; my=%, m f =-23 102,
m:=
PI:
-8, m f = $ .
Thus our starting values are [my]= 0, [mf] = - 1, [m:] = - 1, [ m f ]= 0. Since k = - ( - 1)- (- 1) = 2,the Iterative step is applied twice:
1. min{-&, -$,
3,
+}=-$, s=2; m,=0, m2=-1+1=0,
m 3 = - 1 . m,=O.
2. min{-+,&$,$}=-+, s = I ; m , = I , m 2 = 0 , m3=-1, m 4 = 0 . These are the optimal m,, and ~ , = m a x { - f , -$, For j = 6 ; y =
1.
-4,
-6, [my]=-1,
-14 , 32, 51}-- -I4,
-$}=-I 5
5.
[ m : ] = -1, [mf]=O, [m:]=O; k=2.
s=2; m , = - l , m2=0, m3=o, m 4 = 0 .
2. min{& 2, $,&}=+, s = 4 ; m = - I , m=O, m=O, m = l .
-a,
P,=max{-:,
-!, +}=+.
Forj=7:y = -&, [mT]=O, [mf]=- 1 , [mT]= -1, [ m f ] =-1; k = 3 . 1. min{!5 , 41, 31, 5 1)-1 - s , s = l ; m,=1, m 2 = - 1 , m 3 = - 1 , m a = - 1 ; 2. min{$, $, f , &}=&s=4; m , = I , m2= -1, m 3 = -1, m4=O; 3. rnin{$,$,f, $}=$, s=2; m , = I , m2=0, m 3 = - 1 , m 4 = 0 .
p7 = max (3, $, -3. +}=$. Thus the strengthened cut is -:x,
+:x,
+ax,
3
1.
The frequently occurring situation, when IQI = 2, deserves special mention. In this case the coefficients pi, j E J , , are given by Pi
= min {A, (a1 + ( m$>),A 2 ( a 2 - [ mEI)},
(7.15)
where
(7.16)
E. Balas
34
with ai, hi, i = 1, 2, defined by (7.12), and (rn;) = the smallest integer 3 rn:. The optimal value of rn, = - rn2 is either (rn;) or [rn;], according to whether the minimum in (7.15) is attained for the first or the second term. The strengthening procedure discussed in this section produces the seemingly paradoxical situation that weakening a disjunction by adding a new term to it, may result in a strengthening of the cut derived from the disjunction; or, conversely, dropping a term from a disjunction may lead to a weakening of the inequality derived from the disjunction. For instance, if the disjunction used in Example 7.2 is replaced by the stronger one X I 3
1 v x2 3 1 v x3 3 1,
then the cut obtained by the strengthening procedure is -&x5+$xg+&,&
1,
which is weaker than the cut of the example, since the coefficient of x6 is $ instead of 4. The explanation of this strange phenomenon is to be sought in the fact that the strengthening procedure uses the lower bounds on each term of the disjunction. In Example 7.2, besides the disjunction x1 3 1v x2 3 1v x3 2 1v x4 3 1, the procedure also uses the information that xi 2 0 , i = 1, 2, 3, 4, When the above disjunction is strengthened by omitting the term x4 B 1, the procedure does not any longer use the information that x4 2 0.
8. Some frequently occurring disjunctions
As mentioned earlier, one of the main advantages of the disjunctive programming approach is that it can make full use of the special structure inherent in many combinatorial problems. In [8, 91 cutting planes are derived from the logical conditions of the set partitioning problem, the linear complementarity problem, the two forms of representation for nonconvex separable programs, etc. More general complementarity problems are discussed in [33, 341. Here we illustrate the procedure on the frequently occurring logical condition (where xi 2 0 integer, iEQ)
cxi=l, i EQ
often called a multiple choice constraint. If all the problem constraints are of this form, we have a set partitioning problem. But the cut that we derive uses only one condition (8.1), so it applies to arbitrary integer programs with at least one equation (8.1). It also applies to mixed-integer programs, provided the variables 4,i E Q, are integer-constrained. Here we discuss the all integer case, to simplify the notation. Let I and J be the index sets for the basic and nonbasic variables in a basic
35
Disjunctive programming
feasible noninteger solution of the form
xi = a,,+
(8.2)
i E I.
- q), i d
In [8], [9], several cutting planes are derived from the disjunction
Here we describe another cut, which in most cases turns out to be stronger than those mentioned above. It is derived from the disjunction
clearly valid for any partition (Q1,Q,) of Q in the sense of being satisfied by every integer x satisfying (8.1). We require only that
I ai, > 0) # (3,
{i E Qk Denoting
P,”=
c
aii,
k
(8.4)
= 1,2.
(8.5)
k = 1,2; j E J U { O ) ,
ielnQk
(8.3) can be written as
which implies the disjunction
with pb> 0, k = 1 , 2 . Note that once the sets I n Qk, k = 1,2, are chosen, the sets Jn Q1 and Jn Q, can be “optimized,” in the sense of putting an index j E Jfl Q into Q1if (&!/PA) 3 (PT/p& and in Qz otherwise. Using this device while applying Theorem 3.1 to (*) with multipliers tIk = l/P;, k = 1, 2, we obtain the cut
1
Pj?
3
1
7
with coefficients jEJ\
Q, (8.7)
,
jEJnQ.
E. Balm
36
We now apply the strengthening procedure of Section 7 to the coefficients pi, j E J \ Q (the coefficients indexed by J n Q can usually not be further strengthened). This, as mentioned earlier, assumes that all xi, jEJ\Q, are integer constrained. A lower bound on CieJ,Qkp,"xj is pk-1, for k = 1,2, since
C
xi=pf;+
idnQk
C
p,"(-xi)S1,
k = l , 2.
IEJ\Qk
The multipliers uk= 1, k = 1, 2, salisfy condition (7.11) of Corollary 7.1.1, since ~ ~ [ P ~ - ( ( P ~ - l ) l = luSp;>O, , k = l , 2, and thus the jth coefficient of the strengthened cut becomes (Corollary 7.1.1)
m e M ks(1.2)
with M defined by (7.5). Applying the closed form solution (7.15) to the minimization problem involved in calculating pi (in the special case of a disjunction with only two terms), we obtain
and hence
where
We have thus proved
Theorem 8.1. If (8.2) is a basic feasible noninteger solution of the linear program associated with an integer program whose variables have to satisfy (8.1), then for any partition (In Q1, I n Q,) of the set I n Q satisfying (8.4),the inequality (8.6) is a valid cut, with coefficients
(8.9)
where the p,", k = 1, 2, j e J , are defined by (8.5) and mg is given by (8.8). We illustrate this cut on a set partitioning problem, which is a special case of Theorem.
Example 8.1. Consider the set partitioning problem whose cost vector is c = (5,4, 3 , 2, 2, 3, 1, 1, 1, 0) and whose coefficient matrix is given in Table 1.
37
Disjunctive programming Table 1 1
2 3 4
1 1 2 1 1 3 1 4 1 5
5 6 7 8 9 10 - 1 1
1 1
1 1 1 1 1 1 1 1 1 1 1 1
1
The linear programming optimum is obtained for x4 = x8 = xg = 1/3, x7 = 2/3, xl0 = 1, and xi = 0 for all other j . The associated system (8.2) is shown in the form of a simplex tableau in Table 2 (artificial variables have been removed). Table 2 1
-x,
-xg
-x3
-xs
-x2
~~~
We choose the disjunction corresponding to row 5 of the matrix shown in Table 1, which is of the form (8.1), with Q = {3,4,5,6,8,9}. We define I f l Q1= {4,8}, In Q2= {9}, and we obtain the coefficients shown in Table 3. Table 3
Since J n Q = {3,5,6}, we need the values m$ for j mX(1) =$, mz(2) = -5. Hence
8}}=0,
pa=min{o, max{q,
{
p3 = min 0, max
{i,}:
= 0,
0 -2 ps=min{0, max{j, f } } = ~ , {"O -$-(-l) p,=min 7, 1 3
EJ
\ Q = (1,2}. They are
E. BRIRS
38
and we obtain the cut
$x,+x2z-1, or 1
-x,
-xg
-x3
-x5
-x2
0
0
-1
~
s
-1
-4
0
which is considerably stronger than the traditional cuts that one can derive from Table 2, and it actually implies that the nonbasic variable x2 has to be 1 in any integer solution. Dual cutting plane methods have been found reasonably successful on set partitioning problems. Using stronger cuts can only enhance the efficiency of such methods, since the computational cost of the cut (8.9) is quite modest.
9. Combining cutting planes with branch and bound
The disjunctive programming approach offers various ways of combining branch and bound with cutting planes, some of which are currently the object of computational testing. Here we discuss one feature which seems to us crucial. For any problem P, let u(P) denote the value of P (i.e., of an optimal solution to P ) . Suppose we are using branch and bound to solve a mixed-integer 0-1 program P, stated as a maximization problem. If {Pi}ioQis the set of active subproblems (active nodes of the search tree) at a given stage of the procedure and fi(Pi)is the available upper bound on u(Pi) (assume, for the sake of simplicity, that G(Pi)=u(LPi), where LP, is the linear program associated with P i ) , then maxioQ6(Pi)is an upper bound on u ( P ) . Also, _v(P),the value of the best integer solution found up to the stage we are considering, is of course a lower bound on u ( P ) ; i.e., at any stage in the procedure,
-u ( P )s u ( P )s max fi(Pi). ioQ
(9.1)
Hence the importance of finding good bounds for both sides of (9.1). It is a crucial feature of the approach reviewed here that it can be used to derive a cutting plane from the optimal simplex tableaux associated with the subproblems LP,, i E Q, which provides an upper bound on u ( P ) at least as good as, and often better than maxioQfi(Pi). Let the linear program LP associated with the mixed-integer 0-1 program P have an optimal solution of the form (9.2)
Disjwnctiue programming
39
where I and J are the index sets for the basic and nonbasic variables respectively, and let IIand J1be the respective index sets for the integer constrained variables. Here ah0 3 0, h E I, and ah0 < 1, h E II.Further, since P is a maximization problem and the solution (9.2) is optimal, aOj3 0, j E J. Now let {Pk}ksQ be the set of active subproblems, and for k E Q, let the Optimal solution to LPk, the linear program associated with Pk, be of the form
where I k , J k are defined with respect to LPk the same way as I, J with respect to LP. Again a,ki>O,V j E J k , since each LPk is a maximization problem. In order to derive a valid cutting plane from (9.3)k, k € Q, we view the branching process as the imposition of the disjunction
where A x 3 b stands for the system
1(-
ahj)Xj
a - ahor
h E I,
jsJ
cahjxj isJ
3 aha- 1,
h E I,,
expressing the conditions x h a o , h € I , composed of inequalities of the form
xh S
1, h E I l , while each D k x L d $ is
caiixj c(
3 ai,
j
or
d
- U i j > X j3 1- aio,
jeJ
corresponding to the conditions 6 0 or xi > 1 for some i E I, whose totality, together with Ax 5 b, x 2 0, defines Pk. Now consider the cut derived from (9.4) on the basis of Theorem 3.1, with the optimal dual variables (obtained by solving LPk) used as the multipliers B k , k E Q. By optimal dual variables we mean those obtained by minimizing the sum of aojxj, j e J , subject to the kth term of (9.4). If for k g Q we denote by ( u k ,u k ) the optimal dual vector associated with the kth term of (9.4), and a k= u k D k+ v k A ,
a$= u k d $ + v k b ,
(9.5)
and if ah> 0, k E Q, then according to Theorem 3.1, the inequality (9.6)
E . Balas
40
is satisfied by every x satisfying (9.4), i.e., by every feasible integer solution. The condition a;> 0, k E Q, amounts to requiring that U(LPk)< u(LP), i.e., that the “branching constraints” Dkx a d : force u(LP) strictly below u(LP), Vk E Q. This is a necessary and sufficient condition for the procedure discussed here to be applicable. Should the condition not be satisfied for some k E Q, one can use a different objective function for LPk than for the rest of the subproblems -but we omit discussing this case here. Note that, since b S O , ukbSO, V k E Q, and thus agk>O implies ukdgk>O, V k E Q . Since the multipliers ( u k ,u k ) are optimal solutions to the linear programs dual to LPk, k E Q, they maximize the right hand side coefficient agk of each inequality akx3 agk underlying the cut (9.6) subject to the condition that af S a,, V j E J. We now proceed to strengthen the inequality (9.6) via the procedure of Section 7. To do this, we have to derive lower bounds on akx, k E Q. We have k~
+
- a: = U k (Dkx- dgk) u (Ax - b ) 2 u k (Dkx- d;) a - uke,
where e = (1, . . . , 1). The fist inequality holds since Ax - b 2 0 for all x satisfying (9.4), while the second one follows from the fact that each inequality of the system Dkx - dgk2 0 is either of the form -xi 2 0 or of the form xi- 120, with i E 11,and in both cases - 1 is a lower bound on the value of the left hand side. Thus a k x a a ; - u k e, kEQ (9.7) holds for every x satisfying (9.4). Note that ukdgk>O implies u k # O and hence uk 3 0 implies uke > 0, k E Q. We now apply Corollary 7.1.1 to the system akx>agk-u k e, ~ E Q ,
v
x 20, (akxaagk),
ksQ
x i integer,
j
EJ,.
We choose crk = l/uke, k E Q, which satisfies condition (7.11) o the Corollary:
(l/uke)[agk-(a:- u‘e)] = 1,
(l/uke)agk>o.
The strengthened cut is then
C Pjxi 2 1,
(9.9)
iSJ
with
p. =
(9.10)
Disjunctive programming
41
where
(9.11) The values of a:, a: and uke needed for computing the cut coefficients, are readily available from the cost row of the simplex tableaux associated with the optimal solutions to LP and LPk, k E Q. If the latter are represented in the form (9.2)and (9.3,respectively, and if d ; and a, denote the jth column of D k and A of (9.4),while S k is the row index set for D k , k E Q, we have for all k E Q, = aOi - Ukd;- v k
=aOi-a:,
~ i
jEJknJ,
and jEJknSk,
a&=O+u;,
since the indices j E s k correspond to the slack variables of the system - D k xs -d,k, whose costs are 0 (note that S k n J = 9, by definition). Further, for j E J \ J k = J nI k the reduced cost of xi in LPk is 0, hence for all k E Q
0 = aoi- ukd:-vka, = a,,
-a;.
Finally,
a;,, = a”,- u k d i - v k b VkeQ.
=UOO-~;,
From the above expressions we then have for k E Q,
(9.12)
uke=
C
aii
(9.13)
i E Jkns,
(since uk = 0, V i E Skn I k ) . The representation (9.3)kof the optimal solution to LPk assumes that the slack variable j E S k of each “branching constraint” xic 0 or xi 3 1 that is tight at the optimum, is among the nonbasic variables with aOi> 0. If one prefers instead to replace these slacks with the corresponding structural variables xi and regard the latter as “fixed” at 0 or 1, and if Fk denotes the index set of the variables fixed in LPk, the reduced costs a,ki, i € J k nFk are then the same, except for their signs, as a&, j E J k n s k , and the only change required in the expressions derived above is
E.Balas
42
to replace (9.13) by
uke =
1
(9.13')
isJknFk
Of course, in order to calculate a cut of the type discussed here, one needs the reduced costs u,"~for both the free and the fixed variables. We have thus proved the following result.
Theorem 9.1. If LP and LPk, k E Q, have optimul solutions of the form (9.2) and ( 9 3 , respectively, with a,, > at,, k E Q, then every feasible integer solution satisfies the ilzequality (9.9), with coefficients defined by (9.10), (9.11), (9.12) and (9.13) [or (9.1371.
In the special case when lQl=2 and LP1, LP2 are obtained from LP by imposing x, s 0 and 3 1 respectively (for some i E I , such that 0 < ai,< l ) , the definition of pi for j E J 1 becomes (9.10')
with (9.14)
We now state the property of the cut (9.9) mentioned at the beginning of this section.
Corollary 9.1.1. Adding the cut (9.9) to the constraints of LP and performing one dual simplex pivot in the cut row reduces the objective function value from a,, to tioo such that a,, G max a,,.k (9.15) kEQ
Proof. For each k E Q, a,",= a,,VkEQ, Then
a,". Now suppose (9.15) is false, i.e., tio,> ak,,
tioo= a,, - min aOi/Pi> a,, - a,", Vk E i d IPj>O
Q,
and hence for all k E Q, (9.16)
(9.17)
Disjunctive programming
43
The second inequality in (9.16) holds since the cut (9.9) is a strengthened version of (9.6), in the sense that p,srnax-i;, ff . :jEJ. hsQ
ffo
Now suppose the minimum in the second inequality of (9.16) is attained for j = t. Since (9.16) holds for all k E Q, we then have
or (since aS>O, ag>0), a;> uot.But this contradicts the relation a ; S a o timplied by (9.12) Note that the Corollary remains true if the strenghtened cut (9.9) is replaced by its weaker counterpart (9.6). In that case, however, (9.15) holds with equality. The remarkable fact about the property stated in the Corollary is that by using the strengthened cut (9.9) one often has (9.15) satisfied as strict inequality. More precisely, we have the following
Remark 9.1. (9.15) holds as strict inequality if pt . ;€Jlal’3.0
Note that for (9.15) to hold as strict inequality the pivot discussed in the Corollary need not occur on a “strengthened” cut coefficient. All that is needed, is that the coefficient on which the pivot would occur in case the unstrengthened cut were used, should be “strengthened” i.e., reduced by the strengthening procedure). The significance of the cut of Theorem 9.1 is that it concentrates in the form of a single inequality much of the information generated by the branch and bound procedure up to the point where it is derived, and thus makes it possible, if one so wishes, to start a new tree search while preserving a good deal of information about the earlier one. Theorem 9.1 and its Corollary are stated for (pure or mixed) 0-1 programs; but the 0-1 property (as opposed to integrality) is only used for the derivation of the lower bounds on Dkx a d k , k E Q; hence it only involves those variables on which branching has occured. The results are therefore valid for mixed-integer programs with some 0-1 variables, provided the strengthening procedure is only used on cuts derived from branching on 0-1 variables.
Example 9.1. Consider the problem in the variables xi2 0, j = 1, . . . , 6 ; xi= 0 or 1, j = 1 , 4; xi integer, j = 2 , 3, whose linear programming relaxation has the optimal solution shown in Table 4.
E . Balas
44
Table 4
xo X, X*
1
-x3
-x4
-xs
- XI5
1.1 0.2 0.9
2.0 0.4 -0.3
0.2 1.3 0.4
0.05 -0.01 -0.04
1.17 0.07 0.1
If we solve the problem by branch and bound, using the rules of always selecting for branching (a) LPk with the largest ai,, and (b) xi with the largest max {up penalty, down penalty}, we generate the search tree shown in Fig. 5. The optimal solution is x = (0,2, 1,0,20,0), with value - 1.9, found at node 6. To prove optimality, we had to generate two more nodes, i.e., the total number of nodes generated (apart from the starting node) is 8.
Fig. 5 .
Suppose now that after generating the first four nodes, we wish to use the available information to derive a cut. At that point there are three active nodes, associated with LPk, for k = 2, 3, 4. The corresponding reduced cost coefficients a&, j E J k , are shown in Table 5. The slack variables of the “branching constraints” x1 S O , x1> 1, x4<0, and x4> 1 are denoted by x7, x8, x9 and xlo respectively. Table 5
k J k 1 3
a& a&
3
2
4.0 -2.9
4
8
6
6.7
5.0
1.52
( 7 5.0 0.1
4
9
5
6
1
6.3
0.1
0.82
3
10
1
6
4.0
6.7
5.0
1.52
-4.6
45
Disjunctive programming
The coefficients a:, a; and uke, extracted from Table 9.2, and the cost row of Table 4, are as follows:
k = 2: a; = uoo-u&, =4;u2e = u:
= u& = 5 ; a: = uos-u& = -2,
-6.5, a:=
a:= u ( ) ~ -
=0.05,
= ~ ( 1 6 -u&=
-0.35;
k = 3: a: = 1; u3e = u: + u; = 11.3; a: = 2, a: = 0.2, a; = -0.05, a ; = 0.35; k =4: a:=5.7; u4e=u;lo=6.7; a $ = -2, a%=0.2, a2=0.05, a:= -0.35. The coefficients of the strengthened cut are shown below, where M = ( r n ~ R ~ ) r n , + r n , + r n , a Omi, integer, i = l , 2, 3).
p3 = min max mtM
m, (2/11.3)+ m2 (-2/6.7)+ r n 3 = 0.75, ’ 1/11.3 ’ 5.716.7
{(-22y
with rn = (1, - 1,O).
p4= min max msM
with rn
{
]
(-6.5/5)+rn, (0.2/11.3)+rn2 (0.2/6.7)+rn3 = 0.035, 4/5 ’ 1/11.3 ’ 5.7/6.7
= (1,
-
1,O).
-}
0.05 -0.05 0.05 &=max - ( 4 ’ 1 ’ 5 . 7
--I
-0.35 0.35 -0.35 p6=max - 4 ’ 1 ’ 5 . 7
{
= 0.0125,
= 0.35.
Adding to the optimal tableau of LP the cut 0 . 7 5 +O.O35x4 ~~ + 0 . 0 1 2 5 +0.35x6 ~~
1
produces Table 6 and the two pivots shown in Tables 6 and 7 produce the optimal Tableau 8. Thus no further branching is required. Table 6
x,, x, x2
s
1.1 0.2 0.9 -1.0
1-1
2.0 0.4 -0.3
0.2 1.3 0.4 -0.035
0.05 -0.01 -0.04 -0.0125
1.17 0.07 0.1 -0.35
Table 7
Xg
Xi XZ
X?
-1.57 -0.333 1.3 1.33
2.67 0.533 -0.4 - 1.33
0.106 1.281 0.414 0.047
0.01675
1-0.016751 -0.035 0.01675
1.117 -0.117 0.240 0.467
E . Balm
46
Table 8
X”
x5 xj x3
1
--s
-x4
-x5
-x6
-1.9 20.0 2.0 1.0
2.687 -31.8
1.387 -73.0
1.0 -60.0 - 2.0 1.O
1.0 7.0
10. Disjunctions horn conditional bounds In solving pure or mixed integer programs by branch and bound, the most widely used rule for breaking up the feasible set is to choose an integerconstrained variable xi whose value ai, at the linear programming optimum is noninteger, and to impose the disjunction (xi S [ a i o ]v) ( x i 3 [qO]+1). It has been observed, however, that in the presence of multiple choice constraints, i.e., of constraints of the form
it is more efficient to use a disjunction of the form
where Q1U Q2 = Q, Q1n Qz= @, and Q, and Q2 are about equal in size. This is just one example of a situation where it is possible to branch so as to fix the values of several variables on each branch. The circumstance that makes this possible in the above instance is the presence of the rather tight multiple choice constraint. More generally, a tightly constrained feasible set makes it possible to derive disjunctions stronger than the usual dichotomy on a single variable. On the other hand, the feasible set of any integer program becomes more or less tightly constrained after the discovery of a “good” solution (in particular, of an optimal solution), provided that one restricts it to those solutions better than the current best. Such a “tightly constrained” state of the feasible set can be expressed in the form of an inequality T X S To,
with ~ 3 0 and , with a relatively small r O > O . One way of doing this, if the problem is of the form
(PI min {CX I Ax 2 b, x 2 0, xi integer, j E N } , with c integer, and if zu is the value of the current best integer solution, is to find a set of multipliers u such that uASc,
uaO,
Disjunctive programming
47
and define
Then multiplying Ax b by - u and adding the resulting inequality, - uAx G - ub, to cx S Z" - 1, yields the inequality
satisfied by every feasible integer x such that cx < zu. Here ~ 2 0r o,> O , and the size of r odepends on the gap between the upper bound zu and the lower bound ub on the value of (P). Now suppose we have such an inequality r x G ro.Without loss of generality, we may assume that > 0, V j (by simply deleting the zero components). Then the following statement holds [12].
Theorem 10.1. Let TER", T,ER, ( r , w O ) > O , N={l, . . . , n}, and for i = 1 , . . . , p , l ~ p s n let , QirN,
a,#@, with
rjC =imin ) vj. icQ,
If the sets Qi, i = 1, . . . ,p, satisfy the conditions (10.1) and
f
rj(i)
>
(10.2)
i=l
then every integer vector x 2 0 such that
TX
s r o ,satisfies the disjunction
P
V (xi = 0,j E Q i ) .
(10.3)
i=l
Proof. Every integer x a 0 which violates (10.3) satisfies (10.4) Multiplying by riCi) the ith inequality of (10.4) and adding up the resulting inequalities, one obtains (10.5)
48
E . Balas
Further,
P
C
a
[from (10.5)]
Tj(i)
i=l
[from (10.2)].
'To
Hence every integer x 5 0 that violates the disjunction (10.3) also violates the inequality T X s rro. 0 One way of looking at Theorem 10.1 is as follows. Suppose the constraints of an integer program which include the inequality rrx s rro, were amended by the additional constraints (10.4). From the proof of the Theorem, these inequalities give a lower bound on rrx which exceeds rro; this contradiction then produces the disjunction (10.3). Since the inequalities (10.4) are not actually part of the problem, we call the bound on rrx derived from them a conditional bound, and the disjunction obtained from such bounds, a disjunction from conditional bounds.
Example 10.1. The inequality 9x1
+ 8x2 + 8x3+ 7 ~ 4 + 7 +~ 65 ~ , +6x7 + 5x8 + 5 ~ 9 +5
~
~
~
+
+4X1*+3XI3+3Xl4+3x15 +2X16+2X,, s 10, together with the condition x a 0, xi integer Vj, implies the disjunction
(xi = 0, j = 1,2,3,4,5,6,7)v (xi= 0, j
= 1,8,9, 10,11, 12, 13, 14)v
v(xi=0,j=2,3,8,9,10,15,16,17). Indeed, ~ ~ ( ~ ) = m i n { 9 , 8 , 8 , 7 , 7 , 6 , =6, 6} rrii(2)= min {9,5,5,5,4,4,3,3)= 3, rri(3)=min {8,8,5,5,5,3,2,2}= 2,
and (lO.l), (10.2) are satisfied, since 6 + 3 + 2 > 1 0 , while 6 + 3 ~ 9 ( j = l), 6 + 2 S 8(j = 2,3), 6 s 7 ( j = 4,5), 6 < 6 ( j = 6,7), 3 + 2 s 5(j = 8,9, lo), 3 ~ 4 (=j11,12), 3 s 3(j = 13,14), 2 S 3(j= 15), 2 S 2(j = 16,17). Next we outline a procedure [12] based on Theorem 10.1 for systematically generating disjunctions of the type (10.3) from an inequality rrx rro, with rri > 0, V j. 1. Choose some S c N such that
4
~
~
Disjunctive programming
49
but
for all T c S, T Z S . Order S = {j(l), . . . , j(p)} according to decreasing values of T~( and ~ )go to 2. 2. Set
and define recursively
6; = 0 otherwise. The sets Qi, i = 1, . . . , p, obtained in this where 6; = 1 if j E Ok, way, satisfy (lO.l), (10.2). In the above example, S = {7,14,17}. If, on the other hand, one uses S = {5,12} (which is also admissible, since r 5+ T,*= 7 + 4 > lo), one obtains the disjunction (xi =0, j = 1 , . . . , 5)v(xi = 0 , j = 6 , .
. . , 12).
A disjunction of the form (10.3) can be used to partition the feasible set into p subproblems, the kth one of which is constrained by
Another way of using a disjunction of the form (10.3) is to derive cutting planes. This has been explored in the context of set covering problems [12] and found to yield good results. In particular, let A = (aii) be a 0-1 matrix and consider the set covering problem min{cxIAxse,xi=Oorl,jEN},
(SC)
where e is the vector of 1’s of appropriate dimension. For every row of A, let
Ni = { j E N 1 aii = l}. Now suppose a prime cover (a basic feasible integer solution) R is known; then zu = cX is an upper bound on the value of the optimum. If ii is any feasible
solution to the dual of the linear programming relaxation of (SC), i.e., any vector satisfying iiA zs c, ii 0, then setting T = c - iiA and T,,= zu - Lie - 1 one obtains an inequality T X S Wwhich ~ is satisfied by every integer solution better than X, and which can therefore be used to derive a disjunction of form (10.3). Suppose this is done, and a disjunction (10.3) is at hand, known to be satisfied by every feasible integer x better than the current best. Then for each i~ (1,. . . ,p } , one chooses a row h ( i ) of A, such that Nh(i)nQi is “large”-or, conversely, N h ( i ) \ Qi is “small.” Clearly (and this is true for any choice of the
E . Balm
50
indices h(i)), the disjunction (10.3) implies (10.6) which in turn implies the inequality
c xj31,
(10.7)
jeW
where
The class of cutting planes (10.7) obtained in this way was shown in [12] to include as a proper subclass the Bellmore-Ratliff inequalities [161 derived from involutory bases. An all integer algorithm which uses these cutting planes was implemented and tested with good results on set covering problems with up to 1,000 variables (see [12] for details).
References [l] J. A. Araoz Durand, Polyhedral neopolarities, Ph.D. Dissertation, University of Waterloo (November 1973). [2] E. Balas, Intersection cuts -A new type of cutting planes for integer programming, Operations Res. 19 (1971) 19-39. [3] E. Balas, Integer programming and convex analysis: Intersection cuts from outer polars, Math. Programming 2 (1972) 330-382. [4] E. Balas, Ranking the facets of the octahedron, Discrete Math. 2 (1972) 1-15. [5] E. Balas, Nonconvex quadratic programming via generalized polars, SIAM J. Appl. Math. 28 (1975) 335-349. [6] E. Balas, A constraint-activating outer polar method for pure and mixed integer 0-1 programs, in: P. L. Hammer and G. T. Zoutendijk, eds., Mathematical Programming in Theory and Practice, (North-Holland, Amsterdam, 1974 275-310. [7] E. Balas, On the use of intersection cuts in branch and bound. Paper presented at the 8th International Symposium on Mathematical Programming, Stanford, CA(August 27-31, 1973). [8] E. Balas, Intersection cuts from disjunctive constraints, MSRR No. 330, Carnegie-Mellon University, First draft (August 1973); Revised and expanded (February 1974). [9] E. Balas, Disjunctive programming: Cutting planes from logical conditions, in: 0. L. Mangasanan, R. R. Meyer and s. M. Robinson, eds., Nonlinear Programming 2 (Academic Press, New York, 1975) 279-312. [lo] E. Balas, Disjunctive programming: Properties of the convex hull of feasible points, MSSR No. 348, Carnegie-Mellon University- (Julv - 1974). [ I l l E. Balas, A Note on duality in disjunctive programming, J. Optimization Theory Appl. 15 (1977). [12] E. Balas, Set covering with cutting planes from conditional bounds, MSRR No. 399. CarnegieMellon University (July 1976). [13] E. Balas, V. J. Bowman, F. Glover and D. Sommer, An intersection cut from the dual of the unit hypercube, Operations Res. 19 (1971) 40-44. [14] E. Balas and R. Jeroslow, Strengthening cuts for mixed integer programs, MSRR No. 359, Carnegie-Mellon University (February 1975).
Disjunctive programming
51
[15] E. Balas and A. Zoltners, Intersection cuts from outer polars of truncated cubes, Naval Res. Logist. Quart. 22 (1975) 477-496. [16] M. Bellmore and H.D. Ratliff, Set covering and involutory bases, Management Sci. 18 (1971) 194-206. [17] C.E. Blair and R.G. Jeroslow, A converse for disjunctive constraints, MSRR No. 393, Carnegie-Mellon University (June 1976). [17a] J. Borwein, A strong duality theory for the minimum of a family of convex programs, Dalhousie University, Halifax (June 1978). [18] C.A. Burdet, Enumerative inequalities in integer programming, Math. Programming 2 (1972) 32-64. [19] C.A. Burdet, Polaroids: A new tool in nonconvex and integer programming, Naval Res. Logist. Quart. 20 (1973) 13-24. [20] D.R. Fulkerson, Blocking Polyhedra, in: B. Harris, ed., Graph Theory and Its Applications (Academic Press, New York, 1972) 93-112. [21] D.R. Fulkerson, Anti-blocking polyhedra, J. Combinatorial Theory 12 (1972) 50-71. [22] D.R. Fulkerson, Blocking and anti-blocking pairs of polyhedra, Math. Programming 1 (1971) 168-194. [23] F. Glover, Convexity cuts and cut search, Operations Res. 21 (1973) 123-134. [24] F. Glover, Convexity cuts for multiple choice problems, Discrete Math. 6 (1973) 221-234. [25] F. Glover, Polyhedral annexation in mixed integer programming. MSRS 73-9, University of Colorado (August 1973). Revised and published as: Polyhedral annexation in mixed integer and combinatorial programming, Math. Programming 9 (1975) 161-188. [26] F. Glover and D. Klingman, The generalized lattice point problem, Operations Res. 21 (1973) 141-156. [26a] F. Glover, D. Klingman and J. Stutz, The disjunctive facet problem: formulation and solution techniques, Operations Res. 22 (1974) 582-601. [27] J.J. Greenberg and W.P. Pierskalla, Stability theorems for infinitely constrained mathematical programs, J. Optimization Theory Appl. 16 (1975) 409-428. [28] B. Griinbaum, Convex Polytopes (J. Wiley, New York, 1967). [29] Hoang Tuy, Concave programming under linear constraints (Russian), Doklady Akademii Narck SSSR (1964). English translation in Soviet Math. Dokl. (1964) 1437-1440. [30] R.G. Jeroslow, The principles of cutting plane theory: part I, Carnegie-Mellon University (February 1974). [31] R.G. Jeroslow, Cutting planes for relaxations of integer programs, MSRR No. 347, CarnegieMellon University (July 1974). [32] R.G. Jeroslow, Cutting plane theory: Disjunctive methods, Ann. Discrete Mgth., vol. 1: Studies in Integer Programming, (1977) 293-330. [33] R.G. Jeroslow, Cutting planes for complementarity constraints, SIAM J. Control 16 (1978). [34] R.G. Jeroslow, A cutting plane game and its algorithms, CORE Discussion Paper 7724 (June 1977). [35] E.L. Johnson, The group problem for mixed integer programming, Math. Programming Study 2 (1974) 137-179. [36] G. Owen, Cutting planes for programs with disjunctive constraints, J. Optimization Theory Appl. 11 (1973) 49-55. [37] R.T. Rockafellar, Convex Analysis (Princeton University Press, Princeton, N.J, 1970). [38] J. Stoer and C. Witzgall, Ccnvexity and Optimization in Finite Dimensions I (Springer, Berlin, 1970). [39] J. Tind, Blocking and anti-blocking sets, Math. Programming 6 (1974) 157-166. [40] R.D. Young, Hypercylindrically deduced cuts in zero-one integer programs, Operations Res. 19 (1971) 1393-1405. [41] E. Zemel, On the facial structure of 0-1 polytopes, Ph.D. Dissertation, GSIA, Carnegie-Mellon University (May 1976). [42] P. Zwart, Intersection cuts for separable programming, Washington University, St. Louis (January 1972).
This Page Intentionally Left Blank
Annals of Discrete Mathematics 5 (1979) 53-70. @ North-Holland Publishing Company
METHODS OF NONLINEAR 0-1 PROGRAMMING Pierre HANSEN Instituf d’Economie Scientifique ef de Gestion, Lille, France, and Faculfd Uniuersifaire Catholique de Mons, Belgium Much work has been recently devoted to the solution of nonlinear programs in 0-1 variables. Many methods of reduction to equivalent forms and of linearization have been proposed. Particular cases solvable in polynomial time by network flows have been detected. Algebraic and enumerative algorithms have been devised for the general case. This paper summarizes the main results of these various approaches, as well as some of the many applications.
1. Introduction A nonlinear 0-1 program is a problem of the form minimize z
= f(X),
subject to
g,(X)==O, t = 1,2,.
. , , rn
and
X EB;
(1.3)
where B,={0, 1) and the f(X) and g,(X) are applications of B; into the real numbers R, or pseudo-Boolean functions [48]. Much work has been recently devoted to the solution of such problems. Various methods of reduction to equivalent simpler forms have been devised. Particular cases, solvable in polynomial time by network flows have been detected. Algebraic and enumerative methods have been proposed for the general case and for other important NP-hard subcases. Applications in various fields are rapidly increasing in number. Assumptions are often made regarding the form of f(X) and the g,(X). Many algorithms have been devised to solve polynomial 0-1 programs, i.e. problems of the form
(1.4) subject to ‘,
a,,, h=l
n
x , s b , , t = l , 2, . . . , m
jEHh,
53
(1.5)
54
P. Hansen
in 0-1 variables, and even more to solve quadratic 0-1 programs, i.e. problems of the form
c C cijxixj,
(1.6)
aij,xixj4 b,, t = 1,2, . . . , m
(1.7)
n
minimize z =
n
i=l j=l
subject to
f
i-1 j=1
in 0-1 variables; note that cjjxjxj= cjjxjand ajj,xjxi= ajjxjare linear terms. Also cij and aij, may be assumed equal to 0 for all i > j (or for all i <j). Hyperbolic 0-1 programs, i.e. problems of the form
c cjxj do+ c djxj c,+
minimize z =
j=l n
j=l
subject to
2 a j , x j s b , , t = 1 , 2, . . . ,
m
j=1
in 0-1 variables have attracted some attention. Particular structured problems, prominent among which is the quadratic assignment problem, continue to be much studied. Nonlinear 0-1 programs may of course be viewed as particular cases of nonlinear integer programs. Algorithms for nonlinear integer or mixed-integer programming, such as those of J. Abadie [l,21, E. Balas [6,7], E.J. Kelley [67] or C. Witzgall [loo], could therefore be used to solve some nonlinear 0-1 programs which satisfy the assumptions necessary for their application. 2. Equivalent forms
2.1. Constrained and unconstrained problems As noted by T. Gaspar (see [48,p. 211) any pseudo-Boolean function may be written as a polynomial involving the same 0-1 variables. Hence, any nonlinear 0-1 program may be written as a polynomial 0-1 program with the same numbers of constraints and variables. Boolean algebraic manipulations which allow to do so are described in P.L. Hammer and S. Rudeanu’s book “Boolean Methods in Operations Research and Related Areas” [48]. G. Bar [9] and P.L. Hammer and I. Rosenberg [44] have noted that any system of linear or nonlinear constraints in 0-1 variables is equivalent to a single polynomial constraint in the same variables i.e. the arithmetic expression of the equation stating that the resolvent is equal to 0 (cf. [42]). Therefore any nonlinear
Methods of nonlinear 0-1 programming
59
0-1 program is equivalent to a polynomial 0-1 knapsack problem in the same variables. The work necessary to obtain the single constraint may, however, be as large as that required to solve the initial problem directly. Let f(X) and f(X) denote lower and upper bounds on the values of the pseudoIBoolean function f ( x ) . Any bounded nonlinear 0-1 program (i.e. any program of the form (1.1)-(1.3) with - w < f ( x ) s f ( x ) ~ f ( x ) < w and g,(X)a & ( x ) < w for t = l , 2,. . . , m ) with integer valued equality constraints may be reduced to the equivalent unconstrained nonlinear 0-1 program in the same variables. m minimize z‘ = f ( ~+)A ( g ,(x)12
c
1=l
in 0-1 variables, where A > f(x)- f ( x ) . If f(X) is a polynomial, the Lagrangean multiplier A may be taken as Gual to the sum of absolute values of the coefficients plus 1; if g,(X)5 0 for X EB,”and t = 1,2, . . . , m the squaring of the g, (X) is unnecessary in (2.1) (see [48]). Moreover, integer-ualued inequality constraints g,(X)3 0, e.g. constraints involving polynomial functions in 0-1 variables with rational coefficients suitably multiplied to take only integer values, are equivalent to equality constraints
with Y k + l , l E & and 2p*c’>&(x). Hence it is always possible to eliminate the constraints of a polynomial 0-1 program with rational coefficients; the price to pay will often be a large augmentation of the numbers of terms and variables. I. Rosenberg [86] has shown that any unconstrained polynomial 0-1 program “minimize z = f(x)subject to X E B r ’ is equivalent to an unconstrained quadratic 0-1 program. Let xi and xi denote a pair of variables appearing in terms of order greater than two; the program minimize z’ = f(xl, x2, . . . ,x,, x,+J
+ A[xixj+ (3- 2 4 - ~ X ~ ) X , , + ~ ]
(2.3)
in 0-1 variables, where A > f(X) - f ( x ) and every product xixi has been replaced by x,+~ in f(X), has the same optimal solution(s) as the original problem. If x h denotes a known solution A can be taken as equal to f(xh)-f(x). Repeated use of such substitutions yields an unconstrained quadratic 0-1 program, often with many more terms and variables than the original polynomial program. 2.2. Nonlinear 0-1 and continuous programs Linear 0-1 programs are often solved efficiently by linear-programming based branch-and-bound algorithms. When such an approach is extended to the quadratic case a difficulty arises as the continuous relaxation obtained by replacing XEB,”by XE[O, l]”will usually be a nonconvex quadratic programming problem. However P.L. Hammer and A.A. Rubin [47] have shown that to any
P. Hansen
56
quadratic 0-1 program (1.6)-( 1.7) could be associated an equivalent quadratic 0-1 program
c c ciixixi- a n
minimize z’ =
n
i=l j=1
xi(1 - xi)
(2.4)
j=1
in 0-1 variables, subject to (1.7), with a convex continuous relaxation. To this effect one must impose cii = cji for all i and j and choose a large enough for the smallest eigenvalue of the matrix C’= ( C + a I ) to be positive. J. Abadie [l]noted that an homologous result holds in the general nonlinear 0-1 case. E.L. Lawler [69] as well as R.C. Carlson and G.L. Nemhauser [171 had already used a similar technique in algorithms for some quadratic assignment problems. I. Rosenberg [84] has noted that the unconstrained minimum of a polynomial function f(X) of the form (1.4) (i.e. linear in each of its variables) is attained for one X E B at~ least. Algorithms of continuous nonlinear programming could thus be used to solve directly some nonlinear 0-1 programs, when f(X) satisfies the necessary assumptions for their application. 2.3. Linearization R. Fortet [24,25] (see also E. Balas [4,5], G. Bar [lo], L.J. Watters [97] and W. Zangwill[102]) has shown that any polynomial 0-1 program could be reduced to a linear 0-1 program by replacing every distinct product njEHhxiof 0-1 variables by a new 0-1 variable xn+h. Two new constraints
and
niEHh
xi = 1. The must be added in order to ensure that X n + h = 1 if and only if numbers of new variables and of new constraints so introduced may be high, even for small nonlinear 0-1 programs. F. Glover and E. Woolsey [32,33] have proposed several ways to economize on the number of new constraints. When a single constraint is used to impose the value 0 to several new variables, slightly more than one new constraint by distinct product of 0-1 variables must be added on average. Moreover, continuous variables Yh E [O, 11 which take automatically the values 0 or 1 in any feasible solution may be introduced instead of the 0-1 variables x,+h. The constraints
Methods of nonlinear 0-1 programming
impose the value 0 to
Yh
as soon as one xi with j E H
h
57
is equal to 0. The constraint
imposes the value 1 to Yh when all xi with jEHh are equal to 1. Finally, if Qi denotes the set of indices of the terms containing xi, constraints of the form
may be used instead of the constraints (2.7) and are much fewer. F. Glover [30] has also shown that a single continuous variable wj may replace a set of quadratic terms C;=l cjkxjxk involving the same 0-1 variable xi. Indeed, the four constraints
-
c,xi a wi a qi
(2.10)
and n
(2.11) where
c max(cjk,~) n
n
G= C
min(cik,~) and
C, =
k=l
k=l
impose that n
wi
=O
if
xi = 0 and wi =
cikxk if xi = 1. ,
k=l
When products of 0-1 and of continuous bounded variables appear in the problem to be solved, a similar linearization procedure applies also.
3. Network flows and nonlinear 0-1 programs 3.1. The quadratic case Let G = (V, U)denote a network (cf [23]) with a source q , a sink q, and a capacity matrix C'= (c!~).As shown by P.L. Hammer [37] the determination of a minimum capacity cut of G is equivalent to minimizing
in 0-1 variables, with x1 = 1 and x,, = 0. Indeed, if a variable xi E B2 is associated with each vertex vj of G and is equal to 1 or 0 according to the side of the cut on which is vj, there is a 1-1 correspondence between cuts and vectors X E B';with x1 = 1 and x, = 0; clearly (3.1) is equal to the capacity of the cut corresponding to
P. Hansen
58
X. After substitution and grouping (3.1) becomes minimize z =
j=2
ctj+
c (c~,,-c:~+c c;i)xj- c c cijxixj
n-1
n-1
j-2
i=2
n-1 n-1
i = 2 j=2
(3.2)
in 0-1 variables. As stressed by J.C. Picard and H.D. Ratliff [79], it follows that any quadratic 0-1 function of n - 2 variables
co+
c cjjxj - C c
n-1
n-1
j=2
i=2 j=2+i
n-1
cijxixj
(3.3)
with cija 0 for all i, 1= 2,3, . . . , n - 1 may be minimized with a network flow algorithm applied to a network with n vertices. Capacities cb = cij are given to the arcs joining intermediary vertices, capacities c:j= max (Zy:; cb- cjj, 0) to the arcs from the source to the intermediary vertices and capacities c;,, = max (cjj-Cy=;' c:~,0) to the arcs from the intermediary vertices to the sink; if C ~ S c ~ ;~ ~an ,arc ~ joining the source to the sink, with capacity c;, = C ~ - C ; = ctj ~ is added; otherwise C;=l ctj-c0 must be subtracted from the value of the maximum flow. Several other ways to construct a network flow problem, with directed or with both directed and undirected arcs, equivalent to the minimization of (3.3) are also proposed in [79]. J.C. Picard and H.D. Ratliff [77] have also noted that the problem of minimizing a quadratic 0-1 function without restrictions on the signs of the coefficients (which is NP-hard, as it contains, among others, the maximum independent set and the maximum cut problems, see [48, p. 218 and p. 2591) is equivalent to a minimum cut problem in a network with arcs having positive and negative capacities. No network flow algorithm is, however, available to solve such a problem. Let us call positive-negative any polynomial 0-1 function with positive coefficients in all linear terms and negative coefficients in all higher-order terms. Clearly, after fixing at 1 all xi with cjjSO,(3.3) becomes positive-negative, as the cij are assumed non-negative. A quadratic 0-1 function (3.3) with some qj < O may be equivalent to a positive-negative one. Let us first note (cf. [42]) that if
and if
cij> C max (cij,0) + C max (cji,o), i#j
ifj
xj = O in all vectors X which minimize (3.3). After fixing all such variables, let us say (3.3) is simplified. Recall a signed graph is an undirected graph to the edges of which are associated signs + or -. A signed graph is balanced if and only if the products of the signs of the edges of it's cycles are all positive. Let us associate to the quadratic terms of a function (3.3) a signed graph, a vertex corresponding to each
Methods of nonlinear 0-1 programming
59
variable, an edge with negative sign to each term with cii < O and an edge with positive sign to each term with cij>O. Then, as noted in [56], a simplified quadratic 0-1 function is equivalent to a positive-negative quadratic 0-1 function, obtained by replacing some of its variables by their complements, if and only if the associated signed graph is balanced. Let us call constraints of the form (3.4) and xj
(l-xk),
xj
(I-&)
(3.5)
binary constraints; constraints of the form (3.4) define a partial order on the variables xl, x2, . . . ,x,. J.C. Picard [75] has shown that minimizing a linear 0-1 function on a partial order is equivalent to a network flow problem. Indeed, the constraints (3.4) may be expressed as xj(l - x k ) = 0 and x k ( 1- xi) = 0 and introduced into the objective function by the Lagrangian multiplier technique of sub-section 2.1., to obtain a positive-negative quadratic 0-1 function. Constraints of the form (3.5) may be treated in the same way and, in some cases detectable as above, a positive-negative quadratic 0-1 function may be obtained by complementing some of the variables. 3.2. The general case Without explicitly introducing 0-1 variables, J. Rhys [82] and M.L. Balinski [8] have proposed a network flow technique to minimize positive-negative polynomial 0-1 functions. A bipartite graph is first constructed with vertices associated with the polynomial terms and with the linear terms respectively, and infinite capacity arcs joining the vertices from both subsets associated with terms involving the same variables. A source and a sink are added and are joined to the vertices of the first and second subsets respectively by arcs of capacity equal to the absolute values of the coefficients of the terms associated with these vertices. A maximum flow (and minimum cut) in the network so defined, is determined by a labelling algorithm; the labelled and unlabelled vertices of the second subset correspond to the x, equal to 1 and 0 respectively in the optimal solution. The optimal value of the function to be minimized is equal to the value of the maximum flow plus the sum of the (negative) coefficients of the polynomial terms. A converse result was obtained earlier by P.L. Hammer and S. Rudeanu [48]in a study of the primal-dual algorithm for the transportation problem. In that algorithm a maximum flow in a restricted network, with arcs corresponding to the reduced costs equal to zero, is sought at each iteration. That problem is equivalent to the determination of a minimum weight covering of zero cells in the reduced costs matrix, by lines weighted by the disponibilities or demands. This last problem is in turn equivalent to the minimization of a positive-negative polynomial 0-1 function.
P. Hansen
60
4. Algebraic methods The first algebraic methods for nonlinear 0-1 programming appear to be those of R. Fortet [24,25] and of P. Camion [16], the former author using the Boolean sum and the latter the sum modulo 2. As Fortet’s methods were only sketched, more elaborate algorithms have soon been proposed [45,48], the most wellknown being that of P.L. Hammer and S. Rudeanu [48]. Let us recall its principle, in the case of the minimization of an unconstrained function f(x,, x2,. . . ,x,) of n 0-1 variables. Grouping the terms where the variable x1 appears gives the expression f b l , x2,
-
* * 9
x,) = X l g l ( x 2 ,
* * * 9
x,)+ h b 2 , *
* * 7
d
(4.1) (4.2)
is a necessary condition for optimality. This condition, expressed algebraically, can be used to eliminate x 1 from (4.1). The procedure is iterated, and after all variables have been eliminated, their values are determined in decreasing order of the indices. Variants of the method allow to solve constrained problems, determine all optimal solutions or all local minima, etc, [48]. Related procedures for irredundant generation of all optimal solutions and for multicriteria problems have been devised by S. Rudeanu [88,90]. Other algebraic methods are due to G. Bar [9,10] and to I. Rosenberg [85]. The subject of this subsection is considered in more detail in P.L. Hammer’s survey [41].
5. Enumerative methods 5.1. Branching Enumerative methods for nonlinear 0-1 programming include lexicographical enumeration algorithms due to E.L. Lawler and M.D. Bell [71], J.C.T. Mao and B.A. Wallingford [72] and I. Dragan [20], branch-and-bound algorithms due to G . Berman [14], V. Ginsburgh and A. Van Peetersen [29], P. L. Hammer [39], P. L. Hammer and A. Rubin [46], P. L. Hammer and U. Peled [43], P. Hansen [52-571, F.S. Hillier [61], D.J. Laughhunn [68] and probably others, as well as an adaptation by P. Huard [62,63] of his mithode des centres for nonlinear programming. The lexicographic algorithms are equivalent to branch-and-bound algorithms with rigid, i.e. not problem-dependent, branching rules; the bounds they use are not very tight. As usual the branch-and-bound algorithms use branching to split the problem to be solved into smaller subproblems, direct tests [58] to show some subproblems need not be considered further because they cannot yield a better solution than the best one already known and conditional tests to show
Methods of nonlinear 0-1 programming
61
some variables must take the value 0 or must take the value 1 in all feasible or in all optimal solutions. Some algorithms also use relational tests to show all feasible or all optimal solutions must satisfy some logical relations between the variables. Such relations may be combined, with the methods described in [41, 42, 49, 601, to show the subproblem is infeasible or to fix some variables. Several branching rules have been proposed. A first one (cf. [39, 43, 51]), for unconstrained polynomial 0-1 programming, consists in selecting a term c,, xi with a negative coefficient and branching according to all xi= 1, for j E H,, or at least one such xi= 0. A second rule proposed by R. Jeroslow [66] consists in choosing a pair of free variables xi and x k and branching according to xi= x k or xi = 1 - x k . Yet another possibility is to branch according to xi s x k or xi= 1 and Xk = 0. The most common branching rule is to select some free variable xi,following some criteria taking cost and reduction of infeasibility into account and to branch according to x, = 1 or xi = 0.
njEHh
5.2. Bounds, penalties, upper and lower planes Many bounds have been proposed for unconstrained polynomial 0-1 programming, and have been used also to solve constrained problems by focusing on the objective function or on one constraint at a time. As ways of obtaining bounds for quadratic 0-1 programming are easily extended to more general cases [57], they only will be reviewed. A first lower bound on the values of a quadratic 0-1 function (1.6) is the sum of its negative coefficients n
g1 =
n
C C min (cij,0).
(5.1)
i = l j=1
A penalty pp or pi' is an increment which may be given to a bound when a free variable xi is fixed at 0 or 1. Penalties are used to obtain tighter bounds or to show some variables should be fixed. The following penalties may be associated to g1 [57, 591.
pp = min (0, cji) - C min (0, ci,) - C min (0, c j i ) i
(5.2)
i
and
p ; = min (0, cii)+ 1 I Q,
-t
C .
yi
C.
min ((ciil,JciiI)
i IC,I cil
min (IcjiI, ICiiI).
(5.3)
Penalties are additive [54]if the sum of the penalties associated with the fixation of several variables at 0 and several variables at 1 always constitutes a valid penalty associated with the set of these fixations. The following penalties associated with g 1 are additive.
pi' = max (0, cii)
(5.4)
P. Hansen
62
and
2 min (0,
py = -
aijcij)-
i=l#j
2 min (0, (1-
aji)cii)
(5.5)
i=l#j
where 0 S aij4 1 for all i and j . Hence, n
+ C min (pp', pf')
g2 = g1
(5.6)
i= 1
is a lower bound on the values of (1.6). Values of the aij which maximize gz may be obtained by solving a network flow problem [59] i.e. by maximizing the positive-negative 0-1 function obtained from (1.6) by suppression of the quadratic terms with positive coefficients and deducing the values of the aij from the optimal solution. More precise bounds may be obtained by taking the quadratic terms with positive coefficients into account [57, 591: let qi = p;'- pp', qi = min (0, q j ) and
rij = min (cij, -P{qi, - -Pfqi) where 0 s 0 ; s1 and
(5.7)
Cy=l Pf= 1 for all i, j = 1,2, . . . , n. Then
is a lower bound of the values of (1.6). For given qj,values of the Pf which maximize g3 may be determined by linear programming [57]: the problem of determining simultaneously the values of the aijand the Pf which maximize g3 is open. An upper plane [26] of a nonlinear 0-1 function f(X) is a linear 0-1 function dX such that dX 3f ( X )for all X E B;;a lower plane is defined in a similar way. The procedures for obtaining bounds and additive penalties described in this subsection also yield lower planes of (1.6). Indeed,
j-1
i-1
is such a lower plane; another one may be obtained from additive penalties associated with z3 [59]. The upper and lower plane concepts suggest many interesting questions as e.g.: what is the lower plane for which dX-f(X) is on average smallest or max (dX- f(X)) smallest for X E B;?
Methods of nonlinear 0-1 programming
63
6. Particular cases and applications 6.1. Hyperbolic 0-1 programs In hyperbolic 0-1 programming, the objective function (cf. (1.8)) is a ratio of linear 0-1 functions, or sometimes a sum of such ratios. P.L. Hammer and S. Rudeanu [48] appear to have been the first to study such problems; in [48] algebraic methods are proposed for both the constrained and the unconstrained case. Another approach is due to M. Florian and P. Robillard [21], who adapted the iterative method of J.R. Isbell and W.H. Marlow [65] for continuous fractional programming. Instead of solving directly the problem, a sequence of linear 0-1 programs, differing only in their objective functions, are solved. Good computational results have been reported [22]. A direct implicit enumeration algorithm, using surrogate constraints, has been developed by D. and F. Granot [34]. A more general hyperbolic 0-1 program, with a sum of ratios in the objective function has been studied by A.L. Saipe [91]. Very recently, J.C. Picard and M. Queyranne [96] have identified some unconstrained hyperbolic 0-1 programs whose objective functions are ratios of a quadratic 0-1 function and of a linear one, solvable in polynomial time by a network flow algorithm. Among the many applications of hyperbolic 0-1 programming, is the selection of investment projects to maximize the rate of return. 6.2. Quadratic knapsack problems
Quadratic knapsack problems, introduced by G . Gallo, P.L. Hammer and B. Simeone [26] may be written as follows: maximize z =
f: f: ciixix,
j=l
j=1
subject to n
C ajxj
b
j=1
in 0-1 variables, where the ci,, ai and b are non-negative real numbers. Bounds are obtained by replacing the objective function (6.1) by one of its upper planes, and solving the resulting linear knapsack problem in 0-1 or in continuous variables. Branch-and-bound algorithms based on that approach have allowed to solve problems with 70 variables. Applications include the location of new airports in order to maximize expected traffic subject to an investment constraint, the selection of sites for electronic message handling stations and testing if a graph possesses a clique of order k. 6.3. Various applications Quadratic 0-1 programs have many actual or potential applications because they allow to take explicitly into account the interactions between pairs of
64
P. Hansen
activities, to which the 0-1 variables are associated. This is of particular importance in the investment planning field as it is natural to consider both the expected return and the riskiness of the set of chosen investments. A quadratic 0-1 objective function allows to take both these criteria into account, the second one being weighted by a coefficient of risk aversion. Many authors, among which D.J. Laughhunn [68], J.C.T. Mao and B.A. Wallingford [72], D.E. Peterson and D.J. Laughhunn [74], Y. Seppala [92], M. Schechter and P.L. Hammer [93] have proposed such models for the choice of investments; the most thorough study appears to be F. Hilliers’ book [61] on “The Evaluation of Risky Interrelated Investments”. Location theory is also a rich field for applications as quadratic 0-1 models allow to take explicitly into account the flows between pairs of facilities. The many papers devoted to the quadratic assignment problem and its variants are surveyed in R. Burkard’s study [15]. Some other location applications are P.L. Hammer’s [38] pseudo-Boolean formulation of the Simple Plant Location Problem and several site selection problems in which a cost is incurred for each selected site and a benefit accrues from the selection of some subsets of sites, due to J. Rhys [82]. Applications appear also in fields more remote from Operations Research; M. Rao [Sl] has shown that several basic problems of cluster analysis could be expressed as polynomial or hyperbolic 0-1 programs. J. De Smet [18] has reduced Thurstone’s sign problem in statistics to the minimization of an unconstrained quadratic 0-1 function. S.G. Papaioannou [73] has emphasized the help pseudoBoolean programming may give to select an optimal set of tests for combinational networks. In addition a large number of optimization problems on graphs or hypergraphs can be given a nonlinear 0-1 programming formulation. This sometimes suggests resolution methods, as exemplified by the good algorithms recently proposed by J.C. Picard and M. Queyranne [76] for the arboricity and pseudoarboricity of a graph.
7. Compotational results
Recorded computational experience with nonlinear 0-1 programs is scarce and unsystematic. Some experiments with linearization and with enumerative methods have been described as well as a few comparative tests. E.M.L. Beale and J. Tomlin [13] report some success with a large linear mixed-integer code applied to linearized versions of problems with a few quadratic or higher-order terms. H. Taha [95, 961 has studied linearization of pure nonlinear 0-1 programs, following two approaches. In the first one, the additional constraints (2.5)-(2.6) are explicitly added to the linear program which is relaxed to compute bounds. In the second one, these constraints are kept implicit and used only to check the feasibility of the current partial solution. Thirteen test problems have been solved
Methods of nonlinear 0-1 programming
65
on an IBM 7040 computer by algorithms implementing both approaches and by the lexicographic algorithm of E. Lawler and M.D. Bell [71]. These test problems have 5 to 30 0-1 variables but only 15 distinct products of 0-1 variables at most. The computation times are in the ranges 0.516-21.133 seconds and 0.266-11.823 seconds with Taha’s first and second algorithm respectively and 0.616-919.383 seconds for the lexicographic algorithm (which allowed to solve 6 problems only out of 13). The same tests problems have been solved on a CDC 6400 computer in times ranging from 0.024 to 3.819 seconds with a branch-and-bound algorithm using the bounds of Section 5.2 [57]. P.L. Hammer and U. Peled [43] report some computational results for unconstrained polynomial 0-1 programming with a branch-and-bound algorithm using fixation of all the 0-1 variables of a term at 1 or negation of this as branching rule. Problems with 10 to 30 variables and 5 to 50 terms were solved in times ranging from 0.48 to 239.03 seconds on an IBM 360/50 computer. In the thesis [57], various branch-and-bound algorithms for quadratic and for nonlinear 0-1 programming have been tested and compared with the lexicographic algorithm of J.C.T. Mao and B.A. Wallingford. While this last algorithm fails to solve a problem with 20 variables in 1000 seconds on a CDC 6400 computer, a branch-and-bound algorithm which does not use additive penalties allows to solve similar problems with up to 28 variables in about 1000 seconds and another such algorithm using additive penalties allows to solve similar problems with up to 40 variables in about 700 seconds.
8. Conclusions and Suggestions (a) While methods of nonlinear 0-1 programming have been much studied recently, many avenues of research remain little explored. Only a small part of the many algorithms suggested have been programmed and tested. (b) A number of unconstrained quadratic 0-1 programs can be solved by network flow algorithms. The study of that class of problems, begun by J.C. Picard and H.D. Ratliff [77-791, is worth continuing. Network flow algorithms could also be used for computing bounds when they do not allow to solve immediately the problem at hand. (c) The class of unconstrained 0-1 programs solvable by matching algorithms should be studied also. It contains the maximum cut problem for planar graphs, as shown by F. Hadlock [35] (see also K. Aoshima and M. Iri [3]) and by M.R. Garey, D.S. Johnson and L. Stockmeyer [27]. (d) Most enumerative algorithms for quadratic or polynomial 0-1 programming consider the objective functions or one of the constraints only at a time. The efficiency of these algorithms could be enhanced through the use of surrogate constraints (cf. F. Glover [31]) or of Lagrangian relaxation (cf. A. Geoffrion [28]). (e) Many tests proposed in the enumerative algorithms are only useful under specific circumstances, e.g. when the constraints become tight. There are often
66
P. Hansen
many possible options and trade-offs between the precision of the bounds used and the computation times to obtain them. Research on heuristic rules and auxiliary tests to find out which tests should be used and when could be very useful. (f) A standard data set on real-world and artificial problems of increasing size and difficulty would allow to benchmark codes, stimulate research of more efficient ones and encourage further applications. (g) The study of some mathematical problems, interesting in their own right, could help to derive better algorithms. Linear bounds on or linear approximations of nonlinear 0-1 functions appear to be of particular interest. (h) Boolean formulations of programming problems, as e.g. those of [48], help to unify the field and may suggest algorithmic ideas or proof techniques. They could be useful to derive concise proof of NP-completeness of some combinatorial problems.
References [13 J. Abadie, Une mCthode arborescente pour les programmes non lintaires partiellement discrets, Rev. Franpaise Informat. Recherche OpCrationnelle 3 (3) (1969) 24-50. [2] J. Abadie, Une mdthode arborescente pour Ics programmes non IinCaires partiellement discrets sans hypothbse de convexit6, Rev. Francaise Informat. Recherche OpCrationnelle 5 (1) (1971) 23-28. [3] K. Aoshima and M. Iri, Comments on F. Hadlock’s paper: Finding a maximum cut of a planar graph in polynomial time, SIAM J. Comput. 6 (1977) 86-87. [4] E. Balas, Extension de I’algorithme additif A la programmation en nombres entiers et B la programmation non IinCaire, C.R. Acad. Sci. Paris, 258 (1964) 5136-5139. [5] E. Balas, An additive algorithm for solving linear programs with zero-one variables, Operations Res. 13 (1965) 517-546. [6] E. Balas, Duality in discrete programming, 11. The quadratic case, Management Sci. 16 (1969) 14-32. [7] E. Balas, A duality theorem and an algorithm for (mixed-) integer nonlinear programming, Lin. A l g . and Appl. 4 (1971) 341-352. [8] M.L. Balinski, On a selection problem, Management Sci. 17 (1970) 230-231. [9] G . Bar, Zur linearen darstelbarkeit von ausdrucken des aussgenskalkuls, EIK 8 (1972) 353-378. [lo] G . Bar, Zur linearisierung beliebiger 0-1 optimierungsprobleme, Math. Optimierungsforsch. u. Statist. 2 (1976) 181-195.
[ll] M.S. Bazaraa and J.J. Goode, A cutting-plane algorithm for the quadratic set-covering problem, Operations Res. 23 (1975) 150-158. [12] U. Brannolte, G. Schlageter und W. Bott, Pseudo-Boolesche verfahren zur losung nichtlinearen 0-1 probleme, Zeit. Oper. Res. 19 (1975) B83-B90. [13] E.M.L. Beale and J.A. Tomli, An integer programming approach to a class of combinatorial problems, Math. Programming 3 (1974) 339-344. [14] G. Berman, A branch and bound method for maximization of pseudo-Boolean functions, Faculty of Mathematics, University of Waterloo, Canada (1969). [15] R. Burkard, Assignment and travelling salesman problems, Annals of Discrete Math. 4. [16] P. Qmion, Une mCthode de rdsolution par I’algbbre de Boole des problemes combinatoires ob interviennent des entiers, Cahiers Centre Etudes Rech. Oper., 2 (1960) 234-246.
67
Methods of nonlinear 0-1 programming
[17] R.G. Carlson and G.L. Nemhauser, Scheduling to minimize interaction cost, Operations Res. 14 (1966)52-58. [18] J. de Smet, Le problbme des signes de Thurstone, Rev. Franpaise Informat. Recherche OpCrationnelle 3 (1968)57-64. [19] M. Despontin, D. Van Oudheusden et P. Hansen, Choix de mCdias par programmation quadratique en variables 0-1, Actes du colloque de I'AFCET, Aide la ddcision 1 (1974)1-8. [ZO] I. Dragan, A lexicographic algorithm for the solution of polynomial programs in binary variables (in Romanian), Studii si Cerc. Mat. 20 (1968)1135-1146. [21] M. Florian and P. Robillard, Hyperbolic programming with bivalent variables, Rev. Franpaise Informat. Recherche OpCrationnelle 5 (1)(1971).3-9. [22] M. Florian and P. Robillard, A note on hyperbolic programming with bivalent variables, Research Report, UniversitC de MontrCal (1971). [23] L.R. Ford and D.R. Fulkerson, Flows in Networks (Princeton University Press, Princeton, 1962). [24] R. Fortet, L'algbbre de Boole et ses applications en recherche opCrationnelle, Cahiers Centre Etudes Rech. Oper. 1 (1959)5-36. [25] R. Fortet, Applications de I'algbbre de Boole en recherche opCrationnelle, Rev. Franpaise Informat. Recherche OpCrationnelle, 4 (1960)17-26. [26] G. Gallo, P.L. Hammer and B. Simeone, Quadratic knapsack problems, Research Report 76-43, University of Waterloo, Waterloo, Canada (1976). [27] M.R. Garey, D.S. Johnson and L. Stockmeyer, Some simplified NP-complete graph problems, Theor. Comp. Sci., 1 (1976)237-267. [28] A. Geoffrion, Lagrangean relaxation for integer programming, Math. Progamming Study 2 (1974)82-114. [29] V. Ginsburgh and A. Van Peetersen, Un algorithme de programmation quadratique en variables binaires, Rev. Franpaise Informat. Recherche OpCrationnelle 3 (2) (1969)57-74. [30] F. Glover, Improved integer programming formulations of nonlinear integer problems, Management Sci. 22 (1975)455-460. [31] F. Glover, Surrogate constraint duality in mathematical programming, Operations Res. 23 (1975)434-451. [32] F. Glover and R.E.D. Woolsey, Further reduction of zero-one polynomial programs to zero-one linear programming problems, Operations Res. 21 (1973) 156-161. [33] F. Glover and R.E.D. Woolsey, Note on converting the 0-1 polynomial programming problem to a 0-1 linear program, Operations Res. 22 (1974)180-181. [34] D. Granot and F. Granot, On solving fractional (0-1) programs by implicit enumeration, INFOR 14 (1976)241-249. [35] F. Hadlock, Finding a maximum cut of a planar graph in polynomial time, SIAM J. Comput. 4 (1975)221-225. [36] W. Hahn, Optimierungaufgaben mit multiplicativ verknupften binaren variablen, Oper. Res. Verfahren 18 (1974)121-128. [37] P.L. Hammer, Some network flow problems solved with pseudo-boolean programming, Operations Res. 13 (1965)388-399. [38] P.L. Hammer, Plant location - A pseudo-boolean approach, Israel J. of Tech. (1968)350-362. [39] P.L. Hammer, A B.B.B. method for linear and nonlinear bivalent programming, in: B. Avi-Itzhak, ed., Developments in Operations Research (Gordon and Breach, New York, 1971) 45-82. [40] P.L. Hammer, Boolean procedures for bivalent programming, in: P.L. Hammer and G. Zoutendijk, ed., Mathematical Programming in Theory and Applications (North-Holland, Amsterdam, 1974). [41] P.L. Hammer, Boolean elements in combinatorial optimization' a survey in B. Roy ed., Combinatorial Programming: Methods and Applications (Reidel, Dordrecht, 1975) 67-92. [42] P.L. Hammer and P. Hansen, On quadratic 0-1 programming, CORE discussion paper 72-19 (1972). [43] P.L. 
Hammer and U. Peled, On the maximization of a pseudo-boolean function, JACM 19 (1972)265-282. ~
68
P. Hansen
[44] P.L. Hammer and 1. Rosenberg, On equivalent forms of pseudo-boolean programs, in: S. Zaremba, ed., Applications of Number Theory to Numerical Analysis (Academic Press, New York, 1972) 453-463. [45] P.L. Hammer (Ivanescu), I. Rosenberg and S. Rudeanu, On the determination of the minima of pseudo-boolean functions (in Romanian), Studii si Cerc. Mat. 14 (1963) 359-364. [46] P.L. Hammer and A.A. Rubin, Quadratic programming with 0-1 variables, Research Report, Technion, Israel (1969). [47] P.L. Hammer and A.A. Rubin, Some remarks on quadratic programming with 0-1 variables, Rev. Francaise Informat. Recherche Optrationnelle 4 (1970) 67-79. [48] P.L. Hammer and S. Rudeanu, Boolean methods in operations research and related areas (Heidelberg, Springer Verlag, 1968). [49] P.L. Hammer and N. Sang, APPOSS - A partial order in the solution space of bivalent programs, Research Report CRM - 163, Centre de Recherches MathCmatiques, UniversitC de Montrtal, Canada (1972). [SO] M. Hanan and J.M. Kurtzberg, A review of the placement and quadratic assignment problems, SIAM Rev. 12 (1972) 324-342. [513 P. Hansen, Un algorithme S.E.P. pour les programmes pseudo-bool6ens non IinCaires, Cahiers Centre Etudes Rech. Oper. 11 (1969) 26-44. [52] P. Hansen, Note sur I’extension de la mCthode d’Cnum6ration implicite aux programmes non Iintaires en variables z&o-un, Cahiers Centre Etudes Rech. Oper. 11 (1969) 162-166. [53] P. Hansen, Un algorithme pour les programmes non IinCaires en variables zCro-un, Comptes rendus Acad. Sci. Paris 270 (1970) 1700-1702. [54] P. Hansen, PCnalitCs additives pour les programmes en variables z6ro-un, Comptes rendus Acad. Sci. Paris 273 (1971) 175-177. [ 5 5 ] P. Hansen, Quadratic zero-one programming by implicit enumeration, in: F.A. Lootsma, ed., Numerical Methods in Nonlinear Optimization (Academic Press, New York, 1972) 265-278. [56] P. Hansen, Minimisation d’une fonction quadratique en variables zCro-un par I’algorithme de Ford et Fulkerson, papier prCsentC au colloque Structures Cconomiques et CconomCtrie, Lyon (1973). [57] P. Hansen, Programmes mathkmatiques en variables 0-1, Thbse d’agrkgation, UniversitC Libre de Bruxelles (1974). [58] P. Hansen, Les procCdures d’exploration et d’optimisation par sCparation et Cvaluation, in: B. Roy, ed., Combinatorial Programming: Methods and Applications (Reidel, Dordrecht, 1975) 29-65. [59] P. Hansen, Fonctions d’bvaluation et pCnalitCs pour les programmes quadratiques en variables zCro-un, in: B. Roy, ed., Combinatorial Programming: Methods and Applications (Reidel, Dordrecht, 1975) 361-370. [60] P. Hansen, A cascade algorithm for the logical closure of a set of binary relations, Infor. Proc. Letters 5 (1976) 50-54. [61] F.S. Hillier, The evaluation of risky interrelated investments (North-Holland, Amsterdam, 1969). [62] P. Huard, Resolution of mathematical programs with nonlinear constraints by the method of centers, in: J. Abadie, ed., Nonlinear Programming (Wiley, New York, 1967) 207-219. [63] P. Huard, Programmes methCmatiques non IinCaires B variables bivalentes, in: H. Kuhn, ed., Proceedings of the Princeton symposium on Mathematical Programming (Princeton University Press, Princeton, 1970). [64] Y. Inagaki and T. Fukumura, Pseudo-boolean programming with constraints (in Japanese), J. Inst. Elec. Comm. Eng. of Japan, 50 (1967) 26-34. [65] J.R. Isbell and W.H. Marlow, Attrition games, Naval Res. Logist. Quart., 3 (1956) 71-93. [66] R.G. 
Jeroslow, Cross-branching in bivalent programming, paper presented at the 8th International Symposium on Mathematical Programming, Stanford (1973). [67] E.J. Kelley, The cutting plane method for solving convex programs, Journal of SIAM 8 (1960) 703-712. [68] D.J. Laughhunn, Quadratic binary programming with applications to capital-budgeting problems, Operations Res. 14 (1970) 454-461.
Methods of nonlinear 0-1 programming
69
[69] E.L. Lawler, The quadratic assignment problem, Management Sci., 9 (1963) 586-599. [70] E.L. Lawler, The quadratic assignment problem: a brief survey, in: B. Roy, ed., Combinatorial Programming: Methods and Applications (Reidel, Dordrecht, 1975). [71] E.L. Lawler and M.D. Bell, A method for solving discrete optimization problems, Operations Res. 14 (1966) 1098-1112. [72] J.C.T. Mao and B.A. Wallingford, An extension of Lawler and Bell’s method of discrete optimization with examples from capital budgeting, Management Sci. 15 (1968) 51-60. [73] S.G. Papaioannou, Optimal test generation in combinational networks by pseudo-boolean programming, IEEE Trans. on Computers 26 (1977) 553-560. [74] D.E. Peterson and D.J. Laughhunn, Capital expenditure programming and some alternative approaches to risk, Management Sci. 17 (1971) 320-336. [75] J.C. Picard, Maximal closure of a graph and applications to combinatorial problems, Management Sci. 22 (1976) 1268-1272. [76] J.C. Picard and M. Queyranne, Networks, graphs and some nonlinear 0-1 programming problems, Rapport technique E P 77-R-32, Ecole Polytechnique de MontrCal (1977). [77] J.C. Picard and H.D. Ratliff, A graph-theoretic equivalence for integer programs, Operations Res. 21 (1973) 361-369. [78] J.C. Picard and H.D. Ratliff, Minimum cost cut-equivalent networks, Management Sci., 19 (1973) 1087-1092. [79] J.C. Picard and H.D. Ratliff, Minimum cuts and related problems, Networks 5 (1975) 357-370. [SO] J.C. Picard and H.D. Ratliff, A cut approach to the rectilinear facility location problem, Operations Res. 26 (1978) 422-433. [81] M.R. Rao, Cluster analysis and mathematical programming, Journal Amer. Stat. Assoc. 66 (1971) 622-626. [82] J. Rhys, a selection problem of shared fixed costs and network flows, Management Sci., 17 (1970) 200-207. [83] P. Robillard, 0-1 hyperbolic programming, Naval Res. Logist. Quart. 18 (1971) 47-58. [84] I. Rosenberg, 0-1 Optimization and nonlinear programming, Rev. Franeaise Informat. Recherche OpCrationnelle 6 (1972) 95-97. [85] I. Rosenberg, Minimization of pseudo-boolean functions by binary development, Discrete Math. 7 (1974) 151-165. [861 I. Rosenberg, Reduction of bivalent maximization to the quadratic case, Cahiers Centre Etudes Rech. Oper., 17 (1975). [87] S. Rudeanu, Axiomatization of certain problems of minimization, Studia Logica 20 (1967) 37-61. [88] S. Rudeanu, Programmation bivalente B plusieurs fonctions Cconomiques, Rev. Francaise Informat. Recherche Optrationnelle 3 (2) (1969) 13-30. [89] S. Rudeanu, Irredundant optimization of a pseudo-boolean function, Journal Opt. Th. Appl., 4 (1969) 253-259. [90] S. Rudeanu, An axiomatic approach to pseudo-boolean programming, I, Math. Vest. 7 (22) (1970) 403-414. [91] A.L. Saipe, Solving a 0-1 hyperbolic program by branch-and-bound, Naval Res. Logist. Quart., 22 (1975) 497-516. [92] Y. Seppala, Choosing among investment possibilities with stochastic payoff minus expenditure, 6perations Res., 15 (1967) 978-979. [93] M. Schecter and P.L. Hammer, A note on the dynamic planning of investment projects, European Econ. Rev. 2 (1971) 111-121. [94] H. Taha, Hyperbolic programming with bivalent variables, Technical report 71-7, Univ. of Arkansas (1971). [95] H. Taha, A Balasian-based algorithm for zero-one polynomial programming, Management Sci., 18 (1972) 328-343. 1961 H. Taha, Further improvements in the polynomial zero-one algorithm, Management Sci., 19 (1972) 226-227. [97] L.J. 
Watters, Reduction of integer polynomial programming problems to zero-one linear programming problems, Operations Res. 15 (1967) 1171-1 174.
70
P. Hansen
[98] H.M. Weingartner, Capital budgeting of interrelated projects, Management Sci. 12 (1966) 485-5 16. [99] H.P. Williams, Experiments in the formulation of integer programming problems, Math. Programming Study 2 (1974) 180-197. [ 1001 C. Witzgall, An all-integer programming algorithm with parabolic constraints, Journal of SIAM 1 1 (1963) 855-871. [ l o l l C. Witzgall, Mathematical methods of site selection for electronic message systems, NBS report 75-737 (1975). [102] W. Zangwill, Media selection by decision programming, Journal Advertising Res. 5 (1965) 30-36.
Annals of Discrete Mathematics 5 (1979) 71-95 @ North-Holland Publishing Company
AN INTRODUCTION TO THE THEORY OF CU‘ITING-PLANES Robert JEROSLOW Georgia Institute of Technology, Atlanta, G A 30332, USA
0. Introduction The essay is a general introduction to cutting-plane theory. A cutting-plane, relative to a constraint “ x E S” that a vector x = ( x ~.,. . , x,) lie in a set S G R , is simply a linear inequality
(CP) j=l
that is satisfied by all points x E S. Methods for obtaining cutting-planes (CP) of course vary with the nature of S. Many individuals have worked on the theory of cutting-planes, starting with the early and seminal work of Gomory [28]; to cite some, we have the papers [l,3, 9, 11, 12, 13, 17, 23, 25, 27, 34, 35, 36, 42, 46, 48, 51, 521. We will cover primarily work of the last three to five years, and omit any discussion of historical development. Here’s the plan of the paper. In Section 1 we give two principles for obtaining cutting-planes, one for S = S , = { x ~ O I A x ~ band } one for S = S , = { x a O I x integer and A x = b } . The latter principle is then adapted to the set S = S ; = {x 5 0 1 x integer and A x 5 b}. We show that the principle for S1 is a restatement of the duality theorem of Linear Programming. In Section 2, we give two metaprinciples for combining cuts obtained by the two principles, under various special circumstances. By using the principle for S = S1 together with one of the metaprinciples, we obtain the basic principle of disjunctive constraints which has an interesting converse that we mention. Then we give examples, of how to use the principles and metaprinciples, to produce. cutting-planes for a wide variety of situations. In Sections 3 and 4 we present some additional results of cutting-plane theory, which we hope will give a general idea of what kinds of theorems have been proven by developing the basic principles further. The exercises provided should be a useful supplement to the material discussed. Those readers interested in more information on cutting-plane theory are urged to follow-up this essay with the more extensive surveys [38]. 71
R . Jeroslow
72
1. Two principles for generating cuts Let (CP) be a valid cutting-plane for a constraint X E S . Then we say that r
1 r;xi 3 7T:,
(CP)'
j= I
is a weakening of (CP), or that (CP) is a strengthening of (CP)', if the following conditions hold: (1) r j 2 v j for j = 1 , . . . , r ; (2)
r(1.
Since x 20 shall always be required for all sets S considered in this essay, any weakening (CP)' of a valid cut (CP) will also be valid, as CI=, ( r ; -rj)xi 3 0 . The following principle has an obvious validity.
Principle LP. If A 3 0 , then the inequality AAx 2 Ab
and all its weakenings are valid for S
= S, = {x 3 0
1 A x 2 b}.
Example. From the constraints 2x1 - x2+4x3 2 5 ,
4 x , + 3 x , + 6 x 3 2 17,
with A
, = A, = 4, we obtain as (LPC) 3 x , + x , + 5 x 3 2 11.
(1.2)
This amounts to multiplying each row in (1.1) by 4 and then adding them together, and is clearly correct. We next show that Principle LP is closely related to the Duality Theorem of Linear Programming. Many books treat this Duality Theorem (e.g., [47]).
Theorem 1.1. I f S, f (3 then all valid cuts (CP) for S (LPC) for a suitable A 2 0 .
= S,
Proof. The dual to the linear program min
TX,
s.t.
Ax s b ,
X20, where r
= (T,.
. . . , r,.)from (CP), is the linear program
max Ab, s.t.
AA c r , A 20.
are weakenings of a cut
Theory of cutting-planes
73
Now the validity of (CP) can be equivalently restated as the fact, that the minimum of T X in (LP) is at least T() from (CP). Furthermore, (LP) is consistent by hypothesis. Hence if A" is optimal in (DLP) we have A" 3 0, A o A T,A"b 3 T(), and indeed (CP) is a weakening of (LPC) for A = A 0 . When S , = $3, then Principle LP need not always provide all the valid cuts (CP) among its weakenings. In fact, let S , = { ( x , , x 2 ) a 0 I x , 3 1 and - x I 2 0 } . Then the cuts (LPC) have the form (A, - A 2 ) x l + O * x 2 a h l for A,, A220. For this S1 all cuts (CP) are valid, yet 0 . x , - 1 x2 a 0 will not occur among the weakenings of (LPC) simply because - 1< 0. The second principle we will present in this section, for sets of the form S2 = { x 2 0 x integer and A x = b}, requires that we introduce several new ideas. A monoid M is a semi-group under addition, i.e., M satisfies the two axioms: (1) OEM; (2) If c l , u 2 ~ Mthen , u,+u,EM. A subadditive function F : M+ R is a mapping from a monoid M into the reals R which satisfies for each u , , U ~ ME the inequality
-
I
F(ui + ~ 2 ) F(uI)+ F(u2) (SUB) We also designate the jth column ( j = 1, . , . ,r) of A as a") and write A = [ a ( ; ) ] (cols). The second principle requires justification.
Principle IP. ([31, 36, 421). If F is a subadditive function on the monoid M = { u 1 u = A x for some integer vector x 2 0}, and F(0) = 0, then the inequality
c r
F(a"')xi
3 F(b).
i=l
is valid for S 2 = { x 2 0 I x is integer and A x
= b}.
Proof. We prove by induction on the sum u =I;=, xi, that for x integers, we have
20
a vector of (1.3)
j= 1
This will establish (IPC), since A x = b for x E S,. For u = 0 (1.3) becomes 0 2 F ( 0 ) = 0, which is true. To go from u to u + 1, let u + 1 = 1;I xi and, without loss of generality, x , a 1. Then using first the inductive hypothesis, and then the subadditivity of F, we have
+
~ ( a ( i ) ) x=, F ( ~ ( L )~)( a ( l ) ) ( x-, I ) + ;=I
~(a(J))x, j=2
R. Jeroslow
74
Example. The function of modular arithmetic to base 2 1 if n is odd; 0 if n is even;
is readily verified to be subadditive, and F ( 0 )= 0 . When applied to the constraints Sx,+2x,+2x,+x4=7,
x I , x 2 , x j , x 4 3 0 andinteger
(1.6)
we obtain as (IPC) the cut x, + x 4 2 1
(1.7)
which states that at least one variable with an odd coefficient in (1.6) must be non-zero if the odd number 7 on the right-hand-side in (1.6) is to be the sum. (See Exercise I.l(e) for a generalization). Analogous to Theorem 1. I for Principle LP we have the following result for Principle IP.
Theorem 1.2. If S, # @ and the matrix A has only rational quantities, then all valid cuts (CP) for S = S 2 are weakenings of a cut (IPC) for a suitable subadditive function F on M = { v I u = Ax for some integer vector x 3 0) with F ( 0 )= 0 . A rigorous proof for Theorem 1.2 is too advanced to give here, but the idea of the proof is easy to describe and is interesting in its own right. Given (CP), we put rr = ( T ,.,. . , r r r ) and define F by
F ( u ) = min {rrx I Ax
=u
and x 20 is integer},
(1.8)
for v E M, M as described in Principle IP. Incidently, functions like F in (1.8) are called value functions, since they give the optimal value of an integer program as a function of its right-hand-side u. Next, with work one shows that F ( 0 ) = 0 and that F ( v ) is finite (i.e., not -a)for all U E M. Clearly, a”’€M for j = 1,. . . , r (use unit vectors) and the fact that S, # (4 puts 6 E M, so that F is defined for all quantities needed in (IPC). Again, unit vectors give F(a”’) T, for j = I , . . . , r. Also, the validity of (CP) can be restated as the fact that min {rrx I Ax = 6 and x 3 0 is’ integer} is at least rr(,, i.e., F ( h )2 rrO. Thus, (CP) is a weakening of (IPC). The remainder of the proof is spent showing that F is subadditive on M. This is an interesting exercise that we leave to the reader (see [36]). There is a principle analogous to Principle IP for constraint sets of the form Sg = { x 2 0 I Ax 2 b and x is integer}. To state this principle, for v, w E M’, with M’ as in Principle IP’ below, we write u 2 w to abbreviate v, 2 wi for i = 1, . . . , m.
15
Theory of cuffing-planes
Call a function F monotone if 21 2
w
WON)
implies F ( u ) 2 F ( w ) .
The principle can now be stated as follows.
Principle IP’. If F is a monotone subadditive function o n the monoid M ’ = { u I u s A x for some integer vector x 0} and F ( 0 )= 0, then the inequality (IPC) and all of its weakenings are valid for the set S; = { x 2 0 I A x b and x is integer}.
*
*
To prove Principle IP’, one simply combines (1.3) and A x 3 b with monotonicity (MON). There is also a theorem corresponding to Theorem 1.2 for this Principle IP’. Its proof is similar to that of Theorem 1.2, if one observes that F defined by ( 1.8) is monotone when ‘ = ’ is replaced by ‘ 2 ’. Apart from functional characterizations, the sets S , and S&= { x 1 A x 2 b, and xi E {O, l} for j = I. . . . , r } have been extensively studied from the point of view of cutting-planes. For one surprising result on cutting-planes for Sl,, see Blair’s theorem in [9]. An unexpected result on the cutting-planes for a bounded integer program, is the theorem due to Chvatal [I31 which is closely related to Gomory’s algorithm [28]. In the terminology of this section, that theorem states that the repeated application of Principle LP plus Principle IP, with the latter used only for the subadditive functions F of modular arithmetic, provides all the valid cuttingplanes for a set Sz when Sz is finite. Exercise 1.4 is relevant here, and the interested reader may wish to consult the papers cited. We conclude with some homework problems which are designed to build up skills in subadditive function construction, and which will make the discussion of Principle IP more concrete. The solutions of selected exercises are given at the end of the exercises for Section 2. Exercises
Exercise 1.1. Show that the following functions are subadditive on their domains: (a) F ( u ) = hu for A E R” any vector. If A 3 0, F is monotone. (b) F ( u ) = max {aF,(v),bF,(u)} where a, b 5 0 and F,, F, are subadditive on a common domain (See [36]). If F, and F, are monotone, so is F. (c) F ( u ) = aF,(v)+ bF,(u) where a, b 3 0 and F,, F, are subadditive on a common domain. (d) F(u)=inf,,,{G(p)+H(v-p)}, where P is a monoid, G a subadditive function defined on P, v E K . K is a group with P G K , and H is subadditive on K . You may also suppose that F ( u ) is finite for all u E K . (e) F ( u ) , where r > 0 is real number and F,(u) is defined by the condition (for u
ER) F(u)=v-nr
provided that
n r s u < ( n + l ) r and n isan integer. (1.9)
R. Jcroslow
16
(f) F(v)=min{G(v), c } where c S 0 is a constant and G is a non-negative subadditive function.
Exercise 1.2. If F, and F2 are subadditive on a common domain, is F ( u ) = min { F , ( u ) ,F2(u)}necessarily subadditive on this domain? Exercise 1.3. Suppose that H is subadditive on its domain and that G is subadditive and monotone on the reals R. Show that F(v)= G(H(v)) is subadditive on the domain of H. Is F still necessarily subadditive if G is not monotone? Is F still necessarily subadditive if G is not monotone but H ( v )= hv? Exercise 1.4. (Gomory’s fractional row cut [28]). Suppose that x, t l , t,, t3 are nonnegative integer variables constrained by the relation x = 1.7 + 2.3(-tI) - 0.3(-t,)
+ 2(-t3).
(1.10)
Show that the cutting-plane 0.3t1+0.7t2+0
*
t320.7
(1.11)
is valid. (Hint: Use Exercise l.l(e)).
Exercise 1.5. (Compare with [17]). From (1.10) we see that not all variables t , , t, can be at zero in a solution, hence that the cutting-plane t,+
t 2 2
1
(1.12)
is valid. What subadditive function gives (1.12)? (Hint: Use Exercise 1.3, with H as in the cut for (1.11) and with G suitably chosen via reference to Exercise l.l(f).) Note that, after multiplying (1.12) by 0.7, (1.12) is a weakeningof (1.11).
Exercise 1.6. Clearly the constraints 2x, +5x2+7x3+ 11x4> 15,
x,, x, x3, x4 all zero or one
(1.13)
make it impossible that x3 = x4 = 0 (since 2 + 5 < lS), hence the cut x3+x421
(1.14)
is valid. Can (1.14) be obtained from a monotone subadditive function for general integer variables, possibly after some linear transformations on certain of the binary variables?
2. Two metaprinciples for manipulating cuts Suppose you have learned that either the inequality 2x,-3x2-7x,21s
or the inequality. XI +2x2- 5 X 3 b 28
Theory of cutting-planes
I1
is valid for the nonnegative variables x , , x2, x3 2 0, but you don't know which one is valid. Is there some one inequality which is therefore necessarily valid? Evidently, there is. Since 2 = max (2, l},2 = max {-3,2}, -5 = max {-7, -5) and 15 = min { 15,28}, the inequality
2x,+ 2x2 - 5x32 1 5
(2.3)
is a weakening of both (2.1) and (2.2). Hence, (2.3) is valid as long as at least one of (2.1), (2.2) holds. The metaprinciple which follows summarizes the idea behind our derivation of (2.3) from (2.1) and (2.2).
Metaprinciple DC. Suppose that at least one of the inequalities r
bijxi2 bio
(2.4)
j=l
is known to hold for some i = 1 , 2 , . . . , p and that the variables x = ( x l , . . . , x,) are nonnegative. Then the inequality ( m y hij)xi3m:n bi,,
(2.5)
j= I
holds, and so do all of its weakenings. By combining Principle LP with Metaprinciple DC, we obtain the following corollary.
Theorem 2.1 (Fundamental Principle of Disjunctive Constraints [3]). Suppose that at least one of the systems ( s h ) hold for some h E H, where ( s h ) is ACh)x> b'h), x20
(sh)
and A'h' is an mh X r matrix and b(h)is an mh X 1 vector. Then for any 1 x mh vectors A'h)>O of non-negative multipliers, the inequality
holds, as do any of its weakenings. In (DC), SUPhsH A(h)A'h'denotes that vector whose jth component (j = 1, . . . , r ) is SUP,,& v ; ~ )and , v ( ~=)A ( ~ ) A ( ~(The ) . sups and the inf of (DC) are assumed to exist). We note that Principle LP can also be stated as a metaprinciple, derived from the logical condition that all the inequalities A x 3 b are known to hold. Usually, (DC) represents about all one can say, when one knows only that a system among the ( s h ) holds (for a precise statement, see Theorem 2.2 below). But sometimes (DC) is relatively ineffective in representing this information.
R . Jeroslow
78
Example. Suppose that x , is a nonnegative variable and that either
-x12-l
or O . x , > l
(2.6)
holds. Then using Theorem 2.1, (DC) becomes max {-Al
*
1, A,. O}x, a m i n {-Al
1, A2 * 1)
(2.7)
or equivalently
0*
(2.7)’
2-AL
X I
as A , a0 varies. Now (2.7) is a weakening of the trivial cut 0 * x , 20,so we have entirely thrown away the information in (2.6) that 0 C x , < 1 must hold (since 0 * x I2 1 is impossible, - x I 2 -1 must hold). When does (DC) accurately represent the fact, that at least one system ( s h ) holds? The answer is given in our next results; the proof is too lengthy for inclusion here. (In (2.9), we use the set addition operation of A + B = { a + b 1 a E A , b E B}).
Theorem 2.2 [lo]. Suppose that H f Q is finite and define for h E H c h
={ X
a 0 I A‘h’X2 O},
(2.8)
let T c H be the set of indices p E H such that the system ( S , ) is consistent, and let Z=H\T. Suppose that T f Q and that for each h E I we have PET
((2.9) is vacuously satisfied i f Z = 8). Then among all the weakenings of (DC), as the multipliers A‘h’20 are varied, are all the valid cutting-planes (CP) for S = UD E T { x 3 0 I A‘p’x 2 b‘p’}. Supplementing Theorem 2.2 is the following, which treats the case T = Q not covered in Theorem 2.2.
Theorem 2.3 [ l o ] . Zf ch = (0) in (2.8) for all h E H, and H is finite, then the weakenings of (DC) include all the valid cutting-planes (CP)for S = UpeH{ x I ( S , ) holds}, with the convention that all inequalities (CP) are valid if S = Q. The set ch of (2.8) has an interesting interpretation in terms of “directions of infinity” or “directions of recession?’(see [47]). To be specific, suppose that d‘h’ is chosen so that A‘”’x
3 d(h’,
x 30
(2.10)
is consistent. Let x0 satisfy (2.10) and let X E c h . Then for any A 2 0 , A ( h ) ( X O + A ~=) A ( h ) X O + A A ( h ) ~ ~ d d ( h ) + d‘h) O=
(2.1 1 )
Theory of cutting-planes
19
and consequently x"+ A x also satisfies (2.10). This interpretation gives the following consequence of Theorem 1.3.
Corollary 2.4 [ 101. Suppose that, for each h E H, there is some choice of d'h' such that (2.10) describes a non-empty and bounded set, and suppose that H is finite. Then (DC) affords, among its weakenings, all valid cutting-planes for S = U p s ~ {I xCSp)holds). Proof. Since (2.10) is a bounded set, we must have f = 0 in (2.11). Hence c h = { O } for all h E H , and Theorem 2.3 applies. In most applications in integer programming, the hypothesis of Corollary 2.4 applies. For instance, ( s h ) may be Axsb,
x=h,
x 3 0
Wt
where h is a non-negative integer vector and { x 2 0 I A x 2 b } is non-empty and bounded. In such cases, Corollary 2.4 assures that (DC) provides at least one strengthening of any valid cut (CP). Of course, the multipliers for any equality constraints, such as x = h, are unrestricted in sign. (Represent x = h as x 2 h and - x 2 h, with multipliers (Y I 2 0 and a2a 0; then 8 = a I- a2 is indeed unrestricted in sign and is the multiplier for x = h ) . There are a number of practical problems which have the format, that at least one system ( S , ) holds, and for these problems (DC) is immediately applicable. More commonly, though, the disjunctive condition, that at least one system ( s h ) holds, is deduced from some other set of constraints. For instance, from the constraint set S ; = { x a 0 I A x 2 b and x is integer}, x = ( x l , .. . ,x r ) , we deduce that either
holds. Alternatively, one might deduce that at least one system (sh)' holds. However, since one desires to obtain simply a valid cutting plane, and the number of systems ( & ) I is usually exponential, the two systems (S,)ll or (s2)'' would be a more typical application (compare with cut (2.19) in the next example below, where the two systems derive from ''x s 1 or x 2 2."). The cutting-planes then obtained from various disjunctive systems ( & ) , deduced from the actual constraint set s, can then be used for fathoming in enumeration-based algorithms. A principle which is very different in nature from Metaprinciple DC, and which is used for cut-strengthening (CS), is as follows. Metaprinciple MCS. (Compare with [6]). Let T f g be a set and M a monoid.
R.Jeroslow
80
Suppose that the constraints A x E T + M, x 2 0 and integer
(2.12)
always imply the cutting-plane
(2.13) j=l
for any matrix A, where F , , . . . , Fr are certain (not necessarily subadditive) functions, and a"' is the jth column of A =[a"'] (cols). Then the cutting-plane
i( inf
j=l
(2.14)
~ , ( a ( i )m+)
mcM
is also valid for the constraints (2.12). be picked arbitrarily for j = 1,. . . , r. Since M is a monoid, m(j)xjE M for any integer vector x 2 0. Consequently Ax E T + M implies
Proof. Let
m ( j ) EM
i
(,(j)+&)Xj
i
=
&)xj
+
j=l
j=l
i
m(j)xj
j=l
ET+M+M=T+M
(2.15)
(the last equality since M is a monoid), which in turn implies (by the hypothesis)
(2.16) In brief, Ax E T + M implies (2.16) for any arbitrary choice of the m"', hence (2.14). Metaprinciple MCS can be used together with Theorem 2.1 to strengthen cuts obtained by Theorem 2.1, when x 3 0 is integer constrained.
Example. Suppose that a constraint (1.10) holds for non-negative integer variables x, t l , t2, tj. We weaken (1.10) by writing it as 2.3(-t,)-0.3(-t,)+2(-t,)E
T+M
(1 .lo)'
T+M
(2.17)
and in generality as ai(-tt,)+a2(--2)+
a,(-t,)
E
where T={-1.7) and M = t h e integers. Since any element of T + M is either 2 0 . 3 or s - 0 . 7 , we see that: either (-al)tl+ ( - a 2 ) t 2 + ( - - a 3 ) t 3 ~ 0 . 3
or a,tl+
Uztz+
U3t320.7
Theory of cutting-planes
81
holds. Using multipliers A ‘ ” = 1/0.3 and A(2)= 1/0.7, from Theorem 2.1 we immediately justify the cutting-plane 3
max {-aj/0.3, aj/0.7}tj5 1
(2.18)
;=I
which is one instance of the kind of cutting-plane suggested by Gomory [29] for problems with continuous variables ti a 0 . In the context of (1.10)’, (2.18) becomes 3.29tI +t,+2.86t3==1.
(2.19)
If one multiplies both sides of (2.19) by 0.7, one sees that (2.19) is a weakening of Gomory’s cut (1.11). However, (2.19) is valid for continuous ti, while (1.11) requires integral ti. To strengthen (2.18) and (2.19) when all ti 3 0 are integer, we appeal to Metaprinciple MCS to obtain the cut 7
1 min max { - (aj+ nY0.3, (aj+ n)/0.7}ti 3 1 j=1
(2.18)’
naZ
where Z is the integers. We simplify (2.18)’ as follows. We write aj in terms of its integral and fractional parts: aj = Laj] +fi,
Laj] integer and 0.t.c . < 1 .
(2.20)
For n 3 - Laj] we have max { -(aj
+ n)/0.3, (aj+ n)/0.7}= (aj+ n)/0.7 sf;/0.7
(2.21)
and for n G - Laj] - 1 we have max{-(aj+n)/0.3, (a; +n)/0.7}=-(aj +n)/0.33(1-fi)/0.3
(2.22)
Therefore (2.18)‘ is equivalent to (2.23) which becomes, in the context of (1.10)‘,
-
0.43t, + t* + 0 t3 3 1
(2.24)
Multiplying (2.24) by 0.7, we see that it agrees with (1.10) to two decimal places of accuracy (the cuts are in fact identical). From the length of this example, it is clear that the “proper way” of combining Metaprinciple MCS with Theorem 2.1 is a matter of art, and not a straightforward matter. It would be very useful to have other cut-strengthening procedures in addition to MCS, to use in connection with Theorem 2.1. In fact, very little is known about cut-strengthening procedures. This matter is important, since it has been observed in practice that the strengthening, represented by (2.14) over (2.13), is often quite significant; compare e.g., (2.24) with (2.19).
R . Jeroslow
82
To gain some more familiarity with the combination of Theorem 2.1 and MCS, we work one more example in detail. The exercises allow the reader to try his own skills; solutions are at the end of Section 3.
Example [6]. Suppose that at least one system there are "lower bounds", i.e. all the systems A ' h )2~dh',
x
(Sh)
holds, and that in addition
20
(2.25)
hold, where d h s bh is a suitable vector. Finally, suppose that x is to be an integer vector. Put
T={(v"',. . . , o('))I all v ( h ) 3 d ( h 'and at least one u ' ~ is)
2 b'h'}
(2.26a)
(2.26b) where t = [HI. Also set
(2.27)
Then the information we have can be represented as (2.28)
AXET
which we now relax to (2.29)
AxET+M.
Now note that (2.29) implies that at least one system ( S h ) holds. For if all n h = 0 in (2.26.b) then, Ax E T and at least one ( s h ) holds; while if some n h f 0, then at least one n h * 3 1 and then (&*) holds. Therefore (2.29) guarantees the validity of the cuts (DC). Let a'ih)denote the jth column of A'h' for j = 1,. . . , r and h = 1 , . . . , t. Then we write (DC) as max {A'h)a(jh'}xi 2 min ,+(h'b'h' j=i
htH
(2.30)
heH
and so the strengthened cut (2.14), for (2.30) taken as (2.13), is i=l
min [m~~{A'h'(a'ih'+nh(b'h'-d'h) ))I I'
all It,, are integer and
C j=l
nh
1
S O xi zminAhb'h'. (2.31) heH
Theory of cutting-planes
83
Exercises
Exercise 2.1 [6]. Suppose that the constraints x1=
t +gc-t
1)
- f ( - t2)- $(-t3)
+ 2(-
t,)
+
x2 = 2 + A(- t i ) + &(-t2) A(- t3)- i(- f4) x , = $ - ’ 6(-tl) - i ( - f 3 ) -&(-f4)
+$(-a
+ ++(-t5),
+ A(
-t5),
(2.32)
-3-t5)
are imposed on the binary (i.e., zero or one) variables x l , xz, x 3 which are also restrained by the “set-covering” row XI +X,+X,S
1.
(2.33)
In (2.32), all ti are integer-constrained and nonnegative. (a) Use the constraint (2.33) and the binary nature of the xi to derive a system (Si)( i = 1, 2, 3) of disjunctive constraints in the variables ti, and from this system derive the cut $ t 1 + ~ t 2 + ~ t , + ~ t 4 + ~1t s 2
(2.34)
(Hint:From (2.33) we see that either x 1 2 1 or x 2 b 1 or x 3 2 1). (b) Use the fact that the xi are non-negative to obtain lower bounds (as in (2.25)) on the disjunctive system obtained in part (a), and then apply (2.31) to obtain the strengthened cut
-it,+’
>I*
(2.35)
s t 2 +;2s t 3 +It 4 4 --It 4 S H
(Hint:If properly done, you will find that b“’- d“’ = 1 for i = 1,2,3). Exercise 2.2. Suppose that (CP) is valid for the restrictions A x E T + M , x S O integer, with T # @a set and M a monoid, and that these restrictions have a solution in non-negative integers x 2 0. Suppose also that for j = 1,. . . , r there is where , a”’ is the jth column of A. an integer Di with D i u c i ) ~ M (a) Show that riS O for j = 1,. . . , r in (CP). (b) Suppose that M=ZP, the p-dimensional integers (where A has p rows), and A is a matrix of rationals. Show that rjS O for j = 1, . . . , r. In particular, (CP) cannot be the cut (2.35) derived above. (Hint:Use part (a)). Exercise 2.3. Suppose that M is a monoid of the special form
I v = - Bz for some integer vector z 5 0 ) where z = ( z l , . . . , z s ) and B is a p by q matrix of rationals. Let M
= {v
(2.36)
b = (bl, . . . ,b,,) be a p-vector and abbreviate {b}+M by b M. Show that the valid cuts (CP) for the set
+
S = { x a 0 integer 1 A x E b + M }
(2.37)
R . Jeroslow
84
are precisely the weakenings of a cut (IPC) for a suitable subadditive function F with F(O)=O and
F(b'k')SO,
k = 1 , . . . ,q
(2.38)
provided that A is a rational and Ax E b + M is a consistent set of constraints. In (2.38), B =[b'k'] (cols). (Hint: Use Theorem 1.2). Solutions to selected exercises from Section 1
Exercise 1.1. (b) We have for u l , u2 arbitrary aFi(ui + ~ 2 S) aFi(ui)+ aFi(v2)s F F ( u i ) + F(u2), bF2(uI+ ~ 2 ) bF2(111)+ bF,(u2) F(ui)+ F(u2). Taking the maximum on the left in both inequalities, we have the desired
F(uI + u2)S F(u
+ F(u2).
(d) Let u l , u2 be arbitrary and let
E
> 0. Pick pi, p2E P such that
F ( I I ~ ) + EG(pi)+-H(ui /~~ -Pi), F ( u 2 ) + ~ / 2 3G(pz)+H(u2-~2). Then we have
F(ui + 4 s G(pi + ~ 2+) H((QI+ u ~ ) - ( P I + PJ) = G(pi + ~
S
2+)H((ui - P I ) + ( ~ 2 - p ~ ) )
G(Pd + G(P2) + W U l F(u,) + F(u2)+ E .
PI) + w u 2 - P2)
+
Since E > 0 was arbitrary, F ( u , u2)S F(u,) + F(u,); this is the desired result. (f) Let u l , u2 be arbitrary. If G(u,)>c, then since G (hence F) is non-negative, we have
Similarly, we again have F(u, + u2)S F ( u J + F(u2) if G(u2)a c. Now suppose that G(u,)and G(u2)are both
and all cases have been considered.
Theory of cutting-planes
85
Exercise 1.2. No. Take F,(u) = -u, F2(u)= u, with domain the reals. Then F ( u ) = - ( u ( ,which is not subadditive. Exercise 1.3. We have F(ui + u2) = G ( H ( u ,+ ~
2 ) )
G(H(ui)+ H(u2)) S G ( H ( u , ) ) + G ( H ( u , ) ) =F(ul)+F(02). In the above, the first s is due to the fact that G is monotone and H(vl+ u2) S W u , )+ Wv2). If G is not monotone, F need not be subadditive. For example, H(v)=lul is subadditive on the reals, as is G(u)= - u, but here F ( u ) = - lu( is not subadditive. If H ( u )= Au then F is always subadditive, whether or not G is monotone. In the above inequalities, note that the first C becomes = since H ( u l + u 2 ) = H ( u , )+ H(u2).
Exercise 1.4. By Exercise l.l(e), F(v) = u modulo 1 is a subadditive function on the reals. Now (1.10) is equivalent to x+2.3t,-0.3t2+2t3= 1.7 Applying F to this equation as in (IPC), one obtains (1.11).
Exercise 1.5. Let H ( u )= ( u modulo !)/0.3 and let G ( u )= rnin {u, 1). From Exercise l.l(f), G is subadditive for u a 0 ; it is clearly monotone. By Exercise 1.3, F ( u )= G ( H ( u ) )= min { ( u modulo 1)/0.3, 1) is subadditive, and clearly F(0) = 0. We have F(1.7) = rnin {0.7/0.3, 1) = 1, F(2.3) = rnin {0.3/0.3,1) = 1, F( - 0.3) = rnin {0.7/0.3, 1) = 1, F(2) = rnin {0/0.3, 1) = 0, F(l)= 0, and (IPC) is (1.12). Exercise 1.6. Since xI and x2 are binary variables, so are x; = 1 -x,, x $ = 1-x2. Then (1.13) is equivalent to -2x;
- 5x5 +7x,+
l l x , > 15 -7
=8
and
(1.13)’
and we need only look for a suitable “truncation,” as in Exercise l . l ( f ) , to turn 7, 11 and 8 into 1 , while turning negative numbers (like -2 and -5) into zero. We note that G(u)= max (0, u ) is subadditive by Exercise l.l(a) and (b), and G(-2) = G(-5) = 0, while G(7) = 7, G(11) = 11, G(8)= 8. Then F ( u )= rnin { G ( u ) ,1) is the desired truncation, which gives (1.14) as (IPC).
3. A result on complementarity constraints To give an idea of the results that one is led to by a further development along the lines of Theorem 2.1, we turn to “generalized linear complementarity” constraints. These take the form Ax+Bzsd,
X,
z20,
x * z=O
(GW
86
R. Jeroslow
and the problem connected with (GLC) is often that of finding a feasible solution. The adjective “generalized” refers to the fact that special properties are assumed for the matrices A and B in many studies, although we shall not do so here. Problems of the type (GLC) have arisen in game theory, economics, partial differential equations, and other areas. For some references, see [14, 15, 43, 441. To state our result, we shall consider “deductions” of a certain logical calculus, as these deductions are given in tree form. The tree shall be spread out at the “top” and narrow to one node at the “bottom.” The propositions of our logical calculus shall be linear inequalities. The propositions occurring at the top of deductions are called “assumptions.” Where a given node has several others connected to it by an edge and just above it, we require that the inequality assigned to this node “follow from” the inequalities assigned to the nodes just above, in the sense that it is the conclusion of one of the rules of deduction of the logical system and the inequalities above are the premisses of this rule. For instance, one of our rules of deduction shall be
and for the application of (LC) we require that A, 8 3 0 and that c o ~ A a 0 + 8 b , . The premisses of (LC) are a I u + * * + a,,u,, 2 a,, and also b , u + * * . + b,u, 2 b,; the conclusion of (LC) is
,
-
( ~ a , + e b , ) u , +.+(ha,, -. +eb,,)~,zc,,. The parameters of (LC) are 8, A and cg. The rule (LC) is understood (as is any rule) as “saying” that, if its premisses have already been “deduced,” one is entitled to “deduce” its conclusion. For the generic variables u l , . . . , u,, one may employ any of the variables xI, x2, . . . , x,, y I , y2,. . . , z , , . . . . When (LC) occurs in a tree, as part of it, that part looks like
where the top two nodes correspond to the premisses, and the bottom node corresponds to the conclusion. An instance of (LC), with the corresponding inequalities underlined, and A = 8 = 1, is 2x1 - xz 3 7
v
5x,+3x*2 - 2
7xl + 2x2 2 5
which is often abbreviated 2x1 - x2 3 7
5x, + 3x23 -2
\ 7x, + 2x, 5 5 /
Theory of cutting-planes
87
The second rule of deduction is described in (ii); below. For j deductions of the type
= 2,
it specifies
2 x , - 3x2+ 7 x , + 2 , + 222 - 5 z 3 2 4 , 2x1 +4x2+7x3 + 2 , - 522 - 523 2 4 2 x , - 3x2 + 7x3 + 21 - 52, - 5 2 , 4~ Since we must have x2z2= 0 in (GLC), as x . z = 2; xizj = 0, either x2 = 0 or z2 = 0. If x2 = 0, then the conclusion is equivalent to the premiss on the right above it; if z2 = 0, the conclusion is equivalent to the premiss on the left. This kind of analysis shows why rules like (ii); are valid and hence provide some of the valid cutting-planes for (GLC). It also turns out that one obtains all the valid cutting-planes this way, as our next result states.
Theorem 3.1 [39]. Zf {(x,z ) 1 A x + Bz 3 d’, x 3 0 , z s 0} is bounded and non-empty for some d’, then any valid cutting-plane for the complementarity constraints Ax+BzSd,
xSO,
220,
x-t=O,
(GLC)
is obtained by starting from the linear defining inequalities Ax+Bz2d,
x20, 220
and applying, finitely often, the following two rules (the second for j
= 1,
. . . ,r ) :
(i) Take non-negative combinations of given inequalities, and possibility weaken the right-hand side. (ii)i Having already obtained two inequalities a!
,x + . . + uxj + .+ a,x, + p ,z ,+ . + ‘2;+ + przr2 a!(), + . + U’Xi + . . + a,x, + P I + + t’Zi + + przr I
(YIXI
*
*
*
* *
a
2,
*
* * *
* * *
* * *
S a g ,
one may deduce a , x ,+-
*
a
+ ux; + . +a!,X,+ p , z , + * *
* * *
+ trzi+ - . .+ prz, 2a0.
Conversely, any inequality thus obtained is valid for the constraints (GLC). A very similar result holds if {(x,2) 1 A x + B z 2 d , x, z 2 0) is unbounded. For details on this and broader generalizations of Theorem 3.1, including its relationship to earlier results in [2] and [9], see [39]. It has been seen experimentally that often conclusions of (ii)i make redundant certain of the original inequalities among A x + Bz 2 d, in the sense that the latter could be removed since they are restorable through the use of rule (i) applied to the remaining inequalities of A x Bz 2 d and the conclusions of (ii),. Rigorous results are needed which detail these redundancies.
+
R . Jcroslow
88
When one wishes to use Theorem 3.1 in an algorithmic manner, rule (i) is accomplished by the Simplex Algorithm, and rule (iQj corresponds to adding a cut. We want to add cuts in a way which, if possible, does not increase the size of the problem, hence cuts which make some problem constraint removable are of value. Solutions for selected exercises from Section 2
Exercise 2.1. (a) By logic, either x l 2 1 or x 2 3 1 or x 3 a 1, i.e., either - i t l + g t 2 + 2 t 3 -;t4-gts
3;
or --Lf
6 1
--It6 2 --Lt6 3 +'t6 4 --It6
S 3 2
or
Itl - 2t2 +if3+ i t 4 + it, 23 8 giving b"' = 2, b'2' = $, b(3)= 2 and
(S3) t = 3.
Since always x I , x2, x 3 3 0 we similarly obtain in (2.25) that d ( ' ) = -l d','= d'3)= - $ Therefore b'h)-d'h)= 1 for h = 1, 2, 3. From (2.31), a valid cut is 67
- 2 67
(3.27) wherever the nh are integer (possibly negative) with n,+n,+n,20, and the A'h'a0 are non-negative scalars. Take, for instance, A'" = $, A'*' = 6 A(3) = 37 so that all A(h'b(h) = 1. Then with n , = n2 = n3 = 0 we obtain the usual disjunctive cut 4 7
$1 2
+'s t 2 +> st3 +'t3 4 +5f 3. s 2
(3.28)
1
which is (2.34). (b) Let us now pick ( n l , n2,n3) to improve the coefficient 3 of t l in (3.28), if possible. Using ( n , , n,, n3)= (1,0, -1) will improve the coefficient, and that coefficient becomes max {$(-$
+ I), $(-A+
$(a- 1))= -4
O),
as noted in (2.35). To improve the coefficient of (1,0, - 1) gives a coefficient of
t4,
max {$(-2+ I), !j(&+o),$(;-
which is
1))
4
in (3.28), note that (nl, n,, n,)
=
=a,
as noted in (2.35). For an algorithm that finds the best ( a , , n,, n3) in problems of this type, see [6], from which the above example is drawn.
Exercise 2.2. (a) Let xo be any solution to Ax E T + M, and let e, denote the jth
Theory of cutting-planes
89
unit vector. Then for any integer k S O ,
A(x”+kD,e,)=Ax”+kD,a“’E T+M+M= T+M, hence for any k 20, x”+ kD,e, is also a solution to the constraints. Therefore the jth component of a solution can be made arbitrarily large while keeping the other components fixed, and 7, S O must hold in a valid cut (CP). (b) Since A is a matrix of rationals, D,~”’E Z” holds if Dl is the least common denominator of the rationals in a‘”, and part (a) applies. The group problem of Gomory involves constraints of the form Ax E T+ZP,and this provides one way of proving T,3 0 for cuts based on that problem (see [26]).
Exercise 2.3. (a) Note that Ax E b + M is equivalent to Ax + B y
= b,
x, y
0 and integer
(3.27)’
and Theorem 1.2 applies directly to (3.27)’. For a weakening of a cut (IPC) to have zeroes in co-ordinate positions for the variables Yk, (2.38) is both necessary and sufficient. It is precisely such weakenings with zeroes in positions of Y h ’ s that provides cuts for the constraint set Ax E b M in terms of the given variables x .
+
4. Linear programming formulations of subadditivity Suppose that we wish to obtain all the valid cuts for S = {x 2 0 I Ax = b and x integer}. Throughout this section, we shall assume that A and b are rational, S # 8, and at first we are willing to try a “brute force” approach to this problem. One direct way of characterizing these cuts is to enumerate S by S = { x “ ’ , x(*),. . .} and to require that the cut-coefficients r ( a ( J ) satisfy ) the (possibly infinite) linear inequality system
Indeed, (4.1) simply requires that we obtain a valid cut (CP) if we set T,= ~ ( a ” ) ) and r o= m(b). The system (4.1) is one instance of what is termed the “polar set” for S (see [47]). So far writing (4.1) did not depend on the nature of S, simply the fact that S = { x ( I ) ,x(*),... .}. We now bring the integrality of the variables x 3 0 to bear and simplify (4.1) somewhat. To be specific, for each x ( ~ ) S, E we chose a “path from 0 to x ( ~ ) ” ,by which is w(q)= = 0, w(’), w(’) x ( k ) (the sequence and meant a sequence of vectors w((’) q depending on x ‘ ~ ) ) such , that for each i = 0, 1 , . . . ,q - 1 there is some unit vector eici,(depending on i) with I . . . ,
R . Jeroslow
90
For instance, if ~ ( ~ ) = 1, ( 2l), , we can choose w'") = (0, 0, O), w(I)= w(2)=
(4.3.a)
w ( ( " + e , = (1,0, O), (l)+
(4.3.b) (4.3.c)
1,0), w(3) = w(')+ el = (2, 1, O), w
w(4) = w
e2 = (1,
(4.3.d)
( ~ ) e3 + = (2,1,1)= x ( ~ ) ,
(4.3.e)
with q = 4 = 2 + 1 + 1. Having chosen such a path, we propose to replace the equation
which occurs in (4.1), by the inequalities T ( A w ( ' + 'S) )T(Aw'")+~ ( a ( j ( ~ ) ) i) = 0, . . . ,q - 1
(4-5)k
where (4.2) is assumed for each i = 0, . . . , q - 1. That is to say, we enlarge the set of variables in (4.4)k, which are only ~ ( a " ) ). ,. . , ~ ( a ( ~.rr(b), ) ) , to include a variable m(Aw(")for each i = 0, . . . , q, and then we replace (4.4)k by the system of inequalities ( 4 3 , . Any solution to ( 4 3 , gives a solution to (4.4)k when only the variables ~ ( a ' " ). ,. . , ~ ( a ( l ) T) ,( b ) are considered. In fact, we are going to leave it to the reader to prove by induction on i that (4.5)k implies r
T ( A w ( ~ ) ) G~ ( a ( j ' ) w ; ' )
( ~ ( 0=) 0).
(4.6)
j=1
Then from (4.6), by setting i = q and noting that w4 = x ( ~ ) A , W'~ ='Adk'= b, we have (4.4)k. Conversely, any solution to ( 4 4 can be expanded to a solution to (4.5)k, by setting for i = 1 , . . . , q - 1 T(Aw(") =
1~ ( a ( j ) ) w f ' ) .
(4.7)
j=l
Therefore the inequality (4.4)k and the system (4.5)k are equivalent. Now we examine ( 4 . 5 ) k from a different perspective. It has the form of a subadditivity relation T(U+Uf)~T(U)+T(Uf)
(4.8)
for = Aw(i),u' = a(i(')), u + uf= A(w'"+ e j ( i ) = ) Aw('+'). This is, in elementary form, the basic insight, that simply writing down the subadditivity relations on a linear inequality system in variables T ( U ) leads to a characterization of the valid cuts. One can always write more subadditivity relations (4.8) than the "bare
Theory of cutting-planes
minimum" required by (4.5),,since (4.9, converse settings such as
is enough to imply (4.4),,
r(a"')wi 1 w 3 0 is integer and u = Aw
91
and for the
(4.9)
j=l
insure that all the tacked-on subadditivity relations (4.8) hold for d.A formula like (4.9) is necessary here (in place of the simpler (4.7)) because of multiple paths from 0 to u. Checking that (4.9) works is technically involved, and we omit it here; but the details are in our "Algebraic Methods" paper [38]. Our treatment thus far provides a finite linear inequality system characterizing the valid cuts for S finite, via systems (4.5)k for k = 1,. . . , s with s = {x('), . . . , x'"}. For S infinite, the resulting linear inequality system will not be finite. To make it finite, further work is needed. It follows from a result of Hilbert [32], that the rationality hypotheses on A, b imply that there exist finitely many integer vectors x('), . . . , x(') and d"', . . . , d ( - ) , such that solutions x E S are precisely vectors of the form u
x = x ( ~ ) + nid"'
(4.10)
i=l
for some k = 1,. . . ,s and some non-negative integer n , , . . . , nu. Furthermore, AX',' = b, ~ ( ~0 ' for2k = 1, . . . ,s, A d"' = 0, d " ' 2 0, for i = 1, . . . ,cr, and the d"' have the property that, if Au = 0, u 3 0 then (4.11)
for suitable real scalers Ai 2 0 . The proofs of these results are in [37]. Since x given by (4.10) is always a member of S, if (CP) is a valid cut then for any integer nia 0 we have (4.12)
which implies that (4.13), j=l
since ni can be made arbitrarily large. This suggests the following way of obtaining a finite system from (4.1). We write systems (4.3, for only the "fundamental solutions" x('), . . . ,x(') of (4.10) and, to take care of any remaining solutions, we write systems (4.13), for i = 1,. . . , c.This approach does in fact work; details are in [38]. We summarize our discussion in the following result; this result is closely related to results in [27], [31], and [42], and is one of the generalizations of Gomory's linear inequality formulation of subadditivity for the group problem [27].
R . Jeroslow
92
Theorem 4.1. Suppose that A, b are rational. Then there is a finite set F of pairs of vectors (v, w ) , with v and w both non-negative integer sums of the columns a“’ of A, and there are finitely many integer vectors d“’, . . . , d‘“’ such that the following holds. For any valid cut (CP) for S = {x 3 0 I Ax = b and x integer}(Sf 8) there is a solution to the finite linear inequality system T(0) = 0, ? T ( v + w ) ~ ? T ( v ) + ? T ( w(v, ), w)EF,
0 s
2 .rr(a(;))d;i’,
i = 1 , . . . , u,
i=l
for which .rri >.rr(a”’), j = 1, . . . , r,
(4.14)
s .rr(b) hold. Conversely, given any solution to (FS), the inequality (4.15)
?T(u”’)x, 3 m(b) ,=I
and all of its weakenings are valid cuts. Furthermore, an arbitrary finite set of subadditivity relations (4.8) can be added to (FS), and our conclusions remain unchanged. (Note: ~ ( a ” )and ) r ( b ) are variables of (FS).) Theorem 4.1 can be strengthened in many directions, by making many of the subadditivity relations of (FS) into additivity relations
?T(v+ w ) = ?T(v)+?T(w).
(4.8)’
Some examples of (4.8)’ occur in the context of “minimal inequalities”; see [ 3 8 ] for details. Also, results like Theorem 4.1 can be used to provide “subadditive duals” for (IP) (see e.g., the “Algebraic Methods” paper [ 3 8 ] ) . All the previous analysis depended only on the representation formula (4.10) for X E S , which turns out to be a weaker hypothesis than requiring S = {x 3 0 1 Ax = b, x integer} for some rational A, b. In fact, a representation formula (4.10) holds for Gomory’s group problem [26], [ 2 7 ] , with the d(’) being unit .rr(a(’))dj 2 0 for i = 1 , . . . , u simplify to the vectors, so that the constraints nonnegativities .rr(a”’)aO, j = 1, . . . , r. For a proof of the representation formula (4.10) for Gomory’s group, see [ 3 7 ] . The system (FS) is generally too large for practical use unless both s and (+ are not large in (4.10). If (4.10) is literally the solution form to an integer program, usually s is very large; so in typical applications (4.10) arises from a relaxation of an integer program. One way of obtaining such relaxations from constraints Ax = b, x 2 0 integer is
xi
Theory of cutting-planes
93
via homomorphisms. These are mappings f of the monoid M of non-negative integer combinations of the columns of A = [uci)]into another abelian semi-group which satisfy f(u + w ) = f ( u ) + f ( w ) for all u, w E M
(4.16)
The solution form (4.10) used then corresponds to the relaxed constraints (4.17) and various results insure the existence of (4.10) in most cases. Gomory’s group problem [26] arises through such a homomorphism. We need useful problem relaxations which yield reasonable-size problems (FS). So far, it is only from the systems (FS) that we have obtained cuts which are quite unrelated to branch-and-bound. Examples have been given in which cuts from (FS) (specifically, Gomory’s fractional row cut) immediately solve integer programs that cause exponential growth of search trees for many branch-and-bound approaches (see [40]).
Acknowledgements. We would like to thank C. E. Blair and G. L. Nemhauser for comments which were useful in preparing these notes.
References [ I ] A.A. Araoz (Durand), Polyhedral neopolarities, Ph.D. Dissertation, University of Waterloo, Canada (November 1973). [2] E. Balas, Disjunctive programming: Facets of the convex hull of feasible points, MSRR no. 348, Carnegie-Mellon University (July 1974). [3] E. Balas, Disjunctive programming: Cutting-planes from logical conditions, talk given at SIGMAP-UW Conference, April 1974. Published in: O.L. Mangasarian, R.R. Meyer and S.M. Robinson. eds.. Nonlinear Programming 2 (Academic Press, 1975). [4] E. Balas, Intersection cuts - a new type of cutting-plane for integer programming, Operations Res. 19 (1971) 19-30. [S] E. Balas, Integer programming and convex analysis: Intersection cuts from outer polars, Math. Programming 2 (1972) 330-382. [6] E. Balas and R. Jeroslow, Strengthening cuts for mixed integer programs, MSRR no. 359, GSIA, Carnegie-Mellon University (February 1975). [7] D.E. Bell and J.F. Shapiro, A finitely convergent duality theory for zero-ne integer programming, Research Memorandum 75-33, IIASA (July 1975). [8] C.E. Blair, Topics in integer programming, Ph.D. Dissertation, Carnegie-Mellon University (April 1975) 27pp. [9] C.E. Blair, Two rules for deducing valid inequalities for 0-1 problems, SIAM J. Appl. Math. 31 (1976) 614-617. [lo] C.E. Blair and R.G. Jeroslow, A converse for disjunctive constraints, MSRR no. 393, GSIA, Carnegie-Mellon University (June 1976), to appear in J. Optimization Theory Appl.
R . Jeroslow C.-A. Burdet, Enumerative inequalities in integer programming, Math. Programming 2 (1972) 32-64. W. Candler and R.J. Townsley, The maximization of a quadratic function of variables subject to linear inequalities, Management Sci. 10 (1964) 5 15-523. V. Chvital, Edmonds polytopes and a hierarchy of combinatorial problems, Discrete Math. 4 (1973) 305-337. R.W. Cottle. Complementarity and variational problems, Tech. rep. SOL 74-6, May 1974, Systems Optimization Laboratory, Stanford University. R.W. Cottle and G.B. Dantzig, Complementary pivot theory of mathematical programming, Linear Algebra and Appl. 1 (1968) 103-125. G.B. Dantzig, Discrete variable extremum problems, Operations Res. 5 (1957) 266-277. G.B. Dantzig, Notes on solving linear programs in integers, Naval Res. Logist. Quart. 6 (1959) 75-76. B.C. Eaves, The linear complementarity problem, Managcment Sci. 17 (1971) 612-634. M.L. Fisher and J.F. Shapiro, Constructive duality in integer programming, SIAM J. Appl. Math. 27 (1974) 31-52. R.S. Garfinkel and G.L. Nemhauser, Integer Programming, (Wiley, New York, 1972) 390 pp. A.M. Geoffrion, Lagrangean relaxation for integer programming, Mathematical Programming Study 2 (December 1974) 82-1 14. A.M. Geoffrion and R.E. Marsten, Integer programming algorithms: A framework and state-ofthe-art survey, Management Sci. I X (1972) 465-491. F. Glover, Convexity cuts and cut search, Operations Res. 21 (1973) 123-134. F. Glover, Polyhedral annexation in mixed integer and combinatorial programming, Math. Programming 9 (1975) 161-188. F. Glover and D. Klingman, The generalized lattice-point problem, Operations Res. 21 (1973) 135- 141. R.E. Gomory, On the relation between integer and non-integer solutions to linear programs. Proc. Nat. Acad. Sci. 53 (1965) 260-265. R.E. Gomory, Some polyhedra related to combinatorial problems, Linear Algebra and Appl. 2 (1969) 451-558. R.E. Gomory, An algorithm for integer solutions to linear programs, in: Graves and Wolfe, eds., Recent Advances in Mathematical Programming ( 1963) 269-302. R.E. Gomory, An algorithm for the mixed integer problem, RM-2597, RAND Corporation (1960). R.E. Gomory, Faces of an integer polyhedron, Proc. Nat. Acad. Sci. 57 (1967) 16-18. R.E. Gomory and E.L. Johnson, Some continuous functions related to corner polyhedra I, 11, Math. Programming 3 (1972) 23-85, 359-389. D. Hilbert, Uber die theorie der algebraischen formen, Math. Ann. 36 (1890) 475-534. P.L. Hammer and S. Rudeanu, Pseudo-boolean programming, Operations Res. 17 (1969) 233-264. A. Ho, Cutting-planes for disjunctive programs: Balas' aggregated problem, GSIA, CarnegieMellon University (in preparation). Hoang Tuy, Concave programming under linear constraints, (in Russian) Dok. Akad. Nauk SSR (1964), English translation in: Soviet Mathematics (1964) 1437-1440. R. Jeroslow, Cutting-planes for relaxations of integer programs, MSRR no. 347, CarnegieMellon University (July 1974). R. Jeroslow, Some basis theorems for integral monoids, Math. of Operations Res. 3 (1978) 145-1 54. R. Jerosiow, Cutting-plane theory: Disjunctive methods, Ann. Discrete Math. 1 (1977) 293-330. Also: Cutting-plane theory: Algebraic methods (March 1976), to appear in Discrete Math. R. Jeroslow, Cutting-planes for complementarity constraints, MSRR no. 394, GSIA, CarnegieMellon Universitv (June 1976). A revised and abbreviated version has ameared in SIAM J. .. Control and Optimization 16 (1978) 56-62. [40] R. Jeroslow, Trivial integer programs unsolvable by branch-and-bound, Math. 
Programming 6 (1974) 105-109.
Theory of cutting-planes
95
[4 I] E.L. Johnson, Integer programs with continuous variables (July 1974). [42] E.L. Johnson, The group problem for mixed integer programming, Math. Programming Study 2 (December 1974) 137-179. [43] C.E. Lemke, Bimatrix equilibrium points and mathematical programming, Management Sci. 11 ( 1965) 68 1-689. [44] C.E. Lemke and J.T. Howson, Equilibrium points of bimatrix games, Journal of SOC.for Ind. and Appl. Math. 12 (1964) 413-423. [4S] R.R. Meyer, On the existence of optimal solutions to integer and mixed-integer programming problems, Math. Programming 7 (1974) 223-23s. [46] G . Owen, Cutting-planes for programs with disjunctive constraints, J. Optimization Theory Appl. 1 I (1973) 49-55. [47] R.T. Rockafellar, Convex Analysis (Princeton University Press, Princeton, N.J., 1970). [4X] D.S. Rubin and G. Links, The asymptotic algorithm as cut generator: Some computational results, Paper presented at the 41st National Meeting of ORSA, New Orleans (April 1972). [49] J.F. Shapiro, Generalized Lagrange multipliers in integer programming, Operations Res. 19 (1971) 68-76. [SO] L.A. Wolsey, Group theoretic results in mixed integer programming, Operations Res. 19 (1971) 1691- 1697. [S 11 R.D. Young, Hypercylindrically-deduced cuts in zero-one integer programs, Operations Res. 19 (1971) 1393-1405. [S2] P. Zwart, Intersection cuts for separable programming, Washington University, St. Louis (January 1972).
This Page Intentionally Left Blank
Annals of Discrete Mathematics 5 (1979) 97-112 @ North-Holland Publishing Company
ON THE GROUP PROBLEM AND A SUBADDITIVE APPROACH TO INTEGER PROGRAMMING Ellis L. JOHNSON* IBM T.J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY 10598, U.S.A. The study of Gomory’s group problem has led to a subadditive approach to integer programming. In this paper, we trace that development using the cyclic group problem and knapsack problem as prototypes. The asymptotic theorem of Gomory is also discussed. Finally, an algorithm giving a constructive proof of a subadditive dual problem for the knapsack problem is presented.
1. Introduction The work of Gilmore and Gomory [4] on the knapsack problem began a line of development centered around facets of a group problem. Gomory [6,7] generalized to the pure integer problem and related this development to earlier work on the fractional cutting planes, which could be generated from a row of a tableau. H e gave a subadditive characterization for the facets of his group problem. That characterization is shown here for the cyclic group problem. The results for the group problem can be applied directly to integer programming problems. Following Araoz [I], we apply it to a blocking framework [3] and show a facet description, a dual problem, and an algorithm based on the dual problem for the knapsack covering problem.
2. The cyclic group problem The cyclic group problem arose in two different contexts as relaxations of integer programs. The first was in Gomory’s [ 5 ] original cutting plane work, and the second in the work on the knapsack problem [4] by Gilmore and Gomory. We begin by reviewing how a fractional cutting plane is derived from an updated row of a linear program. Any such row is an equation of the form: Xk
+ 2 aixi= p, jeN
where xk is the basic variable for that row and N is the set of non-basic indices. *The work of this author was supported (in part) by the U.S. Air Force Contract number F49620-
77-C-0014. 97
E.L. Johnson
98
P
The ai and
are real numbers. The fractional cut of Gomory is the inequality
where @(u)= u - La J for any real number u and where La] is the largest integer less than or equal to a. It is easily shown that any xi, EN, satisfying (2.1) for integer xk and xi3 0 and integer, j E N,
(2.3)
satisfies (2.2) because
so
Since 9 ( a j ) 3 0and xi3 0 , clearly 1 S(ai)xja 0. Since xk, xi, and Lai] are all integers, the right-hand side above is a real number whose fractional part is S(8); that is, it is one of *
-
*
@(P)-2, @(PI-1, S(P),S ( P ) + l ,9 ( P ) + 2 , . . *
Moreover, it cannot be negative because one of
S(P), @(P)+l, @(P)+2, * . *
*
1 F(ai)xi is non-negative.
Hence, it is
7
from which (2.2) follows. The set of cuts which Gomory originally obtained can be derived as follows. If we multiply (2.1) through by an integer h, then the same proof just given shows that
1@(hCxi)Xj
3 @(h@
jeN
is an inequality satisfied by all solutions to (2.1) and (2.3).The largest number of different inequalities one can obtain in this way is one less than the least common denominator D for the numbers ai, j E N. In order to show why this fact holds, we introduce the cyclic group problem [9]. The cyclic group problem is to minimize z =
cixisubject to ieN
aixj= 0 (modulo l), isN
and xi 3 0 and integer for j E N.
The group problem
99
The constraint (2.5) is derived from (2.1) by relaxing X k a O but keeping xk integer. Then, xk is the difference between the right-hand side and left-hand side of (2.5). Since the argument showing the validity of (2.2) does not use xk 2 0 , only xk integer, any xi, j e N , satisfying (2.5) and (2.6) will satisfy (2.2). An equivalent way to write (2,5) is
19 ( a j ) x j= $(p) (modulo l), ;EN
since the part deleted is integer and hence is congruent to zero modulo 1. Now, let $(aj)= p j / D , pi integer,
S ( p ) = q/D, q integer, where D is the least common denominator of $(ai),j E N . We may as well assume that D is also a denominator of $(p) since otherwise there is no solution to (2.5) even if xi could be negative. Then, (2.5) is equivalent to
1pixj =q (modulo D ) .
(2.7)
;EN
If x, satisfies (2.7) then it satisfies hpixi = hq (modulo D ) ;EN
for any integer h. However, there are at most D - 1 different congruences of the form (2.8) and at most D - 1 corresponding fractional cuts. The bound on the number of different congruences can be seen by observing that ( h + D ) p j = hp, (modulo D ) ,
and
( h +D)q= hq (modulo D ) . In fact, we may as well take h = 1, 2, . . . , D - 1. When h = O (modulo D ) , (2.8) gives O = O (modulo 0 ) and the corresponding fractional cut is 0 2 0 . For any other h we can add or subtract D enough times to obtain 1s h’= h + yD< D - 1. The D - 1 different cuts are related to results in Section 2D of [S]. We turn to the other place where this cyclic group problem arose; namely, as a relaxation of the knapsack problem. The knapsack problem is to maximize z
=
1rjxj subject to i=l
(2.9)
n
aix, s b, ,=l
(2.10)
E.L. Johnson
100
and xi 2 0 and integer,
(2.11)
where the coefficients ai and the right-hand side b are positive integers. Assuming that r J a , 3 ri/aj, j = 2, . . . ,n, and adjoining a slack variable s = b -1 aixi, we obtain from (2.9) and (2.10) (2.12) and
c, a .
1
lxi+-s=-.
XI+
b
(2.13)
j=2a1
Relaxing x, 2 0 , we obtain from (2.13) and (2.11)
"ai -xi
+-1 s =-b
j=2a1
a1
a,
(modulo l),
(2.14)
and x2, . . . ,x, and s 3 0 and integer.
(2.15)
Gilmore and Gomory [4] showed that once b is large enough, there are optimum solutions with only a, different values of x2, . . . , x,, and s. We state their result.
Theorem 2.1[4]. If b > a , x max {a2,. . . , a,, 1) for the knapsack problem (2.9), (2.10), and (2.11), then an optimum solution can be obtained by minimizing 2 given by (2.12) subject to (2.14) and (2.15) and, then, determining x1 from (2.13). In particular, there are only a, different values of bla, (modulo l), so there are only a, different values of x2,. . . , x,, s needed in optimum solutions when b is large enough. This theorem is the prototype of the asymptotic theorem in Section 4. We have defined the cyclic group problem (2.4), (2.5), and (2.6) and seen how it is related to fractional cuts and the knapsack problem. In studying the cyclic group problem, it is helpful to focus on a special case called the master cyclic group problem defined by D-1
minimize
cjti subject to
(2.16)
j=l D-1
C itj =j o (modulo D ) ,
(2.17)
j=l
and ti 3 0 and integer, j = 1,. . . , D - 1.
(2.18)
The cyclic group constraint (2.5) was shown to be equivalent to (2.7), and each pi in (2.7) satisfies 0 S pi C D - 1. If two or more of the pi's are equal, then the one with least cost should be retained and the others can be set to zero. If any pi = 0, it can be deleted from the problem provided ci 30. Thus, any cyclic group
The group problem
101
problem can be brought to the form: (2.19)
minimize c c i t i subject to jCJ
iti =j o (modulo D ) ,
(2.20)
jeJ
and ti
* 0 and integer for j E J,
(2.21)
where J c ( 1 , . . . ,D - l}. In general, there may be no solution to the problem because j o need not be in J and could be outside of the group generated by J. We also remark that if any c, < O , then there is no lower bound on the objective, so cJ2 0 is assumed. For any cyclic group problem, define Gomory's corner polyhedron to be the convex hull of the vectors (t,, j E J) satisfying (2.20) and (2.21). Let us denote this polyhedron by PD(J,jo). For the corner polyhedron of the master cyclic group problem, J = (1, . . . , D - l}, and we denote it by PD(jo).The next three theorems are all due to Gomory and are concerned with non-homogeneous facets of corner polyhedra. Let us first remark on the shape of these corner polyhedra. For a particular J and D, if t satisfies (2.20) and (2.21), then adding any positive multiple of D to any component of t gives a t' which also satisfies (2.20) and (2.21). Thus, if a vector t E PD(J,jo), then t + y E PD(J,j o ) for any y 0. Hence, every facet
*
c
T,t, a T o ,
IEJ
except tJ 2 0 , has T,>O and q a 0 . Let these facets be scaled so that mTT0 = 1. Characterizing these facets is a principal purpose of the remainder of this section. To state the next theorem, we define an irreducible vector t,, j E J, in PD(J,jo) to be a vector satisfying (2.20) and (2.21) and such that there do not exist different integer vectors (s,, j E J) and (r,, j E T ) satisfying 0 S s,, rJ6 t,, all j E J, and C,, js, =
1, ir,. EJ
Theorem Z.x7]. The inequality c7rjta i1
(2.22)
i d
is a facet of the comer polyhedron PD(J, j o ) for the cyclic group problem (2.20) and (2.21), i f and only if ri,j E D, is a vertex of the polyhedron CTjtjsl, i = l , . . . , m,
(2.23)
;EJ
Tia0,
where {(ti, j
E
(2.24)
J) 1 i = 1, . . . , m} is the finite set of irreducible vectors in PD(J,jo).
E.L. Johnson
102
For a proof of this theorem, see [7],Theorem 1 and Remark 1. The definition of irreducible used there is weaker than in [8], and we use the stronger definition here. We remark that only a finite number of inequalities (2.23) are needed. We also remark that this theorem identifies what Fulkerson [3] defines as a blocking pair of polyhedra: the vertices of the corner polyhedron are among the irreducible vectors and the facets of the corner polyhedron are the vertices of (2.23) and (2.24). Note that if r j ,j E D , satisfies (2.23) and (2.24) and if p , a O , j E J , then r j + p i , j g J , also satisfies (2.23) and (2.24). The next two theorems characterize the facets in a more precise way. At first reading, they may appear to be a restatement of the previous theorem. However, the next two theorems characterize the convex hull of the facets, whereas (2.23) and (2.24) give a polydedron whose extreme points are facets, but the polyhedron is not the convex hull of those extreme points because the polyhedron is unbounded; in particular, its recession cone is the non-negative orthant.
Theorem 2.3[8]. The inequality D-1
1Titi a 1
(2.25)
j=1
is a facet of Gomory's corner polyhedron P D ( j o ) for the master cyclic group problem i f and only i f ( r l ,... , rD-J is a vertex of the polytope defined by ri>O,
j = 1 , ..., D-1,
(2.26)
+ rk, i f j + k = 1 (modulo 01, 1= ri+ rk, if j + k =j o (modulo D ) ,
(2.28)
Ti" =1.
(2.29)
r l G ri
(2.27)
Furthermore, every (r,,. . . , rDP1) in this polytope is a convex combination of facets of the comer polyhedron. The last sentence follows from the fact that the solution set of (2.26)-(2.29) is bounded, i.e. a polytope. That it is bounded follows from (2.26) and (2.28) which together imply 0 S mi G 1, j = 1,. . . ,D - 1. In (2.26)-(2.29), we assume none of j , k, 1, or j o is congruent to zero modulo D. Theorem 2.3 is for the master cyclic group problem, but applies to the general cyclic group problem as stated below.
Theorem 2.4[8]. If the inequality (2.30)
is a facet of Gomory's corner polyhedron PD(J,jo) for the cyclic group problem, then is a vertex of the there exists ri,j E {l, . . . ,D - l}-J, such that ( r l ,. . . ,rD--l) polytope defined by (2.26)-(2.29).
The group problem
103
Proof. The corner polyhedron PD(J,jo) is a face of the corner polyhedron PD(jo) for the master cyclic group problem. This face is obtained by setting ti = 0, j $ J . Hence, all of its facets are among the inequalities (2.25) obtained by simply leaving out j # J in the summation. Thus, the theorem follows. Theorem 2.5. A dual problem to the cyclic group problem (2.19)-(2.21) is : maximize IT,, subject to (2.31) .rr,aO, j = 1 , ..., D-1,
+
.rri + ri, for all i, j , k such that k = i j (modulo D),
T kS
+
rj0 = ri nk, IT,
(2.32)
for all j , k such that j o = j
+k
(modulo)D),
s c,, j E J,
ccj
(2.34) (2.35)
in the sense that for any (IT,, . . . IT^-^) satisfying (2.32)-(2.35) and satisfying (2.20) and (2.21), we have Ti,
(2.33)
ti,
i E J,
(2.36)
tj
j€J
and equality holds for optimum values of (2.19) and (2.31). If ti, j E J , is optimum, then, for an optimum v 1 , .. . , rD-,, ti > 0 implies r j= c,
ti > O and k =
xi€.,jt; for 0 s
(2.37) t ; S ti, j #
1, and 0 s ti< ti
(2.38)
implies ?r, = .rri + IT, for i given by k = i + j (modulo D ) .
Proo The weak duality inequality (2.36) follows from (2.33), (2.35), and 'emma 2.6 below. By that lemma, if (2.20) and (2.21) hold, then "j"
c
S
Tit,.
j€J
by (2.35) and t i a O , (2.36) follows.
Lemma 2.6. I f k=
ti 3 0 , j E J,
c itj (modulo D )
jSJ
and
. . , rD-Jsatisfies (2.32) and
(IT,,.
rk d
2
(2.33) then
rjfj.
jEJ
Proof. See the proof of Theorem 1.5 of [9]. In any case, the proof is an easy induction on ti.
x
E L . Johnson
104
The equality of objectives at the optimum can be proven from Theorem 2.4 and was proven algorithmically by Johnson [lo]. To prove it from Theorem 2.4, consider the linear program minimize
1citjsubject to
(2.39)
j€J
(2.40)
ti 3 0,
and 1, for all facets (T;, j~ J ) , k = 1,.. . ,K of PD(J, j o ) .
X.rr;tj
(2.41)
j€J
By taking all facets, we assure that the linear program has an integer optimum satisfying (2.20). Consider the dual linear program: K
maximize
C h k subject to
(2.42)
k=l
(2.43)
hk 3 0 ,
and
c K
(2.44)
hkTr 4 cj, j E J .
k=l
Let
TT=
K XhkT;,
j=1,
..., D-1.
k=l
By Theorem 2.4, (T:,. . . , satisfy (2.26)-(2.29) for all k and by h k 3 0 so does ( ~ 7 ,. . ,T Z - ~ ) .By (2.44), for j E J. Hence, (2.32)-(2.35) hold. By (2.29), T;"=1 SO
TT<~
By the linear programming duality theorem, there are hk such that K
TE=
2 hk = k=l
cCjtj jcJ
for an optimum (ti. j E J ) subject to (2.40) and (2.41) or, equivalently, (2.20) and (2.21). Thus, this duality result is equivalent to a facet characterization plus linear programming duality.
3. Proof of the subadditive characterization of facets The line of proof to be given here was first used by Gomory and Johnson [9] and has since been found to be more general than its original use.
105
The group problem
Define a valid inequality to be an inequality
c Tjtj
D-1
31
(3.1)
i=l
satisfied by all ( t l , . . . , tD-J for which
ti 3 0 ,
(3.2)
and D-1
1itj =j o (modulo D).
(3.3)
j=1
The valid inequalities form a polyhedral set characterized in Theorem 2.2 by the finite inequality system (2.23) and (2.24). Every vertex of this polyhedron is a facet of the corner polyhedron PD(jo). Our objective here is to prove Theorem 2.3 characterizing the convex hull of the facets of PD(j,) as the subadditively constrained polytope (2.26)-(2.29). Define a subadditive valid inequality to be a valid inequality such that T k
+
6 ri r j ,
for all i + j = k (modulo D).
(3.4)
Lemma 3.1. The subadditive valid inequalities are the solutions to the inequality system (3.4), r j3 0 , and T j " 31.
(3.5)
Proof. This lemma is essentially a restatement of Lemma 2.6 and its proof is the same elementary induction. Define a minimal valid inequality to be a valid inequality such that if for all j , with strict ineqality for at least one j , then
TI
s rri,
D-1
1 Titj
31
j=1
is not a valid inequality.
Theorem 3.2. The minimal valid inequalities are subadditive valid inequalities. Proof. Suppose not. Then, some ( T ~ , .. . ,T ~ - is~ minimal ) but there exists i, j , and k such that T k > ri+ nj and k = i + j (modulo D ) . Theorem 2.2 characterized the valid inequalities by the finite inequality system (2.23) and (2.24). In order for (ml, . . . ,T ~ - to ~ be ) minimal in that polyhedron, it must be true that for any k there is some inequality (2.23) with t k > 0 and for which the inequality holds with equality; that is D-1
1T1tl= 1,
I= 1
for tk 3 I.
106
Now, let
E.L. Johnson
I
tI,
1 # i, j , k,
t’= t l + l , l = i or j , ti-1,
1 = k.
Then, ( t ; , . . . , thpl) satisfies (2.17) and (2.18), but
I=1
I=1
contracting ( r l.,. . ,
T ~ - being ~ )
a valid inequality.
Theorem 3.3. The extreme points of the polyhedron of valid inequalities are minimal.
Proof. Suppose not. Then some extreme rl,.. . , T ~ is- not ~ minimal so there exists a valid inequality p l , . . . ,pD-l with pi 6 ri for all j and P k < T k for at least . . . ,u D - 1 ) is a valid inequality. one k. Let ui = + ( T-~pi). Then, ui 3 so (ul, Now, v,. =2pi I +’,?ai,j = 1 , . . . , D-1, where each of p, cr is a valid inequality, contradicting extremality of
T.
Theorem 3.4. If a minimal valid inequality T can be represented as a mid-point of two other valid inequalities, T = $ T ~ + ; T then ~, both T ’ and IT’ must also be minimal.
Proof. Suppose not; say, T~ is not minimal. Then, there exists a valid inequality p 1 = (pt, . . . ,PA-,) such that pj’ G T; for all j with at least one strict inequality. Now, let pi = Ip; +&r;, j = 1 , . . . ,D - 1 . Then, pl, . . . ,p ~ is- a ~valid inequality because both p1 and r 2 are. Also, pi =sri since pi’ G T,?. But T was assumed minimal, and a contradiction is reached.
Theorem 3.5. The minimal valid inequalities (2.26)-(2.29).
T
are precisely those
T
satisfying
Proof. It should be clear from (2.28) and (2.29) and the fact that for every j there is a k =j o - j such that j + k =j o (modulo D ) that IT satisfying (2.26)-(2.29) must be minimal provided it is valid. But it is valid by Lemma 3.1 which says that T satisfying (2.26), (2.27), and (2.29) is a subadditive valid inequality. To prove the other direction, suppose T is a minimal valid inequality. Then by 1 hold. We need to show that mi0 = 1 and Theorem 3.2, (2.26), (2.27, and ri03 ri+ r k = 1 if j + k = jo (modulo D). The proofs in the two cases are similar, and
The group problem
107
the second will be treated. Suppose for some j = 1 , . . . ,D - 1, r j+ rk > 1 for k =j o - j (modulo D). Clearly, one of r j ,T k must be positive; say r j> 0. Consider any (tl, . . . ,t D - 1 ) satisfying (2.17) and (2.18) and ti 3 1. Then D-1
D-I
I=1
I=1
#i
+
2 rj T k ,
by lemma 2.6,
> 1, by assumption. In [9], a direct proof of a contradiction is reached. However, we can appeal to Theorem 2.2, which says that only a finite number of inequalities of the form (2.23) and (2.24) are needed to define the polyhedron of valid inequalities. Every such inequality with a positive coefficient for r j is slack. Hence, r j can be reduced by a positive amount and ( r l ,... , rD-Jwill still be valid, contradicting minimality. We are now ready for the main result of this section: the proof of Theorem 2.3.
Proof of Theorem 2.3. We can restate this result as being that the extreme points of the minimal valid inequalities are facets of PD(jo). By Theorem 3.3, one direction is obvious; if r is extreme then it is minimal by Theorem 3.3, and it must be extreme among the minimal valid inequalities because it is extreme among the valid inequalities. Conversely, suppose r is extreme among the minimal valid inequalities but not for two valid inextreme among the valid inequalities. Then, r=$r1+4r2 equalities r1 # r 2 By . Theorem 3.4, both r1and r 2must be minimal because r is (by assumption). Thus, extremality of r among the minimal valid inequalities is contradicted, completing the proof.
4. Asymptotic theorem
This theorem of Gomory's [6] relates integer solutions of linear programs to the continuous solutions. Suppose we are given the problem: x 3 0 and integer,
Ax = b, cx = z (minimize). If we solve the linear programming relaxation and find a optimum basis €3, then an optimum solution is x, = B-'b,
z * = cBB-'b,
E.L. Johnson
108
where X, and C, are the subvectors of x and c having indices corresponding to basic variables. Further, let us denote by N the non-basic columns of A and by X, and C, the subvectors of x and c corresponding to columns of N. Then, for any X, 2 0, a solution to the linear program is given by XB
Z
= B-'(b
- NxN),
=CBB-'(~-NXN)+
(4.4) CNX,
= Z* -k (CN-CBB-'N)XN.
In thinking about changing a linear programming solution to obtain an integer answer, one usually focuses on changing the basic variables, since they are fractional. Clearly, the non-basics must also be changed from 0 to some other values since if they all remain at 0, the basics are fixed by (4.4).The focus here is on what possible values the X, could take on in an integer optimum, since then xB is determined. For a fixed B, which is optimum for some 6, the range of b for which B is optimum is determined only by feasibility, i.e. B-'b 2 0, and is a polyhedral cone. Gomory's result is that if b is far enough interior to that cone (which is always the case if the solution is non-degenerate and then b is scaled up to ab for a a large scalar), then an optimum integer solution has X, given by solving
xN 2 0 and integer,
b (modulo B), (cN- cBB-'N)xN = z (minimize). Nx,
(4.6) (4.7) (4.8)
Without going into the difficulty of solving this (admittedly) integer programming problem, let us view what the result says. We have this cone of b for which xB = B-'b, X, = 0 is optimum to the linear program, and we have an interior part of it where solving (4.6)-(4.8)solves the integer program, by letting X, be given by (4.4).The constraint (4.7)says that for some integer vector ( y l , . . . , y,,) NxN+By= b,
and as b is changed there are only a finite number (in fact, d e t B ) different constraints (4.7).Thus, far enough in the interior of this entire cone, there are only a finite number of different values of x, needed in order to solve all of those infinitely many integer programs. In particular, if we start with a non-degenerate integer b and scale it by positive integer a to give right-hand side ab, there will be a finite number of small values for a where (4.6)-(4.8)does not solve the integer program, and thereafter there will only be a finite number of different x, needed to solve the integer programs, and these different xN will repeat periodically in a with period detB. Looking at the asymptotic theorem in this way, we see its striking similarity with the special case where it was first seen: the knapsack problem. Although the result here does motivate one to look closely at the problem (4.6)-(4.8),all too often the theorem itself is overlooked. In fact, we have here
The group problem
109
one of the very few qualitative theorems on integer programming giving us insight into the structure of optimum solutions and their relationship to the linear programming optimum.
5. Knapsack covering problem This section presents work of Julian Araoz [l]. Our proofs are different and are more like the earlier proofs for cyclic groups. The knapsack covering problem is n
minimize z =
1cjxi subject to j=1
xi 2 0 and integer, n
1ajxi3 b,
ai and b > 0 and integer.
i=l
As before, a valid inequality is
c n
Tixia 1
(5.4)
j=1
satisfied by all x satisfying (5.2) and (5.3). We speak of 7 as being a valid inequality when (5.4) holds. The valid inequalities are a polyhedron defined by r j3 0 and the inequalities (5.4)where x is a minimal solution; that is, lowering any xi would result in violating either (5.2) or (5.3). There are only a finite number of such x’s. In Fulkerson’s terminology, the set of valid inequalities and the convex hull of solutions x to (5.2) and (5.3) form a blocking pair of polyhedra [3]. The master knapsack covering problem has constraints
1jxi 2 n
(5.5)
j=1
replacing (5.3). Thus, there is one variable for each integer from 1 to the right-hand side n. In the same sense as for cyclic groups, we need only study facets of the master knapsack covering problems in order to know them for all knapsack covering problems.
Theorem 5.1. The facets of the master knapsack covering problem are the extreme points of q s 0 , j = 1 , . . . ,n, (5.6) whenever i + j 3 k, T, 6 .rri + 9,
..., n - 1 ,
~ = T ~ + T ~ j =-1~, , T n=
1.
E L . Johnson
110
Proof. Define subadditive valid inequalities to be those valid inequalities IT such that (5.7) hold. Then, as in Lemma 3.1, they are the IT satisfying (5.6), (5.7) and Tn3 1. In (5.7), let us assume that 1 ~ ~ and 3 0that
s r,, if k 6 j .
Irk
is among the inequalities. These inequalities are reduntant in the system (5.6)(5.9), as Araoz [l] has shown. To see this redundancy, let k 6 j . Then, by n - k + j > n ,
m,,++r,>1, Tn-k
+ I r k = 1,
and
by (5.8).
Subtracting gives IT^ 3 ‘ r k . Next, minimal valid inequalities are defined as before and are sub-additive by Theorem 3.2. The proof there carries over if i + j = k (modulo D ) is replaced by i + j > k. In fact, Araoz [l] shows these results for arbitrary semi-groups. The characterization of minimal valid inequalities as the solution set to (5.6)(5.9) is much as in Theorem 3.5. The analogue to Theorem 2.2 is, here, the blocking description of Fulkerson: the facets of the master knapsack problem are the extreme points of ITi
(5.10)
aO,
n
2 rixja 1,
i=l
for all minimal t,, . . . , t,,,
(5.11)
where a minimal (xl,. . . ,x,) satisfies (5.2) and (5.3) but lowering any xi by 1 causes a violation of either (5.2) or (5.3). Finally, Theorems (3.3) and (3.4) are generally true in any blocking situation and carry over this problem, in particular. The proof, then, is completed as in Section 3.
Theorem 5.2. A dual problem to the knapsack covering problem: minimize z =
2 cixi subject to
(5.12)
j€J
xiaO, j E J ,
(5.13)
2 jxi 3 h,
(5.14)
j€J
is the problem maximize IT,, subject to
(5.15)
r , a O , j = l , ..., n, mk S ?ri + ri,for all i, j , k such that k S i + j ,
(5.16)
mi s ci, for all j
(5.18)
E J.
(5.17)
The group problem
Proof. Weak duality is obvious: if x and (5.18>,then
111
r satisfy
(5.13), (5.14), and (5.16)-
(5.19) Strong duality can be shown from the facet description and linear programming, but we will give an algorithmic proof here. Before doing so, let us remark on an analogue to linear programming complimentary slackness. For any optimum x to the knapsack covering problem, clearly
(5.20)
xi 3 1 implies r j = ci
must hold for any optimum ( r l , ... , r,,)since otherwise (5.19) would not hold with equality. However, more can be said. If, for any optimum x,
x i > l and i = c l y l for O S y , S I
xi-l,
1 = j,
(5.21)
d
then r i += j r i+ r j (if
i +j > n, replace i + j by n).
(5.22)
To show that (5.21) implies (5.22), we remark that the induction used in the first part of (5.19) is applied successively to a (yj, j E J ) obtained by reducing positive xi by 1. These reductions can be done in any order, and equality must hold in the subadditive restriction at each stage. Conversely, if x and r satisfy (5.13), (5.14), (5.16)-(5.18), and (5.20)-(5.22), then they are both optimum to their respective problems, and strong duality holds. The algorithm to show strong duality constructs a r satisfying (5.16)-(5.18). It is very similar to that in [lo] for the cyclic group problem. Some refinements have been developed by Burdet and Johnson [2] relative to the generated points, but here we give the algorithm in its simplest form.
Initialize. Hit points H ={0} and lifting points L = {n}.Let C = { ( j , c j ) , i E Jl.
= 0 and r,,= 0. Let
LIFT. Form the piecewise linear function-going through the points (h, rh),h E H, and (1, T I ) , 1 EL, and interpolating linearly in between. Now, increase each 7rl, 1 E L, by the same amount E 3 0 until some (j, c j )E C lies on the function; i.e. r j = cj. If j E L, terminate. Otherwise go to HIT.
HIT. Put j in H and fix r j = c j . Remove ( j , cj) from C. Put n - j in L with 7rnPi = r,,- mi. Put in C every ( j + h, ri + r,,)for h E H, where j + h is reduced to n if j + h > n. Return to LIFT.
112
E L . Johnson
Clearly ri3 0 and r jS c j , j EJ,hold. To show (5.17) is the same as in [lo]; it reduces to the critical case i E H, j E H, and for that case we have put (i + j, ri+ ri) in C. One small point needed here is that the piecewise linear function generated is monotone increasing. To see this point, if not, then for some 1 E L, h E H, 1 < h, r ,> r h and the function is linear in between. Since 1 E L , n - 1 E H and n - 1 > n - h. Hence, h + (n- 1)> n so (n,r,,+ rnPI) must have been put in C. But, by construction, r ,+ T,,-~= r h for every pair 1 EL, n - 1 E H. By r h < r , we have
+ rn-[ >T h +
r,,=
rn-1.
Thus, a contradiction is reached since (n, r h + r,,-[) E C and r,, could not have been increased above .rrh + q-,. The final part of the proof is to show that when the algorithm terminates, the primal problem is solved with equality of (5.12) and (5.15). We use the fact that r,,= ri T,,-~for all j . This equality holds because it holds for every break point. In LIFT, if j E L, then n - j E H. Then cj + r n P =jr,,.But, for every h E H, r,, is obtained by a non-negative integer xi,
+
rh=
c cjxi, where h
=
jcJ
1jxi. jcJ
The same is true for ( j , ci) E C: ~ 1 ~ 1where , j=
ci = I d
ly,. ICJ
Now, (x,+ y,, 1 E J ) constitutes a primal solution with objective value equal to r,,.
References [I] J. Araoz, Polyhedral neopolarities, Ph.D. Thesis, Faculty of Mathematics, Dept. Of Computer Sciences and Applied Analysis, University of Waterloo, Waterloo, Ontario (December 1973). [2] C.A. Burdet and E.L. Johnson, A subadditive approach to solve integer programs, Ann. Discrete Math. 1, Studies in Integer Programming (North-Holland, Amsterdam, 1977) 117-144. [3] D.R. Fulkerson, Blocking polyhedra, in: B. Harris, ed., Graph Theory and its Applications, (Academic Press, New York, 1970). [4] P.C. Gilmore and R.E. Gomory, The theory and computation of knapsack functions, Operations Res. 14 (1966) 1045-1074. [5] R.E. Gomory, An algorithm for integFr solutions to linear programs, in: R. L. Graves and P. Wolfe, eds., Recent Advances in Mathematical Programming (McGraw-Hill, NY, 1963) 269302. [6] R.E. Gomory, On the relation between integer and non-integer solutions to linear programs, Proc. Nat. Acad. Sci. U S A . 53(1965) 260-265. [7] R.E. Gomory, Faces of an integer polyhedron, Proc. Nat. Acad. Sci. U S A . 57(1967) 16-18. [8] R.E. Gomory, Some polyhedra related to combinatorial problems. Linear Algebra and Appl. 2(1969) 451-558. [9] R.E. Gomory and E.L. Johnson, Some continuous functions related to corner polyhedra, Math. Programming 3(1972) 23-85 and 359-389. [lo] E.L. Johnson, Cyclic groups, cutting planes, and shortest paths, in: T.C. Hu and S.M. Robinson, eds., (Academic Press, New York, 1973) 185-212.
Annals of Discrete Mathematics 5 (1979) 113-138 @ North-Holland Publishing Company.
A SURVEY OF LAGRANGEAN TECHNIQUES FOR DISCRETE OPTIMIZATION Jeremy F. SHAPIRO" Massachusetts Institute of Technology, Cambridge, MA 02139, U.S.A
I. Introduction Lagrangean techniques were proposed for discrete optimization problems as far back as 1955 when Lorie and Savage suggested a simple method for trying to solve zero-one integer programming IP problems. We use their method as a starting point for discussing many of the developments since then. The behaviour of Lagrangean techniques in analyzing and solving zero-one IP problems is typical of their use on other discrete optimization problems discussed in later sections. Specifically, consider the zero-one IP problem u = min cx
s.t. Ax S b, xi = O
1.
or
Let ci denote a component of c, a, a column of A with components a,, and bi a component of b. Letting ii represent a non-negative vector of Lagrange multipliers on the right hand side b, the method proceeds by computing the Lagrangean
+
Lo(ii)= - iib +minimum {(c iiA)x},
x, = 0 or 1.
(2)
The function Lo(ii)is clearly optimized by any zero-one solution I satisfying if or 1 if if
ci + iiaj > 0 , ci+iiaj =0, ci + iiai < O .
(3)
In the introduction, we pose and discuss a number of questions about this method and its relevance to optimizing the original IP problem (1).In several instances, we will state results without proof. These results will either be proven in later sections, or reference will be given to papers containing the relevant proofs.
When is a zero-one solution f which is optimal in the Lagrangean also optimal in the given IP problem?
* Research was supported, in part, by the U.S. Army Research Office, Durham, under Contract DAAG29-76-C-0064. 113
J.F. Shapiro
114
In order to answer this question, we must recognize that the underlying goal of Lagrangean techniques is to try to establish the following sufficient optimality conditions. OPTIMALITY CONDITIONS: The pair (f,ii), where f is zero-one and ii 3 0, is said to satisfy the optimality conditions for the zero-one IP problem (1) if (i) Lo(ii)= - iib + (c + iiA)X, (ii) ii (A3 - b ) = 0, (iii) AX S b. It can be shown that if the zero-one solution 3 satisfies the optimality conditions for some 12., then f is optimal in problem (1). This result is demonstrated in greater generality in Section 3. The implication for the Lagrangean analysis is that f computed by (3) is optimal in problem (1)if it satisfies AE S b with equality on >O. rows where i& Of course, we should not expect that X computed by (3) will even be feasible in (11, much less optimal. According to the optimality conditions, however, such an E is optimal in any zero-one IP problem derived from (1) by replacing b with AX + S where S is any non-negative vector satisfying Si= 0 for i such that iii > 0. This property of X makes Lagrangean techniques useful in computing zero-one solutions to IP problems with soft constraints or in parametric analysis of an IP problem over a family of right hand sides. Parametric analysis of discrete optimization problems is discussed again in Section 6.
How should the vector ii of Lagrange multipliers be selected? Can we guarantee that there will be one which produces an optimal solution to the original IP problem?
xi
An arbitrary ii fails to produce an optimal X because aiiXi> bi on some rows or aiiXjC bi on other rows with iii > 0. In order to change ii so that the resulting f is closer to being feasible and/or optimal, we could consider increasing iii on the former rows and decreasing iii on the latter rows. A convergent tatonnement approach of this type is non-trivial to construct because we must simultaneously deal with desired changes on a number of rows. Systematic adjustment of ii can be achieved, however, by recognizing that there is a dual problem and a duality theory underlying the Lagrangean techniques. We discuss this point here briefly and in more detail in Section 3 . For any u SO, it can easily be shown that Lo(u)is a lower bound on v, the minimal IP objective function cost in (1). The best choice of u is any one which yields the greatest lower bound, or equivalently, any u which is optimal in the dual problem
xi
d o = max Lo(u) s.t. u 3 0.
(4)
Lagrangean techniques
115
The reason for this choice is that if ii can yield by (3) an optimal f to the primal problem (l),then fi is optimal in (4). The validity of this statement can be verified by direct appeal to the optimality conditions using the weak duality condition L o ( u ) S v for any u B 0 . Thus, a strategy for trying to solve the primal problem (1) is to compute an optimal solution fi to the dual problem (4), and then try to find a complementary zero-one solution i for which the optimality conditions hold. A fundamental question about Lagrangean techniques is whether or not an optimal dual solution to (4) can be guaranteed to produce an optimal solution to the primal IP problem (1). It turns out that the answer is no, although fail-safe methods exist and will be discussed for using the dual to solve the primal. If (4) cannot produce an optimal solution to (l), we say there is a duality gap. Insight into why a duality gap occurs is gained by observing that problem (4) is equivalent to the LP dual of the LP relaxation of (1) which results by replacing xi = 0 or 1 by the constraints 0 Q xi Q 1. This was first pointed out by Nemhauser and Ullman [42]. Here we use the term relaxation in the formal sense; that is, a mathematical programming problem is a relaxation of another given problem if its set of feasible solutions contains the set of feasible solutions to the given problem. The fact that dualization of (1) is equivalent to convexification of it is no accident because the equivalence of these two operations is valid for arbitrary mathematical programming problems [37]. For discrete optimization problems, the convexified relaxations are LP problems. Geoffrion [16] has used the expression Lagrangean relaxation to describe this equivalence. Insights and solution methods for the primal problem are derived from both the dualization and convexification viewpoints.
How should the dual problem be solved? We remarked above that problem (4) is nothing more than the dual to the ordinary LP relaxation of (1). Thus, a vector of optimal dual variables could be calculated by applying the simplex algorithm to the LP relaxation of (1). The use of Lagrangean techniques as a distinct approach to discrete optimization has proven theoretically and computationally important for three reasons. First, dual problems derived from more complex discrete optimization problems than (1) can be represented as LP problems, but ones of immense size which cannot be explicitly constructed and then solved by the simplex method. Included in this category are dual problems more complex than (4) derived from (1) when (4) fails to solve (1). These are discussed in Sections 2 and 4. From this point of view, the Lagrangean techniques applied to discrete optimization problems are a special case of dual decomposition methods for large scale LP problems (e.g. [34]). A second reason for considering the application of Lagrangean techniques to dual problems, in addition to the simplex method, is that the simplex method is exact and the dual problems are relaxation approximations. It is sometimes more effective to use an approximate method to compute quickly a good, but
J.F. Shapiro
116
non-optimal, solution to a dual problem. In Section 3, we consider alternative methods to the simplex method for solving dual problems and discuss their relation to the simplex method. The underlying idea is to treat dual problems as nondifferentiable steepest ascent problems taking into account the fact that the Lagrangean Lo is concave. Lagrangean techniques as a distinct approach to discrete optimization problems emphasizes the need they satisfy to exploit special structures which arise in various models. This point is discussed in more detail in Section 2.
What should be done if there is a duality gap? As we shall see in Section 3, a duality gap manifests itself by the computation of a fractional solution to the LP relaxation of problem (1).When this occurs, there are two complementary approaches which permit the analysis of problem (1) to continue and an optimal solution to be calculated. One approach is to branch on a variable at a fractional level in the LP relaxation; namely, use branch and bound. The integration of Lagrangean techniques with branch and bound is given in Section 5 . The other approach is to strengthen the dual problem (4) by restricting the solutions permitted in the Lagrangean minimization to be a strict subset of the zero-one solutions. This is accomplished in a systematic fashion by the use of group theory and is discussed in Section 4.
2. Exploiting special structures
Lagrangean techniques can be used to exploit special structures arising in IP and discrete optimization problems to construct efficient computational schemes. Moreover, identification and exploitation of special structures often provide insights into how discrete optimization models can be extended in new and richer applications. The class of problems we consider first is
u = min cx s.t. Ax S b, XEXGR" where X is a discrete set with special structure. For example, X may consist of the multiple-choice constraints xi = 1 for all k , i EJk
x,=O or 1,
where the sets Jk are disjoint. Another example occurs when X corresponds to a network optimization problem. In this case, the representation of X can either be
Lagrangean techniques
117
as a totally unimodular system of linear inequalities, or as a network. Other IP examples are discussed by Geoffrion [16]. The Lagrangean derived from ( 5 ) for any u 3 0 is
L ( u ) = -ub+min(c+uA)x.
(7)
xtx
We expect L(u) to be much easier to compute than 21 because of the special form of X.Depending on the structure of X , the specific algorithm used to compute L may be a “good” algorithm in the terminology of Edmonds [ll] or Karp [32]; that is, the number of elementary operations required to compute L ( u ) is bounded by a polynomial of parameters of the problem. Even if it is not “good” in a strictly theoretical sense, the algorithm may be quite efficient empirically and derived from some simple dynamic programming recursion or list processing scheme. Examples will be given later in this section. Finally, in most instances the x calculated in (7) will be integer and may provide a useful starting point for heuristic methods to compute good solutions to (5). Most discrete optimization problems stated in general terms can be formulated as IP problems, although sometimes with difficulty and inefficiently. We illustrate with two examples how Lagrangean techniques are useful in handling special structures which are poorly represented by systems of linear inequalities. We consider a manufacturing system consisting of I items for which production is to be scheduled at minimum cost over T time periods. The demand for item i in period t is the nonnegative integar rit; this demand must be met by stock from inventory or by production during the period. Let the variable xit denote the production of item i in period t. The inventory of item i at the end of period t is y.,t = y.,.t-, + x i , - rit,
t = 1, . . . , T
where we assume yi,o = 0, or equivalently, initial inventory has been netted out of the ri,. Associated with xi, is a direct unit cost of production tit. Similarly, associated with yit is a direct unit cost of holding inventory hit. The problem is complicated by the fact that positive production of item i in period t uses up a quantity ai+ bixi, of a scarce resource qt to be shared among the I items. The parameters ai and bi are assumed to be nonnegative. The use of Lagrangean techniques on this type of problem was originally proposed by Manne [38]. The model and analysis was extended by Dzielinski and Gomory [lo] and has been applied by Lasdon and Terjung [35]. This problem can be written as the mixed integer programming problem 21 = minimum
1 C (citxit+ hityit) i = l 1=1
( 4 8 , + bixit)s qt, t = 1,. . . , T ;
s.t. i=l
J.F. Shapiro
118
for i = l , ..., I
3 0, y,, 3 0: Sit= 0 or 1, t = 1, . . . , T xi,
where Mi, =IT=, ri, is an upper bound on the amount we would want to produce of i in period t. The constraints ( 8 b ) state that shared resource usage cannot exceed qt. The constraints (8c) relate accumulated production and demand through period t to ending inventory in period t, and the nonnegativity of the yit implies demand must be met and not delayed (backlogged). The constraints ( 8 d ) ensure that 6, = 1, and therefore the fixed charge resource usage ai is incurred, if production xi' is positive in period t. Problem (8) is a mixed integer programming problem with IT zero-one variables, 21T continuous variables and T+2IT constraints. For the application of Lasdon and Terjung [35], these figures are 240 zero-one variables, 480 continuous variables, and 486 constraints which is a mixed integer programming problem of significant size. For future reference, define the set
N
={(Sit,
Xit,
Yit),
t = 1,
..,T I
Sit, Xit, Yit
satisfy (8c), ( 8 4 , (8e)).
(9)
This set describes a feasible production schedule for item i ignoring the joint constraints (8b).The integer programming formulation (8) is not effective because it fails to exploit the special network structure of the sets Ni. This can be accomplished by Lagrangean techniques as follows. Assign Lagrange multipliers y s 0 to the scarce resources qt and place the constraints (8b) in the objective function to form the Lagrangean
I
+minimum
T
11{(c, + ybi)Xit +
haisit
+ hityit).
(SU.&~.YI,)EN, i = l 1=1
Letting T
~ ~ (=minimum u )
1{(q, + ybi)xit +
( ~ l r . ~ l , . ~t = a 1) ~ N l
+ hi,yi,),
the Lagrangean function clearly separates to become
t=l
i=l
Each of the problems (10) is a simple dynamic programming shortest-route
Lugrungean techniques
119
calculation for scheduling item i where the Lagrange multipliers on shared resources adjust the costs as shown. Notice that it is easy to add any additional constraint on the problem of scheduling item i which can be accommodated by the network representation; for example, permitting production in period t only if inventory falls below a preassigned level. Unfortunately, we must give up something in using Lagrangean techniques on the mixed IP (8) to exploit the special structure of the sets Ni. In the context of this application, the optimality conditions we seek but may not achieve involve Lagrange multipliers which permit each of the I items to be separately scheduled by the dynamic programming calculation Li while achieving a global minimum. As we see in the next section, this can be at least approximately accomplished if the number of joint constraints is small relative to I. In summary, the application of Lagrangean techniques just discussed involves the synthesis of a number of simple dynamic programming models under joint constraints into a more complex model. In a similar fashion, Fisher [12] applied Lagrangean techniques to problems where a number of jobs are to be scheduled, each according to a precedence or CPM network, and the joint constraints are machine capacity. Another example is the cutting stock problem of Gilmore and Gomory [lo]. In this model, a knapsack problem is used to generate cutting patterns and the joint constraints are on demand to be satisfied by some combination of the patterns generated. The traveling salesman problem is a less obvious case where an underlying graph structure can be exploited to provide effective computational procedures. The problem is defined over a complete graph g with n nodes and symmetric lengths cij= cji for all edges (i, j ) . The objective is to find a minimum length tour of the n nodes, or in other words, a simple cycle of n edges and minimal length. This problem has several IP formulations involving 4n(n - 1) variables xij for the tn(n-1) edges (i, j ) in the complete graph. One such IP formulation consists of approximately 2" constraints ensuring for feasible subgraphs of n edges that (i) the degree at each node is 2 and (ii) no cycle is formed among a subset of the nodes excluding node 1. The set of subgraphs of n edges satisfying (ii) has a very efficient characterization. A 1 -tree defined on the graph g is a subgraph which is a tree on the nodes 2 , . . ., n and which is connected to node 1 by exactly two edges. The collection of subgraphs of n edges satisfying (ii) is precisely the set T of 1-trees. Thus, the traveling salesman problem can be written as the IP problem n-1
$$v = \min \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} c_{ij} x_{ij} \quad \text{s.t.} \quad \sum_{j < i} x_{ji} + \sum_{j > i} x_{ij} = 2, \ i = 1, \ldots, n, \quad x \in T. \qquad (11)$$
The implication of the formulation (11), however, is that we wish to deal with the 1-tree constraints implicitly rather than as a system of linear inequalities involving the zero-one variables $x_{ij}$. Held and Karp [28] discovered this partitioning of the traveling salesman problem and suggested the use of Lagrange multipliers on the degree constraints. For $u \in R^n$, the Lagrangean is

$$L(u) = -2\sum_{i=1}^{n} u_i + \underset{x \in T}{\text{minimum}} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} (c_{ij} + u_i + u_j)x_{ij}. \qquad (12)$$
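As the next paragraph explains, evaluating (12) for a given $u$ amounts to a minimum 1-tree computation: a minimum spanning tree on the nodes $2, \ldots, n$ plus the two cheapest edges at node 1. A minimal sketch on hypothetical symmetric costs (node 0 below plays the role of node 1 in the text):

```python
import numpy as np

def min_one_tree(W):
    """Minimum 1-tree: a spanning tree on nodes 1..n-1 (Prim's algorithm)
    plus the two cheapest edges incident to node 0."""
    n = len(W)
    in_tree, dist, total = {1}, W[1].copy(), 0.0
    for _ in range(n - 2):
        j = min((k for k in range(1, n) if k not in in_tree),
                key=lambda k: dist[k])
        total += dist[j]
        in_tree.add(j)
        dist = np.minimum(dist, W[j])
    return total + np.sort(W[0][1:])[:2].sum()

def held_karp_bound(C, u):
    """The Lagrangean (12): -2 sum(u) + min 1-tree under costs c_ij + u_i + u_j."""
    Cu = C + u[:, None] + u[None, :]
    np.fill_diagonal(Cu, np.inf)
    return -2.0 * u.sum() + min_one_tree(Cu)

C = np.array([[0, 3, 1, 4],
              [3, 0, 2, 5],
              [1, 2, 0, 2],
              [4, 5, 2, 0]], dtype=float)   # invented symmetric lengths
print(held_karp_bound(C, np.zeros(4)))
```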
This calculation is particularly easy to perform because it is essentially the problem of finding a minimum spanning tree in a graph. A "greedy" or "myopic" algorithm is available for this problem which is "good" in the theoretical sense and very efficient empirically [33, 11].

The traveling salesman problem is only a substructure arising in applications of discrete optimization including vehicle routing and chemical reactor sequencing. Lagrangean techniques can be used to synthesize the routing or scheduling problems into a more complex model. In a similar application, Gomory and Hu [22] discovered the importance of the spanning tree as a substructure arising in the synthesis of communications networks. Specifically, a maximum spanning tree problem is solved to determine the capacities in communications links required for the attainment of desired levels of flows. Lagrangean techniques can be used to iteratively select the spanning tree on which to perform the analysis until the communications problem is solved at minimum cost.

All of the special structures discussed above arise naturally in applications. By contrast, a recent approach to IP involves the construction of a special structure which we use as a surrogate for the constraints $Ax \le b$. The approach requires that $A$ and $b$ in problem (1) have integer coefficients; henceforth we assume this to be the case. For expositional convenience, we rewrite the inequalities as equalities $Ax + Is = b$, where now we require the slack variables to be integer because $A$ and $b$ are integer. The system $Ax + Is = b$ is aggregated to form a system of linear congruences which we view as an equation defined over a finite abelian group. The idea of using group theory to analyze IP problems was first suggested by Gomory [23], although his specific approach was very different. We follow here the approach of Bell and Shapiro [3]. Specifically, consider the abelian group $G = Z_{q_1} \oplus Z_{q_2} \oplus \cdots \oplus Z_{q_r}$, where the $q_i$ are integers greater than 1, $Z_{q_i}$ is the cyclic group of order $q_i$, and "$\oplus$" denotes direct sum. Let $Z^m$ denote the set of all integer $m$-vectors, and construct a homomorphism $\phi$ from $Z^m$ into $G$ as follows. For each row $i$, we associate an element $\varepsilon_i$ of $G$ and for any $z \in Z^m$, $\phi(z) = \sum_{i=1}^{m} z_i \varepsilon_i$. We apply $\phi$ to both sides of the linear system $Ax + Is = b$ to aggregate it into the group equation

$$\sum_{j=1}^{n} \alpha_j x_j + \sum_{i=1}^{m} \varepsilon_i s_i = \beta,$$
where $\alpha_j = \phi(a_j)$, $\beta = \phi(b)$. It is easy to see that any integer $x, s$ satisfying $Ax + Is = b$ also satisfies the group equation. Therefore, we can add the group equation to the zero-one IP problem (1) without eliminating any feasible solutions. This gives us

$$v = \min cx \qquad (13a)$$
$$\text{s.t. } Ax + Is = b, \qquad (13b)$$
$$\sum_{j=1}^{n} \alpha_j x_j + \sum_{i=1}^{m} \varepsilon_i s_i = \beta, \qquad (13c)$$
$$x_j = 0 \text{ or } 1, \quad s_i = 0, 1, 2, \ldots. \qquad (13d)$$

For future reference, let $\Gamma = \{(x, s) \mid (x, s) \text{ satisfies (13c) and (13d)}\}$.
Lagrangean techniques are applied by dualizing with respect to the original constraints $Ax + Is = b$. For $u \ge 0$, this gives us the Lagrangean

$$L(u) = -ub + \underset{(x, s) \in \Gamma}{\text{minimum}} \{(c + uA)x + us\}. \qquad (14)$$
The calculation (14) can be carried out quite efficiently by list processing algorithms with the computation time determined mainly by the order of the group [48, 20, 24]. It is easy to see that for a non-trivial group $G$ (i.e., $|G| \ge 2$), the Lagrangean $L$ gives higher lower bounds than $L_0$ from Section 1, since not all zero-one solutions $x$ are included in $\Gamma$ for some value of $s$. The selection of $G$ and homomorphism $\phi$ is discussed again in Section 4. We have not attempted to be exhaustive in our discussion of the various discrete optimization models for which Lagrangean techniques have been successfully applied. Lagrangean techniques have also been applied to scheduling power plants [41], the generalized assignment problem [47] and multi-commodity flow problems [30].
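To make the aggregation above concrete, here is a tiny sketch for the simplest choice $G = Z_q$ (a single cyclic group); the matrix, right hand side, modulus and row elements $\varepsilon_i$ are all invented for illustration. The assertion checked is exactly the one in the text: every integer $(x, s)$ with $Ax + Is = b$ satisfies the group equation.

```python
import numpy as np
from itertools import product

A = np.array([[2, 3], [1, 2]])
b = np.array([5, 4])
q = 7
eps = np.array([1, 3])                  # hypothetical group elements for rows
alpha = (eps @ A) % q                   # alpha_j = phi(a_j)
beta = (eps @ b) % q                    # beta = phi(b)

for x in product([0, 1], repeat=2):     # enumerate zero-one x
    s = b - A @ np.array(x)             # implied integer slacks
    if (s >= 0).all():                  # feasible point of Ax + Is = b
        lhs = (alpha @ np.array(x) + eps @ s) % q
        assert lhs == beta
        print(x, s, "satisfies the group equation")
```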
3. Duality theory and the calculation of Lagrange multipliers
Implicit in the use of Lagrangean techniques is a duality theory for the optimal selection of the multipliers. We study this theory by considering the discrete optimization problem in the general form

$$v = \min f(x) \quad \text{s.t. } g(x) \le b, \quad x \in X \subseteq R^n, \qquad (15)$$

where $f$ is a scalar valued function defined on $R^n$, $g$ is a function from $R^n$ to $R^m$,
and $X$ is a discrete set. If there is no $x \in X$ satisfying $g(x) \le b$, we take $v = +\infty$. With very little loss of generality, we assume that $X$ is a finite set; say $X = \{x^t\}_{t=1}^{T}$. Implicit in this formulation is the assumption that the constraints $g(x) \le b$ make the problem substantially more difficult to solve. Lagrangean techniques are applied by putting non-negative multipliers on the constraints $g(x) \le b$ and placing them in the objective function. Thus, the Lagrangean function derived from (15) is

$$L(u) = -ub + \min_{x \in X} \{f(x) + ug(x)\} = -ub + \min_{t = 1, \ldots, T} \{f(x^t) + ug(x^t)\}. \qquad (16)$$
As we saw in the previous section, the specific algorithm used to compute $L$ may be a "good" algorithm, but even if it is not good, the intention is that it is quite efficient empirically and derived from a simple dynamic programming recursion or list processing scheme. Since $X$ is finite, $L$ is real valued for all $u$. Moreover, it is a continuous, but nondifferentiable, concave function [46]. The combinatorial nature of the algorithms used in the Lagrangean calculation is a main distinguishing characteristic of the use of Lagrangean techniques in discrete optimization. This is in contrast to the application of Lagrangean techniques in nonlinear programming, where $f$ and $g$ are differentiable, $X = R^n$ and the Lagrangean is minimized by solving the nonlinear system $\nabla f(x) + u\nabla g(x) = 0$. A second distinguishing characteristic of the use of Lagrangean techniques in discrete optimization is the non-differentiability of $L$, due to the discreteness of $X$. This makes the dual problem discussed below a non-differentiable optimization problem.

As it was for the zero-one IP problem discussed in the introduction, the selection of $u$ in the Lagrangean is dictated by our desire to establish sufficient optimality conditions for (15).

OPTIMALITY CONDITIONS: The pair $(\bar{x}, \bar{u})$, where $\bar{x} \in X$ and $\bar{u} \ge 0$, is said to satisfy the optimality conditions for the discrete optimization problem (15) if
(i) $L(\bar{u}) = -\bar{u}b + f(\bar{x}) + \bar{u}g(\bar{x})$,
(ii) $\bar{u}(g(\bar{x}) - b) = 0$,
(iii) $g(\bar{x}) \le b$.
Theorem 3.1. If $(\bar{x}, \bar{u})$ satisfies the optimality conditions for the discrete optimization problem (15), then $\bar{x}$ is optimal in (15).
Proof. The solution $\bar{x}$ is clearly feasible in (15) since $\bar{x} \in X$ and $g(\bar{x}) \le b$ by condition (iii). Let $x \in X$ be any other feasible solution in (15). Then by condition (i),

$$L(\bar{u}) = -\bar{u}b + f(\bar{x}) + \bar{u}g(\bar{x}) \le -\bar{u}b + f(x) + \bar{u}g(x) \le f(x),$$
where the final inequality follows because $\bar{u} \ge 0$ and $g(x) - b \le 0$. But by condition (ii), $L(\bar{u}) = f(\bar{x})$ and therefore $f(\bar{x}) \le f(x)$ for all feasible $x$.

Implicit in the proof of Theorem 3.1 was a proof of the following important result.

Corollary 3.2 (weak duality). For any $u \ge 0$, $L(u) \le v$.

Our primary goal in selecting $u$ is to find one providing the greatest lower bound, or in other words, one which is optimal in the dual problem

$$d = \max L(u) \quad \text{s.t. } u \ge 0. \qquad (17)$$
Corollary 3.3. If $(\bar{x}, \bar{u})$ satisfies the optimality conditions for the discrete optimization problem (15), then $\bar{u}$ is optimal in the dual problem (17).

Proof. We have $L(\bar{u}) = -\bar{u}b + f(\bar{x}) + \bar{u}g(\bar{x}) = f(\bar{x}) = v$ by Theorem 3.1. Since $L(u) \le v$ for all $u \ge 0$ by Corollary 3.2, we have $L(u) \le L(\bar{u})$ for all $u \ge 0$.

Thus, the indicated strategy for the application of Lagrangean techniques is to first find an optimal $\bar{u}$ in the dual problem. Once this has been done, we then try to find a complementary $\bar{x} \in X$ for which the optimality conditions hold by calculating one or more solutions $\bar{x}$ satisfying $L(\bar{u}) = -\bar{u}b + f(\bar{x}) + \bar{u}g(\bar{x})$. There is no guarantee that this strategy will succeed because (a) there may be no $\bar{u}$ optimal in the dual for which the optimality conditions can be made to hold for some $\bar{x} \in X$; (b) the specific optimal $\bar{u}$ we calculated does not admit the optimality conditions for any $\bar{x} \in X$; or (c) the specific $\bar{x}$ (or $\bar{x}$'s) in $X$ selected by minimizing the Lagrangean do not satisfy the optimality conditions although some other $\bar{x} \in X$ which is minimal in the Lagrangean does satisfy them. Lagrangean techniques can be applied in a fail-safe manner, however, if they are embedded in branch and bound searches. This is discussed in Section 5. Alternatively, for some discrete optimization problems, it is possible to strengthen the dual problem if it fails to yield an optimal solution to the primal problem. Under certain conditions, the dual can be successively strengthened until the optimality conditions are found to hold. This is discussed in Section 4.

For any $\bar{u} \ge 0$, it is easy to see by direct appeal to the optimality conditions that $\bar{x}$ satisfying $L(\bar{u}) = -\bar{u}b + f(\bar{x}) + \bar{u}g(\bar{x})$ is optimal in (15) with $b$ replaced by $g(\bar{x}) + \delta$, where $\delta$ is non-negative and satisfies $\delta_i = 0$ if $\bar{u}_i > 0$. Thus, Lagrangean techniques can be used in a heuristic manner to generate approximately optimal solutions to (15) when the constraints $g(x) \le b$ are soft. Even if these constraints are not soft, heuristic methods exploiting the specific structure of (15) can be applied to perturb an infeasible $\bar{x}$ which almost satisfies the constraints to try to find a good feasible solution. D'Aversa [8] has had success with this approach on IP problems.
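As a tiny numerical illustration (all data hypothetical, with $X$ listed explicitly), one can verify the optimality conditions directly for a candidate pair $(\bar{x}, \bar{u})$:

```python
import numpy as np

# Hypothetical data: X = {x^1, x^2, x^3}, costs f(x^t), usages g(x^t).
f = np.array([6.0, 4.0, 9.0])
g = np.array([[1., 2.], [2., 1.], [0., 1.]])
b = np.array([2., 1.])
u_bar = np.array([0.0, 2.0])            # candidate multipliers

vals = -u_bar @ b + f + g @ u_bar       # -ub + f(x) + u g(x) for each x in X
t = int(np.argmin(vals))                # x_bar attains L(u_bar): condition (i)
slack = g[t] - b
print("L(u_bar) =", vals[t])
print("(ii) complementary slackness:", np.isclose(u_bar @ slack, 0.0))
print("(iii) feasibility:", (slack <= 0).all())
print("hence x_bar is optimal, with f(x_bar) =", f[t])
```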
There are two distinct but related approaches to solving (17); it can be viewed as a steepest ascent, nondifferentiable optimization problem, or as a large scale linear programming problem. We discuss first the steepest ascent approach. Our development is similar to that given in [14]; see also [25, 26, 51]. Although $L$ is not everywhere differentiable, ascent methods can be constructed using a generalization of the gradient. An $m$-vector $\gamma$ is called a subgradient of $L$ at $\bar{u}$ if

$$L(u) \le L(\bar{u}) + (u - \bar{u})\gamma \quad \text{for all } u.$$

For any subgradient, it can easily be shown that the half space $\{u \mid (u - \bar{u})\gamma \ge 0\}$ contains all solutions to the dual with higher values of $L$. In other words, any subgradient appears to point in a direction of ascent of $L$ at $\bar{u}$. A readily available subgradient is

$$\gamma = g(\bar{x}) - b, \qquad (18)$$

where $\bar{x}$ is any solution in $X$ satisfying $L(\bar{u}) = -\bar{u}b + f(\bar{x}) + \bar{u}g(\bar{x})$. If there is a unique $\bar{x} \in X$ minimizing $L$ at $\bar{u}$, then $L$ is differentiable there and $\gamma$ is the gradient. The subgradient optimization method [29, 30] uses these subgradients to generate a sequence $\{u^k\}$ of non-negative solutions to (17) by the rule

$$u_i^{k+1} = \max\{0, u_i^k + \theta_k \gamma_i^k\}, \quad i = 1, \ldots, m, \qquad (19)$$

where $\gamma^k$ is any subgradient selected in (18) and $\theta_k > 0$ is the step length. For example, if $\theta_k$ obeys $\theta_k \to 0^+$ and $\sum_k \theta_k \to +\infty$, then it can be shown that $L(u^k) \to d$ [45]. Alternatively, finite convergence to any target value $\bar{d} < d$ can be achieved if

$$\theta_k = \frac{\alpha_k(\bar{d} - L(u^k))}{\|\gamma^k\|^2},$$

where $\|\gamma^k\|$ denotes the Euclidean norm and $\varepsilon_1 \le \alpha_k \le 2 - \varepsilon_2$ for $\varepsilon_1 > 0$, $\varepsilon_2 > 0$. The latter choice of $\theta_k$ is usually an uncertain calculation in practice, however, because the value $d$ is not known and therefore a target value $\bar{d} < d$ cannot be chosen with certainty. There is no guarantee when using subgradient optimization that $L(u^{k+1}) > L(u^k)$, although practice has shown that increasing lower bounds can be expected on most steps under the correct combination of artistic expertise and luck. Thus, subgradient optimization using the rule (19) for the step length is essentially a heuristic method with theoretical as well as empirical justification. It can be combined with convergent ascent methods for solving (17) based on the simplex method which we now discuss.
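A minimal sketch of the rule (19), with $X$ a small explicit list (the values of $f(x^t)$, $g(x^t)$ and $b$ are hypothetical) and the simple divergent-series step length $\theta_k = 1/k$:

```python
import numpy as np

# Hypothetical data: X = {x^1, ..., x^4} with costs f(x^t) and usages g(x^t).
f = np.array([5.0, 4.0, 7.0, 3.0])
g = np.array([[2., 1.], [3., 0.], [0., 2.], [4., 3.]])
b = np.array([3., 2.])

def lagrangean(u):
    vals = -u @ b + f + g @ u          # -ub + f(x^t) + u g(x^t)
    t = int(np.argmin(vals))
    return vals[t], g[t] - b           # L(u) and a subgradient, as in (18)

u, best = np.zeros(2), -np.inf
for k in range(1, 201):
    L, gamma = lagrangean(u)
    best = max(best, L)
    u = np.maximum(0.0, u + (1.0 / k) * gamma)   # rule (19), theta_k = 1/k
print("best lower bound on v:", best)
```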
The dual problem (17) is equivalent to the LP problem

$$d = \max v \quad \text{s.t. } v \le -ub + f(x^t) + ug(x^t), \quad t = 1, \ldots, T, \quad u \ge 0, \qquad (20)$$

because, for any $u \ge 0$, the maximal choice $v(u)$ of $v$ is

$$-ub + \min_{t = 1, \ldots, T} \{f(x^t) + ug(x^t)\} = L(u).$$
Problem (20) is usually a large scale LP because the number $T$ of constraints can easily be on the order of thousands or millions. For example, in the traveling salesman dual problem discussed in Section 2, $T$ equals the number of 1-trees defined on a graph of $n$ nodes. The LP dual to (20) is

$$d = \min \sum_{t=1}^{T} f(x^t)\lambda_t \qquad (21a)$$
$$\text{s.t. } \sum_{t=1}^{T} g(x^t)\lambda_t \le b, \qquad (21b)$$
$$\sum_{t=1}^{T} \lambda_t = 1, \qquad (21c)$$
$$\lambda_t \ge 0, \quad t = 1, \ldots, T. \qquad (21d)$$
This version of the dual problem clearly illustrates the nature of the convexification inherent in the application of Lagrangean techniques to the discrete optimization problem (15). For decomposable discrete optimization problems with separable Lagrangeans, such as the multi-item production scheduling and inventory control problem (8), the dual problem in the form (21) has a convexification constraint (21c) for each component in the Lagrangean. The number of such constraints for the production scheduling problem is $I$ (the number of items being scheduled), and the number of joint constraints (21b) is $T$ (one for each time period). If $I > T$, then an optimal solution to (21) found by a simplex algorithm will have pure strategies for at least $I - T$ items; that is, one $\lambda$ variable equal to one for these items. If $I \gg T$, then Lagrangean techniques give a good approximation to an optimal solution to (8) because pure strategies are selected for most of the items. Roughly speaking, when $I \gg T$, the duality gap between (8) and its dual is small.

Solution of the dual problem in its LP form (20) or (21) can be accomplished by a number of algorithms. One possibility is generalized linear programming, otherwise known as Dantzig-Wolfe decomposition [7, 34, 37]. Generalized linear programming proceeds by solving (21) with a subset of the $T$ columns; this LP problem is called the Master Problem. A potential new column for the Master is
generated by finding $\bar{x} \in X$ satisfying $L(-\bar{u}) = \bar{u}b + f(\bar{x}) - \bar{u}g(\bar{x})$, where $\bar{u} \le 0$ is the vector of optimal LP shadow prices on rows (21b) calculated for the Master problem. If $L(-\bar{u}) < \bar{u}b + \bar{\theta}$, where $\bar{\theta}$ is the optimal shadow price on the convexity row (21c), then the new column $(f(\bar{x}), g(\bar{x}), 1)$ is added to the Master with a new $\lambda$ variable. If $L(-\bar{u}) = \bar{u}b + \bar{\theta}$ (the ">" case is not possible), then the optimal solution to the Master is optimal in the version (21) of the dual problem.

Note that if we required $\lambda_t$ to be integer in version (21) of the dual problem, then (21) would be equivalent to the primal problem (15). Moreover, the dual solves the primal problem (15) if there is exactly one $\lambda_t$ at a positive level in the optimal solution to (21); say, $\lambda_r = 1$, $\lambda_t = 0$, $t \ne r$. In that case, $x^r \in X$ is the optimal solution to the primal problem and we have found it by the use of Lagrangean techniques. Conversely, suppose more than one $\lambda_t$ is at a positive level in the optimal solution to (21), say $\lambda_1 > 0, \ldots, \lambda_r > 0$, $\lambda_t = 0$, $t \ge r + 1$. Then in all likelihood the solution $\sum_{t=1}^{r} \lambda_t x^t$ is not in $X$, since $X$ is discrete, and the dual problem has failed to yield an optimal solution to the primal problem (15). Even if $y = \sum_{t=1}^{r} \lambda_t x^t$ is in $X$, there is no guarantee that $y$ is optimal because optimality conditions (ii) and (iii) can fail to hold for $y$. In the next section we discuss how this difficulty can be overcome, at least in theory, and in Section 5 we discuss the use of Lagrangean techniques in conjunction with branch and bound.

Generalized linear programming has some drawbacks as a technique for generating Lagrange multipliers in discrete optimization. It has not performed consistently [44], although recent modifications of the approach such as BOXSTEP [31] have performed better. A second difficulty is that it does not produce monotonically increasing lower bounds to the primal objective function minimum. Monotonically increasing bounds are desirable for computational efficiency when Lagrangean techniques are used with branch and bound. A hybrid approach that is under investigation is to use subgradient optimization on the dual problem as an opening strategy and then switch to generalized linear programming when it slows down or performs erratically. The hope is that the generalized linear programming algorithm will then perform well because the first Master LP will have an effective set of columns generated by subgradient optimization with which to optimize. A small sketch of the generalized linear programming iteration on the explicit example used above is given below.
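The sketch reuses the hypothetical $(f, g, b)$ data of the subgradient sketch; since $X$ is an explicit list, "column generation" is just a scan of $X$ for the minimizer of $f(x) + ug(x)$. It assumes a recent SciPy whose linprog (HiGHS) result exposes dual marginals, with SciPy's sign conventions for them.

```python
import numpy as np
from scipy.optimize import linprog   # assumes SciPy with HiGHS dual marginals

f = np.array([5.0, 4.0, 7.0, 3.0])
g = np.array([[2., 1.], [3., 0.], [0., 2.], [4., 3.]])
b = np.array([3., 2.])

cols = [0]                                       # initial Master column
while True:
    res = linprog(f[cols], A_ub=g[cols].T, b_ub=b,
                  A_eq=np.ones((1, len(cols))), b_eq=[1.0],
                  bounds=(0, None), method="highs")
    u = -np.asarray(res.ineqlin.marginals)       # prices on rows (21b), made >= 0
    theta = res.eqlin.marginals[0]               # price on the convexity row (21c)
    t = int(np.argmin(f + g @ u))                # Lagrangean scan of X
    if t in cols or f[t] + g[t] @ u >= theta - 1e-9:
        break                                    # no improving column: d attained
    cols.append(t)

print("d =", res.fun, "with columns", cols)
```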
An alternative convergent algorithm for the dual problem is an ascent method based on a generalized version of the primal-dual simplex algorithm. We present this method mainly because it provides insights into the theory of nondifferentiable optimization which is central to the selection of Lagrange multipliers for discrete optimization problems.
Its computational effectiveness is uncertain although it has been implemented successfully for IP dual problems (see [14], which also contains proofs of assertions). The idea of the primal-dual ascent algorithm can best be developed by considering a difficulty which can occur in trying to find a direction of ascent at a point $\bar{u}$ with positive components where $L$ is non-differentiable. The situation is depicted in Fig. 1.

Fig. 1. (Two distinct subgradients $\gamma^1$, $\gamma^2$ at a point of non-differentiability $\bar{u}$.)

The vectors $\gamma^1$ and $\gamma^2$ are distinct subgradients of $L$ at $\bar{u}$ and they both point into the half space containing points $u$ such that $L(u) \ge L(\bar{u})$. Neither of these subgradients points in a direction of ascent of $L$; the directions of ascent are given by the shaded region which is the intersection of the two half spaces $\{u \mid (u - \bar{u})\gamma^1 \ge 0\}$ and $\{u \mid (u - \bar{u})\gamma^2 \ge 0\}$.

In general, feasible directions of ascent of $L$ at a point $\bar{u}$ can be discovered only by considering at least implicitly the collection of all subgradients at that point. This set is called the subdifferential and denoted by $\partial L(\bar{u})$. The directional derivative $\nabla L(\bar{u}; v)$ of $L$ at the point $\bar{u}$ in the feasible direction $v$ is given by [25]

$$\nabla L(\bar{u}; v) = \underset{\gamma \in \partial L(\bar{u})}{\text{minimum}}\ v\gamma. \qquad (22)$$

The relation (22) is used to construct an LP problem yielding a feasible direction of ascent, if there is one; namely, a feasible direction $v$ such that $\nabla L(\bar{u}; v) > 0$. Two sets to be used in the construction of the direction finding LP are

$$V(\bar{u}) = \{v \in R^m \mid 0 \le v_i \le 1 \text{ for } i \text{ such that } \bar{u}_i = 0;\ -1 \le v_i \le 1 \text{ for } i \text{ such that } \bar{u}_i > 0\}$$

and

$$T(\bar{u}) = \{t \mid L(\bar{u}) = -\bar{u}b + f(x^t) + \bar{u}g(x^t)\}.$$

Without loss of generality, we can limit our search for a feasible direction of ascent to the set $V(\bar{u})$. The subdifferential $\partial L(\bar{u})$ is the convex hull of the points
$\gamma^t = g(x^t) - b$, $t \in T(\bar{u})$, and this permits us to characterize the directional derivative by the formula

$$\nabla L(\bar{u}; v) = \min_{t \in T(\bar{u})} v\gamma^t. \qquad (23)$$

If the non-negative vector $\bar{u}$ is not optimal in the dual problem, then a direction of ascent of $L$ at $\bar{u}$ can be found by solving the LP problem

$$\bar{V} = \max w \quad \text{s.t. } w \le v\gamma^t, \ t \in T(\bar{u}), \quad v \in V(\bar{u}). \qquad (24)$$

Conversely, if $\bar{u}$ is optimal in the dual problem, then $\bar{V} = 0$ is the optimal objective function value in (24). Note that from (23) we have

$$\bar{V} = \max_{v \in V(\bar{u})} \min_{t \in T(\bar{u})} v\gamma^t = \max_{v \in V(\bar{u})} \nabla L(\bar{u}; v),$$

and the LP (24) will pick out a feasible direction of ascent with $\nabla L(\bar{u}; v) > 0$ if there is one.
and the LP (24) will pick out a feasible direction of ascent with VL(ii; u ) > O if there is one. Once an ascent direction u # 0 with VL(ii; u ) = V > 0 has been computed from (24), the step length 8 > 0 in that direction is chosen to be the maximal value of 8 satisfying L(ii+BV)=L(ii)+BV. This odd choice of 8 is needed to ensure convergence of the ascent method by guaranteeing that the quantity V strictly decreases from dual feasible point to dual feasible point (under the usual LP non-degeneracy assumptions). This is the criterion of the primal-dual simplex algorithm which in fact we are applying to the dual problem in the dual LP forms (20) and (21). The difficulty with problem (24) is the possibly large number of constraints u s v y ' since the set T(ii) can be large. This can be overcome by successive generation of rows for (24) as follows. Suppose we solve (24) with rows defined for y', t E T'(ii)E T(ii) and obtain an apparent direction u' of ascent satisfying
V'= minimum u'y' >O. f ET'hi)
We compute as before the maximal value 8' of 8 such that L(ii+Ou')= L(ii)+ 8V'. If O f > 0, then we proceed to ii Om,, u. If 8' = 0, then it can be shown that we have found as a result of the line search a subgradient ys,s E T(ii)- T'(ii), which satisfies V'>u'ys. In the latter case, we add u G u y s to the abbreviated version of (24) and repeat the direction finding procedure by resolving it. The dual problem is solved and ii is found to be optimal if and only if V = 0 in
+
problem (24). Some additional insight is gained if we consider the LP dual to (24):

$$\bar{V} = \min \sum_{i=1}^{m} s_i^- + \sum_{i \notin I(\bar{u})} s_i^+$$
$$\text{s.t. } s_i^- - s_i^+ = \sum_{t \in T(\bar{u})} \gamma_i^t \lambda_t, \quad i = 1, \ldots, m,$$
$$\sum_{t \in T(\bar{u})} \lambda_t = 1, \quad \lambda_t \ge 0, \ s_i^- \ge 0, \ s_i^+ \ge 0, \qquad (25)$$

where $I(\bar{u}) = \{i \mid \bar{u}_i = 0\}$. Problem (25) states, in effect, that $\bar{u}$ is optimal in the dual problem if and only if there exists a subgradient $\bar{\gamma} \in \partial L(\bar{u})$ satisfying $\bar{\gamma}_i = 0$ for $i$ such that $\bar{u}_i > 0$ and $\bar{\gamma}_i \le 0$ for $i$ such that $\bar{u}_i = 0$. Moreover, the columns $\gamma^t$ for $t \in T(\bar{u})$ and $\lambda_t > 0$ are an optimal set of columns for the dual problem in the form (21). The close relationship among the concepts of dualization, convexification and the differentiability of $L$ is again evident. Specifically, a sufficient, but not necessary, condition for there to be no duality gap is that $L$ is differentiable at some optimal solution $\bar{u}$. If such is the case, then $\partial L(\bar{u})$ consists of the single vector $\bar{\gamma} = g(\bar{x}) - b$, which by necessity is the optimal column in (25). The necessary and sufficient condition for dual problem optimality that $\bar{V} = 0$ in (25) implies that $\bar{x}$ satisfies the optimality conditions, thereby implying it is optimal in the primal problem by Theorem 3.1.

4. Resolution of duality gaps
We have mentioned several times in previous sections that a dual problem may fail to solve a given primal problem because of a duality gap. In this section, we examine how Lagrangean techniques can be extended to guarantee the construction of a dual problem which yields an optimal solution to the primal problem. The development will be presented mainly in terms of the zero-one IP problem (1) as developed in [3], but the theory is applicable to the general class of discrete optimization problems (15). The theory permits the complete resolution of duality gaps. Nevertheless, numerical excesses can occur on some problems, making it difficult in practice to follow the theory to its conclusion. The practical resolution of this difficulty is to imbed the dual analysis in branch and bound as described in Section 5.

Although it was not stated in such terms at the time it was invented, the cutting plane method for IP [21] is a method for resolving IP duality gaps. We will make this connection in our development here, indicate why the cutting plane method proved to be empirically inefficient, and argue that the dual approach to IP largely supersedes the cutting plane method.

We saw in Section 2 how a dual problem to the zero-one IP problem (1) could be constructed using a group homomorphism to aggregate the equations
$Ax \le b$. The relationship of this IP dual problem to problem (1) can be investigated using the duality theory developed in the previous section. Recall that the Lagrangean for the IP dual was defined for $u \ge 0$ as (see (14))

$$L(u) = -ub + \underset{(x, s) \in \Gamma}{\text{minimum}} \{(c + uA)x + us\},$$
where

$$\Gamma = \Big\{(x, s) \ \Big|\ \sum_{j=1}^{n} \alpha_j x_j + \sum_{i=1}^{m} \varepsilon_i s_i = \beta,\ x_j = 0 \text{ or } 1,\ s_i = 0, 1, 2, \ldots \Big\}. \qquad (26)$$
Although the slack variables in $\Gamma$ are not explicitly bounded, we can without loss of generality limit $\Gamma$ to a finite set, say $\Gamma = \{(x^t, s^t)\}_{t=1}^{T}$. This is because any feasible slack vector $s = b - Ax$ is implicitly bounded by the zero-one constraints on $x$. The general discrete optimization dual problem in the form (21) is specialized to give the following representation of the IP dual problem:

$$d = \min \sum_{t=1}^{T} (cx^t)\lambda_t$$
$$\text{s.t. } \sum_{t=1}^{T} (Ax^t + Is^t)\lambda_t = b,$$
$$\sum_{t=1}^{T} \lambda_t = 1, \quad \lambda_t \ge 0. \qquad (27)$$
This formulation of the IP dual problem provides us with the insights necessary to make several important connections between Lagrangean techniques and the cutting plane method. The convexification in problem (27) can be written in more compact form as
$$d = \min cx \quad \text{s.t. } (x, s) \in \{x, s \mid Ax + Is = b,\ 0 \le x_j \le 1,\ 0 \le s_i \le M_i\} \cap [\Gamma], \qquad (28)$$

where "$[\cdot]$" denotes convex hull and $M_i$ is the upper bound on the slack variable $s_i$. In words, the IP dual problem is effectively the problem of minimizing $cx$ over the intersection of the LP feasible region with the polyhedron $[\Gamma]$. Inequalities based on the faces of $[\Gamma]$ are cuts, and there will generally be an extremely large number of them. The computational inefficiency of the cutting plane method is due in large part to the algorithmic ambiguity created by this proliferation of cuts. Lagrangean techniques and the IP dual problem provide a rationale for selecting cuts, but in the process, make the practical use of cuts somewhat superfluous. For any $u \ge 0$, the inequality

$$(c + uA)x + us \ge L(u) + ub \qquad (29)$$

is a supporting hyperplane of $[\Gamma]$. Since $[\Gamma]$ contains all feasible solutions to the
zero-one IP problem, (29) is a valid cut which can be added to any LP relaxation of the problem which includes the constraints $Ax + Is = b$. Its effect on an LP relaxation would be to ensure that the objective function value $cx$ would be at least $L(u)$ [49]. Thus, the strongest cut in terms of forcing the objective function to increase is one derived from a dual vector $u^*$ that is optimal in the dual problem. Furthermore, the procedure for selecting a cut according to this criterion is to solve the dual problem by one or more of the methods of the previous section which, as we see from problem (28), implicitly considers all cuts (i.e., all faces of $[\Gamma]$) without generating any of them. If an optimal solution to the dual problem produces an optimal solution to the zero-one IP problem, then a cut is not needed. If, on the other hand, an optimal zero-one solution is not produced, then a cut of the form (29) written with respect to an optimal dual solution $u^*$ has the same effect on the objective function as all the cuts implied by $[\Gamma]$. The addition of such a cut to an LP relaxation would permit the IP dual analysis to continue in the sense that a stronger IP dual of the form (27) could be derived. However, the construction of Bell and Shapiro [3] attacks more directly the problem of strengthening the IP dual when it does not produce an optimal solution to the zero-one IP problem.

Solution of the zero-one IP problem (1) by Lagrangean techniques is constructively achieved by generating a finite sequence of groups $\{G^k\}_{k=1}^{K}$, sets $\{\Gamma^k\}_{k=1}^{K}$ and IP dual problems analogous to (27) with objective function values $d^k$. The groups have the property that $G^k$ is a subgroup of $G^{k+1}$, implying by the construction that $\Gamma^{k+1} \subseteq \Gamma^k$ and therefore that $v \ge d^{k+1} \ge d^k$. The critical step in this approach to solving (1) is that if an optimal solution to the $k$th dual does not yield an optimal solution to (1), then we are able to construct $G^{k+1}$ so that $\Gamma^{k+1} \subsetneq \Gamma^k$. The construction uses as its point of departure the following result.
Theorem 4.1 [3]. If only one $\lambda_t$ is positive in an optimal basic solution to (27), then the corresponding solution $(x^t, s^t)$ is optimal in the zero-one IP problem. On the other hand, if more than one $\lambda_t$ is positive, then all the $(x^t, s^t)$ corresponding to basic $\lambda_t$ are infeasible in the zero-one IP problem.

When more than one $\lambda_t$ is positive in an optimal basic solution to (27), then we can use a number theoretic procedure on the columns in (27) with $\lambda_t$ positive to construct a new group with the property that the corresponding $(x^t, s^t)$ are infeasible in the new group equation. Thus, they are not considered in the Lagrangean calculation. Since at least two solutions are eliminated each time the dual is strengthened, and since the set of $(x, s)$ to be considered is finite, the process converges to an IP dual problem of the form (27) which yields an optimal solution to the zero-one IP problem. Computational experience with the IP dual problem (27) is given in [14]. D'Aversa [8] has encoded the iterative IP dual analysis outlined above and experimentation is underway with it. The IP dual approach has been extended to
mixed IP by Northup and Shapiro [43]. Burdet and Johnson [4] have applied some concepts from convex analysis in the construction of IP methods which bear a resemblance to the methods just discussed.

The approach just outlined is applicable to the general discrete optimization problem (15) as long as $g(x^t) - b$ is a rational vector. If more than one $\lambda_t$ is positive in (21), a group structure could be induced which would exclude infeasible solutions $x^t$ from consideration in the Lagrangean (16). This would be accomplished by intersecting $X$ with the set of solutions satisfying a group equation, which would, however, make the algorithm for the Lagrangean more complex. See [1] for an application of this approach to the traveling salesman dual problem of maximizing the Lagrangean (12).

5. Uses of Lagrangean techniques in branch and bound
Branch and bound is a method guaranteed to find an optimal solution to the general discrete optimization problem (15) by a systematic search of the discrete solution set $X$. The efficiency of the search is determined in large part by the strength of the bounds used in limiting it. Bounds are often derived from LP relaxations of a given discrete optimization problem which, as we have seen, arise naturally as dual problems for selecting Lagrange multipliers. Lagrangean analyses can also be used to indicate the most promising variables on which to branch. Conversely, branch and bound can be viewed as a method for perturbing a given discrete optimization problem when Lagrangean techniques fail to yield an optimal solution to it. We describe the integration of Lagrangean techniques with branch and bound in terms of the general discrete optimization problem (15). Our development follows closely that of [14].

The branch and bound search of the set $X$ is done in a non-redundant and implicitly exhaustive fashion. At any stage of computation, the least cost known solution $\hat{x} \in X$ satisfying $g(\hat{x}) \le b$ is called the incumbent, with incumbent cost $\hat{z} = f(\hat{x})$. Branch and bound generates a sequence of subproblems of the form
$$v(X^k) = \min f(x) \quad \text{s.t. } g(x) \le b, \ x \in X^k, \qquad (30)$$

where $X^k \subseteq X$. The set $X^k$ is selected to preserve the special structure of $X$. If we can find an optimal solution to (30), then we have implicitly tested all subproblems of the form (30) with $X^k$ replaced by $X^l \subseteq X^k$ and such subproblems do not have to be explicitly enumerated. The same conclusion holds if we can ascertain that $v(X^k) \ge \hat{z}$ without actually discovering the precise value of $v(X^k)$. If either of these two cases obtains, then we say that the subproblem (30) has been fathomed. If it is not fathomed, then we separate (30) into subproblems of the form (30) with
$X^k$ replaced by $X^l$, $l = 1, \ldots, L$, and

$$\bigcup_{l=1}^{L} X^l = X^k, \qquad X^{l_1} \cap X^{l_2} = \emptyset, \quad l_1 \ne l_2.$$
Lagrangean techniques are used to try to fathom the subproblem (30) by solution of the dual problem

$$d(X^k) = \max L(u; X^k) \quad \text{s.t. } u \ge 0, \qquad (31)$$

where

$$L(u; X^k) = -ub + \min_{x \in X^k} \{f(x) + ug(x)\}. \qquad (32)$$
The use of (31) in analyzing (30) is illustrated in Fig. 2, taken from [14], which we now discuss step by step.

Fig. 2. (Flowchart: form initial subproblem list; select initial $\bar{u} \ge 0$; compute Lagrangean; separate subproblem; select new subproblem.)

Steps 1 and 2: Often the initial subproblem list consists of only one subproblem corresponding to $X$.

Step 3: A good starting dual solution $\bar{u} \ge 0$ is usually available from previous computations.

Step 4: Computing the Lagrangean can be a network optimization problem, shortest route type computation for integer programming, minimum spanning tree for the traveling salesman problem, dynamic programming shortest route computation for resource constrained network scheduling problems, etc.

Step 5: As a result of step 4, the lower bound $L(\bar{u}; X^k)$ on $v(X^k)$ is available, and it should be clear that (30) is fathomed if $L(\bar{u}; X^k) \ge \hat{z}$ since $L(\bar{u}; X^k) \le v(X^k)$.

Steps 6, 7 and 8: Let $\bar{x} \in X^k$ be an optimal solution in (32) and suppose $\bar{x}$ is feasible, i.e. $g(\bar{x}) \le b$. Since (30) was not fathomed (step 5), we have $L(\bar{u}; X^k) = f(\bar{x}) + \bar{u}(g(\bar{x}) - b) < \hat{z}$ with the quantity $\bar{u}(g(\bar{x}) - b) \le 0$. Thus, it may or may not be true that $f(\bar{x}) < \hat{z}$, but if so, then the incumbent $\hat{x}$ should be replaced by $\bar{x}$. In any case, if $\bar{x}$ is feasible, we have by the duality theory discussed in Section 3 that $f(\bar{x}) + \bar{u}(g(\bar{x}) - b) \le v(X^k) \le f(\bar{x})$, and therefore $\bar{x}$ is optimal in (30) if $\bar{u}(g(\bar{x}) - b) = 0$; i.e., if complementary slackness holds.

Step 9: This may be a test for optimality in the dual of the current $\bar{u}$. Alternatively, it may be a test of recent improvement in the dual lower bound. If generalized linear programming is used to solve the dual, then it provides at each iteration an upper bound $\bar{d}$ on $d(X^k)$. Thus, if $\bar{d} < \hat{z}$, we know that the subproblem (30) will never be fathomed by bound by the given dual. Finally, as we indicated in Section 4, if the given dual problem proves unsatisfactory, then it can sometimes be strengthened depending on the nature of the primal problem.

Step 10: The selection of a new $\bar{u} \ge 0$ depends upon the methods discussed in Section 3 being used and which of these methods have proven effective on the same type of problem in the past. When subgradient optimization is used, the
incumbent value $\hat{z}$ can be used in place of $\bar{d}$ as the target value in selecting the step length (19). The rationale for this choice is the desire to fathom (30) by bound using the dual by finding $\bar{u} \ge 0$ such that $L(\bar{u}; X^k) \ge \hat{z}$. Computational experience has shown that subgradient optimization has a good chance of quickly finding such a $\bar{u}$ if $d(X^k)$ is somewhat above $\hat{z}$, and it also produces monotonically increasing lower bounds. Conversely, if $d(X^k) < \hat{z}$ and $\hat{z}$ is used as the target, the lower bounds produced by subgradient optimization will not be monotonic and a wobbling pattern will be observed. In the latter case, persistence with the dual (step 9) is not attractive.
Steps 11 and 12: The separation of the given subproblem can often be done on the basis of information provided by the dual problem. For example, in integer programming, the problem may be separated into two descendants with a zero-one variable $x_j$ set to zero and one respectively, where $x_j$ is chosen so that the reduced cost is minimal. It is important to point out that the greatest lower bound obtained during the dual analysis of (30) remains a valid lower bound on a subproblem derived from (30) with $X^l \subseteq X^k$. In step 12, the new subproblem selected from the subproblem list can be one with low or minimal lower bound.

There are some constructs used in branch and bound derived from or related to Lagrangean techniques which we will not cover in any detail. One such construct is the calculation of penalties relative to a given LP relaxation of a discrete optimization problem [6, 9, 27, 52]. A penalty for a zero-one IP problem, for example, is a lower bound estimate on the increase in cost of the primal objective function value as the result of separating the IP problem by fixing a specific variable at zero and one. Another construct is the surrogate constraint, which is given in the form

$$f(x) + u(g(x) - b) \le \hat{z}$$

for any $u \ge 0$ [15, 19]. The idea is that this constraint can be added to (30) since any feasible solution with lower cost than $\hat{z}$ will satisfy it. The constraint has a strong effect on the analysis of subproblems derived from (30) if $u$ is chosen to be optimal or near optimal in the dual (31). Geoffrion [16] discusses in greater detail penalties and surrogate constraints from the Lagrangean point of view. A skeleton of the overall scheme of this section on a small example is given below.
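The skeleton follows the spirit of Fig. 2 on hypothetical knapsack-style data; $X^k$ is encoded by a dictionary of fixed variables, and for brevity the multiplier is frozen at a single value rather than improved by the methods of Section 3, as a real implementation would do.

```python
import numpy as np

f = np.array([-8., -11., -6., -4.])        # minimize f = -value (hypothetical)
g = np.array([[5., 7., 4., 3.]])           # one joint constraint g x <= b
b = np.array([14.])

def lagrangean(u, fixed):
    # min over x in X^k of -ub + (f + u g) x; separable in the free x_j
    red = f + (u @ g)
    x = np.array([fixed.get(j, 1 if red[j] < 0 else 0) for j in range(4)])
    return -u @ b + red @ x, x

incumbent, z_hat = None, np.inf
stack = [{}]                               # subproblem list; {} is all of X
while stack:
    fixed = stack.pop()
    u = np.array([1.0])                    # frozen multiplier, for brevity
    L, x = lagrangean(u, fixed)
    if L >= z_hat:
        continue                           # fathomed by bound
    if (g @ x <= b).all():                 # feasible Lagrangean solution
        if f @ x < z_hat:
            incumbent, z_hat = x, f @ x    # new incumbent
        if np.isclose(u @ (g @ x - b), 0.0):
            continue                       # complementary slackness: fathomed
    j = next((j for j in range(4) if j not in fixed), None)
    if j is not None:                      # separate on a free variable
        stack += [{**fixed, j: 0}, {**fixed, j: 1}]

print(z_hat, incumbent)
```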
6. Future research and applications areas
We have seen that Lagrangean techniques have already been widely used to analyze discrete optimization problems. Nevertheless, further progress should be possible in the use of these techniques, particularly in their integration with branch and bound, and the construction of fast hybrid algorithms for solving dual problems. We saw in Section 5 that a family of related dual problems is generated and used in conjunction with branch and bound. The relationship between these duals is incompletely understood, as are methods for exploiting the relationship in their optimization. Some work in this direction has been done by Marsten and Morin [40]. They give a new way to use linear programming to compute bounds on LP relaxations in branch and bound. Specifically, a resource-space tour is defined such that each simplex pivot yields a bound for every unfathomed subproblem in the branch and bound search.

Sensitivity and parametric analysis of IP problems is an area of current research and considerable practical importance in which Lagrangean techniques can play a significant role. Geoffrion and Nauss [17] give an overview of the work done thus far in this area. Shapiro [50] discusses how the constructs from Section 4 can be
used in sensitivity analysis. Multicriterion IP is a particularly desirable type of parametric analysis which has not yet been implemented. The idea would be to use the branch and bound search to generate a number of feasible IP solutions which are optimal or near optimal under various objective functions. The work required to find a number of interesting mixed IP solutions may be little more than that of finding a single optimal solution. Parametric variation of the right hand side is studied by Marsten and Morin [39].

Another recent area of considerable research interest in which Lagrangean techniques are applicable is in the analysis of heuristic methods for combinatorial optimization. Cornuejols, Fisher and Nemhauser [5] develop a "greedy" heuristic to generate feasible solutions to a class of location problems and use Lagrangean techniques to assess the error in objective function optimality.
References

[1] D.E. Bell, The resolution of duality gaps in discrete optimization, Tech. Report 81, M.I.T. Operations Research Center.
[2] D.E. Bell, Efficient group cuts for integer programs, HBS 78-10, Harvard Business School (1978).
[3] D.E. Bell and J.F. Shapiro, A convergent duality theory for integer programming, Operations Res. (1977).
[4] C.A. Burdet and E.L. Johnson, A subadditive approach to solve linear integer programs, Annals of Discrete Mathematics 1 (1977) 117-143.
[5] G. Cornuejols, M.L. Fisher and G.L. Nemhauser, Location of bank accounts to optimize float: an analytic study of exact and approximate algorithms, Management Sci. 23 (1977) 789-810.
[6] R.J. Dakin, A tree search algorithm for mixed integer programming problems, Comput. J. 8 (1965) 250-255.
[7] G.B. Dantzig and P. Wolfe, Decomposition principle for linear programs, Operations Res. 8 (1960) 101-111.
[8] J.S. D'Aversa, Integrating IP duality and branch and bound: theory and computational experience, Ph.D. thesis (1978), in preparation.
[9] N.J. Driebeek, An algorithm for the solution of mixed integer programming problems, Management Sci. 12 (1966) 576-587.
[10] B.P. Dzielinski and R. Gomory, Optimal programming of lot size inventory and labor allocations, Management Sci. 11 (1965) 874-890.
[11] J. Edmonds, Matroids and the greedy algorithm, Math. Programming 1 (1971) 127-136.
[12] M.L. Fisher, Optimal solution of scheduling problems using Lagrange multipliers: Part I, Operations Res. 21 (1973) 1114-1127.
[13] M.L. Fisher and J.F. Shapiro, Constructive duality in integer programming, SIAM J. Appl. Math. 27 (1974) 31-52.
[14] M.L. Fisher, W.D. Northup and J.F. Shapiro, Using duality to solve discrete optimization problems: theory and computational experience, Math. Programming Study 3 (1975) 56-94.
[15] A.M. Geoffrion, An improved implicit enumeration approach for integer programming, Operations Res. 17 (1969) 437-454.
[16] A.M. Geoffrion, Lagrangean relaxation and its uses in integer programming, Math. Programming Study 2 (1974) 82-114.
[17] A.M. Geoffrion and R. Nauss, Parametric and postoptimality analysis in integer linear programming, Management Sci. 23 (1977) 453-466.
[18] P.C. Gilmore and R.E. Gomory, A linear programming approach to the cutting-stock problem, Part II, Operations Res. 11 (1963) 863-888.
[19] F. Glover, Surrogate constraints, Operations Res. 16 (1968) 741-749.
[20] F. Glover, Integer programming over a finite additive group, SIAM J. Control 7 (1969) 213-231.
[21] R.E. Gomory, Essentials of an algorithm for integer solutions to linear programs, Bull. Amer. Math. Soc. 64 (1958) 275-278.
[22] R.E. Gomory and T.C. Hu, Synthesis of a communication network, SIAM J. Appl. Math. 12 (1964) 348-389.
[23] R.E. Gomory, On the relation between integer and non-integer solutions to linear programs, Proc. Nat. Acad. Sci. 53 (1965) 260-265.
[24] G.A. Gorry, W.D. Northup and J.F. Shapiro, Computational experience with a group theoretic integer programming algorithm, Math. Programming 4 (1973) 171-192.
[25] R.C. Grinold, Lagrangean subgradients, Management Sci. 17 (1970) 185-188.
[26] R.C. Grinold, Steepest ascent for large scale linear programs, SIAM Rev. 14 (1972) 447-464.
[27] W.C. Healy, Jr., Multiple choice programming, Operations Res. 12 (1964) 122-138.
[28] M. Held and R.M. Karp, The traveling salesman problem and minimum spanning trees, Operations Res. 18 (1970) 1138-1162.
[29] M. Held and R.M. Karp, The traveling salesman problem and minimum spanning trees: Part II, Math. Programming 1 (1971) 6-25.
[30] M. Held, P. Wolfe and H.D. Crowder, Validation of subgradient optimization, Math. Programming 6 (1974) 62-88.
[31] W.W. Hogan, R.E. Marsten and J.W. Blankenship, The Boxstep method for large scale optimization, Operations Res. 23 (1975) 389-405.
[32] R.M. Karp, On the computational complexity of combinatorial problems, Networks 5 (1975) 45-68.
[33] J.B. Kruskal, On the shortest spanning subtree of a graph and the traveling salesman problem, Proc. Amer. Math. Soc. 7 (1956) 48-50.
[34] L.S. Lasdon, Optimization Theory for Large Systems (Macmillan, 1970).
[35] L.S. Lasdon and R.C. Terjung, An efficient algorithm for multi-item scheduling, Operations Res. 19 (1971) 946-969.
[36] J. Lorie and L.I. Savage, Three problems in capital rationing, Journal of Business (1955) 229-239.
[37] T.L. Magnanti, J.F. Shapiro and M.H. Wagner, Generalized linear programming solves the dual, Management Sci. 22 (1976) 1195-1204.
[38] A.S. Manne, Programming of economic lot sizes, Management Sci. 4 (1958) 115-135.
[39] R.E. Marsten and T.L. Morin, Parametric integer programming: the right-hand-side case, WP 808-75, Sloan School of Management, M.I.T. (1975).
[40] R.E. Marsten and T.L. Morin, A hybrid approach to discrete mathematical programming, OR 051-76, M.I.T. Operations Research Center (1976).
[41] J.A. Muckstadt and S.A. Koenig, An application of Lagrangian relaxation to scheduling in power generation systems, Operations Res. 25 (1977) 387-403.
[42] G.L. Nemhauser and Z. Ullman, A note on the generalized Lagrange multiplier solution to an integer programming problem, Operations Res. 16 (1968) 450-452.
[43] W.D. Northup and J.F. Shapiro, A generalized linear programming algorithm for mixed integer programming (1978), in preparation.
[44] W. Orchard-Hays, Advanced Linear Programming Computing Techniques (McGraw-Hill, New York, 1968).
[45] B.T. Poljak, A general method for solving extremum problems, Soviet Math. Dokl. 8 (1967) 593-597.
[46] R.T. Rockafellar, Convex Analysis (Princeton University Press, Princeton, NJ, 1970).
[47] G.T. Ross and R.M. Soland, A branch and bound algorithm for the generalized assignment problem, Math. Programming 8 (1975) 91-103.
[48] J.F. Shapiro, Dynamic programming algorithms for the integer programming problem I: the integer programming problem viewed as a knapsack type problem, Operations Res. 16 (1968) 103-121.
[49] J.F. Shapiro, Generalized Lagrange multipliers in integer programming, Operations Res. 19 (1971) 68-76.
[50] J.F. Shapiro, Sensitivity analysis in integer programming, in: P.L. Hammer, E.L. Johnson, B.H. Korte and G.L. Nemhauser, eds., Ann. Discrete Math. 1: Studies in Integer Programming (North-Holland, Amsterdam, 1977) 467-478.
[51] J.F. Shapiro, Fundamentals of Mathematical Programming (1978), to be published by Wiley-Interscience.
[52] J.A. Tomlin, An improved branch and bound method for integer programming, Operations Res. 19 (1971) 1070-1075.
Annals of Discrete Mathematics 5 (1979) 139-183 © North-Holland Publishing Company
ENUMERATIVE METHODS IN INTEGER PROGRAMMING

Kurt SPIELBERG
IBM Scientific Marketing
0. Introduction

When one surveys the recent literature on integer programming, one almost gets the feeling that there is no such thing as an enumerative algorithm. The papers which talk of enumeration are as often as not de facto branch and bound papers. One might say the subjects are so closely related that drawing a distinction does not matter. However, this paper takes an opposite attitude. It tries to distinguish between the two points of view and to identify problem areas which require enumeration. Partially as a consequence, it places stress on working with the integer vector as an entity (partial solution, state, etc.), and with using auxiliary and logical inequalities to their fullest extent. This specific vantage point results in many references to the author's work. That is assuredly not meant to imply any proportional merit, nor does possibly scant treatment of other papers, approaches, authors mean to slight their role. On balance it may be good if a field is occasionally reviewed from one specific angle, especially if excellent books and reviews are available to cover other points of view.
1. Problems (given and generated)

After stating the given problems, we give associated inequalities derivable from them and from the solution procedure. Especially for mixed integer problems it is often the associated inequalities which will determine the enumeration process. Finally we discuss a form of logical inequalities, in turn derivable from the initial inequality system and/or the associated inequality system. We believe that the proper utilization of the logical inequalities is the best way for replacing a host of (in)feasibility tests which have been proposed for enumeration over the years, and that, moreover, these logical inequalities (often coupled with standard "penalty" techniques) are the proper tools for rational direction of the enumerative process.
1.1. The given problems

In this survey we consider either the pure integer programming problem in $m$ constraints and $n$ variables $y(j)$, $j = 1, 2, \ldots, n$:

(IP) $\min cy = z$, $Cy \le b$, $y(j) \ge 0$ integral, $j = 1, 2, \ldots, n$, (1.1)

or the mixed integer problem with an added set of continuous variables $x(k)$, $k = 1, 2, \ldots, p$:

(MIP) $\min cy + dx = z$, $Cy + Dx \le b$, $y(j) \ge 0$ integral, $0 \le x(k)$. (1.2)

As one does not, in general, know how to solve these problems directly, one almost always has recourse to various altered versions of IP and MIP, usually obtained by the substitution of some integer variable values and the dropping of the other integer constraints. We shall sometimes denote the resultant problems by IPR or MIPR, or also by ALP (auxiliary linear program). All of these problems and their results are in a sense functions of the "trial values or intervals" for the $y$ which are substituted. Enumerative techniques are usually confined to 0-1 problems ($y(j) \in \{0, 1\}$), or to problems with small interval bounds ($L(j) \le y(j) \le U(j)$). Alternatively, one can reduce integer problems to 0-1 problems by expansion of the integer variables into polynomials involving binary variables only, but no extensive testing of such options appears to have taken place.

1.2. Associated inequalities (usually due to Benders [12])

We shall take the attitude that the MIP is, in a sense, reducible to IP via the Benders Partitioning Procedure, or by enumerative schemes based on a set of Benders inequalities [114] (see also [59] for a description), or based on one (constantly regenerated or updated) "associated inequality" (usually of the Benders type, but often substantially modified to be "strengthened") at each node of an enumerative algorithm (as in [78], for example). That is, associated with every problem (1.2) (and also with problems (1.1), but less interestingly so), we assume the existence (or build-up during a preprocessing or (initial) algorithmic phase) of a set of "associated inequalities", examples of which are defined further below, of the general form:

$$By \le r + \delta z^*. \qquad (1.3)$$

In the above, $r$ is a general right hand side, $z^*$ is a given or estimated upper bound on $z$ (to be updated whenever possible) and $\delta$ is a suitable vector of 0's and 1's.
The inequalities (1.3) usually are, or derive from, Benders inequalities (after substitution of $z^*$ for $z$), of either

(Type 1) $(c + uC)y - ub \le z^*$, (1.4)

or

(Type 2) $-v(b - Cy) \le 0$. (1.5)

The $u$ and $v$ are variables of the problem dual to (1.2), but it is also well known how to read inequalities of type (1.4) and (1.5) directly from the auxiliary linear programming tableau (tableau of "ALP"). Constraint (1.4) can be taken from the objective function row of any primally feasible tableau which is dually feasible in the continuous variables. Constraint (1.5) may be deduced from any row of a tableau in which all continuous variables have non-negative coefficients. The $u$ or $v$ are generally functions of the iteration number and also of the manner in which the linear program was set up. It is fairly clear that for ALP equal to the relaxed problem of (1.2), i.e. to MIPR, one will get the "best" inequality (in some sense; this can vary with the use one has for the inequality) by taking the dual variables at the termination of ALP (with an optimal or infeasible solution). Such a proof has been given for "surrogate constraints" (due to [69, 71]) in [63]; see also [7].

We are much more interested in Benders type inequalities (1.3) which derive from an ALP after substitution of a trial vector $y^k$ into (1.2), as is done systematically in the Benders partitioning procedure, but can also be done otherwise. The coefficients of such inequalities, and hence their "strength" (for one definition of strength see the next section), are clearly functions of the choice of $y^k$. Some of the work leading to [135] in the plant location area suggested strongly that the inequalities associated with neighboring enumeration nodes do not differ sufficiently from each other to warrant their simultaneous use, hence the tendency to use single associated inequalities only in [135, 136, 78, 79]. Recently, an interactive treatment of zero-one problems has been based, on the other hand, on supplying to the program a sequence ($k = 1, 2, \ldots$) of trial vectors $y^k$, initially picked somewhat at random, but then selected by the substitution of modified solutions (feasible or infeasible) of ALP($y^{k-1}$) (see [82, 83]). What complicates matters still further is the fact that for any trial vector $y^k$ there usually exists an abundance of alternate solutions $u^{k'}$ (or $v^{k'}$), and corresponding associated inequalities of quite varying strengths. Some of the work in [135, 136, 78] utilizes a judicious alteration of the $u^k$ (facilitated by the additional device of transforming the initial problem (1.2) into a larger problem, the stronger disaggregated formulation; there is a growing literature on strengthened formulations of integer programming problems (for example [143]), but the initial applications of disaggregation seem to go back to [43, 25, 26, 135]) into a form which promises strong associated inequalities. See Section 4.1.2 for more details about dual methods and disaggregation. A small illustrative sketch of deriving a Type 1 inequality from a trial vector is given below.
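The sketch fixes $y = y^k$ in (1.2), solves the LP in the continuous variables, and reads off the dual $u$ for a Type 1 inequality (1.4). All data are hypothetical, and it assumes a recent SciPy whose linprog (HiGHS) result exposes dual marginals.

```python
import numpy as np
from scipy.optimize import linprog   # assumes SciPy with HiGHS dual marginals

# Hypothetical MIP data for (1.2): min cy + dx, Cy + Dx <= b, y in {0,1}^2.
c = np.array([3., 5.]); d = np.array([-2.])
C = np.array([[1., 2.], [2., 1.]]); D = np.array([[1.], [1.]])
b = np.array([6., 5.])

yk = np.array([1, 0])                         # trial 0-1 vector y^k
res = linprog(d, A_ub=D, b_ub=b - C @ yk, bounds=(0, None), method="highs")
u = -np.asarray(res.ineqlin.marginals)        # u >= 0: dual of the x-problem
z_star = c @ yk + res.fun                     # value of (1.2) at y = y^k
coeff = c + u @ C                             # Type 1 cut: (c + uC) y - ub <= z*
print(f"({coeff}) y <= {z_star + u @ b}")
```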
1.3. Logical inequalities, reduction

The associated and logical inequalities are in the integer variables $y(j)$, and "reduction" is applied to such constraints only. I.e., these inequalities are not meant to be added to the auxiliary problems; they are rather meant to be used for extraction of logical information about the variables. Most conveniently, such information is stored in simple inequalities with 1 and $-1$ coefficients only: "logical inequalities". In terms of the variables $y(j)$, they will be written in the form

$$Qy \ge q. \qquad (1.6)$$

We shall denote the process of deriving (1.6) from the given and associated inequalities (and $z^*$) by the term "reduction". The purpose of reduction is: (i) the establishment of possible infeasibility (of (1.1) or (1.3)); (ii) the fixing of variables if possible; (iii) the suitable direction (and also curtailment) of the search, otherwise. We shall take the point of view that reduction is essential for successful enumeration, especially for mixed integer problems (even more so for specially structured problems), and that there exist, or should be developed, methods for efficient utilization of logical inequalities. There is a division of opinion on this matter, some authors (e.g. [11]) believing that only the simple cases (i) and (ii), and the simplest forms of (iii), can be exploited efficiently, but we believe that this is not so and that all the interesting work to be done lies in (iii). Also, it appears that the alternative to using reduction and logical information is to rely on brute force methods which show little promise when problem size expands.
2. Background

2.1. The additive algorithm and extensions
The "implicit additive algorithm" is due to E. Balas ([3]; see also [4] for another exposition and references to earlier non-English versions). It started as a purely enumerative algorithm, its operations consisting only of additions, subtractions and comparison tests. Various improvements were made in references such as [69, 62, 63, 114, 131]. Additional features became increasingly dependent on the use of auxiliary linear programs, and in a sense that direction of work brought enumeration increasingly close to branch and bound programming (such as introduced by [110]). More recently, some of the basic enumerative nature (the subject of this paper) has been stressed in connection with highly structured problems (with purely logical constraints which are not handled too well by linear programming) and in
connection with what we call logical methods (as discussed in (1.3)), which derive simple logical constraints from the given problem and the latest obtained information during the search.

2.1.1. The implicit additive algorithm

One deals with problem (1.1) and starts a search process at the "origin", that is with the value $y = (0, 0, \ldots, 0)$. The problem is usually not feasible, that is the "updated" right hand side ($b - Cy^k$; $y^k$ being the general "current" trial value for $y$) is, in general, not non-negative; i.e., the problem is not "currently feasible". In a "forward step" of the algorithm one sets one of the variables which have not been touched (which are "free") to one. This can be assumed to increase (non-decrease) the overall cost, the cost coefficients being in (or having been transformed to) non-negative form. We do not give details here, a more general description being found in Section 3. But the main computational features, by now very well known, are easy to discern: (i) The monotone nature of the objective function permits the search to "back up" when $z$ exceeds an upper bound $z^*$ (the "ceiling test"). (ii) By grouping positive and negative constraint coefficients together one can derive a number of "feasibility tests", i.e. necessary conditions for rendering the right hand side non-negative at successor nodes. (iii) Ceiling and infeasibility tests can be "projected", i.e. one can ask what would happen at the next node if some variable $y(j^*)$ were set to one. If such a choice were to lead to infeasibility, the variable could be fixed at 0 ("cancelled"). (iv) The direction of the search can be based on projected overall feasibility (criterion of Balas [3]), or some related condition, etc.

2.1.2. Logical inequalities [24, 67, 99, 92, 93, 114, 137, 138]

The items (i) to (iii) can be covered in (at least conceptually) more compact form by "reduction" of systems (1.1) and/or (1.3), augmented by the constraint $cy \le z^*$ (with no non-negativity required of the $c(j)$). We leave some details of reduction for Section 4.3, but illustrate it by examples. One can distinguish between (1) "local" (or "weak", i.e. tied to an enumerative scheme) reduction, or (2) "free" (or "strong") reduction. In either instance we generate a special form of logical inequality, termed a "preferred" inequality (or cut):

$$\sum_j \tilde{y}(j) \ge 1 \qquad (2.1)$$

in which

$$\tilde{y}(j) = y(j) \quad \text{or} \quad \tilde{y}(j) = \bar{y}(j) = 1 - y(j), \qquad (2.2)$$
depending on whether the coefficients in the original inequality were negative or not. The number of non-zero coefficients in (2.1) is called the degree d of the preferred inequality. In n-space, a preferred variable inequality (or cut) of degree d eliminates 2^(n-d) points from consideration. A strong inequality corresponds to small degree. Infeasibility corresponds to d = 0. One must also allow for the possibility that no preferred inequality can be derived. This is easily done, and can e.g. be signalled by setting d = n + 1.
(1) Suppose that a search (possibly with some variables already fixed) has led to a remaining constraint, in slightly simplified notation:
-2y1 + 2y2 + 5y3 - y4 - y5 <= -2.

Feasibility is clearly tied to the coefficients of y1, y4 and y5. One may derive a "strongest" local condition (strongest in the sense of having a minimal number of non-zero terms) by dropping positive terms, and by combining the constraint successively with the hyper-cube constraints y4 <= 1, y5 <= 1 and y1 <= 1 (in order of decreasing negative coefficients). One proceeds until the right hand side of the resultant inequality is such that it would become non-negative if the next step were executed. As a final step one changes the remaining negative coefficients to -1 each (this can be viewed as a somewhat trivial application of Gomory's procedure for deriving all-integer cuts, [73]). For the above example one gets the "minimal" cuts:
y1 + y4 >= 1,  y1 + y5 >= 1.

These local preferred cuts were introduced in [114] for the purpose of limiting the search, and only minimal preferred variables were retained.
(2) Stronger preferred inequalities can be obtained by lessening the emphasis on feasibility. Instead of dropping the positive terms, one complements them and then proceeds as above in the new variables ȳ(j) (ȳ(j) = y(j), or ȳ(j) = 1 - y(j), as determined by the signs). Basic references are [24, 67, 99, 92, 93, 137, 138].
Strong reduction. For the given example, one has:

-2y1 - 2ȳ2 - 5ȳ3 - y4 - y5 <= -9.

1st pass: d = 1,  ȳ3 >= 1.
2nd pass: d = 2,  y1 + ȳ2 >= 1,  y1 + y4 >= 1,  y1 + y5 >= 1,  ȳ2 + y4 >= 1,  ȳ2 + y5 >= 1.

The inequality system (1.6), written out over the original variables y1, ..., y5 (one row per cut of the second pass), is:

+1  -1   0   0   0        0
+1   0   0  +1   0        1
+1   0   0   0  +1   y >= 1
 0  -1   0  +1   0        0
 0  -1   0   0  +1        0
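To make the reduction step concrete, the following sketch (in Python; the routine and its names are ours, not part of any of the cited codes) derives the minimal preferred inequalities of smallest degree from a single inequality in which all positive coefficients have already been complemented away, and reproduces the two passes of the example above.

from itertools import combinations

def preferred_inequalities(coeffs, rhs, max_degree=2):
    # Minimal preferred inequalities for sum(a[j] * w[j]) <= b over
    # binary w, where every a[j] < 0 (positive coefficients are assumed
    # to have been complemented away, so each w[j] is y(j) or 1 - y(j)).
    # Returns (d, cuts); each S in cuts encodes sum_{j in S} w(j) >= 1.
    n, total = len(coeffs), sum(coeffs)
    if total > rhs:                  # even w = (1, ..., 1) violates: d = 0
        return 0, []
    for d in range(1, max_degree + 1):
        cuts = []
        for S in combinations(range(n), d):
            # with w(j) = 0 for all j in S, the left hand side is at best
            # total minus the sum over S; if that still exceeds rhs, S is a cut
            if total - sum(coeffs[j] for j in S) > rhs:
                cuts.append(S)
        if cuts:                     # stopping at the smallest d keeps
            return d, cuts           # the inequalities minimal
    return max_degree + 1, []        # signal: no cut of degree <= max_degree

# Strong reduction example: -2y1 - 2w2 - 5w3 - y4 - y5 <= -9,
# with w2, w3 standing for the complemented variables 1 - y2, 1 - y3.
print(preferred_inequalities([-2, -2, -5, -1, -1], -9))
# -> (1, [(2,)])   i.e. w3 >= 1: y3 is fixed at 0
# Second pass, after fixing w3 = 1 (its coefficient moves to the rhs):
print(preferred_inequalities([-2, -2, -1, -1], -4))
# -> (2, [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3)])  the five cuts of the text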
The "degree d" of a minimal preferred variable system (1.6) is the number of non-zero coefficients of each of its component inequalities. By extension we also say that the degree of the systems from which (1.6) is derived is equal to d. In general, the degree of a system depends on the procedure which is used for reduction. It is well known that one can occasionally form linear combinations of linear inequalities which could lead to lower degree logical inequalities upon reduction. The trouble is that there exist no general procedures for forming such linear combinations (some starts in such a direction have been made in [100, also 39]). Minimal inequalities in the sense of "degree" have been used in [137, 138, 78-82, 11, ...], the other authors primarily (but not exclusively) generating a wider class of inequalities so as to characterize the entire integer polytope (the difficulty, here, lies in the extremely large number of such inequalities), [92, 24, 97]. The degree of a system is an important measure of its strength, and directing a search towards regions of low degree is a major tool for keeping the exponential nature of enumerative search efforts within tractable bounds.

2.1.3. Partial orderings

This is probably a good place for stressing an important special case of enumeration, as exemplified by the very first applications of implicit enumeration, and some recent attempts at generalization. The rest of this paper will not assume any a priori partial order. As mentioned before, the initial implicit enumeration algorithm [3] assumes origin y = 0 and cost c >= 0, the tacit implication being that there exists a partial ordering <= such that for any two vectors y and y': (i) y <= y', or (ii) y' <= y, or (iii) neither y <= y' nor y' <= y, and such that y <= y' implies cy <= cy'. Adjacent vectors y and y' are then such that (i) or (ii) holds, and there is no y'' with y <= y'' <= y' (or y' <= y'' <= y). These properties give the strong result that, having a feasible solution vector y, one may discard all y' >= y as being too expensive (trivial "ceiling" test). On the other hand, the choice of any particular starting point y = 0 has been shown to be potentially quite poor (see Section 3.4), and therefore no a priori partial ordering will be assumed in the sequel. Nevertheless, the concept of a general partial ordering P is of intrinsic interest and has been studied in a number of recent papers (see [32], and references in it). The numerical results do not suggest wide general applicability of any particular ordering, but one may hope to be able to apply the concept profitably to specially structured problems.

2.2. Enumeration for mixed integer programming

The advantage of Benders partitioning [12] over branch and bound programming lies in its potential for preserving any special structure (1.2) may possess. Its
disadvantage lies in the requirement of resolving an integer program at each iteration (essentially to provide (i) a trial vector y^k, (ii) a lower bound for z). Computationally more attractive seems to be the imbedding of the partitioning concept within an enumerative scheme.

2.2.1. Partitioning and enumeration [114]

The algorithm proceeds exactly as does the additive enumerative method, except for the buildup of a system (1.3), plus any pure-integer constraints in (1.2). Benders inequalities can be obtained either from relaxed versions of (1.2) or from linear programs resulting after substitution of one of the y^k (trial or "state" vectors). Care must be taken as to the domain of validity of (1.3). If the LP tableau is updated for fixed variables during simplex iterations, then the resulting BIQ's (Benders inequalities) are globally valid. If not, they are valid only for successors to a given node, and must be suitably replaced on backups of the enumeration. For other expositions of the method see texts such as [59]. For some modifications see [140]. Even though this is a promising approach, and probably the natural extension of enumeration to mixed integer programming, there exist no general implementations. Judging from frequent references to the solution (heuristic, enumerative, or by special methods) of BIQ systems for some special problems, there appear to be special purpose algorithms which are closely related to the above (e.g. [117]). One of the difficulties is the similarity of BIQ's taken from adjacent nodes in an enumeration. It can be overcome if one solves carefully chosen "state problems" (with all, or most, y(j) fixed; see Section 3.1.2), so that non-adjacent lattice points come into play. A device for eliminating "weak" BIQ's is to perform reduction on them and drop (at least locally) those which are of high degree.

2.2.2. Enumeration with associated inequalities

We now discuss an algorithm in which essentially one associated inequality is generated at every node (from a "state problem", i.e., a problem with a fixed vector y^k, such as obtained from the search tree plus any additional information available or computable). Some references are [137, 138, 78-81], often under the name state enumeration. Future implementations are likely to shift again to the scheme of 2.2.1, more Benders type inequalities being generally beneficial if carefully chosen. However, there are problems which are so special that one BIQ can be carried along and updated very efficiently. A prime example is the Simple Plant Location Problem. In that problem the coefficients of the associated inequality can be carried along in a strong form (the u are chosen quite carefully from general duality conditions), see [135, 78]. They can be interpreted physically and are named "gain functions" in [135, 136, 78, 55], and are used in [37], where some general properties are
given and exploited in the computation. Further general relationships ("submodularity") have been established in [55, 8, 121], with some connections to the Russian literature. One of the plant location algorithms has been shown in [37] to be "strong" in terms of "Held-Karp" relaxation [98, 101] (or "Lagrangian" relaxation, e.g. [47, 52, 133, 65]). The point is that there exist important special problems for which one associated inequality and its coefficients can play an overridingly important role, and that powerful cancellation tests and branching rules for enumeration can be based upon it.

2.3. Relation to branch and bound programming

Most algorithms which are currently called enumerative are really branch and bound methods of a special LIFO (last in, first out), or "single-branch" (see [42, 30]), nature. As will be spelled out in more detail (3.1.3), enumeration in the strict sense is characterized by:
(1) solution of strongly constrained problems (with an entire vector y^k substituted in (1.1) or (1.2));
(2) choice of the y^k to be often non-adjacent in the integer lattice;
(3) implicit handling of the structure, such as: multiple choice constraints, precedence constraints, conflict constraints, etc.;
(4) consistent efforts at constructing and using logical inequalities, so as to provide some "logical structure" where it may not exist in an evident fashion.

2.4. Hybrid methods

It is accepted today that integer programming requires the use of many techniques. The incorporation of cutting plane techniques within enumeration, say, could then be called a "hybrid" method. However, we shall reserve this term for the more narrow area of combination of techniques in a structural sense, with some instances given in 2.4.1 and 2.4.2.

2.4.1. Combination of BB and enumeration

Out of a great variety of possibilities we describe two simple ones.
(i) BB followed by enumeration. One starts by building up a "multi-branch" (or "multi-leaf") branch and bound tree to some desired size (in terms of number of pending nodes, or of overall storage requirement). Then one "processes" the pending nodes sequentially, in some order determined from the node information, by means of a less costly (at least in terms of storage requirement) enumeration procedure (or possibly a heuristic procedure for producing feasible solutions). This approach, indeed, may be imperative when one has expended a great deal of work in producing the tree, without having obtained a commensurate return in terms of acceptable integer solutions.
(ii) Enumeration followed by "final" BB schemes. One starts with an enumerative procedure up to a certain search level l (see Section 3.1.1). Then one continues with the last node (at level l) and builds from it a branch-and-bound structure in the usual manner. While the first approach is of evident interest, if only in terms of storage, the second appears to be unattractive. But this may not be as clear as it seems. One instance in which such an approach appeared natural was the "propagation" phase (branching on carefully chosen variables, after an initial ALP, so as to leave the LP tableau unaltered; see [138] and Section 4.4.3). Procedures more complicated than (i) and (ii) are clearly possible, but probably undesirable in view of the resulting complexity of the book-keeping.

2.4.2. Splitting and redrawing of trees

The "flexible backtracking" procedure of [124] rests on the possibility of splitting a "single-branch" tree at any one of its nodes into two trees. The purpose is to resume the search at the node at which splitting took place, in a direction opposite to that which has been unsuccessful up to the moment in question. A "redrawing" idea (see [29]) is also concerned with information obtained relatively late in a BB search at two different pending nodes, and with redrawing the tree so as to combine these two nodes at a saving. See Section 3.4.3 for somewhat more detail.

2.5. Algorithm and data interdependence

Perhaps the most striking characteristic of integer programming is the great instability of any given algorithm as a function of the data. There is no doubt at all about the necessity of providing general algorithms with tools for some degree of adaptation to the data. We discuss first the relatively simple matter of prioritizing multiple strategic criteria, and secondly a few instances of more general adaptive devices.

2.5.1. Multiple strategic criteria

Enumeration depends critically on the right choice of the branch index j* at some node of the search tree. The choice is governed by a "branching criterion", which is often based on "penalties". I.e., one chooses j* so as to maximize a penalty measure p(j). This measure may be a lower bound on Δz, the increase in z (conditioned upon either setting y(j) = 0 or y(j) = 1), or it may be the difference between Δz(y(j) = 1) and Δz(y(j) = 0), etc. That is, there may be several criteria p1(j), p2(j), .... In the most interesting case, some criteria may be quite disparate in nature. E.g., reference [138] is concerned with (i) a penalty measure and (ii) a "contraction" measure at the same time (see Section 4.4.3). One may then be willing to choose a variable j as branch variable j* if: (a) its penalty measure is within a certain percentage fraction of the maximal penalty measure, and if
(b) at least one of the branches y(j) = 0, y(j) = 1 leads to a reduction ("contraction") in the degree of the system, i.e. to a more constrained system. In [138] this matter is handled in a rudimentary fashion; see also [11] for such ideas. But there is clearly a need for a more systematic and automatic procedure. A user of a computer program should be prompted (a certain degree of interactivity is assumed, such as in [83]; the trend of computing is unquestionably in such a direction) to list his strategic priorities and to supply the corresponding control parameters. This information can then be used for a successive narrowing of a branch candidate set, with the purpose of an ultimate choice which meets all objectives tolerably well, or better, which meets those objectives that make sense in the particular algorithm-data configuration at hand. Of course, such a program has to have suitable defaults for criteria and parameters. It should also collect statistics about the effectiveness of the criteria (at least in an initial phase of the computation), so that a certain measure of alteration of strategy becomes possible after some computation, either by another user-intervention or preferably by the program itself ("artificial intelligence").

2.5.2. Adaptive procedures: Matching algorithm and data

Little work has been done on more general devices for adapting procedures to data (in a modest fashion [78] can be viewed as a step in such a direction; see also Section 3.1.2). Yet the possibilities are manifold, and there is need for such work. We cite some possible examples:
(a) One may precede an algorithmic computer run with a diagnostic preprocessor. Its purpose would be to examine the general nature of the problem and to "suggest" actions (strategies) of an algorithmic kind. E.g.:
- special structure (not so easy to detect automatically): suggests decomposition or partitioning;
- multiple choice sets: can be removed from the matrix and handled implicitly;
- for an inequality system of low (high) degree, signifying a strongly (weakly) constrained problem: one may invoke (not invoke) logical inequality methods;
- given knapsack or set covering type constraints (this may be difficult to detect, and one may need user-input to identify such problems): one may routinely round up the (relaxed) ALP solutions to obtain integer-feasible solutions.
The preprocessor phase may also try out various techniques and then construct an overall code which incorporates those techniques which have been found to be "useful".
(b) Somewhat more ambitious would be attempts at adaptivity during algorithm execution. Trivial instances might be:
- the termination of cutting plane imposition when the resulting increase in
objective function becomes negligible (the "football" effect of dual degeneracy);
- the deletion of logical methods when the degree is large (the problem is essentially "unconstrained"), to be replaced by careful consideration of the objective function in place of the constraints.
The design and testing of more intelligent adaptive devices is clearly an important research topic.
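As a small illustration of the successive narrowing advocated in Section 2.5.1, the following sketch (Python; the measures and their values are hypothetical and assumed computed elsewhere) selects a branch variable whose penalty is within a tolerance fraction of the maximum and which, where possible, also contracts the logical system.

def choose_branch(free, penalty, contraction, tol=0.10):
    # Successively narrow a branch-candidate set (cf. Section 2.5.1).
    #   penalty[j]     -- a lower bound on the increase in z when branching on j
    #   contraction[j] -- True if at least one branch y(j) = 0 / y(j) = 1
    #                     lowers the degree of the logical-inequality system
    # Both measures are assumed supplied by the surrounding code.
    best = max(penalty[j] for j in free)
    # criterion 1: penalty within a tolerance fraction of the maximum
    candidates = [j for j in free if penalty[j] >= (1.0 - tol) * best]
    # criterion 2: prefer candidates that also contract the system
    contracting = [j for j in candidates if contraction[j]]
    pool = contracting or candidates   # fall back if criterion 2 empties the set
    return max(pool, key=lambda j: penalty[j])

# hypothetical data for four free variables
free = [2, 5, 7, 9]
penalty = {2: 4.0, 5: 3.8, 7: 1.2, 9: 3.9}
contraction = {2: False, 5: True, 7: True, 9: False}
print(choose_branch(free, penalty, contraction))   # -> 5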
3. Structural and some computational considerations

As simple as enumerative schemes are, their description has not been completely standardized, and we have to define some terms. A binary n-vector y is called a trial vector, sometimes a solution. It is called a feasible solution if it satisfies (1.1), or if there exists an x such that (x, y) satisfies (1.2). It is called an optimal solution if it is feasible and minimizes z = cy of (1.1), respectively if (x, y) is feasible and minimizes z = cy + dx of (1.2). In a search there is an iteration count v = 1, 2, ..., hopefully remaining very much below the completely intractable maximal value 2^n, and a corresponding sequence of trial vectors y^v. At iteration v the vector y^v is called the current trial vector or solution. If feasible solutions have been found as of iteration v, the best is denoted by (x, y, z)* and is called the incumbent solution (z* the incumbent objective function value). In a machine program one usually initializes z* to a prohibitively large value L, in order to indicate that a feasible incumbent solution has as yet not been attained.
3.1. The search tree

Except for cutting-plane and heuristic algorithms, most methods have the sequence of trial vectors tied to a simple scheme which is often visualized as a "tree". In its most basic form a tree can be represented by one list of signed numbers. Sometimes it is necessary or convenient to have several lists of numbers, e.g. if one wishes to record the entire history of a search rather than its "current" status, or if one tries to avoid certain redundancies of calculation. One may then display these lists in a number of diagrams, but will still speak of one search tree. There are clearly complicated trade-offs between simple search trees and computational schemes on the one hand and highly complicated schemes with substantial tree storage requirements on the other. The trend has always been to stay with simple schemes (simple in terms of structure), and the burden of justification for more complex structures has to be borne by the originator of such a structure.
[Figure: level-diagram of a search at level 5, iteration v; explicit branches fix y(1), y(2), y(7) and y(11) at 1 and y(3) at 0; vertical lines mark pending nodes, and the current node is at level 5.]

Fig. 1. "Single-branch" tree (level-diagram).
3.1.1. The level-diagram

Figure 1 characterizes a search in terms of a "level-diagram". We define the "level" of an enumerative search to be exactly the number of variables (components of y) which have been explicitly and arbitrarily fixed at 0 or 1. Such a choice is also called a "branch" or "forward step" of the search. The little circles in Fig. 1 represent local computation or processing, which can be of any order of complexity but does not involve the "arbitrary" fixing of a variable. Examples of such processing may be:
- substitution of a trial vector in (1.1)
- solution of a linear program or other "relaxed" versions of (1.1) or (1.2)
- computation of bounds from the data
- computation of logical implications from the data, etc.
The search proceeds from the left (level 0) to the right (levels 1, 2, ...) by means of carefully chosen branches, which are indicated by a horizontal line with the coefficient index written next to it. A + (-) sign means that the variable is fixed at 1 (0). In Fig. 1, the search is at level 5; the variables y(1), y(2), y(7) and y(11) have been explicitly fixed at 1, and variable y(3) has been fixed at 0. As long as one cannot exclude the possible existence of improved solutions among "successor nodes" (nodes to the right of current level l), the local processing terminates with a further branch towards the right. When all successor nodes have been taken care of, either explicitly or implicitly by various tests, then the current problem has been "resolved" (the current node has been explored; sometimes the term "fathomed" is used instead), and the search "backs up" to the next lower level on the left. The vertical lines (or their endpoints) are said to correspond to "pending" nodes, and usually correspond to the complementary situations which have yet to be explored when the search "returns" to the appropriate level after all the vectors corresponding to higher levels (with the indicated branches) have been explored. Our terminology differs somewhat from that of other authors such as Balas [4] and Salkin [130]. The implicit fixing ("cancellation") of variables during local processing is not counted (even though it is, of course, the main objective of the scheme and is taken care of in the book-keeping separately). In particular it should be noted that the "pending nodes" of Fig. 1 do not result
in explicit branches. When the search returns to level l from level l + 1, the branch variable y(j*) must be fixed to the complement of the former branch value. I.e., no arbitrary choice is involved and hence no increase in level. A new branch variable is selected, if necessary, from among the remaining "free" (not fixed) variables. As a consequence one can call a "low-level" search, that is a search which never exceeds a modest level, a well-behaved search: only relatively few explicit branches are required and the combinatorics of the problem is kept from getting out of hand. The need for low-level enumeration can perhaps be appreciated only by someone who has already experienced the helpless feeling attendant on an enumerative or branch and bound scheme which churns far from the origin of the search, the explicitly fixed variables having evidently been chosen wrongly. In this connection see also the section on generalized search origins and related techniques. The leftmost node in the level-diagram is called the search origin. The term has a temporal as well as a locational significance. That is, the node at the origin changes in time. Whenever the search returns from level 1 to 0, at least one variable (the branch variable) is fixed at a new value. The same can be said for the vectors at any level, which we shall denote by y^(l) (the superscript enclosed in parentheses). During a search the vector y^(l) will correspond to many different trial vectors y^v. The distinction between y^(l) (current trial vector at level l) and y^v (current vector at iteration v) is solely one of point of view.

3.1.2. The "state"; partial solutions, processing

In addition to the information contained in the level diagram, one must retain information about
(i) the variables which have been fixed not only explicitly (and arbitrarily) but also implicitly (i.e., non-arbitrarily, usually through implications from the problem data), and also about (ii) tentative values for the "free" variables (those not fixed).
The fixed variables. A common and apparently efficient way of accomplishing (i) is illustrated by the example of Fig. 2:
[Figure: diagram over levels l = 0, 1, ..., 5; signed indices record fixings, and understruck indices mark implicit fixings (cancellations).]

Fig. 2. Fixed variable diagram.
Figure 2 is equivalent to two lists of numbers, one for the signed indices, the other (a binary list) for marking the understruck indices.
The latter identify cancellations of variables which follow from the problem data at a particular point of the search (in time and location; only the latter being recorded in the diagram). In the particular instance of Fig. 2, variable y(9) has been fixed at 0 on level 0 (whether from logical tests or from a previous return of the search to level 0 after an earlier branch y(9) = 1 from l = 0 to l = 1 is not distinguishable from Fig. 2), variable y(2) has then been fixed to 1 explicitly in a branch (the simpler term "set to 1" is used sometimes) from level 0 to level 1, where variables y(4) and y(12) have then been fixed implicitly (or "cancelled"), etc. On the next return to level 0, variables y(4) and y(12) must be reset to "free" (their cancellation will in general have been contingent upon y(2) being at 1), and y(2) must be fixed to 0, before the next computational step (processing) at level 0.
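A minimal sketch (in Python; the class and its names are ours) of the book-keeping just described: one list of signed indices realizes the tree of Fig. 1, a parallel marker list records the understruck (implicitly fixed) indices of Fig. 2, and a backup frees the cancellations above the last explicit branch before complementing it.

class SearchTree:
    # Entry +j / -j means y(j) fixed at 1 / 0; implicit[k] is True when
    # the k-th entry records a cancellation rather than an explicit branch.
    def __init__(self):
        self.entries = []        # signed indices, in order of fixing
        self.implicit = []       # True for cancellations, False for branches

    def branch(self, j, value):  # explicit forward step (level increases)
        self.entries.append(j if value else -j)
        self.implicit.append(False)

    def cancel(self, j, value):  # implicit fixing at the current node
        self.entries.append(j if value else -j)
        self.implicit.append(True)

    def level(self):             # number of explicit, arbitrary fixings
        return sum(not m for m in self.implicit)

    def backup(self):
        # Free all cancellations above the last explicit branch (they were
        # contingent on it), then re-fix the branch variable at the
        # complementary value -- non-arbitrarily, hence as a cancellation.
        while self.implicit and self.implicit[-1]:
            self.entries.pop()
            self.implicit.pop()
        if not self.entries:
            return None          # level 0 and nothing pending: search done
        j = self.entries.pop()
        self.implicit.pop()
        self.cancel(abs(j), 0 if j > 0 else 1)
        return abs(j), 0 if j > 0 else 1

# The situation of Fig. 2 up to level 1, and the subsequent return:
t = SearchTree()
t.cancel(9, 0); t.branch(2, 1); t.cancel(4, 0); t.cancel(12, 0)
print(t.backup())   # -> (2, 0): y(4), y(12) freed, y(2) re-fixed at 0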
The free variables. Note that the above gives as yet no information about the free variables. But as stated in the introduction, we consider it a basic characteristic of enumeration that dispositions are made with respect to all free variables, or at least a non-trivial subset of them. More often than not, the diagram of Fig. 2 carries with it, as it certainly did in the initial realizations of the additive Balas algorithm, two unstated assumptions: (i) all free variables are temporarily taken to be 0, or (ii) all free variables are taken to have been initially defined such that their cost coefficients are non-negative (and (i) is assumed for the transformed variables).
In general, we can drop these assumptions with little serious loss (especially when we replace all the possible "feasibility tests" of 4.3.1 by general reduction procedures). We suppose rather that at each level (including the origin) and iteration the index set of variables will be partitioned into as many as five sets.
The state; partial solutions; processing. We shall identify the "state S^v" (or, also, S^(l)) as a partitioning of the index set J = {1, 2, ..., n} into five sets:

S^v: J = E + Z + F1 + F2 + F3,  (3.1)
with:
E = {j | y(j) fixed at 1},
Z = {j | y(j) fixed at 0},
F1 = {j | y(j) tentatively at 1},
F2 = {j | y(j) tentatively at 0},
F3 = {j | 0 <= y(j) <= 1}.

The sets Z and E are determined by the tree, i.e. recorded in a diagram as in Fig. 2, but the partitioning of the free variable set is at the analyst's disposal. We shall take the point of view that the sequence of trial vectors y^v is determined by the states S^v. I.e., we take it (see [78-80]), as do authors who
stress the careful consideration of partial solutions (e.g., [118]), mostly in connection with specially structured problems, that enumeration algorithms are characterized by the resolution of "state problems", i.e. of problems in which the free variables are constrained as described by the state. In contrast to this, of course, many algorithms only use the relaxed problem, i.e. a problem which could be characterized by a state in the trivial sense:
Completely relaxed problem:
F1 = F2 = ∅,  F3 = F.  (3.2)
Such algorithms should best be called "single-branch" [30, 42] or LIFO (last-in first-out) branch and bound algorithms ([110]). An alternative terminology is that of "partial solutions y(P)". P identifies the index set of variables to which binary values have been "assigned". A "completion" of y(P) is then any vector y in which all components with j ∈ P have been assigned the values of y(P) and all others have been assigned some binary value. In practice these concepts seem to be used with two different meanings, depending on what is meant by "assigned". In one case "assign" means fix, and P = E + Z; in the other case assign means both: fix y(j) at 0 or 1 for j ∈ E + Z, and "set" (as used in ref. [118]) y(j) to 0 or 1 for j ∈ F2 and j ∈ F1, respectively. In the first case the completions are all possible trial vectors for l' >= l; in the second case they are all such possible trial vectors with the additional stipulation that the y(j) for j ∈ F1 + F2 are left unaltered. We believe it best to describe the state (or partial solution set P) explicitly in terms of the sets E, Z, F1, F2, F3, and to spell out what successor nodes v' at l' >= l (always including later vectors y^v' at level l) will be considered.
Setting the state (especially at the origin); processing. The proper definition of the state is crucial at least for:
(i) finding feasible or near-feasible solutions;
(ii) getting strong integer inequalities (usually of the Benders type [12]) from the state problem;
(iii) deriving strong logical inequalities from the integer inequalities (original, or obtained as in (ii)).
Item (i) is clearly important, but items (ii) and (iii) are both more interesting and also more conjectural. The likelihood of their utility in (A) curtailing and (B) positioning and directing the search rests primarily upon evidence from computations in the plant location area (see 2.2.2 and the section on the "search origin"). One can take the point of view that the above is especially important at the commencement of the search, when E and Z are usually empty and when a minimum of information is available to work with. One may therefore be willing to expend a good deal more effort at processing for level 0 than for any other
level. The processing may involve: (A) Steps for Curtailing the Search:
(1) Logical preprocessing: reduction. I.e., attempts at fixing variables from the constraint set (and possibly other relations).
(2) Solution of the auxiliary problem with F3 = F. One may be willing to resolve a linear program, possibly followed by additional attempts at getting strong lower bounds on the objective function, or even integer feasible solutions, e.g. through use of penalty methods (e.g., see [45, 141, 138]).
(3) Going further than in (2), one may follow the linear program by employment of cutting-plane techniques (e.g., Gomory, Gomory-Johnson cuts [73, 75]; also intersection cuts, e.g. [6, 33, etc.]), even if one avoids such expensive computational devices elsewhere in the search. (See Section 4 about computational results.)
(4) One may resort to: dual methods (exact or heuristic resolution of the full dual versions of the auxiliary problem), [25, 43, 52, 46, 86], or relaxation methods (solution of special relaxations of the auxiliary problem, often arising from appropriate partitioning of the constraint sets in the primal problem), [47, 98, 101, 54, 53]. Such solutions are often easier to obtain than solutions to the primal problem.
(B) Steps for Positioning and Directing the Search: Positioning the search is equivalent to defining a state (a good partial solution). Directing it is equivalent to finding branches which will lead to good feasible solutions or to regions of the integer lattice in which reduction tests become effective in curtailing the search.
(5) Heuristic methods, both local and global, may be employed to find feasible or near-feasible points. The state may then be defined to correspond to the conjectured near-feasible point, [95, 96, 104, 112, 127, 128].
(6) The fractional solution of an auxiliary problem may be rounded, with a suitably chosen rounding parameter, to give an integer solution, which is usually infeasible but often a good starting point for further enumeration, [131, 129, 81].
(7) Auxiliary inequalities can be generated in many ways, e.g. as linear combinations of already existing inequalities ("surrogate constraints", [71, 7]; the optimal dual variables for the relaxed sub-problem have been shown to be "best" in a sense [63], but the computational improvements have been modest), or as Benders inequalities from problems with fixed values for y.
(8) Generation and exploitation of logical inequalities of "degree" (number of non-zero coefficients, always 1 or -1) larger than 1, [114, 92, 93, 24, 67, 137, 138]. I.e., the state can be chosen so as to satisfy the system of derived logical inequalities as closely as possible, by some measure (this has not been tested).
Note: It is essential, of course, to respect fully any logical relations which are in
the initial structure of the problem, such as:
- multiple choice equalities or inequalities;
- conflict relations (see the section on machine maintenance);
- precedence relations, etc.

3.1.3. Enumeration versus branch-and-bound

We are now in a position to reflect upon typical differences between, and relative merits of, branch-and-bound methods and enumeration. Table 1 summarizes some of these points.

Table 1. Branch-and-Bound versus Enumeration
Branch-and-Bound:
1. integer problem replaced by sequence of relaxed problems
2. relaxed problem resolved at each node
3. examined lattice points "close" to each other
4. surrogate and Benders inequalities tend to be weak
5. structural logical constraints are difficult to accommodate
6. mixed-integer problems handled easily and automatically
7. special-purpose codes difficult to code (when robust LP-code needed)
8. relaxed problem often degenerate

Enumeration:
1. integer problem replaced by sequence of constrained problems
2. full state problem often solved only at subset of nodes
3. examined lattice points can be far apart
4. generated inequalities tend to be strong (esp. in a "logical sense")
5. structural constraints can be handled to advantage
6. mixed-integer problems require generation and use of Benders inequalities
7. special-purpose codes easily constructed
8. state problem often not degenerate
Comments. (1) The constrained problems are the state problems. Note that for the pure integer problem, with all variables "set" (to F1 or F2), the state problem amounts to mere substitution. While this may be "weak", there may be other features of the enumeration which outweigh it. E.g., a specially imposed structure may make cancellations easy (directly or via logical inequalities), so that the triviality of the state problem becomes an advantage rather than a drawback.
(2) In both BB programming and enumeration it is not strictly necessary to resolve anything at every node of the search. Reflection shows that only "terminal" pending points (of locally maximal l, i.e. points from which backups occur) must be substituted if one is not to miss a possible solution. In the BB environment this observation is of little avail, since the entire scheme is intimately tied up with the resolution of relaxed problems at each node. In enumeration that is not quite so. See the machine maintenance problem (with search origin 0), for which only terminal points can be feasible, as an example [87].
(3) "Distance" is measured here in terms of the number of components which differ from each other. The imposition of different states at successive nodes of enumeration may be tantamount to jumping from a lattice point to a point which is close to the predecessor point only in the fixed variables.
(4) The judgment expressed in the table is based strictly on computational experience. Generating batteries of inequalities at successive nodes proved quite useless in experiments with "Simple Plant Location" schemes (preparatory work for [137]). Also, it rests partly on the assumption that the inequalities would be used more for bounding purposes in BB programming, and more for logical purposes (derivation of logical inequalities for fixing of variables and bound-reduction) in enumeration. Our experience has always been that they are not very useful for the first task, but highly useful for the second (especially when coming from widely separated points of the lattice, as would be more natural in enumeration).
(5) Point 5 is probably the most important point in favor of enumeration. It is clear that special equality constraints, such as multiple choice constraints, can be handled implicitly by enumeration, usually to great advantage. This is not to say that BB codes cannot employ special devices as well (see e.g. "SOS" sets, as introduced in [20, 21] and incorporated in many BB production codes). But our perception is that enumeration will still outperform BB in this area (with the notable exception of "S2 sets" ([20, 21]; see also [36] and [87] for applications), which are an important adjunct to branch-and-bound programming).
(6) Point 6 is the most clearly BB oriented item. Nevertheless, there are numerous special problems for which enumerative methods may still be preferable, usually by virtue of point 5. Examples are again problems with imposed multiple choice and/or conflict structure (see machine maintenance), and plant-location and fixed-charge network flow problems (for which concentrating on the all-important zero-one decision variables, namely those associated with the "building (or operation)" of plants, can be handled more easily by enumerative techniques; this point can probably be disputed by BB analysts).
(7) The coding of a robust linear programming code is a non-trivial task. Many very fine BB Fortran codes have been abandoned, in spite of all the sophistication which had been lavished on them, because they were found eventually to be inadequate to the task of handling really large linear programs. This difficulty is much less serious for many enumerative situations, in which the linear program is not absolutely essential, but only an auxiliary tool which can be abandoned when necessary. The situation can potentially be remedied when large-scale LP packages provide the interfaces required by the integer programming analyst. MPSX/370 of IBM has a PL/1 interface which has such a potential, [22, 23].
(8) Contrary to rumors about the "moribund" state of linear programming (in the sense of the analyst to whom moribund means that there is no more challenge left; [28]), large scale LP problems can still be inordinately difficult on account of degeneracy, and there seems no prospect for easy remedies. The node problems of the BB approach may therefore prove to be too difficult, and enumerative approaches (or heuristic ones, which are intrinsically more related to enumeration than to BB programming) may be required. An outstanding example of this is the airline crew scheduling problem [127, 128], which was resolved in a practical
sense by a carefully designed heuristic approach of the type used by Lin for the travelling salesman problem [112], aided by auxiliary problems of the set-covering type. After all of the above has been said, most of it in favor of enumeration, one must not forget that the actually used programs, especially on a production level, are of the BB type and will likely remain so. A determination as to what is the right approach is not always easy to make, and this section is only meant to give some of the possible arguments.

3.2. The algorithm

Having said a good deal about structural and a little about computational matters, we can sketch a typical algorithm in broad terms.

3.2.1. General scheme

The flow in the following is such that one proceeds to the next line unless told otherwise.
1. Initialization. Set l = 0, v = 1. Let T be a given tolerance. Set data, problem bounds, etc. Initialize all variables to free. Permit input of an imposed state S^1 = {E, Z, F1, F2, F3}.
2. Processing (local computation).
2.1. Preprocessing.
2.1.1. Reduction procedures. Apply reduction procedures to fix variables and to shrink bound-intervals (this should to a large extent subsume the many known feasibility tests of integer programming). If the problem is infeasible, go to 4.
2.1.2. Penalty type feasibility procedures. Given a set of penalties p(k) (computed here or in prior calculations), and/or expressions for the objective function z in terms of the variables (typically an objective function row of an LP tableau), perform "ceiling tests", such as: "is z^v + p(k) >= z* - T?" for all k. The objective again is to establish possible infeasibility, or to fix variables and reduce bound intervals.
2.1.3. If no auxiliary processing is requested, go to 2.3.
2.2. Auxiliary processing.
2.2.1. Solve the relaxed version IPR^v (or MIPR^v) of the problem. If it is infeasible, or if its objective function >= z* - T, go to 4.
2.2.2. Compute penalties and repeat penalty type feasibility tests.
2.2.3. If so desired, apply rounding, or heuristics, or cutting plane methods. If applicable, test the objective function as in 2.2.1. Record any newly found improved solution. Record a new Benders inequality, if so desired.
2.2.4. If an imposed or inherited state is to be used, go to 2.3.2.
2.3. The state problem.
2.3.1. Determine a new state S^v, usually from IPR^v or from logical inequalities. Make sure to respect the inherent structure of the problem.
2.3.2. Solve the state problem.
2.3.3. Record any new improved solution.
2.3.4. Generate and store the new state inequality (usually a Benders type inequality).
3. Forward step.
3.1. Strategy program. Determine the new branch index j*, e.g., from
- penalties
- logical inequalities
- the imposed structure, etc.
In conjunction with the above, e.g. after having narrowed the choice to a "candidate set" JC, one often selects the final branch via a criterion related to the prospective overall infeasibility at node v + 1 (e.g., the Balas criterion [3], which minimizes the sum over the prospective row infeasibilities).
3.2. Update problem data. Set l = l + 1, v = v + 1. Go to 2.
4. Backward step. If l = 0, terminate. Update problem data. Set l = l - 1, v = v + 1. Go to 2.
3.2.2. Special versions

As sketchy as the algorithm of Section 3.2.1 is, it still involves more steps than are needed for most special applications. For such problems, one or the other consideration often far outweighs everything else in importance. One may then do quite well by concentrating on it and implementing the rest of the algorithm in bare outline. A special version of the scheme might be one (similar to the one used in maintenance scheduling; see Section 5.3) in which each branch results in an increment to the objective function of c(j*) > 0, and all that needs to be checked on the one hand is whether the problem is (or can become) feasible, and on the
other hand whether the cost is still below the incumbent cost, while at the same time strong cancellation tests are available from the problem structure.
1. Initialization. v = 1, l = 0.
2. Processing.
- Fix all variables which would violate the multiple choice and conflict relationships if set to one.
- Is y^v feasible, and can it be completed feasibly? If y^v is feasible and constitutes an improvement, record it and go to 4. If y^v cannot be completed feasibly (if there is no branch candidate), go to 4.
3. Forward step. Choose j* in terms of "preferred sets" and minimal costs. Modify data, set l = l + 1, v = v + 1, go to 2.
4. Backup. Update tree. If l = 0, terminate. Set l = l - 1, v = v + 1; go to 2.

For an applications analyst it would appear well worthwhile to acquire the skill of setting up an enumeration procedure in skeleton (one such skeleton is sketched below). Thereafter, it is usually not difficult to flesh out the algorithm by the incorporation of tests and strategies which exploit the structure of a particular problem. On the other hand, it is quite difficult, if not impossible, to devise a general program which will accommodate a wide variety of problem-data combinations without becoming ineffective because of a lack of technical tools on the one hand, or so overloaded with devices as to be inefficient on the other.
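The following is one possible skeleton of the special version above (Python; all names are ours, and the problem-specific tests are passed in as stubs to be fleshed out per problem). Free variables are tentatively at 0, and each branch y(j) = 1 adds c(j) > 0 to the cost.

def enumerate_skeleton(n, cost, is_feasible, can_complete, cancels, prefer):
    # The problem-specific parts are supplied as functions:
    #   is_feasible(fixed)  -- is the trial vector (free variables
    #                          tentatively at 0) feasible?
    #   can_complete(fixed) -- can it still be completed feasibly?
    #   cancels(fixed)      -- free variables forced to 0 at this node by
    #                          the multiple choice / conflict relations
    #   prefer(free, fixed) -- branch choice ("preferred sets", min cost)
    # fixed maps index -> 0/1; cost[j] > 0 is assumed for every j.
    z_star, incumbent = float("inf"), None
    stack = [({}, 0.0)]                    # (partial fixing, cost so far)
    while stack:
        fixed, z = stack.pop()
        fixed = dict(fixed)
        for j in cancels(fixed):           # processing: implicit fixings
            fixed[j] = 0
        if z >= z_star or not can_complete(fixed):
            continue                       # ceiling test / backup (step 4)
        if is_feasible(fixed):
            z_star, incumbent = z, fixed   # record improved solution
            continue
        free = [j for j in range(n) if j not in fixed]
        if not free:
            continue
        j = prefer(free, fixed)            # forward step (step 3)
        stack.append(({**fixed, j: 0}, z))            # pending complement
        stack.append(({**fixed, j: 1}, z + cost[j]))  # branch y(j) = 1
    return z_star, incumbent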
3.2.3. Computational considerations

We need some idea as to the relative computational costs of various devices. There exist a number of recent studies about the effectiveness of enumerative schemes, such as [11, 118, 120, 122, 124], and we shall try to draw some inferences from them. This is not so easy, since some of these studies are more of the branch-and-bound than of the enumerative type. Because of the diversity of the approaches, it is probably best to give a brief discussion of each paper first. Particular points can be discussed further in later sections of the paper. In the second part of this section we draw some inferences from our own experimentation, usually carried out on relatively small problems in the interpretive language APL. The emphasis there is primarily on cutting down the number of linear programs which need to be resolved. However, we shall give a small table of relative timings which helps to give an idea as to the overall efficiencies one might be able to achieve in a different environment.
A. Computational experimentation. (1) Reference [124] takes medium sized problems from the literature [31, 91, 114, 123, 104]. The algorithm is essentially single-branch (LIFO) branch-bound. A carefully coded dual linear program is used at every node. Some conclusions have pertinence for enumeration.
- Nine branching criteria are compared. Penalty methods perform best, with a slight edge going to selecting basic variables which are not "quasi-integer" (here within 10% of integrality) and give the largest penalty, over using the improved penalties of [141] and over using those variables which show maximal difference between "up" and "down" penalties. "Pseudo-penalties" (see [22, 23, 51]) fare worse (they are presumably best for large-scale BB programming), as does the Balas criterion (see Section 4.4), probably because of the algorithm's heavy reliance on LP rather than enumeration.
- As has been observed by many authors, and has given rise to the "flexible origin" methods (the meaning of which is somewhat misinterpreted by the author) in [136, 131, 129], some of the harder problems prove intractable if the algorithm starts off in the wrong direction (as it is highly likely to do). The author then proposes a "flexible backtracking" approach, which consists of selecting a node other than the last one, and splitting the tree judiciously to avoid redundancy of computation at the price of storing multiple trees. While this can surely lead to added storage requirements which may prove burdensome, the author is able to solve difficult problems (esp. the (28, 89) problem of [114]) which are not easily resolved otherwise.
- "Surrogate constraints (Benders constraints?)" (see [71, 114, 7, 63]) are generated (after each LP, presumably) and used in blocks of 4, m (the number of constraints), or 30. Surprisingly, an increase in the number of such constraints does not affect the performance significantly. This appears to be in some contradiction to the improvement derivable from the use of surrogate constraints, as reported in [71]. The discrepancy may lie in a difference in the use to which the constraints are put (presumably some type of logical examination: "reduction"), or the second quote may refer to the improvement afforded by the use of a few surrogate constraints vis-a-vis the case that none are used at all. In mixed-integer programming, of course, Benders inequalities are essentially the only type of inequality available for the capture of strictly logical information (the algorithm of [114] can be considered an extension of the additive algorithm to mixed integer programming, as can the algorithm of [7], and also those of [70, 79]). But these constraints are usually (typical for genuine enumeration) taken from a state problem rather than from the relaxed problem as here.
- The logical tests of the experiment appear to consist of detection of overall infeasibility (of a sub-problem) and of possible cancellations. I.e., they are equivalent to the detection of logical inequalities of degree 0 and 1 (see Section 4, in particular 4.3.3). The test problems were run with and without such logical
tests, and the timings were uniformly better with logical tests, improvements running on the order of 10% to 100%.
- Local searches were conducted, and again a uniform dominance of runs with searches over those without is reported.
(2) Reference [11] is really concerned with branch and bound programming, but has results which are equally pertinent to enumeration. The stress is again on a reliable LP system with frequent reinversions and a good deal of flexibility.
- There is evidence for the desirability of penalty-improvement, the work of Gomory and Tomlin being cited (see [74, 75, 141], for example). The combination of penalty information with logical inequalities is argued.
- In general, a strong case is made for the use of logical inequalities of degrees 1 and 2, but doubt is raised about the desirability of degrees > 2. Special cases of inequality systems of degree 2 are discussed here, as they are also in [92, 137, 138]. The overall impact of (simple) logical methods appears to be quite favorable, leading to consistent reductions in running times, perhaps from 10% to 90%.
- The paper also makes a case for the combination of penalty and feasibility considerations (see "multiple criteria" in Section 2.5.1) and for the heuristic selection of parameters (see also "adaptive procedures" in Section 2.5.2).
(3) References [118, 140] present some excellent results for enumeration in the case of specially structured problems. There is stress upon careful design of the data-processing aspects. The overall emphasis is on special structures. Important problem areas covered are: assembly line balancing, resource-constrained network scheduling, distribution, time-table scheduling, and scheduling in general.
- Variables are preordered (by cost, usually).
- The constraint matrix is partitioned in terms of the signs of its coefficients. Chaining methods are used for linking non-zero entries of equal signs.
- Multiple choice constraints are taken care of implicitly.
For a diverse list of special problems, with as many as 1000 zero-one variables and about 50 constraints, plus 20 to 50 multiple choice constraints, the authors report solution times (for complete enumeration) from a few seconds to 90 seconds on a CDC CYBER 72 machine. When compared to other codes, some of them general-purpose to be sure, some improvements of two orders of magnitude were found. The number of general constraints appears to have little influence in this work, but the execution times grow exponentially with the number of SOS sets.
(4) Reference [120] treats implicit enumeration with auxiliary LP problems. The tests are on relatively small test problems. A substantial number of strategies are tested, with relatively few striking differences.
- Surrogate constraints are found to be of relatively little impact.
- Using LP auxiliary programs seems clearly an improvement over using only logical methods (the problems are general in nature, or small enough so that structure is not important; also, the logical tests appear to be relatively weak).
- One interesting test about the desired frequency of LP-calculations
appears to show that little is gained by solving the LP at every node. It appears preferable to solve it only at certain iteration intervals, e.g. at every fifth node only. Clearly, the most appropriate timing would be problem dependent.
(5) Reference [122] compares a number of computer codes in the public domain (RIP30, DZIP1, DZLP, HOLCOMB). The tests are over small zero-one problems from the literature, without structure (or with structure not exploited). For these problems, RIP30 and DZLP (with imbedded linear programs) outperform the other two codes, being fairly evenly matched among themselves. RIP30 has an efficient LP subroutine and exploits surrogate constraints, [63]. DZLP, on the other hand, has a rather weak LP code, but uses logical inequalities ("preferred variables" in a somewhat weak form, [131]), and does not use surrogate constraints.
B. Small scale experimentation in APL. Many of the ideas related to state enumeration and the use of logical inequalities have been tested in an APL environment in [79-83]. The flexible nature of the language makes it an ideal medium for experimentation, but at the price of reduced execution speed. As a consequence one tends to solve relatively small problems, concentrating on insight rather than solution by sheer speed. For the relatively easy (28, 35) and (12, 44) test problems of [114], for example, the various devices are almost always successful in resolving the test with few ALP's, often only one. For larger, moderately well-behaved problems, such as the (37, 74) problem of [114], the devices are manifestly effective, but the problem is large enough so that one is not tempted to resolve it completely, even though it is apparent that there would only be a moderate number of required nodes. Other problems are more intractable, a good example being the (20, 28) problem of [31] (of which we have two versions of similar difficulty). One can resolve such problems by state enumeration in about 80 ALP's (one per node). This compares with several hundred ALP's with even well-designed BB methods. Sophisticated cutting plane techniques (as implemented by E.L. Johnson) still appear to require close to 100 auxiliary linear programs. The measure in all of this experimentation has been taken to be the number of ALP's. To facilitate comparison of such results with others which are measured in a different manner, we give a set of relative timing comparisons for different computational devices (see Table 2). While the reduction procedures are time consuming, they are clearly dominated by the LP computations. Even if this were not the case to a sufficiently convincing extent, there would be classes of problems (namely large-scale structured mixed integer problems with relatively few integer variables) for which the ALP would involve expensive input-output operations on the computer, whereas reduction procedures could be designed so as to operate on small associated inequality systems in high-speed main storage.
Table 2. Time in seconds for execution of various computational devices (APL/CMS System, IBM 370-158)

PROBLEM    LP     PREPROC   BOUNDED   REDUCTION   GEN
(20, 28)   3.45   0.89      0.81      0.4         0.63, 0.34 (d = 1)
(37, 74)   40.6   13.6      3.34      1.3         5.6 (d = 2)

LP, linear program. PREPROC, preprocessing (fixing of variables by reduction, plus ceiling tests). BOUNDED, reduction of bound intervals [144, 108, 81], see (4.3.2). REDUCTION, determination of degree for the system, plus fixing of variables for d = 1. GEN, generation of logical (minimal preferred variable) inequalities.
3.3. Restricting the solution space via inequalities

We are concerned here with the structure of the search, not so much with details (see Section 4). What can be done so as to restrict the overall solution space?
3.3.1. General associated inequalities

We have discussed reduction in Sections 1 and 2. Here we ask what information can reasonably be carried along in the enumerative scheme so as to facilitate it. It appears that, in principle, one can work with two sets of associated inequalities, one for the (0, 1) integer variables, the other for the general variable set.
(1) Integer associated inequalities (see eq. (1.3)):

By <= r + δz*.  (3.3)
At a node, the right hand side is given (after substitution of the incumbent values for y and z*), and (3.3) can be used directly for feasibility tests, and for the derivation of logical inequalities ("reduction"):

Qy <= q.  (3.4)
Variables can be fixed from (3.4) when its "degree" is 1, i.e. when there are preferred inequalities of type ȳ(j) >= 1.
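A sketch of how (3.3) might be exploited at a node (Python, with a dense representation; all names are ours): the fixed variables and the incumbent z* are substituted, and each row is then reduced, detecting either infeasibility (degree 0) or forced values (degree 1).

def reduce_associated(B, r, delta, z_star, fixed, n):
    # Feasibility and fixing tests from the system  By <= r + delta*z*
    # (3.3); B is a list of rows, fixed maps j -> 0/1 for the variables
    # fixed at the current node.  Returns "infeasible" (degree 0) or a
    # dict of values forced on free variables by preferred inequalities
    # of degree 1, cf. (3.4).
    forced = {}
    free = [j for j in range(n) if j not in fixed]
    for Bi, ri, di in zip(B, r, delta):
        rhs = ri + di * z_star - sum(Bi[j] * v for j, v in fixed.items())
        worst = sum(min(Bi[j], 0) for j in free)   # smallest attainable lhs
        if worst > rhs:
            return "infeasible"
        for j in free:
            others = worst - min(Bi[j], 0)         # smallest lhs over the rest
            if others + Bi[j] > rhs:
                forced[j] = 0                      # y(j) = 1 cannot be feasible
            elif others > rhs:
                forced[j] = 1                      # y(j) = 0 cannot be feasible
    return forced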
(2) Mixed associated inequalities. Consider enumeration with occasional resolution of IPR (MIPR) as the relaxed auxiliary problem ALP. The top row of the final tableau (only the feasible case is of real interest), in the free non-basic variables x_N, s_N, y_N, is:
z = z_R + α_N x̃_N + β_N s̃_N + γ_N ỹ_N,  (3.5)

with (α_N, β_N, γ_N) >= 0. The tildes are used to indicate possible complementations of
the variables (relative to their upper bounds), as introduced during the LP solution procedure. This information must be retained. Note that s and x can be dropped to yield a resulting modified Benders inequality z* >= z_R + γ_N ỹ_N. We may include it in (3.3), but we usually assume that system (3.3) derives from substitution of trial vectors y^k, not from the relaxed problem. We suggest the serious consideration of paralleling the Benders system (3.3) with a system of inequalities of the form

(α_N, β_N, γ_N)_k t̃^k <= r_k + δ_k z*,  (3.6)
derived from a sequence of relaxed tableaux. The constraints involving z* stem from the top row; the other constraints (if they should be interesting; some diagnostic testing might be required for ascertaining this) would be derived from the constraint rows. Suitable candidates might be cutting planes evolving from fractional basis rows. System (3.6) is used for the same purpose as (3.3), except that one will also attempt to obtain information (such as possible shrinking of bounds) for the continuous variables x(k) and slack variables s(i). However, both storage and processing of (3.6) are more difficult than corresponding work with (3.3), especially because the vectors t^k differ from row to row. One inequality of type (3.6), arising from the top row of an initially resolved ALP, has been used with some success in [81, 82].

3.3.2. Restriction to special sets of variables

While the fixing of variables from feasibility tests or reduction is the most important device for curtailing a search, there are other devices to which one must resort when fixing is not possible. A basic consideration is to realize, at each node of the tree, that the search can almost always be restricted a priori to a subset of the free variables. I.e., one has the option of limiting one's choice of the free variables. The purpose of reduction is not so much the fixing of variables (which could be accomplished differently as well), but the identification of small preferred variable sets and the direction of the search into areas of the integer lattice with small preferred variable sets. Details are given in Sections 1, 2 and 4. Here, we shall consider two related special instances. Special subsets of variables can sometimes be identified by interpretation of the special problem at hand. Consider a Simple Plant Location Problem. At level l of the enumeration let a state be given (see equation (3.1); also reference [79]):
S^l = (E, Z, F1, F2)^l, such that the problem is feasible. Let I ⊆ F1 be such that closing plant i ∈ I (setting the integer variable y(i) = 0) results in an infeasible state (this can always be checked with ease).
Then a preferred set is T = F2 ∪ (F1 − I), for it is not necessary to render the problem infeasible through an explicit forward step of the algorithm. If there is a feasible possibility among successor nodes to the node at level l, it must involve the opening of a plant i ∈ F2 at some level l' > l (setting y(i) = 1), and the feasible point can equally well be reached by opening the plant i already at level l. A small sketch of this preferred-set computation is given at the end of this subsection.
Finally, what makes BB programming so surprisingly powerful is the ease with which it often (alas not always) restricts the search to a small number of the free variables. Fixing a fractional integer variable y(j) at 0 or 1 can, in a sense, be considered as the imposition of a degenerate preferred inequality, namely the disjunctive condition y(j) + ȳ(j) ≥ 1, interpreted as "either y(j) = 1, or y(j) = 0".
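The following toy sketch illustrates the preferred-set computation; it is not the procedure of [79] or [136], and the capacity-versus-demand test merely stands in for whatever state feasibility check is actually available (the Simple Plant Location Problem itself is uncapacitated).

    # Illustrative only: plants in F1 are open, plants in F2 closed but
    # free; open_capacity is the total capacity of all open plants.

    def preferred_set(F1, F2, capacity, demand, open_capacity):
        """Return (I, T) with I = free open plants whose closing is
        infeasible, and T = F2 | (F1 - I), the preferred set."""
        I = set()
        for i in F1:
            # After closing i, the best case still opens all of F2.
            if open_capacity - capacity[i] + sum(capacity[j] for j in F2) < demand:
                I.add(i)
        return I, F2 | (F1 - I)

    cap = {1: 40, 2: 30, 3: 25}
    I, T = preferred_set({1, 2}, {3}, cap, demand=60, open_capacity=70)
    print(I, T)    # {1} {2, 3}: closing plant 1 leaves at most 55 < 60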
3.3.3. Sequential methods
We shall use the probably ill-chosen term sequential methods for certain attempts, other than tree-based searches, to generate a non-redundant (if possible) sequence of trial vectors y^k. In this we explicitly disregard standard applications of Benders partitioning methods [12] and lexicographical methods (e.g., [106, 125]). Essentially one is tempted to attain the frequently favorable behavior of BB methods (namely that the fractionalities tend to disappear at a reasonable rate when variables are fixed; i.e., new fractional variables do not tend to appear) in the realm of enumeration. Often such attempts are natural for structured mixed integer programming.
One very simple example is the fixed charge transportation problem, in which there are possible fixed charges at all m × n routes (one zero-one variable y(i, j) for every route (i, j)). Suppose one starts with the simpler problem (of the same mixed-integer mold, unfortunately) which has only the m + n − 1 variables y(i, j), (i, j) ∈ IJ^1 (the set of routes picked by a transportation program without fixed charges). If one solves this smaller problem and its optimal solution remains in IJ^1 (i.e., x(i, j) > 0 only for (i, j) ∈ IJ^1), then the solution is clearly optimal for the overall problem. If not, one may enlarge the set of integer variables to IJ^2 ⊇ IJ^1, etc. One obtains a sequence of (possibly) small mixed integer problems which may be better behaved in its totality than the original problem (see the sketch below). Of course, everyone is familiar with replacing a Travelling Salesman Problem by a sequence of Assignment Problems plus constraints which interdict sub-loops. In this instance the problems are simple, but the likelihood of finding a solution with no sub-loops is small.
Somewhat more generally, a recent paper, [134], examines a "long and thin" pure (0, 1) problem (of Capital Budgeting genesis), i.e., a problem with small m and large n. An initial ALP, say ALP^1, can have no more than m_1 ≤ m fractional variables, say for j ∈ J^1, |J^1| = m_1. Fix the integer-valued integer variables at their values in ALP^1. Enumerate over j ∈ J^1, and record any solutions thus found. Let ALP^2 be the relaxed LP plus one constraint prohibiting the initially fixed integer vector (y(j) integral for j ∈ J^1). Clearly one preferred inequality Σ ỹ(j) ≥ 1 suffices, with ỹ(j) suitably defined. Solve ALP^2 and repeat the procedure.
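A schematic sketch of the growing-set idea, written for the fixed charge transportation example above (solve_restricted is hypothetical: it stands for any solver of the smaller mixed-integer problem, returning the set of routes carrying positive flow in its optimum):

    def sequential_fixed_charge(initial_routes, solve_restricted, max_rounds=20):
        """Grow the set IJ of routes carrying 0-1 variables until the
        optimum of the restricted problem stays inside IJ."""
        IJ = set(initial_routes)        # e.g. the m+n-1 routes of a plain
                                        # transportation optimum
        for _ in range(max_rounds):
            used = solve_restricted(IJ)
            if used <= IJ:              # solution remains in IJ: it is
                return IJ, used         # optimal for the overall problem
            IJ |= used                  # enlarge IJ^k to IJ^(k+1)
        return IJ, None                 # give up after max_rounds

    # Toy stand-in for the solver: it first uses a route outside IJ,
    # then settles down.
    answers = iter([{(1, 1), (2, 2)}, {(1, 1), (2, 2)}])
    print(sequential_fixed_charge({(1, 1)}, lambda IJ: next(answers)))
    # ({(1, 1), (2, 2)}, {(1, 1), (2, 2)})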
One solves a sequence of auxiliary linear problems (augmented by one added inequality at each iteration), followed by enumeration of small size (at least for small iteration numbers). The authors treat several problems of respectable size (10 by 100 to 10 by 200, e.g.), and achieve integer solutions within 1% of the optimal integer value with often little computational effort (on the order of 30 iterations).

3.4. The search origin (global or local)
With difficult problems the search may degenerate into a useless thrashing about at high search levels if (i) the origin of the search is indifferently chosen, or (ii) the initial branching decisions are poor. Even if the algorithm is clever and uses a varied arsenal of techniques, little information about the problem is available at the outset, so that the search may commence badly. Special precautions should be taken, if possible, to counteract such a tendency.
In what follows, it should be kept in mind that there are always two possibilities of somewhat opposite nature. Either: (a) one determines an origin (or state) from fairly simple considerations (heuristic or not), then solves a state problem, and obtains dual variables, Benders inequalities and logical inequalities from it; or: (b) one has an intrinsic dual method for obtaining good dual variables, and determines an origin (state) from it and the complementarity conditions. It is likely that approach (a) is suitable for general problems, and approach (b) more suitable for problems with known specific structure.

3.4.1. Generalized search origin
Implicit Enumeration is simplest when starting at a "zero-origin" (0, 0, ..., 0), with variables chosen such that the cost coefficients are non-negative. Various feasibility tests are then particularly simple to execute. More general origins were occasionally proposed in the literature, e.g. by F. Glover in connection with Gomory's All-Integer Integer Algorithm [70, 73], but attempts at overcoming numerically poor behavior via a general search origin appear to originate with the Simple Plant Location Problem [136].
For that problem there are essentially two "natural" search origins, one corresponding to a start with all plants open, the other with all plants closed. In both cases the associated Benders inequality coefficients can be characterized, and computed or updated, as functions of the dual variables ("gain function"). A negative value of a gain function g(i) at level l permits the fixing of variable y(i) at level l. A number of algorithms for the two natural origins were given in [135]. More recent work [55, 8, 37] has further established the importance of
the gain functions (related to submodularity) and the good behavior of one of the algorithms (as a "greedy" heuristic).
For large problems, however, and some distributions of transportation costs and fixed charges, these "natural" origins may be quite bad. One is then well-advised to start with a good guess at the solution (not difficult to obtain for plant location problems: e.g., see [26, 46, 86]) and place it at the search origin. The algorithm becomes more complex (e.g., negative gain functions are no longer sufficient for cancellation), and the computational effort correspondingly increases. But for a number of difficult problems it was shown in [136] that, given appropriate new feasibility tests, the search tends to be low-level for well-chosen origins. In some cases the searches were terminated at the generalized origin (chosen via solutions from an initial phase of the enumeration) by cancellation of all variables.
The analogous procedures were introduced for the general problem in [131] and [129], with similar but not quite so striking results. Periodic restarts of the search at new feasible solution points were found to be worthwhile even at the cost of some loss of previous work.
The relatively good behavior of a search about a reasonable starting point is quite analogous to the behavior of a non-linear search about a local optimum (or the behavior of a particle in the neighborhood of a local minimum of potential in mechanics or electrostatics). Such characteristics can apparently best be captured in terms of the associated Benders inequalities (and derived logical inequalities), or alternatively in terms of the locally optimal dual variables.
Large problems may of course have many local optima. In such a case one might decide to perform sequences of (low-level) local searches, such as suggested by [112] for the travelling salesman problem, and highly successfully used by [127, 128] in airline crew scheduling (related to set covering). These approaches are heuristic in nature, for it is not known whether the various local searches can be "spliced together" effectively (e.g., by means of additional inequalities) so as to guarantee overall non-redundancy.
3.4.2. Dynamic methods
We use the possibly ill-chosen term "dynamic methods" for attempts at adjusting the search from node to node, essentially via assignment of the currently best state for the free variables. I.e., one has a general state S^ν at every node ν, and adjusts the partitioning (F1, F2, F3)^ν as best one can at the node. The proof of efficacy for such procedures (as advocated in [78-80]) is still not overly convincing. The desirability of such an approach is fairly clear in principle, but implementations are not particularly easy. Various dual methods, as described for instance in [5, 19, 26, 46, 47, 52, 53, 86, 98, 114, 133, 135, 136], followed by the use of complementarity conditions, may be essential for the determination of good "dynamic states", especially for structured problems.
3.4.3. Restructuring of trees
There have been some recent attempts at gaining benefits similar to those of 3.4.1-3.4.2 by procedures which avoid redundancy of computation. They can be viewed as attempts at a redrawing of the search tree. The flexible backtracking procedure of [124] rests on the possibility of splitting a single-branch (LIFO) tree at any one of its nodes into two trees. An example suffices to explain the idea:
Fig. 3.
See Section 3.1.2 (Fig. 2) for the conventions about tree representation. In effect one has chosen to work with the alternate possibility y(2) = 0 at level 1. Whereas in the original tree the "alternate" must await the return of the search to level 1 at some future, and possibly remote, point of time, it is explicitly represented by the first tree on the right of Fig. 3, for possible immediate processing if one so desires. The scheme is simple enough and the difficulty lies in the development of criteria for deciding on suitable backtracking. Reference [124] is not very specific about this matter, but presents good computational experience with a difficult (28, 89) test problem of [114], which does not respond to enumeration otherwise.
Reference [29] concerns itself with the potential redrawing of a "multi-branch" tree (i.e., a "general", not LIFO, BB tree), with many pending nodes. As an illustration, let the 8 possible pending nodes of a three variable tree be represented by:
ABC ∨ ABC̄ ∨ AB̄C ∨ AB̄C̄ ∨ ĀBC ∨ ĀBC̄ ∨ ĀB̄C ∨ ĀB̄C̄,

with A, B, C each representing one value for the variables y(1), y(2), y(3), respectively, and bars denoting the complemented values. Then the question treated is how to replace a pair such as ABC and ABC̄ by a single node AB. I.e., one has to determine that the given dichotomy is "insignificant", and then one has to redraw the tree so as to remove it and replace it by AB. A systematic approach for such "merging" is presented, but no computational results are as yet available.
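A toy sketch of the merging step itself (the hard part — deciding that a dichotomy is "insignificant" — is not modelled; representing nodes by dicts of fixed values is an assumption of the sketch):

    def try_merge(node_a, node_b):
        """Merge two pending nodes that agree except on one variable,
        e.g. ABC and ABC' into AB; return None if they do not."""
        if set(node_a) != set(node_b):
            return None
        diff = [v for v in node_a if node_a[v] != node_b[v]]
        if len(diff) != 1:
            return None
        merged = dict(node_a)
        del merged[diff[0]]             # free the dichotomous variable
        return merged

    print(try_merge({'A': 1, 'B': 1, 'C': 1}, {'A': 1, 'B': 1, 'C': 0}))
    # {'A': 1, 'B': 1}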
4. The local computations: processing at nodes
One may say that Section 3 was devoted to "strategical" considerations, and concentrate on "tactical" matters in Section 4: "What can be done at a node to resolve the local problems expeditiously?". Of course, the dividing line is somewhat tenuous and one must never lose sight of strategical requirements.
4.1. Bounding: relaxation
Problems IP (or MIP) (1.1 or 1.2) are difficult. One therefore either: (i) relaxes the problem, e.g. by dropping or weakening constraints of the problem (often, but not exclusively, the integer constraints), so as to generate the relaxed problems IPR (or MIPR), or (ii) relaxes an integer programming solution procedure (which could in principle resolve IP or MIP), so as to generate a weakened procedure (often, but not always, by terminating an exact procedure when it bogs down).
Dual methods are concerned with problems dual to IPR (or MIPR), and amount to a further weakening of the problem when treated heuristically, meant to be offset by better tractability. Penalty procedures are attempts at recapturing, to an extent, the effect of the integrality requirement on a variable which has been dropped in a relaxed problem.

4.1.1. Relaxed problems and procedures; disaggregation
Little need be said about the relaxation of problems. Branch and Bound Procedures are almost entirely based on finding and exploiting relaxed versions of the originally given problem. Specially structured problems have, as is almost always the case, more technical interest than the general problems. For general problems one obtains the relaxed problem usually via replacement (for 0-1 problems) of:

y(j) ∈ {0, 1} by 0 ≤ y(j) ≤ 1.
For special problems one often replaces the given problem by another with fewer constraints, e.g.:
- the Travelling Salesman Problem by an assignment problem [113],
- the Travelling Salesman Problem by a spanning tree problem [98],
- the Capacitated Plant Location Problem by a transportation problem, etc.
Relaxation procedures [47, 98] place intractable constraints into the objective function (with "dual variable" or Lagrangean multipliers; for an exposition see [65]).
Highly important is the fact that "integer-equivalent" IP formulations often result in relaxations of widely different strengths. In particular, the "disaggregation" procedure of replacing a "linking constraint"

Σ_{i∈J} x(i) ≤ My    (4.1)

by the set of constraints:

x(i) ≤ M_i y,  i ∈ J,    (4.2)

usually results in substantially strengthened relaxations (see [25, 43, 135] for plant location, [132, 143] for somewhat more general problems).
Equally interesting, and somewhat less exploited, possibilities are the partial resolutions of integer programming algorithms:
- imposition of cutting planes,
- resolution of group problems as approximations to the integer problem,
- low level searches as approximations to complete enumeration, etc.

4.1.2. Dual methods
Let IPR (with objective function z_R) have a dual problem DPR. Any feasible solution has an objective function value z_D which is a lower bound for z_R and the integer objective function. One's goal is to find good dual solutions with tight bounds.
The major interest is in heuristic methods for obtaining good dual solutions. It appears that excellent heuristics can often be found when solving the linear program for an optimal solution is expensive. This is true especially for large structured mixed integer problems, such as plant location [25, 46, 86], scheduling [52, 53, 54, 133], etc.
The great promise of dual approaches was demonstrated conclusively in the relaxation procedures of Held-Karp for generating and updating suitable dual variables in the case of the travelling salesman problem [98]; see also [101]. Other scheduling and multi-commodity problems lend themselves to treatment by similar techniques. The enlarged (disaggregated) formulations of certain auxiliary problems lend themselves particularly well to dual treatment, since direct solution may be too cumbersome. The methods of [25, 135] can be viewed as dual equivalents to the method of [43].
In all of the above there remains the residual problem of finding good primal solutions. One may use the obtained dual variables to: (i) suggest guesses at primal solutions from the complementarity conditions, or to (ii) give rise to "Lagrangean Objective Functions" for a relaxed primal problem. Alternatively, one may be able to close the "duality gap" by simple enumeration or branch and bound schemes if the bounds are particularly good or if the problems are not too large.
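A small sketch of a heuristic dual scheme of the subgradient type ([98, 101]; the toy data and the diminishing step rule are assumptions of the sketch): for min cx, Ax ≥ b, x ∈ {0, 1}^n, relaxing Ax ≥ b with multipliers u ≥ 0 gives the lower bound L(u) = ub + Σ_j min(0, (c − uA)_j).

    def subgradient(c, A, b, steps=50, step0=1.0):
        """Maximize the Lagrangean lower bound L(u) heuristically."""
        m, n = len(A), len(c)
        u, best = [0.0] * m, float('-inf')
        for k in range(steps):
            rc = [c[j] - sum(u[i] * A[i][j] for i in range(m)) for j in range(n)]
            x = [1 if rc[j] < 0 else 0 for j in range(n)]    # minimizes L(u)
            L = sum(u[i] * b[i] for i in range(m)) + sum(min(0.0, r) for r in rc)
            best = max(best, L)
            g = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
            if all(abs(gi) < 1e-9 for gi in g):
                break                                        # u is optimal
            u = [max(0.0, u[i] + step0 / (k + 1) * g[i]) for i in range(m)]
        return best, u

    # Toy: pick at least 2 of 3 items with costs 3, 2, 4 (optimum 5).
    print(subgradient([3, 2, 4], [[1, 1, 1]], [2]))          # bound near 5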
4.1.3. Penalty computations [45, 141]
The use of penalties is well-established (see texts such as [59, 130]). We may assume, after resolution of an auxiliary relaxed LP, the existence of a "penalty table", containing two entries π_d(j), π_u(j) for each free variable y(j) (corresponding to rounding down and up). Some of the penalties may be "additive" in nature, e.g. when associated with nonbasic variables.
Of relatively recent interest is the potential strengthening of individual penalties if logical constraints are taken into account, see [11, 89, 138]. Another promising possibility is the combination of "one-sided" large penalties (one of the pair π_d and π_u being large, the other small) with a different favorable criterion for the small penalty alternative (see "multiple criteria", 2.5.2).

4.2. Search for solutions: the State problem

4.2.1. Solving State problems
A State is most often a decision about the setting of all integer variables to particular values. For pure problems, the state problem reduces to substitution of a trial vector y^k. For a mixed integer problem, the state problem is a continuous problem resulting from substitution of y^k.
As discussed in 3.4, one may arrive at a state at level l in a very simple manner, e.g. by "inheritance" of the integer values from level l − 1 (plus whatever explicit change was imposed by the branch from level l − 1). In that case, the state problem solution can often be found readily by adaptation (updating) of the last solution. The dual variables of the state problem can then be used to construct global inequalities over the integer variables (Benders [12]).
The alternative is to generate a state in a non-trivial fashion (e.g., via direct dual methods). In such a case, the state problem may be both more difficult and more significant. This alternative depends typically upon knowledge of the problem structure.

4.2.2. Heuristic methods: local search
We denote by "local search" any procedure for generating a set of vectors y^k at a node ν in the branch tree. Usually one starts with y^ν and alters free components of y^ν in some regular manner, such as:
- "deep level" search (line search): one alters one component after another according to some feasibility or cost criterion for p ≤ n forward steps (sometimes followed by a similar "backward" pass);
- low level search: one complements no more than p (small, often 1 or 2) components at a time (see the sketch below).
The details are probably not too important. In all reported instances one somehow limits the overall effort devoted to the local search. A number of papers report net improvements of as much as an order of magnitude [95, 96, 104, 82]. The information obtained in the local search may be used secondarily to assist in subsequent strategic (branching) decisions, e.g. via a "branch towards feasibility". In this fashion one typically finds good feasible solutions early in the enumeration process.
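A sketch of the low level variant (illustrative only; feasible() and cost() stand for whatever tests the problem at hand provides):

    from itertools import combinations

    def low_level_search(y, free, cost, feasible, p=2):
        """Complement at most p free components of y; keep the best
        feasible neighbour found (None if there is none)."""
        best, best_cost = None, float('inf')
        for r in range(1, p + 1):
            for subset in combinations(free, r):
                trial = list(y)
                for j in subset:
                    trial[j] = 1 - trial[j]      # complement components
                c = cost(trial)
                if feasible(trial) and c < best_cost:
                    best, best_cost = trial, c
        return best, best_cost

    # Toy: minimize 3y0 + 2y1 + 4y2 subject to choosing >= 2 items.
    print(low_level_search([1, 1, 1], [0, 1, 2],
                           lambda y: 3*y[0] + 2*y[1] + 4*y[2],
                           lambda y: sum(y) >= 2))
    # ([1, 1, 0], 5)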
4.3. Curtailing the search: reduction

4.3.1. Feasibility tests
We use the term "feasibility test" to denote any of a large number of tests which have been described (see general texts) for establishing at a level l of the search whether the problem can possibly become (or remain) feasible over successor nodes. Negative results of the tests lead to backups or variable cancellations. The tests are usually over the pure integer constraints (given or generated), augmented by a current expression for the objective function in terms of the free nonbasic variables (such as (3.3), (3.6)).
The present trend appears (and probably should be) in the direction of replacing individual feasibility tests by logical reduction procedures applied to sets of inequalities, often generated and updated during the search process itself.

4.3.2. Bound interval reduction
A number of papers, such as [107, 108, 144], have featured formulas for systematic shrinking of bound-intervals, both for structural and slack variables. They express a new set of bounds as a simple function of the old bounds (4.3), and are invoked iteratively (e.g. see [82-84]). While the basic implication (4.3) is quite trivial in nature, the overall effect can be significant. In particular it will take care of all possible cancellations derivable from pure integer inequalities (i.e., of the cases d = 0 and d = 1 for logical inequalities).
One of the difficulties of bound-reduction lies in the necessity of invoking suitable tolerances. The alternative of "logical reduction" (see 1.3) seems easier to use and more stable numerically (aside from treating also the cases d > 1). On balance one may be best advised to use bound-reduction for integer problems, and whenever one has ideas as to how to use tight bounds on slack variables. Very little appears to have been done in such a direction.
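A minimal sketch of one reduction pass over a single inequality Σ a(j)x(j) ≤ b on a bounded integer box — illustrative of the flavour of the formulas in [107, 108, 144], not a reconstruction of (4.3); the tolerance issues mentioned above are ignored, and the rounding assumes integer variables.

    import math

    def reduce_bounds(a, b, lo, hi):
        """One shrinking pass of [lo, hi] using a.x <= b; None if the
        inequality cannot hold over the box.  Invoke iteratively."""
        n = len(a)
        lo, hi = list(lo), list(hi)
        # Minimal contribution of each term over the current box.
        mins = [a[j] * (lo[j] if a[j] > 0 else hi[j]) for j in range(n)]
        if sum(mins) > b:
            return None
        for k in range(n):
            slack = b - (sum(mins) - mins[k])    # room left for term k
            if a[k] > 0:
                hi[k] = min(hi[k], math.floor(slack / a[k]))
            elif a[k] < 0:
                lo[k] = max(lo[k], math.ceil(slack / a[k]))
            if lo[k] > hi[k]:
                return None
        return lo, hi

    # 3x0 + 2x1 <= 4 on 0 <= x <= 3 forces x0 <= 1 and x1 <= 2.
    print(reduce_bounds([3, 2], 4, [0, 0], [3, 3]))   # ([0, 0], [1, 2])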
4.3.3. Logical inequalities
We need say little more beyond what is in 1.3 and 2.1.2. The generation of a logical inequality system (4.4) by reduction involves row-by-row complementations and sorting of variables ([92, 137]). When the "reduction" procedure is preceded by bound-reduction, the resultant degree of a minimal inequality system (4.4) will be ≥ 2 (if it exists; i.e. if any logical implication can be drawn from the initial system without row combinations).
It should be emphasized that an entire system (4.4) may derive from few original inequalities (often only one). This means that the reduction automatically focusses on the most constraining row of the original system, as it probably should. Any possible interaction between different rows of (1.1) or (1.3) must be analyzed via an examination of linear combinations of the rows (possibly as in [97, 39]).
Of course, a given logical system (4.4) may be of help in such an analysis. Relatively little has been done in examining a given system (4.4), for example with the goal of constructing a vector y which is "most compatible" with it.
One knows how to infer certain immediately useful 1- or 2-degree "implications" directly from a given general 2-degree system ([92, 137, 11]). A 1-degree implication is of the form y(i) = 0 or y(i) = 1. A 2-degree implication may be y(i) = y(j) or y(i) = 1 − y(j), and can clearly be used to reduce the problem size. (Ref. [102] examines 2-degree alternatives (not implications), and associated penalties, as branching choices in a BB scheme: "cross-branching".) For other guidance to be derived from a system (4.4) see Section 4.4.3, and also the "cascading" approach of ref. [11].

4.4. Directing the search: strategy, branching

4.4.1. Different phases of the enumeration
Various devices already mentioned, such as the Balas criterion for branch choices and local searches, as well as exploitation of "contraction properties" (4.4.3), are often highly effective in leading to good feasible solutions (they constitute, in effect, a complicated counterpart to the "greedy" heuristics which are becoming increasingly popular). But even then, one may be up against a major search effort in ascertaining (proving) optimality within a certain percentage of the true optimum. Experience shows that this second task is usually ill-served by the strategies well adapted to the first. One does well, then, to distinguish between these two "phases" of the calculation and to redesign strategy as one moves from phase 1 to phase 2. Experience suggests fairly strongly that phase 2 might best be served by a strategy quite opposite to that of phase 1, for example by branching which is directed towards maximizing overall infeasibility rather than feasibility, etc. The low-level nature of phase 2 searches in [136] is in part certainly a consequence of such a strategy change.

4.4.2. Search direction via feasibility or penalties
The standard approaches for choosing a branch variable have always been to either (i) attain or maintain feasibility for enumeration, or (ii) remove fractionality for branch and bound programming.
The first approach is natural when special structures are present (such as multiple choice sets). One will choose a branch at level l which permits the setting of a state at level l + 1 such that at least the special structure constraints are always satisfied. This may require a certain amount of "look-ahead" to the next level.
For more general (pure) problems it is customary to choose the branch such that the sum (Balas criterion) or number of the projected infeasibilities at level l + 1 is minimized.
Criterion (ii) may be used for problems with execution of an ALP at level l, and also for mixed integer problems (for which feasibility is not so easily projected). If special pure integer structures are present in a mixed integer problem, a compromise between the criteria may be desirable (along the lines of 2.5.1, for example). When fractionality is the main concern, one usually chooses j* such that y(j*) is fractional and some f(π_d(j*), π_u(j*)) is maximized, often the absolute value of the difference in up- and down-penalties.
We have found (primarily in conjunction with the use of logical inequalities, see 4.4.3) that it may be worthwhile to consider also the non-fractional variables (often, but not exclusively, the nonbasic variables). Starting with an ALP optimal tableau, we define a "propagation" as a branch which leaves the ALP-tableau unchanged. I.e., y(j*) is set to 0 (or 1), if it is 0 or 1 in the tableau. A propagation on j*, say setting y(j*) = 0, makes sense if j* can be selected such that the alternative setting, here y(j*) = 1, has a large penalty associated with it. The ALP need not be resolved at l + 1, and on return to level l the "alternate node" may no longer require resolution.

4.4.3. Search direction with logical inequalities
The use of propagation may be somewhat trivial (e.g., if only nonbasic variables are involved and the "alternate penalties" are the nonbasic reduced costs). The matter becomes more interesting when logical inequalities are used. The basic idea can be illustrated with a simple example (from a (6, 12) problem used in [138]).
Two of the minimal preferred inequalities (of degree 2) are: −y(2) + y(5) ≤ 0 (either y(2) is 1, or y(5) is 0), and −y(9) + y(11) ≤ 0 (either y(9) is 1, or y(11) is 0). The penalties of interest are: y(2): (0, 1.82), y(5): (0, 9.73), y(9): (0, 7.13), y(11): (0, 4.62), all variables being nonbasic at value 0. One will consider seriously only propagations which satisfy the preferred inequalities and give rise to large alternate penalties:
(1) Propagate on y(5) = 0. The alternate node has y(5) = 1 (complementary value), and y(2) = 1 (from the preferred inequality). The total penalty is: T = π_u(5) + π_u(2) = 9.73 + 1.82 = 11.55.
(2) Propagate on y(11) = 0. Alternate node: y(11) = 1 and y(9) = 1. Penalty: T = π_u(11) + π_u(9) = 4.62 + 7.13 = 11.75.
Additivity has been invoked in both cases, via the nonbasic nature of the variables. Without it, the alternate penalties would have been max{1.82, 9.73} and max{7.13, 4.62}. Notice that one would most likely still have chosen the propagation branch y(11) = 0, because it would have left the relatively large penalties 7.13 and 9.73 still active, whereas the other case would have done away with the large penalty 9.73: choose persistence of large penalties, if possible.
In addition to penalty considerations, one focuses on branches which are "contracting". For certain variables (columns of Q) one can tell by inspection of column j whether a branch would lead to a lower value of the degree (to a more constrained problem) at the next level, i.e. whether the branch j is "contracting". One may then choose branch variables j* such that one alternative has a high penalty associated with it and the other (possibly with small penalty) is contracting. A particularly favorable case may arise when both y(j) = 0 and y(j) = 1 are contracting: "double contraction".
We reproduce the start of a tree from [83] as Fig. 4. It is of single-branch nature, and the branches have been chosen such that the alternate nodes are ordered according to decreasing bounds on the objective function (from 967.1 to 596.3). Propagating branches are marked by ○, contracting branches by Δ. Since the degree before branching is always 2, it becomes 1 after branching, so that variables can be fixed, as is indicated in Fig. 4. The alternate nodes need not be processed after the feasible (and integer optimal) solution with z = 550 is found.
For the right problem one can start the search by a (propagation) single-branch tree, and then continue by either further enumeration, or by a genuine branch and bound procedure when the alternate nodes are no longer "interesting" (hybrid-type algorithm: single-branch + multi-branch).
Fig. 4. (28, 35) test problem ([114]).
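The propagation choice worked out in the example above can be sketched as follows (data from the (6, 12) example of [138]; additivity of the penalties is assumed, the variables being nonbasic):

    up_penalty = {2: 1.82, 5: 9.73, 9: 7.13, 11: 4.62}
    # -y(2) + y(5) <= 0: setting y(5) = 1 forces y(2) = 1; likewise
    # -y(9) + y(11) <= 0 for the pair (11, 9).
    forces = {5: [2], 11: [9]}

    def alternate_penalty(j):
        """Total penalty at the alternate node of a propagation on j."""
        return up_penalty[j] + sum(up_penalty[i] for i in forces.get(j, []))

    for j in (5, 11):
        print(j, round(alternate_penalty(j), 2))   # 5: 11.55, 11: 11.75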
5. Generalizations and specializations
The space is too short to give an exhaustive treatment of this subject. We shall give only outlines and some references.
5.1. Integer programming
There has been little enumerative work in strict integer programming. If one does not transform the variables into simple linear combinations of zero-one variables (explicitly, or better implicitly), one is probably best advised to limit enumeration to problems with small bound intervals. The concepts and tools of logical inequalities can be carried over, in reasonably useful form, to small interval integer programming; see [82, 84].

5.2. Mixed integer programming
The probably most fertile ground for enumeration are mixed-integer problems with substantial (0, 1) structures, especially if these structures cannot be handled readily by branch and bound programming (as multiple choice constraints can be handled, to some extent, by means of "SOS sets"). Such (0, 1) structures (in addition to multiple choice sets) may contain precedence requirements, restrictions of the number of permitted activities, interference restrictions, etc.
Equally important, perhaps, we may see in the future that generated associated inequalities, and logical inequalities derived from them, constitute (0, 1) structures in the above sense, and may best be handled enumeratively as well. Scheduling problems, in their manifold variety, seem particularly ripe for enumerative mixed integer programming. Examples may be found in [118] and also [87].
5.3. Special problems
Special problems are of major interest because:
- they often constitute prime examples for the type of mixed-integer problem discussed in 5.2, and
- they readily respond to special techniques of great interest (and frequently developed precisely in connection with special problems): decomposition, partitioning, dual methods, relaxation, heuristics, complete or at least substantial implicit representation of constraints, etc.
A specially interesting, and particularly wide, class of problems can be termed "weakly linked" (as in [30]), when the continuous and (0, 1) integer variables are linked by a system:
Mx ≤ Iy,    (5.1)

with M a diagonal and I an identity matrix. Weakly linked systems subsume such problems as: fixed charge problems of various types, plant location problems, certain product scheduling problems, etc. It is clear that many of the techniques featured in this survey should find ready application for any weakly linked problem.
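A tiny sketch of the leverage such a system gives an enumerative code (the data are made up): with M diagonal, fixing y(j) = 0 immediately forces x(j) = 0, while y(j) = 1 merely leaves the bound x(j) ≤ u(j), with u(j) = 1/M(j, j).

    def apply_weak_link(y_fixed, u, x_hi):
        """Tighten upper bounds on x from fixed 0-1 variables, using
        x(j) <= u(j) * y(j).  y_fixed: dict j -> 0/1."""
        x_hi = dict(x_hi)
        for j, v in y_fixed.items():
            x_hi[j] = 0.0 if v == 0 else min(x_hi[j], u[j])
        return x_hi

    u = {0: 10.0, 1: 8.0}
    print(apply_weak_link({0: 0, 1: 1}, u, {0: 10.0, 1: 8.0}))
    # {0: 0.0, 1: 8.0}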
A method for treating (5.1) is described in [132], which is also a good source for additional references. The term "Variable Upper Bound" is used for constraints of the form x(j) ≤ y(j).
We finally list a number of special problems, references and techniques which have been used or demonstrated in connection with them:
- Travelling Salesman: Relaxation and dual techniques [98].
- Plant Location: Dual methods, variable upper bounds [43, 26, 46, 135, 86, 132]. Objective function properties (submodularity) [55, 8, 37].
- Set Covering (Packing): Rounding + reduction to vertices of the polytope [114]. Heuristics for large problems [112, 127, 128]. Cutting planes [116]. See the surveys [9, 10].
- Knapsack: Rounding, ordering, group methods. For good surveys see [49, 50].
- (Machine) Scheduling: Duality methods [52, 53]. Implicit techniques [57, 58].
- Maintenance scheduling: Implicit techniques [87].
- Assembly line balancing: Implicit enumeration [118, 140].
The list above does not do justice to the substantial possibilities for enumeration with special problems. Much more will be done here in the future.
One cautioning word may be necessary. Techniques which resolve a particular special problem may often have a surprisingly limited market. What is needed really are techniques which solve a special problem plus some added constraints. Much attention will have to be given to such problems, via decomposition, relaxation, etc. It may also be worthwhile to incorporate enumerative modules into production codes which are at present almost entirely branch and bound.
References
[1] J. Abadie, ed., Integer and Nonlinear Programming (Elsevier, Amsterdam, 1970).
[2] J. Aronofsky, ed., Progress in Operations Research, Vol. 3 (Wiley, New York, 1969).
[3] E. Balas, An additive algorithm for solving linear programs with zero-one variables, J. ORSA 13 (1965) 517-546.
[4] E. Balas, Bivalent programming by implicit enumeration, in: Encyclopedia of Comp. Sc. and Techn., Vol. 2 (Marcel Dekker, New York, 1975) 479-494.
[5] E. Balas, Minimax and duality for linear and nonlinear mixed integer programming, in: [1] (1970).
[6] E. Balas, Intersection cuts — a new type of cutting planes, J. ORSA 19 (1971) 19-39.
[7] E. Balas, Discrete programming by the filter method, J. ORSA 15 (1967) 915-957.
[8] D.A. Babayev, Comments on the note of Frieze, Math. Programming 7 (1974) 249-252.
[9] E. Balas and M.W. Padberg, On the set covering problem, J. ORSA 20 (1972) 1152-1161.
[10] E. Balas and M.W. Padberg, Set-partitioning, a survey, SIAM Rev. 18 (1976) 710-760.
[11] R. Breu and C.A. Burdet, Branch and bound experiments in 0-1 programming, Math. Programming Study 2 (1974) 1-50.
[12] J.F. Benders, Partitioning procedures for solving mixed-variables programming problems, Numer. Math. 4 (1962) 238-252.
[13] E.M.L. Beale, Survey of integer programming, Operational Res. Quart. 16 (1965) 219-228.
[14] E.M.L. Beale, Selecting an optimal subset, in: [1] (1970).
[15] E.M.L. Beale, ed., Applications of Mathematical Programming Techniques (Engl. Univ. Press, London, 1970).
[16] E.M.L. Beale and J.J.H. Forrest, Global optimization using special ordered sets, Math. Programming 10 (1976) 52-69.
[17] M. Bellmore and G.L. Nemhauser, The travelling salesman problem: a survey, J. ORSA 16 (1968) 538-558.
[18] M. Bellmore and H.D. Ratliff, Set covering and involutory bases, Management Sci. 18 (1971) 194-206.
[19] D.E. Bell and J.F. Shapiro, A finitely converging duality theory for zero-one integer programming, RM-75-33, IIASA, Laxenburg (1975).
[20] E.M.L. Beale and J.A. Tomlin, Special facilities in a general mathematical programming system for non-convex problems using ordered sets of variables, in: [111] (1969).
[21] E.M.L. Beale and J.A. Tomlin, An integer programming approach to a class of combinatorial problems, Math. Programming 3 (1972) 339-344.
[22] M. Benichou, J.M. Gauthier, P. Girodet, G. Hentges, G. Ribiere and O. Vincent, Experiments in mixed integer linear programming, Math. Programming 1 (1971).
[23] M. Benichou, J.M. Gauthier, G. Hentges and G. Ribiere, The efficient solution of large scale linear programming problems; some algorithmic techniques and computational results, IXth Int. Symp. on Math. Progr., Budapest (1976).
[24] E. Balas and R.G. Jeroslow, Canonical cuts on the unit hypercube, SIAM J. Appl. Math. 23 (1972) 61-69.
[25] O. Bilde and J. Krarup, Bestemmelse af optimal beliggenhed af produktionssteder, Res. Rep., IMSOR, The Techn. Univ. of Denmark (1967).
[26] O. Bilde and J. Krarup, Sharp lower bounds and efficient algorithms for the simple plant location problem, Rep. 75/5, Inst. of Datology, Univ. of Copenhagen (1975).
[27] M.L. Balinski, Integer programming: methods, uses, computation, Management Sci. 12 (1965).
[28] M.L. Balinski (chm.), Is mathematical programming moribund?, ORSA/TIMS Meeting, Phila. (1976).
[29] C.E. Blair and R.G. Jeroslow, Treeless searches, manuscript, Carnegie-Mellon Univ. (1976).
[30] M.L. Balinski and K. Spielberg, Methods for integer programming: algebraic, combinatorial and enumerative, in: [2] (1969).
[31] B. Bouvier and G. Messoumian, Programmes lineaires en variables bivalentes, these, Faculte des Sciences, Univ. Grenoble (1965).
[32] V.J. Bowman and J.H. Starr, Partial orderings in implicit enumeration, Workshop on Integer Programming, Bonn (1975).
[33] C.A. Burdet, A class of cuts and related algorithms in integer programming, Management Sci. Rep. 220, Carnegie-Mellon Univ. (1970).
[34] C.A. Burdet, Enumerative cuts, J. ORSA 21 (1973) 61-89.
[35] R.E. Burkard, Methoden der Ganzzahligen Optimierung (Springer, Berlin, 1972).
[36] R. Chen, H.P. Crowder and E.L. Johnson, An integer programming formulation of the installation scheduling problem, Res. Rep., IBM Res., Yorktown Heights (1976).
[37] G. Cornuejols, M.L. Fisher and G.L. Nemhauser, On the uncapacitated plant location problem, Rep. 76-5-01, Dept. Decision Sc., Univ. PA (1976).
[38] A. Charnes and F. Granot, An algorithm for solving integer interval linear programming problems I: A new method for mixed integer programming, Res. Rep. CS 142, Univ. Tex. (1973).
[39] A. Charnes, D. Granot and F. Granot, On improved bounds for variables in linear programs by surrogate constraints, Res. Rep. CS 153, Univ. Tex. (1973).
[40] N. Christofides, Zero-one programming using non-binary tree search, Comput. J. 14 (1971) 418-421.
[41] R. Cottle and J. Krarup, Optimization Methods for Resource Allocation (Engl. Univ. Press, London, 1974).
[42] J. Colmin and K. Spielberg, Branch and bound schemes for mixed integer programming, Rep. 320-2972, IBM N.Y. Sc. C. (1969).
[43] P.S. Davis and T.L. Ray, A branch-bound algorithm for the capacitated plant facilities location problem, Naval Res. Logist. Quart. 16 (1969).
[44] I. Dragan, Un algorithme lexicographique pour la resolution des programmes lineaires en variables binaires, Management Sci. 16 (1969) 246-252.
[45] N.J. Driebeek, An algorithm for the solution of mixed integer programming problems, Management Sci. 12 (1966) 576-587.
[46] D. Erlenkotter, A dual-based procedure for uncapacitated facility location, Work. P. 261, W. Mgmt. Sc. Inst., Univ. Cal., L.A. (1976).
[47] H.M. Everett, Generalized Lagrange multiplier method for solving problems of optimum allocation of resources, J. ORSA 11 (1963) 399-417.
[48] B.H. Faaland and F.S. Hillier, The accelerated bound-and-scan algorithm for integer programming, J. ORSA 23 (1975) 406-425.
[49] D. Fayard and G. Plateau, Resolution of the 0-1 knapsack problem: comparison of methods, Math. Programming 8 (1975) 272-307.
[50] D. Fayard and G. Plateau, Techniques de resolution du probleme de knapsack en variables bivalentes, Rep. Lab. de Calcul, Univ. Lille, France (1976).
[51] J.J.H. Forrest, J.P.H. Hirst and J.A. Tomlin, Practical solution of large mixed integer programming problems with Umpire, Management Sci. 20 (1974) 736-773.
[52] M.L. Fisher, Optimal solution of scheduling problems using Lagrange multipliers: Part I, J. ORSA 21 (1973) 1114-1127.
[53] M.L. Fisher, A dual algorithm for the one-machine scheduling problem, Rep. 7403, Center for Math. Stud. in Bus. and Econ., Univ. Chic. (1974).
[54] M.L. Fisher, W.D. Northup and J.F. Shapiro, Constructive duality in discrete optimization, Rep. 7321, C. for Math. St. in Bus. and Econ., Univ. Chic. (1973).
[55] A.M. Frieze, A cost function property for plant location problems, Math. Programming 7 (1974) 245-248.
[56] R.L. Francis and J.M. Goldstein, Location theory: a selective bibliography, J. ORSA 22 (1974) 400-410.
[57] M. Florian, P. Trepant and G.B. McMahon, An implicit enumeration algorithm for the machine sequencing problem, Management Sci. 17 (1971) B782-792.
[58] M. Florian, C. Tilquin and G. Tilquin, An implicit enumeration algorithm for complex scheduling problems, Int. J. of Prod. Res. 13 (1975) 25-40.
[59] R.S. Garfinkel and G.L. Nemhauser, Integer Programming (Wiley, New York, 1972).
[60] R.S. Garfinkel and G.L. Nemhauser, The set-partitioning problem: set covering with equality constraints, J. ORSA 17 (1969) 848-856.
[61] R.S. Garfinkel and G.L. Nemhauser, Optimal set covering: a survey, in: [64] (1972).
[62] A.M. Geoffrion, Integer programming by implicit enumeration and Balas' method, SIAM Rev. 7 (1967) 178-190.
[63] A.M. Geoffrion, An improved implicit enumeration approach for integer programming, J. ORSA 17 (1969) 437-454.
[64] A.M. Geoffrion, ed., Perspectives on Optimization: A Collection of Expository Articles (Addison-Wesley, Reading, MA, 1972).
[65] A.M. Geoffrion, Lagrangean relaxation for integer programming, Math. Programming Study 2 (North-Holland, Amsterdam, 1974).
[66] A.M. Geoffrion and R.E. Marsten, Integer programming: a framework and state-of-the-art survey, Management Sci. 18 (1972) 465-491.
[67] F. Granot and P.L. Hammer, On the use of Boolean functions in 0-1 programming, Meth. of OR 12 (1972) 154-184.
[68] P.C. Gilmore and R.E. Gomory, The theory and computation of knapsack functions, J. ORSA 14 (1966) 1045-1074.
[69] F. Glover, A multiphase-dual algorithm for the zero-one integer programming problem, J. ORSA 13 (1965) 879-919.
[70] F. Glover, Heuristics in integer programming, Working Paper (unpubl., Sept. 1967).
[71] F. Glover, Surrogate constraints, J. ORSA 16 (1968) 741-749.
[72] F. Glover and L. Tangedahl, Dynamic strategies for branch and bound, OMEGA, The Int. J. of Mgmt. Sci. 4 (1976) 571-576.
[73] R.E. Gomory, All-integer integer programming algorithm, in: [119] (1963).
[74] R.E. Gomory, Some polyhedra related to combinatorial problems, Linear Algebra and Appl. 2 (1969) 451-558.
[75] R.E. Gomory and E.L. Johnson, Some continuous functions related to corner polyhedra, Math. Programming 3 (1972) 23-85.
[76] H. Greenberg, Integer Programming (Academic Press, New York, 1971).
[77] H. Greenberg and R.L. Hegerich, A branch search algorithm for the knapsack problem, Management Sci. 16 (1970) 327-332.
[78] M. Guignard and K. Spielberg, Search techniques with adaptive features for certain integer and mixed-integer programming problems, Proceedings IFIPS Congress, Edinburgh, North-Holland (1969) 238-244.
[79] M. Guignard and K. Spielberg, The state enumeration method for mixed integer zero-one programming, Rep. 320-3000, IBM Phil. Sc. C. (1971).
[80] M. Guignard and K. Spielberg, A realization of the state enumeration procedure, Rep. 320-3025, IBM Phil. Sc. C. (1973).
[81] M. Guignard and K. Spielberg, Mixed-integer algorithms for the (0, 1) knapsack problem, IBM J. Res. Dev. 16 (1972) 424-430.
[82] M. Guignard and K. Spielberg, Reduction methods for state enumeration integer programming, Rep. 16, Dept. Stat. OR, Univ. PA (1976).
[83] M. Guignard and K. Spielberg, Propagation, penalty improvement and use of logical inequalities in interactive (0, 1) programming, Int. Symp. Math. Progr., Heidelberg Univ. (1976).
[84] M. Guignard and K. Spielberg, An experimental interactive system for integer programming, Rep. 17, Dept. Stat. OR, Univ. PA (1976).
[85] M. Guignard and K. Spielberg, Algorithms for exploiting the structure of the simple plant location problem, Ann. Discrete Math. 1 (1977).
[86] M. Guignard and K. Spielberg, A dual method for the mixed plant location problem, Rep. 20, Dept. Stat. OR, Univ. PA (1976).
[87] M. Guignard and K. Spielberg, Maintenance scheduling, Proc. IXth Int. Symp. Math. Progr., Budapest (1976).
[88] M. Guignard, Inegalites valides de Gomory-Johnson, Journees de Combinatoire AFCET (1971).
[89] M. Guignard, Preferred shadow prices in 0-1 programming, Wharton School Techn. Rep., Univ. PA (1974).
[90] G.W. Graves and Ph. Wolfe, eds., Recent Advances in Mathematical Programming (McGraw-Hill, New York, 1963).
[91] J. Haldi, 25 integer programming test problems, Work. P. 43, Grad. Sch. Bus., Stanford Univ. (1964).
[92] P.L. Hammer, Boolean procedures for bivalent programming, Res. Rep. CORR 72-1, Univ. Waterloo (1973).
[93] P.L. Hammer, BABO — a Boolean approach for bivalent optimization, Centre de Rech. Math., Univ. Montreal (1971).
[94] P.L. Hammer and S. Rudeanu, Boolean Methods in Operations Research and Related Areas (Springer, Berlin, 1968).
[95] F.S. Hillier, Efficient heuristic procedures for integer linear programming with an interior, J. ORSA 17 (1969) 600-637.
[96] F.S. Hillier, A bound-and-scan algorithm for pure integer linear programming with general variables, J. ORSA 17 (1969) 638-679.
[97] P.L. Hammer, E.L. Johnson and U.N. Peled, Facets of regular 0-1 polytopes, Math. Programming 8 (1975) 179-206.
[98] M. Held and R.M. Karp, The travelling salesman problem and minimum spanning trees, J. ORSA 18 (1970) 1138-1162.
[99] P.L. Hammer and S. Nguyen, POSS — a partial order in the solution space of bivalent programs, Publ. Centre Rech. Math. 163, Univ. Montreal (1972).
[100] P.L. Hammer, M.W. Padberg and U.N. Peled, Constraint pairing in integer programming, Res. Rep. CORR 73-7, Univ. Waterloo (1973).
[101] M. Held, Ph. Wolfe and H.P. Crowder, Validation of subgradient optimization, Math. Programming 6 (1974) 62-88.
[102] R.G. Jeroslow, Cross-branching in bivalent programming, Mgmt. Sc. Res. Rep. 331, Carnegie-Mellon Univ. (1974).
[103] E.L. Johnson and K. Spielberg, Inequalities in branch and bound programming, in: [41] (1974).
[104] R.G. Jeroslow and T.H.S. Smith, Experimental results on Hillier's linear search, Math. Programming 9 (1975) 371-376.
[105] C. Kastning, Integer programming and related areas: a classified bibliography (Springer, Berlin, 1976).
[106] B. Korte, W. Krelle and W. Oberhofer, Ein lexikographischer Suchalgorithmus zur Loesung allgemeiner ganzzahliger Programmierungsaufgaben, Unternehmensforschung 13 (1969) 73-98, 172-192.
[107] P.D. Krolak, The bounded variable algorithm for solving integer linear programming problems, Rep. COO-1493-16, Dept. Appl. Math. Comp. Sc., Wash. Univ. (1968).
[108] P.D. Krolak, An improved bounded variable algorithm with adaptations to the special cases of the generalized knapsack problem and the integer fixed charge problem, Work. P. 1, Bus. Div., South. Ill. Univ. (1968).
[109] H. Kreuzberger, Numerische Erfahrungen mit einem heuristischen Verfahren zur Loesung ganzzahliger linearer Optimierungsprobleme, Elektr. Datenverarb. 12 (1970) 289-306.
[110] A.H. Land and A.G. Doig, An automatic method for solving discrete programming problems, Econometrica 28 (1960) 497-520.
[111] J. Lawrence, ed., OR 69, Proceedings of the Fifth International Conference on Operational Research, Venice, 1969 (Tavistock Publ., London, 1970).
[112] S. Lin, Computer solutions of the traveling salesman problem, Bell Syst. Tech. J. 44 (1965) 2245-2269.
[113] J.D.C. Little, K.G. Murty, D.W. Sweeney and C. Karel, An algorithm for the traveling salesman problem, J. ORSA 11 (1963) 972-983.
[114] C.E. Lemke and K. Spielberg, Direct search algorithms for zero-one and mixed integer programming, J. ORSA 15 (1967) 892-914.
[115] R.E. Marsten and T.L. Morin, A hybrid approach to discrete mathematical programming, Rep. OR 051-76, MIT (1976).
[116] G.T. Martin, An accelerated euclidean algorithm for integer linear programming (and later working papers), in: [90] (1963).
[117] D. McDaniel and M. Devine, Alternative relaxation schemes for Benders' partitioning approach to mixed integer programming, Rep., Ind. Eng., Univ. Oklah. (1976).
[118] P. Mevert and U. Suhl, Implicit enumeration with generalized upper bounds, Workshop on Integer Programming, Bonn (1975).
[119] J.F. Muth and G.L. Thompson, eds., Industrial Scheduling (Prentice-Hall, Englewood Cliffs, NJ, 1963).
[120] L. Nastansky and H.W. Grosse, Computational experiments with linear programming oriented implicit enumeration algorithms for the solution of linear (0, 1)-programs, Work. P., Univ. Paderborn (1975).
[121] G.L. Nemhauser, L.A. Wolsey and M.L. Fisher, An analysis of approximations for maximizing submodular set functions I, Rep. 7618, CORE (1976).
[122] J.M. Patterson, Zero-one integer programming: a study in computational efficiency and a state-of-the-art survey, Bicent. Conf. on Math. Progr., Bethesda (1976).
[123] C.C. Petersen, Computational experience with variants of the Balas algorithm applied to the selection of R&D projects, Management Sci. 13 (1967) 736-750.
[124] C.J. Piper, Implicit enumeration: a computational study, Work. P. 115, Sch. of Bus. Adm., Univ. West. Ontario (1974).
[125] J.F. Pierce, Application of combinatorial programming to a class of all-zero-one integer programming problems, Management Sci. 15 (1968) 191-209.
[126] J.F. Pierce and J.S. Lasky, Improved combinatorial programming algorithms for a class of all-zero-one integer programming problems, Management Sci. 19 (1973) 528-544.
[127] J. Rubin, A technique for the solution of massive set covering problems, with application to airline crew scheduling, Transportation Sc. (1973) 33-48.
[128] J. Rubin, Scheduling of airline crews for aircraft schedules with frequency exceptions, Rep. G320-2103, IBM Cambr. Sc. C. (1974).
[129] H.M. Salkin, On the merit of the generalized origins and restarts in implicit enumeration, J. ORSA 18 (1970) 549-555.
[130] H.M. Salkin, Integer Programming (Addison-Wesley, Reading, MA, 1975).
[131] H.M. Salkin and K. Spielberg, Adaptive binary programming, Rep. 320-2951, IBM N.Y. Sc. C. (1968).
[132] L. Schrage, Implicit representation of variable upper bounds in linear programming, Math. Programming Study 4 (1975) 113-132.
[133] J.F. Shapiro, Generalized Lagrange multipliers in integer programming, J. ORSA 19 (1971) 68-76.
[134] A.L. Soyster, B. Lev and W. Slivka, Implicit enumeration via canonical separation, Report, Virginia Polytechn. Inst. and State Univ. (1976).
[135] K. Spielberg, Algorithms for the simple plant location problem with some side conditions, J. ORSA 17 (1969) 85-111.
[136] K. Spielberg, Plant location with generalized search origin, Management Sci. 16 (1969) 165-178.
[137] K. Spielberg, Minimal preferred variable reduction for zero-one programming, Rep. 320-3013, IBM Phil. Sc. C. (1972).
[138] K. Spielberg, Minimal inequality branch-bound method, Rep. 320-3024, IBM Phil. Sc. C. (1973).
[139] K. Spielberg, On solving plant location problems, in: [15] (1970).
[140] U. Suhl, Entwicklung von Algorithmen fuer ganzzahlige Optimierungsmodelle, Beitraege zur Untern. F., Heft 6, Freie Univ. Berlin (1975).
[141] J.A. Tomlin, Branch and bound methods for integer and non-convex programming, in: [1] (1970).
[142] L.E. Trotter Jr. and C.M. Shetty, An algorithm for the bounded variable integer programming problem, J. Assoc. Comput. Mach. 21 (1974) 505-513.
[143] H.P. Williams, Experiments in the formulation of integer programming problems, Math. Programming Study 2 (North-Holland, Amsterdam, 1974).
[144] S. Zionts, Generalized implicit enumeration using bounds on variables for solving linear programs with zero-one variables, Naval Res. Logist. Quart. 19 (1972) 165-181.
[145] S. Zionts, Linear and Integer Programming (Prentice-Hall, Englewood Cliffs, NJ, 1974).
Annals of Discrete Mathematics 5 (1979) 185-191
© North-Holland Publishing Company
Report of the Session on
BRANCH AND BOUND / IMPLICIT ENUMERATION
Egon BALAS (Chairman)
M. Guignard (Editorial Associate)
This report is a summary drawn from the discussions and written statements by the participants. A name in parenthesis appears where it was deemed desirable to attribute a statement.

Terminology
There is no generally accepted terminology for the class of methods known as branch and bound, or search, or enumerative procedures. Admittedly, each of these names refers to methods for solving integer or other nonconvex programs by breaking up the feasible set into subsets, calculating bounds on the objective function value over each subset, and using the bounds to discard certain subsets of solutions from consideration. However, within this general class of methods, one can distinguish two basic prototypes. One, typified by Land and Doig [11] and Dakin [6], is aimed at solving general mixed integer programs and uses linear programming as its main vehicle. The other one, typified by Balas [1], Geoffrion [8], Lemke and Spielberg [13], is concerned with solving 0-1 programs, and uses as its main tool logical tests exploring the implications of the binary nature of the variables, or inequalities based on similar considerations.
Some workers in the field prefer to reserve the term branch and bound for the first of the above two approaches, while calling the second one implicit enumeration. Others consider this distinction less and less relevant with the passage of time, as the two approaches are increasingly borrowing from each other, to the extent that the more recent algorithms usually contain elements of both. Finally, some authors use the terms branch and bound, or search, or enumerative methods, to describe the general class of procedures under discussion, and reserve the term implicit enumeration for the special subclass consisting of 0-1 programs. Here we follow the latter terminology.
We will assume throughout that the problem to be solved is a minimization problem, and will use the following terms for the main components of a branch and bound algorithm. The search strategy (or node selection rule) is the criterion used to select a subproblem, i.e., a node of the search tree, to be examined. The branching rule is the device used for breaking up a subproblem, i.e., for generating the successors of a node of the search tree. In case the branching rule is
defined with respect to a single variable, it subsumes the rule for choosing that variable. The lower bounding and upper bounding procedures are the methods by which such bounds are obtained. Logical tests (in the case of 0-1 programs) use the implications of binariness to strengthen the bounds, discard subsets of solutions, or (which is the same thing viewed differently) fix variables.
State of the art
Branch and bound algorithms range from entirely heuristics-oriented procedures which do not use any but the most elementary mathematical results, to theoretically quite sophisticated methods incorporating devices based on convex analysis, group theory, polyhedral combinatorics, etc.
The commercially available integer programming codes (see the survey by Land and Powell [12]) are all based on general purpose mixed integer branch and bound algorithms. While they can sometimes solve problems with hundreds of integer and thousands of continuous variables, they cannot be guaranteed to find optimal solutions to problems with more than 30-40 integer variables (Beale). On the other hand, they usually find feasible solutions of a practically acceptable quality for much larger problems (Krabek). These commercial codes, while quite sophisticated in their linear programming subroutines, do not incorporate at this time any, except for the most elementary, results of integer or combinatorial programming theory. They do incorporate, on the other hand, some useful heuristics derived from computational experience with many real-world problems.
There was some discussion about why apparently useful theoretical results have so far not made their way into everyday use via the commercial codes. The main reason seems to be (Balas) that mathematical results concerning structural properties of integer or combinatorial programs can be translated into algorithms in many different ways, and it requires a lot of experimentation to decide which, if any, of those ways are efficient. Such experimentation is both expensive and time-consuming. Furthermore, what is efficient for a certain class of problems (having some special structure) is often inefficient for others. The companies developing commercial software are reluctant to make the necessary investment into experimentation on the one hand, and into the development of specialized codes for which the demand may be too limited, on the other. This is confirmed, for instance, by the fact that such an important feature as GUB was deleted from MPSX 370 since the number of its users was deemed too small to justify the investment (Spielberg).
A considerable number of specialized branch and bound algorithms, mostly of the implicit enumeration type, have been implemented by operations research groups in universities or industrial companies (see the surveys by Balas [2] and Spielberg [16]). Some of these codes can solve general (unstructured) 0-1 programs with up to 80-100 variables, and structured problems with up to several hundred (assembly line balancing, multiple choice, facility location), a few thousand (set covering, set partitioning, generalized assignment), or several thousand (knapsack, travelling salesman) 0-1 variables. However, unlike the
commercial codes, most of these programs are not used on a routine basis; neither are they universally available. Furthermore, while many of the real-world problems amenable to the above mentioned formulations fit within the stated limits, others substantially exceed those limits. Finally, some important and frequently occurring real-world problems, like job-shop scheduling and others, lead to combinatorial programming models which are almost always beyond the limits of what is solvable at our present state of knowledge.
Search strategies

Of the two extreme search strategies known as "breadth first" (always choose the node with the best lower bound) and "depth first" (choose a best successor of the current node, when available; otherwise backtrack to the predecessor of the current node and reapply the rule), the first tends to generate fewer nodes, while the second requires less storage space. The best strategy seems to lie in between, in the direction of always choosing the best successor node when available, but not automatically using the LIFO backtracking rule when no successor is available. Instead, in such cases one chooses the node with the best evaluation. The advantages of this rule were convincingly documented by Forrest, Hirst and Tomlin [7], whose evaluator ("best projection") takes into account, besides the lower bound at the given node, the "distance" from an integer solution measured by the sum of integer infeasibilities. Several versions of the "best projection" evaluator, which differ in the way they measure the distance from an integer point, were tested by Piper [14] with interesting conclusions. While the merits of each specific evaluator may be open to dispute, there can be little doubt that the idea of replacing the simple criterion of best lower bound by one which also takes into account the amount of integer infeasibility at the given node is here to stay.
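To make the idea concrete, here is a minimal Python sketch of an evaluator in the spirit of "best projection", combining the node's lower bound with its total integer infeasibility. The node attributes, the simple linear penalty, and the weight LAMBDA are our own illustrative assumptions, not the specific evaluator of [7] or [14].

```python
LAMBDA = 10.0  # assumed weight per unit of integer infeasibility (illustrative)

def integer_infeasibility(x, integer_vars):
    """Sum of distances of the integer variables from their nearest integers.
    Assumes x[j] >= 0 so that int() truncates toward the integer part."""
    return sum(min(x[j] - int(x[j]), 1.0 - (x[j] - int(x[j])))
               for j in integer_vars)

def evaluate(node):
    """Smaller is better (minimization): LP bound plus a 'distance' penalty."""
    return node.lp_bound + LAMBDA * integer_infeasibility(node.x, node.integer_vars)

def select_node(open_nodes):
    """Choose the unexplored node with the best evaluation."""
    return min(open_nodes, key=evaluate)
```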
Branching rules

When branching occurs on a single variable, its choice may be governed by various heuristic rules, like the largest penalty (or pseudo-cost), or the largest difference between the up and down penalties (or pseudo-costs), or the importance of the variable derived from extra-mathematical considerations. It has, however, been noted that whenever problem structure or problem formulation makes it possible to use branching rules which fix more than one variable, a more efficient branch and bound procedure results. This has been evidenced in the case of multiple choice constraints or special ordered sets (Beale and Tomlin [5]). Recently, the branching rule used in the above cases has been extended to a more general structure called linked ordered sets (Beale [4]). This latter development demonstrates the fact that the potential for such improved branching rules depends as much on problem formulation as it does on structure.

More generally, it can be asserted that whatever problem structure one starts
with, after finding a good solution (and certainly after finding an optimal solution), the problem becomes tightly constrained (once we add the constraint which excludes solutions not better than the best one at hand). This feature deserves careful consideration (Balas), and only recently have there been attempts to use it systematically. One way of using it is to branch on a disjunction derived from a conditional bound (see Balas' survey [3]). Each term of such a disjunction usually forces several variables to zero, and whenever the number of terms is small, this branching rule is superior to the usual one.
Bounding procedures

Perhaps the single most important component of all enumerative methods, from the point of view of its influence on overall efficiency, is the lower bounding procedure. Much of the discussion centered around ways of improving the latter.

Cutting planes have an important role to play here, since they can substantially improve the bound obtained by linear programming. This is especially true in the cases where the cutting planes used are either facets, or other higher-dimensional faces, of the convex hull of feasible integer points, as demonstrated in the case of the travelling salesman problem (Padberg). The key aspect here is to be able to generate a relatively small set of cutting planes which, when added to the linear program, substantially reduces the gap between the integer and the continuous optimum. This can sometimes be achieved by generating cutting planes from disjunctions corresponding to a partial search tree. It has been shown (Balas [3]) that such cuts often provide better bounds than the partial tree from which they were derived.

Another way of using cutting planes is to take them into the objective function with appropriate multipliers, in a Lagrangean fashion. Optimal multipliers can either be calculated by subgradient or ascent methods, or approximated by heuristics. A related, but different, procedure is to derive a Lagrangean dual by using the group equation associated with the basis at the given node (Shapiro [15]). From this point of view, branch and bound can be viewed as a method of perturbing the Lagrangean dual, in the sense that, if a given dual problem fails to solve the integer program, there exists at least one variable to branch on, such that the new Lagrangean dual will yield a stronger bound (Shapiro).

The above discussion was concerned with generating strong lower bounds. The problem of obtaining good upper bounds is also an important one. This is mainly accomplished by various local search procedures and similar heuristics imbedded in implicit enumeration algorithms and meant to discover feasible integer solutions that might exist in the "vicinity" of the node under examination. Computational tests by Piper [14], Jeroslow and Smith [10], and others have convincingly documented the usefulness of such procedures in bringing about the early discovery of good, frequently optimal, solutions.
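Since the report mentions choosing the Lagrangean multipliers by subgradient methods, the following hedged Python sketch shows one common such scheme for a minimization problem in which constraints Ax <= b have been priced into the objective. The routine solve_lagrangean and the diminishing step-size rule are assumptions made for illustration, not the procedure of any particular participant.

```python
import numpy as np

def subgradient_lower_bound(A, b, solve_lagrangean, iters=100, step0=1.0):
    """Maximize L(u) = min_x { c'x + u'(Ax - b) } over u >= 0.
    solve_lagrangean(u) is assumed to return (x, L(u)) for the easy subproblem."""
    u = np.zeros(A.shape[0])
    best = -np.inf
    for t in range(iters):
        x, value = solve_lagrangean(u)
        best = max(best, value)              # every L(u) is a valid lower bound
        g = A @ x - b                        # a subgradient of L at u
        if np.all(g <= 1e-9):
            break                            # x satisfies the relaxed constraints
        u = np.maximum(0.0, u + (step0 / (t + 1)) * g)   # projected step
    return best
```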
Logical tests

In spite of the increasing use of linear programming as a bounding procedure even within implicit enumeration algorithms, the usefulness of logical tests exploiting the binary nature of the variables has been convincingly documented by statistical experiments. Computational tests with implicit enumeration algorithms using an imbedded linear program have shown that taking out the logical tests increases computing times by 10-100% (see Piper's above-mentioned study [14], and also Spielberg [16]). One of the shortcomings of the commercial branch and bound codes is that they do not contain such tests. Since these codes are sometimes used to solve pure 0-1 programs, their performance in such instances could clearly be improved by the incorporation of tests. In the mixed integer 0-1 case, logical tests can be used by deriving Benders-type inequalities in the 0-1 variables only. Along with the tests, the use of canonical or logical inequalities would also have a positive effect (Spielberg). (An elementary logical test of this kind is sketched at the end of this report.)

A particularly useful class of tests, when available, are those based on dominance (Ibaraki). They have been used with good computational results on the knapsack problem. Furthermore, it has been shown by Ibaraki [9] that under certain assumptions a branch and bound procedure has exponential behavior in the absence of dominance tests, but not in their presence. Unfortunately, valid dominance tests have so far been found only for a few classes of problems.

Special structures

The short history of integer programming clearly shows the importance of structure. The most likely type of progress that one can expect in the area of integer programming is the discovery of new efficient procedures for various special structures (Balas). Though Karp's classification of integer programs from the point of view of their computational complexity shows that most "special structures" are not really so special (Edmonds), it must not be forgotten that this classification is based on a worst-case analysis, and there is empirical evidence of enormous differences between various classes of problems from the point of view of the expected performance of algorithms (Balas). It suffices to point to the 0-1 knapsack problem, for which one can always construct examples not solvable in time polynomial in the data (using binary encoding), whereas on randomly generated problems the computing time of a recent algorithm of Balas and Zemel [3a] has been empirically observed to grow linearly with the number of variables and logarithmically with the range of coefficients.

An important special structure which has not yet been successfully investigated is that of a set of loosely connected integer programs (Beale). While a similar structure in the linear programming context has led to efficient decomposition procedures, no comparably efficient procedure exists for solving the integer programming counterpart of this structure. Other interesting special structures appear as a result of replacing various kinds of nonlinear functions with their
piecewise linear approximations. The study of these structures holds good promise.

Need for experimentation

Several speakers have stressed the need for systematic computational experiments, controlled experiments in a statistical sense, to assess the role and efficiency of various choice criteria, branching rules, bounding procedures, tests, heuristics, etc. for mixed integer programs in general, and for different classes of problems in particular.

Discussants at this session included E. Balas, E.M.L. Beale, J. Edmonds, J.J.H. Forrest, F. Giannessi, P. Hansen, T. Ibaraki, E.L. Johnson, R. Jeroslow, C. Krabek, C. Lofin, M. Magazine, M. Padberg, J. Shapiro and K. Spielberg.
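The elementary logical test referred to above can be sketched as follows (our own illustration, not code from the session): for a 0-1 row sum(a[j]*x[j]) <= b, compute the smallest achievable left-hand side over the free variables, and fix any free variable whose unfavourable setting would already make the row unsatisfiable.

```python
def fix_by_infeasibility(a, b, fixed):
    """a: coefficient list of one <= row; fixed: dict index -> 0/1.
    Returns variables newly fixed by infeasibility of the opposite setting."""
    lhs_min = sum(a[j] * v for j, v in fixed.items())
    lhs_min += sum(min(a[j], 0.0) for j in range(len(a)) if j not in fixed)
    newly_fixed = {}
    for j in range(len(a)):
        if j in fixed:
            continue
        if a[j] > 0 and lhs_min + a[j] > b:
            newly_fixed[j] = 0      # setting x_j = 1 cannot lead to feasibility
        elif a[j] < 0 and lhs_min - a[j] > b:
            newly_fixed[j] = 1      # setting x_j = 0 cannot lead to feasibility
    return newly_fixed
```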
References
[1] E. Balas, An additive algorithm for solving linear programs with zero-one variables, Operations Res. 13 (1965) 516-546.
[2] E. Balas, Bivalent programming by implicit enumeration, in: J. Belzer, A.G. Holzman and A. Kent, eds., Encyclopedia of Computer Science and Technology, Vol. 2 (M. Dekker, New York, 1975) 479-494.
[3] E. Balas, Disjunctive programming, Ann. Discrete Math. 5, this volume.
[3a] E. Balas and E. Zemel, Solving large zero-one knapsack problems, M.S.R.R. No. 408, Carnegie-Mellon University, June 1976, to appear in Operations Res.
[4] E.M.L. Beale, Branch and bound methods for mathematical programming systems, present volume, p. 201.
[5] E.M.L. Beale and J.A. Tomlin, Special facilities in a general mathematical programming system for nonconvex problems using ordered sets of variables, in: L. Lawrence, ed., Proceedings of the 5th International Conference on Operations Research (Tavistock Publications, London, 1970) 447-454.
[6] R.J. Dakin, A tree-search algorithm for mixed integer programming problems, Comput. J. 8 (1965) 250-255.
[7] J.J.H. Forrest, J.P.H. Hirst and J.A. Tomlin, Practical solution of large mixed integer programming problems with Umpire, Management Sci. 20 (1974) 736-773.
[8] A.M. Geoffrion, An improved implicit enumeration approach for integer programming, Operations Res. 17 (1969) 437-454.
[9] T. Ibaraki, The power of dominance relations in branch and bound algorithms, J. Assoc. Comput. Mach. 24 (1977) 264-279.
[10] R.G. Jeroslow and T.H.S. Smith, Experimental results on Hillier's linear search, Math. Programming 9 (1975) 371-376.
[11] A.H. Land and A.G. Doig, An automatic method for solving discrete programming problems, Econometrica 28 (1960) 497-520.
[12] A.H. Land and S. Powell, Computer codes for problems of integer programming, Ann. Discrete Math. 5, this volume.
[13] C.E. Lemke and K. Spielberg, Direct search algorithms for zero-one and mixed integer programming, Operations Res. 15 (1967) 892-917.
[14] C.J. Piper, Implicit enumeration: a computational study, W.P. 115, School of Business Administration, University of Western Ontario, London, Ontario (1974).
[15] J.F. Shapiro, A survey of Lagrangean techniques for discrete optimization, Ann. Discrete Math. 5, this volume.
[16] K. Spielberg, Enumerative methods in integer programming, Ann. Discrete Math. 5, this volume.
Annals of Discrete Mathematics 5 (1979) 193-194 © North-Holland Publishing Company
Report of the Session on CUTTING PLANES
Michel GONDRAN (Chairman)
Bruno Simeone (Associate Editor)

Cutting planes were used as early as 1952, when Dantzig, Fulkerson and Johnson began work on the travelling salesman problem. They used subtour elimination constraints as well as ad hoc linear constraints generated to exclude a current fractional answer. By 1958, Gomory had a finitely convergent cutting plane method. Perhaps the most successful extension and use of Gomory's cutting plane work was by Glenn Martin, beginning in the early 1960's. In the mid-1960's Edmonds showed how facets of the matching polytope, and later the matroid intersection polytope, could be used in a polynomially bounded algorithm. This work showed that, at least in these two special cases, despite the fact that there are an enormous number of facets, they can be used efficiently to solve problems. This work cannot be considered to be cutting plane work, but it has obvious implications. Later work has included convexity and intersection cuts, disjunctive cuts, hybrid methods, polarity, subadditive functions to give cuts, Benders cuts, and logically derived cuts.

There is a common contention that cutting plane methods have been a failure in practice. The general view expressed is that there have been enough successes, enough untried avenues, and sufficiently many new directions that the contention is nonsense. However, in saying so, some new work employing different directions was brought in; for example, the idea of "facet cuts" used successfully by Grotschel, Hong, Land, Milliotis, and Padberg in carrying on the approach of Dantzig, Fulkerson, and Johnson to solve travelling salesman problems. This approach tries to remain integer feasible and generate a facet of the convex hull of solutions which goes through the current integer vertex and cuts off a better, but fractional, answer. This work also has the interesting aspect that it uses the matching facets as cuts. The possibility of using facets for a relaxed version of the problem is one promising avenue.

In general, the idea of strengthening the linear programming relaxation of an integer program is thought to be very important for branch-and-bound methods. However, a straightforward implementation need not work well. Guignard and Spielberg have had good results using the Gomory-Johnson subadditive cuts in an enumeration code, where the linear programs are solved infrequently but used to
guide the subsequent search. In general, a stronger linear program is very important, and some of the new directions seem promising in providing this better program to be used in other solution techniques.

The use of Benders cuts comes from not following his decomposition procedure completely, but extracting some of the cuts derived in the process of his procedure. In connection with this idea, the notion of preserving or utilizing the structure of a problem was raised. Sometimes cuts destroy the matrix structure present up to then. In particular, most cuts tend to be dense. The main suggestion raised was to use cuts in the objective function in a Lagrangean fashion, but the need to preserve structure points to a promising research direction.

Certain problems tend to be solved easily using cuts such as Gomory's original fractional cutting plane method. Among these problems are set covering problems such as those encountered in the crew scheduling problem. Glenn Martin's success with his accelerated Euclidean method is well-known for such problems. However, there are hard set-covering problems, too. Gomory's cutting plane methods can behave very differently on quite similar problems. In this area, there is practically no understanding, other than the vague one that the linear program should closely approximate the integer problem, of what makes a problem easy or intractable.

Another drawback of cutting plane methods is that they have never worked well on mixed-integer problems. This fact is somewhat paradoxical, since the mixed problem is a relaxation of the pure problem. Recent work on the mixed group problem and on subadditive functions with gauges as directional derivatives gives hope of generating cutting planes which use the proper structure for mixed problems.

A possible advantage of using cutting planes which was mentioned is that post-optimal analysis might be possible. Although this area is difficult, the introduction of a subadditive dual concept gives some hope of progress.

One area which was discussed was when cuts should be kept and when discarded. Little effort has gone into systematically determining which constraints are non-redundant in a linear sense. Commonly, the original constraints are all kept and cuts are discarded when they become slack. In using facet cuts, the cuts are usually not discarded once they have been generated.

Finally, we return to the success question. Are cutting planes fruitful only as an area for a proliferation of papers? Solid computational success, other than for the well-solved problems using special algorithms, is practically limited to Martin's work on airline crew scheduling and the work on the travelling salesman problem. Yet the general consensus was that there are enough ideas and enough open avenues that cutting planes cannot be written off and should, in fact, become a useful part of the problem-solver's library of codes.
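As a concrete reference point for the fractional cuts discussed above, here is a minimal Python sketch of how Gomory's fractional cut is read off a simplex tableau row of a pure integer program. The dictionary-based interface is our own illustrative choice.

```python
import math

def gomory_fractional_cut(abar, bbar, tol=1e-9):
    """Given a tableau row x_B + sum_j abar[j]*x_j = bbar with all variables
    integer and nonnegative and bbar fractional, the inequality
    sum_j frac(abar[j])*x_j >= frac(bbar) holds for every integer solution."""
    frac = lambda v: v - math.floor(v)
    coeffs = {j: frac(a) for j, a in abar.items() if frac(a) > tol}
    return coeffs, frac(bbar)

# e.g. the row x_B + 0.5*x_1 - 0.25*x_2 = 3.75 yields 0.5*x_1 + 0.75*x_2 >= 0.75
cut, rhs = gomory_fractional_cut({1: 0.5, 2: -0.25}, 3.75)
```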
Annals of Discrete Mathematics 5 (1979) 195-197 © North-Holland Publishing Company
Report of the Session on GROUP THEORETIC AND LAGRANGEAN METHODS

M.W. PADBERG (Chairman)
A. Bachem (Editorial Associate)

The session started out with J.F. Shapiro giving a short survey of his view of the field. He had worked experimentally and theoretically on the polyhedra describing the sets of cuts one could add implicitly to the linear programming tableau by group equations, and he used Lagrangean methods for handling these cuts. The following computational advantages of using Lagrangean methods were emphasized:
1. You do not have to generate these inequalities explicitly and add them to the tableau.
2. You can generate monotonically increasing lower bounds.
3. The Lagrangean returns integer solutions every time.
He reported on computational experience he had had with Lagrangean and dual methods using the group problem as a subproblem. In particular, he found that data manipulation procedures which permit large groups to be approximated by small groups have sometimes been quite effective. He made a plea that one should not condemn the group theoretic methods because the groups become too large. He pointed out that this should be an area of numerical analysis in discrete optimization. Numerical analysis in LP had produced good results, but it had so far been neglected for integer programming.

M. Gondran then explained important features of the Lagrangean methods. In particular, he gave a classification of problem classes solved by Lagrangean methods in the following way:
Class A: Problems where the formulation is intrinsic and the realization of the constraints is imperative (e.g., the travelling salesman problem, where the constraints are imperative).
Class B: Problems having an intrinsic formulation but where the realization of the constraints is not imperative (e.g., telecommunication problems).
Class C: Problems where the formulation is not intrinsic (e.g., duality in group theoretic problems).
Lagrangean methods could be very interesting for Classes A and B, and rather less interesting for Class C.

Lagrangean methods gave good results if the order of the group was small; this was the case where cutting planes also gave good results. The cutting plane methods were perhaps better because they used at each step a new group and
corresponded to a more additive method in this way. Gondran finally gave an example of a nonlinear integer programming problem which can be solved by group theoretic approaches.

E.L. Johnson summarized some parts of his DO 77 paper on asymptotic problems. He emphasized that the group theoretic development grew out of cutting plane work and knapsack function work and had given:
(1) An asymptotic theorem describing (for large b) the relation between the linear and the integer solutions to an integer linear programming problem.
(2) A facet description of the convex hull of group solutions in terms of subadditive functions on the group, leading to a duality result and a method for solving the group problem.
(3) A dynamic programming or shortest path method for solving a relaxed problem, which has been used, e.g., by Gorry and Shapiro in pure integer enumeration codes.
(4) A subadditive approach which generalizes to pure and mixed integer programming.
(5) Use of group ideas in Lagrangean methods.

The session continued with a discussion of duality results for group problems. Two different kinds of duality were mentioned: the kind worked on by Bell and Shapiro, and secondly, the subadditive duality worked out by Johnson and Jeroslow. Commonalities and differences between these two notions of duality still await study. Also mentioned was a different type of duality theory due to Balas (1967). Balas explained that his version of duality was related to the Benders approach for the mixed integer program. There were no computational consequences for the linear case, although for the nonlinear case it gave a generalization of Benders' decomposition method.

The discussion then shifted to the question of what the group theoretic and the Lagrangean methods had contributed to the theory of integer programming, especially what group theory in particular had contributed to the description of polyhedra. The fact that edge-matching problems and node-packing problems give rise to the same structure was noted, with the inevitable conclusion that at least in this case no insights had been gained from the group approach. This puts the group approach into perspective as one possible relaxation of the general integer programming problem, though certainly as the most extensively studied and best developed such relaxation. The subadditive methods permit greater flexibility in this respect and thus can be expected to yield more with regard to the above question.

With regard to Lagrangean methods it was remarked that their notable computational success stems mainly from the ability to solve certain linear programs by way of subgradient optimization. In this context, R. Jeroslow told of pertinent computational success by the late H. Samuelson, who applied subgradient optimization to the (highly degenerate) set covering problem. Samuelson's empirical findings (contained in several Management Science Research Reports available from GSIA, Carnegie-Mellon University) indicate that for large enough
problems there is a factor of m (where m is the number of rows) by which the subgradient methods are faster than the simplex method, which, however, was faster on smaller problems. The breakeven point was at about 40 rows, from which point on the subgradient method was better. While such empirical findings exist only for specially structured problems, it should be worthwhile to exploit subgradient optimization in general linear programming.
PART 4
COMPUTER CODES
CONTENTS OF PART 4

Surveys
E.M.L. BEALE, Branch and bound methods for mathematical programming systems 201
A. LAND and S. POWELL, Computer codes for problems of integer programming 221
Reports
Current state of computer codes for discrete optimization (J.J.H. FORREST) 271
Codes for special problems (F. GIANNESSI) 275
Current computer codes (S. POWELL) 279
Annals of Discrete Mathematics 5 (1979) 201-219 © North-Holland Publishing Company
BRANCH AND BOUND METHODS FOR MATHEMATICAL PROGRAMMING SYSTEMS

E.M.L. BEALE
Scientific Control Systems Ltd and Scicon Computer Services Ltd
Branch and Bound algorithms have been incorporated in many mathematical programming systems, enabling them to solve large nonconvex programming problems. These are usually formulated as linear programming problems with some variables being required to take integer values. But it is sometimes better to formulate problems in terms of Special Ordered Sets of variables of which either only one, or else only an adjacent pair, may take nonzero values. Algorithms for both types of formulation are reviewed. And a new technique, known as Linked Ordered Sets, is introduced to handle sums and products of functions of nonlinear variables in either the coefficients or the right hand sides of an otherwise linear, or integer, programming problem.
1. Introduction

In principle, integer programming means the formulation and solution of problems that can be represented as linear programming problems when some or all of the variables are required to be integer. These variables are often zero-one variables indicating whether or not some activity should take place, but the Branch and Bound methods that have been incorporated in many mathematical programming systems are equally applicable to integer variables that can take general nonnegative values. And the methods have been extended to other formulations that could be expressed in terms of integer variables, but where this would make the problems harder by obscuring their logical structure. In practice, it therefore seems best to allow the term "integer programming" to cover all applications of Branch and Bound methods to nonconvex programming problems.

The Branch and Bound approach can be explained in general terms as follows. We wish to maximize $f(x)$ subject to the constraint that $x \in R$. Suppose that we can find a point $x'$ that maximizes $f(x)$ subject to the constraint that $x \in R_1$, where $R_1$ includes $R$. This is much easier than the original problem if $f(x)$ is concave and $R_1$ is convex but $R$ is not convex, since we can use a hill-climbing technique such as the Simplex Method to find a local optimum, which is then necessarily a global optimum. And the problem is not much harder if $R_1$ consists of the union of a moderate number of convex sets, since we can find the optimum within each convex subset of $R_1$ and take the best of them. If now $x' \in R$ it solves the original problem. Otherwise we modify $R_1$ so that it excludes $x'$ but no point in $R$, and solve the modified problem. More generally, if we have found a point $x''$ in $R$, known as the Incumbent Solution, then we need only insist that the modified $R_1$
does not exclude any points in $R$ for which $f(x) > f(x'')$. Whenever we find a point in $R$ with $f(x) > f(x'')$, this becomes the new incumbent solution. This process is repeated until we have a solution to the modified problem that is in $R$, or until $R_1$ is empty because there are no points in $R$ with $f(x) > f(x'')$.

The original region $R_1$ is usually taken as the set of points satisfying all the equality and inequality constraints, but omitting the requirement that variables must take integer values. And the modifications to $R_1$ usually consist of adding linear inequality constraints. If a single inequality is added, it is known as a Cutting Plane. This approach was pioneered by Gomory [10], and many alternative types of cutting plane have been developed since. But the apparently more pedestrian approach of adding two alternative linear inequalities has proved more reliable. This was called Branch and Bound by Little et al. [12].

Branch and Bound methods for explicit integer programming formulations are discussed in Section 2. Special Ordered Sets, introduced by Beale and Tomlin [2], allow these techniques to be applied to some important problems without explicitly introducing integer variables. Beale and Forrest [1] have extended these ideas to the global optimization of nonconvex problems including continuous nonlinear functions of single arguments. This work is discussed in Section 3. More general nonlinear functions are considered in Section 4. Many problems contain only a moderate number of nonlinear variables, in the sense that, when these variables are given fixed values, the problem reduces to a linear, or integer, programming problem. A new technique, known as Linked Ordered Sets, is introduced to handle problems of this kind when all the coefficients of the variables, and the terms independent of the linear variables, can be expressed as sums of products of functions of individual nonlinear variables.

Since the rest of this paper is about formulations and algorithms, it seems appropriate to end this introduction with a few words about the applicability of integer programming. The largest feasible size for integer programming problems is similar to that for linear programming problems; and integer programming codes take advantage of the recent improvements in linear programming methodology as well as other improvements, so that they are now about 10 times faster on the same hardware than they were 5 years ago. So the scope for integer programming applications has increased; but not dramatically, because the size of an integer programming problem measured in numbers of constraints, variables and nonzero coefficients remains a very poor guide to its difficulty. Some large practical problems can be solved very easily. But unless the number of integer variables is very small, say less than 20, it is important that the continuous optimum should give a fair guide to the integer solution, or at least that the value of the objective function in the modified problem should approximate the value at the best integer solution after a few alternative sets of inequality constraints have been added. Some formulations where this is not so have consumed many hours of CPU time on large computers without producing anything useful. So, although
one should approach integer programming applications optimistically, this optimism must be cautious, and one must be prepared for setbacks on new types of application, or even on old types of application when the problem becomes larger. These setbacks can sometimes be overcome by developing a new mathematical formulation for the same physical problem, or by changing some of the parameters controlling the search strategy. Indeed two vital reasons for the success of Branch and Bound methods are that one can often see how to reformulate a problem to make it more amenable to these methods, and that one will usually have a good solution if the search is terminated before a guaranteed optimum has been found.
2. Explicit integer programming

Branch and Bound methods for integer programming were pioneered by Land and Doig [11]. They have been developed extensively since then, and many current ideas are described by Benichou et al. [3, 4], Mitra [14] and Forrest et al. [8]. This account is similar, but it incorporates unpublished experience from Scicon Computer Services Ltd. The methods have been developed primarily for problems containing both integer and continuous variables, but Breu and Burdet [6] show that they are also appropriate for problems containing only zero-one variables.

It is convenient to start by introducing some general notation. We let $x_0$ denote the value of the objective function to be maximized, and write the problem as: Choose values of the variables $x_j$ to maximize $x_0$ subject to
$$\sum_j a_{ij} x_j = b_i \quad (i = 1, \ldots, m), \qquad L_j \le x_j \le U_j. \tag{2.1}$$

Certain specified $x_j$ (the integer variables) must take integer values. (2.2)

The method involves temporarily ignoring the constraints (2.2) and solving a sequence of linear programming subproblems. These all have the same form as the original problem (2.1) except that they have different sets of lower and upper bounds $L_j$ and $U_j$. We let $v_j$ denote the value of $x_j$ at the optimum solution to a linear programming subproblem. For an integer variable we are interested in the integer and
fractional parts of $v_j$, so we write
$$v_j = n_j + f_j, \tag{2.3}$$
where $n_j$ is an integer and $0 \le f_j < 1$. We also define the value of a subproblem as the maximum attainable value of $x_0$ at any point satisfying (2.2) as well as the constraints of the subproblem.

The method can now be described in general terms as follows. First solve the problem as a linear programming problem, treating the zero-one variables as continuous with bounds of zero and one, and general integer variables as continuous with suitable (integer) bounds. It does not usually matter if these bounds are far apart. If all integer variables then take integer values, the entire problem is solved. Otherwise this continuous optimum provides an upper bound $v_0$ on the maximum attainable value of $x_0$, which may be compared with the value of $x_0$ at any feasible solution found later. Even if it is not a global optimum, such a feasible solution is known as an integer solution, indicating that all integer variables are within some tolerance of integer values. The word infeasible then describes a solution that does not satisfy (2.1), or a problem for which all solutions are infeasible in this sense.

If the continuous optimum solution is not integer, we set up a list of alternative subproblems to be explored, with the original linear programming problem as its first entry. We also keep the incumbent solution, together with a cut-off value $x_{0c}$. Normally $x_{0c}$ is taken as the value of the incumbent solution, but we may increase it by some nonnegative tolerance TVAL. If TVAL > 0 we may miss the global optimum, but we must find a solution whose value is within TVAL of the global optimum if we complete the search, and the task of completing the search may be made much easier. If the objective function is defined only in terms of integer variables, then TVAL can safely be set to a quantity just less than the Highest Common Factor of the coefficients defining the objective function. So if these are arbitrary integers we may set TVAL = 0.99. Until an integer solution is found we may set $x_{0c}$ to $-\infty$.

We can now define a general step of the algorithm, as follows. See if the list is empty. If so stop: the incumbent solution is guaranteed to be within TVAL of the global optimum. Otherwise remove a subproblem from the list, and solve it. If $v_0 \le x_{0c}$ abandon the subproblem, since it cannot produce a significantly better solution. Otherwise, if all integer variables take integer values we have a new incumbent solution. Put $x_{0c} = v_0 + \text{TVAL}$, and remove any subproblem from the list if the upper bound on its value is less than or equal to the new value of $x_{0c}$. If $v_0 > x_{0c}$, but the solution is not integer, then choose a variable on which to branch from the unsatisfied integer variables, i.e. an integer variable $x_k$ such that $f_k \ne 0$. Then we must ultimately have either $x_k \le n_k$ or $x_k \ge n_k + 1$. So add a pair of new subproblems to the list, which are both the same as the one just removed, except that in one $U_k$ is replaced by $n_k$ and in the other $L_k$ is replaced by $n_k + 1$.
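The general step just described might be sketched in Python as follows. The routine solve_lp, the LIFO node list, and the tolerances are illustrative assumptions; a production code also stores the extra information listed below and chooses the branching variable far more carefully.

```python
import math

def branch_and_bound(solve_lp, integer_vars, L, U, TVAL=0.0, tol=1e-6):
    """Maximization; solve_lp(L, U) returns (v0, x) or None if infeasible.
    L and U are dicts of lower and upper bounds on the variables."""
    incumbent, x0c = None, float("-inf")
    nodes = [(dict(L), dict(U))]            # the list of unexplored subproblems
    while nodes:
        L_n, U_n = nodes.pop()              # LIFO choice, for simplicity only
        lp = solve_lp(L_n, U_n)
        if lp is None:
            continue                        # subproblem infeasible
        v0, x = lp
        if v0 <= x0c:
            continue                        # cannot significantly beat incumbent
        unsatisfied = [j for j in integer_vars
                       if abs(x[j] - round(x[j])) > tol]
        if not unsatisfied:
            incumbent, x0c = x, v0 + TVAL   # new incumbent; raise the cut-off
            continue
        k = unsatisfied[0]                  # a real code weighs this choice
        n_k = math.floor(x[k])              # integer part of the trial value
        down_U, up_L = dict(U_n), dict(L_n)
        down_U[k] = n_k                     # branch x_k <= n_k
        up_L[k] = n_k + 1                   # branch x_k >= n_k + 1
        nodes.append((L_n, down_U))
        nodes.append((up_L, U_n))
    return incumbent
```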
With each subproblem store:
(a) the lower and upper bounds for each integer variable,
(b) the optimum basis for the linear programming problem from which the new subproblem was generated,
(c) a guaranteed upper bound U on the value of the subproblem, and
(d) some estimate E of this value.

This procedure is often represented as a tree-search. Each node of the tree represents a subproblem. The root represents the original linear programming problem, and when any subproblem, or node, is explored any new subproblems generated are joined to it by branches. The list of alternative subproblems to be solved is therefore called the list of unexplored nodes. This tree structure clarifies the logic of the method. Fortunately it does not need to be represented explicitly in the computer.

This general description leaves five important questions unresolved:
(Q1) Which subproblem should be explored next?
(Q2) How should it be solved?
(Q3) On which variable should we branch?
(Q4) How should U be computed?
(Q5) How should E be computed?
We now consider each of these in turn.

Concerning Question Q1, there is a theoretical argument for always exploring the subproblem with the highest value of U, because this would minimize the number of subproblems created before completing the search if the branching procedure from this subproblem were unaffected by the value of $x_{0c}$. But this strategy makes no attempt to find any good integer solutions early in the search, and is subject to five objections:
(i) The branching procedure can sometimes be improved if we have a good value of $x_{0c}$.
(ii) A good value of $x_{0c}$ reduces the work on some subproblems, since they can be abandoned before they are fully solved if it can be shown that $v_0 \le x_{0c}$.
(iii) We can often arrange to solve either, or even both, of the subproblems that have just been generated without re-inverting to the current basis.
(iv) The list of unexplored nodes may grow very long, and create computer storage problems.
(v) No useful results may be obtained if the search has to be ended before a guaranteed optimum has been found.

An alternative strategy was advocated by Little et al. [12] and used in early work on large-scale integer programming. This is "Always Branch Right", or
equivalently "Last In First Out". It minimizes the number of subproblems in the list at any time, and usually finds a good solution early, which can be used to improve the rest of the search. It also allows the subproblems in the list to be stored in a sequential store such as magnetic tape. But it has been found inadequate on larger and more difficult problems, and has been discarded.

There is not much wrong with Last In First Out as long as we are branching on one of the subproblems just created. And this is particularly true if the more promising of these subproblems is always chosen. But when neither subproblem leads to any new subproblem, Last In First Out involves backtracking through the most recently created subproblems, and this is often unrewarding.

Practical integer programming codes now contain many options, but the following answer to Q1 is often as good as any. Divide the algorithm into stages. At each stage, consider a pair of subproblems that were created at the same time, and explore either one or both of them. Choose the pair containing the subproblem with the highest value of E, but if any new subproblems were created in the previous stage restrict the choice to one of these. Having selected a pair of subproblems, explore the one with the higher value of E. Then explore the other (if it is still in the list) unless a new subproblem has just been created with at least as good a value of E.

Once $x_{0c} > -\infty$, it is rather better to change the criterion and to use the "percentage error" criterion due to Forrest et al. [8]. This is to maximize $(U - x_{0c})/(U - E)$ rather than E when choosing subproblems. The reason for this is as follows. The value of the subproblem is certainly less than or equal to U, and our estimate is that it is $U - (U - E)$. Suppose that the value is in fact $U - \theta(U - E)$. This is an improvement on the best known solution so far if and only if
$$\theta < \frac{U - x_{0c}}{U - E}.$$
So the subproblem for which this ratio is maximized is in a sense the one most likely to yield a better solution.

We now turn to Question Q2: how should each subproblem be solved? It is normally best to use some dual method for linear programming, both because we start with a dual feasible solution and because we can abandon the task if we can ever establish that $v_0 \le x_{0c}$. And subproblems with a very low value of $v_0$ tend to be particularly awkward to solve. The most natural method is parametric variation on the bounds of the integer variables.

Question Q3 is perhaps the most important and the most difficult. In principle we should try to branch first on the important variables. We can see this by noting that a first branch on an irrelevant variable doubles the size of the problem without making any progress towards solving it. Often the formulator of the problem can assess the relative importance of the variables. So he should have the option to assign them priorities, and the code will then always branch on some unsatisfied integer variable with the highest priority.
If no priorities are assigned, or if there is more than one unsatisfied integer variable with the highest priority, then some purely numerical method must be used to choose between them. This is done by considering the possibility of branching on each in turn and estimating the resulting values of either U or E.

Early work on integer programming emphasized the bounds U. Driebeek [7] introduced the term penalty to describe the difference between $v_0$ and the bound associated with driving an integer variable from its current trial value to the nearest integer in some particular direction. He showed how these can be calculated by implicitly carrying out one step of the dual simplex method. Tomlin [15] showed how these penalties can sometimes be sharpened at a negligible computing cost by using the fact that no nonbasic integer variable can be increased to a value less than one. Geoffrion [9] presents these and stronger (but more elaborate) bound calculations as special cases of Lagrangian relaxation. Breu and Burdet [6] suggest that such methods are useful for zero-one programming problems; but the fact remains that practical problems tend to have several nonbasic variables with zero reduced costs, when these methods are largely useless.

It therefore seems best to concentrate on the estimates E. Corresponding to each unsatisfied integer variable $x_j$ we find some way of computing $D_j^-$, the estimated reduction in $v_0$ from reducing $U_j$ to $n_j$, and $D_j^+$, the estimated reduction in $v_0$ from increasing $L_j$ to $n_j + 1$. The quantities $D_j^-$ and $D_j^+$ are known as estimated degradations from driving $x_j$ down and up respectively.

The most straightforward way to estimate degradations is to use pseudo-costs. We can define Down and Up pseudo-costs $P_{Dj}$ and $P_{Uj}$ representing the average incremental costs of decreases and increases in the integer variable $x_j$ from its optimum value in any linear programming subproblem. We can then write
$$D_j^- = P_{Dj} f_j \quad \text{and} \quad D_j^+ = P_{Uj}(1 - f_j).$$
The pseudo-costs may be treated as input data, or they may be estimated from the results of previous branching operations in other subproblems. If $v_0$ dropped by $\Delta_D$ when we imposed the bound $x_k \le n_k$, then we may estimate $P_{Dk}$ by $\Delta_D/f_k$. Similarly, if $v_0$ dropped by $\Delta_U$ when we imposed the bound $x_k \ge n_k + 1$, then we may estimate $P_{Uk}$ by $\Delta_U/(1 - f_k)$.

But the concept of pseudo-costs does not generalize naturally to Special Ordered Sets. So we now switch attention from columns to rows, and develop an alternative approach that seems to be more logical for both integer variables and Special Ordered Sets.
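A hedged sketch of this pseudo-cost bookkeeping, with assumed names; real codes average successive observations rather than overwriting them as done here.

```python
def estimated_degradations(f_k, P_D, P_U):
    """D_k^- = P_Dk * f_k and D_k^+ = P_Uk * (1 - f_k) for fractional part f_k."""
    return P_D * f_k, P_U * (1.0 - f_k)

def observe_down_branch(f_k, delta_D, pseudo_costs, k):
    """After the bound x_k <= n_k degraded the LP value by delta_D >= 0,
    record the estimate P_Dk = delta_D / f_k (here simply overwritten)."""
    pseudo_costs[k] = delta_D / f_k
```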
When considering the effect of imposing a change on the value of some integer variable $x_k$, it is natural to rewrite the constraints with this variable on the right-hand side. The problem then reads: Maximize $x_0$ subject to
$$\sum_{j \ne k} a_{ij} x_j = b_i - a_{ik} x_k \quad (i = 1, \ldots, m), \qquad L_j \le x_j \le U_j.$$
This shows that if we increase the trial value of $x_k$ by $1 - f_k$, then there is no effect on the value of $x_0$ or any of the other variables if we simultaneously decrease each $b_i$ by $z_i = a_{ik}(1 - f_k)$. The same argument applies to decreasing $x_k$ by $f_k$ if we put $z_i = -a_{ik} f_k$. So, to evaluate the effect of changing $x_k$, we imagine that the value of this variable is held constant while we decrease each $b_i$ by $z_i$, operating simultaneously on all rows.

If $\pi_i$ denotes the shadow price, or Lagrange multiplier, on the ith row, then if all $z_i$ were small $v_0$ would be degraded by $\sum_i \pi_i z_i$. But this is not often a realistic estimate of the degradation D. It is a guaranteed lower bound, which will often be zero. Indeed it will always be zero if $x_k$ is a basic variable. We can find an upper bound on the degradation in terms of the minimum and maximum shadow prices $\pi_{MIN,i}$ and $\pi_{MAX,i}$. Brearley et al. [5] show how to calculate these bounds. In particular, note that $\pi_{MIN,0} = \pi_{MAX,0} = 1$, and $\pi_{MIN,i} \ge 0$ if the ith constraint is a less-than-or-equal-to inequality. If P denotes the set of rows for which $z_i > 0$ and N the set of rows for which $z_i < 0$, then
$$D \le \sum_{i \in P} \pi_{MAX,i}\, z_i + \sum_{i \in N} \pi_{MIN,i}\, z_i.$$
So we may write
$$D = \sum_i \pi_i z_i + \sum_i r_i, \tag{2.4}$$
where
$$0 \le r_i \le (\pi_{MAX,i} - \pi_i) z_i \ \text{ for } i \in P, \qquad 0 \le r_i \le (\pi_{MIN,i} - \pi_i) z_i \ \text{ for } i \in N.$$
But these upper bounds are not necessarily useful and may even be infinite. It seems that only heuristic methods can provide realistic estimates for $r_i$. If the reductions on the right hand sides are written as $\theta z_i$, then we may consider how the shadow prices $p_i$ vary as $\theta$ increases from 0 to 1. Elementary theory shows that $\sum_i p_i z_i$ increases monotonically. Although it is piecewise constant, and increases discontinuously when the basis changes, the overall effect is much the same as if it increased continuously. But the increases may stop, and the
individual $p_i$ will always lie within the bounds for the shadow prices, which may be far tighter than indicated by our preliminary analysis. So we introduce further adjustable parameters, which we call pseudo-shadow-prices $\pi_{Li}$ and $\pi_{Ui}$, and pseudo-derivatives $\pi_{Di}$. If we also define a small positive tolerance $T_{Pi}$, we may estimate D as if $p_i$ varied linearly with $z_i$ between limits of
$$\pi_{LA,i} = \max(\pi_{MIN,i},\, \min(\pi_{Li},\, \pi_i - T_{Pi})), \qquad \pi_{UA,i} = \min(\pi_{MAX,i},\, \max(\pi_{Ui},\, \pi_i + T_{Pi})), \tag{2.5}$$
and with a rate of change of $\pi_{Di}$ (which must be positive). Since $r_i = \int_0^{z_i} (p_i(z) - \pi_i)\,dz$ and $p_i(0) = \pi_i$, these assumptions imply that, for $i \in P$,
$$r_i = \tfrac{1}{2}\pi_{Di} z_i^2 \ \text{ if } \pi_{Di} z_i < \pi_{UA,i} - \pi_i, \qquad r_i = (\pi_{UA,i} - \pi_i)\bigl(z_i - \tfrac{1}{2}(\pi_{UA,i} - \pi_i)/\pi_{Di}\bigr) \ \text{ otherwise}, \tag{2.6}$$
and for $i \in N$ correspondingly, with $\pi_{LA,i}$ in place of $\pi_{UA,i}$,
$$r_i = \tfrac{1}{2}\pi_{Di} z_i^2 \ \text{ if } \pi_{Di}\,|z_i| < \pi_i - \pi_{LA,i}, \qquad r_i = (\pi_i - \pi_{LA,i})\bigl(|z_i| - \tfrac{1}{2}(\pi_i - \pi_{LA,i})/\pi_{Di}\bigr) \ \text{ otherwise}. \tag{2.7}$$
Note that, for $i \in P$, $r_i > 0$ unless $\pi_i = \pi_{MAX,i}$, and for $i \in N$, $r_i > 0$ unless $\pi_i = \pi_{MIN,i}$. So the estimated degradation is never zero unless the true degradation is zero.

The optimum parameter settings for this approach are not clear. Beale and Forrest [1] recommend small positive values for all $\pi_{Ui}$, small negative values for all $\pi_{Li}$ and effectively infinite values for all $\pi_{Di}$. These work reasonably well, but cannot be taken as the last word.

We can now calculate $D_k^-$ from (2.4) to (2.7) with $z_i = -a_{ik} f_k$, and $D_k^+$ in the same way with $z_i = a_{ik}(1 - f_k)$.

We now return to Question Q3, and consider how to use either penalties or estimated degradations for branching. In some ways it is most natural to branch on the variable for which the larger of the two penalties, or estimated degradations, is largest. If we use penalties and can show that $U \le x_{0c}$ for some branch, then we can discard this branch immediately and make a forced move. Theoretically we may be able to make many such forced moves at once, driving each of the associated integer variables to its only possible neighbouring integer value. But the same argument does not apply when using estimated degradations; and it now seems best to branch on the most important variable, defined as the one for which the smaller of the two estimated degradations is as large as possible.

Forced moves, to reduce the feasible region in a subproblem without increasing the number of subproblems, remain useful when they can be made easily. If $x_j$ is a nonbasic integer variable and the absolute value of its reduced cost is greater than or equal to $v_0 - x_{0c}$, then this variable cannot be changed by a whole integer without destroying all hope of finding a better solution, so we can fix this variable
at its current trial value in this subproblem. Brearley et al. [5] discuss other devices for reducing the feasible regions for linear and integer programming problems.

Concerning Question Q4, our current procedure is to set U to the value of $v_0$ for the subproblem from which the new problem will be generated. But a more sophisticated approach may well be justified, particularly if the corresponding value of E is less than $x_{0c}$, so that there is some hope of eliminating the entire subproblem.

Question Q5 can be answered by assuming that the degradations from driving all integer variables to integer values act independently. The resulting estimates are often useful, even when this assumption is far from valid. So the estimate associated with reducing $U_k$ to $n_k$ is defined by
$$E = v_0 - D_k^- - \sum_{j \ne k} \min(D_j^-, D_j^+),$$
while the estimate associated with increasing $L_k$ to $n_k + 1$ is defined by
$$E = v_0 - D_k^+ - \sum_{j \ne k} \min(D_j^-, D_j^+).$$
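Under the independence assumption just stated, the estimates can be computed as in this small illustrative routine (names are ours):

```python
def estimate_E(v0, k, direction, D_minus, D_plus):
    """Estimate of a child subproblem's value after branching on x_k."""
    own = D_minus[k] if direction == "down" else D_plus[k]
    others = sum(min(D_minus[j], D_plus[j]) for j in D_minus if j != k)
    return v0 - own - others
```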
3. Special ordered sets

Special Ordered Sets were introduced by Beale and Tomlin [2]. They are of two types. Special Ordered Sets of Type 1, or S1 sets, are sets of variables of which not more than one member may be nonzero in the final solution. Special Ordered Sets of Type 2, or S2 sets, are sets of variables of which not more than two members may be nonzero in the final solution, with the further condition that if there are as many as two they must be adjacent. So the order in which the variables occur in the set affects the meaning of an S2 set. The order also affects the way the algorithm treats S1 sets; if they cannot be set out in an obviously logical order then it may not be useful to treat them as an S1 set.

Part of the original motivation for this development was the fact that a suitable algorithm could be incorporated in a general Branch and Bound integer programming code very easily. If the variables $\lambda_{jk}$, for $k = 0, 1, \ldots, K$ are defined as an S1 set, and if a linear programming subproblem assigns a nonzero value to more than one member of the set, then we can choose an index $k_1$ and note that either
$$\lambda_{j0} = \cdots = \lambda_{jk_1} = 0 \quad \text{or} \quad \lambda_{j,k_1+1} = \cdots = \lambda_{jK} = 0.$$
In other words we can branch on the sequence numbers of the first and last members of the set that are allowed to take nonzero values in the same way as we
branch on the lower and upper bounds on general integer variables. If the set is an S2 set, then the procedure is the same, except that the choice is now that either
$$\lambda_{j0} = \cdots = \lambda_{j,k_1-1} = 0 \quad \text{or} \quad \lambda_{j,k_1+1} = \cdots = \lambda_{jK} = 0.$$
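The two branching dichotomies just stated can be captured in a few lines; given the chosen index k1, each child subproblem is defined by the members forced to zero (a sketch in our own notation):

```python
def sos_branch_zeros(K, k1, sos_type):
    """Return (left_zeros, right_zeros): the index sets forced to zero in the
    two child subproblems for an S1 or S2 set with members 0..K."""
    if sos_type == 1:
        left_zeros = list(range(0, k1 + 1))   # lambda_0 .. lambda_k1 = 0
    else:                                     # S2: member k1 stays available
        left_zeros = list(range(0, k1))       # lambda_0 .. lambda_{k1-1} = 0
    right_zeros = list(range(k1 + 1, K + 1))  # lambda_{k1+1} .. lambda_K = 0
    return left_zeros, right_zeros
```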
The solution strategy then follows the broad pattern described in Section 2. There remains the further problem of choosing the value of $k_1$, and this and other details are discussed below. Before that, it seems useful to outline the other part of the original motivation for this development, which is that these concepts are useful in a wide range of applications.

In many applications of S1 sets, the variables are subject to the additional constraint
$$\sum_k \lambda_{jk} = 1, \tag{3.1}$$
and in these circumstances all members of the set must take integer values. But this is an incidental consequence of the constraint (3.1) and not part of the definition of S1 sets.

The primary motivation for S2 sets, and indeed for Special Ordered Sets as a whole, was to make it easier to find global optimum solutions to problems containing piecewise linear approximations to nonlinear functions of single arguments in otherwise linear, or integer, programming problems. We may want to introduce some function $a(t)$ into either a constraint or the objective function, where the argument $t$ is some linear function of other variables of the problem. If $a(t)$ is approximately piecewise linear between the points $t = T_0, T_1, \ldots, T_K$, then we proceed as follows: Define a set of nonnegative variables $\lambda_k$ for $k = 0, 1, \ldots, K$, and the constraints
$$\sum_k \lambda_k = 1, \qquad \sum_k T_k \lambda_k = t.$$
The nonlinear function $a(t)$ is then represented by the linear function
$$\sum_k a(T_k)\,\lambda_k,$$
provided that the $\lambda_k$ are treated as an S2 set. In practice, problems of this kind usually involve several functions of different arguments, which we may denote by $t_j$, for $j = 1, \ldots, J$. And for any argument $t_j$ we may have more than one nonlinear function. So we may write $a_{ij}(t_j)$ as the function of the jth argument occurring in the ith row of the problem.
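The single-function construction can be sketched as follows; the routine piecewise_rows and the NumPy representation are illustrative assumptions, not part of the paper.

```python
import numpy as np

def piecewise_rows(a, T):
    """Grid T[0..K]; weights lam_k >= 0 form an S2 set with sum(lam) = 1,
    t = sum(T[k]*lam[k]), and a(t) is approximated by sum(a(T[k])*lam[k])."""
    T = np.asarray(T, dtype=float)
    convexity_row = np.ones_like(T)              # coefficients of sum_k lam_k = 1
    reference_row = T                            # coefficients of sum_k T_k lam_k = t
    function_row = np.array([a(t) for t in T])   # coefficients of sum_k a(T_k) lam_k
    return convexity_row, reference_row, function_row

# e.g. rows approximating a(t) = t**2 on the grid 0, 1, ..., 5:
rows = piecewise_rows(lambda t: t * t, range(6))
```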
If all $a_{ij}(t_j)$ are approximately piecewise linear functions of $t_j$ between the points $t_j = T_{j0}, T_{j1}, \ldots, T_{jK_j}$, then we define J sets of nonnegative variables $\lambda_{jk}$ and the constraints (3.1) and
$$\sum_k T_{jk} \lambda_{jk} = t_j, \tag{3.2}$$
where $t_j$ is replaced by the linear function defining it if it is not an original variable of the problem. Each nonlinear function $a_{ij}(t_j)$ is then represented by the linear function
$$\sum_k a_{ij}(T_{jk})\,\lambda_{jk},$$
provided that, for each j, the $\lambda_{jk}$ form an S2 set. This approach was introduced by Miller [13]. He called the formulation Separable Programming, and presented a simple modification of the simplex method for finding a local optimum solution. Various methods have been proposed for adding further constraints and integer variables to find global optima. But it is simpler and more efficient to operate directly on the λ-variables as S2 sets.

Beale and Forrest [1] have reviewed the operation of Special Ordered Sets in the light of developments in Branch and Bound methods described in Section 2 of this paper, and also extended them to general functions of single arguments with continuous derivatives. This extension is straightforward in principle. We simply start with a finite grid of points, and interpolate where necessary to improve the accuracy to which each nonlinear function is approximated. The details are now outlined.

We first need some terminology. We use the subscript i to refer to a row, and the subscript j to refer to a Special Ordered Set. The row (3.2) is known as the Reference Row for the jth Special Ordered Set. It is then convenient to define the columns in a set as functions of the reference row entry. So the variables in the jth set are denoted by $x_j(t_j)$. Such a variable corresponds to the variable denoted earlier by $\lambda_{jk}$ if $t_j = T_{jk}$. In any linear programming subproblem, more than one member of the set may take nonzero values, so it is still convenient to use the subscript k to distinguish between different values of $t_j$. The optimum value of $x_j(t_{jk})$ in a subproblem is denoted by $x_j^*(t_{jk})$. The vector of coefficients of the variable $x_j(t_{jk})$ is denoted by $a_j(t_{jk})$, and its ith component, representing the coefficient in the ith row, is denoted by $a_{ij}(t_{jk})$. We also need to know an upper bound $b_{Sj}$ on the value of $\sum_k x_j(t_{jk})$. Most problems include a row of the form (3.1) for each set, and then $b_{Sj} = 1$.

We now define the average reference row entry for the jth set by the formula
$$\bar{t}_j = \sum_k t_{jk}\, x_j^*(t_{jk}) \Big/ \sum_k x_j^*(t_{jk}).$$
We may also define the end points of the current interval, $t_j^-$ and $t_j^+$, as the values of $t_j$ for which $x_j(t_j)$ are consecutive members of the set such that
$$t_j^- \le \bar{t}_j < t_j^+.$$
In practice this is only important for S1 sets, since if an S2 set contains only a finite number of members we can conceptually add further members by linear interpolation.

Now corresponding to any linear programming subproblem we can assess the contribution of the variables in the jth set in three ways. There is the vector $a_{Aj}$ of actual contributions defined by
$$a_{Aj} = \sum_k x_j^*(t_{jk})\, a_j(t_{jk}).$$
For an S2 set, there is the vector $a_{Cj}$ of corrected contributions corresponding to the average reference row entry and the current sum of values of the members, which is defined by
$$a_{Cj} = \Bigl(\sum_k x_j^*(t_{jk})\Bigr) a_j(\bar{t}_j),$$
while for an S1 set we may define corrected contributions $a_{Cj}^-$ and $a_{Cj}^+$ corresponding to the two end points of the current interval by the formulae
$$a_{Cj}^- = \Bigl(\sum_k x_j^*(t_{jk})\Bigr) a_j(t_j^-) \quad \text{and} \quad a_{Cj}^+ = \Bigl(\sum_k x_j^*(t_{jk})\Bigr) a_j(t_j^+).$$

A third approach is relevant when we have to decide where to branch in the jth set. If $t_{Sj}$ and $t_{Ej}$ denote the smallest and largest values of $t_j$ for which $x_j^*(t_j) > 0$, then we define the vector $a_{Ij}(t_j)$ of interpolated coefficients corresponding to any $t_j$ between $t_{Sj}$ and $t_{Ej}$ by
$$a_{Ij}(t_j) = (1 - \theta)\, a_j(t_{Sj}) + \theta\, a_j(t_{Ej}),$$
where $\theta$ is defined by the equation
$$t_j = (1 - \theta)\, t_{Sj} + \theta\, t_{Ej}.$$
The discrepancy between $a_{Ij}(t_j)$ and $a_j(t_j)$ then indicates the extent to which the current linear programming subproblem misrepresents the consequences of giving the jth argument the value $t_j$.
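A small sketch of the discrepancy just defined, with a_of_t standing for an assumed routine returning the column $a_j(t)$ as a sequence:

```python
def interpolation_gap(a_of_t, t_S, t_E, t):
    """Column-wise discrepancy a_j(t) - a_Ij(t) between the true coefficients
    and their linear interpolation between the bracketing members."""
    theta = (t - t_S) / (t_E - t_S)
    a_interp = [(1.0 - theta) * lo + theta * hi
                for lo, hi in zip(a_of_t(t_S), a_of_t(t_E))]
    return [actual - interp for actual, interp in zip(a_of_t(t), a_interp)]
```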
As usual we let $\pi_i$ denote the shadow price on the ith row. Then if we denote the reduced cost of the variable $x_j(t_j)$ by $d_j(t_j)$, we have the equation
$$d_j(t_j) = \sum_i \pi_i a_{ij}(t_j). \tag{3.3}$$
In Section 2 we raised Questions Q1 to Q5, and we now consider how these are answered for Special Ordered Sets. Nothing more needs to be said about Q1 and Q2.

Question Q3 concerns the choice of integer variable or Special Ordered Set for branching. To answer it we must estimate the degradation D associated with satisfying the conditions of the jth set. For an S2 set we use the formulae (2.4) to (2.7) with $z_i$ defined by
$$z_i = a_{Cji} - a_{Aji}.$$
For an S1 set we take the smaller of the degradations associated with the alternative sets of values
$$z_i = a_{Cji}^- - a_{Aji} \quad \text{and} \quad z_i = a_{Cji}^+ - a_{Aji}.$$
If we decide to branch on a Special Ordered Set, we must next choose where to branch. To do this we define the function D,,(t,) as the value of D given by (2.4) to (2.7) where z, is defined by z, = a,,( t, ) - a,,, ( t, ).
For computational convenience we may set r,,, = a in (2.6) and (2.7) for this calculation. If we assume that we do not really know where the optimum value of f, lies between t,, and tE,, it is natural to branch at a value of tf" of t, that maximizes D,,(t,); since this maximizes the benefit from the division into two subproblems if this is measured by the average reduction in D,,(t,) over the current range of possible values of t,. If interpolation is not permitted, then DI,(C,) is evaluated at all points t, = T kbetween ts, and tEl. If the set is an S1 set we then branch so as to exclude an interval with one end point at t," and the other end point o n the same side of rf" as If interpolation is permitted, then t," may be taken at any local maximum of D,,(t,)for which this quantity is strictly positive. Question Q4, concerning the bound on the value of a subproblem, is more serious for Special Ordered Sets representing continuous functions, since the value of ul) in the solution to a linear programming subproblem based on a finite set of members of each set does not necessarily represent a valid upper bound. But, as in Dantzig-Wolfe decomposition, an upper bound derived from Lagrangian relaxation is given by the formula
$$u = u_0 + \sum_j \max_{t_j} \{-d_j(t_j)\}\, b_{Sj}. \qquad (3.4)$$
We may be able to reduce this bound by trying to attain it. So we extend the
linear programming subproblem by adding that member $x_j(t_j)$ of the set having the most negative reduced cost $d_j(t_j)$, provided that the quantity $d_j(t_j) \cdot b_{Sj}$ is less than some (negative) threshold value. This part of the process requires a one-dimensional global minimization which is not always trivial. A method for this is described in the Appendix to Beale and Forrest [1]. It may also be useful to add the vector $x_j(\bar t_j)$ to the linear programming subproblem, since for some problems the average reference row entry is constrained by other parts of the model. This process of adding variables to the linear programming subproblem continues until either the bound falls below $x_{0C}$ or until $d_j(t_j) \cdot b_{Sj}$ is greater than or equal to the threshold for all variables in all sets.
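To make the column-generation loop concrete, here is a minimal Python sketch of the bound (3.4) and of the rule for adding new set members. It is only an illustration of the procedure described above, not anyone's production code: the finite candidate lists, the reduced_cost callback and the add_column method of the lp object are all our assumptions, and the continuous global minimization is approximated by a finite grid.

def bound_34(u0, sets, reduced_cost, b_S):
    """Bound (3.4): u = u0 + sum_j max_t(-d_j(t)) * b_Sj.

    sets[j] is a finite sample of the members t of set j (an assumption:
    the true maximization is over a continuum)."""
    u = u0
    for j, points in sets.items():
        worst = max(-reduced_cost(j, t) for t in points)
        u += worst * b_S[j]
    return u

def add_promising_members(lp, sets, reduced_cost, b_S, threshold=-1e-6):
    """Add, for each set, the member with the most negative reduced cost,
    provided d_j(t_j) * b_Sj is below the (negative) threshold."""
    added = []
    for j, points in sets.items():
        t_best = min(points, key=lambda t: reduced_cost(j, t))
        if reduced_cost(j, t_best) * b_S[j] < threshold:
            lp.add_column(j, t_best)   # hypothetical LP interface
            added.append((j, t_best))
    return added

In an actual system the two steps would alternate, as in the text: re-solve the extended subproblem, recompute the bound, and stop when the bound falls below the cut-off or no member passes the threshold test.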
4. More general nonlinear functions

Almost any nonlinear function can be approximated by a polynomial, or by some other sum of products of nonlinear functions of single arguments. The methods described so far can handle both sums and nonlinear functions of single arguments. But a product is the simplest example of a nonlinear function of more than one argument, and therefore needs further attention. In principle, general product terms can be handled using logarithms and Special Ordered Sets, since
$$\ln \prod_j x_j = \sum_j \xi_j$$
if $\xi_j = \ln x_j$. But this approach fails unless the variables are all strictly positive, and it often causes difficulties unless the values of all the variables are known a priori to within about one order of magnitude.

Arbitrary quadratic functions, including simple product terms of two variables, can be represented as sums and differences of squares of linear functions. This is a particularly appropriate formulation when the nonlinearities are quadratic functions occurring in either the objective function or a single constraint. But in other circumstances this approach may produce an excessive number of Special Ordered Sets needing Branch and Bound treatment.

In any programming problem we can classify the variables as nonlinear or linear, in such a way that if the values of all the nonlinear variables are fixed the problem reduces to a simple linear, or integer, programming problem. For some problems all the variables must be classified as nonlinear in this sense, but whether or not this is so there is often only a moderate number of nonlinear variables. And if the coefficients of the linear variables and the terms independent of the linear variables can all be expressed as sums of products of nonlinear functions of individual nonlinear variables, then we can find a global optimum using a new technique called Linked Ordered Sets, in which the only Branch and Bound operations are on the lower and upper bounds on the nonlinear variables.
We may think of the nonlinear variables $y_j$ as having sequence numbers, with the more important variables having the lower sequence numbers. We are then interested in representing simple nonlinear functions of $y_j$, which we can do with Special Ordered Sets, and products of the form $x_u f_u(y_j)$, where $x_u$ is either a linear variable or a product of functions of nonlinear variables with higher sequence numbers and (perhaps) a linear variable. Suppose that $XMIN_u$ and $XMAX_u$ represent the minimum and maximum possible values of $x_u$, and suppose for the moment that $y_j$ can take only $K+1$ possible values $y_{jk}$ for $k = 0, 1, \ldots, K$, where
$$y_{jk} < y_{j,k+1}.$$
Then we may introduce $2(K+1)$ nonnegative variables $\lambda_{juvk}$, for $v = 1, 2$ and $k = 0, 1, \ldots, K$, satisfying the constraints (4.1) to (4.3). The quantity $\lambda_{ju1k}$ here represents a weight associated with the possibility that $x_u = XMIN_u$ and $y_j = y_{jk}$. Similarly $\lambda_{ju2k}$ represents a weight associated with the possibility that $x_u = XMAX_u$ and $y_j = y_{jk}$. If the function $x_u f_u(y_j)$ were a linear function of $x_u$ and $y_j$, we could therefore replace it by the linear function
$$\sum_k \big( XMIN_u f_u(y_{jk})\, \lambda_{ju1k} + XMAX_u f_u(y_{jk})\, \lambda_{ju2k} \big).$$
In general this is not valid. But if we impose the further condition that $\lambda_{juvk} = 0$ whenever $k \ne k_0$, for all $u$ and $v$, then we fix $y_j$ to equal $y_{jk_0}$, and the function becomes a linear function of $x_u$. The linear expression in terms of the $\lambda_{juvk}$ therefore becomes valid. And the results of solving linear programming subproblems when the $\lambda_{juvk}$ are not restricted in this way provide upper bounds on the possible value of the objective function when $y_j$ has not been restricted to a single value. We therefore treat the sets of variables $\lambda_{juvk}$ as Linked S1 Sets, and carry out Branch and Bound operations on the first and last values of $k$ for which the variables may take nonzero values for all $u$ and $v$ simultaneously. If any Linked Ordered Sets are required for the nonlinear variable $y_j$, then we can represent any simple function $g(y_j)$ by the linear function
$$\sum_v \sum_k g(y_{jk})\, \lambda_{juvk}.$$
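The following hypothetical Python fragment illustrates the linearization just described: each grid value $y_{jk}$ receives two weights, one priced at $XMIN_u f_u(y_{jk})$ and one at $XMAX_u f_u(y_{jk})$. The function and variable names are ours, and the sketch builds only the column coefficients, not the full LP with rows (4.1) to (4.3).

def linked_set_columns(f_u, grid, xmin_u, xmax_u):
    """Coefficients of the 2(K+1) weights replacing x_u * f_u(y_j).

    grid   -- the K+1 possible values y_j0 < ... < y_jK
    f_u    -- the nonlinear single-argument function
    xmin_u, xmax_u -- bounds on the multiplying variable x_u
    """
    cols = []
    for k, y in enumerate(grid):
        cols.append(("lam_1_%d" % k, xmin_u * f_u(y)))  # weight for x_u = XMIN_u
        cols.append(("lam_2_%d" % k, xmax_u * f_u(y)))  # weight for x_u = XMAX_u
    return cols

# Example: the linear stand-in for x * y**2 with 0 <= x <= 10, y in {0,1,2,3}.
print(linked_set_columns(lambda y: y * y, [0, 1, 2, 3], 0.0, 10.0))

Once branching has forced all but one grid index to zero, only the pair lam_1_k0, lam_2_k0 survives and the expression is exactly linear in $x_u$, as the text explains.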
If the set of possible values of $y_j$ is not finite, then we must interpolate, as with Special Ordered Sets. The theory that follows is very similar to that in Section 3. It is again convenient to define the columns in terms of their entries in the
Reference Rows (4.2). So the variables are denoted by $x_{juv}(y_j)$, corresponding to the variables denoted earlier by $\lambda_{juvk}$ if $y_j = y_{jk}$. Since several of these variables may take nonzero values in any linear programming subproblem, we still use the subscript $k$ to distinguish between different values of $y_j$. The optimum value of $x_{juv}(y_{jk})$ in a subproblem is denoted by $x^0_{juv}(y_{jk})$, the vector of its coefficients in the constraints is denoted by $a_{juv}(y_{jk})$, and the ith component of this vector is denoted by $a_{ijuv}(y_{jk})$. We need an upper bound $b_{LSju}$ on the value of $\sum_k x_{juv}(y_{jk})$. If the Row (4.1) exists then $b_{LSju} = 1$. The average reference row entry $\bar y_j$ for any pair of sets is defined by the formula
$$\bar y_j = \sum_k y_{jk}\, x^0_{juv}(y_{jk}) \Big/ \sum_k x^0_{juv}(y_{jk}). \qquad (4.4)$$
Since (4.2) takes the same form for each $v$, the value of $\bar y_j$ will not depend on which value of $v$ is taken when evaluating the right hand side of (4.4). For each nonlinear variable we define the end points of the current interval $y_j^-$ and $y_j^+$ as consecutive members of the set of possible values of $y_j$ such that
$$y_j^- \le \bar y_j < y_j^+.$$
And for each pair of sets we define corrected contributions $a_{Cju}^-$ and $a_{Cju}^+$ corresponding to these two end points.
Again, to decide where to branch for the jth nonlinear variable we may define a quantity $a_{Iju}(y_j)$ similar to $a_{Ij}(t_j)$ for Special Ordered Sets. But this is rather more complicated since we need to consider the sets for both values of $v$ simultaneously. For each pair of sets we define $y_{Sju}$ and $y_{Eju}$ as the smallest and largest values of $y_j$ for which $x^0_{juv}(y_j)$ is positive for either value of $v$. We also associate a value of $v$ with each of these values of $y_j$ by defining $s$ and $e$ in such a way that
$$x^0_{jus}(y_{Sju}) > 0 \quad\text{and}\quad x^0_{jue}(y_{Eju}) > 0,$$
choosing $s$ and $e$ to be different from each other if possible. We then define
$$a_{Iju}(y_j) = (1-\theta)\, a_{jus}(y_{Sju}) + \theta\, a_{jue}(y_{Eju}),$$
where $\theta$ is defined by the equation
$$y_j = (1-\theta)\, y_{Sju} + \theta\, y_{Eju}.$$
The quantities $a_{DIju}(y_j)$, defined by
$$a_{DIju}(y_j) = (1-\theta)\big(a_{jus}(y_j) - a_{jus}(y_{Sju})\big) + \theta\big(a_{jue}(y_j) - a_{jue}(y_{Eju})\big),$$
can then be used as indications of the extent to which the current linear programming subproblem misrepresents the consequences of giving the jth nonlinear variable the value $y_j$. Note that $a_{DIju}(y_j) = 0$ if $y_j = y_{Sju}$ or $y_{Eju}$, and we define it to be zero for all $y_j$ outside the range $y_{Sju} < y_j < y_{Eju}$.

Again, if $\pi_i$ denotes the shadow price on the ith row, the reduced cost of the variable $x_{juv}(y_j)$ is $d_{juv}(y_j)$, defined by
$$d_{juv}(y_j) = \sum_i \pi_i\, a_{ijuv}(y_j).$$
The estimated degradation $D$ associated with satisfying the conditions for the jth nonlinear variable is then defined by the formulae (2.4) to (2.7), with $z_j$ defined by
$$z_j = \sum_u (a_{Cju} - a_{Aju}),$$
or for Linked S1 sets by taking the smaller of the degradations associated with the alternative sets of values
$$z_j = \sum_u (a_{Cju}^- - a_{Aju}) \quad\text{and}\quad z_j = \sum_u (a_{Cju}^+ - a_{Aju}).$$
If we decide to branch on a nonlinear variable we must next choose where to branch. To do this we define the function $D_{NIj}(y_j)$ as the value of $D$ given by (2.4) to (2.7) when $z_j$ is defined by
$$z_j = \sum_u a_{DIju}(y_j).$$
For computational purposes we may set $r_{Dj} = \infty$ in (2.6) and (2.7) for this calculation. We may then branch at a value $y_j^*$ of $y_j$ that maximizes $D_{NIj}(y_j)$. If there is only a finite number of possible values of $y_j$, then we select $y_j^*$ from these values, branching so as to exclude an interval with one end point at $y_j^*$ and the other end point on the same side of $y_j^*$ as $\bar y_j$. If interpolation is permitted, then $y_j^*$ may be taken at any local maximum of $D_{NIj}(y_j)$ for which this quantity is strictly positive.
Finally, the extension of (3.4) to cover both Special Ordered Sets and Linked Ordered Sets is
$$u = u_0 + \sum_j \max_{t_j} \{-d_j(t_j)\}\, b_{Sj} + \sum_{j,u} \max_{v,\, y_j} \{-d_{juv}(y_j)\}\, b_{LSju}.$$
Acknowledgement

The concepts of Linked Ordered Sets arose from joint work by Gulf Management Sciences in London and Scicon Computer Services Ltd.
References

[1] E.M.L. Beale and J.J.H. Forrest, Global optimization using special ordered sets, Math. Programming 10 (1976) 52-69.
[2] E.M.L. Beale and J.A. Tomlin, Special facilities in a general mathematical programming system for non-convex problems using ordered sets of variables, in: J. Lawrence, ed., Proceedings of the Fifth International Conference on Operational Research (Tavistock Publications, London, 1970) 447-454.
[3] M. Benichou, J.M. Gauthier, P. Girodet, G. Hentges, G. Ribière and O. Vincent, Experiments in mixed integer linear programming, Math. Programming 1 (1971) 76-94.
[4] M. Benichou, J.M. Gauthier, G. Hentges and G. Ribière, The efficient solution of large-scale linear programming problems - some algorithmic techniques and computational results, Math. Programming 13 (1977) 280-322.
[5] A.L. Brearley, G. Mitra and H.P. Williams, Analysis of mathematical programming problems prior to applying the simplex method, Math. Programming 8 (1975) 54-83.
[6] R. Breu and C.A. Burdet, Branch and bound experiments in zero-one programming, Math. Programming Study 2 (1974) 1-50.
[7] N.J. Driebeek, An algorithm for the solution of mixed integer programming problems, Management Sci. 12 (1966) 576-587.
[8] J.J.H. Forrest, J.P.H. Hirst and J.A. Tomlin, Practical solution of large mixed integer programming problems with UMPIRE, Management Sci. 20 (1974) 736-773.
[9] A.M. Geoffrion, Lagrangian relaxation for integer programming, Math. Programming Study 2 (1974) 82-114.
[10] R.E. Gomory, Outline of an algorithm for integer solutions to linear programs, Bull. Amer. Math. Soc. 64 (1958) 275-278.
[11] A.H. Land and A.G. Doig, An automatic method of solving discrete programming problems, Econometrica 28 (1960) 497-520.
[12] J.D.C. Little, K.G. Murty, D.W. Sweeney and C. Karel, An algorithm for the traveling salesman problem, Operations Res. 11 (1963) 972-989.
[13] C.E. Miller, The simplex method for local separable programming, in: R.L. Graves and P. Wolfe, eds., Recent Advances in Mathematical Programming (McGraw-Hill, New York, 1963) 69-100.
[14] G. Mitra, Investigation of some branch and bound strategies for the solution of mixed integer linear programs, Math. Programming 4 (1973) 155-170.
[15] J.A. Tomlin, An improved branch and bound method for integer programming, Operations Res. 19 (1971) 1070-1075.
Annals of Discrete Mathematics 5 (1979) 221-269
© North-Holland Publishing Company
COMPUTER CODES FOR PROBLEMS OF INTEGER PROGRAMMING*

A. LAND and S. POWELL
London School of Economics, England
Introduction

During the 1970s there has been a very rapid development of integer programming capability by organizations offering mathematical programming systems. Unfortunately, the degree of standardization of input and of terminology which had been attained for LP has suffered in this rapid development. We see the main purpose of our survey as providing a "consumer research" report on the different products. We have even attempted a modest amount of product testing, reported in the Appendix.

The term "integer programming" covers a wide spectrum of models, which we can characterize by mixed integer programming (MIP) at one end, and combinatorial programming at the other end. It is also broadly true to say that the interest of those working in commercial organisations is currently focussed very much at the MIP end of the spectrum, indeed on problems which are basically large LP systems with relatively few integer variables, whilst academic researchers are in hot pursuit of methods to solve pure integer problems, frequently with special combinatorial structures. Thus in reading this as a consumer report one has to bear in mind which consumers are intended for each code.

We have tried to gather information about all codes which are available (usually at a cost) to any user. However, we have chosen to give a full report only on those which are maintained. We considered applying a second criterion: that the code should be capable of obtaining a guaranteed optimum solution. A large and complex problem may not be capable of yielding an optimum integer solution within feasible cost and time limits on any code, so that the user has in fact to be content with a "good" solution obtained by heuristic methods. In that case, one ought to make a comparative test of an "optimizing" branch and bound method against codes which do not purport to optimize, but only to provide good solutions. However, we have not done this, although we have included some heuristic codes.

A more agonizing decision has been to exclude some rather widely used "special purpose" codes. We wanted at first to include only codes to solve ILP and MIP problems without restrictions on signs of coefficients or of constraints.

* This research has been supported by the U.K. Science Research Council.
However, we felt it was not much of a relaxation to include "multi-constraint knapsack" algorithms. Depot location, however, we felt was going outside our brief. Thus the first part of our paper is a survey of eleven commercial MIP codes (listed in Table 1). It will be appreciated that we have also had to draw a line in time. Some of the codes are now elderly and perhaps soon to be retired. Some firms would have preferred us to report on their about-to-be-announced new codes or revisions. However, those listed were all available for use in mid-1977. The second part of our paper is a summary survey of what we may call non-commercial, or academic, codes. We cannot claim to be comprehensive: we simply hope to draw the attention of a user to codes which may serve his purposes.
I. COMMERCIAL CODES

All of these codes are based on branch and bound methods, of which there are many excellent surveys [5, 12, 17, 37]. All start with an optimum continuous LP solution which constitutes a first node in the tree of solutions. In all but one of the codes (Haverley-MIP) the LP solves the problem with the simple bounds on the integer variables handled implicitly. There have been developments in the simplex algorithms of all of these systems in recent years, as well as in their MIP capabilities. Naturally, the more efficient the system for solving LP, the better the MIP system. However, we have chosen to ignore this aspect, and to concentrate purely on the strategy for finding integer solutions.

The branch and bound algorithm (after the first node has been generated) can be presented in its simplest form as follows (a schematic rendering is sketched below):
(1) Select a node from the tree which does not satisfy the integer conditions. If there are none, the algorithm is complete and the best integer solution found, the incumbent (if any), is the optimum.
(2) Choose a variable (or set, see below) which does not satisfy the integer conditions.
(3) Branch to (usually) two subproblems by restricting the chosen variable (or set) to exclude its value at the node and not exclude any feasible integer solutions. This may produce a better integer solution than any obtained so far, which becomes the incumbent. Otherwise the new subproblems enter the list of nodes if they appear possibly to lead to a better solution than the incumbent. Return to step 1.

It is a pity that the terminology established by Geoffrion and Marsten [17] has not been adopted by the producers of codes. Instead we find a dismaying mish-mash of terminology which, besides using different words for the same concept, also uses the same word for different concepts.
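The following is a schematic, runnable rendering of that loop. It is not a reproduction of any of the surveyed codes: the LP solver, the integrality test and the branching rule are passed in as placeholder callbacks of our own naming, and nodes are selected best-first by the parent's LP value.

import heapq
import math

def branch_and_bound(solve_lp, is_integer, branch, root, minimize=True):
    """Toy version of steps (1)-(3).

    solve_lp(node)    -> (value, solution), or None if infeasible (assumed)
    is_integer(sol)   -> True when the integer conditions hold (assumed)
    branch(node, sol) -> child nodes excluding the current LP point (assumed)
    """
    sign = 1.0 if minimize else -1.0
    incumbent_value, incumbent = math.inf, None   # stored in sign-adjusted form
    counter = 0
    heap = [(0.0, counter, root)]
    while heap:
        _, _, node = heapq.heappop(heap)          # step 1: select a node
        result = solve_lp(node)
        if result is None:
            continue                              # fathomed: LP infeasible
        value, sol = result
        if sign * value >= incumbent_value:
            continue                              # fathomed: cannot beat incumbent
        if is_integer(sol):
            incumbent_value, incumbent = sign * value, sol   # new incumbent
            continue
        for child in branch(node, sol):           # steps 2 and 3
            counter += 1
            heapq.heappush(heap, (sign * value, counter, child))
    if incumbent is None:
        return None, None
    return sign * incumbent_value, incumbent

Most of the design choices surveyed in this paper, such as node selection, branching variable, and cut-off, correspond to the points in this loop where a callback or an ordering is chosen.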
Our primary source of information about the codes has been the users' manuals [47-57]. On the whole, these are fairly adequate from the point of view of the user trying to solve his problem, but some of them are very uninformative about the actual procedure followed in the algorithm, and we have had to supplement them with a considerable number of questions. In some cases, we have not succeeded in discovering all the answers.
1. Codes

We have considered eleven maintained commercially available integer programming codes. Table 1 gives the name of the code, the name of the organization responsible, the computers for which the code is written, the approximate date of the first release and whether the code is still under development.
Table 1. The commercial codes

Code          Organisation      Computer                                Date   Under development
Apex III      Control Data      Cyber 70 series, models 72, 73, 74      1975   YES
                                and 76; Cyber 170; 6000 series;
                                7600 series
FMPS          Sperry-Univac     1100 series                             1976   YES
Haverley-MIP  Haverley Systems  IBM 360 and 370 series; Univac 90/30    1970   NO
LP400         ICL               System 4-70 and 4-50                    1970   NO
MIP           IBM               360 and 370 series                      1969   NO
MIP/370       IBM               370 series                              1974   YES
MPS           Honeywell         Series 60 (level 66)                    1973   YES
Ophelie       SIA               CDC 6600                                1972   NO
Sciconic      Scicon            Univac 1100 series                      1976   YES
Tempo         Burroughs         B7000 series; B6000 series              1975   NO
XDLA          ICL               1900 series                             1970   NO
2. Variables

All of these codes are MIP codes: that is to say, they can all handle problems in which some of the variables are continuous, and some (or all) are restricted to take integer values in one way or another. Table 2 shows which type of integer variable can be handled by each code (indicated by a ✓ entry).
Table 2. Types of variables

Code          Binary     Integer    Bivalent   Semi-continuous
              variables  variables  variables  variables
Apex III      ✓          ✓          ✓
FMPS          ✓          ✓
Haverley-MIP  ✓          ✓
LP400         ✓          ✓
MIP           ✓          ✓
MIP/370       ✓          ✓
MPS           ✓          ✓
Ophelie       ✓          ✓
Sciconic      ✓          ✓                     ✓
Tempo         ✓          ✓
XDLA          ✓          ✓
A binary variable can take values zero or one, and an integer variable is restricted to integer values between specified lower and upper bounds. In all the codes except MIP/370, MPS, Sciconic, Tempo, and XDLA the bounds on an integer variable must be specified by the user. In Ophelie and LP400 the specified lower bound must be non-negative. Apex III also allows bivalent variables, which must equal 0 or an upper bound, which need not be one. A semi-continuous variable is restricted to take the value zero or a value greater than or equal to one. These variables are available only in Sciconic.
3. Special ordered sets

This is a concept introduced by Beale and Tomlin in 1970 [3] and present in some form in 8 of the 11 codes. There are two types, S1 and S2. An S1 set is a set of variables such that at most one of the variables may take a non-zero value; an S2 set is a set such that at most two adjacent variables may take non-zero values. If the variables in the set have a natural order in the model (e.g., in terms of cost, or of capacity provided, or of physical location) it has proved valuable not to branch on the individual variables in the set, but rather to branch to exclude alternately one end of the set or the other. S1 sets are useful when a selection of one from a set of alternatives has to be made. S2 sets are useful where there are non-linear functions in the model, either in the objective function or in the constraints, and where separable programming techniques may yield local rather than global optima to the linear approximation.

A refinement in Sciconic, unlike any of the other codes, is that it offers an automatic interpolation of new elements into S2 sets (Beale and Forrest [2]) so that the user need only specify the coefficients (a, b and c) of the functions for
the following cases:
$$ax^2 + bx + c;\quad ax^5 + bx^4;\quad a\log_e x;\quad ae^x;\quad ax^b + c;\quad a(x+c)^b;\quad ax|x| + bx + c.$$
Sciconic also provides 'linked ordered sets' as described in E.M.L. Beale's paper appearing in this volume. It may be felt that we are here venturing into the field more of general non-linear programming than of integer programming. However, the emphasis is still on the non-convex situation of the type which is frequently modelled by linear approximation with integer variables.

Table 3 summarizes the variants of special ordered sets which we have identified. The terms 'group' and 'special set' are used in some codes, rather than special ordered set.
Table 3. Special ordered sets

Code          Set variables  Variables in      May have a  Must have a  Sets must     S2
              must add       set must be       reference   reference    be disjoint?  sets
              to one?        consecutive?      row?        row?
Apex III      No             Yes               No          No           Yes           Yes
FMPS          Yes            No                No          No           Yes           No
Haverley-MIP  no special set facility
LP400         Yes            Yes               Yes         No           Yes           No
MIP           no special set facility
MIP/370       Yes            Yes               Yes         No           Yes           No
MPS           Yes            Yes               No          No           Yes           No
Ophelie       no special set facility
Sciconic      No             Yes               Yes         Yes          Yes           Yes
Tempo         Yes            No                No          No           No            No
XDLA          Yes            Yes               Yes         No           Yes           Yes
Beale and Tomlin defined a constraint in the model as a 'reference row' associated with a set which is used to determine the branch point in the set (v. Section 4.4), and in Sciconic this row must be specified. In the codes where a reference row need not (or cannot) be supplied, the branching is done by assigning weights to the variables in some other way (v. Table 8 in Section 4.4). In FMPS sets are treated differently in that branching is not done by fixing
either one end or the other of the set to zero, but rather by selecting one of the variables to be one. Probably more by the historical accident of trying to introduce sets with minimal changes to data input, than by design, several of the codes require that the sets consist of disjoint sets of consecutive variables. Tempo alone is not restricted in this way, since the sets are not specified by the user but by an analysis of the model by the program. Since Benichou, Gauthier, Hentges, and Ribière [6] reported that overlapping sets can accelerate the algorithm for some problems, perhaps we shall see this facility in the next generation of codes.
4. Branch and bound

The eleven codes solve mixed integer or pure integer programming problems by a branch and bound algorithm. The method can be represented as a tree search where each node of the tree represents a subproblem which may be unsolved, partially solved or solved. The value of the objective function of a solved subproblem at node k is the functional value of node k, $x_0^k$, and the functional value of the initial linear program with all the integer constraints relaxed is $x_0^1$. The value of the best integer solution found so far, the incumbent, is denoted by $x_I$.

At each node there is a bound (upper if maximizing, lower if minimizing) on the functional value of an integer solution that may be found from that node. This bound, the value of the node, is usually the functional value of the node; but if a code is using penalties (Tomlin [42]), then the value may be the functional value ± the minimum penalty. From here onwards, the symbol ± is to be understood as + when minimizing and − when maximizing. In many codes there is also at each node an estimate of the value of an integer solution that may be derived from that node. The different methods used for computing an estimate are described in Section 4.5.3.

An integer variable, $x_j$, with value $\bar x_j$, is integer infeasible if $f_j > 0$, where $f_j$ is the fractional part of $\bar x_j$. The integer infeasibility of the variable j is $\min(f_j, 1-f_j)$. The integer infeasibility of a set is discussed in Section 4.4 below.

A node is said to be fathomed if the continuous subproblem is infeasible, or its value is worse than a cut-off, $x_C$, or the optimal solution of the continuous subproblem is integer feasible. A node is said to have been developed if it has been used as a branching node; a node that is neither fathomed nor developed is an active node. The process of branching consists of taking a solved but integer infeasible subproblem and from it creating in general two new unsolved subproblems, often called the 'up branch' and the 'down branch'. Thus branching consists of taking a suitable active node and developing it to create two new active nodes. At the stage when a subproblem is solved, it may be possible, or considered
advisable, to only partially solve the subproblem, that is to truncate the node. The maximum (when maximizing, minimum when minimizing) value of the active nodes is the co-bound. Thus during the search the value of the optimal integer solution lies between the value of the incumbent and the co-bound.

Within the general framework of the algorithm, there are many areas of choice. These may be summarised by a series of questions which we shall consider in turn.
1. What is the nature of a node?
2. What is the method of branching?
3. Which variable or set is branched upon?
4. Which variable within a set is branched upon?
5. Which node is developed?
6. Which nodes can be truncated?
7. How are the continuous LP problems solved?
8. What is the value of the cut-off?
4.1. The nature of a node
At any stage in the computation, all of the codes have a list of active nodes which may have to be considered at a later stage. An active node may be a solved subproblem, a partially solved subproblem or an unsolved subproblem - that is to say, it consists of a linear program with added constraints (tighter bounds on the variables) which have not yet been satisfied. Those codes whose node list is unsolved subproblems solve a subproblem and then immediately decide upon the branching variable, so creating two new unsolved subproblems. Those codes whose node list is solved subproblems decide upon the branching variable having selected the node for further development. The creation of partially solved subproblems is explained under the heading truncation (v. Section 4.6). Table 4 shows the node structure used by each code.

4.2. Method of branching

All eleven codes perform branching on a variable or set that does not satisfy the integer or set conditions. In Apex III only, a variable or set which is integer feasible may be considered as the branching variable if the appropriate switch is set by the user. For a binary variable with bounds of 0 and 1 and whose value lies between its bounds, branching means that on one branch the upper bound is replaced by 0 and on the other the lower bound is replaced by 1. Branching on an integer variable, whose values are currently restricted to lie between $a$ and $b$, and whose value is $\bar x_j$, means that on one branch (the down branch) $x_j$ is restricted to lie between $a$ and $[\bar x_j]$, and on the other branch (the up branch) $x_j$ is restricted to lie between $[\bar x_j]+1$ and $b$, where $[\bar x_j]$ is the integer below $\bar x_j$. Branching on an S1 set, say the variables $\lambda_1, \ldots, \lambda_p$, means that a variable $\lambda_k$ is chosen as the branch point. On one branch the variables $\lambda_1, \ldots, \lambda_k$ are allowed to
Table 4. Type of node in the node list

Code            Unsolved   Partially solved   Solved
Apex III        ✓                             ✓
FMPS            ✓
Haverley-MIP^a  ✓                             ✓
LP400                                         ✓
MIP                                           ✓
MIP/370                    ✓                  ✓
MPS             ✓
Ophelie         ✓
Sciconic                                      ✓
Tempo                      ✓                  ✓
XDLA^a          ✓                             ✓

^a In Haverley-MIP and XDLA there is some user control over the type of node in the list (v. Section 4.5.1).
take non-zero values and the variables $\lambda_{k+1} = \cdots = \lambda_p = 0$; and on the other branch the variables $\lambda_1 = \cdots = \lambda_k = 0$, and the variables $\lambda_{k+1}, \ldots, \lambda_p$ are allowed to take non-zero values. For an S2 set, branching means the same except that the branch point variable, $\lambda_k$, is allowed to be non-zero in both branches.

The exception to dichotomous branching is branching on a set in FMPS. The user has the choice either of setting each set variable to one, so that there are as many branches from the node as there are variables in a set, or of setting a single set variable to one, so that there is only one new node generated, or of setting only the set variables with non-integral values to one. In the latter two cases a flag on the node records that there remain other variables to be set to one at a later stage. A node is only considered to be developed when all the set variables have been set equal to one.
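The dichotomies just described can be stated compactly in code. The sketch below is a toy illustration under our own naming, not taken from any of the codes: it returns the bound pairs for an integer variable and the index sets forced to zero on each branch of an S1 set.

import math

def branch_integer(a, b, xbar):
    """Return the (down, up) bound pairs for branching on x with LP value xbar."""
    floor_x = math.floor(xbar)
    down = (a, floor_x)        # a <= x <= [xbar]
    up = (floor_x + 1, b)      # [xbar]+1 <= x <= b
    return down, up

def branch_s1_set(p, k):
    """0-indexed index sets of the variables forced to zero on each branch
    of an S1 set of p variables with branch point index k.  For an S2 set the
    second zero set would stop at k-1, leaving the branch point free."""
    return list(range(k + 1, p)), list(range(0, k + 1))

print(branch_integer(0, 10, 3.7))   # ((0, 3), (4, 10))
print(branch_s1_set(5, 2))          # ([3, 4], [0, 1, 2])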
4.3. Which variable or set is branched upon?

There is a wide variety of rules used for the choice of variable or set to 'arbitrate', or branch upon, and most of the codes offer the users a choice. Sections 4.3.1 to 4.3.5 below are concerned with the choice of a variable or set, which does not satisfy the integer or set conditions, upon which to branch. Table 5 summarizes the choice available to each code. In the case of sets, some assumption has to be made as to the choice of branching point within the set before the decision can be made as to which variable or set is to be chosen. Thus the full discussion for sets is postponed till Section 4.4.
4.3.1. Priorities

Experience has shown (e.g. [37]) that branching upon variables in order of their importance, in some sense, can accelerate the progress of the algorithm. All the
Table 5. Choice of variable or set to branch upon

Code          Priorities  Penalties  Penalties for     Pseudo-  Integer        Pseudo-shadow
                                     forced moves of   costs    infeasibility  costs
                                     non-basic
                                     variables
Apex III      ✓           ✓          ✓                          ✓
FMPS          ✓
Haverley-MIP  ✓                                                 ✓
LP400                     ✓          ✓
MIP           ✓                                        ✓
MIP/370       ✓                                        ✓
MPS           ✓           ✓                            ✓        ✓
Ophelie       ✓                                                 ✓
Sciconic      ✓                      ✓                                         ✓
Tempo         ✓           ✓          ✓                 ✓
XDLA          ✓           ✓          ✓                          ✓
codes except LP400 permit the user to define a priority list. For this purpose a set is treated as an entity, so that the priority list is a list of the user's assessment of the relative importance of the integer variables and sets to his problem. In Apex III, Haverley-MIP, Ophelie, Sciconic and XDLA a priority list can be formed only from a user specification. In XDLA the specification of a priority list includes the preferred branching direction for each variable and set in the list. FMPS and MPS, in the absence of a user specified order, assume that the input order of the variables, the sets, and the set variables (in FMPS only) is the priority order, in decreasing order of importance. MIP/370 has four methods of generating a priority list:
(a) User specified.
(b) Input order, in decreasing order of importance.
(c) In decreasing order of the absolute values of the cost coefficients. (The cost coefficient of a set is taken as the average of the absolute values of the cost coefficients of the set variables.)
(d) An order that is built during the tree search using pseudo-costs (v. Section 4.3.3).
In the absence of any user specification and depending upon the problem structure (v. Section 5) either the cost ordering or the pseudo-cost ordering is used as a default priority list. MIP has the three methods (a), (b) and (c) of MIP/370 for generating a priority list, and the input order is used as a default priority list. Tempo has three methods of generating a priority list, (a) and (b) as in MIP/370, and the third order is in decreasing order of the cost coefficients.

When the user specifies a priority list it is not necessary, except in FMPS, for a priority to be assigned to all variables and sets. In only four codes, Apex III, MPS, Sciconic and XDLA, is it possible to assign the same priority to more than one set
and/or integer variable. The variables and sets without a priority are selected for branching only if there is no variable or set with an assigned priority eligible for branching. In all codes except MIP and MIP/370 the choice of branching variable or set among those with equal or no priority is made by the usual branching rules available to each code. In MIP the input ordering, and in MIP/370 the cost ordering, is used as a priority order to choose among those variables that have no user specified priority. Table 6 shows which codes use which method for forming a priority list, and (by *) which is the default method.

Table 6. Method of generating priority list

Code            User       Input  Cost   Pseudo-cost
                specified  order  order  order
Apex III        *
FMPS            ✓          *
Haverley-MIP    *
LP400           no priorities
MIP             ✓          *      ✓
MIP/370-PURE^a  ✓          ✓      *      ✓
MIP/370-MIP^a   ✓          ✓      ✓      *
MPS             ✓          *
Ophelie         *
Sciconic        *
Tempo           ✓          *      ✓
XDLA            *

^a See Section 5.
4.3.2. Penalties

Penalties, which have been described by Driebeek [10] and Tomlin [42], can be used by five codes, Apex III, LP400, MPS, Tempo and XDLA, to select the branching variable or set. Penalties have the advantage that they provide guaranteed bounds on the function from branching in each direction on a variable. However, they are costly to compute, and only provide information on the function value very close to the current solution. Particularly in cases where many (large) LP basis changes have to be made to completely solve the subproblem, they may be very uninformative about the final outcome of arbitrating the variable, and they are falling out of favour.

If penalties are being used, up and down penalties, $p_{Uj}$ and $p_{Dj}$, are calculated for each variable or set, j, that is being considered for branching, and the kth variable is chosen for branching according to the value of $\theta_j$. In both Apex III and XDLA there is one option using penalties:
$$\theta_k = \max_j \theta_j, \quad\text{where } \theta_j = \max(p_{Uj}, p_{Dj}).$$
In MPS there are four options:
(a) $\theta_k = \max_j \theta_j$, where $\theta_j = |p_{Uj} - p_{Dj}|$; solve $\min(p_{Uk}, p_{Dk})$;
(b) $\theta_k = \max_j \theta_j$, where $\theta_j = \max(p_{Uj}, p_{Dj})$; solve $\max(p_{Uk}, p_{Dk})$;
(c) $\theta_k = \max_j \theta_j$, where $\theta_j = \min(p_{Uj}, p_{Dj})$; solve $\min(p_{Uk}, p_{Dk})$;
(d) $\theta_k = \max_j \theta_j$, where $\theta_j = \max(p_{Uj}, p_{Dj})$; solve $\min(p_{Uk}, p_{Dk})$.
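As an illustration, the following sketch scores candidates in the spirit of options (a) to (d). It is our own schematic, not MPS code: the penalties are supplied ready-made, and the returned label only records which of the two subproblems the rule says to solve first.

def choose_by_penalties(penalties, rule="b"):
    """penalties: {var: (p_up, p_down)}; returns (branch_var, child_to_solve)."""
    def score(pu, pd):
        if rule == "a":
            return abs(pu - pd)
        if rule == "c":
            return min(pu, pd)
        return max(pu, pd)                      # rules (b) and (d)
    k, (pu, pd) = max(penalties.items(), key=lambda it: score(*it[1]))
    if rule == "b":
        child = "up" if pu >= pd else "down"    # solve max(p_Uk, p_Dk)
    else:
        child = "up" if pu <= pd else "down"    # solve min(p_Uk, p_Dk)
    return k, child

print(choose_by_penalties({"x1": (0.4, 2.5), "x2": (1.0, 1.1)}, rule="b"))
# -> ('x1', 'down'): x1 has the largest max-penalty, and its larger penalty
#    is on the down branch.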
In Tempo, for each variable selected as a possible candidate for branching (v. Section 4.3.3), up and down penalties are computed. Then, for each such variable, the updated columns of the variables that would enter the basis in order to drive the variable in either direction are computed. It is then possible to count the number of infeasibilities that would result if a basis change were made. The 'up and down numbers of infeasibilities', $I_U$ and $I_D$, are a measure of the difficulty of fixing a variable. The choice of branching variable $x_k$ is
$$\theta_k = \max_j \theta_j, \quad\text{where } \theta_j = \min(p_{Dj} I_D, p_{Uj} I_U).$$
We have not been able to discover the details of the use of penalties in LP400.

Penalty calculations for basic integer variables and sets can take a substantial amount of time if there are many such variables. Apex III enables the user to specify the maximum number of basic variables for which penalties are to be calculated. Because the number of variables available for consideration may exceed this limit, the user has to specify whether variables close to an integer are to be preferred.

Using the up and down penalties associated with each variable, whether basic or non-basic, one can test whether it is worthwhile to branch in either direction. If it can be shown that both branches must lead to a worse solution than the cut-off value then the node needs no further development. If it can be shown that only one branch must lead to a worse solution then a forced move in the opposite direction can be made. If several variables have these large penalties, then several forced moves can be made at once. In Apex III the user may choose whether or not he wishes these forced moves to be made. LP400, Sciconic, Tempo and XDLA make forced moves on non-basic variables which are integer feasible. Unlike the penalties on integer infeasible variables which require heavy computation, these penalties are, of course, immediately available as the reduced costs on the non-basic variables.

4.3.3. Pseudo-costs

Pseudo-costs, which have been described by Benichou, Gauthier, Girodet, Hentges, Ribière and Vincent [5], are used by four codes, MIP, MIP/370, MPS and Tempo. Pseudo-costs are an attempt to overcome the disadvantages of
penalties by estimating the total effect of arbitrating a variable either upwards or downwards. The up and down pseudo-costs of the jth variable, $PCU_j$ and $PCL_j$, are
$$PCU_j = \frac{\Delta^+ x_0}{1 - f_j}, \qquad PCL_j = \frac{\Delta^- x_0}{f_j},$$
where on branching on the jth variable, $\Delta^+ x_0$ and $\Delta^- x_0$ are the absolute changes in the functional value on the up and down branches respectively. The basic assumption is that the net effect of branching on a particular variable upwards or downwards varies according to the amount by which it has to be changed, but tends to be more or less similar wherever it occurs in the tree. Their disadvantage is that they do not provide a guaranteed bound on the function.

At the input stage, in all the codes which use pseudo-costs except Tempo, the user may specify the up and down pseudo-costs, $PCU_j$ and $PCL_j$, for some or all of the integer variables. On obtaining the solution to a subproblem the appropriate pseudo-cost of the branching variable is computed. A pseudo-cost whose value has not been assigned in either of these ways is said to be 'missing'. During the search at a particular node a pseudo-cost is required if it is associated with an integer variable having a non-integer value. If a required pseudo-cost is missing, the user of MIP/370 can specify whether it is
(a) treated as zero,
(b) computed,
(c) partially computed: that is, the optimization of the subproblem is prematurely halted and the pseudo-cost is approximated using the non-optimal value of the subproblem.
The user of MIP has only options (a) and (b) above. In MPS the user may choose whether missing pseudo-costs take either a user specified value or the value
$$(x_0^1 - x_I)/s_0,$$
where $s_0$ is the sum of the integer infeasibilities at the continuous optimum; $x_0^1$ and $x_I$ are defined in Section 4.

In Tempo, having solved the initial continuous LP, up and down penalties are computed for all integer variables, where the down penalty for an integer variable that is non-basic at its upper bound is its reduced cost, and the up penalty is zero. These penalties are used as initial pseudo-costs. Pseudo-costs are used by Tempo to select up to six variables as candidates for branching.

There are three options offered by MIP/370 that use pseudo-costs to determine the branching variable $x_k$:
where so is the sum of the integer infeasibilities at the continuous optimum. xi; and xI are defined in Section 4. In Tempo, having solved the initial continuous LP, up and down penalties are computed for all integer variables where the down penalty for an integer variable that is non-basic at its upper bound is its reduced cost, and the up penalty is zero. These penalties are used as initial pseudo-costs. Pseudo-costs are used by Tempo to select up to six variables as candidates for branching. There are three options offered by MIP/370 that use pseudo-costs to determine the branching variable xk : = min, el, where el = min (PCLlf,,PCU,(l -f,)); (a) (b) 0, =max, (PCL,fi+PCU,(l - f i ) ) ;
(c) A priority list in which the integer variables are ordered in decreasing priority is built. The branching variable is the non-integer variable with the highest priority in the list. If the list is empty or every variable in the list has an integer value, then the variable $x_k$ is selected by option (b) above and is added to the end of the list. The priority list is built during the search, but once built the ordering of the integer variables is unchanged.

The option offered by MIP to determine the branching variable $x_k$ is:
$$\theta_k = \max_j \theta_j, \quad\text{where } \theta_j = \min(PCL_j f_j, PCU_j(1 - f_j)).$$
When choosing the branching variable all required pseudo-costs that are missing are computed. If the user has specified that missing pseudo-costs are to be treated as zero and all required pseudo-costs are missing, then the branching variable is chosen by priority.

The option offered by MPS is:
$$\theta_k = \max_j \theta_j, \quad\text{where } \theta_j = \max\big(p_{Uj},\, PCU_j(1 - f_j),\, p_{Dj},\, PCL_j f_j\big),$$
and solve the subproblem corresponding to the maximizing term.
In MPS and Tempo, each time a subproblem is solved the corresponding pseudo-cost of the branching variable is updated. In MIP and MIP/370 a pseudo-cost value is updated if the new value is a 'better' estimate. That is to say, a value computed from the complete solution of a subproblem is better than one from a partial solution, which is better than an initialized value, which is better than no value. In MIP/370 once a value has been computed from the solution of a subproblem it is not further updated. In MIP, the user has a choice of whether or not to further update. The default is not to update.

4.3.4. Integer infeasibility

In XDLA, if the integer variables with non-integral values have equal priority, then option (e) or (f) in Table 7 is used, depending upon user specification. Haverley-MIP has a hierarchy of integer infeasibility options. Since the code does not have implicit upper bounding, there may be 0-1 variables at a value greater than 1. Such a variable is chosen first, then one of the options (a) to (d) in Table 7, depending upon user specification. MPS has options (c) and (d) in Table 7 and then solves the subproblem with $\min(p_{Uk}, p_{Dk})$.
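The bookkeeping behind the pseudo-cost rules of Section 4.3.3 can be sketched as follows. This is an illustrative reconstruction, not the code of any surveyed system: the update-on-every-solve convention follows the MPS/Tempo description above, and the scoring has the shape of MIP/370 option (a).

class PseudoCosts:
    """Toy store of up/down pseudo-costs PCU_j and PCL_j."""

    def __init__(self):
        self.up, self.down = {}, {}

    def score(self, j, f_j, missing=0.0):
        """min(PCL_j * f_j, PCU_j * (1 - f_j)), with a stand-in for missing values."""
        pcl = self.down.get(j, missing)
        pcu = self.up.get(j, missing)
        return min(pcl * f_j, pcu * (1.0 - f_j))

    def update(self, j, f_j, parent_value, child_value, direction):
        """Divide the observed change in functional value by f_j or 1 - f_j."""
        delta = abs(child_value - parent_value)
        if direction == "up":
            self.up[j] = delta / (1.0 - f_j)
        else:
            self.down[j] = delta / f_j

pc = PseudoCosts()
pc.update("x3", 0.4, 10.0, 11.2, "down")   # PCL = 1.2 / 0.4 = 3.0
print(pc.score("x3", 0.4, missing=1.0))    # min(3.0*0.4, 1.0*0.6) = 0.6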
4.3.5. Pseudo-shadow costs

Pseudo-shadow costs, which are used only in Sciconic, have been described by Beale and Forrest [2]. At the input stage the user may specify up and down
Table 7. Methods of choosing branching variables using integer infeasibility

Option  $\theta_j$                          $\theta_k$
(a)     $\min(f_j, 1-f_j)$                  $\min_j \theta_j$
(b)     $\min(f_j, 1-f_j)$                  $\max_j \theta_j$
(c)     $f_j$                               $\min_j \theta_j$
(d)     $f_j$                               $\max_j \theta_j$
(e)     $\min(f_j, 1-f_j)/(|z_j|+1)$        $\min_j \theta_j$
(f)     $\min(f_j, 1-f_j)/(|z_j|+1)$        $\max_j \theta_j$

Code          (a)  (b)  (c)  (d)  (e)  (f)
Apex III      ✓    ✓
FMPS
Haverley-MIP  ✓    ✓    ✓    ✓
LP400
MIP
MIP/370            ✓
MPS                     ✓    ✓
Ophelie            ✓
Sciconic
Tempo
XDLA                              ✓    ✓
pseudo-shadow prices, $p_i^+$ and $p_i^-$, for each constraint i, and $q_j^+$ and $q_j^-$, up and down pseudo-costs for each integer variable. The pseudo-costs may be considered as the pseudo-shadow prices of the upper and lower bound constraints respectively on each integer variable. Both $p_i^+$ and $p_i^-$ are zero when i is the objective function; $p_i^+ = 0$ when the ith constraint is a $\ge$ constraint, and $p_i^- = 0$ for a $\le$ constraint. The remaining pseudo-shadow prices and the pseudo-costs must be strictly positive, and in the absence of any user specification these values are taken as 0.0005.

The estimated degradation in the objective function, or estimated cost of satisfaction, in forcing an integer variable $x_j$ to the integer above its current value is
$$D_j^+ = \Big\{ q_j^+ + \sum_i \max\big(p_i^+ a_{ij},\, -p_i^- a_{ij},\, \pi_i a_{ij}\big) \Big\} (1 - f_j),$$
and in forcing $x_j$ to the integer below is
$$D_j^- = \Big\{ q_j^- + \sum_i \max\big(-p_i^+ a_{ij},\, p_i^- a_{ij},\, -\pi_i a_{ij}\big) \Big\} f_j,$$
where $\pi_i$ is the current dual value of the ith constraint, $a_{ij}$ is the coefficient in the matrix of constraints, and the summation is over all rows including the objective function. The $\pi$ value of the objective function is 1.0 when minimising and -1.0 when maximising.

There are two options in the choice of branching variable $x_k$:
(a) $\theta_k = \max_j \theta_j$, where $\theta_j = \min(D_j^+, D_j^-)$, and
(b) $\theta_k = \max_j \theta_j$, where $\theta_j = \max(D_j^+, D_j^-)$.
Option (a) is the default option.
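For concreteness, a toy version of the degradation computation is sketched below. The row data, prices and variable names are invented for illustration, with 0.0005 used as the default pseudo-shadow value mentioned above.

def degradations(a_col, pi, p_plus, p_minus, q_plus, q_minus, f_j):
    """Return (D_j^+, D_j^-); a_col[i] is the column entry in row i,
    pi[i] the current dual value of row i."""
    up = q_plus
    down = q_minus
    for i, a in enumerate(a_col):
        up += max(p_plus[i] * a, -p_minus[i] * a, pi[i] * a)
        down += max(-p_plus[i] * a, p_minus[i] * a, -pi[i] * a)
    return up * (1.0 - f_j), down * f_j

D_up, D_down = degradations(
    a_col=[1.0, -2.0], pi=[0.5, 0.1],
    p_plus=[0.0005, 0.0005], p_minus=[0.0005, 0.0005],
    q_plus=0.0005, q_minus=0.0005, f_j=0.3)
print(min(D_up, D_down))   # option (a): theta_j = min(D_j^+, D_j^-)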
Clearly the user can bias the direction of search by setting suitable values of the up and down pseudo-costs.

4.4. Choice of branching point within a set

For those codes which have special sets of one type or another, there is not only the choice of which variable or set upon which to branch, but also the point within the set to be decided. In FMPS the latter choice is which of the set variables is to be put equal to one, and in all the other codes it is the dividing point in the set such that one 'end' or the other of the set is to be zero. All the codes which have set facilities (Apex III, FMPS, LP400, MIP/370, MPS, Sciconic, Tempo and XDLA) have a method (or methods) of defining for each set variable $\lambda_j$ a weight $w_j$. These weights may be:
(a) the coefficients of a row of the coefficient matrix, the reference row, or
(b) the sequence numbers $1, 2, 3, \ldots, p$, where p is the number of variables in the set, or
(c) user specified values.
In all codes except MIP/370, the weights, however defined, must form a monotonic ascending sequence. In MIP/370 the code implicitly re-orders the variables so that the weights form a monotonic descending sequence. Table 8 shows which methods of specifying the weights are available to each code. An asterisk indicates the default method.
Table 8. Methods of specifying weights of variables in a set

Code      Reference row   Sequence numbers   User specified
Apex III                  *                  ✓
FMPS                                         *
LP400     *
MIP/370   *                                  ✓
MPS                       *                  ✓
Sciconic  *
Tempo                     *
XDLA      *                                  ✓
A step in deciding the branch point within a set is the computation of the average weight $\bar W$, that is:
$$\bar W = \sum_j w_j \lambda_j \Big/ \sum_j \lambda_j,$$
where $\lambda_j$ is the value of the jth variable in a set at a node, and the summation is over the p variables of the set. In all codes except Apex III and Sciconic $\sum_j \lambda_j = 1$.
The value of $\bar W$ defines two adjacent variables $\lambda_k$ and $\lambda_{k+1}$ such that
$$w_k \le \bar W < w_{k+1}.$$
In MIP/370 the adjacent pair of variables is such that $w_k \ge \bar W > w_{k+1}$. In all codes except FMPS the variable $\lambda_k$ is used as the branch point in S1 sets. For S2 sets there is a choice of $\lambda_k$ or $\lambda_{k+1}$ as the branch point. In Apex III $\lambda_k$ is chosen as the branch point of an S2 set if the fractional part of $\bar W$ is less than 0.5, otherwise $\lambda_{k+1}$ is chosen. In XDLA the choice is 'complex and involves penalties'. The choice in Sciconic, and an alternative method for selecting the branch point in an S1 set, is described in Section 4.4.5.

In FMPS there are two methods of selecting the branch point within an S1 set: by using the average weight or by priority order. If the user has specified weights for the set variables then $\lambda_k$ is chosen as the branch point if $|\bar W - w_k| \le |\bar W - w_{k+1}|$, otherwise $\lambda_{k+1}$ is chosen. In the absence of such weights FMPS has extended the concept of priorities to variables within a set, and fixes them to one in decreasing order of priority. The priorities may be user specified or input order, see Section 4.3.1.

In the choice of node and the choice of which variable or set to branch upon, some codes use the concepts of down and up composite variables, integer infeasibility of a set, and the fractional part of a set. The down and up composite variables are
$$\sum_{j=1}^{k} \lambda_j \quad\text{and}\quad \sum_{j=k+1}^{p} \lambda_j,$$
respectively. The down composite variable will be referred to as $f_k$. In Apex III and FMPS the integer infeasibility of an S1 set is defined as
$$1.0 - \frac{\max_j (\lambda_j)}{\sum_j \lambda_j}$$
(in FMPS, of course, $\sum_j \lambda_j = 1$). In Apex III the infeasibility of an S2 set is:
$$1.0 - \frac{\max_j (\lambda_j + \lambda_{j+1})}{\sum_j \lambda_j}.$$
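The average weight, the adjacent pair and the Apex III-style S1 infeasibility can be illustrated with a short sketch. The data are invented, and the code assumes the weights are already in ascending order.

def average_weight(w, lam):
    """W-bar = sum(w_j * lam_j) / sum(lam_j)."""
    return sum(wj * lj for wj, lj in zip(w, lam)) / sum(lam)

def adjacent_pair(w, wbar):
    """Return k with w[k] <= W-bar < w[k+1]."""
    k = 0
    while k + 1 < len(w) and w[k + 1] <= wbar:
        k += 1
    return k

def s1_infeasibility(lam):
    """1 - max(lam_j) / sum(lam_j)."""
    return 1.0 - max(lam) / sum(lam)

w = [1.0, 2.0, 3.0, 4.0]
lam = [0.0, 0.6, 0.4, 0.0]
wbar = average_weight(w, lam)                     # 2.4
print(wbar, adjacent_pair(w, wbar), s1_infeasibility(lam))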
In MIP/370 and XDLA there are two different definitions of the fractional part of a set. In MIP/370 it is the down composite variable, $f_k$, and in XDLA it is the fractional part of $\bar W$, i.e. $f_{\bar W}$. In MPS the concepts of the integer infeasibility and the fractional part of a set are not used. When selecting the branching variable or set by an integer infeasibility option (v. Section 4.3.4), or when evaluating the sum of the infeasibilities, MPS treats the set variables as ordinary integer variables. That is, the
infeasibility of any variable is $\min(f_j, 1-f_j)$. All the codes apply the same criteria to the integer variables and to the sets in deciding the branching variable or set and which subproblem (or subproblems) to solve.

4.4.1. Priorities

The user of Apex III, MIP/370, MPS, Sciconic and XDLA may specify that priorities are used to select the branching variable or set. FMPS chooses the branching variable or set only by priority ordering. Details of all these priority options are given in Section 4.3.1.

4.4.2. Penalties

Tomlin [39] has described the computation of up and down penalties for a set. The up and down composite variables and their (implicit) corresponding rows in the updated simplex tableau are considered. From these rows, up and down penalties, $p_{Uj}$ and $p_{Dj}$, are computed. These set penalties are compared by Apex III, LP400, MPS and XDLA with the penalties of the integer variables, and the branching variable or set is selected by the usual rules. In Tempo the up and down numbers of infeasibilities are computed by updating the two variables that would enter the basis to remove the infeasibility of the composite variables. The number of infeasibilities, combined with the penalties, is used to select the branching variable or set by the usual rule.

4.4.3. Pseudo-costs

These are used to select a set for branching in three codes, MIP/370, MPS and Tempo. The application of pseudo-costs to sets in MIP/370 is described by Gauthier and Ribière [14]. The pseudo-costs are computed in the usual way having branched at $\lambda_k$, using $f_k$ and $1-f_k$ as the divisors of the corresponding change in the objective function. Thus there are as many (PCL, PCU) pairs attached to a set as there are possible dichotomies, that is $p-1$. In MIP/370 at the input stage the user may specify up and down pseudo-costs for each possible dichotomy of every set. The treatment of missing pseudo-costs and the updating of the pseudo-cost values for the sets is the same as for the integer variables. In Tempo the pseudo-costs of a set are initialized by computing the up and down penalties of branching on the set at $\lambda_k$. After branching on a set the pseudo-costs are updated as in MIP/370. Pseudo-costs are used by Tempo to select a set as a candidate for branching. We have not discovered how pseudo-costs are used in sets by MPS.

4.4.4. Integer infeasibility

This option, as applied to a set, is used by three codes, Apex III, MIP/370 and XDLA. In Apex III the integer infeasibility of a set is $\theta_j$ in options (a) and (b) of Section 4.3.4. In MIP/370 $f_k$ and in XDLA $f_{\bar W}$ play the role of $f_j$ in Section 4.3.4.
When selecting the branching variable or set using an integer infeasibility option MPS treats the set variables as ordinary integer variables.

4.4.5. Pseudo-shadow costs

Beale and Forrest [2] describe an alternative treatment of an S1 set and the treatment of an S2 set in Sciconic. For ease of exposition the following summary assumes that the convexity row $\sum_j \lambda_j = 1$ is present in the problem. The vector of constraint coefficients corresponding to the set variable $\lambda_j$ is $a_j$. For example, in a very simple case this vector would contain three non-zero elements: the unit element in the convexity row, the element in the reference row, $w_j$, and a function element. At a node, the vector of actual contributions is
$$a_A = \sum_j \lambda_j a_j,$$
where the summation is over the set variables. The vectors $a_k$ and $a_{k+1}$ correspond to the two adjacent set variables $\lambda_k$ and $\lambda_{k+1}$. For an S1 set the degradation, or estimated cost of satisfaction, in removing the infeasibility 'downwards' is
$$D^- = \sum_i \max\{p_i^+(a_{ik} - a_{iA}),\, -p_i^-(a_{ik} - a_{iA}),\, \pi_i(a_{ik} - a_{iA})\},$$
and 'upwards' is
$$D^+ = \sum_i \max\{p_i^+(a_{i,k+1} - a_{iA}),\, -p_i^-(a_{i,k+1} - a_{iA}),\, \pi_i(a_{i,k+1} - a_{iA})\},$$
where $p_i^+$, $p_i^-$ are the up and down pseudo-shadow prices of constraint i and $\pi_i$ is the current dual value of the ith constraint. These values, $D^-$ and $D^+$, are used in a similar manner to $D_j^-$ and $D_j^+$ in Section 4.3.5 in order to decide whether to branch on this S1 set or some other integer variable or set.

The degradation associated with an S2 set is computed differently. The vector of contributions corresponding to the average weight, $\bar W$, is $a_{\bar W}$. This vector is computed by linear interpolation between the vectors $a_k$ and $a_{k+1}$, so that
$$\bar W = \phi w_k + (1-\phi) w_{k+1}.$$
The degradation in the objective function, or estimated cost of satisfaction, in removing the infeasibility of an S2 set is
$$\theta = \sum_i \max\{p_i^+(a_{i\bar W} - a_{iA}),\, -p_i^-(a_{i\bar W} - a_{iA}),\, \pi_i(a_{i\bar W} - a_{iA})\},$$
where $p_i^+$, $p_i^-$ and $\pi_i$ are defined above. The value of $\theta$ is used in a similar manner to $\theta_j$ in Section 4.3.5 in order to decide whether to branch on this S2 set or some other integer variable or set.

Having decided to branch on a set, there is then the choice of where to branch in the set. The default option for an S1 set is to select $\lambda_k$ as the branch point. There is a more sophisticated method of choice which is an option for an S1 set and the default option for an S2 set.
Let $w_S$ and $w_L$ be the smallest and the largest values of $w_j$ for which $\lambda_j \ne 0$. The vector of interpolated coefficients corresponding to any $w_j$ between $w_S$ and $w_L$ is
$$a_w = (1-\phi)\, a_S + \phi\, a_L,$$
where a value of $\phi$ is defined by
$$w_j = (1-\phi)\, w_S + \phi\, w_L.$$
The discrepancy between $a_w$ and $a_A$ can be regarded as an indication of the extent to which the current solution misrepresents the set. For a particular variable $\lambda_j$ the function $D_w$ is defined by
$$D_w = \sum_i \max\{p_i^+(a_{iw} - a_{iA}),\, -p_i^-(a_{iw} - a_{iA}),\, \pi_i(a_{iw} - a_{iA})\}.$$
The branch point within a set is that variable, $\lambda_k$, such that
$$D = \max_w D_w.$$
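A schematic version of this interpolation test follows. It is an illustration only, with our own names: columns are plain lists, the nonzero range is given by the indices s and l (assumed distinct), and the function returns the member with the largest priced discrepancy.

def interpolate(a_s, a_l, phi):
    """Linear interpolation between the end-point columns."""
    return [(1 - phi) * s + phi * l for s, l in zip(a_s, a_l)]

def discrepancy(a_w, a_a, pi, p_plus, p_minus):
    """Priced discrepancy D_w between interpolated and actual contributions."""
    return sum(max(pp * (x - y), -pm * (x - y), d * (x - y))
               for x, y, d, pp, pm in zip(a_w, a_a, pi, p_plus, p_minus))

def branch_point(w, cols, a_a, pi, p_plus, p_minus, s, l):
    """Return (index, D) of the member maximizing D_w over the range s..l."""
    best = None
    for j in range(s, l + 1):
        phi = (w[j] - w[s]) / (w[l] - w[s])
        a_w = interpolate(cols[s], cols[l], phi)
        d = discrepancy(a_w, a_a, pi, p_plus, p_minus)
        if best is None or d > best[1]:
            best = (j, d)
    return best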
If the above method is used to select the branch point of an S1 set, then branching takes place so as to exclude an interval with one end point at $\lambda_k$ and the other end point at $\lambda_{k+1}$ if $\bar W \ge w_k$, or at $\lambda_{k-1}$ if $\bar W < w_k$. For the continuous functions listed in Section 3, an option in Sciconic is to start with only the two limit points in an S2 set, and to generate new variables within the set by interpolation, until the maximum possible improvement in the function is within a specified tolerance. The details of generating the interpolated variable are described by Beale and Forrest [2].

4.5. Choice of node to develop
The process of branching usually consists of creating two new subproblems. Immediately after branching there is a choice of which subproblem (or subproblems) to solve and then, if the new node (or nodes) cannot be immediately fathomed, which node to select for further development. We call this choice the immediate strategy, and, of course, this determines the nature of the node which is stored. On reaching a node (or nodes) which can be fathomed, then, if the tree search is not complete, a node has to be selected for further development. We shall call this choice the backtrack strategy. Both the immediate and the backtrack strategy select a node on the basis of a criterion. In the backtrack strategy, the choice of node is from a subset of the active nodes, the candidate nodes. The immediate and the backtrack strategies, the selection criteria, and the candidature rules used by the different codes are discussed below.

4.5.1. Immediate strategy

We can distinguish five different strategies:
(a) Solve one of the newly created subproblems and use it as the node for further development;
(b) Solve both newly created subproblems and select one as the node for further development;
(c) Solve one, and sometimes both, of the newly created subproblems and select one as the node for further development;
(d) Solve both subproblems and select a node from all active nodes (i.e. as in the backtrack case, v. Section 4.5.2);
(e) Solve neither subproblem and select a node from all the active nodes (i.e. again as in the backtrack case).
Table 9 shows which strategies are available to each code. The asterisk signifies the strategy which is followed by default. The Apex III default is to solve one subproblem and develop further until an integer solution is found, and then to solve neither subproblem and choose the node to develop from all active nodes.
Table 9. Choice of node: immediate strategy

Code          Solve 1 &   Solve 2 &   Solve 1 or 2   Solve 2 &     Solve 0 &
              develop     choose 1    & choose 1     choose from   choose from
              further                                all           all
Apex III      *           ✓                                        *
FMPS                                                               *
Haverley-MIP  *           ✓
LP400                     *           ✓
MIP                       *                          ✓
MIP/370                   *                          ✓
MPS           *
Ophelie       *
Sciconic                              *
Tempo                                 *
XDLA          ✓                       *
When only one subproblem is solved, in Apex III, Haverley-MIP, MPS and XDLA, there is, in general, a choice as to which one. In MPS, when the branching variable or set has been selected by priority ordering, in XDLA, when the choice has been by penalties, and in Apex III, the rule is to solve $\min(p_{Uk}, p_{Dk})$, where $p_{Uk}$ and $p_{Dk}$ are the up and down penalties. In MPS, when the branching variable or set is selected by one of the options in Section 4.3, then the subproblem to solve is also specified; however the user can modify this choice so that either the up branch or the down branch or the opposite subproblem is solved. In XDLA if the branching variable or set is selected by one of the options (a), (b), (e) or (f) in Section 4.3.4 then the down branch subproblem is solved if $f_k < 0.5$, otherwise the up branch subproblem is solved. However, if in XDLA, the
branching variable or set is selected by priority ordering, then the choice of subproblem to be solved corresponds to the user-specified direction (v. Section 4.3.1). In Haverley-MIP the subproblem with bound $x_k = 1.0$ is solved.

When both subproblems are solved, in Apex III, Haverley-MIP, LP400, MIP and MIP/370 there is a choice as to which subproblem to solve first. In Apex III and LP400 the choice is to solve $\min(p_{Uk}, p_{Dk})$. In Haverley-MIP the subproblem with bound $x_k = 0$ is solved first. In MIP/370 the user can control the choice. The same switch variable also controls the truncating rules (v. Section 4.6). The choice depends upon the characteristics of the subproblem, but commonly the 'worst' subproblem is solved first, that is, solve $\max(PCL_k f_k, PCU_k(1 - f_k))$, where $PCL_k$ and $PCU_k$ are the lower and upper pseudo-costs of the branching variable, $x_k$. In MIP the user can control the choice. The same switch variable also controls the node that is selected for further development. The 'worst' subproblem is solved first, where the user specifies the definition of worst, either $\max(PCL_k, PCU_k)$ or $\max(PCL_k f_k, PCU_k(1 - f_k))$. In the former case the node selected for further development is the one for which the smallest pseudo-cost has been obtained; in the latter case, the node with either the best functional value or the best pseudo-cost estimate (v. Section 4.5.3) is chosen.

In Apex III, Haverley-MIP, LP400 and MIP/370, having solved both subproblems, and if neither node can be fathomed, then one is chosen for further development. The choice is made on the basis of the criteria that are available to the code. The user of Haverley-MIP can specify which of the two newly created nodes is to be given priority for further development; that is, either the subproblem with bound $x_k = 1$ or the subproblem with $x_k = 0$. Table 10 shows which other criteria (as defined in Section 4.5.3 below) are used by each code. In these four codes and in MIP, if one of the two nodes is fathomed then the other is selected for further development, and only if both are fathomed does the choice of next node depend on the backtrack strategy.

The strategy of solving either one or two subproblems depending upon the circumstances is available in four codes, LP400, Sciconic, Tempo and XDLA. In XDLA the first problem is solved by the rules discussed for immediate strategy (a), and in LP400 the problem with $\min(p_{Uk}, p_{Dk})$ is solved. Then, in both codes, if the functional value achieved is worse than the bound on the second subproblem, the second subproblem is solved and the node selected for further development is the one with the better value.
Table 10. Criteria to choose which one of two solved subproblems to develop further. Codes covered: Apex III, Haverley-MIP, LP400, MIP and MIP/370; criteria: value of node, pseudo-cost estimate, best projection, priority. [The individual entries of this table are not legible in this copy.]
In Tempo, if min(PU_k, PD_k) is 'significantly less' than max(PU_k, PD_k), then the subproblem with the minimum penalty is solved; otherwise both subproblems are solved. When both subproblems are solved, the choice of which node to pursue further depends on the ratio of the difference between the functional values of the two new nodes to the functional value of the parent node. If this ratio is more than 0.005 then the node with the lower (when minimizing) functional value is chosen for further development; otherwise the node with the smallest number of integer infeasibilities is chosen. In Sciconic the problem with min(D_k^+, D_k^-) is solved, where D_k^+ and D_k^- are the up and down pseudo-shadow costs of the branching variable or set. Having solved the problem, the pseudo-shadow cost estimate (v. Section 4.5.3 (f) below) is computed for the new node. This estimate and the pseudo-shadow cost estimate of the second (as yet unsolved) subproblem are used to decide whether it appears advantageous to solve the second subproblem; if so, it is solved. The user may specify the value of a parameter which will bias the choice towards or against solving the second subproblem. The node chosen for further development is chosen on the basis of either the pseudo-shadow cost estimate or the percentage error, depending upon user specification.
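As an illustration, the Tempo rules just described can be restated as follows; the meaning of 'significantly less' is not specified in our sources, so the tolerance alpha below is our assumption:

# Sketch of Tempo's strategy as described above. "Significantly less"
# is modelled as a hypothetical relative tolerance `alpha`; this is an
# assumption, not Tempo's documented test.

def tempo_immediate(pu_k, pd_k, alpha=0.5):
    """Return which subproblems to solve: one (min penalty) or both."""
    lo, hi = min(pu_k, pd_k), max(pu_k, pd_k)
    if lo < alpha * hi:                       # "significantly less"
        return ["up"] if pu_k <= pd_k else ["down"]
    return ["up", "down"]

def tempo_choose_node(val_up, val_down, val_parent,
                      infeas_up, infeas_down, minimizing=True):
    """When both subproblems are solved: ratio test with threshold 0.005."""
    ratio = abs(val_up - val_down) / abs(val_parent)
    if ratio > 0.005:
        better = min if minimizing else max
        return "up" if better(val_up, val_down) == val_up else "down"
    # otherwise pick the node with fewer integer infeasibilities
    return "up" if infeas_up <= infeas_down else "down"

print(tempo_immediate(1.0, 4.0))                       # -> ['up']
print(tempo_choose_node(102.0, 105.0, 100.0, 3, 5))    # -> 'up'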
4.5.2. Backtrack strategy
By the 'backtrack strategy' we mean what the code does when it arrives at a fathomed node. Needless to say, the situation is not entirely straightforward! For instance, in the four codes which may solve either one or two nodes in the immediate strategy, the second node is solved next if the first subproblem is fathomed, and only then is the general backtrack strategy applied. The backtrack strategies are:
(a) LIFO, that is, select the node that was last created.
(b) Consider all active nodes and select one on the basis of the criteria available to the code (v. Section 4.5.3).
Both of these strategies may be modified by candidature rules (v. Section 4.5.4). All the codes, except FMPS, have the LIFO option; and all the codes, except LP400, Ophelie and XDLA, consider all active nodes. Thus 'all active nodes' is
the only option for FMPS, and LIFO the only option for LP400, Ophelie and XDLA. The first six columns of Table 11 indicate which criteria are used by each code when considering all active nodes. The last column shows which codes have candidature rules. Three codes have alternative strategies to (a) and (b) above. Apex III and MIP do LIFO until there is an incumbent and then consider all active nodes. Haverley-MIP considers all active nodes when the fathomed node is an integer solution; otherwise it uses LIFO.

4.5.3. Criteria
Below is a list of the criteria that are used in the immediate and backtrack strategies, and Table 11 indicates which criteria are used by each code.
Table 11. Backtrack choice of node: criteria. Codes covered: Apex III, FMPS, Haverley-MIP, LP400, MIP, MIP/370, MPS, Ophelie, Sciconic, Tempo and XDLA; criteria: value of node, best projection, pseudo-cost estimate, sum of integer infeasibilities, norm estimate, pseudo-shadow costs, per cent error, candidature rules. [The individual entries of this table are not legible in this copy.]
(a) Value of the node: this is a bound (upper if maximizing, lower if minimizing) on the value of an integer solution that may be derived from a node. In all codes, except MPS, this criterion selects the node with the best value, that is, maximum if maximizing or minimum if minimizing. In MPS the user has to specify whether the node with maximum or minimum value is to be selected. This criterion is used by MIP and MIP/370 in the immediate strategy only.
(b) Best projection (BP): this is an estimate of the value of an integer solution that may be derived from a node. The estimate is of the degradation in the value of the objective function which is expected to be necessary to reduce the sum of integer infeasibilities to zero. That is:

BP_k = x_k + γθs_k,

where BP_k is the BP value at node k, s_k is the sum of the integer infeasibilities at node k, and γ is a scaling factor. In all the codes, except MPS, that use this
criterion,

θ = (x_I - x_0)/s_0,

where s_0 is the sum of the integer infeasibilities at the continuous optimum, and x_0, x_k and x_I are as defined in Section 4. In MPS the user may specify that either this definition of θ is used or θ may be a user-specified value. In all codes except FMPS, γ has the value 1.0. In FMPS the user may specify the value of γ; the default value is 1.0. If the value of x_I is undefined the BP value cannot be computed. In Apex III, when the active nodes may be unsolved subproblems, the BP value of such a node is that of its parent node. Tie-breaking, in Apex III, of nodes of equal BP value is done by using the values of the nodes. The BP criterion has been described by Forrest, Hirst and Tomlin [12], who also point out that the BP value is the same as the estimate based on pseudo-costs (v. (c) below) when all the pseudo-costs have equal value.
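As a worked example of the best projection criterion, under the reconstruction above and with illustrative numbers only:

# Minimal sketch of BP_k = x_k + gamma*theta*s_k, with
# theta = (x_I - x_0)/s_0 as reconstructed above. All values are
# illustrative; x_0, x_I and s_0 are the continuous optimum value,
# the cut-off (or target objective) and the infeasibility sum at
# the continuous optimum.

def best_projection(x_k, s_k, x_0, x_i, s_0, gamma=1.0):
    theta = (x_i - x_0) / s_0   # expected degradation per unit infeasibility
    return x_k + gamma * theta * s_k

# Continuous optimum 811.84 with infeasibility sum 4.0, target 900.0:
bp = best_projection(x_k=830.0, s_k=1.5, x_0=811.84, x_i=900.0, s_0=4.0)
print(round(bp, 2))   # -> 863.06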
(c) Pseudo-cost estimate (PCE): this is an estimate of the value of an integer solution that may be derived from a node. It is the sum of the degradations in the value of the objective function that will result from driving each variable to an integer value. That is

PCE_k = x_k + Σ_j min(PCL_j f_j, PCU_j (1 - f_j)),

where PCE_k is the pseudo-cost estimate at node k, PCL_j and PCU_j are the lower and upper pseudo-costs of variable j, and x_k and f_j are as defined in Section 4 above. The fractional part of a set, for MIP/370, is defined in Section 4.4.3. The evaluation of the pseudo-costs PCL_j and PCU_j is discussed in Sections 4.3.3 and 4.4.3. The PCE value of an active node is computed upon reaching that node. The value of an estimate can change whenever the value of a pseudo-cost is updated, so, in MIP/370, MPS and Tempo, the pseudo-cost estimates of all the active nodes are recomputed before selecting a node to develop by the backtrack strategy. In MIP the user can specify whether or not this recomputation is to be done. We have not discovered how MPS and Tempo compute the contribution of a set to the pseudo-cost estimate.
(d) Sum of the integer infeasibilities: the definition of integer infeasibility for an integer variable is given in Section 4, and of a set in Section 4.4. MPS enables the user to specify whether the node with maximum or minimum sum of infeasibilities is to be selected. The minimum sum of infeasibilities is only used in FMPS when the user has specified that the nodes available for selection are restricted by a candidature rule (v. Section 4.5.4).
(e) Norm estimate (NE):
Tempo has three methods of estimating the norm of node k. [The formulas for methods (i) and (ii) are not legible in this copy.] Method (iii) is

NE_k = (depth/nonint) |PCE_k - x_k|,

where x_k is as defined in Section 4 above, PCE_k is the pseudo-cost estimate at node k (v. Section 4.5.3 (c)), 'depth' is the depth of node k within the tree and 'nonint' is the number of integer infeasibilities. The node with the greatest norm is selected. However, we have been unable to discover how Tempo uses the pseudo-costs of a set in estimating the norm.
(f) Pseudo-shadow cost estimate (PSCE): this is an estimate of the cost of satisfying all the integer or set infeasibilities,
PSCE_k = γ Σ_j min(D_j^+, D_j^-),

where D_j^+ and D_j^- for an integer variable are as defined in Section 4.3.6, and γ is a scaling factor which may be user-specified, and which has a default value of 1.0. D_j^+ and D_j^- are replaced by D^+ and D^- (v. Section 4.4.5) for an S1 set, and min(D_j^+, D_j^-) is replaced by θ (v. Section 4.4.5) for an S2 set.
(g) Percentage error (PE): this measure of the error of an estimate is defined by Forrest, Hirst and Tomlin [12] and is used only by Sciconic. [The formula is not legible in this copy.] In it,
PE_k is the percentage error at node k, PSCE_k is the pseudo-shadow cost estimate at node k, and x_I and x_k are as defined in Section 4. This criterion cannot be used unless a value for x_I is known. The node with the smallest percentage error is selected.

4.5.4. Candidature rules
During the tree search a node is available for further development if it is an active node. Eight codes, Apex III, FMPS, LP400, MIP, MIP/370, MPS, Tempo and XDLA, have user-controlled rules to restrict the nodes that are available for branching to the candidate set, which is a subset of the active set. In FMPS a node where branching has taken place on a set is only considered as active if all other nodes have been developed, or immediately on finding an integer solution, or in both cases if the user so specifies. A further option in FMPS is that a node is only a candidate if its functional value is within a user-specified percentage of the co-bound.
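The general shape of such a candidature rule is a simple filter on the active set; a minimal sketch, assuming minimization and a user-specified tolerance p (the parameter name is illustrative, not any code's actual keyword):

# Sketch of a candidature filter of the kind described in this section:
# an active node is a candidate only if its value is better than the
# cut-off by a user-specified amount p (minimization assumed).

def candidates(active_nodes, cutoff, p):
    """active_nodes: list of (name, functional_value) pairs."""
    return [n for n in active_nodes if n[1] < cutoff - p]

nodes = [("A", 95.0), ("B", 99.5), ("C", 90.0)]
print(candidates(nodes, cutoff=100.0, p=2.0))   # -> [('A', 95.0), ('C', 90.0)]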
In Apex III, LP400, MPS, Tempo and XDLA, to be a candidate a node must have a value better than the cut-off ± p, where p is a user-specified amount. In XDLA p may be a specified percentage of the continuous optimum. In Apex III and Tempo a node that is not considered as a candidate is kept in a postponed list. In Apex III it can be considered as a candidate if the user later resets the cut-off; in Tempo it is a possible candidate if all other nodes have been developed. In MPS a node whose functional value lies between the cut-off and a user-specified value is only considered as a candidate if all other nodes have been developed.

In MIP and MIP/370 the candidature rules are more sophisticated. To be a candidate a node must have a pseudo-cost estimate better than a user-supplied value, and also its functional value must be better than a 'candidate cut-off' whose value is min(cut-off - |p|, co-bound + |γ|) (when minimizing), where p and γ are user-specified values. If the user of MIP/370 wishes and sets the appropriate switch variable, the following additional candidature rules may be imposed:
(a) A node is not a candidate unless its pseudo-cost estimate is better than a computed value, based upon the best pseudo-cost estimate of the active nodes.
(b) A node is not a candidate unless its functional value is better than a computed value based upon the co-bound.
(c) Both (a) and (b).
In MIP and MIP/370 the active nodes not considered as candidates are kept in a postponed list. They will be considered as candidates only if the candidature rules are changed. In all codes, except MIP/370, the candidature rules are only imposed if the user so specifies: that is, the default is 'no candidature rules'. In MIP/370-MIP (v. Section 5) the additional candidature rule used is that a node must have a good pseudo-cost estimate.

4.5.5. Stopping rules
Within the agenda or control stream all the codes enable the user to stop the branch and bound search. In four codes, Apex III, Haverley-MIP, Ophelie and XDLA, the user can specify parameters so that the search stops when a criterion is satisfied. These stopping rules are based upon the value of the best integer solution found so far in the search. Apex III stops when the value of an integer solution is within a user-specified percentage or amount of the co-bound. Haverley-MIP stops when either a specified number of integer solutions have been found or a specified number within a specified percentage of the continuous optimum have been found. Ophelie and XDLA stop when an integer solution with a value within a user-specified percentage of the continuous optimum has
been found. XDLA also stops when an integer solution has a functional value at least as good as a user-specified value. In MIP and MIP/370 the user can set parameters so that the search stops when a specified number of integer solutions have been found or when a specified amount of execution time has elapsed.

4.6. Truncation
Three codes, Apex III, MIP and MIP/370, detect subproblems that are considered not worth solving, either because their solution will consume too many computing resources or because the solved subproblem will be of no interest and will need no further development. The codes create nodes that are partially solved subproblems, called 'flagged nodes' in Apex III and 'prenodes' in MIP and MIP/370. In Apex III a subproblem may be flagged if it is too time-consuming to solve, requires too many iterations or there is a computational failure. The iteration and time limits may be user-specified. A flagged node is only considered for further development when all other active nodes have been developed. In MIP the truncation rules are not user-controlled. In MIP/370 the truncating rules which stop the optimization of a subproblem are user-controlled. The truncating rule computes a bound on the functional value and if this bound is exceeded then a prenode is created. Depending upon the selected rule, this bound is based upon the pseudo-cost estimate and the functional value of the parent node, or upon the functional value of the parent node and the pseudo-cost estimates of the parent node and of the first subproblem solved. Also in MIP/370, when solving a subproblem, if the user has so set the switch variable, a prenode will be created if the functional value of the subproblem becomes worse than the 'candidate cut-off' (v. Section 4.5.4). All prenodes are considered active nodes and are candidates for further development (this includes solving the subproblem first) if they satisfy the candidature rules.

4.7. Solution method of continuous LP problems
During the solution of a subproblem by a dual or parametric algorithm, if the functional value becomes worse than the cut-off, there is no need to continue the optimization: this is done by FMPS, MIP, MIP/370, Ophelie and Tempo, and by MPS, Sciconic and XDLA when using a parametric algorithm.
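The test involved is worth making explicit. A conceptual sketch follows; the sequence of objective values is a stand-in for a real dual simplex, and only the stopping test matters here:

# Conceptual sketch of the cut-off test of Section 4.7: during a dual
# (or parametric) solve the objective degrades monotonically, so the
# optimization can stop as soon as it passes the cut-off.

def solve_with_cutoff(dual_iterations, cutoff, minimizing=True):
    value = None
    for value in dual_iterations:          # objective after each basis change
        worse = value > cutoff if minimizing else value < cutoff
        if worse:
            return ("abandoned", value)    # node can be fathomed
    return ("optimal", value)

# Objective path of a hypothetical dual solve on a minimization problem:
print(solve_with_cutoff([101.0, 103.5, 107.2, 109.0], cutoff=105.0))
# -> ('abandoned', 107.2)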
4.8. The cut-off
In all the codes except Haverley-MIP the user may specify a cut-off so that the first integer solution found will be better than the cut-off. Whenever an integer solution is found the cut-off is updated so that the functional values of the integer solutions improve monotonically. In all codes except FMPS, in the absence of any user-specified value, the cut-off is initially set to -∞ if maximizing and +∞ if minimizing. In FMPS the cut-off is initialized to x_0 ± 0.5 x_0.
Table 12. LP solution method. Codes covered: Apex III, FMPS, Haverley-MIP, LP400, MIP, MIP/370, MPS, Ophelie, Sciconic, Tempo and XDLA; methods: dual, parametric, primal, user choice of primal or dual, user choice of primal or parametric. [The individual entries of this table are not legible in this copy.]
Since one has come across problems where the sign of the function at the LP solution is different from that of the integer solution, this seems a rather unsatisfactory rule! By the use of the agenda or control stream all the codes enable the user to re-set the value of the cut-off each time an integer solution is found. In three codes the user may specify parameters to re-set the cut-off. In Apex III the cut-off may be re-set to a user-specified value, or the value of the current incumbent ± a user-specified value, or the value of the current best candidate ± a user-specified value. In FMPS the cut-off may be re-set to the value of the current incumbent ± a user-specified percentage. In XDLA a switch may be set so that the value of the cut-off is never updated. In the codes that use best projection, the value of x_I used is the value of the cut-off, except when the cut-off is infinite, in which case the BP value cannot be evaluated. In Apex III and Sciconic the user can specify both a cut-off and a target objective. The target objective is used as the value of x_I by Apex III in computing the best projection value and by Sciconic in the percentage error value. On finding an integer solution, both the cut-off and the target objective are updated to the functional value of the integer solution.
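A sketch of this cut-off handling for a minimizing code follows (initialization and monotone updating only; the re-setting options described above are omitted):

# Sketch of cut-off handling as described in Section 4.8: start at
# +infinity (minimizing) unless the user supplies a value, and tighten
# the cut-off each time an integer solution is found, so that accepted
# solutions improve monotonically.

import math

class CutOff:
    def __init__(self, user_value=None, minimizing=True):
        self.minimizing = minimizing
        default = math.inf if minimizing else -math.inf
        self.value = user_value if user_value is not None else default

    def accept(self, solution_value):
        """Accept an integer solution only if it beats the cut-off."""
        better = (solution_value < self.value) if self.minimizing \
                 else (solution_value > self.value)
        if better:
            self.value = solution_value
        return better

c = CutOff()
print(c.accept(307.0), c.accept(310.0), c.accept(290.0))  # True False True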
5. Defaults
We have tried to describe all the various devices and options available in the different codes, to the extent that we may have given the impression that it is impossibly difficult for the user to make all the right choices. However, all the codes can be used in a very straightforward way, and all the different options are for use on difficult problems, when the user has built up some experience in the tactics which best suit his particular problem.
Table 13. Default setting for selecting a variable or set where priorities are not provided (or not available)

Apex III        penalties
FMPS            priority order based on input order
Haverley-MIP    integer infeasibility, option 4 of Section 4.3.4
LP400           penalties
MIP             priority based upon input order
MIP/370-MIP     pseudo-costs, option (c) of Section 4.3.3 until the first integer solution and then option (b)
MIP/370-PURE    priority order based on the cost order
MPS             penalties, option (d) of Section 4.3.2
Ophelie         integer infeasibility
Sciconic        pseudo-shadow costs
Tempo           pseudo-costs and penalties (v. Section 4.3)
XDLA            penalties
In MIP/370 alone, there is an analysis of the nature of the problem to determine the correct default strategies. If the number of integer variables is less than 15% (or a user-specified percentage) the option MIP/370-MIP is used; otherwise MIP/370-PURE. (The user may alternatively simply specify which of the two is to be used.) In most cases, one would expect the default setting to be the recommended standard method, but as experience has been gained in the use of the codes, so the recommendations have tended to move a little. The user's knowledge of the reality of his problem, as reflected in the priorities he can give to the order in which the variables and sets are to be arbitrated, is the most widely recommended method. The defaults in Table 13 for selecting the variable or set upon which to branch (cf. Table 5 in Section 4.3) are only for cases where priorities are not known or not possible in the code. In MIP/370-MIP the pseudo-costs are partially computed when they are required, while in MIP and MIP/370-PURE the missing pseudo-costs are not computed. The recommended standard strategy for choosing the branching variable in MPS (in the absence of a priority list) is to use the combination of penalties and pseudo-costs described in Section 4.3.3; missing pseudo-costs have the value zero. In Apex III, by default, forced moves are made on non-basic integer variables, and those integer variables and sets which are integer and set feasible are not considered as branching variables or sets. The default for the immediate strategy is denoted by an asterisk in Table 9 in Section 4.5.1. In MPS the default choice of subproblem to solve is the one specified by the method of selecting the branching variable or set (v. Section 4.3). In MIP the default choice of subproblem to solve first is the same as in MIP/370; that is, the subproblem with max(PCL_k f_k, PCU_k(1 - f_k)). Having solved both subproblems, MIP and MIP/370 select the node with the best pseudo-cost estimate for further development.
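The MIP/370 default-strategy test reduces to one comparison; in this sketch the proportion is taken over all columns, which is our assumption:

# Sketch of MIP/370's default-strategy test as described above: if the
# proportion of integer variables is below a threshold (15% by default,
# or a user-specified percentage) the MIP strategy is used, otherwise
# PURE. Measuring the proportion against all columns is an assumption.

def mip370_mode(n_integer, n_columns, threshold=0.15):
    return "MIP/370-MIP" if n_integer / n_columns < threshold else "MIP/370-PURE"

print(mip370_mode(23, 248))    # -> 'MIP/370-MIP'   (about 9% integer)
print(mip370_mode(319, 319))   # -> 'MIP/370-PURE'  (pure 0-1 problem)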
Table 14. Default backtrack strategy for choosing the next node

Apex III        LIFO until the first integer solution and then all active nodes
FMPS            all active nodes
Haverley-MIP    all active nodes if the fathomed node is integer, otherwise LIFO
LP400           LIFO
MIP             all active nodes
MIP/370-MIP     all active nodes
MIP/370-PURE    all active nodes
MPS             LIFO
Ophelie         LIFO
Sciconic        all active nodes
Tempo           all active nodes
XDLA            LIFO
Table 14 shows the default backtrack strategy that each code uses on reaching a fathomed node. The recommended strategy for MPS is 'all active nodes'. Table 15 shows the default criterion used when the default backtrack strategy is 'all active nodes'. In MIP the pseudo-cost estimates of all active nodes are recomputed when selecting the node for further development in the backtrack strategy. In MPS, when using the recommended strategy of all active nodes, the recommended criterion is the pseudo-cost estimate. In Apex III there are default settings of the time and iteration limits for the creation of flagged nodes. It is a characteristic of 'large' LP problems with 'few' integer variables that there are many simplex iterations, each resulting in a small change in the objective function, between a node and its immediate successors. This tends not to be so in pure or nearly pure integer problems. Thus in MIP/370-PURE the default truncation rule (v. Section 4.6) is based upon functional values, while in MIP/370-MIP it is based upon pseudo-cost estimates.

Table 15. Default criterion for choosing next node in the backtrack strategy

Apex III        best projection, tie-breaking by value of the node
FMPS            best projection
Haverley-MIP    value of the node
MIP             pseudo-cost estimate
MIP/370-MIP     pseudo-cost estimate among nodes satisfying the additional candidature rule, until there are no more candidates; then pseudo-cost estimate without the additional candidature rule
MIP/370-PURE    pseudo-cost estimate
Sciconic        pseudo-shadow cost estimate until the first integer solution and then percentage error
Tempo           norm estimate (a)
6. Quasi-integer
All the codes enable the user to specify an integer tolerance, so that a variable is declared integer if f_j or 1 - f_j is not more than the tolerance. In FMPS, MIP, MIP/370, MPS and Tempo the concept of a quasi-integer variable is used. The branching variable is chosen from integer variables and sets which are not quasi-integer. If all the integer variables have quasi-integer values then the branching variable is chosen from the quasi-integer variables. In FMPS, MIP, MPS and Tempo an integer variable is quasi-integer if f_j or 1 - f_j is less than a tolerance (with a default setting of 0.1). In MIP/370 the definition of quasi-integer depends on the method that is used to select the branching variable. If the choice of branching variable does not use pseudo-costs then the definition is the same as in FMPS; however, if pseudo-costs are used then the decision as to whether a variable is quasi-integer is based upon its pseudo-costs and tolerances. In FMPS a set is quasi-integer if all the set variables have values closer to either 0 or 1 than a tolerance. In MIP/370 the same criteria that are used to determine whether a variable is quasi-integer are also used to determine whether a set is quasi-integer; in the decision the fractional part of a set (v. Section 4.4) is used in a similar manner to f_j. The concept of quasi-integer is applied to a set in Tempo when only one variable of the set is non-zero. (A sketch of these tolerance tests follows Section 7 below.)

7. Cascade
All of the codes except Haverley-MIP, LP400, Tempo and XDLA have some method for the user to specify part of the tree; this feature is often called CASCADE. Ophelie has an option called CASCADE but we do not know the details. In FMPS the user can specify a list of variables to be set to 1 and a list to be set to 0; throughout the search these variables are fixed at either 0 or 1. In Apex III the user can specify a preferred value for a binary variable, a preferred range of values for an integer variable and, in the case of sets, the branch point and the set variables that are to be allowed to take non-zero values. The user may also specify a preferred direction of branching, that is, which subproblem is solved. Nodes which satisfy these conditions are given preference at the node selection stage. The user can specify either that all nodes defined by the preferred values of the variables and sets are to be explored, or that only on the first occasion that a variable or set is branched upon is the preferred direction to be taken. In MIP/370 the user can specify a preferred direction of branching for 0-1 variables (not general integer variables or sets), so that if such a variable is selected for branching by either priorities or pseudo-costs then the next node to be developed will have the variable at its preferred value. In Sciconic the user can specify for each variable and set either an initial direction of branching or the direction of branching on all occasions when the
variable or set is selected as the branching variable; in the latter case the opposite direction is not searched at all. This latter condition can be strengthened by the user setting a switch so that a variable or set is only satisfied if its lower and upper bounds are equal.
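As promised at the end of Section 6, a sketch of the integer-tolerance and quasi-integer tests (the 0.1 quasi-integer default is reported above; the integer tolerance value is our assumption):

# Sketch of the tolerance tests of Section 6. f_j is the fractional
# part of variable j; both tolerances are user-settable, with 0.1 as
# the quasi-integer default reported above and an assumed small value
# for the integer tolerance.

def fractional_part(x):
    return x - int(x)  # x assumed non-negative here for simplicity

def is_integer_feasible(x, int_tol=1e-5):
    f = fractional_part(x)
    return min(f, 1.0 - f) <= int_tol

def is_quasi_integer(x, quasi_tol=0.1):
    f = fractional_part(x)
    return min(f, 1.0 - f) < quasi_tol

x = 3.93
print(is_integer_feasible(x), is_quasi_integer(x))   # -> False True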
8. Agenda
Each of the branch and bound codes is part of a mathematical programming system. All of the systems give a user access to the component codes and their parameters by the use of an agenda or control stream, which is a series of commands to the system. In the agenda the user can initialize the parameters of the tree search, for example the cut-off, the method of choosing a variable to branch upon, the method of node selection, etc. He can print a node, exit, change the parameter values, or continue, whenever the algorithm reaches an integer solution, or at a node count or an iteration count, or any other frequency count available to the code. The codes vary in the level of flexibility and sophistication that is available to the user, but all of them have these features. All the codes allow the user to save on a file the entire problem and its tree, and allow the user to restart the problem from the stored file. Apex III, MIP/370, Sciconic and Tempo have a facility that enables a user to fix all integer and set variables at the values attained in an integer solution and then to perform postoptimal analysis of the continuous variables. In MIP/370, Sciconic and Tempo the problem with the fixed variables is set up so that all the LP procedures can be used: for instance, the problem data can be revised and the problem re-optimized with respect to the continuous variables.
9. Input and interactive control
All the codes will accept the problem coefficients of an integer programming problem without any sets in IBM MPS input format. Many of the codes have extended MPS format, in various different ways, in order to make it more flexible. The major difference between the input formats is in the method of defining a set. Most users, when setting up a mathematical programming problem, use a matrix generator. All of the codes have an associated matrix generator. Some generators can be used to input a problem into more than one code. A widely used example of this is MAGEN, the matrix generator of Haverley Systems. This matrix generator can be used to input into Apex III, Haverley-MIP, MIP and MIP/370. All the operating systems under which these integer programming codes run enable the user to solve his problems in batch mode; MPS, MIP/370, Sciconic and Tempo may also be run interactively.
II. ACADEMIC CODES, published or otherwise available
There is very little that we can report on codes in this category. We have tried to contact anyone who might be able to provide a code to any interested research worker, but we are painfully conscious that we may have missed some. On the other hand, by listing all those published which we have encountered, we have certainly included some which could not be recommended to anyone whose interest is in getting his problem solved (as opposed to one interested in developing algorithms). Since the amount of information we have on each code is very varied, we have not attempted to standardize our reports, but simply report what we know, alphabetically under author's name. As a general criticism, we feel that published codes should be accompanied by some warnings to the innocent reader: some discussion of the limitations of size and nature of the problems which he might reasonably expect to be able to solve using the code.
Bachem, A., Dohle, Hausmann, Petroll, Schrader
Institut für Ökonometrie und Operations Research, Universität Bonn. They have a long list of programs in Fortran IV for ILP-type problems, of which most are maintained, and they would supply them to a government institution or industry on request without charge. For general ILP, the list includes:
LEXISU: Solves ILPs with lower and upper bounds for all variables using Fabian's lexicographic search algorithm. (Schrader, Hausmann).
DZLP: Minimization of linear zero-one problems. (Schrader).
UNIVACBB: Branch and bound algorithm (Land-Doig method [30]) for solving mixed integer linear programs. (M. Petroll).
ILPCW1: Computes the k-th best solution of an integer linear program over a cone and checks its feasibility to the original integer program using Wolsey's algorithm [46]. (Dohle, Bachem).
The Institute has also prepared a matrix generator for the Land-Powell programs [31].
Bauer, F.L.
Algorithm 153 "Gomory". Communications of the A.C.M. No. 6, 1963 (p. 68), and No. 8, 1963 (p. 449). An Algol version of the Gomory all-integer algorithm [21]. See also "Gomory 1" by Langmaack, below.
Baugh, C.R., Liu, T.K., Muroga, S., Young, M.H.
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801.
ILLIP-2 (1977) is a development of ILLIP (ILLinois Integer Programming code), which has been in use both inside and outside the University of Illinois since 1968. It is an implicit enumeration code [1, 15], with some sophisticated refinements as described in [26]. The program is available in both FORTRAN and PL/1, and makes use of sparseness in the coefficient matrix. It is claimed to be two to four times faster on some small problems than DZIP1 (see Lemke and Spielberg, below).
Byrne, J.L. and Proll, L.G. (1968)
Algorithm 341, "Solution of Linear Programs in 0-1 Variables by Implicit Enumeration". Communications of the ACM No. 11, 1968 (p. 782), No. 12, 1969 (p. 692) and No. 4, 1970 (p. 263). An Algol procedure based on Geoffrion [15]. See also Algorithm 14, "Initial Solution to the Zero-One Integer Linear Programming Problem", Zastosowania Matematyki No. 3, 1971 (p. 347), by M. Syslo.
Brocklehurst, E.R., and Dennis, K.
Division of Numerical Analysis and Computing, National Physical Laboratory, Teddington, Middlesex TW11 0LW. They have three codes: two heuristic, one for mixed and one for pure integer, and one (optimizing) generalized branch and bound [7]. They are all in ANSI standard FORTRAN, and are running on a CDC 6500. All are under development, and are available to interested parties. "Suitable arrangements would be negotiated in each case, usually involving a charge in cash or kind".
Christofides, N. and Trypia, M. (1976)
Department of Management Science, Imperial College of Science and Technology, Exhibition Road, London SW7 2BX. This code for pure 0-1 ILP is machine-dependent on the CDC 6600/6400, since it uses bit manipulation. It uses a sequential approach to the solution [8] and claims to be very efficient, having solved unstructured problems with up to 240 variables, with the restriction that the coefficients are in the range ±20. The code would be made available free to anyone asking for it, but it will not necessarily be maintained. Some test problems could also be provided.
Fiala, František (1973)
Algorithm 449, "Solution of Linear Programming Problems in 0-1 Variables". Communications of the A.C.M. No. 7, 1973 (p. 445). A FORTRAN subroutine based on a modification of an algorithm by Hammer and Rudeanu [11, 24].
Fleisher, J.M., Shetty and Trotter
Academic Computing Center, The University of Wisconsin, 1210 West Dayton Street, Madison, Wisconsin 53706. The Computing Center maintains a variety of ILP codes which they will distribute, charging only for the costs of creating the tape, documentation and mailing. They are written in FORTRAN V for the Univac 1100 series, and would require a limited amount of modification for other computers.
IPMIXD: A FORTRAN subroutine, branch and bound for upper-bounded MIP problems, based on Land and Doig and developed from BBMIP (see under Shareshian, below) in the IBM Share Library. Storage requirement is generally excessive when there are more than 40 or 50 rows.
IPENUM: Pure IP FORTRAN subroutine in general upper-bounded variables, using implicit enumeration. Developed from ENUMER8 by Shetty and Trotter. It has a restart facility and has solved problems up to 60 × 60.
ENUMER8: Similar to IPENUM, but a complete program which reads data cards. Problem size up to 50 × 200.
IPZERO: Pure 0-1 integer FORTRAN subroutine, using an implicit enumeration algorithm designed by Hammer and Rudeanu [24]. It can find up to 30 alternate optima. The code allows problems up to 200 × 100-200, and has successfully solved 50 × 50 problems.
Geoffrion and McBride
Western Management Science Institute, Graduate School of Management, University of California, Los Angeles, California 90024.
GMINT: "an evolutionary descendant of the widely distributed RIP30C code". "This code is experimental and is available only to our research collaborators or others with whom we have a personal relationship". We mention this code although it does not meet our availability criterion, since we assume that it will be more widely available once it is in less of an experimental condition. It is in FORTRAN, has been run on IBM and CDC computers, and is LP-based branch and bound for 0-1 mixed integer problems (with many refinements).
Gorry, G.A., Northup, W.D., and Shapiro, J.F. (1973)
IPA: an integer programming algorithm using a group optimization problem derived from the optimum LP solution, with a fall-back to branch and bound. In the paper describing it [22] the code is described as being in FORTRAN, implemented on the IBM 360 system and on the Univac 1108, and particularly designed for transferability. It is available for use through the National Bureau of Economic Research, Computer Research Center, 575 Technology Square, Cambridge, Mass. 02139.
Later private communication suggests that IPA, although it has solved a variety of IP problems including an airline crew scheduling problem with 312 rows and 298 0-1 variables, is not now maintained.
SESAMIP: a large-scale interactive general-purpose MIP code under development by Northup and Shapiro, using Orchard-Hays' LP system SESAME as a subroutine. Details are in an MIT technical report [40] and a forthcoming paper [4]. A further code is being developed, using the same IP duality theory as SESAMIP, in an implicit enumeration 0-1 program by a Ph.D. student, J.S. D'Aversa.
Ibaraki, Ohashi and Mine
Department of Applied Mathematics and Physics, Faculty of Engineering, Kyoto University, Kyoto, Japan. A heuristic FORTRAN code for general mixed-integer problems [27]. Problems with 400 variables × 50 inequalities and 100-200 variables × 100-200 inequalities have been successfully solved. The programs have been run on a FACOM 230/60. The program is maintained, and will be made available to noncommercial institutions without charge, although data and results would be expected in return.
Keulemans, W.K.M.
Department of Mathematics, Technological University Eindhoven, P.O. Box 513, Eindhoven, The Netherlands. Two codes for pure integer programming and one for mixed integer programming. The codes have been written in Burroughs Extended Algol and therefore need some reprogramming to be transferable to another machine. The programs are maintained and, if requested, would be supplied to a government institution or to industry without charge, provided that when the programs are used the origin of the programs is properly mentioned in publications. A short description of the programs:
1. A code for the integer programming problem based on implicit enumeration [1, 15, 16, 19]. It can handle general integer problems, provided that suitable upper bounds for the integer variables are given. A special form of surrogate constraint is used, based on the reduced costs of slacks and integer variables. The size of the problems is limited by the restriction of computing time. Generally, good solutions are produced very fast, but proof of optimality may be time-consuming. The code has solved problems with up to 300 zero-one variables. It has been written as a procedure and therefore the user has to write his own input program.
2. A code for the integer programming problem based on the cutting plane principle. The code uses the algorithm proposed by Haldi [23]. This algorithm has been slightly modified: the choice of the pivot is such that the columns remain
lexicographically positive. In this way the number of cuts does not change, but the number of pivot steps decreases considerably.
3. A code for the mixed integer programming problem. The code is based on partitioning and implicit enumeration. It has not been designed for a special type of problem, but it can, depending on the structure of the problem, in some cases be easily adapted to special problems. This has been done, for example, for a school allocation problem. The code is also programmed as a procedure and therefore the user may make his own input program. There is, however, an input program which can be combined with the mixed integer code. A description in Dutch [28] of the required input is available.

Klein, Dieter
Institute of History and Social Science, Odense University, Niels Bohrs Alle, DK-5000 Odense, Denmark. Three APL algorithms, which he is willing to share with interested persons without charge, to solve all-integer problems, based on Garfinkel and Nemhauser's [13] descriptions of fractional, dual all-integer, and primal all-integer algorithms. Only small problems have been solved.

Kochman, G.
Department of Operations Research, Stanford University, Stanford, California 94305. A code for general integer variables, employing a branch-and-bound strategy, designed to solve block-angular integer programs by decomposition into subproblems. It is in FORTRAN on an IBM 370/168, has solved up to 32 × 40 problems, and is designed for problems with up to 10 subproblems and 1000 non-zero coefficients. The code is maintained and would be supplied without charge except to cover processing and handling. Documentation from Miss Gail Lemmond at the above address (Technical Reports SOL 76-21 and SOL 76-20).

Kucharczyk, J.
Algorithm 16, "Implicit Enumeration Algorithm for Solving Zero-one Integer Linear Programs", Zastosowania Matematyki No. 1, 1972 (p. 133). An Algol procedure, ilp01SW, based on Balas [1] with modifications [9, 18, 39].

Kuester, J.L. and Mize, J.H. (1973)
This book [29] is a compendium of optimization algorithms covering a wide range of constrained and unconstrained methods. Relevant to this survey are:
MINT: Mixed integer programming algorithm, using the branch and bound algorithm of Land and Doig [30].
ZONE: Zero-one programming, based on Balas [1]. The program is used by permission of Dr. Thomas L. Yates, Oregon State University. This program is a modification of an earlier code developed by Dr. C.C. Peterson, Purdue University.
Land, A. and Powell, S. (1973)
London School of Economics, Houghton Street, London WC2A 2AE. Published as a book [31] and also available on magnetic tape. This is a set of well-tested, machine-independent simplex-based FORTRAN subroutines, using an explicit (but reduced) inverse matrix. Relevant to this survey are:
(a) MIF ("Method of Integer Forms"), based on Gomory's first algorithm [20], but adding several constraints at each LP optimum, and recording them in the original variables rather than in the updated tableau. This enables reinversion to take place if necessary. This algorithm is successful on the limited class of problems where cutting planes are usually successful. Otherwise it will fail because the basis matrix has become too ill-conditioned to proceed further, and will terminate with a message.
(b) BB: a branch-and-bound procedure following the original Land and Doig [30] suggestion of fixing variables rather than narrowing their bounds. It uses a LIFO node choice and penalty branch choice, and has a restart facility. Variables have specified discrete step sizes rather than being constrained to simple integer values. This means that the matrix coefficients can be kept within an absolute range, 0.1 to 10.0. So long as this input limitation is achieved, the BB algorithm seems to be very robust, though it may, of course, take an excessive length of time to completely solve the problem and prove optimality. The principal limitation on the use of BB is the size of the explicit inverse.
The main intention of these programs is to enable a research worker to modify the FORTRAN routines to develop his own algorithms. They have been successfully used to combine cutting plane ideas with branch and bound (e.g. [36]), and could certainly be adapted to pursue a more sophisticated branch and bound strategy. The authors intend to pursue this themselves, as well as eliminating the explicit inverse feature. Current versions of the algorithms are maintained and freely available to users of the University of London Computing Centre, and copies in UPDATE format for CDC computers can be provided for use elsewhere. Eventually, an improved version in machine-independent FORTRAN will be prepared.
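To make the kind of search such a BB routine performs concrete, here is a toy LIFO branch-and-bound on a small 0-1 knapsack with a fractional bound. This is a generic sketch of the technique, not the book's FORTRAN routines:

# Toy LIFO branch-and-bound: LIFO node choice, bounding against the
# incumbent, branching by fixing one variable to 0 or 1. The LP bound
# here is the greedy fractional knapsack bound.

def lp_bound(values, weights, cap, fixed):
    """Greedy fractional bound with some variables fixed to 0/1."""
    total_v, total_w = 0.0, 0.0
    for j, one in fixed.items():
        if one:
            total_v += values[j]; total_w += weights[j]
    if total_w > cap:
        return None                       # fixing is infeasible
    free = sorted((j for j in range(len(values)) if j not in fixed),
                  key=lambda j: values[j] / weights[j], reverse=True)
    room = cap - total_w
    for j in free:
        take = min(1.0, room / weights[j])
        total_v += take * values[j]; room -= take * weights[j]
        if room <= 0:
            break
    return total_v

def knapsack_bb(values, weights, cap):
    best_v, best_x = 0.0, {}
    stack = [{}]                          # LIFO: each node fixes some vars
    while stack:
        fixed = stack.pop()
        bound = lp_bound(values, weights, cap, fixed)
        if bound is None or bound <= best_v:
            continue                      # infeasible or fathomed by bound
        if len(fixed) == len(values):     # all variables fixed: integer
            best_v, best_x = bound, fixed
            continue
        j = min(set(range(len(values))) - set(fixed))  # next free variable
        stack.append({**fixed, j: 0})
        stack.append({**fixed, j: 1})     # the '1' branch is explored first
    return best_v, best_x

print(knapsack_bb([10, 13, 7, 8], [3, 4, 2, 3], cap=7))
# -> (23.0, {0: 1, 1: 1, 2: 0, 3: 0})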
Langmaack, H.
Algorithm 263A, "Gomory 1". Communications of the A.C.M. No. 8, 1965 (p. 601) and No. 13, 1970 (p. 326). A revision of the algorithm by Bauer (see above), also in Algol, with a further amendment in 1970 by Proll to incorporate Wilson cuts [45].
This publication is the source of the U.K. National Algorithms Group ILP algorithm, H02BAF in NAGFLIB: 1137/762: MKS, August 1975.

Lemke, C.E., and Spielberg, K. (1966)
DZIP1: available from IBM, Program Information Department, 40 Saw Mill River Road, Hawthorne, New York 10532. An implicit enumeration Fortran IV code for 0-1 ILP, without simplex LP operations. It solves problems up to 80 × 150, and is described in [32]. Apparently also based on this paper is DZ1IP, obtainable from CDC.

Marsten, R.E., and Morin, T.L.
Professor Roy E. Marsten, College of Business and Public Administration, University of Arizona, 85721, and Professor Thomas L. Morin, School of Industrial Engineering, Purdue University, West Lafayette, Indiana 47906.
M&MDP is a computer code for solving multi-constraint knapsack problems. These are of the form: maximize cx subject to Ax ≤ b, x binary (or integer), where A is a non-negative matrix. The methodology is a combination of the branch-and-bound and dynamic programming approaches [33, 35]. The code has successfully solved to optimality many problems of sizes up to 30 constraints and 60 variables in less than a minute of computer time. The code is written in FORTRAN and can be run on any large or medium scale computer. The code has the very novel feature of being able to solve an integer program parametrically with respect to its right-hand side. Specifically, it will find all optimal solutions to the family of problems: maximize cx subject to Ax ≤ b + θd, x binary, for 0 ≤ θ ≤ 1. The code is maintained and is available on an individually negotiated basis.
SETPAR is a computer code for solving set partitioning problems. These are of the form: minimize cx subject to Ax = e, x binary, where A is a binary matrix and e is a vector of ones. The methodology is of the branch-and-bound type and is described in detail in "An Algorithm for Large Set Partitioning Problems" [34]. The code is written in FORTRAN and can be run on any large computer. It has successfully solved to optimality many problems of sizes up to 100 constraints and 5000 variables in a few minutes of computer time. The code is maintained and is available on an individually negotiated basis. A new, and as yet untested, version of the code is expected to be able to solve problems with up to 400 constraints and over 10,000 variables.

Nastansky, L.
Gesamthochschule Paderborn, 479 Paderborn, Pohlweg 55, Postfach 1621, W. Germany.
GLP01GEOF1 is an implicit (0,1)-enumeration code based on Geoffrion's papers [16, 17].
ALLINTLP: an all-integer cutting plane algorithm based on Gomory's method [21].
KNAP0101: a dynamic programming algorithm due to C. Gerhardt.
These programs are all in ALGOL and were developed primarily for teaching purposes. However, they can be made available as source programs on tape without charge (tape provided). They are extremely fully and well documented in German.

Plane, D.R., and McMillan, C.
This book [38] contains FORTRAN programs for the implicit enumeration solution of zero-one problems of up to 50 variables, and a cutting plane algorithm based on Gomory's Method of Integer Forms [20] for up to 30 variables.

Proll, L.G.
Centre for Computer Studies, University of Leeds, LS2 9JT, England. Dr Proll has three ILP codes: a cutting plane code (in ALGOL 60), an implicit enumeration code (in ALGOL 60) and a branch and bound code (in FORTRAN). They are intended primarily for running small examples for teaching purposes, but have been used for moderate-sized problems in several research studies. They are on the Leeds University DEC-10 and ICL 1906A, and are widely available to educational institutions through the 1900 Applications Software Committee. The codes would be made available to other research workers, but they would not be maintained and of course due acknowledgement should be made if any results are published.

Salkin, H.M. and Spielberg, K. (1969)
DZLP: available from IBM, Program Information Department, 40 Saw Mill River Road, Hawthorne, New York 10532. An implicit enumeration Fortran IV code to solve 0-1 all-integer problems, using a dynamic origin technique and LP optimization. The user can select various strategy options. The standard size is 50 × 150, and problems ranging from 12 to 50 variables with up to 31 rows were solved in less than 1 minute on an IBM 360/50.

Shareshian, R. BBMIP (1967)
This is in the IBM Share Library and available from IBM, Program Information Department, 40 Saw Mill River Road, Hawthorne, New York 10532. It has been widely distributed and it is certainly still in use in some places. It is a branch and bound FORTRAN code, using an explicit simplex tableau.
Späth, Helmuth (1975)
A collection of FORTRAN programs in a book [41] which, besides transportation, travelling salesman, knapsack, and max-flow algorithms, includes two ILP algorithms: a Gomory all-integer program, and an implicit enumeration algorithm. Write-ups and FORTRAN comments are in German. Some small test problems and their solutions are included.

Toyoda, Y. (1975)
Department of Management Engineering, College of Science and Engineering, Aoyama Gakuin University, 6-16-1 Chitosedai, Setagaya-Ku, Tokyo 157. The effective gradient method: a FORTRAN IV code to obtain quickly approximate solutions to large-scale multi-dimensional knapsack problems. It can solve problems involving more than a thousand zero-one variables in very modest computing times, as described in [43]. It is available free to users, although Professor Toyoda would like users to report to him the results they have with the method.

Trauth, C.A. and Woolsey, R.E.D.
"IPSC: a machine-independent integer linear program". Full FORTRAN listing and user's manual available from Professor R.E.D. Woolsey, Colorado School of Mines, Department of Mathematics, The University of Mineral Resources, Golden, Colorado 80401. This is a code based on Gomory's all-integer method [21]. There is a published report [44] on its success in solving some small problems.

Woolsey, R.E.D.
ARRIBA: a FORTRAN system to solve small ILP problems by a variety of algorithms, namely:
IPSC: a Gomory cutting plane code (see also Trauth and Woolsey);
PRIMAL: a primal cutting plane code based on the algorithm of Harris [25];
BALASG: a version of the original Balas 0-1 algorithm [1] with modifications by Glover [19].
This system is available from Dr W.O. Panadis, Box 0, Control Data Corporation, Minneapolis, Minnesota 55440, at a cost of about $25 for a tape of the source deck, test problems, and user manual.

Appendix. Computational experience
We thought that it would be of interest to include in the survey some computational experience of using the codes. We decided against testing the academic codes because of the differences between them. They were designed to solve different problems (e.g. 0-1 pure integer problems, mixed integer problems)
by a variety of means (e.g. group theoretic methods, cutting planes, branch and bound) and they all have different input formats. It would have been very difficult to distribute test problems to 26 authors in as many different formats. However, there were fewer commercial codes; they were all designed to solve mixed integer problems using the same methodology and they have similar input formats. 'MPS format', originally devised by IBM, has become an industry standard for linear programming problems. Most of the commercial codes can input a mixed integer programming problem in a variant of MPS format; the variation between the codes is in the method of defining the integer and set variables. When we approached each code manufacturer to ask for details of his code we also asked whether he would be willing to run some test problems. All generously agreed, but they stressed that, as we were in effect asking them for machine time and manpower, they would like the problems well in advance, and even then they could only run them if resources were available. As the test problems were only available at a late stage we decided that it was only possible, because of communication difficulties, to attempt to run the problems on computers that were available in Europe. Two manufacturers do not have, for their own use, an appropriate computer in Europe. Also, due to machine obsolescence or non-availability and staff re-organization, three manufacturers withdrew their offers to run test problems. Finally the test data were sent to five manufacturers. Unfortunately, due to pressure on their time and budgets, we only received computational results from IBM, Scicon and Univac. We are particularly grateful to Scicon and to Univac for sending us such a comprehensive set of results.

We had considerable difficulty finding a set of test problems. We looked for test problems that had been or were being used in industry, as these are the type of problems for which commercial codes are designed. We rejected as a source of test problems academic combinatorial problems that are designed to be particularly easy or difficult to solve. We wanted 'real' problems in a range of sizes, covering a variety of applications, and they must not all have been processed and formulated by the same commercial bureau. We were not able to find any problems involving special ordered sets. The problems we chose are described in Table A1. Only one of the problems, PLAN, with many general integer variables, was selected for its known difficulty of solution. However, every manufacturer considered that all these problems were difficult to solve.
Table A1. Test problems

Name                          PAINT   STEEL   COAL    PLAN
Rows*                         64      6       438     479
Columns                       133     319     209     819
Integer variables: general    23      0       0       390
                   0-1        0       319     95      0

* These do not include the objective function.
Table A2. Sciconic results

                      PAINT        STEEL        COAL         PLAN
Continuous optimum    811.84       290.93       149706.86    440.873
First solution
  value               909.7        364.0        151415.0     536.961
  branches            21           7            29           79
  iterations          58           29           147          344
  seconds             1.9          1.4          10.3         29.8
Best solution
  value               (a)          307.0        150850.0     482.339
  branches            (a)          611          1194         2265
  iterations          (a)          1329         4565         6861
  seconds             (a)          70.2         319.1        531.73
Complete search
  branches            202          636          8844         (b)
  iterations          533          1358         33410        (b)
  seconds             19.1         75.1         2464.8       (b)
Cost (£)
  first solution      1            1            6            16
  complete search     9            40           1160         291 (c)

(a) Same as first solution. (b) Search incomplete. (c) To the best solution found.
Table A2 gives the results of submitting the four test problems to Sciconic, manufactured by Scicon. The code was run on a Univac 1108 computer under the 1100 operating system, level 31. All the times and iteration counts are from the continuous optimum. An iteration is a basis change. The branch count is increased by one when a subproblem is solved. The timings, in seconds, are CPU times only; they do not include any I/O time. The costs, in pounds sterling (July 1977), are those that would be charged to a 'man-in-the-street' who arrived with a problem to be solved during prime shift.

Before Scicon ran these problems they were able, by studying the structure of the models, to modify the problems so as to make them easier to solve. To PAINT, 7 extra constraints and 7 extra variables were added, and priorities were specified so that the new variables were branched upon before the original variables. A similar strategy was followed for the solution of STEEL: extra constraints and variables were added and priorities set so that branching on the new variables was preferred to branching on the original variables. For each variable the down pseudo-costs were set to 1.0. The up pseudo-costs and the pseudo-shadow costs took their (very small) default values. Among the variables with equal highest priority, the branching variable x_k was chosen so that

O_k = max_j O_j, where O_j = max(D_j^+, D_j^-)

and D_j^+ and D_j^- are the up and down pseudo-shadow costs (v. Section 4.3.5). The setting of the down pseudo-costs and this branching variable option has the effect of selecting, as the branching variable, the variable with the largest
fractional part. In the immediate strategy the problem with min(D_k^+, D_k^-) was solved, that is, the up problem, forcing the branching variable to 1.0. The non-zero elements in the objective function of STEEL had integer values, so, as the problem is pure integer, the value of a solution will be integer. Thus during the solution of STEEL the cut-off was updated to be 0.99 less than the value of the incumbent. The STEEL problem was also run with the extra constraints and variables, the down pseudo-costs of 1.0, the same branching variable option and the updating of the cut-off, but with no priorities. With this strategy the first integer solution had a value of 307.0 but the complete search took longer. In COAL some of the rows were replaced by semi-continuous variables, and priorities and pseudo-costs were assigned to the 'important' variables. Redundant rows and variables were removed from PLAN prior to running the problem and priorities were assigned to the 'important' variables. For all the test problems, apart from the non-standard options described here, Sciconic was run with the default options.

Table A3 shows the results of submitting three of the test problems to FMPS, manufactured by Univac. The code was run on a Univac 1120 computer under the 1100 operating system, level 33. All the times and iteration counts are from the continuous optimum; an iteration is a basis change and the branch count is incremented having branched on a variable and upon starting to solve one of the subproblems. Thus most commonly the branch count is increased by two at a node, corresponding to solving the subproblems on the up and down branches. Each problem was submitted to FMPS in a sequence of runs. A run was stopped by one of the search parameters reaching its user-specified upper limit, e.g. the number of branches. At the end of a run the value of the incumbent was printed but the tree was not saved. In the succeeding run the value of the cut-off was changed: it was set at the incumbent value and the tree search was restarted from the continuous optimum.

FMPS cannot handle general integer variables so, in order to run PAINT, each general integer variable was replaced by five 0-1 variables. For each integer variable the five new variables were placed in consecutive columns, and the blocks of five variables were in the same order as the original general integer variables. The ordering is important because FMPS uses only priorities in the choice of branching variable and, by default, priorities are assigned in input order (v. Section 4.3.1). Associated with each block of five new 0-1 variables was a new constraint relating the new variables to the original variable. Thus the PAINT problem was input to FMPS as a problem with 87 rows and 248 variables, of which 115 are 0-1 variables. After the first run on PAINT the results of branching on the variables were inspected, and for the second and subsequent runs the variables were re-ordered, which has the effect of altering the priorities. In the solution of STEEL the results of the first and second runs were used to re-order the variables prior to the third run. Three of the test problems were submitted to MIP/370, manufactured by IBM.
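The survey does not say which encoding was used for the five 0-1 variables per general integer variable; a binary expansion, sketched below, is one common choice and matches the count of five variables for values up to 31:

# Hypothetical illustration of replacing a general integer variable by
# five 0-1 variables plus one linking constraint, as was done for PAINT.
# A binary expansion is assumed here: x = sum(2^i * y_i), covering 0..31.
# The variable names are invented for the example.

def binary_expansion_link(x_name, n_bits=5):
    """Return (new 0-1 variable names, linking constraint as text)."""
    ys = [f"{x_name}_y{i}" for i in range(n_bits)]
    terms = " + ".join(f"{2**i}*{y}" for i, y in zip(range(n_bits), ys))
    return ys, f"{x_name} = {terms}"

ys, link = binary_expansion_link("x7")
print(ys)    # -> ['x7_y0', 'x7_y1', 'x7_y2', 'x7_y3', 'x7_y4']
print(link)  # -> x7 = 1*x7_y0 + 2*x7_y1 + 4*x7_y2 + 8*x7_y3 + 16*x7_y4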
Table A3. FMPS results

                               PAINT*        STEEL         COAL
Continuous optimum             811.84        290.93        149706.86

First run
  first solution: value        1129.0        385.0         151288
    branches                   1610          431           89
    iterations                 3593          1003          1429
    seconds                    265           92            240
  best solution: value         same as       same as       150861
    branches                   first         first         107
    iterations                 solution      solution      1599
    seconds                                                284
  stopped at                   branch 1700   branch 1000   branch 120

Second run
  first solution: value        1041.59       384.0         -
    branches                   1299          1145
    iterations                 3453          2869
    seconds                    222           253
  best solution: value         973.1         same as       -
    branches                   7640          first
    iterations                 16903         solution
    seconds                    1200
  stopped at                   branch 10000  branch 1700   -

Third run
  first solution: value        -             309           -
    branches                                 15
    iterations                               39
    seconds                                  4
  best solution: value         -             307           -
    branches                                 21
    iterations                               51
    seconds                                  12
  stopped at                   -             branch 5000   -
                                             (after 6 mins)

* PAINT was converted from a general integer problem to a 0-1 problem.
The code was run on an IBM 370 series computer. For PAINT several integer solutions were found with values from 1041.6 to 909.7, and the latter was shown to be the optimal solution. For STEEL several integer solutions were obtained with values from 394 to 308, and for COAL two integer solutions with values of 157171 and 152617 were found. Regrettably we did not receive any iteration counts or computation times to go with these results.

It is frequently suggested that the solution of an integer programming problem can be made easier by the addition of extra constraints to the problem. Such constraints may not affect the continuous optimum, but they do aid the branch and bound search. In the case of these test problems, the addition of extra constraints and variables to PAINT and to STEEL enabled Sciconic to find a very good first integer solution to both problems early in the search. For solution by FMPS and MIP/370 neither of the problems had extra constraints or variables, and the value of the first solution is much further from that of the optimum solution.

Another much discussed strategy for the solution of both linear and integer programs is to preprocess the problem data so as to remove redundant constraints and columns. With its many general integer variables PLAN is clearly a difficult problem. Before submission to Sciconic some of the redundant rows and variables were removed from PLAN, and the code was able to find an integer solution early in the search. It seems likely that both this preprocessing and the use of priorities contributed to the solution of this problem. Although this paper is a survey of codes, not a survey of methods, this limited experience suggests that there is scope for further work to improve formulation and preprocessing of various kinds.
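The survey does not describe which reductions were applied to PLAN, but one standard presolve test is easily sketched, assuming finite bounds on the variables are known: a row sum_j a_j x_j <= b is redundant if even its largest achievable left-hand side satisfies b.

# Minimal sketch of one common presolve test (an illustration, not the
# procedure actually applied to PLAN).
def row_is_redundant(coeffs, b, lower, upper):
    # largest possible activity of the row under the variable bounds
    max_activity = sum(a * (u if a > 0 else l)
                       for a, l, u in zip(coeffs, lower, upper))
    return max_activity <= b

print(row_is_redundant([2, 3], 20, [0, 0], [2, 4]))  # True: 2*2 + 3*4 = 16 <= 20
print(row_is_redundant([2, 3], 10, [0, 0], [2, 4]))  # False: 16 > 10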
Acknowledgements

It will be obvious that this paper could never have been written without a great deal of help and information from a very large number of people. They were all very generous with their time, and patient and helpful when we pestered them for details of their codes. We are very grateful to all of them. Any misrepresentations of their codes are entirely our responsibility. We are very grateful to Janet Scott for patiently typing and retyping the paper through numerous revisions.
References
[1] E. Balas, An additive algorithm for solving linear programs with zero-one variables, Operations Res. 13 (1965) 517-546.
[2] E.M.L. Beale and J.J.H. Forrest, Global optimization using special ordered sets, Math. Programming 10 (1976) 52-69.
[3] E.M.L. Beale and J.A. Tomlin, Special facilities in a general mathematical programming system for nonconvex problems using ordered sets of variables, in: J. Lawrence, ed., Proceedings of the Fifth International Conference on O.R. (Tavistock Publications, London, 1970) 447-454.
[4] D.E. Bell and J.F. Shapiro, A convergent duality theory for integer programming, Operations Res. 25 (1977) 419-434.
[5] M. Benichou, J.M. Gauthier, P. Girodet, G. Hentges, G. Ribière and O. Vincent, Experiments in mixed-integer linear programming, Math. Programming 1 (1971) 76-94.
[6] M. Benichou, J.M. Gauthier, G. Hentges and G. Ribière, The efficient solution of large scale linear programming problems. Some algorithmic techniques and computational results, Math. Programming 13 (1977) 280-322.
[7] E.R. Brocklehurst, Generalized branch and bound: a method for integer linear programs, in: K.B. Haley, ed., Operational Research '75 (North-Holland, Amsterdam, 1976).
[8] N. Christofides and M. Trypia, A sequential approach to the 0-1 linear programming problem, SIAM J. Appl. Math. 31 (2) (September 1976).
[9] Z. Cylkowski and J. Kucharczyk, Solution of zero-one integer linear programming problems by Balas' method, Zastos. Mat. 11 (1969) 111-116.
[10] N.J. Driebeck, An algorithm for the solution of mixed integer programming problems, Management Sci. 12 (1966) 576-587.
[11] F. Fiala, Computational experience with a modification of an algorithm by Hammer and Rudeanu for linear 0-1 programming, Proc. ACM 1971 Nat. Conf., ACM, New York (1971) 482-488.
[12] J.J.H. Forrest, J.P.H. Hirst and J.A. Tomlin, Practical solution of large mixed integer programming problems with UMPIRE, Management Sci. 20 (5) (Jan. 1974) 736-773.
[13] R.S. Garfinkel and G.L. Nemhauser, Integer Programming (John Wiley, New York, 1972).
[14] J.-M. Gauthier and G. Ribière, Experiments in mixed-integer linear programming using pseudocosts, Math. Programming 12 (1977) 26-47.
[15] A.M. Geoffrion, Integer programming by implicit enumeration and Balas' method, SIAM Rev. 9 (2) (1967) 178-190.
[16] A.M. Geoffrion, An improved implicit enumeration approach for integer programming, Operations Res. 17 (1969) 437-454.
[17] A.M. Geoffrion and R.E. Marsten, Integer programming algorithms: a framework and state-of-the-art survey, Management Sci. 18 (1972) 465-491.
[18] G. Gerhardt, Gedanken zur Lösung des Knapsack-Problems, Ablauf- und Planungsforschung 11 (1970).
[19] F. Glover, A multiphase dual algorithm for the 0-1 integer programming problem, Operations Res. 13 (1965) 879-919.
[20] R.F. Gomory, An algorithm for integer solutions to linear programs, in: R. Graves and P. Wolfe, eds., Recent Advances in Mathematical Programming (McGraw-Hill, New York, 1963) 269-302.
[21] R.F. Gomory, An all-integer integer programming algorithm, in: J.F. Muth and G.L. Thompson, eds., Industrial Scheduling (Prentice-Hall, Englewood Cliffs, NJ, 1963).
[22] G.A. Gorry, W.D. Northup and J.F. Shapiro, Computational experience with a group theoretic integer programming algorithm, Math. Programming 4 (1973) 171-192.
[23] J. Haldi and L.M. Isaacson, A computing code for integer solutions to linear programs, Operations Res. 13 (1965) 946-959.
[24] P.L. Hammer and S. Rudeanu, Boolean Methods in Operations Research and Related Areas (Springer-Verlag, New York, 1968).
[25] P.M.J. Harris, The solution of mixed integer linear programs, Operational Res. Quart. 15 (1964) 117-133.
[26] T. Ibaraki, C.R. Baugh, T.K. Liu and S. Muroga, An implicit enumeration program for zero-one integer programming, Internat. J. Comp. Information Sci. (March 1972) 75-92.
[27] T. Ibaraki, T. Ohashi and H. Mine, A heuristic algorithm for mixed-integer programming problems, Math. Programming Study 2 (1974).
[28] W.K.M. Keulemans and R. Kool, Gebruikershandleiding voor het lineare programmeringssysteem Nathalie, Dept. of Mathematics, Eindhoven University of Technology (1973).
[29] J.L. Kuester and J.H. Mize, Optimization Techniques with FORTRAN (McGraw-Hill, New York, 1973).
[30] A. Land and A.G. Doig, An automatic method of solving discrete programming problems, Econometrica 28 (3) (July 1960) 497-520.
[31] A. Land and S. Powell, Fortran Codes for Mathematical Programming: Linear, Quadratic and Discrete (John Wiley, New York, 1973).
[32] C. Lemke and K. Spielberg, Direct search 0-1 and mixed integer programming, Operations Res. 15 (1967) 892-914.
[33] R.E. Marsten and T.L. Morin, A hybrid approach to discrete mathematical programming, O.R. 051-76, Operations Research Center, MIT, Cambridge, MA 02139 (to appear in Math. Programming).
[34] R.E. Marsten, An algorithm for large set partitioning problems, Management Sci. 20 (5) (1974) 774-787.
[35] R.E. Marsten and T.L. Morin, Parametric integer programming: the right-hand-side case, O.R. 050-76, Operations Research Center, MIT, Cambridge, MA 02139 (to appear in Ann. Discrete Math.).
[36] P. Miliotis, Integer programming approaches to the travelling salesman problem, Math. Programming 10 (1976) 367-378.
[37] G. Mitra, Investigation of some branch and bound strategies for the solution of mixed integer linear programs, Math. Programming 4 (1973) 155-170.
[38] D.R. Plane and C. McMillan, Discrete Optimization: Integer Programming and Network Analysis for Management Decisions (Prentice-Hall, Englewood Cliffs, NJ, 1971).
[39] L. Schrage and S. Woiler, A general structure for implicit enumeration, Technical Report, Stanford University, Stanford, California (1967) (mimeographed).
[40] J.F. Shapiro, A survey of Lagrangean techniques in discrete optimization, Technical Report, MIT Operations Research Center (May 1977).
[41] H. Späth, Ausgewählte Operations Research Algorithmen in FORTRAN, Verfahren der Datenverarbeitung (R. Oldenbourg Verlag, München-Wien).
[42] J.A. Tomlin, Branch and bound methods for integer and non-convex programming, in: J. Abadie, ed., Integer and Nonlinear Programming (North-Holland, Amsterdam, 1970).
[43] Y. Toyoda, A simplified algorithm for obtaining approximate solutions to zero-one programming problems, Management Sci. 21 (12) (1975) 1417-1427.
[44] C.A. Trauth and R.E. Woolsey, Integer linear programming: a study in computational efficiency, Management Sci. 15 (1969) 481-493.
[45] R.B. Wilson, Stronger cuts in Gomory's all-integer programming algorithm, Operations Res. 15 (1967) 155-156.
[46] L.A. Wolsey, Extensions of the group theoretic approach in integer programming, Management Sci. 18 (1) (1971).
[47] LP 360/370 Linear Programming System: User and Operating Manual (Nov. 1975), Haverly Systems Inc., 78 Broadway, Denville, N.J. 07834.
[48] Apex III Reference Manual (Revision E, 1976), Control Data Corporation, Data Services Publications HQCO2C, P.O. Box 0, Minneapolis, Minnesota 55440.
[49] Mathematical Programming System Extended (MPSX), Mixed Integer Programming (MIP) Program Description, Program No. 5734-XM4, First Edition (February 1971), IBM Corporation, Technical Publications Dept., 1133 Westchester Avenue, White Plains, New York 10604.
[50] Burroughs B 7700/B 6700 Systems, Tempo, Mathematical Programming System, User's Manual (1975), Documentation Department, Business Management and Scientific Systems, Burroughs Corporation, Burroughs Place, Detroit, Michigan 48232.
[51] Sperry-Univac 1100 Series Functional Mathematical Programming System (FMPS), Programmer Reference UP-8198 (March 1975).
[52] User's Guide to Sciconic (Version 3.2), Scicon Computer Services Ltd., Brick Close, Kiln Farm, Milton Keynes MK11 3EJ, U.K.
[53] Mathematical Programming System (MPS) Mixed Integer Programming User's Guide (October 1974), Order No. DD62, Rev. 0, Honeywell Information Systems, 60 Walnut Street, Wellesley Hills, Mass. 02181.
[54] IBM Mathematical Programming System Extended/370 (MPSX/370), Mixed Integer Programming/370 (MIP/370), Program Reference Manual, Program Products 5740-XM3 (OS/VS), 5746-XM2 (DOS/VS), Second Edition (November 1975), IBM Corporation, Technical Publications Dept., 1133 Westchester Avenue, White Plains, New York 10604.
[55] Ophelie LP User Manual, SIA (Service in Informatics and Analysis Ltd.), Ebury Gate, 23 Lower Belgrave Street, London SW1W 0NW.
[56] Linear Programming 400, System 4, ICL, Second Edition (June 1970), International Computers Ltd., ICL House, Putney, London SW15 1SW.
[57] Linear Programming Mark 3, 1900 Series (1967), International Computers Ltd., Software Distribution Dept., ICL House, Putney, London SW15 1SW.
Annals of Discrete Mathematics 5 (1979) 271-274 © North-Holland Publishing Company.
Report of the Session on CURRENT STATE OF COMPUTER CODES FOR DISCRETE OPTIMIZATION J.J.H. FORREST (Chairman) H. Crowder (Editorial Associate)
Introduction

The material for this section was contributed by Robert Harder, Charles Krabek, Susan Powell, Michel Gondran, Kurt Spielberg, Jeremy Shapiro, Harlan Crowder and others. As the session involved a great deal of discussion, it is not always possible to put forward a unified attitude to the present state of commercial codes. The discussion covered four main areas:
(1) general discussion of present codes,
(2) criticisms of some aspects of present codes,
(3) suggestions and forecasts for improvements to future codes,
(4) forecasts of improvements in software associated with computer codes.
1. General discussion of present codes

Robert Harder expressed the view of a user of commercial codes. At General Motors, the internal consulting group is faced with two general types of problem: short term - those requiring a fixed number of runs (e.g. investment studies) - and long term - those requiring an unspecified number of runs, possibly over a period of years (e.g. production scheduling). Short term problems are only solved a few times, and so good general purpose mixed integer or network codes may be used. These provide the flexibility required when the problem formulation may change during the study. However, in long term problems, flexibility is less important than efficiency, because the formulation will remain static during the life of the application. The greater the frequency of runs, the more significant an improvement in efficiency will be. In the past, general purpose codes have been designed to suit the short-term application.

It was noted that in many standard LP problems of this type, the cost of model formulation is much greater than the cost of model solution, possibly by a factor of ten to one or greater. As a result, most of an MPS (mathematical programming system) has nothing to do
with mathematical programming, but is concerned with the interface between the MPS, the matrix generator, and the report writer. The design of a general code must consider these interfaces as being more important to the user than the algorithm. In mixed integer problems, although the cost of model solution is increased while the cost of formulation remains static, the interface is still of prime interest to the user. However, the cost of model solution is determined by the efficiency of the algorithm, and this must be the designer's main concern.

It was observed that evaluation of the performance of different versions of the branch and bound algorithm over a representative range of problems is difficult, since there is no pool of such problems on which code designers can test improved algorithms and heuristics.

The feasibility of implementing cutting plane methods in commercial LP codes was discussed. Where it has been attempted, the experience did not appear to be promising. Although adding rows is possible, it sometimes causes excessive movement of data. The bigger problem is that if only cutting planes are used there is less chance of obtaining feasible but suboptimal solutions than with a branch and bound code. If branch and bound is combined with cutting planes, then other problems arise. For example, if problem characteristics are altered at each node, it may be necessary to store almost the whole problem at each branch. Among the reasons put forward for not implementing cutting plane methods in commercial codes were:
(i) The determination of which cut in a large set of cuts is the strongest is a computationally complex problem.
(ii) Cutting plane methods are mainly suited to pure integer problems, which are only a small proportion of actual problems.
(iii) Such methods are not suited to sparse matrices.
(iv) At a particular node, much effort may be expended producing a cut which, once the branch is made, becomes non-binding and therefore weak.

2. Criticisms of present codes
It was suggested that most large mathematical programming systems are designed and built as straight linear programming systems, with integer programming added as an afterthought. It was generally agreed that while most codes are designed with the knowledge that they will be used as a basis for an integer code, whenever there is a conflict of interest between the IP needs and the LP needs, it is the requirements of the LP system that are satisfied. Only branch and bound ideas, rather than other IP methods, were taken into consideration in commercial code design. These codes are designed as closed software systems which are not easy to modify. No provision is made in the design for particular portions of the code (e.g. the values of the integer variables after a branch) to be organized so that the information can easily be accessed and the algorithm
modified by users who are not versed in the data storage techniques used in LP codes. At present it may take several man-years for a commercial LP code to be adapted into a more academic test code. It may be impossible to provide a non-commercial code which is both reasonably efficient and also easy to modify. This, together with the lack of easily understandable access to the commercial codes, has to be overcome before new theoretically attractive algorithms can be implemented.
3. New algorithms for implementation

Michel Gondran outlined a totally different approach to solving combinatorial problems in ALICE - A Language for Intelligent Combinatorial Exploration. But in general the group supported Jeremy Shapiro's view that care should be taken not to make an artificial distinction between combinatorial problems and MIP problems; there is really a continuum of problems. At one end is MIP - the underlying LP is feasible for whatever values the integer variables take, as there are no implicit or explicit constraints on them. At the other end are purely combinatorial problems (e.g. set partitioning). Most problems fall somewhere in between. Hence it is artificial to talk about pure IP codes and MIP codes. The best approach is to design codes which contain various algorithms and let the solution strategy be determined by each particular problem.

It was agreed that commercially available codes are reasonably efficient at the MIP end of the spectrum. Several suggestions were made as to possible improvements to branch and bound codes to make them more efficient on problems towards the combinatorial end of the spectrum. Kurt Spielberg criticized the methods used in the choice of branching variables and in cutting down the combinatorial aspects of the problem (by fixing variables and tightening bounds). He suggested that good results could be obtained by generating 'logical inequalities' at each node, possibly using input data representing the implicit logical structure of the integer problem as well as information available from the code. A similar approach also mentioned was the use of dual methods to substantially decrease the duality gap between the LP and IP solutions without using cutting plane methods.
4. Surrounding software
As pointed out by Charles Krabek and others, a practitioner of the art of integer programming learns over the years how to formulate models so that they will run efficiently. In the field of LP this process has been automated to some extent by the use of matrix generator languages. In the case of IP, however, there is still a need for better modelling languages. In IP, a badly formed model can
make a problem insoluble. In currently available model formulation languages, no method has been devised for concisely defining integer structure. Such a clear definition would enable the user to select the most suitable algorithm. Although it is impossible to predict the amount of user knowledge of the underlying problem that will always be required, this process could become at least partially automated. Therefore, there would appear to be a real benefit in future work which addresses the problems of:
(i) finding a simplified notation for representing IP problems,
(ii) automatically detecting usable structure,
(iii) allowing simplification and compact representation of IP problems.

Another area of interest to the group was interfacing MP systems with other applications. In modern computing environments, it is common for the input data for MP systems (and other large computer application systems) to be obtained from large complex data base/data management systems. An MP system may have to accept data from, and also produce data for, such a system so that it can be used by another application program. The solution to this problem of data interchange can be expensive in manpower and computer time. Therefore it seems important to consider the need for such interfaces when designing and improving MP systems.
Taking into account the opinions of those involved in the session, it seems that there is a large class of problems for which the currently available codes are reasonably well suited, but that there are two main areas for improvement.
(i) Model formulation languages, and implementation of these languages to analyze the problem, so that advantage may be taken of any special structure in order that techniques such as network codes may be used.
(ii) Extending the range of soluble problems by allowing access to the implicit logical structure of the problem in a branch and bound code. This would allow ideas from implicit enumeration techniques and dual methods to be implemented.
For continued implementation of theoretical algorithms, ease of access to key portions of commercial codes will be necessary.
Annals of Discrete Mathematics 5 (1979) 275-278 © North-Holland Publishing Company
Report of the Session on CODES FOR SPECIAL PROBLEMS F. GIANNESSI (Chairman) P. Miliotis (Editorial Associate)
The sophisticated codes of integer programming come into their own primarily for special problems. If the structure of such problems is known and properly exploited, a special code can perform orders of magnitude better than a general purpose system. General production codes, however, are more robust, both in terms of the availability of a highly powerful code and in terms of the ease with which unforeseen reformulations can be accommodated. The challenge lies in bridging this gap.

Codes devised for specific purposes have a virtually unlimited number of possible applications. In spite of their potential, actual applications exist only for a relatively small number of fields. Airline scheduling problems are one of these fields. Manpower scheduling for reservations and for airports has been modeled as network, generalized set covering, linear assignment, and knapsack problems. Flight crew scheduling has led to the development of simple set covering or partitioning and multiple traveling salesmen models. Aircraft planning has been approached with linear assignment and dynamic programming methods. Common features of these problems are the many "dirty" constraints (union rules, government regulations, etc.), which cannot be expressed as linear constraints, and the large matrices which are generated. For example, one set covering problem for crew scheduling resulted in 500,000 rows and 100,000,000 columns. Crew scheduling involves constructing a minimum cost roster that ensures that every flight is manned by a legal crew. Flight sequences known as "rotations" are constructed to begin and end in a base city; they may be from 1 to 13 days in length, but they must comply with the minimum rest and maximum time constraints at all times.

Because of the difficulties in handling these enormous matrices, some airlines have tried the "small matrix" method (a schematic sketch follows this paragraph). This essentially takes only a small subset of the problem into consideration at a time. Columns are generated as needed, allowing the small matrices to be solved iteratively. After this is done a few thousand times, a good solution to the original problem exists. This approach has been used effectively by many American airlines; but European companies usually have a star-shaped network with few points in common, and if too small a subproblem is considered, there can be no improvements in that part of the problem. This approach has not been very successful for them.
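The loop structure of the "small matrix" method can be summarized in a short Python skeleton. This is a hypothetical illustration, not any airline's production code: solve_restricted and price_out are assumed interfaces that would wrap a set covering solver and a rotation generator, respectively.

# Hypothetical skeleton of the "small matrix" (column generation) idea:
# alternate between a restricted problem over the columns found so far
# and a pricing routine that proposes new columns (rotations).
def small_matrix_method(initial_columns, solve_restricted, price_out,
                        max_rounds=1000):
    columns = list(initial_columns)
    solution = None
    for _ in range(max_rounds):
        solution, duals = solve_restricted(columns)  # small matrix solved
        new_columns = price_out(duals)               # improving rotations
        if not new_columns:
            break                                    # none found: stop
        columns.extend(new_columns)
    return solution, columns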
While the preceding three types of problems are given a flight schedule to follow, some airlines have designed models that will create the schedule. The company's main goal is to sell flights; departure and arrival times are therefore selected to meet the passenger demand as closely as possible. But meeting the passenger demand is only part of the problem. An airline must also decide the type of aircraft to fly each flight and route the aircraft types throughout the whole system. The generated schedule, then, must determine the optimal departure times, aircraft assignment, and aircraft routing.

To create the flights, the pattern of passenger travel is defined by a twenty-four hour histogram for each city pair. Nonstop flights are created, for each aircraft type eligible to fly the route, to correspond to the peak travel times. More than one aircraft may actually be needed to meet the demand, causing the "flight" to become several flights with different combinations of aircraft. The aircraft selection and routing must be done simultaneously. This leads to an integer programming model with 6000 rows and 65,000 integer variables for an airline with 400 city pairs and four aircraft types. Although this would be a relatively small airline, even this problem cannot be solved by conventional integer programming codes. Special algorithms are necessary to take advantage of the network imbedded in the structure. This type of problem remains a challenge today. Much effort has also been spent in developing efficient models for managing aircraft, both for finding the optimal fleet size for assignment to flights, and for solving related problems such as fuel minimization.

Airlines are not the only field where special discrete optimization codes have been developed. Apart from other transportation companies such as railroads, which have similar problems, there are several areas which have experimented with and implemented these methods. Plant location problems are included in these areas. Models containing 100 plants and 100 destinations seem to behave very well if approximate solutions to the dual of the underlying relaxed linear program are constructed heuristically. Closely related are the distribution models, for which good results have been obtained with commercial codes.

Another area is machine maintenance. A problem with 50 machines and 52 time periods becomes a model with some 2500 integer variables. Even though this could possibly be solved by conventional integer programming, it would require a substantial amount of time. A special enumeration code has been developed, however, which handles the multiple choice sets and the conflict constraints implicitly. Excellent solutions are produced in minutes, even though the code is written in an inherently slower interpretive language.

These examples are particularly illustrative of the challenges which special problems pose. The need to develop an effective, robust code is one of the critical
aspects which influence the growing theory of discrete optimization and mathematical programming methods in general. Some of these aspects will be briefly discussed.

For many years, the major effort has been devoted to solving a "pure" model more efficiently, i.e., a simplification of a real situation. As a matter of fact, it is often impossible to devise a meaningful mathematical formulation of all the constraints and features of actual problems. A pure model is therefore derived by disregarding a set of constraints, sometimes called side constraints. For example, numerous applications have been solved using the travelling salesman problem by ignoring additional constraints. Any improvement in existing or new methods for pure models is important from a general viewpoint, but it may not represent a contribution to the development of the applications.

In recent years, a distinction has been introduced between model-oriented and problem-oriented methods; the meaning becomes clearer as the needs for solving actual problems are specified. Methods which obtain an optimal solution in a single step, like a black box, are of the former type. A lack of methods of the latter kind has been pointed out several times. If model-oriented methods ever reach a limit where no new techniques are developed, then only problem-oriented versions of existing methods would be produced. This would result in an ideal situation for the applications and may also stimulate theoretical developments.

It is difficult to systematically define the main requirements for problem-oriented methods. Often, real problems do not ask for an optimal solution, but for a way of improving a given feasible solution by taking side constraints into consideration. A desired feature of a solution method, therefore, consists of descending to an optimal solution step by step; at each step, a feasible solution (not only for the pure model, but for the real one also) and an upper bound on the distance from the optimum are available. When this is possible, many other features can be exploited. For example, interactive processing on a computer would allow more control of the solution process. The solution procedure then becomes an expanding tool of human experience, rather than an alternative to it. Branch and bound procedures have contributed to this direction, and their success might be explained by this in spite of their poor mathematical framework. Another possibility lies in the use of new improved control languages which permit the calling of the system's robust linear programming code. Interrogating and possibly changing the data are also permitted from the user's routines.

Matrix generation and, more generally, problem pre-analysis seem to be increasingly important aspects which may make algorithm applications successful. Pre-processing can produce a far easier problem to optimize. Until recently, such topics have not received much attention from a systematic viewpoint. Another important aspect is the decomposition of models or of systems.
Decomposition is a field where the definition of the mathematical method and the implementation of the computer procedure are not two distinct and consecutive phases; they are strongly intertwined. The major advantage is that the same amount of resources (computer time, for example) can solve larger problems. Since the proposal of the Dantzig-Wolfe decomposition method for linear programs, this direction has been pursued extensively. Another growing advantage is the possibility of maintaining an animated dialogue with the model to control the decomposition. This would require a hierarchical decentralization of the system into several subsystems. Benders decomposition, which has a deep mathematical background, has recently been recognized to be useful for many practical requirements. In our opinion, this tool will play a key role in the applications of mathematical programming.

Additional techniques are needed to remove other obstacles in solving application problems. Even though they are usually presented in the form of theoretical questions, they are practical also. One is the need to efficiently handle sets of constraints, most of which are redundant or not binding at the optimal point. Sequential methods seem to be the best answer. Another is a method to resolve degeneracy in linear problems; many combinatorial and continuous problems are most difficult for this reason alone.

All the aspects outlined here, and several others which should have been treated, can be condensed into the following questions, which in recent years have been posed more and more frequently. What is meant by the application of discrete optimization methods to real life? Is it enough to define a model and build a resolution algorithm, which are then used as a kind of logical support for reasoning? Or is it necessary that the model become an integral part of real life? How many of the proposed models have found an application? Do we know the causes of failures? Are the absence of applications or the failures due to the lack of an adequate model or of suitable software, including preprocessing and data base management? In several areas of transportation systems, many large problems of the scheduling type are treated by means of heuristic (commercial or not) codes, even though the literature contains models for them and the related algorithms are improved day by day. Have we any knowledge of the causes of such a dichotomy? Do we feel that the feedback from the needs of a real situation toward the definition of models and related algorithms should be improved?

We should have much more quantitative insight into these questions, rather than having to rely on personal impressions. It is believed that the creation of specialized organizations, like the optimization laboratories suggested by G.B. Dantzig, could be a contribution to answering such questions. They should play the role of maintaining a certain level of correspondence between methodological tools and real life problems.
Annals of Discrete Mathematics 5 (1979) 279-283 © North-Holland Publishing Company.
Report of the Session on CURRENT COMPUTER CODES S. POWELL (Chairman) P. Miliotis (Editorial Associate)
1. Current commercial codes
In their survey, Land and Powell found 11 commercially available computer codes for mixed integer programming (ref: D077). All of these codes use branch and bound, and all are designed to solve mixed integer programming problems with relatively few integer variables compared to the number of continuous variables. The codes are all at different stages of the life cycle: some are elderly and will soon be retired, while some are new, full of new ideas, and should continue to be developed over the years. It is interesting that of the 11 codes surveyed, only 5 are under current development.

The survey paper is possibly misleading, since it appears that all the codes have a bewildering number of options and that, in order to use a code successfully, the user is required to select the search strategies. This is not the case, since all of the codes have a recommended strategy that is the default and is employed by 90% of the users. The other options are only used if the branch and bound search becomes excessive or if the user knows something about his problem.

Given enough computing time, a branch and bound code will eventually find the optimal solution. However, because of the expense of such a complete search, users rarely prove optimality; they are usually satisfied with one or more good solutions. Thus code writers put considerable effort into developing default options whose aim is to find a "good" solution as soon as possible. Despite these provisions for a good default strategy, all the codes encourage user participation. Most codes allow the user to set priorities to define the branching order, specify objective function cut-off values, and control the tree search. In this way, the state of integer programming codes is similar to the state of linear programming codes 10 years ago. Then, in order to solve a large linear programming problem, the user had to adjust tolerances and parameters, whereas today the user expects the code to do these things automatically.

Over the past 5 years integer programming codes have become 10 times faster on the same computer. Half of this improvement is due to algorithmic improvements in the linear programming part of the branch and bound code, such as Kalan's matrix pool, being systematic about Paula Harris' scaling, and triangular factorisation. The other half of the improvement has come from tidying up the integer search.
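The interplay between incumbent and cut-off that these default strategies manage can be illustrated with a toy loop in Python. The sketch is purely schematic and reflects no particular commercial system: bound and branch are assumed interfaces standing in for an LP relaxation and a branching rule, and the incumbent minus 1 tightening corresponds, for all-integer objectives, to the "0.99 less than the incumbent" device mentioned earlier for STEEL.

# Toy branch and bound loop for a pure 0-1 minimization problem.
# bound(fixed) returns a lower bound for the node, whether its solution
# is integral, and the solution itself; branch(fixed) names the next
# variable to fix. Both are assumed interfaces.
def branch_and_bound(bound, branch, all_integer_costs=True):
    incumbent, best = float("inf"), None
    nodes = [{}]                               # a node fixes some variables
    while nodes:
        fixed = nodes.pop()
        value, integral, solution = bound(fixed)
        # cut-off: with an all-integer objective, only values at least
        # one better than the incumbent are worth pursuing
        cutoff = incumbent - 1 if all_integer_costs else incumbent - 1e-6
        if value > cutoff:
            continue                           # prune this subtree
        if integral:
            incumbent, best = value, solution  # new incumbent found
            continue
        j = branch(fixed)
        nodes.append(dict(fixed, **{j: 0}))    # down branch
        nodes.append(dict(fixed, **{j: 1}))    # up branch, explored first
    return incumbent, best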
With all these improvements it is now possible to solve problems that were previously not solvable.

There are many types of integer programming models, varying from on-line scheduling problems with a small number (not more than 50) of integer variables that are run 10 times a day, to long-term investment models that are only occasionally run. It has been estimated that since 1970 the number of problems that have been solved and used worldwide is in the thousands (less than 10,000), though the number of computer runs is of the order of 100,000.

The consulting firm Scicon estimates that about 90% of their mathematical programming system runs are linear programming and 10% are mixed integer and non-linear programming. However, about 90% of their clients consider it important to have the integer facility in case they wish to add integer variables to their model at short notice. The mere existence of a mixed integer capability in a mathematical programming system is more important in this case than the actual use of the facility.
2. Future commercial codes

The major commercial code developers consider that there will continue to be a demand for general mixed integer programming codes. Future codes will continue to use branch and bound with a linear programming code as a base, since linear programming codes efficiently handle large sparse matrices and since the solution of the relaxed linear programming problem is a "not bad" starting point for a branch and bound search. The aim of a general code will continue to be to solve any problem. It is unlikely that future general purpose codes will be based exclusively upon a particular data structure, e.g. a network, because, although this would make the solution of some problems faster, it would make other problems unsolvable.

There is considerable scope for the development of new algorithms. Hybridisation of algorithms (that is, combining integer programming procedures such as branch and bound with Lagrangean methods) was suggested as an area for new developments. The majority of models have a special structure, and algorithms should be developed to exploit this structure. Such algorithms will probably help in the solution of combinatorial problems which are currently difficult to solve. It is important to realise that modifying a commercial code is expensive, and the only ideas that will be implemented are those that are relatively easy to develop and inexpensive compared to the expected return.

The time taken to solve an integer programming problem is strongly influenced by a good analysis and formulation of the problem. Currently an experienced user will inspect his problem to see if new constraints that tighten the problem can be added, whether the bounds on the integer variables can be tightened, or if a set of variables can be excluded. He will analyse the problem to decide which are the
important integer variables and then assign priorities so that these variables are branched upon early in the tree search. It is anticipated that future codes will do much of this model analysis automatically, thus reducing manual intervention.
3. Data

Any attempt to compare different commercial codes is fraught with difficulties. The obvious method is to run all the codes on the same set of test problems. However, Land and Powell experienced great difficulty in finding suitable test problems. In addition, because of formatting difficulties, none of the problems tested had special ordered sets. The importance of a good analysis of the problem cannot be overstressed. Thus the results of running the test problems should be viewed cautiously because, in many ways, the problems test the people who formulate them more than the codes themselves. These cautionary remarks are applicable to any computational experiment which involves testing an integer programming code or algorithm.

There is an interaction between the problems that are submitted for solution and the available software. The availability of general mixed integer programming packages that inexpensively find good solutions to a well formulated problem with few integer variables means that customers submit this type of problem for solution. New algorithms are developed because customers have problems that they want solved and which are difficult to solve with current systems. Currently few all-integer problems are submitted for solution to a commercial branch and bound code, and when they are, the solution of such problems tends to be expensive. It is a moot point whether there are few commercial all-integer problems to be solved, or whether the inefficacy of existing software means that few all-integer problems are formulated.

The optimal solution of a linear programming problem is usually easy to obtain, while a good solution of an integer programming problem may be expensive and an optimal integer solution may be prohibitively expensive. By suitably formulating a model, it may be possible to use the linear programming solution and save the expense of finding an integer solution. In many situations, however, integer solutions are needed. Hence there is great interest among users in seeing the efficiencies of integer programming codes increased.
4. Special purpose codes

A special purpose code is designed to solve problems with a particular structure, such as networks, set covering problems, plant location problems, etc. By exploiting the structure, these codes are able to solve specialised problems faster than general purpose codes. The disadvantage of such a code is that only problems with the particular structure may be solved. There are successful codes which have
overcome this disadvantage. A network problem with additional linear constraints has been successfully solved by interlocking a special purpose network code and an efficient linear programming code. There exist special purpose branch and bound codes using an efficient linear programming system and a high level control language such as FORTRAN or PL/1; the latter can communicate directly with the linear programming data structures. These codes have branching rules which exploit the problem structure and allow the solution of large complex problems which are unsolvable with general purpose codes.

It is difficult to gauge the demand for special purpose codes. The general purpose code developers consider that the demand is inadequate to support the development and maintenance of specialised codes; others feel that there is a good market for such codes.
5. Academic contribution

Since Gomory first published his cutting plane algorithms, the academic world has devoted considerable effort to developing algorithms to solve integer programming problems. The original work on branch and bound was done by Land and Doig, who were academics. Little, Murty, Sweeney and Karel demonstrated that the travelling salesman problem could be solved by branch and bound procedures. From these beginnings, all the algorithmic development of branch and bound has been done by people working in commercial organisations. It is very striking that, with the exception of the initial branch and bound work, none of the methods proposed by the academic community are used in today's commercial codes. Some of the possible reasons for this are discussed below.

A lot more academic research would be implemented if more attention were paid to the practical problems of how these ideas are to be implemented in the context of current commercial codes. There are codes which are generally available that could be used in academic research for algorithmic development. For instance, the code recently released by Saunders and Tomlin at Stanford contains most of the elements of commercial codes with which one needs to be concerned. Academics, having done excellent work with new ideas on small test problems, need to extend it to larger problems. Unless this is done, algorithmic ideas tend not to get implemented. Often this testing demands a robust linear programming code. A solution to this problem is to use one of the codes referred to above or to interact with one of the commercial codes. In IBM's MPSX/370 there is a PL/1-based control language. The user can code an algorithm in PL/1 and access the linear programming routines and data whenever they are required. Currently the principal disadvantage of this solution is the high cost of renting the commercial code.
Academics often blur the difference between algorithmic and software development. Once the algorithmic steps have been developed, a lot of work remains to be done on software design. In a commercial code, 80-90% of the coding has nothing whatsoever to do with integer or linear programming; it is concerned with data handling features. The academic environment is possibly not the best one for the development of large scale codes.

A source of inspiration to commercial code developers is the customer's problems that he wants solved. It has been suggested that an academic, to be successful in algorithmic and code development, also needs a client with a problem. As in commercial organisations, this forces an academic to be concerned with real problems and to develop algorithms and codes to solve them. There are a few semi-experimental, semi-commercial codes that are currently finding good solutions to large (1000 rows and 200 integer variables) problems.
PART 5
APPLICATIONS
CONTENTS OF PART 5

Surveys
R.L. GRAHAM, E.L. LAWLER, J.K. LENSTRA and A.H.G. RINNOOY KAN, Optimization and approximation in deterministic sequencing and scheduling: a survey 287
J. KRARUP and P. PRUZAN, Selected families of location problems 327
S. ZIONTS, A survey of multiple criteria integer programming methods 389
Reports
Industrial applications (E.M.L. BEALE) 399
Modeling (D. KLINGMAN) 405
Location and distribution problems (J. KRARUP) 411
Communication and electrical networks (M. SEGAL) 417
Scheduling (A.H.G. RINNOOY KAN) 423
Annals of Discrete Mathematics 5 (1979) 287-326 © North-Holland Publishing Company
OPTIMIZATION AND APPROXIMATION IN DETERMINISTIC SEQUENCING AND SCHEDULING: A SURVEY R.L. GRAHAM Bell Laboratories, Murray Hill, NJ, U.S.A.
E.L. LAWLER University of California, Berkeley, CA, U.S.A.
J.K. LENSTRA Mathematisch Centrum. Amsterdam, The Netherlands
A.H.G. RINNOOY KAN Erasmus University, Rotterdam, The Netherlands

The theory of deterministic sequencing and scheduling has expanded rapidly during the past years. In this paper we survey the state of the art with respect to optimization and approximation algorithms and interpret these in terms of computational complexity theory. Special cases considered are single machine scheduling; identical, uniform and unrelated parallel machine scheduling; and open shop, flow shop and job shop scheduling. We indicate some problems for future research and include a selective bibliography.
1. Introduction

In this paper we attempt to survey the rapidly expanding area of deterministic scheduling theory. Although the field only dates back to the early fifties, an impressive amount of literature has been created and the remaining open problems are currently under heavy attack. An exhaustive discussion of all available material would be impossible - we will have to restrict ourselves to the most significant results, omitting detailed theorems and proofs. For further information the reader is referred to the classic book by Conway, Maxwell and Miller [Conway et al. 1967], the more recent introductory textbook by Baker [Baker 1974], the advanced expository articles collected by Coffman [Coffman 1976] and a few survey papers and theses [Bakshi & Arora 1969; Lenstra 1977; Liu 1976; Rinnooy Kan 1976].

The outline of the paper is as follows. Section 2 introduces the essential notation and presents a detailed problem classification. Sections 3, 4 and 5 deal with single machine, parallel machine, and open shop, flow shop and job shop problems, respectively. In each section we briefly outline the relevant complexity
results and optimization and approximation algorithms. Section 6 contains some concluding remarks.

We shall be making extensive use of concepts from the theory of computational complexity [Karp 1972, 1975]. An introductory survey of this area appears elsewhere in this journal [Lenstra & Rinnooy Kan 1978B], and hence terms like (pseudo)polynomial-time algorithm and (binary and unary) NP-hardness will be used without further explanation.

2. Problem classification
2.1. Introduction
Suppose that n jobs J_j (j = 1,...,n) have to be processed on m machines M_i (i = 1,...,m). Throughout, we assume that each machine can process at most one job at a time and that each job can be processed on at most one machine at a time. Various job, machine and scheduling characteristics are reflected by a 3-field problem classification α | β | γ, to be introduced in this section.

2.2. Job data

In the first place, the following data can be specified for each J_j:
- a number of operations m_j;
- one or more processing times p_j or p_ij, that J_j has to spend on the various machines on which it requires processing;
- a release date r_j, on which J_j becomes available for processing;
- a due date d_j, by which J_j should ideally be completed;
- a weight w_j, indicating the relative importance of J_j;
- a nondecreasing real cost function f_j, measuring the cost f_j(t) incurred if J_j is completed at time t.
In general, m_j, p_j, p_ij, r_j, d_j and w_j are integer variables.
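For concreteness, the job data above might be collected in a small record; a minimal illustrative sketch in Python, where the defaults r_j = 0, w_j = 1 and f_j(t) = t are conventions of the sketch, not of the classification itself:

# Illustrative container for the job data of Section 2.2.
from dataclasses import dataclass
from typing import Callable, Sequence, Union

@dataclass
class Job:
    p: Union[int, Sequence[int]]            # p_j, or p_ij per machine
    r: int = 0                              # release date r_j
    d: int = 0                              # due date d_j
    w: int = 1                              # weight w_j
    f: Callable[[int], int] = lambda t: t   # nondecreasing cost f_j(t)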
2.3. Machine environment We shall now describe the first field a = a,azspecifying the machine environment. Let odenote the empty symbol. If a I E { 0 , P,Q, R } , each Ji consists of a single operation that can be processed the processing time of Ji on Mi is pii. The four values are characon any M i ; terized as follows: a,= 0 : single machine; p l i = pi; a , = P: identical parallel machines; pi; = p i (i = 1, . . . , m ) ; a 1= Q: uniform parallel machines; pii = qipi for a given speed factor qi of Mi ( i = 1 , ..., m ) ; a ,= R: unrelated parallel machines. If a1= 0, we have an open shop, in which each Ji consists of a set of operations {Oli,.. . , Omi}.0, has to be processed on Mi during pii time units, but the order
in which the operations are executed is immaterial. If α_1 ∈ {F, J}, an ordering is imposed on the set of operations corresponding to each job. If α_1 = F, we have a flow shop, in which each J_j consists of a chain (O_1j,...,O_mj); O_ij has to be processed on M_i during p_ij time units. If α_1 = J, we have a job shop, in which each J_j consists of a chain (O_1j,...,O_{m_j,j}); O_ij has to be processed on a given machine μ_ij during p_ij time units, with μ_{i-1,j} ≠ μ_ij for i = 2,...,m_j. If α_2 is a positive integer, then m is constant and equal to α_2; if α_2 = ∘, then m is assumed to be variable. Obviously, α_1 = ∘ if and only if α_2 = 1.
2.4. Job characteristics
The second field β ⊆ {β_1,...,β_6} indicates a number of job characteristics, which are defined as follows.

(1) β_1 ∈ {pmtn, ∘}.
β_1 = pmtn: Preemption (job splitting) is allowed; the processing of any operation may be interrupted and resumed at a later time.
β_1 = ∘: No preemption is allowed.

(2) β_2 ∈ {res, res1, ∘}.
β_2 = res: The presence of s limited resources R_h (h = 1,...,s) is assumed, with the property that each J_j requires the use of r_hj units of R_h at all times during its execution. Of course, at no time may more than 100% of any resource be in use.
β_2 = res1: The presence of only a single resource is assumed.
β_2 = ∘: No resource constraints are specified.

(3) β_3 ∈ {prec, tree, ∘}.
β_3 = prec: A precedence relation < between the jobs is specified; J_j < J_k requires that J_j be completed before J_k can start.
β_3 = tree: The precedence relation takes the form of a rooted tree.
β_3 = ∘: No precedence relation is specified.

(4) β_4 ∈ {r_j, ∘}.
β_4 = r_j: Release dates that may differ per job are specified.
β_4 = ∘: We assume that r_j = 0 for all jobs.

(5) β_5 ∈ {m_j ≤ m̄, ∘}.
β_5 = m_j ≤ m̄: A constant upper bound m̄ on m_j is specified (only if α_1 = J).
β_5 = ∘: No such bound is specified.

(6) β_6 ∈ {p_ij = 1, p ≤ p_ij ≤ p̄, ∘}.
β_6 = p_ij = 1: Each operation has unit processing time.
β_6 = p ≤ p_ij ≤ p̄: Constant lower and upper bounds on p_ij are specified.
β_6 = ∘: No such bounds are specified.
2.5. Optimality criteria

The third field γ ∈ {f_max, Σ f_j} refers to the optimality criterion chosen. Given a schedule, we can compute for each J_j:
the completion time C_j;
the lateness L_j = C_j - d_j;
the tardiness T_j = max{0, C_j - d_j};
the unit penalty U_j = 0 if C_j ≤ d_j, and 1 otherwise.
The optimality criteria most commonly chosen involve the minimization of f_max ∈ {C_max, L_max}, where f_max = max_j {f_j(C_j)} with f_j(C_j) = C_j, L_j, respectively, or of Σ f_j ∈ {Σ C_j, Σ T_j, Σ U_j, Σ w_j C_j, Σ w_j T_j, Σ w_j U_j}, where Σ f_j = Σ_{j=1}^n f_j(C_j) with f_j(C_j) = C_j, T_j, U_j, w_j C_j, w_j T_j, w_j U_j, respectively.

It should be noted that Σ w_j C_j and Σ w_j L_j differ by a constant Σ w_j d_j and hence are equivalent. Furthermore, any schedule minimizing L_max also minimizes T_max and U_max, but not vice versa. The optimal value of γ will be denoted by γ*, the value produced by an (approximation) algorithm A by γ(A). If a known upper bound ρ on γ(A)/γ* is best possible in the sense that examples exist for which γ(A)/γ* equals or asymptotically approaches ρ, this will be denoted by a dagger (†).
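These criteria are straightforward to evaluate for a given schedule; a small Python sketch, assuming the completion times C_j, due dates d_j and weights w_j are supplied as parallel lists:

# Evaluate the common optimality criteria of Section 2.5 for a schedule.
def criteria(C, d, w):
    L = [c - dj for c, dj in zip(C, d)]       # lateness L_j
    T = [max(0, l) for l in L]                # tardiness T_j
    U = [1 if l > 0 else 0 for l in L]        # unit penalty U_j
    return {"Cmax": max(C), "Lmax": max(L),
            "sum_wC": sum(wj * cj for wj, cj in zip(w, C)),
            "sum_wT": sum(wj * tj for wj, tj in zip(w, T)),
            "sum_wU": sum(wj * uj for wj, uj in zip(w, U))}

print(criteria(C=[3, 5, 9], d=[4, 4, 10], w=[1, 2, 1]))
# {'Cmax': 9, 'Lmax': 1, 'sum_wC': 22, 'sum_wT': 2, 'sum_wU': 2}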
2.6. Examples
1 | prec | L_max: minimize maximum lateness on a single machine subject to general precedence constraints. This problem can be solved in polynomial time (Section 3.2).
R | pmtn | Σ C_j: minimize total completion time on a variable number of unrelated parallel machines, allowing preemption. The complexity of this problem is unknown (Section 4.4.3).
J3 | p_ij = 1 | C_max: minimize maximum completion time in a 3-machine job shop with unit processing times. This problem is NP-hard (Section 5.4.1).
2.7. Reducibility among scheduling problems

Each scheduling problem in the class outlined above corresponds to an 8-tuple (v_1,...,v_8), where v_i is a vertex of graph G_i, drawn in Fig. 2.i (i = 1,...,8). For two problems P' = (v'_1,...,v'_8) and P = (v_1,...,v_8), we write P' → P if, for i = 1,...,8, either v'_i = v_i or G_i contains a directed path from v'_i to v_i. The reader should verify that P' → P implies that P' reduces to P. The graphs thus define elementary reductions among
[Figs. 2.1-2.8 display the graphs G_1,...,G_8; in Fig. 2.1 (G_1), c denotes an integer constant.]
scheduling problems. It follows that if P' → P and P can be solved in polynomial time, then so can P'; if P' → P and P' is NP-hard, then P is NP-hard.
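Checking P' → P is a mechanical reachability test over the eight graphs. In the Python sketch below, the arcs of G_EXAMPLE are illustrative assumptions (they echo the fact that a single machine is a special case of identical, then uniform, then unrelated parallel machines), not a transcription of Figs. 2.1-2.8.

# Mechanical check of the elementary reductions of Section 2.7.
def reaches(graph, u, v):
    """Is there a directed path from u to v in graph (dict of arc lists)?"""
    if u == v:
        return True
    seen, stack = {u}, [u]
    while stack:
        for w in graph.get(stack.pop(), []):
            if w == v:
                return True
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return False

def reduces(P_from, P_to, graphs):
    """P' -> P iff every field of P' reaches the field of P in G_i."""
    return all(reaches(g, a, b) for g, a, b in zip(graphs, P_from, P_to))

G_EXAMPLE = {"1": ["P"], "P": ["Q"], "Q": ["R"]}   # assumed fragment of G_1
print(reaches(G_EXAMPLE, "1", "R"))   # True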
3. Single machine problems

3.1. Introduction

The single machine case has been the object of extensive research ever since the seminal work by Jackson [Jackson 1955] and Smith [Smith 1956]. We will give a brief survey of the principal results, classifying them according to the optimality criterion chosen. As a general result, we note that if all r_j = 0 we need
only consider schedules without preemption and without machine idle time [Conway et al. 1967].
3.2. Minimizing maximum cost

The most general result in this section is an O(n^2) algorithm to solve 1 | prec | f_max for arbitrary nondecreasing cost functions [Lawler 1973]. At each step of the algorithm, let S denote the index set of unscheduled jobs, let p(S) = Σ_{j∈S} p_j, and let S' ⊆ S indicate the jobs all of whose successors have been scheduled. One selects J_k for the last position among {J_j | j ∈ S} by requiring that f_k(p(S)) ≤ f_j(p(S)) for all j ∈ S'. For 1 | | L_max, this procedure specializes to Jackson's rule: schedule the jobs according to nondecreasing due dates [Jackson 1955].

Introduction of release dates turns this problem into a unary NP-hard one [Lenstra et al. 1977]. 1 | prec, r_j, p_j = 1 | L_max and 1 | pmtn, prec, r_j | L_max can still be solved in polynomial time: first update release and due dates so that they suitably reflect the precedence constraints, and then apply Jackson's rule continually to the set of available jobs [Lageweg et al. 1976]. Various elegant enumerative methods exist for solving 1 | prec, r_j | L_max. Baker and Su [Baker & Su 1974] obtain a lower bound by allowing preemption; their enumeration scheme simply generates all active schedules, i.e. schedules in which one cannot decrease the starting time of an operation without increasing the starting time of another one. McMahon and Florian [McMahon & Florian 1975] propose a more ingenious approach; a slight modification of their algorithm allows very fast solution of problems with up to 80 jobs [Lageweg et al. 1976].
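Lawler's rule translates directly into code; a minimal Python sketch, with p[j] the processing time, f[j] the cost function and succ[j] the successor list of job j:

# Lawler's O(n^2) rule for 1 | prec | f_max: repeatedly place, in the
# last open position, a job without unscheduled successors whose cost
# at the current total processing time is smallest.
def lawler(p, f, succ):
    n = len(p)
    unscheduled = set(range(n))
    t = sum(p)                       # completion time of the last job
    schedule = [None] * n
    for pos in range(n - 1, -1, -1):
        eligible = [j for j in unscheduled
                    if not (set(succ[j]) & unscheduled)]
        k = min(eligible, key=lambda j: f[j](t))
        schedule[pos] = k
        unscheduled.remove(k)
        t -= p[k]
    return schedule

# Jackson's rule is the special case f_j(C) = C - d_j with no precedence:
p = [2, 1, 3]
d = [6, 2, 5]
f = [lambda C, dj=dj: C - dj for dj in d]
print(lawler(p, f, succ=[[], [], []]))   # [1, 2, 0]: nondecreasing due dates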
3.3. Minimizing total cost

3.3.1. 1 | β | Σ w_j C_j

The case 1 | | Σ w_j C_j can be solved in O(n log n) time by Smith's rule: schedule the jobs according to nonincreasing ratios w_j/p_j [Smith 1956]. If all weights are equal, this amounts to the SPT rule of executing the jobs on the basis of shortest processing time first, a rule that is often used in more complicated situations without much empirical, let alone theoretical, support for its superior quality (cf. Section 5.4.2). This result has been extended to O(n log n) algorithms that deal with tree-like [Horn 1972; Adolphson & Hu 1973; Sidney 1975] and even series-parallel [Knuth 1973; Lawler 1978] precedence constraints; see [Adolphson 1977] for an O(n^3) algorithm covering a slightly more general case. The crucial observation to make here is that, if J_j < J_k with w_j/p_j < w_k/p_k and if all other jobs either have to precede J_j, succeed J_k, or are incomparable with both, then J_j and J_k are adjacent in at least one optimal schedule and can effectively be treated as one job with processing time p_j + p_k and weight w_j + w_k. By successive application of this device, starting at the bottom of the precedence tree, one will eventually obtain an optimal schedule.
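A minimal Python sketch of Smith's rule (the SPT rule is the special case with all weights equal to 1):

# Smith's rule for 1 | | sum w_j C_j: order jobs by nonincreasing
# w_j / p_j, i.e. by nondecreasing p_j / w_j.
def smith(p, w):
    order = sorted(range(len(p)), key=lambda j: p[j] / w[j])
    t, total = 0, 0
    for j in order:
        t += p[j]            # completion time C_j of job j
        total += w[j] * t
    return order, total

print(smith(p=[3, 1, 2], w=[1, 1, 1]))   # ([1, 2, 0], 10): the SPT order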
an optimal schedule. Addition of general precedence constraints results in NP-hardness, even if all p_j = 1 or all w_j = 1 [Lawler 1978; Lenstra & Rinnooy Kan 1978A]. If release dates are introduced, 1 | r_j | Σ C_j is already unary NP-hard [Lenstra et al. 1977]. In the preemptive case, 1 | pmtn, r_j | Σ C_j can be solved by an obvious extension of Smith's rule, but, surprisingly, 1 | pmtn, r_j | Σ w_j C_j is unary NP-hard [Labetoulle et al. 1978].
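A short illustration of Smith's rule may be helpful; the instance is invented.

```python
# Smith's rule for 1 || sum w_j C_j: order by nonincreasing w_j / p_j
# (equivalently nondecreasing p_j / w_j) and accumulate the weighted cost.
def smith(p, w):
    order = sorted(range(len(p)), key=lambda j: p[j] / w[j])
    t = cost = 0
    for j in order:
        t += p[j]              # completion time C_j
        cost += w[j] * t
    return order, cost

print(smith([3, 1, 2], [1, 1, 1]))   # unit weights: plain SPT -> ([1, 2, 0], 10)
```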
3.3.2. 1 || Σ w_j T_j
1 || Σ w_j T_j is a unary NP-hard problem [Lawler 1977; Lenstra et al. 1977], for which various enumerative solution methods have been proposed, some of which can be extended to cover arbitrary nondecreasing cost functions. Lower bounds developed for the problem involve a linear assignment relaxation using an underestimate of the cost of assigning J_j to position k [Rinnooy Kan et al. 1975], a fairly similar relaxation to a transportation problem [Gelders & Kleindorfer 1974, 1975], and relaxation of the requirement that the machine can process at most one job at a time [Fisher 1976]. In the latter approach, one attaches "prices" (i.e., Lagrangean multipliers) to each unit-time interval. Multiplier values are sought for which a cheapest schedule does not violate the capacity constraint. The resulting algorithm is quite successful on problems with up to 50 jobs, although a straightforward but cleverly implemented dynamic programming approach [Baker & Schrage 1978] offers a surprisingly good alternative. If all p_j = 1, we have a simple linear assignment problem, the cost of assigning J_j to position k being given by f_j(k). If all w_j = 1, the problem can be solved by a pseudopolynomial algorithm in O(n⁴ Σ p_j) time [Lawler 1977]; the computational complexity of 1 || Σ T_j with respect to a binary encoding remains an open question. Addition of precedence constraints yields NP-hardness, even for 1 | prec, p_j = 1 | Σ T_j [Lenstra & Rinnooy Kan 1978A]. If we introduce release dates, 1 | r_j, p_j = 1 | Σ w_j T_j can again be solved as a linear assignment problem, whereas 1 | r_j | Σ T_j is obviously unary NP-hard (cf. Section 2.7).
3.3.3. 1 || Σ w_j U_j
An algorithm due to Moore [Moore 1968] allows solution of 1 || Σ U_j in O(n log n) time: jobs are added to the schedule in order of nondecreasing due dates, and if addition of J_j results in this job being completed after d_j, the scheduled job with the largest processing time is marked to be late and removed. This procedure can be extended to cover the case in which certain specified jobs have to be on time [Sidney 1973]. The problem also remains solvable in polynomial time if we add agreeable weights (i.e., p_j < p_k implies w_j ≥ w_k).
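Moore's procedure is easily implemented with a heap; a sketch, under the assumption that jobs are given as (p_j, d_j) pairs:

```python
import heapq

# Moore's rule for 1 || sum U_j: add jobs in due date order; whenever the
# current job would finish late, drop the longest job scheduled so far.
def moore(jobs):
    heap, t = [], 0                      # max-heap on p_j (negated values)
    for p, d in sorted(jobs, key=lambda x: x[1]):
        heapq.heappush(heap, -p)
        t += p
        if t > d:
            t += heapq.heappop(heap)     # remove the largest processing time
    return len(heap)                     # number of on-time jobs
```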
Again, 1 | prec, p_j = 1 | Σ U_j is NP-hard [Garey & Johnson 1976A], even for chain-like precedence constraints [Lenstra -]. Of course, 1 | r_j | Σ U_j is unary NP-hard. The preemptive case 1 | pmtn, r_j | Σ U_j is an intriguing open problem. Very little work has been done on worst-case analysis of approximation algorithms for single machine problems. For 1 || Σ w_j U_j, Sahni [Sahni 1976] presents algorithms A_k with O(n³k) running time such that
Σ w_j Ū_j(A_k) / (Σ w_j Ū_j)* ≥ 1 − 1/k,

where Ū_j = 1 − U_j. For 1 | tree | Σ w_j Ū_j, Ibarra and Kim [Ibarra & Kim 1975] give algorithms B_k of order O(kn^{k+1}) with the same worst-case error bound.
4. Parallel machine problems
4.1. Introduction
Recall from Section 2.3 the definitions of identical, uniform and unrelated machines, denoted by P, Q and R, respectively. Nonpreemptive parallel scheduling problems tend to be difficult. This can be inferred immediately from the fact that P2 || C_max and P2 || Σ w_j C_j are binary NP-hard [Bruno et al. 1974; Lenstra et al. 1977]. If we are to look for polynomial algorithms, it follows that we should either restrict attention to the special case p_j = 1, as we do in Section 4.2, or concern ourselves with the Σ C_j criterion, as we do in the first three subsections of Section 4.3. The remaining part of Section 4.3 is entirely devoted to enumerative optimization methods and approximation algorithms for various NP-hard problems. The situation is much brighter with respect to preemptive parallel scheduling. For example, P | pmtn | C_max has long been known to admit a simple O(n) algorithm [McNaughton 1959]. Many new results for the Σ C_j, C_max and L_max criteria have been obtained quite recently. These are summarized in Section 4.4. With respect to other criteria, P2 | pmtn | Σ w_j C_j turns out to be NP-hard (see Section 4.4.1). Little is known about P | pmtn | Σ T_j and P | pmtn | Σ U_j; these problems remain open. However, we know from Section 3 that 1 | pmtn | Σ w_j T_j and 1 | pmtn | Σ w_j U_j are already NP-hard.
4.2. Nonpreemptive scheduling: unit processing times
4.2.1. Q | p_j = 1 | Σ f_j, Q | p_j = 1 | f_max
A simple transportation network model provides an efficient solution method for Q | p_j = 1 | Σ f_j and Q | p_j = 1 | f_max. Let there be n sources j (j = 1, ..., n) and mn sinks (i, k) (i = 1, ..., m; k = 1, ..., n). Set the cost of arc (j, (i, k)) equal to c_ijk = f_j(kq_i). The arc flow x_ijk is
to have the interpretation:

x_ijk = 1 if J_j is executed on M_i in the kth position, and x_ijk = 0 otherwise.
Then the problem is to minimize

Σ_{i,j,k} c_ijk x_ijk   or   max_{i,j,k} {c_ijk x_ijk}

subject to

Σ_{i,k} x_ijk = 1 for all j,
Σ_j x_ijk ≤ 1 for all i, k,
x_ijk ≥ 0 for all i, j, k.
The time required to prepare the data for this transportation problem is O(mn²). A careful analysis reveals that the problem can be solved (in integers) in O(n³) time. Since we may assume that m ≤ n, the overall running time is O(n³). It may be noted that some special cases can be solved more efficiently. For instance, P | p_j = 1 | Σ U_j can be solved in O(n log n) time [Lawler 1976A].

4.2.2. P | prec, p_j = 1 | C_max
P | prec, p_j = 1 | C_max is known to be NP-hard [Ullman 1975; Lenstra & Rinnooy Kan 1978A]. It is an open question whether this remains true for any constant value of m ≥ 3. The problem is in P, however, if the precedence relation is of the tree-type or if m = 2. P | tree, p_j = 1 | C_max can be solved in O(n) time by Hu's algorithm [Hu 1961; Hsu 1966; Sethi 1976A]; a sketch is given below. The level of a job is defined as the number of jobs in the unique path to the root of the precedence tree. At the beginning of each time unit, as many available jobs as possible are scheduled on the m machines, where highest priority is granted to the jobs with the largest levels. Thus, Hu's algorithm is a nonpreemptive list scheduling algorithm, whereby at each step the available job with the highest ranking on a priority list is assigned to the first machine that becomes available. It can also be viewed as a critical path scheduling algorithm: the next job chosen is the one which heads the longest current chain of unexecuted jobs. If the precedence constraints are in the form of an intree (each job has at most one successor), then Hu's algorithm can be adapted to minimize L_max; in the case of an outtree (each job has at most one predecessor), the L_max problem turns out to be NP-hard [Brucker et al. 1977]. P2 | prec, p_j = 1 | C_max can be solved in O(n²) time [Coffman & Graham 1972]. Previous polynomial-time algorithms for this problem are given in [Fujii et al. 1969, 1971; Muraoka 1971].
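A compact sketch of Hu's algorithm for an intree, where succ[j] is the unique successor of J_j (None for the root); the data layout is an assumption for the example:

```python
# Hu's level algorithm for P | tree, p_j = 1 | C_max: each time unit, run
# up to m available jobs, highest level (distance to the root) first.
def hu(succ, m):
    n = len(succ)
    level = [0] * n
    def depth(j):
        if level[j] == 0:
            level[j] = 1 if succ[j] is None else 1 + depth(succ[j])
        return level[j]
    for j in range(n):
        depth(j)
    npred = [0] * n                       # unscheduled predecessors
    for j in range(n):
        if succ[j] is not None:
            npred[succ[j]] += 1
    ready = {j for j in range(n) if npred[j] == 0}
    t, remaining = 0, n
    while remaining:
        batch = sorted(ready, key=lambda j: -level[j])[:m]
        for j in batch:
            ready.remove(j)
            remaining -= 1
            if succ[j] is not None:
                npred[succ[j]] -= 1
                if npred[succ[j]] == 0:
                    ready.add(succ[j])
        t += 1                            # one unit-length time slot
    return t                              # = C_max
```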
In the approach due to Fujii et al., an undirected graph is constructed with vertices corresponding to jobs and edges {j, k} whenever J_j and J_k can be executed simultaneously, i.e., J_j ≮ J_k and J_k ≮ J_j. An optimal schedule is then derived from a maximum cardinality matching in the graph. Such a matching can be found in O(n³) time [Lawler 1976B]. The Coffman-Graham approach leads to a list algorithm. First the jobs are labelled in the following way. Suppose labels 1, ..., k have been applied and S is the subset of unlabelled jobs all of whose successors have been labelled. Then a job in S is given the label k + 1 if the labels of its immediate successors are lexicographically minimal with respect to all jobs in S. The priority list is given by ordering the jobs according to decreasing labels. It is possible to execute this algorithm in time almost linear in n + a, where a is the number of arcs in the transitive reduction of the precedence graph (all arcs implied by transitivity removed) [Sethi 1976B]. Note, however, that construction of such a representation requires O(n^{2.81}) time [Aho et al. 1972]. Garey and Johnson present polynomial algorithms for P2 | prec, p_j = 1 | C_max where, in addition, each job becomes available at its release date and has to meet a given deadline. In this approach, one obtains an optimal schedule by processing the jobs in order of increasing modified deadlines. This modification requires O(n²) time if all r_j = 0 [Garey & Johnson 1976A] and O(n³) time in the general case [Garey & Johnson 1977]. We note that P | prec, p_j = 1 | Σ C_j is NP-hard [Lenstra & Rinnooy Kan 1978A]. Hu's algorithm does not yield an optimal Σ C_j schedule in the case of intrees, but in the case of outtrees critical path scheduling minimizes both C_max and Σ C_j [Rosenfeld -]. The Coffman-Graham algorithm also minimizes Σ C_j [Garey -]. As far as approximation algorithms for P | prec, p_j = 1 | C_max are concerned, the NP-hardness proof given in [Lenstra & Rinnooy Kan 1978A] implies that, unless P = NP, the best possible worst-case bound for a polynomial-time algorithm would be 4/3. The performance of both Hu's algorithm and the Coffman-Graham algorithm has been analyzed. When critical path (CP) scheduling is used, Chen and Liu [Chen 1975; Chen & Liu 1975] and Kunde [Kunde 1976] show that

C_max(CP)/C*_max ≤ 4/3 for m = 2,   C_max(CP)/C*_max ≤ 2 − 1/(m − 1) for m ≥ 3.
In [Kaufman 1972] an example is constructed for which no CP schedule is optimal. Lam and Sethi [Lam & Sethi 1977] use the Coffman-Graham (CG) algorithm to generate lists and show that

C_max(CG)/C*_max ≤ 2 − 2/m (m ≥ 2).
If SS denotes the algorithm which schedules as the next job the one having the greatest number of successors, then it can be shown [Ibarra & Kim 1976] that

C_max(SS)/C*_max ≤ 4/3 for m = 2.   (†)
Examples show that this bound does not hold for m ≥ 3. Finally, we mention some results for the more general case in which p_j ∈ {1, k}. For k = 2, both P2 | prec, 1 ≤ p_j ≤ 2 | C_max and P2 | prec, 1 ≤ p_j ≤ 2 | Σ C_j are NP-hard [Ullman 1975; Lenstra & Rinnooy Kan 1978A]. For P2 | prec, p_j ∈ {1, k} | C_max, Goyal [Goyal 1977B] proposes a generalized version of the Coffman-Graham algorithm (GCG) and establishes a worst-case performance bound for the case k = 2.
4.2.3. P | res, p_j = 1 | C_max
We now take up the variation in which resource constraints enter the model. P2 | res, p_j = 1 | C_max can be formulated and solved as a maximum cardinality matching problem in an obvious way. However, P2 | res1, tree, p_j = 1 | C_max and P3 | res1, p_j = 1 | C_max are unary NP-hard [Garey & Johnson 1975]. For the case P | res, prec, p_j = 1, m ≥ n | C_max, the following results for list scheduling (LS) using an arbitrary priority list are known [Garey et al. 1976A]:

C_max(LS)/C*_max ≤ (s/2)C*_max + 1,

and examples exist with

C_max(LS)/C*_max ≥ (s/2)C*_max + 1 − 2s/C*_max.

For the CP scheduling algorithm, the bound improves considerably:

C_max(CP)/C*_max ≤ 2s + 1 (s ≥ 0).   (†)

Let DMR denote the algorithm which schedules jobs according to decreasing maximum resource requirement. Then

C_max(DMR)/C*_max ≤ (17/10)s + 1.

In the other direction, examples are given in [Garey et al. 1976A] for any ε > 0 with

C_max(DMR)/C*_max ≥ (Σ_{i≥1} 1/a_i − ε)s,

where a_1 = 1 and a_{i+1} = a_i(a_i + 1) for i ≥ 1. An even better bound applies to the case of independent jobs, i.e., P | res, p_j = 1, m ≥ n | C_max:

C_max(LS) ≤ (s + 7/10)C*_max + 1 (s ≥ 1),

where the coefficient of C*_max is best possible.
The case P | res1, p_j = 1, m ≥ n | C_max has been the subject of intensive study (under the name of bin packing) during the past few years. The problem can be viewed as one of placing a number of items with weights r_{1j} into a minimum number of bins of capacity 1. It is also known as the one-dimensional cutting stock problem. It is for this scheduling model that some of the deepest results have been obtained. Rather than giving a complete survey of what is known for this model, we shall instead give a sample of typical results and refer the reader to the literature for details [Johnson 1973, 1974; Johnson et al. 1974; Graham 1976; Garey & Johnson 1976B]. Given a list L of items, the first-fit (FF) algorithm packs the items successively in the order in which they occur in L, always placing each item into the first bin into which it will validly fit (i.e., so that the sum of the weights in the bin does not exceed its capacity 1). The number of bins required by the packing is just the time required to execute the jobs using L as a priority list. If instead of choosing the first bin into which an item will fit, we always choose the bin for which the unused capacity is minimized, then the resulting procedure is called the best-fit (BF) algorithm. Finally, when L is first ordered by decreasing weights and then first-fit or best-fit packed, the resulting algorithm is called first-fit decreasing (FFD) or best-fit decreasing (BFD), respectively. The basic results which apply to these algorithms are the following [Johnson et al. 1974; Garey et al. 1976A]:

C_max(FF) ≤ (17/10)C*_max + 2;
C_max(FFD) ≤ (11/9)C*_max + 4;
C_max(BFD) ≤ (11/9)C*_max + 4.
The only known proofs of the last two inequalities are extremely lengthy. Examples can be given which show that the coefficients 17/10 and 11/9 are best possible. If constraints are placed on the resource requirements, i.e., r_{1j} ≤ r̄ for all j, then the following results hold:
z 3 i, then C,,,(BFD)
if '! 2 4 ,
then
if i:c& then
C,,,(BFD)
C,,,(FFD); = C,,,,,(FFD);
CmaX(FF)/C~,,~1+
if i: E (&, f], then
C,,,(FFD)
+
sGCzax c
for some constant c.
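The algorithms above are short to state in code; a sketch of first-fit decreasing (first-fit is the same loop without the sort):

```python
# First-fit decreasing for bin packing: items in decreasing weight order,
# each placed in the first bin with enough residual capacity.
def ffd(weights, capacity=1.0):
    bins = []                                  # residual capacity per bin
    for w in sorted(weights, reverse=True):
        for i, free in enumerate(bins):
            if w <= free + 1e-12:              # fits in an existing bin
                bins[i] = free - w
                break
        else:
            bins.append(capacity - w)          # open a new bin
    return len(bins)                           # = C_max with unit jobs
```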
For these and a number of similar results, the reader is referred to [Graham 1976]. Krause [Krause 1973] (see also [Krause et al. 1975, 1977]) considers the case P | res1 | C_max,
and he gives examples for which C_max(LS)/C*_max approaches 3 − 3/m. Krause also proves several bounds for the preemptive case P | pmtn, res1 | C_max, one of which is
C_max(DMR)/C*_max < 3 − 3/m (m ≥ 2).
Goyal [Goyal 1977A] studies the case P | res1, prec, p_j = 1 | C_max with the restriction that each resource requirement is either zero or 100%. Thus, two jobs both requiring the use of the resource can never be executed simultaneously. This problem is already NP-hard for m = 2 [Coffman 1976]. Goyal proves that
C_max(LS)/C*_max ≤ 3 − 2/m,   C_max(CG)/C*_max ≤ 3/2 for m = 2,

where in the latter case a priority list is formed according to the CG labelling algorithm described earlier.

4.3. Nonpreemptive scheduling: general processing times
4.3.1. P || Σ w_j C_j
The following generalization of the SPT rule for 1 || Σ C_j (see Section 3.3.1) solves P || Σ C_j in O(n log n) time [Conway et al. 1967]. Assume n = km (dummy jobs with zero processing times can be added if not) and suppose p_1 ≤ ··· ≤ p_n. Assign the m jobs J_{(j−1)m+1}, J_{(j−1)m+2}, ..., J_{jm} to m different machines (j = 1, ..., k) and execute the k jobs assigned to each machine in SPT order; a sketch follows below. Bruno, Coffman and Sethi [Bruno et al. 1974] consider the algorithm RPT: first apply list scheduling on the basis of largest processing time first (LPT), then reverse the order of jobs on each machine, and finally left-justify the schedule. RPT has the same behavior as LPT with respect to the C_max criterion (see Section 4.3.5.1); however, it only yields
Σ C_j(RPT)/Σ C*_j ≤ m.   (†)
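The O(n log n) rule of [Conway et al. 1967] amounts to a round-robin distribution of the SPT order; a sketch with an invented data layout:

```python
# P || sum C_j: give the j-th smallest job to machine j mod m and run each
# machine's jobs in SPT order; dummy zero-length jobs are implicit.
def parallel_spt(p, m):
    machines = [[] for _ in range(m)]
    for r, j in enumerate(sorted(range(len(p)), key=lambda j: p[j])):
        machines[r % m].append(j)
    total = 0
    for jobs in machines:
        t = 0
        for j in jobs:
            t += p[j]          # completion time of job j
            total += t
    return machines, total
```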
With respect to P || Σ w_j C_j, similar heuristics are described and tested empirically by Baker and Merten [Baker & Merten 1973]. Eastman, Even and Isaacs [Eastman et al. 1964] show that after renumbering
the jobs according to nonincreasing ratios w_j/p_j,

Σ_j w_j C_j ≥ (1/m) Σ_j w_j (Σ_{k=1}^{j} p_k) + ((m − 1)/2m) Σ_j w_j p_j for any m-machine schedule.
It follows from this inequality that the optimal value of the corresponding single machine problem yields a lower bound on Σ w_j C*_j. In [Elmaghraby & Park 1974; Barnes & Brennan 1977] branch-and-bound algorithms based on this lower bound are developed. Sahni [Sahni 1976] constructs algorithms A_k (in the same spirit as his approach for 1 || Σ w_j U_j mentioned in Section 3.3.3) with O(n(n²k)^{m−1}) running time for which

Σ w_j C_j(A_k)/Σ w_j C*_j ≤ 1 + 1/k.
For m = 2, the running time of A_k can be improved to O(n²k).

4.3.2. Q || Σ C_j
The algorithm for solving P || Σ C_j given in the previous section can be generalized to the case of uniform machines [Conway et al. 1967]. If J_j is the kth last job executed on M_i, a cost contribution kp_ij = kq_ip_j is incurred. Σ C_j is thus a weighted sum of the p_j and is minimized by matching the n smallest weights kq_i in nondecreasing order with the p_j in nonincreasing order. The procedure can be implemented to run in O(n log n) time [Horowitz & Sahni 1976].

4.3.3. R || Σ C_j
R || Σ C_j can be formulated and solved as an mn × n transportation problem [Horn 1973; Bruno et al. 1974]. Let
x_ijk = 1 if J_j is the kth last job executed on M_i, and x_ijk = 0 otherwise.
Then the problem is to minimize

Σ_{i,j,k} kp_ij x_ijk

subject to

Σ_{i,k} x_ijk = 1 for all j,
Σ_j x_ijk ≤ 1 for all i, k,
x_ijk ≥ 0 for all i, j, k.

This problem, like the similar one in Section 4.2.1, can be solved in O(n³) time.
4.3.4. Other cases: enumerative optimization methods
As we noted in Section 4.1, P2 || C_max and P2 || Σ w_j C_j are NP-hard. Hence it seems fruitless to attempt to find polynomial-time optimization algorithms for criteria other than Σ C_j. Moreover, P2 | tree | Σ C_j is known to be NP-hard, both for intrees and outtrees [Sethi 1977]. It follows that it is also not possible to extend the above algorithms to problems with precedence constraints. The only remaining possibility for optimization methods seems to be implicit enumeration. R || C_max can be solved by a branch-and-bound procedure described in [Stern 1976]. The enumerative approach for identical machines in [Bratley et al. 1975] allows inclusion of release dates and deadlines as well. A general dynamic programming technique [Rothkopf 1966; Lawler & Moore 1969] is applicable to parallel machine problems with the C_max, L_max, Σ w_j C_j and Σ w_j U_j optimality criteria, and even to problems with the Σ w_j T_j criterion in the special case of a common due date. Let us define F_j(t_1, ..., t_m) as the minimum cost of a schedule without idle time for J_1, ..., J_j subject to the constraint that the last job on M_i is completed at time t_i, for i = 1, ..., m. Then, in the case of f_max criteria,

F_j(t_1, ..., t_m) = min_{1≤i≤m} {max {f_j(t_i), F_{j−1}(t_1, ..., t_i − p_ij, ..., t_m)}},

and in the case of Σ f_j criteria,

F_j(t_1, ..., t_m) = min_{1≤i≤m} {f_j(t_i) + F_{j−1}(t_1, ..., t_i − p_ij, ..., t_m)}.
In both cases, the initial conditions are

F_0(t_1, ..., t_m) = 0 if t_i = 0 for i = 1, ..., m; ∞ otherwise.
Appropriate implementation of these equations yields O(mnC^{m−1}) computations for a variety of problems, where C is an upper bound on the completion time of any job in an optimal schedule. Among these problems are P || C_max, Q || L_max and Q || Σ w_j C_j. P || Σ w_j U_j can be solved in O(mn(max_j {d_j})^m) time. Still other dynamic programming approaches can be used to solve P || Σ f_j and P || f_max in O(m min {3^n, n2^nC}) time, but these are probably of little practical importance.
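For m = 2 the dynamic program above is easy to state concretely, since without idle time t_1 + t_2 equals the total work of J_1, ..., J_j and a single state variable suffices; a toy sketch for a Σ f_j criterion, with f[j](t) the cost of completing J_j at time t (an assumed interface):

```python
# Dynamic programming over machine finishing times for m = 2 identical
# machines (p_ij = p_j); the running time is O(nC), matching O(mnC^(m-1)).
def parallel_dp(p, f):
    C = sum(p)                      # upper bound on any completion time
    INF = float('inf')
    F = [0] + [INF] * C             # F_0(t1): zero only at t1 = 0
    done = 0
    for j, pj in enumerate(p):
        done += pj
        G = [INF] * (C + 1)
        for t1 in range(done + 1):
            t2 = done - t1
            best = INF
            if t1 >= pj and F[t1 - pj] < INF:    # J_j last on M_1
                best = F[t1 - pj] + f[j](t1)
            if t2 >= pj and F[t1] < INF:         # J_j last on M_2
                best = min(best, F[t1] + f[j](t2))
            G[t1] = best
        F = G
    return min(F)
```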
4.3.5. Other cases: approximation algorithms
4.3.5.1. P || C_max. By far the most studied scheduling model from the viewpoint of approximation algorithms is P || C_max. We refer to [Garey et al. 1978] for an easily readable introduction to the techniques involved in many of the "performance guarantees" mentioned below. Perhaps the earliest and simplest result on the worst-case performance of list
scheduling is given in [Graham 1966]:

C_max(LS)/C*_max ≤ 2 − 1/m.
If the jobs are selected in LPT order, then the bound can be considerably improved, as is shown in [Graham 1969]:

C_max(LPT)/C*_max ≤ 4/3 − 1/(3m).   (†)
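LPT itself is only a few lines; a sketch:

```python
import heapq

# LPT list scheduling for P || C_max: jobs in nonincreasing p_j order,
# each assigned to the currently least loaded machine.
def lpt(p, m):
    loads = [0.0] * m
    heapq.heapify(loads)
    for pj in sorted(p, reverse=True):
        heapq.heappush(loads, heapq.heappop(loads) + pj)
    return max(loads)              # = C_max(LPT)
```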
A somewhat better algorithm, called multifit (MF) and based on a completely different principle, is given in [Coffman et al. 1978]. The idea behind MF is to find (by binary search) the smallest "capacity" a set of m "bins" can have and still accommodate all jobs when the jobs are taken in order of nonincreasing p_j and each job is placed into the first bin into which it will fit. The set of jobs in the ith bin will be processed by M_i. If k packing attempts are made, the algorithm (denoted by MF_k) runs in time O(n log n + knm) and satisfies

C_max(MF_k)/C*_max ≤ 1.22 + (1/2)^k.

We note that if the jobs are not ordered by decreasing p_j, then all that can be guaranteed by this method is a worst-case ratio of 2.
The following algorithm Z_k was introduced in [Graham 1969]: schedule the k largest jobs optimally, then list schedule the remaining jobs arbitrarily. It is shown in [Graham 1969] that

C_max(Z_k)/C*_max ≤ 1 + (1 − 1/m)/(1 + ⌊k/m⌋),
and that when m divides k, this is best possible. Thus, we can make the bound as close to 1 as desired by taking k sufficiently large. Unfortunately, the best bound on the running time is O(n^{km}). A very interesting algorithm for P || C_max is given by Sahni [Sahni 1976]. He presents algorithms A_k with O(n(n²k)^{m−1}) running time which satisfy

C_max(A_k)/C*_max ≤ 1 + 1/k.
For m = 2, algorithm A_k can be improved to run in time O(n²k). As in the cases of 1 || Σ w_j U_j (Section 3.3.3) and P || Σ w_j C_j (Section 4.3.1), the algorithms A_k are based on a clever combination of dynamic programming and "rounding" and are beyond the scope of the present discussion. Several bounds are available which take into account the processing times of
the jobs. In [Graham 1969] it is shown that

C_max(LS)/C*_max ≤ 1 + (m − 1) max_j {p_j}/Σ_j p_j.
For the case of LPT, Ibarra and Kim [Ibarra & Kim 1977] prove that

C_max(LPT)/C*_max ≤ 1 + 2(m − 1)/n for n ≥ 2(m − 1) max_j {p_j}/min_j {p_j}.
The following local interchange (LI) algorithm gives a slight improvement over the original 2 − 1/m bound: assign jobs to machines arbitrarily, then move individual jobs and interchange pairs of jobs as long as C_max can be decreased by any such change. It then follows [Graham -] that

C_max(LI)/C*_max ≤ 2 − 2/(m + 1).
In [Bruno et al. 1974] the Conway-Maxwell-Miller (CMM) algorithm for solving P || Σ C_j (see Section 4.3.1) is considered. Let C*_max(CMM) be the minimum completion time among all schedules that can be generated by CMM. Then

C*_max(CMM)/C_max(LPT) ≤ 2 − 1/m,   (†)
C*_max(CMM)/C*_max ≤ 2 − 1/m.   (†)
An interesting variation on the C_max criterion arises in the work of Chandra and Wong [Chandra & Wong 1975]. They consider the case P || Σ B_i², where B_i denotes the completion time of the job executed last on M_i, and establish the surprisingly good behavior of LPT:

Σ B_i²(LPT)/Σ B_i²* ≤ 25/24.

They also construct examples for which Σ B_i²(LPT)/Σ B_i²* > 25/24 − c/m for a constant c > 0.
Finally, we mention the following result [Garey et al. -]. For any LPT schedule, let t_max denote the latest possible time at which a machine can become idle and let t_min denote the earliest time a machine can be idle. Then

t_max/t_min ≤ (4m − 2)/(3m − 1),

and this bound is best possible.

4.3.5.2. Q || C_max. In the literature on approximation algorithms for scheduling problems, it is usually assumed that unforced idleness (UI) of machines is not allowed, i.e., a machine cannot be idle when jobs are available. In the case of
identical machines, UI need not occur in an optimal schedule if there are no precedence constraints or if all p_j = 1. Allowing UI may yield better solutions, however, in the cases which are to be discussed in Sections 4.3.5.2-6. The optimal value of C_max under the restriction of no UI will be denoted by C*_max, the optimum if UI is allowed by C*_max(UI). Liu and Liu [Liu & Liu 1974A, 1974B, 1974C] study numerous questions dealing with uniform machines. We outline some of their results. For the case that q_1 = ··· = q_{m−1} = 1, q_m = q ≥ 1, they prove

C_max(LS)/C*_max ≤ 2(m − 1 + q)/(q + 2) for q ≤ 2,
C_max(LS)/C*_max ≤ (m − 1 + q)/2 for q > 2.
For the general case, they define the algorithm A_k as follows: schedule the k longest jobs first, resulting in a completion time of C_k(A_k), and schedule the remaining tasks for a total completion time of C_max(A_k). If C_max(A_k) > C_k(A_k), then a worst-case bound on C_max(A_k)/C*_max can be given in terms of k and the speeds q_i, where all q_i ≥ 1. This is best possible when the q_i are integers and Σ_i q_i divides k. Gonzalez, Ibarra and Sahni [Gonzalez et al. 1977] consider the following generalization LPT′ of LPT: assign each job, in order of nonincreasing processing time, to the machine on which it will be completed soonest. Thus, unforced idleness may occur in the schedule. For the case that q_1 = ··· = q_{m−1} = 1, q_m ≥ 1, they show that
C_max(LPT′)/C*_max ≤ 3/2 for m = 2,   C_max(LPT′)/C*_max ≤ 2 − 2/m for m > 2.

For the general case, they show

C_max(LPT′)/C*_max ≤ 2 − 2/(m + 1).
Also, examples are given for which C_max(LPT′)/C*_max approaches 3/2 as m tends to infinity.
4.3.5.3. R || C_max. Very little is known about approximation algorithms for this model. Ibarra and Kim [Ibarra & Kim 1977] consider six algorithms, typical of which is to schedule J_j on the machine that executes it fastest, i.e., on an M_i with minimum p_ij. For all six algorithms A they prove

C_max(A)/C*_max ≤ m,
with equality possible for four of the six. For the other two, they conjecture that a better bound holds.
For the special case R2 || C_max, they give a complicated algorithm G (however, with O(n log n) running time) such that

C_max(G)/C*_max ≤ (1 + √5)/2.
In a variation on R || C_max, we assume that each J_j has a processing time p_j and a fixed memory requirement |J_j| and that each M_i has a memory capacity |M_i|. We require that |M_i| ≥ |J_j| in order for M_i to be able to execute J_j, i.e., p_ij = p_j if |M_i| ≥ |J_j| and p_ij = ∞ otherwise.
Kafura and Shen [Kafura & Shen 1977] show that

C_max(LS)/C*_max ≤ 1 + log m.

They also note that when m is a power of 2, the bound can be achieved. Suppose a list is formed in order of decreasing |J_j|; this algorithm is denoted by LMF (largest memory first). It can be shown [Kafura & Shen 1977] that

C_max(LMF)/C*_max ≤ 2 − 1/m.
A refinement of LMF is LMTF, where ties in |J_j| are broken by decreasing order of p_j. In this case a stronger bound can be shown for m = 2. Kafura and Shen also give a complicated (but polynomial-time) algorithm 2D for which the worst-case bound is improved further.
Other results for this model may be found in [Kafura & Shen 1978].
4.3.5.4. P | prec | C_max. In the presence of precedence constraints it is somewhat unexpected [Graham 1966] that the 2 − 1/m bound still holds, i.e.,

C_max(LS)/C*_max ≤ 2 − 1/m.
Now, consider executing the set of jobs twice: the first time using processing times p_j, precedence constraints, m machines and an arbitrary priority list, the second time using processing times p′_j ≤ p_j, weakened precedence constraints, m′ machines and a (possibly different) priority list. Then [Graham 1966]

C′_max(LS)/C_max(LS) ≤ 1 + (m − 1)/m′.   (†)
Even when critical path (CP) scheduling is used, examples exist [Graham -] for which

C_max(CP)/C*_max = 2 − 1/m.

It is known [Graham -] that unforced idleness (UI) has the following behavior:

C*_max/C*_max(UI) ≤ 2 − 1/m.
Let C*_max(pmtn) denote the optimal value of C_max if preemption is allowed. As in the case of UI, it is known [Graham -] that

C_max(LS)/C*_max(pmtn) ≤ 2 − 1/m.   (†)
Liu [Liu 1972] shows that

C*_max(UI)/C*_max(pmtn) ≤ 2 − 2/(m + 1).

Relatively little is known in the way of approximation algorithms for the more special case P | tree | C_max. It is conjectured in [Denning & Scott Graham 1973] that

C_max(CP)/C*_max ≤ 2 − 2/(m + 1).
If true, this would be best possible, as examples show. For the special case that the precedence constraints form an intree, Kaufman [Kaufman 1974] shows that

C_max(CP) ≤ C*_max(pmtn) + (1 − 1/m) max_j {p_j}.
4.3.5.5. Q | prec | C_max. Liu and Liu [Liu & Liu 1974B] also consider the presence of precedence constraints in the case of uniform machines. They show that, when
unforced idleness or preemption is allowed,

C_max(LS)/C*_max(UI) ≤ 1 + (m − 1) max_i {q_i}/Σ_i q_i and C_max(LS)/C*_max(pmtn) ≤ 1 + (m − 1) max_i {q_i}/Σ_i q_i.

When all q_i = 1 this reduces to the earlier 2 − 1/m bounds for these questions on identical machines. Suppose that the jobs are executed twice: the first time using m machines of speeds q_1, ..., q_m, the second time using m′ machines of speeds q′_1, ..., q′_{m′}. Then the ratio of the two schedule lengths can be bounded in a way that generalizes the corresponding result for identical machines.
We mention here two rather special results of Baer [Baer 1974]. He constructs an algorithm B based on the CG labelling algorithm which has the following behavior: for Q2 | tree | C_max with q_2/q_1 = 3, he derives an explicit worst-case bound.
4.3.5.6. P | res, prec | C_max. The most general bound for P | res, prec | C_max is given in [Garey & Graham 1975]. It states that

C_max(LS)/C*_max ≤ m,   (†)
and, in fact, examples with s = 1 are given which achieve this bound. Thus, the addition of even a single resource in the presence of precedence constraints can have a drastic effect on the worst-case behavior of an arbitrary priority list. For P | res | C_max, it is shown in [Garey & Graham 1975] that for m ≥ 2

C_max(LS)/C*_max ≤ min {(m + 1)/2, s + 2 − (2s + 1)/m}.
With the restriction that m ≥ n, s ≥ 1, this can be improved to

C_max(LS)/C*_max ≤ s + 1.   (†)
The techniques used to prove this inequality involve an interesting application of Ramsey theory, a branch of combinatorics.
4.4. Preemptive scheduling
4.4.1. P | pmtn | Σ C_j
A theorem of McNaughton [McNaughton 1959] states that for P | pmtn | Σ w_j C_j there is no schedule with a finite number of preemptions which yields a smaller criterion value than an optimal nonpreemptive schedule. The finiteness restriction can be removed by appropriate application of results from open shop theory. It therefore follows that the procedure of Section 4.3.1 can be applied to solve P | pmtn | Σ C_j. It also follows that P2 | pmtn | Σ w_j C_j is NP-hard, since P2 || Σ w_j C_j is known to be NP-hard.

4.4.2. Q | pmtn | Σ C_j
McNaughton's theorem does not apply to uniform machines, as can be demonstrated by a simple counterexample. There is, however, a polynomial algorithm for Q | pmtn | Σ C_j. One can show that there exists an optimal preemptive schedule in which C_j ≤ C_k if p_j < p_k [Lawler & Labetoulle 1978]. Accordingly, first place the jobs in SPT order. Then obtain an optimal schedule by preemptively scheduling each successive job in the available time on the m machines so as to minimize its completion time [Gonzalez 1977]. This procedure can be implemented in O(n log n + mn) time and yields an optimal schedule with no more than (m − 1)(n − m/2) preemptions. It has been extended to cover the case in which Σ C_j is minimized subject to a common deadline for all jobs [Gonzalez 1977].

4.4.3. R | pmtn | Σ C_j
Very little is known about R | pmtn | Σ C_j. We conjecture that the problem is NP-hard. However, this remains one of the more vexing questions in the area of preemptive scheduling.

4.4.4. P | pmtn, prec | C_max
An obvious lower bound on the value of an optimal P | pmtn | C_max schedule is given by

max {max_j {p_j}, (1/m) Σ_j p_j}.
A schedule meeting this bound can be constructed in O(n) time [McNaughton 1959]: just fill the machines successively, scheduling the jobs in any order and splitting a job whenever the above time bound is met. The number of preemptions occurring in this schedule is at most m − 1. It is possible to design a class of problems for which this number is minimal, but the general problem of minimizing the number of preemptions is easily seen to be NP-hard. In the case of precedence constraints, P | pmtn, prec, p_j = 1 | C_max turns out to be NP-hard [Ullman 1976], but P | pmtn, tree | C_max and P2 | pmtn, prec | C_max can be
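A sketch of McNaughton's wrap-around rule; it returns the bound D together with the pieces assigned to each machine:

```python
# McNaughton's rule for P | pmtn | C_max: fill machines up to the lower
# bound D = max(max_j p_j, sum_j p_j / m), splitting jobs at the boundary.
def mcnaughton(p, m):
    D = max(max(p), sum(p) / m)
    schedule = [[] for _ in range(m)]
    i, t = 0, 0.0
    for j, pj in enumerate(p):
        left = pj
        while left > 1e-12:
            piece = min(left, D - t)
            schedule[i].append((j, t, t + piece))   # (job, start, end)
            t += piece
            left -= piece
            if D - t <= 1e-12:       # machine i is full; move on
                i, t = i + 1, 0.0
    return D, schedule
```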
solved by a polynomial-time algorithm due to Muntz and Coffman [Muntz & Coffman 1969, 1970]. This is as follows. Define l_j(t) to be the level of a J_j wholly or partly unexecuted at time t. Suppose that at time t, m′ machines are available and that n′ jobs are currently maximizing l_j(t). If m′ < n′, we assign m′/n′ machines to each of the n′ jobs, which implies that each of these jobs will be executed at speed m′/n′. If m′ ≥ n′, we assign one machine to each job, consider the jobs at the next highest level, and repeat. The machines are reassigned whenever a job is completed or threatens to be processed at a higher speed than another one at a currently higher level. Between each pair of successive reassignment points, jobs are finally rescheduled by means of McNaughton's algorithm for P | pmtn | C_max. The algorithm requires O(n²) time [Gonzalez & Johnson 1977]. Recently, Gonzalez and Johnson [Gonzalez & Johnson 1977] have developed a totally different algorithm that solves P | pmtn, tree | C_max by starting at the roots rather than the leaves of the tree; it determines priority by considering the total remaining processing time in subtrees rather than by looking at critical paths. The algorithm runs in O(n log m) time and introduces at most n − 2 preemptions into the resulting optimal schedule. Lam and Sethi [Lam & Sethi 1977], much in the same spirit as their work mentioned in Section 4.2.2, analyze the performance of the Muntz-Coffman (MC) algorithm for P | pmtn, prec | C_max. They show
C_max(MC)/C*_max ≤ 2 − 2/m (m ≥ 2).   (†)
4.4.5. Q | pmtn, prec | C_max
Horvath, Lam and Sethi [Horvath et al. 1977] adapt the Muntz-Coffman algorithm to solve Q | pmtn | C_max and Q2 | pmtn, prec | C_max in O(mn²) time. This results in an optimal schedule with no more than (m − 1)n² preemptions. A complicated, but computationally efficient, algorithm due to Gonzalez and Sahni [Gonzalez & Sahni 1978B] solves Q | pmtn | C_max in O(n) time, if the jobs are given in order of nonincreasing p_j and the machines in order of nonincreasing q_i. This procedure yields an optimal schedule with no more than 2(m − 1) preemptions, which can be shown to be a tight bound. The optimal value of C_max is given by

max {max_{1≤k≤m−1} {Σ_{j=1}^{k} p_j / Σ_{i=1}^{k} q_i}, Σ_{j=1}^{n} p_j / Σ_{i=1}^{m} q_i},

where p_1 ≥ ··· ≥ p_n and q_1 ≥ ··· ≥ q_m. This result generalizes the one given in Section 4.4.4. The Gonzalez-Johnson algorithm for P | pmtn, tree | C_max mentioned in the previous section can be adapted to the case Q2 | pmtn, tree | C_max. In [Horvath et al. 1977] it is shown that for Q | pmtn, prec | C_max, critical path
scheduling has the bound

C_max(CP)/C*_max ≤ (3m/2)^{1/2},

and examples are given for which the bound (m/8)^{1/2} is approached arbitrarily closely.

4.4.6. R | pmtn | C_max
Many preemptive scheduling problems involving independent jobs on unrelated machines can be formulated as linear programming problems [Lawler & Labetoulle 1978]. For instance, solving R | pmtn | C_max is equivalent to minimizing C_max subject to

Σ_{i=1}^{m} x_ij/p_ij = 1 (j = 1, ..., n),
Σ_{i=1}^{m} x_ij ≤ C_max (j = 1, ..., n),
Σ_{j=1}^{n} x_ij ≤ C_max (i = 1, ..., m),
x_ij ≥ 0 (i = 1, ..., m; j = 1, ..., n).
In this formulation x_ij represents the total time spent by J_j on M_i. Given a solution to the linear program, a feasible schedule can be constructed in polynomial time by applying the algorithm for O | pmtn | C_max discussed in Section 5.2.2. This procedure can be modified to yield an optimal schedule with no more than about 7m²/2 preemptions. It remains an open question as to whether O(m²) preemptions are necessary for an optimal preemptive schedule. For fixed m, it seems to be possible to solve the linear program in linear time. Certainly, the special case R2 | pmtn | C_max can be solved in O(n) time [Gonzalez et al. 1978]. We note that a similar linear programming formulation can be given for the minimization of L_max [Lawler & Labetoulle 1978].

4.4.7. P | pmtn, r_j | L_max
P | pmtn | L_max and P | pmtn, r_j | C_max can be solved by a procedure due to Horn [Horn 1974]. The O(n²) running time has been reduced to O(mn) [Gonzalez & Johnson 1977]. More generally, the existence of a feasible preemptive schedule with given release dates and deadlines can be tested by means of a network flow model in O(n³) time [Horn 1974]. A binary search can then be conducted on the optimal value of L_max, with each trial value of L_max inducing deadlines which are checked for feasibility by means of the network computation. It can be shown that this yields an O(n³ min {n², log n + log max_j {p_j}}) algorithm [Labetoulle et al. 1978].
4.4.8. Q | pmtn, r_j | L_max
In the case of uniform machines, the existence of a feasible preemptive schedule with given release dates and a common deadline can be tested in O(n log n + mn) time; the algorithm generates O(mn) preemptions in the worst case [Sahni & Cho 1977A]. More generally, Q | pmtn, r_j | C_max and, by symmetry, Q | pmtn | L_max are solvable in O(n²) time; the number of preemptions generated is O(n²) [Sahni & Cho 1977B; Labetoulle et al. 1978]. The feasibility test mentioned in the previous section has been adapted to the case of two uniform machines [Bruno & Gonzalez 1976] and extended to a polynomial-time algorithm for Q2 | pmtn, r_j | L_max [Labetoulle et al. 1978]. It appears not unlikely that the Gonzalez-Johnson algorithm for P | pmtn, tree | C_max and the above mentioned algorithm for Q | pmtn, r_j | C_max allow a common generalization that will make Q | pmtn, tree | C_max solvable in polynomial time.
5. Open shop, flow shop and job shop problems
5.1. Introduction
We now pass on to problems in which each job requires execution on more than one machine. Recall from Section 2.3 that in an open shop (denoted by O) the order in which a job passes through the machines is immaterial, whereas in a flow shop (F) each job has the same machine ordering (M_1, ..., M_m) and in a job shop (J) possibly different machine orderings are specified for the jobs. We survey these problem classes in Sections 5.2, 5.3 and 5.4, respectively. An obvious extension of this type of problem involves machines which can process more than one job at the same time. The resulting resource constrained project scheduling problems are extremely hard to solve. We refer to surveys by Davis [Davis 1966, 1973] that contain an extensive bibliography. We shall be dealing exclusively with the C_max criterion. Other optimality criteria usually lead to NP-hard problems, even for m = 2 [Garey et al. 1976B; Lenstra et al. 1977]; a notable exception is O2 || Σ C_j, which is open. Only a few enumerative algorithms for problems involving criteria other than C_max have been developed, e.g., for F2 || Σ C_j [Ignall & Schrage 1965], F || Σ w_j C_j [Townsend 1977A], F || L_max [Townsend 1977B], and J || Σ w_j T_j [Fisher 1973].
5.2. Open shop scheduling
5.2.1. Nonpreemptive case
The case O2 || C_max admits of an O(n) algorithm [Gonzalez & Sahni 1976]. A simplified exposition is given below. For convenience, let a_j = p_{1j}, b_j = p_{2j}. Let A = {J_j | a_j ≥ b_j}, B = {J_j | a_j < b_j}.
Now choose J_r and J_l to be any two distinct jobs (whether in A or B) such that

a_r ≥ max_{J_j∈A} {b_j},   b_l ≥ max_{J_j∈B} {a_j}.
Let A′ = A − {J_r, J_l}, B′ = B − {J_r, J_l}. We assert that it is possible to form feasible schedules for B′ ∪ {J_l} and for A′ ∪ {J_r} as indicated in Fig. 5.1, the jobs in A′ and B′ being ordered arbitrarily. In each of these separate schedules, there is no idle time on either machine, from the start of the first job on that machine to the completion of the last job on that machine. Let T_1 = Σ_j a_j, T_2 = Σ_j b_j. Suppose T_1 − a_r ≥ T_2 − b_l (the case T_1 − a_r < T_2 − b_l being symmetric). We then combine the two schedules as shown in Fig. 5.2, pushing the jobs in B′ ∪ {J_l} on M_2 to the right. Again, there is no idle time on either machine, from the start of the first job to the completion of the last job. We finally propose to move the processing of J_r on M_2 to the first position on that machine. There are two cases to consider.
(1) a_r ≤ T_2 − b_l. The resulting schedule is as in Fig. 5.3. The length of the schedule is max {T_1, T_2}.
(2) a_r > T_2 − b_l. The resulting schedule is as in Fig. 5.4. The length of the schedule is max {T_1, a_r + b_r}.
For any feasible schedule we obviously have that

C_max ≥ max {T_1, T_2, max_j {a_j + b_j}}.

Since, in all cases, we have met this lower bound, it follows that the schedules constructed are optimal.
Fig. 5.1.
Fig. 5.2.
Fig. 5.3.
Fig. 5.4.
There is little hope of finding polynomial-time algorithms for nonpreemptive open shop problems more complicated than O2 || C_max. The case O3 || C_max is binary NP-hard [Gonzalez & Sahni 1976], and O2 | r_j | C_max, O2 | tree | C_max and O || C_max are unary NP-hard [Lenstra -].

5.2.2. Preemptive case
The result on O2 || C_max presented in the previous section shows that there is no advantage to preemption for m = 2, and hence O2 | pmtn | C_max can be solved in O(n) time. More generally, O | pmtn | C_max is solvable in polynomial time as well [Gonzalez & Sahni 1976]. We already had occasion to refer to this result in Section 4.4.6. An outline of the algorithm, adapted from [Lawler & Labetoulle 1978], follows below. Let P = (p_ij) be the matrix of processing times and let

C = max {max_i {Σ_j p_ij}, max_j {Σ_i p_ij}}.

Call row i (column j) of P tight if Σ_j p_ij = C (Σ_i p_ij = C), and slack otherwise. We clearly have C*_max ≥ C.
It is possible to construct a feasible schedule for which C_max = C; hence this schedule will be optimal. Suppose we can find a subset S of strictly positive elements of P, with exactly one element of S in each tight row and in each tight column, and at most one element of S in each slack row and in each slack column. We shall call such a subset a decrementing set, and use it to construct a partial schedule of length δ, for some δ > 0. The constraints on the choice of δ are as follows.
(1) If p_ij ∈ S and either row i or column j is tight, then δ ≤ p_ij.
(2) If p_ij ∈ S and row i (column j) is slack, then δ ≤ p_ij + C − Σ_k p_ik (δ ≤ p_ij + C − Σ_k p_kj).
(3) If row i (column j) contains no element in S (and is therefore necessarily slack), then δ ≤ C − Σ_k p_ik (δ ≤ C − Σ_k p_kj).
For a given decrementing set S, let δ be the maximum value subject to (1), (2), (3). Then the partial schedule constructed is such that for each p_ij ∈ S, M_i processes J_j for min {p_ij, δ} units of time. We then obtain the matrix P′ from P by replacing each p_ij ∈ S by max {0, p_ij − δ}, and repeat the procedure until after a finite number of iterations P′ = 0. Joining together the partial schedules obtained for successive decrementing sets then yields an optimal preemptive schedule for P. By suitably embedding P in a doubly stochastic matrix and appealing to the Birkhoff-von Neumann theorem, it can be shown that a decrementing set can be found by solving a linear assignment problem; see [Lawler & Labetoulle 1978] for details. Other network formulations of the problem are possible. An analysis of various possible computations reveals that O | pmtn | C_max can be solved in O(r + min {m⁴, n⁴, r²}) time, where r is the number of nonzero elements in P [Gonzalez 1976].
5.3. Flow shop scheduling
5.3.1. F2 || C_max, F3 || C_max
A fundamental algorithm for solving F2 || C_max is due to Johnson [Johnson 1954]. He shows that there exists an optimal schedule in which J_j precedes J_k if min {p_{1j}, p_{2k}} ≤ min {p_{2j}, p_{1k}}. It follows that the problem can be solved in O(n log n) time: arrange first the jobs with p_{1j} < p_{2j} in order of nondecreasing p_{1j}, and then the remaining jobs in order of nonincreasing p_{2j}.
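Johnson's rule takes only a few lines; a sketch that also evaluates the resulting makespan:

```python
# Johnson's rule for F2 || C_max: jobs with p1 < p2 first in nondecreasing
# p1, then the rest in nonincreasing p2; M2 starts a job only after M1 ends it.
def johnson(p1, p2):
    n = len(p1)
    first = sorted((j for j in range(n) if p1[j] < p2[j]), key=lambda j: p1[j])
    last = sorted((j for j in range(n) if p1[j] >= p2[j]), key=lambda j: -p2[j])
    order = first + last
    t1 = t2 = 0
    for j in order:
        t1 += p1[j]                  # completion on M1
        t2 = max(t2, t1) + p2[j]     # completion on M2
    return order, t2
```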
The algorithm can be extended to the case in which each J_j has a start lag l_{1j} and a stop lag l_{2j}: defining

l_j = min {l_{1j} − p_{1j}, l_{2j} − p_{2j}}
and applying Johnson's algorithm to processing times (p_{1j} + l_j, p_{2j} + l_j) will produce an optimal permutation schedule, i.e., one with identical processing orders on all machines [Rinnooy Kan 1976]. If we drop the latter restriction, the problem is unary NP-hard [Lenstra -]. Similarly, some F3 || C_max problems can be solved by applying Johnson's algorithm to processing times (p_{1j} + p_{2j}, p_{2j} + p_{3j}), e.g., if there exists a θ ∈ [0, 1] such that

θp_{1j} + (1 − θ)p_{3j} ≥ p_{2k} for all (j, k)

[Johnson 1954; Burns & Rooker 1976] or if M_2 can process any number of jobs at the same time [Conway et al. 1967]. We refer to [Monma 1977] for further generalizations. The general F3 || C_max problem, however, is unary NP-hard, and the same applies to F2 | r_j | C_max and F2 | tree | C_max [Garey et al. 1976B; Lenstra et al. 1977]. It should be noted that an interpretation of precedence constraints which differs from our definition is possible. If J_j <′ J_k only means that O_{ij} should precede O_{ik} for i = 1, 2, then F2 | tree′ | C_max can be solved in O(n log n) time [Sidney 1977]. In fact, Sidney's algorithm applies even to series-parallel precedence constraints. The arguments used to establish this result are very similar to those referred to in Section 3.3.1 and apply to a larger class of scheduling problems [Monma & Sidney 1977]. It is an open question whether F2 | prec′ | C_max is NP-hard. Gonzalez and Sahni [Gonzalez & Sahni 1978A] consider the case of preemptive flow shop scheduling. They show that preemptions on M_1 and M_m can be removed without increasing C_max. Hence, Johnson's algorithm solves F2 | pmtn | C_max as well. F3 | pmtn | C_max turns out to be unary NP-hard.

5.3.2. F || C_max
As a general result, we note that there exists an optimal flow shop schedule with the same processing order on M_1 and M_2 and the same processing order on
M_{m−1} and M_m [Conway et al. 1967]. It is, however, not difficult to construct a 4-machine example in which a job "passes" another one between M_2 and M_3 in the optimal schedule. Nevertheless, it has become tradition in the literature to assume identical processing orders on all machines, so that in effect only the best permutation schedule has to be determined. Except for some rather simple worst-case results for heuristics, obtained by Gonzalez and Sahni [Gonzalez & Sahni 1978A], that are to be mentioned in Section 5.4.2, all research in this area has focused on enumerative methods. The usual enumeration scheme is to assign jobs to the lth position in the schedule at the lth level of the search tree. Thus, at a node at that level a partial schedule (J_{σ(1)}, ..., J_{σ(l)}) has been formed and the jobs with index set S = {1, ..., n} − {σ(1), ..., σ(l)} are candidates for the (l + 1)st position. One then needs to find a lower bound on the value of all possible completions of the partial schedule. It turns out that almost all lower bounds developed so far are generated by the following bounding scheme [Lageweg et al. 1978B]. Let us relax the capacity constraint that each machine can process at most one job at a time, for all machines but at most two, say, M_u and M_v (1 ≤ u ≤ v ≤ m). We then obtain a problem of scheduling {J_j | j ∈ S} on five machines N_{*u}, M_u, N_{uv}, M_v, N_{v*}, in that order, which is specified as follows. Let C(σ, i) denote the completion time of J_{σ(l)} on M_i. N_{*u}, N_{uv} and N_{v*} have infinite capacity; the processing times on these machines are defined by
q_{*uj} = Σ_{h=1}^{u−1} p_{hj},   q_{uvj} = Σ_{h=u+1}^{v−1} p_{hj},   q_{v*j} = Σ_{h=v+1}^{m} p_{hj}.
M_u and M_v have capacity 1 and processing times p_{uj} and p_{vj}, respectively. Note that we can interpret N_{*u} as yielding release dates q_{*uj} on M_u and N_{v*} as setting due dates −q_{v*j} on M_v, with respect to which L_max is to be minimized. Any of the machines N_{*u}, N_{uv}, N_{v*} can be removed from this problem by underestimating its contribution to the lower bound to be the minimum processing time on that machine. Valid lower bounds are obtained by adding these contributions to the optimal solution value of the remaining problem. For the case that u = v, removing N_{*u} and N_{v*} from the problem produces the machine-based bound used in [Ignall & Schrage 1965; McMahon 1971]:

max_{1≤u≤m} {min_{j∈S} {q_{*uj}} + Σ_{j∈S} p_{uj} + min_{j∈S} {q_{u*j}}}.

Removing only N_{*u} results in a 1 || L_max problem on M_u, which can be solved by Jackson's rule (Section 3.2) and provides a slightly stronger bound.
If u ≠ v, removal of N_{*u}, N_{uv} and N_{v*} yields an F2 || C_max problem, to be solved by Johnson's algorithm (Section 5.3.1). As pointed out in that section, solution in polynomial time remains possible if N_{uv} is taken fully into account; the resulting bound dominates the job-based bound proposed in [McMahon 1971] and is the best one currently available. All other variations on this theme (e.g., taking u = v and considering the resulting 1 | r_j | L_max problem) would involve the solution of NP-hard problems. The development of fast algorithms or strong lower bounds for these problems thus emerges as a possibly fruitful research area. The computational performance of branch-and-bound algorithms for F || C_max might be improved by the use of elimination criteria. Particular attention has been paid to conditions under which all completions of (J_{σ(1)}, ..., J_{σ(l)}, J_j) can be eliminated because a schedule at least as good exists among the completions of (J_{σ(1)}, ..., J_{σ(l)}, J_k, J_j). If all information obtainable from the processing times of the other jobs is disregarded, the strongest condition under which this is allowed is as follows. Defining Δ_i = C(σkj, i) − C(σj, i), we can exclude J_j for the next position if max {Δ_{i−1}, Δ_i} ≤ p_ij (i = 2, ..., m) [McMahon 1969; Szwarc 1971, 1973]. Inclusion of these and similar dominance rules can be very helpful from a computational point of view, depending on the lower bound used [Lageweg et al. 1978B]. It may be worthwhile to consider further extensions that, for instance, involve the processing times of the unscheduled jobs [Gupta & Reddi 1978; Szwarc 1978].

5.3.3. No wait in process
In a variation on the flow shop problem, each job, once started, has to be processed without interruption until it is completed. This no wait constraint may arise out of certain job characteristics (e.g., the "hot ingot" problem in which metal has to be processed at continuously high temperature) or out of the unavailability of intermediate storage in between machines. The resulting F | no wait | C_max problem can be formulated as a traveling salesman problem with cities 0, 1, ..., n and intercity distances

c_{jk} = max_{1≤i≤m} {Σ_{h=1}^{i} p_{hj} − Σ_{h=1}^{i−1} p_{hk}},

where p_{i0} = 0 (i = 1, ..., m) [Piehler 1960; Reddi & Ramamoorthy 1972; Wismer 1972].
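Under the reconstruction of the distance formula given above, the TSP data are easily generated; here p is an m × (n+1) matrix whose 0th column is the dummy job (an assumed layout):

```python
# Intercity distances for F | no wait | C_max as a TSP: c(j, k) is the delay
# between the starts of J_j and an immediately following J_k.
def no_wait_distance(p, j, k):
    m = len(p)
    return max(sum(p[h][j] for h in range(i + 1)) -
               sum(p[h][k] for h in range(i))
               for i in range(m))
```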
We refer to [Lenstra & Rinnooy Kan 1975] for an extension of this formulation to certain job shop systems and to [Van Deman & Baker 1974] for a branch-and-bound approach to F | no wait | Σ C_j. For the case F2 | no wait | C_max, the traveling salesman problem assumes a special structure and the results from [Gilmore & Gomory 1964] can be applied to yield an O(n²) algorithm [Reddi & Ramamoorthy 1972]. Both F | no wait | C_max
and F | no wait | Σ C_j are unary NP-hard [Lenstra et al. 1977], and the same is true for O2 | no wait | C_max and J2 | no wait | C_max [Sahni & Cho 1977C]. In spite of challenging prizes awarded for their solution [Lenstra et al. 1977], F3 | no wait | C_max and F2 | no wait | Σ C_j are still open. The no wait constraint may lengthen the optimal flow shop schedule considerably. It can be shown [Lenstra -] that

C*_max(no wait)/C*_max < m for m ≥ 2.   (†)
5.4. Job shop scheduling
5.4.1. J2 || C_max, J3 || C_max
A simple extension of Johnson's algorithm for F2 || C_max allows solution of J2 | m_j ≤ 2 | C_max in O(n log n) time [Jackson 1956]. Let B_i be the set of jobs with operations on M_i only (i = 1, 2) and B_{hi} the set of jobs that go from M_h to M_i (hi = 12, 21). Order the latter two sets by means of Johnson's algorithm and the former two sets arbitrarily. One then obtains an optimal schedule by executing the jobs on M_1 in the order (B_{12}, B_1, B_{21}) and on M_2 in the order (B_{21}, B_2, B_{12}). This, however, is probably as far as we can get. Unary NP-hardness of J2 || C_max results as soon as we allow one job to have more than two operations [Garey et al. 1976B; Lenstra et al. 1977]. In fact, J2 | 1 ≤ p_ij ≤ 2 | C_max and J3 | p_ij = 1 | C_max are already NP-hard [Lenstra & Rinnooy Kan 1978B].

5.4.2. J || C_max
The general job shop problem is extremely hard to solve optimally. An indication of this is given by the fact that a 10-job 10-machine problem, formulated in 1963 [Muth & Thompson 1963], still has not been solved. A convenient problem representation is provided by the disjunctive graph model, introduced by Roy and Sussmann [Roy & Sussmann 1964]. Assume that each operation O_ij is renumbered as O_u with u = Σ_{k=1}^{j−1} m_k + i, and add two fictitious initial and final operations O_0 and O_* with p_0 = p_* = 0. The disjunctive graph is then defined as follows. There is a vertex u with weight p_u corresponding to each operation O_u. The directed conjunctive arcs link the consecutive operations of each job, and link O_0 to all first operations and all last operations to O_*. A pair of directed disjunctive arcs connects every two operations that have to be executed on the same machine. A feasible schedule corresponds to the selection of one disjunctive arc of every such pair, granting precedence of one operation over the other on their common machine, in such a way that the resulting directed graph is acyclic. The value of the schedule is given by the weight of the maximum weight path from 0 to *. We refer to Figs. 5.5 and 5.6 for examples. At a typical stage of any enumerative algorithm, a certain subset D of disjunctive arcs will have been selected. We consider the directed graph obtained by removing all other disjunctive arcs. Let the maximum weights of paths from 0
Fig. 5.5. Job shop problem, represented as a disjunctive graph.
Fig. 5.6. Job shop schedule, represented as an acyclic directed graph.
to u and from u to *, excluding p_u, be denoted by r_u and q_u, respectively. In particular, r_* is an obvious lower bound on the value of any feasible schedule obtainable from the current graph [Charlton & Death 1970]. We can get a far better bound in a manner very similar to the development of flow shop bounds in Section 5.3.2 [Lageweg et al. 1977]. Let us relax the capacity constraints for all machines except M_i. We then obtain a problem of scheduling the operations O_u on M_i with release dates r_u, processing times p_u, due dates −q_u and precedence constraints defined by the directed graph, so as to minimize maximum lateness. As pointed out in Section 3.2, this 1 | prec, r_j | L_max problem is NP-hard, but there exist fast enumerative methods for its solution on each M_i. Again, all lower bounds proposed in the literature appear as special cases of the above one by underestimating the contribution of r_u, q_u, or both, by ignoring the precedence constraints, or by restricting the set of machines over which maximization is to take place. The currently best job shop algorithm [McMahon & Florian 1975] involves the 1 | r_j | L_max bound combined with the enumeration of active schedules. Starting from O_0, we consider at each stage the subset S of operations all of whose predecessors have been scheduled and calculate their earliest possible completion times r_u + p_u. It can be shown [Giffler & Thompson 1960] that it is sufficient to consider only a machine on which the minimum value of r_u + p_u is achieved and to branch by successively scheduling next on that machine all O_v for which r_v < min_{O_u∈S} {r_u + p_u}. In this scheme, several disjunctive arcs are added to D at each stage. An alternative approach, whereby at each stage one disjunctive arc of some crucial pair is selected, leads to a computationally inferior approach [Lageweg et al. 1977].
The applicability of Lagrangean techniques to obtain stronger lower bounds is the subject of ongoing research. Either the precedence constraints fixing the machine orders for the jobs or the capacity constraints of the machines can be multiplied by a Lagrangean variable and added to the objective function. For fixed values of the multipliers, the resulting problems can be solved in (pseudo)polynomial time. Computational experiments will have to reveal if this approach, combined with subgradient optimization or another suitable technique, will lead to any substantially better job shop algorithm. As far as approximation algorithms are concerned, a considerable effort has been invested in the empirical testing of various priority rules [Gere 1966; Conway et al. 1967; Day & Hottenstein 1970; Panwalkar & Iskander 1977]. No rule appears to be consistently better than any other, and in practical situations one would be well advised to exploit any special structure that the problem at hand has to offer. Not much has been done in the way of worst-case analysis of approximation algorithms for flow shop and job shop problems. Gonzalez and Sahni [Gonzalez & Sahni 1978A] show that for any active flow shop or job shop schedule (AS)

C_max(AS)/C*_max ≤ m.   (†)
This bound is tight even for LPT schedules, in which the jobs are ordered according to nonincreasing sums of processing times. They give an O(mn log n) algorithm H for F || C_max based on Johnson's algorithm for F2 || C_max with

C_max(H)/C*_max ≤ ⌈m/2⌉.

With SPT defined similarly to LPT, it is also shown that for F || Σ C_j and J || Σ C_j

Σ C_j(SPT)/Σ C*_j ≤ m.
It thus appears that, in general, the obvious algorithms can deviate quite substantially from the optimum for this class of problems.
6. Concluding remarks If one thing emerges from the preceding survey, it is the amazing success of complexity theory as a means of differentiating between easy and hard problems. Within the very detailed problem classification developed especially for this purpose, surprisingly few open problems remain. For an extensive class of scheduling problems, a computer program has been developed that classifies these problems according to their computational complexity [Lageweg et al. 1978Al. It employs elementary reductions such as those defined in Section 2.7 in order to deduce the consequences of the development of a new polynomial-time algorithm or a new NP-hardness proof.
As far as polynomial-time algorithms are concerned, the most impressive recent advances have occurred in the area of parallel machine scheduling and are due to researchers with a computer science background, recognizable as such by their use of terms like tasks and processors rather than jobs and machines. Single machine, flow shop and job shop scheduling has been traditionally the domain of operations researchers. Here, an analytical approach to the performance of approximation algorithms is badly needed, although for any practical problem it probably will remain true that a successful heuristic will have to exploit whatever special structure and properties the problem at hand may have. Thus, the area of deterministic scheduling theory appears as one of the more fruitful interfaces between computer science and operations research. Much progress has been made and more can be expected in the near future. Note. The last three authors are currently engaged in writing a book on scheduling problems and would very much appreciate being informed about new algorithmic and complexity results in this area.
Acknowledgments

The authors gratefully acknowledge the useful comments from M.L. Fisher. The research by the last three authors was supported by NSF Grant MCS7617605 and by NATO Special Research Grant 9.2.02 (SRG.7).
References

D. Adolphson (1977), Single machine job sequencing with precedence constraints, SIAM J. Comput. 6, 40-54.
D. Adolphson and T.C. Hu (1973), Optimal linear ordering, SIAM J. Appl. Math. 25, 403-423.
A.V. Aho, M.R. Garey and J.D. Ullman (1972), The transitive reduction of a directed graph, SIAM J. Comput. 1, 131-137.
J.L. Baer (1974), Optimal scheduling on two processors with different speeds, in: E. Gelenbe and R. Mahl, eds., Computer Architectures and Networks (North-Holland, Amsterdam, 1974) 27-45.
K.R. Baker (1974), Introduction to Sequencing and Scheduling (Wiley, New York).
K.R. Baker and A.G. Merten (1973), Scheduling with parallel processors and linear delay costs, Naval Res. Logist. Quart. 20, 793-804.
K.R. Baker and L.E. Schrage (1978), Finding an optimal sequence by dynamic programming: an extension to precedence-related tasks, Operations Res. 26, 111-120.
K.R. Baker and Z.-S. Su (1974), Sequencing with due-dates and early start times to minimize maximum tardiness, Naval Res. Logist. Quart. 21, 171-176.
M.S. Bakshi and S.R. Arora (1969), The sequencing problem, Management Sci. 16, B247-263.
J.W. Barnes and J.J. Brennan (1977), An improved algorithm for scheduling jobs on identical machines, AIIE Trans. 9, 25-31.
P. Bratley, M. Florian and P. Robillard (1975), Scheduling with earliest start and due date constraints on multiple machines, Naval Res. Logist. Quart. 22, 165-173.
P. Brucker, M.R. Garey and D.S. Johnson (1977), Scheduling equal-length tasks under treelike precedence constraints to minimize maximum lateness, Math. Operations Res. 2, 275-284.
J. Bruno, E.G. Coffman, Jr. and R. Sethi (1974), Scheduling independent tasks to reduce mean finishing time, Comm. ACM 17, 382-387.
J. Bruno and T. Gonzalez (1976), Scheduling independent tasks with release dates and due dates on parallel machines, Technical Report 213, Computer Science Department, Pennsylvania State University.
F. Burns and J. Rooker (1976), 3 x n Flow-shops with convex external stage time dominance (unpublished manuscript).
A.K. Chandra and C.K. Wong (1975), Worst-case analysis of a placement algorithm related to storage allocation, SIAM J. Comput. 4, 249-263.
J.M. Charlton and C.C. Death (1970), A generalized machine scheduling algorithm, Operational Res. Quart. 21, 127-134.
N.-F. Chen (1975), An analysis of scheduling algorithms in multiprocessing computing systems, Technical Report UIUCDCS-R-75-724, Department of Computer Science, University of Illinois at Urbana-Champaign.
N.-F. Chen and C.L. Liu (1975), On a class of scheduling algorithms for multiprocessors computing systems, in: T.-Y. Feng, ed., Parallel Processing, Lecture Notes in Computer Science 24 (Springer, Berlin, 1975) 1-16.
E.G. Coffman, Jr. (ed.) (1976), Computer and Job-Shop Scheduling Theory (Wiley, New York).
E.G. Coffman, Jr., M.R. Garey and D.S. Johnson (1978), An application of bin-packing to multiprocessor scheduling, SIAM J. Comput., to appear.
E.G. Coffman, Jr. and R.L. Graham (1972), Optimal scheduling for two-processor systems, Acta Informat. 1, 200-213.
R.W. Conway, W.L. Maxwell and L.W. Miller (1967), Theory of Scheduling (Addison-Wesley, Reading, MA).
E.W. Davis (1966), Resource allocation in project network models - a survey, J. Indust. Engrg. 17, 177-188.
E.W. Davis (1973), Project scheduling under resource constraints - historical review and categorization of procedures, AIIE Trans. 5, 297-313.
J. Day and M.P. Hottenstein (1970), Review of scheduling research, Naval Res. Logist. Quart. 17, 11-39.
P.J. Denning and G. Scott Graham (1973), A note on subexpression ordering in the execution of arithmetic expressions, Comm. ACM 16, 700-702.
W.L. Eastman, S. Even and I.M. Isaacs (1964), Bounds for the optimal scheduling of n jobs on m processors, Management Sci. 11, 268-279.
S.E. Elmaghraby and S.H. Park (1974), Scheduling jobs on a number of identical machines, AIIE Trans. 6, 1-12.
M.L. Fisher (1973), Optimal solution of scheduling problems using Lagrange multipliers, part I, Operations Res. 21, 1114-1127.
M.L. Fisher (1976), A dual algorithm for the one-machine scheduling problem, Math. Programming 11, 229-251.
M. Fujii, T. Kasami and K. Ninomiya (1969, 1971), Optimal sequencing of two equivalent processors, SIAM J. Appl. Math. 17, 784-789; Erratum, 20, 141.
M.R. Garey (-), Unpublished.
M.R. Garey and R.L. Graham (1975), Bounds for multiprocessor scheduling with resource constraints, SIAM J. Comput. 4, 187-200.
M.R. Garey, R.L. Graham and D.S. Johnson (1978), Performance guarantees for scheduling algorithms, Operations Res. 26, 3-21.
M.R. Garey, R.L. Graham and D.S. Johnson (-), Unpublished.
M.R. Garey, R.L. Graham, D.S. Johnson and A.C.C. Yao (1976A), Resource constrained scheduling as generalized bin packing, J. Combinatorial Theory Ser. A 21, 257-298.
M.R. Garey and D.S. Johnson (1975), Complexity results for multiprocessor scheduling under resource constraints, SIAM J. Comput. 4, 397-411.
M.R. Garey and D.S. Johnson (1976A), Scheduling tasks with nonuniform deadlines on two processors, J. Assoc. Comput. Mach. 23, 461-467.
M.R. Garey and D.S. Johnson (1976B), Approximation algorithms for combinatorial problems: an annotated bibliography, in: J.F. Traub, ed., Algorithms and Complexity: New Directions and Recent Results (Academic Press, New York, 1976) 41-52.
M.R. Garey and D.S. Johnson (1977), Two-processor scheduling with start-times and deadlines, SIAM J. Comput. 6, 416-426.
M.R. Garey, D.S. Johnson and R. Sethi (1976B), The complexity of flowshop and jobshop scheduling, Math. Operations Res. 1, 117-129.
L. Gelders and P.R. Kleindorfer (1974), Coordinating aggregate and detailed scheduling decisions in the one-machine job shop: part I, Theory, Operations Res. 22, 46-60.
L. Gelders and P.R. Kleindorfer (1975), Coordinating aggregate and detailed scheduling in the one-machine job shop: part II, Computation and structure, Operations Res. 23, 312-324.
W.S. Gere (1966), Heuristics in job shop scheduling, Management Sci. 13, 167-190.
B. Giffler and G.L. Thompson (1960), Algorithms for solving production-scheduling problems, Operations Res. 8, 487-503.
P.C. Gilmore and R.E. Gomory (1964), Sequencing a one-state variable machine: a solvable case of the traveling salesman problem, Operations Res. 12, 655-679.
T. Gonzalez (1976), A note on open shop preemptive schedules, Technical Report 214, Computer Science Department, Pennsylvania State University.
T. Gonzalez (1977), Optimal mean finish time preemptive schedules, Technical Report 220, Computer Science Department, Pennsylvania State University.
T. Gonzalez, O.H. Ibarra and S. Sahni (1977), Bounds for LPT schedules on uniform processors, SIAM J. Comput. 6, 155-166.
T. Gonzalez and D.B. Johnson (1977), A new algorithm for preemptive scheduling of trees, Technical Report 222, Computer Science Department, Pennsylvania State University.
T. Gonzalez, E.L. Lawler and S. Sahni (1978), Optimal preemptive scheduling of a fixed number of unrelated processors in linear time (to appear).
T. Gonzalez and S. Sahni (1976), Open shop scheduling to minimize finish time, J. Assoc. Comput. Mach. 23, 665-679.
T. Gonzalez and S. Sahni (1978A), Flowshop and jobshop schedules: complexity and approximation, Operations Res. 26, 36-52.
T. Gonzalez and S. Sahni (1978B), Preemptive scheduling of uniform processor systems, J. Assoc. Comput. Mach. 25, 92-101.
D.K. Goyal (1977A), Scheduling equal execution time tasks under unit resource restriction (to appear).
D.K. Goyal (1977B), Non-preemptive scheduling of unequal execution time tasks on two identical processors, Technical Report CS-77-039, Computer Science Department, Washington State University, Pullman.
R.L. Graham (1966), Bounds for certain multiprocessing anomalies, Bell System Tech. J. 45, 1563-1581.
R.L. Graham (1969), Bounds on multiprocessing timing anomalies, SIAM J. Appl. Math. 17, 263-269.
R.L. Graham (1976), Bounds on the performance of scheduling algorithms, in: [Coffman 1976], 165-227.
R.L. Graham (-), Unpublished.
J.N.D. Gupta and S.S. Reddi (1978), Improved dominance conditions for the three-machine flowshop scheduling problem, Operations Res. 26, 200-203.
W.A. Horn (1972), Single-machine job sequencing with treelike precedence ordering and linear delay penalties, SIAM J. Appl. Math. 23, 189-202.
W.A. Horn (1973), Minimizing average flow time with parallel machines, Operations Res. 21, 846-847.
W.A. Horn (1974), Some simple scheduling algorithms, Naval Res. Logist. Quart. 21, 177-185.
E. Horowitz and S. Sahni (1976), Exact and approximate algorithms for scheduling nonidentical processors, J. Assoc. Comput. Mach. 23, 317-327.
E.C. Horvath, S. Lam and R. Sethi (1977), A level algorithm for preemptive scheduling, J. Assoc. Comput. Mach. 24, 32-43.
N.C. Hsu (1966), Elementary proof of Hu's theorem on isotone mappings, Proc. Amer. Math. Soc. 17, 111-114.
T.C. Hu (1961), Parallel sequencing and assembly line problems, Operations Res. 9, 841-848.
O.H. Ibarra and C.E. Kim (1975), Scheduling for maximum profit, Technical Report, Computer Science Department, University of Minnesota, Minneapolis.
O.H. Ibarra and C.E. Kim (1976), On two-processor scheduling of one- or two-unit time tasks with precedence constraints, J. Cybernet. 5, 87-109.
O.H. Ibarra and C.E. Kim (1977), Heuristic algorithms for scheduling independent tasks on nonidentical processors, J. Assoc. Comput. Mach. 24, 280-289.
E. Ignall and L. Schrage (1965), Application of the branch-and-bound technique to some flow-shop scheduling problems, Operations Res. 13, 400-412.
J.R. Jackson (1955), Scheduling a production line to minimize maximum tardiness, Research Report 43, Management Science Research Project, University of California, Los Angeles.
J.R. Jackson (1956), An extension of Johnson's results on job lot scheduling, Naval Res. Logist. Quart. 3, 201-203.
D.S. Johnson (1973), Near-optimal bin packing algorithms, Report MAC TR-109, Massachusetts Institute of Technology, Cambridge, MA.
D.S. Johnson (1974), Fast algorithms for bin packing, J. Comput. System Sci. 8, 272-314.
D.S. Johnson, A. Demers, J.D. Ullman, M.R. Garey and R.L. Graham (1974), Worst-case performance bounds for simple one-dimensional packing algorithms, SIAM J. Comput. 3, 299-325.
S.M. Johnson (1954), Optimal two- and three-stage production schedules with setup times included, Naval Res. Logist. Quart. 1, 61-68.
S.M. Johnson (1958), Discussion: Sequencing n jobs on two machines with arbitrary time lags, Management Sci. 5, 299-303.
D.G. Kafura and V.Y. Shen (1977), Task scheduling on a multiprocessor system with independent memories, SIAM J. Comput. 6, 167-187.
D.G. Kafura and V.Y. Shen (1978), An algorithm to design the memory configuration of a computer network, J. Assoc. Comput. Mach. 25, 365-377.
R.M. Karp (1972), Reducibility among combinatorial problems, in: R.E. Miller and J.W. Thatcher, eds., Complexity of Computer Computations (Plenum Press, New York, 1972) 85-103.
R.M. Karp (1975), On the computational complexity of combinatorial problems, Networks 5, 45-68.
M.T. Kaufman (1972), Anomalies in scheduling unit-time tasks, Technical Report 34, Stanford Electronic Laboratory.
M.T. Kaufman (1974), An almost-optimal algorithm for the assembly line scheduling problem, IEEE Trans. Computers C-23, 1169-1174.
H. Kise, T. Ibaraki and H. Mine (1978), A solvable case of the one-machine scheduling problem with ready and due times, Operations Res. 26, 121-126.
D. Knuth (1973), Private communication to T.C. Hu, July 23 (1973).
K.L. Krause (1973), Analysis of computer scheduling with memory constraints, Doctoral Dissertation, Computer Science Department, Purdue University, West Lafayette.
K.L. Krause, V.Y. Shen and H.D. Schwetman (1975, 1977), Analysis of several task-scheduling algorithms for a model of multiprogramming computer systems, J. Assoc. Comput. Mach. 22, 522-550; 24, 527.
M. Kunde (1976), Beste Schranken beim LP-Scheduling, Bericht 7603, Institut für Informatik und Praktische Mathematik, Christian-Albrechts-Universität Kiel.
J. Labetoulle, E.L. Lawler, J.K. Lenstra and A.H.G. Rinnooy Kan (1978), Preemptive scheduling of uniform machines subject to release dates (to appear).
B.J. Lageweg, E.L. Lawler, J.K. Lenstra and A.H.G. Rinnooy Kan (1978A), Computer aided complexity classification of deterministic scheduling problems (to appear).
B.J. Lageweg, J.K. Lenstra and A.H.G. Rinnooy Kan (1976), Minimizing maximum lateness on one machine: computational experience and some applications, Statistica Neerlandica 30, 25-41.
B.J. Lageweg, J.K. Lenstra and A.H.G. Rinnooy Kan (1977), Job-shop scheduling by implicit enumeration, Management Sci. 24, 441-450.
B.J. Lageweg, J.K. Lenstra and A.H.G. Rinnooy Kan (1978B), A general bounding scheme for the permutation flow-shop problem, Operations Res. 26, 53-67.
S. Lam and R. Sethi (1977), Worst case analysis of two scheduling algorithms, SIAM J. Comput. 6, 518-536.
E.L. Lawler (1973), Optimal sequencing of a single machine subject to precedence constraints, Management Sci. 19, 544-546.
E.L. Lawler (1976A), Sequencing to minimize the weighted number of tardy jobs, Rev. Française Automat. Informat. Recherche Opérationnelle 10 (5 Suppl.) 27-33.
E.L. Lawler (1976B), Combinatorial Optimization: Networks and Matroids (Holt, Rinehart and Winston, New York).
E.L. Lawler (1977), A "pseudopolynomial" algorithm for sequencing jobs to minimize total tardiness, Ann. Discrete Math. 1, 331-342.
E.L. Lawler (1978), Sequencing jobs to minimize total weighted completion time subject to precedence constraints, Ann. Discrete Math. 2, 75-90.
E.L. Lawler and J. Labetoulle (1978), On preemptive scheduling of unrelated parallel processors, J. Assoc. Comput. Mach., to appear.
E.L. Lawler and J.M. Moore (1969), A functional equation and its application to resource allocation and sequencing problems, Management Sci. 16, 77-84.
J.K. Lenstra (1977), Sequencing by Enumerative Methods, Mathematical Centre Tract 69, Mathematisch Centrum, Amsterdam.
J.K. Lenstra (-), Unpublished.
J.K. Lenstra and A.H.G. Rinnooy Kan (1975), Some simple applications of the travelling salesman problem, Operational Res. Quart. 26, 717-733.
J.K. Lenstra and A.H.G. Rinnooy Kan (1978A), Complexity of scheduling under precedence constraints, Operations Res. 26, 22-35.
J.K. Lenstra and A.H.G. Rinnooy Kan (1978B), Computational complexity of discrete optimization problems, Ann. Discrete Math. 4 (1979), Discrete Optimization I.
J.K. Lenstra, A.H.G. Rinnooy Kan and P. Brucker (1977), Complexity of machine scheduling problems, Ann. Discrete Math. 4, 281-300.
C.L. Liu (1972), Optimal scheduling on multi-processor computing systems, Proc. 13th Annual IEEE Symp. Switching and Automata Theory, 155-160.
C.L. Liu (1976), Deterministic job scheduling in computing systems, Department of Computer Science, University of Illinois at Urbana-Champaign.
J.W.S. Liu and C.L. Liu (1974A), Bounds on scheduling algorithms for heterogeneous computing systems, in: J.L. Rosenfeld, ed., Information Processing 74 (North-Holland, Amsterdam, 1974) 349-353.
J.W.S. Liu and C.L. Liu (1974B), Bounds on scheduling algorithms for heterogeneous computing systems, Technical Report UIUCDCS-R-74-632, Department of Computer Science, University of Illinois at Urbana-Champaign, 68 pp.
J.W.S. Liu and C.L. Liu (1974C), Performance analysis of heterogeneous multiprocessor computing systems, in: E. Gelenbe and R. Mahl, eds., Computer Architectures and Networks (North-Holland, Amsterdam, 1974) 331-343.
G.B. McMahon (1969), Optimal production schedules for flow shops, Canad. Operational Res. Soc. J. 7, 141-151.
G.B. McMahon (1971), A study of algorithms for industrial scheduling problems, Thesis, University of New South Wales, Kensington.
G.B. McMahon and M. Florian (1975), On scheduling with ready times and due dates to minimize maximum lateness, Operations Res. 23, 475-482.
R. McNaughton (1959), Scheduling with deadlines and loss functions, Management Sci. 6, 1-12.
L.G. Mitten (1958), Sequencing n jobs on two machines with arbitrary time lags, Management Sci. 5, 293-298.
C.L. Monma (1977), Optimal m x n flow shop sequencing with precedence constraints and lag times, School of Operations Research, Cornell University, Ithaca, NY.
C.L. Monma and J.B. Sidney (1977), A general algorithm for optimal job sequencing with series-parallel precedence constraints, Technical Report 347, School of Operations Research, Cornell University, Ithaca, NY.
J.M. Moore (1968), An n job, one machine sequencing algorithm for minimizing the number of late jobs, Management Sci. 15, 102-109.
R.R. Muntz and E.G. Coffman, Jr. (1969), Optimal preemptive scheduling on two-processor systems, IEEE Trans. Computers C-18, 1014-1020.
R.R. Muntz and E.G. Coffman, Jr. (1970), Preemptive scheduling of real time tasks on multiprocessor systems, J. Assoc. Comput. Mach. 17, 324-338.
Y. Muraoka (1971), Parallelism, exposure and exploitation in programs, Ph.D. Thesis, Department of Computer Science, University of Illinois at Urbana-Champaign.
J.F. Muth and G.L. Thompson (eds.) (1963), Industrial Scheduling (Prentice-Hall, Englewood Cliffs, NJ) 236.
I. Nabeshima (1963), Sequencing on two machines with start lag and stop lag, J. Operations Res. Soc. Japan 5, 97-101.
S.S. Panwalkar and W. Iskander (1977), A survey of scheduling rules, Operations Res. 25, 45-61.
J. Piehler (1960), Ein Beitrag zum Reihenfolgeproblem, Unternehmensforschung 4, 138-142.
S.S. Reddi and C.V. Ramamoorthy (1972), On the flow-shop sequencing problem with no wait in process, Operational Res. Quart. 23, 323-331.
A.H.G. Rinnooy Kan (1976), Machine Scheduling Problems: Classification, Complexity and Computations (Nijhoff, The Hague).
A.H.G. Rinnooy Kan, B.J. Lageweg and J.K. Lenstra (1975), Minimizing total costs in one-machine scheduling, Operations Res. 23, 908-927.
P. Rosenfeld (-), Unpublished.
M.H. Rothkopf (1966), Scheduling independent tasks on parallel processors, Management Sci. 12, 437-447.
B. Roy and B. Sussmann (1964), Les problèmes d'ordonnancement avec contraintes disjonctives, Note DS no. 9 bis, SEMA, Montrouge.
S. Sahni (1976), Algorithms for scheduling independent tasks, J. Assoc. Comput. Mach. 23, 116-127.
S. Sahni and Y. Cho (1977A), Scheduling independent tasks with due times on a uniform processor system, Computer Science Department, University of Minnesota, Minneapolis.
S. Sahni and Y. Cho (1977B), Nearly on line scheduling of a uniform processor system with release times, Computer Science Department, University of Minnesota, Minneapolis.
S. Sahni and Y. Cho (1977C), Complexity of scheduling jobs with no wait in process, Technical Report 77-20, Computer Science Department, University of Minnesota, Minneapolis.
R. Sethi (1976A), Algorithms for minimal-length schedules, in: [Coffman 1976], 51-99.
R. Sethi (1976B), Scheduling graphs on two processors, SIAM J. Comput. 5, 73-82.
R. Sethi (1977), On the complexity of mean flow time scheduling, Math. Operations Res. 2, 320-330.
J.B. Sidney (1973), An extension of Moore's due date algorithm, in: S.E. Elmaghraby, ed., Symposium on the Theory of Scheduling and its Applications, Lecture Notes in Economics and Mathematical Systems 86 (Springer, Berlin, 1973) 393-398.
J.B. Sidney (1975), Decomposition algorithms for single-machine sequencing with precedence relations and deferral costs, Operations Res. 23, 283-298.
J.B. Sidney (1977), The two-machine maximum flow time problem with series-parallel precedence constraints, Faculty of Management Sciences, University of Ottawa.
W.E. Smith (1956), Various optimizers for single-stage production, Naval Res. Logist. Quart. 3, 59-66.
H.E. Stern (1976), Minimizing makespan for independent jobs on nonidentical parallel machines - an optimal procedure, Working Paper 2/75, Department of Industrial Engineering and Management, Ben-Gurion University of the Negev, Beer-Sheva.
W. Szwarc (1968), On some sequencing problems, Naval Res. Logist. Quart. 15, 127-155.
W. Szwarc (1971), Elimination methods in the m x n sequencing problem, Naval Res. Logist. Quart. 18, 295-305.
W. Szwarc (1973), Optimal elimination methods in the m x n sequencing problem, Operations Res. 21, 1250-1259.
W. Szwarc (1978), Dominance conditions for the three machine flow-shop problem, Operations Res. 26, 203-206.
W. Townsend (1977A), A branch-and-bound method for sequencing problems with linear and exponential penalty functions, Operational Res. Quart. 28, 191-200.
W. Townsend (1977B), Sequencing n jobs on m machines to minimise maximum tardiness: a branch-and-bound solution, Management Sci. 23, 1016-1019.
J.D. Ullman (1975), NP-complete scheduling problems, J. Comput. System Sci. 10, 384-393.
J.D. Ullman (1976), Complexity of sequencing problems, in: [Coffman 1976], 139-164.
J.M. Van Deman and K.R. Baker (1974), Minimizing mean flowtime in the flow shop with no intermediate queues, AIIE Trans. 6, 28-34.
D.A. Wismer (1972), Solution of the flowshop-scheduling problem with no intermediate queues, Operations Res. 20, 689-697.
Annals of Discrete Mathematics 5 (1979) 327-387 © North-Holland Publishing Company.
SELECTED FAMILIES OF LOCATION PROBLEMS

Jakob KRARUP
Institute of Datalogy, University of Copenhagen
Peter Mark PRUZAN
Institute of Economics, University of Copenhagen

This exposition is concerned with two important families of static, deterministic, single-criterion, one-product location problems: center problems and median problems. A similar treatise covering additional families is at present under preparation and will be published elsewhere. Reference is made to the introduction for an overview of the families of problems considered. Each part, which is introduced by its own abstract and concluded by its own bibliography, can be read independently of the others as cross references are virtually non-existent.
0. Introduction
The location of any physical object whatsoever, partially or totally created by some living organism, can be said to represent the solution to a location problem. Even if we restrict "living organism" to encompass homo sapiens only, and even if we assume some "intellectual process" behind the choice of a solution, the entire field of location problems is still overwhelming and its story dates back as far as the story of mankind itself. Further limitations are necessary in order to find a suitable framework for the presentation of the subject, and the next subset of location problems emerges naturally if the rather vague phrase "intellectual process" is replaced by some "systematical approach based on what has been commonly accepted as OR-methodology". This brings us to the idea of using models as a convenient tool in location decision-making. Apart from the nice iconic model by means of which the location of Copenhagen was determined in 1167 by Bishop Absalon (see the next page), we shall consider only abstract (mathematical-symbolic) models in the sequel. And, with the main purpose of this monograph in mind - a systematic exposition of prototype families of location problems - we are finally faced with those location problems which can be meaningfully formulated and solved within the framework of optimization. Instead of proceeding directly to our subject, we will elaborate first upon the relationship between the activities of those concerned in some way with decision-making as to the placement of "things" and those concerned with developing algorithms applicable to location problems.
[Cartoon: LOCATION PLANNING 1167.¹ June 15, 1967: 800 years ago, Bishop Absalon employed a heuristic location procedure to determine the optimal location when Copenhagen was founded. "Your Reverence must be kidding! Should our capital really be located in such a ridiculous place?"]
The term "location problem" has a catalytic effect upon many practitioners concerned with physical-economic planning, e.g. regional planners, architects, managers of multi-plant and/or warehouse companies, town planners and the like. It conjures forth decisions characterized by many elements of a physical, economic, social and aesthetic nature, e.g. buildings, sites, transportation routes, investments, freight and handling costs, geographically dispersed organizations, environmental restrictions. And in fact to most ordinary people "location problem" is intuitively understandable and deals with how some thing or things are to be placed so as to improve the performance of a system. But for that subset of the population called "mathematical programmers" or "optimizers" (frequently disjoint from the subsets of practitioners or the ordinary people mentioned above), i.e. those possessed with the demon of optimization, the word "crutch" is perhaps more suitable than the euphemism "catalyst".

¹ With the kind permission of the famous interpreter of the Danish scene, Bo Bojesen, and the newspaper "Politiken".
Optimizers seldom develop algorithms which can be directly used to generate a basis for decision making regarding the practitioners' real-world problems. Rather, their concern with location problems permits them to consider a large group of more or less prototype optimization problems in the firm belief that their mental calisthenics are/will be both relevant for and applicable to the practitioners' decision problems. It is our experience that these practitioners and the optimizers have indeed very little contact with or influence upon each other's activities. And when they do, it is most often via an intermediary, quite often an OR-type who to some extent knows both groups' jargon and who utilizes heuristic procedures based upon algorithms (i.e. computational procedures known to produce an optimal solution) or theorems dealing with the most simple of the prototype location problems. It is interesting to note in this connection that, while the huge and rapidly growing literature on location problems mainly deals with algorithmic matters, particularly based upon a frame of reference steeped in graph theory, the literature on applications of these algorithms is diminutive. To give some idea of the concept "huge" as used above, we can refer to a comprehensive annotated bibliography [Lea 1973].² This bibliography surveys mainly the literature in English from the 1960's to mid-1972; nevertheless, it consists of 231 pages and well over a thousand references. Useful supplements to [Lea 1973] are the selective bibliography compiled by [Francis & Goldstein 1974] covering 224/226 references and certain parts of the more recent bibliography [Golden & Magnanti 1977]. And the literature since then has by no means been decreasing, on the contrary.

² The full reference to this and other bibliographies is given at the end of the introduction.

It is therefore the outrageously ambitious goal of this monograph to consider several "classical" prototype location problems from the viewpoint and in the language of the optimizers, but in such a way that conclusions can be drawn as to where additional research and development should be encouraged so that the optimizers' products will be more meaningful and relevant for the practitioners. We will however limit our level of ambition (and thus the reader's state of joy - be she optimizer or practitioner - in anticipation of an all-inclusive, concise, precise and operational treatment of location problems) and make the following reservations:

(1) Although the main emphasis will be on discrete optimization, we consider it to be self-evident that location problems in the plane should also be included. Practitioners are often able to choose between representing a given location problem as being posed in the plane or in a network. The decisive questions in such cases are often: (a) Is the transportation network so well developed in the region under consideration that a planar formulation is reasonable? (b) Is there a relatively small set of identifiable, possible locations so that a network formulation is reasonable? (c) Are the optimal solutions to a planar formulation readily
transferable to a set of possible locations without resulting in a serious increase in the value of the objective function? (d) Are there considerable computational simplifications obtainable via a network or a planar formulation? It is our experience that in many real-world problems the answers to such questions are ambiguous. The model builder will have considerable flexibility in her choice of planar vs. network formulation, and a knowledge of both is therefore required.

(2) We will place primary emphasis upon algorithms and their properties. Heuristic procedures for the prototype location problems will only be considered in so far as they represent especially attractive alternatives to existing algorithmic procedures or contribute to other fields of current research in combinatorial optimization including the evaluation of bounds. Therefore, only algorithms will be considered when dealing with the more simple prototype location problems, while heuristics will receive greater attention as we progress to more complex problems. This is by no means an indication that we consider heuristic procedures to be "second class citizens"; on the contrary, many of our own practical applications of location models have been heuristically oriented, as this is often the only reasonable approach in complex, real-world problems. Rather, the motivation for our emphasis upon algorithms is based upon the fact that this paper only considers prototype location problems, e.g. problems having a simple, well-defined structure. However, most heuristic procedures applicable to large, complex, real-world problems utilize algorithms for the prototype location problems as sub-routines. Therefore our emphasis upon algorithms for prototype problems is in fact commensurate with the goal of contributing to the development of solution procedures for the practitioners' location problems.

(3) We will essentially only consider deterministic formulations; it is not felt that the inclusion of stochastic formulations is of reasonable significance for the practitioners, or particularly relevant for the optimizers. The literature dealing with stochastic location problems is virtually non-existent, and far more relevant and challenging unsolved problems demand our attention.

(4) We will only consider static formulations. Although the inclusion of dynamic location problems is of considerable importance for both the practitioners and the discrete optimizers, time and space limitations preclude their treatment in the present exposition. We derive solace, however, from the observation that it is not meaningful to consider dynamic location problems unless the static formulations are well treated, as essentially all dynamic formulations employ static formulations as sub-routines. Furthermore, within several of the prototype families of location problems, e.g. center problems, such dynamic formulations may be considered to be non-existent. For a rather comprehensive bibliography of dynamic location problems, reference is made to [Erlenkotter 1975] where 123 entries are listed.

(5) We will assume that the reader is in agreement with our postulate that the
basic reason for considering location problems here is an interest in improving our ability to meaningfully formulate and solve realistic decision problems rather than our ability to relate our research on some exotic problem to a more practical sounding title in order to justify that research.

(6) We will not, however, assume that the reader is well versed in the location literature, but assume on the other hand that he is familiar with the most common phraseology employed by optimizers.

(7) Contrary to common practice when drawing syntheses with respect to a field of endeavour, we have chosen to be selective rather than comprehensive in our choice of literature supporting the exposition. This is by no means either an indication of impoliteness or of laziness. As mentioned earlier, the literature is enormous. We will only refer to particularly noteworthy or challenging literature (our subjective evaluation). In particular, historical (more than 20 years old) references will be avoided apart from a few exceptional cases. Furthermore, we will not attempt to give equal space to either the problems considered or the literature cited; we will subjectively allocate our resources according to our perception of the practical relevance of the problems and algorithmic approaches considered.

Over and above these - our own - reservations, we must admit that there is a limiting factor over which we cannot exert any significant influence; despite the huge literature, the "families" of prototype problems treated are very limited indeed. There exists a serious gap between the practitioners' a priori definitions of location problems and the optimizers' transformations of these problems. The former group often meets problems characterized by the following elements: many products, many time periods and seasonality, considerable uncertainty as to developments in the "market", costs etc., and multi-dimensional objectives; the latter group considers in the main one-product, static, deterministic, minimum-cost formulations. Finally, a location problem is seldom a self-contained subject for the practitioners, but rather is most often but a subproblem within the area of long range planning, with extremely strong ties to the existing location and distribution structure; these aspects are essentially ignored in most of the literature. These observations will be referred to again wherever conclusions are drawn in attempts to synthesize the "state of the art" and to propose directions for further research.

We have now expressed a number of reservations to justify excluded areas without really saying what we intend to cover. The present monograph in two parts considers the so-called center and median problems. These additional papers supplement this exposition: [Krarup & Pruzan 1977A] discusses challenging unsolved center and median problems and areas for future research. The plant location family is the subject of [Krarup & Pruzan 1977B], to be published, while plant layout problems including quadratic assignment are the subject of [Krarup & Pruzan 1978].
Each part, which is introduced by its own abstract and concluded by its own bibliography, can be read independently of the others as cross references are essentially non-existent.
Acknowledgements - and a bit more

The organizers of DO77 - also the editors of this volume - originally commissioned us to write a "comprehensive survey on location problems". What was presented at DO77, however, revealed only the top of the iceberg, and what follows in the subsequent two parts is still a fragment of the entire picture. As the preparatory work brought us through a wide variety of emotional ups and downs ranging from enthusiasm, desperation and frustration to resignation, we are strongly indebted for the irresistible motivation provided by the invitation. Otherwise, we might never have set about such a venture.

After DO77, several colleagues have kindly assisted us in pointing out certain omissions and misstatements. Our attention has also been drawn to lots of inaccessible (overlooked) but highly relevant literature. We are grateful to A.M. Frieze, A.M. Geoffrion, M. Guignard, P. Hansen, S.K. Jacobsen, M.S. Klamkin, H.W. Kuhn, G.L. Nemhauser, K. Olsen, R.M. Soland, K. Spielberg and others for thus having extended our horizon. Thanks are in particular due to J.K. Lenstra and A.H.G. Rinnooy Kan for informal discussions of "complexity of location problems", a discussion we indeed are looking forward to resuming. We are also indebted to R.E. Burkard for raising questions about "location problems and algebraic objectives", to J. Halpern for providing more insight into some hybrid formulations, and to S. Walukiewicz for several useful suggestions as to further improvements of the exposition.

Throughout the preparation, we have been well aware of the existence of the monograph [Elshafei et al. 1974] which covers a broad spectrum of facility location problems and lists 392 references. Unfortunately, it was impossible to obtain a copy until the very last minute. Being both extensive and of a high quality, their work would certainly have been of much use and saved us lots of time. Although regrettable, it is now too late for making further comments; at least we are pleased to acknowledge our thanks to A.N. Elshafei for his helpfulness and to list their work in our references.

Other "last minute papers" include a series received most recently from A.M. Geoffrion. Of most interest in this context is a status report on strategic distribution systems planning [Geoffrion et al. 1977]. The accompanying letter reads: "I couldn't agree more with your assertion that theoreticians and practitioners do much less for one another than one would hope for. This has been a lament of mine ever since I began solving applied problems 5 years ago".

Subjects such as complexity, algebraization and hybrid formulations which are
briefly mentioned in the above acknowledgements are beyond the scope of the present exposition. It is however our goal to produce a more conclusive synthesis of location problems, once we are finished with prototype considerations. The theme of this synthesis will be the meta-problem: How to optimally formulate location problems.
References

Elshafei, A.N., K.B. Haley, and S.-J. Lee (1974), Facilities location: some formulations, methods of solution, applications and computational experience, OR Report No. 90, The University of Birmingham, England.
Erlenkotter, D. (1975), Bibliography on dynamic location problems, Discussion Paper No. 55, Management Science Study Center, Graduate School of Management, UCLA.
Francis, R.L., and J.M. Goldstein (1974), Location theory: A selective bibliography, Operations Res. 22 (2), 400-410.
Geoffrion, A.M., and G.W. Graves (1977), Strategic distribution planning: A status report, Working Paper No. 272, Western Management Science Institute, University of California, Los Angeles.
Golden, B.L., and T.L. Magnanti (1977), Deterministic network optimization: A bibliography, Networks 7, 149-183.
Krarup, J., and P.M. Pruzan (1977A), Challenging unsolved center and median problems, DIKU-report 1977-10, Institute of Datalogy, University of Copenhagen. To be published by the Polish Academy of Sciences.
Krarup, J., and P.M. Pruzan (1977B), Selected families of location problems. Part III: The plant location family, Working Paper No. WP-12-77, University of Calgary, Faculty of Business.
Krarup, J., and P.M. Pruzan (1978), Computer-aided layout design, Math. Programming Study 9, 75-94.
Lea, A.C. (1973), Location-allocation systems: An annotated bibliography, Discussion Paper No. 13, University of Toronto, Dept. of Geography.
PART I. CENTER PROBLEMS

1. Introduction

The center problem is to locate a predetermined number of objects (points) so as to minimize the maximum of the weighted distances between the objects and the clients (points) assigned to serve/be served by them. The optimum location of just one object is called the center, and when p > 1 objects are to be located, the optimal solution is called the p-center. Such a minimax objective function can frequently be met in the formulations of so-called emergency problems, e.g. regarding police, fire and ambulance services. A reasonable criterion for the effectiveness of such service coverage is that any client may be reached from the object nearest it within a given weighted distance, time or cost. The weight characterizing a client may be interpreted as a measure of its importance or the probability of an emergency occurring and therefore will often be a function of the size of the client, e.g. of the number of people in a community. For a given number of emergency facilities to be located, the optimal value of a center problem's objective function represents a lower bound for this criterion. During the last decade or so a wide variety of center problems have been treated in the literature, with by far the most attention being devoted to center problems in a network, i.e. where the objects may only be located at points on the edges or vertices of a network while the clients have predefined locations at the vertices. Historically, however, these problems were first considered in the plane, i.e. where the object(s) may be placed at will in the plane and where the clients have predefined locations. We will introduce center problems by first considering the location of one object in the plane, then one object in a network, before continuing with the more realistic and demanding p-center problems in the plane and in a network. We will also briefly treat the related "dual" class of problems where the goal is to find the minimum number of objects (and their locations) necessary to serve the clients under the condition that the maximum weighted distance from a client to an object does not exceed a prescribed limit. In Part II of the present monograph we will then consider a class of problems closely related to the center or minimax location problem. In this new family of so-called median or minisum location problems, the objects are to be located so as to minimize the sum of the weighted distances between the objects and the clients assigned to them. However, we will also consider "hybrid" problem formulations which contain both minimax and minisum considerations, either via an objective function including both measures or via a formulation where the sum of the
weighted distances is to be minimized subject to a restriction as to the maximum weighted distance permitted between an object and a client.
2. 1-Center problem in the plane

2.1. Euclidean distance

Given a finite set J = {1, ..., n} of clients with coordinates (x_j, y_j) and with weights w_j assigned to each j ∈ J. The 1-center problem in the plane is to determine the location (x_q, y_q) of an object q such that the maximum weighted (Euclidean) distance between the object and any client is minimized. The Euclidean distance d_qj between q and the jth client is defined by the norm $\|\cdot\|_2$:

$$d_{qj} = \{(x_q - x_j)^2 + (y_q - y_j)^2\}^{1/2}, \qquad j \in J,$$

so that the problem reads

1-Center problem in the plane: Euclidean distance
$$\min_{(x_q, y_q)\in\mathbb{R}^2}\ \Big[\max_{j\in J}\ \{w_j d_{qj}\}\Big].$$
2.1.1. Solution methods

The problem as formulated above has a unique solution; for a discussion of solution properties, see [Francis 1967]. It can be shown to be equivalent to a convex programming problem, but may be solved by geometrically oriented algorithms. In our discussion of solution procedures, three different cases are distinguished:

(i) All weights equal. In the most simple case where all clients are equally weighted, i.e. w_j = 1 for all j, the problem reduces to finding the circle with the smallest radius which encloses all the clients. This problem of finding a minimum covering circle was apparently first posed in 1857 by Sylvester as a mathematical brain-teaser, without any thought whatsoever of a location problem: "It is required to find the least circle which shall contain a given system of points in a plane". In 1860 he presented a geometrical solution procedure attributed to Pierce. A solution procedure which is finite and for which the average cpu time on an IBM 7094 to cover 100 points is less than 0.5 seconds is given in [Elzinga & Hearn 1972]. It can be noted that, while the iterative procedure of their algorithm guarantees an optimal solution, a simple, straightforward procedure can provide solution estimates (without the aid of a computer) which are often optimal, and which in the worst possible case underestimate the maximal distance from the center estimate to a client in the client-set by approximately 42%. Similarly, the distance
between the optimum center location and the estimated location is at most approximately 33% of this maximum distance. The heuristic is to choose those two points which are farthest from each other and to estimate the center location as the midpoint of the line joining them; locating these two points may be done on a computer requiring $\binom{n}{2}$ computations, but in most cases can be more simply performed by plotting the client coordinates and then visually locating the point-pair. If all the other points are enclosed by the circle centered at this midpoint and having diameter equal to the distance d between the two points, then the estimated location of the center is in fact the optimal solution. If this is not the case, the "maximum error" can be shown to be bounded above by $\frac{1}{6}\sqrt{3}\,d \approx 0.29d$.
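The heuristic is easily stated in code; a minimal Python sketch (ours, following the description above):

from itertools import combinations
from math import dist

def midpoint_center_estimate(points):
    """Heuristic for the unweighted 1-center (minimum covering circle)."""
    a, b = max(combinations(points, 2), key=lambda pair: dist(*pair))
    center = ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)
    radius = max(dist(center, p) for p in points)   # covering radius of estimate
    # optimal iff every client lies in the circle of diameter dist(a, b);
    # otherwise the error is bounded by (sqrt(3)/6)*dist(a, b), as noted above
    return center, radius, radius <= dist(a, b) / 2.0 + 1e-9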
(ii) The center-addend case with equal weights. Elzinga & Hearn also consider the more general center-addend case:

$$\min_{q\in\mathbb{R}^2}\ \Big[\max_{j\in J}\ \{d_{qj} + a_j\}\Big],$$

where a_j ≥ 0. For all j, they denote by c_j a circle of radius a_j centered at j. The sum of a_j and the distance from q to j is the same as the distance from q to j and then to any point on c_j. Thus, in the center-addend case, the problem is equivalent to finding the smallest covering circle enclosing all the c_j. The non-negative addends may be interpreted as fixed costs, e.g. as a time delay between the recognition of the need for an emergency service and the arrival of the service, or as the time delay between the arrival of the service at a call station (client j) and the determination of the specific location of the emergency in the neighbourhood of j. Also in this addend case, the location of the center is unique.

(iii) Individual weights. When the clients are not equally weighted, the problem can be reformulated to one of finding the circle of minimum (weighted) radius that encloses the n clients, i.e. min z such that z ≥ w_j d_qj, j ∈ J. [Jacobsen & Pruzan 1976] propose a solution procedure based upon successively considering clients such that equality exists in three (occasionally two) of the restrictions z ≥ w_j d_qj.
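Since the weighted problem is a convex program, the min z formulation above can also be handed directly to a general nonlinear solver; a hedged sketch (ours, assuming SciPy is available; this is not the Jacobsen & Pruzan procedure):

import numpy as np
from scipy.optimize import minimize

def weighted_1center(points, w):
    """min z s.t. z >= w_j * ||q - p_j||_2, solved with SLSQP."""
    pts, w = np.asarray(points, float), np.asarray(w, float)
    c0 = pts.mean(axis=0)                             # start at the centroid
    z0 = float((w * np.hypot(*(pts - c0).T)).max())   # feasible initial z
    cons = [{'type': 'ineq',                          # z - w_j * d_qj >= 0
             'fun': lambda v, p=p, wj=wj:
                 v[2] - wj * np.hypot(v[0] - p[0], v[1] - p[1])}
            for p, wj in zip(pts, w)]
    res = minimize(lambda v: v[2], [c0[0], c0[1], z0],
                   constraints=cons, method='SLSQP')
    return res.x[:2], res.x[2]    # center (x_q, y_q) and weighted radius z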
2.2. Rectilinear distance

The formulation is identical to the 1-center problem in the plane with Euclidean distance except that the rectilinear distance measure (the norm $\|\cdot\|_1$) is employed. That is, it is assumed that the weighted distance between the object to be located and a client is the product of the rectilinear distance between them and the weight assigned to the client.
1-Center problem in the plane: Rectilinear distance
$$\min_{(x_q, y_q)\in\mathbb{R}^2}\ \Big[\max_{j\in J}\ \{w_j d_{qj}\}\Big].$$

In principle, the same geometrically oriented approach considered under the Euclidean case can be employed here. For all w_j equal, the problem reduces to determining the locus of all points equidistant from a given point, here a diamond, i.e. a square rotated 45° from the coordinate axes, and by replacing the concept of a minimum covering circle by a minimum covering diamond. See [Elzinga & Hearn 1972], which also considers the more general case $\min_{q\in\mathbb{R}^2}[\max_{j\in J}\{d_{qj} + a_j\}]$, $a_j > 0$. The location of the center here, however, is not necessarily unique. This method based upon finding a covering diamond can presumably be extended for the case of unequal w_j. However, for a larger number of clients n, such a procedure would become extremely unwieldy (nor would it be directly applicable for problems characterized by constraints or by more than 2 dimensions). Fortunately, however, the problem can be reduced to a parametric linear programming problem with only 5n restrictions and 2n variables - see the detailed discussion of the procedure due to [Wesolowsky 1972] in Section 5 on the p-center problem in the plane - or to a nonparametric programming problem with but 4n restrictions and 3 variables - see [Elzinga & Hearn 1973]. It should be emphasized here that while both of these procedures treat a problem formulation rather different from the standard center problem, for the case p = 1 the problem formulations are identical, and thus their algorithms are directly applicable here.
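For the equally weighted case, the diamond observation yields a closed form: the familiar substitution u = x + y, v = x - y turns rectilinear distance into Chebyshev distance, in which the 1-center is the coordinate-wise midpoint of the ranges. A small sketch under that equal-weights assumption (ours):

def rectilinear_1center(points):
    """Unweighted 1-center under the l1 norm, via the 45-degree rotation."""
    u = [x + y for x, y in points]                 # rotated coordinates
    v = [x - y for x, y in points]
    cu = (max(u) + min(u)) / 2.0                   # Chebyshev center in (u, v)
    cv = (max(v) + min(v)) / 2.0
    r = max(max(u) - min(u), max(v) - min(v)) / 2.0
    # rotate back to (x, y); r is the minimax rectilinear distance
    return ((cu + cv) / 2.0, (cu - cv) / 2.0), r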
3. 1-Absolute center problem in a network

A few graph-theoretical concepts will be employed in the sequel. A graph G = (V, E) is defined by its vertex set V and edge set E. Throughout, V is assumed to be finite and of cardinality |V| = n. Wherever numbers are associated with its vertices and/or edges (e.g. in defining weights of vertices and/or lengths of edges), G is referred to as a network. If (x, y) is an edge of G with end-vertices x and y and q is not a vertex, then (x, y) is said to be subdivided when it is replaced by the edges (x, q) and (q, y). Unless otherwise stated, we consider only undirected networks. For directed (or mixed) networks in which all (or some) edges can be traversed only in one specified direction, we shall speak explicitly of directed edges to emphasize the distinction.
While the 1-center problem in the plane assumes that every (weighted) client can be reached from every point in the plane and that the shortest path between an object and a client can be generated with the aid of a measure of distance (Euclidean and rectilinear in the cases considered here), the center problem in a given network assumes a more detailed model of the accessibility of objects to clients in the form of a vertex-weighted network. The 1-absolute center problem allows a point q to be located anywhere in a vertex-weighted network G = (V, E), i.e. q is either a vertex, q ∈ V, or subdivides some edge (x, y) ∈ E. By the notation q → (x, y), which covers both cases, we mean that the point q is located somewhere on the edge (x, y) ∈ E, where this edge includes both of its end vertices x, y. In some sense, E can be viewed as the set of feasible locations for q, but formally we cannot write something like "q ∈ E" since members of E are vertex-pairs. To overcome the difficulty, let $\mathcal{E} = \mathcal{E}(G)$ denote the set of feasible locations as defined verbally above. In contrast, the 1-vertex center problem to be considered shortly only permits location on a vertex. For the sake of simplicity of exposition, we will focus our attention upon undirected networks; the extension to directed networks is in most cases straightforward. Note, however, that virtually all of the literature to date only considers the case of undirected networks. For G = (V, E), let
w_j be the weight associated with vertex j ∈ V,
l_xy be the length of edge (x, y) ∈ E,
d_ij be the length of a shortest path between any pair i ∈ V, j ∈ V of vertices.

It will be assumed that the matrix of distances D = {d_ij} corresponding to the shortest paths between each vertex pair in the network is available. Note that, for a network with n vertices, determining all the shortest-path distances requires computation at least proportional to n³. Let G' = (V', E') be the subdivision network of G = (V, E) resulting from the assignments q → (x, y) and q⁰ → (x⁰, y⁰) of two points q, q⁰. Let the real-valued functions l', d' be defined for G' as the corresponding l, d for G. Accordingly,

$$l'_{qx} + l'_{qy} = l_{xy} \qquad\text{and}\qquad d'_{qj} = \min\{l'_{qx} + d_{xj},\ l'_{qy} + d_{yj}\},$$

and similar expressions hold for q replaced by q⁰. The point $q^0 \in \mathcal{E}(G)$ is the absolute center in the vertex-weighted network G if

$$\max_{j\in V}\ \{w_j d'_{q^0 j}\} \le \max_{j\in V}\ \{w_j d'_{qj}\}$$

for every $q \in \mathcal{E}$. The absolute center of an undirected network is thus that point in the network from which the maximum weighted distance to (or from) any vertex in the network is a minimum.
The 1-absolute center problem in an undirected network is then:

1-Absolute center problem in an undirected network
$$\min_{q\in\mathcal{E}}\ \Big[\max_{j\in V}\ \{w_j d'_{qj}\}\Big].$$

The extension to directed networks is straightforward; see [Christofides 1975, pp. 82-83]. Depending upon whether we wish to determine an "out-center" (the location of a point in the network which minimizes the maximum weighted distance from the point to a reachable weighted vertex), an "in-center" (which minimizes the maximum weighted distance from a vertex to a reachable point in the network) or an "out-in-center" (which minimizes the sum of the weighted distances from a point to a reachable vertex and back to the point), only the distance calculations are to be modified, with due consideration to those edges which are directed.
3.1. Solution methods

The problem was apparently first formulated by [Hakimi 1964]. His algorithmic procedure consists of a straightforward, easy to understand, but computationally demanding two-stage search procedure, which is sketched here for the case of undirected networks. The first stage consists of finding the local centers on all the edges, where the local center for a given edge (x, y) is the point q* → (x, y) minimizing the maximum weighted distance between any point q → (x, y) and any vertex j ∈ V. That is, q*(x, y) is a local center on (x, y) if

$$\max_{j\in V}\ \{w_j d'_{q^* j}\} \le \min_{q\to(x,y)}\ \Big[\max_{j\in V}\ \{w_j d'_{qj}\}\Big].$$
This local center on (x, y) is found by plotting $\max_{j\in V}\{w_j d'_{qj}\}$ for $0 \le l'_{qx} \le l_{xy}$. The second stage consists simply of finding the local center with minimal minimax weighted distance. The computational requirements can be reduced considerably due to the following observations. The maximum distance between a point on an edge and a vertex is a piece-wise linear function (which may have several minima) and the local center can only be located at one of its break points, i.e. at a point where the derivative of the function changes value. The minimax weighted distance from a vertex center of a network (very simple to calculate, as can be seen from the following section on the 1-vertex center problem in a network) is at least as great as the corresponding minimax weighted distance from the absolute center. This property can be utilized to reduce the search for local centers by determining when a local center on an edge cannot improve upon the vertex center of the network and therefore cannot be a candidate for the absolute center; see [Odoni 1974].
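A sketch of the first stage for a single edge (ours, not Hakimi's original code): each client contributes a tent function f_j(t) = w_j min(t + d[x][j], l - t + d[y][j]) of the position t along the edge, and the minimum of the upper envelope of the tents can only occur at t = 0, t = l, or where a rising piece of one tent meets a falling piece of another:

def local_center(x, y, length, w, d):
    """Local center on edge (x, y); w = vertex weights, d = {d_ij} matrix."""
    def F(t):   # max weighted distance from the point at distance t from x
        return max(wj * min(t + d[x][j], length - t + d[y][j])
                   for j, wj in enumerate(w))
    candidates = {0.0, float(length)}
    for j in range(len(w)):         # rising piece of f_j meets ...
        for k in range(len(w)):     # ... falling piece of f_k
            t = (w[k] * (length + d[y][k]) - w[j] * d[x][j]) / (w[j] + w[k])
            if 0.0 <= t <= length:
                candidates.add(t)
    t_star = min(candidates, key=F)
    return t_star, F(t_star)        # position from x and its minimax value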
A similar and more effective bounding procedure to exclude considering edges as candidates for the location of absolute centers is described in [Christofides 1975]. The description above was based upon the assumption of undirected networks. As mentioned previously, the only modifications required to treat directed edges concern the determination of distances. In fact, the calculations are simplified since for directed edges a local center will clearly only be found at one (or both) of its vertices, and thus the demanding computation of the piece-wise linear envelope of weighted distances from points on the edge to all the vertices in the network can be eliminated. An alternative, iterative procedure is suggested by [Christofides 1975]; this will, however, be described in the section on p-center problems.

The 1-absolute center problem is particularly simple to solve for the special case where the network is an undirected tree. The algorithm is of order n² where n is the number of vertices:

(1) Determine the vertices i⁰ and j⁰ which define the weighted diameter θ of the tree:
$$\theta = \max_{i,j\in V}\ [w_i w_j d_{ij}/(w_i + w_j)],$$
(2) the unique absolute center q⁰ is located on the path from i⁰ to j⁰ at a distance θ/w_{i⁰} from i⁰.

See e.g. [Dearing & Francis 1974], which also derives properties of the objective function for the general 1-center problem in a network. In particular, for the case where the network is an undirected tree and the vertex weights w_j are all equal, the algorithm is of order n and reduces to finding the longest path between any pair of vertices in the tree. The absolute center is at the midpoint of the longest path (between 2 end vertices) and determining these end vertices is a maximax problem which requires a special application of the algorithm for a least cost tree rooted at a vertex of a general network. The algorithm utilizes the fact that one and only one path in a tree connects any two points, and thus does not require determining the distance matrix D of the shortest paths between all vertices [Handler 1973]. This algorithm due to Handler will provide background material for a "hybrid" formulation to be considered in Part II on the 1-median problem in a network. The hybrid objective function to be treated is: minimize a convex combination of the minimax objective function considered here (minimize the maximum distance to a vertex) and the minisum objective function corresponding to the median problem (minimize the sum of the weighted distances to the vertices in the network).
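The two-step tree procedure above translates directly into code; a minimal Python sketch (ours; d is the tree's vertex-to-vertex distance matrix, n ≥ 2, and the offset θ/w_{i⁰} along the i⁰-j⁰ path is as reconstructed above):

def tree_absolute_center(d, w):
    """Absolute center of a tree: d = shortest-path matrix, w = vertex weights."""
    n = len(w)
    theta, i0, j0 = max(                   # weighted diameter of the tree
        (w[i] * w[j] * d[i][j] / (w[i] + w[j]), i, j)
        for i in range(n) for j in range(i + 1, n))
    # the unique center lies on the i0-j0 path, theta / w[i0] away from i0
    return i0, j0, theta, theta / w[i0]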
The algorithm is essentially identical to that presented above without the vertex-addends:

(1) Determine the vertices i⁰ and j⁰ which define the weighted diameter θ of the tree:

θ = max_{i,j∈V} [(w_i w_j d_ij + w_j a_i + w_i a_j)/(w_i + w_j)],

(2) If i⁰ = j⁰, then q⁰ = i⁰.
(3) If i⁰ ≠ j⁰, q⁰ is the point on the path from i⁰ to j⁰ such that d_{i⁰q⁰} = (θ − a_{i⁰})/w_{i⁰}.

[Halfin 1974; Lin 1975] have developed particularly simple algorithms for the vertex-addend problem in a tree where all the w_j are equal.
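A minimal sketch of the two-step procedure without vertex-addends is given below (Python; the dense matrix d of tree distances and the weight list w are illustrative input conventions, not the authors' notation):

    # Sketch of the O(n^2) absolute-center procedure for a vertex-weighted tree.
    # d[i][j]: tree (shortest-path) distance between vertices i and j;
    # w[i]: weight of vertex i. Both input conventions are illustrative only.

    def tree_absolute_center(d, w):
        n = len(w)
        theta, i0, j0 = 0.0, 0, 0
        # Step 1: weighted diameter theta = max w_i * w_j * d_ij / (w_i + w_j)
        for i in range(n):
            for j in range(i + 1, n):
                val = w[i] * w[j] * d[i][j] / (w[i] + w[j])
                if val > theta:
                    theta, i0, j0 = val, i, j
        # Step 2: the absolute center lies on the path from i0 to j0,
        # at distance theta / w[i0] from i0.
        return i0, j0, theta / w[i0], theta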
4. 1-Vertex center problem in a network The 1-vertex center problem (or simply the center problem in a network as it is often referred to) assumes that objects may only be located at the vertices of a network. The center of a vertex-weighted undirected network G = (V, E) is that vertex from which the maximum weighted distance to (or from) any weighted vertex in the set V of vertices is a minimum.
1-Vertex center problem in a network

min_{i∈V} [max_{j∈V} {w_j d_ij}]
The extension to determining “in-centers”, “out-centers” and “in-out-centers” in directed networks is straightforward, as will be seen below.
4.1. Solution methods

For the simplest case, an undirected network with all the vertex weights w_j equal, the vertex center can be determined by inspection of the matrix of lengths of the shortest paths between vertex pairs, D = {d_ij}; the center is the vertex (row) with the minimum of the maximum elements in each row. For undirected networks with individual vertex weights, the center can be determined by first forming the weighted distance matrix, multiplying each column j of the D-matrix by the weight w_j, and then following exactly the same inspection procedure as for the unweighted (equally weighted) case, i.e. the center is the vertex (row) with the minimum of the maximum elements in each row. For the most general case of a directed network and unequal vertex weights, the procedure is in principle identical. Should an “out-center”, respectively an “in-center”, be sought (i.e. that vertex from-which/to-which the maximum weighted distance to/from a reachable vertex is minimized), then the weighted distance
matrix is formed by multiplying the columns/rows of the distance matrix by the corresponding vertex weights and choosing the minimum of the maximum elements in each row/column. The extension to determining an “out-in-center” is obvious: both weighted matrices are generated, and the “out-in-center” is that vertex i for which the maximum over reachable vertices j of the weighted distance from i to j plus the weighted distance from j back to i is minimized. Clearly, the calculations required to determine the vertex center of a network are far less demanding than those required for determining an absolute center and are in fact rather trivial once the distance matrix of the lengths of the shortest paths between vertices is established; determining this matrix is by far the most demanding procedure (of order n³). For the special case where the network is a tree with all vertex weights equal, the computations are even simpler, as the vertex center is clearly the closest vertex adjacent to the absolute center and can therefore be determined by finding the longest path between any two end vertices of the tree; this does not even require determining the distance matrix of the shortest paths between vertices, as was pointed out under the absolute center problem.
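A minimal sketch of this inspection procedure (Python with numpy assumed available; D is the precomputed shortest-path matrix, e.g. from the Floyd-Warshall algorithm):

    import numpy as np

    # Vertex center by inspection of the weighted distance matrix, as above.
    # D[i][j]: shortest-path length from i to j; w[j]: weight of vertex j.
    def vertex_center(D, w):
        weighted = np.asarray(D, float) * np.asarray(w, float)[None, :]
        row_max = weighted.max(axis=1)    # worst weighted distance from each vertex
        return int(row_max.argmin())      # the row with the minimal maximum

    # For an "in-center", multiply rows by the weights instead and inspect
    # column maxima: (D * w[:, None]).max(axis=0).argmin().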
5. p-Center problem in the plane

Given a set J of n clients in the plane with coordinates (x_j, y_j) and weights w_j, and p objects to be located. For given object locations, client-object pairs are formed by assigning each member of the set J to that object which is closest to it. The client weights can be interpreted as the amounts to be shipped between the clients and their associated (closest) objects, the allowable object locations are free of restrictions, and the weighted distance between an object and a client is the product of the client weight and the appropriate measure of distance. A set of p points S_p⁰ = {(x_{q_i}, y_{q_i}), i = 1, . . . , p} is called a p-center in R² if for every set of p points S_p in R²:

max_{j∈J} [w_j d(S_p⁰, j)] ≤ max_{j∈J} [w_j d(S_p, j)],

where d(S_p, j) is the length of the shortest distance from a point q_i ∈ S_p to client j. The p-center problem in the plane is to determine the coordinates (x_{q_i}, y_{q_i}) of the set S_p so as to minimize the maximum weighted distance between any client-object pair:
p-Center problem in the plane

min_{S_p} max_{j∈J} [w_j d(S_p, j)],
|S_p| = p,
where d(S_p, j) = min_{q_i∈S_p} [d_{q_i j}].

5.1. Solution methods

Very little attention has been paid to this class of problems. [Elzinga & Hearn 1972] suggest that their geometrically based solution procedures for the case of all w_j equal and distance measure either rectilinear or Euclidean can be extended from the 1-center case. Clearly, a solution will be characterized by each of the p objects to be located having a group of clients associated with it which it serves or is served by. Therefore, the solution process can be considered as one whereby clients are assigned to objects, whereafter each object is located so as to minimize the maximum weighted distance to the clients associated with it. We suggest therefore that an extension of the geometrically based solution procedures might be achieved by employing an iterative heuristic procedure with an allocation phase which assigns clients to objects and a location phase which uses the procedure of Elzinga & Hearn to locate each object as a center for its assigned clients. (See the discussion of the Alternate Location Allocation procedure in Section 11.1.) Consideration of the above procedure leads to the conclusion that in the general case, solutions will not be unique. In an optimal solution to a typical p-center problem, the location of only one of the p centers will determine the minimax value of the objective function, i.e. the maximum weighted distance between all client-object pairs will be determined by the location of only one “critical” object. Thus, the locations of the remaining objects will not be decisive, and they will not have to be located as the centers of their associated groups of clients as long as their locations do not result in the weighted distance between an object and a client exceeding the minimax distance determined. These remarks will of course also hold true for the p-center problem in a network. In a manner analogous to the geometrically based solution procedures of Elzinga & Hearn, [Wesolowsky 1972] has considered a problem apparently closely related to the standard p-center problem in the plane formulated above. However, as will be seen, this class of problems, although certainly of interest, differs rather significantly from the standard p-center problem (aside from the case p = 1, in which case the formulations are identical). The problem is to locate a set S_p of p points in the plane with respect to a set J of n clients with given locations such that the maximum weighted rectilinear distance between an object and a client or between two objects is minimized:

p-Center problem in the plane with mutual communication
min_{S_p} max {w_ij d_{q_i j}, v_ik d_{q_i q_k}},   i ∈ S_p, j ∈ J, 1 ≤ i < k ≤ p,
where w_ij is the prespecified weight to be delivered from object i to client j, v_ik is the prespecified weight to be delivered from object i to object k, and d_{q_i j} and d_{q_i q_k} are the corresponding rectilinear distances. This apparent generalization of the standard p-center problem in the plane to include the weighted distances between objects as well as the weighted distances between objects and clients is, however, restricted due to the fact that the weights here are double-indexed. The weights to be transferred between each object pair and between each object-client pair are prespecified, while the standard formulation includes both a location aspect (where are the objects to be located) and an allocation or assignment aspect (each client is to serve/be served by the object nearest to it); this allocation aspect is eliminated from Wesolowsky's formulation. Reference is also made to the sections treating the p-median problem in the plane, where very similar minisum formulations are discussed. The problem is formulated as a parametric linear programming problem. The first step in Wesolowsky's approach to this minimax problem is to consider the corresponding minisum (p-median) problem

min Σ_{i∈S_p, j∈J} w_ij d_{q_i j} + Σ_{1≤i<k≤p} v_ik d_{q_i q_k}

and to transform this problem into an equivalent LP-problem:

min Σ_{i∈S_p, j∈J} w_ij(e′_ij + e″_ij) + Σ_{1≤i<k≤p} v_ik(f′_ik + f″_ik)

subject to

−e′_ij ≤ (x_{q_i} − x_j) ≤ e′_ij,   −e″_ij ≤ (y_{q_i} − y_j) ≤ e″_ij,   i ∈ S_p, j ∈ J,
−f′_ik ≤ (x_{q_i} − x_{q_k}) ≤ f′_ik,   −f″_ik ≤ (y_{q_i} − y_{q_k}) ≤ f″_ik,   1 ≤ i < k ≤ p.

At optimality

e′_ij = |x_{q_i} − x_j|,   e″_ij = |y_{q_i} − y_j|,   f′_ik = |x_{q_i} − x_{q_k}|,   f″_ik = |y_{q_i} − y_{q_k}|.
This formulation is then modified so as to be applicable to the p-center minimax formulation. First the following constraints are added, where d represents an upper limit on the weighted distance between an object and any other point:

w_ij(e′_ij + e″_ij) ≤ d,   i ∈ S_p, j ∈ J,   v_ik(f′_ik + f″_ik) ≤ d,   1 ≤ i < k ≤ p.

The constants in the objective function are then replaced by w′_ij and v′_ik, which are nonnegative numbers that are not zero when the corresponding w_ij and v_ik are not
zero. The following LP-problem results:
min Σ_{i∈S_p, j∈J} w′_ij(e′_ij + e″_ij) + Σ_{1≤i<k≤p} v′_ik(f′_ik + f″_ik)

subject to

−e′_ij ≤ (x_{q_i} − x_j) ≤ e′_ij,
−e″_ij ≤ (y_{q_i} − y_j) ≤ e″_ij,
w_ij(e′_ij + e″_ij) ≤ d,   i ∈ S_p, j ∈ J,

−f′_ik ≤ (x_{q_i} − x_{q_k}) ≤ f′_ik,
−f″_ik ≤ (y_{q_i} − y_{q_k}) ≤ f″_ik,
v_ik(f′_ik + f″_ik) ≤ d,   1 ≤ i < k ≤ p,

e′_ij, e″_ij, f′_ik, f″_ik, x_{q_i}, y_{q_i} ≥ 0.

It can be easily shown that at optimality all the constraints not involving d are tight, i.e.

e′_ij = |x_{q_i} − x_j|,   e″_ij = |y_{q_i} − y_j|,   i ∈ S_p, j ∈ J,
f′_ik = |x_{q_i} − x_{q_k}|,   f″_ik = |y_{q_i} − y_{q_k}|,   1 ≤ i < k ≤ p.
Thus to determine the locations of the p centers such that the maximum weighted distance between an object and any other object or client is minimized, it is only necessary to reduce d until a further reduction results in infeasibility. This can be done by replacing d in the constraints by (d − δ) and by using parametric programming: d is first chosen so large that the resulting program is feasible for δ = 0, and then δ is increased until no feasible solution can be found for a larger value of δ. This procedure is thus similar to that attributed in 1860 to Pierce by Sylvester for finding a minimum covering circle; see the previous discussion of the 1-center problem in the plane. Note that the solution as obtained for the given constants w′_ij and v′_ik need not be unique; there may be other locations of the p centers which give the same maximum weighted distance d. Rerunning the problem with different constants w′_ij and v′_ik may result in different solutions, and there is no guarantee that such a trial and error procedure will generate all the optimal solutions. [Elzinga & Hearn 1973] modify Wesolowsky's approach to this p-center problem with rectilinear distance and formulate a nonparametric LP-formulation with a smaller number of restrictions and variables. They formulate the problem directly as one of minimizing d. The problem is thus to minimize d subject to the restrictions:

w_ij[±(x_{q_i} − x_j) ± (y_{q_i} − y_j)] ≤ d,   i ∈ S_p, j ∈ J,
v_ik[±(x_{q_i} − x_{q_k}) ± (y_{q_i} − y_{q_k})] ≤ d,   1 ≤ i < k ≤ p,

where all four sign combinations are taken in each case.
It should be noted that the coordinate system can be chosen so that all x_{q_i} ≥ 0, y_{q_i} ≥ 0, so that nonnegativity restrictions are not required. This reformulation of Wesolowsky's parametric LP-problem has only 4p(n + ½(p − 1)) restrictions in contrast to Wesolowsky's 5p(n + ½(p − 1)), and only (2p + 1) variables in contrast to 2p(n + ½(p + 1)). In particular this significant reduction in the number of variables suggests a dual formulation, and the unrestricted variables eliminate the need for slack variables in the resulting dual. Furthermore, for the case of all w_ij equal, a quite significant reduction in problem size is obtained. The LP-formulation is:

min d

subject to

x_{q_i} + y_{q_i} + d ≥ x_j + y_j,
x_{q_i} + y_{q_i} − d ≤ x_j + y_j,
x_{q_i} − y_{q_i} + d ≥ x_j − y_j,
x_{q_i} − y_{q_i} − d ≤ x_j − y_j,   i ∈ S_p, j ∈ J,

together with the corresponding restrictions for each pair of objects. The first 4pn restrictions are replaced by the 4p restrictions:

x_{q_i} + y_{q_i} + d ≥ max_{j∈J} [x_j + y_j],
x_{q_i} + y_{q_i} − d ≤ min_{j∈J} [x_j + y_j],
x_{q_i} − y_{q_i} + d ≥ max_{j∈J} [x_j − y_j],
x_{q_i} − y_{q_i} − d ≤ min_{j∈J} [x_j − y_j],   i ∈ S_p.
This problem is seen to be independent of n, the number of clients. While the p-median problems to be considered in Part II are further developed as plant location problems in [Krarup & Pruzan 1977], where the number as well as the location of the objects are to be determined, such extensions apparently have not been treated for the p-center problem, whether in the plane or in a network.
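To illustrate, the reduced formulation can be fed directly to an LP solver. The following minimal sketch (Python, assuming numpy and scipy are available) solves the equal-weight case with the 4p client restrictions above plus the four rectilinear restrictions for each object pair; the variable layout [x_1..x_p, y_1..y_p, d] and all names are illustrative choices, not the authors' notation:

    import numpy as np
    from scipy.optimize import linprog

    def rectilinear_p_center(clients, p):
        xs, ys = np.asarray(clients, float).T
        a_max, a_min = (xs + ys).max(), (xs + ys).min()   # client data enter
        b_max, b_min = (xs - ys).max(), (xs - ys).min()   # only via 4 extremes

        nvar, d = 2 * p + 1, 2 * p
        rows, rhs = [], []

        def add(coeffs, bound):          # one "<=" restriction
            r = np.zeros(nvar)
            for idx, c in coeffs.items():
                r[idx] = c
            rows.append(r)
            rhs.append(bound)

        for i in range(p):               # the 4p client restrictions
            xi, yi = i, p + i
            add({xi: -1, yi: -1, d: -1}, -a_max)  # x+y+d >= max(xj+yj)
            add({xi: 1, yi: 1, d: -1}, a_min)     # x+y-d <= min(xj+yj)
            add({xi: -1, yi: 1, d: -1}, -b_max)   # x-y+d >= max(xj-yj)
            add({xi: 1, yi: -1, d: -1}, b_min)    # x-y-d <= min(xj-yj)
        for i in range(p):               # object-object restrictions
            for k in range(i + 1, p):
                for sx in (1, -1):
                    for sy in (1, -1):
                        add({i: sx, k: -sx, p + i: sy, p + k: -sy, d: -1}, 0.0)

        c = np.zeros(nvar); c[d] = 1.0   # minimize d
        res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs),
                      bounds=[(None, None)] * (2 * p) + [(0, None)])
        return res.x[:p], res.x[p:d], res.x[d]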
6. p-Absolute center problem in a network

The 1-absolute center problem in a network is generalized here to the problem of determining the simultaneous location of p objects at a set Q_p of p points in the vertex-weighted network G = (V, E) so as to minimize the maximum weighted distance between any client-object pair. Client-object pairs are formed by assigning each client (vertex) j ∈ V with weight w_j to that object q ∈ Q_p which is located nearest to it.
The following exposition will be restricted to the case of undirected networks, as the extension to directed networks (and to determining “out-p-absolute-centers”, “in-p-absolute-centers” and “out-in-p-absolute-centers”) is straightforward and follows the development under the 1-absolute center problem. As discussed under the preceding section on the p-center problem in the plane, a solution will be characterized by each of the p objects to be located in the network having a group of clients (vertices) associated with it which it serves/is served by. A solution procedure can therefore be considered as containing the following elements: 1) assigning clients to objects, 2) locating each object as the center for its group of clients, i.e. so as to minimize the maximum weighted distance to an assigned client, and 3) redetermining the assignment of clients to objects so as to minimize the minimax weighted distances resulting from any possible assignment. This leads to the formulation given below of the p-absolute center problem in a network. First, recall from the development under the 1-absolute center problem in Section 3 that ℰ is the set of all feasible locations (everywhere) on any edge (including its end vertices) and that d′_qj denotes the length of a shortest path from q to j in the subdivision network resulting from the location q → (x, y). Let R = {r_qj}, q ∈ Q_p, j ∈ V, be a (0, 1)-matrix of assignments: r_qj = 1 if vertex (client) j is assigned to object q; otherwise, r_qj = 0. For a given Q_p and a given R, the maximum weighted distance d⁰(Q_p, R) between any client-object pair amounts to

d⁰(Q_p, R) = max_{q∈Q_p, j∈V} {w_j r_qj d′_qj}.
We arrive thus at the following formulation:

p-Absolute center problem

min_{Q_p⊆ℰ} [min_R {d⁰(Q_p, R)}],

subject to

Σ_{q∈Q_p} r_qj = 1,   j ∈ V,
r_qj ∈ {0, 1},   q ∈ Q_p, j ∈ V.
Note that |Q_p| ≤ |V| in general, but that such a restriction is not necessary from a computational point of view. As was discussed under the p-center problem in the plane, optimal solutions need not be unique, as the maximum weighted distance between all client-object pairs will be determined by the location of one (or more) “critical” objects, thereby permitting the locations of the remaining objects to be modified from those proposed by a solution procedure for the above problem formulation. Aside from
the critical object(s), the remaining objects need not be located as the centers of their associated groups of clients as long as their locations do not result in the weighted distance between an object and a client exceeding the minimax distance for the critical object(s).

6.1. Solution methods

The p-center problem was first suggested by [Hakimi 1965] as a vertex-center problem in connection with problem formulations regarding the location of a minimum number of objects (vertices) in a network such that every vertex in the network is at most a prespecified distance from the object nearest it. Although the p-center class of prototype problems is far more realistic and meaningful for practical location problems than the 1-center class, it has received very little attention in the literature in comparison with the many publications dealing with variations of the 1-center problem in a network. [Minieka 1970] suggested a solution procedure for the p-absolute center problem in a network based upon solving a sequence of set covering problems. Such an algorithm, apparently rapidly converging and capable of treating fairly large networks (i.e. 50 vertices, 80 edges in around 30 seconds on a CDC 6600 computer), is presented in [Christofides & Viola 1971] and expanded upon in [Christofides 1975]. Following [Christofides 1975], a skeleton of the algorithm is as follows:

(1) Initialize a parameter, λ = 0.
(2) Increase λ by a small amount Δλ.
(3) Find the sets Q_λ(j) for all j ∈ V, where Q_λ(j) is the set of all points q, q ∈ ℰ, from which vertex j is reachable within a distance a_j = λ/w_j. These sets can be calculated by a simple, shortest-path-like algorithm.
(4) Determine the regions Φ_λ, where these regions are the sets of points in G all of which can reach exactly the same set of vertices for the given λ. A region may be one or more edges and/or a section of an edge, or may just contain one point. The regions can be calculated from the reachable sets as follows. The regions that do not reach any vertex are given by:

Φ_λ(∅) = ℰ \ [∪_{j∈V} Q_λ(j)],
where the second term excludes all regions of G that can reach some vertex j. The regions that can reach the t vertices j_1, j_2, . . . , j_t (for any t = 1, 2, . . . , n) and no other vertex are given by

Φ_λ(j_1, j_2, . . . , j_t) = [∩_{s=1,...,t} Q_λ(j_s)] \ [∪_{s=t+1,...,n} Q_λ(j_s)],

where the second term excludes regions that can reach other vertices in addition to j_1, j_2, . . . , j_t.
(5) Form the bipartite graph G′ = (V′ ∪ V, E′), where V′ is a set of vertices each representing a region and E′ is a set of edges such that an edge between a region-vertex and a vertex j exists if and only if j can be reached from that region.
(6) Find the minimum dominating set of G′. This gives the minimum number of regions in G that can reach all the vertices of G within the weighted distance. Finding the minimum dominating set of G′ is clearly a set covering problem; solving the set covering problem is by far the most time consuming step in the algorithm.
(7) If the number of regions in the minimum dominating set is greater than p, return to Step (2). Otherwise stop. The p-absolute center of G can be determined by choosing one point from each of the p regions in the set; one or more of these will be “critical”, i.e. the region is reduced to a point.

Several observations can be made. First of all, it is clear that the procedure described above also derives the absolute (n − 1)-, (n − 2)-, etc. centers, where n is the cardinality of V. Secondly, at the end of any iteration during which the number of regions in the minimum dominating set is decreased by one, say from m to (m − 1), the given value of λ is the critical value below which no (m − 1)-center exists (for the given step-size Δλ). This critical value of λ is the minimax weighted distance for the (m − 1)-center problem and is determined by the one (or more) of the regions in the minimum dominating set which is a “critical” “point”, or to be more precise, a region of length at most equal to Δλ. Thirdly, the algorithm permits the determination of the smallest number (and location) of centers so that all vertices of G lie within a given critical distance from the centers; Steps (3) to (7) are simply performed with λ set to this critical distance.
7. p-Vertex center problem in a network
The vertex-center problem is generalized here to the problem of finding a set K_p of p objects (vertices) in a vertex-weighted network G = (V, E) which minimizes the maximum weighted distance between any client-object pair. Client-object pairs are formed by assigning each client (vertex) j ∈ V with weight w_j to that object k ∈ K_p which is located nearest to it. Once again, the exposition is limited to the case of undirected networks, as the extension to directed networks is straightforward: only the distance calculations need be modified to take account of those edges which are directed; see the discussion under the 1-absolute center problem in a network. In analogy with the p-absolute center problem in a network, we let r_kj = 1 or 0 according to whether vertex (client) j is assigned to object k or not. For a given K_p and a given R = {r_kj}, the maximum weighted distance d⁰(K_p, R) between any
client-object pair is

d⁰(K_p, R) = max_{k∈K_p, j∈V} {w_j r_kj d_kj}.
Hence the problem reads:

p-Vertex center problem

min_{K_p⊆V} [min_R {d⁰(K_p, R)}],

subject to

Σ_{k∈K_p} r_kj = 1,   j ∈ V,
r_kj ∈ {0, 1},   k ∈ K_p, j ∈ V.
7.1. Solution methods

While the solution to a 1-vertex center problem in a network could be found by simple inspection of the weighted distance matrix, such an inspection procedure for the p-vertex center problem would clearly require p(n − p)(n choose p) comparisons. This is too time consuming for reasonably large values of n; e.g. for n = 50 and p = 5 approximately 3 × 10⁶ comparisons would be required, and this would grow to approximately 3 × 10¹⁰ for n = 100 and p = 5. Fortunately, however, the algorithm described under the p-absolute center problem in a network can be applied to this more restricted problem formulation. While Step (3) in that algorithm defined the sets of points in G from which a vertex j is reachable within a distance a_j = λ/w_j, the corresponding sets of vertices must be defined here, and the regions Φ_λ determined in Step (4) must be based upon these sets of vertices. The algorithm is then directly applicable if in Step (7) the number of vertices of G contained in the regions in the minimum dominating set is checked, rather than just the number of regions as was the case in the p-absolute center problem. That is, Step (7) is reformulated as follows: If the minimum number of vertices of G which can be chosen from among the minimum dominating set and which can reach all the vertices of G within the given weighted distance λ is greater than p, return to Step (2). Otherwise stop; the p-vertex center of G consists of the p vertices so chosen. Once again, the solution need not be unique if there are other such sets of p vertices in the minimum dominating set which can reach all the vertices of G within the given weighted distance. In fact, computation time can be reduced by taking advantage of the fact that all distances required are directly available from the D-matrix. If all the distances are ordered in increasing magnitude, the number of iterations can be reduced; it is not necessary to increase λ by a small arbitrary amount Δλ as in Step (2) of the algorithm, and a binary search of the set of ordered weighted distances can instead be used.
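The following is a minimal sketch of this ordered-distance search (Python with numpy). The binary search runs over the sorted candidate values w_j d_ij; the feasibility test, which the text answers via a set covering problem, is here replaced for simplicity by brute-force enumeration of p-vertex subsets and is therefore only practical for small n:

    import numpy as np
    from itertools import combinations

    # D: shortest-path matrix; w: vertex weights; p: number of centers.
    def p_vertex_center(D, w, p):
        D, w = np.asarray(D, float), np.asarray(w, float)
        n = len(w)
        weighted = D * w[None, :]              # weighted[i, j] = w_j * d_ij

        def centers_for(lam):
            reach = weighted <= lam            # reach[i, j]: i serves j within lam
            for S in combinations(range(n), p):
                if reach[list(S)].any(axis=0).all():
                    return set(S)
            return None

        values = np.unique(weighted)           # sorted candidate objective values
        lo, hi, best = 0, len(values) - 1, None
        while lo <= hi:                        # binary search over the candidates
            mid = (lo + hi) // 2
            S = centers_for(values[mid])
            if S is not None:
                best, hi = (values[mid], S), mid - 1
            else:
                lo = mid + 1
        return best                            # (minimax weighted distance, centers)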
As was discussed under the case of the p-absolute center problem in a network, the algorithm also permits the determination of the smallest number (and location) of vertex-centers so that all the vertices of G lie within a given critical distance from the centers. An alternative procedure, based upon a set covering formulation with inequality constraints, is provided by [Toregas, Swain, ReVelle & Bergman 1971]. Although not based upon a p-center problem, this hybrid formulation is so closely related (and certainly as relevant for location decisions) that it will be treated briefly here. For given maximum weighted distances or response times d_j, only the set N_j of vertices within d_j of vertex j can serve j: N_j = {i | d_ij ≤ d_j} (or N_j = {i | w_i d_ij ≤ d_j}). If there are n clients, i.e. |V| = n, there will be n sets N_j, each set having at least one member as d_jj = 0. This problem of determining the minimum number (and location) of vertex centers so that all vertices of G lie within given critical weighted distances from the centers can thus be formulated as follows:
min Σ_{i=1}^{n} r_i

subject to

Σ_{i∈N_j} r_i ≥ 1,   j = 1, 2, . . . , n,
r_i ∈ {0, 1},   i = 1, 2, . . . , n,

where r_i = 1 if an object is located at vertex i, 0 otherwise. This set covering problem is solved by relaxing the integer constraint and then using LP, supplemented by a single cut constraint in those cases when the solution to the linear programming problem is not integer. I.e. if m⁰ is an optimal non-integer objective value obtained from the LP solution, then the cut Σ_{i=1}^{n} r_i ≥ [m⁰] + 1 is added, as in any integer solution the minimum number of objects must be at least as great as the least integer greater than m⁰, namely [m⁰] + 1. Although it is possible to have a noninteger solution where m⁰ is integer, such a solution has not been observed according to the authors, and all noninteger solutions have been resolved with the addition of a single cut constraint. The procedure can easily treat fairly large problems, i.e. with hundreds or perhaps thousands of vertices, as the number of restrictions is equal to the number of clients (vertices) to be served and the number of variables is equal to the number of potential locations (at most the number of vertices).
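A minimal sketch of this LP-plus-one-cut scheme (Python, assuming scipy's linprog is available) is given below; cover[j, i] = 1 iff i ∈ N_j, and the rounding tolerance is our own choice:

    import math
    import numpy as np
    from scipy.optimize import linprog

    def min_cover(cover):
        n_clients, n_sites = cover.shape
        c = np.ones(n_sites)
        A_ub = [-np.asarray(cover, float)]      # sum_{i in N_j} r_i >= 1
        b_ub = [-np.ones(n_clients)]
        res = linprog(c, A_ub=np.vstack(A_ub), b_ub=np.concatenate(b_ub),
                      bounds=[(0, 1)] * n_sites)
        if not np.allclose(res.x, np.round(res.x)):
            # fractional optimum m0: add the cut  sum_i r_i >= [m0] + 1
            A_ub.append(-np.ones((1, n_sites)))
            b_ub.append(np.array([-(math.floor(res.fun + 1e-9) + 1)]))
            res = linprog(c, A_ub=np.vstack(A_ub), b_ub=np.concatenate(b_ub),
                          bounds=[(0, 1)] * n_sites)
        # per the authors' experience, integral after at most one cut
        return np.round(res.x).astype(int)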
References (Part I)

Christofides, N., and P. Viola (1971), The optimal location of multicenters on a graph, Operations Res. Quart. 22, 145-154.
Christofides, N. (1975), Graph theory - an algorithmic approach (Academic Press, New York).
Dearing, P.M., and R.L. Francis (1974), A minimax location problem on a network, Transportation Sci. 8, 333-343.
Elzinga, J., and D.W. Hearn (1972), Geometrical solutions for some minimax location problems, Transportation Sci. 6, 379-394.
Elzinga, J., and D.W. Hearn (1973), A note on a minimax location problem, Transportation Sci. 7, 100-103.
Francis, R.L. (1967), Some aspects of a minimax location problem, Operations Res. 15, 1163-1169.
Goldman, A.J. (1972), Minimax location of a facility in a network, Transportation Sci. 6, 407-418.
Hakimi, S.L. (1964), Optimum locations of switching centers and the absolute centers and medians of a graph, Operations Res. 12, 450-459.
Hakimi, S.L. (1965), Optimal distribution of switching centers in a communication network and some related graph theoretic problems, Operations Res. 13, 462-475.
Halfin, S. (1974), On finding the absolute and vertex center of a tree with distances (Letters to the editor), Transportation Sci. 8, 75-77.
Handler, G.Y. (1973), Minimax location of a facility in an undirected tree graph, Transportation Sci. 7, 287-293.
Jacobsen, S.K., and P.M. Pruzan (1976), Lokaliseringsmodeller og løsningsmetoder, IMSOR, The Technical University of Denmark.
Krarup, J., and P.M. Pruzan (1977), Selected families of location problems. Part III: The plant location family, Working Paper No. WP-12-77, University of Calgary, Faculty of Business.
Lin, C.C. (1975), On vertex addends in minimax location problems (Letters to the editor), Transportation Sci. 9, 166-168.
Minieka, E. (1970), The m-center problem, SIAM Review 12, 138-139.
Odoni, A.R. (1974), Location of facilities on a network: a survey of results, Technical Report 03-74, Operations Research Center, M.I.T., Cambridge, U.S.A.
Toregas, C., R. Swain, C. ReVelle, and L. Bergman (1971), The location of emergency service facilities, Operations Res. 19, 1363-1373.
Wesolowsky, G.O. (1972), Rectangular distance location under the minimax optimality criterion, Transportation Sci. 6, 103-113.
PART II. MEDIAN PROBLEMS

8. Introduction

The median problem is to locate a set of objects (points) so as to minimize the sum of the weighted distances between the members of the set and the clients (points) assigned to them, i.e. the clients to serve/be served by them. As was the case for the center problems considered in Part I, clients are assumed to be assigned to the objects nearest them. The optimum location of one such object is simply called the median, while the optimum location of p objects is referred to as the p-median. The origin of the name “median” will become clear from the discussion of the median problem in the plane with rectilinear distance. Such a minisum objective function can be considered as a prototype for many location problem formulations where some measure of the total or average cost of serving clients is to be minimized. In many problem formulations, however, the objective function to be optimized is more complicated. In fact, the well-known simple plant location problem (SPLP) to be treated in [Krarup & Pruzan 1977A] (to be published) is but a generalized median problem where fixed costs are assigned to the establishment of the objects or “plants” and where the number of objects to be located is not given in advance (which it seldom is in typical location problems), but is to be determined by minimizing the sum of the fixed costs and the costs of serving the clients. As was the case with center problems, by far the most attention has been paid to median problems in a network, although historically these problems were first formulated in the plane, i.e. where an object to be located may be placed at will with respect to the clients with predefined locations. We will therefore follow the logical, and historical, approach employed in surveying the center problems, and begin here by considering first the location of one object in the plane, then one object in a network, before continuing with the more realistic and demanding p-median problems in the plane and in a network.
9. 1-Median problem in the plane

9.1. Euclidean distance
Given a finite set of clients J = {1, . . . , n} with coordinates (x_j, y_j) and with weights w_j, j ∈ J. The 1-median problem in the plane is to determine the location (x_q, y_q) of an object q such that the sum of the weighted distances w_j d_qj (here Euclidean) from the object to the clients is minimized.
The Euclidean distance d_qj between q and the jth client is defined by

d_qj = ||·||_2 = {(x_q − x_j)² + (y_q − y_j)²}^½.
The weighted Euclidean distance between the object and a client is the product of the Euclidean distance between them and the weight assigned to the client. The problem is thus:

1-Median problem in the plane

min_{(x_q, y_q)} Σ_{j∈J} w_j d_qj.
The problem is a generalization of a problem first formulated by Fermat in the early 1600's: “Given 3 points in the plane, find a fourth point such that the sum of its distances to the three given points is as small as possible.” The problem was solved geometrically around 1640 by Torricelli. In 1750 Simpson generalized the problem by including unequal weights w_1, w_2 and w_3. In 1909 Weber utilized this generalized model to determine the optimal location of a factory which produces one product with two given, distinct sources for its raw material and which serves a single customer, where the three given points are not collinear (so that the objective function is strictly convex). The problem formulation given above as the 1-median problem in the plane with Euclidean distance is often referred to as the generalized Weber problem or the generalized Fermat problem, where this latter characterization is also applied to problems regarding the location of an object in Euclidean space with dimension greater than two.

Duality in nonlinear programming, an aside. Participants in the IX International Symposium on Mathematical Programming (Budapest, 1976) will undoubtedly recall one of its highlights: Harold Kuhn's lucid exposition of duality in nonlinear programming, now available in [Kuhn 1976]. To shed more light on the history of nonlinear programming, Kuhn has on several occasions discussed Fermat's problem and attributed its dual to Fasbender (1846). However, in his continuing search for the first occurrence of duality in the mathematical literature (i.e. a pair of optimization problems, min z, max w, based on the same data, with z ≥ w for feasible solutions to both problems and optimality if and only if z = w), Kuhn has discovered an earlier source, The Ladies Diary or Women's Almanack (1755), in which the following problem is posed: “In the three Sides of an equiangular Field stand three Trees, at the Distances of 10, 12, and 16 Chains from one another: To find the Content of the Field, it being the greatest the Data will admit of?” Further, in the Annales de Mathématiques Pures et Appliquées (1810-11), a similar question is asked: “Given any triangle, circumscribe the largest possible equilateral triangle about it”. The substance of the solution provided by the French mathematician Vecten (1811-12) is that the altitude of the largest
equilateral triangle that can be circumscribed about a given triangle is equal to the sum of the distances to the three vertices from the point solving the Fermat problem defined by the given triangle. To quote Kuhn: “Until further evidence is discovered, this must stand as the first instance of duality in nonlinear programming!”
9.1.1. Solution methods

The objective function is strictly convex when the clients are not located on a straight line, and convex in the case of collinearity; in fact the 1-median problem in the plane is convex for all distance measures based upon a norm ||·||_m = (|x_q − x_j|^m + |y_q − y_j|^m)^{1/m} for m ≥ 1. Thus a local minimum is also a global minimum. Therefore a global minimum satisfies the condition that the gradient of the objective function equals zero. This results in the following equations in the case m = 2:

Σ_{j∈J} w_j (x_q − x_j)/d_qj = 0,   Σ_{j∈J} w_j (y_q − y_j)/d_qj = 0.
These equations cannot be solved analytically, hence a numerical method is necessary. This led Weiszfeld in 1937 to conjecture that the following iterative procedure could be used to generate a sequence of values (x_q^t, y_q^t), t = 1, 2, . . . , converging to the optimal coordinate values (x_q*, y_q*) solving the above equations:

x_q^{t+1} = [Σ_{j∈J} w_j x_j/d_qj^t] / [Σ_{j∈J} w_j/d_qj^t],   y_q^{t+1} = [Σ_{j∈J} w_j y_j/d_qj^t] / [Σ_{j∈J} w_j/d_qj^t],

where d_qj^t denotes the Euclidean distance from (x_q^t, y_q^t) to (x_j, y_j).
The conjecture was almost true. Troubles only arise when (x_q^t, y_q^t) = (x_j, y_j) for some t and j, i.e. the object location coincides with the location of a client at some stage. In this case d_qj^t = 0 appears in the two denominators, corresponding to the fact that the objective function is not differentiable at the client locations (x_j, y_j), all j. A natural extension of the algorithmic map based upon l'Hospital's rule was made by [Kuhn & Kuenne 1962]. This produced a mapping with fixed points at all (x_j, y_j), as well as at (x_q*, y_q*). In addition, Kuhn and Kuenne provided a test of optimality for the case (x_q^t, y_q^t) = (x_j, y_j). The only problem thereafter was to find an (x_q^{t+1}, y_q^{t+1}) when the optimality test failed. [Kuhn 1973] gave evidence that the event (x_q^t, y_q^t) = (x_j, y_j) ≠ (x_q*, y_q*) is rather unlikely. [Jacobsen 1973] gave a formula for (x_q^{t+1}, y_q^{t+1}) when (x_q^t, y_q^t) = (x_j, y_j), and [Cordelier, Fiorot & Jacobsen 1976] proved this extended algorithm to converge when clients are not collinear. The algorithm expends quite a few iterations on homing in on the optimum once the location is in its neighbourhood. The Newton method mentioned in an exercise by [Francis & White 1974] converges much faster, if it does not oscillate.
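A minimal sketch of the iteration (Python with numpy); the safeguard taken when an iterate lands exactly on a client is a crude perturbation of our own choosing, not the Kuhn-Kuenne or Jacobsen treatment referred to above:

    import numpy as np

    def weiszfeld(points, w, iters=1000, tol=1e-10):
        P = np.asarray(points, float)
        w = np.asarray(w, float)
        q = (w[:, None] * P).sum(axis=0) / w.sum()   # start at the centroid
        for _ in range(iters):
            d = np.linalg.norm(P - q, axis=1)
            if (d == 0).any():                       # q coincides with a client
                q = q + 1e-9                         # nudge off the client
                d = np.linalg.norm(P - q, axis=1)
            inv = w / d
            q_new = (inv[:, None] * P).sum(axis=0) / inv.sum()
            if np.linalg.norm(q_new - q) < tol:
                return q_new
            q = q_new
        return q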
Some authors have approached the non-differentiability problem by approximating the objective function by a function differentiable everywhere; for a hyperbolic approximation, see [Eyster, White & Wierville 1973]; for a parabolic approximation, see [Jacobsen 1974]. Quite a few fairly recent papers and books recommend the straightforward Weiszfeld conjecture without mentioning the non-differentiability problem. In view of the evidence reported by [Kuhn 1973] this may be practically acceptable and will seldom lead to a non-optimal fixed point, but some qualification should then be given to make the reader aware of the problem. [Schaefer & Hurter 1974] consider the 1-median problem in R^m where the solution is constrained to be within a maximum distance of each client. They develop an algorithm for solving such problems based upon a Lagrangian form of the problem which involves solving a sequence of unconstrained 1-median problems. The algorithm can not only be employed to solve problems characterized by a Euclidean measure of distance in the plane, but is also applicable in m-dimensional space and for all distance measures based upon a norm ||·||_m. That is, for q a point in m-dimensional space, the algorithm solves:

min_{q∈R^m} Σ_{j∈J} w_j d_qj
where d_qj ≤ d_j (or w_j d_qj ≤ d_j), j = 1, . . . , n, and d_qj is any metric. Such hybrid problem formulations are of considerable interest, as it is seldom that the minimization of the sum of the weighted distances adequately reflects, for example, many emergency problems, while the center problems considered earlier, focusing as they do only upon the maximum weighted distance between an object and a client, can be criticised for ignoring the sum of the weighted distances (costs) associated with the minimax solution. The constrained problem above can be rewritten in Lagrangian form (m = 2):

min_{(x_q, y_q)} [Σ_{j∈J} w_j d_qj + Σ_{j∈J} λ_j (d_qj − d_j)]

or

min_{(x_q, y_q)} [Σ_{j∈J} (w_j + λ_j) d_qj − Σ_{j∈J} λ_j d_j],

where λ_j is the optimal Kuhn-Tucker multiplier for the jth constraint. These multipliers can be interpreted as the additional weight that would have to be added to w_j in order that the optimal solution to the constrained problem is an optimal solution to the unconstrained problem with the original weights replaced by the modified weights, i.e. λ_j is an additional weight to be added to the original
weight so as to “pull” the unconstrained optimum into the feasible region for the constrained formulation. Clearly, as the d_j, the right hand sides of the constraints, increase, the optimum value of the objective function for the constrained formulation decreases towards the unconstrained minimum, and the artificial weights λ_j decrease; thus the optimal value of the objective function increases strictly as the weights increase (an exception, though, is when the object is located at a client and only that client's weight increases, in which case the objective function remains unchanged). Furthermore, we need only consider the clients whose constraints are not satisfied by an optimal solution to the unconstrained problem, i.e. the subset of clients j ∈ D such that d_{q*j} > d_j, where q* is the optimal solution to the unconstrained problem. Consider the Lagrange formulation of the 1-median problem for a client i belonging to the set D:

min_{(x_q, y_q)} [Σ_{j∈J} w_j d_qj + λ_i (d_qi − d_i)]

or equivalently

min_{(x_q, y_q)} [Σ_{j=1, j≠i}^{n} w_j d_qj + (w_i + λ_i)(d_qi − d_i)].
An increase in λ_i is equivalent to an increase in w_i, and the objective function increases as the weight (w_i + λ_i) increases. The algorithm involves solving this problem, that is, for each i ∈ D, finding the smallest λ_i ≥ 0 such that the solution is feasible with respect to client i when all the other weights are unchanged. This corresponds to a search over λ_i and is the basis of the algorithm. If the solution to this problem is feasible, it is a candidate for the optimal solution to the full-blooded constrained problem. The value of the objective function is compared with an upper bound (e.g. obtained from the solutions for clients in D examined previously). If the new solution is less than the previous upper bound, it becomes the new upper bound. This procedure continues until all i ∈ D have been examined; the final upper bound represents the optimum value of the objective function, and its corresponding solution is the optimal solution. [Hurter, Schaefer & Wendell 1975] consider properties of solutions to generalized 1-median problems subject to more general constraints than the norm constraints considered above. With reference to the above use of a Lagrangian formulation, it is established that if the objective function consists of a sum of functions of the distances between the object to be located and the clients, Σ_{j∈J} c_j(d_qj), where d_qj is a norm function and c_j(d_qj), the total cost of shipping between the object q ∈ R^m and client j, is a non-decreasing lower-semicontinuous function on [0, ∞) of d_qj alone, and if the only constraints are on the distances, d_qj ≤ d_j, then the solution to the constrained formulation is also an optimal
solution to the unconstrained problem:

min_{q∈R^m} Σ_{j∈J} c̄_j(d_qj),

where c̄_j(d_qj) = c_j(d_qj) + λ_j d_qj.
[Jacobsen & Pruzan 1976] consider the sensitivity of the total weighted distance to the location of the object in the case where the object both receives goods from the clients in its neighbourhood and sends these goods in modified form, e.g. after a production process, to a major client located outside the neighbourhood. This is a typical situation in many agricultural settings where grain, milk or animals are sent from farms to a central silo, dairy or slaughterhouse, and where the processed goods, often characterized by reduced weight and easier handling and transportability, are then sent to population centers located outside the surrounding countryside. Their results, together with several empirical investigations, indicate that the location of objects (e.g. silos, warehouses etc.) in such problems can be determined primarily on the basis of in-transport costs alone.

9.2. Rectilinear distance

The formulation is identical to the 1-median problem in the plane with Euclidean distance except that the ||·||_1 norm or rectilinear distance measure is employed:
min_{(x_q, y_q)} Σ_{j∈J} w_j d_qj,

where d_qj = ||·||_1 = |x_q − x_j| + |y_q − y_j|.
In many problem formulations, the rectilinear distance measure will be preferred. Not only is this measure of distance perhaps more relevant when considering the location of an object in a city characterized by streets which are orthogonal to each other, but also in many three-dimensional problems, e.g. those dealing with the installation of pipes, electric cables etc. in a building. Furthermore, even should this measure of distance not be more relevant than, say, the Euclidean measure, it is much easier to work with due to its separability. This permits a much simpler algorithm to be employed, and our experience shows that the optimal solution to a 1-median problem is relatively insensitive to the choice of distance measure.

9.2.1. Solution methods

From the problem formulation given above, it is evident that the objective function is separable, so that the two sub-models can be solved independently of
each other:

min_{x_q} Σ_{j∈J} w_j |x_q − x_j|   and   min_{y_q} Σ_{j∈J} w_j |y_q − y_j|.
As mentioned under the 1-median problem with Euclidean distance, the objective function is also convex here, and it is therefore sufficient to determine a local minimum. This can be found by differentiating the two independent objective functions with respect to x_q and y_q respectively, with due consideration to the fact that these are non-differentiable for x_q = x_j or for y_q = y_j, j ∈ J. The slopes of the tangents to the right of, respectively to the left of, such a point x_j are

Σ_{k: x_k ≤ x_j} w_k − Σ_{k: x_k > x_j} w_k   and   Σ_{k: x_k < x_j} w_k − Σ_{k: x_k ≥ x_j} w_k.
Corresponding expressions result when considering differentiation with respect to y_q, and we will therefore continue this exposition considering the x-subproblem only. The conditions for a local, and therefore a global, optimum value of x_q, x_q*, are that the slope is nonpositive immediately to the left of x_q* and nonnegative immediately to the right of it. We define the marginal distribution of demand

W(x) = Σ_{j: x_j ≤ x} w_j.

Simple manipulation results then in the following conditions:

W(x_q*−)/W(+∞) ≤ ½ ≤ W(x_q*+)/W(+∞).

As W(+∞) = Σ_{j∈J} w_j, it is seen that the optimal location (x_q*, y_q*) is the median of the marginal demand distributions; thus the origin of the name “1-median problem”. It can also be seen that the median is not necessarily unique. An alternative solution procedure is based upon repairing the non-differentiability of the objective function at a client location by modifying the distance measure. As was also observed when considering a similar procedure for the 1-median problem with Euclidean distance, this procedure is rather demanding in comparison to simply determining the median, which only requires a sorting of the clients according to their coordinates. Finally, in extension to the remarks made earlier that the rectilinear distance measure permits a simpler algorithmic procedure than the Euclidean distance measure, reference is made to [Francis & White 1974] where the following
formula is developed for the interval for the optimum value of the objective function in the Euclidean case:

(1/√2) Σ_{j∈J} w_j d¹(q_R*, j) ≤ Σ_{j∈J} w_j d²(q_E*, j) ≤ Σ_{j∈J} w_j d²(q_R*, j),

where q_R* = (x_q^R, y_q^R) is an optimal rectilinear solution, q_E* = (x_q^E, y_q^E) is an optimal Euclidean solution, and d¹ and d² denote rectilinear and Euclidean distance respectively.
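Since the optimum is simply the weighted median of each marginal distribution, the computation reduces to a sort and a cumulative sum. A minimal sketch (Python with numpy; where the median is not unique, this returns the leftmost optimal coordinate):

    import numpy as np

    # Rectilinear 1-median solved coordinate-wise as the weighted median of
    # the marginal demand distributions, per the optimality conditions above.

    def weighted_median(values, weights):
        order = np.argsort(values)
        v, w = np.asarray(values)[order], np.asarray(weights, float)[order]
        half = w.sum() / 2.0
        idx = int(np.searchsorted(np.cumsum(w), half))  # first index with W >= W/2
        return v[idx]

    def rectilinear_median(clients, w):
        xs, ys = np.asarray(clients, float).T
        return weighted_median(xs, w), weighted_median(ys, w)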
10. 1-Median problem in a network

Instead of starting the exposition by first formulating the 1-median problem in a network as an absolute-median problem and then as a vertex-median problem, it will be worthwhile here first to consider whether or not it is necessary to distinguish between these two cases. If the absolute median can in fact be located somewhere on the edge (x, y) joining the vertices x and y, then it can also be located at both the vertices; were this not the case, then it would be possible to move the median in the direction of one of the two vertices and obtain a marginal reduction in the weighted distance, in which case the same argument leads to the median being at only one of the vertices. In other words, either the median is located at every point on the edge, and therefore at both of its end vertices, or it must be located at one, but not both, of the vertices. The formal proof following these lines was given by [Hakimi 1964]. Hence, it is only necessary here to consider the 1-vertex median problem in a network. Furthermore, as the above results hold for directed graphs as well as for undirected graphs, it is not necessary to distinguish between “in”-medians and “out”-medians, and therefore we will not distinguish between directed and undirected networks. The following formulation is straightforward:

1-Vertex median problem in a network

min_{i∈V} Σ_{j∈V} w_j d_ij.
10.1. Solution methods

Once again, it is assumed that the matrix of shortest path distances D = {d_ij} is available. We are not aware of solution procedures other than the straightforward brute-force search: computing the sum of the weighted distances from every vertex (which requires an n-term scalar product for each vertex) and determining the minimum by direct comparison. The total calculation is thus proportional to n³, as the computation of the D-matrix is roughly of order n³.
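The brute-force search is a one-liner once D is available; a minimal sketch (Python with numpy):

    import numpy as np

    # Brute-force vertex median: weighted row sums of the shortest-path
    # matrix D, then the minimizing row, as described above.
    def vertex_median(D, w):
        totals = np.asarray(D, float) @ np.asarray(w, float)  # sum_j w_j d_ij
        return int(totals.argmin())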
[Odoni 1974] considers two modifications of the “pure” 1-median problem in a network considered above; we will structure these two formulations so that they lead up to the more general discussion of solution properties in the following section on p-median problems. In both of the two modified formulations it is assumed that an object (destination) δ has been located which can receive the commodities to be sent by each client or source, and that the problem is to optimally locate a second object q. The “new median” problem is to determine the location of a second destination so as to minimize the sum of the weighted distances from the clients (sources) to the destinations, where sources send their commodities to the destination nearest to them (i.e. which minimizes the length of the shortest path in the network from the client to a destination). This is in fact a special case of the “supporting median” problem, in which a supporting object is to be located which can receive and process the commodities to be sent from the sources and then transship the commodities after the processing to the destination; a source will send its commodities to the supporting object and then on to the destination if this processing results in a net reduction in the total weighted distance from the source to the destination. The processing at the supporting median can be considered to be a transformation which results in a reduction of the weight of the commodity (or, equivalently, a reduction in the cost per unit distance or time from the median to the destination). The problem is to locate the object (supporting median) q such that the sum of the weighted distances from all sources to the destination is minimized, where the weighted distance for goods processed at the supporting median is proportional to the distance from the median to the destination. If we replace the vertex weights w_j in the “pure” median formulation by two components, w_j, the weight (or cost per unit distance) of the source-to-destination or source-to-median movement of a commodity with source j, and w′_j, the weight of the median-to-destination movement of the commodity after it has been transformed at the supporting median, and if we divide the set of vertices V into the two subsets V_δ and V_q such that V_δ ∪ V_q = V and for all j ∈ V_δ, w_j d_jδ ≤ w_j d_jq + w′_j d_qδ, while for all j ∈ V_q, w_j d_jδ > w_j d_jq + w′_j d_qδ, then the supporting median problem can be formulated as:

Supporting median problem

min_{q∈V} [Σ_{j∈V_δ} w_j d_jδ + Σ_{j∈V_q} (w_j d_jq + w′_j d_qδ)].
As in the case of the “pure” 1-median problem, at least one of the optimal solutions to this “supporting median” problem can be shown to be a vertex. This result is seen to apply as well for the “new median” problem sketched above; in that formulation we may simply consider the processing at the supporting median to result in a reduction to zero of the weight of the commodities sent to it for processing and transshipment to the existing object (destination) so that the supporting object becomes in fact a destination.
As can be seen in Section 12, these two formulations are variations of the more generalized forms of the p-median problem in a network treated by [Goldman 1969] and by [Hakimi & Maheshwari 1972]. Having once again established that it is only necessary to consider the 1-vertex median problem, we are still faced in principle with the task of having to employ the straightforward, brute-force search of the sum of the weighted distances from every vertex, although it is presumably possible to reduce this search due to prior knowledge of the location of the existing destination. Extensions of this problem appear to be quite realistic in practical location problems, characterized as they most often are by existing objects. The following logical extensions do not appear to have been the subject of attention: (a) more than one existing destination, (b) more than one new or supporting median to be located, (c) combinations of cases (a) and (b). It appears, however, to be a reasonable conjecture that these cases will in fact have properties similar to those determined for the modified 1-median problems considered above. [Goldman 1971] has, however, developed simple solution methods for the 1-median problem in networks that contain no cycles (acyclic case, a tree) and networks containing exactly one cycle (unicyclic case). For the acyclic case, the algorithm is extremely simple:

(1) Select any end vertex i. Compare its weight to w/2, where w = Σ_{j∈V} w_j. If w_i < w/2, go to Step 2. Otherwise go to Step 3.
(2) Add the weight of the end vertex just considered to the weight of the vertex to which it is connected. Then modify the network by deleting both the end vertex and the edge connecting it to the vertex currently being considered. This may or may not generate a new end vertex in the modified network. Return to Step 1.
(3) Stop, the median is the vertex just considered.
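A minimal sketch of this leaf-folding procedure (Python; the adjacency-list and weight-dictionary input format is our own convention):

    # Goldman's algorithm for the 1-median of a tree, as described above.
    # adj: {vertex: set of neighbours}; w: {vertex: weight}.

    def goldman_tree_median(adj, w):
        adj = {v: set(nb) for v, nb in adj.items()}
        w = dict(w)
        half = sum(w.values()) / 2.0
        while True:
            leaf = next(v for v in adj if len(adj[v]) <= 1)
            if w[leaf] >= half or len(adj[leaf]) == 0:
                return leaf                 # Step 3: this vertex is the median
            u = adj[leaf].pop()             # Step 2: fold leaf's weight into u
            w[u] += w[leaf]
            adj[u].discard(leaf)
            del adj[leaf], w[leaf]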
This procedure does not require determining the matrix of shortest path distances, as in fact distance is not employed in determining the median. Together with the results of [Handler 1973] regarding the center in a tree with vertex weights, referred to previously in Section 3 on the center problem in a network, the above formulation by Goldman is employed by [Halpern 1976] in a “hybrid” formulation. The objective function is a convex combination of the objective functions of a center problem where all the vertex weights are assumed to be unity (corresponding to minimizing the maximum distance from a point on the tree to an (end) vertex) and the median problem in the weighted tree. In other
words, the objective function is:

Cent-dian problem

min_{q∈G} [λ Σ_{j∈V} w_j d_qj + (1 − λ) max_{j∈V} d_qj],   0 ≤ λ ≤ 1.
The point where this hybrid objective function assumes its minimum value has been named the “cent-dian” by Halpern as a suitable combination of the words center and median, “because of its nice sound, in particular when pronounced in a French accent”. Halpern first shows that the cent-dian is located on the path P(m, c) connecting the median m and the “center” c (as only the maximal distance is considered here, all vertex weights are implicitly considered to be equal to 1). Then, as the point q moves along this path, the objective function is continuous, convex and piecewise linear with breaking points (points where its derivative changes value) at the vertices along P(m, c). From this it follows immediately that for given λ, the cent-dian is located at one of the vertices along the path connecting the median and the “center” of the tree or at one of the end points, m and c, of this path. A simple procedure is then presented for locating the cent-dian of the tree for any value of λ, where the median is determined following [Goldman 1971] and the “center” is determined following [Handler 1973]. Accordingly, if the weights w_j of the vertices in the tree are integers, then for any λ ≥ ½ the cent-dian coincides with the median. If the original tree G = (V, E) is modified by assigning the center point c to the edge (x, y) ∈ E, i.e. c → (x, y), then the resulting subdivision tree G′ = (V′, E′) is defined by

V′ = V ∪ {c};   E′ = E ∪ {(x, c), (c, y)} \ {(x, y)}.

If the new vertex c ∈ V′ is weighted with the weight w_c = 1/λ − 1, then the median of G′ coincides, as shown by Halpern, with the cent-dian of G. Unfortunately, it appears that these results cannot directly be extended to the more general case where the center of the tree is determined as that point which minimizes the maximum weighted distance to a vertex (instead of the maximum distance to a vertex considered by Halpern). Extensions to networks in general and to p-cent-dian problems are non-existent at present to the best of our knowledge.
11. p-Median problem in the plane

11.1. Euclidean distance

Given a set J of n clients in the plane with coordinates (x_j, y_j) and weights w_j. The problem is to determine the coordinates (x_{q_i}, y_{q_i}), i = 1, . . . , p, of a set S_p of p
objects so as to minimize the sum of the weighted distances between client-object pairs. For given object locations, client-object pairs are formed by assigning each client to that object which is closest to it. The client weights can be interpreted as the amounts to be shipped between the clients and their associated (closest) objects, the allowable object locations are free of restrictions, and the weighted distance between an object and a client is the product of the client weight and the appropriate measure of distance. In analogy with the 1-median problem in the plane, this problem may be referred to as a multiobject generalized Weber problem, although the term Weber problem is traditionally reserved for cases where the measure of distance is Euclidean. Let f_ij be the proportion of w_j sent between client j and object q_i. The problem is then:
p-Median problem in the plane
min Σ_{i∈S_p} Σ_{j∈J} w_j f_ij d_{q_i j},
|S_p| = p,
Σ_{i∈S_p} f_ij = 1,   j ∈ J,
f_ij ≥ 0,   i ∈ S_p, j ∈ J,
where d_{q_i j} is determined according to the measure of distance employed. In the following, we will focus the discussion upon the Euclidean distance measure, d_{q_i j} = ||·||_2 = {(x_{q_i} − x_j)² + (y_{q_i} − y_j)²}^½, although the algorithm discussed will also be applicable to other distance measures; [Kuenne & Soland 1972] discuss e.g. Euclidean, rectilinear and great circle metrics, although the latter are transformed into so-called rhumb-line map distances. Judging from the apparent lack of relevant literature, there exist very few algorithms for solving the above problem; we know of only one publicized algorithm, [Kuenne & Soland 1972], and its results are rather depressing. On the other hand, there exists a wealth of documented experience with heuristic procedures. We will therefore also include some mention of such approximative approaches, and in fact structure the discussion of their algorithm upon the well-known Alternate Location Allocation procedure (or ALA as it is often referred to) used in many of the heuristics. Unfortunately, with the heuristic methods one does not know how “far” a solution is from the global optimum, i.e. by how much the solution's total weighted distance exceeds the minimum weighted distance. It appears to be difficult to determine good lower bounds which can be used to evaluate heuristic procedures for the p-median problems in the plane, and as will be seen shortly, this difficulty in establishing good lower bounds explains many of the difficulties in constructing a good branch and bound procedure.
11.1.1. Solution methods In comparison with the 1-median problem in the plane, the objective function for the p-median problem in the plane is neither convex nor concave. Therefore,
the problem at hand is far more difficult to solve and there are no simple procedures which can guarantee convergence to a global minimum. On the other hand, there are two insights which strongly suggest both a combinatorial programming approach via branch and bound and the above-mentioned heuristic approach via ALA. First of all, it can be seen from the objective function to be minimized that the f_ij terms will be 0 or 1 in an optimal solution; for given object locations the weighted distances are linear in the f_ij and there are no capacity constraints (the so-called Single Assignment Property). Secondly, the existence of an efficient algorithm for the 1-median problem, i.e. for determining the optimal location of an object for any allocation of clients to it, suggests determining optimal client groupings or clusters. In order to better structure the discussion of the algorithm for the p-median problem, we will first briefly discuss the ALA heuristic. As mentioned above, the objective function is not convex. However, if the f_ij take on given values of 0 or 1, then the objective function becomes convex in the (x_i, y_i) values and decomposes into p 1-median problems solvable by the iterative, rapidly converging algorithm considered previously. Furthermore, if the (x_i, y_i) are given, the objective function becomes linear in the f_ij variables and therefore convex. Thus, an iterative, heuristic procedure (with solutions dependent upon the initial solution) is as follows:

Allocation phase: For given object locations (x_i, y_i), which are a result of either an initial input or a previous iteration, allocate the clients to the nearest object; this determines the values of the f_ij as 1 if client j is assigned to object i, and 0 otherwise.

Location phase: The f_ij values are given from the preceding allocation phase and the locations (x_i, y_i) of the objects are to be determined. This requires the solution of p 1-median problems, e.g. using the previously described algorithm.

This Allocation-Location procedure continues to iterate between the two phases until a new iteration does not produce a solution in which the locations (or weighted distances) differ from the locations (weighted distances) in the preceding iteration by more than a prespecified amount. This procedure can be performed quite efficiently on a computer and is at present able to treat p-median problems which are far beyond the limits of the algorithm to be described below, which is extremely demanding with respect to computer storage. The heuristic procedure can treat problems where the number of clients and/or objects is on the order of thousands, while the branch and bound procedure can run into serious storage difficulties for problems with far less than 100 clients and only a few objects.
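The two phases admit a compact implementation. The following minimal sketch (Python with numpy) uses the weiszfeld routine from the sketch in Section 9.1.1 as its location phase; the random initialization at p client sites and the stopping rule are our own simple choices:

    import numpy as np

    # ALA sketch for the Euclidean p-median; assumes the weiszfeld() function
    # sketched in Section 9.1.1 is in scope.

    def ala(clients, w, p, tol=1e-6, max_iter=100, seed=0):
        P = np.asarray(clients, float)
        w = np.asarray(w, float)
        rng = np.random.default_rng(seed)
        centers = P[rng.choice(len(P), size=p, replace=False)]
        prev_cost = np.inf
        for _ in range(max_iter):
            # allocation phase: assign each client to its nearest object
            dist = np.linalg.norm(P[:, None, :] - centers[None, :, :], axis=2)
            assign = dist.argmin(axis=1)
            cost = (w * dist[np.arange(len(P)), assign]).sum()
            if prev_cost - cost < tol:
                break
            prev_cost = cost
            # location phase: solve a 1-median problem for each cluster
            for i in range(p):
                members = assign == i
                if members.any():
                    centers[i] = weiszfeld(P[members], w[members])
            # (empty clusters are simply left in place in this sketch)
        return centers, assign, cost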
Table 11.1. Number of ALA solutions falling within each interval of total weighted distance (from 2000 down to 400), for p = 1, ..., 10; the number of solutions determined for each value of p (1, 7, 21, 37, 54, 72, 86, 88, 81 and 96, respectively) heads its column, and the starred entries mark the interval containing the minimal weighted distance. [The body of the table is not legible in this copy.]
The table above, from page 119 in [Jacobsen & Pruzan 1976], gives the results of applying the ALA heuristic to a problem characterized by n approximately 100 and for p varying from 1 to 10, where a number of different initial object locations within the convex hull of the clients were chosen at random to start the iterative procedure. The interval containing the minimal weighted distance (determined by an algorithm) is starred, the number of solutions determined for each value of p is given above each column, and the entries are the numbers of solutions with objective function values within a given interval. Even though the histograms do not necessarily include all the possible ALA solutions, the table gives an indication of
how far a local optimum can lie from the global optimum. It should be mentioned, however, that the problem considered was a modified p-median problem, in that the object locations were compelled to coincide with client locations. See [Teitz & Bart 1968] for a discussion of the performance of an ALA routine in this case. Further evidence as to the attributes of the ALA heuristic can be found in [Eilon, Watson-Gandy & Christofides 1971]. The following results are based upon a test problem where n = 50 and where a number of initial locations were chosen for each value of p considered:
Table 11.2

p    Number of initial    Number of local    % deviation:
     solutions            optima found       worst vs. best solution
1          2                    1                    -
2        230                   18                   6.9
3        200                   26                  25.8
4        185                   37                  15.2
5        200                   61                  40.9
Furthermore, there appeared to be evidence that there is but a weak correlation between good initial solutions and their corresponding final solutions, and that the cost function is shallow in the region of the optimum.

With this background, the following is an overview of the branch and bound procedure suggested by [Kuenne & Soland 1972]; following the overview, emphasis will be placed upon the quality of the lower bounds employed, for as will be seen, it is the poor quality of these bounds which makes the procedure so demanding of computer storage and thereby rules out its application to "larger" problems.

The set of solutions to the problem is partitioned on the basis of the assignment of clients to objects. Each subset of solutions is characterized by its particular assignment of some of the clients, and a branching client is selected at each branching from those clients as yet unassigned in the given subset. For each subset that is obtained from partitioning a larger subset, a lower bound is computed. If this bound is less than the value of the objective function for the best solution found thus far, a branching client is chosen, and a feasible solution is found as follows. Suppose that a subset of solutions assigns clients to say k < p objects. Each of these objects has a tentative location determined as that location which minimizes the weighted distance to the clients so far assigned to it, where this location is determined by applying, e.g., the 1-median algorithm previously described. Based upon the hypothesis that these object locations will not change significantly, these locations are temporarily fixed and each unassigned client is assigned to the closest object.

As a partitioning of the subset into smaller subsets should be performed such that the lower bounds for these are as large as possible, a heuristic procedure is
then carried out whereby an as yet unassigned client is chosen as the branching client if its weighted distance to the tentatively located objects is large, corresponding to, e.g., a client where the product of weight and minimum or average distance to the tentatively determined object locations is maximized. The algorithm stops when all subsets have lower bounds that equal or exceed the value of the best solution found thus far; in practice, the computations can be speeded up by making comparisons based upon $(1+\rho)^{-1}$ times the value of the best solution so far, thereby guaranteeing a solution whose objective function has a value within $100\rho$ per cent of the value of the optimal solution.

As mentioned earlier, we will now concentrate upon the development of lower bounds for the p-median problem in the plane; it will be seen that at present there do not exist procedures with the ability to produce "good" bounds. The development of such procedures can therefore be considered to be one of the challenging unsolved problems, both from the point of view of improving the algorithm described above, and for improving the ability to evaluate solutions generated by the many heuristic procedures available. We will start by developing a lower bound for the 1-median problem in the plane and then generalize to the p-median problem. The exposition follows in principle the development in [Kuenne & Soland 1972] but is based upon [Jacobsen & Pruzan 1976], as the bounds developed for the p-median problem there appear to be sharper than those referred to in Kuenne & Soland, but not sharp enough to result in any significant improvement to this algorithm.

As a starting point we consider a lower bound for determining the location of the object q in the 1-median problem based upon the triangle inequality

$$d_{qj} + d_{qk} \ge d_{jk}, \qquad j, k = 1, \dots, n,$$

where $d_{ab}$ is the distance between the locations of a and b according to the metric employed. This inequality holds for $w_j = w_k = 1$. In the general case,

$$w_j d_{qj} + w_k d_{qk} \ge \min\{w_j, w_k\} d_{jk}.$$

Summation over all unique pairs j and k gives

$$(n-1) \sum_{j=1}^{n} w_j d_{qj} \ge \sum_{j<k} \min\{w_j, w_k\} d_{jk},$$

where the left hand side is (n-1) times the total weighted distance for any given location of q, as each term $w_j d_{qj}$ appears (n-1) times. A lower bound is thus

$$LB_1 = \frac{1}{n-1} \sum_{j<k} \min\{w_j, w_k\} d_{jk}.$$
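Computing $LB_1$ is straightforward; the following short sketch (the function name lb1 is ours) evaluates the bound directly from its definition.

```python
import math
from itertools import combinations

def lb1(clients, weights):
    # Lower bound for the 1-median problem: sum of min(w_j, w_k) * d_jk
    # over all unique client pairs, divided by (n - 1).
    n = len(clients)
    total = sum(min(weights[j], weights[k]) * math.dist(clients[j], clients[k])
                for j, k in combinations(range(n), 2))
    return total / (n - 1)
```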
As the above approach forms the basis of the bounds developed in both [Kuenne & Soland 1972] and in [Jacobsen & Pruzan 1976], it is of interest to compare
such bounds to the corresponding minimal weighted distance. The following table [Jacobsen & Pruzan 1976, p. 146] is based upon four 20-client problems considered in [Eilon et al. 1971], where the client locations are identical in each of the problems but the weights $w_j$ vary. Clearly, the bounds cannot reasonably be characterized as being "sharp"; just the opposite is true.

Table 11.3

Problem      LB_1          Min. value
1               47,661         66,098
2              130,033        288,666
3              609,171      2,038,924
4            1,743,270     10,129,263
When the number of objects is greater than 1, an extension of the above bounding method based upon the triangle inequality results in even poorer bounds than those presented above for the case of a single object; see [Jacobsen & Pruzan 1976, pp. 147-148]. It is primarily due to the poor quality of the bounds that the procedure in [Kuenne & Soland 1972] is so ineffective and demands so much computer storage. Experience indicates that poor bounds can result in an explosive increase of the dimensions of a branch and bound search-tree.

Before we conclude this rather lengthy section on the difficult general p-median problem in the plane with Euclidean distance and proceed to a special case (which as will be seen has many similarities with the well-known quadratic assignment problem), a few comments are called for:

(1) Experience indicates that the computational requirements of the branch and bound routine described above can be reduced by starting with a "good" solution based e.g. upon an application of the ALA heuristic.

(2) Over and above the ALA heuristic, there exists a wealth of heuristic procedures; see [Jacobsen & Pruzan 1976, Chapters 2 and 3] for a synthesis and evaluation.

(3) A fundamental assumption underlying the choice of a planar instead of a network formulation must be that the transportation system in the area considered is quite well developed and that there do not exist significant barriers (e.g. rivers, mountain chains, etc. which only permit communication via certain routes), i.e. that it is reasonable to assume that for practical purposes, each point in the plane can be directly reached (using the given measure of distance) from every other point. The following publications suggest procedures whereby the distance measure can be modified in cases where the above assumptions are not fulfilled: [Christensen & Jacobsen 1969] consider a planar model with a rectilinear measure of distance and develop a method for determining when the shortest path
between two points cannot be determined directly from their coordinates, as well as a procedure for determining the length of the shortest path in such cases. [DeSalamanca & Zulueta 1971] develop similar procedures for planar models with specific applications to the placement of telephone exchanges. [Eilon et al. 1971] describe a heuristic procedure for minimizing the sum of weighted distances in cases where, although distance is assumed to be proportional to the Euclidean measure of distance, the proportionality factor depends upon where in the plane an object is located.

p-Median problem in the plane with mutual communication. [Francis & Cabot 1972] consider what at first glance appears to be a more general problem than the p-median problem in the plane with Euclidean distance and develop a set of properties for this class of problems. However, as will be seen, this class of problems, though certainly of considerable interest (see e.g. [Wesolowsky & Love 1971, 1972; Morris 1975]), differs considerably from the standard p-median problem in the plane. In fact it may be considered to be a special case leading up to the quadratic assignment problem rather than a more general class of p-median problems. They consider the problem of locating a set $S_p$ of p objects in the plane with respect to a set of weighted clients with given locations so as to minimize a cost function of a sum of costs directly proportional to the weighted Euclidean distances between the objects themselves and costs directly proportional to the weighted Euclidean distances between the objects to be located and the clients they serve. This apparent generalization of the p-median problem to include the weighted distances between objects as well as the weighted distances between objects and clients is however restricted due to the fact that all the weights have two indices instead of one. In the p-median problem the weights $w_j$ represent the total weight to be delivered to (or from) client j from all the p objects, while in the problem considered by Francis & Cabot the weights are given as $w_{ij}$ and $v_{ik}$, where $w_{ij}$ is the prespecified weight to be delivered from object i to client j and $v_{ik}$ is the prespecified weight to be delivered from object i to object k. That is, the allocation aspect or the "allocation-location" content of the p-median problem is removed. The problem formulation is thus:

p-Median problem in the plane with mutual communication

$$\min \sum_{i=1}^{p} \sum_{j \in J} w_{ij} d_{q_i j} + \sum_{i=1}^{p} \sum_{k=i+1}^{p} v_{ik} d_{q_i q_k},$$
where the first term represents the total weighted distance due to communication between objects and clients while the second term represents the total weighted
distance due to communication between the objects themselves; $d_{q_i j}$ and $d_{q_i q_k}$ are the distances (here Euclidean) between object i and client j and between objects i and k respectively. It can be noted that should the above formulation, characterized by the double subscripted weights, be modified to include only a finite number of possible, known locations, then the problem is a special case of the well-known quadratic assignment problem often used as a prototype formulation in connection with so-called plant layout problems, see [Krarup & Pruzan 1978]. Strictly speaking this remark could appear along with the discussion of the p-median problem in a network in the following section.

As mentioned earlier, the above formulation may in fact be considered as a special case of the p-median problem in the plane. This becomes quite clear if we assume that there is no communication between the objects, i.e. $v_{ik} = 0$ for all i, k. In this case the problem reduces to solving p independent 1-median problems in the plane, instead of reducing to the standard p-median problem in the plane.

Given the above formulation, Francis & Cabot establish the intuitively obvious fact that the total weighted distance function has a minimum. They proceed then to establish necessary conditions for a minimum to be obtained, and determine necessary and sufficient conditions for the weighted distance function to be strictly convex (it is always convex). It is shown that the weighted distance function is strictly convex if and only if for i = 1, ..., p the set $Q_i = \{j \mid w_{ij} > 0\}$ is nonempty and the points in each set $Q_i$ are not collinear. It is noteworthy that these conditions are independent of the second term in the weighted distance function, which represents the weighted distance between the objects to be located. Furthermore, it is shown that in the case where each object communicates with at least one other object which directly or indirectly (via yet another object) communicates with a client, then the optimum location of the objects is in the convex hull of the locations of the clients. Finally a dual to the problem is obtained and interpreted; the dual relationships are of theoretical interest but it is not established whether or not the dual problem can be computationally useful in solving the primal formulation. The objective function of the dual formulation is everywhere differentiable, which as [Love 1969] shows is not the case for the primal formulation. This lack of differentiability characterizes points of discontinuity where either two objects or an object and a client occupy the same position. Due to the non-existence of the partial derivatives it is not possible to guarantee the convergence to an optimal solution of gradient reducing procedures, and attempts to utilize such approaches have shown that the derivatives may oscillate considerably when either two objects approach each other or an object approaches a client. In fact, if the optimal solution is characterized by two points coinciding, a gradient method cannot converge. These limitations on using gradient reducing methods are quite significant, especially as many median problems in the plane are characterized by solutions with two or more points coinciding.
[Love 1969] has tackled the special problem (in 3-dimensional space) considered here by employing a so-called fitted function method which modifies the objective function over certain parts of the domain of object locations, so that it is continuous everywhere, has continuous first partial derivatives everywhere and converges uniformly to the original objective function. Tests are employed to determine when the objective function is to be so revised, and a gradient reducing method is employed which guarantees that a unique solution will be reached within a prespecified interval of the exact optimal solution; the dual problem is used to compute the interval. In fact the use of such procedures also permits including constraint equations as long as these define a convex set, which is the case if the restrictions are of the form $g(q) \ge 0$, $q \ge 0$ with g(q) concave. Included here are constraints which confine an object to lie within given boundaries (linear constraints) or within a given spherical region.
11.2. Rectilinear distance

We will consider here the two formulations treated in the preceding section on p-median problems in the plane with Euclidean distance. The first of these is the standard p-median formulation:
p-Median problem in the plane (standard)

$$\min \sum_{i=1}^{p} \sum_{j \in J} w_j f_{ij} \{|x_{q_i} - x_j| + |y_{q_i} - y_j|\}, \qquad \sum_{i=1}^{p} f_{ij} = 1, \ j \in J; \quad f_{ij} \in \{0, 1\}.$$
The second formulation considers communication between the objects to be located, but assumes that the weights to be transferred both between objects and clients and between objects are prespecified:
p-Median problem in the plane (with mutual communication)

$$\min \sum_{i=1}^{p} \sum_{j \in J} w_{ij} \{|x_{q_i} - x_j| + |y_{q_i} - y_j|\} + \sum_{i=1}^{p} \sum_{k=i+1}^{p} v_{ik} \{|x_{q_i} - x_{q_k}| + |y_{q_i} - y_{q_k}|\}.$$
11.2.1. Solution methods

As was mentioned under the Euclidean case, the standard p-median problem in the plane is apparently a rather ignored (very difficult?) area for algorithm
builders; the same comments were also previously applied to the p-center problem in the plane. While there are a large number of heuristic procedures available, the branch and bound procedure of [Kuenne & Soland 1972] appears to be the only well-known algorithm available. The algorithm will not be described here, as it has already been treated under the Euclidean case, where it was mentioned that the procedure can be applied to both Euclidean and rectilinear metrics, and via suitable transformation of the latitude coordinate, to approximations to the great circle metric as well. Furthermore, most of the heuristic methods for solving the p-median problem with Euclidean measures can be applied to the rectilinear case as well. Recent heuristics based upon the ALA procedure are investigated in [Love & Juel 1976].

When we consider the second, special formulation of the problem, there exist two solution methods. The first of these is based upon LP formulations, which therefore also permit the inclusion of linear constraints, e.g. to limit the location of any or all of the objects to given regions. The second method is based upon approximating the objective function by a non-linear convex function having continuous derivatives with respect to $x_{q_i}$ and $y_{q_i}$, and then applying gradient reduction methods.
LP methods. Following [Wesolowsky & Love 1971], the problem can be transformed into an equivalent LP problem as follows. We consider first the location of the p objects so that the sum of the weighted distances to the n clients with given locations $(x_j, y_j)$, $j \in J$, is minimized. Clearly, were this the only consideration, in that the $v_{ik}$ were all zero, then the problem would be reduced to p 1-median problems. We start nevertheless by ignoring the inter-object communication for the sake of clarity of exposition. The function to be minimized is

$$\sum_{i=1}^{p} \sum_{j \in J} w_{ij} \{|x_{q_i} - x_j| + |y_{q_i} - y_j|\}.$$
The absolute distances between the objects and the clients can be expressed as

$$d1_{ij} = x_j - x_{q_i} \ \text{if } x_{q_i} \le x_j, \ \text{otherwise } 0,$$
$$d2_{ij} = x_{q_i} - x_j \ \text{if } x_{q_i} \ge x_j, \ \text{otherwise } 0,$$
$$d3_{ij} = y_j - y_{q_i} \ \text{if } y_{q_i} \le y_j, \ \text{otherwise } 0,$$
$$d4_{ij} = y_{q_i} - y_j \ \text{if } y_{q_i} \ge y_j, \ \text{otherwise } 0,$$
$$d1_{ij}, d2_{ij}, d3_{ij}, d4_{ij} \ge 0.$$

These equalities can be rewritten as follows:

$$x_{q_i} + d1_{ij} - d2_{ij} = x_j,$$
$$y_{q_i} + d3_{ij} - d4_{ij} = y_j, \qquad i = 1, \dots, p; \ j \in J,$$
$$d1_{ij} \cdot d2_{ij} = 0, \qquad d3_{ij} \cdot d4_{ij} = 0.$$
If we ignore the last two conditions, the problem considered here can be rewritten as the following LP problem:

$$\min \sum_{i=1}^{p} \sum_{j \in J} w_{ij} (d1_{ij} + d2_{ij} + d3_{ij} + d4_{ij})$$

subject to

$$x_{q_i} + d1_{ij} - d2_{ij} = x_j, \qquad y_{q_i} + d3_{ij} - d4_{ij} = y_j, \qquad i = 1, \dots, p; \ j \in J,$$
$$d1_{ij}, d2_{ij}, d3_{ij}, d4_{ij} \ge 0.$$

Clearly, if the last two conditions are met, then the LP problem is equivalent to the problem of minimizing the weighted sum of the distances from objects to clients. That in fact the conditions are met at an optimum solution to the LP problem above can be verified by assuming that the current values of $d1_{ij}$, $d2_{ij}$, $d3_{ij}$ and $d4_{ij}$ are given by $d1'_{ij}$, $d2'_{ij}$, $d3'_{ij}$ and $d4'_{ij}$, where $d1'_{ij} > d2'_{ij} \ne 0$ and $d3'_{ij} > d4'_{ij} \ne 0$. Clearly, a solution which reduces the value of the objective function is given by $d1_{ij} = d1'_{ij} - d2'_{ij}$, $d2_{ij} = 0$, $d3_{ij} = d3'_{ij} - d4'_{ij}$, $d4_{ij} = 0$. Thus, using this approach also for the term
$$\sum_{i=1}^{p} \sum_{k=i+1}^{p} v_{ik} d_{q_i q_k},$$
the problem of minimizing the sum of the weighted distances between objects and clients and between the objects themselves can be transformed into the following LP problem with p(2n + p - 1) constraints and $4pn + 2p^2$ variables, corresponding to an equality constraint and roughly 2 variables for each absolute value term in the original objective function (all constraints corresponding to $w_{ij} = 0$ or $v_{ik} = 0$ can be omitted):
$$\min \sum_{i=1}^{p} \sum_{j \in J} w_{ij} (d1_{ij} + d2_{ij} + d3_{ij} + d4_{ij}) + \sum_{i=1}^{p} \sum_{k=i+1}^{p} v_{ik} (e1_{ik} + e2_{ik} + e3_{ik} + e4_{ik})$$

subject to

$$x_{q_i} + d1_{ij} - d2_{ij} = x_j, \qquad y_{q_i} + d3_{ij} - d4_{ij} = y_j, \qquad i = 1, \dots, p; \ j \in J,$$
$$x_{q_i} - x_{q_k} + e1_{ik} - e2_{ik} = 0, \qquad y_{q_i} - y_{q_k} + e3_{ik} - e4_{ik} = 0, \qquad 1 \le i < k \le p,$$

with all d- and e-variables nonnegative (here $e1_{ik}, \dots, e4_{ik}$ denote the difference variables for the object pairs, analogous to $d1_{ij}, \dots, d4_{ij}$).
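As an illustration of the size and structure of this LP, the sketch below assembles and solves it with scipy's linprog; the flat variable layout and the names e1-e4 for the inter-object difference variables follow the convention above and are otherwise our own (a modern LP solver is assumed purely for illustration).

```python
import numpy as np
from scipy.optimize import linprog

def solve_rectilinear_lp(clients, w, v):
    # LP transcription of the rectilinear p-median problem with mutual
    # communication, following the Wesolowsky & Love construction.
    # w[i][j]: object-to-client weights; v[i][k]: object-to-object weights.
    p, n = len(w), len(clients)
    pairs = [(i, k) for i in range(p) for k in range(i + 1, p)]
    # variable layout: [x_1..x_p, y_1..y_p, d1, d2, d3, d4, e1, e2, e3, e4]
    nd, ne = p * n, len(pairs)
    nvar = 2 * p + 4 * nd + 4 * ne          # = 4pn + 2p^2 in total
    def dvar(block, i, j): return 2 * p + block * nd + i * n + j
    def evar(block, idx):  return 2 * p + 4 * nd + block * ne + idx
    c = np.zeros(nvar)
    for i in range(p):
        for j in range(n):
            for b in range(4):
                c[dvar(b, i, j)] = w[i][j]
    for b in range(4):
        for idx, (i, k) in enumerate(pairs):
            c[evar(b, idx)] = v[i][k]
    A, rhs = [], []
    for i in range(p):                       # object-to-client equalities
        for j in range(n):
            for axis in (0, 1):              # x-row, then y-row
                row = np.zeros(nvar)
                row[axis * p + i] = 1.0
                row[dvar(2 * axis, i, j)] = 1.0
                row[dvar(2 * axis + 1, i, j)] = -1.0
                A.append(row); rhs.append(clients[j][axis])
    for idx, (i, k) in enumerate(pairs):     # object-to-object equalities
        for axis in (0, 1):
            row = np.zeros(nvar)
            row[axis * p + i] = 1.0
            row[axis * p + k] = -1.0
            row[evar(2 * axis, idx)] = 1.0
            row[evar(2 * axis + 1, idx)] = -1.0
            A.append(row); rhs.append(0.0)
    # coordinates are free; all difference variables are nonnegative
    bounds = [(None, None)] * (2 * p) + [(0, None)] * (4 * nd + 4 * ne)
    res = linprog(c, A_eq=np.array(A), b_eq=np.array(rhs), bounds=bounds)
    return res.x[:p], res.x[p:2 * p], res.fun   # x-, y-coordinates, objective
```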
A dual formulation of the problem results therefore in $4pn + 2p^2$ constraints, where 2p(2n + p - 1) are bounds on the p(2n + p - 1) variables and only 2p are ordinary linear inequalities. This suggests solving the dual problem with a bounded variable algorithm instead of the primal. Clearly, as the problem is transformable into an LP problem, linear restrictions as to the locations and as to distances between objects or between objects and clients are permitted. A similar approach is presented by [Morris 1975], which results in an LP problem whose dual has $3pn + 3(n^2 - n)/2$ variables and $pn + (n^2 - n)/2 + 2n$ nontrivial constraints. In both of the above approaches, the addition of linear constraints as to the locations of the objects or as to the distances permitted between objects and clients or between objects clearly will not increase the number of constraints in the dual formulations. The relative effectiveness of the two approaches will depend upon the LP algorithms employed and in particular upon the effectiveness of routines for treating bounded variables in the Wesolowsky and Love formulation. In both cases however, the size of the LP problem becomes quite large as the number of objects and clients increases, and the linear programming approach becomes impractical.

Gradient reduction methods. Therefore, following the approach by [Love 1969] to the p-median problem in the plane with Euclidean distance, [Wesolowsky & Love 1972] have approximated the nonlinear, convex objective function with a convex nonlinear function having continuous derivatives with respect to $x_{q_i}$ and $y_{q_i}$. This results in a nonlinear formulation with 2p variables and no constraints, and permits the use of gradient reduction methods where convergence to any desired degree of accuracy is assured. Each term in the original objective function is replaced by a hyperbolic approximation; e.g. a term $w_{ij} |x_{q_i} - x_j|$ is replaced by $w_{ij} \{(x_{q_i} - x_j)^2 + b^2\}^{1/2}$, a hyperbola with center $(x_j, 0)$ and asymptotes with slopes $\pm w_{ij}$. The maximum difference between these two terms is $w_{ij} b$, which can be made arbitrarily small by reducing b. The approximation is strictly convex and is asymptotically convergent to $w_{ij} |x_{q_i} - x_j|$ as $|x_{q_i} - x_j|$ becomes large. Similarly, a term $v_{ik} |x_{q_i} - x_{q_k}|$ is replaced by the convex term $v_{ik} \{(x_{q_i} - x_{q_k})^2 + b^2\}^{1/2}$, and the maximum difference between the two terms is $v_{ik} b$. The resulting approximation for the original objective function is then:
$$\sum_{i=1}^{p} \sum_{j \in J} w_{ij} \left[ \{(x_{q_i} - x_j)^2 + b^2\}^{1/2} + \{(y_{q_i} - y_j)^2 + b^2\}^{1/2} \right] + \sum_{i=1}^{p} \sum_{k=i+1}^{p} v_{ik} \left[ \{(x_{q_i} - x_{q_k})^2 + b^2\}^{1/2} + \{(y_{q_i} - y_{q_k})^2 + b^2\}^{1/2} \right],$$
and the difference between this approximation and the original objective function approaches 0 as b approaches 0. As long as b is not 0, the partial derivatives of
the approximating function are continuous, thereby permitting the use of gradient reduction methods; however, as b becomes small the derivatives, while continuous, undergo sharp transitions which hinder the convergence of such gradient reduction methods. We note too that, as long as b is not 0, two objects cannot be located at exactly the same point, nor can an object be placed at exactly the same location as a client. Via the derivation of an upper bound for the difference in the two objective functions, it can be shown that as b decreases, the approximating function is uniformly convergent to the original objective function, and therefore the optimal value of the approximating function also converges to the optimal value of the original function. Clearly, the choice of b is of importance; experience has shown that the speed of convergence is greatly increased as b is increased, at a very small expense in accuracy. Finally, it should be noted that as was the case in the LP formulations, constraints can be included. However, they are no longer confined to linear constraints as long as they form a convex constrained set, and conventional nonlinear programming routines can be used.
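The hyperbolic approximation is easy to state in code; the sketch below (function name ours) evaluates the approximating objective, whose partial derivatives, of the form $t/\sqrt{t^2 + b^2}$ times the corresponding weight, are what a gradient reduction routine would use.

```python
import math

def smoothed_objective(xy, clients, w, v, b):
    # Hyperbolic smoothing: each |t| is replaced by sqrt(t^2 + b^2),
    # which is smooth, strictly convex and overestimates |t| by at most b.
    p = len(xy)
    h = lambda t: math.sqrt(t * t + b * b)
    f = sum(w[i][j] * (h(xy[i][0] - cx) + h(xy[i][1] - cy))
            for i in range(p) for j, (cx, cy) in enumerate(clients))
    f += sum(v[i][k] * (h(xy[i][0] - xy[k][0]) + h(xy[i][1] - xy[k][1]))
             for i in range(p) for k in range(i + 1, p))
    return f
```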
12. p-Median problem in a network

As was the case when considering the median (1-median) problem in a network, it is not necessary here to distinguish between an absolute-median problem and a vertex-median problem. Following the same arguments as for the 1-median problem, it can be shown that there exists at least one subset $K_p \subseteq V$ of p vertices of the set of n clients (vertices) V in the graph G(V, E) which minimizes the sum of the weighted distances between client-object pairs. Client-object pairs are formed by assigning each client to the object located nearest it, i.e. with the minimum shortest path to it. The first formal proof of this logical extension of the result for the 1-median in a network is given in [Hakimi 1965]. The results apply in fact to both directed and undirected networks, as is the case for the results to be presented below, and we will henceforth use the term network to represent either of these cases. Let $R = \{r_{kj}\}$, $k \in K_p$, $j \in V$, be a (0,1)-matrix of assignments. The formulation of the p-median problem in a network is thus
p-Median problem in a network

$$\min \sum_{k \in K_p} \sum_{j \in V} w_j d_{kj} r_{kj}$$

subject to

$$\sum_{k \in K_p} r_{kj} = 1, \qquad j \in V,$$
$$|K_p| = p,$$
$$r_{kj} \in \{0, 1\}, \qquad k \in K_p, \ j \in V,$$

where $d_{kj}$ is the length of a shortest path from k to j.
This problem can of course be solved by complete enumeration. For each of the $\binom{n}{p}$ possible subsets of p vertices chosen among the n vertices, this requires determining the optimal assignment of clients to the set and then evaluating the resulting sum of the weighted distances. The set $K_p$ with the minimal sum of weighted distances is then the p-median. This procedure can be performed as follows:
(1) Post-multiply the (n × n) matrix D of shortest path distances by the (n × n) diagonal matrix W whose diagonal elements are the n vertex weights. This results in the unsymmetric weighted distance matrix DW with elements $\{w_j d_{ij}\}$.
(2) By selecting the appropriate p rows of DW, form the (p × n) matrix of weighted distances from each member of the set $K_p$ of p vertices currently under consideration to each of the n clients.
(3) For each of the n columns, determine the minimal element in the column; this is the least weighted distance from client j to any member of $K_p$.
(4) Sum the least elements in each column; this gives the optimal solution for the given set $K_p$.
(5) Repeat this for the $\binom{n}{p}$ possible sets of p vertices and choose as the optimal solution for the p-median problem at hand that set $K_p$ having the minimum sum of weighted distances.
This is a combinatorial search problem of considerable dimensions for realistic values of n and p; for n = 100 and p = 5, there are approximately $75 \times 10^6$ possible p-medians, while for p = 10 there are on the order of $10^{13}$ possible p-medians.
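The five steps translate directly into code; the following minimal sketch (ours) is of course only practical for small n and p.

```python
from itertools import combinations

def enumerate_p_medians(D, w, p):
    # Complete enumeration of the p-median problem in a network.
    # D[i][j]: shortest path distances; w[j]: vertex weights.
    n = len(D)
    DW = [[w[j] * D[i][j] for j in range(n)] for i in range(n)]  # step (1)
    best_set, best_val = None, float("inf")
    for K in combinations(range(n), p):                          # step (5)
        val = sum(min(DW[k][j] for k in K) for j in range(n))    # steps (2)-(4)
        if val < best_val:
            best_set, best_val = K, val
    return best_set, best_val
```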
[Goldman 1969] extended Hakimi's result discussed above and considered problems where a client (vertex) may be a source "s" and/or a destination "t" for commodities that require two stages of processing. The processing may modify the goods such that the weight for median-to-destination movement may not be identical to the weight for source-to-median movement, and the processing may be carried out at either one or two objects (medians), as each object has facilities for both stages of processing. For a variable ordered vertex pair (s, t) that ranges over all source-destination (vertex) pairs between which movement is to take place, the simple vertex weights $w_j$ are then replaced by the functions: (non-negative) weight per unit distance w(s, t) of source-to-object movement, w'(s, t) of object-to-destination movement and w''(s, t) of object-to-object movement. The problem considered is therefore:

p-Median problem in a network, extended

$$\min_{Q_p \subseteq V, \, |Q_p| = p} \ \sum_{s \in V, \, t \in V} \ \min_{q \in Q_p, \, h \in Q_p} \left[ w(s, t) d_{sq} + w''(s, t) d_{qh} + w'(s, t) d_{ht} \right],$$

where $d_{sq}$, $d_{qh}$ and $d_{ht}$ are shortest path distances in the resulting subdivision network.
Goldman showed that under these more general conditions Hakimi's results for the "pure" p-median problem still hold, i.e. there exists a set $Q_p \subseteq V$ of p vertices such that the objective function takes on its minimum value over all sets of p points in the network. [Hakimi & Maheshwari 1972] generalized Goldman's results to include an arbitrary number of commodities, each requiring an arbitrary number of stages of processing. Under the assumptions that each median is capable of performing any stage of processing for any commodity and that the cost of transporting any commodity a distance from any processing stage at an object (median) to an object for the following processing stage is a nondecreasing concave function of the distance, there exists a set of p vertices such that the total weighted costs are minimized. These results show that under fairly general assumptions the results of the "pure" p-median problem still hold, i.e. we need only consider a set of p vertices when searching for a p-median. Furthermore, these results hold even if one places capacity constraints on the edges of the network (and thus even if route splitting is required when sending a commodity from its source to its destination).

The pure p-median problem considered here may be viewed as a special case of more general median problems which also include members of the plant location family. The fact that p, the number of objects to be located, is prespecified may be interpreted as the logical result of a problem formulation equivalent to a simple plant location problem. If the total funds available for fixed and variable plant costs are limited, and both the fixed costs of establishing a plant (object) as well as the variable costs per unit of throughput at an object are independent of the object locations, then it can easily be shown that the budget restriction results in an upper bound on the number of objects to be located. As it is clear that in such a case the number of objects should be maximized, the p-median problem results; see e.g. [ReVelle & Swain 1970]. It is perhaps not too broad a statement to say that the difficulty in solving the broader class of location problems characterized by fixed costs, an unspecified number of objects to be located and possible capacity restrictions is inherent in the "pure" p-median problem itself.

Fortunately, and in contrast to the p-median problem in the plane, considerable effort has been expended upon the p-median problem in a network. In the following exposition, we will group the discussion in three sections: (1) a heuristic, (2) LP based approaches, and (3) branch and bound algorithms.
A heuristic. Even though this paper is mainly oriented towards algorithms, a heuristic will be discussed as it provides excellent results and is very efficient as regards computer time and storage; most articles on algorithms compare the algorithmic results to the results of this heuristic procedure.

The heuristic procedure is presented in [Teitz & Bart 1968] and is based upon a savings procedure referred to as a "vertex substitution". It starts by choosing a
subset $K_p$ of p vertices and determining those clients associated with (closest to) each of these. It then tests whether a vertex not in this subset, i.e. belonging to $V - K_p$, can replace a vertex in the subset, thereby producing a new trial subset with a reduced objective function. The test is performed by replacing each vertex in $K_p$ by the candidate vertex and determining the maximal reduction in the objective function. If this maximal reduction is positive, the candidate vertex replaces that vertex in the subset $K_p$ which results in this reduction, and a new subset $K_p'$ is obtained. The same tests are now performed on the new trial set by a new candidate vertex not in $K_p$ or $K_p'$, and this procedure continues until all vertices in the complement of $K_p$ have been tried. The resulting subset is defined as a new $K_p$ and the steps are repeated. Such a complete repetition is called a cycle. When a cycle results in no reduction in the value of the objective function, the procedure terminates. This final set is then an estimate of the p-median. This procedure is similar to many heuristic procedures within other problem areas and forms, e.g., the basis for heuristic procedures such as CRAFT used to propose solutions to plant layout problems; see [Krarup & Pruzan 1978]. Finally, it should be mentioned that, as suggested by Teitz & Bart, the quality of the final solution could be improved by considering the substitution of $\lambda$ vertices at a time, $1 < \lambda < p$. This would require however that, in order just to verify that a final set cannot be improved upon, $\binom{p}{\lambda} \binom{n-p}{\lambda}$ possible substitutions must be performed and the corresponding changes in objective function value evaluated. This number of substitutions increases rapidly for increasing values of $\lambda$, while for $\lambda = 1$ only very few cycles appear to be required to reach a stable estimate of the p-median. A sketch of the basic ($\lambda = 1$) procedure follows.
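The following is a minimal sketch of the basic vertex substitution procedure under the convention that the best improving swap for each candidate vertex is accepted immediately; function and variable names are ours.

```python
def teitz_bart(D, w, p, K):
    # Vertex substitution: try to swap each outside vertex into the
    # current median set K, accept the best improving swap, and repeat
    # cycles until no substitution reduces the objective.
    n = len(D)
    cost = lambda S: sum(min(w[j] * D[i][j] for i in S) for j in range(n))
    K = set(K)
    improved = True
    while improved:                    # one pass over V \ K is a "cycle"
        improved = False
        for cand in set(range(n)) - K:
            best_swap, best_val = None, cost(K)
            for out in list(K):
                val = cost(K - {out} | {cand})
                if val < best_val:
                    best_swap, best_val = out, val
            if best_swap is not None:
                K = K - {best_swap} | {cand}
                improved = True
    return K, cost(K)
```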
c
C
widiirii,
icV.jcV
rii =1, j € V ,
ieV
rii - rii 3 0, iEV, j c V , iZj,
rij~ ( 0l}, , i E V, j E V
where the integrality constraint can be replaced by the equivalent, but weaker form
$$r_{ii} \in \{0, 1\}, \quad i \in V; \qquad r_{ij} \ge 0, \quad i \in V, \ j \in V, \ i \ne j,$$

as for any set of p medians (vertices for which $r_{ii} = 1$) the remaining $r_{ij}$ are 0 or 1. The first restriction assures that each client j is fully assigned. The second restriction assures that $r_{ij} = 1$ only if $r_{ii} = 1$, thereby assuring that clients are only assigned to vertices which are self-assigned and therefore in the median set. The third restriction fixes the number of objects and thus the number of vertices which can assign to themselves. If the requirement that the $r_{ij}$ be either 0 or 1 is removed and replaced by $r_{ij} \ge 0$, the problem is a simple LP problem where the $r_{ij}$ are restricted to be $\le 1$ due to the first set of restrictions. This LP problem has $n^2$ variables and $(n^2 + 1)$ constraints, and the number of iterations required can be expected to be quite favorable when compared to an enumeration procedure; e.g. if the number of iterations required to solve the LP formulation is less than twice the number of constraints, then an upper bound on this number is $2(n^2 + 1)$, compared to $\binom{n}{p}$ combinations to be evaluated under direct enumeration. Furthermore, the number of LP iterations can be reduced considerably by starting with a "good" guess as to a solution, and by adding the constraints $r_{ii} - r_{ij} \ge 0$ as they are required. A suggested procedure is first to assign each vertex to its nearest neighbour without any self-assignment, then to assign to themselves only the p vertices which have the p largest assignment costs when assigned to their nearest neighbours, and then to add the constraints $r_{ii} - r_{ij} \ge 0$ for only those vertices i which do not self-assign but where j assigns to vertex i under the above procedure. If an LP solution does not result in a solution where some $r_{kk} < r_{kj}$, then the solution is optimal; otherwise add additional self-assignment constraints and proceed in this manner. Alternatively, heuristic methods can be employed for determining which vertices self-assign, and then only these self-assignment constraints are added. If the solution is optimal, no vertex will be assigned to a vertex which does not self-assign. This LP-based approach can be seen to be quite flexible. It permits the very simple checking of heuristic procedures for optimality and enables the consideration of points on the network edges if this is so desired. Furthermore it enables other linear restrictions to be included. Finally, it permits simple treatment of sensitivity analyses, e.g. with respect to the number of objects p, by considering the dual variables or by simply varying p and rerunning the problem. It has however two major drawbacks. The possibility of non-integer solutions exists of course, but this appears to be a quite rare occurrence. The second major drawback is however quite serious; the number of constraints $(n^2 + 1)$ limits the straightforward application to problems where n < 100, even with e.g. MPSX on a large IBM/370.
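A minimal sketch of the relaxed ReVelle & Swain formulation follows, again using scipy's linprog purely for illustration (here the full set of n(n-1) self-assignment constraints is generated up front rather than added as required):

```python
import numpy as np
from scipy.optimize import linprog

def revelle_swain_relaxation(D, w, p):
    # LP relaxation of the ReVelle & Swain formulation; if the optimum
    # happens to be integral (as it often is), it solves the p-median.
    n = len(D)
    var = lambda i, j: i * n + j              # r_ij -> flat index
    c = np.array([w[j] * D[i][j] for i in range(n) for j in range(n)])
    A_eq = np.zeros((n + 1, n * n)); b_eq = np.ones(n + 1)
    for j in range(n):
        for i in range(n):
            A_eq[j, var(i, j)] = 1.0          # each client fully assigned
    for i in range(n):
        A_eq[n, var(i, i)] = 1.0              # exactly p self-assignments
    b_eq[n] = p
    A_ub, b_ub = [], []
    for i in range(n):
        for j in range(n):
            if i != j:
                row = np.zeros(n * n)
                row[var(i, j)] = 1.0; row[var(i, i)] = -1.0
                A_ub.append(row); b_ub.append(0.0)   # r_ij <= r_ii
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.x.reshape(n, n), res.fun
```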
Another LP-based approach is provided by [Garfinkel, Neebe & Rao 1974]. The problem formulated is identical to that of [ReVelle & Swain 1970]. However, the relaxed integer program is solved by decomposition. The problem is decomposed on the index i, with the linking constraints consisting of the constraints which assure that each client is fully assigned ($\sum_{i \in V} r_{ij} = 1$, $j \in V$) and the constraint that fixes the number of vertices which can self-assign ($\sum_{i \in V} r_{ii} = p$). The LP basis to be dealt with is thereby reduced to (n + 1) rows. In case of non-integer solutions, a group theoretic algorithm is employed which takes advantage of the fact that the master problem to the decomposed LP problem is a modified set partitioning problem. It is to be noted that due to degeneracy difficulties, this LP approach appears to be more efficient for larger rather than for smaller values of p for given n; this is evidenced in tests in [Garfinkel, Neebe & Rao 1974; Khumawala, Neebe & Dannenbring 1974] by the LP decomposition failing to converge within a given number of pivots. Once again ReVelle & Swain's observation that the LP formulation often results in an optimal integer solution is confirmed.

A final LP-based approach to be discussed here is provided by [Marsten 1972], who again used ReVelle & Swain's formulation as a starting point. As this formulation in its most general form is a large LP problem with $n^2$ variables and $(n^2 + 1)$ constraints (although as mentioned previously the number of constraints can be significantly reduced via a simple heuristic), Marsten seeks a more efficient procedure. It is shown that every p-median of the network for p = 1, ..., n is an extreme point of the same polyhedron P. A Lagrangian reformulation of ReVelle & Swain's formulation serves as the starting point for a parametric LP procedure which begins with an n-median (the set of all vertices) and then generates a tour around the polyhedron P that passes through most of the p-medians and very few others. It may however pass through extreme points which correspond to fractional values of p and may not determine a p-median for certain values of p. In addition, the computations increase considerably as the number p of medians decreases, as the necessary changes in the configuration of the solution become significant; with decreasing p, the vertices are grouped into fewer and fewer clusters and therefore the number of vertices that must be reassigned due to removing a median increases, resulting in an increasing number of basis changes before a new extreme point is determined. The associated pivot operations must be performed on an inverse that is increasing in size. The decrease in efficiency for decreasing values of p may be of importance in realistic location problems, as these generally deal with the location of but a few facilities. A test example with 33 vertices resulted in all p-medians, with the exception of the 9- and 8-medians, being found in the range p = 33 down to p = 4; the tour was halted for p = 4 due to the above mentioned growth in the storage requirements. It is suggested that the series of solutions developed by an application of the procedure makes it easy to find the missing p-medians with the aid of a branch and bound search, or by simple inspection of the surrounding solutions for values of p above and below the missing value of p.
Before proceeding to a discussion of branch and bound procedures for solving the p-median problem in a network, we will briefly consider a modified formulation of this problem which, although not employing LP as a solution procedure, is based upon the LP formulation of [ReVelle & Swain 1970] and the LP formulation of a location set covering problem [Toregas et al. 1971] considered in Part I under the discussion of a modified p-center problem in a network. By defining $N_j = \{i \mid w_j d_{ij} \le d_m\}$, where $d_m$ is a given maximum weighted distance, the p-median problem to be considered here is the pure p-median problem subject to maximum distance constraints:

p-Median problem in a network with distance constraints
$$\min \sum_{j \in V} \sum_{i \in N_j} w_j d_{ij} r_{ij}$$

subject to

$$\sum_{i \in N_j} r_{ij} = 1, \qquad j \in V,$$
$$r_{ii} - r_{ij} \ge 0, \qquad j \in V, \ i \in N_j \ \text{and} \ i \ne j,$$
$$\sum_{i \in V} r_{ii} = p,$$
$$r_{ij} \in \{0, 1\}, \qquad j \in V, \ i \in N_j.$$

The only difference between this formulation and that of the pure p-median problem considered above is that the potential objects which can serve/be served by a client are restricted to be within a given maximum distance from the client. [Toregas et al. 1971] consider the related problem of determining the minimum number of objects to be located such that the distance constraints are not surpassed; this problem is formulated as a set-covering problem and LP is employed to solve the relaxed problem, where additional cut constraints are added to resolve situations with fractional solutions. [Khumawala 1973] rejects this approach in the case of the p-median problem, as integer solutions are not guaranteed and as it does not permit parametrization of the distance constraints. Branch and bound could be employed, as it will both guarantee integer solutions and permit such parametrization; however, while such an approach may be quite reasonable for the modified p-center problem of Toregas et al., according to Khumawala it requires considerable computer time and storage to obtain an optimal solution for the modified p-median problem. Khumawala presents therefore two simple heuristic procedures based upon savings for solving the modified p-median problem. The first of these consists of computing the minimum savings in weighted distance which can be obtained by each potential object were it established (i.e. located at a vertex), and proceeds to eliminate objects with the least minimum savings. The second procedure consists of computing the total savings an object would result in were it established, relative to only those objects already established. For a given problem both procedures are employed and the solution giving the least weighted distance is
chosen. It is difficult to test the goodness of the solutions so obtained, as even for very small problems (e.g. n = 30, p = 15) computer storage and time requirements on an IBM 370/165 become excessive for the corresponding branch and bound procedure; clearly the branch and bound approach will be most time demanding for values of p close to n/2. However, for the many test problems where comparisons were possible, the heuristic procedures resulted in solutions within but a few per cent of the optimal solutions. As parametrized solutions to the modified p-median problem considered here are important inputs for decision making in location problems, where trade-offs between total weighted distance and worst-case service are of importance, there appears to be room for considerable future research effort here.

Branch and bound algorithms. There exist several well documented branch and bound procedures for solving the p-median problem in a network, two of which will be described below. These differ of course primarily according to the branching and bounding rules employed. Common to all such approaches, however, is the attempt to utilize the structure of the underlying problem (in contrast to the integer programming formulations considered above).

The first such procedure to be described is due to [Jarvinen et al. 1972]. It assumes the existence of the (n × n) matrix of weighted (shortest path) distances, $DW = \{w_j d_{ij}\}$, and is based upon the facts that the n weighted distance terms in the objective function corresponding to an optimal solution are located in only p rows of this matrix, and that there are at least p zero terms in this sum of weighted distances, corresponding to the self-assignments as represented by the diagonal elements in the matrix. The partitioning rule employed is to remove first one vertex from the set of possible vertices in the p-median, then two vertices and so on, until (n - p) vertices have been removed. The lower bound computation assumes that m vertices have been removed ($1 \le m \le n - p$) and that (n - m) vertices therefore remain, from which p vertices must be chosen. Thus there are two sets of vertices, $V_m$ with m vertices and $V_{n-m}$ with (n - m) vertices. For the columns of the DW-matrix corresponding to the vertices in $V_m$ one computes the weighted distances $s_j = \min_{i \in V_{n-m}} \{w_j d_{ij}\}$, $j \in V_m$, representing the minimum weighted distance for serving non-median vertex j from one of the (n - m) potential medians in the set $V_{n-m}$. For the columns of DW corresponding to vertices in $V_{n-m}$ one computes the weighted distances $s_k = \min_{i \in V_{n-m}, i \ne k} \{w_k d_{ik}\}$, $k \in V_{n-m}$, representing the minimum weighted distance for serving potential median k from another potential median. The lower bound is then the sum of the following two terms: the m weighted distances $s_j$ and the (n - m - p) smallest $s_k$ values, as it is known that at least p of the weighted distance terms are zero due to self-assignment and that (n - m - p) vertices which are as yet potential medians will not be medians in the final solution and must therefore be allocated to other potential medians.
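The bound is simple to compute from the DW-matrix; in the sketch below (ours), the set `removed` plays the role of $V_m$:

```python
def jarvinen_bound(DW, removed, p):
    # Lower bound after the vertices in `removed` have been excluded
    # from the median set; DW[i][j] = w_j * d_ij.
    n = len(DW)
    remaining = [i for i in range(n) if i not in removed]
    # forced non-medians: cheapest service from any remaining vertex
    s_removed = [min(DW[i][j] for i in remaining) for j in removed]
    # remaining vertices: cheapest service from another remaining vertex
    s_rest = sorted(min(DW[i][k] for i in remaining if i != k)
                    for k in remaining)
    m = len(removed)
    # (n - m - p) of the remaining vertices will not be medians either
    return sum(s_removed) + sum(s_rest[:n - m - p])
```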
J. Krancp, P.M. h z a n
384
The procedure is started by considering all n vertices as potential medians, choosing as the first non-median vertex that which has the minimum lower bound (as calculated above) associated with it, and continuing in this manner until m = (n - p). A feasible solution now exists (called the branch and bound solution without backtracking). If the value of its objective function is less than or equal to the lower bounds still under consideration, then an optimum solution has been found; otherwise the lower bounds greater than this value are removed from further consideration, backtracking takes place to that node corresponding to the least lower bound still active, and so on.

Computational experiments were carried out for n = 20 and p taking on the values 5, 10 and 15. Vertex weights and the distance matrices were produced at random and a total of 50 DW-matrices were generated. The algorithms were programmed in Autocode for the Elliott 503 and the results of Tables 12.1 and 12.2 were obtained, where "Substitution" refers to the previously mentioned heuristic procedure of [Teitz & Bart 1968] and where "Efficiency" is the quotient of the optimal solution obtained via branch and bound with backtracking to branch and bound without backtracking and to the substitution heuristic, respectively.
Table 12.1. Efficiencies and corresponding numbers of problems solved

                    Branch-and-bound             Substitution
                    without backtracking
                    No. of medians, p            No. of medians, p
Efficiency          5      10     15             5      10     15
1.00                9      25     37             20     31     48
0.95                9      16      5              5      2      0
0.90                9       2      6              3      6      2
0.85                9       6      2              7      6
0.80                3       1                     2      4
0.75                5                             2      0
0.70                2                             2      1
0.65 and lower      1
Table 12.2. Running times in seconds

             Branch-and-bound                  Substitution
             Mean     St.dev.  Min    Max      Mean   St.dev.  Min   Max
5-median     109.26   59.88    18.0   275.0    2.00   0.65     1.1   4.3
10-median     46.26   41.65     5.8   226.8    2.27   0.66     0.9   3.8
15-median      2.52    0.46     1.7     3.8    1.80   0.26     1.4   2.7
It is clearly seen that the efficiencies of the two heuristic procedures, substitution and branch and bound without backtracking, increase for increasing p, and that the branch and bound algorithm clearly requires less time as p approaches n/2 (which is of course to be expected, as each in-depth search continues until m = (n - p), which is a minimum for p = n/2). Little can be said however as to the computational requirements as a function of n; in practice many location problems will involve far greater values of n than the n = 20 considered above.

[El-Shaieb 1973] presents an alternative branch and bound approach. In contrast to that of [Jarvinen et al. 1972] referred to above, which only constructed a destination set, it constructs both a source (median) set and a destination (non-median) set. It starts with both sets empty. A vertex is added to either set at each separation. This results in two nodes, one of which represents the addition of the vertex under consideration to the source set, while the other node represents its addition to the destination set. A lower bound is calculated for each subset of solutions, and the node among the active nodes with the lowest bound is selected for branching. After a number of iterations, either the destination set will include (n - p) vertices or the source set will include p vertices, and a feasible solution results. The procedure continues in a manner similar to that of [Jarvinen et al. 1972] by checking for optimality, updating the active sets of nodes, backtracking etc. El-Shaieb also presents a series of test examples which indicates that computational requirements increase at a rapid rate for increasing n and for increasing p in relation to n. [Khumawala et al. 1974] demonstrate that for increasing n the time demands of this branch and bound procedure increase at a far greater rate than those of other procedures for solving the p-median problem. However, they also indicate that one of the two lower bound procedures suggested by El-Shaieb is "the most efficient lower bound known for this problem".

In conclusion, we in all modesty refer to [Krarup & Pruzan 1977B] for a series of subjects for further research.
References (Part II)

Christensen, T., and S.K. Jacobsen (1969), Network for hospital planning (in Danish), M.Sc. thesis, IMSOR, Technical University of Denmark.
Cordellier, F., J.C. Fiorot, and S.K. Jacobsen (1976), An algorithm for the generalized Fermat-Weber problem, IMSOR, Technical University of Denmark.
DeSalamanca, M.E., and J. Zulueta (1971), Computer aids to network planning: placement of telephone exchanges in urban areas, Electrical Communication 46, 196-201.
Eilon, S., C.T.D. Watson-Gandy, and N. Christofides (1971), Distribution management. Mathematical modelling and practical analysis (Griffin, London).
El-Shaieb, A.M. (1973), A new algorithm for locating sources among destinations, Management Sci. 20, 221-231.
Eyster, J.W., J.A. White and W.W. Wierwille (1973), On solving multifacility location problems using a hyperboloid approximation procedure, AIIE Transactions 5, 1-6.
Francis, R.L., and A.V. Cabot (1972), Properties of a multifacility location problem involving Euclidean distances, Naval Res. Logist. Quart. 19, 335-353.
Francis, R.L., and J.A. White (1974), Facility layout and location: an analytical approach (Prentice-Hall, Englewood Cliffs, NJ, U.S.A.).
Garfinkel, R.S., A.W. Neebe, and M.R. Rao (1974), An algorithm for the m-median plant location problem, Transportation Sci. 8, 217-236.
Goldman, A.J. (1969), Optimal locations for centers in a network, Transportation Sci. 3, 352-360.
Goldman, A.J. (1971), Optimal center location in simple networks, Transportation Sci. 5, 212-221.
Hakimi, S.L. (1964), Optimum locations of switching centers and the absolute centers and medians of a graph, Operations Res. 12, 450-459.
Hakimi, S.L. (1965), Optimal distribution of switching centers in a communications network and some related graph theoretic problems, Operations Res. 13, 462-475.
Hakimi, S.L., and S.N. Maheshwari (1972), Optimum locations of centers in networks, Operations Res. 20, 967-973.
Halpern, J. (1976), The location of a center-median convex combination on an undirected tree, J. Regional Science 2, 237-245.
Handler, G.Y. (1973), Minimax location of a facility in an undirected tree graph, Transportation Sci. 7, 287-293.
Hurter, A.P. Jr., M.K. Schaefer, and R.E. Wendell (1975), Solutions of constrained location problems, Management Sci. 22, 55-56.
Jacobsen, S.K. (1973), Om lokaliseringsproblemer - modeller og løsningsmetoder, Ph.D. Dissertation, IMSOR, Technical University of Denmark.
Jacobsen, S.K. (1974), Weber rides again, IMSOR, Technical University of Denmark.
Jacobsen, S.K., and P.M. Pruzan (1976), Lokalisering - modeller & løsningsmetoder, IMSOR, Technical University of Denmark.
Jarvinen, P., J. Rajala, and H. Sinervo (1972), A branch-and-bound algorithm for seeking the p-median, Operations Res. 20, 173-178.
Khumawala, B.M. (1973), An efficient algorithm for the p-median problem with maximum distance constraints, Geographical Analysis, 309-321.
Khumawala, B.M., A.W. Neebe, and D.G. Dannenbring (1974), A note on El-Shaieb's new algorithm for locating sources among destinations, Management Sci. 5, 212-221.
Krarup, J., and P.M. Pruzan (1977A), Selected families of location problems. Part III: The plant location family, Working Paper No. WP-12-77, University of Calgary, Faculty of Business.
Krarup, J., and P.M. Pruzan (1977B), Challenging unsolved center and median problems, DIKU-report 1977-10, Institute of Datalogy, University of Copenhagen. To be published by the Polish Academy of Science.
Krarup, J., and P.M. Pruzan (1978), Computer-aided layout design, Math. Programming Study 9, 75-94.
Kuenne, R.E., and R.M. Soland (1972), Exact and approximate solutions to the multi-source Weber-problem, Math. Programming 3, 193-209.
Kuhn, H.W. (1973), A note on Fermat's problem, Math. Programming 4, 98-107.
Kuhn, H.W. (1976), Nonlinear programming: a historical view, SIAM-AMS Proceedings, Vol. 9.
Kuhn, H.W., and R.E. Kuenne (1962), An efficient algorithm for the numerical solution of the generalized Weber problem in spatial economics, J. Regional Science 4, 21-33.
Love, R.F. (1969), Locating facilities in three-dimensional space by convex programming, Naval Res. Logist. Quart. 16, 503-516.
Love, R.F., and H. Juel (1976), Properties and solution methods for large location-allocation problems, Technical Summary Report No. 1634, Mathematics Research Center, University of Wisconsin, Madison.
Marsten, R.E. (1972), An algorithm for finding almost all of the medians of a network, Discussion paper 23, Northwestern University, Evanston, Illinois, U.S.A.
Morris, J.G. (1975), A linear programming solution to the generalized rectangular distance Weber problem, Naval Res. Logist. Quart. 22, 155-164.
Odoni, A.R. (1974), Location of facilities on a network: a survey of results, Technical Report 03-74, Operations Research Center, M.I.T., Cambridge, Mass., U.S.A.
ReVelle, C., and R. Swain (1970), Central facilities location, Geographical Analysis 2, 30-42.
Schaefer, M.K., and A.P. Hurter Jr. (1974), An algorithm for the solution of a location problem with metric constraints, Naval Res. Logist. Quart. 21, 625-636.
Teitz, M.B., and P. Bart (1968), Heuristic methods for estimating the generalized vertex median of a weighted graph, Operations Res. 16, 955-961.
Toregas, C., R. Swain, C. ReVelle, and L. Bergman (1971), The location of emergency service facilities, Operations Res. 19, 1363-1373.
Wesolowsky, G.O., and R.F. Love (1971), The optimal location of new facilities using rectangular distances, Operations Res. 19, 124-130.
Wesolowsky, G.O., and R.F. Love (1972), A nonlinear approximation method for solving a generalized rectangular distance Weber problem, Management Sci. 18, 656-663.
Annals of Discrete Mathematics 5 (1979) 389-398. © North-Holland Publishing Company
A SURVEY OF MULTIPLE CRITERIA INTEGER PROGRAMMING METHODS

Stanley ZIONTS
School of Management, State University of New York at Buffalo, NY, U.S.A.

Several methods have been proposed for solving multiple criteria problems involving integer programming. This paper contains a brief survey as well as a typology of several such methods. Although computational data is scanty to date, an attempt is made to evaluate the methods from a user orientation as well as from the perspective of a researcher trying to develop a workable user-oriented method.
1. Introduction

In recent years research in solving multiple criteria integer programming problems has been carried out. Some of it has been classified as such, and some of it has appeared in areas such as decision theory. The purpose of this paper is to provide a survey of such methods in one place, and to provide a guide for users and researchers. The paper consists of 5 sections. In the first section we overview the problem area and devise a problem as well as a solution framework. In the second section we explore methods for finding efficient or nondominated solutions where the solutions are described explicitly. In the third section we explore utility maximization schemes where the solutions are described implicitly, using constraints. In the fourth section we explore utility maximization methods in which the solutions are described explicitly. The paper is concluded in Section 5.
Problem framework

A mixed integer linear programming problem may be written as

    maximize c'x + d'y
    subject to Ax + Ey ≤ b,                      (IP)
               x ≥ 0 and integer, y ≥ 0,

where A and E are matrices of appropriate order and x, y, c, d and b are vectors of appropriate order. A multicriteria mixed integer linear programming problem may be written as

    maximize Cx + Dy
    subject to Ax + Ey ≤ b,                      (MCIP)
               x ≥ 0 and integer, y ≥ 0,
with C and D as matrices instead of vectors as in (IP). (Where y is null, MCIP is an all-integer problem.) "Maximize" in problem MCIP is not well defined. We may use as our definition "vector maximize." In that case the solution to MCIP is the set of nondominated solutions to the constraint set of (MCIP). Letting u equal the vector of objective functions, the set of nondominated solutions consists of all solutions u* for which there exists no feasible solution u ≥ u* (with strict inequality in at least one component). A two objective example is shown in Fig. 1; the nondominated solutions are indicated.
Fig. 1. A graph of feasible integer solutions involving two objectives. Both are to be maximized. The nondominated solutions are indicated.
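Computationally, the nondominance test over a finite set of objective vectors is simple. The following sketch (in Python, with made-up points, since the coordinates of Fig. 1 are not reproduced here) filters a set down to its nondominated members:

    def dominates(u, v):
        # u dominates v: u >= v componentwise with strict inequality somewhere
        return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

    def nondominated(points):
        # the solutions of the "vector maximize" problem over an explicit set
        return [p for p in points if not any(dominates(q, p) for q in points)]

    pts = [(4, 0), (3, 2), (2, 2), (1, 3), (0, 4), (1, 1)]
    print(nondominated(pts))   # [(4, 0), (3, 2), (1, 3), (0, 4)]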
The diagram in Fig. 2 provides a classification of the methods for solving multiple criteria integer problems. There are two ways in which the solutions may be described: implicitly, that is, by using constraints; or explicitly, by listing all of the alternatives available. (The second way is not feasible for problems with continuous variables, in general, because there are an infinite number of solutions.) I have defined two different types of optimality of solutions: efficient or nondominated, and maximum utility. Efficient or nondominated solutions imply vector maxima; for a solution to be nondominated, there exists no other solution with all components at least as large as those of the first solution. The intent is to identify all nondominated solutions, and to limit consideration only to those. The other possibility for the evaluative mechanism involves the use of a utility function. A utility function relating a single measure of effectiveness to all of the objectives is developed and then maximized over the set of alternatives. There are several classes of utility function that we consider. A linear utility function ascribes a weight to each of the objective functions. Letting the vector of weights be λ with components λ_i > 0 for objective i, the solution of the problem is the solution to the integer programming problem: maximize λ'Cx + λ'Dy. The problem is to determine λ and solve the corresponding integer programming problem. An additive utility function allows for a nonlinearity in the relationship between the level of a given objective and its contribution to the overall utility. However, that relationship holds for any levels of the other objectives. Thus the problem is still to determine λ and then to solve the problem: maximize λ'g(x, y), where g(x, y) is the vector of objective functions as a function of the decision variables. For the more general case, although certain specific functions have been identified and extensively studied, we lump them all together as: maximize u(g(x, y)), where u is a function of the vector g(x, y). Here the function u must be determined. The methods differ in the precision to which they specify the utility function: some attempt to specify it completely, whereas others attempt to approximate it only as precisely as necessary to identify the optimal solution (or solutions) in a neighborhood. A third dimension of the problem is its stochastic nature: stochastic or deterministic. Of the methods shown in Fig. 2, only the methods of Fishburn [4] and Keeney and Raiffa [5] may be applied in the stochastic case. (The method of Zionts [14] can be extended to the stochastic case.) (The function g(x, y) may be linear, in which case we can represent the utility function as u(Cx + Dy).)

Fig. 2. A classification scheme for multiple criteria integer programming.¹ The scheme classifies methods by the solution set (implicit: all constraints expressed mathematically, the feasible solutions satisfying the constraints; explicit: all solutions stated explicitly) and by the nature of optimality of solutions:
Efficient or nondominated (implicit): Bitran [1], Bowman [2].
Maximum utility, linear (implicit): Lee [7], Zionts [13] (under some circumstances, the linear methods may be generalized to additive).
Maximum utility, additive (explicit): Sarin [9].
Maximum utility, more general (explicit): *Fishburn [4], *Keeney and Raiffa [5], Zionts [14].
¹A third dimension is deterministic versus probabilistic. The methods indicated with an asterisk are probabilistic as well as deterministic. The other methods are deterministic.

We now turn to the various methods for solving the problems. First we consider methods involving the implicit statement of solutions using constraints.
2. Finding efficient or nondominated solutions
Where the feasible solution space is convex and the objective functions concave (to be maximized), which is not the integer case at all, a parametrization of the
weights on the objective functions (maximize λ'g(x, y)) over all possible values of the vector λ (which may be normalized without loss of generality) is sufficient to identify all possible efficient or nondominated solutions. Not surprisingly, even for linear programming problems (see for example Evans and Steuer [3] and Yu and Zeleny [11]), there is a huge number of efficient extreme point solutions for relatively small problems. Various researchers (e.g., Steuer [10]) have tried to find all possible efficient solutions in a small solution neighborhood, as defined by limiting the range on the vector of multipliers λ. Steuer has had success by first identifying the proper general region of the optimal solution as given by the ranges, and then finding all solutions in that region. It is fairly well known (see, for example, Bowman [2]) that the parametrization used for the continuous problem is not sufficient to generate all efficient integer solutions. This may be seen graphically in Fig. 3. Solutions X and Y are the only two solutions that will be generated using a parametric approach maximizing a linear utility function λ_1 u_1 + λ_2 u_2 (λ_1, λ_2 > 0): X if λ_1 > λ_2, and Y if λ_2 > λ_1 (both are equivalent if λ_1 = λ_2). However, solution Z is nondominated as well.
Fig. 3. Maximizing a linear utility function may not identify all efficient integer solutions.
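The gap is easy to reproduce numerically. In the sketch below (with hypothetical coordinates standing in for the points X, Y and Z of Fig. 3), Z is nondominated, yet a sweep over all normalized positive weight vectors only ever produces X or Y:

    points = {"X": (4, 0), "Y": (0, 4), "Z": (2, 1)}   # Z is nondominated

    winners = set()
    for k in range(1, 100):                            # sweep lambda_1 over (0, 1)
        l1 = k / 100.0
        winners.add(max(points, key=lambda n: l1 * points[n][0] + (1 - l1) * points[n][1]))
    print(winners)                                     # {'X', 'Y'}: Z is never generated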
We now consider two methods that generate all efficient integer solutions. Bowman [2] has proposed a generalized Tchebycheff norm for this purpose:
    minimize max_i p_i(ḡ_i - g_i(x))
    subject to Ax ≤ b, x ≥ 0 and integer,

where 0 < p_i < ∞ and ḡ_i = max g_i(x).
(No extension to mixed integer problems has been proposed.) This is done by parametrically solving for all p_i (perhaps using a normalized form Σ p_i = 1, p_i > 0). The problem may be more simply solved by parametric linear programming. The problem to solve is the following:

    minimize z
    subject to z ≥ p_i(ḡ_i - g_i(x)) for all i,
               Ax ≤ b,
               x ≥ 0 and integer,

for all possible values of p_i.
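On the three hypothetical points used earlier, the weighted Tchebycheff objective does recover the solution that no positive linear weighting selects; a minimal check, taking ḡ as the vector of componentwise maxima:

    pts = {"X": (4, 0), "Y": (0, 4), "Z": (2, 1)}
    ideal = (4, 4)                                     # plays the role of g-bar

    def tcheb(u, p):
        # weighted Tchebycheff deviation from the ideal point
        return max(p_i * (g_i - u_i) for p_i, g_i, u_i in zip(p, ideal, u))

    for p1 in (0.2, 0.4, 0.6, 0.8):
        p = (p1, 1.0 - p1)
        print(p, min(pts, key=lambda n: tcheb(pts[n], p)))
    # p = (0.6, 0.4) selects Z, the point missed by every positive linear weighting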
The method has not been tested, nor does it seem promising except for problems involving two objectives.
Bitran [1] has proposed a method for finding all efficient solution points of a multiobjective zero-one integer programming problem. His procedure is to first ignore the problem constraints and to focus on the unit hypercube and its integer solutions (x_j = 0 or 1 for all j). Then, given the matrix of objective functions C, where the sense of the optimization is maximize Cx, Bitran's method first computes the vectors v having components 1, -1, and 0 only such that Cv ≥ 0 (with at least one component of the inequality strict). The vectors v are generated using an implicit enumeration procedure. Then, using a point-to-set mapping, the dominated zero-one points on the unit hypercube² are identified. The remaining solutions on the hypercube are therefore efficient. If an efficient solution point on the unit hypercube does not satisfy the constraints, solutions that are not efficient on the hypercube may be efficient solutions to the original problem. A procedure is given for identifying the solutions in this manner, and computational experience is cited. All of the problems solved are quite small (a maximum of 9 variables with as many as 4 objectives and 4 constraints), and the computational results are not very encouraging. (For example, a four objective function, four constraint, nine variable problem requires about 70 seconds for solution, whereas a similar problem involving 6 variables requires about 3 seconds.) Times are on a Burroughs 6400 computer. Extensions to mixed integer programming methods are cited, although Bitran indicates that such problems are much more difficult to solve. Based on these two methods as well as the analogous work done with continuous variables, it does not appear that finding all efficient solutions is a fruitful approach to integer variable multicriteria problems.

²The point-to-set mapping is the following: if a component of a vector v (satisfying Cv ≥ 0) is as below (on the left), then make the corresponding component of the dominated vector one of the corresponding values (on the right):
    1 → 0;   -1 → 1;   0 → 0, 1.
For example, if v = (1, -1, 1, 0) satisfies Cv ≥ 0, the solutions (0, 1, 0, 0) and (0, 1, 0, 1) are both dominated.
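The mapping of the footnote is easy to mechanize. The sketch below (with a made-up objective matrix C) enumerates the vectors v in {-1, 0, 1}^n with Cv ≥ 0 and collects the zero-one points they dominate; like the first phase of Bitran's method, it ignores the problem constraints:

    from itertools import product
    import numpy as np

    def dominated_on_hypercube(C):
        n = C.shape[1]
        dominated = set()
        for v in product((-1, 0, 1), repeat=n):
            w = C @ np.array(v)
            if np.all(w >= 0) and np.any(w > 0):       # Cv >= 0, not identically 0
                choices = [(0,) if vj == 1 else (1,) if vj == -1 else (0, 1)
                           for vj in v]                # the footnote's mapping
                dominated.update(product(*choices))
        return dominated

    C = np.array([[3, -1, 2, 0],
                  [1,  2, 0, 1]])                      # hypothetical objectives
    dom = dominated_on_hypercube(C)
    print([x for x in product((0, 1), repeat=4) if x not in dom])   # efficient points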
3. Finding a unique solution: utility maximization

Zionts [13] describes an approach for solving multicriteria mixed integer programming problems where the utility function is linear (or additive if the functions g_i(x, y) are concave and can be adequately handled as piecewise linearizations). He first solves the corresponding continuous problem using the Zionts-Wallenius method [14] for solving multiple criteria linear programming problems. In that method a sequence of efficient extreme points is generated, and for each a series of tradeoff questions is posed to the decision maker. The decision maker is requested to respond whether he likes or does not like a proposed tradeoff, or cannot say for sure. The result is a continuous solution to the
problem and linear inequalities on the weights λ, based on expressed tradeoffs made by the decision maker. There are two approaches discussed to continue the solution process to find a mixed integer optimum. The more promising is a branch and bound approach. The branch and bound procedure is rather standard; the main feature added is in the comparison of solutions. Either the preference between two solutions is indicated by the constraints on the weights λ_i, or it is not. (To know this, one may have to solve a very small linear programming problem; a sketch follows below.) If the preference is implied, we use the indicated preference and continue with the branch and bound procedure. If the preference is not implied, we ask the decision maker his preference, generate an inequality on the weights λ_i based upon the response, and continue the procedure. The method has not been tested, but does appear to be a viable way of solving multicriteria integer programming problems. A second variation is to use a dual cut method, but it is said that such an approach does not appear promising.
Sang Lee [7] has developed a goal programming approach to multiple criteria integer programming. (See also Lee and Morris [8].) Lee explores the incorporation of three integer programming algorithms within the goal programming framework in a rather straightforward fashion, except for a few twists.³ The algorithms are as follows: (1) a dual cutting plane method, (2) a branch and bound method, (3) an implicit enumeration method. According to the author, all three methods have been programmed for computer in FORTRAN. Very little is said about performance. The author states [7] that ". . . integer goal programming methods have some of the same inherent problems experienced by integer linear programming methods . . ." and that work on improving the performance is underway. However, no performance measures are given.
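The test of whether a preference is already implied can be posed as a very small linear program, as noted above: the preference for a solution with objective vector u1 over one with u2 is implied if λ'(u1 - u2) ≥ 0 for every weight vector consistent with the answers given so far. A minimal sketch, assuming scipy is available and representing the accumulated answers as hypothetical inequalities A_ans λ ≤ b_ans:

    import numpy as np
    from scipy.optimize import linprog

    def preference_implied(u1, u2, A_ans, b_ans, eps=1e-6):
        # minimize lambda'(u1 - u2) over the current set of admissible weights
        n = len(u1)
        c = np.asarray(u1, float) - np.asarray(u2, float)
        res = linprog(c, A_ub=A_ans, b_ub=b_ans,
                      A_eq=np.ones((1, n)), b_eq=[1.0],
                      bounds=[(eps, None)] * n, method="highs")
        return res.success and res.fun >= -1e-9        # nonnegative minimum

    # one past answer, "objective 1 outweighs objective 2": -l1 + l2 <= 0
    print(preference_implied([5, 3], [4, 4], A_ans=[[-1.0, 1.0]], b_ans=[0.0]))  # True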
4. Discrete alternatives

A decision theory framework has been used for problems in which the constraints are not explicit.⁴ Instead, a set of alternatives is available, and one alternative is to be chosen. It is a methodology pioneered by Fishburn [4], Keeney and Raiffa [5] and others. The idea is to assess subjective probabilities as is done by the above authors and others. Then a utility function over the attributes is estimated.

³For example, instead of using the dual simplex procedure as part of a dual cut method, Lee uses a super priority on the appropriate deviation term. The effect is essentially that of an artificial variable.
⁴Kung [6] describes an efficient sort-like algorithm for finding all efficient or nondominated vectors. Zionts and Wallenius [16] describe a programming method for finding all vectors not dominated by a convex combination of other vectors.
Where applicable, an additive or quasi-additive utility function

    u(x_1, . . . , x_n) = Σ_i k_i u_i(x_i) + Σ_i Σ_{j>i} k_ij u_i(x_i) u_j(x_j) + . . . + k_12...n u_1(x_1) . . . u_n(x_n)
is used. Where the assumptions for quasi-additivity (the attributes are mutually utility independent) or additivity (the attributes are mutually preferentially independent)⁵ hold, the estimation procedures are rather straightforward. Otherwise, the estimation procedure requires a specification of the form of the utility function as well as an estimation of the parameters. The latter does not appear to have been used much in practice, whereas the former approach has been rather successful. See Keeney and Raiffa [5] for a review of several successful applications.
Rakesh Sarin [9] in his doctoral thesis has developed an Evaluation and Bound procedure that can be used for both probabilistic and certain decision problems. An additive utility function is assumed. The idea is to provide successively tighter bounds for the maximum and minimum possible utilities for each alternative. When the maximum possible utility for one alternative is less than the minimum possible utility attainable for another, the former alternative may be discarded. Bounds may be used on three factors: (1) the utility of an outcome, (2) the probability of an outcome, (3) the weight of an objective function. Successive refinements of the bounds are carried out until all alternatives but one are eliminated. The remaining alternative is the indicated choice. Sarin has used an experimental design and has tested between 2 and 4 attributes with between 3 and 4 alternatives, each for several deterministic problems. The total number of evaluations required (exclusive of judgments to estimate weights, etc.) without using evaluation and bound, i.e., using complete evaluation of all outcomes, is the number of alternatives times the number of attributes less twice the number of attributes. Using evaluation and bound, the number of judgments is reduced by from 25% to 75%. However, the reduction decreases as the problem size increases, and the total number of judgments increases as the problem size increases. Thus, although Sarin is optimistic about the results of his method, it is this author's opinion that the method is useful only for relatively small problems.

⁵See references for further definitions.
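The elimination step lends itself to a compact sketch (with hypothetical utility intervals; this is not Sarin's full procedure, which also refines the bounds on utilities, probabilities and weights):

    def eliminate(bounds):
        # bounds: name -> (lo, hi) interval of possible total utility
        keep = {}
        for a, (lo_a, hi_a) in bounds.items():
            if not any(hi_a < lo_b for b, (lo_b, _) in bounds.items() if b != a):
                keep[a] = (lo_a, hi_a)
        return keep

    print(eliminate({"A1": (0.2, 0.4), "A2": (0.5, 0.9), "A3": (0.3, 0.6)}))
    # A1 is discarded (0.4 < 0.5); tightening the bounds would continue the process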
Zionts [14] has developed a method based on his work with Wallenius for discrete alternatives in the certainty case. The method works by first establishing a preference ordering for each objective. A cardinal preference (such as cost in dollars) may be used, or an ordinal preference may be used (such as A is preferred to C is preferred to B). The method then arbitrarily assigns scores to any ordinal preferences reflecting the preferences. Next an efficient reference solution that maximizes an arbitrary weighted sum of objectives is found. Using that solution as a basis for comparison, the decision maker is asked to choose between that solution and another efficient solution. Based on the responses to several such comparisons, a new efficient reference solution is found. A sequence of such reference solutions is generated; each such solution is preferred to its predecessor. The procedure terminates with a subset of prespecified maximum size that the decision maker seems to prefer most. Even though a linear utility function is used, the method will find optimal solutions for general concave utility functions of objectives. The method has been programmed for computer and the results have been favorable. Several problems have been solved, involving as many as five criteria. The number of comparisons required has been modest. For example, on two separate solutions of a problem involving 9 efficient alternatives and 5 criteria, totals of 5 and 6 questions were required to determine the best choice. The total number of pairs is 36 for that problem. Problems involving as many as 35 alternatives and 5 criteria have been solved. For all problems considered, the maximum number of questions asked has never exceeded 20. The method should be able to solve stochastic problems for cardinal objectives, though this has not yet been done, and it is believed that it can also be extended to solve stochastic problems where ordinal objectives are involved. The method does not require the assumptions of preferential and utility independence that Keeney and Raiffa [5] require for use of additive or quasi-additive functions. Instead it requires that the partial derivative of the utility function with respect to each objective (cardinal or ordinal) be positive.
Zeleny [12] describes a means of weighting discrete alternatives that uses a dynamic weighting scheme based on the notion of entropy. The key idea is that the weights used in an additive model are dynamic in that they are a function of the alternatives. An example given by Zeleny concerns the importance of safety in choosing an airline. Suppose safety is the most important factor for an individual. Zeleny argues that if all airlines are equally safe, then safety is no longer the most important attribute. Zeleny's scheme is to develop the dynamic weights and then use the same kind of measure as used by Bowman to find the closest solution point (according to a specified weighted metric) to the ideal solution. Zeleny also proposes deleting certain of the poorer alternatives and finding the closest solution to the new (displaced) ideal. The procedure seems rather arbitrary in nature in that it selects one of the efficient solutions in a seemingly sophisticated way without the rationale for the selection backing up the sophistication. He also does not explore the question of how the weights are determined.

5. Conclusion

Several methods have been developed for solving multiple criteria integer programming problems. These can be divided into two categories in terms of the
nature of the optimality of the solution: finding all efficient or nondominated solutions versus choosing a solution by maximizing the utility of a solution. The work in the former category does not appear to have been fruitful, and about the only remaining fruitful prospect is to extend Steuer's work [10] for continuous problems. However, because of the necessity of solving a large number of integer programming problems, that possibility does not look promising either. The maximum utility approaches have more promise. Let us first consider those in which the solutions are implicit: solutions to expressed constraints. The approaches of Sarin [9] and Zionts [13] seem to offer the most promise, though neither has been really tested. Where the solutions are stated explicitly (the discrete alternative case), the decision theory approach of Fishburn [4] and Keeney and Raiffa [5] has been very successful in practice, though extensive tests of independence have to be made in using the approach. The method of Zionts [14] has been used to solve small deterministic problems in practice. A useful extension appears to be extending that method to solve stochastic problems. Perhaps what is most needed at this point is to carry out computational tests of the various approaches to show more definitively how the approaches perform.
References
[1] G.R. Bitran, Linear multiple objective programs with zero-one variables, Unpublished paper, University of Sao Paulo, Sao Paulo, Brazil (1976-77).
[2] V.J. Bowman, Jr., On the relationship of the Tchebycheff norm and the efficient frontier of multiple-criteria objectives, in: H. Thiriez and S. Zionts, eds., Multiple Criteria Decision Making, Jouy-en-Josas, France 1975 (Springer-Verlag, Berlin, 1976).
[3] J.P. Evans and R.E. Steuer, Generating efficient extreme points in linear multiple objective programming: Two algorithms and computing experience, in: J.L. Cochrane and M. Zeleny, eds., Multiple Criteria Decision Making (University of South Carolina Press, 1973) 349-365.
[4] P.C. Fishburn, Utility Theory for Decision Making (John Wiley, New York, 1970).
[5] R.L. Keeney and H. Raiffa, Decisions with Multiple Objectives: Preferences and Value Tradeoffs (John Wiley, New York, 1976).
[6] H.T. Kung, On the computational complexity of finding the maxima of a set of vectors, Proceedings of the 15th Annual IEEE Symposium on Switching and Automata Theory (October 1974) 117-121.
[7] S.M. Lee, Interactive integer goal programming: Methods and application, paper presented at the Conference on Multiple Criteria Problem Solving: Theory, Methodology, and Practice, Buffalo, NY (August 1977).
[8] S.M. Lee and R. Morris, Integer goal programming methods, Management Sci. (forthcoming special issue).
[9] R.K. Sarin, Interactive procedures for evaluation of multi-attributed alternatives, Doctoral Thesis, U.C.L.A., June 1975 (Working Paper No. 232 of the Western Management Science Institute).
[10] R.E. Steuer, Multiple objective linear programming with interval criterion weights, Management Sci. 23 (3) (1976) 305-316.
[11] P.L. Yu and M. Zeleny, The set of all nondominated solutions in linear cases and a multicriteria simplex method, J. Math. Anal. Appl. 49 (2) (Feb. 1975) 430-468.
[12] M. Zeleny, The attribute-dynamic attitude model (ADAM), Management Sci. 23 (1) (1976) 12-26.
[13] S. Zionts, Integer linear programming with multiple objectives, Annals of Discrete Mathematics 1 (North-Holland, Amsterdam, 1977).
[14] S. Zionts, Multiple criteria decision making for discrete alternatives with ordinal criteria, Working Paper No. 299, State University of New York at Buffalo (Feb. 1977).
[15] S. Zionts and J. Wallenius, On finding the subset of efficient vectors of an arbitrary set of vectors, forthcoming in: A. Prekopa, ed., Proceedings of the Third Matrafured Conference, January 1975, Matrafured, Hungary (Springer-Verlag, Berlin).
[16] S. Zionts and J. Wallenius, Identifying efficient vectors: Some theory and computational results, to appear in Operations Res.
Annals of Discrete Mathematics 5 (1979) 399-404 © North-Holland Publishing Company.
Report of the Session on
INDUSTRIAL APPLICATIONS
E.M.L. BEALE (Chairman)
D. Granot (Editorial Associate)
1. Introduction

This chapter is written from the material contributed at an ARIDOSA session by Thomas Baker, Robert Harder, Darwin Klingman, Wilhelm Krelle, Moshe Segal and others. Five types of models are used in industrial applications of mathematical programming:
Network flow models,
General linear programming models,
Convex nonlinear programming models,
Mixed integer linear programming models,
Nonlinear nonconvex models, perhaps also including integer variables.
The ARIDOSA was concerned primarily with the last two types of model, but others were also considered. Such applications of mathematical programming are a strong growth area among organizations with the appropriate computer software and professional expertise. There are three reasons for this:
(1) Developments in Integer Programming software have made it about 10 times faster on the same hardware as it was 5 years ago. And these developments will continue.
(2) Integer Programming model builders are more experienced. Following the failures of some large integer programming models in the early 1970's, they approach their tasks more cautiously. They know that many problems can be solved only if the model is made compatible with the solution strategy, in particular if the linear programming version of the model is a reasonable guide to its integer solution. They also know that many other problems cannot be solved unless the user is willing to accept a very simplified representation of those features of his system that have little effect on the decisions with which the model is primarily concerned. This understanding of the capabilities and limitations of integer programming will surely increase. Some analysts may fail to try modelling techniques on the grounds that they were computationally infeasible five years earlier, but human curiosity is such that this is not too serious a danger.
(3) There is a better understanding of the importance of communications and interaction between the analyst and the users of any model. The analyst must give
the user a clear picture of the structure of the model, not algebraically but in terms of the assumptions made and the data used. He must also make it easy for the user to change these data, and as far as possible other assumptions, in the light of the model results. Computer terminal facilities are now quite well adapted to these purposes. Some speakers stressed the value of allowing the user to develop his model at an interactive computer terminal. It is then particularly valuable to have the model solutions displayed graphically. There is more doubt about the need for interaction within the optimization process itself. In my experience, users are happy to leave this to the analyst, provided that he accepts that the user knows his own business best, even when he needs the analyst's help in formulating his requirements mathematically.
Tom Baker pointed out that large integer programming models can often benefit from heuristics derived from the physical processes represented in the model, when choosing how to branch at a node and perhaps when choosing which node to explore. As an example, he quoted a ship scheduling model in which a linear programming subproblem might have a solution with 0.6 ships going to Terminal 1 and 0.4 ships going to Terminal 2. The most natural way to develop this subproblem might then be to have one ship visiting both terminals, even though the value of the corresponding variable in the LP solution is zero. Such heuristics might come from the user, but in general they are more likely to come from the analyst working with him.
2. Types of application

Early successes with mixed integer programming were in medium and longer-term planning, when set-up costs, and the fixed costs and other economies of scale in new investments, are considered within an otherwise linear programming model. Many new applications of this kind are being developed. Tom Baker described various studies within Exxon: studies for expanding an existing plant, when one must decide how much new capacity, if any, to install of, say, 3 different types. Piecewise linear representations of the 3 concave cost curves are then developed, using about 20 binary variables in all. Similar problems for a new plant might require 20 curves defined by 100 binary variables in all. These curves would all be represented by S2 sets, rather than binary variables, if one were using a computer code with this facility. He also described a multi-time-period planning model, with a one-year time horizon, for scheduling blocked operations. This considers start-up costs, inventory holding costs, variable demands, costs of switching between operations, etc. The model has binary variables defining which operation to run in each time period. The lengths of the time periods are continuous decision variables of the model. A typical problem has 600 rows, 1000 continuous variables and 120 binary variables to study 7 operations plus a shut-down for 7 products in a single reactor.
Robert Harder described a sequence of models for the production and distribution of all General Motors automobiles in the United States and Canada. Some short-term planning problems are formulated as pure network models and solved with Klingman's PNET code with a solution time of about 15 seconds. Trade agreements between the U.S. and Canada which regulate the number of cars that can be shipped between the two countries impose additional linear constraints which are not of network type. PNET is then used to find a starting basis for this problem, which is solved by MPSX. Longer term planning problems, which include the choices of where each model is to be made, require integer variables. Current models have 3000 rows, 5000 variables and 30 integer variables. They are solved with difficulty using MPSX/MIP370 on a 370/145.
Ship scheduling is another area with several integer programming applications. The problems are similar to those of airline crew scheduling, but without the possibility of developing cyclic solutions. Some applications have failed because of difficulties in supplying the computer with up-to-date data. But further successful applications will surely be developed. The Exxon system described by Tom Baker for choosing which terminal or terminals should be supplied by each ship carrying Liquefied Petroleum Gas to Japan is an interesting example. A typical problem encompasses 7 months, 30 ships and 9 terminals. This leads to a model with 1781 rows and 4367 columns, including 435 binary variables.
Moshe Segal described a system for scheduling duty rosters for telephone operators. This could be adequately represented as a pure network flow problem. He also described an interactive computer system for assigning territories to telephone repairmen. The data for the area to be assigned are defined as elementary blocks with given co-ordinates and assumed work contents. The user then chooses nominal centres for each territory, and the computer derives the data for a simple transportation problem in which the blocks are assigned to centres so as to minimize the weighted sum of the distances from the centres to the blocks, subject to the constraint that equal amounts of work must be assigned to each centre. Some blocks may be split between centres, in which case the computer then allocates the split blocks to single centres so as to minimize the maximum deviation from the work content assigned to any centre. The user can then change the solution, by moving the centres, or by forbidding the allocation of particular blocks to particular centres, or by forcing the allocation of particular blocks to particular centres. The work is described more fully in Segal and Weinberger (Turfing, Operations Research 25 (1977) 367-386). A somewhat similar system is used for dividing up a large area into districts covered by single Telephone Directories.
Two newer areas of application of integer programming are to short-term (weekly or even daily) scheduling of industrial production and distribution, and to the co-ordination of the information supplied by teams of specialist consultants in large-scale economic studies.
Integer programming is now cheap enough, and reliable enough, to be used for short-term scheduling of production and distribution in many situations where the
approximations inherent in a linear programming model are unacceptable. Three particular features may lead to integer programming:
Set-up costs, or other economies of scale from using a large amount of some activity rather than a small amount.
The need, for operational convenience, to limit the number of simultaneous activities, such as the number of distinct ingredients in a blend. The concept of a semi-continuous variable is useful for many such models. This is a variable that may be either zero or between one and some upper bound, in an otherwise linear programming model.
Nonlinear constraints. These arise in particular if the yields of different products depend on the conditions under which a plant is operated. Even if the equations defining the yields of each product are linear, the constraints in the mathematical programming model are nonlinear, since the amounts of the different products that are made are defined as the yields multiplied by the flow rate. A problem of this kind from Gulf Management Sciences led Scicon Computer Services to develop Linked Ordered Sets to handle such product terms. This development is described in my DO77 paper.
Tom Baker described another way of handling nonlinear constraints used by Exxon. This is to take a fixed grid in each variable, say

    x_j = X_jk for k = 0, . . . , K,

and to assume that the function f(x_1, . . . , x_n) is approximately linear throughout any region of the form

    X_j,k ≤ x_j ≤ X_j,k+1 for each j.

Binary variables are then introduced to indicate which hyperrectangle is to be used. This piecewise linear approximation is generally discontinuous along some of the grid hyperplanes. But this is not particularly serious.
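The discontinuous grid approximation is easy to illustrate numerically. A minimal sketch (hypothetical two-variable function and grid; not Exxon's code) builds the linear approximation on each hyperrectangle from the values of f at the lower corner and its axis neighbours:

    import numpy as np

    def f(x):                      # hypothetical nonlinear function
        return x[0] * x[1] + np.sin(x[0])

    grid = [np.linspace(0.0, 4.0, 5), np.linspace(0.0, 2.0, 5)]   # the X_jk

    def approx(x):
        # locate the hyperrectangle (the role played by the binary variables)
        idx = [max(min(np.searchsorted(g, xi, side="right") - 1, len(g) - 2), 0)
               for g, xi in zip(grid, x)]
        corner = np.array([g[i] for g, i in zip(grid, idx)], dtype=float)
        base = f(corner)
        val = base
        for j, (g, i) in enumerate(zip(grid, idx)):
            nb = corner.copy()
            nb[j] = g[i + 1]       # axis neighbour of the lower corner
            val += (f(nb) - base) / (g[i + 1] - g[i]) * (x[j] - corner[j])
        return val                 # linear on each hyperrectangle, discontinuous across

    print(f((1.3, 0.9)), approx((1.3, 0.9)))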
In such applications, when the model is run frequently, with revisions to the numerical data but only occasional changes in the model structure, it is particularly appropriate to consider special-purpose algorithms and computer codes, such as network flow codes and their extensions, to reduce computer running time.
In some ways the most interesting applications of Integer Programming are to the co-ordination of large scale economic studies. Such studies are often sponsored by Government or an International Agency, but some are specifically industrial. For example, in a study of the economic viability of different possible schemes for a Gas Gathering Pipeline System to collect the associated gas produced at the oil fields in the North Sea, a mathematical model is needed to co-ordinate information from specialists in oil and gas reservoir studies, pipeline operations, platform construction and compressor operations, shore terminal operations, and gas marketing.
The use of mathematical programming for such purposes is sometimes criticized, because it concentrates on finding the precise optimum value of a possibly rather arbitrarily chosen objective function, when the real world is so much more complicated than the mathematical model within which this optimum is being found. But this criticism overlooks the facts that the hardest part of such a study is usually to find out what the problem really is, and that mathematical programming is a great help in this process. If the analyst uses the concepts of mathematical programming, he is forced to define his decision variables and the limits on their possible values. And the selection of a numerical objective function helps him and his client to identify the precise purpose of the study. The initial model will be formulated after discussions with each team of specialist consultants, and tentative agreement will be reached on how they should present their data. The Mathematical Programming Modelling team must then develop computer programs to read in these data, to check them for internal consistency, and to print them out neatly so that the Project Manager for the study can assess their significance. This part of the work is more time-consuming and far more important than the technical task of generating a matrix to enable the Mathematical Programming System to compute an optimum solution. But this optimum solution is important, not because it defines an optimum course of action in the real world, but rather because it does not. In other words the form of this optimum solution will usually reveal defects in the structure of the mathematical model, or in the values of the numerical data, or both. This process is helped by the fact that mathematical programming models tend to find extreme solutions. But often deficiencies in the structure of the model are exposed earlier, during the first detailed attempt at data collection. After a few iterations of this process of changing and re-running the model, the team will have a sensible summary of the relevant data for the study, and a fair understanding of their significance. This discussion emphasizes the importance of being able to make many runs of the model at an acceptable cost in time and money. It also emphasizes the value of a solution technique that allows the addition of a few integer variables, or perhaps a few nonlinear constraints, at short notice without having to change the solution strategy. This is the real strength of a General Mathematical Programming System for this type of work. The main pitfall to be avoided is the usual one in mathematical modelling: the temptation to include unnecessary detail about those parts of the system for which the data are known most precisely. But there are specific pitfalls associated with integer variables. There is a temptation to include them unnecessarily. An integer solution is always easier to interpret than a fractional solution, but the continuous linear programming version of the model may actually be more useful than the integer programming version when interpreted properly, as well as being easier to compute. For example one may say that the 3 alternative ways to handle some situations are A or B or C. One may then introduce 3 zero-one variables to define which is chosen. But a linear programming solution that is 75% A and 25% B
may well represent an adequate approximation to some sensible compromise between A and B that is more effective than either. The other temptation is to forget that the computing time to solve integer programming models of any given type may increase exponentially with the number of integer variables. Many practical problems are solved with hundreds of integer variables. But one should never assume that there will be no difficulties with a new application if one doubles the number of integer variables between the models generated by the test data and the real data.
3. Discussion

There was some discussion at the ARIDOSA whether new types of industrial application could be foreseen, as well as continuations of the trends already noted. It was agreed that long term financial planning models that took explicit account of uncertainty could be valuable, but it was also agreed that this raised formidable problems of modelling and data collection.
This led to a discussion of the extent to which constraints representing somewhat arbitrary or uncertain limits need to be taken literally by the optimization model. In my view it is very important not to confuse the definition of the problem with the algorithm for solving it. I therefore believe that all constraints should be interpreted literally, and stated as simply as possible for the benefit of the client. But some integer programming models are easier to solve, and more useful, if such "soft" constraints are formulated with bounded and costed slack variables, allowing some violation of the constraint as originally defined at a specified cost; such a formulation is sketched at the end of this report.
Wilhelm Krelle raised the important question of the extent to which the analyst should accept all constraints suggested by the client for "political" or other intangible reasons. He suggested that models should be solved both with and without such constraints, so that the client can see just how much he is losing in tangible benefits by accepting these constraints. He may then decide that the constraints are too costly to be acceptable.
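A minimal sketch of the costed-slack formulation mentioned above, assuming scipy is available (all numbers are hypothetical):

    from scipy.optimize import linprog

    # maximize 3x1 + 2x2 subject to x1 + x2 <= 4 (hard) and the soft constraint
    # x1 <= 2 + s, with the slack bounded (0 <= s <= 1.5) and costed at 5 per unit
    c = [-3.0, -2.0, 5.0]              # linprog minimizes, so profits are negated
    A_ub = [[1.0, 1.0, 0.0],           # hard:  x1 + x2 <= 4
            [1.0, 0.0, -1.0]]          # soft:  x1 - s  <= 2
    b_ub = [4.0, 2.0]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None), (0, None), (0, 1.5)], method="highs")
    print(res.x, -res.fun)             # at this penalty the slack stays at zero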
Annals of Discrete Mathematics 5 (1979) 405-410 © North-Holland Publishing Company.
Report of the Session on
MODELING
Darwin KLINGMAN (Chairman)
J. CLAUSEN (Editorial Associate)
This paper is written from material contributed at the modeling session by J.J.H. Forrest, D. Klingman, and J.K. Lenstra. During the session on modeling, the discussion focused primarily on the following three subjects: communication between the modeler and his client, modeling techniques, and future aspects of modeling. The main points of the discussion on each of these areas are summarized in the following three sections.
Communication between the client and the modeler

The advent of the computer has given rise to the development of computer based planning models. The techniques for building, solving, refining, and analyzing such models have undergone a steady evolutionary development as computer hardware has changed. This evolution, however, has often focused on increasing the level of mathematical abstraction, rather than on communication of the model and its solution to the user. This has led to a major communication gap and consequently has reduced the use of "optimization models" in industry and government, mainly because the researchers have largely ignored the fact that mathematically complex models are not readily understood. Furthermore, the opaqueness of optimization techniques has led companies to surrender to expedient but less than fully satisfactory processes of an ad hoc nature, such as simulation.
The question as to the extent to which this gap should be resolved gives rise to two different viewpoints. The first is that the client should be involved as little as possible in the modeling phase. In this case, the modeler presents to the client a complete model which the developer, using a very systematic presentation, explains to the client only as far as the client himself wants. Such an approach requires that the client have a high degree of confidence in the modeler and trust him to do a good job.
The disadvantage of this procedure is that often clients do not know what they
really want and/or which data are necessary to obtain the desired answers. Hence, the client should be involved in the modeling phase, thus creating an interactive process between the modeler and client. However, such an approach requires the communication gap to be resolved.
It must be realized that a solution by itself cannot characterize the nature or scope of the decision to be made. Increased model complexity, if approached properly, has the potential for providing insights into the factors pertaining to the solution. This potential has not yet been realized. Past increases in model complexity have instead led to a numbers-in/numbers-out game which tells the client what but not why. In many cases this is due to the fact that models can only be solved once or twice because of exorbitant computing costs. To garner insights requires that the model be iteratively solved using a systematically developed sequence of supply/demand policy scenarios. Once these scenarios have been solved, then insights can be acquired. But these insights must be coaxed out by a laborious synthesis of the solutions for different scenarios, interpreted and moderated through the experience of the analyst. Practitioners have been quick to observe that solutions to complex models are extremely difficult to collate and synthesize.
In response to this situation, special schematic procedures for representing models and exhibiting solutions which allow the user to gain more insights must be developed. One viable approach is to couple the technological advances in the multi-media and computer industries with the techniques for representing complex algebraic models pictorially.
Historically, network models have been used extensively in industry and government. This is primarily due to the fact that (1) these models are easily understood via a picture (one picture is worth a thousand words), and (2) they are easily solved. Building on this, practitioners developed schematic modeling procedures both for representing input problem data and for viewing more complex models. For example, Shell Oil Company developed a graphical problem generation system for mathematical optimization problems involving their refinery operations. More recently, new network formulation modeling techniques, called NETFORM, have been developed which allow many problems that were previously expressed only as complex algebraic models to be given a pictorial, network-related formulation which is equivalent to the algebraic statement of the problem. The pictorial aspects of NETFORM have proved to be extremely valuable in both communicating and refining problem interrelationships without the use of mathematics and computer jargon. While NETFORM has proved to be extremely valuable in some real-world applications, it has not solved the modeling communication gap. It does suggest an integrated interdisciplinary sequence of modeling/computer/multi-media research projects which may accomplish this goal.
Modeling techniques Roughly speaking, the techniques used in modeling can be divided according to whether the problem to be modeled possesses some special characteristics or not.
Techniques for general problems
If the problem to be modeled possesses no special characteristic or hidden structure, the modeling technique used depends on whether the model is to be solved by a general purpose MIP/LP code or some special purpose code. Among the models to be solved by special purpose codes, the generalized network (GN) model is especially interesting. The generalized network problem represents a class of LP problems that heretofore has received only a small portion of the attention it has deserved. While GN problems can easily be depicted pictorially, modelers have never focused attention on casting and identifying problems as such. This is largely due to the lack of available special purpose computer codes to solve GN problems. Recently, computer codes have emerged for solving GN problems that are substantially faster than general purpose LP codes. For example, a GN natural gas pipeline model with 4000 constraints and 11000 variables was recently solved on an IBM 370/168 for a large consultation company for $30.00. The previous cost using MPS-360 had been over $1,000.00. Not only was the problem solved more cheaply, but it was also found that it was much easier to generate the data given the pictorial view of the problem. In general, it is believed that through research many problems can be effectively modeled and solved as GN problems. The GN model is very robust and allows the modeler to capture many facets of a problem if sufficient ingenuity is applied. Another subject of interest in this connection is the advantage which can be obtained from approaching the formulation of LP problems as constrained network problems. A constrained network is an assignment, transportation, transshipment, or GN problem which also includes extra linear constraints. To both improve communications and solution aspects of models, it would be extremely valuable to train and guide modelers and decision makers to visualize their LP problems as constrained network problems. This is important for a number of reasons. It would allow the modeler to visualize the problem pictorically during both the formulation and solution stages. Further, it would allow development of graphical problem generators and solution analysis routines. Efficient algorithmic procedures have been developed for constrained network problems when the network portion of the problem is comparatively large. Preliminary testing indicates that recognizing that an LP problem is a constrained network and exploiting this fact has improved solution times by 25 times. As can be seen from the discussion above, the special purpose code approach to
modeling has several advantages. In particular, the models seem to be easily solved and readily communicated to the user, thus enabling the user to participate in the modeling. Furthermore, the interpretation of the solutions is easy for such models. It should be mentioned, too, that when dealing with special purpose codes, it is often possible to handle constraints which destroy problem structure (e.g., logical constraints) within branch and bound procedures, and thus make the solution procedure more efficient. However, specialized codes and models also have certain disadvantages, as will become apparent from the next paragraph.
The disadvantages of the general purpose codes and models compared to the special purpose codes and models are mainly that their running times are longer and that they are not easily communicated to the client. However, general purpose codes and models have some advantages, too. Often, the client has a "fuzzy" problem, i.e., he has not realized in advance all aspects of the problem. In such a case, the model will normally be subject to several modifications during a long period of time, and such modifications will in most cases be fairly easy and cheap to incorporate in general purpose codes and models, while they may be either impossible or at least very expensive to incorporate within special purpose codes and models. Furthermore, if a model is to be used only a few times, it is often less costly to use a general purpose code or model rather than a special purpose code or model, because the cost of developing the model will exceed the savings obtained from the decrease in the computer run time of the model. Finally, general purpose codes usually offer facilities enabling the user to perform sensitivity analyses, etc., while such features seldom appear in connection with special purpose codes.
Generally, it appears that special purpose codes and models should be used if the model is to be run "fairly" often, either by the same or by different clients, while general purpose codes and models should be used when the model itself is seldom run and when revisions of the model over time are foreseeable.
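To make the pure network case concrete, the sketch below solves a tiny invented transshipment instance with the min-cost flow routine of the networkx library; a general purpose LP code would solve the same model, only more slowly on large instances:

    import networkx as nx

    G = nx.DiGraph()
    G.add_node("plant", demand=-20)        # negative demand denotes supply
    G.add_node("depot", demand=0)
    G.add_node("cust1", demand=12)
    G.add_node("cust2", demand=8)
    G.add_edge("plant", "depot", capacity=20, weight=2)
    G.add_edge("depot", "cust1", capacity=15, weight=3)
    G.add_edge("depot", "cust2", capacity=15, weight=1)
    G.add_edge("plant", "cust1", capacity=5, weight=9)

    flow = nx.min_cost_flow(G)
    print(flow, nx.cost_of_flow(G, flow))  # all shipments route via the depot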
Techniques for problems with special characteristics

There are various special characteristics of a problem which may be exploited when modeling. The following is a discussion of exploitation techniques which have proven to be successful.
The first relates to scheduling problems. Though scheduling problems have been intensively studied during the last few years, there is still an enormous gap between the problems solvable by the theoretical methods and the problems actually occurring in the factories. However, in certain practical cases, e.g., where the capacity constraints of some of the machines dominate the restrictions, so that relaxing these constraints leads either to bad bounds which are cheap to compute or to good bounds which are extremely expensive to compute, the following "general heuristic principle" often works quite well: Identify the bottlenecks of the problem and use them to break up the problem into subproblems, each of
which is solvable by nice, clean theoretical methods. Solve the subproblems and concatenate their solutions to obtain a solution to the original problem. Another modeling technique relies on the fact that many problems have a hidden simple structure which can be exploited to formulate them as a special purpose model. Examples of theoretical standard models which have occurred in real-life problems are:
The travelling salesman problem: computer wiring, vehicle routing, clustering a data array; see Lenstra and Rinnooy Kan, Some simple applications of the travelling salesman problem (Op. Res. Quarterly, 1975).
The weighted graph coloring problem (given an undirected edge-weighted graph and a fixed number of colors, find a coloring of the vertices which minimizes the total weight of the edges whose vertices have different colors): assigning shipping companies which have known interference patterns to a limited number of piers.
The acyclic subgraph problem (given a directed edge-weighted graph, find an acyclic subgraph of maximum total weight): given a large number of individual preference orderings of members of a Dutch job union, find a global preference ordering which minimizes the total number of ignored individual preferences.
In general, the tricks used in constructing satisfactory and solvable models seem, in spirit, closely related to the techniques which are applied to obtain problem reductions in the theory of computational complexity.
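As a concrete instance of the first standard model, a brute-force solution of a tiny symmetric travelling salesman problem (invented distance matrix; exhaustive search is of course only sensible for very small instances):

    from itertools import permutations

    D = [[0, 2, 9, 10],
         [2, 0, 6, 4],
         [9, 6, 0, 3],
         [10, 4, 3, 0]]               # hypothetical symmetric distances

    def length(tour):
        return sum(D[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

    best = min(([0] + list(p) for p in permutations(range(1, 4))), key=length)
    print(best, length(best))         # [0, 1, 3, 2] 18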
Future aspects of modeling

In the following section some future aspects of modeling will be discussed, including aspects of the codes used to solve the models.
In the field of communication and interactive modeling, it seems promising to extend and refine the NETFORM techniques for network formulation of models. Currently, NETFORM techniques provide modelers with the capability to represent any 0-1 LP problem, and any 0-1 mixed integer LP problem whose continuous portion is a network, in a pictorial, network-related formulation which is mathematically equivalent to the algebraic statement of the problem. Research is needed to extend these techniques to handle additional nonlinear requirements and to refine these techniques to produce more efficient and compact pictorial formulations. In connection with network models, it should be pointed out that at present no automatic procedure exists for determining the largest network part of a given LP. Hence, the development of a mathematical theory and a corresponding computer code for this task is of major interest.
Also, the developments of hardware in the fields of display devices and mini-computers seem applicable in the field of interactive modeling and solution analysis. Today, we have good interactive time-shared computer systems. Additionally we have interesting computer terminals (three dimensional terminals,
light pen terminals, color terminals, etc.) and very good plotters. By effectively combining these with pictorial problem formulation, computer based models could be developed which could be easily used and understood. These models could actively involve the decision maker in the modeling and solution process. Further, features could be designed with such a system to provide "split screen" presentations of old and new solutions or models. The system could provide "map" type plots of the solution, displayed on a map showing (in color) which plants ship to which warehouses and which warehouses service which customers.
An interactive system of the type noted above would permit a decision maker to remain at a computer terminal and explore and evaluate many alternative policy questions. This is important since most problems are typically characterized by a complex environment in which multiple criteria of the "goodness" of a solution cannot be readily accommodated in a single run. With the recent advances in mini-computers and the envisioned implementation of solution codes on mini-computers, it may well be possible to make the interactive system discussed above available on dedicated mini-computers. The potential advantages of this development include:
(a) with a dedicated system, it would be feasible and highly desirable to design the operating system to minimize the human engineering aspects of problem-solving, that is, to minimize the difficulties of entering, modifying, and verifying problem data, passing the problem data to the solution code, interpreting the output, and so forth;
(b) the hands-on availability of a simple-to-use mini-computer in a division's office would greatly stimulate people to use such techniques. Learning to use a large computer system, and keeping up to date on what system routines have been changed, etc., often discourages many potential users.
With respect to the solution of general purpose code models, the incorporation of special purpose codes as options in the general purpose code seems to be a field of interest. The area of automatic preprocessing of the model is also promising, both with respect to general purpose models and special purpose models. Automatic preprocessing may include the introduction of new constraints derived from the original ones to tighten the integer formulation of the problem, identification of essentially equal columns of the matrix of the model (as such columns may cause branch-and-bound procedures to fail or at least decrease their speed substantially), identification of dominating and dominated columns of the matrix of the problem, determinant reductions, etc. In some existing codes, a subset of these features has already been implemented, but not much use is currently made of such options.
Finally, it should be mentioned that although it seems possible to give some "axioms" and rules for modeling which are helpful when teaching this subject, there is in general not much hope for such axioms in real-life modeling, due to the fact that in most cases human insight plays a major role in the modeling procedure.
Annals of Discrete Mathematics 5 (1979) 411-415 © North-Holland Publishing Company
Report of the Session on LOCATION AND DISTRIBUTION PROBLEMS

J. KRARUP (Chairman)
L. Butz (Editorial Associate)

Zoutendijk [4] gives a brief outline of the general location problem and also presents a straightforward approximative technique for its solution. He concludes with the colorful remark "the algorithm worked nicely and, as could be expected, led to a local minimum which sometimes had to be built on the top of a mountain or in the middle of a lake." This quotation can suitably serve as a headline for the following discussion. It compels us to reflect upon the fact that while we can analyze future trends in location theory, stimulated only by its mathematical challenges, it is essential to keep in mind that practitioners should not be neglected when areas for future research are proposed. The subsequent paragraphs reflect both aspects to some extent: the more theoretical questions and the somewhat broader OR-oriented viewpoints.

Krarup and Pruzan [3] reported on the different phases they went through during the preparation of their survey. These phases ranged from enthusiasm (when the invitation to contribute was received and August 1977 seemed far ahead), through frustration (when the huge bibliographies were scanned and the material appeared to be difficult to structure), to resignation (when the ambition level was gradually lowered and it was decided to include only 1-product, static, single-criterion, deterministic location problems encompassing three families of so-called prototype problems: center (minimax), median (minisum), and the plant location family (characterized by the presence of fixed costs)).

One of the key phrases of almost all of the sessions at ARIDOSA was computational complexity. What definitely remains to be tackled as far as these three families are concerned is a more specific classification scheme and complexity results. Of course, what also remains are comprehensive surveys of multiproduct, dynamic, multicriteria, stochastic location problems, including all relevant combinations of these characteristics.

If we turn to the more general OR viewpoint, that is, to the practical applicability of location theory, some real-world problems have been recognized which could be meaningfully treated by "pure" center, median or plant location models. Personal experiences have confirmed that better results are frequently achieved by means of fairly simple models, more or less well understood by the decision maker. Yet, there is a growing need for slightly more complex models than the prototype models referred to before.
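For reference, the three prototype families can be stated compactly as follows; this is a standard rendering in our own notation, not a formulation taken from the survey itself. With $I$ the set of potential sites, $J$ the set of clients, $d_{ij}$ distances, $f_i$ fixed costs and $c_{ij}$ service costs:

$$\text{(center)}\quad \min_{S \subseteq I,\ |S| = p}\ \max_{j \in J}\ \min_{i \in S} d_{ij},$$

$$\text{(median)}\quad \min_{S \subseteq I,\ |S| = p}\ \sum_{j \in J}\ \min_{i \in S} d_{ij},$$

$$\text{(plant location)}\quad \min_{\emptyset \neq S \subseteq I}\ \Big(\sum_{i \in S} f_i + \sum_{j \in J}\ \min_{i \in S} c_{ij}\Big).$$

The cent-dian hybrid mentioned below is then the convex combination $\lambda \cdot (\text{center objective}) + (1-\lambda) \cdot (\text{median objective})$ for some $0 \le \lambda \le 1$.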
During the preparation of the first three parts of the previously mentioned survey paper, a great deal of virgin territory was discovered, and a list of some 25 "challenging unsolved location problems" was compiled, some of them dealing with certain side constraints, and a number of them "hybrid" problems which have not as yet been formulated explicitly. Among the known hybrid formulations investigated so far are the account location problem (comprising both the p-median and the simple plant location problem) and the cent-dian problem (a convex combination of the center and the median problem). For the account location problem, very nice results have recently been achieved by Cornuejols, Fisher and Nemhauser [1]. They include bounds on the worst-case behavior of various heuristics and relaxations; examples show that these bounds are tight.

Another family of hybrid problems arises if we turn to the combination of location and distribution. Distribution models are mainly based on given facilities (plants, warehouses, customers) and occasionally on a given vehicle fleet with given capacities, so such problems are often of a "short term planning nature", i.e., how to optimally operate a specified system. Long term planning (which in this context means how to establish new facilities and distribution fleets or to expand existing facilities and fleets) will often (more or less explicitly) involve questions as to the overall performance. Such integrated models, which tend to be highly complex, are extremely difficult to handle with the tools available, and much more research lies ahead on this frontier. It is, for example, an open question how best to structure such problems: will better solutions to real-world problems be obtained via integrated formulations aiming at the optimal design of the overall system, or is it better to treat location and distribution problems separately, perhaps iteratively?

Another open area of great importance to practitioners is the role of postoptimal analysis in location theory. It cannot be stressed too often that decision makers are seldom satisfied just by some so-called optimal solutions, since the underlying model will never reflect all of their preferences. Neither will it completely describe the underlying physical-economic-social system. Finally, it seems appropriate to draw attention to the general question of how to approach location problems in practice. What kind of representation (planar, graph), and which measures of distance and demand should be chosen? Should the number of facilities be given, limited, or open? How about the cost structure, the planning horizon? How should capacity, environmental, and budget restrictions be treated?

The need for complexity results, further refinement in complexity measures, and further analyses of heuristics, their worst-case behavior and average-case behavior, was more or less stressed by all discussants. With respect to discrete location and distribution problems, most of the procedures proposed have been heuristics. Computational complexity results suggest that even very stylized versions of the underlying optimization problems are computationally intractable,
e.g., the travelling salesman and warehouse location problems. A notable exception is the area of location problems on graphs (i.e., p-median and p-center problems), many of which can be solved in polynomial time. However, practical problems of non-trivial size often yield only to an approximation approach.

It is rather typical that there appears to be little consensus on which heuristic approaches are most useful. A large number of fairly obvious heuristics are available for location and distribution problems. On the basis of available evidence it is often difficult to classify their results at all unequivocally. This partly reflects the rather low standards of computational experimentation. Some benchmark problems do exist that allow a reasonable comparison. In any case, many practical cases do not conform exactly to any standard model and display some special structure that has to be taken into account in solving them.

There are notable exceptions to the unsatisfactory evaluation of heuristics. The worst-case behavior of a number of heuristics has been established. Though technically demanding, this seems to be a very promising line of attack for future theoretical research. Typically, the worst-case results are very pessimistic in the sense that they hardly reflect average-case behavior. It may be possible to analyze the latter phenomenon, but in spite of some striking results due, for example, to Karp [2], the specification of an adequate probability distribution over the set of all problem instances remains a rather virgin area. In terms of promising trends, however, this approach should certainly not be ignored. In summary, then, it appears likely that location and distribution will remain among those areas where theoretical results are primarily used to analyze the implications of certain modelling assumptions and to yield insight into the behavior of heuristic approaches.

Some suggestions on future work in complexity and approximations are:
(1) A systematic classification of problems should be carried out, together with a delineation of classes with respect to their complexity; e.g., identification of the "easiest" NP-complete ones.
(2) For the uncapacitated location problem and the p-median problem, one should study approximations that use the data in a more intelligent way. Available approximations appear to use the data only to get values of the objective function. For this class of approximations, the greedy algorithm and its variations are in a certain sense best for maximizing a submodular function. By not using the data other than for function values, we are treating the location problem as a more general one of maximizing a submodular function. This is true, for example, of Benders decomposition. We need to develop procedures that use more information.
(3) The fastest procedure known for solving the LP relaxation is a dual ascent approximation. Can we develop something reasonably efficient that gives an exact solution or is primal?
(4) Results for the uncapacitated problem should be extended, when possible, to the capacitated one and to network problems. For example, there is a greedy
procedure for the capacitated problem that involves solving assignment problems and obtains the same bounds as for the uncapacitated problem. It needs to be evaluated empirically.
(5) The expected behavior of approximations and relaxations needs to be studied. Positive results have been found for a special case of the p-median problem, and we believe they can be extended.

The following classification scheme for these problems might be useful. Combinatorial location and distribution problems fall roughly into three or four classes of increasing complexity and difficulty:
(1) Simple or uncapacitated plant location.
(2) Capacitated plant location.
(3) (Tree-level) network problems with fixed charges at nodes.
(4) More general network problems of multi-commodity type with zero-one variables on nodes and routes.
It is commonly recognized that for effective treatment these problems require formulation in the "disaggregated form", i.e., with the set of constraints
$$x_{ij} \le y_i \quad \text{for all } i, j,$$

replacing the (integer-, but not LP-) equivalent form

$$\sum_{j} x_{ij} \le m\, y_i \quad \text{for all } i.$$
Class (1) is relatively well in hand for problems with up to 100 or 200 sources. Basic techniques are:
(A) Construction of local, dually feasible solutions (leading to Benders inequalities, gain functions) and enumerative steps based upon them. Starting from the "zero" origin (all plants closed), this is equivalent to using the "greedy" heuristic (sketched below).
(B) Computation of good dual solutions to the LP relaxation by "ascent methods".
Technique B is probably more efficient for the start of the search, technique A for its completion. Tie-ins of technique A with ("Lagrangean") relaxation methods, as well as interesting worst-case bounds, have been given.

Class (2) can be treated by decomposition of the dual relaxation LP, for relatively modest sizes (up to about 50 plants). A dual ascent method for the LP relaxation has recently been given, with good results for quite large problems. Recourse to enumeration (such as "state enumeration" with the state specified from the complementarity conditions) is certainly necessary as well. Branch-and-bound methods are probably less appropriate.

The near future should bring an extension of dual ascent methods to class (3). Otherwise, classes (3) and (4) can, for relatively tractable problem data, be tackled by Benders partitioning and related techniques, or perhaps also directly by production mixed-integer codes (shown to be quite effective for a distribution problem in which the integer variables enforce a multi-commodity no-splitting condition in a manner which allows the elimination of the flow variables).
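The following is a minimal sketch of the "greedy" heuristic just referred to, for the simple (uncapacitated) plant location problem; the function name and data layout are illustrative assumptions of ours, not taken from any particular code:

```python
# Greedy heuristic for simple plant location: from the zero origin
# (all plants closed), open one plant first, then keep opening the
# plant with the largest net gain (reduction in assignment cost minus
# fixed cost) while that gain is positive. A sketch only; a real code
# would combine this with the dual and enumerative machinery above.
def greedy_plant_location(fixed, cost):
    """fixed[i]: fixed cost of plant i; cost[i][j]: cost of serving
    customer j from plant i. Returns (set of open plants, total cost)."""
    m, n = len(fixed), len(cost[0])
    # open the single best plant first, so every customer is served
    first = min(range(m), key=lambda i: fixed[i] + sum(cost[i]))
    open_plants = {first}
    best = list(cost[first])   # cheapest current service cost per customer
    while True:
        gain, choice = 0.0, None
        for i in range(m):
            if i in open_plants:
                continue
            g = sum(max(best[j] - cost[i][j], 0.0) for j in range(n)) - fixed[i]
            if g > gain:
                gain, choice = g, i
        if choice is None:      # no further plant improves the solution
            break
        open_plants.add(choice)
        best = [min(best[j], cost[choice][j]) for j in range(n)]
    total = sum(fixed[i] for i in open_plants) + sum(best)
    return open_plants, total
```

For instance, with fixed = [3, 2] and cost = [[1, 4], [5, 1]] the sketch opens both plants for a total cost of 7, which here happens to be optimal; in general the greedy solution only provides the kind of bound discussed above.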
References
[1] G. Cornuejols, M. Fisher and G. Nemhauser, Location of bank accounts to optimize float: An analytic study of exact and approximate algorithms, Management Sci. 23 (8) (1977) 789-810.
[2] R. Karp, The probabilistic analysis of some combinatorial search algorithms, in: J. Traub, ed., Algorithms and Complexity (Academic Press, New York, 1976).
[3] J. Krarup and P. Pruzan, Selected families of discrete location problems, Annals of Discrete Math. 5 (1979), this volume.
[4] G. Zoutendijk, Mixed integer programming and the warehouse allocation problem, in: M. Beale, ed., Applications of Mathematical Programming Techniques (English Universities Press, London, 1970).
Annals of Discrete Mathematics 5 (1979) 417-421 © North-Holland Publishing Company
Report of the Session on COMMUNICATION AND ELECTRICAL NETWORKS

M. SEGAL (Chairman)
M. Magazine (Editorial Associate)
Introduction

Discrete optimization problems arise naturally in the planning of communication networks. These problems entail finding the least cost routing of circuits or the most efficient ways of adding capacity to a given network. The presence of economies of scale in costs and the discrete size of the facilities lead to these discrete optimization problems. These problems are typically modelled as multicommodity network flow problems, where each point-to-point circuit requirement corresponds to a commodity. The network itself is a graph with capacitated arcs. The size of these problems usually requires arc-chain rather than arc-node formulations. That is, for each commodity it must be decided which of several ways, rather than through which nodes, it will proceed through the network. We will proceed by describing some typical problems that have arisen in this area, how they were dealt with, and what technology and research is necessary to handle their natural generalizations.
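The arc-chain formulation mentioned above can be written out as follows; this is our own illustrative notation. Let each commodity $k$ have requirement $r_k$ and an admissible set of chains (routes) $\mathcal{P}_k$, let $x_P$ be the flow assigned to chain $P$ at unit cost $c_P$, and let $u_a$ be the capacity of arc $a$:

$$\min \sum_{k} \sum_{P \in \mathcal{P}_k} c_P x_P \quad \text{subject to} \quad \sum_{P \in \mathcal{P}_k} x_P = r_k \ \ \forall k, \qquad \sum_{P \ni a} x_P \le u_a \ \ \forall a, \qquad x_P \ge 0.$$

The commodity constraints $\sum_{P \in \mathcal{P}_k} x_P = r_k$ are precisely the generalized upper bound (GUB) rows exploited in the single period problem described below.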
Problems and generalizations

In single period problems the emphasis is on satisfying forecasted circuit requirements. Rearrangement of circuits and augmentation of the links are allowed for a cost. A linear programming model can be used to formulate this problem, but for large ones (500 arcs and 2000 commodities) solution has been made easier by taking advantage of the GUB structure of the problem, i.e. the constraints corresponding to each of the point-to-point demands for circuits. For such a large problem a specially written computer code still required almost three minutes of computation time on an IBM 370/165.

The natural generalization of this problem is to consider the multiperiod planning problem. Computation times for these problems grow very fast, and research on structured LP problems is needed. Special attention should be given to the development of algorithms taking into account sparsity and the block structure of the constraint matrix. Although problems of stability and convergence still exist, some preliminary work using nesting has been encouraging. In
the absence of such algorithms it may be necessary to resort to heuristics. One heuristic algorithm that was used for such a long range problem contains the following steps: (i) costs are associated with the arcs for all time periods, (ii) least cost paths for all node pairs for each time period are found, (iii) circuit requirements for each year are assigned to the appropriate routes. The required capacities for each arc can now be used to determine a facility expansion plan for each arc. These plans are then used to readjust unit costs of the arcs, and the sequence of steps is repeated until the plan stabilizes. The final plan can then be used to determine the evolving topology of the network and the implications it has on future requirements of various types of transmission facilities. This method was used in a long range planning study of the interstate communication network in the U.S.A.

Another suggestion for future research in this area is to reduce the number of commodities by considering flow patterns, that is, flow coming from or going to a particular point, rather than point-to-point requirements. Technologically, we may want to restrict flows to certain paths, making this approach inappropriate, but the computational savings should be investigated.

Another example of a long range planning problem was the planning for the transatlantic telecommunication network. Binary variables were used to model what type of cable should be installed and in what year. These, along with continuous variables representing the circuit requirements between countries, yielded a mixed integer LP. Included in the problem were circuit demand constraints for each destination in each period, cable capacity constraints, parity constraints (to ensure proper balance between circuits carried by cable and by satellite) and circuit diversity constraints for reliability. With twenty time periods and ten destinations there were approximately 100 binary variables, 10,000 continuous variables and 20,000 constraints. It was observed on a small version of the problem that running time was excessive (over an hour with IBM's MPSX/MIP) and that integerizing the LP solution would give quite satisfactory results.

A somewhat different problem involves the bundling or multiplexing of circuits. Here too, size and complexity dictate a heuristic solution approach. Transmission in the interstate network is provided by high capacity, high frequency lines. In order to be carried by these lines, individual circuits are bundled in stages (12 circuits into a channel group, five channel groups into a supergroup and ten supergroups into a mastergroup). At any node bundles may be "dropped", "picked up" and "connected through". The bundling is done by multiplexing equipment at the end points of the bundled section, with a fixed cost and a linear cost (per bundle). A bundle passing through an intermediate node intact, without dropping or picking up portions of the bundle, requires a connector at the level of the bundle or any higher level, with a linear cost (per connector). Line haul costs are linear in distance and not linear in capacity. Consider the complexity of finding the least cost bundling configuration for even a simplified problem of a line graph where the point-to-point circuit requirements are given. In general, attempts to increase utilization on the master groups will reduce line haul costs
but increase multiplexing/connector costs. A trade-off exists such that the total costs are minimized at a utilization less than the maximum attainable. Consider now the more complicated case with a more general graph, where the demands may be satisfied on different paths. A lead time of two to five years is required for expansion of a facility, and not all arcs are augmented simultaneously. The dynamic problem of finding the least cost plan to augment the network, satisfying yearly point-to-point circuit requirements, now includes the cost of the bundling configuration as well as rearrangement costs.

The complexity of these problems is realized when we note that even finding a feasible integer multicommodity flow in a capacitated network often requires using a heuristic. This has been done in an annual planning problem of the French telecommunication network by assigning one flow at a time to routes while minimizing the gap with feasibility.

The problem of the short term management of hydroelectric reserves can be modelled in the form of a multicommodity network flow problem, where each block represents the water balance of a particular valley. This problem has been solved by partitioning, when consumption is thought to take place at only one point. The problem grows considerably when there is consumption through a power transmission system. The flows through this system must satisfy many energy balance and conservation of flow constraints. A multi-period model also includes the effects of fixed charge costs associated with restarting thermal plants. The algorithm sequentially solves relaxed versions of the whole problem and converges finitely to the optimal solution.

Algorithms are needed for finding minimum cost multicommodity network flows. When the costs are linear with fixed charges it is possible to develop a greedy heuristic which gives good solutions. Even so, the computational complexity of this algorithm is O(|E|), where E is the set of edges of the support graph. This can be reduced by taking into account the submodularity of the cost function. It should be pointed out that this problem has the same structure as the plant location problem.

Capacity expansion of power plants exemplifies the application of mathematical programming to electricity generation. This was done in Greece, where accurate information concerning long term electricity generation costs would lead to a revision of electricity tariffs. Since many discontinuities and non-linearities are present, an IP model was proposed. This was abandoned, however, because of the lack of availability of an IP code and the high cost of acquiring one. In such applications, however, this may be a blessing in disguise. That is, the information one gets from LP codes is invaluable. In particular, the dual values of the exhaustible resources can be used to study substitution effects in this complex system. However, since the expansion of each plant is not tracked by LP, IP models should be used for short (5 year) planning horizons.

A different application involves the allocation of costs for cable vision networks. It is assumed that the cheapest network has been constructed for delivering cable service to each city on the network. The question is to allocate the cost
of the network to these cities in a fair manner. This was done in the Atlantic Provinces in Canada. The least cost network is a minimum Steiner tree, but we may consider the more easily found minimum spanning tree (MST). The method previously used was to allocate costs by the actual costs on the arcs. A fairer division could be found, however, by using a game theoretic approach.

The first attempt is to find the core of the game. This represents the set of solutions (cost allocations) having the characteristic that every subset of cities pays no more than it would if acting alone. The core may not even exist for the minimum Steiner tree, and even when it does, finding it requires solving an LP problem with 2^n constraints (where n is the number of cities). Solutions obtained for the MST may not be satisfactory. For example, some cities may pay the maximum they would ever pay and some the minimum. In this case we may seek the nucleolus, which is a subset of the core.

We should pursue taking advantage of the structure of the cost vector in finding the core. Unfortunately, it is not submodular; submodularity would allow the greedy algorithm always to produce an optimal solution. There is an ordering of the variables which will produce an optimal solution using the greedy algorithm. It is conjectured that all extreme points of the LP problem defining the core are integral - which, once again, is known to be true if the costs are submodular.

It should be pointed out once again that this problem was simply one of allocating costs to a given network. We could have looked at the problem from the point of view of the delivery company, which is trying to maximize profits. This would lead to marginal cost pricing, which might be considered optimal from the general economics point of view.
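The core conditions just described can be checked by brute force, which makes the exponential difficulty of the 2^n constraints concrete; the following Python sketch (function names are our own) is usable only for very small numbers of cities:

```python
# Brute-force check of the core conditions for a cost-allocation game:
# an allocation x is in the core if it splits the full cost exactly
# and no coalition S pays more than its stand-alone cost c(S).
# Enumerating all subsets is exponential - exactly the 2^n constraints
# noted in the text - so this is only usable for very small n.
from itertools import combinations

def in_core(x, stand_alone_cost, tol=1e-9):
    """x: list of allocated costs per city; stand_alone_cost(S): cost
    of serving coalition S alone (e.g. its own minimum spanning tree
    connecting S to the supplier)."""
    n = len(x)
    players = range(n)
    if abs(sum(x) - stand_alone_cost(tuple(players))) > tol:
        return False                      # must allocate the full cost
    for size in range(1, n):
        for S in combinations(players, size):
            if sum(x[i] for i in S) > stand_alone_cost(S) + tol:
                return False              # coalition S would secede
    return True
```

In practice one would of course look for structure in the cost function, as suggested above, rather than enumerate; the sketch merely shows what the core membership test demands.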
Other considerations

Even though long term telecommunications and electrical networks are going to be greatly affected by future technology, the analysis performed today could influence engineering design. For example, this is done by assuming a design and, in a long range planning model, seeing what facilities are frequently chosen, or above what price a new facility will not be chosen at all.

Although it is important to continue our search for efficient algorithms for these large problems, we shouldn't lose sight of how valuable it is to actually design the process and build the model. Sometimes more than 50% of the time is spent in designing a black box that would take input characteristics (fuel costs, number of periods, demands, etc.) and would output the objective function and constraints. These efforts are worthwhile, as they give us a great deal of information.

As in other discrete optimization problems, we need efficient ways of handling side constraints. Often reliability or diversity constraints prevent us from using only the least cost route through a network. One way this has been handled is to find the k shortest (disjoint) paths through the network; then, if no more than 1/k of the total circuits can be sent through on any one path, this would represent an optimal solution.
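A minimal sketch of the disjoint-paths device just described, in Python with illustrative names, repeatedly extracts a shortest path and deletes its edges. This greedy scheme does not in general find the best possible set of disjoint paths; it only illustrates the mechanism:

```python
# Find up to k edge-disjoint routes by repeated shortest-path
# computation (Dijkstra) with edge deletion. Assumes a directed
# adjacency map {node: {neighbor: length}}; a greedy sketch only.
import heapq

def shortest_path(adj, s, t):
    """Dijkstra. Returns the node list of a shortest s-t path, or None."""
    dist, prev, pq = {s: 0.0}, {}, [(0.0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == t:
            break
        if d > dist.get(u, float('inf')):
            continue
        for v, w in adj.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    if t not in dist:
        return None
    path, u = [t], t
    while u != s:
        u = prev[u]
        path.append(u)
    return path[::-1]

def k_disjoint_paths(adj, s, t, k):
    adj = {u: dict(nbrs) for u, nbrs in adj.items()}   # local copy
    paths = []
    for _ in range(k):
        p = shortest_path(adj, s, t)
        if p is None:
            break
        paths.append(p)
        for u, v in zip(p, p[1:]):                     # delete used edges
            del adj[u][v]
    return paths
```

With the paths in hand, the rule quoted above caps the share of circuits on any one path at 1/k, which is what provides the diversity guarantee.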
Capacity expansion problems of communication, electrical and transportation networks present many analogies. Further study should be ongoing to ferret out these similarities and differences.

In developing models for these problems one must be very careful about how decision variables are dealt with. Classifying these variables according to their cost implications and reversibility is essential. For instance, a decision variable determining the timing of a major facility installation, possibly as yet undeveloped, involves an extremely significant decision because of the magnitude of the commitment and the long lead time involved. The values of variables designating the routing of circuits in year t+20 are of secondary importance, since they are subject to forecasting review and have relatively short lead times to implementation. Also, we should be extremely frugal in defining binary or integer variables; our current computer capabilities suggest that unless this is done, heuristic approaches will be the only alternative.
Annals of Discrete Mathematics 5 (1979) 423-426 © North-Holland Publishing Company
Report of the Session on SCHEDULING

A.H.G. RINNOOY KAN (Chairman)
M.J. Magazine (Editorial Associate)

The area of sequencing and scheduling is one of the few areas of discrete optimization that are motivated by practical considerations and have also led to successful theoretical investigations. Research on these problems has been ongoing since the early 1950's. Recent years have witnessed an increasing use of sophisticated tools, both by operations researchers and by computer scientists.

The early research was carried out by operations researchers, who concentrated mainly on single machine, flow shop and job shop scheduling and contributed the first polynomial-time algorithms. Most of their work, however, involved investigating enumerative methods and evaluating simple heuristic rules by simulation. Unfortunately, these were not tested on large practical problems, and it turned out to be impossible to make any conclusive statements as to their effectiveness. In fact, with respect to accepted measures of performance, random rules were often shown to be as good as any other.

Computer scientists, motivated by the applicability of scheduling models in operating systems theory, entered the scene at a later stage. They made extensive use of computational complexity theory and developed various new good algorithms for single and parallel processor problems. For difficult (NP-hard) problems, the worst-case performance of many approximation algorithms has been successfully analyzed.

Needless to say, the field still harbours many open problems. Some of these are theoretical in nature and some are of immediate practical importance. Without question there is a great challenge in bridging the gap between these two areas.
Scheduling theory

For a limited class of deterministic scheduling problems, approximately 9,000 problems have been catalogued according to their computational complexity (R.L. Graham, E.L. Lawler, J.K. Lenstra, A.H.G. Rinnooy Kan, "Optimization and approximation in deterministic sequencing and scheduling: a survey", Ann. Discrete Math., Volume 5). Roughly 9 percent of these are now known to be easy (i.e., solvable in polynomial time); 77 percent are NP-hard, and the remaining 14 percent are open. The problems in the latter class clearly require further investigation; notorious examples are minimizing total tardiness on a single machine, and
minimizing maximum completion time of unit-time tasks on three identical processors subject to general precedence constraints. In the area of preemptive scheduling of parallel machines, there is a particular need for new techniques for obtaining NP-hardness results.

For NP-hard problems it seems natural to develop and analyze more polynomial-time approximation algorithms. The worst-case behavior of many such algorithms has already been thoroughly investigated for parallel machine problems, and this approach should be extended to other types of scheduling problems. For instance, it may very well be that for the general job shop problem no polynomial-time algorithm exists that guarantees a fixed worst-case relative deviation from the optimum value. Very little work has been done in analyzing the expected performance of heuristics, though it appears useful, in light of previous results, to consider methods that incorporate limited look-ahead rules.

With respect to optimization algorithms, improvements in polynomial-time bounds for solving easy problems should be pursued further. Enumerative methods are unavoidable for solving hard problems, and developing new techniques to improve their performance is an important endeavor. Lagrangean techniques appear to be promising in this regard.

There are several generalizations of the class of scheduling problems discussed by Graham et al. For example, the set of tasks to be performed may well be defined by cyclic demands; this occurs in workforce scheduling problems. Although rolling schedules have been proposed for such problems, discrete optimization theory has not as yet played a major role. If demand changes over time, we have production scheduling problems. The inclusion of inventory costs in these models has greatly limited the applicability of current scheduling techniques. Another extension is the inclusion of additional resource or capacity constraints, possibly varying over time. Once again, there is a need for theoretical development of methods for handling these types of constraints.

To appreciate a further-reaching generalization, note that many real world scheduling problems exhibit uncertainty in some of their parameters, such as processing times or release dates of jobs. It seems natural to attempt to develop techniques that permit this randomness to enter the model. Queueing theory would seem to be an obvious tool to use, but has not achieved spectacular success in practice. However, since scheduling and queueing theory cover similar classes of problems, the gap between them is worth bridging. A unifying theory of "stochastic scheduling" would hopefully provide us with insights on the connection between its components as well as enable us to solve certain problems that currently fall in the gap.

Almost all known tools of combinatorial optimization have been applied to scheduling problems at one point or another. It is worthwhile to continue the search for applicable results from related areas such as distribution, location and routing.
Scheduling practice

In what ways does the existing theory help practitioners to solve real problems? How do theoreticians find out about the kinds of problems that occur in practice? What do practitioners know and think of theoretical results? The difficulty in answering these questions illustrates the gap, existing not only in scheduling, between theoretical results and the solution of practical problems.

The type of approach best suited to solve a scheduling problem depends on several considerations, such as the frequency of occurrence of the problem, the costs of producing a schedule, and the need for user override or interaction. As to the latter, many scheduling problems lend themselves to solution via a so-called interactive approach, which is now made easier by available hardware and software. Such an approach enables the scheduler to change data and constraints during a run and to revise the solution during the final stages. A typical example occurred in scheduling breaks and lunches for telephone operators. Modelled as an integer program, this problem gave rise to sixty variables, twenty constraints and six generalized upper bound constraints. However, it could also be formulated as a single commodity network flow problem with a small number of additional linear constraints. The solution method was to solve the network flow problem, to locate breaks and lunches heuristically, and to repeat these steps under the user's control, allowing for user override. A similar problem in scheduling manpower at an airport was successfully solved by the use of Benders decomposition.

As indicated by the first example, it may be more helpful to find alternative formulations than to search for better algorithms. This is especially important if there are many side constraints in addition to the defining constraints. Take, for example, the airline crew scheduling problem, in which one seeks to assign crews to flights subject to several restrictions. These include, for instance, that a duty period may not exceed three landings or eleven hours from the first take-off to the last landing. The classical set covering formulation (a greedy sketch of which appears at the end of this report) does not permit efficient handling of such side constraints. However, the use of Benders decomposition on a traveling salesman-like formulation led to good feasible solutions with little effort.

It is important to determine whether such side constraints are "soft" or "hard". In practice, soft constraints occur very frequently, arising, for example, out of reserve capacity in machine maintenance scheduling. An obvious method for dealing with them is simply to ignore them and see whether they are satisfied by the solution to the relaxed problem, or whether they can be satisfied off-line. Alternatively, they can be incorporated into the objective function in a Lagrangean fashion. More research is needed to decide which approach is most appropriate in practice.

Knowing that most problems do not possess good algorithms, one wonders how
large problems are actually solved. Does Benders decomposition help, or is it simply a tool for modelling? Do integer programming formulations lead to solvable problems? A natural attempt would be to decompose a problem according to capacity bottlenecks and to solve the resulting subproblems successively while concatenating the resulting solutions, or to allocate scarce resources by identifying the most dominant one for each problem. Nothing seems to beat thinking carefully and cleverly about a specific problem with its own characteristics.

For most practical problems it is sufficient to get good, not necessarily optimal, solutions. In general, therefore, it is important to have approximation algorithms for precise formulations and optimization algorithms for formulations that ignore some complexities of the situation. Seen in this light, theoretical work on simple models acquires new significance. The area of scheduling will undoubtedly continue to provide opportunities for fruitful interaction between theory and practice of discrete optimization.
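The set covering model mentioned above for crew scheduling can be made concrete by a minimal greedy sketch; the names and data layout are our own illustrative assumptions, and real systems combine column generation with LP and enumeration, as described in the text:

```python
# Crew scheduling as set covering: each column (a feasible crew
# rotation) covers a subset of flights at some cost, and every flight
# must be covered. Greedy sketch: repeatedly take the column with the
# lowest cost per newly covered flight. Illustrative only.
def greedy_set_cover(flights, columns):
    """flights: set of flight ids; columns: list of (cost, set_of_flights).
    Returns indices of chosen columns, or None if coverage is impossible."""
    uncovered, chosen = set(flights), []
    while uncovered:
        best, best_ratio = None, float('inf')
        for idx, (cost, covers) in enumerate(columns):
            new = len(covers & uncovered)
            if new and cost / new < best_ratio:
                best, best_ratio = idx, cost / new
        if best is None:
            return None              # some flight is covered by no column
        chosen.append(best)
        uncovered -= columns[best][1]
    return chosen
```

As the session noted, the weakness of this pure covering view is that side constraints such as duty-period limits must be built into the columns themselves, which is exactly why alternative formulations were sought.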
Acknowledgements

F. Giannessi, R.L. Graham, J. Krarup, J.K. Lenstra, B. Nicoletti and M. Segal were among the discussants in Banff. I have made extensive use of a first draft, carefully prepared by M.J. Magazine.
Annals of Discrete Mathematics 5 (1979) 427-453 © North-Holland Publishing Company
CONCLUSIVE REMARKS
1. The field of discrete optimization

Discrete optimization is one of the most interesting but also one of the most difficult areas in mathematical programming and operations research. The main field of discrete optimization is that of integer programming, which is concerned with the solution of linear programming problems when the solution vector is constrained to be integer. If the components of the solution vector of the corresponding linear programming problem are "large" numbers (>100, say), then for many practical problems the additional integrality requirement can be satisfied by simply rounding the noninteger components up or down. This is a simple procedure for getting feasible solutions to the integer programming problem which are not far from the optimum. But if the components of the LP solution are comparatively small, this simple rounding procedure often fails completely: the vectors produced by this method are usually not feasible, and even if they are, they may be far from optimal. The group relaxation work shows that "rounding" linear programming solutions to get integer solutions is somewhat complicated even in the large integer case. Successfully handling integer programming problems seems to require a completely different and more sophisticated theory and methodology than for the usual linear programming problems.

The importance of integer programming problems for applications is due to the fact that many optimization models for real-world decision problems involve integer solutions which are small in the above sense, or which are of considerable importance for the decision maker (e.g. one airplane more or one less can make a difference). In particular, problems where the variables are restricted to only two values, say 0 or 1, are completely intractable by the standard linear programming methods, and require new, combinatorial techniques. These 0-1 programming problems, involving several decisions each between two alternatives, form a special class of discrete optimization problems, which are predominantly suited for representing genuine decision models. Thus, 0-1 programming serves as a bridge between integer programming on the one hand and combinatorial optimization and graph theory on the other hand. Many combinatorial, and in particular graph theoretical, optimization problems can be formulated as 0-1 programming problems, and vice versa.

A few examples of the many fields of practical applications of discrete optimization are: computer science (code optimization, wiring), economics (dynamic production planning, budgeting) and scheduling (vehicle routing, timetable
problems). A detailed discussion of further examples can be found in the working group reports.

2. Early developments
Soon after his discovery of the simplex algorithm in 1949, Dantzig showed that there are some remarkable cases in which linear programming also solves the corresponding integer problem. The most important example is the transportation problem. The next major impulse for solving integer programming problems came in 1954, when work on the travelling salesman problem led to the idea of "cutting planes": additional linear constraints that are generated from the current simplex tableau and are used to eliminate a non-integer extreme point. Thus the first known solution methods referred to specially structured integer programming problems.

The earliest applicable algorithm for general integer programming problems was constructed by Gomory (1958), who developed a general cutting plane algorithm. His work was the origin of an increasing research activity on cutting plane algorithms for integer programming problems. Many authors developed more detailed cutting plane procedures, as well as extensions of the algorithms to more general problems.

Parallel to this, another algorithmic idea arose, which has seen many variations and extensions. In 1960, Land and Doig came up with an enumerative technique for the general mixed integer problem. In this "branch and bound" algorithm, as it was called later, the finite nature of integer programming problems is exploited. Essentially the algorithm is finite because there are finitely many solution candidates. It uses an intelligent enumerative procedure (with bounds developed from linear programs) to minimize the number of candidates which have to be inspected explicitly. In spite of the straightforwardness of the basic idea of branch and bound, this approach has proved extremely useful, and provides the basis of the majority of currently applied procedures. Perhaps its success among the practitioners of this field is mainly due to its flexibility and its large adaptability to various problems with special features.

A third line of research in the beginning of integer programming theory was initiated by Gomory in his classical 1963 paper on the cutting plane algorithm, where he discussed some group structures appearing in integer programming problems. He in fact showed that the coefficient vectors of the cutting planes generated in his algorithm form a finite abelian group, where the group operation is addition modulo 1. This direction, together with work on the cutting stock problem, originated the application of group theoretic results to integer programming. Gomory transformed an arbitrary integer programming problem to a relaxed optimization problem defined on a group by omitting the nonnegativity constraints - but not the integrality constraints - on the basic variables. Moreover, a sufficient criterion was given for an optimal solution of the group problem to be also an optimal solution of the integer programming problem.
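The branch and bound idea described above can be illustrated in miniature on a 0-1 knapsack problem, the single-constraint form that also appears in the group-theoretic work discussed next. The Python sketch below is ours and is not Land and Doig's original mixed integer algorithm; it merely shows the scheme of bounding each subproblem by its LP relaxation and pruning nodes that cannot beat the incumbent:

```python
# Branch and bound on a 0-1 knapsack: bound each node by the LP
# relaxation (fill greedily by value density, with one fractional
# item) and prune any node whose bound cannot exceed the incumbent.
def knapsack_bb(values, weights, capacity):
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i] / weights[i], reverse=True)
    v = [values[i] for i in order]
    w = [weights[i] for i in order]

    def lp_bound(k, cap):
        """LP relaxation value over items k..n-1 with residual capacity cap."""
        total = 0.0
        for i in range(k, n):
            if w[i] <= cap:
                cap -= w[i]
                total += v[i]
            else:
                return total + v[i] * cap / w[i]   # fractional item
        return total

    best = 0
    def branch(k, cap, value):
        nonlocal best
        if k == n:
            best = max(best, value)
            return
        if value + lp_bound(k, cap) <= best:
            return                                  # prune: bound cannot beat incumbent
        if w[k] <= cap:                             # branch: take item k ...
            branch(k + 1, cap - w[k], value + v[k])
        branch(k + 1, cap, value)                   # ... or leave it

    branch(0, capacity, 0)
    return best

# e.g. knapsack_bb([10, 13, 7], [3, 4, 2], 5) == 17
```

The LP bound here plays the role of the "bounds developed from linear programs" in the text: the better the bound, the more of the exponential candidate set can be discarded without explicit inspection.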
These results led to active research on solution methods for the group problems and on algorithms serving as frameworks for the group approach. The group problem is often formulated as an integer programming problem with one single constraint, a so-called "knapsack problem". The first algorithms proposed, in 1965, 1966 and 1969, are of the dynamic programming type. Another treatment of group problems involves mainly the solution of shortest route problems in networks. The network derived from this has a special structure, and efficient shortest route algorithms for these networks were developed in the late 1960's.

An interesting approach to solving 0-1 programs is pseudo-Boolean optimization. As early as 1959, the possibility of applying the theory of two-element Boolean algebra to discrete optimization was considered. This idea was extended in 1963, and in the following years the Boolean approach to discrete optimization developed into an independent theory. The main advantage of Boolean methods as compared to standard methods of linear algebra is the possibility of transforming problems into simpler ones and getting a better insight into their structure.

Another solution method, belonging to none of the classes described above, is Benders' (1962) partitioning method for mixed integer programming problems, which can be applied to linear optimization problems where only a (small) part of the variables is restricted to be integer. This approach exploits the fact that any mixed integer program is equivalent to an integer programming problem with only a few variables (namely the original integer variables) but with a huge number of constraints, which correspond to the dual linear program for fixed integer variables.

All the algorithmic approaches to discrete optimization mentioned so far are concerned with the solution of more or less general discrete programming problems. The computational effort for each of these methods can grow exponentially. Even in the early days of discrete optimization, people tried to derive efficient algorithms for particular classes of discrete optimization problems. Early examples are algorithms for the shortest path problem (1959) and a modification of the simplex algorithm for the transportation problem (1951). The problem of constructing a maximum spanning tree in a network was solved (1956) by what is now called the "greedy algorithm", which turned out later to be a stimulating impulse for modern matroid theory. By exploiting the inherent combinatorial structures, it was possible to solve two other important graph theoretical optimization problems, the matching and the branching problems, by "good" algorithms. The distinction between "good" and "bad" algorithms was the beginning of complexity theory, which is nowadays a common subfield of discrete optimization and computer science.

To conclude our review of the developments of discrete optimization, we should mention that during the last decades the need for practicable methods for solving large-scale discrete programming problems was too urgent to wait for mathematically exact optimization methods. Therefore, at all times, heuristic methods were
devised which do not solve a problem optimally but try to find a fairly good feasible solution in a reasonable computing time. Since the development of such heuristics is more of an art than a science, heuristic methods until recently received insufficient attention.

3. Difficulties of discrete optimization
Let us now turn to some general remarks on the nature of integer programming problems. As we have pointed out in the previous section, the integrality constraint turns easily solvable problems into hard ones to which the methods of linear programming are no longer applicable, with very few exceptions. This is mainly due to the fact that discreteness makes the methods of classical "applied mathematics" inapplicable. The pioneering success of the simplex algorithm for linear programming had suggested the possibility of developing algorithms of similar efficiency for arbitrary integer or mixed-integer programming problems. Unfortunately this has not yet been achieved. On the contrary, recent theoretical results indicate the limits of an algorithmic treatment of general integer programming problems, as will be explained later.

One reason for the early euphoria was an overrating of the finiteness of available solution methods. Classical applied mathematics has emphasized to a great extent the notion of convergence of processes; consequently, procedures which could be proven to be even finite were considered to be the best one could ask for. However, the finiteness of procedures like the cutting plane algorithms or the branch and bound techniques by no means implies that the general integer programming problem is solved. Obviously, complete enumeration of all feasible solutions of a bounded discrete optimization problem eventually leads to an optimal solution after a finite number of tries, but one can easily give examples of rather small problems for which this finite number is so large that it would take years for any of today's computers to carry out the enumeration process. This is not very surprising, as we have been referring to the most primitive method imaginable, but unfortunately any of the known, very cleverly designed algorithms for the general integer programming problem can be shown to perform equally badly in the worst case. In fact, for many algorithms for general or for specially structured integer or mixed-integer programming problems, it has been shown that the number of steps needed to guarantee a solution grows exponentially with the number of variables of the problem. To make this perfectly clear: if advanced technology speeds up computing time by a factor k, and if previously only problems with up to n variables were tractable, then the k-times faster computer will still not be able to handle problems with more than n + log_2 k variables.

Complexity theory is a relatively young field of research dealing with such topics as the worst-case behaviour of algorithms. Thus, mere convergence or even finiteness of an algorithm does not say much about its practical applicability. That is why approaches of a combinatorial nature have become more and more important for discrete optimization.
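The arithmetic behind the speed-up remark above is worth making explicit. Assuming, as in the worst case described, a running time proportional to $2^n$ for $n$ variables, a machine $k$ times faster solves, in the same time budget, problems of size $n'$ where

$$2^{\,n'} = k \cdot 2^{\,n}, \qquad \text{i.e.} \qquad n' = n + \log_2 k;$$

a thousandfold speed-up ($k = 1000$) thus gains only about ten additional variables.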
4. The Advanced Research Institute on Discrete Optimization and Systems Applications: Trends and developments

The NATO Advanced Research Institute on Discrete Optimization and Systems Applications (ARIDOSA) took place in order to examine the thrust of developments in an area which is in the midst of substantial progress in several diverse directions, as outlined above. Integer programming is of great importance to most of these directions, but there are other related fields in computer science, systems engineering, decision analysis, combinatorics and, more generally, applied mathematics, which are equally essential. Although these fields address similar questions belonging under the general heading of discrete optimization and systems applications, all too often the progress being made, and even the problems under consideration, remain unknown beyond the narrow boundaries of a particular discipline.

The participants of ARIDOSA had the benefit of a unique and valuable opportunity to expound on the most important developments in their particular fields of work and, even more interestingly, to react to the challenges posed by the progress taking place in other fields. In order to facilitate this interchange, survey lectures were prepared as background material and delivered during the week preceding ARIDOSA. One of the strong points of ARIDOSA was this common understanding as a basis for further discussion and analysis of developments.

ARIDOSA provided two different main means of communication: plenary sessions and working groups. Each of the many plenary sessions was devoted to a specific sub-field and chaired by an expert in the respective field. These sessions turned out to be an excellent way of having fruitful discussions, not only among those participants who regarded the respective topic as their main field of research or experience, but also including those who contributed aspects from more or less related fields and thus made the discussions more controversial and meaningful. The reports of the sessions as published here contain the views expressed and conclusions drawn on what was perceived as representing the major lines of development.

Besides the regular sessions on different topics in discrete optimization and systems applications (the reports of which appear in these two volumes), special working groups were organized. In these groups the participating experts analyzed the conclusions arrived at in the mainstream sessions, with a view to detecting dominating developments and trends.

To summarize the main outcomes of the working groups, a dominant theme at ARIDOSA was the motivating force of applications of discrete optimization in large systems of various types. The need for solutions to the problems arising in real world systems provides an impetus for development of the computational and mathematical fields of discrete optimization. Participants included workers from various countries, representing the automobile industry, airline companies, electric companies, the oil industry, software and consulting groups, the computer industry,
as well as university and research people with experience in a variety of applications.

The working group on applications drew on the experience of several practitioners from various countries who represented manufacturing industries, communication industries, petroleum industries, computer manufacturers, airlines and electrical power companies. Their report is based, in part, on work discussed in several of the sessions and tries to extract common themes: reasons for the discrete nature of the problems, current solution techniques, and obstacles to adequately solving their problems. They conclude with a list of difficulties and technical advances needed in order to give usable solutions to existing problems.

Next we turn to the working group on computer codes. That group reviewed the history of integer programming and mixed integer programming codes and provided suggestions for improving and communicating the results of code development projects. Furthermore, new solution techniques were discussed, such as the usefulness of heuristics capturing the special structure of a problem, improved linear programming systems exploiting implicit problem structure, and improved branch and bound procedures dealing with combinatorial aspects of mixed integer problems. Additionally, the benefits of introducing automatic structural analysis were examined, opening a broad scope of possible code improvements. Finally, avenues of research leading to an optimal future MIP system were outlined, including characteristics of symbolic model formulation, problem data acquisition, model processing, solution analysis, and report writing systems.

The working group on methodology performed the valuable service of bringing some order into the anarchic multiplicity of procedures for handling discrete optimization problems. The group gave characterizations for each of the following list of categories: branch and bound methods, enumeration methods, cutting planes and sub-additive methods, determination of facets, logical and Boolean methods, heuristics, group theoretic methods, dual and Lagrangean methods, decomposition and partitioning methods.

The working group on combinatorics was divided into two subsections: one concerned with polyhedral aspects and the other with complexity analysis. The polyhedral work has established a close link between linear programming and combinatorial optimization problems. Classes of problems have been established for which one has duality theorems, optimality criteria and, frequently, good algorithms; linear programming is a central core of this area. The work on analysis of algorithms and complexity of combinatorial problems has attracted the attention of many researchers. We now know not only that most of our hard problems are in some sense equivalent, but that they are "extremely hard". Even within this class of hard problems, several important subclasses have been developed. The notion of heuristic algorithms has gained a new-found respectability as researchers analyse their performance on classes of problems.

It is a pleasant duty to express sincere thanks for the enthusiastic work of the
core members of the working groups, who, besides participating regularly in the debates of these groups, have undertaken the additional effort of summarizing the conclusions appearing below.

Working group on applications: T. Baker, F. Giannessi, R. Harder, J. Krarup, W. Krelle, C. Loflin, B. Nicoletti, M. Segal.
Working group on codes: H. Crowder, J. Forrest, D. Klingman, C. Krabek, S. Powell.
Working group on methodology: E. Balas, E. Beale, T. Ibaraki, A. Land, G. Nemhauser, J. Shapiro, K. and M. Spielberg.
Working group on polyhedral combinatorics and computational complexity: G. de Ghellinck, R. Graham, A. Hoffman, E. Lawler, J. Lenstra, M. Padberg, A. Rinnooy Kan.
1. Applications

The discrete optimization problems forming the basis for this report arise in the following systems: transportation systems (airlines, subway); communication systems (telephone); manufacturing-distribution systems (automobile); electrical power systems (France and Greece); petroleum refining and distribution systems. Rather than looking at specific applications, several important problem classes will be named and some general conclusions will be drawn. For each class, we give some of the places in which problems in the class arise, the reasons for the discrete nature of the resulting optimization problems, current solution techniques used, and difficulties encountered in solving realistic versions of the problems. Finally, we give some general remarks on hindrances to finding practically useful answers to the complex problems found in large systems and on what directions are needed in order to make discrete optimization more useful to practitioners.

The first problem class is financial planning. This class can be broken down into strategic planning and budgeting problems. These problems are common in government and industry. Strategic planning typically involves choice variables, e.g. representing making or not making some capacity expansion. These models may involve planning decisions over several years and typically include choices among competing alternatives of an inherently discrete nature. Budgeting models cover a shorter planning horizon and do not include 0-1 choice variables. Discreteness arises because of economies of scale, fixed charges and, in general, concave costs. The same considerations arise in strategic planning models, making them doubly hard.

Current solution methods include enumeration with heuristic devices, dynamic programming, and general mixed integer branch-and-bound. Concave costs, or, in
general, piece-wise linear costs, can be handled using special-ordered sets of type 2 (see Beale's survey, this volume). Small problems with no 0-1 choice variables can be solved easily enough. Introducing a few choice variables (15 or 20) over a short-range planning horizon of several months can still be handled. However, long range planning models over several years, with 100 or more choice variables along with concave costs, are virtually intractable. Before even beginning to solve a mixed integer program, the modeller faces several difficulties: the data is frequently very hard to obtain, there are uncertainties present which are difficult to incorporate into the model, and there may be no clear-cut objective function, or else several possible objective functions. The multi-time period aspect of the model may lead to large linear programs. When a mixed integer program is finally specified, it may require many hours of computer time to solve. Despite their great potential, mixed integer programming formulations of strategic planning models present great difficulties in practice.

Another general problem class is scheduling. Here, we divide the class into scheduling of manpower and scheduling of machines. The airline crew scheduling problem is well-known and has been successfully addressed, at least in North America. In Europe, the problem appears to be more difficult because of the "star-shaped" networks encountered there. We also saw a similar problem of scheduling telephone operators. In general, these manpower planning problems lead to integer programs of which at least a large part is a set covering problem. Various techniques are used to generate a promising collection of columns, and then either cutting planes, enumeration, branch-and-bound, or simple rounding are used to force integer answers. The discrete nature of the problem derives from the discreteness of the objects being scheduled: either persons or machines.

A simple, but commonly used, way to force integrality is to round some linear programming solution. In one case in the airline industry, a linear program was formulated which was so large and degenerate that a couple of hundred hours of computer time was required to find an optimum solution. Simple rounding of that solution, however, gave a total profit figure six times as large as could be obtained by decomposing the problem and solving smaller pieces of it. This experience underscores the great potential importance of solving large discrete optimization problems but, also, the difficulty of handling the models formulated.

Another important general class of problems is production allocation and distribution models. Experience with a model of this type was presented at the industrial applications session. The model involves planning automobile production for a six-month period. The discrete part of the problem arises because the production lines must be set up for some mix of model types, and there are change-over costs for any significant changes in the mix of models which will be produced on a given assembly line. The model includes distribution costs for the resulting production. A general purpose mixed integer programming code is used to solve the resulting integer program. There are some 2,000 constraints and 3,000-4,000
Another important general class of problems is production allocation and distribution models. Experience with a model of this type was presented at the industrial applications session. The model involves planning automobile production for a six-month period. The discrete part of the problem arises because the production lines must be set up for some mix of model types, and there are change-over costs for any significant changes in the mix of models which will be produced on a given assembly line. The model includes distribution costs for the resulting production. A general purpose mixed integer programming code is used to solve the resulting integer program. There are some 2,000 constraints and 3,000-4,000 variables, of which about 30 to 50 are 0-1 variables. Running times of 10-50 hours were reported, and optimality is almost never proven. The reduction in costs using the solutions given by the program is from $2 to $3 per car. With six months' production of 2 to 3 million cars, the company realizes an annual saving of over 10 million dollars. Thus, the many hours of computing time are amply justified. One discrete aspect of the problem which is completely ignored is the discreteness of the automobiles themselves. In essence, the only output of the integer program is to say where production mixes should be changed and what mix to change to. Once that critical information is known, a more refined linear model can be used, and final production figures can be set by other means. This model illustrates keeping only the important integer variables in an otherwise linear model which is large enough to reflect the global considerations of a large production-distribution system. Other discrete constraints arising in models of this type are restrictions on the total number of activities, among a subset of activities, which can be operated at positive levels, and restrictions which say that one variable can be positive only if (or only if not) another variable is positive.

A fourth general problem type is the class of location-capacity expansion problems. This area overlaps with strategic planning but is rather specific and arises by itself in other situations such as lock-box location problems. In fact, the simple plant location problem has an extensive literature of its own. The discrete nature of the problem, as in some financial planning models, arises because of fixed costs, economies of scale and packaging, and, generally, nonconvex cost functions. Current solution techniques are heuristic, such as the add and drop algorithms (sketched below), and enumerative. Benders' partitioning and variants of it have been used. Some solving of linear programs seems to be helpful, especially when the "disaggregated" constraint form of the linear program is used. The basic constraint on solving such problems is the number of potential locations, particularly when there are large fixed costs. Problems with one hundred potential locations remain difficult to solve. One promising direction is using special techniques to solve the stronger, but larger, linear programming problem. Network methods using the embedded dual network or VUB (variable upper bound) structure could be used.
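The "add" heuristic for plant location admits a compact statement. The sketch below is ours, with invented data: starting from nothing, repeatedly open the plant whose opening most reduces total cost (fixed charges plus cheapest-assignment costs), stopping when no opening helps. A "drop" pass works symmetrically downward from the all-open solution.

    # Greedy "add" heuristic for the simple plant location problem.
    # fixed[i]: cost of opening plant i; assign[i][k]: cost of serving
    # customer k from plant i.

    def total_cost(open_set, fixed, assign):
        if not open_set:
            return float("inf")
        serve = sum(min(assign[i][k] for i in open_set)
                    for k in range(len(assign[0])))
        return sum(fixed[i] for i in open_set) + serve

    def greedy_add(fixed, assign):
        open_set, best = set(), float("inf")
        while True:
            candidates = [(total_cost(open_set | {i}, fixed, assign), i)
                          for i in range(len(fixed)) if i not in open_set]
            if not candidates or min(candidates)[0] >= best:
                return open_set, best
            best, i = min(candidates)
            open_set.add(i)

    fixed = [4.0, 3.0, 4.0]
    assign = [[1.0, 5.0, 4.0], [4.0, 1.0, 3.0], [5.0, 4.0, 1.0]]
    print(greedy_add(fixed, assign))   # ({1}, 11.0)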
Finally, we mention network planning problems, which commonly arise in the communications industry. The discrete nature here comes from non-convex costs, frequently of a step nature. Typical problems have a linear programming relaxation which is a multi-commodity flow problem. These problems can be extremely difficult for networks of reasonable size. Multicommodity flow techniques, such as use of the arc-chain form, have been employed.

We now turn to difficulties encountered in general. We divide these into two types: technical and human. With regard to the latter, these can be summarized as problems of communication between the problem solver and the customer. Frequently there is resistance to change, resistance to giving up any power, and so on. Another problem, however, is resistance to relying on an unknown technique to solve a not-well-understood model of the actual situation. We return to the modelling question later as a technical problem, but for now let us comment that a simpler model may be better, if for no other reason than that the user can understand its deficiencies. A more complicated model will also not completely capture the real situation, and may leave the decision-maker with only two choices: reject the solution entirely, or accept it without change.

On the technical side, formulation is always a problem. The difficulty is to capture enough of the reality of the situation without making the model so complicated that the resulting problem cannot be solved or the resulting answers cannot be understood. Another technical problem is our inability to handle uncertainty. Presently, parametric linear programming and, more generally, variational studies are the main techniques employed. Simulation is sometimes helpful, and stochastic programming has some limited potential. However, when discrete elements are present in a model, the modeller is usually unable to do anything satisfactory about uncertainties in the available data, in the future, and in knowing exactly what is important.

Finally, we turn to the technical problem of solving the resulting problems. Unlike linear programs, integer programs formulated in practice frequently cannot be solved in reasonable times. This statement is particularly true for problems, such as some fixed charge problems, with poor linear programming approximations. At present, the main tool available in commercial codes is linear programming. When that tool is inappropriate, or the problems have too many 0-1 variables, these codes tend to run forever.

We conclude with three points:
(1) A great difficulty lies in problem formulation: to capture enough reality and still be able to solve the problem.
(2) The client should be involved from an early stage in order to obtain answers to his problem which he can and will use.
(3) A big "want" is a large, fast MIP code which will handle uncertainties, including uncertainties as to what the objective function should be, and which will allow post-optimal analysis of a reasonable sort. Short of this perhaps unobtainable wish, one is left with developing special purpose codes, or codes which can easily be adapted to special problems or particular applications.
2. Codes
The last twenty years have seen the launching of a number of LP and MIP code development projects. Some of these projects have produced codes which have advanced
the state-of-the-art. Many projects have, however, fallen short of this objective and/or were terminated prematurely, before their merits could be fully evaluated. The first section of this report focuses on some of the lessons gleaned from this period and provides some suggestions for improving and communicating the results of code development projects. The second section suggests avenues of research for enhancing the solution efficiency of future codes. The last section does a little crystal-ball gazing on future model generation and report writing systems.

1.0. Lessons learned
The last twenty years have taught us many things. This section summarizes some of these lessons and suggests how they may be used to improve the state-of-the-art.
1.1. Folklore

This subsection briefly discusses some of the shortcomings of efforts to improve computational research.

LP research tool

For at least the past ten years there has been a desire within the academic community to have or develop an MIP code which would be easily modifiable, portable, programmed in a moderately efficient higher-level language, especially well documented, and which would embody most of the important characteristics of commercially available codes. The motivation for such a code is to provide a system which researchers could use to test new algorithms. Such algorithms could then be efficiently incorporated into commercial codes. Several such systems have been proposed or "implemented" over the past few years. When one considers the demonstrated cost of producing such a system, the danger of missing essential features of commercial codes, and the fact that the codes produced for this purpose tend to be nearly as complex as commercial codes, one wonders why more use is not made of commercial codes as testing devices. This is particularly true considering the trend toward commercial implementation predominantly in higher-level languages (with computationally critical elements in assembly language) and the common provision of procedural language interfaces. The alternative is to accept and acknowledge the high cost of such a code and to provide a continuing professional support organization capable of developing and maintaining such codes.
Random problems

Again at this meeting, there have been numerous instances where it has been obvious that wide classes of algorithms have significantly different performance characteristics on problems constructed from random data than on problems constructed from "real" data. While in the majority of cases it
appears that "real" problems are less difficult than random problems, there is a significant minority where the opposite is true. It is therefore suggested that, for algorithmic research to be relevant to code development, computational experience with real data is necessary.

1.2. Standardization of MIP vocabulary

Every commercial MIP code has a user manual. A manual should be a bridge between the client and the code. The principal sources of information for the Land and Powell survey of commercial codes [in this volume] were the user manuals. They found a wide disparity in the terminology used: there are as many definitions of "node", "branch", and "node estimation" as there are codes! Because of the apparent simplicity of the branch and bound algorithm, it is easy to imagine that little description of the algorithm is needed and that there is no need to explain the pitfalls. Even a sophisticated user, for example, may not understand that forcing the code to find a particular integer solution does not necessarily speed up the proof of optimality. We would like to urge upon manual writers that great care should be taken in writing the manual associated with an MIP code. The manual should have examples of the uses of the user-specifiable variables and guidance on the use of the options offered by the code. No doubt manual writers will respond that they already do this, but in the code survey the descriptions of the more complex codes varied between 7 and 250 pages, and many of the code options were scattered throughout the manual, often with little or no cross-referencing! As well as encouraging user-oriented manuals, we would like to submit a plea for standard notation, for instance that set forth by Geoffrion and Marsten ["Integer programming algorithms: a framework and state-of-the-art survey", Management Science 18, pp. 465-491]. The use of such standard notation would aid the user's understanding of the algorithm and hence improve his use of the code.

1.3. Research guidelines

In the past few years it has been recognized that algorithms should be published with details of large scale computational experience. The necessity of evaluating the commercial viability of a code incorporating the algorithm indicates the need for a real-world problem library, as indicated in Section 1.1. A commercial company, such as a computer manufacturer, which wishes to implement the algorithm, however, needs further information. Even if the algorithm seems commercially viable, there remains the important question of embedding it in a branch and bound code. Such a code is a large and complicated piece of software with a fairly rigid structure. The feasibility of implementing an algorithm may depend on how well it is suited to the structure of the code. It is hoped that it will be possible at some future time to publish guidelines on the suitability of various algorithmic approaches. It is recommended that this take
the form of a tutorial paper explaining the concepts and structures of commercial branch and bound MIP codes. Additionally, it would be extremely helpful if such a paper clearly illustrated the major strengths and weaknesses of these structures. For example, commercial codes are designed to be efficient at solving continuous linear programming problems. Their integer capability will probably have been designed at the same time, but the structures used will be those best adapted to linear programming. Modern codes are designed to be efficient on large sparse matrices where it is expected that the non-zero coefficients of the matrix take only a few different values. This allows the creation of an element pool and pointers into this pool. This saves computer memory and makes it possible to solve large problems in-core, but it is not suited to some algorithms. An algorithm which adds cutting planes as new rows may not be well suited to any large scale code if the rows are dense. Also, most codes are not well suited to adding an arbitrary number of rows to a matrix. But even if the rows added are sparse, there is a significant difference between an approach in which the new non-zero elements have small integral values less than some known upper bound (in this case the values can be inserted in the pool initially) and an approach generating values which cannot be predicted in advance (and which therefore constantly requires modification of the pool). It is for this reason that it would be helpful to anybody interested in integer programming if there existed some information pointing out the suitability of various algorithms to commercial codes.
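The element pool idea can be sketched in a few lines (ours, purely illustrative of the data structure, not of any particular commercial code): distinct numerical values are stored once, and each non-zero entry of the matrix holds only a small index into the pool.

    # Sparse matrix storage with an element pool: each non-zero holds
    # (row, column, pool_index); distinct values are stored only once.

    class PoolMatrix:
        def __init__(self):
            self.pool = []          # distinct coefficient values
            self.index = {}         # value -> position in pool
            self.entries = []       # (row, col, pool_index) triples

        def add(self, row, col, value):
            if value not in self.index:
                self.index[value] = len(self.pool)
                self.pool.append(value)
            self.entries.append((row, col, self.index[value]))

        def value(self, entry):
            row, col, k = entry
            return self.pool[k]

    m = PoolMatrix()
    for (r, c, v) in [(0, 0, 1.0), (0, 3, -1.0), (1, 1, 1.0), (2, 0, 1.0)]:
        m.add(r, c, v)
    print(len(m.entries), "non-zeros,", len(m.pool), "pooled values")
    # 4 non-zeros, 2 pooled values

A cut-generating algorithm whose coefficients cannot be predicted in advance defeats the scheme: every new value enlarges the pool, which is precisely the difficulty described above.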
2.0. New solution techniques

In the previous section we briefly discussed some of the difficulties of developing and implementing effective MIP codes for both the industrial and academic communities. This section contains some thoughts on how current codes may be improved.

2.1. Heuristics

Commercially available MIP codes are designed to solve general mixed integer programming problems with a small percentage of integer variables. In the past few years there has been an increase in the use of heuristics in these codes. These heuristics (for instance pseudo-costs, pseudo-shadow costs, best projection) have been shown to be powerful aids for the solution of MIP problems. Thus problems may now be solved that were previously considered unsolvable. The heuristics are used in two places in a branch and bound search: first, to give an estimate of the effect on the objective function of branching on a particular variable and, second, to estimate the value of an integer solution that may be found from a node. A heuristic gives an estimate, while mathematical methods, such as penalties, give true bounds. It is certain that heuristics will continue to be used and developed. The heuristics are useful because they attempt to implicitly capture, or crudely summarize, the structure of a problem.
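A minimal sketch of the pseudo-cost idea (ours, in textbook form, not taken from any of the codes discussed): for each integer variable, record the average objective degradation per unit of fractionality observed when branching down and up, and use these averages to score branching candidates at later nodes.

    # Pseudo-cost bookkeeping for branching-variable selection.

    class PseudoCosts:
        def __init__(self, n):
            self.sum = [[0.0, 0.0] for _ in range(n)]   # down, up
            self.cnt = [[0, 0] for _ in range(n)]

        def record(self, j, direction, frac, degradation):
            # direction 0 = down branch (frac = x_j - floor(x_j)),
            # direction 1 = up branch   (frac = ceil(x_j) - x_j).
            self.sum[j][direction] += degradation / frac
            self.cnt[j][direction] += 1

        def estimate(self, j, direction, frac):
            if self.cnt[j][direction] == 0:
                return frac                      # uninformed default
            return frac * self.sum[j][direction] / self.cnt[j][direction]

        def score(self, j, x_j):
            f = x_j - int(x_j)
            # prefer variables where even the cheaper branch is costly
            return min(self.estimate(j, 0, f), self.estimate(j, 1, 1 - f))

    pc = PseudoCosts(3)
    pc.record(1, 0, 0.4, 2.0)     # branching down on x1 once cost 2.0
    print(pc.score(1, 3.4) > pc.score(2, 3.4))
    # True: the recorded degradation makes x1 the costlier candidate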
Pseudo-shadow costs are the only heuristics in current use that attempt to exploit the structure of a problem more explicitly. We anticipate that this is the direction in which heuristic procedures will develop: they will attempt to take more account of the problem structure. Heuristic estimates have been used in the graph searching techniques of artificial intelligence. It is plausible that such ideas, and those of automated learning, may be a source of inspiration for the development of new MIP algorithms.

2.2. Exploiting special structure

Improved LP systems

The advantages of network systems have motivated researchers to develop extensions for solving LP problems with embedded networks. Special basis compactification procedures have been developed for such problems that maintain all of the network as the "implicit" portion of the basis. The operations involving the implicit portion of the basis are simply carried out as they ordinarily would be for the underlying network. Thus it is possible to take advantage of the efficient procedures for solving network problems. This has several beneficial consequences. For instance, it reduces computer memory requirements and reduces numerical error. Additionally, it allows many operations which are performed arithmetically in LP codes to be performed by logic operations. Preliminary computational results indicate that this procedure is highly efficient for solving LP problems with a large network portion. The importance of the basis compactification area for solving LPs with embedded networks is difficult to overstate, given the large number of real-world applications which have this structure. It should be the focus of numerous active investigations and may produce the next breakthrough in the development of linear programming codes. It is further recommended that research efforts attempt to identify other special structures which merit special handling within general purpose LP codes.

In terms of integer programming, the development of LP systems which exploit structure suggests that researchers should focus on specializing branch and bound heuristics, penalty calculations, cutting plane procedures, etc., for such problems. For example, it is well known that if an optimal solution exists to a fixed charge problem then there is an extreme point optimum. This solution property can be utilized by maintaining linear independence of the variables set to be active in the branching process. If the problem is a network, linear independence can be easily maintained by creating a graph of the variables (arcs) set to be active and never allowing a variable to be set active which would create a cycle in this graph. The use of such a technique has the added property that, at any time during the solution process, it can be used to force all free variables which would form a cycle in this graph to be inactive.
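One simple way to maintain the required acyclicity (our illustration; the codes discussed may do it differently) is a disjoint-set structure over the nodes of the network: an arc may be set active only if its endpoints lie in different components of the graph of active arcs.

    # Disjoint-set test: may arc (u, v) be set active without closing a cycle?

    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]        # path halving
            x = parent[x]
        return x

    def set_active(u, v):
        ru, rv = find(u), find(v)
        if ru == rv:
            return False        # would create a cycle: must stay inactive
        parent[ru] = rv
        return True

    print(set_active(1, 2), set_active(2, 3), set_active(3, 1))
    # True True False: the third arc would close the cycle 1-2-3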
The next subsection examines changes which should be considered in designing future branch and bound codes to allow researchers and users to evaluate and employ such specializations.

Improved branch and bound procedures

Current branch and bound codes have proven to be successful on a wide range of mixed integer problems, but they have proven to be less successful on problems with a large combinatorial element. The choice of branching variables and the task of dealing with the combinatorial aspects (by shrinking bounds and fixing variables) are not always satisfactorily handled. Better results could be obtained by the generation and use of "logical inequalities" at each node for certain classes of special structures. While such processing may be feasible in a general control language, more specialized interfaces would be attractive. One way in which current research on special structures could be implemented in a commercial branch and bound code is by the provision of a limited interface of the following nature. A new task would be added to the code to accept user input which would link the LP variables to user-accessible multi-dimensional arrays. After solving each subproblem of the branch and bound tree, the code would transfer the values (and reduced costs) of the desired LP variables to the defined arrays. The user could define his own arrays if, for instance, he wished to make some logical constraints implicit. Other arrays shared between the code and the user would be for selected parts of the dual solution and for a branch state vector (to allow for multiple branch choice). The user could then write a few subroutines to use stronger fathoming algorithms, more sophisticated branching methods, or specialized heuristics.
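Such an interface might look roughly as follows. This is a hypothetical sketch; the routine names and the division of labour are ours, not those of any existing code. After each subproblem the code hands the user the primal values, reduced costs, selected duals, and a branch state vector, and the user's subroutine returns a verdict.

    # Hypothetical user exit called after each subproblem is solved.
    # The code supplies solution arrays; the user returns a decision.

    def user_after_subproblem(values, reduced_costs, duals, branch_state):
        # Example user logic: fathom if a user-computed logical bound
        # proves the node cannot improve on the incumbent.
        if logical_bound(values) >= incumbent():
            return ("fathom", None)
        j = choose_branch_variable(values, reduced_costs)
        return ("branch", j)

    # Stubs standing in for user-supplied and code-supplied routines.
    def logical_bound(values):            return sum(values)
    def incumbent():                      return 10.0
    def choose_branch_variable(v, rc):    return max(range(len(v)),
                                                     key=lambda j: rc[j])

    print(user_after_subproblem([0.3, 0.7], [1.5, 0.2], [0.0], [0, 0]))
    # ('branch', 0)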
3.0. Problem formulation

Structural analysis

As integer programming becomes more of a routine tool, it will be necessary to automate certain tasks which at present can be performed only by a person who has some experience with integer formulations. For continuous linear programming problems it is unusual for a bad formulation to increase the cost of solution by more than an order of magnitude, but in integer programming a "good" formulation may make the difference between being able to obtain the optimal solution and failing to obtain any solution. Therefore, the possible benefits of automatic structural analysis for integer programming are much greater than for continuous linear programming. The main areas where there is scope for structural analysis are:
(1) Reduction, as now done automatically for continuous problems, e.g. removing redundant rows, removing vectors which must take zero value, changing rows to simple upper bounds.
(2) Extended integer reduction techniques which recognize the extra integer structure. The simplest example is that of removing an integer vector where a row gives it an implicit upper bound less than 1.0.
(3) Changing coefficients to "tighten" the problem, bringing the continuous solution closer to the integer solutions (a numerical illustration follows at the end of this subsection). The most common example occurs when an
integer variable δ acts as a switch for a continuous variable x. The constraint x ≤ Mδ is then valid at an integer solution for any sufficiently large M, but M should be chosen as small as possible to "tighten" the problem.
(4) Adding new equations (cutting planes) to bring the continuous optimum closer to the integer optimum. A few such equations are often very powerful.
(5) Adding cutting planes after the continuous optimum has been reached, by using cutting plane methods and adding a few strong cuts.
(6) Recognition that the problem has a special structure, or that a subproblem has such a structure (e.g. a network structure) or can be translated into such a structure. Taking advantage of such structure can lead to a significant increase in efficiency, as was discussed in Section 2.2. Whether such recognition algorithms can be constructed is an open question of great practical importance.
The area of automatic structural analysis seems to be almost a virgin area where theoreticians can make an impact.
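A numerical illustration of point (3), ours: if other constraints already imply x ≤ 10, then M = 10 is the tightest valid switch coefficient, and the LP relaxation then forces δ ≥ x/10 rather than the nearly vacuous δ ≥ x/1 000 000.

    # Tightening the switch constraint x <= M * delta.
    # If an upper bound u on x is implied elsewhere, M = u is valid.

    def tightest_M(upper_bound_on_x):
        return upper_bound_on_x

    x = 8.0
    for M in (1.0e6, tightest_M(10.0)):
        # smallest delta the LP relaxation must carry for this x
        print("M =", M, " relaxation forces delta >=", x / M)
    # M = 1000000.0  relaxation forces delta >= 8e-06
    # M = 10.0       relaxation forces delta >= 0.8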
4.0. Model generation and report writing systems

In the coming decade we will see significant advances in the way we formulate mathematical programming models and report solutions. It is already possible to solve many mathematical programming problems of reasonable size and reasonable complexity. The real challenge for the future is to develop systems which allow users to easily generate mathematical programming models, easily acquire and control the problem data for the model, and easily interpret and understand the answers from the model once it has been solved. Barring some dramatic new algorithm which supplants the simplex method for solving linear programming problems, we will not soon see any great advances in the underlying principles of problem solution and analysis; it is almost certain, however, that we will see tremendous advances in the areas of problem generation and solution reporting.

4.1. Problem conception

The term "problem conception" is a generalization and an extension of "matrix generation", which has traditionally been the first step in solving a mathematical programming problem. It has been noted that a mathematical program is typically generated twice: once for the edification of the user, and then once for the computer. Although the first step may be done correctly, this in no way implies that the second step will be done correctly. In the worst of all cases, these two steps are performed by different people. The main idea of problem conception is to combine these two separate steps of problem generation into a single unified operation; the output of problem conception should be a model which is equally understandable to both the computer and the user. Problem conception has three functional divisions: symbolic model formulation, problem data acquisition, and model processing.
Symbolic model formulation

The primary tool for creating mathematical programming models will be an interactive symbolic mathematics system. The high level language used to formulate problems with this system will correspond closely, if not exactly, with the mathematical notation of modelers: the writing of equations and relations, the defining of index sets and index limits, etc. In addition, topological data and relations for network and network-related problems will be entered graphically into the system. Commonly used model components will be entered via macro definitions. The language recognized by the system will not be static; it will be possible to add new constructs which will allow application-specific input formats to be created. Various groups are currently experimenting with such systems. We hope that from these various efforts a consensus will emerge as to the principles which should be embodied in a high-level mathematical programming modeling language. Ultimately, we hope to see a standard language which will be used for both model formulation and communication. This high level problem formulation language will be entered interactively at a local multiprocessing minicomputer-based terminal. As the user enters model components, the system will check for syntax errors, inconsistencies, redundancies, invalid index limits, etc. In addition, the system should perform some degree of data-independent automatic structural analysis of the model (this topic was treated in more detail in Section 3.0). At the very least, the user will communicate with the system via an alpha-numeric keyboard and a graphic display. More sophisticated terminals will be equipped with a tablet, allowing hand-written instructions to be interpreted using pattern recognition. When this phase of problem conception is finished, there will exist a symbolic model which is easily understandable to the user and yet contains all of the necessary information about the problem for the system. This symbolic model is important because it is the tool with which we inquire into and make changes to the problem in later stages.
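No such standard language existed at the time of writing; purely as an illustration (ours, in a modern scripting language, with invented data), a symbolic transportation model might be written close to the modeler's notation, with index sets, parameters, and relations declared explicitly:

    # A purely hypothetical symbolic formulation, kept close to the
    # modeler's notation: index sets, parameters, and relations.

    PLANTS  = ["P1", "P2"]
    MARKETS = ["M1", "M2", "M3"]
    supply  = {"P1": 40, "P2": 60}
    demand  = {"M1": 30, "M2": 30, "M3": 40}

    def model():
        for i in PLANTS:      # sum over j of x[i,j] <= supply[i]
            yield ("<=", [("x", i, j) for j in MARKETS], supply[i])
        for j in MARKETS:     # sum over i of x[i,j] >= demand[j]
            yield (">=", [("x", i, j) for i in PLANTS], demand[j])

    for relation in model():
        print(relation)       # symbolic rows, readable by user and system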
The next step will be to acquire the actual problem data for the model.

Problem data acquisition

Almost without exception, the input to future mathematical programming systems will be composed of segments from large, complex data base systems. These data segments will either be created specifically for the MPS from direct input or be the output of some related application system. We will not become embroiled in the centralized vs. distributed data base controversy because, for our purposes, it is irrelevant; either alternative (or both) will be capable of interfacing with mathematical programming systems. For other application systems to share data with the mathematical programming system, there will be a rigid and well defined data-interchange protocol between similar systems and their common data base. To accomplish this, systems will be implemented in "application sets", with all systems in the set using a common data base/data management interface. This will allow data generated by one system to be used directly by other systems in the set. In the area of direct data entry, the use of programs to compute tables and lists of "matrix elements" will be made simpler through the use of powerful, directly interpreted, interactive high level languages like APL. However, there will always be the need to input "raw" empirical data to application systems. The next decade will see a proliferation of key-to-cassette, key-to-disk, and direct keyboard input devices. In addition, mathematical programming data acquisition should be able to take advantage of graphic input terminals, point-of-sale and other transaction-driven terminals, and hand-held portable cassette terminals. Later, voice-recognition terminals might be used for data acquisition.
Model processing

This phase of problem conception ensures that there is a one-to-one correspondence between the elements of the symbolic model produced in the formulation phase and the data base elements of the acquisition phase. The first step is model verification: ensuring that symbolic model components have corresponding data base elements, and that there is consistency as regards data type, dimension, etc. Assuming that all is in order, the next step is model binding: associating with each symbolic model component the location in the (possibly distributed) data base of its corresponding data element. To accomplish model processing, it will be necessary to bring on-line the symbolic model which was created and stored locally in the formulation phase. The model processing phase is typical of the blending of data base systems with large applications which will be common in the next computer generation. There will no longer be "stand alone" systems, but rather environments in which several systems interact to perform related applications.
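A schematic sketch of the verification step (ours; the component names are invented): every symbolic component must have a data base element of matching type and dimension before binding can proceed.

    # Schematic model verification: symbolic components vs. data elements.

    symbolic = {"supply": ("float", 2), "demand": ("float", 3)}
    database = {"supply": ("float", 2), "demand": ("float", 3),
                "unused": ("int", 5)}

    def verify(symbolic, database):
        errors = []
        for name, (dtype, dim) in symbolic.items():
            if name not in database:
                errors.append("missing data element: " + name)
            elif database[name] != (dtype, dim):
                errors.append("type/dimension mismatch: " + name)
        return errors

    print(verify(symbolic, database))   # [] -- the model may be bound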
4.2. Solution analysis and reporting

Traditionally, the function of solution analysis or sensitivity analysis has been associated with the problem solution function of a mathematical programming system. This was natural because the same underlying algorithms are used for both functions, and it was easy to invoke the problem solver and the solution analysis routines serially in the same batch run. Unfortunately, the user had to know in advance which solution analysis algorithms might be required. In a batch environment, this usually required that the user solve the problem, save the solution, print out and inspect the results, and then run another job with the appropriate analysis routines, restarting from the previously saved solution. In the system we foresee, the problem solution and solution analysis functions will be divorced because, despite the similarities, the former will still be essentially a batch operation while the latter will be made more powerful and effective by operating interactively. It is in the analysis phase that mathematical programmers will enjoy the real benefits of real-time interactive computing, possibly even more than in the symbolic model formulation phase. Using an on-line graphics terminal, it will be possible to rapidly explore and investigate the multiple alternatives suggested by a mathematical programming solution. Elements of the solution will be accessed and modified by referencing their corresponding symbolic model components. System graphics routines will express results in terms of graphs and drawings, as well as the usual alpha-numeric charts and tables. A high level analysis and report language, similar to the symbolic model language described above, will allow the user to easily program solution analysis procedures. The analysis and report language will also be used to interactively format the final model solution report. It will be easy to produce reports with graphs, histograms, and other graphics, as well as charts and tables. Reports, especially those to be read only once by a small number of people, will be cheaper and easier to store and retrieve electronically than in hard paper copy.
4.3. The small system: an MPS machine?

The use of minicomputer-based application systems will pervade the computer industry in the coming decade. We will see the effect of these small, relatively inexpensive systems in two ways: first, they will assume many of the modest applications which are performed on large systems today. Second, and more important, they will open up many new applications of existing procedures. The extent of these new computer uses, which we will call "consumer applications", is probably too vast to assess accurately at this time. As the price of these systems continues to decline, with accompanying increases in performance, we think it safe to say that anyone with an application and modest resources will be able to afford a computer solution. This popularization of computers will result in a vast new community of users who will view data processing from a unique perspective. They will not know or care how their computer system works; their sole concern will be that the system responds at their command to produce solutions to their specific problems. They will be unwilling, and in many cases unable, to hire programmers and other data processing support personnel. This means that the systems for these consumer applications must be self-contained, reliable, foolproof, and have an utterly simple interface with the user. This new breed of user will be completely devoid of vendor loyalty; if a system fails to meet specifications, they will not hesitate to replace it with the competition's product. The essential elements of a small system for mathematical programming will be a multiprocessing minicomputer, an intelligent terminal, possibly with graphic capabilities, a moderately sized non-intelligent mass storage system, and options such as a printer and a bulk data input device. The system control program and main application function (in our case, the mathematical programming algorithm) will be implemented in hardware or in microcode.
The software system will consist of a vendor-supplied, application-specific front-end for problem conception and back-end for analysis and reporting. The front-end program will interactively elicit responses from the user through an English-language query system. Users will, in essence, fill out a questionnaire about their problem, from which the system will construct and modify the model. The back-end program will allow the user to modify and re-solve the model, and to generate reports based on solutions. Being application-specific, it will cast problem solutions in the language of the user.
3. Methodology

Methods for solving discrete optimization problems can be classified into the following categories:
(1) branch and bound and enumerative methods;
(2) cutting planes and subadditive methods;
(3) logical and Boolean methods;
(4) group theoretic methods;
(5) dual and Lagrangean methods;
(6) decomposition and partitioning;
(7) heuristics;
(8) methods for special classes.
Each of these categories includes a considerable body of significant results and is the subject of on-going study and research. Perhaps the most likely successful new methodologies will combine or borrow from several of these areas.

1. Branch and bound and enumeration
Branch and bound seems to remain the dominant method for general mixed integer programming. It can always call on other methods to help out with difficulties, and it can compromise between optimizing and using built-in heuristics in order to find reasonably good solutions. Promising areas for further work are:
(a) Alternative branching strategies, other than simply changing the bounds on individual variables. Such strategies could be based on the physical significance of the variables, or could further exploit the logical dichotomies of the problem. An example already in use is the special ordered set strategy of Beale and Tomlin.
(b) Improvement of the linear programming bounds by using cutting plane, Lagrangean, dual, or group theoretic methods.
(c) Improvement of the heuristic estimates of the value of an integer solution at a node, and of the likely consequences of branching in different ways at that node. Heuristics to control the search in order to quickly find good solutions are important.
(d) Use of Ibaraki's concept of dominance of one node over another, particularly for problems with nearly decomposable structure.
(e) Enumeration at nodes of the branch and bound tree in order to find integer solutions, direct the search strategy, or improve bounds.
Historically, the term "enumeration" has been applied to methods using branching and, perhaps, bounding, but not using linear programming. One can say that enumeration keeps integer variables at integer values. These distinctions become fuzzy when enumerative codes solve linear programs to direct the search, and when branch and bound codes use enumeration at nodes to find integer solutions. For our purposes, let us use the term implicit enumeration to mean a method which keeps integer variables at integer values, makes at most sparing use of linear programs, and tries to develop and exploit some logical structure of the problem. For this latter point, see the description below of logical and Boolean methods. Implicit enumeration is likely to come into wider use for certain structured (usually large and pure 0-1) integer programs. Such problems are not very well approximated by the linear programming relaxation, and they are often characterized by special constraints (multiple choice, precedence, conflict) which can be directly exploited in enumeration. Sometimes this structure can be generated by finding logical inequalities.
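A bare-bones implicit enumeration for a pure 0-1 maximization (our sketch, in the Balas spirit but much simplified, with invented data and assuming non-negative constraint coefficients): variables are fixed to 0 or 1 in turn, a logical feasibility test on the residual slacks discards hopeless assignments, and a crude optimistic bound prunes the search, all without solving a single linear program.

    # Implicit enumeration for max c.x s.t. A x <= b, x in {0,1}^n.
    # Assumes A >= 0; only logical tests and a crude bound are used.

    def solve(c, A, b):
        n, best = len(c), [float("-inf"), None]

        def search(j, x, slack, value):
            if value + sum(cj for cj in c[j:] if cj > 0) <= best[0]:
                return                      # optimistic bound cannot win
            if j == n:
                best[0], best[1] = value, x[:]
                return
            for xj in (1, 0):               # try setting x_j = 1 first
                new_slack = [s - xj * row[j] for s, row in zip(slack, A)]
                if all(s >= 0 for s in new_slack):   # logical feasibility
                    search(j + 1, x + [xj], new_slack, value + xj * c[j])

        search(0, [], list(b), 0.0)
        return best

    print(solve([5.0, 4.0, 3.0], [[2.0, 3.0, 1.0]], [4.0]))
    # [8.0, [1, 0, 1]]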
2. Cutting planes and sub-additive methods

Cutting planes have provided a useful method for solving set partitioning, travelling salesman, and set covering problems. In addition, several authors report success with Benders cuts on specially-structured MIP problems. The use of cutting planes is likely to increase, particularly on problems containing special sub-structures, with cuts designed to exploit this structure. The growth in theoretical devices for producing cuts allows one to "specially design" cuts for a given structure. Cutting planes will probably also find increasing use in hybrid algorithms. For instance, sub-additive cuts have been employed with substantial success at the 0-level of implicit enumeration (the final objective function expression being carried along for exploitation during the enumeration). However, cuts at the 0-level in branch and bound schemes are likely to be useful only if they are faces, or even facets, of the convex hull of the integer solutions, or facets of the mixed integer group problem, since otherwise they are likely to become redundant constraints as branching proceeds. One interesting use of cutting planes, though possibly of rather limited applicability, is to provide a cost-sensitivity analysis (as opposed to a right-hand side sensitivity analysis) of a solved problem. The theory of cutting planes would probably benefit from developing further interconnections both with the cutting planes for the "well-solved" problems and with the facet-producing methods for programs with special structure.

Sub-additive methods represent a highly developed body of theoretical work on cutting planes. We can distinguish two aspects: first, they involve a reduction of questions on cutting planes to constructions of sub-additive functions; and, second, they provide some means of producing sub-additive functions which give powerful cuts. Although they were originally developed in the context of the "group problem", they have proved to be adaptable to many other contexts as well, and they are a primary source of information on an integer program that is of a very different nature from the information provided by branch-and-bound. Sub-additive methods could be made more useful by the development of techniques for obtaining "good" problem relaxations, such as group and semi-group problems which differ from the usual group problem. A sub-additive framework for directly treating binary variables would be valuable, as would be the continued development of new methods for generating sub-additive functions.
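The central validity fact behind these methods can be recorded in one line (standard material, our transcription, not a quotation from the report): for the pure integer program $\min\{cx : Ax = b,\ x \ge 0 \text{ integer}\}$ with columns $a_j$, every sub-additive function $F$ with $F(0) = 0$ gives the valid inequality

    \sum_j F(a_j)\, x_j \;\ge\; F(b),

since $F(b) = F(\sum_j a_j x_j) \le \sum_j x_j F(a_j)$ by repeated sub-additivity. The art lies entirely in choosing $F$ so that the resulting cut is deep.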
3. Logical and Boolean methods

The use of logical and Boolean methods in the solution of discrete optimization problems has been discussed in detail in one of the surveys appearing in this volume. We shall indicate here only some of the essential points regarding these methods. The possibility of deriving "logical" constraints from a given linear inequality in 0-1 variables has been noticed by several authors (Balas, Hammer, Spielberg), who have also devised different algorithms (mainly of the enumerative type) involving such relations. The set of all "logical consequences" of a linear inequality turns out actually to be equivalent to the original inequality. Unfortunately, this set is too large to offer a realistic way of reducing the original problem to some standard form consisting only of logical relations. The main advantage of examining the logical consequences derived from the individual constraints of a problem consists in the fact that, by using Boolean methods, these consequences can be combined into forceful relations. This idea has been shown to lead to valuable algorithmic and other developments. It is felt now that more powerful methods will be constructed by successful combinations of Boolean procedures with other approaches.
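By way of illustration (ours, with invented data), the simplest logical consequences of a single 0-1 inequality with non-negative coefficients are the pairwise conflicts: if two coefficients together exceed the right-hand side, the two variables cannot both be 1.

    # Derive pairwise logical inequalities x_i + x_j <= 1 from a single
    # 0-1 constraint sum(a[j] * x[j]) <= b with a[j] >= 0.

    def pairwise_conflicts(a, b):
        n = len(a)
        return [(i, j) for i in range(n) for j in range(i + 1, n)
                if a[i] + a[j] > b]

    print(pairwise_conflicts([5, 4, 3, 1], 7))
    # [(0, 1), (0, 2)]: x0 + x1 <= 1 and x0 + x2 <= 1 must hold

Conflicts derived from different constraints can then be combined by Boolean methods into stronger relations, which is precisely the advantage described above.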
4. Group theoretic methods

The group theoretic relaxation of an integer program provides valuable insight into the program when the right-hand side is large. It can be solved efficiently by list processing algorithms for problems where the size of the group is not greater than a few thousand. The group problem has stimulated a good deal of work in cutting-plane theory, and has led to cutting planes which have proved useful in some enumerative algorithms. The potential of group based ideas, however, goes well beyond the usual group problem. In fact, by the use of homomorphisms into semi-groups a large variety
of group and semi-group problems can be designed which yield to the same theoretical analysis. These problem relaxations can embody much useful information about the integer program, which is valuable if the number of irreducible vectors of these relaxations is kept within bounds. These alternative group and semi-group relaxations deserve further investigation. The use of Lagrangean techniques for altering the objective function of group and semi-group problems, while retaining the constraints, is a valuable means of improving the bounds obtained from these problems.
5. Dual and Lagrangean methods

Dual extreme points or rays can be translated into inequalities over the primal integer variables. The coefficients of the resulting inequalities can then often be used in the construction of "greedy" enumeration steps. Alternatively, logical inequalities can be generated. Dual ascent methods are heuristic procedures for generating a dual feasible solution to a linear programming relaxation of an integer program (possibly one strengthened by cuts). They appear to be surprisingly effective, and economical in space and time, especially for highly structured problems (prime examples are simple and capacitated plant location). Lagrangean methods are intended to exploit special structures arising in a variety of discrete optimization problems. Sometimes the special structures are not obvious, such as the 1-tree structure embedded in the travelling salesman problem. Moreover, special structures are sometimes induced, such as the group equations in integer programming. Lagrange multipliers on the complicating constraints are added to the objective function. The resulting calculation is easy to perform because it is an optimization only over the special structure. The selection of appropriate multipliers in the Lagrangean calculation is accomplished by solving the dual problem. The dual problem is a nondifferentiable concave optimization problem for which a number of methods exist, but for which dominant solution strategies have not yet been determined. Lagrangean methods need to be embedded in branch and bound because there is no guarantee that they will provide an optimal solution to the original discrete optimization problem. In the branch and bound framework, they provide lower bounds, and occasionally feasible and optimal solutions to the original problem. Lagrangean techniques can also be used to evaluate heuristics.
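The multiplier search is commonly carried out by a subgradient scheme. The sketch below is ours; the problem data are invented, and the "special structure" is simply the 0-1 cube, chosen so that the inner Lagrangean minimization is solvable by inspection of reduced costs.

    # Subgradient ascent on the Lagrangean dual of
    #   min c.x  s.t.  A x >= b,  x in {0,1}^n,
    # relaxing the complicating constraints A x >= b.

    def lagrangean(c, A, b, iters=50):
        m, n = len(A), len(c)
        lam, best = [0.0] * m, float("-inf")
        for t in range(1, iters + 1):
            red = [c[j] - sum(lam[i] * A[i][j] for i in range(m))
                   for j in range(n)]
            x = [1 if rj < 0 else 0 for rj in red]       # inner minimum
            value = (sum(rj for rj in red if rj < 0)
                     + sum(lam[i] * b[i] for i in range(m)))
            best = max(best, value)                      # valid lower bound
            g = [b[i] - sum(A[i][j] * x[j] for j in range(n))
                 for i in range(m)]                      # subgradient
            lam = [max(0.0, li + (1.0 / t) * gi) for li, gi in zip(lam, g)]
        return best

    # One covering row x0 + x1 >= 1 with costs 3 and 5; the optimum is 3.
    print(lagrangean([3.0, 5.0], [[1.0, 1.0]], [1.0]))
    # prints a bound close to 3.0 (up to rounding)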
6. Decomposition and partitioning

There has been relatively little use of decomposition (except for highly structured problems) and somewhat more use of Benders partitioning (again for special problems). It is likely that large scale codes would be enhanced by the availability of an automatic (partial) decomposition and partitioning facility, possibly coded in a high level systems language. One very important problem for branch and bound methods is to improve the
way weakly linked component problems are handled. We may think of these as independent component problems except for a few common rows and a few linking columns. But the more general concept is that sometimes we may believe on physical grounds that the optimum way to handle one component problem will be only marginally influenced by the way we handle the others. In Professor Ibaraki's terminology, the problem is to recognize when one combination of values of the integer variables within one component problem "dominates" another combination of values of the same variables.
7. Heuristics

There remain many types of problems in the area of discrete optimization which will not be solved optimally in the foreseeable future. For these we clearly need better heuristic methods. We can mention greedy algorithms, the Shen Lin type of algorithm for the travelling salesman problem, and airline crew scheduling as areas where good heuristics have been found. Simple "local searches" within branch and bound or enumeration (i.e. looking at a limited set of neighbours of a current point), or conducting a controlled search for integer solutions around the LP solution at a node, are frequently very useful. Branch and bound and enumerative methods are both often used as heuristic rather than as strictly optimizing routines, and the tactics of the search may need modification when this is explicitly the case.
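The exchange idea behind the Lin type of travelling salesman heuristic fits in a few lines (our sketch; the tour and distances are invented): repeatedly reverse a segment of the tour whenever doing so shortens it, until no such exchange helps.

    # 2-opt exchange heuristic for the travelling salesman problem.

    def tour_length(tour, d):
        return sum(d[tour[i]][tour[(i + 1) % len(tour)]]
                   for i in range(len(tour)))

    def two_opt(tour, d):
        improved = True
        while improved:
            improved = False
            for i in range(1, len(tour) - 1):
                for j in range(i + 1, len(tour)):
                    new = tour[:i] + tour[i:j][::-1] + tour[j:]
                    if tour_length(new, d) < tour_length(tour, d):
                        tour, improved = new, True
        return tour

    d = [[0, 2, 9, 4], [2, 0, 4, 9], [9, 4, 0, 3], [4, 9, 3, 0]]
    print(two_opt([0, 2, 1, 3], d))   # [0, 1, 2, 3], length 13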
8. Some special classes

The investigation of the facial structure of polyhedra associated with hard combinatorial optimization problems, such as travelling salesman problems, set packing and set covering problems, and multi-dimensional knapsack problems, constitutes a systematic way to obtain tighter formulations of the respective problems, in terms of linear inequalities, than is provided by an ad hoc formulation of such purely combinatorial problems. Much research effort has been spent to date on the descriptive identification of facets of various polyhedra associated with these standard combinatorial problems, while at present comparatively little is known about the algorithmic identification of facets. (See the survey on covering, packing and knapsack problems.) A recent computational study concerning the usefulness of inequalities defining facets of the convex hull of tours of the travelling salesman problem has confirmed the hypothesis that such inequalities do indeed provide considerable computational help in the solution of this difficult combinatorial problem.
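One concrete illustration (ours, recorded for definiteness): for the travelling salesman problem on a node set $V$, the subtour elimination inequalities

    \sum_{e \in E(S)} x_e \;\le\; |S| - 1, \qquad \emptyset \neq S \subsetneq V,

where $E(S)$ is the set of edges with both ends in $S$, are valid for every tour, and are known to define facets of the tour polytope for suitable cardinalities of $S$.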
4. Polyhedral combinatorics and computational complexity

Polyhedral combinatorics is an area which owes much to the work of Fulkerson. His work on network flows was a major step in unifying several combinatorial
areas. Examples of the theorems now seen as instances of linear programming duality (along with integrality of both an optimum primal and an optimum dual solution to the network flow linear programming problem) are Dilworth's theorem, the Egerváry-König theorem, and the max-flow min-cut theorem. The original proof of the max-flow min-cut theorem by Ford and Fulkerson (1956) used the chain-arc formulation, which later became a prototype of a blocking pair. Further work on chain-arc and path-node incidence matrices, in both the directed and undirected cases, has led to extremely interesting generalizations and theorems. The notion of a blocking pair of clutters and a duality result are due to Edmonds and Fulkerson. This work generalized and included many "bottleneck extrema" problems. It also led, along with other results such as the chain-arc form of the max-flow min-cut proof, to Fulkerson's notion of a blocking pair of polyhedra. Every chain (directed path) from source to sink in a directed graph meets at least one cut separating source and sink. The blocking pair of polyhedra are those given by Ax ≥ 1, x ≥ 0, and By ≥ 1, y ≥ 0, where A is the incidence matrix whose rows correspond to chains from source to sink, and B is the matrix whose rows are the incidence vectors of cuts separating source and sink. The vertices of one of these polyhedra correspond to the facets of the other, and vice versa. Linear programming duality, along with the preceding integrality result, gives the max-flow min-cut theorem for the first of these polyhedra, and gives the shortest chain, max packing of cuts theorem for the second. These two combinatorial results are intimately related as polyhedral pairs within this theory.

One of the most striking results of polyhedral combinatorics is the proof of the perfect graph conjecture by Fulkerson and Lovász. The main line of proof is through the anti-blocking theory (the packing version of the blocking theory) of Fulkerson. Lovász eventually succeeded in proving the result, which had been an outstanding conjecture for many years. Beyond these developments, Edmonds' (1965) characterization of the matching polytope and derivation of a good (polynomially bounded running time in the worst case) algorithm must be considered a major achievement. Later, he also gave similar results for the matroid intersection problem and related problems. Furthermore, his distinction between good algorithms and problems for which such algorithms exist was a forerunner of the current complexity work in computer science and combinatorial optimization. To this distinction, we need only add Cook's theorem showing the existence of an NP-complete problem (roughly, one such that if it could be solved in polynomial time then any problem which could be solved by a "branching tree process of polynomial depth" could be solved in polynomial time) to leave the way open for a drawing of the line between known "easy" and known "hard" problems.

The question as to whether P = NP is one of the major unsolved problems in the study of algorithms. Most people working in this area doubt that P = NP because, if so, most of the hard problems could be solved in polynomially-bounded
time. In the meantime, there has been much work showing many problems to be NP-complete. The delineation of the boundary between P and NP continues and becomes increasingly precise as special classes of NP-complete problems are shown to be solvable by good algorithms. For example, good algorithms have been found for the optimum independent node-set problem in a graph which is claw-free. What conditions suffice to allow good algorithms for hard problems is a question which is hard to answer in advance, until one has an algorithm which will solve the problem given those conditions. This whole area is very interesting and active, but a major difficulty is that it is so hard to prove that no algorithm can work well on a class of problems. "Algorithm" is defined so broadly that proving anything about all algorithms is very hard.

Another active area is improving the polynomial bound on already well-solved problems. Edmonds and Karp gave an O(N^5) algorithm for the maximal flow problem. Subsequently, Dinits reduced the bound to O(N^4), and then Karzanov reduced it to O(N^3) using an algorithm which does not send flow along augmenting paths. This is just one example of a problem where better bounds have been shown for improved algorithms.

Preprocessing operations may often be applied to NP-complete problems to simplify their expression and render them more amenable to solution. For example, reduction of coefficients in the constraints of linear 0-1 programs, as studied by Bradley, Hammer and Wolsey, yields equivalent problems with fewer non-zero coefficients and coefficients of smaller magnitude. For some problems, such as the knapsack problem, many variables may be fixed at 0 or 1 immediately, as shown by some easy tests. Similar results may be obtained with more sophisticated tools; e.g. Nemhauser and Trotter have shown that the variables equal to 0 or 1 (corresponding to vertices included or excluded) in the optimal solution of the linear programming relaxation of the weighted vertex packing problem retain this value in at least one optimal solution. Logical inequalities having few variables may often be deduced from the constraints of a problem in 0-1 variables, and provide insight into that problem's structure. Looking at the order relations between variables, and combining such relations, often allows one to show that some problems are infeasible, or that they may be simplified by the fixation or identification of variables. While such Boolean operations alone rarely suffice to obtain optimal solutions, they enhance, sometimes considerably, the efficiency of branch-and-bound algorithms which also use other bounding methods.
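The flavour of such preprocessing can be shown on a single 0-1 constraint. The sketch below is ours; the reduction rule is the standard one, not a transcription of the papers cited. Variables with coefficients exceeding the right-hand side are fixed at 0, redundant rows are detected, and a coefficient larger than necessary is shrunk together with the right-hand side without changing the 0-1 solution set.

    # Presolve for one 0-1 row: sum(a[j]*x[j]) <= b, all a[j] > 0.

    def presolve_row(a, b):
        fixed_zero = [j for j, aj in enumerate(a) if aj > b]
        a = [aj for aj in a if aj <= b]
        S = sum(a)
        if S <= b:
            return "redundant", a, b, fixed_zero
        # coefficient reduction: if the row can only bind when x_j = 1,
        # shrink a[j] and b by the same amount (solution set unchanged).
        for j in range(len(a)):
            slack = b - (S - a[j])
            if slack > 0:
                a[j] -= slack
                b -= slack
                S -= slack
        return "reduced", a, b, fixed_zero

    print(presolve_row([5.0, 3.0, 9.0], 6.0))
    # ('reduced', [2.0, 2.0], 2.0, [2]): 9 > 6 fixes x2 = 0, and
    # 5*x0 + 3*x1 <= 6 tightens step by step to 2*x0 + 2*x1 <= 2.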
The study of the facets of the integer polytopes of many NP-complete problems is being actively pursued. When a sufficient knowledge of these facets has been obtained, algorithms may be obtained which are polynomial in most cases, even if they are non-polynomial in the worst case, as is the case for Padberg's work on the travelling salesman problem.

The relevance of the worst-case criterion is subject to some criticism. Mean-case analysis, although more desirable from the practical point of view than worst-case analysis, is often difficult; the probability distribution is hard to ascertain and the computations may be unwieldy. Some mean-case results have, however, been obtained with random graph models: for the independence number problem by Tarjan, and for the chromatic number problem by McDiarmid. For the transitive closure of a graph, algorithms are known which take O(N^3) operations in the worst case but O(N^2) in the mean case.

An interesting outgrowth of computational complexity analysis are the polynomial approximate algorithms, which have recently been much studied. Such algorithms allow solving the subset-sum problem within 1 + ε, the bin-packing problem within 11/9, and the metric travelling salesman problem within 2. Polynomial approximate algorithms, among which the greedy algorithm is one of the simplest, have been studied for independence systems by Jenkins and by Korte and Haussmann, and for submodular functions by Cornuejols, Fisher, Nemhauser and Wolsey. The complexity of finding an approximate solution to a given problem has also been studied; it yields some surprising results and some insight into the relative complexity of NP-complete problems. For instance, Garey and Johnson have shown that solving the chromatic number problem within 2 is NP-complete.
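The greedy algorithm for an independence system takes a few lines (our sketch; the weights are invented): sort the elements by weight and keep each one that preserves independence. On matroids it is exact; worst-case ratios for general independence systems are the subject of the work of Korte and Haussmann cited above.

    # Greedy for a maximum-weight independent set in an independence system.
    # "independent" is any oracle closed under taking subsets.

    def greedy(elements, weight, independent):
        chosen = []
        for e in sorted(elements, key=weight, reverse=True):
            if independent(chosen + [e]):
                chosen.append(e)
        return chosen

    # Toy system: subsets of size <= 2 are independent (a uniform matroid),
    # so greedy is exact here.
    w = {"a": 5.0, "b": 4.0, "c": 3.0}
    print(greedy(list(w), lambda e: w[e], lambda s: len(s) <= 2))
    # ['a', 'b']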
Several problems were raised, in addition to those cited above. Let α(G) denote the independence number of a graph G with n vertices. Can we find α(G) in polynomial time if G is perfect? Conversely, knowing how to obtain α(G) in claw-free graphs, can we determine the stable set polyhedron of such graphs? Can Euclidean properties be further exploited, as in Shamos' work? For instance, can we solve Euclidean matching problems more efficiently than in O(N^3) time? Is graph isomorphism NP-complete in general? Finally, is a good polyhedral characterization of a combinatorial optimization problem equivalent to the problem being polynomially solvable? And so on.