Math. Program., Ser. A 86: 515–532 (1999)
Springer-Verlag 1999
Digital Object Identifier (DOI) 10.1007/s101079900094
Robert A. Stubbs · Sanjay Mehrotra
A branch-and-cut method for 0-1 mixed convex programming

Received January 16, 1996 / Revised version received April 23, 1999 / Published online June 28, 1999

Abstract. We generalize the disjunctive approach of Balas, Ceria, and Cornuéjols [2] and develop a branch-and-cut method for solving 0-1 convex programming problems. We show that cuts can be generated by solving a single convex program. We show how to construct regions similar to those of Sherali and Adams [20] and Lovász and Schrijver [12] for the convex case. Finally, we give some preliminary computational results for our method.

Key words. mixed integer programming – convex programming
1. Introduction

This paper develops a branch-and-cut method for solving the mixed zero-one convex programming (MICP) problem:

    minimize   c^T x
    subject to g_i(x) ≤ 0,   i = 1, ..., m,        (1)
               0 ≤ x_i ≤ 1,  i = 1, ..., p,
               x_i ∈ {0, 1}, i = 1, ..., p,

where c, x ∈ ℝ^n and each g_i : ℝ^n → ℝ is a convex function. Let C denote the feasible set of the continuous relaxation of (1). We make the following assumptions:

A1. for all x ∈ C, −L ≤ x_i ≤ L, for i = p + 1, ..., n, and
A2. for all x ∈ C, g_i(x) ≥ −L, for i = 1, ..., m, where L is a large positive constant.

Our development is motivated by the successful employment of branch-and-cut methods for structured 0-1 integer linear programs [15,18] (SILP) as well as general mixed 0-1 integer programs [2,3] (MILP). The effectiveness of branch-and-cut methods for 0-1 linear programs is a consequence of significant advances in cutting-plane generation methodology and in linear programming solution methodology during the past two decades [14]. Although branch-and-cut methods are the most popular methods for SILP and MILP, to the best of our knowledge no such method has been developed for solving the MICP problem. Recent years have seen significant progress in methods for solving convex programming problems [16]. We expect that the development of this paper, combined with robust convex programming solvers, will significantly improve our ability to solve the MICP problem.

This paper makes the following contributions. (i) It shows that the convex hull of (1) can be generated sequentially, thus extending the results of Balas, Ceria and Cornuéjols [2], and Sherali and Adams [20,21] for the linear case. (ii) It shows that the problem of finding a valid inequality can be formulated as a single convex optimization problem. (iii) It then shows that the tighter sets generated by Sherali and Adams [20] and Lovász and Schrijver [12] for the pure 0-1 linear case can also be generated in this more general setting. (iv) Finally, it shows how to lift valid inequalities generated at a node of the branch-and-bound tree to the entire tree when Lagrange multipliers are available.

We do not lose any generality by considering the MICP problem in the form (1), because a problem with a convex objective function (say g_0(x)) can be written in this form after introducing a continuous variable μ (minimize μ) and a constraint g_0(x) − μ ≤ 0. We take a linear objective function because for general convex functions the minimizer of the relaxation problem may lie in the strict interior of the feasible set, in which case we cannot add a valid inequality. It is shown later in this paper that it is possible to add inequalities if the problem is in the form (1).

MICP models arise in many applications including chemical process design [7], electrical power management [17], and financial modeling [19]. In the past, methods have been developed for solving special cases of the MICP problem. These methods fall into the following four categories: linearization, implicit enumeration, generalized Benders' decomposition [8], and outer-approximation [5,6]. The linearization methods [1,20] are applicable if the functions are multi-linear (see [9] for an excellent survey) where each term has at most one continuous variable. The outer-approximation method of Duran [5] and Duran and Grossmann [6] for mixed 0-1 problems solves integer convex programs that are linear in the binary variables. Duran [5] showed that at each iteration, the lower bound given by the relaxed master program in the outer-approximation method is greater than or equal to the lower bound obtained by the generalized Benders' decomposition method. Limited computational success was reported for the outer-approximation method on some chemical industry application problems [6,11].

This paper is organized as follows. Section 2 develops the theory for generating separating hyperplanes.
We show that the resulting separation problem reduces to a single convex optimization problem. Section 3 describes how to generate valid
inequalities that cut away a given solution. Section 4 provides representations of sets similar to those used by Sherali and Adams [20] and Lovász and Schrijver [12] for the convex case. In Section 5 we describe a branch-and-cut method and show that cuts at any node of the branch-and-bound tree can be lifted for the entire tree. We report our preliminary computational experience with this method on a few test problems in Section 6.
2. Convex hulls and minimum distance problems

2.1. Convex hulls

This section shows how to write the convex hulls with respect to the binary variables. Let

    C̄ = conv{x ∈ C | x_i ∈ {0, 1}, for i = 1, ..., p}

be the convex hull of the solution set for which the first p variables are binary. For a binary variable x_j, let

    C_j^0 = {x ∈ C | x_j = 0},  and  C_j^1 = {x ∈ C | x_j = 1}.        (2)

Let

    M_j(C) = {(x, u^0, u^1, λ_0, λ_1) | x = λ_0 u^0 + λ_1 u^1, λ_0 + λ_1 = 1,
              λ_0 ≥ 0, λ_1 ≥ 0, u^0 ∈ C_j^0, u^1 ∈ C_j^1}.        (3)

Let P(M_j(C)) be the projection of M_j(C) onto the x-space, i.e.,

    P(M_j(C)) ≡ {x | (x, u^0, u^1, λ_0, λ_1) ∈ M_j(C)}.        (4)

For simplicity denote P(M_j(C)) by P_j(C). The next proposition states that P_j(C) is the convex hull of the sets C_j^0 and C_j^1. The proof of this proposition follows from the definition of the convex hull.

Proposition 1. P_j(C) = conv(C_j^0, C_j^1) = conv(C ∩ {x_j ∈ {0, 1}}). ⊓⊔

Now, let

    C_{j1 j2}^{00} = {x ∈ C | x_{j1} = 0, x_{j2} = 0},  C_{j1 j2}^{01} = {x ∈ C | x_{j1} = 0, x_{j2} = 1},
    C_{j1 j2}^{10} = {x ∈ C | x_{j1} = 1, x_{j2} = 0},  C_{j1 j2}^{11} = {x ∈ C | x_{j1} = 1, x_{j2} = 1},

and

    M_{j1 j2}(C) = {(x, u^{00}, u^{01}, u^{10}, u^{11}, λ_{00}, λ_{01}, λ_{10}, λ_{11}) |
        x = λ_{00} u^{00} + λ_{01} u^{01} + λ_{10} u^{10} + λ_{11} u^{11},
        λ_{00} + λ_{01} + λ_{10} + λ_{11} = 1,  λ_{00}, λ_{01}, λ_{10}, λ_{11} ≥ 0,
        u^{00} ∈ C_{j1 j2}^{00}, u^{01} ∈ C_{j1 j2}^{01}, u^{10} ∈ C_{j1 j2}^{10}, u^{11} ∈ C_{j1 j2}^{11}}.
Let P_{j1 j2}(C) ≡ P(M_{j1 j2}(C)) be the projection of M_{j1 j2}(C) onto the x-space, i.e.,

    P_{j1 j2}(C) = {x | (x, u^{00}, u^{01}, u^{10}, u^{11}, λ_{00}, λ_{01}, λ_{10}, λ_{11}) ∈ M_{j1 j2}(C)}.

Also define M_{j1 j2 ... jl}(C) and P_{j1 j2 ... jl}(C) in a similar fashion. Clearly, P_{j1 j2}(C) = conv(C_{j1 j2}^{00}, C_{j1 j2}^{01}, C_{j1 j2}^{10}, C_{j1 j2}^{11}) = conv(C ∩ {x_{j1}, x_{j2} ∈ {0, 1}}). The next theorem states that the convex hull with respect to two or more binary variables can be written either directly as above, or in a recursive fashion.

Theorem 1. P_{j1 j2}(C) = P_{j1}(P_{j2}(C)), and more generally P_{j1 j2 ... jl}(C) = P_{j1}(P_{j2 j3 ... jl}(C)), j1 ≠ j2 ≠ ... ≠ jl. Furthermore, P_{12...p}(C) = P_1(P_{2...p}(C)) = C̄.

Proof. From its definition, the set P_{j1}(P_{j2}(C)) is given by

    {x | x = λ_0 u^0 + λ_1 u^1, λ_0 + λ_1 = 1, λ_0 ≥ 0, λ_1 ≥ 0,
         u^0 ∈ P_{j2}(C), (u^0)_{j1} = 0, u^1 ∈ P_{j2}(C), (u^1)_{j1} = 1},

which by expanding for P_{j2}(C) equals

    {x | x = λ̃_0 u^0 + λ̃_1 u^1, λ̃_0 + λ̃_1 = 1, λ̃_0 ≥ 0, λ̃_1 ≥ 0,
         u^0 = λ̃_{00} u^{00} + λ̃_{01} u^{01}, u^{00} ∈ C, u^{01} ∈ C, (u^{00})_{j2} = 0, (u^{01})_{j2} = 1, (u^0)_{j1} = 0,
         u^1 = λ̃_{10} u^{10} + λ̃_{11} u^{11}, u^{10} ∈ C, u^{11} ∈ C, (u^{10})_{j2} = 0, (u^{11})_{j2} = 1, (u^1)_{j1} = 1}.        (5)

In the construction of u^0, if λ̃_{00} > 0, then (u^{00})_{j1} = 0; if λ̃_{01} > 0, then (u^{01})_{j1} = 0; and if λ̃_{00} > 0 and λ̃_{01} > 0, then (u^{00})_{j1} = 0 and (u^{01})_{j1} = 0. Hence, the set

    {u^0 = λ̃_{00} u^{00} + λ̃_{01} u^{01}, u^{00} ∈ C, (u^{00})_{j2} = 0, u^{01} ∈ C, (u^{01})_{j2} = 1, (u^0)_{j1} = 0}

can also be written as

    {u^0 = λ̃_{00} u^{00} + λ̃_{01} u^{01}, u^{00} ∈ C, (u^{00})_{j2} = 0, (u^{00})_{j1} = 0, u^{01} ∈ C, (u^{01})_{j2} = 1, (u^{01})_{j1} = 0}.

By using a similar argument for u^1 in (5), we can write it as

    {x | x = λ̃_0 (λ̃_{00} u^{00} + λ̃_{01} u^{01}) + λ̃_1 (λ̃_{10} u^{10} + λ̃_{11} u^{11}),
         u^{00} ∈ C, (u^{00})_{j2} = 0, (u^{00})_{j1} = 0,  u^{01} ∈ C, (u^{01})_{j2} = 1, (u^{01})_{j1} = 0,
         u^{10} ∈ C, (u^{10})_{j2} = 0, (u^{10})_{j1} = 1,  u^{11} ∈ C, (u^{11})_{j2} = 1, (u^{11})_{j1} = 1,
         λ̃_0 + λ̃_1 = 1, λ̃_0 ≥ 0, λ̃_1 ≥ 0,
         λ̃_{00} + λ̃_{01} = 1, λ̃_{00} ≥ 0, λ̃_{01} ≥ 0,
         λ̃_{10} + λ̃_{11} = 1, λ̃_{10} ≥ 0, λ̃_{11} ≥ 0}.

It is now easy to verify that the above set is equal to P_{j1 j2}(C). The statements that P_{j1 j2 ... jl}(C) = P_{j1}(P_{j2 ... jl}(C)), j1 ≠ j2 ≠ ... ≠ jl, and P_{12...p}(C) = P_1(P_{2...p}(C)) = C̄ can be proved by induction using similar arguments. ⊓⊔
2.2. Minimum distance problem

Let x̄ be a solution of the current continuous relaxation of (1). If any of the first p components of x̄ are not in {0, 1}, then we want to add a valid inequality to the current continuous relaxation such that this inequality is violated by x̄. This inequality need not be linear. However, if we consider generating linear inequalities, then the problem can be stated as that of finding a separating hyperplane that separates x̄ from the set P_j(C), or more generally from the set P_{j1 j2 ... jl}(C). With this as a motivation, let us consider the following minimum distance problem that minimizes the distance, measured by some norm, from x̄ to a point in P_j(C):

    min_{x ∈ P_j(C)}  f(x) ≡ ‖x − x̄‖.        (6)

This problem has a solution because, from Assumptions A1 and A2, it follows that the set C is compact. It is straightforward to verify that a solution of (6) is given by the x components of a solution of

    min_{(x, u^0, u^1, λ_0, λ_1) ∈ M_j(C)}  f(x).        (7)
The difficulty in solving (7) is that the equality constraints defining x are not linear.
2.3. Convex reformulation

We now show that by a transformation of variables, the constraints defining M_j(C) can be represented by a set of convex constraints. This allows the separation problem to be solved directly by solving a single convex program. First we define a new function related to a given convex function g_i(x) : C ⊂ ℝ^n → ℝ. Let

    h_i(x̃, λ) ≡ { λ g_i(x̃/λ)  for x̃/λ ∈ C, λ > 0,        (8)
                  0            for x̃ = 0, λ = 0,

and let x̃ = λx. Now consider the set

    C̃ = {(x̃, λ) | h_i(x̃, λ) ≤ 0, for i = 1, ..., m,
                   0 ≤ x̃_i ≤ λ, for i = 1, ..., p,        (9)
                   −λL ≤ x̃_i ≤ λL, for i = p + 1, ..., n,
                   0 ≤ λ ≤ 1}.

Note that C̃ is obtained from C by using the above transformation of variables and multiplying each of the constraints defining C by λ. Assumption A1 ensures that x̃ → 0 as λ → 0, and Assumption A2 ensures that h_i(x̃, λ) → 0 as λ → 0. Therefore, we can define h(0, 0) = 0. The only functions defining C̃ that are not obviously convex are those of the form h(x̃, λ). The following lemma shows that h(x̃, λ) is a convex function over C̃.
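As a concrete illustration (ours, not from the paper): for the single convex constraint g(x) = x² − 1 ≤ 0 on ℝ, the transformation (8) gives

    h(x̃, λ) = λ ((x̃/λ)² − 1) = x̃²/λ − λ,  with h(0, 0) = 0,

which is the perspective of g. The constraint h(x̃, λ) ≤ 0, i.e. x̃² ≤ λ², together with λ ≥ 0 is the second-order cone constraint |x̃| ≤ λ, and is therefore convex.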
Lemma 1. Let h(x̃, λ) : C̃ → ℝ be defined as in (8). Then h(x̃, λ) is convex over C̃, and h(x̃, λ) ≥ −L for all (x̃, λ) ∈ C̃.

Proof. Let (v^1, λ^1) and (v^2, λ^2) be any two points in the set C̃, and let 0 ≤ α ≤ 1. First assume that λ^1 > 0 and λ^2 > 0. We have

    h(αv^1 + (1−α)v^2, αλ^1 + (1−α)λ^2)
      = (αλ^1 + (1−α)λ^2) g( (αv^1 + (1−α)v^2) / (αλ^1 + (1−α)λ^2) )
      = (αλ^1 + (1−α)λ^2) g( [αλ^1 / (αλ^1 + (1−α)λ^2)] (v^1/λ^1) + [(1−α)λ^2 / (αλ^1 + (1−α)λ^2)] (v^2/λ^2) )
      ≤ (αλ^1 + (1−α)λ^2) [ (αλ^1 / (αλ^1 + (1−α)λ^2)) g(v^1/λ^1) + ((1−α)λ^2 / (αλ^1 + (1−α)λ^2)) g(v^2/λ^2) ]
      = αλ^1 g(v^1/λ^1) + (1−α)λ^2 g(v^2/λ^2)
      = α h(v^1, λ^1) + (1−α) h(v^2, λ^2).

The inequality above follows from the convexity of g(x). Now if λ^1 = λ^2 = 0, then there is nothing to prove (since v^1 and v^2 must also be zero and h(0, 0) = 0). Therefore, without loss of generality assume that λ^1 = 0 and λ^2 > 0. Then,

    h(αv^1 + (1−α)v^2, αλ^1 + (1−α)λ^2) = (1−α)λ^2 g(v^2/λ^2) = α h(0, 0) + (1−α) h(v^2, λ^2).

The boundedness of h(x̃, λ) follows because 0 ≤ λ ≤ 1 and g_i(x̃/λ) ≥ −L over C. ⊓⊔

Let

    C̃_j^0 = {(x̃, λ) ∈ C̃ | x̃_j = 0},  C̃_j^1 = {(x̃, λ) ∈ C̃ | x̃_j = λ},        (10)

let v^0 = λ_0 u^0 and v^1 = λ_1 u^1, and consider the following set:

    M̃_j(C) = {(x, v^0, v^1, λ_0, λ_1) | x = v^0 + v^1, (v^0, λ_0) ∈ C̃_j^0, (v^1, λ_1) ∈ C̃_j^1,
               λ_0 + λ_1 = 1, λ_0 ≥ 0, λ_1 ≥ 0}.        (11)
Note that, since C is a compact set, it can be shown that C̃ and the set M̃_j(C) are also compact. The following theorem formally states that solving the minimum distance problem over (11) is equivalent to solving (7).

Theorem 2. Let P_j(C) be defined as in (4), w ≡ (x, v^0, v^1, λ_0, λ_1), and f(w) ≡ ‖x − x̄‖. Then,

    P_j(C) = {x | w ∈ M̃_j(C)}.
Furthermore, x* is an optimal solution of (6) if and only if the problem

    min_{w ∈ M̃_j(C)}  f(w)        (12)

has a solution whose x components are x*. ⊓⊔
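For readers who want to experiment, the following is a minimal sketch (ours, not from the paper) of the separation problem (12) in Python with cvxpy. It assumes, for concreteness, that every g_i is a convex quadratic g_i(x) = ‖L_i x‖² + a_iᵀx + b_i, so that its transform (8) is expressible with quad_over_lin; all names (separation_cut, L_list, bigL, ...) are illustrative. The subgradient-based cut it returns anticipates Theorem 3 below.

    import numpy as np
    import cvxpy as cp

    def separation_cut(L_list, a_list, b_list, x_bar, j, p, n, bigL=1e3):
        """Solve min ||x - x_bar||_2^2 over M~_j(C); return (x_hat, xi)."""
        x = cp.Variable(n)
        v0, v1 = cp.Variable(n), cp.Variable(n)
        lam0, lam1 = cp.Variable(), cp.Variable()

        def tilde_C(v, lam):  # the constraints (9) in the transformed variables
            cons = [v[:p] >= 0, v[:p] <= lam,            # 0 <= v_i <= lam, i <= p
                    v[p:] >= -lam * bigL, v[p:] <= lam * bigL,
                    lam >= 0, lam <= 1]
            for Li, ai, bi in zip(L_list, a_list, b_list):
                # perspective of ||L_i u||^2 + a_i^T u + b_i, cf. (8);
                # conic solvers handle lam = 0 via the SOC reformulation
                cons.append(cp.quad_over_lin(Li @ v, lam) + ai @ v + bi * lam <= 0)
            return cons

        cons = tilde_C(v0, lam0) + tilde_C(v1, lam1)
        cons += [x == v0 + v1, lam0 + lam1 == 1,
                 v0[j] == 0,           # (v0, lam0) in C~_j^0
                 v1[j] == lam1]        # (v1, lam1) in C~_j^1
        cp.Problem(cp.Minimize(cp.sum_squares(x - x_bar)), cons).solve()
        x_hat = x.value
        xi = 2.0 * (x_hat - x_bar)     # gradient of f at x_hat (cf. Theorem 3)
        return x_hat, xi               # cut xi^T (x - x_hat) >= 0 cuts away x_bar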
3. Separating inequalities

Next we show how a cut can be generated once an optimal solution of the minimum distance problem (6) is obtained. Since our feasible set is compact and the objective function is continuous, we know that an optimal solution of (6) exists, and from Theorem 2 an optimal solution of (12) also exists. The following theorem is on the existence of cuts. It does not assume knowledge of Lagrange multipliers at the optimal solution of (6).

Theorem 3. Let x̄ ∉ P_j(C), and let x̂ be an optimal solution of (6). Then there exists a ξ such that

    ξᵀ(x − x̂) ≥ 0        (13)

is a valid linear inequality in x which cuts away x̄. Furthermore, ξ is a subgradient of f(x) at x̂.

Proof. Since x̂ is an optimal solution, there exists a ξ such that ξᵀ(x − x̂) ≥ 0 for all x ∈ P_j(C) (Theorem 3.4.3 [4]). Furthermore, ξ is a subgradient of f(x) at x̂. From the convexity of f(x) we know that ξᵀ(x − x̂) ≤ f(x) − f(x̂). By letting x = x̄, we get ξᵀ(x̄ − x̂) ≤ −‖x̄ − x̂‖ < 0; hence x̄ does not satisfy (13). ⊓⊔

Clearly, if the subdifferential set of f(x) at x̂ has only one element, then ξ in the above theorem is this element. This is the case if f(x) is differentiable at x̂. For example, if f(x) = ‖x − x̄‖₂², then ξ = ∇f(x̂) = 2(x̂ − x̄). We now give results that can be used to compute a subgradient defining a cut if f(x) is not differentiable. The two cases of particular importance to us are: (i) f(x) = ‖x − x̄‖₁ and (ii) f(x) = ‖x − x̄‖_∞. However, we start with the most general situation first.

For a nonlinear inequality constraint h_i(w) ≤ 0 defining M̃_j(C), let

    ∂h_i(ŵ) ≡ {ξ̂_i | h_i(w) ≥ h_i(ŵ) + ξ̂_iᵀ(w − ŵ), for all w ∈ M̃_j(C)},        (14)

and call this set the subdifferential set of h_i over M̃_j(C) (subdifferential set for short). In order to simplify notation we also use ∂h_i(ŵ) to represent the gradient of a linear inequality constraint. Let ∂f(ŵ) be the subdifferential set of f defined in the standard way as

    ∂f(ŵ) ≡ {ξ̂ | f(w) ≥ f(ŵ) + ξ̂ᵀ(w − ŵ), for all w}.
Theorem 4. Consider the minimum distance problem (12) and assume that ŵ is a feasible solution of (12). Let A be the set of active inequality constraints at ŵ, and let J be the Jacobian of the linear equality constraints in (12). Assume that the system

    ξ̂ + Σ_{i∈A} μ_i ξ̂_i + Jᵀπ = 0,
    μ_i ≥ 0, i ∈ A,        (15)
    ξ̂ ∈ ∂f(ŵ),  ξ̂_i ∈ ∂h_i(ŵ), i ∈ A,

has a feasible solution. Then ŵ is an optimal solution, and ξ in Theorem 3 can be obtained from ξ̂ = (ξ, 0).

Proof. Assume that (15) has a feasible solution and ŵ is not optimal. Then, for some w ∈ M̃_j(C), f(ŵ) + ξ̂ᵀ(w − ŵ) ≤ f(w) < f(ŵ); hence ξ̂ᵀ(w − ŵ) < 0. Similarly, for the same w, ξ̂_iᵀ(w − ŵ) ≤ 0 and J(w − ŵ) = 0. Hence, the system ξ̂ᵀd < 0, ξ̂_iᵀd ≤ 0, Jd = 0 has a solution for any ξ̂ ∈ ∂f(ŵ) and any ξ̂_i ∈ ∂h_i(ŵ), which from Farkas' lemma implies that (15) does not have a solution, reaching a contradiction. The convexity of h_i(w) implies that h_i(w) − h_i(ŵ) ≥ ξ̂_iᵀ(w − ŵ) for all w ∈ M̃_j(C). Now, for w ∈ M̃_j(C), h_i(w) ≤ 0, and for i ∈ A, h_i(ŵ) = 0. Hence, 0 ≥ ξ̂_iᵀ(w − ŵ); therefore, for μ_i ≥ 0,

    Σ_{i∈A} μ_i ξ̂_iᵀ(w − ŵ) ≤ 0.

Now, since J(w − ŵ) = 0, and the only non-zero elements of ξ̂ correspond to x (because f(w) involves only the x variables), we have ξᵀ(x − x̂) ≥ 0. The remainder of the proof is similar to the proof of Theorem 3. ⊓⊔

The existence of Lagrange multipliers can be shown under the constraint qualification condition that the cone in (15) is equal to the normal cone of M̃_j(C) at ŵ (for example, see Theorems VII.1.1.1 and VII.2.1.4 in [10]). The following proposition gives a way of numerically computing ∂h_i(w) for the nonlinear convex constraints.

Proposition 2. Let ∂h_i(w) be defined as in (14), and let g_i(u) represent the corresponding constraint in M_j(C); here w = (u, λ). Let ∂g_i(u) denote the subdifferential set of g_i(u) at u. For λ > 0, we have (ξ, g(u) − ξᵀu) ∈ ∂h(λu, λ), where ξ ∈ ∂g(u). For λ = 0, we have (ξ, g(0)) ∈ ∂h(0, 0), ξ ∈ ∂g(0).

Proof. The proof follows by direct substitution in the definition of ∂h(u, λ). ⊓⊔
A subset of ∂h_i(w) is given by taking the convex hull of the vectors in the above proposition (appending 0 for the variables that do not appear in h(x̃, λ)). As a consequence, a subset of the cone in (15) is available for computing Lagrange multipliers. The computation of ξ from Theorem 4 is still not very convenient, since it requires the solution of a subproblem that may not be easily solvable. If the constraint functions g_i(u) are differentiable, then ∂h(w) as given in Proposition 2 has only one computable element.
Furthermore, if f(w) is either the 1-norm function or the ∞-norm function, then ∂f(ŵ) is a polytope. Therefore, in this case ξ can be obtained by solving the following feasibility linear program:

    ξ̂ + Σ_{i∈A} μ_i ∂h_i(ŵ) + Jᵀπ = 0,
    μ_i ≥ 0,  ξ̂ = Σ_{i=1}^k λ_i ξ^i,        (16)
    λ_i ≥ 0,  Σ_{i=1}^k λ_i = 1,

where ξ^i are the extreme points of ∂f(w) at ŵ. All the extreme points of ∂f(w) at ŵ are easily computable if f(w) is a 1-norm or ∞-norm function.

The discussion thus far assumed that the optimal solution of (6) or (12) is computed by an algorithm which does not necessarily compute the Lagrange multipliers corresponding to the constraints. The discussion in such generality was necessary to include non-differentiable convex functions and algorithms that do not necessarily compute the Lagrange multipliers. In the case where an algorithm for the minimum distance problem (12) gives Lagrange multipliers, the computation of ξ is straightforward, since we can use the Lagrange multipliers in (15). In such cases the 1-norm and the ∞-norm objective functions can be written as linear functions by introducing additional variables and/or linear constraints using standard techniques, and ξ can be recovered from the solution of the problem thus obtained.

The next theorem gives a linear projection cone available to us at the solution of the minimum distance problem (12). This projection cone can be used to identify additional cuts by considering linear programming subproblems. The linear projection cone is obtained by taking the linear approximation of the constraints in M̃_j(C) at the optimal solution of the separation problem.

Theorem 5. Let the assumptions of Theorem 4 hold, and let J = [J_x | J̄_x] be the partition of J such that the columns in J_x correspond to the variables x and J̄_x has the remaining columns. Let H be the matrix whose rows are ξ̂_iᵀ, ξ̂_i ∈ ∂h_i(ŵ), i ∈ A, and partition H = [H_x | H̄_x] following the partition of J. Let b be the right-hand-side vector corresponding to the linear equality constraints, and let c be a vector such that c_i = ξ̂_iᵀŵ. Then,

    πᵀJ_x x + μᵀH_x x ≤ πᵀb + μᵀc,

where

    (π, μ) ∈ {(π, μ) | J̄_xᵀπ + H̄_xᵀμ = 0, μ ≥ 0},

and π and μ are vectors of appropriate lengths, gives a valid linear inequality for P_j(C). ⊓⊔

We also note that once the separation problem is solved, linear approximations of the sets C_j^0 and C_j^1 can be taken at û^0 and û^1, respectively, and one can use these linear approximations to lift with respect to other fractional variables to obtain additional cuts cheaply by solving linear programs.
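A hedged sketch of how one might solve the feasibility system (16) numerically (our illustration; solve_multiplier_lp and all argument names are hypothetical). It assumes differentiable g_i, so each ∂h_i(ŵ) contributes a single gradient row, and it phrases (16) as a linear program with a zero objective over the variables (λ, μ, π).

    import numpy as np
    from scipy.optimize import linprog

    def solve_multiplier_lp(Xi, Dh, J):
        """Xi: k x N matrix whose rows are the extreme points xi^i of df at w_hat;
        Dh: |A| x N matrix whose rows are the gradients of the active h_i at w_hat;
        J:  q x N Jacobian of the linear equality constraints.
        Solves (16): Xi^T lam + Dh^T mu + J^T pi = 0, sum(lam) = 1, lam, mu >= 0."""
        k, N = Xi.shape
        nA, q = Dh.shape[0], J.shape[0]
        # variable order: lam (k, >= 0), mu (nA, >= 0), pi (q, free)
        A_eq = np.zeros((N + 1, k + nA + q))
        A_eq[:N, :k] = Xi.T
        A_eq[:N, k:k + nA] = Dh.T
        A_eq[:N, k + nA:] = J.T
        A_eq[N, :k] = 1.0                       # sum_i lam_i = 1
        b_eq = np.zeros(N + 1)
        b_eq[N] = 1.0
        bounds = [(0, None)] * (k + nA) + [(None, None)] * q
        res = linprog(c=np.zeros(k + nA + q), A_eq=A_eq, b_eq=b_eq, bounds=bounds)
        if not res.success:
            return None                         # (16) infeasible at this w_hat
        xi_hat = Xi.T @ res.x[:k]               # xi_hat = sum_i lam_i xi^i
        return xi_hat                           # its x components give xi, cf. xi_hat = (xi, 0)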
4. Representation of tighter relaxations

4.1. Relationship with the linear case

In this section we discuss the relationship of our results with those for the linear case. Consider the set

    K = {x ∈ ℝ^n | Ãx − b̃ ≤ 0},        (17)

where the bound constraints on the variables are included among the rows of Ãx − b̃ ≤ 0. For a binary variable x_j, Balas, Ceria, and Cornuéjols [2] multiply (17) by x_j and (1 − x_j) to obtain the nonlinear system

    (1 − x_j)(Ãx − b̃) ≤ 0,
    x_j(Ãx − b̃) ≤ 0.        (18)

They linearize (18) by substituting y_i for x_i x_j, i = 1, ..., n, i ≠ j, and x_j for x_j². After making the substitutions we get

    {(x, y) | y_j = x_j,  Ã(x − y) − (1 − x_j)b̃ ≤ 0,  Ãy − x_j b̃ ≤ 0}.        (19)

It can be shown that for the linear case the description of the convex hull in Section 2 reduces to (19). However, it is interesting to note that for general convex problems, the convex hulls constructed in Section 2 cannot be constructed by simply multiplying each constraint with x_j and 1 − x_j and linearizing these constraints. It can also be shown that computationally the most successful method in Balas, Ceria, and Cornuéjols [2,3] corresponds to the ∞-norm minimum distance problem.

Sherali and Adams [20,21] and Lovász and Schrijver [12] have given the following tighter set by lifting the linear constraints into an even higher dimensional space. They multiply (17) by x_j and (1 − x_j) for j = 1, ..., p to obtain the nonlinear system

    (1 − x_1)(Ãx − b̃) ≤ 0,
    x_1(Ãx − b̃) ≤ 0,
    ⋮        (20)
    (1 − x_p)(Ãx − b̃) ≤ 0,
    x_p(Ãx − b̃) ≤ 0.

They then linearize (20) by substituting y_ij for x_i x_j, setting y_ij = y_ji for i = 1, ..., n, j = 1, ..., p, i ≠ j, and substituting x_j for x_j², j = 1, ..., p.
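As a small worked illustration (ours): for a single row a₁x₁ + a₂x₂ ≤ b and j = 1, multiplying by x₁ and substituting x₁² = x₁, y₁₂ = x₁x₂ yields the linear row

    a₁x₁ + a₂y₁₂ ≤ b x₁,

while multiplying by (1 − x₁) yields

    a₂(x₂ − y₁₂) ≤ b(1 − x₁);

together these are exactly the rows of (19) for this constraint.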
4.2. Generalizing the Sherali and Adams [20] and Lovász and Schrijver [12] constructions

We now describe a way of generating sets for MICP that specialize to the Sherali and Adams [20,21] and Lovász and Schrijver [12] sets for the linear case. Consider the following set:

    M̂(C) = {(x, u^{k0}, u^{k1}, λ^{k0}, λ^{k1}, k = 1, ..., p) |
        x = λ^{k0} u^{k0} + λ^{k1} u^{k1},  u^{k0} ∈ C_k^0, u^{k1} ∈ C_k^1,
        λ^{k0} + λ^{k1} = 1,  λ^{k0} ≥ 0, λ^{k1} ≥ 0,        (21)
        (λ^{k1} u^{k1})_j = (λ^{j1} u^{j1})_k,  for k = 1, ..., p, j = 2, ..., p, j > k},

where C_k^0, C_k^1 are defined in (2). Also, let

    N(C) = {x | (x, u^{k0}, u^{k1}, λ^{k0}, λ^{k1}, k = 1, ..., p) ∈ M̂(C)}        (22)

be the projection of M̂(C) onto the x-space. The following theorem is a generalization of results in Sherali and Adams [20] and Lovász and Schrijver [12]. Statement (ii) in the theorem shows that the last set of equality constraints in (21) are valid equality constraints. Statement (iii) shows that the set C̄ is obtained if the above lifting (followed by projection) is done p times. We leave the proof of this theorem to the reader.

Theorem 6. Let N^1(C) = N(C) and N^t(C) = N(N^{t−1}(C)), for t ≥ 2. Then,
(i) N(C) ⊆ ∩_{j=1}^p conv(C ∩ {x_j ∈ {0, 1}}),
(ii) C̄ ⊆ N(C),
(iii) N^p(C) = C̄. ⊓⊔
Note that we can apply the transformation of variables v^{k0} = λ^{k0} u^{k0} and v^{k1} = λ^{k1} u^{k1}, for k = 1, ..., p, to (21) to obtain a convex program similar to (12) defining a minimum distance problem. In particular, we have the set

    M̂(C̃) = {(x, v^{k0}, v^{k1}, λ^{k0}, λ^{k1}, k = 1, ..., p) |
        x = v^{k0} + v^{k1},  (v^{k0}, λ^{k0}) ∈ C̃_k^0, (v^{k1}, λ^{k1}) ∈ C̃_k^1,
        λ^{k0} + λ^{k1} = 1,  λ^{k0} ≥ 0, λ^{k1} ≥ 0,        (23)
        (v^{k1})_j = (v^{j1})_k,  for k = 1, ..., p, j = 2, ..., p, j > k}.

The projection of M̂(C̃) onto the x-space is also N(C). The following theorem shows that a semidefinite matrix inequality constraint can also be added to the constraint set in (23) to generate an even tighter region, as proposed by Lovász and Schrijver [12] for 0-1 linear problems.
Theorem 7. Let V_jk = (v^{k1})_j = (v^{j1})_k, k = 1, ..., p, j = 1, ..., p, let V = [V_jk], and partition x = (x_B, x_R), where B = {1, ..., p} and R = {p + 1, ..., n}. Then the inequality

    V − x_B x_Bᵀ ⪰ 0

is a valid convex inequality for C̄ which can be added to the set of constraints in (23). Here ⪰ 0 means that the matrix is positive semidefinite.

Proof. First note that the set of constraints in (23) implies that if x_k ∈ {0, 1}, then v^{k0} = x(1 − x_k) and v^{k1} = x x_k. Hence, V − x_B x_Bᵀ = 0 is a valid constraint for C̄. The constraint V − x_B x_Bᵀ ⪰ 0 is simply a relaxation of this constraint. Furthermore, it is equivalent to the constraint

    [ V     x_B ]
    [ x_Bᵀ   1  ]  ⪰ 0.

This inequality gives a convex region because the set of positive semidefinite matrices is a convex set. ⊓⊔

After adding the semidefinite constraint to the constraint set defining M̂(C̃), we have a theorem similar to Theorem 6.
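A minimal sketch (ours) of how the semidefinite constraint of Theorem 7 can be imposed in cvxpy, assuming the lifted variables V and x_B of (23) have already been modeled; the Schur-complement form below matches the equivalent matrix inequality in the proof, and the dimension p is illustrative.

    import numpy as np
    import cvxpy as cp

    p = 4                                    # illustrative number of binary variables
    V = cp.Variable((p, p), symmetric=True)  # V_jk = (v^{k1})_j = (v^{j1})_k
    xB = cp.Variable((p, 1))                 # binary block x_B of x
    # Schur-complement form of V - x_B x_B^T >= 0 (positive semidefinite):
    ls_constraint = cp.bmat([[V, xB], [xB.T, np.ones((1, 1))]]) >> 0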
5. A branch-and-cut algorithm

We now describe a branch-and-cut algorithm that can be used to solve MICP. The description here is analogous to the description of branch-and-cut methods for mixed 0-1 integer linear programming problems [3]. Let F0 be the set of binary variables fixed at zero, and F1 be the set of binary variables fixed at one.

Branch-and-cut procedure for MICP
Input: c, g_i(x), i = 1, ..., m, p.

1. Initialization: Set S = {(F0 = ∅, F1 = ∅)} and UB = ∞.
2. Node selection: If S = ∅, stop; otherwise choose an element (F0, F1) ∈ S and remove it from S.
3. Lower bounding step: Solve the convex program

    CP(F0, F1):  minimize   cᵀx
                 subject to g_i(x) ≤ 0,  i = 1, ..., m,
                            0 ≤ x_i ≤ 1, i = 1, ..., p,        (24)
                            x_i = 0,     i ∈ F0,
                            x_i = 1,     i ∈ F1.

If the problem is infeasible, go to Step 2; otherwise let x̄ denote an optimal solution of CP(F0, F1). If cᵀx̄ ≥ UB, go to Step 2. If x̄_j ∈ {0, 1}, j = 1, ..., p, let x* = x̄, UB = cᵀx̄, and go to Step 2.
4. Branching or cutting: If more cutting planes should be generated at this node, go to Step 5; else go to Step 6.
5. Cut generation: Pick an index j such that 0 < x̄_j < 1, j ∈ {1, ..., p}, and generate a separating inequality ξᵀ(x − x̂) ≥ 0 valid for CP(F0, F1) by solving

    min_{w ∈ M̃_j(C)} f(w),  x_i = 0 for i ∈ F0,  x_i = 1 for i ∈ F1,

where M̃_j(C) is defined in (11) and w, f(w) are defined in Theorem 2. If we find a valid inequality, make this inequality valid for MICP and go to Step 3; otherwise go to Step 6.
6. Branching step: Pick an index j ∈ {1, ..., p} such that 0 < x̄_j < 1. Generate the subproblems corresponding to (F0 ∪ {j}, F1) and (F0, F1 ∪ {j}), calculate their lower bounds, and add them to S. Go to Step 2.

A sketch of the overall loop in code follows.
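This is our compact sketch of the procedure above, with hypothetical helper names: solve_relaxation(F0, F1, cuts) is assumed to solve CP(F0, F1) with the accumulated cuts, returning (x̄, cᵀx̄) or (None, inf) if infeasible, and separation_cut is a routine of the kind sketched after Theorem 2; cut management and tolerances are deliberately simplified, and cuts are assumed lifted so they stay globally valid.

    import math

    def branch_and_cut(solve_relaxation, separation_cut, p, cut_rounds=3, tol=1e-6):
        """Skeleton of the branch-and-cut procedure for MICP (Steps 1-6)."""
        S = [(frozenset(), frozenset())]         # Step 1: root node, nothing fixed
        UB, x_star, cuts = math.inf, None, []
        while S:                                 # Step 2: node selection (depth-first)
            F0, F1 = S.pop()
            x_bar, obj = solve_relaxation(F0, F1, cuts)     # Step 3: solve CP(F0, F1)
            rounds = 0
            while x_bar is not None and obj < UB:           # fathom if infeasible/bounded
                frac = [j for j in range(p) if tol < x_bar[j] < 1.0 - tol]
                if not frac:                     # integer feasible: new incumbent
                    UB, x_star = obj, x_bar
                    break
                j = min(frac, key=lambda i: abs(x_bar[i] - 0.5))  # most fractional
                if rounds >= cut_rounds:         # Steps 4 and 6: stop cutting, branch
                    S.append((F0 | {j}, F1))
                    S.append((F0, F1 | {j}))
                    break
                cut = separation_cut(x_bar, j, F0, F1)      # Step 5: solve (12)
                if cut is None:                  # no violated cut found: branch next pass
                    rounds = cut_rounds
                    continue
                cuts.append(cut)
                x_bar, obj = solve_relaxation(F0, F1, cuts) # re-solve with the new cut
                rounds += 1
        return x_star, UB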
All of the steps in the above algorithm are completely defined except for Step 5. We note that a valid inequality in Step 5 will be found if the algorithm used in Step 3 generates an extreme point of the set of optimal solutions. We now describe how an inequality valid for the current node of the branch-and-bound tree can be made valid for the entire branch-and-bound tree. This process is called lifting. The lifting procedure here extends the lifting procedure in [2]. In this discussion we assume that Lagrange multipliers at the optimal solution of the minimum distance problem are known from (16). Recall that (16) assumes that ∂h_i(w) has only one computable element.

Let F = {1, ..., n} \ (F0 ∪ F1) denote the set of variables that are not yet fixed at node (F0, F1). We assume, without loss of generality, that F1 = ∅, since if F1 ≠ ∅ then each variable x_k, k ∈ F1, can be replaced by y_k = 1 − x_k. Note that this substitution retains convexity of the constraint functions in the new variables. Let x̄^F be the solution of the continuous relaxation at the current node, and let x̄_j^F be fractional. A valid inequality is found by solving the minimum distance problem

    min  ‖x^F − x̄^F‖
    s.t. x^F − v_0^F − v_1^F = 0,
         (v_0^F)_j = 0,  (v_1^F)_j − λ_1^F = 0,
         λ_0^F + λ_1^F = 1,  λ_0^F ≥ 0,  λ_1^F ≥ 0,        (25)
         h_i(v_0^F, λ_0^F) ≤ 0,  i = 1, ..., m + 2n + 1,
         h_i(v_1^F, λ_1^F) ≤ 0,  i = m + 2n + 2, ..., 2(m + 2n + 1).
Let x̂^F be an optimal solution of (25). Let A_0^F and A_1^F be the index sets of the constraints h_i(v_0^F, λ_0^F) and h_i(v_1^F, λ_1^F) active at x̂^F, respectively. Let [H_0^F : H_0^λ] and [H_1^F : H_1^λ] be the matrices whose rows are ∂h_i(v_0^F, λ_0^F)ᵀ, i ∈ A_0^F, and ∂h_i(v_1^F, λ_1^F)ᵀ, i ∈ A_1^F, respectively. Let (π_x^F, γ_λ^F, γ_0^F, γ_1^F) be the Lagrange multipliers for the equality constraints and (μ_0^F, μ_1^F) be the Lagrange multipliers corresponding to the inequality constraints in A_0^F and A_1^F, respectively. The multipliers computed from (16) give (π_x^F, γ_λ^F, γ_0^F, γ_1^F)
and (μ_0^F, μ_1^F) such that

    −π_x^F + (H_0^F)ᵀ μ_0^F + γ_0^F e_j = 0,
    −π_x^F + (H_1^F)ᵀ μ_1^F + γ_1^F e_j = 0,
    (H_0^λ)ᵀ μ_0^F + γ_λ^F = 0,        (26)
    (H_1^λ)ᵀ μ_1^F + γ_λ^F − γ_1^F = 0,
    μ_0^F ≥ 0,  μ_1^F ≥ 0,

and ξ^F = −π_x^F. Here e_j is the jth column of the identity matrix. A valid inequality is given by

    (π_x^F)ᵀ x^F ≤ (π_x^F)ᵀ x̂^F.        (27)

The cut (27) is valid for the current node. Note that in (27) we have used the special form of the equality constraints. The following theorem shows how this cut can be made valid for the entire branch-and-bound tree.

Theorem 8. Let x̂ = (x̂^F, 0), v̂_0 = (v̂_0^F, 0), v̂_1 = (v̂_1^F, 0), λ̂_0 = λ̂_0^F, λ̂_1 = λ̂_1^F, and ŵ = (x̂, v̂_0, v̂_1, λ̂_0, λ̂_1). Let A_0 and A_1 be the index sets of the constraints h_i(v_0, λ_0) and h_i(v_1, λ_1) active at ŵ. Let [H_0 : H_0^λ] and [H_1 : H_1^λ] be matrices whose rows are ∂h_i(v_0, λ_0)ᵀ, i ∈ A_0, and ∂h_i(v_1, λ_1)ᵀ, i ∈ A_1, respectively. Let us consider the following system of equalities and inequalities:

    −π + H_0ᵀ μ_0 + γ_0 e_j ≥ 0,
    −π + H_1ᵀ μ_1 + γ_1 e_j ≥ 0,
    (H_0^λ)ᵀ μ_0 + γ_λ = 0,        (28)
    (H_1^λ)ᵀ μ_1 + γ_λ − γ_1 = 0,
    μ_0 ≥ 0,  μ_1 ≥ 0.

A feasible solution of (28) can be constructed as follows: μ_0 = (μ_0^F, 0), μ_1 = (μ_1^F, 0), γ_λ = γ_λ^F, γ_0 = γ_0^F, γ_1 = γ_1^F, π_i = π_i^F for i ∈ F, and π_i = min{e_iᵀH_0ᵀμ_0, e_iᵀH_1ᵀμ_1} for i ∉ F. Furthermore, πᵀx ≤ πᵀx̂ gives an inequality which is valid for the entire branch-and-bound tree.

Proof. First observe that H_0^F e_i = H_0 e_i for i ∈ F because of our assumption that ∂h_i(w) has only one computable element. The feasibility of (μ_0, μ_1, γ_λ, γ_0, γ_1) can be verified by direct substitution. Now consider the following linear approximation of the constraint set of (11) at (x̂, v̂_0, v̂_1, λ̂_0, λ̂_1):

    L_j(x̂) ≡ {x | x − v_0 − v_1 = 0,
                  H_0 v_0 + H_0^λ λ_0 ≤ H_0 v̂_0 + H_0^λ λ̂_0,
                  H_1 v_1 + H_1^λ λ_1 ≤ H_1 v̂_1 + H_1^λ λ̂_1,        (29)
                  λ_0 + λ_1 = 1, λ_0 ≥ 0, λ_1 ≥ 0,
                  (v_0)_j = 0, (v_1)_j − λ_1 = 0}.
Clearly, P_j(C) ⊆ L_j(x̂). Therefore, a valid linear inequality for (29) is also valid for P_j(C). Now, using the feasible solution of (28), we have

    πᵀx + (−π + H_0ᵀμ_0 + γ_0 e_j)ᵀ v_0 + (−π + H_1ᵀμ_1 + γ_1 e_j)ᵀ v_1
        + (μ_0ᵀH_0^λ + γ_λ) λ_0 + (μ_1ᵀH_1^λ + γ_λ − γ_1) λ_1
      ≤ μ_0ᵀH_0 v̂_0 + μ_0ᵀH_0^λ λ̂_0 + μ_1ᵀH_1 v̂_1 + μ_1ᵀH_1^λ λ̂_1 + γ_λ.

The result follows by observing that (−π + H_0ᵀμ_0 + γ_0 e_j)ᵀ v_0 ≥ 0, (−π + H_1ᵀμ_1 + γ_1 e_j)ᵀ v_1 ≥ 0, (μ_0ᵀH_0^λ + γ_λ) λ_0 = 0, (μ_1ᵀH_1^λ + γ_λ − γ_1) λ_1 = 0, and that

    (μ_0ᵀH_0 v̂_0 + μ_0ᵀH_0^λ λ̂_0) + (μ_1ᵀH_1 v̂_1 + μ_1ᵀH_1^λ λ̂_1) + γ_λ
      = (μ_0^F)ᵀ(H_0^F v̂_0^F + H_0^λ λ̂_0^F) + (μ_1^F)ᵀ(H_1^F v̂_1^F + H_1^λ λ̂_1^F) + γ_λ^F
      = (π_x^F)ᵀ(v̂_0^F + v̂_1^F) = πᵀx̂.

Note that we have used (25) and (26) in observing these equalities. ⊓⊔

6. Preliminary computational experience

In this section we demonstrate the effectiveness of the branch-and-cut algorithm developed in the earlier sections. We give results on each of the four real-world test problems that were used in Duran and Grossmann [6]. This test set is small, and the results presented here should be considered preliminary.
6.1. Test problems

The first three problems (PD1, PD2, PD3) are process synthesis problems, where the goal is to simultaneously determine the optimal structure and operating conditions of a process that will meet certain given design specifications. The fourth example (PP1) is a problem of determining the optimal positioning of a new product in a multiattribute space. Descriptions of these problems are available in [6]. Statistics on the test problems are given in Table 1. The first column gives the problem name, the second column gives the total number of variables, the third column gives the number of integer variables, and the fourth column gives the number of constraints of the original problem. The column labeled NL gives the number of nonlinear constraints. The column labeled Rex gives the objective value of the problem obtained by relaxing the integrality constraints, and the last column gives the optimal objective value of these problems.

Table 1. Test problem statistics

    Prob   n    p    m    NL   Rex          Opt
    PD1    6    3    6    2    0.75928439   6.009759
    PD2    11   5    14   3    -0.554405    73.035316
    PD3    17   8    23   4    15.082191    68.009744
    PP1    30   25   30   25   -16.419774   -8.064182
6.2. Computational results

The results reported in this subsection are obtained from a preliminary implementation of our algorithm. In the implementation we generated several rounds of cuts at the root node and then solved the problem using an implementation of a branch-and-bound algorithm. No cuts are generated in the branch-and-bound tree. In each round we generate a cut for each fractional variable sequentially. The first column of Table 2 gives the number of cuts after which the code started the branch-and-bound algorithm. If 0 cuts were generated, then the problem was solved using a pure branch-and-bound method.

Our branch-and-bound implementation uses a depth-first search strategy. At each node of the branch-and-bound tree, the variable that is most fractional is chosen as the branching variable. A tie (it never happened) is broken in favor of the smallest-index variable. Once a branching variable is chosen, we always branch on the node x_j = 0 first. A node is fathomed if it is infeasible or if its lower bound exceeds the best known upper bound. When a node is fathomed, the node having the smallest lower bound (in a minimization problem) is selected as the next branching node. The continuous relaxation subproblems and the separation problems are solved using MINOS 5.4 [13].

We experimented with the 1-norm, 2-norm, and ∞-norm objective functions in the separation problem, and found the performance of the ∞-norm to be the best. Therefore, the results in Table 2 are reported for this separation problem only. The first column in Table 2 gives the number of cuts, the second column gives the % gap closed by the cuts in the solution of the continuous relaxation problem, the third column shows the node at which the optimal solution was found in the branch-and-bound tree, and the last column reports the number of branch-and-bound nodes at which optimality was verified.

Table 2. Preliminary computational results

    Problem PD1
    Cuts   %GC     Opt Node   Nodes
    0      0.0     3          4
    3      84.0    3          4
    6      100.0   3          4
    9      100.0   0          0

    Problem PD2
    Cuts   %GC     Opt Node   Nodes
    0      0.0     11         16
    5      89.3    4          4
    10     92.1    5          6
    15     96.2    2          2
    20     100.0   0          0

    Problem PD3
    Cuts   %GC     Opt Node   Nodes
    0      0.0     16         20
    8      61.2    13         18
    16     74.8    9          12
    24     94.8    2          4
    32     100.0   1          2
    34     100.0   0          0

    Problem PP1
    Cuts   %GC     Opt Node   Nodes
    0      0.0     65         92
    25     71.5    33         44
    50     92.6    15         28
    75     98.2    20         34
    100    99.5    16         34
The computational results in Table 2 show that for all of the process design problems the gap was closed by 100% just by generating cuts. For the product positioning problem (PP1), the gap was closed to 99.5% by using 4 rounds of cuts. When we generated the fifth round of cuts, due to numerical inaccuracies, we found that the optimal solution was
cut away; therefore we have not reported these results. This is an important phenomenon, because it points out that unless proper care is taken, cuts may not be valid because of convergence issues when solving a nondifferentiable program (12) using an algorithm for continuously differentiable optimization. Although the total number of cuts generated to close the gap by 100% is greater than the number of nodes in the branch-and-bound algorithm started with no cuts, we observe that for problems PD2 and PP1 the total number of cuts and branch-and-bound nodes combined is smaller when the branch-and-bound algorithm is started after generating one round of cuts. Since both the cut generation problem and the continuous relaxation problem require solving a convex program, the results show that cuts are useful for problems PD2 and PP1. Our results should be considered very preliminary, as we have not implemented any enhancements such as cut strengthening and generating multiple cuts. We expect that such enhancements would lower the cost of generating cuts and also strengthen the cuts that are generated.

Acknowledgements. We thank two anonymous referees for their constructive suggestions. In particular, the discussion in Sections 3 and 5 is significantly improved due to the insightful remarks of one of the referees.
References

1. Adams, W.P., Sherali, H.D. (1990): Linearization strategies for a class of zero-one mixed integer programming problems. Oper. Res. 38, 217–226
2. Balas, E., Ceria, S., Cornuéjols, G. (1993): A lift-and-project cutting plane algorithm for mixed 0-1 programs. Math. Program. 58, 295–324
3. Balas, E., Ceria, S., Cornuéjols, G. (1996): Mixed 0-1 programming by lift-and-project in a branch-and-cut framework. Manage. Sci. 42, 1229–1246
4. Bazaraa, M.S., Sherali, H.D., Shetty, C.M. (1993): Nonlinear Programming: Theory and Algorithms, second edition. John Wiley & Sons
5. Duran, M.A. (1984): A Mixed-Integer Nonlinear Programming Approach for the Systematic Synthesis of Engineering Systems. PhD thesis, Carnegie-Mellon University, December 1984
6. Duran, M.A., Grossmann, I.E. (1986): An outer-approximation algorithm for a class of mixed-integer nonlinear programs. Math. Program. 36, 307–339
7. Floudas, C.A. (1995): Nonlinear and Mixed Integer Optimization: Fundamentals and Applications. Oxford University Press
8. Geoffrion, A.M. (1972): Generalized Benders decomposition. J. Optim. Theory Appl. 10, 237–260
9. Hansen, P., Jaumard, B., Mathon, V. (1993): Constrained nonlinear 0-1 programming. ORSA J. Comput. 5, 97–119
10. Hiriart-Urruty, J.-B., Lemaréchal, C. (1993): Convex Analysis and Minimization Algorithms I, vol. 305. Springer
11. Kocis, G.R., Grossmann, I.E. (1987): Relaxation strategy for the structural optimization of process flow sheets. Ind. Eng. Chem. Res. 26, 1868–1880
12. Lovász, L., Schrijver, A. (1991): Cones of matrices and set-functions and 0-1 optimization. SIAM J. Optim. 1, 166–190
13. Murtagh, B.A., Saunders, M.A. (1995): MINOS 5.4 user's guide. Technical Report SOL 83-20R, Systems Optimization Laboratory, Department of Operations Research, Stanford University, Stanford, California 94305-4022
14. Nemhauser, G.L. (1994): The age of optimization: Solving large-scale real-world problems. Oper. Res. 42, 5–13
15. Nemhauser, G.L., Wolsey, L.A. (1988): Integer and Combinatorial Optimization. John Wiley & Sons
16. Nesterov, Y., Nemirovskii, A. (1993): Interior-Point Polynomial Algorithms in Convex Programming. SIAM Stud. Appl. Math.
17. Noonan, F., Giglio, R.J. (1977): Planning electric power generation: A nonlinear mixed integer model employing Benders decomposition. Manage. Sci. 23, 946–956
18. Padberg, M., Rinaldi, G. (1991): A branch-and-cut algorithm for the resolution of large-scale symmetric traveling salesman problems. SIAM Rev. 33, 60–100
19. Perold, A.F. (1984): Large scale portfolio optimization. Manage. Sci. 30, 1143–1160
20. Sherali, H.D., Adams, W.P. (1990): A hierarchy of relaxations between the continuous and convex hull representations for zero-one programming problems. SIAM J. Discrete Math. 3, 411–430
21. Sherali, H.D., Adams, W.P. (1994): A hierarchy of relaxations and convex hull characterizations for mixed-integer zero-one programming problems. Discrete Appl. Math. 52, 83–106