Deep Cut Ellipsoid Algorithm

3 The Ellipsoid Method
In this section we assume the existence of an oracle that, for each convex function involved in the definition of (C), is able to provide at every point of its domain the function value and an element of the subgradient set. Thus, we will freely use $f(x)$ and $g(x)$ for different instances of $x$ and take elements of the sets $\partial f(x)$ and $\partial g(x)$. The main idea of the ellipsoid algorithm is the following. By Assumption 2.1 there exists a sphere with center $a_0$ containing an optimal solution. Subsequently an oracle (to be detailed) provides a hyperplane including the center of the sphere and with the property that the corresponding lower halfspace includes all the optimal solutions. Clearly, by Assumption 2.1 an optimal solution belongs to the intersection of the initial sphere and the lower halfspace, and this halfsphere has half the volume of the sphere. It is now possible to compute an ellipsoid with minimal volume containing this intersection. We then replace the initial sphere by the new ellipsoid and repeat the procedure. Depending on the information available at the present stage of the algorithm, it is convenient to take less than half of each ellipsoid whenever possible. We refer to the first type of cuts, going through the center of the ellipsoid, as central cuts, and to the second type of cuts, which leave the center of the ellipsoid in the part to be discarded, as deep cuts.
Figure 1: The central and the deep cut ellipsoid methods

In Figure 1 one can see the difference between the two types of cuts. The first picture represents a central cut on a sphere and the minimal volume ellipsoid that includes the half sphere. The second picture shows the same sphere and a cut parallel to the first but shifted halfway along the radius of the sphere. One can see in this case that the ellipsoid with minimal volume including this part of the sphere is indeed smaller than the one in the previous picture. The minimum volume ellipsoid required at each step is easy to compute once one finds such a (central or deep) cut. In order to recall how, we first need to introduce a mathematical description of an ellipsoid.

J. B. G. Frenk and J. Gromicho

A set $\mathcal{E} \subseteq \mathbb{R}^s$ is called an ellipsoid if there exists a vector $a \in \mathbb{R}^s$ and a positive definite $s \times s$ matrix $A$ such that
$$\mathcal{E} = \mathcal{E}(A;a) := \{x \in \mathbb{R}^s : (x-a)^T A^{-1}(x-a) \le 1\}.$$
Moreover, in order to determine whether a given hyperplane in $\mathbb{R}^s$ with normal $a^*$ intersects an ellipsoid $\mathcal{E}(A;a)$, we observe ([11]) that
$$\min\{a^{*T}x : x \in \mathcal{E}(A;a)\} = a^{*T}a - \sqrt{a^{*T}Aa^*} \qquad (1)$$
and
$$\max\{a^{*T}x : x \in \mathcal{E}(A;a)\} = a^{*T}a + \sqrt{a^{*T}Aa^*}. \qquad (2)$$
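Formulas (1) and (2) are easy to check numerically. The following small script is an illustrative sketch (ours, not from the text): it samples the boundary of a two-dimensional ellipsoid with a diagonal matrix $A$ and compares the sampled maximum of a linear function with the closed-form value $a^{*T}a + \sqrt{a^{*T}Aa^*}$.

```python
import math

# Illustrative check of formula (2) for s = 2 with a diagonal matrix A:
# max{ c^T x : x in E(A; a) } should equal c^T a + sqrt(c^T A c).
A = [[4.0, 0.0], [0.0, 1.0]]   # positive definite, diagonal (semi-axes 2 and 1)
a = [1.0, 2.0]                 # center of the ellipsoid
c = [1.0, 1.0]                 # normal vector a* of the hyperplane

closed_form = c[0]*a[0] + c[1]*a[1] + math.sqrt(A[0][0]*c[0]**2 + A[1][1]*c[1]**2)

# The boundary of E(A; a) is x(t) = a + (2 cos t, sin t), since A^(1/2) = diag(2, 1).
sampled = max(c[0]*(a[0] + 2.0*math.cos(t)) + c[1]*(a[1] + math.sin(t))
              for t in (2.0*math.pi*k/20000 for k in range(20000)))

assert sampled <= closed_form + 1e-9   # no boundary point exceeds the bound
assert closed_form - sampled < 1e-6    # and the bound is (numerically) attained
```

For this instance the closed form evaluates to $3 + \sqrt{5}$, and the sampled maximum agrees with it to the grid resolution.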
This implies that the hyperplane $\mathcal{L}^{\le}_h(\beta) := \{x \in \mathbb{R}^s : h(x) \le \beta\}$ with $h(x) := a^{*T}(x-a)$ has a nonempty intersection with $\mathcal{E}(A;a)$ whenever $-\sqrt{a^{*T}Aa^*} \le -\beta \le \sqrt{a^{*T}Aa^*}$. If this holds the hyperplane is called a valid cut. It is shown in [11] that for $-1/s \le \alpha < 1$ and $\alpha := -\beta/\sqrt{a^{*T}Aa^*}$ a minimum volume ellipsoid containing the intersection $\mathcal{E}(A;a) \cap \mathcal{L}^{\le}_h(\beta)$ exists, and this new ellipsoid has a strictly smaller volume than $\mathcal{E}(A;a)$. Moreover, for $\alpha = 1$ the intersection of the lower halfspace and the ellipsoid reduces to a single point and there is no need to proceed. To bring the exposition into the framework of an iterative algorithm, let us denote the current ellipsoid by $\mathcal{E}(A_m;a_m)$ and the corresponding cut by $\mathcal{L}^{\le}_{h_m}(\beta_m)$ with $h_m(x) := a_m^{*T}(x-a_m)$. Finally, denoting the depth of the current cut by
$$0 \le \alpha_m := \frac{-\beta_m}{\sqrt{a_m^{*T}A_m a_m^*}} \le 1,$$
one can show ([1]) that the ellipsoid $\mathcal{E}(A_{m+1};a_{m+1})$ with center given by
$$a_{m+1} := a_m - \tau_m b_m \qquad (3)$$
and matrix
$$A_{m+1} := \delta_m\left(A_m - \sigma_m b_m b_m^T\right) \qquad (4)$$
with updating values
$$\delta_m = \frac{s^2(1-\alpha_m^2)}{s^2-1}, \qquad \sigma_m = \frac{2(1+s\alpha_m)}{(s+1)(1+\alpha_m)}, \qquad \tau_m = \frac{1+s\alpha_m}{s+1}$$
and
$$b_m := \frac{A_m a_m^*}{\sqrt{a_m^{*T}A_m a_m^*}}$$
is the minimum volume ellipsoid containing $\mathcal{E}(A_m;a_m) \cap \mathcal{L}^{\le}_{h_m}(\beta_m)$. We note here that for $\alpha_m = 0$ (resp. $0 < \alpha_m < 1$) the hyperplane is called a valid central cut (resp. valid deep cut).
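As a concrete sketch (ours, not the authors' code), the update (3)-(4) can be written out directly. For a central cut ($\beta = 0$) on the unit disk in $\mathbb{R}^2$ with normal $a^* = (1,0)$ it reproduces the well-known minimum volume ellipsoid covering a half-disk, with center $(-1/3, 0)$ and matrix $\mathrm{diag}(4/9, 4/3)$.

```python
import math

def deep_cut_update(A, a, g, beta):
    """One ellipsoid update via formulas (3)-(4); cut is {x : g^T (x - a) <= beta}.

    A is the s x s matrix of the current ellipsoid E(A; a) (list of lists),
    a its center and g the cut normal a*_m. Sketch only: no validity checks."""
    s = len(a)
    Ag = [sum(A[i][j]*g[j] for j in range(s)) for i in range(s)]
    root = math.sqrt(sum(g[i]*Ag[i] for i in range(s)))    # sqrt(g^T A g)
    alpha = -beta / root                                    # depth of the cut
    b = [Ag[i]/root for i in range(s)]
    tau = (1.0 + s*alpha) / (s + 1.0)
    delta = s*s*(1.0 - alpha*alpha) / (s*s - 1.0)
    sigma = 2.0*(1.0 + s*alpha) / ((s + 1.0)*(1.0 + alpha))
    a_new = [a[i] - tau*b[i] for i in range(s)]
    A_new = [[delta*(A[i][j] - sigma*b[i]*b[j]) for j in range(s)] for i in range(s)]
    return A_new, a_new

# Central cut (beta = 0) on the unit disk with cut normal a* = (1, 0).
A1, a1 = deep_cut_update([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [1.0, 0.0], 0.0)
```

Here `A1` and `a1` should equal $\mathrm{diag}(4/9, 4/3)$ and $(-1/3, 0)$, as predicted by the formulas with $\alpha_m = 0$, $s = 2$.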
Taking the same matrix $Q$ as described on page 151 of [16] and copying, with some obvious modifications, the proofs of Propositions 2.7 and 2.8 of [16], one can show that $A_{m+1}$ is positive definite given that $\alpha_m^2 < 1$ and $A_m$ is positive definite. Since in this section cuts are generated by means of subgradients, the following result is useful to determine when these cuts are valid.
Lemma 3.1 Let $\varphi:\mathbb{R}^s\to\mathbb{R}$ be convex, $a^*\in\partial\varphi(a)$ with $a^*\neq 0$, and suppose there exists some $\bar{x}\in\mathcal{E}(A;a)$ with $\varphi(\bar{x})\le\ell\le\varphi(a)$. Then the hyperplane $\mathcal{L}^{\le}_h(\beta)$ with $h(x):=a^{*T}(x-a)$ and $\beta:=\ell-\varphi(a)$ is a valid cut.

Proof: Observe by the subgradient inequality, $\bar{x}\in\mathcal{E}(A;a)$ and (1) that
$$0\le\varphi(a)-\ell\le\varphi(a)-\varphi(\bar{x})\le -a^{*T}(\bar{x}-a)\le a^{*T}a-\min\{a^{*T}x : x\in\mathcal{E}(A;a)\}=\sqrt{a^{*T}Aa^*},$$
and hence, with $\beta:=\ell-\varphi(a)\le 0$, we obtain $-\sqrt{a^{*T}Aa^*}\le-\beta\le\sqrt{a^{*T}Aa^*}$, i.e., $\mathcal{L}^{\le}_h(\beta)$ is a valid cut. □
The ellipsoid algorithm is well known for the occurrence of numerical instabilities. These instabilities are related to the fact that each time a new ellipsoid is generated, it is not completely included in the previous one, bringing new points into consideration (see Figure 1). The inclusion of new points may induce the ellipsoid to elongate along one of its axes in such a way that eventually it may become "flat" along one of the other $s-1$ axes. An effect of the potential elongation of the ellipsoid is that the center of the present ellipsoid may end up extremely far away from the initial ellipsoid. In order to try to overcome this we introduce the concept of norm cuts. Such cuts appeared first in [8]. The idea is to invoke Assumption 2.1 at each iteration and not, as in the basic version, to use this information only to initialize the algorithm. As the next section shows, the inclusion of this new cut permits a very simple convergence proof. Thus, if it happens that the center $a_m$ of the current ellipsoid is outside the first ellipsoid, then a cut is generated using the function $n(x) := \|x - a_0\|_2$. A geometrical interpretation of this cut is given in Figure 2. Before discussing the detailed steps of how to implement the oracles yielding the different cuts, we first list the improved version of the algorithm.
Step 0: let $m := 0$, $A_0 := r^2 I$ and $\ell_{-1} := +\infty$;
Step 1: if $a_m$ satisfies some stopping criterion then stop, else go to Step 2;
Step 2: if $n(a_m) > r$ then [apply a norm cut], else if $g(a_m) > 0$ then [apply a constraint cut], else [apply an objective cut];
Step 3: [update the ellipsoid], let $m := m+1$ and return to Step 1.
The above algorithm requires the specification of four different procedures marked as framed statements. To start with a norm cut we observe the following. It follows by Lemma 2.3 and Assumption 2.1 that $x^* \in \mathcal{E}(A_0;a_0) = \mathcal{L}^{\le}_n(r) \subseteq \mathcal{L}^{\le}_{h_m}(r - n(a_m))$ with $h_m(x) := \nabla n(a_m)^T(x - a_m)$ and $\nabla n(a_m) = (a_m - a_0)/\|a_m - a_0\|_2$. Consequently the optimal point $x^*$ belongs to the lower halfspace $\mathcal{L}^{\le}_{h_m}(\beta_m)$ with $\beta_m := r - n(a_m)$. The validity of this norm cut now follows from $x^* \in \mathcal{E}(A_m;a_m)$ and Lemma 3.1, and so the norm cut reduces to the following procedure:
$$\text{let } \alpha_m := \frac{n(a_m) - r}{\sqrt{\nabla n(a_m)^T A_m \nabla n(a_m)}}.$$
Clearly, after performing a norm cut it follows that $x^* \in \mathcal{E}(A_{m+1};a_{m+1})$. In order to apply a constraint cut, we observe by Lemma 2.3 that $\mathcal{L}^{\le}_g(0) \subseteq \mathcal{L}^{\le}_{h_m}(-g(a_m))$ with $h_m(x) := a_m^{*T}(x - a_m)$ for some nonzero $a_m^* \in \partial g(a_m)$, and hence $x^* \in \mathcal{L}^{\le}_g(0) \subseteq \mathcal{L}^{\le}_{h_m}(\beta_m)$ with $\beta_m := -g(a_m)$. Moreover, by $x^* \in \mathcal{E}(A_m;a_m)$ and Lemma 3.1 the hyperplane $\mathcal{L}^{\le}_{h_m}(\beta_m)$ is a valid cut. Clearly it is a valid deep cut, since we assume that $g(a_m) > 0$, and so the constraint cut reduces to the next procedure:
$$\text{take } a_m^* \in \partial g(a_m); \quad \text{let } \alpha_m := \frac{g(a_m)}{\sqrt{a_m^{*T} A_m a_m^*}}.$$
Observe, since $\mathcal{L}^{\le}_g(0)$ is nonempty and $g(a_m) > 0$, that $a_m^*$ cannot be zero. It is also clear after performing a constraint cut that $x^* \in \mathcal{E}(A_{m+1};a_{m+1})$. Moreover, to apply an objective cut we observe the following. Since $f$ is finite and convex on $\mathbb{R}^s$, it follows for every $x \in \mathcal{L}^{\le}_g(0)$ that the subgradient set $\partial f(x)$ is nonempty ([19, 12]), and hence for every $a_m^* \in \partial f(a_m)$ the so-called subgradient inequality $f(x) \ge f(a_m) + h_m(x)$ holds with $h_m(x) := a_m^{*T}(x - a_m)$. Observe, if $a_m^* = 0$ then $a_m$ is optimal and therefore there is no need for a cut. For a derivation of a deep or central valid cut with respect to $f$, introduce $\ell_m := \min\{f(a_k) : g(a_k) \le 0,\ 0 \le k \le m\}$ and observe by Lemma 2.3 that $x^* \in \mathcal{L}^{\le}_f(\ell_m) \subseteq \mathcal{L}^{\le}_{h_m}(\ell_m - f(a_m))$.
Hence $x^*$ must belong to the lower halfspace $\mathcal{L}^{\le}_{h_m}(\beta_m)$ with $\beta_m := \ell_m - f(a_m)$. The validity of this cut follows again from $x^* \in \mathcal{E}(A_m;a_m)$ and Lemma 3.1. Clearly this is a valid deep cut whenever $\ell_m < f(a_m)$, and this deep cut can be derived using only negligible additional computational effort. The objective cut now reduces to the next procedure:
$$\text{if } f(a_m) < \ell_{m-1} \text{ then let } \ell_m := f(a_m), \text{ else let } \ell_m := \ell_{m-1};$$
$$\text{take } a_m^* \in \partial f(a_m); \text{ if } a_m^* = 0 \text{ then stop, else let } \alpha_m := \frac{f(a_m) - \ell_m}{\sqrt{a_m^{*T} A_m a_m^*}}.$$
It is also clear after performing an objective cut that $x^* \in \mathcal{E}(A_{m+1};a_{m+1})$. Finally, the update of the ellipsoid in Step 3 is done by applying formula (3) to the center and formula (4) to the matrix. This concludes the description of the four procedures necessary to complete the ellipsoid algorithm. The technique of generating deep cuts using the subgradient inequality was first proposed by Shor and Gershovich in [25]. In the next section we give an easy and geometrically oriented rate of convergence proof for this version of the ellipsoid algorithm.
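The four procedures above can be combined into a compact sketch of the whole method. The following toy implementation is ours (not the authors' code): it works in two dimensions, takes (sub)gradients from the caller, uses only an iteration limit as stopping rule, and minimizes $f(x) = (x_1-1)^2 + (x_2-1)^2$ subject to $g(x) = x_1 + x_2 - 1 \le 0$, for which the optimal value is $1/2$ at $(1/2, 1/2)$.

```python
import math

def deep_cut_ellipsoid(f, df, g, dg, a0, r, iters=300):
    """Toy deep cut ellipsoid method in R^2 with norm, constraint and objective cuts.

    f, g are convex with (sub)gradients df, dg; an optimal solution lies in a0 + r*B.
    Returns the best objective value ell found at feasible centers. Sketch only."""
    s = 2
    A = [[r*r, 0.0], [0.0, r*r]]
    a = list(a0)
    ell = float('inf')
    for _ in range(iters):
        dist0 = math.hypot(a[0] - a0[0], a[1] - a0[1])
        if dist0 > r:                          # norm cut
            grad = [(a[0]-a0[0])/dist0, (a[1]-a0[1])/dist0]
            beta = r - dist0
        elif g(a) > 0.0:                       # constraint cut
            grad, beta = dg(a), -g(a)
        else:                                  # objective cut
            ell = min(ell, f(a))
            grad, beta = df(a), ell - f(a)
        Ag = [A[0][0]*grad[0] + A[0][1]*grad[1], A[1][0]*grad[0] + A[1][1]*grad[1]]
        gAg = grad[0]*Ag[0] + grad[1]*Ag[1]
        if gAg <= 0.0:                         # numerical breakdown: ellipsoid collapsed
            break
        root = math.sqrt(gAg)
        alpha = -beta / root                   # depth of the cut, in [0, 1) for valid cuts
        if alpha >= 1.0:
            break                              # remaining set is (at most) a single point
        b = [Ag[0]/root, Ag[1]/root]
        tau = (1.0 + s*alpha) / (s + 1.0)
        delta = s*s*(1.0 - alpha*alpha) / (s*s - 1.0)
        sigma = 2.0*(1.0 + s*alpha) / ((s + 1.0)*(1.0 + alpha))
        a = [a[0] - tau*b[0], a[1] - tau*b[1]]
        A = [[delta*(A[i][j] - sigma*b[i]*b[j]) for j in range(2)] for i in range(2)]
    return ell

# minimize (x1-1)^2 + (x2-1)^2 s.t. x1 + x2 <= 1; the optimum is 0.5 at (0.5, 0.5).
best = deep_cut_ellipsoid(
    f=lambda x: (x[0]-1.0)**2 + (x[1]-1.0)**2,
    df=lambda x: [2.0*(x[0]-1.0), 2.0*(x[1]-1.0)],
    g=lambda x: x[0] + x[1] - 1.0,
    dg=lambda x: [1.0, 1.0],
    a0=[0.0, 0.0], r=4.0)
```

The record value `best` converges linearly to $0.5$, in line with the rate result of the next section.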
4 The Proof
To start this proof we need the following result.

Lemma 4.1 For each matrix $A \in \mathbb{R}^{s\times s}$ with $\det(A) \neq 0$ and each pair of vectors $a, b \in \mathbb{R}^s$ we have $\det(A + ab^T) = (1 + b^T A^{-1}a)\det(A)$.

Proof: Observe by well-known properties of determinants ([13]) that
$$\det(A + ab^T) = \det\begin{pmatrix} 1 & b^T \\ -a & A \end{pmatrix} = (1 + b^T A^{-1} a)\det(A),$$
where the first equality follows by taking the Schur complement of the $(1,1)$ entry and the second by taking the Schur complement of the block $A$, which completes the proof. □
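Lemma 4.1 (the matrix determinant lemma) is easy to verify on a concrete instance; the following check is ours, using a $2 \times 2$ example with all quantities computed by hand-coded formulas.

```python
# Check det(A + a b^T) = (1 + b^T A^{-1} a) det(A) on a 2x2 example.
A = [[2.0, 1.0], [0.0, 3.0]]
a = [1.0, 2.0]
b = [3.0, -1.0]

detA = A[0][0]*A[1][1] - A[0][1]*A[1][0]                 # det(A) = 6
Ainv = [[ A[1][1]/detA, -A[0][1]/detA],
        [-A[1][0]/detA,  A[0][0]/detA]]
Ainv_a = [Ainv[0][0]*a[0] + Ainv[0][1]*a[1],
          Ainv[1][0]*a[0] + Ainv[1][1]*a[1]]
rhs = (1.0 + b[0]*Ainv_a[0] + b[1]*Ainv_a[1]) * detA     # (1 + b^T A^{-1} a) det(A)

M = [[A[i][j] + a[i]*b[j] for j in range(2)] for i in range(2)]  # A + a b^T
lhs = M[0][0]*M[1][1] - M[0][1]*M[1][0]                  # det(A + a b^T)
```

Both sides evaluate to $5$ for this instance.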
We now assume that the ellipsoid algorithm has already performed $m$ steps, $m = 1,2,\dots$, with centers $a_k$, $0 \le k \le m$, and that no optimality check or stopping rule was applied.
We may assume without loss of generality that $0 \le \alpha_k < 1$ and $a_k^* \neq 0$ for every $0 \le k < m$, since $\alpha_k = 1$ would mean that $\mathcal{E}(A_{k+1};a_{k+1})$ reduces to a point, and $a_k^* = 0$, only possible for the objective function $f$, would make the algorithm stop. The following result is fundamental for our proof. For the definition of the sequences $\ell_m$ and $\beta_m$ and the functions $n$ and $h_m$ we refer to the previous subsection.

Lemma 4.2 For each $m \ge 0$
$$\mathcal{L}^{\le}_f(\ell_m) \cap \mathcal{L}^{\le}_g(0) \cap \mathcal{L}^{\le}_n(r) \subseteq \mathcal{E}(A_m;a_m) \cap \mathcal{L}^{\le}_{h_m}(\beta_m)$$
holds.

Proof: Recall first that $\mathcal{L}^{\le}_n(r) = \mathcal{E}(A_0;a_0)$. Now observe that in each iteration the ellipsoid algorithm applies either an objective, a constraint or a norm cut. By Lemma 2.3 we obtain in case of an objective cut that $\mathcal{L}^{\le}_f(\ell_m) \subseteq \mathcal{L}^{\le}_{h_m}(\beta_m)$, while a constraint cut satisfies $\mathcal{L}^{\le}_g(0) \subseteq \mathcal{L}^{\le}_{h_m}(\beta_m)$ and a norm cut $\mathcal{L}^{\le}_n(r) \subseteq \mathcal{L}^{\le}_{h_m}(\beta_m)$. The result now follows easily by induction, since $\ell_m$ is nonincreasing and $\mathcal{E}(A_m;a_m) \cap \mathcal{L}^{\le}_{h_m}(\beta_m) \subseteq \mathcal{E}(A_{m+1};a_{m+1})$. □
In order to prove the main convergence theorem we need the following regularity condition. This condition is the strong Slater condition (cf. [12]).

Assumption 4.3 There exists some $\bar{x} \in a_0 + rB$ with $g(\bar{x}) < 0$.

The following result will also be useful for the proof of the main convergence theorem.

Lemma 4.4 Let $\varphi:\mathbb{R}^s\to\mathbb{R}$ be Lipschitz continuous with Lipschitz constant $L_\varphi$ and let $z$ and $\gamma$ satisfy $\varphi(z) \le \gamma$. Then
$$z + \frac{\gamma - \varphi(z)}{L_\varphi}B \subseteq \mathcal{L}^{\le}_\varphi(\gamma).$$

Proof: Take $x \in z + \frac{\gamma-\varphi(z)}{L_\varphi}B$. It follows that $\|x - z\|_2 \le \frac{\gamma-\varphi(z)}{L_\varphi}$, and this implies by the Lipschitz continuity of the function $\varphi$ that $\varphi(x) - \varphi(z) \le |\varphi(x) - \varphi(z)| \le L_\varphi\|x - z\|_2 \le \gamma - \varphi(z)$. From this inequality we obtain that $\varphi(x) \le \gamma$, i.e., $x \in \mathcal{L}^{\le}_\varphi(\gamma)$. □
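Lemma 4.4 can be illustrated numerically; in this sketch (ours) $\varphi(x) = |x_1| + 2|x_2|$ is Lipschitz continuous with constant $\sqrt{5}$ with respect to $\|\cdot\|_2$, and every point of the stated ball indeed stays in the level set, with equality approached in the steepest direction.

```python
import math

phi = lambda x: abs(x[0]) + 2.0*abs(x[1])   # Lipschitz with L = sqrt(5) w.r.t. ||.||_2
L = math.sqrt(5.0)
z, gamma = [1.0, 0.5], 3.0                  # phi(z) = 2 <= gamma
rad = (gamma - phi(z)) / L                  # radius promised by Lemma 4.4

# Check points on the boundary circle of z + rad*B (the worst case).
worst = max(phi([z[0] + rad*math.cos(t), z[1] + rad*math.sin(t)])
            for t in (2.0*math.pi*k/720 for k in range(720)))
```

For this instance the sampled worst value stays at or below $\gamma = 3$ and comes very close to it, confirming that the radius $(\gamma-\varphi(z))/L_\varphi$ is sharp for this $\varphi$.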
Theorem 4.5 If the ellipsoid algorithm executes an infinite number of iterations, then it follows that $\ell_m \downarrow f(x^*)$. Moreover, if $f$ is Lipschitz continuous on $\mathcal{L}^{\le}_g(0) \cap \mathcal{L}^{\le}_n(r)$ with Lipschitz constant $L_f$ and $g$ is Lipschitz continuous on $\mathcal{L}^{\le}_n(r)$ with constant $L_g$, then there exists some $m_0$ such that
$$0 \le \ell_m - f(x^*) \le \frac{2L_fL_g\|x^*-\bar{x}\|_2}{-g(\bar{x})}\Big(\frac12\Big)^{1/s} r \prod_{k=0}^{m-1}\big(\delta_k^s(1-\sigma_k)\big)^{1/(2s)}$$
for every $m \ge m_0$.

Proof: We start by evaluating $\det(A_m)$. From (4) and Lemma 4.1 it follows by induction, using $b_k^T A_k^{-1} b_k = 1$, that
$$\det(A_m) = \det(A_0)\prod_{k=0}^{m-1}\delta_k^s(1-\sigma_k).$$
Since $A_0 = r^2 I$ and $A_m$ is positive definite for every $m$, we obtain after some calculations that
$$0 < \det(A_m) = r^{2s}\prod_{k=0}^{m-1}\Big(\frac{s^2(1-\alpha_k^2)}{s^2-1}\Big)^s\frac{(s-1)(1-\alpha_k)}{(s+1)(1+\alpha_k)} \le r^{2s}e^{-m/(s+1)},$$
and so it follows that $\det(A_m) \to 0$. Suppose now that $\ell_m \downarrow f(x^*)$ does not hold, so that $\ell_m$ stays bounded away from $f(x^*)$. By the convexity of $\mathcal{L}^{\le}_g(0) \cap \mathcal{L}^{\le}_n(r)$, Assumption 2.1 and Assumption 4.3, it follows that the line segment $[\bar{x},x^*[\ := \{\lambda\bar{x} + (1-\lambda)x^* : 0 < \lambda \le 1\}$ is contained in $\mathcal{L}^{\le}_g(0) \cap \mathcal{L}^{\le}_n(r)$, and hence, arguing as in the second part of this proof, one can find some point $\hat{x}$ and some $\delta > 0$ with $\hat{x} + \delta B \subseteq \mathcal{L}^{\le}_f(\ell_m) \cap \mathcal{L}^{\le}_g(0) \cap \mathcal{L}^{\le}_n(r)$ for every $m$. With $v_s := \mathrm{vol}(B)$ ([11]), we obtain from Lemma 4.2 that
$$0 < \delta^s v_s = \mathrm{vol}(\hat{x} + \delta B) \le \mathrm{vol}\big(\mathcal{E}(A_m;a_m) \cap \mathcal{L}^{\le}_{h_m}(\beta_m)\big) \le \tfrac12\,\mathrm{vol}(\mathcal{E}(A_m;a_m)) = \tfrac12\sqrt{\det(A_m)}\,v_s$$
for every $m \ge 0$, and this contradicts $\det(A_m) \to 0$. Hence it must follow that $\ell_m \downarrow f(x^*)$, and so the first part is proved.
Figure 2: A norm cut
Figure 3: Geometric interpretation of the proof
To prove the stated inequality we first assume that every optimal solution $x^*$ satisfies $g(x^*) = 0$. Since $\ell_m \downarrow f(x^*)$ and by our assumption $f(\bar{x}) > f(x^*)$, there exists some $m_1$ such that
$$f(x^*) < \ell_m < f(\bar{x})$$
for every $m \ge m_1$. The continuity of $f$ enables us to create the sequence $x_m \in [\bar{x},x^*[$ with $f(x_m) = \ell_m$. Now we use this sequence to create the sequence $\hat{x}_m := (x_m + x^*)/2$ (see Figure 3), and for this new sequence it follows by the convexity of $f$ that $f(\hat{x}_m) \le (\ell_m + f(x^*))/2$. Hence by Lemma 4.4 we obtain that
$$\hat{x}_m + \frac{\ell_m - f(\hat{x}_m)}{L_f}B \subseteq \mathcal{L}^{\le}_f(\ell_m). \qquad (5)$$
Recall now from the convexity of $f$ and $\ell_m = f(x_m)$ that
$$\frac{\ell_m - f(\hat{x}_m)}{\|x_m - \hat{x}_m\|_2} \ge \frac{\ell_m - f(x^*)}{\|x_m - x^*\|_2},$$
and, by construction, $\|x_m - x^*\|_2 = 2\|x_m - \hat{x}_m\|_2$. This yields that $\ell_m - f(\hat{x}_m) \ge (\ell_m - f(x^*))/2$, and thus (5) implies that
$$\hat{x}_m + \frac{\ell_m - f(x^*)}{2L_f}B \subseteq \mathcal{L}^{\le}_f(\ell_m). \qquad (6)$$
On the other hand, by the convexity of $g$ we obtain that $g(\hat{x}_m) < 0$, and applying again Lemma 4.4 yields that
$$\hat{x}_m + \frac{-g(\hat{x}_m)}{L_g}B \subseteq \mathcal{L}^{\le}_g(0). \qquad (7)$$
Now, from the convexity of $g$, $g(x^*) = 0$, and the Lipschitz continuity of $f$ with Lipschitz constant $L_f$, it follows that
$$-g(\hat{x}_m) \ge -g(\bar{x})\,\frac{\|x^* - \hat{x}_m\|_2}{\|x^* - \bar{x}\|_2} \ge \frac{-g(\bar{x})(\ell_m - f(x^*))}{2L_f\|x^* - \bar{x}\|_2},$$
and this, together with (7), leads to
$$\hat{x}_m + \frac{-g(\bar{x})(\ell_m - f(x^*))}{2L_fL_g\|x^* - \bar{x}\|_2}B \subseteq \mathcal{L}^{\le}_g(0). \qquad (8)$$
Combining (6) with (8) and observing that $-g(\bar{x}) \le L_g\|x^* - \bar{x}\|_2$ finally yields
$$\hat{x}_m + \frac{-g(\bar{x})(\ell_m - f(x^*))}{2L_fL_g\|x^* - \bar{x}\|_2}B \subseteq \mathcal{L}^{\le}_f(\ell_m) \cap \mathcal{L}^{\le}_g(0).$$
Since $[\bar{x},x^*[ \subseteq \mathcal{L}^{\le}_n(r)$, there exists an $\epsilon > 0$ such that $\hat{x} + \epsilon B \subseteq \mathcal{L}^{\le}_n(r)$ for every $\hat{x} \in [\bar{x},x^*[$. Taking now $m_2$ such that for $m \ge m_2$
$$\frac{-g(\bar{x})(\ell_m - f(x^*))}{2L_fL_g\|x^* - \bar{x}\|_2} \le \epsilon,$$
it follows by Lemma 4.2 for $m \ge m_0 := \max\{m_1, m_2\}$ that
$$\hat{x}_m + \frac{-g(\bar{x})(\ell_m - f(x^*))}{2L_fL_g\|x^* - \bar{x}\|_2}B \subseteq \mathcal{L}^{\le}_f(\ell_m) \cap \mathcal{L}^{\le}_g(0) \cap \mathcal{L}^{\le}_n(r) \subseteq \mathcal{E}(A_m;a_m) \cap \mathcal{L}^{\le}_{h_m}(\beta_m).$$
Thus
$$\mathrm{vol}\Big(\hat{x}_m + \frac{-g(\bar{x})(\ell_m - f(x^*))}{2L_fL_g\|x^* - \bar{x}\|_2}B\Big) \le \mathrm{vol}\big(\mathcal{E}(A_m;a_m) \cap \mathcal{L}^{\le}_{h_m}(\beta_m)\big) \le \tfrac12\,\mathrm{vol}(\mathcal{E}(A_m;a_m)),$$
and computing these volumes gives
$$\Big(\frac{-g(\bar{x})(\ell_m - f(x^*))}{2L_fL_g\|x^* - \bar{x}\|_2}\Big)^s v_s \le \frac12\sqrt{\det(A_m)}\,v_s = \frac12\,r^s\prod_{k=0}^{m-1}\big(\delta_k^s(1-\sigma_k)\big)^{1/2}v_s.$$
Dividing the previous inequality by $v_s$, raising both sides to the power $s^{-1}$ and multiplying by $2L_fL_g\|x^* - \bar{x}\|_2/(-g(\bar{x}))$ yields the desired result. To complete the proof we still have to consider the case when an optimal solution $x^*$ exists satisfying $g(x^*) < 0$. It is not difficult to verify that for this case the same result holds, since in fact $\hat{x}_m$ can be taken equal to $x^*$ for every $m \ge m_0$ for which $\ell_m - f(x^*) < \bar{\epsilon}$, with $\bar{\epsilon} > 0$ satisfying $x^* + \bar{\epsilon}B \subseteq \mathcal{L}^{\le}_g(0) \cap \mathcal{L}^{\le}_n(r)$. Thus, from Lemma 4.4 we conclude that
$$x^* + \frac{\ell_m - f(x^*)}{L_f}B \subseteq \mathcal{L}^{\le}_f(\ell_m) \cap \mathcal{L}^{\le}_g(0) \cap \mathcal{L}^{\le}_n(r),$$
and from here on one may proceed to achieve the same result. □
Finally, we would like to remark that if an unconstrained instance of (C) has to be solved, then it follows from the last steps of the proof (see also [7]) that
$$0 \le \ell_m - f(x^*) \le 2L_f\Big(\frac12\Big)^{1/s} r \prod_{k=0}^{m-1}\big(\delta_k^s(1-\sigma_k)\big)^{1/(2s)}$$
for every sufficiently large $m$.
References

[1] R. G. Bland, D. Goldfarb and M. J. Todd, The ellipsoid method: A survey, Operations Research 29 (1981) 1039-1091.
[2] D. den Hertog, Interior Point Approach to Linear, Quadratic and Convex Programming: Algorithms and Complexity, volume 277 of Mathematics and its Applications, Kluwer Academic Publishers, 1994.
[3] S. T. Dziuban, J. G. Ecker and M. Kupferschmid, Using deep cuts in an ellipsoid algorithm for nonlinear programming, Mathematical Programming Study 25 (1985) 93-107.
[4] J. G. Ecker and M. Kupferschmid, An ellipsoid algorithm for nonlinear programming, Mathematical Programming 27 (1983) 83-106.
[5] J. G. Ecker and M. Kupferschmid, A computational comparison of the ellipsoid algorithm with several nonlinear programming algorithms, SIAM Journal on Control and Optimization 23(5) (1985) 657-674.
[6] A. V. Fiacco and G. P. McCormick, Nonlinear Programming: Sequential Unconstrained Minimization Techniques, Wiley, New York, 1968.
[7] J. B. G. Frenk, J. Gromicho and S. Zhang, General models in min-max continuous location: Theory and solution techniques, Technical Report TI 93-175, Tinbergen Institute, Rotterdam, The Netherlands, 1993. To appear in Journal of Optimization Theory and Applications.
[8] J. B. G. Frenk, J. Gromicho and S. Zhang, A deep cut ellipsoid algorithm for convex programming: Theory and applications, Mathematical Programming 63(1) (1994) 83-108.
[9] J. L. Goffin, Convergence rates of the ellipsoid method on general convex functions, Mathematics of Operations Research 8 (1983) 135-150.
[10] D. Goldfarb and M. J. Todd, Linear programming, in volume 1 of Handbooks in Operations Research and Management Science, chapter II, North-Holland, Amsterdam, 1989.
[11] M. Grotschel, L. Lovasz and A. Schrijver, Geometric Algorithms and Combinatorial Optimization, Springer-Verlag, Berlin Heidelberg, 1988.
[12] J.-B. Hiriart-Urruty and C. Lemarechal, Convex Analysis and Minimization Algorithms I: Fundamentals, volume 305 of A Series of Comprehensive Studies in Mathematics, Springer-Verlag, Berlin, 1993.
[13] P. Lancaster and M. Tismenetsky, The Theory of Matrices, Academic Press, New York, second edition, 1985.
[14] D. G. Luenberger, Linear and Nonlinear Programming, Addison-Wesley, Reading, Massachusetts, 1984.
[15] H. J. Luthi, On the solution of variational inequalities by the ellipsoid method, Mathematics of Operations Research 10 (1985) 515-522.
[16] G. L. Nemhauser and L. A. Wolsey, Integer and Combinatorial Optimization, Wiley, New York, 1988.
[17] A. S. Nemirovsky and D. B. Yudin, Problem Complexity and Method Efficiency in Optimization, John Wiley & Sons, Chichester, 1983.
[18] F. Plastria, Lower subdifferentiable functions and their minimization by cutting planes, Journal of Optimization Theory and Applications 46(1) (1985) 37-53.
[19] R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, New Jersey, 1970.
[20] R. T. Rockafellar, Conjugate Duality and Optimization, SIAM, Philadelphia, 1974.
[21] N. Z. Shor, Utilization of the operation of space dilation in the minimization of convex functions, Cybernetics 6 (1970) 7-15.
[22] N. Z. Shor, Convergence rate of the gradient descent method with dilation of the space, Cybernetics 6 (1970) 102-108.
[23] N. Z. Shor, Cut-off method with space extension in convex programming problems, Cybernetics 13 (1977) 94-96.
[24] N. Z. Shor, Minimization Methods for Non-differentiable Functions, Springer Series in Computational Mathematics, Springer-Verlag, Berlin, 1985.
[25] N. Z. Shor and V. I. Gershovich, Family of algorithms for solving convex programming problems, Cybernetics 15 (1979) 502-508.
[26] D. B. Yudin and A. S. Nemirovsky, Evaluation of the informational complexity of mathematical programming problems, Matekon 13(2) (1976) 3-25.
[27] D. B. Yudin and A. S. Nemirovsky, Informational complexity and efficient methods for the solution of convex extremal problems, Matekon 13(3) (1977) 25-45.
Recent Advances in Nonsmooth Optimization, pp. 121-140
Eds. D.-Z. Du, L. Qi and R.S. Womersley
©1995 World Scientific Publishing Co Pte Ltd

Solving Nonsmooth Equations by Means of Quasi-Newton Methods with Globalization

Marcia A. Gomes-Ruggiero, Jose Mario Martinez and Sandra Augusta Santos
Department of Applied Mathematics, State University of Campinas, CP 6065, 13081 Campinas SP, Brazil
Abstract
We consider the utilization of quasi-Newton methods for solving nonlinear systems of equations without smoothness assumptions. In order to improve the global convergence properties of the algorithms, we use a globalization strategy based on a merit function. We adopt a tolerant procedure that permits a nonmonotone behavior of the merit function. We test our methods with a collection of large-scale nonsmooth systems originating in nonlinear complementarity problems.
1 Introduction
We consider the resolution of nonlinear systems of equations
$$F(x) = 0 \qquad (1)$$
without smoothness assumptions on the mapping $F:\Omega\subseteq\mathbb{R}^n\to\mathbb{R}^n$. Methods for solving (1) are iterative. Quasi-Newton methods, originally developed for smooth systems, generate sequences $\{x_k\}$ according to
$$x_{k+1} = x_k - B_k^{-1}F(x_k). \qquad (2)$$
See [6], [7]. The quasi-Newton strategy consists in considering that, choosing $B_k$ in an appropriate way, we have
$$F(x) \approx L_k(x) = B_k(x - x_k) + F(x_k), \qquad (3)$$
at least in a neighborhood of the current point $x_k$. Newton's method (for smooth systems) chooses $B_k = F'(x_k)$. See [7], [32]. Secant methods choose an arbitrary $B_0$ and update the successive "Jacobian approximations" $B_k$ in such a way that
$$B_{k+1}s_k = y_k \qquad (4)$$
for all $k = 0,1,2,\dots$, where $s_k = x_{k+1} - x_k$ and $y_k = F(x_{k+1}) - F(x_k)$. The "secant equation" (4) guarantees that the affine function $L_k(x)$ defined in (3) satisfies
$$L_k(x_k) = F(x_k) \quad\text{and}\quad L_k(x_{k-1}) = F(x_{k-1}). \qquad (5)$$
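The interpolation property guaranteed by (4) is immediate to verify for a rank-one secant update; the following check is ours, applying Broyden's update (formula (23) below, with $z_k = s_k$) to a $2\times 2$ example and confirming that the updated matrix maps $s_k$ to $y_k$.

```python
# Check that the Broyden rank-one update B+ = B + (y - B s) s^T / (s^T s)
# satisfies the secant equation B+ s = y (2x2 example).
B = [[2.0, 0.0], [1.0, 3.0]]
s = [1.0, 2.0]
y = [0.5, -1.0]

Bs = [B[0][0]*s[0] + B[0][1]*s[1], B[1][0]*s[0] + B[1][1]*s[1]]
ss = s[0]*s[0] + s[1]*s[1]
Bp = [[B[i][j] + (y[i] - Bs[i])*s[j]/ss for j in range(2)] for i in range(2)]

Bps = [Bp[0][0]*s[0] + Bp[0][1]*s[1], Bp[1][0]*s[0] + Bp[1][1]*s[1]]  # equals y
```

The identity holds for any choice of $z_k$ with $z_k^Ts_k \neq 0$, since $B^+s = Bs + (y - Bs)\,(z^Ts)/(z^Ts) = y$.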
The local convergence theory of secant methods for smooth systems of equations is well developed. See [2], [7], [8], [25], [26]. Some authors (see [3], [4], [5], [21], [29]) have studied quasi-Newton and secant methods for different nonsmooth problems, but a local convergence theory with sufficiently general assumptions is far from complete. In spite of this limitation, the iterative approximation of a nonsmooth $F$ by an affine interpolatory function is an appealing idea that deserves experimental investigation. This is one of the aims of this paper. In order to improve the global convergence properties of local algorithms, strategies based on the minimization of a merit function $f$ (most frequently $f(x) = \|F(x)\|^2$) are generally used both in the smooth and the nonsmooth case. Martinez and Qi [28] considered one of these strategies in connection with the so-called inexact iteration based on the Newton method. See also [20]. In this paper we generalize that strategy in order to consider completely arbitrary directions without smoothness or semismoothness hypotheses on $F$. We prove a global convergence theorem related to this strategy. We implement an algorithm that is able to deal with large nonsmooth systems. In this implementation we combine local (quasi-Newton) and global iterations using a tolerant strategy that calls the global (special) iteration, which is more expensive, only when necessary. This strategy has been used for smooth systems in [12] and for some inverse problems in [9]. We use our methods to solve a collection of problems of dimensions 100 × 100 and 1000 × 1000. This paper is organized as follows. In Section 2 we introduce a general "global algorithm" for solving nonsmooth systems and we prove global convergence results. In Section 3 we describe a practical implementation (for large-scale problems) of the global algorithm. In Section 4 we define the quasi-Newton methods used in our study. In Section 5 we describe the tolerant strategy. The numerical experiments are reported and commented on in Section 6.
2 Global Algorithm
In this section we analyze the convergence properties of a general model algorithm under minimal assumptions on $F$. In particular, no smoothness assumptions will be made. As in many methods for solving both smooth and nonsmooth equations, the algorithm is based on the monotone reduction of a merit function $f$. Usually, we choose
$$f(x) = \|F(x)\|^2. \qquad (6)$$
However, any function $f:\mathbb{R}^n\to\mathbb{R}$ such that $f(x) \ge 0$ for all $x\in\mathbb{R}^n$ and $f(x) = 0$ iff $F(x) = 0$ will serve for our purposes. If $F$ is not defined for some $x\in\mathbb{R}^n$, we define $f(x) = \infty$. The main global algorithm, which uses elements of the approaches of [10] for smooth problems and [28] for semismooth problems, is described below.

Algorithm 2.1. Assume that $\sigma\in(0,1)$, $\eta\in(0,\frac12]$. Choose $x_0\in\mathbb{R}^n$, an arbitrary initial approximation such that $f(x_0) < \infty$, and set $\alpha_0 = 1$. Given $x_k\in\mathbb{R}^n$, $\alpha_k\in(0,1]$, the steps for obtaining $x_{k+1}$, $\alpha_{k+1}$ are:

Step 1. Choose $d_k\in\mathbb{R}^n$. (7)

Step 2. If
$$f(x_k + \alpha_k d_k) < f(x_k) \qquad (8)$$
define $x_{k+1} = x_k + \alpha_k d_k$. Otherwise, define $x_{k+1} = x_k$.

Step 3. If
$$f(x_k + \alpha_k d_k) \le (1 - \sigma\alpha_k)f(x_k) \qquad (9)$$
define $\alpha_{k+1} = 1$. Otherwise, choose
$$\alpha_{k+1}\in[\eta\alpha_k, (1-\eta)\alpha_k]. \qquad (10)$$
Observe that Algorithm 2.1 is more general than line-search or backtracking methods (see [7]), since the "direction" $d_k$ can change after a failure of (8) or (9). We will take advantage of this feature in the practical implementation. We denote by $K_1$ the set of indices of the "very successful" iterations:
$$K_1 = \{k\in\mathbb{N} \mid \text{(9) holds}\}. \qquad (11)$$
Let us now prove the convergence results related to Algorithm 2.1. The thesis of the convergence theorems will always be $\lim_{k\to\infty} f(x_k) = 0$. Clearly, if $x_*$ is a limit point of $\{x_k\}$ and $f$ is continuous at $x_*$, this implies that $f(x_*) = 0$.

Lemma 2.1 Let $\{x_k\}$ be the sequence generated by Algorithm 2.1. If
$$\sum_{k\in K_1}\log(1 - \sigma\alpha_k) = -\infty, \qquad (12)$$
then
$$\lim_{k\to\infty} f(x_k) = 0. \qquad (13)$$
Proof. By (9), for all $k\in K_1$ we have $\log f(x_{k+1}) \le \log(1 - \sigma\alpha_k) + \log f(x_k)$. Therefore, since $f(x_{k+1}) \le f(x_k)$ for all $k\in\mathbb{N}$,
$$\log f(x_k) \le \sum_{i\in K_1,\, i < k}\log(1 - \sigma\alpha_i) + \log f(x_0).$$
So, by (12), $\lim_{k\to\infty}\log f(x_k) = -\infty$. This implies that $f(x_k) \to 0$. □
Lemma 2.2 Let $\{x_k\}$ be the sequence generated by Algorithm 2.1. Then, either $\lim_{k\to\infty} f(x_k) = 0$ or there exists $K_2$, an infinite subset of $\mathbb{N}$, such that
$$\lim_{k\in K_2}\alpha_k = 0 \qquad (14)$$
and
$$\frac{f(x_k + \alpha_k d_k) - f(x_k)}{\alpha_k} > -\sigma f(x_k) \qquad (15)$$
for all $k\in K_2$.

Proof. By Lemma 2.1, if (13) does not hold, either $K_1$ is finite or $\sum_{k\in K_1}\log(1 - \sigma\alpha_k) > -\infty$. If $K_1$ is finite, $K_1\subseteq\{0,1,\dots,k_0\}$, we have that (9) does not hold and $\alpha_{k+1} \le (1-\eta)\alpha_k$ for all $k > k_0$. So $\alpha_k \to 0$ in this case, and the result is proved. If $K_1$ is infinite and the series $\sum_{k\in K_1}\log(1 - \sigma\alpha_k)$ converges, we necessarily have that $\alpha_k \to 0$ for $k\in K_1$. Therefore $\alpha_k < 1$ for large enough $k\in K_1$. Now, since $\alpha_k \ge \eta\alpha_{k-1}$, we also have that $\alpha_{k-1} \to 0$ for $k\in K_1$. Moreover, since $\alpha_k < 1$ it follows that (9) does not hold at iteration $k-1$. So, (14) and (15) hold defining $K_2 = \{k\in\mathbb{N} \mid k+1\in K_1\}$. This completes the proof. □
Lemma 2.3 Let $\{x_k\}$ be the sequence generated by Algorithm 2.1. Assume that there exist $a\in(0,1)$ and $J_1 = \{j_1, j_2, \dots\}$, an infinite subset of $K_1$, such that
$$\alpha_{j_i} \ge \frac{a}{i} \qquad (16)$$
for all $i = 1,2,\dots$. Then $\lim_{k\to\infty} f(x_k) = 0$.
Proof. By the fact that $\log(1 - \sigma\alpha_k) \le 0$ for $k\in\mathbb{N}$ and by (16), we have that
$$\sum_{k\in K_1}\log(1 - \sigma\alpha_k) \le \sum_{k\in J_1}\log(1 - \sigma\alpha_k) \le \sum_{i=1}^{\infty}\log\Big(1 - \frac{\sigma a}{i}\Big). \qquad (17)$$
But, by the integral criterion, the series on the right hand side of (17) diverges, so the same happens with the series on the left hand side. Therefore, the desired result follows from Lemma 2.1. □

Lemma 2.3 suggests the following specialization of Algorithm 2.1.

Algorithm 2.2. Assume that $\sigma\in(0,1)$, $\eta\in(0,\frac12]$ and $a\in(0,1)$. Choose $x_0\in\mathbb{R}^n$ such that $f(x_0) < \infty$ and set $\alpha_0 = 1$, $m_0 = 1$. Given $x_k\in\mathbb{R}^n$, $m_k\in\mathbb{N}$ and $\alpha_k\in(0,1]$, the steps for obtaining $x_{k+1}$, $\alpha_{k+1}$, $m_{k+1}$ are:

Step 1. Choose $d_k\in\mathbb{R}^n$ such that
$$\frac{f(x_k + \alpha_k d_k) - f(x_k)}{\alpha_k} \le -\sigma f(x_k) \qquad (18)$$
whenever
$$\alpha_k \le \frac{a}{m_k}. \qquad (19)$$
If such a choice is not possible, stop. (The algorithm breaks down.)

Step 2. If (8) holds, define $x_{k+1} = x_k + \alpha_k d_k$. Otherwise, define $x_{k+1} = x_k$.

Step 3. If (9) holds, define $\alpha_{k+1} = 1$ and $m_{k+1} = m_k + 1$. Otherwise, choose $\alpha_{k+1}\in[\eta\alpha_k, (1-\eta)\alpha_k]$ and define $m_{k+1} = m_k$.

Lemmas 2.1, 2.2 and 2.3 allow us to prove the following theorem related to Algorithm 2.2.

Theorem 2.4 Assume that Algorithm 2.2 does not break down and that $\{x_k\}$ is generated by this algorithm. Then $\lim_{k\to\infty} f(x_k) = 0$.

Proof. Let us prove first that $K_1$ is infinite. In fact, if $K_1\subseteq\{0,1,\dots,k_0\}$ we should have $m_k = m_{k_0}$ for all $k \ge k_0$. But, since $k\notin K_1$ for all $k > k_0$, we have that $\alpha_{k+1} \le (1-\eta)\alpha_k$ for all $k > k_0$. This implies that $\alpha_k \to 0$ and then, for some $k > k_0$,
$$\alpha_k \le \frac{a}{m_{k_0}} = \frac{a}{m_k}.$$
So, by the definition of the algorithm, (18) holds at iteration $k$. This implies that (9) holds. So, $k\in K_1$, which contradicts the assumption of finiteness of $K_1$.
Therefore, from now on, we assume that $K_1$ is infinite. Let $k\in K_1$. If $\alpha_k = 1$ we clearly have that
$$\alpha_k \ge \frac{\eta a}{m_k}. \qquad (20)$$
If $\alpha_k < 1$, then $k-1\notin K_1$. So, $m_k = m_{k-1}$. But, since (9) does not hold at iteration $k-1$, we have that (18) does not hold at that iteration and so, by the definition of Algorithm 2.2,
$$\alpha_{k-1} > \frac{a}{m_{k-1}}.$$
But $\alpha_k\in[\eta\alpha_{k-1}, (1-\eta)\alpha_{k-1}]$, so
$$\alpha_k \ge \eta\alpha_{k-1} > \frac{\eta a}{m_k}.$$
Therefore, we proved that $K_1$ is infinite and that (20) holds for all $k\in K_1$. Observe that $m_{j_i} = i$ for the $i$-th element $j_i$ of $K_1$, since $m_k$ increases by one precisely at the iterations in $K_1$. As a result, we are under the hypotheses of Lemma 2.3, with $J_1 = K_1$ and $a$ replaced by $\eta a$. So, the desired result is proved. □
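A minimal one-dimensional sketch of Algorithm 2.2 (ours, not from the paper) illustrates the $\alpha_k$/$m_k$ bookkeeping; Newton directions stand in for a general quasi-Newton choice of $d_k$, and the breakdown test of Step 1 is omitted, since for such directions the sufficient-decrease condition (18) holds for small $\alpha_k$.

```python
def algorithm_2_2(F, dF, x0, sigma=1e-4, eta=0.25, iters=100):
    """Sketch of Algorithm 2.2 for a scalar equation F(x) = 0, with f(x) = F(x)^2.

    d_k is the Newton direction (a stand-in for a general quasi-Newton choice);
    the breakdown test of Step 1 is omitted in this sketch."""
    x, alpha, m = x0, 1.0, 1
    for _ in range(iters):
        fx = F(x)**2
        if fx < 1e-30:
            break
        d = -F(x) / dF(x)
        trial = F(x + alpha*d)**2
        if trial < fx:                        # (8): accept the trial point
            x = x + alpha*d
        if trial <= (1.0 - sigma*alpha)*fx:   # (9): "very successful" iteration
            alpha, m = 1.0, m + 1
        else:                                 # (10): a choice in [eta*a, (1-eta)*a]
            alpha = 0.5*alpha
    return x

root = algorithm_2_2(lambda x: x**3 - 1.0, lambda x: 3.0*x*x, 3.0)
```

On this smooth test equation $x^3 = 1$ every iteration is very successful, so $\alpha_k$ stays at 1 and the iteration reduces to Newton's method converging to the root $x = 1$.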
3 Implementation of the Global Algorithm
In this section we describe an implementation of Algorithm 2.2 for solving nonsmooth systems of equations, remembering that we are specially interested in large-scale problems. The main tool for our practical algorithm is a method for minimizing convex quadratics with box constraints developed in [13] and [14]. The idea is to choose, at each iteration, $s_k = \alpha_k d_k$ as an approximate minimizer of
$$\psi_k(s) = \tfrac12\|V_k s + F(x_k)\|^2$$
on an appropriate trust region (see [11]) of the form $\|s\|_\infty \le \Delta_k$, where $V_k$ is a suitable $n\times n$ matrix. In semismooth problems we choose $V_k\in\partial_B F(x_k)$ (see [28] and [35]). After the computation of $s_k$, if $\alpha_k \le a/m_k$, we test the inequality (18). If this inequality does not hold, we stop the execution (the algorithm breaks down). This necessarily happens when the problem has no solutions. The choice of the $\|\cdot\|_\infty$ norm instead of the Euclidean one allows us to deal with bounds on the variables. In this case, the $\|\cdot\|_\infty$ norm fits well with the bounds and the approximate minimizers are not difficult to find. Observe that constraints are naturally considered in our formulation, since we can define $f(x) = \infty$ if $x$ is infeasible.

Algorithm 3.1. Let $\sigma\in(0,1)$, $\eta\in(0,\frac12]$, $a\in(0,1)$, $tol\in(0,1)$, $max\in\mathbb{N}$ be given independently of $k$. Let $x_0\in\mathbb{R}^n$ be an arbitrary initial point such that $f(x_0) < \infty$, $\Delta_0 = M$, $m_0 = 1$ and $\alpha_0 = 1$. Given $x_k\in\mathbb{R}^n$, $\Delta_k > 0$, $m_k\in\mathbb{N}$ and $\alpha_k\in(0,1]$, the steps for obtaining $x_{k+1}$, $\Delta_{k+1}$, $m_{k+1}$ and $\alpha_{k+1}$ are the following:

Step 1. Compute $s_k$, an "approximate solution" of
$$\text{Minimize } \psi_k(s) = \tfrac12\|V_k s + F(x_k)\|^2 \quad\text{s.t. } \|s\|_\infty \le \Delta_k. \qquad (21)$$
The approximate solution of (21) is obtained by applying the method described in [13] (see also [14]), stopping when
$$\|\nabla_P\psi_k(s_k)\| \le tol \qquad (22)$$
(where $\nabla_P\psi_k(s)$ is the projected gradient of $\psi_k$ on the box $\|s\|_\infty \le \Delta_k$) or when the number of iterations used by the algorithm of [13] exceeds $max$.

Step 2. If $\alpha_k \le a/m_k$ but (18) does not hold, stop (the algorithm breaks down).

Step 3. The same as Step 2 of Algorithm 2.2.

Step 4. The same as Step 3 of Algorithm 2.2.

Step 5. If $\alpha_{k+1} = 1$, define $\Delta_{k+1} = M$. Otherwise, define $\Delta_{k+1} = \|s_k\|_\infty/2$.

The parameters used in our implementation were $\sigma = 10^{-4}$, $a = 10^{-5}$, $\eta = \frac12$, $M = 10^3$ and $max = 300$. The software used for this implementation was an adaptation of the algorithm for box-constrained minimization introduced in [14]. The algorithm used for obtaining the approximate solution of (21) (see [13]) is an active set method that combines conjugate gradient iterations with projected and "chopped" gradient iterations in such a way that many active constraints can be added or dropped in a single iteration.
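As a rough stand-in for the solver of [13] (whose active set strategy we do not reproduce), the box-constrained subproblem (21) can be approximated with a simple projected gradient loop. This sketch is ours: the step length $1/L$ uses the fact that for the symmetric matrix $M = V^TV$ the spectral norm is bounded by the maximum absolute row sum.

```python
def tr_subproblem(V, Fv, delta, iters=500):
    """Sketch: minimize 0.5*||V s + F||^2 s.t. ||s||_inf <= delta by projected
    gradient (a crude stand-in for the active set method of [13])."""
    n = len(Fv)
    # M = V^T V; for symmetric M the max row sum bounds the spectral norm.
    M = [[sum(V[k][i]*V[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    L = max(sum(abs(M[i][j]) for j in range(n)) for i in range(n))
    VtF = [sum(V[k][i]*Fv[k] for k in range(n)) for i in range(n)]
    s = [0.0]*n
    for _ in range(iters):
        # gradient of psi at s is V^T (V s + F) = M s + V^T F
        grad = [sum(M[i][j]*s[j] for j in range(n)) + VtF[i] for i in range(n)]
        s = [min(delta, max(-delta, s[i] - grad[i]/L)) for i in range(n)]
    return s

# With V = I the problem separates: s_i = clip(-F_i, [-delta, delta]).
s = tr_subproblem([[1.0, 0.0], [0.0, 1.0]], [3.0, -0.5], 1.0)
```

For the separable test case the exact minimizer is $s = (-1, 0.5)$, which the loop reaches after a single projected step.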
4  Quasi-Newton Methods
As we mentioned in the Introduction, the idea of considering a linear system as the subproblem for computing each iteration, using interpolation, is very attractive, not only because of its simplicity but also because of its success in the smooth case. In this study we consider three quasi-Newton formulae for the implementation of the local iteration (2). The first one corresponds to the classical Newton's method. This choice can be made, for example, if F is semismooth (see [31], [36]). In this case we choose B_k as one of the matrices V_k in the generalized differential set ∂_B F(x_k). In large sparse problems, where each equation depends only on a few variables, V_k is a sparse matrix. The LU factorization of V_k, which is necessary for computing x_{k+1} in (2), was computed using a static data structure and partial pivoting, as described in [15] and [18]. The other two quasi-Newton methods considered are Broyden's method and the Column-Updating method; see [1], [17], [23], [24]. In both methods we choose B_0 as in Newton's algorithm, and B_{k+1} in such a way that the secant equation (4) is satisfied for all k = 0, 1, 2, . . . The recurrence formula for obtaining B_{k+1} is

    B_{k+1} = B_k + (y_k − B_k s_k)(z_k)^T / ((z_k)^T s_k),    (23)
M. A. Gomes-Ruggiero, J. M. Martinez and S. A. Santos
where z_k = s_k for Broyden's method, and z_k = e_{j_k} with

    |(e_{j_k})^T s_k| = ||s_k||_∞

for the Column-Updating method ({e^1, . . . , e^n} is the canonical basis of ℝ^n). Applying the Sherman-Morrison formula to (23) ([16], p. 51) we obtain

    B_{k+1}^{-1} = B_k^{-1} − (B_k^{-1} y_k − s_k)(z_k)^T B_k^{-1} / ((z_k)^T B_k^{-1} y_k).    (24)

Formula (24) shows that B_{k+1}^{-1} can be obtained from B_k^{-1} using O(n^2) floating point operations in the dense case. Moreover,

    B_{k+1}^{-1} = (I + u_k (z_k)^T) B_k^{-1},    (25)

where u_k = (s_k − B_k^{-1} y_k) / ((z_k)^T B_k^{-1} y_k), so

    B_k^{-1} = (I + u_{k−1}(z_{k−1})^T) · · · (I + u_0 (z_0)^T) B_0^{-1}    (26)

for k = 1, 2, 3, . . . Formula (26) is used when n is large (see [18] and [30]). In this case, the vectors u_0, . . . , u_{k−1}, z_0, . . . , z_{k−1} are stored and the product B_k^{-1} F(x_k) is computed using (26). In this way, the computer time of iteration k is O(kn) plus the computer time of computing B_0^{-1} F(x_k). If k is large, the process must be periodically restarted taking
B_k ≈ J(x_k).
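The product form (24)-(26) can be sketched as follows. This is our own minimal implementation (all function names are illustrative): the pairs (u_j, z_j) are stored, B_k^{-1} is applied to a vector in O(kn) operations on top of a dense B_0^{-1}, and the choice of z_k distinguishes Broyden's method from the Column-Updating method. In the paper a sparse LU factorization of B_0 would be used instead of an explicit inverse.

```python
import numpy as np

def apply_inverse(B0_inv, us, zs, v):
    """Compute B_k^{-1} v via (26):
    B_k^{-1} = (I + u_{k-1} z_{k-1}^T) ... (I + u_0 z_0^T) B_0^{-1}."""
    w = B0_inv @ v
    for u, z in zip(us, zs):
        w = w + u * (z @ w)      # each stored pair costs O(n)
    return w

def broyden_z(s):
    return s.copy()              # Broyden: z_k = s_k

def column_updating_z(s):
    j = int(np.argmax(np.abs(s)))  # CUM: z_k = e_j with |s_j| = ||s||_inf
    z = np.zeros_like(s)
    z[j] = 1.0
    return z

def quasi_newton_solve(F, B0, x0, z_choice=broyden_z, tol=1e-10, max_it=50):
    """Local iteration (2): x_{k+1} = x_k - B_k^{-1} F(x_k), with B_k kept
    implicitly through the rank-one factors (u_j, z_j) of (25)-(26)."""
    B0_inv = np.linalg.inv(B0)
    us, zs = [], []
    x = x0.astype(float)
    Fx = F(x)
    for _ in range(max_it):
        if np.linalg.norm(Fx) <= tol:
            break
        s = -apply_inverse(B0_inv, us, zs, Fx)
        x_new = x + s
        F_new = F(x_new)
        y = F_new - Fx
        z = z_choice(s)
        Binv_y = apply_inverse(B0_inv, us, zs, y)   # B_k^{-1} y_k
        denom = z @ Binv_y                          # z_k^T B_k^{-1} y_k
        if abs(denom) > 1e-14:
            # u_k = (s_k - B_k^{-1} y_k) / (z_k^T B_k^{-1} y_k), so that
            # B_{k+1}^{-1} y_k = s_k (the secant equation (4) holds)
            us.append((s - Binv_y) / denom)
            zs.append(z)
        x, Fx = x_new, F_new
    return x
```

With this organization, restarting simply means discarding the stored pairs and recomputing B_0 from a fresh Jacobian-like matrix.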
5  Tolerant Strategy
In Sections 3 and 4 we studied two ways of approximating a solution of (1). The first is to use the merit function f(x), applying the globally convergent Algorithm 3.1. If this algorithm does not break down at some iteration k, an approximate solution of (1) is computed, in the sense that (13) holds. The second is to use the recurrence (2). The global method is based on the monotone behavior of f(x_k). In many cases, imposing f(x_{k+1}) < f(x_k) for all k is not satisfactory. In fact, at least for smooth problems, efficient local methods frequently converge rapidly to a solution even though the generated sequence {x_k} does not exhibit monotone behavior in f(x_k). In these cases, the pure local method is much more efficient than the monotone global method. Often, the monotone method converges to a local (nonglobal) minimizer of f, while the local method converges to a solution of (1). For these reasons, it is necessary to give the local method a chance before calling the minimization algorithm. This necessity
has been considered by several authors (see [19]). Here we describe a strategy that combines local algorithms and minimization methods. A similar strategy has been used in [9] for some overdetermined systems coming from inverse problems, and in [12] for large-scale differentiable nonlinear systems. Let us define "ordinary iterations" and "special iterations". By an ordinary iteration we understand an iteration produced by any local (quasi-Newton) method, like the ones described in Section 4. A special iteration is an iteration produced by Algorithm 3.1. We define, for all k ∈ ℕ,

    w_k = Argmin {f(x_0), . . . , f(x_k)}.

For completeness, we define f(w_k) = f(x_0) if k < 0. Ordinary and special iterations are combined by the following algorithm.

Algorithm 5.1. Initialize k ← 0, FLAG ← 1. Let q ≥ 0 be an integer and γ ∈ (0,1).

Step 1. If FLAG = 1, obtain x_{k+1} using an ordinary iteration. Else, obtain x_{k+1} using a special iteration.

Step 2. If

    f(x_{k+1}) ≤ γ f(w_{k+1−q}),    (27)

set FLAG ← 1, k ← k + 1 and go to Step 1. Else, re-define x_{k+1} ← w_{k+1}, set FLAG ← 0, k ← k + 1 and go to Step 1.

If the test (27) is satisfied infinitely many times, then there exists a subsequence of {x_k} such that lim_{k→∞} f(x_k) = 0.
Conversely, if (27) fails for all k ≥ k_0, then all the iterations from the k_0-th on will be special, and the convergence properties of the sequence will be those of Algorithm 3.1. The parameters γ and q measure the degree of tolerance of the local algorithm. If q is large, we admit a large number of iterations without enough progress. On the other hand, γ says what we mean by "enough progress": the closer γ ∈ (0,1) is to unity, the more tolerant the algorithm becomes. Therefore, the parameters γ and q are essential to a satisfactory performance of our tolerant strategy.
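The tolerant strategy can be sketched as the following driver. This is our own hedged reading of Algorithm 5.1: `local_step` and `special_step` stand for an ordinary (quasi-Newton) iteration and a special (Algorithm 3.1) iteration, and the convention f(w_j) = f(x_0) for j < 0 in the reference value of (27) is one plausible reading of the text.

```python
def tolerant_solve(f, local_step, special_step, x0,
                   gamma=0.9, q=5, max_iter=200, ftol=1e-12):
    """Sketch of Algorithm 5.1: combine 'ordinary' local iterations with
    'special' globalized iterations, using tolerance test (27)."""
    fs = [f(x0)]
    w, fw = x0, fs[0]        # w_k: best point among x_0, ..., x_k
    best_hist = [fw]         # best_hist[k] = f(w_k)
    x, flag = x0, 1
    for k in range(max_iter):
        x_new = local_step(x) if flag == 1 else special_step(x)
        f_new = f(x_new)
        if f_new < fw:
            w, fw = x_new, f_new
        j = k + 1 - q                      # reference index for (27)
        ref = fs[0] if j < 0 else best_hist[j]
        if f_new <= gamma * ref:           # tolerance test (27)
            flag = 1
        else:
            x_new, f_new = w, fw           # re-define x_{k+1} <- w_{k+1}
            flag = 0                       # next iteration is special
        x = x_new
        fs.append(f_new)
        best_hist.append(fw)
        if fw <= ftol:
            break
    return x
```

A driver like this never lets the merit value drift too far above the best value seen q iterations ago, while still tolerating nonmonotone local steps.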
6  Numerical Experiments
In a recent paper, Luksan [22] published a collection of 17 large-scale differentiable nonlinear systems of equations. For each of Luksan's problems

    g_i(x) = 0,  i = 1, . . . , n,
(28)
we associated a semismooth system of equations, defining

    x_* = (1, 0, 1, 0, . . .)^T
(29)
and, for all i = 1, . . . , n,

    h_i(x) = g_i(x) − g_i(x_*)       if i is odd,
    h_i(x) = g_i(x) − g_i(x_*) + 1   if i is even.

Finally, we defined F(x) = (f_1(x), . . . , f_n(x))^T, where, for all i = 1, . . . , n,

    f_i(x) = min {x_i, h_i(x)}.    (30)
In this way, x_* is a solution of the system F(x) = 0, which is equivalent to the nonlinear complementarity problem (see [33], [34])

    x ≥ 0,  h(x) ≥ 0,  ⟨x, h(x)⟩ = 0.
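The construction (28)-(30) can be sketched directly. Below is a minimal illustration (our own; the helper name is hypothetical) that builds the semismooth residual F_i(x) = min{x_i, h_i(x)} from a smooth system g; the indices are 1-based in the paper, so Python index j corresponds to i = j + 1.

```python
import numpy as np

def make_semismooth_F(g):
    """Build F of (30) from a smooth system g(x) = 0, following (28)-(30):
    x_* = (1, 0, 1, 0, ...)^T, h_i(x) = g_i(x) - g_i(x_*) (+1 for even i),
    and F_i(x) = min(x_i, h_i(x))."""
    def F(x):
        n = x.size
        i = np.arange(1, n + 1)                       # 1-based indices
        x_star = np.where(i % 2 == 1, 1.0, 0.0)       # (1, 0, 1, 0, ...)
        shift = np.where(i % 2 == 0, 1.0, 0.0)        # +1 on even components
        h = g(x) - g(x_star) + shift
        return np.minimum(x, h)                       # componentwise min
    return F
```

By construction x_* satisfies the complementarity conditions: on odd components x_i = 1 and h_i(x_*) = 0, on even components x_i = 0 and h_i(x_*) = 1, so min{x_i, h_i(x_*)} = 0 in both cases.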
We used the same initial points as [22]. We worked on a SUN SparcStation 2 and used the SUN FORTRAN compiler, with single precision in the ordinary iterations and double precision in the special iterations. In the purely local methods, convergence is declared at x_k whenever ||F(x_k)||_2 ≤ √n · 10^{-5}, and divergence is declared if either the number of iterations performed is greater than 100 or ||F(x_k)||_∞ > 10^{20}. In the globalized methods, convergence at iteration k is declared if ||F(x_k)||_2 ≤ √n · 10^{-5} or Δ_k < 10^{-10}. In the former case we have convergence to a solution of (1), whereas in the latter x_k is a "strangled point", stationary for problem (6) but not necessarily a solution of (1), which corresponds to a breakdown at Step 2 of Algorithm 3.1. The results of the global methods are sensitive to the choice of the parameters γ and q. We used γ = 0.9 and q = 5 for all tests. The results of this set of experiments are summarized in Tables 6.1, 6.2 and 6.3, where, respectively, we present the performance of Newton's method, Broyden's method and the Column-Updating method, both in their local and global versions. In the global version, every cycle of local iterations is executed with a Newton initialization. In our tables we adopted the following notation to present the numerical results: (IT, NFI, T) for the local methods and (IT (N,G), EVALF, NFI, T) for the global ones, where IT represents the total number of iterations performed; NFI = ||F(x̄)||_2/√n, where x̄ is the vector obtained in the last iteration; T is the total time spent (in seconds) in the numerical phase; N is the total number of Newton iterations; G is the total number of special iterations; and EVALF is the total number of evaluations of f(x). We observe that the symbolic phase is performed twice for each problem, since we have two different dimensions (n = 100 and n = 1000). For the smaller dimension, it lasts 0.01 seconds on average.
For n = 1000, the average time spent in the symbolic phase is 0.1 seconds.
Problem | n    | Local Version (IT, NFI, T) | Global Version (IT (N,G), EVALF, NFI, T)
1       | 100  | (100, 0.5E-1, 1.04)        | (24(10,14), 276, 0.2E-1, 3.1)
1       | 1000 | (4, 0.2E-8, 0.34)          | (4(4,0), 5, 0.2E-8, 0.34)
2       | 100  | (3, 0.3E-8, 0.04)          | (3(3,0), 4, 0.3E-8, 0.04)
2       | 1000 | (2, 0.1E-6, 0.19)          | (2(2,0), 3, 0.1E-6, 0.23)
3       | 100  | (100, 0.2E+1, 1.16)        | (37(7,30), 642, 0.1, 8.01)
3       | 1000 | (100, 0.2E+2, 9.52)        | (100(5,95), 1605, 0.2, 2168.7)
4       | 100  | (15, 0.9E-6, 0.15)         | (11(8,3), 28, 0.0, 0.35)
4       | 1000 | (15, 0.3E-6, 1.27)         | (23(15,8), 55, 0.0, 7.01)
5       | 100  | (51, 0.5E-5, 0.59)         | (35(22,13), 112, 0.4E-5, 6.72)
5       | 1000 | (100, 0.3, 10.72)          | (59(38,21), 145, 0.9E-5, 96.92)
6       | 100  | (100, 0.1E+5, 0.83)        | (66(10,56), 1178, 0.6, 21.1)
6       | 1000 | (58, 0.1E+13, 3.72)        | (12(4,8), 189, 0.1E+1, 20.4)
7       | 100  | (100, 0.2E+9, 0.81)        | (11(11,3), 31, 0.2E-11, 0.34)
7       | 1000 | (100, 0.6E+8, 6.76)        | (11(9,2), 30, 0.2E-6, 3.1)
8       | 100  | (100, 0.2E+9, 1.18)        | (11(9,2), 25, 0.1E-5, 0.55)
8       | 1000 | (100, 0.7E+8, 9.86)        | (18(13,5), 46, 0.1E-6, 9.84)
9       | 100  | (100, 0.5E+6, 1.43)        | (28(18,10), 82, 0.1E-5, 2.2)
9       | 1000 | (100, 0.2E+6, 13.34)       | (27(17,10), 83, 0.1E-6, 25.8)
10      | 100  | (5, 0.5E-6, 0.05)          | (5(5,0), 6, 0.5E-6, 0.07)
10      | 1000 | (5, 0.3E-6, 0.48)          | (5(5,0), 6, 0.3E-6, 0.49)
11      | 100  | (1, 0.0, 0.01)             | (1(1,0), 2, 0.0, 0.01)
11      | 1000 | (1, 0.0, 0.05)             | (1(1,0), 2, 0.0, 0.04)
12      | 100  | (6, 0.1E-11, 0.04)         | (6(6,0), 7, 0.1E-11, 0.05)
12      | 1000 | (6, 0.1E-11, 0.36)         | (6(6,0), 7, 0.1E-11, 0.36)
13      | 100  | (100, 0.2E-3, 0.62)        | (12(10,2), 31, 0.0, 0.24)
13      | 1000 | (100, 0.2E-3, 4.36)        | (11(9,2), 32, 0.9E-5, 2.52)
14      | 100  | (100, 0.1E+1, 0.76)        | (29(7,22), 489, 0.3, 4.3)
14      | 1000 | (100, 0.1E+1, 6.35)        | (34(10,14), 507, 0.3, 47.5)
15      | 100  | (100, 0.1E+8, 2.52)        | (22(16,6), 61, 0.2E-6, 4.2)
15      | 1000 | (100, 0.2E+9, 23.8)        | (25(19,6), 55, 0.2E-6, 135.7)
16      | 100  | (4, 0.2E-7, 0.03)          | (4(4,0), 5, 0.2E-7, 0.04)
16      | 1000 | (4, 0.1E-6, 0.26)          | (4(4,0), 5, 0.1E-6, 0.25)
17      | 100  | (100, 0.4E+15, 0.85)       | (100(8,92), 108, 0.5E-1, 59.4)
17      | 1000 | overflow                   | overflow

Table 6.1 - Newton's Method: First set of problems
Problem | n    | Local Version (IT, NFI, T) | Global Version (IT (N,G), EVALF, NFI, T)
1       | 100  | (7, 0.5E-6, 0.06)          | (7(1,0), 8, 0.5E-6, 0.06)
1       | 1000 | (100, 0.8E+6, 14.1)        | (10(2,3), 12, 0.3E-6, 0.98)
2       | 100  | (100, 0.2E+7, 1.61)        | (10(1,1), 12, 0.3E-15, 0.1)
2       | 1000 | (100, 0.7E+6, 14.5)        | (10(1,1), 12, 0.2E-17, 0.9)
3       | 100  | (100, 0.2E+3, 1.66)        | (41(6,34), 702, 0.1, 9.15)
3       | 1000 | (100, 0.4E+3, 14.68)       | (100(4,95), 1605, 0.2, 2170.3)
4       | 100  | (67, 0.2E+26, 0.84)        | (20(7,6), 47, 0.1E-8, 0.56)
4       | 1000 | (14, 0.1E+14, 0.82)        | (17(4,3), 34, 0.8E-7, 3.67)
5       | 100  | (5, 0.2E+35, 0.06)         | (37(14,10), 147, 0.6E-5, 7.3)
5       | 1000 | (100, 0.5E+22, 0.26)       | (85(27,50), 420, 0.8E-5, 868.)
6       | 100  | (100, 0.2E+1, 1.49)        | (67(9,56), 1179, 0.6, 21.5)
6       | 1000 | (100, 0.1E+1, 13.4)        | (13(3,8), 190, 0.1E+1, 20.3)
7       | 100  | (100, 0.1E+11, 1.53)       | (29(9,8), 62, 0.1E-5, 0.7)
7       | 1000 | (100, 0.9E+11, 13.8)       | (42(13,12), 112, 0.7E-5, 17.9)
8       | 100  | (100, 0.7E-2, 1.64)        | (30(10,9), 77, 0.2E-6, 1.35)
8       | 1000 | (92, 0.8E-5, 12.44)        | (33(9,8), 73, 0.6E-12, 12.15)
9       | 100  | (100, 0.6, 1.64)           | (59(18,26), 210, 0.2E-7, 5.37)
9       | 1000 | (100, 0.2E-1, 14.8)        | (61(19,28), 222, 0.9E-5, 50.3)
10      | 100  | (17, 0.5E-5, 0.13)         | (17(1,0), 18, 0.5E-5, 0.13)
10      | 1000 | (20, 0.3E-5, 1.23)         | (20(1,0), 21, 0.3E-5, 1.22)
11      | 100  | (1, 0.0, 0.01)             | (1(1,0), 2, 0.0, 0.01)
11      | 1000 | (1, 0.0, 0.04)             | (1(1,0), 2, 0.0, 0.04)
12      | 100  | (12, 0.1E-7, 0.07)         | (12(1,0), 13, 0.1E-7, 0.07)
12      | 1000 | (12, 0.5E-6, 0.55)         | (12(1,0), 13, 0.5E-6, 0.54)
13      | 100  | (13, 0.5E+3, 0.08)         | (13(4,3), 31, 0.0, 0.25)
13      | 1000 | (100, 0.5E+3, 12.3)        | (13(4,3), 32, 0.9E-5, 2.45)
14      | 100  | (100, 0.4E+1, 1.54)        | (30(6,22), 490, 0.3, 4.31)
14      | 1000 | (100, 0.1E+1, 13.5)        | (34(8,24), 507, 0.3, 47.2)
15      | 100  | (100, 0.5E+11, 1.85)       | (29(7,6), 50, 0.3E-7, 2.03)
15      | 1000 | (100, 0.1E+12, 17.0)       | (33(11,10), 70, 0.1E-5, 169.7)
16      | 100  | (15, 0.2E-5, 0.10)         | (15(1,0), 16, 0.2E-5, 0.09)
16      | 1000 | (32, 0.7E-5, 2.07)         | (15(2,1), 22, 0.2E-6, 22.5)
17      | 100  | (100, 0.4E+17, 1.57)       | (100(7,92), 108, 0.5E-1, 59.3)
17      | 1000 | overflow                   | overflow

Table 6.2 - Broyden's Method: First set of problems
Problem | n    | Local Version (IT, NFI, T) | Global Version (IT (N,G), EVALF, NFI, T)
1       | 100  | (9, 0.2E-5, 0.07)          | (9(1,0), 10, 0.2E-5, 0.06)
1       | 1000 | (13, 0.1E-5, 0.57)         | (5(2,1), 7, 0.1E-5, 0.97)
2       | 100  | (100, 0.9E+6, 1.18)        | (11(2,1), 13, 0.3E-7, 1.23)
2       | 1000 | (100, 0.7E+7, 9.14)        | (11(2,1), 13, 0.8E-5, 6.84)
3       | 100  | (100, 0.6E+1, 1.08)        | (41(6,34), 702, 0.1, 9.3)
3       | 1000 | (100, 0.5E+3, 9.14)        | (100(5,95), 1605, 0.2, 2169.3)
4       | 100  | (54, 0.4E+18, 0.47)        | (20(7,6), 47, 0.1E-6, 0.54)
4       | 1000 | (54, 0.1E+18, 3.79)        | (29(3,3), 44, 0.9E-6, 4.15)
5       | 100  | (3, 0.2E+27, 0.03)         | (35(14,13), 145, 0.9E-5, 7.31)
5       | 1000 | (3, 0.6E+26, 0.18)         | (99(36,46), 422, 0.8E-5, 971.9)
6       | 100  | (100, 0.1E+1, 1.6)         | (67(9,56), 1179, 0.6, 21.1)
6       | 1000 | (100, 0.1E+1, 14.4)        | (13(3,8), 190, 0.1E+1, 20.3)
7       | 100  | (100, 0.5E+20, 1.00)       | (30(9,8), 63, 0.1E-5, 0.7)
7       | 1000 | (100, 0.8E+20, 8.41)       | (37(14,13), 110, 0.1E-5, 18.2)
8       | 100  | (100, 0.2E+4, 1.05)        | (27(10,9), 74, 0.2E-11, 1.33)
8       | 1000 | (100, 0.2E+4, 8.77)        | (29(6,5), 54, 0.7E-7, 7.49)
9       | 100  | (100, 1.36, 1.11)          | (53(16,25), 194, 0.2E-11, 4.71)
9       | 1000 | (100, 0.1E+1, 9.59)        | (54(16,24), 192, 0.7E-11, 43.4)
10      | 100  | (13, 0.3E-5, 0.09)         | (13(1,0), 14, 0.3E-5, 0.1)
10      | 1000 | (13, 0.3E-5, 0.58)         | (13(1,0), 14, 0.3E-5, 0.6)
11      | 100  | (1, 0.0, 0.01)             | (1(1,0), 2, 0.0, 0.01)
11      | 1000 | (1, 0.0, 0.05)             | (1(1,0), 2, 0.0, 0.04)
12      | 100  | (12, 0.5E-6, 0.05)         | (12(1,0), 13, 0.5E-6, 0.08)
12      | 1000 | (12, 0.5E-6, 0.45)         | (12(1,0), 13, 0.5E-6, 0.45)
13      | 100  | (100, 0.9E+3, 0.8)         | (13(3,2), 27, 0.6E-5, 0.21)
13      | 1000 | (100, 0.9E+3, 6.4)         | (14(3,2), 30, 0.0, 2.16)
14      | 100  | (100, 0.1E+4, 0.99)        | (30(6,22), 490, 0.3, 4.34)
14      | 1000 | (100, 0.8E+3, 7.96)        | (34(8,24), 507, 0.3, 47.2)
15      | 100  | (100, 0.2E+11, 1.25)       | (23(7,6), 47, 0.5E-6, 1.99)
15      | 1000 | (100, 0.4E+12, 11.7)       | (34(11,10), 71, 0.1E-5, 170.4)
16      | 100  | (63, 0.2E-5, 0.51)         | (20(6,6), 58, 0.3E-7, 2.41)
16      | 1000 | (34, 0.5E-5, 1.55)         | (18(2,1), 20, 0.3E-6, 0.93)
17      | 100  | (100, 0.3E+16, 0.96)       | (100(7,92), 108, 0.5E-1, 59.4)
17      | 1000 | overflow                   | overflow

Table 6.3 - Column-Updating Method: First set of problems
We proceed to the analysis of the numerical results. In the local version of Newton's method, 14 cases converged out of the 34 tests performed. For these 14 successful cases, the global version of Newton's method reproduced exactly the same iterations as the local version in 11 tests. In 2 other cases, the global version reached convergence in a smaller number of iterations than the local one, but with an increase in the total execution time. In only one case was the performance of the global version worse than that of the local version: problem 4 with n = 1000 required more iterations to reach the same solution of (1). In the 20 tests in which the local version of Newton's method failed to converge, the global version effectively worked in 11 tests, reaching the solution of system (1). In 6 other cases the sequence generated by the global version converged to a stationary point of (6). In the remaining 3 tests, interruption by overflow occurred or the maximum number of iterations was reached. Comparing the quasi-Newton methods, their performances were practically identical, with a slight advantage for the Column-Updating method in terms of running time, since in that method the number of operations is smaller than in Broyden's algorithm. This fact becomes more noticeable as the dimension increases. In the local version of the quasi-Newton methods, 48 cases failed out of the 68 tests performed. For these 48 cases, the global version reached convergence to a solution of (1) in 32 tests. Ten other cases converged to stationary points of (6) and, finally, there was failure in the remaining 6 tests. As regards the 20 tests in which the local version of the quasi-Newton methods converged, the global version reproduced the sequence of local iterations in 15 cases.
In the 5 remaining cases, analogously to the behavior of the global Newton's method, a solution of the system (1) was obtained in fewer iterations but with more time than in the local version. Comparing Newton and quasi-Newton methods, we observe that they behave almost in the same way. In numbers, the local version of Newton's method performed successfully in roughly 40% of the tests (14 out of 34), while the local quasi-Newton methods converged in roughly 30% of the tests (20 out of 68). Therefore, the globalization is as effective for the quasi-Newton methods as it is for Newton's algorithm. Such behavior is quite different from the numerical results of [12], where the differentiable Luksan problems [22] were used. For Newton's method, 15 out of 34 local tests and 24 out of 34 global ones had the same performance, in terms of convergence or divergence, in both the differentiable and the nondifferentiable problems. On the other hand, for the quasi-Newton methods, 53 out of 68 local tests and 47 out of 68 global ones performed analogously in both cases; Broyden's method and the Column-Updating method behave in the same way. Through these figures, we can see that Newton's method is not as robust for nondifferentiable problems as it is for differentiable ones. In fact, in the present experiments, although the local version of Newton's method achieves slightly better results than the local quasi-Newton methods, the globalization is considerably more effective for the quasi-Newton methods than for Newton's method.
We performed a second set of experiments, where g_i, x_* and f_i are given by (28), (29) and (30), and h_i is defined by

    h_i(x) = g_i(x) − g_i(x_*)       if i is odd or i > n/2,
    h_i(x) = g_i(x) − g_i(x_*) + 1   otherwise.

In this case, the function F is not differentiable at x_*, since h_i(x_*) = [x_*]_i = 0 if i is even and i > n/2. In Tables 6.4, 6.5 and 6.6 we report the performance of Newton's method, Broyden's method and the Column-Updating method, for n = 100.
Problem | Newton Local Version      | Newton Global Version
1       | (4, 0.1E-15, 0.05)        | (4(4,0), 5, 0.1E-15, 0.05)
2       | (100, 0.2E-1, 1.19)       | (32(13,19), 336, 0.1E-1, 5.83)
3       | (100, 0.2E+1, 1.13)       | (41(9,32), 657, 0.7E-1, 7.19)
4       | (15, 0.9E-6, 0.16)        | (11(8,3), 28, 0.0, 0.33)
5       | (86, 0.3E-6, 0.99)        | (100(43,57), 485, 0.3E-4, 53.1)
6       | (100, 0.6E+4, 0.83)       | (100(10,90), 2058, 0.2E+1, 26.0)
7       | (100, 0.1E+8, 0.85)       | (14(11,3), 30, 0.3E-10, 0.37)
8       | (100, 0.2E+9, 1.09)       | (11(9,2), 25, 0.5E-5, 0.68)
9       | (100, 0.2E+9, 1.68)       | (100(35,65), 698, 0.3, 27.9)
10      | (5, 0.3E-6, 0.07)         | (5(5,0), 6, 0.3E-6, 0.05)
11      | (1, 0.0, 0.01)            | (1(1,0), 2, 0.0, 0.01)
12      | (100, 0.3E+10, 0.81)      | (6(5,1), 14, 0.1E-5, 0.18)
13      | (100, 0.2E-3, 0.62)       | (12(10,2), 31, 0.0, 0.3)
14      | (100, 0.2E+1, 0.85)       | (69(14,55), 1226, 0.2, 37.9)
15      | (100, 0.5E+7, 2.6)        | (26(20,6), 60, 0.2E-6, 3.04)
16      | (4, 0.1E-6, 0.03)         | (4(4,0), 5, 0.1E-6, 0.04)
17      | (100, 0.9E+16, 0.86)      | (100(10,90), 110, 0.2E+6, 53.7)

Table 6.4 - Newton's Method: Second set of problems
Problem | Broyden Local Version     | Broyden Global Version
1       | (100, 0.1E+7, 1.57)       | (11(2,1), 13, 0.6E-5, 0.18)
2       | (100, 0.1E+7, 1.57)       | (50(9,29), 483, 0.1E-1, 7.17)
3       | (100, 0.1E+3, 1.61)       | (44(8,35), 682, 0.7E-1, 7.21)
4       | (31, 0.3E+20, 0.28)       | (14(6,5), 39, 0.2E-5, 0.5)
5       | (5, 0.2E+31, 1.04)        | (36(15,16), 143, 0.9E-5, 7.41)
6       | (100, 0.3E+1, 1.54)       | (100(9,88), 2032, 0.2E+1, 24.5)
7       | (100, 0.2E+11, 1.47)      | (31(11,10), 77, 0.6E-5, 1.32)
8       | (100, 0.2, 1.56)          | (30(10,9), 77, 0.2E-6, 1.31)
9       | (100, 0.4E+0, 1.64)       | (33(7,16), 325, 0.2, 6.31)
10      | (20, 0.6E-5, 0.16)        | (10(2,1), 12, 0.3E-5, 0.12)
11      | (1, 0.0, 0.01)            | (1(1,0), 2, 0.0, 0.01)
12      | (100, 0.6E-3, 1.55)       | (10(2,1), 12, 0.3E-6, 0.09)
13      | (100, 0.3E+3, 1.5)        | (13(4,3), 30, 0.4E-8, 0.3)
14      | (100, 0.3E+1, 1.62)       | (70(10,55), 1227, 0.2, 38.3)
15      | (100, 0.3E+11, 1.90)      | (30(10,9), 65, 0.4E-7, 4.38)
16      | (100, 0.3E-1, 1.56)       | (17(2,1), 19, 0.3E-6, 1.15)
17      | (100, 0.5E+17, 1.57)      | (100(10,90), 110, 0.2E+6, 52.9)

Table 6.5 - Broyden's Method: Second set of problems

Problem | Column-Updating Local Version | Column-Updating Global Version
1       | (100, 0.4E+7, 1.05)           | (11(2,1), 13, 0.3E-5, 0.17)
2       | (100, 0.9E+7, 1.11)           | (45(7,25), 458, 0.1E-1, 6.96)
3       | (100, 0.4E+3, 1.15)           | (44(8,35), 682, 0.7E-1, 7.16)
4       | (87, 0.5E+23, 1.86)           | (14(6,5), 39, 0.2E-5, 0.5)
5       | (3, 0.2E+27, 1.02)            | (40(14,15), 146, 0.9E-5, 6.97)
6       | (100, 0.2E+2, 1.05)           | (100(9,87), 2006, 0.2E+1, 24.6)
7       | (100, 0.5E+12, 1.00)          | (32(11,10), 78, 0.8E-6, 1.28)
8       | (100, 0.2E+1, 1.02)           | (26(9,8), 68, 0.4E-6, 1.25)
9       | (100, 2.4, 1.21)              | (60(20,28), 219, 0.2E-5, 5.92)
10      | (14, 0.7E-5, 0.07)            | (14(1,0), 15, 0.7E-5, 0.1)
11      | (1, 0.0, 0.01)                | (1(1,0), 2, 0.0, 0.01)
12      | (100, 0.8E+1, 0.89)           | (14(2,1), 23, 0.2E-5, 0.19)
13      | (100, 0.9E+3, 0.8)            | (16(3,2), 30, 0.6E-6, 0.23)
14      | (100, 0.7E+3, 1.19)           | (40(5,26), 483, 0.3, 9.79)
15      | (100, 0.8E+11, 1.54)          | (23(6,5), 41, 0.7E-7, 1.4)
16      | (100, 0.2E+0, 1.03)           | (14(2,1), 16, 0.8E-7, 0.64)
17      | (100, 0.2E+17, 0.94)          | (100(10,90), 110, 0.2E+6, 54.4)

Table 6.6 - Column-Updating Method: Second set of problems
In more than 50% of these experiments, the solution obtained was different from the nondifferentiable point x_*. A summary of these cases is given in Table 6.7. We also observe, in this set of experiments, that the pure local versions of the methods are far less efficient than the global versions.

Execution      | Method          | Convergence to a solution x ≠ x_* | Convergence to x_* | Convergence to a stationary point
Local Version  | Newton          | 4 | 2 | -
               | Broyden         | 2 | 0 | -
               | Column-Updating | 2 | 0 | -
Global Version | Newton          | 4 | 6 | 3
               | Broyden         | 6 | 5 | 4
               | Column-Updating | 6 | 6 | 3

Table 6.7 - Summary of the performance of the methods when the function is not differentiable at the solution x_*

Following a suggestion of a referee, we observed the rate ||x_{k+1} − x_*|| / ||x_k − x_*|| at the final iterations of the three methods tested. A typical result is shown in Table 6.8. Here, the execution reported is global, but the final four iterations are purely local. In this case, the practical behavior of the rate seems to reflect theoretical superlinearity. This property, which has been proved in [36] for Newton's method, probably holds for quasi-Newton methods in many particular situations. In fact, the local convergence properties of quasi-Newton methods for nonsmooth systems represent one of the most challenging problems in this research area.

Newton        | Broyden       | Column-Updating
0.7972983E-01 | 0.7176153E-01 | 0.7176153E-01
0.1338583E+00 | 0.8193998E-01 | 0.6818981E-01
0.1487052E-01 | 0.5369848E-02 | 0.4678174E-02
0.2918130E-03 | 0.2844602E-01 | 0.2814408E-01

Table 6.8 - Convergence rate ||x_{k+1} − x_*|| / ||x_k − x_*|| for the last four iterations of the global versions of the Newton, Broyden and Column-Updating methods for Problem 4
References

[1] C. G. Broyden, A class of methods for solving nonlinear simultaneous equations, Mathematics of Computation 19 (1965) 577-593.
[2] C. G. Broyden, J. E. Dennis and J. J. More, On the local and superlinear convergence of quasi-Newton methods, Journal of the Institute of Mathematics and Applications 12 (1973) 223-246.
[3] X. Chen, On the convergence of Broyden-like methods for nonlinear equations with nondifferentiable terms, Annals of the Institute of Statistical Mathematics 42 (1990) 387-401.
[4] X. Chen and L. Qi, A parameterized Newton method and a quasi-Newton method for nonsmooth equations, Computational Optimization and Applications 3 (1994) 157-179.
[5] X. Chen and T. Yamamoto, On the convergence of some quasi-Newton methods for nonlinear equations with nondifferentiable operators, Computing 48 (1992) 87-94.
[6] J. E. Dennis and J. J. More, Quasi-Newton methods, motivation and theory, SIAM Review 19 (1977) 46-89.
[7] J. E. Dennis and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations (Prentice-Hall, Englewood Cliffs, New Jersey, 1983).
[8] J. E. Dennis and H. F. Walker, Convergence theorems for least-change secant update methods, SIAM Journal on Numerical Analysis 18 (1981) 949-987.
[9] M. A. Diniz-Ehrhardt and J. M. Martinez, A parallel projection method for overdetermined nonlinear systems of equations, Numerical Algorithms 4 (1993) 241-262.
[10] S. C. Eisenstat and H. F. Walker, Globally convergent inexact Newton methods, Research Report, Department of Mathematics and Statistics, Utah State University, USA.
[11] R. Fletcher, Practical Methods of Optimization (2nd edition) (John Wiley and Sons, New York, 1987).
[12] A. Friedlander, M. A. Gomes-Ruggiero, J. M. Martinez and S. A. Santos, A new globalization strategy for the resolution of nonlinear systems of equations, Relatorio de Pesquisa RP04/94, Institute of Mathematics, University of Campinas, Brazil, 1994.
[13] A. Friedlander and J. M. Martinez, On the maximization of a concave quadratic function with box constraints, to appear in SIAM Journal on Optimization.
[14] A. Friedlander, J. M. Martinez and S. A. Santos, A new trust region algorithm for bound constrained minimization, to appear in Journal of Applied Mathematics & Optimization.
[15] A. George and E. Ng, Symbolic factorization for sparse Gaussian elimination with partial pivoting, SIAM Journal on Scientific and Statistical Computing 8 (1987) 877-898.
[16] G. H. Golub and Ch. F. Van Loan, Matrix Computations (The Johns Hopkins University Press, Baltimore and London, 1989).
[17] M. A. Gomes-Ruggiero and J. M. Martinez, The Column-Updating Method for solving nonlinear equations in Hilbert space, RAIRO Mathematical Modelling and Numerical Analysis 26 (1992) 309-330.
[18] M. A. Gomes-Ruggiero, J. M. Martinez and A. C. Moretti, Comparing algorithms for solving sparse nonlinear systems of equations, SIAM Journal on Scientific and Statistical Computing 13 (1992) 459-483.
[19] L. Grippo, F. Lampariello and S. Lucidi, A nonmonotone line search technique for Newton's method, SIAM Journal on Numerical Analysis 23 (1986) 707-716.
[20] S. P. Han, J. S. Pang and N. Rangaraj, Globally convergent Newton methods for nonsmooth equations, Mathematics of Operations Research 17 (1992) 586-607.
[21] C. M. Ip and J. Kyparisis, Local convergence of quasi-Newton methods for B-differentiable equations, Mathematical Programming 56 (1992) 71-90.
[22] L. Luksan, Inexact trust region method for large sparse systems of nonlinear equations, Technical Report no. 547, January 1993, Institute of Computer Science, Academy of Sciences of the Czech Republic.
[23] J. M. Martinez, A quasi-Newton method with modification of one column per iteration, Computing 33 (1984) 353-362.
[24] J. M. Martinez, On the convergence of the Column-Updating Method, Matematica Aplicada e Computacional 12 (1993) 83-95.
[25] J. M. Martinez, Local convergence theory for inexact Newton methods based on structural least-change updates, Mathematics of Computation 55 (1990) 143-168.
[26] J. M. Martinez, On the relation between two local convergence theories of least change secant update methods, Mathematics of Computation 59 (1992) 457-481.
[27] J. M. Martinez, A theory of secant preconditioners, Mathematics of Computation 60 (1993) 681-698.
[28] J. M. Martinez and L. Qi, Inexact Newton methods for solving nonsmooth equations, Relatorio de Pesquisa 67/93, Institute of Mathematics, University of Campinas, Brazil, 1993, to appear in Journal of Computational and Applied Mathematics.
[29] J. M. Martinez and M. C. Zambaldi, Least change update methods for nonlinear systems with nondifferentiable terms, Numerical Functional Analysis and Optimization 14 (1993) 405-415.
[30] H. Matthies and G. Strang, The solution of nonlinear finite element equations, International Journal of Numerical Methods in Engineering 14 (1979) 1613-1626.
[31] R. Mifflin, Semismooth and semiconvex functions in constrained optimization, SIAM Journal on Control and Optimization 15 (1977) 957-972.
[32] J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables (Academic Press, New York, 1970).
[33] J. S. Pang, Newton's method for B-differentiable equations, Mathematics of Operations Research 15 (1990) 311-341.
[34] J. S. Pang, A B-differentiable equation based, globally and locally quadratically convergent algorithm for nonlinear programs, complementarity and variational inequality problems, Mathematical Programming 51 (1991) 101-131.
[35] L. Qi, Convergence analysis of some algorithms for solving nonsmooth equations, Mathematics of Operations Research 18 (1993) 227-244.
[36] L. Qi and J. Sun, A nonsmooth version of Newton's method, Mathematical Programming 58 (1993) 353-368.
Approximate Newton Methods
Recent Advances in Nonsmooth Optimization, pp. 141-158 Eds. D.-Z. Du, L. Qi and R.S. Womersley ©1995 World Scientific Publishing Co Pte Ltd
Superlinear Convergence of Approximate Newton Methods for LC¹ Optimization Problems without Strict Complementarity¹

Jiye Han and Defeng Sun
Institute of Applied Mathematics, Academia Sinica, Beijing 100080, P. R. China
Abstract

In this paper, the Q-superlinear convergence of the approximate Newton or SQP methods for solving LC¹ optimization problems is established under the assumptions that the derivatives of the objective and constraint functions are semismooth, that the strong second-order sufficiency condition is satisfied, and that the gradients of the active constraints are linearly independent. The strong second-order sufficiency condition is weaker than the second-order sufficiency condition together with the strict complementarity condition.
1  Introduction

Consider the standard nonlinear programming problem
}(x) g(x) < 0, h(x) = 0,
(1-1)
where f, g and h are differentiable functions from ℝ^n into ℝ, ℝ^p and ℝ^q, respectively. One method for solving (1.1) is to solve the following linearly constrained quadratic program Q_k
    minimize    ∇f(x_k)^T (x − x_k) + (1/2)(x − x_k)^T G_k (x − x_k)
    subject to  g(x_k) + ∇g(x_k)^T (x − x_k) ≤ 0,
                h(x_k) + ∇h(x_k)^T (x − x_k) = 0     (1.2)

¹This work is supported by the National Natural Science Foundation of China.
J. Han and D. Sun
successively. Here G_k is an n × n matrix. This method is called an approximate Newton method or an SQP (sequential quadratic programming) method. If G_k is exactly the second-order derivative of the Lagrangian at x_k, this is Wilson's method; see Garcia Palomares and Mangasarian (Ref. 4) and Robinson (Refs. 21-22). Before the advent of the very recent paper by Qi (Ref. 19), the proofs of superlinear convergence of such approximate Newton or SQP methods for solving nonlinear programming problems required the objective and constraint functions to be twice smooth. Sometimes the second-order derivatives of those functions are required to be Lipschitzian; see, for example, Garcia Palomares and Mangasarian (Ref. 4), Han (Ref. 5), McCormick (Ref. 9) and Robinson (Refs. 21-22). However, second-order differentiability may not hold for some problems. For example, the extended linear-quadratic programming problem, which recently emerged in stochastic programming and optimal control, does not possess a twice differentiable objective function, even in the fully quadratic case. However, its objective function is differentiable and its derivative is Lipschitzian in that case; see Rockafellar (Ref. 24) or Rockafellar and Wets (Ref. 25) for details. We call a function F : ℝ^n → ℝ^m an LC¹ function if it is differentiable and its derivative function is locally Lipschitzian. We call a nonlinear programming problem an LC¹ optimization problem if its objective and constraint functions are LC¹ functions. For details on LC¹ functions and LC¹ optimization problems, see Qi (Ref. 17). In Qi (Ref.
19), the Q-superlinear convergence of the approximate Newton or SQP methods for solving LC¹ optimization problems was established under the assumption that the derivatives of the objective and constraint functions are semismooth, together with the three key assumptions that, in the context of LC¹ optimization, the second-order sufficiency condition, strict complementary slackness, and linear independence of the gradients of the active constraints are satisfied. Based on the generalized equation theory established by Robinson (Ref. 23), Josephy (Refs. 7-8) proved the local superlinear (quadratic) convergence of quasi-Newton (Newton) methods without assuming the strict complementary slackness condition, when second-order differentiability is available. Also based on Robinson's generalized equation theory (Ref. 23), and without assuming the strict complementarity condition, Lescrenier (Ref. 29) proved the convergence of a class of trust region methods proposed by Conn, Gould and Toint (Ref. 30) for optimization problems with simple bound constraints, when the objective function is twice continuously differentiable. In this paper, we discuss the superlinear convergence of approximate Newton or SQP methods for solving LC¹ optimization problems without assuming either second-order differentiability or the strict complementary slackness condition. In a certain sense, our results are the LC¹ version of the results in Josephy (Refs. 7-8), or a generalization of the results in Qi (Ref. 19) without strict complementary slackness. To achieve this, our technique is different from that of Josephy (Refs. 7-8) or Qi (Ref. 19). First we consider the superlinear
Approximate Newton
Methods
143
convergence of a generalized approximate Newton type method for solving nonsmooth equations, recently developed in Pang (Ref. 14) and Qi (Refs. 16-17). Then we prove that the approximate Newton or SQP methods are special cases of such a generalized approximate Newton method. In Section 2, we discuss the strong second-order sufficiency condition and linear independence in the context of LC$^1$ optimization. The Q-superlinear convergence of approximate Newton or SQP methods for LC$^1$ optimization is established in Section 3. In Section 4, we give some discussions.
2 The Strong Second-Order Sufficiency Condition

Throughout this paper, we assume that $f$, $g$ and $h$ in (1.1) are LC$^1$ functions.
The Lagrangian of (1.1) is $L(x,u,v) = f(x) + u^T g(x) + v^T h(x)$. Denote the gradient of $L$ with respect to $x$ by $F_{u,v}$. Then

$$F_{u,v}(x) = \nabla f(x) + \nabla g(x)u + \nabla h(x)v$$

is a locally Lipschitzian function. In Josephy (Refs. 7-8) and Robinson (Ref. 23), the two key assumptions other than second-order differentiability are the strong second-order sufficiency condition and linear independence of the gradients of the active constraints. We still need these two assumptions; however, the strong second-order sufficiency condition must be modified because we do not assume the second-order differentiability of $f$, $g$ and $h$.

In general, assume that $F: R^n \to R^m$ is locally Lipschitzian. By Rademacher's theorem, $F$ is differentiable almost everywhere. Let $D_F$ be the set where $F$ is differentiable, and let $\partial F$ be the generalized Jacobian of $F$ in the sense of Clarke (Ref. 2). Then

$$\partial F(x) = \mathrm{co}\Big\{ \lim_{x^k \to x,\ x^k \in D_F} F'(x^k) \Big\}, \qquad (2.1)$$

where $\mathrm{co}\,A$ denotes the convex hull of a set $A$. In Qi (Ref. 16) and Pang and Qi (Ref. 15), the concept

$$\partial_B F(x) = \Big\{ \lim_{x^k \to x,\ x^k \in D_F} F'(x^k) \Big\}$$

was introduced. Then $\partial F(x) = \mathrm{co}\,\partial_B F(x)$.
J. Han and D. Sun
144
For $m = 1$, $\partial_B F(x)$ was introduced by Shor (Ref. 26). Let $F_i$ denote the $i$-th component of $F$. Sun and Han (Ref. 27) introduced

$$\partial_b F(x) = \partial_B F_1(x) \times \partial_B F_2(x) \times \cdots \times \partial_B F_m(x).$$

Then $\partial_B F(x) \subseteq \partial_b F(x)$, and the converse inclusion does not hold in general. For example, if $F: R^1 \to R^2$ has the form

$$F(x) = \begin{pmatrix} \min(x, x^2) \\ \min(-x, -x^2) \end{pmatrix},$$

then

$$\partial_B F(0) = \Big\{ \begin{pmatrix} 0 \\ -1 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \end{pmatrix} \Big\}, \qquad \partial_b F(0) = \Big\{ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ -1 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \end{pmatrix} \Big\},$$

and $\partial_B F(0) \subsetneq \partial_b F(0)$. But when $m = 1$, $\partial_b F(x) = \partial_B F(x)$.
From the results of Clarke (Ref. 2), Qi (Ref. 16), and Sun and Han (Ref. 27) we know that $\partial F(x)$, $\partial_B F(x)$ and $\partial_b F(x)$ are nonempty compact subsets of $R^{m \times n}$, and that the maps $\partial F$, $\partial_B F$ and $\partial_b F$ are upper semi-continuous (Ref. 1). In fact, noting that $\partial F(x)$ and $\partial_b F(x)$ are compact subsets and that the maps $\partial F$ and $\partial_b F$ are upper semi-continuous (Ref. 2), we can draw the same conclusions for the maps $\partial_B F$ and $\partial_b F$ through standard analysis. In this paper we use $M(x,F)$ to represent one of $\partial F(x)$, $\partial_B F(x)$ and $\partial_b F(x)$, and use the multifunction $M(\cdot,F)$ to represent one of $\partial F$, $\partial_B F$ and $\partial_b F$. Therefore, $M(x,F)$ is a nonempty compact subset of $R^{m \times n}$, and the map $M(\cdot,F)$ is upper semi-continuous.

Suppose that $f_1, f_2: R^n \to R^1$ are continuously differentiable functions. Let $f_0(x) = \min(f_1(x), f_2(x))$; then

$$\partial_B f_0(x) = \begin{cases} \{\nabla f_1(x)^T\} & \text{if } f_1(x) < f_2(x), \\ \{\nabla f_1(x)^T, \nabla f_2(x)^T\} & \text{if } f_1(x) = f_2(x), \\ \{\nabla f_2(x)^T\} & \text{if } f_1(x) > f_2(x). \end{cases}$$

This formula will be used later in this paper.

The first-order Kuhn-Tucker conditions for (1.1) are

$$F_{u,v}(x) = \nabla f(x) + \nabla g(x)u + \nabla h(x)v = 0,$$
$$u \ge 0, \quad g(x) \le 0, \quad u_i g_i(x) = 0 \ \text{for } i = 1, \ldots, p, \qquad (2.2)$$
$$h(x) = 0.$$

Let

$$H(z) = \begin{pmatrix} \nabla f(x) + \nabla g(x)u + \nabla h(x)v \\ \min(u, -g(x)) \\ -h(x) \end{pmatrix}, \qquad (2.3)$$
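As a concrete illustration (our own toy data, not from the paper), consider the problem $\min \frac{1}{2}\|x\|^2$ subject to $g(x) = 1 - x_1 \le 0$ with no equality constraints; its Kuhn-Tucker point is $x^* = (1,0)$ with multiplier $u^* = 1$, and the reformulation of (2.2) via (2.3) can be checked directly:

```python
# Hedged sketch: verify H(z) = 0 at a Kuhn-Tucker point of the toy problem
#   min (1/2)||x||^2  s.t.  g(x) = 1 - x1 <= 0   (no equality constraints),
# whose KT point is x* = (1, 0), u* = 1.  The problem data are our own choice.
import numpy as np

def H(x, u):
    grad_f = x                          # gradient of f(x) = 0.5 * ||x||^2
    grad_g = np.array([-1.0, 0.0])      # gradient of g(x) = 1 - x1
    g = 1.0 - x[0]
    stationarity = grad_f + u * grad_g
    complementarity = np.minimum(u, -g)   # the componentwise min(u, -g(x))
    return np.concatenate([stationarity, [complementarity]])

print(H(np.array([1.0, 0.0]), 1.0))   # zero vector: the KT conditions hold
print(H(np.array([2.0, 0.0]), 1.0))   # nonzero residual at a non-KT point
```

The residual vector plays the role of $H(z)$ with $z = (x, u)$.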
where the `min' operator denotes the componentwise minimum. Then the first-order Kuhn-Tucker conditions are equivalent to $H(z) = 0$. Denote $H_1(z) = \nabla f(x) + \nabla g(x)u + \nabla h(x)v$, $H_2(z) = \min(u, -g(x))$ and $H_3(z) = -h(x)$, so that

$$H(z) = \begin{pmatrix} H_1(z) \\ H_2(z) \\ H_3(z) \end{pmatrix}.$$

For every $z = (x,u,v) \in R^n \times R^p \times R^q$, denote

$$\partial_Q H(z) = M(z, H_1) \times \partial_b H_2(z) \times \{\nabla H_3(z)^T\}.$$

It is easy to see that $\partial_Q H(z)$ is a nonempty compact subset of $R^{m \times m}$, and that the map $\partial_Q H$ is upper semi-continuous, where $m = n + p + q$. For any $A \in M(z, H_1)$, there exists $V \in R^{n \times n}$ such that $A = (V \ \ \nabla g(x) \ \ \nabla h(x))$. Denote

$$V_x(z) = \{ V \in R^{n \times n} \mid (V \ \ \nabla g(x) \ \ \nabla h(x)) \in M(z, H_1) \}.$$

From the definition of the map $M(\cdot,\cdot)$, it is easy to see that for any $z = (x,u,v) \in R^n \times R^p \times R^q$ we have $M(x, F_{u,v}) \subseteq V_x(z)$.

Suppose that $z = (x,u,v) \in R^n \times R^p \times R^q$ is a Kuhn-Tucker point of (1.1). Let

$$I(z) = \{ i \mid 1 \le i \le p,\ g_i(x) = 0 \}, \quad I^+(z) = \{ i \in I(z) \mid u_i > 0 \}, \quad I^0(z) = \{ i \in I(z) \mid u_i = 0 \},$$

$$G(z) = \{ d \in R^n \mid f'(x;d) = 0,\ g_i'(x;d) = 0 \ \text{for } i \in I^+(z),\ g_i'(x;d) \le 0 \ \text{for } i \in I^0(z),\ \text{and } h_i'(x;d) = 0 \ \text{for } i = 1, \ldots, q \}$$

and

$$G^+(z) = \{ d \in R^n \mid f'(x;d) = 0,\ g_i'(x;d) = 0 \ \text{for } i \in I^+(z),\ \text{and } h_i'(x;d) = 0 \ \text{for } i = 1, \ldots, q \}.$$

A point $z = (x,u,v) \in R^n \times R^p \times R^q$ is said to satisfy the second-order sufficiency conditions (strong second-order sufficiency conditions) for (1.1) if it satisfies the first-order Kuhn-Tucker conditions and if $d^T V d > 0$ for all $d \in G(z) \setminus \{0\}$ (respectively, all $d \in G^+(z) \setminus \{0\}$) and all $V \in V_x(z)$.

Suppose that $z = (x,u,v) \in R^n \times R^p \times R^q$ is a Kuhn-Tucker point of (1.1). We say that $z$ satisfies the linear independence condition if the vectors $\{\nabla g_i(x),\ i \in I(z)\}$ and $\{\nabla h_i(x),\ i = 1, \ldots, q\}$ are linearly independent. We say that $z$ satisfies the strict complementarity slackness condition if $I^0(z) = \emptyset$. When the strict complementarity condition is satisfied (i.e., $I^0(z) = \emptyset$), then $G(z) = G^+(z)$. Therefore, the second-order sufficiency conditions together with the strict complementarity slackness condition imply the
strong second-order sufficiency conditions. In general, the strong second-order sufficiency conditions imply the second-order sufficiency conditions, but they do not imply the strict complementarity slackness condition. Since the strict complementarity slackness condition may fail in nonlinear optimization problems, we consider the superlinear convergence properties of approximate Newton or SQP methods for LC$^1$ optimization problems without assuming it.

First, we consider the nonsingularity of the matrices $W \in \partial_Q H(z)$ at a solution of $H(z) = 0$. If the components of such a solution are denoted by $x_0$, $u_0$, $v_0$, we can partition the vector $g(x_0)$ into smaller vectors $g^+(x_0)$, $g^0(x_0)$ and $g^-(x_0)$, of dimensions $r$, $s$ and $t$, respectively, and partition $u_0$ conformably into $u_0^+$, $u_0^0$ and $u_0^-$ so that

$$u_0^+ > 0, \quad g^+(x_0) = 0; \qquad u_0^0 = 0, \quad g^0(x_0) = 0; \qquad u_0^- = 0, \quad g^-(x_0) < 0, \qquad (2.4)$$

where the ordering is componentwise. After a suitable rearrangement, (2.3) can be written as

$$H(z) = \begin{pmatrix} \nabla f(x) + \nabla g(x)u + \nabla h(x)v \\ \min(u^+, -g^+(x)) \\ \min(u^0, -g^0(x)) \\ \min(u^-, -g^-(x)) \\ -h(x) \end{pmatrix}. \qquad (2.5)$$
Theorem 2.1. Suppose that $z_0 = (x_0,u_0,v_0) \in R^n \times R^p \times R^q$ satisfies the strong second-order sufficiency conditions and the linear independence condition of (1.1). Then all $W \in \partial_Q H(z_0)$ are nonsingular.

Proof. According to the definition of $\partial_Q H(z_0)$, we only need to prove, for $i = 0, 1, \ldots, s$, the nonsingularity of the following matrices:

$$W_{(i)} = \begin{pmatrix} V & G_0^{+T} & G_0^{0IT} & G_0^{0JT} & G_0^{-T} & H_0^T \\ -G_0^+ & 0 & 0 & 0 & 0 & 0 \\ -G_0^{0I} & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & I_{j \times j} & 0 & 0 \\ 0 & 0 & 0 & 0 & I_{t \times t} & 0 \\ -H_0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix},$$

where $V \in V_{x_0}(z_0)$, $H_0$ denotes $\nabla h(x_0)^T$, $G_0^+$ denotes $\nabla g^+(x_0)^T$, etc., $I = \{1, \ldots, i\}$ (when $i = 0$, $I = \emptyset$), $J = \{1, \ldots, s\} \setminus I$, $j = |J|$, $G_0^{0I}$ is the matrix of the $I$ rows of $G_0^0$, $G_0^{0J}$ is the matrix of the $J$ rows of $G_0^0$, and $I_{j \times j}$ and $I_{t \times t}$ are the unit matrices of $R^{j \times j}$ and $R^{t \times t}$, respectively. Suppose that $a$, $b$, $c$, $d$, $e$ and $l$ are such that

$$\begin{aligned} Va + G_0^{+T}b + G_0^{0IT}c + G_0^{0JT}d + G_0^{-T}e + H_0^T l &= 0, \\ -G_0^+ a &= 0, \\ -G_0^{0I} a &= 0, \\ I_{j \times j}\, d &= 0, \\ I_{t \times t}\, e &= 0, \\ -H_0 a &= 0. \end{aligned} \qquad (2.6)$$

Therefore, we get

$$\begin{aligned} Va + G_0^{+T}b + G_0^{0IT}c + H_0^T l &= 0, \\ -G_0^+ a &= 0, \\ -G_0^{0I} a &= 0, \\ -H_0 a &= 0. \end{aligned} \qquad (2.7)$$
Premultiplying the equations in (2.7) by $a^T$, $b^T$, $c^T$ and $l^T$, respectively, and adding the results, we find that $a^T V a = 0$. This, together with the second and fourth equations of (2.7) and the strong second-order sufficiency conditions, implies that $a = 0$; the first equation of (2.7) and the linear independence assumption now imply that $b$, $c$ and $l$ are also zero. The fourth and fifth equations of (2.6) mean that $d$ and $e$ are zero. Thus the matrix $W_{(i)}$ is nonsingular. This completes the proof. $\Box$

Corollary 2.1. Under the conditions of Theorem 2.1, there exist $\delta > 0$ and $C > 0$ such that for any $z = (x,u,v) \in R^n \times R^p \times R^q$ satisfying $\|z - z_0\| \le \delta$ and any $W \in \partial_Q H(z)$, $W$ is invertible and $\|W^{-1}\| \le C$.

Proof. Applying Theorem 2.1, together with the facts that $\partial_Q H(z)$ is a nonempty compact subset and that the map $\partial_Q H$ is upper semi-continuous, we easily obtain the conclusion. $\Box$

We say that a locally Lipschitzian function $F: R^n \to R^m$ is semismooth at $x$ if

$$\lim_{\substack{V \in \partial F(x + t h') \\ h' \to h,\ t \downarrow 0}} \{ V h' \} \qquad (2.8)$$

exists for any $h \in R^n$. If $F$ is semismooth at $x$, then $F$ is directionally differentiable at $x$ and $F'(x;h)$ is equal to the limit in (2.8). Semismoothness was first introduced by Mifflin (Ref. 10) for functionals. Convex functions, continuously piecewise linear functions, smooth functions and subsmooth functions are examples of semismooth functions; scalar products and sums of semismooth functions are also semismooth. In Qi (Ref. 16) and Qi and Sun (Ref. 18), the definition of semismoothness was extended to $F: R^n \to R^m$. It was proved in Qi (Ref. 17) that $F$ is semismooth at $x$ if and only if each of its components is semismooth at $x$.
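A minimal numerical sketch of definition (2.8), using the semismooth function $F(x) = |x|$ at $x = 0$ (our own example, not from the paper): the values $V h'$ with $V \in \partial F(t h)$ stabilize as $t \downarrow 0$ and recover $F'(0;h) = |h|$.

```python
# Illustrative check (our own example): F(x) = |x| is semismooth at 0, and the
# limit in (2.8) of V*h' over V in dF(t*h'), h' -> h, t -> 0+ equals |h|.
def dF(x):
    # Clarke subdifferential of |x|: a singleton off 0, sampled values of
    # the interval [-1, 1] at 0 (never hit below, since t*h != 0 for h != 0)
    return [1.0] if x > 0 else [-1.0] if x < 0 else [v / 10 for v in range(-10, 11)]

def limit_candidates(h, ts=(1e-1, 1e-3, 1e-6)):
    # sample V*h along t -> 0+ with h' = h fixed; F is differentiable at t*h != 0
    return [V * h for t in ts for V in dF(t * h)]

print(limit_candidates(2.0))    # every sample equals 2.0 = F'(0; 2) = |2|
print(limit_candidates(-3.0))   # every sample equals 3.0 = |-3|
```

Because all sampled products agree along $t \downarrow 0$, the limit exists, which is exactly the semismoothness requirement for this $F$.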
3 Superlinear Convergence Property
To establish the superlinear convergence of approximate Newton or SQP methods, we need the following two properties of semismoothness. Suppose that $F: R^n \to R^m$ is locally Lipschitzian and semismooth at $x$. Then:

(1) $F$ is B-differentiable at $x$, i.e., $F'(x;h)$ exists for all $h \in R^n$, and

$$F(x+h) = F(x) + F'(x;h) + o(\|h\|); \qquad (3.1)$$

(2) for any $V \in \partial F(x+h)$, $h \to 0$,

$$Vh - F'(x;h) = o(\|h\|). \qquad (3.2)$$

See Theorem 2.3 of Qi and Sun (Ref. 18).

The approximate Newton method (ANM) for solving (1.1) is as follows. Start at a point $z^0 = (x^0,u^0,v^0) \in R^n \times R^p \times R^q$. Having $z^k = (x^k,u^k,v^k)$, find a Kuhn-Tucker point $z^{k+1} = (x^{k+1},u^{k+1},v^{k+1})$ of the quadratic subproblem $Q_k$ described by (1.2). If $z^{k+1}$ is not unique, choose any Kuhn-Tucker point $z^{k+1}$ which is closest to $z^k$ in terms of the distance $\|z^{k+1} - z^k\|$.

Suppose that $z^* = (x^*,u^*,v^*) \in R^n \times R^p \times R^q$ is a solution of $H(z) = 0$ (i.e., $z^*$ is a Kuhn-Tucker point of (1.1)). For every $z = (x,u,v) \in R^n \times R^p \times R^q$, denote

$$\alpha(z) = \{ i \mid u_i > -g_i(x) \}, \quad \beta(z) = \{ i \mid u_i = -g_i(x) \} \quad \text{and} \quad \gamma(z) = \{ i \mid u_i < -g_i(x) \}.$$

For $i \in I^\beta = \{1, \ldots, 2^{|\beta(z^*)|}\}$, define

$$H^{(i)}(z) = \begin{pmatrix} \nabla f(x) + \nabla g(x)u + \nabla h(x)v \\ p^{(i)}(z) \\ -h(x) \end{pmatrix}, \qquad (3.3)$$

where $p^{(i)}(z) \in P(z)$ and $P(z)$ consists of all functions $p(z)$ of the following form:

$$p_j(z) = \begin{cases} -g_j(x) & \text{if } j \in \alpha(z^*), \\ u_j \ \text{or} \ -g_j(x) & \text{if } j \in \beta(z^*), \\ u_j & \text{if } j \in \gamma(z^*), \end{cases} \qquad j = 1, \ldots, p,$$

and define

$$\partial_Q H^{(i)}(z) = M(z, H_1) \times \{\nabla p^{(i)}(z)^T\} \times \{\nabla H_3(z)^T\}.$$

Lemma 3.1. Suppose that $z^* = (x^*,u^*,v^*) \in R^n \times R^p \times R^q$ is a Kuhn-Tucker point of (1.1) and satisfies the conditions of Theorem 2.1. Then there exist positive
constants $\delta$ and $C$ such that for any $z = (x,u,v) \in R^n \times R^p \times R^q$ with $\|z - z^*\| \le \delta$ and any $i \in I^\beta$, all $W_{(i)} \in \partial_Q H^{(i)}(z)$ are invertible and $\|W_{(i)}^{-1}\| \le C$.

Proof. From the definitions of $H^{(i)}(z)$ and $\partial_Q H^{(i)}(z)$ we know that $H^{(i)}(z^*) = 0$ for all $i \in I^\beta$ and

$$\partial_Q H^{(i)}(z^*) \subseteq \partial_Q H(z^*) \quad \text{for all } i \in I^\beta.$$

From Theorem 2.1 we know that all matrices $W \in \partial_Q H(z^*)$ are nonsingular; hence all matrices $W_{(i)} \in \partial_Q H^{(i)}(z^*)$, $i \in I^\beta$, are nonsingular. It is easy to see that all $\partial_Q H^{(i)}(z)$, $i \in I^\beta$, are nonempty compact subsets and that all the maps $\partial_Q H^{(i)}$, $i \in I^\beta$, are upper semi-continuous. Therefore for each $i \in I^\beta$ there exist a neighborhood $N^{(i)}(z^*)$ of $z^*$ and a positive number $C_i$ such that for any $z \in N^{(i)}(z^*)$, all $W_{(i)} \in \partial_Q H^{(i)}(z)$ are nonsingular and satisfy $\|W_{(i)}^{-1}\| \le C_i$. Since $I^\beta$ has finitely many elements, the conclusion of the lemma holds. $\Box$

In order to establish the superlinear convergence of the approximate Newton method, we first consider the following generalized approximate Newton method (GANM) for solving $H(z) = 0$: given $z^0 = (x^0,u^0,v^0) \in R^n \times R^p \times R^q$, for $k = 0, 1, \ldots$, choose $i \in I^\beta$ and let
$$z^{k+1} = z^k - B_{(i)k}^{-1} H^{(i)k}(z^k), \qquad (3.4)$$

where $B_{(i)k} = \nabla H^{(i)k}(z^k)^T$ and $H^{(i)k}$ is defined as

$$H^{(i)k}(z) = \begin{pmatrix} \nabla f(x^k) + \nabla g(x^k)u + \nabla h(x^k)v + G_k(x - x^k) \\ q^{(i)k}(z) \\ -h(x^k) - \nabla h(x^k)^T (x - x^k) \end{pmatrix}, \qquad (3.5)$$

$i \in I^\beta$, where $q^{(i)k}(z)$ is defined as

$$q_j^{(i)k}(z) = \begin{cases} -g_j(x^k) - \nabla g_j(x^k)^T (x - x^k) & \text{if } j \in \alpha(z^*), \\ p_j^{(i)}(z^k) + \nabla p_j^{(i)}(z^k)^T (z - z^k) & \text{if } j \in \beta(z^*), \\ u_j & \text{if } j \in \gamma(z^*), \end{cases} \qquad (3.6)$$

$j = 1, \ldots, p$, and $G_k \in R^{n \times n}$.

Remark 3.1. In practice we cannot use the above method, since we do not know $z^*$. However, the method provides an approach to proving the Q-superlinear convergence of the approximate Newton method.

Theorem 3.1. Suppose that $z^* = (x^*,u^*,v^*) \in R^n \times R^p \times R^q$ is a Kuhn-Tucker point of (1.1) and satisfies the conditions of Theorem 2.1. Suppose that $\nabla f$, $\nabla g$ and
$\nabla h$ are semismooth at $x^*$. Let $C$ and $\delta$ be the positive constants in Lemma 3.1. If there exists $V_k \in V_{x^k}(z^k)$ such that

$$\|G_k - V_k\| \le \frac{1}{8C} \quad \text{for all } k, \qquad (3.7)$$

then the method GANM is well defined and Q-linearly converges to $z^*$ in a neighborhood of $z^*$. If, furthermore,

$$\lim_{k \to \infty} \frac{\|(G_k - V_k)(x^{k+1} - x^k)\|}{\|z^{k+1} - z^k\|} = 0, \qquad (3.8)$$

then the convergence is Q-superlinear. If in the latter case $H(z^k) \ne 0$, we have

$$\lim_{k \to \infty} \frac{\|H(z^{k+1})\|}{\|H(z^k)\|} = 0. \qquad (3.9)$$

Proof. Since $\nabla f$, $\nabla g$ and $\nabla h$ are semismooth at $x^*$, the maps $H$ and $H^{(i)}$, $i \in I^\beta$, are semismooth at $z^*$. From the definitions of $V_{x^k}(z^k)$ and $\partial_Q H^{(i)}(z^k)$, $i \in I^\beta$, for each $B_{(i)k}$, $i \in I^\beta$, there exists $W_{(i)k} \in \partial_Q H^{(i)}(z^k)$ such that for any $z = (x,u,v) \in R^n \times R^p \times R^q$,

$$\|(B_{(i)k} - W_{(i)k}) z\| = \|(V_k - G_k) x\|. \qquad (3.10)$$

In particular, we have

$$\|B_{(i)k} - W_{(i)k}\| \le \|V_k - G_k\| \le \frac{1}{8C}. \qquad (3.11)$$

If $\|z^k - z^*\| \le \delta$, then by Lemma 3.1, $W_{(i)k}^{-1}$ exists and $\|W_{(i)k}^{-1}\| \le C$. By the Perturbation Lemma of Ortega and Rheinboldt (Ref. 12, p. 45), $B_{(i)k}$ is invertible and

$$\|B_{(i)k}^{-1}\| \le 2C. \qquad (3.12)$$

Recall that a map is semismooth at $z^*$ if and only if each of its components is semismooth at $z^*$, and that the set $I^\beta$ is finite. So by (3.1) and (3.2), for every $\varepsilon > 0$ there exists a neighborhood $N(z^*)$ of $z^*$ such that when $z \in N(z^*)$ and $W_{(i)} \in \partial_Q H^{(i)}(z)$ (note that $W_{(i)j} \in \partial H_j^{(i)}(z)$), we have

$$\|H^{(i)}(z) - H^{(i)}(z^*) - W_{(i)}(z - z^*)\| \le \sum_{j=1}^{n+p+q} |H_j^{(i)}(z) - H_j^{(i)}(z^*) - W_{(i)j}(z - z^*)| \le \varepsilon \|z - z^*\| \quad \text{for all } i \in I^\beta. \qquad (3.13)$$
So we may choose $\delta_1 > 0$ sufficiently small such that when $\|z^k - z^*\| \le \delta_1$, for any $i \in I^\beta$ we have

$$\|H^{(i)}(z^k) - H^{(i)}(z^*) - W_{(i)k}(z^k - z^*)\| \le \frac{1}{8C} \|z^k - z^*\|. \qquad (3.14)$$

Let $\bar\delta = \min(\delta_1, \delta)$. Then when $\|z^k - z^*\| \le \bar\delta$, we have

$$\begin{aligned} \|z^{k+1} - z^*\| &= \|z^k - B_{(i)k}^{-1} H^{(i)}(z^k) - z^*\| \\ &\le \|B_{(i)k}^{-1}\| \, \|H^{(i)}(z^k) - H^{(i)}(z^*) - B_{(i)k}(z^k - z^*)\| \\ &\le \|B_{(i)k}^{-1}\| \big[ \|H^{(i)}(z^k) - H^{(i)}(z^*) - W_{(i)k}(z^k - z^*)\| + \|(B_{(i)k} - W_{(i)k})(z^k - z^*)\| \big]. \end{aligned} \qquad (3.15)$$

Substituting (3.11)-(3.12) and (3.14) into (3.15) gives

$$\|z^{k+1} - z^*\| \le 2C \Big( \frac{1}{8C} + \frac{1}{8C} \Big) \|z^k - z^*\| = \frac{1}{2} \|z^k - z^*\|. \qquad (3.16)$$

This proves that GANM is well defined and Q-linearly converges to $z^*$ in a neighborhood of $z^*$. Furthermore, if (3.8) holds, then by (3.10)-(3.11), (3.13) and (3.15) we have

$$\begin{aligned} \|z^{k+1} - z^*\| &\le 2C \big[ \|H^{(i)}(z^k) - H^{(i)}(z^*) - W_{(i)k}(z^k - z^*)\| \\ &\qquad + \|(B_{(i)k} - W_{(i)k})(z^{k+1} - z^k)\| + \|(B_{(i)k} - W_{(i)k})(z^{k+1} - z^*)\| \big] \\ &\le 2C \Big[ o(\|z^k - z^*\|) + \|(V_k - G_k)(x^{k+1} - x^k)\| + \frac{1}{8C} \|z^{k+1} - z^*\| \Big] \\ &\le o(\|z^k - z^*\|) + o(\|z^{k+1} - z^k\|) + \frac{1}{4} \|z^{k+1} - z^*\|. \end{aligned} \qquad (3.17)$$

This, together with the Q-linear convergence of $\{z^k\}$, yields

$$\|z^{k+1} - z^*\| = o(\|z^k - z^*\|), \qquad (3.18)$$

i.e., the convergence of GANM is Q-superlinear. The proof of (3.9) is similar to that of Theorem 3.1 of Qi (Ref. 16). $\Box$
Remark 3.2. For unconstrained optimization problems ($f \in C^2$), condition (3.8) is known as the Dennis-Moré condition (see, e.g., Dennis and Schnabel (Ref. 3)); for nonlinear programming ($C^2$ optimization problems) with equality constraints, a generalization of this condition due to Boggs, Tolle and Wang (Ref. 31) is widely used.

Corollary 3.1. Assume that the conditions of Theorem 3.1 hold. Then there exists a positive number $\varepsilon > 0$ such that, when there exists $V_k \in V_{x^k}(z^k)$ with

$$\|V_k - G_k\| \le \varepsilon \quad \text{for all } k, \qquad (3.19)$$

the approximate Newton method described above is well defined and Q-linearly converges to $z^*$ in a neighborhood of $z^*$. If furthermore (3.8) holds, then the convergence is Q-superlinear. If in the latter case $H(z^k) \ne 0$, then (3.9) holds.

Proof. It suffices to prove that the approximate Newton method is a special case of GANM in a neighborhood of $z^*$. Choose a positive number $\delta_2 > 0$ ($\delta_2 \le \bar\delta/3$, where $\bar\delta$ is defined in the proof of Theorem 3.1) such that when $z, z^k \in B(z^*; 3\delta_2) = \{ z \mid \|z - z^*\| \le 3\delta_2 \}$, we have

$$\begin{aligned} -g_i(x^k) - \nabla g_i(x^k)^T (x - x^k) &< u_i \quad \text{if } i \in \alpha(z^*), \\ -g_i(x^k) - \nabla g_i(x^k)^T (x - x^k) &> u_i \quad \text{if } i \in \gamma(z^*). \end{aligned} \qquad (3.20)$$

So when $z^k \in B(z^*; 3\delta_2)$ we have

$$\alpha(z^*) \subseteq \alpha(z^k), \quad \gamma(z^*) \subseteq \gamma(z^k) \quad \text{and} \quad \beta(z^k) \subseteq \beta(z^*). \qquad (3.21)$$

The first-order Kuhn-Tucker conditions of the quadratic subproblem $Q_k$ can be written as

$$H^k(z) = 0, \qquad (3.22)$$

where $H^k(z)$ is defined as

$$H^k(z) = \begin{pmatrix} \nabla f(x^k) + \nabla g(x^k)u + \nabla h(x^k)v + G_k(x - x^k) \\ \min(u, -g(x^k) - \nabla g(x^k)^T (x - x^k)) \\ -h(x^k) - \nabla h(x^k)^T (x - x^k) \end{pmatrix}. \qquad (3.23)$$

We now show that (3.22) has a solution if $\delta_2$ is sufficiently small. Similarly to the proof of Theorem 4.1 of Robinson (Ref. 23), we can easily conclude that the matrix

$$A(z^*) = \begin{pmatrix} V_* & \nabla g_{\alpha(z^*)}(x^*) & \nabla h(x^*) \\ -\nabla g_{\alpha(z^*)}(x^*)^T & 0 & 0 \\ -\nabla h(x^*)^T & 0 & 0 \end{pmatrix}$$
is nonsingular, and the Schur complement

$$B(z^*) = C(z^*)^T A(z^*)^{-1} C(z^*)$$

is a P-matrix (i.e., a matrix with positive principal minors), where $V_* \in V_{x^*}(z^*)$ and

$$C(z^*) = \begin{pmatrix} \nabla g_{\beta(z^*)}(x^*) \\ 0 \\ 0 \end{pmatrix}.$$

From the definitions of $M(z, H_1)$ and $V_x(z)$, for every $\varepsilon > 0$ we can prove that there exists $\delta_3 > 0$ such that when $z^k \in B(z^*; \delta_3) = \{ z \mid \|z - z^*\| \le \delta_3 \}$, we have

$$V_{x^k}(z^k) \subseteq V_{x^*}(z^*) + \varepsilon B(0;1), \qquad (3.24)$$

where $B(0;1) = \{ Z \in R^{n \times n} \mid \|Z\| \le 1 \}$. So we may restrict $\delta_2$ and $\varepsilon$ so that for any $z^k \in B(z^*; \delta_2) = \{ z \mid \|z - z^*\| \le \delta_2 \}$, the matrix

$$A(z^k) = \begin{pmatrix} G_k & \nabla g_{\alpha(z^*)}(x^k) & \nabla h(x^k) \\ -\nabla g_{\alpha(z^*)}(x^k)^T & 0 & 0 \\ -\nabla h(x^k)^T & 0 & 0 \end{pmatrix}$$

is nonsingular, and the Schur complement

$$B(z^k) = C(z^k)^T A(z^k)^{-1} C(z^k) \qquad (3.25)$$

is a P-matrix, where

$$C(z^k) = \begin{pmatrix} \nabla g_{\beta(z^*)}(x^k) \\ 0 \\ 0 \end{pmatrix}.$$

Note that in the matrix $A(z^k)$ the index sets $\alpha$ and $\beta$ are defined at $z^*$ but the various gradients are evaluated at $x^k$.

In order to consider the solvability of the system (3.22), we consider the solvability of the following system:

$$\begin{aligned} F_{u^k,v^k}(x^k) + G_k d^x + \nabla g(x^k) d^u + \nabla h(x^k) d^v &= 0, \\ -g_i(x^k) - \nabla g_i(x^k)^T d^x &= 0 \quad \text{for } i \in \alpha(z^*), \\ \min(u_i^k + d_i^u, -g_i(x^k) - \nabla g_i(x^k)^T d^x) &= 0 \quad \text{for } i \in \beta(z^*), \\ u_i^k + d_i^u &= 0 \quad \text{for } i \in \gamma(z^*), \\ -h(x^k) - \nabla h(x^k)^T d^x &= 0. \end{aligned} \qquad (3.26)$$
The components $d_i^u$ are explicit for $i \in \gamma(z^*)$. Simplifying these equations, we deduce that the remaining components of the vector $d = (d^x, d^u, d^v) \in R^n \times R^p \times R^q$ can be obtained by solving the mixed linear complementarity problem

$$\begin{aligned} q(z^k) + A(z^k) w + C(z^k) d_\beta^u &= 0, \\ -g_\beta(x^k) - C(z^k)^T w \ge 0, \qquad u_\beta^k + d_\beta^u &\ge 0, \\ [-g_\beta(x^k) - C(z^k)^T w]^T (u_\beta^k + d_\beta^u) &= 0, \end{aligned} \qquad (3.27)$$

where $w = (d^x, d_\alpha^u, d^v)$,

$$q(z^k) = (q_1(z^k), -g_\alpha(x^k), -h(x^k)), \qquad q_1(z^k) = F_{u^k,v^k}(x^k) - \nabla g_\gamma(x^k) u_\gamma^k,$$

and $\alpha$, $\beta$ and $\gamma$ denote respectively the index sets $\alpha(z^*)$, $\beta(z^*)$ and $\gamma(z^*)$. From linear complementarity theory (see, e.g., Murty (Ref. 11)), a sufficient condition for the system (3.27) to have a unique solution is that (i) the matrix $A(z^k)$ is nonsingular and (ii) the Schur complement $B(z^k) = C(z^k)^T A(z^k)^{-1} C(z^k)$ is a P-matrix. Since we have proved that these two conditions are satisfied, system (3.27) has a unique solution, and then system (3.26) has a unique solution when $z^k \in B(z^*; \delta_2)$. We denote this solution by $d^k = (d^{xk}, d^{uk}, d^{vk}) \in R^n \times R^p \times R^q$.

It is easy to prove that for each $k$ there exists $i \in I^\beta$ such that

$$H^{(i)k}(z^k) + B_{(i)k} d^k = 0. \qquad (3.28)$$

From the proof of Theorem 3.1, we know that

$$\|z^k + d^k - z^*\| \le \frac{1}{2} \|z^k - z^*\|. \qquad (3.29)$$

Let $z^{k+1} = z^k + d^k$. Then $z^{k+1} \in B(z^*; \delta_2)$ if $z^k \in B(z^*; \delta_2)$. We now prove that $H^k(z^{k+1}) = 0$, which means that (3.22) has a solution. When $z^k, z^{k+1} \in B(z^*; 3\delta_2)$, we have

$$\min(u_i^{k+1}, -g_i(x^k) - \nabla g_i(x^k)^T (x^{k+1} - x^k)) = \begin{cases} -g_i(x^k) - \nabla g_i(x^k)^T (x^{k+1} - x^k) & \text{if } i \in \alpha(z^*), \\ u_i^{k+1} & \text{if } i \in \gamma(z^*). \end{cases}$$
Thus if $z^k \in B(z^*; \delta_2)$, then

$$H^k(z^{k+1}) = \begin{pmatrix} F_{u^k,v^k}(x^k) + \nabla g(x^k) d^{uk} + \nabla h(x^k) d^{vk} + G_k d^{xk} \\ \min(u^k + d^{uk}, -g(x^k) - \nabla g(x^k)^T d^{xk}) \\ -h(x^k) - \nabla h(x^k)^T d^{xk} \end{pmatrix} = 0,$$

which means that the system $H^k(z) = 0$ has a solution $z^{k+1}$ in $B(z^*; \delta_2)$, i.e., $z^{k+1}$ is a Kuhn-Tucker point of (1.2).

Suppose that $\hat z^{k+1} \in B(z^*; 3\delta_2)$ is an arbitrary solution of $H^k(z) = 0$. Since $\hat z^{k+1} \in B(z^*; 3\delta_2)$,

$$\min(\hat u_i^{k+1}, -g_i(x^k) - \nabla g_i(x^k)^T (\hat x^{k+1} - x^k)) = \begin{cases} -g_i(x^k) - \nabla g_i(x^k)^T (\hat x^{k+1} - x^k) & \text{if } i \in \alpha(z^*), \\ \hat u_i^{k+1} & \text{if } i \in \gamma(z^*). \end{cases}$$

Therefore $\hat d^k = \hat z^{k+1} - z^k$ is also a solution of system (3.26). From the uniqueness of the solution of system (3.26), we know that $\hat z^{k+1} = z^{k+1}$, which shows that $z^{k+1}$ is the closest Kuhn-Tucker point to $z^k$ in terms of the distance $\|z^{k+1} - z^k\|$. So there exists $i \in I^\beta$ such that

$$z^{k+1} = z^k - B_{(i)k}^{-1} H^{(i)k}(z^k),$$
which means that the approximate Newton method is a special case of GANM in a neighborhood of $z^*$. We thus complete the proof of Corollary 3.1 by applying Theorem 3.1. $\Box$

Remark 3.3. If we choose $G_k \in V_{x^k}(z^k)$, then (3.7) and (3.8) are satisfied.

4 Some Discussions

In this paper we have considered the local convergence of approximate Newton and SQP methods for LC$^1$ optimization problems without assuming the strict complementarity condition. The globalization technique used in Qi (Ref. 19) can be applied here in a similar way.
GANM is useful for proving the Q-superlinear convergence of approximate Newton or SQP methods, but it cannot be used in practice since we do not know $\alpha(z^*)$, $\beta(z^*)$ and $\gamma(z^*)$. The approximate Newton or SQP methods are widely used, but in each step a quadratic program must be solved. In the following we give a method in which each step requires solving only a system of linear equations. Given $z^0 = (x^0,u^0,v^0) \in R^n \times R^p \times R^q$, for $k = 0, 1, \ldots$,

$$z^{k+1} = z^k - B_k^{-1} H(z^k), \qquad (4.1)$$

where

$$B_k \in \partial_Q H^k(z^k) = \{\nabla L^k(z^k)^T\} \times \partial_b g^k(z^k) \times \{\nabla h^k(z^k)^T\}$$

and

$$\begin{aligned} L^k(z) &= \nabla f(x^k) + \nabla g(x^k)u + \nabla h(x^k)v + G_k(x - x^k), \\ g^k(z) &= \min(u, -g(x^k) - \nabla g(x^k)^T (x - x^k)), \\ h^k(z) &= -h(x^k) - \nabla h(x^k)^T (x - x^k). \end{aligned}$$
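A hedged sketch of an iteration of the form (4.1) on the toy problem $\min \frac{1}{2}\|x\|^2$ subject to $1 - x_1 \le 0$ (the problem data and the tie-breaking rule for the min branches are our own choices; $G_k$ is taken as the exact Hessian, so each step solves one linear system with an element $B_k$ of $\partial_Q H^k(z^k)$):

```python
# Hedged sketch of a (4.1)-style semismooth Newton iteration on H(z) = 0 for
#   min 0.5*||x||^2  s.t.  1 - x1 <= 0   (our own toy data, no equalities).
# The KT point is z* = (x1, x2, u) = (1, 0, 1).
import numpy as np

def H(z):
    x, u = z[:2], z[2]
    # stationarity (x - u*e1), then complementarity min(u, -g) with -g = x1 - 1
    return np.array([x[0] - u, x[1], min(u, x[0] - 1.0)])

def B(z):
    x, u = z[:2], z[2]
    J = np.array([[1.0, 0.0, -1.0],     # Hessian block = I, gradient of g
                  [0.0, 1.0,  0.0],
                  [0.0, 0.0,  0.0]])    # last row: a gradient of min(u, -g)
    if u <= x[0] - 1.0:                  # 'u' branch active (ties -> u branch;
        J[2, 2] = 1.0                    #  either element of d_b min is valid)
    else:                                # '-g' branch active
        J[2, 0] = 1.0
    return J

z = np.array([2.0, 0.5, 0.3])
for _ in range(5):
    z = z - np.linalg.solve(B(z), H(z))
print(z)   # converges to the KT point (1, 0, 1)
```

On this piecewise linear system the iteration in fact terminates finitely, consistent with the local Q-superlinear theory above.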
It is easy to see that in a neighborhood of the solution $z^*$ of $H(z) = 0$, the above method is a special case of GANM, so convergence properties analogous to those in Theorem 3.1 hold for (4.1).

Acknowledgements. The authors thank the two referees for their valuable comments and suggestions on this paper, and are grateful to Professor L. Qi for his helpful suggestions on nonsmooth equations and related problems.
References

[1] J.-P. Aubin and H. Frankowska, Set-Valued Analysis, Birkhauser, Boston, 1990.

[2] F. H. Clarke, Optimization and Nonsmooth Analysis, Wiley, New York, 1983.

[3] J. E. Dennis and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-Hall, Englewood Cliffs, New Jersey, 1983.

[4] U. C. Garcia Palomares and O. L. Mangasarian, Superlinearly convergent quasi-Newton algorithms for nonlinearly constrained optimization problems, Mathematical Programming 11 (1976) 1-13.

[5] S. P. Han, Superlinearly convergent variable metric algorithms for general nonlinear programming problems, Mathematical Programming 11 (1976) 263-282.

[6] J. B. Hiriart-Urruty, J. J. Strodiot and V. H. Nguyen, Generalized Hessian matrix and second-order optimality conditions for problems with C^{1,1} data, Applied Mathematics and Optimization 11 (1984) 43-56.

[7] N. H. Josephy, Newton's method for generalized equations, Technical Summary Report 1965, Mathematics Research Center, University of Wisconsin-Madison, 1979.

[8] N. H. Josephy, Quasi-Newton methods for generalized equations, Technical Summary Report 1966, Mathematics Research Center, University of Wisconsin-Madison, 1979.

[9] G. P. McCormick, Penalty function versus non-penalty function methods for constrained nonlinear programming problems, Mathematical Programming 1 (1971) 217-238.

[10] R. Mifflin, Semismooth and semiconvex functions in constrained optimization, SIAM Journal on Control and Optimization 15 (1977) 957-972.

[11] K. G. Murty, Linear Complementarity, Linear and Nonlinear Programming, Heldermann-Verlag, Berlin, 1988.

[12] J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, 1970.

[13] J.-S. Pang, S. P. Han and N. Rangaraj, Minimization of locally Lipschitzian functions, SIAM Journal on Optimization 1 (1991) 57-82.

[14] J.-S. Pang, A B-differentiable equation-based, globally and locally quadratically convergent algorithm for nonlinear programs, complementarity and variational inequality problems, Mathematical Programming 51 (1991) 101-131.

[15] J.-S. Pang and L. Qi, Nonsmooth equations: motivation and algorithms, SIAM Journal on Optimization 3 (1993) 443-465.

[16] L. Qi, Convergence analysis of some algorithms for solving nonsmooth equations, Mathematics of Operations Research 18 (1993) 227-244.

[17] L. Qi, LC^1 functions and LC^1 optimization problems, Applied Mathematics Preprint 91/21, School of Mathematics, The University of New South Wales, Sydney, Australia, 1991.

[18] L. Qi and J. Sun, A nonsmooth version of Newton's method, Mathematical Programming 58 (1993) 353-368.

[19] L. Qi, Superlinearly convergent approximate Newton methods for LC^1 optimization problems, Mathematical Programming 64 (1994) 277-294.

[20] L. Qi and R. Womersley, An SQP algorithm for solving extended linear-quadratic problems in stochastic programming, Applied Mathematics Preprint 92/23, School of Mathematics, The University of New South Wales, Sydney, Australia, 1992.

[21] S. M. Robinson, A quadratically convergent algorithm for general nonlinear programming problems, Mathematical Programming 3 (1972) 145-156.

[22] S. M. Robinson, Perturbed Kuhn-Tucker points and rates of convergence for a class of nonlinear programming algorithms, Mathematical Programming 7 (1974) 1-16.

[23] S. M. Robinson, Strongly regular generalized equations, Mathematics of Operations Research 5 (1980) 43-62.

[24] R. T. Rockafellar, Computational schemes for solving large-scale problems in extended linear-quadratic programming, Mathematical Programming 48 (1990) 447-474.

[25] R. T. Rockafellar and R. J.-B. Wets, Generalized linear-quadratic problems of deterministic and stochastic optimal control in discrete time, SIAM Journal on Control and Optimization 28 (1990) 810-822.

[26] N. Z. Shor, A class of almost-differentiable functions and a minimization method for functions of this class, Kibernetika 4 (1972) 65-70.

[27] D. Sun and J. Han, Newton and quasi-Newton methods for a class of nonsmooth equations and related problems, Technical Report No. 026, Institute of Applied Mathematics, Academia Sinica, Beijing, China, 1994.

[28] C. Zhu and R. T. Rockafellar, Primal-dual projected gradient algorithms for extended linear-quadratic programming, to appear in SIAM Journal on Optimization.

[29] M. Lescrenier, Convergence of trust region algorithms for optimization with bounds when strict complementarity does not hold, SIAM Journal on Numerical Analysis 28 (1991) 476-495.

[30] A. R. Conn, N. I. M. Gould and Ph. L. Toint, Global convergence of a class of trust region algorithms for optimization with simple bounds, SIAM Journal on Numerical Analysis 25 (1988) 433-460; erratum in the same journal 26 (1989) 764.

[31] P. T. Boggs, J. W. Tolle and P. Wang, On the local convergence of quasi-Newton methods for constrained optimization, SIAM Journal on Control and Optimization 20 (1982) 161-171.
Second-Order Directional Derivatives
159
Recent Advances in Nonsmooth Optimization, pp. 159-171 Eds. D.-Z. Du, L. Qi and R.S. Womersley ©1995 World Scientific Publishing Co Pte Ltd
On Second-Order Directional Derivatives in Nonsmooth Optimization L. R. Huang Department of Mathematics, South China Normal University, Guangzhou, China K. F. Ng Department of Mathematics, The Chinese University of Hong Kong, Hong Kong
Abstract

Some relationships between the second-order directional derivatives of Ben-Tal and Zowe and that of Chaney are established. These derivatives are used to provide optimality conditions for nonsmooth optimization problems with and without constraints.
1 Introduction

Playing a major role in second-order nonsmooth optimization problems, various generalized second-order directional derivatives have been introduced, among which are $D^2 f(x;u,v)$ of Dem'yanov and Pevnyi [13] and Ben-Tal and Zowe [3], and $f''(x;x^*,u)$ of Chaney [6], where $f$ is a locally Lipschitz real-valued function on a normed space $X$. In this paper, we survey some of the results obtained in [4, 14, 15, 16] and give further new results. In particular, in Section 4 we provide a new set of sufficient conditions for a minimum point in nonsmooth optimization problems with and without constraints.
2 Definitions

Though most results can be generalized, we assume for simplicity that $X = R^n$; let $W$ be an open set in $X$ and $f$ a real-valued locally Lipschitz function. The lower Dini directional derivative of $f$ at $x \in X$ in the direction $u \in X$ is denoted by $D_- f(x;u)$
L. R. Huang and K. F. Ng
160 and is defined by D-f(x;u)
and is defined by

$$D_- f(x;u) := \liminf_{t \downarrow 0} \frac{1}{t} \{ f(x + tu) - f(x) \}.$$
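The lower and upper Dini derivatives can genuinely differ; a small numerical sketch (our own example, not from the paper) with $f(x) = x \sin(\log(1/|x|))$, $f(0) = 0$, at $x = 0$ in the direction $u = 1$: the difference quotient $\big(f(tu) - f(0)\big)/t = \sin(\log(1/t))$ oscillates between $-1$ and $1$, so $D_- f(0;1) = -1$ while $D_+ f(0;1) = 1$.

```python
# Numerical sketch (our own example): the first-order difference quotient of
# f(x) = x*sin(log(1/|x|)), f(0) = 0, at x = 0, u = 1, oscillates in [-1, 1],
# so the lower and upper Dini directional derivatives are -1 and +1.
import math

def quotient(t):
    x = t * 1.0                          # step t in the direction u = 1
    return (x * math.sin(math.log(1.0 / abs(x)))) / t

# sample t on a fine logarithmic grid tending to 0+
qs = [quotient(10.0 ** (-k / 50.0)) for k in range(50, 5000)]
print(min(qs), max(qs))   # close to -1 and +1 respectively
```

The sampled infimum and supremum approximate $D_- f(0;1)$ and $D_+ f(0;1)$ under the stated choice of grid.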
The upper Dini directional derivative $D_+ f(x;u)$ is similarly defined (with the lower limit replaced by the upper one). If $D_- f(x;u) = D_+ f(x;u)$, then the common value is denoted by $f'(x;u)$. As in [2], Clarke's generalized (upper) directional derivative and subdifferential are denoted by $f^\circ(x;u)$ and $\partial f(x)$ respectively. In the case when $f'(x;u)$ exists, the following definition was introduced by Ben-Tal and Zowe [3].

Definition 2.1 Let $x \in W$ and $u, v \in X$. The Ben-Tal/Zowe lower and upper generalized second-order directional derivatives of $f$ at $x$ in the directions $u$ and $v$ are defined respectively by

$$D_-^2 f(x;u,v) := \liminf_{t \downarrow 0} \frac{1}{t^2} \{ f(x + tu + t^2 v) - f(x) - t D_+ f(x;u) \} \qquad (2.1)$$

and

$$D_+^2 f(x;u,v) := \limsup_{t \downarrow 0} \frac{1}{t^2} \{ f(x + tu + t^2 v) - f(x) - t D_- f(x;u) \}. \qquad (2.2)$$

It is easy to see that

$$-\infty \le D_-^2 f(x;u,v) \le D_+^2 f(x;u,v) \le +\infty \qquad (2.3)$$

and

$$-D_-^2 f(x;u,v) = D_+^2 (-f)(x;u,v). \qquad (2.4)$$

Similar but different notions have appeared in the literature, e.g. in Penot [20] and Studniarski [24]; these authors use $D_+ f$ and $D_- f$ in the above definitions in place of $D_- f$ and $D_+ f$ respectively. In our approach it is true [15] that if $D^2 f(x;u,v)$ exists and is finite, then the first-order derivative $f'(x;u)$ also exists. This property is not shared by the approach of [20], [24] (see Example 3.6 in [24]).

A sequence $(x_k)$ is said to converge to $x$ in the direction $u$, denoted by $(x_k) \to_u x$, if $(x_k)$ converges to $x$, $x_k \ne x$ for every $k$, and the sequence $\big( \|u\| \frac{x_k - x}{\|x_k - x\|} \big)$ converges to $u$. Chaney's subdifferential [5] of $f$ at $x$ in the direction $u$ is denoted by $\partial_u f(x)$ and is defined to be the set of all $x^*$ for each of which there exist sequences $(x_k)$ and $x_k^* \in \partial f(x_k)$ such that $(x_k) \to_u x$ and $(x_k^*) \to x^*$. Because we have assumed that $X = R^n$, $\partial_u f(x)$ is nonempty and $\partial_u f(x) \subseteq \partial f(x)$, since the multifunction $\partial f$ is upper semicontinuous [2]. The following generalized lower and upper second-order directional derivatives are also due to Chaney [5].

Definition 2.2 Let $x \in W$ and $u \in X$. Suppose that $x^* \in \partial_u f(x)$. Then $f''_-(x;x^*,u)$ is defined to be the infimum of all numbers

$$\liminf_{k \to \infty} \frac{1}{t_k^2} \{ f(x_k) - f(x) - x^*(x_k - x) \}$$

taken over all triples of sequences $(x_k)$, $(x_k^*)$ and $(t_k)$ for which
(a) $t_k > 0$ for each $k$ and $(x_k)$ converges to $x$,

(b) $(t_k)$ converges to $0$ and $\big( \frac{x_k - x}{t_k} \big)$ converges to $u$,

(c) $(x_k^*)$ converges to $x^*$ with $x_k^* \in \partial f(x_k)$ for each $k$.

Similarly, $f''_+(x;x^*,u)$ is defined to be the supremum of all numbers

$$\limsup_{k \to \infty} \frac{1}{t_k^2} \{ f(x_k) - f(x) - x^*(x_k - x) \}$$

taken over all triples of sequences $(x_k)$, $(x_k^*)$ and $(t_k)$ for which (a), (b) and (c) above all hold.

Clearly, $-\infty \le f''_-(x;x^*,u) \le f''_+(x;x^*,u) \le +\infty$. Further, if $f''_-(x;x^*,u) = f''_+(x;x^*,u)$, then we denote this common value by $f''(x;x^*,u)$ and call it Chaney's generalized second-order directional derivative of $f$ at $x$ and $x^*$ in the direction $u$.

Remark. By (b), we see that if $u \ne 0$, then

$$\frac{x_k - x}{\|x_k - x\|} = \frac{(x_k - x)/t_k}{\|x_k - x\|/t_k} \to \frac{u}{\|u\|},$$

that is, $(x_k)$ converges to $x$ in the direction $u$. Thus,

$$\liminf_{k \to \infty} \frac{1}{t_k^2} \{ f(x_k) - f(x) - x^*(x_k - x) \} = \liminf_{k \to \infty} \frac{\|x_k - x\|^2}{t_k^2} \cdot \frac{1}{\|x_k - x\|^2} \{ f(x_k) - f(x) - x^*(x_k - x) \}$$
$$= \|u\|^2 \liminf_{k \to \infty} \frac{1}{\|x_k - x\|^2} \{ f(x_k) - f(x) - x^*(x_k - x) \}.$$

Hence, $f''_-(x;x^*,u)$ equals the infimum of all numbers

$$\|u\|^2 \liminf_{k \to \infty} \frac{1}{\|x_k - x\|^2} \{ f(x_k) - f(x) - x^*(x_k - x) \}$$

taken over the set of all sequences $(x_k)$ such that

(a') $(x_k)$ converges to $x$ in the direction $u$, and

(b') there exists a sequence $x_k^* \in \partial f(x_k)$ converging to $x^*$.
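For a $C^2$ function the quotient defining (2.1)-(2.2) has a closed-form limit, $\frac{1}{2}\langle u, \nabla^2 f(x) u\rangle + \langle \nabla f(x), v\rangle$, consistent with the smooth-case formulas recalled in Section 3. A small numerical sketch with $f(x) = x_1^2 + x_2^2$ (our own example, not from the paper):

```python
# Numerical sketch (our own example): for the smooth f(x) = x1^2 + x2^2, the
# Ben-Tal/Zowe quotient (f(x + t*u + t^2*v) - f(x) - t*f'(x;u)) / t^2 tends to
# <u, Hess f(x) u>/2 + <grad f(x), v> = |u|^2 + <2x, v>.
def f(x):
    return x[0] ** 2 + x[1] ** 2

def btz_quotient(x, u, v, t):
    y = [x[i] + t * u[i] + t * t * v[i] for i in range(2)]
    fprime = 2 * (x[0] * u[0] + x[1] * u[1])    # f'(x; u) = <grad f(x), u>
    return (f(y) - f(x) - t * fprime) / (t * t)

x, u, v = [1.0, 0.0], [0.0, 1.0], [3.0, 0.0]
expected = (u[0] ** 2 + u[1] ** 2) + 2 * (x[0] * v[0] + x[1] * v[1])  # = 7.0
print(btz_quotient(x, u, v, 1e-4), expected)    # both close to 7.0
```

Since the quotient converges, $D_-^2 f = D_+^2 f$ here, as expected for smooth $f$.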
3 Relationships Between the Two Second-Order Derivatives and Application to Convexity
In the case where f is C² it is well known and easy to verify that the second-order derivatives of Chaney and of Ben-Tal/Zowe are given by

\[ f''(x;x^*,u) = \tfrac12\langle u, \nabla^2 f(x)u\rangle, \qquad D^2 f(x;u,w) = \tfrac12\langle u, \nabla^2 f(x)u\rangle + x^*(w), \]

where x* = ∇f(x); this implies the following relationship between the two derivatives:

\[ f''(x;x^*,u) = D^2 f(x;u,w) - x^*(w) \quad \forall w. \tag{3.1} \]

This relationship persists in some nonsmooth cases, e.g., if ∂_u f(x) = {x*} and f''(x;x*,u) exists (see [15, Corollary 4.2]). Another instance is provided by Theorem 3.2 below. But (3.1) may fail to hold in general: the right-hand side of (3.1) need not be a constant function of w; indeed, it may be a convex and non-affine function of w. Based on the work of Ben-Tal and Zowe [3], the following weaker relation was established by Chaney [6]:

\[ f''(x;x^*,u) = \inf\{ D^2_- f(x;u,v) - x^*(v) : v \in \mathbb{R}^n \} \quad (\in \mathbb{R}) \tag{3.2} \]
for a very special class of nonsmooth functions. The following Theorem 3.1 implies that (3.2) in fact holds in general, provided that both sides of (3.2) exist in ℝ.

Theorem 3.1 Let x, u, x* ∈ ℝⁿ and suppose that x* ∈ ∂_u f(x) and x*(u) = D₊f(x;u). Then

\[ f''_-(x;x^*,u) \le \inf\{ D^2_- f(x;u,v) - x^*(v) : v \in \mathbb{R}^n \} \le f''_+(x;x^*,u), \]

provided that the infimum is finite. Consequently, if f satisfies the additional condition that f''(x;x*,u) exists, then

\[ f''(x;x^*,u) = \inf\{ D^2_- f(x;u,v) - x^*(v) : v \in \mathbb{R}^n \}. \tag{3.3} \]
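For a C² function both derivatives can be approximated by difference quotients, and relationship (3.1) can then be checked numerically. The sketch below is our own illustration on a quadratic of our choosing; the two quotient normalizations are assumptions chosen to reproduce the closed-form expressions ½⟨u, ∇²f(x)u⟩ and ½⟨u, ∇²f(x)u⟩ + x*(w) above, not definitions taken from the paper.

```python
import numpy as np

# Difference-quotient approximations (our normalization choices):
#   Chaney:        [f(x + t u) - f(x) - t x*.u] / t^2
#   Ben-Tal/Zowe:  [f(x + t u + t^2 w) - f(x) - t f'(x;u)] / t^2
def chaney(f, grad, x, u, t=1e-5):
    xs = grad(x)                      # x* = grad f(x) in the smooth case
    return (f(x + t * u) - f(x) - t * (xs @ u)) / t**2

def ben_tal_zowe(f, grad, x, u, w, t=1e-4):
    return (f(x + t * u + t**2 * w) - f(x) - t * (grad(x) @ u)) / t**2

H = np.array([[2.0, 0.5], [0.5, 3.0]])
b = np.array([1.0, -2.0])
f = lambda x: 0.5 * x @ H @ x + b @ x
grad = lambda x: H @ x + b

x = np.array([0.3, -0.7])
u = np.array([1.0, 2.0])
lhs = chaney(f, grad, x, u)           # ~ (1/2) u^T H u = 8.0 here
for w in (np.zeros(2), np.array([1.0, -1.0]), np.array([-2.0, 0.5])):
    rhs = ben_tal_zowe(f, grad, x, u, w) - grad(x) @ w
    print(abs(lhs - rhs) < 1e-2)      # (3.1): the right-hand side is independent of w
```

Prints True for each w, illustrating that in the smooth case D²f(x;u,w) − x*(w) is constant in w, which is exactly what can fail for nonsmooth f.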
Remark. In the terminology of convex analysis, conclusion (3.3) simply says that f''(x;x*,u) equals the value at x* of the conjugate function of D²₋f(x;u,·). This result is proved in [15]. In a special case it was established by Chaney [6], who considered f of the form f(x) = Σᵢ gᵢ(hᵢ(x)), where each hᵢ is a sup-type function and each gᵢ is C² with gᵢ′(hᵢ(x)) > 0.

Theorem 3.2 Let x, u, x* ∈ ℝⁿ with x* ∈ ∂_u f(x), and suppose that f''(x;x*,u) exists. If x*(u) = f′(x;u), and if

\[ \inf\{ D^2_- f(x;u,w) - x^*(w) : w \in \mathbb{R}^n \} \quad\text{and}\quad \sup\{ D^2_+ f(x;u,w) - x^*(w) : w \in \mathbb{R}^n \} \]

are both finite, then D²f(x;u,w) exists and, for all w ∈ ℝⁿ,

\[ f''(x;x^*,u) = D^2 f(x;u,w) - x^*(w). \]

The following result from [16] shows that, although D²₊f(·;u,0) and D²₋f(·;u,0) do not coincide in general, under reasonable conditions they have the same lower bounds on any open subset of ℝⁿ.

Theorem 3.3 Suppose that f is regular on ℝⁿ. Let W be an open subset of ℝⁿ and u ∈ ℝⁿ. Then

\[ \inf\{ D^2_+ f(x;u,0) : x \in W \} = \inf\{ D^2_- f(x;u,0) : x \in W \} \]

if the left-hand side is finite. Under the same conditions we also have, for each x, that

\[ \liminf_{y\to x} D^2_+ f(y;u,0) = \liminf_{y\to x} D^2_- f(y;u,0) \quad\text{and}\quad \liminf_{y\to x,\; v\to u} D^2_+ f(y;v,0) = \liminf_{y\to x,\; v\to u} D^2_- f(y;v,0). \]

Remark. As an application of Theorem 3.3 we note the following corollary, which provides a sufficient condition for the existence of D²f(x;u,0).

Corollary 3.4 If there exists x₀ ∈ W such that inf{ D²₊f(x;u,0) : x ∈ W } = D²₊f(x₀;u,0), then D²₊f(x₀;u,0) = D²₋f(x₀;u,0).

Indeed, by Theorem 3.3 the given assumption implies that D²₊f(x₀;u,0) ≤ D²₋f(x;u,0) for all x, and hence the equality holds at x = x₀.

Theorem 3.5 Let f be as in Theorem 3.3. Then f is convex on W if and only if D²₊f(x;u,0) ≥ 0 for each x ∈ W and each unit vector u. For detailed proofs, as well as the corresponding results in terms of Chaney's derivative, we refer the reader to [16].
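Theorem 3.5 suggests a simple numerical convexity probe: sample points of W and check the sign of a difference-quotient approximation of D²₊f(x;u,0). The sketch below is our own heuristic illustration (finite step size, finite grid, and a quotient normalization chosen to match the conventions used here); it flags x ↦ x⁴ as convex on (−1, 1) and x ↦ x³ as non-convex there.

```python
# Probe D^2_+ f(x; u, 0) ~ [f(x + t*u) - f(x) - t f'(x;u)] / t^2 on a grid.
# By Theorem 3.5, a regular f is convex on W iff this is >= 0 for all
# x in W and unit u; with finite t and a finite grid this is only a heuristic.
def d2_plus(f, df, x, u, t=1e-4):
    return (f(x + t * u) - f(x) - t * df(x) * u) / t**2

def looks_convex(f, df, grid):
    return all(d2_plus(f, df, x, u) >= -1e-6 for x in grid for u in (-1.0, 1.0))

grid = [i / 10.0 for i in range(-9, 10)]          # sample of W = (-1, 1)

quartic = lambda x: x**4                           # convex on R
print(looks_convex(quartic, lambda x: 4 * x**3, grid))   # -> True

cubic = lambda x: x**3                             # not convex on (-1, 1)
print(looks_convex(cubic, lambda x: 3 * x**2, grid))     # -> False
```

For the cubic the quotient is approximately 3x, which is negative at the sampled x < 0, so the probe correctly rejects convexity.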
4  Second-Order Optimality Conditions
Let f, g₁, …, g_{m+p} be real-valued locally Lipschitz functions on an open set W in ℝⁿ. We consider the following optimization problems P and P_c, without and with constraints respectively:

\[ (\mathcal{P}) \qquad \text{minimize } f(x), \quad x \in W, \]

and

\[ (\mathcal{P}_c) \qquad \text{minimize } f(x), \quad x \in W, \quad \text{subject to}\quad g_i(x) \le 0 \ \text{for } i \in \{1,2,\dots,m\}, \qquad g_j(x) = 0 \ \text{for } j \in \{m+1,\dots,m+p\}. \]

For any subset S ⊆ W and x₀ ∈ S, we use K_S(x₀) to denote the contingent cone [1] of S at x₀. It is well known and easily verified that if x₀ ∈ S is a local minimum point of f on S, then D₋f(x₀;·) ≥ 0 on K_S(x₀). The following result, which roughly deals with a converse situation, was established by Ioffe [17] for the special case S = W and by the authors for the general case [14].

Lemma 4.1 Let S be a subset of ℝⁿ, x₀ ∈ S and f a locally Lipschitz function from S into ℝ. Suppose that D₋f(x₀;u) ≥ 0 for every u ∈ K_S(x₀). Then for any ε > 0 there exists δ > 0 such that f(x₀) < f(x) + ε‖x − x₀‖ for any x ∈ S with x ≠ x₀ and ‖x − x₀‖ < δ; thus x₀ is a strict local minimum point of F_ε on S, where F_ε is defined by F_ε(x) := f(x) + ε‖x − x₀‖.

By virtue of this lemma and Ekeland's variational principle [1], one can show (cf. [14]):

Theorem 4.2 Let f : W → ℝ be a locally Lipschitz function such that D₋f(x₀;·) ≥ 0 on ℝⁿ. If u ∈ ℝⁿ is such that D₋f(x₀;u) = 0, then 0 ∈ ∂_u f(x₀).

More generally, we have

Theorem 4.3 Let f : W → ℝ be locally Lipschitz near x, with x ∈ W. Then D₋f(x;·) is continuous, and for any unit vector u in ℝⁿ with D₋f(x;u) = min_{‖v‖=1} D₋f(x;v) one has

\[ 0 \in \partial_u f(x) + D_- f(x;u)\, B_1, \]

where B₁ denotes the unit ball in ℝⁿ.

Proof. That D₋f(x;·) is a continuous (finite) real-valued function follows from the Lipschitz property of f. By compactness it follows that there exists a unit vector u such that D₋f(x;u) = min_{‖v‖=1} D₋f(x;v). Fix any such u and let F : ℝⁿ → ℝ be defined by F(y) := f(y) − D₋f(x;u)‖y − x‖.
Then it is easy to verify that

\[ D_- F(x;v) = D_- f(x;v) - D_- f(x;u) \ge 0 \]

for all v with ‖v‖ = 1 and, in particular,

\[ D_- F(x;u) = D_- f(x;u) - D_- f(x;u) = 0. \]

It follows from Theorem 4.2 that 0 ∈ ∂_u F(x): there exist sequences (t_k), (z_k) and (z*_k) with t_k ↓ 0, (z_k − x)/t_k → u and z*_k ∈ ∂F(z_k) convergent to 0. Since, by [2, Proposition 2.3.3],

\[ \partial F(z_k) \subseteq \partial f(z_k) + D_- f(x;u)\,\partial(\|\cdot - x\|)(z_k) \]

and since ∂(‖· − x‖)(z_k) ⊆ B₁, we may write z*_k = x*_k + D₋f(x;u) y*_k with x*_k ∈ ∂f(z_k) and y*_k ∈ B₁. Since the subdifferential multifunction takes values locally in a compact set [2, Proposition 2.1.2], by passing to subsequences if necessary we can assume that x*_k → x* and y*_k → y* for some x* and y* in ℝⁿ. These imply that x* ∈ ∂_u f(x) and y* ∈ B₁, and so

\[ 0 = x^* + D_- f(x;u)\, y^* \in \partial_u f(x) + D_- f(x;u)\, B_1. \qquad\Box \]

Remark. Let f : W → ℝ be a locally Lipschitz function attaining a local minimum at some point x₀ ∈ W (thus, in particular, D₋f(x₀;·) ≥ 0 on ℝⁿ). Suppose u ∈ ℝⁿ is such that D₋f(x₀;u) = 0; then, by Theorem 4.2, 0 ∈ ∂_u f(x₀). Consequently f''_-(x₀;0,u) is meaningfully defined by Definition 2.2 and, in fact, it is easily seen from the definition and the minimality that f''_-(x₀;0,u) ≥ 0. This result on necessary conditions for f to attain its minimum extends the corresponding result of Chaney [8] in two respects: firstly, we have removed his additional semismoothness assumption on f; secondly, the nonnegativity of f''_-(x₀;0,·) is now established on the whole set {u : D₋f(x₀;u) = 0} of "critical directions", while his result covers only a subset of it. See [8] and [14] for details. The following result plays a key role in our subsequent discussions.

Theorem 4.4 Let f : W → ℝ be a locally Lipschitz function and x₀ ∈ ℝⁿ. Suppose that D₋f(x₀;·) ≥ 0 on ℝⁿ. Let u ∈ ℝⁿ be a unit vector and let (x_k) be a sequence convergent to x₀ in the direction u such that f(x_k) ≤ f(x₀) for each k. Then 0 ∈ ∂_u f(x₀) and f''_-(x₀;0,u) ≤ 0.
Proof. From the assumption on (x_k) one has D₋f(x₀;u) ≤ 0, and in fact equality holds in view of the given condition D₋f(x₀;·) ≥ 0 on ℝⁿ. By Lemma 4.1 (with S = W), this condition also implies that for any (εᵢ) ↓ 0 with εᵢ ∈ (0,1) there exists (δᵢ) ↓ 0 such that F_{εᵢ}(x) := f(x) + εᵢ‖x − x₀‖ attains a minimum on B[x₀, δᵢ] at x₀. Now for each i take kᵢ large enough that x_{kᵢ} ∈ B[x₀, ½δᵢ]. Note that if x ∈ B[x₀, 2‖x_{kᵢ} − x₀‖], then

\[ f(x_{k_i}) \le f(x_0) \le f(x) + \varepsilon_i \|x - x_0\| \le f(x) + 2\varepsilon_i \|x_{k_i} - x_0\|. \]

By Ekeland's variational principle [1] with λ = εᵢ^{1/2}‖x_{kᵢ} − x₀‖/2, we can find z_{kᵢ} ∈ B[x₀, 2‖x_{kᵢ} − x₀‖] such that

(i) ‖z_{kᵢ} − x_{kᵢ}‖ ≤ εᵢ^{1/2}‖x_{kᵢ} − x₀‖/2,
(ii) f(z_{kᵢ}) ≤ f(x_{kᵢ}), and
(iii) f(z_{kᵢ}) ≤ f(x) + 4εᵢ^{1/2}‖x − z_{kᵢ}‖ for all x ∈ B[x₀, 2‖x_{kᵢ} − x₀‖].

From (i) we have x₀ ≠ z_{kᵢ}, and z_{kᵢ} lies in the open ball B(x₀, 2‖x_{kᵢ} − x₀‖). Thus from (iii) we obtain 0 ∈ ∂f(z_{kᵢ}) + 4εᵢ^{1/2}B₁ by results of Clarke [2], and so there exists z*_{kᵢ} ∈ ∂f(z_{kᵢ}) with ‖z*_{kᵢ}‖ ≤ 4εᵢ^{1/2}; therefore (z*_{kᵢ}) → 0 ∈ ∂f(x₀). Note that by (i)

\[ (z_{k_i} - x_{k_i})/\|x_{k_i} - x_0\| \to 0. \]

Then, for tᵢ = ‖x_{kᵢ} − x₀‖,

\[ \frac{z_{k_i} - x_0}{\|z_{k_i} - x_0\|} = \frac{[(z_{k_i} - x_{k_i})/t_i] + [(x_{k_i} - x_0)/t_i]}{\|(z_{k_i} - x_{k_i})/t_i + (x_{k_i} - x_0)/t_i\|} \to u. \]

Hence the properties (a′), (b′) in the Remark after Definition 2.2 are satisfied by the sequences (z_{kᵢ}), (z*_{kᵢ}). It follows from the definition of f''_-, (ii) and the inequality f(x_{kᵢ}) ≤ f(x₀) that

\[ f''_-(x_0;0,u) \le \liminf_{i\to\infty} \frac{1}{\|z_{k_i} - x_0\|^2}\{ f(z_{k_i}) - f(x_0) \} \le 0. \qquad\Box \]
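The interplay of the first- and second-order conditions in the sufficiency theorem that follows can be seen on the nonsmooth function f(x₁,x₂) = |x₁| + x₂² at x₀ = 0 (an example of our own choosing). There D₋f(0;u) = |u₁| ≥ 0, the critical directions are u = (0, ±1), and along them the quotient of Definition 2.2 with x* = 0 tends to 1 > 0, so 0 is a strict local minimizer.

```python
import math

f = lambda x1, x2: abs(x1) + x2 * x2     # our example; minimum at the origin

def lower_dini(u1, u2, t=1e-7):
    # D_- f(0; u) ~ [f(t*u) - f(0)] / t
    return (f(t * u1, t * u2) - f(0.0, 0.0)) / t

def chaney_quotient(u1, u2, t=1e-4):
    # (1/t^2) * [f(t*u) - f(0) - <0, t*u>]   (here x* = 0 in d_u f(0))
    return (f(t * u1, t * u2) - f(0.0, 0.0)) / t**2

# (i): D_- f(0; u) >= 0 for a sample of unit directions
angles = [k * math.pi / 12 for k in range(24)]
print(all(lower_dini(math.cos(a), math.sin(a)) >= -1e-9 for a in angles))  # -> True

# (ii): along the critical directions u = (0, +-1) the second-order
# quotient is positive
print(chaney_quotient(0.0, 1.0), chaney_quotient(0.0, -1.0))   # both ~ 1.0
```

Since (i) and (ii) of Theorem 4.5 below hold, f(x) > f(0) near 0, which one can also verify directly here.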
This result provides a short proof of the following sufficient-condition theorem [14, Theorem 2.9] for the problem P.

Theorem 4.5 (Second-order sufficient conditions without constraints) Let x₀ ∈ W and suppose that

(i) D₋f(x₀;·) ≥ 0 on ℝⁿ,
(ii) f''_-(x₀;0,u) > 0 whenever D₋f(x₀;u) = 0 and u ≠ 0.

Then there exists δ > 0 such that f(x) > f(x₀) for all x with 0 < ‖x − x₀‖ < δ.

Indeed, if not, there exists a sequence (x_k) in W such that x_k → x₀, x_k ≠ x₀ and f(x_k) ≤ f(x₀). By compactness of the unit ball in ℝⁿ we may assume without loss of generality that x_k →_u x₀ for some unit vector u. Then D₋f(x₀;u) = 0 by (i), and hence f''_-(x₀;0,u) ≤ 0 by Theorem 4.4. This contradicts assumption (ii).

Remark. Theorem 1 of Chaney in [9] is a weak form of the above result: he assumed the following stronger condition in place of (ii):

(ii)* f''_-(x₀;0,u) > 0 for all unit vectors u in ℝⁿ for which 0 ∈ ∂_u f(x₀).

Corollary 4.6 Let x₀ ∈ W and suppose that

(i) v·u ≥ 0 for all unit vectors u in ℝⁿ and all v ∈ ∂_u f(x₀),
(ii) f''_-(x₀;0,u) > 0 for all unit vectors u in ℝⁿ for which D₋f(x₀;u) = 0.

Then there exists δ > 0 such that f(x) > f(x₀) for all 0 < ‖x − x₀‖ < δ.

To see this corollary, we need only show that there exists v₋ ∈ ∂_u f(x₀) such that D₋f(x₀;u) = v₋·u. To this end, for each t > 0 we apply the Mean Value Theorem of Lebourg [10, Theorem 2.3.7] to obtain αₜ ∈ (0,t) and vₜ ∈ ∂f(x₀ + αₜu) such that

\[ \frac{1}{t}\{ f(x_0 + tu) - f(x_0) \} = v_t \cdot u. \]

Passing to the lower limit, we get D₋f(x₀;u) = liminf_{t↓0} vₜ·u. Since the multifunction x ↦ ∂f(x) is closed and locally takes values in a compact set [10], we can choose a sequence (tₙ) ↓ 0 such that limₙ→∞ v_{tₙ} = v₋ for some v₋ ∈ ∂f(x₀). Clearly v₋ has the desired properties, as (x₀ + α_{tₙ}u) converges to x₀ in the direction u.

In connection with the constrained problem (P_c), let x₀ ∈ W be feasible (i.e. x₀ satisfies the given constraints). Let us say that a locally Lipschitz function F : W → ℝ is an allied function to f at x₀ if F(x) ≤ F(x₀) whenever x ∈ W is feasible and f(x) ≤ f(x₀). For example, if β = (βᵢ)ᵢ₌₀^{m+p} ∈ ℝ^{1+m+p} is a "Lagrange multiplier compatible with x₀", that is, if β₀, β₁, …, β_m ≥ 0 and β_{m+1}, …, β_{m+p} ∈ ℝ with Σᵢ₌₀^{m+p} βᵢ² = 1 are such that βᵢgᵢ(x₀) = 0
for all i = 1, …, m + p, then the corresponding Lagrange function L, defined by

\[ L(x) := \beta_0 f(x) + \sum_{i=1}^{m+p} \beta_i g_i(x), \]

is an allied function to f at x₀.

Theorem 4.7 Let L : W → ℝ be an allied function of f at x₀ such that D₋L(x₀;·) ≥ 0 on ℝⁿ. Let (x_k) be a sequence of feasible points in W convergent to x₀ in the direction u, and suppose that f(x_k) ≤ f(x₀) for all k. Then 0 ∈ ∂_u L(x₀) and L''_-(x₀;0,u) ≤ 0.

Proof. By assumption, L(x_k) ≤ L(x₀) for each k. The result now follows from Theorem 4.4, applied to L in place of f.

Part (ii) of the following theorem is a restatement of Theorem 4.7, while (i) is easy to verify.

Theorem 4.8 Let (x_k) be a sequence of feasible points such that x_k →_u x₀ and f(x_k) ≤ f(x₀) for each k. Then the following statements hold:

(i) u ∈ K_S(x₀) and D₋f(x₀;u) ≤ 0;
(ii) if L is an allied function of f at x₀ such that D₋L(x₀;·) ≥ 0 on ℝⁿ, then 0 ∈ ∂_u L(x₀) and L''_-(x₀;0,u) ≤ 0.

Let U denote the set of all unit vectors u in ℝⁿ with the following property: there exists a sequence (x_k) of feasible points such that x_k →_u x₀ and f(x_k) ≤ f(x₀) for all k.

Theorem 4.9 (Second-order sufficient condition with constraints) Suppose that for each u ∈ U there exists an allied function L of f at x₀ such that D₋L(x₀;·) ≥ 0 on ℝⁿ and L''_-(x₀;0,u) > 0. Then there exists δ > 0 such that f(x) > f(x₀) for every feasible x ≠ x₀ satisfying ‖x − x₀‖ < δ.

Remark. This result clearly improves an earlier result of ours [14, Theorem 5.7], which in turn extends a result of Chaney [9], where the function L is required to satisfy a regularity condition.

Proof. Suppose not: then there exists a sequence (x_k) of feasible points such that x_k → x₀ and f(x_k) ≤ f(x₀) for each k. Without loss of generality we may assume that x_k →_u x₀ for some unit vector u. Then u ∈ U. By assumption, there exists an allied function L of f at x₀ such that D₋L(x₀;·) ≥ 0 and L''_-(x₀;0,u) > 0. This contradicts (ii) of Theorem 4.8. □

For a discussion of the accompanying necessary-condition theorems for (P) and (P_c) we refer the reader to [14]. Results in terms of the second-order derivatives of Cominetti and Correa [11] are also studied in [4].

Acknowledgements. The authors gratefully acknowledge financial support from the Research Grants Council of Hong Kong, the United College and the Institute of Mathematical Sciences, Chinese University of Hong Kong.
References

[1] J.-P. Aubin and I. Ekeland, Applied Nonlinear Analysis (John Wiley & Sons, 1984).
[2] A. Ben-Tal, Second-order and related extremality conditions in nonlinear programming, Journal of Optimization Theory and Applications 31 (1980) 143-165.
[3] A. Ben-Tal and J. Zowe, Necessary and sufficient optimality conditions for a class of nonsmooth minimization problems, Mathematical Programming 24 (1982) 70-91.
[4] W. L. Chan, L. R. Huang and K. F. Ng, On generalized second-order derivatives and Taylor expansions in nonsmooth optimization, SIAM Journal on Control and Optimization 32 (1994) 591-611.
[5] R. W. Chaney, On second derivatives for nonsmooth functions, Nonlinear Analysis: Theory, Methods and Applications 9 (1985) 1189-1209.
[6] R. W. Chaney, Second-order directional derivatives for nonsmooth functions, Journal of Mathematical Analysis and Applications 128 (1987) 495-511.
[7] R. W. Chaney, Second-order necessary conditions in constrained semismooth optimization, SIAM Journal on Control and Optimization 25 (1987) 1072-1081.
[8] R. W. Chaney, Second-order necessary conditions in semismooth optimization, Mathematical Programming 40 (1988) 95-109.
[9] R. W. Chaney, Second-order sufficient conditions in nonsmooth optimization, Mathematics of Operations Research 13 (1988) 660-673.
[10] F. H. Clarke, Optimization and Nonsmooth Analysis (Wiley-Interscience, New York, 1983).
[11] R. Cominetti and R. Correa, A generalized second-order derivative in nonsmooth optimization, SIAM Journal on Control and Optimization 28 (1990) 789-809.
[12] C. N. Do, Generalized second-order derivatives of convex functions in reflexive Banach spaces, Transactions of the American Mathematical Society 334 (1992) 281-301.
[13] V. F. Dem'yanov and A. B. Pevnyi, Expansion with respect to a parameter of the extremal values of game problems, USSR Computational Mathematics and Mathematical Physics 14 (1974) 33-45.
[14] L. R. Huang and K. F. Ng, Second-order necessary and sufficient conditions in nonsmooth optimization, Mathematical Programming (1994) 379-402.
[15] L. R. Huang and K. F. Ng, On some relations between Chaney's generalized second-order directional derivative and that of Ben-Tal and Zowe, SIAM Journal on Control and Optimization (to appear).
[16] L. R. Huang and K. F. Ng, On lower bounds of the second-order directional derivatives of Ben-Tal-Zowe and Chaney, submitted.
[17] A. D. Ioffe, Calculus of Dini subdifferentials of functions and contingent coderivatives of set-valued maps, Nonlinear Analysis: Theory, Methods and Applications 8 (1984) 517-539.
[18] H. Kawasaki, Second-order necessary and sufficient optimality conditions for minimizing a sup-type function, Applied Mathematics and Optimization 26 (1992) 195-220.
[19] R. Mifflin, Semismooth and semiconvex functions in constrained optimization, SIAM Journal on Control and Optimization 15 (1977) 959-972.
[20] J.-P. Penot, Generalized higher order derivatives and higher order optimality conditions, preprint, Universite de Pau, 1985.
[21] J.-P. Penot, Second-order generalized derivatives: comparisons of two types of epi-derivatives, Lecture Notes in Economics and Mathematical Systems 382 (1992) 52-76.
[22] R. T. Rockafellar, First- and second-order epi-differentiability in nonlinear programming, Transactions of the American Mathematical Society 307 (1988) 75-108.
[23] R. T. Rockafellar, Second-order optimality conditions in nonlinear programming obtained by way of epi-derivatives, Mathematics of Operations Research 14 (1989) 462-484.
[24] M. Studniarski, Second-order necessary conditions for optimality in nonsmooth nonlinear programming, Journal of Mathematical Analysis and Applications 154 (1991) 303-317.
Recent Advances in Nonsmooth Optimization, pp. 172-192
Eds. D.-Z. Du, L. Qi and R.S. Womersley
©1995 World Scientific Publishing Co Pte Ltd

On the Solution of Optimum Design Problems with Variational Inequalities

Michal Kočvara and Jiří V. Outrata
Institute of Information Theory and Automation, Czech Academy of Sciences, Pod Vodárenskou věží 4, 182 08 Praha 8, Czech Republic

Abstract

The paper deals with the numerical solution of a class of optimum design problems in which the controlled systems are described by elliptic variational inequalities. The approach is based on the characterization of (discretized) system operators by means of generalized Jacobians and the subsequent usage of nondifferentiable optimization methods. As an application, two important shape design problems are solved.
1  Introduction

Optimal control problems in which the controlled systems are governed by monotone variational inequalities arise frequently in economic modelling and optimum design. In spite of the fact that the basic theoretical questions, connected with the existence of solutions, optimality conditions and suitable approximations, have already been successfully answered, the numerical solution of such problems remains a difficult task. A classical regularization technique ([8]) consists in converting the variational inequality into an equation by means of a smooth penalty. In this way one obtains a standard optimal control problem, for the solution of which various effective methods are available. However, the smooth penalization leads either to low accuracy or to ill-conditioned problems. Moreover, the incidental presence of state-space constraints, which are mostly also treated by a smooth penalization, makes a suitable choice of the individual penalty parameters extremely cumbersome. Similar difficulties can be expected when using the penalty approach of [6], where the resulting optimization problem has to be solved on the product of the state and the control space.
In this paper we develop another approach, started in [18], in which, under suitable assumptions, we handle the variational inequality as a nondifferentiable controlled system by using the tools of nondifferentiable analysis [2]. As already shown in [19] and [17], in this way one can achieve a substantially higher accuracy compared with the regularization technique. Moreover, the incidental state-space constraints may be treated by exact penalties, which further contributes to the quality of the results. In spirit, our approach is close to the heuristic algorithms suggested in [5], but we do not use any heuristic reasoning in its development¹.

Assume that U and Y are Banach spaces, B[U → Y*] is a linear continuous operator, the map A assigns to controls u linear selfadjoint elliptic ([8]) operators mapping Y into Y*, and K ⊆ Y is a nonempty closed convex set. We confine ourselves to controlled systems governed by the following variational inequality: for a given control u ∈ U, find the state variable y ∈ K such that

\[ \langle A(u)y - Bu,\; v - y \rangle \ge 0 \quad \text{for all } v \in K. \tag{1} \]

Since A(u) is a selfadjoint operator, solving (1) for a given control u amounts to solving the convex program

\[ \tfrac12 \langle y, A(u)y \rangle - \langle Bu, y \rangle \to \inf \quad \text{subject to} \quad y \in K, \tag{2} \]

which clearly possesses a unique solution due to the ellipticity of A(u) [8]. Also, if we discretize (1), e.g. by a finite element (FE) method, a finite-dimensional approximation y of y, corresponding to a finite-dimensional approximation u of u, can be found as the unique solution of a finite-dimensional convex quadratic programming problem of the form (2). Therefore it is reasonable to investigate the optimization of systems governed by (1) (after an appropriate discretization) in the more general framework of two-level optimization problems

\[ \theta(u, y) \to \inf \quad \text{subject to} \quad y \in \operatorname*{arg\,min}_{v \in \Sigma} \varphi(u, v), \quad u \in \omega, \tag{3} \]

where u ∈ ℝᵐ, y, v ∈ ℝⁿ, Σ is a closed convex subset of ℝⁿ, ω is a compact subset of ℝᵐ, and for any fixed u₀ ∈ ω the lower-level problems

\[ \varphi(u_0, v) \to \inf \quad \text{subject to} \quad v \in \Sigma \tag{4} \]
¹After this paper had been finished, the authors learned about another approach proposed by J.S. Pang. This approach relies on a combination of interior and exterior penalties. It requires, however, strict complementarity to be satisfied at the solution point, which is frequently violated in the applications discussed in Section 3.
satisfy a number of assumptions. Problems (3) belong to the class of so-called Stackelberg problems and have already been intensively studied from various points of view ([20,1,15]). Under the assumptions of [3], guaranteeing that the map S : u ↦ arg min_{v∈Σ} φ(u,v) is a locally Lipschitz operator, (3) may be rewritten in the form

\[ \Theta(u) = \theta(u, S(u)) \to \inf \quad \text{subject to} \quad u \in \omega. \tag{5} \]
The term subgradient is borrowed from convex analysis, where it is used for the vectors belonging to the subdifferential. Recently, however, it is frequently used also in a nonconvex setting for the elements of various generalized gradients.
Optimum Design Problems with VariationaJ Inequalities
175
The following notation is employed: A/"(C) is the null space of a linear operator C, S'(x0; d) is the directional derivative of an operator 5 at a point x0 in the direction d, Pi denotes the subspace of the polynomials of the first-order and x' is the i t h coordinate of a vector x G IRm and for x, y G R n the inequalities x > y (x > y) mean x' > y" (x* > y') for all i. H1, H£ are the usual Sobolev spaces W1-2, W01,2 and A denotes the closure of a set A.
2
Computation of the Subgradients
Let S = {yeB."|**'(y)<0, 1 = 1,2,...,?},
(6)
where functions $*[R" —* R] are convex and twice continuously differentiable. As sume that there exists an open subset V of R m containing u> such that
0, 0,
i = l,2,...,P,
. . >
(
v . . where £(u,y,A) = <^>(u,y) + *£. A'$'(y) is the standard Lagrangian. To be able to utilize the strong results of [3, 10], we impose still the Strong Second-Order Sufficient Condition, i.e., (A2) for all u 0 G V, y 0 G 5(u 0 ) and h G R m , h + 0, one has (h,V 2 , y £(u 0 ,yo,A 0 )h) > 0 , whenever (V*'(y 0 ), h> = 0
for
i G J(y 0 ) := {} € 7(y 0 ) | AJ0 > o}
Under (Al) and (A2), S is single-valued on V; moreover, it is locally Lipschitz ([3]) and directionally differentiable ([10, 13] ). The same is true about the operator A[R m —t R p ] , assigning to vectors u 0 G V the corresponding Kuhn-Tucker vectors An. It is well-known that if the strict complementarity condition 7(y 0 ) = J{yo) holds
176
M. Kocvara, and J. V. Outrata
at some y„ = S(u 0 ), u 0 G V, then 5 is even difFerentiable at u 0 ([4]). The gradient VS(uo) is in this case given as the operator which assigns to an arbitrary vector z G R m the (unique) solution of the quadratic program i(v,Q(u 0 )v> +
(8) i/(y„)(uo), v G £/(y„)(uo),
where Q(u0) = V ^ y £ ( u 0 , y 0 ) A0) and for an arbitrary index set G C { 1 , 2 , . . . ,p] LG(u)) = {{vv G £ RR"" | | (V*'(S(uo)),v) (V$ i (5(u 0 )),v) = 0, it 6G G) G}
(9)
If the differentiability of S at some u 0 G V is not ensured by the strict comple mentarity condition, we need for the evaluation of a subgradient from d 0 ( u o ) one arbitrary matrix from the generalized Jacobian dS(u0). Such matrices will now be constructed exactly according to Definition 1.1. Let y„ = S(u 0 ) and let the index set J(y 0 ) satisfy the inclusions (10)
J(yo) C J(y 0 ) C 7(y 0 ).
We denote A = 7(y 0 ) \ J{y0), & = J(y0) \ J(yo), o,ouo2,33 the cardinalities of I(yo),J(yo), A and B, respectively, and C(y 0 ) the [o x m] matrix, composed from V$'(y 0 ), i G 7(y 0 ), as rows. Evidently, C(y 0 ) may be divided into three matrices Cj{y0),CA{y0) and Cg(y0), composed from V$'(y 0 ) for i G J(y0),i G A and i G B, respectively. P r o p o s i t i o n 2.1 Let assumptions (Al), (A2) hold, u 0 G V,y 0 = S(u„) and let the index set J(y 0 ) satisfy inch (10). Assume that the linear system --C CTjA({yo)y\ yo)yI
+ Q(u 00)y^ )y; + y2* + Ce(y0)y5 )y*3
does not possess a solution (y-uy'2,y'3,y'4,yl) the conditions
(yi.y»)>o,
+
Cj(y 0 )yJ + C j ( y 0 ) y ; 2
g R° x R
03
xR
m
=0 =0
l[ii llj
>
3
x R°> x R° , satisfying
(yr,y;)^o
yie^((VXu 0t y 0 )f)n^(C J (y 8 )).
(12)
TTien tne operator Pj (u„) wfcicft assijw to an arbitrary vector z G R " «Ae fwj'^e) solution of the quadratic program i(v,Q(u0)v) + ( V V ^^(U u 0c, y oo)Kz ,vv )) -> inf subject to
(13) v G 6 7j(yo)(uo) ij(yo)(uo)
belongs to
dS(u0).
Optimum Design Problems with Variational Inequalities
177
Proof. In the first step we show the existence of a direction h e R " for which J ( S ( u 0 + Mi)) = 7(5(u 0 + Mi)) 0h)) = J(y J(y00))
(14)
for all sufficiently small positive t). By using of the directional derivatives of S and A, this condition may be rewritten into the form C^(y )5'(u 0 ;h) C4(y 0 )S'(u (A')'(u 0 ;h)
< 0 >0
iovzeB. fortGB.
(15)
Denote by A/(u 0 ) the subvector of A(u 0 ) composed from the multipliers, correspond ing to active constraints. Again, A/(u 0 ) may be decomposed into Aj(u 0 ),A^(u 0 ), A B (u 0 ) in the same way as C(y 0 ). The vectors S'(u 0 ;h), A' 7 (u 0 ;h) form the unique Kuhn-Tucker point of a special quadratic program with the constraints Cj(y0)S'(u0; h) = 0, CA(y0)S'{\i0;h) < 0 and C e (y 0 )S'(u 0 ; h) < 0, (cf. [10]), for which the KuhnTucker conditions attain the form Q(u 0 )S'(u )5'(u 0 ; h) + V ^ ( U o , y 0 )h + C r (y 0 )A',(uo; )A' / (u 0 ; h) = 0 (A')'(u 0 ; h ) ( V $ ' ( y 0 ) , 5'(u 0 ; h)) = 0, (A')'(u 0 ; h) > 0
for i g I(y0) \ J(y„).
(16)
By combining of relations (15),(16) and using the complementarity argument, one immediately concludes that the desired direction h exists whenever the linear system of equalities and inequalities Q(u 0 )S'(u 0 ;h) + V ^ ( u 0 , y 0 ) h + C Cj(yo)A^(u j ( y 0 ) A ^ u 00;;y y 00 ) + CjA' e (u 0 ;y 0 ) = 0 Cj(y0)S'(u0;
h) = 0,
C^(yo)5'(u0;h)<0,
C e (y 0 )S'(uo; h) = 0
(17)
A'B(uo;yo)>0
is consistent. Its consistency is according to the well-known Motzkin theorem of the alternative ([16]) equivalent to the inconsistency of (11), (12) and thus, under the assumptions imposed, the strict complementarity condition holds at the points u 0 + tfh for all tf > 0 sufficiently small. Consequently, S is differentiable at these points and VS(u 0 + tf h) is the operator, assigning to an arbitrary vector z € R m the unique solution v of the quadratic program rfh)v) - (V£u¥>(u0 + Ml, t?h, (5u (Su 0 + Mi))z, tfh))z, v) —► inf | ( v , Q(u 0 + Mi)v) —> inf subject to
(18) vv e G iLj. -
.(u 0 + Mi). (uo-Mh).
With respect to the definition of the generalized Jacobian, it remains to prove that VS(u 0 + tfh) converges for d | 0 to ^J (yo )(uo). To this purpose we denote u,, = u 0 + i?h and observe that V vs(u,) 5W = = r(u,)o(-v>K,s(u,))), rK)o(-V>(u,,5(u,))),
178
M. Kocvara and J. V. Outiata.
where T(u^) is the projection operator which projects (Q(u^)) 1 d, d € E. m , onto Ly, Ju#) in the Q(u#)— metric. Due to the continuity assumptions being imposed, T as well as V yu <^(-, 5(-)) depend continuously on xij over V so that limV5(u,) = P7(yo)(u0)e55(uo) by definition.
□
R e m a r k . In the particular case J{yo) = ^(yo), the variable y j disappears and in (12) we have to require y j > 0, yj ^ 0. Analogously, in the case J(yo) = ^(yo), the whole second equation of (11) disappears and in (12) we have to require y* > 0,
rt± o. Of course, the satisfaction of the above conditions can hardly be tested in the presented general form. Fortunately, these conditions may be drastically simplified in the case, when $'(y) = — y \ i = 1,2,... ,p, arising frequently in applications. Let us delete from Q(uo) and (Vy U c^(u 0 ,yo)) r all columns, corresponding to in dices i 6 J(yo) a n d denote these new [m x (m — ox)] and [n x (m — oi)] matrices by Q(uo) and F(\io), respectively. Corollary 2.2 Let p < m, $'(y) = —y',i = 1,2, . . . , p , let assumption (A2) hold, Uo £ V and yo = 5(uo). Suppose that n + p > m + o2 + o 3 ,
(19)
and the \(n -f p — o) x (m — Oi)] matrix, composed from F(UQ) and the rows of Q(u 0 ) corresponding to nonactive constraints, has maximal rank, i.e. m — 0\. Then the assertion of Proposition 2.1 holds true. Proof. We observe first that if (19) holds as equality, we have to test a square matrix of order m — o%; otherwise, the number of rows is greater than the number of columns so that the maximal rank of this matrix is indeed m — o\. The rows of the matrix C(y 0 ) possess nonzero elements only on positions specified by the index of the corresponding inequality. Since y j 6 A/"(C,/(yo)), one has (yj)' = 0 for t G -/(yo) and thus it suffices to consider only the remaining components. We denote by y\ the subvector of y*z composed from (y^)'',» £ ./(yo), and observe that the condition y% G M ((Vy U (^(u 0 ,yo)) T ) reduces to y\ 6 Af(F(u0)) and the first equation of (11) reduces to the form -C.S(yo)y! + Q(»°)r3 + Cj(y0)yl
+ Cj(y 0 )y 5 * = 0.
As clearly ( - C j ( y o ) y ! + Cj(y0)y;
+ Cj(y0)y5*)* = 0 for all i ? 7(y 0 ),
Optimum Design Problems with Variational Inequalities
we conclude that necessarily
$$(Q(u_0)y_3^*)^i = 0 \quad \text{for all } i \notin I(y_0). \qquad (20)$$
Equations (20), together with the condition $\tilde{y}_3^* \in \mathcal{N}(F(u_0))$, form a homogeneous linear system whose solution must be zero due to the imposed rank condition. Thus the whole vector $y_3^* = 0$ and we immediately infer that the linear system (11) and conditions (12) are inconsistent, the linear independence assumption (A1) being evidently satisfied. □

Condition (19) restricts the admissible number of active constraints whose multipliers are zero. It is not too restrictive, because usually $p = m$ and thus it reduces to $n \geq o_2 + o_3$. In the examples solved, $n$ was always larger than $o_2 + o_3$ and thus the criterion of the above corollary could be applied.

For the evaluation of subgradients from $\bar{\partial}\Theta(u_0)$ it is not necessary to compute a matrix from $\bar{\partial}S(u_0)$ according to Proposition 2.1. Instead, we can apply the idea of the adjoint program ([18]), which leads to the following assertion.

Proposition 2.3 Assume that $g$ is continuously differentiable on $\mathcal{V} \times \mathbb{R}^m$, assumptions (A1), (A2) hold, $u_0 \in \mathcal{V}$, $y_0 = S(u_0)$ and $\hat{J}(y_0)$ is an arbitrary index set satisfying inclusion (10). Let the linear system (11) possess no solution satisfying conditions (12), and let $p_0$ be the (unique) solution of the adjoint quadratic program
$$\tfrac{1}{2}(p, Q(u_0)p) - (\nabla_y g(u_0,y_0), p) \to \inf \quad \text{subject to} \quad p \in L_{\hat{J}(y_0)}(u_0). \qquad (21)$$
Then
$$\nabla_u g(u_0,y_0) - (\nabla_{yu}\hat{g}(u_0,y_0))^T p_0 \in \bar{\partial}\Theta(u_0). \qquad (22)$$
Proof. According to the Jacobian Chain Rule ([2]),
$$\xi := \nabla_u g(u_0,y_0) + (P_{\hat{J}(y_0)}(u_0))^T \nabla_y g(u_0,y_0) \in \bar{\partial}\Theta(u_0).$$
As already mentioned in the proof of Proposition 2.1,
$$P_{\hat{J}(y_0)}(u_0) = -\Gamma(u_0)\,\nabla_{yu}\hat{g}(u_0,y_0),$$
where $\Gamma(u_0)$ projects $(Q(u_0))^{-1}d$, $d \in \mathbb{R}^m$, onto $L_{\hat{J}(y_0)}(u_0)$ in the $Q(u_0)$-metric. This projector is symmetric (cf. [18]) and so
$$\xi = \nabla_u g(u_0,y_0) - (\nabla_{yu}\hat{g}(u_0,y_0))^T\,\Gamma(u_0)\,\nabla_y g(u_0,y_0).$$
However, $\Gamma(u_0)\nabla_y g(u_0,y_0)$ is nothing else but the solution $p_0$ of our adjoint program (21), and thus inclusion (22) holds true. □

This way of computing subgradients has been applied in the design problems investigated in the next section.
M. Kocvara and J. V. Outrata

3  Two Selected Optimum Design Problems
In this section we present two examples of shape design problems. In both examples, the controlled system describes the behaviour of a membrane supported by a rigid obstacle. In the first problem we want to find a shape of the membrane such that its surface is minimized while it remains in contact with a given part of the obstacle. In the second problem we do not care about the surface, but we want to confine the contact between the membrane and the obstacle exclusively to that given part. However, this part can now move, together with the boundary of the membrane. The first optimum design problem, known as the packaging problem, has been analyzed and numerically solved in [7, 8] via the above-mentioned regularization technique. In the following we show that the results obtained by the technique proposed in this paper are substantially different, both in terms of the design variables and of the objective function. The second, more complicated problem, known as the incidence set identification problem, has been analyzed in [9]; however, as far as we know, it has not been numerically solved yet. We start with the description of the controlled system, which is the same for both problems.
3.1  Membrane with a rigid obstacle
Let us introduce the set of admissible design variables
$$U_{ad} = \left\{ u \in C^{0,1}([0,1]) \;\middle|\; 0 < c_1 \leq u(x_2) \leq c_2,\ \left|\tfrac{du}{dx_2}(x_2)\right| \leq c_3 \text{ a.e. in } (0,1) \right\},$$
where $c_1, c_2, c_3$ are given positive constants such that $U_{ad} \neq \emptyset$. Consider a family of admissible domains $\Omega(u)$ with a variable right "vertical" part of the boundary:
$$\Omega(u) = \{(x_1, x_2) \in \mathbb{R}^2 \mid 0 < x_1 < u(x_2) \text{ for all } x_2 \in (0,1)\}.$$
Denote by $\hat{\Omega} = (0, c_2) \times (0,1)$ the largest domain of this family. Let $\psi \in C(\overline{\hat{\Omega}})$ be the obstacle function such that $\psi < 0$ on $\partial\hat{\Omega} \cup ((c_1, c_2) \times (0,1))$, and let
$$K(u) = \{v \in H^1_0(\Omega(u)) \mid v \geq \psi \text{ a.e. in } \Omega(u)\}$$
be the set of admissible states. For $u \in U_{ad}$, the corresponding state of the controlled system is computed by solving the variational inequality $\mathcal{P}(u)$: Find $v = v(u) \in K(u)$ such that
$$(\nabla v, \nabla(w - v))_{0,\Omega(u)} \geq (f, w - v)_{0,\Omega(u)} \quad \text{for all } w \in K(u),$$
where $(\cdot,\cdot)_{0,\Omega(u)}$ stands for the scalar product in $L^2(\Omega(u))$ and $f \in L^2(\hat{\Omega})$. In the notation of (1), $A(u)v = -\Delta v$ on $\Omega(u)$ and $B(u) = \mathcal{E}f$, where $\mathcal{E}$ is the canonical embedding of $L^2(\Omega)$ into $H^{-1}(\Omega)$; however, the convex set $K$ is in $\mathcal{P}(u)$ replaced by a set-valued map $K(u)$.
Figure 1: Discretization of $\Omega$ and $u$.

The state problem $\mathcal{P}(u)$ describes the deflection of a membrane fixed on the boundary $\partial\Omega(u)$, loaded by the pressure $f$ and supported from below by a rigid obstacle described by $\psi$. We discretize $\mathcal{P}(u)$ by the finite element method in the following way. Let $0 = a_0 < a_1 < \cdots < a_{D(h)} = 1$ be a partition of $[0,1]$. The discretization of $U_{ad}$ is defined as follows:
$$U^h_{ad} = \left\{ u_h \in C([0,1]) \;\middle|\; u_h|_{[a_{i-1},a_i]} \in P_1,\ 0 < c_1 \leq u_h \leq c_2,\ \frac{|u_h(a_i) - u_h(a_{i-1})|}{a_i - a_{i-1}} \leq c_3,\ i = 1,\dots,D(h) \right\},$$
i.e., $U^h_{ad}$ contains the piecewise linear functions from $U_{ad}$. Further, we introduce a subset $\mathbf{U}_{ad} \subset \mathbb{R}^{D(h)}$, isometrically isomorphic with $U^h_{ad}$:
$$\mathbf{U}_{ad} = \{\mathbf{u} \in \mathbb{R}^{D(h)} \mid u^i = u_h(a_i) \text{ for some } u_h \in U^h_{ad},\ i = 1,\dots,D(h)\},$$
i.e., it is the set of vectors of $x_1$-coordinates of the nodes $\mathbf{u}^i = (u_h(a_i), a_i)$, $i = 1,\dots,D(h)$.

For $u_h \in U^h_{ad}$ we define a polygonal computational domain
$$\Omega(u_h) = \{(x_1, x_2) \in \mathbb{R}^2 \mid 0 < x_1 < u_h(x_2),\ 0 < x_2 < 1\}$$
and construct its triangulation $\mathcal{T}(h, u_h)$, depending on the mesh parameter $h$ as well as on $u_h$ and consisting of two parts: the fixed triangulation of the rectangle $(0, c_0] \times [0,1]$ with $c_0 < c_1$, and the moving triangulation constructed by means of principal moving nodes (design nodes) $\mathbf{u}^i$ and associated moving nodes, the $x_1$-coordinates of which are given by an equidistant partition of the segments $[c_0, u_h(a_i)]$; see Fig. 1. Thus, for a fixed $h > 0$, the triangulation $\mathcal{T}(h, u_h)$ depends continuously on $u_h$. The domain $\Omega(u_h)$ with a given triangulation $\mathcal{T}(h, u_h)$ will be denoted by $\Omega_h$.
For the discretization of $\mathcal{P}(u)$ we use the triangulation $\mathcal{T}(h, u_h)$ and the set of piecewise linear basis functions (Courant basis functions). In the standard way we obtain the stiffness matrix $\mathbf{A}(\mathbf{u}) \in \mathbb{R}^{N \times N}$ and the right-hand side vector $\mathbf{f}(\mathbf{u}) \in \mathbb{R}^N$, continuously depending on a given $\mathbf{u} \in \mathbf{U}_{ad}$ ($N = N(h)$ is the number of nodes of the triangulation $\mathcal{T}(h, u_h)$). Denote by $x^i$, $i = 1,\dots,N$, the nodes of $\mathcal{T}(h, u_h)$ and set $\Psi^i = \psi(x^i)$, $i = 1,\dots,N$. The discretized problem can be written as the quadratic programming problem
$$(\mathcal{P}(\mathbf{u}))_h \qquad \tfrac{1}{2}(\mathbf{v}, \mathbf{A}(\mathbf{u})\mathbf{v}) - (\mathbf{f}(\mathbf{u}), \mathbf{v}) \to \inf \quad \text{subject to} \quad \mathbf{v} \in K_h(\mathbf{u}),$$
where $\mathbf{u} \in \mathbf{U}_{ad}$ and
$$K_h(\mathbf{u}) = \{\mathbf{v} \in \mathbb{R}^N \mid v^i \geq \Psi^i,\ i = 1,\dots,N\}.$$
Since the triangulation $\mathcal{T}(h, u_h)$ depends on $u_h \in U^h_{ad}$, the same holds for $\Psi(\mathbf{u})$. In order to be able to use the direct method described in Section 2, we perform the simple transformation $\mathbf{y} = \mathbf{v} - \Psi(\mathbf{u})$ and replace $(\mathcal{P}(\mathbf{u}))_h$ by the problem
$$(\mathcal{P}(\mathbf{u}))_h \qquad \tfrac{1}{2}(\mathbf{y}, \mathbf{A}(\mathbf{u})\mathbf{y}) + (\mathbf{A}(\mathbf{u})\Psi(\mathbf{u}) - \mathbf{f}(\mathbf{u}), \mathbf{y}) \to \inf \quad \text{subject to} \quad \mathbf{y} \geq 0.$$
In this way, the constraint set $K_h$ is replaced by $\mathbb{R}^N_+$ and $\mathbf{u}$ enters only the objective of $(\mathcal{P}(\mathbf{u}))_h$.
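The transformed problem is a bound-constrained quadratic program, equivalently a linear complementarity problem. The following sketch solves it by projected Gauss-Seidel; this is a generic textbook iteration, not the two-step algorithm of [13], and the toy stiffness matrix and data are purely illustrative:

```python
import numpy as np

def solve_obstacle_lcp(A, q, tol=1e-10, max_iter=10_000):
    """Projected Gauss-Seidel for the LCP arising from (P(u))_h:
    y >= 0,  A y + q >= 0,  <y, A y + q> = 0,
    i.e. the optimality conditions of
    min 0.5 <y, A y> + <q, y>  s.t.  y >= 0   (A symmetric positive definite).
    """
    n = len(q)
    y = np.zeros(n)
    for _ in range(max_iter):
        y_old = y.copy()
        for i in range(n):
            # residual without the diagonal contribution of y[i]
            r = q[i] + A[i] @ y - A[i, i] * y[i]
            y[i] = max(0.0, -r / A[i, i])
        if np.linalg.norm(y - y_old, np.inf) < tol:
            break
    return y

# Toy 1-D membrane analogue: tridiagonal stiffness matrix and a vector q
# playing the role of A(u) Psi(u) - f(u).
n = 5
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
q = np.array([-1.0, 0.1, 0.1, 0.1, -1.0])
y = solve_obstacle_lcp(A, q)
```

Because $\mathbf{y} \geq 0$ holds by construction, the complementarity information (which indices are active) needed for the index sets $I(\mathbf{y})$, $\hat{J}(\mathbf{y})$, $J(\mathbf{y})$ is read off directly from the computed solution.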
3.2  Packaging problem
In the packaging problem we try to find a control $u^*$ for which the contact of the membrane with the obstacle occurs in a given subset of $\hat{\Omega}$, while the surface of the membrane (i.e., the measure of $\Omega(u^*)$) is minimized. Let $\Omega_0$ be a given closed simply connected subset of $[0, c_0] \times [0,1]$. For $u \in U_{ad}$, denote by $Z(u)$ the contact region, i.e., the set $\{x \in \Omega(u) \mid v(x) = \psi(x)\}$, where $v$ is the solution of $\mathcal{P}(u)$. The packaging problem is defined as follows:
$$(P) \qquad G(u) = \operatorname{meas} \Omega(u) \to \inf \quad \text{subject to} \quad Z(u) \supset \Omega_0,\ u \in U_{ad},\ v \text{ solves } \mathcal{P}(u).$$
If the set $\{u \in U_{ad} \mid Z(u) \supset \Omega_0\}$ is nonempty, then (P) has at least one solution ([8, Thm. 6]). For the treatment of the state constraint $Z(u) \supset \Omega_0$, an exterior quadratic penalty technique has been proposed and analyzed in [7, 8]. However, as $v \geq \psi$, the direct method applied here allows us to augment this state constraint by a differentiable exact penalty. (This is impossible in the regularization technique used in [7], where the relationship $v \geq \psi$ does not hold.) One then obtains the augmented objective functional
$$G_r(u, v) = \operatorname{meas} \Omega(u) + r \int_{\Omega_0} (v - \psi)\, dx,$$
where $r > 0$ is the penalty parameter. Hence, instead of (P), we will solve the problem
$$(P_r) \qquad G_r(u, v) \to \inf \quad \text{subject to} \quad u \in U_{ad},\ v \text{ solves } \mathcal{P}(u).$$
In [8, Thms. 9.3, 9.4] it is shown that for the exterior quadratic penalty the penalized problem has at least one solution for any $r > 0$, and if $r \to \infty$, then $u_r$, the solutions of $(P_r)$, converge uniformly to $u^*$, the solution of (P). Analogously, the same can be proved for $G_r$ with the exact linear penalty.

The discretization of $(P_r)$ is straightforward. Let $\mathcal{D}_0$ be the set of indices of nodes lying in $\Omega_0$. The discretized problem reads as follows:
$$(P_r)_h \qquad E_r(\mathbf{u}, \mathbf{y}) = \operatorname{meas} \Omega_h + \frac{r}{h} \sum_{i \in \mathcal{D}_0} y^i \to \inf \quad \text{subject to} \quad \mathbf{u} \in \mathbf{U}_{ad},\ \mathbf{y} \text{ solves } (\mathcal{P}(\mathbf{u}))_h,$$
with $r > 0$. It is known ([8, Thm. 9.5]) that if $h \to 0+$, then $\mathbf{u}_r^h$, the solutions of $(P_r)_h$, converge uniformly to $u_r$, the solution of $(P_r)$.

Problem $(P_r)_h$ is now exactly of the form (3). It satisfies all requirements needed by the direct approach; in particular, the appropriate function $\Theta$ is locally Lipschitz and directionally differentiable. For the evaluation of its subgradients, relations (21) and (22) can be applied, which provides us with the formula
$$\nabla_{\mathbf{u}} E_r(\mathbf{u}, \mathbf{y}) - \left[\nabla_{\mathbf{u}}(\mathbf{A}(\mathbf{u})\mathbf{y} + \mathbf{A}(\mathbf{u})\Psi(\mathbf{u})) - \nabla\mathbf{f}(\mathbf{u})\right]^T \mathbf{p} \in \bar{\partial}\Theta(\mathbf{u}), \quad \text{where } \mathbf{y} \text{ solves } (\mathcal{P}(\mathbf{u}))_h, \qquad (23)$$
and the variable $\mathbf{p}$ solves the adjoint quadratic program
$$(24) \qquad \tfrac{1}{2}(\mathbf{p}, \mathbf{A}(\mathbf{u})\mathbf{p}) - (\nabla_{\mathbf{y}} E_r(\mathbf{u}, \mathbf{y}), \mathbf{p}) \to \inf \quad \text{subject to} \quad p^i = 0 \ \text{for } i \in \hat{J}(\mathbf{y}),$$
where $I(\mathbf{y}) \subset \hat{J}(\mathbf{y}) \subset J(\mathbf{y})$. In the computations performed, we have set $\hat{J}(\mathbf{y}) = I(\mathbf{y})$ during the whole iteration process. To be correct, one should apply the test of Corollary 2.1 at each point where the strict complementarity condition is violated. Fortunately, we have not met any computational difficulties, which shows the robustness of our approach when applied to this kind of controlled systems.

Example 3.1 (see [7]): Consider the packaging problem in which $\Omega_0 = [0.25, 0.5] \times [0.25, 0.75]$, $f(x_1,x_2) = -1.0$ and $\psi(x_1,x_2) = -0.05x_1$. The set $U_{ad}$ is specified by the parameters $c_1 = 0.6$, $c_2 = 1.0$, $c_3 = 3.0$, and the triangulation parameter $c_0 = 0.5$. First we have used (as in [7]) the quadratic penalty term $\frac{r}{2h}\sum_{i \in \mathcal{D}_0}(y^i)^2$ with penalty parameter $r = 10^4$. The problem has been computed by the NDO code BT [19]. The discretized problems $(\mathcal{P}(\mathbf{u}))_h$ have been solved by a two-step algorithm introduced in [13]. The results obtained for $h = \frac{1}{16}$ ($D(h) = 17$) are close to those of [7], at least concerning the optimal value of the objective ($E_r^{opt} = 0.784213$). However, the quadratic penalty used has led to considerable inaccuracies in the satisfaction of the state constraint (up to 4% of the deflection at the front corners of $\Omega_0$). Therefore, we have solved the problem with the exact penalty. The penalty term $\frac{r}{h}\sum_{i \in \mathcal{D}_0} y^i$ has been used for the discretizations given by $h = \frac{1}{8}, \frac{1}{16}, \frac{1}{32}, \frac{1}{64}$. The penalty parameters $r$ have been chosen large enough to satisfy the state constraint exactly. Their values, together with the corresponding optimal objective values $E_r^{opt}$, are given in Table 1.
TABLE 1

  D(h)    r         E_r^opt
  9       8·10^3    0.787932
  17      8·10^4    0.826013
  33      8·10^5    0.850895
  65      8·10^6    0.866364
The final design for $D(h) = 65$ is depicted in Figure 2. We see that in the set $\Omega_0$ the isolines of the solution follow the isolines of the obstacle (which are parallel to the $x_2$-axis). Comparing the values of $E_r^{opt}$ for the quadratic and the linear penalty, respectively, we see a significant difference (5% for $h = \frac{1}{16}$). Also, the resulting optimal design is quite sensitive to the exact satisfaction of the state constraint. This is even more evident in the next example.
Figure 2: Optimal design for Ex. 3.1, $D(h) = 65$.

Example 3.2 (see [6]): Let $\psi(x_1,x_2) = -0.05(x_1^2 + (x_2 - 0.25)^2)$ and all other data be the same as in Example 3.1. Again, the linear penalty has been used, for the discretizations given by $h = \frac{1}{8}, \frac{1}{16}, \frac{1}{32}, \frac{1}{64}$. The values of the penalty parameters $r$ and the corresponding optimal objective values $E_r^{opt}$ are given in Table 2.

TABLE 2

  D(h)    r          E_r^opt
  9       1.6·10^4   0.780361
  17      3·10^5     0.900842
  33      5·10^5     0.934860
  65      1·10^6     0.980475
The final design for the finest mesh ($D(h) = 65$) is depicted in Figure 3. Again, in $\Omega_0$, the isolines of the solution coincide with the isolines of the obstacle, which are depicted in Figure 4. The comparison of the obtained optimal design with that computed via the regularization technique and with the quadratic penalty is quite interesting in this example. When comparing the maximum components of the design vectors for the same discretization parameter $h = \frac{1}{16}$, we obtain a difference of 11%, i.e., the resulting designs differ substantially.
Figure 3: Optimal design for Ex. 3.2, D(h) = 65.
Figure 4: Isolines of obstacle for Ex. 3.2.
Note that it is necessary to use a nonsmooth code in this approach. Numerical tests performed with a smooth (SQP) code failed in most cases due to line-search difficulties at a point far from the true solution.
3.3  Incidence set identification
In the second design problem, that of incidence set identification ([8]), we do not care about the surface of $\Omega(u)$ as in the packaging problem, but we want to confine the contact between the membrane and the obstacle exclusively to $\Omega_0$. However, in contrast to the first problem, this set can move together with the moving boundary of $\Omega(u)$, because now
$$\Omega_0 = \{(x_1, x_2) \in \mathbb{R} \times [\gamma, \delta] \mid \omega_1(x_2) \leq x_1 \leq \omega_2(x_2)\},$$
where $0 < \gamma < \delta < 1$, $\omega_1, \omega_2 \in C^{0,1}([\gamma,\delta])$ have uniformly bounded derivatives a.e., and, for given positive scalars $\varepsilon, \Delta, c_4$,
$$\varepsilon \leq \omega_1(x_2) + \Delta \leq \omega_2(x_2) \leq c_4 \quad \text{for all } x_2 \in [\gamma, \delta].$$
In [8], two different objective functionals expressing the "identification" requirement have been proposed. In our approach we utilize the complementarity conditions to create another suitable objective, but we introduce it only after the problem has been discretized.

The discretization starts with the domain $\Omega_0$. Let $\gamma = b_0 < b_1 < \cdots < b_{D'(h)} = \delta$ be a partition of $[\gamma, \delta]$ corresponding to the partition $a_0 < \cdots < a_{D(h)}$ of $[0,1]$, i.e., such that the $x_2$-coordinates of the $b_i$ coincide with the $x_2$-coordinates of some $a_j$; see Figure 5. Define, for a given positive $c_5$ (specifying the upper bound for the derivatives), a new set of admissible (discretized) design variables corresponding to $\Omega_0$:
$$U'^h_{ad} = \left\{ \omega_h = (\omega_{1h}, \omega_{2h}) \in (C([\gamma,\delta]))^2 \;\middle|\; \omega_{jh}|_{[b_{i-1},b_i]} \in P_1,\ \frac{|\omega_{jh}(b_i) - \omega_{jh}(b_{i-1})|}{b_i - b_{i-1}} \leq c_5,\ j = 1,2; \right.$$
$$\left. 0 < \varepsilon \leq \omega_{1h} + \Delta \leq \omega_{2h} \leq c_4,\ i = 1,\dots,D'(h) \right\}$$
and, for $\omega_h = (\omega_{1h}, \omega_{2h}) \in U'^h_{ad}$, the associated discretized set
$$\Omega_{0h}(\omega_h) = \{(x_1, x_2) \in \mathbb{R} \times [\gamma, \delta] \mid \omega_{1h}(x_2) \leq x_1 \leq \omega_{2h}(x_2)\}.$$
For $u_h \in U^h_{ad}$ and $\omega_h = (\omega_{1h}, \omega_{2h}) \in U'^h_{ad}$, the principal moving nodes are given by the couples
$$(u_h(a_i), a_i),\ i = 1,\dots,D(h), \qquad (\omega_{1h}(b_i), b_i),\ (\omega_{2h}(b_i), b_i),\ i = 1,\dots,D'(h),$$
Figure 5: Discretization of $\Omega$, $u$, $\omega_1$ and $\omega_2$.

and the $x_1$-coordinates of the associated moving nodes are given by an equidistant partition of the segments
$$[0, u_h(a_i)] \quad \text{for } x_2^i < \gamma \text{ or } x_2^i > \delta,$$
$$[0, \omega_{1h}(b_i)],\quad [\omega_{1h}(b_i), \omega_{2h}(b_i)],\quad [\omega_{2h}(b_i), u_h(a_i)] \quad \text{for } \gamma \leq x_2^i \leq \delta;$$
see Fig. 5. Analogously to $\mathbf{U}_{ad}$, we further introduce a set $\mathbf{V}_{ad}$ by
$$\mathbf{V}_{ad} = \left\{ \mathbf{u} \in \mathbb{R}^{D(h)+2D'(h)} \;\middle|\; u^i = u_h(a_i),\ i = 1,\dots,D(h),\ \text{and the remaining components are } \omega_{1h}(b_i),\ \omega_{2h}(b_i),\ i = 1,\dots,D'(h), \right.$$
$$\left. \text{for some } u_h \in U^h_{ad},\ (\omega_{1h}, \omega_{2h}) \in U'^h_{ad} \right\},$$
i.e., $\mathbf{V}_{ad}$ contains the vectors of $x_1$-coordinates of all principal moving nodes.

For the construction of the objective we utilize the fact that if $y^i > 0$ at some node in $(\mathcal{P}(\mathbf{u}))_h$, then the coordinate $\lambda^i$ of the appropriate (unique) Kuhn-Tucker vector must be zero. As
$$\boldsymbol{\lambda} = \mathbf{A}(\mathbf{u})\mathbf{y} + \mathbf{A}(\mathbf{u})\Psi(\mathbf{u}) - \mathbf{f}(\mathbf{u}),$$
we may employ the objective
$$E'(\mathbf{u}, \mathbf{y}) = \sum_{i \in \mathcal{D}_1} \left( \mathbf{A}(\mathbf{u})(\mathbf{y} + \Psi(\mathbf{u})) - \mathbf{f}(\mathbf{u}) \right)^i, \quad \text{where } \mathcal{D}_1 = \{1, 2, \dots, N\} \setminus \mathcal{D}_0.$$
Thus, the discretized problem of identification of the incidence set can be formulated as
$$(P'_r)_h \qquad E'_r(\mathbf{u}, \mathbf{y}) = \sum_{i \in \mathcal{D}_1} \left( \mathbf{A}(\mathbf{u})(\mathbf{y} + \Psi(\mathbf{u})) - \mathbf{f}(\mathbf{u}) \right)^i + \frac{r}{h} \sum_{i \in \mathcal{D}_0} y^i \to \inf$$
$$\text{subject to} \quad \mathbf{u} \in \mathbf{V}_{ad},\ \mathbf{y} \text{ solves } (\mathcal{P}(\mathbf{u}))_h,$$
where $r > 0$ is the appropriate penalty parameter. Of course, even if $E'_r = 0$, "semiactive" contacts with zero multipliers may occur; however, this undesirable phenomenon has not been observed in computations.

Problem $(P'_r)_h$ may again be solved numerically by using the direct method and a suitable NDO routine. In this case, however, the computation of subgradients according to (21), (22) is slightly more complicated than in the packaging problem, because the evaluation of $\nabla_{\mathbf{u}} E'_r$, $\nabla_{\mathbf{y}} E'_r$ is nontrivial and the dependence of $\mathbf{A}$, $\mathbf{f}$ on the design variable is more complex.

Example 3.3: Consider the problem of identification of the incidence set where $f(x_1,x_2) = -1.0$ and $\psi(x_1,x_2) = -0.03$. The sets $U_{ad}$, $U'^h_{ad}$ are specified by the parameters $c_1 = 0.7$, $c_2 = 1.2$, $c_3 = 2.5$, $\gamma = 0.25$, $\delta = 0.75$, $\varepsilon = 0.15$, $\Delta = 0.05$, $c_4 = 0.65$. The proper choice of the (linear) penalty parameter $r$ is more difficult than in the obstacle problem, because both terms in the objective functional $E'_r$ are of the same nature and thus $r$ determines a scaling between them. For $r = 33$, $h = \frac{1}{16}$ and $h = \frac{1}{32}$, the final values of the objective functional are $E_r'^{\,opt} = 0$ and $E_r'^{\,opt} = 0.395618 \cdot 10^{-4}$, respectively. In Figure 6 we see the initial design of $\Omega$ and $\Omega_0$ and the corresponding isolines of the solution. Figure 7 shows the optimal design for $h = \frac{1}{32}$; the boundary of $\Omega_0$ in fact coincides with the isoline $-0.03$. Finally, Figure 8 shows a 3D view of the solution and the obstacle with changed scaling on the vertical axis. We see that the problem constraints are satisfactorily satisfied. The obtained results could not be compared with any other ones, because in [9] there is no attempt to solve the problem numerically.
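A minimal sketch of evaluating the complementarity-based objective $E'_r$ (the matrices and vectors below are illustrative toy data; `lam` stands for the Kuhn-Tucker vector $\boldsymbol{\lambda} = \mathbf{A}(\mathbf{u})\mathbf{y} + \mathbf{A}(\mathbf{u})\Psi(\mathbf{u}) - \mathbf{f}(\mathbf{u})$):

```python
import numpy as np

# E'_r sums the multipliers lam^i outside D0 (penalizing contact there) and,
# weighted by r/h, the deflections y^i on D0 (forcing contact there).
def incidence_objective(A, Psi, f, y, D0, r, h):
    lam = A @ (y + Psi) - f                     # Kuhn-Tucker vector of (P(u))_h
    D1 = [i for i in range(len(y)) if i not in D0]
    return lam[D1].sum() + (r / h) * y[list(D0)].sum()

# Toy data: 3 nodes, node 0 lies in Omega_0.
A = np.eye(3)
Psi = np.zeros(3)
f = np.array([0.5, -1.0, 0.0])
y = np.array([0.0, 2.0, 1.0])
val = incidence_objective(A, Psi, f, y, D0={0}, r=2.0, h=0.5)
```

The value is zero exactly when all multipliers vanish off $\Omega_0$ and the membrane touches the obstacle everywhere on $\Omega_0$, which is the discrete expression of "contact exclusively on $\Omega_0$".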
4  Conclusion
The direct method proposed in [18] and further developed in this paper has proved to be an effective tool for the numerical solution of the considered shape optimization problems. It may be recommended whenever the number of design variables is small with respect to the number of state variables and a high accuracy of the results is required. Other applications of this technique are reported in [11, 12].
Figure 6: Initial design for Ex. 3.3, D(h) = 33.
Figure 7: Optimal design for Ex. 3.3, D(h) = 33.
Figure 8: Optimal state solution for Ex. 3.3, D(h) = 33.
References

[1] J. P. Aubin and I. Ekeland, Applied Nonlinear Analysis, J. Wiley & Sons, New York, 1984.
[2] F. H. Clarke, Optimization and Nonsmooth Analysis, J. Wiley & Sons, New York, 1983.
[3] B. Cornet and G. Laroque, Lipschitz properties of solutions in mathematical programming, Journal of Optimization Theory and Applications 53 (1987) 407-427.
[4] A. V. Fiacco, Sensitivity analysis for nonlinear programming using penalty methods, Mathematical Programming 10 (1976) 287-311.
[5] T. L. Friesz, R. L. Tobin, H.-J. Cho and N. J. Mehta, Sensitivity analysis based heuristic algorithms for mathematical programs with variational inequality constraints, Mathematical Programming 48 (1990) 265-284.
[6] P. T. Harker and S. C. Choi, A penalty function approach for mathematical programs with variational inequality constraints, WP 87-09-08, Dep. Dec. Sci., Univ. of Pennsylvania, 1987.
[7] J. Haslinger and P. Neittaanmaki, On the design of the optimal covering of an obstacle, in: Lecture Notes in Control Inf. Sci. 199, Springer-Verlag, Berlin, (1988) 192-209.
[8] J. Haslinger and P. Neittaanmaki, Finite Element Approximation for Optimal Shape Design: Theory and Applications, J. Wiley & Sons, Chichester, 1988.
[9] K.-H. Hoffmann and J. Haslinger, On the identification of incidence set for elliptic free boundary value problems, DFG Research Report No. 174, Augsburg, 1989.
[10] K. Jitorntrum, Solution point differentiability without strict complementarity in nonlinear programming, Mathematical Programming Study 21 (1984) 127-138.
[11] M. Kocvara and J. V. Outrata, Shape optimization of elasto-plastic bodies governed by variational inequalities, in: J.-P. Zolesio, editor, Boundary Control and Variation, Lecture Notes in Pure and Applied Mathematics 163, pages 261-271, Marcel Dekker, 1994.
[12] M. Kocvara and J. V. Outrata, A numerical approach to the design of masonry structures, in: Proceedings 16th IFIP Conf. on System Modelling and Optimization, Compiegne, July 5-9, 1993 (to appear).
[13] M. Kocvara and J. Zowe, An iterative two-step algorithm for linear complementarity problems, Numerische Mathematik 68 (1994) 95-106.
[14] J. Kyparisis, Sensitivity analysis for nonlinear programs and variational inequalities with nonunique multipliers, Mathematics of Operations Research 15 (1990) 286-298.
[15] P. Loridan and J. Morgan, A theoretical approximation scheme for Stackelberg problems, Journal of Optimization Theory and Applications 61 (1989) 95-110.
[16] O. L. Mangasarian, Nonlinear Programming, McGraw-Hill, New York, 1969.
[17] M. Makela and P. Neittaanmaki, Nonsmooth Optimization, World Scientific, Singapore, 1992.
[18] J. V. Outrata, On the numerical solution of a class of Stackelberg problems, Zeitschrift fur Operations Research 34 (1990) 255-277.
[19] H. Schramm and J. Zowe, A version of the bundle idea for minimizing a nonsmooth function: Conceptual idea, convergence analysis, numerical results, SIAM Journal on Optimization 2 (1992) 121-152.
[20] H. von Stackelberg, The Theory of Market Economy, Oxford University Press, Oxford, 1952.
Monotonicity and Quasimonotonicity

Recent Advances in Nonsmooth Optimization, pp. 193-214
Eds. D.-Z. Du, L. Qi and R. S. Womersley
©1995 World Scientific Publishing Co Pte Ltd
Monotonicity and Quasimonotonicity in Nonsmooth Analysis$^{1,2}$

Sandor Komlosi
Faculty of Economics, Janus Pannonius University
H-7621 Pecs, Rakoczi ut 80, Hungary
Abstract
The role of the monotonicity concept for linear operators, bifunctions and multifunctions in several branches of Mathematics, such as Functional Analysis, Nonlinear Analysis and Optimization Theory, is rather well known. Quite recently the notion of monotonicity has been generalized by different authors. The aim of the present paper is to give some recent contributions to the theory of generalized monotonicity in a unified way, incorporating results for a wide range of generalized derivatives and subdifferentials.
1  Introduction
The role of the monotonicity concept for linear operators, bifunctions and multifunctions in several branches of Mathematics, such as Functional Analysis, Nonlinear Analysis and Optimization Theory, and its relationships with convexity are rather well known [1,2,8,30,35,36]. As an instance we mention only two well-known theorems from Convex Analysis: (i) a directionally differentiable lower semicontinuous function is convex if and only if its directional derivative is a monotone bifunction; (ii) a lower semicontinuous function is convex if and only if its convex subdifferential is a nonvoid monotone multifunction [30,33].

$^1$This paper is the written version of the invited lecture delivered by the author at the "Workshop on Nonsmooth Analysis and its Applications" (Banach Center, Warsaw, May 3-14, 1993), and was completed in October 1993, during the author's stay at the Department of Mathematics, University of Pisa. I take the opportunity of thanking Professor Franco Giannessi and Professor Massimo Pappalardo for their warm hospitality.
$^2$Partially supported by the Hungarian National Scientific Research Foundation (Grant No. T013967).

Since it has been well known that the convexity assumption on the function can be weakened to certain kinds of generalized convexity assumptions without "destroying" the nice results valid for the convex case, there have recently been similar attempts to weaken the monotonicity assumption to some kind of generalized monotonicity concept. In 1976 Karamardian introduced a concept of pseudomonotonicity for gradient maps [16]. Some years later, in 1983, Hassouni introduced the notion of quasimonotonicity for multifunctions [14]. The paper of Karamardian and Schaible [17] from 1990, where several kinds of generalized monotonicity were introduced and studied for gradient maps, might be considered as the one opening a new theory, the theory of generalized monotonicity. These concepts were extended to nondifferentiable functions (for generalized derivatives and subdifferential maps) by Komlosi [19-25] and Luc [27,28] and to Equilibrium Problems by Blum and Oettli [2]. Further remarkable results on this topic can be found in [3,4,5,9,10,12,15,31,36,37].

The aim of the present paper is to give some recent contributions to the theory of generalized monotonicity in a unified way, incorporating results for a wide range of generalized derivatives and subdifferentials. The paper is organized as follows: Chapter 2 gives a short overview of monotonicity concepts which have proved useful in several branches of optimization theory and related topics. Chapter 3 is devoted to the introduction of different generalized monotonicity concepts for bifunctions (generalized derivatives) and the study of their interrelations with certain kinds of generalized convexity. In Chapter 4 we show how monotonicity can be characterized via quasimonotonicity. These results are applied in Chapter 5, where quasimonotonicity-quasiconvexity results are proved by applying suitable mean value theorems, due to Diewert and Zagrodny, respectively. Chapter 6 is devoted to the generalized monotonicity of multifunctions, for which different characterizations are given. For special subdifferentials, generalized monotonicity is linked with the generalized monotonicity of the generalized derivatives associated with them.
2  Monotonicity Concepts in Nondifferentiable Optimization
Let $X$, $X^*$ and $\langle \cdot, \cdot \rangle$ denote, respectively, a real Banach space, its topological dual and the canonical bilinear form on $X^* \times X$ throughout this paper. The following models represent the most favourable classes of optimization theory and related fields.

Mathematical Programming Problem (MPP): minimize $f(x)$ subject to $x \in C$, $C \subseteq X$,
where $f \colon C \to \mathbb{R}$.

Complementarity Problem (CP): find $x \in X$ such that
$$x \in C, \quad F(x) \in C^* \quad \text{and} \quad \langle F(x), x \rangle = 0,$$
where $C \subseteq X$ is a given convex cone, $C^* \subseteq X^*$ is the dual cone to $C$ and $F \colon C \to X^*$ is a given function.

Variational Inequality Problem (VIP): find $x \in X$ and $y \in X^*$ such that
$$x \in C, \quad y \in T(x) \quad \text{and} \quad \langle y, z - x \rangle \geq 0 \ \text{for all } z \in C,$$
where $C \subseteq X$ and $T \colon C \rightrightarrows X^*$ is a multifunction.

Equilibrium Problem (EP): find $x \in X$ such that
$$f(x, y) \geq 0 \quad \text{for all } y \in C,$$
where $C \subseteq X$ and $f \colon C \times C \to \mathbb{R}$ with $f(z, z) = 0$ for all $z \in C$.

There are interesting interrelations between these classes of problems: (MPP) can be formulated as (CP), (CP) can be given in the form of (VIP), and all of them may take the form of (EP). More details on these interrelations can be found in [2,8,37]. It is well known that in existence proofs for (MPP) the convexity of the objective function $f(x)$ is a celebrated property, whereas for the other problems (CP), (VIP) and (EP) the monotonicity of the functions $F(x)$, $f(x,y)$ and of the multifunction $T(x)$ guarantees the existence of solutions and supports algorithms computing them. For the sake of convenience let us recall the definitions of monotonicity for the different cases considered.

a) $F \colon C \to X^*$, $C \subseteq X$:
$$\langle F(y) - F(x), y - x \rangle \geq 0 \quad \text{for all } x, y \in C. \qquad (1)$$

b) $T \colon C \rightrightarrows X^*$, $C \subseteq X$:
$$\langle T(y) - T(x), y - x \rangle \subseteq \mathbb{R}_+ \quad \text{for all } x, y \in C. \qquad (2)$$
Here $\langle T(z), u \rangle$ denotes the set $\{\langle z^*, u \rangle \mid z^* \in T(z)\}$ and $\mathbb{R}_+$ denotes the set of nonnegative reals.

c) $f \colon C \times C \to \mathbb{R}$ with $f(z, z) = 0$ for all $z \in C$:
$$f(x, y) + f(y, x) \leq 0 \quad \text{for all } x, y \in C. \qquad (3)$$
The interrelations between monotonicity and convexity are well known; moreover, behind both properties there are well-developed theories: Convex Analysis and Minty's theory [35]. By a classical result, the monotonicity of the directional derivative $f'(x, d)$ completely characterizes the convexity of the given function $f(x)$. Recently, similar characterizations were obtained in terms of some other kinds of generalized derivatives, such as the Clarke, Rockafellar and Dini derivatives [6,22,26].
3  Generalized Monotonicity for Bifunctions
Generalized derivatives may be considered in a unified way as a bifunction $h(x,d)$ with finite or infinite real values, where $x$ refers to a given point of a given subset $C$ of $X$ and $d$ refers to a given direction of $X$. Since the domain for $d$ is always the whole space $X$, it is sufficient to specify the domain of $x$ only, and so, for the sake of brevity, we will say that $h(x,d)$ is a bifunction defined on $C$. Let us begin by introducing generalized monotonicity concepts for generalized derivatives which have proved useful for nondifferentiable functions [19].

Definition 3.1 Let $h(x,d)$ be a bifunction defined on the convex set $C$. $h(x,d)$ is called monotone, strictly monotone, quasimonotone, pseudomonotone or strictly pseudomonotone on $C$ if, for every $y, z \in C$, $y \neq z$, condition (M), (SM), (QM), (PM) or (SPM) holds, respectively:

(M) $\quad h(y, z-y) + h(z, y-z) \leq 0$,
(SM) $\quad h(y, z-y) + h(z, y-z) < 0$,
(QM) $\quad h(y, z-y) > 0$ implies $h(z, y-z) \leq 0$,
(PM) $\quad h(y, z-y) \geq 0$ implies $h(z, y-z) \leq 0$,
(SPM) $\quad h(y, z-y) \geq 0$ implies $h(z, y-z) < 0$.

Remark. In the definitions of (M) and (SM) we adopt the following rule: $(+\infty) + (-\infty) = 0$. The following interrelations are immediate consequences of the above definitions:

(SM) ⇒ (M) ⇒ (PM) ⇒ (QM)
and

(SM) ⇒ (SPM) ⇒ (PM) ⇒ (QM).

Remark. If $f(x)$ is differentiable with gradient map $F(x) = f'(x)$, then the monotonicity concept (1) for $F(x)$ coincides with (M) if we apply it to $h(x,d) = \langle F(x), d \rangle$. It should be mentioned that for this choice of $h(x,d)$ the other kinds of generalized monotonicity concepts give exactly the ones introduced by Karamardian and Schaible [17]. If $h(x,d)$ is a generalized derivative, then (M) is equivalent to (3) with $f(x,y) = h(x, y-x)$. One of the most useful applications of the generalized monotonicity concepts is that these properties can be related to appropriate kinds of generalized convexity [15,17,20,23,27,28]. To proceed this way, let us first recall some definitions.

Definition 3.2 Let the function $f(x)$ and the bifunction $h(x,d)$ be defined on the convex set $C \subseteq X$. $f(x)$ is called convex, strictly convex, quasiconvex, h-quasiconvex, h-pseudoconvex or h-strictly pseudoconvex on $C$ if condition (CX), (SCX), (QCX), (h-QCX), (h-PCX) or (h-SPCX) holds, respectively:

(CX): for all $x, y \in C$ and $t \in [0,1]$ one has
$$f(tx + (1-t)y) \leq t f(x) + (1-t) f(y);$$

(SCX): for all $x, y \in C$, $x \neq y$, and $t \in (0,1)$ one has
$$f(tx + (1-t)y) < t f(x) + (1-t) f(y);$$

(QCX): for all $x, y \in C$ and $t \in [0,1]$ one has
$$f(tx + (1-t)y) \leq \max\{f(x), f(y)\};$$

(h-QCX): for all $x, y \in C$,
$$f(x) \leq f(y) \quad \text{implies} \quad h(y, x-y) \leq 0;$$

(h-PCX): for all $x, y \in C$,
$$f(x) < f(y) \quad \text{implies} \quad h(y, x-y) < 0;$$

(h-SPCX): for all $x, y \in C$, $x \neq y$,
$$f(x) \leq f(y) \quad \text{implies} \quad h(y, x-y) < 0.$$

Remark. The above definitions can be applied to functions $f(x)$ and bifunctions $h(x,d)$ with 'extended' real values as well. The following interrelations are immediate consequences of the definitions:

(SCX) ⇒ (CX) ⇒ (QCX),
(h-SPCX) ⇒ (h-PCX) ⇒ (h-QCX).
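Condition (QM) can likewise be probed numerically when $h(x,d)$ is taken as a finite-difference surrogate of the upper Dini derivative. In the following sketch the test functions, the sample grid and the tolerances are illustrative assumptions, not data from the paper:

```python
import numpy as np

# Forward-difference surrogate of the upper Dini derivative h(x, d).
def dini_upper(f, x, d, t=1e-7):
    return (f(x + t * d) - f(x)) / t

def violates_QM(f, points, tol=1e-6):
    """Return a pair (y, z) violating (QM), i.e. with
    h(y, z-y) > 0 and h(z, y-z) > 0, or None if no sampled pair violates it."""
    for y in points:
        for z in points:
            if np.allclose(y, z):
                continue
            if dini_upper(f, y, z - y) > tol and dini_upper(f, z, y - z) > tol:
                return y, z
    return None

pts = [np.array([t]) for t in np.linspace(-2.0, 2.0, 21)]

quasiconvex = lambda x: np.sqrt(abs(x[0]))   # quasiconvex on R (convex level sets)
wavy = lambda x: np.sin(3.0 * x[0])          # not quasiconvex on [-2, 2]

bad_pair_qc = violates_QM(quasiconvex, pts)  # None: no sampled violation
bad_pair_wavy = violates_QM(wavy, pts)       # a violating pair is found
```

For the quasiconvex function no sampled pair violates (QM), in line with the h-quasiconvexity results of this chapter, while the oscillating function exhibits a pair of points at which the derivative is positive in both mutual directions.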
In the sequel we shall focus our attention on some special kinds of generalized derivatives of a given function f(x) at a, whose definitions are given below using the following notations [13]:

(z,α) ↓ a  ⟺  (z,α) → (a, f(a)) and α ≥ f(z),
(z,α) ↑ a  ⟺  (z,α) → (a, f(a)) and α ≤ f(z).

Dini derivatives (upper and lower):

f^D(a,d) := limsup_{t→0+} [f(a + td) − f(a)] / t,

f_D(a,d) := liminf_{t→0+} [f(a + td) − f(a)] / t.

Dini-Hadamard derivatives (upper and lower):

f^{DH}(a,d) := limsup_{t→0+, u→d} [f(a + tu) − f(a)] / t,

f_{DH}(a,d) := liminf_{t→0+, u→d} [f(a + tu) − f(a)] / t.
Clarke derivatives (upper and lower):

f^C(a,d) := limsup_{t→0+, (z,α)↓a} [f(z + td) − α] / t,

f_C(a,d) := liminf_{t→0+, (z,α)↑a} [f(z + td) − α] / t.

Rockafellar derivatives:

f^R(a,d) := limsup_{t→0+, (z,α)↓a} inf_{u→d} [f(z + tu) − α] / t,

f_R(a,d) := liminf_{t→0+, (z,α)↑a} sup_{u→d} [f(z + tu) − α] / t.

Weak Rockafellar derivatives:

f^{wR}(a,d) := limsup_{t→0+} inf_{u→d} [f(a + tu) − f(a)] / t,
Monotonicity and
Quasimonotonicity
199
f_{wR}(a,d) := liminf_{t→0+} sup_{u→d} [f(a + tu) − f(a)] / t.

Remark. The "limsup inf" and "liminf sup" operations were introduced by Rockafellar [34,35]. The meaning of the operation

limsup_{t→0+, (z,α)↓a} inf_{u→d},

for instance, is the following:

sup_{ε>0} limsup_{t→0+, (z,α)↓a} inf_{‖u−d‖<ε}.
The meanings of the other operations used above are similar.

The following lemma provides the relationships between these derivatives according to the partial order "≤"; the proof can be found in [13].

Lemma 3.3 For all x ∈ C and d ∈ X one has

f^S(x,d)  ≥  f^{DH}(x,d)  ≥  f_{wR}(x,d)  ≥  f_R(x,d)
    ∨              ∨               ∨              ∨
f^C(x,d)  ≥  f^D(x,d)    ≥  f_D(x,d)     ≥  f_C(x,d)
    ∨              ∨               ∨              ∨
f^R(x,d)  ≥  f^{wR}(x,d) ≥  f_{DH}(x,d)  ≥  f_I(x,d)

where the "unnamed" derivatives f^S(x,d) and f_I(x,d) are defined as follows:

f^S(a,d) := limsup_{t→0+, u→d, (z,α)↓a} [f(z + tu) − α] / t,

f_I(a,d) := liminf_{t→0+, u→d, (z,α)↑a} [f(z + tu) − α] / t.
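The Dini-type quotients above can be approximated numerically. A minimal sketch of mine (illustrative only: the max/min over a finite sample of t is just a crude proxy for limsup/liminf, adequate for well-behaved functions):

```python
def upper_dini(f, a, d, ts=None):
    """Approximate f^D(a, d) = limsup_{t->0+} [f(a + t*d) - f(a)] / t
    by sampling the difference quotient at a few small positive t."""
    ts = ts or [10.0 ** (-k) for k in range(3, 8)]
    return max((f(a + t * d) - f(a)) / t for t in ts)

def lower_dini(f, a, d, ts=None):
    """Approximate f_D(a, d) = liminf_{t->0+} [f(a + t*d) - f(a)] / t."""
    ts = ts or [10.0 ** (-k) for k in range(3, 8)]
    return min((f(a + t * d) - f(a)) / t for t in ts)

# For f(x) = |x| at a = 0 the two Dini derivatives coincide:
# f^D(0, d) = f_D(0, d) = |d| in every direction d.
assert abs(upper_dini(abs, 0.0, 1.0) - 1.0) < 1e-6
assert abs(lower_dini(abs, 0.0, -1.0) - 1.0) < 1e-6
```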
Remark. It should be mentioned that for lower semicontinuous functions the convergence (z,α) ↓ a is equivalent to the simpler convergence z →_f a, whose meaning is

z →_f a  ⟺  (z, f(z)) → (a, f(a)),

and in this case α can be replaced with f(z) in the difference quotient. Taking into account the above relationships, the following "majoring criterion" might be very useful.
Lemma 3.4 [23] Let the bifunctions h(x,d) and g(x,d) be defined on the set C ⊆ X. Assume that for all x ∈ C and d ∈ X

g(x,d) ≤ h(x,d).

If h(x,d) is (strictly) monotone, quasimonotone or (strictly) pseudomonotone on C, then g(x,d) is (strictly) monotone, quasimonotone or (strictly) pseudomonotone on C, respectively, as well. The proof is quite simple and thus omitted.

The following two theorems provide a basis for studying the links between generalized convexity and generalized monotonicity of different kinds of generalized derivatives.

Theorem 3.5 Let f(x) be radially lower semicontinuous on the convex set C and let either h(x,d) = f^D(x,d) or h(x,d) = f_D(x,d). Then each of the following statements is true:

(i) f(x) is convex on C iff h(x,d) is monotone on C,
(i*) f(x) is strictly convex on C iff h(x,d) is strictly monotone on C,
(ii) f(x) is quasiconvex on C iff h(x,d) is quasimonotone on C,
(iii) f(x) is (strictly) h-pseudoconvex on C iff h(x,d) is (strictly) pseudomonotone on C.

The proof of (i) and (i*) can be found in the papers [22,26]. Statement (ii) was proved in [19,20,27], whereas the proof of (iii) was given in [23].

Theorem 3.6 Let f(x) be lower semicontinuous on the open convex set C ⊆ X. Then each of the following statements is true:

(i) f(x) is convex on C iff f^R(x,d) is monotone on C,
(ii) f(x) is quasiconvex on C iff f^R(x,d) is quasimonotone on C.
The proof of (i) of the above theorem was given in [26], whereas statement (ii) was proved in [27]. The following result is an immediate consequence of Theorem 3.5 and Lemma 3.3.

Theorem 3.7 Let f(x) be radially lower semicontinuous on the convex set C and let the bifunction h(x,d) satisfy, for all x ∈ C and d ∈ X,

f_D(x,d) ≤ h(x,d).

Then the quasimonotonicity of h(x,d) on C implies the quasiconvexity of f(x) on C.
Remark. The above theorem can be applied to any member of the following collection:

f^S(x,d), f^C(x,d), f^{DH}(x,d), f^D(x,d), f_{DH}(x,d), f_D(x,d).
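For a differentiable function, the induced bifunction h(x,d) = ⟨f′(x), d⟩ makes the convexity-monotonicity link of Theorem 3.5 easy to test numerically. A small sketch of mine (sample grids are arbitrary; this illustrates, and does not prove, the equivalence):

```python
def grad_bifunction(fprime):
    """h(x, d) = <f'(x), d> -- the bifunction induced by the gradient map."""
    return lambda x, d: fprime(x) * d

def is_monotone_bifunction(h, xs):
    """Monotonicity in the bifunction sense: h(x, y-x) + h(y, x-y) <= 0."""
    return all(h(x, y - x) + h(y, x - y) <= 1e-9 for x in xs for y in xs)

xs = [i / 5.0 for i in range(-10, 11)]

h_convex = grad_bifunction(lambda x: 2 * x)          # f(x) = x^2, convex
h_nonconvex = grad_bifunction(lambda x: 3 * x * x)   # f(x) = x^3, not convex

assert is_monotone_bifunction(h_convex, xs)
assert not is_monotone_bifunction(h_nonconvex, xs)
```

For the convex f(x) = x², the sum h(x, y−x) + h(y, x−y) collapses to −2(x−y)² ≤ 0, so the check passes identically; for f(x) = x³ it equals −3(x−y)²(x+y), which is positive whenever x + y < 0.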
From Theorem 3.6 and Lemma 3.3 one can infer the following statement.

Theorem 3.8 Let f(x) be lower semicontinuous on the open convex set C and let the bifunction h(x,d) satisfy, for all x ∈ C and d ∈ X,

h(x,d) ≤ f^R(x,d).

Then the convexity of f(x) on C implies the monotonicity of h(x,d) on C, and the quasiconvexity of f(x) on C implies the quasimonotonicity of h(x,d) on C.

Lemma 3.9 Let the bifunctions h(x,d) and g(x,d) be defined on the open set C and assume that for all x ∈ C and d ∈ X

g(x,d) ≤ limsup_{z→x} inf_{u→d} h(z,u).   (4)

Then
(i) the monotonicity of h(x,d) on C implies the monotonicity of g(x,d) on C,
(ii) the quasimonotonicity of h(x,d) on C implies the quasimonotonicity of g(x,d) on C.

Proof. First we prove assertion (ii). Suppose that h(x,d) is quasimonotone on C and assume for contradiction that g(x,d) fails to be quasimonotone on C, that is, there exist x, y ∈ C such that

g(x, y − x) > 0 and g(y, x − y) > 0.

Due to (4) it follows that there exist x′, y′ ∈ C such that

h(x′, y′ − x′) > 0 and h(y′, x′ − y′) > 0,

which contradicts the quasimonotonicity of h(x,d). The same reasoning can be applied for proving (i). ■
Since for lower semicontinuous functions we have

f_{DH}(x,d) ≤ f^R(x,d) ≤ limsup_{z→x} inf_{u→d} f_{DH}(z,u)

(see [39]), therefore, by Lemmas 3.3 and 3.9, we arrive at the following result.
Theorem 3.10 Let f(x) be lower semicontinuous on the open set C. Then the following statements are true:

(i) f_{DH}(x,d) is monotone on C iff f^R(x,d) is monotone on C,
(ii) f_{DH}(x,d) is quasimonotone on C iff f^R(x,d) is quasimonotone on C.

Taking into account the above result, Theorem 3.8 can be sharpened as follows.

Theorem 3.11 Let f(x) be lower semicontinuous on the open convex set C and let the bifunction h(x,d) satisfy, for all x ∈ C and d ∈ X,
f_{DH}(x,d) ≤ h(x,d) ≤ f^R(x,d).

Then f(x) is convex on C iff h(x,d) is monotone on C, and f(x) is quasiconvex on C iff h(x,d) is quasimonotone on C.

4 Characterizing Monotonicity in Terms of Quasimonotonicity
It is worth mentioning that the proofs of the convexity-monotonicity and quasiconvexity-quasimonotonicity interrelations have so far been elaborated independently of each other, in both of the cases of Theorems 3.5 and 3.6 (see [22,26,27]). What is rather surprising, due to the following two lemmas the convexity-monotonicity statements can, however, be proved by applying the quasiconvexity-quasimonotonicity results directly.

Lemma 4.1 The function f(x) is convex on the convex set C ⊆ X iff the function F(x) = f(x) + ⟨g, x⟩ is quasiconvex on C for all g ∈ X′.

Proof. Necessity: Obvious, since the sum of a convex and a linear function is always convex, hence quasiconvex.
Sufficiency: Assume that F(x) = f(x) + ⟨g, x⟩ is quasiconvex on C for all g ∈ X′. Suppose for contradiction that f(x) fails to be convex on C. This means that there exist two distinct points x, y in C and a third point z = tx + (1−t)y on the open line segment (x,y) such that

t f(x) + (1−t) f(y) < f(z).

By virtue of the Hahn-Banach Extension Theorem one can always find an appropriate g* ∈ X′ such that

F*(x) = F*(y) < F*(z),

where F*(x) = f(x) + ⟨g*, x⟩. The above conditions, however, contradict the quasiconvexity of F*(x). This contradiction proves the thesis. ■

Lemma 4.2 The bifunction h(x,d) is monotone on C iff the bifunction

H(x,d) = h(x,d) + ⟨g, d⟩

is quasimonotone on C for all g ∈ X′.

Proof. Necessity: Assume that h(x,d) is monotone on C. Let g ∈ X′ be arbitrary and set H(x,d) = h(x,d) + ⟨g, d⟩. Since

H(x, y − x) + H(y, x − y) = h(x, y − x) + h(y, x − y),

H(x,d) is also monotone, and thus quasimonotone on C, as well.

Sufficiency: Let H(x,d) be quasimonotone on C for all g ∈ X′. Since 0 ∈ X′, it follows that h(x,d) itself is quasimonotone on C, as well. Assume for contradiction that h(x,d) fails to be monotone. This means that there exist two distinct points x and y in C such that

h(x, y − x) + h(y, x − y) > 0.   (5)
Without loss of generality we may assume that

h(x, y − x) > 0.

From the quasimonotonicity of h(x,d) it follows that h(y, x − y) ≤ 0. Taking into account inequality (5) we get

0 ≤ −h(y, x − y) < h(x, y − x).

By virtue of the Hahn-Banach Extension Theorem one can always find a g* ∈ X′ such that

−h(y, x − y) < ⟨g*, x − y⟩ < h(x, y − x).

If we consider H*(x,d) = h(x,d) + ⟨g*, d⟩, then the above inequalities yield

H*(x, y − x) > 0 and H*(y, x − y) > 0,

which contradicts the assumption that H*(x,d) is quasimonotone on C. This contradiction proves the thesis. ■

Applications of the above lemmas will be given in the next section.
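The tilt characterization of Lemma 4.1 can be probed numerically. The sketch below is my own illustration (the grid and the tilt range are arbitrary choices): it checks quasiconvexity of f(x) + g·x along a grid, for a convex f and for a double-well f.

```python
def is_quasiconvex_on_grid(f, xs):
    """Grid version of (QCX): no interior grid point of a segment may
    exceed both endpoint values."""
    n = len(xs)
    for i in range(n):
        for j in range(i + 2, n):
            bound = max(f(xs[i]), f(xs[j]))
            if any(f(xs[k]) > bound + 1e-9 for k in range(i + 1, j)):
                return False
    return True

xs = [i / 10.0 for i in range(-30, 31)]
f_convex = lambda x: x * x
f_doublewell = lambda x: x ** 4 - 2 * x * x   # two minima at x = -1, 1

tilts = [g / 2.0 for g in range(-8, 9)]
# Every tilt of the convex function stays quasiconvex ...
assert all(is_quasiconvex_on_grid(lambda x, g=g: f_convex(x) + g * x, xs)
           for g in tilts)
# ... while the double well already fails for some tilt (here g = 0).
assert not all(is_quasiconvex_on_grid(lambda x, g=g: f_doublewell(x) + g * x, xs)
               for g in tilts)
```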
5 Two Mean Value Theorems
First we present a proof of statement (ii) of Theorem 3.5. Then, by applying Lemmas 4.1 and 4.2, we obtain statement (i) of the same theorem. To proceed in this way we need the following mean value theorem due to Diewert [11, Theorem 1, Corollary 1].

Theorem 5.1 (Diewert's Mean Value Theorem) Let the function f(x) be defined on the line segment [y,z] and assume the values f(y), f(z) to be finite. Let s(t) = f(y + t(z − y)) be lower semicontinuous on [0,1]. Then there exists t₀ ∈ [0,1) such that

f_D(x₀, z − y) ≥ f(z) − f(y),

where x₀ = y + t₀(z − y).

Proposition 5.2 Let f(x) be radially lower semicontinuous on the convex set C. Then f(x) is quasiconvex on C iff either of the following conditions holds:

(i) f_D(x,d) is quasimonotone on C,
(ii) f^D(x,d) is quasimonotone on C.

Proof. (i): Necessity. Assume that f(x) is quasiconvex on the convex set C and

f_D(x, y − x) > 0,   (6)

where x, y are arbitrary elements of C. Our task now is to prove that f_D(y, x − y) ≤ 0. It is easy to verify that the following implication is true for f(x) on C [11]:

u, v ∈ C, f(u) ≤ f(v) ⟹ f_D(v, u − v) ≤ 0.   (7)

From (6), f(x) < f(y) follows, which, together with (7), gives f_D(y, x − y) ≤ 0.

Sufficiency. The proof given here is a slightly modified version of the one given by Luc in [27]. Let f_D(x,d) be quasimonotone on C. For contradiction suppose that f(x) fails to be quasiconvex. This means that there exist a line segment [a,b] ⊆ C and a point z on it, z ∈ (a,b), such that

f(z) > max{f(a), f(b)}.

Due to the quasimonotonicity of f_D(x,d) we may assume that f(z) is finite. Indeed, assume for contradiction that f(z) = +∞ for all z ∈ (a,b). Then we have f_D(a, b − a) = f_D(b, a − b) = +∞, which contradicts the quasimonotonicity of f_D(x,d) on C. Since f(a), f(b) and f(z) are all finite, there exist, by Diewert's Mean Value Theorem, u ∈ [a,z) and w ∈ [b,z) satisfying the conditions

f_D(u, z − a) ≥ f(z) − f(a) > 0 and f_D(w, z − b) ≥ f(z) − f(b) > 0.
Taking into account the positive homogeneity of the Dini derivative with respect to its direction argument, the last two inequalities provide the following ones:

f_D(u, w − u) > 0 and f_D(w, u − w) > 0,

contradicting the quasimonotonicity assumption. This contradiction proves statement (i) of the present proposition. Since f_D(x,d) ≤ f^D(x,d), statement (ii) follows immediately from (i) above and Lemma 3.4. ■

Now we present a proof of statement (ii) of Theorem 3.6, capturing the essence but simplifying the details of the original one [27]. Then, by applying Lemmas 4.1 and 4.2, we obtain statement (i) of the same theorem. The same approach can be found in [41].

Let us consider now the generalized derivative f^R(a,d), with 'extended' real values, and the subdifferential δ^R f(a), possibly empty, associated with it (in the sense of Definition 6.1) as follows:

δ^R f(a) = { g ∈ X′ | ⟨g, d⟩ ≤ f^R(a,d) for all d ∈ X }.

The basic tool in proving the above claim is the following approximate mean value theorem due to Zagrodny [38, Theorem 4.3].

Theorem 5.3 (Zagrodny's Mean Value Theorem) Let f(x) be a proper lower semicontinuous function defined on an arbitrary Banach space X. Let f(x) assume finite values at a, b ∈ X. Then there exist a point z ∈ [a,b) and two sequences {z_k}, z_k ∈ X, and {g_k}, g_k ∈ δ^R f(z_k), such that

lim_{k→+∞} z_k = z,
liminf_{k→+∞} ⟨g_k, b − z_k⟩ ≥ [(f(b) − f(a)) / ‖b − a‖] ‖b − z‖,   (8)

liminf_{k→+∞} ⟨g_k, b − a⟩ ≥ f(b) − f(a).   (9)

The following corollary to this theorem, which allows us to replace b in (8) with any element of the half ray y = a + t(b − a) ∈ C, t ≥ 1, will be of direct use in the sequel. This lemma was proved in [27] in a rather complicated way; here we present a quite simple proof of it.

Lemma 5.4 Let C be an open convex subset of a Banach space X and f(x) be a proper lower semicontinuous function defined on C. Let f(x) assume finite values at a, b ∈ C. Then there exist a point z ∈ [a,b) and two sequences {z_k}, z_k ∈ C, and {g_k}, g_k ∈ δ^R f(z_k), such that

lim_{k→+∞} z_k = z,
and for any y = a + t(b − a) ∈ C, t ≥ 1, we have

liminf_{k→+∞} ⟨g_k, y − z_k⟩ ≥ [(f(b) − f(a)) / ‖b − a‖] ‖y − z‖.

Proof. We shall prove that the sequences {z_k} and {g_k} from Theorem 5.3 meet the requirements of the present lemma. Since the set C is open, we may assume without loss of generality that z_k ∈ C for all k. Let us denote (f(b) − f(a))/‖b − a‖ by the simpler symbol K. Observe that

y − z_k = (y − b) + (b − z_k) = α(b − a) + (b − z_k),

with α = ‖y − b‖/‖b − a‖. It follows that

⟨g_k, y − z_k⟩ = α ⟨g_k, b − a⟩ + ⟨g_k, b − z_k⟩.

Taking into account the superadditivity of the liminf operation, the definitions of the scalars K and α, and applying (8) and (9), one can deduce that

liminf_{k→+∞} ⟨g_k, y − z_k⟩ ≥ liminf_{k→+∞} α ⟨g_k, b − a⟩ + liminf_{k→+∞} ⟨g_k, b − z_k⟩ ≥ K(‖y − b‖ + ‖b − z‖) = K ‖y − z‖,

which was to be proved. ■

Now we are ready to prove the following theorem.

Proposition 5.5 Let C be an open convex subset of a Banach space X and f(x) be a proper lower semicontinuous function defined on C. Then f(x) is quasiconvex on C iff the Rockafellar derivative f^R(x,d) is quasimonotone on C.

Proof. Necessity: For the proof we refer to [27].

Sufficiency: Let f^R(x,d) be quasimonotone on C. Assume for contradiction that f(x) fails to be quasiconvex on C. It follows that there exist two points a, b ∈ C with finite values f(a) and f(b) and a third point c on the open line segment (a,b) such that

f(c) > max{f(a), f(b)}.   (10)

Without loss of generality we may assume that f(c) is finite, as well. (If f(c) = +∞, then redefine f(x) only at x = c in such a way that (10) is fulfilled, and we arrive at the same conclusion.)

Apply first Lemma 5.4, the corollary to Zagrodny's Mean Value Theorem, to the line segment [a,c]. It follows that there exist a point z ∈ [a,c) and two sequences {z_k}, z_k ∈ C, and {g_k}, g_k ∈ δ^R f(z_k), satisfying the following conditions:

lim_{k→+∞} z_k = z,
and

liminf_{k→+∞} ⟨g_k, b − z_k⟩ ≥ [(f(c) − f(a)) / ‖c − a‖] ‖b − z‖ > 0.

(The last inequality is a consequence of (10).) It follows that for sufficiently large k one has

⟨g_k, b − z_k⟩ > 0.   (11)
Since z ≠ c, the sequence {c_k}, where c_k denotes the projection of c onto the closed line segment [z_k, b], admits the following properties:

c_k ∈ C, c_k ∈ (z_k, b) and {c_k} → c as k → +∞.

Since f(c) > f(b) and f(x) is lower semicontinuous, for sufficiently large k

f(c_k) > f(b).   (12)

Let k be arbitrary. Apply now Lemma 5.4 to the line segment [c_k, b]. It follows that there exist a point w_k ∈ [c_k, b) and two sequences {w_ki}, w_ki ∈ C, and {g_ki}, g_ki ∈ δ^R f(w_ki), satisfying the following conditions:

lim_{i→+∞} w_ki = w_k,

and

liminf_{i→+∞} ⟨g_ki, z_k − w_ki⟩ ≥ [(f(c_k) − f(b)) / ‖c_k − b‖] ‖z_k − w_k‖ > 0.   (13)

(The last inequality is a consequence of (12).) Let k be sufficiently large, satisfying (11) and (12). Due to the definition of w_k it is obvious that (11) is equivalent to

⟨g_k, w_k − z_k⟩ > 0.   (14)

According to (13) and (14) one can choose for this k a sufficiently large index i such that

⟨g_ki, z_k − w_ki⟩ > 0 and ⟨g_k, w_ki − z_k⟩ > 0

hold. Set x₁ = w_ki ∈ C and x₂ = z_k ∈ C. Since

f^R(x₁, x₂ − x₁) ≥ ⟨g_ki, z_k − w_ki⟩ > 0 and f^R(x₂, x₁ − x₂) ≥ ⟨g_k, w_ki − z_k⟩ > 0,

these inequalities contradict the quasimonotonicity of f^R(x,d). This contradiction proves the statement of the theorem. ■
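Diewert's Mean Value Theorem (Theorem 5.1) is easy to illustrate numerically in one dimension. The following sketch is my own (the one-sample difference quotient is only a crude proxy for f_D, and the segment search is a finite scan): it looks for a witness point whose Dini quotient dominates the increment f(z) − f(y).

```python
def dini_lower_quotient(f, x, d, t=1e-6):
    """One-sample proxy for f_D(x, d); the true value is a liminf as t -> 0+."""
    return (f(x + t * d) - f(x)) / t

def diewert_witness(f, y, z, n=2000):
    """Scan the segment [y, z) for a point x0 with
    f_D(x0, z - y) >= f(z) - f(y), as Diewert's theorem guarantees."""
    target = f(z) - f(y)
    for k in range(n):
        x0 = y + (k / n) * (z - y)
        if dini_lower_quotient(f, x0, z - y) >= target - 1e-6:
            return x0
    return None

# f(x) = x^3 on [0, 1]: f(1) - f(0) = 1 and f_D(x0, 1) = 3*x0^2, so any
# x0 >= sqrt(1/3) on the segment is a witness point.
w = diewert_witness(lambda x: x ** 3, 0.0, 1.0)
assert w is not None and 3 * w * w >= 1.0 - 1e-3
```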
6 Generalized Monotonicity for Multifunctions
Subdifferentials, playing a very important role in Nonlinear Analysis and related fields, are useful dual objects of geometric character for generalized derivatives. Let f(x) be a function defined on X and let a ∈ X. The convex subdifferential of f(x) at a is defined as follows:

∂f(a) := { g ∈ X′ | f(x) − f(a) ≥ ⟨g, x − a⟩ for all x ∈ X }.

Definition 6.1 Let h(x,d) be a bifunction defined on C ⊆ X. The subdifferential δh(a) associated with h at a ∈ C is defined by

δh(a) := { g ∈ X′ | ⟨g, d⟩ ≤ h(a,d) for all d ∈ X }.

A multifunction T(x) defined on C ⊆ X with values in X′ is called monotone on C if ⟨u − v, x − y⟩ ≥ 0 holds for every x, y ∈ C and u ∈ T(x), v ∈ T(y). Furthermore, T(x) is called

(i) quasimonotone on C if for every x, y ∈ C and u ∈ T(x), v ∈ T(y) one has

⟨u, y − x⟩ > 0 implies ⟨v, x − y⟩ ≤ 0,   (15)
(ii) pseudomonotone on C if for every x, y ∈ C and u ∈ T(x), v ∈ T(y) one has

⟨u, y − x⟩ ≥ 0 implies ⟨v, x − y⟩ ≤ 0,   (16)

(iii) strictly pseudomonotone on C if for every x, y ∈ C, x ≠ y, and u ∈ T(x), v ∈ T(y) one has

⟨u, y − x⟩ ≥ 0 implies ⟨v, x − y⟩ < 0.   (17)
,i ,/ ( a + tz + i d ) - / ( a + fz) (a, d) = suphmsup , zex f-.o+ t and has some remarkable properties, namely it is a lower semicontinuous convex function of d and for all a, d £ X, we have /
/o(a,d) < /D(a,d) < /MP(a,d) < /c(a,d) The following theorem demonstrates that the generalized monotonicity concepts for bifunctions and multifunctions fit well to each other, moreover shows the way for transforming the results of the previous parts for subdifferentials possessing support function. (For the proof consult [22].) Theorem 6.3 Let 6h(x) be a subdifferential map defined on the convex set C with support function h(x,d). Then 6h(x) is monotone, quasimonotone, (strictly) pseu domonotone on C if and only if its support function h(x, d) is a monotone, quasimonotone, (strictly) pseudomonotone bifunction on C, respectively.
Let us consider now the generalized derivative f^R(a,d) and the subdifferential δ^R f(a) associated with it in the sense of Definition 6.1:

δ^R f(a) = { g ∈ X′ | ⟨g, d⟩ ≤ f^R(a,d) for all d ∈ X }.

It is well known [34] that f^R(a,·) is the support function of δ^R f(a). By combining Theorem 6.3 and Proposition 5.5 we obtain the following theorem.

Theorem 6.4 Let C be an open convex subset of a Banach space X and f(x) be a proper lower semicontinuous function defined on C. Then f(x) is quasiconvex on C iff the Rockafellar subdifferential δ^R f(x) is quasimonotone on C.

Thanks to the following result due to Hassouni [14,15], Theorem 6.4 enables us to characterize convexity in terms of the monotonicity of δ^R f(x). We give a proof a bit simpler than that of Hassouni, applying characterization (15) instead of Hassouni's definition.

Lemma 6.5 Let T(x) be defined on C ⊆ X. Then the following statements are equivalent:

(i) T(x) is monotone on C,
(ii) T(x) + g is quasimonotone on C for all g ∈ X′.

Proof. (i) ⇒ (ii): It is not difficult to prove that the monotonicity of T(x) implies the monotonicity, and thus the quasimonotonicity, of its "translates" T(x) + g for any g ∈ X′.

(ii) ⇒ (i): Suppose that T(x) + g is quasimonotone for any g ∈ X′. Setting g = 0, we infer that T(x) itself is quasimonotone, too. Assume for contradiction that T(x) fails to be monotone on C. It follows that there exist x, y ∈ C and u ∈ T(x), v ∈ T(y) such that

⟨u, y − x⟩ + ⟨v, x − y⟩ > 0.   (18)

Without loss of generality we may assume that ⟨u, y − x⟩ > 0.
(19)
Let g = —(u + v)/2 and consider the multifunction M(x) = T(x) + g . Then we have u + g g M(x) and v + g g M(y) and from (19) we obtain that (u + g , y - x ) > 0
and
(v + g , x - y ) > 0 ,
which contradicts to the quasimonotonicity of M(x). As a corollary to Lemma 6.5 and Theorem 6.4 we have the following result (cf. [7,26,32,40]). Proposition 6.6 Let C be an open convex subset of a Banach space X and / ( x ) be a proper lower semicontinuous function defined on C. Then / ( x ) is convex on C iff the Rockafellar subdifferential 6Rf(x) is monotone on C.
References

[1] J.-P. Aubin, Optima and Equilibria, Springer-Verlag, Berlin-Heidelberg, 1993.

[2] E. Blum and W. Oettli, From optimization and variational inequalities to equilibrium problems, The Mathematics Student 63 (1993) 1-23.

[3] M. Bianchi, Generalized quasimonotonicity and strong pseudomonotonicity of bifunctions, Contributi di Ricerca, Università Cattolica del Sacro Cuore di Milano, 1994.

[4] M. Bianchi, Strong pseudomonotonicity and generalized quasimonotonicity of bifunctions, in: P. Mazzoleni (ed.) Optimization of Generalized Convex Problems in Economics, Milano, (1994) 101-112.

[5] E. Castagnoli and P. Mazzoleni, Orderings, generalized convexity and monotonicity, in: S. Komlosi, T. Rapcsak, S. Schaible (eds.) Generalized Convexity, Springer Verlag, Heidelberg, (1994) 250-262.

[6] F. H. Clarke, Optimization and Nonsmooth Analysis, Wiley and Sons, New York, 1983.

[7] R. Correa, A. Jofre and L. Thibault, Characterization of lower semicontinuous convex functions, Proceedings of the American Mathematical Society 116 (1992) 67-72.

[8] R. W. Cottle, F. Giannessi and J.-L. Lions (eds.) Variational Inequalities and Complementarity Problems, Wiley and Sons, New York, 1980.

[9] J.-P. Crouzeix and A. Hassouni, Quasimonotonicity of separable operators and monotonicity indices, Working Paper, Blaise Pascal University, 1993.

[10] J.-P. Crouzeix and A. Hassouni, Generalized monotonicity of a separable product of operators: the multivalued case, Working Paper, Blaise Pascal University, 1993.

[11] W. E. Diewert, Alternative characterizations of six kinds of quasiconcavity in the nondifferentiable case with applications to nonsmooth programming, in: S. Schaible and W. T. Ziemba (eds.) Generalized Concavity in Optimization and Economics, Academic Press, New York, 1981.

[12] R. Ellaia and A. Hassouni, Characterization of nonsmooth functions through their generalized gradients, Optimization 22 (1991) 401-416.
[13] K.-H. Elster and J. Thierfelder, On cone approximations and generalized directional derivatives, in: F. H. Clarke, V. F. Dem'yanov and F. Giannessi (eds.) Nonsmooth Optimization and Related Topics, Plenum Press, New York, (1989) 133-154.

[14] A. Hassouni, Sous-différentiels des fonctions quasiconvexes, Thèse de 3ème Cycle, Université Paul Sabatier, Toulouse, 1983.

[15] A. Hassouni, Quasimonotone multifunctions: applications to optimality conditions in quasiconvex programming, Numerical Functional Analysis and Optimization 13 (1992) 267-275.

[16] S. Karamardian, Complementarity over cones with monotone and pseudomonotone maps, Journal of Optimization Theory and Applications 18 (1976) 445-454.

[17] S. Karamardian and S. Schaible, Seven kinds of monotone maps, Journal of Optimization Theory and Applications 66 (1990) 37-46.

[18] S. Karamardian, S. Schaible and J.-P. Crouzeix, Characterizations of generalized monotone maps, Journal of Optimization Theory and Applications 76 (1993) 399-413.

[19] S. Komlosi, Generalized monotonicity of generalized derivatives, Working Paper, Janus Pannonius University, Pecs, (1991) 8.

[20] S. Komlosi, On generalized upper quasidifferentiability, in: F. Giannessi (ed.) Nonsmooth Optimization: Methods and Applications, Gordon and Breach, London, (1992) 189-201.

[21] S. Komlosi, Generalized monotonicity of generalized derivatives, in: P. Mazzoleni (ed.) Proceedings of the Workshop on Generalized Concavity for Economic Applications held in Pisa, April 2, 1992, (Verona, 1992) 1-7.

[22] S. Komlosi, Generalized monotonicity in nonsmooth analysis, in: S. Komlosi, T. Rapcsak, S. Schaible (eds.) Generalized Convexity, Springer Verlag, Heidelberg, (1994) 263-275.

[23] S. Komlosi, Generalized monotonicity and generalized convexity, Working Paper WP-92-16, Computer and Automation Institute of the Hungarian Academy of Sciences, Budapest, (1992) 23 (accepted by Journal of Optimization Theory and Applications).

[24] S. Komlosi, Generalized global monotonicity of generalized derivatives, in: R. Tomlinson (ed.) International Transactions in Operational Research 1 (1994) 259-264.
[25] S. Komlosi, Generalized monotonicity in nondifferentiable optimization, in: L. Martic, L. Neralic, H. Pasagic (eds.) KOI'93 Proceedings, Croatian Operational Research Society, Zagreb, (1993) 17-31.

[26] D. T. Luc and S. Swaminathan, A characterization of convex functions, Nonlinear Analysis, Theory, Methods and Applications 20 (1993) 697-701.

[27] D. T. Luc, Characterization of quasiconvex functions, Bulletin of the Australian Mathematical Society 48 (1993) 393-405.

[28] D. T. Luc, On generalized convex nonsmooth functions, Bulletin of the Australian Mathematical Society 49 (1994) 139-149.

[29] O. L. Mangasarian, Pseudoconvex functions, SIAM Journal on Control 3 (1965) 281-290.

[30] J. J. Moreau, Fonctionnelles convexes, Lecture Notes, Séminaire équations aux dérivées partielles, Collège de France, 1966.

[31] R. Pini and S. Schaible, Some invariance properties of generalized monotonicity, in: S. Komlosi, T. Rapcsak, S. Schaible (eds.) Generalized Convexity, Springer Verlag, Heidelberg, (1994) 276-277.

[32] R. A. Poliquin, Subgradient monotonicity and convex functions, Nonlinear Analysis 14 (1990) 305-317.

[33] R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ, 1970.

[34] R. T. Rockafellar, Generalized directional derivatives and subgradients of nonconvex functions, Canadian Journal of Mathematics 32 (1980) 257-280.

[35] R. T. Rockafellar, The Theory of Subgradients and its Applications to Problems of Optimization: Convex and Nonconvex Functions, Heldermann Verlag, Berlin, 1981.

[36] S. Schaible, Generalized monotone maps, in: F. Giannessi (ed.) Nonsmooth Optimization: Methods and Applications, Gordon and Breach, London, (1992) 392-408.

[37] S. Schaible, Generalized monotone maps - a survey, in: S. Komlosi, T. Rapcsak, S. Schaible (eds.) Generalized Convexity, Springer Verlag, Heidelberg, (1994) 229-249.

[38] D. Zagrodny, Approximate mean value theorem for upper subderivatives, Nonlinear Analysis, Theory, Methods and Applications 12 (1988) 1413-1438.
[39] D. Zagrodny, A note on the equivalence between the mean value theorem for the Dini derivative and the Clarke-Rockafellar derivative, Optimization 21 (1990) 179-183.

[40] M. Tosques, Equivalence between generalized gradients and subdifferentials for a suitable class of lower semicontinuous functions, in: S. Komlosi, T. Rapcsak, S. Schaible (eds.) Generalized Convexity, Springer Verlag, Heidelberg, (1994) 116-133.

[41] D. Aussel, J. N. Corvellec and M. Lassonde, Subdifferential characterization of quasiconvexity and convexity, Journal of Convex Analysis, to appear.

[42] S. Komlosi, Monotonicity and quasimonotonicity for multifunctions, in: P. Mazzoleni (ed.) Optimization of Generalized Convex Problems in Economics, Milano, (1994) 27-39.
Recent Advances in Nonsmooth Optimization, pp. 215-223 Eds. D.-Z. Du, L. Qi and R.S. Womersley ©1995 World Scientific Publishing Co Pte Ltd
Sensitivity of Solutions in Nonlinear Programming Problems with Nonunique Multipliers

A. B. Levy
Department of Mathematics, Bowdoin College, Brunswick, ME 04011, USA

R. T. Rockafellar
Department of Mathematics, University of Washington, Seattle, WA 98195, USA

Abstract
We analyze the perturbations of quasi-solutions to a parameterized nonlinear programming problem, these being feasible solutions accompanied by a Lagrange multiplier vector such that the Karush-Kuhn-Tucker optimality conditions are satisfied. We show, under a standard constraint qualification not requiring uniqueness of the multipliers, that the quasi-solution mapping is differentiable in a generalized sense, and we present a formula for its derivative. The results are distinguished from previous ones in the subject in that they do not entail having to impose conditions to ensure that dual as well as primal elements behave well with respect to sensitivity.
1 Introduction
A standard parameterized nonlinear programming problem can be formulated in terms of smooth functions f_i on ℝⁿ × ℝᵈ as follows:

minimize f₀(x,w) − ⟨v, x⟩ over all x ∈ C(w),   (1)

where the set C(w) ⊆ ℝⁿ is defined by

C(w) := { x : f₁(x,w) ≤ 0, …, f_s(x,w) ≤ 0, f_{s+1}(x,w) = 0, …, f_m(x,w) = 0 }.
= 0, . . . . , / „ ( * , u>) = o } .
(1)
A. B. Levy and R. T. RockafeU&r
216
Here w £ 1R and v £ fit" both serve as parameter elements. In principle, the "tilt" perturbations represented through v could be subsumed into w, but they have an essential role in theory, and we therefore keep them explicit. We concentrate our attention on points x that are quasi-solutions to the mini mization problem (1) in the sense of satisfying, in association with some multiplier vector y, the Karush-Kuhn-Tucker (K-K-T) optimality conditions: 3y = {Uu■■•,Um) € A r *:(/i(z,"'), • • •, fm{x,w)) V = Vr/0(l,U)) + y1VI/1(x,tu)+
with
(2)
\-ym^xfm(x,w),
written here for convenience in terms of NK(U) denoting the set of normal vectors at u to the polyhedral cone K := {ueW
:ui < 0 , . . . , u , < 0 , u s + 1 = 0 , . . . , « m = 0}.
(3)
Thus, for any u £ K the vectors y 6 NK(U) are the ones with j / ; > 0 for indices z £ { l , . . . , m } having u, = 0 but yt = 0 for indices i £ { l , . . . , m } with u; < 0, whereas j/, is unrestricted for indices i £ {s + 1 , . . . , m } . (By convention, NK(U) is the empty set when u ^ K.) The notation y 6 NK(U) saves us from repeatedly having to write down such complicated details, and it has the further advantage of adapting in the framework of variational analysis to a broad range of circumstances beyond those that come into play here. The K-K-T conditions are of course necessary for a feasible solution x to be locally optimal under the Mangasarian-Fromovitz constraint qualification, which in turn takes the form
fly = {yu---,ym) yiVi/i(i,«)) +
e Ni<(fi(x,™),■ ■ ■ Jm{x,w)) with \-ym^xfm(x,w)
(4)
= 0, except y = 0.
Quasi-solutions are sure to be optimal solutions when the minimization problem exhibits convexity with respect to x, but this is not an issue of concern to us here. The quasi-solution mapping S in this framework associates with each parameter element (w,v) ∈ ℝ^{d+n} the set

S(w,v) := { x ∈ ℝⁿ : the K-K-T conditions (2) hold }.

Since, in general, S(w,v) is not a singleton, this equation defines a multifunction (set-valued mapping) S : ℝ^{d+n} ⇉ ℝⁿ. In the main result of this paper, we calculate a kind of generalized derivative of S with respect to (w,v).

Until now, differentiability properties of S have been studied by distinctly different means than will be used here. Most researchers (cf. [2], [6], and [1] for a survey) have looked at the sensitivity of solution multifunctions defined by K-K-T pairs (x,y),

T(w,v) := { (x,y) : x solves the K-K-T conditions (2) with y as multiplier },
being forced by this strategy to make strong assumptions about the multipliers y (e.g. uniqueness) in order to draw conclusions about the x-components of these pairs. Some exceptions to this approach are described in [6] where, however, single-valuedness of the solution mapping S is essential. Our approach is new and has the advantage of enabling us to study the "primal" solution multifunction S directly, without any restrictions on the multipliers. In such a setting, much broader than previously has been accessible, our approach leads to formulas describing the magnitude and direction of perturbations of quasi-solutions in terms of approximations based on set convergence. It does not in itself, though, provide tests for whether S is single-valued in a localized sense.

Our methodology derives from our recent work in [3], where we studied the sensitivity of parameterized "variational conditions" over a set which itself can depend on the parameters. Associated with any closed set C ⊆ ℝⁿ and mapping F : ℝⁿ → ℝⁿ is the variational condition

F(x) + N_C(x) ∋ 0,   x ∈ C.   (5)
When C is convex, this expresses the variational inequality for C and F. When C is not convex, N_C(x) is interpreted as the cone of "limiting proximal normals" in nonsmooth analysis, rather than the cone of normal vectors in the sense of convex analysis. The parameterized variational conditions studied in [3] are of the form

F(x, w) + N_{C(w)}(x) ∋ v,  x ∈ C(w),  (6)
with parameter element (w, v) ∈ ℝ^d × ℝ^n. As long as the Mangasarian-Fromovitz constraint qualification (4) holds, the K-K-T optimality conditions (2) can be reformulated in terms of this type of parameterized variational condition by taking F(x, w) = ∇_x f_0(x, w) (details in Section 3). The quasi-solution multifunction S is given then by the solutions to the parameterized variational condition (6), the perturbations of which were analyzed in [4]. Here we show that by applying the results of [3] to this formulation of the K-K-T optimality conditions, it can be established that S is "proto-differentiable," moreover with a specific formula for the proto-derivatives.
2
Proto-derivatives
Proto-differentiability, a concept of generalized differentiability which was introduced in [9], is distinguished from other differentiability concepts through its utilization of set convergence of graphs. Consider any multifunction Γ : ℝ^m ⇉ ℝ^n and any pair (w̄, z̄) in the graph of Γ, i.e., with z̄ ∈ Γ(w̄). For each t > 0 one can form the difference quotient multifunction

(Δ_t Γ)_{w̄,z̄} : ω ↦ [Γ(w̄ + tω) − z̄]/t.
A. B. Levy and R. T. Rockafellar
Instead of asking the difference quotient multifunctions (Δ_t Γ)_{w̄,z̄} to converge in some kind of pointwise sense as t ↓ 0, proto-differentiability asks that they converge graphically, i.e., that their graphs converge as subsets of ℝ^m × ℝ^n to the graph of some multifunction Δ. Then Δ is the proto-derivative multifunction at w̄ for z̄ and is denoted by Γ′_{w̄,z̄}; for each ω ∈ ℝ^m, Γ′_{w̄,z̄}(ω) is a certain (possibly empty) subset of ℝ^n. The concept of Painlevé-Kuratowski set convergence underlies the formation of these graphical limits. It refers to a kind of approximation described from two sides as follows. The inner set limit of a parameterized family of sets {G_t}_{t>0} in a Euclidean space is the set of points η such that for every sequence t_k ↓ 0 there is a sequence of points η_k ∈ G_{t_k} with η_k → η. The outer set limit of the family is the set of points η such that for some sequence t_k ↓ 0 there is a sequence of points η_k ∈ G_{t_k} with η_k → η. When the inner and outer set limits coincide, the common set G is the limit as t ↓ 0. In our framework, this is applied to sets that are the graphs of multifunctions. For a multifunction Γ : ℝ^m ⇉ ℝ^n and any pair (w̄, z̄) in gph Γ, i.e., with z̄ ∈ Γ(w̄), the graph of the difference quotient mapping (Δ_t Γ)_{w̄,z̄} is t⁻¹[gph Γ − (w̄, z̄)]. The multifunction Γ⁺_{w̄,z̄} : ℝ^m ⇉ ℝ^n having as its graph the outer limit of the sets gph (Δ_t Γ)_{w̄,z̄} as t ↓ 0 is called the outer graphical derivative of Γ at w̄ for z̄. Similarly, the multifunction Γ⁻_{w̄,z̄} : ℝ^m ⇉ ℝ^n having as its graph the inner limit of these sets is the inner graphical derivative. Proto-differentiability of Γ at w̄ for z̄ is the case where the outer and inner derivatives agree, the common mapping being then the proto-derivative: Γ⁺_{w̄,z̄} = Γ⁻_{w̄,z̄} = Γ′_{w̄,z̄}, cf. Rockafellar [8]. For the sake of better understanding of the approximation inherent in proto-differentiability, we furnish a description of the kind of uniformity that the concept involves.
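Before turning to that description, a small self-contained illustration (ours, not taken from the paper) may help. The normal cone mapping N to the half-line [0, ∞) has a graph that is a cone through the origin, so every difference quotient graph t⁻¹[gph N − (0, 0)] coincides with gph N itself; the graphical limit, and hence the proto-derivative of N at 0 for 0, is then N:

```python
# Toy illustration (ours): the normal cone mapping N(w) to the half-line
# [0, inf) has graph  G = {(w, 0) : w >= 0} U {(0, z) : z <= 0}.
# G is a cone, so the difference-quotient graphs t^{-1}[G - (0,0)] coincide
# with G for every t > 0; the Painleve-Kuratowski limit is G itself and the
# proto-derivative of N at w = 0 for z = 0 is N itself.

def in_graph(w, z):
    return (w >= 0 and z == 0) or (w == 0 and z <= 0)

points = [(1.0, 0.0), (0.0, -2.5), (0.0, 0.0), (3.0, 0.0), (0.0, -1e-4)]
non_points = [(-1.0, 0.0), (1.0, -1.0), (0.0, 0.5)]

for t in [2.0, 0.5, 1e-3]:
    for (w, z) in points:
        # (w, z) lies in t^{-1} G  <=>  (t*w, t*z) lies in G
        assert in_graph(t * w, t * z)
    for (w, z) in non_points:
        assert not in_graph(t * w, t * z)
print("difference-quotient graphs of N coincide with gph N for every t > 0")
```

Because the graphs agree exactly for every t > 0, the inner and outer graphical limits trivially coincide in this example.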
Proposition 2.1 Under the assumption that Γ : ℝ^m ⇉ ℝ^n and Δ : ℝ^m ⇉ ℝ^n are multifunctions having closed graph, the following is necessary and sufficient for Γ to be proto-differentiable at w̄ for z̄ (where z̄ ∈ Γ(w̄)) with proto-derivative Γ′_{w̄,z̄} = Δ: for every ε > 0 there exists τ > 0 such that, for all t ∈ (0, τ),

(a) whenever z̄ + tζ ∈ Γ(w̄ + tω) with |ζ| ≤ ε⁻¹ and |ω| ≤ ε⁻¹, there exist ζ′ and ω′ with |ζ′ − ζ| ≤ ε, |ω′ − ω| ≤ ε, ζ′ ∈ Δ(ω′);

(b) whenever ζ ∈ Δ(ω) with |ζ| ≤ ε⁻¹ and |ω| ≤ ε⁻¹, there exist ζ′ and ω′ with |ζ′ − ζ| ≤ ε, |ω′ − ω| ≤ ε, z̄ + tζ′ ∈ Γ(w̄ + tω′).

Proof. Proto-differentiability means that the graphs G_t = gph (Δ_t Γ)_{w̄,z̄} converge to G = gph Δ as t ↓ 0. Such graph convergence is known to be equivalent to requiring that every neighborhood V of the origin and every bounded set B has an associated τ > 0 with

G_t ∩ B ⊂ G + V and G ∩ B ⊂ G_t + V for all t ∈ (0, τ).

It suffices in this to consider neighborhoods V of (0, 0) ∈ ℝ^{m+n} formed by the product of an ε ball around the origin of ℝ^m and such a ball in ℝ^n, and on the other hand
to consider bounded sets B formed by the product of an ε⁻¹ ball around the origin of ℝ^m and such a ball in ℝ^n. The two inclusions reduce then to (a) and (b). □

The proto-derivative notation simplifies when Γ happens to be single-valued at w̄, i.e., such that the set Γ(w̄) is just a singleton {z̄}; then it suffices to write Γ′_{w̄}. The next result clarifies the relationship between proto-differentiability in this case and B-differentiability as defined by Robinson [7].

Proposition 2.2 Suppose that Γ is single-valued on a neighborhood of w̄. Then Γ is B-differentiable at w̄ if and only if Γ is continuous at w̄ and proto-differentiable at w̄ with Γ′_{w̄} single-valued, in which event one has the local expansion

Γ(w̄ + tω) = Γ(w̄) + tΓ′_{w̄}(ω) + o(t|ω|) for t > 0.  (7)
Proof. B-differentiability corresponds to having an expansion of the form described, but in which the middle term on the right is tΔ(ω) for a continuous (single-valued) mapping Δ. When this holds it is clear that Γ is continuous at w̄ and proto-differentiable there with Γ′_{w̄} = Δ. Conversely, if the latter properties hold with respect to a single-valued mapping Δ, then it can be deduced from [9, Theorem 4.1] that there exists κ > 0 such that |Γ(w) − Γ(w̄)| ≤ κ|w − w̄| for all w in some neighborhood of w̄. In particular, this yields |Δ(ω)| ≤ κ|ω| for all ω. Because the graph of Δ, being a limit under graph convergence, is closed, it follows that Δ must be continuous. The characterization of proto-differentiability in Proposition 2.1 specializes then to show that the mappings (Δ_t Γ)_{w̄,z̄} (which are single-valued on ever larger neighborhoods of 0 and bounded there by κ) converge uniformly on bounded sets to Δ. That is the meaning of the expansion expressing B-differentiability. □

This result means that proto-differentiability extends to set-valued mappings, just in the manner that might be wished, the notion of one-sided directional differentiability deemed most appropriate in the sensitivity analysis of single-valued mappings, smooth or nonsmooth. The question of whether a certain mapping is single-valued or not can be dealt with as a separate issue, which need not be resolved before progress can be made on quantitative stability of solutions.
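A minimal sketch of the expansion (7) (our own toy example, not from the paper): for the single-valued, nonsmooth mapping G(w) = max(w, 0), the proto-derivative at w̄ = 0 is w′ ↦ max(w′, 0), and the remainder in (7) vanishes identically because gph G is a cone at the origin:

```python
# Toy illustration (not from the paper): for the single-valued, nonsmooth
# mapping G(w) = max(w, 0), the proto-derivative at w_bar = 0 is the map
# w' -> max(w', 0), and the expansion (7) holds with zero remainder because
# the graph of G is a cone at the origin.

def G(w):
    return max(w, 0.0)

def proto_derivative_at_0(w_prime):
    # graphs of the difference quotients [G(0 + t*w') - G(0)]/t coincide
    # with gph G for every t > 0, so the proto-derivative is G itself
    return max(w_prime, 0.0)

for t in [1.0, 0.1, 1e-3, 1e-6]:
    for w_prime in [-2.0, -0.5, 0.0, 0.7, 3.0]:
        lhs = G(0.0 + t * w_prime)
        rhs = G(0.0) + t * proto_derivative_at_0(w_prime)
        assert abs(lhs - rhs) < 1e-12  # remainder o(t|w'|) is exactly 0 here
print("expansion (7) verified for G(w) = max(w, 0) at w_bar = 0")
```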
3
Sensitivity Theorem
Our main result rests on the reformulation of the K-K-T optimality conditions (2) as a parameterized variational condition (6), so we give the details of this reformulation next. In terms of the mapping G : ℝ^n × ℝ^d → ℝ^m defined by

G(x, w) := (f_1(x, w), …, f_m(x, w)),

the K-K-T conditions simply require that

∇_x f_0(x, w) + ∇_x G(x, w)* N_K(G(x, w)) ∋ v,  (8)
where ∇_x G(x, w)* is the transpose of the partial Jacobian matrix for G with respect to x. In [3, Theorem 5.1], we have shown that if the Mangasarian-Fromovitz constraint qualification holds at (x̄, w̄), then for all pairs (x, w) ∈ ℝ^n × ℝ^d that are sufficiently close to (x̄, w̄), the set ∇_x G(x, w)* N_K(G(x, w)) is equal to the normal cone mapping associated with the set C(w) at x. Under these circumstances then, the K-K-T optimality conditions (2) come out as the variational condition

∇_x f_0(x, w) + N_{C(w)}(x) ∋ v.  (9)
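As a concrete toy instance of a variational condition of the form (9) (ours, not from the paper; the function choices are assumptions made for illustration): take f_0(x, w) = ½x² − wx and C(w) = [0, ∞) for every w. The condition x − w − v + N_{[0,∞)}(x) ∋ 0 then has the unique solution S(w, v) = max(w + v, 0), so S is single-valued here, and its difference quotients at (0, 0) can be compared directly with the claimed proto-derivative:

```python
# Toy illustration (ours, not from the paper): minimize f0(x,w) = x**2/2 - w*x
# over x >= 0, with a tilt parameter v.  The variational condition
#   x - w - v + N_{[0,inf)}(x) ∋ 0
# has the unique solution S(w,v) = max(w + v, 0).  We check numerically that
# the difference quotients of S at (w,v) = (0,0), x = 0 agree with the map
# (w',v') -> max(w' + v', 0), which is therefore the proto-derivative there.

def S(w, v):
    return max(w + v, 0.0)

def S_proto(wp, vp):          # claimed proto-derivative at (w,v) = (0,0), x = 0
    return max(wp + vp, 0.0)

for t in [1.0, 0.25, 1e-2, 1e-5]:
    for (wp, vp) in [(-1.0, 0.3), (0.5, 0.5), (2.0, -3.0), (0.0, 0.0)]:
        diff_quot = (S(t * wp, t * vp) - S(0.0, 0.0)) / t
        assert abs(diff_quot - S_proto(wp, vp)) < 1e-9
print("difference quotients of S agree with the claimed proto-derivative")
```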
Theorem 3.1 Let S : ℝ^{d+n} ⇉ ℝ^n be the K-K-T solution mapping defined by S(w, v) := {x : the K-K-T conditions (2) hold}, and let x̄ ∈ S(w̄, v̄) be such that the Mangasarian-Fromovitz constraint qualification (4) is satisfied. Then for all (w, v) sufficiently close to (w̄, v̄) and for all x ∈ S(w, v), S is proto-differentiable at (w, v) for x with proto-derivative given by the formula

S′_{(w,v),x}(w′, v′) := {x′ : ∇²_{xx} f_0(x, w)x′ + ∇²_{xw} f_0(x, w)w′ + M′_{(x,w),z}(x′, w′) ∋ v′}

with z = v − ∇_x f_0(x, w) and M(x, w) := N_{C(w)}(x), the multifunction M being proto-differentiable at (x, w) for z.

Proof. From the equivalence of the K-K-T optimality conditions (2) to the variational condition (9), we get the K-K-T solution mapping S to reduce to the solution mapping associated with this variational condition, namely

S(w, v) = {x : ∇_x f_0(x, w) + N_{C(w)}(x) ∋ v}.

This is exactly the kind of solution mapping whose proto-differentiability was studied in [3]. The proto-differentiability of M(x, w) = N_{C(w)}(x) immediately follows along with that of S from [3, Theorem 5.2]. □

To carry this further, a formula for the proto-derivatives of M is required. We can obtain such a formula from viewing C(w) for each w as the x-section at w of the set

E = {(x, w) ∈ ℝ^n × ℝ^d : G(x, w) ∈ K}.

Poliquin and Rockafellar [5] show that when the Mangasarian-Fromovitz constraint qualification holds at (x, w) ∈ E, the multifunction N_E : (x, w) ↦ N_E(x, w) is proto-differentiable at (x, w) for any (z, q) ∈ N_E(x, w). Then from [3, Theorem 5.2] we have

M′_{(x,w),z}(x′, w′) = {z′ : there exists q′ with (z′, q′) ∈ (N_E)′_{(x,w),(z,q)}(x′, w′)}.  (10)

Here the formula for the proto-derivatives of N_E is a key ingredient. To develop it and to put our various pieces together, we need the following notation.
We let I_s(x, w) and I_m denote the sets of active indices at (x, w) in the specification of E, namely

I_s(x, w) = {i ∈ {1, …, s} : f_i(x, w) = 0} and I_m = {s + 1, …, m},

and we define the polyhedral cone Q(x, w) ⊂ ℝ^{n+d} by

Q(x, w) = {(x′, w′) : ⟨∇f_i(x, w), (x′, w′)⟩ ≤ 0 for i ∈ I_s(x, w), ⟨∇f_i(x, w), (x′, w′)⟩ = 0 for i ∈ I_m}.

Next we introduce certain sets of multiplier vectors, first the bounded, polyhedral set

Y(x, w, z, q) = {y = (y_1, …, y_m) ∈ N_K(f_1(x, w), …, f_m(x, w)) : Σ_{i=1}^m y_i ∇f_i(x, w) = (z, q)},

and its face

Y_max(x, w, z, q; x′, w′) = argmax_{y ∈ Y(x,w,z,q)} Σ_{i=1}^m y_i ⟨∇²f_i(x, w)(x′, w′), (x′, w′)⟩,

and then the polyhedral cone

Y′(x, w; x′, w′) = {y′ = (y′_1, …, y′_m) ∈ N_K(f_1(x, w), …, f_m(x, w)) : y′_i = 0 for i with ⟨∇f_i(x, w), (x′, w′)⟩ ≠ 0}.
Theorem 3.2 Under the assumptions of Theorem 3.1, the proto-derivatives of the multifunction M are given as follows. For (x′, w′) ∉ Q(x, w), the set M′_{(x,w),z}(x′, w′) is empty. But for (x′, w′) ∈ Q(x, w), the set M′_{(x,w),z}(x′, w′) consists of all vectors z′ having the form

z′ = Σ_{i=1}^m y_i [∇²_{xx} f_i(x, w)x′ + ∇²_{xw} f_i(x, w)w′] + Σ_{i=1}^m y′_i ∇_x f_i(x, w) − y′_0 z

generated by arbitrary choices of y ∈ Y_max(x, w, z, q; x′, w′), y′ ∈ Y′(x, w; x′, w′) and y′_0 ∈ ℝ.

Proof. The proto-derivative formula for N_E from [5] involves the cone

Q(x, w; z, q) = {(x′, w′) ∈ Q(x, w) : ⟨(z, q), (x′, w′)⟩ = 0}.

We have (z, q) ∈ N_E(x, w) if and only if there exists y ∈ Y(x, w, z, q). Suppose that holds. The formula in question says that the set (N_E)′_{(x,w),(z,q)}(x′, w′) is empty if (x′, w′) ∉ Q(x, w; z, q), whereas if (x′, w′) ∈ Q(x, w; z, q) this set consists of all pairs (z′, q′) of the form

(z′, q′) = Σ_{i=1}^m y_i ∇²f_i(x, w)(x′, w′) + Σ_{i=1}^m y′_i ∇f_i(x, w) − y′_0 (z, q)

generated by arbitrary choices of y ∈ Y_max(x, w, z, q; x′, w′), y′ ∈ Y′(x, w; x′, w′) and y′_0 ∈ ℝ. When this is plugged into (10) we get the formula claimed here. □

Theorems 3.1 and 3.2 can be extended to cover other solution mappings associated with much more general optimization problems, but we will not take this up here. As seen, these results rely heavily on those of [3], in particular on [3, Theorem 5.2]. The theory developed in [3] allows a direct sensitivity analysis of parameterized optimization problems to a degree that has not been possible before. We are able on this foundation to obtain in Theorems 3.1 and 3.2 proto-derivatives in the sensitivity analysis of the "primal" solution mapping S without making any restrictions on the multiplier vectors y in the K-K-T optimality conditions. When the solution mapping S happens to be single-valued, Theorem 3.1 gives results about the B-differentiability of S.

Theorem 3.3 Let S : ℝ^{d+n} ⇉ ℝ^n be the K-K-T solution mapping defined by S(w, v) := {x : the K-K-T conditions (2) hold}, and let x̄ ∈ S(w̄, v̄) be such that the Mangasarian-Fromovitz constraint qualification (4) is satisfied. If S is single-valued on some neighborhood of (w̄, v̄) and continuous at (w̄, v̄), and S′_{(w̄,v̄)} is single-valued as well, then S is B-differentiable at (w̄, v̄) with the expansion

S(w̄ + tw′, v̄ + tv′) = S(w̄, v̄) + tS′_{(w̄,v̄)}(w′, v′) + o(t|(w′, v′)|),

the B-derivative S′_{(w̄,v̄)}(w′, v′) being given by the formula in Theorem 3.1 in combination with the one in Theorem 3.2.

Proof. This combines the preceding results with Proposition 2.2. □
References

[1] A. V. Fiacco and J. Kyparisis, Sensitivity analysis in nonlinear programming under second order assumptions. In A. V. Balakrishnan and E. M. Thoma, editors, Lecture Notes in Control and Information Sciences, Springer-Verlag, (1985) 74-97.
[2] J. Kyparisis, Sensitivity analysis for nonlinear programs and variational inequalities with nonunique multipliers. Mathematics of Operations Research, 15 (1990) 286-298.

[3] A. B. Levy and R. T. Rockafellar, Variational conditions and the proto-differentiation of partial subgradient mappings. Nonlinear Analysis: Theory, Methods and Applications, (1994) to appear.

[4] A. B. Levy and R. T. Rockafellar, Sensitivity analysis of solutions to generalized equations. Transactions of the American Mathematical Society, 345 (1994) 661-671.

[5] R. A. Poliquin and R. T. Rockafellar, Proto-derivative formulas for basic subgradient mappings in mathematical programming. Set-Valued Analysis, 2 (1994) 275-290.

[6] D. Ralph and S. Dempe, Directional derivatives of the solution of a parametric nonlinear program, 1994, Research Report.

[7] S. M. Robinson, Local structure of feasible sets in nonlinear programming, Part III: stability and sensitivity. Mathematical Programming Study, 30 (1987) 45-56.

[8] R. T. Rockafellar, Nonsmooth analysis and parametric optimization. In A. Cellina, editor, Methods of Nonconvex Analysis, Springer-Verlag, (1990) 137-151.

[9] R. T. Rockafellar, Proto-differentiability of set-valued mappings and its applications in optimization. In H. Attouch, J. P. Aubin, F. H. Clarke, and I. Ekeland, editors, Analyse Non Linéaire, Gauthier-Villars, (1989) 449-482.
B. Mond and J. Zhang
Recent Advances in Nonsmooth Optimization, pp. 224-243 Eds. D.-Z. Du, L. Qi and R.S. Womersley ©1995 World Scientific Publishing Co Pte Ltd
Generalized Convexity and Higher Order Duality of the Non-linear Programming Problem with Non-negative Variables

Bertram Mond
School of Mathematics, La Trobe University, Bundoora, Victoria, 3083, Australia

Jinyun Zhang
School of Mathematics, La Trobe University, Bundoora, Victoria, 3083, Australia

Abstract
Consider the nonlinear programming problem with non-negative variables. A number of different second order duals are given and appropriate duality theorems established under weakened second order convexity conditions. Higher order dual problems are also discussed and corresponding duality results established.
1
Introduction
Consider the nonlinear programming problems

(P)   min f(x) s.t. g(x) ≥ 0
(P′)  min f(x) s.t. g(x) ≥ 0, x ≥ 0

where f and g are twice differentiable functions from ℝ^n into ℝ and ℝ^m respectively. The Wolfe duals [14], [7] of (P) and (P′) are respectively (where ∇ denotes the gradient column vector with respect to x)

(1D)   max f(u) − yᵀg(u) s.t. ∇yᵀg(u) = ∇f(u), y ≥ 0
(1D′)  max f(u) − yᵀg(u) − uᵀ[∇f(u) − ∇yᵀg(u)] s.t. ∇f(u) ≥ ∇yᵀg(u), y ≥ 0
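A quick numerical sanity check of weak duality for (1D′) (our own toy instance, not from the paper): with f(x) = x² and g(x) = x − 1, problem (P′) has optimal value 1 at x = 1, and every (1D′)-feasible pair (u, y) gives a dual objective value of at most 1:

```python
import random

# Toy instance (ours): f(x) = x**2, g(x) = x - 1, so (P') is min x^2 s.t.
# x >= 1, x >= 0, with optimal value 1 at x = 1.  The Wolfe dual (1D')
# objective is f(u) - y*g(u) - u*(f'(u) - y*g'(u)) subject to f'(u) >= y*g'(u)
# and y >= 0, i.e. 2u >= y.  Weak duality: every feasible dual value is <= 1.

def wolfe_dual(u, y):
    return u*u - y*(u - 1) - u*(2*u - y)

random.seed(4)
for _ in range(10000):
    u = random.uniform(0, 3)
    y = random.uniform(0, 2*u)        # enforce 0 <= y <= 2u
    assert wolfe_dual(u, y) <= 1 + 1e-9

assert wolfe_dual(1.0, 2.0) == 1.0    # the bound is attained at u = 1, y = 2
print("Wolfe weak duality verified on the toy instance")
```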
The duality of (1D) to (P) and (1D′) to (P′) was first established with f convex and g concave. Bector et al. [1], Mahajan and Vartak [6] and Mond and Weir [10] established the duality of (1D) to (P) and (1D′) to (P′) when the Lagrangean f − yᵀg, respectively f − yᵀg − vᵀ[·], is pseudo-convex. Mond and Weir [10] and Mond and Zhang [12] established the duality of a general dual to (P) and (P′) under still weaker convexity conditions. Mangasarian [8] first formulated the following second order dual to (P) (where p ∈ ℝ^n and ∇² is the symmetric n × n matrix of second order partial derivatives)

(2D)  max f(u) − yᵀg(u) − ½pᵀ[∇²f(u) − ∇²yᵀg(u)]p
      s.t. ∇yᵀg(u) + (∇²yᵀg(u))p = ∇f(u) + ∇²f(u)p,
           y ≥ 0

and established duality theorems under somewhat involved assumptions. Note that if p = 0, then (2D) becomes (1D). In [9], Mond established the duality of (2D) to (P) under the following simpler assumptions:
f(x) − f(u) ≥ (x − u)ᵀ∇f(u) + (x − u)ᵀ∇²f(u)p − ½pᵀ∇²f(u)p,  (1)

g_i(x) − g_i(u) ≤ (x − u)ᵀ∇g_i(u) + (x − u)ᵀ∇²g_i(u)p − ½pᵀ∇²g_i(u)p,  i = 1, 2, …, m,  (2)

for all (x, u, p).
Mahajan [5] calls the conditions (1) and (2) second order convexity and concavity respectively. Similarly, Mahajan [5] and Mond and Weir [11] give the following definitions: f is said to be second order pseudo-convex in (x, u) for p if

(x − u)ᵀ∇f(u) + (x − u)ᵀ∇²f(u)p ≥ 0 ⇒ f(x) ≥ f(u) − ½pᵀ∇²f(u)p.  (3)

f is said to be second order quasi-convex in (x, u) for p if

f(x) − f(u) + ½pᵀ∇²f(u)p ≤ 0 ⇒ (x − u)ᵀ[∇f(u) + ∇²f(u)p] ≤ 0.  (4)

A function f is second order pseudo-concave or second order quasi-concave if −f is second order pseudo-convex or second order quasi-convex. Note that second order convexity, pseudo-convexity and quasi-convexity imply, respectively, (first order) convexity, pseudo-convexity and quasi-convexity, since the respective inequalities must hold for p = 0. Clearly a function that is second order convex is also second order pseudo-convex and second order quasi-convex.
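To ground the definitions numerically (our own check, not part of the paper): any quadratic f(x) = ½xᵀAx + bᵀx with A positive semidefinite satisfies the second order convexity condition (1), since the gap between the two sides equals ½(x − u − p)ᵀA(x − u − p) ≥ 0. The sketch below samples random points for one positive definite A:

```python
import random

# Check condition (1) for the convex quadratic f(x) = 0.5*x'Ax + b'x in R^2,
# where A = [[2,1],[1,2]] is positive definite.  Here grad f(u) = Au + b and
# the Hessian is the constant matrix A, so (1) reads
#   f(x) - f(u) >= (x-u)'(Au+b) + (x-u)'Ap - 0.5*p'Ap.
A = [[2.0, 1.0], [1.0, 2.0]]
b = [1.0, -3.0]

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

def f(x):
    return 0.5 * dot(x, mat_vec(A, x)) + dot(b, x)

random.seed(0)
for _ in range(1000):
    x = [random.uniform(-5, 5) for _ in range(2)]
    u = [random.uniform(-5, 5) for _ in range(2)]
    p = [random.uniform(-5, 5) for _ in range(2)]
    d = [x[i] - u[i] for i in range(2)]
    lhs = f(x) - f(u)
    rhs = (dot(d, mat_vec(A, u)) + dot(d, b)
           + dot(d, mat_vec(A, p)) - 0.5 * dot(p, mat_vec(A, p)))
    assert lhs >= rhs - 1e-9
print("condition (1) verified on 1000 random samples")
```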
Mond and Weir [11] gave a number of second order duals of (P) and established duality theorems under weakened second order convexity conditions. We now give a number of second order duals of (P′) and establish duality theorems under weakened second order convexity conditions. Higher order duals are also considered.
2
Second Order Duality
We first give the following second order dual to (P′):

(2D′)  max f(u) − yᵀg(u) − uᵀ[∇f(u) − ∇yᵀg(u) + ∇²f(u)p − ∇²yᵀg(u)p]
           − ½pᵀ[∇²f(u) − ∇²yᵀg(u)]p
       s.t. ∇f(u) + ∇²f(u)p ≥ ∇yᵀg(u) + ∇²yᵀg(u)p,  (5)
            y ≥ 0
Theorem 2.1 (Weak duality) Let x satisfy the constraints of (P′) and (u, y, p) satisfy the constraints of (2D′). If f is second order convex for all feasible (x, u, y, p) and yᵀg is second order concave for all feasible (x, u, y, p), then

inf(P′) ≥ sup(2D′).

Proof: Since f is second order convex for all feasible (x, u, y, p), we have

f(x) ≥ f(u) + (x − u)ᵀ∇f(u) + (x − u)ᵀ∇²f(u)p − ½pᵀ∇²f(u)p
     = f(u) + xᵀ[∇f(u) + ∇²f(u)p] − uᵀ[∇f(u) + ∇²f(u)p] − ½pᵀ∇²f(u)p
     ≥ f(u) + xᵀ[∇yᵀg(u) + ∇²yᵀg(u)p] − uᵀ[∇f(u) + ∇²f(u)p] − ½pᵀ∇²f(u)p
     = f(u) + (x − u)ᵀ[∇yᵀg(u) + ∇²yᵀg(u)p] − uᵀ[∇f(u) + ∇²f(u)p − ∇yᵀg(u) − ∇²yᵀg(u)p] − ½pᵀ∇²f(u)p
     ≥ f(u) + yᵀg(x) − yᵀg(u) + ½pᵀ(∇²yᵀg(u))p − uᵀ[∇f(u) + ∇²f(u)p − ∇yᵀg(u) − ∇²yᵀg(u)p] − ½pᵀ∇²f(u)p
     = f(u) + yᵀg(x) − yᵀg(u) − uᵀ[∇f(u) + ∇²f(u)p − ∇yᵀg(u) − ∇²yᵀg(u)p] − ½pᵀ[∇²f(u) − ∇²yᵀg(u)]p
     ≥ f(u) − yᵀg(u) − uᵀ[∇f(u) + ∇²f(u)p − ∇yᵀg(u) − ∇²yᵀg(u)p] − ½pᵀ[∇²f(u) − ∇²yᵀg(u)]p.

(The second inequality holds since x ≥ 0 and ∇f(u) + ∇²f(u)p ≥ ∇yᵀg(u) + ∇²yᵀg(u)p, the third inequality holds since yᵀg is second order concave, and the last inequality holds since y ≥ 0, g(x) ≥ 0.) □
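As a numerical illustration of weak duality for (2D′) (our own toy instance, not from the paper): take f(x) = x² and g(x) = x − 1, so (P′) has optimal value 1 at x = 1; f is a convex quadratic (hence second order convex) and yᵀg is linear (hence second order concave), so every (2D′)-feasible (u, y, p) should give a dual value of at most 1, with equality attained at the Kuhn-Tucker point u = 1, y = 2, p = 0:

```python
import random

# Toy instance (ours): f(x) = x^2, g(x) = x - 1, so (P') is min x^2 s.t. x >= 1
# with optimal value 1.  The (2D') objective and constraint specialize to
#   objective(u,y,p) = u^2 - y*(u-1) - u*(2u - y + 2p) - p^2
#   constraint: 2u + 2p >= y, y >= 0.
# Weak duality (as in the theorem above) says objective <= 1 at every
# feasible (u,y,p).

def dual_objective(u, y, p):
    grad_part = 2*u + 2*p - y         # [grad f + H_f p - grad y'g - H_{y'g} p]
    return u*u - y*(u - 1) - u*grad_part - p*p

random.seed(1)
best = -float("inf")
for _ in range(20000):
    u = random.uniform(-3, 3)
    p = random.uniform(-3, 3)
    ymax = 2*u + 2*p
    if ymax < 0:
        continue                      # no feasible y for this (u,p)
    y = random.uniform(0, ymax)       # enforce 0 <= y <= 2u + 2p
    val = dual_objective(u, y, p)
    assert val <= 1 + 1e-9            # weak duality
    best = max(best, val)

assert dual_objective(1.0, 2.0, 0.0) == 1.0   # tight at the K-T point
print("largest sampled dual value:", best, "<= primal optimum 1")
```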
Theorem 2.2 (Weak duality) Let x satisfy the constraints of (P′) and (u, y, p) satisfy the constraints of (2D′). If, for each v, f − yᵀg − vᵀ[·] is second order pseudo-convex in (x, u) whenever (x, u, y, p) is feasible for (P′) and (2D′), then

inf(P′) ≥ sup(2D′).

Proof: Let v = ∇f(u) + ∇²f(u)p − ∇yᵀg(u) − ∇²yᵀg(u)p. From (5),

(x − u)ᵀ∇{f(u) − yᵀg(u) − uᵀv} + (x − u)ᵀ∇²{f(u) − yᵀg(u) − uᵀv}p
  = (x − u)ᵀ{∇f(u) − ∇yᵀg(u) − v + ∇²f(u)p − ∇²yᵀg(u)p} = 0.

Since f − yᵀg − vᵀ[·] is second order pseudo-convex, we have

f(x) − yᵀg(x) − xᵀv ≥ f(u) − yᵀg(u) − uᵀv − ½pᵀ[∇²f(u) − ∇²yᵀg(u)]p.

By y ≥ 0, g(x) ≥ 0, x ≥ 0 and v = ∇f(u) + ∇²f(u)p − ∇yᵀg(u) − ∇²yᵀg(u)p ≥ 0, we have

f(x) ≥ f(u) − yᵀg(u) − uᵀ[∇f(u) + ∇²f(u)p − ∇yᵀg(u) − ∇²yᵀg(u)p] − ½pᵀ[∇²f(u) − ∇²yᵀg(u)]p. □

Theorem 2.3 (Strong duality) Let x⁰ be a local or global optimal solution of (P′) at which a constraint qualification is satisfied. Then there exists y⁰ ∈ ℝ^m such that (x⁰, y⁰, p = 0) is feasible for (2D′) and the corresponding values of (P′) and (2D′) are equal there. If also, for each v, f − yᵀg − vᵀ[·] is second order pseudo-convex in (x, u) whenever (x, u, y, p) is feasible for (P′) and (2D′), then x⁰ and (x⁰, y⁰, p = 0) are global optimal solutions for (P′) and (2D′) respectively.

Proof: Since a constraint qualification is satisfied at x⁰, by the necessary Kuhn-Tucker conditions there exists y⁰ ∈ ℝ^m such that

∇f(x⁰) ≥ ∇y⁰ᵀg(x⁰), x⁰ᵀ[∇f(x⁰) − ∇y⁰ᵀg(x⁰)] = 0, y⁰ᵀg(x⁰) = 0, y⁰ ≥ 0.

Thus (x⁰, y⁰, p = 0) is feasible for (2D′) and the corresponding values of (P′) and (2D′) are equal. If f − yᵀg − vᵀ[·] is second order pseudo-convex in (x, u), then by weak duality, x⁰ and (x⁰, y⁰, p = 0) must be optimal for (P′) and (2D′) respectively. □

Before deriving a general second order dual of (P′), we first list some special cases.
(2D1′)  max f(u) − ½pᵀ∇²f(u)p
        s.t. ∇f(u) + ∇²f(u)p ≥ ∇yᵀg(u) + ∇²yᵀg(u)p,
             yᵀg(u) + uᵀ[∇f(u) − ∇yᵀg(u) + ∇²f(u)p − ∇²yᵀg(u)p] − ½pᵀ∇²yᵀg(u)p ≤ 0,
             y ≥ 0

(2D2′)  max f(u) − yᵀg(u) − ½pᵀ[∇²f(u) − ∇²yᵀg(u)]p
        s.t. ∇f(u) + ∇²f(u)p ≥ ∇yᵀg(u) + ∇²yᵀg(u)p,
             uᵀ[∇f(u) − ∇yᵀg(u) + ∇²f(u)p − ∇²yᵀg(u)p] ≤ 0,
             y ≥ 0

(2D3′)  max f(u) − uᵀ[∇f(u) − ∇yᵀg(u) + ∇²f(u)p − ∇²yᵀg(u)p] − ½pᵀ∇²f(u)p
        s.t. ∇f(u) + ∇²f(u)p ≥ ∇yᵀg(u) + ∇²yᵀg(u)p,
             yᵀg(u) − ½pᵀ∇²yᵀg(u)p ≤ 0,
             y ≥ 0

(2D1′) is a dual to (P′) under the assumption that f is second order pseudo-convex and yᵀg + vᵀ[·] is second order quasi-concave in (x, u) for all feasible (x, u, y, p). (2D2′) is a dual to (P′) if f − yᵀg is second order pseudo-convex, and (2D3′) is a dual to (P′) if f − vᵀ[·] is second order pseudo-convex and yᵀg is second order quasi-concave in (x, u) for all feasible (x, u, y, p). (These will be shown later as special cases of a general result.) Observe that xᵀv, for all v, will be second order concave, and hence second order quasi-concave in x, and so this condition does not have to be stated in (2D2′). Other second order duals to (P′) are possible, with the components g_i of g grouped in different ways, depending on the convexity conditions of f and g. We now give a general second order dual to (P′). Let M = {1, 2, …, m} and N = {1, 2, …, n}. Let I_a ⊂ M, a = 0, 1, 2, …, r, with I_a ∩ I_β = ∅ for a ≠ β and ∪_{a=0,1,…,r} I_a = M. Let J_a ⊂ N, a = 0, 1, 2, …, r, with J_a ∩ J_β = ∅ for a ≠ β and ∪_{a=0,1,…,r} J_a = N. Note that any particular I_a or J_a may be empty. Thus if M has r₁ disjoint subsets and N has r₂ disjoint subsets, r = max{r₁, r₂}. Thus, if r₁ > r₂ then J_a, a > r₂, is
empty.

(2DG′)  max f(u) − Σ_{i∈I_0} y_i g_i(u) − Σ_{j∈J_0} u_j [∇f(u) + ∇²f(u)p − ∇yᵀg(u) − ∇²yᵀg(u)p]_j
            − ½pᵀ[∇²f(u) − ∇²Σ_{i∈I_0} y_i g_i(u)]p
        s.t. ∇f(u) + ∇²f(u)p − ∇yᵀg(u) − ∇²yᵀg(u)p ≥ 0,  (6)
             Σ_{i∈I_a} y_i g_i(u) + Σ_{j∈J_a} u_j [∇f(u) + ∇²f(u)p − ∇yᵀg(u) − ∇²yᵀg(u)p]_j
                 − ½pᵀ∇²Σ_{i∈I_a} y_i g_i(u) p ≤ 0,  a = 1, 2, …, r,
y ≥ 0.

Theorem 2.4 (Weak duality) Let x satisfy the constraints of (P′) and (u, y, p) satisfy the constraints of (2DG′). If f − Σ_{i∈I_0} y_i g_i − Σ_{j∈J_0} v_j [·]_j is second order pseudo-convex and Σ_{i∈I_a} y_i g_i + Σ_{j∈J_a} v_j [·]_j, a = 1, 2, …, r, is second order quasi-concave in (x, u) whenever (x, u, y, p) is feasible for (P′) and (2DG′), then

inf(P′) ≥ sup(2DG′).

Proof: From (6), let v = ∇f(u) + ∇²f(u)p − ∇yᵀg(u) − ∇²yᵀg(u)p. Then

Σ_{i∈I_a} y_i g_i(x) + Σ_{j∈J_a} x_j v_j − Σ_{i∈I_a} y_i g_i(u) − Σ_{j∈J_a} u_j v_j + ½pᵀ∇²Σ_{i∈I_a} y_i g_i(u) p ≥ 0,  a = 1, 2, …, r.

Since Σ_{i∈I_a} y_i g_i + Σ_{j∈J_a} v_j [·]_j is second order quasi-concave, this implies

(x − u)ᵀ∇{Σ_{i∈I_a} y_i g_i(u) + Σ_{j∈J_a} u_j v_j} + (x − u)ᵀ∇²{Σ_{i∈I_a} y_i g_i(u) + Σ_{j∈J_a} u_j v_j}p ≥ 0,  a = 1, 2, …, r,

the gradients being taken with respect to u with v held fixed. Summing over a = 1, 2, …, r,

(x − u)ᵀ∇{Σ_{i∈M\I_0} y_i g_i(u) + Σ_{j∈N\J_0} u_j v_j} + (x − u)ᵀ∇²Σ_{i∈M\I_0} y_i g_i(u) p ≥ 0,

or

(x − u)ᵀ{∇Σ_{i∈M\I_0} y_i g_i(u) + v − ∇Σ_{j∈J_0} u_j v_j + ∇²Σ_{i∈M\I_0} y_i g_i(u) p} ≥ 0.

Since v = ∇f(u) + ∇²f(u)p − ∇yᵀg(u) − ∇²yᵀg(u)p, we have

(x − u)ᵀ∇{f(u) − Σ_{i∈I_0} y_i g_i(u) − Σ_{j∈J_0} u_j v_j} + (x − u)ᵀ∇²{f(u) − Σ_{i∈I_0} y_i g_i(u)}p ≥ 0.

Since f − Σ_{i∈I_0} y_i g_i − Σ_{j∈J_0} v_j [·]_j is second order pseudo-convex, we have

f(x) − Σ_{i∈I_0} y_i g_i(x) − Σ_{j∈J_0} x_j v_j ≥ f(u) − Σ_{i∈I_0} y_i g_i(u) − Σ_{j∈J_0} u_j v_j − ½pᵀ[∇²f(u) − ∇²Σ_{i∈I_0} y_i g_i(u)]p.

By y ≥ 0, g(x) ≥ 0, x ≥ 0 and v = ∇f(u) + ∇²f(u)p − ∇yᵀg(u) − ∇²yᵀg(u)p ≥ 0, we have

f(x) ≥ f(u) − Σ_{i∈I_0} y_i g_i(u) − Σ_{j∈J_0} u_j [∇f(u) + ∇²f(u)p − ∇yᵀg(u) − ∇²yᵀg(u)p]_j − ½pᵀ[∇²f(u) − ∇²Σ_{i∈I_0} y_i g_i(u)]p. □

Theorem 2.5 (Strong duality) Let x⁰ be a local or global optimal solution of (P′) at which a constraint qualification is satisfied. Then there exists y⁰ ∈ ℝ^m such that (x⁰, y⁰, p = 0) is feasible for (2DG′) and the corresponding values of (P′) and (2DG′) are equal. If also, for each v, f − Σ_{i∈I_0} y_i g_i − Σ_{j∈J_0} v_j [·]_j is second order pseudo-convex and Σ_{i∈I_a} y_i g_i + Σ_{j∈J_a} v_j [·]_j, a = 1, 2, …, r, is second order quasi-concave in (x, u) whenever (x, u, y, p) is feasible for (P′) and (2DG′), then x⁰ and (x⁰, y⁰, p = 0) are global optimal solutions for (P′) and (2DG′) respectively.

Proof: Since a constraint qualification is satisfied at x⁰, by the necessary Kuhn-Tucker conditions there exists y⁰ ∈ ℝ^m such that

∇f(x⁰) ≥ ∇y⁰ᵀg(x⁰), x⁰ᵀ[∇f(x⁰) − ∇y⁰ᵀg(x⁰)] = 0, y⁰ᵀg(x⁰) = 0, y⁰ ≥ 0.
Thus (x⁰, y⁰, p = 0) is feasible for (2DG′) and the corresponding values of (P′) and (2DG′) are equal. Optimality then follows from weak duality if f − Σ_{i∈I_0} y_i g_i − Σ_{j∈J_0} v_j [·]_j is second order pseudo-convex and Σ_{i∈I_a} y_i g_i + Σ_{j∈J_a} v_j [·]_j, a = 1, 2, …, r, is second order quasi-concave in (x, u). □

We now consider some special cases of the dual (2DG′) and Theorems 2.4 and 2.5. If I_0 = M, J_0 = N, then (2DG′) becomes (2D′). From Theorems 2.2 and 2.3 or Theorems 2.4 and 2.5, (2D′) is a dual to (P′) if f − yᵀg − vᵀ[·] is second order pseudo-convex for all feasible (x, u, y, p). In particular, f − yᵀg − vᵀ[·] is second order pseudo-convex if f is second order convex and g is second order concave. If I_0 = ∅, J_0 = ∅, I_1 = M, J_1 = N, then (2DG′) becomes (2D1′). From Theorems 2.4 and 2.5, (2D1′) is a dual to (P′) if f is second order pseudo-convex and yᵀg + vᵀ[·] is second order quasi-concave for all feasible (x, u, y, p). If I_0 = M, J_0 = ∅, J_1 = N, then (2DG′) becomes (2D2′). From Theorems 2.4 and 2.5, (2D2′) is a dual to (P′) if f − yᵀg is second order pseudo-convex for all feasible (x, u, y, p). If I_0 = ∅, J_0 = N, I_1 = M, then (2DG′) becomes (2D3′). From Theorems 2.4 and 2.5, (2D3′) is a dual to (P′) if f − vᵀ[·] is second order pseudo-convex and yᵀg is second order quasi-concave for all feasible (x, u, y, p). If I_0 = ∅, J_0 = N and I_a = {a}, a = 1, 2, …, m, then (2DG′) becomes

(2D4′)  max f(u) − uᵀ[∇f(u) + ∇²f(u)p − ∇yᵀg(u) − ∇²yᵀg(u)p] − ½pᵀ∇²f(u)p
        s.t. ∇f(u) + ∇²f(u)p ≥ ∇yᵀg(u) + ∇²yᵀg(u)p,
             y_i g_i(u) − ½pᵀ∇²y_i g_i(u)p ≤ 0,  i = 1, 2, …, m,
             y ≥ 0.

From Theorems 2.4 and 2.5, (2D4′) is a dual to (P′) if f − vᵀ[·] is second order pseudo-convex and each y_i g_i, i = 1, 2, …, m, is second order quasi-concave for all feasible (x, u, y, p). Note that if g_i is second order quasi-concave and y_i ≥ 0, then y_i g_i is second order quasi-concave. Thus (2D4′) is a dual to (P′) if f − vᵀ[·] is second order pseudo-convex and g (i.e. each component of g) is second order quasi-concave. We now give a Mangasarian type strict converse duality theorem for the dual (2DG′) to (P′). A function f will be said to be second order strictly pseudo-convex at x* if, for all x ≠ x* and p,

(x − x*)ᵀ[∇f(x*) + ∇²f(x*)p] ≥ 0 ⇒ f(x) − f(x*) − ½pᵀ∇²f(x*)p > 0.
It will be said to be second order strictly pseudo-convex at x*, p* if, for all x ≠ x*,

(x − x*)ᵀ[∇f(x*) + ∇²f(x*)p*] ≥ 0 ⇒ f(x) − f(x*) − ½p*ᵀ∇²f(x*)p* > 0.
J2"J'A-] /f-- E K^ *_ ~ E N
be second order pseudo-convex
and let
*W EW W++5E>» JJ H H, ■€/<.
jeJa
l,2,,...r «a = l-2,,...r
be second order quasi-concave in (x,u) whenever (x,y,u,p) is feasible for (P') and (2DG'). If(x',y',p') is an optimal solution of (2DG1) and iff - E y*i9i - E !>,•[•] is second order strictly pseudo-convex /(*•) = /(x*) £ y:rfftV) /(.') - E gi(x')
at x',p',
then x" = x°, i.e. x'''solves
£ *,-[V/(i*) **[V/(x*) + + V 2 /(x*)p* - E
(P'°) and
T Vy* Vy' g(x')
1 2 x ^ p T j -- V //( ( x *) * ) -- E »*tf(**)]P* -- VvyVT^j,(*v]i V[TVtv2 E vt9i(**)W 2
.e/o
Proof: We assume that x" / x° and exhibit a contradiction. Since x° is a solution of (P') and a constraint qualification is satisfied at x°, it follows by strong duality that there exists y° € Rm,p = 0 such that {x°,y°,p = 0) solves (2DC). Hence = /(*•) /(*•) -- ££ yfr,(*») yfgi(x°) - £ £ x'j[Vf(x') x°[V/(x°) /(*•) =
- V^/^fx0)], Vy^(x-) = /(**) - E *?«(**) -" E ^*[V/(x*) - V;/*^(x*; + V + V22/(x*)p* /(x*)p* -- V VV^fx V^fx V VL L
- V[W(x*)-EvV5 .(-*)K ,(x*)K 2
(7)
iek
1
V*f(x')p'-Vy'Tg(x')-
Since (x',if,p*) is feasible for (2DC), if we let v = Vf(x*) + VyTg(x')p', then we have
E *:*(**) + E *to - E v?w(**) - E *>i *>; •e/a
ie^o
■£/„ T T
22
+ ^P* + ^P* V V
jeJc
£ ¥**(*') P*>0, P*>0,
Q == ll,2,...,r. ,2,...,r.
233
Also, since Σ_{i∈I_a} y*_i g_i + Σ_{j∈J_a} v_j [·]_j is second order quasi-concave, it follows that

(x⁰ − x*)ᵀ∇{Σ_{i∈I_a} y*_i g_i(x*) + Σ_{j∈J_a} x*_j v_j} + (x⁰ − x*)ᵀ∇²{Σ_{i∈I_a} y*_i g_i(x*) + Σ_{j∈J_a} x*_j v_j}p* ≥ 0,  a = 1, 2, …, r.

Thus, summing over a = 1, 2, …, r,

(x⁰ − x*)ᵀ{∇Σ_{i∈M\I_0} y*_i g_i(x*) + v − ∇Σ_{j∈J_0} x*_j v_j + ∇²Σ_{i∈M\I_0} y*_i g_i(x*) p*} ≥ 0.

Since v = ∇f(x*) + ∇²f(x*)p* − ∇y*ᵀg(x*) − ∇²y*ᵀg(x*)p*, it follows that

(x⁰ − x*)ᵀ∇{f(x*) − Σ_{i∈I_0} y*_i g_i(x*) − Σ_{j∈J_0} x*_j v_j} + (x⁰ − x*)ᵀ∇²{f(x*) − Σ_{i∈I_0} y*_i g_i(x*) − Σ_{j∈J_0} x*_j v_j}p* ≥ 0.

Since f − Σ_{i∈I_0} y*_i g_i − Σ_{j∈J_0} v_j [·]_j is second order strictly pseudo-convex at x*, p*, we have that

f(x⁰) − Σ_{i∈I_0} y*_i g_i(x⁰) − Σ_{j∈J_0} x⁰_j v_j > f(x*) − Σ_{i∈I_0} y*_i g_i(x*) − Σ_{j∈J_0} x*_j v_j − ½p*ᵀ[∇²f(x*) − ∇²Σ_{i∈I_0} y*_i g_i(x*)]p*,

which, from (7), implies

f(x⁰) − Σ_{i∈I_0} y*_i g_i(x⁰) − Σ_{j∈J_0} x⁰_j v_j > f(x⁰).

This is a contradiction since y* ≥ 0, g(x⁰) ≥ 0, x⁰ ≥ 0 and v = ∇f(x*) + ∇²f(x*)p* − ∇y*ᵀg(x*) − ∇²y*ᵀg(x*)p* ≥ 0. □
3
Higher Order Duality
In [8], Mangasarian gives the following higher order dual to (P):

(HD)  max f(u) + h(u, p) − yᵀg(u) − yᵀk(u, p)
      s.t. ∇_p h(u, p) = ∇_p (yᵀk(u, p)),
           y ≥ 0

where h : ℝ^n × ℝ^n → ℝ and k : ℝ^n × ℝ^n → ℝ^m are differentiable functions; ∇_p h(u, p) denotes the n × 1 gradient of h with respect to p and ∇_p (yᵀk(u, p)) denotes the n × 1 gradient of yᵀk with respect to p. Note that for the first order dual (1D), h(x, p) = pᵀ∇f(x) and k_i(x, p) = pᵀ∇g_i(x), i = 1, 2, …, m; while for the second order dual (2D), h(x, p) = pᵀ∇f(x) + ½pᵀ∇²f(x)p and k_i(x, p) = pᵀ∇g_i(x) + ½pᵀ∇²g_i(x)p, i = 1, 2, …, m. Mangasarian, however, does not prove a weak duality theorem for (P) and (HD) and only gives a limited version of strong duality. In [11] Mond and Weir give conditions for which weak duality holds between (P) and (HD), prove strong duality for (P) and (HD) and consider other higher order duals to (P). We now give a higher order dual (HD′) to (P′) and conditions for which duality holds between (P′) and (HD′), and also consider other higher order duals to (P′).
(HD′)  max f(u) + h(u, p) − yᵀg(u) − yᵀk(u, p) − (u + p)ᵀ[∇_p h(u, p) − ∇_p yᵀk(u, p)]
       s.t. ∇_p h(u, p) ≥ ∇_p (yᵀk(u, p)),  (8)
            y ≥ 0
Theorem 3.1 (Weak duality) Let x be feasible for (P′) and (u, y, p) feasible for (HD′). If for all feasible (x, u, y, p)

f(x) − f(u) ≥ (x − u)ᵀ∇_p h(u, p) + h(u, p) − pᵀ(∇_p h(u, p))  (9)

and

g_i(x) − g_i(u) ≤ (x − u)ᵀ∇_p k_i(u, p) + k_i(u, p) − pᵀ(∇_p k_i(u, p)),  i = 1, 2, …, m,  (10)

then

inf(P′) ≥ sup(HD′).

Proof:

f(x) ≥ f(u) + (x − u)ᵀ∇_p h(u, p) + h(u, p) − pᵀ(∇_p h(u, p))
     = f(u) + xᵀ∇_p h(u, p) − (u + p)ᵀ∇_p h(u, p) + h(u, p)
     ≥ f(u) + xᵀ∇_p (yᵀk(u, p)) − (u + p)ᵀ∇_p h(u, p) + h(u, p)
     = f(u) + (x − u)ᵀ∇_p (yᵀk(u, p)) + uᵀ∇_p (yᵀk(u, p)) + h(u, p) − (u + p)ᵀ∇_p h(u, p)
     ≥ f(u) + yᵀg(x) − yᵀg(u) − yᵀk(u, p) + pᵀ∇_p (yᵀk(u, p)) + uᵀ∇_p (yᵀk(u, p)) + h(u, p) − (u + p)ᵀ∇_p h(u, p)
     = f(u) + yᵀg(x) − yᵀg(u) + h(u, p) − yᵀk(u, p) − (u + p)ᵀ[∇_p h(u, p) − ∇_p yᵀk(u, p)]
     ≥ f(u) − yᵀg(u) + h(u, p) − yᵀk(u, p) − (u + p)ᵀ[∇_p h(u, p) − ∇_p yᵀk(u, p)].

(The first inequality holds by assumption (9), the second holds since x ≥ 0 and ∇_p h(u, p) ≥ ∇_p yᵀk(u, p), the third holds by assumption (10), and the last holds since y ≥ 0, g(x) ≥ 0.) □
(The first inequality holds by assumption (9), the second inequality holds by x > 0, Vph(u,p) > V p j/ T fc(u,p), the third inequality holds by assumption (10). The last one holds since y > 0,g(x) > 0). D Theorem 3.2 (Strong duality) Let x° be a local or global optimal solution of (P') at which a constraint qualifica tion is satisfied and let h(x",0) = 0,*(»°,0) = 0, Vph(x',0)
= Vf(x°),Vpk(x°,0)
= Vg(x°)
(11)
then there exists y" 6 Rm such that (x°,y°,p = 0) is feasible for (HD1) and the corresponding values of (P1) and (HD') are equal. If also (9) and (10) are satisfied for all feasible (x,u,y,p), then x" and (x°,y",p = 0) are global optimal solutions for (P1) and (HD') respectively. Proof: Since a constraint qualification [7] is satisfied at x° by the necessary KuhnTucker conditions [4] there exists y" £ Rm such that V/(x°) > Vy°Tg(x°) z° [V/(x°) - Vy°Tg(x°)} = 0 y°Tg(x°) = 0, t / ° > 0 T
Thus, from (11), (x°,j/°,p = 0) is feasible for (HD') and the corresponding values of (P') and (HD') are equal. If (9) and (10) hold, then by Theorem 3.1, x° and (x°,y°,p = 0) must be global optimal solutions for (P') and (HD') respectively. D Remarks. If h(x,p) = p T V / ( x ) + | p T V 2 / ( x ) p then (9) becomes the second order convexity condition given by Mond [9] and Mahajan [5]. If fc;(x,p) = pTVg>;(x) + |p T V 2 <7,(x)p, then (10) becomes the second order concavity condition given in Mond [9] and Mahajan [5]. Also, conditions (11) are satisfied if h(x,p) = p r V / ( x ) + \pTV2f{x)p and k(x,p)
= pTV9i{x)
+ -pTV2gi(x)p,
i = 1,2,... ,m.
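The last claim of the remarks can be checked directly; nothing beyond the stated choice of h is assumed:

```latex
\text{For } h(x,p) = p^{T}\nabla f(x) + \tfrac{1}{2}\,p^{T}\nabla^{2} f(x)\,p:\qquad
h(x,0) = 0,\qquad
\nabla_{p} h(x,p) = \nabla f(x) + \nabla^{2} f(x)\,p
\;\Longrightarrow\; \nabla_{p} h(x,0) = \nabla f(x),
```

and identically k_i(x,0) = 0, ∇_p k_i(x,0) = ∇g_i(x), so all four requirements in (11) hold at any x.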
We now show that weak duality between (P') and (HD') holds under weaker convexity conditions than those given in Theorem 3.1.
Theorem 3.3 (Weak duality). Let x be feasible for (P') and (u,y,p) feasible for (HD'). If for all feasible (x,u,y,p) and v ∈ R^n,

(x − u)^T ∇_p{h(u,p) − y^T k(u,p) − p^T v} ≥ 0
⟹ f(x) − y^T g(x) − x^T v − (f(u) − y^T g(u) − u^T v) − (h(u,p) − y^T k(u,p)) + p^T[∇_p h(u,p) − ∇_p y^T k(u,p)] ≥ 0,   (12)

then inf(P') ≥ sup(HD').

Proof: Since (u,y,p) is feasible for (HD'), from (8) we let

v = ∇_p h(u,p) − ∇_p y^T k(u,p) ≥ 0.   (13)

Then, holding v fixed,

(x − u)^T ∇_p{h(u,p) − y^T k(u,p) − p^T v} = (x − u)^T{∇_p h(u,p) − ∇_p y^T k(u,p) − ∇_p(p^T v)} = (x − u)^T{∇_p h(u,p) − ∇_p y^T k(u,p) − v} = 0  (by (13)).

From (12) it follows that

f(x) − y^T g(x) − x^T v − (f(u) − y^T g(u) − u^T v) − (h(u,p) − y^T k(u,p)) + p^T[∇_p h(u,p) − ∇_p y^T k(u,p)] ≥ 0.

Since y ≥ 0, g(x) ≥ 0, x ≥ 0, (8) and (13), we have

f(x) ≥ f(u) − y^T g(u) + h(u,p) − y^T k(u,p) − (u + p)^T[∇_p h(u,p) − ∇_p y^T k(u,p)]. □
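The key step in the proof is that ∇_p(p^T v) = v once v is frozen at the value (13), so the gradient in the hypothesis of (12) vanishes. A small numerical sanity check of this identity on toy one-dimensional data (illustrative, not from the paper):

```python
# Finite-difference check: for fixed v,
# d/dp [h(u,p) - y k(u,p) - p v] = dh/dp - y dk/dp - v,
# which is zero when v := dh/dp - y dk/dp at the same point p.
def h(u, p): return 2.0 * u * p + p * p   # second order h for f(x) = x^2
def k(u, p): return p                     # second order k for g(x) = x - 1

def num_diff(phi, p, eps=1e-6):
    """Central finite difference of phi at p."""
    return (phi(p + eps) - phi(p - eps)) / (2.0 * eps)

u, y, p = 1.3, 0.7, 0.4
dh_dp = 2.0 * u + 2.0 * p
dk_dp = 1.0
v = dh_dp - y * dk_dp                     # the choice (13)

lhs = num_diff(lambda q: h(u, q) - y * k(u, q) - q * v, p)
assert abs(lhs - (dh_dp - y * dk_dp - v)) < 1e-6
assert abs(lhs) < 1e-6                    # the gradient vanishes at this v
```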
Remarks. If f satisfies (9) and g satisfies (10), then f − y^T g − v^T[·] satisfies (12). If h(u,p) = p^T ∇f(u) + ½p^T ∇²f(u)p and k_i(u,p) = p^T ∇g_i(u) + ½p^T ∇²g_i(u)p, i = 1,2,…,m, then (12) implies that f − y^T g − v^T[·] is second order pseudo-convex as defined by Mahajan [5]. Also, strong duality between (P') and (HD') still holds if conditions (9) and (10) are replaced by condition (12).

Other higher order duals to (P') are also possible. For example, under suitable conditions, the problem

(HD'')  max f(u) + h(u,p) − p^T ∇_p h(u,p)
        s.t. ∇_p h(u,p) − ∇_p(y^T k(u,p)) ≥ 0,   (14)
             y^T g(u) + u^T[∇_p h(u,p) − ∇_p y^T k(u,p)] + y^T k(u,p) − p^T ∇_p y^T k(u,p) ≤ 0,   (15)
             y ≥ 0

is a dual to (P').
Theorem 3.4 (Weak duality). Let x be feasible for (P') and (u,y,p) feasible for (HD''). If, for all feasible (x,u,y,p) and v ∈ R^n,

(x − u)^T ∇_p h(u,p) ≥ 0 ⟹ f(x) − f(u) − h(u,p) + p^T ∇_p h(u,p) ≥ 0   (16)

and

y^T g(x) + x^T v − y^T g(u) − u^T v − y^T k(u,p) + p^T ∇_p y^T k(u,p) ≥ 0 ⟹ (x − u)^T ∇_p{y^T k(u,p) + p^T v} ≥ 0,   (17)

then inf(P') ≥ sup(HD'').

Proof: Since x is feasible for (P') and (u,y,p) is feasible for (HD''), we let

v = ∇_p h(u,p) − ∇_p y^T k(u,p).

Then from (14), (15), y ≥ 0, g(x) ≥ 0 and x ≥ 0,

y^T g(x) + x^T v − y^T g(u) − u^T v − y^T k(u,p) + p^T ∇_p y^T k(u,p) ≥ 0.

From (17) it follows that (x − u)^T ∇_p{y^T k(u,p) + p^T v} ≥ 0, that is,

(x − u)^T{∇_p y^T k(u,p) + v} ≥ 0
⟹ (x − u)^T ∇_p h(u,p) ≥ 0   (by v = ∇_p h(u,p) − ∇_p y^T k(u,p))
⟹ f(x) − f(u) − h(u,p) + p^T ∇_p h(u,p) ≥ 0   (by (16)),

i.e., f(x) ≥ f(u) + h(u,p) − p^T ∇_p h(u,p), the objective of (HD''). □
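On the same hypothetical toy problem used earlier (min x² s.t. x − 1 ≥ 0, x ≥ 0, with the second order h and k), one can spot-check feasibility for (HD'') and the coincidence of values at the Kuhn–Tucker point; a minimal sketch:

```python
# Illustrative check of (HD'') on assumed toy data: f(x)=x^2, g(x)=x-1,
# h(u,p) = 2up + p^2, k(u,p) = p.
def hd2_objective(u, p):
    h = 2.0 * u * p + p * p
    dh_dp = 2.0 * u + 2.0 * p
    return u * u + h - p * dh_dp            # f(u) + h - p dh/dp

def hd2_feasible(u, y, p):
    k = p
    dh_dp, dk_dp = 2.0 * u + 2.0 * p, 1.0
    c14 = dh_dp - y * dk_dp >= 0.0                      # (14)
    c15 = (y * (u - 1.0) + u * (dh_dp - y * dk_dp)
           + y * k - p * y * dk_dp) <= 1e-12            # (15)
    return c14 and c15 and y >= 0.0

u, y, p = 1.0, 2.0, 0.0        # Kuhn-Tucker point of (P') with p = 0
assert hd2_feasible(u, y, p)
assert abs(hd2_objective(u, p) - 1.0) < 1e-12   # equals min of (P')
```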
Theorem 3.5 (Strong duality). Let x° be a local or global optimal solution of (P') at which a constraint qualification is satisfied, and let

h(x°,0) = 0,  k(x°,0) = 0,  ∇_p h(x°,0) = ∇f(x°),  ∇_p k(x°,0) = ∇g(x°).   (18)

Then there exists y° ∈ R^m such that (x°, y°, p = 0) is feasible for (HD'') and the corresponding values of (P') and (HD'') are equal. If also (16) and (17) are satisfied for all feasible (x,u,y,p), then x° and (x°, y°, p = 0) are global optimal solutions for (P') and (HD'') respectively.

Proof: Since a constraint qualification [7] is satisfied at x°, by the necessary Kuhn–Tucker conditions [4] there exists y° ∈ R^m such that

∇f(x°) ≥ ∇y°^T g(x°),  x°^T[∇f(x°) − ∇y°^T g(x°)] = 0,  y°^T g(x°) = 0,  y° ≥ 0.
Thus, from (18), (x°, y°, p = 0) is feasible for (HD'') and the corresponding values of (P') and (HD'') are equal. If (16) and (17) hold, then by Theorem 3.4, x° and (x°, y°, p = 0) must be global optimal solutions for (P') and (HD'') respectively. □

Remarks. If h(u,p) = p^T ∇f(u) + ½p^T ∇²f(u)p and k_i(u,p) = p^T ∇g_i(u) + ½p^T ∇²g_i(u)p, i = 1,2,…,m, then conditions (16) and (17) reduce to second order pseudo-convexity of f and second order quasi-concavity of y^T g + v^T[·], and the higher order dual (HD'') reduces to the second order dual (2D').

We now formulate a general higher order dual to (P'). Let I_α ⊂ M = {1,2,…,m} and J_α ⊂ N = {1,2,…,n}, α = 0,1,2,…,r, with

∪_{α=0}^{r} I_α = M,  I_α ∩ I_β = ∅ if α ≠ β,  and  ∪_{α=0}^{r} J_α = N,  J_α ∩ J_β = ∅ if α ≠ β.

Note that any particular I_α or J_α may be empty. Thus if M has r₁ disjoint subsets and N has r₂ disjoint subsets, r = max{r₁, r₂}; if r₁ > r₂, then J_α, α > r₂, is empty. Consider the problem (HDG)
max / ( « ) + h(u,p) - ^2(y,gi(u) <e/o
-pT[Vph(u,P) -
+ j/ifc,(u,p))
VPJ2VMU,P)}
- Yl «J[ V P%>P) ~ vp!/rfc(u,p)]j jeJo
s.t.
Vph(u,p)-
V„yTk(u,p)
E t e ( u ) + UiWu
>0
(19)
Yl VMU,P)
«iIV,fc(w, P) - V„yTA:(U, p)]j < 0 a = l,2,...,r,
T h e o r e m 3.6 (Weak duality).
(20) y>0
Let x be feasible for (P') and (u,y,p) feasible for (HDG). If, for all feasible (x,u,y,p) and v ∈ R^n,

(x − u)^T[∇_p h(u,p) − ∇_p Σ_{i∈I_0} y_i k_i(u,p) − ∇_p Σ_{j∈J_0} p_j v_j] ≥ 0
⟹ f(x) − Σ_{i∈I_0} y_i g_i(x) − Σ_{j∈J_0} x_j v_j − (f(u) − Σ_{i∈I_0} y_i g_i(u) − Σ_{j∈J_0} u_j v_j) − (h(u,p) − Σ_{i∈I_0} y_i k_i(u,p)) + p^T[∇_p h(u,p) − ∇_p Σ_{i∈I_0} y_i k_i(u,p)] ≥ 0   (21)

and

Σ_{i∈I_α} y_i g_i(x) + Σ_{j∈J_α} x_j v_j − Σ_{i∈I_α} y_i g_i(u) − Σ_{j∈J_α} u_j v_j − Σ_{i∈I_α} y_i k_i(u,p) + p^T ∇_p(Σ_{i∈I_α} y_i k_i(u,p)) ≥ 0
⟹ (x − u)^T ∇_p{Σ_{i∈I_α} y_i k_i(u,p) + Σ_{j∈J_α} p_j v_j} ≥ 0,  α = 1,2,…,r,   (22)
then inf(P') ≥ sup(HDG).

Proof: Since x is feasible for (P') and (u,y,p) is feasible for (HDG), if we let

v = ∇_p h(u,p) − ∇_p y^T k(u,p),   (23)

then

Σ_{i∈I_α} y_i g_i(x) + Σ_{j∈J_α} x_j v_j − (Σ_{i∈I_α} y_i g_i(u) + Σ_{j∈J_α} u_j v_j) − Σ_{i∈I_α} y_i k_i(u,p) + p^T ∇_p(Σ_{i∈I_α} y_i k_i(u,p)) ≥ 0,  α = 1,2,…,r.

Hence by (22),

(x − u)^T{∇_p(Σ_{i∈I_α} y_i k_i(u,p)) + ∇_p Σ_{j∈J_α} p_j v_j} ≥ 0,  α = 1,2,…,r.

Thus

(x − u)^T{∇_p Σ_{i∈M\I_0} y_i k_i(u,p) + ∇_p Σ_{j∈N\J_0} p_j v_j} ≥ 0
⟹ (x − u)^T{∇_p Σ_{i∈M\I_0} y_i k_i(u,p) + v − ∇_p Σ_{j∈J_0} p_j v_j} ≥ 0
⟹ (x − u)^T{∇_p h(u,p) − ∇_p Σ_{i∈I_0} y_i k_i(u,p) − ∇_p Σ_{j∈J_0} p_j v_j} ≥ 0   (by (23)).
From (21) it follows that

f(x) − Σ_{i∈I_0} y_i g_i(x) − Σ_{j∈J_0} x_j v_j − [f(u) − Σ_{i∈I_0} y_i g_i(u) − Σ_{j∈J_0} u_j v_j] − (h(u,p) − Σ_{i∈I_0} y_i k_i(u,p)) + p^T[∇_p h(u,p) − ∇_p Σ_{i∈I_0} y_i k_i(u,p)] ≥ 0.

Since x is feasible for (P') and (u,y,p) is feasible for (HDG), and (23) holds, we have

f(x) ≥ f(u) + h(u,p) − Σ_{i∈I_0}(y_i g_i(u) + y_i k_i(u,p)) − p^T[∇_p h(u,p) − ∇_p Σ_{i∈I_0} y_i k_i(u,p)] − Σ_{j∈J_0} u_j[∇_p h(u,p) − ∇_p y^T k(u,p)]_j. □
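The remarks below note that the choice I_0 = M, J_0 = N collapses (HDG) back to (HD'). A quick numerical check of that reduction, with hypothetical one-dimensional data and the second order choice of h and k:

```python
# Toy data (illustrative): f(x) = x^2, g(x) = x - 1, so m = n = 1,
# h(u,p) = 2up + p^2, k(u,p) = p.
def pieces(u, p):
    h = 2.0 * u * p + p * p
    k = p
    dh_dp, dk_dp = 2.0 * u + 2.0 * p, 1.0
    return h, k, dh_dp, dk_dp

def hdprime_obj(u, y, p):
    h, k, dh, dk = pieces(u, p)
    return u * u + h - y * (u - 1.0) - y * k - (u + p) * (dh - y * dk)

def hdg_obj_I0_full(u, y, p):
    # I_0 = M = {1}, J_0 = N = {1}: every index in the alpha = 0 group.
    h, k, dh, dk = pieces(u, p)
    return (u * u + h - (y * (u - 1.0) + y * k)
            - p * (dh - y * dk) - u * (dh - y * dk))

for (u, y, p) in [(1.0, 2.0, 0.0), (0.8, 1.5, 0.2), (2.0, 0.5, -0.3)]:
    assert abs(hdprime_obj(u, y, p) - hdg_obj_I0_full(u, y, p)) < 1e-12
```

The two objectives agree identically because −(u + p)^T[·] distributes into the separate −p^T[·] and −Σ u_j[·]_j terms of (HDG).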
Theorem 3.7 (Strong duality) Let x° be a local or global optimal solution of (P') at which a constraint qualification is satisfied, and let conditions (11) be satisfied. Then there exists y° ∈ R^m such that (x°, y°, p = 0) is feasible for (HDG) and the corresponding values of (P') and (HDG) are equal. If also (21) and (22) are satisfied for all feasible (x,u,y,p), then x° and (x°, y°, p = 0) are global optimal solutions for (P') and (HDG) respectively.

Proof: Since a constraint qualification [7] is satisfied at x°, by the necessary Kuhn–Tucker conditions [4] there exists y° ∈ R^m such that

∇y°^T g(x°) ≤ ∇f(x°),  x°^T[∇f(x°) − ∇y°^T g(x°)] = 0,  y°^T g(x°) = 0,  y° ≥ 0.

Thus, from (11), (x°, y°, p = 0) is feasible for (HDG) and the values of (P') and (HDG) are equal. If (21) and (22) are satisfied, then, by Theorem 3.6, x° and (x°, y°, p = 0) are global optimal solutions for (P') and (HDG) respectively. □

Remarks. If I_0 = M, J_0 = N, then (HDG) becomes the higher order dual (HD'); in addition, conditions (21) and (22) become condition (12). If I_0 = ∅, I_1 = M, J_0 = ∅, J_1 = N, then (HDG) becomes the higher order dual (HD''). In general, conditions (21) and (22) become second order pseudo-convexity of f − Σ_{i∈I_0} y_i g_i − Σ_{j∈J_0} v_j[·]_j and second order quasi-concavity of Σ_{i∈I_α} y_i g_i + Σ_{j∈J_α} v_j[·]_j, α = 1,2,…,r, respectively.

We now give a Mangasarian type [7] strict converse duality theorem for the higher order dual (HDG) to (P').

Theorem 3.8 (Converse duality) Let x° be an optimal solution of (P') and let a constraint qualification be satisfied at x°. Let condition (11) be satisfied at x° and let conditions (21) and (22) be satisfied for all feasible (x,u,y,p). If (x*, y*, p*) is an optimal solution of (HDG) and if, for all x ≠ x* and v ∈ R^n,

(x − x*)^T[∇_p h(x*,p*) − ∇_p(Σ_{i∈I_0} y_i^* k_i(x*,p*)) − ∇_p(Σ_{j∈J_0} p_j^* v_j)] ≥ 0
⟹ f(x) − Σ_{i∈I_0} y_i^* g_i(x) − Σ_{j∈J_0} x_j v_j − (f(x*) − Σ_{i∈I_0} y_i^* g_i(x*) − Σ_{j∈J_0} x_j^* v_j) − (h(x*,p*) − Σ_{i∈I_0} y_i^* k_i(x*,p*)) + p*^T[∇_p h(x*,p*) − ∇_p Σ_{i∈I_0} y_i^* k_i(x*,p*)] > 0,   (24)

then x* = x°, i.e., x* solves (P'), and

f(x°) = f(x*) + h(x*,p*) − Σ_{i∈I_0}[y_i^* g_i(x*) + y_i^* k_i(x*,p*)] − p*^T[∇_p h(x*,p*) − ∇_p Σ_{i∈I_0} y_i^* k_i(x*,p*)] − Σ_{j∈J_0} x_j^*[∇_p h(x*,p*) − ∇_p y*^T k(x*,p*)]_j.
Proof: We assume x° ≠ x* and exhibit a contradiction. Since x° is a solution of (P') and a constraint qualification is satisfied at x°, it follows by strong duality that there exists y° ∈ R^m, p = 0, such that (x°, y°, p = 0) solves (HDG). Hence

f(x°) = f(x°) − Σ_{i∈I_0} y_i° g_i(x°) − Σ_{j∈J_0} x_j°[∇f(x°) − ∇y°^T g(x°)]_j   (by (11))
      = f(x*) + h(x*,p*) − Σ_{i∈I_0}[y_i^* g_i(x*) + y_i^* k_i(x*,p*)] − p*^T[∇_p h(x*,p*) − ∇_p Σ_{i∈I_0} y_i^* k_i(x*,p*)] − Σ_{j∈J_0} x_j^*[∇_p h(x*,p*) − ∇_p y*^T k(x*,p*)]_j.   (25)
Also, since (x*, y*, p*) is feasible for (HDG), from (19) if we let v = ∇_p h(x*,p*) − ∇_p y*^T k(x*,p*), then we have, for α = 1,2,…,r,

Σ_{i∈I_α} y_i^* g_i(x°) + Σ_{j∈J_α} x_j° v_j − (Σ_{i∈I_α} y_i^* g_i(x*) + Σ_{j∈J_α} x_j^* v_j) − Σ_{i∈I_α} y_i^* k_i(x*,p*) + p*^T ∇_p(Σ_{i∈I_α} y_i^* k_i(x*,p*)) ≥ 0,

and so by (22),

(x° − x*)^T ∇_p{Σ_{i∈I_α} y_i^* k_i(x*,p*) + Σ_{j∈J_α} p_j^* v_j} ≥ 0,  α = 1,2,…,r.

Hence

(x° − x*)^T{∇_p Σ_{i∈M\I_0} y_i^* k_i(x*,p*) + ∇_p Σ_{j∈N\J_0} p_j^* v_j} ≥ 0
⟹ (x° − x*)^T{∇_p Σ_{i∈M\I_0} y_i^* k_i(x*,p*) + v − ∇_p Σ_{j∈J_0} p_j^* v_j} ≥ 0.

Since v = ∇_p h(x*,p*) − ∇_p y*^T k(x*,p*), we have

(x° − x*)^T{∇_p h(x*,p*) − ∇_p Σ_{i∈I_0} y_i^* k_i(x*,p*) − ∇_p Σ_{j∈J_0} p_j^* v_j} ≥ 0,

and by (24) it follows that

f(x°) − Σ_{i∈I_0} y_i^* g_i(x°) − Σ_{j∈J_0} x_j° v_j − [f(x*) − Σ_{i∈I_0} y_i^* g_i(x*) − Σ_{j∈J_0} x_j^* v_j] − (h(x*,p*) − Σ_{i∈I_0} y_i^* k_i(x*,p*)) + p*^T[∇_p h(x*,p*) − ∇_p Σ_{i∈I_0} y_i^* k_i(x*,p*)] > 0.   (26)

But from (25) and (26) and v = ∇_p h(x*,p*) − ∇_p y*^T k(x*,p*) we get

Σ_{i∈I_0} y_i^* g_i(x°) + Σ_{j∈J_0} x_j°[∇_p h(x*,p*) − ∇_p y*^T k(x*,p*)]_j < 0,

which is a contradiction since x° is feasible for (P') and (x*, y*, p*) is feasible for (HDG). □

A number of papers containing second order duals for programming problems with non-differentiable functions can be found in the literature; see, e.g., Bector and Chandra [2,3]. There too the problems considered do not require that the variables be non-negative. The method and results given here are applicable to these non-smooth problems as well and will be further discussed in a subsequent paper.

In [13] Qi considered LC¹ optimization problems, i.e., problems where the functions and constraints are differentiable and the derivatives are locally Lipschitzian. He showed that many results proved for problems with second order differentiable functions actually hold for LC¹ problems. The possibility of establishing higher order duality results for LC¹ optimization problems will subsequently be considered.
Generalized Convexity and Higher Order Duality
243
References

[1] C. R. Bector, M. K. Bector and J. E. Klassen, Duality for a nonlinear programming problem, Utilitas Mathematica 11 (1977) 87-99.
[2] C. R. Bector and S. Chandra, First and second order duality for a class of nondifferentiable programming problems, J. Inf. Opt. Sci. 7 (1986) 335-348.
[3] C. R. Bector and S. Chandra, Second order duality with nondifferentiable functions, in manuscript.
[4] H. W. Kuhn and A. W. Tucker, Nonlinear programming, Proceedings of the 2nd Berkeley Symposium on Mathematical Statistics and Probability, University of California Press (1951) 481-492.
[5] D. G. Mahajan, Contributions to optimality conditions and duality theory in nonlinear programming, PhD thesis, Indian Institute of Technology, Bombay, India (1977).
[6] D. G. Mahajan and M. N. Vartak, Generalization of some duality theorems in nonlinear programming, Mathematical Programming 12 (1977) 293-317.
[7] O. L. Mangasarian, Nonlinear Programming, McGraw-Hill, New York (1969).
[8] O. L. Mangasarian, Second and higher-order duality in nonlinear programming, Journal of Mathematical Analysis and Applications 51 (1975) 607-620.
[9] B. Mond, Second order duality for nonlinear programming, Opsearch, Journal of the Operational Research Society of India 11 (1974) 90-99.
[10] B. Mond and T. Weir, Generalized concavity and duality, in S. Schaible and W. T. Ziemba (eds.), Generalized Concavity in Optimization and Economics, Academic Press, New York (1981) 263-279.
[11] B. Mond and T. Weir, Generalized convexity and higher order duality, Journal of Mathematical Sciences 16-18 (1981-1983) 74-94.
[12] B. Mond and J. Zhang, Generalized convexity and duality of the nonlinear programming with non-negative variable, Proceedings of a Symposium on Optimization, Ballarat, Australia, 14 July 1994, to appear.
[13] L. Qi, Superlinearly convergent approximate Newton methods for LC¹ optimization problems, Mathematical Programming 64 (1994) 277-294.
[14] P. Wolfe, A duality theorem for nonlinear programming, Quarterly of Applied Mathematics 18 (1961) 239-244.
244
W. Oettli and P. H. Sach
Recent Advances in Nonsmooth Optimization, pp. 244-260 Eds. D.-Z. Du, L. Qi and R.S. Womersley ©1995 World Scientific Publishing Co Pte Ltd
Prederivatives and Second Order Conditions for Infinite Optimization Problems

Werner Oettli
Universität Mannheim, 68131 Mannheim, Germany

Pham Huu Sach
Institute of Mathematics, Box 631 Bo Ho, 10000 Hanoi, Vietnam
Abstract
This paper deals with the problem of minimizing a supremum function over a subset C of a topological vector space X. Second order necessary and sufficient optimality conditions are written in terms of some approximations of the data of the problem.
1
Introduction
In what follows we consider the infinite optimization problem

(P)  min{ f(x) := sup_{t∈T} f_t(x) : x ∈ C }.

Here T ≠ ∅ is a topological space, C ≠ ∅ is a subset of some real topological vector space X, say, and f_t : C → R for all t ∈ T. We shall derive necessary second order conditions for a local minimum of (P), and sufficient second order conditions for a strict local minimum of (P). We do not work with (explicitly defined) derivatives, but instead with prederivatives, i.e., with approximations having specified properties. It will be shown that various forms of derivatives fit into our general model. If the underlying space X is finite-dimensional, then problem (P) becomes an instance of what is commonly called a semi-infinite programming problem. Various
Prederivatives and Second Order Conditions
245
forms of second order conditions for semi-infinite programming problems, mostly for the case C = X, have been given in [2, 7, 9, 12, 13, 14, 19, 20, 24]. Here X will be of arbitrary dimension, and C will be a proper subset of X. The reader who is interested in the general theory of higher order optimality conditions for abstract mathematical programming problems is referred to [1, 3, 8, 11, 15, 17, 19]. Amongst recent approaches which would be useful for inf-sup problems we mention [22] where the notion of epi-derivative [23] is used as the main tool for deriving second order optimality conditions.
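For a finite index set T, the sup-objective of (P) and the active index set (the set T_0 introduced in (2.1) below) can be computed directly; a minimal illustration with made-up data:

```python
# Hypothetical finite family {f_t} over T = {"a", "b", "c"}:
f_t = {
    "a": lambda x: x - 1.0,
    "b": lambda x: -x,
    "c": lambda x: 0.5 * x - 1.0,
}

def f(x):
    """Objective of (P): the pointwise supremum (here: maximum)."""
    return max(ft(x) for ft in f_t.values())

def active_set(x, tol=1e-12):
    """Indices attaining the supremum at x (the active set)."""
    fx = f(x)
    return {t for t, ft in f_t.items() if ft(x) >= fx - tol}

x0 = 0.5
assert abs(f(x0) - (-0.5)) < 1e-12    # both a and b give -0.5, c gives -0.75
assert active_set(x0) == {"a", "b"}
```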
2
Second Order Necessary Conditions
In this Section we derive necessary conditions for a local minimum of problem (P). We assume that T is a compact space, X is a real topological vector space, C ⊂ X is arbitrary, and f_t : C → R for all t ∈ T. We say that x_0 ∈ C is a local minimum of (P) iff f(x_0) is finite and there is a neighborhood V of x_0 in X such that f(x_0) ≤ f(x) whenever x ∈ C ∩ V. For fixed x we denote the function t ↦ f_t(x) by f_{(·)}(x). Let x_0 ∈ C be fixed such that f(x_0) is finite. Without loss of generality we suppose that f(x_0) = 0. We assume that the function f_{(·)}(x_0) is upper semicontinuous on T, so that the set

T_0 := {t ∈ T : f_t(x_0) = f(x_0)} = {t ∈ T : f_t(x_0) ≥ 0}   (2.1)

is compact and nonempty.

The approximations to be used in this Section are collected in the following Assumption 2.1, which remains in force throughout this Section.

Assumption 2.1
(i) Let H ⊂ X be a convex cone, and D ⊂ H a nonempty subset.
(ii) For all x ∈ H and d ∈ D let there exist o_1(·) : R_+ → X such that, for all ε > 0 sufficiently small,

x_ε := x_0 + εd + ε²x + o_1(ε²) ∈ C,   (2.2)

where o_1(·) is subject to the condition that

lim_{ε↓0} o_1(ε)/ε = 0.   (2.3)

(iii) For all t ∈ T, let f_t¹ : cl H → R and f_t² : D → R have the following properties: (a) f_t¹(·) is lower semicontinuous, convex, and positively homogeneous of degree 1; (b) the functions f_{(·)}¹(x) and f_{(·)}²(d) are upper semicontinuous on T for all x ∈ cl H and d ∈ D respectively; (c) for all x ∈ H and d ∈ D there exists o_2(·) : R_+ → R such that, for all ε > 0 sufficiently small,

f_t(x_ε) − f_t(x_0) ≤ ε f_t¹(d) + ε²(f_t¹(x) + f_t²(d)) + o_2(ε²)   ∀t ∈ T,   (2.4)

where x_ε is given by (2.2) and lim_{ε↓0} o_2(ε)/ε = 0.
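With the classical smooth choices f_t¹(x) := f_t'(x_0)x and f_t²(d) := ½⟨f_t''(x_0)d, d⟩ (used in the examples of Section 4), inequality (2.4) is just a second order Taylor estimate. A numerical sanity check in one dimension, with illustrative data:

```python
# Hypothetical single smooth f_t = exp, x0 = 0 (so f'(0) = f''(0) = 1),
# directions d and x chosen arbitrarily, o_1 = 0 in (2.2).
import math

f = math.exp
x0, d, x = 0.0, 1.0, 0.5

def remainder(eps):
    x_eps = x0 + eps * d + eps**2 * x          # (2.2) with o_1 = 0
    first = eps * d                            # eps * f^1(d) = eps * f'(0) d
    second = eps**2 * (x + 0.5 * d * d)        # eps^2 (f^1(x) + f^2(d))
    return f(x_eps) - f(x0) - first - second   # should be o(eps^2)

r1 = abs(remainder(1e-2)) / 1e-4
r2 = abs(remainder(1e-3)) / 1e-6
assert r2 < r1 < 0.05      # remainder / eps^2 shrinks as eps -> 0
```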
Remark 2.2 If 0 ∈ D and if f_t²(0) = 0, then by setting d := 0 in (2.4) and substituting ε for ε² we obtain the following relation between f_t and f_t¹:

f_t(x_0 + εx + o_1(ε)) − f_t(x_0) ≤ ε f_t¹(x) + o_2(ε)   ∀t ∈ T.

Definition 2.3 A direction d ∈ X is called critical iff d ∈ D and, for some δ > 0,

f_t(x_0) + δ f_t¹(d) ≤ 0   ∀t ∈ T.   (2.5)

Remark 2.4 Since f_t(x_0) ≤ 0 for all t ∈ T, (2.5) implies that

f_t(x_0) + ε f_t¹(d) ≤ 0   ∀t ∈ T, ∀ε ∈ [0,δ].   (2.6)

In the sequel we shall need the following result.

Lemma 2.5 Assume that x_0 ∈ C is a local minimum of (P). Then for every critical direction d we have

max_{t∈T_0} (f_t¹(x) + f_t²(d)) ≥ 0   ∀x ∈ H.   (2.7)
Proof. Fix d ∈ X, a critical direction. Set q_t(x) := f_t¹(x) + f_t²(d). Assume, for contradiction, that (2.7) is false. Then there exist x̄ ∈ H and γ > 0 such that q_t(x̄) ≤ −γ for all t ∈ T_0. Then the set T_1 := {t ∈ T : q_t(x̄) < −γ/2} is open and contains T_0. Hence T_2 := T \ T_1 is compact and disjoint from T_0. Therefore k_0 := max_{t∈T_2} f_t(x_0) < 0. Let k_1 := max_{t∈T_2} f_t¹(d), k_2 := max_{t∈T_2} q_t(x̄). We set x_ε := x_0 + εd + ε²x̄ + o_1(ε²) and obtain from (2.4) that

f_t(x_ε) ≤ f_t(x_0) + ε f_t¹(d) + ε² q_t(x̄) + o_2(ε²).   (2.8)

For t ∈ T_1 it follows from (2.6) and (2.8) that, for all ε ∈ [0,δ],

f_t(x_ε) ≤ ε² q_t(x̄) + o_2(ε²) ≤ −ε²γ/2 + o_2(ε²).

For t ∈ T_2 it follows from (2.8) that

f_t(x_ε) ≤ k_0 + ε k_1 + ε² k_2 + o_2(ε²).

Hence, for all ε > 0 sufficiently small, f(x_ε) = max_{t∈T} f_t(x_ε) < 0 = f(x_0) with x_ε ∈ C. This is impossible, since x_0 is a local minimum of (P).

The next lemma can be read off from [16, Theorem 1] or [4, p. 99-100]. For the sake of completeness we include its proof.

Lemma 2.6 Let H ⊂ R^n be a nonempty, closed, convex set. Let T̄ be a nonempty compact set. Let φ : T̄ × H → R be a function such that φ(t,·) is convex and lower semicontinuous on H for all t ∈ T̄, and φ(·,x) is upper semicontinuous on T̄ for all
This is impossible, since Xo is a local minimum of (P). The next lemma can be read off from [16, Theorem 1] or [4, p.99-100]. For the sake of completeness we include its proof. Lemma 2.6 Let H C iRn be a nonempty, closed, convex set. Let T be a nonempty compact set. Let ip : T x H —> M be a function such that up{t, •) is convex and lower semicontinuous on H for all t E T, and ip(-,x) is upper semicontinuous on T for all
xeH.
if maxip{t,x)>0
Vx E H,
(2.9)
tef then there exists a finite subset T C T such that m&xu>(t,x) > 0 V i e / / . Proof. For simplicity we assume that T has at least n + 1 elements. Let xo E H and Hp := {x E H : ||i — x 0 || < p} (p > 0). Assume that (2.9) is true. Then for every fixed e > 0, p > 0 the family of sets H(t) (t E f) with H{t):=
{xeHp
:
-c}
has empty intersection. Since the sets H(t) C JR." are convex and compact for all t E T and have empty intersection it follows from Helly's Theorem [21, Corollary 21.3.2] that there is a subfamily of n + 1 sets having empty intersection, i.e., there is {tut2,...,tn+1) E fn+1 such that max
i^(ti,x) > — e
Vx E H„.
i=l,2,...,n+l
Hence the sets F(£,p):={(h,t2,...,tn+1)efn+i
:
max
*(*,•,x) > -e
Vx E ^ }
%=l,^,...,n+l
are nonempty for all e > 0, p > 0. This implies at the same time that any finite collection of the sets F(e,p) has nonempty intersection, since f]F(ei,pi)
D
F(mme„ma.xpi).
The sets F(ε,ρ) being closed subsets of the compact set T̄^{n+1}, the collection of all F(ε,ρ) with ε > 0, ρ > 0 has nonempty intersection. Then from

(t̄_1, t̄_2, …, t̄_{n+1}) ∈ ∩_{ε>0, ρ>0} F(ε,ρ)

follows max_{i=1,2,…,n+1} φ(t̄_i, x) ≥ 0 ∀x ∈ H. Hence, setting T := {t̄_1, t̄_2, …, t̄_{n+1}}, we obtain the desired conclusion.

For the next lemma we have to introduce some notation. Let T̄ ≠ ∅ be a compact set. Let C(T̄) be the Banach space of all continuous functions F : T̄ → R with the norm ‖F‖ := max_{t∈T̄} |F(t)|. We denote by C*(T̄) the continuous dual of C(T̄). For Λ ∈ C*(T̄) we define Λ ≥ 0 :⟺ ⟨Λ, F⟩ ≥ 0 for all F ∈ C(T̄) such that F(t) ≥ 0 on T̄.
Lemma 2.7 Let H ≠ ∅ be a convex set. Let T̄ ≠ ∅ be a compact set. Let φ : T̄ × H → R be a function such that φ(t,·) is convex on H for all t ∈ T̄, and φ(·,x) is continuous on T̄ for all x ∈ H. If

max_{t∈T̄} φ(t,x) ≥ 0   ∀x ∈ H,   (2.10)

then there exists Λ ∈ C*(T̄) such that Λ ≥ 0, Λ ≠ 0, and ⟨Λ, φ(·,x)⟩ ≥ 0 ∀x ∈ H.

Proof. By assumption, for every fixed x ∈ H, φ(·,x) ∈ C(T̄). Let

Q_1 := {F ∈ C(T̄) : F(t) < 0 ∀t ∈ T̄},
Q_2 := {F ∈ C(T̄) : there exists x ∈ H such that F(t) ≥ φ(t,x) ∀t ∈ T̄}.

The sets Q_1 and Q_2 are nonempty and convex, Q_1 is open, and by (2.10) they are disjoint. Hence they can be separated: there exists Λ ∈ C*(T̄), Λ ≠ 0, such that ⟨Λ, F⟩ ≤ 0 on Q_1 and ⟨Λ, F⟩ ≥ 0 on Q_2. The first inequality gives Λ ≥ 0. Applying the second inequality to the functions φ(·,x) ∈ Q_2 we obtain the claimed result.

Remark 2.8 If the set T̄ is finite, then the conclusion of Lemma 2.7 gives the existence of real numbers λ_t ≥ 0 (t ∈ T̄), not all zero, such that

Σ_{t∈T̄} λ_t φ(t,x) ≥ 0   ∀x ∈ H.   (2.11)

Without loss of generality we may assume that Σ_{t∈T̄} λ_t = 1.

As a consequence of Lemmas 2.5-2.7 we obtain
Theorem 2.9 Assume that x_0 is a local minimum of (P). Then for every critical direction d and every finite-dimensional linear subspace S ⊂ X there are a finite subset T ⊂ T_0 and real numbers λ_t ≥ 0 (t ∈ T) with Σ_{t∈T} λ_t = 1 such that

L¹(x) ≥ 0   ∀x ∈ H ∩ S,   (2.12)
L²(d) ≥ 0,   (2.13)

where

L¹(x) := Σ_{t∈T} λ_t f_t¹(x),  L²(d) := Σ_{t∈T} λ_t f_t²(d).   (2.14)

Proof. We fix d ∈ X, a critical direction, and S ⊂ X, a finite-dimensional subspace. For x ∈ cl H and t ∈ T_0 let φ(t,x) := f_t¹(x) + f_t²(d). From Lemma 2.5 it follows that

max_{t∈T_0} φ(t,x) ≥ 0   ∀x ∈ H ∩ S.

Since S is of finite dimension and max_{t∈T_0} φ(t,·) is convex, this implies that max_{t∈T_0} φ(t,x) ≥ 0 ∀x ∈ H̄, where H̄ denotes the closure of H ∩ S in S. We apply Lemma 2.6 and obtain a finite subset T ⊂ T_0 such that max_{t∈T} φ(t,x) ≥ 0 ∀x ∈ H̄. Now we apply Lemma 2.7, where we substitute T̄ := T. From Lemma 2.7 and Remark 2.8 we obtain real numbers λ_t ≥ 0 (t ∈ T) with Σ_{t∈T} λ_t = 1 such that

Σ_{t∈T} λ_t φ(t,x) ≥ 0   ∀x ∈ H̄.

This implies by (2.14) that

L¹(x) + L²(d) ≥ 0   ∀x ∈ H ∩ S.   (2.15)

From (2.15) follows (2.13), since L¹(0) = 0, and (2.12), since L¹(·) is positively homogeneous of degree 1. This completes the proof.

For the next result we need the following additional requirement, where d ∈ X is a critical direction and S ⊂ X is a linear subspace:

If max_{t∈T_0} q_t(x) ≥ 0 ∀x ∈ H ∩ S, then max_{t∈T_0} q_t(x) ≥ 0 ∀x ∈ H̄,
where q_t(x) := f_t¹(x) + f_t²(d), and H̄ is the closure of H ∩ S in S.   (2.16)
Obviously, (2.16) is satisfied if any of the following conditions holds: (a) H ∩ S is closed in S; (b) H ∩ S has nonempty interior in S; (c) max_{t∈T_0} q_t(·) is upper semicontinuous on cl H.
Theorem 2.10 Let x_0 be a local minimum of (P). Let d ∈ X be a critical direction. Let S ⊂ X be a reflexive Banach space whose norm topology is stronger than the topology inherited from X. Let (2.16) hold. Then for every ε > 0 there are a finite subset T ⊂ T_0 and real numbers λ_t ≥ 0 (t ∈ T) with Σ_{t∈T} λ_t = 1 such that

L¹(x) ≥ −ε‖x‖   ∀x ∈ H ∩ S,   (2.17)
L²(d) ≥ −ε,   (2.18)

where L^i(·) for i = 1,2 are defined by (2.14).

Proof. Let ε > 0, ρ > 0. Let q_t(·) and H̄ be as in condition (2.16). Let H̄_ρ := {x ∈ H̄ : ‖x‖ ≤ ρ}. From Lemma 2.5 and condition (2.16) follows max_{t∈T_0} q_t(x) ≥ 0 ∀x ∈ H̄_ρ. Therefore the family of sets K(t) := {x ∈ H̄_ρ : q_t(x) ≤ −ε}, where t ∈ T_0, has empty intersection. The set H̄_ρ is weakly compact, since S is reflexive, and the sets K(t) are convex and closed, hence weakly closed. So there exists a finite subset T ⊂ T_0 such that ∩_{t∈T} K(t) = ∅. So max_{t∈T} q_t(x) ≥ −ε ∀x ∈ H̄_ρ. We apply Lemma 2.7 and Remark 2.8 with H := H̄_ρ and T̄ := T, and obtain multipliers λ_t ≥ 0 (t ∈ T) with Σ_{t∈T} λ_t = 1 for which (2.17) and (2.18) hold.

Theorem 2.11 Assume that x_0 is a local minimum of (P) and that the functions f_{(·)}¹(x) and f_{(·)}²(d) are continuous on T_0 for all x ∈ H and d ∈ D. Then for every critical direction d there exists Λ ∈ C*(T_0) with Λ ≥ 0, Λ ≠ 0, such that

L¹(x) ≥ 0   ∀x ∈ H,   (2.20)
L²(d) ≥ 0,   (2.21)

where

L¹(x) := ⟨Λ, f_{(·)}¹(x)⟩,  L²(d) := ⟨Λ, f_{(·)}²(d)⟩.   (2.22)

Proof. We concatenate Lemma 2.5 and Lemma 2.7, where we substitute H := H, T̄ := T_0, φ(t,x) := f_t¹(x) + f_t²(d).
3
Second Order Sufficient Conditions
In this Section we derive sufficient conditions for a strict local minimum of problem (P). We say that x_0 ∈ C is a strict local minimum of problem (P) iff there is a neighborhood V of x_0 in X such that f(x_0) < f(x) whenever x ∈ C ∩ V, x ≠ x_0. We assume that X is a normed space; C ⊂ X and the topological space T are arbitrary. We fix x_0 ∈ C, and as in the previous Section we assume that f(x_0) = 0 and that T_0 is given by (2.1).

The approximations to be used in this Section are collected in the following Assumption 3.1, which remains valid throughout this Section.

Assumption 3.1
(i) Let H ⊂ X be a cone.
(ii) Let h : C → H have the following property: for all x ∈ C,

h(x) = x − x_0 + o(x − x_0),   (3.1)

where o(·) : X → X satisfies lim_{ξ→0} o(ξ)/‖ξ‖ = 0.
(iii) For all t ∈ T_0, let f_t^i : cl H → R (i = 1,2) have the following properties: (a) f_t¹(·) is positively homogeneous of degree 1; (b) f_t²(·) is positively homogeneous of degree 2; (c) there exist o_t¹(·), o_t²(·) : H → R such that, for all x ∈ C,

f_t(x) − f_t(x_0) ≥ f_t¹(h(x)) + o_t¹(h(x)),   (3.2)
f_t(x) − f_t(x_0) ≥ f_t¹(h(x)) + f_t²(h(x)) + o_t²(h(x)),   (3.3)

where h(·) is as in (3.1), and lim_{h→0} o_t¹(h)/‖h‖ = 0, lim_{h→0} o_t²(h)/‖h‖² = 0.
We recall that Assumption 3.1 (ii) was used in [18] as a main tool to derive sufficiency results.

Remark 3.2 If f_t²(·) is lower semicontinuous at 0, then (3.2) is a consequence of (3.3). Indeed, from f_t²(0) = 0 and the lower semicontinuity follows the existence of δ > 0 such that f_t²(h) ≥ −1 for all h ∈ H with ‖h‖ ≤ δ, hence by homogeneity f_t²(h) ≥ −δ^{−2}‖h‖² for all h ∈ H. Therefore for all h ∈ H we have

f_t²(h) + o_t²(h) ≥ −δ^{−2}‖h‖² + o_t²(h) =: o_t¹(h).

This shows that (3.3) implies (3.2).

Definition 3.3 A sequence {d_k} ⊂ X is called weakly critical iff d_k ∈ H, ‖d_k‖ = 1 for all k, and

limsup_{k→∞} f_t¹(d_k) ≤ 0   ∀t ∈ T_0.   (3.4)

A direction d ∈ X is called weakly critical iff d ∈ cl H, ‖d‖ = 1, and

f_t¹(d) ≤ 0   ∀t ∈ T_0.
Theorem 3.4 Assume that, for every weakly critical sequence {d_k}, there are a finite subset T ⊂ T_0 and real numbers λ_t ≥ 0 (t ∈ T) such that

L¹(x) ≥ 0   ∀x ∈ H,   (3.5)
limsup_{k→∞} L²(d_k) > 0,   (3.6)

where L^i(x) := Σ_{t∈T} λ_t f_t^i(x) (i = 1,2). Then x_0 is a strict local minimum of (P).

Proof. Assume, for contradiction, that x_0 is not a strict local minimum of (P). Then there is a sequence {x_k} ⊂ C such that q_k := x_k − x_0 → 0, q_k ≠ 0 and 0 ≥ f(x_k) − f(x_0) for all k. The last inequality implies for all k that

0 ≥ f_t(x_k) − f_t(x_0)   ∀t ∈ T_0.   (3.7)

Let h be the map appearing in (3.1). Putting h_k := h(x_k), it follows from (3.1) that

(h_k − q_k)/‖q_k‖ → 0.

This implies, since q_k → 0, that h_k − q_k → 0, and therefore h_k → 0. Hence o_t¹(h_k)/‖h_k‖ → 0, o_t²(h_k)/‖h_k‖² → 0. Combining (3.7) and (3.2) we obtain

0 ≥ f_t¹(h_k) + o_t¹(h_k)   ∀t ∈ T_0,   (3.8)

or, equivalently,

0 ≥ f_t¹(d_k) + o_t¹(h_k)/‖h_k‖   ∀t ∈ T_0,   (3.9)

where d_k := h_k/‖h_k‖ ∈ H. By letting k → ∞ in (3.9) we get (3.4). This shows that {d_k} is a weakly critical sequence. By assumption there are a finite subset T ⊂ T_0 and λ_t ≥ 0 (t ∈ T) satisfying (3.5) and (3.6). Combining (3.7) and (3.3) we obtain for all k that

0 ≥ f_t¹(h_k) + f_t²(h_k) + o_t²(h_k)   ∀t ∈ T.   (3.10)

Multiplying (3.10) by λ_t and summing up we obtain

0 ≥ L¹(h_k) + L²(h_k) + ō(h_k),   (3.11)

where ō(h_k)/‖h_k‖² → 0. Dividing (3.11) by ‖h_k‖² and taking account of (3.5) we find

0 ≥ L²(d_k) + ō(h_k)/‖h_k‖²,

which implies

0 ≥ limsup_{k→∞} L²(d_k),   (3.12)
contradicting (3.6). The theorem is thus proved.
Remark 3.5 If o_t¹(·) in (3.2) is independent of t, then the assumption in Theorem 3.4 can be replaced by the following assumption: There are δ > 0, γ > 0, a finite subset T ⊂ T_0, and λ_t ≥ 0 (t ∈ T) such that (3.5) holds and L²(d) ≥ γ for all d ∈ H_δ, where H_δ := {d ∈ H : ‖d‖ = 1, f_t¹(d) ≤ δ ∀t ∈ T_0}. Indeed, let the sequence {d_k} be as in the proof of Theorem 3.4. Then from (3.9) follows, for all k sufficiently large, that d_k ∈ H_δ, hence L²(d_k) ≥ γ, which implies limsup_{k→∞} L²(d_k) > 0.

Remark 3.6 If X is finite-dimensional and the functions f_t^i(·) (i = 1,2) are lower semicontinuous, then the following condition is sufficient for the assumption of Theorem 3.4 to be satisfied: For every weakly critical direction d there are a finite subset T ⊂ T_0 and λ_t ≥ 0 (t ∈ T) such that (3.5) holds and L²(d) > 0. Indeed, take an arbitrary weakly critical sequence {d_k}. By the compactness of the unit sphere in X and the lower semicontinuity of f_t¹(·) we may suppose that d_k converges to some weakly critical direction d. From L²(d) > 0 and the lower semicontinuity of f_t²(·) follows (3.6).

Theorem 3.7 Let T_0 be compact, let f_{(·)}^i(x) (i = 1,2) be continuous on T_0, and let o²(·) in (3.3) be independent of t. Assume that for every weakly critical sequence {d_k} there is a linear functional Λ on C(T_0), Λ ≥ 0, such that (3.5) and (3.6) are satisfied with L^i(x) := ⟨Λ, f_{(·)}^i(x)⟩ (i = 1,2). Then x_0 is a strict local minimum of (P).

Proof. The proof is a replica of the proof of Theorem 3.4.

Remark 3.8 The results of Sections 2 and 3 can be applied to the infinite programming problem

(P_0)  min{ f_0(x) : x ∈ C, f_t(x) ≤ 0 ∀t ∈ T },

where f_0 : C → R is a given function. Indeed, let x_0 ∈ C be a feasible point for problem (P_0). Consider the following optimization problem

(P_1)  min{ sup_{t∈T̂} f̂_t(x) : x ∈ C },

where T̂ := T ∪ {0} and

f̂_t(x) := f_t(x) if t ∈ T,  f̂_0(x) := f_0(x) − f_0(x_0) if t = 0.

Problem (P_1) has the same structure as our standard problem (P). Moreover it is easily seen [12] that:
x_0 is a local minimum of (P_1) if it is a local minimum of (P_0); x_0 is a strict local minimum of (P_0) if it is a strict local minimum of (P_1). In this way we obtain necessary conditions for a local minimum of (P_0), and sufficient conditions for a strict local minimum of (P_0).
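The reduction of Remark 3.8 is easy to exercise numerically; a minimal sketch with made-up data (objective and constraints are illustrative, not from the paper):

```python
# (P_0): min f0(x) s.t. f_t(x) <= 0; here feasibility means 0 <= x <= 3.
f0 = lambda x: (x - 2.0) ** 2
constraints = [lambda x: x - 3.0, lambda x: -x]    # the f_t, t in T

x0 = 3.0                                 # a feasible point for (P_0)

def f_hat_sup(x):
    """Objective of (P_1): sup over T-hat = T union {0}."""
    vals = [ft(x) for ft in constraints]           # t in T
    vals.append(f0(x) - f0(x0))                    # t = 0
    return max(vals)

assert f_hat_sup(x0) == 0.0   # at the feasible x_0 the (P_1) value is 0
assert f_hat_sup(2.0) < 0.0   # a strictly better feasible point exists,
                              # so x_0 is not a local minimum of (P_1)
```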
4
Examples
Below we indicate some examples for the approximations used in Sections 2 and 3. The first two examples refer to Assumption 2.1 (ii). E x a m p l e 4.1 Assume that C C X is a convex set. Let D := H := {A(c - x 0 ) : A > 0, c € C} -. cone(C - x 0 ). We shall see that Oi(e) = 0 satisfies (2.2) for all x € H, d 6 D. Indeed, choose a > 0 and j3 > 0 such that d\ := x 0 + a_1 0 sufficiently small, and (2.2) is true. Example 4.2 Let C := {x € X \ g(x) = 0}, where g : X —► Y is a mapping between the Banach spaces X and V. Assume that g is twice continuously Frechet differentiable, with first and second Frechet derivatives <7'(x) and g"(x) respectively. Assume that ^'(^o) is surjective. Choose H :={xeX D := {d € X
: g'{x0)x = 0},
g'(x0)d = 0, (g"(x0)d, d) = 0}.
Then there exists $o_1(\cdot)$ satisfying (2.2) and (2.3); see [3, Proposition 7.2].

From now on let $X$ be a normed space. The next four examples refer to Assumption 2.1 (iii). More precisely we shall verify condition (2.4) for different choices of $f_t^2(\cdot)$. In all these examples it is supposed that $f_t : X \to \mathbb{R}$ is continuously Fréchet differentiable for all $t \in T$, and that $f_t^1(x) := f_t'(x_0)x$. In accordance with (2.2) we set
$$x_\varepsilon := x_0 + \varepsilon d + \varepsilon^2 x + o_1(\varepsilon^2) \qquad (\varepsilon > 0),$$
where $o_1(\cdot)$ satisfies (2.3). The functions $f_t^2(\cdot)$ will be positively homogeneous of degree 2 in all cases.

Example 4.3 Assume that the functions $f_t(\cdot)$ are twice continuously Fréchet differentiable, and let
$$f_t^2(d) := \tfrac{1}{2}\langle f_t''(x_0)d, d\rangle.$$
Prederivatives and Second Order Conditions
255
By the second order Taylor expansion formula we have, for all $t \in T$,
$$f_t(x_\varepsilon) - f_t(x_0) = f_t'(x_0)(x_\varepsilon - x_0) + \tfrac{1}{2}\langle f_t''(\xi_{t,\varepsilon})(x_\varepsilon - x_0), x_\varepsilon - x_0\rangle$$
for some $\xi_{t,\varepsilon} \in [x_0, x_\varepsilon]$. Hence
$$f_t(x_\varepsilon) - f_t(x_0) = \varepsilon f_t^1(d) + \varepsilon^2 f_t^1(x) + \varepsilon^2 f_t^2(d) + \varphi_t(\varepsilon^2),$$
where $\varphi_t(\varepsilon^2) := f_t'(x_0)o_1(\varepsilon^2) + \tfrac{1}{2}\bigl(\langle f_t''(\xi_{t,\varepsilon})(x_\varepsilon - x_0), x_\varepsilon - x_0\rangle - \varepsilon^2\langle f_t''(x_0)d, d\rangle\bigr)$. We require that the sets $\{f_t'(x_0) : t \in T\}$ and $\{f_t''(x_0) : t \in T\}$ are bounded, and that $f_t''(\cdot)$ is continuous at $x_0$, uniformly in $t \in T$. Then, as $\varepsilon \downarrow 0$, $\varphi_t(\varepsilon^2)/\varepsilon^2$ converges to zero uniformly with respect to $t \in T$, hence (2.4) is satisfied.

Example 4.4 Let $f_t(\cdot)$ be continuously Fréchet differentiable. We set
$$f_t^2(d) := \limsup_{\varepsilon \downarrow 0} \frac{1}{\varepsilon^2}\bigl(f_t(x_0 + \varepsilon d) - f_t(x_0) - \varepsilon f_t'(x_0)d\bigr), \qquad (4.1)$$
assuming that the limit in (4.1) is finite and that convergence to the limit is uniform in $t \in T$. Then from (4.1) follows the existence of a function $r(\varepsilon)$ with $\lim_{\varepsilon \downarrow 0} r(\varepsilon) = 0$ such that, for all $t \in T$,
$$f_t(x_0 + \varepsilon d) - f_t(x_0) - \varepsilon f_t'(x_0)d \leq \varepsilon^2 f_t^2(d) + r(\varepsilon)\,\varepsilon^2.$$
By the mean value theorem we have, for all $t \in T$,
$$f_t(x_\varepsilon) - f_t(x_0) = f_t'(\xi_{t,\varepsilon})(\varepsilon^2 x + o_1(\varepsilon^2)) + f_t(x_0 + \varepsilon d) - f_t(x_0)$$
for some $\xi_{t,\varepsilon} \in [x_0 + \varepsilon d, x_\varepsilon]$. Together with the preceding estimate this yields (2.4), provided the derivatives $f_t'(\cdot)$ are bounded near $x_0$, uniformly in $t \in T$.

Example 4.5 Assume that the Fréchet derivatives $f_t'(\cdot)$ are Lipschitz continuous, with a common modulus $\kappa$, in a convex neighborhood of $x_0$. Fix $\eta > 0$ sufficiently small and let
$$f_t^2(d) := \sup\Bigl\{\frac{1}{2\varepsilon}\bigl(f_t'(x + \varepsilon d)d - f_t'(x)d\bigr) : \|x - x_0\| < \eta,\ \|x + \varepsilon d - x_0\| < \eta,\ 0 < \varepsilon \leq \eta\Bigr\}. \qquad (4.2)$$
It follows from [12, Proposition 4] that $f_t^2(\varepsilon h) = \varepsilon^2 f_t^2(h)$ for all $\varepsilon > 0$, and that
$$|f_t^2(h) - f_t^2(\tilde h)| \leq \tfrac{\kappa}{2}\|h - \tilde h\|\,(\|h\| + \|\tilde h\|),$$
where $\kappa$ is the common Lipschitz modulus of the derivatives $f_t'(\cdot)$. Moreover, if $\|h\| \leq \eta$ we have
$$f_t(x_0 + h) - f_t(x_0) - f_t'(x_0)h = \int_0^1 \bigl(f_t'(x_0 + \lambda h)h - f_t'(x_0)h\bigr)\,d\lambda$$
$$= \int_0^{\|h\|} \bigl(f_t'(x_0 + \varepsilon v)v - f_t'(x_0)v\bigr)\,d\varepsilon \qquad [v := h/\|h\|,\ \varepsilon := \lambda\|h\|]$$
$$\leq 2 f_t^2(v)\int_0^{\|h\|} \varepsilon\,d\varepsilon = \|h\|^2 f_t^2(v) = f_t^2(h).$$
Thus, if $\|h\| \leq \eta$, then
$$f_t(x_0 + h) - f_t(x_0) \leq f_t'(x_0)h + f_t^2(h) \leq f_t'(x_0)h + f_t^2(\tilde h) + \tfrac{\kappa}{2}\|h - \tilde h\|\,(\|h\| + \|\tilde h\|).$$
Now we let $\varepsilon > 0$ be so small that $\|x_\varepsilon - x_0\| < \eta$, and obtain with $h := x_\varepsilon - x_0$, $\tilde h := \varepsilon d$, that
$$f_t(x_\varepsilon) - f_t(x_0) \leq f_t'(x_0)(\varepsilon d + \varepsilon^2 x) + \varepsilon^2 f_t^2(d) + \varphi_t(\varepsilon^2),$$
where $\varphi_t(\varepsilon^2) := f_t'(x_0)o_1(\varepsilon^2) + \tfrac{\kappa}{2}\|\varepsilon^2 x + o_1(\varepsilon^2)\|\,(\|x_\varepsilon - x_0\| + \varepsilon\|d\|)$. We require that $\{f_t'(x_0) : t \in T\}$ is bounded. Then $\varphi_t(\varepsilon^2)/\varepsilon^2 \to 0$ for $\varepsilon \downarrow 0$ uniformly in $t$, and (2.4) is satisfied.

The approximation (4.2) was introduced in [12]. However, in [12] it was not applied to the functions $f_t$ themselves, but to the Lagrangian of problem (P). The resulting second order approximation to the Lagrangian is sublinear with respect to the multiplier, whereas our function $L^2(\cdot)$, see (2.22), depends linearly on the multiplier. Therefore our results are different from those in [12].

Example 4.6 Let $X$ be finite-dimensional. Assume that the Fréchet derivatives $f_t'(\cdot)$ are Lipschitz continuous in a convex neighborhood $V$ of $x_0$. Define
$$S_t(x,d) := \limsup_{u \to x,\ \varepsilon \downarrow 0} \frac{1}{2\varepsilon}\bigl(f_t'(u + \varepsilon d)d - f_t'(u)d\bigr).$$
Then [10] the following mean value property holds for all $x_1, x_2 \in V$:
$$f_t(x_2) - f_t(x_1) \leq f_t'(x_1)(x_2 - x_1) + S_t(\xi_t, x_2 - x_1)$$
for some $\xi_t \in [x_1, x_2]$. Moreover $S_t(x, \varepsilon d) = \varepsilon^2 S_t(x, d)$ for all $\varepsilon > 0$. We set
$$f_t^2(d) := S_t(x_0, d). \qquad (4.3)$$
This is the limiting case of (4.2) as $\eta \downarrow 0$. Then we obtain from the above mean value property:
$$f_t(x_\varepsilon) - f_t(x_0) \leq f_t'(x_0)(x_\varepsilon - x_0) + S_t(\xi_{t,\varepsilon}, x_\varepsilon - x_0) = f_t'(x_0)(\varepsilon d + \varepsilon^2 x) + f_t^2(\varepsilon d) + \varphi_t(\varepsilon^2),$$
where $\varphi_t(\varepsilon^2) := f_t'(x_0)o_1(\varepsilon^2) + \max\{0,\ S_t(\xi_{t,\varepsilon}, x_\varepsilon - x_0) - S_t(x_0, \varepsilon d)\}$. The function $S_t(\cdot,\cdot)$ is upper semicontinuous [6]. We require that it is so uniformly in $t \in T$, at all points $(x_0, d)$ with $d \in D$, and that the set $\{f_t'(x_0) : t \in T\}$ is bounded. Then (2.4) holds.

We note that (4.3) can equivalently be written as
$$f_t^2(d) := \tfrac{1}{2}\sup\{\langle Md, d\rangle : M \in \partial^2 f_t(x_0)\}, \qquad (4.4)$$
where $\partial^2 f_t(x_0) \subset L(X, X')$ is a generalized set-valued second derivative, see [10].

The next three examples refer to Assumption 3.1. More precisely we shall verify inequality (3.3). In these examples $C \subset X$ is an arbitrary set, and we let $H := \mathrm{cone}(C - x_0)$. Then we can set $h(x) := x - x_0$ and $o(h) \equiv 0$ in (3.1). As before, $f_t^1(x) := f_t'(x_0)x$.

Example 4.7 Assume that $f_t(\cdot)$ is twice Fréchet differentiable. Then (3.3) holds with
$$f_t^2(d) := \tfrac{1}{2}\langle f_t''(x_0)d, d\rangle,$$
since $f_t(x) - f_t(x_0) = f_t'(x_0)(x - x_0) + \tfrac{1}{2}\langle f_t''(x_0)(x - x_0), x - x_0\rangle + r_t(x - x_0)$, where $r_t(x - x_0)/\|x - x_0\|^2 \to 0$ for $x \to x_0$.
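For twice continuously differentiable functions, the one-sided quotient of (4.1) recovers the Taylor term $\tfrac{1}{2}\langle f''(x_0)d, d\rangle$ used in Examples 4.3 and 4.7. A numerical sketch with an arbitrary one-dimensional function (the choices of $f$, $x_0$, and $d$ are illustrative, not from the paper):

```python
import numpy as np

# Quotient from (4.1) converging to the Taylor term (1/2) f''(x0) d^2.
f = lambda x: np.exp(x) + x ** 3
df = lambda x: np.exp(x) + 3 * x ** 2        # f'
d2f = lambda x: np.exp(x) + 6 * x            # f''

x0, d = 0.3, 1.7
target = 0.5 * d2f(x0) * d ** 2

for eps in [1e-2, 1e-3, 1e-4]:
    quot = (f(x0 + eps * d) - f(x0) - eps * df(x0) * d) / eps ** 2
    assert abs(quot - target) < 10 * eps     # third-order remainder, O(eps)
```

For merely $C^{1,1}$ functions the quotient need not converge, which is why Examples 4.5, 4.6, 4.8, and 4.9 work with sup/inf and limsup/liminf versions of this difference quotient instead.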
Example 4.8 Assume that $f_t(\cdot)$ is Fréchet differentiable and that $f_t'(\cdot)$ is Lipschitz continuous in a neighborhood $V$ of $x_0$. Fix $\eta > 0$ as in Example 4.5 and let
$$f_t^2(d) := \inf\Bigl\{\frac{1}{2\varepsilon}\bigl(f_t'(x + \varepsilon d)d - f_t'(x)d\bigr) : \|x - x_0\| < \eta,\ \|x + \varepsilon d - x_0\| < \eta,\ 0 < \varepsilon \leq \eta\Bigr\}. \qquad (4.5)$$
Similarly as in Example 4.5 we obtain for all $h$ sufficiently small that
$$f_t(x_0 + h) - f_t(x_0) \geq f_t'(x_0)h + f_t^2(h).$$
Thus (3.3) holds with $h(x) := x - x_0$ and $o_2(h) \equiv 0$ for all $h$ sufficiently small.

Example 4.9 Let $X$ be finite-dimensional. Assume that the Fréchet derivative $f_t'(\cdot)$ is Lipschitz continuous in a neighborhood $V$ of $x_0$. Define
$$s_t(x,d) := \liminf_{u \to x,\ \varepsilon \downarrow 0} \frac{1}{2\varepsilon}\bigl(f_t'(u + \varepsilon d)d - f_t'(u)d\bigr),$$
and set $f_t^2(d) := s_t(x_0, d)$. Similarly as in Example 4.6 we obtain that
$$f_t(x_0 + h) - f_t(x_0) \geq f_t'(x_0)h + s_t(\xi_t, h) \geq f_t'(x_0)h + f_t^2(h) + r_t(h),$$
where $\xi_t \in [x_0, x_0 + h]$ and $r_t(h) := \min\{0,\ s_t(\xi_t, h) - s_t(x_0, h)\}$. The function $s_t(\cdot,\cdot)$ is lower semicontinuous. We require that $s_t(\cdot, h)$ is lower semicontinuous at $x_0$, uniformly with regard to all $h$ with $\|h\| = 1$. Then $r_t(h)/\|h\|^2 \to 0$ for $h \to 0$, and condition (3.3) is satisfied.

Let us mention that a more complex form of necessary and sufficient second order conditions for problem (P) has recently been given by Kawasaki [13, 14]. In these conditions the inequalities involving second order derivatives (the counterparts of (2.13) and (3.6)) carry an extra term which describes a suitable approximation to the function $\sup_{t \in T} f_t(\cdot)$. It has not been possible as yet to incorporate this extra term into our general formalism.

Acknowledgment. This paper was written during a research visit of the second author to Universität Mannheim. He is indebted to Deutscher Akademischer Austauschdienst (DAAD) for financial support.
References

[1] A. Ben-Tal, Second order theory of extremum problems, In: Extremal Methods and Systems Analysis (Lecture Notes in Economics and Mathematical Systems, Vol. 174), 336-356. Springer-Verlag, Berlin, 1980.

[2] A. Ben-Tal, M. Teboulle, and J. Zowe, Second order necessary optimality conditions for semi-infinite programming problems, In: Semi-Infinite Programming (Lecture Notes in Control and Information Sciences, Vol. 15), 17-30. Springer-Verlag, Berlin, 1978.

[3] A. Ben-Tal and J. Zowe, A unified theory of first and second order conditions for extremum problems in topological vector spaces, Mathematical Programming Study 19 (1982) 39-76.

[4] E. Blum and W. Oettli, Mathematische Optimierung, Springer-Verlag, Berlin, 1975.

[5] J. M. Borwein, Semi-infinite programming duality: how special is it? In: Semi-Infinite Programming and Applications (Lecture Notes in Economics and Mathematical Systems, Vol. 215), 10-36. Springer-Verlag, Berlin, 1983.
[6] F. H. Clarke, Generalized gradients of Lipschitz functionals, Advances in Mathematics 40 (1981) 52-67.

[7] B. S. Darkhovskii and E. S. Levitin, Quadratic optimality conditions for problems of semi-infinite mathematical programming, Transactions of the Moscow Mathematical Society 48 (1985) 175-225.

[8] N. Furukawa and Y. Yoshinaga, Higher-order variational sets, variational derivatives and higher-order necessary conditions in abstract mathematical programming, Bulletin of Informatics and Cybernetics 23 (1988) 9-40.

[9] R. P. Hettich and H. Th. Jongen, Semi-infinite programming: conditions of optimality and applications, In: Optimization Techniques, Part II (Lecture Notes in Control and Information Sciences, Vol. 7), 1-11. Springer-Verlag, Berlin, 1978.

[10] J.-B. Hiriart-Urruty, J. J. Strodiot, and V. H. Nguyen, Generalized Hessian matrix and second-order optimality conditions for problems with $C^{1,1}$ data, Applied Mathematics and Optimization 11 (1984) 43-56.

[11] K. H. Hoffmann and H. J. Kornstaedt, Higher-order necessary conditions in abstract mathematical programming, Journal of Optimization Theory and Applications 26 (1978) 533-569.

[12] A. D. Ioffe, Second order conditions in nonlinear nonsmooth problems of semi-infinite programming, In: Semi-Infinite Programming and Applications (Lecture Notes in Economics and Mathematical Systems, Vol. 215), 262-280. Springer-Verlag, Berlin, 1983.

[13] H. Kawasaki, An envelope-like effect of infinitely many inequality constraints on second-order necessary conditions for minimization problems, Mathematical Programming 41 (1988) 73-96.

[14] H. Kawasaki, Second-order necessary and sufficient optimality conditions for minimizing a sup-type function, Applied Mathematics and Optimization 26 (1992) 195-220.

[15] F. Lempio and J. Zowe, Higher order optimality conditions, In: Modern Applied Mathematics, ed. by B. Korte, 147-193. North-Holland, Amsterdam, 1982.

[16] V. L. Levin, Application of a theorem of E. Helly in convex programming, the problem of best approximation, and related problems, Matematicheskii Sbornik 79 (1969) 250-263.

[17] E. S. Levitin, A. A. Miljutin and N. P. Osmolovskii, Higher order conditions for a local minimum in problems with constraints, Uspekhi Matematicheskikh Nauk 33 (1978) 83-148.
[18] H. Maurer and J. Zowe, First and second order necessary and sufficient optimality conditions for infinite-dimensional programming problems, Mathematical Programming 16 (1979) 98-110.

[19] Pham Huu Sach, Second-order necessary optimality conditions for optimization problems involving set-valued maps, Applied Mathematics and Optimization 22 (1990) 189-209.

[20] Pham Huy Dien and Pham Huu Sach, Second-order optimality conditions for the extremal problem under inclusion constraints, Applied Mathematics and Optimization 20 (1989) 71-80.

[21] R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, 1972.

[22] R. T. Rockafellar, Second-order optimality conditions in nonlinear programming obtained by way of epi-derivatives, Mathematics of Operations Research 14 (1989) 462-484.

[23] R. T. Rockafellar, First- and second-order epi-differentiability in nonlinear programming, Transactions of the American Mathematical Society 307 (1988) 75-108.

[24] A. Shapiro, Second order derivative of extremal-value functions and optimality conditions for semi-infinite programs, Mathematics of Operations Research 10 (1985) 207-219.
Solution Stability of Nonsmooth Equations
261
Recent Advances in Nonsmooth Optimization, pp. 261-288
Eds. D.-Z. Du, L. Qi and R.S. Womersley
©1995 World Scientific Publishing Co Pte Ltd
Necessary and Sufficient Conditions for Solution Stability of Parametric Nonsmooth Equations

Jong-Shi Pang¹
Department of Mathematical Sciences, The Johns Hopkins University, Baltimore, Maryland 21218-2689, U.S.A.; [email protected]
Abstract
This paper gives necessary and sufficient conditions for the stability and the strong stability of a solution to a parametric nonsmooth equation under a set of mild assumptions. These assumptions are imposed on a first-order approximation of the nonsmooth function; one of them is motivated by the well-known second-order necessary condition in nonlinear programming and generalizes a key assumption in the classical Leray-Schauder fixed-point existence theorem. Specializations of the stability results to a parametric variational inequality and its Karush-Kuhn-Tucker system are discussed.
1
Introduction
Introduced by S. M. Robinson [26], the notion of a strongly regular solution to a generalized equation has played a central role in the sensitivity and stability analysis of parametric, constrained optimization problems and variational inequalities. In the context of a parametric nonlinear program (NLP) with differentiable inequality and equality constraints, Robinson showed in this reference that if a stationary point of the program satisfies two assumptions: (i) LICQ, linear independence of the gradients of the active constraints, and (ii) SSOSC, strong second-order sufficiency condition, then the stationary point is a strongly regular solution of the Karush-Kuhn-Tucker (KKT) system of the given program. Recently, Bonnans and Sulem [3] establish a very interesting converse of this result for a local minimum; specifically, they show

¹This work was based on research supported by the National Science Foundation under grant CCR-9213739.
J. S. Pang
262
that if the stationary point is a local minimum of the nonlinear program, then strong regularity of the point implies LICQ and SSOSC. Since every local minimum must satisfy the second-order necessary condition (SONC) [4], the Bonnans-Sulem result raises the question of whether more consequences of the SONC can be obtained than previously known; in particular, since it has been shown in [10] that the SONC has an important role to play in the stability analysis of parametric variational inequalities (VIs) and related problems, it is natural to suspect that solution stability of a parametric VI can perhaps be characterized under the SONC. This line of thought has led us to the present investigation.

Another motivation of this work stems from a previous paper [20] in which we have obtained some sufficient conditions for a given solution to a parametric nonsmooth equation to be stable; we have applied the result to a parametric variational inequality defined on a polyhedral set that is independent of the parameter. The goal of this paper is to obtain necessary and sufficient conditions for a given solution to a parametric nonsmooth equation to be stable and to apply the derived results to a parametric variational inequality in which the defining set is parametrized and not necessarily polyhedral. Similar characterizations are obtained for strong stability. In the context of a parametric KKT system, under the strict Mangasarian-Fromovitz constraint qualification (SMFCQ) [13] and a second-order necessary condition, we show that strong regularity is equivalent to strong stability. Other specializations of the main characterizations of stability and strong stability will also be discussed. For references on the sensitivity analysis of parametric VIs and NLPs, we cite [26, 27, 10, 12, 14, 23, 32].
As it turns out, a key assumption needed in our analysis happens to unify both the SONC in nonlinear programming and an assumption in the classical Leray-Schauder fixed-point existence theorem; see [16, Theorem 3.1.4] or [18, Theorem 6.3.3]. This is somewhat unexpected because the SONC, well known as it may be in optimization, is not known to bear any relationship to existence results of fixed points of continuous mappings. The tool that brings out such a connection is degree theory [16, 18]. For readers who are not familiar with this theory, we advise them to consult the two references.
2
Miscellaneous Concepts
We shall be concerned with the system of parametric nonlinear equations:
$$F(x, \omega) = y, \qquad (1)$$
where $F : \mathbb{R}^{n+m} \to \mathbb{R}^n$ is a continuous function. The parameters of this equation are $y \in \mathbb{R}^n$, which varies in a neighborhood of the origin $0 \in \mathbb{R}^n$, and $\omega \in \mathbb{R}^m$, which varies in a neighborhood of a given vector $\omega^* \in \mathbb{R}^m$; $x$ is the primary variable of the equation. Suppose $x^* \in \mathbb{R}^n$ is a zero of the function $F(\cdot, \omega^*)$. We are interested in the sensitivity of this solution as the parameters $(\omega, y)$ vary around $(\omega^*, 0)$. The following concepts play a central role in the sensitivity analysis of (1).
Definition 1 The vector $x^* \in F(\cdot, \omega^*)^{-1}(0)$ is said to be stable if there exist a positive scalar $c > 0$ and neighborhoods $V \subseteq \mathbb{R}^n$, $W \subseteq \mathbb{R}^m$, and $U \subseteq \mathbb{R}^n$ of $x^*$, $\omega^*$, and the origin, respectively, such that

(a) for each $(\omega, y) \in W \times U$, the set
$$S_V(\omega, y) = F(\cdot, \omega)^{-1}(y) \cap V$$
is nonempty, and

(b) $\sup\{\|x - x^*\| : x \in S_V(\omega, y)\} \leq c\,(\|\omega - \omega^*\| + \|y\|)$.

The same zero $x^*$ is said to be strongly stable if the set $S_V(\omega, y)$ is a singleton for all $(\omega, y) \in W \times U$ and the function $S_V : W \times U \to V$ is Lipschitz continuous on its domain.

We make several remarks regarding the above definition. Stability of the zero $x^*$ implies that the perturbed system
$$F(x, \omega) = y, \qquad x \in V$$
is solvable, although not necessarily uniquely, for all pairs $(\omega, y)$ sufficiently close to $(\omega^*, 0)$. Condition (b) implies that $x^*$ is an isolated zero of $F(\cdot, \omega^*)$; i.e. $S_V(\omega^*, 0) = \{x^*\}$. In the language of set-valued analysis [1], (b) stipulates a locally, pseudo upper Lipschitzian property of the perturbed solution map $S_V$ at the base pair $(\omega^*, 0)$ relative to the vector $x^*$; that is,
$$S_V(\omega, y) \subseteq \{x^*\} + c\,(\|\omega - \omega^*\| + \|y\|)\,B(0,1),$$
where $B(0,1)$ is the unit Euclidean ball in $\mathbb{R}^n$. Clearly, if $x^*$ is strongly stable, then it is stable. Note that the Lipschitzian property of the solution function $S_V$ means that there exists a constant $L > 0$ such that for any two pairs $(\omega, y)$ and $(\omega', y')$ that belong to $W \times U$, we have
$$\|S_V(\omega, y) - S_V(\omega', y')\| \leq L\,(\|\omega - \omega'\| + \|y - y'\|). \qquad (2)$$
This last inequality is stronger than the requirement (b), which concerns only the base pair $(\omega^*, 0)$ and one perturbed pair $(\omega, y)$.

As we see from the above discussion, the stability of the zero $x^*$ involves two requirements: (a), which asserts the (local) solvability of all slightly perturbed equations, and (b), which provides some kind of continuity of the perturbed solutions with reference to $x^*$. In the context of variational inequalities, Bonnans [2] has coined the term "hemistability" for condition (a) and "semistability" for (b). He has obtained necessary and sufficient conditions for the semistability of a solution to a linearly constrained variational inequality and has discussed the combined role of hemistability
and semistability in the convergence analysis of Newton's method for KKT systems of variational inequalities. Related analysis of the latter type for general nonsmooth equations can be found in [20, Section 5]. Bonnans did not provide conditions for the hemistability of $x^*$. In the subsequent study of variational inequalities, we allow the problems to be defined on perturbed, non-polyhedral, convex sets.

In general, the stability of the zero $x^*$ will be studied via a first-order approximation of the mapping $F(\cdot, \omega^*)$. This approximation concept was formally defined in [28] and had played an important role in nonsmooth analysis.

Definition 2 A function $\Phi : \mathbb{R}^n \to \mathbb{R}^n$ is said to be a first-order approximation (FOA) of the function $H : \mathbb{R}^n \to \mathbb{R}^n$ at the vector $x^* \in \mathbb{R}^n$ if the error function $e(x) = H(x) - \Phi(x)$ satisfies two properties: (a) $e(x^*) = 0$ and (b)
$$\lim_{x \to x^*} \frac{e(x)}{\|x - x^*\|} = 0.$$
This approximation is said to be strong if
$$\lim_{x, x' \to x^*,\ x \neq x'} \frac{e(x) - e(x')}{\|x - x'\|} = 0.$$
If $H$ is F(réchet)-differentiable at $x^*$ with the Jacobian matrix $\nabla H(x^*)$, then the affine function
$$\Phi(x) = H(x^*) + \nabla H(x^*)(x - x^*)$$
is a FOA of $H$ at $x^*$. This is a strong FOA if $H$ is strongly F-differentiable at $x^*$. We will see more examples of FOAs when we discuss the application of the general theory to parametric variational inequalities. Although Definition 1 concerns a zero of the function $F$ which has two arguments, the stability and strong stability concepts certainly are applicable to a function of a single argument.
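Definition 2 can be probed numerically. The sketch below uses an arbitrary smooth map $H$ on $\mathbb{R}^2$ and an arbitrary point $x^*$ (both invented for illustration, not from the paper), builds the affine FOA from the Jacobian, and checks that the error quotient $e(x)/\|x - x^*\|$ vanishes at the expected rate:

```python
import numpy as np

# A numerical look at Definition 2 (H, x*, and the offsets are arbitrary).
H = lambda x: np.array([x[0] ** 2 + x[1], np.sin(x[1]) + x[0]])
xs = np.array([0.5, -0.3])
J = np.array([[2 * xs[0], 1.0],
              [1.0, np.cos(xs[1])]])         # Jacobian of H at x*

Phi = lambda x: H(xs) + J @ (x - xs)         # affine FOA of H at x*
err = lambda x: H(x) - Phi(x)                # error function e(x)

assert np.allclose(err(xs), 0.0)             # property (a): e(x*) = 0
for r in [1e-2, 1e-3, 1e-4]:                 # property (b): e(x)/||x - x*|| -> 0
    x = xs + r * np.array([1.0, 1.0]) / np.sqrt(2)
    assert np.linalg.norm(err(x)) / r < 10 * r   # quotient shrinks like O(r)
```

Since this $H$ is twice differentiable, the same $\Phi$ is in fact a strong FOA; for merely B-differentiable maps the FOA is given by the B-derivative and need not be affine.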
3
The Assumptions for Stability
Returning to the parametric equation (1), we postulate some blanket assumptions (besides continuity) on the function $F$ in order to study the stability of $x^*$. These assumptions are as follows:

(A) The function $F(\cdot, \omega^*)$ has a FOA $\Phi$ at $x^*$; moreover, $\Phi$ is continuous in a neighborhood of $x^*$.

(B) The function $\Psi(v) = \Phi(x^* + v)$ is positively homogeneous; i.e. $\Psi(\tau v) = \tau\Psi(v)$ for all $v \in \mathbb{R}^n$ and all $\tau > 0$.

(C) There exist neighborhoods $V_1$ and $W_1$ of $x^*$ and $\omega^*$ respectively, and a constant $\gamma > 0$ such that for all $(x, \omega) \in V_1 \times W_1$,
$$\|F(x, \omega) - F(x, \omega^*)\| \leq \gamma\,\|\omega - \omega^*\|.$$

(D) There exists a continuous function $\Gamma : \mathbb{R}^n \to \mathbb{R}^n$ such that (i) the origin is the only zero of $\Gamma$ (thus the index of $\Gamma$ at the origin, denoted $\mathrm{ind}(\Gamma, 0)$, is well defined), (ii) $\mathrm{ind}(\Gamma, 0)$ is nonzero, and (iii) for all scalars $\delta > 0$, the function $\Gamma + \delta\Psi$ never vanishes at a nonzero vector.

We explain the above assumptions. Assumption (A) is the cornerstone of our theory. Assumption (B) is motivated by the case of a B(ouligand)-differentiable function $F(\cdot, \omega^*)$. (We refer the reader to [19] and [28] for some basic properties of a B-differentiable function.) Indeed if $F(\cdot, \omega^*)$ is B-differentiable at $x^*$ with the B-derivative denoted by $BF(x^*, \omega^*)(\cdot)$, then we have $\Psi(v) = BF(x^*, \omega^*)(v)$ and $\Psi$ is positively homogeneous in $v$. It should be pointed out that assumption (B) is not of major importance in our theory; its role is to allow us to state some characterizing conditions as global properties of $\Psi$. Without (B), these conditions would become local properties of $\Psi$. Since our interest in this paper is restricted to the B-differentiable case, we find assumption (B) convenient to have.

Assumption (C) is a local Lipschitzian assumption of the function $F(x, \cdot)$ at $\omega^*$, with the Lipschitzian modulus independent of $x$ that is sufficiently close to $x^*$. This assumption is commonly made in sensitivity analysis of this type; see for example condition (b) in [28, Theorem 3.2].

Unlike the other three assumptions, (D) seems rather artificial at first sight. We could think of (D) as a kind of nonvanishing property of the function $\Psi$; this assumption generalizes the nonvanishing condition assumed in the Leray-Schauder fixed-point existence theorem in classical nonlinear analysis, which corresponds essentially to $\Gamma$ being the identity map; see [16, Theorem 3.1.4] or [18, Theorem 6.3.3]. In what follows, we give two simple instances in which (D) holds. The first instance occurs when
Lemma 1 Let $\Psi : \mathbb{R}^n \to \mathbb{R}^n$ be continuous and positively homogeneous. The following two statements are equivalent.

(a) $\Psi^{-1}(0) = \{0\}$;

(b) there exists a scalar $\lambda > 0$ such that
$$\|\Psi(v)\| \geq \lambda\,\|v\|, \qquad \text{for all } v \in \mathbb{R}^n. \qquad (3)$$
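Lemma 1 is easy to verify numerically in $\mathbb{R}^2$. The piecewise linear map below is a hypothetical example (not from the paper): since it vanishes only at the origin, the minimum of $\|\Psi(v)\|$ over the unit sphere is a positive number $\lambda$, and positive homogeneity propagates the bound (3) to all of $\mathbb{R}^2$:

```python
import numpy as np

# Lemma 1 in R^2 for a hypothetical continuous, positively homogeneous
# (piecewise linear) map Psi: the matrices A and B agree where v1 = 0.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
B = np.array([[1.0, 0.0], [0.0, 1.0]])

def psi(v):
    return (A if v[0] >= 0 else B) @ v

# Psi vanishes only at the origin, so lambda := min ||Psi|| over the unit
# sphere is positive; homogeneity then yields the global bound (3).
theta = np.linspace(0.0, 2 * np.pi, 4001)
sphere = np.stack([np.cos(theta), np.sin(theta)], axis=1)
lam = min(np.linalg.norm(psi(v)) for v in sphere)
assert lam > 0.9                              # here lambda is 1

for v in 3.7 * sphere[::100]:                 # bound (3) at scaled points
    assert np.linalg.norm(psi(v)) >= lam * np.linalg.norm(v) - 1e-9
```

In infinite dimensions this compactness argument is unavailable, which is one reason the lemma is stated for $\mathbb{R}^n$.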
4
Main Results
There are two main results. The first result concerns the stability of the zero $x^*$, and the other concerns the strong stability.

Theorem 1 Let $F : \mathbb{R}^{n+m} \to \mathbb{R}^n$ be a continuous function and suppose $F(x^*, \omega^*) = 0$. Under assumptions (A) to (D), the following four statements are equivalent.

(a) The origin is a stable zero of $\Psi$.

(b) $\Psi$ is surjective and $\Psi^{-1}(B(0,1))$ is bounded.

(c) The origin is the unique zero of $\Psi$.

(d) $x^*$ is a stable zero of $F(\cdot, \omega^*)$.

Proof. (a) $\Rightarrow$ (b). Since $\Psi$ is positively homogeneous, we have $\Psi(0) = 0$. Assume that the origin is a stable zero of $\Psi$. Then there exist neighborhoods $U_1$ and $U_2$ of the origin, and a constant $c > 0$ such that the system
$$\Psi(v) = y, \qquad v \in U_1$$
has a solution, for all $y \in U_2$; moreover,
$$\|v\| \leq c\,\|y\|$$
for any such solution. To show (b), let $y \in \mathbb{R}^n$ be arbitrary. Then $\tau y \in U_2$ for all $\tau > 0$ sufficiently small. Hence there exists a vector $v$ such that $\Psi(v) = \tau y$. By the positive homogeneity of $\Psi$, it follows that $\Psi(\tau^{-1} v) = y$. Thus $\Psi$ is surjective. To show the boundedness of $\Psi^{-1}(B(0,1))$, let $y \in B(0,1)$ and assume that $\Psi(v) = y$. Then for all $\tau > 0$ sufficiently small, we have $\tau y \in U_2$, $\tau v \in U_1$, and $\Psi(\tau v) = \tau y$. Thus it follows that $\tau\|v\| \leq c\tau\|y\|$, which implies $\|v\| \leq c$. Hence (b) holds.

(b) $\Rightarrow$ (c). Suppose that $\Psi(v) = 0$ for some vector $v \neq 0$. Then $\lambda v \in \Psi^{-1}(0) \subseteq \Psi^{-1}(B(0,1))$ for all $\lambda > 0$, so $\Psi^{-1}(B(0,1))$ is unbounded, a contradiction. Consequently the origin is the unique zero of $\Psi$ and (c) holds.
(c) $\Rightarrow$ (d). Since the origin is the unique zero of $\Psi$, $\mathrm{ind}(\Psi, 0)$ is well defined. Moreover, there is a constant $\lambda > 0$ such that (3) holds. It follows that $\Phi^{-1}$ is globally upper Lipschitzian at zero relative to $x^*$; i.e., we have
$$\Phi^{-1}(y) \subseteq \{x^*\} + \lambda^{-1}\|y\|\,B(0,1), \qquad \text{for all } y \in \mathbb{R}^n.$$
Define the homotopy $H : \mathbb{R}^n \times [0,1] \to \mathbb{R}^n$ by
$$H(v, t) = (1 - t)\,\Gamma(v) + t\,\Psi(v), \qquad \text{for } (v, t) \in \mathbb{R}^n \times [0,1].$$
By assumption (D), the function $H(\cdot, t)$ does not vanish at any vector $v \neq 0$ for all $t \in [0,1)$; the same is true for $t = 1$ by (c). Hence by the invariance property of the degree of a continuous mapping, it follows that
$$\mathrm{ind}(\Phi, x^*) = \mathrm{ind}(\Psi, 0) = \mathrm{ind}(H(\cdot, 1), 0) = \mathrm{ind}(H(\cdot, 0), 0) = \mathrm{ind}(\Gamma, 0) \neq 0.$$
By Theorem 1 in [20], part (d) follows.

(d) $\Rightarrow$ (a). We first show that (c) holds under (d). Suppose $\Psi(v) = 0$ for some vector $v \in \mathbb{R}^n$. Let $c$, $V$, $W$, and $U$ be given by Definition 1. By the positive homogeneity of $\Psi$, we have for any scalar $\tau > 0$,
$$0 = \Psi(\tau v) = \Phi(x^* + \tau v) = F(x^* + \tau v, \omega^*) - \bigl(F(x^* + \tau v, \omega^*) - \Phi(x^* + \tau v)\bigr). \qquad (4)$$
Since $\Phi$ is a FOA of $F(\cdot, \omega^*)$ at $x^*$, it follows that for every $\varepsilon \in (0, 1/c)$, where $c$ is the constant in condition (b) of the stability of $x^*$, we have
$$\|F(x^* + \tau v, \omega^*) - \Phi(x^* + \tau v)\| \leq \varepsilon\tau\|v\|$$
for all $\tau > 0$ sufficiently small. Choose $\tau > 0$ such that $F(x^* + \tau v, \omega^*) - \Phi(x^* + \tau v) \in U$ and $x^* + \tau v \in V$. Then by condition (b) in the definition of stability, it follows that
$$\tau\|v\| \leq c\,\|F(x^* + \tau v, \omega^*) - \Phi(x^* + \tau v)\| \leq c\,\varepsilon\tau\|v\|.$$
By the choice of $\varepsilon$, we deduce $v = 0$. Thus the origin is the only zero of $\Psi$. As in the proof of [(c) $\Rightarrow$ (d)], it follows that $\mathrm{ind}(\Psi, 0) \neq 0$. Thus for any open neighborhood $U' \subseteq \mathbb{R}^n$ of the origin, the degree of $\Psi$ at zero relative to $U'$ is nonzero. By the nearness property of the degree, it follows that for all vectors $y$ with $\|y\|$ sufficiently small, the degree of the translated function $\Psi - y$ at zero relative to $U'$ is also nonzero. In turn, this implies that there exists a vector $v \in U'$ satisfying $\Psi(v) = y$. This establishes the first requirement for the origin to be a stable zero of $\Psi$. The second requirement follows from the inequality (3). Consequently, (a) follows. Q.E.D.

Remark. Assumption (D) is not needed for the proof of (a) $\Rightarrow$ (b) $\Rightarrow$ (c).
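In one dimension the content of Theorem 1, and the bite of assumption (D), can be sketched with two hypothetical positively homogeneous maps (invented for illustration, not from the paper). The first changes sign and is surjective with the bound of Lemma 1; the second has the origin as its unique zero yet is not surjective, so statement (c) holds while (a) fails, and consistently with Theorem 1 no $\Gamma$ satisfying (D) can exist for it:

```python
# Two hypothetical positively homogeneous maps on R:
psi1 = lambda v: v + 0.5 * abs(v)   # 1.5v for v >= 0, 0.5v for v < 0: sign change
psi2 = lambda v: v + 2.0 * abs(v)   # 3v for v >= 0, -v for v < 0: no sign change

vs = [x / 100.0 for x in range(-500, 501)]

# Both maps have the origin as their unique zero (statement (c))...
assert all(psi1(v) != 0 for v in vs if v != 0)
assert all(psi2(v) != 0 for v in vs if v != 0)

# ...but psi2 is not surjective: its range is [0, +inf), so psi2(v) = y is
# unsolvable for every y < 0, and the origin is not a stable zero of psi2.
assert min(psi2(v) for v in vs) >= 0.0

# psi1 solves psi1(v) = y for every y, with |v| <= (1/lambda)|y|, lambda = 0.5:
for y in [-3.0, -0.1, 0.2, 4.0]:
    v = y / 1.5 if y >= 0 else y / 0.5
    assert abs(psi1(v) - y) < 1e-12 and abs(v) <= 2.0 * abs(y)
```

In one dimension $\mathrm{ind}(\Psi,0) \neq 0$ amounts exactly to a sign change of $\Psi$ across the origin, which is what separates the two cases.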
After seeing the above proof, Asen Dontchev communicated to the author that conditions (a) and (d) are equivalent under assumptions (A) and (C) only. Indeed, in the paper [6], which was completed after the present paper, this equivalence is extended to set-valued mappings. Hence, assumptions (B) and (D) are responsible only for the implications [(a) $\Rightarrow$ (b)] and [(c) $\Rightarrow$ (d)]. We have stated Theorem 1 by assuming all four conditions (A)-(D) because, as we shall see in the next section, statements (b) and (c) are central to the applications of the theorem discussed later.

Next we turn our discussion to the characterization of strong stability. For this purpose, we need to strengthen assumptions (A) and (C). In essence, the strengthening of (A) amounts to the assumption of a uniform, strong FOA for the function $F(\cdot, \omega)$ for all $\omega$ sufficiently close to $\omega^*$; the strengthening of (C) amounts to the assumption of Lipschitz continuity of the function $F(x, \cdot)$ in a neighborhood of $\omega^*$, with the Lipschitz modulus being the same for all $x$ sufficiently close to $x^*$.

(A)' The function $F(\cdot, \cdot)$ has a strong FOA $\Phi$ at $(x^*, \omega^*)$ in the sense of [28, Definition 2.4] which satisfies $\Phi(x^*) = 0$; that is, for every $\varepsilon > 0$ there exist neighborhoods $V_\varepsilon$ of $x^*$ and $W_\varepsilon$ of $\omega^*$ such that for all $(x^i, \omega) \in V_\varepsilon \times W_\varepsilon$, $i = 1, 2$, we have
$$\bigl\|[F(x^1, \omega) - \Phi(x^1)] - [F(x^2, \omega) - \Phi(x^2)]\bigr\| \leq \varepsilon\,\|x^1 - x^2\|. \qquad (5)$$
(Note that this assumption implies that $\Phi$ is continuous in a neighborhood of $x^*$, by the continuity of $F$.)

(C)' There exist neighborhoods $V_1$ and $W_1$ of $x^*$ and $\omega^*$ respectively, and a constant $\gamma > 0$ such that for all $(x, \omega, \omega') \in V_1 \times W_1 \times W_1$,
$$\|F(x, \omega) - F(x, \omega')\| \leq \gamma\,\|\omega - \omega'\|.$$
By taking $(x^2, \omega) = (x^*, \omega^*)$, we see that (A)' $\Rightarrow$ (A). Clearly (C)' $\Rightarrow$ (C). We say that a mapping $H : \mathbb{R}^n \to \mathbb{R}^n$ is a (global) Lipschitzian homeomorphism if $H$ is a homeomorphism from $\mathbb{R}^n$ onto itself and both $H$ and $H^{-1}$ are Lipschitz continuous on $\mathbb{R}^n$. The following is the second main result.

Theorem 2 Let $F : \mathbb{R}^{n+m} \to \mathbb{R}^n$ be a continuous function and suppose $F(x^*, \omega^*) = 0$. Under assumptions (A)', (B), (C)', and (D), the following three statements are equivalent.

(a)' The origin is a strongly stable zero of $\Psi$.

(b)' $\Psi$ is a homeomorphism and $\Psi^{-1}$ is Lipschitz continuous on $\mathbb{R}^n$.

(d)' $x^*$ is a strongly stable zero of $F(\cdot, \omega^*)$.

Furthermore, if in addition $\Psi$ is piecewise linear, then any one of the above statements is further equivalent to each one of the following statements:
(e) $\Psi$ is a (global) Lipschitzian homeomorphism;

(f) $\Psi$ is injective;

(g) $\Psi$ is bijective.

Proof. (a)' $\Rightarrow$ (b)'. It follows from the proof of (a) $\Rightarrow$ (b) in Theorem 1 that $\Psi$ is surjective. By the strong stability assumption of the origin as a zero of $\Psi$ and by a scaling argument, it can be proved easily that there exists a constant $c > 0$ such that whenever $\Psi(v^i) = y^i$ for $i = 1, 2$, then
$$\|v^1 - v^2\| \leq c\,\|y^1 - y^2\|. \qquad (6)$$
This establishes (b)'.

(b)' $\Rightarrow$ (d)'. It follows from Theorem 1 that $x^*$ is a stable zero of $F(\cdot, \omega^*)$. Let $V$, $W$, and $U$ be, respectively, the neighborhoods of $x^*$, $\omega^*$, and $0$ associated with the stability of $x^*$. To show the Lipschitzian property (2), let $c > 0$ be the Lipschitz modulus of $\Psi^{-1}$. Let $\varepsilon \in (0, 1/c)$ be arbitrary and let $V_\varepsilon$ and $W_\varepsilon$ be, respectively, the neighborhoods of $x^*$ and $\omega^*$ associated with $\varepsilon$ as stipulated in assumption (A)'. Let $\tilde W = W \cap W_1 \cap W_\varepsilon$. Suppose $F(x^i, \omega^i) = y^i$ for some $(\omega^i, y^i) \in \tilde W \times U$ for $i = 1, 2$. By restricting the neighborhoods $\tilde W$ and $U$ if necessary, it follows from the stability condition (b) that $x^i \in V_\varepsilon \cap V_1$ for $i = 1, 2$. Write $v^i = x^i - x^*$. We have
$$y^i = \Psi(v^i) + F(x^i, \omega^i) - \Phi(x^i),$$
or equivalently,
$$v^i = \Psi^{-1}\bigl(y^i - F(x^i, \omega^i) + \Phi(x^i)\bigr).$$
Consequently,
$$\|v^1 - v^2\| \leq c\,\bigl(\|y^1 - y^2\| + \bigl\|[F(x^1, \omega^1) - \Phi(x^1)] - [F(x^2, \omega^1) - \Phi(x^2)]\bigr\| + \|F(x^2, \omega^1) - F(x^2, \omega^2)\|\bigr)$$
$$\leq c\,\bigl(\|y^1 - y^2\| + \varepsilon\|x^1 - x^2\| + \gamma\|\omega^1 - \omega^2\|\bigr),$$
which, since $\|x^1 - x^2\| = \|v^1 - v^2\|$, implies
$$\|x^1 - x^2\| \leq \frac{c\,\max(1, \gamma)}{1 - c\,\varepsilon}\,\bigl(\|y^1 - y^2\| + \|\omega^1 - \omega^2\|\bigr)$$
as desired.

(d)' $\Rightarrow$ (a)'. By Theorem 1, it follows that the origin is a stable zero of $\Psi$. Thus $\Psi$ is surjective. Hence it remains to show the existence of a constant $c > 0$ such that (6) holds whenever $\Psi(v^i) = y^i$ for $i = 1, 2$. (This will imply in particular that $\Psi$ is injective, hence is bijective because we already know that $\Psi$ is surjective.) Let $L$, $V$, $W$, and $U$ be, respectively, the Lipschitz modulus of the solution map $S_V$ and the neighborhoods of $x^*$, $\omega^*$, and the origin associated with the strong stability of $x^*$. Suppose $\Psi(v^i) = y^i$ for $i = 1, 2$. Choose $\varepsilon \in (0, 1/L)$; let $V_\varepsilon$ and $W_\varepsilon$ be as before. As in (4), we have for any $\tau > 0$,
$$\tau y^i = F(x^* + \tau v^i, \omega^*) - \bigl(F(x^* + \tau v^i, \omega^*) - \Phi(x^* + \tau v^i)\bigr).$$
By choosing $\tau$ sufficiently small, we can ensure that $x^i = x^* + \tau v^i \in V \cap V_\varepsilon$ and
$$F(x^i, \omega^*) = \tau y^i + \bigl(F(x^i, \omega^*) - \Phi(x^i)\bigr) \in U,$$
for $i = 1, 2$. Thus we have
$$\tau\|v^1 - v^2\| \leq L\,\bigl(\tau\|y^1 - y^2\| + \bigl\|[F(x^1, \omega^*) - \Phi(x^1)] - [F(x^2, \omega^*) - \Phi(x^2)]\bigr\|\bigr).$$
Using the strong FOA inequality (5) and rearranging terms, we deduce
$$\|v^1 - v^2\| \leq \frac{L}{1 - L\varepsilon}\,\|y^1 - y^2\|,$$
which establishes the desired Lipschitz property of $\Psi^{-1}$.

Finally, if $\Psi$ is piecewise linear, then $\Psi$ must be Lipschitz continuous, by a result in [8]. Thus (b)' is equivalent to (e). The equivalence of (e), (f), and (g) for a piecewise linear map is pointed out in the proof of Theorem 9 in [21]. Q.E.D.

Remark. We point out that assumption (D) is needed only for the proof of (d)' $\Rightarrow$ (a)'. In particular, (D) is really redundant in the proof of (b)' $\Rightarrow$ (d)'. Indeed, if $\Psi$ is a homeomorphism, then $\mathrm{ind}(\Psi, 0)$ is already equal to $\pm 1$; thus (D) holds with $\Gamma = \Psi$ under the assumption in (b)'.

The equivalence of statements (a)' and (d)' in Theorem 2 under assumptions (A)' and (C)' was proved in [7] within a much more general context than the present setting. We have retained the other two assumptions (B) and (D) in the above theorem in order to establish the equivalence of all the statements therein. We close this section by mentioning the Habilitation Thesis of Scholtes [31], which contains a chapter discussing piecewise affine functions; in particular, Section 2.3 is particularly relevant to the equivalence of statements (e), (f), and (g) in Theorem 2.
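The piecewise linear case is easy to sketch numerically. The two-piece map on $\mathbb{R}^2$ below is a hypothetical example (not from the paper): it is injective, and in line with the equivalence of (e), (f), and (g) it is onto with an explicitly computable piecewise linear (hence Lipschitz) inverse:

```python
import numpy as np

# A continuous piecewise linear map on R^2: Psi(v) = Av on {v1 >= 0} and
# Bv on {v1 < 0}; A and B agree where v1 = 0, and each piece maps its
# half-plane onto itself, so Psi is injective.
A = np.array([[1.0, 0.0], [3.0, 1.0]])
B = np.array([[2.0, 0.0], [3.0, 1.0]])

def psi(v):
    return (A if v[0] >= 0 else B) @ v

def psi_inv(y):                        # explicit inverse, piece by piece
    M = A if y[0] >= 0 else B          # sign of y1 matches sign of v1 here
    return np.linalg.solve(M, y)

rng = np.random.default_rng(0)
for y in rng.normal(size=(200, 2)):
    v = psi_inv(y)
    assert np.allclose(psi(v), y)      # Psi is onto, with Lipschitz inverse
```

Dropping the continuity matching (taking pieces that disagree on $v_1 = 0$) would destroy injectivity, which is why statements (e) through (g) are reserved for continuous piecewise linear maps.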
Solution Stability of Nonsmooth Equations

5 Parametric Variational Inequalities
Let F : R^(n+m) → R^n be a continuously differentiable mapping, and let g : R^(n+m) → R^p and h : R^(n+m) → R^q be twice continuously differentiable vector-valued functions such that g_i(·, ω) is convex and h_j(·, ω) is affine in the first argument for every fixed second argument ω ∈ R^m. (The continuous differentiability assumption on these functions can be relaxed somewhat. Nevertheless, for the sake of simplifying the discussion, we shall use the assumption as stated.) Let

C(ω) = { x ∈ R^n : g(x, ω) ≤ 0, h(x, ω) = 0 },

which is a closed convex subset of R^n for every ω ∈ R^m. The parametric variational inequality is the family of problems:

{ VI (F(·, ω), C(ω)) : ω ∈ Ω },

where Ω is a subset of R^m and for each ω ∈ Ω, VI (F(·, ω), C(ω)) denotes the problem of finding a vector x ∈ C(ω) such that

(y − x)^T F(x, ω) ≥ 0, for all y ∈ C(ω).

For any ω ∈ R^m, let SOL(F(·, ω), C(ω)) denote the (possibly empty) set of solutions of the VI (F(·, ω), C(ω)). By letting Π_K(u) denote the Euclidean projection of the vector u ∈ R^n onto the closed convex set K ⊆ R^n, it is known that the VI (F(·, ω), C(ω)) is equivalent to the normal equation [29]:

0 = H(z, ω) = F(Π_C(ω)(z), ω) + z − Π_C(ω)(z),  (7)

in the following sense: namely, if x solves the VI (F(·, ω), C(ω)), then z = x − F(x, ω) is a zero of H(·, ω); conversely, if z is a zero of H(·, ω), then x = Π_C(ω)(z) solves the VI (F(·, ω), C(ω)). We shall say that a solution x* ∈ SOL(F(·, ω*), C(ω*)) is stable if the corresponding vector z* = x* − F(x*, ω*) is a stable zero of the parametric equation H(·, ω*) = 0 in the sense of Definition 1. Strong stability of x* is defined similarly. In what follows, we shall apply the theory developed in the last section to the parametric equation (7); as a consequence, we will obtain a characterization for the stability of a solution to a parametric variational inequality.
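The correspondence between the VI and the normal equation (7) can be checked numerically in the simplest setting. The sketch below uses invented data: C = R²₊, so that the projection is the componentwise positive part, and F(x) = Mx + q is affine; it verifies that z = x − F(x) is a zero of the normal map when x solves the VI.

```python
# Invented data: C = R^2_+, F(x) = M x + q with M = [[2, 0], [0, 3]],
# q = (-2, 1).  On the orthant the VI is the complementarity system
# x >= 0, F(x) >= 0, x^T F(x) = 0, and the normal map is
# H(z) = F(proj_C(z)) + z - proj_C(z).

def proj_C(z):
    # Euclidean projection onto the nonnegative orthant
    return [max(t, 0.0) for t in z]

def F(x):
    return [2.0 * x[0] - 2.0, 3.0 * x[1] + 1.0]

def H(z):
    p = proj_C(z)
    f = F(p)
    return [f[i] + z[i] - p[i] for i in range(len(z))]

x_star = [1.0, 0.0]                 # solves the VI: F(x*) = (0, 1) >= 0
z_star = [x_star[i] - F(x_star)[i] for i in range(2)]
print(z_star, H(z_star))  # -> [1.0, -1.0] [0.0, 0.0]
```

Conversely, starting from the zero z* = (1, −1) of H, the projection Π_C(z*) recovers the VI solution x* = (1, 0).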
Properties of the projection

Throughout the rest of this section, we let (z*, ω*) ∈ R^(n+m) be such that H(z*, ω*) = 0; write x* = Π_C(ω*)(z*). Thus x* ∈ SOL(F(·, ω*), C(ω*)) and z* = x* − F(x*, ω*). For the theory in the last section to be applicable, we need to obtain a FOA of the function H(·, ω*) at z*; in turn, the existence of such a FOA relies on some basic properties of the parametric projection operator Π_C(ω)(z).
J. S. Pang
By definition, for each ω, the projection vector Π_C(ω)(z) is the unique optimal solution of the nonlinear program in the variable x:

minimize ½ ‖x − z‖²
subject to x ∈ C(ω).  (8)

As a function in x (with z fixed), the objective function of this program is strongly convex (and quadratic). Let

I(x*, ω*) = { i : g_i(x*, ω*) = 0 }

be the index set of active (inequality) constraints at (x*, ω*). Throughout the rest of this section, we postulate that the constant-rank constraint qualification (CRCQ) holds at the pair (x*, ω*) [11]; that is,

(CRCQ) there exist neighborhoods V₁ ⊆ R^n of x* and W₁ ⊆ R^m of ω* such that for any subsets Q ⊆ {1, ..., q} and J ⊆ I(x*, ω*), the set of gradient vectors

{ ∇_x h_j(x, ω) : j ∈ Q } ∪ { ∇_x g_i(x, ω) : i ∈ J }

has the same rank (depending on Q and J) for all vectors (x, ω) ∈ V₁ × W₁.

Let M(z*, ω*) denote the set of KKT multipliers of the projection problem (8) at (z*, ω*); that is, M(z*, ω*) consists of vectors (λ, μ) ∈ R^(p+q) such that

x* − z* + Σ_{i=1}^p λ_i ∇_x g_i(x*, ω*) + Σ_{j=1}^q μ_j ∇_x h_j(x*, ω*) = 0,
λ ≥ 0, λ^T g(x*, ω*) = 0.  (9)
The CRCQ implies that M(z*, ω*) is a nonempty polyhedron. As noted before, x* − z* = F(x*, ω*); thus the system (9) is exactly the KKT system of the VI (F(·, ω*), C(ω*)) at the solution x*. Define the VI (vector-valued) Lagrangian function: for (x, λ, μ, ω) ∈ R^n × R^p × R^q × R^m,

L(x, λ, μ, ω) = F(x, ω) + Σ_{i=1}^p λ_i ∇_x g_i(x, ω) + Σ_{j=1}^q μ_j ∇_x h_j(x, ω).  (10)

Since h(·, ω) is affine for each fixed ω, we note that

∇_x L(x*, λ, μ, ω*) = ∇_x F(x*, ω*) + Σ_{i=1}^p λ_i ∇²_xx g_i(x*, ω*)

is independent of the multiplier μ. In addition to the CRCQ, we also need the familiar Mangasarian-Fromovitz constraint qualification (MFCQ) at (x*, ω*) to ensure that C(ω) is nonempty and the projection Π_C(ω) is well defined for all ω near ω*. This constraint qualification is stated below:
(a) the gradient vectors

{ ∇_x h_j(x*, ω*) : j = 1, ..., q }

are linearly independent; and

(b) there exists a vector u ∈ R^n such that

∇_x g_i(x*, ω*)^T u < 0 for all i ∈ I(x*, ω*),
∇_x h_j(x*, ω*)^T u = 0 for all j = 1, ..., q.
Under the above setting, the following property of the parametric projection map can be established by applying [24, Theorem 2] to the parametric nonlinear program (8) with x as the primary variable and (z, ω) as the parameter.

Lemma 2 Under the CRCQ and MFCQ at (x*, ω*), there exist neighborhoods Z₀ of z* and W₀ of ω* such that the projection map Π_C(ω)(z), as a function of (z, ω), is PC¹, hence locally Lipschitz and B-differentiable, on the neighborhood Z₀ × W₀; in particular, there exists a constant γ₀ > 0 such that for all (z^i, ω^i) ∈ Z₀ × W₀, i = 1, 2,

‖Π_C(ω¹)(z¹) − Π_C(ω²)(z²)‖ ≤ γ₀ ( ‖ω¹ − ω²‖ + ‖z¹ − z²‖ ).
Although the results in [24] can be used for computing the B-derivative of the parametric projection map, we shall need this derivative only for the map Π_C(ω*)(z) considered as a function in z alone (with ω* fixed). Define the critical cone of the set C(ω*) at x* as follows:

K(x*, ω*) = { v ∈ R^n : ∇_x g_i(x*, ω*)^T v ≤ 0, for all i ∈ I(x*, ω*); ∇_x h_j(x*, ω*)^T v = 0, for all j = 1, ..., q } ∩ (x* − z*)^⊥,

where a^⊥ denotes the orthogonal complement of the linear subspace spanned by the vector a ∈ R^n. Since x* − z* = F(x*, ω*) and the CRCQ holds at (x*, ω*), we obtain the following representation of the critical cone which does not involve the auxiliary vector z*:

K(x*, ω*) = T(x*, C(ω*)) ∩ F(x*, ω*)^⊥,  (11)

where T(x*, C(ω*)) is the tangent cone of C(ω*) at x*. We write BΠ_C(ω*)(z*, d) to denote the directional derivative of the function Π_C(ω*) at z* along the direction d. The following lemma is a summary of some known results whose proofs can be found in [21, 19, 28].

Lemma 3 Let ω* be fixed. Under the CRCQ stated above, for every d ∈ R^n, the B-derivative BΠ_C(ω*)(z*, d) is the unique optimal solution of the strictly convex quadratic program in the variable v ∈ R^n:

minimize ½ v^T A v − v^T d
subject to v ∈ K(x*, ω*),  (12)
where

A = I + Σ_{i : λ_i > 0} λ_i ∇²_xx g_i(x*, ω*)  (13)

is symmetric positive definite, and (λ, μ) ∈ M(z*, ω*) is arbitrary. Moreover, if each g_i(·, ω*) is affine (yielding A = I), then the B-derivative BΠ_C(ω*)(z*, ·) is strong; indeed it holds in this case that

Π_C(ω*)(z* + v) = x* + Π_K(x*,ω*)(v)  (14)

for all v ∈ R^n with ‖v‖ sufficiently small.

Related results on the directional differentiability of solutions to parametric nonlinear programs under other constraint qualifications can be found in [32] and the bibliography in [24].
Verification of assumptions (A)-(D)

We now focus our discussion on the parametric equation (7). The following result identifies a FOA of H(·, ω*) at z*.

Proposition 1 Suppose the CRCQ holds at (x*, ω*). A FOA of H(·, ω*) at z* is given by the continuous function

z ↦ ( ∇_x F(x*, ω*) − I ) BΠ_C(ω*)(z*, z − z*) + z − z*, for all z ∈ R^n.

Moreover, this FOA is strong if each g_i(·, ω*) is affine.

Proof. This follows from the chain rule of B-differentiation [28] and the expression (14) for the polyhedral case. Q.E.D.

Consequently, assumption (A) holds. Since

G(v) = ( ∇_x F(x*, ω*) − I ) BΠ_C(ω*)(z*, v) + v, for all v ∈ R^n,  (15)

is clearly positively homogeneous, (B) holds with this function G. To verify assumption (C), suppose that the MFCQ also holds at (x*, ω*). We have

H(z, ω) − H(z, ω*)
= F(Π_C(ω)(z), ω) − F(Π_C(ω*)(z), ω*) − Π_C(ω)(z) + Π_C(ω*)(z)
= [ F(Π_C(ω)(z), ω) − F(Π_C(ω)(z), ω*) ] + [ F(Π_C(ω)(z), ω*) − F(Π_C(ω*)(z), ω*) ] − Π_C(ω)(z) + Π_C(ω*)(z).

By Lemma 2 and the differentiability assumption on the function F, it follows that condition (C) is satisfied for H at (z*, ω*). Incidentally, the sole role of the CRCQ is to ensure that ‖Π_C(ω)(z) − Π_C(ω*)(z)‖ is bounded by a constant (that is independent of z near z*) times ‖ω − ω*‖. As long as the projection operator has the latter property (possibly without the CRCQ), it becomes possible to apply Theorem 1.

Finally we come to the last condition (D). We postulate the following assumption:

(SONC) for every v ∈ K(x*, ω*), there exists (λ, μ) ∈ M(z*, ω*) such that

v^T ∇_x L(x*, λ, μ, ω*) v ≥ 0.  (16)

The reason why we label this assumption as SONC is that when

F(x, ω) = ∇_x θ(x, ω), for all (x, ω) ∈ R^(n+m),

for some real-valued twice continuously differentiable function θ : R^(n+m) → R, the above SONC is exactly the second-order necessary condition [4] for the following nonlinear program in the variable x ∈ R^n:

minimize θ(x, ω*)
subject to x ∈ C(ω*).

Note that in general the above SONC is satisfied if for every (λ, μ) ∈ M(z*, ω*), the matrix ∇_x L(x*, λ, μ, ω*) is copositive on the critical cone K(x*, ω*).

Proposition 2 If SONC holds, then condition (D) is valid for the function H defined in (7).

Proof. Let the function Γ required in (D) be the identity map. It remains to verify that I + δG never vanishes at a nonzero vector for all δ > 0, where G is given by (15). Assume the contrary; let w ≠ 0 be such that

w + δG(w) = 0  (17)
for some δ > 0. Let v = BΠ_C(ω*)(z*, w). By Lemma 3, v ∈ K(x*, ω*). Let (λ, μ) ∈ M(z*, ω*) be such that (16) holds. Let A be defined by (13) with this (λ, μ). Consider the map

H(d) = ( I − A ) P(d) + d, d ∈ R^n,

where P(d) is the unique solution of the quadratic program (12). In particular, we have P(w) = v. According to [21, Lemma 8], the map H is a global Lipschitzian homeomorphism from R^n onto itself, and its inverse is given by

H^(−1)(y) = ( A − I ) Π_K(y) + y,

where K is a shorthand for K(x*, ω*). Moreover, we have Π_K ∘ H = P. In terms of the mapping P, the equation (17) can be written as

(1 + δ) w + δ ( ∇_x F(x*, ω*) − I ) v = 0.  (18)

Let u = H(w). Then we have Π_K(u) = P(w) = v; moreover,

w = H^(−1)(u) = ( A − I ) Π_K(u) + u = ( A − I ) v + u.

Substituting the matrix A from (13), we obtain

(1 + δ)( u − Π_K(u) ) + ( (1 + δ) A + δ ( ∇_x F(x*, ω*) − I ) ) v = 0,

or equivalently,

(1 + δ)( u − Π_K(u) ) + ( A + δ ∇_x L(x*, λ, μ, ω*) ) v = 0.

Premultiplying this equation by v^T = Π_K(u)^T and using the following facts: A is positive definite, v^T ∇_x L(x*, λ, μ, ω*) v ≥ 0, and, since K is a cone, Π_K(u)^T ( u − Π_K(u) ) = 0, we deduce that v = 0. In turn, from (18), we obtain w = 0, which is a contradiction. Q.E.D.

Summarizing the above discussion, we can now state the following result, which is essentially a corollary of Theorem 1. In this result, K^* denotes the dual cone of the set K.

Theorem 3 Assume that the differentiability and convexity conditions on the functions F, g, and h as stated in the beginning of this section are valid. Let (x*, z*, ω*) be as given above. Suppose that the CRCQ, MFCQ, and SONC hold at (x*, ω*). The following statements are then equivalent.

(a) x* is a stable solution of the parametric VI (F(·, ω*), C(ω*)).

(b) For every (λ, μ) ∈ M(z*, ω*), the implication below holds:

0 ≠ v ∈ K(x*, ω*), ∇_x L(x*, λ, μ, ω*) v ∈ K(x*, ω*)^* ⟹ v^T ∇_x L(x*, λ, μ, ω*) v > 0.  (19)

(c) There exists (λ, μ) ∈ M(z*, ω*) such that the implication (19) holds.

(d) For every (λ, μ) ∈ M(z*, ω*), there exists a constant c > 0 such that for every vector q ∈ R^n, the generalized linear complementarity problem (GLCP) in the variable v:

v ∈ K(x*, ω*), q + ∇_x L(x*, λ, μ, ω*) v ∈ K(x*, ω*)^*, v^T ( q + ∇_x L(x*, λ, μ, ω*) v ) = 0  (20)

has a solution, and all such solutions v satisfy ‖v‖ ≤ c ‖q‖.

(e) There exist (λ, μ) ∈ M(z*, ω*) and c > 0 such that the conclusion of (d) holds.
In addition, if the matrix ∇_x F(x*, ω*) is symmetric, then any one of the above statements (a)-(e) is further equivalent to either one of the following two statements.

(f) For every (λ, μ) ∈ M(z*, ω*), the implication below holds:

0 ≠ v ∈ K(x*, ω*) ⟹ v^T ∇_x L(x*, λ, μ, ω*) v > 0.  (21)

(g) There exists (λ, μ) ∈ M(z*, ω*) such that (21) holds.
Proof. The assumptions of Theorem 1 hold for the function H at the pair (z*, ω*). Thus this theorem is applicable.

(a) ⟹ (b). By the implication (d) ⟹ (c) in Theorem 1, we deduce that the function G defined in (15) has a unique zero at the origin. Suppose that for some (λ, μ) ∈ M(z*, ω*), the implication (19) fails to hold for some vector v. This vector v satisfies

0 ≠ v ∈ K(x*, ω*), ∇_x L(x*, λ, μ, ω*) v ∈ K(x*, ω*)^*, v^T ∇_x L(x*, λ, μ, ω*) v = 0.

Let A be defined by (13) with this (λ, μ), and let H and P be the mappings associated with this matrix A, as in the proof of Proposition 2. Let u = v − ∇_x L(x*, λ, μ, ω*) v. Then we have v = Π_K(u), where K is a shorthand for K(x*, ω*). Let w = H^(−1)(u). Then

w = ( A − I ) Π_K(u) + u = ( A − I ) v + u.

Moreover, we have P(w) = Π_K ∘ H(w) = Π_K(u) = v; that is, v = BΠ_C(ω*)(z*, w). Now,

G(w) = ( ∇_x F(x*, ω*) − I ) BΠ_C(ω*)(z*, w) + w = ( ∇_x F(x*, ω*) − I ) v + ( A − I ) v + u = ( ∇_x L(x*, λ, μ, ω*) − I ) v + u = 0.

Thus w = 0, which implies u = v = 0. This is a contradiction. Hence (b) holds.

(b) ⟹ (c). This is obvious.

(c) ⟹ (a). Either assumption (c) or (b) is equivalent to the function G having the origin as its unique zero. Thus (a) holds if either (c) or (b) holds. Consequently, (a), (b), and (c) are equivalent.

In a similar way, we can establish that either (d) or (e) is equivalent to the function G being surjective and G^(−1)(B(0, 1)) being bounded. But the latter two facts are exactly condition (b) in Theorem 1. Consequently, the five conditions (a)-(e) in the present theorem are all equivalent.

Finally, if ∇_x F(x*, ω*) is symmetric, then so is ∇_x L(x*, λ, μ, ω*) for any (λ, μ) ∈ M(z*, ω*). In this case, the equivalence of the two implications (19) and (21), under the assumed SONC, is an elementary fact which has been noted in [10, Proposition 6], amongst other places. Q.E.D.
The implication (21) states that the matrix ∇_x L(x*, λ, μ, ω*) is strictly copositive on the cone K(x*, ω*). When specialized to the case of parametric nonlinear programming, this condition is exactly the second-order sufficient condition used, e.g., by Robinson [27]. It is a weakened form of the well-known, classical second-order sufficiency condition in nonlinear programming, which essentially requires that the matrix be positive definite on the linear span of the cone K(x*, ω*).

We make some further remarks about Theorem 3. From complementarity theory [9], it can be shown that for a given pair (λ, μ), if the matrix ∇_x L(x*, λ, μ, ω*) is copositive on K(x*, ω*), then the implication (19) holds for this (λ, μ) if and only if the GLCP (20) has a solution for all vectors q ∈ R^n and all such solutions are bounded by a constant multiple of ‖q‖. It is known from [10, Theorem 8] that if the matrix ∇_x L(x*, λ, μ, ω*) is copositive on K(x*, ω*) for all pairs (λ, μ) ∈ M(z*, ω*), then, without the CRCQ, statement (b) implies (a). Thus the CRCQ allows us to weaken the assumption in this previous result (which requires the copositivity of the Jacobian matrix of the Lagrangian function for all multiplier pairs) to the present SONC. At the present time, we are not sure whether some form of Theorem 3 will be valid without the CRCQ.
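Strict copositivity on a cone, as in implication (21), can be probed (though not certified) by sampling directions in the cone. The sketch below uses K = R²₊ and invented matrices; note that the first matrix is strictly copositive on the orthant although its symmetric part is indefinite, which illustrates how (21) is weaker than positive definiteness.

```python
import math

# Sampling probe (a necessary check only, not a certificate) of strict
# copositivity of M on K = R^2_+: v^T M v > 0 for all 0 != v in K.
# Matrices invented for the illustration.

def quad_form(M, v):
    n = len(v)
    return sum(M[i][j] * v[i] * v[j] for i in range(n) for j in range(n))

def looks_strictly_copositive(M, samples=200):
    # test unit directions spanning the first quadrant
    for k in range(samples + 1):
        t = (math.pi / 2.0) * k / samples
        v = [math.cos(t), math.sin(t)]
        if quad_form(M, v) <= 0.0:
            return False
    return True

print(looks_strictly_copositive([[1.0, 2.0], [2.0, 1.0]]))    # -> True
print(looks_strictly_copositive([[1.0, -2.0], [-2.0, 1.0]]))  # -> False
```

A genuine certificate would require a copositivity test (an NP-hard problem in general); sampling can only refute, never prove, strict copositivity.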
6 The Case of a Fixed Polyhedral Set

We consider the case where C(ω) is a constant polyhedron in R^n, which we denote C, for all ω ∈ R^m. Our goal is to apply Theorem 2 to characterize the strong stability of a solution x* ∈ SOL(F(·, ω*), C). We leave it to the reader to specialize Theorem 3 to obtain detailed characterizations for stability in the (constant) polyhedral case. Here we simply point out that in this case, statement (b) in Theorem 3 is equivalent to Reinoza's strong positivity condition [25]; further discussion of the latter condition can be found in [2, 10]. Throughout this section, the MFCQ is not needed, whereas the CRCQ clearly holds because of the polyhedrality of C. As before, let z* = x* − F(x*, ω*). The function H has a slightly simpler form:

H(z, ω) = F(Π_C(z), ω) + z − Π_C(z).  (22)
Let K denote the critical cone of C at x* with respect to the function F(·, ω*); that is,

K = T(x*, C) ∩ F(x*, ω*)^⊥,

where T(x*, C) is the tangent cone of C at x*. Define the functions

z ↦ ( ∇_x F(x*, ω*) − I ) Π_K(z − z*) + z − z*, for all z ∈ R^n,

G(v) = ( ∇_x F(x*, ω*) − I ) Π_K(v) + v, for all v ∈ R^n.  (23)

We note that G is the "linear" normal map associated with the pair ( ∇_x F(x*, ω*), K ); see [29]. Necessary and sufficient conditions for such a map to be a Lipschitzian homeomorphism on R^n are obtained in this reference. In particular, one of these conditions is that G is "coherently oriented"; see the reference for the definition. Further "nonsingularity" results for the map G in the case where ∇_x F(x*, ω*) is symmetric can be found in [30]. We also note that there is a one-to-one correspondence (of the standard type) between the solutions of the equation

q + G(w) = 0

and the solutions of the affine variational inequality (AVI) defined by the polyhedral cone K and the affine map

v ↦ q + ∇_x F(x*, ω*) v.

We denote the latter problem by AVI (q, ∇_x F(x*, ω*), K). Since K is a cone, this AVI is equivalent to the GLCP (20) with ∇_x L(x*, λ, μ, ω*) and K(x*, ω*) substituted by ∇_x F(x*, ω*) and K, respectively. By the identity (14) and the continuous differentiability of F, it is not difficult to verify that assumption (A)' is valid. Assumption (C)' is also easy to verify. In the present setting, the SONC takes the following simple form:

(SONC: constant polyhedral case): the matrix ∇_x F(x*, ω*) is copositive on the cone K.

We are now in a position to apply Theorem 2 and obtain the following analog of Theorem 3 for the strong stability of x*.

Theorem 4 Let C be a polyhedron in R^n; let F : R^(n+m) → R^n be continuously differentiable. Let x* ∈ SOL(F(·, ω*), C). Suppose that ∇_x F(x*, ω*) is copositive on the cone K. The following statements are then equivalent.

(a) x* is a strongly stable solution of the parametric VI (F(·, ω*), C).

(b) The map G defined by (23) is a Lipschitzian homeomorphism on R^n.

(c) The map G is coherently oriented.

(d) The origin is a strongly stable solution of the AVI (0, ∇_x F(x*, ω*), K).

(e) For every vector q ∈ R^n, the AVI (q, ∇_x F(x*, ω*), K) has a unique solution.

Proof. There is nothing more to prove about this theorem except to point out two things: (i) statement (d) is equivalent to the strong stability of the origin as a zero of the map G, and (ii) the unique solvability of the AVI (q, ∇_x F(x*, ω*), K) is equivalent to the bijectivity of the map G. The asserted equivalence of the five statements (a)-(e) is easily seen to be a consequence of Theorem 2. Q.E.D.
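When the cone K happens to be the nonnegative orthant R^n₊, the AVI (q, M, K) of statement (e) reduces to the standard linear complementarity problem, which for tiny dimensions can be solved by enumerating the complementary index sets. The data below are invented, and the enumeration is a sketch rather than an efficient method.

```python
from itertools import combinations

# Invented data.  For K = R^2_+ the AVI (q, M, K) is the standard LCP:
# find v >= 0 with q + M v >= 0 and v^T (q + M v) = 0.  For n = 2 we
# enumerate the four complementary pieces directly.

def solve_lcp_2x2(M, q):
    sols = []
    for r in range(3):
        for alpha in combinations(range(2), r):
            v = [0.0, 0.0]
            if alpha == (0,):
                if M[0][0] == 0:
                    continue
                v[0] = -q[0] / M[0][0]
            elif alpha == (1,):
                if M[1][1] == 0:
                    continue
                v[1] = -q[1] / M[1][1]
            elif alpha == (0, 1):
                det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
                if det == 0:
                    continue
                v[0] = (-q[0] * M[1][1] + q[1] * M[0][1]) / det
                v[1] = (-q[1] * M[0][0] + q[0] * M[1][0]) / det
            w = [q[i] + M[i][0] * v[0] + M[i][1] * v[1] for i in range(2)]
            if all(vi >= -1e-9 for vi in v) and all(wi >= -1e-9 for wi in w):
                sols.append(v)
    return sols

# with this (invented) M the LCP has exactly one solution for this q
print(solve_lcp_2x2([[2.0, 0.0], [0.0, 3.0]], [-2.0, 1.0]))  # -> [[1.0, 0.0]]
```

Unique solvability for every q, as in statement (e), is exactly what fails when some complementary piece is degenerate or when two pieces both yield feasible solutions.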
Remark. The implication (b) ⟹ (a) is closely related to the implicit-function theorem for the parametric normal map H in (22); see [28]. As such, it holds without the copositivity assumption on ∇_x F(x*, ω*). This assumption is needed for the reverse implication (a) ⟹ (b) or (c), which is like a converse of the implicit-function theorem. To the best of our knowledge, such a converse has not appeared in the literature.
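Coherent orientation can also be made concrete for K = R^n₊: the pieces of the linear normal map G(w) = (M − I)Π_K(w) + w are linear, the piece for an index set α has columns of M in the positions α and identity columns elsewhere, and its determinant equals the principal minor det M[α, α]. Coherent orientation of all 2^n pieces therefore amounts to M being a P-matrix, a classical fact for the orthant case. A minimal enumeration sketch with invented data:

```python
from itertools import combinations

# Sketch: check the P-matrix property (all principal minors positive),
# which for K = R^n_+ is coherent orientation of the linear normal map.

def det(A):
    # cofactor expansion along the first row; fine for tiny matrices
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] *
               det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(n))

def is_P_matrix(M):
    n = len(M)
    for r in range(1, n + 1):
        for alpha in combinations(range(n), r):
            minor = [[M[i][j] for j in alpha] for i in alpha]
            if det(minor) <= 0:
                return False
    return True

print(is_P_matrix([[2, -1], [0, 3]]))  # -> True
print(is_P_matrix([[0, 1], [1, 0]]))   # -> False (a zero principal minor)
```

For a general polyhedral cone K the pieces of G are indexed by the faces of K rather than by sign patterns, and the determinant condition must be checked piece by piece; see [29] for the precise statement.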
7 The KKT System of a VI

We can apply Theorems 1 and 2 to the KKT system of a parametric VI defined on a nonconvex set that varies with the parameter. It is well known that such a KKT system can be considered as a VI defined on a special polyhedral set. Hence in principle, Theorems 3 and 4 are applicable; nevertheless, a naive specialization of these latter two theorems would lead to results that are valid only under a restrictive positive semidefiniteness assumption; see the discussion below. In order to obtain characterizations under an alternative, and perhaps less restrictive, assumption, we revisit the assumption (D) in the context of the KKT system. This is the main focus of the following analysis.

Consider the following system in the primary variables (x, λ, μ) ∈ R^(n+p+q):

L(x, λ, μ, ω) = 0,
λ ≥ 0, g(x, ω) ≤ 0, λ^T g(x, ω) = 0,  (24)
h(x, ω) = 0,

where L is the VI Lagrangian function defined in (10). Unlike the development in Section 5, we do not assume that g(·, ω) is convex or h(·, ω) is affine. Instead, we continue to assume that F is continuously differentiable and g and h are twice continuously differentiable. Define the function H : R^(n+p+q+m) → R^(n+p+q) by

H(x, λ, μ, ω) = ( L(x, λ, μ, ω), −g(x, ω), −h(x, ω) ), for (x, λ, μ, ω) ∈ R^(n+p+q+m),

and let C = R^n × R^p₊ × R^q. Then the system (24) is equivalent to the parametric VI (H(·, ω), C), where C is fixed. Let y ∈ R^(n+p+q) denote a general triple (x, λ, μ). We have

∇_y H(x, λ, μ, ω) =
[ ∇_x L(x, λ, μ, ω)   ∇_x g(x, ω)^T   ∇_x h(x, ω)^T ]
[ −∇_x g(x, ω)        0               0             ]
[ −∇_x h(x, ω)        0               0             ].

Let y* = (x*, λ*, μ*) be a given solution of (24) corresponding to ω*. We need to evaluate the critical cone K(y*, C) of C at the KKT triple y*. For this purpose, define three index sets:

α = { i : λ*_i > 0 = g_i(x*, ω*) },
β = { i : λ*_i = 0 = g_i(x*, ω*) },
γ = { i : λ*_i = 0 > g_i(x*, ω*) }.

We have

T(y*, C) = R^n × ( R^|α| × R^(|β|+|γ|)₊ ) × R^q.

Hence

K(y*, C) = R^n × ( R^|α| × R^|β|₊ × {0}^|γ| ) × R^q.  (25)
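The index sets α, β, γ are straightforward to compute from a KKT triple. A small invented sketch (the tolerance is an arbitrary choice):

```python
# Sketch with an invented tolerance: classify the inequality constraints
# at a KKT triple into the index sets alpha, beta, gamma used in (25).

def index_sets(lam_star, g_star, tol=1e-12):
    alpha, beta, gamma = [], [], []
    for i, (li, gi) in enumerate(zip(lam_star, g_star)):
        if li > tol and abs(gi) <= tol:
            alpha.append(i)      # active constraint, positive multiplier
        elif abs(li) <= tol and abs(gi) <= tol:
            beta.append(i)       # active, zero multiplier (degenerate)
        elif abs(li) <= tol and gi < -tol:
            gamma.append(i)      # inactive constraint
    return alpha, beta, gamma

# e.g. lambda* = (1, 0, 0) and g(x*, omega*) = (0, 0, -2):
print(index_sets([1.0, 0.0, 0.0], [0.0, 0.0, -2.0]))  # -> ([0], [1], [2])
```

The degenerate set β is the source of the nonsmoothness: when β is empty the KKT system is locally a smooth equation, and the analysis below simplifies considerably.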
By the special structure of the critical cone K(y*, C) and the matrix ∇_y H(x, λ, μ, ω), it can be seen that the copositivity of the latter matrix on the former cone is equivalent to the positive semidefiniteness of ∇_x L(x, λ, μ, ω); this positive semidefiniteness assumption is needed in order for Theorems 3 and 4 to be directly applicable to the parametric KKT system (24). In what follows, we shall derive an alternative assumption for the characterization of the stability and strong stability of the KKT triple (x*, λ*, μ*). Specifically, we shall assume that the strict Mangasarian-Fromovitz constraint qualification (SMFCQ) holds at (x*, λ*, μ*, ω*); namely,

(a) the gradient vectors

{ ∇_x g_i(x*, ω*) : i ∈ α } ∪ { ∇_x h_j(x*, ω*) : j = 1, ..., q }

are linearly independent;

(b) there exists a vector u ∈ R^n such that

∇_x g_i(x*, ω*)^T u < 0 for all i ∈ β,
∇_x g_i(x*, ω*)^T u = 0 for all i ∈ α,
∇_x h_j(x*, ω*)^T u = 0 for all j = 1, ..., q.

It has been proved in [13] that the SMFCQ holding at (x*, λ*, μ*, ω*) is equivalent to (λ*, μ*) being the unique KKT multiplier pair corresponding to (x*, ω*). With K(x*, ω*) denoting the critical cone as given by (11) (note that K(x*, ω*) is different from the cone K(y*, C) defined in (25)), we postulate the following assumption, which, because of the SMFCQ, is basically the same as the SONC in Section 5 for the case of convex constraints:

(SONC: under SMFCQ) the matrix ∇_x L(x*, λ*, μ*, ω*) is copositive on the cone K(x*, ω*).

The key to applying Theorems 1 and 2 to the parametric KKT system (24) at the triple (x*, λ*, μ*) hinges on verifying assumption (D) for the function

G(z) = ( ∇_y H(x*, λ*, μ*, ω*) − I ) Π_K(y*,C)(z) + z, for z ∈ R^(n+p+q).  (26)
We define the required function Γ as follows. Let

E =
[ I              ∇_x g(x*, ω*)^T   ∇_x h(x*, ω*)^T ]
[ −∇_x g(x*, ω*) 0                 0               ]
[ −∇_x h(x*, ω*) 0                 0               ]

and

Γ(z) = ( E − I ) Π_K(y*,C)(z) + z, for z ∈ R^(n+p+q).

Lemma 4 For the functions Γ and G defined above, Γ + δG has the origin as its unique zero for all δ > 0, and ind(Γ, 0) = 1.

Proof. This result follows from the theory of the mixed linear complementarity problem (LCP) as established in [10]. In what follows, we sketch the key ideas of the proof and refer to the reference for the omitted details. We have

( Γ + δG )(z) = (1 + δ) [ ( E(δ) − I ) Π_K(y*,C)(z) + z ],

where

E(δ) =
[ ( I + δ ∇_x L(y*, ω*) ) / (1 + δ)   ∇_x g(x*, ω*)^T   ∇_x h(x*, ω*)^T ]
[ −∇_x g(x*, ω*)                      0                 0               ]
[ −∇_x h(x*, ω*)                      0                 0               ].

We note that there is a one-to-one correspondence between the zeros of the function Γ + δG and the solutions of the following homogeneous, mixed LCP in the variables (x, λ, μ):

( I + δ ∇_x L(y*, ω*) ) x + Σ_{i=1}^p λ_i ∇_x g_i(x*, ω*) + Σ_{j=1}^q μ_j ∇_x h_j(x*, ω*) = 0,
∇_x g_i(x*, ω*)^T x = 0, for i ∈ α,
λ_i ≥ 0, ∇_x g_i(x*, ω*)^T x ≤ 0, λ_i ( ∇_x g_i(x*, ω*)^T x ) = 0, for i ∈ β,
λ_i = 0, for i ∈ γ,
∇_x h_j(x*, ω*)^T x = 0, for j = 1, ..., q.

By the SMFCQ and the SONC, it can easily be shown that this mixed LCP has a unique solution, namely (x, λ, μ) = 0; see the proof of Theorem 7 in [10]. Thus Γ + δG has a unique zero.

To show ind(Γ, 0) = 1, we consider the following homotopy joining the identity map and Γ: for (z, t) ∈ R^(n+p+q) × [0, 1],

H(z, t) = t z + (1 − t) Γ(z).
It suffices to verify that for each t ∈ (0, 1], H(·, t) has a unique zero, namely zero. The desired index property of Γ will then follow from the homotopy invariance principle of degree and the fact that the index of the identity map at zero is equal to one. We have

H(z, t) = ( [ (1 − t) E + t I ] − I ) Π_K(y*,C)(z) + z,

which shows that for each t ∈ (0, 1], H(·, t) is the linear normal map associated with the positive definite matrix (1 − t) E + t I and the polyhedral cone K(y*, C). As such, H(·, t) is a global homeomorphism; hence it has a unique zero. Q.E.D.

With Lemma 4, we can now apply Theorem 1 to characterize the stability of the KKT triple (x*, λ*, μ*).

Theorem 5 Let F be once continuously differentiable and g and h be twice continuously differentiable. Let y* = (x*, λ*, μ*) be a KKT triple at ω*. Assume that the SMFCQ and the SONC hold at (x*, λ*, μ*, ω*). The following statements are then equivalent.

(a) y* is a stable solution of the parametric KKT system (24) at ω*.

(b) The implication below holds:

0 ≠ v ∈ K(x*, ω*), ∇_x L(x*, λ*, μ*, ω*) v ∈ K(x*, ω*)^* ⟹ v^T ∇_x L(x*, λ*, μ*, ω*) v > 0.  (27)

(c) The GLCP ( q, ∇_x L(x*, λ*, μ*, ω*), K(x*, ω*) ) has a solution for all vectors q ∈ R^n; moreover, there exists a constant c > 0 such that for all q and for any such solution v, ‖v‖ ≤ c ‖q‖.

Proof. By Lemma 4, Theorem 1 applies to the parametric KKT system (24); thus y* is a stable solution if and only if the function G defined in (26) has the origin as its unique zero. As in the proof of Lemma 4, G has a nonzero zero if and only if the homogeneous, mixed LCP

∇_x L(y*, ω*) x + Σ_{i=1}^p λ_i ∇_x g_i(x*, ω*) + Σ_{j=1}^q μ_j ∇_x h_j(x*, ω*) = 0,
∇_x g_i(x*, ω*)^T x = 0, for i ∈ α,
λ_i ≥ 0, ∇_x g_i(x*, ω*)^T x ≤ 0, λ_i ( ∇_x g_i(x*, ω*)^T x ) = 0, for i ∈ β,
λ_i = 0, for i ∈ γ,
∇_x h_j(x*, ω*)^T x = 0, for j = 1, ..., q,
has a nonzero solution (x, λ, μ). By the SMFCQ, it can be proved that the vector x must be nonzero. This vector x will then violate the implication (27). Conversely, if v violates the latter implication, then, by reversing the argument, it is easy to construct a nonzero vector z such that G(z) = 0. The equivalence of (b) and (c) under the SONC has been noted at the end of Section 5. Finally, the last equivalence has been noted in the proof of Theorem 3. Q.E.D.

There is much similarity between the two Theorems 3 and 5. There are also some differences. First, the assumptions are different: in the former, convexity of the sets C(ω) is assumed and the CRCQ is needed; in the latter, neither the convexity nor the CRCQ is assumed. Instead, uniqueness of the multiplier pair (λ*, μ*) is assumed in Theorem 5. This uniqueness assumption leads to the second difference between the two theorems. In Theorem 3, the stability of the solution x* ∈ SOL(F(·, ω*), C(ω*)) is characterized; although the multiplier map must be locally upper Lipschitzian at (x*, ω*) [27, 23, 10], Theorem 3 does not assert the stability of any multiplier pair. One last difference between the two theorems is the way they are derived. Although both results are corollaries of Theorem 1, they are based on different systems of equations.

We next obtain necessary and sufficient conditions for the KKT triple (x*, λ*, μ*) to be strongly stable.

Theorem 6 Let F be once continuously differentiable and g and h be twice continuously differentiable. Let y* = (x*, λ*, μ*) be a KKT triple at ω*. Assume that the SMFCQ and the SONC hold at (x*, λ*, μ*, ω*). The following statements are then equivalent.

(a) y* is a strongly stable solution of the parametric KKT system (24) at ω*.

(b) The mapping G defined by (26) is a Lipschitzian homeomorphism on R^(n+p+q).

(c) The matrix

A =
[ ∇_x L(x*, λ*, μ*, ω*)   ( ∇_x g_α(x*, ω*) )^T   ∇_x h(x*, ω*)^T ]
[ −∇_x g_α(x*, ω*)        0                       0               ]
[ −∇_x h(x*, ω*)          0                       0               ]

is nonsingular, and the Schur complement

− [ ∇_x g_β(x*, ω*)  0  0 ] A^(−1) [ ( ∇_x g_β(x*, ω*) )^T ; 0 ; 0 ]

is a P-matrix.
(d) y* is a strongly regular solution of the parametric KKT system (24) at ω* in the sense of Robinson [26].

In addition, if the matrix ∇_x F(x*, ω*) is symmetric, then any one of the above statements (a)-(d) is further equivalent to the following two conditions combined:

(LICQ) the gradient vectors

{ ∇_x g_i(x*, ω*) : i ∈ I(x*, ω*) } ∪ { ∇_x h_j(x*, ω*) : j = 1, ..., q }  (28)

are linearly independent, and

(SSOSC) the matrix ∇_x L(x*, λ*, μ*, ω*) is positive definite on the null space of the vectors

{ ∇_x g_i(x*, ω*) : i ∈ α } ∪ { ∇_x h_j(x*, ω*) : j = 1, ..., q }.  (29)

Proof. The equivalence of (a) and (b) follows from Theorem 2. Since G is certainly piecewise linear, by Theorem 2, (b) is equivalent to the bijectivity of G. In turn, it is known that the bijectivity of G is equivalent to (c); see [10, Theorem 4b]. Thus (a), (b), and (c) are equivalent. The equivalence of (c) and (d) is a classical result due to Robinson [26]; so is the implication [LICQ + SSOSC] ⟹ (d). Finally, we sketch the proof of the reverse implication (d) ⟹ [LICQ + SSOSC] under the SONC and the symmetry of ∇_x F(x*, ω*). (As mentioned in the beginning of the paper, this result is due to [3].)

By the symmetry of ∇_x F(x*, ω*), it can be established that (c) is equivalent to the LICQ together with the following implication: for any vector v ≠ 0 such that ∇_x L(x*, λ*, μ*, ω*) v belongs to the linear span of the gradient vectors (28) and

∇_x g_i(x*, ω*)^T v = 0, i ∈ α,
∇_x h_j(x*, ω*)^T v = 0, j = 1, ..., q,

we have v^T ∇_x L(x*, λ*, μ*, ω*) v > 0. Moreover, ∇_x L(x*, λ*, μ*, ω*) is strictly copositive on the cone K(x*, ω*). Let u be an arbitrary, nonzero element of the null space of the vectors in (29). Consider the equality-constrained quadratic program:

minimize ½ v^T ∇_x L(x*, λ*, μ*, ω*) v
subject to ∇_x g_i(x*, ω*)^T v = 0, i ∈ α,
∇_x h_j(x*, ω*)^T v = 0, j = 1, ..., q,
∇_x g_i(x*, ω*)^T v = ∇_x g_i(x*, ω*)^T u, i ∈ β.

The vector u is feasible for this program; moreover, by the strict copositivity of ∇_x L(x*, λ*, μ*, ω*) on K(x*, ω*), it follows that this quadratic program has an optimal solution v̄. If v̄ = 0, then u ∈ K(x*, ω*) and thus u^T ∇_x L(x*, λ*, μ*, ω*) u > 0. If v̄ ≠ 0, then we have

u^T ∇_x L(x*, λ*, μ*, ω*) u ≥ v̄^T ∇_x L(x*, λ*, μ*, ω*) v̄ > 0,
where the last inequality holds because ∇_x L(x*, λ*, μ*, ω*) v̄ must be a linear combination of the vectors in (28), by the optimality of v̄. Q.E.D.

We conclude this paper by mentioning that Theorems 1 and 2 can also be applied to some quasi-variational inequalities and complementarity problems of various types; the application will extend previous sensitivity results for these problems [22, 15]. The details are omitted.

Acknowledgments. The author is deeply indebted to Dr. Frederic Bonnans for some fruitful discussions on the subject of this paper while he was visiting INRIA, France in June 1994. Indeed, it was Dr. Bonnans who told the author about the Bonnans-Sulem characterization of strong regularity in [3] that led him to the main results of this paper, Theorems 1 and 2. The author also acknowledges the generous support of INRIA, which made his visit there possible; he is especially grateful to Claude Lemarechal for his kind hospitality as the local host. Dr. Danny Ralph has made some helpful comments regarding a draft of this work and provided clarifications to Lemma 3, which essentially is his joint result with S. Dempe. Finally, the author is indebted to Dr. Asen Dontchev for constructive comments on Theorems 1 and 2 and for making available to the author his preprint [6], which contains generalizations of part of these two theorems.
References
[1] J.-P. Aubin and H. Frankowska, Set-Valued Analysis (Birkhäuser, Boston 1990).
[2] J. F. Bonnans, Local analysis of Newton-type methods for variational inequalities and nonlinear programming, Applied Mathematics and Optimization 29 (1994) 161-186.
[3] J. F. Bonnans and A. Sulem, Pseudopower expansion of solutions of generalized equations and constrained optimization problems, manuscript, INRIA (May 1994).
[4] J. V. Burke, An exact penalization viewpoint of constrained optimization, SIAM Journal on Control and Optimization 29 (1991) 968-998.
[5] S. Dempe, Directional differentiability of optimal solutions under Slater's condition, Mathematical Programming 59 (1993) 49-69.
[6] A. L. Dontchev, Characterizations of Lipschitz stability in optimization, in R. Lucchetti and J. Revalski, eds., Well-Posedness and Stability of Optimization Problems and Related Topics, Kluwer Academic Publishers, to appear.
[7] A. L. Dontchev and W. W. Hager, Implicit functions, Lipschitz maps and stability in optimization, Mathematics of Operations Research 19 (1994) 753-768.
Solution Stability of Nonsmooth Equations 287
[8] T. Fujisawa and E. S. Kuh, Piecewise-linear theory of resistive networks, SIAM Journal of Applied Mathematics 22 (1972) 307-328.
[9] M. S. Gowda, Complementarity problems over locally compact cones, SIAM Journal on Control and Optimization 27 (1989) 836-841.
[10] M. S. Gowda and J. S. Pang, Stability analysis of variational inequalities and complementarity problems, via the mixed linear complementarity problems and degree theory, Mathematics of Operations Research 19 (1994) to appear.
[11] R. Janin, Directional derivative of the marginal function in nonlinear programming, Mathematical Programming Study 21 (1984) 110-126.
[12] D. Klatte, On qualitative stability of non-isolated minima, Control and Cybernetics 23 (1994) 183-200.
[13] J. Kyparisis, On uniqueness of Kuhn-Tucker multipliers in nonlinear programming, Mathematical Programming 32 (1985) 242-246.
[14] J. Kyparisis, Parametric variational inequalities with multivalued solution sets, Mathematics of Operations Research 17 (1992) 341-364.
[15] J. Kyparisis and C. M. Ip, Solution behavior of parametric implicit complementarity problems, Mathematical Programming 56 (1992) 65-70.
[16] N. G. Lloyd, Degree Theory (Cambridge University Press, Cambridge 1978).
[17] J. J. Moré and W. C. Rheinboldt, On P- and S-functions and related classes of n-dimensional nonlinear mappings, Linear Algebra and its Applications 6 (1973) 45-68.
[18] J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables (Academic Press, New York 1970).
[19] J. S. Pang, Newton's method for B-differentiable equations, Mathematics of Operations Research 15 (1990) 311-341.
[20] J. S. Pang, A degree-theoretic approach to parametric nonsmooth equations with multivalued perturbed solution sets, Mathematical Programming 62 (1993) 359-383.
[21] J. S. Pang and D. Ralph, Piecewise smoothness, local invertibility, and parametric analysis of normal maps, Mathematics of Operations Research 20 (1995) to appear.
[22] J. S. Pang and J. C. Yao, On a generalization of a normal map and equation, SIAM Journal on Control and Optimization 32 (1994), to appear.
[23] Y. Qiu and T. L. Magnanti, Sensitivity analysis for variational inequalities, Mathematics of Operations Research 17 (1992) 61-76.
[24] D. Ralph and S. Dempe, Directional derivatives of the solution of a parametric nonlinear program, manuscript, Department of Mathematics, University of Melbourne, Victoria, Australia (March 1994).
[25] A. Reinoza, The strong positivity condition, Mathematics of Operations Research 10 (1985) 54-62.
[26] S. M. Robinson, Strongly regular generalized equations, Mathematics of Operations Research 5 (1980) 43-62.
[27] S. M. Robinson, Generalized equations and their solutions, Part II: Applications to nonlinear programming, Mathematical Programming Study 19 (1982) 200-221.
[28] S. M. Robinson, An implicit-function theorem for a class of nonsmooth functions, Mathematics of Operations Research 16 (1991) 292-309.
[29] S. M. Robinson, Normal maps induced by linear transformations, Mathematics of Operations Research 17 (1992) 691-714.
[30] S. M. Robinson, Nonsingularity and symmetry for linear normal maps, Mathematical Programming, Series B 62 (1993) 415-426.
[31] S. Scholtes, Introduction to piecewise differentiable equations, Preprint No. 53/1994, Institut für Statistik und Mathematische Wirtschaftstheorie, Universität Karlsruhe (1994).
[32] A. Shapiro, Sensitivity analysis of nonlinear programs and differentiability properties of metric projections, SIAM Journal on Control and Optimization 26 (1988) 628-645.
Convergence Theories 289
Recent Advances in Nonsmooth Optimization, pp. 289-321 Eds. D.-Z. Du, L. Qi and R.S. Womersley ©1995 World Scientific Publishing Co Pte Ltd
Miscellaneous Incidences of Convergence Theories in Optimization and Nonlinear Analysis, Part II: Applications in Nonsmooth Analysis¹

Jean-Paul Penot
Faculté des Sciences, Mathématiques appliquées, URA CNRS 1204, Av. de l'Université, 64000 Pau, France
Abstract
We examine the common features of various approaches in nonsmooth analysis in a unified way. In particular, we consider the question of the equivalence of a geometrical approach with an analytical approach. This question can be split into two parts, which we call coherence. We also deal with the delicate problem of stabilization (or closure) of the notions of subdifferential and normal cone and make some proposals for this problem. Finally we make some general observations about the links of nonsmooth analysis with other fields in which limit problems or stabilized equations bring strange new terms.
In the first part [126] of this series of two papers we have considered the incidence of the discovery (or revival) of new types of convergences on optimization. In particular we have dealt with the behavior of solutions to optimization problems under perturbations of the objective in a continuous way with respect to the bounded hemiconvergence or bounded Hausdorff topology. This topology, first introduced in [94] and [110], has been given a new life in [109] and [118] and has been extensively studied in [10]-[13]. This topology can be used with profit in nonsmooth analysis; the same can be said of the Mosco topology and of the Joly topology [92], [93], a useful variant for the non-reflexive case exhibited in [19] and [20] as a workable topology, and then brilliantly characterized in [28], [29] under the name of slice topology. These topologies can play a useful role in nonsmooth analysis: see for instance [56], [123] and [124].

¹Dedicated to Charles Castaing on the occasion of his 60th birthday.
J. P. Penot
290
Our purpose here is different. We mainly focus our attention on the stabilization process for normal cones and subdifferentials (see section 3). Stability, also called closedness, is a very desirable property for a multifunction. If, for instance, the value of this multifunction is the set $S(w)$ of solutions to some problem depending on some parameter $w$, it ensures that the limit $x_0$ of some family of elements $x(w)$ of $S(w)$ as $w \to w_0$ belongs to the set $S(w_0)$ of the limit problem. In the case of subdifferentials and normal cones the stabilization procedure brings nontrivial sets under appropriate assumptions and, more importantly, enables one to dispose of useful calculus rules. Moreover, it brings a kind of unification of the various subdifferentials. However different procedures can be used: sequential closures are simple but are not closed and not idempotent; topological closures with respect to weak topologies are too large and not suitable for duality arguments. Moreover, any infinite dimensional dual space contains unbounded weak* converging nets. We try to avoid these drawbacks by using a variant of a convergence on a dual Banach space introduced in another context in [58]. Its use is simple enough and it is suitable for duality questions. In section 2 we simply characterize this convergence as the classical continuous convergence [96]. Before applying this convergence to normal cones and subdifferentials we give in section 1 a bird's-eye view of the basic constructions of nonsmooth analysis. The starting point for such constructions can be either a geometrical approach or an analytical approach. We examine whether the connections between both approaches are reversible or not. It appears that such a property, which we call coherence, can be given simple characterizations. It has the pleasant effect of discarding any ambiguity in speaking of normal cones or subdifferentials of a certain type.
Although we do not treat this question in an axiomatic way, we try to adopt a unified point of view. The need for such a unified treatment has been felt by several authors for some questions such as the Mean Value Theorem (see [16], [54], [60], [102], [128], [143] for instance). We devote the last section of the paper to analogies we point out between nonsmooth analysis and other topics of infinitesimal analysis such as homogenization theory. In both fields, it is interesting to take unconventional limits using epiconvergence theory, either to introduce some kind of epi-derivative or some limit problem. When writing down in concrete terms the expression of the limit, one gets both classical terms and "alien" terms whose effects are often decisive. We believe that more attention should be given to such phenomena, which are quite important in semi-infinite optimization, stochastic programming and sensitivity analysis on one hand, and elasticity and mechanics of composite or porous media on the other hand.
1
Coherence in Nonsmooth Analysis
The abundance of concepts in nonsmooth analysis is certainly a nuisance for the users and for the researchers. Thus it may be useful to single out a few useful concepts and eliminate the others, or to put it in more gentle and sensible terms, to leave the other concepts in the shadow, in some sort of reserve from which they could
be extracted if needed. Many biologists and ecologists have a similar point of view about living species: not all of them are presently useful for the needs of agriculture or industry, but it may appear that new needs turn out to make precious some neglected species. Chemistry has also shown that rare and queer elements such as cobalt, lithium, uranium and thorium may turn out to be crucial for our needs. Another way to make the field of nonsmooth analysis easier to grasp consists in making simple general observations about its developments and its rules. Since we believe it is too early to decide whether 3, 4 or 7 concepts deserve a special distinction, we will follow here this second path. First we observe that two points of view can be taken as a starting point: a geometrical approach and an analytical approach. Fortunately several connections are known between these two approaches: epigraphs for one direction; indicator functions, distance functions and support functions for the reverse direction. Then a question arises: are the two points of view equivalent? Second, we note that the preceding question can be raised at two different levels since there exist constructions which deal both with primal and dual objects according to the general scheme
$$\begin{array}{ccc}
\text{tangent cone } T^{?} & \rightleftharpoons & \text{directional derivative } d^{?} \\
\uparrow\downarrow & & \uparrow\downarrow \\
\text{normal cone } N^{?} & \rightleftharpoons & \text{subdifferential } \partial^{?}
\end{array}$$
For some concepts only the lower level is available directly. Since we consider this level is the essential one, although primal concepts from the upper level may be very useful, for instance in bifurcation theory or in viability theory, we restrict our attention here to the dual notions of the lower level. When primal notions are available directly, one obtains normal cones from tangent cones by taking polar cones, and one gets subdifferentials from directional derivatives by taking continuous linear forms which minorize the directional derivatives. We observe that arrows from the lower level to the upper level can be drawn too, using polarity and support functions respectively. However such processes entail convexity and a certain loss of accuracy. Therefore, when no convexity is present, one usually avoids passing from the lower level to the upper level because the reverse way will not bring back the same sets. Nonetheless, we will do it below to identify the polar cone to the limiting normal cone. Let us insist on the fact that we adorn each piece of the preceding four-pillared construction with a symbol marking its type. Here we use the symbol ? as a generic type. For the Dini-Hadamard (or contingent) notions, such as the contingent cone to a subset $F$ at $a$,
$$T(F,a) := \limsup_{t\to 0_+} \frac{1}{t}(F - a),$$
or the contingent subdifferential $\partial f$ of a function $f$,
$$\partial f(a) := \Big\{ x^* \in X^* : \liminf_{(t,u)\to(0_+,v)} \frac{1}{t}\big(f(a+tu) - f(a)\big) \geq \langle x^*, v\rangle \ \ \forall v \in X \Big\},$$
we do not use any superscript or subscript because it seems to us that they are the most basic and primitive notions. As an illustration of this assertion let us mention the following result, whose proof is an easy consequence of the definitions. An analogous result holds with the Fréchet normal cone and the Fréchet subdifferential, which are closely related notions defined below. In fact the result is valid for the whole family of subdifferentials $\partial^{?}$ and the family of related normal cones $N^{?}$ associated with a bornology $\mathcal{B}^{?}$ as in [42].

Proposition 1 ([115], Cor. 4.3) Suppose $f$ attains on $F$ a finite local maximum at $a \in F$. Then
$$\partial f(a) \subset N(F,a) := (T(F,a))^{\circ}.$$
In particular, if $f$ is Hadamard differentiable at $a$ one has $f'(a) \in N(F,a)$.
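As a small worked check of Proposition 1 (our illustration, not part of the original text), consider the one-dimensional case $X = \mathbb{R}$, $F = \mathbb{R}_- = \,]-\infty,0]$ and $f(x) = x$, which attains its maximum on $F$ at $a = 0$:

```latex
% For all t > 0 we have (1/t)(R_- - 0) = R_-, hence
T(F,0) = \limsup_{t\to 0_+} \tfrac{1}{t}(\mathbb{R}_- - 0) = \mathbb{R}_-,
\qquad
N(F,0) = (T(F,0))^{\circ} = \{x^* \in \mathbb{R} : x^* v \le 0 \ \forall v \le 0\} = \mathbb{R}_+ .
% f is Hadamard differentiable with f'(0) = 1, and indeed 1 belongs to N(F,0).
```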
In [124] we used an exclamation mark ! for the contingent type in order to stress its analogy with the inferior Dini-Hadamard notion (or incident or adjacent notion), which we denote with some $i$ or an upside-down exclamation mark:
$$T^{i}(F,a) := \liminf_{t\to 0_+} \frac{1}{t}(F - a).$$
For the Clarke-Rockafellar notions (circatangent cone $T^{\uparrow}$, circa-subderivative $d^{\uparrow}f$ or $f^{\uparrow}$, circa-subdifferential $\partial^{\uparrow}f$, ...) we use an arrow:
$$T^{\uparrow}(F,a) := \liminf_{t\to 0_+,\ x\to_F a} \frac{1}{t}(F - x).$$
We observe that these three notions enter a general framework introduced in [117] and used and developed in [70]. This framework also contains the prototangent cone or pseudo-circatangent cone [131]
$$T^{p}(F,a) := \bigcap_{u} \liminf_{t\to 0_+} \frac{1}{t}(F - a - tu)$$
and its two variants, the quasi-circatangent cone [131]
$$T^{q}(F,a) := \bigcap_{u} \liminf_{\substack{t\to 0_+ \\ a+tu \in F}} \frac{1}{t}(F - a - tu)$$
and the boundedly circatangent cone [145] (see also [78], [90] and [91] for the following characterization)
$$T^{b}(F,a) := \bigcap_{k>0} \liminf_{t\to 0_+,\ x\to_F a} \frac{1}{t}(F - x) \cap kB_X .$$
The preceding examples (see also [33], [46], [69], [82], [105]) are defined for any subset of any Banach space. One may also consider a notion which is only defined for the subsets, or the closed subsets, of a restricted class $\mathcal{X}$ of Banach spaces; for simplicity we suppose $\mathcal{X}$ contains all finite dimensional spaces and is stable by taking products. Similarly, one may consider a notion of subdifferential which is defined on the set $LSC(X)$ of lower semicontinuous functions on the members $X$ of the class $\mathcal{X}$. We do not introduce here a formal axiomatization for subdifferentials; we refer the reader to the ones already existing in the literature for various points of view (see [16], [69], [82], [143], for instance). Let us first consider the coherence of the geometrical approach. Let us recall that the subdifferential $\partial^{?}$ associated with a normal cone concept $N^{?}$ is given by
$$\partial^{?} f(a) := \{ x^* \in X^* : (x^*,-1) \in N^{?}(E_f, a_f) \},$$
where $a_f := (a, f(a))$ and $E_f$ is the epigraph of $f$, $E_f := \{(x,r) \in X \times \mathbb{R} : r \geq f(x)\}$; on the other hand the normal cone notion associated to a subdifferential $\partial^{?}$ is given by
$$N^{?}(F,a) := \partial^{?} \iota_F(a),$$
where $\iota_F$ is the indicator function of $F$: $\iota_F(x) := 0$ for $x \in F$, $\iota_F(x) := +\infty$ for $x \in X \setminus F$. In the following result we make precise what we mean by a coherent normal cone notion.

Proposition 2 Let $N^{?}$ be a normal cone notion satisfying the following two conditions:
(a) $N^{?}(\mathbb{R}_+, 0) = -\mathbb{R}_+$ when $\mathbb{R}_+$ is considered as a subset of $\mathbb{R}$;
(b) for any closed subsets $A \subset X$, $B \subset Y$, $a \in A$, $b \in B$, one has $N^{?}(A \times B, (a,b)) = N^{?}(A,a) \times N^{?}(B,b)$.
Then the normal cone $N^{??}$ associated with the subdifferential $\partial^{?}$ deduced from $N^{?}$ coincides with $N^{?}$. In such a case we say that $N^{?}$ is coherent. In fact it suffices that $N^{?}$ satisfies the following condition:
(c) for any closed subset $A \subset X$, $a \in A$, one has $N^{?}(A \times \mathbb{R}_+, (a,0)) = N^{?}(A,a) \times (-\mathbb{R}_+)$.
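For instance, condition (a) can be verified directly for the contingent normal cone (our verification, spelled out for concreteness):

```latex
% For every t > 0 one has (1/t)(R_+ - 0) = R_+, hence
T(\mathbb{R}_+,0) = \limsup_{t\to 0_+} \tfrac{1}{t}(\mathbb{R}_+ - 0) = \mathbb{R}_+,
% and taking the polar cone gives
N(\mathbb{R}_+,0) = (\mathbb{R}_+)^{\circ}
= \{x^* \in \mathbb{R} : x^* v \le 0 \ \forall v \ge 0\} = -\mathbb{R}_+ .
```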
Clearly, the incident normal cone and the circa-normal cone satisfy conditions (a), (b). It was first observed by L. Thibault that the contingent cone also satisfies these conditions. In a similar way one can show that the Fréchet normal cone given by
$$N^{-}(F,a) := \Big\{ x^* \in X^* : \limsup_{x\in F,\ x\to a} \frac{\langle x^*, x-a\rangle}{\|x-a\|} \leq 0 \Big\}$$
satisfies these conditions.
Proof. By construction, given $F \subset X$, $a \in F$, we have $x^* \in N^{??}(F,a)$ iff $x^* \in \partial^{?}\iota_F(a)$ iff $(x^*,-1) \in N^{?}(E_{\iota_F},(a,0)) = N^{?}(F \times \mathbb{R}_+,(a,0)) = N^{?}(F,a) \times (-\mathbb{R}_+)$, by (c), iff $x^* \in N^{?}(F,a)$. □
Let us now consider the coherence of the analytical approach. In the following statement the dual space $X^*$ is endowed with an arbitrary topology or convergence which is compatible with scalar multiplication.

Proposition 3 Let $\partial^{?}$ be a subdifferential taking closed values. In order that $\partial^{?}$ coincide with the subdifferential $\partial^{??}$ associated with the normal cone $N^{?}$ deduced from $\partial^{?}$, it is necessary and sufficient that for any $X$ in $\mathcal{X}$ and for any $f \in LSC(X)$, $x \in D_f := \operatorname{dom} f$, one has, with $x_f := (x, f(x))$,
$$\partial^{?} f(x) \times \{-1\} = \partial^{?}\iota_{E_f}(x_f) \cap (X^* \times \{-1\}),$$
and it is sufficient that the following two conditions are satisfied for any $X$ in $\mathcal{X}$ and for any $f \in LSC(X)$, $x \in D_f$:
(a) $\partial^{?}\iota_{E_f}(x_f) = cl(\mathbb{R}_+(\partial^{?} f(x) \times \{-1\}))$ when $\partial^{?} f(x) \neq \emptyset$;
(b) $\partial^{?}\iota_{E_f}(x_f) \cap (X^* \times \{-1\}) = \emptyset$ when $\partial^{?} f(x) = \emptyset$.
In such a case we say that $\partial^{?}$ is coherent.
Proof. The first assertion follows easily from the definitions. Let us prove the second one. Let $X$ be in $\mathcal{X}$ and let $f \in LSC(X)$, $x \in D_f$. If $x^* \in \partial^{?} f(x)$ we have $(x^*,-1) \in \partial^{?}\iota_{E_f}(x_f)$, by (a), hence $x^* \in \partial^{??} f(x)$. Conversely, if $x^* \in \partial^{??} f(x)$, i.e. if $(x^*,-1) \in \partial^{?}\iota_{E_f}(x_f)$, condition (b) shows we cannot have $\partial^{?} f(x) = \emptyset$. Therefore there exist nets $(t_i)_{i\in I}$, $(x_i^*)_{i\in I}$ in $\mathbb{R}_+$ and $\partial^{?} f(x)$ respectively such that $(x^*,-1) = \lim_{i\in I} t_i(x_i^*,-1)$. It follows that $(t_i)_{i\in I} \to 1$ and $(x_i^*)_{i\in I} \to x^*$. Since $\partial^{?} f(x)$ is closed we get $x^* \in \partial^{?} f(x)$. □
It has been observed in [76] Prop. 5.3 that the contingent subdifferential satisfies condition (a) of the preceding proposition. It is easy to show that it also satisfies condition (b). Moreover these conditions are also satisfied by the incident subdifferential $\partial^{i}$, the circa-subdifferential $\partial^{\uparrow}$ and the Fréchet subdifferential $\partial^{-}$ given by
$$\partial^{-} f(a) := \Big\{ x^* \in X^* : \liminf_{x\to a} \|x-a\|^{-1}\big(f(x) - f(a) - \langle x^*, x-a\rangle\big) \geq 0 \Big\}.$$
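Two elementary one-dimensional computations (ours, added for orientation) show how this definition behaves:

```latex
% For f(x) = |x| at a = 0 the difference quotient is
|x|^{-1}\big(|x| - x^* x\big) = 1 - x^*\,\mathrm{sgn}(x),
\quad\text{so}\quad
\liminf_{x\to 0} = 1 - |x^*| \ \ge 0 \iff |x^*| \le 1,
\quad\text{hence } \partial^{-} f(0) = [-1,1].
% For f(x) = -|x| at a = 0 the quotient is -1 - x^* sgn(x), whose liminf
% equals -1 - |x^*| < 0 for every x^*, hence \partial^{-} f(0) = \emptyset.
```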
Let us note that we can give to [76] Prop. 5.3 the following general form, which shows the necessity of condition (a) of the preceding proposition under mild assumptions.

Lemma 1 Let $N^{?}$ be the normal cone associated with a coherent subdifferential $\partial^{?}$. Suppose that for each $X$ in $\mathcal{X}$, for any subset $E$ of $X$ and any $e \in E$ the set $N^{?}(E,e)$ is a closed and convex cone. Suppose moreover that $N^{?}(E,e) \subset (\mathbb{R}_+ v)^{\circ}$ whenever $E + v \subset E$ for some $v \in X$. Then, for any $X$ in $\mathcal{X}$ and for any $f \in LSC(X)$, $x \in D_f$ with $\partial^{?} f(x) \neq \emptyset$ one has:
$$\partial^{?}\iota_{E_f}(x_f) = cl(\mathbb{R}_+(\partial^{?} f(x) \times \{-1\})).$$
Proof. Let $(x^*,r) \in N^{?}(E_f, x_f) := \partial^{?}\iota_{E_f}(x_f)$. As $E_f + (0,1) \subset E_f$, we have $r \leq 0$. When $r < 0$ we get $((-r)^{-1}x^*, -1) \in N^{?}(E_f, x_f)$, hence $(-r)^{-1}x^* \in \partial^{?} f(x)$ and $(x^*,r) \in \mathbb{R}_+(\partial^{?} f(x) \times \{-1\})$. When $r = 0$, taking $x_0^* \in \partial^{?} f(x)$ and using the convexity of $N^{?}(E_f, x_f)$, for each $t \in (0,1)$ we get $(x_t^*, -t) := (t x_0^* + (1-t)x^*, -t) \in N^{?}(E_f, x_f)$, hence $(x^*,r) = \lim_{t\to 0}(x_t^*, -t) \in cl(\mathbb{R}_+(\partial^{?} f(x) \times \{-1\}))$ by the preceding case. The reverse inclusion is obvious. □
The proof of the following observation is immediate, but the result is worth noting.

Proposition 4 If the subdifferential $\partial^{?}$ is coherent, then the associated normal cone is coherent. If the normal cone $N^{?}$ is coherent, then the associated subdifferential is coherent.
2
Weak Convergences and Weak Limits Superior
Now let us turn to questions which link convergence theory, functional analysis, nonsmooth analysis and optimization theory. For many purposes, such as the stabilization process considered in the next section, it is necessary to dispose of topologies or convergences for which sufficiently many compact subsets are available. The familiar weak* topology on a dual space $X = Y^*$ meets this requirement. However it has some drawbacks: it is not a sequential topology (i.e. it is not determined by the use of converging sequences), neighborhoods of 0 are enormous, and the coupling functional $c : X \times Y \to \mathbb{R}$ given by $c(x,y) := \langle x,y\rangle$ is not continuous if $X$ is infinite dimensional. This last inconvenience is particularly annoying for questions in which duality plays a key role, in particular when one uses normal cones, limiting subdifferentials or Fenchel conjugation. Let us give a precise statement.

Proposition 5 If $Y$ is an infinite dimensional normed vector space and if $X = Y^*$, the canonical coupling functional $c : X \times Y \to \mathbb{R}$ given by $c(x,y) := \langle x,y\rangle$ is not continuous when $X$ is endowed with the weak* topology and $Y$ is endowed with the topology induced by the norm.
Proof. Suppose on the contrary that $c$ is continuous, so that one can find a weak* neighborhood $V$ of 0 in $X$ and a ball $B$ with center 0 in $Y$ such that $c(V \times B) \subset [-1,1]$. Since $V$ contains an infinite codimensional subspace, there exists a whole line $\mathbb{R}v$ contained in $V$. Taking $u \in B$ such that $c(v,u) \neq 0$, we get a contradiction. □
It was independently suggested in [58] and [65] to turn to convergence tools, either in terms of nets, familiar to analysts, in [58], or in terms of filters in [65], more adapted to a topologist's viewpoint (see [49], [77] for general information on the topic). As a matter of fact, the category of convergence spaces is much more versatile than the category of topological spaces and is quite natural for many questions (for instance
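The mechanism of this proof can be made concrete in the self-dual space $\ell^2$ (our illustration, not part of the original text):

```latex
% Take Y = X = \ell^2 with the standard basis (e_n). The set
V = \{ x \in \ell^2 : |\langle x, e_1\rangle| < 1 \}
% is a basic weak* neighborhood of 0, and it contains the whole line R e_2.
% For any ball B of radius \varepsilon > 0, take u = \varepsilon e_2 \in B; then
c(t e_2, \varepsilon e_2) = t\varepsilon \quad (t \in \mathbb{R})
% is unbounded in t, so c(V \times B) is never contained in [-1,1].
```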
a.e. convergence, continuous convergence, order convergence, ...). Moreover, convergence is frequently a simpler tool than the use of neighborhoods or open sets: the case of convergence of test functions in the theory of distributions is a striking example, but pointwise convergence and weak convergence are other cases familiar to analysts. Sometimes the use of convergences instead of topologies is just compulsory: for instance the Painlevé-Kuratowski convergence on the hyperspace $\mathcal{P}(X)$ of a non-locally compact topological space $X$ is not topologizable. Here a convergence on $X$ is understood as a relation $\gamma$ between nets of $X$ and points of $X$, denoted by $(x_i)_{i\in I} \xrightarrow{\gamma} x$ (or $N \xrightarrow{\gamma} x$ if $N : I \to X$ denotes the net $(x_i)_{i\in I}$), satisfying the following conditions:
(C₁) the constant net with value $x$ converges to $x$;
(C₂) if $(x_j)_{j\in J}$ is a subnet of a net $(x_i)_{i\in I} \to x$ then $(x_j)_{j\in J} \to x$;
(C₃) if $x \in X$ and a net $(x_i)_{i\in I}$ of $X$ are such that any subnet $(x_j)_{j\in J}$ has a further subnet $(x_k)_{k\in K}$ such that $(x_k)_{k\in K} \to x$, then $(x_i)_{i\in I} \to x$.
When the relation $\to$ (also denoted by $\gamma$) satisfies condition (C₁) only, we say that it is a preconvergence. Since a convergence can be naturally associated to a preconvergence by adding to it nets all of whose subnets have a further subnet converging to the considered point, we will not distinguish between preconvergence and convergence when no confusion can arise. Let us give a precise statement for the preceding assertion. In it we say that a preconvergence $\alpha$ is finer than a preconvergence $\beta$ if any net $N$ converging to $x$ for $\alpha$ also converges to $x$ for $\beta$; then $\beta$ is said to be coarser than $\alpha$. The notion of subnet we adopt here is the one introduced in [1]; using the original notion due to Moore would not make much difference for the present purpose (see also [104]).
Lemma 2 Given a preconvergence $\alpha$ on $X$, the family of convergences on $X$ coarser than $\alpha$ has a finest element $\gamma = \gamma(\alpha)$, described as follows: a net $N$ converges to $x \in X$ for $\gamma$ iff for any subnet $P$ of $N$ there exists a subnet $Q$ of $P$ such that $Q$ converges to $x$ for $\alpha$. Moreover, a mapping $f : X \to Y$ of $X$ into a topological space (or a space with a convergence) $Y$ is continuous for $\gamma$ iff it is continuous for $\alpha$.
Proof. If $\gamma$ is a convergence, it is clearly the finest element of the family of convergences coarser than $\alpha$. Clearly, $\gamma$ satisfies conditions (C₁) and (C₂). In order to prove that it satisfies condition (C₃), let us consider a net $N$ of $X$ such that for any subnet $R$ of $N$ there exists a subnet $S$ of $R$ such that $S \xrightarrow{\gamma} x$. Let $P$ be a subnet of $N$. Taking $R = P$ we can find a subnet $S$ of $P$ such that $S \xrightarrow{\gamma} x$. Then, by the definition of $\gamma$, we can find a subnet $Q$ of $S$, hence of $P$, such that $Q \xrightarrow{\alpha} x$. We have proved that $N \xrightarrow{\gamma} x$. The last assertion is immediate. □
It may be comforting to know that a convergence can be described without making use of nets or filters, but by using a process which evokes some proposals in algebraic topology ([68], [140]), so that one almost remains in the realm of topological spaces and continuous maps. However, the test spaces are no longer compact topological spaces or simplexes but are what we call hushed spaces. We define a hushed space
to be a topological space which has only one non-isolated point; a hushed map $h : S \to T$ between two hushed spaces with non-isolated points $s_0$ and $t_0$ respectively is a continuous map such that $h(s_0) = t_0$ and $h(s) \neq t_0$ for $s \neq s_0$.

Proposition 6 In order to define a convergence on a set $X$ it suffices to associate with each hushed space $T$ a set $C(T,X)$ of mappings from $T$ to $X$ such that the following conditions are satisfied:
(C₁') for any hushed space $T$, any constant map from $T$ to $X$ belongs to $C(T,X)$;
(C₂') for any hushed spaces $S$, $T$, for any hushed map $g : S \to T$ and any $f \in C(T,X)$ one has $f \circ g \in C(S,X)$;
(C₃') if $T$ is a hushed space and if a mapping $f : T \to X$ is such that for any hushed space $S$ and any hushed mapping $g : S \to T$ there exist a hushed space $R$ and a hushed map $h : R \to S$ such that $f \circ g \circ h \in C(R,X)$, then $f \in C(T,X)$.
Conversely, for any convergence space $X$ the families of continuous mappings from a hushed space into $X$ satisfy the preceding conditions.
Proof. The last assertion is easy to check. On the other hand, given an association satisfying the conditions above, one declares that a net $N : I \to X$ converges to some $x \in X$ if, for $S := I \cup \{\infty\}$, where $\infty$ is an additional point, topologized by taking the points of $I$ as isolated and the extended tails $S_i := I_i \cup \{\infty\}$, with $I_i := \{j \in I : j \geq i\}$, as a base of neighborhoods of $\infty$, the mapping $f$ given by $f(\infty) = x$, $f(j) = N(j)$ belongs to $C(S,X)$. We leave to the reader the verification of the required conditions, with the following hints. If $I$ and $J$ are directed sets and if $g : I \to J$ is a filtering map (i.e. for each $j \in J$ there exists $i \in I$ such that $g(I_i) \subset J_j$), then the extension of $g$ by $g(\infty) = \infty$ is a hushed map; conversely, given a hushed space $R$ with non-isolated point $r_0$, a directed set $I$ and the associated hushed space $S = I \cup \{\infty\}$ as above, any hushed map $h : R \to S$ gives rise to a filtering map $g : H \to I$ when one sets
$$H := \{(r,i) \in R \times I : r \neq r_0,\ h(r) \geq i\}, \qquad g((r,i)) = i. \ \square$$
In order to cope with the insufficiencies of the weak* topology mentioned above, the following convergence, a variant of a convergence considered in [58], can be introduced: a net $(x_i)_{i\in I}$ of $X$ is said to be $\gamma$-convergent to $x$, and we write $(x_i)_{i\in I} \xrightarrow{\gamma} x$, if $(x_i)_{i\in I} \to x$ for the weak* topology and if $(x_i)_{i\in I}$ is eventually bounded, i.e. if a tail $(x_j)_{j\geq i_0}$ for some $i_0 \in I$ is bounded. Equivalently, $(x_i)_{i\in I} \xrightarrow{\gamma} x$ iff $(x_i)_{i\in I}$ converges to $x$ in the weak* topology and $\limsup_{i\in I} \|x_i\| < \infty$. Thus, the convergence $\gamma$ rules out weak* convergent nets which are unbounded; such nets exist in any infinite dimensional dual Banach space. Another characterization can be given as follows.

Lemma 3 The convergence $\gamma$ is the convergence $\gamma(\beta)$ associated, via the procedure described in the preceding lemma, with the preconvergence $\beta$ given as follows: a net $N = (x_i)_{i\in I}$ converges to $x$ for $\beta$ iff it is bounded and converges to $x$ in the weak* topology.
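A standard construction (ours, not from the paper) shows how an unbounded weak* convergent net arises in any infinite dimensional dual Banach space, and why $\gamma$ discards it:

```latex
% Let Y be infinite dimensional, X = Y^*. Index by pairs (F,n) with F a
% finite subset of Y and n a positive integer, ordered by inclusion and <=.
% The annihilator F^\perp = { x \in X : \langle x,y\rangle = 0 \ \forall y \in F }
% has finite codimension, hence is nontrivial; pick x_{(F,n)} \in F^\perp with
\| x_{(F,n)} \| = n .
% For every fixed y \in Y, every index beyond (\{y\},1) satisfies
\langle x_{(F,n)}, y \rangle = 0 ,
% so the net weak* converges to 0, yet its norms are unbounded: it weak*
% converges but does not \gamma-converge.
```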
Proof. Clearly, if a net $N$ converges to $x$ for $\gamma$, it converges for the convergence $\gamma(\beta)$ associated with $\beta$. Conversely, if $N = (x_i)_{i\in I}$ converges to $x$ for $\gamma(\beta)$ then it is weak* convergent to $x$; let us show it is eventually bounded. If it is not the case, for each $i \in I$ and each $n \in \mathbb{N}$ one can find $j(i,n) \in I$ such that $j(i,n) \geq i$, $\|x_{j(i,n)}\| \geq n$. Then we cannot find a subnet $(x_k)_{k\in K}$ of $(x_{j(i,n)})_{(i,n)\in I\times\mathbb{N}}$ which converges to $x$ for $\beta$ and a fortiori is bounded. □
Moreover one can show the following relationship with the classical bounded weak* topology [67], [80], [97] and with the weak* sequential topology considered in [113], [114].

Lemma 4 The topology associated to the convergence $\gamma$ by taking as closed subsets of $X$ the subsets which contain the limits of their $\gamma$-convergent nets is the bounded weak* topology.
and using the fact that $(x_i)_{i\in I}$ is eventually bounded, we see that $((x_i,y_i))_{i\in I} \to (x,y)$. This shows that $\gamma$ is finer than continuous convergence and that the coupling functional is continuous. On the other hand, if $(x_i)_{i\in I} \to x$ for the continuous convergence, then $(x_i)_{i\in I} \to x$ for the weak* topology (take $y_i = y$ for each $i \in I$). If $(x_i)_{i\in I}$ is not eventually bounded, we can select a subnet $(x_{j(i,n)})_{(i,n)\in I\times\mathbb{N}}$ with $j(i,n) \geq i$, $\|x_{j(i,n)}\| \geq n$ as in the proof of Lemma 3. Choosing $u_{j(i,n)}$ in the unit ball of $Y$ such that $\langle x_{j(i,n)}, u_{j(i,n)}\rangle > n$, and setting $y_{j(i,n)} = n^{-1} u_{j(i,n)}$, we see that $(y_{j(i,n)}) \to 0$ but $\langle x_{j(i,n)}, y_{j(i,n)}\rangle > 1$ for each $(i,n) \in I \times \mathbb{N}$, a contradiction. □
The following result, which is an immediate consequence of the continuity of the coupling functional, shows the interest of the convergence $\gamma$ for various problems.

Corollary 2 The graph of a maximal monotone operator $A : Y \rightsquigarrow X = Y^*$ is closed in the product convergence of the norm convergence with the convergence $\gamma$.

We define a multifunction $F : T \rightsquigarrow X$ from a topological space $T$ into $X$ to be stable (or closed or upper semicontinuous) at $t_0 \in T$ if for any nets $(t_i)_{i\in I} \to t_0$, $(x_i)_{i\in I} \xrightarrow{\gamma} x$ with $x_i \in F(t_i)$ for each $i \in I$ one has $x \in F(t_0)$. We also use the mixed excesses
$$e_r^0(C,D) := \sup \{ d_0(x,D) : x \in C \cap rB_X \}$$
for $r \in P := (0,\infty)$, $C, D \subset X$, where $d_0(x,D) = \inf \{\|x-y\|_0 : y \in D\}$. The following result, which completes [58] Prop. 1.7, shows that the use of the convergence $\gamma$ in order to define the limit superior of a parametrized family of sets is sensible. We hope it will help researchers who are reluctant to leave the realm of sequential convergences (whose drawbacks are well-known, especially for closure operations).

Proposition 7 Let $F : T \rightsquigarrow X$ be a multifunction such that for some $t_0 \in T$ the set $F(t_0)$ is closed in $X_0$. Then the following assertions are equivalent:
(a) $F$ is stable (or closed) at $t_0$ when $X$ is endowed with the convergence $\gamma$;
(b) for each weak'-compact subset K of X the multifunction FK given by Fx{t) = F(t) D K is upper continuous at t0 (in the classical sense recalled in the following section); (c) F is continuous at to when the space V(X) of subsets of X is endowed with the topology generated by the sets [Kc]+ = {A € V{X) : A C X\K} for K in the family of weak? compact subsets of X ; (d) F is upper hemicontinuous at t0 for the excesses e°, r £ P •' for each r £ P one has e°(F(t),F(t0)) -.0flsf-tfo. Proof. The equivalence (a) •»(b)-«'(c) are analogous to [58] Prop. 1.7. Let us show (d) =>(a). Let (t,), e / —» t0 in T, (x,),e/ -^ x with a;,- 6 F(f,) for each i £ I. We have to show that x0 £ F(t0). Taking a subnet if necessary we may assume there exists r £ M+ such that z, £ rBx for each i £ / . Then we have d0(x{, F(t0)) —> 0, so that there exists («i)»el m F(t0) such that ||x,- — 2,-||0 —» 0. Then both (x,), € / and (2,),gj converge to xo in X 0 . Since F(t0) is closed in X0 we get x 0 G F(t0). Now suppose (a) holds but (d) does not hold : for some r, a in P and some net (<;),-£/ in T with limit to one has for each i £ / e°(F(t,),JF(t0))>a. Thus we can find (xi), g / in rfiy with X{ £ .F(£;), dofar^^fo)) > a for each z £ / . Since rBx is weak*-compact, (i,)i € i has a subnet (ijjjgj which converges to some xo £ rBx for the weak* topology. By (a) we have x0 £ F(t0). Since (xj)jej also converges for the norm ||.|| 0 we get a contradiction with d0(xj,F(to)) > a. □ We hope the preceding result will incite researchers to treat concrete examples (which abound) with the help of the convergence 7 and of the mixed excesses e°. The following result which relates the convergence 7 with the strong and the weak* convergences is a slight generalization of [101] Prop. 3.2. Here we say that a subset C of X is shadowy with respect to (w.r.t.) some point x £ X if x + [1, oo[(C — x) C C. A cone C is always shadowy w.r.t. 0 and when convex it is shadowy w.r.t. 
any $x \in C \cap (-C)$; the lower subdifferential of a quasiconvex function ([133]) is shadowy w.r.t. 0.

Proposition 8 Let C be a weakly* locally compact subset of X, let $x \in C$ and let $(x_i)_{i \in I}$ be a net in C. Among the following assertions one has (a) ⇔ (b) ⇐ (c); if C is shadowy w.r.t. x the three assertions are equivalent:
(a) $(x_i)_{i \in I} \xrightarrow{\gamma} x$;
(b) $(x_i) \to x$ in the weak* topology;
(c) $\|x_i - x\| \to 0$.

Proof. Clearly (a) ⇒ (b) and (c) ⇒ (a). We also have (b) ⇒ (a), since x has a neighborhood V in the weak* topology such that $V \cap C$ is weak*-compact and there exists $h \in I$ such that $x_i \in V \cap C$ for $i \geq h$.
It remains to show that (a) ⇒ (c) when C is shadowy w.r.t. x. Suppose on the contrary that there exist $s > 0$ and a cofinal subset J of I such that $\|x_j - x\| > s$ for each $j \in J$. Let V be a neighborhood of x in the weak* topology σ such that $V \cap C$ is weak*-compact, hence bounded. Let $r > 0$ be such that $\|v - x\| \leq r$ for each $v \in V \cap C$; we may suppose $r > s$. Let $V' = x + s r^{-1}(V - x)$; then $V'$ is a neighborhood of x and, for each $v' \in V' \cap C$, setting $v := x + r s^{-1}(v' - x) \in V$, we have $v \in C$ since C is shadowy w.r.t. x, hence $\|v - x\| \leq r$ and $\|v' - x\| \leq s$. It follows that $x_j \notin V' \cap C$ for $j \in J$, a contradiction with $(x_j) \to x$ weakly*. □

The reader will find information about the use of weakly* locally compact convex sets in [48]; let us mention that this class includes the following example, in which r is a positive constant, K is a compact subset of Y and Y is a Banach space:
$$C := \{x \in X : h_K(x) := \max_{y \in K} \langle x, y \rangle \geq r \|x\|\}.$$
Such cones will be called Bishop-Phelps cones or sharp cones (see [8] for one of their uses). We provide a simple proof which brings some more information.

Lemma 5 Any Bishop-Phelps cone as above is closed and locally compact in the bounded weak* topology $\sigma_b$ and in the weak* topology σ.

Proof. The case of the topology σ is treated in [101] Prop. 3.5. Since $h_K$ is one of the seminorms defining the topology $\sigma_b$, and since the dual norm is l.s.c. for σ, hence for $\sigma_b$, the set C is closed for $\sigma_b$. Setting $V := K^0 = h_K^{-1}(]-\infty, 1])$, a neighborhood of 0 for
Proposition 9 Suppose F is a subset of a n.v.s. X. Then the Fréchet normal cone to F at $x \in F$ is the polar of the weak tangent cone to F at x in the duality between $X^*$ and $X^{**}$:
$$N^-(F, x) = (T^\gamma(F, x))^0.$$
Proof. Given $x^* \in N^-(F, x)$ and $v^{**} = \gamma\text{-}\lim\, t_i^{-1}(x_i - x) \in T^\gamma(F, x)$ with $(t_i)_{i \in I} \to 0_+$ and $(v_i) := (t_i^{-1}(x_i - x))$ eventually bounded, we see that $(r_i)_{i \in I} := (\|x_i - x\|)_{i \in I} \to 0$ and we may assume that $(q_i) := (t_i^{-1} r_i)$ has a finite limit q; then we have
$$\langle x^*, v^{**} \rangle = \lim\, q_i \langle x^*, r_i^{-1}(x_i - x) \rangle \leq 0.$$
Conversely, if $x^* \in X^* \setminus N^-(F, x)$ we can find a sequence $(x_n)$ in $F \setminus \{x\}$ with limit x and $a > 0$ such that $\langle x^*, x_n - x \rangle > a \|x_n - x\|$ for each n. Let $v^{**}$ be a weak** cluster point of the sequence $(\|x_n - x\|^{-1}(x_n - x))$. Then $\langle x^*, v^{**} \rangle \geq a$, $v^{**} \in T^\gamma(F, x)$ and $x^* \notin (T^\gamma(F, x))^0$. □
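As a sanity check (our own illustration, not part of the text): in a finite dimensional space, where the weak tangent cone reduces to the contingent cone, Proposition 9 can be verified directly. For $F = \mathrm{epi}\,|\cdot| = \{(x, y) \in \mathbb{R}^2 : y \geq |x|\}$ and the origin one finds

```latex
T(F,(0,0)) = F, \qquad
N^-(F,(0,0)) = F^0
  = \{(a,b) \in \mathbb{R}^2 : ax + by \le 0 \ \ \forall (x,y) \in F\}
  = \{(a,b) : b \le -|a|\},
```

since $b \le -|a|$ gives $ax + by \le |a|\,|x| + b y \le (|a| + b)\, y \le 0$ on F, in agreement with the polarity formula $N^-(F, x) = (T^\gamma(F, x))^0$.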
3 Stabilized Subdifferentials and Stabilized Normal Cones
Stability (also called closedness or upper semicontinuity) is a desirable feature for a multifunction. Let us recall that a multifunction $M : W \rightsquigarrow Z$ between two topological spaces is said to be stable (or closed or upper semicontinuous) at $w \in W$ if
$$\bigcap_{U \in \mathcal{O}(w)} \mathrm{cl}(M(U)) \subset M(w),$$
where cl denotes the closure and $\mathcal{O}(w)$ denotes the family of open neighborhoods of w in W. It is said to be upper (semi)continuous at w if for any open subset V of Z containing M(w) one can find $U \in \mathcal{O}(w)$ such that $M(U) \subset V$. Let us recall the following result, the first part of which is obvious, taking into account the fact that the multifunction M is stable at each point iff its graph is closed. Its last assertion follows from the fact that a multifunction with values in a compact set is upper continuous if and only if it is stable.

Lemma 7 [35] Let $F : W \rightsquigarrow Z$ be a multifunction between two topological spaces. There exists a smallest stable multifunction M whose graph contains the graph of F; it is obtained by taking the closure of the graph of F in the product space $W \times Z$. If F is densely defined and if for each $w \in W$ there exists some $U \in \mathcal{O}(w)$ such that F(U) is contained in a compact subset of Z, then there exists a (unique) smallest upper continuous multifunction N with nonempty compact values containing F. It is obtained by taking $N(w) = \bigcap_{U \in \mathcal{O}(w)} \mathrm{cl}(F(U))$. In fact, M and N coincide.
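In finite dimensions the closure-of-the-graph construction of Lemma 7 can be made concrete. The sketch below is our own illustration (the multifunction, the radii and the tolerance are invented for the demo, not taken from the text): it approximates the stabilized value of a deliberately unstable multifunction at a point by collecting limit values along sequences converging to that point.

```python
def F(w):
    # A deliberately unstable multifunction on R:
    # F(w) = {sign(w)} for w != 0, and F(0) = {0}.
    if w == 0:
        return {0.0}
    return {1.0 if w > 0 else -1.0}

def stabilized(F, w, radii=None):
    """Approximate the stabilized multifunction at w: collect all limits of
    values z_i in F(w_i) along sequences w_i -> w, i.e. read the closure of
    the graph of F at w (the smallest stable multifunction of Lemma 7)."""
    if radii is None:
        radii = [10.0 ** (-k) for k in range(1, 8)]
    values = set()
    for direction in (-1.0, 1.0):
        # one approaching sequence w_i -> w per side
        tail = [tuple(sorted(F(w + direction * r))) for r in radii]
        # keep values that persist along the tail (crude stand-in for a limit)
        if all(t == tail[0] for t in tail):
            values.update(tail[0])
    values.update(F(w))  # the graph itself is contained in its closure
    return values

print(stabilized(F, 0.0))  # both one-sided limits survive together with F(0)
```

At the unstable point 0 the stabilized value is {-1.0, 0.0, 1.0}, strictly larger than F(0): stabilization enlarges the values exactly where the original multifunction fails to be closed.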
When Z is a convergence space, it is still possible to consider stable multifunctions and to stabilize any multifunction by taking its closure; but the simple procedure described in the preceding lemma in order to stabilize a multifunction is no longer valid. However, taking the closure of a multifunction with respect to a convergence may bring useful properties. Unfortunately, convexity is lost in the process, but we will see that other important properties are preserved. Let us give a precise definition for the case we are interested in, i.e. the case of a dual Banach space endowed with the convergence γ described above.

Definition 1 Given a topological space W, a Banach space X and a multifunction $M : W \rightsquigarrow X^*$, the stabilized multifunction $\overline{M}$ associated to M is given by
$$\overline{M}(w) := \{x^* \in X^* : \exists (w_i)_{i \in I} \to w,\ \exists (x_i^*)_{i \in I} \xrightarrow{\gamma} x^*,\ \forall i \in I\ \ x_i^* \in M(w_i)\}.$$
When W = X and $M = \partial^? f$ for some $f \in LSC(X)$ and some subdifferential $\partial^?$, we denote by $\overline{\partial}^? f$ the multifunction obtained in this way, requiring furthermore that $(w_i)_{i \in I} \xrightarrow{f} w$, i.e. that $(w_i)_{i \in I} \to w$ and $(f(w_i))_{i \in I} \to f(w)$, and we call it the stabilized subdifferential of f. Let us observe that the added condition can be interpreted either by changing the topology of W or by taking the closure of the hypergraph of f given by $\{(x, f(x), x^*) : x \in X\}$ and taking the projection. When W = X and $M = N^?(F, \cdot)$ for some closed subset F of X and some normal cone $N^?$, we denote it by $\overline{N}^?(F, \cdot)$ and we call it the stabilized normal cone to F.

The question arises whether the stabilized subdifferential (resp. normal cone) multifunction associated with a coherent subdifferential (resp. normal cone) is still coherent. The following result presents conditions ensuring this property.

Proposition 10 (a) If $N^?$ is the normal cone associated with the subdifferential $\partial^?$ then $\overline{N}^?$ is the normal cone associated with the stabilized subdifferential $\overline{\partial}^?$.
(b) If $\partial^?$ is the subdifferential associated to the normal cone $N^?$ and if for any closed subset F of X, any $a \in F$ and any $p \in X$ such that $F + p \subset F$ one has $N^?(F, a + p) \subset N^?(F, a)$, then $\overline{\partial}^?$ is the subdifferential associated with the stabilized normal cone $\overline{N}^?$.
(c) Under the preceding assumptions, if $N^?$ (resp. $\partial^?$) is coherent then $\overline{N}^?$ (resp. $\overline{\partial}^?$) is coherent.

We note that the assumption in assertion (b) is satisfied in particular for the contingent, the incident and the Fréchet normal cones.

Proof. The first assertion is an immediate consequence of the definitions and the last one follows from the two preceding assertions. Let us prove (b). Given $f \in LSC(X)$, $x \in \mathrm{dom} f$ and $x^* \in \overline{\partial}^? f(x)$, we can find a net $(x_i)_{i \in I} \xrightarrow{f} x$ and a net $(x_i^*)_{i \in I} \xrightarrow{\gamma} x^*$ such that $x_i^* \in \partial^? f(x_i)$ for each $i \in I$ and $(f(x_i))_{i \in I} \to f(x)$. Then $((x_i^*, -1))_{i \in I} \xrightarrow{\gamma} (x^*, -1)$, $((x_i, f(x_i)))_{i \in I} \to (x, f(x))$ and, as $(x_i^*, -1) \in N^?(E_f, (x_i, f(x_i)))$ for each $i \in I$, we get $(x^*, -1) \in \overline{N}^?(E_f, (x, f(x)))$. Conversely, if $x^*$ satisfies this relation, we can
find a net $((x_i, r_i))_{i \in I} \to (x, f(x))$ in $E_f$ and a net $((x_i^*, s_i))_{i \in I} \xrightarrow{\gamma} (x^*, -1)$ such that $(x_i^*, s_i) \in N^?(E_f, (x_i, r_i))$ for each $i \in I$; then, for i large enough, we have $s_i < 0$ and, by our assumption, $(|s_i|^{-1} x_i^*, -1) \in N^?(E_f, (x_i, f(x_i)))$, so that $|s_i|^{-1} x_i^* \in \partial^? f(x_i)$ and $x^* \in \overline{\partial}^? f(x)$. □

The following result describes the way polarity behaves with respect to the preceding closure process. It completes and mimics [15] Th. 1.1.8, [58] Th. 2.2, [116] Lemma 2.13.

Proposition 11 Let W be a topological space, let $M : W \rightsquigarrow X^*$ be a multifunction whose values are closed convex cones and let $\overline{M}$ be its closure with respect to the convergence γ. Then, taking polars in the duality between $X^*$ and $X^{**}$, one has
$$(\overline{M}(w))^0 \cap X = \big(\liminf_{v \to w} M(v)^0\big) \cap X.$$
If moreover the values of M are weak*-closed, one can take polars in the duality between $X^*$ and X:
$$(\overline{M}(w))^0 = \liminf_{v \to w} M(v)^0.$$
Proof. The inclusion $(\liminf_{v \to w} M(v)^0) \cap X \subset (\overline{M}(w))^0$ follows from the continuity of the coupling functional $c : X^{**} \times X^* \to \mathbb{R}$ at each point of $X \times X^*$: when $(x_i^*) \xrightarrow{\gamma} x^*$ and $(x_i) \to x$ in X one has
$$|\langle x_i, x_i^* \rangle - \langle x, x^* \rangle| \leq |\langle x_i - x, x_i^* \rangle| + |\langle x, x_i^* - x^* \rangle| \to 0.$$
Now if $x \in X \setminus \liminf_{v \to w} M(v)^0$ we can find $r > 0$ and a net $(w_i)_{i \in I} \to w$ such that $B^{**}(x, r) \cap M(w_i)^0 = \emptyset$ for each $i \in I$. As $B^{**}(x, r)$ is weak**-compact, the Hahn-Banach theorem provides some $x_i^* \in M(w_i)$ with norm one such that $\langle x_i^*, v^{**} \rangle \geq 0$ for each $v^{**} \in B^{**}(x, r)$.
Taking a subnet if necessary we may assume $(x_i^*)_{i \in I}$ has a weak* limit $x^*$. As this net is bounded, the limit is valid for the convergence γ and $x^* \in \overline{M}(w)$. Since the preceding inequality implies $\langle x_i^*, x \rangle \geq r$ for each $i \in I$, we get $\langle x^*, x \rangle \geq r$ and $x \notin (\overline{M}(w))^0$. When the values of M are weak*-closed, we may replace the ball $B^{**}(x, r)$ by the open ball U(x, r) of X and get $x^* \in M(w_i)^{00} = M(w_i)$. □

Corollary 3 The polar (in X) of the stabilized contingent normal cone $\overline{N}(F, x)$ to a closed subset F of X at $x \in F$ is $(\liminf_{x' \xrightarrow{F} x} \overline{\mathrm{co}}(T(F, x'))) \cap X$. The polar (in X) of the stabilized Fréchet normal cone $\overline{N}^-(F, x)$ to a closed subset F of X at $x \in F$ is $(\liminf_{x' \xrightarrow{F} x} \overline{\mathrm{co}}(T^\gamma(F, x'))) \cap X$. If X is reflexive one has $T^\gamma(F, x) = (\overline{N}^-(F, x))^0$ and $\overline{N}^-(F, x) = \overline{\mathrm{co}}(\overline{N}(F, x))$.
Proof. The first assertion follows from the proposition and the fact that $\overline{\mathrm{co}}(T(F, x')) = (T(F, x'))^{00} = (N(F, x'))^0$. The second one is proved similarly, using $\overline{\mathrm{co}}(T^\gamma(F, x')) = (T^\gamma(F, x'))^{00} = (N^-(F, x'))^0$. The last assertion is a consequence of the equality $\liminf_{x' \xrightarrow{F} x} \overline{\mathrm{co}}(T^\gamma(F, x')) = T^\gamma(F, x)$ in [43] Th. 3.1, taking into account that in a reflexive Banach space the weak contingent cone $T^\gamma(F, x')$ coincides with the sequential weak tangent cone. □

The following result compares our definition with related ones.

Proposition 12 Let $M : W \rightsquigarrow X^*$ be a multifunction and let $\overline{M}^{\sigma}$ (resp. $\overline{M}^{\sigma}_{\mathrm{seq}}$) be its closure (resp. sequential closure) with respect to the weak* topology σ. Then for each $w \in W$ one has
$$\overline{M}(w) \subset \overline{M}^{\sigma}_{\mathrm{seq}}(w) \subset \overline{M}^{\sigma}(w).$$
If there exist a neighborhood U of w and a weak*-closed, weak*-locally compact subset C of $X^*$ such that $M(u) \subset C$ for each $u \in U$, then the last inclusion is an equality. If moreover X is reflexive, both inclusions are equalities.

Proof. Let $x^* \in \overline{M}^{\sigma}(w)$. Since $(U \times X^*) \cap \mathrm{graph}\, M \subset U \times C$, and since C is closed for σ, we have $x^* \in C$. Let V be a neighborhood of $x^*$ for σ such that $V \cap C$ is compact for σ. If $((w_i, x_i^*))_{i \in I}$ is a net in $\mathrm{graph}\, M$ with limit $(w, x^*)$, the net $(x_i^*)$ is eventually in $V \cap C$, hence is convergent for γ, so that $x^* \in \overline{M}(w)$. The last assertion is proved in [101] and is based on Whitley's construction. □

Ph. Loewen further shows ([101] Prop. 3.7) that if F is epi-Lipschitzian (i.e. satisfies the cone condition [2]) or compactly epi-Lipschitzian in the sense of [34], then the preceding condition is satisfied for the multifunction $N(F, \cdot)$ on W := F. Then we say that F is a Loewen set or that F satisfies the (LC) condition. The epi-Lipschitzian conditions can be relaxed in the following way.

Proposition 13 Let F be a closed subset of X which is compactly tangentially determined near $x \in F$ in the following sense: there exist $r > 0$, a compact subset K of X and a neighborhood U of x such that for any $u \in F \cap U$ one has $rB \subset T(F, u) + K$, where B is the closed unit ball of X. Then $\overline{N}(F, x)$ coincides with the multifunction $\overline{N}^{\sigma}(F, x)$ obtained by stabilizing $N(F, \cdot)$ with respect to σ.

Proof. This follows from the preceding proposition and the fact that N(F, u) is contained in a fixed Bishop-Phelps cone. □

Any closed subset of a finite dimensional n.v.s. is obviously compactly tangentially determined (take r = 1, K = B). It is also the case if there exist a neighborhood U of x and $r > 0$, $v \in X$ such that $B(v, r) \subset T(F, u)$ for each $u \in F \cap U$ (take $K = \{-v\}$), or if F is compactly epi-Lipschitzian in the sense of [34].

Corollary 4 If f is a Lipschitzian function or, more generally, a function whose epigraph satisfies the (LC) condition, then the weak* closure and the γ-closure of $\partial f$ coincide:
$$\overline{\partial}^{\sigma} f = \overline{\partial} f.$$
In particular, for each $x \in X$ the set $\overline{\partial} f(x)$ is weak*-closed and the multifunction $\overline{\partial} f(\cdot)$ is closed. If moreover the space X is reflexive, then the sequential weak* closure of $\partial f$ coincides with the other two closures.

In usual spaces it is not necessary to distinguish between the stabilized Fréchet subdifferential and the stabilized contingent subdifferential when dealing with locally Lipschitzian functions. We need some definitions to present this fact.

Definition 2 Given a subdifferential $\partial^?$, a Banach space X is said to be dependable for $\partial^?$, or $\partial^?$-dependable, if for any l.s.c. functions $f, g : X \to \mathbb{R} \cup \{\infty\}$ with g locally Lipschitzian and any $x \in \mathrm{dom} f$, any $\varepsilon > 0$ and any $x^* \in \partial^?(f + g)(x)$, there exist $u, v \in B(x, \varepsilon)$, $u^* \in \partial^? f(u)$, $v^* \in \partial^? g(v)$ with $|f(u) - f(x)| < \varepsilon$, $|g(v) - g(x)| < \varepsilon$, $\|u^* + v^* - x^*\| < \varepsilon$. If in the preceding condition g is supposed to be convex and Lipschitzian, we say that X is C-dependable for $\partial^?$. We call dependable the spaces which are ∂-dependable, i.e. dependable for the contingent subdifferential.

It has been shown that a space is $\partial^-$-dependable, i.e. dependable for the Fréchet subdifferential, iff it is trustworthy in the sense of [81], iff it is an Asplund space, iff its dual satisfies the Radon-Nikodym property (see [62], [71]-[73], [81]). In particular, reflexive Banach spaces and separable Banach spaces are trustworthy. Thus, this class of spaces is important. Moreover, it can easily be characterized in terms of separable subspaces. It is shown in [61] that the class of LC¹-bumpable spaces, i.e. the class of spaces for which there exists a non null Lipschitzian function of class C¹ with bounded support, is $\partial^-$-dependable and in fact $\partial^v$-dependable, where $\partial^v$ is the viscosity subdifferential defined in the following way: $x^*$ belongs to $\partial^v f(x)$ iff there exists a function g of class C¹ such that $g'(x) = x^*$ and f − g attains its minimum at x.
The situation for dependable and C-dependable spaces is not as clear yet; it is likely that these classes are much more restricted. On the contrary, the following class is at least as large as the class of trustworthy spaces: if we substitute in it the Fréchet subdifferential for the contingent subdifferential, we obtain exactly the class of Asplund spaces (see [127]). A similar reason shows that this class is contained in the class of spaces on which there exists a bump function of class T¹ in the sense of Proposition 17 below. Moreover it is contained in the class of Gâteaux differentiability spaces in the sense of [132].

Definition 3 A Banach space is said to be reliable or an R-space if for any l.s.c. function $f : X \to \mathbb{R} \cup \{\infty\}$, any $g : X \to \mathbb{R}$ convex and Lipschitzian, any $x \in \mathrm{dom} f$ at which f + g attains a local minimum, and any $\varepsilon > 0$, there exist $u, v \in B(x, \varepsilon)$, $u^* \in \partial f(u)$, $v^* \in \partial g(v)$ with $|f(u) - f(x)| < \varepsilon$, $|g(v) - g(x)| < \varepsilon$, $\|u^* + v^*\| < \varepsilon$.

The proof of the coincidence result we announced is analogous to the proof of [88] Lemma 4.
Proposition 14 Let X be an Asplund space and let f be a locally Lipschitzian function on X. Then $\overline{\partial} f$ coincides with the multimapping $\overline{\partial}^- f$ obtained by stabilizing the Fréchet subdifferential $\partial^- f$.

The stabilization procedure we have presented is a simple prototype. There are several other ways of stabilizing subdifferentials; in particular, one may consider a stabilization procedure through the function itself considered as a variable (see [75], [152], [153] for instance). It may also be useful to introduce slightly more complicated processes involving ε-subdifferentials or restrictions to finite dimensional subspaces; let us describe them shortly, since they follow a pattern similar to what precedes. The (contingent) ε-subdifferential of f is given by
$$\partial_\varepsilon f(x) := \{x^* \in X^* : \forall v \in X\ \ f'(x, v) \geq \langle x^*, v \rangle - \varepsilon \|v\|\},$$
where
$$f'(x, v) := \liminf_{t \to 0_+,\ u \to v} t^{-1}(f(x + tu) - f(x))$$
is the contingent derivative of f. In nice spaces the regularization procedure using this approximate subdifferential coincides with the one we presented.

Proposition 15 For any l.s.c. function f on a C-dependable space (resp. reliable space, resp. trustworthy space) X and for any $x \in \mathrm{dom} f$ one has
$$\overline{\partial} f(x) = \gamma\text{-}\limsup_{u \xrightarrow{f} x,\ \varepsilon \to 0_+} \partial_\varepsilon f(u)$$
(resp. $\overline{\partial} f(x) \supset \gamma\text{-}\limsup_{u \xrightarrow{f} x,\ \varepsilon \to 0_+} \partial_\varepsilon^- f(u)$, resp. $\overline{\partial}^- f(x) = \gamma\text{-}\limsup_{u \xrightarrow{f} x,\ \varepsilon \to 0_+} \partial_\varepsilon^- f(u)$).
Proof. Let us prove the first assertion, and then indicate the necessary changes for the second one. It suffices to show that any $x^*$ of the right hand side of this equality belongs to $\overline{\partial} f(x)$. One can find nets $(\varepsilon_i)_{i \in I} \to 0_+$, $(x_i)_{i \in I} \to x$, $(x_i^*)_{i \in I} \to x^*$ weakly* with $(f(x_i))_{i \in I} \to f(x)$, $(x_i^*)_{i \in I}$ bounded and $x_i^* \in \partial_{\varepsilon_i} f(x_i)$ for each $i \in I$. Setting $g_i(u) := \|u - x_i\|$, we see that $x_i^* \in \partial(f + \varepsilon_i g_i)(x_i)$, hence $x_i^* \in u_i^* + 2\varepsilon_i B^*$ with $u_i^* \in \partial f(u_i)$ for some $u_i \in B(x_i, \varepsilon_i)$ satisfying $|f(u_i) - f(x_i)| < \varepsilon_i$. Thus $(u_i^*)_{i \in I} \to x^*$ weakly* and is bounded, and $(u_i)_{i \in I} \to x$ with $(f(u_i))_{i \in I} \to f(x)$: $x^* \in \overline{\partial} f(x)$. When $x^*$ belongs to the right hand side of the second equality we can take $x_i^* \in \partial_{\varepsilon_i}^- f(x_i)$ for each $i \in I$. Then $x_i$ is a local minimizer of $f + 2\varepsilon_i g_i - x_i^*$ and reliability (resp. trustworthiness) enables us to conclude as above. □

The proof of the following result is of the same type and is omitted.

Proposition 16 For a l.s.c. function $f : X \to \mathbb{R} \cup \{\infty\}$ and a locally Lipschitzian function g on a dependable (resp. trustworthy) space X, for any $x \in \mathrm{dom} f$ one has
$$\overline{\partial}(f + g)(x) \subset \overline{\partial} f(x) + \overline{\partial} g(x) \qquad (\text{resp. } \overline{\partial}^-(f + g)(x) \subset \overline{\partial}^- f(x) + \overline{\partial}^- g(x)).$$
Therefore the stabilization process we used provides useful calculus rules. For more on this subject in a similar framework, see [107], [108], [88]. Let us note that, contrary to the A-subdifferential considered in [83], [84], [88], the stabilized subdifferential we introduce here coincides with the usual subdifferential in the convex case and in the case of a function of class C¹ (or even of class T¹ as defined in [129]).

Proposition 17 Suppose f = g + h, where $g : X \to \mathbb{R} \cup \{\infty\}$ is convex and l.s.c. and h is of class T¹, i.e. is continuous and Gâteaux differentiable with a locally bounded derivative which is continuous for the weak*-topology. Then for each $x \in \mathrm{dom}\, g$ one has
$$\overline{\partial} f(x) = \partial g(x) + h'(x).$$
Proof. It suffices to show that any $x^* \in \overline{\partial} f(x)$ belongs to the right hand side of this equality. One can find nets $(x_i)_{i \in I} \to x$, $(x_i^*)_{i \in I} \to x^*$ weakly* with $(f(x_i))_{i \in I} \to f(x)$, $(x_i^*)_{i \in I}$ bounded and $x_i^* \in \partial f(x_i) = \partial g(x_i) + h'(x_i)$ for each $i \in I$, by [129], Prop. 1.5. Then $(y_i^*)_{i \in I} := (x_i^* - h'(x_i))_{i \in I}$ is bounded and weak* converges to $y^* := x^* - h'(x)$. From the continuity of the coupling functional we conclude that $y^* \in \partial g(x)$. □

Another stabilization process consists in using restrictions to finite dimensional subspaces. Let us give a short account of it; here we replace the weak convergence of [83] by γ-convergence. We denote by $\mathcal{F}$ the directed family of finite dimensional subspaces of X.

Definition 4 For a l.s.c. extended real-valued function f on an arbitrary Banach space X the finitely stabilized subdifferential of f at $x \in \mathrm{dom} f$ is given by
$$\widetilde{\partial} f(x) := \bigcap_{F \in \mathcal{F}} \gamma\text{-}\limsup_{u \xrightarrow{f} x} \partial f_{u+F}(u),$$
where $f_{u+F}(w) := f(w)$ if $w \in u + F$, $+\infty$ otherwise.
The interest of such a modification lies in the fact that the nice calculus rules of [83] are preserved; in particular, for an arbitrary Banach space X, for any $f : X \to \mathbb{R} \cup \{\infty\}$ l.s.c. and finite at x and for $g : X \to \mathbb{R}$ locally Lipschitzian, one has
$$\overline{\partial}^A (f + g)(x) \subset \overline{\partial}^A f(x) + \overline{\partial}^A g(x).$$
Moreover the following new property holds.

Proposition 18 If f is an arbitrary convex function on the Banach space X, for each $x \in \mathrm{dom} f$ one has $\widetilde{\partial} f(x) = \overline{\partial} f(x) = \partial f(x)$.

Proof. Since for each $u \in \mathrm{dom} f$ and each $F \in \mathcal{F}$ one has $\partial f(u) \subset \partial f_{u+F}(u)$, the inclusion $\partial f(x) \subset \widetilde{\partial} f(x)$ holds. Conversely, given $x^* \in \widetilde{\partial} f(x)$, let us show that for each $w \in X$ one has
$$f(w) \geq f(x) + \langle x^*, w - x \rangle.$$
We pick $F \in \mathcal{F}$ containing w and x, a net $(x_i)_{i \in I} \xrightarrow{f} x$ and a bounded net $(x_i^*)_{i \in I}$ with weak* limit $x^*$ such that $x_i^* \in \partial f_{x_i + F}(x_i)$ for each $i \in I$, so that $x_i \in F$ for each $i \in I$. Then, as the restriction of f to F is convex and as the contingent subdifferential coincides with the ordinary subdifferential in this case, we have
$$f(w) \geq f(x_i) + \langle x_i^*, w - x_i \rangle$$
and taking limits we get $x^* \in \partial f(x)$ thanks to the continuity of the coupling functional. The second equality is proved similarly. □

Up to now, we have not used other existing links between the analytical approach and the geometrical approach, such as the distance function. The reason is that this last means does not give as accurate a connection as the indicator function does (see [37]). For a similar reason we have not treated here the case of proximal normal cones $N^p$, which reflect the metric properties of the space (and the set) more than its linear structure; see [39], [43], [44], [53], [88], for instance, in this connection. However, the following result, akin to results in [88], is worth noting. Let us note that it does not ensure that the stabilized normal cone to F is weak*-closed, unlike the last assertion of the preceding corollary.

Proposition 19 Suppose X has a smooth norm. Let $d_F$ be the distance function associated to the closed set F: $d_F(x) := \inf_{y \in F} \|x - y\|$. Then for each $x \in F$ one has
$$\overline{N}(F, x) = \mathbb{R}_+\, \overline{\partial} d_F(x).$$
Proof. The inclusion $\overline{N}(F, x) \supset \mathbb{R}_+ \overline{\partial} d_F(x)$ is obvious, provided one makes an appropriate adaptation of [88] Lemma 5. Now, given $x^* \in \overline{N}(F, x)$, we can find a net $(x_i)_{i \in I}$ in F with limit x and a bounded net $(x_i^*)_{i \in I}$ in $X^*$ with weak* limit $x^*$ such that $x_i^* \in N(F, x_i)$ for each $i \in I$. Since $\partial d_F(u) = N(F, u) \cap B^*(0, 1)$ for each $u \in F$, as is easily checked (see also [88] Lemma 3), we can write $x_i^* = r_i u_i^*$, with $r_i := \|x_i^*\|$, $u_i^* \in \partial d_F(x_i)$. Without loss of generality we may assume that $(r_i)$ and $(u_i^*)$ converge to some r and $u^*$ respectively, with $u^* \in \overline{\partial} d_F(x)$. Then $x^* = r u^*$ and the reverse inclusion is proved. □
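In a smooth finite dimensional setting the identity of Proposition 19 can be checked by hand. The snippet below is our own toy computation (not from the text): F is the closed unit disk in $\mathbb{R}^2$, so $d_F(y) = \max(\|y\| - 1, 0)$, and a numerical gradient of $d_F$ taken just outside F is compared with the unit outward normal at a boundary point, i.e. with the normal cone direction that $\mathbb{R}_+ \overline{\partial} d_F$ recovers.

```python
import math

def d_F(y):
    # Distance to the closed unit disk F = {y : |y| <= 1} in R^2.
    return max(math.hypot(*y) - 1.0, 0.0)

def num_grad(f, y, h=1e-6):
    # Central-difference gradient of f at y.
    g = []
    for k in range(len(y)):
        yp = list(y); ym = list(y)
        yp[k] += h; ym[k] -= h
        g.append((f(yp) - f(ym)) / (2 * h))
    return g

x = (math.cos(0.3), math.sin(0.3))        # a boundary point of F
outside = [1.001 * x[0], 1.001 * x[1]]    # a point slightly outside F
g = num_grad(d_F, outside)
# Just outside F, grad d_F is the unit outward normal at x, so the rays
# R_+ * grad d_F reproduce the normal cone direction of Proposition 19.
err = math.hypot(g[0] - x[0], g[1] - x[1])
print(err < 1e-3)
```

The gradient direction is independent of how far outside F the sample point lies, which reflects the positive homogeneity on the right hand side of the formula.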
4 Alien Terms in Limit Problems
A connection between convergence theory and optimization problems which deserves comments and thought lies in the derivation of optimality conditions and in sensitivity problems. We observe that the most recent and efficient optimality conditions [55], [85], [86], [95], [121], [122], [125] involve additional unexpected terms. Similar phenomena occur in other fields such as homogenization theory [4], [31], [51], [59],
[111], [112], [138], from which we borrow the words "alien term", and elasticity [45], [50]. One may wonder whether such a fact is fortuitous or not. We believe it is not accidental. For the moment, this belief does not have firm grounds; we hope that these lines may be an incentive to find some sound reasons. However, we dispose of the following observations. Given a n.v.s. X, $x_0 \in X$ and an arbitrary function $f : X \to \overline{\mathbb{R}} = \mathbb{R} \cup \{\infty\}$ which attains its minimum on X at $x_0$, we observe that for each $t > 0$ the functions $f_t', f_t'' : X \to \overline{\mathbb{R}}$ given by
$$f_t'(v) = t^{-1}[f(x_0 + tv) - f(x_0)],$$
$$f_t''(v) = 2t^{-2}[f(x_0 + tv) - f(x_0) - \langle x_0^*, tv \rangle]$$
with $x_0^* = 0$ attain their minimum at 0. If the family $(f_t')_{t>0}$ (resp. $(f_t'')_{t>0}$) epi-converges to some function $f_{x_0}'$ (resp. $f_{x_0, x^*}''$), then one has $f_{x_0}'(v) \geq 0$ (resp. $f_{x_0, x^*}''(v) \geq 0$) for each $v \in X$. In fact this inequality holds for the lower epi-limit of the family $(f_t')_{t>0}$ (resp. $(f_t'')_{t>0}$) without supposing epi-convergence. This stems from the fact that the epigraphs $E(f_t)$ of $f_t$, for $f_t = f_t'$ or $f_t''$, are contained in $X \times \mathbb{R}_+$, so that their limit superior (in the sense of the preceding section) is contained in $X \times \mathbb{R}_+$. It follows that the necessary condition $f_{x_0}'(\cdot) \geq 0$ (resp. $f_{x_0, x^*}''(\cdot) \geq 0$) involves the calculation of some epi-limits which may differ from the pointwise limits (see [57], [125] for instance). On the other hand, when one considers the limit problem of a family of equations of the form
$$(E_t) \qquad F_t(v) = 0$$
or inequations of the form
$$(VI_t) \qquad \langle F_t(v), u - v \rangle \geq 0 \quad \forall u \in C,$$
one also takes graphical limits which involve additional terms. These "alien terms" are difficult to interpret and depend on the specific problem at hand. However, one may wonder whether there exist natural links between the extra terms in optimality conditions and the "alien terms" of the limit problem of $(E_t)$ or $(VI_t)$. This question arises naturally when, for instance, $(VI_t)$ is the Euler equation of some parametrized minimization problem
$$(P_t) \qquad \text{minimize } f_t(x) : x \in C.$$
The observation above shows that problems of this type appear in the derivation of first order and second order optimality conditions. The relationships between convergence of parametrized families of convex functionals and convergence of their subdifferentials detected in [4], [7], [110], [123] yield some hints. But much more is to be done, from a theoretical point of view as well as from an applied point of view.
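The nonnegativity of the difference quotients $f_t'$, $f_t''$ at a minimizer, used above, is elementary and can be checked numerically. The sketch below uses our own toy data ($f(x) = x^2 + |x|$ on $\mathbb{R}$, minimized at $x_0 = 0$, with $x_0^* = 0$); it is an illustration of the formulas, not part of the text.

```python
def f(x):
    # toy function attaining its minimum on R at x0 = 0
    return x * x + abs(x)

x0, xstar0 = 0.0, 0.0  # the minimizer and the multiplier x0* = 0 of the text

def f1(t, v):
    # first-order quotient f'_t(v) = t^{-1} [f(x0 + t v) - f(x0)]
    return (f(x0 + t * v) - f(x0)) / t

def f2(t, v):
    # second-order quotient f''_t(v) = 2 t^{-2} [f(x0 + t v) - f(x0) - <x0*, t v>]
    return 2.0 * (f(x0 + t * v) - f(x0) - xstar0 * t * v) / (t * t)

samples = [(t, v) for t in (1.0, 0.1, 0.01) for v in (-2.0, -0.5, 0.0, 0.7, 3.0)]
print(all(f1(t, v) >= 0.0 and f2(t, v) >= 0.0 for t, v in samples))  # True
```

Here $f_t'(v) = t v^2 + |v|$ and $f_t''(v) = 2v^2 + 2|v|/t$, so both families are nonnegative for every $t > 0$, while their pointwise limits ($|v|$ and $+\infty$ off 0) already hint at why epi-limits, not pointwise limits, are the right notion.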
Sensitivity analysis represents another promising field for such a study: whenever the constraints are non polyhedral, additional terms should be added in the expression of the derivatives of the performance function. The fact that most problems with constraints involving partial differential equations are governed by non polyhedral sets in some functional spaces justifies an interest in such intricate matters.
References

[1] J. F. Aarnes and P. R. Andenaes, On nets and filters, Mathematica Scandinavica 31 (1972) 285-292.
[2] S. Agmon, Lectures on Elliptic Boundary Value Problems, Van Nostrand Mathematics Studies 2 (1965), Princeton, N.J.
[3] E. Asplund and R. T. Rockafellar, Gradients of convex functions, Transactions of the American Mathematical Society 139 (1969) 443-467.
[4] H. Attouch, Variational Convergence for Functions and Operators, Pitman, Boston (1984).
[5] H. Attouch, D. Aze and G. Beer, On some inverse stability problems for the epigraphical sum, Journal of Nonlinear Analysis Theory, Methods and Applications 16 (1991) 241-254.
[6] H. Attouch, R. Lucchetti and R. J.-B. Wets, The topology of the p-Hausdorff distance, Annali di Matematica Pura ed Applicata 160 (1991) 303-320.
[7] H. Attouch, J.-L. Ndoutoume and M. Thera, Epigraphical convergence of functions and convergence of their derivatives in Banach spaces, Seminaire d'Analyse Convexe, Montpellier, Expose No. 9, 1990.
[8] H. Attouch and H. Riahi, Stability results for Ekeland's ε-variational principle and cone extremal solutions, Seminaire d'Analyse Convexe 20, Montpellier, Expose 5 (1990).
[9] H. Attouch and R. J.-B. Wets, Epigraphical analysis, in: Analyse non lineaire, H. Attouch et al. (eds), Gauthier-Villars, Paris (1989) 73-100.
[10] H. Attouch and R. J.-B. Wets, Isometries for the Legendre-Fenchel transform, Transactions of the American Mathematical Society 296 (1986) 33-60.
[11] H. Attouch and R. J.-B. Wets, Quantitative stability of variational systems: I. The epigraphical distance, Transactions of the American Mathematical Society 328 (2) (1992) 695-729.
[12] H. Attouch and R. J.-B. Wets, Quantitative stability of variational systems: II. A framework for nonlinear conditioning, SIAM Journal on Optimization, 1992.
[13] H. Attouch and R. J.-B. Wets, Quantitative stability of variational systems: III. ε-approximate solutions, Preprint (Oct. 1987), IIASA, Laxenburg, Austria.
[14] J.-P. Aubin and I. Ekeland, Applied Nonlinear Analysis, Wiley-Interscience, New York (1984).
[15] J.-P. Aubin and H. Frankowska, Set-Valued Analysis, Birkhauser, Basel (1990).
[16] D. Aussel, J.-N. Corvellec and M. Lassonde, Mean value theorem and subdifferentiability criteria for lower semicontinuous functions, Transactions of the American Mathematical Society, to appear.
[17] D. Aze and J.-P. Penot, Recent quantitative results about the convergence of convex sets and functions, in: Functional Analysis and Approximation, P. L. Papini (ed), Pitagora, Bologna (1989) 90-110.
[18] D. Aze and J.-P. Penot, Operations on convergent families of sets and functions, Optimization 21 (1990) 521-534.
[19] D. Aze and J.-P. Penot, Qualitative results about the convergence of convex sets and convex functions, in: Optimization and Nonsmooth Analysis, A. D. Ioffe et al. (eds), Pitman Research Notes 244, Longman, Harlow (1992) 1-25.
[20] D. Aze and J.-P. Penot, The Joly topology and the Mosco-Beer topology revisited, Bulletin of the Australian Mathematical Society 48 (1993) 353-363.
[21] D. Aze and A. Rahmouni, Lipschitz behavior of the Legendre-Fenchel transform, Preprint, University of Perpignan, 1992.
[22] D. Aze and A. Rahmouni, Intrinsic bounds for Kuhn-Tucker points of perturbed convex programs, Preprint, University of Perpignan, 1992.
[23] G. Beer, On Mosco convergence of convex sets, Bulletin of the Australian Mathematical Society 38 (1988) 239-253.
[24] G. Beer, On the Young-Fenchel transform for convex functions, Proceedings of the American Mathematical Society 104 (1988) 1115-1123.
[25] G. Beer, Conjugate convex functions and the epi-distance topology, Proceedings of the American Mathematical Society 108 (1990) 117-126.
[26] G. Beer, Hyperspaces of a metric space: an overview, Preprint (1990).
[27] G. Beer, Mosco convergence and weak topologies for convex sets and functions, Mathematika 38 (1991) 89-104.
[28] G. Beer, Topologies on closed and convex sets and the Effros measurability of set-valued functions, Seminaire d'Analyse Convexe, Montpellier 21 (1991), Expose No. 2.
[29] G. Beer, The slice topology: a viable alternative to Mosco convergence in nonreflexive spaces, Nonlinear Analysis Theory, Methods and Applications 19 (1992) 271-290.
[30] G. Beer and J. M. Borwein, Mosco convergence and reflexivity, Proceedings of the American Mathematical Society 109 (1990) 427-436.
[31] A. Bensoussan, J.-L. Lions and G. C. Papanicolaou, Asymptotic Analysis for Periodic Structures, North Holland, Amsterdam, 1978.
[32] G. Beer and J. M. Borwein, Mosco convergence of level sets and graphs of linear functionals, Journal of Mathematical Analysis and Applications 175 (1993) 53-67.
[33] J. Birge and Liqun Qi, Semiregularity and generalized subdifferentials with applications to optimization, Mathematics of Operations Research 18 (4) (1993) 982-1005.
[34] J. M. Borwein, Epi-Lipschitz-like sets in Banach spaces: theorems and examples, Journal of Nonlinear Analysis Theory, Methods and Applications 11 (1987) 1207-1217.
[35] J. M. Borwein, Minimal cuscos and subgradients of Lipschitz functions, in: Fixed Point Theory and its Applications, J.-B. Baillon and M. Thera (eds), Pitman Lecture Notes in Maths, Longman, Essex (1991) 57-82.
[36] J. M. Borwein, Differentiability properties of convex, of Lipschitz, and of semicontinuous mappings on Banach spaces, in: Optimization and Nonlinear Analysis, A. Ioffe, M. Marcus and S. Reich (eds), Pitman Research Notes in Math. 244, Longman (1992) 39-52.
[37] J. M. Borwein and M. Fabian, A note on regularity of sets and of distance functions in Banach spaces, Journal of Mathematical Analysis and Applications 182 (2) (1994) 566-570.
[38] J. M. Borwein, S. P. Fitzpatrick and J. R. Giles, The differentiability of real functions on normed linear spaces using generalized subgradients, Journal of Mathematical Analysis and Applications 128 (2) (1987) 512-534.
[39] J. M. Borwein and J. R. Giles, The proximal normal formula in Banach spaces, Transactions of the American Mathematical Society 302 (1987) 371-381.
[40] J. M. Borwein and A. D. Ioffe, Proximal analysis in smooth spaces, Preprint, Simon Fraser University, Vancouver, 1994.
[41] J. M. Borwein and A. S. Lewis, Convergence of decreasing sequences of convex sets in nonreflexive Banach spaces, Preprint, University of Waterloo, October 1992.
[42] J. M. Borwein and D. Preiss, A smooth variational principle with applications to subdifferentiability and to differentiability of convex functions, Transactions of the American Mathematical Society 303 (1987) 513-527.
[43] J. Borwein and H. Strojwas, Proximal analysis and boundaries of closed sets in Banach spaces, Part I: Theory, Canadian Journal of Mathematics 38 (1986) 431-452.
[44] J. Borwein and H. Strojwas, Proximal analysis and boundaries of closed sets in Banach spaces, Part II: Applications, Canadian Journal of Mathematics 39 (1987) 428-472.
[45] F. Bourquin, P. G. Ciarlet, G. Geymonat and A. Raoult, Γ-convergence et analyse asymptotique des plaques minces, C.R.A.S. (I) 315 (1992) 1017-1024.
[46] J. Burke and Liqun Qi, Weak directional closedness and generalized subdifferentials, Journal of Mathematical Analysis and Applications 159 (2) (1991) 485-499.
[47] Ch. Castaing, Proximite et mesurabilite. Un theoreme de compacite faible, Colloque sur la Theorie Mathematique du Controle Optimal, Brussels (1969) 25-33.
[48] Ch. Castaing and M. Valadier, Convex Analysis and Measurable
Multifunctions.
[49] G. Choquet, Convergences, Annales Inst. Fourier Grenoble 23 (1947-1948) 55112. [50] P. G. Ciarlet, Plates and Junctions in Elastic Multi-structures: Analysis, Masson, Paris (1990).
An
Asymptotic
[51] D. Cioranescu and F. Murat, Un terme etrange venu d'ailleurs I, II. in: JVonJinear Partial Differential Equations and their Applications, College de France Seminar, vol. II, Pitman, London, (1982) 98-138. [52] F. H. Clarke, Optimization and Nonsmooth Analysis, Wiley, New York, 1983.
Convergence Theories [53] F. H. Clarke, Methods of Dynamic and Nonsmooth Optimization, regional conferences series 57, S.I.A.M. , Philadelphia, 1989.
315 CBMS-NSF
[54] F. H. Clarke, R. J. Stern and P. R. Wolenski, Subgradient criteria for monotonicity, the Lipschitz condition, and convexity, Canadian journal of mathematics 45 (6) (1993) 1167-1183. [55] R. Cominetti, Metric regularity, tangent sets, and second-order optimality con ditions, Applied Mathematics and Optimization 21 (1990) 265-287. [56] R. Cominetti, On pseudo-differentiability, Transactions of the American Math ematical Society 324 (1991) 843-865. [57] R. Cominetti and J.-P. Penot, Tangent sets of order one and two to the positive cones of some functional spaces, in preparation. [58] L. Contesse and J.-P. Penot. Continuity of the Fenchel correspondence and continuity of polarities, Journal of Mathematical Analysis and Applications 156 (1991) 305- 328. [59] G. Dal Maso. An Introduction to T—Convergence, Birkhauser, Boston, 1992. [60] R. Deville, A mean value theorem for non differentiable mappings, Preprint, University Bordeaux I, 1993. [61] R. Deville and E. M. El Haddad, The subdifferential of the sum of two functions in Banach spaces I. First order case, Preprint, University Bordeaux I, 1993. [62] R. Deville, G. Godefroy and V. Zizler, Smoothness and Renormings in Banach Spaces, Pitman Monographs in Math 64, Longman, Essex, 1993. [63] J. Diestel, Sequences and Series in Banach Spaces, Springer-Verlag, New York (1984). [64] S. Dolecki, Tangency and differentiation : some applications of convergence theory, Annaii di Matematica Pura ed Applicata 130 (1982) 223-255. [65] S. Dolecki, Continuity of bilinear and non bilinear polarities, in: Optimization and Related Fields, Erice, 1984, R. Conti et al eds. Lecture Notes in Maths. 1190, Springer Verlag, 1986, 191- 213. [66] S. Dolecki, Convergence of minima in convergence spaces, Optimization (1986) 553-572.
17
[67] N. Dunford and J. T. Schwartz, Linear Operators, vol.1, Interscience, New York, 1958.
316
J. P. Penot
[68] S. Eilenberg, Homotopie et Espaces Fibres, unpublished Lectures, Paris 19661967. [69] K.-H. Elster and J. Thierfelder, On cone approximations and generalized di rectional derivatives in: Nonsmooth Optimization and Related Topics, F. H. Clarke, V. F. Dem'yanov and F. Giannessi eds., Plenum Press, New York, 1989, 134-154. [70] B. El Abdouni and L. Thibault, Quasi-interiorly e-tangent cones to multifunctions, Numerical Functional Analysis and Optimization 10 (7&8) (1989), 619641. [71] M. Fabian, Subdifferentials, local e—supports and Asplund spaces, Journal of the London Mathematical Society 34 (1986) 568-576. [72] M. Fabian, On classes of subdifferentiability spaces of loffe, Nonlinear Theory, Methods and Applications 12 (1) (1988) 63-74.
Analysis
[73] M. Fabian, Subdifferentiability and trustworthiness in the light of a new variational principle of Borwein and Preiss, Acta University Carolinae Math, et Phys. 30 (2) (1989) 51-56. [74] M. Fabian and N. V. Zhivkov, A characterization of Asplund spaces with the help of local e— supports of Ekeland and Lebourg, Comptes rendus Acad. bulgare Sci. 38 (6) (1985) 671-674. [75] H. Frankowska, The first order necessary conditions for nonsmooth variational and control problems, SIAM Journal on Control and Optimization 22 (1) (1984) 1-12. [76] H. Frankowska, S. Plaskacz and T. Rzezuchovski, Measurable viability theorems and Hamilton-Jacobi-Bellman equation, Cahiers Ceremade No 9207, Universite Paris IX, 1992. [77] A. Frolicher and W. Bucher, Calculus in vector spaces without norm, Lecture Notes in Math, 30 (1966) Springer Verlag, Berlin. [78] E. Giner, Etude sur les fonctionnelles integrates, these d'Etat, Universite of Pau, 1985. [79] J.-B. Hiriart-Urruty, New concepts in nondifferentiable programming, Bulletin de la Societe Mathematique de France, Memoire No 60 (1979) 57-85. [80] R. B. Holmes, Geometric Functional Analysis and its Applications, Verlag, New York, 1975.
Springer
Convergence Theories
317
[81] A. D. Ioffe, Subdifferentiability spaces and nonsmooth analysis, Bulletin of the American Mathematical Society 10 (1984) 87-89. [82] A. D. Ioffe, On the theory of subdifferential, in Fermat Days 85: Mathematics for Optimization, J. B. Hiriart-Urruty (ed) Elsevier Sci. Pub. (North Holland) Amsterdam (1986) 183-200. [83] A. D. Ioffe, Approximate subdifferentials and applications II, Mathematika 33 (1986) 111-128. [84] A. D. Ioffe, Approximate subdifferentials and applications 3: the metric theory, Mathematika 36 (1) (1989) 1-38. [85] A. D. Ioffe, On some recent developments in the theory of second order optimality conditions, in Optimization, S. Dolecki ed., Lecture Notes in Maths, vol. 1405, Springer Verlag Berlin (1989) 55-68. [86] A. D. Ioffe, Variational analysis of a composite function : a formula for the second-order epi-derivative, Journal of Mathematical Analysis and Applications 160 (2) (1991) 379-405. [87] A. D. Ioffe, Composite optimization : second order conditions, value functions and sensitivity, Proc. Symposium Antibes, June 1990, A. Bensoussan and J.-L. Lions ed., Lecture Notes in Control and Information Sc. 144, Springer Verlag, (1990) 442-452. [88] A. D. Ioffe, Proximal analysis and approximate subdifferentials, Journal of the London Mathematical Society 41 (1990) 175-192. [89] A. D. Ioffe, Non-smooth subdifferentials : their calculus and applications, Pro ceedings International Symposium on Nonlinear Analysis, Tampa, August 1992. [90] A. Jofre and J.-P. Penot, Comparing new notions of tangent cones, Journal of the London Mathematical Society (2) 40 (1989) 280-290. [91] A. Jofre and L. Thibault, Proximal and Frechet normal formulae for some small normal cones in Hilbert space, Journal of Nonlinear Analysis Theory, Methods and Applications 19 (7) (1992), 599-612. [92] J.-L. Joly, Une famille de topologies et de convergences sur l'ensembledes fonctionnelles convexes , these d'Etat, Universite de Grenoble, 1970. [93] J.-L. 
Joly, Une famille de topologies sur l'ensemble des fonctions convexes pour lesquelles la polarite est bicontinue, Journal de mathematiques pures et appliquees 52 (1973) 421-441.
318
J. P. Penot
[94] T. Kato, Perturbaiion Theory for Linear Operators, Springer-Verlag, New York (1966). [95] H. Kawasaki, An envelop-like effect of infinitely many inequality constraints on second-order necessary conditions for minimization problems, Mathematical Programming 41 (1988) 73-96. [96] J. L. Kelley, General Topology, Van Nostrand, Princeton, 1955. [97] J. L. Kelley and I. Namioka, Linear Topological Spaces, Van Nostrand, Princeton, 1963. [98] E. Klein and A. Thompson, Theory of Correspondences, Wiley, Toronto (1984). [99] A. Kruger, Properties of generalized differentials, Siberian Journal 26 (1985) 822-832.
Mathematics
[100] P. D. Loewen, The proximal normal formula in Hilbert space, Journal of Nonlinear Analysis Theory, Methods and Applicaiions 11 (1987) 979-995. [101] P. D. Loewen. Limits of Frechet normals in nonsmooth analysis, in: Optimization and Nonlinear Analysis, A. Ioffe et al. Eds. Pitman Research Notes 244, Longman, Harlow, 1992. [102] P. D. Loewen, A Mean Value Theorem for Frechet subgradients, Preprint, University British Columbia, Vancouver, August 1992. [103] J. E. Marsden, Countable and net convergence, The American Monthly 75 (1968) 397-398.
Mathematical
[104] E. J. McSchane, Partial orderings and Moore-Smith limits, The American Mathematical Monthly 59 (1952) 1-10. [105] Ph. Michel and J.-P. Penot, A generalized derivative for calm and stable functions, Differential and fntegral Equations 5 (2) (1992) 433-454. [106] B. S. Mordukhovich, Approximation Methods in Problems of Optimization and Control, Nauka, Moscow, 1988 (Russian; English translation to appear in Wiley Interscience) [107] B. S. Mordukhovich and Yongheng Shao, Extremal characterizations of Asplund spaces, to appear, Proceedings of the American Mathematical Society. [108] B. S. Mordukhovich and Yongheng Shao, Nonsmooth sequential analysis in Asplund spaces, Preprint, Wayne State University Detroit, 1994. [109] J. J. Moreau, Intersection of moving sets in a normed space, Scandinavica 36 (1975) 159-173.
Mathematica
Convergence Theories
319
110] U. Mosco, Convergence of convex sets and solutions of variational inequalities, Advances in Mathematics 3 (1969) 510-585. I l l ] F. Murat, Compacite par compensation, Annali della Scuola normale superiore di Pisa, Classe di Scienze (4) 5 (1978) 481-507. 112] F. Murat, H-convergence, Rapport du seminaire d'analyse fonctionnelle et numerique de l'Universite d'Alger, (1978). 113] J.-P. Penot, Topologies faibles sur des varietes de Banach, C.R. Acad. Sci. Paris 274 (1972) 405-408. 114] J.-P. Penot, Topologies faibles sur des varietes de Banach. Application aux geodesiques des varietes de Sobolev, Journal Differential Geometry 9 (1974) 141-168. 115] J.-P. Penot, Calcul sous-differentiel et optimisation, Journal of functional anal ysis 27 (2) (1978) 248-276. 116] J.-P. Penot, A characterization of tangential regularity, Journal of Nonlinear Analysis Theory, Methods and Applications 5(6) (1981) 625-643. 117] J.-P. Penot, Variations on the theme of nonsmooth analysis : another subdifferential, in: Nondifferentiable Optimization: Motivations and Applications, Proc Sopron, 1984, V.F. Demyanov and D. Pallasche, ed., Lecture Notes in Econ. and Math. Systems No 255, Springer-Verlag Berlin 1985, 41-54. 118] J.-P. Penot, Preservation of persistence and stability under intersections and operations , Preprint (1986), to appear Journal of Optimization Theory and Applications 119] J.-P. Penot, The cosmic Hausdorff topology, the bounded Hausdorff topology and continuity of polarity, Proceedings of the American Mathematical Society 113 (1991) 275-285. 120] J.-P. Penot, Topologies and convergences on the set of convex functions Journal of Nonlinear Analysis Theory, Methods and Applications 18 (10) (1992) 905916. 121] J.-P. Penot, Optimality conditions in mathematical programming, Preprint (1990) 122] J.-P. Penot, Optimality conditions for composite functions, Preprint (1990). 123] J.-P. 
Penot, On the convergence of subdifferentials of convex functions, Nonlin ear Analysis Theory, Methods and Applications 21 (2) (1993) 87-101.
320
J. P. Penot
1241 J.-P. Penot, Second-order generalized derivatives : relationships with conver gence notions, in: Nonsmooth Optimization : Methods and Applications, F. Giannessi, ed., Erice (1991) Gordon and Breach, Philadelphia, (1992) 303-322. 1251 J.-P- Penot, Optimality conditions for minimax problems, semi-infinite pro gramming problems and their relatives, Preprint. 1261 J.-P. Penot, Miscellaneous incidences of convergence theories in optimization and nonlinear analysis I: behavior of solutions, Set-Valued Analysis 2 (1994) 259-274. 1271 J.-P. Penot, On the interchange of subdifferentiation and epi-convergence, to appear in Journal of Mathematical Analysis and Applications. 128] J.-P. Penot, Yet another Mean Value Theorem, submitted. 129] J.-P. Penot, Favorable classes of mappings and multimappings in nonlinear analysis and optimization, to appear in Journal of Convex Analysis. 130] J.-P. Penot, Stabilized subdifferentials, in preparation. 131] J.-P. Penot and P. Terpolilli, Cones tangents et singularites, C. R. Acad. Sc. Paris, 296 (1983) 721-724. 1321 R- R- Phelps, Convex Functions, Monotone Operators and Differentiability, Lecture Notes in Mathematics, 1363, Springer-Verlag, New York, 1989. 1331 F. Plastria, Lower subdifferentiable functions and their minimization by cutting planes, Journal of Optimization Theory and Applications 46 (1985) 37-53. 134] R. T. Rockafellar, Directionally Lipschitzian functions and subdifferential cal culus, Proceedings of the London Mathematical Society 39 (1979) 331-355. 135] R. T. Rockafellar, Generalized directional derivatives and subgradients of nonconvex functions, Canadian Journal of Mathematics 32 (1980) 157-180. 136] R. T. Rockafellar, The Theory of Subgradients and its Applications to Problems of Optimization: Convex and Nonconvex Functions, Heldermann Verlag, Berlin, 1981. 137] R. T. Rockafellar and R. J. B. Wets, book in preparation. 138] E. 
Sanchez-Palancia, Homogenization Techniques for Composite Media, Lecture Notes in Physics 272, Springer-Verlag, Berlin (1987). [139] Y. Sonntag, Convergence des suites d'ensembles, monograph, to appear.
Convergence Theories
321
1401 N. E. Steenrod, A convenient category of topological spaces, The Michigan Mathematical Journal 14 (1967) 133-152. 1411 Y. Sonntag and C. Zalinescu, Set convergences: a survey and a classification, Set-valued Analysis 2 (1994) 339-356. 1421 L- Thibault, On subdifferentials of optimal value functions, SIAM Journal on Control and Optimizations 29 (5) (1991) 1019-1036. 143] L. Thibault and D. Zagrodny, Integration of Subdifferentials of Lower Semicontinuous Functions on Banach Spaces, Preprint, 1992. 1441 J- S. Treiman, Clarke's generalized gradients and epsilon-subgradients in Ba nach spaces, Transactions of the American Mathematical Society 294 (1986) 65-78. 1451 J- S. Treiman, Shrinking generalized gradients, Journal of Nonlinear Analysis Theory, Methods and Applications 12 (1988) 1429-1450. 1461 J- S. Treiman, An infinite class of convex tangent cones, Journal of Optimization Theory and Applications 68 (3) (1991) 563-582. 1471 J. S. Treiman, The linear nonconvex generalized gradient and Lagrange multi pliers, Preprint, Western Michigan University, Kalamazoo, January 1993. 1481 D. Walkup and R. Wets, Continuity of some convex cone-valued mappings, Proceedings of the American Mathematical Society 18 (1967) 229-253. 1491 D. E. Ward, Convex subcones of the contingent cone in nonsmooth calculus and optimization, Transactions of the American Mathematical Society 302 (2) (1987) 661-682. 1501 D. E. Ward, Which subgradients have sum formulas? Journal of Nonlinear Analysis Theory, Methods and Applications 12 (1988) 1231-1243 15ll D. E. Ward, The quantificational tangent cones, Canadian Journal of Mathe matics 40 (3) (1988) 666-694. 1521 J- Warga, Derivate Containers, Inverse Functions and Controllability, Calculus of Variations and Control Theory, Academic Press, New York, (1976) 13-46. 1531 J- Warga, An implicit function theorem without differentiability, Proceedings of the American Mathematical Society 69 (1978) 65-69. 1541 A. 
Wilanski, Topics in Functional Analysis, Lecture Notes in Math. 45 Springer Verlag, Berlin 1967.
Recent Advances in Nonsmooth Optimization, pp. 322-350. Eds. D.-Z. Du, L. Qi and R. S. Womersley. ©1995 World Scientific Publishing Co Pte Ltd
Second-Order Nonsmooth Analysis in Nonlinear Programming

René Poliquin¹
Department of Mathematical Sciences, University of Alberta, Edmonton, Alberta, Canada T6G 2G1

Terry Rockafellar²
Department of Mathematics, University of Washington, Seattle, WA 98195, USA
Abstract

Problems of nonlinear programming are placed in a broader framework of composite optimization. This allows second-order smoothness in the data structure to be utilized despite apparent nonsmoothness in the objective. Second-order epi-derivatives are shown to exist as expressions of such underlying smoothness, and their connection with several kinds of second-order approximation is examined. Expansions of the Moreau envelope functions and proximal mappings associated with the essential objective functions for certain optimization problems in composite format are studied in particular.

1 Introduction
Problems in nonlinear programming are customarily stated in terms of a finite system of equality and inequality constraints, defining a feasible set over which a certain function is to be minimized. For most numerical work it is assumed that the constraint and objective functions are C², so that second-order methodology can be utilized.

¹This work was supported in part by the Natural Sciences and Engineering Research Council of Canada under grant OGP41983.
²This work was supported in part by the National Science Foundation under grant DMS-9200303.
This is taken as the model for "smooth" optimization, and any problem whose objective function fails to enjoy such differentiability, for instance by being only piecewise C², belongs then to the category of "nonsmooth" optimization. But in practice a distinction between smooth and nonsmooth optimization based on such grounds is artificial. Many problems that start out with a nonsmooth objective, perhaps involving penalty functions and "max" expressions, can be recast with a smooth objective. On the other hand, nominally smooth problems with inequality constraints inherently exhibit nonsmoothness in their geometry. Anyway, techniques for solving those problems often veer into nonsmoothness by appealing to merit functions or dualization.

The real issue in numerical and theoretical optimization alike is how to represent and exploit to the fullest whatever degree of smoothness may be available in a problem's elements. In this respect the traditional format falls short. Its deficiency is that it places all the emphasis in problem formulation on making a list of constraints, which must be simple equations or inequalities, each associated with an explicit constraint function, and afterward merely specifying one additional function for the objective. While a vehicle is provided for working with nonsmoothness in the boundary of the feasible set, none is provided for nonsmoothness as it might be found in the graph of the function being minimized, or for that matter for any other structural features of the objective.

In contrast, the composite format for problems of optimization treats both constraints and objective more supportively and is able to span a wider range of situations with ease. In the composite format, a problem is set up by specifying a representation of the type

(V)    minimize f(x) := g(F(x)) over x ∈ ℝⁿ,

where F : ℝⁿ → ℝᵐ is the data mapping and g : ℝᵐ → ℝ̄ is the model function. The mapping F supplies the problem's special elements and carries its smoothness, whereas the function g provides the structural mold. Not only does g have no need to be smooth, it can even be extended-real-valued, with values in ℝ̄ = [−∞, ∞] instead of just ℝ = (−∞, ∞). This is central to the idea. The feasible set in (V) is defined to be

C = dom f := {x | f(x) < ∞} = {x | F(x) ∈ D},  where  D = dom g := {u | g(u) < ∞}.

Here we aim at applying second-order nonsmooth analysis to the essential objective function f of problem (V) in this format. Keeping close to the ordinary domain of nonlinear programming, if not the usual framework, we concentrate on the case in which

• F is a C² mapping, and
• g is a proper, convex function that is polyhedral,
i.e., such that the set epi g := {(u, α) ∈ ℝᵐ × ℝ | α ≥ g(u)} is polyhedral convex, cf. [17, Section 19]. The set D is polyhedral then as well.

The nature and extent of the problem class covered under these restrictions is explored in Section 2 along with the relationship to "amenable" functions, which by definition have composite expressions f = g∘F with smooth F and convex g satisfying a certain constraint qualification. For amenable functions a highly developed theory of first- and second-order generalized derivatives is now in place and ready for application under the circumstances described here. Formulas for such derivatives are worked out in Section 3 and incorporated into optimality conditions in the composite format, in particular second-order conditions related to epigraphical approximation. In Section 4, second-order expansions in terms of uniform convergence instead of epigraphical convergence are studied, and the question of Hessian matrices in a standard or generalized sense is taken up. Finally, Section 5 analyzes the Moreau envelope functions

e_λ(x) := min_{x'} { f(x') + (1/2λ)|x' − x|² }   for λ > 0,

which relate to epigraphical approximation of f because e_λ(x) increases to f(x) as λ ↓ 0. These functions not only approximate but provide a kind of regularization of f. While f may be extended-real-valued and have discontinuities (in particular, jumps to ∞), e_λ is finite and locally Lipschitz continuous and has one-sided directional derivatives at all points. Moreover, the minimizing sets agree: argmin e_λ = argmin f for all λ > 0. We investigate the degree to which second-order properties of e_λ at minimizing points x̄ correspond to such properties of f at these points. Second-order properties of e_λ have a bearing on numerical techniques like the proximal point algorithm in the minimization of f, since they inevitably depend on the proximal mapping

P_λ(x) := argmin_{x'} { f(x') + (1/2λ)|x' − x|² }   for λ > 0.

This phase of our effort owes its inspiration to recent work of Lemaréchal and Sagastizábal [6], followed by Qi [16], who were motivated by the goals just mentioned. These authors have concentrated on finite, convex functions f, not necessarily of the composite form adopted here, whereas we relinquish convexity and welcome infinite values in order to obtain results that deal with constraints. On the other hand, Qi [16] takes up the topic of semismoothness of ∇e_λ, which is not addressed here.
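As a small numerical illustration of these definitions (ours, not from the paper), the envelope e_λ and proximal point P_λ of f(x) = |x| can be computed by brute force over a grid; for this f the proximal point is the classical soft-thresholding value, and one can check that e_λ(x) increases toward f(x) as λ decreases.

```python
import numpy as np

def moreau_envelope_and_prox(f, x, lam, grid):
    """Brute-force e_lam(x) = min_{x'} f(x') + |x'-x|^2/(2*lam)
    and the corresponding minimizer P_lam(x), over a discrete grid."""
    vals = f(grid) + (grid - x) ** 2 / (2.0 * lam)
    k = np.argmin(vals)
    return vals[k], grid[k]

f = np.abs                               # f(x) = |x|: convex, nonsmooth at 0
grid = np.linspace(-3.0, 3.0, 60001)     # fine grid for the inner minimization

x = 1.5
e1, p1 = moreau_envelope_and_prox(f, x, lam=1.0, grid=grid)
e2, p2 = moreau_envelope_and_prox(f, x, lam=0.1, grid=grid)

# For f = |.|, P_lam(x) = sign(x) * max(|x| - lam, 0) (soft thresholding).
assert abs(p1 - 0.5) < 1e-3
assert abs(p2 - 1.4) < 1e-3
# e_lam(x) increases to f(x) as lam decreases to 0.
assert e1 < e2 < f(x) + 1e-12
```

Note also that both envelopes attain their minimum at 0, matching argmin e_λ = argmin f for this example.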
2 Problem Characteristics and Amenability

To understand better the class of optimization problems (V) covered by the composite format through some choice of a C² mapping F and polyhedral function g, it helps first to see how problems that are stated in the traditional manner can be accommodated.
Example 2.1. For C² functions f₀, f₁, …, f_m on ℝⁿ, consider the minimization of f₀(x) subject to

f_i(x) ≤ 0 for i = 1, …, s,    f_i(x) = 0 for i = s+1, …, m.

This fits the composite format of minimizing f = g∘F over ℝⁿ for the C² mapping F : ℝⁿ → ℝ^{m+1} defined by F(x) = (f₀(x), f₁(x), …, f_m(x)) and the polyhedral function g : ℝ^{m+1} → ℝ̄ defined by

g(u) = g(u₀, u₁, …, u_m) = u₀ if u_i ≤ 0 for i = 1, …, s and u_i = 0 for i = s+1, …, m;  g(u) = ∞ otherwise.

Next we look at an apparently very different model, which illustrates accommodations that can be made to nonsmoothness.

Example 2.2. For C² functions f₁, …, f_m on ℝⁿ, consider the minimization of

f(x) = max {f₁(x), …, f_m(x)}

over all x ∈ ℝⁿ (no constraints). This fits the composite format f = g∘F with F(x) = (f₁(x), …, f_m(x)) and g(u) = g(u₁, …, u_m) = max {u₁, …, u_m}. The mapping F is C² and the function g is polyhedral. It is well known that this kind of problem, although nominally concerned with unconstrained minimization of a nonsmooth function, can be posed instead in terms of minimizing a linear function subject to smooth inequality constraints. Indeed, in the notation x̃ = (x, α) ∈ ℝ^{n+1} it corresponds to minimizing f̃₀(x̃) subject to f̃_i(x̃) ≤ 0 for i = 1, …, m, where f̃₀(x̃) = α and f̃_i(x̃) = f_i(x) − α for i = 1, …, m. Thus it surely deserves to be treated on a par with other problems where smoothness dominates the numerical methodology, at least as long as the dimension n is not unduly large.

Another sort of flexibility in the composite model comes to light in the way constraints can be handled in patterns deviating from the standard one in Example 2.1. Simple equations and inequalities can be supplemented by conditions that restrict a function's values to lie in a certain interval. Box constraints on x do not have to be written with explicit constraint functions at all.

Example 2.3. For C² functions f₀, f₁, …, f_m on ℝⁿ, nonempty closed intervals I₁, …, I_m in ℝ and a nonempty polyhedral set X ⊂ ℝⁿ, consider the problem of minimizing f₀(x) over the set

C := {x ∈ X | f_i(x) ∈ I_i, i = 1, …, m},

or equivalently, minimizing f(x) over all x ∈ ℝⁿ in the case of

f(x) = f₀(x) + δ_C(x) = f₀(x) if x ∈ C;  f(x) = ∞ if x ∉ C.
This concerns f = g∘F for the C² mapping F : ℝⁿ → ℝ^{m+n+1} defined by

F(x) = (f₀(x), f₁(x), …, f_m(x), x)

and the polyhedral function g : ℝ^{m+n+1} → ℝ̄ defined by

g(u) = g(u₀, u₁, …, u_m, u_{m+1}, …, u_{m+n}) = u₀ if u_i ∈ I_i for i = 1, …, m and (u_{m+1}, …, u_{m+n}) ∈ X;  g(u) = ∞ otherwise.

Example 2.3 encompasses Example 2.1 as the special case where X = ℝⁿ and I_i = (−∞, 0] for i = 1, …, s but I_i = [0, 0] for i = s+1, …, m. On the other hand, Example 2.3 could be extended by taking f₀ to be a max function as in Example 2.2, f₀(x) = max {f₀₁(x), …, f₀ᵣ(x)}. Then the C² functions f₀ₖ would become additional components of F, and the u₀ part of u would turn into a vector (u₀₁, …, u₀ᵣ), with max {u₀₁, …, u₀ᵣ} entering the formula for g(u).

An alternative way of arriving at nonsmoothness in the objective is illustrated by the following model.

Example 2.4. For C² functions f₀, f₁, …, f_m on ℝⁿ and proper polyhedral functions g_i : ℝ → ℝ̄ for i = 1, …, m, the problem of minimizing

f₀(x) + g₁(f₁(x)) + ⋯ + g_m(f_m(x))

over all x ∈ ℝⁿ corresponds to f = g∘F for the C² mapping F with F(x) = (f₀(x), f₁(x), …, f_m(x)) and the polyhedral function g with

g(u) = g(u₀, u₁, …, u_m) = u₀ + g₁(u₁) + ⋯ + g_m(u_m).
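To make the composite format concrete, here is an illustrative sketch (ours, with arbitrarily chosen data functions) of an instance of Example 2.4 in which each g_i is a one-sided linear penalty g_i(u) = c·max(u, 0), a piecewise linear relaxation of the constraint f_i(x) ≤ 0:

```python
import numpy as np

# Data mapping F(x) = (f0(x), f1(x), f2(x)): smooth (here C^2) components.
def F(x):
    f0 = x[0] ** 2 + x[1] ** 2          # smooth objective part
    f1 = x[0] + x[1] - 1.0              # constraint function for f1(x) <= 0
    f2 = -x[0]                          # constraint function for f2(x) <= 0
    return np.array([f0, f1, f2])

# Model function g(u) = u0 + g1(u1) + g2(u2) with piecewise linear penalties
# g_i(u) = c * max(u, 0), which relax the constraints f_i(x) <= 0.
def g(u, c=10.0):
    return u[0] + c * max(u[1], 0.0) + c * max(u[2], 0.0)

def f(x):
    """Essential objective in composite format: f = g o F."""
    return g(F(x))

x_feas = np.array([0.3, 0.3])           # satisfies both constraints
x_infeas = np.array([1.0, 1.0])         # violates f1(x) <= 0
assert f(x_feas) == F(x_feas)[0]        # no penalty where constraints hold
assert f(x_infeas) > F(x_infeas)[0]     # penalty active where they fail
```

The point of the format shows up in the split of roles: all the smoothness lives in F, while g, although nonsmooth, is a fixed polyhedral "mold" that never changes with the data.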
Polyhedral functions g_i of a single real variable as in Example 2.4 are piecewise linear convex functions in the obvious sense, except that they could have the value ∞ outside of some closed interval I_i. As a special case, such a function could have just one "piece," being affine on I_i, or even just 0 on I_i (with the term g_i(f_i(x)) then just representing a constraint f_i(x) ∈ I_i). Piecewise linear functions with multiple slopes arise in a setting like Example 2.4 when constraints are relaxed by linear penalty expressions. Of course, a geometric constraint x ∈ X with X polyhedral (e.g. a box, a product of closed intervals, not necessarily bounded) could be built into Example 2.4 as in Example 2.3.

Within nonsmooth analysis, the composite format in optimization is closely associated with the concept of "amenability." For simplicity in stating the definition and working with it in the rest of the paper, we introduce the following notation. For any mapping F : ℝⁿ → ℝᵐ and any vector y ∈ ℝᵐ we simply write yF for the scalar function defined by (yF)(x) = ⟨y, F(x)⟩. Thus,

(yF)(x) = y₁f₁(x) + ⋯ + y_m f_m(x)  when  F = (f₁, …, f_m) and y = (y₁, …, y_m),
and if F is C¹ with Jacobian ∇F(x) one has further that

∇(yF)(x) = y₁∇f₁(x) + ⋯ + y_m∇f_m(x) = ∇F(x)ᵀ y.
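This identity is easy to confirm numerically (a sketch with an arbitrary smooth F chosen only for illustration): the gradient of the scalarization yF agrees with ∇F(x)ᵀy, where the Jacobian is approximated by central finite differences.

```python
import numpy as np

def F(x):
    # An arbitrary smooth mapping R^2 -> R^3, used only for this check.
    return np.array([x[0] ** 2, x[0] * x[1], np.sin(x[1])])

def jacobian_fd(F, x, h=1e-6):
    """Central finite-difference Jacobian: rows = components of F, cols = variables."""
    m, n = F(x).size, x.size
    J = np.zeros((m, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        J[:, j] = (F(x + e) - F(x - e)) / (2 * h)
    return J

x = np.array([0.7, -0.4])
y = np.array([1.0, 2.0, -3.0])

# Gradient of the scalar function (yF)(x) = <y, F(x)> ...
grad_yF = jacobian_fd(lambda z: np.array([y @ F(z)]), x)[0]
# ... equals grad F(x)^T y, as stated in the text.
assert np.allclose(grad_yF, jacobian_fd(F, x).T @ y, atol=1e-5)
```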
Definition 2.5. A function f : ℝⁿ → ℝ̄ is amenable at x̄ if f(x̄) is finite and, at least locally around x̄, there is a representation f = g∘F in which the mapping F is C¹, the function g is proper, lsc (lower semicontinuous) and convex, and the following condition, an abstract constraint qualification, is satisfied by the normal cone N_D(F(x̄)) to the convex set D = dom g at F(x̄):

(CQ)    there is no vector y ≠ 0 in N_D(F(x̄)) with ∇(yF)(x̄) = 0.

It is strongly amenable if F is C² rather than just C¹, and fully amenable if, in addition, g is piecewise linear-quadratic.

To say that g is piecewise linear-quadratic is to say that its effective domain D is the union of finitely many polyhedral sets, on each of which the formula for g is linear-quadratic, i.e., a polynomial of degree at most 2. When no quadratic terms are involved, g is just piecewise linear (piecewise affine might be a better term). The convex functions that are piecewise linear are precisely the polyhedral functions of convex analysis we have been referring to so far. This leads to the following observation, which paves the way for us to apply the theory of amenable functions, cf. [11]-[15], to the class of problems under consideration.

Proposition 2.6. For problem (V) in the composite format with F of class C² and g polyhedral, let x̄ be a point of the feasible set C at which the constraint qualification (CQ) is satisfied. Then the essential objective function f is fully amenable at all points x ∈ C in some neighborhood of x̄.

Proof. This merely records the import for problem (V) of the observations just made, utilizing the fact that if (CQ) holds at x̄ it must hold for all x ∈ C in some neighborhood of x̄ (cf. [13]). □

The constraint qualification (CQ) is satisfied trivially when F(x̄) ∈ int D, since N_D(u) = {0} at all points u ∈ int D. To see what it means in other situations, we inspect the preceding examples one by one.

Example 2.1'. In Example 2.1, the constraint qualification (CQ) reduces to the Mangasarian-Fromovitz condition (written in its equivalent dual form): unless all the coefficients y₁, …, y_m are taken to be 0, it is impossible to have the equation

y₁∇f₁(x̄) + ⋯ + y_m∇f_m(x̄) = 0

with y_i ≥ 0 for indices i ∈ {1, …, s} such that f_i(x̄) = 0, and y_i = 0 for indices i ∈ {1, …, s} such that f_i(x̄) < 0 (but y_i unrestricted for indices i ∈ {s+1, …, m}).

Detail. The set D in this case consists of all vectors u = (u₁, …, u_m) such that u_i ≤ 0 for i = 1, …, s and u_i = 0 for i = s+1, …, m. For any u ∈ D, therefore, the
normal cone N_D(u) consists of the vectors y with y_i ≥ 0 for i ∈ {1, …, s} such that u_i = 0, whereas y_i = 0 for i ∈ {1, …, s} such that u_i < 0. □

Example 2.2'. In Example 2.2, condition (CQ) reduces to triviality; it is satisfied automatically at every point x̄ ∈ ℝⁿ.

Detail. In this case D = ℝᵐ, hence F(x̄) ∈ int D always. □

Example 2.3'. In Example 2.3, the constraint qualification (CQ) at a feasible point x̄ means that the only multipliers y_i ∈ N_{I_i}(f_i(x̄)) satisfying

−∑_{i=1}^{m} y_i ∇f_i(x̄) ∈ N_X(x̄)

are y₁ = 0, …, y_m = 0. Here I_i is a closed interval with lower bound a_i and upper bound b_i (these bounds possibly being infinite, with a_i ≤ b_i), and the relation y_i ∈ N_{I_i}(f_i(x̄)) restricts the sign of y_i in the following pattern, depending on how the constraint f_i(x̄) ∈ I_i is satisfied at x̄ relative to these bounds:
y_i ∈ N_{I_i}(f_i(x̄))  ⟺  y_i ≥ 0   when a_i < f_i(x̄) = b_i,
                           y_i ≤ 0   when a_i = f_i(x̄) < b_i,
                           y_i = 0   when a_i < f_i(x̄) < b_i,
                           y_i free  when a_i = f_i(x̄) = b_i.

Detail. The representation f = g∘F for this case has D = I₁ × ⋯ × I_m × X, and consequently

N_D(F(x̄)) = N_{I₁}(f₁(x̄)) × ⋯ × N_{I_m}(f_m(x̄)) × N_X(x̄).

The characterization of the one-dimensional relations y_i ∈ N_{I_i}(u_i) is elementary. □
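The four-case characterization of the interval normal cone is simple enough to encode directly (an illustrative sketch; the function name is ours): given bounds a ≤ b and a point u ∈ [a, b], it reports which signs a multiplier y ∈ N_[a,b](u) may take.

```python
import math

def interval_normal_cone(a, b, u, tol=1e-12):
    """Signs allowed for y in N_[a,b](u), following the four cases in the text.
    Returns one of '>=0', '<=0', '=0', 'free'."""
    at_lower = abs(u - a) <= tol
    at_upper = abs(u - b) <= tol
    if at_lower and at_upper:      # a = u = b: singleton interval, y free
        return 'free'
    if at_upper:                   # a < u = b: upper bound active
        return '>=0'
    if at_lower:                   # a = u < b: lower bound active
        return '<=0'
    return '=0'                    # a < u < b: interior point

assert interval_normal_cone(-math.inf, 0.0, 0.0) == '>=0'   # u_i <= 0 active
assert interval_normal_cone(-math.inf, 0.0, -1.0) == '=0'
assert interval_normal_cone(0.0, 0.0, 0.0) == 'free'        # equality constraint
assert interval_normal_cone(0.0, 2.0, 0.0) == '<=0'
```

The first two assertions recover exactly the special case I_i = (−∞, 0] noted in the following paragraph of the text.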
Note that the constraint qualification in Example 2.3' reduces to the Mangasarian-Fromovitz condition in Example 2.1' when X is the whole space, so that N_X(x̄) = {0}, while I_i = (−∞, 0] for i = 1, …, s (so that N_{I_i}(u_i) equals [0, ∞) if u_i = 0 but equals {0} if u_i < 0), whereas I_i = [0, 0] for i = s+1, …, m (so that N_{I_i}(u_i) = (−∞, ∞) as long as u_i = 0).

Example 2.4'. In Example 2.4 with the closed intervals dom g_i denoted by I_i (these possibly being all of ℝ for some indices i), the constraint qualification (CQ) takes the same form as it does in Example 2.3', except that N_X(x̄) is replaced by {0}.

The examples have indicated the advantages of the composite format in allowing optimization problems to be expressed in a variety of ways. But just how general is the class of problems the composite format covers under our restrictions? This question is answered by the next result.
Theorem 2.7. The optimization problems that can be placed in the composite format as (V) for a $C^2$ mapping $F$ and a polyhedral function $g$ are precisely the ones which, in principle, concern the minimization over a set $C$, specifiable by a finite system of $C^2$ equality and inequality constraints, of a function $f_0$ that is either $C^2$ itself or expressible as the pointwise max of a finite collection of $C^2$ functions. Moreover, the representation can always be set up in such a way that a point $x \in C$ satisfies the constraint qualification (CQ) for (V) if and only if it satisfies the Mangasarian-Fromovitz condition relative to the equality and inequality constraints utilized in representing $C$.

Proof. If an optimization problem has a representation of the kind described, it fits into the composite format in the manner of Example 2.1 as supplemented by the device explained after Example 2.2. Then (CQ) reduces to the Mangasarian-Fromovitz constraint qualification just as in Example 2.1'. Conversely, suppose $f = g \circ F$ for a $C^2$ mapping $F$ and a polyhedral function $g$. The epigraph set $\operatorname{epi} f$ consists then of the points $(x,\alpha)$ such that $(F(x),\alpha) \in \operatorname{epi} g$. To say that $g$ is polyhedral is to say that $\operatorname{epi} g$ can be represented by a finite system of linear constraints, say

$$(u,\alpha) \in \operatorname{epi} g \iff \begin{cases} l_k(u,\alpha) \le 0 & \text{for } k = 1,\dots,q,\\ l_k(u,\alpha) = 0 & \text{for } k = q+1,\dots,r, \end{cases}$$
where each function $l_k$ is affine on $\mathbb{R}^{m+1}$. Without loss of generality this system can be set up so that the Mangasarian-Fromovitz condition is satisfied at all points of $\operatorname{epi} g$. (Proceeding from an arbitrary system, one can rewrite as equality constraints any inequalities that never hold strictly, and then pare down the list of equality constraints until none is redundant.) The equality constraint functions $l_k$ must have the form $l_k(u,\alpha) = \langle a_k, u\rangle - b_k$ for some vector $a_k \in \mathbb{R}^m$ and scalar $b_k \in \mathbb{R}$, since otherwise the hyperplane defined by $l_k(u,\alpha) = 0$ could not contain $\operatorname{epi} g$. The same form may be present for some of the inequality constraint functions. We can suppose that for a certain $p \le q$ all of the functions $l_k$ for $k = p+1,\dots,r$ have this special form, whereas for $k = 1,\dots,p$ none of them has it. In the latter case we can rescale $l_k$ to write it as $l_k(u,\alpha) = \langle a_k, u\rangle - b_k - \alpha$ for some $a_k \in \mathbb{R}^m$ and $b_k \in \mathbb{R}$, since otherwise, again, the half-space defined by $l_k(u,\alpha) \le 0$ could not contain $\operatorname{epi} g$. The set $D = \operatorname{dom} g$ is given then by
$$u \in D \iff \begin{cases} \langle a_k, u\rangle - b_k \le 0 & \text{for } k = p+1,\dots,q,\\ \langle a_k, u\rangle - b_k = 0 & \text{for } k = q+1,\dots,r. \end{cases}$$
We have $\nabla l_k(u,\alpha) = (a_k,-1)$ for $k = 1,\dots,p$, but $\nabla l_k(u,\alpha) = (a_k,0)$ for $k = p+1,\dots,r$. The fact that the Mangasarian-Fromovitz condition holds everywhere for the system representing $\operatorname{epi} g$ implies that it holds everywhere for this system representing $D$.
R. A. Poliquin and R. T. Rockafellar
Because $\operatorname{epi} f$ consists of all pairs $(x,\alpha)$ such that $(F(x),\alpha) \in \operatorname{epi} g$, it is specified by $l_k(F(x),\alpha) \le 0$ for $k = 1,\dots,q$ and $l_k(F(x),\alpha) = 0$ for $k = q+1,\dots,r$. Let $h_k(x) = \langle a_k, F(x)\rangle - b_k$ for $k = 1,\dots,r$. Thus, according to what we have arranged,

$$(x,\alpha) \in \operatorname{epi} f \iff \begin{cases} h_k(x) - \alpha \le 0 & \text{for } k = 1,\dots,p,\\ h_k(x) \le 0 & \text{for } k = p+1,\dots,q,\\ h_k(x) = 0 & \text{for } k = q+1,\dots,r. \end{cases}$$
In other words, the set $C = \operatorname{dom} f$ is specified by $h_k(x) \le 0$ for $k = p+1,\dots,q$ and $h_k(x) = 0$ for $k = q+1,\dots,r$, and the problem of minimizing $f$ over $\mathbb{R}^n$ corresponds to minimizing over this set $C$ the function $f_0(x) = \max\{h_1(x),\dots,h_p(x)\}$.

How do the constraint qualifications correspond in this framework? Consider any $x \in C$. Condition (CQ) forbids the existence of a nonzero vector $y \in N_D\bigl(F(x)\bigr)$ such that $\nabla(yF)(x) = 0$. We know from the representation given to $D$ that $N_D\bigl(F(x)\bigr)$ consists of all $y = \sum_{k=p+1}^{r} \lambda_k a_k$ such that

$$\lambda_k \begin{cases} \ge 0 & \text{for } k \in \{p+1,\dots,q\} \text{ with } \langle a_k, F(x)\rangle - b_k = 0,\\ = 0 & \text{for } k \in \{p+1,\dots,q\} \text{ with } \langle a_k, F(x)\rangle - b_k < 0,\\ \text{free} & \text{for } k \in \{q+1,\dots,r\}, \end{cases}$$

where furthermore (because the Mangasarian-Fromovitz condition is satisfied universally in the representation of $D$) the vector $y = \sum_{k=p+1}^{r} \lambda_k a_k$ cannot be $0$ unless all the coefficients $\lambda_k$ vanish. It follows that the vectors of the form $\nabla(yF)(x)$ for some $y \in N_D\bigl(F(x)\bigr)$ are precisely those of the form $\sum_{k=p+1}^{r} \lambda_k \nabla h_k(x)$, and that (CQ) requires, under the restrictions listed for $\lambda_k$, that the zero vector cannot be expressed in this form except by taking every $\lambda_k = 0$. Thus, (CQ) at $x$ comes out as identical to the Mangasarian-Fromovitz constraint qualification at $x$ relative to the specification of $C$ by the functions $h_k$. □

In the statement of Theorem 2.7, the words "in principle," "specifiable," and "expressible" warn that although it may be possible to reduce a problem to the special form described, this may be neither easy nor expedient. The advantage of the composite format is that it bypasses such reformulation and allows one to move ahead without it, if that is preferred.
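To make the Mangasarian-Fromovitz condition concrete, here is a toy numerical check (ours, with made-up constraint data, not from the paper): for two active inequality-constraint gradients $a_1 = (1,1)$ and $a_2 = (-1,0)$ at a point, MFCQ holds when some direction $d$ makes both inner products negative; equivalently, the only multipliers $\lambda_k \ge 0$ with $\lambda_1 a_1 + \lambda_2 a_2 = 0$ are zero.

```python
# Sketch (not from the paper): verifying the Mangasarian-Fromovitz condition
# for two hypothetical active inequality-constraint gradients at a point.
grads = [(1.0, 1.0), (-1.0, 0.0)]  # assumed gradients of the active constraints
d = (1.0, -2.0)                    # candidate interior direction

# MFCQ (no equality constraints here): <a_k, d> < 0 for every active gradient.
mfcq = all(a[0] * d[0] + a[1] * d[1] < 0 for a in grads)
print(mfcq)  # True

# Dual view: lam1*(1,1) + lam2*(-1,0) = (0,0) with lam >= 0 forces lam = 0,
# since the second coordinate gives lam1 = 0 and then lam2 = 0.
```
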
3  Subgradients, Epi-derivatives and Optimality
Our task in analyzing problem (V) is greatly assisted by Proposition 2.6. When a function $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ is amenable at $x$, it is Clarke regular at $x$ in particular; cf. [2] and [12]. In consequence, all the various definitions of "subgradient" that might in general be invoked lead to the same set $\partial f(x)$.

Derivatives simplify as well. First-order one-sided derivatives arise from considering difference quotient functions

$$\Delta_{x,t} f : \xi \mapsto \bigl[f(x + t\xi) - f(x)\bigr]/t \quad \text{for } t > 0.$$
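As a small numerical illustration of these difference quotients (ours, not the paper's), take the max function $f(x) = \max(x, -2x)$ on $\mathbb{R}$: at $x = 0$ the quotients $\Delta_{0,t} f$ already coincide, for every $t > 0$, with the positively homogeneous limit $f'_0(\xi) = \max(\xi, -2\xi)$.

```python
# Sketch (not from the paper): first-order difference quotients
#   Delta_{x,t} f (xi) = [f(x + t*xi) - f(x)] / t
# for f(x) = max(x, -2x) at x = 0.
def f(x):
    return max(x, -2 * x)

def diff_quot(x, xi, t):
    return (f(x + t * xi) - f(x)) / t

# Since f is positively homogeneous, Delta_{0,t} f (xi) = max(xi, -2*xi)
# independently of t, which is exactly the epi-derivative f'_0(xi).
for xi in (1.0, -1.0, 0.5):
    print(xi, [diff_quot(0.0, xi, t) for t in (1.0, 0.1, 0.01)])
```
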
Classical differentiability of $f$ at $x$ can be identified with the case where, as $t \searrow 0$, the functions $\Delta_{x,t} f$ converge pointwise, uniformly on all bounded sets, to some linear function. Such uniform convergence, even if to a possibly nonlinear function, is too narrow an idea, though, to serve when $f$ is extended-real-valued, as we wish it to be here in harmony with our mode of handling constraints. A substitute notion with many interesting ramifications can be based instead on epi-convergence of functions, which expresses set convergence of their epigraphs. We say that $f$ is epi-differentiable at $x$ if, as $t \searrow 0$, the functions $\Delta_{x,t} f$ epi-converge to a proper function $h$; such a limit function need not be linear but must of necessity be lsc and positively homogeneous. Then $h$ is the first-order epi-derivative function for $f$ at $x$ and is denoted by $f'_x$. The property of epi-convergence translates into having, for each choice of a sequence $t^\nu \searrow 0$ and a vector $\xi$, that

$$\liminf_\nu \Delta_{x,t^\nu} f(\xi^\nu) \ge f'_x(\xi) \quad \text{for every sequence } \xi^\nu \to \xi,$$
$$\limsup_\nu \Delta_{x,t^\nu} f(\xi^\nu) \le f'_x(\xi) \quad \text{for some sequence } \xi^\nu \to \xi.$$
We say further that $f$ is strictly epi-differentiable at $\bar x$ if, not only as $t \searrow 0$ but as $x \to \bar x$ with $f(x) \to f(\bar x)$, the functions $\Delta_{x,t} f$ epi-converge (the limit in this wider sense necessarily still being the function $f'_{\bar x}$).

Theorem 3.1. Let $f$ be the essential objective function in problem (V), with $f = g \circ F$ for a $C^2$ mapping $F$ and a polyhedral function $g$. Let $\bar x$ be any feasible solution to (V) at which condition (CQ) holds. Then at all feasible solutions $x$ in some neighborhood of $\bar x$, $f$ is epi-differentiable at $x$ and has at least one subgradient there as well, the subgradients being characterized as the vectors $v$ such that

$$f(x') \ge f(x) + \langle v, x' - x\rangle + o(|x' - x|).$$

The epi-derivative function $f'_x$ is convex and positively homogeneous, the subgradient set $\partial f(x)$ is convex and closed, and the two are related by

$$f'_x(\xi) = \sup_{v \in \partial f(x)} \langle v, \xi\rangle, \qquad \partial f(x) = \bigl\{v \,\big|\, f'_x(\xi) \ge \langle v, \xi\rangle \text{ for all } \xi\bigr\}.$$
Furthermore, these epi-derivative functions and subgradient sets are obtained from those for $g$ by the formulas

$$f'_x(\xi) = g'_{F(x)}\bigl(\nabla F(x)\xi\bigr), \qquad \partial f(x) = \bigl\{\nabla(yF)(x) \,\big|\, y \in \partial g\bigl(F(x)\bigr)\bigr\}.$$

In addition, there is a neighborhood $U$ of $\bar x$ such that, relative to $U \times \mathbb{R}^n$, the set of points $(x,v)$ with $x \in U$ and $v \in \partial f(x)$ is closed, and relative to this set, the mapping

$$(x,v) \mapsto Y(x,v) := \bigl\{y \,\big|\, \nabla(yF)(x) = v,\; y \in \partial g\bigl(F(x)\bigr)\bigr\},$$
is locally bounded with closed graph, hence in particular compact-valued, while the function $(x,v) \mapsto f(x)$ is continuous.
Proof. From Proposition 2.6 we know that $f$ is fully amenable at every point $x \in C$ near enough to $\bar x$. All these properties, except for the very last (concerning continuity of $f$), are already understood to hold for any amenable function; see [19], [22]. Really, they only need $F$ to be $C^1$ and $g$ to be lsc, proper, convex. The last property has been established in [14, Prop. 2.5] in the name of strongly amenable functions, but again the proof only requires amenability. □

Moving to second-order concepts, we work with second-order difference quotient functions which depend not only on a point $x$ where $f$ is finite but also on the choice of a subgradient $v \in \partial f(x)$, namely the functions

$$\Delta^2_{x,v,t} f : \xi \mapsto \bigl[f(x + t\xi) - f(x) - t\langle v, \xi\rangle\bigr] \big/ \tfrac12 t^2 \quad \text{for } t > 0.$$
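For a smooth quadratic these second-order quotients are exact for every $t$; the following sketch (ours, not the paper's) checks the normalization by $\tfrac12 t^2$ numerically on a made-up example.

```python
# Sketch (not from the paper): second-order difference quotient
#   Delta2_{x,v,t} f (xi) = [f(x + t*xi) - f(x) - t*<v, xi>] / ((1/2)*t**2)
# for the smooth case f(x1,x2) = x1**2 + 3*x2**2 with v = grad f(x).
def f(x):
    return x[0] ** 2 + 3 * x[1] ** 2

def grad_f(x):
    return (2 * x[0], 6 * x[1])

def second_quot(x, v, xi, t):
    xt = (x[0] + t * xi[0], x[1] + t * xi[1])
    inner = v[0] * xi[0] + v[1] * xi[1]
    return (f(xt) - f(x) - t * inner) / (0.5 * t ** 2)

x = (1.0, -2.0)
v = grad_f(x)
xi = (0.5, 1.0)
# For a quadratic the quotient equals <xi, Hess f(x) xi> = 2*0.5**2 + 6*1**2 = 6.5
# no matter how small t is taken.
print([second_quot(x, v, xi, t) for t in (0.1, 0.01)])
```
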
We say that $f$ is twice epi-differentiable at $x$ for a vector $v$ if $f(x)$ is finite, $v \in \partial f(x)$, and the functions $\Delta^2_{x,v,t} f$ epi-converge to a proper function as $t \searrow 0$. The limit is then the second epi-derivative function $f''_{x,v} : \mathbb{R}^n \to \overline{\mathbb{R}}$; see [12], [19] and [21]. When $\partial f(x)$ is a singleton consisting of $v$ alone, the notation $f''_{x,v}$ can be simplified to $f''_x$. The second epi-derivative function, when it exists, has to be lsc and positively homogeneous of degree 2, although not necessarily quadratic. Further, we call $f$ strictly twice epi-differentiable at $\bar x$ for $\bar v$ if the stronger property holds that the functions $\Delta^2_{x,v,t} f$ epi-converge as $t \searrow 0$, $x \to \bar x$ with $f(x) \to f(\bar x)$, and $v \to \bar v$ with $v \in \partial f(x)$.

It is important to appreciate that, because it is defined in terms of epi-convergence, second-order epi-differentiability is essentially a geometric property of approximation of epigraphs. This kind of approximation differs in general from the classical kind of approximation expressed by uniform convergence of functions on bounded sets, although key relationships can be detected in special situations. Such uniform convergence is not a viable concept for broad use in an environment like ours here. Circumstances where it does nicely come into play will be identified in Sections 4 and 5, where second-order "expansions" of $f$ and its envelopes $e_\lambda$ will be considered. For now, $f''_{x,v}$ has to be thought of as providing a second-order approximation

$$f(x + t\xi) \approx f(x) + t\langle v, \xi\rangle + \tfrac12 t^2 f''_{x,v}(\xi),$$

not in the usual sense of local uniformity, but in the sense of closeness of $\operatorname{epi} \Delta^2_{x,v,t} f$ to $\operatorname{epi} f''_{x,v}$.

A rather remarkable fact about second-order epi-differentiability was established in [19]: when $f$ is fully amenable at $x$, it is twice epi-differentiable there for every $v \in \partial f(x)$. The widespread availability of this property in the context of optimization is what makes it especially interesting. Before looking at what the theory of second epi-derivatives tells us about the class of problems under consideration, we look at a parallel concept which turns out to be closely connected with this.
Second-order differentiation of $f$ can be contemplated also in the framework of first-order differentiation of the subgradient mapping $\partial f : \mathbb{R}^n \rightrightarrows \mathbb{R}^n$ (where $\partial f(x)$ is always regarded as the empty set when $f(x)$ is not finite). For any set-valued mapping $T : \mathbb{R}^n \rightrightarrows \mathbb{R}^m$, one can work with difference quotient mappings $\Delta_{x,v,t} T : \mathbb{R}^n \rightrightarrows \mathbb{R}^m$ associated with pairs $(x,v)$ in the graph of $T$, namely

$$\Delta_{x,v,t} T : \xi \mapsto \bigl[T(x + t\xi) - v\bigr]/t \quad \text{for } t > 0.$$

The mapping $T$ is said to be proto-differentiable at $x$ for $v$ if $v \in T(x)$ and the mappings $\Delta_{x,v,t} T$ converge graphically as $t \searrow 0$, in which event the limit mapping is denoted by $T'_{x,v}$ and called the proto-derivative of $T$ at $x$ for $v$; see [13], [20], [22]. (Graph convergence of these mappings refers to the convergence of their graphs as subsets of $\mathbb{R}^n \times \mathbb{R}^m$.) We say that $T$ is strictly proto-differentiable at $\bar x$ for $\bar v$ if in fact the mappings $\Delta_{x,v,t} T$ converge graphically to $T'_{\bar x,\bar v}$ as $t \searrow 0$ and $(x,v) \to (\bar x,\bar v)$ with $v \in T(x)$.

Again, a geometric notion of approximation is invoked. We have

$$T(x + t\xi) \approx T(x) + t\,T'_{x,v}(\xi),$$

not with respect to some kind of uniform local bound on the difference, but in the sense that the graph of $\Delta_{x,v,t} T$ can be made arbitrarily close to the graph of $T'_{x,v}$ (relative to the concepts of set convergence appropriate for unbounded sets) by taking the parameter $t > 0$ sufficiently small. The mapping $T'_{\bar x,\bar v}$ assigns to each $\xi \in \mathbb{R}^n$ a subset $T'_{\bar x,\bar v}(\xi)$ of $\mathbb{R}^m$, which could be empty for some choices of $\xi$. When $T(x)$ is a singleton consisting of $v$ only (as for instance in the case where $T$ is actually single-valued everywhere), the notation $T'_{x,v}(\xi)$ can be simplified to $T'_x(\xi)$.

In stating the next theorem, we continue the notation introduced in advance of Definition 2.5 by writing $\nabla^2(yF)$ for the matrix of second partial derivatives of the function $yF : x \mapsto \langle y, F(x)\rangle$. Then

$$\nabla^2(yF)(x) = y_1 \nabla^2 f_1(x) + \cdots + y_m \nabla^2 f_m(x)$$

when $F = (f_1,\dots,f_m)$ and $y = (y_1,\dots,y_m)$.
Theorem 3.2. Let $f$ be the essential objective function in problem (V), with $f = g \circ F$ for a $C^2$ mapping $F$ and a polyhedral function $g$. Let $\bar x$ be any feasible solution to (V) at which condition (CQ) holds. Then at all feasible solutions $x$ in some neighborhood of $\bar x$, and for all subgradients $v \in \partial f(x)$,

(a) $f$ is twice epi-differentiable at $x$ for $v$,

(b) $\partial f$ is proto-differentiable at $x$ for $v$,

and the second epi-derivative function $f''_{x,v}$ and proto-derivative mapping $(\partial f)'_{x,v}$ are related to each other by

$$(\partial f)'_{x,v} = \partial\bigl(\tfrac12 f''_{x,v}\bigr).$$

Furthermore, one has the formula

$$f''_{x,v}(\xi) = \begin{cases} \displaystyle\max_{y \in Y(x,v)} \langle \xi, \nabla^2(yF)(x)\xi\rangle & \text{if } \xi \in E(x,v),\\ \infty & \text{if } \xi \notin E(x,v), \end{cases}$$

where $Y(x,v)$ is the compact subset of $\mathbb{R}^m$ defined in Theorem 3.1 and $E(x,v)$ is the closed cone in $\mathbb{R}^n$ defined by

$$E(x,v) = N_{\partial f(x)}(v) = \bigl\{\xi \,\big|\, f'_x(\xi) = \langle v, \xi\rangle\bigr\} = \bigl\{\xi \,\big|\, f'_x(\xi) \le \langle v, \xi\rangle\bigr\}.$$
In fact $Y(x,v)$ and $E(x,v)$ are polyhedral, and in terms of the finite set $Y_{\rm ext}(x,v)$ consisting of the extreme points of $Y(x,v)$ the second-order epi-derivative formula can be written as

$$f''_{x,v}(\xi) = \begin{cases} \displaystyle\max_{y \in Y_{\rm ext}(x,v)} \langle \xi, \nabla^2(yF)(x)\xi\rangle & \text{if } \langle \nabla(yF)(x) - v, \xi\rangle \le 0 \text{ for all } y \in Y(x,v),\\ \infty & \text{otherwise.} \end{cases}$$
Proof. Once more we appeal to Proposition 2.6 for the observation that our hypothesis implies $f$ is fully amenable at points $x$ near enough to $\bar x$ with $f(x)$ finite. Then we apply the twice epi-differentiability result and formula of [19] with the proto-differentiability result and formula of [10]. This, in combination with the results in Theorem 3.1, takes care of all the assertions except those at the end relying on the polyhedral nature of $Y(x,v)$. The fact that $Y(x,v)$ is polyhedral is obvious from its definition in Theorem 3.1 as the set of vectors $y \in \partial g\bigl(F(x)\bigr)$ satisfying the linear equation $\nabla F(x)^T y = v$, since the subgradient set $\partial g\bigl(F(x)\bigr)$ is itself polyhedral (due to $g$ being polyhedral). Indeed, this has previously been observed in [11], [13]. For any fixed vector $\xi$ the function $y \mapsto \langle \xi, \nabla^2(yF)(x)\xi\rangle$ is linear in $y$, so its maximum over $Y(x,v)$ has to be attained at one of the finitely many points of $Y_{\rm ext}(x,v)$. Because the set $\partial f(x)$ is the image of the polyhedral set $\partial g\bigl(F(x)\bigr)$ under the linear mapping $y \mapsto \nabla(yF)(x) = \nabla F(x)^T y$, it is polyhedral as well. Then $E(x,v)$ must be polyhedral, since it is the normal cone to $\partial f(x)$ at $v$. The definition of this normal cone characterizes $E(x,v)$ as consisting of the vectors $\xi$ such that $\langle v' - v, \xi\rangle \le 0$ for all $v' \in \partial f(x)$. Hence it consists of all $\xi$ such that $\langle \nabla(yF)(x) - v, \xi\rangle \le 0$ for all $y \in \partial g\bigl(F(x)\bigr)$. □

The last part of Theorem 3.2 reveals interestingly enough that the second epi-derivative function $f''_{x,v}$ has the same character as that ascribed to $f$ itself in Theorem 2.7, although simpler. It is the max of finitely many $C^2$ (actually quadratic) functions plus the indicator of a set defined by finitely many $C^2$ (actually linear) constraints. Note again that just because we know that a set can in principle be expressed in terms of such constraints, this does not mean we can readily make use of such an
expression. To write $E(x,v)$ in terms of a finite system of linear constraints we would have to identify all extreme points and extreme rays of $\partial g\bigl(F(x)\bigr)$. Depending on the circumstances, this might or might not be easy. Additional formulas for the proto-derivative mapping $(\partial f)'_{x,v}$ can be developed from this description of $f''_{x,v}$ by following the lines in [13].

To see more closely what the results in Theorems 3.1 and 3.2 mean in common situations, we focus on two key cases, the ones in Examples 2.2 and 2.3 (as extended in Examples 2.2' and 2.3').

Example 3.3 [13, Thm. 2]. In the problem of Example 2.2, consider any $x \in \mathbb{R}^n$ and let $I(x)$ denote the set of indices $i$ such that $f_i(x) = f(x)$. Then $f$ is epi-differentiable at $x$ and has at least one subgradient there, with

$$\partial f(x) = \operatorname{co}\bigl\{\nabla f_i(x) \,\big|\, i \in I(x)\bigr\}, \qquad f'_x(\xi) = \max_{i \in I(x)} \langle \nabla f_i(x), \xi\rangle.$$
Moreover, $f$ is twice epi-differentiable at $x$ for any subgradient $v \in \partial f(x)$, with the second-order epi-derivative function given by

$$f''_{x,v}(\xi) = \begin{cases} \displaystyle\max_{y \in Y_{\rm ext}(x,v)} \sum_{i=1}^m y_i \langle \xi, \nabla^2 f_i(x)\xi\rangle & \text{if } \langle \nabla f_i(x) - v, \xi\rangle \le 0 \text{ for all } i \in I(x),\\ \infty & \text{otherwise,} \end{cases}$$

where $Y_{\rm ext}(x,v)$ is the finite set of extreme points of the compact polyhedral set

$$Y(x,v) := \Bigl\{y \;\Big|\; y_i \ge 0 \text{ if } i \in I(x),\; y_i = 0 \text{ if } i \notin I(x),\; \sum_{i=1}^m y_i = 1,\; \sum_{i=1}^m y_i \nabla f_i(x) = v\Bigr\}.$$
Here, for the closed intervals $I_i = [a_i, b_i]$ of Example 2.3',

$$N_{I_i}\bigl(f_i(x)\bigr) = \begin{cases} [0,\infty) & \text{when } a_i < f_i(x) = b_i,\\ (-\infty,0] & \text{when } a_i = f_i(x) < b_i,\\ [0,0] & \text{when } a_i < f_i(x) < b_i,\\ (-\infty,\infty) & \text{when } a_i = f_i(x) = b_i. \end{cases}$$
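These one-dimensional normal cones are simple enough to encode directly; the following sketch (ours, not from the paper) returns each cone as a pair of endpoints.

```python
# Sketch (not from the paper): the normal cone to a closed interval I = [a, b]
# at a point u in I, returned as a pair (lo, hi) of cone endpoints.
import math

def normal_cone(a, b, u):
    assert a <= u <= b
    lo = -math.inf if u == a else 0.0
    hi = math.inf if u == b else 0.0
    return (lo, hi)

print(normal_cone(0.0, 1.0, 1.0))   # (0.0, inf):   upper bound active
print(normal_cone(0.0, 1.0, 0.0))   # (-inf, 0.0):  lower bound active
print(normal_cone(0.0, 1.0, 0.5))   # (0.0, 0.0):   interior point
print(normal_cone(1.0, 1.0, 1.0))   # (-inf, inf):  degenerate interval
```
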
Example 3.4 [13, Thm. 4]. For the problem of Example 2.3, consider any $\bar x \in C$ satisfying the constraint qualification described in Example 2.3'. Let

$$L(x,y) = f_0(x) + y_1 f_1(x) + \cdots + y_m f_m(x).$$

For all $x \in C$ in some neighborhood of $\bar x$, $f$ is epi-differentiable at $x$ and has at least one subgradient there, with

$$\partial f(x) = \nabla f_0(x) + N_C(x) = \bigl\{\nabla_x L(x,y) \,\big|\, y_i \in N_{I_i}\bigl(f_i(x)\bigr)\bigr\} + N_X(x),$$

$$f'_x(\xi) = \begin{cases} \langle \nabla f_0(x), \xi\rangle & \text{if } \xi \in T_X(x) \text{ and } \langle \nabla f_i(x), \xi\rangle \in T_{I_i}\bigl(f_i(x)\bigr) \text{ for all } i,\\ \infty & \text{otherwise.} \end{cases}$$
Moreover, $f$ is twice epi-differentiable at $x$ for every subgradient $v \in \partial f(x)$, with the second-order epi-derivative function given in terms of the Lagrangian $L$ by

$$f''_{x,v}(\xi) = \max_{y \in Y(x,v)} \langle \xi, \nabla^2_{xx} L(x,y)\xi\rangle + \delta_{E(x,v)}(\xi),$$

where $Y(x,v)$ is a compact polyhedral set and $E(x,v)$ is a polyhedral cone, namely

$$Y(x,v) = \bigl\{y \,\big|\, y_i \in N_{I_i}\bigl(f_i(x)\bigr),\; v - \nabla_x L(x,y) \in N_X(x)\bigr\},$$

$$E(x,v) = \bigl\{\xi \in T_C(x) \,\big|\, \langle v - \nabla f_0(x), \xi\rangle = 0\bigr\} = \bigl\{\xi \in T_X(x) \,\big|\, \langle \nabla f_i(x), \xi\rangle \in T_{I_i}\bigl(f_i(x)\bigr) \text{ for all } i,\; \langle v - \nabla f_0(x), \xi\rangle = 0\bigr\}.$$

Here $Y(x,v)$ can be replaced in the max expression by its finite set of extreme points.
Here Y(x, v) can be replaced in the max expression by its finite set of extreme points. In Example 3.4 the function / 0 has been assumed to be C2, but the methodology is not limited to that case. We could easily go further by taking / = f0 -f Sc with the set C chosen according to the specifications in Example 2.3, but with f0 taken to be any fully amenable function. In particular, /o could be a max function of the kind in Examples 2.1 and 3.3, hence nonsmooth. This generality is attained through the calculus we have developed in [12], which provides formulas for /"„(£) and (df)'x „(£) when / is expressed as the sum of two fully amenable functions under an associated "constraint qualification" on the domains of the functions. For f = f0 + £c this constraint qualification is satisfied in particular when f0 is finite everywhere, as in the max function case. Then df(x) = df0(x) + Nc(x), and for any v G 9f(x) one has in terms of the set V(x,v)
:= {(«D,«I)|VO 6 9f0(x),
vz e Nc(x),
v0 + i>i = t>}
the expressions £.({)=
wuo=
max
{(/o): i 1 w (0 + ( * o ) : « ( f ) } ,
u (vo,«l)eViri»«(l.l'.0
fMlj« + WU«)}. '
where $V_{\max}(x,v,\xi)$ is the set of vectors $(v_0,v_1)$ that achieve the maximum. The problem in Example 2.4 could likewise be handled by such calculus or tackled directly through Theorems 3.1 and 3.2.

The first- and second-order epi-derivatives that have been shown to exist for the general problems in composite format we have been considering can be employed in particular in the statement of optimality conditions.

Theorem 3.5. Let $f$ be the essential objective function in problem (V), with $f = g \circ F$ for a $C^2$ mapping $F$ and a polyhedral function $g$. Let $\bar x$ be any feasible solution at which the condition (CQ) is satisfied. Let

$$Y(\bar x, 0) := \bigl\{y \in \partial g\bigl(F(\bar x)\bigr) \,\big|\, \nabla(yF)(\bar x) = 0\bigr\},$$

this being a compact, polyhedral convex set (possibly empty), and let $Y_{\rm ext}(\bar x,0)$ be its finite set of extreme points.

(a) If $\bar x$ is locally optimal, then $Y(\bar x,0)$ must be nonempty, and

$$\max_{y \in Y_{\rm ext}(\bar x,0)} \langle \xi, \nabla^2(yF)(\bar x)\xi\rangle \ge 0 \quad \text{for } \xi \text{ satisfying } \langle \nabla(yF)(\bar x), \xi\rangle \le 0 \text{ for all } y \in \partial g\bigl(F(\bar x)\bigr).$$

(b) If $Y(\bar x,0)$ is nonempty and

$$\max_{y \in Y_{\rm ext}(\bar x,0)} \langle \xi, \nabla^2(yF)(\bar x)\xi\rangle > 0 \quad \text{for } \xi \ne 0 \text{ satisfying } \langle \nabla(yF)(\bar x), \xi\rangle \le 0 \text{ for all } y \in \partial g\bigl(F(\bar x)\bigr),$$

then $\bar x$ is locally optimal.

Proof. This applies the formulas of Theorems 3.1 and 3.2 to the general characterization of local optimality in terms of first- and second-order epi-derivatives in [19]. □
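As a toy numerical illustration of part (b) (ours, not from the paper), take $f = \max(f_1, f_2)$ with $f_1(x) = x_1^2 - x_2$ and $f_2(x) = x_1^2 + x_2$ at $\bar x = (0,0)$: then $Y(\bar x,0) = \{(\tfrac12,\tfrac12)\}$ is nonempty, the critical cone is $\{\xi \mid \xi_2 = 0\}$, and on it $\langle \xi, \nabla^2(yF)(\bar x)\xi\rangle = 2\xi_1^2 > 0$ for $\xi \ne 0$, so Theorem 3.5(b) predicts local optimality; sampling confirms it.

```python
# Sketch (not from the paper): sampling check that xbar = (0,0) is a local
# minimizer of f(x) = max(x1**2 - x2, x1**2 + x2) = x1**2 + abs(x2),
# as the sufficient condition of Theorem 3.5(b) predicts.
import itertools

def f(x1, x2):
    return max(x1 ** 2 - x2, x1 ** 2 + x2)

samples = [f(0.01 * i, 0.01 * j)
           for i, j in itertools.product(range(-5, 6), repeat=2)
           if (i, j) != (0, 0)]
print(f(0.0, 0.0), min(samples))  # 0.0, and every sampled neighbor is worse
```
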
4  Hessians and Second-order Expansions
Pursuing second-order properties to a greater depth, we turn to the question of the existence of second-order expansions for / in the sense of locally uniform convergence of difference quotient functions rather than the epi-convergence employed so far. In this endeavor we draw on results from our paper [15]. Two definitions from this paper set the stage. Definition 4.1. A single-valued mapping G from an open neighborhood of x 6 IRn into Mm has a first-order expansion at a point x £ O if there is a continuous mapping D such the difference quotient mappings AttiG : [G{x + tO - G(x)\/t for i > 0
converge to $D$ uniformly on bounded sets as $t \searrow 0$. The expansion is strict if actually the mappings

$$\Delta_{x,t} G : \xi \mapsto \bigl[G(x + t\xi) - G(x)\bigr]/t \quad \text{for } t > 0$$

converge to $D$ uniformly on bounded sets as $t \searrow 0$ and $x \to \bar x$.

The existence of a first-order expansion means that $G$ is directionally differentiable at $\bar x$: for every vector $\xi \in \mathbb{R}^n$, the directional derivative limit

$$\lim_{t \searrow 0} \frac{G(\bar x + t\xi) - G(\bar x)}{t}$$

exists. The existence of a strict first-order expansion means that $G$ is strictly directionally differentiable at $\bar x$; it corresponds to the existence for every $\xi$ of the more complicated limit where $\bar x$ is replaced by $x$, and $x \to \bar x$ along with $\xi' \to \xi$ and $t \searrow 0$. In both cases the mapping $D$ in Definition 4.1 gives for each $\xi$ the directional derivative $D(\xi)$.

Definition 4.2. Consider a function $g$ on $\mathbb{R}^n$ and a point $x$ where $g$ is finite and differentiable.

(a) $g$ has a second-order expansion at $x$ if there is a finite, continuous function $h$ such that the second-order difference quotient functions

$$\Delta^2_{x,t} g(\xi) := \bigl[g(x + t\xi) - g(x) - t\langle \nabla g(x), \xi\rangle\bigr] \big/ \tfrac12 t^2$$

converge to $h$ uniformly on bounded sets as $t \searrow 0$. The expansion is strict if $g$ is differentiable not only at $x$ but on a neighborhood of $x$, and the functions

$$\Delta^2_{x',t} g(\xi) := \bigl[g(x' + t\xi) - g(x') - t\langle \nabla g(x'), \xi\rangle\bigr] \big/ \tfrac12 t^2$$

converge to $h$ uniformly on bounded sets as $t \searrow 0$ and $x' \to x$.

(b) $g$ has a Hessian matrix $H$ at $x$, this being a symmetric $n \times n$ matrix, if $g$ has a second-order expansion with $h(\xi) = \langle \xi, H\xi\rangle$. The Hessian is strict if the expansion is strict.

(c) $g$ is twice differentiable at $x$ if its first partial derivatives exist on a neighborhood of $x$ and are themselves differentiable at $x$, i.e., the second partial derivatives of $g$ exist at $x$. Then $\nabla^2 g(x)$ denotes the matrix formed by these second partial derivatives.

A second-order expansion in the sense of Definition 4.2 automatically requires the function $h$ also to be positively homogeneous of degree 2: $h(\lambda\xi) = \lambda^2 h(\xi)$ for $\lambda > 0$, and in particular $h(0) = 0$. It means that

$$g(x + t\xi) = g(x) + t\langle \nabla g(x), \xi\rangle + \tfrac12 t^2 h(\xi) + o\bigl(t^2|\xi|^2\bigr)$$

for such a function $h$ that is finite and continuous. The existence of a Hessian corresponds to $h$ actually being quadratic. The existence of a second-order expansion for an essential function $f$ can be settled in a definitive manner on the basis of the second-order epi-derivative formula in
Theorem 3.2 and a general result in our paper [14]. It is crucial for this purpose that strongly amenable functions $f$, such as we know we are dealing with now by virtue of Proposition 2.6, have a property called "prox-regularity," which we introduced in [14] (cf. Prop. 2.5 of that paper). This property is a typical hypothesis for most of the results of [14] and [15] that will be applied in what follows. Here we leave all discussion of it aside, jumping directly to the conclusions it supports.

Theorem 4.3. Let $f$ be the essential objective function in problem (V), with $f = g \circ F$ for a $C^2$ mapping $F$ and a polyhedral function $g$. Let $\bar x$ be any point of the feasible set $C = \operatorname{dom} f$ at which the condition (CQ) holds. Then for all $x$ sufficiently close to $\bar x$ with $f(x)$ finite, the following properties are equivalent:

(a) $f$ has a second-order expansion at $x$;

(b) $f$ is differentiable at $x$;

(c) $\partial f(x)$ contains a solitary vector $v$;

(d) $\nabla(yF)(x)$ is the same vector $v$ for all $y \in \partial g\bigl(F(x)\bigr)$;

(e) $(\partial f)'_{x,v}(0) = \{0\}$ for some $v$.

Under these circumstances necessarily $x \in \operatorname{int} C$ and $\nabla f(x) = v$, and the expansion of $f$ takes the form

$$f(x + t\xi) = f(x) + t\langle v, \xi\rangle + \tfrac12 t^2 \max_{y \in Y_{\rm ext}(x)} \langle \xi, \nabla^2(yF)(x)\xi\rangle + o\bigl(t^2|\xi|^2\bigr),$$

where $Y_{\rm ext}(x)$ is the set of extreme points of the compact, polyhedral convex set $\partial g\bigl(F(x)\bigr)$, and the max expression also equals $f''_{x,v}(\xi)$.

Proof. In view of Proposition 2.6, condition (CQ) extends from $\bar x$ to all points $x$ sufficiently near to $\bar x$ with $f(x)$ finite. It suffices therefore to argue the equivalences just at $\bar x$ itself. If a second-order expansion exists at $\bar x$, $f$ must in particular be differentiable at $\bar x$, and the function $h$ expressing the second-order term must be the second epi-derivative function $f''_{\bar x,\bar v}$, inasmuch as locally uniform convergence of difference quotient functions implies their epi-convergence. Conversely, if for any $\bar v \in \partial f(\bar x)$ the function $f''_{\bar x,\bar v}$ is finite, we obtain from [14, Thm. 6.7] (through the prox-regularity of $f$ mentioned prior to the statement of the present theorem) that (b) and (c) hold with $\nabla f(\bar x) = \bar v$, and moreover that (a) holds with the second-order term in the expansion dictated by $h = f''_{\bar x,\bar v}$. At this juncture we can apply the formula for $f''_{\bar x,\bar v}$ in Theorem 3.2, which yields all the rest. In particular, (e) is obtained as an equivalent condition because $(\partial f)'_{\bar x,\bar v}(0)$ consists of the subgradients of $\tfrac12 f''_{\bar x,\bar v}$ at $0$. The subgradient formula for this function (cf. [13]) indicates that the unique subgradient at the origin is $0$ if and only if the cone $E(\bar x,\bar v)$, which is the effective domain of $f''_{\bar x,\bar v}$, has the origin in its interior, i.e., this cone is the whole space. □

When does the expansion in Theorem 4.3 correspond actually to a Hessian for $f$ at $x$? The following lemma will help answer this and a subsequent question as well.
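A minimal instance of the equivalences in Theorem 4.3 (ours, not from the paper): $f(x) = \max(x^2, 2x^2)$ at $x = 0$. Both pieces are active, so $\partial g\bigl(F(0)\bigr)$ is the whole unit simplex, yet $\nabla(yF)(0) = 0$ for every $y$; thus (d) holds with $v = 0$, $f$ is differentiable at $0$, and the displayed expansion is exact with $\max_{y \in Y_{\rm ext}(0)} \langle \xi, \nabla^2(yF)(0)\xi\rangle = \max(2\xi^2, 4\xi^2)$.

```python
# Sketch (not from the paper): the second-order expansion of Theorem 4.3 for
# f(x) = max(x**2, 2*x**2) at xbar = 0, where v = 0 and the extreme
# multipliers (1,0), (0,1) give the Hessian forms 2*xi**2 and 4*xi**2.
def f(x):
    return max(x ** 2, 2 * x ** 2)

def expansion(t, xi):
    # f(0) = 0 and v = 0, so only the second-order term survives
    return 0.5 * t ** 2 * max(2 * xi ** 2, 4 * xi ** 2)

t, xi = 0.01, 1.5
print(f(t * xi), expansion(t, xi))  # the two values agree up to roundoff
```
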
Lemma 4.4. Let $Q_i$, $i = 0,1,\dots,m$, be symmetric matrices in $\mathbb{R}^{n \times n}$, and let $M$ be any subspace of $\mathbb{R}^n$ (perhaps $\mathbb{R}^n$ itself). Then in order to have the property

$$\max_{i=1,\dots,m} \langle \xi, Q_i\xi\rangle = \langle \xi, Q_0\xi\rangle \quad \text{for all } \xi \in M,$$

there must actually be an index $i_0 \in \{1,\dots,m\}$ such that the quadratic forms associated with $Q_{i_0}$ and $Q_0$ agree on $M$. In other words, there must exist $i_0$ such that

$$i_0 \in \operatorname*{argmax}_{i=1,\dots,m} \langle \xi, Q_i\xi\rangle \quad \text{for all } \xi \in M.$$
Proof. We may assume without loss of generality that $M = \mathbb{R}^n$, since otherwise a change of coordinates can be employed to bring about a reduction to a space $\mathbb{R}^d$ with $d < n$. For each $i \in \{1,\dots,m\}$ let $C_i$ denote the closed subset of $\mathbb{R}^n$ consisting of the points where index $i$ gives the max, i.e., where the quadratic function $q_i$ associated with $Q_i$ agrees with the quadratic function $q_0$ associated with $Q_0$. The union of these sets $C_i$ is $\mathbb{R}^n$. By suppressing indices one by one as needed, we can come up with a collection indexed by $i \in I \subset \{1,\dots,m\}$ such that the union of the $C_i$'s for $i \in I$ is all of $\mathbb{R}^n$, but no subcollection has this property. Then every $C_i$ for $i \in I$ must have nonempty interior, because it covers the complement of the (closed) union of all the other sets in this collection. The fact that $q_i$ agrees with $q_0$ on the nonempty, open set $\operatorname{int} C_i$ implies $Q_i = Q_0$ (e.g., because the two functions $q_i$ and $q_0$ have the same second derivatives there). Hence we have $Q_i = Q_0$ for all $i \in I$. □

Theorem 4.5. Let $f$ be the essential objective function in problem (V), with $f = g \circ F$ for a $C^2$ mapping $F$ and a polyhedral function $g$. Let $\bar x$ be any point of the feasible set $C = \operatorname{dom} f$ at which the condition (CQ) holds. Then for all $x$ sufficiently close to $\bar x$ with $f(x)$ finite, the following properties are equivalent:

(a) $f$ has a Hessian at $x$;

(b) $f$ is differentiable at $x$, and the function $f''_x$ is quadratic;

(c) $\partial f(x)$ is a singleton, and $(\partial f)'_x$ is single-valued everywhere and linear;

(d) there is a vector $\bar y \in Y_{\rm ext}(x)$ such that, for every $y \in Y_{\rm ext}(x)$, one has both $\nabla([\bar y - y]F)(x) = 0$ and $\nabla^2([\bar y - y]F)(x)$ positive semidefinite.

Proof. The equivalence between (a), (b) and (d) is immediate from Theorem 4.3 and Lemma 4.4. Condition (c) comes into the picture because $(\partial f)'_x$ is the subgradient mapping for $\tfrac12 f''_x$ by Theorem 3.2, so it is linear if and only if $f''_x$ is quadratic. □

These results make clear that the existence of a Hessian for $f$ is quite a special property in our context. It corresponds to $f''_{x,v}$ being quadratic with $v$ the unique element of $\partial f(x)$, and that only shows up in cases where constraints and first-order discontinuities are out of the immediate picture. However, there is an interesting concept to fall back on, which operates in wider territory.
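A finite-dimensional illustration of Lemma 4.4 (ours, with made-up matrices, not from the paper): with $M = \operatorname{span}\{e_1\}$, $Q_1 = \operatorname{diag}(1,5)$, $Q_2 = \operatorname{diag}(2,-3)$ and $Q_0 = \operatorname{diag}(2,0)$, the max of the first two forms equals the $Q_0$-form on $M$, and indeed the single index $i_0 = 2$ attains the max at every $\xi \in M$.

```python
# Sketch (not from the paper): Lemma 4.4 on M = span{e1} with diagonal matrices.
def q(Q, xi):
    return Q[0] * xi[0] ** 2 + Q[1] * xi[1] ** 2  # quadratic form of diag(Q)

Q1, Q2, Q0 = (1.0, 5.0), (2.0, -3.0), (2.0, 0.0)
pts = [(0.25 * k, 0.0) for k in range(-8, 9)]  # sample points of M

max_equals_q0 = all(max(q(Q1, p), q(Q2, p)) == q(Q0, p) for p in pts)
index2_always_max = all(q(Q2, p) >= q(Q1, p) for p in pts)
print(max_equals_q0, index2_always_max)  # True True
```
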
Recall that a function $h : \mathbb{R}^n \to \overline{\mathbb{R}}$ is a generalized (purely) quadratic function if it is expressible in the form

$$h(\xi) = \begin{cases} \langle \xi, Q\xi\rangle & \text{if } \xi \in M,\\ \infty & \text{if } \xi \notin M, \end{cases}$$

where $M$ is a linear subspace of $\mathbb{R}^n$ and $Q$ is a symmetric matrix in $\mathbb{R}^{n \times n}$. On the other hand, a possibly set-valued mapping $D : \mathbb{R}^n \rightrightarrows \mathbb{R}^m$ is a generalized linear mapping if its graph is a linear subspace of $\mathbb{R}^n \times \mathbb{R}^m$. The generalized quadratic functions are known to be precisely (up to an additive constant) the functions whose subgradient mappings are generalized linear mappings.

Let us think of $f$ as having a generalized Hessian at $x$ relative to a subgradient $v \in \partial f(x)$ if the second-order epi-derivative function $f''_{x,v}$ exists and is a generalized quadratic function. We do not want to push this terminology too far, since the concept reverts to approximation in the sense of epi-convergence rather than locally uniform convergence, but a certain case can be made for it, especially in view of the results that will be obtained in the next section in connection with envelope functions. The idea is that a generalized quadratic function $h$ can be regarded as associated with a "generalized matrix for which some of the eigenvalues may be $\infty$," this being identified with a subspace $M$ and an equivalence class of symmetric $n \times n$ matrices $Q$ with respect to inducing the same quadratic form on $M$. These matrices all have the same eigenvalues relative to $M$; by an isometric change of coordinates that preserves the orthogonal decomposition of $\mathbb{R}^n$ into the sum of the subspaces $M$ and $M^\perp$, they can simultaneously be reduced to the same diagonal matrix whose entries are these eigenvalues. We can simply regard $M^\perp$ as the eigenspace associated with the eigenvalue $\infty$.

These remarks are chiefly intended to be motivational, but the question of when $f''_{x,v}$ is a generalized quadratic function turns out to be important for a number of reasons. We proceed with putting together an answer. In this we denote by $\operatorname{ri} B$ the relative interior of a convex set $B$ (in the sense of convex analysis [17]).

Theorem 4.6.
Let $f$ be the essential objective function in problem (V), with $f = g \circ F$ for a $C^2$ mapping $F$ and a polyhedral function $g$. Let $\bar x$ be any point of the feasible set $C = \operatorname{dom} f$ at which the condition (CQ) holds. Then for all $x$ sufficiently close to $\bar x$ with $f(x)$ finite, and all $v \in \partial f(x)$, the following properties are equivalent:

(a) $f''_{x,v}$ is generalized quadratic;

(b) $(\partial f)'_{x,v}$ is generalized linear;

(c) there exists $y \in \operatorname{ri} \partial g\bigl(F(x)\bigr)$ such that $\nabla(yF)(x) = v$; further, there exists $\bar y \in Y_{\rm ext}(x,v)$ such that

$$\langle \xi, \nabla^2([\bar y - y]F)(x)\xi\rangle \ge 0 \quad \text{for all } y \in Y_{\rm ext}(x,v) \text{ and } \xi \in E(x,v),$$

where the notation is that of Theorem 3.2.
Proof. The equivalence between (a) and (b) is assured by the relation between $f''_{x,v}$ and $(\partial f)'_{x,v}$ in Theorem 3.2. For the equivalence between (a) and (c), we recall from Theorem 3.2 that, for all $x$ in a neighborhood of $\bar x$, the domain of $f''_{x,v}$ is the normal cone to the convex set $\partial f(x)$ at $v$. The normal cone to a convex set is a subspace precisely when the point under consideration belongs to the relative interior of the set. Because $\partial f(x)$ is the image of the convex set $\partial g\bigl(F(x)\bigr)$ under the linear transformation $y \mapsto \nabla F(x)^T y$ (by Theorem 3.1), its relative interior is the image of $\operatorname{ri} \partial g\bigl(F(x)\bigr)$ under this transformation (cf. [17, Sec. 6]). Thus, the cone $E(x,v)$ is a subspace if and only if $v = \nabla(yF)(x)$ for some $y \in \operatorname{ri} \partial g\bigl(F(x)\bigr)$. It remains only to apply Lemma 4.4. □
The "generalized Hessian" case also arises in connection with strict second-order epi-differentiability of f.

Theorem 4.7. Let f be the essential objective function in problem (V), with f = g∘F for a C² mapping F and a polyhedral function g. Let x̄ be any point of the feasible set C = dom f at which the condition (CQ) holds. Then for all x sufficiently close to x̄ with f(x) finite, and for all v ∈ ∂f(x), the following properties are equivalent and imply in particular that f''_{x,v} is a generalized quadratic function:
(a) f is strictly twice epi-differentiable at x for v;
(b) f''_{x',v'} epi-converges (to something) as (x',v') → (x,v) in the set of pairs (x',v') with v' ∈ ∂f(x') for which f''_{x',v'} is generalized quadratic.

Proof. This comes out of [15, Cor. 4.3] because of Theorem 3.2 and the prox-regularity of f consequent to the strong amenability in Proposition 2.6. □

A test of sorts for the case in Theorem 4.7, albeit a stringent one, is the following.

Proposition 4.8. Let f be the essential objective function in problem (V), with f = g∘F for a C² mapping F and a polyhedral function g. Let x̄ be any point of the feasible set C = dom f at which the condition (CQ) holds. Suppose the function f''_{x̄,v̄} is generalized quadratic for a certain v̄ ∈ ∂f(x̄), and for all points (x,v) near (x̄,v̄) such that f''_{x,v} is generalized quadratic denote by Y_max(x,v) the set of vectors y satisfying the associated condition in Theorem 4.6(c). Then a sufficient condition for f to be strictly twice epi-differentiable at x̄ for v̄ is that both Ξ(x,v) → Ξ(x̄,v̄) and Y_max(x,v) → Y_max(x̄,v̄) as x → x̄ and v → v̄ in the set of pairs (x,v) with v ∈ ∂f(x) for which f''_{x,v} is generalized quadratic.

Proof. All we need to do, according to Theorem 4.7, is to show that f''_{x,v} epi-converges to f''_{x̄,v̄} as (x,v) → (x̄,v̄) in the set of pairs (x,v) with v ∈ ∂f(x) for which f''_{x,v} is generalized quadratic. We first show that

liminf_{k→∞} f''_{x_k,v_k}(ξ_k) ≥ f''_{x̄,v̄}(ξ)

whenever ξ_k → ξ, x_k → x̄ and v_k → v̄ in the set of pairs (x_k,v_k) with v_k ∈ ∂f(x_k) for which f''_{x_k,v_k} is generalized quadratic. If ξ_k ∉ Ξ(x_k,v_k) for all k sufficiently large
Second-order Nonsmooth Analysis
there is nothing to show. Assume not; then ξ ∈ Ξ(x̄,v̄). Now consider y ∈ Y_max(x̄,v̄). Because Y_max(x_k,v_k) → Y_max(x̄,v̄), there exists y_k ∈ Y_max(x_k,v_k) with y_k → y. It follows that f''_{x_k,v_k}(ξ_k) = ⟨ξ_k, ∇²(y_kF)(x_k)ξ_k⟩, and in the limit we get the desired inequality. Finally we show that for all ξ there exist ξ_{x,v} → ξ, as x → x̄ and v → v̄ in the set of pairs (x,v) with v ∈ ∂f(x) for which f''_{x,v} is generalized quadratic, with

limsup_{x→x̄, v→v̄} f''_{x,v}(ξ_{x,v}) ≤ f''_{x̄,v̄}(ξ).

If ξ ∉ Ξ(x̄,v̄) there is nothing to show. When ξ ∈ Ξ(x̄,v̄) there exists ξ_{x,v} ∈ Ξ(x,v) with ξ_{x,v} → ξ. We have f''_{x,v}(ξ_{x,v}) = ⟨ξ_{x,v}, ∇²(y_{x,v}F)(x)ξ_{x,v}⟩ for some y_{x,v} ∈ Y_max(x,v). We may assume that y_{x,v} → y with y ∈ Y_max(x̄,v̄); this is due to (CQ). In the limit we get the desired inequality. □

To what extent are these various properties realized in our examples? The case of a max function furnishes some good insights.

Proposition 4.9. In the case of a function f = max{f_1, ..., f_m} in Example 2.2 (as continued in Examples 2.2 and 3.3), consider any x ∈ R^n and any v ∈ ∂f(x) = co{∇f_i(x) | i ∈ I(x)}.
(a) f has a second-order expansion at x if and only if the vectors ∇f_i(x) for i ∈ I(x) coincide (or I(x) is just a singleton). It has a Hessian at x if and only if, in addition, the matrices ∇²f_i(x) for i ∈ I(x) coincide, this common matrix then being the Hessian matrix.
(b) f''_{x,v} has a subspace for its effective domain Ξ(x,v) if and only if one actually has v ∈ ri co{∇f_i(x) | i ∈ I(x)}, in which event Ξ(x,v) = {ξ | ⟨∇f_i(x) - v, ξ⟩ = 0 for all i ∈ I(x)}.
(c) f''_{x,v} is generalized quadratic if and only if, in addition, there is a vector y in the set

Y(x,v) = {y | y_i ≥ 0 if i ∈ I(x), y_i = 0 if i ∉ I(x), Σ_{i=1}^m y_i = 1, Σ_{i=1}^m y_i ∇f_i(x) = v}

such that Σ_{i=1}^m y_i ⟨ξ, ∇²f_i(x)ξ⟩ ≥ Σ_{i=1}^m y'_i ⟨ξ, ∇²f_i(x)ξ⟩ for all y' ∈ Y(x,v) and ξ ∈ Ξ(x,v).

Proof. These results follow from Theorem 4.6 via Theorem 3.2. □
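The structures in Proposition 4.9 are easy to probe numerically. The following sketch (with invented C² pieces f1, f2, not taken from the paper) checks that at a point where both pieces of f = max{f1, f2} are active, the one-sided directional derivative is the maximum of ⟨∇f_i(x), d⟩ over the active indices, and that a convex combination v of the active gradients behaves as a subgradient:

```python
import numpy as np

# Hypothetical smooth convex pieces; both are active at x = (1, 1), where f1 = f2 = 2.
f1 = lambda x: x[0] ** 2 + x[1]          # gradient (2*x0, 1)
f2 = lambda x: x[0] + x[1] ** 2          # gradient (1, 2*x1)
f = lambda x: max(f1(x), f2(x))

x = np.array([1.0, 1.0])
g1, g2 = np.array([2.0, 1.0]), np.array([1.0, 2.0])  # active gradients at x

# One-sided directional derivative f'(x; d) = max over active gradients.
d = np.array([0.3, -0.7])
t = 1e-6
fd = (f(x + t * d) - f(x)) / t           # forward difference quotient
pred = max(g1 @ d, g2 @ d)
assert abs(fd - pred) < 1e-4

# A convex combination v of the active gradients satisfies the
# subgradient inequality f(x + h) >= f(x) + <v, h>.
v = 0.5 * g1 + 0.5 * g2
for h in [np.array([0.01, 0.0]), np.array([-0.01, 0.02])]:
    assert f(x + h) >= f(x) + v @ h - 1e-12
```

The pieces here were chosen convex so that the subgradient inequality holds exactly; for general C² pieces it holds only up to o(|h|).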
Strict twice epi-differentiability is harder to pin down in this example, but an elementary sufficient condition for it can readily be developed. Recall that a set of vectors v_0, v_1, ..., v_s is affinely independent if the set {v_1 - v_0, ..., v_s - v_0} is linearly independent.
Proposition 4.10. For the max function in Proposition 4.9, suppose that (a) the vectors ∇f_i(x̄) for i ∈ I(x̄) are affinely independent, and (b) v̄ ∈ ri[co{∇f_i(x̄) | i ∈ I(x̄)}]. Then f is strictly twice epi-differentiable at x̄ for v̄. Indeed in this case, for all (x,v) sufficiently close to (x̄,v̄) with v ∈ ∂f(x), the function f''_{x,v} is generalized quadratic and depends epi-continuously on (x,v).

Proof. Let gph ∂f denote the graph of the mapping ∂f, i.e., the set of pairs (x,v) with v ∈ ∂f(x). We first show that under our assumptions there is a neighborhood U of (x̄,v̄) such that for all (x,v) ∈ U ∩ gph ∂f, we have I(x) = I(x̄). Consider x_k → x̄ and v_k → v̄ with v_k ∈ ∂f(x_k). We have Σ_{i∈I(x_k)} (y_k)_i ∇f_i(x_k) = v_k for some vector y_k ∈ Y(x_k,v_k). Because Σ_{i∈I(x_k)} (y_k)_i = 1 and (y_k)_i ≥ 0, we may assume that (y_k)_i → y_i (as k → ∞). We may also assume (by taking a subsequence if necessary) that I(x_k) = I* for some subset I* of {1, ..., m}. In the limit we have Σ_{i∈I*} y_i ∇f_i(x̄) = v̄. Then it follows from our assumptions that I* = I(x̄). We next show that Y(x̄,v̄) consists of only one vector when {∇f_i(x̄) | i ∈ I(x̄)} is affinely independent. To see this, assume that
Σ_{i∈I(x̄)} y_i ∇f_i(x̄) = v̄ = Σ_{i∈I(x̄)} y'_i ∇f_i(x̄)

for y and y' in Y(x̄,v̄). This in turn means that

Σ_{i∈I(x̄)} (y_i - y'_i) ∇f_i(x̄) = 0,

because Σ_{i∈I(x̄)} y_i = 1 = Σ_{i∈I(x̄)} y'_i. Therefore, fixing any index i_1 ∈ I(x̄),

Σ_{i∈I(x̄)} (y_i - y'_i)(∇f_i(x̄) - ∇f_{i_1}(x̄)) = 0,

which shows that y_i = y'_i for all i. It follows easily from the preceding observations that (a) and (b) are satisfied at every (x,v) ∈ U ∩ gph ∂f. Also note that the arguments we have furnished show that for all (x,v) ∈ U ∩ gph ∂f we have y' → y as x' → x and v' → v, where y' = Y(x',v') and y = Y(x,v). We know then from Proposition 4.8 that f''_{x,v} is generalized quadratic for all (x,v) ∈ U ∩ gph ∂f (inasmuch as Y(x,v) is a singleton). Finally we demonstrate that for all (x,v) ∈ gph ∂f in a neighborhood of (x̄,v̄) the function f is strictly twice epi-differentiable at x for v. We know that I(x') = I(x̄) for all (x',v') ∈ U ∩ gph ∂f. Fix (x,v) ∈ U ∩ gph ∂f. Because the set {∇f_i(x') | i ∈ I(x̄)} is affinely independent, we have Ξ(x',v') → Ξ(x,v) as x' → x and v' → v with v' ∈ ∂f(x'). Recall that Y_max(x',v') → Y_max(x,v). We now apply Proposition 4.8, and this completes the proof. □
The condition in Proposition 4.10 is so powerful that it guarantees not only the strict second-order epi-differentiability of f at x̄ for v̄ but the same also for all (x,v) near (x̄,v̄) in the graph of ∂f. It is hard to come up with a tractable condition for strict second-order epi-differentiability that is more modest in its consequences. The following example does show, however, that a max of finitely many C² functions can be strictly twice epi-differentiable at a point x̄ (actually here a point of global minimum) without necessarily being strictly twice epi-differentiable at nearby points.

Example 4.11. Let f_1(x_1,x_2) := x_1³x_2² and f_2(x_1,x_2) := -f_1(x_1,x_2). Consider

f(x_1,x_2) := |f_1(x_1,x_2)| = max{f_1(x_1,x_2), f_2(x_1,x_2)}.

This function f is C¹ (in fact it is both C^{1+} (differentiable with locally Lipschitz continuous gradient mapping) and lower-C²), and it is strictly twice epi-differentiable at x̄ = (0,0), yet it does not have this property at points of the x_1-axis away from the origin.

Detail. The functions f_1 and f_2 agree on the x_1- and x_2-axes, with ∇f_i(x_1,x_2) = (0,0) there for i = 1,2. This shows that f is C^{1+} as well as lower-C², and in particular C¹. Furthermore, f has a global minimum at x̄ = (0,0), where both f_1 and f_2 have the null matrix as their Hessian. We therefore have f''_{(0,0),(0,0)}(ξ) = 0 for all ξ by Theorem 3.2, so the function f''_{(0,0),(0,0)} is quadratic. At a general point x not on the x_1- or x_2-axis, f''_{x,v} is the quadratic associated with the Hessian of f_1 or f_2. For points with x_1 = 0, the second-order epi-derivative likewise has the property that f''_{(0,x_2),(0,0)}(ξ) = 0 for all ξ. But when x_2 = 0 we have ⟨ξ, ∇²f_1(x_1,0)ξ⟩ = 2x_1³ξ_2² and ⟨ξ, ∇²f_2(x_1,0)ξ⟩ = -2x_1³ξ_2², so that, except for the origin, f is not twice differentiable at such a point nor strictly twice epi-differentiable there. Instead, f''_{(x_1,0),(0,0)}(ξ) = max{2x_1³ξ_2², -2x_1³ξ_2²} = |2x_1³ξ_2²| for all ξ = (ξ_1,ξ_2). The formulas we have identified for the second-order epi-derivative show that f''_{x,∇f(x)} converges uniformly on bounded sets to f''_{(0,0),(0,0)} as x → 0; in particular they epi-converge. Hence by Theorem 4.7, f is strictly twice epi-differentiable at (0,0) for (0,0). □

We now turn our attention to Example 2.3, where f(x) = f_0(x) + δ_C(x) with f_0 smooth. Adopting the terminology of [1], we say in this setting that a pair (x,v) with v ∈ ∂f(x) furnishes a nondegenerate stationary point (relative to the problem of minimizing f - ⟨v,·⟩ over R^n) if v - ∇f_0(x) ∈ ri N_C(x).

Proposition 4.12. In Example 2.3, consider any point x ∈ C where the constraint qualification is satisfied (as characterized in Example 2.3), and let v ∈ ∂f(x), which is equivalent to v - ∇f_0(x) ∈ N_C(x).
Then (a) the effective domain Ξ(x,v) of f''_{x,v} is a subspace if and only if (x,v) furnishes a nondegenerate stationary point, in which event

Ξ(x,v) = {ξ ∈ T_C(x) | ⟨v - ∇f_0(x), ξ⟩ = 0}
       = {ξ ∈ T_X(x) | ⟨∇f_i(x), ξ⟩ ∈ T_{I_i}(f_i(x)) for all i, ⟨v - ∇f_0(x), ξ⟩ = 0};
(b) f''_{x,v} is a generalized quadratic function if and only if, in addition, there is a multiplier vector y in the set

Y(x,v) = {y | y_i ∈ N_{I_i}(f_i(x)), v - ∇_x L(x,y) ∈ N_X(x)}

with the property that ⟨ξ, ∇²_x L(x,y')ξ⟩ ≤ ⟨ξ, ∇²_x L(x,y)ξ⟩ for all y' ∈ Y(x,v) and ξ ∈ Ξ(x,v).
Proof. This result follows from Example 3.4 and Theorem 4.6. Note that from Example 3.4 we do have ∂f(x) = ∇f_0(x) + N_C(x), and therefore v ∈ ri ∂f(x) if and only if v - ∇f_0(x) ∈ ri N_C(x), i.e., (x,v) is a nondegenerate stationary point. □

Proposition 4.13. In Example 2.3, consider any x̄ ∈ C with v̄ ∈ ∇f_0(x̄) + N_C(x̄). Assume that (a) (x̄,v̄) furnishes a nondegenerate stationary point, (b) {∇f_i(x̄) | f_i(x̄) ∉ ri I_i} is linearly independent, (c) X = R^n. Then for all (x,v) in a neighborhood of (x̄,v̄) with v ∈ ∂f(x) the function f is strictly twice epi-differentiable at x for v, and in particular f''_{x,v} is generalized quadratic.

Proof. The line of proof is very similar to that of Proposition 4.10. First notice that there exists a neighborhood U of (x̄,v̄) such that for all (x,v) ∈ U ∩ gph ∂f we must have {∇f_i(x) | f_i(x) ∉ ri I_i} linearly independent. Next notice that we may also assume that {i | f_i(x) ∈ ri I_i} = {i | f_i(x̄) ∈ ri I_i} when (x,v) ∈ U ∩ gph ∂f. This is because v - ∇f_0(x) ∈ ri N_C(x), where

N_C(x) = {∇_x L(x,y) - ∇f_0(x) | y_i ∈ N_{I_i}(f_i(x))}

(recall that L(x,y) = f_0(x) + Σ_i y_i f_i(x)). From this it follows that Y(x,v) is a singleton for all (x,v) ∈ U ∩ gph ∂f. We now easily conclude that Y_max(x,v) → Y_max(x̄,v̄) and Ξ(x,v) → Ξ(x̄,v̄) when x → x̄ and v → v̄ with v ∈ ∂f(x). To finish off, we apply Proposition 4.8. □
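To make the setting of Example 2.3 concrete, here is a hypothetical instance of our own (the box C, the quadratic f_0, and the helper in_normal_cone are invented for illustration): when C is a box in R², the normal cone N_C(x) decomposes coordinatewise, so the condition v - ∇f_0(x) ∈ N_C(x) characterizing v ∈ ∂f(x) can be tested directly:

```python
import numpy as np

# Hypothetical instance of f = f0 + indicator of C, with C = [0,1] x [0,1]
# and f0(x) = |x - (-0.5, 0.5)|^2 (a smooth quadratic).
lo, hi = np.zeros(2), np.ones(2)
grad_f0 = lambda x: 2.0 * (x - np.array([-0.5, 0.5]))

def in_normal_cone(w, x, tol=1e-12):
    """Check w in N_C(x) for the box C, coordinate by coordinate."""
    ok = True
    for wi, xi, l, h in zip(w, x, lo, hi):
        if l < xi < h:
            ok = ok and abs(wi) <= tol   # interior coordinate: normal part is 0
        elif xi == l:
            ok = ok and wi <= tol        # lower face: w_i <= 0
        else:
            ok = ok and wi >= -tol       # upper face: w_i >= 0
    return ok

# x = (0, 0.5) is the projection of (-0.5, 0.5) onto C, so it minimizes f0
# over C, and v = 0 is a subgradient: v - grad f0(x) lies in N_C(x).
x = np.array([0.0, 0.5])
assert in_normal_cone(-grad_f0(x), x)
# An interior non-stationary point fails the test.
assert not in_normal_cone(-grad_f0(np.array([0.5, 0.5])), np.array([0.5, 0.5]))
```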
5
Proximal Mappings and Envelopes
From now on we concentrate on the envelope functions e_λ and proximal mappings P_λ defined at the end of Section 2 in association with a function f. We continue to take f to be the essential objective function for the problem in composite format. Mainly we concentrate henceforth on the case of minimizing points x̄ ∈ argmin f. Such points have v = 0 as a subgradient: 0 ∈ ∂f(x̄) by Theorem 3.5.
First on the agenda is the specialization to this context of a selection of facts from [14] and [15]. (The interested reader should consult these papers for many other results.)

Theorem 5.1. Let f be the essential objective function in problem (V), with f = g∘F for a C² mapping F and a polyhedral function g. Let x̄ be any optimal solution at which the condition (CQ) is satisfied. Then for each λ > 0 sufficiently small, there is a neighborhood of x̄ on which the function e_λ is C^{1+} and lower-C², the mapping P_λ is single-valued and Lipschitz continuous, and
∇e_λ = λ⁻¹[I - P_λ] = [λI + (∂f)⁻¹]⁻¹,   P_λ = (I + λ∂f)⁻¹,   with P_λ(x̄) = x̄.

Proof. We invoke [14, Thms. 4.4, 4.6, 5.2], making the observation, as above, that our assumptions entail through Proposition 2.6 that f has the prox-regularity demanded in those theorems. □

Functions that are C^{1+} have been the focus of much research recently. The reader interested in the study of generalized second-order directional derivatives and Hessians of these functions will surely want to consult the work of Cominetti and Correa [3], Hiriart-Urruty [4], Jeyakumar and Yang [5], Pales and Zeidan [9], and Yang and Jeyakumar [23]. Note that here the function e_λ is not only C^{1+} but also lower-C².

Theorem 5.2. Let f be the essential objective function in problem (V), with f = g∘F for a C² mapping F and a polyhedral function g. Let x̄ be any optimal solution at which condition (CQ) is satisfied (so that 0 ∈ ∂f(x̄) in particular), and for λ > 0 define
d_λ(ξ) = min_{ξ'} { ½ f''_{x̄,0}(ξ') + (1/2λ)|ξ - ξ'|² }   for all ξ.

Then for all λ sufficiently small the function d_λ is both C^{1+} and lower-C², the gradient mapping ∇d_λ being Lipschitz continuous globally, and the following properties hold:
(a) e_λ has a second-order expansion at x̄, given by e_λ(x̄ + tξ) = e_λ(x̄) + t²d_λ(ξ) + o(|tξ|²),
(b) ∇e_λ has a first-order expansion at x̄, given by ∇e_λ(x̄ + tξ) = t∇d_λ(ξ) + o(|tξ|),
(c) P_λ has a first-order expansion at x̄, given by P_λ(x̄ + tξ) = x̄ + t[ξ - λ∇d_λ(ξ)] + o(|tξ|).

Proof. This time we apply [15, Thm. 3.5], again utilizing the prox-regularity of f furnished through Proposition 2.6. □
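The content of Theorems 5.1 and 5.2 can be seen by hand in the simplest one-dimensional example f(x) = |x| with x̄ = 0 (an illustration of ours, not from the paper): P_λ is the soft-threshold mapping, e_λ is the Huber function, the identity ∇e_λ = λ⁻¹[I - P_λ] holds, and the second-order expansion e_λ(x̄ + tξ) = e_λ(x̄) + t²d_λ(ξ) is exact with d_λ(ξ) = ξ²/(2λ):

```python
import numpy as np

lam = 0.5

def prox(x):
    """P_lam for f = |.|: the soft-threshold mapping (I + lam*∂f)^(-1)."""
    return np.sign(x) * max(abs(x) - lam, 0.0)

def env(x):
    """Moreau envelope e_lam of f = |.|: the Huber function."""
    p = prox(x)
    return abs(p) + (x - p) ** 2 / (2 * lam)

# Theorem 5.1 identity: grad e_lam(x) = (x - P_lam(x)) / lam,
# checked by central differences where e_lam is smooth.
x, h = 0.2, 1e-6
num_grad = (env(x + h) - env(x - h)) / (2 * h)
assert abs(num_grad - (x - prox(x)) / lam) < 1e-6

# Second-order expansion at xbar = 0: e_lam(t*xi) = t^2 * xi^2 / (2*lam)
# for small t, since f''_{0,0} vanishes at 0 and is +infinity elsewhere.
xi, t = 1.3, 1e-3
assert abs(env(t * xi) - t ** 2 * xi ** 2 / (2 * lam)) < 1e-12
```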
Theorem 5.3. Let f be the essential objective function in problem (V), with f = g∘F for a C² mapping F and a polyhedral function g. Let x̄ be any optimal solution at which the condition (CQ) is satisfied. Then for every λ > 0 sufficiently small, the following properties are equivalent and necessarily involve the same matrix H_λ:
(a) e_λ has a Hessian matrix H_λ at x̄;
(b) ∇e_λ is differentiable at x̄ with Jacobian matrix H_λ;
(c) e_λ is twice differentiable at x̄, with H_λ = ∇²e_λ(x̄);
(d) P_λ is differentiable at x̄ with Jacobian matrix I - λH_λ;
(e) f''_{x̄,0} is generalized quadratic.

Proof. This goes back to [15, Thm. 3.8], once more under the prox-regularity that our hypothesis guarantees. □

Theorem 5.4. Let f be the essential objective function in problem (V), with f = g∘F for a C² mapping F and a polyhedral function g. Let x̄ be any optimal solution at which the condition (CQ) is satisfied. Then for every λ > 0 sufficiently small, the following properties are equivalent:
(a) f is strictly twice epi-differentiable at x̄ for v̄ = 0;
(b) e_λ has a strict Hessian at x̄;
(c) ∇e_λ is strictly differentiable at x̄;
(d) e_λ is twice differentiable at x̄, and ∇²e_λ(x) → ∇²e_λ(x̄) as x → x̄ in the set of points x where e_λ is twice differentiable;
(e) e_λ is strictly twice epi-differentiable at x̄ for v̄;
(f) ∇e_λ is strictly proto-differentiable at x̄ for v̄;
(g) P_λ is strictly differentiable at x̄;
(h) P_λ is strictly proto-differentiable at x̄.

Proof. This quotes [15, Thms. 4.1, 4.2] in the environment of the prox-regularity of f that comes from Proposition 2.6. □

Theorem 5.5. Let f be the essential objective function in problem (V), with f = g∘F for a C² mapping F and a polyhedral function g. Let x̄ be any optimal solution at which the condition (CQ) is satisfied. Then for every λ > 0 sufficiently small, the following properties are equivalent:
(a) e_λ is C² on a neighborhood of x̄;
(b) P_λ is C¹ on a neighborhood of x̄;
(c) for all (x,v) near to (x̄,v̄) in the graph of ∂f, f is twice epi-differentiable, f''_{x,v} is generalized quadratic, and f''_{x,v} depends epi-continuously on (x,v), i.e., f''_{x',v'} epi-converges to f''_{x,v} as (x',v') → (x,v) with v' ∈ ∂f(x').

Proof. We appeal here to [15, Thm. 4.4].
□
Corollary 5.6. In the case of Theorem 5.5 where f happens to be differentiable at x̄, or merely if it satisfies a local growth condition of type f(x) ≤ f(x̄) + ε|x - x̄|², properties (a) and (b) hold if and only if f is itself C² on a neighborhood of x̄.
Proof. The additional assumption forces f''_{x̄,0} to be finite (cf. Theorem 4.3), and the property in (c) of Theorem 5.5 then reduces to f being C²; see also [15, Cor. 4.5]. □

Example 5.7. For the function f of Example 2.2, the assumptions of Proposition 4.10 and Theorem 5.5 ensure the presence of properties (a) and (b) of Theorem 5.5.

Example 5.8. For the function f of Example 2.3, the assumptions of Proposition 4.13 and Theorem 5.5 ensure the presence of properties (a) and (b) of Theorem 5.5.
References

[1] J. V. Burke, On the identification of active constraints II: the nonconvex case, SIAM Journal on Numerical Analysis 27 (1990) 1081-1102.
[2] F. H. Clarke, Generalized gradients and applications, Transactions of the American Mathematical Society 205 (1975) 247-262.
[3] R. Cominetti and R. Correa, A generalized second-order derivative in nonsmooth optimization, SIAM Journal on Control and Optimization 28 (1990) 789-809.
[4] J.-B. Hiriart-Urruty, Characterization of the plenary hull of the generalized Jacobian matrix, Mathematical Programming Study 17 (1982) 1-12.
[5] V. Jeyakumar and X. Q. Yang, Second-order analysis of C^{1,1} functions and convex composite minimization, preprint, 1992.
[6] C. Lemarechal and C. Sagastizabal, Practical aspects of the Moreau-Yosida regularization I: Theoretical properties, preprint, 1994.
[7] A. Levy and R. T. Rockafellar, Variational conditions and the proto-differentiation of partial subgradient mappings, Nonlinear Analysis: Theory, Methods and Applications, submitted.
[8] P. D. Loewen, Optimal Control via Nonsmooth Analysis, CRM Proceedings & Lecture Notes 2, AMS, 1993.
[9] Z. Pales and V. Zeidan, Generalized Hessian for C^{1+} functions in infinite dimensional normed spaces, preprint, 1994.
[10] R. A. Poliquin, An extension of Attouch's Theorem and its application to second-order epi-differentiation of convexly composite functions, Transactions of the American Mathematical Society 332 (1992) 861-874.
[11] R. A. Poliquin and R. T. Rockafellar, Amenable functions in optimization, Nonsmooth Optimization Methods and Applications, F. Giannessi (ed.), Gordon & Breach, Philadelphia, 1992, 338-353.
[12] R. A. Poliquin and R. T. Rockafellar, A calculus of epi-derivatives applicable to optimization, Canadian Journal of Mathematics 45 (4) (1993) 879-896.
[13] R. A. Poliquin and R. T. Rockafellar, Proto-derivative formulas for basic subgradient mappings in mathematical programming, Set-Valued Analysis 2 (1994) 275-290.
[14] R. A. Poliquin and R. T. Rockafellar, Prox-regular functions in variational analysis, preprint, October 1994.
[15] R. A. Poliquin and R. T. Rockafellar, Generalized Hessian properties of regularized nonsmooth functions, preprint, November 1994.
[16] L. Qi, Second-order analysis of the Moreau-Yosida approximation of a convex function, preprint, 1994.
[17] R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ, 1970.
[18] R. T. Rockafellar, Maximal monotone relations and the second derivatives of nonsmooth functions, Ann. Inst. H. Poincare: Analyse Non Lineaire 2 (1985) 167-184.
[19] R. T. Rockafellar, First- and second-order epi-differentiability in nonlinear programming, Transactions of the American Mathematical Society 307 (1988) 75-107.
[20] R. T. Rockafellar, Proto-differentiability of set-valued mappings and its applications in optimization, Analyse Non Lineaire, H. Attouch et al. (eds.), Gauthier-Villars, Paris (1989) 449-482.
[21] R. T. Rockafellar, Second-order optimality conditions in nonlinear programming obtained by way of epi-derivatives, Mathematics of Operations Research 14 (1989) 462-484.
[22] R. T. Rockafellar, Generalized second derivatives of convex functions and saddle functions, Transactions of the American Mathematical Society 320 (1990) 810-822.
[23] X. Q. Yang and V. Jeyakumar, Generalized second-order directional derivatives and optimization with C^{1,1} functions, Optimization 26 (1992) 165-185.
Homogeneous Programming
Recent Advances in Nonsmooth Optimization, pp. 351-380 Eds. D.-Z. Du, L. Qi and R.S. Womersley ©1995 World Scientific Publishing Co Pte Ltd
Characterizations of Optimality for Homogeneous Programming Problems with Applications

A. M. Rubinov¹
Department of Mathematics and Computer Science, Faculty of Natural Science, Ben-Gurion University of the Negev, Beer-Sheva, 84105 Israel. E-mail: [email protected]. ac. il

B. M. Glover
School of Information Technology and Mathematical Sciences, University of Ballarat, Ballarat 3350, Victoria, Australia. E-mail: [email protected]
Abstract

Necessary and sufficient conditions for optimality for various classes of convex and nonconvex programming problems involving positively homogeneous objective and constraint functions are developed. In particular, global optimality criteria for the maximization of a sublinear function subject to sublinear constraints are established under the assumption that the value of the problem is known. This assumption is removed for certain specially structured problems. Applications to mathematical economics and functional analysis are discussed; in particular the following problem is considered in detail: to describe an element on the unit sphere which realizes the norm of a given linear operator.
1
Introduction
Positively homogeneous functions arise naturally in many applications including economic modelling and mathematical programming. Optimization problems involving these functions permit interesting analysis due to the presence of homogeneity even in

¹The research of the first author is supported by the Ministry of Science, Israel and an Australian Government Bilateral Science and Technology Grant.
A. M. Rubinov and B. M. Glover
the absence of convexity. They provide a rich source of examples within the context of global optimization, which is an area receiving considerable attention from both the theoretical and computational perspective (see Horst and Tuy [11] and Pardalos and Rosen [19]). In addition, the study of positively homogeneous functions is fundamental in nonsmooth analysis, where such functions arise as nonlinear approximations to nondifferentiable functions (see [4, 6]). Consequently the results of this paper are a contribution to the theoretical study of mathematical economics, functional analysis and nonsmooth analysis. In this paper we consider programming problems involving positively homogeneous functions, both objective and constraint. In particular the results developed apply to a range of nonconvex programming problems, including problems involving convex functions (e.g. sublinear maximization problems). We begin by developing very general theoretical results concerning the concepts of support sets, subdifferentials and superdifferentials for a variety of positively homogeneous functions. In particular we discuss the existence of these approximating sets and their properties. The approach used throughout is based on convexification. A similar approach has been discussed by various authors; see, in particular, Ioffe and Tikhomirov [12] and Ekeland and Temam [7]. We provide a detailed discussion of the geometrical basis for this convexification process and its manifestation in a variety of examples. In particular we discuss the significance of this approach for difference sublinear functions using a concept of set difference applicable to compact convex sets. The main results are related to establishing optimality conditions for various extremal problems involving possibly nonconvex positively homogeneous objective and constraint functions.
In particular we are able to obtain characterizations of optimality, in terms of subdifferentials and superdifferentials, for the maximization of a sublinear function subject to positively homogeneous (not necessarily convex) constraints. Such global optimization problems have received considerable attention in the literature (see for example [13, 11, 9]). They arise, for example, in certain approximation and norm comparison problems (see [10]). In the initial results presented we assume that the value of the programming problem is known; however, this assumption is relaxed for certain specially structured problems. The conditions presented, involving the intersection of subdifferentials and superdifferentials, may assist in the development of computational schemes for solving such global optimization problems, for instance by providing verifiable stopping criteria or search paradigms. This will be the focus of future research. The structure of the paper is as follows: in section 2 we introduce the concept of a set which is star-shaped with respect to zero and develop the properties of such sets and their Minkowski gauges (complementing the results of [22]). These gauges are positively homogeneous nonnegative l.s.c. functions. In section 3 subdifferentials of these functions are introduced and studied. Sets which are star-shaped with respect to +∞ and their respective gauges are studied in sections 4 and 5. Maximization of nonnegative sublinear functions and minimization of superlinear nonnegative functions are investigated in sections 6 and 7 respectively. We widen the applicable classes of extremal problem under consideration in section 8 using the concept of associated problems (see Tuy [24] for related ideas). In sections 9 and 10 we discuss sublinear maximization under a single constraint and subject to finitely many constraints respectively. In section 11, as an aid to the reader, we summarize the results and compare the various extremal conditions obtained in sections 6 to 10. Finally, in section 12, we present an application of our results to the problem of locating an element on the unit sphere which realizes the norm of a given linear operator. To complete this introduction we outline some areas of potential application of the results contained in this paper. The study of extremal problems involving positively homogeneous functions, to which the results of this paper are primarily applicable, has at least two main areas of application, mathematical economics and functional analysis. We briefly outline the context of these applications as follows.

Mathematical economics. One of the main tools for studying problems of economic theory are the so-called production function and cost function. These nonnegative functions are defined on the cone R^n_+ of vectors with nonnegative coordinates. The production function provides the value of an output of production (a number) under a given input (a vector); the cost function provides the value of an input (a number) under a given output (a vector). As a rule these functions, F, are positively homogeneous of degree α > 0 (and, in many circumstances of practical interest, of degree one). However, in any case, we can consider the function F^{1/α}, which is positively homogeneous of degree one, when studying extremal problems involving objective functions of this type. As a rule only concave production functions and convex cost functions have been considered in Mathematical Economics.
Under these convexity assumptions we generate classical convex extremal problems involving the production and cost functions. Where possible, economists have attempted to remove these convexity assumptions. For example, when studying nonlinear eigenvalue problems the Nobel prize winners (in Economics) R. Solow and P. Samuelson [23] consider production functions which are positively homogeneous of degree one and increasing (and therefore nonnegative) without further convexity assumptions. Further generalizations of results in this direction are contained in [17, 18]. The results in this paper allow the further removal of these convexity assumptions for particular extremal problems arising in Mathematical Economics. A very important role in this regard is played by Theorem 7.2, which gives a nonconvex generalization of the following classical economic problem: minimization of the cost of the input under a given value of the output. More recently, nonnegative positively homogeneous functions have been used under additional convexity assumptions for the investigation of certain problems of Economic Equilibrium (see [15]). The point here is that positive homogeneity permits the study of economic equilibrium with the help of simple extremal problems such as those discussed in this paper.
Functional analysis. Many important positively homogeneous nonnegative functions are considered in functional analysis, for example the norm in a normed linear space, the norm of a linear operator, the spectral radius, the least eigenvalue of a positive definite matrix, and so on. It is usual, in applications in functional analysis, to consider the minimization of nonnegative sublinear functions (such as the norm function). This presents a classical convex problem; however, it should be noted that the solution of this problem has interesting properties, for example it provides information on the properties of the elements of best approximation. On the other hand the maximization of a sublinear function is also of interest. The maximum of a positively homogeneous function on the unit ball is a quantity which shows the degree of 'dilation' of the ball under the action of this function. It is quite natural to expect special properties of the elements where the maximum is achieved. In the final section of this paper we demonstrate this by an example concerning the norm of a linear operator.
2
Star-shaped with Respect to Zero Sets and Their Gauges
Throughout this paper X shall denote a locally convex Hausdorff topological vector space. For a set D ⊆ X we shall denote the closure, convex hull and cone generated by D as cl D, co D and cone D respectively. The interior of a set D will be denoted int D, and a set with nonempty interior will be called solid. The dual space of X, denoted by X', will be endowed with the weak* topology.

Definition 2.1 A set U ⊆ X is said to be star-shaped with respect to zero (0-st-sh) if zero is a star point of this set, i.e.

x ∈ U, 0 ≤ λ ≤ 1 ⟹ λx ∈ U.

We can rewrite the definition of a star-shaped with respect to zero set in the following form: U is 0-st-sh ⟺ (∀λ ∈ [0,1]) λU ⊆ U.

Definition 2.2 Consider a 0-st-sh set U. Then the function μ_U : X → R_{+∞} (= R ∪ {+∞}) defined as follows, for x ∈ X,

μ_U(x) = inf{λ > 0 : x ∈ λU}

is called the Minkowski gauge (or 0-gauge) of the set U. We assume that the infimum of the empty set is equal to +∞. Clearly if U is convex then μ_U coincides with the well known Minkowski gauge from convex analysis (see, for example, [20]).
For x ∈ X let R_x denote the ray {λx : λ ≥ 0}. It is easy to check that

μ_U(x) = 0 ⟺ R_x ⊆ U;
μ_U(x) = +∞ ⟺ R_x ∩ U = {0};
{x : μ_U(x) < +∞} = cone U (= ∪_{λ>0} λU).
In particular μ_U(x) < +∞ for all x ∈ X if and only if zero is an algebraic interior point of the set U (i.e. U ∩ {λx : λ > 0} ≠ ∅ for all x ∈ X). It is easy to check that

U_1 ⊆ U_2 ⟺ (∀x ∈ X) μ_{U_1}(x) ≥ μ_{U_2}(x).
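Whenever membership in U is testable, μ_U can be computed directly from Definition 2.2 by bisection, since for a 0-st-sh set the condition x ∈ λU is monotone in λ. The sketch below (our illustration; the sets are invented) does this for a nonconvex star-shaped subset of R² and checks positive homogeneity together with the anti-monotonicity just stated:

```python
import math

# U = (unit disc) ∪ (thin box [-2,2] x [-1/4,1/4]): a union of two convex
# sets containing 0, hence star-shaped with respect to zero, but not convex.
in_disc = lambda x: x[0] ** 2 + x[1] ** 2 <= 1.0
in_box = lambda x: abs(x[0]) <= 2.0 and abs(x[1]) <= 0.25
in_U = lambda x: in_disc(x) or in_box(x)

def gauge(member, x, hi=1e6, tol=1e-10):
    """mu_U(x) = inf{lam > 0 : x in lam*U}, by bisection on lam."""
    if not member((x[0] / hi, x[1] / hi)):
        return math.inf
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if member((x[0] / mid, x[1] / mid)):   # x in mid*U  <=>  x/mid in U
            hi = mid
        else:
            lo = mid
    return hi

x = (3.0, 0.0)
# Along the x1-axis the box is the larger piece, so mu_U(x) = |x1| / 2.
assert abs(gauge(in_U, x) - 1.5) < 1e-6
# Positive homogeneity: mu_U(2x) = 2 mu_U(x).
assert abs(gauge(in_U, (6.0, 0.0)) - 2 * gauge(in_U, x)) < 1e-6
# Anti-monotonicity: disc ⊆ U implies mu_disc >= mu_U pointwise.
y = (1.0, 1.0)
assert gauge(in_disc, y) >= gauge(in_U, y) - 1e-9
```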
Definition 2.3 For c ∈ R the level set {x ∈ X : g(x) ≤ c} of the function g : X → R_{+∞} will be denoted by S_c(g).

Lemma 2.1 Let g be a function defined on X and mapping into R_{+∞}; then the following are equivalent:
(i) g is positively homogeneous, nonnegative and l.s.c. with g(0) < +∞;
(ii) there is a closed nonempty 0-st-sh set U such that g coincides with the Minkowski gauge μ_U of U.

Proof: (i) implies (ii). If g is positively homogeneous and 0 ≤ g(0) < +∞ then g(0) = 0 and therefore 0 ∈ U = S_1(g). The set U is nonempty, and it is easy to check that it is 0-st-sh. Since g is l.s.c. it follows that U is closed. Let us consider the Minkowski gauge μ_U of the set U. Since g is positively homogeneous and nonnegative we have

μ_U(x) = inf{λ > 0 : x/λ ∈ U} = inf{λ > 0 : g(x/λ) ≤ 1} = inf{λ > 0 : g(x) ≤ λ} = g(x).

(ii) implies (i). Clearly μ_U is nonnegative and positively homogeneous with μ_U(0) = 0 < +∞; it remains to verify lower semicontinuity. Since U is closed and 0-st-sh we have U = S_1(μ_U). Since S_c(μ_U) = cS_1(μ_U) whenever c > 0, we have that the set S_c(μ_U) is closed. Since S_0(μ_U) = ∩_{c>0} S_c(μ_U), it follows that S_0(μ_U) is closed. Thus the level sets S_c(μ_U) of the function μ_U are closed whenever c ≥ 0. Therefore μ_U is l.s.c. □

Remark 2.1: Let us note that each l.s.c. positively homogeneous nonnegative function g is the Minkowski gauge of the set S_1(g).

Remark 2.2: Connections between special classes of 0-st-sh sets and continuous positively homogeneous functions have been studied extensively in [22].
356
Definition 2.4 The set of all l.s.c. nonnegative positively homogeneous (of degree one) functions $g$ defined on the space $X$ with the property $g(0) < +\infty$ will be denoted by $PH_\ell(X)$. The totality of all closed nonempty 0-st-sh sets will be denoted by $St_0(X)$. We now introduce the natural order $\ge$ in $PH_\ell(X)$ as follows:
$$g_1 \ge g_2 \iff (\forall x \in X)\ g_1(x) \ge g_2(x),$$
where $g_i \in PH_\ell(X)$. Also the order relation on $St_0(X)$ determined by anti-inclusion is defined as follows:
$$U_1 \ge U_2 \iff U_1 \subset U_2$$
for $U_i \in St_0(X)$. It is easy to check that the ordered sets $PH_\ell(X)$ and $St_0(X)$ are complete lattices (see [3]): an arbitrary subset of either $PH_\ell(X)$ or $St_0(X)$ has a supremum and an infimum. If, for example, $(p_\alpha)_{\alpha \in A} \subset PH_\ell(X)$ then
$$(\sup_\alpha p_\alpha)(x) = \sup_\alpha p_\alpha(x), \qquad (\inf_\alpha p_\alpha)(x) = \mathrm{cl}\,[\inf_\alpha p_\alpha](x).$$
Here $(\sup_\alpha p_\alpha)$ and $(\inf_\alpha p_\alpha)$ are bounds in the lattice $PH_\ell(X)$, $\sup_\alpha p_\alpha(x)$ and $\inf_\alpha p_\alpha(x)$ are pointwise bounds, and $\mathrm{cl}\, f$ denotes the l.s.c. hull of the function $f$ (see [20]). Similarly, if $(U_\alpha)_{\alpha \in A} \subset St_0(X)$ then
$$\sup_\alpha U_\alpha = \bigcap_{\alpha \in A} U_\alpha, \qquad \inf_\alpha U_\alpha = \mathrm{cl} \bigcup_{\alpha \in A} U_\alpha.$$
It is easy to check that the following holds.

Theorem 2.1 The mapping $\varphi_0 : St_0(X) \to PH_\ell(X)$, where $\varphi_0(U) = \mu_U$, is a lattice isomorphism.
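The lattice correspondence of Theorem 2.1 can be spot-checked in a small finite-dimensional setting. The sketch below (our own illustrative helpers, not the paper's) verifies that the pointwise supremum of the gauges of two axis-aligned boxes equals the gauge of their intersection, as the isomorphism predicts.

```python
# Illustrative finite-dimensional check of Theorem 2.1's correspondence:
# sup of Minkowski gauges <-> intersection of the underlying sets.

def gauge_box(halfwidths, x):
    # Gauge of the box {y : |y_i| <= w_i} is max_i |x_i| / w_i.
    return max(abs(xi) / wi for xi, wi in zip(x, halfwidths))

def gauge_of_intersection(hw_list, x):
    # Intersecting axis-aligned boxes keeps the componentwise smallest halfwidths.
    return gauge_box([min(ws) for ws in zip(*hw_list)], x)

boxes = [(1.0, 2.0), (2.0, 1.0)]
samples = [(1.0, 1.0), (3.0, -0.5), (-2.0, 2.0)]
for x in samples:
    sup_of_gauges = max(gauge_box(b, x) for b in boxes)
    assert abs(sup_of_gauges - gauge_of_intersection(boxes, x)) < 1e-12
print("pointwise sup of gauges equals gauge of the intersection")
```

The dual identity (pointwise infimum and closed union) can be tested the same way.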
Now let us consider the dual space $L = X'$ of all continuous linear functions defined on $X$.

Definition 2.5 Let $g \in PH_\ell(X)$. The set
$$s(g) = \{\ell \in L : (\forall x \in X)\ \ell(x) \le g(x)\}$$
is called the support set of the function $g$.

Definition 2.6 The function $x \mapsto \sup\{\ell(x) : \ell \in s(g)\}$ is called the $L$-convex hull of the function $g$ and is denoted $\mathrm{co}_L g$.

Clearly $s(g)$ is nonempty (since $0 \in s(g)$), weak* closed and convex, and $\mathrm{co}_L g$ is a l.s.c. sublinear function which is the greatest l.s.c. sublinear minorant of $g$. Let us note that a function $g \in PH_\ell(X)$ is sublinear if and only if $g$ is the Minkowski gauge of a convex set $U \in St_0(X)$, in other words if the set $S_1(g)$ is convex.
Now consider the level sets $U = S_1(g)$ and $V = S_1(\mathrm{co}_L g)$. Clearly $U$ is a closed 0-st-sh set and $0 \in V$. By Lemma 2.1 it follows that $g$ is the Minkowski gauge of $U$ and $\mathrm{co}_L g$ is the Minkowski gauge of $V$. Since $\mathrm{co}_L g$ is the greatest l.s.c. sublinear minorant of $g$ we have, by applying Theorem 2.1, that $V$ is the greatest convex element of $St_0(X)$ which is majorized by $U$ (relative to the order relation in the lattice $St_0(X)$). Since the order relation in $St_0(X)$ is defined by anti-inclusion we can say that $V$ is the least (by inclusion) closed convex set which contains $U$, i.e. $V$ is the closed convex hull of the set $U$: $V = \mathrm{cl\,co}\, U$.
3 Subdifferentials of Functions in $PH_\ell(X)$
In this section we consider the support set $s(g)$ of a function $g \in PH_\ell(X)$ and subdifferentials of this function at a point. Firstly let us consider sublinear functions. Let $p$ be a l.s.c. sublinear function. Since $p$ is convex we can consider the subdifferential $\partial p(x)$ of the function $p$ at the point $x \in X$. It is well known (see [20]) that
$$\partial p(x) = \{\ell \in \partial p : \ell(x) = p(x)\}.$$
Here, and subsequently, $\partial p$ denotes the subdifferential of the function $p$, by definition:
$$\partial p = \{\ell \in X' : (\forall x \in X)\ \ell(x) \le p(x)\}.$$
If a sublinear function $p$ belongs to $PH_\ell(X)$ then $\partial p = s(p)$. It is easy to check that $\partial(\mathrm{co}_L g) = s(g)$ for $g \in PH_\ell(X)$. We now define the subdifferential at a point for a function $g \in PH_\ell(X)$, using the same definition as for sublinear functions.

Definition 3.1 The subdifferential of the function $g \in PH_\ell(X)$ at the point $x \in X$ is the set
$$\partial g(x) = \{\ell \in s(g) : \ell(x) = g(x)\}.$$
In the nonconvex case this set may be empty.

Proposition 3.1 Let $g \in PH_\ell(X)$. Then the following hold:

1. $g(x) = (\mathrm{co}_L g)(x) \iff \partial g(x) = \partial(\mathrm{co}_L g)(x)$.

2. $\partial g(x) \neq \emptyset \implies g(x) = (\mathrm{co}_L g)(x)$.

3. $\big(g(x) = (\mathrm{co}_L g)(x),\ \partial(\mathrm{co}_L g)(x) \neq \emptyset\big) \implies \partial g(x) \neq \emptyset$.
Proof: 1. This is true since $s(g) = s(\mathrm{co}_L g)$. 2. If $\partial g(x) \neq \emptyset$ then there is $\ell \in s(g)$ such that $\ell(x) = g(x)$. By the definition of the function $\mathrm{co}_L g$ we have $\ell \in \partial(\mathrm{co}_L g)$ and $\ell(x) = (\mathrm{co}_L g)(x)$. Therefore $g(x) = (\mathrm{co}_L g)(x)$.
3. Follows easily from part 1. $\square$
Example 3.1: Let $p_1, p_2$ be continuous sublinear functions defined on the space $X$ such that $p_1(x) \ge p_2(x)$ for all $x \in X$. Let $g = p_1 - p_2$. Clearly $g$ is a continuous nonnegative positively homogeneous function. In order to describe the support set $s(g)$ and the subdifferential $\partial g(x)$ we require the notion of the star difference $\overset{*}{-}$ between convex sets. By definition
$$A \overset{*}{-} B = \{x : x + B \subset A\}.$$
Proposition 3.2 Let $p_1$, $p_2$ and $g$ be as above. Then the following hold:

1. $s(g) = \partial p_1 \overset{*}{-} \partial p_2$.

2. For all $x \in X$, $\partial g(x) = \partial p_1(x) \overset{*}{-} \partial p_2(x)$.

Proof: 1. We have that
$$\ell \in s(g) \iff (\forall x)\ \ell(x) + p_2(x) \le p_1(x) \iff \partial(\ell + p_2) \subset \partial p_1 \iff \ell + \partial p_2 \subset \partial p_1 \iff \ell \in \partial p_1 \overset{*}{-} \partial p_2.$$

2. $\ell \in \partial g(x) \iff \big(\ell + \partial p_2 \subset \partial p_1,\ \ell(x) + p_2(x) = p_1(x)\big) \iff \ell + \partial p_2(x) \subset \partial p_1(x) \iff \ell \in \partial p_1(x) \overset{*}{-} \partial p_2(x)$. $\square$
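In one dimension the star difference of Proposition 3.2 is easy to compute explicitly. The sketch below (our own example data) illustrates it for intervals, which are exactly the subdifferentials of 1-D sublinear functions.

```python
# Hypothetical 1-D illustration of Proposition 3.2: for intervals A, B the star
# difference A -* B = {x : x + B subset A} is again an interval, and it is the
# support set of g = p1 - p2 when p1, p2 are the support functions of A, B.

def star_diff(A, B):
    # For intervals A=[a1,a2], B=[b1,b2]: x+B subset A  <=>  a1-b1 <= x <= a2-b2.
    lo, hi = A[0] - B[0], A[1] - B[1]
    return (lo, hi) if lo <= hi else None  # empty when B is "too big" for A

A, B = (-3.0, 3.0), (-1.0, 1.0)
print(star_diff(A, B))   # (-2.0, 2.0)

# Consistency: p1(x) = 3|x|, p2(x) = |x|, so g(x) = 2|x| and its support set
# {l : l*x <= g(x) for all x} is the interval [-2, 2], matching the result above.
```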
For a detailed discussion of differences of compact convex sets and their connection with various concepts of subdifferential in nonsmooth analysis see [21]. Let us now give a geometric interpretation of the subdifferential $\partial g(x)$. Let $\ell \in s(g)$. Since $\ell$ and $g$ are positively homogeneous, we have $\ell \in s(g)$ (i.e. $\ell(x) \le g(x)$ for all $x \in X$) if and only if $g(x) = 1$ implies $\ell(x) \le 1$. Indeed, let $x \in S_1(g)$; then $x = \lambda x'$, where $g(x') = 1$ and $\lambda \le 1$, therefore $\ell(x) = \lambda \ell(x') \le 1$. Thus the inclusion $\ell \in s(g)$ holds if and only if the level set $S_1(g)$ is contained in the halfspace $H_\ell = \{x : \ell(x) \le 1\}$. If $\ell \in \partial g(x)$ then $\ell \in s(g)$, i.e. $S_1(g) \subset H_\ell$, and moreover $\ell(x) = g(x)$. Therefore $\ell \in \partial g(x)$ if and only if the set $\{x' : \ell(x') = g(x)\}$ is a supporting hyperplane to the set $S_1(g)$ at the point $x$. The equality $\partial g(x) = \partial(\mathrm{co}_L g)(x)$ shows that a hyperplane supports $S_1(g)$ at the point $x$ if and only if this hyperplane supports the closed convex hull $\mathrm{cl\,co}\, S_1(g)$ at the same point. Let $p$ be a sublinear function which is continuous at a point $x \in X$. It is well known that the subdifferential $\partial p(x)$ is a nonempty weak* compact convex set and $\partial p(x)$
coincides with the subdifferential $\partial p'_x$ of the directional derivative $p'_x(\cdot) = p'(x,\cdot)$ of the function $p$ at the point $x$:
$$p'_x(u) = \lim_{\alpha \downarrow 0} \alpha^{-1}\big(p(x + \alpha u) - p(x)\big) = \max_{\ell \in \partial p(x)} \ell(u).$$
Clearly if $g \in PH_\ell(X)$ and $\partial g(x) \neq \emptyset$ then the directional derivative $g'_x$ may not exist. However we can say that the lower Dini derivative of the function $g$ at the point $x$ in the direction $u \in X$,
$$g^D(x,u) = \liminf_{\alpha \downarrow 0} \alpha^{-1}\big(g(x + \alpha u) - g(x)\big),$$
majorizes $p'_x(u)$, where $p = \mathrm{co}_L g$. This follows since $p(x) = g(x)$ and $p(y) \le g(y)$ for all $y \in X$, so that
$$p'_x(u) = \lim_{\alpha \downarrow 0} \alpha^{-1}\big(p(x + \alpha u) - p(x)\big) \le \liminf_{\alpha \downarrow 0} \alpha^{-1}\big(g(x + \alpha u) - g(x)\big) = g^D(x,u).$$
We now provide an example to show that the inequality $p'_x(u) \le g^D(x,u)$ may be strict.

Example 3.2: Let $X = \mathbb{R}^2$ and $U = U_1 \cup U_2$, where $U_1$ is the convex polyhedron with vertices at $(0,0)$, $(1,0)$, $(1,1)$, $(0,1/2)$ and $U_2$ is the convex polyhedron with vertices at $(0,0)$, $(-1,0)$, $(-1,1)$, $(0,1/2)$. It is easy to verify that the Minkowski gauge $g_1$ of the set $U_1$ has the form
$$g_1(x_1,x_2) = \begin{cases} \max(-x_1 + 2x_2,\ x_1) & \text{if } x_1 \ge 0,\ x_2 \ge 0 \\ +\infty & \text{in all other cases.} \end{cases}$$
The Minkowski gauge $g_2$ of the set $U_2$ has the form
$$g_2(x_1,x_2) = \begin{cases} \max(x_1 - 2x_2,\ -x_1) & \text{if } x_1 \le 0,\ x_2 \ge 0 \\ +\infty & \text{in all other cases.} \end{cases}$$
Since $U = U_1 \cup U_2$ we have $g(x) = \min(g_1(x), g_2(x))$ for all $x \in \mathbb{R}^2$; here $g$ is the Minkowski gauge of the set $U$. Clearly the convex hull of the set $U$ is the rectangle with vertices $(-1,0)$, $(-1,1)$, $(1,1)$, $(1,0)$. The Minkowski gauge $p$ of this rectangle has the form
$$p(x_1,x_2) = \begin{cases} \max(|x_1|,\ x_2) & \text{if } x_2 \ge 0 \\ +\infty & \text{if } x_2 < 0. \end{cases}$$
Clearly $p = \mathrm{co}_L g$. Now let $\bar{x} = (1,1)$. We have $g(\bar{x}) = p(\bar{x}) = 1$.
However $p(x) = \max(x_1, x_2)$ near the point $\bar{x}$, and so $p'(\bar{x},u) = \max(u_1, u_2)$. It is easy to check that
$$g'(\bar{x},u) = p'(\bar{x},u) = u_1 \quad \text{if } u_1 \ge u_2,$$
and
$$g'(\bar{x},u) = -u_1 + 2u_2 > u_2 = p'(\bar{x},u) \quad \text{if } u_1 < u_2.$$
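Example 3.2 can be checked numerically with forward difference quotients. The sketch below (illustrative code with the formulas as transcribed above) confirms the strict gap between the directional derivatives of $g$ and $p = \mathrm{co}_L g$ at $\bar{x} = (1,1)$ in a direction with $u_1 < u_2$.

```python
# Numeric check of Example 3.2 using the gauge formulas given in the text.
import math

def g1(x1, x2):
    return max(-x1 + 2*x2, x1) if (x1 >= 0 and x2 >= 0) else math.inf

def g2(x1, x2):
    return max(x1 - 2*x2, -x1) if (x1 <= 0 and x2 >= 0) else math.inf

def g(x1, x2):              # gauge of the union U1 u U2
    return min(g1(x1, x2), g2(x1, x2))

def p(x1, x2):              # gauge of the convex hull (the rectangle)
    return max(abs(x1), x2) if x2 >= 0 else math.inf

def dini(f, x, u, a=1e-7):  # forward difference quotient ~ directional derivative
    return (f(x[0] + a*u[0], x[1] + a*u[1]) - f(*x)) / a

xbar = (1.0, 1.0)
print(g(*xbar), p(*xbar))                   # both equal 1
u = (0.0, 1.0)                              # a direction with u1 < u2
print(dini(g, xbar, u), dini(p, xbar, u))   # ~2 for g versus ~1 for p: strict
```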
The role of very general directional derivatives for positively homogeneous functions has been recently explored in [8, 14, 5]. The following lemma will be useful when we study extremal problems in subsequent sections.

Lemma 3.1 Let $p \in PH_\ell(X)$, $x^* \in X$, $p(x^*) = \lambda$ and $0 < \lambda < +\infty$. The following assertions are equivalent:

(i) $\ell \in L$, $(\forall x \in S_\lambda(p))\ \ell(x) \le \ell(x^*)$, $\ell(x^*) = 1$;

(ii) $\lambda\ell \in \partial p(x^*)$.

Proof: (i) $\implies$ (ii). Since $\ell(x^*) = 1$ we have $(\lambda\ell)(x^*) = \lambda = p(x^*)$. Let us establish that $\lambda\ell \in \partial p$. At first we consider a vector $x$ such that $p(x) > 0$. Let $x' = \frac{\lambda}{p(x)}\, x$. Since $p(x') = \lambda$ it follows that $\ell(x') \le 1$ and therefore $\lambda\ell(x) \le p(x)$. Now let $p(x) = 0$. Clearly $\mu x \in S_\lambda(p)$ for all $\mu > 0$. Therefore $\ell(\mu x) \le 1$ for all $\mu > 0$, i.e. $\ell(x) \le 0$. Consequently we obtain the inequality $\lambda\ell(x) \le p(x)$ in this case also.

(ii) $\implies$ (i). We have, for all $x$, $\lambda\ell(x) \le p(x)$ and $\lambda\ell(x^*) = p(x^*) = \lambda$. Therefore $\ell(x^*) = 1$ and, for all $x \in S_\lambda(p)$,
$$\ell(x) \le \frac{1}{\lambda}\, p(x) \le 1 = \ell(x^*). \qquad \square$$
4 Star-shaped with Respect to $+\infty$ Sets and Their Gauges

We now introduce the notion of sets star-shaped with respect to $+\infty$. This notion is defined symmetrically to the notion of 0-st-sh sets.
Definition 4.1 A subset $U \subset X$ is called star-shaped with respect to $+\infty$ ($+\infty$-st-sh) if
$$x \in U,\ \lambda \ge 1 \implies \lambda x \in U.$$
Equivalently, $\lambda U \subset U$ for all $\lambda \ge 1$. If $U$ is $+\infty$-st-sh then the function $\nu_U : X \to \overline{\mathbb{R}}$, where
$$\nu_U(x) = \sup\,\{\lambda > 0 : x \in \lambda U\},$$
is called the $+\infty$-gauge of the set $U$. Such functions are discussed in relation to interior point methods in [8]. We assume that the supremum of the empty set is equal to zero. It is easy to check the following:
$$\nu_U(x) = 0 \iff R_x \cap U = \emptyset \ \text{ or } \ R_x \cap U = \{0\}, \qquad \nu_U(x) = +\infty \iff R_x \subset U,$$
$$\nu_U(x) > 0 \iff x \in \mathrm{cone}\, U, \qquad \big((\forall x \in X)\ \nu_U(x) = +\infty\big) \iff U = X.$$
If $0 \in U$ then $\nu_U(0) = +\infty$, and if $0 \notin U$ then $\nu_U(0) = 0$. Consider the following examples.

Example 4.1: Let $\ell \in L = X'$, $H^- = \{x : \ell(x) \le c\}$ and $H^+ = \{x : \ell(x) \ge c\}$, where $c > 0$. Clearly $H^-$ is a 0-st-sh set and $H^+$ is a $+\infty$-st-sh set. We have
$$\mu_{H^-}(x) = \inf\{\lambda > 0 : x \in \lambda H^-\} = \inf\{\lambda > 0 : \ell(x/\lambda) \le c\} = \inf\Big\{\lambda > 0 : \tfrac{1}{c}\,\ell(x) \le \lambda\Big\} = \max\Big(0,\ \tfrac{1}{c}\,\ell(x)\Big),$$
$$\nu_{H^+}(x) = \sup\{\lambda > 0 : x \in \lambda H^+\} = \sup\Big\{\lambda > 0 : \tfrac{1}{c}\,\ell(x) \ge \lambda\Big\} = \max\Big(0,\ \tfrac{1}{c}\,\ell(x)\Big).$$
Therefore $\nu_{H^+} = \mu_{H^-}$.

Example 4.2: Consider the following generalization of Example 4.1. Let $U$ be a 0-st-sh subset of $X$ such that the interior of $U$ is nonempty and every ray $R_x$
($x \neq 0$) does not intersect the boundary of $U$ more than once. Now let $V = X \setminus U$. It is easy to check that $V$ is a $+\infty$-st-sh set. We have
$$\nu_V(x) = \sup\{\lambda > 0 : x/\lambda \in V\} = \sup\{\lambda > 0 : x/\lambda \notin U\} = \inf\{\lambda > 0 : x/\lambda \in U\} = \inf\{\lambda > 0 : x \in \lambda U\} = \mu_U(x).$$

Example 4.3: Let $K$ be a cone in $X$. Clearly $K$ is a 0-st-sh set and, simultaneously, a $+\infty$-st-sh set. In this case we have
$$\mu_K(x) = \inf\{\lambda > 0 : x \in \lambda K\} = \begin{cases} 0 & x \in K \\ +\infty & x \notin K \end{cases} = \delta_K(x), \qquad \nu_K(x) = \sup\{\lambda > 0 : x \in \lambda K\} = \begin{cases} +\infty & x \in K \\ 0 & x \notin K \end{cases} = \delta_{X \setminus K}(x),$$
where $\delta_Z$ denotes the indicator function of a set $Z$. In the following we shall only be interested in closed $+\infty$-st-sh sets. For a function $g : X \to \overline{\mathbb{R}}$ we shall denote the level set $\{x \in X : g(x) \ge c\}$ by $Q_c(g)$.

Lemma 4.1 Let $g : X \to \overline{\mathbb{R}}$. Then the following are equivalent:

(i) $g$ is positively homogeneous, nonnegative, u.s.c. and there is an $x \in X$ such that $g(x) > 0$;

(ii) there is a closed nonempty $+\infty$-st-sh set $U$ such that $g$ coincides with the $+\infty$-gauge $\nu_U$ of the set $U$.

Proof: The proof is similar to that of Lemma 2.1; we provide an outline for completeness. (i) implies (ii). If $g(x) > 0$ then there is a $\lambda > 0$ such that $g(\lambda x) \ge 1$ and therefore $U = Q_1(g) \neq \emptyset$. Since $g$ is u.s.c. the set $U$ is closed. It is easy to check that $g = \nu_U$. (ii) implies (i). Let $U$ be a closed nonempty $+\infty$-st-sh set and let $\nu_U$ be its $+\infty$-gauge. Then there is an $x \in X$ such that $\nu_U(x) > 0$ (if $x \in U$ then $\nu_U(x) \ge 1$). The level sets $Q_c(\nu_U) = c\,Q_1(\nu_U) = cU$ are closed whenever $c > 0$. The set $Q_0(\nu_U) = X$ is closed also. Thus $\nu_U$ is u.s.c. $\square$

Remark 4.1: Note that the function $g$ in (i) above is the $+\infty$-gauge of the set $Q_1(g)$.

Definition 4.2 The set of all u.s.c. nonnegative positively homogeneous functions defined on $X$ with the property $\sup_{x \in X} g(x) > 0$ will be denoted by $PH_u(X)$. The totality of all closed nonempty $+\infty$-st-sh sets will be denoted by $St_{+\infty}(X)$.
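The halfspace gauges of Example 4.1 can be spot-checked numerically. In the sketch below (illustrative code, our own data), $\mu_{H^-}$ is computed by bisection from the membership condition $x/\lambda \in H^-$, while $\nu_{H^+}$ uses the closed-form expression derived above; the two agree, as the example claims.

```python
# Numeric spot-check of Example 4.1: for H- = {x : l(x) <= c} and
# H+ = {x : l(x) >= c} with c > 0, the two gauges coincide: max(0, l(x)/c).

l = lambda x: 2.0 * x[0] - 1.0 * x[1]   # a fixed linear functional
c = 3.0

def mu_H_minus(x, hi=1e6, tol=1e-10):
    # inf{lam > 0 : x/lam in H-}, found by bisection (membership is monotone).
    if l([xi / hi for xi in x]) > c:
        return float("inf")
    lo = 0.0
    while hi - lo > tol * max(1.0, hi):
        mid = (lo + hi) / 2
        if mid > 0 and l([xi / mid for xi in x]) <= c:
            hi = mid
        else:
            lo = mid
    return hi

def nu_H_plus(x):
    # sup{lam > 0 : x in lam*H+} = sup{lam > 0 : l(x) >= lam*c} = max(0, l(x)/c).
    return max(0.0, l(x) / c)

for x in [(3.0, 0.0), (1.0, 5.0), (-2.0, 1.0)]:
    print(x, mu_H_minus(x), nu_H_plus(x))   # the two columns agree
```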
We introduce the natural order relation $\ge$ in $PH_u(X)$ as follows, for $g_1, g_2 \in PH_u(X)$:
$$g_1 \ge g_2 \iff (\forall x \in X)\ g_1(x) \ge g_2(x),$$
and the order relation defined by inclusion in $St_{+\infty}(X)$, for $U_1, U_2 \in St_{+\infty}(X)$:
$$U_1 \ge U_2 \iff U_1 \supset U_2.$$
It is easy to check that the ordered sets $PH_u(X)$ and $St_{+\infty}(X)$ are complete lattices. If $(p_\alpha)_{\alpha \in A} \subset PH_u(X)$ then
$$(\sup_\alpha p_\alpha)(x) = \mathrm{cl}\,\big(\sup_\alpha p_\alpha(x)\big), \qquad (\inf_\alpha p_\alpha)(x) = \inf_\alpha p_\alpha(x).$$
Here $\mathrm{cl}\, f$ denotes the u.s.c. hull of the function $f$. If $(U_\alpha)_{\alpha \in A} \subset St_{+\infty}(X)$ then
$$\sup_\alpha U_\alpha = \mathrm{cl} \bigcup_{\alpha \in A} U_\alpha, \qquad \inf_\alpha U_\alpha = \bigcap_{\alpha \in A} U_\alpha.$$
The following is easily established.

Theorem 4.1 The mapping $\varphi_{+\infty} : St_{+\infty}(X) \to PH_u(X)$, where $\varphi_{+\infty}(U) = \nu_U$, is a lattice isomorphism.

5 Upper Support Sets and Superdifferentials at a Point
It should be noted that it is not possible to introduce the notion of $L$-concave hull and superdifferential at a point for all functions in $PH_u(X)$. This follows since the $L$-concave hull of a function is a superlinear function; however there are no superlinear nonnegative functions $q \neq 0$ defined on all of $X$ with $q(0) = 0$ (if $q(x) > 0$ then $q(0) \ge q(x) + q(-x) > 0$). So we can only introduce these notions for a special subclass of $PH_u(X)$. Let $g \in PH_u(X)$. We introduce the following sets:
$$V_g = \mathrm{cl\,co}\, Q_1(g), \qquad K_g = \mathrm{cl\,cone\,co}\, Q_1(g) = \mathrm{cl} \bigcup_{\lambda > 0} \mathrm{co}\,\lambda Q_1(g),$$
where $Q_1(g) = \{x : g(x) \ge 1\}$ is the level set of the function $g$. Clearly $K_g = \mathrm{cl\,cone}\, V_g$. We will consider below only functions $g \in PH_u(X)$ such that $0 \notin V_g$; we will denote the set of all such functions by $PH_u^0(X)$.

Definition 5.1 Let $g \in PH_u^0(X)$. The set
$$\bar{s}(g) = \{\ell \in L : (\forall x \in K_g)\ \ell(x) \ge g(x)\}$$
is called the upper support set of $g$. Clearly this set is closed and convex.

Proposition 5.1 For $g \in PH_u^0(X)$ the set $\bar{s}(g)$ is not empty.
Proof: Since $g \in PH_u^0(X)$ we have $0 \notin V_g$, and applying the separation theorem we can find $\ell \in L$ such that
$$\inf_{x \in Q_1(g)} \ell(x) = \inf_{x \in V_g} \ell(x) > 0.$$
Assume (normalizing $\ell$ if necessary) that $\inf_{x \in Q_1(g)} \ell(x) = 1$. We have $\ell(x) \ge g(x)$ if $g(x) = 1$. Since $\ell$ and $g$ are positively homogeneous we have $\ell(x) \ge g(x)$ if $g(x) > 0$. Clearly $\ell(x) \ge 0$ for all $x \in \mathrm{cl\,cone\,co}\, V_g = K_g$, and therefore $\ell(x) \ge g(x)$ for all $x \in K_g$, that is $\ell \in \bar{s}(g)$. $\square$
Let $g \in PH_u^0(X)$. The function $\mathrm{co}_L g$, where for $x \in X$
$$(\mathrm{co}_L g)(x) = \begin{cases} \inf\{\ell(x) : \ell \in \bar{s}(g)\} & x \in K_g \\ 0 & x \notin K_g, \end{cases}$$
is called the $L$-concave hull of the function $g$. The superdifferential $\bar{\partial} g(x)$ of the function $g$ at the point $x$ is the set
$$\bar{\partial} g(x) = \{\ell \in \bar{s}(g) : \ell(x) = g(x)\}.$$

Proposition 5.2 Let $g \in PH_u^0(X)$. Then the following hold:

1. $g(x) = (\mathrm{co}_L g)(x) \iff \bar{\partial} g(x) = \bar{\partial}(\mathrm{co}_L g)(x)$.

2. $\bar{\partial} g(x) \neq \emptyset \implies g(x) = (\mathrm{co}_L g)(x)$.

3. $\big(g(x) = (\mathrm{co}_L g)(x),\ \bar{\partial}(\mathrm{co}_L g)(x) \neq \emptyset\big) \implies \bar{\partial} g(x) \neq \emptyset$.

Proof: Follows as in Proposition 3.1. $\square$

Example 5.1: Let $K$ be a solid closed convex cone, and let $q_1, q_2$ be superlinear functions defined on $K$ with $q_1 \ge q_2$ on $K$. Put
$$g(x) = \begin{cases} q_1(x) - q_2(x) & x \in K \\ 0 & x \notin K. \end{cases}$$
Clearly $g \in PH_u^0(X)$. It is easy to check, as in Example 3.1, that
$$\bar{s}(g) = \bar{\partial} q_1 \overset{*}{-} \bar{\partial} q_2, \qquad \bar{\partial} g(x) = \bar{\partial} q_1(x) \overset{*}{-} \bar{\partial} q_2(x) \quad \text{for } x \in \mathrm{int}\, K.$$
Let $g \in PH_u^0(X)$ and $x \in Q_1(g)$. It can be shown, in a similar fashion to the approach for 0-st-sh sets, that $\ell \in \bar{\partial} g(x)$ if and only if the set $\{x' : \ell(x') = g(x)\}$ is a supporting hyperplane for the set $Q_1(g)$ at the point $x$.
Lemma 5.1 Let $g \in PH_u^0(X)$, $x^* \in K_g$ and $g(x^*) = \lambda$ with $0 < \lambda < +\infty$. Then the following are equivalent:

(i) $\ell \in L$, $(\forall x \in Q_\lambda(g))\ \ell(x) \ge \ell(x^*)$, $\ell(x^*) = 1$;

(ii) $\lambda\ell \in \bar{\partial} g(x^*)$.

Proof: (i) implies (ii). Since $\ell(x^*) = 1$ we have $(\lambda\ell)(x^*) = g(x^*)$. Now let us consider a vector $x \in K_g$ such that $g(x) > 0$. Let $x' = \frac{\lambda}{g(x)}\, x$. Since $g(x') = \lambda$ it follows that $\ell(x') \ge 1$ and therefore $\lambda\ell(x) \ge g(x)$. Since $\ell(x) \ge 1$ for all $x \in Q_1(g)$ we have that $\ell(x) \ge 0$ for all $x \in K_g = \mathrm{cl\,cone\,co}\, Q_1(g)$. So if $x \in K_g$ and $g(x) = 0$ then $\lambda\ell(x) \ge g(x)$ also. Therefore $\lambda\ell \in \bar{\partial} g(x^*)$. The proof of (ii) implies (i) is similar to that of Lemma 3.1. $\square$
6 Maximization of Sublinear Functions Subject to Positively Homogeneous Constraints

We consider the following extremal problem:
$$P(c): \qquad f(x) \to \max \quad \text{subject to} \quad g(x) \le c,$$
where $f, g \in PH_\ell(X)$ and $f$ is sublinear. Assume that $c > 0$. We assume that there exists a solution of this problem, with $d = \max\{f(x) : g(x) \le c\}$, and that the value $d$ of the problem $P(c)$ is finite. Moreover we assume that $d \neq 0$.

Theorem 6.1 Let $f, g \in PH_\ell(X)$ and assume $f$ is sublinear. Let the inequalities $c > 0$, $d > 0$ hold. Let the function $f$ be continuous at a point $x^*$ such that $g(x^*) = c$. Then the point $x^*$ is a solution of the problem $P(c)$ if and only if the intersection
$$\frac{1}{d}\,\partial f(x^*) \cap \frac{1}{c}\,\partial g(x^*) \tag{1}$$
contains a nonzero linear functional.
Proof: Let $x^*$ be a solution of the problem $P(c)$. Since $f$ is continuous at the point $x^*$ it follows that $f$ is continuous at each point $\lambda x^*$ with $\lambda > 0$, and clearly $f(\lambda x^*) < d$ when $0 < \lambda < 1$; hence the set $T_d(f) = \{x : f(x) < d\}$ has nonempty interior. Applying the separation theorem we get a linear function $\ell \in L$ such that $\ell(x) \le \ell(x^*)$ for all $x \in S_d(f)$. Since $0 \in T_d(f)$ we have $\ell(x^*) > 0$. Without loss of generality we can assume that $\ell(x^*) = 1$. Lemma 3.1 now yields $d\ell \in \partial f(x^*)$. Since $x^*$ is a solution of the problem
$P(c)$ it follows that the inclusion $S_c(g) \subset S_d(f)$ holds, and therefore $\ell(x) \le \ell(x^*) = 1$ for all $x \in S_c(g)$. Applying Lemma 3.1 again we obtain $c\ell \in \partial g(x^*)$. Thus the intersection (1) contains the linear functional $\ell \neq 0$. Conversely, let the intersection (1) contain a linear functional $\ell \neq 0$. Thus $c\ell \in \partial g(x^*)$ and $d\ell \in \partial f(x^*)$. These inclusions show that the following equalities hold:
$$c\,\ell(x^*) = g(x^*), \qquad d\,\ell(x^*) = f(x^*).$$
Since $g(x^*) = c$ we have that $\ell(x^*) = 1$ and therefore $f(x^*) = d = \max\{f(x) : g(x) \le c\}$. $\square$
Remark 6.1: Let us note that if $x^*$ is a solution of the problem $P(c)$ then the subdifferential $\partial g(x^*)$ is nonempty.

Remark 6.2: We can rewrite Theorem 6.1 in the following form: $x^*$ is a maximizer of the problem $P(c)$ if and only if there are functionals $\ell_1 \in \partial f(x^*)$ and $\ell_2 \in \partial g(x^*)$ such that $\lambda_1 \ell_1 + \lambda_2 \ell_2 = 0$, where $\lambda_1 = 1/d$ and $\lambda_2 = -1/c$. Thus we can consider the numbers $\lambda_1$ and $\lambda_2$ as Lagrange multipliers. Let us note that the ratio $\lambda_1/\lambda_2 = -c/d$ of these multipliers is determined by the ratio of the constraint level $c$ to the value $d$ of the program.

Remark 6.3: We can consider various approximations for a function $g \in PH_\ell(X)$ near a point $x^*$. For example, if $g$ is locally Lipschitz we can consider the Clarke subdifferential (see [4]) or the Michel-Penot subdifferential (see [16, 5]). Let us note that these and other known approximations are defined with the help of generalized directional derivatives which majorize the lower Dini directional derivative. The subdifferential $\partial g(x) = \partial(\mathrm{co}_L g)(x)$ is defined using the directional derivative of the sublinear function $\mathrm{co}_L g$; we have seen that this directional derivative is a lower sublinear approximation to the lower Dini directional derivative (possibly strictly lower, see Example 3.2).

Let us consider the situation where the value $d$ of the problem $P(c)$ is unknown. Theorem 6.1 gives the following necessary condition for a maximum in this case: let $f, g \in PH_\ell(X)$, with $f$ sublinear and continuous at a point $x^*$, the putative solution of the problem $P(c)$, and assume that $0 < f(x^*) < +\infty$. Then there are numbers $\lambda_1 > 0$ and $\lambda_2 > 0$ and $\ell' \in L$, $\ell' \neq 0$, such that
$$\ell' \in \frac{1}{\lambda_1}\,\partial f(x^*) \cap \frac{1}{\lambda_2}\,\partial g(x^*). \tag{2}$$

Now we consider a function $f$ which has the form
$$f(x) = \begin{cases} l(x) & \text{if } x \in K \\ +\infty & \text{otherwise,} \end{cases} \tag{3}$$
where $K$ is a closed solid convex cone, $l \in L$, and $l(x) > 0$ for all $x \in K$, $x \neq 0$. Clearly $f \in PH_\ell(X)$. Let $x^* \in \mathrm{int}\, K$ and $l' \in \partial f(x^*)$. We have $l'(x) \le l(x)$ for all $x \in K$ and $l'(x^*) = l(x^*)$. Since $x^* \in \mathrm{int}\, K$ we have $l' = l$, so $\partial f(x^*) = \{l\}$ in this case. We can give a necessary and sufficient condition for a global maximum in this case without using the value $d$ of the problem $P(c)$.

Theorem 6.2 Let $g \in PH_\ell(X)$ and assume $f$ has the form (3). A point $x^* \in K$ with $g(x^*) = c$ is a solution of the problem $P(c)$ if and only if there is $\lambda > 0$ such that
$$\lambda l \in \partial g(x^*). \tag{4}$$
Proof: Let $x^*$ be a solution to $P(c)$. Then formula (2) holds. Since $\partial f(x^*) = \{l\}$ we have $\ell' = \frac{1}{\lambda_1}\, l$, and therefore (4) is true with $\lambda = \lambda_2/\lambda_1$. Now let (4) hold with $\lambda > 0$, and let $g(x) \le c$. By the definition of the subdifferential we have
$$\lambda\, l(x) \le g(x) \le c, \qquad \lambda\, l(x^*) = g(x^*) = c.$$
Therefore $l(x^*) \ge l(x)$ whenever $g(x) \le c$. $\square$

Now let us explain why we consider only a function $f$ which has the form (3). Let $f \in PH_u^0(X)$ and $\mathrm{dom}\, f = K$, where $K$ is a closed solid convex cone. Let us give sufficient conditions for a solution of the problem $P(c)$ with given $f$ and $g \in PH_\ell(X)$: if $x^* \in K$, $g(x^*) = c$ and there are numbers $\lambda_1 > 0$, $\lambda_2 > 0$ and a functional $\ell' \in L$, $\ell' \neq 0$, such that
$$\ell' \in \frac{1}{\lambda_2}\,\partial g(x^*) \cap \frac{1}{\lambda_1}\,\bar{\partial} f(x^*), \tag{5}$$
then $x^*$ is a solution of the problem $P(c)$. (Compare with the necessary condition (2).) Indeed, if $g(x) \le c$ we have in this case
$$\frac{1}{\lambda_1}\, f(x) \le \ell'(x) \le \frac{1}{\lambda_2}\, g(x) \le \frac{1}{\lambda_2}\, c, \qquad \frac{1}{\lambda_1}\, f(x^*) = \ell'(x^*) = \frac{1}{\lambda_2}\, g(x^*) = \frac{1}{\lambda_2}\, c.$$
Since $\lambda_1 > 0$ we have $f(x) \le f(x^*)$ whenever $g(x) \le c$.

Now let $f$ be a continuous nonnegative positively homogeneous function defined on the cone $K$. We can extend the function $f$ to the entire space using two methods: either we can consider the function $\bar{f}$,
$$\bar{f}(x) = \begin{cases} f(x) & x \in K \\ +\infty & x \notin K, \end{cases}$$
or the function $\hat{f}$,
$$\hat{f}(x) = \begin{cases} f(x) & x \in K \\ 0 & x \notin K. \end{cases}$$
Clearly $\bar{f} \in PH_\ell(X)$ and $\hat{f} \in PH_u^0(X)$. If $x \in K$ the subdifferential $\partial \bar{f}(x)$ and the superdifferential $\bar{\partial} \hat{f}(x)$ depend only on $f$ (they do not depend on the extension chosen). So we can write both the necessary condition (2) and the sufficient condition (5) in this case. It can be shown that if both of these hold at a point $x^* \in \mathrm{int}\, K$ then $f$ has the form (3). Indeed, in this case both of the sets $\partial \bar{f}(x^*)$ and $\bar{\partial} \hat{f}(x^*)$ are nonempty. Let $l_1 \in \partial \bar{f}(x^*)$ and $l_2 \in \bar{\partial} \hat{f}(x^*)$. Then, for all $x \in K$,
$$l_1(x) \le f(x) \le l_2(x),$$
that is $l_2 - l_1 \in K^*$ and $l_1(x^*) = l_2(x^*)$. Since $x^* \in \mathrm{int}\, K$ we have $l_1 = l_2$. Thus $f$ is equal to a linear function on the cone $K$. So using this method we no longer require knowledge of the value $d$, provided we are dealing with functions which have the form (3).

Now we can compare conditions for local and global maxima for the problem $P(c)$ involving a function $f$ which has the form (3). Assume that $g \in PH_\ell(X)$ and $g$ is a Lipschitz function near a point $x^* \in \mathrm{int}\, K$. Then the necessary condition for a local maximum (not always sufficient) has the form $\lambda l \in \partial_C g(x^*)$, where $\partial_C g(x^*)$ is the Clarke subdifferential of the function $g$. At the same time the necessary and sufficient condition for a global maximum has the form (4): $\lambda l \in \partial g(x^*)$. Let us note that the convex compact set $\partial g(x^*)$ is contained within the convex compact set $\partial_C g(x^*)$.

Remark 6.4: Applying the proof of Theorem 6.1, we can obtain necessary conditions for a local maximum which do not depend on the value $d$ of the program. Let $x^*$ be a local maximum of the problem $P(c)$. Then there is a small cone $\tilde{K}$ (containing $x^*$) such that $x^*$ is a global maximizer of the following problem:
$$f(x) \to \max \quad \text{subject to} \quad g(x) \le c,\ x \in \tilde{K}.$$
Let us replace the function $g$ by the function $\tilde{g}$ defined as follows:
$$\tilde{g}(x) = \begin{cases} g(x) & x \in \tilde{K} \\ +\infty & x \notin \tilde{K}. \end{cases}$$
Clearly $x^*$ is a global maximizer of the following program:
$$f(x) \to \max \quad \text{subject to} \quad \tilde{g}(x) \le c,$$
and so, by Theorem 6.1, the intersection
$$\frac{1}{d}\,\partial f(x^*) \cap \frac{1}{c}\,\partial \tilde{g}(x^*)$$
contains a nonzero linear functional $\ell$. Here $d = f(x^*)$, so we can rewrite the necessary condition in the following form: the intersection
$$\frac{1}{f(x^*)}\,\partial f(x^*) \cap \frac{1}{c}\,\partial \tilde{g}(x^*) \tag{6}$$
contains a nonzero linear functional $\ell$. Clearly this condition does not depend on the value $d$. Note that the inequality $\tilde{g} \ge g$ implies that $\mathrm{co}_L \tilde{g} \ge \mathrm{co}_L g$, and therefore we cannot substitute $\partial g(x^*)$ for $\partial \tilde{g}(x^*)$ in (6).
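The global optimality condition (4) of Theorem 6.2 can be seen concretely in a small finite-dimensional instance. The sketch below uses our own data (not from the paper): $f(x) = l(x)$ on $K = \mathbb{R}^2_+$ with $l = (2,1)$, and $g(x) = \max(x_1, x_2)$ on $K$ with $c = 1$. A grid search finds the maximizer $x^* = (1,1)$, and $\lambda l$ lies in $\partial g(x^*) = \mathrm{co}\{(1,0),(0,1)\}$, both coordinates of the max being active at $x^*$.

```python
# Illustrative finite-dimensional check of Theorem 6.2 (assumed example data).
l = (2.0, 1.0)
c = 1.0

def g(x):                       # gauge-type constraint function on K = R^2_+
    return max(x)

# Brute-force the maximizer of l over the feasible region {x >= 0 : g(x) <= 1}.
n = 400
best = max(
    ((i / n, j / n) for i in range(n + 1) for j in range(n + 1)),
    key=lambda x: l[0] * x[0] + l[1] * x[1],
)
print(best)                     # the maximizer x* = (1.0, 1.0)

# lambda*l must be a convex combination (t, 1-t) of the two active gradients
# (1,0) and (0,1) of g at x*: lambda*2 = t and lambda*1 = 1 - t.
lam = 1.0 / (l[0] + l[1])       # solves both equations simultaneously
t = lam * l[0]
assert 0.0 <= t <= 1.0 and abs(lam * l[1] - (1.0 - t)) < 1e-12
print("lambda =", lam)          # lambda = 1/3 certifies global optimality
```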
7 Minimization of Superlinear Functions Subject to Positively Homogeneous Constraints

We now consider the following extremal problem:
$$Q(c): \qquad f(x) \to \min \quad \text{subject to} \quad g(x) \ge c,$$
where $f, g \in PH_u^0(X)$ with $K_f = K_g$ and $c > 0$. We further assume that there exists a solution of this problem, with $d = \min\{f(x) : g(x) \ge c\}$ and $0 < d < +\infty$.

Theorem 7.1 Let $f, g \in PH_u^0(X)$ and suppose there is a closed solid convex cone $K$ such that $K_f = K_g = K$ and $K \neq X$. Let the restriction of $f$ to the cone $K$ be superlinear. Then a point $x^* \in \mathrm{int}\, K$ such that $g(x^*) = c$ is a solution of $Q(c)$ if and only if the intersection
$$\frac{1}{d}\,\bar{\partial} f(x^*) \cap \frac{1}{c}\,\bar{\partial} g(x^*)$$
contains a nonzero linear functional.
Proof: Let $x^*$ be a solution of the problem $Q(c)$. Since $x^* \in \mathrm{int}\, K$, the interior of the closed convex set $Q_d(f)$ is not empty. Since $f \in PH_u^0(X)$ we have $0 \notin V_f \supset Q_1(f)$ and therefore $0 \notin Q_d(f)$. Now let us consider the set $x^* - K$ and show that
$$\mathrm{int}\,(x^* - K) \cap Q_d(f) = \emptyset.$$
If $y \in \mathrm{int}\,(x^* - K)$ then there is a $v \in \mathrm{int}\, K$ such that $x^* = y + v$. Since $v \in \mathrm{int}\, K$ there is a $\lambda > 0$ such that $v - \lambda x^* \in K$, i.e. there is a $w \in K$ such that $v = \lambda x^* + w$. Applying the superlinearity of $f$ on the cone $K$ we have
$$f(x^*) \ge f(y) + f(v) \ge f(y) + f(\lambda x^*) + f(w) \ge f(y) + \lambda f(x^*).$$
Therefore $f(y) \le (1 - \lambda) f(x^*) < d$, and thus $y \notin Q_d(f)$. On the other hand, since $x^* \in \mathrm{int}\, K$ we have $0 \in \mathrm{int}\,(x^* - K)$. Now we can apply the separation theorem to find $\ell \in L$ such that
$$\inf_{x \in Q_d(f)} \ell(x) \ge \sup_{x' \in x^* - K} \ell(x') = \ell(x^*) + \sup_{x' \in -K} \ell(x').$$
Since $-K$ is a cone we have $\sup_{x' \in -K} \ell(x') = 0$. Since $0 \in \mathrm{int}\,(x^* - K)$ we have $\ell(x^*) = \sup_{x' \in x^* - K} \ell(x') > 0$. Without loss of generality we can assume that $\ell(x^*) = 1$. The remainder of the proof follows in a similar way to the proof of Theorem 6.1. $\square$

We now provide a result related to Theorem 6.2 which has an application to economic theory.

Theorem 7.2 Let $K$ be a closed convex cone, $f, g \in PH_u^0(X)$, $K_f = K_g = K$, and suppose there is $l \in X'$ such that $f(x) = l(x)$ for all $x \in K$ and $l(x) > 0$ for all $x \in K$, $x \neq 0$. Then a point $x^* \in K$ such that $g(x^*) = c$ is a solution of $Q(c)$ if and only if there is $\lambda > 0$ such that
$$\lambda l \in \bar{\partial} g(x^*). \tag{7}$$

Proof: Let $x^*$ be a solution of the problem $Q(c)$ and $f(x^*) = l(x^*) = d$. Clearly $d > 0$. We have $1 = l(x^*)/d \le l(x)/d$ for all $x \in Q_c(g)$. Using Lemma 5.1 we can obtain the inclusion (7) with $\lambda = c/d$. On the other hand, if this inclusion is true then, using an approach similar to the proof of Theorem 6.2, we obtain that $x^*$ is a minimizer of the problem $Q(c)$. $\square$

We now give an economic application of Theorem 7.2. Let us consider the cone $\mathbb{R}^n_+$ of all vectors with nonnegative coordinates in the $n$-dimensional coordinate space $\mathbb{R}^n$ as the cone of vectors of resources. Consider an economic system which can transform a vector of resources $x$ into a vector of output, and denote the value of the output by $G(x)$. We assume that there is a price vector $l = (l_1, l_2, \ldots, l_n)$ in the system, where $l_i > 0$ is the price of product $i$, and the value of the output is calculated with the help of the price vector $l$. The function $G$ is called the production function. Clearly $G$ is a nonnegative function defined on the cone $\mathbb{R}^n_+$. As a rule it is assumed in economic theory that $G$ is positively homogeneous of degree $\alpha > 0$ and continuous. In this case the function $g$ with $g(x) = G(x)^{1/\alpha}$ is positively homogeneous of degree one and continuous. One of the classical problems of economic theory is the following: find a vector of resources which has minimal value among all vectors which allow the receipt of an output greater than or equal to a given value $c > 0$. Clearly this problem coincides with the problem $Q(c)$ defined using the functions $f$ and $g$, where $f(x) = l(x)$ and $g(x)$ is defined as above for all $x \in \mathbb{R}^n_+$; we assume also that $f(x) = g(x) = 0$ for all $x$ which do not belong to $\mathbb{R}^n_+$. Theorem 7.2 gives both necessary and sufficient conditions for the solution of this problem. Let us note that previously only concave functions $g$ were considered in economic theory; concavity allows the application of the Karush-Kuhn-Tucker theorem for the analysis of this problem.
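A worked instance of this economic application, with our own numbers (not from the paper): take the Cobb-Douglas production function $G(x) = x_1 x_2$ (degree 2), so $g(x) = G(x)^{1/2}$ is concave on $\mathbb{R}^2_+$ and its superdifferential at an interior point is just the gradient. Theorem 7.2 then reduces to $\lambda l = \nabla g(x^*)$, which the sketch verifies.

```python
# Cost minimization under a Cobb-Douglas production constraint (example data):
# minimize l(x) subject to g(x) = sqrt(x1*x2) >= c on R^2_+.
import math

l = (2.0, 8.0)
c = 1.0

def g(x1, x2):
    return math.sqrt(x1 * x2)

# First-order condition: grad g proportional to l gives x2 = x1*l1/l2, and
# g(x*) = c pins down the scale.
x1 = c * math.sqrt(l[1] / l[0])     # = 2.0
x2 = c * math.sqrt(l[0] / l[1])     # = 0.5
assert abs(g(x1, x2) - c) < 1e-12   # the constraint is active at x*

grad = (0.5 * math.sqrt(x2 / x1), 0.5 * math.sqrt(x1 / x2))
lam = grad[0] / l[0]
# lambda*l coincides with grad g(x*): the condition (7) of Theorem 7.2.
assert abs(lam * l[1] - grad[1]) < 1e-12
print("x* =", (x1, x2), "cost =", l[0]*x1 + l[1]*x2, "lambda =", lam)
```

Any feasible perturbation of $x^*$ on the isoquant $g = c$ raises the cost, which can also be confirmed by sampling.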
8 Associated Problems

We considered in section 6 maximization problems $P(c)$ with a sublinear objective function and positively homogeneous constraints belonging to the class $PH_\ell(X)$. Now we consider problems of the form $P(c)$ where the objective function is merely assumed to belong to the class $PH_\ell(X)$ and the constraint is superlinear. At first we consider an arbitrary problem $P(d)$ with positively homogeneous nonnegative objective and constraint functions:
$$P(d): \qquad g(x) \to \max \quad \text{subject to} \quad f(x) \le d.$$
Here $f$ and $g$ are nonnegative positively homogeneous functions and $0 < d < +\infty$. Let $c$ be the value of this problem; assume that $0 < c < +\infty$ and that the set $T$ of solutions to this problem is nonempty. If $x^* \in T$ then $g(x^*) = c$ and $f(x^*) = d$. Now consider the following problem:
$$Q(c): \qquad f(x) \to \min \quad \text{subject to} \quad g(x) \ge c.$$

Lemma 8.1 The value of the problem $Q(c)$ coincides with $d$.

Proof: We have $\min_{g(x) \ge c} f(x) \le f(x^*) = d$. Assume that there is $x'$ such that $g(x') \ge c$ and $f(x') = d' < d$. Then $x'$ is feasible for $P(d)$, so $g(x') \le c$ and hence $g(x') = c$; thus $x' \in T$ and therefore $f(x') = d$, a contradiction. Consequently the value of $Q(c)$ equals $d$, and the set of solutions of the problem $P(d)$ has the form
$$T = \{x : g(x) = c,\ f(x) = d\}. \tag{8}$$
On the other hand the set of solutions of the problem $Q(c)$ has the same form as (8). Thus the problems $P(d)$ and $Q(c)$ have the same set of solutions, and we can consider $Q(c)$ instead of $P(d)$. We say that $P(d)$ and $Q(c)$ are associated problems. Recently Tuy [24] has discussed a concept similar to the idea of associated problems, together with its computational implications.

Theorem 8.1 Let us consider the problem $P(d)$:
$$g(x) \to \max \quad \text{subject to} \quad f(x) \le d.$$
Assume that all the conditions of Theorem 7.1 are satisfied. Then $x^*$ is a solution of the problem $P(d)$ if and only if the intersection
$$\frac{1}{d}\,\bar{\partial} f(x^*) \cap \frac{1}{c}\,\bar{\partial} g(x^*)$$
contains a nonzero linear functional.

Proof: We can consider the associated problem $Q(c)$ and apply Theorem 7.1. $\square$

Remark 8.1: Let us note that this optimality condition is expressed in terms of superdifferentials, while the optimality condition in Theorem 6.1 is expressed using subdifferentials.

Now we consider the problem $Q(d)$:
$$g(x) \to \min \quad \text{subject to} \quad f(x) \ge d,$$
and define the associated problem $P(c)$ as follows:
$$f(x) \to \max \quad \text{subject to} \quad g(x) \le c.$$
Assume that $0 < c < +\infty$ and $0 < d < +\infty$.

Lemma 8.2 Let $c$ be the value of the problem $Q(d)$. Then the value of the associated problem $P(c)$ coincides with $d$.

Proof: Similar to the proof of Lemma 8.1. $\square$

It is straightforward to check that associated problems have the same solution sets.

Theorem 8.2 Consider the problem $Q(d)$:
$$g(x) \to \min \quad \text{subject to} \quad f(x) \ge d.$$
Assume that all the conditions of Theorem 6.1 are satisfied. Then $x^*$ is a solution of $Q(d)$ if and only if the intersection
$$\frac{1}{d}\,\partial f(x^*) \cap \frac{1}{c}\,\partial g(x^*)$$
contains a nonzero linear functional.
Sublinear Maximization
Now we consider the problem P(c): f(x)
—> max subject to g(x) < c
where / and g are l.s.c sublinear functions defined on the space X, in this case not necessarily nonnegative. We assume that there exists a solution of the problem with d = max {f(x) : g(x) < c} and - c o < d < +oo, d / 0. It is easy to check that both the system of inequalities c > 0, d < 0 and the system of inequalities c < 0, d > 0 are impossible. So we consider the following two cases: 1. c > 0, d > 0 2. c < 0, d < 0 For the first case, c > 0, d > 0, we require the following version of Lemma 3.1.
373
Homogeneous Programming
Lemma 9.1 Let p be a sublinear l.s.c function defined on X, x* € X and p(x*) = X where 0 < A < -fee. The following assertions are equivalent:
(i)iBL,
(Vx e sx{p))e(x) < e(x'), e(X') = 1
(a) xe e 5p(x*) Proof: (i) implies (ii). The equality (A^)(x*) = p(x') and the inequality (A^)(x) < p(x) when p(x) > 0 were proved in the proof of Lemma 3.1. Now we will prove that (A^)(x) < p(x) if p(x) < 0. Applying the l.s.c of the function p and the inequality A = p(x') > 0 we obtain the inequality p(ax + x*) > 0 for all sufficiently small a > 0. Since p(ax + x*) > 0 we see that the following inequality holds: [X£)(ax + x*) < p(ax + x*). Since p is sublinear we have p(ax + x*) < ap(x) + p(x'). Since (X£)(x*) = p(x*) we have that (A/)(x) < p(x). Thus (\(){x) < p{x) for all x and (X£)(x') = p(x'). Thus
Xi edp{x'). (ii) implies (i). Similar to that of Lemma 3.1
□
Theorem 9.1 Let us consider the problem P(c) where f, g are l.s.c. sublinear functions. Assume that c > 0, d > 0. Let the function f be continuous at x* where g(x*) = c. Then x* is a solution of the problem P(c) if and only if the intersection
(1/d)∂f(x*) ∩ (1/c)∂g(x*)
contains a nonzero linear functional.
The proof is similar to Theorem 6.1 and hence is omitted. Now for the case c < 0, d < 0 we require the following lemma.
Lemma 9.2 Let p be a l.s.c. sublinear function with p(x) ≤ 0 for all x ∈ dom p. Let x* ∈ dom p, λ = p(x*) and λ < 0. Then the following assertions are equivalent:
(i) ℓ ∈ X′, (∀x ∈ S_λ(p)) ℓ(x) ≤ ℓ(x*), ℓ(x*) = 1;
(ii) −λℓ ∈ ∂p(x*).
Proof: Similar to that of Lemma 3.1. □
Now we pass on to the case c < 0, d < 0. Let us note that those x where either f(x) > 0 or g(x) > 0 are not interesting in this case, and we will consider only functions f and g with the following properties:
(∀x ∈ dom f) f(x) ≤ 0,  (∀x ∈ dom g) g(x) ≤ 0.  (9)
For example we can substitute for f(x) and g(x) the value +∞ at all points where these functions are positive. Thus we obtain l.s.c. functions.
A. M. Rubinov and B. M. Glover
Theorem 9.2 Let us consider l.s.c. sublinear functions f and g such that the inequalities (9) hold. Let c < 0 and let d, the value of the problem P(c), be negative. Let the function f be continuous at the point x* such that g(x*) = c. Then the point x* is a solution of the problem P(c) if and only if the intersection (1) contains a nonzero linear functional. Proof: The analysis is similar to that in the proof of Theorem 6.1. It shows that the result will follow if we use Lemma 9.2 instead of Lemma 3.1 and prove the following assertion: if ℓ is a linear function such that ℓ ≠ 0 and ℓ(x*) = max{ℓ(x) : f(x) ≤ f(x*)}, then ℓ(x*) < 0. We have f(2x*) ≤ f(x*) in our case; therefore ℓ(2x*) ≤ ℓ(x*), hence ℓ(x*) ≤ 0. Assume that ℓ(x*) = 0. If y ∈ dom f then f(x* + y) ≤ f(x*) + f(y) ≤ f(x*) and therefore ℓ(y) = ℓ(x* + y) ≤ ℓ(x*) = 0. Thus, for all y ∈ dom f, ℓ(y) ≤ 0. Since the function f is continuous at the point x*, it follows that λx* is an interior point of the set S_{f(x*)}(f) for λ > 1, and therefore λx* is an interior point of the cone dom f. If ℓ(x*) = 0, ℓ is nonpositive on the cone dom f and x* is an interior point of this cone, then ℓ = 0, a contradiction. □ Remark 9.1: If dom f = dom g = K and x* ∈ int K then we can prove Theorem 9.2 with the help of Theorem 7.1 by considering the functions −f and −g instead of the functions f and g, and the problem Q(−c) instead of problem P(c), where Q(−c) is the problem: (−f)(x) → min subject to (−g)(x) ≤ −c. Recently Jeyakumar and Glover [13] have discussed conditions characterizing global optimality for programming problems, including sublinear maximization problems such as P(c), using a generalization of Farkas' lemma.
10
Lagrange Multipliers for Sublinear Maximization
In section 6 we established Lagrange multiplier rules for positively homogeneous programming problems with a single constraint. We now consider problems involving a finite number of sublinear constraints. Consider the following extremal problem:
(P)  f(x) → max subject to g_i(x) ≤ 1, i ∈ I = {1, 2, …, n}, x ∈ K,
where f, g_i (i ∈ I) are continuous sublinear functions and K is a closed convex cone. Let g = max_i g_i + δ_K, where δ_K is the indicator function of the cone K. Clearly g is a l.s.c. sublinear function. For x ∈ K we shall denote by M_x the cone defined as follows: M_x = cl(K + {λx : λ ≤ 0}) = cl(K + {λx : λ ∈ ℝ}).
Clearly
M*_x = K* ∩ [{λx : λ ∈ ℝ}]* = K* ∩ H_x, where H_x = {ℓ ∈ X′ : ℓ(x) = 0} is the hyperplane in X′ generated by x.
Lemma 10.1 For x ∈ K the following holds:
∂g(x) = co(⋃_{i∈I_x} ∂g_i(x)) − M*_x, where I_x = {i ∈ I : g_i(x) = g(x)}.
Proof: It is straightforward to show that
∂g(x) ⊇ co(⋃_{i∈I_x} ∂g_i(x)) − M*_x.
We have, by applying well-known rules of subdifferential calculus, that ∂g = co(⋃_i ∂g_i) − K*. Thus it is easy to establish the reverse inclusion, and so the result follows. □
We can now use Theorem 9.1 to study problem (P). By applying this result it follows that x* is a solution of (P) if and only if there are linear functionals ℓ, ℓ′, ℓ_i (i ∈ I_{x*}) and numbers a_i ≥ 0 (i ∈ I_{x*}) with Σ_i a_i = 1 such that
ℓ ≠ 0,  dℓ ∈ ∂f(x*),
ℓ_i ∈ ∂g_i(x*),  ℓ′ ∈ M*_{x*},
ℓ = −Σ_{i∈I_{x*}} a_i ℓ_i − ℓ′.  (10)
Let ℓ⁰ = dℓ. Then we can rewrite this condition in the following form: there are ℓ⁰ ∈ ∂f(x*), ℓ_i ∈ ∂g_i(x*) and numbers λ_i, λ⁰ such that
(∀x ∈ M_{x*})  (λ⁰ℓ⁰ + Σ_{i∈I_{x*}} λ_i ℓ_i)(x) ≤ 0,
where λ⁰ = 1/d, Σ_i λ_i = −1, λ_i ≤ 0. Let λ_i = 0 for i ∉ I_{x*}. Clearly we can consider the numbers λ⁰, λ_1, …, λ_n as Lagrange multipliers. Let us note that the condition above, with λ⁰ = 1/d where d is the value of the problem, is a necessary and sufficient condition for a global maximum.
11
Conclusion
Let us consider the classical extremal problem involving the minimization of a sublinear function or, equivalently, the maximization of a superlinear function over a convex set. We consider only the simplest cases for discussion purposes. Let K be a solid closed convex cone, f a continuous superlinear nonnegative function defined on K, and g a continuous sublinear nonnegative function defined on K. We are interested in the following problems:
P(c):  f(x) → max subject to g(x) ≤ c, x ∈ K;
Q(d):  g(x) → min subject to f(x) ≥ d, x ∈ K.
If d is the value of problem P(c), then c is the value of problem Q(d). Applying the separation theorem and Lemmas 3.1 and 5.1 it is easy to check that x* is a solution of either problem P(c) or problem Q(d) if and only if there is ℓ ≠ 0 such that
ℓ ∈ (1/d)∂̄f(x*) ∩ (1/c)∂g(x*).
The following table provides a summary of the results of this paper.

Problem | Objective | Constraint | Direction of constraint | Characterization of optimality: nonzero ℓ ∈ X′ in
max | subl | PH↑ | ≤ | (1/d)∂f(x*) ∩ (1/c)∂g(x*)
min | superl | PH° | ≥ | (1/d)∂̄f(x*) ∩ (1/c)∂̄g(x*)
max | PH° | superl | ≤ | (1/d)∂̄f(x*) ∩ (1/c)∂̄g(x*)
min | PH↑ | subl | ≥ | (1/d)∂f(x*) ∩ (1/c)∂g(x*)
max | superl | subl | ≤ | (1/d)∂̄f(x*) ∩ (1/c)∂g(x*)
min | subl | superl | ≥ | (1/d)∂f(x*) ∩ (1/c)∂̄g(x*)

(Here ∂ denotes the subdifferential and ∂̄ the superdifferential.)
In the above, 'subl' denotes sublinear; 'superl' denotes functions f in PH°(X) with superlinear restriction to the set K_f; PH↑ denotes PH↑(X); c (respectively d) is the value of the right-hand side of the constraint when g (respectively f) is the constraint function, and the value of the problem when g (respectively f) is the objective function. We can see that we must use subdifferentials for functions belonging to PH↑ (in particular sublinear functions) and superdifferentials for functions belonging to PH° (in particular functions which have superlinear restrictions). Let f be a continuous and positively homogeneous function defined on a solid cone K and x ∈ int K. We can apply both subdifferentials and superdifferentials in this case. The table shows that we must use subdifferentials if our function is the objective function under minimization or the constraint function under maximization. We must use superdifferentials if our function is the objective function under maximization or the constraint function under minimization.
12
Applications
Let X and Y be Banach spaces and A : X → Y a bounded linear operator. Let us consider the following extremal problem: ‖Ax‖_Y → max subject to ‖x‖_X ≤ 1. Clearly the value of this problem is equal to ‖A‖, the norm of the operator A. Let us denote
‖x‖_X = g(x),  ‖y‖_Y = p(y),  ‖Ax‖_Y = f(x).
Thus we have the extremal problem (P₁): f(x) → max subject to g(x) ≤ 1.
Here c = 1 and d = ‖A‖. Clearly ∂g = B*_X and ∂p = B*_Y, where B*_X and B*_Y denote the unit balls in the respective dual spaces X′ and Y′. Since f(x) = p(Ax) we have the following chain rule (see, for example, [1]): ∂f = A*∂p = A*B*_Y, where A* is the operator conjugate (adjoint) to A. Clearly ∂g(x) = {ℓ ∈ B*_X : ℓ(x) = ‖x‖_X} = {ℓ ∈ X′ : ‖ℓ‖ = 1, ℓ(x) = ‖x‖_X}. Let us compute ∂f(x). We have
∂f(x) = {ℓ ∈ A*(B*_Y) : ℓ(x) = ‖Ax‖_Y}.
Thus ℓ ∈ ∂f(x) if and only if there is an ℓ′ ∈ B*_Y such that ℓ = A*ℓ′ and ℓ(x) = p(Ax), or equivalently:
ℓ = A*ℓ′,  ℓ′(Ax) = (A*ℓ′)(x) = p(Ax),  ℓ′ ∈ ∂p(Ax).  (11)
The formulae (11) show that ℓ ∈ ∂f(x) if and only if ℓ = A*ℓ′ where ℓ′ ∈ ∂p(Ax), i.e.
∂f(x) = A*(∂p(Ax)).
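In the finite-dimensional Euclidean case the chain rule ∂f(x) = A*(∂p(Ax)) reduces, at points where Ax ≠ 0, to the gradient formula ∇‖Ax‖ = Aᵀ(Ax/‖Ax‖). A minimal numerical sketch (the random data are hypothetical, for illustration only) checks this against finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
x = rng.standard_normal(3)

f = lambda v: np.linalg.norm(A @ v)      # f(x) = p(Ax), p the Euclidean norm

# Chain rule: the gradient is A^T applied to the gradient of p at Ax.
grad = A.T @ (A @ x / np.linalg.norm(A @ x))

# Central finite-difference check of each partial derivative.
eps = 1e-6
fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(3)])
assert np.allclose(grad, fd, atol=1e-5)
```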
Thus we obtain the following result.
Theorem 12.1 Let X and Y be Banach spaces and A : X → Y a bounded linear operator. A point x* ∈ X has the properties
‖x*‖_X = 1,  ‖Ax*‖_Y = ‖A‖  (12)
if and only if there is an ℓ ≠ 0 such that
ℓ ∈ ∂g(x*) ∩ (1/‖A‖) A*(∂p(Ax*)).  (13)
Here g(x) = ‖x‖_X and p(y) = ‖y‖_Y. If both X and Y are Banach spaces with smooth norms then ∂g(x*) = {∇g(x*)} and ∂p(Ax*) = {∇p(Ax*)} (x* ≠ 0). Therefore (13) can be written in the following form:
∇g(x*) = (1/‖A‖) A*(∇p(Ax*)).  (14)
We can consider (14) as an equation for locating the element x*. Now let X and Y be Hilbert spaces. We have ∇g(x) = x/‖x‖_X for x ≠ 0, and ∇p(y) = y/‖y‖_Y for y ≠ 0. Thus equation (14) has the form (note that ‖x*‖_X = 1)
x* = (1/‖A‖) A*(Ax*/‖Ax*‖_Y),
or equivalently
‖A‖ ‖Ax*‖_Y x* = (A*A)(x*).  (15)
If x* is a point satisfying (15) then ‖Ax*‖_Y = ‖A‖, and we have the necessary condition for optimality: x* is an eigenvector of the operator A*A with the eigenvalue ‖A‖². Let us now show that this condition is sufficient for optimality. If ‖x*‖_X = 1 and ‖A‖²x* = (A*A)(x*) then
‖ ‖A‖²x* ‖_X = ‖A‖² = ‖(A*A)(x*)‖_X ≤ ‖A*‖ ‖Ax*‖_Y.
Since ‖A*‖ = ‖A‖ we have ‖A‖ ≤ ‖Ax*‖_Y; since ‖x*‖_X = 1 we also have ‖Ax*‖_Y ≤ ‖A‖, so ‖A‖ = ‖Ax*‖_Y. Thus we have the following result.
Theorem 12.2 Let X and Y be Hilbert spaces and A : X → Y a bounded linear operator. Then a point x* ∈ X has the properties (12) if and only if x* is an eigenvector of the self-adjoint operator A*A with eigenvalue ‖A‖².
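Theorem 12.2 can be illustrated numerically in the finite-dimensional Euclidean case, where ‖A‖ is the spectral norm and A*A = AᵀA. The data below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# Spectral decomposition of the self-adjoint operator A*A (here A.T @ A).
eigvals, eigvecs = np.linalg.eigh(A.T @ A)
x_star = eigvecs[:, -1]          # unit eigenvector for the largest eigenvalue
lam = eigvals[-1]                # largest eigenvalue of A*A

op_norm = np.linalg.norm(A, 2)   # operator (spectral) norm ||A||

assert np.isclose(lam, op_norm**2)                        # eigenvalue = ||A||^2
assert np.isclose(np.linalg.norm(A @ x_star), op_norm)    # norm attained at x*
```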
References
[1] J.-P. Aubin and I. Ekeland, Applied Nonlinear Analysis, Wiley, New York, 1984.
[2] A. Barbara and J.-P. Crouzeix, Concave gauge functions and applications, Zeitschrift für Operations Research 40 (1994) 43-74.
[3] G. Birkhoff, Lattice Theory, American Mathematical Society, Providence, R.I., 2nd edition, 1948.
[4] F. H. Clarke, Optimization and Nonsmooth Analysis, Wiley, New York, 1983.
[5] B. D. Craven, D. Ralph and B. M. Glover, Small convex-valued subdifferentials in mathematical programming, to appear in Optimization (1994).
[6] V. Demyanov and A. M. Rubinov, Introduction to Constructive Nonsmooth Analysis, to appear 1995.
[7] I. Ekeland and R. Temam, Convex Analysis and Variational Problems, North-Holland, Amsterdam, 1976.
[8] F. Giannessi, Semidifferentiable functions and necessary optimality conditions, Journal of Optimization Theory and Applications (1989) 191-241.
[9] J.-B. Hiriart-Urruty, From convex to nonconvex minimization: Necessary and sufficient conditions for global optimality, in Nonsmooth Optimization and Related Topics, Plenum, New York, (1990) 219-240.
[10] J.-B. Hiriart-Urruty and C. Lemarechal, Testing necessary and sufficient conditions for global optimality in the problem of maximizing a convex quadratic function over a convex polyhedron, Preliminary Report, University of Paul Sabatier, Toulouse, (1990).
[11] R. Horst and H. Tuy, Global Optimization, Springer-Verlag, Berlin, 1990.
[12] A. D. Ioffe and V. M. Tikhomirov, Theory of Extremal Problems, Nauka, Moscow, 1974.
[13] V. Jeyakumar and B. M. Glover, Nonlinear extensions of Farkas' lemma with applications to global optimization and least squares, to appear in Mathematics of Operations Research (1995).
[14] S. Komlosi and M. Pappalardo, A general scheme for first order approximations in optimization, Optimization Methods and Software 3 (1994) 143-152.
[15] M. I. Levin, V. L. Makarov and A. M. Rubinov, Mathematical Models of Economic Interaction, Nauka, Moscow (in Russian), 1993.
[16] P. Michel and J.-P. Penot, A generalized derivative for calm and stable functions, Differential and Integral Equations 5 (2) (1992) 433-454.
[17] M. Morishima, Equilibrium, Stability and Growth, Clarendon Press, Oxford, 1964.
[18] H. Nikaido, Convex Structures and Economic Theory, Academic Press, New York, 1968.
[19] P. M. Pardalos and J. B. Rosen, Constrained Global Optimization: Algorithms and Applications, Lecture Notes in Computer Science 268, Springer-Verlag, Berlin, 1987.
[20] R. T. Rockafellar, Convex Analysis, Princeton University Press, 1970.
[21] A. M. Rubinov, Differences of convex compact sets and their applications in nonsmooth analysis, in Nonsmooth Optimization: Methods and Applications, F. Giannessi (ed.), Gordon and Breach, Amsterdam, (1992) 379-391.
[22] A. M. Rubinov and A. Yagubov, The space of star-shaped sets and its applications in nonsmooth optimization, Mathematical Programming Study 29 (1986) 176-202.
[23] R. M. Solow and P. A. Samuelson, Balanced growth under constant returns to scale, Econometrica 20 (1953) 412-424.
[24] H. Tuy, D.C. optimization: theory, methods and algorithms, Hanoi Institute of Mathematics Preprint 1993, to appear in Handbook of Global Optimization, R. Horst and P. M. Pardalos (eds.), Kluwer, 1994.
Recent Advances in Nonsmooth Optimization, pp. 381-391 Eds. D.-Z. Du, L. Qi and R..S. Womersley ©1995 World Scientific Publishing Co Pte Ltd
On Regularized Duality In Convex Optimization
Andrzej Ruszczyński
International Institute for Applied Systems Analysis, 2361 Laxenburg, Austria
Abstract
The general convex programming problem, under constraint qualification, is shown to be equivalent to a non-zero sum game in which objectives of the players are obtained by partial regularization of the Lagrangian function. Based on that, a solution method is developed in which the players improve their decisions while anticipating the steps of their opponents. Convergence of the method is proved and application to decomposable problems is discussed.
1
Introduction
Let f : ℝⁿ → ℝ and g_i : ℝⁿ → ℝ, i = 1,…,m, be convex functions, let X ⊂ ℝⁿ be a convex closed set, and let b ∈ ℝᵐ. We consider the convex programming problem
min f(x)  (1)
g_i(x) ≤ b_i,  i = 1,…,m,  (2)
x ∈ X.  (3)
Associated with problem (1)-(3) is the Lagrangian L : ℝⁿ × ℝᵐ → ℝ defined as
L(x,y) = f(x) + Σ_{i=1}^m y_i (g_i(x) − b_i),  (4)
where y ∈ Y = ℝᵐ₊ is the vector of dual variables. Throughout this paper we shall assume that the following condition holds.
Constraint Qualification Condition. There exists x⁰ ∈ ri X such that g_i(x⁰) < b_i, i = 1,…,m.
It is well known (see, e.g., [12]) that the following proposition is true.
Proposition 1.1 Assume that the Constraint Qualification Condition is satisfied. Then a point x̂ ∈ X is a solution of (1)-(3) iff there exists ŷ ∈ Y such that the pair (x̂, ŷ) is a saddle point of the Lagrangian (4) on X × Y, i.e.
L(x̂, y) ≤ L(x̂, ŷ) ≤ L(x, ŷ),  ∀x ∈ X, ∀y ∈ Y.  (5)
This is the starting point of our considerations; we shall aim at developing a new approach to constrained nonsmooth optimization problems based on a saddle point procedure. There were many attempts to solve optimization problems via saddle point seeking methods; the simplest algorithm (see, e.g., [1]) has the form
x^{k+1} = Π_X(x^k − τ_k L_x(x^k, y^k)),
y^{k+1} = Π_Y(y^k + τ_k L_y(x^k, y^k)),  k = 1,2,…,
where L_x(x^k, y^k) and L_y(x^k, y^k) are some subgradients of L at (x^k, y^k) with respect to x and y, and Π_X(·) and Π_Y(·) denote orthogonal projections on X and Y, respectively. Such methods are convergent only under special conditions (like strict convexity-concavity) and with special stepsizes for primal and dual updates: τ_k → 0, Σ_{k=0}^∞ τ_k = ∞ (cf. [10]). One possibility to overcome these difficulties is the use of the proximal point method [9, 14]. Its idea is to replace (5) by a sequence of saddle-point problems for the regularized functions
Λ_k(ξ, η) = L(ξ, η) + (ρ/2)‖ξ − x^k‖² − (ρ/2)‖η − y^k‖².  (6)
A saddle point (ξ^k, η^k) of Λ_k is substituted for (x^{k+1}, y^{k+1}) at the next iteration, etc. A variation of this approach is the alternating direction method [5, 2]. We are going to develop an iterative method for (5) which does not have saddle-point subproblems. The key idea, which generalizes and simplifies the concept used for linear programming in a recent work [6], is to replace the regularized function (6) by two convex-concave functions: a primal and a dual one, and to make steps in x and in y using subgradients of these functions. We shall develop the basic concept in section 2, and in section 3 we describe the method. Next, in section 4 we prove its convergence to a saddle point of L. Finally, in section 5 we discuss the application of this approach to some convex optimization problems of special structure.
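For illustration, the simplest algorithm above can be sketched on a toy problem, min x² subject to x ≥ 1, so that L(x,y) = x² + y(1 − x) with X = ℝ, Y = ℝ₊ and saddle point (1, 2). This hypothetical example is strongly convex in x — one of the special cases in which the plain iteration does converge — and the constant stepsize is an illustrative choice:

```python
# Projected subgradient (here: gradient) saddle point iteration on
# L(x, y) = x^2 + y*(1 - x); saddle point (x*, y*) = (1, 2).
x, y = 0.0, 0.0
tau = 0.05                          # small constant step; L is smooth here
for k in range(2000):
    gx = 2 * x - y                  # L_x(x, y)
    gy = 1 - x                      # L_y(x, y)
    x = x - tau * gx                # Pi_X is the identity (X = R)
    y = max(0.0, y + tau * gy)      # Pi_Y projects onto R_+

assert abs(x - 1) < 1e-3 and abs(y - 2) < 1e-3
```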
For a convex set X ⊂ ℝⁿ, the cone of feasible directions at x ∈ X is denoted by K_X(x) = {d ∈ ℝⁿ : ∃(τ > 0) x + τd ∈ X}. The conjugate (negative of the polar) of a cone K ⊂ ℝⁿ is defined to be K* = {d ∈ ℝⁿ : ∀(z ∈ K) ⟨d, z⟩ ≥ 0}. For a convex-concave function L : ℝⁿ × ℝᵐ → ℝ we use ∂_x L(x,y) and ∂_y L(x,y) to denote its subdifferentials with respect to x and y. Elements of these subdifferentials (subgradients) will be denoted by L_x(x,y) and L_y(x,y).
2
Regularized Duality
Let us define a non-zero sum game with two players: P and D. The objective of P is to minimize in the variables x ∈ X the regularized primal function:
P(x,y) = max_{η∈Y} [L(x,η) − (ρ/2)‖η − y‖²],  (7)
where ρ > 0 is some parameter. The objective of D is to maximize with respect to the variables y ∈ Y the regularized dual function:
D(x,y) = min_{ξ∈X} [L(ξ,y) + (ρ/2)‖ξ − x‖²].  (8)
A Nash equilibrium of the game is defined as a point (x̄, ȳ) ∈ X × Y such that
x̄ ∈ argmin_{x∈X} P(x, ȳ),  (9)
and
ȳ ∈ argmax_{y∈Y} D(x̄, y).  (10)
We define the proximal mappings ξ(x,y) and η(x,y) as the solutions of the subproblems in (8) and (7), respectively. We also introduce the error function
Δ(x,y) = ‖ξ(x,y) − x‖² + ‖η(x,y) − y‖²,
and the regularized duality gap
E(x,y) = L(x, η(x,y)) − L(ξ(x,y), y).
They satisfy the following relations.
Lemma 2.1 For all x ∈ X and y ∈ Y,
E(x,y) ≥ ρ Δ(x,y).
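Lemma 2.1 can be checked numerically on a toy affine Lagrangian over nonnegative orthants, for which both proximal mappings have componentwise closed forms; all data below are hypothetical:

```python
import numpy as np

# L(x, y) = c^T x + y^T (A x - b) on X = R^n_+, Y = R^m_+.
rng = np.random.default_rng(2)
n, m, rho = 3, 2, 0.5
c, b = rng.standard_normal(n), rng.standard_normal(m)
A = rng.standard_normal((m, n))
L = lambda x, y: c @ x + y @ (A @ x - b)

for _ in range(100):
    x, y = np.abs(rng.standard_normal(n)), np.abs(rng.standard_normal(m))
    xi = np.maximum(0.0, x - (c + A.T @ y) / rho)    # xi(x, y), argmin of (8)
    eta = np.maximum(0.0, y + (A @ x - b) / rho)     # eta(x, y), argmax of (7)
    E = L(x, eta) - L(xi, y)                         # regularized duality gap
    delta = np.sum((xi - x) ** 2) + np.sum((eta - y) ** 2)
    assert E >= rho * delta - 1e-9                   # E(x, y) >= rho * Delta(x, y)
```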
Proof. By the definition of ξ = ξ(x,y), there exists a subgradient L_x(ξ,y) such that
L_x(ξ,y) + ρ(ξ − x) ∈ K*_X(ξ).  (11)
As x − ξ ∈ K_X(ξ), we have
L(x,y) − L(ξ,y) ≥ ⟨L_x(ξ,y), x − ξ⟩ ≥ ρ‖ξ − x‖².
In a symmetric way, from the definition of η = η(x,y) it follows that
L(x,y) − L(x,η) ≤ ⟨L_y(x,η), y − η⟩ ≤ −ρ‖η − y‖².
Subtracting the last two inequalities, we obtain the required result. □
We can now prove the equivalence of (5) and our game.
Theorem 2.2 The following statements are equivalent:
(a) (x̄, ȳ) is a Nash equilibrium of the game (9)-(10);
(b) E(x̄, ȳ) = 0;
(c) Δ(x̄, ȳ) = 0;
(d) (x̄, ȳ) is a saddle point of L over X × Y.
Proof. We denote ξ̄ = ξ(x̄, ȳ) and η̄ = η(x̄, ȳ).
(a)⇒(b). Since ρ > 0, the function η(x,y) is continuous. Therefore ∂_x P(x̄, ȳ) = ∂_x L(x̄, η(x̄, ȳ)). Using this equality in the optimality conditions for (9), we deduce that there exists a subgradient L_x(x̄, η̄) ∈ K*_X(x̄). Thus
L(ξ̄, η̄) − L(x̄, η̄) ≥ ⟨L_x(x̄, η̄), ξ̄ − x̄⟩ ≥ 0.
Analogously, the optimality conditions for (10) yield −L_y(ξ̄, ȳ) ∈ K*_Y(ȳ) for some subgradient L_y(ξ̄, ȳ), so
L(ξ̄, ȳ) − L(ξ̄, η̄) ≥ 0.
Adding the last two inequalities we obtain E(x̄, ȳ) ≤ 0. Since E(x,y) is always non-negative, (b) follows.
(b)⇒(c). The result follows immediately from Lemma 2.1.
(c)⇒(d). Since Δ(x̄, ȳ) = 0, one has ξ̄ = x̄ and η̄ = ȳ. By (11), L_x(x̄, ȳ) ∈ K*_X(x̄) for some L_x(x̄, ȳ). This is equivalent to the right inequality in (5). Similarly, −L_y(x̄, ȳ) ∈ K*_Y(ȳ) for some L_y(x̄, ȳ), which completes the proof of (d).
(d)⇒(a). The left inequality in (5) implies
L(x̄, ȳ) = max_{η∈Y} L(x̄, η) = max_{η∈Y} [L(x̄, η) − (ρ/2)‖η − ȳ‖²] = P(x̄, ȳ).
On the other hand, for every x ∈ X, from the right inequality in (5) we get
L(x̄, ȳ) ≤ L(x, ȳ) ≤ max_{η∈Y} [L(x, η) − (ρ/2)‖η − ȳ‖²] = P(x, ȳ).
Consequently, P(x̄, ȳ) ≤ P(x, ȳ) for all x ∈ X. In the same manner we prove D(x̄, ȳ) ≥ D(x̄, y) for all y ∈ Y. □
In convex programming the regularized Lagrangian functions take on well-known forms. The regularized primal function is the augmented Lagrangian (cf. [13]) for (1)-(3):
P(x,y) = f(x) + (1/(2ρ)) Σ_{i=1}^m [max(0, g_i(x) − b_i + ρy_i)]² − (ρ/2) Σ_{i=1}^m y_i².  (12)
The regularized dual function is the augmented Lagrangian for the dual problem,
D(x,y) = min_{ξ∈X} [f(ξ) + Σ_{i=1}^m y_i (g_i(ξ) − b_i) + (ρ/2)‖ξ − x‖²].  (13)
Consequently, Theorem 2.2 shows that a solution of the convex programming problem and the associated dual vector can be obtained as a Nash equilibrium of a game in which augmented Lagrangian functions serve as players' objectives. It appears to be a step backwards: games are generally more difficult than optimization problems, but our game exhibits regularities that can be exploited by the solution procedure.
3
The Partial Regularization Method
Let us now describe in detail a method for finding a saddle point of L. It is, in fact, a subgradient algorithm for solving the game (9)-(10). It can also be interpreted as a method operating on the original saddle problem in which both players try to predict the moves of their opponents to calculate the best response.
Initialization. Choose x⁰ ∈ X, y⁰ ∈ Y and γ ∈ (0,2). Set k = 0.
Prediction. Calculate η^k = η(x^k, y^k) and ξ^k = ξ(x^k, y^k).
Stopping test. If Δ(x^k, y^k) = 0, then stop.
Direction finding. Find subgradients L_x(x^k, η^k) and L_y(ξ^k, y^k) and define
d_x^k = Π_{C_X^k}(−L_x(x^k, η^k)),  d_y^k = Π_{C_Y^k}(L_y(ξ^k, y^k)),
where C_X^k and C_Y^k are closed convex cones such that C_X^k ⊇ K_X(x^k) and C_Y^k ⊇ K_Y(y^k).
Stepsize calculation. Determine
τ_k = γ E(x^k, y^k)/‖d^k‖²,  where d^k = (d_x^k, d_y^k).  (14)
Step. Update the points
x^{k+1} = Π_X(x^k + τ_k d_x^k),  y^{k+1} = Π_Y(y^k + τ_k d_y^k),
increase k by one and go to Prediction.
Let us stress that, contrary to proximal point methods, our approach does not have saddle point subproblems. Instead of them, two auxiliary optimization problems are solved at the prediction step. Our method resembles in some way the extragradient method of [8], but our prediction step uses proximal operators, not just a linear Jacobi step. Owing to that, we can solve nonsmooth problems. We also have a constructive stepsize rule. It should be stressed that projections on C_X^k and C_Y^k are optional; we can always use C_X^k = ℝⁿ and C_Y^k = ℝᵐ. Still, the use of C_X^k = cl K_X(x^k), C_Y^k = cl K_Y(y^k) is easy in some classes of problems (like polyhedral ones) and yields larger stepsizes, because removal of the normal component may substantially decrease direction lengths.
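The method can be sketched on a small linear program (hypothetical data: min 2x₁ + x₂ subject to x₁ + x₂ ≥ 1, x ≥ 0, with unique saddle point x* = (0,1), y* = 1). For an affine Lagrangian both proximal mappings have closed forms, and we take C_X^k = ℝⁿ, C_Y^k = ℝᵐ:

```python
import numpy as np

c = np.array([2.0, 1.0])
A = np.array([[-1.0, -1.0]])       # constraint written as A x <= b
b = np.array([-1.0])

rho, gamma = 1.0, 1.0
x, y = np.zeros(2), np.zeros(1)
L = lambda x, y: c @ x + y @ (A @ x - b)   # Lagrangian (4)

for k in range(5000):
    # Prediction step: both proximal mappings have closed forms here.
    eta = np.maximum(0.0, y + (A @ x - b) / rho)     # eta(x, y)
    xi = np.maximum(0.0, x - (c + A.T @ y) / rho)    # xi(x, y)
    E = L(x, eta) - L(xi, y)                         # regularized duality gap
    if E < 1e-12:
        break
    # Direction finding (full-space cones) and stepsize (14).
    dx = -(c + A.T @ eta)
    dy = A @ xi - b
    tau = gamma * E / (dx @ dx + dy @ dy)
    x = np.maximum(0.0, x + tau * dx)
    y = np.maximum(0.0, y + tau * dy)
# Iterates converge toward x* = (0, 1), y* = 1.
```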
4
Convergence
To avoid obscuring the main idea, we shall now prove convergence of the method in its basic form, presented in the previous section. Various modifications and extensions will be discussed after the proof.
Theorem 4.1 Assume that a saddle point of L on X × Y exists. Then the method generates a sequence {(x^k, y^k)} convergent to a saddle point of L on X × Y.
Proof. Let (x*, y*) be a saddle point of L on X × Y. We define
W_k = ‖x^k − x*‖² + ‖y^k − y*‖².  (15)
Our proof uses the general line of argument developed for iterative methods based on abstract Fejér mappings (see Eremin and Astafiev [3] and Polyak [11]). We shall prove that our algorithmic mapping decreases the distance W_k whenever (x^k, y^k) is not a solution.
At first we establish a descent property of the directions (d_x^k, d_y^k). Using the formula h = Π_C(h) + Π_{−C*}(h), which holds for any closed convex cone C, with h = −L_x(x^k, η^k) and C = C_X^k, we obtain
−L_x(x^k, η^k) = d_x^k + Π_{−(C_X^k)*}(−L_x(x^k, η^k)).
Multiplying both sides of this equation by x* − x^k ∈ K_X(x^k) ⊆ C_X^k we get the inequality
⟨d_x^k, x* − x^k⟩ ≥ ⟨L_x(x^k, η^k), x^k − x*⟩ ≥ L(x^k, η^k) − L(x*, η^k).
Likewise,
⟨d_y^k, y* − y^k⟩ ≥ L(ξ^k, y*) − L(ξ^k, y^k).
By the saddle point conditions (5),
L(ξ^k, y*) ≥ L(x*, η^k).
Adding the last three inequalities we obtain:
⟨d_x^k, x* − x^k⟩ + ⟨d_y^k, y* − y^k⟩ ≥ L(x^k, η^k) − L(ξ^k, y^k) = E(x^k, y^k).  (16)
This implies, in particular, that d^k = (d_x^k, d_y^k) ≠ 0, since otherwise one would have E(x^k, y^k) = 0 and, by Theorem 2.2, the algorithm would stop. Therefore the stepsize (14) is well defined. Since the projection on X is non-expansive,
‖x^{k+1} − x*‖² ≤ ‖x^k + τ_k d_x^k − x*‖² = ‖x^k − x*‖² + 2τ_k⟨d_x^k, x^k − x*⟩ + τ_k²‖d_x^k‖².
In a similar way,
‖y^{k+1} − y*‖² ≤ ‖y^k − y*‖² + 2τ_k⟨d_y^k, y^k − y*⟩ + τ_k²‖d_y^k‖².
Adding the last two inequalities and using (16) we conclude that
W_{k+1} ≤ W_k − 2τ_k E_k + τ_k²‖d^k‖²,  (17)
with E_k = E(x^k, y^k). Substituting (14) we get
W_{k+1} ≤ W_k − γ(2 − γ) E_k²/‖d^k‖².  (18)
Thus the sequence {W_k} is non-increasing and
lim_{k→∞} E_k²/‖d^k‖² = 0.  (19)
Since W_k is bounded, the sequence {(x^k, y^k)} has an accumulation point (x̃, ỹ). Thus {d^k} is bounded and, by (19), lim_{k→∞} E_k = 0. Therefore E(x̃, ỹ) = 0. By Theorem 2.2, (x̃, ỹ) is a saddle point of L and we can use it instead of (x*, y*) in (15). Then, from (18) we see that the distance to (x̃, ỹ) is non-increasing. Consequently, (x̃, ỹ) is the only accumulation point of the sequence {(x^k, y^k)}. □
It is clear from the proof that we may replace the stepsize rule (14) with the more flexible requirement
λρΔ_k/‖d^k‖² ≤ τ_k ≤ γ(L(x^k, η^k) − L(ξ^k, y^k))/‖d^k‖²,
with Δ_k = Δ(x^k, y^k) and 0 < λ ≤ γ < 2. Indeed, (17) implies
W_{k+1} ≤ W_k − λ(2 − γ)ρ² Δ_k²/‖d^k‖².  (20)
The rest of the proof is the same, but with Δ_k instead of E_k. We can also have iteration-dependent parameters 0 < ρ_k ≤ ρ and 0 < λ_k ≤ γ_k < 2, provided that Σ_{k=0}^∞ λ_k(2 − γ_k)ρ_k² = ∞, because (20) still implies lim inf_{k→∞} Δ_k = 0. Finally, it should be stressed that instead of proximal operators in the prediction steps we can use more general mappings with similar properties (see [7] for how to modify the proofs in this case). We chose to present the idea with the use of quadratic regularizations just for simplicity, to avoid obscuring it with technical details.
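The per-iteration decrease (18) can be spot-checked numerically from random points on a toy linear program with a known saddle point; all data below are hypothetical:

```python
import numpy as np

# min 2*x1 + x2 s.t. x1 + x2 >= 1, x >= 0; saddle point x* = (0, 1), y* = 1.
rng = np.random.default_rng(3)
c, A, b = np.array([2.0, 1.0]), np.array([[-1.0, -1.0]]), np.array([-1.0])
xs, ys = np.array([0.0, 1.0]), np.array([1.0])
rho, gamma = 1.0, 0.5
L = lambda x, y: c @ x + y @ (A @ x - b)

for _ in range(100):
    x, y = np.abs(rng.standard_normal(2)), np.abs(rng.standard_normal(1))
    eta = np.maximum(0.0, y + (A @ x - b) / rho)     # prediction step
    xi = np.maximum(0.0, x - (c + A.T @ y) / rho)
    E = L(x, eta) - L(xi, y)
    dx, dy = -(c + A.T @ eta), A @ xi - b            # directions, full-space cones
    nd2 = dx @ dx + dy @ dy
    tau = gamma * E / nd2                            # stepsize (14)
    x1, y1 = np.maximum(0.0, x + tau * dx), np.maximum(0.0, y + tau * dy)
    W0 = np.sum((x - xs) ** 2) + np.sum((y - ys) ** 2)
    W1 = np.sum((x1 - xs) ** 2) + np.sum((y1 - ys) ** 2)
    assert W1 <= W0 - gamma * (2 - gamma) * E**2 / nd2 + 1e-9   # inequality (18)
```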
5
Application to Decomposable Problems
Let us now consider decomposable problems of the form
min Σ_{j=1}^n f_j(x_j)  (21)
Σ_{j=1}^n g_{ij}(x_j) ≤ b_i,  i = 1,…,m,  (22)
x_j ∈ X_j,  j = 1,…,n.  (23)
We assume that the functions f_j and g_{ij} are convex and the sets X_j are convex and closed. As usual, we introduce multipliers y ∈ ℝᵐ₊ and the Lagrangian
L(x,y) = Σ_{j=1}^n f_j(x_j) + Σ_{i=1}^m y_i (Σ_{j=1}^n g_{ij}(x_j) − b_i).
Our method, when applied to this problem, takes a rather simple form.
Indeed, the prediction step in the dual variables can be carried out analytically, separately for each constraint:
η_i(x, y_i) = max(0, (1/ρ)(Σ_{j=1}^n g_{ij}(x_j) − b_i) + y_i),  i = 1,…,m.  (24)
The resulting regularized primal function (12) is the augmented Lagrangian for (21)-(23):
P(x,y) = Σ_{j=1}^n f_j(x_j) + (1/(2ρ)) Σ_{i=1}^m [max(0, Σ_{j=1}^n g_{ij}(x_j) − b_i + ρy_i)]² − (ρ/2) Σ_{i=1}^m y_i².
Consequently, the update of the primal variables is a projected subgradient step for the augmented Lagrangian function. It is clearly decomposable. Note that in a related work [15] of ours, we used here a whole sequence of nonlinear Jacobi-type steps. The dual function (13) takes on the additive form
D(x,y) = Σ_{j=1}^n D_j(x_j, y) − bᵀy
with
D_j(x_j, y) = min_{ξ_j∈X_j} [f_j(ξ_j) + Σ_{i=1}^m y_i g_{ij}(ξ_j) + (ρ/2)‖ξ_j − x_j‖²],  j = 1,…,n.  (25)
The minimizers ξ_j^k are used in the dual update, which is just an under-relaxed step of the multiplier method, very similar to (24):
y_i^{k+1} = max(0, τ_k(Σ_{j=1}^n g_{ij}(ξ_j^k) − b_i) + y_i^k),  i = 1,…,m.
In some cases, subproblems (25) can be quite easy to solve. The simplest example is the standard linear programming problem with f_j(x_j) = c_j x_j, g_{ij}(x_j) = a_{ij} x_j, and X_j = [l_j, u_j]. Then (25) has a closed-form solution, which can be calculated in parallel for each j = 1,…,n. It is worth noting that the regularized dual function D(x,y) becomes the augmented Lagrangian function for the dual problem. Properties of our method in the case of linear programming are analyzed in detail in [6], with limit properties of the stepsizes τ_k, with the analysis of the rate of convergence, and with some numerical results. In fact, the highly encouraging properties discovered in [6] and analysed in a series of papers [16], [7] and [4] motivated the research reported in the present paper.
Acknowledgement. The author is greatly indebted to Markku Kallio, earlier cooperation with whom provided an impulse for this work. Thanks are also offered to Sjur Flam for many helpful comments.
References
[1] K. J. Arrow, L. Hurwicz and H. Uzawa, Studies in Linear and Nonlinear Programming (Stanford University Press, Stanford, 1958).
[2] J. Eckstein and D. P. Bertsekas, On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators, Mathematical Programming 55 (1992) 293-318.
[3] I. I. Eremin and N. N. Astafiev, Introduction to the Theory of Linear and Convex Programming (Nauka, Moscow, 1976).
[4] S. D. Flam and A. Ruszczynski, Noncooperative convex games: computing equilibrium by partial regularization, working paper WP-94-42, IIASA, Laxenburg, 1994.
[5] D. Gabay, Application de la methode des multiplicateurs aux inequations variationelles, in: M. Fortin and R. Glowinski (eds.), Methodes de Lagrangien Augmente (Dunod, Paris, 1982) 279-307.
[6] M. Kallio and A. Ruszczynski, Parallel solution of linear programs via Nash equilibria, working paper WP-94-15, IIASA, Laxenburg, 1994.
[7] M. Kallio and A. Ruszczynski, Perturbation methods for saddle point computation, working paper WP-94-38, IIASA, Laxenburg, 1994.
[8] G. M. Korpelevich, The extragradient method for finding saddle points and other problems, Ekonomika i Matematicheskie Metody 12 (1976) 747-756.
[9] B. Martinet, Regularisation d'inequations variationelles par approximations successives, Rev. Francaise Inf. Rech. Oper. 4 (1970) 154-159.
[10] A. S. Nemirovski and D. B. Yudin, Cesaro convergence of the gradient method for approximation of saddle points of convex-concave functions, Doklady AN SSSR 239 (1978) 1056-1059.
[11] B. T. Polyak, Minimization of nonsmooth functionals, Zhurnal Vychislitelnoi Matematiki i Matematicheskoi Fiziki 9 (1969) 509-521.
[12] R. T. Rockafellar, Convex Analysis (Princeton University Press, Princeton, 1970). [13] R. T. Rockafellar, Augmented Lagrangians and applications of the proximal point algorithm in convex programming, Mathematics of Operations Research 1 (1976) 97-116.
[14] R. T. Rockafellar, Monotone operators and the proximal point algorithm, SIAM Journal on Control and Optimization 14 (1976) 877-898.
[15] A. Ruszczynski, Augmented Lagrangian decomposition for sparse convex optimization, working paper WP-92-75, IIASA, Laxenburg, 1992 (to appear in Mathematics of Operations Research).
[16] A. Ruszczynski, A partial regularization method for saddle point seeking, working paper WP-94-20, IIASA, Laxenburg, 1994.
Recent Advances in Nonsmooth Optimization, pp. 392-404 Eds. D.-Z. Du, L. Qi and R.S. Womersley ©1995 World Scientific Publishing Co Pte Ltd
An Interior Point Method for Solving a Class of Linear-Quadratic Stochastic Programming Problems*
Jie Sun, Kwan Eng Wee and Jishan Zhu
Department of Decision Sciences, National University of Singapore, 10 Kent Ridge Crescent, 0511, Singapore
Abstract
The quadratically convergent polynomial algorithm of Ye and Anstreicher is suggested for solving a class of two-stage stochastic programs in which both the present cost function and the recourse problem are linear-quadratic. Such stochastic programs, although nonsmooth in nature, can be reduced to a linear complementarity problem with a special structure. The proposed algorithm takes advantage of this structure and performs well in computational tests.
1
Introduction
An important source of nonsmooth optimization problems is stochastic programming. A two-stage stochastic programming model can be briefly formulated as follows. At the first (current) stage, a decision x ∈ ℝⁿ has to be made, incurring a direct cost φ(x), subject to x ∈ X, where X is a closed set. At the second (future) stage, a random event is observed with outcome ω ∈ Ω, where Ω is a probability space. The decision x and the outcome ω then determine an additional cost ψ_ω(x). Our task is to make the
*This research is partially supported by grants RP-920068 and RP-930033 of the National University of Singapore.
best decision i "here and now" with respect to present cost and constraints as well as the expected cost Ea[il>u(x)] and certain induced constrains. The mathematical form of this model is: minimize <j>(x) + 23„[^w(a:)], subject to x e X.
(1.1)
The most widely used case arises when $X$ is the nonnegative orthant (i.e. the set $R^n_+ = \{x \in R^n \mid x_j \ge 0,\ j = 1,\dots,n\}$), the present cost is quadratic, $\phi(x) = c^T x + \frac{1}{2}x^T P x$, and the recourse cost has the form
$$\psi_\omega(x) = \sup_{z_\omega \in Z_\omega}\{(h_\omega - T_\omega x)^T z_\omega - \tfrac{1}{2} z_\omega^T H_\omega z_\omega\},$$
(1.2)
where the superscript $T$ represents transpose, $H_\omega$ is symmetric and positive semidefinite, and the vectors $h_\omega$, the matrices $T_\omega$ and $H_\omega$, and the sets $Z_\omega$ are in principle allowed to depend on $\omega$, although a particular application might not involve quite so much generality.
In many practical situations that motivate our model the recourse function is actually a penalty function of the vector $h_\omega - T_\omega x$. For example, in the case of stochastic programming with simple recourse [9], which includes the stochastic transshipment problem [8] as a special case, $\psi_\omega(x)$ is a piecewise linear function of $h_\omega - T_\omega x$. Therefore one might ask why we do not use a more explicit formula than the supremum function (1.2) to designate $\psi_\omega(x)$. It turns out that the formula (1.2) can cover many penalty functions so far used in practice and that it provides a clean duality framework. See the fundamental papers of Rockafellar and Wets [4], [6] for details. Under this framework, problem (1.1) is equivalent to a saddle point problem and it is this saddle point form that opens a door to possible interior point methods for problem (1.1). We may re-write the linear-quadratic case of problem (1.1), according to the discussion above, in the following form:
$$\text{minimize } c^T x + \frac{1}{2}x^T P x + \sum_{\omega\in\Omega} \pi_\omega \psi_\omega(x), \quad \text{subject to } x \in X,$$
(1.3)
where $\pi_\omega$ is the probability of the random event $\omega \in \Omega$. It has been shown that $\psi_\omega(x)$ is in general a convex piecewise linear-quadratic function for every fixed $\omega$, in the sense that the function is convex and its domain is a union of convex polyhedra, on each of which the function is given by a quadratic or an affine formula. Therefore, the function is nonsmooth and problem (1.3) is a nonsmooth convex piecewise quadratic program [4].
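For intuition, the piecewise linear-quadratic character of $\psi_\omega$ can be seen in a tiny separable instance: if $Z_\omega$ is a box $[0, u]^k$ and $H_\omega$ is diagonal and positive, the supremum in (1.2) splits coordinate-wise and has a closed form. The following numpy sketch is our own illustration with hypothetical data, not code from the paper:

```python
import numpy as np

def psi(x, h, T, H_diag, u):
    """psi_omega(x) = sup_{0 <= z <= u} (h - T x)' z - 0.5 z' diag(H_diag) z.
    Each coordinate term r*z - 0.5*H*z^2 is maximized at z = clip(r/H, 0, u)."""
    r = h - T @ x
    z = np.clip(r / H_diag, 0.0, u)
    return float(r @ z - 0.5 * z @ (H_diag * z))

# On each polyhedral piece (r <= 0, 0 <= r <= H*u, r >= H*u) the value is
# 0, r^2/(2H), or r*u - 0.5*H*u^2: affine or quadratic, hence nonsmooth overall.
```

The three branches make the "union of convex polyhedra" structure of the text concrete.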
Let
$$\mathcal{Y} = \prod_{\omega\in\Omega} Z_\omega, \qquad b = \begin{pmatrix} \vdots \\ \pi_\omega h_\omega \\ \vdots \end{pmatrix}, \qquad A = \begin{pmatrix} \vdots \\ \pi_\omega T_\omega \\ \vdots \end{pmatrix}, \qquad Q = \mathrm{diag}(\cdots, \pi_\omega H_\omega, \cdots).$$
Then problem (1.3) can be further generalized to the following form:
$$\text{minimize}_{x\in\mathcal{X}}\ f(x) = c^T x + \frac{1}{2}x^T P x + \sup_{y\in\mathcal{Y}}\{(b - Ax)^T y - \tfrac{1}{2} y^T Q y\},$$
(1.4)
where $\mathcal{X} = R^n_+$ and $\mathcal{Y} = R^m_+$ are nonnegative orthants, $x, c \in R^n$, $y, b \in R^m$, $m = \sum_\omega |Z_\omega|$ ($|Z_\omega|$ is the dimension of $z_\omega$), $A \in R^{m\times n}$, $P \in R^{n\times n}$, $Q \in R^{m\times m}$, and the matrices $P$ and $Q$ are symmetric positive semidefinite.
It should be noted that the assumption of $\mathcal{X}$ being the nonnegative orthant is not a serious restriction. For example, if $X = \{x \in R^n \mid Cx = d,\ x \ge 0\}$, where $C \in R^{l\times n}$ and $d \in R^l$, then we introduce additional vectors $y^1 \ge 0$ and $y^2 \ge 0$, and put an additional term
$$\sup_{y^1, y^2 \in R^l_+} \{(d - Cx)^T(y^1 - y^2)\}$$
in $f(x)$. The new term produces an infinite penalty for the violation of $Cx = d$. By redefining $\mathcal{Y} = \mathcal{Y} \times R^l_+ \times R^l_+$ and redefining $b$, $A$, and $Q$ appropriately, we obtain an equivalent problem on $\mathcal{X} \times \mathcal{Y}$ with $\mathcal{X} = R^n_+$. Therefore, as a preliminary study, we concentrate on the case of $\mathcal{X}$ and $\mathcal{Y}$ being nonnegative orthants in this paper.
Several methods have been proposed to solve problem (1.4), including the finite generation method of Rockafellar and Wets [6], the projected gradient method of Zhu and Rockafellar [13], the steepest descent method of Zhu [12] and the infeasible interior point method of Wright and Ralph [10]. For the special case where both matrices $P$ and $Q$ are diagonal and both $\mathcal{X}$ and $\mathcal{Y}$ are boxes, a simplex-active-set method has been developed [5]. In this paper we explain how the recent predictor-corrector algorithm of Ye and Anstreicher [11] can be applied to problem (1.4). Unlike other algorithms, this algorithm possesses polynomial complexity and a local quadratic convergence rate. Our computational results show that the algorithm is efficient and that the total number of iterations increases insignificantly as the dimension of both the primal and dual problems increases. In Section 2 of this paper we estimate the error of the iterates and state global and local convergence results. Then we study a variant of this method that incorporates the special structure of (1.3), present preliminary results of our computational tests, and conclude the paper in Section 3.
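The role of the extra supremum term can be checked numerically: choosing $y^1$ and $y^2$ proportional to the positive and negative parts of the residual $r = d - Cx$ is always feasible and yields the value $t\|r\|^2$, which is $0$ exactly when $Cx = d$ and grows without bound in $t$ otherwise. A small sketch (our own, with hypothetical data):

```python
import numpy as np

def penalty_value(C, d, x, t):
    """Value of (d - Cx)'(y1 - y2) at y1 = t*max(r,0), y2 = t*max(-r,0),
    which are feasible for any t >= 0; since y1 - y2 = t*r the value is
    t*||d - Cx||^2."""
    r = d - C @ x
    y1, y2 = t * np.maximum(r, 0.0), t * np.maximum(-r, 0.0)
    return float(r @ (y1 - y2))

C = np.array([[1.0, 1.0]])
d = np.array([2.0])
print(penalty_value(C, d, np.array([1.0, 1.0]), 1e6))  # 0.0: Cx = d holds
print(penalty_value(C, d, np.array([0.0, 0.0]), 1e6))  # 4000000.0: unbounded in t
```

Letting $t \to \infty$ shows the supremum is the desired infinite penalty.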
2 The Algorithm and its Convergence Properties
Problem (1.4) has a symmetric dual problem:
$$\text{maximize}_{y\in\mathcal{Y}}\ g(y) = b^T y - \frac{1}{2}y^T Q y - \sup_{x\in\mathcal{X}}\{(A^T y - c)^T x - \frac{1}{2}x^T P x\}.$$
(2.1)
The corresponding saddle function is:
$$l(x,y) = c^T x + \frac{1}{2}x^T P x + b^T y - \frac{1}{2}y^T Q y - y^T A x.$$
(2.2)
It can be seen that the dual problem is large-scale, since it is meant to specify decisions $z_\omega$ with respect to all possible realizations of the random event $\omega$. However, the primal vector $x$, together with the matrix $P$, is likely to be of ordinary size. A strong duality theorem has been established for problems (1.4), (2.1), and (2.2); see [6] and [4]. It states as follows.
The Strong Duality Theorem If both (1.4) and (2.1) are feasible (i.e. there exist $x \in \mathcal{X}$ and $y \in \mathcal{Y}$ such that $f(x) < \infty$ and $g(y) > -\infty$), then both problems have finite optimal values and optimal solutions. In addition, the primal optimal solution $x^*$ and the dual optimal solution $y^*$ form a saddle point $(x^*, y^*)$ of (2.2), and the value $l(x^*, y^*)$ is the common optimal value of (1.4) and (2.1).
According to this theorem, in order to find an optimal solution of (1.4) or (2.1), we only have to find a saddle point of (2.2). Since $l(x,y)$ is a convex-concave function, the necessary and sufficient conditions for $(x,y)$ to be a saddle point of (2.2) on $\mathcal{X}\times\mathcal{Y}$ are
$$-\nabla_x l(x,y) \in N_{\mathcal{X}}(x), \qquad \nabla_y l(x,y) \in N_{\mathcal{Y}}(y),$$
(2.3)
where $N_{\mathcal{X}}(x)$ stands for the normal cone of $\mathcal{X}$ at $x$ and $N_{\mathcal{Y}}(y)$ has a similar meaning. The condition can be equivalently translated into an equation-inequality system
$$Px - A^T y - w = -c$$
$$Ax + Qy - s = b$$
$$w^T x = 0, \quad s^T y = 0, \quad x, w, y, s \ge 0.$$
(2.4)
This is a linear complementarity problem of a special form. Our task is to select a specific interior point method that is suitable for the structure of problem (1.3). To apply an interior point method to problem (2.4), we need an additional assumption.
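The special form can be made explicit by stacking $z = (x, y)$: the first two equations of (2.4) read $(w, s) = Mz + q$ with $M = \begin{pmatrix} P & -A^T \\ A & Q \end{pmatrix}$ and $q = (c, -b)$. The skew off-diagonal blocks cancel in the symmetric part, which is $\mathrm{diag}(P, Q) \succeq 0$, so (2.4) is a monotone LCP. A quick numerical confirmation on random data (our own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 4
B = rng.standard_normal((n, n)); P = B @ B.T       # symmetric PSD
C = rng.standard_normal((m, m)); Q = C @ C.T       # symmetric PSD
A = rng.standard_normal((m, n))

M = np.block([[P, -A.T], [A, Q]])                  # LCP matrix of (2.4)
sym = 0.5 * (M + M.T)                              # the A-blocks cancel here
print(np.linalg.eigvalsh(sym).min() >= -1e-10)     # True: M is monotone
```

Monotonicity is what makes interior point methods applicable to (2.4).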
Assumption 2.0 Problem (2.4) has an interior feasible solution. That is, there is a quadruple $(x, y, s, w) > 0$ such that the first two equations of (2.4) are satisfied.
Under this assumption, problem (2.2), and thus problem (2.4), will have a solution, as shown in the following proposition.
Proposition 2.1 Under Assumption 2.0, problem (2.2) has a saddle point on $\mathcal{X}\times\mathcal{Y}$.
Proof. Let $\mathcal{N}(P)$ and $\mathcal{N}(Q)$ be the null spaces of $P$ and $Q$, respectively. Let $\mathrm{rc}\,\mathcal{X}$ and $\mathrm{rc}\,\mathcal{Y}$ be the recession cones of $\mathcal{X}$ and $\mathcal{Y}$, respectively. According to [4], $x$ is feasible to the primal problem if and only if
$$x \in \mathcal{X} \quad \text{and} \quad b - Ax \in [\mathrm{rc}\,\mathcal{Y} \cap \mathcal{N}(Q)]^\circ,$$
where $S^\circ$ represents the polar cone of the set $S$; that is, $S^\circ = \{p \mid p^T q \le 0 \text{ for all } q \in S\}$. Now we have $\mathrm{rc}\,\mathcal{Y} \cap \mathcal{N}(Q) = \{q \mid Qq = 0,\ q \ge 0\}$. Thus the polar cone is
$$[\mathrm{rc}\,\mathcal{Y} \cap \mathcal{N}(Q)]^\circ = \{p \mid p = Qy - s \ \text{for some } y \in R^m \text{ and some } 0 \le s \in R^m\}.$$
Thus $x > 0$, $s > 0$, and the second equation in (2.4) imply the feasibility of $x$ to problem (1.4). Similarly $y > 0$, $w > 0$, and the first equation in (2.4) imply the feasibility of $y$ to problem (2.1). We conclude that the function (2.2) has a saddle point on $\mathcal{X}\times\mathcal{Y}$ according to the strong duality theorem. □
Let us define the central path of problem (2.4) as the set
$$\{(x(\mu), y(\mu), w(\mu), s(\mu)) \mid \mu > 0\},$$
where $(x(\mu), y(\mu), w(\mu), s(\mu))$ is the solution of the following system:
$$Px - A^T y - w = -c$$
$$Ax + Qy - s = b$$
$$w_j x_j = \mu, \ j = 1,\dots,n, \qquad s_i y_i = \mu, \ i = 1,\dots,m, \qquad x, w, y, s > 0.$$
(2.5)
The proposed algorithm finds a sequence of approximate solutions of (2.5) as $\mu \downarrow 0$, starting from an approximate solution to $(x(\mu_0), y(\mu_0), w(\mu_0), s(\mu_0))$, where $\mu_0$ is the
initial value of $\mu$. In particular, the proposed algorithm performs a one-step Newton's method to get an approximate solution of (2.4) (the predictor step). Accordingly, the parameter $\mu$ is reduced. However, to keep the error estimable, the algorithm then performs a one-step Newton's method to get an approximate solution of (2.5) (the corrector step), so that the new iterate is still close to the central path.
Denote the positive diagonal matrices $\mathrm{diag}(x_1,\dots,x_n)$, $\mathrm{diag}(y_1,\dots,y_m)$, $\mathrm{diag}(w_1,\dots,w_n)$ and $\mathrm{diag}(s_1,\dots,s_m)$ by $X$, $Y$, $W$ and $S$, respectively. To describe the extent of approximation to the central path, we define a proximity function
$$\delta(x,y,w,s,\mu) = \left(\left\|\frac{Wx}{\mu} - e\right\|^2 + \left\|\frac{Sy}{\mu} - e\right\|^2\right)^{1/2},$$
(2.6)
where $e$ is a vector of ones of compatible dimension. With a little abuse of notation, the same $e$ is used no matter what the dimension is. The following result provides an error bound for an approximate solution of (2.5) which satisfies the first two equations of (2.5) but may not satisfy the other equations.
Proposition 2.2 If $x^k > 0$ and $y^k > 0$ satisfy the first two equations of (2.5) together with some $w^k > 0$ and $s^k > 0$, and $\delta(x^k, y^k, w^k, s^k, \mu_k) \le \alpha$, then
$$0 \le f(x^k) - g(y^k) \le (1 + \alpha/\sqrt{n+m})(n+m)\mu_k.$$
Proof. By the definitions of $f(x)$ and $g(y)$ (see (1.4) and (2.1)), we always have $f(x) \ge g(y)$ for all $(x,y) \in \mathcal{X}\times\mathcal{Y}$ (the weak duality). Therefore we only need to prove the second inequality. We have
$$f(x^k) = c^T x^k + \frac{1}{2}(x^k)^T P x^k + \sup_{y\ge 0}\{(b - Ax^k)^T y - \tfrac{1}{2}y^T Q y\}$$
$$= c^T x^k + \frac{1}{2}(x^k)^T P x^k + \sup_{y\ge 0}\{(Qy^k - s^k)^T y - \tfrac{1}{2}y^T Q y\}$$
$$\le c^T x^k + \frac{1}{2}(x^k)^T P x^k + \frac{1}{2}(y^k)^T Q y^k.$$
(2.7)
The last inequality uses the convexity of $y^T Q y$ and the nonnegativity of $(s^k)^T y$. A symmetric argument for the dual problem implies
$$g(y^k) \ge b^T y^k - \frac{1}{2}(y^k)^T Q y^k - \frac{1}{2}(x^k)^T P x^k.$$
(2.8)
Subtracting (2.8) from (2.7) and using the first two equations of (2.5), we have
$$f(x^k) - g(y^k) \le c^T x^k + (x^k)^T P x^k - b^T y^k + (y^k)^T Q y^k = (w^k)^T x^k + (s^k)^T y^k.$$
On the other hand, from $\delta(x^k, y^k, w^k, s^k, \mu_k) \le \alpha$ we have
$$(w^k)^T x^k + (s^k)^T y^k = e^T W^k x^k + e^T S^k y^k = e^T \begin{pmatrix} W^k x^k - \mu_k e \\ S^k y^k - \mu_k e \end{pmatrix} + (m+n)\mu_k \le \alpha\mu_k\sqrt{n+m} + (n+m)\mu_k.$$
Hence we have $f(x^k) - g(y^k) \le (1 + \alpha/\sqrt{n+m})(n+m)\mu_k$. □
(2-9)
The associated direction is the predictor (afnne-scaling) direction if A = 0 and is the corrector (centering) direction if A = 1. Algorithm 2.3 (Ye and Anstreicher [11]) Step 0 (Initialization) Let k = 0. Choose (x°, y°, w°, s°) > 0, fi0 > 0 and 0 < a < 1/4 such that The first two equations of (2.4) are satisfied by (x°,y°, w°,s°), and such that <5(:E 0 ,I/ 0 ,U> 0 ,S 0 ,//O) < Q-
Step 1 For k = 0,1, ■ • •, until \ik < e/[(l + a/y/n + m)(n + m)] (e is the user assigned tolerance), do Step 1.1 Solve (2.9) with x = xk,y = yk,w = wk,s = sk,n = /xt, and A = 0. Denote by Axp, Ayp, Awp, and Asp the resulting directions. Let ,k_(Xk \0
0WAx"\ Yk)\Ayp)'
2a '-^
a
2+
4a||rf*||+tt'
IP Method for Stochastic
Programming
399
x(8) = xk + 9Ax",
y{9) = yk + 9Ayp,
w(9) = wk + 6Aw",
s{9) = sk + 8Asp,
and H{0) = {x(9)Tw(8) + y{6)Ts(9)}l{m
+ n).
This is the predictor step. S t e p 1.2 Solve (2.9) with x = x(9),y = y(8),w = w{9),s = s(9),ft = y,{B) and A = 1, resulting in A i c , Ayc, Awc and A.sc Let xk+l = x(9) + Axc,
yk+1 = y(9) + Ayc,
wk+1 = w{9) + Aiuc,
sk+l = s{8) + Asc,
and fik+\ — l*(9)- This is the corrector step. Update k and go to next iteration of Step 1. Convergence properties of this algorithm, stated in [11] (which uses an earlier result of Ji, Potra, and Huang [1]), are as follows. Theorem 2.4 Assume that problem (2.4) has a strictly complementary solution. That is, there is a solution (x,y,w,s) of (2.4) such that x + w > 0 and y + s > 0. Then 1. (xk,yk,wk,sk) k
k
> 0 f o r all jfc;
k k
2. (x ,y , w ,s ) satisfies the first two equations of (2.4) (Thus xk and yk are feasible to (1.4) and (2.1), respectively, according to the proof of Proposition 2.1); 3. The algorithm has iteration complexity 0(^/m + nL), where L is the input length of (2.4); 4. (xk)Twk
+ (yk)Tsk
-> 0
Q-quadratically.
We make some remarks on this algorithm. Remark 1 There are many available interior path-following methods for linear com plementarity problems. This one seems to be among the best of them in theoretical properties concerning local and global convergence.
400
J. Sun, K. E. Wee and J. Zhu
Remark 2 A serious drawback is that the algorithm needs a starting point near the central path. For a primary study, we can generate random testing problems with the required initial point, as we will do in Section 3. For practical problems, we might select an infeasible version of the algorithm like [3] (by "infeasible" algorithms we mean the algorithms that can start from arbitrary x > 0 and y > 0) or use a standardized approach to construct an initial solution in step 0, see [2J. In general, those standard approaches tend to increase the complexity of the algorithms. Remark 3 Current research indicates that the assumption on the existence of a strictly complementary solution can be removed at the cost of losing a little rate of convergence. In a recent paper [3], Mizuno proposed a predictor-corrector algorithm that does not require this assumption and has a superlinear convergence rate. Remark 4 Note that the steplength 6 can be computed through a formula in Step 1.1. Therefore, unlike the existing algorithms for problem (1.4), the algorithm does not need any line search involving f(x) or g(y).
3
Computational Aspects of the Algorithm
The major computational effort of the algorithm is spent on solving equation system (2.9). Since the matrix Q is extremely large, the key point is how to reduce the amount of work by taking advantage of the block-diagonal structure of Q. To achieve this goal, we re-write system (2.9) as ' WAx + X(PAx - ATAy) = -Xw + X^e SAy + Y{AAx + QAy) = -Sy + A/xe Aw = PAx - A7 Ay As = AAx + QAy. Solving the second equation for Ay and substituting it into the first equation, we get the following equivalent system: ' [(P + X-1W) + AT(Q + Y-1S)-1A]Ax = AT{Q + Y-'Sy'i-s + A/zF-'e) - w + < (Q + Y-1S)Ay = -AAx-s + \fiY-1e T Aw = PAx - A Ay As = AAx + QAy.
XfiX^e (3.1)
The solution of this system consists of three steps. First, we solve for (U, v) the
IP Method for Stochastic
Programming
401
equation system [Q + r- 1 5)(f/, v) = (A, -a +
\v.Y-lc).
Note that Q + Y~lS is positive definite due to y > 0 and s > 0. Since Q + Y^S is block-diagonal, the solution (U, v) can be obtained by decomposing the system into |ft| (the number of elements in ft) small systems and solving them in parallel (in sequential in our implementation). Second, we compute the left-hand side matrix and the right-hand side vectors of the first equation in (3.1) by using (U, v) and solve the resulting equation. Since the dimension of this equation is ordinary and the left-hand side matrix is positive definite due to x > 0 and w > 0, this goal can be achieved even if the left-hand side matrix is dense. Finally, we substitute Ax obtained in the second step into the second equation to get Ay, again by using block-diagonal decomposition of Q + Y~XS, and compute AID and As by using the rest of the equations in (3.1). The block-diagonal structure of matrix Q can be used to save computer memories as well. As a matter of fact, the result of (Q + Y^S^A = U is computed blockwise and multiplied with AT and added to P + X~l W in the same fashion. There is no need to store a large matrix like U. Factorization of a block of Q + Y_1S can be saved for later use in solving the second equation in (3.1). The algorithm has been implemented on a DEC7000/620 computer under the UNIX operating system at the National University of Singapore for solving a sequence of randomly generated problems. The primal vector x has four sizes: n = 10,20,50, and 100. In each group, for all UJ £ ft, the dimension of vectors z^ is identical, which we call the block dimension of the problem. The probabilities i u are also identical. The size of the probability space |ft| in problem (1.3) is so designed that the dimension of vector y in problem (1.4) is 100, 1000, 5000, and 10000, respectively. For instance, if the block dimension is 50 and m = 10000, then the size of the probability space is 10000/50=200. 
The matrix Q in problem (1.4) then have 200 blocks with each block being a 50x50 dense matrix. A total of 32 problems is tested. An initial solution (x°,y°,w°, s°) > 0 is randomly generated together with P, Q, and A and special care is taken to make P and Q positive definite. The right-hand side vectors 6 and c in (2.4) are then computed so that the first two equations of (2.4) are satisfied. Special care is taken so that the initial point satisfies W0x° w e and S0y° w e. Therefore, Ho = 1 is easily chosen to satisfy S(x°, y°, w°, s°, ^ 0 ) < a = 0.25. The stopping criteria is that every x}Wj (j = 1, • ■ • ,n) and every ytst (i = 1, • • •,m) must be less than 10~8 An additional feasibility check is done by computing the 1norms of the vectors Px- ATy-w + c and Ax + Qy-s-b before the algorithm outputs computational results. We find none of them exceeds 7xl0~ 1 0 in all 32 problems.
402
n 10 20 50 100
J. Sun, K. E. Wee and 3. Zhu m=100 cpu time itn no. 3.30E-01 21 5.69E-01 22 2.00E+00 22 6.57E+00 23
m=1000 cpu time itn no. 3.94E+00 25 6.82E+01 25 2.11E+01 25 8.57E+01 33
m=5000 cpu time itn no. 49 4.51E+01 47 7.14E+01 44 2.01E+02 47 6.46E+02
m=10000 cpu time itn no. 56 1.04E+02 1.72E+02 55 5.23E+02 57 1.45E+03 52
Table 1. CPU Time and Number of Iterations with Block Dimension = 20 (itn no. = number of iterations)
n 10 20 50 100
m=100 cpu time itn no. 4.84E-01 17 6.01E-01 14 1.87E+00 16 5.58E+00 18
m=1000 cpu time itn no. 7.89E+00 25 1.18E+01 25 2.85E+01 25 8.71E+01 31
m=5000 cpu time itn no. 7.30E+01 40 1.08E+02 41 2.58E+02 42 6.00E+02 44
m = 10000 cpu time itn no. 1.96E+02 52 2.75E+02 51 7.20E+02 55 1.60E+03 52
Table 2. CPU Time and Number of Iterations with Block Dimension = 50 (itn no. = number of iterations)
The computational results are shown in Tables 1 and 2. It is seen that for fixed block dimension, the CPU time increases with respect to both n and m. Moreover, the number of iterations is dominated by m, which coincides with the theoretical estimate of 0(y/n + mL) w 0{^JmL) (for m » n). However, the increase of num bers of iterations is slower than yjm + n as the size of the problem increases, which is commonly observed in computational experiments of interior point methods for linear programming. Unfortunately, we have not been able to compare the algorithm with other existing methods for two-stage stochastic programming problems because there seems to be a lack of common basis for such a comparison. For example, the results reported in Ruszczyriski [7] is for linear problems with more general recourse functions, while the results in Zhu and Rockafellar [12)[13] are for specially structured optimal control problems with m = n. It is inappropriate to use those test problems to examine an algorithm for quadratic stochastic programming. In summary, our research indicates that interior point methods can be adopted to incorporate the special structure of two-stage linear-quadratic stochastic program ming problems. Since the proposed algorithm has good theoretical properties and
IP Method for Stochastic
Programming
403
its performance is satisfactory in our preliminary computational experiments, we feel that it is worthwhile to pursue further research in this direction.
References [1] J. Ji, F. Potra and S. Huang, A predictor-corrector method for linear comple mentarity problems with polynomial complexity and superlinear convergence, Preprint, Dept. of Math. University of Iowa, Iowa, USA (1991). [2] M. Kojima, M. Megiddo, T. Noma and A. Yoshise, A unified approach to in terior point algorithms for linear complementarity problems, Lecture Notes in Computer Science No. 538, Springer-Verlag, Berlin, Germany (1991). [3] S. Mizuno, A superlinearly convergent infeasible-interior-point algorithm for ge ometrical LCPs without strictly complementary condition, Preprint No. 214 Mathematische Institute der Universitat Wuerzburg, Germany (1994). [4] R. T. Rockafellar, Linear-quadratic programming and optimal control, SIAM Journal on Control and Optimization 25 (1987) 781-814. [5] R. T. Rockafellar and J. Sun, A finite simplex-active-set method for monotropic piecewise quadratic programming, in: D. Du and J. Sun eds. Advances in Op timization and Approximation, Kluwer Academic Publishers, Dordrecht, The Netherlands (1994). [6] R. T. Rockafellar and R. J.-B. Wets, A Lagrangian finite generation technique for solving linear-quadratic problems in stochastic programming, Mathematical Programming Studies 28 (1986) 63-93. [7] A. Ruszczyriski, A regularized decomposition method for minimizing a sum of polyhedral functions, Mathematical Programming 35 (1986) 309-333. [8] J. Sun, K.-H. Tsai and L. Qi, A simplex method for network programs with con vex separable piecewise linear costs and its application to stochastic transship ment problems, in: Network Optimization Problems: Algorithms, Applications and Complexity, D. Du and P. Pardalos eds. World Scientific Publishing Co., London, UK (1993). [9] R. J.-B. Wets, Solving stochastic programs with simple recourse, Stochastics 10 (1983) 219-212.
404
J. Sun, K. E. Wee and J. Zhu
[10] S. Wright and D. Ralph, A superlinear infeasible-interior-point algorithm for monotone complementarity problems, Preprint, MCS-P344-1292, Math and Comp. Sci. Division, Argonne National Laboratory, Argonne, IL, USA (1993). [11] Y. Ye and K. Anstreicher, On quadratic and 0(- v /ni) convergence of a predictorcorrector algorithm for LCP, Mathematical Programming 62 (1993) 537-552. [12] C. Zhu, On the primal-dual steepest descent algorithm for extended linearquadratic programming, Preprint, Dept. of Math Sciences, The Johns Hopkins University, Baltimore, MD, USA (1992). [13] C. Zhu and R. T. Rockafellar, Primal-dual projected gradient algorithms for extended linear-quadratic programming, SIAM Journal on Optimization 3 (1993) 751-783.
A Newton
Method
for Vasiational
Inequality
Problems
405
Recent Advances in Nonsmooth Optimization, pp. 405-417 Eds. D.-Z. Du, L. Qi and R.S. Womersley ©1995 World Scientific Publishing Co Pte Ltd
A Globally Convergent Newton Method for Solving Variational Inequality Problems with Inequality Constraints Kouichi Taji a n d Masao Fukushima 1 Graduate School of Information Science, Takayama, Ikoma, Nara 630-01, Japan
Nara Institute
of Science
and
Technology,
Abstract
Newton method for variational inequality problems can be made globally con vergent by incorporating a line search for a merit function. But the existing methods may not be easy to apply to problems with general nonlinear con straints, because the evaluation of a merit function requires the solution of an optimization problem with such nonlinear constraints. In this paper, we propose a new globally convergent Newton method for solving variational in equality problems with general inequality constraints. The method solves at each iteration an affine variational inequality problem, in which not only the mapping of the problem but also the constraints are linearized. In a line search, it makes use of the merit function recently proposed by the authors, which can be evaluated by solving a convex quadratic minimization problem with linear constraints. Thus each step of the algorithm can be carried out finitely even if the constraints are nonlinear. We show that, when the mapping is strongly monotone, the method is globally convergent to the solution, and that, under some additional assumptions, the rate of convergence is superlinear.
x This work was supported in part by the Scientific Research Grant-in-Aid from the Ministry of Education, Science and Culture, Japan.
K. Taji and M. Fukushima
406
1
Introduction
We consider the variational inequality problem of finding x~ € S such that {F(x*),x-x')
> 0 for all x£S,
(1)
1
where 5 is a nonempty closed convex subset of iT , F is a continuously differentiable mapping from B" into R" and (•, •) denotes the inner product in i f . In this paper, we suppose that the set S is specified by S= {x<= Rn\ci(x)
<0,
i = l,...,m],
(2)
where c< : R" —> R are twice continuously differentiable convex functions. Many iterative methods, such as Newton method, projection methods, the linearized Jacobi method and the successive over-relaxation methods, have been proposed to solve the variational inequality problem (1) (see [7] and the references cited therein). Among them, the Newton method generates a sequence {xk}, where xk+1 is a solution to the linearized variational inequality problem (F(I') + V F (
I
' ) V
+ 1
-
I
' ) , I - I
H I
) > 0
for all
x€
S.
(3)
It can be shown that, under suitable assumptions, the Newton method converges quadratically to a solution i", provided that an initial point x° is chosen sufficiently close to x" [7, Theorem 4.1]. For the variational inequality problem (1), various merit functions have been proposed and their properties have been studied [1, 2, 4, 8]. Those functions can be used to globalize the Newton method. Marcotte and Dussault [10] obtained a globally convergent Newton method by incorporating an exact line search strategy for the gap function g{x) = ma,x{{F(x),x - y) \ y £ S}, introduced by Auslender [2]. Another modification is the one recently proposed by Taji, Fukushima and Ibaraki [12], which makes use of Armijo line search for the regularized gap function / ( x ) = m a x | - ( F ( x ) , y - x ) - - (y - x,G(y - x))
yes
}•
introduced by Fukushima [4], where G is an n x n symmetric positive definite matrix. Under suitable assumptions, both modifications are shown to be globally convergent to a solution with quadratic rate of convergence [10, 12]. Note that the above mentioned methods tacitly assume that the constraint set S has a relatively simple structure. For example, when 5 is a polyhedral convex set, that
A Newton Method for Variational Inequality
407
Problems
is, the functions c, are all affine, the variational inequality subproblem (3) of New ton method becomes an affine variational inequality problem and the gap functions g and / can be evaluated by solving linear and quadratic programming problems, respectively. However, when 5 is a general convex set defined by nonlinear convex functions, solving the linearized subproblem (3) and evaluating g(x) and f(x) should be considered difficult tasks. In this paper we propose a new globally convergent Newton method for solving vari ational inequality problems with general inequality constraints. The method solves at each iteration an affine variational inequality subproblem, in which not only the mapping F but also the constraint functions c, are linearized. Moreover it makes use of the merit function recently proposed by Taji and Fukushima [13] to obtain global convergence. The method has a clear advantage over the methods that solve subproblems (3) and use the merit function g or / , in that each step of the algorithm can be carried out finitely even if the set 5 is specified by nonlinear inequalities. It is shown that, when the mapping is strongly monotone, the method converges globally to the solution, and that, under some additional assumptions, the rate of convergence is superlinear. The method is closely related to a successive quadratic programming method for solving nonlinear programming problems.
2
Preliminaries
In this section, we summarize some preliminary facts which will be useful subse quently. The mapping F : -ft" —> ft" is said to be monotone if {F(x) - F(y), x - y) > 0 for all
x,yeRn,
strictly monotone if the above inequality holds strictly whenever x ^ y, and strongly monotone with modulus \i > 0 if (F(x) ~F(y),x-y}>n\\x-y\\2
for
all x, y € ft"
It is well known [11, Theorem 5.4.3] that, when F is continuously differentiable, F is strongly monotone with modulus \i if and only if the Jacobian VF(x) satisfies {d, VF(x)d)
>n\\d
||2
for all x,deRn
(4)
It is also well known [7, Corollary 3.2] that, when the mapping F is strongly monotone, the variational inequality problem (1) admits a unique solution.
408
K. Taji and M. Fukushima
A function
tioT>0
at a tn the
{x) 6(x - 4>{x) — *i —
exists. We call the limit the directional derivative and denote it by 0'(x;d). In the remainder of the paper, we suppose that the Slater's constraint qualification holds for (2), i.e., there exists an i 6 if1 such that c,(x) <0 for all i= , . . . , , m .
(5)
Under this assumption, x" is a solution to (1) if and only if there exists a Lagrange multiplier vector A* = (X^,,..,A'm) such that (x",\') is a solution to the following mixed nonlinear complementarity problem [7, Proposition 2.2] : m
F(X*) + F(x') + X:A*V 52\*Vc C ,(X-) i(x') == 0,0,
(6)
c,(x") < < 0" 0" A* > 0, A*Ci(x') = 0, i = = l , . . . , m. m.
3
A Merit Function
In this section, we review the merit function recently proposed by the authors [13] for the inequality constrained variational inequality problem. The reader may refer to [13] for details. Choose an n x n positive definite matrix G and define the function / : R" -> R bb /(*) = max { -(F(x),
x) - i\ (y [y - xtG(v G{v ~ *)) |\ je r ( * ) } , j,y - z)
(7)
where T(x) is the polyhedral convex set defined by T(x) = {y€R"\c,(x) {i,€/r |c,(x) + {Vc,(x) (Vc,(x),2/-x) >y-x)
< <0,0, i = 1 , . . . , m } .
By the convexity of a, it is easy to verify that, for all x e Rn, T(x) is a closed convex set containing 5 . Thus / provides an over-estimate of the function / introduced by Fukushima [4], i.e., f(x) > f{x) for all x € R* In particular, when c,- are all linear, / coincides with / . Note that the positive definiteness of G guarantees that the maximum in (7) is always attained by y = H(x) uniquely, where H(x) is the unique solution y to the convex quadratic programming problem QP(i):
minimize, subject to
\(y - x,G(y - x)) + (F(x), y - a) ^(y-x,G(y-x)) (F(x),y-a) c,(x) <0, Ci(x)++ (Vc,(x),y-x) < 0, »> = l , . . . , m .
.8)
409
A Newton Method for Variational Inequality Problems Therefore, the function / can be written as f(x) = ~{F(x), H{x) - x) - i (H(x) - x, G(H(x) - x)).
(9)
Let A(x) denote the set of optimal Lagrange multiplier vectors for QP(x), that is, A(x) = {A G iK" f | F(x) + + G(H(x) G(H{x) - x) + f > , V c , ( x ) = 0, A; A; > 0, \i[ci(x) + (Va(x),H(x)
-x)}
(10)
=0,i = l,...,m}.
Since H(x)) = i - holds for the solution x* of (1) [13, Lemma 2.1], A(x") coincides with the set of vectors A* satisfying (6). Using the function / , we can formulate the optimization problem minimize f(x)
subject to x6 x 6 55..
(11)
We can prove that this problem is equivalent to the variational inequality problem (1). Proposition 3.1 [13] Let the function f:Rn^Rbe defined by (7). Then f(x) > 0 for all xG S. Moreover, x £ 5 and f{x) = 0 if and only if x solves the variational inequality problem (1). Hence x solves (1) if and only if it solves the opttmization problem (11 and / ( x ) = 0. For given i £ RT and A e UT, we define the matrix M(x, A) by m
M(x,A) = VF(x) + X > ; V c , , ( x ) .
(12)
t=i
The next proposition demonstrates the directional differentiability of / . Proposition 3.2 [13] Suppose that the mapping F is continuous and the convex functions C, i — 1 , . . . ,m, are continuously differentiable. Suppose also that the Slater's constraint qualification (5) holds. Then the function f defined by (7) is continuous on R". Moreover, if F is continuously differentiable and c{, i = l,...,m, are twice continuously differentiable, then f is directionally differentiable in any direction d 6 Rn and its directional derivative f'{x;d) is given by /'(x; d) = min (F(x) - [M(x, A) - G](H{x) fix; G](H(x) - x), d).
(13)
K. Taji and M. Fukushima
410
Remark 3.3 If the set A(x) is a singleton {A}, then / is differentiable at x and the gradient is given by V/(x) = F{x) - [M(x, A) - G](H(x) - x). A sufficient condition [3, Theorem 6] for A(x) to be a singleton is that the gradi ent vectors Vc;(x),i G I(x), are linearly independent, where I(x) = {i | c,(x) + (Vci(x),H(x) — x) = 0}, and the strict complementarity condition is satisfied, i.e., A,- = 0 implies c;(x) + (Vc,(x), H(x) - x) < 0. By Proposition 3.1, x solves the variational inequality problem (1) if and only if x is a global optimal solution of (11). The next proposition gives a condition under which any point satisfying the first order necessary optimality condition for (11) is actually a global optimal solution of (11). Proposition 3.4 [13] Suppose that the mapping F is continuously differentiable, Vi^(x) is positive definite for all x, the convex functions c,, i = 1,... , m, are twice continuously differentiable and the Slater's constraint qualification (5) is satisfied. If x 6 S and f'(x-y-x) > 0 for ally 6 S, then x is a global optimal solution of (11), and hence x is a solution to (1). The following lemma will be useful in the next section. Lemma 3.5 For any x, we have m 1 / ( x ) = - ( t f (x) - x, G(H(x) - * ) ) - £ A,C,(x)
m
>-£A,c,(x) for any A 6 A(x). In particular, if x € 5, then f(x)>l-(H(x)-x,G(H(x)~x)).
Proof. Since H(x) solves (8), it follows from the definition (10) of Λ(x) that each vector λ ∈ Λ(x) satisfies

    F(x) + G(H(x) − x) + Σ_{i=1}^m λᵢ∇cᵢ(x) = 0,
    cᵢ(x) + ⟨∇cᵢ(x), H(x) − x⟩ ≤ 0,  λᵢ ≥ 0,
    λᵢ[cᵢ(x) + ⟨∇cᵢ(x), H(x) − x⟩] = 0,  i = 1, …, m.
A Newton Method for Variational Inequality
Problems
411
Hence, we have from (9)

    f(x) = −⟨F(x), H(x) − x⟩ − (1/2)⟨H(x) − x, G(H(x) − x)⟩
         = ⟨G(H(x) − x), H(x) − x⟩ + ⟨Σ_{i=1}^m λᵢ∇cᵢ(x), H(x) − x⟩ − (1/2)⟨H(x) − x, G(H(x) − x)⟩
         = (1/2)⟨H(x) − x, G(H(x) − x)⟩ + ⟨Σ_{i=1}^m λᵢ∇cᵢ(x), H(x) − x⟩
         = (1/2)⟨H(x) − x, G(H(x) − x)⟩ − Σ_{i=1}^m λᵢcᵢ(x)
         ≥ −Σ_{i=1}^m λᵢcᵢ(x),

where the second equality uses the first equation above, the fourth uses the complementarity conditions, and the last inequality follows from the positive definiteness of G. Since λᵢ ≥ 0 and cᵢ(x) ≤ 0, i = 1, …, m, for all x ∈ S, the last part of the lemma follows immediately. □
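Proposition 3.1 and Lemma 3.5 can be illustrated numerically. The sketch below is a minimal toy instance, not taken from the paper: it assumes an affine strongly monotone F(x) = Ax + b, the feasible set S = {x ≥ 0} (so the linearized constraints defining H(x) coincide with S and H(x) is a projection), and G = I.

```python
import numpy as np

# Toy instance (assumed for illustration): F(x) = A x + b with A positive
# definite, S = {x >= 0}, G = I.  For linear constraints the subproblem
# defining H(x) is exact, and H(x) is the projection of x - F(x) onto S.
A = np.array([[2.0, 0.0], [0.0, 2.0]])
b = np.array([-2.0, 2.0])
F = lambda x: A @ x + b

def gap(x):
    """Gap function f(x) = -<F(x), H(x)-x> - (1/2)<H(x)-x, G(H(x)-x)> with G = I."""
    H = np.maximum(x - F(x), 0.0)       # H(x): projection onto {x >= 0}
    return F(x) @ (x - H) - 0.5 * np.dot(x - H, x - H)

# x* = (1, 0) solves the VI: F(x*) = (0, 2) and complementarity holds.
vals = [gap(np.array(p)) for p in [(0.0, 0.0), (2.0, 1.0), (1.0, 0.0)]]
# gap is nonnegative on S and vanishes exactly at the solution
```

On this instance the gap function is strictly positive at the two non-solution points and vanishes at x* = (1, 0), as Proposition 3.1 predicts.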
4
Globally Convergent Newton Method
In this section, we present a globally convergent Newton method for the variational inequality problem (1), which incorporates an Armijo-type line search procedure for the penalty function θ_r : Rⁿ → R defined by

    θ_r(x) = f(x) + r Σ_{i=1}^m max(0, cᵢ(x)),

where r is a sufficiently large positive parameter. By Proposition 3.2 and [6, Lemma 3.1], θ_r is directionally differentiable and the directional derivative is given by

    θ′_r(x; d) = f′(x; d) + r Σ_{i∈I₊} ⟨∇cᵢ(x), d⟩ + r Σ_{i∈I₀} max(0, ⟨∇cᵢ(x), d⟩),    (14)

where I₊ = {i | cᵢ(x) > 0} and I₀ = {i | cᵢ(x) = 0}. Throughout this section, we assume that the mapping F is continuously differentiable and strongly monotone with modulus μ, so that ∇F satisfies (4). Note that, since the convexity of cᵢ guarantees that ∇²cᵢ(x) is positive semi-definite, (4) implies that the matrix M(x, λ) defined by (12) satisfies

    ⟨d, M(x, λ)d⟩ ≥ μ‖d‖²  for all x, d ∈ Rⁿ,    (15)
whenever λ ≥ 0. Now we state the algorithm.
Algorithm

Step 0. Choose x⁰ ∈ Rⁿ, r > 0, 0 < β < 1, 0 < σ < 1, and a symmetric positive definite matrix G. Let k := 0.

Step 1. Find the unique solution x̄ᵏ ∈ Γ(xᵏ) of the linearized variational inequality problem

    ⟨F(xᵏ) + M(xᵏ, λᵏ)ᵀ(x̄ᵏ − xᵏ), x − x̄ᵏ⟩ ≥ 0  for all x ∈ Γ(xᵏ),    (16)

where λᵏ is an arbitrary vector in Λ(xᵏ). Let dᵏ := x̄ᵏ − xᵏ.

Step 2. Set xᵏ⁺¹ := xᵏ + β^{m_k} dᵏ, where m_k is the smallest nonnegative integer m such that

    θ_r(xᵏ) − θ_r(xᵏ + βᵐdᵏ) ≥ −σβᵐ θ′_r(xᵏ; dᵏ).    (17)

Let k := k + 1. Go to Step 1.

Note that in Step 1 we need an optimal Lagrange multiplier vector λᵏ for the quadratic programming problem QP(xᵏ) (cf. (8)). This has already been obtained in the previous iteration as a by-product of evaluating the function value f(xᵏ). Note also that, by the positive definiteness of M, the linearized problem (16) always has a unique solution. Moreover, problem (16) can be rewritten as a linear complementarity problem, which can be solved in a finite number of steps by Lemke's complementary pivoting algorithm [9]. The following theorem shows that the vector dᵏ generated by the algorithm is a descent direction of θ_r at xᵏ.
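The two steps above can be sketched in code for a simple instance. The fragment below is a schematic illustration under strong simplifying assumptions that are not from the paper: the feasible set is {x ≥ 0}, F is separable with a diagonal Jacobian (so subproblem (16) has a closed-form componentwise solution), G = I, and the merit function reduces to the gap function f since the iterates stay feasible.

```python
import numpy as np

def F(x):                      # strongly monotone, separable toy mapping
    return x**3 + x - np.array([2.0, -1.0])

def JF_diag(x):                # diagonal of the Jacobian of F
    return 3.0 * x**2 + 1.0

def gap(x):
    """Merit function: gap function with G = I over the set {x >= 0}."""
    H = np.maximum(x - F(x), 0.0)
    return F(x) @ (x - H) - 0.5 * np.dot(x - H, x - H)

def newton_vi(x, beta=0.5, sigma=1e-4, iters=50):
    for _ in range(iters):
        m = JF_diag(x)
        # Step 1: solve the linearized VI (16); with diagonal M and the set
        # {x >= 0} the unique solution is available componentwise.
        xbar = np.maximum(x - F(x) / m, 0.0)
        d = xbar - x
        if np.linalg.norm(d) < 1e-12:
            break
        # Step 2: Armijo-type search (17), using the descent estimate (18)
        # with mu = 1 and ||G|| = 1, i.e. theta'_r(x; d) <= -0.5 ||d||^2.
        t = 1.0
        for _ in range(60):
            if gap(x) - gap(x + t * d) >= sigma * t * 0.5 * np.dot(d, d):
                break
            t *= beta
        x = x + t * d
    return x

x_sol = newton_vi(np.array([5.0, 5.0]))   # converges to the solution (1, 0)
```

The instance has the unique solution x* = (1, 0) (F₁(x*) = 0 and F₂(x*) > 0 with x₂* = 0), and the damped Newton iteration drives the gap function to zero.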
Theorem 4.1 Let the mapping F be continuously differentiable and strongly monotone with modulus μ, and let the convex functions cᵢ, i = 1, …, m, be twice continuously differentiable. If ‖λ‖_∞ ≤ r for all λ ∈ Λ(xᵏ), then the vector dᵏ = x̄ᵏ − xᵏ satisfies the inequality

    θ′_r(xᵏ; dᵏ) ≤ −(μ − (1/2)‖G‖) ‖dᵏ‖².    (18)

In particular, if the matrix G is chosen sufficiently small to satisfy ‖G‖ < 2μ, then dᵏ is a descent direction of θ_r at xᵏ.
Proof. For simplicity of notation, we omit the superscript k in xᵏ and dᵏ. Let I₊ = {i | cᵢ(x) > 0} and I₀ = {i | cᵢ(x) = 0}. Note that d = x̄ − x together with some Lagrange multiplier vector λ̄ ≥ 0 satisfies

    F(x) + M(x, λ)ᵀd + Σ_{i=1}^m λ̄ᵢ∇cᵢ(x) = 0,    (19a)
    cᵢ(x) + ⟨∇cᵢ(x), d⟩ ≤ 0,    (19b)
    λ̄ᵢ[cᵢ(x) + ⟨∇cᵢ(x), d⟩] = 0,  i = 1, …, m.    (19c)

Then (19b) yields

    Σ_{i∈I₀} max(0, ⟨∇cᵢ(x), d⟩) = 0.    (20)

Since d = x̄ − x, and since M(x, λ) = ∇F(x) + Σ_{i=1}^m λᵢ∇²cᵢ(x), it follows from (13) that

    f′(x; d) = min_{λ′∈Λ(x)} ⟨F(x) − [M(x, λ′) − G](H(x) − x), x̄ − x⟩
             ≤ ⟨F(x) − [M(x, λ) − G](H(x) − x), x̄ − x⟩
             = −⟨F(x) + M(x, λ)ᵀ(x̄ − x), H(x) − x̄⟩ + ⟨F(x), H(x) − x⟩
               + (1/2)⟨H(x) − x, G(H(x) − x)⟩ − ⟨d, M(x, λ)d⟩ + (1/2)⟨d, Gd⟩
               − (1/2)⟨x̄ − H(x), G(x̄ − H(x))⟩,    (21)

where the last equality follows from the identity

    2⟨x̄ − x, G(H(x) − x)⟩ = ⟨H(x) − x, G(H(x) − x)⟩ + ⟨x̄ − x, G(x̄ − x)⟩ − ⟨x̄ − H(x), G(x̄ − H(x))⟩.
Since x̄ is a solution to (16), the first term of (21) is nonpositive. From (9), the terms ⟨F(x), H(x) − x⟩ + (1/2)⟨H(x) − x, G(H(x) − x)⟩ in (21) together equal −f(x). The last term is nonpositive by the positive definiteness of G. Hence, we have

    f′(x; d) ≤ −f(x) − ⟨d, M(x, λ)d⟩ + (1/2)⟨d, Gd⟩.
Moreover, since λ ∈ Λ(x), it follows from Lemma 3.5 that

    f′(x; d) ≤ −⟨d, M(x, λ)d⟩ + (1/2)⟨d, Gd⟩ + Σ_{i=1}^m λᵢcᵢ(x).    (22)

Hence, we have

    θ′_r(x; d) ≤ −⟨d, M(x, λ)ᵀd⟩ + (1/2)⟨d, Gd⟩ + Σ_{i=1}^m λᵢcᵢ(x) + r Σ_{i∈I₊} ⟨∇cᵢ(x), d⟩
              ≤ −⟨d, M(x, λ)ᵀd⟩ + (1/2)⟨d, Gd⟩ + Σ_{i∈I₊} (λᵢ − r)cᵢ(x)
              ≤ −(μ − (1/2)‖G‖)‖d‖²,

where the first inequality follows from (14), (20) and (22), the second inequality follows from (19b) together with the fact that λᵢ ≥ 0 for all i and cᵢ(x) ≤ 0 for i ∉ I₊, and the third inequality follows from (15) and ‖λ‖_∞ ≤ r for all λ ∈ Λ(x). This proves (18). The last part of the theorem follows immediately. □

Next we show the global convergence of the algorithm.

Theorem 4.2 Suppose that the mapping F is strongly monotone with modulus μ. Suppose also that the parameter r is chosen sufficiently large. If the matrix G is chosen to satisfy ‖G‖ < 2μ, and if the sequence {xᵏ} generated by the algorithm is bounded, then {xᵏ} converges to the unique solution of the variational inequality problem (1).

Proof. Since the sequence {xᵏ} is bounded, it follows from [6, Lemma 3.3] that there exists a positive number r̄ > 0 such that ‖λᵏ‖_∞ ≤ r̄ for all k, where λᵏ is any vector in Λ(xᵏ). Assuming that r > r̄, we have from Theorem 4.1 that dᵏ satisfies the descent condition (18) whenever xᵏ is not a solution to (1). Hence, by the line search rule (17), the sequence {θ_r(xᵏ)} is decreasing. This together with the boundedness of {xᵏ} implies that there is at least one accumulation point. In a way similar to the proof of [12, Theorem 4.1], it can be shown that any accumulation point is a solution to (1). Moreover, under the strong monotonicity assumption, problem (1) has a unique solution. Therefore we conclude that the entire sequence {xᵏ} converges to the unique solution of (1). □

Next we examine the asymptotic rate of convergence of the algorithm. To this end, we consider the iterates (xᵏ, λᵏ) generated by the Newton method directly applied to
the mixed nonlinear complementarity problem (6), namely

    F(xᵏ) + M(xᵏ, λᵏ)ᵀ(xᵏ⁺¹ − xᵏ) + Σ_{i=1}^m λᵢᵏ⁺¹∇cᵢ(xᵏ) = 0,
    cᵢ(xᵏ) + ⟨∇cᵢ(xᵏ), xᵏ⁺¹ − xᵏ⟩ ≤ 0,  λᵢᵏ⁺¹ ≥ 0,
    λᵢᵏ⁺¹[cᵢ(xᵏ) + ⟨∇cᵢ(xᵏ), xᵏ⁺¹ − xᵏ⟩] = 0,  i = 1, …, m.    (23)
It can be shown [5] that, if ∇F(x*) is positive definite, and if the strict complementarity and the linear independence of the active constraints hold, then the sequence generated by the Newton method (23) is quadratically convergent, provided that the starting point is chosen sufficiently close to the solution. (Note that [5] deals with nonlinear programming problems, which correspond to the special case of problem (1) where F is the gradient mapping of some scalar function, so that ∇F is symmetric. But the symmetry assumption is not used in the proof of the theorem in [5].)

Note that a solution xᵏ⁺¹ to (23) is a solution of the variational inequality problem

    ⟨F(xᵏ) + M(xᵏ, λ̂ᵏ)ᵀ(xᵏ⁺¹ − xᵏ), x − xᵏ⁺¹⟩ ≥ 0  for all x ∈ Γ(xᵏ),    (24)

which is the same problem as (16) solved in Step 1 of the algorithm, except for the choice of the multiplier. Therefore, if ‖M(xᵏ, λᵏ) − M(xᵏ, λ̂ᵏ)‖ tends to zero as xᵏ → x*, then the sequence {xᵏ} generated by solving the linearized variational inequality problem

    ⟨F(xᵏ) + M(xᵏ, λᵏ)ᵀ(xᵏ⁺¹ − xᵏ), x − xᵏ⁺¹⟩ ≥ 0  for all x ∈ Γ(xᵏ)

with an arbitrary λᵏ ∈ Λ(xᵏ) is locally superlinearly convergent. Since the vector λᵏ belongs to Λ(xᵏ) defined by (10), and λ̂ᵏ in (24) is determined at the previous Newton iteration (23), both λᵏ and λ̂ᵏ approach the set Λ(x*) whenever xᵏ converges to x*. In particular, if Λ(x*) consists of the unique vector λ*, then both λᵏ and λ̂ᵏ converge to λ*, and hence we have ‖M(xᵏ, λᵏ) − M(xᵏ, λ̂ᵏ)‖ → 0. Note that the uniqueness of the Lagrange multiplier vector λ* is ensured by the linear independence of the active constraints. These observations are summarized in the following theorem.

Theorem 4.3 Let the assumptions of Theorem 4.2 be satisfied. In addition, suppose that the strict complementarity and the linear independence of the active constraints hold at the solution x*. If there is an integer k̄ ≥ 0 such that the unit step size is accepted in Step 2 of the algorithm for all k ≥ k̄, then the sequence {xᵏ} generated by the algorithm converges superlinearly to the solution x*.
5
Concluding Remarks
We have proposed a Newton method for solving variational inequality problems and shown that, under the strong monotonicity assumption, the method is globally convergent and that, under some additional assumptions, the rate of convergence is superlinear.

When F is the gradient mapping of some differentiable convex function ψ, problem (1) corresponds to a necessary and sufficient optimality condition for the convex programming problem

    minimize   ψ(x)
    subject to cᵢ(x) ≤ 0,  i = 1, …, m.    (25)
Therefore we may apply our method to (25) with the identification F = ∇ψ. In this case, the matrix M defined by (12) is rewritten as

    M(x, λ) = ∇²ψ(x) + Σ_{i=1}^m λᵢ∇²cᵢ(x),

which is the Hessian of the Lagrangian of problem (25). Moreover, since M is symmetric, the subproblem (16) solved in Step 1 can be rewritten as

    minimize_d   (1/2)⟨d, M(xᵏ, λᵏ)d⟩ + ⟨F(xᵏ), d⟩
    subject to   cᵢ(xᵏ) + ⟨∇cᵢ(xᵏ), d⟩ ≤ 0,  i = 1, …, m.
Thus our algorithm reduces to a successive quadratic programming (SQP) method. A major difference from other SQP methods is that our algorithm makes use of the function f as a merit function to globalize the convergence, instead of using a penalty function associated with problem (25).
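The agreement between the linearized variational inequality and the quadratic subproblem can be checked numerically in a toy case. The instance below is purely illustrative: M is taken diagonal and the linearized constraint system is d ≥ −x (i.e., c(x) = −x ≤ 0), so both problems have closed-form solutions.

```python
import numpy as np

# Toy check (illustrative instance, not from the paper): for diagonal
# positive definite M, the QP
#     min (1/2)<d, M d> + <F, d>   s.t.  d >= -x
# has the separable solution d_i = max(-x_i, -F_i / M_ii), and the
# linearized VI over {x + d >= 0} yields the same direction.
x = np.array([2.0, 0.5, 0.0])
F = np.array([3.0, -1.0, 0.5])          # fixed data playing the role of F(x^k)
Mdiag = np.array([2.0, 4.0, 1.0])       # diagonal of M(x^k, lambda^k)

d_qp = np.maximum(-x, -F / Mdiag)       # projection of the unconstrained minimizer

# VI solution: xbar_i = max(0, x_i - F_i / M_ii), direction d = xbar - x
xbar = np.maximum(0.0, x - F / Mdiag)
d_vi = xbar - x
```

Both routes produce the same search direction, which is the content of the SQP reduction when M is symmetric.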
References

[1] G. Auchmuty, Variational principles for variational inequalities, Numerical Functional Analysis and Optimization 10 (1989) 863-874.
[2] A. Auslender, Optimisation: Méthodes Numériques, Masson, Paris, 1976.
[3] A. V. Fiacco and G. P. McCormick, Nonlinear Programming: Sequential Unconstrained Minimization Techniques, SIAM, Philadelphia, 1990.
[4] M. Fukushima, Equivalent differentiable optimization problems and descent methods for asymmetric variational inequality problems, Mathematical Programming 53 (1992) 99-110.
[5] U. M. García Palomares and O. L. Mangasarian, Superlinearly convergent quasi-Newton algorithms for nonlinearly constrained optimization problems, Mathematical Programming 11 (1976) 1-13.
[6] S. P. Han, A globally convergent method for nonlinear programming, Journal of Optimization Theory and Applications 22 (1977) 297-309.
[7] P. T. Harker and J. S. Pang, Finite-dimensional variational inequality and nonlinear complementarity problems: A survey of theory, algorithms and applications, Mathematical Programming 48 (1990) 161-220.
[8] T. Larsson and M. Patriksson, A class of gap functions for variational inequalities, Mathematical Programming 64 (1994) 53-79.
[9] C. E. Lemke, Bimatrix equilibrium points and mathematical programming, Management Science 11 (1965) 681-689.
[10] P. Marcotte and J. P. Dussault, A note on a globally convergent Newton method for solving variational inequalities, Operations Research Letters 6 (1987) 35-42.
[11] J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, 1970.
[12] K. Taji, M. Fukushima and T. Ibaraki, A globally convergent Newton method for solving strongly monotone variational inequalities, Mathematical Programming 58 (1993) 369-383.
[13] K. Taji and M. Fukushima, A new merit function and a successive quadratic programming algorithm for variational inequality problems, SIAM Journal on Optimization, to appear.
D. Ward
418
Recent Advances in Nonsmooth Optimization, pp. 418-437 Eds. D.-Z. Du, L. Qi and R.S. Womersley ©1995 World Scientific Publishing Co Pte Ltd
Upper Bounds on a Parabolic Second Order Directional Derivative of the Marginal Function

Doug Ward
Department of Mathematics and Statistics
Miami University
Oxford, Ohio 45056-1641 USA
Abstract
We establish an upper bound on the parabolic second-order upper Dini derivative of the marginal function of a parametric nonlinear program with smooth equality constraint functions but possibly nonsmooth objective and inequality constraint functions. The main tools in the proof of this bound are an intersection theorem for second-order tangent sets and a Ljusternik-type lemma. The bound takes on a particularly simple form if the objective and constraint functions are C^{1,1}. Corollaries include a new upper bound for the upper Dini derivative of the marginal function of a C^{1,1} program.
1
Introduction
We consider the parametric mathematical program

    P(s)    v(s) := min{ f(x) | gᵢ(x) ≤ sᵢ, i ∈ J;  hᵢ(x) = s_{m+i}, i ∈ L },

for a vector s := (s₁, …, s_{m+p}) ∈ R^{m+p}, functions f, gᵢ, hᵢ : Rⁿ → (−∞, +∞], and index sets J := {1, …, m} and L := {1, …, p}. In nonlinear programming, one is often interested in how the values of the marginal function v vary with changes in s
Parabolic Directional Derivative
419
and the study of the differential properties of v is a major area of research (see for example [12]). Although v generally has points of nondifferentiability, bounds on

    v⁺(s; y) := limsup_{t→0⁺} [v(s + ty) − v(s)] / t

and

    v⁻(s; y) := liminf_{t→0⁺} [v(s + ty) − v(s)] / t,

the upper and lower Dini directional derivatives of v, can be established under fairly mild assumptions. For example, if a Mangasarian-Fromovitz-type constraint qualification is satisfied at solutions of P(0), an upper bound for v⁺(0; s), in terms of the support function of the set of Karush-Kuhn-Tucker multipliers for P(0), is valid. If an additional "uniform compactness" or "tameness" condition holds, then a similar bound on v⁻(0; s) can be derived. Such bounds were established by Gauvin and Tolle [17] for the case in which f, gᵢ, and hᵢ are C¹, and have since been shown, via techniques of nonsmooth analysis, to have extensions that are valid for larger classes of objective and constraint functions [2, 10, 29, 31].

If second-order information about the objective and constraint functions is available, then corresponding information about v can be deduced, leading to sharper estimates of the changes that occur in v as s changes [3, 7, 11-12, 19, 24-28]. The foundational results in this area, due to Fiacco and others [12], give an explicit formula for the Hessian of v under the assumption that f, gᵢ, and hᵢ are C² and an appropriate constraint qualification and second-order sufficient optimality condition are satisfied at a solution of P(0). Subsequent research has examined the consequences of weakening one or more of these assumptions (see e.g. [11, 19, 21, 24, 28]).

In the present paper, we explore what can be said about upper bounds on second-order directional derivatives of v when the objective and constraint functions are no longer assumed to be C². Specifically, our goal here is to use concepts and techniques of nonsmooth analysis to derive an upper bound for

    v⁺⁺(s; y, z) := limsup_{t→0⁺} [v(s + ty + (1/2)t²z) − v(s) − t v⁺(s; y)] / ((1/2)t²),

the parabolic second-order upper Dini derivative of v, under minimal smoothness hypotheses on f, gᵢ, and hᵢ. We will see that such a bound can be stated if f and gᵢ belong to a large class of functions which need not be differentiable or even locally Lipschitzian, while h := (h₁, …, h_p) is C¹ with derivative ∇h(x) and the second-order directional derivative

    h″(x; d, y) := lim_{t→0⁺} [h(x + td + (1/2)t²y) − h(x) − t∇h(x)d] / ((1/2)t²)
exists.
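The directional derivatives defined above are easy to probe numerically. The sketch below uses the simple marginal function v(s) = min{x² : x ≥ s} = max(s, 0)², an example chosen here purely for illustration: its upper Dini derivative at 0 in the direction 1 vanishes, while the parabolic second-order quotient detects the curvature.

```python
# Difference-quotient probes for the Dini and parabolic second-order
# derivatives, applied to the toy marginal function v(s) = max(s, 0)^2
# of min{ x^2 : x >= s } (an illustrative example, not from the paper).
def v(s):
    return max(s, 0.0) ** 2

def dini_upper(v, s, y, t=1e-6):
    return (v(s + t * y) - v(s)) / t

def parabolic_upper(v, s, y, z, v_plus, t=1e-3):
    return (v(s + t * y + 0.5 * t * t * z) - v(s) - t * v_plus) / (0.5 * t * t)

v_plus = dini_upper(v, 0.0, 1.0)                 # ~ 0: first-order term vanishes
v_pp = parabolic_upper(v, 0.0, 1.0, 0.0, 0.0)    # ~ 2: second-order growth of v
```

Here v⁺(0; 1) = 0 even though v grows along the direction; only the parabolic quotient, which divides by t²/2, recovers the value 2.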
We begin in §2 by describing the analytical tools that will be needed in the proof of our main result: the connection between second-order tangent sets and parabolic second-order directional derivatives, an intersection theorem for tangent sets, and a Ljusternik-type lemma. In §3, we present our main result, an upper bound on v⁺⁺(0; s, z) which is stated in terms of second-order directional derivatives of f, gᵢ, and hᵢ. We also use the fact that v⁺⁺(0; 0, z) = v⁺(0; z) to deduce an interesting corollary: an upper bound on v⁺(0; z) that, like recent results of Gauvin and Janin [15], Bonnans, Ioffe and Shapiro [8], Ioffe [19], and Minchenko and Sakolchik [22], takes advantage of second-order information about f, gᵢ, and hᵢ. Finally, in §4 we examine the important special case in which f, gᵢ, and hᵢ are C^{1,1}, i.e., are differentiable with a locally Lipschitz gradient function [18, 20, 21, 34]. In this case, our bounds for v⁺⁺(0; s, z) and v⁺(0; z) can be rewritten, via Lagrangian duality, in terms of Lagrange multipliers and simpler second-order directional derivatives.

We close this introduction with a compilation of basic definitions and notation. In the sequel, ‖·‖ will denote the Euclidean norm and ⟨·,·⟩ the usual inner product on Rⁿ. For x ∈ Rⁿ and ε > 0, we set B(x, ε) := {y ∈ Rⁿ | ‖y − x‖ ≤ ε}; and for S ⊂ Rⁿ, we define the distance function dist(x | S) := inf{‖y − x‖ | y ∈ S}. We denote the interior of S by int S and the relative interior of S by ri S. We say that S is closed near x ∈ S if there exists ε > 0 such that S ∩ B(x, ε) is closed. For a function f : Rⁿ → R̄ := [−∞, +∞], the epigraph of f is the set epi f := {(x, r) | f(x) ≤ r}, and the effective domain of f is defined by dom f := {x | f(x) < +∞}. If dom f ≠ ∅ and f never takes on the value −∞, then f is said to be proper. If epi f is closed near (x, f(x)), then f is said to be strictly lower semicontinuous (abbreviated strictly l.s.c.) at x.
The function f : Rⁿ → Rᵖ is said to be strictly differentiable at x ∈ Rⁿ [10] if there exists a linear mapping ∇f(x) : Rⁿ → Rᵖ such that for all y ∈ Rⁿ,

    lim_{(w,v,t)→(x,y,0⁺)} [f(w + tv) − f(w)] / t = ∇f(x)y.
For f : Rⁿ → R, we will say that f is C¹ at x ∈ Rⁿ if f is differentiable on some neighborhood of x and its derivative function ∇f(·) is continuous at x. We say that f is C^{1,1} at x if in addition ∇f(·) is Lipschitzian near x. Following Furukawa [13], we define f to be twice Neustadt differentiable at x with respect to d ∈ Rⁿ if f″(x; d, y) exists for all y ∈ Rⁿ.
2
Tangent Sets and Second-order Directional Derivatives
One fundamental idea in nonsmooth analysis is the connection between local conical approximations to sets, called tangent cones, and types of directional derivatives (see [1]). Specifically, suppose that f : Rⁿ → R̄ is finite at x ∈ Rⁿ, and let A be a concept of tangent cone. (We can think of A as a set-valued mapping such that for S ⊂ Rⁿ and x ∈ S, A(S, x) is a cone which approximates S near x.) Then the A directional derivative of f at x in the direction y is defined by

    f^A(x; y) := inf{ r | (y, r) ∈ A(epi f, (x, f(x))) }.

If A(epi f, (x, f(x))) is a closed cone, then f^A is defined precisely so that

    epi f^A(x; ·) = A(epi f, (x, f(x))),    (1)

and f^A(x; ·) will be lower semicontinuous. In this paper, we will work with f^A for two tangent cones: the adjacent cone, defined for S ⊂ Rⁿ and x ∈ S by
    T(S, x) := { y ∈ Rⁿ | ∀ε > 0, ∃λ > 0 such that ∀t ∈ (0, λ), ∃v ∈ B(y, ε) such that x + tv ∈ S },

and the Clarke tangent cone

    C(S, x) := { y ∈ Rⁿ | ∀ε > 0, ∃λ > 0 such that ∀z ∈ B(x, λ) ∩ S, ∀t ∈ (0, λ), ∃v ∈ B(y, ε) such that z + tv ∈ S }.
The properties of f^T and f^C are extensively discussed in [1, 33]. We mention here, in particular, that if f is Lipschitzian near x, then f^T(x; ·) = f⁺(x; ·) and f^C(x; ·) = f°(x; ·), where f°(x; ·) is the Clarke [10] directional derivative. A number of favorable properties possessed by f⁺ and f° for locally Lipschitzian functions are shared by f^T and f^C for some larger classes of functions.

The connection between tangent cones and directional derivatives has a second-order analogue in the correspondence between second-order tangent sets and parabolic second-order directional derivatives [1, 30, 32]. In this paper, we will be particularly concerned with the second-order adjacent set, defined for S ⊂ Rⁿ, x ∈ S, v ∈ Rⁿ by

    T²(S, x, v) := { y ∈ Rⁿ | ∀ε > 0, ∃λ > 0 such that ∀t ∈ (0, λ), ∃w ∈ B(y, ε) such that x + tv + (1/2)t²w ∈ S }.
We observe that T² is a generalization of T in the sense that T²(S, x, 0) = T(S, x), and that T²(S, x, v) is often not a cone; for example, if S = {(x, y) ∈ R² | y ≥ x²}, then T²(S, (0,0), (1,0)) = {(x, y) | y ≥ 2}. The direction v will generally be an element of T(S, x); in fact, T²(S, x, v) = ∅ if v ∉ T(S, x).

For f : Rⁿ → R̄, if f(x) and f^T(x; v) are finite, then the second-order directional derivative associated with T² is defined by

    d²_T f(x; v, y) := inf{ r | (y, r) ∈ T²(epi f, (x, f(x)), (v, f^T(x; v))) }.

Since T²(S, x, v) is always a closed set, this definition implies that

    epi d²_T f(x; v, ·) = T²(epi f, (x, f(x)), (v, f^T(x; v))).    (2)
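The parabola example above can be checked directly from the definition of T²: for S = {(x, y) : y ≥ x²} and v = (1, 0), points x₀ + tv + (1/2)t²w remain in S for small t when the second component of w exceeds 2, and leave S when it is below 2. The sampling below is a finite-precision illustration of the limiting definition, not a proof.

```python
# Finite sampling check of the example T^2(S, (0,0), (1,0)) = {(x, y) | y >= 2}
# for S = {(x, y) : y >= x^2}; an illustration of the definition only.
def in_S(p):
    x, y = p
    return y >= x * x

def parabolic_point(t, v, w):
    # x0 + t*v + (1/2) t^2 w with x0 = (0, 0)
    return (t * v[0] + 0.5 * t * t * w[0], t * v[1] + 0.5 * t * t * w[1])

v = (1.0, 0.0)
ts = [10.0 ** (-k) for k in range(2, 7)]
inside = all(in_S(parabolic_point(t, v, (0.0, 2.1))) for t in ts)       # w2 > 2
outside = not any(in_S(parabolic_point(t, v, (0.0, 1.9))) for t in ts)  # w2 < 2
```

With w₂ = 2.1 every sampled point satisfies y ≥ x², while with w₂ = 1.9 every sampled point violates it, in accordance with the threshold y ≥ 2.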
As a generalized limit of difference quotients, d²_T f(x; v, y) is obtained from

    [f(x + tv + (1/2)t²w) − f(x) − t f^T(x; v)] / ((1/2)t²)

with w near y and t → 0⁺.
This shows that d²_T f is a parabolic second-order derivative similar to those studied by Ben-Tal, Zowe, and others (see [1, 5-6, 13, 19, 30]). There are simpler expressions for d²_T f for special classes of functions. For example, if f is Lipschitzian near x, then it is not difficult to show that d²_T f(x; v, y) = f⁺⁺(x; v, y). If f is C¹ at x, then d²_T f (and similarly, f″) can be further simplified.

Proposition 2.1 Let f : Rⁿ → R be C¹ at x. Then for all v, y ∈ Rⁿ,

    d²_T f(x; v, y) = ∇f(x)y + d²₊f(x; v),    (3)

where

    d²₊f(x; v) := limsup_{t→0⁺} [f(x + tv) − f(x) − t∇f(x)v] / ((1/2)t²).

Moreover, if f is C^{1,1} at x, then d²_T f(x; ·, ·) is finite. Similarly, if f is C¹ at x and twice Neustadt differentiable at x with respect to v, then for all y ∈ Rⁿ,

    f″(x; v, y) = ∇f(x)y + d²f(x; v),    (4)

where

    d²f(x; v) := lim_{t→0⁺} [f(x + tv) − f(x) − t∇f(x)v] / ((1/2)t²).
PROOF. The proofs of (3) and (4) are based on ideas of [5, Lemma 6.4; 6, p. 484]. To prove (3), let v ∈ Rⁿ, y ∈ Rⁿ. For sufficiently small t > 0, we have by the mean value theorem that there exists θ_t ∈ (0, 1) such that

    f(x + tv + (1/2)t²y) − f(x + tv) = ⟨∇f(x + tv + (1/2)t²θ_t y), (1/2)t²y⟩.    (5)

Since f is C¹ at x, f^T(x; v) = ∇f(x)v and f is Lipschitzian near x. As mentioned above, d²_T f(x; v, y) = f⁺⁺(x; v, y), so by (5),

    d²_T f(x; v, y) ≤ limsup_{t→0⁺} ⟨∇f(x + tv + (1/2)t²θ_t y), y⟩ + d²₊f(x; v) = ∇f(x)y + d²₊f(x; v),

and

    d²_T f(x; v, y) ≥ liminf_{t→0⁺} ⟨∇f(x + tv + (1/2)t²θ_t y), y⟩ + d²₊f(x; v) = ∇f(x)y + d²₊f(x; v).
Hence (3) holds. The proof of (4) is similar.

Finally, suppose that f is C^{1,1} at x. Then there exist μ > 0 and M > 0 such that

    ‖∇f(x′) − ∇f(x″)‖ ≤ M‖x′ − x″‖  for all x′, x″ ∈ B(x, μ).

Choose δ > 0 small enough so that x + tv ∈ B(x, μ) for all t ∈ (0, δ). Now let t ∈ (0, δ). Again by the mean value theorem, there exists θ_t ∈ (0, 1) such that

    f(x + tv) − f(x) = ⟨∇f(x + θ_t tv), tv⟩.

Then

    |f(x + tv) − f(x) − t∇f(x)v| / ((1/2)t²) = |⟨∇f(x + θ_t tv) − ∇f(x), v⟩| / ((1/2)t) ≤ 2M‖v‖².

Hence d²₊f(x; v) is finite, and by (3), d²_T f(x; v, ·) is finite. □
It is worth noting that if f is twice Fréchet differentiable at x with second derivative ∇²f(x), then d²₊f(x; v) = d²f(x; v) = ∇²f(x)(v, v), so that

    f″(x; v, y) = d²_T f(x; v, y) = ∇f(x)y + ∇²f(x)(v, v).
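For a twice differentiable f the parabolic quotient thus collapses to a Hessian quadratic form, which is easy to verify numerically. The particular f below is an arbitrary smooth test function chosen only for this illustration.

```python
import math

# Check d^2_+ f(x; v) ~= <v, Hess f(x) v> for the smooth test function
# f(x1, x2) = sin(x1) + x1^2 * x2 (chosen only for illustration).
def f(x1, x2):
    return math.sin(x1) + x1 * x1 * x2

def grad(x1, x2):
    return (math.cos(x1) + 2.0 * x1 * x2, x1 * x1)

def hess_quad(x1, x2, v1, v2):
    # v^T (Hess f) v with Hess f = [[-sin x1 + 2 x2, 2 x1], [2 x1, 0]]
    return v1 * v1 * (-math.sin(x1) + 2.0 * x2) + 4.0 * x1 * v1 * v2

x1, x2, v1, v2 = 0.3, 0.7, 1.0, 2.0
t = 1e-4
g = grad(x1, x2)
quot = (f(x1 + t * v1, x2 + t * v2) - f(x1, x2)
        - t * (g[0] * v1 + g[1] * v2)) / (0.5 * t * t)
exact = hess_quad(x1, x2, v1, v2)
```

The second-order difference quotient agrees with the quadratic form up to the O(t) error contributed by the third derivative.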
These facts follow easily from Taylor's Theorem.

Equations (1) and (2) enable us to work with f^T, f^C, and d²_T f via T, C, and T². There are two properties of T² that will be especially useful to us here. One is the fact [30, Lemma 2.4] that if S ⊂ Rⁿ, x ∈ S, and v ∈ Rⁿ, then

    T²(S, x, v) + C(S, x) ⊂ T²(S, x, v).    (6)

The other is a theorem relating T²(∩ᵢ₌₁ᵐ Sᵢ, x, v) and ∩ᵢ₌₁ᵐ T²(Sᵢ, x, v) for locally closed sets Sᵢ with x ∈ ∩ᵢ₌₁ᵐ Sᵢ. These two expressions will be equal, in fact, if the sets C(Sᵢ, x) have "sufficient intersection" in the following sense:
Definition 2.2 (a) For K ⊂ Rᵖ, define Δⁿ_K := {(x₁, …, xₙ) | xᵢ ∈ K, x₁ = x₂ = … = xₙ}.
(b) Let K₁, …, Kₙ be nonempty convex cones in Rᵖ. These cones are said to be in strong general position if

    Δⁿ_{Rᵖ} − ∏ᵢ₌₁ⁿ Kᵢ = R^{np}.    (7)

The strong general position concept is discussed in [35, 33], where several equivalent ways of writing (7) are given. In particular,

    ∩ᵢ₌₁ᵐ Kᵢ − K_{m+1} = Rᵖ  for all m = 1, …, n−1,

is equivalent to (7). Another way to write (7) is ∩ᵢ₌₁ⁿ (Kᵢ − {xᵢ}) ≠ ∅ for all xᵢ ∈ Rᵖ, i = 1, …, n, which is often used in [1]. To put this condition in context, we observe that (7) implies ∩ᵢ₌₁ⁿ ri Kᵢ ≠ ∅ and is implied by the existence of an i such that Kᵢ ∩ (∩_{j≠i} int Kⱼ) ≠ ∅. In addition, we note that if K₁, …, Kₙ are in strong general position, then so are Kᵢ, i ∈ M, for any nonempty M ⊂ {1, …, n}. We can now state the aforementioned tangent set intersection theorem.

Theorem 2.3 [30] Let Sᵢ, i = 1, …, m, be subsets of Rⁿ which are closed near x ∈ ∩ᵢ₌₁ᵐ Sᵢ, and let v ∈ Rⁿ. Suppose that C(Sᵢ, x), i = 1, …, m, are in strong general position. Then
    T²(∩ᵢ₌₁ᵐ Sᵢ, x, v) = ∩ᵢ₌₁ᵐ T²(Sᵢ, x, v).

In working with the equality constraints in P(s), we will make use of a second-order Ljusternik-type lemma. This lemma is a consequence of the following theorem, a special case of a versatile result of Borwein [9, Theorem 2.1].

Theorem 2.4 [9] Let S ⊂ Rⁿ, and let H : Rⁿ → Rᵐ be strictly differentiable at x₀ ∈ H⁻¹(0) ∩ S. Suppose that S is closed near x₀ and

    ∇H(x₀)C(S, x₀) = Rᵐ.    (8)

Then there exist L > 0, δ > 0 such that for all x ∈ B(x₀, δ) ∩ S and all u ∈ B(0, δ),

    dist(x | H⁻¹(u) ∩ S) ≤ L dist(H(x) | u).    (9)
Lemma 2.5 Let S ⊂ Rⁿ, and let H : Rⁿ → Rᵐ be strictly differentiable at x₀ ∈ S ∩ H⁻¹(0). Suppose that S is closed near x₀, (8) holds, and H is twice Neustadt differentiable at x₀ with respect to d. Let s = ∇H(x₀)d, z ∈ Rᵐ. Then

    T²(S, x₀, d) ∩ H″(x₀; d, ·)⁻¹(z)
      = { y ∈ Rⁿ | ∀ε > 0, ∃λ > 0 such that ∀t ∈ (0, λ), ∃y(t) ∈ B(y, ε) with x₀ + td + (1/2)t²y(t) ∈ S ∩ H⁻¹(ts + (1/2)t²z) }.    (10)
given. By Theorem 2.4, there exist L > 0, 6 G (0, e) such n 5 and u G £(0,<5), (9) holds. Let y G r 2 (5,x 0 ,) with there exists A > 0 such that for all t G (0, A), ts + t2zj2 G w(t) G B(y,S/2) with x 0 -Md + t2w{t)/2 G 5 n 5 ( x 0 , >5)
and
||if(x 0 4- td+t2w(t)/2)
-ts-
t2z/2\\
2
t j1 Let t G (0, A), and choose w(t) as above. a(t) G tf_1(ts + t2z/2) n 5 with
< 6/4L.
Then by (9), there exists a point
||x 0 + t(i + f 2 u ; ( < ) / 2 - a ( 0 | | £||ff (xp + <
+
e e 4 S '
aft) — In — td { - > —J . Then x 0 +
and
ll»(0-y|l<
a(r)-x0-id-<2iu(t)/2|| « 2 /2
+ M*)-y||<«-
Hence i/ belongs to the set on the right-hand side of (10). The proof of the opposite inclusion is routine. □
3
U p p e r Bounds on v ++
Throughout the remainder of this paper, we denote the feasible set and solution set of V(s) by FW:=fxGR" 9M~Si'ie-ir } v ' | hi(x) = s m + 1 , i £ i J
D. Ward
426 and il(a) := {x £ F(s) | f(x) = v(s)},
respectively. We let x0 G n(0) and define /(x 0 ) := {i G J \ gi(x0) = 0}. We will also be interested in the set of directions f fT(x0-d) = v+(Q-s), ) E(s) := Id G R" gf(x0;d) < sh i G I(x0), \ ; {
Vhi(x0)d = sm+i,
ie L
J
and for d G S(5), in the index set I(x0,d)
:= {i G I(x0) \ gj(x0;d)
= «,•}.
We will often assume that io and
A s s u m p t i o n 3.1 (a) / and gt, i G I(XQ), are strictly l.s.c. at xo', gi, i G J\ I(XQ) are continuous at x0. (b) /» is C 1 at xo and twice Neustadt differentiable at x0 with respect to d, and V/i(x 0 )R n = R". (c) / T (io;-)i 9l(xo'r), 7(x0)\/(io,d)-
i G /(zo) are proper; domc/j-^^io; d, ■) = R n for all i G
(d) d o m / c ( x o ; ■)> dom^rf (x 0 ; •), i G I(xo), and V/i(i 0 ) _ 1 (0) are in strong general position. (e)
d o m / c ( x 0 ; •)
Vh(xo)-l(0)
n{y\g?(xo;y)<0,
Vi€ I(x0)} ^
R e m a r k 3.2 (i) We note that if a function / : R n —t R is Lipschitzian near x, then d o m / c ( x ; •) = R" and fT(x; •) is proper. Thus if / and each gt are Lipschitzian near x 0 , parts (a) and (d) of Assumption 3.1 hold and parts (c) and (e) are greatly simplified. (ii)If/ : R" -> R i s C M a t x, then dom<4/(x; i>, ■) = R" for all v by Proposition 2.1. If / is twice Frechet differentiable at x 0 , then in particular / is twice Neustadt differentiable at x0 in every direction d. It follows that if / and each fl, are C 1,1 at xo, and if each /i, is twice Frechet differentiable at x 0 , then Assumption 3.1 (a), (c), and (d) are satisfied and (e) reduces to Vft(x o )- 1 (0) n {y | Vgi{x0)y < 0, V. G /(x 0 )} ± 0. In other words, Assumption 3.1 reduces in this setting to the MangasarianFromovitz constraint qualification.
427
Parabolic Directional Derivative We now establish our main result:
Theorem 3.3 Suppose that Assumption 3.1 is satisfied at XQ £ 0(0), d £ £(s). Then for all z £ R m + P , v++{0;s,z)
<mi{d2Tf(x0;d,y)
d\gi(x0;d,y) < zt, i £ I(x0,d), h"(x0;d,y) = zm+i,i£ L
\ J"
.^
PROOF. Let z £ R m + P , and suppose that y satisfies d\gi(x0\d,y) < zi, V» e I(x0,d), h"(x0;d,y) = zm+i, Vi 6 L, and dyf{xa\d,y)
< r.
++
It suffices to show that u (0; 5, 2) < r. To that end, let e > 0 be given, and let w be an element of the set in Assumption 3.1(e) with fc(x0; w) < a. Choose 7/ > 0 such that max{||7jiw||, \rja\} < e/2. Since (y, r) £ T 2 (epi / , (x 0 , f(x0)),
(d, f(x0;
d)))
and (w, a) £ CMepi/, (x 0 ,/(x 0 ))J, (6) implies that (y, r) + r,(w, a) £ T 2 (epi / , (*0) /(*<,)), (d, f(x0;
d)));
i.e., dj-f(x0; d,y + r)w) < r + -qa. Similarly, djgi(x0; d,y + i)w) < zu Vi £ I(x0, d) so we may choose S £ (0, e/2) such that <4ji(z 0 ; d, V + Vw) < z«" - <5, Vi € 7(x 0 , ). Since /i is C 1 at x 0 and Vh(x0)w for i = 1,.. . ,p.
= 0, Proposition 2.1 gives h"(x0; d,y + r/w) = z m + ,
Now define the sets D0 := { ( x , r 0 , . . . , r m ) £ R " + m + 1 | f(x) < r 0 ) , A : = { ( x , r 0 , . . . , r m ) G R " + m + 1 | Si[x) < n), 1 £ J; and take
s := nSoA, /9:=(^o,/(xo),0,...,0)eR"+
m+1
,
D. Ward
428 and 7 :=
(d,fT{x0\d),31,...,sm).
As in (2) we obtain T2(D<,p,
(y + Vw,r + r,a,z1-6,...,zm-6)£
7),
Vi € J(x 0 , d) U {0},
and by Assumption 3.1(a), we have T2(Di,p,f)
= R " + m + \ Vi € J \ /(xo).
By Assumption 3.1(c), y + i/m 6 domdj.5,(z 0 ;d, •) for i e 7"(#o)\ I(xo,d), Lemma 2.7] implies that
and so [30,
(i/ + 7jiu,a) 6 T 2 f epigi,(a:o,5i(io)),(rf,ff1T(xo; Hence (y + r)w,r + r,a,zi-6,...,
*m - 6) € T 2 ( A , 0,7), Vz £ 7(i 0 ) \ 7(s 0 , d),
and we conclude that (y + r,w,r +
Va,zl-6,...,zrn-6)€n™=0T
2
(Di,fl,"f).
Moreover, Assumption 3.1(d) implies that C(7?,-,/3), i = 0 , . . . , m, are in strong gen eral position, so by Theorem 2.3, {y + ijw,r + 7)<x,zi - 6,...,zm
-6)
g
T2(S,/?,7).
Next define 77 : R"+m+» -> W by 77(x,r 0 ,... , r m ) = h{x). Again, it follows from Assumption 3.1(d) that V77(/3)-1(0) - n£0C{DtJ)
= R"+m+1,
(12)
and by Assumption 3.1(b), V77(/?)R n+m+1 = R" Since (12) and (13) are equivalent to R* = V77(/?)[n™0C(7)„/?)], [1, Corollary 4.3.6] implies that
R" = VH(fi)C{S,fi).
(13)
Parabolic Directional Derivative
429
We may then apply Lemma 2.5 to conclude that there exists $\Delta > 0$ such that for all $t \in (0, \Delta)$, there exists
$$(y(t), r(t), z_1(t), \ldots, z_m(t)) \in B_{\varepsilon/2}\big((y + \eta w,\ r + \eta a,\ z_1 - \delta, \ldots, z_m - \delta)\big)$$
with
$$\beta + t\gamma + t^2(y(t), r(t), z_1(t), \ldots, z_m(t))/2 \in S$$
and
$$h(x_0 + td + t^2 y(t)/2) = t(s_{m+1}, \ldots, s_{m+p}) + t^2(z_{m+1}, \ldots, z_{m+p})/2;$$
i.e.,
$$g_i(x_0 + td + t^2 y(t)/2) \le t s_i + t^2 z_i(t)/2 \le t s_i + t^2 z_i/2, \quad \forall i \in J;$$
$$h_i(x_0 + td + t^2 y(t)/2) = t s_{m+i} + t^2 z_{m+i}/2, \quad \forall i \in L;$$
and
$$f(x_0 + td + t^2 y(t)/2) \le f(x_0) + t f_T(x_0; d) + t^2 r(t)/2.$$
Hence $x_0 + td + t^2 y(t)/2 \in \Omega(ts + t^2 z/2)$. Since $d \in \Sigma(s)$ and $f_T(x_0; \cdot)$ is proper, $-\infty < f_T(x_0; d) = v^+(0; s)$, and by [31, Theorem 3.3], $v^+(0; s) < +\infty$. Taking into account the fact that $x_0 \in \Omega(0)$, we obtain
$$\frac{v(ts + t^2 z/2) - v(0) - t v^+(0; s)}{t^2/2} \le \frac{f(x_0 + td + t^2 y(t)/2) - f(x_0) - t f_T(x_0; d)}{t^2/2} \le r(t) \le r + \eta a + \varepsilon/2 < r + \varepsilon.$$
Since $\varepsilon$ is arbitrary, we conclude that $v^{++}(0; s, z) \le r$, and (11) holds. □
Remark 3.4 (i) Since the inequality $v^{++}(0; s, z) \ge d_T^2 v(0; s, z)$ is always true, the right-hand side of (11) is also an upper bound for $d_T^2 v(0; s, z)$ under the assumptions of Theorem 3.3. In fact, the inequality
$$d_T^2 v(0; s, z) \le \inf\big\{\, d_T^2 f(x_0; d, y) \,\big\} \eqno(14)$$
(the infimum taken over the same constraint set of $(d, y)$ as in (11), with $d_T^2 g_i(x_0; d, y)$ bounding the inequality-constraint terms) holds under hypotheses weaker than Assumption 3.1 (see [32, Theorem 4.4]). Assumption 3.1 includes a Mangasarian-Fromovitz type constraint qualification which is not required in a proof of (14) but plays a crucial role in our proof of the sharper inequality (11).
(ii) Assumption 3.1 does not guarantee, in general, that the right-hand side of (11) will be less than $+\infty$. We can ensure that this expression is less than $+\infty$ by adding a further condition, (15), to Assumption 3.1. If $g_i$ and $h_i$ are $C^{1,1}$, however, then Assumption 3.1 implies (15) by Proposition 2.1.

As an immediate consequence of Theorem 3.3, we deduce a new upper bound for $v^+$.

Corollary 3.5 Suppose that Assumption 3.1 is satisfied at $x_0 \in \Omega(0)$, $d \in \Sigma(0)$. Then for all $z \in \mathbf{R}^{m+p}$,
$$v^+(0; z) \le \inf\big\{\, d_T^2 f(x_0; d, y) \,\big\}, \eqno(16)$$
the infimum being taken over the same constraint set as in (11) with $s = 0$.

PROOF. Let $s := 0$ in Theorem 3.3. Since $v^{++}(0; 0, z) = v^+(0; z)$, (11) reduces to (16) in this case. □
Corollary 3.5 gives a bound on the upper Dini derivative of $v$ that takes advantage of second-order information about $f$, $g_i$, and $h_i$. Not surprisingly, such bounds can be sharper than those which use only first-order information (see [15-17] and Example 4.4 in the next section). For the case in which $f$, $g_i$, and $h_i$ are $C^2$, other results of this sort have been obtained by Gauvin and Janin [16], Bonnans, Ioffe and Shapiro [8], Ioffe [19], and Minchenko and Sakolchik [22]. We illustrate Theorem 3.3 and Corollary 3.5 with an example:

Example 3.6 In problem $\mathcal{P}(s)$, let $n = 2$, $m = 1$, $p = 0$, and define $f : \mathbf{R}^2 \to \mathbf{R}$ by $f(x_1, x_2) = x_1^2 + |x_1 - 1| + |x_2|$ and $g_1 : \mathbf{R}^2 \to \mathbf{R}$ by $g_1(x_1, x_2) = 4x_1 - x_2^2$. In this example, $\Omega(0) = \{(0, 0)\}$ and $\Sigma(0) = \{(0, 0)\}$. Thus (16) reduces to
$$v^+(0; z) \le \inf\{-y_1 + |y_2| \mid 4y_1 \le z\} = -z/4.$$
In fact, one can calculate directly that $v^+(0; z) = -z/4$ for $z \in \mathbf{R}$. For $s \in \mathbf{R}$, we therefore obtain
$$\Sigma(s) = \{(d_1, d_2) \mid -s/4 = -d_1 + |d_2|,\ 4d_1 \le s\} = \{(s/4, 0)\},$$
and (11) gives
$$v^{++}(0; s, z) \le \inf\{s^2/8 - y_1 + |y_2| \mid 4y_1 \le z\} = s^2/8 - z/4.$$
Again, one can calculate directly that $v^{++}(0; s, z) = s^2/8 - z/4$.
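The expansion in Example 3.6 can be checked numerically. The sketch below assumes the reconstructed data $f(x_1, x_2) = x_1^2 + |x_1 - 1| + |x_2|$ and $g_1(x_1, x_2) = 4x_1 - x_2^2$; near $s = 0$ the minimizer sits at $(s/4, 0)$, so the marginal value should satisfy $v(s) = 1 - s/4 + s^2/16$, whose first- and second-order terms are exactly $v^+(0; s) = -s/4$ and $v^{++}(0; s, z) = s^2/8 - z/4$.

```python
import numpy as np

# Crude global minimization of f over the feasible set {g1 <= s} by grid
# search; the data are the (reconstructed) functions of Example 3.6.
def v(s):
    x1 = np.linspace(-1.0, 1.5, 1001)
    x2 = np.linspace(-1.5, 1.5, 1001)
    X1, X2 = np.meshgrid(x1, x2)
    f = X1**2 + np.abs(X1 - 1.0) + np.abs(X2)
    feasible = 4.0 * X1 - X2**2 <= s
    return f[feasible].min()

for s in [0.0, 0.1, 0.2]:
    # grid value vs. the local closed form 1 - s/4 + s^2/16
    print(s, v(s), 1 - s/4 + s**2/16)
```

The grid search only certifies the claimed values up to the grid spacing, but it agrees with the closed form to about three decimal places.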
4  The $C^{1,1}$ Case
It has become evident in recent years that much of the theory of nonlinear programming and second-order differential analysis (for example, optimality conditions, Taylor expansions, and sensitivity results) can be extended very nicely from a $C^2$ setting to a $C^{1,1}$ setting (see for example [18, 20, 21, 34]). Just as first-order nonsmooth analysis is most effective for locally Lipschitzian functions [10], so also many theorems of second-order nonsmooth analysis take on a particularly simple form for $C^{1,1}$ functions. In this section, we will see that Theorem 3.3 can be significantly simplified in a $C^{1,1}$ setting. Our simplified bound will be stated in terms of a Lagrange multiplier set, defined for $d \in \Sigma(s)$ by
$$M(x_0, d) := \Big\{\lambda \in \mathbf{R}_+^m \times \mathbf{R}^p \ \Big|\ \lambda_i = 0,\ \forall i \in J \setminus I(x_0, d);\ \nabla f(x_0) + \sum_{i=1}^m \lambda_i \nabla g_i(x_0) + \sum_{i=1}^p \lambda_{m+i} \nabla h_i(x_0) = 0 \Big\}.$$
Theorem 4.1 Let $f$; $g_i$, $i \in J$; $h_i$, $i \in L$; be $C^{1,1}$ at $x_0 \in \Omega(0)$. Suppose that $\Sigma(s)$ is nonempty, $h$ is twice Neustadt differentiable at each $d \in \Sigma(s)$, $\nabla h(x_0)\mathbf{R}^n = \mathbf{R}^p$, and
$$\nabla h(x_0)^{-1}(0) \cap \{y \mid \nabla g_i(x_0)\, y < 0,\ \forall i \in I(x_0)\} \ne \emptyset. \eqno(17)$$
Let $z \in \mathbf{R}^{m+p}$. Then
$$v^{++}(0; s, z) \le \inf_{d \in \Sigma(s)}\ \max_{\lambda \in M(x_0, d)} \Big[ d^2 f(x_0; d) + \sum_{i \in I(x_0,d)} \lambda_i\, d^2 g_i(x_0; d) + \sum_{i=1}^p \lambda_{m+i}\, d^2 h_i(x_0; d) - \langle\lambda, z\rangle \Big]. \eqno(18)$$
PROOF. For $d \in \Sigma(s)$, $z \in \mathbf{R}^{m+p}$, consider the convex program
$$(\mathcal{P}_1)\qquad \alpha(d) := \inf\Big\{ \langle\nabla f(x_0), y\rangle + d^2 f(x_0; d) \ \Big|\ \langle\nabla g_i(x_0), y\rangle + d^2 g_i(x_0; d) \le z_i,\ i \in I(x_0, d);\ \langle\nabla h_i(x_0), y\rangle + d^2 h_i(x_0; d) = z_{m+i},\ i \in L \Big\}.$$
Our hypotheses guarantee that Assumption 3.1 is satisfied, so by Theorem 3.3 and Proposition 2.1, $v^{++}(0; s, z) \le \inf_{d \in \Sigma(s)} \alpha(d)$. For $\lambda \in \mathbf{R}^{m+p}$ and $y \in \mathbf{R}^n$, define
$$L(\lambda, y) := \Big\langle \nabla f(x_0) + \sum_{i \in I(x_0,d)} \lambda_i \nabla g_i(x_0) + \sum_{i=1}^p \lambda_{m+i} \nabla h_i(x_0),\ y \Big\rangle.$$
Then the Lagrangian dual of $(\mathcal{P}_1)$ is
$$(\mathcal{D}_1)\qquad \beta(d) := \sup_{\lambda}\ \inf_{y}\ \Big[ L(\lambda, y) - \langle\lambda, z\rangle + d^2 f(x_0; d) + \sum_{i \in I(x_0,d)} \lambda_i\, d^2 g_i(x_0; d) + \sum_{i=1}^p \lambda_{m+i}\, d^2 h_i(x_0; d) \Big].$$
If $\lambda \in M(x_0, d)$, then $L(\lambda, y) = 0$ for all $y \in \mathbf{R}^n$; otherwise, $\inf_{y \in \mathbf{R}^n} L(\lambda, y) = -\infty$. It follows that
$$\beta(d) = \sup_{\lambda \in M(x_0, d)} \Big[ -\langle\lambda, z\rangle + d^2 f(x_0; d) + \sum_{i \in I(x_0,d)} \lambda_i\, d^2 g_i(x_0; d) + \sum_{i=1}^p \lambda_{m+i}\, d^2 h_i(x_0; d) \Big].$$
The constraint qualification (17) implies that the Mangasarian-Fromovitz constraint qualification is satisfied; the multiplier set $M(x_0, d)$ is therefore nonempty and compact [14], and so $\beta(d)$ is finite and attained for some $\lambda \in M(x_0, d)$. Condition (17) also implies that Slater's condition is satisfied for $(\mathcal{P}_1)$. By Lagrangian duality (see e.g. [4, Theorem 6.2.4]), it follows that $\beta(d) = \alpha(d)$. Therefore (18) holds. □

We illustrate Theorem 4.1 with an example from [15-17].

Example 4.2 In problem $\mathcal{P}(s)$, let $n = 2$, $m = 2$, $p = 0$, and define $f(x_1, x_2) = -x_2$, $g_1(x_1, x_2) = x_1^2 + x_2$, $g_2(x_1, x_2) = -x_1^2 + x_2$. Then $\Omega(0, 0) = \{(0, 0)\}$ and $v(0, 0) = 0$. Let $x_0 = (0, 0)$, $s = (0, 1)$. One may calculate directly that $v^+((0, 0); s) = 0$. If $z = (1, 1)$, then
$$v^{++}((0, 0); s, z) = \limsup_{t \to 0^+} \frac{v(t^2/2,\ t + t^2/2)}{t^2/2} = -1.$$
On the other hand, $\Sigma(s) = \{(d_1, d_2) \mid d_2 = 0\}$ and $I(x_0, (d_1, 0)) = \{1\}$, so $M(x_0, (d_1, 0)) = \{(1, 0)\}$ and the right-hand side of (18) becomes
$$\inf_{d_1 \in \mathbf{R}} (2d_1^2 - 1) = -1.$$
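The limit in Example 4.2 can be confirmed numerically; the sketch below takes the example's data as printed ($f = -x_2$, $g_1 = x_1^2 + x_2$, $g_2 = -x_1^2 + x_2$), so that $v(s_1, s_2)$ reduces to a one-dimensional search over $x_1$, and uses the fact that $v^+((0,0); s) = 0$, so the first-order term drops out of the difference quotient.

```python
import numpy as np

# v(s1, s2) = -max{ x2 : x1^2 + x2 <= s1, -x1^2 + x2 <= s2 },
# computed by a 1-D grid search over x1.
def v(s1, s2):
    x1 = np.linspace(-2.0, 2.0, 40001)
    return -np.max(np.minimum(s1 - x1**2, s2 + x1**2))

# second-order difference quotients along s = (0, 1), z = (1, 1)
for t in [1e-1, 1e-2, 1e-3]:
    ratio = (v(t**2/2, t + t**2/2) - v(0.0, 0.0)) / (t**2/2)
    print(t, ratio)   # approaches -1, the claimed v^{++}((0,0); s, z)
```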
So with $z = (1, 1)$, the bound in (18) is attained.

If we let $s = 0$ in Theorem 4.1, we obtain the following $C^{1,1}$ version of Corollary 3.5:

Corollary 4.3 Let $f$; $g_i$, $i \in J$; $h_i$, $i \in L$; be $C^{1,1}$ at $x_0 \in \Omega(0)$. Suppose that $h$ is twice Neustadt differentiable at each $d$ in
$$\Sigma(0) = \{d \mid \nabla f(x_0)\, d = 0;\ \nabla g_i(x_0)\, d \le 0,\ i \in I(x_0);\ \nabla h(x_0)\, d = 0\},$$
$\nabla h(x_0)[\mathbf{R}^n] = \mathbf{R}^p$, and (17) holds. Let $z \in \mathbf{R}^{m+p}$. Then
$$v^+(0; z) \le \inf_{d \in \Sigma(0)}\ \max_{\lambda \in M(x_0, d)} \Big[ d^2 f(x_0; d) + \sum_{i \in I(x_0,d)} \lambda_i\, d^2 g_i(x_0; d) + \sum_{i=1}^p \lambda_{m+i}\, d^2 h_i(x_0; d) - \langle\lambda, z\rangle \Big] < +\infty. \eqno(19)$$
It is interesting to compare Corollary 4.3 with the bounds in [2, 10, 17, 29, 31], which use only first-order information about $f$, $g_i$, and $h_i$. For example, if $f$, $g_i$, and $h_i$ are strictly differentiable at $x_0 \in \Omega(0)$, $\nabla h(x_0)\mathbf{R}^n = \mathbf{R}^p$, and (17) holds, then by Theorem 4.4 of [31],
$$v^+(0; z) \le \max\{-\langle\lambda, z\rangle \mid \lambda \in M(x_0)\} < +\infty, \eqno(20)$$
where
$$M(x_0) := \Big\{\lambda \in \mathbf{R}_+^m \times \mathbf{R}^p \ \Big|\ \lambda_i = 0,\ \forall i \in J \setminus I(x_0);\ \nabla f(x_0) + \sum_{i=1}^m \lambda_i \nabla g_i(x_0) + \sum_{i=1}^p \lambda_{m+i} \nabla h_i(x_0) = 0 \Big\}.$$
The inequality (20) holds under weaker hypotheses than those of Corollary 4.3. However, if the hypotheses of Corollary 4.3 are satisfied, then (19) gives a sharper bound than (20). We illustrate this point with an example.

Example 4.4 In Example 4.2, let $x_0 = (0, 0)$, $z = (1, 0)$. As in [17], one may calculate that $v^+((0, 0); z) = -1/2$, while the right-hand side of (20) is 0, so that the bound in (20) is not tight. However, $\Sigma(0, 0) = \{(d_1, d_2) \mid d_2 = 0\}$, so that
$$\alpha(d_1) = \begin{cases} -2d_1^2 & \text{if } |d_1| \le .5, \\ 2d_1^2 - 1 & \text{if } |d_1| > .5. \end{cases}$$
Since $\inf_{d_1 \in \mathbf{R}} \alpha(d_1) = -1/2$, the bound in (19) is attained in this example. Minchenko and Sakolchik [22] compute $\lim_{t \to 0^+}(v(tz) - v(0))/t$ under an additional Hölder condition on the solution multifunction. Their work strengthens previous work of Gauvin and Janin [16].
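The numbers in Example 4.4 can also be checked numerically. The sketch below reuses the Example 4.2 data ($f = -x_2$, $g_1 = x_1^2 + x_2$, $g_2 = -x_1^2 + x_2$) with $z = (1, 0)$: the difference quotients of $v$ approach $-1/2$, and the piecewise function $\alpha(d_1)$ above has infimum $-1/2$.

```python
import numpy as np

# v(s1, s2) by 1-D grid search, as in the previous check
def v(s1, s2):
    x1 = np.linspace(-2.0, 2.0, 40001)
    return -np.max(np.minimum(s1 - x1**2, s2 + x1**2))

# upper Dini quotient v^+((0,0); z) for z = (1, 0): approaches -1/2
for t in [1e-2, 1e-3]:
    print(t, (v(t, 0.0) - v(0.0, 0.0)) / t)

# infimum of the piecewise alpha(d1): equals -1/2 (attained at |d1| = .5)
d1 = np.linspace(-2.0, 2.0, 40001)
alpha = np.where(np.abs(d1) <= 0.5, -2.0 * d1**2, 2.0 * d1**2 - 1.0)
print(alpha.min())
```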
5  Conclusion

We have established an upper bound for $v^{++}(0; s, z)$ under weaker smoothness assumptions than have previously been used for such estimates. An interesting special case of this bound is an upper bound for the upper Dini directional derivative of $v$ that takes advantage of second-order information about the objective and constraint functions.

There are several open questions that remain to be addressed. For example, when can equality be guaranteed to hold in our upper bounds? Can a lower bound for $v^{++}$ be derived under the weakened smoothness assumptions of this paper? Can our results be incorporated into a more general theory that also includes bounds on second-order directional derivatives of the solution function or multifunction of a nonsmooth program? Since there are generalizations of Theorem 2.4 that do not require strict differentiability, is it possible to weaken Assumption 3.1(b)? We hope to address these questions in future work.

Acknowledgement. I am grateful for the perceptive comments of Professor Marcin Studniarski and the referees.
References

[1] J.-P. Aubin and H. Frankowska, Set-Valued Analysis, Birkhauser, Boston, 1990.
[2] A. Auslender, Differentiable stability in non convex and non differentiable programming, Mathematical Programming Study 10 (1979) 29-41.
[3] A. Auslender and R. Cominetti, First and second-order sensitivity analysis of nonlinear programs under directional constraint qualification conditions, Optimization 21 (1990) 351-363.
[4] M. S. Bazaraa, H. F. Sherali, and C. M. Shetty, Nonlinear Programming: Theory and Algorithms, Wiley, New York, 1993.
[5] A. Ben-Tal and J. Zowe, A unified theory of first and second-order conditions for extremum problems in topological vector spaces, Mathematical Programming Study 19 (1982) 39-76.
[6] A. Ben-Tal and J. Zowe, Directional derivatives in nonsmooth optimization, Journal of Optimization Theory and Applications 47 (1985) 483-490.
[7] J. F. Bonnans and R. Cominetti, Perturbed optimization in Banach spaces I: A general theory based on a weak directional constraint qualification, Preprint, 1993.
[8] J. F. Bonnans, A. D. Ioffe, and A. Shapiro, Développement de solutions exactes et approchées en programmation non linéaire, C.R. Acad. Sci. Paris 315 (1992) 119-123.
[9] J. M. Borwein, Stability and regular points of inequality systems, Journal of Optimization Theory and Applications 48 (1986) 9-52.
[10] F. H. Clarke, Optimization and Nonsmooth Analysis, Wiley, New York, 1983.
[11] V. F. Dem'yanov and B. Pevnyi, Expansion with respect to a parameter of the extremal values of game problems, U.S.S.R. Computational Mathematics and Mathematical Physics 14 (1974) 33-45.
[12] A. V. Fiacco, Introduction to Sensitivity and Stability Analysis in Nonlinear Programming, Academic Press, New York, 1983.
[13] N. Furukawa, A second-order extension of Ljusternik's theorem without twice Fréchet differentiability condition, Bulletin of Informatics and Cybernetics 25 (1992) 53-59.
[14] J. Gauvin, A necessary and sufficient regularity condition to have bounded multipliers in nonconvex programming, Mathematical Programming 12 (1977) 136-138.
[15] J. Gauvin and R. Janin, Directional behaviour of optimal solutions in nonlinear mathematical programming, Mathematics of Operations Research 13 (1988) 629-649.
[16] J. Gauvin and R. Janin, Directional Lipschitzian optimal solutions and directional derivatives for the optimal value function in nonlinear mathematical programming, in: H. Attouch, J.-P. Aubin, F. Clarke, I. Ekeland (Eds.), Analyse Non Linéaire (Gauthier-Villars, Paris, 1989), 305-324.
[17] J. Gauvin and J. W. Tolle, Differential stability in nonlinear programming, SIAM J. Control Optimization 15 (1977) 294-311.
[18] J.-B. Hiriart-Urruty, J.-J. Strodiot, and V. Nguyen, Generalized Hessian matrix and second-order optimality conditions for problems with $C^{1,1}$ data, Applied Mathematics and Optimization 11 (1984) 43-56.
[19] A. D. Ioffe, On sensitivity analysis of nonlinear programs in Banach spaces: the approach via composite unconstrained optimization, SIAM J. Optimization 4 (1994) 1-43.
[20] D. Klatte and K. Tammer, On second-order sufficient optimality conditions for $C^{1,1}$ optimization problems, Optimization 19 (1988) 169-180.
[21] B. Kummer, An implicit-function theorem for $C^{0,1}$-equations and parametric $C^{1,1}$-optimization, Journal of Mathematical Analysis and Applications 158 (1991) 35-46.
[22] L. I. Minchenko and P. P. Sakolchik, Hölder behaviour of optimal solutions and directional differentiability of marginal functions in nonlinear programming, Preprint, 1994.
[23] R. T. Rockafellar, Marginal values and second-order necessary conditions for optimality, Mathematical Programming 26 (1983) 245-286.
[24] A. Seeger, Second-order directional derivatives in parametric optimization problems, Mathematics of Operations Research 13 (1988) 124-139.
[25] A. Shapiro, Second-order derivatives of extremal-value functions and optimality conditions for semi-infinite programs, Mathematics of Operations Research 10 (1985) 207-219.
[26] A. Shapiro, Second-order sensitivity analysis and asymptotic theory of parametrized nonlinear programs, Mathematical Programming 33 (1985) 280-299.
[27] A. Shapiro, Sensitivity analysis of nonlinear programs and differentiability properties of metric projections, SIAM J. Control and Optimization 26 (1988) 628-645.
[28] A. Shapiro, Perturbation theory of nonlinear programs when the set of optimal solutions is not a singleton, Applied Mathematics and Optimization 18 (1988) 215-229.
[29] D. E. Ward, Differential stability in non-Lipschitzian optimization, Journal of Optimization Theory and Applications 73 (1992) 101-120.
[30] D. E. Ward, Calculus for parabolic second-order derivatives, Set-Valued Analysis 1 (1993) 213-246.
[31] D. E. Ward, Dini derivatives of the marginal function of a non-Lipschitzian program, SIAM J. Optimization, to appear.
[32] D. E. Ward, Epiderivatives of the marginal function in nonsmooth parametric optimization, Optimization 31 (1994) 47-61.
[33] D. E. Ward and J. M. Borwein, Nonsmooth calculus in finite dimensions, SIAM J. Control and Optimization 25 (1987) 1312-1340.
[34] X. Q. Yang and V. Jeyakumar, Generalized second-order directional derivatives and optimization with $C^{1,1}$ functions, Optimization 26 (1992) 165-185.
[35] C. Zalinescu, On convex sets in general position, Linear Algebra and its Applications 64 (1985) 191-198.
Recent Advances in Nonsmooth Optimization, pp. 438-458
Eds. D.-Z. Du, L. Qi and R.S. Womersley
©1995 World Scientific Publishing Co Pte Ltd

A SLP Method with a Quadratic Correction Step for Nonsmooth Optimization

Jianzhong Zhang
Department of Mathematics, City Polytechnic of Hong Kong, Hong Kong

Chengxian Xu
Department of Mathematics, Xian Jiaotong University, P.R. China

Yuan-An Fan
Frank Russell Co., Tacoma, USA
Abstract

To improve the Fletcher-Sainz de la Maza method for composite nonsmooth optimization problems, it is suggested in this paper that a method by Fontecilla can be incorporated to form a two-step movement. The new algorithm does not calculate a pair of orthogonal bases and thus the discontinuity problem pointed out by Byrd and Schnabel is avoided. Also, Powell's sufficient condition for superlinear convergence holds under rather mild conditions in the modified version. It is shown that the revised method is globally convergent with a locally superlinear rate. Computational experiments have been conducted and the numerical results show that the performance of this new version is satisfactory.
1  Introduction

In this paper we consider the problem of minimizing the composite nonsmooth function
$$\min_{x \in \mathbf{R}^n} \psi(x) := f(x) + h(c(x)) \eqno(1.1)$$
where $f : \mathbf{R}^n \to \mathbf{R}^1$ and $c : \mathbf{R}^n \to \mathbf{R}^m$ are twice continuously differentiable, and $h : \mathbf{R}^m \to \mathbf{R}^1$ is positively homogeneous (i.e. $h(\alpha c) = \alpha h(c)$ for all $\alpha \ge 0$) and polyhedral convex. Problem (1.1) has many applications; for example, it occurs when solving a system of nonlinear equations $c_i(x) = 0$, $i = 1, 2, \ldots, m$ ($m \ge n$) by minimizing $\|c(x)\|_1$ or $\|c(x)\|_\infty$, and when finding a local solution to a nonlinear programming problem using exact penalty functions [6].

The first order necessary condition for $x_*$ to be a local solution of (1.1) is that there exists a vector of multipliers $\lambda_* \in \partial h(c_*)$ such that
$$g_* + A_* \lambda_* = 0 \eqno(1.2)$$
where $g_* = \nabla f(x_*)$ and $A_* = [\nabla c_1(x_*), \ldots, \nabla c_m(x_*)]$ [6]. The subdifferential set $\partial h(c_*) = \partial h(c(x_*))$ of the polyhedral convex function $h(c(x))$ can be expressed as
$$\partial h(c_*) = \{\lambda \mid \lambda = \lambda_* + D_* u : u \in U\} \eqno(1.3)$$
where $D_*$ is an $m \times t$ matrix whose columns form a basis for the set $\partial h(c_*) - \lambda_*$, $t$ is the dimension of $\partial h(c_*)$, and $U$ is a set in $\mathbf{R}^t$. This structure was first introduced by Osborne in [14]. If $0 \in \operatorname{Int} U$, or equivalently $\lambda_*$ lies in the interior of $\partial h(c_*)$, then strict complementarity is said to hold. The second order sufficient condition for the solution $x_*$ is
$$s^T W_* s > 0, \quad \forall s \in S_*, \eqno(1.4)$$
where
$$W_* = \nabla^2 f(x_*) + \sum_{i=1}^m \lambda_{*i} \nabla^2 c_i(x_*) \eqno(1.5)$$
and
$$S_* = \{s \mid \|s\| = 1,\ (A_* D_*)^T s = 0\}.$$
Under the conditions of strict complementarity and second order sufficiency, the problem
$$\min f(x) + \lambda_*^T c(x) \quad \text{s.t.}\ D_*^T c(x) = 0 \eqno(1.6)$$
is locally equivalent to problem (1.1) in the sense that $x_*$ minimizes (1.1) with multiplier $\lambda_*$ if and only if $x_*$ solves (1.6). (We shall see later in Section 4 that if the function $\psi(x)$ is formulated as an exact penalty function for optimizing smooth nonlinear programming problems with equality and inequality constraints, $D_k$ and $D_*$ are used to identify the active constraints at $x_k$ and $x_*$.)

Most methods for problem (1.1) are line search descent methods in which the search direction is the minimizer of the problem
$$\min_\delta\ q_k(\delta) = l_k(\delta) + \tfrac{1}{2}\delta^T B_k \delta \eqno(1.7)$$
J. Zhang, C. Xu and Y. Fan
where h(6) = fk + gl6 + h(ck + ATk6) fk = f(xk), gk = V / ( x i ) , ck = c(xk), Ak = [V C l (x t ), • • •, Vc m (x*)] and Bk is either the matrix Wk, that is defined by a formula similar to (1.5) but substituting x, and A. by i t and \k respectively, or an approximation to Wk. When nondegeneracy condition, strict complementarity condition and second order sufficient condition are satisfied, line search descent methods are locally convergent and the convergence rate is quadratic or superlinear depending on the closeness of Bk to Wk [17]. Fletcher and Sainz de la Maza proposed in [8] a globally convergent method for the solution of problem (1.1) using a trust region strategy. The F-S method is a hybrid method combining a successive linear programming method with a type of successive quadratic programming methods. In each iteration, at first, the following trust region type linearized subproblem min//c((5)
s.t-IHU < pk
(1.8)
is solved such that the multiplier vector At can be estimated and the linearized re duction A/* = 4(0) - lk(6k) =
(1.10)
which, as an approximation of problem (1.6), is a smooth minimization problem under a set of equality constraints. Note that Dk is the matrix of a basis for the set dh(ck) — Xk, and for some commonly used homogeneous polyhedral convex functions h, their corresponding forms of Dk can be found in [6]. Many methods are proposed for solving this type of optimization problems. Among them, one type of method makes QR decomposition for the Jacobian Ak = AkDk of the constraint functions obtaining a pair of orthogonal bases Yk in the range space R{Ak) and Zk in the null ~ T
space Af(Ak ), and then considers two movements in the two subspaces respectively. The two mutually orthogonal movements comprise a complete step. This type of method can be seen in [2], [3], [13], [17] and other papers. In [8] Fletcher and Sainz
de la Maza carry out one iteration of the Coleman-Conn method [2], [3] for the auxiliary problem (1.10) and obtain a vector $\tilde\delta_k$, which is an alternative choice of the moving vector with respect to $\delta_k$. If the criterion
$$\psi(x_k) - \psi(x_k + \tilde\delta_k) \ge \theta\,\Delta l_k \min(1, \Delta l_k/b_k) \eqno(1.11)$$
is satisfied, they choose $d_k = \tilde\delta_k$; otherwise they set $d_k = \delta_k$, where $b_k = \delta_k^T B_k \delta_k$ and $\theta$ is a constant in $(0, \frac{1}{2})$. To make a move, set $x_{k+1} = x_k + d_k$ if a sufficient reduction in $\psi$ is achieved.

The orthogonal bases produced by QR decomposition need not depend continuously on $\hat A_k$, a difficulty pointed out by Byrd and Schnabel. Fontecilla [9] uses the matrix
$$P_k = I - B_k^{-1}\hat A_k(\hat A_k^T B_k^{-1}\hat A_k)^{-1}\hat A_k^T \eqno(1.12)$$
to replace the QR decomposition of $\hat A_k$. As $P_k$ is continuous with respect to $\hat A_k$, this replacement avoids the trouble of discontinuity. We find that the idea can be adopted for our purpose and brings about a revised algorithm that possesses nice theoretical properties as well as satisfactory practical performance.

This paper is organized as follows. In Section 2 we present a revised F-S algorithm after a simple introduction of the Fontecilla method. It is then analysed in Section 3 that the new algorithm is globally convergent with a locally superlinear rate of convergence. In Section 4 some important implementation issues of the algorithm are explained, and a set of experimental results is reported in comparison with the original algorithm. It is found that, in addition to solving the discontinuity problem, the revised algorithm has some other advantages, which we shall point out in Section 5.
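The two properties of $P_k$ used below can be checked directly; the following sketch (with made-up data) verifies that $P_k$ is the oblique projector onto $N(\hat A_k^T)$: it is idempotent and annihilates $\hat A_k^T$ on the left, and, unlike a pair of QR bases, it is a continuous function of a full-column-rank $\hat A_k$.

```python
import numpy as np

# P = I - B^{-1} Ahat (Ahat^T B^{-1} Ahat)^{-1} Ahat^T, as in (1.12),
# with B positive definite and Ahat of full column rank (toy data).
rng = np.random.default_rng(0)
n, t = 5, 2
B = np.diag(rng.uniform(1.0, 3.0, n))        # B_k: positive definite
Ahat = rng.standard_normal((n, t))           # Ahat_k = A_k D_k
Binv = np.linalg.inv(B)
P = np.eye(n) - Binv @ Ahat @ np.linalg.inv(Ahat.T @ Binv @ Ahat) @ Ahat.T

print(np.abs(P @ P - P).max())               # idempotent: P^2 = P
print(np.abs(Ahat.T @ P).max())              # range(P) lies in N(Ahat^T)
```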
2  The Revised Algorithm
The Fontecilla method solves general equality constrained nonlinear optimization problems
$$\min f(x) \quad \text{s.t.}\ c(x) = 0. \eqno(2.1)$$
The Lagrangian function of problem (2.1) is defined by
$$l(x, \mu) = f(x) + \mu^T c(x) \eqno(2.2)$$
where $\mu \in \mathbf{R}^m$ is a Lagrangian multiplier vector. Let $x_k$ be a current approximation to the solution $x_*$ of problem (2.1) and $\mu_{k+1}$ be an estimate of the Lagrangian multiplier vector. Let $B_k$ be a positive definite approximation to the Hessian $W_*$ of the Lagrangian function at $x_*$, updated using quasi-Newton formulae, for example the BFGS or DFP formulae. Then the iterate $x_{k+1}$ in Fontecilla's method is simply taken to be
$$x_{k+1} = x_k + d_k \eqno(2.3)$$
where the correction $d_k$ is the sum of a step $h_k \in N(A_k^T)$ and a step $v_k$ satisfying $c(x_k + h_k) + A_k^T v_k = 0$; i.e.,
$$d_k = h_k + v_k. \eqno(2.4)$$
The step $h_k$ is used to update the matrix $B_k$ so that the positive definiteness of the matrix is maintained, that is,
$$B_{k+1} = \mathrm{BFGS/DFP}(B_k, h_k, y_k) \eqno(2.5)$$
where
$$y_k = \nabla_x l(x_k + h_k, \mu_{k+1}) - \nabla_x l(x_k, \mu_{k+1}) \eqno(2.6)$$
and $\nabla_x l(x, \mu)$ is the gradient of the Lagrangian function $l(x, \mu)$ with respect to $x$. Define the multiplier estimate as
$$\mu_{k+1} = -(A_k^T B_k^{-1} A_k)^{-1} A_k^T B_k^{-1} g_k \eqno(2.7)$$
and take $h_k$ to be the solution of the equations
$$B_k h_k = -\nabla_x l(x_k, \mu_{k+1}). \eqno(2.8)$$
Then it can be verified that $h_k \in N(A_k^T)$. Obviously,
$$v_k = -B_k^{-1} A_k (A_k^T B_k^{-1} A_k)^{-1} c(x_k + h_k) \eqno(2.9)$$
satisfies $c(x_k + h_k) + A_k^T v_k = 0$. It is the $h_k$ determined by (2.8) and the $v_k$ in (2.9) that form a complete step in Fontecilla's method. Under relatively mild conditions (see Eq. (3.2) below), Fontecilla's method satisfies Powell's sufficient condition [15] for two-step q-superlinear convergence. Furthermore, the convergence rate of the sequence $\{x_k + h_k\}$ is one-step q-superlinear.

When the Fontecilla step is used to replace the Coleman-Conn step in the F-S method, iteration $k$ can be described as follows.
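The formulae (2.7)-(2.9) can be exercised on a toy instance. The sketch below (all data made up) uses a quadratic objective and linear constraints $c(x) = A^T x - b$, for which the step can be checked exactly: $h_k$ lands in $N(A_k^T)$ and, since $c$ is linear, $c(x_k + d_k) = c(x_k + h_k) + A_k^T v_k = 0$.

```python
import numpy as np

# Toy instance for one Fontecilla step (2.7)-(2.9).
n, m = 4, 2
rng = np.random.default_rng(0)
Q = np.diag([2.0, 3.0, 1.5, 4.0])           # B_k: positive definite Hessian model
q = rng.standard_normal(n)
A = rng.standard_normal((n, m))             # A_k = [grad c_1, ..., grad c_m]
b = rng.standard_normal(m)

x = rng.standard_normal(n)                  # current iterate x_k
g = Q @ x + q                               # g_k = grad f(x_k)
c = lambda x: A.T @ x - b                   # linear constraint map c(x)

Binv = np.linalg.inv(Q)
M = A.T @ Binv @ A
mu = -np.linalg.solve(M, A.T @ Binv @ g)    # (2.7)
h = -Binv @ (g + A @ mu)                    # (2.8): B_k h_k = -grad_x l(x_k, mu)
v = -Binv @ A @ np.linalg.solve(M, c(x + h))  # (2.9)
d = h + v

print(np.abs(A.T @ h).max())                # h_k lies in N(A_k^T)
print(np.abs(c(x + d)).max())               # c(x_k + h_k) + A_k^T v_k = 0
```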
Step 1: Solve the trust region linear programming subproblem (1.8) to give $\delta_k$, $\lambda_k$, $D_k$, $\Delta l_k$ and $b_k$.

Step 2: Form a Fontecilla step by using the following formulae:

(I) $\mu_{k+1} = -(\hat A_k^T B_k^{-1}\hat A_k)^{-1}\hat A_k^T B_k^{-1}\hat g_k$, where $\hat g_k = g_k + A_k\lambda_k$;  (2.10)

(II) $h_k = -B_k^{-1}\nabla l_k(x_k)$, where $\nabla l_k(x_k) = \hat g_k + \hat A_k\mu_{k+1}$;  (2.11)

(III) $v_k = -B_k^{-1}\hat A_k(\hat A_k^T B_k^{-1}\hat A_k)^{-1} D_k^T c(x_k + h_k)$;  (2.12)

(IV) $d_k = h_k + v_k$.
Step 3: If $\psi(x_k) - \psi(x_k + d_k) \ge \theta\,\Delta l_k \min(1, \Delta l_k/b_k)$, set
$$x_{k+1} = x_k + d_k, \qquad y_k = \nabla l_k(x_k + h_k) - \nabla l_k(x_k), \qquad B_{k+1} = \mathrm{BFGS/DFP}(B_k, h_k, y_k).$$
In the algorithm, $l_k(x)$ is the Lagrangian function
$$l_k(x) = f(x) + \lambda_k^T c(x) + \mu_{k+1}^T D_k^T c(x)$$
of problem (1.10); $\theta \in (0, \frac{1}{2})$ and $0 < \sigma_1 < 1 < \sigma_2$ are fixed parameters, and $\rho_{\max}$ is a user-supplied upper bound on $\rho_k$. The initial matrix $B_0$ is set to the unit matrix $I$, and some ad hoc techniques (see [6] for example) are used to maintain the positive definiteness of $B_k$ when $h_k^T y_k > 0$ does not hold. We here use subscripts to replace superscripts, but do not change any other symbols of the Fletcher-Sainz de la Maza method.
3  Convergence Analysis
Obviously, the global convergence property proved in [8] is not affected, as it is guaranteed by solving a linearized trust region subproblem in each iteration and by choosing $\delta_k$,
rather than $h_k + v_k$, as the moving vector whenever $h_k + v_k$ does not yield a sufficient reduction in the value of $\psi(x)$. In short, global convergence is controlled by the criterion in Step 3, which is not changed in our revised algorithm. Only the local convergence rate need be considered here. The following hypotheses will be assumed in the rest of the paper.

(A1) $x_k \to x_*$, $\lambda_k \to \lambda_*$. By the global convergence results of the F-S method, this assumption ensures that $\lambda_* \in \partial h(c_*)$ and $\hat g_* = g_* + A_*\lambda_* = 0$.

(A2) $\hat A_* = A_* D_*$ and $\hat A_k = A_k D_k$ (for all $k$) have full column rank.

(A3) $\{B_k\}$ is bounded and uniformly positive definite; i.e., there exist constants $b$ and $\bar b$ such that
$$\|B_k\| \le b, \qquad \|B_k^{-1}\| \le \bar b, \qquad \forall k. \eqno(3.1)$$

(A4) For $P_k$ given in (1.12),
$$\lim_{k \to \infty} \frac{\|P_k(B_k - W_*)h_k\|}{\|h_k\|} = 0. \eqno(3.2)$$

Lemma 3.1
$$\|(\hat A_k^T B_k^{-1}\hat A_k)^{-1}\| \le \|(\hat A_k^T\hat A_k)^{-1}\|\,\|B_k\|. \eqno(3.3)$$

PROOF. We know that
$$y^T B_k^{-1} y \ge \|y\|^2/\|B_k\|, \qquad \forall y \ne 0.$$
Since $\hat A_k^T\hat A_k$ is also positive definite and symmetric,
$$x^T\hat A_k^T\hat A_k x \ge \|x\|^2/\|(\hat A_k^T\hat A_k)^{-1}\|, \qquad \forall x \ne 0.$$
Let $H_k = \hat A_k^T B_k^{-1}\hat A_k$. Using the above two inequalities, for any $x \ne 0$ we have
$$x^T H_k x = x^T\hat A_k^T B_k^{-1}\hat A_k x \ge \|\hat A_k x\|^2/\|B_k\| \ge \|x\|^2/\big(\|B_k\|\,\|(\hat A_k^T\hat A_k)^{-1}\|\big). \eqno(3.4)$$
Inequality (3.4) shows that
$$\lambda_{\min}(H_k) \ge 1/\big(\|B_k\|\,\|(\hat A_k^T\hat A_k)^{-1}\|\big)$$
and therefore
$$\|H_k^{-1}\| = \lambda_{\max}(H_k^{-1}) \le \|B_k\|\,\|(\hat A_k^T\hat A_k)^{-1}\|.$$
Q.E.D.
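The bound (3.3) is easy to probe numerically; the following sketch (random instances, not from the paper) confirms that the spectral norm of $(\hat A_k^T B_k^{-1}\hat A_k)^{-1}$ never exceeds $\|(\hat A_k^T\hat A_k)^{-1}\|\,\|B_k\|$.

```python
import numpy as np

# Monte-Carlo check of Lemma 3.1:
# ||(Ahat^T B^{-1} Ahat)^{-1}||_2 <= ||(Ahat^T Ahat)^{-1}||_2 * ||B||_2.
rng = np.random.default_rng(1)
worst = 0.0
for _ in range(200):
    n, t = 6, 3
    Ahat = rng.standard_normal((n, t))
    L = rng.standard_normal((n, n))
    B = L @ L.T + np.eye(n)                 # positive definite B_k
    H = Ahat.T @ np.linalg.inv(B) @ Ahat
    lhs = np.linalg.norm(np.linalg.inv(H), 2)
    rhs = np.linalg.norm(np.linalg.inv(Ahat.T @ Ahat), 2) * np.linalg.norm(B, 2)
    worst = max(worst, lhs / rhs)
print(worst)                                # stays below 1
```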
Let $S = \{k \mid k \text{ is a positive integer and } \partial h(c_k) = \partial h(c_*)\}$.
We first consider those iterations in which Dk = D„ i.e., all Dk, k 6 5, are a constant matrix. Lemma 3.2
For k £ 5, /ifc -> 0, u* -> 0, and therefore dk -* 0.
Proof: When fc —> oo, Ak approaches A,. Therefore, assumption (A2) implies that \\{A[Ajt)_1|| is bounded for sufficiently large k € 5. On the other hand, gk = 9k + AkXk —» g, 4- A,A. = g» = 0. So, by the definition of nk+\ (see (2.10)), Lemma 3.1 and assumption (A3) we know fik+i —> 0, which by (2.11) shows /ifc —► 0. Since problem (1.6) is locally equivalent to problem (1.1), x* must satisfy Djc, = 0, which implies that for k € 5, D^c(xk-\-hk) = D^c(xk-\-hk) —» £)^c(x») = 0. Therefore from equation (2.12) we have Ufc —> 0. Q.E.D. The next lemma gives a minimizer of the unconstrained optimization problem (1.7) for sufficiently large k g S. Lemma 3.3
Let vk =
-B?Ak{ATkB^Ak)-*DTkc{xk)
and let 6k be the optimal solution of the unconstrained optimization problem (1.7). Tnen for sufficiently large k 6 S, 6k = hk-rvk
(3.5)
Remark 3.1 The only difference between 6k and the vector dk used in the revised method is that the quantity c(xk + hk) in (2.12) is replaced by c(xk). If we use 8k to form a sequence xk+i = xk+Sk as Womersley did in [17], then under a condition similar to (3.2), {xk} will be locally superlinearly convergent, but its global convergence is not guaranteed. Proof:
Obviously 6k is also an optimal solution of the problem
mmf(6) + h(c(6))
446
J. Zhang, C. Xu and Y. Fan
where f{6) = g^S + 6TBk6/2, and c(S) = ck + ATk6. Let ck = c(6k). As proved in [9], for sufficiently large k 6 S, dh(ck) = dh(c,) and thus for large k G 5 , A* € dh(ck) = dh(c,) = dh(ck), which means that, just as problems (1.1) and (1.6) are locally equivalent, 6k is also a n optimal solution of the problem min/(<$) + A[c(<5) s.t DTkc{6) = 0
(3.6)
According to the Karush-Kuhn-Tucker condition for smooth constrained optimiza tion, there exists a vector r such that V / ( 4 ) + Vc(«*)A* + Vc(6k)Dkr
= 0
(3.7)
Equations (3.6) and (3.7) can be rewritten as gk + Bk6k + Akr = 0 h + A T k8 k = 0
(3.8) (3.9)
where ck = D\ck. Multiplying (3.8) by —(AkrBk1Ak)~1AkBk1 (2.10), we have M*+, - r =
and observing the definition of nk+i in {AlBklAk)-lATk6k
Solving r from the above equation and substituting it into (3.8), we obtain
=-Bkigk-Bk-lAkr = -B^ih + Akfik+1) + = -Bk\gk + Aktik+1) = hk + vk
Sk
B^AkiAlB^A^AlS, B^MAlB^A^'c,
which completes the proof.
Q.E.D.
We will now follow the process developed by Fletcher and Sainz de la Maza [8] to prove the main lemmas for predicted and actual reductions in ip(x) when a Fontecilla step is taken. L e m m a 3.4
For large k e S, the predicted reduction Aqk = qk(0) - qk(6k) is given
by A
1k = ^hTkBkhk
Proof:
- -vTkBkvk
- fkvk
+ h(ck) - \Tkck
(3.10)
From equation (1.7), Ag/t = -gTk8k + h{ck) - h(ck + ATk8k) - -6TkBk6k
(3.11)
A SLP Method
447
It follows from the fact hk € ^f(Aj)
that
vTkBkhk = -clDkiAjB;1
A*)" 1 ATkhk = 0.
(3.12)
Therefore, we have 8TkBk8k = hTkBkhk + vTkBkvk.
(3.13)
On the other hand, equation (3.8) generates 8Tkgk + 8TkAkr + 8TkBk8k = 0. From the definition of gk, the fact hk 6 JV(AJO> and (3.13), we have &k9k = SlAkXk = -8lAkXk = -8TkAk\k
anc
- vTkAkr -
l
tne
equations (3.8), (3.12), (3.5)
8lBk8k
+ vl(gk + BkSk) - 8TkBk8k + vTkgk - hlBkhk
(3.14)
Finally, since At € dh(ck) and h is homogeneous, h{ck + ATk8k) = h(ck) = XTk(ck + ATk6k)
(3.15)
Substituting (3.13), (3.14) and (3.15) into (3.11) we obtain (3.10). Q.E.D. The following lemma gives the relation between the actual reduction Atpk obtained in a Fontecilla step and the predicted reduction Aqk given in (3.10). Lemma 3.5
For large k G S, Atpk ~ Aqk = o(\\hk\\2) + o(\\vk\\)
(3.16)
wiere Aipk =
From problem (1.1), &
(3.17)
Let c'k = c(xk + hk) + ATkVk- Expanding c(xk + dk) = c(xk + hk + vk) at xk + hk and using Lipschitz continuity of the polyhedral convex function h, we have h (c(xk + dk)) = h(c'k + o(\\vk\\)) = h(c'k) + o(\\vk\\)
(3.18)
J. Zhang, C. Xu and Y. Fan
448
Since dk —> c„, when k —► oo we know that dh(dk) C dh(c.) for large k (see Lemma 1.2 in [8]). On the other hand, when k € S, it follows from the definition of Vk (see equation (2.12)) that DT.c'k = DTkc'k = DTkc{xk + hk) + ATk vk = 0 Then, again the Lemma 1.2 in [8] gives dh(c) = dh(dk). Thus, from the Lemma 1.1 in [8] we have % i ) = A J 4 = Afo** + fct) + \TkATkvk 1
m
= Aftc* + ATkhk) + -
YWUhttdixkihk
+AfA^ f c + 0 ( | | ^ | | 2 )
(3.19)
Substituting (3.18) and (3.19) into (3.17), and using the fact / ( * * + d„) -h=
T 9 kdk
+ \hTkV'fkhk
+ o(\\hkf)
+ o(\\vk\\)
we obtain, A<^ =
h(ck) - (hk + vk)T(gk + Ak\k) -\lck + o{\\hk\\>) + o(\\vk\\)
\hTkWkhk
(3.20)
where Wk is the Hessian matrix of the Lagrange function f(x) + \%c(x) evaluated at x k. From the definition of hk (see Eq.(2.11)) and the property that hk 6 Af{Ak), w e obtain hjBkhk = -hTk(gk + AkHk+i) = -hTkgk (3.21) Substituting (3.21) into (3.20) gives A
hTkBkhk-\hTkWkhk-vl~gk + h{ck) -\lck + o{\\hkr) + o(\\vk\\)
V-22>
Equations (3.10) and (3.22) tell us that A
\hTkBkhk-\hTkWkhk + \vlBkVk + ( ^ - ^ ) T ^ + o(ll^|| 2 ) + o(||^||)
, , _ - J
(3 23
As hk = Pkhk, under assumption (A4), hTk(Bk - Wk)hk = hTkPk(Bk - Wk)hk = o(\\hk\\2) From the fact hk G A/"(A£) and the boundedness of the matrix we have vk - vk = B^AkiAlBk'A^Dllcix, = Bk-'Ak{ATkBVAkVDTk[ATkhk = 0(\\hh\\2)
Bkl
(3.24) A^A*
Bkl
Ak)~l
+ hk) - c(xk)} + 0(\\hk\\2)} (3.25)
A SLP Method
449
Hence, (vk-vk)Tgk
= o(\\hk\\2)
(3.26)
and vTkBkvk = vjBkvk
+ o(\\vk - vk\\)
= o(|K||) + 0 (||^f)
(3.27)
Substituting (3.24), (3.26) and (3.27) into (3.23) generates (3.16). Q.E.D. Having proved Lemma 3.4 and Lemma 3.5, we can obtain Aipk — Aqk = o(||A
For k 6 S, when k —> co A
(3.28)
Therefore, the step dk = hk + vk produced in the revised algorithm satisfies the criterion (1.11) for sufEciently large t e S . In other words, for large k e S, xk+i = xk + dk. Proof:
Since hk G Af{Al), ck = DTkck = DTk(ck + ATkhk) = DTkc(xk + hk) +
0(\\hkf)
Therefore ||^c(xi
+
^ ) | | = 0(||c,||) + 0(||^|| 2 )
and Huftll = || - B^Ak(ATkB^Ak)-lDlc(xk
+ hk)\\ = 0(\\ck\\) + 0(\\hk\\2)
(3.29)
Notice that equation (3.12) in [8] is still true, and is expressed here as M v
> h{ck) _ XTkck > v\\ck\\
(3.30)
where v is a constant in the interval (0, 1). We divide the index set S into two collectively exhaustive subsets: S = Si U S2, such that Si = {k\\\ck\\=o(\\hk\\2), 2
k£S]
s2 = U|||M = o(|N|), kes}
450
J. Zhang, C. Xu and Y. Fan
and inspect the following two different cases: (1)
For k 6 Si, from (3.29) we have
I M = 0(11**11*)-
(3-31)
According to relation (3.16), we obtain AVk-Aqk
= o(\\hk\\2).
(3.32)
From Eqs. (3.27) and (3.31), we have vTkBkVk = o(\\hk\\2)
(3.33)
As gk —> 0, it can be deduced from (3.25) and (3.31) that vTkgk = o(\\vk\\) = 0 ( | K | | ) + o(\\vk - vk\\) = o(\\hk\\2).
(3-34)
Substituting (3.30), (3.33) and (3.34) into expression (3.10), we obtain Aqk = \hTkBkhk
+ o(\\hk\\2)
> \\hk\\2/4b
(3.35)
for large k £ Si, where the last inequality comes from assumption (A3). Then (3.32) and (3.35) imply Atpk/Aqk -> 1, when k ^ oo (3.36) (2)
For $k \in S_2$, (3.29) gives $\|v_k\| = O(\|\bar c_k\|)$, and thus (3.16) implies
$$\Delta\psi_k - \Delta q_k = o(\|\bar c_k\|). \quad (3.37)$$
From equation (3.27), we have
$$v_k^T B_k v_k = o(\|\bar c_k\|), \quad (3.38)$$
whereas
$$v_k^T g_k = o(\|v_k\|) = o(\|\bar c_k\|). \quad (3.39)$$
Combining (3.10), (3.30), (3.38) and (3.39), and using assumption (A3), we have
$$\Delta q_k \ge h(c_k) - \lambda_k^T c_k + o(\|\bar c_k\|) \ge \tfrac{\nu}{2} \|\bar c_k\| \quad (3.40)$$
for large $k \in S_2$. Equations (3.37) and (3.40) ensure that
$$\Delta\psi_k / \Delta q_k \to 1, \quad \text{when } k \to \infty,\ k \in S_2. \quad (3.41)$$
Equations (3.36) and (3.41), together, give the limit (3.28).
The second conclusion of the theorem follows from (3.28), because it is pointed out in [8] that $\Delta q_k \ge \tfrac{1}{4}\Delta l_k \min(1, \Delta l_k / b_k)$. Q.E.D.

Our next result shows that the correct active set at the optimal solution $x_*$ is ultimately identified by the algorithm, i.e., $k \in S$ holds for all sufficiently large $k$. In this case, as Fontecilla has proved, the sequence $\{x_k\}$ converges 2-step q-superlinearly, whereas the sequence $\{x_k + h_k\}$ converges 1-step q-superlinearly (see Theorem 4.6 and Theorem 5.1 in [9]). To prove this result, we need another two assumptions.

(A5) Strict complementarity holds, i.e., $0 \in \operatorname{int} U$ (see (1.3)).

(A6) $\{(A_k^T A_k)^{-1}\}$ is uniformly bounded.
Theorem 3.2 Under Assumptions (A1)-(A6), $k \in S$ for all sufficiently large $k$, i.e., starting with a certain iteration, it is always true that $x_{k+1} = x_k + h_k + v_k$.

Proof: Let $\bar S = \{k \mid k \notin S\} = \{k \mid \partial h(\bar c_k) \ne \partial h(c_*)\}$. Assume $\bar S$ contains an infinite subsequence, from which we shall derive a contradiction. Let $\tilde S$ be a thinner subset of $\bar S$, on which $\partial h(\bar c_k)$ and the corresponding matrix $D_k$ are fixed. According to Theorem 2.3 in [9], the strict complementarity condition implies $\partial h(\bar c_k) \supseteq \partial h(c_*)$. The same theorem also points out that when $D_k^T c_* = 0$, $\partial h(\bar c_k) \subseteq \partial h(c_*)$. Therefore, for $k \in \tilde S$, $D_k^T c_* \ne 0$.

By assumption (A6) and Lemma 3.1, we know that $\{(A_k^T B_k^{-1} A_k)^{-1}\}$ is bounded. So, from the fact that $\bar g_k \to 0$ and the definitions of $\mu_{k+1}$ and $h_k$ (see (2.10) and (2.11)), we have $\mu_{k+1} \to 0$ and $h_k \to 0$. As shown in the proof of Lemma 3.5, $D_k^T c_k' = 0$, i.e., $A_k^T v_k = -D_k^T c(x_k + h_k)$. Thus
$$A_k^T v_k = -D_k^T c_* + o(1). \quad (3.42)$$
For $k \in \tilde S$, since $D_k^T c_* \ne 0$ and $A_k$ is bounded, (3.42) means that when $k$ is large, $v_k$ is uniformly bounded away from zero. As $d_k = h_k + v_k$, it is immediate that there exists a constant $\gamma' > 0$ such that
$$\|d_k\| \ge \gamma', \quad \text{for large } k \in \tilde S. \quad (3.43)$$
Since $\lambda_k \in \partial h(\bar c_k)$, and $D_k$ is a basis for $\partial h(\bar c_k) - \lambda_k$, by Lemma 1.1 of [8], $D_k^T \bar c_k = D_k^T (c_k + A_k^T \delta_k) = 0$. But for $k \in \tilde S$, it is clear that $D_k^T c_k \to D_k^T c_*$. Therefore
$$D_k^T A_k^T \delta_k = -D_k^T c_k = -D_k^T c_* + o(1), \quad (3.44)$$
and from this we obtain the conclusion that $\delta_k$ ($k \in \tilde S$) is bounded away from zero. So, by reducing the value of $\gamma'$ if necessary, the inequality
$$\|\delta_k\| \ge \gamma', \quad \text{for large } k \in \tilde S$$
is also true. As the trust region radius $\rho_k \ge \|\delta_k\|$, we have
$$\rho_k \ge \gamma', \quad \text{for large } k \in \tilde S.$$
Since the number of different sets $\partial h(\bar c_k)$ is finite, $\bar S$ can be partitioned into a finite number of thinner subsets. So there exists $\gamma > 0$ such that
$$\|d_k\| \ge \gamma, \quad \|\delta_k\| \ge \gamma \quad \text{and} \quad \rho_k \ge \gamma \quad \text{for sufficiently large } k \in \bar S. \quad (3.45)$$
In the algorithm, $x_{k+1}$ has three possible choices: $x_k + h_k + v_k$, $x_k + \delta_k$, or $x_k$. Since $x_k \to x_*$, $x_{k+1} - x_k \to 0$. So (3.45) means that $x_{k+1} = x_k$ for sufficiently large $k \in \bar S$ (say $k \ge \bar k$). In this case, the radius of the trust region is reduced to $\sigma_1 \rho_k$, with $\sigma_1 < 1$. Hence, from (3.45), it is impossible that $k \in \bar S$ for all large $k$. In other words, there is an infinite subsequence $\{k_j\}$ such that $k_j \in S$. For sufficiently large $k_j \in S$, Theorem 3.1 indicates that the Fontecilla step $d_{k_j}$ satisfies (1.11) and $x_{k_j+1} = x_{k_j} + d_{k_j}$. Thus $x_k \to x_*$ means $d_k \to 0$, i.e.,
$$\|d_{k_j}\| < \gamma/\sigma_2, \quad \text{for } j \ge J \text{ and } k_j \in S, \quad (3.46)$$
where $\sigma_2 > 1$ is given in the algorithm. Without loss of generality, we can assume $k_J \ge \bar k$. It can be proved that there is $J' \ge J$ such that
$$\rho_{k_{J'}} < \gamma. \quad (3.47)$$
Otherwise, for any $k \ge k_J$, if $k \notin S$, then $x_{k+1} = x_k$ and hence, by Step 5 of the algorithm, $\rho_{k+1} = \sigma_1 \rho_k$; whereas if $k \in S$, then according to (3.46), $\rho_k \ge \gamma > \|d_k\|$, so that part (IV) of Step 6 in the algorithm gives $\rho_{k+1} = \rho_k$. The two possible cases mean that $\{\rho_k\}$ would be a decreasing sequence approaching zero, violating (3.45).

Next, we use induction to prove that for all $k \ge k_{J'}$, $\rho_k < \gamma$, and hence $k \in S$. At first, as shown in (3.47), $k_{J'}$ is such an index. Now assume that a $k \ge k_{J'}$ satisfies the conditions $\rho_k < \gamma$ and $k \in S$. There are two possibilities: (a) $\rho_k < \gamma/\sigma_2$. Then $\rho_{k+1} \le \sigma_2 \rho_k < \gamma$. (b) $\rho_k \ge \gamma/\sigma_2$. Then (3.46) gives $\rho_k > \|d_k\|$, and part (IV) of Step 6 in the algorithm implies $\rho_{k+1} = \rho_k < \gamma$. Therefore $\rho_{k+1} < \gamma$ holds in both cases, which implies $k + 1 \in S$. The conclusion that $k \in S$ for all $k \ge k_{J'}$ contradicts the assumption that $\bar S$ contains an infinite number of indices. The proof is thus completed. Q.E.D.
4
Numerical Results
Numerical experiments have been performed on an IBM personal computer with the Fontecilla step and also the C-C step being used in the F-S method to solve the subproblem (1.10). The test composite minimization problems are formulated from the general smooth nonlinear constrained optimization problem
$$\min f(x) \quad \text{s.t. } c_i(x) = 0,\ i = 1, 2, \ldots, m_E; \qquad c_i(x) \le 0,\ i = m_E + 1, \ldots, m, \quad (4.1)$$
by using the $l_1$ exact penalty function [6]
$$\varphi(x) = f(x) + \sum_{i=1}^{m_E} |c_i(x)| + \sum_{i=m_E+1}^{m} \max(c_i(x), 0). \quad (4.2)$$
The objective function $f(x)$ is first scaled by a scaling parameter $\nu$ so that $\|\lambda^*\|_\infty < 1$ (see [6] for details). The linearized trust region subproblem (1.8) is converted into a standard linear programming problem with bounded variables by introducing the slack variables
$$r_i^+ = \max\{c_i(x_k) + \nabla c_i(x_k)^T \delta, 0\}, \quad r_i^- = -\min\{c_i(x_k) + \nabla c_i(x_k)^T \delta, 0\}, \quad i = 1, 2, \ldots, m,$$
and the converted linear programming problem is then solved by using a standard code of the Simplex method for bounded variables. The converted LP problem has the form
$$\min w^T y \quad \text{s.t. } [B : I_m] y = b, \quad l \le y \le u, \quad (4.3)$$
where
$$y = (\delta^T, (r^+)^T, (r^-)^T)^T \in R^{n+2m}, \quad r^+ = (r_1^+, \ldots, r_m^+)^T, \quad r^- = (r_1^-, \ldots, r_m^-)^T,$$
$$w = (0_n^T, e_{m+m_E}^T, 0_{m-m_E}^T)^T, \quad e_{m+m_E} = (1, 1, \ldots, 1)^T \in R^{m+m_E}, \quad 0_{m-m_E} = (0, 0, \ldots, 0)^T \in R^{m-m_E},$$
$$b = -c(x_k), \quad l = (-\rho_k e_n^T, 0_m^T, 0_m^T)^T, \quad u = (\rho_k e_n^T, \infty_m^T, \infty_m^T)^T,$$
$$0_m = (0, \ldots, 0)^T \in R^m, \quad \infty_m = (\infty, \ldots, \infty)^T, \quad B = [A_k^T : -I_m] \in R^{m \times (n+m)}.$$
If some variables in problem (4.1) have lower and/or upper bounds, then the bounds on the corresponding elements of $\delta$ in (4.3) should be changed accordingly.
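The slack-variable conversion above can be sketched in a few lines of numpy; the function name `build_lp` and the small test data below are illustrative, not from the paper:

```python
import numpy as np

def build_lp(A, c, rho, m_E):
    """Convert the linearized l1 trust-region subproblem
        min  sum_{i<=m_E} |c_i + a_i^T d| + sum_{i>m_E} max(c_i + a_i^T d, 0)
        s.t. ||d||_inf <= rho
    into the bounded-variable LP  min w^T y, [B : I_m] y = b, l <= y <= u,
    with y = (d, r+, r-).  A is m x n (rows a_i^T), c = c(x_k)."""
    m, n = A.shape
    # objective: every r_i^+ is penalized; r_i^- only for the m_E equality rows
    w = np.concatenate([np.zeros(n), np.ones(m),
                        np.ones(m_E), np.zeros(m - m_E)])
    # row i:  a_i^T d - r_i^+ + r_i^- = -c_i
    B = np.hstack([A, -np.eye(m)])
    Aeq = np.hstack([B, np.eye(m)])
    b = -c
    l = np.concatenate([-rho * np.ones(n), np.zeros(2 * m)])
    u = np.concatenate([rho * np.ones(n), np.full(2 * m, np.inf)])
    return w, Aeq, b, l, u

# tiny illustrative instance: two constraints, the first an equality
w, Aeq, b, l, u = build_lp(np.eye(2), np.array([-0.5, 0.3]), 1.0, 1)
```

Any bounded-variable simplex code can then be applied to $(w, A_{eq}, b, l, u)$; the conversion itself carries all the structure of (4.3).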
For the $l_1$ exact penalty function (4.2), we know that the columns of $D_k$ are simply the coordinate vectors $e_i$ for the indices $i$ that make $c_i(x_k) = 0$, i.e., the matrix $D_k$ provides the current estimate of the set of active constraints. The BFGS updating formula
$$B_{k+1} = B_k - \frac{B_k h_k h_k^T B_k}{h_k^T B_k h_k} + \frac{y_k y_k^T}{h_k^T y_k} \quad (4.4)$$
is employed to update the whole Hessian approximation $B_k$ (with $B_0 = I$) used in the Fontecilla step. The $LDL^T$ form of $B_k$ is stored and updated by calling the routine MC11A [7] so that the positive definiteness of $B_k$ can be controlled. In the implementation, the horizontal step $h_k$ is restricted to lie within the trust region box. Although $h_k^T y_k \le 0$ rarely occurs in practical use, $h_k^T y_k > 0$ is not guaranteed by any line search. Powell's ad hoc strategy [6] is used to determine a convex combination of the vectors $y_k$ and $B_k h_k$ to replace $y_k$ in the updating formula (4.4), so that the positive definiteness of $B_k$ is maintained.
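Powell's damping strategy mentioned above can be sketched as follows; this is a minimal version assuming the commonly used threshold $0.2$ (the exact constant used in the paper's implementation is not stated here):

```python
import numpy as np

def damped_bfgs(B, s, y, tau=0.2):
    """Powell's damping: replace y by a convex combination of y and B s
    whenever the curvature s^T y is too small, so the BFGS update keeps
    B positive definite.  tau = 0.2 is the customary threshold."""
    Bs = B @ s
    sBs = s @ Bs
    if s @ y < tau * sBs:                    # curvature too weak: damp
        theta = (1.0 - tau) * sBs / (sBs - s @ y)
        y = theta * y + (1.0 - theta) * Bs   # convex combination of y and Bs
    # standard BFGS update with the (possibly damped) y
    return B - np.outer(Bs, Bs) / sBs + np.outer(y, y) / (s @ y)
```

With the damped $y$, the inner product $s^T y$ equals $\tau\, s^T B s > 0$, which is exactly what keeps the updated matrix positive definite.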
The implementation of the C-C step in the original F-S method is more complicated, as the C-C step uses approximations of the reduced Hessian. If at $x_k$ there are $r$ active constraints, then in the QR decomposition of $A_k \in R^{n \times r}$, the orthogonal basis $Z_k$ of $N(A_k^T)$ has $n - r$ columns, and hence the reduced Hessian $Z_k^T W_k Z_k$ has order $n - r$. In this case we basically follow the strategy used in the F-S method, that is, we adopt the revised BFGS formula suggested by Nocedal and Overton in [13] for updating the approximate reduced Hessian (see Section 4 of [8], but with $M^{(k)}$ therein being expressed as $B_k$). The N-O criterion (see (4.1) of [8]) is also used to skip updates that are not well defined, so that the BFGS update is always well defined. We denote by $S_k$ the set of active constraints at $x_k$. When the sets $S_k$ and $S_{k+1}$ are different, before the updating is carried out we reset the reduced Hessian approximation $B_k = L_k D_k L_k^T$ to $\tilde L_k \tilde D_k \tilde L_k^T$, where the factors are taken as
$$\tilde D_k = \begin{cases} D_k, & \text{if } |S_k| = |S_{k+1}|, \\ \text{the first } n - |S_{k+1}| \text{ rows and columns of } D_k, & \text{if } |S_k| < |S_{k+1}|, \\ \operatorname{diag}(D_k, I), & \text{if } |S_k| > |S_{k+1}| \end{cases}$$
($|S_k|$ stands for the cardinality of the set $S_k$ and $I$ is the identity matrix of dimension $|S_k| - |S_{k+1}|$). Givens' square-root-free transformation [11] is then used to factorize the matrix $\tilde L_k \tilde D_k \tilde L_k^T$ into $LDL^T$ form, and at last the routine MC11A is called to obtain the $LDL^T$ form of $B_{k+1}$. To realize Step 2, after obtaining the matrix $L^{-1} A_k$ by backward substitutions, we use Givens' square-root-free transformation to factorize the matrix $A_k^T B_k^{-1} A_k = A_k^T (LDL^T)^{-1} A_k$ into $\bar L \bar D \bar L^T$ form. Then all the vectors $\mu_{k+1}$, $h_k$ and $v_k$ in Step 2 of the algorithm can be obtained by backward and forward substitutions.
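The resizing of the reduced-Hessian approximation when the active set changes can be illustrated on the assembled matrix; the paper works directly on the $LDL^T$ factors, so this simplified sketch (which truncates or identity-pads the full matrix) only mirrors the idea:

```python
import numpy as np

def resize_reduced_hessian(B, n_new):
    """Resize a reduced-Hessian approximation B when the reduced space
    changes dimension: truncate to the leading block when the space
    shrinks, pad with an identity block when it grows."""
    n_old = B.shape[0]
    if n_new <= n_old:
        return B[:n_new, :n_new].copy()   # keep the leading block
    out = np.eye(n_new)                   # identity padding for new rows
    out[:n_old, :n_old] = B
    return out
```

The padded block being the identity keeps the resized matrix positive definite whenever the old one was, which is what the subsequent BFGS updates require.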
Some modifications are made in this experiment. One is that once $x_{k+1}$ is reset to $x_k$, i.e., Step 5 is carried out, at most 3 linearized trust region subproblems with reduced $\rho_k$ are successively solved, without calculating the Fontecilla or C-C step (i.e., skipping Steps 2-3), to find a point that reduces the value of $\varphi(x)$. If a better point is still not found after three tries, we resume complete iterations. Another modification is that $\nabla_x l_k(x_{k+1})$ is used to replace $\nabla_x l_k(x_k + h_k)$ in the definition of $y_k$, so that an extra gradient evaluation can be avoided at each iteration. Our numerical experiments show that this modification does enhance the efficiency of the algorithm.

The experiments are carried out on 10 small standard test problems. Brief information about these test problems is given in the first columns of Table 4.1. A detailed reference for these problems can be found in [16]. The parameters in the algorithm are set to the values $\delta = 0.1$, $\sigma_1 = 0.2$, ..., and the stopping criterion
$$\ldots < 10^{-6} \quad (4.5)$$
is used to terminate the iteration. The numerical results are given in Table 4.1. In the table, $n$, $m$ and $nb$ are the numbers of variables, constraints and bounds on variables, respectively; SLQQR and SLQF stand for the F-S method with C-C steps and with Fontecilla steps, whereas NI, NSQ, NSL, NF and NG represent the numbers of iterations, successful SQP steps, successful LP steps, function evaluations and gradient evaluations, respectively. As shown in [8], the performance of the method SLQQR is comparable with, or even better than, the best results obtained by other first derivative methods [12, 13]. However, Table 4.1 shows that the performance of the method SLQF is comparable with that of the method SLQQR, and improvements are obtained on the numbers of iterations and function evaluations when the Fontecilla step is used to replace the step obtained by QR decomposition. These results show that the SLQF method is promising.
Table 4.1 Numerical results

                                          SLQQR                     SLQF
Problems                n  m  nb    NI NSQ NSL  NF  NG    NI NSQ NSL  NF  NG
Wright 1                2  1   0     4   3   0   5   3     4   4   0   4   4
Chamberlain             2  1   0     6   5   0   7   5     6   6   0   6   6
Fletcher 1              2  1   0     6   5   0   7   5     6   6   0   6   6
Fletcher 2              2  1   0     4   4   0   5   4     4   4   0   4   5
Rosenbrock              2  0   4    51  28   2  70  30    46  31   5  51  43
Colville 1              4  0   8    18  15   3  25  18    15  19   3  18  25
Wright 2                5  3   0     7   5   0   8   5     7   6   0   7   6
Powell                  5  3   0     4   4   0   5   4     4   4   0   4   5
Mukai and Polak         6  2   2    15  11   1  21  12    13  13   2  15  15
Hock and Schittkowski   7  4   0    12   9   1  19  10    10  11   2  12  14
Some further improvements on the method may be made to increase its efficiency for practical use. For example, using a special purpose $l_1$ LP solver and some sparse matrix techniques, and making a line search after obtaining $d_k$, are potential ways to improve the method.
5
Conclusion
In this paper we revised the Fletcher-Sainz de la Maza method for optimizing composite nonsmooth optimization problems by incorporating their linearized approximation model of the problem with Fontecilla's method, which uses an orthogonal projection operator as the main tool to solve equality constrained smooth nonlinear optimization problems. In general, in each iteration our algorithm solves one LP problem and calculates one Fontecilla step. The algorithm is globally convergent with a local superlinear rate. The convergence analysis of the revised method does not depend on the continuity of the null space basis that the original method requires but that usually fails to hold. Also, Powell's sufficient condition for achieving a two-step q-superlinear rate is more likely to be satisfied by introducing the Fontecilla step. A numerical experiment has been conducted, and a comparison of the computational results shows that the new algorithm is comparable to the original one, with some perceivable and quite stable improvement. Furthermore, the implementation of the revised algorithm is easier, as the working matrix $B_k$ does not need to expand or contract when the set of active constraints changes in the progress of the computation. Although any general LP solver can be employed to solve the linearized model, a tailor-made method for this special purpose has been proposed; see [18].
Acknowledgement. The authors gratefully acknowledge the partial support of the Croucher Foundation of Hong Kong and the City Polytechnic of Hong Kong (grant 700308).
References

[1] R. H. Byrd and R. B. Schnabel, Continuity of the null space basis and constrained optimization, Mathematical Programming 35 (1986) 32-41.

[2] T. F. Coleman and A. R. Conn, Nonlinear programming via an exact penalty function: Asymptotic analysis, Mathematical Programming 24 (1982) 123-136.

[3] T. F. Coleman and A. R. Conn, Nonlinear programming via an exact penalty function: Global analysis, Mathematical Programming 24 (1982) 137-161.

[4] T. F. Coleman and D. C. Sorensen, A note on the computation of an orthogonal basis for the null space of a matrix, Mathematical Programming 29 (1984) 234-242.

[5] J. E. Dennis and J. J. Moré, A characterization of superlinear convergence and its application to quasi-Newton methods, Mathematics of Computation 28 (1974) 549-560.

[6] R. Fletcher, Practical Methods of Optimization, Wiley, Chichester, (1987).

[7] R. Fletcher and M. J. D. Powell, On the modification of $LDL^T$ factorizations, Mathematics of Computation 28 (1974) 1067-1087.
[8] R. Fletcher and E. Sainz de la Maza, Nonlinear programming and nonsmooth optimization by successive linear programming, Mathematical Programming 43 (1989) 235-256.

[9] R. Fontecilla, Local convergence of secant methods for nonlinear constrained optimization, SIAM Journal on Numerical Analysis 25 (1988) 697-712.

[10] R. Fontecilla, T. Steihaug and R. A. Tapia, A convergence theory for a class of quasi-Newton methods for constrained optimization, SIAM Journal on Numerical Analysis 24 (1987) 1133-1151.

[11] W. M. Gentleman, Least squares computations by Givens transformations without square roots, Journal of the Institute of Mathematics and its Applications 12 (1973) 329-336.

[12] C. B. Gurwitz and M. L. Overton, Sequential quadratic programming methods based on approximating a projected Hessian matrix, SIAM Journal on Scientific and Statistical Computing 10 (1989) 631-653.
[13] J. Nocedal and M. L. Overton, Projected Hessian updating algorithms for nonlinearly constrained optimization, SIAM Journal on Numerical Analysis 22 (1985) 821-850.

[14] M. R. Osborne, Finite Algorithms in Optimization and Data Analysis, Wiley, Chichester, (1985).

[15] M. J. D. Powell, The convergence of variable metric methods for nonlinearly constrained optimization calculations, in Nonlinear Programming 3, O. L. Mangasarian, R. Meyer and S. Robinson eds., Academic Press, New York, (1978) 27-63.

[16] E. Sainz de la Maza, Nonlinear programming algorithms based on $l_1$ linear programming and reduced Hessian approximations, Ph.D. Thesis, Department of Mathematical Sciences, University of Dundee, (1987).

[17] R. S. Womersley, Local properties of algorithms for minimizing nonsmooth composite functions, Mathematical Programming 32 (1985) 69-89.

[18] C. Xu and J. Zhang, An active set method for general $l_1$ linear problem subject to box constraints, to appear in Optimization.
A Successive Approximation for NCP   459
Recent Advances in Nonsmooth Optimization, pp. 459-472 Eds. D.-Z. Du, L. Qi and R.S. Womersley ©1995 World Scientific Publishing Co Pte Ltd
A Successive Approximation Quasi-Newton Process for Nonlinear Complementarity Problem
Shu-zi Zhou, Dong-hui Li and Jin-ping Zeng
Department of Applied Mathematics, University of Hunan, Changsha, Hunan 410082, PRC
Abstract
In recent years, several versions of the damped Newton method for solving the nonlinear complementarity problem have been proposed based on its equivalent nonsmooth equations. Global convergence is well established. As for quasi-Newton methods, local and semi-local convergence has also been proved. In this paper, we study the global convergence of Broyden-like methods on the basis of the successive approximation Newton process given in [1]. A new line search technique is introduced here. Under suitable conditions, we get the global convergence of the method. Numerical results are also given in the paper.
1
Introduction
We consider the following nonlinear complementarity problem of finding an $x \in R^n$ such that
$$x \ge 0, \quad F(x) \ge 0, \quad x^T F(x) = 0, \quad (1.1)$$

*The research is supported by the NNSF of P. R. China.
S.Z. Zhou, D. H. Li and J. P. Zeng
460
where $F$ is a mapping from $R^n$ into itself. Problem (1.1) is abbreviated as NCP(F). Numerical methods for solving NCP(F) have developed rapidly, and many algorithms for solving nonlinear equations have been applied to solve NCP(F) (see [14] and [16] for details). Among these methods, great attention is paid to the linearized Newton and quasi-Newton methods because of their local quadratic and superlinear convergence, respectively (see [10], [16] and [22]).

Since the 1990's, a new path for solving NCP(F) has appeared: the NCP(F) is transformed into equivalent nonsmooth equations to be solved. Various kinds of damped Newton methods with global convergence have been proposed (see [6], [8], [14], [15], [19] and [20]). The first one may be due to [14]. It reduces (1.1) to the following equivalent so-called B-differentiable equations:
$$H(x) = \min\{x, F(x)\} = 0. \quad (1.2)$$
Replacing the F-derivative by the B-derivative in the traditional damped Newton method for solving nonlinear smooth equations, [14] establishes a damped Newton method for (1.2). Another important type of damped Newton method for solving (1.1) belongs to [19]. It changes (1.1) into another equivalent nonsmooth equation of the following form:
$$H(x) = f_k(x) + g_k(x) = 0, \quad (1.3)$$
where $f_k$ and $g_k$ are mappings from $R^n$ into itself, with $f_k$ being F-differentiable and $g_k$ not, but relatively small. A so-called successive approximation damped Newton method is proposed there, and the numerical results show that it is a useful method. As for the corresponding quasi-Newton methods, only [9] gives a local convergence theorem, based on [14]. It is not clear whether global convergence is true, though the numerical results in [9] show the possibility. In our paper, we study the global convergence of Broyden-like methods. A successive approximation scheme related to [19] is established, and a new line search technique is introduced. Under suitable conditions, we get the global convergence of the method.

In the following section, we give a simple review of the successive approximation Newton method described in [19]. Based on this, we propose the corresponding quasi-Newton method in Section 3; a new line search process is also stated in that section. In Section 4, particular attention is paid to the Broyden-like method, and global convergence is discussed. At last, some numerical results are given in Section 5, which show the usefulness of the algorithm.
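The reformulation (1.2) is easy to check numerically; the map $F$ below is a small illustrative example, not one of the test problems of this paper:

```python
import numpy as np

def ncp_residual(x, F):
    """H(x) = min{x, F(x)} componentwise.  H(x) = 0 exactly when
    x >= 0, F(x) >= 0 and x^T F(x) = 0, i.e. x solves the NCP."""
    return np.minimum(x, F(x))

# illustrative linear map: F(x) = x - a with a = (1, -1)
F = lambda x: x - np.array([1.0, -1.0])
# x* = (1, 0): x* >= 0, F(x*) = (0, 1) >= 0 and x*^T F(x*) = 0
x_star = np.array([1.0, 0.0])
```

A root of `ncp_residual` is a solution of the complementarity problem, which is why damped Newton methods are applied to this (nonsmooth) equation.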
2
Review of the Successive Approximation Newton Method
In this section, we state the algorithm. For this purpose, we first give the concrete decomposition form of (1.1) described in [19] as follows. Let $\epsilon_k$ be a given positive number sequence. $\bar f_k(x)$, $\bar g_k(x)$ and $f_k(x)$, $g_k(x)$ are mappings from $R^n$ into itself with components $[\bar f_k(x)]_i$, $[\bar g_k(x)]_i$ and $[f_k(x)]_i$, $[g_k(x)]_i$, defined by the following:
$$[\bar f_k(x)]_i = \frac{x_i + F_i(x)}{2} - \frac{(x_i - F_i(x))^2}{4\epsilon_k} - \frac{\epsilon_k}{4}, \qquad [\bar g_k(x)]_i = \min\{x_i, F_i(x)\} - [\bar f_k(x)]_i = \frac{(|x_i - F_i(x)| - \epsilon_k)^2}{4\epsilon_k}, \quad (2.2)$$
and
$$[f_k(x)]_i = \begin{cases} \min\{x_i, F_i(x)\}, & \text{if } |x_i - F_i(x)| > \epsilon_k, \\ [\bar f_k(x)]_i, & \text{if } |x_i - F_i(x)| \le \epsilon_k, \end{cases} \quad (2.3)$$
$$[g_k(x)]_i = \begin{cases} 0, & \text{if } |x_i - F_i(x)| > \epsilon_k, \\ [\bar g_k(x)]_i, & \text{if } |x_i - F_i(x)| \le \epsilon_k. \end{cases} \quad (2.4)$$
Then the solutions of (1.1) coincide with those of the following nonlinear equations:
$$H(x) = \min\{x, F(x)\} = f_k(x) + g_k(x) = 0, \quad (2.5)$$
where $f_k$ and $g_k$ are defined by (2.3) and (2.4) respectively. It is not difficult to see that
$$\|g_k\| = \sup_x \|g_k(x)\| \le \frac{\sqrt{n}}{4}\epsilon_k, \quad (2.6)$$
and $f_k$ is continuously differentiable with Jacobian determined by $f_k'(x) = ([\nabla f_k(x)]_1, \ldots, [\nabla f_k(x)]_n)^T$, where
$$[\nabla f_k(x)]_i = \begin{cases} e_i, & \text{if } x_i < F_i(x) - \epsilon_k, \\ \nabla F_i(x), & \text{if } F_i(x) < x_i - \epsilon_k, \\ \dfrac{x_i - F_i(x)}{2\epsilon_k}\big(\nabla F_i(x) - e_i\big) + \dfrac{1}{2}\big(\nabla F_i(x) + e_i\big), & \text{otherwise,} \end{cases} \quad (2.7)$$
and $e_i$ denotes the $i$-th coordinate vector. Let
$$\theta(x) = \frac{1}{2}\|H(x)\|^2, \quad (2.8)$$
$$\theta_k(x) = \frac{1}{2}\|f_k(x)\|^2. \quad (2.9)$$
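The decomposition (2.2)-(2.4) can be verified numerically; the sketch below checks that $f_k + g_k$ reproduces $\min\{x, F(x)\}$ and that $g_k$ is uniformly small (of order $\epsilon_k/4$ componentwise):

```python
import numpy as np

def decompose(x, Fx, eps):
    """Split H(x) = min{x, F(x)} into f_k + g_k as in (2.2)-(2.4):
    outside the band |x_i - F_i| > eps, f_k is the min itself and
    g_k = 0; inside the band, f_k is the smooth quadratic interpolant
    and g_k = (|x_i - F_i| - eps)^2 / (4 eps)."""
    t = x - Fx
    f_bar = 0.5 * (x + Fx) - t**2 / (4 * eps) - eps / 4
    g_bar = (np.abs(t) - eps)**2 / (4 * eps)
    inside = np.abs(t) <= eps
    fk = np.where(inside, f_bar, np.minimum(x, Fx))
    gk = np.where(inside, g_bar, 0.0)
    return fk, gk
```

The componentwise bound $0 \le [g_k]_i \le \epsilon_k/4$ (attained at $x_i = F_i(x)$) is what makes the perturbation $g_k$ "relatively small" in the sense used throughout the paper.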
Then the successive approximation Newton method (SANM) in [19] can be stated as follows.

Algorithm 1 (SANM). Given $\rho, \sigma \in (0,1)$, an initial vector $x_0 \in R^n$ and a positive number $\epsilon_0 \le \alpha\|H(x_0)\|$, where $0 < \alpha < 1 - \sigma$. Let $k = 0$.

1°. Solve the following equations to get $p_k$:
$$H(x_k) + f_k'(x_k)p_k = 0. \quad (2.10)$$

2°. Set $x_{k+1} = x_k + \rho^{m_k} p_k$, where $m_k$ is the smallest nonnegative integer $m$ such that
$$\theta_k(x_k + \rho^m p_k) - \theta_k(x_k) \le -2\sigma\rho^m \theta(x_k). \quad (2.11)$$

3°. If $\epsilon_k \le \alpha\|H(x_{k+1})\|$, let $\epsilon_{k+1} = \epsilon_k$; otherwise, define $\epsilon_{k+1}$ so that
$$\epsilon_{k+1} \le \min\left\{\alpha\|H(x_{k+1})\|,\ \tfrac{1}{2}\|g_k\|\right\}; \quad (2.12)$$
$k := k + 1$, go to Step 1°.

Under suitable conditions, [19] obtained the global convergence of the above algorithm.
Successive Approximation Quasi-Newton Method (SAQNM)
For simplicity, in the rest of the paper we abbreviate $(x_k)_i$, $(F(x_k))_i$, etc. as $x_i^k$, $F_i^k$ respectively. Noticing the component expression of $f_k'(x)$, we can rewrite (2.10) as
$$\begin{cases} x_i^k + p_i^k = 0, & \text{for } i \in A_k(x_k); \\ F_i^k + (\nabla F_i^k)^T p_k = 0, & \text{for } i \in E_k(x_k); \\ H_i^k + \dfrac{x_i^k - F_i^k}{2\epsilon_k}\,(\nabla F_i^k - e_i)^T p_k + \dfrac{1}{2}(\nabla F_i^k + e_i)^T p_k = 0, & \text{for } i \in C_k(x_k), \end{cases} \quad (3.1)$$
where
$$A_k(x) = \{i \mid x_i < F_i(x) - \epsilon_k\}, \quad E_k(x) = \{i \mid F_i(x) < x_i - \epsilon_k\}, \quad C_k(x) = \{i \mid |x_i - F_i(x)| \le \epsilon_k\}.$$
To establish the corresponding quasi-Newton methods, we substitute $F'(x_k)$ in (3.1) by a matrix $B_k$. That is to say, $p_k$ satisfies the following linear equations:
$$\begin{cases} x_i^k + p_i^k = 0, & \text{for } i \in A_k(x_k); \\ F_i^k + (B_i^k)^T p_k = 0, & \text{for } i \in E_k(x_k); \\ H_i^k + \dfrac{x_i^k - F_i^k}{2\epsilon_k}\,(B_i^k - e_i)^T p_k + \dfrac{1}{2}(B_i^k + e_i)^T p_k = 0, & \text{for } i \in C_k(x_k), \end{cases} \quad (3.2)$$
where $(B_i^k)^T$ is the $i$-th row of $B_k$. However, for such a $p_k$, the line search process in Algorithm 1 may fail to be fulfilled: $p_k$ generated by (3.2) may be an ascent direction of $\theta_k$ at $x_k$. For this reason, a new line search technique should be introduced. To do so, we define
$$q_k(\lambda) = \frac{f_k(x_k)^T \big[f_k(x_k) - f_k(x_k + \lambda p_k)\big]}{\max\big\{\|f_k(x_k) - f_k(x_k + \lambda p_k)\|^2,\ \|\lambda p_k\|^2\big\}}. \quad (3.3)$$
= (H(xk) cc)\\H{xk)f
gk{xk))TH(xk)
Since a < 1, fk(xk)Tf'k(xk)pk = 0 implies that xk is a solution of H(x) = 0. Hence in this case the algorithm terminates.
464
S.Z. Zhou, D. H. Li and J. P. Zeng
Now, we state SAQNM as follows: Algorithm 3. (SAQNM) Given p,ot G (0,1), i 0 6 i f , initial nonsingular matrix Bo- Let e0 < 3rll#( x o)||, k := 0. 1°. Solve (3.2) to get pk. 2°. Determine A* by algorithm 2. 3°. Set xk+1 = xk + \kPk4°. Update Bk to get
Bk+l.
5°. If tk < a||//(ifc + i)||, we let ek+\ = ek; otherwise, define e^+i such that (2.12) holds. Remark. In step 2°, we take / _ i ( i ) = fo(x)-
4
Global Convergence of Broyden-like M e t h o d s
In this section, we restrict our attention to Broyden-like methods, i.e. in the step 4° of Algorithm 3, Bk+i takes the form of: B R - D i A (g* ~ ^k)sj Bk+1-£)k +
(4.1)
where sk = xk+i —xk = Xkpk, yk = Fk+i — Fk, 4>k is chosen such that for some constant <j> G (0,1), when Bk is nonsingular, Bk+i is also nonsingular and I0t-l|
(4.2)
A concrete choice of such (f>k and 4> can be seen in [23]. Let A be a n x n matrix, ||A||f denotes its Frobenius norm denned by \\A\\l = Tr(ATA),
(4.3)
The following lemma is useful for the proof of the global convergence of Broyden-like methods.
A Successive Approximation L e m m a 1. rank L, i.e.
for NCP
465
Let F : R" -> RJ1 be continuously differentiable and F' be Lipschitz of \\F'(x)-F'(y)\\F
Vx.y.
(4.4)
{St} is updated by (4.1) and (4.2) with B0 nonsingular. If OO
£ ii*fc+i - **na < °o,
(4-5)
fc=0
then
HmiEtlw-^.IlVlKII^O. °°
(4.6)
fc=0
Proof. Denote
$$G_{k+1} = \int_0^1 F'(x_k + \tau(x_{k+1} - x_k))\, d\tau. \quad (4.7)$$
Then $y_k = G_{k+1}s_k$ by the mean-value theorem (see [13]). Therefore, it follows from (4.4) that
$$\|G_{k+1} - G_k\|_F \le L \int_0^1 \|x_k + \tau s_k - x_{k-1} - \tau s_{k-1}\|\, d\tau,$$
and then
$$\|G_{k+1} - G_k\|_F \le L(\|s_{k-1}\| + \|s_k\|). \quad (4.8)$$
Let
$$a_k = \|B_k - G_k\|_F, \qquad b_k = \|G_{k+1} - G_k\|_F, \qquad \sigma_k = \|y_k - B_k s_k\| / \|s_k\|. \quad (4.9)$$
We claim that
$$\sum_{k=0}^{\infty} b_k^2 < \infty, \quad (4.10)$$
which follows from (4.8) and (4.5). By the update formulae (4.1) and (4.2), we deduce
$$B_{k+1} - G_{k+1} = B_k - G_{k+1} - \phi_k (B_k - G_{k+1}) \frac{s_k s_k^T}{s_k^T s_k} = (B_k - G_{k+1})\Big(I - \phi_k \frac{s_k s_k^T}{s_k^T s_k}\Big),$$
where $I$ is the identity matrix. Taking Frobenius norms on both sides of the above equation, we get
$$a_{k+1}^2 \le \|B_k - G_{k+1}\|_F^2 - (1 - \bar\phi^2)\sigma_k^2. \quad (4.11)$$
It implies
$$a_{k+1} \le a_k + b_k, \qquad \text{and hence} \qquad a_k \le a_0 + \sum_{j=0}^{k-1} b_j. \quad (4.12)$$
By (4.11), we deduce that
$$(1 - \bar\phi^2)\sigma_k^2 \le (a_k + b_k)^2 - a_{k+1}^2 = a_k^2 - a_{k+1}^2 + 2a_k b_k + b_k^2.$$
It follows from (4.12) that
$$(1 - \bar\phi^2)\sigma_k^2 \le a_k^2 - a_{k+1}^2 + 2\Big(a_0 + \sum_{j=0}^{k-1} b_j\Big) b_k + b_k^2.$$
Summing the above inequalities from $k = 0$ to $k = l - 1$, we get
$$(1 - \bar\phi^2) \sum_{k=0}^{l-1} \sigma_k^2 \le a_0^2 + 2\sum_{k=0}^{l-1} b_k \Big(a_0 + \sum_{j=0}^{k-1} b_j\Big) + \sum_{k=0}^{l-1} b_k^2. \quad (4.13)$$
By elementary calculation, (4.13) implies that for all $m \le l - 1$,
$$(1 - \bar\phi^2) \sum_{k=0}^{l-1} \sigma_k^2 \le 2\Big(a_0 + \sum_{k=0}^{m-1} b_k\Big)^2 + 2\Big(\sum_{k=m}^{l-1} b_k\Big)^2 + \sum_{k=0}^{l-1} b_k^2$$
and
$$(1 - \bar\phi^2) \sum_{k=0}^{l-1} \sigma_k^2 \le 2\Big(a_0 + \sum_{k=0}^{m-1} b_k\Big)^2 + 2(l - m) \sum_{k=m}^{l-1} b_k^2 + \sum_{k=0}^{l-1} b_k^2.$$
Dividing by $l$ and taking the limit in the last inequality, it is not difficult to see that
$$\lim_{l \to \infty} \frac{1 - \bar\phi^2}{l} \sum_{k=0}^{l-1} \sigma_k^2 \le 2 \sum_{k=m}^{\infty} b_k^2.$$
By the arbitrariness of $m$, together with (4.10), we complete the proof. Q.E.D.

Now we turn to prove the global convergence of Algorithm 3.
Theorem 2. Let the level set
$$\Omega = \{x \mid \theta(x) \le \theta(x_0)\}$$
be bounded, $F \in C^1(\Omega)$ and (4.4) hold. If $\{x_k\}$ is generated by Algorithm 3 with $B_k$ updated by (4.1) and (4.2), then $\{x_k\} \subset \Omega$. Furthermore, if $f_k'(x)$ is nonsingular at every accumulation point of $\{x_k\}$, then any accumulation point of $\{x_k\}$ solves NCP(F).

Proof. We may prove that $\{x_k\} \subset \Omega$ by induction. Denote $K = \{0\} \cup \{k \mid \epsilon_k \le \alpha\|H(x_{k+1})\|\} = \{k_0, k_1, k_2, \cdots\}$, where $k_0 = 0 < k_1 < k_2 < \cdots$. If $K$ is infinite, then by the same discussion as in the first part of the proof of Theorem 1 in [19], the conclusion is true. Now, we assume that $K$ is finite and wish to deduce a contradiction. By Step 5° of Algorithm 3, there is an index $k^*$ such that when $k \ge k^*$, $f_k$ and $g_k$ are independent of $k$. Without loss of generality, we let $f_k(x) = f(x)$, $g_k(x) = g(x)$, $\forall k \ge 0$, and assume $g(x) \not\equiv 0$. So
$$\|H(x_k)\| \ge \frac{1}{\alpha}\|g\| > 0, \quad \forall k \ge 0. \quad (4.14)$$
From the line search steps, we can easily get $2q_k(\lambda_k) - 1 \ge 2\epsilon$. By the definition of $q_k(\lambda_k)$, this means that
$$2N_k(\lambda_k) - D_k(\lambda_k) \ge 2\epsilon D_k(\lambda_k). \quad (4.15)$$
But
$$2N_k(\lambda_k) = 2f(x_k)^T[f(x_k) - f(x_{k+1})] = \|f(x_k)\|^2 - \|f(x_{k+1})\|^2 + \|f(x_k) - f(x_{k+1})\|^2,$$
so that
$$2N_k(\lambda_k) - D_k(\lambda_k) \le \|f(x_k)\|^2 - \|f(x_{k+1})\|^2,$$
while
$$D_k(\lambda_k) = \max\big\{\|f(x_k) - f(x_{k+1})\|^2,\ \|\lambda_k p_k\|^2\big\} \ge \|x_{k+1} - x_k\|^2.$$
Therefore, we have
$$2\epsilon\|x_{k+1} - x_k\|^2 \le \|f(x_k)\|^2 - \|f(x_{k+1})\|^2,$$
which implies (4.5). Thus, by (4.6), there is a subsequence {<Jk}k£K' having the limit zero. We consider the corresponding subsequence {itJjteA"'Let Xk -> x,(k e K', k -► oo), where K' C K. From (2.7) and (3.2), we have
[H{xk) + f'{xk)pk]i= and for i 6
0, ,; (VFk-Bk)Tpk,
iii£Ak(zk); i(i€Ek(xk)
I ,
Ck(xk),
T P*VY7 Ck - B*) Dk\T„. j_}ifX71?k_ \H{xk) + /'(x,)p,], = —'' 'tf.* _- F*)CS7Ff + -(VF* Pk
DksT„ BkyPk.
Noticing that Vi € Ck(xk), \xk — Ftk\ < tk, we deduce from the above expressions that \\H(xk) + f'(xk)Pk\\
< \\(F'(xk) - Bk)pk\\.
(4.16)
On the other hand, let Gk+\ be defined by (4.7), then ^
_ \\(Bk
-F'(xk))Pk\ \\Pk\\
\\Gk+1-F'(xk)\\.
So, we claim that rjk —> 0, (k 6 A'', k —> oo). From this and (4.16), we have \\H(xk) + f'(xk)Pk\\/\\Pk\\^Q,
(k€K',k^oo).
(4.17)
Since f'{x) is nonsingular, f'(xk) is uniformly nonsingular for k S K' sufficiently large. From (4.17), there exist positive constants i/2 > i/\, such that for k € K' sufficiently large »A\H{xk)\\<\\pk\\
= 0.
(4.19)
In the following discussion, we wish to prove that for k 6 K' sufficiently large, \\k\ is bounded away from zero. So, by the fact that xk+i — xk tends to zero, we claim that p = 0, and thus (4.19) implies H(x) = 0, which contradicts with (4.14). Therefore K is infinite, and the conclusion is true.
To prove that $|\lambda_k|$ has a positive lower bound, it suffices (by the line search process) to show that there is a constant $t' \in (0, 1]$ such that $q_k(t) \ge 1/2 + \epsilon$ for every $t \in (0, t')$, and $q_k(t) \le 0$ for every $t \in (-t', 0)$. Indeed, when $k \in K'$ is sufficiently large, for every $t > 0$ we have
$$\begin{aligned} N_k(t) &= f(x_k)^T[f(x_k) - f(x_k + t p_k)] = -t f(x_k)^T f'(x_k) p_k + o(t) \\ &= t f(x_k)^T H(x_k) + o(t) = t\|H(x_k)\|^2 - t\, g(x_k)^T H(x_k) + o(t) \\ &\ge t\|H(x_k)\|^2 - t\|g\| \cdot \|H(x_k)\| + o(t) \ge t(1-\alpha)\|H(x_k)\|^2 + o(t). \end{aligned}$$
By the continuous differentiability of $f$, we conclude that there is a positive constant $\nu_0$ such that
$$\|f(x_k) - f(x_k + t p_k)\| \le \nu_0 t \|p_k\| \le \nu_0 \nu_2 t \|H(x_k)\|.$$
Again by the definition of $q_k(\lambda)$, for $k \in K'$ sufficiently large, we get
$$q_k(t) \ge \frac{t(1-\alpha)\|H(x_k)\|^2 + o(t)}{\max\{\nu_0^2, 1\}\, \nu_2^2\, t^2 \|H(x_k)\|^2}. \quad (4.20)$$
However, as (4.14) points out, $\|H(x_k)\| \ge \|g\|/\alpha > 0$, so (4.20) shows that there is $t' \in (0,1)$ such that when $t \in (0, t')$, $q_k(t) \ge 1/2 + \epsilon$. In a similar way, we can get that when $t \in (-t', 0)$, $q_k(t) \le 0$. Q.E.D.
5
Numerical Results
In this section, we give some numerical results for Algorithm 3. Two test functions for NCP(F) are as follows.

Problem 1 (see [10]).
$$F_1(x) = 3x_1^2 + 2x_1x_2 + 2x_2^2 + x_3 + 3x_4 - 6$$
$$F_2(x) = 2x_1^2 + x_1 + x_2^2 + 3x_3 + 2x_4 - 2$$
$$F_3(x) = 3x_1^2 + x_1x_2 + 2x_2^2 + 2x_3 + 3x_4 - 1$$
$$F_4(x) = x_1^2 + 3x_2^2 + 2x_3 + 3x_4 - 3.$$
The unique solution of the corresponding NCP(F) is $x^* = (\tfrac{1}{2}\sqrt{6}, 0, 0, \tfrac{1}{2})^T$, which is a nondegenerate solution.

Problem 2 (see [9]).
$$F_1(x) = 3x_1^2 + 2x_1x_2 + 2x_2^2 + x_3 + 3x_4 - 6$$
$$F_2(x) = 2x_1^2 + x_1 + x_2^2 + 10x_3 + 2x_4 - 2$$
$$F_3(x) = 3x_1^2 + x_1x_2 + 2x_2^2 + 2x_3 + 9x_4 - 9$$
$$F_4(x) = x_1^2 + 3x_2^2 + 2x_3 + 3x_4 - 3.$$
The corresponding NCP(F) has a degenerate solution $x_D^* = (\tfrac{1}{2}\sqrt{6}, 0, 0, \tfrac{1}{2})^T$ and a nondegenerate solution $x^* = (1, 0, 3, 0)^T$. Using the stopping criterion $\|H(x_k)\| \le 10^{-5}$, we give the iteration numbers in the following table.
Initial point   (1,1,1,1)   (1,0,1,0)   (1,-1,1,-1)   (0,1,1,0)
Problem 1           12           6           11            15
Problem 2           51          98           80            12
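The quoted solutions of Problem 2 can be checked directly against the definition (1.1); the helper below verifies both and distinguishes the degenerate one (an index $i$ with $x_i = F_i(x) = 0$):

```python
import numpy as np

def F2(x):
    """Test Problem 2 as reconstructed above."""
    x1, x2, x3, x4 = x
    return np.array([
        3*x1**2 + 2*x1*x2 + 2*x2**2 + x3 + 3*x4 - 6,
        2*x1**2 + x1 + x2**2 + 10*x3 + 2*x4 - 2,
        3*x1**2 + x1*x2 + 2*x2**2 + 2*x3 + 9*x4 - 9,
        x1**2 + 3*x2**2 + 2*x3 + 3*x4 - 3,
    ])

def is_ncp_solution(x, F, tol=1e-10):
    """x >= 0, F(x) >= 0 and x^T F(x) = 0, up to tol."""
    Fx = F(x)
    return x.min() >= -tol and Fx.min() >= -tol and abs(x @ Fx) <= tol

x_nd = np.array([1.0, 0.0, 3.0, 0.0])            # nondegenerate solution
x_d = np.array([np.sqrt(6)/2, 0.0, 0.0, 0.5])    # degenerate solution
```

Degeneracy is what makes $x_D^*$ the harder target for Newton-type methods, since the active set is not uniquely determined there.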
References

[1] C. G. Broyden, A class of methods for solving nonlinear simultaneous equations, Mathematics of Computation 19 (1965) 577-593.

[2] X. Chen, On the convergence of Broyden-like methods for nonlinear equations with nondifferentiable terms, Annals of the Institute of Statistical Mathematics 42 (1990) 387-401.

[3] X. Chen and L. Qi, A parameterized Newton method and a Broyden-like method for nonsmooth equations, Computational Optimization and Applications 3 (1994) 157-179.
[4] X. Chen and T. Yamamoto, Convergence domains of certain iterative methods for solving nonlinear equations, Numerical Functional Analysis and Optimization 10 (1989) 37-48.

[5] A. Griewank, The 'global' convergence of Broyden-like methods with a suitable line search, Journal of the Australian Mathematical Society, Ser. B 28 (1986) 75-92.

[6] S. P. Han, J. S. Pang and N. Rangaraj, Globally convergent Newton methods for nonsmooth equations, Mathematics of Operations Research 17 (1992) 586-607.

[7] P. T. Harker and J. S. Pang, Finite-dimensional variational inequality and nonlinear complementarity problems: a survey of theory, algorithms and applications, Mathematical Programming 48 (1990) 161-220.

[8] P. T. Harker and B. Xiao, Newton's method for the nonlinear complementarity problem: a B-differentiable equation approach, Mathematical Programming 48 (1990) 339-358.

[9] C. M. Ip and J. Kyparisis, Local convergence of quasi-Newton methods for B-differentiable equations, Mathematical Programming 56 (1992) 71-89.

[10] N. H. Josephy, Quasi-Newton methods for generalized equations, Technical Summary Report No. 1977, Mathematics Research Center, Madison, WI, 1979.

[11] M. Kojima and S. Shindo, Extension of Newton and quasi-Newton methods to systems of PC$^1$ equations, Journal of the Operations Research Society of Japan 29 (1986) 352-374.

[12] P. Marcotte and J. Dussault, A note on a globally convergent Newton method for solving monotone variational inequalities, Operations Research Letters 6 (1987) 35-42.

[13] J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, (1970).

[14] J. S. Pang, Newton's method for B-differentiable equations, Mathematics of Operations Research 15 (1990) 331-341.

[15] J. S. Pang, A B-differentiable equation-based, globally and locally quadratically convergent algorithm for nonlinear programs, complementarity and variational inequality problems, Mathematical Programming 51 (1991) 101-131.

[16] J. S. Pang and D. Chan, Iterative methods for variational and complementarity problems, Mathematical Programming 24 (1982) 284-313.

[17] J. S. Pang and L. Qi, Nonsmooth equations: motivation and applications, SIAM Journal on Optimization 3 (1993) 443-465.
[18] L. Qi, Convergence analysis of some algorithms for solving nonsmooth equations, Mathematics of Operations Research 18 (1993) 227-244.

[19] L. Qi and X. Chen, A globally convergent successive approximation method for severely nonsmooth equations, SIAM Journal on Control and Optimization (to appear).

[20] L. Qi and J. Sun, A nonsmooth version of Newton's method, Mathematical Programming 58 (1993) 353-368.

[21] K. Taji, M. Fukushima and T. Ibaraki, A globally convergent Newton method for solving strongly monotone variational inequalities, Mathematical Programming 58 (1993) 369-383.

[22] S. Z. Zhou and Q. R. Yan, A Kantorovich theorem for nonlinear complementarity problems, Chinese Science Bulletin 36 (1991).

[23] J. J. Moré and J. A. Trangenstein, On the global convergence of Broyden's method, Mathematics of Computation 30 (1976) 523-540.