Deep Cut Ellipsoid Algorithm

3 The Ellipsoid Method
In this section we assume the existence of an oracle that, for each convex function involved in the definition of (C), is able to provide at every point of its domain the function value and an element of the subgradient set. Thus, we will freely use $f(x)$ and $g(x)$ for different instances of $x$ and take elements of the sets $\partial f(x)$ and $\partial g(x)$. The main idea of the ellipsoid algorithm is the following. By Assumption 2.1 there exists a sphere with center $a_0$ containing an optimal solution. Subsequently an oracle (to be detailed) provides a hyperplane including the center of the sphere and with the property that the corresponding lower halfspace includes all the optimal solutions. Clearly, by Assumption 2.1 an optimal solution belongs to the intersection of the initial sphere and the lower halfspace, and this halfsphere has half the volume of the sphere. It is now possible to compute an ellipsoid with minimal volume containing this intersection. We then replace the initial sphere by the new ellipsoid and repeat the procedure. Depending on the information available at the present stage of the algorithm, it is convenient to take less than half of each ellipsoid whenever possible. We refer to the first type of cuts, going through the center of the ellipsoid, as central cuts, and to the second type of cuts, which leave the center of the ellipsoid in the part to be discarded, as deep cuts.
Figure 1: The central and the deep cut ellipsoid methods

In Figure 1 one can see the difference between the two types of cuts. The first picture represents a central cut on a sphere and the minimal volume ellipsoid that includes the half sphere. The second picture shows the same sphere and a cut parallel to the first but shifted halfway along the radius of the sphere. One can see in this case that the ellipsoid with minimal volume including this part of the sphere is indeed smaller than the one in the previous picture. The minimum volume ellipsoid required at each step is easy to compute once one finds such a (central or deep) cut. In order to recall how, we first need to introduce a mathematical description of an ellipsoid.

J. B. G. Frenk and J. Gromicho

A set $\mathcal{E} \subseteq \mathbb{R}^s$ is called an ellipsoid if there exists a vector $a \in \mathbb{R}^s$ and a positive definite $s \times s$ matrix $A$ such that
$$\mathcal{E} = \mathcal{E}(A;a) := \{x \in \mathbb{R}^s : (x-a)^T A^{-1}(x-a) \le 1\}.$$
Moreover, in order to determine whether a given hyperplane in $\mathbb{R}^s$ with normal $a^*$ intersects an ellipsoid $\mathcal{E}(A;a)$, we observe ([11]) that
$$\min\{a^{*T}x : x \in \mathcal{E}(A;a)\} = a^{*T}a - \sqrt{a^{*T}Aa^*} \qquad (1)$$
and
$$\max\{a^{*T}x : x \in \mathcal{E}(A;a)\} = a^{*T}a + \sqrt{a^{*T}Aa^*}. \qquad (2)$$
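Formulas (1) and (2) are easy to check numerically. The following small script is an illustrative sketch (ours, not from the text): it samples the boundary of a two-dimensional ellipsoid with a diagonal matrix $A$ and compares the sampled maximum of a linear function with the closed-form value $a^{*T}a + \sqrt{a^{*T}Aa^*}$.

```python
import math

# Illustrative check of formula (2) for s = 2 with a diagonal matrix A:
# max{ c^T x : x in E(A; a) } should equal c^T a + sqrt(c^T A c).
A = [[4.0, 0.0], [0.0, 1.0]]   # positive definite, diagonal (semi-axes 2 and 1)
a = [1.0, 2.0]                 # center of the ellipsoid
c = [1.0, 1.0]                 # normal vector a* of the hyperplane

closed_form = c[0]*a[0] + c[1]*a[1] + math.sqrt(A[0][0]*c[0]**2 + A[1][1]*c[1]**2)

# The boundary of E(A; a) is x(t) = a + (2 cos t, sin t), since A^(1/2) = diag(2, 1).
sampled = max(c[0]*(a[0] + 2.0*math.cos(t)) + c[1]*(a[1] + math.sin(t))
              for t in (2.0*math.pi*k/20000 for k in range(20000)))

assert sampled <= closed_form + 1e-9   # no boundary point exceeds the bound
assert closed_form - sampled < 1e-6    # and the bound is (numerically) attained
```

For this instance the closed form evaluates to $3 + \sqrt{5}$, and the sampled maximum agrees with it to the grid resolution.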
This implies that the hyperplane $\mathcal{L}^{\le}_h(\beta) := \{x \in \mathbb{R}^s : h(x) \le \beta\}$ with $h(x) := a^{*T}(x-a)$ has a nonempty intersection with $\mathcal{E}(A;a)$ whenever $-\sqrt{a^{*T}Aa^*} \le -\beta \le \sqrt{a^{*T}Aa^*}$. If this holds the hyperplane is called a valid cut. It is shown in [11] that for $-1/s \le \alpha < 1$ and $\alpha := -\beta/\sqrt{a^{*T}Aa^*}$ a minimum volume ellipsoid containing the intersection $\mathcal{E}(A;a) \cap \mathcal{L}^{\le}_h(\beta)$ exists, and this new ellipsoid has a strictly smaller volume than $\mathcal{E}(A;a)$. Moreover, for $\alpha = 1$ the intersection of the lower halfspace and the ellipsoid reduces to a single point and there is no need to proceed. To bring the exposition into the framework of an iterative algorithm, let us denote the current ellipsoid by $\mathcal{E}(A_m;a_m)$ and the corresponding cut by $\mathcal{L}^{\le}_{h_m}(\beta_m)$ with $h_m(x) := a_m^{*T}(x-a_m)$. Finally, denoting the depth of the current cut by
$$0 \le \alpha_m := \frac{-\beta_m}{\sqrt{a_m^{*T}A_m a_m^*}} \le 1,$$
one can show ([1]) that the ellipsoid $\mathcal{E}(A_{m+1};a_{m+1})$ with center given by
$$a_{m+1} := a_m - \tau_m b_m \qquad (3)$$
and matrix
$$A_{m+1} := \delta_m\left(A_m - \sigma_m b_m b_m^T\right) \qquad (4)$$
with updating values
$$\delta_m = \frac{s^2(1-\alpha_m^2)}{s^2-1}, \qquad \sigma_m = \frac{2(1+s\alpha_m)}{(s+1)(1+\alpha_m)}, \qquad \tau_m = \frac{1+s\alpha_m}{s+1}$$
and
$$b_m := \frac{A_m a_m^*}{\sqrt{a_m^{*T}A_m a_m^*}}$$
is the minimum volume ellipsoid containing $\mathcal{E}(A_m;a_m) \cap \mathcal{L}^{\le}_{h_m}(\beta_m)$. We note here that for $\alpha_m = 0$ (resp. $0 < \alpha_m < 1$) the hyperplane is called a valid central cut (resp. valid deep cut).
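As a concrete sketch (ours, not the authors' code), the update (3)-(4) can be written out directly. For a central cut ($\beta = 0$) on the unit disk in $\mathbb{R}^2$ with normal $a^* = (1,0)$ it reproduces the well-known minimum volume ellipsoid covering a half-disk, with center $(-1/3, 0)$ and matrix $\mathrm{diag}(4/9, 4/3)$.

```python
import math

def deep_cut_update(A, a, g, beta):
    """One ellipsoid update via formulas (3)-(4); cut is {x : g^T (x - a) <= beta}.

    A is the s x s matrix of the current ellipsoid E(A; a) (list of lists),
    a its center and g the cut normal a*_m. Sketch only: no validity checks."""
    s = len(a)
    Ag = [sum(A[i][j]*g[j] for j in range(s)) for i in range(s)]
    root = math.sqrt(sum(g[i]*Ag[i] for i in range(s)))    # sqrt(g^T A g)
    alpha = -beta / root                                    # depth of the cut
    b = [Ag[i]/root for i in range(s)]
    tau = (1.0 + s*alpha) / (s + 1.0)
    delta = s*s*(1.0 - alpha*alpha) / (s*s - 1.0)
    sigma = 2.0*(1.0 + s*alpha) / ((s + 1.0)*(1.0 + alpha))
    a_new = [a[i] - tau*b[i] for i in range(s)]
    A_new = [[delta*(A[i][j] - sigma*b[i]*b[j]) for j in range(s)] for i in range(s)]
    return A_new, a_new

# Central cut (beta = 0) on the unit disk with cut normal a* = (1, 0).
A1, a1 = deep_cut_update([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [1.0, 0.0], 0.0)
```

Here `A1` and `a1` should equal $\mathrm{diag}(4/9, 4/3)$ and $(-1/3, 0)$, as predicted by the formulas with $\alpha_m = 0$, $s = 2$.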
Taking the same matrix $Q$ as described on page 151 of [16] and copying, with some obvious modifications, the proofs of Propositions 2.7 and 2.8 of [16], one can show that $A_{m+1}$ is positive definite given that $\alpha_m^2 < 1$ and $A_m$ is positive definite. Since in this section cuts are generated by means of subgradients, the following result is useful to determine when these cuts are valid.
Lemma 3.1 Let $\varphi:\mathbb{R}^s\to\mathbb{R}$ be convex, $a^*\in\partial\varphi(a)$ with $a^*\neq 0$, and suppose there exists some $\bar{x}\in\mathcal{E}(A;a)$ with $\varphi(\bar{x})\le\ell\le\varphi(a)$. Then the hyperplane $\mathcal{L}^{\le}_h(\beta)$ with $h(x):=a^{*T}(x-a)$ and $\beta:=\ell-\varphi(a)$ is a valid cut.

Proof: Observe by the subgradient inequality, $\bar{x}\in\mathcal{E}(A;a)$ and (1) that
$$0\le\varphi(a)-\ell\le\varphi(a)-\varphi(\bar{x})\le -a^{*T}(\bar{x}-a)\le a^{*T}a-\min\{a^{*T}x : x\in\mathcal{E}(A;a)\}=\sqrt{a^{*T}Aa^*},$$
and hence, with $\beta:=\ell-\varphi(a)\le 0$, we obtain $-\sqrt{a^{*T}Aa^*}\le-\beta\le\sqrt{a^{*T}Aa^*}$, i.e., $\mathcal{L}^{\le}_h(\beta)$ is a valid cut. □
The ellipsoid algorithm is well known for the occurrence of numerical instabilities. These instabilities are related to the fact that each time a new ellipsoid is generated, it is not completely included in the previous one, bringing new points into consideration (see Figure 1). The inclusion of new points may induce the ellipsoid to elongate along one of its axes in such a way that eventually it may become "flat" along one of the other $s-1$ axes. An effect of the potential elongation of the ellipsoid is that the center of the present ellipsoid may end up extremely far away from the initial ellipsoid. In order to try to overcome this we introduce the concept of norm cuts. Such cuts appeared first in [8]. The idea is to invoke Assumption 2.1 at each iteration and not, as in the basic version, to use this information only to initialize the algorithm. As the next section shows, the inclusion of this new cut permits a very simple convergence proof. Thus, if it happens that the center $a_m$ of the current ellipsoid is outside the first ellipsoid, then a cut is generated using the function $n(x) := \|x - a_0\|_2$. A geometrical interpretation of this cut is given in Figure 2. Before discussing the detailed steps of how to implement the oracles yielding the different cuts, we first list the improved version of the algorithm.
Step 0: let $m := 0$, $A_0 := r^2 I$ and $\ell_{-1} := +\infty$;
Step 1: if $a_m$ satisfies some stopping criterion then stop, else go to Step 2;
Step 2: if $n(a_m) > r$ then [apply a norm cut], else if $g(a_m) > 0$ then [apply a constraint cut], else [apply an objective cut];
Step 3: [update the ellipsoid], let $m := m+1$ and return to Step 1.
The above algorithm requires the specification of four different procedures marked as framed statements. To start with a norm cut we observe the following. It follows by Lemma 2.3 and Assumption 2.1 that $x^* \in \mathcal{E}(A_0;a_0) = \mathcal{L}^{\le}_n(r) \subseteq \mathcal{L}^{\le}_{h_m}(r - n(a_m))$ with $h_m(x) := \nabla n(a_m)^T(x - a_m)$ and $\nabla n(a_m) = (a_m - a_0)/\|a_m - a_0\|_2$. Consequently the optimal point $x^*$ belongs to the lower halfspace $\mathcal{L}^{\le}_{h_m}(\beta_m)$ with $\beta_m := r - n(a_m)$. The validity of this norm cut now follows from $x^* \in \mathcal{E}(A_m;a_m)$ and Lemma 3.1, and so the norm cut reduces to the following procedure:
$$\text{let } \alpha_m := \frac{n(a_m) - r}{\sqrt{\nabla n(a_m)^T A_m \nabla n(a_m)}}.$$
Clearly, after performing a norm cut it follows that $x^* \in \mathcal{E}(A_{m+1};a_{m+1})$. In order to apply a constraint cut, we observe by Lemma 2.3 that $\mathcal{L}^{\le}_g(0) \subseteq \mathcal{L}^{\le}_{h_m}(-g(a_m))$ with $h_m(x) := a_m^{*T}(x - a_m)$ for some nonzero $a_m^* \in \partial g(a_m)$, and hence $x^* \in \mathcal{L}^{\le}_g(0) \subseteq \mathcal{L}^{\le}_{h_m}(\beta_m)$ with $\beta_m := -g(a_m)$. Moreover, by $x^* \in \mathcal{E}(A_m;a_m)$ and Lemma 3.1 the hyperplane $\mathcal{L}^{\le}_{h_m}(\beta_m)$ is a valid cut. Clearly it is a valid deep cut, since we assume that $g(a_m) > 0$, and so the constraint cut reduces to the next procedure:
$$\text{take } a_m^* \in \partial g(a_m); \quad \text{let } \alpha_m := \frac{g(a_m)}{\sqrt{a_m^{*T} A_m a_m^*}}.$$
Observe, since $\mathcal{L}^{\le}_g(0)$ is nonempty and $g(a_m) > 0$, that $a_m^*$ cannot be zero. It is also clear after performing a constraint cut that $x^* \in \mathcal{E}(A_{m+1};a_{m+1})$. Moreover, to apply an objective cut we observe the following. Since $f$ is finite and convex on $\mathbb{R}^s$, it follows for every $x \in \mathcal{L}^{\le}_g(0)$ that the subgradient set $\partial f(x)$ is nonempty ([19, 12]), and hence for every $a_m^* \in \partial f(a_m)$ the so-called subgradient inequality $f(x) \ge f(a_m) + h_m(x)$ holds with $h_m(x) := a_m^{*T}(x - a_m)$. Observe, if $a_m^* = 0$ then $a_m$ is optimal and therefore there is no need for a cut. For a derivation of a deep or central valid cut with respect to $f$, introduce $\ell_m := \min\{f(a_k) : g(a_k) \le 0,\ 0 \le k \le m\}$ and observe by Lemma 2.3 that $x^* \in \mathcal{L}^{\le}_f(\ell_m) \subseteq \mathcal{L}^{\le}_{h_m}(\ell_m - f(a_m))$.
Hence $x^*$ must belong to the lower halfspace $\mathcal{L}^{\le}_{h_m}(\beta_m)$ with $\beta_m := \ell_m - f(a_m)$. The validity of this cut follows again from $x^* \in \mathcal{E}(A_m;a_m)$ and Lemma 3.1. Clearly this is a valid deep cut whenever $\ell_m < f(a_m)$, and this deep cut can be derived using only negligible additional computational effort. The objective cut now reduces to the next procedure:
$$\text{if } f(a_m) < \ell_{m-1} \text{ then let } \ell_m := f(a_m), \text{ else let } \ell_m := \ell_{m-1};$$
$$\text{take } a_m^* \in \partial f(a_m); \text{ if } a_m^* = 0 \text{ then stop, else let } \alpha_m := \frac{f(a_m) - \ell_m}{\sqrt{a_m^{*T} A_m a_m^*}}.$$
It is also clear after performing an objective cut that $x^* \in \mathcal{E}(A_{m+1};a_{m+1})$. Finally, the update of the ellipsoid in Step 3 is done by applying formula (3) to the center and formula (4) to the matrix. This concludes the description of the four procedures necessary to complete the ellipsoid algorithm. The technique of generating deep cuts using the subgradient inequality was first proposed by Shor and Gershovich in [25]. In the next section we give an easy and geometrically oriented rate of convergence proof for this version of the ellipsoid algorithm.
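The four procedures above can be combined into a compact sketch of the whole method. The following toy implementation is ours (not the authors' code): it works in two dimensions, takes (sub)gradients from the caller, uses only an iteration limit as stopping rule, and minimizes $f(x) = (x_1-1)^2 + (x_2-1)^2$ subject to $g(x) = x_1 + x_2 - 1 \le 0$, for which the optimal value is $1/2$ at $(1/2, 1/2)$.

```python
import math

def deep_cut_ellipsoid(f, df, g, dg, a0, r, iters=300):
    """Toy deep cut ellipsoid method in R^2 with norm, constraint and objective cuts.

    f, g are convex with (sub)gradients df, dg; an optimal solution lies in a0 + r*B.
    Returns the best objective value ell found at feasible centers. Sketch only."""
    s = 2
    A = [[r*r, 0.0], [0.0, r*r]]
    a = list(a0)
    ell = float('inf')
    for _ in range(iters):
        dist0 = math.hypot(a[0] - a0[0], a[1] - a0[1])
        if dist0 > r:                          # norm cut
            grad = [(a[0]-a0[0])/dist0, (a[1]-a0[1])/dist0]
            beta = r - dist0
        elif g(a) > 0.0:                       # constraint cut
            grad, beta = dg(a), -g(a)
        else:                                  # objective cut
            ell = min(ell, f(a))
            grad, beta = df(a), ell - f(a)
        Ag = [A[0][0]*grad[0] + A[0][1]*grad[1], A[1][0]*grad[0] + A[1][1]*grad[1]]
        gAg = grad[0]*Ag[0] + grad[1]*Ag[1]
        if gAg <= 0.0:                         # numerical breakdown: ellipsoid collapsed
            break
        root = math.sqrt(gAg)
        alpha = -beta / root                   # depth of the cut, in [0, 1) for valid cuts
        if alpha >= 1.0:
            break                              # remaining set is (at most) a single point
        b = [Ag[0]/root, Ag[1]/root]
        tau = (1.0 + s*alpha) / (s + 1.0)
        delta = s*s*(1.0 - alpha*alpha) / (s*s - 1.0)
        sigma = 2.0*(1.0 + s*alpha) / ((s + 1.0)*(1.0 + alpha))
        a = [a[0] - tau*b[0], a[1] - tau*b[1]]
        A = [[delta*(A[i][j] - sigma*b[i]*b[j]) for j in range(2)] for i in range(2)]
    return ell

# minimize (x1-1)^2 + (x2-1)^2 s.t. x1 + x2 <= 1; the optimum is 0.5 at (0.5, 0.5).
best = deep_cut_ellipsoid(
    f=lambda x: (x[0]-1.0)**2 + (x[1]-1.0)**2,
    df=lambda x: [2.0*(x[0]-1.0), 2.0*(x[1]-1.0)],
    g=lambda x: x[0] + x[1] - 1.0,
    dg=lambda x: [1.0, 1.0],
    a0=[0.0, 0.0], r=4.0)
```

The record value `best` converges linearly to $0.5$, in line with the rate result of the next section.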
4 The Proof
To start this proof we need the following result.

Lemma 4.1 For each matrix $A \in \mathbb{R}^{s\times s}$ with $\det(A) \neq 0$ and each pair of vectors $a, b \in \mathbb{R}^s$ we have $\det(A + ab^T) = (1 + b^T A^{-1}a)\det(A)$.

Proof: Observe by well-known properties of determinants ([13]) that
$$\det(A + ab^T) = \det\begin{pmatrix} 1 & b^T \\ -a & A \end{pmatrix} = (1 + b^T A^{-1} a)\det(A),$$
where the first equality follows by taking the Schur complement of the $(1,1)$ entry and the second by taking the Schur complement of the block $A$, which completes the proof. □
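Lemma 4.1 (the matrix determinant lemma) is easy to verify on a concrete instance; the following check is ours, using a $2 \times 2$ example with all quantities computed by hand-coded formulas.

```python
# Check det(A + a b^T) = (1 + b^T A^{-1} a) det(A) on a 2x2 example.
A = [[2.0, 1.0], [0.0, 3.0]]
a = [1.0, 2.0]
b = [3.0, -1.0]

detA = A[0][0]*A[1][1] - A[0][1]*A[1][0]                 # det(A) = 6
Ainv = [[ A[1][1]/detA, -A[0][1]/detA],
        [-A[1][0]/detA,  A[0][0]/detA]]
Ainv_a = [Ainv[0][0]*a[0] + Ainv[0][1]*a[1],
          Ainv[1][0]*a[0] + Ainv[1][1]*a[1]]
rhs = (1.0 + b[0]*Ainv_a[0] + b[1]*Ainv_a[1]) * detA     # (1 + b^T A^{-1} a) det(A)

M = [[A[i][j] + a[i]*b[j] for j in range(2)] for i in range(2)]  # A + a b^T
lhs = M[0][0]*M[1][1] - M[0][1]*M[1][0]                  # det(A + a b^T)
```

Both sides evaluate to $5$ for this instance.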
We now assume that the ellipsoid algorithm has already performed $m$ steps, $m = 1,2,\dots$, with centers $a_k$, $0 \le k \le m$, and that no optimality check or stopping rule was applied.
We may assume without loss of generality that $0 \le \alpha_k < 1$ and $a_k^* \neq 0$ for every $0 \le k < m$, since $\alpha_k = 1$ would mean that $\mathcal{E}(A_{k+1};a_{k+1})$ reduces to a point, and $a_k^* = 0$, only possible for the objective function $f$, would make the algorithm stop. The following result is fundamental for our proof. For the definition of the sequences $\ell_m$ and $\beta_m$ and the functions $n$ and $h_m$ we refer to the previous subsection.

Lemma 4.2 For each $m \ge 0$
$$\mathcal{L}^{\le}_f(\ell_m) \cap \mathcal{L}^{\le}_g(0) \cap \mathcal{L}^{\le}_n(r) \subseteq \mathcal{E}(A_m;a_m) \cap \mathcal{L}^{\le}_{h_m}(\beta_m)$$
holds.

Proof: Recall first that $\mathcal{L}^{\le}_n(r) = \mathcal{E}(A_0;a_0)$. Now observe that in each iteration the ellipsoid algorithm applies either an objective, a constraint or a norm cut. By Lemma 2.3 we obtain in case of an objective cut that $\mathcal{L}^{\le}_f(\ell_m) \subseteq \mathcal{L}^{\le}_{h_m}(\beta_m)$, while a constraint cut satisfies $\mathcal{L}^{\le}_g(0) \subseteq \mathcal{L}^{\le}_{h_m}(\beta_m)$ and a norm cut $\mathcal{L}^{\le}_n(r) \subseteq \mathcal{L}^{\le}_{h_m}(\beta_m)$. The result now follows easily by induction, since $\ell_m$ is nonincreasing and $\mathcal{E}(A_m;a_m) \cap \mathcal{L}^{\le}_{h_m}(\beta_m) \subseteq \mathcal{E}(A_{m+1};a_{m+1})$. □
In order to prove the main convergence theorem we need the following regularity condition. This condition is the strong Slater condition (cf. [12]).

Assumption 4.3 There exists some $\bar{x} \in a_0 + rB$ with $g(\bar{x}) < 0$.

The following result will also be useful for the proof of the main convergence theorem.

Lemma 4.4 Let $\varphi:\mathbb{R}^s\to\mathbb{R}$ be Lipschitz continuous with Lipschitz constant $L_\varphi$ and let $z$ and $\gamma$ satisfy $\varphi(z) \le \gamma$. Then
$$z + \frac{\gamma - \varphi(z)}{L_\varphi}B \subseteq \mathcal{L}^{\le}_\varphi(\gamma).$$

Proof: Take $x \in z + \frac{\gamma-\varphi(z)}{L_\varphi}B$. It follows that $\|x - z\|_2 \le \frac{\gamma-\varphi(z)}{L_\varphi}$, and this implies by the Lipschitz continuity of the function $\varphi$ that $\varphi(x) - \varphi(z) \le |\varphi(x) - \varphi(z)| \le L_\varphi\|x - z\|_2 \le \gamma - \varphi(z)$. From this inequality we obtain that $\varphi(x) \le \gamma$, i.e., $x \in \mathcal{L}^{\le}_\varphi(\gamma)$. □
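Lemma 4.4 can be illustrated numerically; in this sketch (ours) $\varphi(x) = |x_1| + 2|x_2|$ is Lipschitz continuous with constant $\sqrt{5}$ with respect to $\|\cdot\|_2$, and every point of the stated ball indeed stays in the level set, with equality approached in the steepest direction.

```python
import math

phi = lambda x: abs(x[0]) + 2.0*abs(x[1])   # Lipschitz with L = sqrt(5) w.r.t. ||.||_2
L = math.sqrt(5.0)
z, gamma = [1.0, 0.5], 3.0                  # phi(z) = 2 <= gamma
rad = (gamma - phi(z)) / L                  # radius promised by Lemma 4.4

# Check points on the boundary circle of z + rad*B (the worst case).
worst = max(phi([z[0] + rad*math.cos(t), z[1] + rad*math.sin(t)])
            for t in (2.0*math.pi*k/720 for k in range(720)))
```

For this instance the sampled worst value stays at or below $\gamma = 3$ and comes very close to it, confirming that the radius $(\gamma-\varphi(z))/L_\varphi$ is sharp for this $\varphi$.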
Theorem 4.5 If the ellipsoid algorithm executes an infinite number of iterations, then it follows that $\ell_m \downarrow f(x^*)$. Moreover, if $f$ is Lipschitz continuous on $\mathcal{L}^{\le}_g(0) \cap \mathcal{L}^{\le}_n(r)$ with Lipschitz constant $L_f$ and $g$ is Lipschitz continuous on $\mathcal{L}^{\le}_n(r)$ with constant $L_g$, then there exists some $m_0$ such that
$$0 \le \ell_m - f(x^*) \le \frac{2L_fL_g\|x^*-\bar{x}\|_2}{-g(\bar{x})}\Big(\frac12\Big)^{1/s} r \prod_{k=0}^{m-1}\big(\delta_k^s(1-\sigma_k)\big)^{1/(2s)}$$
for every $m \ge m_0$.

Proof: We start by evaluating $\det(A_m)$. From (4) and Lemma 4.1 it follows by induction, using $b_k^T A_k^{-1} b_k = 1$, that
$$\det(A_m) = \det(A_0)\prod_{k=0}^{m-1}\delta_k^s(1-\sigma_k).$$
Since $A_0 = r^2 I$ and $A_m$ is positive definite for every $m$, we obtain after some calculations that
$$0 < \det(A_m) = r^{2s}\prod_{k=0}^{m-1}\Big(\frac{s^2(1-\alpha_k^2)}{s^2-1}\Big)^s\frac{(s-1)(1-\alpha_k)}{(s+1)(1+\alpha_k)} \le r^{2s}e^{-m/(s+1)},$$
and so it follows that $\det(A_m) \to 0$. Suppose now that $\ell_m \downarrow f(x^*)$ does not hold, so that $\ell_m$ stays bounded away from $f(x^*)$. By the convexity of $\mathcal{L}^{\le}_g(0) \cap \mathcal{L}^{\le}_n(r)$, Assumption 2.1 and Assumption 4.3, it follows that the line segment $[\bar{x},x^*[\ := \{\lambda\bar{x} + (1-\lambda)x^* : 0 < \lambda \le 1\}$ is contained in $\mathcal{L}^{\le}_g(0) \cap \mathcal{L}^{\le}_n(r)$, and hence, arguing as in the second part of this proof, one can find some point $\hat{x}$ and some $\delta > 0$ with $\hat{x} + \delta B \subseteq \mathcal{L}^{\le}_f(\ell_m) \cap \mathcal{L}^{\le}_g(0) \cap \mathcal{L}^{\le}_n(r)$ for every $m$. With $v_s := \mathrm{vol}(B)$ ([11]), we obtain from Lemma 4.2 that
$$0 < \delta^s v_s = \mathrm{vol}(\hat{x} + \delta B) \le \mathrm{vol}\big(\mathcal{E}(A_m;a_m) \cap \mathcal{L}^{\le}_{h_m}(\beta_m)\big) \le \tfrac12\,\mathrm{vol}(\mathcal{E}(A_m;a_m)) = \tfrac12\sqrt{\det(A_m)}\,v_s$$
for every $m \ge 0$, and this contradicts $\det(A_m) \to 0$. Hence it must follow that $\ell_m \downarrow f(x^*)$, and so the first part is proved.
Figure 2: A norm cut
Figure 3: Geometric interpretation of the proof
To prove the stated inequality we first assume that every optimal solution $x^*$ satisfies $g(x^*) = 0$. Since $\ell_m \downarrow f(x^*)$ and by our assumption $f(\bar{x}) > f(x^*)$, there exists some $m_1$ such that
$$f(x^*) < \ell_m < f(\bar{x})$$
for every $m \ge m_1$. The continuity of $f$ enables us to create the sequence $x_m \in [\bar{x},x^*[$ with $f(x_m) = \ell_m$. Now we use this sequence to create the sequence $\hat{x}_m := (x_m + x^*)/2$ (see Figure 3), and for this new sequence it follows by the convexity of $f$ that $f(\hat{x}_m) \le (\ell_m + f(x^*))/2$. Hence by Lemma 4.4 we obtain that
$$\hat{x}_m + \frac{\ell_m - f(\hat{x}_m)}{L_f}B \subseteq \mathcal{L}^{\le}_f(\ell_m). \qquad (5)$$
Recall now from the convexity of $f$ and $\ell_m = f(x_m)$ that
$$\frac{\ell_m - f(\hat{x}_m)}{\|x_m - \hat{x}_m\|_2} \ge \frac{\ell_m - f(x^*)}{\|x_m - x^*\|_2},$$
and, by construction, $\|x_m - x^*\|_2 = 2\|x_m - \hat{x}_m\|_2$. This yields that $\ell_m - f(\hat{x}_m) \ge (\ell_m - f(x^*))/2$, and thus (5) implies that
$$\hat{x}_m + \frac{\ell_m - f(x^*)}{2L_f}B \subseteq \mathcal{L}^{\le}_f(\ell_m). \qquad (6)$$
On the other hand, by the convexity of $g$ we obtain that $g(\hat{x}_m) < 0$, and applying again Lemma 4.4 yields that
$$\hat{x}_m + \frac{-g(\hat{x}_m)}{L_g}B \subseteq \mathcal{L}^{\le}_g(0). \qquad (7)$$
Now, from the convexity of $g$, $g(x^*) = 0$, and the Lipschitz continuity of $f$ with Lipschitz constant $L_f$, it follows that
$$-g(\hat{x}_m) \ge -g(\bar{x})\,\frac{\|x^* - \hat{x}_m\|_2}{\|x^* - \bar{x}\|_2} \ge \frac{-g(\bar{x})(\ell_m - f(x^*))}{2L_f\|x^* - \bar{x}\|_2},$$
and this, together with (7), leads to
$$\hat{x}_m + \frac{-g(\bar{x})(\ell_m - f(x^*))}{2L_fL_g\|x^* - \bar{x}\|_2}B \subseteq \mathcal{L}^{\le}_g(0). \qquad (8)$$
Combining (6) with (8) and observing that $-g(\bar{x}) \le L_g\|x^* - \bar{x}\|_2$ finally yields
$$\hat{x}_m + \frac{-g(\bar{x})(\ell_m - f(x^*))}{2L_fL_g\|x^* - \bar{x}\|_2}B \subseteq \mathcal{L}^{\le}_f(\ell_m) \cap \mathcal{L}^{\le}_g(0).$$
Since $[\bar{x},x^*[ \subseteq \mathcal{L}^{\le}_n(r)$, there exists an $\epsilon > 0$ such that $\hat{x} + \epsilon B \subseteq \mathcal{L}^{\le}_n(r)$ for every $\hat{x} \in [\bar{x},x^*[$. Taking now $m_2$ such that for $m \ge m_2$
$$\frac{-g(\bar{x})(\ell_m - f(x^*))}{2L_fL_g\|x^* - \bar{x}\|_2} \le \epsilon,$$
it follows by Lemma 4.2 for $m \ge m_0 := \max\{m_1, m_2\}$ that
$$\hat{x}_m + \frac{-g(\bar{x})(\ell_m - f(x^*))}{2L_fL_g\|x^* - \bar{x}\|_2}B \subseteq \mathcal{L}^{\le}_f(\ell_m) \cap \mathcal{L}^{\le}_g(0) \cap \mathcal{L}^{\le}_n(r) \subseteq \mathcal{E}(A_m;a_m) \cap \mathcal{L}^{\le}_{h_m}(\beta_m).$$
Thus
$$\mathrm{vol}\Big(\hat{x}_m + \frac{-g(\bar{x})(\ell_m - f(x^*))}{2L_fL_g\|x^* - \bar{x}\|_2}B\Big) \le \mathrm{vol}\big(\mathcal{E}(A_m;a_m) \cap \mathcal{L}^{\le}_{h_m}(\beta_m)\big) \le \tfrac12\,\mathrm{vol}(\mathcal{E}(A_m;a_m)),$$
and computing these volumes gives
$$\Big(\frac{-g(\bar{x})(\ell_m - f(x^*))}{2L_fL_g\|x^* - \bar{x}\|_2}\Big)^s v_s \le \frac12\sqrt{\det(A_m)}\,v_s = \frac12\,r^s\prod_{k=0}^{m-1}\big(\delta_k^s(1-\sigma_k)\big)^{1/2}v_s.$$
Dividing the previous inequality by $v_s$, raising both sides to the power $s^{-1}$ and multiplying by $2L_fL_g\|x^* - \bar{x}\|_2/(-g(\bar{x}))$ yields the desired result. To complete the proof we still have to consider the case when an optimal solution $x^*$ exists satisfying $g(x^*) < 0$. It is not difficult to verify that for this case the same result holds, since in fact $\hat{x}_m$ can be taken equal to $x^*$ for every $m \ge m_0$ for which $\ell_m - f(x^*) < \bar{\epsilon}$, with $\bar{\epsilon} > 0$ satisfying $x^* + \bar{\epsilon}B \subseteq \mathcal{L}^{\le}_g(0) \cap \mathcal{L}^{\le}_n(r)$. Thus, from Lemma 4.4 we conclude that
$$x^* + \frac{\ell_m - f(x^*)}{L_f}B \subseteq \mathcal{L}^{\le}_f(\ell_m) \cap \mathcal{L}^{\le}_g(0) \cap \mathcal{L}^{\le}_n(r),$$
and from here on one may proceed to achieve the same result. □
Finally, we would like to remark that if an unconstrained instance of (C) has to be solved, then it follows from the last steps of the proof (see also [7]) that
$$0 \le \ell_m - f(x^*) \le 2L_f\Big(\frac12\Big)^{1/s} r \prod_{k=0}^{m-1}\big(\delta_k^s(1-\sigma_k)\big)^{1/(2s)}$$
for every sufficiently large $m$.
References

[1] R. G. Bland, D. Goldfarb and M. J. Todd, The ellipsoid method: A survey, Operations Research 29 (1981) 1039-1091.
[2] D. den Hertog, Interior Point Approach to Linear, Quadratic and Convex Programming: Algorithms and Complexity, volume 277 of Mathematics and its Applications, Kluwer Academic Publishers, 1994.
[3] S. T. Dziuban, J. G. Ecker and M. Kupferschmid, Using deep cuts in an ellipsoid algorithm for nonlinear programming, Mathematical Programming Study 25 (1985) 93-107.
[4] J. G. Ecker and M. Kupferschmid, An ellipsoid algorithm for nonlinear programming, Mathematical Programming 27 (1983) 83-106.
[5] J. G. Ecker and M. Kupferschmid, A computational comparison of the ellipsoid algorithm with several nonlinear programming algorithms, SIAM Journal on Control and Optimization 23(5) (1985) 657-674.
[6] A. V. Fiacco and G. P. McCormick, Nonlinear Programming: Sequential Unconstrained Minimization Techniques, Wiley, New York, 1968.
[7] J. B. G. Frenk, J. Gromicho and S. Zhang, General models in min-max continuous location: Theory and solution techniques, Technical Report TI 93-175, Tinbergen Institute, Rotterdam, The Netherlands, 1993. To appear in Journal of Optimization Theory and Applications.
[8] J. B. G. Frenk, J. Gromicho and S. Zhang, A deep cut ellipsoid algorithm for convex programming: Theory and applications, Mathematical Programming 63(1) (1994) 83-108.
[9] J. L. Goffin, Convergence rates of the ellipsoid method on general convex functions, Mathematics of Operations Research 8 (1983) 135-150.
[10] D. Goldfarb and M. J. Todd, Linear programming, in volume 1 of Handbooks in Operations Research and Management Science, chapter II, North-Holland, Amsterdam, 1989.
[11] M. Grotschel, L. Lovasz and A. Schrijver, Geometric Algorithms and Combinatorial Optimization, Springer-Verlag, Berlin Heidelberg, 1988.
[12] J.-B. Hiriart-Urruty and C. Lemarechal, Convex Analysis and Minimization Algorithms I: Fundamentals, volume 305 of A Series of Comprehensive Studies in Mathematics, Springer-Verlag, Berlin, 1993.
[13] P. Lancaster and M. Tismenetsky, The Theory of Matrices, Academic Press, New York, second edition, 1985.
[14] D. G. Luenberger, Linear and Nonlinear Programming, Addison-Wesley, Reading, Massachusetts, 1984.
[15] H. J. Luthi, On the solution of variational inequalities by the ellipsoid method, Mathematics of Operations Research 10 (1985) 515-522.
[16] G. L. Nemhauser and L. A. Wolsey, Integer and Combinatorial Optimization, Wiley, New York, 1988.
[17] A. S. Nemirovsky and D. B. Yudin, Problem Complexity and Method Efficiency in Optimization, John Wiley & Sons, Chichester, 1983.
[18] F. Plastria, Lower subdifferentiable functions and their minimization by cutting planes, Journal of Optimization Theory and Applications 46(1) (1985) 37-53.
[19] R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, New Jersey, 1970.
[20] R. T. Rockafellar, Conjugate Duality and Optimization, SIAM, Philadelphia, 1974.
[21] N. Z. Shor, Utilization of the operation of space dilation in the minimization of convex functions, Cybernetics 6 (1970) 7-15.
[22] N. Z. Shor, Convergence rate of the gradient descent method with dilation of the space, Cybernetics 6 (1970) 102-108.
[23] N. Z. Shor, Cut-off method with space extension in convex programming problems, Cybernetics 13 (1977) 94-96.
[24] N. Z. Shor, Minimization Methods for Non-differentiable Functions, Springer Series in Computational Mathematics, Springer-Verlag, Berlin, 1985.
[25] N. Z. Shor and V. I. Gershovich, Family of algorithms for solving convex programming problems, Cybernetics 15 (1979) 502-508.
[26] D. B. Yudin and A. S. Nemirovsky, Evaluation of the informational complexity of mathematical programming problems, Matekon 13(2) (1976) 3-25.
[27] D. B. Yudin and A. S. Nemirovsky, Informational complexity and efficient methods for the solution of convex extremal problems, Matekon 13(3) (1977) 25-45.
Recent Advances in Nonsmooth Optimization, pp. 121-140
Eds. D.-Z. Du, L. Qi and R.S. Womersley
©1995 World Scientific Publishing Co Pte Ltd

Solving Nonsmooth Equations by Means of Quasi-Newton Methods with Globalization

Marcia A. Gomes-Ruggiero, Jose Mario Martinez and Sandra Augusta Santos
Department of Applied Mathematics, State University of Campinas, CP 6065, 13081 Campinas SP, Brazil
Abstract
We consider the utilization of quasi-Newton methods for solving nonlinear systems of equations without smoothness assumptions. In order to improve the global convergence properties of the algorithms, we use a globalization strategy based on a merit function. We adopt a tolerant procedure that permits a nonmonotone behavior of the merit function. We test our methods with a collection of large-scale nonsmooth systems originating in nonlinear complementarity problems.
1 Introduction
We consider the resolution of nonlinear systems of equations
$$F(x) = 0 \qquad (1)$$
without smoothness assumptions on the mapping $F:\Omega\subseteq\mathbb{R}^n\to\mathbb{R}^n$. Methods for solving (1) are iterative. Quasi-Newton methods, originally developed for smooth systems, generate sequences $\{x_k\}$ according to
$$x_{k+1} = x_k - B_k^{-1}F(x_k). \qquad (2)$$
See [6], [7]. The quasi-Newton strategy consists in considering that, choosing $B_k$ in an appropriate way, we have
$$F(x) \approx L_k(x) = B_k(x - x_k) + F(x_k), \qquad (3)$$
at least in a neighborhood of the current point $x_k$. Newton's method (for smooth systems) chooses $B_k = F'(x_k)$. See [7], [32]. Secant methods choose an arbitrary $B_0$ and update the successive "Jacobian approximations" $B_k$ in such a way that
$$B_{k+1}s_k = y_k \qquad (4)$$
for all $k = 0,1,2,\dots$, where $s_k = x_{k+1} - x_k$ and $y_k = F(x_{k+1}) - F(x_k)$. The "secant equation" (4) guarantees that the affine function $L_k(x)$ defined in (3) satisfies
$$L_k(x_k) = F(x_k) \quad\text{and}\quad L_k(x_{k-1}) = F(x_{k-1}). \qquad (5)$$
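The interpolation property guaranteed by (4) is immediate to verify for a rank-one secant update; the following check is ours, applying Broyden's update (formula (23) below, with $z_k = s_k$) to a $2\times 2$ example and confirming that the updated matrix maps $s_k$ to $y_k$.

```python
# Check that the Broyden rank-one update B+ = B + (y - B s) s^T / (s^T s)
# satisfies the secant equation B+ s = y (2x2 example).
B = [[2.0, 0.0], [1.0, 3.0]]
s = [1.0, 2.0]
y = [0.5, -1.0]

Bs = [B[0][0]*s[0] + B[0][1]*s[1], B[1][0]*s[0] + B[1][1]*s[1]]
ss = s[0]*s[0] + s[1]*s[1]
Bp = [[B[i][j] + (y[i] - Bs[i])*s[j]/ss for j in range(2)] for i in range(2)]

Bps = [Bp[0][0]*s[0] + Bp[0][1]*s[1], Bp[1][0]*s[0] + Bp[1][1]*s[1]]  # equals y
```

The identity holds for any choice of $z_k$ with $z_k^Ts_k \neq 0$, since $B^+s = Bs + (y - Bs)\,(z^Ts)/(z^Ts) = y$.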
The local convergence theory of secant methods for smooth systems of equations is well developed. See [2], [7], [8], [25], [26]. Some authors (see [3], [4], [5], [21], [29]) have studied quasi-Newton and secant methods for different nonsmooth problems, but a local convergence theory with sufficiently general assumptions is far from complete. In spite of this limitation, the iterative approximation of a nonsmooth $F$ by an affine interpolatory function is an appealing idea that deserves experimental investigation. This is one of the aims of this paper. In order to improve the global convergence properties of local algorithms, strategies based on the minimization of a merit function $f$ (most frequently $f(x) = \|F(x)\|^2$) are generally used both in the smooth and the nonsmooth case. Martinez and Qi [28] considered one of these strategies in connection with the so-called inexact iteration based on the Newton method. See also [20]. In this paper we generalize that strategy in order to consider completely arbitrary directions without smoothness or semismoothness hypotheses on $F$. We prove a global convergence theorem related to this strategy. We implement an algorithm that is able to deal with large nonsmooth systems. In this implementation we combine local (quasi-Newton) and global iterations using a tolerant strategy that calls the global (special) iteration, which is more expensive, only when necessary. This strategy has been used for smooth systems in [12] and for some inverse problems in [9]. We use our methods to solve a collection of problems of dimensions 100 × 100 and 1000 × 1000. This paper is organized as follows. In Section 2 we introduce a general "global algorithm" for solving nonsmooth systems and we prove global convergence results. In Section 3 we describe a practical implementation (for large-scale problems) of the global algorithm. In Section 4 we define the quasi-Newton methods used in our study. In Section 5 we describe the tolerant strategy. The numerical experiments are reported and commented on in Section 6.
2 Global Algorithm
In this section we analyze the convergence properties of a general model algorithm under minimal assumptions on $F$. In particular, no smoothness assumptions will be made. As in many methods for solving both smooth and nonsmooth equations, the algorithm is based on the monotone reduction of a merit function $f$. Usually, we choose
$$f(x) = \|F(x)\|^2. \qquad (6)$$
However, any function $f:\mathbb{R}^n\to\mathbb{R}$ such that $f(x) \ge 0$ for all $x\in\mathbb{R}^n$ and $f(x) = 0$ iff $F(x) = 0$ will serve for our purposes. If $F$ is not defined for some $x\in\mathbb{R}^n$, we define $f(x) = \infty$. The main global algorithm, which uses elements of the approaches of [10] for smooth problems and [28] for semismooth problems, is described below.

Algorithm 2.1. Assume that $\sigma\in(0,1)$, $\eta\in(0,\frac12]$. Choose $x_0\in\mathbb{R}^n$, an arbitrary initial approximation such that $f(x_0) < \infty$, and set $\alpha_0 = 1$. Given $x_k\in\mathbb{R}^n$, $\alpha_k\in(0,1]$, the steps for obtaining $x_{k+1}$, $\alpha_{k+1}$ are:

Step 1. Choose $d_k\in\mathbb{R}^n$. (7)

Step 2. If
$$f(x_k + \alpha_k d_k) < f(x_k) \qquad (8)$$
define $x_{k+1} = x_k + \alpha_k d_k$. Otherwise, define $x_{k+1} = x_k$.

Step 3. If
$$f(x_k + \alpha_k d_k) \le (1 - \sigma\alpha_k)f(x_k) \qquad (9)$$
define $\alpha_{k+1} = 1$. Otherwise, choose
$$\alpha_{k+1}\in[\eta\alpha_k, (1-\eta)\alpha_k]. \qquad (10)$$
Observe that Algorithm 2.1 is more general than line-search or backtracking methods (see [7]), since the "direction" $d_k$ can change after a failure of (8) or (9). We will take advantage of this feature in the practical implementation. We denote by $K_1$ the set of indices of the "very successful" iterations:
$$K_1 = \{k\in\mathbb{N} \mid \text{(9) holds}\}. \qquad (11)$$
Let us now prove the convergence results related to Algorithm 2.1. The thesis of the convergence theorems will always be $\lim_{k\to\infty} f(x_k) = 0$. Clearly, if $x_*$ is a limit point of $\{x_k\}$ and $f$ is continuous at $x_*$, this implies that $f(x_*) = 0$.

Lemma 2.1 Let $\{x_k\}$ be the sequence generated by Algorithm 2.1. If
$$\sum_{k\in K_1}\log(1 - \sigma\alpha_k) = -\infty, \qquad (12)$$
then
$$\lim_{k\to\infty} f(x_k) = 0. \qquad (13)$$
Proof. By (9), for all $k\in K_1$ we have $\log f(x_{k+1}) \le \log(1 - \sigma\alpha_k) + \log f(x_k)$. Therefore, since $f(x_{k+1}) \le f(x_k)$ for all $k\in\mathbb{N}$,
$$\log f(x_k) \le \sum_{i\in K_1,\, i < k}\log(1 - \sigma\alpha_i) + \log f(x_0).$$
So, by (12), $\lim_{k\to\infty}\log f(x_k) = -\infty$. This implies that $f(x_k) \to 0$. □
Lemma 2.2 Let $\{x_k\}$ be the sequence generated by Algorithm 2.1. Then, either $\lim_{k\to\infty} f(x_k) = 0$ or there exists $K_2$, an infinite subset of $\mathbb{N}$, such that
$$\lim_{k\in K_2}\alpha_k = 0 \qquad (14)$$
and
$$\frac{f(x_k + \alpha_k d_k) - f(x_k)}{\alpha_k} > -\sigma f(x_k) \qquad (15)$$
for all $k\in K_2$.

Proof. By Lemma 2.1, if (13) does not hold, either $K_1$ is finite or $\sum_{k\in K_1}\log(1 - \sigma\alpha_k) > -\infty$. If $K_1$ is finite, $K_1\subseteq\{0,1,\dots,k_0\}$, we have that (9) does not hold and $\alpha_{k+1} \le (1-\eta)\alpha_k$ for all $k > k_0$. So $\alpha_k \to 0$ in this case, and the result is proved. If $K_1$ is infinite and the series $\sum_{k\in K_1}\log(1 - \sigma\alpha_k)$ converges, we necessarily have that $\alpha_k \to 0$ for $k\in K_1$. Therefore $\alpha_k < 1$ for large enough $k\in K_1$. Now, since $\alpha_k \ge \eta\alpha_{k-1}$, we also have that $\alpha_{k-1} \to 0$ for $k\in K_1$. Moreover, since $\alpha_k < 1$ it follows that (9) does not hold at iteration $k-1$. So, (14) and (15) hold defining $K_2 = \{k\in\mathbb{N} \mid k+1\in K_1\}$. This completes the proof. □
Lemma 2.3 Let $\{x_k\}$ be the sequence generated by Algorithm 2.1. Assume that there exist $a\in(0,1)$ and $J_1 = \{j_1, j_2, \dots\}$, an infinite subset of $K_1$, such that
$$\alpha_{j_i} \ge \frac{a}{i} \qquad (16)$$
for all $i = 1,2,\dots$. Then $\lim_{k\to\infty} f(x_k) = 0$.
Proof. By the fact that $\log(1 - \sigma\alpha_k) \le 0$ for $k\in\mathbb{N}$ and by (16), we have that
$$\sum_{k\in K_1}\log(1 - \sigma\alpha_k) \le \sum_{k\in J_1}\log(1 - \sigma\alpha_k) \le \sum_{i=1}^{\infty}\log\Big(1 - \frac{\sigma a}{i}\Big). \qquad (17)$$
But, by the integral criterion, the series on the right hand side of (17) diverges, so the same happens with the series on the left hand side. Therefore, the desired result follows from Lemma 2.1. □

Lemma 2.3 suggests the following specialization of Algorithm 2.1.

Algorithm 2.2. Assume that $\sigma\in(0,1)$, $\eta\in(0,\frac12]$ and $a\in(0,1)$. Choose $x_0\in\mathbb{R}^n$ such that $f(x_0) < \infty$ and set $\alpha_0 = 1$, $m_0 = 1$. Given $x_k\in\mathbb{R}^n$, $m_k\in\mathbb{N}$ and $\alpha_k\in(0,1]$, the steps for obtaining $x_{k+1}$, $\alpha_{k+1}$, $m_{k+1}$ are:

Step 1. Choose $d_k\in\mathbb{R}^n$ such that
$$\frac{f(x_k + \alpha_k d_k) - f(x_k)}{\alpha_k} \le -\sigma f(x_k) \qquad (18)$$
whenever
$$\alpha_k \le \frac{a}{m_k}. \qquad (19)$$
If such a choice is not possible, stop. (The algorithm breaks down.)

Step 2. If (8) holds, define $x_{k+1} = x_k + \alpha_k d_k$. Otherwise, define $x_{k+1} = x_k$.

Step 3. If (9) holds, define $\alpha_{k+1} = 1$ and $m_{k+1} = m_k + 1$. Otherwise, choose $\alpha_{k+1}\in[\eta\alpha_k, (1-\eta)\alpha_k]$ and define $m_{k+1} = m_k$.

Lemmas 2.1, 2.2 and 2.3 allow us to prove the following theorem related to Algorithm 2.2.

Theorem 2.4 Assume that Algorithm 2.2 does not break down and that $\{x_k\}$ is generated by this algorithm. Then $\lim_{k\to\infty} f(x_k) = 0$.

Proof. Let us prove first that $K_1$ is infinite. In fact, if $K_1\subseteq\{0,1,\dots,k_0\}$ we should have $m_k = m_{k_0}$ for all $k \ge k_0$. But, since $k\notin K_1$ for all $k > k_0$, we have that $\alpha_{k+1} \le (1-\eta)\alpha_k$ for all $k > k_0$. This implies that $\alpha_k \to 0$ and then, for some $k > k_0$,
$$\alpha_k \le \frac{a}{m_{k_0}} = \frac{a}{m_k}.$$
So, by the definition of the algorithm, (18) holds at iteration $k$. This implies that (9) holds. So, $k\in K_1$, which contradicts the assumption of finiteness of $K_1$.
Therefore, from now on, we assume that $K_1$ is infinite. Let $k\in K_1$. If $\alpha_k = 1$ we clearly have that
$$\alpha_k \ge \frac{\eta a}{m_k}. \qquad (20)$$
If $\alpha_k < 1$, then $k-1\notin K_1$. So, $m_k = m_{k-1}$. But, since (9) does not hold at iteration $k-1$, we have that (18) does not hold at that iteration and so, by the definition of Algorithm 2.2,
$$\alpha_{k-1} > \frac{a}{m_{k-1}}.$$
But $\alpha_k\in[\eta\alpha_{k-1}, (1-\eta)\alpha_{k-1}]$, so
$$\alpha_k \ge \eta\alpha_{k-1} > \frac{\eta a}{m_k}.$$
Therefore, we proved that $K_1$ is infinite and that (20) holds for all $k\in K_1$. Observe that $m_{j_i} = i$ for the $i$-th element $j_i$ of $K_1$, since $m_k$ increases by one precisely at the iterations in $K_1$. As a result, we are under the hypotheses of Lemma 2.3, with $J_1 = K_1$ and $a$ replaced by $\eta a$. So, the desired result is proved. □
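A minimal one-dimensional sketch of Algorithm 2.2 (ours, not from the paper) illustrates the $\alpha_k$/$m_k$ bookkeeping; Newton directions stand in for a general quasi-Newton choice of $d_k$, and the breakdown test of Step 1 is omitted, since for such directions the sufficient-decrease condition (18) holds for small $\alpha_k$.

```python
def algorithm_2_2(F, dF, x0, sigma=1e-4, eta=0.25, iters=100):
    """Sketch of Algorithm 2.2 for a scalar equation F(x) = 0, with f(x) = F(x)^2.

    d_k is the Newton direction (a stand-in for a general quasi-Newton choice);
    the breakdown test of Step 1 is omitted in this sketch."""
    x, alpha, m = x0, 1.0, 1
    for _ in range(iters):
        fx = F(x)**2
        if fx < 1e-30:
            break
        d = -F(x) / dF(x)
        trial = F(x + alpha*d)**2
        if trial < fx:                        # (8): accept the trial point
            x = x + alpha*d
        if trial <= (1.0 - sigma*alpha)*fx:   # (9): "very successful" iteration
            alpha, m = 1.0, m + 1
        else:                                 # (10): a choice in [eta*a, (1-eta)*a]
            alpha = 0.5*alpha
    return x

root = algorithm_2_2(lambda x: x**3 - 1.0, lambda x: 3.0*x*x, 3.0)
```

On this smooth test equation $x^3 = 1$ every iteration is very successful, so $\alpha_k$ stays at 1 and the iteration reduces to Newton's method converging to the root $x = 1$.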
3 Implementation of the Global Algorithm
In this section we describe an implementation of Algorithm 2.2 for solving nonsmooth systems of equations, remembering that we are specially interested in large-scale problems. The main tool for our practical algorithm is a method for minimizing convex quadratics with box constraints developed in [13] and [14]. The idea is to choose, at each iteration, $s_k = \alpha_k d_k$ as an approximate minimizer of
$$\psi_k(s) = \tfrac12\|V_k s + F(x_k)\|^2$$
on an appropriate trust region (see [11]) of the form $\|s\|_\infty \le \Delta_k$, where $V_k$ is a suitable $n\times n$ matrix. In semismooth problems we choose $V_k\in\partial_B F(x_k)$ (see [28] and [35]). After the computation of $s_k$, if $\alpha_k \le a/m_k$, we test the inequality (18). If this inequality does not hold, we stop the execution (the algorithm breaks down). This necessarily happens when the problem has no solutions. The choice of the $\|\cdot\|_\infty$ norm instead of the Euclidean one allows us to deal with bounds on the variables. In this case, the $\|\cdot\|_\infty$ norm fits well with the bounds and the approximate minimizers are not difficult to find. Observe that constraints are naturally considered in our formulation, since we can define $f(x) = \infty$ if $x$ is infeasible.

Algorithm 3.1. Let $\sigma\in(0,1)$, $\eta\in(0,\frac12]$, $a\in(0,1)$, $tol\in(0,1)$, $max\in\mathbb{N}$ be given independently of $k$. Let $x_0\in\mathbb{R}^n$ be an arbitrary initial point such that $f(x_0) < \infty$, $\Delta_0 = M$, $m_0 = 1$ and $\alpha_0 = 1$. Given $x_k\in\mathbb{R}^n$, $\Delta_k > 0$, $m_k\in\mathbb{N}$ and $\alpha_k\in(0,1]$, the steps for obtaining $x_{k+1}$, $\Delta_{k+1}$, $m_{k+1}$ and $\alpha_{k+1}$ are the following:

Step 1. Compute $s_k$, an "approximate solution" of
$$\text{Minimize } \psi_k(s) = \tfrac12\|V_k s + F(x_k)\|^2 \quad\text{s.t. } \|s\|_\infty \le \Delta_k. \qquad (21)$$
The approximate solution of (21) is obtained by applying the method described in [13] (see also [14]), stopping when
$$\|\nabla_P\psi_k(s_k)\| \le tol \qquad (22)$$
(where $\nabla_P\psi_k(s)$ is the projected gradient of $\psi_k$ on the box $\|s\|_\infty \le \Delta_k$) or when the number of iterations used by the algorithm of [13] exceeds $max$.

Step 2. If $\alpha_k \le a/m_k$ but (18) does not hold, stop (the algorithm breaks down).

Step 3. The same as Step 2 of Algorithm 2.2.

Step 4. The same as Step 3 of Algorithm 2.2.

Step 5. If $\alpha_{k+1} = 1$, define $\Delta_{k+1} = M$. Otherwise, define $\Delta_{k+1} = \|s_k\|_\infty/2$.

The parameters used in our implementation were $\sigma = 10^{-4}$, $a = 10^{-5}$, $\eta = \frac12$, $M = 10^3$ and $max = 300$. The software used for this implementation was an adaptation of the algorithm for box-constrained minimization introduced in [14]. The algorithm used for obtaining the approximate solution of (21) (see [13]) is an active set method that combines conjugate gradient iterations with projected and "chopped" gradient iterations in such a way that many active constraints can be added or dropped in a single iteration.
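As a rough stand-in for the solver of [13] (whose active set strategy we do not reproduce), the box-constrained subproblem (21) can be approximated with a simple projected gradient loop. This sketch is ours: the step length $1/L$ uses the fact that for the symmetric matrix $M = V^TV$ the spectral norm is bounded by the maximum absolute row sum.

```python
def tr_subproblem(V, Fv, delta, iters=500):
    """Sketch: minimize 0.5*||V s + F||^2 s.t. ||s||_inf <= delta by projected
    gradient (a crude stand-in for the active set method of [13])."""
    n = len(Fv)
    # M = V^T V; for symmetric M the max row sum bounds the spectral norm.
    M = [[sum(V[k][i]*V[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    L = max(sum(abs(M[i][j]) for j in range(n)) for i in range(n))
    VtF = [sum(V[k][i]*Fv[k] for k in range(n)) for i in range(n)]
    s = [0.0]*n
    for _ in range(iters):
        # gradient of psi at s is V^T (V s + F) = M s + V^T F
        grad = [sum(M[i][j]*s[j] for j in range(n)) + VtF[i] for i in range(n)]
        s = [min(delta, max(-delta, s[i] - grad[i]/L)) for i in range(n)]
    return s

# With V = I the problem separates: s_i = clip(-F_i, [-delta, delta]).
s = tr_subproblem([[1.0, 0.0], [0.0, 1.0]], [3.0, -0.5], 1.0)
```

For the separable test case the exact minimizer is $s = (-1, 0.5)$, which the loop reaches after a single projected step.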
4  Quasi-Newton Methods
As we mentioned in the Introduction, the idea of considering a linear system as the subproblem for computing each iteration, using interpolation, is very attractive, not only because of its simplicity but also because of its success in the smooth case. In this study we consider three quasi-Newton formulae for the implementation of the local iteration (2). The first one corresponds to the classical Newton's method. This choice can be made, for example, if F is semismooth (see [31], [36]). In this case we choose B_k as one of the matrices V_k in the generalized differential set ∂_B F(x_k). In large sparse problems, where each equation depends only on a few variables, V_k is a sparse matrix. The LU factorization of V_k, which is necessary for computing x_{k+1} in (2), was computed using a static data structure and partial pivoting, as described in [15] and [18]. The other two quasi-Newton methods considered are Broyden's method and the Column-Updating method; see [1], [17], [23], [24]. In both methods we choose B_0 as in Newton's algorithm, and B_{k+1} in such a way that the secant equation (4) is satisfied for all k = 0, 1, 2, . . . The recurrence formula for obtaining B_{k+1} is

    B_{k+1} = B_k + (y_k − B_k s_k)(z_k)^T / ((z_k)^T s_k),    (23)
M. A. Gomes-Ruggiero, J. M. Martinez and S. A. Santos
where z_k = s_k for Broyden's method, and z_k = e_{j_k} with

    |(e_{j_k})^T s_k| = ||s_k||_∞

for the Column-Updating method ({e^1, . . . , e^n} is the canonical basis of ℝ^n). Applying the Sherman-Morrison formula to (23) ([16], p. 51) we obtain

    B_{k+1}^{-1} = B_k^{-1} − (B_k^{-1} y_k − s_k)(z_k)^T B_k^{-1} / ((z_k)^T B_k^{-1} y_k).    (24)

Formula (24) shows that B_{k+1}^{-1} can be obtained from B_k^{-1} using O(n^2) floating point operations in the dense case. Moreover,

    B_{k+1}^{-1} = (I + u_k (z_k)^T) B_k^{-1},    (25)

where u_k = (s_k − B_k^{-1} y_k) / ((z_k)^T B_k^{-1} y_k), so

    B_k^{-1} = (I + u_{k−1}(z_{k−1})^T) · · · (I + u_0 (z_0)^T) B_0^{-1}    (26)

for k = 1, 2, 3, . . . Formula (26) is used when n is large (see [18] and [30]). In this case, the vectors u_0, . . . , u_{k−1}, z_0, . . . , z_{k−1} are stored and the product B_k^{-1} F(x_k) is computed using (26). In this way, the computer time of iteration k is O(kn) plus the computer time of computing B_0^{-1} F(x_k). If k is large, the process must be periodically restarted taking
B_k ≈ J(x_k).
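The product form (24)-(26) can be sketched as follows. This is our own minimal implementation (all function names are illustrative): the pairs (u_j, z_j) are stored, B_k^{-1} is applied to a vector in O(kn) operations on top of a dense B_0^{-1}, and the choice of z_k distinguishes Broyden's method from the Column-Updating method. In the paper a sparse LU factorization of B_0 would be used instead of an explicit inverse.

```python
import numpy as np

def apply_inverse(B0_inv, us, zs, v):
    """Compute B_k^{-1} v via (26):
    B_k^{-1} = (I + u_{k-1} z_{k-1}^T) ... (I + u_0 z_0^T) B_0^{-1}."""
    w = B0_inv @ v
    for u, z in zip(us, zs):
        w = w + u * (z @ w)      # each stored pair costs O(n)
    return w

def broyden_z(s):
    return s.copy()              # Broyden: z_k = s_k

def column_updating_z(s):
    j = int(np.argmax(np.abs(s)))  # CUM: z_k = e_j with |s_j| = ||s||_inf
    z = np.zeros_like(s)
    z[j] = 1.0
    return z

def quasi_newton_solve(F, B0, x0, z_choice=broyden_z, tol=1e-10, max_it=50):
    """Local iteration (2): x_{k+1} = x_k - B_k^{-1} F(x_k), with B_k kept
    implicitly through the rank-one factors (u_j, z_j) of (25)-(26)."""
    B0_inv = np.linalg.inv(B0)
    us, zs = [], []
    x = x0.astype(float)
    Fx = F(x)
    for _ in range(max_it):
        if np.linalg.norm(Fx) <= tol:
            break
        s = -apply_inverse(B0_inv, us, zs, Fx)
        x_new = x + s
        F_new = F(x_new)
        y = F_new - Fx
        z = z_choice(s)
        Binv_y = apply_inverse(B0_inv, us, zs, y)   # B_k^{-1} y_k
        denom = z @ Binv_y                          # z_k^T B_k^{-1} y_k
        if abs(denom) > 1e-14:
            # u_k = (s_k - B_k^{-1} y_k) / (z_k^T B_k^{-1} y_k), so that
            # B_{k+1}^{-1} y_k = s_k (the secant equation (4) holds)
            us.append((s - Binv_y) / denom)
            zs.append(z)
        x, Fx = x_new, F_new
    return x
```

With this organization, restarting simply means discarding the stored pairs and recomputing B_0 from a fresh Jacobian-like matrix.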
5  Tolerant Strategy
In Sections 3 and 4 we studied two ways of approximating a solution of (1). The first is to use the merit function f(x), applying the globally convergent Algorithm 3.1. If this algorithm does not break down at some iteration k, an approximate solution of (1) is computed, in the sense that (13) holds. The second is to use the recurrence (2). The global method is based on the monotone behavior of f(x_k). In many cases, imposing f(x_{k+1}) < f(x_k) for all k is not satisfactory. In fact, at least for smooth problems, efficient local methods frequently converge rapidly to a solution even though the generated sequence {x_k} does not exhibit monotone behavior in f(x_k). In these cases, the pure local method is much more efficient than the monotone global method. Often, the monotone method converges to a local (nonglobal) minimizer of f, while the local method converges to a solution of (1). For these reasons, it is necessary to give the local method a chance before calling the minimization algorithm. This necessity
has been considered by several authors (see [19]). Here we describe a strategy that combines local algorithms and minimization methods. A similar strategy has been used in [9] for some overdetermined systems coming from inverse problems, and in [12] for large-scale differentiable nonlinear systems. Let us define "ordinary iterations" and "special iterations". By an ordinary iteration we understand an iteration produced by any local (quasi-Newton) method, like the ones described in Section 4. A special iteration is an iteration produced by Algorithm 3.1. We define, for all k ∈ ℕ,

    w_k = Argmin {f(x_0), . . . , f(x_k)}.

For completeness, we define f(w_k) = f(x_0) if k < 0. Ordinary and special iterations are combined by the following algorithm.

Algorithm 5.1. Initialize k ← 0, FLAG ← 1. Let q ≥ 0 be an integer and γ ∈ (0,1).

Step 1. If FLAG = 1, obtain x_{k+1} using an ordinary iteration. Else, obtain x_{k+1} using a special iteration.

Step 2. If

    f(x_{k+1}) ≤ γ f(w_{k+1−q}),    (27)

set FLAG ← 1, k ← k + 1 and go to Step 1. Else, re-define x_{k+1} ← w_{k+1}, set FLAG ← 0, k ← k + 1 and go to Step 1.

If the test (27) is satisfied infinitely many times, then there exists a subsequence of {x_k} such that lim_{k→∞} f(x_k) = 0.
Conversely, if (27) fails for all k ≥ k_0, then all the iterations from the k_0-th on will be special, and the convergence properties of the sequence will be those of Algorithm 3.1. The parameters γ and q measure the degree of tolerance of the local algorithm. If q is large, we admit a large number of iterations without enough progress. On the other hand, γ says what we mean by "enough progress": the closer γ ∈ (0,1) is to unity, the more tolerant the algorithm becomes. Therefore, the parameters γ and q are essential to a satisfactory performance of our tolerant strategy.
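The tolerant strategy can be sketched as the following driver. This is our own hedged reading of Algorithm 5.1: `local_step` and `special_step` stand for an ordinary (quasi-Newton) iteration and a special (Algorithm 3.1) iteration, and the convention f(w_j) = f(x_0) for j < 0 in the reference value of (27) is one plausible reading of the text.

```python
def tolerant_solve(f, local_step, special_step, x0,
                   gamma=0.9, q=5, max_iter=200, ftol=1e-12):
    """Sketch of Algorithm 5.1: combine 'ordinary' local iterations with
    'special' globalized iterations, using tolerance test (27)."""
    fs = [f(x0)]
    w, fw = x0, fs[0]        # w_k: best point among x_0, ..., x_k
    best_hist = [fw]         # best_hist[k] = f(w_k)
    x, flag = x0, 1
    for k in range(max_iter):
        x_new = local_step(x) if flag == 1 else special_step(x)
        f_new = f(x_new)
        if f_new < fw:
            w, fw = x_new, f_new
        j = k + 1 - q                      # reference index for (27)
        ref = fs[0] if j < 0 else best_hist[j]
        if f_new <= gamma * ref:           # tolerance test (27)
            flag = 1
        else:
            x_new, f_new = w, fw           # re-define x_{k+1} <- w_{k+1}
            flag = 0                       # next iteration is special
        x = x_new
        fs.append(f_new)
        best_hist.append(fw)
        if fw <= ftol:
            break
    return x
```

A driver like this never lets the merit value drift too far above the best value seen q iterations ago, while still tolerating nonmonotone local steps.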
6  Numerical Experiments
In a recent paper, Luksan [22] published a collection of 17 large-scale differentiable nonlinear systems of equations. For each of Luksan's problems

    g_i(x) = 0,  i = 1, . . . , n,
(28)
we associated a semismooth system of equations, defining

    x_* = (1, 0, 1, 0, . . .)^T
(29)
and, for all i = 1, . . . , n,

    h_i(x) = g_i(x) − g_i(x_*)       if i is odd,
    h_i(x) = g_i(x) − g_i(x_*) + 1   if i is even.

Finally, we defined F(x) = (f_1(x), . . . , f_n(x))^T, where, for all i = 1, . . . , n,

    f_i(x) = min {x_i, h_i(x)}.    (30)
In this way, x_* is a solution of the system F(x) = 0, which is equivalent to the nonlinear complementarity problem (see [33], [34])

    x ≥ 0,  h(x) ≥ 0,  ⟨x, h(x)⟩ = 0.
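The construction (28)-(30) can be sketched directly. Below is a minimal illustration (our own; the helper name is hypothetical) that builds the semismooth residual F_i(x) = min{x_i, h_i(x)} from a smooth system g; the indices are 1-based in the paper, so Python index j corresponds to i = j + 1.

```python
import numpy as np

def make_semismooth_F(g):
    """Build F of (30) from a smooth system g(x) = 0, following (28)-(30):
    x_* = (1, 0, 1, 0, ...)^T, h_i(x) = g_i(x) - g_i(x_*) (+1 for even i),
    and F_i(x) = min(x_i, h_i(x))."""
    def F(x):
        n = x.size
        i = np.arange(1, n + 1)                       # 1-based indices
        x_star = np.where(i % 2 == 1, 1.0, 0.0)       # (1, 0, 1, 0, ...)
        shift = np.where(i % 2 == 0, 1.0, 0.0)        # +1 on even components
        h = g(x) - g(x_star) + shift
        return np.minimum(x, h)                       # componentwise min
    return F
```

By construction x_* satisfies the complementarity conditions: on odd components x_i = 1 and h_i(x_*) = 0, on even components x_i = 0 and h_i(x_*) = 1, so min{x_i, h_i(x_*)} = 0 in both cases.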
We used the same initial points as [22]. We worked on a SUN SparcStation 2 and used the SUN FORTRAN compiler, with single precision in the ordinary iterations and double precision in the special iterations. In the purely local methods, convergence is declared at x_k whenever ||F(x_k)||_2 ≤ √n · 10^{-5}, and divergence is declared if either the number of iterations performed is greater than 100 or ||F(x_k)||_∞ > 10^{20}. In the globalized methods, convergence at iteration k is declared if ||F(x_k)||_2 ≤ √n · 10^{-5} or Δ_k < 10^{-10}. In the former case we have convergence to a solution of (1), whereas in the latter x_k is a "strangled point", stationary for problem (6) but not necessarily a solution of (1), which corresponds to a breakdown at Step 2 of Algorithm 3.1. The results of the global methods are sensitive to the choice of the parameters γ and q. We used γ = 0.9 and q = 5 for all tests. The results of this set of experiments are summarized in Tables 6.1, 6.2 and 6.3, where, respectively, we present the performance of Newton's method, Broyden's method and the Column-Updating method, both in their local and global versions. In the global version, every cycle of local iterations is executed with a Newton initialization. In our tables we adopted the following notation to present the numerical results: (IT, NFI, T) for the local methods and (IT (N,G), EVALF, NFI, T) for the global ones, where IT represents the total number of iterations performed; NFI = ||F(x̄)||_2/√n, where x̄ is the vector obtained in the last iteration; T is the total time spent (in seconds) in the numerical phase; N is the total number of Newton iterations; G is the total number of special iterations; and EVALF is the total number of evaluations of f(x). We observe that the symbolic phase is performed twice for each problem, since we have two different dimensions (n = 100 and n = 1000). For the smaller dimension, it lasts 0.01 seconds on average.
For n = 1000, the average time spent in the symbolic phase is 0.1 seconds.
Problem | n    | Local Version (IT, NFI, T) | Global Version (IT (N,G), EVALF, NFI, T)
1       | 100  | (100, 0.5E-1, 1.04)        | (24(10,14), 276, 0.2E-1, 3.1)
1       | 1000 | (4, 0.2E-8, 0.34)          | (4(4,0), 5, 0.2E-8, 0.34)
2       | 100  | (3, 0.3E-8, 0.04)          | (3(3,0), 4, 0.3E-8, 0.04)
2       | 1000 | (2, 0.1E-6, 0.19)          | (2(2,0), 3, 0.1E-6, 0.23)
3       | 100  | (100, 0.2E+1, 1.16)        | (37(7,30), 642, 0.1, 8.01)
3       | 1000 | (100, 0.2E+2, 9.52)        | (100(5,95), 1605, 0.2, 2168.7)
4       | 100  | (15, 0.9E-6, 0.15)         | (11(8,3), 28, 0.0, 0.35)
4       | 1000 | (15, 0.3E-6, 1.27)         | (23(15,8), 55, 0.0, 7.01)
5       | 100  | (51, 0.5E-5, 0.59)         | (35(22,13), 112, 0.4E-5, 6.72)
5       | 1000 | (100, 0.3, 10.72)          | (59(38,21), 145, 0.9E-5, 96.92)
6       | 100  | (100, 0.1E+5, 0.83)        | (66(10,56), 1178, 0.6, 21.1)
6       | 1000 | (58, 0.1E+13, 3.72)        | (12(4,8), 189, 0.1E+1, 20.4)
7       | 100  | (100, 0.2E+9, 0.81)        | (11(11,3), 31, 0.2E-11, 0.34)
7       | 1000 | (100, 0.6E+8, 6.76)        | (11(9,2), 30, 0.2E-6, 3.1)
8       | 100  | (100, 0.2E+9, 1.18)        | (11(9,2), 25, 0.1E-5, 0.55)
8       | 1000 | (100, 0.7E+8, 9.86)        | (18(13,5), 46, 0.1E-6, 9.84)
9       | 100  | (100, 0.5E+6, 1.43)        | (28(18,10), 82, 0.1E-5, 2.2)
9       | 1000 | (100, 0.2E+6, 13.34)       | (27(17,10), 83, 0.1E-6, 25.8)
10      | 100  | (5, 0.5E-6, 0.05)          | (5(5,0), 6, 0.5E-6, 0.07)
10      | 1000 | (5, 0.3E-6, 0.48)          | (5(5,0), 6, 0.3E-6, 0.49)
11      | 100  | (1, 0.0, 0.01)             | (1(1,0), 2, 0.0, 0.01)
11      | 1000 | (1, 0.0, 0.05)             | (1(1,0), 2, 0.0, 0.04)
12      | 100  | (6, 0.1E-11, 0.04)         | (6(6,0), 7, 0.1E-11, 0.05)
12      | 1000 | (6, 0.1E-11, 0.36)         | (6(6,0), 7, 0.1E-11, 0.36)
13      | 100  | (100, 0.2E-3, 0.62)        | (12(10,2), 31, 0.0, 0.24)
13      | 1000 | (100, 0.2E-3, 4.36)        | (11(9,2), 32, 0.9E-5, 2.52)
14      | 100  | (100, 0.1E+1, 0.76)        | (29(7,22), 489, 0.3, 4.3)
14      | 1000 | (100, 0.1E+1, 6.35)        | (34(10,14), 507, 0.3, 47.5)
15      | 100  | (100, 0.1E+8, 2.52)        | (22(16,6), 61, 0.2E-6, 4.2)
15      | 1000 | (100, 0.2E+9, 23.8)        | (25(19,6), 55, 0.2E-6, 135.7)
16      | 100  | (4, 0.2E-7, 0.03)          | (4(4,0), 5, 0.2E-7, 0.04)
16      | 1000 | (4, 0.1E-6, 0.26)          | (4(4,0), 5, 0.1E-6, 0.25)
17      | 100  | (100, 0.4E+15, 0.85)       | (100(8,92), 108, 0.5E-1, 59.4)
17      | 1000 | overflow                   | overflow

Table 6.1 - Newton's Method: First set of problems
Problem | n    | Local Version (IT, NFI, T) | Global Version (IT (N,G), EVALF, NFI, T)
1       | 100  | (7, 0.5E-6, 0.06)          | (7(1,0), 8, 0.5E-6, 0.06)
1       | 1000 | (100, 0.8E+6, 14.1)        | (10(2,3), 12, 0.3E-6, 0.98)
2       | 100  | (100, 0.2E+7, 1.61)        | (10(1,1), 12, 0.3E-15, 0.1)
2       | 1000 | (100, 0.7E+6, 14.5)        | (10(1,1), 12, 0.2E-17, 0.9)
3       | 100  | (100, 0.2E+3, 1.66)        | (41(6,34), 702, 0.1, 9.15)
3       | 1000 | (100, 0.4E+3, 14.68)       | (100(4,95), 1605, 0.2, 2170.3)
4       | 100  | (67, 0.2E+26, 0.84)        | (20(7,6), 47, 0.1E-8, 0.56)
4       | 1000 | (14, 0.1E+14, 0.82)        | (17(4,3), 34, 0.8E-7, 3.67)
5       | 100  | (5, 0.2E+35, 0.06)         | (37(14,10), 147, 0.6E-5, 7.3)
5       | 1000 | (100, 0.5E+22, 0.26)       | (85(27,50), 420, 0.8E-5, 868.)
6       | 100  | (100, 0.2E+1, 1.49)        | (67(9,56), 1179, 0.6, 21.5)
6       | 1000 | (100, 0.1E+1, 13.4)        | (13(3,8), 190, 0.1E+1, 20.3)
7       | 100  | (100, 0.1E+11, 1.53)       | (29(9,8), 62, 0.1E-5, 0.7)
7       | 1000 | (100, 0.9E+11, 13.8)       | (42(13,12), 112, 0.7E-5, 17.9)
8       | 100  | (100, 0.7E-2, 1.64)        | (30(10,9), 77, 0.2E-6, 1.35)
8       | 1000 | (92, 0.8E-5, 12.44)        | (33(9,8), 73, 0.6E-12, 12.15)
9       | 100  | (100, 0.6, 1.64)           | (59(18,26), 210, 0.2E-7, 5.37)
9       | 1000 | (100, 0.2E-1, 14.8)        | (61(19,28), 222, 0.9E-5, 50.3)
10      | 100  | (17, 0.5E-5, 0.13)         | (17(1,0), 18, 0.5E-5, 0.13)
10      | 1000 | (20, 0.3E-5, 1.23)         | (20(1,0), 21, 0.3E-5, 1.22)
11      | 100  | (1, 0.0, 0.01)             | (1(1,0), 2, 0.0, 0.01)
11      | 1000 | (1, 0.0, 0.04)             | (1(1,0), 2, 0.0, 0.04)
12      | 100  | (12, 0.1E-7, 0.07)         | (12(1,0), 13, 0.1E-7, 0.07)
12      | 1000 | (12, 0.5E-6, 0.55)         | (12(1,0), 13, 0.5E-6, 0.54)
13      | 100  | (13, 0.5E+3, 0.08)         | (13(4,3), 31, 0.0, 0.25)
13      | 1000 | (100, 0.5E+3, 12.3)        | (13(4,3), 32, 0.9E-5, 2.45)
14      | 100  | (100, 0.4E+1, 1.54)        | (30(6,22), 490, 0.3, 4.31)
14      | 1000 | (100, 0.1E+1, 13.5)        | (34(8,24), 507, 0.3, 47.2)
15      | 100  | (100, 0.5E+11, 1.85)       | (29(7,6), 50, 0.3E-7, 2.03)
15      | 1000 | (100, 0.1E+12, 17.0)       | (33(11,10), 70, 0.1E-5, 169.7)
16      | 100  | (15, 0.2E-5, 0.10)         | (15(1,0), 16, 0.2E-5, 0.09)
16      | 1000 | (32, 0.7E-5, 2.07)         | (15(2,1), 22, 0.2E-6, 22.5)
17      | 100  | (100, 0.4E+17, 1.57)       | (100(7,92), 108, 0.5E-1, 59.3)
17      | 1000 | overflow                   | overflow

Table 6.2 - Broyden's Method: First set of problems
Problem | n    | Local Version (IT, NFI, T) | Global Version (IT (N,G), EVALF, NFI, T)
1       | 100  | (9, 0.2E-5, 0.07)          | (9(1,0), 10, 0.2E-5, 0.06)
1       | 1000 | (13, 0.1E-5, 0.57)         | (5(2,1), 7, 0.1E-5, 0.97)
2       | 100  | (100, 0.9E+6, 1.18)        | (11(2,1), 13, 0.3E-7, 1.23)
2       | 1000 | (100, 0.7E+7, 9.14)        | (11(2,1), 13, 0.8E-5, 6.84)
3       | 100  | (100, 0.6E+1, 1.08)        | (41(6,34), 702, 0.1, 9.3)
3       | 1000 | (100, 0.5E+3, 9.14)        | (100(5,95), 1605, 0.2, 2169.3)
4       | 100  | (54, 0.4E+18, 0.47)        | (20(7,6), 47, 0.1E-6, 0.54)
4       | 1000 | (54, 0.1E+18, 3.79)        | (29(3,3), 44, 0.9E-6, 4.15)
5       | 100  | (3, 0.2E+27, 0.03)         | (35(14,13), 145, 0.9E-5, 7.31)
5       | 1000 | (3, 0.6E+26, 0.18)         | (99(36,46), 422, 0.8E-5, 971.9)
6       | 100  | (100, 0.1E+1, 1.6)         | (67(9,56), 1179, 0.6, 21.1)
6       | 1000 | (100, 0.1E+1, 14.4)        | (13(3,8), 190, 0.1E+1, 20.3)
7       | 100  | (100, 0.5E+20, 1.00)       | (30(9,8), 63, 0.1E-5, 0.7)
7       | 1000 | (100, 0.8E+20, 8.41)       | (37(14,13), 110, 0.1E-5, 18.2)
8       | 100  | (100, 0.2E+4, 1.05)        | (27(10,9), 74, 0.2E-11, 1.33)
8       | 1000 | (100, 0.2E+4, 8.77)        | (29(6,5), 54, 0.7E-7, 7.49)
9       | 100  | (100, 1.36, 1.11)          | (53(16,25), 194, 0.2E-11, 4.71)
9       | 1000 | (100, 0.1E+1, 9.59)        | (54(16,24), 192, 0.7E-11, 43.4)
10      | 100  | (13, 0.3E-5, 0.09)         | (13(1,0), 14, 0.3E-5, 0.1)
10      | 1000 | (13, 0.3E-5, 0.58)         | (13(1,0), 14, 0.3E-5, 0.6)
11      | 100  | (1, 0.0, 0.01)             | (1(1,0), 2, 0.0, 0.01)
11      | 1000 | (1, 0.0, 0.05)             | (1(1,0), 2, 0.0, 0.04)
12      | 100  | (12, 0.5E-6, 0.05)         | (12(1,0), 13, 0.5E-6, 0.08)
12      | 1000 | (12, 0.5E-6, 0.45)         | (12(1,0), 13, 0.5E-6, 0.45)
13      | 100  | (100, 0.9E+3, 0.8)         | (13(3,2), 27, 0.6E-5, 0.21)
13      | 1000 | (100, 0.9E+3, 6.4)         | (14(3,2), 30, 0.0, 2.16)
14      | 100  | (100, 0.1E+4, 0.99)        | (30(6,22), 490, 0.3, 4.34)
14      | 1000 | (100, 0.8E+3, 7.96)        | (34(8,24), 507, 0.3, 47.2)
15      | 100  | (100, 0.2E+11, 1.25)       | (23(7,6), 47, 0.5E-6, 1.99)
15      | 1000 | (100, 0.4E+12, 11.7)       | (34(11,10), 71, 0.1E-5, 170.4)
16      | 100  | (63, 0.2E-5, 0.51)         | (20(6,6), 58, 0.3E-7, 2.41)
16      | 1000 | (34, 0.5E-5, 1.55)         | (18(2,1), 20, 0.3E-6, 0.93)
17      | 100  | (100, 0.3E+16, 0.96)       | (100(7,92), 108, 0.5E-1, 59.4)
17      | 1000 | overflow                   | overflow

Table 6.3 - Column-Updating Method: First set of problems
We proceed to the analysis of the numerical results. In the local version of Newton's method, 14 cases converged out of the 34 tests performed. For these 14 successful cases, the global version of Newton's method reproduced exactly the same iterations as the local version in 11 tests. In 2 other cases, the global version reached convergence in a smaller number of iterations than the local one, but with an increase in the total execution time. In only one case was the performance of the global version worse than that of the local version: problem 4 with n = 1000 required more iterations to reach the same solution of (1). In the 20 tests in which the local version of Newton's method failed to converge, the global version effectively worked in 11 tests, reaching the solution of system (1). In 6 other cases the sequence generated by the global version converged to a stationary point of (6). In the remaining 3 tests, interruption by overflow occurred or the maximum number of iterations was reached. Comparing the quasi-Newton methods, their performances were practically identical, with a slight advantage for the Column-Updating method in terms of running time, since in that method the number of operations is smaller than in Broyden's algorithm. This fact becomes more noticeable as the dimension increases. In the local version of the quasi-Newton methods, 48 cases failed out of the 68 tests performed. For these 48 cases, the global version reached convergence to a solution of (1) in 32 tests. Ten other cases converged to stationary points of (6) and, finally, there was failure in the remaining 6 tests. As regards the 20 tests in which the local version of the quasi-Newton methods converged, the global version reproduced the sequence of local iterations in 15 cases.
In the 5 remaining cases, analogously to the behavior of the global Newton's method, a solution of the system (1) was obtained in fewer iterations but with more time than in the local version. Comparing Newton and quasi-Newton methods, we observe that they behave almost in the same way. In numbers, the local version of Newton's method performed successfully in roughly 40% of the tests (14 out of 34), while the local quasi-Newton methods converged in roughly 30% of the tests (20 out of 68). Therefore, the globalization is as effective for the quasi-Newton methods as it is for Newton's algorithm. Such behavior is quite different from the numerical results of [12], where the differentiable Luksan problems [22] were used. For Newton's method, 15 out of 34 local tests and 24 out of 34 global ones had the same performance, in terms of convergence or divergence, in both the differentiable and the nondifferentiable problems. On the other hand, for the quasi-Newton methods, 53 out of 68 local tests and 47 out of 68 global ones performed analogously in both cases; Broyden's method and the Column-Updating method behave in the same way. Through these figures, we can see that Newton's method is not as robust for nondifferentiable problems as it is for differentiable ones. In fact, in the present experiments, although the local version of Newton's method achieves slightly better results than the local quasi-Newton methods, the globalization is considerably more effective for the quasi-Newton methods than for Newton's method.
We performed a second set of experiments, where g_i, x_* and f_i are given by (28), (29) and (30), and h_i is defined by

    h_i(x) = g_i(x) − g_i(x_*)       if i is odd or i > n/2,
    h_i(x) = g_i(x) − g_i(x_*) + 1   otherwise.

In this case, the function F is not differentiable at x_*, since h_i(x_*) = [x_*]_i = 0 if i is even and i > n/2. In Tables 6.4, 6.5 and 6.6 we report the performance of Newton's method, Broyden's method and the Column-Updating method, for n = 100.
Problem | Newton Local Version      | Newton Global Version
1       | (4, 0.1E-15, 0.05)        | (4(4,0), 5, 0.1E-15, 0.05)
2       | (100, 0.2E-1, 1.19)       | (32(13,19), 336, 0.1E-1, 5.83)
3       | (100, 0.2E+1, 1.13)       | (41(9,32), 657, 0.7E-1, 7.19)
4       | (15, 0.9E-6, 0.16)        | (11(8,3), 28, 0.0, 0.33)
5       | (86, 0.3E-6, 0.99)        | (100(43,57), 485, 0.3E-4, 53.1)
6       | (100, 0.6E+4, 0.83)       | (100(10,90), 2058, 0.2E+1, 26.0)
7       | (100, 0.1E+8, 0.85)       | (14(11,3), 30, 0.3E-10, 0.37)
8       | (100, 0.2E+9, 1.09)       | (11(9,2), 25, 0.5E-5, 0.68)
9       | (100, 0.2E+9, 1.68)       | (100(35,65), 698, 0.3, 27.9)
10      | (5, 0.3E-6, 0.07)         | (5(5,0), 6, 0.3E-6, 0.05)
11      | (1, 0.0, 0.01)            | (1(1,0), 2, 0.0, 0.01)
12      | (100, 0.3E+10, 0.81)      | (6(5,1), 14, 0.1E-5, 0.18)
13      | (100, 0.2E-3, 0.62)       | (12(10,2), 31, 0.0, 0.3)
14      | (100, 0.2E+1, 0.85)       | (69(14,55), 1226, 0.2, 37.9)
15      | (100, 0.5E+7, 2.6)        | (26(20,6), 60, 0.2E-6, 3.04)
16      | (4, 0.1E-6, 0.03)         | (4(4,0), 5, 0.1E-6, 0.04)
17      | (100, 0.9E+16, 0.86)      | (100(10,90), 110, 0.2E+6, 53.7)

Table 6.4 - Newton's Method: Second set of problems
Problem | Broyden Local Version     | Broyden Global Version
1       | (100, 0.1E+7, 1.57)       | (11(2,1), 13, 0.6E-5, 0.18)
2       | (100, 0.1E+7, 1.57)       | (50(9,29), 483, 0.1E-1, 7.17)
3       | (100, 0.1E+3, 1.61)       | (44(8,35), 682, 0.7E-1, 7.21)
4       | (31, 0.3E+20, 0.28)       | (14(6,5), 39, 0.2E-5, 0.5)
5       | (5, 0.2E+31, 1.04)        | (36(15,16), 143, 0.9E-5, 7.41)
6       | (100, 0.3E+1, 1.54)       | (100(9,88), 2032, 0.2E+1, 24.5)
7       | (100, 0.2E+11, 1.47)      | (31(11,10), 77, 0.6E-5, 1.32)
8       | (100, 0.2, 1.56)          | (30(10,9), 77, 0.2E-6, 1.31)
9       | (100, 0.4E+0, 1.64)       | (33(7,16), 325, 0.2, 6.31)
10      | (20, 0.6E-5, 0.16)        | (10(2,1), 12, 0.3E-5, 0.12)
11      | (1, 0.0, 0.01)            | (1(1,0), 2, 0.0, 0.01)
12      | (100, 0.6E-3, 1.55)       | (10(2,1), 12, 0.3E-6, 0.09)
13      | (100, 0.3E+3, 1.5)        | (13(4,3), 30, 0.4E-8, 0.3)
14      | (100, 0.3E+1, 1.62)       | (70(10,55), 1227, 0.2, 38.3)
15      | (100, 0.3E+11, 1.90)      | (30(10,9), 65, 0.4E-7, 4.38)
16      | (100, 0.3E-1, 1.56)       | (17(2,1), 19, 0.3E-6, 1.15)
17      | (100, 0.5E+17, 1.57)      | (100(10,90), 110, 0.2E+6, 52.9)

Table 6.5 - Broyden's Method: Second set of problems

Problem | Column-Updating Local Version | Column-Updating Global Version
1       | (100, 0.4E+7, 1.05)           | (11(2,1), 13, 0.3E-5, 0.17)
2       | (100, 0.9E+7, 1.11)           | (45(7,25), 458, 0.1E-1, 6.96)
3       | (100, 0.4E+3, 1.15)           | (44(8,35), 682, 0.7E-1, 7.16)
4       | (87, 0.5E+23, 1.86)           | (14(6,5), 39, 0.2E-5, 0.5)
5       | (3, 0.2E+27, 1.02)            | (40(14,15), 146, 0.9E-5, 6.97)
6       | (100, 0.2E+2, 1.05)           | (100(9,87), 2006, 0.2E+1, 24.6)
7       | (100, 0.5E+12, 1.00)          | (32(11,10), 78, 0.8E-6, 1.28)
8       | (100, 0.2E+1, 1.02)           | (26(9,8), 68, 0.4E-6, 1.25)
9       | (100, 2.4, 1.21)              | (60(20,28), 219, 0.2E-5, 5.92)
10      | (14, 0.7E-5, 0.07)            | (14(1,0), 15, 0.7E-5, 0.1)
11      | (1, 0.0, 0.01)                | (1(1,0), 2, 0.0, 0.01)
12      | (100, 0.8E+1, 0.89)           | (14(2,1), 23, 0.2E-5, 0.19)
13      | (100, 0.9E+3, 0.8)            | (16(3,2), 30, 0.6E-6, 0.23)
14      | (100, 0.7E+3, 1.19)           | (40(5,26), 483, 0.3, 9.79)
15      | (100, 0.8E+11, 1.54)          | (23(6,5), 41, 0.7E-7, 1.4)
16      | (100, 0.2E+0, 1.03)           | (14(2,1), 16, 0.8E-7, 0.64)
17      | (100, 0.2E+17, 0.94)          | (100(10,90), 110, 0.2E+6, 54.4)

Table 6.6 - Column-Updating Method: Second set of problems
In more than 50% of these experiments, the solution obtained was different from the nondifferentiable point x_*. A summary of these cases is given in Table 6.7. We also observe, in this set of experiments, that the pure local versions of the methods are far less efficient than the global versions.

Execution      | Method          | Convergence to a solution x ≠ x_* | Convergence to x_* | Convergence to a stationary point
Local Version  | Newton          | 4 | 2 | -
               | Broyden         | 2 | 0 | -
               | Column-Updating | 2 | 0 | -
Global Version | Newton          | 4 | 6 | 3
               | Broyden         | 6 | 5 | 4
               | Column-Updating | 6 | 6 | 3

Table 6.7 - Summary of the performance of the methods when the function is not differentiable at the solution x_*

Following a suggestion of a referee, we observed the rate ||x_{k+1} − x_*|| / ||x_k − x_*|| at the final iterations of the three methods tested. A typical result is shown in Table 6.8. Here, the execution reported is global, but the final four iterations are purely local. In this case, the practical behavior of the rate seems to reflect theoretical superlinearity. This property, which has been proved in [36] for Newton's method, probably holds for quasi-Newton methods in many particular situations. In fact, the local convergence properties of quasi-Newton methods for nonsmooth systems represent one of the most challenging problems in this research area.

Newton        | Broyden       | Column-Updating
0.7972983E-01 | 0.7176153E-01 | 0.7176153E-01
0.1338583E+00 | 0.8193998E-01 | 0.6818981E-01
0.1487052E-01 | 0.5369848E-02 | 0.4678174E-02
0.2918130E-03 | 0.2844602E-01 | 0.2814408E-01

Table 6.8 - Convergence rate ||x_{k+1} − x_*|| / ||x_k − x_*|| for the last four iterations of the global versions of the Newton, Broyden and Column-Updating methods for Problem 4
References

[1] C. G. Broyden, A class of methods for solving nonlinear simultaneous equations, Mathematics of Computation 19 (1965) 577-593.
[2] C. G. Broyden, J. E. Dennis and J. J. More, On the local and superlinear convergence of quasi-Newton methods, Journal of the Institute of Mathematics and Applications 12 (1973) 223-246.
[3] X. Chen, On the convergence of Broyden-like methods for nonlinear equations with nondifferentiable terms, Annals of the Institute of Statistical Mathematics 42 (1990) 387-401.
[4] X. Chen and L. Qi, A parameterized Newton method and a quasi-Newton method for nonsmooth equations, Computational Optimization and Applications 3 (1994) 157-179.
[5] X. Chen and T. Yamamoto, On the convergence of some quasi-Newton methods for nonlinear equations with nondifferentiable operators, Computing 48 (1992) 87-94.
[6] J. E. Dennis and J. J. More, Quasi-Newton methods, motivation and theory, SIAM Review 19 (1977) 46-89.
[7] J. E. Dennis and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations (Prentice-Hall, Englewood Cliffs, New Jersey, 1983).
[8] J. E. Dennis and H. F. Walker, Convergence theorems for least-change secant update methods, SIAM Journal on Numerical Analysis 18 (1981) 949-987.
[9] M. A. Diniz-Ehrhardt and J. M. Martinez, A parallel projection method for overdetermined nonlinear systems of equations, Numerical Algorithms 4 (1993) 241-262.
[10] S. C. Eisenstat and H. F. Walker, Globally convergent inexact Newton methods, Research Report, Department of Mathematics and Statistics, Utah State University, USA.
[11] R. Fletcher, Practical Methods of Optimization (2nd edition) (John Wiley and Sons, New York, 1987).
[12] A. Friedlander, M. A. Gomes-Ruggiero, J. M. Martinez and S. A. Santos, A new globalization strategy for the resolution of nonlinear systems of equations, Relatorio de Pesquisa RP04/94, Institute of Mathematics, University of Campinas, Brazil, 1994.
[13] A. Friedlander and J. M. Martinez, On the maximization of a concave quadratic function with box constraints, to appear in SIAM Journal on Optimization.
[14] A. Friedlander, J. M. Martinez and S. A. Santos, A new trust region algorithm for bound constrained minimization, to appear in Journal of Applied Mathematics & Optimization.
[15] A. George and E. Ng, Symbolic factorization for sparse Gaussian elimination with partial pivoting, SIAM Journal on Scientific and Statistical Computing 8 (1987) 877-898.
[16] G. H. Golub and Ch. F. Van Loan, Matrix Computations (The Johns Hopkins University Press, Baltimore and London, 1989).
[17] M. A. Gomes-Ruggiero and J. M. Martinez, The Column-Updating Method for solving nonlinear equations in Hilbert space, RAIRO Mathematical Modelling and Numerical Analysis 26 (1992) 309-330.
[18] M. A. Gomes-Ruggiero, J. M. Martinez and A. C. Moretti, Comparing algorithms for solving sparse nonlinear systems of equations, SIAM Journal on Scientific and Statistical Computing 13 (1992) 459-483.
[19] L. Grippo, F. Lampariello and S. Lucidi, A nonmonotone line search technique for Newton's method, SIAM Journal on Numerical Analysis 23 (1986) 707-716.
[20] S. P. Han, J. S. Pang and N. Rangaraj, Globally convergent Newton methods for nonsmooth equations, Mathematics of Operations Research 17 (1992) 586-607.
[21] C. M. Ip and J. Kyparisis, Local convergence of quasi-Newton methods for B-differentiable equations, Mathematical Programming 56 (1992) 71-90.
[22] L. Luksan, Inexact trust region method for large sparse systems of nonlinear equations, Technical Report no. 547, January 1993, Institute of Computer Science, Academy of Sciences of the Czech Republic.
[23] J. M. Martinez, A quasi-Newton method with modification of one column per iteration, Computing 33 (1984) 353-362.
[24] J. M. Martinez, On the convergence of the Column-Updating Method, Matematica Aplicada e Computacional 12 (1993) 83-95.
[25] J. M. Martinez, Local convergence theory for inexact Newton methods based on structural least-change updates, Mathematics of Computation 55 (1990) 143-168.
[26] J. M. Martinez, On the relation between two local convergence theories of least change secant update methods, Mathematics of Computation 59 (1992) 457-481.
[27] J. M. Martinez, A theory of secant preconditioners, Mathematics of Computation 60 (1993) 681-698.
[28] J. M. Martinez and L. Qi, Inexact Newton methods for solving nonsmooth equations, Relatorio de Pesquisa 67/93, Institute of Mathematics, University of Campinas, Brazil, 1993, to appear in Journal of Computational and Applied Mathematics.
[29] J. M. Martinez and M. C. Zambaldi, Least change update methods for nonlinear systems with nondifferentiable terms, Numerical Functional Analysis and Optimization 14 (1993) 405-415.
[30] H. Matthies and G. Strang, The solution of nonlinear finite element equations, International Journal of Numerical Methods in Engineering 14 (1979) 1613-1626.
[31] R. Mifflin, Semismooth and semiconvex functions in constrained optimization, SIAM Journal on Control and Optimization 15 (1977) 957-972.
[32] J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables (Academic Press, New York, 1970).
[33] J. S. Pang, Newton's method for B-differentiable equations, Mathematics of Operations Research 15 (1990) 311-341.
[34] J. S. Pang, A B-differentiable equation based, globally and locally quadratically convergent algorithm for nonlinear programs, complementarity and variational inequality problems, Mathematical Programming 51 (1991) 101-131.
[35] L. Qi, Convergence analysis of some algorithms for solving nonsmooth equations, Mathematics of Operations Research 18 (1993) 227-244.
[36] L. Qi and J. Sun, A nonsmooth version of Newton's method, Mathematical Programming 58 (1993) 353-368.
Approximate Newton Methods
Recent Advances in Nonsmooth Optimization, pp. 141-158 Eds. D.-Z. Du, L. Qi and R.S. Womersley ©1995 World Scientific Publishing Co Pte Ltd
Superlinear Convergence of Approximate Newton Methods for LC¹ Optimization Problems without Strict Complementarity¹

Jiye Han and Defeng Sun
Institute of Applied Mathematics, Academia Sinica, Beijing 100080, P. R. China
Abstract

In this paper, the Q-superlinear convergence of the approximate Newton or SQP methods for solving LC¹ optimization problems is established under the assumptions that the derivatives of the objective and constraint functions are semismooth, that the strong second-order sufficiency condition is satisfied, and that the gradients of the active constraints are linearly independent. The strong second-order sufficiency condition is weaker than the second-order sufficiency condition together with the strict complementarity condition.
1  Introduction

Consider the standard nonlinear programming problem
}(x) g(x) < 0, h(x) = 0,
(1-1)
where f, g and h are differentiable functions from ℝ^n into ℝ, ℝ^p and ℝ^q, respectively. One method for solving (1.1) is to solve the following linearly constrained quadratic program Q_k
    minimize    ∇f(x_k)^T (x − x_k) + (1/2)(x − x_k)^T G_k (x − x_k)
    subject to  g(x_k) + ∇g(x_k)^T (x − x_k) ≤ 0,
                h(x_k) + ∇h(x_k)^T (x − x_k) = 0     (1.2)

¹This work is supported by the National Natural Science Foundation of China.
J. Han and D. Sun
successively. Here G_k is an n × n matrix. This method is called an approximate Newton method or an SQP (sequential quadratic programming) method. If G_k is exactly the second-order derivative of the Lagrangian at x_k, this is Wilson's method; see Garcia Palomares and Mangasarian (Ref. 4) and Robinson (Refs. 21-22). Before the advent of the very recent paper by Qi (Ref. 19), the proofs of superlinear convergence of such approximate Newton or SQP methods for solving nonlinear programming problems required the objective and constraint functions to be twice smooth. Sometimes the second-order derivatives of those functions are required to be Lipschitzian; see, for example, Garcia Palomares and Mangasarian (Ref. 4), Han (Ref. 5), McCormick (Ref. 9) and Robinson (Refs. 21-22). However, second-order differentiability may not hold for some problems. For example, the extended linear-quadratic programming problem, which recently emerged in stochastic programming and optimal control, does not possess a twice differentiable objective function, even in the fully quadratic case. However, its objective function is differentiable and its derivative is Lipschitzian in that case; see Rockafellar (Ref. 24) or Rockafellar and Wets (Ref. 25) for details. We call a function F : ℝ^n → ℝ^m an LC¹ function if it is differentiable and its derivative function is locally Lipschitzian. We call a nonlinear programming problem an LC¹ optimization problem if its objective and constraint functions are LC¹ functions. For details on LC¹ functions and LC¹ optimization problems, see Qi (Ref. 17). In Qi (Ref.
19), the Q-superlinear convergence of the approximate Newton or SQP methods for solving LC¹ optimization problems was established under the assumption that the derivatives of the objective and constraint functions are semismooth, together with the three key assumptions that, in the context of LC¹ optimization, the second-order sufficiency condition, strict complementary slackness, and linear independence of the gradients of the active constraints are satisfied. Based on the generalized equation theory established by Robinson (Ref. 23), Josephy (Refs. 7-8) proved the local superlinear (quadratic) convergence of quasi-Newton (Newton) methods without assuming the strict complementary slackness condition, when second-order differentiability is available. Also based on Robinson's generalized equation theory (Ref. 23), and without assuming the strict complementarity condition, Lescrenier (Ref. 29) proved the convergence of a class of trust region methods proposed by Conn, Gould and Toint (Ref. 30) for optimization problems with simple bound constraints, when the objective function is twice continuously differentiable. In this paper, we discuss the superlinear convergence of approximate Newton or SQP methods for solving LC¹ optimization problems without assuming either second-order differentiability or the strict complementary slackness condition. In a certain sense, our results are the LC¹ version of the results in Josephy (Refs. 7-8), or a generalization of the results in Qi (Ref. 19) without strict complementary slackness. To achieve this, our technique is different from that of Josephy (Refs. 7-8) or Qi (Ref. 19). First we consider the superlinear
Approximate Newton
Methods
143
convergence of a generalized approximate Newton type method for solving nonsmooth equations, recently developed in Pang (Ref. 14) and Qi (Refs. 16-17). Then we prove that the approximate Newton or SQP methods are special cases of such a generalized approximate Newton method. In Section 2, we discuss the strong second-order sufficiency condition and linear independence in the context of LC$^1$ optimization. The Q-superlinear convergence of approximate Newton or SQP methods for LC$^1$ optimization is established in Section 3. In Section 4, we give some discussions.
2 The Strong Second-Order Sufficiency Condition

Throughout this paper, we assume that $f$, $g$ and $h$ in (1.1) are LC$^1$ functions.
The Lagrangian of (1.1) is $L(x,u,v) = f(x) + u^T g(x) + v^T h(x)$. Denote the gradient of $L$ with respect to $x$ by $F_{u,v}$. Then

$$F_{u,v}(x) = \nabla f(x) + \nabla g(x)u + \nabla h(x)v$$

is a locally Lipschitzian function. In Josephy (Refs. 7-8) and Robinson (Ref. 23), the two key assumptions other than second-order differentiability are the strong second-order sufficiency condition and linear independence of the gradients of the active constraints. We still need these two assumptions; however, the strong second-order sufficiency condition must be modified because we do not assume the second-order differentiability of $f$, $g$ and $h$.

In general, assume that $F: R^n \to R^m$ is locally Lipschitzian. By Rademacher's theorem, $F$ is differentiable almost everywhere. Let $D_F$ be the set where $F$ is differentiable, and let $\partial F$ be the generalized Jacobian of $F$ in the sense of Clarke (Ref. 2). Then

$$\partial F(x) = \mathrm{co}\Big\{ \lim_{x^k \to x,\ x^k \in D_F} F'(x^k) \Big\}, \qquad (2.1)$$

where $\mathrm{co}\,A$ denotes the convex hull of a set $A$. In Qi (Ref. 16) and Pang and Qi (Ref. 15), the concept

$$\partial_B F(x) = \Big\{ \lim_{x^k \to x,\ x^k \in D_F} F'(x^k) \Big\}$$

was introduced. Then $\partial F(x) = \mathrm{co}\,\partial_B F(x)$.
J. Han and D. Sun
144
For $m = 1$, $\partial_B F(x)$ was introduced by Shor (Ref. 26). Let $F_i$ denote the $i$-th component of $F$. Sun and Han (Ref. 27) introduced

$$\partial_b F(x) = \partial_B F_1(x) \times \partial_B F_2(x) \times \cdots \times \partial_B F_m(x).$$

Then $\partial_B F(x) \subseteq \partial_b F(x)$, and the converse inclusion does not hold in general. For example, if $F: R^1 \to R^2$ has the form

$$F(x) = \begin{pmatrix} \min(x, x^2) \\ \min(-x, -x^2) \end{pmatrix},$$

then

$$\partial_B F(0) = \Big\{ \begin{pmatrix} 0 \\ -1 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \end{pmatrix} \Big\}, \qquad \partial_b F(0) = \Big\{ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ -1 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \end{pmatrix} \Big\},$$

and $\partial_B F(0) \subsetneq \partial_b F(0)$. But when $m = 1$, $\partial_b F(x) = \partial_B F(x)$.
From the results of Clarke (Ref. 2), Qi (Ref. 16), and Sun and Han (Ref. 27) we know that $\partial F(x)$, $\partial_B F(x)$ and $\partial_b F(x)$ are nonempty compact subsets of $R^{m \times n}$, and that the maps $\partial F$, $\partial_B F$ and $\partial_b F$ are upper semi-continuous (Ref. 1). In fact, noting that $\partial F(x)$ and $\partial_b F(x)$ are compact subsets and that the maps $\partial F$ and $\partial_b F$ are upper semi-continuous (Ref. 2), we can draw the same conclusions for the maps $\partial_B F$ and $\partial_b F$ through standard analysis. In this paper we use $M(x,F)$ to represent one of $\partial F(x)$, $\partial_B F(x)$ and $\partial_b F(x)$, and use the multifunction $M(\cdot,F)$ to represent one of $\partial F$, $\partial_B F$ and $\partial_b F$. Therefore, $M(x,F)$ is a nonempty compact subset of $R^{m \times n}$, and the map $M(\cdot,F)$ is upper semi-continuous.

Suppose that $f_1, f_2: R^n \to R^1$ are continuously differentiable functions. Let $f_0(x) = \min(f_1(x), f_2(x))$; then

$$\partial_B f_0(x) = \begin{cases} \{\nabla f_1(x)^T\} & \text{if } f_1(x) < f_2(x), \\ \{\nabla f_1(x)^T, \nabla f_2(x)^T\} & \text{if } f_1(x) = f_2(x), \\ \{\nabla f_2(x)^T\} & \text{if } f_1(x) > f_2(x). \end{cases}$$

This formula will be used later in this paper.

The first-order Kuhn-Tucker conditions for (1.1) are

$$F_{u,v}(x) = \nabla f(x) + \nabla g(x)u + \nabla h(x)v = 0,$$
$$u \ge 0, \quad g(x) \le 0, \quad u_i g_i(x) = 0 \ \text{for } i = 1, \ldots, p, \qquad (2.2)$$
$$h(x) = 0.$$

Let

$$H(z) = \begin{pmatrix} \nabla f(x) + \nabla g(x)u + \nabla h(x)v \\ \min(u, -g(x)) \\ -h(x) \end{pmatrix}, \qquad (2.3)$$
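As a concrete illustration (our own toy data, not from the paper), consider the problem $\min \frac{1}{2}\|x\|^2$ subject to $g(x) = 1 - x_1 \le 0$ with no equality constraints; its Kuhn-Tucker point is $x^* = (1,0)$ with multiplier $u^* = 1$, and the reformulation of (2.2) via (2.3) can be checked directly:

```python
# Hedged sketch: verify H(z) = 0 at a Kuhn-Tucker point of the toy problem
#   min (1/2)||x||^2  s.t.  g(x) = 1 - x1 <= 0   (no equality constraints),
# whose KT point is x* = (1, 0), u* = 1.  The problem data are our own choice.
import numpy as np

def H(x, u):
    grad_f = x                          # gradient of f(x) = 0.5 * ||x||^2
    grad_g = np.array([-1.0, 0.0])      # gradient of g(x) = 1 - x1
    g = 1.0 - x[0]
    stationarity = grad_f + u * grad_g
    complementarity = np.minimum(u, -g)   # the componentwise min(u, -g(x))
    return np.concatenate([stationarity, [complementarity]])

print(H(np.array([1.0, 0.0]), 1.0))   # zero vector: the KT conditions hold
print(H(np.array([2.0, 0.0]), 1.0))   # nonzero residual at a non-KT point
```

The residual vector plays the role of $H(z)$ with $z = (x, u)$.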
where the `min' operator denotes the componentwise minimum. Then the first-order Kuhn-Tucker conditions are equivalent to $H(z) = 0$. Denote $H_1(z) = \nabla f(x) + \nabla g(x)u + \nabla h(x)v$, $H_2(z) = \min(u, -g(x))$ and $H_3(z) = -h(x)$, so that

$$H(z) = \begin{pmatrix} H_1(z) \\ H_2(z) \\ H_3(z) \end{pmatrix}.$$

For every $z = (x,u,v) \in R^n \times R^p \times R^q$, denote

$$\partial_Q H(z) = M(z, H_1) \times \partial_b H_2(z) \times \{\nabla H_3(z)^T\}.$$

It is easy to see that $\partial_Q H(z)$ is a nonempty compact subset of $R^{m \times m}$, and that the map $\partial_Q H$ is upper semi-continuous, where $m = n + p + q$. For any $A \in M(z, H_1)$, there exists $V \in R^{n \times n}$ such that $A = (V \ \ \nabla g(x) \ \ \nabla h(x))$. Denote

$$V_x(z) = \{ V \in R^{n \times n} \mid (V \ \ \nabla g(x) \ \ \nabla h(x)) \in M(z, H_1) \}.$$

From the definition of the map $M(\cdot,\cdot)$, it is easy to see that for any $z = (x,u,v) \in R^n \times R^p \times R^q$ we have $M(x, F_{u,v}) \subseteq V_x(z)$.

Suppose that $z = (x,u,v) \in R^n \times R^p \times R^q$ is a Kuhn-Tucker point of (1.1). Let

$$I(z) = \{ i \mid 1 \le i \le p,\ g_i(x) = 0 \}, \quad I^+(z) = \{ i \in I(z) \mid u_i > 0 \}, \quad I^0(z) = \{ i \in I(z) \mid u_i = 0 \},$$

$$G(z) = \{ d \in R^n \mid f'(x;d) = 0,\ g_i'(x;d) = 0 \ \text{for } i \in I^+(z),\ g_i'(x;d) \le 0 \ \text{for } i \in I^0(z),\ \text{and } h_i'(x;d) = 0 \ \text{for } i = 1, \ldots, q \}$$

and

$$G^+(z) = \{ d \in R^n \mid f'(x;d) = 0,\ g_i'(x;d) = 0 \ \text{for } i \in I^+(z),\ \text{and } h_i'(x;d) = 0 \ \text{for } i = 1, \ldots, q \}.$$

A point $z = (x,u,v) \in R^n \times R^p \times R^q$ is said to satisfy the second-order sufficiency conditions (strong second-order sufficiency conditions) for (1.1) if it satisfies the first-order Kuhn-Tucker conditions and if $d^T V d > 0$ for all $d \in G(z) \setminus \{0\}$ (respectively, all $d \in G^+(z) \setminus \{0\}$) and all $V \in V_x(z)$.

Suppose that $z = (x,u,v) \in R^n \times R^p \times R^q$ is a Kuhn-Tucker point of (1.1). We say that $z$ satisfies the linear independence condition if the vectors $\{\nabla g_i(x),\ i \in I(z)\}$ and $\{\nabla h_i(x),\ i = 1, \ldots, q\}$ are linearly independent. We say that $z$ satisfies the strict complementarity slackness condition if $I^0(z) = \emptyset$. When the strict complementarity condition is satisfied (i.e., $I^0(z) = \emptyset$), then $G(z) = G^+(z)$. Therefore, the second-order sufficiency conditions together with the strict complementarity slackness condition imply the
strong second-order sufficiency conditions. In general, the strong second-order sufficiency conditions imply the second-order sufficiency conditions, but they do not imply the strict complementarity slackness condition. Since the strict complementarity slackness condition may fail in nonlinear optimization problems, we consider the superlinear convergence properties of approximate Newton or SQP methods for LC$^1$ optimization problems without assuming it.

First, we consider the nonsingularity of the matrices $W \in \partial_Q H(z)$ at a solution of $H(z) = 0$. If the components of such a solution are denoted by $x_0$, $u_0$, $v_0$, we can partition the vector $g(x_0)$ into smaller vectors $g^+(x_0)$, $g^0(x_0)$ and $g^-(x_0)$, of dimensions $r$, $s$ and $t$, respectively, and partition $u_0$ conformably into $u_0^+$, $u_0^0$ and $u_0^-$ so that

$$u_0^+ > 0, \quad g^+(x_0) = 0; \qquad u_0^0 = 0, \quad g^0(x_0) = 0; \qquad u_0^- = 0, \quad g^-(x_0) < 0, \qquad (2.4)$$

where the ordering is componentwise. After a suitable rearrangement, (2.3) can be written as

$$H(z) = \begin{pmatrix} \nabla f(x) + \nabla g(x)u + \nabla h(x)v \\ \min(u^+, -g^+(x)) \\ \min(u^0, -g^0(x)) \\ \min(u^-, -g^-(x)) \\ -h(x) \end{pmatrix}. \qquad (2.5)$$
Theorem 2.1. Suppose that $z_0 = (x_0,u_0,v_0) \in R^n \times R^p \times R^q$ satisfies the strong second-order sufficiency conditions and the linear independence condition of (1.1). Then all $W \in \partial_Q H(z_0)$ are nonsingular.

Proof. According to the definition of $\partial_Q H(z_0)$, we only need to prove, for $i = 0, 1, \ldots, s$, the nonsingularity of the following matrices:

$$W_{(i)} = \begin{pmatrix} V & G_0^{+T} & G_0^{0IT} & G_0^{0JT} & G_0^{-T} & H_0^T \\ -G_0^+ & 0 & 0 & 0 & 0 & 0 \\ -G_0^{0I} & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & I_{j \times j} & 0 & 0 \\ 0 & 0 & 0 & 0 & I_{t \times t} & 0 \\ -H_0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix},$$

where $V \in V_{x_0}(z_0)$, $H_0$ denotes $\nabla h(x_0)^T$, $G_0^+$ denotes $\nabla g^+(x_0)^T$, etc., $I = \{1, \ldots, i\}$ (when $i = 0$, $I = \emptyset$), $J = \{1, \ldots, s\} \setminus I$, $j = |J|$, $G_0^{0I}$ is the matrix of the $I$ rows of $G_0^0$, $G_0^{0J}$ is the matrix of the $J$ rows of $G_0^0$, and $I_{j \times j}$ and $I_{t \times t}$ are the unit matrices of $R^{j \times j}$ and $R^{t \times t}$, respectively. Suppose that $a$, $b$, $c$, $d$, $e$ and $l$ are such that

$$\begin{aligned} Va + G_0^{+T}b + G_0^{0IT}c + G_0^{0JT}d + G_0^{-T}e + H_0^T l &= 0, \\ -G_0^+ a &= 0, \\ -G_0^{0I} a &= 0, \\ I_{j \times j}\, d &= 0, \\ I_{t \times t}\, e &= 0, \\ -H_0 a &= 0. \end{aligned} \qquad (2.6)$$

Therefore, we get

$$\begin{aligned} Va + G_0^{+T}b + G_0^{0IT}c + H_0^T l &= 0, \\ -G_0^+ a &= 0, \\ -G_0^{0I} a &= 0, \\ -H_0 a &= 0. \end{aligned} \qquad (2.7)$$
Premultiplying the equations in (2.7) by $a^T$, $b^T$, $c^T$ and $l^T$, respectively, and adding the results, we find that $a^T V a = 0$. This, together with the second and fourth equations of (2.7) and the strong second-order sufficiency conditions, implies that $a = 0$; the first equation of (2.7) and the linear independence assumption now imply that $b$, $c$ and $l$ are also zero. The fourth and fifth equations of (2.6) mean that $d$ and $e$ are zero. Thus the matrix $W_{(i)}$ is nonsingular. This completes the proof. $\Box$

Corollary 2.1. Under the conditions of Theorem 2.1, there exist $\delta > 0$ and $C > 0$ such that for any $z = (x,u,v) \in R^n \times R^p \times R^q$ satisfying $\|z - z_0\| \le \delta$ and any $W \in \partial_Q H(z)$, $W$ is invertible and $\|W^{-1}\| \le C$.

Proof. Applying Theorem 2.1, together with the facts that $\partial_Q H(z)$ is a nonempty compact subset and that the map $\partial_Q H$ is upper semi-continuous, we easily obtain the conclusion. $\Box$

We say that a locally Lipschitzian function $F: R^n \to R^m$ is semismooth at $x$ if

$$\lim_{\substack{V \in \partial F(x + t h') \\ h' \to h,\ t \downarrow 0}} \{ V h' \} \qquad (2.8)$$

exists for any $h \in R^n$. If $F$ is semismooth at $x$, then $F$ is directionally differentiable at $x$ and $F'(x;h)$ is equal to the limit in (2.8). Semismoothness was first introduced by Mifflin (Ref. 10) for functionals. Convex functions, continuously piecewise linear functions, smooth functions and subsmooth functions are examples of semismooth functions; scalar products and sums of semismooth functions are also semismooth. In Qi (Ref. 16) and Qi and Sun (Ref. 18), the definition of semismoothness was extended to $F: R^n \to R^m$. It was proved in Qi (Ref. 17) that $F$ is semismooth at $x$ if and only if each of its components is semismooth at $x$.
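A minimal numerical sketch of definition (2.8), using the semismooth function $F(x) = |x|$ at $x = 0$ (our own example, not from the paper): the values $V h'$ with $V \in \partial F(t h)$ stabilize as $t \downarrow 0$ and recover $F'(0;h) = |h|$.

```python
# Illustrative check (our own example): F(x) = |x| is semismooth at 0, and the
# limit in (2.8) of V*h' over V in dF(t*h'), h' -> h, t -> 0+ equals |h|.
def dF(x):
    # Clarke subdifferential of |x|: a singleton off 0, sampled values of
    # the interval [-1, 1] at 0 (never hit below, since t*h != 0 for h != 0)
    return [1.0] if x > 0 else [-1.0] if x < 0 else [v / 10 for v in range(-10, 11)]

def limit_candidates(h, ts=(1e-1, 1e-3, 1e-6)):
    # sample V*h along t -> 0+ with h' = h fixed; F is differentiable at t*h != 0
    return [V * h for t in ts for V in dF(t * h)]

print(limit_candidates(2.0))    # every sample equals 2.0 = F'(0; 2) = |2|
print(limit_candidates(-3.0))   # every sample equals 3.0 = |-3|
```

Because all sampled products agree along $t \downarrow 0$, the limit exists, which is exactly the semismoothness requirement for this $F$.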
3 Superlinear Convergence Property
To establish the superlinear convergence of approximate Newton or SQP methods, we need the following two properties of semismoothness. Suppose that $F: R^n \to R^m$ is locally Lipschitzian and semismooth at $x$. Then:

(1) $F$ is B-differentiable at $x$, i.e., $F'(x;h)$ exists for all $h \in R^n$, and

$$F(x+h) = F(x) + F'(x;h) + o(\|h\|); \qquad (3.1)$$

(2) for any $V \in \partial F(x+h)$, $h \to 0$,

$$Vh - F'(x;h) = o(\|h\|). \qquad (3.2)$$

See Theorem 2.3 of Qi and Sun (Ref. 18).

The approximate Newton method (ANM) for solving (1.1) is as follows. Start at a point $z^0 = (x^0,u^0,v^0) \in R^n \times R^p \times R^q$. Having $z^k = (x^k,u^k,v^k)$, find a Kuhn-Tucker point $z^{k+1} = (x^{k+1},u^{k+1},v^{k+1})$ of the quadratic subproblem $Q_k$ described by (1.2). If $z^{k+1}$ is not unique, choose any Kuhn-Tucker point $z^{k+1}$ which is closest to $z^k$ in terms of the distance $\|z^{k+1} - z^k\|$.

Suppose that $z^* = (x^*,u^*,v^*) \in R^n \times R^p \times R^q$ is a solution of $H(z) = 0$ (i.e., $z^*$ is a Kuhn-Tucker point of (1.1)). For every $z = (x,u,v) \in R^n \times R^p \times R^q$, denote

$$\alpha(z) = \{ i \mid u_i > -g_i(x) \}, \quad \beta(z) = \{ i \mid u_i = -g_i(x) \} \quad \text{and} \quad \gamma(z) = \{ i \mid u_i < -g_i(x) \}.$$

For $i \in I^\beta = \{1, \ldots, 2^{|\beta(z^*)|}\}$, define

$$H^{(i)}(z) = \begin{pmatrix} \nabla f(x) + \nabla g(x)u + \nabla h(x)v \\ p^{(i)}(z) \\ -h(x) \end{pmatrix}, \qquad (3.3)$$

where $p^{(i)}(z) \in P(z)$ and $P(z)$ consists of all functions $p(z)$ of the following form:

$$p_j(z) = \begin{cases} -g_j(x) & \text{if } j \in \alpha(z^*), \\ u_j \ \text{or} \ -g_j(x) & \text{if } j \in \beta(z^*), \\ u_j & \text{if } j \in \gamma(z^*), \end{cases} \qquad j = 1, \ldots, p,$$

and define

$$\partial_Q H^{(i)}(z) = M(z, H_1) \times \{\nabla p^{(i)}(z)^T\} \times \{\nabla H_3(z)^T\}.$$

Lemma 3.1. Suppose that $z^* = (x^*,u^*,v^*) \in R^n \times R^p \times R^q$ is a Kuhn-Tucker point of (1.1) and satisfies the conditions of Theorem 2.1. Then there exist positive
constants $\delta$ and $C$ such that for any $z = (x,u,v) \in R^n \times R^p \times R^q$ with $\|z - z^*\| \le \delta$ and any $i \in I^\beta$, all $W_{(i)} \in \partial_Q H^{(i)}(z)$ are invertible and $\|W_{(i)}^{-1}\| \le C$.

Proof. From the definitions of $H^{(i)}(z)$ and $\partial_Q H^{(i)}(z)$ we know that $H^{(i)}(z^*) = 0$ for all $i \in I^\beta$ and

$$\partial_Q H^{(i)}(z^*) \subseteq \partial_Q H(z^*) \quad \text{for all } i \in I^\beta.$$

From Theorem 2.1 we know that all matrices $W \in \partial_Q H(z^*)$ are nonsingular; hence all matrices $W_{(i)} \in \partial_Q H^{(i)}(z^*)$, $i \in I^\beta$, are nonsingular. It is easy to see that all $\partial_Q H^{(i)}(z)$, $i \in I^\beta$, are nonempty compact subsets and that all the maps $\partial_Q H^{(i)}$, $i \in I^\beta$, are upper semi-continuous. Therefore for each $i \in I^\beta$ there exist a neighborhood $N^{(i)}(z^*)$ of $z^*$ and a positive number $C_i$ such that for any $z \in N^{(i)}(z^*)$, all $W_{(i)} \in \partial_Q H^{(i)}(z)$ are nonsingular and satisfy $\|W_{(i)}^{-1}\| \le C_i$. Since $I^\beta$ has finitely many elements, the conclusion of the lemma holds. $\Box$

In order to establish the superlinear convergence of the approximate Newton method, we first consider the following generalized approximate Newton method (GANM) for solving $H(z) = 0$: given $z^0 = (x^0,u^0,v^0) \in R^n \times R^p \times R^q$, for $k = 0, 1, \ldots$, choose $i \in I^\beta$ and let
$$z^{k+1} = z^k - B_{(i)k}^{-1} H^{(i)k}(z^k), \qquad (3.4)$$

where $B_{(i)k} = \nabla H^{(i)k}(z^k)^T$ and $H^{(i)k}$ is defined as

$$H^{(i)k}(z) = \begin{pmatrix} \nabla f(x^k) + \nabla g(x^k)u + \nabla h(x^k)v + G_k(x - x^k) \\ q^{(i)k}(z) \\ -h(x^k) - \nabla h(x^k)^T (x - x^k) \end{pmatrix}, \qquad (3.5)$$

$i \in I^\beta$, where $q^{(i)k}(z)$ is defined as

$$q_j^{(i)k}(z) = \begin{cases} -g_j(x^k) - \nabla g_j(x^k)^T (x - x^k) & \text{if } j \in \alpha(z^*), \\ p_j^{(i)}(z^k) + \nabla p_j^{(i)}(z^k)^T (z - z^k) & \text{if } j \in \beta(z^*), \\ u_j & \text{if } j \in \gamma(z^*), \end{cases} \qquad (3.6)$$

$j = 1, \ldots, p$, and $G_k \in R^{n \times n}$.

Remark 3.1. In practice we cannot use the above method, since we do not know $z^*$. However, the method provides an approach to proving the Q-superlinear convergence of the approximate Newton method.

Theorem 3.1. Suppose that $z^* = (x^*,u^*,v^*) \in R^n \times R^p \times R^q$ is a Kuhn-Tucker point of (1.1) and satisfies the conditions of Theorem 2.1. Suppose that $\nabla f$, $\nabla g$ and
$\nabla h$ are semismooth at $x^*$. Let $C$ and $\delta$ be the positive constants in Lemma 3.1. If there exists $V_k \in V_{x^k}(z^k)$ such that

$$\|G_k - V_k\| \le \frac{1}{8C} \quad \text{for all } k, \qquad (3.7)$$

then the method GANM is well defined and Q-linearly converges to $z^*$ in a neighborhood of $z^*$. If, furthermore,

$$\lim_{k \to \infty} \frac{\|(G_k - V_k)(x^{k+1} - x^k)\|}{\|z^{k+1} - z^k\|} = 0, \qquad (3.8)$$

then the convergence is Q-superlinear. If in the latter case $H(z^k) \ne 0$, we have

$$\lim_{k \to \infty} \frac{\|H(z^{k+1})\|}{\|H(z^k)\|} = 0. \qquad (3.9)$$

Proof. Since $\nabla f$, $\nabla g$ and $\nabla h$ are semismooth at $x^*$, the maps $H$ and $H^{(i)}$, $i \in I^\beta$, are semismooth at $z^*$. From the definitions of $V_{x^k}(z^k)$ and $\partial_Q H^{(i)}(z^k)$, $i \in I^\beta$, for each $B_{(i)k}$, $i \in I^\beta$, there exists $W_{(i)k} \in \partial_Q H^{(i)}(z^k)$ such that for any $z = (x,u,v) \in R^n \times R^p \times R^q$,

$$\|(B_{(i)k} - W_{(i)k}) z\| = \|(V_k - G_k) x\|. \qquad (3.10)$$

In particular, we have

$$\|B_{(i)k} - W_{(i)k}\| \le \|V_k - G_k\| \le \frac{1}{8C}. \qquad (3.11)$$

If $\|z^k - z^*\| \le \delta$, then by Lemma 3.1, $W_{(i)k}^{-1}$ exists and $\|W_{(i)k}^{-1}\| \le C$. By the Perturbation Lemma of Ortega and Rheinboldt (Ref. 12, p. 45), $B_{(i)k}$ is invertible and

$$\|B_{(i)k}^{-1}\| \le 2C. \qquad (3.12)$$

Recall that a map is semismooth at $z^*$ if and only if each of its components is semismooth at $z^*$, and that the set $I^\beta$ is finite. So by (3.1) and (3.2), for every $\varepsilon > 0$ there exists a neighborhood $N(z^*)$ of $z^*$ such that when $z \in N(z^*)$ and $W_{(i)} \in \partial_Q H^{(i)}(z)$ (note that $W_{(i)j} \in \partial H_j^{(i)}(z)$), we have

$$\|H^{(i)}(z) - H^{(i)}(z^*) - W_{(i)}(z - z^*)\| \le \sum_{j=1}^{n+p+q} |H_j^{(i)}(z) - H_j^{(i)}(z^*) - W_{(i)j}(z - z^*)| \le \varepsilon \|z - z^*\| \quad \text{for all } i \in I^\beta. \qquad (3.13)$$
So we may choose $\delta_1 > 0$ sufficiently small such that when $\|z^k - z^*\| \le \delta_1$, for any $i \in I^\beta$ we have

$$\|H^{(i)}(z^k) - H^{(i)}(z^*) - W_{(i)k}(z^k - z^*)\| \le \frac{1}{8C} \|z^k - z^*\|. \qquad (3.14)$$

Let $\bar\delta = \min(\delta_1, \delta)$. Then when $\|z^k - z^*\| \le \bar\delta$, we have

$$\begin{aligned} \|z^{k+1} - z^*\| &= \|z^k - B_{(i)k}^{-1} H^{(i)}(z^k) - z^*\| \\ &\le \|B_{(i)k}^{-1}\| \, \|H^{(i)}(z^k) - H^{(i)}(z^*) - B_{(i)k}(z^k - z^*)\| \\ &\le \|B_{(i)k}^{-1}\| \big[ \|H^{(i)}(z^k) - H^{(i)}(z^*) - W_{(i)k}(z^k - z^*)\| + \|(B_{(i)k} - W_{(i)k})(z^k - z^*)\| \big]. \end{aligned} \qquad (3.15)$$

Substituting (3.11)-(3.12) and (3.14) into (3.15) gives

$$\|z^{k+1} - z^*\| \le 2C \Big( \frac{1}{8C} + \frac{1}{8C} \Big) \|z^k - z^*\| = \frac{1}{2} \|z^k - z^*\|. \qquad (3.16)$$

This proves that GANM is well defined and Q-linearly converges to $z^*$ in a neighborhood of $z^*$. Furthermore, if (3.8) holds, then by (3.10)-(3.11), (3.13) and (3.15) we have

$$\begin{aligned} \|z^{k+1} - z^*\| &\le 2C \big[ \|H^{(i)}(z^k) - H^{(i)}(z^*) - W_{(i)k}(z^k - z^*)\| \\ &\qquad + \|(B_{(i)k} - W_{(i)k})(z^{k+1} - z^k)\| + \|(B_{(i)k} - W_{(i)k})(z^{k+1} - z^*)\| \big] \\ &\le 2C \Big[ o(\|z^k - z^*\|) + \|(V_k - G_k)(x^{k+1} - x^k)\| + \frac{1}{8C} \|z^{k+1} - z^*\| \Big] \\ &\le o(\|z^k - z^*\|) + o(\|z^{k+1} - z^k\|) + \frac{1}{4} \|z^{k+1} - z^*\|. \end{aligned} \qquad (3.17)$$

This, together with the Q-linear convergence of $\{z^k\}$, yields

$$\|z^{k+1} - z^*\| = o(\|z^k - z^*\|), \qquad (3.18)$$

i.e., the convergence of GANM is Q-superlinear. The proof of (3.9) is similar to that of Theorem 3.1 of Qi (Ref. 16). $\Box$
Remark 3.2. For unconstrained optimization problems ($f \in C^2$), condition (3.8) is known as the Dennis-Moré condition (see, e.g., Dennis and Schnabel (Ref. 3)); for nonlinear programming ($C^2$ optimization problems) with equality constraints, a generalization of this condition due to Boggs, Tolle and Wang (Ref. 31) is widely used.

Corollary 3.1. Assume that the conditions of Theorem 3.1 hold. Then there exists a positive number $\varepsilon > 0$ such that, when there exists $V_k \in V_{x^k}(z^k)$ with

$$\|V_k - G_k\| \le \varepsilon \quad \text{for all } k, \qquad (3.19)$$

the approximate Newton method described above is well defined and Q-linearly converges to $z^*$ in a neighborhood of $z^*$. If furthermore (3.8) holds, then the convergence is Q-superlinear. If in the latter case $H(z^k) \ne 0$, then (3.9) holds.

Proof. It suffices to prove that the approximate Newton method is a special case of GANM in a neighborhood of $z^*$. Choose a positive number $\delta_2 > 0$ ($\delta_2 \le \bar\delta/3$, where $\bar\delta$ is defined in the proof of Theorem 3.1) such that when $z, z^k \in B(z^*; 3\delta_2) = \{ z \mid \|z - z^*\| \le 3\delta_2 \}$, we have

$$\begin{aligned} -g_i(x^k) - \nabla g_i(x^k)^T (x - x^k) &< u_i \quad \text{if } i \in \alpha(z^*), \\ -g_i(x^k) - \nabla g_i(x^k)^T (x - x^k) &> u_i \quad \text{if } i \in \gamma(z^*). \end{aligned} \qquad (3.20)$$

So when $z^k \in B(z^*; 3\delta_2)$ we have

$$\alpha(z^*) \subseteq \alpha(z^k), \quad \gamma(z^*) \subseteq \gamma(z^k) \quad \text{and} \quad \beta(z^k) \subseteq \beta(z^*). \qquad (3.21)$$

The first-order Kuhn-Tucker conditions of the quadratic subproblem $Q_k$ can be written as

$$H^k(z) = 0, \qquad (3.22)$$

where $H^k(z)$ is defined as

$$H^k(z) = \begin{pmatrix} \nabla f(x^k) + \nabla g(x^k)u + \nabla h(x^k)v + G_k(x - x^k) \\ \min(u, -g(x^k) - \nabla g(x^k)^T (x - x^k)) \\ -h(x^k) - \nabla h(x^k)^T (x - x^k) \end{pmatrix}. \qquad (3.23)$$

We now show that (3.22) has a solution if $\delta_2$ is sufficiently small. Similarly to the proof of Theorem 4.1 of Robinson (Ref. 23), we can easily conclude that the matrix

$$A(z^*) = \begin{pmatrix} V_* & \nabla g_{\alpha(z^*)}(x^*) & \nabla h(x^*) \\ -\nabla g_{\alpha(z^*)}(x^*)^T & 0 & 0 \\ -\nabla h(x^*)^T & 0 & 0 \end{pmatrix}$$
is nonsingular, and the Schur complement

$$B(z^*) = C(z^*)^T A(z^*)^{-1} C(z^*)$$

is a P-matrix (i.e., a matrix with positive principal minors), where $V_* \in V_{x^*}(z^*)$ and

$$C(z^*) = \begin{pmatrix} \nabla g_{\beta(z^*)}(x^*) \\ 0 \\ 0 \end{pmatrix}.$$

From the definitions of $M(z, H_1)$ and $V_x(z)$, for every $\varepsilon > 0$ we can prove that there exists $\delta_3 > 0$ such that when $z^k \in B(z^*; \delta_3) = \{ z \mid \|z - z^*\| \le \delta_3 \}$, we have

$$V_{x^k}(z^k) \subseteq V_{x^*}(z^*) + \varepsilon B(0;1), \qquad (3.24)$$

where $B(0;1) = \{ Z \in R^{n \times n} \mid \|Z\| \le 1 \}$. So we may restrict $\delta_2$ and $\varepsilon$ so that for any $z^k \in B(z^*; \delta_2) = \{ z \mid \|z - z^*\| \le \delta_2 \}$, the matrix

$$A(z^k) = \begin{pmatrix} G_k & \nabla g_{\alpha(z^*)}(x^k) & \nabla h(x^k) \\ -\nabla g_{\alpha(z^*)}(x^k)^T & 0 & 0 \\ -\nabla h(x^k)^T & 0 & 0 \end{pmatrix}$$

is nonsingular, and the Schur complement

$$B(z^k) = C(z^k)^T A(z^k)^{-1} C(z^k) \qquad (3.25)$$

is a P-matrix, where

$$C(z^k) = \begin{pmatrix} \nabla g_{\beta(z^*)}(x^k) \\ 0 \\ 0 \end{pmatrix}.$$

Note that in the matrix $A(z^k)$ the index sets $\alpha$ and $\beta$ are defined at $z^*$ but the various gradients are evaluated at $x^k$.

In order to consider the solvability of the system (3.22), we consider the solvability of the following system:

$$\begin{aligned} F_{u^k,v^k}(x^k) + G_k d^x + \nabla g(x^k) d^u + \nabla h(x^k) d^v &= 0, \\ -g_i(x^k) - \nabla g_i(x^k)^T d^x &= 0 \quad \text{for } i \in \alpha(z^*), \\ \min(u_i^k + d_i^u, -g_i(x^k) - \nabla g_i(x^k)^T d^x) &= 0 \quad \text{for } i \in \beta(z^*), \\ u_i^k + d_i^u &= 0 \quad \text{for } i \in \gamma(z^*), \\ -h(x^k) - \nabla h(x^k)^T d^x &= 0. \end{aligned} \qquad (3.26)$$
The components $d_i^u$ are explicit for $i \in \gamma(z^*)$. Simplifying these equations, we deduce that the remaining components of the vector $d = (d^x, d^u, d^v) \in R^n \times R^p \times R^q$ can be obtained by solving the mixed linear complementarity problem

$$\begin{aligned} q(z^k) + A(z^k) w + C(z^k) d_\beta^u &= 0, \\ -g_\beta(x^k) - C(z^k)^T w \ge 0, \qquad u_\beta^k + d_\beta^u &\ge 0, \\ [-g_\beta(x^k) - C(z^k)^T w]^T (u_\beta^k + d_\beta^u) &= 0, \end{aligned} \qquad (3.27)$$

where $w = (d^x, d_\alpha^u, d^v)$,

$$q(z^k) = (q_1(z^k), -g_\alpha(x^k), -h(x^k)), \qquad q_1(z^k) = F_{u^k,v^k}(x^k) - \nabla g_\gamma(x^k) u_\gamma^k,$$

and $\alpha$, $\beta$ and $\gamma$ denote respectively the index sets $\alpha(z^*)$, $\beta(z^*)$ and $\gamma(z^*)$. From linear complementarity theory (see, e.g., Murty (Ref. 11)), a sufficient condition for the system (3.27) to have a unique solution is that (i) the matrix $A(z^k)$ is nonsingular and (ii) the Schur complement $B(z^k) = C(z^k)^T A(z^k)^{-1} C(z^k)$ is a P-matrix. Since we have proved that these two conditions are satisfied, system (3.27) has a unique solution, and then system (3.26) has a unique solution when $z^k \in B(z^*; \delta_2)$. We denote this solution by $d^k = (d^{xk}, d^{uk}, d^{vk}) \in R^n \times R^p \times R^q$.

It is easy to prove that for each $k$ there exists $i \in I^\beta$ such that

$$H^{(i)k}(z^k) + B_{(i)k} d^k = 0. \qquad (3.28)$$

From the proof of Theorem 3.1, we know that

$$\|z^k + d^k - z^*\| \le \frac{1}{2} \|z^k - z^*\|. \qquad (3.29)$$

Let $z^{k+1} = z^k + d^k$. Then $z^{k+1} \in B(z^*; \delta_2)$ if $z^k \in B(z^*; \delta_2)$. We now prove that $H^k(z^{k+1}) = 0$, which means that (3.22) has a solution. When $z^k, z^{k+1} \in B(z^*; 3\delta_2)$, we have

$$\min(u_i^{k+1}, -g_i(x^k) - \nabla g_i(x^k)^T (x^{k+1} - x^k)) = \begin{cases} -g_i(x^k) - \nabla g_i(x^k)^T (x^{k+1} - x^k) & \text{if } i \in \alpha(z^*), \\ u_i^{k+1} & \text{if } i \in \gamma(z^*). \end{cases}$$
Thus if $z^k \in B(z^*; \delta_2)$, then

$$H^k(z^{k+1}) = \begin{pmatrix} F_{u^k,v^k}(x^k) + \nabla g(x^k) d^{uk} + \nabla h(x^k) d^{vk} + G_k d^{xk} \\ \min(u^k + d^{uk}, -g(x^k) - \nabla g(x^k)^T d^{xk}) \\ -h(x^k) - \nabla h(x^k)^T d^{xk} \end{pmatrix} = 0,$$

which means that the system $H^k(z) = 0$ has a solution $z^{k+1}$ in $B(z^*; \delta_2)$, i.e., $z^{k+1}$ is a Kuhn-Tucker point of (1.2).

Suppose that $\hat z^{k+1} \in B(z^*; 3\delta_2)$ is an arbitrary solution of $H^k(z) = 0$. Since $\hat z^{k+1} \in B(z^*; 3\delta_2)$,

$$\min(\hat u_i^{k+1}, -g_i(x^k) - \nabla g_i(x^k)^T (\hat x^{k+1} - x^k)) = \begin{cases} -g_i(x^k) - \nabla g_i(x^k)^T (\hat x^{k+1} - x^k) & \text{if } i \in \alpha(z^*), \\ \hat u_i^{k+1} & \text{if } i \in \gamma(z^*). \end{cases}$$

Therefore $\hat d^k = \hat z^{k+1} - z^k$ is also a solution of system (3.26). From the uniqueness of the solution of system (3.26), we know that $\hat z^{k+1} = z^{k+1}$, which shows that $z^{k+1}$ is the closest Kuhn-Tucker point to $z^k$ in terms of the distance $\|z^{k+1} - z^k\|$. So there exists $i \in I^\beta$ such that

$$z^{k+1} = z^k - B_{(i)k}^{-1} H^{(i)k}(z^k),$$
which means that the approximate Newton method is a special case of GANM in a neighborhood of $z^*$. We thus complete the proof of Corollary 3.1 by applying Theorem 3.1. $\Box$

Remark 3.3. If we choose $G_k \in V_{x^k}(z^k)$, then (3.7) and (3.8) are satisfied.

4 Some Discussions

In this paper we have considered the local convergence of approximate Newton and SQP methods for LC$^1$ optimization problems without assuming the strict complementarity condition. The globalization technique used in Qi (Ref. 19) can be applied here in a similar way.
GANM is useful for proving the Q-superlinear convergence of approximate Newton or SQP methods, but it cannot be used in practice since we do not know $\alpha(z^*)$, $\beta(z^*)$ and $\gamma(z^*)$. The approximate Newton or SQP methods are widely used, but in each step a quadratic program must be solved. In the following we give a method in which each step requires solving only a system of linear equations. Given $z^0 = (x^0,u^0,v^0) \in R^n \times R^p \times R^q$, for $k = 0, 1, \ldots$,

$$z^{k+1} = z^k - B_k^{-1} H(z^k), \qquad (4.1)$$

where

$$B_k \in \partial_Q H^k(z^k) = \{\nabla L^k(z^k)^T\} \times \partial_b g^k(z^k) \times \{\nabla h^k(z^k)^T\}$$

and

$$\begin{aligned} L^k(z) &= \nabla f(x^k) + \nabla g(x^k)u + \nabla h(x^k)v + G_k(x - x^k), \\ g^k(z) &= \min(u, -g(x^k) - \nabla g(x^k)^T (x - x^k)), \\ h^k(z) &= -h(x^k) - \nabla h(x^k)^T (x - x^k). \end{aligned}$$
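A hedged sketch of an iteration of the form (4.1) on the toy problem $\min \frac{1}{2}\|x\|^2$ subject to $1 - x_1 \le 0$ (the problem data and the tie-breaking rule for the min branches are our own choices; $G_k$ is taken as the exact Hessian, so each step solves one linear system with an element $B_k$ of $\partial_Q H^k(z^k)$):

```python
# Hedged sketch of a (4.1)-style semismooth Newton iteration on H(z) = 0 for
#   min 0.5*||x||^2  s.t.  1 - x1 <= 0   (our own toy data, no equalities).
# The KT point is z* = (x1, x2, u) = (1, 0, 1).
import numpy as np

def H(z):
    x, u = z[:2], z[2]
    # stationarity (x - u*e1), then complementarity min(u, -g) with -g = x1 - 1
    return np.array([x[0] - u, x[1], min(u, x[0] - 1.0)])

def B(z):
    x, u = z[:2], z[2]
    J = np.array([[1.0, 0.0, -1.0],     # Hessian block = I, gradient of g
                  [0.0, 1.0,  0.0],
                  [0.0, 0.0,  0.0]])    # last row: a gradient of min(u, -g)
    if u <= x[0] - 1.0:                  # 'u' branch active (ties -> u branch;
        J[2, 2] = 1.0                    #  either element of d_b min is valid)
    else:                                # '-g' branch active
        J[2, 0] = 1.0
    return J

z = np.array([2.0, 0.5, 0.3])
for _ in range(5):
    z = z - np.linalg.solve(B(z), H(z))
print(z)   # converges to the KT point (1, 0, 1)
```

On this piecewise linear system the iteration in fact terminates finitely, consistent with the local Q-superlinear theory above.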
It is easy to see that in a neighborhood of the solution $z^*$ of $H(z) = 0$, the above method is a special case of GANM, so convergence properties analogous to those in Theorem 3.1 hold for (4.1).

Acknowledgements. The authors thank the two referees for their valuable comments and suggestions on this paper, and are grateful to Professor L. Qi for his helpful suggestions on nonsmooth equations and related problems.
References

[1] J.-P. Aubin and H. Frankowska, Set-Valued Analysis, Birkhauser, Boston, 1990.

[2] F. H. Clarke, Optimization and Nonsmooth Analysis, Wiley, New York, 1983.

[3] J. E. Dennis and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-Hall, Englewood Cliffs, New Jersey, 1983.

[4] U. C. Garcia Palomares and O. L. Mangasarian, Superlinearly convergent quasi-Newton algorithms for nonlinearly constrained optimization problems, Mathematical Programming 11 (1976) 1-13.

[5] S. P. Han, Superlinearly convergent variable metric algorithms for general nonlinear programming problems, Mathematical Programming 11 (1976) 263-282.

[6] J. B. Hiriart-Urruty, J. J. Strodiot and V. H. Nguyen, Generalized Hessian matrix and second-order optimality conditions for problems with C^{1,1} data, Applied Mathematics and Optimization 11 (1984) 43-56.

[7] N. H. Josephy, Newton's method for generalized equations, Technical Summary Report 1965, Mathematics Research Center, University of Wisconsin-Madison, 1979.

[8] N. H. Josephy, Quasi-Newton methods for generalized equations, Technical Summary Report 1966, Mathematics Research Center, University of Wisconsin-Madison, 1979.

[9] G. P. McCormick, Penalty function versus non-penalty function methods for constrained nonlinear programming problems, Mathematical Programming 1 (1971) 217-238.

[10] R. Mifflin, Semismooth and semiconvex functions in constrained optimization, SIAM Journal on Control and Optimization 15 (1977) 957-972.

[11] K. G. Murty, Linear Complementarity, Linear and Nonlinear Programming, Heldermann-Verlag, Berlin, 1988.

[12] J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, 1970.

[13] J.-S. Pang, S. P. Han and N. Rangaraj, Minimization of locally Lipschitzian functions, SIAM Journal on Optimization 1 (1991) 57-82.

[14] J.-S. Pang, A B-differentiable equation-based, globally and locally quadratically convergent algorithm for nonlinear programs, complementarity and variational inequality problems, Mathematical Programming 51 (1991) 101-131.

[15] J.-S. Pang and L. Qi, Nonsmooth equations: motivation and algorithms, SIAM Journal on Optimization 3 (1993) 443-465.

[16] L. Qi, Convergence analysis of some algorithms for solving nonsmooth equations, Mathematics of Operations Research 18 (1993) 227-244.

[17] L. Qi, LC^1 functions and LC^1 optimization problems, Applied Mathematics Preprint 91/21, School of Mathematics, The University of New South Wales, Sydney, Australia, 1991.

[18] L. Qi and J. Sun, A nonsmooth version of Newton's method, Mathematical Programming 58 (1993) 353-368.

[19] L. Qi, Superlinearly convergent approximate Newton methods for LC^1 optimization problems, Mathematical Programming 64 (1994) 277-294.

[20] L. Qi and R. Womersley, An SQP algorithm for solving extended linear-quadratic problems in stochastic programming, Applied Mathematics Preprint 92/23, School of Mathematics, The University of New South Wales, Sydney, Australia, 1992.

[21] S. M. Robinson, A quadratically convergent algorithm for general nonlinear programming problems, Mathematical Programming 3 (1972) 145-156.

[22] S. M. Robinson, Perturbed Kuhn-Tucker points and rates of convergence for a class of nonlinear programming algorithms, Mathematical Programming 7 (1974) 1-16.

[23] S. M. Robinson, Strongly regular generalized equations, Mathematics of Operations Research 5 (1980) 43-62.

[24] R. T. Rockafellar, Computational schemes for solving large-scale problems in extended linear-quadratic programming, Mathematical Programming 48 (1990) 447-474.

[25] R. T. Rockafellar and R. J.-B. Wets, Generalized linear-quadratic problems of deterministic and stochastic optimal control in discrete time, SIAM Journal on Control and Optimization 28 (1990) 810-822.

[26] N. Z. Shor, A class of almost-differentiable functions and a minimization method for functions of this class, Kibernetika 4 (1972) 65-70.

[27] D. Sun and J. Han, Newton and quasi-Newton methods for a class of nonsmooth equations and related problems, Technical Report No. 026, Institute of Applied Mathematics, Academia Sinica, Beijing, China, 1994.

[28] C. Zhu and R. T. Rockafellar, Primal-dual projected gradient algorithms for extended linear-quadratic programming, to appear in SIAM Journal on Optimization.

[29] M. Lescrenier, Convergence of trust region algorithms for optimization with bounds when strict complementarity does not hold, SIAM Journal on Numerical Analysis 28 (1991) 476-495.

[30] A. R. Conn, N. I. M. Gould and Ph. L. Toint, Global convergence of a class of trust region algorithms for optimization with simple bounds, SIAM Journal on Numerical Analysis 25 (1988) 433-460; erratum in the same journal 26 (1989) 764.

[31] P. T. Boggs, J. W. Tolle and P. Wang, On the local convergence of quasi-Newton methods for constrained optimization, SIAM Journal on Control and Optimization 20 (1982) 161-171.
Second-Order Directional Derivatives
159
Recent Advances in Nonsmooth Optimization, pp. 159-171 Eds. D.-Z. Du, L. Qi and R.S. Womersley ©1995 World Scientific Publishing Co Pte Ltd
On Second-Order Directional Derivatives in Nonsmooth Optimization L. R. Huang Department of Mathematics, South China Normal University, Guangzhou, China K. F. Ng Department of Mathematics, The Chinese University of Hong Kong, Hong Kong
Abstract

Some relationships between the second-order directional derivatives of Ben-Tal and Zowe and that of Chaney are established. These derivatives are used to provide optimality conditions for nonsmooth optimization problems with and without constraints.
1 Introduction

Playing a major role in second-order nonsmooth optimization problems, various generalized second-order directional derivatives have been introduced, among which are $D^2 f(x;u,v)$ of Dem'yanov and Pevnyi [13] and Ben-Tal and Zowe [3], and $f''(x;x^*,u)$ of Chaney [6], where $f$ is a locally Lipschitz real-valued function on a normed space $X$. In this paper, we survey some of the results obtained in [4, 14, 15, 16] and give further new results. In particular, in Section 4 we provide a new set of sufficient conditions for a minimum point in nonsmooth optimization problems with and without constraints.
2 Definitions

Though most results can be generalized, we assume for simplicity that $X = R^n$; let $W$ be an open set in $X$ and $f$ a real-valued locally Lipschitz function. The lower Dini directional derivative of $f$ at $x \in X$ in the direction $u \in X$ is denoted by $D_- f(x;u)$
L. R. Huang and K. F. Ng
160 and is defined by D-f(x;u)
and is defined by

$$D_- f(x;u) := \liminf_{t \downarrow 0} \frac{1}{t} \{ f(x + tu) - f(x) \}.$$
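The lower and upper Dini derivatives can genuinely differ; a small numerical sketch (our own example, not from the paper) with $f(x) = x \sin(\log(1/|x|))$, $f(0) = 0$, at $x = 0$ in the direction $u = 1$: the difference quotient $\big(f(tu) - f(0)\big)/t = \sin(\log(1/t))$ oscillates between $-1$ and $1$, so $D_- f(0;1) = -1$ while $D_+ f(0;1) = 1$.

```python
# Numerical sketch (our own example): the first-order difference quotient of
# f(x) = x*sin(log(1/|x|)), f(0) = 0, at x = 0, u = 1, oscillates in [-1, 1],
# so the lower and upper Dini directional derivatives are -1 and +1.
import math

def quotient(t):
    x = t * 1.0                          # step t in the direction u = 1
    return (x * math.sin(math.log(1.0 / abs(x)))) / t

# sample t on a fine logarithmic grid tending to 0+
qs = [quotient(10.0 ** (-k / 50.0)) for k in range(50, 5000)]
print(min(qs), max(qs))   # close to -1 and +1 respectively
```

The sampled infimum and supremum approximate $D_- f(0;1)$ and $D_+ f(0;1)$ under the stated choice of grid.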
The upper Dini directional derivative $D_+ f(x;u)$ is similarly defined (with the lower limit replaced by the upper one). If $D_- f(x;u) = D_+ f(x;u)$, then the common value is denoted by $f'(x;u)$. As in [2], Clarke's generalized (upper) directional derivative and subdifferential are denoted by $f^\circ(x;u)$ and $\partial f(x)$ respectively. In the case when $f'(x;u)$ exists, the following definition was introduced by Ben-Tal and Zowe [3].

Definition 2.1 Let $x \in W$ and $u, v \in X$. The Ben-Tal/Zowe lower and upper generalized second-order directional derivatives of $f$ at $x$ in the directions $u$ and $v$ are defined respectively by

$$D_-^2 f(x;u,v) := \liminf_{t \downarrow 0} \frac{1}{t^2} \{ f(x + tu + t^2 v) - f(x) - t D_+ f(x;u) \} \qquad (2.1)$$

and

$$D_+^2 f(x;u,v) := \limsup_{t \downarrow 0} \frac{1}{t^2} \{ f(x + tu + t^2 v) - f(x) - t D_- f(x;u) \}. \qquad (2.2)$$

It is easy to see that

$$-\infty \le D_-^2 f(x;u,v) \le D_+^2 f(x;u,v) \le +\infty \qquad (2.3)$$

and

$$-D_-^2 f(x;u,v) = D_+^2 (-f)(x;u,v). \qquad (2.4)$$

Similar but different notions have appeared in the literature, e.g. in Penot [20] and Studniarski [24]; these authors use $D_+ f$ and $D_- f$ in the above definitions in place of $D_- f$ and $D_+ f$ respectively. In our approach it is true [15] that if $D^2 f(x;u,v)$ exists and is finite, then the first-order derivative $f'(x;u)$ also exists. This property is not shared by the approach of [20], [24] (see Example 3.6 in [24]).

A sequence $(x_k)$ is said to converge to $x$ in the direction $u$, denoted by $(x_k) \to_u x$, if $(x_k)$ converges to $x$, $x_k \ne x$ for every $k$, and the sequence $\big( \|u\| \frac{x_k - x}{\|x_k - x\|} \big)$ converges to $u$. Chaney's subdifferential [5] of $f$ at $x$ in the direction $u$ is denoted by $\partial_u f(x)$ and is defined to be the set of all $x^*$ for each of which there exist sequences $(x_k)$ and $x_k^* \in \partial f(x_k)$ such that $(x_k) \to_u x$ and $(x_k^*) \to x^*$. Because we have assumed that $X = R^n$, $\partial_u f(x)$ is nonempty and $\partial_u f(x) \subseteq \partial f(x)$, since the multifunction $\partial f$ is upper semicontinuous [2]. The following generalized lower and upper second-order directional derivatives are also due to Chaney [5].

Definition 2.2 Let $x \in W$ and $u \in X$. Suppose that $x^* \in \partial_u f(x)$. Then $f''_-(x;x^*,u)$ is defined to be the infimum of all numbers

$$\liminf_{k \to \infty} \frac{1}{t_k^2} \{ f(x_k) - f(x) - x^*(x_k - x) \}$$

taken over all triples of sequences $(x_k)$, $(x_k^*)$ and $(t_k)$ for which
(a) $t_k > 0$ for each $k$ and $(x_k)$ converges to $x$,

(b) $(t_k)$ converges to $0$ and $\big( \frac{x_k - x}{t_k} \big)$ converges to $u$,

(c) $(x_k^*)$ converges to $x^*$ with $x_k^* \in \partial f(x_k)$ for each $k$.

Similarly, $f''_+(x;x^*,u)$ is defined to be the supremum of all numbers

$$\limsup_{k \to \infty} \frac{1}{t_k^2} \{ f(x_k) - f(x) - x^*(x_k - x) \}$$

taken over all triples of sequences $(x_k)$, $(x_k^*)$ and $(t_k)$ for which (a), (b) and (c) above all hold.

Clearly, $-\infty \le f''_-(x;x^*,u) \le f''_+(x;x^*,u) \le +\infty$. Further, if $f''_-(x;x^*,u) = f''_+(x;x^*,u)$, then we denote this common value by $f''(x;x^*,u)$ and call it Chaney's generalized second-order directional derivative of $f$ at $x$ and $x^*$ in the direction $u$.

Remark. By (b), we see that if $u \ne 0$, then

$$\frac{x_k - x}{\|x_k - x\|} = \frac{(x_k - x)/t_k}{\|x_k - x\|/t_k} \to \frac{u}{\|u\|},$$

that is, $(x_k)$ converges to $x$ in the direction $u$. Thus,

$$\liminf_{k \to \infty} \frac{1}{t_k^2} \{ f(x_k) - f(x) - x^*(x_k - x) \} = \liminf_{k \to \infty} \frac{\|x_k - x\|^2}{t_k^2} \cdot \frac{1}{\|x_k - x\|^2} \{ f(x_k) - f(x) - x^*(x_k - x) \}$$
$$= \|u\|^2 \liminf_{k \to \infty} \frac{1}{\|x_k - x\|^2} \{ f(x_k) - f(x) - x^*(x_k - x) \}.$$

Hence, $f''_-(x;x^*,u)$ equals the infimum of all numbers

$$\|u\|^2 \liminf_{k \to \infty} \frac{1}{\|x_k - x\|^2} \{ f(x_k) - f(x) - x^*(x_k - x) \}$$

taken over the set of all sequences $(x_k)$ such that

(a') $(x_k)$ converges to $x$ in the direction $u$, and

(b') there exists a sequence $x_k^* \in \partial f(x_k)$ converging to $x^*$.
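For a $C^2$ function the quotient defining (2.1)-(2.2) has a closed-form limit, $\frac{1}{2}\langle u, \nabla^2 f(x) u\rangle + \langle \nabla f(x), v\rangle$, consistent with the smooth-case formulas recalled in Section 3. A small numerical sketch with $f(x) = x_1^2 + x_2^2$ (our own example, not from the paper):

```python
# Numerical sketch (our own example): for the smooth f(x) = x1^2 + x2^2, the
# Ben-Tal/Zowe quotient (f(x + t*u + t^2*v) - f(x) - t*f'(x;u)) / t^2 tends to
# <u, Hess f(x) u>/2 + <grad f(x), v> = |u|^2 + <2x, v>.
def f(x):
    return x[0] ** 2 + x[1] ** 2

def btz_quotient(x, u, v, t):
    y = [x[i] + t * u[i] + t * t * v[i] for i in range(2)]
    fprime = 2 * (x[0] * u[0] + x[1] * u[1])    # f'(x; u) = <grad f(x), u>
    return (f(y) - f(x) - t * fprime) / (t * t)

x, u, v = [1.0, 0.0], [0.0, 1.0], [3.0, 0.0]
expected = (u[0] ** 2 + u[1] ** 2) + 2 * (x[0] * v[0] + x[1] * v[1])  # = 7.0
print(btz_quotient(x, u, v, 1e-4), expected)    # both close to 7.0
```

Since the quotient converges, $D_-^2 f = D_+^2 f$ here, as expected for smooth $f$.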
3 Relationships Between the Two Second-Order Derivatives and Application to Convexity
In the case where f is C² it is well known and easy to verify that the second-order derivatives of Chaney and of Ben-Tal/Zowe are given by

\[ f''(x;x^*,u) = \tfrac12\langle u, \nabla^2 f(x)u\rangle, \qquad D^2 f(x;u,w) = \tfrac12\langle u, \nabla^2 f(x)u\rangle + x^*(w), \]

where x* = ∇f(x); this implies the following relationship between the two derivatives:

\[ f''(x;x^*,u) = D^2 f(x;u,w) - x^*(w) \quad \forall w. \tag{3.1} \]

This relationship persists in some nonsmooth cases, e.g., if ∂_u f(x) = {x*} and f''(x;x*,u) exists (see [15, Corollary 4.2]). Another instance is provided by Theorem 3.2 below. But (3.1) may fail to hold in general: the right-hand side of (3.1) need not be a constant function of w; indeed, it may be a convex and non-affine function of w. Based on the work of Ben-Tal and Zowe [3], the following weaker relation was established by Chaney [6]:

\[ f''(x;x^*,u) = \inf\{ D^2_- f(x;u,v) - x^*(v) : v \in \mathbb{R}^n \} \quad (\in \mathbb{R}) \tag{3.2} \]
for a very special class of nonsmooth functions. The following Theorem 3.1 implies that (3.2) in fact holds in general, provided that both sides of (3.2) exist in ℝ.

Theorem 3.1 Let x, u, x* ∈ ℝⁿ and suppose that x* ∈ ∂_u f(x) and x*(u) = D₊f(x;u). Then

\[ f''_-(x;x^*,u) \le \inf\{ D^2_- f(x;u,v) - x^*(v) : v \in \mathbb{R}^n \} \le f''_+(x;x^*,u), \]

provided that the infimum is finite. Consequently, if f satisfies the additional condition that f''(x;x*,u) exists, then

\[ f''(x;x^*,u) = \inf\{ D^2_- f(x;u,v) - x^*(v) : v \in \mathbb{R}^n \}. \tag{3.3} \]
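For a C² function both derivatives can be approximated by difference quotients, and relationship (3.1) can then be checked numerically. The sketch below is our own illustration on a quadratic of our choosing; the two quotient normalizations are assumptions chosen to reproduce the closed-form expressions ½⟨u, ∇²f(x)u⟩ and ½⟨u, ∇²f(x)u⟩ + x*(w) above, not definitions taken from the paper.

```python
import numpy as np

# Difference-quotient approximations (our normalization choices):
#   Chaney:        [f(x + t u) - f(x) - t x*.u] / t^2
#   Ben-Tal/Zowe:  [f(x + t u + t^2 w) - f(x) - t f'(x;u)] / t^2
def chaney(f, grad, x, u, t=1e-5):
    xs = grad(x)                      # x* = grad f(x) in the smooth case
    return (f(x + t * u) - f(x) - t * (xs @ u)) / t**2

def ben_tal_zowe(f, grad, x, u, w, t=1e-4):
    return (f(x + t * u + t**2 * w) - f(x) - t * (grad(x) @ u)) / t**2

H = np.array([[2.0, 0.5], [0.5, 3.0]])
b = np.array([1.0, -2.0])
f = lambda x: 0.5 * x @ H @ x + b @ x
grad = lambda x: H @ x + b

x = np.array([0.3, -0.7])
u = np.array([1.0, 2.0])
lhs = chaney(f, grad, x, u)           # ~ (1/2) u^T H u = 8.0 here
for w in (np.zeros(2), np.array([1.0, -1.0]), np.array([-2.0, 0.5])):
    rhs = ben_tal_zowe(f, grad, x, u, w) - grad(x) @ w
    print(abs(lhs - rhs) < 1e-2)      # (3.1): the right-hand side is independent of w
```

Prints True for each w, illustrating that in the smooth case D²f(x;u,w) − x*(w) is constant in w, which is exactly what can fail for nonsmooth f.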
Remark. In the terminology of convex analysis, conclusion (3.3) simply says that f''(x;x*,u) equals the value at x* of the conjugate function of D²₋f(x;u,·). This result is proved in [15]. In a special case it was established by Chaney [6], who considered f of the form f(x) = Σᵢ gᵢ(hᵢ(x)), where each hᵢ is a sup-type function and each gᵢ is C² with gᵢ′(hᵢ(x)) > 0.

Theorem 3.2 Let x, u, x* ∈ ℝⁿ with x* ∈ ∂_u f(x), and suppose that f''(x;x*,u) exists. If x*(u) = f′(x;u), and if

\[ \inf\{ D^2_- f(x;u,w) - x^*(w) : w \in \mathbb{R}^n \} \quad\text{and}\quad \sup\{ D^2_+ f(x;u,w) - x^*(w) : w \in \mathbb{R}^n \} \]

are both finite, then D²f(x;u,w) exists and, for all w ∈ ℝⁿ,

\[ f''(x;x^*,u) = D^2 f(x;u,w) - x^*(w). \]

The following result from [16] shows that, although D²₊f(·;u,0) and D²₋f(·;u,0) do not coincide in general, under reasonable conditions they have the same lower bounds on any open subset of ℝⁿ.

Theorem 3.3 Suppose that f is regular on ℝⁿ. Let W be an open subset of ℝⁿ and u ∈ ℝⁿ. Then

\[ \inf\{ D^2_+ f(x;u,0) : x \in W \} = \inf\{ D^2_- f(x;u,0) : x \in W \} \]

if the left-hand side is finite. Under the same conditions we also have, for each x, that

\[ \liminf_{y\to x} D^2_+ f(y;u,0) = \liminf_{y\to x} D^2_- f(y;u,0) \quad\text{and}\quad \liminf_{y\to x,\; v\to u} D^2_+ f(y;v,0) = \liminf_{y\to x,\; v\to u} D^2_- f(y;v,0). \]

Remark. As an application of Theorem 3.3 we note the following corollary, which provides a sufficient condition for the existence of D²f(x;u,0).

Corollary 3.4 If there exists x₀ ∈ W such that inf{ D²₊f(x;u,0) : x ∈ W } = D²₊f(x₀;u,0), then D²₊f(x₀;u,0) = D²₋f(x₀;u,0).

Indeed, by Theorem 3.3 the given assumption implies that D²₊f(x₀;u,0) ≤ D²₋f(x;u,0) for all x, and hence the equality holds at x = x₀.

Theorem 3.5 Let f be as in Theorem 3.3. Then f is convex on W if and only if D²₊f(x;u,0) ≥ 0 for each x ∈ W and each unit vector u. For detailed proofs, as well as the corresponding results in terms of Chaney's derivative, we refer the reader to [16].
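Theorem 3.5 suggests a simple numerical convexity probe: sample points of W and check the sign of a difference-quotient approximation of D²₊f(x;u,0). The sketch below is our own heuristic illustration (finite step size, finite grid, and a quotient normalization chosen to match the conventions used here); it flags x ↦ x⁴ as convex on (−1, 1) and x ↦ x³ as non-convex there.

```python
# Probe D^2_+ f(x; u, 0) ~ [f(x + t*u) - f(x) - t f'(x;u)] / t^2 on a grid.
# By Theorem 3.5, a regular f is convex on W iff this is >= 0 for all
# x in W and unit u; with finite t and a finite grid this is only a heuristic.
def d2_plus(f, df, x, u, t=1e-4):
    return (f(x + t * u) - f(x) - t * df(x) * u) / t**2

def looks_convex(f, df, grid):
    return all(d2_plus(f, df, x, u) >= -1e-6 for x in grid for u in (-1.0, 1.0))

grid = [i / 10.0 for i in range(-9, 10)]          # sample of W = (-1, 1)

quartic = lambda x: x**4                           # convex on R
print(looks_convex(quartic, lambda x: 4 * x**3, grid))   # -> True

cubic = lambda x: x**3                             # not convex on (-1, 1)
print(looks_convex(cubic, lambda x: 3 * x**2, grid))     # -> False
```

For the cubic the quotient is approximately 3x, which is negative at the sampled x < 0, so the probe correctly rejects convexity.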
4  Second-Order Optimality Conditions
Let f, g₁, …, g_{m+p} be real-valued locally Lipschitz functions on an open set W in ℝⁿ. We consider the following optimization problems P and P_c, without and with constraints respectively:

\[ (\mathcal{P}) \qquad \text{minimize } f(x), \quad x \in W, \]

and

\[ (\mathcal{P}_c) \qquad \text{minimize } f(x), \quad x \in W, \quad \text{subject to}\quad g_i(x) \le 0 \ \text{for } i \in \{1,2,\dots,m\}, \qquad g_j(x) = 0 \ \text{for } j \in \{m+1,\dots,m+p\}. \]

For any subset S ⊆ W and x₀ ∈ S, we use K_S(x₀) to denote the contingent cone [1] of S at x₀. It is well known and easily verified that if x₀ ∈ S is a local minimum point of f on S, then D₋f(x₀;·) ≥ 0 on K_S(x₀). The following result, which roughly deals with a converse situation, was established by Ioffe [17] for the special case S = W and by the authors for the general case [14].

Lemma 4.1 Let S be a subset of ℝⁿ, x₀ ∈ S and f a locally Lipschitz function from S into ℝ. Suppose that D₋f(x₀;u) ≥ 0 for every u ∈ K_S(x₀). Then for any ε > 0 there exists δ > 0 such that f(x₀) < f(x) + ε‖x − x₀‖ for any x ∈ S with x ≠ x₀ and ‖x − x₀‖ < δ; thus x₀ is a strict local minimum point of F_ε on S, where F_ε is defined by F_ε(x) := f(x) + ε‖x − x₀‖.

By virtue of this lemma and Ekeland's variational principle [1], one can show (cf. [14]):

Theorem 4.2 Let f : W → ℝ be a locally Lipschitz function such that D₋f(x₀;·) ≥ 0 on ℝⁿ. If u ∈ ℝⁿ is such that D₋f(x₀;u) = 0, then 0 ∈ ∂_u f(x₀).

More generally, we have

Theorem 4.3 Let f : W → ℝ be locally Lipschitz near x, with x ∈ W. Then D₋f(x;·) is continuous, and for any unit vector u in ℝⁿ with D₋f(x;u) = min_{‖v‖=1} D₋f(x;v) one has

\[ 0 \in \partial_u f(x) + D_- f(x;u)\, B_1, \]

where B₁ denotes the unit ball in ℝⁿ.

Proof. That D₋f(x;·) is a continuous (finite) real-valued function follows from the Lipschitz property of f. By compactness it follows that there exists a unit vector u such that D₋f(x;u) = min_{‖v‖=1} D₋f(x;v). Fix any such u and let F : ℝⁿ → ℝ be defined by F(y) := f(y) − D₋f(x;u)‖y − x‖.
Then it is easy to verify that

\[ D_- F(x;v) = D_- f(x;v) - D_- f(x;u) \ge 0 \]

for all v with ‖v‖ = 1 and, in particular,

\[ D_- F(x;u) = D_- f(x;u) - D_- f(x;u) = 0. \]

It follows from Theorem 4.2 that 0 ∈ ∂_u F(x): there exist sequences (t_k), (z_k) and (z*_k) with t_k ↓ 0, (z_k − x)/t_k → u and z*_k ∈ ∂F(z_k) convergent to 0. Since, by [2, Proposition 2.3.3],

\[ \partial F(z_k) \subseteq \partial f(z_k) + D_- f(x;u)\,\partial(\|\cdot - x\|)(z_k) \]

and since ∂(‖· − x‖)(z_k) ⊆ B₁, we may write z*_k = x*_k + D₋f(x;u) y*_k with x*_k ∈ ∂f(z_k) and y*_k ∈ B₁. Since the subdifferential multifunction takes values locally in a compact set [2, Proposition 2.1.2], by passing to subsequences if necessary we can assume that x*_k → x* and y*_k → y* for some x* and y* in ℝⁿ. These imply that x* ∈ ∂_u f(x) and y* ∈ B₁, and so

\[ 0 = x^* + D_- f(x;u)\, y^* \in \partial_u f(x) + D_- f(x;u)\, B_1. \qquad\Box \]

Remark. Let f : W → ℝ be a locally Lipschitz function attaining a local minimum at some point x₀ ∈ W (thus, in particular, D₋f(x₀;·) ≥ 0 on ℝⁿ). Suppose u ∈ ℝⁿ is such that D₋f(x₀;u) = 0; then, by Theorem 4.2, 0 ∈ ∂_u f(x₀). Consequently f''_-(x₀;0,u) is meaningfully defined by Definition 2.2 and, in fact, it is easily seen from the definition and the minimality that f''_-(x₀;0,u) ≥ 0. This result on necessary conditions for f to attain its minimum extends the corresponding result of Chaney [8] in two respects: firstly, we have removed his additional semismoothness assumption on f; secondly, the nonnegativity of f''_-(x₀;0,·) is now established on the whole set {u : D₋f(x₀;u) = 0} of "critical directions", while his result covers only a subset of it. See [8] and [14] for details. The following result plays a key role in our subsequent discussions.

Theorem 4.4 Let f : W → ℝ be a locally Lipschitz function and x₀ ∈ ℝⁿ. Suppose that D₋f(x₀;·) ≥ 0 on ℝⁿ. Let u ∈ ℝⁿ be a unit vector and let (x_k) be a sequence convergent to x₀ in the direction u such that f(x_k) ≤ f(x₀) for each k. Then 0 ∈ ∂_u f(x₀) and f''_-(x₀;0,u) ≤ 0.
Proof. From the assumption on (x_k) one has D₋f(x₀;u) ≤ 0, and in fact equality holds in view of the given condition D₋f(x₀;·) ≥ 0 on ℝⁿ. By Lemma 4.1 (with S = W), this condition also implies that for any (εᵢ) ↓ 0 with εᵢ ∈ (0,1) there exists (δᵢ) ↓ 0 such that F_{εᵢ}(x) := f(x) + εᵢ‖x − x₀‖ attains a minimum on B[x₀, δᵢ] at x₀. Now for each i take kᵢ large enough that x_{kᵢ} ∈ B[x₀, ½δᵢ]. Note that if x ∈ B[x₀, 2‖x_{kᵢ} − x₀‖], then

\[ f(x_{k_i}) \le f(x_0) \le f(x) + \varepsilon_i \|x - x_0\| \le f(x) + 2\varepsilon_i \|x_{k_i} - x_0\|. \]

By Ekeland's variational principle [1] with λ = εᵢ^{1/2}‖x_{kᵢ} − x₀‖/2, we can find z_{kᵢ} ∈ B[x₀, 2‖x_{kᵢ} − x₀‖] such that

(i) ‖z_{kᵢ} − x_{kᵢ}‖ ≤ εᵢ^{1/2}‖x_{kᵢ} − x₀‖/2,
(ii) f(z_{kᵢ}) ≤ f(x_{kᵢ}), and
(iii) f(z_{kᵢ}) ≤ f(x) + 4εᵢ^{1/2}‖x − z_{kᵢ}‖ for all x ∈ B[x₀, 2‖x_{kᵢ} − x₀‖].

From (i) we have x₀ ≠ z_{kᵢ}, and z_{kᵢ} lies in the open ball B(x₀, 2‖x_{kᵢ} − x₀‖). Thus from (iii) we obtain 0 ∈ ∂f(z_{kᵢ}) + 4εᵢ^{1/2}B₁ by results of Clarke [2], and so there exists z*_{kᵢ} ∈ ∂f(z_{kᵢ}) with ‖z*_{kᵢ}‖ ≤ 4εᵢ^{1/2}; therefore (z*_{kᵢ}) → 0 ∈ ∂f(x₀). Note that by (i)

\[ (z_{k_i} - x_{k_i})/\|x_{k_i} - x_0\| \to 0. \]

Then, for tᵢ = ‖x_{kᵢ} − x₀‖,

\[ \frac{z_{k_i} - x_0}{\|z_{k_i} - x_0\|} = \frac{[(z_{k_i} - x_{k_i})/t_i] + [(x_{k_i} - x_0)/t_i]}{\|(z_{k_i} - x_{k_i})/t_i + (x_{k_i} - x_0)/t_i\|} \to u. \]

Hence the properties (a′), (b′) in the Remark after Definition 2.2 are satisfied by the sequences (z_{kᵢ}), (z*_{kᵢ}). It follows from the definition of f''_-, (ii) and the inequality f(x_{kᵢ}) ≤ f(x₀) that

\[ f''_-(x_0;0,u) \le \liminf_{i\to\infty} \frac{1}{\|z_{k_i} - x_0\|^2}\{ f(z_{k_i}) - f(x_0) \} \le 0. \qquad\Box \]
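The interplay of the first- and second-order conditions in the sufficiency theorem that follows can be seen on the nonsmooth function f(x₁,x₂) = |x₁| + x₂² at x₀ = 0 (an example of our own choosing). There D₋f(0;u) = |u₁| ≥ 0, the critical directions are u = (0, ±1), and along them the quotient of Definition 2.2 with x* = 0 tends to 1 > 0, so 0 is a strict local minimizer.

```python
import math

f = lambda x1, x2: abs(x1) + x2 * x2     # our example; minimum at the origin

def lower_dini(u1, u2, t=1e-7):
    # D_- f(0; u) ~ [f(t*u) - f(0)] / t
    return (f(t * u1, t * u2) - f(0.0, 0.0)) / t

def chaney_quotient(u1, u2, t=1e-4):
    # (1/t^2) * [f(t*u) - f(0) - <0, t*u>]   (here x* = 0 in d_u f(0))
    return (f(t * u1, t * u2) - f(0.0, 0.0)) / t**2

# (i): D_- f(0; u) >= 0 for a sample of unit directions
angles = [k * math.pi / 12 for k in range(24)]
print(all(lower_dini(math.cos(a), math.sin(a)) >= -1e-9 for a in angles))  # -> True

# (ii): along the critical directions u = (0, +-1) the second-order
# quotient is positive
print(chaney_quotient(0.0, 1.0), chaney_quotient(0.0, -1.0))   # both ~ 1.0
```

Since (i) and (ii) of Theorem 4.5 below hold, f(x) > f(0) near 0, which one can also verify directly here.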
This result provides a short proof of the following sufficient-condition theorem [14, Theorem 2.9] for the problem P.

Theorem 4.5 (Second-order sufficient conditions without constraints) Let x₀ ∈ W and suppose that

(i) D₋f(x₀;·) ≥ 0 on ℝⁿ,
(ii) f''_-(x₀;0,u) > 0 whenever D₋f(x₀;u) = 0 and u ≠ 0.

Then there exists δ > 0 such that f(x) > f(x₀) for all x with 0 < ‖x − x₀‖ < δ.

Indeed, if not, there exists a sequence (x_k) in W such that x_k → x₀, x_k ≠ x₀ and f(x_k) ≤ f(x₀). By compactness of the unit ball in ℝⁿ we may assume without loss of generality that x_k →_u x₀ for some unit vector u. Then D₋f(x₀;u) = 0 by (i), and hence f''_-(x₀;0,u) ≤ 0 by Theorem 4.4. This contradicts assumption (ii).

Remark. Theorem 1 of Chaney in [9] is a weak form of the above result: he assumed the following stronger condition in place of (ii):

(ii)* f''_-(x₀;0,u) > 0 for all unit vectors u in ℝⁿ for which 0 ∈ ∂_u f(x₀).

Corollary 4.6 Let x₀ ∈ W and suppose that

(i) v·u ≥ 0 for all unit vectors u in ℝⁿ and all v ∈ ∂_u f(x₀),
(ii) f''_-(x₀;0,u) > 0 for all unit vectors u in ℝⁿ for which D₋f(x₀;u) = 0.

Then there exists δ > 0 such that f(x) > f(x₀) for all 0 < ‖x − x₀‖ < δ.

To see this corollary, we need only show that there exists v₋ ∈ ∂_u f(x₀) such that D₋f(x₀;u) = v₋·u. To this end, for each t > 0 we apply the Mean Value Theorem of Lebourg [10, Theorem 2.3.7] to obtain αₜ ∈ (0,t) and vₜ ∈ ∂f(x₀ + αₜu) such that

\[ \frac{1}{t}\{ f(x_0 + tu) - f(x_0) \} = v_t \cdot u. \]

Passing to the lower limit, we get D₋f(x₀;u) = liminf_{t↓0} vₜ·u. Since the multifunction x ↦ ∂f(x) is closed and locally takes values in a compact set [10], we can choose a sequence (tₙ) ↓ 0 such that limₙ→∞ v_{tₙ} = v₋ for some v₋ ∈ ∂f(x₀). Clearly v₋ has the desired properties, as (x₀ + α_{tₙ}u) converges to x₀ in the direction u.

In connection with the constrained problem (P_c), let x₀ ∈ W be feasible (i.e. x₀ satisfies the given constraints). Let us say that a locally Lipschitz function F : W → ℝ is an allied function to f at x₀ if F(x) ≤ F(x₀) whenever x ∈ W is feasible and f(x) ≤ f(x₀). For example, if β = (βᵢ)ᵢ₌₀^{m+p} ∈ ℝ^{1+m+p} is a "Lagrange multiplier compatible with x₀", that is, if β₀, β₁, …, β_m ≥ 0 and β_{m+1}, …, β_{m+p} ∈ ℝ with Σᵢ₌₀^{m+p} βᵢ² = 1 are such that βᵢgᵢ(x₀) = 0
for all i = 1, …, m + p, then the corresponding Lagrange function L, defined by

\[ L(x) := \beta_0 f(x) + \sum_{i=1}^{m+p} \beta_i g_i(x), \]

is an allied function to f at x₀.

Theorem 4.7 Let L : W → ℝ be an allied function of f at x₀ such that D₋L(x₀;·) ≥ 0 on ℝⁿ. Let (x_k) be a sequence of feasible points in W convergent to x₀ in the direction u, and suppose that f(x_k) ≤ f(x₀) for all k. Then 0 ∈ ∂_u L(x₀) and L''_-(x₀;0,u) ≤ 0.

Proof. By assumption, L(x_k) ≤ L(x₀) for each k. The result now follows from Theorem 4.4, applied to L in place of f.

Part (ii) of the following theorem is a restatement of Theorem 4.7, while (i) is easy to verify.

Theorem 4.8 Let (x_k) be a sequence of feasible points such that x_k →_u x₀ and f(x_k) ≤ f(x₀) for each k. Then the following statements hold:

(i) u ∈ K_S(x₀) and D₋f(x₀;u) ≤ 0;
(ii) if L is an allied function of f at x₀ such that D₋L(x₀;·) ≥ 0 on ℝⁿ, then 0 ∈ ∂_u L(x₀) and L''_-(x₀;0,u) ≤ 0.

Let U denote the set of all unit vectors u in ℝⁿ with the following property: there exists a sequence (x_k) of feasible points such that x_k →_u x₀ and f(x_k) ≤ f(x₀) for all k.

Theorem 4.9 (Second-order sufficient condition with constraints) Suppose that for each u ∈ U there exists an allied function L of f at x₀ such that D₋L(x₀;·) ≥ 0 on ℝⁿ and L''_-(x₀;0,u) > 0. Then there exists δ > 0 such that f(x) > f(x₀) for every feasible x ≠ x₀ satisfying ‖x − x₀‖ < δ.

Remark. This result clearly improves an earlier result of ours [14, Theorem 5.7], which in turn extends a result of Chaney [9], where the function L is required to satisfy a regularity condition.

Proof. Suppose not: then there exists a sequence (x_k) of feasible points such that x_k → x₀ and f(x_k) ≤ f(x₀) for each k. Without loss of generality we may assume that x_k →_u x₀ for some unit vector u. Then u ∈ U. By assumption, there exists an allied function L of f at x₀ such that D₋L(x₀;·) ≥ 0 and L''_-(x₀;0,u) > 0. This contradicts (ii) of Theorem 4.8. □

For a discussion of the accompanying necessary-condition theorems for (P) and (P_c) we refer the reader to [14]. Results in terms of the second-order derivatives of Cominetti and Correa [11] are also studied in [4].

Acknowledgements. The authors gratefully acknowledge financial support from the Research Grants Council of Hong Kong, the United College and the Institute of Mathematical Sciences, Chinese University of Hong Kong.
References

[1] J.-P. Aubin and I. Ekeland, Applied Nonlinear Analysis (John Wiley & Sons, 1984).
[2] A. Ben-Tal, Second-order and related extremality conditions in nonlinear programming, Journal of Optimization Theory and Applications 31 (1980) 143-165.
[3] A. Ben-Tal and J. Zowe, Necessary and sufficient optimality conditions for a class of nonsmooth minimization problems, Mathematical Programming 24 (1982) 70-91.
[4] W. L. Chan, L. R. Huang and K. F. Ng, On generalized second-order derivatives and Taylor expansions in nonsmooth optimization, SIAM Journal on Control and Optimization 32 (1994) 591-611.
[5] R. W. Chaney, On second derivatives for nonsmooth functions, Nonlinear Analysis: Theory, Methods and Applications 9 (1985) 1189-1209.
[6] R. W. Chaney, Second-order directional derivatives for nonsmooth functions, Journal of Mathematical Analysis and Applications 128 (1987) 495-511.
[7] R. W. Chaney, Second-order necessary conditions in constrained semismooth optimization, SIAM Journal on Control and Optimization 25 (1987) 1072-1081.
[8] R. W. Chaney, Second-order necessary conditions in semismooth optimization, Mathematical Programming 40 (1988) 95-109.
[9] R. W. Chaney, Second-order sufficient conditions in nonsmooth optimization, Mathematics of Operations Research 13 (1988) 660-673.
[10] F. H. Clarke, Optimization and Nonsmooth Analysis (Wiley-Interscience, New York, 1983).
[11] R. Cominetti and R. Correa, A generalized second-order derivative in nonsmooth optimization, SIAM Journal on Control and Optimization 28 (1990) 789-809.
[12] C. N. Do, Generalized second-order derivatives of convex functions in reflexive Banach spaces, Transactions of the American Mathematical Society 334 (1992) 281-301.
[13] V. F. Dem'yanov and A. B. Pevnyi, Expansion with respect to a parameter of the extremal values of game problems, USSR Computational Mathematics and Mathematical Physics 14 (1974) 33-45.
[14] L. R. Huang and K. F. Ng, Second-order necessary and sufficient conditions in nonsmooth optimization, Mathematical Programming (1994) 379-402.
[15] L. R. Huang and K. F. Ng, On some relations between Chaney's generalized second-order directional derivative and that of Ben-Tal and Zowe, SIAM Journal on Control and Optimization (to appear).
[16] L. R. Huang and K. F. Ng, On lower bounds of the second-order directional derivatives of Ben-Tal-Zowe and Chaney, submitted.
[17] A. D. Ioffe, Calculus of Dini subdifferentials of functions and contingent coderivatives of set-valued maps, Nonlinear Analysis: Theory, Methods and Applications 8 (1984) 517-539.
[18] H. Kawasaki, Second-order necessary and sufficient optimality conditions for minimizing a sup-type function, Applied Mathematics and Optimization 26 (1992) 195-220.
[19] R. Mifflin, Semismooth and semiconvex functions in constrained optimization, SIAM Journal on Control and Optimization 15 (1977) 959-972.
[20] J.-P. Penot, Generalized higher order derivatives and higher order optimality conditions, preprint, Universite de Pau, 1985.
[21] J.-P. Penot, Second-order generalized derivatives: comparisons of two types of epi-derivatives, Lecture Notes in Economics and Mathematical Systems 382 (1992) 52-76.
[22] R. T. Rockafellar, First- and second-order epi-differentiability in nonlinear programming, Transactions of the American Mathematical Society 307 (1988) 75-108.
[23] R. T. Rockafellar, Second-order optimality conditions in nonlinear programming obtained by way of epi-derivatives, Mathematics of Operations Research 14 (1989) 462-484.
[24] M. Studniarski, Second-order necessary conditions for optimality in nonsmooth nonlinear programming, Journal of Mathematical Analysis and Applications 154 (1991) 303-317.
Recent Advances in Nonsmooth Optimization, pp. 172-192
Eds. D.-Z. Du, L. Qi and R.S. Womersley
©1995 World Scientific Publishing Co Pte Ltd

On the Solution of Optimum Design Problems with Variational Inequalities

Michal Kočvara and Jiří V. Outrata
Institute of Information Theory and Automation, Czech Academy of Sciences, Pod Vodárenskou věží 4, 182 08 Praha 8, Czech Republic

Abstract

The paper deals with the numerical solution of a class of optimum design problems in which the controlled systems are described by elliptic variational inequalities. The approach is based on the characterization of (discretized) system operators by means of generalized Jacobians and the subsequent usage of nondifferentiable optimization methods. As an application, two important shape design problems are solved.
1  Introduction

Optimal control problems in which the controlled systems are governed by monotone variational inequalities arise frequently in economic modelling and optimum design. In spite of the fact that the basic theoretical questions, connected with the existence of solutions, optimality conditions and suitable approximations, have already been successfully answered, the numerical solution of such problems remains a difficult task. A classical regularization technique ([8]) consists in converting the variational inequality into an equation by means of a smooth penalty. In this way one obtains a standard optimal control problem, for the solution of which various effective methods are available. However, the smooth penalization leads either to low accuracy or to ill-conditioned problems. Moreover, the incidental presence of state-space constraints, which are mostly also treated by a smooth penalization, makes a suitable choice of the individual penalty parameters extremely cumbersome. Similar difficulties can be expected when using the penalty approach of [6], where the resulting optimization problem has to be solved on the product of the state and the control space.
In this paper we develop another approach, started in [18], in which, under suitable assumptions, we handle the variational inequality as a nondifferentiable controlled system by using the tools of nondifferentiable analysis [2]. As already shown in [19] and [17], in this way one can achieve a substantially higher accuracy compared with the regularization technique. Moreover, the incidental state-space constraints may be treated by exact penalties, which further contributes to the quality of the results. In spirit, our approach is close to the heuristic algorithms suggested in [5], but we do not use any heuristic reasoning in its development¹.

Assume that U and Y are Banach spaces, B[U → Y*] is a linear continuous operator, the map A assigns to controls u linear selfadjoint elliptic ([8]) operators mapping Y into Y*, and K ⊆ Y is a nonempty closed convex set. We confine ourselves to controlled systems governed by the following variational inequality: for a given control u ∈ U, find the state variable y ∈ K such that

\[ \langle A(u)y - Bu,\; v - y \rangle \ge 0 \quad \text{for all } v \in K. \tag{1} \]

Since A(u) is a selfadjoint operator, solving (1) for a given control u amounts to solving the convex program

\[ \tfrac12 \langle y, A(u)y \rangle - \langle Bu, y \rangle \to \inf \quad \text{subject to} \quad y \in K, \tag{2} \]

which clearly possesses a unique solution due to the ellipticity of A(u) [8]. Also, if we discretize (1), e.g. by a finite element (FE) method, a finite-dimensional approximation y of y, corresponding to a finite-dimensional approximation u of u, can be found as the unique solution of a finite-dimensional convex quadratic programming problem of the form (2). Therefore it is reasonable to investigate the optimization of systems governed by (1) (after an appropriate discretization) in the more general framework of two-level optimization problems

\[ \theta(u, y) \to \inf \quad \text{subject to} \quad y \in \operatorname*{arg\,min}_{v \in \Sigma} \varphi(u, v), \quad u \in \omega, \tag{3} \]

where u ∈ ℝᵐ, y, v ∈ ℝⁿ, Σ is a closed convex subset of ℝⁿ, ω is a compact subset of ℝᵐ, and for any fixed u₀ ∈ ω the lower-level problems

\[ \varphi(u_0, v) \to \inf \quad \text{subject to} \quad v \in \Sigma \tag{4} \]
¹After this paper had been finished, the authors learned about another approach proposed by J.S. Pang. This approach relies on a combination of interior and exterior penalties. It requires, however, strict complementarity to be satisfied at the solution point, which is frequently violated in the applications discussed in Section 3.
satisfy a number of assumptions. Problems (3) belong to the class of so-called Stackelberg problems and have already been intensively studied from various points of view ([20,1,15]). Under the assumptions of [3], guaranteeing that the map S : u ↦ arg min_{v∈Σ} φ(u,v) is a locally Lipschitz operator, (3) may be rewritten in the form

\[ \Theta(u) = \theta(u, S(u)) \to \inf \quad \text{subject to} \quad u \in \omega. \tag{5} \]
The term subgradient is borrowed from convex analysis, where it is used for the vectors belonging to the subdifferential. Recently, however, it is frequently used also in a nonconvex setting for the elements of various generalized gradients.
Optimum Design Problems with VariationaJ Inequalities
175
The following notation is employed: A/"(C) is the null space of a linear operator C, S'(x0; d) is the directional derivative of an operator 5 at a point x0 in the direction d, Pi denotes the subspace of the polynomials of the first-order and x' is the i t h coordinate of a vector x G IRm and for x, y G R n the inequalities x > y (x > y) mean x' > y" (x* > y') for all i. H1, H£ are the usual Sobolev spaces W1-2, W01,2 and A denotes the closure of a set A.
2
Computation of the Subgradients
Let S = {yeB."|**'(y)<0, 1 = 1,2,...,?},
(6)
where functions $*[R" —* R] are convex and twice continuously differentiable. As sume that there exists an open subset V of R m containing u> such that
0, 0,
i = l,2,...,P,
. . >
(
v . . where £(u,y,A) = <^>(u,y) + *£. A'$'(y) is the standard Lagrangian. To be able to utilize the strong results of [3, 10], we impose still the Strong Second-Order Sufficient Condition, i.e., (A2) for all u 0 G V, y 0 G 5(u 0 ) and h G R m , h + 0, one has (h,V 2 , y £(u 0 ,yo,A 0 )h) > 0 , whenever (V*'(y 0 ), h> = 0
for
i G J(y 0 ) := {} € 7(y 0 ) | AJ0 > o}
Under (Al) and (A2), S is single-valued on V; moreover, it is locally Lipschitz ([3]) and directionally differentiable ([10, 13] ). The same is true about the operator A[R m —t R p ] , assigning to vectors u 0 G V the corresponding Kuhn-Tucker vectors An. It is well-known that if the strict complementarity condition 7(y 0 ) = J{yo) holds
176
M. Kocvara, and J. V. Outrata
at some y„ = S(u 0 ), u 0 G V, then 5 is even difFerentiable at u 0 ([4]). The gradient VS(uo) is in this case given as the operator which assigns to an arbitrary vector z G R m the (unique) solution of the quadratic program i(v,Q(u 0 )v> +
(8) i/(y„)(uo), v G £/(y„)(uo),
where Q(u0) = V ^ y £ ( u 0 , y 0 ) A0) and for an arbitrary index set G C { 1 , 2 , . . . ,p] LG(u)) = {{vv G £ RR"" | | (V*'(S(uo)),v) (V$ i (5(u 0 )),v) = 0, it 6G G) G}
(9)
If the differentiability of S at some u 0 G V is not ensured by the strict comple mentarity condition, we need for the evaluation of a subgradient from d 0 ( u o ) one arbitrary matrix from the generalized Jacobian dS(u0). Such matrices will now be constructed exactly according to Definition 1.1. Let y„ = S(u 0 ) and let the index set J(y 0 ) satisfy the inclusions (10)
J(yo) C J(y 0 ) C 7(y 0 ).
We denote A = 7(y 0 ) \ J{y0), & = J(y0) \ J(yo), o,ouo2,33 the cardinalities of I(yo),J(yo), A and B, respectively, and C(y 0 ) the [o x m] matrix, composed from V$'(y 0 ), i G 7(y 0 ), as rows. Evidently, C(y 0 ) may be divided into three matrices Cj{y0),CA{y0) and Cg(y0), composed from V$'(y 0 ) for i G J(y0),i G A and i G B, respectively. P r o p o s i t i o n 2.1 Let assumptions (Al), (A2) hold, u 0 G V,y 0 = S(u„) and let the index set J(y 0 ) satisfy inch (10). Assume that the linear system --C CTjA({yo)y\ yo)yI
+ Q(u 00)y^ )y; + y2* + Ce(y0)y5 )y*3
does not possess a solution (y-uy'2,y'3,y'4,yl) the conditions
(yi.y»)>o,
+
Cj(y 0 )yJ + C j ( y 0 ) y ; 2
g R° x R
03
xR
m
=0 =0
l[ii llj
>
3
x R°> x R° , satisfying
(yr,y;)^o
yie^((VXu 0t y 0 )f)n^(C J (y 8 )).
(12)
TTien tne operator Pj (u„) wfcicft assijw to an arbitrary vector z G R " «Ae fwj'^e) solution of the quadratic program i(v,Q(u0)v) + ( V V ^^(U u 0c, y oo)Kz ,vv )) -> inf subject to
(13) v G 6 7j(yo)(uo) ij(yo)(uo)
belongs to
dS(u0).
Optimum Design Problems with Variational Inequalities
177
Proof. In the first step we show the existence of a direction h e R " for which J ( S ( u 0 + Mi)) = 7(5(u 0 + Mi)) 0h)) = J(y J(y00))
(14)
for all sufficiently small positive t). By using of the directional derivatives of S and A, this condition may be rewritten into the form C^(y )5'(u 0 ;h) C4(y 0 )S'(u (A')'(u 0 ;h)
< 0 >0
iovzeB. fortGB.
(15)
Denote by A/(u 0 ) the subvector of A(u 0 ) composed from the multipliers, correspond ing to active constraints. Again, A/(u 0 ) may be decomposed into Aj(u 0 ),A^(u 0 ), A B (u 0 ) in the same way as C(y 0 ). The vectors S'(u 0 ;h), A' 7 (u 0 ;h) form the unique Kuhn-Tucker point of a special quadratic program with the constraints Cj(y0)S'(u0; h) = 0, CA(y0)S'{\i0;h) < 0 and C e (y 0 )S'(u 0 ; h) < 0, (cf. [10]), for which the KuhnTucker conditions attain the form Q(u 0 )S'(u )5'(u 0 ; h) + V ^ ( U o , y 0 )h + C r (y 0 )A',(uo; )A' / (u 0 ; h) = 0 (A')'(u 0 ; h ) ( V $ ' ( y 0 ) , 5'(u 0 ; h)) = 0, (A')'(u 0 ; h) > 0
for i g I(y0) \ J(y„).
(16)
By combining of relations (15),(16) and using the complementarity argument, one immediately concludes that the desired direction h exists whenever the linear system of equalities and inequalities Q(u 0 )S'(u 0 ;h) + V ^ ( u 0 , y 0 ) h + C Cj(yo)A^(u j ( y 0 ) A ^ u 00;;y y 00 ) + CjA' e (u 0 ;y 0 ) = 0 Cj(y0)S'(u0;
h) = 0,
C^(yo)5'(u0;h)<0,
C e (y 0 )S'(uo; h) = 0
(17)
A'B(uo;yo)>0
is consistent. Its consistency is according to the well-known Motzkin theorem of the alternative ([16]) equivalent to the inconsistency of (11), (12) and thus, under the assumptions imposed, the strict complementarity condition holds at the points u 0 + tfh for all tf > 0 sufficiently small. Consequently, S is differentiable at these points and VS(u 0 + tf h) is the operator, assigning to an arbitrary vector z € R m the unique solution v of the quadratic program rfh)v) - (V£u¥>(u0 + Ml, t?h, (5u (Su 0 + Mi))z, tfh))z, v) —► inf | ( v , Q(u 0 + Mi)v) —> inf subject to
(18) vv e G iLj. -
.(u 0 + Mi). (uo-Mh).
With respect to the definition of the generalized Jacobian, it remains to prove that VS(u 0 + tfh) converges for d | 0 to ^J (yo )(uo). To this purpose we denote u,, = u 0 + i?h and observe that V vs(u,) 5W = = r(u,)o(-v>K,s(u,))), rK)o(-V>(u,,5(u,))),
178
M. Kocvara and J. V. Outiata.
where T(u^) is the projection operator which projects (Q(u^)) 1 d, d € E. m , onto Ly, Ju#) in the Q(u#)— metric. Due to the continuity assumptions being imposed, T as well as V yu <^(-, 5(-)) depend continuously on xij over V so that limV5(u,) = P7(yo)(u0)e55(uo) by definition.
□
R e m a r k . In the particular case J{yo) = ^(yo), the variable y j disappears and in (12) we have to require y j > 0, yj ^ 0. Analogously, in the case J(yo) = ^(yo), the whole second equation of (11) disappears and in (12) we have to require y* > 0,
rt± o. Of course, the satisfaction of the above conditions can hardly be tested in the presented general form. Fortunately, these conditions may be drastically simplified in the case, when $'(y) = — y \ i = 1,2,... ,p, arising frequently in applications. Let us delete from Q(uo) and (Vy U c^(u 0 ,yo)) r all columns, corresponding to in dices i 6 J(yo) a n d denote these new [m x (m — ox)] and [n x (m — oi)] matrices by Q(uo) and F(\io), respectively. Corollary 2.2 Let p < m, $'(y) = —y',i = 1,2, . . . , p , let assumption (A2) hold, Uo £ V and yo = 5(uo). Suppose that n + p > m + o2 + o 3 ,
(19)
and the \(n -f p — o) x (m — Oi)] matrix, composed from F(UQ) and the rows of Q(u 0 ) corresponding to nonactive constraints, has maximal rank, i.e. m — 0\. Then the assertion of Proposition 2.1 holds true. Proof. We observe first that if (19) holds as equality, we have to test a square matrix of order m — o%; otherwise, the number of rows is greater than the number of columns so that the maximal rank of this matrix is indeed m — o\. The rows of the matrix C(y 0 ) possess nonzero elements only on positions specified by the index of the corresponding inequality. Since y j 6 A/"(C,/(yo)), one has (yj)' = 0 for t G -/(yo) and thus it suffices to consider only the remaining components. We denote by y\ the subvector of y*z composed from (y^)'',» £ ./(yo), and observe that the condition y% G M ((Vy U (^(u 0 ,yo)) T ) reduces to y\ 6 Af(F(u0)) and the first equation of (11) reduces to the form -C.S(yo)y! + Q(»°)r3 + Cj(y0)yl
+ Cj(y 0 )y 5 * = 0.
As clearly ( - C j ( y o ) y ! + Cj(y0)y;
+ Cj(y0)y5*)* = 0 for all i ? 7(y 0 ),
Optimum Design Problems with Variational Inequalities
we conclude that necessarily
$$(Q(u_0)y_3^*)^i = 0 \quad \text{for all } i \notin I(y_0). \qquad (20)$$
Equations (20), together with the condition $\tilde{y}_3^* \in \mathcal{N}(F(u_0))$, form a homogeneous linear system whose solution must be zero due to the imposed rank condition. Thus the whole vector $y_3^* = 0$ and we immediately infer that the linear system (11) and conditions (12) are inconsistent, the linear independence assumption (A1) being evidently satisfied. □

Condition (19) restricts the admissible number of active constraints whose multipliers are zero. It is not too restrictive, because usually $p = m$ and thus it reduces to $n \geq o_2 + o_3$. In the examples solved, $n$ was always larger than $o_2 + o_3$ and thus the criterion of the above corollary could be applied.

For the evaluation of subgradients from $\bar{\partial}\Theta(u_0)$ it is not necessary to compute a matrix from $\bar{\partial}S(u_0)$ according to Proposition 2.1. Instead, we can apply the idea of the adjoint program ([18]), which leads to the following assertion.

Proposition 2.3 Assume that $g$ is continuously differentiable on $\mathcal{V} \times \mathbb{R}^m$, assumptions (A1), (A2) hold, $u_0 \in \mathcal{V}$, $y_0 = S(u_0)$ and $\hat{J}(y_0)$ is an arbitrary index set satisfying inclusion (10). Let the linear system (11) possess no solution satisfying conditions (12), and let $p_0$ be the (unique) solution of the adjoint quadratic program
$$\tfrac{1}{2}(p, Q(u_0)p) - (\nabla_y g(u_0,y_0), p) \to \inf \quad \text{subject to} \quad p \in L_{\hat{J}(y_0)}(u_0). \qquad (21)$$
Then
$$\nabla_u g(u_0,y_0) - (\nabla_{yu}\hat{g}(u_0,y_0))^T p_0 \in \bar{\partial}\Theta(u_0). \qquad (22)$$
Proof. According to the Jacobian Chain Rule ([2]),
$$\xi := \nabla_u g(u_0,y_0) + (P_{\hat{J}(y_0)}(u_0))^T \nabla_y g(u_0,y_0) \in \bar{\partial}\Theta(u_0).$$
As already mentioned in the proof of Proposition 2.1,
$$P_{\hat{J}(y_0)}(u_0) = -\Gamma(u_0)\,\nabla_{yu}\hat{g}(u_0,y_0),$$
where $\Gamma(u_0)$ projects $(Q(u_0))^{-1}d$, $d \in \mathbb{R}^m$, onto $L_{\hat{J}(y_0)}(u_0)$ in the $Q(u_0)$-metric. This projector is symmetric (cf. [18]) and so
$$\xi = \nabla_u g(u_0,y_0) - (\nabla_{yu}\hat{g}(u_0,y_0))^T\,\Gamma(u_0)\,\nabla_y g(u_0,y_0).$$
However, $\Gamma(u_0)\nabla_y g(u_0,y_0)$ is nothing else but the solution $p_0$ of our adjoint program (21), and thus inclusion (22) holds true. □

This way of computing subgradients has been applied in the design problems investigated in the next section.
M. Kocvara and J. V. Outrata

3  Two Selected Optimum Design Problems
In this section we present two examples of shape design problems. In both examples, the controlled system describes the behaviour of a membrane supported by a rigid obstacle. In the first problem we want to find a shape of the membrane such that its surface is minimized while it remains in contact with a given part of the obstacle. In the second problem we do not care about the surface, but we want to confine the contact between the membrane and the obstacle exclusively to that given part. However, this part can now move, together with the boundary of the membrane. The first optimum design problem, known as the packaging problem, has been analyzed and numerically solved in [7, 8] via the above-mentioned regularization technique. In the following we show that the results obtained by the technique proposed in this paper are substantially different, both in terms of the design variables and of the objective function. The second, more complicated problem, known as the incidence set identification problem, has been analyzed in [9]; however, as far as we know, it has not been numerically solved yet. We start with the description of the controlled system, which is the same for both problems.
3.1  Membrane with a rigid obstacle
Let us introduce the set of admissible design variables
$$U_{ad} = \left\{ u \in C^{0,1}([0,1]) \;\middle|\; 0 < c_1 \leq u(x_2) \leq c_2,\ \left|\tfrac{du}{dx_2}(x_2)\right| \leq c_3 \text{ a.e. in } (0,1) \right\},$$
where $c_1, c_2, c_3$ are given positive constants such that $U_{ad} \neq \emptyset$. Consider a family of admissible domains $\Omega(u)$ with a variable right "vertical" part of the boundary:
$$\Omega(u) = \{(x_1, x_2) \in \mathbb{R}^2 \mid 0 < x_1 < u(x_2) \text{ for all } x_2 \in (0,1)\}.$$
Denote by $\hat{\Omega} = (0, c_2) \times (0,1)$ the largest domain of this family. Let $\psi \in C(\overline{\hat{\Omega}})$ be the obstacle function such that $\psi < 0$ on $\partial\hat{\Omega} \cup ((c_1, c_2) \times (0,1))$, and let
$$K(u) = \{v \in H^1_0(\Omega(u)) \mid v \geq \psi \text{ a.e. in } \Omega(u)\}$$
be the set of admissible states. For $u \in U_{ad}$, the corresponding state of the controlled system is computed by solving the variational inequality $\mathcal{P}(u)$: Find $v = v(u) \in K(u)$ such that
$$(\nabla v, \nabla(w - v))_{0,\Omega(u)} \geq (f, w - v)_{0,\Omega(u)} \quad \text{for all } w \in K(u),$$
where $(\cdot,\cdot)_{0,\Omega(u)}$ stands for the scalar product in $L^2(\Omega(u))$ and $f \in L^2(\hat{\Omega})$. In the notation of (1), $A(u)v = -\Delta v$ on $\Omega(u)$ and $B(u) = \mathcal{E}f$, where $\mathcal{E}$ is the canonical embedding of $L^2(\Omega)$ into $H^{-1}(\Omega)$; however, the convex set $K$ is in $\mathcal{P}(u)$ replaced by a set-valued map $K(u)$.
Figure 1: Discretization of $\Omega$ and $u$.

The state problem $\mathcal{P}(u)$ describes the deflection of a membrane fixed on the boundary $\partial\Omega(u)$, loaded by the pressure $f$ and supported from below by a rigid obstacle described by $\psi$. We discretize $\mathcal{P}(u)$ by the finite element method in the following way. Let $0 = a_0 < a_1 < \cdots < a_{D(h)} = 1$ be a partition of $[0,1]$. The discretization of $U_{ad}$ is defined as follows:
$$U^h_{ad} = \left\{ u_h \in C([0,1]) \;\middle|\; u_h|_{[a_{i-1},a_i]} \in P_1,\ 0 < c_1 \leq u_h \leq c_2,\ \frac{|u_h(a_i) - u_h(a_{i-1})|}{a_i - a_{i-1}} \leq c_3,\ i = 1,\dots,D(h) \right\},$$
i.e., $U^h_{ad}$ contains the piecewise linear functions from $U_{ad}$. Further, we introduce a subset $\mathbf{U}_{ad} \subset \mathbb{R}^{D(h)}$, isometrically isomorphic with $U^h_{ad}$:
$$\mathbf{U}_{ad} = \{\mathbf{u} \in \mathbb{R}^{D(h)} \mid u^i = u_h(a_i) \text{ for some } u_h \in U^h_{ad},\ i = 1,\dots,D(h)\},$$
i.e., it is the set of vectors of $x_1$-coordinates of the nodes $\mathbf{u}^i = (u_h(a_i), a_i)$, $i = 1,\dots,D(h)$.

For $u_h \in U^h_{ad}$ we define a polygonal computational domain
$$\Omega(u_h) = \{(x_1, x_2) \in \mathbb{R}^2 \mid 0 < x_1 < u_h(x_2),\ 0 < x_2 < 1\}$$
and construct its triangulation $\mathcal{T}(h, u_h)$, depending on the mesh parameter $h$ as well as on $u_h$ and consisting of two parts: the fixed triangulation of the rectangle $(0, c_0] \times [0,1]$ with $c_0 < c_1$, and the moving triangulation constructed by means of principal moving nodes (design nodes) $\mathbf{u}^i$ and associated moving nodes, the $x_1$-coordinates of which are given by an equidistant partition of the segments $[c_0, u_h(a_i)]$; see Fig. 1. Thus, for a fixed $h > 0$, the triangulation $\mathcal{T}(h, u_h)$ depends continuously on $u_h$. The domain $\Omega(u_h)$ with a given triangulation $\mathcal{T}(h, u_h)$ will be denoted by $\Omega_h$.
For the discretization of $\mathcal{P}(u)$ we use the triangulation $\mathcal{T}(h, u_h)$ and the set of piecewise linear basis functions (Courant basis functions). In the standard way we obtain the stiffness matrix $\mathbf{A}(\mathbf{u}) \in \mathbb{R}^{N \times N}$ and the right-hand side vector $\mathbf{f}(\mathbf{u}) \in \mathbb{R}^N$, continuously depending on a given $\mathbf{u} \in \mathbf{U}_{ad}$ ($N = N(h)$ is the number of nodes of the triangulation $\mathcal{T}(h, u_h)$). Denote by $x^i$, $i = 1,\dots,N$, the nodes of $\mathcal{T}(h, u_h)$ and set $\Psi^i = \psi(x^i)$, $i = 1,\dots,N$. The discretized problem can be written as the quadratic programming problem
$$(\mathcal{P}(\mathbf{u}))_h \qquad \tfrac{1}{2}(\mathbf{v}, \mathbf{A}(\mathbf{u})\mathbf{v}) - (\mathbf{f}(\mathbf{u}), \mathbf{v}) \to \inf \quad \text{subject to} \quad \mathbf{v} \in K_h(\mathbf{u}),$$
where $\mathbf{u} \in \mathbf{U}_{ad}$ and
$$K_h(\mathbf{u}) = \{\mathbf{v} \in \mathbb{R}^N \mid v^i \geq \Psi^i,\ i = 1,\dots,N\}.$$
Since the triangulation $\mathcal{T}(h, u_h)$ depends on $u_h \in U^h_{ad}$, the same holds for $\Psi(\mathbf{u})$. In order to be able to use the direct method described in Section 2, we perform the simple transformation $\mathbf{y} = \mathbf{v} - \Psi(\mathbf{u})$ and replace $(\mathcal{P}(\mathbf{u}))_h$ by the problem
$$(\mathcal{P}(\mathbf{u}))_h \qquad \tfrac{1}{2}(\mathbf{y}, \mathbf{A}(\mathbf{u})\mathbf{y}) + (\mathbf{A}(\mathbf{u})\Psi(\mathbf{u}) - \mathbf{f}(\mathbf{u}), \mathbf{y}) \to \inf \quad \text{subject to} \quad \mathbf{y} \geq 0.$$
In this way, the constraint set $K_h$ is replaced by $\mathbb{R}^N_+$ and $\mathbf{u}$ enters only the objective of $(\mathcal{P}(\mathbf{u}))_h$.
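The transformed problem is a bound-constrained quadratic program, equivalently a linear complementarity problem. The following sketch solves it by projected Gauss-Seidel; this is a generic textbook iteration, not the two-step algorithm of [13], and the toy stiffness matrix and data are purely illustrative:

```python
import numpy as np

def solve_obstacle_lcp(A, q, tol=1e-10, max_iter=10_000):
    """Projected Gauss-Seidel for the LCP arising from (P(u))_h:
    y >= 0,  A y + q >= 0,  <y, A y + q> = 0,
    i.e. the optimality conditions of
    min 0.5 <y, A y> + <q, y>  s.t.  y >= 0   (A symmetric positive definite).
    """
    n = len(q)
    y = np.zeros(n)
    for _ in range(max_iter):
        y_old = y.copy()
        for i in range(n):
            # residual without the diagonal contribution of y[i]
            r = q[i] + A[i] @ y - A[i, i] * y[i]
            y[i] = max(0.0, -r / A[i, i])
        if np.linalg.norm(y - y_old, np.inf) < tol:
            break
    return y

# Toy 1-D membrane analogue: tridiagonal stiffness matrix and a vector q
# playing the role of A(u) Psi(u) - f(u).
n = 5
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
q = np.array([-1.0, 0.1, 0.1, 0.1, -1.0])
y = solve_obstacle_lcp(A, q)
```

Because $\mathbf{y} \geq 0$ holds by construction, the complementarity information (which indices are active) needed for the index sets $I(\mathbf{y})$, $\hat{J}(\mathbf{y})$, $J(\mathbf{y})$ is read off directly from the computed solution.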
3.2  Packaging problem
In the packaging problem we try to find a control $u^*$ for which the contact of the membrane with the obstacle occurs in a given subset of $\hat{\Omega}$, while the surface of the membrane (i.e., the measure of $\Omega(u^*)$) is minimized. Let $\Omega_0$ be a given closed simply connected subset of $[0, c_0] \times [0,1]$. For $u \in U_{ad}$, denote by $Z(u)$ the contact region, i.e., the set $\{x \in \Omega(u) \mid v(x) = \psi(x)\}$, where $v$ is the solution of $\mathcal{P}(u)$. The packaging problem is defined as follows:
$$(P) \qquad G(u) = \operatorname{meas} \Omega(u) \to \inf \quad \text{subject to} \quad Z(u) \supset \Omega_0,\ u \in U_{ad},\ v \text{ solves } \mathcal{P}(u).$$
If the set $\{u \in U_{ad} \mid Z(u) \supset \Omega_0\}$ is nonempty, then (P) has at least one solution ([8, Thm. 6]). For the treatment of the state constraint $Z(u) \supset \Omega_0$, an exterior quadratic penalty technique has been proposed and analyzed in [7, 8]. However, as $v \geq \psi$, the direct method applied here allows us to augment this state constraint by a differentiable exact penalty. (This is impossible in the regularization technique used in [7], where the relationship $v \geq \psi$ does not hold.) One then obtains the augmented objective functional
$$G_r(u, v) = \operatorname{meas} \Omega(u) + r \int_{\Omega_0} (v - \psi)\, dx,$$
where $r > 0$ is the penalty parameter. Hence, instead of (P), we will solve the problem
$$(P_r) \qquad G_r(u, v) \to \inf \quad \text{subject to} \quad u \in U_{ad},\ v \text{ solves } \mathcal{P}(u).$$
In [8, Thms. 9.3, 9.4] it is shown that for the exterior quadratic penalty the penalized problem has at least one solution for any $r > 0$, and if $r \to \infty$, then $u_r$, the solutions of $(P_r)$, converge uniformly to $u^*$, the solution of (P). Analogously, the same can be proved for $G_r$ with the exact linear penalty.

The discretization of $(P_r)$ is straightforward. Let $\mathcal{D}_0$ be the set of indices of nodes lying in $\Omega_0$. The discretized problem reads as follows:
$$(P_r)_h \qquad E_r(\mathbf{u}, \mathbf{y}) = \operatorname{meas} \Omega_h + \frac{r}{h} \sum_{i \in \mathcal{D}_0} y^i \to \inf \quad \text{subject to} \quad \mathbf{u} \in \mathbf{U}_{ad},\ \mathbf{y} \text{ solves } (\mathcal{P}(\mathbf{u}))_h,$$
with $r > 0$. It is known ([8, Thm. 9.5]) that if $h \to 0+$, then $\mathbf{u}_r^h$, the solutions of $(P_r)_h$, converge uniformly to $u_r$, the solution of $(P_r)$.

Problem $(P_r)_h$ is now exactly of the form (3). It satisfies all requirements needed by the direct approach; in particular, the appropriate function $\Theta$ is locally Lipschitz and directionally differentiable. For the evaluation of its subgradients, relations (21) and (22) can be applied, which provides us with the formula
$$\nabla_{\mathbf{u}} E_r(\mathbf{u}, \mathbf{y}) - \left[\nabla_{\mathbf{u}}(\mathbf{A}(\mathbf{u})\mathbf{y} + \mathbf{A}(\mathbf{u})\Psi(\mathbf{u})) - \nabla\mathbf{f}(\mathbf{u})\right]^T \mathbf{p} \in \bar{\partial}\Theta(\mathbf{u}), \quad \text{where } \mathbf{y} \text{ solves } (\mathcal{P}(\mathbf{u}))_h, \qquad (23)$$
and the variable $\mathbf{p}$ solves the adjoint quadratic program
$$(24) \qquad \tfrac{1}{2}(\mathbf{p}, \mathbf{A}(\mathbf{u})\mathbf{p}) - (\nabla_{\mathbf{y}} E_r(\mathbf{u}, \mathbf{y}), \mathbf{p}) \to \inf \quad \text{subject to} \quad p^i = 0 \ \text{for } i \in \hat{J}(\mathbf{y}),$$
where $I(\mathbf{y}) \subset \hat{J}(\mathbf{y}) \subset J(\mathbf{y})$. In the computations performed, we have set $\hat{J}(\mathbf{y}) = I(\mathbf{y})$ during the whole iteration process. To be correct, one should apply the test of Corollary 2.1 at each point where the strict complementarity condition is violated. Fortunately, we have not met any computational difficulties, which shows the robustness of our approach when applied to this kind of controlled systems.

Example 3.1 (see [7]): Consider the packaging problem in which $\Omega_0 = [0.25, 0.5] \times [0.25, 0.75]$, $f(x_1,x_2) = -1.0$ and $\psi(x_1,x_2) = -0.05x_1$. The set $U_{ad}$ is specified by the parameters $c_1 = 0.6$, $c_2 = 1.0$, $c_3 = 3.0$, and the triangulation parameter $c_0 = 0.5$. First we have used (as in [7]) the quadratic penalty term $\frac{r}{2h}\sum_{i \in \mathcal{D}_0}(y^i)^2$ with penalty parameter $r = 10^4$. The problem has been computed by the NDO code BT [19]. The discretized problems $(\mathcal{P}(\mathbf{u}))_h$ have been solved by a two-step algorithm introduced in [13]. The results obtained for $h = \frac{1}{16}$ ($D(h) = 17$) are close to those of [7], at least concerning the optimal value of the objective ($E_r^{opt} = 0.784213$). However, the quadratic penalty used has led to considerable inaccuracies in the satisfaction of the state constraint (up to 4% of the deflection at the front corners of $\Omega_0$). Therefore, we have solved the problem with the exact penalty. The penalty term $\frac{r}{h}\sum_{i \in \mathcal{D}_0} y^i$ has been used for the discretizations given by $h = \frac{1}{8}, \frac{1}{16}, \frac{1}{32}, \frac{1}{64}$. The penalty parameters $r$ have been chosen large enough to satisfy the state constraint exactly. Their values, together with the corresponding optimal objective values $E_r^{opt}$, are given in Table 1.
TABLE 1

  D(h)    r         E_r^opt
  9       8·10^3    0.787932
  17      8·10^4    0.826013
  33      8·10^5    0.850895
  65      8·10^6    0.866364
The final design for $D(h) = 65$ is depicted in Figure 2. We see that in the set $\Omega_0$ the isolines of the solution follow the isolines of the obstacle (which are parallel to the $x_2$-axis). Comparing the values of $E_r^{opt}$ for the quadratic and the linear penalty, respectively, we see a significant difference (5% for $h = \frac{1}{16}$). Also, the resulting optimal design is quite sensitive to the exact satisfaction of the state constraint. This is even more evident in the next example.
Figure 2: Optimal design for Ex. 3.1, $D(h) = 65$.

Example 3.2 (see [6]): Let $\psi(x_1,x_2) = -0.05(x_1^2 + (x_2 - 0.25)^2)$ and all other data be the same as in Example 3.1. Again, the linear penalty has been used, for the discretizations given by $h = \frac{1}{8}, \frac{1}{16}, \frac{1}{32}, \frac{1}{64}$. The values of the penalty parameters $r$ and the corresponding optimal objective values $E_r^{opt}$ are given in Table 2.

TABLE 2

  D(h)    r          E_r^opt
  9       1.6·10^4   0.780361
  17      3·10^5     0.900842
  33      5·10^5     0.934860
  65      1·10^6     0.980475
The final design for the finest mesh ($D(h) = 65$) is depicted in Figure 3. Again, in $\Omega_0$, the isolines of the solution coincide with the isolines of the obstacle, which are depicted in Figure 4. The comparison of the obtained optimal design with that computed via the regularization technique and with the quadratic penalty is quite interesting in this example. When comparing the maximum components of the design vectors for the same discretization parameter $h = \frac{1}{16}$, we obtain a difference of 11%, i.e., the resulting designs differ substantially.
Figure 3: Optimal design for Ex. 3.2, D(h) = 65.
Figure 4: Isolines of obstacle for Ex. 3.2.
Note that it is necessary to use a nonsmooth code in this approach. Numerical tests performed with a smooth (SQP) code failed in most cases due to line-search difficulties at a point far from the true solution.
3.3  Incidence set identification
In the second design problem, that of incidence set identification ([8]), we do not care about the surface of $\Omega(u)$ as in the packaging problem, but we want to confine the contact between the membrane and the obstacle exclusively to $\Omega_0$. However, in contrast to the first problem, this set can move together with the moving boundary of $\Omega(u)$, because now
$$\Omega_0 = \{(x_1, x_2) \in \mathbb{R} \times [\gamma, \delta] \mid \omega_1(x_2) \leq x_1 \leq \omega_2(x_2)\},$$
where $0 < \gamma < \delta < 1$, $\omega_1, \omega_2 \in C^{0,1}([\gamma,\delta])$ have uniformly bounded derivatives a.e., and, for given positive scalars $\varepsilon, \Delta, c_4$,
$$\varepsilon \leq \omega_1(x_2) + \Delta \leq \omega_2(x_2) \leq c_4 \quad \text{for all } x_2 \in [\gamma, \delta].$$
In [8], two different objective functionals expressing the "identification" requirement have been proposed. In our approach we utilize the complementarity conditions to create another suitable objective, but we introduce it only after the problem has been discretized.

The discretization starts with the domain $\Omega_0$. Let $\gamma = b_0 < b_1 < \cdots < b_{D'(h)} = \delta$ be a partition of $[\gamma, \delta]$ corresponding to the partition $a_0 < \cdots < a_{D(h)}$ of $[0,1]$, i.e., such that the $x_2$-coordinates of the $b_i$ coincide with the $x_2$-coordinates of some $a_j$; see Figure 5. Define, for a given positive $c_5$ (specifying the upper bound for the derivatives), a new set of admissible (discretized) design variables corresponding to $\Omega_0$:
$$U'^h_{ad} = \left\{ \omega_h = (\omega_{1h}, \omega_{2h}) \in (C([\gamma,\delta]))^2 \;\middle|\; \omega_{jh}|_{[b_{i-1},b_i]} \in P_1,\ \frac{|\omega_{jh}(b_i) - \omega_{jh}(b_{i-1})|}{b_i - b_{i-1}} \leq c_5,\ j = 1,2; \right.$$
$$\left. 0 < \varepsilon \leq \omega_{1h} + \Delta \leq \omega_{2h} \leq c_4,\ i = 1,\dots,D'(h) \right\}$$
and, for $\omega_h = (\omega_{1h}, \omega_{2h}) \in U'^h_{ad}$, the associated discretized set
$$\Omega_{0h}(\omega_h) = \{(x_1, x_2) \in \mathbb{R} \times [\gamma, \delta] \mid \omega_{1h}(x_2) \leq x_1 \leq \omega_{2h}(x_2)\}.$$
For $u_h \in U^h_{ad}$ and $\omega_h = (\omega_{1h}, \omega_{2h}) \in U'^h_{ad}$, the principal moving nodes are given by the couples
$$(u_h(a_i), a_i),\ i = 1,\dots,D(h), \qquad (\omega_{1h}(b_i), b_i),\ (\omega_{2h}(b_i), b_i),\ i = 1,\dots,D'(h),$$
Figure 5: Discretization of $\Omega$, $u$, $\omega_1$ and $\omega_2$.

and the $x_1$-coordinates of the associated moving nodes are given by an equidistant partition of the segments
$$[0, u_h(a_i)] \quad \text{for } x_2^i < \gamma \text{ or } x_2^i > \delta,$$
$$[0, \omega_{1h}(b_i)],\quad [\omega_{1h}(b_i), \omega_{2h}(b_i)],\quad [\omega_{2h}(b_i), u_h(a_i)] \quad \text{for } \gamma \leq x_2^i \leq \delta;$$
see Fig. 5. Analogously to $\mathbf{U}_{ad}$, we further introduce a set $\mathbf{V}_{ad}$ by
$$\mathbf{V}_{ad} = \left\{ \mathbf{u} \in \mathbb{R}^{D(h)+2D'(h)} \;\middle|\; u^i = u_h(a_i),\ i = 1,\dots,D(h),\ \text{and the remaining components are } \omega_{1h}(b_i),\ \omega_{2h}(b_i),\ i = 1,\dots,D'(h), \right.$$
$$\left. \text{for some } u_h \in U^h_{ad},\ (\omega_{1h}, \omega_{2h}) \in U'^h_{ad} \right\},$$
i.e., $\mathbf{V}_{ad}$ contains the vectors of $x_1$-coordinates of all principal moving nodes.

For the construction of the objective we utilize the fact that if $y^i > 0$ at some node in $(\mathcal{P}(\mathbf{u}))_h$, then the coordinate $\lambda^i$ of the appropriate (unique) Kuhn-Tucker vector must be zero. As
$$\boldsymbol{\lambda} = \mathbf{A}(\mathbf{u})\mathbf{y} + \mathbf{A}(\mathbf{u})\Psi(\mathbf{u}) - \mathbf{f}(\mathbf{u}),$$
we may employ the objective
$$E'(\mathbf{u}, \mathbf{y}) = \sum_{i \in \mathcal{D}_1} \left( \mathbf{A}(\mathbf{u})(\mathbf{y} + \Psi(\mathbf{u})) - \mathbf{f}(\mathbf{u}) \right)^i, \quad \text{where } \mathcal{D}_1 = \{1, 2, \dots, N\} \setminus \mathcal{D}_0.$$
Thus, the discretized problem of identification of the incidence set can be formulated as
$$(P'_r)_h \qquad E'_r(\mathbf{u}, \mathbf{y}) = \sum_{i \in \mathcal{D}_1} \left( \mathbf{A}(\mathbf{u})(\mathbf{y} + \Psi(\mathbf{u})) - \mathbf{f}(\mathbf{u}) \right)^i + \frac{r}{h} \sum_{i \in \mathcal{D}_0} y^i \to \inf$$
$$\text{subject to} \quad \mathbf{u} \in \mathbf{V}_{ad},\ \mathbf{y} \text{ solves } (\mathcal{P}(\mathbf{u}))_h,$$
where $r > 0$ is the appropriate penalty parameter. Of course, even if $E'_r = 0$, "semiactive" contacts with zero multipliers may occur; however, this undesirable phenomenon has not been observed in computations.

Problem $(P'_r)_h$ may again be solved numerically by using the direct method and a suitable NDO routine. In this case, however, the computation of subgradients according to (21), (22) is slightly more complicated than in the packaging problem, because the evaluation of $\nabla_{\mathbf{u}} E'_r$, $\nabla_{\mathbf{y}} E'_r$ is nontrivial and the dependence of $\mathbf{A}$, $\mathbf{f}$ on the design variable is more complex.

Example 3.3: Consider the problem of identification of the incidence set where $f(x_1,x_2) = -1.0$ and $\psi(x_1,x_2) = -0.03$. The sets $U_{ad}$, $U'^h_{ad}$ are specified by the parameters $c_1 = 0.7$, $c_2 = 1.2$, $c_3 = 2.5$, $\gamma = 0.25$, $\delta = 0.75$, $\varepsilon = 0.15$, $\Delta = 0.05$, $c_4 = 0.65$. The proper choice of the (linear) penalty parameter $r$ is more difficult than in the obstacle problem, because both terms in the objective functional $E'_r$ are of the same nature and thus $r$ determines a scaling between them. For $r = 33$, $h = \frac{1}{16}$ and $h = \frac{1}{32}$, the final values of the objective functional are $E_r'^{\,opt} = 0$ and $E_r'^{\,opt} = 0.395618 \cdot 10^{-4}$, respectively. In Figure 6 we see the initial design of $\Omega$ and $\Omega_0$ and the corresponding isolines of the solution. Figure 7 shows the optimal design for $h = \frac{1}{32}$; the boundary of $\Omega_0$ in fact coincides with the isoline $-0.03$. Finally, Figure 8 shows a 3D view of the solution and the obstacle with changed scaling on the vertical axis. We see that the problem constraints are satisfactorily satisfied. The obtained results could not be compared with any other ones, because in [9] there is no attempt to solve the problem numerically.
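A minimal sketch of evaluating the complementarity-based objective $E'_r$ (the matrices and vectors below are illustrative toy data; `lam` stands for the Kuhn-Tucker vector $\boldsymbol{\lambda} = \mathbf{A}(\mathbf{u})\mathbf{y} + \mathbf{A}(\mathbf{u})\Psi(\mathbf{u}) - \mathbf{f}(\mathbf{u})$):

```python
import numpy as np

# E'_r sums the multipliers lam^i outside D0 (penalizing contact there) and,
# weighted by r/h, the deflections y^i on D0 (forcing contact there).
def incidence_objective(A, Psi, f, y, D0, r, h):
    lam = A @ (y + Psi) - f                     # Kuhn-Tucker vector of (P(u))_h
    D1 = [i for i in range(len(y)) if i not in D0]
    return lam[D1].sum() + (r / h) * y[list(D0)].sum()

# Toy data: 3 nodes, node 0 lies in Omega_0.
A = np.eye(3)
Psi = np.zeros(3)
f = np.array([0.5, -1.0, 0.0])
y = np.array([0.0, 2.0, 1.0])
val = incidence_objective(A, Psi, f, y, D0={0}, r=2.0, h=0.5)
```

The value is zero exactly when all multipliers vanish off $\Omega_0$ and the membrane touches the obstacle everywhere on $\Omega_0$, which is the discrete expression of "contact exclusively on $\Omega_0$".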
4  Conclusion
The direct method proposed in [18] and further developed in this paper has proved to be an effective tool for the numerical solution of the considered shape optimization problems. It may be recommended whenever the number of design variables is small with respect to the number of state variables and a high accuracy of the results is required. Other applications of this technique are reported in [11, 12].
Figure 6: Initial design for Ex. 3.3, D(h) = 33.
Figure 7: Optimal design for Ex. 3.3, D(h) = 33.
Figure 8: Optimal state solution for Ex. 3.3, D(h) = 33.
References

[1] J. P. Aubin and I. Ekeland, Applied Nonlinear Analysis, J. Wiley & Sons, New York, 1984.
[2] F. H. Clarke, Optimization and Nonsmooth Analysis, J. Wiley & Sons, New York, 1983.
[3] B. Cornet and G. Laroque, Lipschitz properties of solutions in mathematical programming, Journal of Optimization Theory and Applications 53 (1987) 407-427.
[4] A. V. Fiacco, Sensitivity analysis for nonlinear programming using penalty methods, Mathematical Programming 10 (1976) 287-311.
[5] T. L. Friesz, R. L. Tobin, H.-J. Cho and N. J. Mehta, Sensitivity analysis based heuristic algorithms for mathematical programs with variational inequality constraints, Mathematical Programming 48 (1990) 265-284.
[6] P. T. Harker and S. C. Choi, A penalty function approach for mathematical programs with variational inequality constraints, WP 87-09-08, Dep. Dec. Sci., Univ. of Pennsylvania, 1987.
[7] J. Haslinger and P. Neittaanmaki, On the design of the optimal covering of an obstacle, in: Lecture Notes in Control Inf. Sci. 199, Springer-Verlag, Berlin, (1988) 192-209.
[8] J. Haslinger and P. Neittaanmaki, Finite Element Approximation for Optimal Shape Design: Theory and Applications, J. Wiley & Sons, Chichester, 1988.
[9] K.-H. Hoffmann and J. Haslinger, On the identification of incidence set for elliptic free boundary value problems, DFG Research Report No. 174, Augsburg, 1989.
[10] K. Jitorntrum, Solution point differentiability without strict complementarity in nonlinear programming, Mathematical Programming Study 21 (1984) 127-138.
[11] M. Kocvara and J. V. Outrata, Shape optimization of elasto-plastic bodies governed by variational inequalities, in: J.-P. Zolesio, editor, Boundary Control and Variation, Lecture Notes in Pure and Applied Mathematics 163, pages 261-271, Marcel Dekker, 1994.
[12] M. Kocvara and J. V. Outrata, A numerical approach to the design of masonry structures, in: Proceedings 16th IFIP Conf. on System Modelling and Optimization, Compiegne, July 5-9, 1993 (to appear).
[13] M. Kocvara and J. Zowe, An iterative two-step algorithm for linear complementarity problems, Numerische Mathematik 68 (1994) 95-106.
[14] J. Kyparisis, Sensitivity analysis for nonlinear programs and variational inequalities with nonunique multipliers, Mathematics of Operations Research 15 (1990) 286-298.
[15] P. Loridan and J. Morgan, A theoretical approximation scheme for Stackelberg problems, Journal of Optimization Theory and Applications 61 (1989) 95-110.
[16] O. L. Mangasarian, Nonlinear Programming, McGraw-Hill, New York, 1969.
[17] M. Makela and P. Neittaanmaki, Nonsmooth Optimization, World Scientific, Singapore, 1992.
[18] J. V. Outrata, On the numerical solution of a class of Stackelberg problems, Zeitschrift fur Operations Research 34 (1990) 255-277.
[19] H. Schramm and J. Zowe, A version of the bundle idea for minimizing a nonsmooth function: Conceptual idea, convergence analysis, numerical results, SIAM Journal on Optimization 2 (1992) 121-152.
[20] H. von Stackelberg, The Theory of Market Economy, Oxford University Press, Oxford, 1952.
Monotonicity and Quasimonotonicity

Recent Advances in Nonsmooth Optimization, pp. 193-214
Eds. D.-Z. Du, L. Qi and R. S. Womersley
©1995 World Scientific Publishing Co Pte Ltd
Monotonicity and Quasimonotonicity in Nonsmooth Analysis$^{1,2}$

Sandor Komlosi
Faculty of Economics, Janus Pannonius University
H-7621 Pecs, Rakoczi ut 80, Hungary
Abstract
The role of the monotonicity concept for linear operators, bifunctions and multifunctions in several branches of Mathematics, such as Functional Analysis, Nonlinear Analysis and Optimization Theory, is rather well known. Quite recently the notion of monotonicity has been generalized by different authors. The aim of the present paper is to give some recent contributions to the theory of generalized monotonicity in a unified way, incorporating results for a wide range of generalized derivatives and subdifferentials.
1  Introduction
The role of the monotonicity concept for linear operators, bifunctions and multifunctions in several branches of Mathematics, such as Functional Analysis, Nonlinear Analysis and Optimization Theory, and its relationships with convexity are rather well known [1,2,8,30,35,36]. As an instance we mention only two well-known theorems from Convex Analysis: (i) a directionally differentiable lower semicontinuous function is convex if and only if its directional derivative is a monotone bifunction; (ii) a lower semicontinuous function is convex if and only if its convex subdifferential is a nonvoid monotone multifunction [30,33].

$^1$This paper is the written version of the invited lecture delivered by the author at the "Workshop on Nonsmooth Analysis and its Applications" (Banach Center, Warsaw, May 3-14, 1993), and was completed in October 1993, during the author's stay at the Department of Mathematics, University of Pisa. I take the opportunity of thanking Professor Franco Giannessi and Professor Massimo Pappalardo for their warm hospitality.
$^2$Partially supported by the Hungarian National Scientific Research Foundation (Grant No. T013967).

Since it has been well known that the convexity assumption on the function can be weakened to certain kinds of generalized convexity assumptions without "destroying" the nice results valid for the convex case, there have recently been similar attempts to weaken the monotonicity assumption to some kind of generalized monotonicity concept. In 1976 Karamardian introduced a concept of pseudomonotonicity for gradient maps [16]. Some years later, in 1983, Hassouni introduced the notion of quasimonotonicity for multifunctions [14]. The paper of Karamardian and Schaible [17] from 1990, where several kinds of generalized monotonicity were introduced and studied for gradient maps, might be considered as the one opening a new theory, the theory of generalized monotonicity. These concepts were extended to nondifferentiable functions (for generalized derivatives and subdifferential maps) by Komlosi [19-25] and Luc [27,28] and to Equilibrium Problems by Blum and Oettli [2]. Further remarkable results on this topic can be found in [3,4,5,9,10,12,15,31,36,37].

The aim of the present paper is to give some recent contributions to the theory of generalized monotonicity in a unified way, incorporating results for a wide range of generalized derivatives and subdifferentials. The paper is organized as follows: Chapter 2 gives a short overview of monotonicity concepts which have proved useful in several branches of optimization theory and related topics. Chapter 3 is devoted to the introduction of different generalized monotonicity concepts for bifunctions (generalized derivatives) and the study of their interrelations with certain kinds of generalized convexity. In Chapter 4 we show how monotonicity can be characterized via quasimonotonicity. These results are applied in Chapter 5, where quasimonotonicity-quasiconvexity results are proved by applying suitable mean value theorems, due to Diewert and Zagrodny, respectively. Chapter 6 is devoted to the generalized monotonicity of multifunctions, for which different characterizations are given. For special subdifferentials, generalized monotonicity is linked with the generalized monotonicity of the generalized derivatives associated with them.
2  Monotonicity Concepts in Nondifferentiable Optimization
Let $X$, $X^*$ and $\langle \cdot, \cdot \rangle$ denote, respectively, a real Banach space, its topological dual and the canonical bilinear form on $X^* \times X$ throughout this paper. The following models represent the most favourable classes of optimization theory and related fields.

Mathematical Programming Problem (MPP): minimize $f(x)$ subject to $x \in C$, $C \subseteq X$,
where $f \colon C \to \mathbb{R}$.

Complementarity Problem (CP): find $x \in X$ such that
$$x \in C, \quad F(x) \in C^* \quad \text{and} \quad \langle F(x), x \rangle = 0,$$
where $C \subseteq X$ is a given convex cone, $C^* \subseteq X^*$ is the dual cone to $C$ and $F \colon C \to X^*$ is a given function.

Variational Inequality Problem (VIP): find $x \in X$ and $y \in X^*$ such that
$$x \in C, \quad y \in T(x) \quad \text{and} \quad \langle y, z - x \rangle \geq 0 \ \text{for all } z \in C,$$
where $C \subseteq X$ and $T \colon C \rightrightarrows X^*$ is a multifunction.

Equilibrium Problem (EP): find $x \in X$ such that
$$f(x, y) \geq 0 \quad \text{for all } y \in C,$$
where $C \subseteq X$ and $f \colon C \times C \to \mathbb{R}$ with $f(z, z) = 0$ for all $z \in C$.

There are interesting interrelations between these classes of problems: (MPP) can be formulated as (CP), (CP) can be given in the form of (VIP), and all of them may take the form of (EP). More details on these interrelations can be found in [2,8,37]. It is well known that in existence proofs for (MPP) the convexity of the objective function $f(x)$ is a celebrated property, whereas for the other problems (CP), (VIP) and (EP) the monotonicity of the functions $F(x)$, $f(x,y)$ and of the multifunction $T(x)$ guarantees the existence of solutions and supports algorithms computing them. For the sake of convenience let us recall the definitions of monotonicity for the different cases considered.

a) $F \colon C \to X^*$, $C \subseteq X$:
$$\langle F(y) - F(x), y - x \rangle \geq 0 \quad \text{for all } x, y \in C. \qquad (1)$$

b) $T \colon C \rightrightarrows X^*$, $C \subseteq X$:
$$\langle T(y) - T(x), y - x \rangle \subseteq \mathbb{R}_+ \quad \text{for all } x, y \in C. \qquad (2)$$
Here $\langle T(z), u \rangle$ denotes the set $\{\langle z^*, u \rangle \mid z^* \in T(z)\}$ and $\mathbb{R}_+$ denotes the set of nonnegative reals.

c) $f \colon C \times C \to \mathbb{R}$ with $f(z, z) = 0$ for all $z \in C$:
$$f(x, y) + f(y, x) \leq 0 \quad \text{for all } x, y \in C. \qquad (3)$$
The interrelations between monotonicity and convexity are well known; moreover, behind both properties there are well-developed theories: Convex Analysis and Minty's theory [35]. By a classical result, the monotonicity of the directional derivative $f'(x, d)$ completely characterizes the convexity of the given function $f(x)$. Recently, similar characterizations were obtained in terms of some other kinds of generalized derivatives, such as the Clarke, Rockafellar and Dini derivatives [6,22,26].
3  Generalized Monotonicity for Bifunctions
Generalized derivatives may be considered in a unified way as a bifunction $h(x,d)$ with finite or infinite real values, where $x$ refers to a given point of a given subset $C$ of $X$ and $d$ refers to a given direction of $X$. Since the domain for $d$ is always the whole space $X$, it is sufficient to specify the domain of $x$ only, and so, for the sake of brevity, we will say that $h(x,d)$ is a bifunction defined on $C$. Let us begin by introducing generalized monotonicity concepts for generalized derivatives which have proved useful for nondifferentiable functions [19].

Definition 3.1 Let $h(x,d)$ be a bifunction defined on the convex set $C$. $h(x,d)$ is called monotone, strictly monotone, quasimonotone, pseudomonotone or strictly pseudomonotone on $C$ if, for every $y, z \in C$, $y \neq z$, condition (M), (SM), (QM), (PM) or (SPM) holds, respectively:

(M) $\quad h(y, z-y) + h(z, y-z) \leq 0$,
(SM) $\quad h(y, z-y) + h(z, y-z) < 0$,
(QM) $\quad h(y, z-y) > 0$ implies $h(z, y-z) \leq 0$,
(PM) $\quad h(y, z-y) \geq 0$ implies $h(z, y-z) \leq 0$,
(SPM) $\quad h(y, z-y) \geq 0$ implies $h(z, y-z) < 0$.

Remark. In the definitions of (M) and (SM) we adopt the following rule: $(+\infty) + (-\infty) = 0$. The following interrelations are immediate consequences of the above definitions:

(SM) ⇒ (M) ⇒ (PM) ⇒ (QM)
and

(SM) ⇒ (SPM) ⇒ (PM) ⇒ (QM).

Remark. If $f(x)$ is differentiable with gradient map $F(x) = f'(x)$, then the monotonicity concept (1) for $F(x)$ coincides with (M) if we apply it to $h(x,d) = \langle F(x), d \rangle$. It should be mentioned that for this choice of $h(x,d)$ the other kinds of generalized monotonicity concepts give exactly the ones introduced by Karamardian and Schaible [17]. If $h(x,d)$ is a generalized derivative, then (M) is equivalent to (3) with $f(x,y) = h(x, y-x)$. One of the most useful applications of the generalized monotonicity concepts is that these properties can be related to appropriate kinds of generalized convexity [15,17,20,23,27,28]. To proceed this way, let us first recall some definitions.

Definition 3.2 Let the function $f(x)$ and the bifunction $h(x,d)$ be defined on the convex set $C \subseteq X$. $f(x)$ is called convex, strictly convex, quasiconvex, h-quasiconvex, h-pseudoconvex or h-strictly pseudoconvex on $C$ if condition (CX), (SCX), (QCX), (h-QCX), (h-PCX) or (h-SPCX) holds, respectively:

(CX): for all $x, y \in C$ and $t \in [0,1]$ one has
$$f(tx + (1-t)y) \leq t f(x) + (1-t) f(y);$$

(SCX): for all $x, y \in C$, $x \neq y$, and $t \in (0,1)$ one has
$$f(tx + (1-t)y) < t f(x) + (1-t) f(y);$$

(QCX): for all $x, y \in C$ and $t \in [0,1]$ one has
$$f(tx + (1-t)y) \leq \max\{f(x), f(y)\};$$

(h-QCX): for all $x, y \in C$,
$$f(x) \leq f(y) \quad \text{implies} \quad h(y, x-y) \leq 0;$$

(h-PCX): for all $x, y \in C$,
$$f(x) < f(y) \quad \text{implies} \quad h(y, x-y) < 0;$$

(h-SPCX): for all $x, y \in C$, $x \neq y$,
$$f(x) \leq f(y) \quad \text{implies} \quad h(y, x-y) < 0.$$

Remark. The above definitions can be applied to functions $f(x)$ and bifunctions $h(x,d)$ with 'extended' real values as well. The following interrelations are immediate consequences of the definitions:

(SCX) ⇒ (CX) ⇒ (QCX),
(h-SPCX) ⇒ (h-PCX) ⇒ (h-QCX).
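Condition (QM) can likewise be probed numerically when $h(x,d)$ is taken as a finite-difference surrogate of the upper Dini derivative. In the following sketch the test functions, the sample grid and the tolerances are illustrative assumptions, not data from the paper:

```python
import numpy as np

# Forward-difference surrogate of the upper Dini derivative h(x, d).
def dini_upper(f, x, d, t=1e-7):
    return (f(x + t * d) - f(x)) / t

def violates_QM(f, points, tol=1e-6):
    """Return a pair (y, z) violating (QM), i.e. with
    h(y, z-y) > 0 and h(z, y-z) > 0, or None if no sampled pair violates it."""
    for y in points:
        for z in points:
            if np.allclose(y, z):
                continue
            if dini_upper(f, y, z - y) > tol and dini_upper(f, z, y - z) > tol:
                return y, z
    return None

pts = [np.array([t]) for t in np.linspace(-2.0, 2.0, 21)]

quasiconvex = lambda x: np.sqrt(abs(x[0]))   # quasiconvex on R (convex level sets)
wavy = lambda x: np.sin(3.0 * x[0])          # not quasiconvex on [-2, 2]

bad_pair_qc = violates_QM(quasiconvex, pts)  # None: no sampled violation
bad_pair_wavy = violates_QM(wavy, pts)       # a violating pair is found
```

For the quasiconvex function no sampled pair violates (QM), in line with the h-quasiconvexity results of this chapter, while the oscillating function exhibits a pair of points at which the derivative is positive in both mutual directions.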
In the sequel we shall focus our attention on some special kinds of generalized derivatives of a given function f(x) at a, whose definitions are given below using the following notations [13]:

(z,α) ↓ a  ⟺  (z,α) → (a, f(a)) and α ≥ f(z),
(z,α) ↑ a  ⟺  (z,α) → (a, f(a)) and α ≤ f(z).

Dini derivatives (upper and lower):

f^D(a,d) := limsup_{t→0+} [f(a + td) − f(a)] / t,

f_D(a,d) := liminf_{t→0+} [f(a + td) − f(a)] / t.

Dini-Hadamard derivatives (upper and lower):

f^{DH}(a,d) := limsup_{t→0+, u→d} [f(a + tu) − f(a)] / t,

f_{DH}(a,d) := liminf_{t→0+, u→d} [f(a + tu) − f(a)] / t.
Clarke derivatives (upper and lower):

f^C(a,d) := limsup_{t→0+, (z,α)↓a} [f(z + td) − α] / t,

f_C(a,d) := liminf_{t→0+, (z,α)↑a} [f(z + td) − α] / t.

Rockafellar derivatives:

f^R(a,d) := limsup_{t→0+, (z,α)↓a} inf_{u→d} [f(z + tu) − α] / t,

f_R(a,d) := liminf_{t→0+, (z,α)↑a} sup_{u→d} [f(z + tu) − α] / t.

Weak Rockafellar derivatives:

f^{wR}(a,d) := limsup_{t→0+} inf_{u→d} [f(a + tu) − f(a)] / t,
Monotonicity and
Quasimonotonicity
199
f_{wR}(a,d) := liminf_{t→0+} sup_{u→d} [f(a + tu) − f(a)] / t.

Remark. The "limsup inf" and "liminf sup" operations were introduced by Rockafellar [34,35]. The meaning of the operation

limsup_{t→0+, (z,α)↓a} inf_{u→d},

for instance, is the following:

sup_{ε>0} limsup_{t→0+, (z,α)↓a} inf_{‖u−d‖<ε}.
The meanings of the other operations used above are similar.

The following lemma provides the relationships between these derivatives according to the partial order "≤"; the proof can be found in [13].

Lemma 3.3 For all x ∈ C and d ∈ X one has

f^S(x,d)  ≥  f^{DH}(x,d)  ≥  f_{wR}(x,d)  ≥  f_R(x,d)
    ∨              ∨               ∨              ∨
f^C(x,d)  ≥  f^D(x,d)    ≥  f_D(x,d)     ≥  f_C(x,d)
    ∨              ∨               ∨              ∨
f^R(x,d)  ≥  f^{wR}(x,d) ≥  f_{DH}(x,d)  ≥  f_I(x,d)

where the "unnamed" derivatives f^S(x,d) and f_I(x,d) are defined as follows:

f^S(a,d) := limsup_{t→0+, u→d, (z,α)↓a} [f(z + tu) − α] / t,

f_I(a,d) := liminf_{t→0+, u→d, (z,α)↑a} [f(z + tu) − α] / t.
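The Dini-type quotients above can be approximated numerically. A minimal sketch of mine (illustrative only: the max/min over a finite sample of t is just a crude proxy for limsup/liminf, adequate for well-behaved functions):

```python
def upper_dini(f, a, d, ts=None):
    """Approximate f^D(a, d) = limsup_{t->0+} [f(a + t*d) - f(a)] / t
    by sampling the difference quotient at a few small positive t."""
    ts = ts or [10.0 ** (-k) for k in range(3, 8)]
    return max((f(a + t * d) - f(a)) / t for t in ts)

def lower_dini(f, a, d, ts=None):
    """Approximate f_D(a, d) = liminf_{t->0+} [f(a + t*d) - f(a)] / t."""
    ts = ts or [10.0 ** (-k) for k in range(3, 8)]
    return min((f(a + t * d) - f(a)) / t for t in ts)

# For f(x) = |x| at a = 0 the two Dini derivatives coincide:
# f^D(0, d) = f_D(0, d) = |d| in every direction d.
assert abs(upper_dini(abs, 0.0, 1.0) - 1.0) < 1e-6
assert abs(lower_dini(abs, 0.0, -1.0) - 1.0) < 1e-6
```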
Remark. It should be mentioned that for lower semicontinuous functions the convergence (z,α) ↓ a is equivalent to the simpler convergence z →_f a, whose meaning is

z →_f a  ⟺  (z, f(z)) → (a, f(a)),

and in this case α can be replaced with f(z) in the difference quotient. Taking into account the above relationships, the following "majoring criterion" might be very useful.
Lemma 3.4 [23] Let the bifunctions h(x,d) and g(x,d) be defined on the set C ⊆ X. Assume that for all x ∈ C and d ∈ X

g(x,d) ≤ h(x,d).

If h(x,d) is (strictly) monotone, quasimonotone or (strictly) pseudomonotone on C, then g(x,d) is (strictly) monotone, quasimonotone or (strictly) pseudomonotone on C, respectively, as well. The proof is quite simple and thus omitted.

The following two theorems provide a basis for studying the links between generalized convexity and generalized monotonicity of different kinds of generalized derivatives.

Theorem 3.5 Let f(x) be radially lower semicontinuous on the convex set C and let either h(x,d) = f^D(x,d) or h(x,d) = f_D(x,d). Then each of the following statements is true:

(i) f(x) is convex on C iff h(x,d) is monotone on C,
(i*) f(x) is strictly convex on C iff h(x,d) is strictly monotone on C,
(ii) f(x) is quasiconvex on C iff h(x,d) is quasimonotone on C,
(iii) f(x) is (strictly) h-pseudoconvex on C iff h(x,d) is (strictly) pseudomonotone on C.

The proof of (i) and (i*) can be found in the papers [22,26]. Statement (ii) was proved in [19,20,27], whereas the proof of (iii) was given in [23].

Theorem 3.6 Let f(x) be lower semicontinuous on the open convex set C ⊆ X. Then each of the following statements is true:

(i) f(x) is convex on C iff f^R(x,d) is monotone on C,
(ii) f(x) is quasiconvex on C iff f^R(x,d) is quasimonotone on C.
The proof of (i) of the above theorem was given in [26], whereas statement (ii) was proved in [27]. The following result is an immediate consequence of Theorem 3.5 and Lemma 3.3.

Theorem 3.7 Let f(x) be radially lower semicontinuous on the convex set C and let the bifunction h(x,d) satisfy, for all x ∈ C and d ∈ X,

f_D(x,d) ≤ h(x,d).

Then the quasimonotonicity of h(x,d) on C implies the quasiconvexity of f(x) on C.
Remark. The above theorem can be applied to any member of the following collection:

f^S(x,d), f^C(x,d), f^{DH}(x,d), f^D(x,d), f_{DH}(x,d), f_D(x,d).
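For a differentiable function, the induced bifunction h(x,d) = ⟨f′(x), d⟩ makes the convexity-monotonicity link of Theorem 3.5 easy to test numerically. A small sketch of mine (sample grids are arbitrary; this illustrates, and does not prove, the equivalence):

```python
def grad_bifunction(fprime):
    """h(x, d) = <f'(x), d> -- the bifunction induced by the gradient map."""
    return lambda x, d: fprime(x) * d

def is_monotone_bifunction(h, xs):
    """Monotonicity in the bifunction sense: h(x, y-x) + h(y, x-y) <= 0."""
    return all(h(x, y - x) + h(y, x - y) <= 1e-9 for x in xs for y in xs)

xs = [i / 5.0 for i in range(-10, 11)]

h_convex = grad_bifunction(lambda x: 2 * x)          # f(x) = x^2, convex
h_nonconvex = grad_bifunction(lambda x: 3 * x * x)   # f(x) = x^3, not convex

assert is_monotone_bifunction(h_convex, xs)
assert not is_monotone_bifunction(h_nonconvex, xs)
```

For the convex f(x) = x², the sum h(x, y−x) + h(y, x−y) collapses to −2(x−y)² ≤ 0, so the check passes identically; for f(x) = x³ it equals −3(x−y)²(x+y), which is positive whenever x + y < 0.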
From Theorem 3.6 and Lemma 3.3 one can infer the following statement.

Theorem 3.8 Let f(x) be lower semicontinuous on the open convex set C and let the bifunction h(x,d) satisfy, for all x ∈ C and d ∈ X,

h(x,d) ≤ f^R(x,d).

Then the convexity of f(x) on C implies the monotonicity of h(x,d) on C, and the quasiconvexity of f(x) on C implies the quasimonotonicity of h(x,d) on C.

Lemma 3.9 Let the bifunctions h(x,d) and g(x,d) be defined on the open set C and assume that for all x ∈ C and d ∈ X

g(x,d) ≤ limsup_{z→x} inf_{u→d} h(z,u).   (4)

Then
(i) the monotonicity of h(x,d) on C implies the monotonicity of g(x,d) on C,
(ii) the quasimonotonicity of h(x,d) on C implies the quasimonotonicity of g(x,d) on C.

Proof. First we prove assertion (ii). Suppose that h(x,d) is quasimonotone on C and assume for contradiction that g(x,d) fails to be quasimonotone on C, that is, there exist x, y ∈ C such that

g(x, y − x) > 0 and g(y, x − y) > 0.

Due to (4) it follows that there exist x′, y′ ∈ C such that

h(x′, y′ − x′) > 0 and h(y′, x′ − y′) > 0,

which contradicts the quasimonotonicity of h(x,d). The same reasoning can be applied for proving (i). ■
Since for lower semicontinuous functions we have

f_{DH}(x,d) ≤ f^R(x,d) ≤ limsup_{z→x} inf_{u→d} f_{DH}(z,u)

(see [39]), therefore, by Lemmas 3.3 and 3.9, we arrive at the following result.
Theorem 3.10 Let f(x) be lower semicontinuous on the open set C. Then the following statements are true:

(i) f_{DH}(x,d) is monotone on C iff f^R(x,d) is monotone on C,
(ii) f_{DH}(x,d) is quasimonotone on C iff f^R(x,d) is quasimonotone on C.

Taking into account the above result, Theorem 3.8 can be sharpened as follows.

Theorem 3.11 Let f(x) be lower semicontinuous on the open convex set C and let the bifunction h(x,d) satisfy, for all x ∈ C and d ∈ X,
f_{DH}(x,d) ≤ h(x,d) ≤ f^R(x,d).

Then f(x) is convex on C iff h(x,d) is monotone on C, and f(x) is quasiconvex on C iff h(x,d) is quasimonotone on C.

4 Characterizing Monotonicity in Terms of Quasimonotonicity
It is worth mentioning that the proofs of the convexity-monotonicity and quasiconvexity-quasimonotonicity interrelations have so far been elaborated independently of each other, in both of the cases of Theorems 3.5 and 3.6 (see [22,26,27]). What is rather surprising, due to the following two lemmas the convexity-monotonicity statements can, however, be proved by applying the quasiconvexity-quasimonotonicity results directly.

Lemma 4.1 The function f(x) is convex on the convex set C ⊆ X iff the function F(x) = f(x) + ⟨g, x⟩ is quasiconvex on C for all g ∈ X′.

Proof. Necessity: Obvious, since the sum of a convex and a linear function is always convex, hence quasiconvex.
Sufficiency: Assume that F(x) = f(x) + ⟨g, x⟩ is quasiconvex on C for all g ∈ X′. Suppose for contradiction that f(x) fails to be convex on C. This means that there exist two distinct points x, y in C and a third point z = tx + (1−t)y on the open line segment (x,y) such that

t f(x) + (1−t) f(y) < f(z).

By virtue of the Hahn-Banach Extension Theorem one can always find an appropriate g* ∈ X′ such that

F*(x) = F*(y) < F*(z),

where F*(x) = f(x) + ⟨g*, x⟩. The above conditions, however, contradict the quasiconvexity of F*(x). This contradiction proves the thesis. ■

Lemma 4.2 The bifunction h(x,d) is monotone on C iff the bifunction

H(x,d) = h(x,d) + ⟨g, d⟩

is quasimonotone on C for all g ∈ X′.

Proof. Necessity: Assume that h(x,d) is monotone on C. Let g ∈ X′ be arbitrary and set H(x,d) = h(x,d) + ⟨g, d⟩. Since

H(x, y − x) + H(y, x − y) = h(x, y − x) + h(y, x − y),

H(x,d) is also monotone, and thus quasimonotone on C, as well.

Sufficiency: Let H(x,d) be quasimonotone on C for all g ∈ X′. Since 0 ∈ X′, it follows that h(x,d) itself is quasimonotone on C, as well. Assume for contradiction that h(x,d) fails to be monotone. This means that there exist two distinct points x and y in C such that

h(x, y − x) + h(y, x − y) > 0.   (5)
Without loss of generality we may assume that

h(x, y − x) > 0.

From the quasimonotonicity of h(x,d) it follows that h(y, x − y) ≤ 0. Taking into account inequality (5) we get

0 ≤ −h(y, x − y) < h(x, y − x).

By virtue of the Hahn-Banach Extension Theorem one can always find a g* ∈ X′ such that

−h(y, x − y) < ⟨g*, x − y⟩ < h(x, y − x).

If we consider H*(x,d) = h(x,d) + ⟨g*, d⟩, then the above inequalities yield

H*(x, y − x) > 0 and H*(y, x − y) > 0,

which contradicts the assumption that H*(x,d) is quasimonotone on C. This contradiction proves the thesis. ■

Applications of the above lemmas will be given in the next section.
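The tilt characterization of Lemma 4.1 can be probed numerically. The sketch below is my own illustration (the grid and the tilt range are arbitrary choices): it checks quasiconvexity of f(x) + g·x along a grid, for a convex f and for a double-well f.

```python
def is_quasiconvex_on_grid(f, xs):
    """Grid version of (QCX): no interior grid point of a segment may
    exceed both endpoint values."""
    n = len(xs)
    for i in range(n):
        for j in range(i + 2, n):
            bound = max(f(xs[i]), f(xs[j]))
            if any(f(xs[k]) > bound + 1e-9 for k in range(i + 1, j)):
                return False
    return True

xs = [i / 10.0 for i in range(-30, 31)]
f_convex = lambda x: x * x
f_doublewell = lambda x: x ** 4 - 2 * x * x   # two minima at x = -1, 1

tilts = [g / 2.0 for g in range(-8, 9)]
# Every tilt of the convex function stays quasiconvex ...
assert all(is_quasiconvex_on_grid(lambda x, g=g: f_convex(x) + g * x, xs)
           for g in tilts)
# ... while the double well already fails for some tilt (here g = 0).
assert not all(is_quasiconvex_on_grid(lambda x, g=g: f_doublewell(x) + g * x, xs)
               for g in tilts)
```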
5 Two Mean Value Theorems
First we present a proof of statement (ii) of Theorem 3.5. Then, by applying Lemmas 4.1 and 4.2, we obtain statement (i) of the same theorem. To proceed in this way we need the following mean value theorem due to Diewert [11, Theorem 1, Corollary 1].

Theorem 5.1 (Diewert's Mean Value Theorem) Let the function f(x) be defined on the line segment [y,z] and assume the values f(y), f(z) to be finite. Let s(t) = f(y + t(z − y)) be lower semicontinuous on [0,1]. Then there exists t₀ ∈ [0,1) such that

f_D(x₀, z − y) ≥ f(z) − f(y),

where x₀ = y + t₀(z − y).

Proposition 5.2 Let f(x) be radially lower semicontinuous on the convex set C. Then f(x) is quasiconvex on C iff either of the following conditions holds:

(i) f_D(x,d) is quasimonotone on C,
(ii) f^D(x,d) is quasimonotone on C.

Proof. (i): Necessity. Assume that f(x) is quasiconvex on the convex set C and

f_D(x, y − x) > 0,   (6)

where x, y are arbitrary elements of C. Our task now is to prove that f_D(y, x − y) ≤ 0. It is easy to verify that the following implication is true for f(x) on C [11]:

u, v ∈ C, f(u) ≤ f(v) ⟹ f_D(v, u − v) ≤ 0.   (7)

From (6), f(x) < f(y) follows, which, together with (7), gives f_D(y, x − y) ≤ 0.

Sufficiency. The proof given here is a slightly modified version of the one given by Luc in [27]. Let f_D(x,d) be quasimonotone on C. For contradiction suppose that f(x) fails to be quasiconvex. This means that there exist a line segment [a,b] ⊆ C and a point z on it, z ∈ (a,b), such that

f(z) > max{f(a), f(b)}.

Due to the quasimonotonicity of f_D(x,d) we may assume that f(z) is finite. Indeed, assume for contradiction that f(z) = +∞ for all z ∈ (a,b). Then we have f_D(a, b − a) = f_D(b, a − b) = +∞, which contradicts the quasimonotonicity of f_D(x,d) on C. Since f(a), f(b) and f(z) are all finite, there exist, by Diewert's Mean Value Theorem, u ∈ [a,z) and w ∈ [b,z) satisfying the conditions

f_D(u, z − a) ≥ f(z) − f(a) > 0 and f_D(w, z − b) ≥ f(z) − f(b) > 0.
Taking into account the positive homogeneity of the Dini derivative with respect to its direction argument, the last two inequalities provide the following ones:

f_D(u, w − u) > 0 and f_D(w, u − w) > 0,

contradicting the quasimonotonicity assumption. This contradiction proves statement (i) of the present proposition. Since f_D(x,d) ≤ f^D(x,d), statement (ii) follows immediately from (i) above and Lemma 3.4. ■

Now we present a proof of statement (ii) of Theorem 3.6, capturing the essence but simplifying the details of the original one [27]. Then, by applying Lemmas 4.1 and 4.2, we obtain statement (i) of the same theorem. The same approach can be found in [41].

Let us consider now the generalized derivative f^R(a,d), with 'extended' real values, and the subdifferential δ^R f(a), possibly empty, associated with it (in the sense of Definition 6.1) as follows:

δ^R f(a) = { g ∈ X′ | ⟨g, d⟩ ≤ f^R(a,d) for all d ∈ X }.

The basic tool in proving the above claim is the following approximate mean value theorem due to Zagrodny [38, Theorem 4.3].

Theorem 5.3 (Zagrodny's Mean Value Theorem) Let f(x) be a proper lower semicontinuous function defined on an arbitrary Banach space X. Let f(x) assume finite values at a, b ∈ X. Then there exist a point z ∈ [a,b) and two sequences {z_k}, z_k ∈ X, and {g_k}, g_k ∈ δ^R f(z_k), such that

lim_{k→+∞} z_k = z,
liminf_{k→+∞} ⟨g_k, b − z_k⟩ ≥ [(f(b) − f(a)) / ‖b − a‖] ‖b − z‖,   (8)

liminf_{k→+∞} ⟨g_k, b − a⟩ ≥ f(b) − f(a).   (9)

The following corollary to this theorem, which allows us to replace b in (8) with any element of the half ray y = a + t(b − a) ∈ C, t ≥ 1, will be of direct use in the sequel. This lemma was proved in [27] in a rather complicated way; here we present a quite simple proof of it.

Lemma 5.4 Let C be an open convex subset of a Banach space X and f(x) be a proper lower semicontinuous function defined on C. Let f(x) assume finite values at a, b ∈ C. Then there exist a point z ∈ [a,b) and two sequences {z_k}, z_k ∈ C, and {g_k}, g_k ∈ δ^R f(z_k), such that

lim_{k→+∞} z_k = z,
and for any y = a + t(b − a) ∈ C, t ≥ 1, we have

liminf_{k→+∞} ⟨g_k, y − z_k⟩ ≥ [(f(b) − f(a)) / ‖b − a‖] ‖y − z‖.

Proof. We shall prove that the sequences {z_k} and {g_k} from Theorem 5.3 meet the requirements of the present lemma. Since the set C is open, we may assume without loss of generality that z_k ∈ C for all k. Let us denote (f(b) − f(a))/‖b − a‖ by the simpler symbol K. Observe that

y − z_k = (y − b) + (b − z_k) = α(b − a) + (b − z_k),

with α = ‖y − b‖/‖b − a‖. It follows that

⟨g_k, y − z_k⟩ = α ⟨g_k, b − a⟩ + ⟨g_k, b − z_k⟩.

Taking into account the superadditivity of the liminf operation, the definitions of the scalars K and α, and applying (8) and (9), one can deduce that

liminf_{k→+∞} ⟨g_k, y − z_k⟩ ≥ liminf_{k→+∞} α ⟨g_k, b − a⟩ + liminf_{k→+∞} ⟨g_k, b − z_k⟩ ≥ K(‖y − b‖ + ‖b − z‖) = K ‖y − z‖,

which was to be proved. ■

Now we are ready to prove the following theorem.

Proposition 5.5 Let C be an open convex subset of a Banach space X and f(x) be a proper lower semicontinuous function defined on C. Then f(x) is quasiconvex on C iff the Rockafellar derivative f^R(x,d) is quasimonotone on C.

Proof. Necessity: For the proof we refer to [27].

Sufficiency: Let f^R(x,d) be quasimonotone on C. Assume for contradiction that f(x) fails to be quasiconvex on C. It follows that there exist two points a, b ∈ C with finite values f(a) and f(b) and a third point c on the open line segment (a,b) such that

f(c) > max{f(a), f(b)}.   (10)

Without loss of generality we may assume that f(c) is finite, as well. (If f(c) = +∞, then redefine f(x) only at x = c in such a way that (10) is fulfilled, and we arrive at the same conclusion.)

Apply first Lemma 5.4, the corollary to Zagrodny's Mean Value Theorem, to the line segment [a,c]. It follows that there exist a point z ∈ [a,c) and two sequences {z_k}, z_k ∈ C, and {g_k}, g_k ∈ δ^R f(z_k), satisfying the following conditions:

lim_{k→+∞} z_k = z,
and

liminf_{k→+∞} ⟨g_k, b − z_k⟩ ≥ [(f(c) − f(a)) / ‖c − a‖] ‖b − z‖ > 0.

(The last inequality is a consequence of (10).) It follows that for sufficiently large k one has

⟨g_k, b − z_k⟩ > 0.   (11)
Since z ≠ c, the sequence {c_k}, where c_k denotes the projection of c onto the closed line segment [z_k, b], admits the following properties:

c_k ∈ C, c_k ∈ (z_k, b) and {c_k} → c as k → +∞.

Since f(c) > f(b) and f(x) is lower semicontinuous, for sufficiently large k

f(c_k) > f(b).   (12)

Let k be arbitrary. Apply now Lemma 5.4 to the line segment [c_k, b]. It follows that there exist a point w_k ∈ [c_k, b) and two sequences {w_ki}, w_ki ∈ C, and {g_ki}, g_ki ∈ δ^R f(w_ki), satisfying the following conditions:

lim_{i→+∞} w_ki = w_k,

and

liminf_{i→+∞} ⟨g_ki, z_k − w_ki⟩ ≥ [(f(c_k) − f(b)) / ‖c_k − b‖] ‖z_k − w_k‖ > 0.   (13)

(The last inequality is a consequence of (12).) Let k be sufficiently large, satisfying (11) and (12). Due to the definition of w_k it is obvious that (11) is equivalent to

⟨g_k, w_k − z_k⟩ > 0.   (14)

According to (13) and (14) one can choose for this k a sufficiently large index i such that

⟨g_ki, z_k − w_ki⟩ > 0 and ⟨g_k, w_ki − z_k⟩ > 0

hold. Set x₁ = w_ki ∈ C and x₂ = z_k ∈ C. Since

f^R(x₁, x₂ − x₁) ≥ ⟨g_ki, z_k − w_ki⟩ > 0 and f^R(x₂, x₁ − x₂) ≥ ⟨g_k, w_ki − z_k⟩ > 0,

these inequalities contradict the quasimonotonicity of f^R(x,d). This contradiction proves the statement of the theorem. ■
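Diewert's Mean Value Theorem (Theorem 5.1) is easy to illustrate numerically in one dimension. The following sketch is my own (the one-sample difference quotient is only a crude proxy for f_D, and the segment search is a finite scan): it looks for a witness point whose Dini quotient dominates the increment f(z) − f(y).

```python
def dini_lower_quotient(f, x, d, t=1e-6):
    """One-sample proxy for f_D(x, d); the true value is a liminf as t -> 0+."""
    return (f(x + t * d) - f(x)) / t

def diewert_witness(f, y, z, n=2000):
    """Scan the segment [y, z) for a point x0 with
    f_D(x0, z - y) >= f(z) - f(y), as Diewert's theorem guarantees."""
    target = f(z) - f(y)
    for k in range(n):
        x0 = y + (k / n) * (z - y)
        if dini_lower_quotient(f, x0, z - y) >= target - 1e-6:
            return x0
    return None

# f(x) = x^3 on [0, 1]: f(1) - f(0) = 1 and f_D(x0, 1) = 3*x0^2, so any
# x0 >= sqrt(1/3) on the segment is a witness point.
w = diewert_witness(lambda x: x ** 3, 0.0, 1.0)
assert w is not None and 3 * w * w >= 1.0 - 1e-3
```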
6 Generalized Monotonicity for Multifunctions
Subdifferentials, playing a very important role in Nonlinear Analysis and related fields, are useful dual objects of geometric character for generalized derivatives. Let f(x) be a function defined on X and let a ∈ X. The convex subdifferential of f(x) at a is defined as follows:

∂f(a) := { g ∈ X′ | f(x) − f(a) ≥ ⟨g, x − a⟩ for all x ∈ X }.

Definition 6.1 Let h(x,d) be a bifunction defined on C ⊆ X. The subdifferential δh(a) associated with h at a ∈ C is defined by

δh(a) := { g ∈ X′ | ⟨g, d⟩ ≤ h(a,d) for all d ∈ X }.

A multifunction T(x) defined on C ⊆ X with values in X′ is called monotone on C if ⟨u − v, x − y⟩ ≥ 0 holds for every x, y ∈ C and u ∈ T(x), v ∈ T(y). Furthermore, T(x) is called

(i) quasimonotone on C if for every x, y ∈ C and u ∈ T(x), v ∈ T(y) one has

⟨u, y − x⟩ > 0 implies ⟨v, x − y⟩ ≤ 0,   (15)
(ii) pseudomonotone on C if for every x, y ∈ C and u ∈ T(x), v ∈ T(y) one has

⟨u, y − x⟩ ≥ 0 implies ⟨v, x − y⟩ ≤ 0,   (16)

(iii) strictly pseudomonotone on C if for every x, y ∈ C, x ≠ y, and u ∈ T(x), v ∈ T(y) one has

⟨u, y − x⟩ ≥ 0 implies ⟨v, x − y⟩ < 0.   (17)
,i ,/ ( a + tz + i d ) - / ( a + fz) (a, d) = suphmsup , zex f-.o+ t and has some remarkable properties, namely it is a lower semicontinuous convex function of d and for all a, d £ X, we have /
/o(a,d) < /D(a,d) < /MP(a,d) < /c(a,d) The following theorem demonstrates that the generalized monotonicity concepts for bifunctions and multifunctions fit well to each other, moreover shows the way for transforming the results of the previous parts for subdifferentials possessing support function. (For the proof consult [22].) Theorem 6.3 Let 6h(x) be a subdifferential map defined on the convex set C with support function h(x,d). Then 6h(x) is monotone, quasimonotone, (strictly) pseu domonotone on C if and only if its support function h(x, d) is a monotone, quasimonotone, (strictly) pseudomonotone bifunction on C, respectively.
Let us consider now the generalized derivative f^R(a,d) and the subdifferential δ^R f(a) associated with it in the sense of Definition 6.1:

δ^R f(a) = { g ∈ X′ | ⟨g, d⟩ ≤ f^R(a,d) for all d ∈ X }.

It is well known [34] that f^R(a,·) is the support function of δ^R f(a). By combining Theorem 6.3 and Proposition 5.5 we obtain the following theorem.

Theorem 6.4 Let C be an open convex subset of a Banach space X and f(x) be a proper lower semicontinuous function defined on C. Then f(x) is quasiconvex on C iff the Rockafellar subdifferential δ^R f(x) is quasimonotone on C.

Thanks to the following result due to Hassouni [14,15], Theorem 6.4 enables us to characterize convexity in terms of the monotonicity of δ^R f(x). We give a proof a bit simpler than that of Hassouni, applying characterization (15) instead of Hassouni's definition.

Lemma 6.5 Let T(x) be defined on C ⊆ X. Then the following statements are equivalent:

(i) T(x) is monotone on C,
(ii) T(x) + g is quasimonotone on C for all g ∈ X′.

Proof. (i) ⇒ (ii): It is not difficult to prove that the monotonicity of T(x) implies the monotonicity, and thus the quasimonotonicity, of its "translates" T(x) + g for any g ∈ X′.

(ii) ⇒ (i): Suppose that T(x) + g is quasimonotone for any g ∈ X′. Setting g = 0, we infer that T(x) itself is quasimonotone, too. Assume for contradiction that T(x) fails to be monotone on C. It follows that there exist x, y ∈ C and u ∈ T(x), v ∈ T(y) such that

⟨u, y − x⟩ + ⟨v, x − y⟩ > 0.   (18)

Without loss of generality we may assume that ⟨u, y − x⟩ > 0.
(19)
Let g = —(u + v)/2 and consider the multifunction M(x) = T(x) + g . Then we have u + g g M(x) and v + g g M(y) and from (19) we obtain that (u + g , y - x ) > 0
and
(v + g , x - y ) > 0 ,
which contradicts to the quasimonotonicity of M(x). As a corollary to Lemma 6.5 and Theorem 6.4 we have the following result (cf. [7,26,32,40]). Proposition 6.6 Let C be an open convex subset of a Banach space X and / ( x ) be a proper lower semicontinuous function defined on C. Then / ( x ) is convex on C iff the Rockafellar subdifferential 6Rf(x) is monotone on C.
References

[1] J.-P. Aubin, Optima and Equilibria, Springer-Verlag, Berlin-Heidelberg, 1993.

[2] E. Blum and W. Oettli, From optimization and variational inequalities to equilibrium problems, The Mathematics Student 63 (1993) 1-23.

[3] M. Bianchi, Generalized quasimonotonicity and strong pseudomonotonicity of bifunctions, Contributi di Ricerca, Università Cattolica del Sacro Cuore di Milano, 1994.

[4] M. Bianchi, Strong pseudomonotonicity and generalized quasimonotonicity of bifunctions, in: P. Mazzoleni (ed.) Optimization of Generalized Convex Problems in Economics, Milano, (1994) 101-112.

[5] E. Castagnoli and P. Mazzoleni, Orderings, generalized convexity and monotonicity, in: S. Komlosi, T. Rapcsak, S. Schaible (eds.) Generalized Convexity, Springer Verlag, Heidelberg, (1994) 250-262.

[6] F. H. Clarke, Optimization and Nonsmooth Analysis, Wiley and Sons, New York, 1983.

[7] R. Correa, A. Jofre and L. Thibault, Characterization of lower semicontinuous convex functions, Proceedings of the American Mathematical Society 116 (1992) 67-72.

[8] R. W. Cottle, F. Giannessi and J.-L. Lions (eds.) Variational Inequalities and Complementarity Problems, Wiley and Sons, New York, 1980.

[9] J.-P. Crouzeix and A. Hassouni, Quasimonotonicity of separable operators and monotonicity indices, Working Paper, Blaise Pascal University, 1993.

[10] J.-P. Crouzeix and A. Hassouni, Generalized monotonicity of a separable product of operators: the multivalued case, Working Paper, Blaise Pascal University, 1993.

[11] W. E. Diewert, Alternative characterizations of six kinds of quasiconcavity in the nondifferentiable case with applications to nonsmooth programming, in: S. Schaible and W. T. Ziemba (eds.) Generalized Concavity in Optimization and Economics, Academic Press, New York, 1981.

[12] R. Ellaia and A. Hassouni, Characterization of nonsmooth functions through their generalized gradients, Optimization 22 (1991) 401-416.
[13] K.-H. Elster and J. Thierfelder, On cone approximations and generalized directional derivatives, in: F. H. Clarke, V. F. Dem'yanov and F. Giannessi (eds.) Nonsmooth Optimization and Related Topics, Plenum Press, New York, (1989) 133-154.

[14] A. Hassouni, Sous-différentiels des fonctions quasiconvexes, Thèse de 3ème Cycle, Université Paul Sabatier, Toulouse, 1983.

[15] A. Hassouni, Quasimonotone multifunctions: applications to optimality conditions in quasiconvex programming, Numerical Functional Analysis and Optimization 13 (1992) 267-275.

[16] S. Karamardian, Complementarity over cones with monotone and pseudomonotone maps, Journal of Optimization Theory and Applications 18 (1976) 445-454.

[17] S. Karamardian and S. Schaible, Seven kinds of monotone maps, Journal of Optimization Theory and Applications 66 (1990) 37-46.

[18] S. Karamardian, S. Schaible and J.-P. Crouzeix, Characterizations of generalized monotone maps, Journal of Optimization Theory and Applications 76 (1993) 399-413.

[19] S. Komlosi, Generalized monotonicity of generalized derivatives, Working Paper, Janus Pannonius University, Pecs, (1991) 8.

[20] S. Komlosi, On generalized upper quasidifferentiability, in: F. Giannessi (ed.) Nonsmooth Optimization: Methods and Applications, Gordon and Breach, London, (1992) 189-201.

[21] S. Komlosi, Generalized monotonicity of generalized derivatives, in: P. Mazzoleni (ed.) Proceedings of the Workshop on Generalized Concavity for Economic Applications held in Pisa, April 2, 1992, (Verona, 1992) 1-7.

[22] S. Komlosi, Generalized monotonicity in nonsmooth analysis, in: S. Komlosi, T. Rapcsak, S. Schaible (eds.) Generalized Convexity, Springer Verlag, Heidelberg, (1994) 263-275.

[23] S. Komlosi, Generalized monotonicity and generalized convexity, Working Paper WP-92-16, Computer and Automation Institute of the Hungarian Academy of Sciences, Budapest, (1992) 23 (accepted by Journal of Optimization Theory and Applications).

[24] S. Komlosi, Generalized global monotonicity of generalized derivatives, in: R. Tomlinson (ed.) International Transactions in Operational Research 1 (1994) 259-264.
[25] S. Komlosi, Generalized monotonicity in nondifferentiable optimization, in: L. Martic, L. Neralic, H. Pasagic (eds.) KOI'93 Proceedings, Croatian Operational Research Society, Zagreb, (1993) 17-31.

[26] D. T. Luc and S. Swaminathan, A characterization of convex functions, Nonlinear Analysis, Theory, Methods and Applications 20 (1993) 697-701.

[27] D. T. Luc, Characterization of quasiconvex functions, Bulletin of the Australian Mathematical Society 48 (1993) 393-405.

[28] D. T. Luc, On generalized convex nonsmooth functions, Bulletin of the Australian Mathematical Society 49 (1994) 139-149.

[29] O. L. Mangasarian, Pseudoconvex functions, SIAM Journal on Control 3 (1965) 281-290.

[30] J. J. Moreau, Fonctionnelles convexes, Lecture Notes, Séminaire équations aux dérivées partielles, Collège de France, 1966.

[31] R. Pini and S. Schaible, Some invariance properties of generalized monotonicity, in: S. Komlosi, T. Rapcsak, S. Schaible (eds.) Generalized Convexity, Springer Verlag, Heidelberg, (1994) 276-277.

[32] R. A. Poliquin, Subgradient monotonicity and convex functions, Nonlinear Analysis 14 (1990) 305-317.

[33] R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ, 1970.

[34] R. T. Rockafellar, Generalized directional derivatives and subgradients of nonconvex functions, Canadian Journal of Mathematics 32 (1980) 257-280.

[35] R. T. Rockafellar, The Theory of Subgradients and its Applications to Problems of Optimization: Convex and Nonconvex Functions, Heldermann Verlag, Berlin, 1981.

[36] S. Schaible, Generalized monotone maps, in: F. Giannessi (ed.) Nonsmooth Optimization: Methods and Applications, Gordon and Breach, London, (1992) 392-408.

[37] S. Schaible, Generalized monotone maps - a survey, in: S. Komlosi, T. Rapcsak, S. Schaible (eds.) Generalized Convexity, Springer Verlag, Heidelberg, (1994) 229-249.

[38] D. Zagrodny, Approximate mean value theorem for upper subderivatives, Nonlinear Analysis, Theory, Methods and Applications 12 (1988) 1413-1438.
[39] D. Zagrodny, A note on the equivalence between the mean value theorem for the Dini derivative and the Clarke-Rockafellar derivative, Optimization 21 (1990) 179-183.

[40] M. Tosques, Equivalence between generalized gradients and subdifferentials for a suitable class of lower semicontinuous functions, in: S. Komlosi, T. Rapcsak, S. Schaible (eds.) Generalized Convexity, Springer Verlag, Heidelberg, (1994) 116-133.

[41] D. Aussel, J. N. Corvellec and M. Lassonde, Subdifferential characterization of quasiconvexity and convexity, Journal of Convex Analysis, to appear.

[42] S. Komlosi, Monotonicity and quasimonotonicity for multifunctions, in: P. Mazzoleni (ed.) Optimization of Generalized Convex Problems in Economics, Milano, (1994) 27-39.
Recent Advances in Nonsmooth Optimization, pp. 215-223 Eds. D.-Z. Du, L. Qi and R.S. Womersley ©1995 World Scientific Publishing Co Pte Ltd
Sensitivity of Solutions in Nonlinear Programming Problems with Nonunique Multipliers

A. B. Levy
Department of Mathematics, Bowdoin College, Brunswick, ME 04011, USA

R. T. Rockafellar
Department of Mathematics, University of Washington, Seattle, WA 98195, USA

Abstract
We analyze the perturbations of quasi-solutions to a parameterized nonlinear programming problem, these being feasible solutions accompanied by a Lagrange multiplier vector such that the Karush-Kuhn-Tucker optimality conditions are satisfied. We show, under a standard constraint qualification not requiring uniqueness of the multipliers, that the quasi-solution mapping is differentiable in a generalized sense, and we present a formula for its derivative. The results are distinguished from previous ones in the subject in that they do not entail having to impose conditions to ensure that dual as well as primal elements behave well with respect to sensitivity.
1 Introduction
A standard parameterized nonlinear programming problem can be formulated in terms of smooth functions f_i on ℝⁿ × ℝᵈ as follows:

minimize f₀(x,w) − ⟨v, x⟩ over all x ∈ C(w),   (1)

where the set C(w) ⊆ ℝⁿ is defined by

C(w) := { x : f₁(x,w) ≤ 0, …, f_s(x,w) ≤ 0, f_{s+1}(x,w) = 0, …, f_m(x,w) = 0 }.
= 0, . . . . , / „ ( * , u>) = o } .
(1)
A. B. Levy and R. T. RockafeU&r
216
Here w £ 1R and v £ fit" both serve as parameter elements. In principle, the "tilt" perturbations represented through v could be subsumed into w, but they have an essential role in theory, and we therefore keep them explicit. We concentrate our attention on points x that are quasi-solutions to the mini mization problem (1) in the sense of satisfying, in association with some multiplier vector y, the Karush-Kuhn-Tucker (K-K-T) optimality conditions: 3y = {Uu■■•,Um) € A r *:(/i(z,"'), • • •, fm{x,w)) V = Vr/0(l,U)) + y1VI/1(x,tu)+
with
(2)
\-ym^xfm(x,w),
written here for convenience in terms of NK(U) denoting the set of normal vectors at u to the polyhedral cone K := {ueW
:ui < 0 , . . . , u , < 0 , u s + 1 = 0 , . . . , « m = 0}.
(3)
Thus, for any u £ K the vectors y 6 NK(U) are the ones with j / ; > 0 for indices z £ { l , . . . , m } having u, = 0 but yt = 0 for indices i £ { l , . . . , m } with u; < 0, whereas j/, is unrestricted for indices i £ {s + 1 , . . . , m } . (By convention, NK(U) is the empty set when u ^ K.) The notation y 6 NK(U) saves us from repeatedly having to write down such complicated details, and it has the further advantage of adapting in the framework of variational analysis to a broad range of circumstances beyond those that come into play here. The K-K-T conditions are of course necessary for a feasible solution x to be locally optimal under the Mangasarian-Fromovitz constraint qualification, which in turn takes the form
fly = {yu---,ym) yiVi/i(i,«)) +
e Ni<(fi(x,™),■ ■ ■ Jm{x,w)) with \-ym^xfm(x,w)
(4)
= 0, except y = 0.
Quasi-solutions are sure to be optimal solutions when the minimization problem exhibits convexity with respect to x, but this is not an issue of concern to us here. The quasi-solution mapping S in this framework associates with each parameter element (w,v) ∈ ℝ^{d+n} the set

S(w,v) := { x ∈ ℝⁿ : the K-K-T conditions (2) hold }.

Since, in general, S(w,v) is not a singleton, this equation defines a multifunction (set-valued mapping) S : ℝ^{d+n} ⇉ ℝⁿ. In the main result of this paper, we calculate a kind of generalized derivative of S with respect to (w,v).

Until now, differentiability properties of S have been studied by distinctly different means than will be used here. Most researchers (cf. [2], [6], and [1] for a survey) have looked at the sensitivity of solution multifunctions defined by K-K-T pairs (x,y),

T(w,v) := { (x,y) : x solves the K-K-T conditions (2) with y as multiplier },
being forced by this strategy to make strong assumptions about the multipliers y (e.g. uniqueness) in order to draw conclusions about the x-components of these pairs. Some exceptions to this approach are described in [6] where, however, single-valuedness of the solution mapping S is essential. Our approach is new and has the advantage of enabling us to study the "primal" solution multifunction S directly, without any restrictions on the multipliers. In such a setting, much broader than previously has been accessible, our approach leads to formulas describing the magnitude and direction of perturbations of quasi-solutions in terms of approximations based on set convergence. It does not in itself, though, provide tests for whether S is single-valued in a localized sense.

Our methodology derives from our recent work in [3], where we studied the sensitivity of parameterized "variational conditions" over a set which itself can depend on the parameters. Associated with any closed set C ⊆ ℝⁿ and mapping F : ℝⁿ → ℝⁿ is the variational condition

F(x) + N_C(x) ∋ 0,   x ∈ C.   (5)
When C is convex, this expresses the variational inequality for C and F. When C is not convex, N_C(x) is interpreted as the cone of "limiting proximal normals" in nonsmooth analysis, rather than the cone of normal vectors in the sense of convex analysis. The parameterized variational conditions studied in [3] are of the form

F(x, w) + N_{C(w)}(x) ∋ v,  x ∈ C(w),  (6)
with parameter element (w, v) ∈ ℝ^d × ℝ^n. As long as the Mangasarian-Fromovitz constraint qualification (4) holds, the K-K-T optimality conditions (2) can be reformulated in terms of this type of parameterized variational condition by taking F(x, w) = ∇_x f_0(x, w) (details in Section 3). The quasi-solution multifunction S is given then by the solutions to the parameterized variational condition (6), the perturbations of which were analyzed in [4]. Here we show that by applying the results of [3] to this formulation of the K-K-T optimality conditions, it can be established that S is "proto-differentiable," moreover with a specific formula for the proto-derivatives.
2
Proto-derivatives
Proto-differentiability, a concept of generalized differentiability which was introduced in [9], is distinguished from other differentiability concepts through its utilization of set convergence of graphs. Consider any multifunction Γ : ℝ^m ⇉ ℝ^n and any pair (w̄, z̄) in the graph of Γ, i.e., with z̄ ∈ Γ(w̄). For each t > 0 one can form the difference quotient multifunction

(Δ_t Γ)_{w̄,z̄} : ω ↦ [Γ(w̄ + tω) − z̄]/t.
A. B. Levy and R. T. Rockafellar
Instead of asking the difference quotient multifunctions (Δ_t Γ)_{w̄,z̄} to converge in some kind of pointwise sense as t ↓ 0, proto-differentiability asks that they converge graphically, i.e., that their graphs converge as subsets of ℝ^m × ℝ^n to the graph of some multifunction Δ. Then Δ is the proto-derivative multifunction at w̄ for z̄ and is denoted by Γ′_{w̄,z̄}; for each ω ∈ ℝ^m, Γ′_{w̄,z̄}(ω) is a certain (possibly empty) subset of ℝ^n. The concept of Painlevé-Kuratowski set convergence underlies the formation of these graphical limits. It refers to a kind of approximation described from two sides as follows. The inner set limit of a parameterized family of sets {G_t}_{t>0} in a Euclidean space is the set of points η such that for every sequence t_k ↓ 0 there is a sequence of points η_k ∈ G_{t_k} with η_k → η. The outer set limit of the family is the set of points η such that for some sequence t_k ↓ 0 there is a sequence of points η_k ∈ G_{t_k} with η_k → η. When the inner and outer set limits coincide, the common set G is the limit as t ↓ 0. In our framework, this is applied to sets that are the graphs of multifunctions. For a multifunction Γ : ℝ^m ⇉ ℝ^n and any pair (w̄, z̄) in gph Γ, i.e., with z̄ ∈ Γ(w̄), the graph of the difference quotient mapping (Δ_t Γ)_{w̄,z̄} is t⁻¹[gph Γ − (w̄, z̄)]. The multifunction Γ⁺_{w̄,z̄} : ℝ^m ⇉ ℝ^n having as its graph the outer limit of the sets gph (Δ_t Γ)_{w̄,z̄} as t ↓ 0 is called the outer graphical derivative of Γ at w̄ for z̄. Similarly, the multifunction Γ⁻_{w̄,z̄} : ℝ^m ⇉ ℝ^n having as its graph the inner limit of these sets is the inner graphical derivative. Proto-differentiability of Γ at w̄ for z̄ is the case where the outer and inner derivatives agree, the common mapping being then the proto-derivative: Γ⁺_{w̄,z̄} = Γ⁻_{w̄,z̄} = Γ′_{w̄,z̄}, cf. Rockafellar [8]. For the sake of better understanding of the approximation inherent in proto-differentiability, we furnish a description of the kind of uniformity that the concept involves.
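Before turning to that description, a small self-contained illustration (ours, not taken from the paper) may help. The normal cone mapping N to the half-line [0, ∞) has a graph that is a cone through the origin, so every difference quotient graph t⁻¹[gph N − (0, 0)] coincides with gph N itself; the graphical limit, and hence the proto-derivative of N at 0 for 0, is then N:

```python
# Toy illustration (ours): the normal cone mapping N(w) to the half-line
# [0, inf) has graph  G = {(w, 0) : w >= 0} U {(0, z) : z <= 0}.
# G is a cone, so the difference-quotient graphs t^{-1}[G - (0,0)] coincide
# with G for every t > 0; the Painleve-Kuratowski limit is G itself and the
# proto-derivative of N at w = 0 for z = 0 is N itself.

def in_graph(w, z):
    return (w >= 0 and z == 0) or (w == 0 and z <= 0)

points = [(1.0, 0.0), (0.0, -2.5), (0.0, 0.0), (3.0, 0.0), (0.0, -1e-4)]
non_points = [(-1.0, 0.0), (1.0, -1.0), (0.0, 0.5)]

for t in [2.0, 0.5, 1e-3]:
    for (w, z) in points:
        # (w, z) lies in t^{-1} G  <=>  (t*w, t*z) lies in G
        assert in_graph(t * w, t * z)
    for (w, z) in non_points:
        assert not in_graph(t * w, t * z)
print("difference-quotient graphs of N coincide with gph N for every t > 0")
```

Because the graphs agree exactly for every t > 0, the inner and outer graphical limits trivially coincide in this example.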
Proposition 2.1 Under the assumption that Γ : ℝ^m ⇉ ℝ^n and Δ : ℝ^m ⇉ ℝ^n are multifunctions having closed graph, the following is necessary and sufficient for Γ to be proto-differentiable at w̄ for z̄ (where z̄ ∈ Γ(w̄)) with proto-derivative Γ′_{w̄,z̄} = Δ: for every ε > 0 there exists τ > 0 such that, for all t ∈ (0, τ),

(a) whenever z̄ + tζ ∈ Γ(w̄ + tω) with |ζ| ≤ ε⁻¹ and |ω| ≤ ε⁻¹, there exist ζ′ and ω′ with |ζ′ − ζ| ≤ ε, |ω′ − ω| ≤ ε, ζ′ ∈ Δ(ω′);

(b) whenever ζ ∈ Δ(ω) with |ζ| ≤ ε⁻¹ and |ω| ≤ ε⁻¹, there exist ζ′ and ω′ with |ζ′ − ζ| ≤ ε, |ω′ − ω| ≤ ε, z̄ + tζ′ ∈ Γ(w̄ + tω′).

Proof. Proto-differentiability means that the graphs G_t = gph (Δ_t Γ)_{w̄,z̄} converge to G = gph Δ as t ↓ 0. Such graph convergence is known to be equivalent to requiring that every neighborhood V of the origin and every bounded set B has an associated τ > 0 with

G_t ∩ B ⊂ G + V and G ∩ B ⊂ G_t + V for all t ∈ (0, τ).

It suffices in this to consider neighborhoods V of (0, 0) ∈ ℝ^{m+n} formed by the product of an ε ball around the origin of ℝ^m and such a ball in ℝ^n, and on the other hand
to consider bounded sets B formed by the product of an ε⁻¹ ball around the origin of ℝ^m and such a ball in ℝ^n. The two inclusions reduce then to (a) and (b). □

The proto-derivative notation simplifies when Γ happens to be single-valued at w̄, i.e., such that the set Γ(w̄) is just a singleton {z̄}; then it suffices to write Γ′_{w̄}. The next result clarifies the relationship between proto-differentiability in this case and B-differentiability as defined by Robinson [7].

Proposition 2.2 Suppose that Γ is single-valued on a neighborhood of w̄. Then Γ is B-differentiable at w̄ if and only if Γ is continuous at w̄ and proto-differentiable at w̄ with Γ′_{w̄} single-valued, in which event one has the local expansion

Γ(w̄ + tω) = Γ(w̄) + tΓ′_{w̄}(ω) + o(t|ω|) for t > 0.  (7)
Proof. B-differentiability corresponds to having an expansion of the form described, but in which the middle term on the right is tΔ(ω) for a continuous (single-valued) mapping Δ. When this holds it is clear that Γ is continuous at w̄ and proto-differentiable there with Γ′_{w̄} = Δ. Conversely, if the latter properties hold with respect to a single-valued mapping Δ, then it can be deduced from [9, Theorem 4.1] that there exists κ > 0 such that |Γ(w) − Γ(w̄)| ≤ κ|w − w̄| for all w in some neighborhood of w̄. In particular, this yields |Δ(ω)| ≤ κ|ω| for all ω. Because the graph of Δ, being a limit under graph convergence, is closed, it follows that Δ must be continuous. The characterization of proto-differentiability in Proposition 2.1 specializes then to show that the mappings (Δ_t Γ)_{w̄,z̄} (which are single-valued on ever larger neighborhoods of 0 and bounded there by κ) converge uniformly on bounded sets to Δ. That is the meaning of the expansion expressing B-differentiability. □

This result means that proto-differentiability extends to set-valued mappings, just in the manner that might be wished, the notion of one-sided directional differentiability deemed most appropriate in the sensitivity analysis of single-valued mappings, smooth or nonsmooth. The question of whether a certain mapping is single-valued or not can be dealt with as a separate issue, which need not be resolved before progress can be made on quantitative stability of solutions.
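A minimal sketch of the expansion (7) (our own toy example, not from the paper): for the single-valued, nonsmooth mapping G(w) = max(w, 0), the proto-derivative at w̄ = 0 is w′ ↦ max(w′, 0), and the remainder in (7) vanishes identically because gph G is a cone at the origin:

```python
# Toy illustration (not from the paper): for the single-valued, nonsmooth
# mapping G(w) = max(w, 0), the proto-derivative at w_bar = 0 is the map
# w' -> max(w', 0), and the expansion (7) holds with zero remainder because
# the graph of G is a cone at the origin.

def G(w):
    return max(w, 0.0)

def proto_derivative_at_0(w_prime):
    # graphs of the difference quotients [G(0 + t*w') - G(0)]/t coincide
    # with gph G for every t > 0, so the proto-derivative is G itself
    return max(w_prime, 0.0)

for t in [1.0, 0.1, 1e-3, 1e-6]:
    for w_prime in [-2.0, -0.5, 0.0, 0.7, 3.0]:
        lhs = G(0.0 + t * w_prime)
        rhs = G(0.0) + t * proto_derivative_at_0(w_prime)
        assert abs(lhs - rhs) < 1e-12  # remainder o(t|w'|) is exactly 0 here
print("expansion (7) verified for G(w) = max(w, 0) at w_bar = 0")
```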
3
Sensitivity Theorem
Our main result rests on the reformulation of the K-K-T optimality conditions (2) as a parameterized variational condition (6), so we give the details of this reformulation next. In terms of the mapping G : ℝ^n × ℝ^d → ℝ^m defined by

G(x, w) := (f_1(x, w), …, f_m(x, w)),

the K-K-T conditions simply require that

∇_x f_0(x, w) + ∇_x G(x, w)* N_K(G(x, w)) ∋ v,  (8)
where ∇_x G(x, w)* is the transpose of the partial Jacobian matrix for G with respect to x. In [3, Theorem 5.1], we have shown that if the Mangasarian-Fromovitz constraint qualification holds at (x̄, w̄), then for all pairs (x, w) ∈ ℝ^n × ℝ^d that are sufficiently close to (x̄, w̄), the set ∇_x G(x, w)* N_K(G(x, w)) is equal to the normal cone mapping associated with the set C(w) at x. Under these circumstances then, the K-K-T optimality conditions (2) come out as the variational condition

∇_x f_0(x, w) + N_{C(w)}(x) ∋ v.  (9)
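As a concrete toy instance of a variational condition of the form (9) (ours, not from the paper; the function choices are assumptions made for illustration): take f_0(x, w) = ½x² − wx and C(w) = [0, ∞) for every w. The condition x − w − v + N_{[0,∞)}(x) ∋ 0 then has the unique solution S(w, v) = max(w + v, 0), so S is single-valued here, and its difference quotients at (0, 0) can be compared directly with the claimed proto-derivative:

```python
# Toy illustration (ours, not from the paper): minimize f0(x,w) = x**2/2 - w*x
# over x >= 0, with a tilt parameter v.  The variational condition
#   x - w - v + N_{[0,inf)}(x) ∋ 0
# has the unique solution S(w,v) = max(w + v, 0).  We check numerically that
# the difference quotients of S at (w,v) = (0,0), x = 0 agree with the map
# (w',v') -> max(w' + v', 0), which is therefore the proto-derivative there.

def S(w, v):
    return max(w + v, 0.0)

def S_proto(wp, vp):          # claimed proto-derivative at (w,v) = (0,0), x = 0
    return max(wp + vp, 0.0)

for t in [1.0, 0.25, 1e-2, 1e-5]:
    for (wp, vp) in [(-1.0, 0.3), (0.5, 0.5), (2.0, -3.0), (0.0, 0.0)]:
        diff_quot = (S(t * wp, t * vp) - S(0.0, 0.0)) / t
        assert abs(diff_quot - S_proto(wp, vp)) < 1e-9
print("difference quotients of S agree with the claimed proto-derivative")
```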
Theorem 3.1 Let S : ℝ^{d+n} ⇉ ℝ^n be the K-K-T solution mapping defined by S(w, v) := {x : the K-K-T conditions (2) hold}, and let x̄ ∈ S(w̄, v̄) be such that the Mangasarian-Fromovitz constraint qualification (4) is satisfied. Then for all (w, v) sufficiently close to (w̄, v̄) and for all x ∈ S(w, v), S is proto-differentiable at (w, v) for x with proto-derivative given by the formula

S′_{(w,v),x}(w′, v′) := {x′ : ∇²_{xx} f_0(x, w)x′ + ∇²_{xw} f_0(x, w)w′ + M′_{(x,w),z}(x′, w′) ∋ v′}

with z = v − ∇_x f_0(x, w) and M(x, w) := N_{C(w)}(x), the multifunction M being proto-differentiable at (x, w) for z.

Proof. From the equivalence of the K-K-T optimality conditions (2) to the variational condition (9), we get the K-K-T solution mapping S to reduce to the solution mapping associated with this variational condition, namely

S(w, v) = {x : ∇_x f_0(x, w) + N_{C(w)}(x) ∋ v}.

This is exactly the kind of solution mapping whose proto-differentiability was studied in [3]. The proto-differentiability of M(x, w) = N_{C(w)}(x) immediately follows along with that of S from [3, Theorem 5.2]. □

To carry this further, a formula for the proto-derivatives of M is required. We can obtain such a formula from viewing C(w) for each w as the x-section at w of the set

E = {(x, w) ∈ ℝ^n × ℝ^d : G(x, w) ∈ K}.

Poliquin and Rockafellar [5] show that when the Mangasarian-Fromovitz constraint qualification holds at (x, w) ∈ E, the multifunction N_E : (x, w) ↦ N_E(x, w) is proto-differentiable at (x, w) for any (z, q) ∈ N_E(x, w). Then from [3, Theorem 5.2] we have

M′_{(x,w),z}(x′, w′) = {z′ : there exists q′ with (z′, q′) ∈ (N_E)′_{(x,w),(z,q)}(x′, w′)}.  (10)

Here the formula for the proto-derivatives of N_E is a key ingredient. To develop it and to put our various pieces together, we need the following notation.
We let I_s(x, w) and I_m denote the sets of active indices at (x, w) in the specification of E, namely

I_s(x, w) = {i ∈ {1, …, s} : f_i(x, w) = 0} and I_m = {s + 1, …, m},

and we define the polyhedral cone Q(x, w) ⊂ ℝ^{n+d} by

Q(x, w) = {(x′, w′) : ⟨∇f_i(x, w), (x′, w′)⟩ ≤ 0 for i ∈ I_s(x, w), ⟨∇f_i(x, w), (x′, w′)⟩ = 0 for i ∈ I_m}.

Next we introduce certain sets of multiplier vectors, first the bounded, polyhedral set

Y(x, w, z, q) = {y = (y_1, …, y_m) ∈ N_K(f_1(x, w), …, f_m(x, w)) : Σ_{i=1}^m y_i ∇f_i(x, w) = (z, q)},

and its face

Y_max(x, w, z, q; x′, w′) = argmax_{y ∈ Y(x,w,z,q)} Σ_{i=1}^m y_i ⟨∇²f_i(x, w)(x′, w′), (x′, w′)⟩,

and then the polyhedral cone

Y′(x, w; x′, w′) = {y′ = (y′_1, …, y′_m) ∈ N_K(f_1(x, w), …, f_m(x, w)) : y′_i = 0 for i with ⟨∇f_i(x, w), (x′, w′)⟩ ≠ 0}.
Theorem 3.2 Under the assumptions of Theorem 3.1, the proto-derivatives of the multifunction M are given as follows. For (x′, w′) ∉ Q(x, w), the set M′_{(x,w),z}(x′, w′) is empty. But for (x′, w′) ∈ Q(x, w), the set M′_{(x,w),z}(x′, w′) consists of all vectors z′ having the form

z′ = Σ_{i=1}^m y_i [∇²_{xx} f_i(x, w)x′ + ∇²_{xw} f_i(x, w)w′] + Σ_{i=1}^m y′_i ∇_x f_i(x, w) − y′_0 z

generated by arbitrary choices of y ∈ Y_max(x, w, z, q; x′, w′), y′ ∈ Y′(x, w; x′, w′) and y′_0 ∈ ℝ.

Proof. The proto-derivative formula for N_E from [5] involves the cone

Q(x, w; z, q) = {(x′, w′) ∈ Q(x, w) : ⟨(z, q), (x′, w′)⟩ = 0}.

We have (z, q) ∈ N_E(x, w) if and only if there exists y ∈ Y(x, w, z, q). Suppose that holds. The formula in question says that the set (N_E)′_{(x,w),(z,q)}(x′, w′) is empty if (x′, w′) ∉ Q(x, w; z, q), whereas if (x′, w′) ∈ Q(x, w; z, q) this set consists of all pairs (z′, q′) of the form

(z′, q′) = Σ_{i=1}^m y_i ∇²f_i(x, w)(x′, w′) + Σ_{i=1}^m y′_i ∇f_i(x, w) − y′_0 (z, q)

generated by arbitrary choices of y ∈ Y_max(x, w, z, q; x′, w′), y′ ∈ Y′(x, w; x′, w′) and y′_0 ∈ ℝ. When this is plugged into (10) we get the formula claimed here. □

Theorems 3.1 and 3.2 can be extended to cover other solution mappings associated with much more general optimization problems, but we will not take this up here. As seen, these results rely heavily on those of [3], in particular on [3, Theorem 5.2]. The theory developed in [3] allows a direct sensitivity analysis of parameterized optimization problems to a degree that has not been possible before. We are able on this foundation to obtain in Theorems 3.1 and 3.2 proto-derivatives in the sensitivity analysis of the "primal" solution mapping S without making any restrictions on the multiplier vectors y in the K-K-T optimality conditions. When the solution mapping S happens to be single-valued, Theorem 3.1 gives results about the B-differentiability of S.

Theorem 3.3 Let S : ℝ^{d+n} ⇉ ℝ^n be the K-K-T solution mapping defined by S(w, v) := {x : the K-K-T conditions (2) hold}, and let x̄ ∈ S(w̄, v̄) be such that the Mangasarian-Fromovitz constraint qualification (4) is satisfied. If S is single-valued on some neighborhood of (w̄, v̄) and continuous at (w̄, v̄), and S′_{(w̄,v̄)} is single-valued as well, then S is B-differentiable at (w̄, v̄) with the expansion

S(w̄ + tw′, v̄ + tv′) = S(w̄, v̄) + tS′_{(w̄,v̄)}(w′, v′) + o(t|(w′, v′)|),

the B-derivative S′_{(w̄,v̄)}(w′, v′) being given by the formula in Theorem 3.1 in combination with the one in Theorem 3.2.

Proof. This combines the preceding results with Proposition 2.2. □
References

[1] A. V. Fiacco and J. Kyparisis, Sensitivity analysis in nonlinear programming under second order assumptions. In A. V. Balakrishnan and E. M. Thoma, editors, Lecture Notes in Control and Information Sciences, Springer-Verlag, (1985) 74-97.
[2] J. Kyparisis, Sensitivity analysis for nonlinear programs and variational inequalities with nonunique multipliers. Mathematics of Operations Research, 15 (1990) 286-298.

[3] A. B. Levy and R. T. Rockafellar, Variational conditions and the proto-differentiation of partial subgradient mappings. Nonlinear Analysis: Theory, Methods and Applications, (1994) to appear.

[4] A. B. Levy and R. T. Rockafellar, Sensitivity analysis of solutions to generalized equations. Transactions of the American Mathematical Society, 345 (1994) 661-671.

[5] R. A. Poliquin and R. T. Rockafellar, Proto-derivative formulas for basic subgradient mappings in mathematical programming. Set-Valued Analysis, 2 (1994) 275-290.

[6] D. Ralph and S. Dempe, Directional derivatives of the solution of a parametric nonlinear program, 1994, Research Report.

[7] S. M. Robinson, Local structure of feasible sets in nonlinear programming, Part III: stability and sensitivity. Mathematical Programming Study, 30 (1987) 45-56.

[8] R. T. Rockafellar, Nonsmooth analysis and parametric optimization. In A. Cellina, editor, Methods of Nonconvex Analysis, Springer-Verlag, (1990) 137-151.

[9] R. T. Rockafellar, Proto-differentiability of set-valued mappings and its applications in optimization. In H. Attouch, J. P. Aubin, F. H. Clarke, and I. Ekeland, editors, Analyse Non Linéaire, Gauthier-Villars, (1989) 449-482.
B. Mond and J. Zhang
Recent Advances in Nonsmooth Optimization, pp. 224-243 Eds. D.-Z. Du, L. Qi and R.S. Womersley ©1995 World Scientific Publishing Co Pte Ltd
Generalized Convexity and Higher Order Duality of the Non-linear Programming Problem with Non-negative Variables

Bertram Mond
School of Mathematics, La Trobe University, Bundoora, Victoria, 3083, Australia

Jinyun Zhang
School of Mathematics, La Trobe University, Bundoora, Victoria, 3083, Australia

Abstract
Consider the nonlinear programming problem with non-negative variables. A number of different second order duals are given and appropriate duality theorems established under weakened second order convexity conditions. Higher order dual problems are also discussed and corresponding duality results established.
1
Introduction
Consider the nonlinear programming problems

(P)   min f(x) s.t. g(x) ≥ 0
(P′)  min f(x) s.t. g(x) ≥ 0, x ≥ 0

where f and g are twice differentiable functions from ℝ^n into ℝ and ℝ^m respectively. The Wolfe duals [14], [7] of (P) and (P′) are respectively (where ∇ denotes the gradient column vector with respect to x)

(1D)   max f(u) − yᵀg(u) s.t. ∇yᵀg(u) = ∇f(u), y ≥ 0
(1D′)  max f(u) − yᵀg(u) − uᵀ[∇f(u) − ∇yᵀg(u)] s.t. ∇f(u) ≥ ∇yᵀg(u), y ≥ 0
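A quick numerical sanity check of weak duality for (1D′) (our own toy instance, not from the paper): with f(x) = x² and g(x) = x − 1, problem (P′) has optimal value 1 at x = 1, and every (1D′)-feasible pair (u, y) gives a dual objective value of at most 1:

```python
import random

# Toy instance (ours): f(x) = x**2, g(x) = x - 1, so (P') is min x^2 s.t.
# x >= 1, x >= 0, with optimal value 1 at x = 1.  The Wolfe dual (1D')
# objective is f(u) - y*g(u) - u*(f'(u) - y*g'(u)) subject to f'(u) >= y*g'(u)
# and y >= 0, i.e. 2u >= y.  Weak duality: every feasible dual value is <= 1.

def wolfe_dual(u, y):
    return u*u - y*(u - 1) - u*(2*u - y)

random.seed(4)
for _ in range(10000):
    u = random.uniform(0, 3)
    y = random.uniform(0, 2*u)        # enforce 0 <= y <= 2u
    assert wolfe_dual(u, y) <= 1 + 1e-9

assert wolfe_dual(1.0, 2.0) == 1.0    # the bound is attained at u = 1, y = 2
print("Wolfe weak duality verified on the toy instance")
```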
The duality of (1D) to (P) and (1D′) to (P′) was first established with f convex and g concave. Bector et al. [1], Mahajan and Vartak [6] and Mond and Weir [10] established the duality of (1D) to (P) and (1D′) to (P′) when the Lagrangean f − yᵀg, respectively f − yᵀg − vᵀ[·], is pseudo-convex. Mond and Weir [10] and Mond and Zhang [12] established the duality of a general dual to (P) and (P′) under still weaker convexity conditions. Mangasarian [8] first formulated the following second order dual to (P) (where p ∈ ℝ^n and ∇² is the symmetric n × n matrix of second order partial derivatives)

(2D)  max f(u) − yᵀg(u) − ½pᵀ[∇²f(u) − ∇²yᵀg(u)]p
      s.t. ∇yᵀg(u) + (∇²yᵀg(u))p = ∇f(u) + ∇²f(u)p,
           y ≥ 0

and established duality theorems under somewhat involved assumptions. Note that if p = 0, then (2D) becomes (1D). In [9], Mond established the duality of (2D) to (P) under the following simpler assumptions:
f(x) − f(u) ≥ (x − u)ᵀ∇f(u) + (x − u)ᵀ∇²f(u)p − ½pᵀ∇²f(u)p,  (1)

g_i(x) − g_i(u) ≤ (x − u)ᵀ∇g_i(u) + (x − u)ᵀ∇²g_i(u)p − ½pᵀ∇²g_i(u)p,  i = 1, 2, …, m,  (2)

for all (x, u, p).
Mahajan [5] calls the conditions (1) and (2) second order convexity and concavity respectively. Similarly, Mahajan [5] and Mond and Weir [11] give the following definitions: f is said to be second order pseudo-convex in (x, u) for p if

(x − u)ᵀ∇f(u) + (x − u)ᵀ∇²f(u)p ≥ 0 ⇒ f(x) ≥ f(u) − ½pᵀ∇²f(u)p.  (3)

f is said to be second order quasi-convex in (x, u) for p if

f(x) − f(u) + ½pᵀ∇²f(u)p ≤ 0 ⇒ (x − u)ᵀ[∇f(u) + ∇²f(u)p] ≤ 0.  (4)

A function f is second order pseudo-concave or second order quasi-concave if −f is second order pseudo-convex or second order quasi-convex. Note that second order convexity, pseudo-convexity and quasi-convexity imply, respectively, (first order) convexity, pseudo-convexity and quasi-convexity, since the respective inequalities must hold for p = 0. Clearly a function that is second order convex is also second order pseudo-convex and second order quasi-convex.
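To ground the definitions numerically (our own check, not part of the paper): any quadratic f(x) = ½xᵀAx + bᵀx with A positive semidefinite satisfies the second order convexity condition (1), since the gap between the two sides equals ½(x − u − p)ᵀA(x − u − p) ≥ 0. The sketch below samples random points for one positive definite A:

```python
import random

# Check condition (1) for the convex quadratic f(x) = 0.5*x'Ax + b'x in R^2,
# where A = [[2,1],[1,2]] is positive definite.  Here grad f(u) = Au + b and
# the Hessian is the constant matrix A, so (1) reads
#   f(x) - f(u) >= (x-u)'(Au+b) + (x-u)'Ap - 0.5*p'Ap.
A = [[2.0, 1.0], [1.0, 2.0]]
b = [1.0, -3.0]

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

def f(x):
    return 0.5 * dot(x, mat_vec(A, x)) + dot(b, x)

random.seed(0)
for _ in range(1000):
    x = [random.uniform(-5, 5) for _ in range(2)]
    u = [random.uniform(-5, 5) for _ in range(2)]
    p = [random.uniform(-5, 5) for _ in range(2)]
    d = [x[i] - u[i] for i in range(2)]
    lhs = f(x) - f(u)
    rhs = (dot(d, mat_vec(A, u)) + dot(d, b)
           + dot(d, mat_vec(A, p)) - 0.5 * dot(p, mat_vec(A, p)))
    assert lhs >= rhs - 1e-9
print("condition (1) verified on 1000 random samples")
```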
Mond and Weir [11] gave a number of second order duals of (P) and established duality theorems under weakened second order convexity conditions. We now give a number of second order duals of (P′) and establish duality theorems under weakened second order convexity conditions. Higher order duals are also considered.
2
Second Order Duality
We first give the following second order dual to (P′):

(2D′)  max f(u) − yᵀg(u) − uᵀ[∇f(u) − ∇yᵀg(u) + ∇²f(u)p − ∇²yᵀg(u)p]
           − ½pᵀ[∇²f(u) − ∇²yᵀg(u)]p
       s.t. ∇f(u) + ∇²f(u)p ≥ ∇yᵀg(u) + ∇²yᵀg(u)p,  (5)
            y ≥ 0
Theorem 2.1 (Weak duality) Let x satisfy the constraints of (P′) and (u, y, p) satisfy the constraints of (2D′). If f is second order convex for all feasible (x, u, y, p) and yᵀg is second order concave for all feasible (x, u, y, p), then

inf(P′) ≥ sup(2D′).

Proof: Since f is second order convex for all feasible (x, u, y, p), we have

f(x) ≥ f(u) + (x − u)ᵀ∇f(u) + (x − u)ᵀ∇²f(u)p − ½pᵀ∇²f(u)p
     = f(u) + xᵀ[∇f(u) + ∇²f(u)p] − uᵀ[∇f(u) + ∇²f(u)p] − ½pᵀ∇²f(u)p
     ≥ f(u) + xᵀ[∇yᵀg(u) + ∇²yᵀg(u)p] − uᵀ[∇f(u) + ∇²f(u)p] − ½pᵀ∇²f(u)p
     = f(u) + (x − u)ᵀ[∇yᵀg(u) + ∇²yᵀg(u)p] − uᵀ[∇f(u) + ∇²f(u)p − ∇yᵀg(u) − ∇²yᵀg(u)p] − ½pᵀ∇²f(u)p
     ≥ f(u) + yᵀg(x) − yᵀg(u) + ½pᵀ(∇²yᵀg(u))p − uᵀ[∇f(u) + ∇²f(u)p − ∇yᵀg(u) − ∇²yᵀg(u)p] − ½pᵀ∇²f(u)p
     = f(u) + yᵀg(x) − yᵀg(u) − uᵀ[∇f(u) + ∇²f(u)p − ∇yᵀg(u) − ∇²yᵀg(u)p] − ½pᵀ[∇²f(u) − ∇²yᵀg(u)]p
     ≥ f(u) − yᵀg(u) − uᵀ[∇f(u) + ∇²f(u)p − ∇yᵀg(u) − ∇²yᵀg(u)p] − ½pᵀ[∇²f(u) − ∇²yᵀg(u)]p.

(The second inequality holds since x ≥ 0 and ∇f(u) + ∇²f(u)p ≥ ∇yᵀg(u) + ∇²yᵀg(u)p, the third inequality holds since yᵀg is second order concave, and the last inequality holds since y ≥ 0, g(x) ≥ 0.) □
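As a numerical illustration of weak duality for (2D′) (our own toy instance, not from the paper): take f(x) = x² and g(x) = x − 1, so (P′) has optimal value 1 at x = 1; f is a convex quadratic (hence second order convex) and yᵀg is linear (hence second order concave), so every (2D′)-feasible (u, y, p) should give a dual value of at most 1, with equality attained at the Kuhn-Tucker point u = 1, y = 2, p = 0:

```python
import random

# Toy instance (ours): f(x) = x^2, g(x) = x - 1, so (P') is min x^2 s.t. x >= 1
# with optimal value 1.  The (2D') objective and constraint specialize to
#   objective(u,y,p) = u^2 - y*(u-1) - u*(2u - y + 2p) - p^2
#   constraint: 2u + 2p >= y, y >= 0.
# Weak duality (as in the theorem above) says objective <= 1 at every
# feasible (u,y,p).

def dual_objective(u, y, p):
    grad_part = 2*u + 2*p - y         # [grad f + H_f p - grad y'g - H_{y'g} p]
    return u*u - y*(u - 1) - u*grad_part - p*p

random.seed(1)
best = -float("inf")
for _ in range(20000):
    u = random.uniform(-3, 3)
    p = random.uniform(-3, 3)
    ymax = 2*u + 2*p
    if ymax < 0:
        continue                      # no feasible y for this (u,p)
    y = random.uniform(0, ymax)       # enforce 0 <= y <= 2u + 2p
    val = dual_objective(u, y, p)
    assert val <= 1 + 1e-9            # weak duality
    best = max(best, val)

assert dual_objective(1.0, 2.0, 0.0) == 1.0   # tight at the K-T point
print("largest sampled dual value:", best, "<= primal optimum 1")
```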
Theorem 2.2 (Weak duality) Let x satisfy the constraints of (P′) and (u, y, p) satisfy the constraints of (2D′). If, for each v, f − yᵀg − vᵀ[·] is second order pseudo-convex in (x, u) whenever (x, u, y, p) is feasible for (P′) and (2D′), then

inf(P′) ≥ sup(2D′).

Proof: Let v = ∇f(u) + ∇²f(u)p − ∇yᵀg(u) − ∇²yᵀg(u)p. From (5),

(x − u)ᵀ∇{f(u) − yᵀg(u) − uᵀv} + (x − u)ᵀ∇²{f(u) − yᵀg(u) − uᵀv}p
  = (x − u)ᵀ{∇f(u) − ∇yᵀg(u) − v + ∇²f(u)p − ∇²yᵀg(u)p} = 0.

Since f − yᵀg − vᵀ[·] is second order pseudo-convex, we have

f(x) − yᵀg(x) − xᵀv ≥ f(u) − yᵀg(u) − uᵀv − ½pᵀ[∇²f(u) − ∇²yᵀg(u)]p.

By y ≥ 0, g(x) ≥ 0, x ≥ 0 and v = ∇f(u) + ∇²f(u)p − ∇yᵀg(u) − ∇²yᵀg(u)p ≥ 0, we have

f(x) ≥ f(u) − yᵀg(u) − uᵀ[∇f(u) + ∇²f(u)p − ∇yᵀg(u) − ∇²yᵀg(u)p] − ½pᵀ[∇²f(u) − ∇²yᵀg(u)]p. □

Theorem 2.3 (Strong duality) Let x⁰ be a local or global optimal solution of (P′) at which a constraint qualification is satisfied. Then there exists y⁰ ∈ ℝ^m such that (x⁰, y⁰, p = 0) is feasible for (2D′) and the corresponding values of (P′) and (2D′) are equal there. If also, for each v, f − yᵀg − vᵀ[·] is second order pseudo-convex in (x, u) whenever (x, u, y, p) is feasible for (P′) and (2D′), then x⁰ and (x⁰, y⁰, p = 0) are global optimal solutions for (P′) and (2D′) respectively.

Proof: Since a constraint qualification is satisfied at x⁰, by the necessary Kuhn-Tucker conditions there exists y⁰ ∈ ℝ^m such that

∇f(x⁰) ≥ ∇y⁰ᵀg(x⁰), x⁰ᵀ[∇f(x⁰) − ∇y⁰ᵀg(x⁰)] = 0, y⁰ᵀg(x⁰) = 0, y⁰ ≥ 0.

Thus (x⁰, y⁰, p = 0) is feasible for (2D′) and the corresponding values of (P′) and (2D′) are equal. If f − yᵀg − vᵀ[·] is second order pseudo-convex in (x, u), then by weak duality, x⁰ and (x⁰, y⁰, p = 0) must be optimal for (P′) and (2D′) respectively. □

Before deriving a general second order dual of (P′), we first list some special cases.
(2D1′)  max f(u) − ½pᵀ∇²f(u)p
        s.t. ∇f(u) + ∇²f(u)p ≥ ∇yᵀg(u) + ∇²yᵀg(u)p,
             yᵀg(u) + uᵀ[∇f(u) − ∇yᵀg(u) + ∇²f(u)p − ∇²yᵀg(u)p] − ½pᵀ∇²yᵀg(u)p ≤ 0,
             y ≥ 0

(2D2′)  max f(u) − yᵀg(u) − ½pᵀ[∇²f(u) − ∇²yᵀg(u)]p
        s.t. ∇f(u) + ∇²f(u)p ≥ ∇yᵀg(u) + ∇²yᵀg(u)p,
             uᵀ[∇f(u) − ∇yᵀg(u) + ∇²f(u)p − ∇²yᵀg(u)p] ≤ 0,
             y ≥ 0

(2D3′)  max f(u) − uᵀ[∇f(u) − ∇yᵀg(u) + ∇²f(u)p − ∇²yᵀg(u)p] − ½pᵀ∇²f(u)p
        s.t. ∇f(u) + ∇²f(u)p ≥ ∇yᵀg(u) + ∇²yᵀg(u)p,
             yᵀg(u) − ½pᵀ∇²yᵀg(u)p ≤ 0,
             y ≥ 0

(2D1′) is a dual to (P′) under the assumption that f is second order pseudo-convex and yᵀg + vᵀ[·] is second order quasi-concave in (x, u) for all feasible (x, u, y, p). (2D2′) is a dual to (P′) if f − yᵀg is second order pseudo-convex, and (2D3′) is a dual to (P′) if f − vᵀ[·] is second order pseudo-convex and yᵀg is second order quasi-concave in (x, u) for all feasible (x, u, y, p). (These will be shown later as special cases of a general result.) Observe that xᵀv, for all v, will be second order concave, and hence second order quasi-concave in x, and so this condition does not have to be stated in (2D2′). Other second order duals to (P′) are possible, with the components g_i of g grouped in different ways, depending on the convexity conditions of f and g. We now give a general second order dual to (P′). Let M = {1, 2, …, m} and N = {1, 2, …, n}. Let I_a ⊂ M, a = 0, 1, 2, …, r, with I_a ∩ I_β = ∅ for a ≠ β and ∪_{a=0,1,…,r} I_a = M. Let J_a ⊂ N, a = 0, 1, 2, …, r, with J_a ∩ J_β = ∅ for a ≠ β and ∪_{a=0,1,…,r} J_a = N. Note that any particular I_a or J_a may be empty. Thus if M has r₁ disjoint subsets and N has r₂ disjoint subsets, r = max{r₁, r₂}. Thus, if r₁ > r₂ then J_a, a > r₂, is
empty.

(2DG′)  max f(u) − Σ_{i∈I_0} y_i g_i(u) − Σ_{j∈J_0} u_j [∇f(u) + ∇²f(u)p − ∇yᵀg(u) − ∇²yᵀg(u)p]_j
            − ½pᵀ[∇²f(u) − ∇²Σ_{i∈I_0} y_i g_i(u)]p
        s.t. ∇f(u) + ∇²f(u)p − ∇yᵀg(u) − ∇²yᵀg(u)p ≥ 0,  (6)
             Σ_{i∈I_a} y_i g_i(u) + Σ_{j∈J_a} u_j [∇f(u) + ∇²f(u)p − ∇yᵀg(u) − ∇²yᵀg(u)p]_j
                 − ½pᵀ∇²Σ_{i∈I_a} y_i g_i(u) p ≤ 0,  a = 1, 2, …, r,
y ≥ 0.

Theorem 2.4 (Weak duality) Let x satisfy the constraints of (P′) and (u, y, p) satisfy the constraints of (2DG′). If f − Σ_{i∈I_0} y_i g_i − Σ_{j∈J_0} v_j [·]_j is second order pseudo-convex and Σ_{i∈I_a} y_i g_i + Σ_{j∈J_a} v_j [·]_j, a = 1, 2, …, r, is second order quasi-concave in (x, u) whenever (x, u, y, p) is feasible for (P′) and (2DG′), then

inf(P′) ≥ sup(2DG′).

Proof: From (6), let v = ∇f(u) + ∇²f(u)p − ∇yᵀg(u) − ∇²yᵀg(u)p. Then

Σ_{i∈I_a} y_i g_i(x) + Σ_{j∈J_a} x_j v_j − Σ_{i∈I_a} y_i g_i(u) − Σ_{j∈J_a} u_j v_j + ½pᵀ∇²Σ_{i∈I_a} y_i g_i(u) p ≥ 0,  a = 1, 2, …, r.

Since Σ_{i∈I_a} y_i g_i + Σ_{j∈J_a} v_j [·]_j is second order quasi-concave, this implies

(x − u)ᵀ∇{Σ_{i∈I_a} y_i g_i(u) + Σ_{j∈J_a} u_j v_j} + (x − u)ᵀ∇²{Σ_{i∈I_a} y_i g_i(u) + Σ_{j∈J_a} u_j v_j}p ≥ 0,  a = 1, 2, …, r,

the gradients being taken with respect to u with v held fixed. Summing over a = 1, 2, …, r,

(x − u)ᵀ∇{Σ_{i∈M\I_0} y_i g_i(u) + Σ_{j∈N\J_0} u_j v_j} + (x − u)ᵀ∇²Σ_{i∈M\I_0} y_i g_i(u) p ≥ 0,

or

(x − u)ᵀ{∇Σ_{i∈M\I_0} y_i g_i(u) + v − ∇Σ_{j∈J_0} u_j v_j + ∇²Σ_{i∈M\I_0} y_i g_i(u) p} ≥ 0.

Since v = ∇f(u) + ∇²f(u)p − ∇yᵀg(u) − ∇²yᵀg(u)p, we have

(x − u)ᵀ∇{f(u) − Σ_{i∈I_0} y_i g_i(u) − Σ_{j∈J_0} u_j v_j} + (x − u)ᵀ∇²{f(u) − Σ_{i∈I_0} y_i g_i(u)}p ≥ 0.

Since f − Σ_{i∈I_0} y_i g_i − Σ_{j∈J_0} v_j [·]_j is second order pseudo-convex, we have

f(x) − Σ_{i∈I_0} y_i g_i(x) − Σ_{j∈J_0} x_j v_j ≥ f(u) − Σ_{i∈I_0} y_i g_i(u) − Σ_{j∈J_0} u_j v_j − ½pᵀ[∇²f(u) − ∇²Σ_{i∈I_0} y_i g_i(u)]p.

By y ≥ 0, g(x) ≥ 0, x ≥ 0 and v = ∇f(u) + ∇²f(u)p − ∇yᵀg(u) − ∇²yᵀg(u)p ≥ 0, we have

f(x) ≥ f(u) − Σ_{i∈I_0} y_i g_i(u) − Σ_{j∈J_0} u_j [∇f(u) + ∇²f(u)p − ∇yᵀg(u) − ∇²yᵀg(u)p]_j − ½pᵀ[∇²f(u) − ∇²Σ_{i∈I_0} y_i g_i(u)]p. □

Theorem 2.5 (Strong duality) Let x⁰ be a local or global optimal solution of (P′) at which a constraint qualification is satisfied. Then there exists y⁰ ∈ ℝ^m such that (x⁰, y⁰, p = 0) is feasible for (2DG′) and the corresponding values of (P′) and (2DG′) are equal. If also, for each v, f − Σ_{i∈I_0} y_i g_i − Σ_{j∈J_0} v_j [·]_j is second order pseudo-convex and Σ_{i∈I_a} y_i g_i + Σ_{j∈J_a} v_j [·]_j, a = 1, 2, …, r, is second order quasi-concave in (x, u) whenever (x, u, y, p) is feasible for (P′) and (2DG′), then x⁰ and (x⁰, y⁰, p = 0) are global optimal solutions for (P′) and (2DG′) respectively.

Proof: Since a constraint qualification is satisfied at x⁰, by the necessary Kuhn-Tucker conditions there exists y⁰ ∈ ℝ^m such that

∇f(x⁰) ≥ ∇y⁰ᵀg(x⁰), x⁰ᵀ[∇f(x⁰) − ∇y⁰ᵀg(x⁰)] = 0, y⁰ᵀg(x⁰) = 0, y⁰ ≥ 0.
Thus (x⁰, y⁰, p = 0) is feasible for (2DG′) and the corresponding values of (P′) and (2DG′) are equal. Optimality then follows from weak duality if f − Σ_{i∈I_0} y_i g_i − Σ_{j∈J_0} v_j [·]_j is second order pseudo-convex and Σ_{i∈I_a} y_i g_i + Σ_{j∈J_a} v_j [·]_j, a = 1, 2, …, r, is second order quasi-concave in (x, u). □

We now consider some special cases of the dual (2DG′) and Theorems 2.4 and 2.5. If I_0 = M, J_0 = N, then (2DG′) becomes (2D′). From Theorems 2.2 and 2.3 or Theorems 2.4 and 2.5, (2D′) is a dual to (P′) if f − yᵀg − vᵀ[·] is second order pseudo-convex for all feasible (x, u, y, p). In particular, f − yᵀg − vᵀ[·] is second order pseudo-convex if f is second order convex and g is second order concave. If I_0 = ∅, J_0 = ∅, I_1 = M, J_1 = N, then (2DG′) becomes (2D1′). From Theorems 2.4 and 2.5, (2D1′) is a dual to (P′) if f is second order pseudo-convex and yᵀg + vᵀ[·] is second order quasi-concave for all feasible (x, u, y, p). If I_0 = M, J_0 = ∅, J_1 = N, then (2DG′) becomes (2D2′). From Theorems 2.4 and 2.5, (2D2′) is a dual to (P′) if f − yᵀg is second order pseudo-convex for all feasible (x, u, y, p). If I_0 = ∅, J_0 = N, I_1 = M, then (2DG′) becomes (2D3′). From Theorems 2.4 and 2.5, (2D3′) is a dual to (P′) if f − vᵀ[·] is second order pseudo-convex and yᵀg is second order quasi-concave for all feasible (x, u, y, p). If I_0 = ∅, J_0 = N and I_a = {a}, a = 1, 2, …, m, then (2DG′) becomes

(2D4′)  max f(u) − uᵀ[∇f(u) + ∇²f(u)p − ∇yᵀg(u) − ∇²yᵀg(u)p] − ½pᵀ∇²f(u)p
        s.t. ∇f(u) + ∇²f(u)p ≥ ∇yᵀg(u) + ∇²yᵀg(u)p,
             y_i g_i(u) − ½pᵀ∇²y_i g_i(u)p ≤ 0,  i = 1, 2, …, m,
             y ≥ 0.

From Theorems 2.4 and 2.5, (2D4′) is a dual to (P′) if f − vᵀ[·] is second order pseudo-convex and each y_i g_i, i = 1, 2, …, m, is second order quasi-concave for all feasible (x, u, y, p). Note that if g_i is second order quasi-concave and y_i ≥ 0, then y_i g_i is second order quasi-concave. Thus (2D4′) is a dual to (P′) if f − vᵀ[·] is second order pseudo-convex and g (i.e. each component of g) is second order quasi-concave. We now give a Mangasarian type strict converse duality theorem for the dual (2DG′) to (P′). A function f will be said to be second order strictly pseudo-convex at x* if, for all x ≠ x* and p,

(x − x*)ᵀ[∇f(x*) + ∇²f(x*)p] ≥ 0 ⇒ f(x) − f(x*) − ½pᵀ∇²f(x*)p > 0.
It will be said to be second order strictly pseudo-convex at x*, p* if, for all x ≠ x*,

(x − x*)ᵀ[∇f(x*) + ∇²f(x*)p*] ≥ 0 ⇒ f(x) − f(x*) − ½p*ᵀ∇²f(x*)p* > 0.
J2"J'A-] /f-- E K^ *_ ~ E N
be second order pseudo-convex
and let
*W EW W++5E>» JJ H H, ■€/<.
jeJa
l,2,,...r «a = l-2,,...r
be second order quasi-concave in (x,u) whenever (x,y,u,p) is feasible for (P') and (2DG'). If(x',y',p') is an optimal solution of (2DG1) and iff - E y*i9i - E !>,•[•] is second order strictly pseudo-convex /(*•) = /(x*) £ y:rfftV) /(.') - E gi(x')
at x',p',
then x" = x°, i.e. x'''solves
£ *,-[V/(i*) **[V/(x*) + + V 2 /(x*)p* - E
(P'°) and
T Vy* Vy' g(x')
1 2 x ^ p T j -- V //( ( x *) * ) -- E »*tf(**)]P* -- VvyVT^j,(*v]i V[TVtv2 E vt9i(**)W 2
.e/o
Proof: We assume that x" / x° and exhibit a contradiction. Since x° is a solution of (P') and a constraint qualification is satisfied at x°, it follows by strong duality that there exists y° € Rm,p = 0 such that {x°,y°,p = 0) solves (2DC). Hence = /(*•) /(*•) -- ££ yfr,(*») yfgi(x°) - £ £ x'j[Vf(x') x°[V/(x°) /(*•) =
- V^/^fx0)], Vy^(x-) = /(**) - E *?«(**) -" E ^*[V/(x*) - V;/*^(x*; + V + V22/(x*)p* /(x*)p* -- V VV^fx V^fx V VL L
- V[W(x*)-EvV5 .(-*)K ,(x*)K 2
(7)
iek
1
V*f(x')p'-Vy'Tg(x')-
Since (x',if,p*) is feasible for (2DC), if we let v = Vf(x*) + VyTg(x')p', then we have
E *:*(**) + E *to - E v?w(**) - E *>i *>; •e/a
ie^o
■£/„ T T
22
+ ^P* + ^P* V V
jeJc
£ ¥**(*') P*>0, P*>0,
Q == ll,2,...,r. ,2,...,r.
233
Also, since Σ_{i∈I_a} y*_i g_i + Σ_{j∈J_a} v_j [·]_j is second order quasi-concave, it follows that

(x⁰ − x*)ᵀ∇{Σ_{i∈I_a} y*_i g_i(x*) + Σ_{j∈J_a} x*_j v_j} + (x⁰ − x*)ᵀ∇²{Σ_{i∈I_a} y*_i g_i(x*) + Σ_{j∈J_a} x*_j v_j}p* ≥ 0,  a = 1, 2, …, r.

Thus, summing over a = 1, 2, …, r,

(x⁰ − x*)ᵀ{∇Σ_{i∈M\I_0} y*_i g_i(x*) + v − ∇Σ_{j∈J_0} x*_j v_j + ∇²Σ_{i∈M\I_0} y*_i g_i(x*) p*} ≥ 0.

Since v = ∇f(x*) + ∇²f(x*)p* − ∇y*ᵀg(x*) − ∇²y*ᵀg(x*)p*, it follows that

(x⁰ − x*)ᵀ∇{f(x*) − Σ_{i∈I_0} y*_i g_i(x*) − Σ_{j∈J_0} x*_j v_j} + (x⁰ − x*)ᵀ∇²{f(x*) − Σ_{i∈I_0} y*_i g_i(x*) − Σ_{j∈J_0} x*_j v_j}p* ≥ 0.

Since f − Σ_{i∈I_0} y*_i g_i − Σ_{j∈J_0} v_j [·]_j is second order strictly pseudo-convex at x*, p*, we have that

f(x⁰) − Σ_{i∈I_0} y*_i g_i(x⁰) − Σ_{j∈J_0} x⁰_j v_j > f(x*) − Σ_{i∈I_0} y*_i g_i(x*) − Σ_{j∈J_0} x*_j v_j − ½p*ᵀ[∇²f(x*) − ∇²Σ_{i∈I_0} y*_i g_i(x*)]p*,

which, from (7), implies

f(x⁰) − Σ_{i∈I_0} y*_i g_i(x⁰) − Σ_{j∈J_0} x⁰_j v_j > f(x⁰).

This is a contradiction since y* ≥ 0, g(x⁰) ≥ 0, x⁰ ≥ 0 and v = ∇f(x*) + ∇²f(x*)p* − ∇y*ᵀg(x*) − ∇²y*ᵀg(x*)p* ≥ 0. □
3
Higher Order Duality
In [8], Mangasarian gives the following higher order dual to (P):

(HD)  max f(u) + h(u, p) − yᵀg(u) − yᵀk(u, p)
      s.t. ∇_p h(u, p) = ∇_p (yᵀk(u, p)),
           y ≥ 0

where h : ℝ^n × ℝ^n → ℝ and k : ℝ^n × ℝ^n → ℝ^m are differentiable functions; ∇_p h(u, p) denotes the n × 1 gradient of h with respect to p and ∇_p (yᵀk(u, p)) denotes the n × 1 gradient of yᵀk with respect to p. Note that for the first order dual (1D), h(x, p) = pᵀ∇f(x) and k_i(x, p) = pᵀ∇g_i(x), i = 1, 2, …, m; while for the second order dual (2D), h(x, p) = pᵀ∇f(x) + ½pᵀ∇²f(x)p and k_i(x, p) = pᵀ∇g_i(x) + ½pᵀ∇²g_i(x)p, i = 1, 2, …, m. Mangasarian, however, does not prove a weak duality theorem for (P) and (HD) and only gives a limited version of strong duality. In [11] Mond and Weir give conditions for which weak duality holds between (P) and (HD), prove strong duality for (P) and (HD) and consider other higher order duals to (P). We now give a higher order dual (HD′) to (P′) and conditions for which duality holds between (P′) and (HD′), and also consider other higher order duals to (P′).
(HD′)  max f(u) + h(u, p) − yᵀg(u) − yᵀk(u, p) − (u + p)ᵀ[∇_p h(u, p) − ∇_p yᵀk(u, p)]
       s.t. ∇_p h(u, p) ≥ ∇_p (yᵀk(u, p)),  (8)
            y ≥ 0
Theorem 3.1 (Weak duality) Let x be feasible for (P′) and (u, y, p) feasible for (HD′). If for all feasible (x, u, y, p)

f(x) − f(u) ≥ (x − u)ᵀ∇_p h(u, p) + h(u, p) − pᵀ(∇_p h(u, p))  (9)

and

g_i(x) − g_i(u) ≤ (x − u)ᵀ∇_p k_i(u, p) + k_i(u, p) − pᵀ(∇_p k_i(u, p)),  i = 1, 2, …, m,  (10)

then

inf(P′) ≥ sup(HD′).

Proof:

f(x) ≥ f(u) + (x − u)ᵀ∇_p h(u, p) + h(u, p) − pᵀ(∇_p h(u, p))
     = f(u) + xᵀ∇_p h(u, p) − (u + p)ᵀ∇_p h(u, p) + h(u, p)
     ≥ f(u) + xᵀ∇_p (yᵀk(u, p)) − (u + p)ᵀ∇_p h(u, p) + h(u, p)
     = f(u) + (x − u)ᵀ∇_p (yᵀk(u, p)) + uᵀ∇_p (yᵀk(u, p)) + h(u, p) − (u + p)ᵀ∇_p h(u, p)
     ≥ f(u) + yᵀg(x) − yᵀg(u) − yᵀk(u, p) + pᵀ∇_p (yᵀk(u, p)) + uᵀ∇_p (yᵀk(u, p)) + h(u, p) − (u + p)ᵀ∇_p h(u, p)
     = f(u) + yᵀg(x) − yᵀg(u) + h(u, p) − yᵀk(u, p) − (u + p)ᵀ[∇_p h(u, p) − ∇_p yᵀk(u, p)]
     ≥ f(u) − yᵀg(u) + h(u, p) − yᵀk(u, p) − (u + p)ᵀ[∇_p h(u, p) − ∇_p yᵀk(u, p)].

(The first inequality holds by assumption (9), the second holds since x ≥ 0 and ∇_p h(u, p) ≥ ∇_p yᵀk(u, p), the third holds by assumption (10), and the last holds since y ≥ 0, g(x) ≥ 0.) □
(The first inequality holds by assumption (9), the second inequality holds by x > 0, Vph(u,p) > V p j/ T fc(u,p), the third inequality holds by assumption (10). The last one holds since y > 0,g(x) > 0). D Theorem 3.2 (Strong duality) Let x° be a local or global optimal solution of (P') at which a constraint qualifica tion is satisfied and let h(x",0) = 0,*(»°,0) = 0, Vph(x',0)
= Vf(x°),Vpk(x°,0)
= Vg(x°)
(11)
then there exists y" 6 Rm such that (x°,y°,p = 0) is feasible for (HD1) and the corresponding values of (P1) and (HD') are equal. If also (9) and (10) are satisfied for all feasible (x,u,y,p), then x" and (x°,y",p = 0) are global optimal solutions for (P1) and (HD') respectively. Proof: Since a constraint qualification [7] is satisfied at x° by the necessary KuhnTucker conditions [4] there exists y" £ Rm such that V/(x°) > Vy°Tg(x°) z° [V/(x°) - Vy°Tg(x°)} = 0 y°Tg(x°) = 0, t / ° > 0 T
Thus, from (11), (x°,j/°,p = 0) is feasible for (HD') and the corresponding values of (P') and (HD') are equal. If (9) and (10) hold, then by Theorem 3.1, x° and (x°,y°,p = 0) must be global optimal solutions for (P') and (HD') respectively. D Remarks. If h(x,p) = p T V / ( x ) + | p T V 2 / ( x ) p then (9) becomes the second order convexity condition given by Mond [9] and Mahajan [5]. If fc;(x,p) = pTVg>;(x) + |p T V 2 <7,(x)p, then (10) becomes the second order concavity condition given in Mond [9] and Mahajan [5]. Also, conditions (11) are satisfied if h(x,p) = p r V / ( x ) + \pTV2f{x)p and k(x,p)
= pTV9i{x)
+ -pTV2gi(x)p,
i = 1,2,... ,m.
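The last claim of the remarks can be checked directly; nothing beyond the stated choice of h is assumed:

```latex
\text{For } h(x,p) = p^{T}\nabla f(x) + \tfrac{1}{2}\,p^{T}\nabla^{2} f(x)\,p:\qquad
h(x,0) = 0,\qquad
\nabla_{p} h(x,p) = \nabla f(x) + \nabla^{2} f(x)\,p
\;\Longrightarrow\; \nabla_{p} h(x,0) = \nabla f(x),
```

and identically k_i(x,0) = 0, ∇_p k_i(x,0) = ∇g_i(x), so all four requirements in (11) hold at any x.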
We now show that weak duality between (P') and (HD') holds under weaker convexity conditions than those given in Theorem 3.1.
Theorem 3.3 (Weak duality). Let x be feasible for (P') and (u,y,p) feasible for (HD'). If for all feasible (x,u,y,p) and v ∈ R^n,

(x − u)^T ∇_p{h(u,p) − y^T k(u,p) − p^T v} ≥ 0
⟹ f(x) − y^T g(x) − x^T v − (f(u) − y^T g(u) − u^T v) − (h(u,p) − y^T k(u,p)) + p^T[∇_p h(u,p) − ∇_p y^T k(u,p)] ≥ 0,   (12)

then inf(P') ≥ sup(HD').

Proof: Since (u,y,p) is feasible for (HD'), from (8) we let

v = ∇_p h(u,p) − ∇_p y^T k(u,p) ≥ 0.   (13)

Then, holding v fixed,

(x − u)^T ∇_p{h(u,p) − y^T k(u,p) − p^T v} = (x − u)^T{∇_p h(u,p) − ∇_p y^T k(u,p) − ∇_p(p^T v)} = (x − u)^T{∇_p h(u,p) − ∇_p y^T k(u,p) − v} = 0  (by (13)).

From (12) it follows that

f(x) − y^T g(x) − x^T v − (f(u) − y^T g(u) − u^T v) − (h(u,p) − y^T k(u,p)) + p^T[∇_p h(u,p) − ∇_p y^T k(u,p)] ≥ 0.

Since y ≥ 0, g(x) ≥ 0, x ≥ 0, (8) and (13), we have

f(x) ≥ f(u) − y^T g(u) + h(u,p) − y^T k(u,p) − (u + p)^T[∇_p h(u,p) − ∇_p y^T k(u,p)]. □
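The key step in the proof is that ∇_p(p^T v) = v once v is frozen at the value (13), so the gradient in the hypothesis of (12) vanishes. A small numerical sanity check of this identity on toy one-dimensional data (illustrative, not from the paper):

```python
# Finite-difference check: for fixed v,
# d/dp [h(u,p) - y k(u,p) - p v] = dh/dp - y dk/dp - v,
# which is zero when v := dh/dp - y dk/dp at the same point p.
def h(u, p): return 2.0 * u * p + p * p   # second order h for f(x) = x^2
def k(u, p): return p                     # second order k for g(x) = x - 1

def num_diff(phi, p, eps=1e-6):
    """Central finite difference of phi at p."""
    return (phi(p + eps) - phi(p - eps)) / (2.0 * eps)

u, y, p = 1.3, 0.7, 0.4
dh_dp = 2.0 * u + 2.0 * p
dk_dp = 1.0
v = dh_dp - y * dk_dp                     # the choice (13)

lhs = num_diff(lambda q: h(u, q) - y * k(u, q) - q * v, p)
assert abs(lhs - (dh_dp - y * dk_dp - v)) < 1e-6
assert abs(lhs) < 1e-6                    # the gradient vanishes at this v
```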
Remarks. If f satisfies (9) and g satisfies (10), then f − y^T g − v^T[·] satisfies (12). If h(u,p) = p^T ∇f(u) + ½p^T ∇²f(u)p and k_i(u,p) = p^T ∇g_i(u) + ½p^T ∇²g_i(u)p, i = 1,2,…,m, then (12) implies that f − y^T g − v^T[·] is second order pseudo-convex as defined by Mahajan [5]. Also, strong duality between (P') and (HD') still holds if conditions (9) and (10) are replaced by condition (12).

Other higher order duals to (P') are also possible. For example, under suitable conditions, the problem

(HD'')  max f(u) + h(u,p) − p^T ∇_p h(u,p)
        s.t. ∇_p h(u,p) − ∇_p(y^T k(u,p)) ≥ 0,   (14)
             y^T g(u) + u^T[∇_p h(u,p) − ∇_p y^T k(u,p)] + y^T k(u,p) − p^T ∇_p y^T k(u,p) ≤ 0,   (15)
             y ≥ 0

is a dual to (P').
Theorem 3.4 (Weak duality). Let x be feasible for (P') and (u,y,p) feasible for (HD''). If, for all feasible (x,u,y,p) and v ∈ R^n,

(x − u)^T ∇_p h(u,p) ≥ 0 ⟹ f(x) − f(u) − h(u,p) + p^T ∇_p h(u,p) ≥ 0   (16)

and

y^T g(x) + x^T v − y^T g(u) − u^T v − y^T k(u,p) + p^T ∇_p y^T k(u,p) ≥ 0 ⟹ (x − u)^T ∇_p{y^T k(u,p) + p^T v} ≥ 0,   (17)

then inf(P') ≥ sup(HD'').

Proof: Since x is feasible for (P') and (u,y,p) is feasible for (HD''), we let

v = ∇_p h(u,p) − ∇_p y^T k(u,p).

Then from (14), (15), y ≥ 0, g(x) ≥ 0 and x ≥ 0,

y^T g(x) + x^T v − y^T g(u) − u^T v − y^T k(u,p) + p^T ∇_p y^T k(u,p) ≥ 0.

From (17) it follows that (x − u)^T ∇_p{y^T k(u,p) + p^T v} ≥ 0, that is,

(x − u)^T{∇_p y^T k(u,p) + v} ≥ 0
⟹ (x − u)^T ∇_p h(u,p) ≥ 0   (by v = ∇_p h(u,p) − ∇_p y^T k(u,p))
⟹ f(x) − f(u) − h(u,p) + p^T ∇_p h(u,p) ≥ 0   (by (16)),

i.e., f(x) ≥ f(u) + h(u,p) − p^T ∇_p h(u,p), the objective of (HD''). □
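On the same hypothetical toy problem used earlier (min x² s.t. x − 1 ≥ 0, x ≥ 0, with the second order h and k), one can spot-check feasibility for (HD'') and the coincidence of values at the Kuhn–Tucker point; a minimal sketch:

```python
# Illustrative check of (HD'') on assumed toy data: f(x)=x^2, g(x)=x-1,
# h(u,p) = 2up + p^2, k(u,p) = p.
def hd2_objective(u, p):
    h = 2.0 * u * p + p * p
    dh_dp = 2.0 * u + 2.0 * p
    return u * u + h - p * dh_dp            # f(u) + h - p dh/dp

def hd2_feasible(u, y, p):
    k = p
    dh_dp, dk_dp = 2.0 * u + 2.0 * p, 1.0
    c14 = dh_dp - y * dk_dp >= 0.0                      # (14)
    c15 = (y * (u - 1.0) + u * (dh_dp - y * dk_dp)
           + y * k - p * y * dk_dp) <= 1e-12            # (15)
    return c14 and c15 and y >= 0.0

u, y, p = 1.0, 2.0, 0.0        # Kuhn-Tucker point of (P') with p = 0
assert hd2_feasible(u, y, p)
assert abs(hd2_objective(u, p) - 1.0) < 1e-12   # equals min of (P')
```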
Theorem 3.5 (Strong duality). Let x° be a local or global optimal solution of (P') at which a constraint qualification is satisfied, and let

h(x°,0) = 0,  k(x°,0) = 0,  ∇_p h(x°,0) = ∇f(x°),  ∇_p k(x°,0) = ∇g(x°).   (18)

Then there exists y° ∈ R^m such that (x°, y°, p = 0) is feasible for (HD'') and the corresponding values of (P') and (HD'') are equal. If also (16) and (17) are satisfied for all feasible (x,u,y,p), then x° and (x°, y°, p = 0) are global optimal solutions for (P') and (HD'') respectively.

Proof: Since a constraint qualification [7] is satisfied at x°, by the necessary Kuhn–Tucker conditions [4] there exists y° ∈ R^m such that

∇f(x°) ≥ ∇y°^T g(x°),  x°^T[∇f(x°) − ∇y°^T g(x°)] = 0,  y°^T g(x°) = 0,  y° ≥ 0.
Thus, from (18), (x°, y°, p = 0) is feasible for (HD'') and the corresponding values of (P') and (HD'') are equal. If (16) and (17) hold, then by Theorem 3.4, x° and (x°, y°, p = 0) must be global optimal solutions for (P') and (HD'') respectively. □

Remarks. If h(u,p) = p^T ∇f(u) + ½p^T ∇²f(u)p and k_i(u,p) = p^T ∇g_i(u) + ½p^T ∇²g_i(u)p, i = 1,2,…,m, then conditions (16) and (17) reduce to second order pseudo-convexity of f and second order quasi-concavity of y^T g + v^T[·], and the higher order dual (HD'') reduces to the second order dual (2D').

We now formulate a general higher order dual to (P'). Let I_α ⊂ M = {1,2,…,m} and J_α ⊂ N = {1,2,…,n}, α = 0,1,2,…,r, with

∪_{α=0}^{r} I_α = M,  I_α ∩ I_β = ∅ if α ≠ β,  and  ∪_{α=0}^{r} J_α = N,  J_α ∩ J_β = ∅ if α ≠ β.

Note that any particular I_α or J_α may be empty. Thus if M has r₁ disjoint subsets and N has r₂ disjoint subsets, r = max{r₁, r₂}; if r₁ > r₂, then J_α, α > r₂, is empty. Consider the problem (HDG)
max / ( « ) + h(u,p) - ^2(y,gi(u) <e/o
-pT[Vph(u,P) -
+ j/ifc,(u,p))
VPJ2VMU,P)}
- Yl «J[ V P%>P) ~ vp!/rfc(u,p)]j jeJo
s.t.
Vph(u,p)-
V„yTk(u,p)
E t e ( u ) + UiWu
>0
(19)
Yl VMU,P)
«iIV,fc(w, P) - V„yTA:(U, p)]j < 0 a = l,2,...,r,
T h e o r e m 3.6 (Weak duality).
(20) y>0
Let x be feasible for (P') and (u,y,p) feasible for (HDG). If, for all feasible (x,u,y,p) and v ∈ R^n,

(x − u)^T[∇_p h(u,p) − ∇_p Σ_{i∈I_0} y_i k_i(u,p) − ∇_p Σ_{j∈J_0} p_j v_j] ≥ 0
⟹ f(x) − Σ_{i∈I_0} y_i g_i(x) − Σ_{j∈J_0} x_j v_j − (f(u) − Σ_{i∈I_0} y_i g_i(u) − Σ_{j∈J_0} u_j v_j) − (h(u,p) − Σ_{i∈I_0} y_i k_i(u,p)) + p^T[∇_p h(u,p) − ∇_p Σ_{i∈I_0} y_i k_i(u,p)] ≥ 0   (21)

and

Σ_{i∈I_α} y_i g_i(x) + Σ_{j∈J_α} x_j v_j − Σ_{i∈I_α} y_i g_i(u) − Σ_{j∈J_α} u_j v_j − Σ_{i∈I_α} y_i k_i(u,p) + p^T ∇_p(Σ_{i∈I_α} y_i k_i(u,p)) ≥ 0
⟹ (x − u)^T ∇_p{Σ_{i∈I_α} y_i k_i(u,p) + Σ_{j∈J_α} p_j v_j} ≥ 0,  α = 1,2,…,r,   (22)
then inf(P') ≥ sup(HDG).

Proof: Since x is feasible for (P') and (u,y,p) is feasible for (HDG), if we let

v = ∇_p h(u,p) − ∇_p y^T k(u,p),   (23)

then

Σ_{i∈I_α} y_i g_i(x) + Σ_{j∈J_α} x_j v_j − (Σ_{i∈I_α} y_i g_i(u) + Σ_{j∈J_α} u_j v_j) − Σ_{i∈I_α} y_i k_i(u,p) + p^T ∇_p(Σ_{i∈I_α} y_i k_i(u,p)) ≥ 0,  α = 1,2,…,r.

Hence by (22),

(x − u)^T{∇_p(Σ_{i∈I_α} y_i k_i(u,p)) + ∇_p Σ_{j∈J_α} p_j v_j} ≥ 0,  α = 1,2,…,r.

Thus

(x − u)^T{∇_p Σ_{i∈M\I_0} y_i k_i(u,p) + ∇_p Σ_{j∈N\J_0} p_j v_j} ≥ 0
⟹ (x − u)^T{∇_p Σ_{i∈M\I_0} y_i k_i(u,p) + v − ∇_p Σ_{j∈J_0} p_j v_j} ≥ 0
⟹ (x − u)^T{∇_p h(u,p) − ∇_p Σ_{i∈I_0} y_i k_i(u,p) − ∇_p Σ_{j∈J_0} p_j v_j} ≥ 0   (by (23)).
From (21) it follows that

f(x) − Σ_{i∈I_0} y_i g_i(x) − Σ_{j∈J_0} x_j v_j − [f(u) − Σ_{i∈I_0} y_i g_i(u) − Σ_{j∈J_0} u_j v_j] − (h(u,p) − Σ_{i∈I_0} y_i k_i(u,p)) + p^T[∇_p h(u,p) − ∇_p Σ_{i∈I_0} y_i k_i(u,p)] ≥ 0.

Since x is feasible for (P') and (u,y,p) is feasible for (HDG), and (23) holds, we have

f(x) ≥ f(u) + h(u,p) − Σ_{i∈I_0}(y_i g_i(u) + y_i k_i(u,p)) − p^T[∇_p h(u,p) − ∇_p Σ_{i∈I_0} y_i k_i(u,p)] − Σ_{j∈J_0} u_j[∇_p h(u,p) − ∇_p y^T k(u,p)]_j. □
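The remarks below note that the choice I_0 = M, J_0 = N collapses (HDG) back to (HD'). A quick numerical check of that reduction, with hypothetical one-dimensional data and the second order choice of h and k:

```python
# Toy data (illustrative): f(x) = x^2, g(x) = x - 1, so m = n = 1,
# h(u,p) = 2up + p^2, k(u,p) = p.
def pieces(u, p):
    h = 2.0 * u * p + p * p
    k = p
    dh_dp, dk_dp = 2.0 * u + 2.0 * p, 1.0
    return h, k, dh_dp, dk_dp

def hdprime_obj(u, y, p):
    h, k, dh, dk = pieces(u, p)
    return u * u + h - y * (u - 1.0) - y * k - (u + p) * (dh - y * dk)

def hdg_obj_I0_full(u, y, p):
    # I_0 = M = {1}, J_0 = N = {1}: every index in the alpha = 0 group.
    h, k, dh, dk = pieces(u, p)
    return (u * u + h - (y * (u - 1.0) + y * k)
            - p * (dh - y * dk) - u * (dh - y * dk))

for (u, y, p) in [(1.0, 2.0, 0.0), (0.8, 1.5, 0.2), (2.0, 0.5, -0.3)]:
    assert abs(hdprime_obj(u, y, p) - hdg_obj_I0_full(u, y, p)) < 1e-12
```

The two objectives agree identically because −(u + p)^T[·] distributes into the separate −p^T[·] and −Σ u_j[·]_j terms of (HDG).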
Theorem 3.7 (Strong duality) Let x° be a local or global optimal solution of (P') at which a constraint qualification is satisfied, and let conditions (11) be satisfied. Then there exists y° ∈ R^m such that (x°, y°, p = 0) is feasible for (HDG) and the corresponding values of (P') and (HDG) are equal. If also (21) and (22) are satisfied for all feasible (x,u,y,p), then x° and (x°, y°, p = 0) are global optimal solutions for (P') and (HDG) respectively.

Proof: Since a constraint qualification [7] is satisfied at x°, by the necessary Kuhn–Tucker conditions [4] there exists y° ∈ R^m such that

∇y°^T g(x°) ≤ ∇f(x°),  x°^T[∇f(x°) − ∇y°^T g(x°)] = 0,  y°^T g(x°) = 0,  y° ≥ 0.

Thus, from (11), (x°, y°, p = 0) is feasible for (HDG) and the values of (P') and (HDG) are equal. If (21) and (22) are satisfied, then, by Theorem 3.6, x° and (x°, y°, p = 0) are global optimal solutions for (P') and (HDG) respectively. □

Remarks. If I_0 = M, J_0 = N, then (HDG) becomes the higher order dual (HD'); in addition, conditions (21) and (22) become condition (12). If I_0 = ∅, I_1 = M, J_0 = ∅, J_1 = N, then (HDG) becomes the higher order dual (HD''). In general, conditions (21) and (22) become second order pseudo-convexity of f − Σ_{i∈I_0} y_i g_i − Σ_{j∈J_0} v_j[·]_j and second order quasi-concavity of Σ_{i∈I_α} y_i g_i + Σ_{j∈J_α} v_j[·]_j, α = 1,2,…,r, respectively.

We now give a Mangasarian type [7] strict converse duality theorem for the higher order dual (HDG) to (P').

Theorem 3.8 (Converse duality) Let x° be an optimal solution of (P') and let a constraint qualification be satisfied at x°. Let condition (11) be satisfied at x° and let conditions (21) and (22) be satisfied for all feasible (x,u,y,p). If (x*, y*, p*) is an optimal solution of (HDG) and if, for all x ≠ x* and v ∈ R^n,

(x − x*)^T[∇_p h(x*,p*) − ∇_p(Σ_{i∈I_0} y_i^* k_i(x*,p*)) − ∇_p(Σ_{j∈J_0} p_j^* v_j)] ≥ 0
⟹ f(x) − Σ_{i∈I_0} y_i^* g_i(x) − Σ_{j∈J_0} x_j v_j − (f(x*) − Σ_{i∈I_0} y_i^* g_i(x*) − Σ_{j∈J_0} x_j^* v_j) − (h(x*,p*) − Σ_{i∈I_0} y_i^* k_i(x*,p*)) + p*^T[∇_p h(x*,p*) − ∇_p Σ_{i∈I_0} y_i^* k_i(x*,p*)] > 0,   (24)

then x* = x°, i.e., x* solves (P'), and

f(x°) = f(x*) + h(x*,p*) − Σ_{i∈I_0}[y_i^* g_i(x*) + y_i^* k_i(x*,p*)] − p*^T[∇_p h(x*,p*) − ∇_p Σ_{i∈I_0} y_i^* k_i(x*,p*)] − Σ_{j∈J_0} x_j^*[∇_p h(x*,p*) − ∇_p y*^T k(x*,p*)]_j.
Proof: We assume x° ≠ x* and exhibit a contradiction. Since x° is a solution of (P') and a constraint qualification is satisfied at x°, it follows by strong duality that there exists y° ∈ R^m, p = 0, such that (x°, y°, p = 0) solves (HDG). Hence

f(x°) = f(x°) − Σ_{i∈I_0} y_i° g_i(x°) − Σ_{j∈J_0} x_j°[∇f(x°) − ∇y°^T g(x°)]_j   (by (11))
      = f(x*) + h(x*,p*) − Σ_{i∈I_0}[y_i^* g_i(x*) + y_i^* k_i(x*,p*)] − p*^T[∇_p h(x*,p*) − ∇_p Σ_{i∈I_0} y_i^* k_i(x*,p*)] − Σ_{j∈J_0} x_j^*[∇_p h(x*,p*) − ∇_p y*^T k(x*,p*)]_j.   (25)
Also, since (x*, y*, p*) is feasible for (HDG), from (19) if we let v = ∇_p h(x*,p*) − ∇_p y*^T k(x*,p*), then we have, for α = 1,2,…,r,

Σ_{i∈I_α} y_i^* g_i(x°) + Σ_{j∈J_α} x_j° v_j − (Σ_{i∈I_α} y_i^* g_i(x*) + Σ_{j∈J_α} x_j^* v_j) − Σ_{i∈I_α} y_i^* k_i(x*,p*) + p*^T ∇_p(Σ_{i∈I_α} y_i^* k_i(x*,p*)) ≥ 0,

and so by (22),

(x° − x*)^T ∇_p{Σ_{i∈I_α} y_i^* k_i(x*,p*) + Σ_{j∈J_α} p_j^* v_j} ≥ 0,  α = 1,2,…,r.

Hence

(x° − x*)^T{∇_p Σ_{i∈M\I_0} y_i^* k_i(x*,p*) + ∇_p Σ_{j∈N\J_0} p_j^* v_j} ≥ 0
⟹ (x° − x*)^T{∇_p Σ_{i∈M\I_0} y_i^* k_i(x*,p*) + v − ∇_p Σ_{j∈J_0} p_j^* v_j} ≥ 0.

Since v = ∇_p h(x*,p*) − ∇_p y*^T k(x*,p*), we have

(x° − x*)^T{∇_p h(x*,p*) − ∇_p Σ_{i∈I_0} y_i^* k_i(x*,p*) − ∇_p Σ_{j∈J_0} p_j^* v_j} ≥ 0,

and by (24) it follows that

f(x°) − Σ_{i∈I_0} y_i^* g_i(x°) − Σ_{j∈J_0} x_j° v_j − [f(x*) − Σ_{i∈I_0} y_i^* g_i(x*) − Σ_{j∈J_0} x_j^* v_j] − (h(x*,p*) − Σ_{i∈I_0} y_i^* k_i(x*,p*)) + p*^T[∇_p h(x*,p*) − ∇_p Σ_{i∈I_0} y_i^* k_i(x*,p*)] > 0.   (26)

But from (25) and (26) and v = ∇_p h(x*,p*) − ∇_p y*^T k(x*,p*) we get

Σ_{i∈I_0} y_i^* g_i(x°) + Σ_{j∈J_0} x_j°[∇_p h(x*,p*) − ∇_p y*^T k(x*,p*)]_j < 0,

which is a contradiction since x° is feasible for (P') and (x*, y*, p*) is feasible for (HDG). □

A number of papers containing second order duals for programming problems with non-differentiable functions can be found in the literature; see, e.g., Bector and Chandra [2,3]. There too the problems considered do not require that the variables be non-negative. The method and results given here are applicable to these non-smooth problems as well and will be further discussed in a subsequent paper.

In [13] Qi considered LC¹ optimization problems, i.e., problems where the functions and constraints are differentiable and the derivatives are locally Lipschitzian. He showed that many results proved for problems with second order differentiable functions actually hold for LC¹ problems. The possibility of establishing higher order duality results for LC¹ optimization problems will subsequently be considered.
Generalized Convexity and Higher Order Duality
243
References

[1] C. R. Bector, M. K. Bector and J. E. Klassen, Duality for a nonlinear programming problem, Utilitas Mathematica 11 (1977) 87-99.
[2] C. R. Bector and S. Chandra, First and second order duality for a class of nondifferentiable programming problems, J. Inf. Opt. Sci. 7 (1986) 335-348.
[3] C. R. Bector and S. Chandra, Second order duality with nondifferentiable functions, in manuscript.
[4] H. W. Kuhn and A. W. Tucker, Nonlinear programming, Proceedings of the 2nd Berkeley Symposium on Mathematical Statistics and Probability, University of California Press (1951) 481-492.
[5] D. G. Mahajan, Contributions to optimality conditions and duality theory in nonlinear programming, PhD thesis, Indian Institute of Technology, Bombay, India (1977).
[6] D. G. Mahajan and M. N. Vartak, Generalization of some duality theorems in nonlinear programming, Mathematical Programming 12 (1977) 293-317.
[7] O. L. Mangasarian, Nonlinear Programming, McGraw-Hill, New York (1969).
[8] O. L. Mangasarian, Second and higher-order duality in nonlinear programming, Journal of Mathematical Analysis and Applications 51 (1975) 607-620.
[9] B. Mond, Second order duality for nonlinear programming, Opsearch, Journal of the Operational Research Society of India 11 (1974) 90-99.
[10] B. Mond and T. Weir, Generalized concavity and duality, in S. Schaible and W. T. Ziemba (eds.), Generalized Concavity in Optimization and Economics, Academic Press, New York (1981) 263-279.
[11] B. Mond and T. Weir, Generalized convexity and higher order duality, Journal of Mathematical Sciences 16-18 (1981-1983) 74-94.
[12] B. Mond and J. Zhang, Generalized convexity and duality of the nonlinear programming with non-negative variable, Proceedings of a Symposium on Optimization, Ballarat, Australia, 14 July 1994, to appear.
[13] L. Qi, Superlinearly convergent approximate Newton methods for LC¹ optimization problems, Mathematical Programming 64 (1994) 277-294.
[14] P. Wolfe, A duality theorem for nonlinear programming, Quarterly of Applied Mathematics 18 (1961) 239-244.
244
W. Oettli and P. H. Sach
Recent Advances in Nonsmooth Optimization, pp. 244-260 Eds. D.-Z. Du, L. Qi and R.S. Womersley ©1995 World Scientific Publishing Co Pte Ltd
Prederivatives and Second Order Conditions for Infinite Optimization Problems

Werner Oettli
Universität Mannheim, 68131 Mannheim, Germany

Pham Huu Sach
Institute of Mathematics, Box 631 Bo Ho, 10000 Hanoi, Vietnam
Abstract
This paper deals with the problem of minimizing a supremum function over a subset C of a topological vector space X. Second order necessary and sufficient optimality conditions are written in terms of some approximations of the data of the problem.
1
Introduction
In what follows we consider the infinite optimization problem

(P)  min{ f(x) := sup_{t∈T} f_t(x) : x ∈ C }.

Here T ≠ ∅ is a topological space, C ≠ ∅ is a subset of some real topological vector space X, say, and f_t : C → R for all t ∈ T. We shall derive necessary second order conditions for a local minimum of (P), and sufficient second order conditions for a strict local minimum of (P). We do not work with (explicitly defined) derivatives, but instead with prederivatives, i.e., with approximations having specified properties. It will be shown that various forms of derivatives fit into our general model. If the underlying space X is finite-dimensional, then problem (P) becomes an instance of what is commonly called a semi-infinite programming problem. Various
Prederivatives and Second Order Conditions
245
forms of second order conditions for semi-infinite programming problems, mostly for the case C = X, have been given in [2, 7, 9, 12, 13, 14, 19, 20, 24]. Here X will be of arbitrary dimension, and C will be a proper subset of X. The reader who is interested in the general theory of higher order optimality conditions for abstract mathematical programming problems is referred to [1, 3, 8, 11, 15, 17, 19]. Amongst recent approaches which would be useful for inf-sup problems we mention [22] where the notion of epi-derivative [23] is used as the main tool for deriving second order optimality conditions.
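For a finite index set T, the sup-objective of (P) and the active index set (the set T_0 introduced in (2.1) below) can be computed directly; a minimal illustration with made-up data:

```python
# Hypothetical finite family {f_t} over T = {"a", "b", "c"}:
f_t = {
    "a": lambda x: x - 1.0,
    "b": lambda x: -x,
    "c": lambda x: 0.5 * x - 1.0,
}

def f(x):
    """Objective of (P): the pointwise supremum (here: maximum)."""
    return max(ft(x) for ft in f_t.values())

def active_set(x, tol=1e-12):
    """Indices attaining the supremum at x (the active set)."""
    fx = f(x)
    return {t for t, ft in f_t.items() if ft(x) >= fx - tol}

x0 = 0.5
assert abs(f(x0) - (-0.5)) < 1e-12    # both a and b give -0.5, c gives -0.75
assert active_set(x0) == {"a", "b"}
```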
2
Second Order Necessary Conditions
In this Section we derive necessary conditions for a local minimum of problem (P). We assume that T is a compact space, X is a real topological vector space, C ⊂ X is arbitrary, and f_t : C → R for all t ∈ T. We say that x_0 ∈ C is a local minimum of (P) iff f(x_0) is finite and there is a neighborhood V of x_0 in X such that f(x_0) ≤ f(x) whenever x ∈ C ∩ V. For fixed x we denote the function t ↦ f_t(x) by f_{(·)}(x). Let x_0 ∈ C be fixed such that f(x_0) is finite. Without loss of generality we suppose that f(x_0) = 0. We assume that the function f_{(·)}(x_0) is upper semicontinuous on T, so that the set

T_0 := {t ∈ T : f_t(x_0) = f(x_0)} = {t ∈ T : f_t(x_0) ≥ 0}   (2.1)

is compact and nonempty.

The approximations to be used in this Section are collected in the following Assumption 2.1, which remains in force throughout this Section.

Assumption 2.1
(i) Let H ⊂ X be a convex cone, and D ⊂ H a nonempty subset.
(ii) For all x ∈ H and d ∈ D let there exist o_1(·) : R_+ → X such that, for all ε > 0 sufficiently small,

x_ε := x_0 + εd + ε²x + o_1(ε²) ∈ C,   (2.2)

where o_1(·) is subject to the condition that

lim_{ε↓0} o_1(ε)/ε = 0.   (2.3)

(iii) For all t ∈ T, let f_t¹ : cl H → R and f_t² : D → R have the following properties: (a) f_t¹(·) is lower semicontinuous, convex, and positively homogeneous of degree 1; (b) the functions f_{(·)}¹(x) and f_{(·)}²(d) are upper semicontinuous on T for all x ∈ cl H and d ∈ D respectively; (c) for all x ∈ H and d ∈ D there exists o_2(·) : R_+ → R such that, for all ε > 0 sufficiently small,

f_t(x_ε) − f_t(x_0) ≤ ε f_t¹(d) + ε²(f_t¹(x) + f_t²(d)) + o_2(ε²)   ∀t ∈ T,   (2.4)

where x_ε is given by (2.2) and lim_{ε↓0} o_2(ε)/ε = 0.
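With the classical smooth choices f_t¹(x) := f_t'(x_0)x and f_t²(d) := ½⟨f_t''(x_0)d, d⟩ (used in the examples of Section 4), inequality (2.4) is just a second order Taylor estimate. A numerical sanity check in one dimension, with illustrative data:

```python
# Hypothetical single smooth f_t = exp, x0 = 0 (so f'(0) = f''(0) = 1),
# directions d and x chosen arbitrarily, o_1 = 0 in (2.2).
import math

f = math.exp
x0, d, x = 0.0, 1.0, 0.5

def remainder(eps):
    x_eps = x0 + eps * d + eps**2 * x          # (2.2) with o_1 = 0
    first = eps * d                            # eps * f^1(d) = eps * f'(0) d
    second = eps**2 * (x + 0.5 * d * d)        # eps^2 (f^1(x) + f^2(d))
    return f(x_eps) - f(x0) - first - second   # should be o(eps^2)

r1 = abs(remainder(1e-2)) / 1e-4
r2 = abs(remainder(1e-3)) / 1e-6
assert r2 < r1 < 0.05      # remainder / eps^2 shrinks as eps -> 0
```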
Remark 2.2 If 0 ∈ D and if f_t²(0) = 0, then by setting d := 0 in (2.4) and substituting ε for ε² we obtain the following relation between f_t and f_t¹:

f_t(x_0 + εx + o_1(ε)) − f_t(x_0) ≤ ε f_t¹(x) + o_2(ε)   ∀t ∈ T.

Definition 2.3 A direction d ∈ X is called critical iff d ∈ D and, for some δ > 0,

f_t(x_0) + δ f_t¹(d) ≤ 0   ∀t ∈ T.   (2.5)

Remark 2.4 Since f_t(x_0) ≤ 0 for all t ∈ T, (2.5) implies that

f_t(x_0) + ε f_t¹(d) ≤ 0   ∀t ∈ T, ∀ε ∈ [0,δ].   (2.6)

In the sequel we shall need the following result.

Lemma 2.5 Assume that x_0 ∈ C is a local minimum of (P). Then for every critical direction d we have

max_{t∈T_0} (f_t¹(x) + f_t²(d)) ≥ 0   ∀x ∈ H.   (2.7)
Proof. Fix d ∈ X, a critical direction. Set q_t(x) := f_t¹(x) + f_t²(d). Assume, for contradiction, that (2.7) is false. Then there exist x̄ ∈ H and γ > 0 such that q_t(x̄) ≤ −γ for all t ∈ T_0. Then the set T_1 := {t ∈ T : q_t(x̄) < −γ/2} is open and contains T_0. Hence T_2 := T \ T_1 is compact and disjoint from T_0. Therefore k_0 := max_{t∈T_2} f_t(x_0) < 0. Let k_1 := max_{t∈T_2} f_t¹(d), k_2 := max_{t∈T_2} q_t(x̄). We set x_ε := x_0 + εd + ε²x̄ + o_1(ε²) and obtain from (2.4) that

f_t(x_ε) ≤ f_t(x_0) + ε f_t¹(d) + ε² q_t(x̄) + o_2(ε²).   (2.8)

For t ∈ T_1 it follows from (2.6) and (2.8) that, for all ε ∈ [0,δ],

f_t(x_ε) ≤ ε² q_t(x̄) + o_2(ε²) ≤ −ε²γ/2 + o_2(ε²).

For t ∈ T_2 it follows from (2.8) that

f_t(x_ε) ≤ k_0 + ε k_1 + ε² k_2 + o_2(ε²).

Hence, for all ε > 0 sufficiently small, f(x_ε) = max_{t∈T} f_t(x_ε) < 0 = f(x_0) with x_ε ∈ C. This is impossible, since x_0 is a local minimum of (P).

The next lemma can be read off from [16, Theorem 1] or [4, p. 99-100]. For the sake of completeness we include its proof.

Lemma 2.6 Let H ⊂ R^n be a nonempty, closed, convex set. Let T̄ be a nonempty compact set. Let φ : T̄ × H → R be a function such that φ(t,·) is convex and lower semicontinuous on H for all t ∈ T̄, and φ(·,x) is upper semicontinuous on T̄ for all
This is impossible, since Xo is a local minimum of (P). The next lemma can be read off from [16, Theorem 1] or [4, p.99-100]. For the sake of completeness we include its proof. Lemma 2.6 Let H C iRn be a nonempty, closed, convex set. Let T be a nonempty compact set. Let ip : T x H —> M be a function such that up{t, •) is convex and lower semicontinuous on H for all t E T, and ip(-,x) is upper semicontinuous on T for all
xeH.
if maxip{t,x)>0
Vx E H,
(2.9)
tef then there exists a finite subset T C T such that m&xu>(t,x) > 0 V i e / / . Proof. For simplicity we assume that T has at least n + 1 elements. Let xo E H and Hp := {x E H : ||i — x 0 || < p} (p > 0). Assume that (2.9) is true. Then for every fixed e > 0, p > 0 the family of sets H(t) (t E f) with H{t):=
{xeHp
:
-c}
has empty intersection. Since the sets H(t) C JR." are convex and compact for all t E T and have empty intersection it follows from Helly's Theorem [21, Corollary 21.3.2] that there is a subfamily of n + 1 sets having empty intersection, i.e., there is {tut2,...,tn+1) E fn+1 such that max
i^(ti,x) > — e
Vx E H„.
i=l,2,...,n+l
Hence the sets F(£,p):={(h,t2,...,tn+1)efn+i
:
max
*(*,•,x) > -e
Vx E ^ }
%=l,^,...,n+l
are nonempty for all e > 0, p > 0. This implies at the same time that any finite collection of the sets F(e,p) has nonempty intersection, since f]F(ei,pi)
D
F(mme„ma.xpi).
The sets F(ε,ρ) being closed subsets of the compact set T̄^{n+1}, the collection of all F(ε,ρ) with ε > 0, ρ > 0 has nonempty intersection. Then from

(t̄_1, t̄_2, …, t̄_{n+1}) ∈ ∩_{ε>0, ρ>0} F(ε,ρ)

follows max_{i=1,2,…,n+1} φ(t̄_i, x) ≥ 0 ∀x ∈ H. Hence, setting T := {t̄_1, t̄_2, …, t̄_{n+1}}, we obtain the desired conclusion.

For the next lemma we have to introduce some notation. Let T̄ ≠ ∅ be a compact set. Let C(T̄) be the Banach space of all continuous functions F : T̄ → R with the norm ‖F‖ := max_{t∈T̄} |F(t)|. We denote by C*(T̄) the continuous dual of C(T̄). For Λ ∈ C*(T̄) we define Λ ≥ 0 :⟺ ⟨Λ, F⟩ ≥ 0 for all F ∈ C(T̄) such that F(t) ≥ 0 on T̄.
Lemma 2.7 Let H ≠ ∅ be a convex set. Let T̄ ≠ ∅ be a compact set. Let φ : T̄ × H → R be a function such that φ(t,·) is convex on H for all t ∈ T̄, and φ(·,x) is continuous on T̄ for all x ∈ H. If

max_{t∈T̄} φ(t,x) ≥ 0   ∀x ∈ H,   (2.10)

then there exists Λ ∈ C*(T̄) such that Λ ≥ 0, Λ ≠ 0, and ⟨Λ, φ(·,x)⟩ ≥ 0 ∀x ∈ H.

Proof. By assumption, for every fixed x ∈ H, φ(·,x) ∈ C(T̄). Let

Q_1 := {F ∈ C(T̄) : F(t) < 0 ∀t ∈ T̄},
Q_2 := {F ∈ C(T̄) : there exists x ∈ H such that F(t) ≥ φ(t,x) ∀t ∈ T̄}.

The sets Q_1 and Q_2 are nonempty and convex, Q_1 is open, and by (2.10) they are disjoint. Hence they can be separated: there exists Λ ∈ C*(T̄), Λ ≠ 0, such that ⟨Λ, F⟩ ≤ 0 on Q_1 and ⟨Λ, F⟩ ≥ 0 on Q_2. The first inequality gives Λ ≥ 0. Applying the second inequality to the functions φ(·,x) ∈ Q_2 we obtain the claimed result.

Remark 2.8 If the set T̄ is finite, then the conclusion of Lemma 2.7 gives the existence of real numbers λ_t ≥ 0 (t ∈ T̄), not all zero, such that

Σ_{t∈T̄} λ_t φ(t,x) ≥ 0   ∀x ∈ H.   (2.11)

Without loss of generality we may assume that Σ_{t∈T̄} λ_t = 1.

As a consequence of Lemmas 2.5-2.7 we obtain
Theorem 2.9 Assume that x_0 is a local minimum of (P). Then for every critical direction d and every finite-dimensional linear subspace S ⊂ X there are a finite subset T ⊂ T_0 and real numbers λ_t ≥ 0 (t ∈ T) with Σ_{t∈T} λ_t = 1 such that

L¹(x) ≥ 0   ∀x ∈ H ∩ S,   (2.12)
L²(d) ≥ 0,   (2.13)

where

L¹(x) := Σ_{t∈T} λ_t f_t¹(x),  L²(d) := Σ_{t∈T} λ_t f_t²(d).   (2.14)

Proof. We fix d ∈ X, a critical direction, and S ⊂ X, a finite-dimensional subspace. For x ∈ cl H and t ∈ T_0 let φ(t,x) := f_t¹(x) + f_t²(d). From Lemma 2.5 it follows that

max_{t∈T_0} φ(t,x) ≥ 0   ∀x ∈ H ∩ S.

Since S is of finite dimension and max_{t∈T_0} φ(t,·) is convex, this implies that max_{t∈T_0} φ(t,x) ≥ 0 ∀x ∈ H̄, where H̄ denotes the closure of H ∩ S in S. We apply Lemma 2.6 and obtain a finite subset T ⊂ T_0 such that max_{t∈T} φ(t,x) ≥ 0 ∀x ∈ H̄. Now we apply Lemma 2.7, where we substitute T̄ := T. From Lemma 2.7 and Remark 2.8 we obtain real numbers λ_t ≥ 0 (t ∈ T) with Σ_{t∈T} λ_t = 1 such that

Σ_{t∈T} λ_t φ(t,x) ≥ 0   ∀x ∈ H̄.

This implies by (2.14) that

L¹(x) + L²(d) ≥ 0   ∀x ∈ H ∩ S.   (2.15)

From (2.15) follows (2.13), since L¹(0) = 0, and (2.12), since L¹(·) is positively homogeneous of degree 1. This completes the proof.

For the next result we need the following additional requirement, where d ∈ X is a critical direction and S ⊂ X is a linear subspace:

If max_{t∈T_0} q_t(x) ≥ 0 ∀x ∈ H ∩ S, then max_{t∈T_0} q_t(x) ≥ 0 ∀x ∈ H̄,
where q_t(x) := f_t¹(x) + f_t²(d), and H̄ is the closure of H ∩ S in S.   (2.16)
Obviously, (2.16) is satisfied if any of the following conditions holds: (a) H ∩ S is closed in S; (b) H ∩ S has nonempty interior in S; (c) max_{t∈T_0} q_t(·) is upper semicontinuous on cl H.
Theorem 2.10 Let x_0 be a local minimum of (P). Let d ∈ X be a critical direction. Let S ⊂ X be a reflexive Banach space whose norm topology is stronger than the topology inherited from X. Let (2.16) hold. Then for every ε > 0 there are a finite subset T ⊂ T_0 and real numbers λ_t ≥ 0 (t ∈ T) with Σ_{t∈T} λ_t = 1 such that

L¹(x) ≥ −ε‖x‖   ∀x ∈ H ∩ S,   (2.17)
L²(d) ≥ −ε,   (2.18)

where L^i(·) for i = 1,2 are defined by (2.14).

Proof. Let ε > 0, ρ > 0. Let q_t(·) and H̄ be as in condition (2.16). Let H̄_ρ := {x ∈ H̄ : ‖x‖ ≤ ρ}. From Lemma 2.5 and condition (2.16) follows max_{t∈T_0} q_t(x) ≥ 0 ∀x ∈ H̄_ρ. Therefore the family of sets K(t) := {x ∈ H̄_ρ : q_t(x) ≤ −ε}, where t ∈ T_0, has empty intersection. The set H̄_ρ is weakly compact, since S is reflexive, and the sets K(t) are convex and closed, hence weakly closed. So there exists a finite subset T ⊂ T_0 such that ∩_{t∈T} K(t) = ∅. So max_{t∈T} q_t(x) ≥ −ε ∀x ∈ H̄_ρ. We apply Lemma 2.7 and Remark 2.8 with H := H̄_ρ and T̄ := T, and obtain multipliers λ_t ≥ 0 (t ∈ T) with Σ_{t∈T} λ_t = 1 for which (2.17) and (2.18) hold.

Theorem 2.11 Assume that x_0 is a local minimum of (P) and that the functions f_{(·)}¹(x) and f_{(·)}²(d) are continuous on T_0 for all x ∈ H and d ∈ D. Then for every critical direction d there exists Λ ∈ C*(T_0) with Λ ≥ 0, Λ ≠ 0, such that

L¹(x) ≥ 0   ∀x ∈ H,   (2.20)
L²(d) ≥ 0,   (2.21)

where

L¹(x) := ⟨Λ, f_{(·)}¹(x)⟩,  L²(d) := ⟨Λ, f_{(·)}²(d)⟩.   (2.22)

Proof. We concatenate Lemma 2.5 and Lemma 2.7, where we substitute H := H, T̄ := T_0, φ(t,x) := f_t¹(x) + f_t²(d).
3
Second Order Sufficient Conditions
In this Section we derive sufficient conditions for a strict local minimum of problem (P). We say that x_0 ∈ C is a strict local minimum of problem (P) iff there is a neighborhood V of x_0 in X such that f(x_0) < f(x) whenever x ∈ C ∩ V, x ≠ x_0. We assume that X is a normed space; C ⊂ X and the topological space T are arbitrary. We fix x_0 ∈ C, and as in the previous Section we assume that f(x_0) = 0 and that T_0 is given by (2.1).

The approximations to be used in this Section are collected in the following Assumption 3.1, which remains valid throughout this Section.

Assumption 3.1
(i) Let H ⊂ X be a cone.
(ii) Let h : C → H have the following property: for all x ∈ C,

h(x) = x − x_0 + o(x − x_0),   (3.1)

where o(·) : X → X satisfies lim_{ξ→0} o(ξ)/‖ξ‖ = 0.
(iii) For all t ∈ T_0, let f_t^i : cl H → R (i = 1,2) have the following properties: (a) f_t¹(·) is positively homogeneous of degree 1; (b) f_t²(·) is positively homogeneous of degree 2; (c) there exist o_t¹(·), o_t²(·) : H → R such that, for all x ∈ C,

f_t(x) − f_t(x_0) ≥ f_t¹(h(x)) + o_t¹(h(x)),   (3.2)
f_t(x) − f_t(x_0) ≥ f_t¹(h(x)) + f_t²(h(x)) + o_t²(h(x)),   (3.3)

where h(·) is as in (3.1), and lim_{h→0} o_t¹(h)/‖h‖ = 0, lim_{h→0} o_t²(h)/‖h‖² = 0.
We recall that Assumption 3.1 (ii) was used in [18] as a main tool to derive sufficiency results.

Remark 3.2 If f_t²(·) is lower semicontinuous at 0, then (3.2) is a consequence of (3.3). Indeed, from f_t²(0) = 0 and the lower semicontinuity follows the existence of δ > 0 such that f_t²(h) ≥ −1 for all h ∈ H with ‖h‖ ≤ δ, hence by homogeneity f_t²(h) ≥ −δ^{−2}‖h‖² for all h ∈ H. Therefore for all h ∈ H we have

f_t²(h) + o_t²(h) ≥ −δ^{−2}‖h‖² + o_t²(h) =: o_t¹(h).

This shows that (3.3) implies (3.2).

Definition 3.3 A sequence {d_k} ⊂ X is called weakly critical iff d_k ∈ H, ‖d_k‖ = 1 for all k, and

limsup_{k→∞} f_t¹(d_k) ≤ 0   ∀t ∈ T_0.   (3.4)

A direction d ∈ X is called weakly critical iff d ∈ cl H, ‖d‖ = 1, and

f_t¹(d) ≤ 0   ∀t ∈ T_0.
Theorem 3.4 Assume that, for every weakly critical sequence {d_k}, there are a finite subset T ⊂ T_0 and real numbers λ_t ≥ 0 (t ∈ T) such that

L¹(x) ≥ 0   ∀x ∈ H,   (3.5)
limsup_{k→∞} L²(d_k) > 0,   (3.6)

where L^i(x) := Σ_{t∈T} λ_t f_t^i(x) (i = 1,2). Then x_0 is a strict local minimum of (P).

Proof. Assume, for contradiction, that x_0 is not a strict local minimum of (P). Then there is a sequence {x_k} ⊂ C such that q_k := x_k − x_0 → 0, q_k ≠ 0 and 0 ≥ f(x_k) − f(x_0) for all k. The last inequality implies for all k that

0 ≥ f_t(x_k) − f_t(x_0)   ∀t ∈ T_0.   (3.7)

Let h be the map appearing in (3.1). Putting h_k := h(x_k), it follows from (3.1) that

(h_k − q_k)/‖q_k‖ → 0.

This implies, since q_k → 0, that h_k − q_k → 0, and therefore h_k → 0. Hence o_t¹(h_k)/‖h_k‖ → 0, o_t²(h_k)/‖h_k‖² → 0. Combining (3.7) and (3.2) we obtain

0 ≥ f_t¹(h_k) + o_t¹(h_k)   ∀t ∈ T_0,   (3.8)

or, equivalently,

0 ≥ f_t¹(d_k) + o_t¹(h_k)/‖h_k‖   ∀t ∈ T_0,   (3.9)

where d_k := h_k/‖h_k‖ ∈ H. By letting k → ∞ in (3.9) we get (3.4). This shows that {d_k} is a weakly critical sequence. By assumption there are a finite subset T ⊂ T_0 and λ_t ≥ 0 (t ∈ T) satisfying (3.5) and (3.6). Combining (3.7) and (3.3) we obtain for all k that

0 ≥ f_t¹(h_k) + f_t²(h_k) + o_t²(h_k)   ∀t ∈ T.   (3.10)

Multiplying (3.10) by λ_t and summing up we obtain

0 ≥ L¹(h_k) + L²(h_k) + ō(h_k),   (3.11)

where ō(h_k)/‖h_k‖² → 0. Dividing (3.11) by ‖h_k‖² and taking account of (3.5) we find

0 ≥ L²(d_k) + ō(h_k)/‖h_k‖²,

which implies

0 ≥ limsup_{k→∞} L²(d_k),   (3.12)
contradicting (3.6). The theorem is thus proved.
Remark 3.5 If o_t¹(·) in (3.2) is independent of t, then the assumption in Theorem 3.4 can be replaced by the following assumption: There are δ > 0, γ > 0, a finite subset T ⊂ T_0, and λ_t ≥ 0 (t ∈ T) such that (3.5) holds and L²(d) ≥ γ for all d ∈ H_δ, where H_δ := {d ∈ H : ‖d‖ = 1, f_t¹(d) ≤ δ ∀t ∈ T_0}. Indeed, let the sequence {d_k} be as in the proof of Theorem 3.4. Then from (3.9) follows, for all k sufficiently large, that d_k ∈ H_δ, hence L²(d_k) ≥ γ, which implies limsup_{k→∞} L²(d_k) > 0.

Remark 3.6 If X is finite-dimensional and the functions f_t^i(·) (i = 1,2) are lower semicontinuous, then the following condition is sufficient for the assumption of Theorem 3.4 to be satisfied: For every weakly critical direction d there are a finite subset T ⊂ T_0 and λ_t ≥ 0 (t ∈ T) such that (3.5) holds and L²(d) > 0. Indeed, take an arbitrary weakly critical sequence {d_k}. By the compactness of the unit sphere in X and the lower semicontinuity of f_t¹(·) we may suppose that d_k converges to some weakly critical direction d. From L²(d) > 0 and the lower semicontinuity of f_t²(·) follows (3.6).

Theorem 3.7 Let T_0 be compact, let f_{(·)}^i(x) (i = 1,2) be continuous on T_0, and let o²(·) in (3.3) be independent of t. Assume that for every weakly critical sequence {d_k} there is a linear functional Λ on C(T_0), Λ ≥ 0, such that (3.5) and (3.6) are satisfied with L^i(x) := ⟨Λ, f_{(·)}^i(x)⟩ (i = 1,2). Then x_0 is a strict local minimum of (P).

Proof. The proof is a replica of the proof of Theorem 3.4.

Remark 3.8 The results of Sections 2 and 3 can be applied to the infinite programming problem

(P_0)  min{ f_0(x) : x ∈ C, f_t(x) ≤ 0 ∀t ∈ T },

where f_0 : C → R is a given function. Indeed, let x_0 ∈ C be a feasible point for problem (P_0). Consider the following optimization problem

(P_1)  min{ sup_{t∈T̂} f̂_t(x) : x ∈ C },

where T̂ := T ∪ {0} and

f̂_t(x) := f_t(x) if t ∈ T,  f̂_0(x) := f_0(x) − f_0(x_0) if t = 0.

Problem (P_1) has the same structure as our standard problem (P). Moreover it is easily seen [12] that:
x_0 is a local minimum of (P_1) if it is a local minimum of (P_0); x_0 is a strict local minimum of (P_0) if it is a strict local minimum of (P_1). In this way we obtain necessary conditions for a local minimum of (P_0), and sufficient conditions for a strict local minimum of (P_0).
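The reduction of Remark 3.8 is easy to exercise numerically; a minimal sketch with made-up data (objective and constraints are illustrative, not from the paper):

```python
# (P_0): min f0(x) s.t. f_t(x) <= 0; here feasibility means 0 <= x <= 3.
f0 = lambda x: (x - 2.0) ** 2
constraints = [lambda x: x - 3.0, lambda x: -x]    # the f_t, t in T

x0 = 3.0                                 # a feasible point for (P_0)

def f_hat_sup(x):
    """Objective of (P_1): sup over T-hat = T union {0}."""
    vals = [ft(x) for ft in constraints]           # t in T
    vals.append(f0(x) - f0(x0))                    # t = 0
    return max(vals)

assert f_hat_sup(x0) == 0.0   # at the feasible x_0 the (P_1) value is 0
assert f_hat_sup(2.0) < 0.0   # a strictly better feasible point exists,
                              # so x_0 is not a local minimum of (P_1)
```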
4
Examples
Below we indicate some examples for the approximations used in Sections 2 and 3. The first two examples refer to Assumption 2.1 (ii). E x a m p l e 4.1 Assume that C C X is a convex set. Let D := H := {A(c - x 0 ) : A > 0, c € C} -. cone(C - x 0 ). We shall see that Oi(e) = 0 satisfies (2.2) for all x € H, d 6 D. Indeed, choose a > 0 and j3 > 0 such that d\ := x 0 + a_1 0 sufficiently small, and (2.2) is true. Example 4.2 Let C := {x € X \ g(x) = 0}, where g : X —► Y is a mapping between the Banach spaces X and V. Assume that g is twice continuously Frechet differentiable, with first and second Frechet derivatives <7'(x) and g"(x) respectively. Assume that ^'(^o) is surjective. Choose H :={xeX D := {d € X
: g'{x0)x = 0},
g'(x0)d = 0, (g"(x0)d, d) = 0}.
Then there exists $o_1(\cdot)$ satisfying (2.2) and (2.3); see [3, Proposition 7.2].

From now on let $X$ be a normed space. The next four examples refer to Assumption 2.1 (iii). More precisely we shall verify condition (2.4) for different choices of $f_t^2(\cdot)$. In all these examples it is supposed that $f_t : X \to \mathbb{R}$ is continuously Fréchet differentiable for all $t \in T$, and that $f_t^1(x) := f_t'(x_0)x$. In accordance with (2.2) we set
$$x_\varepsilon := x_0 + \varepsilon d + \varepsilon^2 x + o_1(\varepsilon^2) \qquad (\varepsilon > 0),$$
where $o_1(\cdot)$ satisfies (2.3). The functions $f_t^2(\cdot)$ will be positively homogeneous of degree 2 in all cases.

Example 4.3 Assume that the functions $f_t(\cdot)$ are twice continuously Fréchet differentiable, and let
$$f_t^2(d) := \tfrac{1}{2}\langle f_t''(x_0)d, d\rangle.$$
Prederivatives and Second Order Conditions
255
By the second order Taylor expansion formula we have, for all $t \in T$,
$$f_t(x_\varepsilon) - f_t(x_0) = f_t'(x_0)(x_\varepsilon - x_0) + \tfrac{1}{2}\langle f_t''(\xi_{t,\varepsilon})(x_\varepsilon - x_0), x_\varepsilon - x_0\rangle$$
for some $\xi_{t,\varepsilon} \in [x_0, x_\varepsilon]$. Hence
$$f_t(x_\varepsilon) - f_t(x_0) = \varepsilon f_t^1(d) + \varepsilon^2 f_t^1(x) + \varepsilon^2 f_t^2(d) + \varphi_t(\varepsilon^2),$$
where $\varphi_t(\varepsilon^2) := f_t'(x_0)o_1(\varepsilon^2) + \tfrac{1}{2}\bigl(\langle f_t''(\xi_{t,\varepsilon})(x_\varepsilon - x_0), x_\varepsilon - x_0\rangle - \varepsilon^2\langle f_t''(x_0)d, d\rangle\bigr)$. We require that the sets $\{f_t'(x_0) : t \in T\}$ and $\{f_t''(x_0) : t \in T\}$ are bounded, and that $f_t''(\cdot)$ is continuous at $x_0$, uniformly in $t \in T$. Then, as $\varepsilon \downarrow 0$, $\varphi_t(\varepsilon^2)/\varepsilon^2$ converges to zero uniformly with respect to $t \in T$, hence (2.4) is satisfied.

Example 4.4 Let $f_t(\cdot)$ be continuously Fréchet differentiable. We set
$$f_t^2(d) := \limsup_{\varepsilon \downarrow 0} \frac{1}{\varepsilon^2}\bigl(f_t(x_0 + \varepsilon d) - f_t(x_0) - \varepsilon f_t'(x_0)d\bigr), \qquad (4.1)$$
assuming that the limit in (4.1) is finite and that convergence to the limit is uniform in $t \in T$. Then from (4.1) follows the existence of a function $r(\varepsilon)$ with $\lim_{\varepsilon \downarrow 0} r(\varepsilon) = 0$ such that, for all $t \in T$,
$$f_t(x_0 + \varepsilon d) - f_t(x_0) - \varepsilon f_t'(x_0)d \leq \varepsilon^2 f_t^2(d) + r(\varepsilon)\,\varepsilon^2.$$
By the mean value theorem we have, for all $t \in T$,
$$f_t(x_\varepsilon) - f_t(x_0) = f_t'(\xi_{t,\varepsilon})(\varepsilon^2 x + o_1(\varepsilon^2)) + f_t(x_0 + \varepsilon d) - f_t(x_0)$$
for some $\xi_{t,\varepsilon} \in [x_0 + \varepsilon d, x_\varepsilon]$. Together with the preceding estimate this yields (2.4), provided the derivatives $f_t'(\cdot)$ are bounded near $x_0$, uniformly in $t \in T$.

Example 4.5 Assume that the Fréchet derivatives $f_t'(\cdot)$ are Lipschitz continuous, with a common modulus $\kappa$, in a convex neighborhood of $x_0$. Fix $\eta > 0$ sufficiently small and let
$$f_t^2(d) := \sup\Bigl\{\frac{1}{2\varepsilon}\bigl(f_t'(x + \varepsilon d)d - f_t'(x)d\bigr) : \|x - x_0\| < \eta,\ \|x + \varepsilon d - x_0\| < \eta,\ 0 < \varepsilon \leq \eta\Bigr\}. \qquad (4.2)$$
It follows from [12, Proposition 4] that $f_t^2(\varepsilon h) = \varepsilon^2 f_t^2(h)$ for all $\varepsilon > 0$, and that
$$|f_t^2(h) - f_t^2(\tilde h)| \leq \tfrac{\kappa}{2}\|h - \tilde h\|\,(\|h\| + \|\tilde h\|),$$
where $\kappa$ is the common Lipschitz modulus of the derivatives $f_t'(\cdot)$. Moreover, if $\|h\| \leq \eta$ we have
$$f_t(x_0 + h) - f_t(x_0) - f_t'(x_0)h = \int_0^1 \bigl(f_t'(x_0 + \lambda h)h - f_t'(x_0)h\bigr)\,d\lambda$$
$$= \int_0^{\|h\|} \bigl(f_t'(x_0 + \varepsilon v)v - f_t'(x_0)v\bigr)\,d\varepsilon \qquad [v := h/\|h\|,\ \varepsilon := \lambda\|h\|]$$
$$\leq 2 f_t^2(v)\int_0^{\|h\|} \varepsilon\,d\varepsilon = \|h\|^2 f_t^2(v) = f_t^2(h).$$
Thus, if $\|h\| \leq \eta$, then
$$f_t(x_0 + h) - f_t(x_0) \leq f_t'(x_0)h + f_t^2(h) \leq f_t'(x_0)h + f_t^2(\tilde h) + \tfrac{\kappa}{2}\|h - \tilde h\|\,(\|h\| + \|\tilde h\|).$$
Now we let $\varepsilon > 0$ be so small that $\|x_\varepsilon - x_0\| < \eta$, and obtain with $h := x_\varepsilon - x_0$, $\tilde h := \varepsilon d$, that
$$f_t(x_\varepsilon) - f_t(x_0) \leq f_t'(x_0)(\varepsilon d + \varepsilon^2 x) + \varepsilon^2 f_t^2(d) + \varphi_t(\varepsilon^2),$$
where $\varphi_t(\varepsilon^2) := f_t'(x_0)o_1(\varepsilon^2) + \tfrac{\kappa}{2}\|\varepsilon^2 x + o_1(\varepsilon^2)\|\,(\|x_\varepsilon - x_0\| + \varepsilon\|d\|)$. We require that $\{f_t'(x_0) : t \in T\}$ is bounded. Then $\varphi_t(\varepsilon^2)/\varepsilon^2 \to 0$ for $\varepsilon \downarrow 0$ uniformly in $t$, and (2.4) is satisfied.

The approximation (4.2) was introduced in [12]. However, in [12] it was not applied to the functions $f_t$ themselves, but to the Lagrangian of problem (P). The resulting second order approximation to the Lagrangian is sublinear with respect to the multiplier, whereas our function $L^2(\cdot)$, see (2.22), depends linearly on the multiplier. Therefore our results are different from those in [12].

Example 4.6 Let $X$ be finite-dimensional. Assume that the Fréchet derivatives $f_t'(\cdot)$ are Lipschitz continuous in a convex neighborhood $V$ of $x_0$. Define
$$S_t(x,d) := \limsup_{u \to x,\ \varepsilon \downarrow 0} \frac{1}{2\varepsilon}\bigl(f_t'(u + \varepsilon d)d - f_t'(u)d\bigr).$$
Then [10] the following mean value property holds for all $x_1, x_2 \in V$:
$$f_t(x_2) - f_t(x_1) \leq f_t'(x_1)(x_2 - x_1) + S_t(\xi_t, x_2 - x_1)$$
for some $\xi_t \in [x_1, x_2]$. Moreover $S_t(x, \varepsilon d) = \varepsilon^2 S_t(x, d)$ for all $\varepsilon > 0$. We set
$$f_t^2(d) := S_t(x_0, d). \qquad (4.3)$$
This is the limiting case of (4.2) as $\eta \downarrow 0$. Then we obtain from the above mean value property:
$$f_t(x_\varepsilon) - f_t(x_0) \leq f_t'(x_0)(x_\varepsilon - x_0) + S_t(\xi_{t,\varepsilon}, x_\varepsilon - x_0) = f_t'(x_0)(\varepsilon d + \varepsilon^2 x) + f_t^2(\varepsilon d) + \varphi_t(\varepsilon^2),$$
where $\varphi_t(\varepsilon^2) := f_t'(x_0)o_1(\varepsilon^2) + \max\{0,\ S_t(\xi_{t,\varepsilon}, x_\varepsilon - x_0) - S_t(x_0, \varepsilon d)\}$. The function $S_t(\cdot,\cdot)$ is upper semicontinuous [6]. We require that it is so uniformly in $t \in T$, at all points $(x_0, d)$ with $d \in D$, and that the set $\{f_t'(x_0) : t \in T\}$ is bounded. Then (2.4) holds.

We note that (4.3) can equivalently be written as
$$f_t^2(d) := \tfrac{1}{2}\sup\{\langle Md, d\rangle : M \in \partial^2 f_t(x_0)\}, \qquad (4.4)$$
where $\partial^2 f_t(x_0) \subset L(X, X')$ is a generalized set-valued second derivative, see [10].

The next three examples refer to Assumption 3.1. More precisely we shall verify inequality (3.3). In these examples $C \subset X$ is an arbitrary set, and we let $H := \mathrm{cone}(C - x_0)$. Then we can set $h(x) := x - x_0$ and $o(h) \equiv 0$ in (3.1). As before, $f_t^1(x) := f_t'(x_0)x$.

Example 4.7 Assume that $f_t(\cdot)$ is twice Fréchet differentiable. Then (3.3) holds with
$$f_t^2(d) := \tfrac{1}{2}\langle f_t''(x_0)d, d\rangle,$$
since $f_t(x) - f_t(x_0) = f_t'(x_0)(x - x_0) + \tfrac{1}{2}\langle f_t''(x_0)(x - x_0), x - x_0\rangle + r_t(x - x_0)$, where $r_t(x - x_0)/\|x - x_0\|^2 \to 0$ for $x \to x_0$.
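For twice continuously differentiable functions, the one-sided quotient of (4.1) recovers the Taylor term $\tfrac{1}{2}\langle f''(x_0)d, d\rangle$ used in Examples 4.3 and 4.7. A numerical sketch with an arbitrary one-dimensional function (the choices of $f$, $x_0$, and $d$ are illustrative, not from the paper):

```python
import numpy as np

# Quotient from (4.1) converging to the Taylor term (1/2) f''(x0) d^2.
f = lambda x: np.exp(x) + x ** 3
df = lambda x: np.exp(x) + 3 * x ** 2        # f'
d2f = lambda x: np.exp(x) + 6 * x            # f''

x0, d = 0.3, 1.7
target = 0.5 * d2f(x0) * d ** 2

for eps in [1e-2, 1e-3, 1e-4]:
    quot = (f(x0 + eps * d) - f(x0) - eps * df(x0) * d) / eps ** 2
    assert abs(quot - target) < 10 * eps     # third-order remainder, O(eps)
```

For merely $C^{1,1}$ functions the quotient need not converge, which is why Examples 4.5, 4.6, 4.8, and 4.9 work with sup/inf and limsup/liminf versions of this difference quotient instead.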
Example 4.8 Assume that $f_t(\cdot)$ is Fréchet differentiable and that $f_t'(\cdot)$ is Lipschitz continuous in a neighborhood $V$ of $x_0$. Fix $\eta > 0$ as in Example 4.5 and let
$$f_t^2(d) := \inf\Bigl\{\frac{1}{2\varepsilon}\bigl(f_t'(x + \varepsilon d)d - f_t'(x)d\bigr) : \|x - x_0\| < \eta,\ \|x + \varepsilon d - x_0\| < \eta,\ 0 < \varepsilon \leq \eta\Bigr\}. \qquad (4.5)$$
Similarly as in Example 4.5 we obtain for all $h$ sufficiently small that
$$f_t(x_0 + h) - f_t(x_0) \geq f_t'(x_0)h + f_t^2(h).$$
Thus (3.3) holds with $h(x) := x - x_0$ and $o_2(h) \equiv 0$ for all $h$ sufficiently small.

Example 4.9 Let $X$ be finite-dimensional. Assume that the Fréchet derivative $f_t'(\cdot)$ is Lipschitz continuous in a neighborhood $V$ of $x_0$. Define
$$s_t(x,d) := \liminf_{u \to x,\ \varepsilon \downarrow 0} \frac{1}{2\varepsilon}\bigl(f_t'(u + \varepsilon d)d - f_t'(u)d\bigr),$$
and set $f_t^2(d) := s_t(x_0, d)$. Similarly as in Example 4.6 we obtain that
$$f_t(x_0 + h) - f_t(x_0) \geq f_t'(x_0)h + s_t(\xi_t, h) \geq f_t'(x_0)h + f_t^2(h) + r_t(h),$$
where $\xi_t \in [x_0, x_0 + h]$ and $r_t(h) := \min\{0,\ s_t(\xi_t, h) - s_t(x_0, h)\}$. The function $s_t(\cdot,\cdot)$ is lower semicontinuous. We require that $s_t(\cdot, h)$ is lower semicontinuous at $x_0$, uniformly with regard to all $h$ with $\|h\| = 1$. Then $r_t(h)/\|h\|^2 \to 0$ for $h \to 0$, and condition (3.3) is satisfied.

Let us mention that a more complex form of necessary and sufficient second order conditions for problem (P) has recently been given by Kawasaki [13, 14]. In these conditions the inequalities involving second order derivatives (the counterparts of (2.13) and (3.6)) carry an extra term which describes a suitable approximation to the function $\sup_{t \in T} f_t(\cdot)$. It has not been possible as yet to incorporate this extra term into our general formalism.

Acknowledgment. This paper was written during a research visit of the second author to Universität Mannheim. He is indebted to Deutscher Akademischer Austauschdienst (DAAD) for financial support.
References

[1] A. Ben-Tal, Second order theory of extremum problems, In: Extremal Methods and Systems Analysis (Lecture Notes in Economics and Mathematical Systems, Vol. 174), 336-356. Springer-Verlag, Berlin, 1980.

[2] A. Ben-Tal, M. Teboulle, and J. Zowe, Second order necessary optimality conditions for semi-infinite programming problems, In: Semi-Infinite Programming (Lecture Notes in Control and Information Sciences, Vol. 15), 17-30. Springer-Verlag, Berlin, 1978.

[3] A. Ben-Tal and J. Zowe, A unified theory of first and second order conditions for extremum problems in topological vector spaces, Mathematical Programming Study 19 (1982) 39-76.

[4] E. Blum and W. Oettli, Mathematische Optimierung, Springer-Verlag, Berlin, 1975.

[5] J. M. Borwein, Semi-infinite programming duality: how special is it? In: Semi-Infinite Programming and Applications (Lecture Notes in Economics and Mathematical Systems, Vol. 215), 10-36. Springer-Verlag, Berlin, 1983.
[6] F. H. Clarke, Generalized gradients of Lipschitz functionals, Advances in Mathematics 40 (1981) 52-67.

[7] B. S. Darkhovskii and E. S. Levitin, Quadratic optimality conditions for problems of semi-infinite mathematical programming, Transactions of the Moscow Mathematical Society 48 (1985) 175-225.

[8] N. Furukawa and Y. Yoshinaga, Higher-order variational sets, variational derivatives and higher-order necessary conditions in abstract mathematical programming, Bulletin of Informatics and Cybernetics 23 (1988) 9-40.

[9] R. P. Hettich and H. Th. Jongen, Semi-infinite programming: conditions of optimality and applications, In: Optimization Techniques, Part II (Lecture Notes in Control and Information Sciences, Vol. 7), 1-11. Springer-Verlag, Berlin, 1978.

[10] J.-B. Hiriart-Urruty, J. J. Strodiot, and V. H. Nguyen, Generalized Hessian matrix and second-order optimality conditions for problems with $C^{1,1}$ data, Applied Mathematics and Optimization 11 (1984) 43-56.

[11] K. H. Hoffmann and H. J. Kornstaedt, Higher-order necessary conditions in abstract mathematical programming, Journal of Optimization Theory and Applications 26 (1978) 533-569.

[12] A. D. Ioffe, Second order conditions in nonlinear nonsmooth problems of semi-infinite programming, In: Semi-Infinite Programming and Applications (Lecture Notes in Economics and Mathematical Systems, Vol. 215), 262-280. Springer-Verlag, Berlin, 1983.

[13] H. Kawasaki, An envelope-like effect of infinitely many inequality constraints on second-order necessary conditions for minimization problems, Mathematical Programming 41 (1988) 73-96.

[14] H. Kawasaki, Second-order necessary and sufficient optimality conditions for minimizing a sup-type function, Applied Mathematics and Optimization 26 (1992) 195-220.

[15] F. Lempio and J. Zowe, Higher order optimality conditions, In: Modern Applied Mathematics, ed. by B. Korte, 147-193. North-Holland, Amsterdam, 1982.

[16] V. L. Levin, Application of a theorem of E. Helly in convex programming, the problem of best approximation, and related problems, Matematicheskii Sbornik 79 (1969) 250-263.

[17] E. S. Levitin, A. A. Miljutin and N. P. Osmolovskii, Higher order conditions for a local minimum in problems with constraints, Uspekhi Matematicheskikh Nauk 33 (1978) 83-148.
[18] H. Maurer and J. Zowe, First and second order necessary and sufficient optimality conditions for infinite-dimensional programming problems, Mathematical Programming 16 (1979) 98-110.

[19] Pham Huu Sach, Second-order necessary optimality conditions for optimization problems involving set-valued maps, Applied Mathematics and Optimization 22 (1990) 189-209.

[20] Pham Huy Dien and Pham Huu Sach, Second-order optimality conditions for the extremal problem under inclusion constraints, Applied Mathematics and Optimization 20 (1989) 71-80.

[21] R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, 1972.

[22] R. T. Rockafellar, Second-order optimality conditions in nonlinear programming obtained by way of epi-derivatives, Mathematics of Operations Research 14 (1989) 462-484.

[23] R. T. Rockafellar, First- and second-order epi-differentiability in nonlinear programming, Transactions of the American Mathematical Society 307 (1988) 75-108.

[24] A. Shapiro, Second order derivative of extremal-value functions and optimality conditions for semi-infinite programs, Mathematics of Operations Research 10 (1985) 207-219.
Solution Stability of Nonsmooth Equations
261
Recent Advances in Nonsmooth Optimization, pp. 261-288
Eds. D.-Z. Du, L. Qi and R.S. Womersley
©1995 World Scientific Publishing Co Pte Ltd
Necessary and Sufficient Conditions for Solution Stability of Parametric Nonsmooth Equations

Jong-Shi Pang¹
Department of Mathematical Sciences, The Johns Hopkins University, Baltimore, Maryland 21218-2689, U.S.A.; [email protected]
Abstract
This paper gives necessary and sufficient conditions for the stability and the strong stability of a solution to a parametric nonsmooth equation under a set of mild assumptions. These assumptions are imposed on a first-order approximation of the nonsmooth function; one of them is motivated by the well-known second-order necessary condition in nonlinear programming and generalizes a key assumption in the classical Leray-Schauder fixed-point existence theorem. Specializations of the stability results to a parametric variational inequality and its Karush-Kuhn-Tucker system are discussed.
1
Introduction
Introduced by S. M. Robinson [26], the notion of a strongly regular solution to a generalized equation has played a central role in the sensitivity and stability analysis of parametric, constrained optimization problems and variational inequalities. In the context of a parametric nonlinear program (NLP) with differentiable inequality and equality constraints, Robinson showed in this reference that if a stationary point of the program satisfies two assumptions: (i) LICQ, linear independence of the gradients of the active constraints, and (ii) SSOSC, strong second-order sufficiency condition, then the stationary point is a strongly regular solution of the Karush-Kuhn-Tucker (KKT) system of the given program. Recently, Bonnans and Sulem [3] establish a very interesting converse of this result for a local minimum; specifically, they show

¹This work was based on research supported by the National Science Foundation under grant CCR-9213739.
J. S. Pang
262
that if the stationary point is a local minimum of the nonlinear program, then strong regularity of the point implies LICQ and SSOSC. Since every local minimum must satisfy the second-order necessary condition (SONC) [4], the Bonnans-Sulem result raises the question of whether more consequences of the SONC can be obtained than previously known; in particular, since it has been shown in [10] that the SONC has an important role to play in the stability analysis of parametric variational inequalities (VIs) and related problems, it is natural to suspect that solution stability of a parametric VI can perhaps be characterized under the SONC. This line of thought has led us to the present investigation.

Another motivation of this work stems from a previous paper [20] in which we have obtained some sufficient conditions for a given solution to a parametric nonsmooth equation to be stable; we have applied the result to a parametric variational inequality defined on a polyhedral set that is independent of the parameter. The goal of this paper is to obtain necessary and sufficient conditions for a given solution to a parametric nonsmooth equation to be stable and to apply the derived results to a parametric variational inequality in which the defining set is parametrized and not necessarily polyhedral. Similar characterizations are obtained for strong stability. In the context of a parametric KKT system, under the strict Mangasarian-Fromovitz constraint qualification (SMFCQ) [13] and a second-order necessary condition, we show that strong regularity is equivalent to strong stability. Other specializations of the main characterizations of stability and strong stability will also be discussed. For references on the sensitivity analysis of parametric VIs and NLPs, we cite [26, 27, 10, 12, 14, 23, 32].
As it turns out, a key assumption needed in our analysis happens to unify both the SONC in nonlinear programming and an assumption in the classical Leray-Schauder fixed-point existence theorem; see [16, Theorem 3.1.4] or [18, Theorem 6.3.3]. This is somewhat unexpected because the SONC, well known as it may be in optimization, is not known to bear any relationship to existence results of fixed points of continuous mappings. The tool that brings out such a connection is degree theory [16, 18]. For readers who are not familiar with this theory, we advise them to consult the two references.
2
Miscellaneous Concepts
We shall be concerned with the system of parametric nonlinear equations:
$$F(x, \omega) = y, \qquad (1)$$
where $F : \mathbb{R}^{n+m} \to \mathbb{R}^n$ is a continuous function. The parameters of this equation are $y \in \mathbb{R}^n$, which varies in a neighborhood of the origin $0 \in \mathbb{R}^n$, and $\omega \in \mathbb{R}^m$, which varies in a neighborhood of a given vector $\omega^* \in \mathbb{R}^m$; $x$ is the primary variable of the equation. Suppose $x^* \in \mathbb{R}^n$ is a zero of the function $F(\cdot, \omega^*)$. We are interested in the sensitivity of this solution as the parameters $(\omega, y)$ vary around $(\omega^*, 0)$. The following concepts play a central role in the sensitivity analysis of (1).
Definition 1 The vector $x^* \in F(\cdot, \omega^*)^{-1}(0)$ is said to be stable if there exist a positive scalar $c > 0$ and neighborhoods $V \subseteq \mathbb{R}^n$, $W \subseteq \mathbb{R}^m$, and $U \subseteq \mathbb{R}^n$ of $x^*$, $\omega^*$, and the origin, respectively, such that

(a) for each $(\omega, y) \in W \times U$, the set
$$S_V(\omega, y) = F(\cdot, \omega)^{-1}(y) \cap V$$
is nonempty, and

(b) $\sup\{\|x - x^*\| : x \in S_V(\omega, y)\} \leq c\,(\|\omega - \omega^*\| + \|y\|)$.

The same zero $x^*$ is said to be strongly stable if the set $S_V(\omega, y)$ is a singleton for all $(\omega, y) \in W \times U$ and the function $S_V : W \times U \to V$ is Lipschitz continuous on its domain.

We make several remarks regarding the above definition. Stability of the zero $x^*$ implies that the perturbed system
$$F(x, \omega) = y, \qquad x \in V$$
is solvable, although not necessarily uniquely, for all pairs $(\omega, y)$ sufficiently close to $(\omega^*, 0)$. Condition (b) implies that $x^*$ is an isolated zero of $F(\cdot, \omega^*)$; i.e. $S_V(\omega^*, 0) = \{x^*\}$. In the language of set-valued analysis [1], (b) stipulates a locally, pseudo upper Lipschitzian property of the perturbed solution map $S_V$ at the base pair $(\omega^*, 0)$ relative to the vector $x^*$; that is,
$$S_V(\omega, y) \subseteq \{x^*\} + c\,(\|\omega - \omega^*\| + \|y\|)\,B(0,1),$$
where $B(0,1)$ is the unit Euclidean ball in $\mathbb{R}^n$. Clearly, if $x^*$ is strongly stable, then it is stable. Note that the Lipschitzian property of the solution function $S_V$ means that there exists a constant $L > 0$ such that for any two pairs $(\omega, y)$ and $(\omega', y')$ that belong to $W \times U$, we have
$$\|S_V(\omega, y) - S_V(\omega', y')\| \leq L\,(\|\omega - \omega'\| + \|y - y'\|). \qquad (2)$$
This last inequality is stronger than the requirement (b), which concerns only the base pair $(\omega^*, 0)$ and one perturbed pair $(\omega, y)$.

As we see from the above discussion, the stability of the zero $x^*$ involves two requirements: (a), which asserts the (local) solvability of all slightly perturbed equations, and (b), which provides some kind of continuity of the perturbed solutions with reference to $x^*$. In the context of variational inequalities, Bonnans [2] has coined the term "hemistability" for condition (a) and "semistability" for (b). He has obtained necessary and sufficient conditions for the semistability of a solution to a linearly constrained variational inequality and has discussed the combined role of hemistability
and semistability in the convergence analysis of Newton's method for KKT systems of variational inequalities. Related analysis of the latter type for general nonsmooth equations can be found in [20, Section 5]. Bonnans did not provide conditions for the hemistability of $x^*$. In the subsequent study of variational inequalities, we allow the problems to be defined on perturbed, non-polyhedral, convex sets.

In general, the stability of the zero $x^*$ will be studied via a first-order approximation of the mapping $F(\cdot, \omega^*)$. This approximation concept was formally defined in [28] and had played an important role in nonsmooth analysis.

Definition 2 A function $\Phi : \mathbb{R}^n \to \mathbb{R}^n$ is said to be a first-order approximation (FOA) of the function $H : \mathbb{R}^n \to \mathbb{R}^n$ at the vector $x^* \in \mathbb{R}^n$ if the error function $e(x) = H(x) - \Phi(x)$ satisfies two properties: (a) $e(x^*) = 0$ and (b)
$$\lim_{x \to x^*} \frac{e(x)}{\|x - x^*\|} = 0.$$
This approximation is said to be strong if
$$\lim_{x, x' \to x^*,\ x \neq x'} \frac{e(x) - e(x')}{\|x - x'\|} = 0.$$
If $H$ is F(réchet)-differentiable at $x^*$ with the Jacobian matrix $\nabla H(x^*)$, then the affine function
$$\Phi(x) = H(x^*) + \nabla H(x^*)(x - x^*)$$
is a FOA of $H$ at $x^*$. This is a strong FOA if $H$ is strongly F-differentiable at $x^*$. We will see more examples of FOAs when we discuss the application of the general theory to parametric variational inequalities. Although Definition 1 concerns a zero of the function $F$ which has two arguments, the stability and strong stability concepts certainly are applicable to a function of a single argument.
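Definition 2 can be probed numerically. The sketch below uses an arbitrary smooth map $H$ on $\mathbb{R}^2$ and an arbitrary point $x^*$ (both invented for illustration, not from the paper), builds the affine FOA from the Jacobian, and checks that the error quotient $e(x)/\|x - x^*\|$ vanishes at the expected rate:

```python
import numpy as np

# A numerical look at Definition 2 (H, x*, and the offsets are arbitrary).
H = lambda x: np.array([x[0] ** 2 + x[1], np.sin(x[1]) + x[0]])
xs = np.array([0.5, -0.3])
J = np.array([[2 * xs[0], 1.0],
              [1.0, np.cos(xs[1])]])         # Jacobian of H at x*

Phi = lambda x: H(xs) + J @ (x - xs)         # affine FOA of H at x*
err = lambda x: H(x) - Phi(x)                # error function e(x)

assert np.allclose(err(xs), 0.0)             # property (a): e(x*) = 0
for r in [1e-2, 1e-3, 1e-4]:                 # property (b): e(x)/||x - x*|| -> 0
    x = xs + r * np.array([1.0, 1.0]) / np.sqrt(2)
    assert np.linalg.norm(err(x)) / r < 10 * r   # quotient shrinks like O(r)
```

Since this $H$ is twice differentiable, the same $\Phi$ is in fact a strong FOA; for merely B-differentiable maps the FOA is given by the B-derivative and need not be affine.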
3
The Assumptions for Stability
Returning to the parametric equation (1), we postulate some blanket assumptions (besides continuity) on the function $F$ in order to study the stability of $x^*$. These assumptions are as follows:

(A) The function $F(\cdot, \omega^*)$ has a FOA $\Phi$ at $x^*$; moreover, $\Phi$ is continuous in a neighborhood of $x^*$.

(B) The function $\Psi(v) = \Phi(x^* + v)$ is positively homogeneous; i.e. $\Psi(\tau v) = \tau\Psi(v)$ for all $v \in \mathbb{R}^n$ and all $\tau > 0$.

(C) There exist neighborhoods $V_1$ and $W_1$ of $x^*$ and $\omega^*$ respectively, and a constant $\gamma > 0$ such that for all $(x, \omega) \in V_1 \times W_1$,
$$\|F(x, \omega) - F(x, \omega^*)\| \leq \gamma\,\|\omega - \omega^*\|.$$

(D) There exists a continuous function $\Gamma : \mathbb{R}^n \to \mathbb{R}^n$ such that (i) the origin is the only zero of $\Gamma$ (thus the index of $\Gamma$ at the origin, denoted $\mathrm{ind}(\Gamma, 0)$, is well defined), (ii) $\mathrm{ind}(\Gamma, 0)$ is nonzero, and (iii) for all scalars $\delta > 0$, the function $\Gamma + \delta\Psi$ never vanishes at a nonzero vector.

We explain the above assumptions. Assumption (A) is the cornerstone of our theory. Assumption (B) is motivated by the case of a B(ouligand)-differentiable function $F(\cdot, \omega^*)$. (We refer the reader to [19] and [28] for some basic properties of a B-differentiable function.) Indeed if $F(\cdot, \omega^*)$ is B-differentiable at $x^*$ with the B-derivative denoted by $BF(x^*, \omega^*)(\cdot)$, then we have $\Psi(v) = BF(x^*, \omega^*)(v)$ and $\Psi$ is positively homogeneous in $v$. It should be pointed out that assumption (B) is not of major importance in our theory; its role is to allow us to state some characterizing conditions as global properties of $\Psi$. Without (B), these conditions would become local properties of $\Psi$. Since our interest in this paper is restricted to the B-differentiable case, we find assumption (B) convenient to have.

Assumption (C) is a local Lipschitzian assumption of the function $F(x, \cdot)$ at $\omega^*$, with the Lipschitzian modulus independent of $x$ that is sufficiently close to $x^*$. This assumption is commonly made in sensitivity analysis of this type; see for example condition (b) in [28, Theorem 3.2].

Unlike the other three assumptions, (D) seems rather artificial at first sight. We could think of (D) as a kind of nonvanishing property of the function $\Psi$; this assumption generalizes the nonvanishing condition assumed in the Leray-Schauder fixed-point existence theorem in classical nonlinear analysis, which corresponds essentially to $\Gamma$ being the identity map; see [16, Theorem 3.1.4] or [18, Theorem 6.3.3]. In what follows, we give two simple instances in which (D) holds. The first instance occurs when
Lemma 1 Let $\Psi : \mathbb{R}^n \to \mathbb{R}^n$ be continuous and positively homogeneous. The following two statements are equivalent.

(a) $\Psi^{-1}(0) = \{0\}$;

(b) there exists a scalar $\lambda > 0$ such that
$$\|\Psi(v)\| \geq \lambda\,\|v\|, \qquad \text{for all } v \in \mathbb{R}^n. \qquad (3)$$
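Lemma 1 is easy to verify numerically in $\mathbb{R}^2$. The piecewise linear map below is a hypothetical example (not from the paper): since it vanishes only at the origin, the minimum of $\|\Psi(v)\|$ over the unit sphere is a positive number $\lambda$, and positive homogeneity propagates the bound (3) to all of $\mathbb{R}^2$:

```python
import numpy as np

# Lemma 1 in R^2 for a hypothetical continuous, positively homogeneous
# (piecewise linear) map Psi: the matrices A and B agree where v1 = 0.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
B = np.array([[1.0, 0.0], [0.0, 1.0]])

def psi(v):
    return (A if v[0] >= 0 else B) @ v

# Psi vanishes only at the origin, so lambda := min ||Psi|| over the unit
# sphere is positive; homogeneity then yields the global bound (3).
theta = np.linspace(0.0, 2 * np.pi, 4001)
sphere = np.stack([np.cos(theta), np.sin(theta)], axis=1)
lam = min(np.linalg.norm(psi(v)) for v in sphere)
assert lam > 0.9                              # here lambda is 1

for v in 3.7 * sphere[::100]:                 # bound (3) at scaled points
    assert np.linalg.norm(psi(v)) >= lam * np.linalg.norm(v) - 1e-9
```

In infinite dimensions this compactness argument is unavailable, which is one reason the lemma is stated for $\mathbb{R}^n$.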
4
Main Results
There are two main results. The first result concerns the stability of the zero $x^*$, and the other concerns the strong stability.

Theorem 1 Let $F : \mathbb{R}^{n+m} \to \mathbb{R}^n$ be a continuous function and suppose $F(x^*, \omega^*) = 0$. Under assumptions (A) to (D), the following four statements are equivalent.

(a) The origin is a stable zero of $\Psi$.

(b) $\Psi$ is surjective and $\Psi^{-1}(B(0,1))$ is bounded.

(c) The origin is the unique zero of $\Psi$.

(d) $x^*$ is a stable zero of $F(\cdot, \omega^*)$.

Proof. (a) $\Rightarrow$ (b). Since $\Psi$ is positively homogeneous, we have $\Psi(0) = 0$. Assume that the origin is a stable zero of $\Psi$. Then there exist neighborhoods $U_1$ and $U_2$ of the origin, and a constant $c > 0$ such that the system
$$\Psi(v) = y, \qquad v \in U_1$$
has a solution, for all $y \in U_2$; moreover,
$$\|v\| \leq c\,\|y\|$$
for any such solution. To show (b), let $y \in \mathbb{R}^n$ be arbitrary. Then $\tau y \in U_2$ for all $\tau > 0$ sufficiently small. Hence there exists a vector $v$ such that $\Psi(v) = \tau y$. By the positive homogeneity of $\Psi$, it follows that $\Psi(\tau^{-1} v) = y$. Thus $\Psi$ is surjective. To show the boundedness of $\Psi^{-1}(B(0,1))$, let $y \in B(0,1)$ and assume that $\Psi(v) = y$. Then for all $\tau > 0$ sufficiently small, we have $\tau y \in U_2$, $\tau v \in U_1$, and $\Psi(\tau v) = \tau y$. Thus it follows that $\tau\|v\| \leq c\tau\|y\|$, which implies $\|v\| \leq c$. Hence (b) holds.

(b) $\Rightarrow$ (c). Suppose that $\Psi(v) = 0$ for some vector $v \neq 0$. Then $\lambda v \in \Psi^{-1}(0) \subseteq \Psi^{-1}(B(0,1))$ for all $\lambda > 0$, so $\Psi^{-1}(B(0,1))$ is unbounded, a contradiction. Consequently the origin is the unique zero of $\Psi$ and (c) holds.
(c) $\Rightarrow$ (d). Since the origin is the unique zero of $\Psi$, $\mathrm{ind}(\Psi, 0)$ is well defined. Moreover, there is a constant $\lambda > 0$ such that (3) holds. It follows that $\Phi^{-1}$ is globally upper Lipschitzian at zero relative to $x^*$; i.e., we have
$$\Phi^{-1}(y) \subseteq \{x^*\} + \lambda^{-1}\|y\|\,B(0,1), \qquad \text{for all } y \in \mathbb{R}^n.$$
Define the homotopy $H : \mathbb{R}^n \times [0,1] \to \mathbb{R}^n$ by
$$H(v, t) = (1 - t)\,\Gamma(v) + t\,\Psi(v), \qquad \text{for } (v, t) \in \mathbb{R}^n \times [0,1].$$
By assumption (D), the function $H(\cdot, t)$ does not vanish at any vector $v \neq 0$ for all $t \in [0,1)$; the same is true for $t = 1$ by (c). Hence by the invariance property of the degree of a continuous mapping, it follows that
$$\mathrm{ind}(\Phi, x^*) = \mathrm{ind}(\Psi, 0) = \mathrm{ind}(H(\cdot, 1), 0) = \mathrm{ind}(H(\cdot, 0), 0) = \mathrm{ind}(\Gamma, 0) \neq 0.$$
By Theorem 1 in [20], part (d) follows.

(d) $\Rightarrow$ (a). We first show that (c) holds under (d). Suppose $\Psi(v) = 0$ for some vector $v \in \mathbb{R}^n$. Let $c$, $V$, $W$, and $U$ be given by Definition 1. By the positive homogeneity of $\Psi$, we have for any scalar $\tau > 0$,
$$0 = \Psi(\tau v) = \Phi(x^* + \tau v) = F(x^* + \tau v, \omega^*) - \bigl(F(x^* + \tau v, \omega^*) - \Phi(x^* + \tau v)\bigr). \qquad (4)$$
Since $\Phi$ is a FOA of $F(\cdot, \omega^*)$ at $x^*$, it follows that for every $\varepsilon \in (0, 1/c)$, where $c$ is the constant in condition (b) of the stability of $x^*$, we have
$$\|F(x^* + \tau v, \omega^*) - \Phi(x^* + \tau v)\| \leq \varepsilon\tau\|v\|$$
for all $\tau > 0$ sufficiently small. Choose $\tau > 0$ such that $F(x^* + \tau v, \omega^*) - \Phi(x^* + \tau v) \in U$ and $x^* + \tau v \in V$. Then by condition (b) in the definition of stability, it follows that
$$\tau\|v\| \leq c\,\|F(x^* + \tau v, \omega^*) - \Phi(x^* + \tau v)\| \leq c\,\varepsilon\tau\|v\|.$$
By the choice of $\varepsilon$, we deduce $v = 0$. Thus the origin is the only zero of $\Psi$. As in the proof of [(c) $\Rightarrow$ (d)], it follows that $\mathrm{ind}(\Psi, 0) \neq 0$. Thus for any open neighborhood $U' \subseteq \mathbb{R}^n$ of the origin, the degree of $\Psi$ at zero relative to $U'$ is nonzero. By the nearness property of the degree, it follows that for all vectors $y$ with $\|y\|$ sufficiently small, the degree of the translated function $\Psi - y$ at zero relative to $U'$ is also nonzero. In turn, this implies that there exists a vector $v \in U'$ satisfying $\Psi(v) = y$. This establishes the first requirement for the origin to be a stable zero of $\Psi$. The second requirement follows from the inequality (3). Consequently, (a) follows. Q.E.D.

Remark. Assumption (D) is not needed for the proof of (a) $\Rightarrow$ (b) $\Rightarrow$ (c).
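In one dimension the content of Theorem 1, and the bite of assumption (D), can be sketched with two hypothetical positively homogeneous maps (invented for illustration, not from the paper). The first changes sign and is surjective with the bound of Lemma 1; the second has the origin as its unique zero yet is not surjective, so statement (c) holds while (a) fails, and consistently with Theorem 1 no $\Gamma$ satisfying (D) can exist for it:

```python
# Two hypothetical positively homogeneous maps on R:
psi1 = lambda v: v + 0.5 * abs(v)   # 1.5v for v >= 0, 0.5v for v < 0: sign change
psi2 = lambda v: v + 2.0 * abs(v)   # 3v for v >= 0, -v for v < 0: no sign change

vs = [x / 100.0 for x in range(-500, 501)]

# Both maps have the origin as their unique zero (statement (c))...
assert all(psi1(v) != 0 for v in vs if v != 0)
assert all(psi2(v) != 0 for v in vs if v != 0)

# ...but psi2 is not surjective: its range is [0, +inf), so psi2(v) = y is
# unsolvable for every y < 0, and the origin is not a stable zero of psi2.
assert min(psi2(v) for v in vs) >= 0.0

# psi1 solves psi1(v) = y for every y, with |v| <= (1/lambda)|y|, lambda = 0.5:
for y in [-3.0, -0.1, 0.2, 4.0]:
    v = y / 1.5 if y >= 0 else y / 0.5
    assert abs(psi1(v) - y) < 1e-12 and abs(v) <= 2.0 * abs(y)
```

In one dimension $\mathrm{ind}(\Psi,0) \neq 0$ amounts exactly to a sign change of $\Psi$ across the origin, which is what separates the two cases.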
After seeing the above proof, Asen Dontchev communicated to the author that conditions (a) and (d) are equivalent under assumptions (A) and (C) only. Indeed, in the paper [6], which was completed after the present paper, this equivalence is extended to set-valued mappings. Hence, assumptions (B) and (D) are responsible only for the implications [(a) $\Rightarrow$ (b)] and [(c) $\Rightarrow$ (d)]. We have stated Theorem 1 by assuming all four conditions (A)-(D) because, as we shall see in the next section, statements (b) and (c) are central to the applications of the theorem discussed later.

Next we turn our discussion to the characterization of strong stability. For this purpose, we need to strengthen assumptions (A) and (C). In essence, the strengthening of (A) amounts to the assumption of a uniform, strong FOA for the function $F(\cdot, \omega)$ for all $\omega$ sufficiently close to $\omega^*$; the strengthening of (C) amounts to the assumption of Lipschitz continuity of the function $F(x, \cdot)$ in a neighborhood of $\omega^*$, with the Lipschitz modulus being the same for all $x$ sufficiently close to $x^*$.

(A)' The function $F(\cdot, \cdot)$ has a strong FOA $\Phi$ at $(x^*, \omega^*)$ in the sense of [28, Definition 2.4] which satisfies $\Phi(x^*) = 0$; that is, for every $\varepsilon > 0$ there exist neighborhoods $V_\varepsilon$ of $x^*$ and $W_\varepsilon$ of $\omega^*$ such that for all $(x^i, \omega) \in V_\varepsilon \times W_\varepsilon$, $i = 1, 2$, we have
$$\bigl\|[F(x^1, \omega) - \Phi(x^1)] - [F(x^2, \omega) - \Phi(x^2)]\bigr\| \leq \varepsilon\,\|x^1 - x^2\|. \qquad (5)$$
(Note that this assumption implies that $\Phi$ is continuous in a neighborhood of $x^*$, by the continuity of $F$.)

(C)' There exist neighborhoods $V_1$ and $W_1$ of $x^*$ and $\omega^*$ respectively, and a constant $\gamma > 0$ such that for all $(x, \omega, \omega') \in V_1 \times W_1 \times W_1$,
$$\|F(x, \omega) - F(x, \omega')\| \leq \gamma\,\|\omega - \omega'\|.$$
By taking $(x^2, \omega) = (x^*, \omega^*)$, we see that (A)' $\Rightarrow$ (A). Clearly (C)' $\Rightarrow$ (C). We say that a mapping $H : \mathbb{R}^n \to \mathbb{R}^n$ is a (global) Lipschitzian homeomorphism if $H$ is a homeomorphism from $\mathbb{R}^n$ onto itself and both $H$ and $H^{-1}$ are Lipschitz continuous on $\mathbb{R}^n$. The following is the second main result.

Theorem 2 Let $F : \mathbb{R}^{n+m} \to \mathbb{R}^n$ be a continuous function and suppose $F(x^*, \omega^*) = 0$. Under assumptions (A)', (B), (C)', and (D), the following three statements are equivalent.

(a)' The origin is a strongly stable zero of $\Psi$.

(b)' $\Psi$ is a homeomorphism and $\Psi^{-1}$ is Lipschitz continuous on $\mathbb{R}^n$.

(d)' $x^*$ is a strongly stable zero of $F(\cdot, \omega^*)$.

Furthermore, if in addition $\Psi$ is piecewise linear, then any one of the above statements is further equivalent to each one of the following statements:
(e) $\Psi$ is a (global) Lipschitzian homeomorphism;

(f) $\Psi$ is injective;

(g) $\Psi$ is bijective.

Proof. (a)' $\Rightarrow$ (b)'. It follows from the proof of (a) $\Rightarrow$ (b) in Theorem 1 that $\Psi$ is surjective. By the strong stability assumption of the origin as a zero of $\Psi$ and by a scaling argument, it can be proved easily that there exists a constant $c > 0$ such that whenever $\Psi(v^i) = y^i$ for $i = 1, 2$, then
$$\|v^1 - v^2\| \leq c\,\|y^1 - y^2\|. \qquad (6)$$
This establishes (b)'.

(b)' $\Rightarrow$ (d)'. It follows from Theorem 1 that $x^*$ is a stable zero of $F(\cdot, \omega^*)$. Let $V$, $W$, and $U$ be, respectively, the neighborhoods of $x^*$, $\omega^*$, and $0$ associated with the stability of $x^*$. To show the Lipschitzian property (2), let $c > 0$ be the Lipschitz modulus of $\Psi^{-1}$. Let $\varepsilon \in (0, 1/c)$ be arbitrary and let $V_\varepsilon$ and $W_\varepsilon$ be, respectively, the neighborhoods of $x^*$ and $\omega^*$ associated with $\varepsilon$ as stipulated in assumption (A)'. Let $\tilde W = W \cap W_1 \cap W_\varepsilon$. Suppose $F(x^i, \omega^i) = y^i$ for some $(\omega^i, y^i) \in \tilde W \times U$ for $i = 1, 2$. By restricting the neighborhoods $\tilde W$ and $U$ if necessary, it follows from the stability condition (b) that $x^i \in V_\varepsilon \cap V_1$ for $i = 1, 2$. Write $v^i = x^i - x^*$. We have
$$y^i = \Psi(v^i) + F(x^i, \omega^i) - \Phi(x^i),$$
or equivalently,
$$v^i = \Psi^{-1}\bigl(y^i - F(x^i, \omega^i) + \Phi(x^i)\bigr).$$
Consequently,
$$\|v^1 - v^2\| \leq c\,\bigl(\|y^1 - y^2\| + \bigl\|[F(x^1, \omega^1) - \Phi(x^1)] - [F(x^2, \omega^1) - \Phi(x^2)]\bigr\| + \|F(x^2, \omega^1) - F(x^2, \omega^2)\|\bigr)$$
$$\leq c\,\bigl(\|y^1 - y^2\| + \varepsilon\|x^1 - x^2\| + \gamma\|\omega^1 - \omega^2\|\bigr),$$
which, since $\|x^1 - x^2\| = \|v^1 - v^2\|$, implies
$$\|x^1 - x^2\| \leq \frac{c\,\max(1, \gamma)}{1 - c\,\varepsilon}\,\bigl(\|y^1 - y^2\| + \|\omega^1 - \omega^2\|\bigr)$$
as desired.

(d)' $\Rightarrow$ (a)'. By Theorem 1, it follows that the origin is a stable zero of $\Psi$. Thus $\Psi$ is surjective. Hence it remains to show the existence of a constant $c > 0$ such that (6) holds whenever $\Psi(v^i) = y^i$ for $i = 1, 2$. (This will imply in particular that $\Psi$ is injective, hence is bijective because we already know that $\Psi$ is surjective.) Let $L$, $V$, $W$, and $U$ be, respectively, the Lipschitz modulus of the solution map $S_V$ and the neighborhoods of $x^*$, $\omega^*$, and the origin associated with the strong stability of $x^*$. Suppose $\Psi(v^i) = y^i$ for $i = 1, 2$. Choose $\varepsilon \in (0, 1/L)$; let $V_\varepsilon$ and $W_\varepsilon$ be as before. As in (4), we have for any $\tau > 0$,
$$\tau y^i = F(x^* + \tau v^i, \omega^*) - \bigl(F(x^* + \tau v^i, \omega^*) - \Phi(x^* + \tau v^i)\bigr).$$
By choosing $\tau$ sufficiently small, we can ensure that $x^i = x^* + \tau v^i \in V \cap V_\varepsilon$ and
$$F(x^i, \omega^*) = \tau y^i + \bigl(F(x^i, \omega^*) - \Phi(x^i)\bigr) \in U,$$
for $i = 1, 2$. Thus we have
$$\tau\|v^1 - v^2\| \leq L\,\bigl(\tau\|y^1 - y^2\| + \bigl\|[F(x^1, \omega^*) - \Phi(x^1)] - [F(x^2, \omega^*) - \Phi(x^2)]\bigr\|\bigr).$$
Using the strong FOA inequality (5) and rearranging terms, we deduce
$$\|v^1 - v^2\| \leq \frac{L}{1 - L\varepsilon}\,\|y^1 - y^2\|,$$
which establishes the desired Lipschitz property of $\Psi^{-1}$.

Finally, if $\Psi$ is piecewise linear, then $\Psi$ must be Lipschitz continuous, by a result in [8]. Thus (b)' is equivalent to (e). The equivalence of (e), (f), and (g) for a piecewise linear map is pointed out in the proof of Theorem 9 in [21]. Q.E.D.

Remark. We point out that assumption (D) is needed only for the proof of (d)' $\Rightarrow$ (a)'. In particular, (D) is really redundant in the proof of (b)' $\Rightarrow$ (d)'. Indeed, if $\Psi$ is a homeomorphism, then $\mathrm{ind}(\Psi, 0)$ is already equal to $\pm 1$; thus (D) holds with $\Gamma = \Psi$ under the assumption in (b)'.

The equivalence of statements (a)' and (d)' in Theorem 2 under assumptions (A)' and (C)' was proved in [7] within a much more general context than the present setting. We have retained the other two assumptions (B) and (D) in the above theorem in order to establish the equivalence of all the statements therein. We close this section by mentioning the Habilitation Thesis of Scholtes [31], which contains a chapter discussing piecewise affine functions; in particular, Section 2.3 is particularly relevant to the equivalence of statements (e), (f), and (g) in Theorem 2.
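The piecewise linear case is easy to sketch numerically. The two-piece map on $\mathbb{R}^2$ below is a hypothetical example (not from the paper): it is injective, and in line with the equivalence of (e), (f), and (g) it is onto with an explicitly computable piecewise linear (hence Lipschitz) inverse:

```python
import numpy as np

# A continuous piecewise linear map on R^2: Psi(v) = Av on {v1 >= 0} and
# Bv on {v1 < 0}; A and B agree where v1 = 0, and each piece maps its
# half-plane onto itself, so Psi is injective.
A = np.array([[1.0, 0.0], [3.0, 1.0]])
B = np.array([[2.0, 0.0], [3.0, 1.0]])

def psi(v):
    return (A if v[0] >= 0 else B) @ v

def psi_inv(y):                        # explicit inverse, piece by piece
    M = A if y[0] >= 0 else B          # sign of y1 matches sign of v1 here
    return np.linalg.solve(M, y)

rng = np.random.default_rng(0)
for y in rng.normal(size=(200, 2)):
    v = psi_inv(y)
    assert np.allclose(psi(v), y)      # Psi is onto, with Lipschitz inverse
```

Dropping the continuity matching (taking pieces that disagree on $v_1 = 0$) would destroy injectivity, which is why statements (e) through (g) are reserved for continuous piecewise linear maps.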
Solution Stability of Nonsmooth Equations

5 Parametric Variational Inequalities
Let F : R^(n+m) → R^n be a continuously differentiable mapping, and let g : R^(n+m) → R^p and h : R^(n+m) → R^q be twice continuously differentiable vector-valued functions such that g_i(·, ω) is convex and h_j(·, ω) is affine in the first argument for every fixed second argument ω ∈ R^m. (The continuous differentiability assumption on these functions can be relaxed somewhat. Nevertheless, for the sake of simplifying the discussion, we shall use the assumption as stated.) Let

C(ω) = { x ∈ R^n : g(x, ω) ≤ 0, h(x, ω) = 0 },

which is a closed convex subset of R^n for every ω ∈ R^m. The parametric variational inequality is the family of problems:

{ VI (F(·, ω), C(ω)) : ω ∈ Ω },

where Ω is a subset of R^m and for each ω ∈ Ω, VI (F(·, ω), C(ω)) denotes the problem of finding a vector x ∈ C(ω) such that

(y − x)^T F(x, ω) ≥ 0, for all y ∈ C(ω).

For any ω ∈ R^m, let SOL(F(·, ω), C(ω)) denote the (possibly empty) set of solutions of the VI (F(·, ω), C(ω)). By letting Π_K(u) denote the Euclidean projection of the vector u ∈ R^n onto the closed convex set K ⊆ R^n, it is known that the VI (F(·, ω), C(ω)) is equivalent to the normal equation [29]:

0 = H(z, ω) = F(Π_C(ω)(z), ω) + z − Π_C(ω)(z),  (7)

in the following sense: namely, if x solves the VI (F(·, ω), C(ω)), then z = x − F(x, ω) is a zero of H(·, ω); conversely, if z is a zero of H(·, ω), then x = Π_C(ω)(z) solves the VI (F(·, ω), C(ω)). We shall say that a solution x* ∈ SOL(F(·, ω*), C(ω*)) is stable if the corresponding vector z* = x* − F(x*, ω*) is a stable zero of the parametric equation H(·, ω*) = 0 in the sense of Definition 1. Strong stability of x* is defined similarly. In what follows, we shall apply the theory developed in the last section to the parametric equation (7); as a consequence, we will obtain a characterization for the stability of a solution to a parametric variational inequality.
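The correspondence between the VI and the normal equation (7) can be checked numerically in the simplest setting. The sketch below uses invented data: C = R²₊, so that the projection is the componentwise positive part, and F(x) = Mx + q is affine; it verifies that z = x − F(x) is a zero of the normal map when x solves the VI.

```python
# Invented data: C = R^2_+, F(x) = M x + q with M = [[2, 0], [0, 3]],
# q = (-2, 1).  On the orthant the VI is the complementarity system
# x >= 0, F(x) >= 0, x^T F(x) = 0, and the normal map is
# H(z) = F(proj_C(z)) + z - proj_C(z).

def proj_C(z):
    # Euclidean projection onto the nonnegative orthant
    return [max(t, 0.0) for t in z]

def F(x):
    return [2.0 * x[0] - 2.0, 3.0 * x[1] + 1.0]

def H(z):
    p = proj_C(z)
    f = F(p)
    return [f[i] + z[i] - p[i] for i in range(len(z))]

x_star = [1.0, 0.0]                 # solves the VI: F(x*) = (0, 1) >= 0
z_star = [x_star[i] - F(x_star)[i] for i in range(2)]
print(z_star, H(z_star))  # -> [1.0, -1.0] [0.0, 0.0]
```

Conversely, starting from the zero z* = (1, −1) of H, the projection Π_C(z*) recovers the VI solution x* = (1, 0).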
Properties of the projection

Throughout the rest of this section, we let (z*, ω*) ∈ R^(n+m) be such that H(z*, ω*) = 0; write x* = Π_C(ω*)(z*). Thus x* ∈ SOL(F(·, ω*), C(ω*)) and z* = x* − F(x*, ω*). For the theory in the last section to be applicable, we need to obtain a FOA of the function H(·, ω*) at z*; in turn, the existence of such a FOA relies on some basic properties of the parametric projection operator Π_C(ω)(z).
J. S. Pang
By definition, for each ω, the projection vector Π_C(ω)(z) is the unique optimal solution of the nonlinear program in the variable x:

minimize ½ ‖x − z‖²
subject to x ∈ C(ω).  (8)

As a function in x (with z fixed), the objective function of this program is strongly convex (and quadratic). Let

I(x*, ω*) = { i : g_i(x*, ω*) = 0 }

be the index set of active (inequality) constraints at (x*, ω*). Throughout the rest of this section, we postulate that the constant-rank constraint qualification (CRCQ) holds at the pair (x*, ω*) [11]; that is,

(CRCQ) there exist neighborhoods V₁ ⊆ R^n of x* and W₁ ⊆ R^m of ω* such that for any subsets Q ⊆ {1, ..., q} and J ⊆ I(x*, ω*), the set of gradient vectors

{ ∇_x h_j(x, ω) : j ∈ Q } ∪ { ∇_x g_i(x, ω) : i ∈ J }

has the same rank (depending on Q and J) for all vectors (x, ω) ∈ V₁ × W₁.

Let M(z*, ω*) denote the set of KKT multipliers of the projection problem (8) at (z*, ω*); that is, M(z*, ω*) consists of vectors (λ, μ) ∈ R^(p+q) such that

x* − z* + Σ_{i=1}^p λ_i ∇_x g_i(x*, ω*) + Σ_{j=1}^q μ_j ∇_x h_j(x*, ω*) = 0,
λ ≥ 0, λ^T g(x*, ω*) = 0.  (9)
The CRCQ implies that M(z*, ω*) is a nonempty polyhedron. As noted before, x* − z* = F(x*, ω*); thus the system (9) is exactly the KKT system of the VI (F(·, ω*), C(ω*)) at the solution x*. Define the VI (vector-valued) Lagrangian function: for (x, λ, μ, ω) ∈ R^n × R^p × R^q × R^m,

L(x, λ, μ, ω) = F(x, ω) + Σ_{i=1}^p λ_i ∇_x g_i(x, ω) + Σ_{j=1}^q μ_j ∇_x h_j(x, ω).  (10)

Since h(·, ω) is affine for each fixed ω, we note that

∇_x L(x*, λ, μ, ω*) = ∇_x F(x*, ω*) + Σ_{i=1}^p λ_i ∇²_xx g_i(x*, ω*)

is independent of the multiplier μ. In addition to the CRCQ, we also need the familiar Mangasarian-Fromovitz constraint qualification (MFCQ) at (x*, ω*) to ensure that C(ω) is nonempty and the projection Π_C(ω) is well defined for all ω near ω*. This constraint qualification is stated below:
(a) the gradient vectors

{ ∇_x h_j(x*, ω*) : j = 1, ..., q }

are linearly independent; and

(b) there exists a vector u ∈ R^n such that

∇_x g_i(x*, ω*)^T u < 0 for all i ∈ I(x*, ω*),
∇_x h_j(x*, ω*)^T u = 0 for all j = 1, ..., q.
Under the above setting, the following property of the parametric projection map can be established by applying [24, Theorem 2] to the parametric nonlinear program (8) with x as the primary variable and (z, ω) as the parameter.

Lemma 2 Under the CRCQ and MFCQ at (x*, ω*), there exist neighborhoods Z₀ of z* and W₀ of ω* such that the projection map Π_C(ω)(z), as a function of (z, ω), is PC¹, hence locally Lipschitz and B-differentiable, on the neighborhood Z₀ × W₀; in particular, there exists a constant γ₀ > 0 such that for all (z^i, ω^i) ∈ Z₀ × W₀, i = 1, 2,

‖Π_C(ω¹)(z¹) − Π_C(ω²)(z²)‖ ≤ γ₀ ( ‖ω¹ − ω²‖ + ‖z¹ − z²‖ ).
Although the results in [24] can be used for computing the B-derivative of the parametric projection map, we shall need this derivative only for the map Π_C(ω*)(z) considered as a function in z alone (with ω* fixed). Define the critical cone of the set C(ω*) at x* as follows:

K(x*, ω*) = { v ∈ R^n : ∇_x g_i(x*, ω*)^T v ≤ 0, for all i ∈ I(x*, ω*); ∇_x h_j(x*, ω*)^T v = 0, for all j = 1, ..., q } ∩ (x* − z*)^⊥,

where a^⊥ denotes the orthogonal complement of the linear subspace spanned by the vector a ∈ R^n. Since x* − z* = F(x*, ω*) and the CRCQ holds at (x*, ω*), we obtain the following representation of the critical cone which does not involve the auxiliary vector z*:

K(x*, ω*) = T(x*, C(ω*)) ∩ F(x*, ω*)^⊥,  (11)

where T(x*, C(ω*)) is the tangent cone of C(ω*) at x*. We write BΠ_C(ω*)(z*, d) to denote the directional derivative of the function Π_C(ω*) at z* along the direction d. The following lemma is a summary of some known results whose proofs can be found in [21, 19, 28].

Lemma 3 Let ω* be fixed. Under the CRCQ stated above, for every d ∈ R^n, the B-derivative BΠ_C(ω*)(z*, d) is the unique optimal solution of the strictly convex quadratic program in the variable v ∈ R^n:

minimize ½ v^T A v − v^T d
subject to v ∈ K(x*, ω*),  (12)
where

A = I + Σ_{i : λ_i > 0} λ_i ∇²_xx g_i(x*, ω*)  (13)

is symmetric positive definite, and (λ, μ) ∈ M(z*, ω*) is arbitrary. Moreover, if each g_i(·, ω*) is affine (yielding A = I), then the B-derivative BΠ_C(ω*)(z*, ·) is strong; indeed it holds in this case that

Π_C(ω*)(z* + v) = x* + Π_K(x*,ω*)(v)  (14)

for all v ∈ R^n with ‖v‖ sufficiently small.

Related results on the directional differentiability of solutions to parametric nonlinear programs under other constraint qualifications can be found in [32] and the bibliography in [24].
Verification of assumptions (A)-(D)

We now focus our discussion on the parametric equation (7). The following result identifies a FOA of H(·, ω*) at z*.

Proposition 1 Suppose the CRCQ holds at (x*, ω*). A FOA of H(·, ω*) at z* is given by the continuous function

z ↦ ( ∇_x F(x*, ω*) − I ) BΠ_C(ω*)(z*, z − z*) + z − z*, for all z ∈ R^n.

Moreover, this FOA is strong if each g_i(·, ω*) is affine.

Proof. This follows from the chain rule of B-differentiation [28] and the expression (14) for the polyhedral case. Q.E.D.

Consequently, assumption (A) holds. Since

G(v) = ( ∇_x F(x*, ω*) − I ) BΠ_C(ω*)(z*, v) + v, for all v ∈ R^n,  (15)

is clearly positively homogeneous, (B) holds with this function G. To verify assumption (C), suppose that the MFCQ also holds at (x*, ω*). We have

H(z, ω) − H(z, ω*)
= F(Π_C(ω)(z), ω) − F(Π_C(ω*)(z), ω*) − Π_C(ω)(z) + Π_C(ω*)(z)
= [ F(Π_C(ω)(z), ω) − F(Π_C(ω)(z), ω*) ] + [ F(Π_C(ω)(z), ω*) − F(Π_C(ω*)(z), ω*) ] − Π_C(ω)(z) + Π_C(ω*)(z).

By Lemma 2 and the differentiability assumption on the function F, it follows that condition (C) is satisfied for H at (z*, ω*). Incidentally, the sole role of the CRCQ is to ensure that ‖Π_C(ω)(z) − Π_C(ω*)(z)‖ is bounded by a constant (that is independent of z near z*) times ‖ω − ω*‖. As long as the projection operator has the latter property (possibly without the CRCQ), it becomes possible to apply Theorem 1.

Finally we come to the last condition (D). We postulate the following assumption:

(SONC) for every v ∈ K(x*, ω*), there exists (λ, μ) ∈ M(z*, ω*) such that

v^T ∇_x L(x*, λ, μ, ω*) v ≥ 0.  (16)

The reason why we label this assumption as SONC is that when

F(x, ω) = ∇_x θ(x, ω), for all (x, ω) ∈ R^(n+m),

for some real-valued twice continuously differentiable function θ : R^(n+m) → R, the above SONC is exactly the second-order necessary condition [4] for the following nonlinear program in the variable x ∈ R^n:

minimize θ(x, ω*)
subject to x ∈ C(ω*).

Note that in general the above SONC is satisfied if for every (λ, μ) ∈ M(z*, ω*), the matrix ∇_x L(x*, λ, μ, ω*) is copositive on the critical cone K(x*, ω*).

Proposition 2 If SONC holds, then condition (D) is valid for the function H defined in (7).

Proof. Let the function Γ required in (D) be the identity map. It remains to verify that I + δG never vanishes at a nonzero vector for all δ > 0, where G is given by (15). Assume the contrary; let w ≠ 0 be such that

w + δG(w) = 0  (17)
for some δ > 0. Let v = BΠ_C(ω*)(z*, w). By Lemma 3, v ∈ K(x*, ω*). Let (λ, μ) ∈ M(z*, ω*) be such that (16) holds. Let A be defined by (13) with this (λ, μ). Consider the map

H(d) = ( I − A ) P(d) + d, d ∈ R^n,

where P(d) is the unique solution of the quadratic program (12). In particular, we have P(w) = v. According to [21, Lemma 8], the map H is a global Lipschitzian homeomorphism from R^n onto itself, and its inverse is given by

H^(−1)(y) = ( A − I ) Π_K(y) + y,

where K is a shorthand for K(x*, ω*). Moreover, we have Π_K ∘ H = P. In terms of the mapping P, the equation (17) can be written as

(1 + δ) w + δ ( ∇_x F(x*, ω*) − I ) v = 0.  (18)

Let u = H(w). Then we have Π_K(u) = P(w) = v; moreover,

w = H^(−1)(u) = ( A − I ) Π_K(u) + u = ( A − I ) v + u.

Substituting the matrix A from (13), we obtain

(1 + δ)( u − Π_K(u) ) + ( (1 + δ) A + δ ( ∇_x F(x*, ω*) − I ) ) v = 0,

or equivalently,

(1 + δ)( u − Π_K(u) ) + ( A + δ ∇_x L(x*, λ, μ, ω*) ) v = 0.

Premultiplying this equation by v^T = Π_K(u)^T and using the following facts: A is positive definite, v^T ∇_x L(x*, λ, μ, ω*) v ≥ 0, and, since K is a cone, Π_K(u)^T ( u − Π_K(u) ) = 0, we deduce that v = 0. In turn, from (18), we obtain w = 0, which is a contradiction. Q.E.D.

Summarizing the above discussion, we can now state the following result, which is essentially a corollary of Theorem 1. In this result, K^* denotes the dual cone of the set K.

Theorem 3 Assume that the differentiability and convexity conditions on the functions F, g, and h as stated in the beginning of this section are valid. Let (x*, z*, ω*) be as given above. Suppose that the CRCQ, MFCQ, and SONC hold at (x*, ω*). The following statements are then equivalent.

(a) x* is a stable solution of the parametric VI (F(·, ω*), C(ω*)).

(b) For every (λ, μ) ∈ M(z*, ω*), the implication below holds:

0 ≠ v ∈ K(x*, ω*), ∇_x L(x*, λ, μ, ω*) v ∈ K(x*, ω*)^* ⟹ v^T ∇_x L(x*, λ, μ, ω*) v > 0.  (19)

(c) There exists (λ, μ) ∈ M(z*, ω*) such that the implication (19) holds.

(d) For every (λ, μ) ∈ M(z*, ω*), there exists a constant c > 0 such that for every vector q ∈ R^n, the generalized linear complementarity problem (GLCP) in the variable v:

v ∈ K(x*, ω*), q + ∇_x L(x*, λ, μ, ω*) v ∈ K(x*, ω*)^*, v^T ( q + ∇_x L(x*, λ, μ, ω*) v ) = 0  (20)

has a solution, and all such solutions v satisfy ‖v‖ ≤ c ‖q‖.

(e) There exist (λ, μ) ∈ M(z*, ω*) and c > 0 such that the conclusion of (d) holds.
In addition, if the matrix ∇_x F(x*, ω*) is symmetric, then any one of the above statements (a)-(e) is further equivalent to either one of the following two statements.

(f) For every (λ, μ) ∈ M(z*, ω*), the implication below holds:

0 ≠ v ∈ K(x*, ω*) ⟹ v^T ∇_x L(x*, λ, μ, ω*) v > 0.  (21)

(g) There exists (λ, μ) ∈ M(z*, ω*) such that (21) holds.
Proof. The assumptions of Theorem 1 hold for the function H at the pair (z*, ω*). Thus this theorem is applicable.

(a) ⟹ (b). By the implication (d) ⟹ (c) in Theorem 1, we deduce that the function G defined in (15) has a unique zero at the origin. Suppose that for some (λ, μ) ∈ M(z*, ω*), the implication (19) fails to hold for some vector v. This vector v satisfies

0 ≠ v ∈ K(x*, ω*), ∇_x L(x*, λ, μ, ω*) v ∈ K(x*, ω*)^*, v^T ∇_x L(x*, λ, μ, ω*) v = 0.

Let A be defined by (13) with this (λ, μ), and let H and P be the mappings associated with this matrix A, as in the proof of Proposition 2. Let u = v − ∇_x L(x*, λ, μ, ω*) v. Then we have v = Π_K(u), where K is a shorthand for K(x*, ω*). Let w = H^(−1)(u). Then

w = ( A − I ) Π_K(u) + u = ( A − I ) v + u.

Moreover, we have P(w) = Π_K ∘ H(w) = Π_K(u) = v; that is, v = BΠ_C(ω*)(z*, w). Now,

G(w) = ( ∇_x F(x*, ω*) − I ) BΠ_C(ω*)(z*, w) + w = ( ∇_x F(x*, ω*) − I ) v + ( A − I ) v + u = ( ∇_x L(x*, λ, μ, ω*) − I ) v + u = 0.

Thus w = 0, which implies u = v = 0. This is a contradiction. Hence (b) holds.

(b) ⟹ (c). This is obvious.

(c) ⟹ (a). Either assumption (c) or (b) is equivalent to the function G having the origin as its unique zero. Thus (a) holds if either (c) or (b) holds. Consequently, (a), (b), and (c) are equivalent.

In a similar way, we can establish that either (d) or (e) is equivalent to the function G being surjective and G^(−1)(B(0, 1)) being bounded. But the latter two facts are exactly condition (b) in Theorem 1. Consequently, the five conditions (a)-(e) in the present theorem are all equivalent.

Finally, if ∇_x F(x*, ω*) is symmetric, then so is ∇_x L(x*, λ, μ, ω*) for any (λ, μ) ∈ M(z*, ω*). In this case, the equivalence of the two implications (19) and (21), under the assumed SONC, is an elementary fact which has been noted in [10, Proposition 6], amongst other places. Q.E.D.
The implication (21) states that the matrix ∇_x L(x*, λ, μ, ω*) is strictly copositive on the cone K(x*, ω*). When specialized to the case of parametric nonlinear programming, this condition is exactly the second-order sufficient condition used, e.g., by Robinson [27]. It is a weakened form of the well-known, classical second-order sufficiency condition in nonlinear programming, which essentially requires that the matrix be positive definite on the linear span of the cone K(x*, ω*).

We make some further remarks about Theorem 3. From complementarity theory [9], it can be shown that for a given pair (λ, μ), if the matrix ∇_x L(x*, λ, μ, ω*) is copositive on K(x*, ω*), then the implication (19) holds for this (λ, μ) if and only if the GLCP (20) has a solution for all vectors q ∈ R^n and all such solutions are bounded by a constant multiple of ‖q‖. It is known from [10, Theorem 8] that if the matrix ∇_x L(x*, λ, μ, ω*) is copositive on K(x*, ω*) for all pairs (λ, μ) ∈ M(z*, ω*), then, without the CRCQ, statement (b) implies (a). Thus the CRCQ allows us to weaken the assumption in this previous result (which requires the copositivity of the Jacobian matrix of the Lagrangian function for all multiplier pairs) to the present SONC. At the present time, we are not sure whether some form of Theorem 3 will be valid without the CRCQ.
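Strict copositivity on a cone, as in implication (21), can be probed (though not certified) by sampling directions in the cone. The sketch below uses K = R²₊ and invented matrices; note that the first matrix is strictly copositive on the orthant although its symmetric part is indefinite, which illustrates how (21) is weaker than positive definiteness.

```python
import math

# Sampling probe (a necessary check only, not a certificate) of strict
# copositivity of M on K = R^2_+: v^T M v > 0 for all 0 != v in K.
# Matrices invented for the illustration.

def quad_form(M, v):
    n = len(v)
    return sum(M[i][j] * v[i] * v[j] for i in range(n) for j in range(n))

def looks_strictly_copositive(M, samples=200):
    # test unit directions spanning the first quadrant
    for k in range(samples + 1):
        t = (math.pi / 2.0) * k / samples
        v = [math.cos(t), math.sin(t)]
        if quad_form(M, v) <= 0.0:
            return False
    return True

print(looks_strictly_copositive([[1.0, 2.0], [2.0, 1.0]]))    # -> True
print(looks_strictly_copositive([[1.0, -2.0], [-2.0, 1.0]]))  # -> False
```

A genuine certificate would require a copositivity test (an NP-hard problem in general); sampling can only refute, never prove, strict copositivity.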
6 The Case of a Fixed Polyhedral Set

We consider the case where C(ω) is a constant polyhedron in R^n, which we denote C, for all ω ∈ R^m. Our goal is to apply Theorem 2 to characterize the strong stability of a solution x* ∈ SOL(F(·, ω*), C). We leave it to the reader to specialize Theorem 3 to obtain detailed characterizations for stability in the (constant) polyhedral case. Here we simply point out that in this case, statement (b) in Theorem 3 is equivalent to Reinoza's strong positivity condition [25]; further discussion of the latter condition can be found in [2, 10]. Throughout this section, the MFCQ is not needed, whereas the CRCQ clearly holds because of the polyhedrality of C. As before, let z* = x* − F(x*, ω*). The function H has a slightly simpler form:

H(z, ω) = F(Π_C(z), ω) + z − Π_C(z).  (22)
Let K denote the critical cone of C at x* with respect to the function F(·, ω*); that is,

K = T(x*, C) ∩ F(x*, ω*)^⊥,

where T(x*, C) is the tangent cone of C at x*. Define the functions

z ↦ ( ∇_x F(x*, ω*) − I ) Π_K(z − z*) + z − z*, for all z ∈ R^n,

G(v) = ( ∇_x F(x*, ω*) − I ) Π_K(v) + v, for all v ∈ R^n.  (23)

We note that G is the "linear" normal map associated with the pair ( ∇_x F(x*, ω*), K ); see [29]. Necessary and sufficient conditions for such a map to be a Lipschitzian homeomorphism on R^n are obtained in this reference. In particular, one of these conditions is that G is "coherently oriented"; see the reference for the definition. Further "nonsingularity" results for the map G in the case where ∇_x F(x*, ω*) is symmetric can be found in [30]. We also note that there is a one-to-one correspondence (of the standard type) between the solutions of the equation

q + G(w) = 0

and the solutions of the affine variational inequality (AVI) defined by the polyhedral cone K and the affine map

v ↦ q + ∇_x F(x*, ω*) v.

We denote the latter problem by AVI (q, ∇_x F(x*, ω*), K). Since K is a cone, this AVI is equivalent to the GLCP (20) with ∇_x L(x*, λ, μ, ω*) and K(x*, ω*) substituted by ∇_x F(x*, ω*) and K, respectively. By the identity (14) and the continuous differentiability of F, it is not difficult to verify that assumption (A)' is valid. Assumption (C)' is also easy to verify. In the present setting, the SONC takes the following simple form:

(SONC: constant polyhedral case): the matrix ∇_x F(x*, ω*) is copositive on the cone K.

We are now in a position to apply Theorem 2 and obtain the following analog of Theorem 3 for the strong stability of x*.

Theorem 4 Let C be a polyhedron in R^n; let F : R^(n+m) → R^n be continuously differentiable. Let x* ∈ SOL(F(·, ω*), C). Suppose that ∇_x F(x*, ω*) is copositive on the cone K. The following statements are then equivalent.

(a) x* is a strongly stable solution of the parametric VI (F(·, ω*), C).

(b) The map G defined by (23) is a Lipschitzian homeomorphism on R^n.

(c) The map G is coherently oriented.

(d) The origin is a strongly stable solution of the AVI (0, ∇_x F(x*, ω*), K).

(e) For every vector q ∈ R^n, the AVI (q, ∇_x F(x*, ω*), K) has a unique solution.

Proof. There is nothing more to prove about this theorem except to point out two things: (i) statement (d) is equivalent to the strong stability of the origin as a zero of the map G, and (ii) the unique solvability of the AVI (q, ∇_x F(x*, ω*), K) is equivalent to the bijectivity of the map G. The asserted equivalence of the five statements (a)-(e) is easily seen to be a consequence of Theorem 2. Q.E.D.
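When the cone K happens to be the nonnegative orthant R^n₊, the AVI (q, M, K) of statement (e) reduces to the standard linear complementarity problem, which for tiny dimensions can be solved by enumerating the complementary index sets. The data below are invented, and the enumeration is a sketch rather than an efficient method.

```python
from itertools import combinations

# Invented data.  For K = R^2_+ the AVI (q, M, K) is the standard LCP:
# find v >= 0 with q + M v >= 0 and v^T (q + M v) = 0.  For n = 2 we
# enumerate the four complementary pieces directly.

def solve_lcp_2x2(M, q):
    sols = []
    for r in range(3):
        for alpha in combinations(range(2), r):
            v = [0.0, 0.0]
            if alpha == (0,):
                if M[0][0] == 0:
                    continue
                v[0] = -q[0] / M[0][0]
            elif alpha == (1,):
                if M[1][1] == 0:
                    continue
                v[1] = -q[1] / M[1][1]
            elif alpha == (0, 1):
                det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
                if det == 0:
                    continue
                v[0] = (-q[0] * M[1][1] + q[1] * M[0][1]) / det
                v[1] = (-q[1] * M[0][0] + q[0] * M[1][0]) / det
            w = [q[i] + M[i][0] * v[0] + M[i][1] * v[1] for i in range(2)]
            if all(vi >= -1e-9 for vi in v) and all(wi >= -1e-9 for wi in w):
                sols.append(v)
    return sols

# with this (invented) M the LCP has exactly one solution for this q
print(solve_lcp_2x2([[2.0, 0.0], [0.0, 3.0]], [-2.0, 1.0]))  # -> [[1.0, 0.0]]
```

Unique solvability for every q, as in statement (e), is exactly what fails when some complementary piece is degenerate or when two pieces both yield feasible solutions.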
Remark. The implication (b) ⟹ (a) is closely related to the implicit-function theorem for the parametric normal map H in (22); see [28]. As such, it holds without the copositivity assumption on ∇_x F(x*, ω*). This assumption is needed for the reverse implication (a) ⟹ (b) or (c), which is like a converse of the implicit-function theorem. To the best of our knowledge, such a converse has not appeared in the literature.
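Coherent orientation can also be made concrete for K = R^n₊: the pieces of the linear normal map G(w) = (M − I)Π_K(w) + w are linear, the piece for an index set α has columns of M in the positions α and identity columns elsewhere, and its determinant equals the principal minor det M[α, α]. Coherent orientation of all 2^n pieces therefore amounts to M being a P-matrix, a classical fact for the orthant case. A minimal enumeration sketch with invented data:

```python
from itertools import combinations

# Sketch: check the P-matrix property (all principal minors positive),
# which for K = R^n_+ is coherent orientation of the linear normal map.

def det(A):
    # cofactor expansion along the first row; fine for tiny matrices
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] *
               det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(n))

def is_P_matrix(M):
    n = len(M)
    for r in range(1, n + 1):
        for alpha in combinations(range(n), r):
            minor = [[M[i][j] for j in alpha] for i in alpha]
            if det(minor) <= 0:
                return False
    return True

print(is_P_matrix([[2, -1], [0, 3]]))  # -> True
print(is_P_matrix([[0, 1], [1, 0]]))   # -> False (a zero principal minor)
```

For a general polyhedral cone K the pieces of G are indexed by the faces of K rather than by sign patterns, and the determinant condition must be checked piece by piece; see [29] for the precise statement.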
7 The KKT System of a VI

We can apply Theorems 1 and 2 to the KKT system of a parametric VI defined on a nonconvex set that varies with the parameter. It is well known that such a KKT system can be considered as a VI defined on a special polyhedral set. Hence in principle, Theorems 3 and 4 are applicable; nevertheless, a naive specialization of these latter two theorems would lead to results that are valid only under a restrictive positive semidefiniteness assumption; see the discussion below. In order to obtain characterizations under an alternative, and perhaps less restrictive, assumption, we revisit the assumption (D) in the context of the KKT system. This is the main focus of the following analysis.

Consider the following system in the primary variables (x, λ, μ) ∈ R^(n+p+q):

L(x, λ, μ, ω) = 0,
λ ≥ 0, g(x, ω) ≤ 0, λ^T g(x, ω) = 0,  (24)
h(x, ω) = 0,

where L is the VI Lagrangian function defined in (10). Unlike the development in Section 5, we do not assume that g(·, ω) is convex or h(·, ω) is affine. Instead, we continue to assume that F is continuously differentiable and g and h are twice continuously differentiable. Define the function H : R^(n+p+q+m) → R^(n+p+q) by

H(x, λ, μ, ω) = ( L(x, λ, μ, ω), −g(x, ω), −h(x, ω) ), for (x, λ, μ, ω) ∈ R^(n+p+q+m),

and let C = R^n × R^p₊ × R^q. Then the system (24) is equivalent to the parametric VI (H(·, ω), C), where C is fixed. Let y ∈ R^(n+p+q) denote a general triple (x, λ, μ). We have

∇_y H(x, λ, μ, ω) =
[ ∇_x L(x, λ, μ, ω)   ∇_x g(x, ω)^T   ∇_x h(x, ω)^T ]
[ −∇_x g(x, ω)        0               0             ]
[ −∇_x h(x, ω)        0               0             ].

Let y* = (x*, λ*, μ*) be a given solution of (24) corresponding to ω*. We need to evaluate the critical cone K(y*, C) of C at the KKT triple y*. For this purpose, define three index sets:

α = { i : λ*_i > 0 = g_i(x*, ω*) },
β = { i : λ*_i = 0 = g_i(x*, ω*) },
γ = { i : λ*_i = 0 > g_i(x*, ω*) }.

We have

T(y*, C) = R^n × ( R^|α| × R^(|β|+|γ|)₊ ) × R^q.

Hence

K(y*, C) = R^n × ( R^|α| × R^|β|₊ × {0}^|γ| ) × R^q.  (25)
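The index sets α, β, γ are straightforward to compute from a KKT triple. A small invented sketch (the tolerance is an arbitrary choice):

```python
# Sketch with an invented tolerance: classify the inequality constraints
# at a KKT triple into the index sets alpha, beta, gamma used in (25).

def index_sets(lam_star, g_star, tol=1e-12):
    alpha, beta, gamma = [], [], []
    for i, (li, gi) in enumerate(zip(lam_star, g_star)):
        if li > tol and abs(gi) <= tol:
            alpha.append(i)      # active constraint, positive multiplier
        elif abs(li) <= tol and abs(gi) <= tol:
            beta.append(i)       # active, zero multiplier (degenerate)
        elif abs(li) <= tol and gi < -tol:
            gamma.append(i)      # inactive constraint
    return alpha, beta, gamma

# e.g. lambda* = (1, 0, 0) and g(x*, omega*) = (0, 0, -2):
print(index_sets([1.0, 0.0, 0.0], [0.0, 0.0, -2.0]))  # -> ([0], [1], [2])
```

The degenerate set β is the source of the nonsmoothness: when β is empty the KKT system is locally a smooth equation, and the analysis below simplifies considerably.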
By the special structure of the critical cone K(y*, C) and the matrix ∇_y H(x, λ, μ, ω), it can be seen that the copositivity of the latter matrix on the former cone is equivalent to the positive semidefiniteness of ∇_x L(x, λ, μ, ω); this positive semidefiniteness assumption is needed in order for Theorems 3 and 4 to be directly applicable to the parametric KKT system (24). In what follows, we shall derive an alternative assumption for the characterization of the stability and strong stability of the KKT triple (x*, λ*, μ*). Specifically, we shall assume that the strict Mangasarian-Fromovitz constraint qualification (SMFCQ) holds at (x*, λ*, μ*, ω*); namely,

(a) the gradient vectors

{ ∇_x g_i(x*, ω*) : i ∈ α } ∪ { ∇_x h_j(x*, ω*) : j = 1, ..., q }

are linearly independent;

(b) there exists a vector u ∈ R^n such that

∇_x g_i(x*, ω*)^T u < 0 for all i ∈ β,
∇_x g_i(x*, ω*)^T u = 0 for all i ∈ α,
∇_x h_j(x*, ω*)^T u = 0 for all j = 1, ..., q.

It has been proved in [13] that the SMFCQ holding at (x*, λ*, μ*, ω*) is equivalent to (λ*, μ*) being the unique KKT multiplier pair corresponding to (x*, ω*). With K(x*, ω*) denoting the critical cone as given by (11) (note that K(x*, ω*) is different from the cone K(y*, C) defined in (25)), we postulate the following assumption, which, because of the SMFCQ, is basically the same as the SONC in Section 5 for the case of convex constraints:

(SONC: under SMFCQ) the matrix ∇_x L(x*, λ*, μ*, ω*) is copositive on the cone K(x*, ω*).

The key to applying Theorems 1 and 2 to the parametric KKT system (24) at the triple (x*, λ*, μ*) hinges on verifying assumption (D) for the function

G(z) = ( ∇_y H(x*, λ*, μ*, ω*) − I ) Π_K(y*,C)(z) + z, for z ∈ R^(n+p+q).  (26)
We define the required function Γ as follows. Let

E =
[ I              ∇_x g(x*, ω*)^T   ∇_x h(x*, ω*)^T ]
[ −∇_x g(x*, ω*) 0                 0               ]
[ −∇_x h(x*, ω*) 0                 0               ]

and

Γ(z) = ( E − I ) Π_K(y*,C)(z) + z, for z ∈ R^(n+p+q).

Lemma 4 For the functions Γ and G defined above, Γ + δG has the origin as its unique zero for all δ > 0, and ind(Γ, 0) = 1.

Proof. This result follows from the theory of the mixed linear complementarity problem (LCP) as established in [10]. In what follows, we sketch the key ideas of the proof and refer to the reference for the omitted details. We have

( Γ + δG )(z) = (1 + δ) [ ( E(δ) − I ) Π_K(y*,C)(z) + z ],

where

E(δ) =
[ ( I + δ ∇_x L(y*, ω*) ) / (1 + δ)   ∇_x g(x*, ω*)^T   ∇_x h(x*, ω*)^T ]
[ −∇_x g(x*, ω*)                      0                 0               ]
[ −∇_x h(x*, ω*)                      0                 0               ].

We note that there is a one-to-one correspondence between the zeros of the function Γ + δG and the solutions of the following homogeneous, mixed LCP in the variables (x, λ, μ):

( I + δ ∇_x L(y*, ω*) ) x + Σ_{i=1}^p λ_i ∇_x g_i(x*, ω*) + Σ_{j=1}^q μ_j ∇_x h_j(x*, ω*) = 0,
∇_x g_i(x*, ω*)^T x = 0, for i ∈ α,
λ_i ≥ 0, ∇_x g_i(x*, ω*)^T x ≤ 0, λ_i ( ∇_x g_i(x*, ω*)^T x ) = 0, for i ∈ β,
λ_i = 0, for i ∈ γ,
∇_x h_j(x*, ω*)^T x = 0, for j = 1, ..., q.

By the SMFCQ and the SONC, it can easily be shown that this mixed LCP has a unique solution, namely (x, λ, μ) = 0; see the proof of Theorem 7 in [10]. Thus Γ + δG has a unique zero.

To show ind(Γ, 0) = 1, we consider the following homotopy joining the identity map and Γ: for (z, t) ∈ R^(n+p+q) × [0, 1],

H(z, t) = t z + (1 − t) Γ(z).
It suffices to verify that for each t ∈ (0, 1], H(·, t) has a unique zero, namely zero. The desired index property of Γ will then follow from the homotopy invariance principle of degree and the fact that the index of the identity map at zero is equal to one. We have

H(z, t) = ( [ (1 − t) E + t I ] − I ) Π_K(y*,C)(z) + z,

which shows that for each t ∈ (0, 1], H(·, t) is the linear normal map associated with the positive definite matrix (1 − t) E + t I and the polyhedral cone K(y*, C). As such, H(·, t) is a global homeomorphism; hence it has a unique zero. Q.E.D.

With Lemma 4, we can now apply Theorem 1 to characterize the stability of the KKT triple (x*, λ*, μ*).

Theorem 5 Let F be once continuously differentiable and g and h be twice continuously differentiable. Let y* = (x*, λ*, μ*) be a KKT triple at ω*. Assume that the SMFCQ and the SONC hold at (x*, λ*, μ*, ω*). The following statements are then equivalent.

(a) y* is a stable solution of the parametric KKT system (24) at ω*.

(b) The implication below holds:

0 ≠ v ∈ K(x*, ω*), ∇_x L(x*, λ*, μ*, ω*) v ∈ K(x*, ω*)^* ⟹ v^T ∇_x L(x*, λ*, μ*, ω*) v > 0.  (27)

(c) The GLCP ( q, ∇_x L(x*, λ*, μ*, ω*), K(x*, ω*) ) has a solution for all vectors q ∈ R^n; moreover, there exists a constant c > 0 such that for all q and for any such solution v, ‖v‖ ≤ c ‖q‖.

Proof. By Lemma 4, Theorem 1 applies to the parametric KKT system (24); thus y* is a stable solution if and only if the function G defined in (26) has the origin as its unique zero. As in the proof of Lemma 4, G has a nonzero zero if and only if the homogeneous, mixed LCP

∇_x L(y*, ω*) x + Σ_{i=1}^p λ_i ∇_x g_i(x*, ω*) + Σ_{j=1}^q μ_j ∇_x h_j(x*, ω*) = 0,
∇_x g_i(x*, ω*)^T x = 0, for i ∈ α,
λ_i ≥ 0, ∇_x g_i(x*, ω*)^T x ≤ 0, λ_i ( ∇_x g_i(x*, ω*)^T x ) = 0, for i ∈ β,
λ_i = 0, for i ∈ γ,
∇_x h_j(x*, ω*)^T x = 0, for j = 1, ..., q,
has a nonzero solution (x, λ, μ). By the SMFCQ, it can be proved that the vector x must be nonzero. This vector x will then violate the implication (27). Conversely, if v violates the latter implication, then, by reversing the argument, it is easy to construct a nonzero vector z such that G(z) = 0. The equivalence of (b) and (c) under the SONC has been noted at the end of Section 5. Finally, the last equivalence has been noted in the proof of Theorem 3. Q.E.D.

There is much similarity between the two Theorems 3 and 5. There are also some differences. First, the assumptions are different: in the former, convexity of the sets C(ω) is assumed and the CRCQ is needed; in the latter, neither the convexity nor the CRCQ is assumed. Instead, uniqueness of the multiplier pair (λ*, μ*) is assumed in Theorem 5. This uniqueness assumption leads to the second difference between the two theorems. In Theorem 3, the stability of the solution x* ∈ SOL(F(·, ω*), C(ω*)) is characterized; although the multiplier map must be locally upper Lipschitzian at (x*, ω*) [27, 23, 10], Theorem 3 does not assert the stability of any multiplier pair. One last difference between the two theorems is the way they are derived. Although both results are corollaries of Theorem 1, they are based on different systems of equations.

We next obtain necessary and sufficient conditions for the KKT triple (x*, λ*, μ*) to be strongly stable.

Theorem 6 Let F be once continuously differentiable and g and h be twice continuously differentiable. Let y* = (x*, λ*, μ*) be a KKT triple at ω*. Assume that the SMFCQ and the SONC hold at (x*, λ*, μ*, ω*). The following statements are then equivalent.

(a) y* is a strongly stable solution of the parametric KKT system (24) at ω*.

(b) The mapping G defined by (26) is a Lipschitzian homeomorphism on R^(n+p+q).

(c) The matrix

A =
[ ∇_x L(x*, λ*, μ*, ω*)   ( ∇_x g_α(x*, ω*) )^T   ∇_x h(x*, ω*)^T ]
[ −∇_x g_α(x*, ω*)        0                       0               ]
[ −∇_x h(x*, ω*)          0                       0               ]

is nonsingular, and the Schur complement

− [ ∇_x g_β(x*, ω*)  0  0 ] A^(−1) [ ( ∇_x g_β(x*, ω*) )^T ; 0 ; 0 ]

is a P-matrix.
(d) y* is a strongly regular solution of the parametric KKT system (24) at ω* in the sense of Robinson [26].

In addition, if the matrix ∇_x F(x*, ω*) is symmetric, then any one of the above statements (a)-(d) is further equivalent to the following two conditions combined:

(LICQ) the gradient vectors

{ ∇_x g_i(x*, ω*) : i ∈ I(x*, ω*) } ∪ { ∇_x h_j(x*, ω*) : j = 1, ..., q }  (28)

are linearly independent, and

(SSOSC) the matrix ∇_x L(x*, λ*, μ*, ω*) is positive definite on the null space of the vectors

{ ∇_x g_i(x*, ω*) : i ∈ α } ∪ { ∇_x h_j(x*, ω*) : j = 1, ..., q }.  (29)

Proof. The equivalence of (a) and (b) follows from Theorem 2. Since G is certainly piecewise linear, by Theorem 2, (b) is equivalent to the bijectivity of G. In turn, it is known that the bijectivity of G is equivalent to (c); see [10, Theorem 4b]. Thus (a), (b), and (c) are equivalent. The equivalence of (c) and (d) is a classical result due to Robinson [26]; so is the implication [LICQ + SSOSC] ⟹ (d). Finally, we sketch the proof of the reverse implication (d) ⟹ [LICQ + SSOSC] under the SONC and the symmetry of ∇_x F(x*, ω*). (As mentioned in the beginning of the paper, this result is due to [3].)

By the symmetry of ∇_x F(x*, ω*), it can be established that (c) is equivalent to the LICQ together with the following implication: for any vector v ≠ 0 such that ∇_x L(x*, λ*, μ*, ω*) v belongs to the linear span of the gradient vectors (28) and

∇_x g_i(x*, ω*)^T v = 0, i ∈ α,
∇_x h_j(x*, ω*)^T v = 0, j = 1, ..., q,

we have v^T ∇_x L(x*, λ*, μ*, ω*) v > 0. Moreover, ∇_x L(x*, λ*, μ*, ω*) is strictly copositive on the cone K(x*, ω*). Let u be an arbitrary, nonzero element of the null space of the vectors in (29). Consider the equality-constrained quadratic program:

minimize ½ v^T ∇_x L(x*, λ*, μ*, ω*) v
subject to ∇_x g_i(x*, ω*)^T v = 0, i ∈ α,
∇_x h_j(x*, ω*)^T v = 0, j = 1, ..., q,
∇_x g_i(x*, ω*)^T v = ∇_x g_i(x*, ω*)^T u, i ∈ β.

The vector u is feasible for this program; moreover, by the strict copositivity of ∇_x L(x*, λ*, μ*, ω*) on K(x*, ω*), it follows that this quadratic program has an optimal solution v̄. If v̄ = 0, then u ∈ K(x*, ω*) and thus u^T ∇_x L(x*, λ*, μ*, ω*) u > 0. If v̄ ≠ 0, then we have

u^T ∇_x L(x*, λ*, μ*, ω*) u ≥ v̄^T ∇_x L(x*, λ*, μ*, ω*) v̄ > 0,
where the last inequality holds because ∇_x L(x*, λ*, μ*, ω*) v̄ must be a linear combination of the vectors in (28), by the optimality of v̄. Q.E.D.

We conclude this paper by mentioning that Theorems 1 and 2 can also be applied to some quasi-variational inequalities and complementarity problems of various types; the application will extend previous sensitivity results for these problems [22, 15]. The details are omitted.

Acknowledgments. The author is deeply indebted to Dr. Frederic Bonnans for some fruitful discussions on the subject of this paper while he was visiting INRIA, France in June 1994. Indeed, it was Dr. Bonnans who told the author about the Bonnans-Sulem characterization of strong regularity in [3] that led him to the main results of this paper, Theorems 1 and 2. The author also acknowledges the generous support of INRIA, which made his visit there possible; he is especially grateful to Claude Lemarechal for his kind hospitality as the local host. Dr. Danny Ralph has made some helpful comments regarding a draft of this work and provided clarifications to Lemma 3, which essentially is his joint result with S. Dempe. Finally, the author is indebted to Dr. Asen Dontchev for constructive comments on Theorems 1 and 2 and for making available to the author his preprint [6], which contains generalizations of part of these two theorems.
References
[1] J.-P. Aubin and H. Frankowska, Set-Valued Analysis (Birkhäuser, Boston 1990).
[2] J. F. Bonnans, Local analysis of Newton-type methods for variational inequalities and nonlinear programming, Applied Mathematics and Optimization 29 (1994) 161-186.
[3] J. F. Bonnans and A. Sulem, Pseudopower expansion of solutions of generalized equations and constrained optimization problems, manuscript, INRIA (May 1994).
[4] J. V. Burke, An exact penalization viewpoint of constrained optimization, SIAM Journal on Control and Optimization 29 (1991) 968-998.
[5] S. Dempe, Directional differentiability of optimal solutions under Slater's condition, Mathematical Programming 59 (1993) 49-69.
[6] A. L. Dontchev, Characterizations of Lipschitz stability in optimization, in R. Lucchetti and J. Revalski, eds., Well-Posedness and Stability of Optimization Problems and Related Topics, Kluwer Academic Publishers, to appear.
[7] A. L. Dontchev and W. W. Hager, Implicit functions, Lipschitz maps and stability in optimization, Mathematics of Operations Research 19 (1994) 753-768.
Solution Stability of Nonsmooth Equations 287
[8] T. Fujisawa and E. S. Kuh, Piecewise-linear theory of resistive networks, SIAM Journal of Applied Mathematics 22 (1972) 307-328.
[9] M. S. Gowda, Complementarity problems over locally compact cones, SIAM Journal on Control and Optimization 27 (1989) 836-841.
[10] M. S. Gowda and J. S. Pang, Stability analysis of variational inequalities and complementarity problems, via the mixed linear complementarity problems and degree theory, Mathematics of Operations Research 19 (1994) to appear.
[11] R. Janin, Directional derivative of the marginal function in nonlinear programming, Mathematical Programming Study 21 (1984) 110-126.
[12] D. Klatte, On qualitative stability of non-isolated minima, Control and Cybernetics 23 (1994) 183-200.
[13] J. Kyparisis, On uniqueness of Kuhn-Tucker multipliers in nonlinear programming, Mathematical Programming 32 (1985) 242-246.
[14] J. Kyparisis, Parametric variational inequalities with multivalued solution sets, Mathematics of Operations Research 17 (1992) 341-364.
[15] J. Kyparisis and C. M. Ip, Solution behavior of parametric implicit complementarity problems, Mathematical Programming 56 (1992) 65-70.
[16] N. G. Lloyd, Degree Theory (Cambridge University Press, Cambridge 1978).
[17] J. J. Moré and W. C. Rheinboldt, On P- and S-functions and related classes of n-dimensional nonlinear mappings, Linear Algebra and its Applications 6 (1973) 45-68.
[18] J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables (Academic Press, New York 1970).
[19] J. S. Pang, Newton's method for B-differentiable equations, Mathematics of Operations Research 15 (1990) 311-341.
[20] J. S. Pang, A degree-theoretic approach to parametric nonsmooth equations with multivalued perturbed solution sets, Mathematical Programming 62 (1993) 359-383.
[21] J. S. Pang and D. Ralph, Piecewise smoothness, local invertibility, and parametric analysis of normal maps, Mathematics of Operations Research 20 (1995) to appear.
[22] J. S. Pang and J. C. Yao, On a generalization of a normal map and equation, SIAM Journal on Control and Optimization 32 (1994), to appear.
[23] Y. Qiu and T. L. Magnanti, Sensitivity analysis for variational inequalities, Mathematics of Operations Research 17 (1992) 61-76.
[24] D. Ralph and S. Dempe, Directional derivatives of the solution of a parametric nonlinear program, manuscript, Department of Mathematics, University of Melbourne, Victoria, Australia (March 1994).
[25] A. Reinoza, The strong positivity condition, Mathematics of Operations Research 10 (1985) 54-62.
[26] S. M. Robinson, Strongly regular generalized equations, Mathematics of Operations Research 5 (1980) 43-62.
[27] S. M. Robinson, Generalized equations and their solutions, Part II: Applications to nonlinear programming, Mathematical Programming Study 19 (1982) 200-221.
[28] S. M. Robinson, An implicit-function theorem for a class of nonsmooth functions, Mathematics of Operations Research 16 (1991) 292-309.
[29] S. M. Robinson, Normal maps induced by linear transformations, Mathematics of Operations Research 17 (1992) 691-714.
[30] S. M. Robinson, Nonsingularity and symmetry for linear normal maps, Mathematical Programming, Series B 62 (1993) 415-426.
[31] S. Scholtes, Introduction to piecewise differentiable equations, Preprint No. 53/1994, Institut für Statistik und Mathematische Wirtschaftstheorie, Universität Karlsruhe (1994).
[32] A. Shapiro, Sensitivity analysis of nonlinear programs and differentiability properties of metric projections, SIAM Journal on Control and Optimization 26 (1988) 628-645.
Convergence Theories 289
Recent Advances in Nonsmooth Optimization, pp. 289-321 Eds. D.-Z. Du, L. Qi and R.S. Womersley ©1995 World Scientific Publishing Co Pte Ltd
Miscellaneous Incidences of Convergence Theories in Optimization and Nonlinear Analysis, Part II: Applications in Nonsmooth Analysis¹

Jean-Paul Penot
Faculté des Sciences, Mathématiques appliquées, URA CNRS 1204, Av. de l'Université, 64000 Pau, France
Abstract
We examine the common features of various approaches in nonsmooth analysis in a unified way. In particular, we consider the question of the equivalence of a geometrical approach with an analytical approach. This question can be split into two parts, which we call coherence. We also deal with the delicate problem of stabilization (or closure) of the notions of subdifferential and normal cone and make some proposals for this problem. Finally we make some general observations about the links of nonsmooth analysis with other fields in which limit problems or stabilized equations bring strange new terms.
In the first part [126] of this series of two papers we have considered the incidence of the discovery (or revival) of new types of convergences on optimization. In particular we have dealt with the behavior of solutions to optimization problems under perturbations of the objective in a continuous way with respect to the bounded hemiconvergence or bounded Hausdorff topology. This topology, first introduced in [94] and [110], has been given a new life in [109] and [118] and has been extensively studied in [10]-[13]. This topology can be used with profit in nonsmooth analysis; the same can be said of the Mosco topology and of the Joly topology [92], [93], a useful variant for the non-reflexive case exhibited in [19] and [20] as a workable topology, and then brilliantly characterized in [28], [29] under the name of slice topology. These topologies can play a useful role in nonsmooth analysis: see for instance [56], [123] and [124].

¹Dedicated to Charles Castaing on the occasion of his 60th birthday.
J. P. Penot
290
Our purpose here is different. We mainly focus our attention on the stabilization process for normal cones and subdifferentials (see section 3). Stability, also called closedness, is a very desirable property for a multifunction. If, for instance, the value of this multifunction is the set $S(w)$ of solutions to some problem depending on some parameter $w$, it ensures that the limit $x_0$ of some family of elements $x(w)$ of $S(w)$ as $w \to w_0$ belongs to the set $S(w_0)$ of the limit problem. In the case of subdifferentials and normal cones the stabilization procedure brings nontrivial sets under appropriate assumptions and, more importantly, enables one to dispose of useful calculus rules. Moreover, it brings a kind of unification of the various subdifferentials. However different procedures can be used: sequential closures are simple but are not closed and not idempotent; topological closures with respect to weak topologies are too large and not suitable for duality arguments. Moreover, any infinite dimensional dual space contains unbounded weak* converging nets. We try to avoid these drawbacks by using a variant of a convergence on a dual Banach space introduced in another context in [58]. Its use is simple enough and it is suitable for duality questions. In section 2 we simply characterize this convergence as the classical continuous convergence [96]. Before applying this convergence to normal cones and subdifferentials we give in section 1 a bird's-eye view of the basic constructions of nonsmooth analysis. The starting point for such constructions can be either a geometrical approach or an analytical approach. We examine whether the connections between both approaches are reversible or not. It appears that such a property, which we call coherence, can be given simple characterizations. It has the pleasant effect of discarding any ambiguity in speaking of normal cones or subdifferentials of a certain type.
Although we do not treat this question in an axiomatic way, we try to adopt a unified point of view. The need for such a unified treatment has been felt by several authors for some questions such as the Mean Value Theorem (see [16], [54], [60], [102], [128], [143] for instance). We devote the last section of the paper to analogies we point out between nonsmooth analysis and other topics of infinitesimal analysis such as homogenization theory. In both fields, it is interesting to take unconventional limits using epiconvergence theory, either to introduce some kind of epi-derivative or some limit problem. When writing down in concrete terms the expression of the limit, one gets both classical terms and "alien" terms whose effects are often decisive. We believe that more attention should be given to such phenomena, which are quite important in semi-infinite optimization, stochastic programming and sensitivity analysis on one hand, and elasticity and mechanics of composite or porous media on the other hand.
1
Coherence in Nonsmooth Analysis
The abundance of concepts in nonsmooth analysis is certainly a nuisance for the users and for the researchers. Thus it may be useful to single out a few useful concepts and eliminate the others, or to put it in more gentle and sensible terms, to leave the other concepts in the shadow, in some sort of reserve from which they could
be extracted if needed. Many biologists and ecologists have a similar point of view about living species: not all of them are presently useful for the needs of agriculture or industry, but it may appear that new needs turn out to make precious some neglected species. Chemistry has also shown that rare and queer elements such as cobalt, lithium, uranium and thorium may turn out to be crucial for our needs. Another way to make the field of nonsmooth analysis easier to grasp consists in making simple general observations about its developments and its rules. Since we believe it is too early to decide whether 3, 4 or 7 concepts deserve a special distinction, we will follow here this second path. First we observe that two points of view can be taken as a starting point: a geometrical approach and an analytical approach. Fortunately several connections are known between these two approaches: epigraphs for one direction; indicator functions, distance functions and support functions for the reverse direction. Then a question arises: are the two points of view equivalent? Second, we note that the preceding question can be raised at two different levels since there exist constructions which deal both with primal and dual objects according to the general scheme
$$\begin{array}{ccc}
\text{tangent cone } T^{?} & \rightleftharpoons & \text{directional derivative } d^{?} \\
\uparrow\downarrow & & \uparrow\downarrow \\
\text{normal cone } N^{?} & \rightleftharpoons & \text{subdifferential } \partial^{?}
\end{array}$$
For some concepts only the lower level is available directly. Since we consider this level is the essential one, although primal concepts from the upper level may be very useful, for instance in bifurcation theory or in viability theory, we restrict our attention here to the dual notions of the lower level. When primal notions are available directly, one obtains normal cones from tangent cones by taking polar cones, and one gets subdifferentials from directional derivatives by taking continuous linear forms which minorize the directional derivatives. We observe that arrows from the lower level to the upper level can be drawn too, using polarity and support functions respectively. However such processes entail convexity and a certain loss of accuracy. Therefore, when no convexity is present, one usually avoids passing from the lower level to the upper level because the reverse way will not bring back the same sets. Nonetheless, we will do it below to identify the polar cone to the limiting normal cone. Let us insist on the fact that we adorn each piece of the preceding four-pillared construction with a symbol marking its type. Here we use the symbol ? as a generic type. For the Dini-Hadamard (or contingent) notions, such as the contingent cone to a subset $F$ at $a$,
$$T(F,a) := \limsup_{t\to 0_+} \frac{1}{t}(F - a),$$
or the contingent subdifferential $\partial f$ of a function $f$,
$$\partial f(a) := \Big\{ x^* \in X^* : \liminf_{(t,u)\to(0_+,v)} \frac{1}{t}\big(f(a+tu) - f(a)\big) \geq \langle x^*, v\rangle \ \ \forall v \in X \Big\},$$
we do not use any superscript or subscript because it seems to us that they are the most basic and primitive notions. As an illustration of this assertion let us mention the following result, whose proof is an easy consequence of the definitions. An analogous result holds with the Fréchet normal cone and the Fréchet subdifferential, which are closely related notions defined below. In fact the result is valid for the whole family of subdifferentials $\partial^{?}$ and the family of related normal cones $N^{?}$ associated with a bornology $\mathcal{B}^{?}$ as in [42].

Proposition 1 ([115], Cor. 4.3) Suppose $f$ attains on $F$ a finite local maximum at $a \in F$. Then
$$\partial f(a) \subset N(F,a) := (T(F,a))^{\circ}.$$
In particular, if $f$ is Hadamard differentiable at $a$ one has $f'(a) \in N(F,a)$.
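As a small worked check of Proposition 1 (our illustration, not part of the original text), consider the one-dimensional case $X = \mathbb{R}$, $F = \mathbb{R}_- = \,]-\infty,0]$ and $f(x) = x$, which attains its maximum on $F$ at $a = 0$:

```latex
% For all t > 0 we have (1/t)(R_- - 0) = R_-, hence
T(F,0) = \limsup_{t\to 0_+} \tfrac{1}{t}(\mathbb{R}_- - 0) = \mathbb{R}_-,
\qquad
N(F,0) = (T(F,0))^{\circ} = \{x^* \in \mathbb{R} : x^* v \le 0 \ \forall v \le 0\} = \mathbb{R}_+ .
% f is Hadamard differentiable with f'(0) = 1, and indeed 1 belongs to N(F,0).
```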
In [124] we used an exclamation mark ! for the contingent type in order to stress its analogy with the inferior Dini-Hadamard notion (or incident or adjacent notion), which we denote with some $i$ or an upside-down exclamation mark:
$$T^{i}(F,a) := \liminf_{t\to 0_+} \frac{1}{t}(F - a).$$
For the Clarke-Rockafellar notions (circatangent cone $T^{\uparrow}$, circa-subderivative $d^{\uparrow}f$ or $f^{\uparrow}$, circa-subdifferential $\partial^{\uparrow}f$, ...) we use an arrow:
$$T^{\uparrow}(F,a) := \liminf_{t\to 0_+,\ x\to_F a} \frac{1}{t}(F - x).$$
We observe that these three notions enter a general framework introduced in [117] and used and developed in [70]. This framework also contains the prototangent cone or pseudo-circatangent cone [131]
$$T^{p}(F,a) := \bigcap_{u} \liminf_{t\to 0_+} \frac{1}{t}(F - a - tu)$$
and its two variants, the quasi-circatangent cone [131]
$$T^{q}(F,a) := \bigcap_{u} \liminf_{\substack{t\to 0_+ \\ a+tu \in F}} \frac{1}{t}(F - a - tu)$$
and the boundedly circatangent cone [145] (see also [78], [90] and [91] for the following characterization)
$$T^{b}(F,a) := \bigcap_{k>0} \liminf_{t\to 0_+,\ x\to_F a} \frac{1}{t}(F - x) \cap kB_X .$$
The preceding examples (see also [33], [46], [69], [82], [105]) are defined for any subset of any Banach space. One may also consider a notion which is only defined for the subsets, or the closed subsets, of a restricted class $\mathcal{X}$ of Banach spaces; for simplicity we suppose $\mathcal{X}$ contains all finite dimensional spaces and is stable by taking products. Similarly, one may consider a notion of subdifferential which is defined on the set $LSC(X)$ of lower semicontinuous functions on the members $X$ of the class $\mathcal{X}$. We do not introduce here a formal axiomatization for subdifferentials; we refer the reader to the ones already existing in the literature for various points of view (see [16], [69], [82], [143], for instance). Let us first consider the coherence of the geometrical approach. Let us recall that the subdifferential $\partial^{?}$ associated with a normal cone concept $N^{?}$ is given by
$$\partial^{?} f(a) := \{ x^* \in X^* : (x^*,-1) \in N^{?}(E_f, a_f) \},$$
where $a_f := (a, f(a))$ and $E_f$ is the epigraph of $f$, $E_f := \{(x,r) \in X \times \mathbb{R} : r \geq f(x)\}$; on the other hand the normal cone notion associated to a subdifferential $\partial^{?}$ is given by
$$N^{?}(F,a) := \partial^{?} \iota_F(a),$$
where $\iota_F$ is the indicator function of $F$: $\iota_F(x) := 0$ for $x \in F$, $\iota_F(x) := +\infty$ for $x \in X \setminus F$. In the following result we make precise what we mean by a coherent normal cone notion.

Proposition 2 Let $N^{?}$ be a normal cone notion satisfying the following two conditions:
(a) $N^{?}(\mathbb{R}_+, 0) = -\mathbb{R}_+$ when $\mathbb{R}_+$ is considered as a subset of $\mathbb{R}$;
(b) for any closed subsets $A \subset X$, $B \subset Y$, $a \in A$, $b \in B$, one has $N^{?}(A \times B, (a,b)) = N^{?}(A,a) \times N^{?}(B,b)$.
Then the normal cone $N^{??}$ associated with the subdifferential $\partial^{?}$ deduced from $N^{?}$ coincides with $N^{?}$. In such a case we say that $N^{?}$ is coherent. In fact it suffices that $N^{?}$ satisfies the following condition:
(c) for any closed subset $A \subset X$, $a \in A$, one has $N^{?}(A \times \mathbb{R}_+, (a,0)) = N^{?}(A,a) \times (-\mathbb{R}_+)$.
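For instance, condition (a) can be verified directly for the contingent normal cone (our verification, spelled out for concreteness):

```latex
% For every t > 0 one has (1/t)(R_+ - 0) = R_+, hence
T(\mathbb{R}_+,0) = \limsup_{t\to 0_+} \tfrac{1}{t}(\mathbb{R}_+ - 0) = \mathbb{R}_+,
% and taking the polar cone gives
N(\mathbb{R}_+,0) = (\mathbb{R}_+)^{\circ}
= \{x^* \in \mathbb{R} : x^* v \le 0 \ \forall v \ge 0\} = -\mathbb{R}_+ .
```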
Clearly, the incident normal cone and the circa-normal cone satisfy conditions (a), (b). It was first observed by L. Thibault that the contingent cone also satisfies these conditions. In a similar way one can show that the Fréchet normal cone given by
$$N^{-}(F,a) := \Big\{ x^* \in X^* : \limsup_{x\in F,\ x\to a} \frac{\langle x^*, x-a\rangle}{\|x-a\|} \leq 0 \Big\}$$
satisfies these conditions.
Proof. By construction, given $F \subset X$, $a \in F$, we have $x^* \in N^{??}(F,a)$ iff $x^* \in \partial^{?}\iota_F(a)$ iff $(x^*,-1) \in N^{?}(E_{\iota_F},(a,0)) = N^{?}(F \times \mathbb{R}_+,(a,0)) = N^{?}(F,a) \times (-\mathbb{R}_+)$, by (c), iff $x^* \in N^{?}(F,a)$. □
Let us now consider the coherence of the analytical approach. In the following statement the dual space $X^*$ is endowed with an arbitrary topology or convergence which is compatible with scalar multiplication.

Proposition 3 Let $\partial^{?}$ be a subdifferential taking closed values. In order that $\partial^{?}$ coincide with the subdifferential $\partial^{??}$ associated with the normal cone $N^{?}$ deduced from $\partial^{?}$, it is necessary and sufficient that for any $X$ in $\mathcal{X}$ and for any $f \in LSC(X)$, $x \in D_f := \operatorname{dom} f$, one has, with $x_f := (x, f(x))$,
$$\partial^{?} f(x) \times \{-1\} = \partial^{?}\iota_{E_f}(x_f) \cap (X^* \times \{-1\}),$$
and it is sufficient that the following two conditions are satisfied for any $X$ in $\mathcal{X}$ and for any $f \in LSC(X)$, $x \in D_f$:
(a) $\partial^{?}\iota_{E_f}(x_f) = cl(\mathbb{R}_+(\partial^{?} f(x) \times \{-1\}))$ when $\partial^{?} f(x) \neq \emptyset$;
(b) $\partial^{?}\iota_{E_f}(x_f) \cap (X^* \times \{-1\}) = \emptyset$ when $\partial^{?} f(x) = \emptyset$.
In such a case we say that $\partial^{?}$ is coherent.
Proof. The first assertion follows easily from the definitions. Let us prove the second one. Let $X$ be in $\mathcal{X}$ and let $f \in LSC(X)$, $x \in D_f$. If $x^* \in \partial^{?} f(x)$ we have $(x^*,-1) \in \partial^{?}\iota_{E_f}(x_f)$, by (a), hence $x^* \in \partial^{??} f(x)$. Conversely, if $x^* \in \partial^{??} f(x)$, i.e. if $(x^*,-1) \in \partial^{?}\iota_{E_f}(x_f)$, condition (b) shows we cannot have $\partial^{?} f(x) = \emptyset$. Therefore there exist nets $(t_i)_{i\in I}$, $(x_i^*)_{i\in I}$ in $\mathbb{R}_+$ and $\partial^{?} f(x)$ respectively such that $(x^*,-1) = \lim_{i\in I} t_i(x_i^*,-1)$. It follows that $(t_i)_{i\in I} \to 1$ and $(x_i^*)_{i\in I} \to x^*$. Since $\partial^{?} f(x)$ is closed we get $x^* \in \partial^{?} f(x)$. □
It has been observed in [76] Prop. 5.3 that the contingent subdifferential satisfies condition (a) of the preceding proposition. It is easy to show that it also satisfies condition (b). Moreover these conditions are also satisfied by the incident subdifferential $\partial^{i}$, the circa-subdifferential $\partial^{\uparrow}$ and the Fréchet subdifferential $\partial^{-}$ given by
$$\partial^{-} f(a) := \Big\{ x^* \in X^* : \liminf_{x\to a} \|x-a\|^{-1}\big(f(x) - f(a) - \langle x^*, x-a\rangle\big) \geq 0 \Big\}.$$
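Two elementary one-dimensional computations (ours, added for orientation) show how this definition behaves:

```latex
% For f(x) = |x| at a = 0 the difference quotient is
|x|^{-1}\big(|x| - x^* x\big) = 1 - x^*\,\mathrm{sgn}(x),
\quad\text{so}\quad
\liminf_{x\to 0} = 1 - |x^*| \ \ge 0 \iff |x^*| \le 1,
\quad\text{hence } \partial^{-} f(0) = [-1,1].
% For f(x) = -|x| at a = 0 the quotient is -1 - x^* sgn(x), whose liminf
% equals -1 - |x^*| < 0 for every x^*, hence \partial^{-} f(0) = \emptyset.
```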
Let us note that we can give to [76] Prop. 5.3 the following general form, which shows the necessity of condition (a) of the preceding proposition under mild assumptions.

Lemma 1 Let $N^{?}$ be the normal cone associated with a coherent subdifferential $\partial^{?}$. Suppose that for each $X$ in $\mathcal{X}$, for any subset $E$ of $X$ and any $e \in E$ the set $N^{?}(E,e)$ is a closed and convex cone. Suppose moreover that $N^{?}(E,e) \subset (\mathbb{R}_+ v)^{\circ}$ whenever $E + v \subset E$ for some $v \in X$. Then, for any $X$ in $\mathcal{X}$ and for any $f \in LSC(X)$, $x \in D_f$ with $\partial^{?} f(x) \neq \emptyset$ one has:
$$\partial^{?}\iota_{E_f}(x_f) = cl(\mathbb{R}_+(\partial^{?} f(x) \times \{-1\})).$$
Proof. Let $(x^*,r) \in N^{?}(E_f, x_f) := \partial^{?}\iota_{E_f}(x_f)$. As $E_f + (0,1) \subset E_f$, we have $r \leq 0$. When $r < 0$ we get $((-r)^{-1}x^*, -1) \in N^{?}(E_f, x_f)$, hence $(-r)^{-1}x^* \in \partial^{?} f(x)$ and $(x^*,r) \in \mathbb{R}_+(\partial^{?} f(x) \times \{-1\})$. When $r = 0$, taking $x_0^* \in \partial^{?} f(x)$ and using the convexity of $N^{?}(E_f, x_f)$, for each $t \in (0,1)$ we get $(x_t^*, -t) := (t x_0^* + (1-t)x^*, -t) \in N^{?}(E_f, x_f)$, hence $(x^*,r) = \lim_{t\to 0}(x_t^*, -t) \in cl(\mathbb{R}_+(\partial^{?} f(x) \times \{-1\}))$ by the preceding case. The reverse inclusion is obvious. □
The proof of the following observation is immediate, but the result is worth noting.

Proposition 4 If the subdifferential $\partial^{?}$ is coherent, then the associated normal cone is coherent. If the normal cone $N^{?}$ is coherent, then the associated subdifferential is coherent.
2
Weak Convergences and Weak Limits Superior
Now let us turn to questions which link convergence theory, functional analysis, nonsmooth analysis and optimization theory. For many purposes, such as the stabilization process considered in the next section, it is necessary to dispose of topologies or convergences for which sufficiently many compact subsets are available. The familiar weak* topology on a dual space $X = Y^*$ meets this requirement. However it has some drawbacks: it is not a sequential topology (i.e. it is not determined by the use of converging sequences), neighborhoods of 0 are enormous, and the coupling functional $c : X \times Y \to \mathbb{R}$ given by $c(x,y) := \langle x,y\rangle$ is not continuous if $X$ is infinite dimensional. This last inconvenience is particularly annoying for questions in which duality plays a key role, in particular when one uses normal cones, limiting subdifferentials or Fenchel conjugation. Let us give a precise statement.

Proposition 5 If $Y$ is an infinite dimensional normed vector space and if $X = Y^*$, the canonical coupling functional $c : X \times Y \to \mathbb{R}$ given by $c(x,y) := \langle x,y\rangle$ is not continuous when $X$ is endowed with the weak* topology and $Y$ is endowed with the topology induced by the norm.
Proof. Suppose on the contrary that $c$ is continuous, so that one can find a weak* neighborhood $V$ of 0 in $X$ and a ball $B$ with center 0 in $Y$ such that $c(V \times B) \subset [-1,1]$. Since $V$ contains an infinite codimensional subspace, there exists a whole line $\mathbb{R}v$ contained in $V$. Taking $u \in B$ such that $c(v,u) \neq 0$, we get a contradiction. □
It was independently suggested in [58] and [65] to turn to convergence tools, either in terms of nets, familiar to analysts, in [58], or in terms of filters in [65], more adapted to a topologist's viewpoint (see [49], [77] for general information on the topic). As a matter of fact, the category of convergence spaces is much more versatile than the category of topological spaces and is quite natural for many questions (for instance
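The mechanism of this proof can be made concrete in the self-dual space $\ell^2$ (our illustration, not part of the original text):

```latex
% Take Y = X = \ell^2 with the standard basis (e_n). The set
V = \{ x \in \ell^2 : |\langle x, e_1\rangle| < 1 \}
% is a basic weak* neighborhood of 0, and it contains the whole line R e_2.
% For any ball B of radius \varepsilon > 0, take u = \varepsilon e_2 \in B; then
c(t e_2, \varepsilon e_2) = t\varepsilon \quad (t \in \mathbb{R})
% is unbounded in t, so c(V \times B) is never contained in [-1,1].
```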
a.e. convergence, continuous convergence, order convergence, ...). Moreover, convergence is frequently a simpler tool than the use of neighborhoods or open sets: the case of convergence of test functions in the theory of distributions is a striking example, but pointwise convergence and weak convergence are other cases familiar to analysts. Sometimes the use of convergences instead of topologies is just compulsory: for instance the Painlevé-Kuratowski convergence on the hyperspace $\mathcal{P}(X)$ of a non-locally compact topological space $X$ is not topologizable. Here a convergence on $X$ is understood as a relation $\gamma$ between nets of $X$ and points of $X$, denoted by $(x_i)_{i\in I} \xrightarrow{\gamma} x$ (or $N \xrightarrow{\gamma} x$ if $N : I \to X$ denotes the net $(x_i)_{i\in I}$), satisfying the following conditions:
(C₁) the constant net with value $x$ converges to $x$;
(C₂) if $(x_j)_{j\in J}$ is a subnet of a net $(x_i)_{i\in I} \to x$ then $(x_j)_{j\in J} \to x$;
(C₃) if $x \in X$ and a net $(x_i)_{i\in I}$ of $X$ are such that any subnet $(x_j)_{j\in J}$ has a further subnet $(x_k)_{k\in K}$ such that $(x_k)_{k\in K} \to x$, then $(x_i)_{i\in I} \to x$.
When the relation $\to$ (also denoted by $\gamma$) satisfies condition (C₁) only, we say that it is a preconvergence. Since a convergence can be naturally associated to a preconvergence by adding to it nets all of whose subnets have a further subnet converging to the considered point, we will not distinguish between preconvergence and convergence when no confusion can arise. Let us give a precise statement for the preceding assertion. In it we say that a preconvergence $\alpha$ is finer than a preconvergence $\beta$ if any net $N$ converging to $x$ for $\alpha$ also converges to $x$ for $\beta$; then $\beta$ is said to be coarser than $\alpha$. The notion of subnet we adopt here is the one introduced in [1]; using the original notion due to Moore would not make much difference for the present purpose (see also [104]).
Lemma 2 Given a preconvergence $\alpha$ on $X$, the family of convergences on $X$ coarser than $\alpha$ has a finest element $\gamma = \gamma(\alpha)$, described as follows: a net $N$ converges to $x \in X$ for $\gamma$ iff for any subnet $P$ of $N$ there exists a subnet $Q$ of $P$ such that $Q$ converges to $x$ for $\alpha$. Moreover, a mapping $f : X \to Y$ of $X$ into a topological space (or a space with a convergence) $Y$ is continuous for $\gamma$ iff it is continuous for $\alpha$.
Proof. If $\gamma$ is a convergence, it is clearly the finest element of the family of convergences coarser than $\alpha$. Clearly, $\gamma$ satisfies conditions (C₁) and (C₂). In order to prove that it satisfies condition (C₃), let us consider a net $N$ of $X$ such that for any subnet $R$ of $N$ there exists a subnet $S$ of $R$ such that $S \xrightarrow{\gamma} x$. Let $P$ be a subnet of $N$. Taking $R = P$ we can find a subnet $S$ of $P$ such that $S \xrightarrow{\gamma} x$. Then, by the definition of $\gamma$, we can find a subnet $Q$ of $S$, hence of $P$, such that $Q \xrightarrow{\alpha} x$. We have proved that $N \xrightarrow{\gamma} x$. The last assertion is immediate. □
It may be comforting to know that a convergence can be described without making use of nets or filters, but by using a process which evokes some proposals in algebraic topology ([68], [140]), so that one almost remains in the realm of topological spaces and continuous maps. However, the test spaces are no longer compact topological spaces or simplexes but are what we call hushed spaces. We define a hushed space
to be a topological space which has only one non-isolated point; a hushed map $h : S \to T$ between two hushed spaces with non-isolated points $s_0$ and $t_0$ respectively is a continuous map such that $h(s_0) = t_0$ and $h(s) \neq t_0$ for $s \neq s_0$.

Proposition 6 In order to define a convergence on a set $X$ it suffices to associate with each hushed space $T$ a set $C(T,X)$ of mappings from $T$ to $X$ such that the following conditions are satisfied:
(C₁') for any hushed space $T$, any constant map from $T$ to $X$ belongs to $C(T,X)$;
(C₂') for any hushed spaces $S$, $T$, for any hushed map $g : S \to T$ and any $f \in C(T,X)$ one has $f \circ g \in C(S,X)$;
(C₃') if $T$ is a hushed space and if a mapping $f : T \to X$ is such that for any hushed space $S$ and any hushed mapping $g : S \to T$ there exist a hushed space $R$ and a hushed map $h : R \to S$ such that $f \circ g \circ h \in C(R,X)$, then $f \in C(T,X)$.
Conversely, for any convergence space $X$ the families of continuous mappings from a hushed space into $X$ satisfy the preceding conditions.
Proof. The last assertion is easy to check. On the other hand, given an association satisfying the conditions above, one declares that a net $N : I \to X$ converges to some $x \in X$ if, for $S := I \cup \{\infty\}$, where $\infty$ is an additional point, topologized by taking the points of $I$ as isolated and the extended tails $S_i := I_i \cup \{\infty\}$, with $I_i := \{j \in I : j \geq i\}$, as a base of neighborhoods of $\infty$, the mapping $f$ given by $f(\infty) = x$, $f(j) = N(j)$ belongs to $C(S,X)$. We leave to the reader the verification of the required conditions, with the following hints. If $I$ and $J$ are directed sets and if $g : I \to J$ is a filtering map (i.e. for each $j \in J$ there exists $i \in I$ such that $g(I_i) \subset J_j$), then the extension of $g$ by $g(\infty) = \infty$ is a hushed map; conversely, given a hushed space $R$ with non-isolated point $r_0$, a directed set $I$ and the associated hushed space $S = I \cup \{\infty\}$ as above, any hushed map $h : R \to S$ gives rise to a filtering map $g : H \to I$ when one sets
$$H := \{(r,i) \in R \times I : r \neq r_0,\ h(r) \geq i\}, \qquad g((r,i)) = i. \ \square$$
In order to cope with the insufficiencies of the weak* topology mentioned above, the following convergence, a variant of a convergence considered in [58], can be introduced: a net $(x_i)_{i\in I}$ of $X$ is said to be $\gamma$-convergent to $x$, and we write $(x_i)_{i\in I} \xrightarrow{\gamma} x$, if $(x_i)_{i\in I} \to x$ for the weak* topology and if $(x_i)_{i\in I}$ is eventually bounded, i.e. if a tail $(x_j)_{j\geq i_0}$ for some $i_0 \in I$ is bounded. Equivalently, $(x_i)_{i\in I} \xrightarrow{\gamma} x$ iff $(x_i)_{i\in I}$ converges to $x$ in the weak* topology and $\limsup_{i\in I} \|x_i\| < \infty$. Thus, the convergence $\gamma$ rules out weak* convergent nets which are unbounded; such nets exist in any infinite dimensional dual Banach space. Another characterization can be given as follows.

Lemma 3 The convergence $\gamma$ is the convergence $\gamma(\beta)$ associated, via the procedure described in the preceding lemma, with the preconvergence $\beta$ given as follows: a net $N = (x_i)_{i\in I}$ converges to $x$ for $\beta$ iff it is bounded and converges to $x$ in the weak* topology.
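A standard construction (ours, not from the paper) shows how an unbounded weak* convergent net arises in any infinite dimensional dual Banach space, and why $\gamma$ discards it:

```latex
% Let Y be infinite dimensional, X = Y^*. Index by pairs (F,n) with F a
% finite subset of Y and n a positive integer, ordered by inclusion and <=.
% The annihilator F^\perp = { x \in X : \langle x,y\rangle = 0 \ \forall y \in F }
% has finite codimension, hence is nontrivial; pick x_{(F,n)} \in F^\perp with
\| x_{(F,n)} \| = n .
% For every fixed y \in Y, every index beyond (\{y\},1) satisfies
\langle x_{(F,n)}, y \rangle = 0 ,
% so the net weak* converges to 0, yet its norms are unbounded: it weak*
% converges but does not \gamma-converge.
```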
Proof. Clearly, if a net $N$ converges to $x$ for $\gamma$, it converges for the convergence $\gamma(\beta)$ associated with $\beta$. Conversely, if $N = (x_i)_{i\in I}$ converges to $x$ for $\gamma(\beta)$ then it is weak* convergent to $x$; let us show it is eventually bounded. If it is not the case, for each $i \in I$ and each $n \in \mathbb{N}$ one can find $j(i,n) \in I$ such that $j(i,n) \geq i$, $\|x_{j(i,n)}\| \geq n$. Then we cannot find a subnet $(x_k)_{k\in K}$ of $(x_{j(i,n)})_{(i,n)\in I\times\mathbb{N}}$ which converges to $x$ for $\beta$ and a fortiori is bounded. □
Moreover one can show the following relationship with the classical bounded weak* topology [67], [80], [97] and with the weak* sequential topology considered in [113], [114].

Lemma 4 The topology associated to the convergence $\gamma$ by taking as closed subsets of $X$ the subsets which contain the limits of their $\gamma$-convergent nets is the bounded weak* topology.
and using the fact that $(x_i)_{i\in I}$ is eventually bounded, we see that $((x_i,y_i))_{i\in I} \to (x,y)$. This shows that $\gamma$ is finer than continuous convergence and that the coupling functional is continuous. On the other hand, if $(x_i)_{i\in I} \to x$ for the continuous convergence, then $(x_i)_{i\in I} \to x$ for the weak* topology (take $y_i = y$ for each $i \in I$). If $(x_i)_{i\in I}$ is not eventually bounded, we can select a subnet $(x_{j(i,n)})_{(i,n)\in I\times\mathbb{N}}$ with $j(i,n) \geq i$, $\|x_{j(i,n)}\| \geq n$ as in the proof of Lemma 3. Choosing $u_{j(i,n)}$ in the unit ball of $Y$ such that $\langle x_{j(i,n)}, u_{j(i,n)}\rangle > n$, and setting $y_{j(i,n)} = n^{-1} u_{j(i,n)}$, we see that $(y_{j(i,n)}) \to 0$ but $\langle x_{j(i,n)}, y_{j(i,n)}\rangle > 1$ for each $(i,n) \in I \times \mathbb{N}$, a contradiction. □
The following result, which is an immediate consequence of the continuity of the coupling functional, shows the interest of the convergence $\gamma$ for various problems.

Corollary 2 The graph of a maximal monotone operator $A : Y \rightsquigarrow X = Y^*$ is closed in the product convergence of the norm convergence with the convergence $\gamma$.

We define a multifunction $F : T \rightsquigarrow X$ from a topological space $T$ into $X$ to be stable (or closed or upper semicontinuous) at $t_0 \in T$ if for any nets $(t_i)_{i\in I} \to t_0$, $(x_i)_{i\in I} \xrightarrow{\gamma} x$ with $x_i \in F(t_i)$ for each $i \in I$ one has $x \in F(t_0)$. We also use the mixed excesses
$$e_r^0(C,D) := \sup \{ d_0(x,D) : x \in C \cap rB_X \}$$
for $r \in P := (0,\infty)$, $C, D \subset X$, where $d_0(x,D) = \inf \{\|x-y\|_0 : y \in D\}$. The following result, which completes [58] Prop. 1.7, shows that the use of the convergence $\gamma$ in order to define the limit superior of a parametrized family of sets is sensible. We hope it will help researchers who are reluctant to leave the realm of sequential convergences (whose drawbacks are well-known, especially for closure operations).

Proposition 7 Let $F : T \rightsquigarrow X$ be a multifunction such that for some $t_0 \in T$ the set $F(t_0)$ is closed in $X_0$. Then the following assertions are equivalent:
(a) $F$ is stable (or closed) at $t_0$ when $X$ is endowed with the convergence $\gamma$;
(b) for each weak'-compact subset K of X the multifunction FK given by Fx{t) = F(t) D K is upper continuous at t0 (in the classical sense recalled in the following section); (c) F is continuous at to when the space V(X) of subsets of X is endowed with the topology generated by the sets [Kc]+ = {A € V{X) : A C X\K} for K in the family of weak? compact subsets of X ; (d) F is upper hemicontinuous at t0 for the excesses e°, r £ P •' for each r £ P one has e°(F(t),F(t0)) -.0flsf-tfo. Proof. The equivalence (a) •»(b)-«'(c) are analogous to [58] Prop. 1.7. Let us show (d) =>(a). Let (t,), e / —» t0 in T, (x,),e/ -^ x with a;,- 6 F(f,) for each i £ I. We have to show that x0 £ F(t0). Taking a subnet if necessary we may assume there exists r £ M+ such that z, £ rBx for each i £ / . Then we have d0(x{, F(t0)) —> 0, so that there exists («i)»el m F(t0) such that ||x,- — 2,-||0 —» 0. Then both (x,), € / and (2,),gj converge to xo in X 0 . Since F(t0) is closed in X0 we get x 0 G F(t0). Now suppose (a) holds but (d) does not hold : for some r, a in P and some net (<;),-£/ in T with limit to one has for each i £ / e°(F(t,),JF(t0))>a. Thus we can find (xi), g / in rfiy with X{ £ .F(£;), dofar^^fo)) > a for each z £ / . Since rBx is weak*-compact, (i,)i € i has a subnet (ijjjgj which converges to some xo £ rBx for the weak* topology. By (a) we have x0 £ F(t0). Since (xj)jej also converges for the norm ||.|| 0 we get a contradiction with d0(xj,F(to)) > a. □ We hope the preceding result will incite researchers to treat concrete examples (which abound) with the help of the convergence 7 and of the mixed excesses e°. The following result which relates the convergence 7 with the strong and the weak* convergences is a slight generalization of [101] Prop. 3.2. Here we say that a subset C of X is shadowy with respect to (w.r.t.) some point x £ X if x + [1, oo[(C — x) C C. A cone C is always shadowy w.r.t. 0 and when convex it is shadowy w.r.t. 
any $x \in C \cap (-C)$; the lower subdifferential of a quasiconvex function ([133]) is shadowy w.r.t. 0.

Proposition 8 Let C be a weakly* locally compact subset of X, let $x \in C$ and let $(x_i)_{i \in I}$ be a net in C. Among the following assertions one has (a) ⇔ (b) ⇐ (c); if C is shadowy w.r.t. x the three assertions are equivalent:
(a) $(x_i)_{i \in I} \xrightarrow{\gamma} x$;
(b) $(x_i) \to x$ in the weak* topology;
(c) $\|x_i - x\| \to 0$.

Proof. Clearly (a) ⇒ (b) and (c) ⇒ (a). We also have (b) ⇒ (a), since x has a neighborhood V in the weak* topology such that $V \cap C$ is weak*-compact and there exists $h \in I$ such that $x_i \in V \cap C$ for $i \geq h$.
It remains to show that (a) ⇒ (c) when C is shadowy w.r.t. x. Suppose on the contrary that there exist $s > 0$ and a cofinal subset J of I such that $\|x_j - x\| > s$ for each $j \in J$. Let V be a neighborhood of x in the weak* topology σ such that $V \cap C$ is weak*-compact, hence bounded. Let $r > 0$ be such that $\|v - x\| \leq r$ for each $v \in V \cap C$; we may suppose $r > s$. Let $V' = x + s r^{-1}(V - x)$; then $V'$ is a neighborhood of x and, for each $v' \in V' \cap C$, setting $v := x + r s^{-1}(v' - x) \in V$, we have $v \in C$ since C is shadowy w.r.t. x, hence $\|v - x\| \leq r$ and $\|v' - x\| \leq s$. It follows that $x_j \notin V' \cap C$ for $j \in J$, a contradiction with $(x_j) \to x$ weakly*. □

The reader will find information about the use of weakly* locally compact convex sets in [48]; let us mention that this class includes the following example, in which r is a positive constant, K is a compact subset of Y and Y is a Banach space:
$$C := \{x \in X : h_K(x) := \max_{y \in K} \langle x, y \rangle \geq r \|x\|\}.$$
Such cones will be called Bishop-Phelps cones or sharp cones (see [8] for one of their uses). We provide a simple proof which brings some more information.

Lemma 5 Any Bishop-Phelps cone as above is closed and locally compact in the bounded weak* topology $\sigma_b$ and in the weak* topology σ.

Proof. The case of the topology σ is treated in [101] Prop. 3.5. Since $h_K$ is one of the seminorms defining the topology $\sigma_b$, and since the dual norm is l.s.c. for σ, hence for $\sigma_b$, the set C is closed for $\sigma_b$. Setting $V := K^0 = h_K^{-1}(]-\infty, 1])$, a neighborhood of 0 for
Proposition 9 Suppose F is a subset of a n.v.s. X. Then the Fréchet normal cone to F at $x \in F$ is the polar of the weak tangent cone to F at x in the duality between $X^*$ and $X^{**}$:
$$N^-(F, x) = (T^\gamma(F, x))^0.$$
Proof. Given $x^* \in N^-(F, x)$ and $v^{**} = \gamma\text{-}\lim\, t_i^{-1}(x_i - x) \in T^\gamma(F, x)$ with $(t_i)_{i \in I} \to 0_+$ and $(v_i) := (t_i^{-1}(x_i - x))$ eventually bounded, we see that $(r_i)_{i \in I} := (\|x_i - x\|)_{i \in I} \to 0$ and we may assume that $(q_i) := (t_i^{-1} r_i)$ has a finite limit q; then we have
$$\langle x^*, v^{**} \rangle = \lim\, q_i \langle x^*, r_i^{-1}(x_i - x) \rangle \leq 0.$$
Conversely, if $x^* \in X^* \setminus N^-(F, x)$ we can find a sequence $(x_n)$ in $F \setminus \{x\}$ with limit x and $a > 0$ such that $\langle x^*, x_n - x \rangle > a \|x_n - x\|$ for each n. Let $v^{**}$ be a weak** cluster point of the sequence $(\|x_n - x\|^{-1}(x_n - x))$. Then $\langle x^*, v^{**} \rangle \geq a$, $v^{**} \in T^\gamma(F, x)$ and $x^* \notin (T^\gamma(F, x))^0$. □
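As a sanity check (our own illustration, not part of the text): in a finite dimensional space, where the weak tangent cone reduces to the contingent cone, Proposition 9 can be verified directly. For $F = \mathrm{epi}\,|\cdot| = \{(x, y) \in \mathbb{R}^2 : y \geq |x|\}$ and the origin one finds

```latex
T(F,(0,0)) = F, \qquad
N^-(F,(0,0)) = F^0
  = \{(a,b) \in \mathbb{R}^2 : ax + by \le 0 \ \ \forall (x,y) \in F\}
  = \{(a,b) : b \le -|a|\},
```

since $b \le -|a|$ gives $ax + by \le |a|\,|x| + b y \le (|a| + b)\, y \le 0$ on F, in agreement with the polarity formula $N^-(F, x) = (T^\gamma(F, x))^0$.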
3 Stabilized Subdifferentials and Stabilized Normal Cones
Stability (also called closedness or upper semicontinuity) is a desirable feature for a multifunction. Let us recall that a multifunction $M : W \rightsquigarrow Z$ between two topological spaces is said to be stable (or closed or upper semicontinuous) at $w \in W$ if
$$\bigcap_{U \in \mathcal{O}(w)} \mathrm{cl}(M(U)) \subset M(w),$$
where cl denotes the closure and $\mathcal{O}(w)$ denotes the family of open neighborhoods of w in W. It is said to be upper (semi)continuous at w if for any open subset V of Z containing M(w) one can find $U \in \mathcal{O}(w)$ such that $M(U) \subset V$. Let us recall the following result, the first part of which is obvious, taking into account the fact that the multifunction M is stable at each point iff its graph is closed. Its last assertion follows from the fact that a multifunction with values in a compact set is upper continuous if and only if it is stable.

Lemma 7 [35] Let $F : W \rightsquigarrow Z$ be a multifunction between two topological spaces. There exists a smallest stable multifunction M whose graph contains the graph of F; it is obtained by taking the closure of the graph of F in the product space $W \times Z$. If F is densely defined and if for each $w \in W$ there exists some $U \in \mathcal{O}(w)$ such that F(U) is contained in a compact subset of Z, then there exists a (unique) smallest upper continuous multifunction N with nonempty compact values containing F. It is obtained by taking $N(w) = \bigcap_{U \in \mathcal{O}(w)} \mathrm{cl}(F(U))$. In fact, M and N coincide.
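In finite dimensions the closure-of-the-graph construction of Lemma 7 can be made concrete. The sketch below is our own illustration (the multifunction, the radii and the tolerance are invented for the demo, not taken from the text): it approximates the stabilized value of a deliberately unstable multifunction at a point by collecting limit values along sequences converging to that point.

```python
def F(w):
    # A deliberately unstable multifunction on R:
    # F(w) = {sign(w)} for w != 0, and F(0) = {0}.
    if w == 0:
        return {0.0}
    return {1.0 if w > 0 else -1.0}

def stabilized(F, w, radii=None):
    """Approximate the stabilized multifunction at w: collect all limits of
    values z_i in F(w_i) along sequences w_i -> w, i.e. read the closure of
    the graph of F at w (the smallest stable multifunction of Lemma 7)."""
    if radii is None:
        radii = [10.0 ** (-k) for k in range(1, 8)]
    values = set()
    for direction in (-1.0, 1.0):
        # one approaching sequence w_i -> w per side
        tail = [tuple(sorted(F(w + direction * r))) for r in radii]
        # keep values that persist along the tail (crude stand-in for a limit)
        if all(t == tail[0] for t in tail):
            values.update(tail[0])
    values.update(F(w))  # the graph itself is contained in its closure
    return values

print(stabilized(F, 0.0))  # both one-sided limits survive together with F(0)
```

At the unstable point 0 the stabilized value is {-1.0, 0.0, 1.0}, strictly larger than F(0): stabilization enlarges the values exactly where the original multifunction fails to be closed.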
When Z is a convergence space, it is still possible to consider stable multifunctions and to stabilize any multifunction by taking its closure; but the simple procedure described in the preceding lemma in order to stabilize a multifunction is no longer valid. However, taking the closure of a multifunction with respect to a convergence may bring useful properties. Unfortunately, convexity is lost in the process, but we will see that other important properties are preserved. Let us give a precise definition for the case we are interested in, i.e. the case of a dual Banach space endowed with the convergence γ described above.

Definition 1 Given a topological space W, a Banach space X and a multifunction $M : W \rightsquigarrow X^*$, the stabilized multifunction $\overline{M}$ associated to M is given by
$$\overline{M}(w) := \{x^* \in X^* : \exists (w_i)_{i \in I} \to w,\ \exists (x_i^*)_{i \in I} \xrightarrow{\gamma} x^*,\ \forall i \in I\ \ x_i^* \in M(w_i)\}.$$
When W = X and $M = \partial^? f$ for some $f \in LSC(X)$ and some subdifferential $\partial^?$, we denote by $\overline{\partial}^? f$ the multifunction obtained in this way, requiring furthermore that $(w_i)_{i \in I} \xrightarrow{f} w$, i.e. that $(w_i)_{i \in I} \to w$ and $(f(w_i))_{i \in I} \to f(w)$, and we call it the stabilized subdifferential of f. Let us observe that the added condition can be interpreted either by changing the topology of W or by taking the closure of the hypergraph of f given by $\{(x, f(x), x^*) : x \in X\}$ and taking the projection. When W = X and $M = N^?(F, \cdot)$ for some closed subset F of X and some normal cone $N^?$, we denote it by $\overline{N}^?(F, \cdot)$ and we call it the stabilized normal cone to F.

The question arises whether the stabilized subdifferential (resp. normal cone) multifunction associated with a coherent subdifferential (resp. normal cone) is still coherent. The following result presents conditions ensuring this property.

Proposition 10 (a) If $N^?$ is the normal cone associated with the subdifferential $\partial^?$ then $\overline{N}^?$ is the normal cone associated with the stabilized subdifferential $\overline{\partial}^?$.
(b) If $\partial^?$ is the subdifferential associated to the normal cone $N^?$ and if for any closed subset F of X, any $a \in F$ and any $p \in X$ such that $F + p \subset F$ one has $N^?(F, a + p) \subset N^?(F, a)$, then $\overline{\partial}^?$ is the subdifferential associated with the stabilized normal cone $\overline{N}^?$.
(c) Under the preceding assumptions, if $N^?$ (resp. $\partial^?$) is coherent then $\overline{N}^?$ (resp. $\overline{\partial}^?$) is coherent.

We note that the assumption in assertion (b) is satisfied in particular for the contingent, the incident and the Fréchet normal cones.

Proof. The first assertion is an immediate consequence of the definitions and the last one follows from the two preceding assertions. Let us prove (b). Given $f \in LSC(X)$, $x \in \mathrm{dom} f$ and $x^* \in \overline{\partial}^? f(x)$, we can find a net $(x_i)_{i \in I} \xrightarrow{f} x$ and a net $(x_i^*)_{i \in I} \xrightarrow{\gamma} x^*$ such that $x_i^* \in \partial^? f(x_i)$ for each $i \in I$ and $(f(x_i))_{i \in I} \to f(x)$. Then $((x_i^*, -1))_{i \in I} \xrightarrow{\gamma} (x^*, -1)$, $((x_i, f(x_i)))_{i \in I} \to (x, f(x))$ and, as $(x_i^*, -1) \in N^?(E_f, (x_i, f(x_i)))$ for each $i \in I$, we get $(x^*, -1) \in \overline{N}^?(E_f, (x, f(x)))$. Conversely, if $x^*$ satisfies this relation, we can
find a net $((x_i, r_i))_{i \in I} \to (x, f(x))$ in $E_f$ and a net $((x_i^*, s_i))_{i \in I} \xrightarrow{\gamma} (x^*, -1)$ such that $(x_i^*, s_i) \in N^?(E_f, (x_i, r_i))$ for each $i \in I$; then, for i large enough, we have $s_i < 0$ and, by our assumption, $(|s_i|^{-1} x_i^*, -1) \in N^?(E_f, (x_i, f(x_i)))$, so that $|s_i|^{-1} x_i^* \in \partial^? f(x_i)$ and $x^* \in \overline{\partial}^? f(x)$. □

The following result describes the way polarity behaves with respect to the preceding closure process. It completes and mimics [15] Th. 1.1.8, [58] Th. 2.2, [116] Lemma 2.13.

Proposition 11 Let W be a topological space, let $M : W \rightsquigarrow X^*$ be a multifunction whose values are closed convex cones and let $\overline{M}$ be its closure with respect to the convergence γ. Then, taking polars in the duality between $X^*$ and $X^{**}$, one has
$$(\overline{M}(w))^0 \cap X = \big(\liminf_{v \to w} M(v)^0\big) \cap X.$$
If moreover the values of M are weak*-closed, one can take polars in the duality between $X^*$ and X:
$$(\overline{M}(w))^0 = \liminf_{v \to w} M(v)^0.$$
Proof. The inclusion $(\liminf_{v \to w} M(v)^0) \cap X \subset (\overline{M}(w))^0$ follows from the continuity of the coupling functional $c : X^{**} \times X^* \to \mathbb{R}$ at each point of $X \times X^*$: when $(x_i^*) \xrightarrow{\gamma} x^*$ and $(x_i) \to x$ in X one has
$$|\langle x_i, x_i^* \rangle - \langle x, x^* \rangle| \leq |\langle x_i - x, x_i^* \rangle| + |\langle x, x_i^* - x^* \rangle| \to 0.$$
Now if $x \in X \setminus \liminf_{v \to w} M(v)^0$ we can find $r > 0$ and a net $(w_i)_{i \in I} \to w$ such that $B^{**}(x, r) \cap M(w_i)^0 = \emptyset$ for each $i \in I$. As $B^{**}(x, r)$ is weak**-compact, the Hahn-Banach theorem provides some $x_i^* \in M(w_i)$ with norm one such that $\langle x_i^*, v^{**} \rangle \geq 0$ for each $v^{**} \in B^{**}(x, r)$.
Taking a subnet if necessary we may assume $(x_i^*)_{i \in I}$ has a weak* limit $x^*$. As this net is bounded, the limit is valid for the convergence γ and $x^* \in \overline{M}(w)$. Since the preceding inequality implies $\langle x_i^*, x \rangle \geq r$ for each $i \in I$, we get $\langle x^*, x \rangle \geq r$ and $x \notin (\overline{M}(w))^0$. When the values of M are weak*-closed, we may replace the ball $B^{**}(x, r)$ by the open ball U(x, r) of X and get $x^* \in M(w_i)^{00} = M(w_i)$. □

Corollary 3 The polar (in X) of the stabilized contingent normal cone $\overline{N}(F, x)$ to a closed subset F of X at $x \in F$ is $(\liminf_{x' \xrightarrow{F} x} \overline{\mathrm{co}}(T(F, x'))) \cap X$. The polar (in X) of the stabilized Fréchet normal cone $\overline{N}^-(F, x)$ to a closed subset F of X at $x \in F$ is $(\liminf_{x' \xrightarrow{F} x} \overline{\mathrm{co}}(T^\gamma(F, x'))) \cap X$. If X is reflexive one has $T^\gamma(F, x) = (\overline{N}^-(F, x))^0$ and $\overline{N}^-(F, x) = \overline{\mathrm{co}}(\overline{N}(F, x))$.
Proof. The first assertion follows from the proposition and the fact that $\overline{\mathrm{co}}(T(F, x')) = (T(F, x'))^{00} = (N(F, x'))^0$. The second one is proved similarly, using $\overline{\mathrm{co}}(T^\gamma(F, x')) = (T^\gamma(F, x'))^{00} = (N^-(F, x'))^0$. The last assertion is a consequence of the equality $\liminf_{x' \xrightarrow{F} x} \overline{\mathrm{co}}(T^\gamma(F, x')) = T^\gamma(F, x)$ in [43] Th. 3.1, taking into account that in a reflexive Banach space the weak contingent cone $T^\gamma(F, x')$ coincides with the sequential weak tangent cone. □

The following result compares our definition with related ones.

Proposition 12 Let $M : W \rightsquigarrow X^*$ be a multifunction and let $\overline{M}^{\sigma}$ (resp. $\overline{M}^{\sigma}_{\mathrm{seq}}$) be its closure (resp. sequential closure) with respect to the weak* topology σ. Then for each $w \in W$ one has
$$\overline{M}(w) \subset \overline{M}^{\sigma}_{\mathrm{seq}}(w) \subset \overline{M}^{\sigma}(w).$$
If there exist a neighborhood U of w and a weak*-closed, weak*-locally compact subset C of $X^*$ such that $M(u) \subset C$ for each $u \in U$, then the last inclusion is an equality. If moreover X is reflexive, both inclusions are equalities.

Proof. Let $x^* \in \overline{M}^{\sigma}(w)$. Since $(U \times X^*) \cap \mathrm{graph}\, M \subset U \times C$, and since C is closed for σ, we have $x^* \in C$. Let V be a neighborhood of $x^*$ for σ such that $V \cap C$ is compact for σ. If $((w_i, x_i^*))_{i \in I}$ is a net in $\mathrm{graph}\, M$ with limit $(w, x^*)$, the net $(x_i^*)$ is eventually in $V \cap C$, hence is convergent for γ, so that $x^* \in \overline{M}(w)$. The last assertion is proved in [101] and is based on Whitley's construction. □

Ph. Loewen further shows ([101] Prop. 3.7) that if F is epi-Lipschitzian (i.e. satisfies the cone condition [2]) or compactly epi-Lipschitzian in the sense of [34], then the preceding condition is satisfied for the multifunction $N(F, \cdot)$ on W := F. Then we say that F is a Loewen set or that F satisfies the (LC) condition. The epi-Lipschitzian conditions can be relaxed in the following way.

Proposition 13 Let F be a closed subset of X which is compactly tangentially determined near $x \in F$ in the following sense: there exist $r > 0$, a compact subset K of X and a neighborhood U of x such that for any $u \in F \cap U$ one has $rB \subset T(F, u) + K$, where B is the closed unit ball of X. Then $\overline{N}(F, x)$ coincides with the multifunction $\overline{N}^{\sigma}(F, x)$ obtained by stabilizing $N(F, \cdot)$ with respect to σ.

Proof. This follows from the preceding proposition and the fact that N(F, u) is contained in a fixed Bishop-Phelps cone. □

Any closed subset of a finite dimensional n.v.s. is obviously compactly tangentially determined (take r = 1, K = B). It is also the case if there exist a neighborhood U of x and $r > 0$, $v \in X$ such that $B(v, r) \subset T(F, u)$ for each $u \in F \cap U$ (take $K = \{-v\}$), or if F is compactly epi-Lipschitzian in the sense of [34].

Corollary 4 If f is a Lipschitzian function or, more generally, a function whose epigraph satisfies the (LC) condition, then the weak* closure and the γ-closure of $\partial f$ coincide:
$$\overline{\partial}^{\sigma} f = \overline{\partial} f.$$
In particular, for each $x \in X$ the set $\overline{\partial} f(x)$ is weak*-closed and the multifunction $\overline{\partial} f(\cdot)$ is closed. If moreover the space X is reflexive, then the sequential weak* closure of $\partial f$ coincides with the other two closures.

In usual spaces it is not necessary to distinguish between the stabilized Fréchet subdifferential and the stabilized contingent subdifferential when dealing with locally Lipschitzian functions. We need some definitions to present this fact.

Definition 2 Given a subdifferential $\partial^?$, a Banach space X is said to be dependable for $\partial^?$, or $\partial^?$-dependable, if for any l.s.c. functions $f, g : X \to \mathbb{R} \cup \{\infty\}$ with g locally Lipschitzian and any $x \in \mathrm{dom} f$, any $\varepsilon > 0$ and any $x^* \in \partial^?(f + g)(x)$, there exist $u, v \in B(x, \varepsilon)$, $u^* \in \partial^? f(u)$, $v^* \in \partial^? g(v)$ with $|f(u) - f(x)| < \varepsilon$, $|g(v) - g(x)| < \varepsilon$, $\|u^* + v^* - x^*\| < \varepsilon$. If in the preceding condition g is supposed to be convex and Lipschitzian, we say that X is C-dependable for $\partial^?$. We call dependable the spaces which are ∂-dependable, i.e. dependable for the contingent subdifferential.

It has been shown that a space is $\partial^-$-dependable, i.e. dependable for the Fréchet subdifferential, iff it is trustworthy in the sense of [81], iff it is an Asplund space, iff its dual satisfies the Radon-Nikodym property (see [62], [71]-[73], [81]). In particular, reflexive Banach spaces and separable Banach spaces are trustworthy. Thus, this class of spaces is important. Moreover, it can easily be characterized in terms of separable subspaces. It is shown in [61] that the class of LC¹-bumpable spaces, i.e. the class of spaces for which there exists a non null Lipschitzian function of class C¹ with bounded support, is $\partial^-$-dependable and in fact $\partial^v$-dependable, where $\partial^v$ is the viscosity subdifferential defined in the following way: $x^*$ belongs to $\partial^v f(x)$ iff there exists a function g of class C¹ such that $g'(x) = x^*$ and f − g attains its minimum at x.
The situation for dependable and C-dependable spaces is not as clear yet; it is likely that these classes are much more restricted. On the contrary, the following class is at least as large as the class of trustworthy spaces: if we substitute in it the Fréchet subdifferential for the contingent subdifferential, we obtain exactly the class of Asplund spaces (see [127]). A similar reason shows that this class is contained in the class of spaces on which there exists a bump function of class T¹ in the sense of Proposition 17 below. Moreover it is contained in the class of Gâteaux differentiability spaces in the sense of [132].

Definition 3 A Banach space is said to be reliable or an R-space if for any l.s.c. function $f : X \to \mathbb{R} \cup \{\infty\}$, any $g : X \to \mathbb{R}$ convex and Lipschitzian, any $x \in \mathrm{dom} f$ at which f + g attains a local minimum, and any $\varepsilon > 0$, there exist $u, v \in B(x, \varepsilon)$, $u^* \in \partial f(u)$, $v^* \in \partial g(v)$ with $|f(u) - f(x)| < \varepsilon$, $|g(v) - g(x)| < \varepsilon$, $\|u^* + v^*\| < \varepsilon$.

The proof of the coincidence result we announced is analogous to the proof of [88] Lemma 4.
Proposition 14 Let X be an Asplund space and let f be a locally Lipschitzian function on X. Then $\overline{\partial} f$ coincides with the multimapping $\overline{\partial}^- f$ obtained by stabilizing the Fréchet subdifferential $\partial^- f$.

The stabilization procedure we have presented is a simple prototype. There are several other ways of stabilizing subdifferentials; in particular, one may consider a stabilization procedure through the function itself considered as a variable (see [75], [152], [153] for instance). It may also be useful to introduce slightly more complicated processes involving ε-subdifferentials or restrictions to finite dimensional subspaces; let us describe them shortly, since they follow a pattern similar to what precedes. The (contingent) ε-subdifferential of f is given by
$$\partial_\varepsilon f(x) := \{x^* \in X^* : \forall v \in X\ \ f'(x, v) \geq \langle x^*, v \rangle - \varepsilon \|v\|\},$$
where
$$f'(x, v) := \liminf_{t \to 0_+,\ u \to v} t^{-1}(f(x + tu) - f(x))$$
is the contingent derivative of f. In nice spaces the regularization procedure using this approximate subdifferential coincides with the one we presented.

Proposition 15 For any l.s.c. function f on a C-dependable space (resp. reliable space, resp. trustworthy space) X and for any $x \in \mathrm{dom} f$ one has
$$\overline{\partial} f(x) = \gamma\text{-}\limsup_{u \xrightarrow{f} x,\ \varepsilon \to 0_+} \partial_\varepsilon f(u)$$
(resp. $\overline{\partial} f(x) \supset \gamma\text{-}\limsup_{u \xrightarrow{f} x,\ \varepsilon \to 0_+} \partial_\varepsilon^- f(u)$, resp. $\overline{\partial}^- f(x) = \gamma\text{-}\limsup_{u \xrightarrow{f} x,\ \varepsilon \to 0_+} \partial_\varepsilon^- f(u)$).
Proof. Let us prove the first assertion, and then indicate the necessary changes for the second one. It suffices to show that any $x^*$ of the right hand side of this equality belongs to $\overline{\partial} f(x)$. One can find nets $(\varepsilon_i)_{i \in I} \to 0_+$, $(x_i)_{i \in I} \to x$, $(x_i^*)_{i \in I} \to x^*$ weakly* with $(f(x_i))_{i \in I} \to f(x)$, $(x_i^*)_{i \in I}$ bounded and $x_i^* \in \partial_{\varepsilon_i} f(x_i)$ for each $i \in I$. Setting $g_i(u) := \|u - x_i\|$, we see that $x_i^* \in \partial(f + \varepsilon_i g_i)(x_i)$, hence $x_i^* \in u_i^* + 2\varepsilon_i B^*$ with $u_i^* \in \partial f(u_i)$ for some $u_i \in B(x_i, \varepsilon_i)$ satisfying $|f(u_i) - f(x_i)| < \varepsilon_i$. Thus $(u_i^*)_{i \in I} \to x^*$ weakly* and is bounded, and $(u_i)_{i \in I} \to x$ with $(f(u_i))_{i \in I} \to f(x)$: $x^* \in \overline{\partial} f(x)$. When $x^*$ belongs to the right hand side of the second equality we can take $x_i^* \in \partial_{\varepsilon_i}^- f(x_i)$ for each $i \in I$. Then $x_i$ is a local minimizer of $f + 2\varepsilon_i g_i - x_i^*$ and reliability (resp. trustworthiness) enables us to conclude as above. □

The proof of the following result is of the same type and is omitted.

Proposition 16 For a l.s.c. function $f : X \to \mathbb{R} \cup \{\infty\}$ and a locally Lipschitzian function g on a dependable (resp. trustworthy) space X, for any $x \in \mathrm{dom} f$ one has
$$\overline{\partial}(f + g)(x) \subset \overline{\partial} f(x) + \overline{\partial} g(x) \qquad (\text{resp. } \overline{\partial}^-(f + g)(x) \subset \overline{\partial}^- f(x) + \overline{\partial}^- g(x)).$$
Therefore the stabilization process we used provides useful calculus rules. For more on this subject in a similar framework, see [107], [108], [88]. Let us note that, contrary to the A-subdifferential considered in [83], [84], [88], the stabilized subdifferential we introduce here coincides with the usual subdifferential in the convex case and in the case of a function of class C¹ (or even of class T¹ as defined in [129]).

Proposition 17 Suppose f = g + h, where $g : X \to \mathbb{R} \cup \{\infty\}$ is convex and l.s.c. and h is of class T¹, i.e. is continuous and Gâteaux differentiable with a locally bounded derivative which is continuous for the weak*-topology. Then for each $x \in \mathrm{dom}\, g$ one has
$$\overline{\partial} f(x) = \partial g(x) + h'(x).$$
Proof. It suffices to show that any $x^* \in \overline{\partial} f(x)$ belongs to the right hand side of this equality. One can find nets $(x_i)_{i \in I} \to x$, $(x_i^*)_{i \in I} \to x^*$ weakly* with $(f(x_i))_{i \in I} \to f(x)$, $(x_i^*)_{i \in I}$ bounded and $x_i^* \in \partial f(x_i) = \partial g(x_i) + h'(x_i)$ for each $i \in I$, by [129], Prop. 1.5. Then $(y_i^*)_{i \in I} := (x_i^* - h'(x_i))_{i \in I}$ is bounded and weak* converges to $y^* := x^* - h'(x)$. From the continuity of the coupling functional we conclude that $y^* \in \partial g(x)$. □

Another stabilization process consists in using restrictions to finite dimensional subspaces. Let us give a short account of it; here we replace the weak convergence of [83] by γ-convergence. We denote by $\mathcal{F}$ the directed family of finite dimensional subspaces of X.

Definition 4 For a l.s.c. extended real-valued function f on an arbitrary Banach space X the finitely stabilized subdifferential of f at $x \in \mathrm{dom} f$ is given by
$$\widetilde{\partial} f(x) := \bigcap_{F \in \mathcal{F}} \gamma\text{-}\limsup_{u \xrightarrow{f} x} \partial f_{u+F}(u),$$
where $f_{u+F}(w) := f(w)$ if $w \in u + F$, $+\infty$ otherwise.
The interest of such a modification lies in the fact that the nice calculus rules of [83] are preserved; in particular, for an arbitrary Banach space X, for any $f : X \to \mathbb{R} \cup \{\infty\}$ l.s.c. and finite at x and for $g : X \to \mathbb{R}$ locally Lipschitzian, one has
$$\overline{\partial}^A (f + g)(x) \subset \overline{\partial}^A f(x) + \overline{\partial}^A g(x).$$
Moreover the following new property holds.

Proposition 18 If f is an arbitrary convex function on the Banach space X, for each $x \in \mathrm{dom} f$ one has $\widetilde{\partial} f(x) = \overline{\partial} f(x) = \partial f(x)$.

Proof. Since for each $u \in \mathrm{dom} f$ and each $F \in \mathcal{F}$ one has $\partial f(u) \subset \partial f_{u+F}(u)$, the inclusion $\partial f(x) \subset \widetilde{\partial} f(x)$ holds. Conversely, given $x^* \in \widetilde{\partial} f(x)$, let us show that for each $w \in X$ one has
$$f(w) \geq f(x) + \langle x^*, w - x \rangle.$$
We pick $F \in \mathcal{F}$ containing w and x, a net $(x_i)_{i \in I} \xrightarrow{f} x$ and a bounded net $(x_i^*)_{i \in I}$ with weak* limit $x^*$ such that $x_i^* \in \partial f_{x_i + F}(x_i)$ for each $i \in I$, so that $x_i \in F$ for each $i \in I$. Then, as the restriction of f to F is convex and as the contingent subdifferential coincides with the ordinary subdifferential in this case, we have
$$f(w) \geq f(x_i) + \langle x_i^*, w - x_i \rangle$$
and taking limits we get $x^* \in \partial f(x)$ thanks to the continuity of the coupling functional. The second equality is proved similarly. □

Up to now, we have not used other existing links between the analytical approach and the geometrical approach, such as the distance function. The reason is that this last means does not give as accurate a connection as the indicator function does (see [37]). For a similar reason we have not treated here the case of proximal normal cones $N^p$, which reflect the metric properties of the space (and the set) more than its linear structure; see [39], [43], [44], [53], [88], for instance, in this connection. However, the following result, akin to results in [88], is worth noting. Let us note that it does not ensure that the stabilized normal cone to F is weak*-closed, unlike the last assertion of the preceding corollary.

Proposition 19 Suppose X has a smooth norm. Let $d_F$ be the distance function associated to the closed set F: $d_F(x) := \inf_{y \in F} \|x - y\|$. Then for each $x \in F$ one has
$$\overline{N}(F, x) = \mathbb{R}_+\, \overline{\partial} d_F(x).$$
Proof. The inclusion $\overline{N}(F, x) \supset \mathbb{R}_+ \overline{\partial} d_F(x)$ is obvious, provided one makes an appropriate adaptation of [88] Lemma 5. Now, given $x^* \in \overline{N}(F, x)$, we can find a net $(x_i)_{i \in I}$ in F with limit x and a bounded net $(x_i^*)_{i \in I}$ in $X^*$ with weak* limit $x^*$ such that $x_i^* \in N(F, x_i)$ for each $i \in I$. Since $\partial d_F(u) = N(F, u) \cap B^*(0, 1)$ for each $u \in F$, as is easily checked (see also [88] Lemma 3), we can write $x_i^* = r_i u_i^*$, with $r_i := \|x_i^*\|$, $u_i^* \in \partial d_F(x_i)$. Without loss of generality we may assume that $(r_i)$ and $(u_i^*)$ converge to some r and $u^*$ respectively, with $u^* \in \overline{\partial} d_F(x)$. Then $x^* = r u^*$ and the reverse inclusion is proved. □
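In a smooth finite dimensional setting the identity of Proposition 19 can be checked by hand. The snippet below is our own toy computation (not from the text): F is the closed unit disk in $\mathbb{R}^2$, so $d_F(y) = \max(\|y\| - 1, 0)$, and a numerical gradient of $d_F$ taken just outside F is compared with the unit outward normal at a boundary point, i.e. with the normal cone direction that $\mathbb{R}_+ \overline{\partial} d_F$ recovers.

```python
import math

def d_F(y):
    # Distance to the closed unit disk F = {y : |y| <= 1} in R^2.
    return max(math.hypot(*y) - 1.0, 0.0)

def num_grad(f, y, h=1e-6):
    # Central-difference gradient of f at y.
    g = []
    for k in range(len(y)):
        yp = list(y); ym = list(y)
        yp[k] += h; ym[k] -= h
        g.append((f(yp) - f(ym)) / (2 * h))
    return g

x = (math.cos(0.3), math.sin(0.3))        # a boundary point of F
outside = [1.001 * x[0], 1.001 * x[1]]    # a point slightly outside F
g = num_grad(d_F, outside)
# Just outside F, grad d_F is the unit outward normal at x, so the rays
# R_+ * grad d_F reproduce the normal cone direction of Proposition 19.
err = math.hypot(g[0] - x[0], g[1] - x[1])
print(err < 1e-3)
```

The gradient direction is independent of how far outside F the sample point lies, which reflects the positive homogeneity on the right hand side of the formula.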
4 Alien Terms in Limit Problems
A connection between convergence theory and optimization problems which deserves comments and thought lies in the derivation of optimality conditions and in sensitivity problems. We observe that the most recent and efficient optimality conditions [55], [85], [86], [95], [121], [122], [125] involve additional unexpected terms. Similar phenomena occur in other fields such as homogenization theory [4], [31], [51], [59],
[111], [112], [138], from which we borrow the words "alien term", and elasticity [45], [50]. One may wonder whether such a fact is fortuitous or not. We believe it is not accidental. For the moment, this belief does not have firm grounds; we hope that these lines may be an incentive to find some sound reasons. However, we dispose of the following observations. Given a n.v.s. X, $x_0 \in X$ and an arbitrary function $f : X \to \overline{\mathbb{R}} = \mathbb{R} \cup \{\infty\}$ which attains its minimum on X at $x_0$, we observe that for each $t > 0$ the functions $f_t', f_t'' : X \to \overline{\mathbb{R}}$ given by
$$f_t'(v) = t^{-1}[f(x_0 + tv) - f(x_0)],$$
$$f_t''(v) = 2t^{-2}[f(x_0 + tv) - f(x_0) - \langle x_0^*, tv \rangle]$$
with $x_0^* = 0$ attain their minimum at 0. If the family $(f_t')_{t>0}$ (resp. $(f_t'')_{t>0}$) epi-converges to some function $f_{x_0}'$ (resp. $f_{x_0, x^*}''$), then one has $f_{x_0}'(v) \geq 0$ (resp. $f_{x_0, x^*}''(v) \geq 0$) for each $v \in X$. In fact this inequality holds for the lower epi-limit of the family $(f_t')_{t>0}$ (resp. $(f_t'')_{t>0}$) without supposing epi-convergence. This stems from the fact that the epigraphs $E(f_t)$ of $f_t$, for $f_t = f_t'$ or $f_t''$, are contained in $X \times \mathbb{R}_+$, so that their limit superior (in the sense of the preceding section) is contained in $X \times \mathbb{R}_+$. It follows that the necessary condition $f_{x_0}'(\cdot) \geq 0$ (resp. $f_{x_0, x^*}''(\cdot) \geq 0$) involves the calculation of some epi-limits which may differ from the pointwise limits (see [57], [125] for instance). On the other hand, when one considers the limit problem of a family of equations of the form
$$(E_t) \qquad F_t(v) = 0$$
or inequations of the form
$$(VI_t) \qquad \langle F_t(v), u - v \rangle \geq 0 \quad \forall u \in C,$$
one also takes graphical limits which involve additional terms. These "alien terms" are difficult to interpret and depend on the specific problem at hand. However, one may wonder whether there exist natural links between the extra terms in optimality conditions and the "alien terms" of the limit problem of $(E_t)$ or $(VI_t)$. This question arises naturally when, for instance, $(VI_t)$ is the Euler equation of some parametrized minimization problem
$$(P_t) \qquad \text{minimize } f_t(x) : x \in C.$$
The observation above shows that problems of this type appear in the derivation of first order and second order optimality conditions. The relationships between convergence of parametrized families of convex functionals and convergence of their subdifferentials detected in [4], [7], [110], [123] yield some hints. But much more is to be done, from a theoretical point of view as well as from an applied point of view.
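The nonnegativity of the difference quotients $f_t'$, $f_t''$ at a minimizer, used above, is elementary and can be checked numerically. The sketch below uses our own toy data ($f(x) = x^2 + |x|$ on $\mathbb{R}$, minimized at $x_0 = 0$, with $x_0^* = 0$); it is an illustration of the formulas, not part of the text.

```python
def f(x):
    # toy function attaining its minimum on R at x0 = 0
    return x * x + abs(x)

x0, xstar0 = 0.0, 0.0  # the minimizer and the multiplier x0* = 0 of the text

def f1(t, v):
    # first-order quotient f'_t(v) = t^{-1} [f(x0 + t v) - f(x0)]
    return (f(x0 + t * v) - f(x0)) / t

def f2(t, v):
    # second-order quotient f''_t(v) = 2 t^{-2} [f(x0 + t v) - f(x0) - <x0*, t v>]
    return 2.0 * (f(x0 + t * v) - f(x0) - xstar0 * t * v) / (t * t)

samples = [(t, v) for t in (1.0, 0.1, 0.01) for v in (-2.0, -0.5, 0.0, 0.7, 3.0)]
print(all(f1(t, v) >= 0.0 and f2(t, v) >= 0.0 for t, v in samples))  # True
```

Here $f_t'(v) = t v^2 + |v|$ and $f_t''(v) = 2v^2 + 2|v|/t$, so both families are nonnegative for every $t > 0$, while their pointwise limits ($|v|$ and $+\infty$ off 0) already hint at why epi-limits, not pointwise limits, are the right notion.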
Sensitivity analysis represents another promising field for such a study: whenever the constraints are non polyhedral, additional terms should be added in the expression of the derivatives of the performance function. The fact that most problems with constraints involving partial differential equations are governed by non polyhedral sets in some functional spaces justifies an interest in such intricate matters.
References

[1] J. F. Aarnes and P. R. Andenaes, On nets and filters, Mathematica Scandinavica 31 (1972) 285-292.
[2] S. Agmon, Lectures on Elliptic Boundary Value Problems, Van Nostrand Mathematics Studies 2 (1965), Princeton, N.J.
[3] E. Asplund and R. T. Rockafellar, Gradients of convex functions, Transactions of the American Mathematical Society 139 (1969) 443-467.
[4] H. Attouch, Variational Convergence for Functions and Operators, Pitman, Boston (1984).
[5] H. Attouch, D. Aze and G. Beer, On some inverse stability problems for the epigraphical sum, Journal of Nonlinear Analysis Theory, Methods and Applications 16 (1991) 241-254.
[6] H. Attouch, R. Lucchetti and R. J.-B. Wets, The topology of the p-Hausdorff distance, Annali di Matematica Pura ed Applicata 160 (1991) 303-320.
[7] H. Attouch, J.-L. Ndoutoume and M. Thera, Epigraphical convergence of functions and convergence of their derivatives in Banach spaces, Seminaire d'Analyse Convexe, Montpellier, Expose No. 9, 1990.
[8] H. Attouch and H. Riahi, Stability results for Ekeland's ε-variational principle and cone extremal solutions, Seminaire d'Analyse Convexe 20, Montpellier, Expose 5 (1990).
[9] H. Attouch and R. J.-B. Wets, Epigraphical analysis, in: Analyse non lineaire, H. Attouch et al. (eds), Gauthier-Villars, Paris (1989) 73-100.
[10] H. Attouch and R. J.-B. Wets, Isometries for the Legendre-Fenchel transform, Transactions of the American Mathematical Society 296 (1986) 33-60.
[11] H. Attouch and R. J.-B. Wets, Quantitative stability of variational systems: I. The epigraphical distance, Transactions of the American Mathematical Society 328 (2) (1992) 695-729.
[12] H. Attouch and R. J.-B. Wets, Quantitative stability of variational systems: II. A framework for nonlinear conditioning, SIAM Journal on Optimization, 1992.
[13] H. Attouch and R. J.-B. Wets, Quantitative stability of variational systems: III. ε-approximate solutions, Preprint (Oct. 1987), IIASA, Laxenburg, Austria.
[14] J.-P. Aubin and I. Ekeland, Applied Nonlinear Analysis, Wiley-Interscience, New York (1984).
[15] J.-P. Aubin and H. Frankowska, Set-Valued Analysis, Birkhauser, Basel (1990).
[16] D. Aussel, J.-N. Corvellec and M. Lassonde, Mean value theorem and subdifferentiability criteria for lower semicontinuous functions, Transactions of the American Mathematical Society, to appear.
[17] D. Aze and J.-P. Penot, Recent quantitative results about the convergence of convex sets and functions, in: Functional Analysis and Approximation, P. L. Papini (ed), Pitagora, Bologna (1989) 90-110.
[18] D. Aze and J.-P. Penot, Operations on convergent families of sets and functions, Optimization 21 (1990) 521-534.
[19] D. Aze and J.-P. Penot, Qualitative results about the convergence of convex sets and convex functions, in: Optimization and Nonsmooth Analysis, A. D. Ioffe et al. (eds), Pitman Research Notes 244, Longman, Harlow (1992) 1-25.
[20] D. Aze and J.-P. Penot, The Joly topology and the Mosco-Beer topology revisited, Bulletin of the Australian Mathematical Society 48 (1993) 353-363.
[21] D. Aze and A. Rahmouni, Lipschitz behavior of the Legendre-Fenchel transform, Preprint, University of Perpignan, 1992.
[22] D. Aze and A. Rahmouni, Intrinsic bounds for Kuhn-Tucker points of perturbed convex programs, Preprint, University of Perpignan, 1992.
[23] G. Beer, On Mosco convergence of convex sets, Bulletin of the Australian Mathematical Society 38 (1988) 239-253.
[24] G. Beer, On the Young-Fenchel transform for convex functions, Proceedings of the American Mathematical Society 104 (1988) 1115-1123.
[25] G. Beer, Conjugate convex functions and the epi-distance topology, Proceedings of the American Mathematical Society 108 (1990) 117-126.
[26] G. Beer, Hyperspaces of a metric space: an overview, Preprint (1990).
[27] G. Beer, Mosco convergence and weak topologies for convex sets and functions, Mathematika 38 (1991) 89-104.
[28] G. Beer, Topologies on closed and convex sets and the Effros measurability of set-valued functions, Seminaire d'Analyse Convexe, Montpellier 21 (1991), Expose No. 2.
[29] G. Beer, The slice topology: a viable alternative to Mosco convergence in nonreflexive spaces, Nonlinear Analysis Theory, Methods and Applications 19 (1992) 271-290.
[30] G. Beer and J. M. Borwein, Mosco convergence and reflexivity, Proceedings of the American Mathematical Society 109 (1990) 427-436.
[31] A. Bensoussan, J.-L. Lions and G. C. Papanicolaou, Asymptotic Analysis for Periodic Structures, North Holland, Amsterdam, 1978.
[32] G. Beer and J. M. Borwein, Mosco convergence of level sets and graphs of linear functionals, Journal of Mathematical Analysis and Applications 175 (1993) 53-67.
[33] J. Birge and Liqun Qi, Semiregularity and generalized subdifferentials with applications to optimization, Mathematics of Operations Research 18 (4) (1993) 982-1005.
[34] J. M. Borwein, Epi-Lipschitz-like sets in Banach spaces: theorems and examples, Journal of Nonlinear Analysis Theory, Methods and Applications 11 (1987) 1207-1217.
[35] J. M. Borwein, Minimal cuscos and subgradients of Lipschitz functions, in: Fixed Point Theory and its Applications, J.-B. Baillon and M. Thera (eds), Pitman Lecture Notes in Maths, Longman, Essex (1991) 57-82.
[36] J. M. Borwein, Differentiability properties of convex, of Lipschitz, and of semicontinuous mappings on Banach spaces, in: Optimization and Nonlinear Analysis, A. Ioffe, M. Marcus and S. Reich (eds), Pitman Research Notes in Math. 244, Longman (1992) 39-52.
[37] J. M. Borwein and M. Fabian, A note on regularity of sets and of distance functions in Banach spaces, Journal of Mathematical Analysis and Applications 182 (2) (1994) 566-570.
[38] J. M. Borwein, S. P. Fitzpatrick and J. R. Giles, The differentiability of real functions on normed linear spaces using generalized subgradients, Journal of Mathematical Analysis and Applications 128 (2) (1987) 512-534.
[39] J. M. Borwein and J. R. Giles, The proximal normal formula in Banach spaces, Transactions of the American Mathematical Society 302 (1987) 371-381.
[40] J. M. Borwein and A. D. Ioffe, Proximal analysis in smooth spaces, Preprint, Simon Fraser University, Vancouver, 1994.
[41] J. M. Borwein and A. S. Lewis, Convergence of decreasing sequences of convex sets in nonreflexive Banach spaces, Preprint, University of Waterloo, October 1992.
[42] J. M. Borwein and D. Preiss, A smooth variational principle with applications to subdifferentiability and to differentiability of convex functions, Transactions of the American Mathematical Society 303 (1987) 513-527.
[43] J. Borwein and H. Strojwas, Proximal analysis and boundaries of closed sets in Banach spaces, Part I: Theory, Canadian Journal of Mathematics 38 (1986) 431-452.
[44] J. Borwein and H. Strojwas, Proximal analysis and boundaries of closed sets in Banach spaces, Part II: Applications, Canadian Journal of Mathematics 39 (1987) 428-472.
[45] F. Bourquin, P. G. Ciarlet, G. Geymonat and A. Raoult, Γ-convergence et analyse asymptotique des plaques minces, C.R.A.S. (I) 315 (1992) 1017-1024.
[46] J. Burke and Liqun Qi, Weak directional closedness and generalized subdifferentials, Journal of Mathematical Analysis and Applications 159 (2) (1991) 485-499.
[47] Ch. Castaing, Proximite et mesurabilite. Un theoreme de compacite faible, Colloque sur la Theorie Mathematique du Controle Optimal, Brussels (1969) 25-33.
[48] Ch. Castaing and M. Valadier, Convex Analysis and Measurable
Multifunctions.
[49] G. Choquet, Convergences, Annales Inst. Fourier Grenoble 23 (1947-1948) 55112. [50] P. G. Ciarlet, Plates and Junctions in Elastic Multi-structures: Analysis, Masson, Paris (1990).
An
Asymptotic
[51] D. Cioranescu and F. Murat, Un terme etrange venu d'ailleurs I, II. in: JVonJinear Partial Differential Equations and their Applications, College de France Seminar, vol. II, Pitman, London, (1982) 98-138. [52] F. H. Clarke, Optimization and Nonsmooth Analysis, Wiley, New York, 1983.
Convergence Theories [53] F. H. Clarke, Methods of Dynamic and Nonsmooth Optimization, regional conferences series 57, S.I.A.M. , Philadelphia, 1989.
315 CBMS-NSF
[54] F. H. Clarke, R. J. Stern and P. R. Wolenski, Subgradient criteria for monotonicity, the Lipschitz condition, and convexity, Canadian journal of mathematics 45 (6) (1993) 1167-1183. [55] R. Cominetti, Metric regularity, tangent sets, and second-order optimality con ditions, Applied Mathematics and Optimization 21 (1990) 265-287. [56] R. Cominetti, On pseudo-differentiability, Transactions of the American Math ematical Society 324 (1991) 843-865. [57] R. Cominetti and J.-P. Penot, Tangent sets of order one and two to the positive cones of some functional spaces, in preparation. [58] L. Contesse and J.-P. Penot. Continuity of the Fenchel correspondence and continuity of polarities, Journal of Mathematical Analysis and Applications 156 (1991) 305- 328. [59] G. Dal Maso. An Introduction to T—Convergence, Birkhauser, Boston, 1992. [60] R. Deville, A mean value theorem for non differentiable mappings, Preprint, University Bordeaux I, 1993. [61] R. Deville and E. M. El Haddad, The subdifferential of the sum of two functions in Banach spaces I. First order case, Preprint, University Bordeaux I, 1993. [62] R. Deville, G. Godefroy and V. Zizler, Smoothness and Renormings in Banach Spaces, Pitman Monographs in Math 64, Longman, Essex, 1993. [63] J. Diestel, Sequences and Series in Banach Spaces, Springer-Verlag, New York (1984). [64] S. Dolecki, Tangency and differentiation : some applications of convergence theory, Annaii di Matematica Pura ed Applicata 130 (1982) 223-255. [65] S. Dolecki, Continuity of bilinear and non bilinear polarities, in: Optimization and Related Fields, Erice, 1984, R. Conti et al eds. Lecture Notes in Maths. 1190, Springer Verlag, 1986, 191- 213. [66] S. Dolecki, Convergence of minima in convergence spaces, Optimization (1986) 553-572.
17
[67] N. Dunford and J. T. Schwartz, Linear Operators, vol.1, Interscience, New York, 1958.
316
J. P. Penot
[68] S. Eilenberg, Homotopie et Espaces Fibres, unpublished Lectures, Paris 19661967. [69] K.-H. Elster and J. Thierfelder, On cone approximations and generalized di rectional derivatives in: Nonsmooth Optimization and Related Topics, F. H. Clarke, V. F. Dem'yanov and F. Giannessi eds., Plenum Press, New York, 1989, 134-154. [70] B. El Abdouni and L. Thibault, Quasi-interiorly e-tangent cones to multifunctions, Numerical Functional Analysis and Optimization 10 (7&8) (1989), 619641. [71] M. Fabian, Subdifferentials, local e—supports and Asplund spaces, Journal of the London Mathematical Society 34 (1986) 568-576. [72] M. Fabian, On classes of subdifferentiability spaces of loffe, Nonlinear Theory, Methods and Applications 12 (1) (1988) 63-74.
Analysis
[73] M. Fabian, Subdifferentiability and trustworthiness in the light of a new variational principle of Borwein and Preiss, Acta University Carolinae Math, et Phys. 30 (2) (1989) 51-56. [74] M. Fabian and N. V. Zhivkov, A characterization of Asplund spaces with the help of local e— supports of Ekeland and Lebourg, Comptes rendus Acad. bulgare Sci. 38 (6) (1985) 671-674. [75] H. Frankowska, The first order necessary conditions for nonsmooth variational and control problems, SIAM Journal on Control and Optimization 22 (1) (1984) 1-12. [76] H. Frankowska, S. Plaskacz and T. Rzezuchovski, Measurable viability theorems and Hamilton-Jacobi-Bellman equation, Cahiers Ceremade No 9207, Universite Paris IX, 1992. [77] A. Frolicher and W. Bucher, Calculus in vector spaces without norm, Lecture Notes in Math, 30 (1966) Springer Verlag, Berlin. [78] E. Giner, Etude sur les fonctionnelles integrates, these d'Etat, Universite of Pau, 1985. [79] J.-B. Hiriart-Urruty, New concepts in nondifferentiable programming, Bulletin de la Societe Mathematique de France, Memoire No 60 (1979) 57-85. [80] R. B. Holmes, Geometric Functional Analysis and its Applications, Verlag, New York, 1975.
Springer
Convergence Theories
317
[81] A. D. Ioffe, Subdifferentiability spaces and nonsmooth analysis, Bulletin of the American Mathematical Society 10 (1984) 87-89. [82] A. D. Ioffe, On the theory of subdifferential, in Fermat Days 85: Mathematics for Optimization, J. B. Hiriart-Urruty (ed) Elsevier Sci. Pub. (North Holland) Amsterdam (1986) 183-200. [83] A. D. Ioffe, Approximate subdifferentials and applications II, Mathematika 33 (1986) 111-128. [84] A. D. Ioffe, Approximate subdifferentials and applications 3: the metric theory, Mathematika 36 (1) (1989) 1-38. [85] A. D. Ioffe, On some recent developments in the theory of second order optimality conditions, in Optimization, S. Dolecki ed., Lecture Notes in Maths, vol. 1405, Springer Verlag Berlin (1989) 55-68. [86] A. D. Ioffe, Variational analysis of a composite function : a formula for the second-order epi-derivative, Journal of Mathematical Analysis and Applications 160 (2) (1991) 379-405. [87] A. D. Ioffe, Composite optimization : second order conditions, value functions and sensitivity, Proc. Symposium Antibes, June 1990, A. Bensoussan and J.-L. Lions ed., Lecture Notes in Control and Information Sc. 144, Springer Verlag, (1990) 442-452. [88] A. D. Ioffe, Proximal analysis and approximate subdifferentials, Journal of the London Mathematical Society 41 (1990) 175-192. [89] A. D. Ioffe, Non-smooth subdifferentials : their calculus and applications, Pro ceedings International Symposium on Nonlinear Analysis, Tampa, August 1992. [90] A. Jofre and J.-P. Penot, Comparing new notions of tangent cones, Journal of the London Mathematical Society (2) 40 (1989) 280-290. [91] A. Jofre and L. Thibault, Proximal and Frechet normal formulae for some small normal cones in Hilbert space, Journal of Nonlinear Analysis Theory, Methods and Applications 19 (7) (1992), 599-612. [92] J.-L. Joly, Une famille de topologies et de convergences sur l'ensembledes fonctionnelles convexes , these d'Etat, Universite de Grenoble, 1970. [93] J.-L. 
Joly, Une famille de topologies sur l'ensemble des fonctions convexes pour lesquelles la polarite est bicontinue, Journal de mathematiques pures et appliquees 52 (1973) 421-441.
318
J. P. Penot
[94] T. Kato, Perturbaiion Theory for Linear Operators, Springer-Verlag, New York (1966). [95] H. Kawasaki, An envelop-like effect of infinitely many inequality constraints on second-order necessary conditions for minimization problems, Mathematical Programming 41 (1988) 73-96. [96] J. L. Kelley, General Topology, Van Nostrand, Princeton, 1955. [97] J. L. Kelley and I. Namioka, Linear Topological Spaces, Van Nostrand, Princeton, 1963. [98] E. Klein and A. Thompson, Theory of Correspondences, Wiley, Toronto (1984). [99] A. Kruger, Properties of generalized differentials, Siberian Journal 26 (1985) 822-832.
Mathematics
[100] P. D. Loewen, The proximal normal formula in Hilbert space, Journal of Nonlinear Analysis Theory, Methods and Applicaiions 11 (1987) 979-995. [101] P. D. Loewen. Limits of Frechet normals in nonsmooth analysis, in: Optimization and Nonlinear Analysis, A. Ioffe et al. Eds. Pitman Research Notes 244, Longman, Harlow, 1992. [102] P. D. Loewen, A Mean Value Theorem for Frechet subgradients, Preprint, University British Columbia, Vancouver, August 1992. [103] J. E. Marsden, Countable and net convergence, The American Monthly 75 (1968) 397-398.
Mathematical
[104] E. J. McSchane, Partial orderings and Moore-Smith limits, The American Mathematical Monthly 59 (1952) 1-10. [105] Ph. Michel and J.-P. Penot, A generalized derivative for calm and stable functions, Differential and fntegral Equations 5 (2) (1992) 433-454. [106] B. S. Mordukhovich, Approximation Methods in Problems of Optimization and Control, Nauka, Moscow, 1988 (Russian; English translation to appear in Wiley Interscience) [107] B. S. Mordukhovich and Yongheng Shao, Extremal characterizations of Asplund spaces, to appear, Proceedings of the American Mathematical Society. [108] B. S. Mordukhovich and Yongheng Shao, Nonsmooth sequential analysis in Asplund spaces, Preprint, Wayne State University Detroit, 1994. [109] J. J. Moreau, Intersection of moving sets in a normed space, Scandinavica 36 (1975) 159-173.
Mathematica
Convergence Theories
319
110] U. Mosco, Convergence of convex sets and solutions of variational inequalities, Advances in Mathematics 3 (1969) 510-585. I l l ] F. Murat, Compacite par compensation, Annali della Scuola normale superiore di Pisa, Classe di Scienze (4) 5 (1978) 481-507. 112] F. Murat, H-convergence, Rapport du seminaire d'analyse fonctionnelle et numerique de l'Universite d'Alger, (1978). 113] J.-P. Penot, Topologies faibles sur des varietes de Banach, C.R. Acad. Sci. Paris 274 (1972) 405-408. 114] J.-P. Penot, Topologies faibles sur des varietes de Banach. Application aux geodesiques des varietes de Sobolev, Journal Differential Geometry 9 (1974) 141-168. 115] J.-P. Penot, Calcul sous-differentiel et optimisation, Journal of functional anal ysis 27 (2) (1978) 248-276. 116] J.-P. Penot, A characterization of tangential regularity, Journal of Nonlinear Analysis Theory, Methods and Applications 5(6) (1981) 625-643. 117] J.-P. Penot, Variations on the theme of nonsmooth analysis : another subdifferential, in: Nondifferentiable Optimization: Motivations and Applications, Proc Sopron, 1984, V.F. Demyanov and D. Pallasche, ed., Lecture Notes in Econ. and Math. Systems No 255, Springer-Verlag Berlin 1985, 41-54. 118] J.-P. Penot, Preservation of persistence and stability under intersections and operations , Preprint (1986), to appear Journal of Optimization Theory and Applications 119] J.-P. Penot, The cosmic Hausdorff topology, the bounded Hausdorff topology and continuity of polarity, Proceedings of the American Mathematical Society 113 (1991) 275-285. 120] J.-P. Penot, Topologies and convergences on the set of convex functions Journal of Nonlinear Analysis Theory, Methods and Applications 18 (10) (1992) 905916. 121] J.-P. Penot, Optimality conditions in mathematical programming, Preprint (1990) 122] J.-P. Penot, Optimality conditions for composite functions, Preprint (1990). 123] J.-P. 
Penot, On the convergence of subdifferentials of convex functions, Nonlin ear Analysis Theory, Methods and Applications 21 (2) (1993) 87-101.
320
J. P. Penot
1241 J.-P. Penot, Second-order generalized derivatives : relationships with conver gence notions, in: Nonsmooth Optimization : Methods and Applications, F. Giannessi, ed., Erice (1991) Gordon and Breach, Philadelphia, (1992) 303-322. 1251 J.-P- Penot, Optimality conditions for minimax problems, semi-infinite pro gramming problems and their relatives, Preprint. 1261 J.-P. Penot, Miscellaneous incidences of convergence theories in optimization and nonlinear analysis I: behavior of solutions, Set-Valued Analysis 2 (1994) 259-274. 1271 J.-P. Penot, On the interchange of subdifferentiation and epi-convergence, to appear in Journal of Mathematical Analysis and Applications. 128] J.-P. Penot, Yet another Mean Value Theorem, submitted. 129] J.-P. Penot, Favorable classes of mappings and multimappings in nonlinear analysis and optimization, to appear in Journal of Convex Analysis. 130] J.-P. Penot, Stabilized subdifferentials, in preparation. 131] J.-P. Penot and P. Terpolilli, Cones tangents et singularites, C. R. Acad. Sc. Paris, 296 (1983) 721-724. 1321 R- R- Phelps, Convex Functions, Monotone Operators and Differentiability, Lecture Notes in Mathematics, 1363, Springer-Verlag, New York, 1989. 1331 F. Plastria, Lower subdifferentiable functions and their minimization by cutting planes, Journal of Optimization Theory and Applications 46 (1985) 37-53. 134] R. T. Rockafellar, Directionally Lipschitzian functions and subdifferential cal culus, Proceedings of the London Mathematical Society 39 (1979) 331-355. 135] R. T. Rockafellar, Generalized directional derivatives and subgradients of nonconvex functions, Canadian Journal of Mathematics 32 (1980) 157-180. 136] R. T. Rockafellar, The Theory of Subgradients and its Applications to Problems of Optimization: Convex and Nonconvex Functions, Heldermann Verlag, Berlin, 1981. 137] R. T. Rockafellar and R. J. B. Wets, book in preparation. 138] E. 
Sanchez-Palancia, Homogenization Techniques for Composite Media, Lecture Notes in Physics 272, Springer-Verlag, Berlin (1987). [139] Y. Sonntag, Convergence des suites d'ensembles, monograph, to appear.
Convergence Theories
321
1401 N. E. Steenrod, A convenient category of topological spaces, The Michigan Mathematical Journal 14 (1967) 133-152. 1411 Y. Sonntag and C. Zalinescu, Set convergences: a survey and a classification, Set-valued Analysis 2 (1994) 339-356. 1421 L- Thibault, On subdifferentials of optimal value functions, SIAM Journal on Control and Optimizations 29 (5) (1991) 1019-1036. 143] L. Thibault and D. Zagrodny, Integration of Subdifferentials of Lower Semicontinuous Functions on Banach Spaces, Preprint, 1992. 1441 J- S. Treiman, Clarke's generalized gradients and epsilon-subgradients in Ba nach spaces, Transactions of the American Mathematical Society 294 (1986) 65-78. 1451 J- S. Treiman, Shrinking generalized gradients, Journal of Nonlinear Analysis Theory, Methods and Applications 12 (1988) 1429-1450. 1461 J- S. Treiman, An infinite class of convex tangent cones, Journal of Optimization Theory and Applications 68 (3) (1991) 563-582. 1471 J. S. Treiman, The linear nonconvex generalized gradient and Lagrange multi pliers, Preprint, Western Michigan University, Kalamazoo, January 1993. 1481 D. Walkup and R. Wets, Continuity of some convex cone-valued mappings, Proceedings of the American Mathematical Society 18 (1967) 229-253. 1491 D. E. Ward, Convex subcones of the contingent cone in nonsmooth calculus and optimization, Transactions of the American Mathematical Society 302 (2) (1987) 661-682. 1501 D. E. Ward, Which subgradients have sum formulas? Journal of Nonlinear Analysis Theory, Methods and Applications 12 (1988) 1231-1243 15ll D. E. Ward, The quantificational tangent cones, Canadian Journal of Mathe matics 40 (3) (1988) 666-694. 1521 J- Warga, Derivate Containers, Inverse Functions and Controllability, Calculus of Variations and Control Theory, Academic Press, New York, (1976) 13-46. 1531 J- Warga, An implicit function theorem without differentiability, Proceedings of the American Mathematical Society 69 (1978) 65-69. 1541 A. 
Wilanski, Topics in Functional Analysis, Lecture Notes in Math. 45 Springer Verlag, Berlin 1967.
Recent Advances in Nonsmooth Optimization, pp. 322-350. Eds. D.-Z. Du, L. Qi and R. S. Womersley. ©1995 World Scientific Publishing Co Pte Ltd
Second-Order Nonsmooth Analysis in Nonlinear Programming

René Poliquin¹
Department of Mathematical Sciences, University of Alberta, Edmonton, Alberta, Canada T6G 2G1

Terry Rockafellar²
Department of Mathematics, University of Washington, Seattle, WA 98195, USA
Abstract

Problems of nonlinear programming are placed in a broader framework of composite optimization. This allows second-order smoothness in the data structure to be utilized despite apparent nonsmoothness in the objective. Second-order epi-derivatives are shown to exist as expressions of such underlying smoothness, and their connection with several kinds of second-order approximation is examined. Expansions of the Moreau envelope functions and proximal mappings associated with the essential objective functions for certain optimization problems in composite format are studied in particular.

1 Introduction
Problems in nonlinear programming are customarily stated in terms of a finite system of equality and inequality constraints, defining a feasible set over which a certain function is to be minimized. For most numerical work it is assumed that the constraint and objective functions are C², so that second-order methodology can be utilized.

¹This work was supported in part by the Natural Sciences and Engineering Research Council of Canada under grant OGP41983.
²This work was supported in part by the National Science Foundation under grant DMS-9200303.
This is taken as the model for "smooth" optimization, and any problem whose objective function fails to enjoy such differentiability, for instance by being only piecewise C², belongs then to the category of "nonsmooth" optimization. But in practice a distinction between smooth and nonsmooth optimization based on such grounds is artificial. Many problems that start out with a nonsmooth objective, perhaps involving penalty functions and "max" expressions, can be recast with a smooth objective. On the other hand, nominally smooth problems with inequality constraints inherently exhibit nonsmoothness in their geometry. Anyway, techniques for solving those problems often veer into nonsmoothness by appealing to merit functions or dualization.

The real issue in numerical and theoretical optimization alike is how to represent and exploit to the fullest whatever degree of smoothness may be available in a problem's elements. In this respect the traditional format falls short. Its deficiency is that it places all the emphasis in problem formulation on making a list of constraints, which must be simple equations or inequalities, each associated with an explicit constraint function, and afterward merely specifying one additional function for the objective. While a vehicle is provided for working with nonsmoothness in the boundary of the feasible set, none is provided for nonsmoothness as it might be found in the graph of the function being minimized, or for that matter for any other structural features of the objective.

In contrast, the composite format for problems of optimization treats both constraints and objective more supportively and is able to span a wider range of situations with ease. In the composite format, a problem is set up by specifying a representation of the type

(V)    minimize f(x) := g(F(x)) over x ∈ ℝⁿ,

where F : ℝⁿ → ℝᵐ is the data mapping and g : ℝᵐ → ℝ̄ is the model function. The mapping F supplies the problem's special elements and carries its smoothness, whereas the function g provides the structural mold. Not only does g have no need to be smooth, it can even be extended-real-valued, with values in ℝ̄ = [−∞, ∞] instead of just ℝ = (−∞, ∞). This is central to the idea. The feasible set in (V) is defined to be

C = dom f := {x | f(x) < ∞} = {x | F(x) ∈ D},  where  D = dom g := {u | g(u) < ∞}.

Here we aim at applying second-order nonsmooth analysis to the essential objective function f of problem (V) in this format. Keeping close to the ordinary domain of nonlinear programming, if not the usual framework, we concentrate on the case in which

• F is a C² mapping, and
• g is a proper, convex function that is polyhedral,
i.e., such that the set epi g := {(u, α) ∈ ℝᵐ × ℝ | α ≥ g(u)} is polyhedral convex, cf. [17, Section 19]. The set D is polyhedral then as well.

The nature and extent of the problem class covered under these restrictions is explored in Section 2 along with the relationship to "amenable" functions, which by definition have composite expressions f = g∘F with smooth F and convex g satisfying a certain constraint qualification. For amenable functions a highly developed theory of first- and second-order generalized derivatives is now in place and ready for application under the circumstances described here. Formulas for such derivatives are worked out in Section 3 and incorporated into optimality conditions in the composite format, in particular second-order conditions related to epigraphical approximation. In Section 4, second-order expansions in terms of uniform convergence instead of epigraphical convergence are studied, and the question of Hessian matrices in a standard or generalized sense is taken up. Finally, Section 5 analyzes the Moreau envelope functions

e_λ(x) := min_{x'} { f(x') + (1/2λ)|x' − x|² }   for λ > 0,

which relate to epigraphical approximation of f because e_λ(x) increases to f(x) as λ ↓ 0. These functions not only approximate but provide a kind of regularization of f. While f may be extended-real-valued and have discontinuities (in particular, jumps to ∞), e_λ is finite and locally Lipschitz continuous and has one-sided directional derivatives at all points. Moreover, the minimizing sets agree: argmin e_λ = argmin f for all λ > 0. We investigate the degree to which second-order properties of e_λ at minimizing points x̄ correspond to such properties of f at these points. Second-order properties of e_λ have a bearing on numerical techniques like the proximal point algorithm in the minimization of f, since they inevitably depend on the proximal mapping

P_λ(x) := argmin_{x'} { f(x') + (1/2λ)|x' − x|² }   for λ > 0.

This phase of our effort owes its inspiration to recent work of Lemaréchal and Sagastizábal [6], followed by Qi [16], who were motivated by the goals just mentioned. These authors have concentrated on finite, convex functions f, not necessarily of the composite form adopted here, whereas we relinquish convexity and welcome infinite values in order to obtain results that deal with constraints. On the other hand, Qi [16] takes up the topic of semismoothness of ∇e_λ, which is not addressed here.
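As a small numerical illustration of these definitions (ours, not from the paper), the envelope e_λ and proximal point P_λ of f(x) = |x| can be computed by brute force over a grid; for this f the proximal point is the classical soft-thresholding value, and one can check that e_λ(x) increases toward f(x) as λ decreases.

```python
import numpy as np

def moreau_envelope_and_prox(f, x, lam, grid):
    """Brute-force e_lam(x) = min_{x'} f(x') + |x'-x|^2/(2*lam)
    and the corresponding minimizer P_lam(x), over a discrete grid."""
    vals = f(grid) + (grid - x) ** 2 / (2.0 * lam)
    k = np.argmin(vals)
    return vals[k], grid[k]

f = np.abs                               # f(x) = |x|: convex, nonsmooth at 0
grid = np.linspace(-3.0, 3.0, 60001)     # fine grid for the inner minimization

x = 1.5
e1, p1 = moreau_envelope_and_prox(f, x, lam=1.0, grid=grid)
e2, p2 = moreau_envelope_and_prox(f, x, lam=0.1, grid=grid)

# For f = |.|, P_lam(x) = sign(x) * max(|x| - lam, 0) (soft thresholding).
assert abs(p1 - 0.5) < 1e-3
assert abs(p2 - 1.4) < 1e-3
# e_lam(x) increases to f(x) as lam decreases to 0.
assert e1 < e2 < f(x) + 1e-12
```

Note also that both envelopes attain their minimum at 0, matching argmin e_λ = argmin f for this example.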
2 Problem Characteristics and Amenability

To understand better the class of optimization problems (V) covered by the composite format through some choice of a C² mapping F and polyhedral function g, it helps first to see how problems that are stated in the traditional manner can be accommodated.
Example 2.1. For C² functions f₀, f₁, …, f_m on ℝⁿ, consider the minimization of f₀(x) subject to

f_i(x) ≤ 0 for i = 1, …, s,    f_i(x) = 0 for i = s+1, …, m.

This fits the composite format of minimizing f = g∘F over ℝⁿ for the C² mapping F : ℝⁿ → ℝ^{m+1} defined by F(x) = (f₀(x), f₁(x), …, f_m(x)) and the polyhedral function g : ℝ^{m+1} → ℝ̄ defined by

g(u) = g(u₀, u₁, …, u_m) = u₀ if u_i ≤ 0 for i = 1, …, s and u_i = 0 for i = s+1, …, m;  g(u) = ∞ otherwise.

Next we look at an apparently very different model, which illustrates accommodations that can be made to nonsmoothness.

Example 2.2. For C² functions f₁, …, f_m on ℝⁿ, consider the minimization of

f(x) = max {f₁(x), …, f_m(x)}

over all x ∈ ℝⁿ (no constraints). This fits the composite format f = g∘F with F(x) = (f₁(x), …, f_m(x)) and g(u) = g(u₁, …, u_m) = max {u₁, …, u_m}. The mapping F is C² and the function g is polyhedral. It is well known that this kind of problem, although nominally concerned with unconstrained minimization of a nonsmooth function, can be posed instead in terms of minimizing a linear function subject to smooth inequality constraints. Indeed, in the notation x̃ = (x, α) ∈ ℝ^{n+1} it corresponds to minimizing f̃₀(x̃) subject to f̃_i(x̃) ≤ 0 for i = 1, …, m, where f̃₀(x̃) = α and f̃_i(x̃) = f_i(x) − α for i = 1, …, m. Thus it surely deserves to be treated on a par with other problems where smoothness dominates the numerical methodology, at least as long as the dimension n is not unduly large.

Another sort of flexibility in the composite model comes to light in the way constraints can be handled in patterns deviating from the standard one in Example 2.1. Simple equations and inequalities can be supplemented by conditions that restrict a function's values to lie in a certain interval. Box constraints on x do not have to be written with explicit constraint functions at all.

Example 2.3. For C² functions f₀, f₁, …, f_m on ℝⁿ, nonempty closed intervals I₁, …, I_m in ℝ and a nonempty polyhedral set X ⊂ ℝⁿ, consider the problem of minimizing f₀(x) over the set

C := {x ∈ X | f_i(x) ∈ I_i, i = 1, …, m},

or equivalently, minimizing f(x) over all x ∈ ℝⁿ in the case of

f(x) = f₀(x) + δ_C(x) = f₀(x) if x ∈ C;  f(x) = ∞ if x ∉ C.
This concerns f = g∘F for the C² mapping F : ℝⁿ → ℝ^{m+n+1} defined by

F(x) = (f₀(x), f₁(x), …, f_m(x), x)

and the polyhedral function g : ℝ^{m+n+1} → ℝ̄ defined by

g(u) = g(u₀, u₁, …, u_m, u_{m+1}, …, u_{m+n}) = u₀ if u_i ∈ I_i for i = 1, …, m and (u_{m+1}, …, u_{m+n}) ∈ X;  g(u) = ∞ otherwise.

Example 2.3 encompasses Example 2.1 as the special case where X = ℝⁿ and I_i = (−∞, 0] for i = 1, …, s but I_i = [0, 0] for i = s+1, …, m. On the other hand, Example 2.3 could be extended by taking f₀ to be a max function as in Example 2.2, f₀(x) = max {f₀₁(x), …, f₀ᵣ(x)}. Then the C² functions f₀ₖ would become additional components of F, and the u₀ part of u would turn into a vector (u₀₁, …, u₀ᵣ), with max {u₀₁, …, u₀ᵣ} entering the formula for g(u).

An alternative way of arriving at nonsmoothness in the objective is illustrated by the following model.

Example 2.4. For C² functions f₀, f₁, …, f_m on ℝⁿ and proper polyhedral functions g_i : ℝ → ℝ̄ for i = 1, …, m, the problem of minimizing

f₀(x) + g₁(f₁(x)) + ⋯ + g_m(f_m(x))

over all x ∈ ℝⁿ corresponds to f = g∘F for the C² mapping F with F(x) = (f₀(x), f₁(x), …, f_m(x)) and the polyhedral function g with

g(u) = g(u₀, u₁, …, u_m) = u₀ + g₁(u₁) + ⋯ + g_m(u_m).
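To make the composite format concrete, here is an illustrative sketch (ours, with arbitrarily chosen data functions) of an instance of Example 2.4 in which each g_i is a one-sided linear penalty g_i(u) = c·max(u, 0), a piecewise linear relaxation of the constraint f_i(x) ≤ 0:

```python
import numpy as np

# Data mapping F(x) = (f0(x), f1(x), f2(x)): smooth (here C^2) components.
def F(x):
    f0 = x[0] ** 2 + x[1] ** 2          # smooth objective part
    f1 = x[0] + x[1] - 1.0              # constraint function for f1(x) <= 0
    f2 = -x[0]                          # constraint function for f2(x) <= 0
    return np.array([f0, f1, f2])

# Model function g(u) = u0 + g1(u1) + g2(u2) with piecewise linear penalties
# g_i(u) = c * max(u, 0), which relax the constraints f_i(x) <= 0.
def g(u, c=10.0):
    return u[0] + c * max(u[1], 0.0) + c * max(u[2], 0.0)

def f(x):
    """Essential objective in composite format: f = g o F."""
    return g(F(x))

x_feas = np.array([0.3, 0.3])           # satisfies both constraints
x_infeas = np.array([1.0, 1.0])         # violates f1(x) <= 0
assert f(x_feas) == F(x_feas)[0]        # no penalty where constraints hold
assert f(x_infeas) > F(x_infeas)[0]     # penalty active where they fail
```

The point of the format shows up in the split of roles: all the smoothness lives in F, while g, although nonsmooth, is a fixed polyhedral "mold" that never changes with the data.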
Polyhedral functions g_i of a single real variable as in Example 2.4 are piecewise linear convex functions in the obvious sense, except that they could have the value ∞ outside of some closed interval I_i. As a special case, such a function could have just one "piece," being affine on I_i, or even just 0 on I_i (with the term g_i(f_i(x)) then just representing a constraint f_i(x) ∈ I_i). Piecewise linear functions with multiple slopes arise in a setting like Example 2.4 when constraints are relaxed by linear penalty expressions. Of course, a geometric constraint x ∈ X with X polyhedral (e.g. a box, a product of closed intervals, not necessarily bounded) could be built into Example 2.4 as in Example 2.3.

Within nonsmooth analysis, the composite format in optimization is closely associated with the concept of "amenability." For simplicity in stating the definition and working with it in the rest of the paper, we introduce the following notation. For any mapping F : ℝⁿ → ℝᵐ and any vector y ∈ ℝᵐ we simply write yF for the scalar function defined by (yF)(x) = ⟨y, F(x)⟩. Thus,

(yF)(x) = y₁f₁(x) + ⋯ + y_m f_m(x)  when  F = (f₁, …, f_m) and y = (y₁, …, y_m),
and if F is C¹ with Jacobian ∇F(x) one has further that

∇(yF)(x) = y₁∇f₁(x) + ⋯ + y_m∇f_m(x) = ∇F(x)ᵀ y.
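This identity is easy to confirm numerically (a sketch with an arbitrary smooth F chosen only for illustration): the gradient of the scalarization yF agrees with ∇F(x)ᵀy, where the Jacobian is approximated by central finite differences.

```python
import numpy as np

def F(x):
    # An arbitrary smooth mapping R^2 -> R^3, used only for this check.
    return np.array([x[0] ** 2, x[0] * x[1], np.sin(x[1])])

def jacobian_fd(F, x, h=1e-6):
    """Central finite-difference Jacobian: rows = components of F, cols = variables."""
    m, n = F(x).size, x.size
    J = np.zeros((m, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        J[:, j] = (F(x + e) - F(x - e)) / (2 * h)
    return J

x = np.array([0.7, -0.4])
y = np.array([1.0, 2.0, -3.0])

# Gradient of the scalar function (yF)(x) = <y, F(x)> ...
grad_yF = jacobian_fd(lambda z: np.array([y @ F(z)]), x)[0]
# ... equals grad F(x)^T y, as stated in the text.
assert np.allclose(grad_yF, jacobian_fd(F, x).T @ y, atol=1e-5)
```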
Definition 2.5. A function f : ℝⁿ → ℝ̄ is amenable at x̄ if f(x̄) is finite and, at least locally around x̄, there is a representation f = g∘F in which the mapping F is C¹, the function g is proper, lsc (lower semicontinuous) and convex, and the following condition, an abstract constraint qualification, is satisfied by the normal cone N_D(F(x̄)) to the convex set D = dom g at F(x̄):

(CQ)    there is no vector y ≠ 0 in N_D(F(x̄)) with ∇(yF)(x̄) = 0.

It is strongly amenable if F is C² rather than just C¹, and fully amenable if, in addition, g is piecewise linear-quadratic.

To say that g is piecewise linear-quadratic is to say that its effective domain D is the union of finitely many polyhedral sets, on each of which the formula for g is linear-quadratic, i.e., a polynomial of degree at most 2. When no quadratic terms are involved, g is just piecewise linear (piecewise affine might be a better term). The convex functions that are piecewise linear are precisely the polyhedral functions of convex analysis we have been referring to so far. This leads to the following observation, which paves the way for us to apply the theory of amenable functions, cf. [11]-[15], to the class of problems under consideration.

Proposition 2.6. For problem (V) in the composite format with F of class C² and g polyhedral, let x̄ be a point of the feasible set C at which the constraint qualification (CQ) is satisfied. Then the essential objective function f is fully amenable at all points x ∈ C in some neighborhood of x̄.

Proof. This merely records the import for problem (V) of the observations just made, utilizing the fact that if (CQ) holds at x̄ it must hold for all x ∈ C in some neighborhood of x̄ (cf. [13]). □

The constraint qualification (CQ) is satisfied trivially when F(x̄) ∈ int D, since N_D(u) = {0} at all points u ∈ int D. To see what it means in other situations, we inspect the preceding examples one by one.

Example 2.1'. In Example 2.1, the constraint qualification (CQ) reduces to the Mangasarian-Fromovitz condition (written in its equivalent dual form): unless all the coefficients y₁, …, y_m are taken to be 0, it is impossible to have the equation

y₁∇f₁(x̄) + ⋯ + y_m∇f_m(x̄) = 0

with y_i ≥ 0 for indices i ∈ {1, …, s} such that f_i(x̄) = 0, and y_i = 0 for indices i ∈ {1, …, s} such that f_i(x̄) < 0 (but y_i unrestricted for indices i ∈ {s+1, …, m}).

Detail. The set D in this case consists of all vectors u = (u₁, …, u_m) such that u_i ≤ 0 for i = 1, …, s and u_i = 0 for i = s+1, …, m. For any u ∈ D, therefore, the
normal cone N_D(u) consists of the vectors y with y_i ≥ 0 for i ∈ {1, …, s} such that u_i = 0, whereas y_i = 0 for i ∈ {1, …, s} such that u_i < 0. □

Example 2.2'. In Example 2.2, condition (CQ) reduces to triviality; it is satisfied automatically at every point x̄ ∈ ℝⁿ.

Detail. In this case D = ℝᵐ, hence F(x̄) ∈ int D always. □

Example 2.3'. In Example 2.3, the constraint qualification (CQ) at a feasible point x̄ means that the only multipliers y_i ∈ N_{I_i}(f_i(x̄)) satisfying

−∑_{i=1}^{m} y_i ∇f_i(x̄) ∈ N_X(x̄)

are y₁ = 0, …, y_m = 0. Here I_i is a closed interval with lower bound a_i and upper bound b_i (these bounds possibly being infinite, with a_i ≤ b_i), and the relation y_i ∈ N_{I_i}(f_i(x̄)) restricts the sign of y_i in the following pattern, depending on how the constraint f_i(x̄) ∈ I_i is satisfied at x̄ relative to these bounds:
y_i ∈ N_{I_i}(f_i(x̄))  ⟺  y_i ≥ 0   when a_i < f_i(x̄) = b_i,
                           y_i ≤ 0   when a_i = f_i(x̄) < b_i,
                           y_i = 0   when a_i < f_i(x̄) < b_i,
                           y_i free  when a_i = f_i(x̄) = b_i.

Detail. The representation f = g∘F for this case has D = I₁ × ⋯ × I_m × X, and consequently

N_D(F(x̄)) = N_{I₁}(f₁(x̄)) × ⋯ × N_{I_m}(f_m(x̄)) × N_X(x̄).

The characterization of the one-dimensional relations y_i ∈ N_{I_i}(u_i) is elementary. □
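The four-case characterization of the interval normal cone is simple enough to encode directly (an illustrative sketch; the function name is ours): given bounds a ≤ b and a point u ∈ [a, b], it reports which signs a multiplier y ∈ N_[a,b](u) may take.

```python
import math

def interval_normal_cone(a, b, u, tol=1e-12):
    """Signs allowed for y in N_[a,b](u), following the four cases in the text.
    Returns one of '>=0', '<=0', '=0', 'free'."""
    at_lower = abs(u - a) <= tol
    at_upper = abs(u - b) <= tol
    if at_lower and at_upper:      # a = u = b: singleton interval, y free
        return 'free'
    if at_upper:                   # a < u = b: upper bound active
        return '>=0'
    if at_lower:                   # a = u < b: lower bound active
        return '<=0'
    return '=0'                    # a < u < b: interior point

assert interval_normal_cone(-math.inf, 0.0, 0.0) == '>=0'   # u_i <= 0 active
assert interval_normal_cone(-math.inf, 0.0, -1.0) == '=0'
assert interval_normal_cone(0.0, 0.0, 0.0) == 'free'        # equality constraint
assert interval_normal_cone(0.0, 2.0, 0.0) == '<=0'
```

The first two assertions recover exactly the special case I_i = (−∞, 0] noted in the following paragraph of the text.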
Note that the constraint qualification in Example 2.3' reduces to the Mangasarian-Fromovitz condition in Example 2.1' when X is the whole space, so that N_X(x̄) = {0}, while I_i = (−∞, 0] for i = 1, …, s (so that N_{I_i}(u_i) equals [0, ∞) if u_i = 0 but equals {0} if u_i < 0), whereas I_i = [0, 0] for i = s+1, …, m (so that N_{I_i}(u_i) = (−∞, ∞) as long as u_i = 0).

Example 2.4'. In Example 2.4 with the closed intervals dom g_i denoted by I_i (these possibly being all of ℝ for some indices i), the constraint qualification (CQ) takes the same form as it does in Example 2.3', except that N_X(x̄) is replaced by {0}.

The examples have indicated the advantages of the composite format in allowing optimization problems to be expressed in a variety of ways. But just how general is the class of problems the composite format covers under our restrictions? This question is answered by the next result.
Theorem 2.7. The optimization problems that can be placed in the composite format as (V) for a $C^2$ mapping $F$ and a polyhedral function $g$ are precisely the ones which, in principle, concern the minimization over a set $C$, specifiable by a finite system of $C^2$ equality and inequality constraints, of a function $f_0$ that is either $C^2$ itself or expressible as the pointwise max of a finite collection of $C^2$ functions. Moreover, the representation can always be set up in such a way that a point $x \in C$ satisfies the constraint qualification (CQ) for (V) if and only if it satisfies the Mangasarian-Fromovitz condition relative to the equality and inequality constraints utilized in representing $C$.

Proof. If an optimization problem has a representation of the kind described, it fits into the composite format in the manner of Example 2.1 as supplemented by the device explained after Example 2.2. Then (CQ) reduces to the Mangasarian-Fromovitz constraint qualification just as in Example 2.1'. Conversely, suppose $f = g \circ F$ for a $C^2$ mapping $F$ and a polyhedral function $g$. The epigraph set $\operatorname{epi} f$ consists then of the points $(x,\alpha)$ such that $(F(x),\alpha) \in \operatorname{epi} g$. To say that $g$ is polyhedral is to say that $\operatorname{epi} g$ can be represented by a finite system of linear constraints, say

$$(u,\alpha) \in \operatorname{epi} g \iff \begin{cases} l_k(u,\alpha) \le 0 & \text{for } k = 1,\dots,q,\\ l_k(u,\alpha) = 0 & \text{for } k = q+1,\dots,r, \end{cases}$$
where each function $l_k$ is affine on $\mathbb{R}^{m+1}$. Without loss of generality this system can be set up so that the Mangasarian-Fromovitz condition is satisfied at all points of $\operatorname{epi} g$. (Proceeding from an arbitrary system, one can rewrite as equality constraints any inequalities that never hold strictly, and then pare down the list of equality constraints until none is redundant.) The equality constraint functions $l_k$ must have the form $l_k(u,\alpha) = \langle a_k, u\rangle - b_k$ for some vector $a_k \in \mathbb{R}^m$ and scalar $b_k \in \mathbb{R}$, since otherwise the hyperplane defined by $l_k(u,\alpha) = 0$ could not contain $\operatorname{epi} g$. The same form may be present for some of the inequality constraint functions. We can suppose that for a certain $p \le q$ all of the functions $l_k$ for $k = p+1,\dots,r$ have this special form, whereas for $k = 1,\dots,p$ none of them has it. In the latter case we can rescale $l_k$ to write it as $l_k(u,\alpha) = \langle a_k, u\rangle - b_k - \alpha$ for some $a_k \in \mathbb{R}^m$ and $b_k \in \mathbb{R}$, since otherwise, again, the half-space defined by $l_k(u,\alpha) \le 0$ could not contain $\operatorname{epi} g$. The set $D = \operatorname{dom} g$ is given then by
$$u \in D \iff \begin{cases} \langle a_k, u\rangle - b_k \le 0 & \text{for } k = p+1,\dots,q,\\ \langle a_k, u\rangle - b_k = 0 & \text{for } k = q+1,\dots,r. \end{cases}$$
We have $\nabla l_k(u,\alpha) = (a_k,-1)$ for $k = 1,\dots,p$, but $\nabla l_k(u,\alpha) = (a_k,0)$ for $k = p+1,\dots,r$. The fact that the Mangasarian-Fromovitz condition holds everywhere for the system representing $\operatorname{epi} g$ implies that it holds everywhere for this system representing $D$.
R. A. Poliquin and R. T. Rockafellar
Because $\operatorname{epi} f$ consists of all pairs $(x,\alpha)$ such that $(F(x),\alpha) \in \operatorname{epi} g$, it is specified by $l_k(F(x),\alpha) \le 0$ for $k = 1,\dots,q$ and $l_k(F(x),\alpha) = 0$ for $k = q+1,\dots,r$. Let $h_k(x) = \langle a_k, F(x)\rangle - b_k$ for $k = 1,\dots,r$. Thus, according to what we have arranged,

$$(x,\alpha) \in \operatorname{epi} f \iff \begin{cases} h_k(x) - \alpha \le 0 & \text{for } k = 1,\dots,p,\\ h_k(x) \le 0 & \text{for } k = p+1,\dots,q,\\ h_k(x) = 0 & \text{for } k = q+1,\dots,r. \end{cases}$$
In other words, the set $C = \operatorname{dom} f$ is specified by $h_k(x) \le 0$ for $k = p+1,\dots,q$ and $h_k(x) = 0$ for $k = q+1,\dots,r$, and the problem of minimizing $f$ over $\mathbb{R}^n$ corresponds to minimizing over this set $C$ the function $f_0(x) = \max\{h_1(x),\dots,h_p(x)\}$.

How do the constraint qualifications correspond in this framework? Consider any $x \in C$. Condition (CQ) forbids the existence of a nonzero vector $y \in N_D\bigl(F(x)\bigr)$ such that $\nabla(yF)(x) = 0$. We know from the representation given to $D$ that $N_D\bigl(F(x)\bigr)$ consists of all $y = \sum_{k=p+1}^{r} \lambda_k a_k$ such that

$$\lambda_k \begin{cases} \ge 0 & \text{for } k \in \{p+1,\dots,q\} \text{ with } \langle a_k, F(x)\rangle - b_k = 0,\\ = 0 & \text{for } k \in \{p+1,\dots,q\} \text{ with } \langle a_k, F(x)\rangle - b_k < 0,\\ \text{free} & \text{for } k \in \{q+1,\dots,r\}, \end{cases}$$

where furthermore (because the Mangasarian-Fromovitz condition is satisfied universally in the representation of $D$) the vector $y = \sum_{k=p+1}^{r} \lambda_k a_k$ cannot be $0$ unless all the coefficients $\lambda_k$ vanish. It follows that the vectors of the form $\nabla(yF)(x)$ for some $y \in N_D\bigl(F(x)\bigr)$ are precisely those of the form $\sum_{k=p+1}^{r} \lambda_k \nabla h_k(x)$, and that (CQ) requires, under the restrictions listed for $\lambda_k$, that the zero vector cannot be expressed in this form except by taking every $\lambda_k = 0$. Thus, (CQ) at $x$ comes out as identical to the Mangasarian-Fromovitz constraint qualification at $x$ relative to the specification of $C$ by the functions $h_k$. □

In the statement of Theorem 2.7, the words "in principle," "specifiable," and "expressible" warn that although it may be possible to reduce a problem to the special form described, this may be neither easy nor expedient. The advantage of the composite format is that it bypasses such reformulation and allows one to move ahead without it, if that is preferred.
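To make the Mangasarian-Fromovitz condition concrete, here is a toy numerical check (ours, with made-up constraint data, not from the paper): for two active inequality-constraint gradients $a_1 = (1,1)$ and $a_2 = (-1,0)$ at a point, MFCQ holds when some direction $d$ makes both inner products negative; equivalently, the only multipliers $\lambda_k \ge 0$ with $\lambda_1 a_1 + \lambda_2 a_2 = 0$ are zero.

```python
# Sketch (not from the paper): verifying the Mangasarian-Fromovitz condition
# for two hypothetical active inequality-constraint gradients at a point.
grads = [(1.0, 1.0), (-1.0, 0.0)]  # assumed gradients of the active constraints
d = (1.0, -2.0)                    # candidate interior direction

# MFCQ (no equality constraints here): <a_k, d> < 0 for every active gradient.
mfcq = all(a[0] * d[0] + a[1] * d[1] < 0 for a in grads)
print(mfcq)  # True

# Dual view: lam1*(1,1) + lam2*(-1,0) = (0,0) with lam >= 0 forces lam = 0,
# since the second coordinate gives lam1 = 0 and then lam2 = 0.
```
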
3  Subgradients, Epi-derivatives and Optimality
Our task in analyzing problem (V) is greatly assisted by Proposition 2.6. When a function $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ is amenable at $x$, it is Clarke regular at $x$ in particular; cf. [2] and [12]. In consequence, all the various definitions of "subgradient" that might in general be invoked lead to the same set $\partial f(x)$.

Derivatives simplify as well. First-order one-sided derivatives arise from considering difference quotient functions

$$\Delta_{x,t} f : \xi \mapsto \bigl[f(x + t\xi) - f(x)\bigr]/t \quad \text{for } t > 0.$$
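As a small numerical illustration of these difference quotients (ours, not the paper's), take the max function $f(x) = \max(x, -2x)$ on $\mathbb{R}$: at $x = 0$ the quotients $\Delta_{0,t} f$ already coincide, for every $t > 0$, with the positively homogeneous limit $f'_0(\xi) = \max(\xi, -2\xi)$.

```python
# Sketch (not from the paper): first-order difference quotients
#   Delta_{x,t} f (xi) = [f(x + t*xi) - f(x)] / t
# for f(x) = max(x, -2x) at x = 0.
def f(x):
    return max(x, -2 * x)

def diff_quot(x, xi, t):
    return (f(x + t * xi) - f(x)) / t

# Since f is positively homogeneous, Delta_{0,t} f (xi) = max(xi, -2*xi)
# independently of t, which is exactly the epi-derivative f'_0(xi).
for xi in (1.0, -1.0, 0.5):
    print(xi, [diff_quot(0.0, xi, t) for t in (1.0, 0.1, 0.01)])
```
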
Classical differentiability of $f$ at $x$ can be identified with the case where, as $t \searrow 0$, the functions $\Delta_{x,t} f$ converge pointwise, uniformly on all bounded sets, to some linear function. Such uniform convergence, even if to a possibly nonlinear function, is too narrow an idea, though, to serve when $f$ is extended-real-valued, as we wish it to be here in harmony with our mode of handling constraints. A substitute notion with many interesting ramifications can be based instead on epi-convergence of functions, which expresses set convergence of their epigraphs. We say that $f$ is epi-differentiable at $x$ if, as $t \searrow 0$, the functions $\Delta_{x,t} f$ epi-converge to a proper function $h$; such a limit function need not be linear but must of necessity be lsc and positively homogeneous. Then $h$ is the first-order epi-derivative function for $f$ at $x$ and is denoted by $f'_x$. The property of epi-convergence translates into having, for each choice of a sequence $t^\nu \searrow 0$ and a vector $\xi$, that

$$\liminf_\nu \Delta_{x,t^\nu} f(\xi^\nu) \ge f'_x(\xi) \quad \text{for every sequence } \xi^\nu \to \xi,$$
$$\limsup_\nu \Delta_{x,t^\nu} f(\xi^\nu) \le f'_x(\xi) \quad \text{for some sequence } \xi^\nu \to \xi.$$
We say further that $f$ is strictly epi-differentiable at $\bar x$ if, not only as $t \searrow 0$ but as $x \to \bar x$ with $f(x) \to f(\bar x)$, the functions $\Delta_{x,t} f$ epi-converge (the limit in this wider sense necessarily still being the function $f'_{\bar x}$).

Theorem 3.1. Let $f$ be the essential objective function in problem (V), with $f = g \circ F$ for a $C^2$ mapping $F$ and a polyhedral function $g$. Let $\bar x$ be any feasible solution to (V) at which condition (CQ) holds. Then at all feasible solutions $x$ in some neighborhood of $\bar x$, $f$ is epi-differentiable at $x$ and has at least one subgradient there as well, the subgradients being characterized as the vectors $v$ such that

$$f(x') \ge f(x) + \langle v, x' - x\rangle + o(|x' - x|).$$

The epi-derivative function $f'_x$ is convex and positively homogeneous, the subgradient set $\partial f(x)$ is convex and closed, and the two are related by

$$f'_x(\xi) = \sup_{v \in \partial f(x)} \langle v, \xi\rangle, \qquad \partial f(x) = \bigl\{v \,\big|\, f'_x(\xi) \ge \langle v, \xi\rangle \text{ for all } \xi\bigr\}.$$
Furthermore, these epi-derivative functions and subgradient sets are obtained from those for $g$ by the formulas

$$f'_x(\xi) = g'_{F(x)}\bigl(\nabla F(x)\xi\bigr), \qquad \partial f(x) = \bigl\{\nabla(yF)(x) \,\big|\, y \in \partial g\bigl(F(x)\bigr)\bigr\}.$$

In addition, there is a neighborhood $U$ of $\bar x$ such that, relative to $U \times \mathbb{R}^n$, the set of points $(x,v)$ with $x \in U$ and $v \in \partial f(x)$ is closed, and relative to this set, the mapping

$$(x,v) \mapsto Y(x,v) := \bigl\{y \,\big|\, \nabla(yF)(x) = v,\; y \in \partial g\bigl(F(x)\bigr)\bigr\},$$
is locally bounded with closed graph, hence in particular compact-valued, while the function $(x,v) \mapsto f(x)$ is continuous.
Proof. From Proposition 2.6 we know that $f$ is fully amenable at every point $x \in C$ near enough to $\bar x$. All these properties, except for the very last (concerning continuity of $f$), are already understood to hold for any amenable function; see [19], [22]. Really, they only need $F$ to be $C^1$ and $g$ to be lsc, proper, convex. The last property has been established in [14, Prop. 2.5] in the name of strongly amenable functions, but again the proof only requires amenability. □

Moving to second-order concepts, we work with second-order difference quotient functions which depend not only on a point $x$ where $f$ is finite but also on the choice of a subgradient $v \in \partial f(x)$, namely the functions

$$\Delta^2_{x,v,t} f : \xi \mapsto \bigl[f(x + t\xi) - f(x) - t\langle v, \xi\rangle\bigr] \big/ \tfrac12 t^2 \quad \text{for } t > 0.$$
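For a smooth quadratic these second-order quotients are exact for every $t$; the following sketch (ours, not the paper's) checks the normalization by $\tfrac12 t^2$ numerically on a made-up example.

```python
# Sketch (not from the paper): second-order difference quotient
#   Delta2_{x,v,t} f (xi) = [f(x + t*xi) - f(x) - t*<v, xi>] / ((1/2)*t**2)
# for the smooth case f(x1,x2) = x1**2 + 3*x2**2 with v = grad f(x).
def f(x):
    return x[0] ** 2 + 3 * x[1] ** 2

def grad_f(x):
    return (2 * x[0], 6 * x[1])

def second_quot(x, v, xi, t):
    xt = (x[0] + t * xi[0], x[1] + t * xi[1])
    inner = v[0] * xi[0] + v[1] * xi[1]
    return (f(xt) - f(x) - t * inner) / (0.5 * t ** 2)

x = (1.0, -2.0)
v = grad_f(x)
xi = (0.5, 1.0)
# For a quadratic the quotient equals <xi, Hess f(x) xi> = 2*0.5**2 + 6*1**2 = 6.5
# no matter how small t is taken.
print([second_quot(x, v, xi, t) for t in (0.1, 0.01)])
```
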
We say that $f$ is twice epi-differentiable at $x$ for a vector $v$ if $f(x)$ is finite, $v \in \partial f(x)$, and the functions $\Delta^2_{x,v,t} f$ epi-converge to a proper function as $t \searrow 0$. The limit is then the second epi-derivative function $f''_{x,v} : \mathbb{R}^n \to \overline{\mathbb{R}}$; see [12], [19] and [21]. When $\partial f(x)$ is a singleton consisting of $v$ alone, the notation $f''_{x,v}$ can be simplified to $f''_x$. The second epi-derivative function, when it exists, has to be lsc and positively homogeneous of degree 2, although not necessarily quadratic. Further, we call $f$ strictly twice epi-differentiable at $\bar x$ for $\bar v$ if the stronger property holds that the functions $\Delta^2_{x,v,t} f$ epi-converge as $t \searrow 0$, $x \to \bar x$ with $f(x) \to f(\bar x)$, and $v \to \bar v$ with $v \in \partial f(x)$.

It is important to appreciate that, because it is defined in terms of epi-convergence, second-order epi-differentiability is essentially a geometric property of approximation of epigraphs. This kind of approximation differs in general from the classical kind of approximation expressed by uniform convergence of functions on bounded sets, although key relationships can be detected in special situations. Such uniform convergence is not a viable concept for broad use in an environment like ours here. Circumstances where it does nicely come into play will be identified in Sections 4 and 5, where second-order "expansions" of $f$ and its envelopes $e_\lambda$ will be considered. For now, $f''_{x,v}$ has to be thought of as providing a second-order approximation

$$f(x + t\xi) \approx f(x) + t\langle v, \xi\rangle + \tfrac12 t^2 f''_{x,v}(\xi),$$

not in the usual sense of local uniformity, but in the sense of closeness of $\operatorname{epi} \Delta^2_{x,v,t} f$ to $\operatorname{epi} f''_{x,v}$.

A rather remarkable fact about second-order epi-differentiability was established in [19]: when $f$ is fully amenable at $x$, it is twice epi-differentiable there for every $v \in \partial f(x)$. The widespread availability of this property in the context of optimization is what makes it especially interesting. Before looking at what the theory of second epi-derivatives tells us about the class of problems under consideration, we look at a parallel concept which turns out to be closely connected with this.
Second-order differentiation of $f$ can be contemplated also in the framework of first-order differentiation of the subgradient mapping $\partial f : \mathbb{R}^n \rightrightarrows \mathbb{R}^n$ (where $\partial f(x)$ is always regarded as the empty set when $f(x)$ is not finite). For any set-valued mapping $T : \mathbb{R}^n \rightrightarrows \mathbb{R}^m$, one can work with difference quotient mappings $\Delta_{x,v,t} T : \mathbb{R}^n \rightrightarrows \mathbb{R}^m$ associated with pairs $(x,v)$ in the graph of $T$, namely

$$\Delta_{x,v,t} T : \xi \mapsto \bigl[T(x + t\xi) - v\bigr]/t \quad \text{for } t > 0.$$

The mapping $T$ is said to be proto-differentiable at $x$ for $v$ if $v \in T(x)$ and the mappings $\Delta_{x,v,t} T$ converge graphically as $t \searrow 0$, in which event the limit mapping is denoted by $T'_{x,v}$ and called the proto-derivative of $T$ at $x$ for $v$; see [13], [20], [22]. (Graph convergence of these mappings refers to the convergence of their graphs as subsets of $\mathbb{R}^n \times \mathbb{R}^m$.) We say that $T$ is strictly proto-differentiable at $\bar x$ for $\bar v$ if in fact the mappings $\Delta_{x,v,t} T$ converge graphically to $T'_{\bar x,\bar v}$ as $t \searrow 0$ and $(x,v) \to (\bar x,\bar v)$ with $v \in T(x)$.

Again, a geometric notion of approximation is invoked. We have

$$T(x + t\xi) \approx T(x) + t\,T'_{x,v}(\xi),$$

not with respect to some kind of uniform local bound on the difference, but in the sense that the graph of $\Delta_{x,v,t} T$ can be made arbitrarily close to the graph of $T'_{x,v}$ (relative to the concepts of set convergence appropriate for unbounded sets) by taking the parameter $t > 0$ sufficiently small. The mapping $T'_{\bar x,\bar v}$ assigns to each $\xi \in \mathbb{R}^n$ a subset $T'_{\bar x,\bar v}(\xi)$ of $\mathbb{R}^m$, which could be empty for some choices of $\xi$. When $T(x)$ is a singleton consisting of $v$ only (as for instance in the case where $T$ is actually single-valued everywhere), the notation $T'_{x,v}(\xi)$ can be simplified to $T'_x(\xi)$.

In stating the next theorem, we continue the notation introduced in advance of Definition 2.5 by writing $\nabla^2(yF)$ for the matrix of second partial derivatives of the function $yF : x \mapsto \langle y, F(x)\rangle$. Then

$$\nabla^2(yF)(x) = y_1 \nabla^2 f_1(x) + \cdots + y_m \nabla^2 f_m(x)$$

when $F = (f_1,\dots,f_m)$ and $y = (y_1,\dots,y_m)$.
Theorem 3.2. Let $f$ be the essential objective function in problem (V), with $f = g \circ F$ for a $C^2$ mapping $F$ and a polyhedral function $g$. Let $\bar x$ be any feasible solution to (V) at which condition (CQ) holds. Then at all feasible solutions $x$ in some neighborhood of $\bar x$, and for all subgradients $v \in \partial f(x)$,

(a) $f$ is twice epi-differentiable at $x$ for $v$,

(b) $\partial f$ is proto-differentiable at $x$ for $v$,

and the second epi-derivative function $f''_{x,v}$ and proto-derivative mapping $(\partial f)'_{x,v}$ are related to each other by

$$(\partial f)'_{x,v} = \partial\bigl(\tfrac12 f''_{x,v}\bigr).$$

Furthermore, one has the formula

$$f''_{x,v}(\xi) = \begin{cases} \displaystyle\max_{y \in Y(x,v)} \langle \xi, \nabla^2(yF)(x)\xi\rangle & \text{if } \xi \in E(x,v),\\ \infty & \text{if } \xi \notin E(x,v), \end{cases}$$

where $Y(x,v)$ is the compact subset of $\mathbb{R}^m$ defined in Theorem 3.1 and $E(x,v)$ is the closed cone in $\mathbb{R}^n$ defined by

$$E(x,v) = N_{\partial f(x)}(v) = \bigl\{\xi \,\big|\, f'_x(\xi) = \langle v, \xi\rangle\bigr\} = \bigl\{\xi \,\big|\, f'_x(\xi) \le \langle v, \xi\rangle\bigr\}.$$
In fact $Y(x,v)$ and $E(x,v)$ are polyhedral, and in terms of the finite set $Y_{\rm ext}(x,v)$ consisting of the extreme points of $Y(x,v)$ the second-order epi-derivative formula can be written as

$$f''_{x,v}(\xi) = \begin{cases} \displaystyle\max_{y \in Y_{\rm ext}(x,v)} \langle \xi, \nabla^2(yF)(x)\xi\rangle & \text{if } \langle \nabla(yF)(x) - v, \xi\rangle \le 0 \text{ for all } y \in Y(x,v),\\ \infty & \text{otherwise.} \end{cases}$$
Proof. Once more we appeal to Proposition 2.6 for the observation that our hypothesis implies $f$ is fully amenable at points $x$ near enough to $\bar x$ with $f(x)$ finite. Then we apply the twice epi-differentiability result and formula of [19] with the proto-differentiability result and formula of [10]. This, in combination with the results in Theorem 3.1, takes care of all the assertions except those at the end relying on the polyhedral nature of $Y(x,v)$. The fact that $Y(x,v)$ is polyhedral is obvious from its definition in Theorem 3.1 as the set of vectors $y \in \partial g\bigl(F(x)\bigr)$ satisfying the linear equation $\nabla F(x)^T y = v$, since the subgradient set $\partial g\bigl(F(x)\bigr)$ is itself polyhedral (due to $g$ being polyhedral). Indeed, this has previously been observed in [11], [13]. For any fixed vector $\xi$ the function $y \mapsto \langle \xi, \nabla^2(yF)(x)\xi\rangle$ is linear in $y$, so its maximum over $Y(x,v)$ has to be attained at one of the finitely many points of $Y_{\rm ext}(x,v)$. Because the set $\partial f(x)$ is the image of the polyhedral set $\partial g\bigl(F(x)\bigr)$ under the linear mapping $y \mapsto \nabla(yF)(x) = \nabla F(x)^T y$, it is polyhedral as well. Then $E(x,v)$ must be polyhedral, since it is the normal cone to $\partial f(x)$ at $v$. The definition of this normal cone characterizes $E(x,v)$ as consisting of the vectors $\xi$ such that $\langle v' - v, \xi\rangle \le 0$ for all $v' \in \partial f(x)$. Hence it consists of all $\xi$ such that $\langle \nabla(yF)(x) - v, \xi\rangle \le 0$ for all $y \in \partial g\bigl(F(x)\bigr)$. □

The last part of Theorem 3.2 reveals interestingly enough that the second epi-derivative function $f''_{x,v}$ has the same character as that ascribed to $f$ itself in Theorem 2.7, although simpler. It is the max of finitely many $C^2$ (actually quadratic) functions plus the indicator of a set defined by finitely many $C^2$ (actually linear) constraints. Note again that just because we know that a set can in principle be expressed in terms of such constraints, this does not mean we can readily make use of such an
expression. To write $E(x,v)$ in terms of a finite system of linear constraints we would have to identify all extreme points and extreme rays of $\partial g\bigl(F(x)\bigr)$. Depending on the circumstances, this might or might not be easy. Additional formulas for the proto-derivative mapping $(\partial f)'_{x,v}$ can be developed from this description of $f''_{x,v}$ by following the lines in [13].

To see more closely what the results in Theorems 3.1 and 3.2 mean in common situations, we focus on two key cases, the ones in Examples 2.2 and 2.3 (as extended in Examples 2.2' and 2.3').

Example 3.3 [13, Thm. 2]. In the problem of Example 2.2, consider any $x \in \mathbb{R}^n$ and let $I(x)$ denote the set of indices $i$ such that $f_i(x) = f(x)$. Then $f$ is epi-differentiable at $x$ and has at least one subgradient there, with

$$\partial f(x) = \operatorname{co}\bigl\{\nabla f_i(x) \,\big|\, i \in I(x)\bigr\}, \qquad f'_x(\xi) = \max_{i \in I(x)} \langle \nabla f_i(x), \xi\rangle.$$
Moreover, $f$ is twice epi-differentiable at $x$ for any subgradient $v \in \partial f(x)$, with the second-order epi-derivative function given by

$$f''_{x,v}(\xi) = \begin{cases} \displaystyle\max_{y \in Y_{\rm ext}(x,v)} \sum_{i=1}^m y_i \langle \xi, \nabla^2 f_i(x)\xi\rangle & \text{if } \langle \nabla f_i(x) - v, \xi\rangle \le 0 \text{ for all } i \in I(x),\\ \infty & \text{otherwise,} \end{cases}$$

where $Y_{\rm ext}(x,v)$ is the finite set of extreme points of the compact polyhedral set

$$Y(x,v) := \Bigl\{y \;\Big|\; y_i \ge 0 \text{ if } i \in I(x),\; y_i = 0 \text{ if } i \notin I(x),\; \sum_{i=1}^m y_i = 1,\; \sum_{i=1}^m y_i \nabla f_i(x) = v\Bigr\}.$$
Here, for the closed intervals $I_i = [a_i, b_i]$ of Example 2.3',

$$N_{I_i}\bigl(f_i(x)\bigr) = \begin{cases} [0,\infty) & \text{when } a_i < f_i(x) = b_i,\\ (-\infty,0] & \text{when } a_i = f_i(x) < b_i,\\ [0,0] & \text{when } a_i < f_i(x) < b_i,\\ (-\infty,\infty) & \text{when } a_i = f_i(x) = b_i. \end{cases}$$
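These one-dimensional normal cones are simple enough to encode directly; the following sketch (ours, not from the paper) returns each cone as a pair of endpoints.

```python
# Sketch (not from the paper): the normal cone to a closed interval I = [a, b]
# at a point u in I, returned as a pair (lo, hi) of cone endpoints.
import math

def normal_cone(a, b, u):
    assert a <= u <= b
    lo = -math.inf if u == a else 0.0
    hi = math.inf if u == b else 0.0
    return (lo, hi)

print(normal_cone(0.0, 1.0, 1.0))   # (0.0, inf):   upper bound active
print(normal_cone(0.0, 1.0, 0.0))   # (-inf, 0.0):  lower bound active
print(normal_cone(0.0, 1.0, 0.5))   # (0.0, 0.0):   interior point
print(normal_cone(1.0, 1.0, 1.0))   # (-inf, inf):  degenerate interval
```
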
Example 3.4 [13, Thm. 4]. For the problem of Example 2.3, consider any $\bar x \in C$ satisfying the constraint qualification described in Example 2.3'. Let

$$L(x,y) = f_0(x) + y_1 f_1(x) + \cdots + y_m f_m(x).$$

For all $x \in C$ in some neighborhood of $\bar x$, $f$ is epi-differentiable at $x$ and has at least one subgradient there, with

$$\partial f(x) = \nabla f_0(x) + N_C(x) = \bigl\{\nabla_x L(x,y) \,\big|\, y_i \in N_{I_i}\bigl(f_i(x)\bigr)\bigr\} + N_X(x),$$

$$f'_x(\xi) = \begin{cases} \langle \nabla f_0(x), \xi\rangle & \text{if } \xi \in T_X(x) \text{ and } \langle \nabla f_i(x), \xi\rangle \in T_{I_i}\bigl(f_i(x)\bigr) \text{ for all } i,\\ \infty & \text{otherwise.} \end{cases}$$
Moreover, $f$ is twice epi-differentiable at $x$ for every subgradient $v \in \partial f(x)$, with the second-order epi-derivative function given in terms of the Lagrangian $L$ by

$$f''_{x,v}(\xi) = \max_{y \in Y(x,v)} \langle \xi, \nabla^2_{xx} L(x,y)\xi\rangle + \delta_{E(x,v)}(\xi),$$

where $Y(x,v)$ is a compact polyhedral set and $E(x,v)$ is a polyhedral cone, namely

$$Y(x,v) = \bigl\{y \,\big|\, y_i \in N_{I_i}\bigl(f_i(x)\bigr),\; v - \nabla_x L(x,y) \in N_X(x)\bigr\},$$

$$E(x,v) = \bigl\{\xi \in T_C(x) \,\big|\, \langle v - \nabla f_0(x), \xi\rangle = 0\bigr\} = \bigl\{\xi \in T_X(x) \,\big|\, \langle \nabla f_i(x), \xi\rangle \in T_{I_i}\bigl(f_i(x)\bigr) \text{ for all } i,\; \langle v - \nabla f_0(x), \xi\rangle = 0\bigr\}.$$

Here $Y(x,v)$ can be replaced in the max expression by its finite set of extreme points.
Here Y(x, v) can be replaced in the max expression by its finite set of extreme points. In Example 3.4 the function / 0 has been assumed to be C2, but the methodology is not limited to that case. We could easily go further by taking / = f0 -f Sc with the set C chosen according to the specifications in Example 2.3, but with f0 taken to be any fully amenable function. In particular, /o could be a max function of the kind in Examples 2.1 and 3.3, hence nonsmooth. This generality is attained through the calculus we have developed in [12], which provides formulas for /"„(£) and (df)'x „(£) when / is expressed as the sum of two fully amenable functions under an associated "constraint qualification" on the domains of the functions. For f = f0 + £c this constraint qualification is satisfied in particular when f0 is finite everywhere, as in the max function case. Then df(x) = df0(x) + Nc(x), and for any v G 9f(x) one has in terms of the set V(x,v)
:= {(«D,«I)|VO 6 9f0(x),
vz e Nc(x),
v0 + i>i = t>}
the expressions £.({)=
wuo=
max
{(/o): i 1 w (0 + ( * o ) : « ( f ) } ,
u (vo,«l)eViri»«(l.l'.0
fMlj« + WU«)}. '
where $V_{\max}(x,v,\xi)$ is the set of vectors $(v_0,v_1)$ that achieve the maximum. The problem in Example 2.4 could likewise be handled by such calculus or tackled directly through Theorems 3.1 and 3.2.

The first- and second-order epi-derivatives that have been shown to exist for the general problems in composite format we have been considering can be employed in particular in the statement of optimality conditions.

Theorem 3.5. Let $f$ be the essential objective function in problem (V), with $f = g \circ F$ for a $C^2$ mapping $F$ and a polyhedral function $g$. Let $\bar x$ be any feasible solution at which the condition (CQ) is satisfied. Let

$$Y(\bar x, 0) := \bigl\{y \in \partial g\bigl(F(\bar x)\bigr) \,\big|\, \nabla(yF)(\bar x) = 0\bigr\},$$

this being a compact, polyhedral convex set (possibly empty), and let $Y_{\rm ext}(\bar x,0)$ be its finite set of extreme points.

(a) If $\bar x$ is locally optimal, then $Y(\bar x,0)$ must be nonempty, and

$$\max_{y \in Y_{\rm ext}(\bar x,0)} \langle \xi, \nabla^2(yF)(\bar x)\xi\rangle \ge 0 \quad \text{for } \xi \text{ satisfying } \langle \nabla(yF)(\bar x), \xi\rangle \le 0 \text{ for all } y \in \partial g\bigl(F(\bar x)\bigr).$$

(b) If $Y(\bar x,0)$ is nonempty and

$$\max_{y \in Y_{\rm ext}(\bar x,0)} \langle \xi, \nabla^2(yF)(\bar x)\xi\rangle > 0 \quad \text{for } \xi \ne 0 \text{ satisfying } \langle \nabla(yF)(\bar x), \xi\rangle \le 0 \text{ for all } y \in \partial g\bigl(F(\bar x)\bigr),$$

then $\bar x$ is locally optimal.

Proof. This applies the formulas of Theorems 3.1 and 3.2 to the general characterization of local optimality in terms of first- and second-order epi-derivatives in [19]. □
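As a toy numerical illustration of part (b) (ours, not from the paper), take $f = \max(f_1, f_2)$ with $f_1(x) = x_1^2 - x_2$ and $f_2(x) = x_1^2 + x_2$ at $\bar x = (0,0)$: then $Y(\bar x,0) = \{(\tfrac12,\tfrac12)\}$ is nonempty, the critical cone is $\{\xi \mid \xi_2 = 0\}$, and on it $\langle \xi, \nabla^2(yF)(\bar x)\xi\rangle = 2\xi_1^2 > 0$ for $\xi \ne 0$, so Theorem 3.5(b) predicts local optimality; sampling confirms it.

```python
# Sketch (not from the paper): sampling check that xbar = (0,0) is a local
# minimizer of f(x) = max(x1**2 - x2, x1**2 + x2) = x1**2 + abs(x2),
# as the sufficient condition of Theorem 3.5(b) predicts.
import itertools

def f(x1, x2):
    return max(x1 ** 2 - x2, x1 ** 2 + x2)

samples = [f(0.01 * i, 0.01 * j)
           for i, j in itertools.product(range(-5, 6), repeat=2)
           if (i, j) != (0, 0)]
print(f(0.0, 0.0), min(samples))  # 0.0, and every sampled neighbor is worse
```
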
4  Hessians and Second-order Expansions
Pursuing second-order properties to a greater depth, we turn to the question of the existence of second-order expansions for / in the sense of locally uniform convergence of difference quotient functions rather than the epi-convergence employed so far. In this endeavor we draw on results from our paper [15]. Two definitions from this paper set the stage. Definition 4.1. A single-valued mapping G from an open neighborhood of x 6 IRn into Mm has a first-order expansion at a point x £ O if there is a continuous mapping D such the difference quotient mappings AttiG : [G{x + tO - G(x)\/t for i > 0
converge to $D$ uniformly on bounded sets as $t \searrow 0$. The expansion is strict if actually the mappings

$$\Delta_{x,t} G : \xi \mapsto \bigl[G(x + t\xi) - G(x)\bigr]/t \quad \text{for } t > 0$$

converge to $D$ uniformly on bounded sets as $t \searrow 0$ and $x \to \bar x$.

The existence of a first-order expansion means that $G$ is directionally differentiable at $\bar x$: for every vector $\xi \in \mathbb{R}^n$, the directional derivative limit

$$\lim_{t \searrow 0} \frac{G(\bar x + t\xi) - G(\bar x)}{t}$$

exists. The existence of a strict first-order expansion means that $G$ is strictly directionally differentiable at $\bar x$; it corresponds to the existence for every $\xi$ of the more complicated limit where $\bar x$ is replaced by $x$, and $x \to \bar x$ along with $\xi' \to \xi$ and $t \searrow 0$. In both cases the mapping $D$ in Definition 4.1 gives for each $\xi$ the directional derivative $D(\xi)$.

Definition 4.2. Consider a function $g$ on $\mathbb{R}^n$ and a point $x$ where $g$ is finite and differentiable.

(a) $g$ has a second-order expansion at $x$ if there is a finite, continuous function $h$ such that the second-order difference quotient functions

$$\Delta^2_{x,t} g(\xi) := \bigl[g(x + t\xi) - g(x) - t\langle \nabla g(x), \xi\rangle\bigr] \big/ \tfrac12 t^2$$

converge to $h$ uniformly on bounded sets as $t \searrow 0$. The expansion is strict if $g$ is differentiable not only at $x$ but on a neighborhood of $x$, and the functions

$$\Delta^2_{x',t} g(\xi) := \bigl[g(x' + t\xi) - g(x') - t\langle \nabla g(x'), \xi\rangle\bigr] \big/ \tfrac12 t^2$$

converge to $h$ uniformly on bounded sets as $t \searrow 0$ and $x' \to x$.

(b) $g$ has a Hessian matrix $H$ at $x$, this being a symmetric $n \times n$ matrix, if $g$ has a second-order expansion with $h(\xi) = \langle \xi, H\xi\rangle$. The Hessian is strict if the expansion is strict.

(c) $g$ is twice differentiable at $x$ if its first partial derivatives exist on a neighborhood of $x$ and are themselves differentiable at $x$, i.e., the second partial derivatives of $g$ exist at $x$. Then $\nabla^2 g(x)$ denotes the matrix formed by these second partial derivatives.

A second-order expansion in the sense of Definition 4.2 automatically requires the function $h$ also to be positively homogeneous of degree 2: $h(\lambda\xi) = \lambda^2 h(\xi)$ for $\lambda > 0$, and in particular $h(0) = 0$. It means that

$$g(x + t\xi) = g(x) + t\langle \nabla g(x), \xi\rangle + \tfrac12 t^2 h(\xi) + o\bigl(t^2|\xi|^2\bigr)$$

for such a function $h$ that is finite and continuous. The existence of a Hessian corresponds to $h$ actually being quadratic. The existence of a second-order expansion for an essential function $f$ can be settled in a definitive manner on the basis of the second-order epi-derivative formula in
Theorem 3.2 and a general result in our paper [14]. It is crucial for this purpose that strongly amenable functions $f$, such as we know we are dealing with now by virtue of Proposition 2.6, have a property called "prox-regularity," which we introduced in [14] (cf. Prop. 2.5 of that paper). This property is a typical hypothesis for most of the results of [14] and [15] that will be applied in what follows. Here we leave all discussion of it aside, jumping directly to the conclusions it supports.

Theorem 4.3. Let $f$ be the essential objective function in problem (V), with $f = g \circ F$ for a $C^2$ mapping $F$ and a polyhedral function $g$. Let $\bar x$ be any point of the feasible set $C = \operatorname{dom} f$ at which the condition (CQ) holds. Then for all $x$ sufficiently close to $\bar x$ with $f(x)$ finite, the following properties are equivalent:

(a) $f$ has a second-order expansion at $x$;

(b) $f$ is differentiable at $x$;

(c) $\partial f(x)$ contains a solitary vector $v$;

(d) $\nabla(yF)(x)$ is the same vector $v$ for all $y \in \partial g\bigl(F(x)\bigr)$;

(e) $(\partial f)'_{x,v}(0) = \{0\}$ for some $v$.

Under these circumstances necessarily $x \in \operatorname{int} C$ and $\nabla f(x) = v$, and the expansion of $f$ takes the form

$$f(x + t\xi) = f(x) + t\langle v, \xi\rangle + \tfrac12 t^2 \max_{y \in Y_{\rm ext}(x)} \langle \xi, \nabla^2(yF)(x)\xi\rangle + o\bigl(t^2|\xi|^2\bigr),$$

where $Y_{\rm ext}(x)$ is the set of extreme points of the compact, polyhedral convex set $\partial g\bigl(F(x)\bigr)$, and the max expression also equals $f''_{x,v}(\xi)$.

Proof. In view of Proposition 2.6, condition (CQ) extends from $\bar x$ to all points $x$ sufficiently near to $\bar x$ with $f(x)$ finite. It suffices therefore to argue the equivalences just at $\bar x$ itself. If a second-order expansion exists at $\bar x$, $f$ must in particular be differentiable at $\bar x$, and the function $h$ expressing the second-order term must be the second epi-derivative function $f''_{\bar x,\bar v}$, inasmuch as locally uniform convergence of difference quotient functions implies their epi-convergence. Conversely, if for any $\bar v \in \partial f(\bar x)$ the function $f''_{\bar x,\bar v}$ is finite, we obtain from [14, Thm. 6.7] (through the prox-regularity of $f$ mentioned prior to the statement of the present theorem) that (b) and (c) hold with $\nabla f(\bar x) = \bar v$, and moreover that (a) holds with the second-order term in the expansion dictated by $h = f''_{\bar x,\bar v}$. At this juncture we can apply the formula for $f''_{\bar x,\bar v}$ in Theorem 3.2, which yields all the rest. In particular, (e) is obtained as an equivalent condition because $(\partial f)'_{\bar x,\bar v}(0)$ consists of the subgradients of $\tfrac12 f''_{\bar x,\bar v}$ at $0$. The subgradient formula for this function (cf. [13]) indicates that the unique subgradient at the origin is $0$ if and only if the cone $E(\bar x,\bar v)$, which is the effective domain of $f''_{\bar x,\bar v}$, has the origin in its interior, i.e., this cone is the whole space. □

When does the expansion in Theorem 4.3 correspond actually to a Hessian for $f$ at $x$? The following lemma will help answer this and a subsequent question as well.
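A minimal instance of the equivalences in Theorem 4.3 (ours, not from the paper): $f(x) = \max(x^2, 2x^2)$ at $x = 0$. Both pieces are active, so $\partial g\bigl(F(0)\bigr)$ is the whole unit simplex, yet $\nabla(yF)(0) = 0$ for every $y$; thus (d) holds with $v = 0$, $f$ is differentiable at $0$, and the displayed expansion is exact with $\max_{y \in Y_{\rm ext}(0)} \langle \xi, \nabla^2(yF)(0)\xi\rangle = \max(2\xi^2, 4\xi^2)$.

```python
# Sketch (not from the paper): the second-order expansion of Theorem 4.3 for
# f(x) = max(x**2, 2*x**2) at xbar = 0, where v = 0 and the extreme
# multipliers (1,0), (0,1) give the Hessian forms 2*xi**2 and 4*xi**2.
def f(x):
    return max(x ** 2, 2 * x ** 2)

def expansion(t, xi):
    # f(0) = 0 and v = 0, so only the second-order term survives
    return 0.5 * t ** 2 * max(2 * xi ** 2, 4 * xi ** 2)

t, xi = 0.01, 1.5
print(f(t * xi), expansion(t, xi))  # the two values agree up to roundoff
```
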
Lemma 4.4. Let $Q_i$, $i = 0,1,\dots,m$, be symmetric matrices in $\mathbb{R}^{n \times n}$, and let $M$ be any subspace of $\mathbb{R}^n$ (perhaps $\mathbb{R}^n$ itself). Then in order to have the property

$$\max_{i=1,\dots,m} \langle \xi, Q_i\xi\rangle = \langle \xi, Q_0\xi\rangle \quad \text{for all } \xi \in M,$$

there must actually be an index $i_0 \in \{1,\dots,m\}$ such that the quadratic forms associated with $Q_{i_0}$ and $Q_0$ agree on $M$. In other words, there must exist $i_0$ such that

$$i_0 \in \operatorname*{argmax}_{i=1,\dots,m} \langle \xi, Q_i\xi\rangle \quad \text{for all } \xi \in M.$$
Proof. We may assume without loss of generality that $M = \mathbb{R}^n$, since otherwise a change of coordinates can be employed to bring about a reduction to a space $\mathbb{R}^d$ with $d < n$. For each $i \in \{1,\dots,m\}$ let $C_i$ denote the closed subset of $\mathbb{R}^n$ consisting of the points where index $i$ gives the max, i.e., where the quadratic function $q_i$ associated with $Q_i$ agrees with the quadratic function $q_0$ associated with $Q_0$. The union of these sets $C_i$ is $\mathbb{R}^n$. By suppressing indices one by one as needed, we can come up with a collection indexed by $i \in I \subset \{1,\dots,m\}$ such that the union of the $C_i$'s for $i \in I$ is all of $\mathbb{R}^n$, but no subcollection has this property. Then every $C_i$ for $i \in I$ must have nonempty interior, because it covers the complement of the (closed) union of all the other sets in this collection. The fact that $q_i$ agrees with $q_0$ on the nonempty, open set $\operatorname{int} C_i$ implies $Q_i = Q_0$ (e.g., because the two functions $q_i$ and $q_0$ have the same second derivatives there). Hence we have $Q_i = Q_0$ for all $i \in I$. □

Theorem 4.5. Let $f$ be the essential objective function in problem (V), with $f = g \circ F$ for a $C^2$ mapping $F$ and a polyhedral function $g$. Let $\bar x$ be any point of the feasible set $C = \operatorname{dom} f$ at which the condition (CQ) holds. Then for all $x$ sufficiently close to $\bar x$ with $f(x)$ finite, the following properties are equivalent:

(a) $f$ has a Hessian at $x$;

(b) $f$ is differentiable at $x$, and the function $f''_x$ is quadratic;

(c) $\partial f(x)$ is a singleton, and $(\partial f)'_x$ is single-valued everywhere and linear;

(d) there is a vector $\bar y \in Y_{\rm ext}(x)$ such that, for every $y \in Y_{\rm ext}(x)$, one has both $\nabla([\bar y - y]F)(x) = 0$ and $\nabla^2([\bar y - y]F)(x)$ positive semidefinite.

Proof. The equivalence between (a), (b) and (d) is immediate from Theorem 4.3 and Lemma 4.4. Condition (c) comes into the picture because $(\partial f)'_x$ is the subgradient mapping for $\tfrac12 f''_x$ by Theorem 3.2, so it is linear if and only if $f''_x$ is quadratic. □

These results make clear that the existence of a Hessian for $f$ is quite a special property in our context. It corresponds to $f''_{x,v}$ being quadratic with $v$ the unique element of $\partial f(x)$, and that only shows up in cases where constraints and first-order discontinuities are out of the immediate picture. However, there is an interesting concept to fall back on, which operates in wider territory.
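A finite-dimensional illustration of Lemma 4.4 (ours, with made-up matrices, not from the paper): with $M = \operatorname{span}\{e_1\}$, $Q_1 = \operatorname{diag}(1,5)$, $Q_2 = \operatorname{diag}(2,-3)$ and $Q_0 = \operatorname{diag}(2,0)$, the max of the first two forms equals the $Q_0$-form on $M$, and indeed the single index $i_0 = 2$ attains the max at every $\xi \in M$.

```python
# Sketch (not from the paper): Lemma 4.4 on M = span{e1} with diagonal matrices.
def q(Q, xi):
    return Q[0] * xi[0] ** 2 + Q[1] * xi[1] ** 2  # quadratic form of diag(Q)

Q1, Q2, Q0 = (1.0, 5.0), (2.0, -3.0), (2.0, 0.0)
pts = [(0.25 * k, 0.0) for k in range(-8, 9)]  # sample points of M

max_equals_q0 = all(max(q(Q1, p), q(Q2, p)) == q(Q0, p) for p in pts)
index2_always_max = all(q(Q2, p) >= q(Q1, p) for p in pts)
print(max_equals_q0, index2_always_max)  # True True
```
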
Recall that a function $h : \mathbb{R}^n \to \overline{\mathbb{R}}$ is a generalized (purely) quadratic function if it is expressible in the form

$$h(\xi) = \begin{cases} \langle \xi, Q\xi\rangle & \text{if } \xi \in M,\\ \infty & \text{if } \xi \notin M, \end{cases}$$

where $M$ is a linear subspace of $\mathbb{R}^n$ and $Q$ is a symmetric matrix in $\mathbb{R}^{n \times n}$. On the other hand, a possibly set-valued mapping $D : \mathbb{R}^n \rightrightarrows \mathbb{R}^m$ is a generalized linear mapping if its graph is a linear subspace of $\mathbb{R}^n \times \mathbb{R}^m$. The generalized quadratic functions are known to be precisely (up to an additive constant) the functions whose subgradient mappings are generalized linear mappings.

Let us think of $f$ as having a generalized Hessian at $x$ relative to a subgradient $v \in \partial f(x)$ if the second-order epi-derivative function $f''_{x,v}$ exists and is a generalized quadratic function. We do not want to push this terminology too far, since the concept reverts to approximation in the sense of epi-convergence rather than locally uniform convergence, but a certain case can be made for it, especially in view of the results that will be obtained in the next section in connection with envelope functions. The idea is that a generalized quadratic function $h$ can be regarded as associated with a "generalized matrix for which some of the eigenvalues may be $\infty$," this being identified with a subspace $M$ and an equivalence class of symmetric $n \times n$ matrices $Q$ with respect to inducing the same quadratic form on $M$. These matrices all have the same eigenvalues relative to $M$; by an isometric change of coordinates that preserves the orthogonal decomposition of $\mathbb{R}^n$ into the sum of the subspaces $M$ and $M^\perp$, they can simultaneously be reduced to the same diagonal matrix whose entries are these eigenvalues. We can simply regard $M^\perp$ as the eigenspace associated with the eigenvalue $\infty$.

These remarks are chiefly intended to be motivational, but the question of when $f''_{x,v}$ is a generalized quadratic function turns out to be important for a number of reasons. We proceed with putting together an answer. In this we denote by $\operatorname{ri} B$ the relative interior of a convex set $B$ (in the sense of convex analysis [17]).

Theorem 4.6.
Let $f$ be the essential objective function in problem (V), with $f = g \circ F$ for a $C^2$ mapping $F$ and a polyhedral function $g$. Let $\bar x$ be any point of the feasible set $C = \operatorname{dom} f$ at which the condition (CQ) holds. Then for all $x$ sufficiently close to $\bar x$ with $f(x)$ finite, and all $v \in \partial f(x)$, the following properties are equivalent:

(a) $f''_{x,v}$ is generalized quadratic;

(b) $(\partial f)'_{x,v}$ is generalized linear;

(c) there exists $y \in \operatorname{ri} \partial g\bigl(F(x)\bigr)$ such that $\nabla(yF)(x) = v$; further, there exists $\bar y \in Y_{\rm ext}(x,v)$ such that

$$\langle \xi, \nabla^2([\bar y - y]F)(x)\xi\rangle \ge 0 \quad \text{for all } y \in Y_{\rm ext}(x,v) \text{ and } \xi \in E(x,v),$$

where the notation is that of Theorem 3.2.
Proof. The equivalence between (a) and (b) is assured by the relation between $f''_{x,v}$ and $(\partial f)'_{x,v}$ in Theorem 3.2. For the equivalence between (a) and (c), we recall from Theorem 3.2 that, for all $x$ in a neighborhood of $\bar x$, the domain of $f''_{x,v}$ is the normal cone to the convex set $\partial f(x)$ at $v$. The normal cone to a convex set is a subspace precisely when the point under consideration belongs to the relative interior of the set. Because $\partial f(x)$ is the image of the convex set $\partial g\bigl(F(x)\bigr)$ under the linear transformation $y \mapsto \nabla F(x)^T y$ (by Theorem 3.1), its relative interior is the image of $\operatorname{ri} \partial g\bigl(F(x)\bigr)$ under this transformation (cf. [17, Sec. 6]). Thus, the cone $E(x,v)$ is a subspace if and only if $v = \nabla(yF)(x)$ for some $y \in \operatorname{ri} \partial g\bigl(F(x)\bigr)$. It remains only to apply Lemma 4.4. □
The "generalized Hessian" case also arises in connection with strict second-order epi-differentiability of f.

Theorem 4.7. Let f be the essential objective function in problem (V), with f = g∘F for a C² mapping F and a polyhedral function g. Let x̄ be any point of the feasible set C = dom f at which the condition (CQ) holds. Then for all x sufficiently close to x̄ with f(x) finite, and for all v ∈ ∂f(x), the following properties are equivalent and imply in particular that f''_{x,v} is a generalized quadratic function:
(a) f is strictly twice epi-differentiable at x for v;
(b) f''_{x',v'} epi-converges (to something) as (x',v') → (x,v) in the set of pairs (x',v') with v' ∈ ∂f(x') for which f''_{x',v'} is generalized quadratic.

Proof. This comes out of [15, Cor. 4.3] because of Theorem 3.2 and the prox-regularity of f consequent to the strong amenability in Proposition 2.6. □

A test of sorts for the case in Theorem 4.7, albeit a stringent one, is the following.

Proposition 4.8. Let f be the essential objective function in problem (V), with f = g∘F for a C² mapping F and a polyhedral function g. Let x̄ be any point of the feasible set C = dom f at which the condition (CQ) holds. Suppose the function f''_{x̄,v̄} is generalized quadratic for a certain v̄ ∈ ∂f(x̄), and for all points (x,v) near (x̄,v̄) such that f''_{x,v} is generalized quadratic denote by Y_max(x,v) the set of vectors y satisfying the associated condition in Theorem 4.6(c). Then a sufficient condition for f to be strictly twice epi-differentiable at x̄ for v̄ is that both Ξ(x,v) → Ξ(x̄,v̄) and Y_max(x,v) → Y_max(x̄,v̄) as x → x̄ and v → v̄ in the set of pairs (x,v) with v ∈ ∂f(x) for which f''_{x,v} is generalized quadratic.

Proof. All we need to do, according to Theorem 4.7, is to show that f''_{x,v} epi-converges to f''_{x̄,v̄} as (x,v) → (x̄,v̄) in the set of pairs (x,v) with v ∈ ∂f(x) for which f''_{x,v} is generalized quadratic. We first show that

liminf_{k→∞} f''_{x_k,v_k}(ξ_k) ≥ f''_{x̄,v̄}(ξ)

whenever ξ_k → ξ, x_k → x̄ and v_k → v̄ in the set of pairs (x_k,v_k) with v_k ∈ ∂f(x_k) for which f''_{x_k,v_k} is generalized quadratic. If ξ_k ∉ Ξ(x_k,v_k) for all k sufficiently large
Second-order Nonsmooth Analysis
there is nothing to show. Assume not; then ξ ∈ Ξ(x̄,v̄). Now consider y ∈ Y_max(x̄,v̄). Because Y_max(x_k,v_k) → Y_max(x̄,v̄), there exists y_k ∈ Y_max(x_k,v_k) with y_k → y. It follows that f''_{x_k,v_k}(ξ_k) = ⟨ξ_k, ∇²(y_kF)(x_k)ξ_k⟩, and in the limit we get the desired inequality. Finally we show that for all ξ there exist ξ_{x,v} → ξ, as x → x̄ and v → v̄ in the set of pairs (x,v) with v ∈ ∂f(x) for which f''_{x,v} is generalized quadratic, with

limsup_{x→x̄, v→v̄} f''_{x,v}(ξ_{x,v}) ≤ f''_{x̄,v̄}(ξ).

If ξ ∉ Ξ(x̄,v̄) there is nothing to show. When ξ ∈ Ξ(x̄,v̄) there exists ξ_{x,v} ∈ Ξ(x,v) with ξ_{x,v} → ξ. We have f''_{x,v}(ξ_{x,v}) = ⟨ξ_{x,v}, ∇²(y_{x,v}F)(x)ξ_{x,v}⟩ for some y_{x,v} ∈ Y_max(x,v). We may assume that y_{x,v} → y with y ∈ Y_max(x̄,v̄); this is due to (CQ). In the limit we get the desired inequality. □

To what extent are these various properties realized in our examples? The case of a max function furnishes some good insights.

Proposition 4.9. In the case of a function f = max{f_1, ..., f_m} in Example 2.2 (as continued in Examples 2.2 and 3.3), consider any x ∈ R^n and any v ∈ ∂f(x) = co{∇f_i(x) | i ∈ I(x)}.
(a) f has a second-order expansion at x if and only if the vectors ∇f_i(x) for i ∈ I(x) coincide (or I(x) is just a singleton). It has a Hessian at x if and only if, in addition, the matrices ∇²f_i(x) for i ∈ I(x) coincide, this common matrix then being the Hessian matrix.
(b) f''_{x,v} has a subspace for its effective domain Ξ(x,v) if and only if one actually has v ∈ ri co{∇f_i(x) | i ∈ I(x)}, in which event Ξ(x,v) = {ξ | ⟨∇f_i(x) - v, ξ⟩ = 0 for all i ∈ I(x)}.
(c) f''_{x,v} is generalized quadratic if and only if, in addition, there is a vector y in the set

Y(x,v) = {y | y_i ≥ 0 if i ∈ I(x), y_i = 0 if i ∉ I(x), Σ_{i=1}^m y_i = 1, Σ_{i=1}^m y_i ∇f_i(x) = v}

such that Σ_{i=1}^m y_i ⟨ξ, ∇²f_i(x)ξ⟩ ≥ Σ_{i=1}^m y'_i ⟨ξ, ∇²f_i(x)ξ⟩ for all y' ∈ Y(x,v) and ξ ∈ Ξ(x,v).

Proof. These results follow from Theorem 4.6 via Theorem 3.2. □
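The structures in Proposition 4.9 are easy to probe numerically. The following sketch (with invented C² pieces f1, f2, not taken from the paper) checks that at a point where both pieces of f = max{f1, f2} are active, the one-sided directional derivative is the maximum of ⟨∇f_i(x), d⟩ over the active indices, and that a convex combination v of the active gradients behaves as a subgradient:

```python
import numpy as np

# Hypothetical smooth convex pieces; both are active at x = (1, 1), where f1 = f2 = 2.
f1 = lambda x: x[0] ** 2 + x[1]          # gradient (2*x0, 1)
f2 = lambda x: x[0] + x[1] ** 2          # gradient (1, 2*x1)
f = lambda x: max(f1(x), f2(x))

x = np.array([1.0, 1.0])
g1, g2 = np.array([2.0, 1.0]), np.array([1.0, 2.0])  # active gradients at x

# One-sided directional derivative f'(x; d) = max over active gradients.
d = np.array([0.3, -0.7])
t = 1e-6
fd = (f(x + t * d) - f(x)) / t           # forward difference quotient
pred = max(g1 @ d, g2 @ d)
assert abs(fd - pred) < 1e-4

# A convex combination v of the active gradients satisfies the
# subgradient inequality f(x + h) >= f(x) + <v, h>.
v = 0.5 * g1 + 0.5 * g2
for h in [np.array([0.01, 0.0]), np.array([-0.01, 0.02])]:
    assert f(x + h) >= f(x) + v @ h - 1e-12
```

The pieces here were chosen convex so that the subgradient inequality holds exactly; for general C² pieces it holds only up to o(|h|).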
Strict twice epi-differentiability is harder to pin down in this example, but an elementary sufficient condition for it can readily be developed. Recall that a set of vectors v_0, v_1, ..., v_s is affinely independent if the set {v_1 - v_0, ..., v_s - v_0} is linearly independent.
Proposition 4.10. For the max function in Proposition 4.9, suppose that (a) the vectors ∇f_i(x̄) for i ∈ I(x̄) are affinely independent, and (b) v̄ ∈ ri[co{∇f_i(x̄) | i ∈ I(x̄)}]. Then f is strictly twice epi-differentiable at x̄ for v̄. Indeed in this case, for all (x,v) sufficiently close to (x̄,v̄) with v ∈ ∂f(x), the function f''_{x,v} is generalized quadratic and depends epi-continuously on (x,v).

Proof. Let gph ∂f denote the graph of the mapping ∂f, i.e., the set of pairs (x,v) with v ∈ ∂f(x). We first show that under our assumptions there is a neighborhood U of (x̄,v̄) such that for all (x,v) ∈ U ∩ gph ∂f, we have I(x) = I(x̄). Consider x_k → x̄ and v_k → v̄ with v_k ∈ ∂f(x_k). We have Σ_{i∈I(x_k)} (y_k)_i ∇f_i(x_k) = v_k for some vector y_k ∈ Y(x_k,v_k). Because Σ_{i∈I(x_k)} (y_k)_i = 1 and (y_k)_i ≥ 0, we may assume that (y_k)_i → y_i (as k → ∞). We may also assume (by taking a subsequence if necessary) that I(x_k) = I* for some subset I* of {1, ..., m}. In the limit we have Σ_{i∈I*} y_i ∇f_i(x̄) = v̄. Then it follows from our assumptions that I* = I(x̄). We next show that Y(x̄,v̄) consists of only one vector when {∇f_i(x̄) | i ∈ I(x̄)} is affinely independent. To see this, assume that
Σ_{i∈I(x̄)} y_i ∇f_i(x̄) = v̄ = Σ_{i∈I(x̄)} y'_i ∇f_i(x̄)

for y and y' in Y(x̄,v̄). This in turn means that

Σ_{i∈I(x̄)} (y_i - y'_i) ∇f_i(x̄) = 0,

because Σ_{i∈I(x̄)} y_i = 1 = Σ_{i∈I(x̄)} y'_i. Therefore, fixing any index i_1 ∈ I(x̄),

Σ_{i∈I(x̄)} (y_i - y'_i)(∇f_i(x̄) - ∇f_{i_1}(x̄)) = 0,

which shows that y_i = y'_i for all i. It follows easily from the preceding observations that (a) and (b) are satisfied at every (x,v) ∈ U ∩ gph ∂f. Also note that the arguments we have furnished show that for all (x,v) ∈ U ∩ gph ∂f we have y' → y as x' → x and v' → v, where y' = Y(x',v') and y = Y(x,v). We know then from Proposition 4.8 that f''_{x,v} is generalized quadratic for all (x,v) ∈ U ∩ gph ∂f (inasmuch as Y(x,v) is a singleton). Finally we demonstrate that for all (x,v) ∈ gph ∂f in a neighborhood of (x̄,v̄) the function f is strictly twice epi-differentiable at x for v. We know that I(x') = I(x̄) for all (x',v') ∈ U ∩ gph ∂f. Fix (x,v) ∈ U ∩ gph ∂f. Because the set {∇f_i(x') | i ∈ I(x̄)} is affinely independent, we have Ξ(x',v') → Ξ(x,v) as x' → x and v' → v with v' ∈ ∂f(x'). Recall that Y_max(x',v') → Y_max(x,v). We now apply Proposition 4.8, and this completes the proof. □
The condition in Proposition 4.10 is so powerful that it guarantees not only the strict second-order epi-differentiability of f at x̄ for v̄ but the same also for all (x,v) near (x̄,v̄) in the graph of ∂f. It is hard to come up with a tractable condition for strict second-order epi-differentiability that is more modest in its consequences. The following example does show, however, that a max of finitely many C² functions can be strictly twice epi-differentiable at a point x̄ (actually here a point of global minimum) without necessarily being strictly twice epi-differentiable at nearby points.

Example 4.11. Let f_1(x_1,x_2) := x_1³x_2² and f_2(x_1,x_2) := -f_1(x_1,x_2). Consider

f(x_1,x_2) := |f_1(x_1,x_2)| = max{f_1(x_1,x_2), f_2(x_1,x_2)}.

This function f is C¹ (in fact it is both C^{1+} (differentiable with locally Lipschitz continuous gradient mapping) and lower-C²), and it is strictly twice epi-differentiable at x̄ = (0,0), yet it does not have this property at points of the x_1-axis away from the origin.

Detail. The functions f_1 and f_2 agree on the x_1- and x_2-axes, with ∇f_i(x_1,x_2) = (0,0) there for i = 1,2. This shows that f is C^{1+} as well as lower-C², and in particular C¹. Furthermore, f has a global minimum at x̄ = (0,0), where both f_1 and f_2 have the null matrix as their Hessian. We therefore have f''_{(0,0),(0,0)}(ξ) = 0 for all ξ by Theorem 3.2, so the function f''_{(0,0),(0,0)} is quadratic. At a general point x not on the x_1- or x_2-axis, f''_{x,v} is the quadratic associated with the Hessian of f_1 or f_2. For points with x_1 = 0, the second-order epi-derivative likewise has the property that f''_{(0,x_2),(0,0)}(ξ) = 0 for all ξ. But when x_2 = 0 we have ⟨ξ, ∇²f_1(x_1,0)ξ⟩ = 2x_1³ξ_2² and ⟨ξ, ∇²f_2(x_1,0)ξ⟩ = -2x_1³ξ_2², so that, except for the origin, f is not twice differentiable at such a point nor strictly twice epi-differentiable there. Instead, f''_{(x_1,0),(0,0)}(ξ) = max{2x_1³ξ_2², -2x_1³ξ_2²} = |2x_1³ξ_2²| for all ξ = (ξ_1,ξ_2). The formulas we have identified for the second-order epi-derivative show that f''_{x,∇f(x)} converges uniformly on bounded sets to f''_{(0,0),(0,0)} as x → 0; in particular they epi-converge. Hence by Theorem 4.7, f is strictly twice epi-differentiable at (0,0) for (0,0). □

We now turn our attention to Example 2.3, where f(x) = f_0(x) + δ_C(x) with f_0 smooth. Adopting the terminology of [1], we say in this setting that a pair (x,v) with v ∈ ∂f(x) furnishes a nondegenerate stationary point (relative to the problem of minimizing f - ⟨v,·⟩ over R^n) if v - ∇f_0(x) ∈ ri N_C(x).

Proposition 4.12. In Example 2.3, consider any point x ∈ C where the constraint qualification is satisfied (as characterized in Example 2.3), and let v ∈ ∂f(x), which is equivalent to v - ∇f_0(x) ∈ N_C(x).
Then (a) the effective domain Ξ(x,v) of f''_{x,v} is a subspace if and only if (x,v) furnishes a nondegenerate stationary point, in which event

Ξ(x,v) = {ξ ∈ T_C(x) | ⟨v - ∇f_0(x), ξ⟩ = 0}
       = {ξ ∈ T_X(x) | ⟨∇f_i(x), ξ⟩ ∈ T_{I_i}(f_i(x)) for all i, ⟨v - ∇f_0(x), ξ⟩ = 0};
(b) f''_{x,v} is a generalized quadratic function if and only if, in addition, there is a multiplier vector y in the set

Y(x,v) = {y | y_i ∈ N_{I_i}(f_i(x)), v - ∇_x L(x,y) ∈ N_X(x)}

with the property that ⟨ξ, ∇²_x L(x,y')ξ⟩ ≤ ⟨ξ, ∇²_x L(x,y)ξ⟩ for all y' ∈ Y(x,v) and ξ ∈ Ξ(x,v).
Proof. This result follows from Example 3.4 and Theorem 4.6. Note that from Example 3.4 we do have ∂f(x) = ∇f_0(x) + N_C(x), and therefore v ∈ ri ∂f(x) if and only if v - ∇f_0(x) ∈ ri N_C(x), i.e., (x,v) is a nondegenerate stationary point. □

Proposition 4.13. In Example 2.3, consider any x̄ ∈ C with v̄ ∈ ∇f_0(x̄) + N_C(x̄). Assume that (a) (x̄,v̄) furnishes a nondegenerate stationary point, (b) {∇f_i(x̄) | f_i(x̄) ∉ ri I_i} is linearly independent, (c) X = R^n. Then for all (x,v) in a neighborhood of (x̄,v̄) with v ∈ ∂f(x) the function f is strictly twice epi-differentiable at x for v, and in particular f''_{x,v} is generalized quadratic.

Proof. The line of proof is very similar to that of Proposition 4.10. First notice that there exists a neighborhood U of (x̄,v̄) such that for all (x,v) ∈ U ∩ gph ∂f we must have {∇f_i(x) | f_i(x) ∉ ri I_i} linearly independent. Next notice that we may also assume that {i | f_i(x) ∈ ri I_i} = {i | f_i(x̄) ∈ ri I_i} when (x,v) ∈ U ∩ gph ∂f. This is because v - ∇f_0(x) ∈ ri N_C(x), where

N_C(x) = {∇_x L(x,y) - ∇f_0(x) | y_i ∈ N_{I_i}(f_i(x))}

(recall that L(x,y) = f_0(x) + Σ_i y_i f_i(x)). From this it follows that Y(x,v) is a singleton for all (x,v) ∈ U ∩ gph ∂f. We now easily conclude that Y_max(x,v) → Y_max(x̄,v̄) and Ξ(x,v) → Ξ(x̄,v̄) when x → x̄ and v → v̄ with v ∈ ∂f(x). To finish off, we apply Proposition 4.8. □
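To make the setting of Example 2.3 concrete, here is a hypothetical instance of our own (the box C, the quadratic f_0, and the helper in_normal_cone are invented for illustration): when C is a box in R², the normal cone N_C(x) decomposes coordinatewise, so the condition v - ∇f_0(x) ∈ N_C(x) characterizing v ∈ ∂f(x) can be tested directly:

```python
import numpy as np

# Hypothetical instance of f = f0 + indicator of C, with C = [0,1] x [0,1]
# and f0(x) = |x - (-0.5, 0.5)|^2 (a smooth quadratic).
lo, hi = np.zeros(2), np.ones(2)
grad_f0 = lambda x: 2.0 * (x - np.array([-0.5, 0.5]))

def in_normal_cone(w, x, tol=1e-12):
    """Check w in N_C(x) for the box C, coordinate by coordinate."""
    ok = True
    for wi, xi, l, h in zip(w, x, lo, hi):
        if l < xi < h:
            ok = ok and abs(wi) <= tol   # interior coordinate: normal part is 0
        elif xi == l:
            ok = ok and wi <= tol        # lower face: w_i <= 0
        else:
            ok = ok and wi >= -tol       # upper face: w_i >= 0
    return ok

# x = (0, 0.5) is the projection of (-0.5, 0.5) onto C, so it minimizes f0
# over C, and v = 0 is a subgradient: v - grad f0(x) lies in N_C(x).
x = np.array([0.0, 0.5])
assert in_normal_cone(-grad_f0(x), x)
# An interior non-stationary point fails the test.
assert not in_normal_cone(-grad_f0(np.array([0.5, 0.5])), np.array([0.5, 0.5]))
```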
5
Proximal Mappings and Envelopes
From now on we concentrate on the envelope functions e_λ and proximal mappings P_λ defined at the end of Section 2 in association with a function f. We continue to take f to be the essential objective function for the problem in composite format. Mainly we concentrate henceforth on the case of minimizing points x̄ ∈ argmin f. Such points have v = 0 as a subgradient: 0 ∈ ∂f(x̄) by Theorem 3.5.
First on the agenda is the specialization to this context of a selection of facts from [14] and [15]. (The interested reader should consult these papers for many other results.)

Theorem 5.1. Let f be the essential objective function in problem (V), with f = g∘F for a C² mapping F and a polyhedral function g. Let x̄ be any optimal solution at which the condition (CQ) is satisfied. Then for each λ > 0 sufficiently small, there is a neighborhood of x̄ on which the function e_λ is C^{1+} and lower-C², the mapping P_λ is single-valued and Lipschitz continuous, and
∇e_λ = λ⁻¹[I - P_λ] = [λI + (∂f)⁻¹]⁻¹,   P_λ = (I + λ∂f)⁻¹,   with P_λ(x̄) = x̄.

Proof. We invoke [14, Thms. 4.4, 4.6, 5.2], making the observation, as above, that our assumptions entail through Proposition 2.6 that f has the prox-regularity demanded in those theorems. □

Functions that are C^{1+} have been the focus of much research recently. The reader interested in the study of generalized second-order directional derivatives and Hessians of these functions will surely want to consult the work of Cominetti and Correa [3], Hiriart-Urruty [4], Jeyakumar and Yang [5], Pales and Zeidan [9], and Yang and Jeyakumar [23]. Note that here the function e_λ is not only C^{1+} but also lower-C².

Theorem 5.2. Let f be the essential objective function in problem (V), with f = g∘F for a C² mapping F and a polyhedral function g. Let x̄ be any optimal solution at which condition (CQ) is satisfied (so that 0 ∈ ∂f(x̄) in particular), and for λ > 0 define
d_λ(ξ) = min_{ξ'} { ½ f''_{x̄,0}(ξ') + (1/2λ)|ξ - ξ'|² }   for all ξ.

Then for all λ sufficiently small the function d_λ is both C^{1+} and lower-C², the gradient mapping ∇d_λ being Lipschitz continuous globally, and the following properties hold:
(a) e_λ has a second-order expansion at x̄, given by e_λ(x̄ + tξ) = e_λ(x̄) + t²d_λ(ξ) + o(|tξ|²),
(b) ∇e_λ has a first-order expansion at x̄, given by ∇e_λ(x̄ + tξ) = t∇d_λ(ξ) + o(|tξ|),
(c) P_λ has a first-order expansion at x̄, given by P_λ(x̄ + tξ) = x̄ + t[ξ - λ∇d_λ(ξ)] + o(|tξ|).

Proof. This time we apply [15, Thm. 3.5], again utilizing the prox-regularity of f furnished through Proposition 2.6. □
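The content of Theorems 5.1 and 5.2 can be seen by hand in the simplest one-dimensional example f(x) = |x| with x̄ = 0 (an illustration of ours, not from the paper): P_λ is the soft-threshold mapping, e_λ is the Huber function, the identity ∇e_λ = λ⁻¹[I - P_λ] holds, and the second-order expansion e_λ(x̄ + tξ) = e_λ(x̄) + t²d_λ(ξ) is exact with d_λ(ξ) = ξ²/(2λ):

```python
import numpy as np

lam = 0.5

def prox(x):
    """P_lam for f = |.|: the soft-threshold mapping (I + lam*∂f)^(-1)."""
    return np.sign(x) * max(abs(x) - lam, 0.0)

def env(x):
    """Moreau envelope e_lam of f = |.|: the Huber function."""
    p = prox(x)
    return abs(p) + (x - p) ** 2 / (2 * lam)

# Theorem 5.1 identity: grad e_lam(x) = (x - P_lam(x)) / lam,
# checked by central differences where e_lam is smooth.
x, h = 0.2, 1e-6
num_grad = (env(x + h) - env(x - h)) / (2 * h)
assert abs(num_grad - (x - prox(x)) / lam) < 1e-6

# Second-order expansion at xbar = 0: e_lam(t*xi) = t^2 * xi^2 / (2*lam)
# for small t, since f''_{0,0} vanishes at 0 and is +infinity elsewhere.
xi, t = 1.3, 1e-3
assert abs(env(t * xi) - t ** 2 * xi ** 2 / (2 * lam)) < 1e-12
```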
Theorem 5.3. Let f be the essential objective function in problem (V), with f = g∘F for a C² mapping F and a polyhedral function g. Let x̄ be any optimal solution at which the condition (CQ) is satisfied. Then for every λ > 0 sufficiently small, the following properties are equivalent and necessarily involve the same matrix H_λ:
(a) e_λ has a Hessian matrix H_λ at x̄;
(b) ∇e_λ is differentiable at x̄ with Jacobian matrix H_λ;
(c) e_λ is twice differentiable at x̄, with H_λ = ∇²e_λ(x̄);
(d) P_λ is differentiable at x̄ with Jacobian matrix I - λH_λ;
(e) f''_{x̄,0} is generalized quadratic.

Proof. This goes back to [15, Thm. 3.8], once more under the prox-regularity that our hypothesis guarantees. □

Theorem 5.4. Let f be the essential objective function in problem (V), with f = g∘F for a C² mapping F and a polyhedral function g. Let x̄ be any optimal solution at which the condition (CQ) is satisfied. Then for every λ > 0 sufficiently small, the following properties are equivalent:
(a) f is strictly twice epi-differentiable at x̄ for v̄ = 0;
(b) e_λ has a strict Hessian at x̄;
(c) ∇e_λ is strictly differentiable at x̄;
(d) e_λ is twice differentiable at x̄, and ∇²e_λ(x) → ∇²e_λ(x̄) as x → x̄ in the set of points x where e_λ is twice differentiable;
(e) e_λ is strictly twice epi-differentiable at x̄ for v̄;
(f) ∇e_λ is strictly proto-differentiable at x̄ for v̄;
(g) P_λ is strictly differentiable at x̄;
(h) P_λ is strictly proto-differentiable at x̄.

Proof. This quotes [15, Thms. 4.1, 4.2] in the environment of the prox-regularity of f that comes from Proposition 2.6. □

Theorem 5.5. Let f be the essential objective function in problem (V), with f = g∘F for a C² mapping F and a polyhedral function g. Let x̄ be any optimal solution at which the condition (CQ) is satisfied. Then for every λ > 0 sufficiently small, the following properties are equivalent:
(a) e_λ is C² on a neighborhood of x̄;
(b) P_λ is C¹ on a neighborhood of x̄;
(c) for all (x,v) near to (x̄,v̄) in the graph of ∂f, f is twice epi-differentiable, f''_{x,v} is generalized quadratic, and f''_{x,v} depends epi-continuously on (x,v), i.e., f''_{x',v'} epi-converges to f''_{x,v} as (x',v') → (x,v) with v' ∈ ∂f(x').

Proof. We appeal here to [15, Thm. 4.4].
□
Corollary 5.6. In the case of Theorem 5.5 where f happens to be differentiable at x̄, or merely if it satisfies a local growth condition of type f(x) ≤ f(x̄) + ε|x - x̄|², properties (a) and (b) hold if and only if f is itself C² on a neighborhood of x̄.
Proof. The additional assumption forces f''_{x̄,0} to be finite (cf. Theorem 4.3), and the property in (c) of Theorem 5.5 then reduces to f being C²; see also [15, Cor. 4.5]. □

Example 5.7. For the function f of Example 2.2, the assumptions of Proposition 4.10 and Theorem 5.5 ensure the presence of properties (a) and (b) of Theorem 5.5.

Example 5.8. For the function f of Example 2.3, the assumptions of Proposition 4.13 and Theorem 5.5 ensure the presence of properties (a) and (b) of Theorem 5.5.
References

[1] J. V. Burke, On the identification of active constraints II: the nonconvex case, SIAM Journal on Numerical Analysis 27 (1990) 1081-1102.
[2] F. H. Clarke, Generalized gradients and applications, Transactions of the American Mathematical Society 205 (1975) 247-262.
[3] R. Cominetti and R. Correa, A generalized second-order derivative in nonsmooth optimization, SIAM Journal on Control and Optimization 28 (1990) 789-809.
[4] J.-B. Hiriart-Urruty, Characterization of the plenary hull of the generalized Jacobian matrix, Mathematical Programming Study 17 (1982) 1-12.
[5] V. Jeyakumar and X. Q. Yang, Second-order analysis of C^{1,1} functions and convex composite minimization, preprint, 1992.
[6] C. Lemarechal and C. Sagastizabal, Practical aspects of the Moreau-Yosida regularization I: Theoretical properties, preprint, 1994.
[7] A. Levy and R. T. Rockafellar, Variational conditions and the proto-differentiation of partial subgradient mappings, Nonlinear Analysis: Theory, Methods and Applications, submitted.
[8] P. D. Loewen, Optimal Control via Nonsmooth Analysis, CRM Proceedings & Lecture Notes 2, AMS, 1993.
[9] Z. Pales and V. Zeidan, Generalized Hessian for C^{1+} functions in infinite dimensional normed spaces, preprint, 1994.
[10] R. A. Poliquin, An extension of Attouch's Theorem and its application to second-order epi-differentiation of convexly composite functions, Transactions of the American Mathematical Society 332 (1992) 861-874.
[11] R. A. Poliquin and R. T. Rockafellar, Amenable functions in optimization, Nonsmooth Optimization Methods and Applications, F. Giannessi (ed.), Gordon & Breach, Philadelphia, 1992, 338-353.
[12] R. A. Poliquin and R. T. Rockafellar, A calculus of epi-derivatives applicable to optimization, Canadian Journal of Mathematics 45 (4) (1993) 879-896.
[13] R. A. Poliquin and R. T. Rockafellar, Proto-derivative formulas for basic subgradient mappings in mathematical programming, Set-Valued Analysis 2 (1994) 275-290.
[14] R. A. Poliquin and R. T. Rockafellar, Prox-regular functions in variational analysis, preprint, October 1994.
[15] R. A. Poliquin and R. T. Rockafellar, Generalized Hessian properties of regularized nonsmooth functions, preprint, November 1994.
[16] L. Qi, Second-order analysis of the Moreau-Yosida approximation of a convex function, preprint, 1994.
[17] R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ, 1970.
[18] R. T. Rockafellar, Maximal monotone relations and the second derivatives of nonsmooth functions, Ann. Inst. H. Poincare: Analyse Non Lineaire 2 (1985) 167-184.
[19] R. T. Rockafellar, First- and second-order epi-differentiability in nonlinear programming, Transactions of the American Mathematical Society 307 (1988) 75-107.
[20] R. T. Rockafellar, Proto-differentiability of set-valued mappings and its applications in optimization, Analyse Non Lineaire, H. Attouch et al. (eds.), Gauthier-Villars, Paris (1989) 449-482.
[21] R. T. Rockafellar, Second-order optimality conditions in nonlinear programming obtained by way of epi-derivatives, Mathematics of Operations Research 14 (1989) 462-484.
[22] R. T. Rockafellar, Generalized second derivatives of convex functions and saddle functions, Transactions of the American Mathematical Society 320 (1990) 810-822.
[23] X. Q. Yang and V. Jeyakumar, Generalized second-order directional derivatives and optimization with C^{1,1} functions, Optimization 26 (1992) 165-185.
Homogeneous Programming
Recent Advances in Nonsmooth Optimization, pp. 351-380 Eds. D.-Z. Du, L. Qi and R.S. Womersley ©1995 World Scientific Publishing Co Pte Ltd
Characterizations of Optimality for Homogeneous Programming Problems with Applications

A. M. Rubinov¹
Department of Mathematics and Computer Science, Faculty of Natural Science, Ben-Gurion University of the Negev, Beer-Sheva, 84105 Israel. E-mail: [email protected]. ac. il

B. M. Glover
School of Information Technology and Mathematical Sciences, University of Ballarat, Ballarat 3350, Victoria, Australia. E-mail: [email protected]
Abstract

Necessary and sufficient conditions for optimality for various classes of convex and nonconvex programming problems involving positively homogeneous objective and constraint functions are developed. In particular, global optimality criteria for the maximization of a sublinear function subject to sublinear constraints are established under the assumption that the value of the problem is known. This assumption is removed for certain specially structured problems. Applications to mathematical economics and functional analysis are discussed; in particular the following problem is considered in detail: to describe an element on the unit sphere which realizes the norm of a given linear operator.
1
Introduction
Positively homogeneous functions arise naturally in many applications including economic modelling and mathematical programming. Optimization problems involving these functions permit interesting analysis due to the presence of homogeneity even in

¹The research of the first author is supported by the Ministry of Science, Israel and an Australian Government Bilateral Science and Technology Grant.
A. M. Rubinov and B. M. Glover
the absence of convexity. They provide a rich source of examples within the context of global optimization, which is an area receiving considerable attention from both the theoretical and computational perspective (see Horst and Tuy [11] and Pardalos and Rosen [19]). In addition, the study of positively homogeneous functions is fundamental in nonsmooth analysis, where such functions arise as nonlinear approximations to nondifferentiable functions (see [4, 6]). Consequently the results of this paper are a contribution to the theoretical study of mathematical economics, functional analysis and nonsmooth analysis. In this paper we consider programming problems involving positively homogeneous functions, both objective and constraint. In particular the results developed apply to a range of nonconvex programming problems, including problems involving convex functions (e.g. sublinear maximization problems). We begin by developing very general theoretical results concerning the concepts of support sets, subdifferentials and superdifferentials for a variety of positively homogeneous functions. In particular we discuss the existence of these approximating sets and their properties. The approach used throughout is based on convexification. A similar approach has been discussed by various authors; see, in particular, Ioffe and Tikhomirov [12] and Ekeland and Temam [7]. We provide a detailed discussion of the geometrical basis for this convexification process and its manifestation in a variety of examples. In particular we discuss the significance of this approach for difference sublinear functions using a concept of set difference applicable to compact convex sets. The main results are related to establishing optimality conditions for various extremal problems involving possibly nonconvex positively homogeneous objective and constraint functions.
In particular we are able to obtain characterizations of optimality, in terms of subdifferentials and superdifferentials, for the maximization of a sublinear function subject to positively homogeneous (not necessarily convex) constraints. Such global optimization problems have received considerable attention in the literature (see for example [13, 11, 9]). They arise, for example, in certain approximation and norm comparison problems (see [10]). In the initial results presented we assume that the value of the programming problem is known; however, this assumption is relaxed for certain specially structured problems. The conditions presented, involving the intersection of subdifferentials and superdifferentials, may assist in the development of computational schemes for solving such global optimization problems, for instance by providing verifiable stopping criteria or search paradigms. This will be the focus of future research. The structure of the paper is as follows: in section 2 we introduce the concept of a set which is star-shaped with respect to zero and develop the properties of such sets and their Minkowski gauges (complementing the results of [22]). These gauges are positively homogeneous nonnegative l.s.c. functions. In section 3 subdifferentials of these functions are introduced and studied. Sets which are star-shaped with respect to +∞ and their respective gauges are studied in sections 4 and 5. Maximization of nonnegative sublinear functions and minimization of superlinear nonnegative functions are investigated in sections 6 and 7 respectively. We widen the applicable classes of extremal problem under consideration in section 8 using the concept of associated problems (see Tuy [24] for related ideas). In sections 9 and 10 we discuss sublinear maximization under a single constraint and subject to finitely many constraints respectively. In section 11, as an aid to the reader, we summarize the results and compare the various extremal conditions obtained in sections 6 to 10. Finally, in section 12, we present an application of our results to the problem of locating an element on the unit sphere which realizes the norm of a given linear operator. To complete this introduction we outline some areas of potential application of the results contained in this paper. The study of extremal problems involving positively homogeneous functions, to which the results of this paper are primarily applicable, has at least two main areas of application, mathematical economics and functional analysis. We briefly outline the context of these applications as follows.

Mathematical economics. One of the main tools for studying problems of economic theory are the so-called production function and cost function. These nonnegative functions are defined on the cone R^n_+ of vectors with nonnegative coordinates. The production function provides the value of an output of production (a number) under a given input (a vector); the cost function provides the value of an input (a number) under a given output (a vector). As a rule these functions, F, are positively homogeneous of degree α > 0 (and, in many circumstances of practical interest, of degree one). However, in any case, we can consider the function F^{1/α}, which is positively homogeneous of degree one, when studying extremal problems involving objective functions of this type. As a rule only concave production functions and convex cost functions have been considered in Mathematical Economics.
Under these convexity assumptions we generate classical convex extremal problems involving the production and cost functions. Where possible, economists have attempted to remove these convexity assumptions. For example, when studying nonlinear eigenvalue problems the Nobel prize winners (in Economics) R. Solow and P. Samuelson [23] consider production functions which are positively homogeneous of degree one and increasing (and therefore nonnegative) without further convexity assumptions. Further generalizations of results in this direction are contained in [17, 18]. The results in this paper allow the further removal of these convexity assumptions for particular extremal problems arising in Mathematical Economics. A very important role in this regard is played by Theorem 7.2, which gives a nonconvex generalization of the following classical economic problem: minimization of the cost of the input under a given value of the output. More recently, nonnegative positively homogeneous functions have been used under additional convexity assumptions for the investigation of certain problems of Economic Equilibrium (see [15]). The point here is that positive homogeneity permits the study of economic equilibrium with the help of simple extremal problems such as those discussed in this paper.
Functional analysis. Many important positively homogeneous nonnegative functions are considered in functional analysis, for example the norm in a normed linear space, the norm of a linear operator, the spectral radius, the least eigenvalue of a positive definite matrix, and so on. It is usual, in applications in functional analysis, to consider the minimization of nonnegative sublinear functions (such as the norm function). This presents a classical convex problem; however, it should be noted that the solution of this problem has interesting properties, for example it provides information on the properties of the elements of best approximation. On the other hand the maximization of a sublinear function is also of interest. The maximum of a positively homogeneous function on the unit ball is a quantity which shows the degree of 'dilation' of the ball under the action of this function. It is quite natural to expect special properties of the elements where the maximum is achieved. In the final section of this paper we demonstrate this by an example concerning the norm of a linear operator.
2
Star-shaped with Respect to Zero Sets and Their Gauges
Throughout this paper X shall denote a locally convex Hausdorff topological vector space. For a set D ⊆ X we shall denote the closure, convex hull and cone generated by D as cl D, co D and cone D respectively. The interior of a set D will be denoted int D, and a set with nonempty interior will be called solid. The dual space of X, denoted by X', will be endowed with the weak* topology.

Definition 2.1 A set U ⊆ X is said to be star-shaped with respect to zero (0-st-sh) if zero is a star point of this set, i.e.

x ∈ U, 0 ≤ λ ≤ 1 ⟹ λx ∈ U.

We can rewrite the definition of a star-shaped with respect to zero set in the following form: U is 0-st-sh ⟺ (∀λ ∈ [0,1]) λU ⊆ U.

Definition 2.2 Consider a 0-st-sh set U. Then the function μ_U : X → R_{+∞} (= R ∪ {+∞}) defined as follows, for x ∈ X,

μ_U(x) = inf{λ > 0 : x ∈ λU}

is called the Minkowski gauge (or 0-gauge) of the set U. We assume that the infimum of the empty set is equal to +∞. Clearly if U is convex then μ_U coincides with the well known Minkowski gauge from convex analysis (see, for example, [20]).
For x ∈ X let R_x denote the ray {λx : λ ≥ 0}. It is easy to check that

μ_U(x) = 0 ⟺ R_x ⊆ U;
μ_U(x) = +∞ ⟺ R_x ∩ U = {0};
{x : μ_U(x) < +∞} = cone U (= ∪_{λ>0} λU).
In particular μ_U(x) < +∞ for all x ∈ X if and only if zero is an algebraic interior point of the set U (i.e. U ∩ {λx : λ > 0} ≠ ∅ for all x ∈ X). It is easy to check that

U_1 ⊆ U_2 ⟺ (∀x ∈ X) μ_{U_1}(x) ≥ μ_{U_2}(x).
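Whenever membership in U is testable, μ_U can be computed directly from Definition 2.2 by bisection, since for a 0-st-sh set the condition x ∈ λU is monotone in λ. The sketch below (our illustration; the sets are invented) does this for a nonconvex star-shaped subset of R² and checks positive homogeneity together with the anti-monotonicity just stated:

```python
import math

# U = (unit disc) ∪ (thin box [-2,2] x [-1/4,1/4]): a union of two convex
# sets containing 0, hence star-shaped with respect to zero, but not convex.
in_disc = lambda x: x[0] ** 2 + x[1] ** 2 <= 1.0
in_box = lambda x: abs(x[0]) <= 2.0 and abs(x[1]) <= 0.25
in_U = lambda x: in_disc(x) or in_box(x)

def gauge(member, x, hi=1e6, tol=1e-10):
    """mu_U(x) = inf{lam > 0 : x in lam*U}, by bisection on lam."""
    if not member((x[0] / hi, x[1] / hi)):
        return math.inf
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if member((x[0] / mid, x[1] / mid)):   # x in mid*U  <=>  x/mid in U
            hi = mid
        else:
            lo = mid
    return hi

x = (3.0, 0.0)
# Along the x1-axis the box is the larger piece, so mu_U(x) = |x1| / 2.
assert abs(gauge(in_U, x) - 1.5) < 1e-6
# Positive homogeneity: mu_U(2x) = 2 mu_U(x).
assert abs(gauge(in_U, (6.0, 0.0)) - 2 * gauge(in_U, x)) < 1e-6
# Anti-monotonicity: disc ⊆ U implies mu_disc >= mu_U pointwise.
y = (1.0, 1.0)
assert gauge(in_disc, y) >= gauge(in_U, y) - 1e-9
```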
Definition 2.3 For c ∈ R the level set {x ∈ X : g(x) ≤ c} of the function g : X → R_{+∞} will be denoted by S_c(g).

Lemma 2.1 Let g be a function defined on X and mapping into R_{+∞}; then the following are equivalent:
(i) g is positively homogeneous, nonnegative and l.s.c. with g(0) < +∞;
(ii) there is a closed nonempty 0-st-sh set U such that g coincides with the Minkowski gauge μ_U of U.

Proof: (i) implies (ii). If g is positively homogeneous and 0 ≤ g(0) < +∞ then g(0) = 0 and therefore 0 ∈ U = S_1(g). The set U is nonempty, and it is easy to check that it is 0-st-sh. Since g is l.s.c. it follows that U is closed. Let us consider the Minkowski gauge μ_U of the set U. Since g is positively homogeneous and nonnegative we have

μ_U(x) = inf{λ > 0 : x/λ ∈ U} = inf{λ > 0 : g(x/λ) ≤ 1} = inf{λ > 0 : g(x) ≤ λ} = g(x).

(ii) implies (i). Clearly μ_U is nonnegative and positively homogeneous with μ_U(0) = 0 < +∞; it remains to verify lower semicontinuity. Since U is closed and 0-st-sh we have U = S_1(μ_U). Since S_c(μ_U) = cS_1(μ_U) whenever c > 0, we have that the set S_c(μ_U) is closed. Since S_0(μ_U) = ∩_{c>0} S_c(μ_U), it follows that S_0(μ_U) is closed. Thus the level sets S_c(μ_U) of the function μ_U are closed whenever c ≥ 0. Therefore μ_U is l.s.c. □

Remark 2.1: Let us note that each l.s.c. positively homogeneous nonnegative function g is the Minkowski gauge of the set S_1(g).

Remark 2.2: Connections between special classes of 0-st-sh sets and continuous positively homogeneous functions have been studied extensively in [22].
356
Definition 2.4 The set of all l.s.c. nonnegative positively homogeneous (of degree one) functions $g$ defined on the space $X$ with the property $g(0) < +\infty$ will be denoted by $PH_\ell(X)$. The totality of all closed nonempty 0-st-sh sets will be denoted by $St_0(X)$. We now introduce the natural order $\ge$ in $PH_\ell(X)$ as follows:
$$g_1 \ge g_2 \iff (\forall x \in X)\ g_1(x) \ge g_2(x),$$
where $g_i \in PH_\ell(X)$. Also the order relation on $St_0(X)$ determined by anti-inclusion is defined as follows:
$$U_1 \ge U_2 \iff U_1 \subset U_2$$
for $U_i \in St_0(X)$. It is easy to check that the ordered sets $PH_\ell(X)$ and $St_0(X)$ are complete lattices (see [3]): an arbitrary subset of either $PH_\ell(X)$ or $St_0(X)$ has a supremum and an infimum. If, for example, $(p_\alpha)_{\alpha \in A} \subset PH_\ell(X)$ then
$$(\sup_\alpha p_\alpha)(x) = \sup_\alpha p_\alpha(x), \qquad (\inf_\alpha p_\alpha)(x) = \mathrm{cl}\,[\inf_\alpha p_\alpha](x).$$
Here $(\sup_\alpha p_\alpha)$ and $(\inf_\alpha p_\alpha)$ are bounds in the lattice $PH_\ell(X)$, $\sup_\alpha p_\alpha(x)$ and $\inf_\alpha p_\alpha(x)$ are pointwise bounds, and $\mathrm{cl}\, f$ denotes the l.s.c. hull of the function $f$ (see [20]). Similarly, if $(U_\alpha)_{\alpha \in A} \subset St_0(X)$ then
$$\sup_\alpha U_\alpha = \bigcap_{\alpha \in A} U_\alpha, \qquad \inf_\alpha U_\alpha = \mathrm{cl} \bigcup_{\alpha \in A} U_\alpha.$$
It is easy to check that the following holds.

Theorem 2.1 The mapping $\varphi_0 : St_0(X) \to PH_\ell(X)$, where $\varphi_0(U) = \mu_U$, is a lattice isomorphism.
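The lattice correspondence of Theorem 2.1 can be spot-checked in a small finite-dimensional setting. The sketch below (our own illustrative helpers, not the paper's) verifies that the pointwise supremum of the gauges of two axis-aligned boxes equals the gauge of their intersection, as the isomorphism predicts.

```python
# Illustrative finite-dimensional check of Theorem 2.1's correspondence:
# sup of Minkowski gauges <-> intersection of the underlying sets.

def gauge_box(halfwidths, x):
    # Gauge of the box {y : |y_i| <= w_i} is max_i |x_i| / w_i.
    return max(abs(xi) / wi for xi, wi in zip(x, halfwidths))

def gauge_of_intersection(hw_list, x):
    # Intersecting axis-aligned boxes keeps the componentwise smallest halfwidths.
    return gauge_box([min(ws) for ws in zip(*hw_list)], x)

boxes = [(1.0, 2.0), (2.0, 1.0)]
samples = [(1.0, 1.0), (3.0, -0.5), (-2.0, 2.0)]
for x in samples:
    sup_of_gauges = max(gauge_box(b, x) for b in boxes)
    assert abs(sup_of_gauges - gauge_of_intersection(boxes, x)) < 1e-12
print("pointwise sup of gauges equals gauge of the intersection")
```

The dual identity (pointwise infimum and closed union) can be tested the same way.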
Now let us consider the dual space $L = X'$ of all continuous linear functions defined on $X$.

Definition 2.5 Let $g \in PH_\ell(X)$. The set
$$s(g) = \{\ell \in L : (\forall x \in X)\ \ell(x) \le g(x)\}$$
is called the support set of the function $g$.

Definition 2.6 The function $x \mapsto \sup\{\ell(x) : \ell \in s(g)\}$ is called the $L$-convex hull of the function $g$ and is denoted $\mathrm{co}_L g$.

Clearly $s(g)$ is nonempty (since $0 \in s(g)$), weak* closed and convex, and $\mathrm{co}_L g$ is a l.s.c. sublinear function which is the greatest l.s.c. sublinear minorant of $g$. Let us note that a function $g \in PH_\ell(X)$ is sublinear if and only if $g$ is the Minkowski gauge of a convex set $U \in St_0(X)$, in other words if the set $S_1(g)$ is convex.
Now consider the level sets $U = S_1(g)$ and $V = S_1(\mathrm{co}_L g)$. Clearly $U$ is a closed 0-st-sh set and $0 \in V$. By Lemma 2.1 it follows that $g$ is the Minkowski gauge of $U$ and $\mathrm{co}_L g$ is the Minkowski gauge of $V$. Since $\mathrm{co}_L g$ is the greatest l.s.c. sublinear minorant of $g$ we have, by applying Theorem 2.1, that $V$ is the greatest convex element of $St_0(X)$ which is majorized by $U$ (relative to the order relation in the lattice $St_0(X)$). Since the order relation in $St_0(X)$ is defined by anti-inclusion we can say that $V$ is the least (by inclusion) closed convex set which contains $U$, i.e. $V$ is the closed convex hull of the set $U$: $V = \mathrm{cl\,co}\, U$.
3 Subdifferentials of Functions in $PH_\ell(X)$
In this section we consider the support set $s(g)$ of a function $g \in PH_\ell(X)$ and subdifferentials of this function at a point. Firstly let us consider sublinear functions. Let $p$ be a l.s.c. sublinear function. Since $p$ is convex we can consider the subdifferential $\partial p(x)$ of the function $p$ at the point $x \in X$. It is well known (see [20]) that
$$\partial p(x) = \{\ell \in \partial p : \ell(x) = p(x)\}.$$
Here, and subsequently, $\partial p$ denotes the subdifferential of the function $p$, by definition:
$$\partial p = \{\ell \in X' : (\forall x \in X)\ \ell(x) \le p(x)\}.$$
If a sublinear function $p$ belongs to $PH_\ell(X)$ then $\partial p = s(p)$. It is easy to check that $\partial(\mathrm{co}_L g) = s(g)$ for $g \in PH_\ell(X)$. We now define the subdifferential at a point for a function $g \in PH_\ell(X)$, using the same definition as for sublinear functions.

Definition 3.1 The subdifferential of the function $g \in PH_\ell(X)$ at the point $x \in X$ is the set
$$\partial g(x) = \{\ell \in s(g) : \ell(x) = g(x)\}.$$
In the nonconvex case this set may be empty.

Proposition 3.1 Let $g \in PH_\ell(X)$. Then the following hold:

1. $g(x) = (\mathrm{co}_L g)(x) \iff \partial g(x) = \partial(\mathrm{co}_L g)(x)$.

2. $\partial g(x) \neq \emptyset \implies g(x) = (\mathrm{co}_L g)(x)$.

3. $\big(g(x) = (\mathrm{co}_L g)(x),\ \partial(\mathrm{co}_L g)(x) \neq \emptyset\big) \implies \partial g(x) \neq \emptyset$.
Proof: 1. This is true since $s(g) = s(\mathrm{co}_L g)$. 2. If $\partial g(x) \neq \emptyset$ then there is $\ell \in s(g)$ such that $\ell(x) = g(x)$. By the definition of the function $\mathrm{co}_L g$ we have $\ell \in \partial(\mathrm{co}_L g)$ and $\ell(x) = (\mathrm{co}_L g)(x)$. Therefore $g(x) = (\mathrm{co}_L g)(x)$.
3. Follows easily from part 1. $\square$
Example 3.1: Let $p_1, p_2$ be continuous sublinear functions defined on the space $X$ such that $p_1(x) \ge p_2(x)$ for all $x \in X$. Let $g = p_1 - p_2$. Clearly $g$ is a continuous nonnegative positively homogeneous function. In order to describe the support set $s(g)$ and the subdifferential $\partial g(x)$ we require the notion of the star difference $\overset{*}{-}$ between convex sets. By definition
$$A \overset{*}{-} B = \{x : x + B \subset A\}.$$
Proposition 3.2 Let $p_1$, $p_2$ and $g$ be as above. Then the following hold:

1. $s(g) = \partial p_1 \overset{*}{-} \partial p_2$.

2. For all $x \in X$, $\partial g(x) = \partial p_1(x) \overset{*}{-} \partial p_2(x)$.

Proof: 1. We have that
$$\ell \in s(g) \iff (\forall x)\ \ell(x) + p_2(x) \le p_1(x) \iff \partial(\ell + p_2) \subset \partial p_1 \iff \ell + \partial p_2 \subset \partial p_1 \iff \ell \in \partial p_1 \overset{*}{-} \partial p_2.$$

2. $\ell \in \partial g(x) \iff \big(\ell + \partial p_2 \subset \partial p_1,\ \ell(x) + p_2(x) = p_1(x)\big) \iff \ell + \partial p_2(x) \subset \partial p_1(x) \iff \ell \in \partial p_1(x) \overset{*}{-} \partial p_2(x)$. $\square$
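In one dimension the star difference of Proposition 3.2 is easy to compute explicitly. The sketch below (our own example data) illustrates it for intervals, which are exactly the subdifferentials of 1-D sublinear functions.

```python
# Hypothetical 1-D illustration of Proposition 3.2: for intervals A, B the star
# difference A -* B = {x : x + B subset A} is again an interval, and it is the
# support set of g = p1 - p2 when p1, p2 are the support functions of A, B.

def star_diff(A, B):
    # For intervals A=[a1,a2], B=[b1,b2]: x+B subset A  <=>  a1-b1 <= x <= a2-b2.
    lo, hi = A[0] - B[0], A[1] - B[1]
    return (lo, hi) if lo <= hi else None  # empty when B is "too big" for A

A, B = (-3.0, 3.0), (-1.0, 1.0)
print(star_diff(A, B))   # (-2.0, 2.0)

# Consistency: p1(x) = 3|x|, p2(x) = |x|, so g(x) = 2|x| and its support set
# {l : l*x <= g(x) for all x} is the interval [-2, 2], matching the result above.
```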
For a detailed discussion of differences of compact convex sets and their connection with various concepts of subdifferential in nonsmooth analysis see [21]. Let us now give a geometric interpretation of the subdifferential $\partial g(x)$. Let $\ell \in s(g)$. Since $\ell$ and $g$ are positively homogeneous, we have $\ell \in s(g)$ (i.e. $\ell(x) \le g(x)$ for all $x \in X$) if and only if $g(x) = 1$ implies $\ell(x) \le 1$. Indeed, let $x \in S_1(g)$; then $x = \lambda x'$, where $g(x') = 1$ and $\lambda \le 1$, therefore $\ell(x) = \lambda \ell(x') \le 1$. Thus the inclusion $\ell \in s(g)$ holds if and only if the level set $S_1(g)$ is contained in the halfspace $H_\ell = \{x : \ell(x) \le 1\}$. If $\ell \in \partial g(x)$ then $\ell \in s(g)$, i.e. $S_1(g) \subset H_\ell$, and moreover $\ell(x) = g(x)$. Therefore $\ell \in \partial g(x)$ if and only if the set $\{x' : \ell(x') = g(x)\}$ is a supporting hyperplane to the set $S_1(g)$ at the point $x$. The equality $\partial g(x) = \partial(\mathrm{co}_L g)(x)$ shows that a hyperplane supports $S_1(g)$ at the point $x$ if and only if this hyperplane supports the closed convex hull $\mathrm{cl\,co}\, S_1(g)$ at the same point. Let $p$ be a sublinear function which is continuous at a point $x \in X$. It is well known that the subdifferential $\partial p(x)$ is a nonempty weak* compact convex set and $\partial p(x)$
coincides with the subdifferential $\partial p'_x$ of the directional derivative $p'_x(\cdot) = p'(x,\cdot)$ of the function $p$ at the point $x$:
$$p'_x(u) = \lim_{\alpha \downarrow 0} \alpha^{-1}\big(p(x + \alpha u) - p(x)\big) = \max_{\ell \in \partial p(x)} \ell(u).$$
Clearly if $g \in PH_\ell(X)$ and $\partial g(x) \neq \emptyset$ then the directional derivative $g'_x$ may not exist. However we can say that the lower Dini derivative of the function $g$ at the point $x$ in the direction $u \in X$,
$$g^D(x,u) = \liminf_{\alpha \downarrow 0} \alpha^{-1}\big(g(x + \alpha u) - g(x)\big),$$
majorizes $p'_x(u)$, where $p = \mathrm{co}_L g$. This follows since $p(x) = g(x)$ and $p(y) \le g(y)$ for all $y \in X$, so that
$$p'_x(u) = \lim_{\alpha \downarrow 0} \alpha^{-1}\big(p(x + \alpha u) - p(x)\big) \le \liminf_{\alpha \downarrow 0} \alpha^{-1}\big(g(x + \alpha u) - g(x)\big) = g^D(x,u).$$
We now provide an example to show that the inequality $p'_x(u) \le g^D(x,u)$ may be strict.

Example 3.2: Let $X = \mathbb{R}^2$ and $U = U_1 \cup U_2$, where $U_1$ is the convex polyhedron with vertices at $(0,0)$, $(1,0)$, $(1,1)$, $(0,1/2)$ and $U_2$ is the convex polyhedron with vertices at $(0,0)$, $(-1,0)$, $(-1,1)$, $(0,1/2)$. It is easy to verify that the Minkowski gauge $g_1$ of the set $U_1$ has the form
$$g_1(x_1,x_2) = \begin{cases} \max(-x_1 + 2x_2,\ x_1) & \text{if } x_1 \ge 0,\ x_2 \ge 0 \\ +\infty & \text{in all other cases.} \end{cases}$$
The Minkowski gauge $g_2$ of the set $U_2$ has the form
$$g_2(x_1,x_2) = \begin{cases} \max(x_1 - 2x_2,\ -x_1) & \text{if } x_1 \le 0,\ x_2 \ge 0 \\ +\infty & \text{in all other cases.} \end{cases}$$
Since $U = U_1 \cup U_2$ we have $g(x) = \min(g_1(x), g_2(x))$ for all $x \in \mathbb{R}^2$; here $g$ is the Minkowski gauge of the set $U$. Clearly the convex hull of the set $U$ is the rectangle with vertices $(-1,0)$, $(-1,1)$, $(1,1)$, $(1,0)$. The Minkowski gauge $p$ of this rectangle has the form
$$p(x_1,x_2) = \begin{cases} \max(|x_1|,\ x_2) & \text{if } x_2 \ge 0 \\ +\infty & \text{if } x_2 < 0. \end{cases}$$
Clearly $p = \mathrm{co}_L g$. Now let $\bar{x} = (1,1)$. We have $g(\bar{x}) = p(\bar{x}) = 1$.
However $p(x) = \max(x_1, x_2)$ near the point $\bar{x}$, and so $p'(\bar{x},u) = \max(u_1, u_2)$. It is easy to check that
$$g'(\bar{x},u) = p'(\bar{x},u) = u_1 \quad \text{if } u_1 \ge u_2,$$
and
$$g'(\bar{x},u) = -u_1 + 2u_2 > u_2 = p'(\bar{x},u) \quad \text{if } u_1 < u_2.$$
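Example 3.2 can be checked numerically with forward difference quotients. The sketch below (illustrative code with the formulas as transcribed above) confirms the strict gap between the directional derivatives of $g$ and $p = \mathrm{co}_L g$ at $\bar{x} = (1,1)$ in a direction with $u_1 < u_2$.

```python
# Numeric check of Example 3.2 using the gauge formulas given in the text.
import math

def g1(x1, x2):
    return max(-x1 + 2*x2, x1) if (x1 >= 0 and x2 >= 0) else math.inf

def g2(x1, x2):
    return max(x1 - 2*x2, -x1) if (x1 <= 0 and x2 >= 0) else math.inf

def g(x1, x2):              # gauge of the union U1 u U2
    return min(g1(x1, x2), g2(x1, x2))

def p(x1, x2):              # gauge of the convex hull (the rectangle)
    return max(abs(x1), x2) if x2 >= 0 else math.inf

def dini(f, x, u, a=1e-7):  # forward difference quotient ~ directional derivative
    return (f(x[0] + a*u[0], x[1] + a*u[1]) - f(*x)) / a

xbar = (1.0, 1.0)
print(g(*xbar), p(*xbar))                   # both equal 1
u = (0.0, 1.0)                              # a direction with u1 < u2
print(dini(g, xbar, u), dini(p, xbar, u))   # ~2 for g versus ~1 for p: strict
```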
The role of very general directional derivatives for positively homogeneous functions has been recently explored in [8, 14, 5]. The following lemma will be useful when we study extremal problems in subsequent sections.

Lemma 3.1 Let $p \in PH_\ell(X)$, $x^* \in X$, $p(x^*) = \lambda$ and $0 < \lambda < +\infty$. The following assertions are equivalent:

(i) $\ell \in L$, $(\forall x \in S_\lambda(p))\ \ell(x) \le \ell(x^*)$, $\ell(x^*) = 1$;

(ii) $\lambda\ell \in \partial p(x^*)$.

Proof: (i) $\implies$ (ii). Since $\ell(x^*) = 1$ we have $(\lambda\ell)(x^*) = \lambda = p(x^*)$. Let us establish that $\lambda\ell \in \partial p$. At first we consider a vector $x$ such that $p(x) > 0$. Let $x' = \frac{\lambda}{p(x)}\, x$. Since $p(x') = \lambda$ it follows that $\ell(x') \le 1$ and therefore $\lambda\ell(x) \le p(x)$. Now let $p(x) = 0$. Clearly $\mu x \in S_\lambda(p)$ for all $\mu > 0$. Therefore $\ell(\mu x) \le 1$ for all $\mu > 0$, i.e. $\ell(x) \le 0$. Consequently we obtain the inequality $\lambda\ell(x) \le p(x)$ in this case also.

(ii) $\implies$ (i). We have, for all $x$, $\lambda\ell(x) \le p(x)$ and $\lambda\ell(x^*) = p(x^*) = \lambda$. Therefore $\ell(x^*) = 1$ and, for all $x \in S_\lambda(p)$,
$$\ell(x) \le \frac{1}{\lambda}\, p(x) \le 1 = \ell(x^*). \qquad \square$$
4 Star-shaped with Respect to $+\infty$ Sets and Their Gauges

We now introduce the notion of sets star-shaped with respect to $+\infty$. This notion is defined symmetrically to the notion of 0-st-sh sets.
Definition 4.1 A subset $U \subset X$ is called star-shaped with respect to $+\infty$ ($+\infty$-st-sh) if
$$x \in U,\ \lambda \ge 1 \implies \lambda x \in U.$$
Equivalently, $\lambda U \subset U$ for all $\lambda \ge 1$. If $U$ is $+\infty$-st-sh then the function $\nu_U : X \to \overline{\mathbb{R}}$, where
$$\nu_U(x) = \sup\,\{\lambda > 0 : x \in \lambda U\},$$
is called the $+\infty$-gauge of the set $U$. Such functions are discussed in relation to interior point methods in [8]. We assume that the supremum of the empty set is equal to zero. It is easy to check the following:
$$\nu_U(x) = 0 \iff R_x \cap U = \emptyset \ \text{ or } \ R_x \cap U = \{0\}, \qquad \nu_U(x) = +\infty \iff R_x \subset U,$$
$$\nu_U(x) > 0 \iff x \in \mathrm{cone}\, U, \qquad \big((\forall x \in X)\ \nu_U(x) = +\infty\big) \iff U = X.$$
If $0 \in U$ then $\nu_U(0) = +\infty$, and if $0 \notin U$ then $\nu_U(0) = 0$. Consider the following examples.

Example 4.1: Let $\ell \in L = X'$, $H^- = \{x : \ell(x) \le c\}$ and $H^+ = \{x : \ell(x) \ge c\}$, where $c > 0$. Clearly $H^-$ is a 0-st-sh set and $H^+$ is a $+\infty$-st-sh set. We have
$$\mu_{H^-}(x) = \inf\{\lambda > 0 : x \in \lambda H^-\} = \inf\{\lambda > 0 : \ell(x/\lambda) \le c\} = \inf\Big\{\lambda > 0 : \tfrac{1}{c}\,\ell(x) \le \lambda\Big\} = \max\Big(0,\ \tfrac{1}{c}\,\ell(x)\Big),$$
$$\nu_{H^+}(x) = \sup\{\lambda > 0 : x \in \lambda H^+\} = \sup\Big\{\lambda > 0 : \tfrac{1}{c}\,\ell(x) \ge \lambda\Big\} = \max\Big(0,\ \tfrac{1}{c}\,\ell(x)\Big).$$
Therefore $\nu_{H^+} = \mu_{H^-}$.

Example 4.2: Consider the following generalization of Example 4.1. Let $U$ be a 0-st-sh subset of $X$ such that the interior of $U$ is nonempty and every ray $R_x$
($x \neq 0$) does not intersect the boundary of $U$ more than once. Now let $V = X \setminus U$. It is easy to check that $V$ is a $+\infty$-st-sh set. We have
$$\nu_V(x) = \sup\{\lambda > 0 : x/\lambda \in V\} = \sup\{\lambda > 0 : x/\lambda \notin U\} = \inf\{\lambda > 0 : x/\lambda \in U\} = \inf\{\lambda > 0 : x \in \lambda U\} = \mu_U(x).$$

Example 4.3: Let $K$ be a cone in $X$. Clearly $K$ is a 0-st-sh set and, simultaneously, a $+\infty$-st-sh set. In this case we have
$$\mu_K(x) = \inf\{\lambda > 0 : x \in \lambda K\} = \begin{cases} 0 & x \in K \\ +\infty & x \notin K \end{cases} = \delta_K(x), \qquad \nu_K(x) = \sup\{\lambda > 0 : x \in \lambda K\} = \begin{cases} +\infty & x \in K \\ 0 & x \notin K \end{cases} = \delta_{X \setminus K}(x),$$
where $\delta_Z$ denotes the indicator function of a set $Z$. In the following we shall only be interested in closed $+\infty$-st-sh sets. For a function $g : X \to \overline{\mathbb{R}}$ we shall denote the level set $\{x \in X : g(x) \ge c\}$ by $Q_c(g)$.

Lemma 4.1 Let $g : X \to \overline{\mathbb{R}}$. Then the following are equivalent:

(i) $g$ is positively homogeneous, nonnegative, u.s.c. and there is an $x \in X$ such that $g(x) > 0$;

(ii) there is a closed nonempty $+\infty$-st-sh set $U$ such that $g$ coincides with the $+\infty$-gauge $\nu_U$ of the set $U$.

Proof: The proof is similar to that of Lemma 2.1; we provide an outline for completeness. (i) implies (ii). If $g(x) > 0$ then there is a $\lambda > 0$ such that $g(\lambda x) \ge 1$ and therefore $U = Q_1(g) \neq \emptyset$. Since $g$ is u.s.c. the set $U$ is closed. It is easy to check that $g = \nu_U$. (ii) implies (i). Let $U$ be a closed nonempty $+\infty$-st-sh set and let $\nu_U$ be its $+\infty$-gauge. Then there is an $x \in X$ such that $\nu_U(x) > 0$ (if $x \in U$ then $\nu_U(x) \ge 1$). The level sets $Q_c(\nu_U) = c\,Q_1(\nu_U) = cU$ are closed whenever $c > 0$. The set $Q_0(\nu_U) = X$ is closed also. Thus $\nu_U$ is u.s.c. $\square$

Remark 4.1: Note that the function $g$ in (i) above is the $+\infty$-gauge of the set $Q_1(g)$.

Definition 4.2 The set of all u.s.c. nonnegative positively homogeneous functions defined on $X$ with the property $\sup_{x \in X} g(x) > 0$ will be denoted by $PH_u(X)$. The totality of all closed nonempty $+\infty$-st-sh sets will be denoted by $St_{+\infty}(X)$.
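The halfspace gauges of Example 4.1 can be spot-checked numerically. In the sketch below (illustrative code, our own data), $\mu_{H^-}$ is computed by bisection from the membership condition $x/\lambda \in H^-$, while $\nu_{H^+}$ uses the closed-form expression derived above; the two agree, as the example claims.

```python
# Numeric spot-check of Example 4.1: for H- = {x : l(x) <= c} and
# H+ = {x : l(x) >= c} with c > 0, the two gauges coincide: max(0, l(x)/c).

l = lambda x: 2.0 * x[0] - 1.0 * x[1]   # a fixed linear functional
c = 3.0

def mu_H_minus(x, hi=1e6, tol=1e-10):
    # inf{lam > 0 : x/lam in H-}, found by bisection (membership is monotone).
    if l([xi / hi for xi in x]) > c:
        return float("inf")
    lo = 0.0
    while hi - lo > tol * max(1.0, hi):
        mid = (lo + hi) / 2
        if mid > 0 and l([xi / mid for xi in x]) <= c:
            hi = mid
        else:
            lo = mid
    return hi

def nu_H_plus(x):
    # sup{lam > 0 : x in lam*H+} = sup{lam > 0 : l(x) >= lam*c} = max(0, l(x)/c).
    return max(0.0, l(x) / c)

for x in [(3.0, 0.0), (1.0, 5.0), (-2.0, 1.0)]:
    print(x, mu_H_minus(x), nu_H_plus(x))   # the two columns agree
```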
We introduce the natural order relation $\ge$ in $PH_u(X)$ as follows, for $g_1, g_2 \in PH_u(X)$:
$$g_1 \ge g_2 \iff (\forall x \in X)\ g_1(x) \ge g_2(x),$$
and the order relation defined by inclusion in $St_{+\infty}(X)$, for $U_1, U_2 \in St_{+\infty}(X)$:
$$U_1 \ge U_2 \iff U_1 \supset U_2.$$
It is easy to check that the ordered sets $PH_u(X)$ and $St_{+\infty}(X)$ are complete lattices. If $(p_\alpha)_{\alpha \in A} \subset PH_u(X)$ then
$$(\sup_\alpha p_\alpha)(x) = \mathrm{cl}\,\big(\sup_\alpha p_\alpha(x)\big), \qquad (\inf_\alpha p_\alpha)(x) = \inf_\alpha p_\alpha(x).$$
Here $\mathrm{cl}\, f$ denotes the u.s.c. hull of the function $f$. If $(U_\alpha)_{\alpha \in A} \subset St_{+\infty}(X)$ then
$$\sup_\alpha U_\alpha = \mathrm{cl} \bigcup_{\alpha \in A} U_\alpha, \qquad \inf_\alpha U_\alpha = \bigcap_{\alpha \in A} U_\alpha.$$
The following is easily established.

Theorem 4.1 The mapping $\varphi_{+\infty} : St_{+\infty}(X) \to PH_u(X)$, where $\varphi_{+\infty}(U) = \nu_U$, is a lattice isomorphism.

5 Upper Support Sets and Superdifferentials at a Point
It should be noted that it is not possible to introduce the notion of $L$-concave hull and superdifferential at a point for all functions in $PH_u(X)$. This follows since the $L$-concave hull of a function is a superlinear function; however there are no superlinear nonnegative functions $q \neq 0$ defined on all of $X$ with $q(0) = 0$ (if $q(x) > 0$ then $q(0) \ge q(x) + q(-x) > 0$). So we can only introduce these notions for a special subclass of $PH_u(X)$. Let $g \in PH_u(X)$. We introduce the following sets:
$$V_g = \mathrm{cl\,co}\, Q_1(g), \qquad K_g = \mathrm{cl\,cone\,co}\, Q_1(g) = \mathrm{cl} \bigcup_{\lambda > 0} \mathrm{co}\,\lambda Q_1(g),$$
where $Q_1(g) = \{x : g(x) \ge 1\}$ is the level set of the function $g$. Clearly $K_g = \mathrm{cl\,cone}\, V_g$. We will consider below only functions $g \in PH_u(X)$ such that $0 \notin V_g$; we will denote the set of all such functions by $PH_u^0(X)$.

Definition 5.1 Let $g \in PH_u^0(X)$. The set
$$\bar{s}(g) = \{\ell \in L : (\forall x \in K_g)\ \ell(x) \ge g(x)\}$$
is called the upper support set of $g$. Clearly this set is closed and convex.

Proposition 5.1 For $g \in PH_u^0(X)$ the set $\bar{s}(g)$ is not empty.
Proof: Since $g \in PH_u^0(X)$ we have $0 \notin V_g$, and applying the separation theorem we can find $\ell \in L$ such that
$$\inf_{x \in Q_1(g)} \ell(x) = \inf_{x \in V_g} \ell(x) > 0.$$
Assume (normalizing $\ell$ if necessary) that $\inf_{x \in Q_1(g)} \ell(x) = 1$. We have $\ell(x) \ge g(x)$ if $g(x) = 1$. Since $\ell$ and $g$ are positively homogeneous we have $\ell(x) \ge g(x)$ if $g(x) > 0$. Clearly $\ell(x) \ge 0$ for all $x \in \mathrm{cl\,cone\,co}\, V_g = K_g$, and therefore $\ell(x) \ge g(x)$ for all $x \in K_g$, that is $\ell \in \bar{s}(g)$. $\square$
Let $g \in PH_u^0(X)$. The function $\mathrm{co}_L g$, where for $x \in X$
$$(\mathrm{co}_L g)(x) = \begin{cases} \inf\{\ell(x) : \ell \in \bar{s}(g)\} & x \in K_g \\ 0 & x \notin K_g, \end{cases}$$
is called the $L$-concave hull of the function $g$. The superdifferential $\bar{\partial} g(x)$ of the function $g$ at the point $x$ is the set
$$\bar{\partial} g(x) = \{\ell \in \bar{s}(g) : \ell(x) = g(x)\}.$$

Proposition 5.2 Let $g \in PH_u^0(X)$. Then the following hold:

1. $g(x) = (\mathrm{co}_L g)(x) \iff \bar{\partial} g(x) = \bar{\partial}(\mathrm{co}_L g)(x)$.

2. $\bar{\partial} g(x) \neq \emptyset \implies g(x) = (\mathrm{co}_L g)(x)$.

3. $\big(g(x) = (\mathrm{co}_L g)(x),\ \bar{\partial}(\mathrm{co}_L g)(x) \neq \emptyset\big) \implies \bar{\partial} g(x) \neq \emptyset$.

Proof: Follows as in Proposition 3.1. $\square$

Example 5.1: Let $K$ be a solid closed convex cone, and let $q_1, q_2$ be superlinear functions defined on $K$ with $q_1 \ge q_2$ on $K$. Put
$$g(x) = \begin{cases} q_1(x) - q_2(x) & x \in K \\ 0 & x \notin K. \end{cases}$$
Clearly $g \in PH_u^0(X)$. It is easy to check, as in Example 3.1, that
$$\bar{s}(g) = \bar{\partial} q_1 \overset{*}{-} \bar{\partial} q_2, \qquad \bar{\partial} g(x) = \bar{\partial} q_1(x) \overset{*}{-} \bar{\partial} q_2(x) \quad \text{for } x \in \mathrm{int}\, K.$$
Let $g \in PH_u^0(X)$ and $x \in Q_1(g)$. It can be shown, in a similar fashion to the approach for 0-st-sh sets, that $\ell \in \bar{\partial} g(x)$ if and only if the set $\{x' : \ell(x') = g(x)\}$ is a supporting hyperplane for the set $Q_1(g)$ at the point $x$.
Lemma 5.1 Let $g \in PH_u^0(X)$, $x^* \in K_g$ and $g(x^*) = \lambda$ with $0 < \lambda < +\infty$. Then the following are equivalent:

(i) $\ell \in L$, $(\forall x \in Q_\lambda(g))\ \ell(x) \ge \ell(x^*)$, $\ell(x^*) = 1$;

(ii) $\lambda\ell \in \bar{\partial} g(x^*)$.

Proof: (i) implies (ii). Since $\ell(x^*) = 1$ we have $(\lambda\ell)(x^*) = g(x^*)$. Now let us consider a vector $x \in K_g$ such that $g(x) > 0$. Let $x' = \frac{\lambda}{g(x)}\, x$. Since $g(x') = \lambda$ it follows that $\ell(x') \ge 1$ and therefore $\lambda\ell(x) \ge g(x)$. Since $\ell(x) \ge 1$ for all $x \in Q_1(g)$ we have that $\ell(x) \ge 0$ for all $x \in K_g = \mathrm{cl\,cone\,co}\, Q_1(g)$. So if $x \in K_g$ and $g(x) = 0$ then $\lambda\ell(x) \ge g(x)$ also. Therefore $\lambda\ell \in \bar{\partial} g(x^*)$. The proof of (ii) implies (i) is similar to that of Lemma 3.1. $\square$
6 Maximization of Sublinear Functions Subject to Positively Homogeneous Constraints

We consider the following extremal problem:
$$P(c): \qquad f(x) \to \max \quad \text{subject to} \quad g(x) \le c,$$
where $f, g \in PH_\ell(X)$ and $f$ is sublinear. Assume that $c > 0$. We assume that there exists a solution of this problem, with $d = \max\{f(x) : g(x) \le c\}$, and that the value $d$ of the problem $P(c)$ is finite. Moreover we assume that $d \neq 0$.

Theorem 6.1 Let $f, g \in PH_\ell(X)$ and assume $f$ is sublinear. Let the inequalities $c > 0$, $d > 0$ hold. Let the function $f$ be continuous at a point $x^*$ such that $g(x^*) = c$. Then the point $x^*$ is a solution of the problem $P(c)$ if and only if the intersection
$$\frac{1}{d}\,\partial f(x^*) \cap \frac{1}{c}\,\partial g(x^*) \tag{1}$$
contains a nonzero linear functional.
Proof: Let $x^*$ be a solution of the problem $P(c)$. Since $f$ is continuous at the point $x^*$ it follows that $f$ is continuous at each point $\lambda x^*$ with $\lambda > 0$, and clearly $f(\lambda x^*) < d$ when $0 < \lambda < 1$; hence the set $T_d(f) = \{x : f(x) < d\}$ has nonempty interior. Applying the separation theorem we get a linear function $\ell \in L$ such that $\ell(x) \le \ell(x^*)$ for all $x \in S_d(f)$. Since $0 \in T_d(f)$ we have $\ell(x^*) > 0$. Without loss of generality we can assume that $\ell(x^*) = 1$. Lemma 3.1 now yields $d\ell \in \partial f(x^*)$. Since $x^*$ is a solution of the problem
$P(c)$ it follows that the inclusion $S_c(g) \subset S_d(f)$ holds, and therefore $\ell(x) \le \ell(x^*) = 1$ for all $x \in S_c(g)$. Applying Lemma 3.1 again we obtain $c\ell \in \partial g(x^*)$. Thus the intersection (1) contains the linear functional $\ell \neq 0$. Conversely, let the intersection (1) contain a linear functional $\ell \neq 0$. Thus $c\ell \in \partial g(x^*)$ and $d\ell \in \partial f(x^*)$. These inclusions show that the following equalities hold:
$$c\,\ell(x^*) = g(x^*), \qquad d\,\ell(x^*) = f(x^*).$$
Since $g(x^*) = c$ we have that $\ell(x^*) = 1$ and therefore $f(x^*) = d = \max\{f(x) : g(x) \le c\}$. $\square$
Remark 6.1: Let us note that if $x^*$ is a solution of the problem $P(c)$ then the subdifferential $\partial g(x^*)$ is nonempty.

Remark 6.2: We can rewrite Theorem 6.1 in the following form: $x^*$ is a maximizer of the problem $P(c)$ if and only if there are functionals $\ell_1 \in \partial f(x^*)$ and $\ell_2 \in \partial g(x^*)$ such that $\lambda_1 \ell_1 + \lambda_2 \ell_2 = 0$, where $\lambda_1 = 1/d$ and $\lambda_2 = -1/c$. Thus we can consider the numbers $\lambda_1$ and $\lambda_2$ as Lagrange multipliers. Let us note that the ratio $\lambda_1/\lambda_2 = -c/d$ of these multipliers is determined by the ratio of the constraint level $c$ to the value $d$ of the program.

Remark 6.3: We can consider various approximations for a function $g \in PH_\ell(X)$ near a point $x^*$. For example, if $g$ is locally Lipschitz we can consider the Clarke subdifferential (see [4]) or the Michel-Penot subdifferential (see [16, 5]). Let us note that these and other known approximations are defined with the help of generalized directional derivatives which majorize the lower Dini directional derivative. The subdifferential $\partial g(x) = \partial(\mathrm{co}_L g)(x)$ is defined using the directional derivative of the sublinear function $\mathrm{co}_L g$; we have seen that this directional derivative is a lower sublinear approximation to the lower Dini directional derivative (possibly strictly lower, see Example 3.2).

Let us consider the situation where the value $d$ of the problem $P(c)$ is unknown. Theorem 6.1 gives the following necessary condition for a maximum in this case: let $f, g \in PH_\ell(X)$, with $f$ sublinear and continuous at a point $x^*$, the putative solution of the problem $P(c)$, and assume that $0 < f(x^*) < +\infty$. Then there are numbers $\lambda_1 > 0$ and $\lambda_2 > 0$ and $\ell' \in L$, $\ell' \neq 0$, such that
$$\ell' \in \frac{1}{\lambda_1}\,\partial f(x^*) \cap \frac{1}{\lambda_2}\,\partial g(x^*). \tag{2}$$

Now we consider a function $f$ which has the form
$$f(x) = \begin{cases} l(x) & \text{if } x \in K \\ +\infty & \text{otherwise,} \end{cases} \tag{3}$$
where $K$ is a closed solid convex cone, $l \in L$, and $l(x) > 0$ for all $x \in K$, $x \neq 0$. Clearly $f \in PH_\ell(X)$. Let $x^* \in \mathrm{int}\, K$ and $l' \in \partial f(x^*)$. We have $l'(x) \le l(x)$ for all $x \in K$ and $l'(x^*) = l(x^*)$. Since $x^* \in \mathrm{int}\, K$ we have $l' = l$, so $\partial f(x^*) = \{l\}$ in this case. We can give a necessary and sufficient condition for a global maximum in this case without using the value $d$ of the problem $P(c)$.

Theorem 6.2 Let $g \in PH_\ell(X)$ and assume $f$ has the form (3). A point $x^* \in K$ with $g(x^*) = c$ is a solution of the problem $P(c)$ if and only if there is $\lambda > 0$ such that
$$\lambda l \in \partial g(x^*). \tag{4}$$
Proof: Let $x^*$ be a solution to $P(c)$. Then formula (2) holds. Since $\partial f(x^*) = \{l\}$ we have $\ell' = \frac{1}{\lambda_1}\, l$, and therefore (4) is true with $\lambda = \lambda_2/\lambda_1$. Now let (4) hold with $\lambda > 0$, and let $g(x) \le c$. By the definition of the subdifferential we have
$$\lambda\, l(x) \le g(x) \le c, \qquad \lambda\, l(x^*) = g(x^*) = c.$$
Therefore $l(x^*) \ge l(x)$ whenever $g(x) \le c$. $\square$

Now let us explain why we consider only a function $f$ which has the form (3). Let $f \in PH_u^0(X)$ and $\mathrm{dom}\, f = K$, where $K$ is a closed solid convex cone. Let us give sufficient conditions for a solution of the problem $P(c)$ with given $f$ and $g \in PH_\ell(X)$: if $x^* \in K$, $g(x^*) = c$ and there are numbers $\lambda_1 > 0$, $\lambda_2 > 0$ and a functional $\ell' \in L$, $\ell' \neq 0$, such that
$$\ell' \in \frac{1}{\lambda_2}\,\partial g(x^*) \cap \frac{1}{\lambda_1}\,\bar{\partial} f(x^*), \tag{5}$$
then $x^*$ is a solution of the problem $P(c)$. (Compare with the necessary condition (2).) Indeed, if $g(x) \le c$ we have in this case
$$\frac{1}{\lambda_1}\, f(x) \le \ell'(x) \le \frac{1}{\lambda_2}\, g(x) \le \frac{1}{\lambda_2}\, c, \qquad \frac{1}{\lambda_1}\, f(x^*) = \ell'(x^*) = \frac{1}{\lambda_2}\, g(x^*) = \frac{1}{\lambda_2}\, c.$$
Since $\lambda_1 > 0$ we have $f(x) \le f(x^*)$ whenever $g(x) \le c$.

Now let $f$ be a continuous nonnegative positively homogeneous function defined on the cone $K$. We can extend the function $f$ to the entire space using two methods: either we can consider the function $\bar{f}$,
$$\bar{f}(x) = \begin{cases} f(x) & x \in K \\ +\infty & x \notin K, \end{cases}$$
or the function $\hat{f}$,
$$\hat{f}(x) = \begin{cases} f(x) & x \in K \\ 0 & x \notin K. \end{cases}$$
Clearly $\bar{f} \in PH_\ell(X)$ and $\hat{f} \in PH_u^0(X)$. If $x \in K$ the subdifferential $\partial \bar{f}(x)$ and the superdifferential $\bar{\partial} \hat{f}(x)$ depend only on $f$ (they do not depend on the extension chosen). So we can write both the necessary condition (2) and the sufficient condition (5) in this case. It can be shown that if both of these hold at a point $x^* \in \mathrm{int}\, K$ then $f$ has the form (3). Indeed, in this case both of the sets $\partial \bar{f}(x^*)$ and $\bar{\partial} \hat{f}(x^*)$ are nonempty. Let $l_1 \in \partial \bar{f}(x^*)$ and $l_2 \in \bar{\partial} \hat{f}(x^*)$. Then, for all $x \in K$,
$$l_1(x) \le f(x) \le l_2(x),$$
that is $l_2 - l_1 \in K^*$ and $l_1(x^*) = l_2(x^*)$. Since $x^* \in \mathrm{int}\, K$ we have $l_1 = l_2$. Thus $f$ is equal to a linear function on the cone $K$. So using this method we no longer require knowledge of the value $d$, provided we are dealing with functions which have the form (3).

Now we can compare conditions for local and global maxima for the problem $P(c)$ involving a function $f$ which has the form (3). Assume that $g \in PH_\ell(X)$ and $g$ is a Lipschitz function near a point $x^* \in \mathrm{int}\, K$. Then the necessary condition for a local maximum (not always sufficient) has the form $\lambda l \in \partial_C g(x^*)$, where $\partial_C g(x^*)$ is the Clarke subdifferential of the function $g$. At the same time the necessary and sufficient condition for a global maximum has the form (4): $\lambda l \in \partial g(x^*)$. Let us note that the convex compact set $\partial g(x^*)$ is contained within the convex compact set $\partial_C g(x^*)$.

Remark 6.4: Applying the proof of Theorem 6.1, we can obtain necessary conditions for a local maximum which do not depend on the value $d$ of the program. Let $x^*$ be a local maximum of the problem $P(c)$. Then there is a small cone $\tilde{K}$ (containing $x^*$) such that $x^*$ is a global maximizer of the following problem:
$$f(x) \to \max \quad \text{subject to} \quad g(x) \le c,\ x \in \tilde{K}.$$
Let us replace the function $g$ by the function $\tilde{g}$ defined as follows:
$$\tilde{g}(x) = \begin{cases} g(x) & x \in \tilde{K} \\ +\infty & x \notin \tilde{K}. \end{cases}$$
Clearly $x^*$ is a global maximizer of the following program:
$$f(x) \to \max \quad \text{subject to} \quad \tilde{g}(x) \le c,$$
and so, by Theorem 6.1, the intersection
$$\frac{1}{d}\,\partial f(x^*) \cap \frac{1}{c}\,\partial \tilde{g}(x^*)$$
contains a nonzero linear functional $\ell$. Here $d = f(x^*)$, so we can rewrite the necessary condition in the following form: the intersection
$$\frac{1}{f(x^*)}\,\partial f(x^*) \cap \frac{1}{c}\,\partial \tilde{g}(x^*) \tag{6}$$
contains a nonzero linear functional $\ell$. Clearly this condition does not depend on the value $d$. Note that the inequality $\tilde{g} \ge g$ implies that $\mathrm{co}_L \tilde{g} \ge \mathrm{co}_L g$, and therefore we cannot substitute $\partial g(x^*)$ for $\partial \tilde{g}(x^*)$ in (6).
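The global optimality condition (4) of Theorem 6.2 can be seen concretely in a small finite-dimensional instance. The sketch below uses our own data (not from the paper): $f(x) = l(x)$ on $K = \mathbb{R}^2_+$ with $l = (2,1)$, and $g(x) = \max(x_1, x_2)$ on $K$ with $c = 1$. A grid search finds the maximizer $x^* = (1,1)$, and $\lambda l$ lies in $\partial g(x^*) = \mathrm{co}\{(1,0),(0,1)\}$, both coordinates of the max being active at $x^*$.

```python
# Illustrative finite-dimensional check of Theorem 6.2 (assumed example data).
l = (2.0, 1.0)
c = 1.0

def g(x):                       # gauge-type constraint function on K = R^2_+
    return max(x)

# Brute-force the maximizer of l over the feasible region {x >= 0 : g(x) <= 1}.
n = 400
best = max(
    ((i / n, j / n) for i in range(n + 1) for j in range(n + 1)),
    key=lambda x: l[0] * x[0] + l[1] * x[1],
)
print(best)                     # the maximizer x* = (1.0, 1.0)

# lambda*l must be a convex combination (t, 1-t) of the two active gradients
# (1,0) and (0,1) of g at x*: lambda*2 = t and lambda*1 = 1 - t.
lam = 1.0 / (l[0] + l[1])       # solves both equations simultaneously
t = lam * l[0]
assert 0.0 <= t <= 1.0 and abs(lam * l[1] - (1.0 - t)) < 1e-12
print("lambda =", lam)          # lambda = 1/3 certifies global optimality
```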
7 Minimization of Superlinear Functions Subject to Positively Homogeneous Constraints

We now consider the following extremal problem:
$$Q(c): \qquad f(x) \to \min \quad \text{subject to} \quad g(x) \ge c,$$
where $f, g \in PH_u^0(X)$ with $K_f = K_g$ and $c > 0$. We further assume that there exists a solution of this problem, with $d = \min\{f(x) : g(x) \ge c\}$ and $0 < d < +\infty$.

Theorem 7.1 Let $f, g \in PH_u^0(X)$ and suppose there is a closed solid convex cone $K$ such that $K_f = K_g = K$ and $K \neq X$. Let the restriction of $f$ to the cone $K$ be superlinear. Then a point $x^* \in \mathrm{int}\, K$ such that $g(x^*) = c$ is a solution of $Q(c)$ if and only if the intersection
$$\frac{1}{d}\,\bar{\partial} f(x^*) \cap \frac{1}{c}\,\bar{\partial} g(x^*)$$
contains a nonzero linear functional.
Proof: Let $x^*$ be a solution of the problem $Q(c)$. Since $x^* \in \mathrm{int}\, K$, the interior of the closed convex set $Q_d(f)$ is not empty. Since $f \in PH_u^0(X)$ we have $0 \notin V_f \supset Q_1(f)$ and therefore $0 \notin Q_d(f)$. Now let us consider the set $x^* - K$ and show that
$$\mathrm{int}\,(x^* - K) \cap Q_d(f) = \emptyset.$$
If $y \in \mathrm{int}\,(x^* - K)$ then there is a $v \in \mathrm{int}\, K$ such that $x^* = y + v$. Since $v \in \mathrm{int}\, K$ there is a $\lambda > 0$ such that $v - \lambda x^* \in K$, i.e. there is a $w \in K$ such that $v = \lambda x^* + w$. Applying the superlinearity of $f$ on the cone $K$ we have
$$f(x^*) \ge f(y) + f(v) \ge f(y) + f(\lambda x^*) + f(w) \ge f(y) + \lambda f(x^*).$$
Therefore $f(y) \le (1 - \lambda) f(x^*) < d$, and thus $y \notin Q_d(f)$. On the other hand, since $x^* \in \mathrm{int}\, K$ we have $0 \in \mathrm{int}\,(x^* - K)$. Now we can apply the separation theorem to find $\ell \in L$ such that
$$\inf_{x \in Q_d(f)} \ell(x) \ge \sup_{x' \in x^* - K} \ell(x') = \ell(x^*) + \sup_{x' \in -K} \ell(x').$$
Since $-K$ is a cone we have $\sup_{x' \in -K} \ell(x') = 0$. Since $0 \in \mathrm{int}\,(x^* - K)$ we have $\ell(x^*) = \sup_{x' \in x^* - K} \ell(x') > 0$. Without loss of generality we can assume that $\ell(x^*) = 1$. The remainder of the proof follows in a similar way to the proof of Theorem 6.1. $\square$

We now provide a result related to Theorem 6.2 which has an application to economic theory.

Theorem 7.2 Let $K$ be a closed convex cone, $f, g \in PH_u^0(X)$, $K_f = K_g = K$, and suppose there is $l \in X'$ such that $f(x) = l(x)$ for all $x \in K$ and $l(x) > 0$ for all $x \in K$, $x \neq 0$. Then a point $x^* \in K$ such that $g(x^*) = c$ is a solution of $Q(c)$ if and only if there is $\lambda > 0$ such that
$$\lambda l \in \bar{\partial} g(x^*). \tag{7}$$

Proof: Let $x^*$ be a solution of the problem $Q(c)$ and $f(x^*) = l(x^*) = d$. Clearly $d > 0$. We have $1 = l(x^*)/d \le l(x)/d$ for all $x \in Q_c(g)$. Using Lemma 5.1 we can obtain the inclusion (7) with $\lambda = c/d$. On the other hand, if this inclusion is true then, using an approach similar to the proof of Theorem 6.2, we obtain that $x^*$ is a minimizer of the problem $Q(c)$. $\square$

We now give an economic application of Theorem 7.2. Let us consider the cone $\mathbb{R}^n_+$ of all vectors with nonnegative coordinates in the $n$-dimensional coordinate space $\mathbb{R}^n$ as the cone of vectors of resources. Consider an economic system which can transform a vector of resources $x$ into a vector of output, and denote the value of the output by $G(x)$. We assume that there is a price vector $l = (l_1, l_2, \ldots, l_n)$ in the system, where $l_i > 0$ is the price of product $i$, and the value of the output is calculated with the help of the price vector $l$. The function $G$ is called the production function. Clearly $G$ is a nonnegative function defined on the cone $\mathbb{R}^n_+$. As a rule it is assumed in economic theory that $G$ is positively homogeneous of degree $\alpha > 0$ and continuous. In this case the function $g$ with $g(x) = G(x)^{1/\alpha}$ is positively homogeneous of degree one and continuous. One of the classical problems of economic theory is the following: find a vector of resources which has minimal value among all vectors which allow the receipt of an output greater than or equal to a given value $c > 0$. Clearly this problem coincides with the problem $Q(c)$ defined using the functions $f$ and $g$, where $f(x) = l(x)$ and $g(x)$ is defined as above for all $x \in \mathbb{R}^n_+$; we assume also that $f(x) = g(x) = 0$ for all $x$ which do not belong to $\mathbb{R}^n_+$. Theorem 7.2 gives both necessary and sufficient conditions for the solution of this problem. Let us note that previously only concave functions $g$ were considered in economic theory; concavity allows the application of the Karush-Kuhn-Tucker theorem for the analysis of this problem.
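A worked instance of this economic application, with our own numbers (not from the paper): take the Cobb-Douglas production function $G(x) = x_1 x_2$ (degree 2), so $g(x) = G(x)^{1/2}$ is concave on $\mathbb{R}^2_+$ and its superdifferential at an interior point is just the gradient. Theorem 7.2 then reduces to $\lambda l = \nabla g(x^*)$, which the sketch verifies.

```python
# Cost minimization under a Cobb-Douglas production constraint (example data):
# minimize l(x) subject to g(x) = sqrt(x1*x2) >= c on R^2_+.
import math

l = (2.0, 8.0)
c = 1.0

def g(x1, x2):
    return math.sqrt(x1 * x2)

# First-order condition: grad g proportional to l gives x2 = x1*l1/l2, and
# g(x*) = c pins down the scale.
x1 = c * math.sqrt(l[1] / l[0])     # = 2.0
x2 = c * math.sqrt(l[0] / l[1])     # = 0.5
assert abs(g(x1, x2) - c) < 1e-12   # the constraint is active at x*

grad = (0.5 * math.sqrt(x2 / x1), 0.5 * math.sqrt(x1 / x2))
lam = grad[0] / l[0]
# lambda*l coincides with grad g(x*): the condition (7) of Theorem 7.2.
assert abs(lam * l[1] - grad[1]) < 1e-12
print("x* =", (x1, x2), "cost =", l[0]*x1 + l[1]*x2, "lambda =", lam)
```

Any feasible perturbation of $x^*$ on the isoquant $g = c$ raises the cost, which can also be confirmed by sampling.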
8 Associated Problems

We considered in section 6 maximization problems $P(c)$ with a sublinear objective function and positively homogeneous constraints belonging to the class $PH_\ell(X)$. Now we consider problems of the form $P(c)$ where the objective function is merely assumed to belong to the class $PH_\ell(X)$ and the constraint is superlinear. At first we consider an arbitrary problem $P(d)$ with positively homogeneous nonnegative objective and constraint functions:
$$P(d): \qquad g(x) \to \max \quad \text{subject to} \quad f(x) \le d.$$
Here $f$ and $g$ are nonnegative positively homogeneous functions and $0 < d < +\infty$. Let $c$ be the value of this problem; assume that $0 < c < +\infty$ and that the set $T$ of solutions to this problem is nonempty. If $x^* \in T$ then $g(x^*) = c$ and $f(x^*) = d$. Now consider the following problem:
$$Q(c): \qquad f(x) \to \min \quad \text{subject to} \quad g(x) \ge c.$$

Lemma 8.1 The value of the problem $Q(c)$ coincides with $d$.

Proof: We have $\min_{g(x) \ge c} f(x) \le f(x^*) = d$. Assume that there is $x'$ such that $g(x') \ge c$ and $f(x') = d' < d$. Then $x'$ is feasible for $P(d)$, so $g(x') \le c$ and hence $g(x') = c$; thus $x' \in T$ and therefore $f(x') = d$, a contradiction. Consequently the value of $Q(c)$ equals $d$, and the set of solutions of the problem $P(d)$ has the form
$$T = \{x : g(x) = c,\ f(x) = d\}. \tag{8}$$
On the other hand the set of solutions of the problem $Q(c)$ has the same form as (8). Thus the problems $P(d)$ and $Q(c)$ have the same set of solutions, and we can consider $Q(c)$ instead of $P(d)$. We say that $P(d)$ and $Q(c)$ are associated problems. Recently Tuy [24] has discussed a concept similar to the idea of associated problems, together with its computational implications.

Theorem 8.1 Let us consider the problem $P(d)$:
$$g(x) \to \max \quad \text{subject to} \quad f(x) \le d.$$
Assume that all the conditions of Theorem 7.1 are satisfied. Then $x^*$ is a solution of the problem $P(d)$ if and only if the intersection
$$\frac{1}{d}\,\bar{\partial} f(x^*) \cap \frac{1}{c}\,\bar{\partial} g(x^*)$$
contains a nonzero linear functional.

Proof: We can consider the associated problem $Q(c)$ and apply Theorem 7.1. $\square$

Remark 8.1: Let us note that this optimality condition is expressed in terms of superdifferentials, while the optimality condition in Theorem 6.1 is expressed using subdifferentials.

Now we consider the problem $Q(d)$:
$$g(x) \to \min \quad \text{subject to} \quad f(x) \ge d,$$
and define the associated problem $P(c)$ as follows:
$$f(x) \to \max \quad \text{subject to} \quad g(x) \le c.$$
Assume that $0 < c < +\infty$ and $0 < d < +\infty$.

Lemma 8.2 Let $c$ be the value of the problem $Q(d)$. Then the value of the associated problem $P(c)$ coincides with $d$.

Proof: Similar to the proof of Lemma 8.1. $\square$

It is straightforward to check that associated problems have the same solution sets.

Theorem 8.2 Consider the problem $Q(d)$:
$$g(x) \to \min \quad \text{subject to} \quad f(x) \ge d.$$
Assume that all the conditions of Theorem 6.1 are satisfied. Then $x^*$ is a solution of $Q(d)$ if and only if the intersection
$$\frac{1}{d}\,\partial f(x^*) \cap \frac{1}{c}\,\partial g(x^*)$$
contains a nonzero linear functional.
Sublinear Maximization
Now we consider the problem P(c): f(x)
—> max subject to g(x) < c
where / and g are l.s.c sublinear functions defined on the space X, in this case not necessarily nonnegative. We assume that there exists a solution of the problem with d = max {f(x) : g(x) < c} and - c o < d < +oo, d / 0. It is easy to check that both the system of inequalities c > 0, d < 0 and the system of inequalities c < 0, d > 0 are impossible. So we consider the following two cases: 1. c > 0, d > 0 2. c < 0, d < 0 For the first case, c > 0, d > 0, we require the following version of Lemma 3.1.
373
Homogeneous Programming
Lemma 9.1 Let p be a sublinear l.s.c function defined on X, x* € X and p(x*) = X where 0 < A < -fee. The following assertions are equivalent:
(i)iBL,
(Vx e sx{p))e(x) < e(x'), e(X') = 1
(a) xe e 5p(x*) Proof: (i) implies (ii). The equality (A^)(x*) = p(x') and the inequality (A^)(x) < p(x) when p(x) > 0 were proved in the proof of Lemma 3.1. Now we will prove that (A^)(x) < p(x) if p(x) < 0. Applying the l.s.c of the function p and the inequality A = p(x') > 0 we obtain the inequality p(ax + x*) > 0 for all sufficiently small a > 0. Since p(ax + x*) > 0 we see that the following inequality holds: [X£)(ax + x*) < p(ax + x*). Since p is sublinear we have p(ax + x*) < ap(x) + p(x'). Since (X£)(x*) = p(x*) we have that (A/)(x) < p(x). Thus (\(){x) < p{x) for all x and (X£)(x') = p(x'). Thus
Xi edp{x'). (ii) implies (i). Similar to that of Lemma 3.1
□
Theorem 9.1 Let us consider the problem P(c) where f, g are l.s.c. sublinear functions. Assume that c > 0, d > 0. Let the function f be continuous at x* where g(x*) = c. Then x* is a solution of the problem P(c) if and only if the intersection
(1/d)∂f(x*) ∩ (1/c)∂g(x*)
contains a nonzero linear functional.
The proof is similar to Theorem 6.1 and hence is omitted. Now for the case c < 0, d < 0 we require the following lemma.
Lemma 9.2 Let p be a l.s.c. sublinear function with p(x) ≤ 0 for all x ∈ dom p. Let x* ∈ dom p, λ = p(x*) and λ < 0. Then the following assertions are equivalent:
(i) ℓ ∈ X′, (∀x ∈ S_λ(p)) ℓ(x) ≤ ℓ(x*), ℓ(x*) = 1;
(ii) −λℓ ∈ ∂p(x*).
Proof: Similar to that of Lemma 3.1. □
Now we pass on to the case c < 0, d < 0. Let us note that those x where either f(x) > 0 or g(x) > 0 are not interesting in this case, and we will consider only functions f and g with the following properties:
(∀x ∈ dom f) f(x) ≤ 0,  (∀x ∈ dom g) g(x) ≤ 0.  (9)
For example we can substitute for f(x) and g(x) the value +∞ at all points where these functions are positive. Thus we obtain l.s.c. functions.
A. M. Rubinov and B. M. Glover
Theorem 9.2 Let us consider l.s.c. sublinear functions f and g such that the inequalities (9) hold. Let c < 0 and let d, the value of the problem P(c), be negative. Let the function f be continuous at the point x* such that g(x*) = c. Then the point x* is a solution of the problem P(c) if and only if the intersection (1) contains a nonzero linear functional. Proof: The analysis is similar to that in the proof of Theorem 6.1. It shows that the result will follow if we use Lemma 9.2 instead of Lemma 3.1 and prove the following assertion: if ℓ is a linear function such that ℓ ≠ 0 and ℓ(x*) = max{ℓ(x) : f(x) ≤ f(x*)}, then ℓ(x*) < 0. We have f(2x*) ≤ f(x*) in our case; therefore ℓ(2x*) ≤ ℓ(x*), hence ℓ(x*) ≤ 0. Assume that ℓ(x*) = 0. If y ∈ dom f then f(x* + y) ≤ f(x*) + f(y) ≤ f(x*) and therefore ℓ(y) = ℓ(x* + y) ≤ ℓ(x*) = 0. Thus, for all y ∈ dom f, ℓ(y) ≤ 0. Since the function f is continuous at the point x*, it follows that λx* is an interior point of the set S_{f(x*)}(f) for λ > 1, and therefore λx* is an interior point of the cone dom f. If ℓ(x*) = 0, ℓ is nonpositive on the cone dom f and x* is an interior point of this cone, then ℓ = 0, a contradiction. □ Remark 9.1: If dom f = dom g = K and x* ∈ int K then we can prove Theorem 9.2 with the help of Theorem 7.1 by considering the functions −f and −g instead of the functions f and g, and the problem Q(−c) instead of problem P(c), where Q(−c) is the problem: (−f)(x) → min subject to (−g)(x) ≤ −c. Recently Jeyakumar and Glover [13] have discussed conditions characterizing global optimality for programming problems, including sublinear maximization problems such as P(c), using a generalization of Farkas' lemma.
10
Lagrange Multipliers for Sublinear Maximization
In section 6 we established Lagrange multiplier rules for positively homogeneous programming problems with a single constraint. We now consider problems involving a finite number of sublinear constraints. Consider the following extremal problem:
(P)  f(x) → max subject to g_i(x) ≤ 1, i ∈ I = {1, 2, …, n}, x ∈ K,
where f, g_i (i ∈ I) are continuous sublinear functions and K is a closed convex cone. Let g = max_i g_i + δ_K, where δ_K is the indicator function of the cone K. Clearly g is a l.s.c. sublinear function. For x ∈ K we shall denote by M_x the cone defined as follows: M_x = cl(K + {λx : λ ≤ 0}) = cl(K + {λx : λ ∈ ℝ}).
Clearly
M*_x = K* ∩ [{λx : λ ∈ ℝ}]* = K* ∩ H_x, where H_x = {ℓ ∈ X′ : ℓ(x) = 0} is the hyperplane in X′ generated by x.
Lemma 10.1 For x ∈ K the following holds:
∂g(x) = co(⋃_{i∈I_x} ∂g_i(x)) − M*_x, where I_x = {i ∈ I : g_i(x) = g(x)}.
Proof: It is straightforward to show that
∂g(x) ⊇ co(⋃_{i∈I_x} ∂g_i(x)) − M*_x.
We have, by applying well-known rules of subdifferential calculus, that ∂g = co(⋃_i ∂g_i) − K*. Thus it is easy to establish the reverse inclusion, and so the result follows. □
We can now use Theorem 9.1 to study problem (P). By applying this result it follows that x* is a solution of (P) if and only if there are linear functionals ℓ, ℓ′, ℓ_i (i ∈ I_{x*}) and numbers a_i ≥ 0 (i ∈ I_{x*}) with Σ_i a_i = 1 such that
ℓ ≠ 0,  dℓ ∈ ∂f(x*),
ℓ_i ∈ ∂g_i(x*),  ℓ′ ∈ M*_{x*},
ℓ = −Σ_{i∈I_{x*}} a_i ℓ_i − ℓ′.  (10)
Let ℓ⁰ = dℓ. Then we can rewrite this condition in the following form: there are ℓ⁰ ∈ ∂f(x*), ℓ_i ∈ ∂g_i(x*) and numbers λ_i, λ⁰ such that
(∀x ∈ M_{x*})  (λ⁰ℓ⁰ + Σ_{i∈I_{x*}} λ_i ℓ_i)(x) ≤ 0,
where λ⁰ = 1/d, Σ_i λ_i = −1, λ_i ≤ 0. Let λ_i = 0 for i ∉ I_{x*}. Clearly we can consider the numbers λ⁰, λ_1, …, λ_n as Lagrange multipliers. Let us note that the condition above, with λ⁰ = 1/d where d is the value of the problem, is a necessary and sufficient condition for a global maximum.
11
Conclusion
Let us consider the classical extremal problem involving the minimization of a sublinear function or, equivalently, the maximization of a superlinear function over a convex set. We consider only the simplest cases for discussion purposes. Let K be a solid closed convex cone, f a continuous superlinear nonnegative function defined on K, and g a continuous sublinear nonnegative function defined on K. We are interested in the following problems:
P(c):  f(x) → max subject to g(x) ≤ c, x ∈ K;
Q(d):  g(x) → min subject to f(x) ≥ d, x ∈ K.
If d is the value of problem P(c), then c is the value of problem Q(d). Applying the separation theorem and Lemmas 3.1 and 5.1 it is easy to check that x* is a solution of either problem P(c) or problem Q(d) if and only if there is ℓ ≠ 0 such that
ℓ ∈ (1/d)∂̄f(x*) ∩ (1/c)∂g(x*).
The following table provides a summary of the results of this paper.

Problem | Objective | Constraint | Direction of constraint | Characterization of optimality: nonzero ℓ ∈ X′ in
max | subl | PH↑ | ≤ | (1/d)∂f(x*) ∩ (1/c)∂g(x*)
min | superl | PH° | ≥ | (1/d)∂̄f(x*) ∩ (1/c)∂̄g(x*)
max | PH° | superl | ≤ | (1/d)∂̄f(x*) ∩ (1/c)∂̄g(x*)
min | PH↑ | subl | ≥ | (1/d)∂f(x*) ∩ (1/c)∂g(x*)
max | superl | subl | ≤ | (1/d)∂̄f(x*) ∩ (1/c)∂g(x*)
min | subl | superl | ≥ | (1/d)∂f(x*) ∩ (1/c)∂̄g(x*)

(Here ∂ denotes the subdifferential and ∂̄ the superdifferential.)
In the above, 'subl' denotes sublinear; 'superl' denotes functions f in PH°(X) with superlinear restriction to the set K_f; PH↑ denotes PH↑(X); c (respectively d) is the value of the right-hand side of the constraint when g (respectively f) is the constraint function, and the value of the problem when g (respectively f) is the objective function. We can see that we must use subdifferentials for functions belonging to PH↑ (in particular sublinear functions) and superdifferentials for functions belonging to PH° (in particular functions which have superlinear restrictions). Let f be a continuous and positively homogeneous function defined on a solid cone K and x ∈ int K. We can apply both subdifferentials and superdifferentials in this case. The table shows that we must use subdifferentials if our function is the objective function under minimization or the constraint function under maximization. We must use superdifferentials if our function is the objective function under maximization or the constraint function under minimization.
12
Applications
Let X and Y be Banach spaces and A : X → Y a bounded linear operator. Let us consider the following extremal problem: ‖Ax‖_Y → max subject to ‖x‖_X ≤ 1. Clearly the value of this problem is equal to ‖A‖, the norm of the operator A. Let us denote
‖x‖_X = g(x),  ‖y‖_Y = p(y),  ‖Ax‖_Y = f(x).
Thus we have the extremal problem (P₁): f(x) → max subject to g(x) ≤ 1.
Here c = 1 and d = ‖A‖. Clearly ∂g = B*_X and ∂p = B*_Y, where B*_X and B*_Y denote the unit balls in the respective dual spaces X′ and Y′. Since f(x) = p(Ax) we have the following chain rule (see, for example, [1]): ∂f = A*∂p = A*B*_Y, where A* is the operator conjugate (adjoint) to A. Clearly ∂g(x) = {ℓ ∈ B*_X : ℓ(x) = ‖x‖_X} = {ℓ ∈ X′ : ‖ℓ‖ = 1, ℓ(x) = ‖x‖_X}. Let us compute ∂f(x). We have
∂f(x) = {ℓ ∈ A*(B*_Y) : ℓ(x) = ‖Ax‖_Y}.
Thus ℓ ∈ ∂f(x) if and only if there is an ℓ′ ∈ B*_Y such that ℓ = A*ℓ′ and ℓ(x) = p(Ax), or equivalently:
ℓ = A*ℓ′,  ℓ′(Ax) = (A*ℓ′)(x) = p(Ax),  ℓ′ ∈ ∂p(Ax).  (11)
The formulae (11) show that ℓ ∈ ∂f(x) if and only if ℓ = A*ℓ′ where ℓ′ ∈ ∂p(Ax), i.e.
∂f(x) = A*(∂p(Ax)).
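In the finite-dimensional Euclidean case the chain rule ∂f(x) = A*(∂p(Ax)) reduces, at points where Ax ≠ 0, to the gradient formula ∇‖Ax‖ = Aᵀ(Ax/‖Ax‖). A minimal numerical sketch (the random data are hypothetical, for illustration only) checks this against finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
x = rng.standard_normal(3)

f = lambda v: np.linalg.norm(A @ v)      # f(x) = p(Ax), p the Euclidean norm

# Chain rule: the gradient is A^T applied to the gradient of p at Ax.
grad = A.T @ (A @ x / np.linalg.norm(A @ x))

# Central finite-difference check of each partial derivative.
eps = 1e-6
fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(3)])
assert np.allclose(grad, fd, atol=1e-5)
```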
Thus we obtain the following result.
Theorem 12.1 Let X and Y be Banach spaces and A : X → Y a bounded linear operator. A point x* ∈ X has the properties
‖x*‖_X = 1,  ‖Ax*‖_Y = ‖A‖  (12)
if and only if there is an ℓ ≠ 0 such that
ℓ ∈ ∂g(x*) ∩ (1/‖A‖) A*(∂p(Ax*)).  (13)
Here g(x) = ‖x‖_X and p(y) = ‖y‖_Y. If both X and Y are Banach spaces with smooth norms then ∂g(x*) = {∇g(x*)} and ∂p(Ax*) = {∇p(Ax*)} (x* ≠ 0). Therefore (13) can be written in the following form:
∇g(x*) = (1/‖A‖) A*(∇p(Ax*)).  (14)
We can consider (14) as an equation for locating the element x*. Now let X and Y be Hilbert spaces. We have ∇g(x) = x/‖x‖_X for x ≠ 0, and ∇p(y) = y/‖y‖_Y for y ≠ 0. Thus equation (14) has the form (note that ‖x*‖_X = 1)
x* = (1/‖A‖) A*(Ax*/‖Ax*‖_Y),
or equivalently
‖A‖ ‖Ax*‖_Y x* = (A*A)(x*).  (15)
If x* is a point satisfying (15) then ‖Ax*‖_Y = ‖A‖, and we have the necessary condition for optimality: x* is an eigenvector of the operator A*A with the eigenvalue ‖A‖². Let us now show that this condition is sufficient for optimality. If ‖x*‖_X = 1 and ‖A‖²x* = (A*A)(x*) then
‖ ‖A‖²x* ‖_X = ‖A‖² = ‖(A*A)(x*)‖_X ≤ ‖A*‖ ‖Ax*‖_Y.
Since ‖A*‖ = ‖A‖ we have ‖A‖ ≤ ‖Ax*‖_Y; since ‖x*‖_X = 1 we also have ‖Ax*‖_Y ≤ ‖A‖, so ‖A‖ = ‖Ax*‖_Y. Thus we have the following result.
Theorem 12.2 Let X and Y be Hilbert spaces and A : X → Y a bounded linear operator. Then a point x* ∈ X has the properties (12) if and only if x* is an eigenvector of the self-adjoint operator A*A with eigenvalue ‖A‖².
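Theorem 12.2 can be illustrated numerically in the finite-dimensional Euclidean case, where ‖A‖ is the spectral norm and A*A = AᵀA. The data below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# Spectral decomposition of the self-adjoint operator A*A (here A.T @ A).
eigvals, eigvecs = np.linalg.eigh(A.T @ A)
x_star = eigvecs[:, -1]          # unit eigenvector for the largest eigenvalue
lam = eigvals[-1]                # largest eigenvalue of A*A

op_norm = np.linalg.norm(A, 2)   # operator (spectral) norm ||A||

assert np.isclose(lam, op_norm**2)                        # eigenvalue = ||A||^2
assert np.isclose(np.linalg.norm(A @ x_star), op_norm)    # norm attained at x*
```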
References
[1] J.-P. Aubin and I. Ekeland, Applied Nonlinear Analysis, Wiley, New York, 1984.
[2] A. Barbara and J.-P. Crouzeix, Concave gauge functions and applications, Zeitschrift für Operations Research 40 (1994) 43-74.
[3] G. Birkhoff, Lattice Theory, American Mathematical Society, Providence, R.I., 2nd edition, 1948.
[4] F. H. Clarke, Optimization and Nonsmooth Analysis, Wiley, New York, 1983.
[5] B. D. Craven, D. Ralph and B. M. Glover, Small convex-valued subdifferentials in mathematical programming, to appear in Optimization (1994).
[6] V. Demyanov and A. M. Rubinov, Introduction to Constructive Nonsmooth Analysis, to appear 1995.
[7] I. Ekeland and R. Temam, Convex Analysis and Variational Problems, North-Holland, Amsterdam, 1976.
[8] F. Giannessi, Semidifferentiable functions and necessary optimality conditions, Journal of Optimization Theory and Applications (1989) 191-241.
[9] J.-B. Hiriart-Urruty, From convex to nonconvex minimization: Necessary and sufficient conditions for global optimality, in Nonsmooth Optimization and Related Topics, Plenum, New York, (1990) 219-240.
[10] J.-B. Hiriart-Urruty and C. Lemarechal, Testing necessary and sufficient conditions for global optimality in the problem of maximizing a convex quadratic function over a convex polyhedron, Preliminary Report, University of Paul Sabatier, Toulouse, (1990).
[11] R. Horst and H. Tuy, Global Optimization, Springer-Verlag, Berlin, 1990.
[12] A. D. Ioffe and V. M. Tikhomirov, Theory of Extremal Problems, Nauka, Moscow, 1974.
[13] V. Jeyakumar and B. M. Glover, Nonlinear extensions of Farkas' lemma with applications to global optimization and least squares, to appear in Mathematics of Operations Research (1995).
[14] S. Komlosi and M. Pappalardo, A general scheme for first order approximations in optimization, Optimization Methods and Software 3 (1994) 143-152.
[15] M. I. Levin, V. L. Makarov and A. M. Rubinov, Mathematical Models of Economic Interaction, Nauka, Moscow (in Russian), 1993.
[16] P. Michel and J.-P. Penot, A generalized derivative for calm and stable functions, Differential and Integral Equations 5 (2) (1992) 433-454.
[17] M. Morishima, Equilibrium, Stability and Growth, Clarendon Press, Oxford, 1964.
[18] H. Nikaido, Convex Structures and Economic Theory, Academic Press, New York, 1968.
[19] P. M. Pardalos and J. B. Rosen, Constrained Global Optimization: Algorithms and Applications, Lecture Notes in Computer Science 268, Springer-Verlag, Berlin, 1987.
[20] R. T. Rockafellar, Convex Analysis, Princeton University Press, 1970.
[21] A. M. Rubinov, Differences of convex compact sets and their applications in nonsmooth analysis, in Nonsmooth Optimization: Methods and Applications, F. Giannessi (ed.), Gordon and Breach, Amsterdam, (1992) 379-391.
[22] A. M. Rubinov and A. Yagubov, The space of star-shaped sets and its applications in nonsmooth optimization, Mathematical Programming Study 29 (1986) 176-202.
[23] R. M. Solow and P. A. Samuelson, Balanced growth under constant returns to scale, Econometrica 20 (1953) 412-424.
[24] H. Tuy, D.C. optimization: theory, methods and algorithms, Hanoi Institute of Mathematics Preprint 1993, to appear in Handbook of Global Optimization, R. Horst and P. M. Pardalos (eds.), Kluwer, 1994.
Recent Advances in Nonsmooth Optimization, pp. 381-391 Eds. D.-Z. Du, L. Qi and R..S. Womersley ©1995 World Scientific Publishing Co Pte Ltd
On Regularized Duality In Convex Optimization
Andrzej Ruszczyński
International Institute for Applied Systems Analysis, 2361 Laxenburg, Austria
Abstract
The general convex programming problem, under constraint qualification, is shown to be equivalent to a non-zero sum game in which objectives of the players are obtained by partial regularization of the Lagrangian function. Based on that, a solution method is developed in which the players improve their decisions while anticipating the steps of their opponents. Convergence of the method is proved and application to decomposable problems is discussed.
1
Introduction
Let f : ℝⁿ → ℝ and g_i : ℝⁿ → ℝ, i = 1,…,m, be convex functions, let X ⊂ ℝⁿ be a convex closed set, and let b ∈ ℝᵐ. We consider the convex programming problem
min f(x)  (1)
g_i(x) ≤ b_i,  i = 1,…,m,  (2)
x ∈ X.  (3)
Associated with problem (1)-(3) is the Lagrangian L : ℝⁿ × ℝᵐ → ℝ defined as
L(x,y) = f(x) + Σ_{i=1}^m y_i (g_i(x) − b_i),  (4)
where y ∈ Y = ℝᵐ₊ is the vector of dual variables. Throughout this paper we shall assume that the following condition holds.
Constraint Qualification Condition. There exists x⁰ ∈ ri X such that g_i(x⁰) < b_i, i = 1,…,m.
It is well known (see, e.g., [12]) that the following proposition is true.
Proposition 1.1 Assume that the Constraint Qualification Condition is satisfied. Then a point x̂ ∈ X is a solution of (1)-(3) iff there exists ŷ ∈ Y such that the pair (x̂, ŷ) is a saddle point of the Lagrangian (4) on X × Y, i.e.
L(x̂, y) ≤ L(x̂, ŷ) ≤ L(x, ŷ),  ∀x ∈ X, ∀y ∈ Y.  (5)
This is the starting point of our considerations; we shall aim at developing a new approach to constrained nonsmooth optimization problems based on a saddle point procedure. There were many attempts to solve optimization problems via saddle point seeking methods; the simplest algorithm (see, e.g., [1]) has the form
x^{k+1} = Π_X(x^k − τ_k L_x(x^k, y^k)),
y^{k+1} = Π_Y(y^k + τ_k L_y(x^k, y^k)),  k = 1,2,…,
where L_x(x^k, y^k) and L_y(x^k, y^k) are some subgradients of L at (x^k, y^k) with respect to x and y, and Π_X(·) and Π_Y(·) denote orthogonal projections on X and Y, respectively. Such methods are convergent only under special conditions (like strict convexity-concavity) and with special stepsizes for primal and dual updates: τ_k → 0, Σ_{k=0}^∞ τ_k = ∞ (cf. [10]). One possibility to overcome these difficulties is the use of the proximal point method [9, 14]. Its idea is to replace (5) by a sequence of saddle-point problems for the regularized functions
Λ_k(ξ, η) = L(ξ, η) + (ρ/2)‖ξ − x^k‖² − (ρ/2)‖η − y^k‖².  (6)
A saddle point (ξ^k, η^k) of Λ_k is substituted for (x^{k+1}, y^{k+1}) at the next iteration, etc. A variation of this approach is the alternating direction method [5, 2]. We are going to develop an iterative method for (5) which does not have saddle-point subproblems. The key idea, which generalizes and simplifies the concept used for linear programming in a recent work [6], is to replace the regularized function (6) by two convex-concave functions: a primal and a dual one, and to make steps in x and in y using subgradients of these functions. We shall develop the basic concept in section 2, and in section 3 we describe the method. Next, in section 4 we prove its convergence to a saddle point of L. Finally, in section 5 we discuss the application of this approach to some convex optimization problems of special structure.
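For illustration, the simplest algorithm above can be sketched on a toy problem, min x² subject to x ≥ 1, so that L(x,y) = x² + y(1 − x) with X = ℝ, Y = ℝ₊ and saddle point (1, 2). This hypothetical example is strongly convex in x — one of the special cases in which the plain iteration does converge — and the constant stepsize is an illustrative choice:

```python
# Projected subgradient (here: gradient) saddle point iteration on
# L(x, y) = x^2 + y*(1 - x); saddle point (x*, y*) = (1, 2).
x, y = 0.0, 0.0
tau = 0.05                          # small constant step; L is smooth here
for k in range(2000):
    gx = 2 * x - y                  # L_x(x, y)
    gy = 1 - x                      # L_y(x, y)
    x = x - tau * gx                # Pi_X is the identity (X = R)
    y = max(0.0, y + tau * gy)      # Pi_Y projects onto R_+

assert abs(x - 1) < 1e-3 and abs(y - 2) < 1e-3
```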
For a convex set X ⊂ ℝⁿ, the cone of feasible directions at x ∈ X is denoted by K_X(x) = {d ∈ ℝⁿ : ∃(τ > 0) x + τd ∈ X}. The conjugate (negative of the polar) of a cone K ⊂ ℝⁿ is defined to be K* = {d ∈ ℝⁿ : ∀(z ∈ K) ⟨d, z⟩ ≥ 0}. For a convex-concave function L : ℝⁿ × ℝᵐ → ℝ we use ∂_x L(x,y) and ∂_y L(x,y) to denote its subdifferentials with respect to x and y. Elements of these subdifferentials (subgradients) will be denoted by L_x(x,y) and L_y(x,y).
2
Regularized Duality
Let us define a non-zero sum game with two players: P and D. The objective of P is to minimize in the variables x ∈ X the regularized primal function:
P(x,y) = max_{η∈Y} [L(x,η) − (ρ/2)‖η − y‖²],  (7)
where ρ > 0 is some parameter. The objective of D is to maximize with respect to the variables y ∈ Y the regularized dual function:
D(x,y) = min_{ξ∈X} [L(ξ,y) + (ρ/2)‖ξ − x‖²].  (8)
A Nash equilibrium of the game is defined as a point (x̄, ȳ) ∈ X × Y such that
x̄ ∈ argmin_{x∈X} P(x, ȳ),  (9)
and
ȳ ∈ argmax_{y∈Y} D(x̄, y).  (10)
We define the proximal mappings ξ(x,y) and η(x,y) as the solutions of the subproblems in (8) and (7), respectively. We also introduce the error function
Δ(x,y) = ‖ξ(x,y) − x‖² + ‖η(x,y) − y‖²,
and the regularized duality gap
E(x,y) = L(x, η(x,y)) − L(ξ(x,y), y).
They satisfy the following relations.
Lemma 2.1 For all x ∈ X and y ∈ Y,
E(x,y) ≥ ρ Δ(x,y).
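Lemma 2.1 can be checked numerically on a toy affine Lagrangian over nonnegative orthants, for which both proximal mappings have componentwise closed forms; all data below are hypothetical:

```python
import numpy as np

# L(x, y) = c^T x + y^T (A x - b) on X = R^n_+, Y = R^m_+.
rng = np.random.default_rng(2)
n, m, rho = 3, 2, 0.5
c, b = rng.standard_normal(n), rng.standard_normal(m)
A = rng.standard_normal((m, n))
L = lambda x, y: c @ x + y @ (A @ x - b)

for _ in range(100):
    x, y = np.abs(rng.standard_normal(n)), np.abs(rng.standard_normal(m))
    xi = np.maximum(0.0, x - (c + A.T @ y) / rho)    # xi(x, y), argmin of (8)
    eta = np.maximum(0.0, y + (A @ x - b) / rho)     # eta(x, y), argmax of (7)
    E = L(x, eta) - L(xi, y)                         # regularized duality gap
    delta = np.sum((xi - x) ** 2) + np.sum((eta - y) ** 2)
    assert E >= rho * delta - 1e-9                   # E(x, y) >= rho * Delta(x, y)
```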
Proof. By the definition of ξ = ξ(x,y), there exists a subgradient L_x(ξ,y) such that
L_x(ξ,y) + ρ(ξ − x) ∈ K*_X(ξ).  (11)
As x − ξ ∈ K_X(ξ), we have
L(x,y) − L(ξ,y) ≥ ⟨L_x(ξ,y), x − ξ⟩ ≥ ρ‖ξ − x‖².
In a symmetric way, from the definition of η = η(x,y) it follows that
L(x,y) − L(x,η) ≤ ⟨L_y(x,η), y − η⟩ ≤ −ρ‖η − y‖².
Subtracting the last two inequalities, we obtain the required result. □
We can now prove the equivalence of (5) and our game.
Theorem 2.2 The following statements are equivalent:
(a) (x̄, ȳ) is a Nash equilibrium of the game (9)-(10);
(b) E(x̄, ȳ) = 0;
(c) Δ(x̄, ȳ) = 0;
(d) (x̄, ȳ) is a saddle point of L over X × Y.
Proof. We denote ξ̄ = ξ(x̄, ȳ) and η̄ = η(x̄, ȳ).
(a)⇒(b). Since ρ > 0, the function η(x,y) is continuous. Therefore ∂_x P(x̄, ȳ) = ∂_x L(x̄, η(x̄, ȳ)). Using this equality in the optimality conditions for (9), we deduce that there exists a subgradient L_x(x̄, η̄) ∈ K*_X(x̄). Thus
L(ξ̄, η̄) − L(x̄, η̄) ≥ ⟨L_x(x̄, η̄), ξ̄ − x̄⟩ ≥ 0.
Analogously, the optimality conditions for (10) yield −L_y(ξ̄, ȳ) ∈ K*_Y(ȳ) for some subgradient L_y(ξ̄, ȳ), so
L(ξ̄, ȳ) − L(ξ̄, η̄) ≥ 0.
Adding the last two inequalities we obtain E(x̄, ȳ) ≤ 0. Since E(x,y) is always non-negative, (b) follows.
(b)⇒(c). The result follows immediately from Lemma 2.1.
(c)⇒(d). Since Δ(x̄, ȳ) = 0, one has ξ̄ = x̄ and η̄ = ȳ. By (11), L_x(x̄, ȳ) ∈ K*_X(x̄) for some L_x(x̄, ȳ). This is equivalent to the right inequality in (5). Similarly, −L_y(x̄, ȳ) ∈ K*_Y(ȳ) for some L_y(x̄, ȳ), which completes the proof of (d).
(d)⇒(a). The left inequality in (5) implies
L(x̄, ȳ) = max_{η∈Y} L(x̄, η) = max_{η∈Y} [L(x̄, η) − (ρ/2)‖η − ȳ‖²] = P(x̄, ȳ).
On the other hand, for every x ∈ X, from the right inequality in (5) we get
L(x̄, ȳ) ≤ L(x, ȳ) ≤ max_{η∈Y} [L(x, η) − (ρ/2)‖η − ȳ‖²] = P(x, ȳ).
Consequently, P(x̄, ȳ) ≤ P(x, ȳ) for all x ∈ X. In the same manner we prove D(x̄, ȳ) ≥ D(x̄, y) for all y ∈ Y. □
In convex programming the regularized Lagrangian functions take on well-known forms. The regularized primal function is the augmented Lagrangian (cf. [13]) for (1)-(3):
P(x,y) = f(x) + (1/(2ρ)) Σ_{i=1}^m [max(0, g_i(x) − b_i + ρy_i)]² − (ρ/2) Σ_{i=1}^m y_i².  (12)
The regularized dual function is the augmented Lagrangian for the dual problem,
D(x,y) = min_{ξ∈X} [f(ξ) + Σ_{i=1}^m y_i (g_i(ξ) − b_i) + (ρ/2)‖ξ − x‖²].  (13)
Consequently, Theorem 2.2 shows that a solution of the convex programming problem and the associated dual vector can be obtained as a Nash equilibrium of a game in which augmented Lagrangian functions serve as players' objectives. It appears to be a step backwards: games are generally more difficult than optimization problems, but our game exhibits regularities that can be exploited by the solution procedure.
3
The Partial Regularization Method
Let us now describe in detail a method for finding a saddle point of L. It is, in fact, a subgradient algorithm for solving the game (9)-(10). It can also be interpreted as a method operating on the original saddle problem in which both players try to predict the moves of their opponents to calculate the best response.
Initialization. Choose x⁰ ∈ X, y⁰ ∈ Y and γ ∈ (0,2). Set k = 0.
Prediction. Calculate η^k = η(x^k, y^k) and ξ^k = ξ(x^k, y^k).
Stopping test. If Δ(x^k, y^k) = 0, then stop.
Direction finding. Find subgradients L_x(x^k, η^k) and L_y(ξ^k, y^k) and define
d_x^k = Π_{C_X^k}(−L_x(x^k, η^k)),  d_y^k = Π_{C_Y^k}(L_y(ξ^k, y^k)),
where C_X^k and C_Y^k are closed convex cones such that C_X^k ⊇ K_X(x^k) and C_Y^k ⊇ K_Y(y^k).
Stepsize calculation. Determine
τ_k = γ E(x^k, y^k)/‖d^k‖²,  where d^k = (d_x^k, d_y^k).  (14)
Step. Update the points
x^{k+1} = Π_X(x^k + τ_k d_x^k),  y^{k+1} = Π_Y(y^k + τ_k d_y^k),
increase k by one and go to Prediction.
Let us stress that, contrary to proximal point methods, our approach does not have saddle point subproblems. Instead of them, two auxiliary optimization problems are solved at the prediction step. Our method resembles in some way the extragradient method of [8], but our prediction step uses proximal operators, not just a linear Jacobi step. Owing to that, we can solve nonsmooth problems. We also have a constructive stepsize rule. It should be stressed that projections on C_X^k and C_Y^k are optional; we can always use C_X^k = ℝⁿ and C_Y^k = ℝᵐ. Still, the use of C_X^k = cl K_X(x^k), C_Y^k = cl K_Y(y^k) is easy in some classes of problems (like polyhedral ones) and yields larger stepsizes, because removal of the normal component may substantially decrease direction lengths.
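The method can be sketched on a small linear program (hypothetical data: min 2x₁ + x₂ subject to x₁ + x₂ ≥ 1, x ≥ 0, with unique saddle point x* = (0,1), y* = 1). For an affine Lagrangian both proximal mappings have closed forms, and we take C_X^k = ℝⁿ, C_Y^k = ℝᵐ:

```python
import numpy as np

c = np.array([2.0, 1.0])
A = np.array([[-1.0, -1.0]])       # constraint written as A x <= b
b = np.array([-1.0])

rho, gamma = 1.0, 1.0
x, y = np.zeros(2), np.zeros(1)
L = lambda x, y: c @ x + y @ (A @ x - b)   # Lagrangian (4)

for k in range(5000):
    # Prediction step: both proximal mappings have closed forms here.
    eta = np.maximum(0.0, y + (A @ x - b) / rho)     # eta(x, y)
    xi = np.maximum(0.0, x - (c + A.T @ y) / rho)    # xi(x, y)
    E = L(x, eta) - L(xi, y)                         # regularized duality gap
    if E < 1e-12:
        break
    # Direction finding (full-space cones) and stepsize (14).
    dx = -(c + A.T @ eta)
    dy = A @ xi - b
    tau = gamma * E / (dx @ dx + dy @ dy)
    x = np.maximum(0.0, x + tau * dx)
    y = np.maximum(0.0, y + tau * dy)
# Iterates converge toward x* = (0, 1), y* = 1.
```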
4
Convergence
To avoid obscuring the main idea, we shall now prove convergence of the method in its basic form, presented in the previous section. Various modifications and extensions will be discussed after the proof.
Theorem 4.1 Assume that a saddle point of L on X × Y exists. Then the method generates a sequence {(x^k, y^k)} convergent to a saddle point of L on X × Y.
Proof. Let (x*, y*) be a saddle point of L on X × Y. We define
W_k = ‖x^k − x*‖² + ‖y^k − y*‖².  (15)
Our proof uses the general line of argument developed for iterative methods based on abstract Fejér mappings (see Eremin and Astafiev [3] and Polyak [11]). We shall prove that our algorithmic mapping decreases the distance W_k whenever (x^k, y^k) is not a solution.
At first we establish a descent property of the directions (d_x^k, d_y^k). Using the formula h = Π_C(h) + Π_{−C*}(h), which holds for any closed convex cone C, with h = −L_x(x^k, η^k) and C = C_X^k, we obtain
−L_x(x^k, η^k) = d_x^k + Π_{−(C_X^k)*}(−L_x(x^k, η^k)).
Multiplying both sides of this equation by x* − x^k ∈ K_X(x^k) ⊆ C_X^k we get the inequality
⟨d_x^k, x* − x^k⟩ ≥ ⟨L_x(x^k, η^k), x^k − x*⟩ ≥ L(x^k, η^k) − L(x*, η^k).
Likewise,
⟨d_y^k, y* − y^k⟩ ≥ L(ξ^k, y*) − L(ξ^k, y^k).
By the saddle point conditions (5),
L(ξ^k, y*) ≥ L(x*, η^k).
Adding the last three inequalities we obtain:
⟨d_x^k, x* − x^k⟩ + ⟨d_y^k, y* − y^k⟩ ≥ L(x^k, η^k) − L(ξ^k, y^k) = E(x^k, y^k).  (16)
This implies, in particular, that d^k = (d_x^k, d_y^k) ≠ 0, since otherwise one would have E(x^k, y^k) = 0 and, by Theorem 2.2, the algorithm would stop. Therefore the stepsize (14) is well defined. Since the projection on X is non-expansive,
‖x^{k+1} − x*‖² ≤ ‖x^k + τ_k d_x^k − x*‖² = ‖x^k − x*‖² + 2τ_k⟨d_x^k, x^k − x*⟩ + τ_k²‖d_x^k‖².
In a similar way,
‖y^{k+1} − y*‖² ≤ ‖y^k − y*‖² + 2τ_k⟨d_y^k, y^k − y*⟩ + τ_k²‖d_y^k‖².
Adding the last two inequalities and using (16) we conclude that
W_{k+1} ≤ W_k − 2τ_k E_k + τ_k²‖d^k‖²,  (17)
with E_k = E(x^k, y^k). Substituting (14) we get
W_{k+1} ≤ W_k − γ(2 − γ) E_k²/‖d^k‖².  (18)
Thus the sequence {W_k} is non-increasing and
lim_{k→∞} E_k²/‖d^k‖² = 0.  (19)
Since W_k is bounded, the sequence {(x^k, y^k)} has an accumulation point (x̃, ỹ). Thus {d^k} is bounded and, by (19), lim_{k→∞} E_k = 0. Therefore E(x̃, ỹ) = 0. By Theorem 2.2, (x̃, ỹ) is a saddle point of L and we can use it instead of (x*, y*) in (15). Then, from (18) we see that the distance to (x̃, ỹ) is non-increasing. Consequently, (x̃, ỹ) is the only accumulation point of the sequence {(x^k, y^k)}. □
It is clear from the proof that we may replace the stepsize rule (14) with the more flexible requirement
λρΔ_k/‖d^k‖² ≤ τ_k ≤ γ(L(x^k, η^k) − L(ξ^k, y^k))/‖d^k‖²,
with Δ_k = Δ(x^k, y^k) and 0 < λ ≤ γ < 2. Indeed, (17) implies
W_{k+1} ≤ W_k − λ(2 − γ)ρ² Δ_k²/‖d^k‖².  (20)
The rest of the proof is the same, but with Δ_k instead of E_k. We can also have iteration-dependent parameters 0 < ρ_k ≤ ρ and 0 < λ_k ≤ γ_k < 2, provided that Σ_{k=0}^∞ λ_k(2 − γ_k)ρ_k² = ∞, because (20) still implies lim inf_{k→∞} Δ_k = 0. Finally, it should be stressed that instead of proximal operators in the prediction steps we can use more general mappings with similar properties (see [7] for how to modify the proofs in this case). We chose to present the idea with the use of quadratic regularizations just for simplicity, to avoid obscuring it with technical details.
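The per-iteration decrease (18) can be spot-checked numerically from random points on a toy linear program with a known saddle point; all data below are hypothetical:

```python
import numpy as np

# min 2*x1 + x2 s.t. x1 + x2 >= 1, x >= 0; saddle point x* = (0, 1), y* = 1.
rng = np.random.default_rng(3)
c, A, b = np.array([2.0, 1.0]), np.array([[-1.0, -1.0]]), np.array([-1.0])
xs, ys = np.array([0.0, 1.0]), np.array([1.0])
rho, gamma = 1.0, 0.5
L = lambda x, y: c @ x + y @ (A @ x - b)

for _ in range(100):
    x, y = np.abs(rng.standard_normal(2)), np.abs(rng.standard_normal(1))
    eta = np.maximum(0.0, y + (A @ x - b) / rho)     # prediction step
    xi = np.maximum(0.0, x - (c + A.T @ y) / rho)
    E = L(x, eta) - L(xi, y)
    dx, dy = -(c + A.T @ eta), A @ xi - b            # directions, full-space cones
    nd2 = dx @ dx + dy @ dy
    tau = gamma * E / nd2                            # stepsize (14)
    x1, y1 = np.maximum(0.0, x + tau * dx), np.maximum(0.0, y + tau * dy)
    W0 = np.sum((x - xs) ** 2) + np.sum((y - ys) ** 2)
    W1 = np.sum((x1 - xs) ** 2) + np.sum((y1 - ys) ** 2)
    assert W1 <= W0 - gamma * (2 - gamma) * E**2 / nd2 + 1e-9   # inequality (18)
```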
5
Application to Decomposable Problems
Let us now consider decomposable problems of the form
min Σ_{j=1}^n f_j(x_j)  (21)
Σ_{j=1}^n g_{ij}(x_j) ≤ b_i,  i = 1,…,m,  (22)
x_j ∈ X_j,  j = 1,…,n.  (23)
We assume that the functions f_j and g_{ij} are convex and the sets X_j are convex and closed. As usual, we introduce multipliers y ∈ ℝᵐ₊ and the Lagrangian
L(x,y) = Σ_{j=1}^n f_j(x_j) + Σ_{i=1}^m y_i (Σ_{j=1}^n g_{ij}(x_j) − b_i).
Our method, when applied to this problem, takes a rather simple form.
Indeed, the prediction step in the dual variables can be carried out analytically, separately for each constraint:
η_i(x, y_i) = max(0, (1/ρ)(Σ_{j=1}^n g_{ij}(x_j) − b_i) + y_i),  i = 1,…,m.  (24)
The resulting regularized primal function (12) is the augmented Lagrangian for (21)-(23):
P(x,y) = Σ_{j=1}^n f_j(x_j) + (1/(2ρ)) Σ_{i=1}^m [max(0, Σ_{j=1}^n g_{ij}(x_j) − b_i + ρy_i)]² − (ρ/2) Σ_{i=1}^m y_i².
Consequently, the update of the primal variables is a projected subgradient step for the augmented Lagrangian function. It is clearly decomposable. Note that in a related work [15] of ours, we used here a whole sequence of nonlinear Jacobi-type steps. The dual function (13) takes on the additive form
D(x,y) = Σ_{j=1}^n D_j(x_j, y) − bᵀy
with
D_j(x_j, y) = min_{ξ_j∈X_j} [f_j(ξ_j) + Σ_{i=1}^m y_i g_{ij}(ξ_j) + (ρ/2)‖ξ_j − x_j‖²],  j = 1,…,n.  (25)
The minimizers ξ_j^k are used in the dual update, which is just an under-relaxed step of the multiplier method, very similar to (24):
y_i^{k+1} = max(0, τ_k(Σ_{j=1}^n g_{ij}(ξ_j^k) − b_i) + y_i^k),  i = 1,…,m.
In some cases, subproblems (25) can be quite easy to solve. The simplest example is the standard linear programming problem with f_j(x_j) = c_j x_j, g_{ij}(x_j) = a_{ij} x_j, and X_j = [l_j, u_j]. Then (25) has a closed-form solution, which can be calculated in parallel for each j = 1,…,n. It is worth noting that the regularized dual function D(x,y) becomes the augmented Lagrangian function for the dual problem. Properties of our method in the case of linear programming are analyzed in detail in [6], with limit properties of the stepsizes τ_k, with the analysis of the rate of convergence, and with some numerical results. In fact, the highly encouraging properties discovered in [6] and analysed in a series of papers [16], [7] and [4] motivated the research reported in the present paper.
Acknowledgement. The author is greatly indebted to Markku Kallio, earlier cooperation with whom provided an impulse for this work. Thanks are also offered to Sjur Flam for many helpful comments.
References
[1] K. J. Arrow, L. Hurwicz and H. Uzawa, Studies in Linear and Nonlinear Programming (Stanford University Press, Stanford, 1958).
[2] J. Eckstein and D. P. Bertsekas, On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators, Mathematical Programming 55 (1992) 293-318.
[3] I. I. Eremin and N. N. Astafiev, Introduction to the Theory of Linear and Convex Programming (Nauka, Moscow, 1976).
[4] S. D. Flam and A. Ruszczynski, Noncooperative convex games: computing equilibrium by partial regularization, working paper WP-94-42, IIASA, Laxenburg, 1994.
[5] D. Gabay, Application de la methode des multiplicateurs aux inequations variationelles, in: M. Fortin and R. Glowinski (eds.), Methodes de Lagrangien Augmente (Dunod, Paris, 1982) 279-307.
[6] M. Kallio and A. Ruszczynski, Parallel solution of linear programs via Nash equilibria, working paper WP-94-15, IIASA, Laxenburg, 1994.
[7] M. Kallio and A. Ruszczynski, Perturbation methods for saddle point computation, working paper WP-94-38, IIASA, Laxenburg, 1994.
[8] G. M. Korpelevich, The extragradient method for finding saddle points and other problems, Ekonomika i Matematicheskie Metody 12 (1976) 747-756.
[9] B. Martinet, Regularisation d'inequations variationelles par approximations successives, Rev. Francaise Inf. Rech. Oper. 4 (1970) 154-159.
[10] A. S. Nemirovski and D. B. Yudin, Cesaro convergence of the gradient method for approximation of saddle points of convex-concave functions, Doklady AN SSSR 239 (1978) 1056-1059.
[11] B. T. Polyak, Minimization of nonsmooth functionals, Zhurnal Vychislitelnoi Matematiki i Matematicheskoi Fiziki 9 (1969) 509-521.
[12] R. T. Rockafellar, Convex Analysis (Princeton University Press, Princeton, 1970). [13] R. T. Rockafellar, Augmented Lagrangians and applications of the proximal point algorithm in convex programming, Mathematics of Operations Research 1 (1976) 97-116.
[14] R. T. Rockafellar, Monotone operators and the proximal point algorithm, SIAM Journal on Control and Optimization 14 (1976) 877-898.
[15] A. Ruszczynski, Augmented Lagrangian decomposition for sparse convex optimization, working paper WP-92-75, IIASA, Laxenburg, 1992 (to appear in Mathematics of Operations Research).
[16] A. Ruszczynski, A partial regularization method for saddle point seeking, working paper WP-94-20, IIASA, Laxenburg, 1994.
Recent Advances in Nonsmooth Optimization, pp. 392-404 Eds. D.-Z. Du, L. Qi and R.S. Womersley ©1995 World Scientific Publishing Co Pte Ltd
An Interior Point Method for Solving a Class of Linear-Quadratic Stochastic Programming Problems*
Jie Sun, Kwan Eng Wee and Jishan Zhu
Department of Decision Sciences, National University of Singapore, 10 Kent Ridge Crescent, 0511, Singapore
Abstract
The quadratically convergent polynomial algorithm of Ye and Anstreicher is suggested for solving a class of two-stage stochastic programs in which both the present cost function and the recourse problem are linear-quadratic. Such stochastic programs, although nonsmooth in nature, can be reduced to a linear complementarity problem with a special structure. The proposed algorithm takes advantage of this structure and performs well in computational tests.
1
Introduction
An important source of nonsmooth optimization problems is stochastic programming. A two-stage stochastic programming model can be briefly formulated as follows. At the first (current) stage, a decision x ∈ ℝⁿ has to be made, incurring a direct cost φ(x), subject to x ∈ X, where X is a closed set. At the second (future) stage, a random event is observed with outcome ω ∈ Ω, where Ω is a probability space. The decision x and the outcome ω then determine an additional cost ψ_ω(x). Our task is to make the
*This research is partially supported by grants RP-920068 and RP-930033 of the National University of Singapore.
best decision i "here and now" with respect to present cost and constraints as well as the expected cost Ea[il>u(x)] and certain induced constrains. The mathematical form of this model is: minimize <j>(x) + 23„[^w(a:)], subject to x e X.
(1.1)
The most widely used case arises when $X$ is the nonnegative orthant (i.e. the set $R^n_+ = \{x \in R^n \mid x_j \ge 0,\ j = 1,\dots,n\}$), the present cost is quadratic, $\phi(x) = c^T x + \frac{1}{2}x^T P x$, and the recourse cost has the form
$$\psi_\omega(x) = \sup_{z_\omega \in Z_\omega}\{(h_\omega - T_\omega x)^T z_\omega - \tfrac{1}{2} z_\omega^T H_\omega z_\omega\},$$
(1.2)
where the superscript $T$ represents transpose, $H_\omega$ is symmetric and positive semidefinite, and the vectors $h_\omega$, the matrices $T_\omega$ and $H_\omega$, and the sets $Z_\omega$ are in principle allowed to depend on $\omega$, although a particular application might not involve quite so much generality.
In many practical situations that motivate our model the recourse function is actually a penalty function of the vector $h_\omega - T_\omega x$. For example, in the case of stochastic programming with simple recourse [9], which includes the stochastic transshipment problem [8] as a special case, $\psi_\omega(x)$ is a piecewise linear function of $h_\omega - T_\omega x$. Therefore one might ask why we do not use a more explicit formula than the supremum function (1.2) to designate $\psi_\omega(x)$. It turns out that the formula (1.2) can cover many penalty functions so far used in practice and that it provides a clean duality framework. See the fundamental papers of Rockafellar and Wets [4], [6] for details. Under this framework, problem (1.1) is equivalent to a saddle point problem and it is this saddle point form that opens a door to possible interior point methods for problem (1.1). We may re-write the linear-quadratic case of problem (1.1), according to the discussion above, in the following form:
$$\text{minimize } c^T x + \frac{1}{2}x^T P x + \sum_{\omega\in\Omega} \pi_\omega \psi_\omega(x), \quad \text{subject to } x \in X,$$
(1.3)
where $\pi_\omega$ is the probability of the random event $\omega \in \Omega$. It has been shown that $\psi_\omega(x)$ is in general a convex piecewise linear-quadratic function for every fixed $\omega$, in the sense that the function is convex and its domain is a union of convex polyhedra, on each of which the function is given by a quadratic or an affine formula. Therefore, the function is nonsmooth and problem (1.3) is a nonsmooth convex piecewise quadratic program [4].
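For intuition, the piecewise linear-quadratic character of $\psi_\omega$ can be seen in a tiny separable instance: if $Z_\omega$ is a box $[0, u]^k$ and $H_\omega$ is diagonal and positive, the supremum in (1.2) splits coordinate-wise and has a closed form. The following numpy sketch is our own illustration with hypothetical data, not code from the paper:

```python
import numpy as np

def psi(x, h, T, H_diag, u):
    """psi_omega(x) = sup_{0 <= z <= u} (h - T x)' z - 0.5 z' diag(H_diag) z.
    Each coordinate term r*z - 0.5*H*z^2 is maximized at z = clip(r/H, 0, u)."""
    r = h - T @ x
    z = np.clip(r / H_diag, 0.0, u)
    return float(r @ z - 0.5 * z @ (H_diag * z))

# On each polyhedral piece (r <= 0, 0 <= r <= H*u, r >= H*u) the value is
# 0, r^2/(2H), or r*u - 0.5*H*u^2: affine or quadratic, hence nonsmooth overall.
```

The three branches make the "union of convex polyhedra" structure of the text concrete.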
Let
$$\mathcal{Y} = \prod_{\omega\in\Omega} Z_\omega, \qquad b = \begin{pmatrix} \vdots \\ \pi_\omega h_\omega \\ \vdots \end{pmatrix}, \qquad A = \begin{pmatrix} \vdots \\ \pi_\omega T_\omega \\ \vdots \end{pmatrix}, \qquad Q = \mathrm{diag}(\cdots, \pi_\omega H_\omega, \cdots).$$
Then problem (1.3) can be further generalized to the following form:
$$\text{minimize}_{x\in\mathcal{X}}\ f(x) = c^T x + \frac{1}{2}x^T P x + \sup_{y\in\mathcal{Y}}\{(b - Ax)^T y - \tfrac{1}{2} y^T Q y\},$$
(1.4)
where $\mathcal{X} = R^n_+$ and $\mathcal{Y} = R^m_+$ are nonnegative orthants, $x, c \in R^n$, $y, b \in R^m$, $m = \sum_\omega |Z_\omega|$ ($|Z_\omega|$ is the dimension of $z_\omega$), $A \in R^{m\times n}$, $P \in R^{n\times n}$, $Q \in R^{m\times m}$, and the matrices $P$ and $Q$ are symmetric positive semidefinite.
It should be noted that the assumption of $\mathcal{X}$ being the nonnegative orthant is not a serious restriction. For example, if $X = \{x \in R^n \mid Cx = d,\ x \ge 0\}$, where $C \in R^{l\times n}$ and $d \in R^l$, then we introduce additional vectors $y^1 \ge 0$ and $y^2 \ge 0$, and put an additional term
$$\sup_{y^1, y^2 \in R^l_+} \{(d - Cx)^T(y^1 - y^2)\}$$
in $f(x)$. The new term produces an infinite penalty for the violation of $Cx = d$. By redefining $\mathcal{Y} = \mathcal{Y} \times R^l_+ \times R^l_+$ and redefining $b$, $A$, and $Q$ appropriately, we obtain an equivalent problem on $\mathcal{X} \times \mathcal{Y}$ with $\mathcal{X} = R^n_+$. Therefore, as a preliminary study, we concentrate on the case of $\mathcal{X}$ and $\mathcal{Y}$ being nonnegative orthants in this paper.
Several methods have been proposed to solve problem (1.4), including the finite generation method of Rockafellar and Wets [6], the projected gradient method of Zhu and Rockafellar [13], the steepest descent method of Zhu [12] and the infeasible interior point method of Wright and Ralph [10]. For the special case where both matrices $P$ and $Q$ are diagonal and both $\mathcal{X}$ and $\mathcal{Y}$ are boxes, a simplex-active-set method has been developed [5]. In this paper we explain how the recent predictor-corrector algorithm of Ye and Anstreicher [11] can be applied to problem (1.4). Unlike other algorithms, this algorithm possesses polynomial complexity and a local quadratic convergence rate. Our computational results show that the algorithm is efficient and that the total number of iterations increases insignificantly as the dimension of both the primal and dual problems increases. In Section 2 of this paper we estimate the error of the iterates and state global and local convergence results. Then we study a variant of this method that incorporates the special structure of (1.3), present preliminary results of our computational tests, and conclude the paper in Section 3.
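The role of the extra supremum term can be checked numerically: choosing $y^1$ and $y^2$ proportional to the positive and negative parts of the residual $r = d - Cx$ is always feasible and yields the value $t\|r\|^2$, which is $0$ exactly when $Cx = d$ and grows without bound in $t$ otherwise. A small sketch (our own, with hypothetical data):

```python
import numpy as np

def penalty_value(C, d, x, t):
    """Value of (d - Cx)'(y1 - y2) at y1 = t*max(r,0), y2 = t*max(-r,0),
    which are feasible for any t >= 0; since y1 - y2 = t*r the value is
    t*||d - Cx||^2."""
    r = d - C @ x
    y1, y2 = t * np.maximum(r, 0.0), t * np.maximum(-r, 0.0)
    return float(r @ (y1 - y2))

C = np.array([[1.0, 1.0]])
d = np.array([2.0])
print(penalty_value(C, d, np.array([1.0, 1.0]), 1e6))  # 0.0: Cx = d holds
print(penalty_value(C, d, np.array([0.0, 0.0]), 1e6))  # 4000000.0: unbounded in t
```

Letting $t \to \infty$ shows the supremum is the desired infinite penalty.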
2 The Algorithm and its Convergence Properties
Problem (1.4) has a symmetric dual problem:
$$\text{maximize}_{y\in\mathcal{Y}}\ g(y) = b^T y - \frac{1}{2}y^T Q y - \sup_{x\in\mathcal{X}}\{(A^T y - c)^T x - \frac{1}{2}x^T P x\}.$$
(2.1)
The corresponding saddle function is:
$$l(x,y) = c^T x + \frac{1}{2}x^T P x + b^T y - \frac{1}{2}y^T Q y - y^T A x.$$
(2.2)
It can be seen that the dual problem is large-scale, since it is meant to specify decisions $z_\omega$ with respect to all possible realizations of the random event $\omega$. However, the primal vector $x$, together with the matrix $P$, is likely to be of ordinary size. A strong duality theorem has been established for problems (1.4), (2.1), and (2.2); see [6] and [4]. It states as follows.
The Strong Duality Theorem If both (1.4) and (2.1) are feasible (i.e. there exist $x \in \mathcal{X}$ and $y \in \mathcal{Y}$ such that $f(x) < \infty$ and $g(y) > -\infty$), then both problems have finite optimal values and optimal solutions. In addition, the primal optimal solution $x^*$ and the dual optimal solution $y^*$ form a saddle point $(x^*, y^*)$ of (2.2), and the value $l(x^*, y^*)$ is the common optimal value of (1.4) and (2.1).
According to this theorem, in order to find an optimal solution of (1.4) or (2.1), we only have to find a saddle point of (2.2). Since $l(x,y)$ is a convex-concave function, the necessary and sufficient conditions for $(x,y)$ to be a saddle point of (2.2) on $\mathcal{X}\times\mathcal{Y}$ are
$$-\nabla_x l(x,y) \in N_{\mathcal{X}}(x), \qquad \nabla_y l(x,y) \in N_{\mathcal{Y}}(y),$$
(2.3)
where $N_{\mathcal{X}}(x)$ stands for the normal cone of $\mathcal{X}$ at $x$ and $N_{\mathcal{Y}}(y)$ has a similar meaning. The condition can be equivalently translated into an equation-inequality system
$$Px - A^T y - w = -c$$
$$Ax + Qy - s = b$$
$$w^T x = 0, \quad s^T y = 0, \quad x, w, y, s \ge 0.$$
(2.4)
This is a linear complementarity problem of a special form. Our task is to select a specific interior point method that is suitable for the structure of problem (1.3). To apply an interior point method to problem (2.4), we need an additional assumption.
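The special form can be made explicit by stacking $z = (x, y)$: the first two equations of (2.4) read $(w, s) = Mz + q$ with $M = \begin{pmatrix} P & -A^T \\ A & Q \end{pmatrix}$ and $q = (c, -b)$. The skew off-diagonal blocks cancel in the symmetric part, which is $\mathrm{diag}(P, Q) \succeq 0$, so (2.4) is a monotone LCP. A quick numerical confirmation on random data (our own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 4
B = rng.standard_normal((n, n)); P = B @ B.T       # symmetric PSD
C = rng.standard_normal((m, m)); Q = C @ C.T       # symmetric PSD
A = rng.standard_normal((m, n))

M = np.block([[P, -A.T], [A, Q]])                  # LCP matrix of (2.4)
sym = 0.5 * (M + M.T)                              # the A-blocks cancel here
print(np.linalg.eigvalsh(sym).min() >= -1e-10)     # True: M is monotone
```

Monotonicity is what makes interior point methods applicable to (2.4).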
Assumption 2.0 Problem (2.4) has an interior feasible solution. That is, there is a quadruple $(x, y, s, w) > 0$ such that the first two equations of (2.4) are satisfied.
Under this assumption, problem (2.2), and thus problem (2.4), will have a solution, as shown in the following proposition.
Proposition 2.1 Under Assumption 2.0, problem (2.2) has a saddle point on $\mathcal{X}\times\mathcal{Y}$.
Proof. Let $\mathcal{N}(P)$ and $\mathcal{N}(Q)$ be the null spaces of $P$ and $Q$, respectively. Let $\mathrm{rc}\,\mathcal{X}$ and $\mathrm{rc}\,\mathcal{Y}$ be the recession cones of $\mathcal{X}$ and $\mathcal{Y}$, respectively. According to [4], $x$ is feasible to the primal problem if and only if
$$x \in \mathcal{X} \quad \text{and} \quad b - Ax \in [\mathrm{rc}\,\mathcal{Y} \cap \mathcal{N}(Q)]^\circ,$$
where $S^\circ$ represents the polar cone of the set $S$; that is, $S^\circ = \{p \mid p^T q \le 0 \text{ for all } q \in S\}$. Now we have $\mathrm{rc}\,\mathcal{Y} \cap \mathcal{N}(Q) = \{q \mid Qq = 0,\ q \ge 0\}$. Thus the polar cone is
$$[\mathrm{rc}\,\mathcal{Y} \cap \mathcal{N}(Q)]^\circ = \{p \mid p = Qy - s \ \text{for some } y \in R^m \text{ and some } 0 \le s \in R^m\}.$$
Thus $x > 0$, $s > 0$, and the second equation in (2.4) imply the feasibility of $x$ to problem (1.4). Similarly $y > 0$, $w > 0$, and the first equation in (2.4) imply the feasibility of $y$ to problem (2.1). We conclude that the function (2.2) has a saddle point on $\mathcal{X}\times\mathcal{Y}$ according to the strong duality theorem. □
Let us define the central path of problem (2.4) as the set
$$\{(x(\mu), y(\mu), w(\mu), s(\mu)) \mid \mu > 0\},$$
where $(x(\mu), y(\mu), w(\mu), s(\mu))$ is the solution of the following system:
$$Px - A^T y - w = -c$$
$$Ax + Qy - s = b$$
$$w_j x_j = \mu, \ j = 1,\dots,n, \qquad s_i y_i = \mu, \ i = 1,\dots,m, \qquad x, w, y, s > 0.$$
(2.5)
The proposed algorithm finds a sequence of approximate solutions of (2.5) as $\mu \downarrow 0$, starting from an approximate solution to $(x(\mu_0), y(\mu_0), w(\mu_0), s(\mu_0))$, where $\mu_0$ is the
initial value of $\mu$. In particular, the proposed algorithm performs a one-step Newton's method to get an approximate solution of (2.4) (the predictor step). Accordingly, the parameter $\mu$ is reduced. However, to keep the error estimable, the algorithm then performs a one-step Newton's method to get an approximate solution of (2.5) (the corrector step), so that the new iterate is still close to the central path.
Denote the positive diagonal matrices $\mathrm{diag}(x_1,\dots,x_n)$, $\mathrm{diag}(y_1,\dots,y_m)$, $\mathrm{diag}(w_1,\dots,w_n)$ and $\mathrm{diag}(s_1,\dots,s_m)$ by $X$, $Y$, $W$ and $S$, respectively. To describe the extent of approximation to the central path, we define a proximity function
$$\delta(x,y,w,s,\mu) = \left(\left\|\frac{Wx}{\mu} - e\right\|^2 + \left\|\frac{Sy}{\mu} - e\right\|^2\right)^{1/2},$$
(2.6)
where $e$ is a vector of ones of compatible dimension. With a little abuse of notation, the same $e$ is used no matter what the dimension is. The following result provides an error bound for an approximate solution of (2.5) which satisfies the first two equations of (2.5) but may not satisfy the other equations.
Proposition 2.2 If $x^k > 0$ and $y^k > 0$ satisfy the first two equations of (2.5) together with some $w^k > 0$ and $s^k > 0$, and $\delta(x^k, y^k, w^k, s^k, \mu_k) \le \alpha$, then
$$0 \le f(x^k) - g(y^k) \le (1 + \alpha/\sqrt{n+m})(n+m)\mu_k.$$
Proof. By the definitions of $f(x)$ and $g(y)$ (see (1.4) and (2.1)), we always have $f(x) \ge g(y)$ for all $(x,y) \in \mathcal{X}\times\mathcal{Y}$ (the weak duality). Therefore we only need to prove the second inequality. We have
$$f(x^k) = c^T x^k + \frac{1}{2}(x^k)^T P x^k + \sup_{y\ge 0}\{(b - Ax^k)^T y - \tfrac{1}{2}y^T Q y\}$$
$$= c^T x^k + \frac{1}{2}(x^k)^T P x^k + \sup_{y\ge 0}\{(Qy^k - s^k)^T y - \tfrac{1}{2}y^T Q y\}$$
$$\le c^T x^k + \frac{1}{2}(x^k)^T P x^k + \frac{1}{2}(y^k)^T Q y^k.$$
(2.7)
The last inequality uses the convexity of $y^T Q y$ and the nonnegativity of $(s^k)^T y$. A symmetric argument for the dual problem implies
$$g(y^k) \ge b^T y^k - \frac{1}{2}(y^k)^T Q y^k - \frac{1}{2}(x^k)^T P x^k.$$
(2.8)
Subtracting (2.8) from (2.7) and using the first two equations of (2.5), we have
$$f(x^k) - g(y^k) \le c^T x^k + (x^k)^T P x^k - b^T y^k + (y^k)^T Q y^k = (w^k)^T x^k + (s^k)^T y^k.$$
On the other hand, from $\delta(x^k, y^k, w^k, s^k, \mu_k) \le \alpha$ we have
$$(w^k)^T x^k + (s^k)^T y^k = e^T W^k x^k + e^T S^k y^k = e^T \begin{pmatrix} W^k x^k - \mu_k e \\ S^k y^k - \mu_k e \end{pmatrix} + (m+n)\mu_k \le \alpha\mu_k\sqrt{n+m} + (n+m)\mu_k.$$
Hence we have $f(x^k) - g(y^k) \le (1 + \alpha/\sqrt{n+m})(n+m)\mu_k$. □
(2-9)
The associated direction is the predictor (afnne-scaling) direction if A = 0 and is the corrector (centering) direction if A = 1. Algorithm 2.3 (Ye and Anstreicher [11]) Step 0 (Initialization) Let k = 0. Choose (x°, y°, w°, s°) > 0, fi0 > 0 and 0 < a < 1/4 such that The first two equations of (2.4) are satisfied by (x°,y°, w°,s°), and such that <5(:E 0 ,I/ 0 ,U> 0 ,S 0 ,//O) < Q-
Step 1 For k = 0,1, ■ • •, until \ik < e/[(l + a/y/n + m)(n + m)] (e is the user assigned tolerance), do Step 1.1 Solve (2.9) with x = xk,y = yk,w = wk,s = sk,n = /xt, and A = 0. Denote by Axp, Ayp, Awp, and Asp the resulting directions. Let ,k_(Xk \0
0WAx"\ Yk)\Ayp)'
2a '-^
a
2+
4a||rf*||+tt'
IP Method for Stochastic
Programming
399
x(8) = xk + 9Ax",
y{9) = yk + 9Ayp,
w(9) = wk + 6Aw",
s{9) = sk + 8Asp,
and H{0) = {x(9)Tw(8) + y{6)Ts(9)}l{m
+ n).
This is the predictor step. S t e p 1.2 Solve (2.9) with x = x(9),y = y(8),w = w{9),s = s(9),ft = y,{B) and A = 1, resulting in A i c , Ayc, Awc and A.sc Let xk+l = x(9) + Axc,
yk+1 = y(9) + Ayc,
wk+1 = w{9) + Aiuc,
sk+l = s{8) + Asc,
and fik+\ — l*(9)- This is the corrector step. Update k and go to next iteration of Step 1. Convergence properties of this algorithm, stated in [11] (which uses an earlier result of Ji, Potra, and Huang [1]), are as follows. Theorem 2.4 Assume that problem (2.4) has a strictly complementary solution. That is, there is a solution (x,y,w,s) of (2.4) such that x + w > 0 and y + s > 0. Then 1. (xk,yk,wk,sk) k
k
> 0 f o r all jfc;
k k
2. (x ,y , w ,s ) satisfies the first two equations of (2.4) (Thus xk and yk are feasible to (1.4) and (2.1), respectively, according to the proof of Proposition 2.1); 3. The algorithm has iteration complexity 0(^/m + nL), where L is the input length of (2.4); 4. (xk)Twk
+ (yk)Tsk
-> 0
Q-quadratically.
We make some remarks on this algorithm. Remark 1 There are many available interior path-following methods for linear com plementarity problems. This one seems to be among the best of them in theoretical properties concerning local and global convergence.
400
J. Sun, K. E. Wee and J. Zhu
Remark 2 A serious drawback is that the algorithm needs a starting point near the central path. For a primary study, we can generate random testing problems with the required initial point, as we will do in Section 3. For practical problems, we might select an infeasible version of the algorithm like [3] (by "infeasible" algorithms we mean the algorithms that can start from arbitrary x > 0 and y > 0) or use a standardized approach to construct an initial solution in step 0, see [2J. In general, those standard approaches tend to increase the complexity of the algorithms. Remark 3 Current research indicates that the assumption on the existence of a strictly complementary solution can be removed at the cost of losing a little rate of convergence. In a recent paper [3], Mizuno proposed a predictor-corrector algorithm that does not require this assumption and has a superlinear convergence rate. Remark 4 Note that the steplength 6 can be computed through a formula in Step 1.1. Therefore, unlike the existing algorithms for problem (1.4), the algorithm does not need any line search involving f(x) or g(y).
3
Computational Aspects of the Algorithm
The major computational effort of the algorithm is spent on solving equation system (2.9). Since the matrix Q is extremely large, the key point is how to reduce the amount of work by taking advantage of the block-diagonal structure of Q. To achieve this goal, we re-write system (2.9) as ' WAx + X(PAx - ATAy) = -Xw + X^e SAy + Y{AAx + QAy) = -Sy + A/xe Aw = PAx - A7 Ay As = AAx + QAy. Solving the second equation for Ay and substituting it into the first equation, we get the following equivalent system: ' [(P + X-1W) + AT(Q + Y-1S)-1A]Ax = AT{Q + Y-'Sy'i-s + A/zF-'e) - w + < (Q + Y-1S)Ay = -AAx-s + \fiY-1e T Aw = PAx - A Ay As = AAx + QAy.
XfiX^e (3.1)
The solution of this system consists of three steps. First, we solve for (U, v) the
IP Method for Stochastic
Programming
401
equation system [Q + r- 1 5)(f/, v) = (A, -a +
\v.Y-lc).
Note that Q + Y~lS is positive definite due to y > 0 and s > 0. Since Q + Y^S is block-diagonal, the solution (U, v) can be obtained by decomposing the system into |ft| (the number of elements in ft) small systems and solving them in parallel (in sequential in our implementation). Second, we compute the left-hand side matrix and the right-hand side vectors of the first equation in (3.1) by using (U, v) and solve the resulting equation. Since the dimension of this equation is ordinary and the left-hand side matrix is positive definite due to x > 0 and w > 0, this goal can be achieved even if the left-hand side matrix is dense. Finally, we substitute Ax obtained in the second step into the second equation to get Ay, again by using block-diagonal decomposition of Q + Y~XS, and compute AID and As by using the rest of the equations in (3.1). The block-diagonal structure of matrix Q can be used to save computer memories as well. As a matter of fact, the result of (Q + Y^S^A = U is computed blockwise and multiplied with AT and added to P + X~l W in the same fashion. There is no need to store a large matrix like U. Factorization of a block of Q + Y_1S can be saved for later use in solving the second equation in (3.1). The algorithm has been implemented on a DEC7000/620 computer under the UNIX operating system at the National University of Singapore for solving a sequence of randomly generated problems. The primal vector x has four sizes: n = 10,20,50, and 100. In each group, for all UJ £ ft, the dimension of vectors z^ is identical, which we call the block dimension of the problem. The probabilities i u are also identical. The size of the probability space |ft| in problem (1.3) is so designed that the dimension of vector y in problem (1.4) is 100, 1000, 5000, and 10000, respectively. For instance, if the block dimension is 50 and m = 10000, then the size of the probability space is 10000/50=200. 
The matrix Q in problem (1.4) then have 200 blocks with each block being a 50x50 dense matrix. A total of 32 problems is tested. An initial solution (x°,y°,w°, s°) > 0 is randomly generated together with P, Q, and A and special care is taken to make P and Q positive definite. The right-hand side vectors 6 and c in (2.4) are then computed so that the first two equations of (2.4) are satisfied. Special care is taken so that the initial point satisfies W0x° w e and S0y° w e. Therefore, Ho = 1 is easily chosen to satisfy S(x°, y°, w°, s°, ^ 0 ) < a = 0.25. The stopping criteria is that every x}Wj (j = 1, • ■ • ,n) and every ytst (i = 1, • • •,m) must be less than 10~8 An additional feasibility check is done by computing the 1norms of the vectors Px- ATy-w + c and Ax + Qy-s-b before the algorithm outputs computational results. We find none of them exceeds 7xl0~ 1 0 in all 32 problems.
402
n 10 20 50 100
J. Sun, K. E. Wee and 3. Zhu m=100 cpu time itn no. 3.30E-01 21 5.69E-01 22 2.00E+00 22 6.57E+00 23
m=1000 cpu time itn no. 3.94E+00 25 6.82E+01 25 2.11E+01 25 8.57E+01 33
m=5000 cpu time itn no. 49 4.51E+01 47 7.14E+01 44 2.01E+02 47 6.46E+02
m=10000 cpu time itn no. 56 1.04E+02 1.72E+02 55 5.23E+02 57 1.45E+03 52
Table 1. CPU Time and Number of Iterations with Block Dimension = 20 (itn no. = number of iterations)
n 10 20 50 100
m=100 cpu time itn no. 4.84E-01 17 6.01E-01 14 1.87E+00 16 5.58E+00 18
m=1000 cpu time itn no. 7.89E+00 25 1.18E+01 25 2.85E+01 25 8.71E+01 31
m=5000 cpu time itn no. 7.30E+01 40 1.08E+02 41 2.58E+02 42 6.00E+02 44
m = 10000 cpu time itn no. 1.96E+02 52 2.75E+02 51 7.20E+02 55 1.60E+03 52
Table 2. CPU Time and Number of Iterations with Block Dimension = 50 (itn no. = number of iterations)
The computational results are shown in Tables 1 and 2. It is seen that for fixed block dimension, the CPU time increases with respect to both n and m. Moreover, the number of iterations is dominated by m, which coincides with the theoretical estimate of 0(y/n + mL) w 0{^JmL) (for m » n). However, the increase of num bers of iterations is slower than yjm + n as the size of the problem increases, which is commonly observed in computational experiments of interior point methods for linear programming. Unfortunately, we have not been able to compare the algorithm with other existing methods for two-stage stochastic programming problems because there seems to be a lack of common basis for such a comparison. For example, the results reported in Ruszczyriski [7] is for linear problems with more general recourse functions, while the results in Zhu and Rockafellar [12)[13] are for specially structured optimal control problems with m = n. It is inappropriate to use those test problems to examine an algorithm for quadratic stochastic programming. In summary, our research indicates that interior point methods can be adopted to incorporate the special structure of two-stage linear-quadratic stochastic program ming problems. Since the proposed algorithm has good theoretical properties and
IP Method for Stochastic
Programming
403
its performance is satisfactory in our preliminary computational experiments, we feel that it is worthwhile to pursue further research in this direction.
References [1] J. Ji, F. Potra and S. Huang, A predictor-corrector method for linear comple mentarity problems with polynomial complexity and superlinear convergence, Preprint, Dept. of Math. University of Iowa, Iowa, USA (1991). [2] M. Kojima, M. Megiddo, T. Noma and A. Yoshise, A unified approach to in terior point algorithms for linear complementarity problems, Lecture Notes in Computer Science No. 538, Springer-Verlag, Berlin, Germany (1991). [3] S. Mizuno, A superlinearly convergent infeasible-interior-point algorithm for ge ometrical LCPs without strictly complementary condition, Preprint No. 214 Mathematische Institute der Universitat Wuerzburg, Germany (1994). [4] R. T. Rockafellar, Linear-quadratic programming and optimal control, SIAM Journal on Control and Optimization 25 (1987) 781-814. [5] R. T. Rockafellar and J. Sun, A finite simplex-active-set method for monotropic piecewise quadratic programming, in: D. Du and J. Sun eds. Advances in Op timization and Approximation, Kluwer Academic Publishers, Dordrecht, The Netherlands (1994). [6] R. T. Rockafellar and R. J.-B. Wets, A Lagrangian finite generation technique for solving linear-quadratic problems in stochastic programming, Mathematical Programming Studies 28 (1986) 63-93. [7] A. Ruszczyriski, A regularized decomposition method for minimizing a sum of polyhedral functions, Mathematical Programming 35 (1986) 309-333. [8] J. Sun, K.-H. Tsai and L. Qi, A simplex method for network programs with con vex separable piecewise linear costs and its application to stochastic transship ment problems, in: Network Optimization Problems: Algorithms, Applications and Complexity, D. Du and P. Pardalos eds. World Scientific Publishing Co., London, UK (1993). [9] R. J.-B. Wets, Solving stochastic programs with simple recourse, Stochastics 10 (1983) 219-212.
404
J. Sun, K. E. Wee and J. Zhu
[10] S. Wright and D. Ralph, A superlinear infeasible-interior-point algorithm for monotone complementarity problems, Preprint, MCS-P344-1292, Math and Comp. Sci. Division, Argonne National Laboratory, Argonne, IL, USA (1993). [11] Y. Ye and K. Anstreicher, On quadratic and 0(- v /ni) convergence of a predictorcorrector algorithm for LCP, Mathematical Programming 62 (1993) 537-552. [12] C. Zhu, On the primal-dual steepest descent algorithm for extended linearquadratic programming, Preprint, Dept. of Math Sciences, The Johns Hopkins University, Baltimore, MD, USA (1992). [13] C. Zhu and R. T. Rockafellar, Primal-dual projected gradient algorithms for extended linear-quadratic programming, SIAM Journal on Optimization 3 (1993) 751-783.
A Newton
Method
for Vasiational
Inequality
Problems
405
Recent Advances in Nonsmooth Optimization, pp. 405-417 Eds. D.-Z. Du, L. Qi and R.S. Womersley ©1995 World Scientific Publishing Co Pte Ltd
A Globally Convergent Newton Method for Solving Variational Inequality Problems with Inequality Constraints Kouichi Taji a n d Masao Fukushima 1 Graduate School of Information Science, Takayama, Ikoma, Nara 630-01, Japan
Nara Institute
of Science
and
Technology,
Abstract
Newton method for variational inequality problems can be made globally con vergent by incorporating a line search for a merit function. But the existing methods may not be easy to apply to problems with general nonlinear con straints, because the evaluation of a merit function requires the solution of an optimization problem with such nonlinear constraints. In this paper, we propose a new globally convergent Newton method for solving variational in equality problems with general inequality constraints. The method solves at each iteration an affine variational inequality problem, in which not only the mapping of the problem but also the constraints are linearized. In a line search, it makes use of the merit function recently proposed by the authors, which can be evaluated by solving a convex quadratic minimization problem with linear constraints. Thus each step of the algorithm can be carried out finitely even if the constraints are nonlinear. We show that, when the mapping is strongly monotone, the method is globally convergent to the solution, and that, under some additional assumptions, the rate of convergence is superlinear.
x This work was supported in part by the Scientific Research Grant-in-Aid from the Ministry of Education, Science and Culture, Japan.
K. Taji and M. Fukushima
406
1
Introduction
We consider the variational inequality problem of finding x~ € S such that {F(x*),x-x')
> 0 for all x£S,
(1)
1
where 5 is a nonempty closed convex subset of iT , F is a continuously differentiable mapping from B" into R" and (•, •) denotes the inner product in i f . In this paper, we suppose that the set S is specified by S= {x<= Rn\ci(x)
<0,
i = l,...,m],
(2)
where c< : R" —> R are twice continuously differentiable convex functions. Many iterative methods, such as Newton method, projection methods, the linearized Jacobi method and the successive over-relaxation methods, have been proposed to solve the variational inequality problem (1) (see [7] and the references cited therein). Among them, the Newton method generates a sequence {xk}, where xk+1 is a solution to the linearized variational inequality problem (F(I') + V F (
I
' ) V
+ 1
-
I
' ) , I - I
H I
) > 0
for all
x€
S.
(3)
It can be shown that, under suitable assumptions, the Newton method converges quadratically to a solution i", provided that an initial point x° is chosen sufficiently close to x" [7, Theorem 4.1]. For the variational inequality problem (1), various merit functions have been proposed and their properties have been studied [1, 2, 4, 8]. Those functions can be used to globalize the Newton method. Marcotte and Dussault [10] obtained a globally convergent Newton method by incorporating an exact line search strategy for the gap function g{x) = ma,x{{F(x),x - y) \ y £ S}, introduced by Auslender [2]. Another modification is the one recently proposed by Taji, Fukushima and Ibaraki [12], which makes use of Armijo line search for the regularized gap function / ( x ) = m a x | - ( F ( x ) , y - x ) - - (y - x,G(y - x))
yes
}•
introduced by Fukushima [4], where G is an n x n symmetric positive definite matrix. Under suitable assumptions, both modifications are shown to be globally convergent to a solution with quadratic rate of convergence [10, 12]. Note that the above mentioned methods tacitly assume that the constraint set S has a relatively simple structure. For example, when 5 is a polyhedral convex set, that
A Newton Method for Variational Inequality
407
Problems
is, the functions c, are all affine, the variational inequality subproblem (3) of New ton method becomes an affine variational inequality problem and the gap functions g and / can be evaluated by solving linear and quadratic programming problems, respectively. However, when 5 is a general convex set defined by nonlinear convex functions, solving the linearized subproblem (3) and evaluating g(x) and f(x) should be considered difficult tasks. In this paper we propose a new globally convergent Newton method for solving vari ational inequality problems with general inequality constraints. The method solves at each iteration an affine variational inequality subproblem, in which not only the mapping F but also the constraint functions c, are linearized. Moreover it makes use of the merit function recently proposed by Taji and Fukushima [13] to obtain global convergence. The method has a clear advantage over the methods that solve subproblems (3) and use the merit function g or / , in that each step of the algorithm can be carried out finitely even if the set 5 is specified by nonlinear inequalities. It is shown that, when the mapping is strongly monotone, the method converges globally to the solution, and that, under some additional assumptions, the rate of convergence is superlinear. The method is closely related to a successive quadratic programming method for solving nonlinear programming problems.
2
Preliminaries
In this section, we summarize some preliminary facts which will be useful subse quently. The mapping F : -ft" —> ft" is said to be monotone if {F(x) - F(y), x - y) > 0 for all
x,yeRn,
strictly monotone if the above inequality holds strictly whenever x ^ y, and strongly monotone with modulus \i > 0 if (F(x) ~F(y),x-y}>n\\x-y\\2
for
all x, y € ft"
It is well known [11, Theorem 5.4.3] that, when F is continuously differentiable, F is strongly monotone with modulus \i if and only if the Jacobian VF(x) satisfies {d, VF(x)d)
>n\\d
||2
for all x,deRn
(4)
It is also well known [7, Corollary 3.2] that, when the mapping F is strongly monotone, the variational inequality problem (1) admits a unique solution.
408
K. Taji and M. Fukushima
A function
tioT>0
at a tn the
{x) 6(x - 4>{x) — *i —
exists. We call the limit the directional derivative and denote it by 0'(x;d). In the remainder of the paper, we suppose that the Slater's constraint qualification holds for (2), i.e., there exists an i 6 if1 such that c,(x) <0 for all i= , . . . , , m .
(5)
Under this assumption, x" is a solution to (1) if and only if there exists a Lagrange multiplier vector A* = (X^,,..,A'm) such that (x",\') is a solution to the following mixed nonlinear complementarity problem [7, Proposition 2.2] : m
F(X*) + F(x') + X:A*V 52\*Vc C ,(X-) i(x') == 0,0,
(6)
c,(x") < < 0" 0" A* > 0, A*Ci(x') = 0, i = = l , . . . , m. m.
3
A Merit Function
In this section, we review the merit function recently proposed by the authors [13] for the inequality constrained variational inequality problem. The reader may refer to [13] for details. Choose an n x n positive definite matrix G and define the function / : R" -> R bb /(*) = max { -(F(x),
x) - i\ (y [y - xtG(v G{v ~ *)) |\ je r ( * ) } , j,y - z)
(7)
where T(x) is the polyhedral convex set defined by T(x) = {y€R"\c,(x) {i,€/r |c,(x) + {Vc,(x) (Vc,(x),2/-x) >y-x)
< <0,0, i = 1 , . . . , m } .
By the convexity of a, it is easy to verify that, for all x e Rn, T(x) is a closed convex set containing 5 . Thus / provides an over-estimate of the function / introduced by Fukushima [4], i.e., f(x) > f{x) for all x € R* In particular, when c,- are all linear, / coincides with / . Note that the positive definiteness of G guarantees that the maximum in (7) is always attained by y = H(x) uniquely, where H(x) is the unique solution y to the convex quadratic programming problem QP(i):
minimize, subject to
\(y - x,G(y - x)) + (F(x), y - a) ^(y-x,G(y-x)) (F(x),y-a) c,(x) <0, Ci(x)++ (Vc,(x),y-x) < 0, »> = l , . . . , m .
.8)
409
A Newton Method for Variational Inequality Problems Therefore, the function / can be written as f(x) = ~{F(x), H{x) - x) - i (H(x) - x, G(H(x) - x)).
(9)
Let A(x) denote the set of optimal Lagrange multiplier vectors for QP(x), that is, A(x) = {A G iK" f | F(x) + + G(H(x) G(H{x) - x) + f > , V c , ( x ) = 0, A; A; > 0, \i[ci(x) + (Va(x),H(x)
-x)}
(10)
=0,i = l,...,m}.
Since H(x)) = i - holds for the solution x* of (1) [13, Lemma 2.1], A(x") coincides with the set of vectors A* satisfying (6). Using the function / , we can formulate the optimization problem minimize f(x)
subject to x6 x 6 55..
(11)
We can prove that this problem is equivalent to the variational inequality problem (1). Proposition 3.1 [13] Let the function f:Rn^Rbe defined by (7). Then f(x) > 0 for all xG S. Moreover, x £ 5 and f{x) = 0 if and only if x solves the variational inequality problem (1). Hence x solves (1) if and only if it solves the opttmization problem (11 and / ( x ) = 0. For given i £ RT and A e UT, we define the matrix M(x, A) by m
M(x,A) = VF(x) + X > ; V c , , ( x ) .
(12)
t=i
The next proposition demonstrates the directional differentiability of / . Proposition 3.2 [13] Suppose that the mapping F is continuous and the convex functions C, i — 1 , . . . ,m, are continuously differentiable. Suppose also that the Slater's constraint qualification (5) holds. Then the function f defined by (7) is continuous on R". Moreover, if F is continuously differentiable and c{, i = l,...,m, are twice continuously differentiable, then f is directionally differentiable in any direction d 6 Rn and its directional derivative f'{x;d) is given by /'(x; d) = min (F(x) - [M(x, A) - G](H{x) fix; G](H(x) - x), d).
(13)
K. Taji and M. Fukushima
410
Remark 3.3 If the set A(x) is a singleton {A}, then / is differentiable at x and the gradient is given by V/(x) = F{x) - [M(x, A) - G](H(x) - x). A sufficient condition [3, Theorem 6] for A(x) to be a singleton is that the gradi ent vectors Vc;(x),i G I(x), are linearly independent, where I(x) = {i | c,(x) + (Vci(x),H(x) — x) = 0}, and the strict complementarity condition is satisfied, i.e., A,- = 0 implies c;(x) + (Vc,(x), H(x) - x) < 0. By Proposition 3.1, x solves the variational inequality problem (1) if and only if x is a global optimal solution of (11). The next proposition gives a condition under which any point satisfying the first order necessary optimality condition for (11) is actually a global optimal solution of (11). Proposition 3.4 [13] Suppose that the mapping F is continuously differentiable, Vi^(x) is positive definite for all x, the convex functions c,, i = 1,... , m, are twice continuously differentiable and the Slater's constraint qualification (5) is satisfied. If x 6 S and f'(x-y-x) > 0 for ally 6 S, then x is a global optimal solution of (11), and hence x is a solution to (1). The following lemma will be useful in the next section. Lemma 3.5 For any x, we have m 1 / ( x ) = - ( t f (x) - x, G(H(x) - * ) ) - £ A,C,(x)
m
>-£A,c,(x) for any A 6 A(x). In particular, if x € 5, then f(x)>l-(H(x)-x,G(H(x)~x)).
Proof. Since H(x) solves (8), it follows from the definition (10) of Λ(x) that each vector λ ∈ Λ(x) satisfies

    F(x) + G(H(x) − x) + Σ_{i=1}^m λᵢ∇cᵢ(x) = 0,
    cᵢ(x) + ⟨∇cᵢ(x), H(x) − x⟩ ≤ 0,  λᵢ ≥ 0,
    λᵢ[cᵢ(x) + ⟨∇cᵢ(x), H(x) − x⟩] = 0,  i = 1, …, m.
A Newton Method for Variational Inequality
Problems
411
Hence, we have from (9)

    f(x) = −⟨F(x), H(x) − x⟩ − (1/2)⟨H(x) − x, G(H(x) − x)⟩
         = ⟨G(H(x) − x), H(x) − x⟩ + ⟨Σ_{i=1}^m λᵢ∇cᵢ(x), H(x) − x⟩ − (1/2)⟨H(x) − x, G(H(x) − x)⟩
         = (1/2)⟨H(x) − x, G(H(x) − x)⟩ + ⟨Σ_{i=1}^m λᵢ∇cᵢ(x), H(x) − x⟩
         = (1/2)⟨H(x) − x, G(H(x) − x)⟩ − Σ_{i=1}^m λᵢcᵢ(x)
         ≥ −Σ_{i=1}^m λᵢcᵢ(x),

where the second equality uses the first equation above, the fourth uses the complementarity conditions, and the last inequality follows from the positive definiteness of G. Since λᵢ ≥ 0 and cᵢ(x) ≤ 0, i = 1, …, m, for all x ∈ S, the last part of the lemma follows immediately. □
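Proposition 3.1 and Lemma 3.5 can be illustrated numerically. The sketch below is a minimal toy instance, not taken from the paper: it assumes an affine strongly monotone F(x) = Ax + b, the feasible set S = {x ≥ 0} (so the linearized constraints defining H(x) coincide with S and H(x) is a projection), and G = I.

```python
import numpy as np

# Toy instance (assumed for illustration): F(x) = A x + b with A positive
# definite, S = {x >= 0}, G = I.  For linear constraints the subproblem
# defining H(x) is exact, and H(x) is the projection of x - F(x) onto S.
A = np.array([[2.0, 0.0], [0.0, 2.0]])
b = np.array([-2.0, 2.0])
F = lambda x: A @ x + b

def gap(x):
    """Gap function f(x) = -<F(x), H(x)-x> - (1/2)<H(x)-x, G(H(x)-x)> with G = I."""
    H = np.maximum(x - F(x), 0.0)       # H(x): projection onto {x >= 0}
    return F(x) @ (x - H) - 0.5 * np.dot(x - H, x - H)

# x* = (1, 0) solves the VI: F(x*) = (0, 2) and complementarity holds.
vals = [gap(np.array(p)) for p in [(0.0, 0.0), (2.0, 1.0), (1.0, 0.0)]]
# gap is nonnegative on S and vanishes exactly at the solution
```

On this instance the gap function is strictly positive at the two non-solution points and vanishes at x* = (1, 0), as Proposition 3.1 predicts.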
4
Globally Convergent Newton Method
In this section, we present a globally convergent Newton method for the variational inequality problem (1), which incorporates an Armijo-type line search procedure for the penalty function θ_r : Rⁿ → R defined by

    θ_r(x) = f(x) + r Σ_{i=1}^m max(0, cᵢ(x)),

where r is a sufficiently large positive parameter. By Proposition 3.2 and [6, Lemma 3.1], θ_r is directionally differentiable and the directional derivative is given by

    θ′_r(x; d) = f′(x; d) + r Σ_{i∈I₊} ⟨∇cᵢ(x), d⟩ + r Σ_{i∈I₀} max(0, ⟨∇cᵢ(x), d⟩),    (14)

where I₊ = {i | cᵢ(x) > 0} and I₀ = {i | cᵢ(x) = 0}. Throughout this section, we assume that the mapping F is continuously differentiable and strongly monotone with modulus μ, so that ∇F satisfies (4). Note that, since the convexity of cᵢ guarantees that ∇²cᵢ(x) is positive semi-definite, (4) implies that the matrix M(x, λ) defined by (12) satisfies

    ⟨d, M(x, λ)d⟩ ≥ μ‖d‖²  for all x, d ∈ Rⁿ,    (15)
whenever λ ≥ 0. Now we state the algorithm.
Algorithm

Step 0. Choose x⁰ ∈ Rⁿ, r > 0, 0 < β < 1, 0 < σ < 1, and a symmetric positive definite matrix G. Let k := 0.

Step 1. Find the unique solution x̄ᵏ ∈ Γ(xᵏ) of the linearized variational inequality problem

    ⟨F(xᵏ) + M(xᵏ, λᵏ)ᵀ(x̄ᵏ − xᵏ), x − x̄ᵏ⟩ ≥ 0  for all x ∈ Γ(xᵏ),    (16)

where λᵏ is an arbitrary vector in Λ(xᵏ). Let dᵏ := x̄ᵏ − xᵏ.

Step 2. Set xᵏ⁺¹ := xᵏ + β^{m_k} dᵏ, where m_k is the smallest nonnegative integer m such that

    θ_r(xᵏ) − θ_r(xᵏ + βᵐdᵏ) ≥ −σβᵐ θ′_r(xᵏ; dᵏ).    (17)

Let k := k + 1. Go to Step 1.

Note that in Step 1 we need an optimal Lagrange multiplier vector λᵏ for the quadratic programming problem QP(xᵏ) (cf. (8)). This has already been obtained in the previous iteration as a by-product of evaluating the function value f(xᵏ). Note also that, by the positive definiteness of M, the linearized problem (16) always has a unique solution. Moreover, problem (16) can be rewritten as a linear complementarity problem, which can be solved in a finite number of steps by Lemke's complementary pivoting algorithm [9]. The following theorem shows that the vector dᵏ generated by the algorithm is a descent direction of θ_r at xᵏ.
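The two steps above can be sketched in code for a simple instance. The fragment below is a schematic illustration under strong simplifying assumptions that are not from the paper: the feasible set is {x ≥ 0}, F is separable with a diagonal Jacobian (so subproblem (16) has a closed-form componentwise solution), G = I, and the merit function reduces to the gap function f since the iterates stay feasible.

```python
import numpy as np

def F(x):                      # strongly monotone, separable toy mapping
    return x**3 + x - np.array([2.0, -1.0])

def JF_diag(x):                # diagonal of the Jacobian of F
    return 3.0 * x**2 + 1.0

def gap(x):
    """Merit function: gap function with G = I over the set {x >= 0}."""
    H = np.maximum(x - F(x), 0.0)
    return F(x) @ (x - H) - 0.5 * np.dot(x - H, x - H)

def newton_vi(x, beta=0.5, sigma=1e-4, iters=50):
    for _ in range(iters):
        m = JF_diag(x)
        # Step 1: solve the linearized VI (16); with diagonal M and the set
        # {x >= 0} the unique solution is available componentwise.
        xbar = np.maximum(x - F(x) / m, 0.0)
        d = xbar - x
        if np.linalg.norm(d) < 1e-12:
            break
        # Step 2: Armijo-type search (17), using the descent estimate (18)
        # with mu = 1 and ||G|| = 1, i.e. theta'_r(x; d) <= -0.5 ||d||^2.
        t = 1.0
        for _ in range(60):
            if gap(x) - gap(x + t * d) >= sigma * t * 0.5 * np.dot(d, d):
                break
            t *= beta
        x = x + t * d
    return x

x_sol = newton_vi(np.array([5.0, 5.0]))   # converges to the solution (1, 0)
```

The instance has the unique solution x* = (1, 0) (F₁(x*) = 0 and F₂(x*) > 0 with x₂* = 0), and the damped Newton iteration drives the gap function to zero.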
Theorem 4.1 Let the mapping F be continuously differentiable and strongly monotone with modulus μ, and let the convex functions cᵢ, i = 1, …, m, be twice continuously differentiable. If ‖λ‖_∞ ≤ r for all λ ∈ Λ(xᵏ), then the vector dᵏ = x̄ᵏ − xᵏ satisfies the inequality

    θ′_r(xᵏ; dᵏ) ≤ −(μ − (1/2)‖G‖) ‖dᵏ‖².    (18)

In particular, if the matrix G is chosen sufficiently small to satisfy ‖G‖ < 2μ, then dᵏ is a descent direction of θ_r at xᵏ.
Proof. For simplicity of notation, we omit the superscript k in xᵏ and dᵏ. Let I₊ = {i | cᵢ(x) > 0} and I₀ = {i | cᵢ(x) = 0}. Note that d = x̄ − x together with some Lagrange multiplier vector λ̄ ≥ 0 satisfies

    F(x) + M(x, λ)ᵀd + Σ_{i=1}^m λ̄ᵢ∇cᵢ(x) = 0,    (19a)
    cᵢ(x) + ⟨∇cᵢ(x), d⟩ ≤ 0,    (19b)
    λ̄ᵢ[cᵢ(x) + ⟨∇cᵢ(x), d⟩] = 0,  i = 1, …, m.    (19c)

Then (19b) yields

    Σ_{i∈I₀} max(0, ⟨∇cᵢ(x), d⟩) = 0.    (20)

Since d = x̄ − x, and since M(x, λ) = ∇F(x) + Σ_{i=1}^m λᵢ∇²cᵢ(x), it follows from (13) that

    f′(x; d) = min_{λ′∈Λ(x)} ⟨F(x) − [M(x, λ′) − G](H(x) − x), x̄ − x⟩
             ≤ ⟨F(x) − [M(x, λ) − G](H(x) − x), x̄ − x⟩
             = −⟨F(x) + M(x, λ)ᵀ(x̄ − x), H(x) − x̄⟩ + ⟨F(x), H(x) − x⟩
               + (1/2)⟨H(x) − x, G(H(x) − x)⟩ − ⟨d, M(x, λ)d⟩ + (1/2)⟨d, Gd⟩
               − (1/2)⟨x̄ − H(x), G(x̄ − H(x))⟩,    (21)

where the last equality follows from the identity

    2⟨x̄ − x, G(H(x) − x)⟩ = ⟨H(x) − x, G(H(x) − x)⟩ + ⟨x̄ − x, G(x̄ − x)⟩ − ⟨x̄ − H(x), G(x̄ − H(x))⟩.
Since x̄ is a solution to (16), the first term of (21) is nonpositive. From (9), the terms ⟨F(x), H(x) − x⟩ + (1/2)⟨H(x) − x, G(H(x) − x)⟩ in (21) together equal −f(x). The last term is nonpositive by the positive definiteness of G. Hence, we have

    f′(x; d) ≤ −f(x) − ⟨d, M(x, λ)d⟩ + (1/2)⟨d, Gd⟩.
Moreover, since λ ∈ Λ(x), it follows from Lemma 3.5 that

    f′(x; d) ≤ −⟨d, M(x, λ)d⟩ + (1/2)⟨d, Gd⟩ + Σ_{i=1}^m λᵢcᵢ(x).    (22)

Hence, we have

    θ′_r(x; d) ≤ −⟨d, M(x, λ)ᵀd⟩ + (1/2)⟨d, Gd⟩ + Σ_{i=1}^m λᵢcᵢ(x) + r Σ_{i∈I₊} ⟨∇cᵢ(x), d⟩
              ≤ −⟨d, M(x, λ)ᵀd⟩ + (1/2)⟨d, Gd⟩ + Σ_{i∈I₊} (λᵢ − r)cᵢ(x)
              ≤ −(μ − (1/2)‖G‖)‖d‖²,

where the first inequality follows from (14), (20) and (22), the second inequality follows from (19b) together with the fact that λᵢ ≥ 0 for all i and cᵢ(x) ≤ 0 for i ∉ I₊, and the third inequality follows from (15) and ‖λ‖_∞ ≤ r for all λ ∈ Λ(x). This proves (18). The last part of the theorem follows immediately. □

Next we show the global convergence of the algorithm.

Theorem 4.2 Suppose that the mapping F is strongly monotone with modulus μ. Suppose also that the parameter r is chosen sufficiently large. If the matrix G is chosen to satisfy ‖G‖ < 2μ, and if the sequence {xᵏ} generated by the algorithm is bounded, then {xᵏ} converges to the unique solution of the variational inequality problem (1).

Proof. Since the sequence {xᵏ} is bounded, it follows from [6, Lemma 3.3] that there exists a positive number r̄ > 0 such that ‖λᵏ‖_∞ ≤ r̄ for all k, where λᵏ is any vector in Λ(xᵏ). Assuming that r > r̄, we have from Theorem 4.1 that dᵏ satisfies the descent condition (18) whenever xᵏ is not a solution to (1). Hence, by the line search rule (17), the sequence {θ_r(xᵏ)} is decreasing. This together with the boundedness of {xᵏ} implies that there is at least one accumulation point. In a way similar to the proof of [12, Theorem 4.1], it can be shown that any accumulation point is a solution to (1). Moreover, under the strong monotonicity assumption, problem (1) has a unique solution. Therefore we conclude that the entire sequence {xᵏ} converges to the unique solution of (1). □

Next we examine the asymptotic rate of convergence of the algorithm. To this end, we consider the iterates (xᵏ, λᵏ) generated by the Newton method directly applied to
the mixed nonlinear complementarity problem (6), namely

    F(xᵏ) + M(xᵏ, λᵏ)ᵀ(xᵏ⁺¹ − xᵏ) + Σ_{i=1}^m λᵢᵏ⁺¹∇cᵢ(xᵏ) = 0,
    cᵢ(xᵏ) + ⟨∇cᵢ(xᵏ), xᵏ⁺¹ − xᵏ⟩ ≤ 0,  λᵢᵏ⁺¹ ≥ 0,
    λᵢᵏ⁺¹[cᵢ(xᵏ) + ⟨∇cᵢ(xᵏ), xᵏ⁺¹ − xᵏ⟩] = 0,  i = 1, …, m.    (23)
It can be shown [5] that, if ∇F(x*) is positive definite, and if the strict complementarity and the linear independence of the active constraints hold, then the sequence generated by the Newton method (23) is quadratically convergent, provided that the starting point is chosen sufficiently close to the solution. (Note that [5] deals with nonlinear programming problems, which correspond to the special case of problem (1) where F is the gradient mapping of some scalar function, so that ∇F is symmetric. But the symmetry assumption is not used in the proof of the theorem in [5].)

Note that a solution xᵏ⁺¹ to (23) is a solution of the variational inequality problem

    ⟨F(xᵏ) + M(xᵏ, λ̂ᵏ)ᵀ(xᵏ⁺¹ − xᵏ), x − xᵏ⁺¹⟩ ≥ 0  for all x ∈ Γ(xᵏ),    (24)

which is the same problem as (16) solved in Step 1 of the algorithm, except for the choice of the multiplier. Therefore, if ‖M(xᵏ, λᵏ) − M(xᵏ, λ̂ᵏ)‖ tends to zero as xᵏ → x*, then the sequence {xᵏ} generated by solving the linearized variational inequality problem

    ⟨F(xᵏ) + M(xᵏ, λᵏ)ᵀ(xᵏ⁺¹ − xᵏ), x − xᵏ⁺¹⟩ ≥ 0  for all x ∈ Γ(xᵏ)

with an arbitrary λᵏ ∈ Λ(xᵏ) is locally superlinearly convergent. Since the vector λᵏ belongs to Λ(xᵏ) defined by (10), and λ̂ᵏ in (24) is determined at the previous Newton iteration (23), both λᵏ and λ̂ᵏ approach the set Λ(x*) whenever xᵏ converges to x*. In particular, if Λ(x*) consists of the unique vector λ*, then both λᵏ and λ̂ᵏ converge to λ*, and hence we have ‖M(xᵏ, λᵏ) − M(xᵏ, λ̂ᵏ)‖ → 0. Note that the uniqueness of the Lagrange multiplier vector λ* is ensured by the linear independence of the active constraints. These observations are summarized in the following theorem.

Theorem 4.3 Let the assumptions of Theorem 4.2 be satisfied. In addition, suppose that the strict complementarity and the linear independence of the active constraints hold at the solution x*. If there is an integer k̄ ≥ 0 such that the unit step size is accepted in Step 2 of the algorithm for all k ≥ k̄, then the sequence {xᵏ} generated by the algorithm converges superlinearly to the solution x*.
5
Concluding Remarks
We have proposed a Newton method for solving variational inequality problems and shown that, under the strong monotonicity assumption, the method is globally convergent and that, under some additional assumptions, the rate of convergence is superlinear.

When F is the gradient mapping of some differentiable convex function ψ, problem (1) corresponds to a necessary and sufficient optimality condition for the convex programming problem

    minimize   ψ(x)
    subject to cᵢ(x) ≤ 0,  i = 1, …, m.    (25)
Therefore we may apply our method to (25) with the identification F = ∇ψ. In this case, the matrix M defined by (12) is rewritten as

    M(x, λ) = ∇²ψ(x) + Σ_{i=1}^m λᵢ∇²cᵢ(x),

which is the Hessian of the Lagrangian of problem (25). Moreover, since M is symmetric, the subproblem (16) solved in Step 1 can be rewritten as

    minimize_d   (1/2)⟨d, M(xᵏ, λᵏ)d⟩ + ⟨F(xᵏ), d⟩
    subject to   cᵢ(xᵏ) + ⟨∇cᵢ(xᵏ), d⟩ ≤ 0,  i = 1, …, m.
Thus our algorithm reduces to a successive quadratic programming (SQP) method. A major difference from other SQP methods is that our algorithm makes use of the function f as a merit function to globalize the convergence, instead of using a penalty function associated with problem (25).
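The agreement between the linearized variational inequality and the quadratic subproblem can be checked numerically in a toy case. The instance below is purely illustrative: M is taken diagonal and the linearized constraint system is d ≥ −x (i.e., c(x) = −x ≤ 0), so both problems have closed-form solutions.

```python
import numpy as np

# Toy check (illustrative instance, not from the paper): for diagonal
# positive definite M, the QP
#     min (1/2)<d, M d> + <F, d>   s.t.  d >= -x
# has the separable solution d_i = max(-x_i, -F_i / M_ii), and the
# linearized VI over {x + d >= 0} yields the same direction.
x = np.array([2.0, 0.5, 0.0])
F = np.array([3.0, -1.0, 0.5])          # fixed data playing the role of F(x^k)
Mdiag = np.array([2.0, 4.0, 1.0])       # diagonal of M(x^k, lambda^k)

d_qp = np.maximum(-x, -F / Mdiag)       # projection of the unconstrained minimizer

# VI solution: xbar_i = max(0, x_i - F_i / M_ii), direction d = xbar - x
xbar = np.maximum(0.0, x - F / Mdiag)
d_vi = xbar - x
```

Both routes produce the same search direction, which is the content of the SQP reduction when M is symmetric.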
References

[1] G. Auchmuty, Variational principles for variational inequalities, Numerical Functional Analysis and Optimization 10 (1989) 863-874.
[2] A. Auslender, Optimisation: Méthodes Numériques, Masson, Paris, 1976.
[3] A. V. Fiacco and G. P. McCormick, Nonlinear Programming: Sequential Unconstrained Minimization Techniques, SIAM, Philadelphia, 1990.
[4] M. Fukushima, Equivalent differentiable optimization problems and descent methods for asymmetric variational inequality problems, Mathematical Programming 53 (1992) 99-110.
[5] U. M. García Palomares and O. L. Mangasarian, Superlinearly convergent quasi-Newton algorithms for nonlinearly constrained optimization problems, Mathematical Programming 11 (1976) 1-13.
[6] S. P. Han, A globally convergent method for nonlinear programming, Journal of Optimization Theory and Applications 22 (1977) 297-309.
[7] P. T. Harker and J. S. Pang, Finite-dimensional variational inequality and nonlinear complementarity problems: A survey of theory, algorithms and applications, Mathematical Programming 48 (1990) 161-220.
[8] T. Larsson and M. Patriksson, A class of gap functions for variational inequalities, Mathematical Programming 64 (1994) 53-79.
[9] C. E. Lemke, Bimatrix equilibrium points and mathematical programming, Management Science 11 (1965) 681-689.
[10] P. Marcotte and J. P. Dussault, A note on a globally convergent Newton method for solving variational inequalities, Operations Research Letters 6 (1987) 35-42.
[11] J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, 1970.
[12] K. Taji, M. Fukushima and T. Ibaraki, A globally convergent Newton method for solving strongly monotone variational inequalities, Mathematical Programming 58 (1993) 369-383.
[13] K. Taji and M. Fukushima, A new merit function and a successive quadratic programming algorithm for variational inequality problems, SIAM Journal on Optimization, to appear.
D. Ward
418
Recent Advances in Nonsmooth Optimization, pp. 418-437 Eds. D.-Z. Du, L. Qi and R.S. Womersley ©1995 World Scientific Publishing Co Pte Ltd
Upper Bounds on a Parabolic Second Order Directional Derivative of the Marginal Function

Doug Ward
Department of Mathematics and Statistics
Miami University
Oxford, Ohio 45056-1641 USA
Abstract
We establish an upper bound on the parabolic second-order upper Dini derivative of the marginal function of a parametric nonlinear program with smooth equality constraint functions but possibly nonsmooth objective and inequality constraint functions. The main tools in the proof of this bound are an intersection theorem for second-order tangent sets and a Ljusternik-type lemma. The bound takes on a particularly simple form if the objective and constraint functions are C^{1,1}. Corollaries include a new upper bound for the upper Dini derivative of the marginal function of a C^{1,1} program.
1
Introduction
We consider the parametric mathematical program

    P(s)    v(s) := min{ f(x) | gᵢ(x) ≤ sᵢ, i ∈ J;  hᵢ(x) = s_{m+i}, i ∈ L },

for a vector s := (s₁, …, s_{m+p}) ∈ R^{m+p}, functions f, gᵢ, hᵢ : Rⁿ → (−∞, +∞], and index sets J := {1, …, m} and L := {1, …, p}. In nonlinear programming, one is often interested in how the values of the marginal function v vary with changes in s
Parabolic Directional Derivative
419
and the study of the differential properties of v is a major area of research (see for example [12]). Although v generally has points of nondifferentiability, bounds on

    v⁺(s; y) := limsup_{t→0⁺} [v(s + ty) − v(s)] / t

and

    v⁻(s; y) := liminf_{t→0⁺} [v(s + ty) − v(s)] / t,

the upper and lower Dini directional derivatives of v, can be established under fairly mild assumptions. For example, if a Mangasarian-Fromovitz-type constraint qualification is satisfied at solutions of P(0), an upper bound for v⁺(0; s), in terms of the support function of the set of Karush-Kuhn-Tucker multipliers for P(0), is valid. If an additional "uniform compactness" or "tameness" condition holds, then a similar bound on v⁻(0; s) can be derived. Such bounds were established by Gauvin and Tolle [17] for the case in which f, gᵢ, and hᵢ are C¹, and have since been shown, via techniques of nonsmooth analysis, to have extensions that are valid for larger classes of objective and constraint functions [2, 10, 29, 31].

If second-order information about the objective and constraint functions is available, then corresponding information about v can be deduced, leading to sharper estimates of the changes that occur in v as s changes [3, 7, 11-12, 19, 24-28]. The foundational results in this area, due to Fiacco and others [12], give an explicit formula for the Hessian of v under the assumption that f, gᵢ, and hᵢ are C² and an appropriate constraint qualification and second-order sufficient optimality condition are satisfied at a solution of P(0). Subsequent research has examined the consequences of weakening one or more of these assumptions (see e.g. [11, 19, 21, 24, 28]).

In the present paper, we explore what can be said about upper bounds on second-order directional derivatives of v when the objective and constraint functions are no longer assumed to be C². Specifically, our goal here is to use concepts and techniques of nonsmooth analysis to derive an upper bound for

    v⁺⁺(s; y, z) := limsup_{t→0⁺} [v(s + ty + (1/2)t²z) − v(s) − t v⁺(s; y)] / ((1/2)t²),

the parabolic second-order upper Dini derivative of v, under minimal smoothness hypotheses on f, gᵢ, and hᵢ. We will see that such a bound can be stated if f and gᵢ belong to a large class of functions which need not be differentiable or even locally Lipschitzian, while h := (h₁, …, h_p) is C¹ with derivative ∇h(x) and the second-order directional derivative

    h″(x; d, y) := lim_{t→0⁺} [h(x + td + (1/2)t²y) − h(x) − t∇h(x)d] / ((1/2)t²)
exists.
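The directional derivatives defined above are easy to probe numerically. The sketch below uses the simple marginal function v(s) = min{x² : x ≥ s} = max(s, 0)², an example chosen here purely for illustration: its upper Dini derivative at 0 in the direction 1 vanishes, while the parabolic second-order quotient detects the curvature.

```python
# Difference-quotient probes for the Dini and parabolic second-order
# derivatives, applied to the toy marginal function v(s) = max(s, 0)^2
# of min{ x^2 : x >= s } (an illustrative example, not from the paper).
def v(s):
    return max(s, 0.0) ** 2

def dini_upper(v, s, y, t=1e-6):
    return (v(s + t * y) - v(s)) / t

def parabolic_upper(v, s, y, z, v_plus, t=1e-3):
    return (v(s + t * y + 0.5 * t * t * z) - v(s) - t * v_plus) / (0.5 * t * t)

v_plus = dini_upper(v, 0.0, 1.0)                 # ~ 0: first-order term vanishes
v_pp = parabolic_upper(v, 0.0, 1.0, 0.0, 0.0)    # ~ 2: second-order growth of v
```

Here v⁺(0; 1) = 0 even though v grows along the direction; only the parabolic quotient, which divides by t²/2, recovers the value 2.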
We begin in §2 by describing the analytical tools that will be needed in the proof of our main result: the connection between second-order tangent sets and parabolic second-order directional derivatives, an intersection theorem for tangent sets, and a Ljusternik-type lemma. In §3, we present our main result, an upper bound on v⁺⁺(0; s, z) which is stated in terms of second-order directional derivatives of f, gᵢ, and hᵢ. We also use the fact that v⁺⁺(0; 0, z) = v⁺(0; z) to deduce an interesting corollary: an upper bound on v⁺(0; z) that, like recent results of Gauvin and Janin [15], Bonnans, Ioffe and Shapiro [8], Ioffe [19], and Minchenko and Sakolchik [22], takes advantage of second-order information about f, gᵢ, and hᵢ. Finally, in §4 we examine the important special case in which f, gᵢ, and hᵢ are C^{1,1}, i.e., are differentiable with a locally Lipschitz gradient function [18, 20, 21, 34]. In this case, our bounds for v⁺⁺(0; s, z) and v⁺(0; z) can be rewritten, via Lagrangian duality, in terms of Lagrange multipliers and simpler second-order directional derivatives.

We close this introduction with a compilation of basic definitions and notation. In the sequel, ‖·‖ will denote the Euclidean norm and ⟨·,·⟩ the usual inner product on Rⁿ. For x ∈ Rⁿ and ε > 0, we set B(x, ε) := {y ∈ Rⁿ | ‖y − x‖ ≤ ε}; and for S ⊂ Rⁿ, we define the distance function dist(x | S) := inf{‖y − x‖ | y ∈ S}. We denote the interior of S by int S and the relative interior of S by ri S. We say that S is closed near x ∈ S if there exists ε > 0 such that S ∩ B(x, ε) is closed. For a function f : Rⁿ → R̄ := [−∞, +∞], the epigraph of f is the set epi f := {(x, r) | f(x) ≤ r}, and the effective domain of f is defined by dom f := {x | f(x) < +∞}. If dom f ≠ ∅ and f never takes on the value −∞, then f is said to be proper. If epi f is closed near (x, f(x)), then f is said to be strictly lower semicontinuous (abbreviated strictly l.s.c.) at x.
The function f : Rⁿ → Rᵖ is said to be strictly differentiable at x ∈ Rⁿ [10] if there exists a linear mapping ∇f(x) : Rⁿ → Rᵖ such that for all y ∈ Rⁿ,

    lim_{(w,v,t)→(x,y,0⁺)} [f(w + tv) − f(w)] / t = ∇f(x)y.
For f : Rⁿ → R, we will say that f is C¹ at x ∈ Rⁿ if f is differentiable on some neighborhood of x and its derivative function ∇f(·) is continuous at x. We say that f is C^{1,1} at x if in addition ∇f(·) is Lipschitzian near x. Following Furukawa [13], we define f to be twice Neustadt differentiable at x with respect to d ∈ Rⁿ if f″(x; d, y) exists for all y ∈ Rⁿ.
2
Tangent Sets and Second-order Directional Derivatives
One fundamental idea in nonsmooth analysis is the connection between local conical approximations to sets, called tangent cones, and types of directional derivatives (see [1]). Specifically, suppose that f : Rⁿ → R̄ is finite at x ∈ Rⁿ, and let A be a concept of tangent cone. (We can think of A as a set-valued mapping such that for S ⊂ Rⁿ and x ∈ S, A(S, x) is a cone which approximates S near x.) Then the A directional derivative of f at x in the direction y is defined by

    f^A(x; y) := inf{ r | (y, r) ∈ A(epi f, (x, f(x))) }.

If A(epi f, (x, f(x))) is a closed cone, then f^A is defined precisely so that

    epi f^A(x; ·) = A(epi f, (x, f(x))),    (1)

and f^A(x; ·) will be lower semicontinuous. In this paper, we will work with f^A for two tangent cones: the adjacent cone, defined for S ⊂ Rⁿ and x ∈ S by
    T(S, x) := { y ∈ Rⁿ | ∀ε > 0, ∃λ > 0 such that ∀t ∈ (0, λ), ∃v ∈ B(y, ε) such that x + tv ∈ S },

and the Clarke tangent cone

    C(S, x) := { y ∈ Rⁿ | ∀ε > 0, ∃λ > 0 such that ∀z ∈ B(x, λ) ∩ S, ∀t ∈ (0, λ), ∃v ∈ B(y, ε) such that z + tv ∈ S }.
The properties of f^T and f^C are extensively discussed in [1, 33]. We mention here, in particular, that if f is Lipschitzian near x, then f^T(x; ·) = f⁺(x; ·) and f^C(x; ·) = f°(x; ·), where f°(x; ·) is the Clarke [10] directional derivative. A number of favorable properties possessed by f⁺ and f° for locally Lipschitzian functions are shared by f^T and f^C for some larger classes of functions.

The connection between tangent cones and directional derivatives has a second-order analogue in the correspondence between second-order tangent sets and parabolic second-order directional derivatives [1, 30, 32]. In this paper, we will be particularly concerned with the second-order adjacent set, defined for S ⊂ Rⁿ, x ∈ S, v ∈ Rⁿ by

    T²(S, x, v) := { y ∈ Rⁿ | ∀ε > 0, ∃λ > 0 such that ∀t ∈ (0, λ), ∃w ∈ B(y, ε) such that x + tv + (1/2)t²w ∈ S }.
We observe that T² is a generalization of T in the sense that T²(S, x, 0) = T(S, x), and that T²(S, x, v) is often not a cone; for example, if S = {(x, y) ∈ R² | y ≥ x²}, then T²(S, (0,0), (1,0)) = {(x, y) | y ≥ 2}. The direction v will generally be an element of T(S, x); in fact, T²(S, x, v) = ∅ if v ∉ T(S, x).

For f : Rⁿ → R̄, if f(x) and f^T(x; v) are finite, then the second-order directional derivative associated with T² is defined by

    d²_T f(x; v, y) := inf{ r | (y, r) ∈ T²(epi f, (x, f(x)), (v, f^T(x; v))) }.

Since T²(S, x, v) is always a closed set, this definition implies that

    epi d²_T f(x; v, ·) = T²(epi f, (x, f(x)), (v, f^T(x; v))).    (2)
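The parabola example above can be checked directly from the definition of T²: for S = {(x, y) : y ≥ x²} and v = (1, 0), points x₀ + tv + (1/2)t²w remain in S for small t when the second component of w exceeds 2, and leave S when it is below 2. The sampling below is a finite-precision illustration of the limiting definition, not a proof.

```python
# Finite sampling check of the example T^2(S, (0,0), (1,0)) = {(x, y) | y >= 2}
# for S = {(x, y) : y >= x^2}; an illustration of the definition only.
def in_S(p):
    x, y = p
    return y >= x * x

def parabolic_point(t, v, w):
    # x0 + t*v + (1/2) t^2 w with x0 = (0, 0)
    return (t * v[0] + 0.5 * t * t * w[0], t * v[1] + 0.5 * t * t * w[1])

v = (1.0, 0.0)
ts = [10.0 ** (-k) for k in range(2, 7)]
inside = all(in_S(parabolic_point(t, v, (0.0, 2.1))) for t in ts)       # w2 > 2
outside = not any(in_S(parabolic_point(t, v, (0.0, 1.9))) for t in ts)  # w2 < 2
```

With w₂ = 2.1 every sampled point satisfies y ≥ x², while with w₂ = 1.9 every sampled point violates it, in accordance with the threshold y ≥ 2.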
As a generalized limit of difference quotients, d²_T f(x; v, y) is obtained from

    [f(x + tv + (1/2)t²w) − f(x) − t f^T(x; v)] / ((1/2)t²)

with w near y and t → 0⁺.
This shows that d²_T f is a parabolic second-order derivative similar to those studied by Ben-Tal, Zowe, and others (see [1, 5-6, 13, 19, 30]). There are simpler expressions for d²_T f for special classes of functions. For example, if f is Lipschitzian near x, then it is not difficult to show that d²_T f(x; v, y) = f⁺⁺(x; v, y). If f is C¹ at x, then d²_T f (and similarly, f″) can be further simplified.

Proposition 2.1 Let f : Rⁿ → R be C¹ at x. Then for all v, y ∈ Rⁿ,

    d²_T f(x; v, y) = ∇f(x)y + d²₊f(x; v),    (3)

where

    d²₊f(x; v) := limsup_{t→0⁺} [f(x + tv) − f(x) − t∇f(x)v] / ((1/2)t²).

Moreover, if f is C^{1,1} at x, then d²_T f(x; ·, ·) is finite. Similarly, if f is C¹ at x and twice Neustadt differentiable at x with respect to v, then for all y ∈ Rⁿ,

    f″(x; v, y) = ∇f(x)y + d²f(x; v),    (4)

where

    d²f(x; v) := lim_{t→0⁺} [f(x + tv) − f(x) − t∇f(x)v] / ((1/2)t²).
PROOF. The proofs of (3) and (4) are based on ideas of [5, Lemma 6.4; 6, p. 484]. To prove (3), let v ∈ Rⁿ, y ∈ Rⁿ. For sufficiently small t > 0, we have by the mean value theorem that there exists θ_t ∈ (0, 1) such that

    f(x + tv + (1/2)t²y) − f(x + tv) = ⟨∇f(x + tv + (1/2)t²θ_t y), (1/2)t²y⟩.    (5)

Since f is C¹ at x, f^T(x; v) = ∇f(x)v and f is Lipschitzian near x. As mentioned above, d²_T f(x; v, y) = f⁺⁺(x; v, y), so by (5),

    d²_T f(x; v, y) ≤ limsup_{t→0⁺} ⟨∇f(x + tv + (1/2)t²θ_t y), y⟩ + d²₊f(x; v) = ∇f(x)y + d²₊f(x; v),

and

    d²_T f(x; v, y) ≥ liminf_{t→0⁺} ⟨∇f(x + tv + (1/2)t²θ_t y), y⟩ + d²₊f(x; v) = ∇f(x)y + d²₊f(x; v).
Hence (3) holds. The proof of (4) is similar.

Finally, suppose that f is C^{1,1} at x. Then there exist μ > 0 and M > 0 such that

    ‖∇f(x′) − ∇f(x″)‖ ≤ M‖x′ − x″‖  for all x′, x″ ∈ B(x, μ).

Choose δ > 0 small enough so that x + tv ∈ B(x, μ) for all t ∈ (0, δ). Now let t ∈ (0, δ). Again by the mean value theorem, there exists θ_t ∈ (0, 1) such that

    f(x + tv) − f(x) = ⟨∇f(x + θ_t tv), tv⟩.

Then

    |f(x + tv) − f(x) − t∇f(x)v| / ((1/2)t²) = |⟨∇f(x + θ_t tv) − ∇f(x), v⟩| / ((1/2)t) ≤ 2M‖v‖².

Hence d²₊f(x; v) is finite, and by (3), d²_T f(x; v, ·) is finite. □
It is worth noting that if f is twice Fréchet differentiable at x with second derivative ∇²f(x), then d²₊f(x; v) = d²f(x; v) = ∇²f(x)(v, v), so that

    f″(x; v, y) = d²_T f(x; v, y) = ∇f(x)y + ∇²f(x)(v, v).
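For a twice differentiable f the parabolic quotient thus collapses to a Hessian quadratic form, which is easy to verify numerically. The particular f below is an arbitrary smooth test function chosen only for this illustration.

```python
import math

# Check d^2_+ f(x; v) ~= <v, Hess f(x) v> for the smooth test function
# f(x1, x2) = sin(x1) + x1^2 * x2 (chosen only for illustration).
def f(x1, x2):
    return math.sin(x1) + x1 * x1 * x2

def grad(x1, x2):
    return (math.cos(x1) + 2.0 * x1 * x2, x1 * x1)

def hess_quad(x1, x2, v1, v2):
    # v^T (Hess f) v with Hess f = [[-sin x1 + 2 x2, 2 x1], [2 x1, 0]]
    return v1 * v1 * (-math.sin(x1) + 2.0 * x2) + 4.0 * x1 * v1 * v2

x1, x2, v1, v2 = 0.3, 0.7, 1.0, 2.0
t = 1e-4
g = grad(x1, x2)
quot = (f(x1 + t * v1, x2 + t * v2) - f(x1, x2)
        - t * (g[0] * v1 + g[1] * v2)) / (0.5 * t * t)
exact = hess_quad(x1, x2, v1, v2)
```

The second-order difference quotient agrees with the quadratic form up to the O(t) error contributed by the third derivative.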
These facts follow easily from Taylor's Theorem.

Equations (1) and (2) enable us to work with f^T, f^C, and d²_T f via T, C, and T². There are two properties of T² that will be especially useful to us here. One is the fact [30, Lemma 2.4] that if S ⊂ Rⁿ, x ∈ S, and v ∈ Rⁿ, then

    T²(S, x, v) + C(S, x) ⊂ T²(S, x, v).    (6)

The other is a theorem relating T²(∩ᵢ₌₁ᵐ Sᵢ, x, v) and ∩ᵢ₌₁ᵐ T²(Sᵢ, x, v) for locally closed sets Sᵢ with x ∈ ∩ᵢ₌₁ᵐ Sᵢ. These two expressions will be equal, in fact, if the sets C(Sᵢ, x) have "sufficient intersection" in the following sense:
Definition 2.2 (a) For K ⊂ Rᵖ, define Δⁿ_K := {(x₁, …, xₙ) | xᵢ ∈ K, x₁ = x₂ = … = xₙ}.
(b) Let K₁, …, Kₙ be nonempty convex cones in Rᵖ. These cones are said to be in strong general position if

    Δⁿ_{Rᵖ} − ∏ᵢ₌₁ⁿ Kᵢ = R^{np}.    (7)

The strong general position concept is discussed in [35, 33], where several equivalent ways of writing (7) are given. In particular,

    ∩ᵢ₌₁ᵐ Kᵢ − K_{m+1} = Rᵖ  for all m = 1, …, n−1,

is equivalent to (7). Another way to write (7) is ∩ᵢ₌₁ⁿ (Kᵢ − {xᵢ}) ≠ ∅ for all xᵢ ∈ Rᵖ, i = 1, …, n, which is often used in [1]. To put this condition in context, we observe that (7) implies ∩ᵢ₌₁ⁿ ri Kᵢ ≠ ∅ and is implied by the existence of an i such that Kᵢ ∩ (∩_{j≠i} int Kⱼ) ≠ ∅. In addition, we note that if K₁, …, Kₙ are in strong general position, then so are Kᵢ, i ∈ M, for any nonempty M ⊂ {1, …, n}. We can now state the aforementioned tangent set intersection theorem.

Theorem 2.3 [30] Let Sᵢ, i = 1, …, m, be subsets of Rⁿ which are closed near x ∈ ∩ᵢ₌₁ᵐ Sᵢ, and let v ∈ Rⁿ. Suppose that C(Sᵢ, x), i = 1, …, m, are in strong general position. Then
    T²(∩ᵢ₌₁ᵐ Sᵢ, x, v) = ∩ᵢ₌₁ᵐ T²(Sᵢ, x, v).

In working with the equality constraints in P(s), we will make use of a second-order Ljusternik-type lemma. This lemma is a consequence of the following theorem, a special case of a versatile result of Borwein [9, Theorem 2.1].

Theorem 2.4 [9] Let S ⊂ Rⁿ, and let H : Rⁿ → Rᵐ be strictly differentiable at x₀ ∈ H⁻¹(0) ∩ S. Suppose that S is closed near x₀ and

    ∇H(x₀)C(S, x₀) = Rᵐ.    (8)

Then there exist L > 0, δ > 0 such that for all x ∈ B(x₀, δ) ∩ S and all u ∈ B(0, δ),

    dist(x | H⁻¹(u) ∩ S) ≤ L dist(H(x) | u).    (9)
Lemma 2.5 Let S ⊂ Rⁿ, and let H : Rⁿ → Rᵐ be strictly differentiable at x₀ ∈ S ∩ H⁻¹(0). Suppose that S is closed near x₀, (8) holds, and H is twice Neustadt differentiable at x₀ with respect to d. Let s = ∇H(x₀)d, z ∈ Rᵐ. Then

    T²(S, x₀, d) ∩ H″(x₀; d, ·)⁻¹(z)
      = { y ∈ Rⁿ | ∀ε > 0, ∃λ > 0 such that ∀t ∈ (0, λ), ∃y(t) ∈ B(y, ε) with x₀ + td + (1/2)t²y(t) ∈ S ∩ H⁻¹(ts + (1/2)t²z) }.    (10)
given. By Theorem 2.4, there exist L > 0, 6 G (0, e) such n 5 and u G £(0,<5), (9) holds. Let y G r 2 (5,x 0 ,) with there exists A > 0 such that for all t G (0, A), ts + t2zj2 G w(t) G B(y,S/2) with x 0 -Md + t2w{t)/2 G 5 n 5 ( x 0 , >5)
and
||if(x 0 4- td+t2w(t)/2)
-ts-
t2z/2\\
2
t j1 Let t G (0, A), and choose w(t) as above. a(t) G tf_1(ts + t2z/2) n 5 with
< 6/4L.
Then by (9), there exists a point
||x 0 + t(i + f 2 u ; ( < ) / 2 - a ( 0 | | £||ff (xp + <
+
e e 4 S '
aft) — In — td { - > —J . Then x 0 +
and
ll»(0-y|l<
a(r)-x0-id-<2iu(t)/2|| « 2 /2
+ M*)-y||<«-
Hence i/ belongs to the set on the right-hand side of (10). The proof of the opposite inclusion is routine. □
3
U p p e r Bounds on v ++
Throughout the remainder of this paper, we denote the feasible set and solution set of V(s) by FW:=fxGR" 9M~Si'ie-ir } v ' | hi(x) = s m + 1 , i £ i J
D. Ward
426 and il(a) := {x £ F(s) | f(x) = v(s)},
respectively. We let x0 G n(0) and define /(x 0 ) := {i G J \ gi(x0) = 0}. We will also be interested in the set of directions f fT(x0-d) = v+(Q-s), ) E(s) := Id G R" gf(x0;d) < sh i G I(x0), \ ; {
Vhi(x0)d = sm+i,
ie L
J
and for d G S(5), in the index set I(x0,d)
:= {i G I(x0) \ gj(x0;d)
= «,•}.
We will often assume that io and
A s s u m p t i o n 3.1 (a) / and gt, i G I(XQ), are strictly l.s.c. at xo', gi, i G J\ I(XQ) are continuous at x0. (b) /» is C 1 at xo and twice Neustadt differentiable at x0 with respect to d, and V/i(x 0 )R n = R". (c) / T (io;-)i 9l(xo'r), 7(x0)\/(io,d)-
i G /(zo) are proper; domc/j-^^io; d, ■) = R n for all i G
(d) d o m / c ( x o ; ■)> dom^rf (x 0 ; •), i G I(xo), and V/i(i 0 ) _ 1 (0) are in strong general position. (e)
d o m / c ( x 0 ; •)
Vh(xo)-l(0)
n{y\g?(xo;y)<0,
Vi€ I(x0)} ^
R e m a r k 3.2 (i) We note that if a function / : R n —t R is Lipschitzian near x, then d o m / c ( x ; •) = R" and fT(x; •) is proper. Thus if / and each gt are Lipschitzian near x 0 , parts (a) and (d) of Assumption 3.1 hold and parts (c) and (e) are greatly simplified. (ii)If/ : R" -> R i s C M a t x, then dom<4/(x; i>, ■) = R" for all v by Proposition 2.1. If / is twice Frechet differentiable at x 0 , then in particular / is twice Neustadt differentiable at x0 in every direction d. It follows that if / and each fl, are C 1,1 at xo, and if each /i, is twice Frechet differentiable at x 0 , then Assumption 3.1 (a), (c), and (d) are satisfied and (e) reduces to Vft(x o )- 1 (0) n {y | Vgi{x0)y < 0, V. G /(x 0 )} ± 0. In other words, Assumption 3.1 reduces in this setting to the MangasarianFromovitz constraint qualification.
427
Parabolic Directional Derivative We now establish our main result:
Theorem 3.3 Suppose that Assumption 3.1 is satisfied at XQ £ 0(0), d £ £(s). Then for all z £ R m + P , v++{0;s,z)
<mi{d2Tf(x0;d,y)
d\gi(x0;d,y) < zt, i £ I(x0,d), h"(x0;d,y) = zm+i,i£ L
\ J"
.^
PROOF. Let z £ R m + P , and suppose that y satisfies d\gi(x0\d,y) < zi, V» e I(x0,d), h"(x0;d,y) = zm+i, Vi 6 L, and dyf{xa\d,y)
< r.
++
It suffices to show that u (0; 5, 2) < r. To that end, let e > 0 be given, and let w be an element of the set in Assumption 3.1(e) with fc(x0; w) < a. Choose 7/ > 0 such that max{||7jiw||, \rja\} < e/2. Since (y, r) £ T 2 (epi / , (x 0 , f(x0)),
(d, f(x0;
d)))
and (w, a) £ CMepi/, (x 0 ,/(x 0 ))J, (6) implies that (y, r) + r,(w, a) £ T 2 (epi / , (*0) /(*<,)), (d, f(x0;
d)));
i.e., dj-f(x0; d,y + r)w) < r + -qa. Similarly, djgi(x0; d,y + i)w) < zu Vi £ I(x0, d) so we may choose S £ (0, e/2) such that <4ji(z 0 ; d, V + Vw) < z«" - <5, Vi € 7(x 0 , ). Since /i is C 1 at x 0 and Vh(x0)w for i = 1,.. . ,p.
= 0, Proposition 2.1 gives h"(x0; d,y + r/w) = z m + ,
Now define the sets D0 := { ( x , r 0 , . . . , r m ) £ R " + m + 1 | f(x) < r 0 ) , A : = { ( x , r 0 , . . . , r m ) G R " + m + 1 | Si[x) < n), 1 £ J; and take
s := nSoA, /9:=(^o,/(xo),0,...,0)eR"+
m+1
,
D. Ward
428 and 7 :=
(d,fT{x0\d),31,...,sm).
As in (2) we obtain T2(D<,p,
(y + Vw,r + r,a,z1-6,...,zm-6)£
7),
Vi € J(x 0 , d) U {0},
and by Assumption 3.1(a), we have T2(Di,p,f)
= R " + m + \ Vi € J \ /(xo).
By Assumption 3.1(c), y + i/m 6 domdj.5,(z 0 ;d, •) for i e 7"(#o)\ I(xo,d), Lemma 2.7] implies that
and so [30,
(i/ + 7jiu,a) 6 T 2 f epigi,(a:o,5i(io)),(rf,ff1T(xo; Hence (y + r)w,r + r,a,zi-6,...,
*m - 6) € T 2 ( A , 0,7), Vz £ 7(i 0 ) \ 7(s 0 , d),
and we conclude that (y + r,w,r +
Va,zl-6,...,zrn-6)€n™=0T
2
(Di,fl,"f).
Moreover, Assumption 3.1(d) implies that C(7?,-,/3), i = 0 , . . . , m, are in strong gen eral position, so by Theorem 2.3, {y + ijw,r + 7)<x,zi - 6,...,zm
-6)
g
T2(S,/?,7).
Next define 77 : R"+m+» -> W by 77(x,r 0 ,... , r m ) = h{x). Again, it follows from Assumption 3.1(d) that V77(/3)-1(0) - n£0C{DtJ)
= R"+m+1,
(12)
and by Assumption 3.1(b), V77(/?)R n+m+1 = R" Since (12) and (13) are equivalent to R* = V77(/?)[n™0C(7)„/?)], [1, Corollary 4.3.6] implies that
R" = VH(fi)C{S,fi).
(13)
Parabolic Directional Derivative
429
We may then apply Lemma 2.5 to conclude that there exists $\Delta > 0$ such that for all $t \in (0, \Delta)$, there exists
$$(y(t), r(t), z_1(t), \ldots, z_m(t)) \in B_{\varepsilon/2}\big((y + \eta w,\ r + \eta a,\ z_1 - \delta, \ldots, z_m - \delta)\big)$$
with
$$\beta + t\gamma + t^2(y(t), r(t), z_1(t), \ldots, z_m(t))/2 \in S$$
and
$$h(x_0 + td + t^2 y(t)/2) = t(s_{m+1}, \ldots, s_{m+p}) + t^2(z_{m+1}, \ldots, z_{m+p})/2;$$
i.e.,
$$g_i(x_0 + td + t^2 y(t)/2) \le t s_i + t^2 z_i(t)/2 \le t s_i + t^2 z_i/2, \quad \forall i \in J;$$
$$h_i(x_0 + td + t^2 y(t)/2) = t s_{m+i} + t^2 z_{m+i}/2, \quad \forall i \in L;$$
and
$$f(x_0 + td + t^2 y(t)/2) \le f(x_0) + t f_T(x_0; d) + t^2 r(t)/2.$$
Hence $x_0 + td + t^2 y(t)/2 \in \Omega(ts + t^2 z/2)$. Since $d \in \Sigma(s)$ and $f_T(x_0; \cdot)$ is proper, $-\infty < f_T(x_0; d) = v^+(0; s)$, and by [31, Theorem 3.3], $v^+(0; s) < +\infty$. Taking into account the fact that $x_0 \in \Omega(0)$, we obtain
$$\frac{v(ts + t^2 z/2) - v(0) - t v^+(0; s)}{t^2/2} \le \frac{f(x_0 + td + t^2 y(t)/2) - f(x_0) - t f_T(x_0; d)}{t^2/2} \le r(t) \le r + \eta a + \varepsilon/2 < r + \varepsilon.$$
Since $\varepsilon$ is arbitrary, we conclude that $v^{++}(0; s, z) \le r$, and (11) holds. □
Remark 3.4 (i) Since the inequality $v^{++}(0; s, z) \ge d_T^2 v(0; s, z)$ is always true, the right-hand side of (11) is also an upper bound for $d_T^2 v(0; s, z)$ under the assumptions of Theorem 3.3. In fact, the inequality
$$d_T^2 v(0; s, z) \le \inf\big\{\, d_T^2 f(x_0; d, y) \,\big\} \eqno(14)$$
(the infimum taken over the same constraint set of $(d, y)$ as in (11), with $d_T^2 g_i(x_0; d, y)$ bounding the inequality-constraint terms) holds under hypotheses weaker than Assumption 3.1 (see [32, Theorem 4.4]). Assumption 3.1 includes a Mangasarian-Fromovitz type constraint qualification which is not required in a proof of (14) but plays a crucial role in our proof of the sharper inequality (11).
(ii) Assumption 3.1 does not guarantee, in general, that the right-hand side of (11) will be less than $+\infty$. We can ensure that this expression is less than $+\infty$ by adding a further condition, (15), to Assumption 3.1. If $g_i$ and $h_i$ are $C^{1,1}$, however, then Assumption 3.1 implies (15) by Proposition 2.1.

As an immediate consequence of Theorem 3.3, we deduce a new upper bound for $v^+$.

Corollary 3.5 Suppose that Assumption 3.1 is satisfied at $x_0 \in \Omega(0)$, $d \in \Sigma(0)$. Then for all $z \in \mathbf{R}^{m+p}$,
$$v^+(0; z) \le \inf\big\{\, d_T^2 f(x_0; d, y) \,\big\}, \eqno(16)$$
the infimum being taken over the same constraint set as in (11) with $s = 0$.

PROOF. Let $s := 0$ in Theorem 3.3. Since $v^{++}(0; 0, z) = v^+(0; z)$, (11) reduces to (16) in this case. □
Corollary 3.5 gives a bound on the upper Dini derivative of $v$ that takes advantage of second-order information about $f$, $g_i$, and $h_i$. Not surprisingly, such bounds can be sharper than those which use only first-order information (see [15-17] and Example 4.4 in the next section). For the case in which $f$, $g_i$, and $h_i$ are $C^2$, other results of this sort have been obtained by Gauvin and Janin [16], Bonnans, Ioffe and Shapiro [8], Ioffe [19], and Minchenko and Sakolchik [22]. We illustrate Theorem 3.3 and Corollary 3.5 with an example:

Example 3.6 In problem $\mathcal{P}(s)$, let $n = 2$, $m = 1$, $p = 0$, and define $f : \mathbf{R}^2 \to \mathbf{R}$ by $f(x_1, x_2) = x_1^2 + |x_1 - 1| + |x_2|$ and $g_1 : \mathbf{R}^2 \to \mathbf{R}$ by $g_1(x_1, x_2) = 4x_1 - x_2^2$. In this example, $\Omega(0) = \{(0, 0)\}$ and $\Sigma(0) = \{(0, 0)\}$. Thus (16) reduces to
$$v^+(0; z) \le \inf\{-y_1 + |y_2| \mid 4y_1 \le z\} = -z/4.$$
In fact, one can calculate directly that $v^+(0; z) = -z/4$ for $z \in \mathbf{R}$. For $s \in \mathbf{R}$, we therefore obtain
$$\Sigma(s) = \{(d_1, d_2) \mid -s/4 = -d_1 + |d_2|,\ 4d_1 \le s\} = \{(s/4, 0)\},$$
and (11) gives
$$v^{++}(0; s, z) \le \inf\{s^2/8 - y_1 + |y_2| \mid 4y_1 \le z\} = s^2/8 - z/4.$$
Again, one can calculate directly that $v^{++}(0; s, z) = s^2/8 - z/4$.
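The expansion in Example 3.6 can be checked numerically. The sketch below assumes the reconstructed data $f(x_1, x_2) = x_1^2 + |x_1 - 1| + |x_2|$ and $g_1(x_1, x_2) = 4x_1 - x_2^2$; near $s = 0$ the minimizer sits at $(s/4, 0)$, so the marginal value should satisfy $v(s) = 1 - s/4 + s^2/16$, whose first- and second-order terms are exactly $v^+(0; s) = -s/4$ and $v^{++}(0; s, z) = s^2/8 - z/4$.

```python
import numpy as np

# Crude global minimization of f over the feasible set {g1 <= s} by grid
# search; the data are the (reconstructed) functions of Example 3.6.
def v(s):
    x1 = np.linspace(-1.0, 1.5, 1001)
    x2 = np.linspace(-1.5, 1.5, 1001)
    X1, X2 = np.meshgrid(x1, x2)
    f = X1**2 + np.abs(X1 - 1.0) + np.abs(X2)
    feasible = 4.0 * X1 - X2**2 <= s
    return f[feasible].min()

for s in [0.0, 0.1, 0.2]:
    # grid value vs. the local closed form 1 - s/4 + s^2/16
    print(s, v(s), 1 - s/4 + s**2/16)
```

The grid search only certifies the claimed values up to the grid spacing, but it agrees with the closed form to about three decimal places.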
4  The $C^{1,1}$ Case
It has become evident in recent years that much of the theory of nonlinear programming and second-order differential analysis (for example, optimality conditions, Taylor expansions, and sensitivity results) can be extended very nicely from a $C^2$ setting to a $C^{1,1}$ setting (see for example [18, 20, 21, 34]). Just as first-order nonsmooth analysis is most effective for locally Lipschitzian functions [10], so also many theorems of second-order nonsmooth analysis take on a particularly simple form for $C^{1,1}$ functions. In this section, we will see that Theorem 3.3 can be significantly simplified in a $C^{1,1}$ setting. Our simplified bound will be stated in terms of a Lagrange multiplier set, defined for $d \in \Sigma(s)$ by
$$M(x_0, d) := \Big\{\lambda \in \mathbf{R}_+^m \times \mathbf{R}^p \ \Big|\ \lambda_i = 0,\ \forall i \in J \setminus I(x_0, d);\ \nabla f(x_0) + \sum_{i=1}^m \lambda_i \nabla g_i(x_0) + \sum_{i=1}^p \lambda_{m+i} \nabla h_i(x_0) = 0 \Big\}.$$
Theorem 4.1 Let $f$; $g_i$, $i \in J$; $h_i$, $i \in L$; be $C^{1,1}$ at $x_0 \in \Omega(0)$. Suppose that $\Sigma(s)$ is nonempty, $h$ is twice Neustadt differentiable at each $d \in \Sigma(s)$, $\nabla h(x_0)\mathbf{R}^n = \mathbf{R}^p$, and
$$\nabla h(x_0)^{-1}(0) \cap \{y \mid \nabla g_i(x_0)\, y < 0,\ \forall i \in I(x_0)\} \ne \emptyset. \eqno(17)$$
Let $z \in \mathbf{R}^{m+p}$. Then
$$v^{++}(0; s, z) \le \inf_{d \in \Sigma(s)}\ \max_{\lambda \in M(x_0, d)} \Big[ d^2 f(x_0; d) + \sum_{i \in I(x_0,d)} \lambda_i\, d^2 g_i(x_0; d) + \sum_{i=1}^p \lambda_{m+i}\, d^2 h_i(x_0; d) - \langle\lambda, z\rangle \Big]. \eqno(18)$$
PROOF. For $d \in \Sigma(s)$, $z \in \mathbf{R}^{m+p}$, consider the convex program
$$(\mathcal{P}_1)\qquad \alpha(d) := \inf\Big\{ \langle\nabla f(x_0), y\rangle + d^2 f(x_0; d) \ \Big|\ \langle\nabla g_i(x_0), y\rangle + d^2 g_i(x_0; d) \le z_i,\ i \in I(x_0, d);\ \langle\nabla h_i(x_0), y\rangle + d^2 h_i(x_0; d) = z_{m+i},\ i \in L \Big\}.$$
Our hypotheses guarantee that Assumption 3.1 is satisfied, so by Theorem 3.3 and Proposition 2.1, $v^{++}(0; s, z) \le \inf_{d \in \Sigma(s)} \alpha(d)$. For $\lambda \in \mathbf{R}^{m+p}$ and $y \in \mathbf{R}^n$, define
$$L(\lambda, y) := \Big\langle \nabla f(x_0) + \sum_{i \in I(x_0,d)} \lambda_i \nabla g_i(x_0) + \sum_{i=1}^p \lambda_{m+i} \nabla h_i(x_0),\ y \Big\rangle.$$
Then the Lagrangian dual of $(\mathcal{P}_1)$ is
$$(\mathcal{D}_1)\qquad \beta(d) := \sup_{\lambda}\ \inf_{y}\ \Big[ L(\lambda, y) - \langle\lambda, z\rangle + d^2 f(x_0; d) + \sum_{i \in I(x_0,d)} \lambda_i\, d^2 g_i(x_0; d) + \sum_{i=1}^p \lambda_{m+i}\, d^2 h_i(x_0; d) \Big].$$
If $\lambda \in M(x_0, d)$, then $L(\lambda, y) = 0$ for all $y \in \mathbf{R}^n$; otherwise, $\inf_{y \in \mathbf{R}^n} L(\lambda, y) = -\infty$. It follows that
$$\beta(d) = \sup_{\lambda \in M(x_0, d)} \Big[ -\langle\lambda, z\rangle + d^2 f(x_0; d) + \sum_{i \in I(x_0,d)} \lambda_i\, d^2 g_i(x_0; d) + \sum_{i=1}^p \lambda_{m+i}\, d^2 h_i(x_0; d) \Big].$$
The constraint qualification (17) implies that the Mangasarian-Fromovitz constraint qualification is satisfied; the multiplier set $M(x_0, d)$ is therefore nonempty and compact [14], and so $\beta(d)$ is finite and attained for some $\lambda \in M(x_0, d)$. Condition (17) also implies that Slater's condition is satisfied for $(\mathcal{P}_1)$. By Lagrangian duality (see e.g. [4, Theorem 6.2.4]), it follows that $\beta(d) = \alpha(d)$. Therefore (18) holds. □

We illustrate Theorem 4.1 with an example from [15-17].

Example 4.2 In problem $\mathcal{P}(s)$, let $n = 2$, $m = 2$, $p = 0$, and define $f(x_1, x_2) = -x_2$, $g_1(x_1, x_2) = x_1^2 + x_2$, $g_2(x_1, x_2) = -x_1^2 + x_2$. Then $\Omega(0, 0) = \{(0, 0)\}$ and $v(0, 0) = 0$. Let $x_0 = (0, 0)$, $s = (0, 1)$. One may calculate directly that $v^+((0, 0); s) = 0$. If $z = (1, 1)$, then
$$v^{++}((0, 0); s, z) = \limsup_{t \to 0^+} \frac{v(t^2/2,\ t + t^2/2)}{t^2/2} = -1.$$
On the other hand, $\Sigma(s) = \{(d_1, d_2) \mid d_2 = 0\}$ and $I(x_0, (d_1, 0)) = \{1\}$, so $M(x_0, (d_1, 0)) = \{(1, 0)\}$ and the right-hand side of (18) becomes
$$\inf_{d_1 \in \mathbf{R}} (2d_1^2 - 1) = -1.$$
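The limit in Example 4.2 can be confirmed numerically; the sketch below takes the example's data as printed ($f = -x_2$, $g_1 = x_1^2 + x_2$, $g_2 = -x_1^2 + x_2$), so that $v(s_1, s_2)$ reduces to a one-dimensional search over $x_1$, and uses the fact that $v^+((0,0); s) = 0$, so the first-order term drops out of the difference quotient.

```python
import numpy as np

# v(s1, s2) = -max{ x2 : x1^2 + x2 <= s1, -x1^2 + x2 <= s2 },
# computed by a 1-D grid search over x1.
def v(s1, s2):
    x1 = np.linspace(-2.0, 2.0, 40001)
    return -np.max(np.minimum(s1 - x1**2, s2 + x1**2))

# second-order difference quotients along s = (0, 1), z = (1, 1)
for t in [1e-1, 1e-2, 1e-3]:
    ratio = (v(t**2/2, t + t**2/2) - v(0.0, 0.0)) / (t**2/2)
    print(t, ratio)   # approaches -1, the claimed v^{++}((0,0); s, z)
```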
So with $z = (1, 1)$, the bound in (18) is attained.

If we let $s = 0$ in Theorem 4.1, we obtain the following $C^{1,1}$ version of Corollary 3.5:

Corollary 4.3 Let $f$; $g_i$, $i \in J$; $h_i$, $i \in L$; be $C^{1,1}$ at $x_0 \in \Omega(0)$. Suppose that $h$ is twice Neustadt differentiable at each $d$ in
$$\Sigma(0) = \{d \mid \nabla f(x_0)\, d = 0;\ \nabla g_i(x_0)\, d \le 0,\ i \in I(x_0);\ \nabla h(x_0)\, d = 0\},$$
$\nabla h(x_0)[\mathbf{R}^n] = \mathbf{R}^p$, and (17) holds. Let $z \in \mathbf{R}^{m+p}$. Then
$$v^+(0; z) \le \inf_{d \in \Sigma(0)}\ \max_{\lambda \in M(x_0, d)} \Big[ d^2 f(x_0; d) + \sum_{i \in I(x_0,d)} \lambda_i\, d^2 g_i(x_0; d) + \sum_{i=1}^p \lambda_{m+i}\, d^2 h_i(x_0; d) - \langle\lambda, z\rangle \Big] < +\infty. \eqno(19)$$
It is interesting to compare Corollary 4.3 with the bounds in [2, 10, 17, 29, 31], which use only first-order information about $f$, $g_i$, and $h_i$. For example, if $f$, $g_i$, and $h_i$ are strictly differentiable at $x_0 \in \Omega(0)$, $\nabla h(x_0)\mathbf{R}^n = \mathbf{R}^p$, and (17) holds, then by Theorem 4.4 of [31],
$$v^+(0; z) \le \max\{-\langle\lambda, z\rangle \mid \lambda \in M(x_0)\} < +\infty, \eqno(20)$$
where
$$M(x_0) := \Big\{\lambda \in \mathbf{R}_+^m \times \mathbf{R}^p \ \Big|\ \lambda_i = 0,\ \forall i \in J \setminus I(x_0);\ \nabla f(x_0) + \sum_{i=1}^m \lambda_i \nabla g_i(x_0) + \sum_{i=1}^p \lambda_{m+i} \nabla h_i(x_0) = 0 \Big\}.$$
The inequality (20) holds under weaker hypotheses than those of Corollary 4.3. However, if the hypotheses of Corollary 4.3 are satisfied, then (19) gives a sharper bound than (20). We illustrate this point with an example.

Example 4.4 In Example 4.2, let $x_0 = (0, 0)$, $z = (1, 0)$. As in [17], one may calculate that $v^+((0, 0); z) = -1/2$, while the right-hand side of (20) is 0, so that the bound in (20) is not tight. However, $\Sigma(0, 0) = \{(d_1, d_2) \mid d_2 = 0\}$, so that
$$\alpha(d_1) = \begin{cases} -2d_1^2 & \text{if } |d_1| \le .5, \\ 2d_1^2 - 1 & \text{if } |d_1| > .5. \end{cases}$$
Since $\inf_{d_1 \in \mathbf{R}} \alpha(d_1) = -1/2$, the bound in (19) is attained in this example. Minchenko and Sakolchik [22] compute $\lim_{t \to 0^+}(v(tz) - v(0))/t$ under an additional Hölder condition on the solution multifunction. Their work strengthens previous work of Gauvin and Janin [16].
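The numbers in Example 4.4 can also be checked numerically. The sketch below reuses the Example 4.2 data ($f = -x_2$, $g_1 = x_1^2 + x_2$, $g_2 = -x_1^2 + x_2$) with $z = (1, 0)$: the difference quotients of $v$ approach $-1/2$, and the piecewise function $\alpha(d_1)$ above has infimum $-1/2$.

```python
import numpy as np

# v(s1, s2) by 1-D grid search, as in the previous check
def v(s1, s2):
    x1 = np.linspace(-2.0, 2.0, 40001)
    return -np.max(np.minimum(s1 - x1**2, s2 + x1**2))

# upper Dini quotient v^+((0,0); z) for z = (1, 0): approaches -1/2
for t in [1e-2, 1e-3]:
    print(t, (v(t, 0.0) - v(0.0, 0.0)) / t)

# infimum of the piecewise alpha(d1): equals -1/2 (attained at |d1| = .5)
d1 = np.linspace(-2.0, 2.0, 40001)
alpha = np.where(np.abs(d1) <= 0.5, -2.0 * d1**2, 2.0 * d1**2 - 1.0)
print(alpha.min())
```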
5  Conclusion

We have established an upper bound for $v^{++}(0; s, z)$ under weaker smoothness assumptions than have previously been used for such estimates. An interesting special case of this bound is an upper bound for the upper Dini directional derivative of $v$ that takes advantage of second-order information about the objective and constraint functions.

There are several open questions that remain to be addressed. For example, when can equality be guaranteed to hold in our upper bounds? Can a lower bound for $v^{++}$ be derived under the weakened smoothness assumptions of this paper? Can our results be incorporated into a more general theory that also includes bounds on second-order directional derivatives of the solution function or multifunction of a nonsmooth program? Since there are generalizations of Theorem 2.4 that do not require strict differentiability, is it possible to weaken Assumption 3.1(b)? We hope to address these questions in future work.

Acknowledgement. I am grateful for the perceptive comments of Professor Marcin Studniarski and the referees.
References

[1] J.-P. Aubin and H. Frankowska, Set-Valued Analysis, Birkhauser, Boston, 1990.
[2] A. Auslender, Differentiable stability in non convex and non differentiable programming, Mathematical Programming Study 10 (1979) 29-41.
[3] A. Auslender and R. Cominetti, First and second-order sensitivity analysis of nonlinear programs under directional constraint qualification conditions, Optimization 21 (1990) 351-363.
[4] M. S. Bazaraa, H. F. Sherali, and C. M. Shetty, Nonlinear Programming: Theory and Algorithms, Wiley, New York, 1993.
[5] A. Ben-Tal and J. Zowe, A unified theory of first and second-order conditions for extremum problems in topological vector spaces, Mathematical Programming Study 19 (1982) 39-76.
[6] A. Ben-Tal and J. Zowe, Directional derivatives in nonsmooth optimization, Journal of Optimization Theory and Applications 47 (1985) 483-490.
[7] J. F. Bonnans and R. Cominetti, Perturbed optimization in Banach spaces I: A general theory based on a weak directional constraint qualification, Preprint, 1993.
[8] J. F. Bonnans, A. D. Ioffe, and A. Shapiro, Développement de solutions exactes et approchées en programmation non linéaire, C.R. Acad. Sci. Paris 315 (1992) 119-123.
[9] J. M. Borwein, Stability and regular points of inequality systems, Journal of Optimization Theory and Applications 48 (1986) 9-52.
[10] F. H. Clarke, Optimization and Nonsmooth Analysis, Wiley, New York, 1983.
[11] V. F. Dem'yanov and B. Pevnyi, Expansion with respect to a parameter of the extremal values of game problems, U.S.S.R. Computational Mathematics and Mathematical Physics 14 (1974) 33-45.
[12] A. V. Fiacco, Introduction to Sensitivity and Stability Analysis in Nonlinear Programming, Academic Press, New York, 1983.
[13] N. Furukawa, A second-order extension of Ljusternik's theorem without twice Fréchet differentiability condition, Bulletin of Informatics and Cybernetics 25 (1992) 53-59.
[14] J. Gauvin, A necessary and sufficient regularity condition to have bounded multipliers in nonconvex programming, Mathematical Programming 12 (1977) 136-138.
[15] J. Gauvin and R. Janin, Directional behaviour of optimal solutions in nonlinear mathematical programming, Mathematics of Operations Research 13 (1988) 629-649.
[16] J. Gauvin and R. Janin, Directional Lipschitzian optimal solutions and directional derivatives for the optimal value function in nonlinear mathematical programming, in: H. Attouch, J.-P. Aubin, F. Clarke, I. Ekeland (Eds.), Analyse Non Linéaire (Gauthier-Villars, Paris, 1989), 305-324.
[17] J. Gauvin and J. W. Tolle, Differential stability in nonlinear programming, SIAM J. Control Optimization 15 (1977) 294-311.
[18] J.-B. Hiriart-Urruty, J.-J. Strodiot, and V. Nguyen, Generalized Hessian matrix and second-order optimality conditions for problems with $C^{1,1}$ data, Applied Mathematics and Optimization 11 (1984) 43-56.
[19] A. D. Ioffe, On sensitivity analysis of nonlinear programs in Banach spaces: the approach via composite unconstrained optimization, SIAM J. Optimization 4 (1994) 1-43.
[20] D. Klatte and K. Tammer, On second-order sufficient optimality conditions for $C^{1,1}$ optimization problems, Optimization 19 (1988) 169-180.
[21] B. Kummer, An implicit-function theorem for $C^{0,1}$-equations and parametric $C^{1,1}$-optimization, Journal of Mathematical Analysis and Applications 158 (1991) 35-46.
[22] L. I. Minchenko and P. P. Sakolchik, Hölder behaviour of optimal solutions and directional differentiability of marginal functions in nonlinear programming, Preprint, 1994.
[23] R. T. Rockafellar, Marginal values and second-order necessary conditions for optimality, Mathematical Programming 26 (1983) 245-286.
[24] A. Seeger, Second-order directional derivatives in parametric optimization problems, Mathematics of Operations Research 13 (1988) 124-139.
[25] A. Shapiro, Second-order derivatives of extremal-value functions and optimality conditions for semi-infinite programs, Mathematics of Operations Research 10 (1985) 207-219.
[26] A. Shapiro, Second-order sensitivity analysis and asymptotic theory of parametrized nonlinear programs, Mathematical Programming 33 (1985) 280-299.
[27] A. Shapiro, Sensitivity analysis of nonlinear programs and differentiability properties of metric projections, SIAM J. Control and Optimization 26 (1988) 628-645.
[28] A. Shapiro, Perturbation theory of nonlinear programs when the set of optimal solutions is not a singleton, Applied Mathematics and Optimization 18 (1988) 215-229.
[29] D. E. Ward, Differential stability in non-Lipschitzian optimization, Journal of Optimization Theory and Applications 73 (1992) 101-120.
[30] D. E. Ward, Calculus for parabolic second-order derivatives, Set-Valued Analysis 1 (1993) 213-246.
[31] D. E. Ward, Dini derivatives of the marginal function of a non-Lipschitzian program, SIAM J. Optimization, to appear.
[32] D. E. Ward, Epiderivatives of the marginal function in nonsmooth parametric optimization, Optimization 31 (1994) 47-61.
[33] D. E. Ward and J. M. Borwein, Nonsmooth calculus in finite dimensions, SIAM J. Control and Optimization 25 (1987) 1312-1340.
[34] X. Q. Yang and V. Jeyakumar, Generalized second-order directional derivatives and optimization with $C^{1,1}$ functions, Optimization 26 (1992) 165-185.
[35] C. Zalinescu, On convex sets in general position, Linear Algebra and its Applications 64 (1985) 191-198.
Recent Advances in Nonsmooth Optimization, pp. 438-458
Eds. D.-Z. Du, L. Qi and R.S. Womersley
©1995 World Scientific Publishing Co Pte Ltd

A SLP Method with a Quadratic Correction Step for Nonsmooth Optimization

Jianzhong Zhang
Department of Mathematics, City Polytechnic of Hong Kong, Hong Kong

Chengxian Xu
Department of Mathematics, Xian Jiaotong University, P.R. China

Yuan-An Fan
Frank Russell Co., Tacoma, USA
Abstract

To improve the Fletcher-Sainz de la Maza method for composite nonsmooth optimization problems, it is suggested in this paper that a method by Fontecilla can be incorporated to form a two-step movement. The new algorithm does not calculate a pair of orthogonal bases and thus the discontinuity problem pointed out by Byrd and Schnabel is avoided. Also, Powell's sufficient condition for superlinear convergence holds under rather mild conditions in the modified version. It is shown that the revised method is globally convergent with a locally superlinear rate. Computational experiments have been conducted and the numerical results show that the performance of this new version is satisfactory.
1  Introduction

In this paper we consider the problem of minimizing the composite nonsmooth function
$$\min_{x \in \mathbf{R}^n} \psi(x) := f(x) + h(c(x)) \eqno(1.1)$$
where $f : \mathbf{R}^n \to \mathbf{R}^1$ and $c : \mathbf{R}^n \to \mathbf{R}^m$ are twice continuously differentiable, and $h : \mathbf{R}^m \to \mathbf{R}^1$ is positively homogeneous (i.e. $h(\alpha c) = \alpha h(c)$ for all $\alpha \ge 0$) and polyhedral convex. Problem (1.1) has many applications; for example, it occurs when solving a system of nonlinear equations $c_i(x) = 0$, $i = 1, 2, \ldots, m$ ($m \ge n$) by minimizing $\|c(x)\|_1$ or $\|c(x)\|_\infty$, and when finding a local solution to a nonlinear programming problem using exact penalty functions [6].

The first order necessary condition for $x_*$ to be a local solution of (1.1) is that there exists a vector of multipliers $\lambda_* \in \partial h(c_*)$ such that
$$g_* + A_* \lambda_* = 0 \eqno(1.2)$$
where $g_* = \nabla f(x_*)$ and $A_* = [\nabla c_1(x_*), \ldots, \nabla c_m(x_*)]$ [6]. The subdifferential set $\partial h(c_*) = \partial h(c(x_*))$ of the polyhedral convex function $h(c(x))$ can be expressed as
$$\partial h(c_*) = \{\lambda \mid \lambda = \lambda_* + D_* u : u \in U\} \eqno(1.3)$$
where $D_*$ is an $m \times t$ matrix whose columns form a basis for the set $\partial h(c_*) - \lambda_*$, $t$ is the dimension of $\partial h(c_*)$, and $U$ is a set in $\mathbf{R}^t$. This structure was first introduced by Osborne in [14]. If $0 \in \operatorname{Int} U$, or equivalently $\lambda_*$ lies in the interior of $\partial h(c_*)$, then strict complementarity is said to hold. The second order sufficient condition for the solution $x_*$ is
$$s^T W_* s > 0, \quad \forall s \in S_*, \eqno(1.4)$$
where
$$W_* = \nabla^2 f(x_*) + \sum_{i=1}^m \lambda_{*i} \nabla^2 c_i(x_*) \eqno(1.5)$$
and
$$S_* = \{s \mid \|s\| = 1,\ (A_* D_*)^T s = 0\}.$$
Under the conditions of strict complementarity and second order sufficiency, the problem
$$\min f(x) + \lambda_*^T c(x) \quad \text{s.t.}\ D_*^T c(x) = 0 \eqno(1.6)$$
is locally equivalent to problem (1.1) in the sense that $x_*$ minimizes (1.1) with multiplier $\lambda_*$ if and only if $x_*$ solves (1.6). (We shall see later in Section 4 that if the function $\psi(x)$ is formulated as an exact penalty function for optimizing smooth nonlinear programming problems with equality and inequality constraints, $D_k$ and $D_*$ are used to identify the active constraints at $x_k$ and $x_*$.)

Most methods for problem (1.1) are line search descent methods in which the search direction is the minimizer of the problem
$$\min_\delta\ q_k(\delta) = l_k(\delta) + \tfrac{1}{2}\delta^T B_k \delta \eqno(1.7)$$
J. Zhang, C. Xu and Y. Fan
where h(6) = fk + gl6 + h(ck + ATk6) fk = f(xk), gk = V / ( x i ) , ck = c(xk), Ak = [V C l (x t ), • • •, Vc m (x*)] and Bk is either the matrix Wk, that is defined by a formula similar to (1.5) but substituting x, and A. by i t and \k respectively, or an approximation to Wk. When nondegeneracy condition, strict complementarity condition and second order sufficient condition are satisfied, line search descent methods are locally convergent and the convergence rate is quadratic or superlinear depending on the closeness of Bk to Wk [17]. Fletcher and Sainz de la Maza proposed in [8] a globally convergent method for the solution of problem (1.1) using a trust region strategy. The F-S method is a hybrid method combining a successive linear programming method with a type of successive quadratic programming methods. In each iteration, at first, the following trust region type linearized subproblem min//c((5)
s.t-IHU < pk
(1.8)
is solved such that the multiplier vector At can be estimated and the linearized re duction A/* = 4(0) - lk(6k) =
(1.10)
which, as an approximation of problem (1.6), is a smooth minimization problem under a set of equality constraints. Note that Dk is the matrix of a basis for the set dh(ck) — Xk, and for some commonly used homogeneous polyhedral convex functions h, their corresponding forms of Dk can be found in [6]. Many methods are proposed for solving this type of optimization problems. Among them, one type of method makes QR decomposition for the Jacobian Ak = AkDk of the constraint functions obtaining a pair of orthogonal bases Yk in the range space R{Ak) and Zk in the null ~ T
space Af(Ak ), and then considers two movements in the two subspaces respectively. The two mutually orthogonal movements comprise a complete step. This type of method can be seen in [2], [3], [13], [17] and other papers. In [8] Fletcher and Sainz
de la Maza carry out one iteration of the Coleman-Conn method [2], [3] for the auxiliary problem (1.10) and obtain a vector $\tilde\delta_k$, which is an alternative choice of the moving vector with respect to $\delta_k$. If the criterion
$$\psi(x_k) - \psi(x_k + \tilde\delta_k) \ge \theta\,\Delta l_k \min(1, \Delta l_k/b_k) \eqno(1.11)$$
is satisfied, they choose $d_k = \tilde\delta_k$; otherwise they set $d_k = \delta_k$, where $b_k = \delta_k^T B_k \delta_k$ and $\theta$ is a constant in $(0, \frac{1}{2})$. To make a move, set $x_{k+1} = x_k + d_k$ if a sufficient reduction in $\psi$ is achieved.

The orthogonal bases produced by QR decomposition need not depend continuously on $\hat A_k$, a difficulty pointed out by Byrd and Schnabel. Fontecilla [9] uses the matrix
$$P_k = I - B_k^{-1}\hat A_k(\hat A_k^T B_k^{-1}\hat A_k)^{-1}\hat A_k^T \eqno(1.12)$$
to replace the QR decomposition of $\hat A_k$. As $P_k$ is continuous with respect to $\hat A_k$, this replacement avoids the trouble of discontinuity. We find that the idea can be adopted for our purpose and brings about a revised algorithm that possesses nice theoretical properties as well as satisfactory practical performance.

This paper is organized as follows. In Section 2 we present a revised F-S algorithm after a simple introduction of the Fontecilla method. It is then analysed in Section 3 that the new algorithm is globally convergent with a locally superlinear rate of convergence. In Section 4 some important implementation issues of the algorithm are explained, and a set of experimental results is reported in comparison with the original algorithm. It is found that, in addition to solving the discontinuity problem, the revised algorithm has some other advantages, which we shall point out in Section 5.
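The two properties of $P_k$ used below can be checked directly; the following sketch (with made-up data) verifies that $P_k$ is the oblique projector onto $N(\hat A_k^T)$: it is idempotent and annihilates $\hat A_k^T$ on the left, and, unlike a pair of QR bases, it is a continuous function of a full-column-rank $\hat A_k$.

```python
import numpy as np

# P = I - B^{-1} Ahat (Ahat^T B^{-1} Ahat)^{-1} Ahat^T, as in (1.12),
# with B positive definite and Ahat of full column rank (toy data).
rng = np.random.default_rng(0)
n, t = 5, 2
B = np.diag(rng.uniform(1.0, 3.0, n))        # B_k: positive definite
Ahat = rng.standard_normal((n, t))           # Ahat_k = A_k D_k
Binv = np.linalg.inv(B)
P = np.eye(n) - Binv @ Ahat @ np.linalg.inv(Ahat.T @ Binv @ Ahat) @ Ahat.T

print(np.abs(P @ P - P).max())               # idempotent: P^2 = P
print(np.abs(Ahat.T @ P).max())              # range(P) lies in N(Ahat^T)
```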
2  The Revised Algorithm
The Fontecilla method solves general equality constrained nonlinear optimization problems
$$\min f(x) \quad \text{s.t.}\ c(x) = 0. \eqno(2.1)$$
The Lagrangian function of problem (2.1) is defined by
$$l(x, \mu) = f(x) + \mu^T c(x) \eqno(2.2)$$
where $\mu \in \mathbf{R}^m$ is a Lagrangian multiplier vector. Let $x_k$ be a current approximation to the solution $x_*$ of problem (2.1) and $\mu_{k+1}$ be an estimate of the Lagrangian multiplier vector. Let $B_k$ be a positive definite approximation to the Hessian $W_*$ of the Lagrangian function at $x_*$, updated using quasi-Newton formulae, for example the BFGS or DFP formulae. Then the iterate $x_{k+1}$ in Fontecilla's method is simply taken to be
$$x_{k+1} = x_k + d_k \eqno(2.3)$$
where the correction $d_k$ is the sum of a step $h_k \in N(A_k^T)$ and a step $v_k$ satisfying $c(x_k + h_k) + A_k^T v_k = 0$; i.e.,
$$d_k = h_k + v_k. \eqno(2.4)$$
The step $h_k$ is used to update the matrix $B_k$ so that the positive definiteness of the matrix is maintained, that is,
$$B_{k+1} = \mathrm{BFGS/DFP}(B_k, h_k, y_k) \eqno(2.5)$$
where
$$y_k = \nabla_x l(x_k + h_k, \mu_{k+1}) - \nabla_x l(x_k, \mu_{k+1}) \eqno(2.6)$$
and $\nabla_x l(x, \mu)$ is the gradient of the Lagrangian function $l(x, \mu)$ with respect to $x$. Define the multiplier estimate as
$$\mu_{k+1} = -(A_k^T B_k^{-1} A_k)^{-1} A_k^T B_k^{-1} g_k \eqno(2.7)$$
and take $h_k$ to be the solution of the equations
$$B_k h_k = -\nabla_x l(x_k, \mu_{k+1}). \eqno(2.8)$$
Then it can be verified that $h_k \in N(A_k^T)$. Obviously,
$$v_k = -B_k^{-1} A_k (A_k^T B_k^{-1} A_k)^{-1} c(x_k + h_k) \eqno(2.9)$$
satisfies $c(x_k + h_k) + A_k^T v_k = 0$. It is the $h_k$ determined by (2.8) and the $v_k$ in (2.9) that form a complete step in Fontecilla's method. Under relatively mild conditions (see Eq. (3.2) below), Fontecilla's method satisfies Powell's sufficient condition [15] for two-step q-superlinear convergence. Furthermore, the convergence rate of the sequence $\{x_k + h_k\}$ is one-step q-superlinear.

When the Fontecilla step is used to replace the Coleman-Conn step in the F-S method, iteration $k$ can be described as follows.
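The formulae (2.7)-(2.9) can be exercised on a toy instance. The sketch below (all data made up) uses a quadratic objective and linear constraints $c(x) = A^T x - b$, for which the step can be checked exactly: $h_k$ lands in $N(A_k^T)$ and, since $c$ is linear, $c(x_k + d_k) = c(x_k + h_k) + A_k^T v_k = 0$.

```python
import numpy as np

# Toy instance for one Fontecilla step (2.7)-(2.9).
n, m = 4, 2
rng = np.random.default_rng(0)
Q = np.diag([2.0, 3.0, 1.5, 4.0])           # B_k: positive definite Hessian model
q = rng.standard_normal(n)
A = rng.standard_normal((n, m))             # A_k = [grad c_1, ..., grad c_m]
b = rng.standard_normal(m)

x = rng.standard_normal(n)                  # current iterate x_k
g = Q @ x + q                               # g_k = grad f(x_k)
c = lambda x: A.T @ x - b                   # linear constraint map c(x)

Binv = np.linalg.inv(Q)
M = A.T @ Binv @ A
mu = -np.linalg.solve(M, A.T @ Binv @ g)    # (2.7)
h = -Binv @ (g + A @ mu)                    # (2.8): B_k h_k = -grad_x l(x_k, mu)
v = -Binv @ A @ np.linalg.solve(M, c(x + h))  # (2.9)
d = h + v

print(np.abs(A.T @ h).max())                # h_k lies in N(A_k^T)
print(np.abs(c(x + d)).max())               # c(x_k + h_k) + A_k^T v_k = 0
```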
Step 1: Solve the trust region linear programming subproblem (1.8) to give $\delta_k$, $\lambda_k$, $D_k$, $\Delta l_k$ and $b_k$.

Step 2: Form a Fontecilla step by using the following formulae:

(I) $\mu_{k+1} = -(\hat A_k^T B_k^{-1}\hat A_k)^{-1}\hat A_k^T B_k^{-1}\hat g_k$, where $\hat g_k = g_k + A_k\lambda_k$;  (2.10)

(II) $h_k = -B_k^{-1}\nabla l_k(x_k)$, where $\nabla l_k(x_k) = \hat g_k + \hat A_k\mu_{k+1}$;  (2.11)

(III) $v_k = -B_k^{-1}\hat A_k(\hat A_k^T B_k^{-1}\hat A_k)^{-1} D_k^T c(x_k + h_k)$;  (2.12)

(IV) $d_k = h_k + v_k$.
Step 3: If $\psi(x_k) - \psi(x_k + d_k) \ge \theta\,\Delta l_k \min(1, \Delta l_k/b_k)$, set
$$x_{k+1} = x_k + d_k, \qquad y_k = \nabla l_k(x_k + h_k) - \nabla l_k(x_k), \qquad B_{k+1} = \mathrm{BFGS/DFP}(B_k, h_k, y_k).$$
In the algorithm, $l_k(x)$ is the Lagrangian function
$$l_k(x) = f(x) + \lambda_k^T c(x) + \mu_{k+1}^T D_k^T c(x)$$
of problem (1.10); $\theta \in (0, \frac{1}{2})$ and $0 < \sigma_1 < 1 < \sigma_2$ are fixed parameters, and $\rho_{\max}$ is a user-supplied upper bound on $\rho_k$. The initial matrix $B_0$ is set to the unit matrix $I$, and some ad hoc techniques (see [6] for example) are used to maintain the positive definiteness of $B_k$ when $h_k^T y_k > 0$ does not hold. We here use subscripts to replace superscripts, but do not change any other symbols of the Fletcher-Sainz de la Maza method.
3  Convergence Analysis
Obviously, the global convergence property proved in [8] is not affected, as it is guaranteed by solving a linearized trust region subproblem in each iteration and by choosing $\delta_k$,
rather than $h_k + v_k$, as the moving vector whenever $h_k + v_k$ does not yield a sufficient reduction in the value of $\psi(x)$. In short, global convergence is controlled by the criterion in Step 3, which is not changed in our revised algorithm. Only the local convergence rate need be considered here. The following hypotheses will be assumed in the rest of the paper.

(A1) $x_k \to x_*$, $\lambda_k \to \lambda_*$. By the global convergence results of the F-S method, this assumption ensures that $\lambda_* \in \partial h(c_*)$ and $\hat g_* = g_* + A_*\lambda_* = 0$.

(A2) $\hat A_* = A_* D_*$ and $\hat A_k = A_k D_k$ (for all $k$) have full column rank.

(A3) $\{B_k\}$ is bounded and uniformly positive definite; i.e., there exist constants $b$ and $\bar b$ such that
$$\|B_k\| \le b, \qquad \|B_k^{-1}\| \le \bar b, \qquad \forall k. \eqno(3.1)$$

(A4) For $P_k$ given in (1.12),
$$\lim_{k \to \infty} \frac{\|P_k(B_k - W_*)h_k\|}{\|h_k\|} = 0. \eqno(3.2)$$

Lemma 3.1
$$\|(\hat A_k^T B_k^{-1}\hat A_k)^{-1}\| \le \|(\hat A_k^T\hat A_k)^{-1}\|\,\|B_k\|. \eqno(3.3)$$

PROOF. We know that
$$y^T B_k^{-1} y \ge \|y\|^2/\|B_k\|, \qquad \forall y \ne 0.$$
Since $\hat A_k^T\hat A_k$ is also positive definite and symmetric,
$$x^T\hat A_k^T\hat A_k x \ge \|x\|^2/\|(\hat A_k^T\hat A_k)^{-1}\|, \qquad \forall x \ne 0.$$
Let $H_k = \hat A_k^T B_k^{-1}\hat A_k$. Using the above two inequalities, for any $x \ne 0$ we have
$$x^T H_k x = x^T\hat A_k^T B_k^{-1}\hat A_k x \ge \|\hat A_k x\|^2/\|B_k\| \ge \|x\|^2/\big(\|B_k\|\,\|(\hat A_k^T\hat A_k)^{-1}\|\big). \eqno(3.4)$$
Inequality (3.4) shows that
$$\lambda_{\min}(H_k) \ge 1/\big(\|B_k\|\,\|(\hat A_k^T\hat A_k)^{-1}\|\big)$$
and therefore
$$\|H_k^{-1}\| = \lambda_{\max}(H_k^{-1}) \le \|B_k\|\,\|(\hat A_k^T\hat A_k)^{-1}\|.$$
Q.E.D.
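The bound (3.3) is easy to probe numerically; the following sketch (random instances, not from the paper) confirms that the spectral norm of $(\hat A_k^T B_k^{-1}\hat A_k)^{-1}$ never exceeds $\|(\hat A_k^T\hat A_k)^{-1}\|\,\|B_k\|$.

```python
import numpy as np

# Monte-Carlo check of Lemma 3.1:
# ||(Ahat^T B^{-1} Ahat)^{-1}||_2 <= ||(Ahat^T Ahat)^{-1}||_2 * ||B||_2.
rng = np.random.default_rng(1)
worst = 0.0
for _ in range(200):
    n, t = 6, 3
    Ahat = rng.standard_normal((n, t))
    L = rng.standard_normal((n, n))
    B = L @ L.T + np.eye(n)                 # positive definite B_k
    H = Ahat.T @ np.linalg.inv(B) @ Ahat
    lhs = np.linalg.norm(np.linalg.inv(H), 2)
    rhs = np.linalg.norm(np.linalg.inv(Ahat.T @ Ahat), 2) * np.linalg.norm(B, 2)
    worst = max(worst, lhs / rhs)
print(worst)                                # stays below 1
```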
Let $S = \{k \mid k \text{ is a positive integer and } \partial h(c_k) = \partial h(c_*)\}$.
We first consider those iterations in which Dk = D„ i.e., all Dk, k 6 5, are a constant matrix. Lemma 3.2
For k £ 5, /ifc -> 0, u* -> 0, and therefore dk -* 0.
Proof: When fc —> oo, Ak approaches A,. Therefore, assumption (A2) implies that \\{A[Ajt)_1|| is bounded for sufficiently large k € 5. On the other hand, gk = 9k + AkXk —» g, 4- A,A. = g» = 0. So, by the definition of nk+\ (see (2.10)), Lemma 3.1 and assumption (A3) we know fik+i —> 0, which by (2.11) shows /ifc —► 0. Since problem (1.6) is locally equivalent to problem (1.1), x* must satisfy Djc, = 0, which implies that for k € 5, D^c(xk-\-hk) = D^c(xk-\-hk) —» £)^c(x») = 0. Therefore from equation (2.12) we have Ufc —> 0. Q.E.D. The next lemma gives a minimizer of the unconstrained optimization problem (1.7) for sufficiently large k g S. Lemma 3.3
Let vk =
-B?Ak{ATkB^Ak)-*DTkc{xk)
and let 6k be the optimal solution of the unconstrained optimization problem (1.7). Tnen for sufficiently large k 6 S, 6k = hk-rvk
(3.5)
Remark 3.1 The only difference between 6k and the vector dk used in the revised method is that the quantity c(xk + hk) in (2.12) is replaced by c(xk). If we use 8k to form a sequence xk+i = xk+Sk as Womersley did in [17], then under a condition similar to (3.2), {xk} will be locally superlinearly convergent, but its global convergence is not guaranteed. Proof:
Obviously 6k is also an optimal solution of the problem
mmf(6) + h(c(6))
446
J. Zhang, C. Xu and Y. Fan
where f{6) = g^S + 6TBk6/2, and c(S) = ck + ATk6. Let ck = c(6k). As proved in [9], for sufficiently large k 6 S, dh(ck) = dh(c,) and thus for large k G 5 , A* € dh(ck) = dh(c,) = dh(ck), which means that, just as problems (1.1) and (1.6) are locally equivalent, 6k is also a n optimal solution of the problem min/(<$) + A[c(<5) s.t DTkc{6) = 0
(3.6)
According to the Karush-Kuhn-Tucker condition for smooth constrained optimiza tion, there exists a vector r such that V / ( 4 ) + Vc(«*)A* + Vc(6k)Dkr
= 0
(3.7)
Equations (3.6) and (3.7) can be rewritten as gk + Bk6k + Akr = 0 h + A T k8 k = 0
(3.8) (3.9)
where ck = D\ck. Multiplying (3.8) by —(AkrBk1Ak)~1AkBk1 (2.10), we have M*+, - r =
and observing the definition of nk+i in {AlBklAk)-lATk6k
Solving r from the above equation and substituting it into (3.8), we obtain
=-Bkigk-Bk-lAkr = -B^ih + Akfik+1) + = -Bk\gk + Aktik+1) = hk + vk
Sk
B^AkiAlB^A^AlS, B^MAlB^A^'c,
which completes the proof.
Q.E.D.
We will now follow the process developed by Fletcher and Sainz de la Maza [8] to prove the main lemmas for predicted and actual reductions in ip(x) when a Fontecilla step is taken. L e m m a 3.4
For large k e S, the predicted reduction Aqk = qk(0) - qk(6k) is given
by A
1k = ^hTkBkhk
Proof:
- -vTkBkvk
- fkvk
+ h(ck) - \Tkck
(3.10)
From equation (1.7), Ag/t = -gTk8k + h{ck) - h(ck + ATk8k) - -6TkBk6k
(3.11)
A SLP Method
447
It follows from the fact hk € ^f(Aj)
that
vTkBkhk = -clDkiAjB;1
A*)" 1 ATkhk = 0.
(3.12)
Therefore, we have 8TkBk8k = hTkBkhk + vTkBkvk.
(3.13)
On the other hand, equation (3.8) generates 8Tkgk + 8TkAkr + 8TkBk8k = 0. From the definition of gk, the fact hk 6 JV(AJO> and (3.13), we have &k9k = SlAkXk = -8lAkXk = -8TkAk\k
anc
- vTkAkr -
l
tne
equations (3.8), (3.12), (3.5)
8lBk8k
+ vl(gk + BkSk) - 8TkBk8k + vTkgk - hlBkhk
(3.14)
Finally, since At € dh(ck) and h is homogeneous, h{ck + ATk8k) = h(ck) = XTk(ck + ATk6k)
(3.15)
Substituting (3.13), (3.14) and (3.15) into (3.11) we obtain (3.10). Q.E.D. The following lemma gives the relation between the actual reduction Atpk obtained in a Fontecilla step and the predicted reduction Aqk given in (3.10). Lemma 3.5
For large k G S, Atpk ~ Aqk = o(\\hk\\2) + o(\\vk\\)
(3.16)
wiere Aipk =
From problem (1.1), &
(3.17)
Let c'k = c(xk + hk) + ATkVk- Expanding c(xk + dk) = c(xk + hk + vk) at xk + hk and using Lipschitz continuity of the polyhedral convex function h, we have h (c(xk + dk)) = h(c'k + o(\\vk\\)) = h(c'k) + o(\\vk\\)
(3.18)
J. Zhang, C. Xu and Y. Fan
448
Since dk —> c„, when k —► oo we know that dh(dk) C dh(c.) for large k (see Lemma 1.2 in [8]). On the other hand, when k € S, it follows from the definition of Vk (see equation (2.12)) that DT.c'k = DTkc'k = DTkc{xk + hk) + ATk vk = 0 Then, again the Lemma 1.2 in [8] gives dh(c) = dh(dk). Thus, from the Lemma 1.1 in [8] we have % i ) = A J 4 = Afo** + fct) + \TkATkvk 1
m
= Aftc* + ATkhk) + -
YWUhttdixkihk
+AfA^ f c + 0 ( | | ^ | | 2 )
(3.19)
Substituting (3.18) and (3.19) into (3.17), and using the fact / ( * * + d„) -h=
T 9 kdk
+ \hTkV'fkhk
+ o(\\hkf)
+ o(\\vk\\)
we obtain, A<^ =
h(ck) - (hk + vk)T(gk + Ak\k) -\lck + o{\\hk\\>) + o(\\vk\\)
\hTkWkhk
(3.20)
where Wk is the Hessian matrix of the Lagrange function f(x) + \%c(x) evaluated at x k. From the definition of hk (see Eq.(2.11)) and the property that hk 6 Af{Ak), w e obtain hjBkhk = -hTk(gk + AkHk+i) = -hTkgk (3.21) Substituting (3.21) into (3.20) gives A
hTkBkhk-\hTkWkhk-vl~gk + h{ck) -\lck + o{\\hkr) + o(\\vk\\)
V-22>
Equations (3.10) and (3.22) tell us that A
\hTkBkhk-\hTkWkhk + \vlBkVk + ( ^ - ^ ) T ^ + o(ll^|| 2 ) + o(||^||)
, , _ - J
(3 23
As hk = Pkhk, under assumption (A4), hTk(Bk - Wk)hk = hTkPk(Bk - Wk)hk = o(\\hk\\2) From the fact hk G A/"(A£) and the boundedness of the matrix we have vk - vk = B^AkiAlBk'A^Dllcix, = Bk-'Ak{ATkBVAkVDTk[ATkhk = 0(\\hh\\2)
Bkl
(3.24) A^A*
Bkl
Ak)~l
+ hk) - c(xk)} + 0(\\hk\\2)} (3.25)
A SLP Method
449
Hence, (vk-vk)Tgk
= o(\\hk\\2)
(3.26)
and vTkBkvk = vjBkvk
+ o(\\vk - vk\\)
= o(|K||) + 0 (||^f)
(3.27)
Substituting (3.24), (3.26) and (3.27) into (3.23) generates (3.16). Q.E.D. Having proved Lemma 3.4 and Lemma 3.5, we can obtain Aipk — Aqk = o(||A
For k 6 S, when k —> co A
(3.28)
Therefore, the step dk = hk + vk produced in the revised algorithm satisfies the criterion (1.11) for sufEciently large t e S . In other words, for large k e S, xk+i = xk + dk. Proof:
Since hk G Af{Al), ck = DTkck = DTk(ck + ATkhk) = DTkc(xk + hk) +
0(\\hkf)
Therefore ||^c(xi
+
^ ) | | = 0(||c,||) + 0(||^|| 2 )
and Huftll = || - B^Ak(ATkB^Ak)-lDlc(xk
+ hk)\\ = 0(\\ck\\) + 0(\\hk\\2)
(3.29)
Notice that equation (3.12) in [8] is still true, and is expressed here as M v
> h{ck) _ XTkck > v\\ck\\
(3.30)
where v is a constant in the interval (0, 1). We divide the index set S into two collectively exhaustive subsets: S = Si U S2, such that Si = {k\\\ck\\=o(\\hk\\2), 2
k£S]
s2 = U|||M = o(|N|), kes}
450
J. Zhang, C. Xu and Y. Fan
and inspect the following two different cases: (1)
For k 6 Si, from (3.29) we have
I M = 0(11**11*)-
(3-31)
According to relation (3.16), we obtain AVk-Aqk
= o(\\hk\\2).
(3.32)
From Eqs. (3.27) and (3.31), we have vTkBkVk = o(\\hk\\2)
(3.33)
As gk —> 0, it can be deduced from (3.25) and (3.31) that vTkgk = o(\\vk\\) = 0 ( | K | | ) + o(\\vk - vk\\) = o(\\hk\\2).
(3-34)
Substituting (3.30), (3.33) and (3.34) into expression (3.10), we obtain Aqk = \hTkBkhk
+ o(\\hk\\2)
> \\hk\\2/4b
(3.35)
for large k £ Si, where the last inequality comes from assumption (A3). Then (3.32) and (3.35) imply Atpk/Aqk -> 1, when k ^ oo (3.36) (2)
For $k \in S_2$, (3.29) gives $\|v_k\| = O(\|\bar c_k\|)$, and thus (3.16) implies
$$\Delta\psi_k - \Delta q_k = o(\|\bar c_k\|). \quad (3.37)$$
From equation (3.27), we have
$$v_k^T B_k v_k = o(\|\bar c_k\|), \quad (3.38)$$
whereas
$$v_k^T g_k = o(\|v_k\|) = o(\|\bar c_k\|). \quad (3.39)$$
Combining (3.10), (3.30), (3.38) and (3.39), and using assumption (A3), we have
$$\Delta q_k \ge h(c_k) - \lambda_k^T c_k + o(\|\bar c_k\|) \ge \tfrac{\nu}{2} \|\bar c_k\| \quad (3.40)$$
for large $k \in S_2$. Equations (3.37) and (3.40) ensure that
$$\Delta\psi_k / \Delta q_k \to 1, \quad \text{when } k \to \infty,\ k \in S_2. \quad (3.41)$$
Equations (3.36) and (3.41), together, give the limit (3.28).
The second conclusion of the theorem follows from (3.28), because it is pointed out in [8] that $\Delta q_k \ge \tfrac{1}{4}\Delta l_k \min(1, \Delta l_k / b_k)$. Q.E.D.

Our next result shows that the correct active set at the optimal solution $x_*$ is ultimately identified by the algorithm, i.e., $k \in S$ holds for all sufficiently large $k$. In this case, as Fontecilla has proved, the sequence $\{x_k\}$ converges 2-step q-superlinearly, whereas the sequence $\{x_k + h_k\}$ converges 1-step q-superlinearly (see Theorem 4.6 and Theorem 5.1 in [9]). To prove this result, we need another two assumptions.

(A5) Strict complementarity holds, i.e., $0 \in \operatorname{int} U$ (see (1.3)).

(A6) $\{(A_k^T A_k)^{-1}\}$ is uniformly bounded.
Theorem 3.2 Under Assumptions (A1)-(A6), $k \in S$ for all sufficiently large $k$, i.e., starting with a certain iteration, it is always true that $x_{k+1} = x_k + h_k + v_k$.

Proof: Let $\bar S = \{k \mid k \notin S\} = \{k \mid \partial h(\bar c_k) \ne \partial h(c_*)\}$. Assume $\bar S$ contains an infinite subsequence, from which we shall derive a contradiction. Let $\tilde S$ be a thinner subset of $\bar S$, on which $\partial h(\bar c_k)$ and the corresponding matrix $D_k$ are fixed. According to Theorem 2.3 in [9], the strict complementarity condition implies $\partial h(\bar c_k) \supseteq \partial h(c_*)$. The same theorem also points out that when $D_k^T c_* = 0$, $\partial h(\bar c_k) \subseteq \partial h(c_*)$. Therefore, for $k \in \tilde S$, $D_k^T c_* \ne 0$.

By assumption (A6) and Lemma 3.1, we know that $\{(A_k^T B_k^{-1} A_k)^{-1}\}$ is bounded. So, from the fact that $\bar g_k \to 0$ and the definitions of $\mu_{k+1}$ and $h_k$ (see (2.10) and (2.11)), we have $\mu_{k+1} \to 0$ and $h_k \to 0$. As shown in the proof of Lemma 3.5, $D_k^T c_k' = 0$, i.e., $A_k^T v_k = -D_k^T c(x_k + h_k)$. Thus
$$A_k^T v_k = -D_k^T c_* + o(1). \quad (3.42)$$
For $k \in \tilde S$, since $D_k^T c_* \ne 0$ and $A_k$ is bounded, (3.42) means that when $k$ is large, $v_k$ is uniformly bounded away from zero. As $d_k = h_k + v_k$, it is immediate that there exists a constant $\gamma' > 0$ such that
$$\|d_k\| \ge \gamma', \quad \text{for large } k \in \tilde S. \quad (3.43)$$
Since $\lambda_k \in \partial h(\bar c_k)$, and $D_k$ is a basis for $\partial h(\bar c_k) - \lambda_k$, by Lemma 1.1 of [8], $D_k^T \bar c_k = D_k^T (c_k + A_k^T \delta_k) = 0$. But for $k \in \tilde S$, it is clear that $D_k^T c_k \to D_k^T c_*$. Therefore
$$D_k^T A_k^T \delta_k = -D_k^T c_k = -D_k^T c_* + o(1), \quad (3.44)$$
and from this we obtain the conclusion that $\delta_k$ ($k \in \tilde S$) is bounded away from zero. So, by reducing the value of $\gamma'$ if necessary, the inequality
$$\|\delta_k\| \ge \gamma', \quad \text{for large } k \in \tilde S$$
is also true. As the trust region radius $\rho_k \ge \|\delta_k\|$, we have
$$\rho_k \ge \gamma', \quad \text{for large } k \in \tilde S.$$
Since the number of different sets $\partial h(\bar c_k)$ is finite, $\bar S$ can be partitioned into a finite number of thinner subsets. So there exists $\gamma > 0$ such that
$$\|d_k\| \ge \gamma, \quad \|\delta_k\| \ge \gamma \quad \text{and} \quad \rho_k \ge \gamma \quad \text{for sufficiently large } k \in \bar S. \quad (3.45)$$
In the algorithm, $x_{k+1}$ has three possible choices: $x_k + h_k + v_k$, $x_k + \delta_k$, or $x_k$. Since $x_k \to x_*$, $x_{k+1} - x_k \to 0$. So (3.45) means that $x_{k+1} = x_k$ for sufficiently large $k \in \bar S$ (say $k \ge \bar k$). In this case, the radius of the trust region is reduced to $\sigma_1 \rho_k$, with $\sigma_1 < 1$. Hence, from (3.45), it is impossible that $k \in \bar S$ for all large $k$. In other words, there is an infinite subsequence $\{k_j\}$ such that $k_j \in S$. For sufficiently large $k_j \in S$, Theorem 3.1 indicates that the Fontecilla step $d_{k_j}$ satisfies (1.11) and $x_{k_j+1} = x_{k_j} + d_{k_j}$. Thus $x_k \to x_*$ means $d_k \to 0$, i.e.,
$$\|d_{k_j}\| < \gamma/\sigma_2, \quad \text{for } j \ge J \text{ and } k_j \in S, \quad (3.46)$$
where $\sigma_2 > 1$ is given in the algorithm. Without loss of generality, we can assume $k_J \ge \bar k$. It can be proved that there is $J' \ge J$ such that
$$\rho_{k_{J'}} < \gamma. \quad (3.47)$$
Otherwise, for any $k \ge k_J$, if $k \notin S$, then $x_{k+1} = x_k$ and hence, by Step 5 of the algorithm, $\rho_{k+1} = \sigma_1 \rho_k$; whereas if $k \in S$, then according to (3.46), $\rho_k \ge \gamma > \|d_k\|$, so that part (IV) of Step 6 in the algorithm gives $\rho_{k+1} = \rho_k$. The two possible cases mean that $\{\rho_k\}$ would be a decreasing sequence approaching zero, violating (3.45).

Next, we use induction to prove that for all $k \ge k_{J'}$, $\rho_k < \gamma$, and hence $k \in S$. At first, as shown in (3.47), $k_{J'}$ is such an index. Now assume that a $k \ge k_{J'}$ satisfies the conditions $\rho_k < \gamma$ and $k \in S$. There are two possibilities: (a) $\rho_k < \gamma/\sigma_2$. Then $\rho_{k+1} \le \sigma_2 \rho_k < \gamma$. (b) $\rho_k \ge \gamma/\sigma_2$. Then (3.46) gives $\rho_k > \|d_k\|$, and part (IV) of Step 6 in the algorithm implies $\rho_{k+1} = \rho_k < \gamma$. Therefore $\rho_{k+1} < \gamma$ holds in both cases, which implies $k + 1 \in S$. The conclusion that $k \in S$ for all $k \ge k_{J'}$ contradicts the assumption that $\bar S$ contains an infinite number of indices. The proof is thus completed. Q.E.D.
4
Numerical Results
Numerical experiments have been performed on an IBM personal computer with the Fontecilla step and also the C-C step being used in the F-S method to solve the subproblem (1.10). The test composite minimization problems are formulated from the general smooth nonlinear constrained optimization problem
$$\min f(x) \quad \text{s.t. } c_i(x) = 0,\ i = 1, 2, \ldots, m_E; \qquad c_i(x) \le 0,\ i = m_E + 1, \ldots, m, \quad (4.1)$$
by using the $l_1$ exact penalty function [6]
$$\varphi(x) = f(x) + \sum_{i=1}^{m_E} |c_i(x)| + \sum_{i=m_E+1}^{m} \max(c_i(x), 0). \quad (4.2)$$
The objective function $f(x)$ is first scaled by a scaling parameter $\nu$ so that $\|\lambda^*\|_\infty < 1$ (see [6] for details). The linearized trust region subproblem (1.8) is converted into a standard linear programming problem with bounded variables by introducing the slack variables
$$r_i^+ = \max\{c_i(x_k) + \nabla c_i(x_k)^T \delta, 0\}, \quad r_i^- = -\min\{c_i(x_k) + \nabla c_i(x_k)^T \delta, 0\}, \quad i = 1, 2, \ldots, m,$$
and the converted linear programming problem is then solved by using a standard code of the Simplex method for bounded variables. The converted LP problem has the form
$$\min w^T y \quad \text{s.t. } [B : I_m] y = b, \quad l \le y \le u, \quad (4.3)$$
where
$$y = (\delta^T, (r^+)^T, (r^-)^T)^T \in R^{n+2m}, \quad r^+ = (r_1^+, \ldots, r_m^+)^T, \quad r^- = (r_1^-, \ldots, r_m^-)^T,$$
$$w = (0_n^T, e_{m+m_E}^T, 0_{m-m_E}^T)^T, \quad e_{m+m_E} = (1, 1, \ldots, 1)^T \in R^{m+m_E}, \quad 0_{m-m_E} = (0, 0, \ldots, 0)^T \in R^{m-m_E},$$
$$b = -c(x_k), \quad l = (-\rho_k e_n^T, 0_m^T, 0_m^T)^T, \quad u = (\rho_k e_n^T, \infty_m^T, \infty_m^T)^T,$$
$$0_m = (0, \ldots, 0)^T \in R^m, \quad \infty_m = (\infty, \ldots, \infty)^T, \quad B = [A_k^T : -I_m] \in R^{m \times (n+m)}.$$
If some variables in problem (4.1) have lower and/or upper bounds, then the bounds on the corresponding elements of $\delta$ in (4.3) should be changed accordingly.
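The slack-variable conversion above can be sketched in a few lines of numpy; the function name `build_lp` and the small test data below are illustrative, not from the paper:

```python
import numpy as np

def build_lp(A, c, rho, m_E):
    """Convert the linearized l1 trust-region subproblem
        min  sum_{i<=m_E} |c_i + a_i^T d| + sum_{i>m_E} max(c_i + a_i^T d, 0)
        s.t. ||d||_inf <= rho
    into the bounded-variable LP  min w^T y, [B : I_m] y = b, l <= y <= u,
    with y = (d, r+, r-).  A is m x n (rows a_i^T), c = c(x_k)."""
    m, n = A.shape
    # objective: every r_i^+ is penalized; r_i^- only for the m_E equality rows
    w = np.concatenate([np.zeros(n), np.ones(m),
                        np.ones(m_E), np.zeros(m - m_E)])
    # row i:  a_i^T d - r_i^+ + r_i^- = -c_i
    B = np.hstack([A, -np.eye(m)])
    Aeq = np.hstack([B, np.eye(m)])
    b = -c
    l = np.concatenate([-rho * np.ones(n), np.zeros(2 * m)])
    u = np.concatenate([rho * np.ones(n), np.full(2 * m, np.inf)])
    return w, Aeq, b, l, u

# tiny illustrative instance: two constraints, the first an equality
w, Aeq, b, l, u = build_lp(np.eye(2), np.array([-0.5, 0.3]), 1.0, 1)
```

Any bounded-variable simplex code can then be applied to $(w, A_{eq}, b, l, u)$; the conversion itself carries all the structure of (4.3).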
For the $l_1$ exact penalty function (4.2), we know that the columns of $D_k$ are simply the coordinate vectors $e_i$ for the indices $i$ that make $c_i(x_k) = 0$, i.e., the matrix $D_k$ provides the current estimate of the set of active constraints. The BFGS updating formula
$$B_{k+1} = B_k - \frac{B_k h_k h_k^T B_k}{h_k^T B_k h_k} + \frac{y_k y_k^T}{h_k^T y_k} \quad (4.4)$$
is employed to update the whole Hessian approximation $B_k$ (with $B_0 = I$) used in the Fontecilla step. The $LDL^T$ form of $B_k$ is stored and updated by calling the routine MC11A [7] so that the positive definiteness of $B_k$ can be controlled. In the implementation, the horizontal step $h_k$ is restricted to lie within the trust region box. Although $h_k^T y_k \le 0$ rarely occurs in practical use, $h_k^T y_k > 0$ is not guaranteed by any line search. Powell's ad hoc strategy [6] is used to determine a convex combination of the vectors $y_k$ and $B_k h_k$ to replace $y_k$ in the updating formula (4.4), so that the positive definiteness of $B_k$ is maintained.
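Powell's damping strategy mentioned above can be sketched as follows; this is a minimal version assuming the commonly used threshold $0.2$ (the exact constant used in the paper's implementation is not stated here):

```python
import numpy as np

def damped_bfgs(B, s, y, tau=0.2):
    """Powell's damping: replace y by a convex combination of y and B s
    whenever the curvature s^T y is too small, so the BFGS update keeps
    B positive definite.  tau = 0.2 is the customary threshold."""
    Bs = B @ s
    sBs = s @ Bs
    if s @ y < tau * sBs:                    # curvature too weak: damp
        theta = (1.0 - tau) * sBs / (sBs - s @ y)
        y = theta * y + (1.0 - theta) * Bs   # convex combination of y and Bs
    # standard BFGS update with the (possibly damped) y
    return B - np.outer(Bs, Bs) / sBs + np.outer(y, y) / (s @ y)
```

With the damped $y$, the inner product $s^T y$ equals $\tau\, s^T B s > 0$, which is exactly what keeps the updated matrix positive definite.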
The implementation of the C-C step in the original F-S method is more complicated, as the C-C step uses approximations of the reduced Hessian. If at $x_k$ there are $r$ active constraints, then in the QR decomposition of $A_k \in R^{n \times r}$, the orthogonal basis $Z_k$ of $N(A_k^T)$ has $n - r$ columns, and hence the reduced Hessian $Z_k^T W_k Z_k$ has order $n - r$. In this case we basically follow the strategy used in the F-S method, that is, we adopt the revised BFGS formula suggested by Nocedal and Overton in [13] for updating the approximate reduced Hessian (see Section 4 of [8], but with $M^{(k)}$ therein being expressed as $B_k$). The N-O criterion (see (4.1) of [8]) is also used to skip updates that are not well defined, so that the BFGS update is always well defined. We denote by $S_k$ the set of active constraints at $x_k$. When the sets $S_k$ and $S_{k+1}$ are different, before the updating is carried out we reset the reduced Hessian approximation $B_k = L_k D_k L_k^T$ to $\tilde L_k \tilde D_k \tilde L_k^T$, where the factors are taken as
$$\tilde D_k = \begin{cases} D_k, & \text{if } |S_k| = |S_{k+1}|, \\ \text{the first } n - |S_{k+1}| \text{ rows and columns of } D_k, & \text{if } |S_k| < |S_{k+1}|, \\ \operatorname{diag}(D_k, I), & \text{if } |S_k| > |S_{k+1}| \end{cases}$$
($|S_k|$ stands for the cardinality of the set $S_k$ and $I$ is the identity matrix of dimension $|S_k| - |S_{k+1}|$). Givens' square-root-free transformation [11] is then used to factorize the matrix $\tilde L_k \tilde D_k \tilde L_k^T$ into $LDL^T$ form, and at last the routine MC11A is called to obtain the $LDL^T$ form of $B_{k+1}$. To realize Step 2, after obtaining the matrix $L^{-1} A_k$ by backward substitutions, we use Givens' square-root-free transformation to factorize the matrix $A_k^T B_k^{-1} A_k = A_k^T (LDL^T)^{-1} A_k$ into $\bar L \bar D \bar L^T$ form. Then all the vectors $\mu_{k+1}$, $h_k$ and $v_k$ in Step 2 of the algorithm can be obtained by backward and forward substitutions.
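The resizing of the reduced-Hessian approximation when the active set changes can be illustrated on the assembled matrix; the paper works directly on the $LDL^T$ factors, so this simplified sketch (which truncates or identity-pads the full matrix) only mirrors the idea:

```python
import numpy as np

def resize_reduced_hessian(B, n_new):
    """Resize a reduced-Hessian approximation B when the reduced space
    changes dimension: truncate to the leading block when the space
    shrinks, pad with an identity block when it grows."""
    n_old = B.shape[0]
    if n_new <= n_old:
        return B[:n_new, :n_new].copy()   # keep the leading block
    out = np.eye(n_new)                   # identity padding for new rows
    out[:n_old, :n_old] = B
    return out
```

The padded block being the identity keeps the resized matrix positive definite whenever the old one was, which is what the subsequent BFGS updates require.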
Some modifications are made in this experiment. One is that once $x_{k+1}$ is reset to $x_k$, i.e., Step 5 is carried out, at most 3 linearized trust region subproblems with reduced $\rho_k$ are successively solved, without calculating the Fontecilla or C-C step (i.e., skipping Steps 2-3), to find a point that reduces the value of $\varphi(x)$. If a better point is still not found after three tries, we resume complete iterations. Another modification is that $\nabla_x l_k(x_{k+1})$ is used to replace $\nabla_x l_k(x_k + h_k)$ in the definition of $y_k$, so that an extra gradient evaluation can be avoided at each iteration. Our numerical experiments show that this modification does enhance the efficiency of the algorithm.

The experiments are carried out on 10 small standard test problems. Brief information about these test problems is given in the first columns of Table 4.1. A detailed reference for these problems can be found in [16]. The parameters in the algorithm are set to the values $\delta = 0.1$, $\sigma_1 = 0.2$, ..., and the stopping criterion
$$\ldots < 10^{-6} \quad (4.5)$$
is used to terminate the iteration. The numerical results are given in Table 4.1. In the table, $n$, $m$ and $nb$ are the numbers of variables, constraints and bounds on variables, respectively; SLQQR and SLQF stand for the F-S method with C-C steps and with Fontecilla steps, whereas NI, NSQ, NSL, NF and NG represent the numbers of iterations, successful SQP steps, successful LP steps, function evaluations and gradient evaluations, respectively. As shown in [8], the performance of the method SLQQR is comparable with, or even better than, the best results obtained by other first derivative methods [12, 13]. However, Table 4.1 shows that the performance of the method SLQF is comparable with that of the method SLQQR, and improvements are obtained on the numbers of iterations and function evaluations when the Fontecilla step is used to replace the step obtained by QR decomposition. These results show that the SLQF method is promising.
Table 4.1 Numerical results

                                          SLQQR                     SLQF
Problems                n  m  nb    NI NSQ NSL  NF  NG    NI NSQ NSL  NF  NG
Wright 1                2  1   0     4   3   0   5   3     4   4   0   4   4
Chamberlain             2  1   0     6   5   0   7   5     6   6   0   6   6
Fletcher 1              2  1   0     6   5   0   7   5     6   6   0   6   6
Fletcher 2              2  1   0     4   4   0   5   4     4   4   0   4   5
Rosenbrock              2  0   4    51  28   2  70  30    46  31   5  51  43
Colville 1              4  0   8    18  15   3  25  18    15  19   3  18  25
Wright 2                5  3   0     7   5   0   8   5     7   6   0   7   6
Powell                  5  3   0     4   4   0   5   4     4   4   0   4   5
Mukai and Polak         6  2   2    15  11   1  21  12    13  13   2  15  15
Hock and Schittkowski   7  4   0    12   9   1  19  10    10  11   2  12  14
Some further improvements on the method may be made to increase its efficiency for practical use. For example, using a special purpose $l_1$ LP solver and some sparse matrix techniques, and making a line search after obtaining $d_k$, are potential ways to improve the method.
5
Conclusion
In this paper we revised the Fletcher-Sainz de la Maza method for optimizing composite nonsmooth optimization problems by incorporating their linearized approximation model of the problem with Fontecilla's method, which uses an orthogonal projection operator as the main tool to solve equality constrained smooth nonlinear optimization problems. In general, in each iteration our algorithm solves one LP problem and calculates one Fontecilla step. The algorithm is globally convergent with a local superlinear rate. The convergence analysis of the revised method does not depend on the continuity of the null space basis that the original method requires but that usually fails to hold. Also, Powell's sufficient condition for achieving a two-step q-superlinear rate is more likely to be satisfied by introducing the Fontecilla step. A numerical experiment has been conducted, and a comparison of the computational results shows that the new algorithm is comparable to the original one, with some perceivable and quite stable improvement. Furthermore, the implementation of the revised algorithm is easier, as the working matrix $B_k$ does not need to expand or contract when the set of active constraints changes in the progress of the computation. Although any general LP solver can be employed to solve the linearized model, a tailor-made method for this special purpose has been proposed; see [18].
Acknowledgement. The authors gratefully acknowledge the partial support of the Croucher Foundation of Hong Kong and the City Polytechnic of Hong Kong (grant 700308).
References

[1] R. H. Byrd and R. B. Schnabel, Continuity of the null space basis and constrained optimization, Mathematical Programming 35 (1986) 32-41.

[2] T. F. Coleman and A. R. Conn, Nonlinear programming via an exact penalty function: Asymptotic analysis, Mathematical Programming 24 (1982) 123-136.

[3] T. F. Coleman and A. R. Conn, Nonlinear programming via an exact penalty function: Global analysis, Mathematical Programming 24 (1982) 137-161.

[4] T. F. Coleman and D. C. Sorensen, A note on the computation of an orthogonal basis for the null space of a matrix, Mathematical Programming 29 (1984) 234-242.

[5] J. E. Dennis and J. J. Moré, A characterization of superlinear convergence and its application to quasi-Newton methods, Mathematics of Computation 28 (1974) 549-560.

[6] R. Fletcher, Practical Methods of Optimization, Wiley, Chichester, (1987).

[7] R. Fletcher and M. J. D. Powell, On the modification of $LDL^T$ factorizations, Mathematics of Computation 28 (1974) 1067-1087.
[8] R. Fletcher and E. Sainz de la Maza, Nonlinear programming and nonsmooth optimization by successive linear programming, Mathematical Programming 43 (1989) 235-256.

[9] R. Fontecilla, Local convergence of secant methods for nonlinear constrained optimization, SIAM Journal on Numerical Analysis 25 (1988) 697-712.

[10] R. Fontecilla, T. Steihaug and R. A. Tapia, A convergence theory for a class of quasi-Newton methods for constrained optimization, SIAM Journal on Numerical Analysis 24 (1987) 1133-1151.

[11] W. M. Gentleman, Least squares computations by Givens transformations without square roots, Journal of the Institute of Mathematics and its Applications 12 (1973) 329-336.

[12] C. B. Gurwitz and M. L. Overton, Sequential quadratic programming methods based on approximating a projected Hessian matrix, SIAM Journal on Scientific and Statistical Computing 10 (1989) 631-653.
[13] J. Nocedal and M. L. Overton, Projected Hessian updating algorithms for nonlinearly constrained optimization, SIAM Journal on Numerical Analysis 22 (1985) 821-850.

[14] M. R. Osborne, Finite Algorithms in Optimization and Data Analysis, Wiley, Chichester, (1985).

[15] M. J. D. Powell, The convergence of variable metric methods for nonlinearly constrained optimization calculations, in Nonlinear Programming 3, O. L. Mangasarian, R. Meyer and S. Robinson eds., Academic Press, New York, (1978) 27-63.

[16] E. Sainz de la Maza, Nonlinear programming algorithms based on $l_1$ linear programming and reduced Hessian approximations, Ph.D. Thesis, Department of Mathematical Sciences, University of Dundee, (1987).

[17] R. S. Womersley, Local properties of algorithms for minimizing nonsmooth composite functions, Mathematical Programming 32 (1985) 69-89.

[18] C. Xu and J. Zhang, An active set method for general $l_1$ linear problem subject to box constraints, to appear in Optimization.
A Successive Approximation for NCP   459
Recent Advances in Nonsmooth Optimization, pp. 459-472 Eds. D.-Z. Du, L. Qi and R.S. Womersley ©1995 World Scientific Publishing Co Pte Ltd
A Successive Approximation Quasi-Newton Process for Nonlinear Complementarity Problem
Shu-zi Zhou, Dong-hui Li and Jin-ping Zeng
Department of Applied Mathematics, University of Hunan, Changsha, Hunan 410082, PRC
Abstract
In recent years, several versions of the damped Newton method for solving the nonlinear complementarity problem have been proposed based on its equivalent nonsmooth equations. Global convergence is well established. As for quasi-Newton methods, local and semi-local convergence has also been proved. In this paper, we study the global convergence of Broyden-like methods on the basis of the successive approximation Newton process given in [1]. A new line search technique is introduced here. Under suitable conditions, we get the global convergence of the method. Numerical results are also given in the paper.
1
Introduction
We consider the following nonlinear complementarity problem of finding an $x \in R^n$ such that
$$x \ge 0, \quad F(x) \ge 0, \quad x^T F(x) = 0, \quad (1.1)$$

*The research is supported by the NNSF of P. R. China.
S.Z. Zhou, D. H. Li and J. P. Zeng
460
where $F$ is a mapping from $R^n$ into itself. Problem (1.1) is abbreviated as NCP(F). Numerical methods for solving NCP(F) have developed rapidly, and many algorithms for solving nonlinear equations have been applied to solve NCP(F) (see [14] and [16] for details). Among these methods, great attention is paid to the linearized Newton and quasi-Newton methods because of their local quadratic and superlinear convergence, respectively (see [10], [16] and [22]).

Since the 1990's, a new path for solving NCP(F) has appeared: the NCP(F) is transformed into equivalent nonsmooth equations to be solved. Various kinds of damped Newton methods with global convergence have been proposed (see [6], [8], [14], [15], [19] and [20]). The first one may be due to [14]. It reduces (1.1) to the following equivalent so-called B-differentiable equations:
$$H(x) = \min\{x, F(x)\} = 0. \quad (1.2)$$
Replacing the F-derivative by the B-derivative in the traditional damped Newton method for solving nonlinear smooth equations, [14] establishes a damped Newton method for (1.2). Another important type of damped Newton method for solving (1.1) belongs to [19]. It changes (1.1) into another equivalent nonsmooth equation of the following form:
$$H(x) = f_k(x) + g_k(x) = 0, \quad (1.3)$$
where $f_k$ and $g_k$ are mappings from $R^n$ into itself, with $f_k$ being F-differentiable and $g_k$ not, but relatively small. A so-called successive approximation damped Newton method is proposed there, and the numerical results show that it is a useful method. As for the corresponding quasi-Newton methods, only [9] gives a local convergence theorem, based on [14]. It is not clear whether global convergence is true, though the numerical results in [9] show the possibility. In our paper, we study the global convergence of Broyden-like methods. A successive approximation scheme related to [19] is established, and a new line search technique is introduced. Under suitable conditions, we get the global convergence of the method.

In the following section, we give a simple review of the successive approximation Newton method described in [19]. Based on this, we propose the corresponding quasi-Newton method in Section 3; a new line search process is also stated in that section. In Section 4, particular attention is paid to the Broyden-like method, and global convergence is discussed. At last, some numerical results are given in Section 5, which show the usefulness of the algorithm.
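The reformulation (1.2) is easy to check numerically; the map $F$ below is a small illustrative example, not one of the test problems of this paper:

```python
import numpy as np

def ncp_residual(x, F):
    """H(x) = min{x, F(x)} componentwise.  H(x) = 0 exactly when
    x >= 0, F(x) >= 0 and x^T F(x) = 0, i.e. x solves the NCP."""
    return np.minimum(x, F(x))

# illustrative linear map: F(x) = x - a with a = (1, -1)
F = lambda x: x - np.array([1.0, -1.0])
# x* = (1, 0): x* >= 0, F(x*) = (0, 1) >= 0 and x*^T F(x*) = 0
x_star = np.array([1.0, 0.0])
```

A root of `ncp_residual` is a solution of the complementarity problem, which is why damped Newton methods are applied to this (nonsmooth) equation.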
2
Review of the Successive Approximation Newton Method
In this section, we state the algorithm. For this purpose, we first give the concrete decomposition form of (1.1) described in [19] as follows. Let $\epsilon_k$ be a given positive number sequence. $\bar f_k(x)$, $\bar g_k(x)$ and $f_k(x)$, $g_k(x)$ are mappings from $R^n$ into itself with components $[\bar f_k(x)]_i$, $[\bar g_k(x)]_i$ and $[f_k(x)]_i$, $[g_k(x)]_i$, defined by the following:
$$[\bar f_k(x)]_i = \frac{x_i + F_i(x)}{2} - \frac{(x_i - F_i(x))^2}{4\epsilon_k} - \frac{\epsilon_k}{4}, \qquad [\bar g_k(x)]_i = \min\{x_i, F_i(x)\} - [\bar f_k(x)]_i = \frac{(|x_i - F_i(x)| - \epsilon_k)^2}{4\epsilon_k}, \quad (2.2)$$
and
$$[f_k(x)]_i = \begin{cases} \min\{x_i, F_i(x)\}, & \text{if } |x_i - F_i(x)| > \epsilon_k, \\ [\bar f_k(x)]_i, & \text{if } |x_i - F_i(x)| \le \epsilon_k, \end{cases} \quad (2.3)$$
$$[g_k(x)]_i = \begin{cases} 0, & \text{if } |x_i - F_i(x)| > \epsilon_k, \\ [\bar g_k(x)]_i, & \text{if } |x_i - F_i(x)| \le \epsilon_k. \end{cases} \quad (2.4)$$
Then the solutions of (1.1) coincide with those of the following nonlinear equations:
$$H(x) = \min\{x, F(x)\} = f_k(x) + g_k(x) = 0, \quad (2.5)$$
where $f_k$ and $g_k$ are defined by (2.3) and (2.4) respectively. It is not difficult to see that
$$\|g_k\| = \sup_x \|g_k(x)\| \le \frac{\sqrt{n}}{4}\epsilon_k, \quad (2.6)$$
and $f_k$ is continuously differentiable with Jacobian determined by $f_k'(x) = ([\nabla f_k(x)]_1, \ldots, [\nabla f_k(x)]_n)^T$, where
$$[\nabla f_k(x)]_i = \begin{cases} e_i, & \text{if } x_i < F_i(x) - \epsilon_k, \\ \nabla F_i(x), & \text{if } F_i(x) < x_i - \epsilon_k, \\ \dfrac{x_i - F_i(x)}{2\epsilon_k}\big(\nabla F_i(x) - e_i\big) + \dfrac{1}{2}\big(\nabla F_i(x) + e_i\big), & \text{otherwise,} \end{cases} \quad (2.7)$$
and $e_i$ denotes the $i$-th coordinate vector. Let
$$\theta(x) = \frac{1}{2}\|H(x)\|^2, \quad (2.8)$$
$$\theta_k(x) = \frac{1}{2}\|f_k(x)\|^2. \quad (2.9)$$
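The decomposition (2.2)-(2.4) can be verified numerically; the sketch below checks that $f_k + g_k$ reproduces $\min\{x, F(x)\}$ and that $g_k$ is uniformly small (of order $\epsilon_k/4$ componentwise):

```python
import numpy as np

def decompose(x, Fx, eps):
    """Split H(x) = min{x, F(x)} into f_k + g_k as in (2.2)-(2.4):
    outside the band |x_i - F_i| > eps, f_k is the min itself and
    g_k = 0; inside the band, f_k is the smooth quadratic interpolant
    and g_k = (|x_i - F_i| - eps)^2 / (4 eps)."""
    t = x - Fx
    f_bar = 0.5 * (x + Fx) - t**2 / (4 * eps) - eps / 4
    g_bar = (np.abs(t) - eps)**2 / (4 * eps)
    inside = np.abs(t) <= eps
    fk = np.where(inside, f_bar, np.minimum(x, Fx))
    gk = np.where(inside, g_bar, 0.0)
    return fk, gk
```

The componentwise bound $0 \le [g_k]_i \le \epsilon_k/4$ (attained at $x_i = F_i(x)$) is what makes the perturbation $g_k$ "relatively small" in the sense used throughout the paper.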
Then the successive approximation Newton method (SANM) in [19] can be stated as follows.

Algorithm 1 (SANM). Given $\rho, \sigma \in (0,1)$, an initial vector $x_0 \in R^n$ and a positive number $\epsilon_0 \le \alpha\|H(x_0)\|$, where $0 < \alpha < 1 - \sigma$. Let $k = 0$.

1°. Solve the following equations to get $p_k$:
$$H(x_k) + f_k'(x_k)p_k = 0. \quad (2.10)$$

2°. Set $x_{k+1} = x_k + \rho^{m_k} p_k$, where $m_k$ is the smallest nonnegative integer $m$ such that
$$\theta_k(x_k + \rho^m p_k) - \theta_k(x_k) \le -2\sigma\rho^m \theta(x_k). \quad (2.11)$$

3°. If $\epsilon_k \le \alpha\|H(x_{k+1})\|$, let $\epsilon_{k+1} = \epsilon_k$; otherwise, define $\epsilon_{k+1}$ so that
$$\epsilon_{k+1} \le \min\left\{\alpha\|H(x_{k+1})\|,\ \tfrac{1}{2}\|g_k\|\right\}; \quad (2.12)$$
$k := k + 1$, go to Step 1°.

Under suitable conditions, [19] obtained the global convergence of the above algorithm.
Successive Approximation Quasi-Newton Method (SAQNM)
For simplicity, in the rest of the paper we abbreviate $(x_k)_i$, $(F(x_k))_i$, etc. as $x_i^k$, $F_i^k$ respectively. Noticing the component expression of $f_k'(x)$, we can rewrite (2.10) as
$$\begin{cases} x_i^k + p_i^k = 0, & \text{for } i \in A_k(x_k); \\ F_i^k + (\nabla F_i^k)^T p_k = 0, & \text{for } i \in E_k(x_k); \\ H_i^k + \dfrac{x_i^k - F_i^k}{2\epsilon_k}\,(\nabla F_i^k - e_i)^T p_k + \dfrac{1}{2}(\nabla F_i^k + e_i)^T p_k = 0, & \text{for } i \in C_k(x_k), \end{cases} \quad (3.1)$$
where
$$A_k(x) = \{i \mid x_i < F_i(x) - \epsilon_k\}, \quad E_k(x) = \{i \mid F_i(x) < x_i - \epsilon_k\}, \quad C_k(x) = \{i \mid |x_i - F_i(x)| \le \epsilon_k\}.$$
To establish the corresponding quasi-Newton methods, we substitute $F'(x_k)$ in (3.1) by a matrix $B_k$. That is to say, $p_k$ satisfies the following linear equations:
$$\begin{cases} x_i^k + p_i^k = 0, & \text{for } i \in A_k(x_k); \\ F_i^k + (B_i^k)^T p_k = 0, & \text{for } i \in E_k(x_k); \\ H_i^k + \dfrac{x_i^k - F_i^k}{2\epsilon_k}\,(B_i^k - e_i)^T p_k + \dfrac{1}{2}(B_i^k + e_i)^T p_k = 0, & \text{for } i \in C_k(x_k), \end{cases} \quad (3.2)$$
where $(B_i^k)^T$ is the $i$-th row of $B_k$. However, for such a $p_k$, the line search process in Algorithm 1 may fail to be fulfilled: $p_k$ generated by (3.2) may be an ascent direction of $\theta_k$ at $x_k$. For this reason, a new line search technique should be introduced. To do so, we define
$$q_k(\lambda) = \frac{f_k(x_k)^T \big[f_k(x_k) - f_k(x_k + \lambda p_k)\big]}{\max\big\{\|f_k(x_k) - f_k(x_k + \lambda p_k)\|^2,\ \|\lambda p_k\|^2\big\}}. \quad (3.3)$$
= (H(xk) cc)\\H{xk)f
gk{xk))TH(xk)
Since a < 1, fk(xk)Tf'k(xk)pk = 0 implies that xk is a solution of H(x) = 0. Hence in this case the algorithm terminates.
464
S.Z. Zhou, D. H. Li and J. P. Zeng
Now, we state SAQNM as follows: Algorithm 3. (SAQNM) Given p,ot G (0,1), i 0 6 i f , initial nonsingular matrix Bo- Let e0 < 3rll#( x o)||, k := 0. 1°. Solve (3.2) to get pk. 2°. Determine A* by algorithm 2. 3°. Set xk+1 = xk + \kPk4°. Update Bk to get
Bk+l.
5°. If tk < a||//(ifc + i)||, we let ek+\ = ek; otherwise, define e^+i such that (2.12) holds. Remark. In step 2°, we take / _ i ( i ) = fo(x)-
4
Global Convergence of Broyden-like M e t h o d s
In this section, we restrict our attention to Broyden-like methods, i.e. in the step 4° of Algorithm 3, Bk+i takes the form of: B R - D i A (g* ~ ^k)sj Bk+1-£)k +
(4.1)
where sk = xk+i —xk = Xkpk, yk = Fk+i — Fk, 4>k is chosen such that for some constant <j> G (0,1), when Bk is nonsingular, Bk+i is also nonsingular and I0t-l|
(4.2)
A concrete choice of such (f>k and 4> can be seen in [23]. Let A be a n x n matrix, ||A||f denotes its Frobenius norm denned by \\A\\l = Tr(ATA),
(4.3)
The following lemma is useful for the proof of the global convergence of Broyden-like methods.
A Successive Approximation L e m m a 1. rank L, i.e.
for NCP
465
Let F : R" -> RJ1 be continuously differentiable and F' be Lipschitz of \\F'(x)-F'(y)\\F
Vx.y.
(4.4)
{St} is updated by (4.1) and (4.2) with B0 nonsingular. If OO
£ ii*fc+i - **na < °o,
(4-5)
fc=0
then
HmiEtlw-^.IlVlKII^O. °°
(4.6)
fc=0
Proof. Denote
$$G_{k+1} = \int_0^1 F'(x_k + \tau(x_{k+1} - x_k))\, d\tau. \quad (4.7)$$
Then $y_k = G_{k+1}s_k$ by the mean-value theorem (see [13]). Therefore, it follows from (4.4) that
$$\|G_{k+1} - G_k\|_F \le L \int_0^1 \|x_k + \tau s_k - x_{k-1} - \tau s_{k-1}\|\, d\tau,$$
and then
$$\|G_{k+1} - G_k\|_F \le L(\|s_{k-1}\| + \|s_k\|). \quad (4.8)$$
Let
$$a_k = \|B_k - G_k\|_F, \qquad b_k = \|G_{k+1} - G_k\|_F, \qquad \sigma_k = \|y_k - B_k s_k\| / \|s_k\|. \quad (4.9)$$
We claim that
$$\sum_{k=0}^{\infty} b_k^2 < \infty, \quad (4.10)$$
which follows from (4.8) and (4.5). By the update formulae (4.1) and (4.2), we deduce
$$B_{k+1} - G_{k+1} = B_k - G_{k+1} - \phi_k (B_k - G_{k+1}) \frac{s_k s_k^T}{s_k^T s_k} = (B_k - G_{k+1})\Big(I - \phi_k \frac{s_k s_k^T}{s_k^T s_k}\Big),$$
where $I$ is the identity matrix. Taking Frobenius norms on both sides of the above equation, we get
$$a_{k+1}^2 \le \|B_k - G_{k+1}\|_F^2 - (1 - \bar\phi^2)\sigma_k^2. \quad (4.11)$$
It implies
$$a_{k+1} \le a_k + b_k, \qquad \text{and hence} \qquad a_k \le a_0 + \sum_{j=0}^{k-1} b_j. \quad (4.12)$$
By (4.11), we deduce that
$$(1 - \bar\phi^2)\sigma_k^2 \le (a_k + b_k)^2 - a_{k+1}^2 = a_k^2 - a_{k+1}^2 + 2a_k b_k + b_k^2.$$
It follows from (4.12) that
$$(1 - \bar\phi^2)\sigma_k^2 \le a_k^2 - a_{k+1}^2 + 2\Big(a_0 + \sum_{j=0}^{k-1} b_j\Big) b_k + b_k^2.$$
Summing the above inequalities from $k = 0$ to $k = l - 1$, we get
$$(1 - \bar\phi^2) \sum_{k=0}^{l-1} \sigma_k^2 \le a_0^2 + 2\sum_{k=0}^{l-1} b_k \Big(a_0 + \sum_{j=0}^{k-1} b_j\Big) + \sum_{k=0}^{l-1} b_k^2. \quad (4.13)$$
By elementary calculation, (4.13) implies that for all $m \le l - 1$,
$$(1 - \bar\phi^2) \sum_{k=0}^{l-1} \sigma_k^2 \le 2\Big(a_0 + \sum_{k=0}^{m-1} b_k\Big)^2 + 2\Big(\sum_{k=m}^{l-1} b_k\Big)^2 + \sum_{k=0}^{l-1} b_k^2$$
and
$$(1 - \bar\phi^2) \sum_{k=0}^{l-1} \sigma_k^2 \le 2\Big(a_0 + \sum_{k=0}^{m-1} b_k\Big)^2 + 2(l - m) \sum_{k=m}^{l-1} b_k^2 + \sum_{k=0}^{l-1} b_k^2.$$
Dividing by $l$ and taking the limit in the last inequality, it is not difficult to see that
$$\lim_{l \to \infty} \frac{1 - \bar\phi^2}{l} \sum_{k=0}^{l-1} \sigma_k^2 \le 2 \sum_{k=m}^{\infty} b_k^2.$$
By the arbitrariness of $m$, together with (4.10), we complete the proof. Q.E.D.

Now we turn to prove the global convergence of Algorithm 3.
Theorem 2. Let the level set
$$\Omega = \{x \mid \theta(x) \le \theta(x_0)\}$$
be bounded, $F \in C^1(\Omega)$ and (4.4) hold. If $\{x_k\}$ is generated by Algorithm 3 with $B_k$ updated by (4.1) and (4.2), then $\{x_k\} \subset \Omega$. Furthermore, if $f_k'(x)$ is nonsingular at every accumulation point of $\{x_k\}$, then any accumulation point of $\{x_k\}$ solves NCP(F).

Proof. We may prove that $\{x_k\} \subset \Omega$ by induction. Denote $K = \{0\} \cup \{k \mid \epsilon_k \le \alpha\|H(x_{k+1})\|\} = \{k_0, k_1, k_2, \cdots\}$, where $k_0 = 0 < k_1 < k_2 < \cdots$. If $K$ is infinite, then by the same discussion as in the first part of the proof of Theorem 1 in [19], the conclusion is true. Now, we assume that $K$ is finite and wish to deduce a contradiction. By Step 5° of Algorithm 3, there is an index $k^*$ such that when $k \ge k^*$, $f_k$ and $g_k$ are independent of $k$. Without loss of generality, we let $f_k(x) = f(x)$, $g_k(x) = g(x)$, $\forall k \ge 0$, and assume $g(x) \not\equiv 0$. So
$$\|H(x_k)\| \ge \frac{1}{\alpha}\|g\| > 0, \quad \forall k \ge 0. \quad (4.14)$$
From the line search steps, we can easily get $2q_k(\lambda_k) - 1 \ge 2\epsilon$. By the definition of $q_k(\lambda_k)$, this means that
$$2N_k(\lambda_k) - D_k(\lambda_k) \ge 2\epsilon D_k(\lambda_k). \quad (4.15)$$
But
$$2N_k(\lambda_k) = 2f(x_k)^T[f(x_k) - f(x_{k+1})] = \|f(x_k)\|^2 - \|f(x_{k+1})\|^2 + \|f(x_k) - f(x_{k+1})\|^2,$$
so that
$$2N_k(\lambda_k) - D_k(\lambda_k) \le \|f(x_k)\|^2 - \|f(x_{k+1})\|^2,$$
while
$$D_k(\lambda_k) = \max\big\{\|f(x_k) - f(x_{k+1})\|^2,\ \|\lambda_k p_k\|^2\big\} \ge \|x_{k+1} - x_k\|^2.$$
Therefore, we have
$$2\epsilon\|x_{k+1} - x_k\|^2 \le \|f(x_k)\|^2 - \|f(x_{k+1})\|^2,$$
which implies (4.5). Thus, by (4.6), there is a subsequence {<Jk}k£K' having the limit zero. We consider the corresponding subsequence {itJjteA"'Let Xk -> x,(k e K', k -► oo), where K' C K. From (2.7) and (3.2), we have
[H{xk) + f'{xk)pk]i= and for i 6
0, ,; (VFk-Bk)Tpk,
iii£Ak(zk); i(i€Ek(xk)
I ,
Ck(xk),
T P*VY7 Ck - B*) Dk\T„. j_}ifX71?k_ \H{xk) + /'(x,)p,], = —'' 'tf.* _- F*)CS7Ff + -(VF* Pk
DksT„ BkyPk.
Noticing that Vi € Ck(xk), \xk — Ftk\ < tk, we deduce from the above expressions that \\H(xk) + f'(xk)Pk\\
< \\(F'(xk) - Bk)pk\\.
(4.16)
On the other hand, let Gk+\ be defined by (4.7), then ^
_ \\(Bk
-F'(xk))Pk\ \\Pk\\
\\Gk+1-F'(xk)\\.
So, we claim that rjk —> 0, (k 6 A'', k —> oo). From this and (4.16), we have \\H(xk) + f'(xk)Pk\\/\\Pk\\^Q,
(k€K',k^oo).
(4.17)
Since f'{x) is nonsingular, f'(xk) is uniformly nonsingular for k S K' sufficiently large. From (4.17), there exist positive constants i/2 > i/\, such that for k € K' sufficiently large »A\H{xk)\\<\\pk\\
= 0.
(4.19)
In the following discussion, we wish to prove that for k 6 K' sufficiently large, \\k\ is bounded away from zero. So, by the fact that xk+i — xk tends to zero, we claim that p = 0, and thus (4.19) implies H(x) = 0, which contradicts with (4.14). Therefore K is infinite, and the conclusion is true.
To prove that $|\lambda_k|$ has a positive lower bound, it suffices (by the line search process) to show that there is a constant $t' \in (0, 1]$ such that $q_k(t) \ge 1/2 + \epsilon$ for every $t \in (0, t')$, and $q_k(t) \le 0$ for every $t \in (-t', 0)$. Indeed, when $k \in K'$ is sufficiently large, for every $t > 0$ we have
$$\begin{aligned} N_k(t) &= f(x_k)^T[f(x_k) - f(x_k + t p_k)] = -t f(x_k)^T f'(x_k) p_k + o(t) \\ &= t f(x_k)^T H(x_k) + o(t) = t\|H(x_k)\|^2 - t\, g(x_k)^T H(x_k) + o(t) \\ &\ge t\|H(x_k)\|^2 - t\|g\| \cdot \|H(x_k)\| + o(t) \ge t(1-\alpha)\|H(x_k)\|^2 + o(t). \end{aligned}$$
By the continuous differentiability of $f$, we conclude that there is a positive constant $\nu_0$ such that
$$\|f(x_k) - f(x_k + t p_k)\| \le \nu_0 t \|p_k\| \le \nu_0 \nu_2 t \|H(x_k)\|.$$
Again by the definition of $q_k(\lambda)$, for $k \in K'$ sufficiently large, we get
$$q_k(t) \ge \frac{t(1-\alpha)\|H(x_k)\|^2 + o(t)}{\max\{\nu_0^2, 1\}\, \nu_2^2\, t^2 \|H(x_k)\|^2}. \quad (4.20)$$
However, as (4.14) points out, $\|H(x_k)\| \ge \|g\|/\alpha > 0$, so (4.20) shows that there is $t' \in (0,1)$ such that when $t \in (0, t')$, $q_k(t) \ge 1/2 + \epsilon$. In a similar way, we can get that when $t \in (-t', 0)$, $q_k(t) \le 0$. Q.E.D.
5
Numerical Results
In this section, we give some numerical results for Algorithm 3. Two test functions for NCP(F) are as follows.

Problem 1 (see [10]).
$$F_1(x) = 3x_1^2 + 2x_1x_2 + 2x_2^2 + x_3 + 3x_4 - 6$$
$$F_2(x) = 2x_1^2 + x_1 + x_2^2 + 3x_3 + 2x_4 - 2$$
$$F_3(x) = 3x_1^2 + x_1x_2 + 2x_2^2 + 2x_3 + 3x_4 - 1$$
$$F_4(x) = x_1^2 + 3x_2^2 + 2x_3 + 3x_4 - 3.$$
The unique solution of the corresponding NCP(F) is $x^* = (\tfrac{1}{2}\sqrt{6}, 0, 0, \tfrac{1}{2})^T$, which is a nondegenerate solution.

Problem 2 (see [9]).
$$F_1(x) = 3x_1^2 + 2x_1x_2 + 2x_2^2 + x_3 + 3x_4 - 6$$
$$F_2(x) = 2x_1^2 + x_1 + x_2^2 + 10x_3 + 2x_4 - 2$$
$$F_3(x) = 3x_1^2 + x_1x_2 + 2x_2^2 + 2x_3 + 9x_4 - 9$$
$$F_4(x) = x_1^2 + 3x_2^2 + 2x_3 + 3x_4 - 3.$$
The corresponding NCP(F) has a degenerate solution $x_D^* = (\tfrac{1}{2}\sqrt{6}, 0, 0, \tfrac{1}{2})^T$ and a nondegenerate solution $x^* = (1, 0, 3, 0)^T$. Using the stopping criterion $\|H(x_k)\| \le 10^{-5}$, we give the iteration numbers in the following table.
Initial point   (1,1,1,1)   (1,0,1,0)   (1,-1,1,-1)   (0,1,1,0)
Problem 1           12           6           11            15
Problem 2           51          98           80            12
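The quoted solutions of Problem 2 can be checked directly against the definition (1.1); the helper below verifies both and distinguishes the degenerate one (an index $i$ with $x_i = F_i(x) = 0$):

```python
import numpy as np

def F2(x):
    """Test Problem 2 as reconstructed above."""
    x1, x2, x3, x4 = x
    return np.array([
        3*x1**2 + 2*x1*x2 + 2*x2**2 + x3 + 3*x4 - 6,
        2*x1**2 + x1 + x2**2 + 10*x3 + 2*x4 - 2,
        3*x1**2 + x1*x2 + 2*x2**2 + 2*x3 + 9*x4 - 9,
        x1**2 + 3*x2**2 + 2*x3 + 3*x4 - 3,
    ])

def is_ncp_solution(x, F, tol=1e-10):
    """x >= 0, F(x) >= 0 and x^T F(x) = 0, up to tol."""
    Fx = F(x)
    return x.min() >= -tol and Fx.min() >= -tol and abs(x @ Fx) <= tol

x_nd = np.array([1.0, 0.0, 3.0, 0.0])            # nondegenerate solution
x_d = np.array([np.sqrt(6)/2, 0.0, 0.0, 0.5])    # degenerate solution
```

Degeneracy is what makes $x_D^*$ the harder target for Newton-type methods, since the active set is not uniquely determined there.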
References

[1] C. G. Broyden, A class of methods for solving nonlinear simultaneous equations, Mathematics of Computation 19 (1965) 577-593.

[2] X. Chen, On the convergence of Broyden-like methods for nonlinear equations with nondifferentiable terms, Annals of the Institute of Statistical Mathematics 42 (1990) 387-401.

[3] X. Chen and L. Qi, A parameterized Newton method and a Broyden-like method for nonsmooth equations, Computational Optimization and Applications 3 (1994) 157-179.
[4] X. Chen and T. Yamamoto, Convergence domains of certain iterative methods for solving nonlinear equations, Numerical Functional Analysis and Optimization 10 (1989) 37-48.

[5] A. Griewank, The 'global' convergence of Broyden-like methods with a suitable line search, Journal of the Australian Mathematical Society, Ser. B 28 (1986) 75-92.

[6] S. P. Han, J. S. Pang and N. Rangaraj, Globally convergent Newton methods for nonsmooth equations, Mathematics of Operations Research 17 (1992) 586-607.

[7] P. T. Harker and J. S. Pang, Finite-dimensional variational inequality and nonlinear complementarity problems: a survey of theory, algorithms and applications, Mathematical Programming 48 (1990) 161-220.

[8] P. T. Harker and B. Xiao, Newton's method for the nonlinear complementarity problem: a B-differentiable equation approach, Mathematical Programming 48 (1990) 339-358.

[9] C. M. Ip and J. Kyparisis, Local convergence of quasi-Newton methods for B-differentiable equations, Mathematical Programming 56 (1992) 71-89.

[10] N. H. Josephy, Quasi-Newton methods for generalized equations, Technical Summary Report No. 1977, Mathematics Research Center, Madison, WI, 1979.

[11] M. Kojima and S. Shindo, Extension of Newton and quasi-Newton methods to systems of PC$^1$ equations, Journal of the Operations Research Society of Japan 29 (1986) 352-374.

[12] P. Marcotte and J. Dussault, A note on a globally convergent Newton method for solving monotone variational inequalities, Operations Research Letters 6 (1987) 35-42.

[13] J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, (1970).

[14] J. S. Pang, Newton's method for B-differentiable equations, Mathematics of Operations Research 15 (1990) 331-341.

[15] J. S. Pang, A B-differentiable equation-based, globally and locally quadratically convergent algorithm for nonlinear programs, complementarity and variational inequality problems, Mathematical Programming 51 (1991) 101-131.

[16] J. S. Pang and D. Chan, Iterative methods for variational and complementarity problems, Mathematical Programming 24 (1982) 284-313.

[17] J. S. Pang and L. Qi, Nonsmooth equations: motivation and applications, SIAM Journal on Optimization 3 (1993) 443-465.
[18] L. Qi, Convergence analysis of some algorithms for solving nonsmooth equations, Mathematics of Operations Research 18 (1993) 227-244.

[19] L. Qi and X. Chen, A globally convergent successive approximation method for severely nonsmooth equations, SIAM Journal on Control and Optimization (to appear).

[20] L. Qi and J. Sun, A nonsmooth version of Newton's method, Mathematical Programming 58 (1993) 353-368.

[21] K. Taji, M. Fukushima and T. Ibaraki, A globally convergent Newton method for solving strongly monotone variational inequalities, Mathematical Programming 58 (1993) 369-383.

[22] S. Z. Zhou and Q. R. Yan, A Kantorovich theorem for nonlinear complementarity problems, Chinese Science Bulletin 36 (1991).

[23] J. J. Moré and J. A. Trangenstein, On the global convergence of Broyden's method, Mathematics of Computation 30 (1976) 523-540.