Lagrange Multiplier Approach to Variational Problems and Applications
Advances in Design and Control SIAM’s Advances in Design and Control series consists of texts and monographs dealing with all areas of design and control and their applications. Topics of interest include shape optimization, multidisciplinary design, trajectory optimization, feedback, and optimal control. The series focuses on the mathematical and computational aspects of engineering design and control that are usable in a wide variety of scientific and engineering disciplines.
Editor-in-Chief Ralph C. Smith, North Carolina State University
Editorial Board
Athanasios C. Antoulas, Rice University
Siva Banda, Air Force Research Laboratory
Belinda A. Batten, Oregon State University
John Betts, The Boeing Company
Stephen L. Campbell, North Carolina State University
Eugene M. Cliff, Virginia Polytechnic Institute and State University
Michel C. Delfour, University of Montreal
Max D. Gunzburger, Florida State University
J. William Helton, University of California, San Diego
Arthur J. Krener, University of California, Davis
Kirsten Morris, University of Waterloo
Richard Murray, California Institute of Technology
Ekkehard Sachs, University of Trier

Series Volumes
Ito, Kazufumi and Kunisch, Karl, Lagrange Multiplier Approach to Variational Problems and Applications
Xue, Dingyü, Chen, YangQuan, and Atherton, Derek P., Linear Feedback Control: Analysis and Design with MATLAB
Hanson, Floyd B., Applied Stochastic Processes and Control for Jump-Diffusions: Modeling, Analysis, and Computation
Michiels, Wim and Niculescu, Silviu-Iulian, Stability and Stabilization of Time-Delay Systems: An Eigenvalue-Based Approach
Ioannou, Petros and Fidan, Barış, Adaptive Control Tutorial
Bhaya, Amit and Kaszkurewicz, Eugenius, Control Perspectives on Numerical Algorithms and Matrix Problems
Robinett III, Rush D., Wilson, David G., Eisler, G. Richard, and Hurtado, John E., Applied Dynamic Programming for Optimization of Dynamical Systems
Huang, J., Nonlinear Output Regulation: Theory and Applications
Haslinger, J. and Mäkinen, R. A. E., Introduction to Shape Optimization: Theory, Approximation, and Computation
Antoulas, Athanasios C., Approximation of Large-Scale Dynamical Systems
Gunzburger, Max D., Perspectives in Flow Control and Optimization
Delfour, M. C. and Zolésio, J.-P., Shapes and Geometries: Analysis, Differential Calculus, and Optimization
Betts, John T., Practical Methods for Optimal Control Using Nonlinear Programming
El Ghaoui, Laurent and Niculescu, Silviu-Iulian, eds., Advances in Linear Matrix Inequality Methods in Control
Helton, J. William and James, Matthew R., Extending H∞ Control to Nonlinear Systems: Control of Nonlinear Systems to Achieve Performance Objectives
Lagrange Multiplier Approach to Variational Problems and Applications
Kazufumi Ito North Carolina State University Raleigh, North Carolina
Karl Kunisch University of Graz Graz, Austria
Society for Industrial and Applied Mathematics Philadelphia
Copyright © 2008 by the Society for Industrial and Applied Mathematics. 10 9 8 7 6 5 4 3 2 1 All rights reserved. Printed in the United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Society for Industrial and Applied Mathematics, 3600 Market Street, 6th Floor, Philadelphia, PA 19104-2688 USA. Trademarked names may be used in this book without the inclusion of a trademark symbol. These names are used in an editorial context only; no infringement of trademark is intended. Library of Congress Cataloging-in-Publication Data Ito, Kazufumi. Lagrange multiplier approach to variational problems and applications / Kazufumi Ito, Karl Kunisch. p. cm. -- (Advances in design and control ; 15) Includes bibliographical references and index. ISBN 978-0-898716-49-8 (pbk. : alk. paper) 1. Linear complementarity problem. 2. Variational inequalities (Mathematics). 3. Multipliers (Mathematical analysis). 4. Lagrangian functions. 5. Mathematical optimization. I. Kunisch, K. (Karl), 1952– II. Title. QA402.5.I89 2008 519.3--dc22 2008061103
To our families:
Junko, Yuka, and Satoru
Brigitte, Katharina, Elisabeth, and Anna
Contents

Preface

1 Existence of Lagrange Multipliers
  1.1 Problem statement and generalities
  1.2 A generalized open mapping theorem
  1.3 Regularity and existence of Lagrange multipliers
  1.4 Applications
  1.5 Weakly singular problems
  1.6 Approximation, penalty, and adapted penalty techniques
    1.6.1 Approximation techniques
    1.6.2 Penalty techniques

2 Sensitivity Analysis
  2.1 Generalities
  2.2 Implicit function theorem
  2.3 Stability results
  2.4 Lipschitz continuity
  2.5 Differentiability
  2.6 Application to optimal control of an ordinary differential equation

3 First Order Augmented Lagrangians for Equality and Finite Rank Inequality Constraints
  3.1 Generalities
  3.2 Augmentability and sufficient optimality
  3.3 The first order augmented Lagrangian algorithm
  3.4 Convergence of Algorithm ALM
  3.5 Application to a parameter estimation problem

4 Augmented Lagrangian Methods for Nonsmooth, Convex Optimization
  4.1 Introduction
  4.2 Convex analysis
    4.2.1 Conjugate and biconjugate functionals
    4.2.2 Subdifferential
  4.3 Fenchel duality theory
  4.4 Generalized Yosida–Moreau approximation
  4.5 Optimality systems
  4.6 Augmented Lagrangian method
  4.7 Applications
    4.7.1 Bingham flow
    4.7.2 Image restoration
    4.7.3 Elastoplastic problem
    4.7.4 Obstacle problem
    4.7.5 Signorini problem
    4.7.6 Friction problem
    4.7.7 $L^1$-fitting
    4.7.8 Control problem

5 Newton and SQP Methods
  5.1 Preliminaries
  5.2 Newton method
  5.3 SQP and reduced SQP methods
  5.4 Optimal control of the Navier–Stokes equations
    5.4.1 Necessary optimality condition
    5.4.2 Sufficient optimality condition
    5.4.3 Newton’s method for (5.4.1)
  5.5 Newton method for the weakly singular case

6 Augmented Lagrangian-SQP Methods
  6.1 Generalities
  6.2 Equality-constrained problems
  6.3 Partial elimination of constraints
  6.4 Applications
    6.4.1 An introductory example
    6.4.2 A class of nonlinear elliptic optimal control problems
  6.5 Approximation and mesh-independence
  6.6 Comments

7 The Primal-Dual Active Set Method
  7.1 Introduction and basic properties
  7.2 Monotone class
  7.3 Cone sum preserving class
  7.4 Diagonally dominated class
  7.5 Bilateral constraints, diagonally dominated class
  7.6 Nonlinear control problems with bilateral constraints

8 Semismooth Newton Methods I
  8.1 Introduction
  8.2 Semismooth functions in finite dimensions
    8.2.1 Basic concepts and the semismooth Newton algorithm
    8.2.2 Globalization
    8.2.3 Descent directions
    8.2.4 A Gauss–Newton algorithm
    8.2.5 A nonlinear complementarity problem
  8.3 Semismooth functions in infinite-dimensional spaces
  8.4 The primal-dual active set method as a semismooth Newton method
  8.5 Semismooth Newton methods for a class of nonlinear complementarity problems
  8.6 Semismooth Newton methods and regularization

9 Semismooth Newton Methods II: Applications
  9.1 BV-based image restoration problems
  9.2 Friction and contact problems in elasticity
    9.2.1 Generalities
    9.2.2 Contact problem with Tresca friction
    9.2.3 Contact problem with Coulomb friction

10 Parabolic Variational Inequalities
  10.1 Strong solutions
  10.2 Regularity
  10.3 Continuity of $q \to y(q) \in L^\infty(\Omega)$
  10.4 Difference schemes and weak solutions
  10.5 Monotone property

11 Shape Optimization
  11.1 Problem statement and generalities
  11.2 Shape derivative
  11.3 Examples
    11.3.1 Elliptic Dirichlet boundary value problem
    11.3.2 Inverse interface problem
    11.3.3 Elliptic systems
    11.3.4 Navier–Stokes system

Bibliography

Index
Preface

The objective of this monograph is the treatment of a general class of nonlinear variational problems of the form

\[
\min_{y \in Y,\; u \in U} f(y,u) \quad \text{subject to} \quad e(y,u) = 0, \quad g(y,u) \in K, \tag{0.0.1}
\]

where $f : Y \times U \to \mathbb{R}$ denotes the cost functional, and $e : Y \times U \to W$ and $g : Y \times U \to Z$ are functionals used to describe equality and inequality constraints. Here $Y$, $U$, $W$, and $Z$ are Banach spaces and $K$ is a closed convex set in $Z$. A special choice for $K$ is simple box constraints

\[
K = \{ z \in Z : \phi \le z \le \psi \}, \tag{0.0.2}
\]

where $Z$ is a lattice with ordering $\le$, and $\phi$, $\psi$ are elements in $Z$.

Theoretical issues which will be treated include the existence of minimizers, optimality conditions, Lagrange multiplier theory, sufficient optimality considerations, and sensitivity analysis of the solutions to (0.0.1) with respect to perturbations in the problem data. These topics will be covered mainly in the first part of this monograph. The second part focuses on selected computational methods for solving the constrained minimization problem (0.0.1). The final chapter is devoted to the characterization of shape gradients for optimization problems constrained by partial differential equations.

Problems which fit into the framework of (0.0.1) are quite general and arise in application areas that were intensively investigated during the last half century. They include optimal control problems, structural optimization, inverse and parameter estimation problems, contact and friction problems, problems in image reconstruction and mathematical finance, and others. The variable $y$ is often referred to as the state variable and $u$ as the control or design parameter. The relationship between the variables $y$ and $u$ is described by $e$. It typically represents a differential or a functional equation. If $e$ can be used to express the variable $y$ as a function of $u$, i.e., $y = \Phi(u)$, then (0.0.1) reduces to

\[
\min_{u \in U} J(u) = f(\Phi(u), u) \quad \text{subject to} \quad g(\Phi(u), u) \in K. \tag{0.0.3}
\]

This is referred to as the reduced formulation of (0.0.1). In the more general case when $y$ and $u$ are both independent variables linked by the equation constraint $e(y,u) = 0$ it will be convenient at times to introduce $x = (y,u)$ in $X = Y \times U$ and to express (0.0.1) as

\[
\min_{x \in X} f(x) \quad \text{subject to} \quad e(x) = 0, \quad g(x) \in K. \tag{0.0.4}
\]
From a computational perspective it can be advantageous to treat $y$ and $u$ as independent variables even though a representation of $y$ in terms of $u$ is available. As an example consider the inverse problem of determining the diffusion parameter $a$ in

\[
-\nabla \cdot (a \nabla y) = f \ \text{in } \Omega, \qquad y|_{\partial\Omega} = 0 \tag{0.0.5}
\]

from the measurement $y_{obs}$, where $\Omega$ is a bounded open set in $\mathbb{R}^2$ and $f \in H^{-1}(\Omega)$. A nonlinear least squares approach to this problem can be formulated as (0.0.1) by choosing $y \in Y = H_0^1(\Omega)$, $a \in U = H^1(\Omega) \cap L^\infty(\Omega)$, $W = H^{-1}(\Omega)$, $Z = L^2(\Omega)$ and considering

\[
\min \int_\Omega \big( |y - y_{obs}|^2 + \beta\, |\nabla a|^2 \big)\, dx \quad \text{subject to (0.0.5) and } \underline{a} \le a \le \bar{a}, \tag{0.0.6}
\]

where $0 < \underline{a} < \bar{a} < \infty$ are lower and upper bounds for the “control” variable $a$. In this example, given $a \in U$, the state $y \in Y$ can be uniquely determined by solving the elliptic boundary value problem (0.0.5) for $y$, and we obtain a problem of type (0.0.3).

The general approach that we follow in this monograph for the analytical as well as the numerical treatment of (0.0.1) is based on Lagrange multiplier theory. Let us suppose in what follows that $Y$, $U$, and $W$ are Hilbert spaces and discuss the case $Z = U$ and $g(y,u) = u$, i.e., the constraint $u \in K$. We assume that $f$ and $e$ are $C^1$ and denote by $f_y$, $f_u$ the Fréchet derivatives of $f$ with respect to $y$ and $u$, respectively. Let $W^*$ be the dual space of $W$ and let the duality product be denoted by $\langle \cdot, \cdot \rangle_{W^*,W}$. The analogous notation is used for $Z$ and $Z^*$. We form the Lagrange functional

\[
L(y,u,\lambda) = f(y,u) + \langle e(y,u), \lambda \rangle_{W,W^*}, \tag{0.0.7}
\]

where $\lambda \in W^*$ is the Lagrange multiplier associated with the equality constraint $e(y,u) = 0$ which, for the present discussion, is supposed to exist. It will be shown that a minimizing pair $(y,u)$ satisfies

\[
\begin{aligned}
& L_y(y,u,\lambda) = f_y(y,u) + e_y(y,u)^* \lambda = 0, \\
& \langle f_u(y,u) + e_u(y,u)^* \lambda,\; v - u \rangle_{Z^*,Z} \ge 0 \quad \text{for all } v \in K, \\
& e(y,u) = 0 \ \text{and } u \in K.
\end{aligned} \tag{0.0.8}
\]

In the case $K = Z$ the second equation of (0.0.8) results in the equality $f_u(y,u) + e_u(y,u)^* \lambda = 0$. In this case a first possibility for solving the system (0.0.8) for the unknowns $(y,u,\lambda)$ is the use of a direct equation solver of Newton type, for example. Here the Lagrange multiplier $\lambda$ is treated as an independent variable just like $y$ and $u$. Alternatively, for $(y,u)$ satisfying $e(y,u) = 0$ and $\lambda$ satisfying $f_y(y,u) + e_y(y,u)^* \lambda = 0$, the gradient of $J(u)$ of (0.0.3) can be evaluated as

\[
J_u = f_u(y,u) + e_u(y,u)^* \lambda. \tag{0.0.9}
\]
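The gradient formula (0.0.9) can be checked on a small example. The following is a minimal sketch, not taken from the monograph: it assumes a scalar state equation $e(y,u) = y^3 + y - u$ and a tracking cost $f(y,u) = \frac12 (y - y_d)^2 + \frac{\beta}{2} u^2$, and all function names and parameter values are hypothetical. The forward step solves the state equation, the adjoint step solves $f_y + e_y \lambda = 0$, and the result is compared with a central finite difference of the reduced cost.

```python
# Toy illustration of the adjoint gradient formula (0.0.9).
# Assumed scalar model (not from the text):
#   state equation  e(y, u) = y**3 + y - u = 0
#   cost            f(y, u) = 0.5*(y - Y_TARGET)**2 + 0.5*BETA*u**2
Y_TARGET = 1.0
BETA = 0.1


def solve_state(u, tol=1e-13, max_iter=50):
    """Forward step: solve e(y, u) = 0 for y by Newton's method."""
    y = 0.0
    for _ in range(max_iter):
        residual = y**3 + y - u
        if abs(residual) < tol:
            break
        y -= residual / (3.0 * y**2 + 1.0)  # e_y = 3 y^2 + 1 > 0
    return y


def reduced_cost(u):
    """J(u) = f(y(u), u) with y(u) defined by the state equation."""
    y = solve_state(u)
    return 0.5 * (y - Y_TARGET) ** 2 + 0.5 * BETA * u**2


def adjoint_gradient(u):
    """Evaluate J_u = f_u + e_u * lam, where lam solves f_y + e_y * lam = 0."""
    y = solve_state(u)            # forward step
    f_y = y - Y_TARGET
    e_y = 3.0 * y**2 + 1.0
    lam = -f_y / e_y              # adjoint (backward) step
    f_u = BETA * u
    e_u = -1.0
    return f_u + e_u * lam


def finite_difference_gradient(u, h=1e-6):
    """Central difference of the reduced cost, used only as a check."""
    return (reduced_cost(u + h) - reduced_cost(u - h)) / (2.0 * h)


if __name__ == "__main__":
    u = 0.7
    print("adjoint gradient:   ", adjoint_gradient(u))
    print("finite differences: ", finite_difference_gradient(u))
```

The two printed values should agree up to discretization and rounding error. The adjoint evaluation needs one state solve and one adjoint solve regardless of the dimension of $u$, which is the practical appeal of (0.0.9) when $u$ has many components.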
Thus the combined step of determining $y \in Y$ for given $u \in K$ such that $e(y,u) = 0$ and finding $\lambda \in W^*$ satisfying $f_y(y,u) + e_y(y,u)^* \lambda = 0$ for $(y,u) \in Y \times U$ provides a
possibility for evaluating the gradient of $J$ at $u$. If in addition $u$ has to satisfy a constraint of the form $u \in K$, the projected gradient is obtained by projecting $f_u(y,u) + e_u(y,u)^* \lambda$ onto $K$, and (projected) gradient-based iterative methods can be employed to solve (0.0.1).

In optimal control of differential equations, the multiplier $\lambda$ is called the adjoint state and the second equation in (0.0.8) is called the optimality condition. Further, the system (0.0.8) coincides with the celebrated Pontryagin maximum principle. The steps in the procedure explained above for obtaining the gradient are referred to as the forward equation step for the state equation (the third equation in (0.0.8)) and the backward equation step for the adjoint equation (the first equation). This terminology is motivated by time-dependent problems, where the third equation is an initial value problem and the first one is a problem with terminal time boundary condition. Lastly, if $K$ is of type (0.0.2), then the second equation of (0.0.8) can be written as the complementarity condition

\[
\begin{aligned}
& f_u(y,u) + e_u(y,u)^* \lambda + \eta = 0, \\
& \eta = \max(0, \eta + (u - \psi)) + \min(0, \eta + (u - \phi)).
\end{aligned} \tag{0.0.10}
\]
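To make the complementarity condition (0.0.10) concrete, the following minimal sketch applies a primal-dual active set iteration to a finite-dimensional quadratic problem without state equation, so that $f_u(u) = Au - b$. The matrix, data, bounds, starting point, and all function names are assumptions chosen for illustration; the iteration predicts the active sets from the sign of $\eta + (u - \psi)$ and $\eta + (u - \phi)$ and stops when the sets no longer change.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def primal_dual_active_set(A, b, lower, upper, max_iter=20):
    """Active set iteration for A u - b + eta = 0 with lower <= u <= upper."""
    n = len(b)
    u = [0.0] * n
    eta = [0.0] * n
    previous = None
    for _ in range(max_iter):
        # Predict the active sets from the current primal-dual pair.
        up = [i for i in range(n) if eta[i] + (u[i] - upper[i]) > 0.0]
        lo = [i for i in range(n) if eta[i] + (u[i] - lower[i]) < 0.0]
        if (up, lo) == previous:
            break  # active sets repeat: (u, eta) satisfies (0.0.10)
        previous = (up, lo)
        fixed = {i: upper[i] for i in up}
        fixed.update({i: lower[i] for i in lo})
        free = [i for i in range(n) if i not in fixed]
        u = [fixed.get(i, 0.0) for i in range(n)]
        if free:
            # On the inactive set eta = 0, so A u = b restricted to free rows.
            A_ff = [[A[i][j] for j in free] for i in free]
            rhs = [b[i] - sum(A[i][j] * fixed[j] for j in fixed) for i in free]
            for i, value in zip(free, solve_linear(A_ff, rhs)):
                u[i] = value
        residual = [b[i] - sum(A[i][j] * u[j] for j in range(n)) for i in range(n)]
        eta = [residual[i] if i in fixed else 0.0 for i in range(n)]
    return u, eta


if __name__ == "__main__":
    A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
    b = [3.0, 0.0, -3.0]
    u, eta = primal_dual_active_set(A, b, [-1.0] * 3, [1.0] * 3)
    print("u   =", u)
    print("eta =", eta)
```

For this data the unconstrained minimizer violates both bounds, and the iteration settles after the active sets have been identified once. Each step only requires a linear solve on the inactive set, which is the structure that the semismooth Newton interpretation exploits.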
Due to the nondifferentiability of the max and min operations, classical Newton methods are not directly applicable to system (0.0.10). Active set methods provide a very efficient alternative. They turn out to be equivalent to semismooth Newton methods in function spaces. If appropriate structural conditions are met, including that $f$ be quadratic and $e$ affine, then such techniques are globally convergent. Moreover, under more general conditions they exhibit local superlinear convergence. Due to the practical relevance of problems with box constraints, a significant part of this monograph is devoted to the analysis of these methods.

A frequently employed alternative to the multiplier approach is given by the penalty method. To explain the procedure consider the sequence of minimization problems

\[
\min_{y \in Y,\; u \in K} f(y,u) + \frac{c_k}{2}\, |e(y,u)|_W^2 \tag{0.0.11}
\]

for an increasing sequence of penalty parameters $c_k$. That is, the equality constraint is eliminated through the quadratic penalty function. This requires solving the minimization problem (0.0.11), which is unconstrained with respect to the equality constraint, over $Y \times K$ for a sequence $\{c_k\}$ tending to infinity. Under mild assumptions it can be shown that the sequence of minimizers $(y_k, u_k)$ determined by (0.0.11) converges to a minimizer of (0.0.1) as $c_k \to \infty$. Moreover, $(y_k, u_k)$ satisfies

\[
\begin{aligned}
& f_y(y_k,u_k) + e_y(y_k,u_k)^* (c_k\, e(y_k,u_k)) = 0, \\
& \langle f_u(y_k,u_k) + e_u(y_k,u_k)^* (c_k\, e(y_k,u_k)),\; v - u_k \rangle_{Z^*,Z} \ge 0 \quad \text{for all } v \in K.
\end{aligned} \tag{0.0.12}
\]
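The role of the quantity $c_k\, e(y_k,u_k)$ in (0.0.12) can be seen on a small example. The sketch below is an illustration under assumed data, not an example from the monograph: it takes $f(y,u) = \frac12 (y-2)^2 + \frac12 u^2$, $e(y,u) = y - u$, and $K = Z = \mathbb{R}$, for which the constrained minimizer is $y = u = 1$ with multiplier $\lambda = 1$. The penalized problem (0.0.11) is then a $2 \times 2$ linear system, solved here by Cramer's rule.

```python
def penalty_minimizer(c):
    """Minimize 0.5*(y-2)**2 + 0.5*u**2 + 0.5*c*(y-u)**2 over (y, u).

    Stationarity gives the linear system
        (1 + c) y -       c u = 2
           -c   y + (1 + c) u = 0
    which is solved by Cramer's rule.
    """
    det = (1.0 + c) ** 2 - c**2
    y = 2.0 * (1.0 + c) / det
    u = 2.0 * c / det
    return y, u


def multiplier_estimate(c):
    """The quantity c * e(y_c, u_c) appearing in (0.0.12)."""
    y, u = penalty_minimizer(c)
    return c * (y - u)


if __name__ == "__main__":
    for c in (1.0, 10.0, 100.0, 1000.0):
        y, u = penalty_minimizer(c)
        print(f"c = {c:7.1f}  y = {y:.6f}  u = {u:.6f}  c*e = {multiplier_estimate(c):.6f}")
```

In this example the constraint violation $y_k - u_k$ and the multiplier error both decay only like $1/c_k$, which illustrates why large penalty parameters, with the associated ill-conditioning, are needed for accurate solutions.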
Comparing (0.0.12) with (0.0.8) it is natural to ask whether $c_k\, e(y_k,u_k)$ tends to a Lagrange multiplier associated with $e(y,u) = 0$ as $c_k \to \infty$. Indeed this can be shown under suitable conditions. Despite disadvantages due to slow convergence and possible ill-conditioning when solving (0.0.12) with large values of $c_k$, the penalty method is widely accepted in practice. This is due, in part, to the simplicity of the approach and the availability of powerful algorithms for solving (0.0.12) if $K = Z$, when the inequality in (0.0.12) becomes an equality.

A third methodology is given by duality techniques. They are based on the introduction of the dual functional

\[
d(\lambda) = \inf_{y \in Y,\; u \in K} L(y,u,\lambda) \tag{0.0.13}
\]
and the duality property

\[
\sup_{\lambda \in W^*} d(\lambda) = \inf_{y \in Y,\; u \in K} f(y,u) \quad \text{subject to} \quad e(y,u) = 0. \tag{0.0.14}
\]
The duality method for solving (0.0.1) requires minimizing $L(y,u,\lambda_k)$ over $(y,u) \in Y \times K$ and updating $\lambda$ by means of

\[
\lambda_{k+1} = \lambda_k + \alpha_k\, J\, e(y_k,u_k), \tag{0.0.15}
\]

where

\[
(y_k,u_k) = \operatorname*{argmin}_{y \in Y,\; u \in K} L(y,u,\lambda_k),
\]

$\alpha_k > 0$ is an appropriately chosen step size, and $J$ denotes the Riesz mapping from $W$ onto $W^*$. It can be argued that $e(y_k,u_k)$ is the gradient of $d(\lambda)$ at $\lambda_k$ and (0.0.15) is in turn a steepest ascent method for the maximization of $d(\lambda)$ under appropriate conditions. Such methods are called primal-dual methods. Despite the fact that the method can be justified only under fairly restrictive convexity assumptions on $L(y,u,\lambda)$ with respect to $(y,u)$, it provides an elegant use of Lagrange multipliers and is a basis for so-called augmented Lagrangian methods.

Augmented Lagrangian methods with $K = Z$ are based on the following problem which is equivalent to (0.0.1):

\[
\min_{y \in Y,\; u \in K} f(y,u) + \frac{c}{2}\, |e(y,u)|_W^2 \quad \text{subject to} \quad e(y,u) = 0. \tag{0.0.16}
\]
Under rather mild conditions the quadratic term enhances the local convexity of $L(y,u,\lambda)$ in the variables $(y,u)$ for sufficiently large $c > 0$. It helps the convergence of direct solvers based on the necessary optimality conditions (0.0.8). To carry this a step further we introduce the augmented Lagrangian functional

\[
L_c(y,u,\lambda) = f(y,u) + \langle e(y,u), \lambda \rangle_{W,W^*} + \frac{c}{2}\, |e(y,u)|_W^2. \tag{0.0.17}
\]

The first order augmented Lagrangian method is the primal-dual method applied to (0.0.17), i.e.,

\[
\begin{aligned}
& (y_k,u_k) = \operatorname*{argmin}_{y \in Y,\; u \in K} L_c(y,u,\lambda_k), \\
& \lambda_{k+1} = \lambda_k + c\, J\, e(y_k,u_k).
\end{aligned} \tag{0.0.18}
\]

Its advantage over the penalty method is attributed to the fact that local convergence of $(y_k,u_k)$ to a minimizer $(y,u)$ of (0.0.1) holds for all sufficiently large and fixed $c > 0$, without requiring that $c \to \infty$. As we noted, $L_c(y,u,\lambda)$ has local convexity properties under well-studied assumptions, and the primal-dual viewpoint is applicable to the multiplier update. The iterates $(y_k,u_k,\lambda_k)$ converge linearly to the triple $(y,u,\lambda)$ satisfying the first order necessary optimality conditions, and convergence can improve as $c > 0$ increases. Due to these attractive characteristics and properties, the method of multipliers and its subsequent Newton-like variants have been recognized as a powerful method for minimization problems with equality constraints. They constitute an important part of this book.
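The iteration (0.0.18) can be tried out on a small model problem. The following sketch is illustrative only and rests on assumed data: $f(y,u) = \frac12 (y-2)^2 + \frac12 u^2$, $e(y,u) = y - u$, $K = Z = \mathbb{R}$, so that $J$ is the identity and the solution is $y = u = 1$, $\lambda = 1$. The function names are hypothetical. For this example the multiplier error contracts by the factor $1/(1+2c)$ per step, so the iteration converges for every fixed $c > 0$ and faster for larger $c$.

```python
def minimize_augmented_lagrangian(lam, c):
    """Minimize L_c(y, u, lam) for f = 0.5*(y-2)**2 + 0.5*u**2, e = y - u.

    Stationarity gives the linear system
        (1 + c) y -       c u = 2 - lam
           -c   y + (1 + c) u = lam
    which is solved by Cramer's rule.
    """
    det = (1.0 + c) ** 2 - c**2
    y = ((2.0 - lam) * (1.0 + c) + c * lam) / det
    u = ((1.0 + c) * lam + c * (2.0 - lam)) / det
    return y, u


def method_of_multipliers(c, steps, lam=0.0):
    """First order augmented Lagrangian iteration (0.0.18) with fixed c."""
    history = [lam]
    y, u = minimize_augmented_lagrangian(lam, c)
    for _ in range(steps):
        y, u = minimize_augmented_lagrangian(lam, c)
        lam = lam + c * (y - u)   # multiplier update; J is the identity here
        history.append(lam)
    return y, u, history


if __name__ == "__main__":
    for c in (1.0, 10.0):
        y, u, history = method_of_multipliers(c, steps=6)
        errors = [abs(1.0 - lam) for lam in history]
        print(f"c = {c}: multiplier errors {errors}")
```

Comparing the printed errors for the two values of $c$ shows the linear rate and its improvement with $c$, in contrast to the penalty method, where accuracy can only be gained by letting the penalty parameter grow.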
In (0.0.16) and (0.0.18) the constraint $u \in K$ remained as an explicit constraint and was not augmented. To describe a possibility for augmenting inequalities we return to the general form $g(y,u) \in K$ and consider inequality constraints with finite rank, i.e., $Z = \mathbb{R}^p$ and $K = \{ z \in \mathbb{R}^p : z_i \le 0,\ 1 \le i \le p \}$. Then under appropriate conditions the formulation

\[
\min_{y \in Y,\; u \in U,\; q \in \mathbb{R}^p} L_c(y,u,\lambda) + (\mu, g(y,u) - q) + \frac{c}{2}\, |g(y,u) - q|_{\mathbb{R}^p}^2 \quad \text{subject to} \quad q \le 0 \tag{0.0.19}
\]

is equivalent to (0.0.1). Here $\mu \in \mathbb{R}^p$ is the Lagrange multiplier associated with the inequality constraint $g(y,u) \le 0$. Minimizing the functional in (0.0.19) over $q \le 0$ results in the augmented Lagrangian functional

\[
L_c(y,u,\lambda,\mu) = L_c(y,u,\lambda) + \frac{1}{2c}\, |\max(0, \mu + c\, g(y,u))|_{\mathbb{R}^p}^2 - \frac{1}{2c}\, |\mu|_{\mathbb{R}^p}^2, \tag{0.0.20}
\]

where equality and finite rank inequality constraints are augmented. The corresponding augmented Lagrangian method is

\[
\begin{aligned}
& (y_k,u_k) = \operatorname*{argmin}_{y \in Y,\; u \in U} L_c(y,u,\lambda_k,\mu_k), \\
& \lambda_{k+1} = \lambda_k + c\, J\, e(y_k,u_k), \\
& \mu_{k+1} = \max(0, \mu_k + c\, g(y_k,u_k)).
\end{aligned} \tag{0.0.21}
\]

In the discussion of box-constrained problems we already pointed out the relevance of numerical methods for nonsmooth problems; see (0.0.10). In many applied variational problems, for example in mechanics, fluid flow, or image analysis, nonsmooth cost functionals arise. Consider as a special case the simplified friction problem

\[
\min f(y) = \int_\Omega \Big( \frac12 \big( |\nabla y|^2 + |y|^2 \big) - \tilde f\, y \Big)\, dx + g \int_\Gamma |y|\, ds \quad \text{over } y \in H^1(\Omega), \tag{0.0.22}
\]

where $\Omega$ is a bounded domain with boundary $\Gamma$. This is an unconstrained minimization problem with $X = H^1(\Omega)$ and no control variables. Since the functional is not continuously differentiable, the necessary optimality condition $f_y(y) = 0$ is not applicable. However, with the aid of a generalized Lagrange multiplier theory a necessary optimality condition can be written as

\[
\begin{aligned}
& -\Delta y + y = \tilde f \ \text{in } \Omega, \qquad \frac{\partial y}{\partial \nu} = -g\, \lambda \ \text{on } \Gamma, \\
& |\lambda(x)| \le 1 \ \text{and} \ \lambda(x)\, y(x) = |y(x)| \ \text{a.e. on } \Gamma.
\end{aligned} \tag{0.0.23}
\]
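The structure of the multiplier conditions in (0.0.23) can be seen in a scalar analogue, in which the differential operator is dropped and only the nonsmooth term is kept. The following sketch is an assumption-laden illustration, not part of the monograph: it minimizes $\frac12 y^2 - \tilde f\, y + g\,|y|$ over $y \in \mathbb{R}$ with $g > 0$, whose stationarity relation reads $y - \tilde f + g\lambda = 0$ with $|\lambda| \le 1$ and $\lambda y = |y|$. It also checks the equivalent nondifferentiable equation $\lambda = \max(-1, \min(1, \lambda + \sigma y))$ for $\sigma > 0$, which is one common way to rewrite such conditions for Newton-type methods. All names are hypothetical.

```python
def minimize_scalar_friction(f_tilde, g):
    """Minimizer of 0.5*y**2 - f_tilde*y + g*abs(y) for g > 0 (soft thresholding)."""
    magnitude = max(abs(f_tilde) - g, 0.0)
    return magnitude if f_tilde >= 0.0 else -magnitude


def multiplier(f_tilde, g, y):
    """Multiplier lam from the stationarity relation y - f_tilde + g*lam = 0."""
    return (f_tilde - y) / g


def complementarity_residual(lam, y, sigma=1.0):
    """Residual of lam = projection of (lam + sigma*y) onto [-1, 1]."""
    return lam - max(-1.0, min(1.0, lam + sigma * y))


if __name__ == "__main__":
    g = 0.5
    for f_tilde in (2.0, 0.2, -2.0):
        y = minimize_scalar_friction(f_tilde, g)
        lam = multiplier(f_tilde, g, y)
        print(f"f = {f_tilde:5.2f}  y = {y:5.2f}  lam = {lam:5.2f}  "
              f"residual = {complementarity_residual(lam, y):.2e}")
```

When $|\tilde f| \le g$ the minimizer sticks at $y = 0$ and the multiplier lies strictly inside $[-1, 1]$; otherwise $y \ne 0$ and $\lambda$ equals the sign of $y$. This is the same stick and slip dichotomy that the pointwise conditions on $\lambda$ in (0.0.23) express on the boundary.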
From a Lagrange multiplier perspective, problems with $L^1$-type cost functionals as in (0.0.22) and box-constrained problems are dual to each other. So it comes as no surprise that again semismooth Newton methods provide an efficient technique for solving (0.0.22) or (0.0.23). We shall analyze them and provide a theoretical basis for their numerical efficiency.

This monograph also contains the analysis of optimization problems constrained by partial differential equations which are singular in the sense that the state variable cannot be differentiated with respect to the control. This, however, does not preclude that the cost
functional is differentiable with respect to the control and that an optimality principle can be derived. In terms of shape optimization problems this means that the cost functional is shape differentiable while the state is not differentiable with respect to the shape.

In summary, Lagrange multiplier theory provides a tool for the analysis of general constrained optimization problems with cost functionals which are not necessarily $C^1$ and with state equations which are in some sense singular. It also leads to a theoretical basis for developing efficient and powerful iterative methods for solving such problems. The purpose of this monograph is to provide a rather thorough analysis of Lagrange multiplier theory and to show its impact on the development of numerical algorithms for problems which are posed in a function space setting.

Let us give a short description of the book for those readers who do not intend to read the chapters in consecutive order. Chapter 1 provides a variety of tools to establish existence of Lagrange multipliers and is called upon in all the following chapters. Here, as in other chapters, we do not attempt to give the most general results, nor do we strive to cover the complete literature. Chapter 2 is devoted to the sensitivity analysis of abstract constrained nonlinear programming problems and essentially stands by itself. This chapter is of great importance, addressing continuity, Lipschitz continuity, and differentiability of the solutions to optimization and optimal control problems with respect to parameters that appear in the problem formulation. Such results are not only of theoretical but also of practical importance. The sensitivity equations have been a starting point for the development of algorithmic concepts for decades. Nevertheless, readers who are not interested in this topic at first may skip this chapter without missing technical results which might be needed for later chapters.
Chapters 3, 5, and 6 form a unit which is devoted to smooth optimization problems. Chapter 3 covers first order augmented Lagrangian methods for optimization problems with equality and inequality constraints. Here as in the remainder of the book, the equality constraints that we have in mind typically represent partial differential equations. In fact, during the period in which this monograph was written, the terminology “PDE-constrained optimization” emerged. Inverse problems formulated as regularized least squares problems and optimal control problems for (partial) differential equations are primary examples for the theories that are discussed here. Chapters 5 and 6 are devoted to second order iterative solution methods for equality-constrained problems. Again the equality constraints represent partial differential equations. This naturally gives rise to the following situation: The variables with respect to which the optimization is carried out can be classified into two groups. One group contains the state variables of the differential equations and the other group consists of variables which represent the control or input variables for optimal control problems, or coefficients in parameter estimation problems. If the state variables are considered as functions of the independent controls, inputs, or coefficients, and the cost functional in the optimization problem is only considered as a functional of the latter, then this is referred to as the reduced formulation. Applying a second order method to the reduced functional we arrive at the Newton method for optimization problems with partial differential equations as constraints.
If both state and control variables are kept as independent variables and the optimality system involving primal and adjoint variables, which are the Lagrange multipliers corresponding to the PDE-constraints, is derived, we arrive at the sequential quadratic programming (SQP) technique: It essentially consists of applying a Newton algorithm to the first order necessary optimality conditions. The Newton method for the reduced formulation and the SQP technique are the focus of Chapter 5. Chapter 6
is devoted to second order augmented Lagrangian techniques which are closely related, as we shall see, to SQP methods. Here the equation constraint is augmented in a penalty term, which has the effect of locally convexifying the optimization problem. Since augmented Lagrangians also involve Lagrange multipliers, there is, however, no necessity to let the penalty parameter tend to infinity and, in fact, we do not suggest doing so.

A second larger unit is formed by Chapters 4, 7, 8, and 9. Nonsmoothness, primal-dual active set strategy, and semismooth Newton methods are the keywords which characterize the contents of these chapters. Chapter 4 is essentially a recapitulation of concepts from convex analysis in a format that is used in the remaining chapters. A key result is the formulation of differential inclusions which arise in optimality systems by means of nondifferentiable equations which are derived from Yosida–Moreau approximations and which will serve as the basis for the primal-dual active set strategy. Chapter 7 is devoted to the primal-dual active set strategy and its global convergence properties for unilaterally and bilaterally constrained problems. The local analysis of the primal-dual active set strategy is achieved in the framework of semismooth Newton methods in Chapter 8. It contains the notion of Newton derivative and establishes local superlinear convergence of the Newton method for problems which do not satisfy the classical sufficient conditions for local quadratic convergence. Two important classes of applications of semismooth Newton methods are considered in Chapter 9: image restoration and deconvolution problems regularized by the bounded variation (BV) functional, and friction and contact problems in elasticity.

Chapter 10 is devoted to a Lagrangian treatment of parabolic variational inequalities in unbounded domains as they arise in the Black–Scholes equation, for example.
It contains the use of monotone techniques for analyzing parabolic systems without relying on compactness assumptions in a Gelfand-triple framework. In Chapter 11 we provide a calculus for obtaining the shape derivative of the cost functional in shape optimization problems which bypasses the need for using the shape derivative of the state variables of the partial differential equations. It makes use of the expansion technique that is proposed in Chapters 1 and 5 for weakly singular optimal control problems, and of the assumption that an appropriately defined adjoint equation admits a solution. This provides a versatile technique for evaluating the shape derivative of the cost functional using Lagrange multiplier techniques. There are many additional topics which would fit under the title of this monograph which, however, we chose not to include. In particular, issues of discretization, convergence, and rate of convergence are not discussed. Here the issue of proper discretization of adjoint equations consistent with the discretization of the primal equation and the consistent time integration of the adjoint equations must be mentioned. We do not enter into the discussion of whether to discretize an infinite-dimensional nonlinear programming problem first and then to decide on an iterative algorithm to solve the finite-dimensional problems, or the other way around, consisting of devising an optimization algorithm for the infinite-dimensional problem which is subsequently discretized. It is desirable to choose a discretization and an iterative optimization strategy in such a manner that these two approaches commute. Discontinuous Galerkin methods are well suited for this purpose; see, e.g., [BeMeVe]. Another important area which is not in the focus of this monograph is the efficient solution of those large scale linear systems which arise in optimization algorithms. We refer the reader to, e.g., [BGHW, BGHKW], and the literature cited there. 
The solution of large scale time-dependent optimal control problems involving the coupled system of primal and
adjoint equations, which need to be solved in opposite directions with respect to time, still offers a significant challenge, despite the advances that were made with multiple shooting, receding horizon, and time-domain decomposition techniques. From the point of view of optimization theory there are several topics as well into which one could expand. These include globalization strategies, like trust region methods, exact penalty methods, quasi-Newton methods, and a more abstract Lagrange multiplier theory than that presented in Chapter 1. As a final comment we stress that for a full treatment of a variational problem in function spaces, both its infinite-dimensional analysis as well as its proper discretization and the relation between the two are indispensable. If one proceeds from an infinite-dimensional problem directly to its discretization without such a treatment, important issues can be missed. For instance, discretization without a well-posed analysis may result in the use of inappropriate inner products, which may lead to unnecessary ill-conditioning, which in turn entails unnecessary preconditioning. Inconsiderate discretization may also result in the loss of structural properties, such as symmetry.
Chapter 1
Existence of Lagrange Multipliers
1.1 Problem statement and generalities
We consider the constrained optimization problem
\[ \min f(x) \quad \text{subject to } x \in C \text{ and } g(x) \in K, \tag{1.1.1} \]
where $f$ is a real-valued functional defined on a real Banach space $X$; further, $C$ is a closed convex subset of $X$, $g$ is a mapping from $X$ into the real Banach space $Z$, and $K$ is a closed convex cone in $Z$. Throughout the monograph it is understood that the vertex of a cone coincides with the origin. Further, in this chapter it is assumed that (1.1.1) admits a solution $x^*$. Here we refer to $x^*$ as a solution if it is feasible, i.e., $x^*$ satisfies the constraints in (1.1.1), and $f$ has a local minimum at $x^*$. It is assumed that
\[ f \text{ is Fréchet differentiable at } x^*, \text{ and } g \text{ is continuously Fréchet differentiable at } x^*, \tag{1.1.2} \]
with Fréchet derivatives denoted by $f'(x^*)$ and $g'(x^*)$, respectively. The set of feasible points for (1.1.1) will be denoted by $M = \{x \in C : g(x) \in K\}$. Moreover, the following notation will be used. The topological duals of $X$ and $Z$ are denoted by $X^*$ and $Z^*$. For the subset $A$ of $X$ the polar or dual cone is defined by
\[ A^+ = \{x^* \in X^* : \langle x^*, a\rangle \le 0 \text{ for all } a \in A\}, \]
where $\langle\cdot,\cdot\rangle$ $(= \langle\cdot,\cdot\rangle_{X^*,X})$ denotes the duality pairing between $X^*$ and $X$. Further, for $x \in X$ the conical hull of $C \setminus \{x\}$ is given by
\[ C(x) = \{\lambda(c - x) : c \in C,\ \lambda \ge 0\}, \]
and $K(z)$ with $z \in Z$ is defined analogously. The Lagrange functional $L : X \times Z^* \to \mathbb{R}$ associated to (1.1.1) is defined by
\[ L(x, \lambda) = f(x) + \langle \lambda, g(x)\rangle_{Z^*,Z}. \]
Definition 1.1. An element $\lambda^* \in Z^*$ is called a Lagrange multiplier for (1.1.1) at the solution $x^*$ if $\lambda^* \in K^+$, $\langle \lambda^*, g(x^*)\rangle = 0$, and
\[ f'(x^*) + \lambda^* \circ g'(x^*) \in -C(x^*)^+. \]
Lagrange multipliers are of fundamental importance for expressing necessary as well as sufficient optimality conditions for (1.1.1) and for the development of algorithms to solve (1.1.1). If $C = X$, then the inclusion in Definition 1.1 results in the equation
\[ f'(x^*) + \lambda^* \circ g'(x^*) = 0. \]
For the existence analysis of Lagrange multipliers one makes use of the following tangent cones which approximate the feasible set $M$ at $x^*$:
\[ T(M, x^*) = \Big\{x \in X : x = \lim_{n\to\infty} \tfrac{1}{t_n}(x_n - x^*),\ t_n \to 0^+,\ x_n \in M\Big\}, \]
\[ L(M, x^*) = \{x \in X : x \in C(x^*) \text{ and } g'(x^*)x \in K(g(x^*))\}. \]
The cone $T(M, x^*)$ is called the sequential tangent cone (or Bouligand cone) and $L(M, x^*)$ the linearizing cone of $M$ at $x^*$.

Proposition 1.2. If $x^*$ is a solution to (1.1.1), then $f'(x^*)x \ge 0$ for all $x \in T(M, x^*)$.

Proof. Let $x \in T(M, x^*)$. Then there exist sequences $\{x_n\}_{n=1}^\infty$ in $M$ and $\{t_n\}$ with $\lim_{n\to\infty} t_n = 0$ such that $x = \lim_{n\to\infty} \frac{1}{t_n}(x_n - x^*)$. By the definition of Fréchet differentiability there exists a sequence $\{r_n\}$ in $\mathbb{R}$ such that
\[ f'(x^*)x = \lim_{n\to\infty} \frac{1}{t_n} f'(x^*)(x_n - x^*) = \lim_{n\to\infty} \frac{1}{t_n}\big(f(x_n) - f(x^*) + r_n\big) \]
and $\lim_{n\to\infty} \frac{1}{|x_n - x^*|}\, r_n = 0$. By optimality of $x^*$ we have
\[ f'(x^*)x \ge \lim_{n\to\infty} |x|\, \frac{1}{|x_n - x^*|}\, r_n = 0 \]
and the result follows.

We now briefly outline the main steps involved in the abstract approach to prove the existence of Lagrange multipliers. For this purpose we assume that $C = X$, $K = \{0\}$ so that the set of feasible points is described by $M = \{x : g(x) = 0\}$. We assume that $g'(x^*) : X \to Z$ is surjective. Then by Lyusternik's theorem [Ja, We] we have
\[ L(M, x^*) \subset T(M, x^*). \tag{1.1.3} \]
Consider the convex cone
\[ B = \{(f'(x^*)x + r,\ g'(x^*)x) : r \ge 0,\ x \in X\} \subset \mathbb{R} \times Z. \]
Observe that $(0, 0) \in B$ and that due to Proposition 1.2 and (1.1.3) the origin $(0, 0)$ is a boundary point of $B$. Since $g'(x^*)$ is surjective, $B$ has nonempty interior and hence by the Eidelheit separation theorem that we shall recall below there exists a closed hyperplane in $\mathbb{R} \times Z$ which supports $B$ at $(0, 0)$, i.e., there exists $0 \ne (\alpha, \lambda^*) \in \mathbb{R} \times Z^*$ such that
\[ \alpha(f'(x^*)x + r) + \langle \lambda^*, g'(x^*)x\rangle \ge 0 \quad \text{for all } (r, x) \in \mathbb{R}^+ \times X. \]
Setting $x = 0$ we have $\alpha \ge 0$. If $\alpha = 0$, then $\lambda^* g'(x^*) = 0$, which, by surjectivity of $g'(x^*)$, would imply $\lambda^* = 0$, which is impossible. Consequently $\alpha > 0$ and without loss of generality we can assume $\alpha = 1$. Hence $\lambda^*$ is a Lagrange multiplier. For the general case of equality and inequality constraints the application of Lyusternik's theorem is replaced by stability results for solutions of systems of inequalities. The existence of a separating hyperplane will follow from a generalized open mapping theorem, which is covered in the following section. The separation theorem announced above is presented next.

Theorem 1.3. Let $K_1$ and $K_2$ be nontrivial, convex, and disjoint subsets of a normed linear space $X$. If $K_1$ is closed and $K_2$ is compact, then there exists a closed hyperplane strictly separating $K_1$ and $K_2$, i.e., there exist $x^* \in X^*$, $\beta \in \mathbb{R}$, and $\epsilon > 0$ such that
\[ \langle x^*, x\rangle \le \beta - \epsilon \text{ for all } x \in K_1 \quad \text{and} \quad \langle x^*, x\rangle \ge \beta + \epsilon \text{ for all } x \in K_2. \]
If $K_1$ is open and $K_2$ is arbitrary, then these inequalities still hold with $\epsilon = 0$.

Let us give a brief description of the sections of this chapter. Section 1.2 contains the open mapping theorem already alluded to above. Regularity properties which guarantee the existence of a Lagrange multiplier in general nonlinear optimal programming problems in infinite dimensions are analyzed in section 1.3 and applications to parameter estimation and optimal control problems are described in section 1.4.
Section 1.5 is devoted to the derivation of a first order optimality system for a class of weakly singular optimal control problems and to several applications which are covered by this class. Finally in section 1.6 several further alternatives for deriving optimality systems are presented in an informal manner.
1.2 A generalized open mapping theorem

Throughout this section $T$ denotes a bounded linear operator from $X$ to $Z$. As in the previous section $C$ stands for a closed convex set in $X$, and $K$ stands for a closed convex cone in $Z$. For Theorem 1.4 below it is sufficient that $K$ is a closed convex set. For $\rho > 0$ we set $X_\rho = \{x : |x| \le \rho\}$ and $Z_\rho$ is defined analogously. Recall that by the open mapping theorem, surjectivity of $T$ implies that $T$ maps open sets of $X$ onto open sets of $Z$. As a consequence $T X_1$ contains $Z_\rho$ for some $\rho > 0$. The following theorem generalizes this result.

Theorem 1.4. Let $\bar x \in C$ and $\bar y \in K$, where $K$ is a closed convex set. Then the following statements are equivalent:
(a) $Z = T C(\bar x) - K(\bar y)$,
(b) there exists $\rho > 0$ such that $Z_\rho \subset T((C - \{\bar x\}) \cap X_1) - (K - \{\bar y\}) \cap Z_1$,
where $C(\bar x) = \{\lambda(x - \bar x) : \lambda \ge 0,\ x \in C\}$ and $K(\bar y) = \{k - \lambda \bar y : \lambda \ge 0,\ k \in K\}$.

Proof. Clearly (b) implies (a). To prove the converse we first show that
\[ C(\bar x) = \bigcup_{n \in \mathbb{N}} n\big((C - \bar x) \cap X_1\big). \tag{1.2.1} \]
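The inclusion $\supset$ is obvious and hence it suffices to prove that $C(\bar x)$ is contained in the set on the right-hand side of (1.2.1). Let $x \in C(\bar x)$. Then $x = \lambda y$ with $\lambda \ge 0$ and $y \in C - \bar x$. Without loss of generality we can assume that $\lambda > 0$ and $|y| > 1$. Convexity of $C$ implies that $\frac{1}{|y|} y \in (C - \bar x) \cap X_1$. Let $n \in \mathbb{N}$ be such that $n \ge \lambda |y|$. Then $x = (\lambda|y|) \frac{1}{|y|} y \in n((C - \bar x) \cap X_1)$, and (1.2.1) is verified. For $\alpha > 0$ let
\[ A_\alpha = \alpha\big(T((C - \bar x) \cap X_1) - (K - \bar y) \cap Z_1\big). \]
We will show that $0 \in \operatorname{int} \overline{A_1}$. In fact, from (1.2.1) and the analogous equality with $C$ replaced by $K$ it follows from (a) that
\[ Z = \bigcup_{n \in \mathbb{N}} n\,T((C - \bar x) \cap X_1) + \bigcup_{n \in \mathbb{N}} (-n)\big((K - \bar y) \cap Z_1\big). \]
This implies that
\[ Z = \bigcup_{n \in \mathbb{N}} A_n = \bigcup_{n \in \mathbb{N}} \overline{A_n}. \tag{1.2.2} \]
Thus the complete space $Z$ is the countable union of the closed sets $\overline{A_n}$. By the Baire category theorem there exists some $m \in \mathbb{N}$ such that $\operatorname{int} \overline{A_m} \ne \emptyset$. Let $a \in \operatorname{int} \overline{A_m}$. By (1.2.2) there exists $k \in \mathbb{N}$ with $-a \in \overline{A_k}$. Using $A_\alpha = \alpha A_1$ this implies that $-\frac{m}{k} a \in \overline{A_m}$. It follows that the half-open interval $\{\lambda(-\frac{m}{k} a) + (1 - \lambda)a : 0 \le \lambda < 1\}$ belongs to $\operatorname{int} \overline{A_m}$ and consequently $0 \in \operatorname{int} \overline{A_m} = m \operatorname{int} \overline{A_1}$. Thus we have
\[ 0 \in \operatorname{int} \overline{A_1}. \tag{1.2.3} \]
Hence for some $\rho > 0$
\[ Z_\rho \subset \tfrac{1}{2}\overline{A_1} = \overline{A_{1/2}} \subset A_{1/2} + Z_{\rho/2}, \tag{1.2.4} \]
and consequently for all $i = 0, 1, \ldots$
\[ Z_{(1/2)^i \rho} = \big(\tfrac12\big)^i Z_\rho \subset \big(\tfrac12\big)^i \big(A_{1/2} + Z_{\rho/2}\big) = A_{(1/2)^{i+1}} + Z_{(1/2)^{i+1}\rho}. \tag{1.2.5} \]
Now choose $y \in Z_\rho$. By (1.2.4) there exist $x_1 \in (C - \bar x) \cap X_1$, $y_1 \in (K - \bar y) \cap Z_1$, and $r_1 \in Z_{\rho/2}$ such that $y = T(\frac12 x_1) - \frac12 y_1 + r_1$. Applying (1.2.5) with $i = 1$ to $r_1$ implies the existence of $x_2 \in (C - \bar x) \cap X_1$, $y_2 \in (K - \bar y) \cap Z_1$, and $r_2 \in Z_{(1/2)^2 \rho}$ such that
\[ y = T\Big(\tfrac12 x_1 + \big(\tfrac12\big)^2 x_2\Big) - \tfrac12 y_1 - \big(\tfrac12\big)^2 y_2 + r_2. \]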
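Repeated application of this argument implies that $y = T u_n - v_n + r_n$, where
\[ u_n = \sum_{i=1}^n \big(\tfrac12\big)^i x_i \quad \text{with } x_i \in (C - \bar x) \cap X_1, \]
\[ v_n = \sum_{i=1}^n \big(\tfrac12\big)^i y_i \quad \text{with } y_i \in (K - \bar y) \cap Z_1, \]
and $r_n \in Z_{(1/2)^n \rho}$. Observe that
\[ u_n = \tfrac12 x_1 + \cdots + \big(\tfrac12\big)^n x_n + \Big(1 - \sum_{i=1}^n \big(\tfrac12\big)^i\Big)\, 0 \in (C - \bar x) \cap X_1 \]
and
\[ |u_n - u_{n+m}| \le \sum_{i=n+1}^{n+m} \big(\tfrac12\big)^i \le \big(\tfrac12\big)^n. \]
Consequently $\{u_n\}_{n=1}^\infty$ is a Cauchy sequence and there exists some $x \in (C - \bar x) \cap X_1$ such that $\lim_{n\to\infty} u_n = x$. Similarly there exists $v \in (K - \bar y) \cap Z_1$ such that $\lim_{n\to\infty} v_n = v$. Moreover $\lim_{n\to\infty} r_n = 0$ and continuity of $T$ implies that $y = Tx - v$.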
1.3 Regularity and existence of Lagrange multipliers
In this section existence of a Lagrange multiplier for the optimization problem (1.1.1) will be established using a regular point condition which was proposed in the publications of Robinson for sensitivity analysis of generalized equations and later by Kurcyusz, Zowe, and Maurer for the existence of Lagrange multipliers.

Definition 1.5. The element $\bar x \in M$ is called regular (or alternatively $\bar x$ satisfies the regular point condition) if
\[ 0 \in \operatorname{int}\{g'(\bar x)(C - \bar x) - K + g(\bar x)\}. \tag{1.3.1} \]
If (1.3.1) holds, then clearly
\[ 0 \in \operatorname{int}\{g'(\bar x)C(\bar x) - K(g(\bar x))\}. \tag{1.3.2} \]
Note that $g'(\bar x)C(\bar x) - K(g(\bar x))$ is a cone. Consequently (1.3.2) implies that
\[ g'(\bar x)C(\bar x) - K(g(\bar x)) = Z. \tag{1.3.3} \]
Finally (1.3.3) implies (1.3.1) by Theorem 1.4. Thus, the three conditions (1.3.1)–(1.3.3) are equivalent. Condition (1.3.1) is used in the work of Robinson (see, e.g., [Ro2]) on the stability of solutions to systems of inequalities, and (1.3.3) is the regularity condition employed in [MaZo, ZoKu] to guarantee the existence of Lagrange multipliers.
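The step from (1.3.2) to (1.3.3) is a scaling argument that may be worth spelling out. The following is only a sketch; the symbol $S$ is introduced here for illustration and does not appear in the text.

```latex
Let $S = g'(\bar x)C(\bar x) - K(g(\bar x))$ and suppose, by (1.3.2), that
$Z_\rho \subset S$ for some $\rho > 0$. Take any $z \in Z$ with $z \neq 0$. Then
\[
  \frac{\rho}{|z|}\, z \in Z_\rho \subset S ,
\]
and since $S$ is a cone (closed under multiplication by nonnegative scalars),
\[
  z = \frac{|z|}{\rho} \cdot \frac{\rho}{|z|}\, z \in S .
\]
As $0 \in S$ as well, it follows that $S = Z$, which is (1.3.3).
```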
Remark 1.3.1. If $C = X$, then the so-called Slater condition $g'(\bar x)h \in \operatorname{int} K(g(\bar x))$ for some $h \in X$ implies (1.3.1). In case $C = X = \mathbb{R}^n$, $Z = \mathbb{R}^m$, $K = \{y \in \mathbb{R}^m : y_i = 0 \text{ for } i = 1, \ldots, k, \text{ and } y_i \le 0 \text{ for } i = k+1, \ldots, m\}$, $g = (g_1, \ldots, g_k, g_{k+1}, \ldots, g_m)$, the constraint $g(x) \in K$ amounts to $k$ equality and $m - k$ inequality constraints, and the regularity condition (1.3.3) becomes
\[
\begin{cases}
\text{the gradients } \{g_i'(\bar x)\}_{i=1}^k \text{ are linearly independent and} \\
\text{there exists } x \in X \text{ such that} \\
g_i'(\bar x)x = 0 \text{ for } i = 1, \ldots, k, \\
g_i'(\bar x)x < 0 \text{ for } i = k+1, \ldots, m \text{ with } g_i(\bar x) = 0.
\end{cases}
\tag{1.3.4}
\]
This regularity condition is referred to as the Mangasarian–Fromovitz constraint qualification. Other variants of conditions which imply existence of Lagrange multipliers are named after F. John, Karush–Kuhn–Tucker (KKT), and Arrow, Hurwicz, Uzawa.

Theorem 1.6. If the solution $x^*$ of (1.1.1) is regular, then there exists an associated Lagrange multiplier $\lambda^* \in Z^*$.

Proof. By Proposition 1.2 we have $f'(x^*)x \ge 0$ for all $x \in T(M, x^*)$. From Theorem 1 and Corollary 2 in [We], which can be viewed as generalizations of Lyusternik's theorem, it follows that $L(M, x^*) \subset T(M, x^*)$ and hence
\[ f'(x^*)x \ge 0 \quad \text{for all } x \in L(M, x^*). \tag{1.3.5} \]
We define the set
\[ B = \{(f'(x^*)x + r,\ g'(x^*)x - y) : r \ge 0,\ x \in C(x^*),\ y \in K(g(x^*))\} \subset \mathbb{R} \times Z. \]
This is a convex cone containing the origin $(0, 0) \in \mathbb{R} \times Z$. Due to (1.3.5) the origin is a boundary point of $B$. By the regular point condition and Theorem 1.4 with $T = g'(x^*)$ and $\bar y = g(x^*)$ there exists $\rho > 0$ such that
\[ \{(\alpha, y) : \alpha \ge \max\{f'(x^*)x : x \in (C - x^*) \cap X_1\} \text{ and } y \in Z_\rho\} \subset B, \]
and hence $\operatorname{int} B \ne \emptyset$. Consequently there exists a hyperplane in $\mathbb{R} \times Z$ which supports $B$ at $(0, 0)$, i.e., there exists $(\alpha, \lambda^*) \ne (0, 0) \in \mathbb{R} \times Z^*$ such that
\[ \alpha(f'(x^*)x + r) + \lambda^*(g'(x^*)x - y) \ge 0 \quad \text{for all } x \in C(x^*),\ y \in K(g(x^*)),\ r \ge 0. \tag{1.3.6} \]
Setting $(r, x) = (0, 0)$ this implies that $\langle \lambda^*, y\rangle \le 0$ for all $y \in K(g(x^*))$ and consequently
\[ \lambda^* \in K^+ \quad \text{and} \quad \lambda^* g(x^*) = 0. \tag{1.3.7} \]
The choice $(x, y) = (0, 0)$ in (1.3.6) implies $\alpha \ge 0$. If $\alpha = 0$, then due to regularity of $x^*$ we have $\lambda^* = 0$, which is impossible. Consequently $\alpha > 0$, and without loss of generality we can assume that $\alpha = 1$. Finally with $(r, y) = (0, 0)$, inequality (1.3.6) implies
\[ f'(x^*) + \lambda^* g'(x^*) \in -C(x^*)^+. \]
This concludes the proof.

We close this subsection with two archetypical problems.

Equality-constrained problem. We consider
\[ \min f(x) \quad \text{subject to } g(x) = 0. \tag{1.3.8} \]
Let $x^*$ denote a local solution at which (1.1.2) is satisfied and such that $g'(x^*) : X \to Z$ is surjective. Then there exists $\lambda^* \in Z^*$ such that the KKT condition
\[ f'(x^*) + \lambda^* g'(x^*) = 0 \quad \text{in } X^* \]
holds. Surjectivity of $g'(x^*)$ is of course only a sufficient, not a necessary, condition for existence of a Lagrange multiplier. This topic is taken up in Example 1.12 and below.

Inequality-constrained problem. This is the inequality-constrained problem with the so-called unilateral box, or simple, constraints. We suppose that $Z = L^2(\Omega)$ with $\Omega$ a domain in $\mathbb{R}^n$ and consider
\[ \min f(x) \quad \text{subject to } g(x(s)) \le 0 \text{ for a.e. } s \in \Omega. \tag{1.3.9} \]
Setting $C = X$ and $K = \{v \in L^2(\Omega) : v \le 0\}$, this problem becomes a special case of (1.1.1). Let $x^*$ denote a local solution at which (1.1.2) is satisfied and such that $g'(x^*) : X \to Z$ is surjective. Then there exists $\lambda^* \in L^2(\Omega)$ such that the KKT conditions
\[
\begin{cases}
f'(x^*) + \lambda^* g'(x^*) = 0 \text{ in } X^*, \\
\lambda^*(s) \ge 0,\quad g(x^*(s)) \le 0,\quad \lambda^*(s)\, g(x^*(s)) = 0 \text{ for a.e. } s \in \Omega
\end{cases}
\tag{1.3.10}
\]
hold. Again surjectivity of $g'(x^*)$ is a sufficient, but not a necessary, condition for (1.3.10) to hold.

For later use let us consider the case where the functionals $f'(x^*) \in X^*$ and $\lambda^* \in Z^*$ are represented by duality pairings, i.e.,
\[ f'(x^*)(x) = \langle J_X f'(x^*), x\rangle_{X^*,X}, \qquad \lambda^*(z) = \langle J_Z \lambda^*, z\rangle_{Z^*,Z} \]
for $x \in X$ and $z \in Z$. Here $J_X$ is the canonical isomorphism from $\mathcal{L}(X, \mathbb{R})$ into the space of representations of continuous linear functionals on $X$ with respect to the $\langle\cdot,\cdot\rangle_{X^*,X}$ duality pairing, and $J_Z$ is defined analogously. In finite dimensions these isomorphisms are transpositions. With this notation $f'(x^*) + \lambda^* g'(x^*) = 0$ can be expressed as
\[ J_X f'(x^*) + g'(x^*)^* J_Z \lambda^* = 0. \]
Henceforth the notation of the isomorphisms will be omitted.
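In finite dimensions the KKT condition for the equality-constrained problem can be checked directly. The following is a minimal sketch, assuming a strictly convex quadratic cost $f(x) = \frac12 x^T Q x - b^T x$ and a linear constraint $g(x) = Ax - c$ with $A$ of full row rank (so that $g'(x^*) = A$ is surjective). The data `Q`, `b`, `A`, `c` and the helper `solve_equality_qp` are illustrative choices, not taken from the text.

```python
import numpy as np

def solve_equality_qp(Q, b, A, c):
    """Solve min 0.5*x^T Q x - b^T x subject to A x = c via the KKT system
        [Q  A^T] [x  ]   [b]
        [A  0  ] [lam] = [c].
    """
    n, m = Q.shape[0], A.shape[0]
    kkt = np.block([[Q, A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([b, c])
    sol = np.linalg.solve(kkt, rhs)
    return sol[:n], sol[n:]

# Hypothetical data: Q positive definite, A with linearly independent rows.
Q = np.diag([2.0, 1.0, 4.0])
b = np.array([1.0, -2.0, 0.5])
A = np.array([[1.0, 1.0, 1.0], [1.0, -1.0, 0.0]])
c = np.array([1.0, 0.0])

x_star, lam_star = solve_equality_qp(Q, b, A, c)

grad_f = Q @ x_star - b                  # representation of f'(x*)
stationarity = grad_f + A.T @ lam_star   # f'(x*) + g'(x*)^T lambda*
feasibility = A @ x_star - c             # g(x*)
print(stationarity, feasibility)
```

Here the transposition `A.T` plays the role of the adjoint $g'(x^*)^*$, in line with the remark that the isomorphisms reduce to transpositions in finite dimensions.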
1.4 Applications

This section is devoted to a discussion of examples in which the general existence result of the previous section is applicable, as well as to other examples in which it fails. Throughout $\Omega$ denotes a bounded domain in $\mathbb{R}^n$ with Lipschitz continuous boundary $\partial\Omega$. We use standard Hilbert space notation as introduced in [Ad], for example. Differently from the previous sections we shall use $J$ to denote the cost functional and $f$ for inhomogeneities in equality constraints representing partial differential equations. Moreover the constraints $g(x) \in K$ from the previous sections will be split into equality constraints, denoted by $e(x) = 0$, and inequality constraints, $g(x) \in K$, with $K$ a nontrivial cone.

Example 1.7. Consider the unilateral obstacle problem
\[
\begin{cases}
\min J(y) = \dfrac12 \displaystyle\int_\Omega |\nabla y|^2\, dx - \int_\Omega f y\, dx \\
\text{over } y \in H_0^1(\Omega) \text{ and } y(x) \le \psi(x) \text{ for a.e. } x \in \Omega,
\end{cases}
\tag{1.4.1}
\]
where $f \in L^2(\Omega)$ and $\psi \in H_0^1(\Omega)$. This is a special case of (1.1.1) with $X = Z = H_0^1(\Omega)$, $K = \{\varphi \in H_0^1(\Omega) : \varphi(x) \le 0 \text{ for a.e. } x \in \Omega\}$, $C = X$, and $g(y) = y - \psi$. It is well known and simple to argue that (1.4.1) admits a unique solution $y^* \in H_0^1(\Omega)$. Moreover, since $g'$ is the identity, every element of $X$ satisfies the regular point condition. Hence by Theorem 1.6 there exists $\lambda^* \in H_0^1(\Omega)^*$ such that
\[
\begin{cases}
\langle \lambda^*, \varphi\rangle_{H^{-1},H_0^1} \le 0 \text{ for all } \varphi \in K, \\
\langle \lambda^*, y^* - \psi\rangle_{H^{-1},H_0^1} = 0, \\
-\Delta y^* + \lambda^* = f \text{ in } H^{-1},
\end{cases}
\]
where $H^{-1} = H_0^1(\Omega)^*$. Under well-known regularity assumptions on $\psi$ and $\partial\Omega$, cf. [Fr, IK1], for example, $\lambda^* \in L^2(\Omega)$. This extra smoothness of the Lagrange multiplier does not follow from the general multiplier theory of the previous section.

The following result is useful to establish the regular point condition in diverse optimal control and parameter estimation problems. We require some notation. Let $L$ be a closed convex cone in the real Hilbert space $U$ which induces an ordering denoted according to $u \ge 0$ if $u \in L$. Let $q^* \in U^*$ be a nontrivial bounded linear functional on $U$, with $\pi : U \to \ker q^*$ the orthogonal projection. For $\varphi \in U$, $\mu \in \mathbb{R}$, and $\gamma \in \mathbb{R}^+$ define $g : U \to Z := U \times \mathbb{R} \times \mathbb{R}$ by
\[ g(u) = \big(u - \varphi,\ |u|^2 - \gamma^2,\ q^*(u) - \mu\big), \]
where $|\cdot|$ denotes the norm in $U$, and put $K = L \times \mathbb{R}^- \times \{0\} \subset Z$.

Proposition 1.8. Assume that $[\ker q^*]^\perp \cap L \ne \{0\}$ and let $h_0 \in [\ker q^*]^\perp \cap L$ with $q^*(h_0) = 1$. If $q^*(L) \subset \mathbb{R}^+$, $q^*(\varphi) < \mu$, and $|\pi(\varphi)|^2 + \mu^2 |h_0|^2 < \gamma^2$, then the set $M = \{u \in U : g(u) \in K\}$
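is nonempty and every element $u \in M$ is regular, i.e.,
\[ 0 \in \operatorname{int}\{g(u) + g'(u)U - K\} \quad \text{for each } u \in M. \tag{1.4.2} \]
Proof. Note that $U = \ker q^* \oplus \operatorname{span}\{h_0\}$ and that the orthogonal projection onto $[\ker q^*]^\perp$ is given by $q^*(u)h_0$ for $u \in U$. To show that $M$ is nonempty, we define $\hat u = \varphi + (\mu - q^*(\varphi))h_0$. Observe that
\[ \hat u - \varphi = (\mu - q^*(\varphi))h_0 \in L, \qquad |\hat u|^2 = |\pi(\varphi)|^2 + \mu^2 |h_0|^2 < \gamma^2, \]
and $q^*(\hat u) = \mu$. Thus $\hat u \in M$. Next let $u \in M$ be arbitrary. We have to verify that
\[ 0 \in \operatorname{int}\big\{\big(u - \varphi + h - L,\ |u|^2 - \gamma^2 + 2(u, h) - \mathbb{R}^-,\ q^*(h)\big) : h \in U\big\} \subset Z, \tag{1.4.3} \]
where $(\cdot,\cdot)$ denotes the inner product in $U$. Put
\[ \delta = \min\left\{ \frac{\mu - q^*(\varphi)}{|q^*| + 1},\ \frac{\gamma^2 - |\pi(\varphi)|^2 - \mu^2 |h_0|^2}{1 + 2\gamma + 2\gamma |h_0|} \right\}, \]
and define $B = \{(\tilde u, \tilde r, \tilde s) \in Z : |(\tilde u, \tilde r, \tilde s)|_Z \le \delta\}$. Without loss of generality we endow the product space with the supremum norm in this proof. We shall show that
\[ B \subset \big\{\big(u - \varphi + h_1 + \sigma h_0 - L,\ |u|^2 - \gamma^2 + 2(u, h_1 + \sigma h_0) - \mathbb{R}^-,\ \sigma q^*(h_0)\big) : h_1 \in \ker q^*,\ \sigma \in \mathbb{R}\big\}, \tag{1.4.4} \]
which implies (1.4.3). Let $(\tilde u, \tilde r, \tilde s) \in B$ be chosen arbitrarily. We put $\sigma = \tilde s$ and verify that there exists $(h_1, l, r^-) \in \ker q^* \times L \times \mathbb{R}^-$ such that
\[ \big(u - \varphi + h_1 + \tilde s h_0 - l,\ |u|^2 - \gamma^2 + 2(u, h_1 + \tilde s h_0) - r^-\big) = (\tilde u, \tilde r). \tag{1.4.5} \]
This will imply the claim. Observe that by the choice of $\delta$ we find $\mu - q^*(\varphi) - q^*(\tilde u) + \tilde s \ge 0$. Hence $l$ is defined by
\[ l = \big(\mu - q^*(\varphi) - q^*(\tilde u) + \tilde s\big) h_0 \in L. \]
Choosing $h_1 = \pi(\varphi - u + \tilde u)$ we obtain equality in the first coordinate of (1.4.5). For the second coordinate in (1.4.5) we observe that
\[
\begin{aligned}
|u|^2 - \gamma^2 + 2(u, h_1 + \tilde s h_0)
&= |\pi u|^2 + \mu^2 |h_0|^2 - \gamma^2 + 2\big(u, \pi(\varphi - u + \tilde u) + \tilde s h_0\big) \\
&\le |\pi u|^2 + \mu^2 |h_0|^2 - \gamma^2 + |\pi u|^2 + |\pi \varphi|^2 - 2|\pi u|^2 + 2\gamma |\tilde u + \tilde s h_0| \\
&\le \mu^2 |h_0|^2 - \gamma^2 + |\pi \varphi|^2 + 2\delta\gamma(1 + |h_0|) \le -\delta
\end{aligned}
\]
by the definition of $\delta$. Hence there exists $r^- \in \mathbb{R}^-$ such that equality holds in the second coordinate of (1.4.5).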
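The first part of the proof, the construction of $\hat u \in M$, can be illustrated numerically. The sketch below assumes a finite-dimensional stand-in that is not part of the text: $U = \mathbb{R}^n$ with the Euclidean inner product, $L$ the nonnegative orthant, and $q^*(u) = q \cdot u$ for a vector `q` with positive entries, so that $h_0 = q/|q|^2$ lies in $[\ker q^*]^\perp \cap L$ with $q^*(h_0) = 1$. The values of `phi`, `mu`, and `gamma` are chosen only so that the hypotheses of Proposition 1.8 hold.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

q = rng.uniform(0.5, 1.5, size=n)   # positive weights, hence q*(L) is contained in R+
h0 = q / (q @ q)                    # spans [ker q*]^perp, lies in L, q*(h0) = 1

def q_star(u):
    return q @ u

def proj_ker(u):
    # Orthogonal projection onto ker q*: remove the component along h0.
    return u - q_star(u) * h0

phi = rng.uniform(-1.0, 0.0, size=n)
mu = q_star(phi) + 0.3              # guarantees q*(phi) < mu
bound = proj_ker(phi) @ proj_ker(phi) + mu**2 * (h0 @ h0)
gamma = np.sqrt(bound) + 0.1        # guarantees |pi(phi)|^2 + mu^2 |h0|^2 < gamma^2

# The element constructed in the proof.
u_hat = phi + (mu - q_star(phi)) * h0

print("u_hat - phi in L:    ", bool(np.all(u_hat - phi >= 0)))
print("|u_hat|^2 < gamma^2: ", bool(u_hat @ u_hat < gamma**2))
print("q*(u_hat) = mu:      ", bool(np.isclose(q_star(u_hat), mu)))
```

The three printed checks correspond to the three components of $g(\hat u) \in K = L \times \mathbb{R}^- \times \{0\}$.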
Remark 1.4.1. If the constraint $q^*(u) = \mu$ is not present in the definition of $M$, then the conclusion of Proposition 1.8 holds provided that $|\varphi| < \gamma$.

Example 1.9. We consider a least squares formulation for the estimation of the potential $c$ in
\[
\begin{cases}
-\Delta y + c y = f & \text{in } \Omega, \\
y = 0 & \text{on } \partial\Omega
\end{cases}
\tag{1.4.6}
\]
from an observation $z \in L^2(\Omega)$, where $f \in L^2(\Omega)$ and $\Omega \subset \mathbb{R}^n$, with $n \le 4$. For this purpose we set
\[ M = \{c \in L^2(\Omega) : c(x) \ge 0,\ |c| \le \gamma\} \]
for some $\gamma > 0$ and consider
\[
\begin{cases}
\min J(c) = \dfrac12 |y(c) - z|^2 + \dfrac{\alpha}{2} |c|^2 \\
\text{subject to } c \in M \text{ and } y(c) \text{ solution to (1.4.6)}.
\end{cases}
\tag{1.4.7}
\]
The Lax–Milgram theorem implies the existence of a variational solution $y(c) \in H_0^1(\Omega)$ for every $c \in M$. Here we use the fact that $H^1(\Omega) \subset L^4(\Omega)$ and $cy \in L^{4/3}(\Omega)$ for $c \in L^2(\Omega)$, $y \in H^1(\Omega)$, for $n \le 4$. Proposition 1.8 with $L = \{c \in L^2(\Omega) : c(x) \ge 0\}$ and
\[ g(c) = \big(c,\ |c|^2 - \gamma^2\big) \]
is applicable and implies that every element of $M$ is a regular point. Using subsequential weak limit arguments one can argue the existence of a solution $c^*$ to (1.4.7). To apply Theorem 1.6, Fréchet differentiability of $c \to J(c)$ at $c^*$ must be verified. This will follow from Fréchet differentiability of $c \to y(c)$ at every $c \in M$, which in turn can be argued by means of the implicit function theorem, for example. The Fréchet derivative of $c \to y(c)$ at $c \in M$ in direction $h \in L^2(\Omega)$, denoted by $y'(c)h$, satisfies
\[
\begin{cases}
-\Delta (y'(c)h) + c\, y'(c)h = -h\, y(c) & \text{in } \Omega, \\
y'(c)h = 0 & \text{on } \partial\Omega.
\end{cases}
\tag{1.4.8}
\]
With these preliminaries we obtain a Lagrange multiplier $(\mu_1, \mu_2) \in -L \times \mathbb{R}^+$ such that the following first order necessary optimality system holds:
\[
\begin{cases}
\big(y(c^*) - z,\ y'(c^*)h\big)_{L^2} + \alpha (c^*, h) - (\mu_1, h) + 2\mu_2 (c^*, h) = 0 \quad \text{for all } h \in L^2(\Omega), \\
-(\mu_1, c^*) + \mu_2 \big(|c^*|^2 - \gamma^2\big) = 0, \\
(\mu_1, \mu_2) \in L \times \mathbb{R}^+.
\end{cases}
\tag{1.4.9}
\]
The first equation in (1.4.9) can be simplified by introducing $p(c^*)$ as the variational solution to the adjoint equation given by
\[
\begin{cases}
-\Delta p + c^* p = -(y(c^*) - z) & \text{in } \Omega, \\
p = 0 & \text{on } \partial\Omega.
\end{cases}
\tag{1.4.10}
\]
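Using (1.4.8) and (1.4.10) in (1.4.9) we obtain the optimality system
\[
\begin{cases}
p(c^*)\, y(c^*) + \alpha c^* - \mu_1 + 2\mu_2 c^* = 0, \\
(\mu_1, \mu_2) \in -L \times \mathbb{R}^+, \quad \big(c^*,\ |c^*|^2 - \gamma^2\big) \in L \times \mathbb{R}^-, \\
(\mu_1, c^*) = 0, \quad \mu_2 \big(|c^*|^2 - \gamma^2\big) = 0.
\end{cases}
\tag{1.4.11}
\]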
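The role of the adjoint state in the first equation of (1.4.11) can be illustrated with a small numerical experiment. The sketch below is not the authors' discretization. It assumes a one-dimensional domain $(0,1)$, a second-order finite-difference Laplacian with homogeneous Dirichlet conditions, and illustrative data `f`, `z`, `alpha`. The constraints defining $M$ are ignored, so only the unconstrained part $p\,y + \alpha c$ of the gradient is checked, against a central finite difference of $J$.

```python
import numpy as np

n = 50                                   # interior grid points (illustrative)
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)

# Finite-difference approximation of -d^2/dx^2 with Dirichlet boundary conditions.
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
f = np.ones(n)                           # hypothetical right-hand side
z = 0.1 * np.sin(np.pi * x)              # hypothetical observation
alpha = 1e-2                             # hypothetical regularization weight

def solve_state(c):
    # Discrete analogue of (1.4.6): (-Laplace + c) y = f.
    return np.linalg.solve(A + np.diag(c), f)

def cost(c):
    y = solve_state(c)
    return 0.5 * h * np.sum((y - z) ** 2) + 0.5 * alpha * h * np.sum(c ** 2)

def reduced_gradient(c):
    y = solve_state(c)
    # Discrete analogue of the adjoint equation (1.4.10).
    p = np.linalg.solve(A + np.diag(c), -(y - z))
    # L^2 representation of J'(c): p*y + alpha*c (no constraint multipliers).
    return p * y + alpha * c

c = 1.0 + x                              # a positive test coefficient
d = np.cos(np.pi * x)                    # a test direction
eps = 1e-6

fd = (cost(c + eps * d) - cost(c - eps * d)) / (2.0 * eps)
adj = h * np.sum(reduced_gradient(c) * d)   # discrete L^2 inner product
print(fd, adj)
```

One adjoint solve yields the full gradient, whereas a finite-difference gradient would need one state solve per component of $c$; this is the practical reason for introducing $p$.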
Example 1.10. We revisit Example 1.9 but this time the state variable is considered as an independent variable. This results in the following problem with equality and inequality constraints:
\[
\begin{cases}
\min J(y, c) = \dfrac12 |y - z|^2 + \dfrac{\alpha}{2} |c|^2 \\
\text{subject to } e(y, c) = 0 \text{ and } g(c) \in L \times \mathbb{R}^-,
\end{cases}
\tag{1.4.12}
\]
where $g$ is as in Example 1.9 and $e : H_0^1(\Omega) \times L^2(\Omega) \to H^{-1}(\Omega)$ is defined by
\[ e(y, c) = -\Delta y + c y - f. \]
Note that $cy \in L^{4/3}(\Omega) \subset H^{-1}(\Omega)$ for $c \in L^2(\Omega)$, $y \in H^1(\Omega)$ if $n \le 4$. Let $(c^*, y^*) \in L^2(\Omega) \times H_0^1(\Omega)$ denote a solution of (1.4.12). Fréchet differentiability of $J$, $e$, and $g$ is obvious for this formulation with $y$ as independent variable. The regular point condition now has to be established for the constraint
\[ \begin{pmatrix} e \\ g \end{pmatrix} \in K = \{0\} \times L \times \mathbb{R}^- \subset H^{-1}(\Omega) \times L^2(\Omega) \times \mathbb{R}. \]
The second and third components are treated as in Example 1.9 by means of Proposition 1.8 and the first coordinate is incorporated by an existence argument for (1.4.6). The details are left to the reader. Consequently there exists a Lagrange multiplier $(p, \mu_1, \mu_2) \in H_0^1(\Omega) \times L \times \mathbb{R}^+$ such that the derivative of
\[ L(y, c, p, \mu_1, \mu_2) = J(y, c) + \langle e(y, c), p\rangle_{H^{-1},H_0^1} + (\mu_1, -c) + \mu_2 \big(|c|^2 - \gamma^2\big) \]
with respect to $(y, c)$ is zero at $(y^*, c^*, p, \mu_1, \mu_2)$. This implies that $p$ is the variational solution in $H_0^1(\Omega)$ to
\[
\begin{cases}
-\Delta p + c^* p = -(y^* - z) & \text{in } \Omega, \\
p = 0 & \text{on } \partial\Omega
\end{cases}
\]
and
\[ p\, y^* + \alpha c^* - \mu_1 + 2\mu_2 c^* = 0. \]
In addition, of course,
\[ (\mu_1, \mu_2) \in -L \times \mathbb{R}^+, \quad \big(c^*,\ |c^*|^2 - \gamma^2\big) \in L \times \mathbb{R}^-, \quad (\mu_1, c^*) = 0, \quad \mu_2 \big(|c^*|^2 - \gamma^2\big) = 0. \]
Thus the first order optimality condition derived in Example 1.9 above and the present one coincide, and the adjoint variable of Example 1.9 becomes the Lagrange multiplier of the equality constraint $e(y, c) = 0$.

Example 1.11. We consider a least squares formulation for the estimation of the diffusion coefficient $a$ in
\[
\begin{cases}
-(a y_x)_x + c y = f & \text{in } \Omega = (0, 1), \\
y(0) = y(1) = 0,
\end{cases}
\tag{1.4.13}
\]
where $f \in L^2(0, 1)$ and $c \in L^2(\Omega)$, $c \ge 0$, are fixed. We assume that $a$ is known outside of an interval $I = (\beta, \gamma)$ with $0 < \beta < \gamma < 1$ and that it coincides there with the values of a known function $\nu \in H^1(0, 1)$, which satisfies $\min\{\nu(x) : x \in (0, 1)\} > 0$. The set of admissible parameters is then defined by
\[ M = \Big\{a \in H^1(I) : a \ge \nu,\ |a - \hat a|_{H^1(I)} \le \gamma,\ a(\beta) = \nu(\beta),\ a(\gamma) = \nu(\gamma),\ \text{and } \int_\beta^\gamma a\, dx = m\Big\}. \]
Here $\hat a \in H^1(I)$ is a fixed reference parameter with $\hat a(\beta) = \nu(\beta)$, $\hat a(\gamma) = \nu(\gamma)$, and $m = \int_\beta^\gamma \hat a\, dx$. Recall that $H^1(I) \subset C(\bar I)$ and hence $a(\beta)$ and $a(\gamma)$ are well defined. The coefficient $a$ appearing in (1.4.13) coincides with $\nu$ on the complement of $I$ and with some element of $M$ on $I$. Consider the least squares formulation
\[
\begin{cases}
\min \dfrac12 |y(a) - z|^2_{L^2(0,1)} + \dfrac{\alpha}{2} |a|^2_{H^1(I)} \\
\text{subject to } a \in M \text{ and } y(a) \text{ solution to (1.4.13)},
\end{cases}
\tag{1.4.14}
\]
where $z \in L^2(0, 1)$ is given. It is simple to argue the existence of a solution $a^*$ to (1.4.14). Note that minimizing over $M$ implies that the mean of the coefficient $a$ is known. This, together with some additional technical assumptions, implies that the solution $a^*$ is (locally) stable with respect to perturbations of $z$ [CoKu]. To argue the existence of Lagrange multipliers associated to the constraints defining $M$ one considers the set of shifted parameters $\tilde M = \{a - \hat a : a \in M\}$, i.e.,
\[ \tilde M = \Big\{a \in H^1(I) : a \ge \nu - \hat a,\ |a|_{H^1(I)} \le \gamma,\ \int_\beta^\gamma a\, dx = 0,\ a(\beta) = a(\gamma) = 0\Big\}. \]
We apply Proposition 1.8 with $U = H_0^1(I)$ and $(|u|^2_{L^2(I)} + |u_x|^2_{L^2(I)})^{1/2}$ as norm, $L = \{a \in U : a \ge 0\}$, $q^*(a) = \int_\beta^\gamma a\, dx$, $\varphi = \nu - \hat a \in U$, and $\mu = 0$. The mapping $g : U \to U \times \mathbb{R} \times \mathbb{R}$ is given by
\[ g(a) = \big(a - \varphi,\ |a|^2_{H^1(I)} - \gamma^2,\ q^*(a)\big). \]
Let us assume that $\int_\beta^\gamma \nu\, dx < m$ and $|\nu - \hat a|_{H^1(I)} < \gamma$. Then we have $q^*(L) \subset \mathbb{R}^+$, $q^*(\varphi) < m - \int_\beta^\gamma \hat a\, dx = \mu$, and
\[ |\pi\varphi|_{H^1(I)} = |\pi(\nu - \hat a)|_{H^1(I)} \le |\nu - \hat a|_{H^1(I)} < \gamma. \]
It remains to ascertain that $[\ker q^*]^\perp \cap L$ contains a nonzero element. Let $\psi$ be the unique solution of $-\psi_{xx} + \psi = 1$ on $(\beta, \gamma)$, $\psi(\beta) = \psi(\gamma) = 0$. By the maximum principle $\psi \in L$. A short calculation shows that $\psi \in [\ker q^*]^\perp$ and hence $h_0 := q^*(\psi)^{-1}\psi$ satisfies $h_0 \in [\ker q^*]^\perp \cap L$ and $q^*(h_0) = 1$.

Next we present examples where the general existence result of Section 1.3 is not applicable. Different techniques to be developed in the following sections will guarantee, however, the existence of a Lagrange multiplier. In these examples we shall again use $e$ for equality and $g$ for inequality constraints.

Example 1.12. Consider the problem
\[ \min x_1^2 + x_3^2 \quad \text{subject to } e(x_1, x_2, x_3) = 0, \tag{1.4.15} \]
where
\[ e(x_1, x_2, x_3) = \begin{pmatrix} x_1 - x_2^2 - x_3 \\ x_2^2 - x_3^2 \end{pmatrix}. \]
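For this finite-dimensional constraint the failure of surjectivity at the origin, and the multiplier condition of Definition 1.1 with $C = X$ and $K = \{0\}$, can be checked by hand or numerically. The following sketch does the latter; the function names `e`, `jac_e`, `grad_f`, and `residual` are illustrative helpers, and the Jacobian is entered by hand from the formula for $e$.

```python
import numpy as np

def e(x):
    x1, x2, x3 = x
    return np.array([x1 - x2**2 - x3, x2**2 - x3**2])

def jac_e(x):
    # Jacobian of e, differentiated by hand.
    x1, x2, x3 = x
    return np.array([[1.0, -2.0 * x2, -1.0],
                     [0.0,  2.0 * x2, -2.0 * x3]])

def grad_f(x):
    # Gradient of f(x) = x1^2 + x3^2.
    return np.array([2.0 * x[0], 0.0, 2.0 * x[2]])

x_star = np.zeros(3)                 # feasible, and f >= 0 = f(x_star)
J = jac_e(x_star)
rank = np.linalg.matrix_rank(J)      # rank < 2 means e'(x*) is not surjective

def residual(lam):
    # f'(x*) + e'(x*)^T lam, which must vanish for a Lagrange multiplier.
    return grad_f(x_star) + J.T @ np.asarray(lam, dtype=float)

print("rank of e'(x*):", rank)
print("residual for (0, 3.7):", residual([0.0, 3.7]))
print("residual for (1, 0):  ", residual([1.0, 0.0]))
```

The residual vanishes for every multiplier of the form $(0, \lambda_2)$ and fails to vanish as soon as the first component is nonzero.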
Here $X = \mathbb{R}^3$, $Z = \mathbb{R}^2$, $g$ of Section 1.3 is $e$ here, and $K = \{0\}$. Note that $x^* = (0, 0, 0)$ is a solution of (1.4.15) and that $\nabla e(x^*)$ is not surjective. Hence $x^*$ is not a regular point for the constraint $e(x) = 0$. Nevertheless $(0, \lambda_2)$ with $\lambda_2 \in \mathbb{R}$ arbitrary is a Lagrange multiplier for the constraint in (1.4.15). Note that in case $C = X$ and $K = \{0\}$, the last requirement in Definition 1.1 becomes
\[ f'(x^*) + e'(x^*)^* \lambda^* = 0 \quad \text{in } X^*, \]
where $e'(x^*)^* : Z^* \to X^*$ is the adjoint of $e'(x^*)$. Hence if $e'(x^*)$ is not surjective and if it has closed range, then $\ker e'(x^*)^*$ is nontrivial and the Lagrange multiplier is not unique.

Example 1.13. Here we consider optimal control of the equation with exponential nonlinearity (Bratu problem)
\[
\begin{cases}
-\Delta y + \exp(y) = u & \text{in } \Omega, \\
\dfrac{\partial y}{\partial n} = 0 \text{ on } \Gamma \quad \text{and} \quad y = 0 \text{ on } \partial\Omega \setminus \Gamma,
\end{cases}
\tag{1.4.16}
\]
where $\Omega$ is a bounded domain in $\mathbb{R}^n$, $n \le 3$, with smooth boundary $\partial\Omega$ and $\Gamma$ is a connected, strict subset of $\partial\Omega$. Let $H_\Gamma^1(\Omega) = \{y \in H^1(\Omega) : y = 0 \text{ on } \partial\Omega \setminus \Gamma\}$. The following lemma establishes the concept of solution to (1.4.16). Its proof is given at the end of this section.

Lemma 1.14. For every $u \in L^2(\Omega)$ the variational problem
\[ (\nabla y, \nabla v) + \langle \exp y, v\rangle_{H_\Gamma^1(\Omega)^*, H_\Gamma^1(\Omega)} = (u, v) \quad \text{for } v \in H_\Gamma^1(\Omega) \tag{1.4.17} \]
has a unique solution $y = y(u) \in H_\Gamma^1(\Omega)$ and there exists a constant $C$ such that
\[ |y|_{H_\Gamma^1 \cap L^\infty} \le C\big(|u|_{L^2(\Omega)} + C\big) \quad \text{for all } u \in L^2(\Omega) \]
and |y(u1 ) − y(u2 )|H1 ∩L∞ ≤ C|u1 − u2 |L2 () for all ui ∈ L2 (), i = 1, 2. The optimal control problem is given by ⎧
2
2 min J (y, u) = 21 |y − z L2 () + α2 |u L2 () ⎪ ⎪ ⎪ ⎪ ⎨ subject to (y, u) ∈ H1 () ∩ L∞ () × L2 () ⎪ ⎪ ⎪ ⎪ ⎩ and (y, u) solution to (1.4.17),
(1.4.18)
(1.4.19)
where z ∈ L2 (). To consider this problem as a special case of (1.1.1) we set X = Y × L2 (), where Y = H1 () ∩ L∞ (), Z = H1 ()∗ , K = {0}, and define e : X → Z ∗ as the mapping which assigns to (y, u) ∈ Y × U the functional v → (∇y, ∇v) + (exp(y), v) − (u, v). The control component of any minimizing sequence {(yn , un )} for J is bounded. Together with Lemma 1.14 it is therefore simple to argue that (1.4.19) admits a solution (y ∗ , u∗ ) ∈ Y × U . Clearly J and e are continuously Fréchet differentiable. If (y ∗ , u∗ ) was a regular point, then this would require surjectivity of e (y ∗ , u∗ ) : X → H1 ()∗ . For (δy, δu) ∈ X, e (y ∗ , u∗ )(δy, δu) is the functional defined by v → (∇δy, ∇v) + (exp(y ∗ )δy − δu, v),
v ∈ H1 ().
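Since $-\Delta + \exp(y^*)$ is an isomorphism from $H^1_\Gamma(\Omega)$ to $H^1_\Gamma(\Omega)^*$ and $H^1_\Gamma(\Omega)$ is not contained in $L^\infty(\Omega)$ if $n > 1$, it follows that $e'(y^*, u^*) : X \to H^1_\Gamma(\Omega)^*$ is not surjective if $n > 1$.

Example 1.15. We consider the least squares problem for the estimation of the vector-valued convection coefficient $u$ in
\[ \begin{cases} -\Delta y + u \cdot \nabla y = f & \text{in } \Omega, \\ y = 0 \ \text{on } \partial\Omega, \quad \operatorname{div} u = 0, \end{cases} \tag{1.4.20} \]
from data $z \in L^2(\Omega)$. Here $\Omega$ is a bounded domain in $\mathbb{R}^n$, $n \le 3$, with smooth boundary and $f \in L^2(\Omega)$. To cast the problem in abstract form we choose $U = \{u \in L^2_n(\Omega) : \operatorname{div} u = 0\}$, $X = \big(H^1_0(\Omega) \cap L^\infty(\Omega)\big) \times U$, $Z = H^{-1}(\Omega)$, $K = \{0\}$ and define $e : X \to Z$ as the mapping which assigns to $(y, u) \in X$ the functional
\[ v \mapsto (\nabla y, \nabla v) - (u\,y, \nabla v) - (f, v) \quad \text{for } v \in H^1_0(\Omega). \]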
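Note that $e(y, u)$ is not well defined for $(y, u) \in H^1_0(\Omega) \times U$, since $u \cdot \nabla y \in L^1(\Omega)$ only. The regularized least squares problem is given by
\[ \begin{cases} \min\ J(y, u) = \frac{1}{2}\,|y - z|^2_{L^2(\Omega)} + \frac{\alpha}{2}\,|u|^2_{L^2_n(\Omega)} \\ \text{subject to } e(y, u) = 0,\ \operatorname{div} u = 0,\ (y, u) \in X. \end{cases} \tag{1.4.21} \]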
Observe that $(u \cdot \nabla y, y) = (\nabla \cdot (u y), y) = 0$. Using this fact and techniques similar to those of the proof of Lemma 1.14 (compare [Tr, Section 2.3] and [IK14, IK15]) it can be shown that for every $u \in U$ there exists a unique solution $y = y(u) \in X$ of (1.4.20) and for every bounded subset $B$ of $U$ there exists a constant $k(B)$ such that
\[ |y(u)|_X \le k(B)\,|u|_{L^2_n} \quad \text{for all } u \in B. \tag{1.4.22} \]
Extracting a subsequence of a minimizing sequence to (1.4.21) it is simple to argue the existence of a solution $(y^*, u^*) \in X$ to (1.4.21). Clearly $e$ is Fréchet differentiable at $(y^*, u^*)$ and for $\delta y \in H^1_0(\Omega) \cap L^\infty(\Omega)$, $e_y(y^*, u^*)\delta y \in H^{-1}(\Omega)$ is the functional defined by
\[ \langle e_y(y^*, u^*)\delta y, v \rangle_{H^{-1}, H^1_0} = (\nabla \delta y, \nabla v) + (u^* \cdot \nabla \delta y, v) \quad \text{with } v \in H^1_0(\Omega). \]
Note that $e_y(y^*, u^*)$ is well defined on $H^1_0(\Omega) \cap L^\infty(\Omega)$. But as a consequence of the bilinear term $u^* \cdot \nabla \delta y$, which is only in $L^1(\Omega)$, $e_y(y^*, u^*)$ is not defined on $H^1_0(\Omega)$. The operator $e'(y^*, u^*)$ from $X$ to $H^{-1}(\Omega)$ is not surjective if $n > 1$, and hence $(y^*, u^*)$ does not satisfy the regular point condition.

We now turn to the proof of Lemma 1.14 for which we require the following result from [Tr].

Lemma 1.16. Let $\varphi : (k_1, h_1) \to \mathbb{R}$ be a nonnegative, nonincreasing function and suppose that there are positive constants $r$, $K$, and $\beta$, with $\beta > 1$, such that
\[ \varphi(h) \le K\,(h - k)^{-r}\,\varphi(k)^{\beta} \quad \text{for } k_1 < k < h < h_1. \]
If
\[ \hat{k} := K^{\frac{1}{r}}\; 2^{\frac{\beta}{\beta - 1}}\; \varphi(k_1)^{\frac{\beta - 1}{r}} \]
satisfies $k_1 + \hat{k} < h_1$, then $\varphi(k_1 + \hat{k}) = 0$.
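The threshold $\hat{k}$ of Lemma 1.16 can be explored numerically. The following is a minimal sketch, not part of the original text: it evaluates $\hat{k}$ and iterates the worst case of the assumed inequality along the levels $k_n = k_1 + \hat{k}(1 - 2^{-n})$, a standard choice in proofs of this type that we assume here. The function names `k_hat` and `worst_case_levels` are hypothetical.

```python
import math


def k_hat(K, r, beta, phi_k1):
    """Threshold of Lemma 1.16: K^(1/r) * 2^(beta/(beta-1)) * phi(k1)^((beta-1)/r)."""
    if not (K > 0 and r > 0 and beta > 1 and phi_k1 >= 0):
        raise ValueError("need K > 0, r > 0, beta > 1, phi(k1) >= 0")
    return K ** (1.0 / r) * 2.0 ** (beta / (beta - 1.0)) * phi_k1 ** ((beta - 1.0) / r)


def worst_case_levels(K, r, beta, phi_k1, steps):
    """Iterate the bound phi(k_{n+1}) <= K (k_{n+1} - k_n)^(-r) phi(k_n)^beta.

    The levels are k_n = k1 + k_hat * (1 - 2^(-n)), so k_{n+1} - k_n = k_hat * 2^(-(n+1)).
    Returns k_hat and the list of upper bounds for phi(k_0), ..., phi(k_steps).
    """
    kh = k_hat(K, r, beta, phi_k1)
    if kh == 0.0:
        raise ValueError("phi(k1) = 0: nothing to iterate")
    bounds = [phi_k1]
    for n in range(steps):
        gap = kh * 2.0 ** (-(n + 1))
        bounds.append(K * gap ** (-r) * bounds[-1] ** beta)
    return kh, bounds


if __name__ == "__main__":
    # Illustrative values only; r = 4 and beta = 4/3 are the exponents used in
    # the proof of Lemma 1.14.
    kh, bounds = worst_case_levels(K=2.0, r=4.0, beta=4.0 / 3.0, phi_k1=3.0, steps=5)
    print("k_hat =", kh)
    for n, b in enumerate(bounds):
        print(n, b)
```

With this choice of levels the bounds decay geometrically, like $\varphi(k_1)\,2^{-n r/(\beta - 1)}$, which is consistent with $\varphi$ vanishing at the limit level $k_1 + \hat{k}$.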
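Proof of Lemma 1.14. Let us first argue the existence of a solution $y = y(u) \in H^1_\Gamma(\Omega)$ of
\[ (\nabla y, \nabla v) + \langle e^{y}, v \rangle_{H^1_\Gamma(\Omega)^*, H^1_\Gamma(\Omega)} = (u, v) \quad \text{for all } v \in H^1_\Gamma(\Omega). \tag{1.4.23} \]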
Since $(\exp(s_1) - \exp(s_2))(s_1 - s_2) \ge 0$ for $s_1, s_2 \in \mathbb{R}$, it follows that $-\Delta\phi + \exp(\phi) - \gamma I$ defines a maximal monotone operator from a subset of $H^1_\Gamma(\Omega)$ to $H^1_\Gamma(\Omega)^*$ for some $\gamma > 0$ [Ba], and hence (1.4.23) has a unique variational solution $y = y(u) \in H^1_\Gamma(\Omega)$ for every $u \in L^2(\Omega)$; see [IK15] for details. Since $|\nabla(y(u_1) - y(u_2))|^2_{L^2} \le (u_1 - u_2, y(u_1) - y(u_2))$ and $\partial\Omega \setminus \Gamma$ is nonempty, there exists an embedding constant $C$ such that
\[ |y(u_1) - y(u_2)|_{H^1_\Gamma(\Omega)} \le C\,|u_1 - u_2|_{L^2(\Omega)} \tag{1.4.24} \]
for $u_1, u_2$ in $L^2(\Omega)$, and $|y(u)|_{H^1_\Gamma} \le C\,(|u|_{L^2(\Omega)} + C)$ for all $u \in L^2(\Omega)$. Throughout the remainder of the proof $C$ will denote a generic constant, independent of $u \in L^2(\Omega)$. To verify (1.4.17) it remains to obtain an $L^\infty(\Omega)$ bound for $y = y(u)$. The
proof is based on a generalization of well-known $L^\infty(\Omega)$ estimates due to Stampacchia and Miranda [Tr] for linear variational problems to the nonlinear problem (1.4.23). Let us aim first for a pointwise (a.e.) upper bound for $y$. For $k \in (0, \infty)$ we set $y_k = (y - k)^+$ and $\Omega_k = \{x \in \Omega : y_k > 0\}$. Note that $y_k \in H^1_\Gamma(\Omega)$ and $y_k \ge 0$. Using (1.4.23) we find
\[ (\nabla y_k, \nabla y_k) = (\nabla y, \nabla y_k) = (u, y_k) - (e^{y}, y_k) \le (u, y_k), \]
and hence
\[ |\nabla y_k|^2_{L^2} \le |u|_{L^2}\,|y_k|_{L^2}. \tag{1.4.25} \]
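By Hölder's inequality and a well-known embedding result
\[ |y_k|_{L^2} = \Big( \int_{\Omega_k} y_k^2 \Big)^{\frac{1}{2}} \le |y_k|_{L^6}\,|\Omega_k|^{\frac{1}{3}} \le C\,|\nabla y_k|_{L^2}\,|\Omega_k|^{\frac{1}{3}}. \]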
Here we used the assumption that $n \le 3$. Employing this estimate in (1.4.25) implies that
\[ |\nabla y_k|_{L^2} \le C\,|\Omega_k|^{\frac{1}{3}}\,|u|_{L^2}. \tag{1.4.26} \]
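We denote by $h$ and $k$ arbitrary real numbers satisfying $0 < k < h < \infty$, and we find
\[ |y_k|^4_{L^4} = \int_{\Omega_k} (y - k)^4 \ge \int_{\Omega_h} (y - k)^4 \ge |\Omega_h|\,(h - k)^4, \]
which, combined with (1.4.26), gives
\[ |\Omega_h| \le \hat{C}\,(h - k)^{-4}\,|\Omega_k|^{\frac{4}{3}}\,|u|^4_{L^2}, \tag{1.4.27} \]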
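where the constant $\hat{C}$ is independent of $h$, $k$, and $u$. It will be shown that Lemma 1.16 is applicable to (1.4.27) with $\varphi(k) = |\Omega_k|$, $r = 4$, $\beta = \frac{4}{3}$, and $K = \hat{C}\,|u|^4_{L^2}$. The conditions on $k_1$ and $h_1$ can easily be satisfied. In fact, in our case $k_1 = 0$, $h_1 = \infty$, and $\hat{k} = \hat{C}^{\frac{1}{4}}\,|u|_{L^2}\,2^{\frac{\beta}{\beta - 1}}\,|\Omega_0|^{\frac{\beta - 1}{4}}$. The condition $k_1 + \hat{k} < h_1$ is satisfied since
\[ \hat{k} = \hat{C}^{\frac{1}{4}}\,|u|_{L^2}\,2^{\frac{\beta}{\beta - 1}}\,|\Omega_0|^{\frac{\beta - 1}{4}} \le \hat{C}^{\frac{1}{4}}\,|u|_{L^2}\,2^{\frac{\beta}{\beta - 1}}\,|\Omega|^{\frac{\beta - 1}{4}} < \infty. \]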
We conclude that $|\Omega_{\hat{k}}| = 0$ and hence $y \le \hat{k}$ a.e. in $\Omega$. A uniform lower bound on $y$ can be obtained in an analogous manner by considering $y_k = (-(k + y))^+$. We leave the details to the reader. This concludes the proof of (1.4.17).

To verify (1.4.18) the $H^1$ estimate for $y(u_1) - y(u_2)$ is already clear from (1.4.24) and it remains to verify the $L^\infty(\Omega)$ estimate. Let us set $y_i = y(u_i)$, $z = y_1 - y_2$, $z_k = (z - k)^+$, and $\Omega_k = \{x \in \Omega : z_k > 0\}$ for $k \in (0, \infty)$. We obtain
\[ |\nabla z_k|^2_{L^2} = (\nabla z, \nabla z_k) = (u_1 - u_2, z_k) - (e^{y_1} - e^{y_2}, z_k) \le (u_1 - u_2, z_k). \]
Proceeding as above with $y$ and $y_k$ replaced by $z$ and $z_k$ the desired pointwise upper bound for $y_1 - y_2$ is obtained. For the lower bound we define $z_k = (-(k + z))^+$ for $k \in (0, \infty)$ and $\Omega_k = \{x \in \Omega : z_k > 0\} = \{x : k + y_1(x) < y_2(x)\}$. It follows that
\[ |\nabla z_k|^2_{L^2} = -(\nabla(y_1 - y_2), \nabla z_k) = (e^{y_1} - e^{y_2}, z_k) - (u_1 - u_2, z_k) \le -(u_1 - u_2, z_k). \]
From this inequality we obtain the desired uniform pointwise lower bound on $y_1 - y_2$.
1.5 Weakly singular problems

We consider the optimization problem with equality and inequality constraints of the type
\[ \begin{cases} \min\ J(y, u) \\ \text{subject to } e(y, u) = 0,\ u \in C \subset U, \end{cases} \tag{1.5.1} \]
with $J : Y \times U \to \mathbb{R}$, $e : Y_1 \times U \to W$, where $Y, U, W$ are Hilbert spaces and $Y_1$ is a Banach space that is densely embedded in $Y$. Further $C$ is a closed convex subset of $U$. Below $W^*$ will denote the dual of $W$. Let $(y^*, u^*)$ denote a local solution to (1.5.1) and let $V(y^*) \times V(u^*) \subset Y_1 \times U$ denote a neighborhood of $(y^*, u^*)$ such that $J(y^*, u^*) \le J(y, u)$ for all $(y, u) \in V(y^*) \times (V(u^*) \cap C)$ satisfying $e(y, u) = 0$. It is assumed throughout that $J$ is Fréchet differentiable in a neighborhood in the $Y \times U$ topology of $(y^*, u^*)$ and that the Fréchet derivative is locally Lipschitz continuous. Further $e$ is assumed to be Fréchet differentiable at $(y^*, u^*)$ with Fréchet derivative
\[ e'(y^*, u^*)(\delta y, \delta u) = e_y(y^*, u^*)\delta y + e_u(y^*, u^*)\delta u. \]
In particular $e_y(y^*, u^*) \in \mathcal{L}(Y_1, W)$. Since $Y_1$ is dense in $Y$, one may consider $e_y(y^*, u^*)$ as a densely defined linear operator with domain in $Y$. To distinguish this operator from $e_y(y^*, u^*)$ defined on $Y_1$ we shall denote it in this section by $G : D(G) \subset Y \to W$ and we assume that

(H1) $G^* : D(G^*) \subset W^* \to Y^*$ is densely defined.

Then necessarily $G$ is closable [Ka, p. 168]. Its closure will be denoted by the same symbol. In addition the following assumptions will be required:

(H2) $J_y(y^*, u^*) \in \operatorname{Rg} G^*$.

Condition (H2) is a regularity assumption. It implies the existence of a solution $\lambda^* \in D(G^*)$ to the adjoint equation
\[ G^* \lambda + J_y(y^*, u^*) = 0, \]
which is a Lagrange multiplier associated to the constraint $e(y, u) = 0$.

(H3) There exists a dense subset $D$ of $C$ with the following property: For every $u \in D$ there exists $t_u > 0$ such that for all $t \in [0, t_u]$ there exists $y(t) \in Y_1$ satisfying $e(y(t), u^* + t(u - u^*)) = 0$ and
\[ \lim_{t \to 0^+} \frac{1}{t}\,|y(t) - y^*|^2_Y = 0. \tag{1.5.2} \]

(H4) For every $u \in D$ and $y(\cdot)$ as in (H3), $e$ is directionally differentiable at every element of $\{(y^* + s(y(t) - y^*),\ u^* + st(u - u^*)) : s \in [0, 1],\ t \in [0, t_u]\}$ in all directions $(\tilde{y}, \tilde{u}) \in Y_1 \times U$ and
\[ \lim_{t \to 0^+} \frac{1}{t} \Big\langle \int_0^1 \big[ e'(y^* + s(y(t) - y^*), u^* + stv) - e'(y^*, u^*) \big](y(t) - y^*, tv)\,ds,\ \lambda^* \Big\rangle_{W, W^*} = 0, \]
where $v = u - u^*$.
Note that (H4) is satisfied if (H3) holds with $Y$ replaced by $Y_1$ and if $e : Y_1 \to W$ is Fréchet differentiable with locally Lipschitzian derivative. Our assumptions do not require surjectivity of $e'(y^*, u^*) : Y_1 \times U \to W$, which is required by (1.3.1), nor that $e'(y^*, u^*)$ is well defined on all of $Y \times U$. Below we shall give several examples which illustrate the applicability of the assumptions. For now, we argue that if they are satisfied, then a Lagrange multiplier with respect to the equality constraint exists.

Theorem 1.17. Let $(y^*, u^*)$ be a local solution of (1.5.1) and assume that (H1)–(H4) hold. Then
\[ \begin{cases} e(y^*, u^*) = 0 \ \text{in } W & \text{(primal equation)}, \\ G^* \lambda^* + J_y(y^*, u^*) = 0 \ \text{in } Y^* & \text{(adjoint equation)}, \\ \langle e_u(y^*, u^*)^* \lambda^* + J_u(y^*, u^*),\ u - u^* \rangle_{U^*, U} \ge 0 \ \text{for all } u \in C & \text{(optimality)}. \end{cases} \tag{1.5.3} \]
The system of equations (1.5.3) is referred to as an optimality system.

Proof. Let $u \in D$, set $v = u - u^*$, choose $t_u$ according to (H3), and assume that $t \in (0, t_u]$. Due to (H3) and (H4)
\[ \begin{aligned} 0 &= e(y(t), u^* + tv) - e(y^*, u^*) = G(y(t) - y^*) + t\,e_u(y^*, u^*)v \\ &\quad + \int_0^1 \big[ e'(y^* + s(y(t) - y^*), u^* + stv) - e'(y^*, u^*) \big](y(t) - y^*, tv)\,ds. \end{aligned} \tag{1.5.4} \]
(H2) implies the existence of a solution $\lambda^*$ to the adjoint equation. Observe that by (1.5.4) and the fact that $u^*$ is a local solution to (1.5.1)
\[ \begin{aligned} 0 &\le J(y(t), u^* + tv) - J(y^*, u^*) = J'(y^*, u^*)(y(t) - y^*, tv) \\ &\quad + \int_0^1 \big[ J'(y^* + s(y(t) - y^*), u^* + stv) - J'(y^*, u^*) \big](y(t) - y^*, tv)\,ds \\ &\quad + \langle G^* \lambda^*, y(t) - y^* \rangle_{Y^*, Y} + t\,\langle e_u(y^*, u^*)v, \lambda^* \rangle_{W, W^*} \\ &\quad + \int_0^1 \big\langle \big[ e'(y^* + s(y(t) - y^*), u^* + stv) - e'(y^*, u^*) \big](y(t) - y^*, tv),\ \lambda^* \big\rangle_{W, W^*}\,ds. \end{aligned} \]
By the second equation in (1.5.3), local Lipschitz continuity of the derivative of $J$, (H3), (H4), and the fact that $J'(y^*, u^*)(y(t) - y^*, tv) = J_y(y^*, u^*)(y(t) - y^*) + t\,J_u(y^*, u^*)v$ we obtain
\[ 0 \le \lim_{t \to 0^+} \frac{1}{t}\big( J(y(t), u^* + tv) - J(y^*, u^*) \big) = \langle J_u(y^*, u^*) + e_u(y^*, u^*)^* \lambda^*,\ u - u^* \rangle_{U^*, U}. \]
Since $u$ is arbitrary in $D$ and $D$ is dense in $C$ it follows that
\[ \langle J_u(y^*, u^*) + e_u(y^*, u^*)^* \lambda^*,\ u - u^* \rangle_{U^*, U} \ge 0 \quad \text{for all } u \in C. \]
This ends the proof.
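The optimality system (1.5.3) can be checked by hand on a small finite-dimensional instance. The sketch below is an illustration that we add here, not the authors' code: it uses the problem $\min\, y_1^2 + u^2$ subject to $y_1 - y_2^2 = u$, $y_2^3 = u^2$ (the data that Example 1.18 treats) with $C = U = \mathbb{R}$, so that the optimality inequality reduces to an equation. All function names are hypothetical helpers.

```python
def e(y1, y2, u):
    """Constraint map e(y, u) with values in R^2."""
    return (y1 - y2 ** 2 - u, y2 ** 3 - u ** 2)


def e_y(y1, y2, u):
    """Jacobian of e with respect to (y1, y2); rows are the components of e."""
    return ((1.0, -2.0 * y2), (0.0, 3.0 * y2 ** 2))


def e_u(y1, y2, u):
    """Derivative of e with respect to the scalar control u."""
    return (-1.0, -2.0 * u)


def grad_J(y1, y2, u):
    """Return (J_y, J_u) for the cost J = y1^2 + u^2."""
    return (2.0 * y1, 0.0), 2.0 * u


def optimality_residuals(lam, point=(0.0, 0.0, 0.0)):
    """Residuals of (1.5.3) at `point` for a candidate multiplier lam = (lam1, lam2)."""
    y1, y2, u = point
    G = e_y(y1, y2, u)
    Jy, Ju = grad_J(y1, y2, u)
    eu = e_u(y1, y2, u)
    primal = e(y1, y2, u)
    # (G^T lam + J_y)_j = sum_i G[i][j] * lam[i] + J_y[j]
    adjoint = tuple(G[0][j] * lam[0] + G[1][j] * lam[1] + Jy[j] for j in range(2))
    # With C = U the variational inequality becomes e_u^T lam + J_u = 0.
    gradient = eu[0] * lam[0] + eu[1] * lam[1] + Ju
    return primal, adjoint, gradient


def branch(u):
    """Solution branch y(u) of e(y, u) = 0: y2 = |u|^(2/3), y1 = u + y2^2."""
    y2 = abs(u) ** (2.0 / 3.0)
    return (u + y2 ** 2, y2)


def h3_ratio(t, v=1.0):
    """Quotient |y(t) - y*|^2 / t from (H3) with y* = 0 and u* + t v = t v."""
    y1, y2 = branch(t * v)
    return (y1 ** 2 + y2 ** 2) / t


if __name__ == "__main__":
    G = e_y(0.0, 0.0, 0.0)
    print("det G =", G[0][0] * G[1][1] - G[0][1] * G[1][0])  # singular Jacobian
    for lam2 in (-3.0, 0.0, 5.0):
        print(lam2, optimality_residuals((0.0, lam2)))
    for t in (1e-2, 1e-4, 1e-6):
        print(t, h3_ratio(t))
```

In this sketch every multiplier of the form $(0, \lambda_2)$ gives zero residuals even though the Jacobian with respect to $y$ is singular, and the quotient in (H3) decreases as $t \to 0^+$.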
We next give several examples which demonstrate the applicability of hypotheses (H1)–(H4) and the necessity to allow for two spaces $Y_1$ and $Y$ with $Y_1 \subsetneq Y$. The typical situation that we have in mind is $Y_1 = Y \cap L^\infty(\Omega)$ with $Y$ a Hilbertian function space over $\Omega$.

Example 1.18. Consider first the finite-dimensional equality-constrained optimization problem
\[ \begin{cases} \min\ y_1^2 + u^2, \\ y_1 - y_2^2 = u, \\ y_2^3 = u^2, \end{cases} \tag{1.5.5} \]
for which $(y^*, u^*) = (0, 0, 0)$ is the solution. Here $Y = \mathbb{R}^2$, $U = \mathbb{R}^1$, $W = \mathbb{R}^2$, and
\[ e(y, u) = \begin{pmatrix} y_1 - y_2^2 - u \\ y_2^3 - u^2 \end{pmatrix}. \]
Note that with $(y_1, y_2, u) = (x_1, x_2, x_3)$ this problem coincides with Example 1.12. We recall that $e'(y^*, u^*)$ is not surjective and the theory of Section 1.3 assuring the existence of a Lagrange multiplier is therefore not applicable. However, $G^* = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$ and the adjoint equation
\[ G^* \begin{pmatrix} \lambda_1 \\ \lambda_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \]
has infinitely many solutions. Thus (H1), (H2) are satisfied. As for (H3) note that
\[ \begin{pmatrix} y_1(u) \\ y_2(u) \end{pmatrix} = \begin{pmatrix} u + u^{\frac{4}{3}} \\ u^{\frac{2}{3}} \end{pmatrix} \]
defines a solution branch to $e(y, u) = 0$ for which (H3) is satisfied. It is simple to verify (H4). Hence (1.5.5) is an example of an optimization problem where all hypotheses (H1)–(H4) are satisfied and Theorem 1.17 is applicable.

Example 1.19. We consider the optimal control problem with distributed control
\[ \min\ J(y, u) = \frac{1}{2}\,|y - z|^2_{L^2(\Omega)} + \frac{\alpha}{2}\,|u|^2_{L^2(\Omega)} \tag{1.5.6} \]
subject to
\[ \begin{cases} -\Delta y + \exp(y) = u & \text{in } \Omega, \\ \dfrac{\partial y}{\partial n} = 0 & \text{on } \Gamma, \\ y = 0 & \text{on } \partial\Omega \setminus \Gamma, \end{cases} \tag{1.5.7} \]
where $\alpha > 0$, $z \in L^2(\Omega)$, $\Omega$ is a bounded domain in $\mathbb{R}^n$, $n \le 3$, with smooth boundary $\partial\Omega$, and $\Gamma$ is a connected, strict subset of $\partial\Omega$. The concept of solution to (1.5.7) is the variational one of Lemma 1.14. To consider this problem in the general setting of this section the control variable $u$ is chosen in $U = L^2(\Omega)$. We set $Y = H^1_\Gamma(\Omega) = \{y \in H^1(\Omega) : y = 0 \text{ on } \partial\Omega \setminus \Gamma\}$, $Y_1 = H^1_\Gamma(\Omega) \cap L^\infty(\Omega)$, and $W = (H^1_\Gamma(\Omega))^*$. Moreover $e : Y_1 \times U \to W$ is defined by assigning to $(y, u) \in Y_1 \times U$ the functional on $W^*$ given by
\[ v \mapsto (\nabla y, \nabla v) + (\exp y, v) - (u, v) \quad \text{for } v \in H^1_\Gamma(\Omega). \]
Let $(y^*, u^*) \in Y_1 \times U$ denote a solution to (1.5.6), (1.5.7). For $(y, u) \in Y_1 \times L^2(\Omega)$ the Fréchet derivative $e_y(y, u)\delta y$ of $e$ with respect to $y$ in direction $\delta y \in H^1_\Gamma(\Omega)$ is given by the functional $v \mapsto (\nabla \delta y, \nabla v) + ((\exp y)\,\delta y, v)$. Clearly $e_y(y, u) : Y \to W$ is symmetric and (H1), (H2) are satisfied. (H3) is a direct consequence of Lemma 1.14. To verify (H4) note that $e'(y, u)(\delta y, \delta u)$ is the functional defined by
\[ v \mapsto (\nabla \delta y, \nabla v) + (\exp(y)\,\delta y, v) - (\delta u, v) \quad \text{for } v \in H^1_\Gamma(\Omega). \]
If $(y, u) \in Y_1 \times U$, then $e'(y, u)$ is well defined on $Y \times U$. (H4) requires us to consider
\[ \lim_{t \to 0^+} \frac{1}{t} \int_0^1 \int_\Omega \big( \exp(y^* + s(y(t) - y^*)) - \exp y^* \big)(y(t) - y^*)\,\lambda^*\,dx\,ds, \tag{1.5.8} \]
where $y(t)$ is the solution to (1.5.7) with $u = u^* + tv$, $v \in U$. Note that $\lambda^* \in L^\infty(\Omega)$ and that $\{|y(t)|_{L^\infty} : t \in [0, 1]\}$ is bounded by Lemma 1.14. Moreover $|y(t) - y^*|_{Y_1} \le c\,t\,|v|_{L^2}$ for a constant $c$ independent of $t \in [0, 1]$, and thus the pointwise local Lipschitz property of the exponential function implies that the limit in (1.5.8) is zero. (H4) now easily follows.

The considerations of this example remain correct for cost functionals $J$ that are much more general than the one in (1.5.6). In fact, it suffices that $J$ is weakly lower semicontinuous from $Y \times U$ to $\mathbb{R}$ and radially unbounded with respect to $u$, i.e., $\lim_{n \to \infty} |u_n|_{L^2} = \infty$ implies that $\limsup_{n \to \infty} \inf_{y \in Y} J(y, u_n) = \infty$. This guarantees existence of a solution $(y^*, u^*)$. The general regularity assumptions and (H1)–(H4) are satisfied if $J : Y \times U \to \mathbb{R}$ is continuous and Fréchet differentiable in a neighborhood of $(y^*, u^*)$ with locally Lipschitz continuous derivative.

Example 1.20. Consider the nonlinear optimal boundary control problem
\[ \min\ J(y, u) = \frac{1}{2}\,|y - z|^2_{L^2(\Omega)} + \frac{\alpha}{2}\,|u|^2_{H^s(\Gamma)} \tag{1.5.9} \]
subject to
\[ \begin{cases} -\Delta y + \exp y = f & \text{in } \Omega, \\ \dfrac{\partial y}{\partial n} = u & \text{on } \Gamma, \\ y = 0 & \text{on } \partial\Omega \setminus \Gamma, \end{cases} \tag{1.5.10} \]
where $\alpha > 0$, $z \in L^2(\Omega)$, $f \in L^\infty(\Omega)$ are fixed, $\Omega$ is a bounded domain in $\mathbb{R}^n$ with smooth boundary $\partial\Omega$, and $\Gamma$ is a nonempty, connected, strict subset of $\partial\Omega$. Further $s$ is a real number strictly larger than $\frac{n - 3}{2}$ if $n \ge 3$, and $s = 0$ if $n < 3$. Differently from Example 1.19 the dimension $n$ of $\Omega$ is now allowed to be arbitrary. This example can be treated within the general framework of this book by setting $Y, Y_1, W$ as in Example 1.19. The control space $U$ is chosen to be $H^s(\Gamma)$. To verify (H1)–(H4) one proceeds as in Example 1.19 by utilizing the following lemma. Its proof is given in [IK15] and is similar to that of Lemma 1.14.

Lemma 1.21. The variational problem
\[ (\nabla y, \nabla v) + (\exp y, v) = (f, v) + (u, v)_\Gamma, \quad v \in H^1_\Gamma(\Omega), \]
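has a unique solution $y = y(u) \in H^1_\Gamma(\Omega) \cap L^\infty(\Omega)$ for every $u \in H^s(\Gamma)$, and there exists a constant $c$ such that
\[ |y|_{H^1_\Gamma \cap L^\infty} \le c\,(|u|_{H^s(\Gamma)} + c) \quad \text{for all } u \in H^s(\Gamma). \tag{1.5.11} \]
Moreover, $c$ can be chosen such that
\[ |y(u_1) - y(u_2)|_{H^1_\Gamma(\Omega) \cap L^\infty} \le c\,|u_1 - u_2|_{H^s(\Gamma)} \quad \text{for all } u_i \in H^s(\Gamma),\ i = 1, 2. \tag{1.5.12} \]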
Example 1.22. We reconsider (1.4.21) from Example 1.15 as a special case of (1.5.1). For this purpose we set $Y = H^1_0(\Omega)$, $Y_1 = H^1_0(\Omega) \cap L^\infty(\Omega)$, $W = H^{-1}(\Omega)$, and $e$ as in Example 1.15. Let $(y^*, u^*) \in Y_1 \times U$ denote a solution of (1.4.21). Clearly $e$ is Fréchet differentiable at $(y^*, u^*)$ and the partial derivative $G = e_y(y^*, u^*) \in \mathcal{L}(Y_1, W)$ is the functional characterized by
\[ \langle e_y(y^*, u^*)\delta y, v \rangle_{W, W^*} = (\nabla \delta y, \nabla v) + (u^* \cdot \nabla \delta y, v), \quad v \in W^* = H^1_0(\Omega). \]
As a consequence of the quadratic term $u^* \cdot \nabla \delta y$, which is only in $L^1(\Omega)$, $G$ is not defined on all of $Y = H^1_0(\Omega)$. As an operator from $Y_1$ to $W$, the operator $G$ is not surjective. Considered as an operator with domain in $Y$, its adjoint is given by
\[ G^* w = -\Delta w - \nabla \cdot (u^* w). \]
The domain of $G^*$ contains $Y_1$ and hence $G^*$ is densely defined. Moreover its range contains $L^2(\Omega)$ and thus (H1) as well as (H2) are satisfied. Let $U(u^*) \subset U$ be a bounded neighborhood of $u^*$. Since for every $u \in U(u^*)$
\[ (\nabla(y(u) - y^*), \nabla v) - (u\,(y(u) - y^*), \nabla v) = ((u - u^*)\,y^*, \nabla v) \quad \text{for all } v \in H^1_0(\Omega), \]
it follows that there exists a constant $k > 0$ such that
\[ |y(u) - y^*|_{H^1} \le k\,|u - u^*|_{L^2_n} \quad \text{for all } u \in U(u^*), \tag{1.5.13} \]
and (H3) follows. The validity of (H4) is a consequence of (1.5.13) and the fact that $\lambda^*$ is the unique variational solution in $H^1_0(\Omega)$ to
\[ -\Delta \lambda^* - \nabla \cdot (u^* \lambda^*) = -(y^* - z) \]
and hence an element of $L^\infty(\Omega)$.

Remark 1.5.1. Comparing Examples 1.19 and 1.20 with Example 1.22 we observe that the linearization $e'(y, u)$, with $(y, u) \in Y_1 \times U$, is well defined on $Y \times U$ for Examples 1.19 and 1.20 but it is only defined with domain strictly contained in $Y \times U$ for Example 1.22. For none of these examples is $e$ defined on all of $Y \times U$.

Example 1.23. Here we consider the nonlinear optimal control problem with nonlinearity of blowup type:
\[ \begin{cases} \min\ \frac{1}{2}\,|\nabla(y - z)|^2_{L^2(\Omega)} + \frac{\alpha}{2}\,|u|^2_{L^2(\Gamma)} \quad \text{subject to} \\ -\Delta y - \exp y = f \ \text{in } \Omega, \\ \dfrac{\partial y}{\partial n} = u \ \text{on } \Gamma, \quad y = 0 \ \text{on } \partial\Omega \setminus \Gamma, \end{cases} \tag{1.5.14} \]
where $\alpha > 0$, $z \in H^1_\Gamma(\Omega)$, $f \in L^2(\Omega)$, $\Omega$ is a smooth bounded domain in $\mathbb{R}^2$, and $\Gamma \subset \partial\Omega$ is a connected strict subset of $\partial\Omega$. Since $\Omega$ is assumed to be a two-dimensional domain we have the following property of the exponential function:
\[ \text{for every } p \in [1, \infty), \quad \{|\exp y|_{L^p} : y \in B\} \text{ is bounded} \tag{1.5.15} \]
provided that $B$ is a bounded subset of $H^1_0(\Omega)$ [GT, p. 155]. The variational form of the boundary value problem in (1.5.14) is given by
\[ (\nabla y, \nabla v) - (\exp y, v) = (f, v) + (u, v)_\Gamma \quad \text{for all } v \in H^1_\Gamma(\Omega), \tag{1.5.16} \]
where $H^1_\Gamma(\Omega)$ is defined in Example 1.19. To argue existence of a solution to (1.5.14) let $\{(y_n, u_n)\}$ be a minimizing sequence with weak limit $(y^*, u^*) \in H^1_0(\Omega) \times L^2(\Gamma)$. Due to (1.5.15) and the radial unboundedness of the cost functional with respect to the $H^1_\Gamma(\Omega) \times L^2(\Gamma)$-norm the set $\{|\exp y_n|_{L^p} : n \in \mathbb{N}\}$ is bounded for every $p \in [1, \infty)$ and $\{|\exp y_n|_{W^{1,p}} : n \in \mathbb{N}\}$ is bounded for every $p \in [1, 2)$. Since $W^{1,p}(\Omega)$ is compactly embedded in $L^2(\Omega)$ for every $p \in (1, 2)$ it follows that for a subsequence of $\{y_n\}$, denoted by the same symbol, $\lim \exp(y_n) = \exp y^*$ in $L^2(\Omega)$. It is now simple to argue that $(y^*, u^*)$ is a solution to (1.5.14).

Let us discuss then the validity of (H1)–(H4) with $Y = Y_1 = H^1_\Gamma(\Omega)$, $W = (H^1_\Gamma(\Omega))^*$, with the obvious choice for $J$, and with $e : Y \times U \to W$ the mapping assigning to $(y, u) \in Y \times U$ the functional
\[ v \mapsto (\nabla y, \nabla v) - (\exp y, v) - (f, v) - (u, v)_\Gamma \quad \text{for } v \in H^1_\Gamma(\Omega). \]
From (1.5.15) it follows that $e$ is well defined and its Fréchet derivative at $(y, u)$ in direction $(\delta y, \delta u)$ is characterized by
\[ (e'(y, u)(\delta y, \delta u), v) = (\nabla \delta y, \nabla v) - (\exp(y)\,\delta y, v) - (\delta u, v)_\Gamma \quad \text{for } v \in H^1_\Gamma(\Omega). \]
The operator $G = e_y(y^*, u^*)$ can be expressed as
\[ G(\delta y) = -\Delta \delta y - \exp(y^*)\,\delta y. \]
Note that $G \in \mathcal{L}(Y, W)$, and $G$ is self-adjoint with compact resolvent. In particular, (H1) is satisfied. The spectrum of $G$ consists of eigenvalues only. It will be assumed that
\[ 0 \text{ is not an eigenvalue of } G. \tag{1.5.17} \]
Due to the regularity assumption for $z$ (note that it would suffice to have $\Delta z \in (H^1_\Gamma)^*$), (1.5.17) implies that (H2) holds. To argue the validity of (H3) and (H4) one can rely on the implicit function theorem. Let $B$ be a bounded open neighborhood of $y^*$ in $H^1_\Gamma(\Omega)$. Using (1.5.15) one argues the existence of a constant $\kappa > 0$ such that
\[ |\exp y - \exp \bar{y}|_{L^4} \le \kappa\,|y - \bar{y}|_{H^1} \quad \text{for all } y, \bar{y} \in B. \]
It follows that $e$ is continuous on $B \times U$ and its partial derivative $e_y(y, u)$ is Lipschitz continuous with respect to $(y, u) \in B \times U$. The implicit function theorem implies the existence of a neighborhood $U(u^*)$ of $u^*$ such that for every $u \in U(u^*)$ there exists a solution $y(u)$ of (1.5.16) depending continuously on $u$. Since $e(y, u)$ is Lipschitz continuous with respect to $u$ it follows, moreover, that there exists $L > 0$ such that
\[ |y(u) - y^*|_{H^1} \le L\,|u - u^*|_{L^2(\Gamma)} \quad \text{for all } u \in U(u^*). \]
(H3) and (H4) are a direct consequence of this estimate.
The methodology utilized to consider this example can also be applied to Examples 1.19 and 1.20 provided that $\Omega$ is restricted to being two-dimensional. This is essential for (1.5.15) to hold. For Example 1.23 it is essential for the cost functional to be radially unbounded with respect to the $H^1_\Gamma(\Omega)$-norm for the $y$-component to guarantee that minimizing sequences are bounded. For Examples 1.19 and 1.20 the a priori bound on the $y$-component of minimizing sequences can be obtained through the state equation.

1.6 Approximation, penalty, and adapted penalty techniques

As in the previous section we consider problems of type
\[ \begin{cases} \min\ J(y, u) \\ \text{subject to } e(y, u) = 0,\ u \in C \subset U. \end{cases} \tag{1.6.1} \]
We describe techniques that in specific situations can be more powerful than the general theory of Section 1.4 to obtain optimality systems of type
\[ \begin{cases} e(y^*, u^*) = 0, \\ e_y^*(y^*, u^*)\lambda^* + J_y(y^*, u^*) = 0, \\ \langle e_u^*(y^*, u^*)\lambda^* + J_u(y^*, u^*),\ u - u^* \rangle_{U^*, U} \ge 0 \ \text{for all } u \in C, \end{cases} \tag{1.6.2} \]
where $(y^*, u^*)$ denotes a solution to (1.6.1). We shall proceed formally here, referring the reader to [Ba, Chapter 3.2.2] and [Lio1] for details.

1.6.1 Approximation techniques

If $J$ and/or $e$ are not smooth, then one can aim for obtaining an optimality system of the form (1.6.2) by introducing families of smooth operators $e_\varepsilon$ and smooth functionals $J_\varepsilon$, $\varepsilon > 0$, and replacing (1.6.1) by
\[ \begin{cases} \min\ J_\varepsilon(y, u) \\ \text{subject to } e_\varepsilon(y, u) = 0 \text{ and } u \in C. \end{cases} \tag{1.6.3} \]
Let $(y_\varepsilon, u_\varepsilon)$, $\varepsilon > 0$, denote solutions to (1.6.3). Under relatively mild conditions it can be argued that $\{(y_\varepsilon, u_\varepsilon)\}_{\varepsilon > 0}$ contains accumulation points and that they are solutions of (1.6.2). But they need not coincide with $(y^*, u^*)$. To overcome this difficulty a penalty term is added to the cost in (1.6.3), resulting in the adapted penalty formulation
\[ \begin{cases} \min\ J_\varepsilon(y, u) + \frac{1}{2}\,|u - u^*|^2 \\ \text{subject to } e_\varepsilon(y, u) = 0 \text{ and } u \in C. \end{cases} \tag{1.6.4} \]
It can be shown that under appropriate assumptions $(y_\varepsilon, u_\varepsilon)$ converges to $(y^*, u^*)$ and that there exists $\lambda^*$ such that (1.6.2) holds. This approach can be used for optimal control of semilinear elliptic equations, variational inequalities [Ba, BaK1, BaK2], and state-dependent parameter estimation problems [BKR], for example.
1.6.2 Penalty techniques
This technique is well suited for systems with multiple steady states or finite explosion property. The procedure is explained by means of an example. We consider the nonlinear parabolic control system
\[ \begin{cases} y_t - \Delta y - y^3 = u & \text{in } Q = (0, T) \times \Omega, \\ y = 0 & \text{on } \Sigma = (0, T) \times \partial\Omega, \\ y(0, \cdot) = y_0 & \text{in } \Omega, \end{cases} \tag{1.6.5} \]
where $y = y(t, x)$, $T > 0$, and $y_0$ is a given initial condition. With (1.6.5) we associate the optimal control problem
\[ \begin{cases} \min\ J(y, u) = \frac{1}{6}\,|y - z|^6_{L^6(Q)} + \frac{\alpha}{2}\,|u|^2_{L^2(Q)} \\ \text{subject to } u \in C,\ y \in H^{1,2}(Q) \text{ and (1.6.5)}, \end{cases} \tag{1.6.6} \]
where $\alpha > 0$, $z \in L^6(Q)$, $C$ is a closed convex set in $L^2(Q)$, and
\[ H^{1,2}(Q) = \{\varphi \in L^2(Q) : \varphi_t, \varphi_{x_i}, \varphi_{x_i x_j} \in L^2(Q)\}. \]
Realizing the state equation by a penalty term leads to
\[ \begin{cases} \min\ J_\varepsilon(y, u) = J(y, u) + \frac{1}{\varepsilon}\,|y_t - \Delta y - y^3 - u|^2_{L^2(Q)} \\ \text{subject to } u \in C,\ y \in H^{1,2}(Q),\ y|_\Sigma = 0,\ y(0, \cdot) = y_0. \end{cases} \tag{1.6.7} \]
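It can be shown that for every $\varepsilon > 0$ there exists a solution $(y_\varepsilon, u_\varepsilon)$ to (1.6.7). Considering the behavior of the pair $(y_\varepsilon, u_\varepsilon)$ as $\varepsilon \to 0^+$ the question of convergence to a particular solution $(y^*, u^*)$ of (1.6.6) again arises. As before this can be realized by means of adaptation terms:
\[ \begin{cases} \min\ J_\varepsilon(y, u) + \frac{1}{2}\,|y - y^*|^2_{L^2(Q)} + \frac{1}{2}\,|u - u^*|^2_{L^2(Q)} \\ \text{subject to } u \in C,\ y \in H^{1,2}(Q),\ y|_\Sigma = 0,\ y(0, \cdot) = y_0. \end{cases} \tag{1.6.8} \]
Let us denote solutions to (1.6.8) by $(y^\varepsilon, u^\varepsilon)$. It is fairly straightforward to derive optimality conditions for (1.6.8): Setting
\[ \lambda^\varepsilon = -\frac{1}{\varepsilon}\,\big( y^\varepsilon_t - \Delta y^\varepsilon - (y^\varepsilon)^3 - u^\varepsilon \big) \in L^2(Q) \tag{1.6.9} \]
we find
\[ \begin{cases} -\lambda^\varepsilon_t - \Delta \lambda^\varepsilon - 3\,(y^\varepsilon)^2 \lambda^\varepsilon = -(y^\varepsilon - z)^5 - (y^\varepsilon - y^*) & \text{in } Q, \\ \lambda^\varepsilon = 0 & \text{on } \Sigma, \\ \lambda^\varepsilon(T, \cdot) = 0 & \text{in } \Omega \end{cases} \tag{1.6.10} \]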
and
\[ \int_Q (\alpha u^\varepsilon - \lambda^\varepsilon)(u - u^\varepsilon)\,dQ + \int_Q (u^\varepsilon - u^*)(u - u^\varepsilon)\,dQ \ge 0 \quad \text{for all } u \in C, \tag{1.6.11} \]
where $\lambda^\varepsilon$ must be interpreted as a weak solution in $L^2(Q)$ of (1.6.10) (i.e., the inner product with a smooth test function is taken, and partial integration bringing all differentiations onto the test function is carried out). It can be verified that $u^\varepsilon \to u^*$ in $L^2(Q)$, $y^\varepsilon \rightharpoonup y^*$ weakly in $H^{1,2}(Q)$ and that there exists $\lambda^* \in W^{2,1;6/5}$, the weak limit of $\lambda^\varepsilon$, such that the following optimality system for (1.6.6) is satisfied:
\[ \begin{cases} y_t - \Delta y - y^3 = u & \text{in } Q, \\ -\lambda_t - \Delta \lambda - 3 y^2 \lambda = -(y - z)^5 & \text{in } Q, \\ y = \lambda = 0 & \text{on } \Sigma, \\ y(0, \cdot) = y_0,\ \lambda(T, \cdot) = 0 & \text{in } \Omega, \end{cases} \tag{1.6.12} \]
where $W^{2,1;6/5} = \{\varphi \in L^{6/5}(Q) : \varphi_t, \varphi_{x_i}, \varphi_{x_i x_j} \in L^{6/5}(Q)\}$; see [Lio1, Chapter I.3].
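The mechanism behind the penalty technique can be seen on a scalar toy problem. The following sketch is our own illustration under stated assumptions and not the parabolic problem above: we take $\min \frac{1}{2}(y - z)^2 + \frac{\alpha}{2}u^2$ subject to $y - u = 0$ and replace the constraint by the penalty term $\frac{1}{\varepsilon}(y - u)^2$. The scaled residual $\mu_\varepsilon = \frac{2}{\varepsilon}(y_\varepsilon - u_\varepsilon)$ is compared with the multiplier of the Lagrangian $J + \lambda\,(y - u)$; this sign and scaling convention is a choice made for the sketch and differs from (1.6.9). The function names are hypothetical.

```python
def exact_solution(z, alpha):
    """Solve min 0.5*(y-z)^2 + 0.5*alpha*u^2 subject to y - u = 0.

    Stationarity of J + lam*(y - u): y - z + lam = 0, alpha*u - lam = 0, y = u.
    """
    y = z / (1.0 + alpha)
    lam = alpha * y
    return y, y, lam


def penalized_solution(z, alpha, eps):
    """Minimize 0.5*(y-z)^2 + 0.5*alpha*u^2 + (1/eps)*(y-u)^2 in closed form.

    With mu = (2/eps)*(y - u), stationarity reads y - z + mu = 0 and
    alpha*u - mu = 0, which gives mu = 2*z / (eps + 2*(1 + 1/alpha)).
    """
    if alpha <= 0 or eps <= 0:
        raise ValueError("alpha and eps must be positive")
    mu = 2.0 * z / (eps + 2.0 * (1.0 + 1.0 / alpha))
    y = z - mu
    u = mu / alpha
    return y, u, mu


if __name__ == "__main__":
    z, alpha = 3.0, 0.5
    print("exact:", exact_solution(z, alpha))
    for eps in (1.0, 1e-2, 1e-4, 1e-8):
        y, u, mu = penalized_solution(z, alpha, eps)
        print(eps, y, u, mu, "constraint violation:", abs(y - u))
```

In this toy setting the penalized minimizers and the scaled residuals converge to the constrained solution and its multiplier as $\varepsilon \to 0^+$, while the constraint violation is of order $\varepsilon$.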
Chapter 2
Sensitivity Analysis
2.1 Generalities
In this chapter we discuss the sensitivity of solutions to parameterized mathematical programming problems of the form
\[ \begin{cases} \min_{x \in C}\ f(x, p) \quad \text{subject to} \\ e(x, p) = 0,\ g(x, p) \le 0,\ \text{and } \ell(x, p) \in K, \end{cases} \tag{2.1.1} \]
where $C$ is a closed convex set in $X$, and further $e : X \times P \to W$ represents an equality constraint, $g : X \times P \to \mathbb{R}^m$ a finite-dimensional inequality constraint, and $\ell : X \times P \to Z$ an infinite-dimensional affine constraint. The cost $f$ as well as the equality and inequality constraints are allowed to depend on a parameter $p \in P$. Unless otherwise specified $P$ is a normed linear space, $X, W, Z$ are real Hilbert spaces, and $K$ is a closed convex cone with vertex at $0$ in $Z$. Such problems arise in inverse problems, control problems, and variational inequalities, and some examples will be discussed in Section 2.6. The cone $K$ induces a natural ordering $\le$ by means of $z_1 \le z_2$ if $z_1 - z_2 \in K$. We denote by $K^+$ the dual cone given by
\[ K^+ = \{z \in Z^* : \langle z, \tilde{z} \rangle \le 0 \text{ for all } \tilde{z} \in K\}, \]
where $\langle \cdot, \cdot \rangle$ denotes the duality pairing between $Z$ and $Z^*$. Suppose that $x_0$ is the solution to (2.1.1) for the nominal parameter $p = p_0$. The objective of this section is to investigate

• continuity and Lipschitz continuity of the solution mapping
\[ p \in P \to (x(p), \lambda(p), \mu(p), \eta(p)) \in X \times W^* \times \mathbb{R}^{m,+} \times K^+, \]
where for $p$ in a neighborhood of $p_0$, $x(p) \in X$ denotes the local solution of (2.1.1) in a neighborhood of $x_0$, and $\lambda, \mu, \eta$ are the Lagrange multipliers associated with constraints described by $e, g, \ell$ in (2.1.1);

• differentiability of the minimum value function $F(p) = f(x(p), p)$ at $p_0$; and
• directional differentiability of the solution mapping.
Due to the local nature of the analysis, $f$, $e$, and $g$ need only be defined in a neighborhood of $(x_0, p_0)$. We assume throughout this chapter that they are twice continuously Fréchet differentiable with respect to $x$ and that their first and second derivatives are continuous in a neighborhood of $(x_0, p_0)$. It is assumed that $\ell$ is affine in $x$ for every $p \in P$ and that the first derivative $\ell'(p_0)$ with respect to $x$ is continuous in a neighborhood of $p_0$. In order to ensure the existence of a Lagrange multiplier at $x_0$ we assume that

(H1) $x_0$ is a regular point, i.e.,
\[ 0 \in \operatorname{int} \left\{ \begin{pmatrix} e'(x_0, p_0) \\ g'(x_0, p_0) \\ \ell'(p_0) \end{pmatrix} (C - x_0) + \begin{pmatrix} 0 \\ \mathbb{R}^m_+ \\ -K \end{pmatrix} + \begin{pmatrix} 0 \\ g(x_0, p_0) \\ \ell(x_0, p_0) \end{pmatrix} \right\}, \tag{2.1.2} \]
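where the interior is taken in $W \times \mathbb{R}^m \times Z$ and primes denote the Fréchet derivatives with respect to $x$. We define the Lagrange functional $L : X \times P \times W^* \times \mathbb{R}^m \times Z^* \to \mathbb{R}$ by
\[ L(x, p, \lambda, \mu, \eta) = f(x, p) + \langle \lambda, e(x, p) \rangle + \langle \mu, g(x, p) \rangle_{\mathbb{R}^m} + \langle \eta, \ell(x, p) \rangle. \]
With (2.1.2) holding, it follows from Theorem 1.6 that there exists a Lagrange multiplier $(\lambda_0, \mu_0, \eta_0) \in W^* \times \mathbb{R}^{m,+} \times K^+$ such that
\[ \begin{aligned} &\langle L'(x_0, p_0, \lambda_0, \mu_0, \eta_0),\ c - x_0 \rangle \ge 0 \quad \text{for all } c \in C, \\ &e(x_0, p_0) = 0, \\ &\langle \mu_0, g(x_0, p_0) \rangle = 0, \quad g(x_0, p_0) \le 0, \quad \mu_0 \ge 0, \\ &\langle \eta_0, \ell(x_0, p_0) \rangle = 0, \quad \ell(x_0, p_0) \in K, \quad \eta_0 \in K^+. \end{aligned} \tag{2.1.3} \]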
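To express (2.1.3) in a more compact form we recall that the subdifferential $\partial\psi_C$ of the indicator function
\[ \psi_C(x) = \begin{cases} 0 & \text{if } x \in C, \\ \infty & \text{if } x \notin C \end{cases} \]
of a closed convex set $C$ in a Banach space $X$ is given by
\[ \partial\psi_C(x) = \begin{cases} \{y \in X^* : \langle y, c - x \rangle_{X^*, X} \le 0 \text{ for all } c \in C\} & \text{if } x \in C, \\ \emptyset & \text{if } x \notin C. \end{cases} \]
The set $\partial\psi_C(x)$ is also referred to as the normal cone to $C$ at $x$. For convenience we also specify
\[ \partial\psi_{K^+}(\eta) = \begin{cases} \{z \in Z : \langle z^* - \eta, z \rangle_{Z^*, Z} \le 0 \text{ for all } z^* \in K^+\} & \text{if } \eta \in K^+, \\ \emptyset & \text{if } \eta \notin K^+. \end{cases} \]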
Note that (2.1.3) is equivalent to
\[ 0 \in \begin{cases} L'(x_0, p_0, \lambda_0, \mu_0, \eta_0) + \partial\psi_C(x_0), \\ e(x_0, p_0), \\ -g(x_0, p_0) + \partial\psi_{\mathbb{R}^{m,+}}(\mu_0), \\ -\ell(x_0, p_0) + \partial\psi_{K^+}(\eta_0). \end{cases} \tag{2.1.4} \]
In fact this follows from the following characterization of the normal cone to $K^+$.

Lemma 2.1. Suppose that $K$ is a closed convex cone in $Z$. Then $z \in \partial\psi_{K^+}(\eta)$ if and only if $z \in K$, $\eta \in K^+$, and $\langle \eta, z \rangle = 0$.

Proof. Assume that $z \in \partial\psi_{K^+}(\eta)$. Then $\partial\psi_{K^+}(\eta)$ is nonempty, $\eta \in K^+$, and
\[ \langle z^* - \eta, z \rangle \le 0 \quad \text{for all } z^* \in K^+. \tag{2.1.5} \]
Hence with $z^* = 2\eta$ this implies $\langle \eta, z \rangle \le 0$ and for $z^* = \frac{\eta}{2}$ we have $\langle \eta, z \rangle \ge 0$, and therefore $\langle \eta, z \rangle = 0$. From (2.1.5) it follows that $\langle z^*, z \rangle \le 0$ for all $z^* \in K^+$, and hence the geometric form of the Hahn–Banach theorem implies that $z \in K$. Conversely, if $z \in K$, $\eta \in K^+$, and $\langle \eta, z \rangle = 0$, then $\langle z^* - \eta, z \rangle \le 0$ for all $z^* \in K^+$ and thus $z \in \partial\psi_{K^+}(\eta)$.

Let $A : X \to X^*$ be the operator representation of the Hessian $L''(x_0, p_0, \lambda_0, \mu_0, \eta_0)$ of $L$ such that
\[ \langle A x_1, x_2 \rangle = L''(x_0, p_0, \lambda_0, \mu_0, \eta_0)(x_1, x_2) \]
and define operators $E : X \to W$, $G : X \to \mathbb{R}^m$, and $L : X \to Z$ by
\[ E = e'(x_0, p_0), \quad G = g'(x_0, p_0), \quad L = \ell'(p_0). \]
Without loss of generality one can assume that the coordinates of the inequality constraints $g(x, p_0) \le 0$ and the associated Lagrange multiplier $\mu_0$ are arranged so that $\mu_0 = (\mu_0^+, \mu_0^0, \mu_0^-)$ and $g = (g^+, g^0, g^-)$, with $g^+ : X \to \mathbb{R}^{m_1}$, $g^0 : X \to \mathbb{R}^{m_2}$, $g^- : X \to \mathbb{R}^{m_3}$, $m = m_1 + m_2 + m_3$, and
\[ \begin{aligned} g^+(x_0, p_0) &= 0, & \mu_0^+ &> 0, \\ g^0(x_0, p_0) &= 0, & \mu_0^0 &= 0, \\ g^-(x_0, p_0) &< 0, & \mu_0^- &= 0. \end{aligned} \tag{2.1.6} \]
We further define
\[ G_+ = g^+(x_0, p_0)', \quad G_0 = g^0(x_0, p_0)', \]
and
\[ E_+ = \begin{pmatrix} E \\ G_+ \end{pmatrix} : X \to W \times \mathbb{R}^{m_1}. \]
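Note that the coordinates denoted by superscripts $+$ behave locally like equality constraints. Define $\mathcal{E}(z) : X \times \mathbb{R} \to (W \times \mathbb{R}^{m_1}) \times \mathbb{R}^{m_2} \times Z$ for $z \in Z$ by
\[ \mathcal{E}(z) = \begin{pmatrix} E_+ & 0 \\ G_0 & 0 \\ L & z \end{pmatrix}. \]
The adjoint $\mathcal{E}^*(z)$ is given by
\[ \mathcal{E}^*(z) = \begin{pmatrix} E_+^* & G_0^* & L^* \\ 0 & 0 & \langle \cdot, z \rangle_Z \end{pmatrix}. \]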
The following conditions will be used.

(H2) There exists a $\kappa > 0$ such that $\langle Ax, x \rangle_{X^*, X} \ge \kappa\,|x|^2_X$ for all $x \in \ker(E_+)$.

(H3) $\mathcal{E}(z)$ is surjective at $z = \ell(x_0, p_0)$.

(H4) There exists a neighborhood $\tilde{U} \times \tilde{N}$ of $(x_0, p_0)$ and a positive constant $\nu$ such that
\[ |f(x, p) - f(x, q)| + |e(x, p) - e(x, q)|_W + |g(x, p) - g(x, q)|_{\mathbb{R}^m} + |\ell(x, p) - \ell(x, q)|_Z \le \nu\,|p - q|_P \]
for all $(x, p)$ and $(x, q)$ in $\tilde{U} \times \tilde{N}$.

Condition (H2) is a second order sufficient optimality condition for (2.1.1) at $x_0$ and (H3) implies that there exists a neighborhood $O$ of $\ell(x_0, p_0)$ in $Z$ such that $\mathcal{E}(z)$ is surjective for all $z \in O$.

The chapter is organized as follows. In Sections 2.2 and 2.3 we discuss the basic results for establishing Lipschitz continuity of the solution mapping. The implicit function theory for the generalized equation of the form (2.1.4) is discussed in Section 2.2. In Section 2.3 stability results for solutions to mathematical programming problems including (2.1.1) are addressed. Sufficient conditions for stability of the solutions are closely related to sufficient optimality conditions for local minimizers. Section 2.3 therefore also contains first and second order sufficient conditions for local optimality. Sections 2.4 and 2.5 are devoted to establishing Lipschitz continuity and directional differentiability of the solution mapping. For the latter we employ the assumption of polyhedricity of a closed convex cone $K$ which, together with appropriate additional conditions, implies that the directional derivative $(\dot{x}, \dot{\lambda}, \dot{\mu}, \dot{\eta})$ in direction $q$ satisfies
\[ 0 \in \begin{cases} L'_p(x_0, p_0, \lambda_0, \mu_0, \eta_0)\,q + A\dot{x} + E^* \dot{\lambda} + G_+^* \dot{\mu}^+ + G_0^* \dot{\mu}^0 + L^* \dot{\eta}, \\ -e^+_p(x_0, p_0)\,q - E_+ \dot{x}, \\ -g^0_p(x_0, p_0)\,q - G_0 \dot{x} + \partial\psi_{\mathbb{R}^{m_2,+}}(\dot{\mu}^0), \\ -\ell_p(x_0, p_0)\,q - L\dot{x} + \partial\psi_{\hat{K}^+}(\dot{\eta}), \end{cases} \tag{2.1.7} \]
where $e^+ = \begin{pmatrix} e \\ g^+ \end{pmatrix}$ and $\hat{K}^+$ is the dual cone of $\hat{K} = \cup_{\lambda > 0}\, \lambda\,(K - \ell(x_0, p_0)) \cap [\eta_0]^\perp$. Section 2.6 contains some applications to optimal control of ordinary differential equations. This chapter is strongly influenced by the work of S. M. Robinson and also by that of W. Alt.
2.2 Implicit function theorem
In this section we discuss an implicit function theorem for parameterized generalized equations of the form
\[ 0 \in F(x, p) + \partial\psi_C(x), \tag{2.2.1} \]
where $F : X \times P \to X^*$, with $X^*$ the strong dual space of $X$, and $C$ is a closed convex set in $X$. We assume that $X$ and $P$ are normed linear spaces unless specified otherwise. Suppose $x_0 \in X$ is a solution to (2.2.1) for a reference parameter $p = p_0 \in P$. Let $F(x, p_0)$ be Fréchet differentiable at $x_0$ and define the linearized form by
\[ T x = F(x_0, p_0) + F'(x_0, p_0)(x - x_0) + \partial\psi_C(x). \tag{2.2.2} \]
Definition 2.2 (Strong Regularity). The generalized equation (2.2.1) is called strongly regular at x0 with associated Lipschitz constant ρ if there exist neighborhoods of V of 0 and U of x0 in X such that (T −1 V ) ∩ U , the intersection of U with the restriction of T −1 to V , is single valued and Lipschitz continuous from V to U with modulus ρ. Note that if C = X, the strong regularity assumption coincides with F (x0 , p0 )−1 being a bounded linear operator, which is the common condition for the implicit function theorem. The following result is taken from Robinson’s work [Ro3]. Theorem 2.3 (Generalized Implicit Function Theorem). Let P be a topological space, and assume that F is Fréchet differentiable with respect to x and both F and F are continuous at (x0 , p0 ). If (2.2.1) is strongly regular at x0 with associated Lipschitz constant ρ, then for every > 0 there exists neighborhoods N of p0 and U of x0 , and a singlevalued function x : N → U such that for every p ∈ N , x(p) is the unique solution in U of (2.2.1). Furthermore, for each p, q ∈ N |x(p) − x(q)|X ≤ (ρ + ) |F (p, x(q)) − F (q, x(q))|X∗ .
(2.2.3)
Proof. For > 0 we choose a δ > 0 such that ρδ < /(ρ + ). By strong regularity, there exists neighborhoods V of 0 and U of x0 in X such that (T −1 V ) ∩ U is single valued and Lipschitz continuous from V to U with modulus ρ. Let h(x, p) = F (x0 , p0 ) + F (x0 , p0 )(x − x0 ) − F (x, p) and choose a neighborhood N of p0 and a closed ball U of radius r about x0 contained in U so that for each p ∈ N and x ∈ U we have for h(x, p) ∈ V |F (x, p) − F (x0 , p0 )| ≤ δ and ρ |F (x0 , p) − F (x0 , p0 )| ≤ (1 − ρδ)r.
For any $p \in N_\varepsilon$ define an operator $\Phi_p$ from $U_\varepsilon \to U_\varepsilon$ by
\[
\Phi_p(x) = T^{-1}(h(x, p)) \cap U_\varepsilon. \tag{2.2.4}
\]
Note that $x \in U_\varepsilon \cap \Phi_p(x)$ if and only if $x \in U_\varepsilon$ and (2.2.1) holds. We show that $\Phi_p$ is a strict contraction and that $\Phi_p$ maps $U_\varepsilon$ into itself. For $x_1, x_2 \in U_\varepsilon$ we find, using $h'(x, p) = F'(x_0, p_0) - F'(x, p)$, that
\[
|\Phi_p(x_1) - \Phi_p(x_2)| \le \rho\,|h(x_1, p) - h(x_2, p)| = \rho\,\Big|\int_0^1 h'(x_2 + t(x_1 - x_2), p)(x_1 - x_2)\,dt\Big| \le \rho\delta\,|x_1 - x_2|,
\]
where $\rho\delta < 1$. Since $x_0 = T^{-1}(0) \cap U_\varepsilon$ we have
\[
|\Phi_p(x_0) - x_0| \le \rho\,|h(x_0, p)| = \rho\,|F(x_0, p) - F(x_0, p_0)| \le (1 - \rho\delta)\,r,
\]
and thus for $x \in U_\varepsilon$
\[
|\Phi_p(x) - x_0| \le |\Phi_p(x) - \Phi_p(x_0)| + |\Phi_p(x_0) - x_0| \le \rho\delta\,|x - x_0| + (1 - \rho\delta)\,r \le r.
\]
By the Banach fixed point theorem $\Phi_p$ has a unique fixed point $x(p)$ in $U_\varepsilon$ and for each $x \in U_\varepsilon$ we have
\[
|x(p) - x| \le (1 - \rho\delta)^{-1}\,|\Phi_p(x) - x|. \tag{2.2.5}
\]
It follows from our earlier observation that $x(p)$ is the unique solution to (2.2.1) in $U_\varepsilon$. To verify (2.2.3) with $p, q \in N_\varepsilon$ we use (2.2.5) with $x = x(q)$ and obtain
\[
|x(p) - x(q)| \le (1 - \rho\delta)^{-1}\,|\Phi_p(x(q)) - x(q)|.
\]
Since $x(q) = \Phi_q(x(q))$ we find
\[
|\Phi_p(x(q)) - x(q)| \le \rho\,|h(x(q), p) - h(x(q), q)| = \rho\,|F(x(q), p) - F(x(q), q)|,
\]
and hence
\[
|x(p) - x(q)| \le \rho\,(1 - \rho\delta)^{-1}\,|F(x(q), p) - F(x(q), q)|.
\]
Observing that $\rho\,(1 - \rho\delta)^{-1} \le \rho + \varepsilon$, the desired estimate (2.2.3) follows.

As a consequence of the previous theorem, Lipschitz continuity of $F(x, p)$ in $p$ implies local Lipschitz continuity of $x(p)$ at $p_0$.

Corollary 2.4. Assume in addition to the conditions of Theorem 2.3 that $P$ is a normed linear space and that for some $\nu > 0$
\[
|F(x, p) - F(x, q)| \le \nu\,|p - q|
\]
for $p, q \in N_\varepsilon$ and $x \in U_\varepsilon$. Then $x(p)$ is Lipschitz on $N_\varepsilon$ with modulus $\nu\,(\rho + \varepsilon)$.

It should be remarked that the condition of strong regularity is the weakest possible condition which can be imposed on the values of a function $F$ and its derivative at a point $x_0$,
so that for each perturbation satisfying the hypothesis of Theorem 2.3, a function $x(\cdot)$ will exist having the properties stated in Theorem 2.3. To see this, one has only to consider a function $F : X \to X^*$ which is Fréchet differentiable at $x_0$ and satisfies $0 \in F(x_0) + \partial\psi_C(x_0)$. Let $P$ be a neighborhood of the origin in $X^*$, and let
\[
F(x, p) = F(x_0, p_0) + F'(x_0, p_0)(x - x_0) - p
\]
with $p_0 = 0$. Choose $\varepsilon > 0$. If there exist neighborhoods $N$ and $V$ and a function $x(\cdot)$ satisfying the properties stated in Theorem 2.3, then with
\[
Tx = F(x_0, p_0) + F'(x_0, p_0)(x - x_0) + \partial\psi_C(x)
\]
we see that the restriction to $N$ of $T^{-1} \cap V$ is a single-valued, Lipschitz continuous function. Therefore the generalized equation $0 \in F(x, p_0) + \partial\psi_C(x)$ is strongly regular at $x_0$.

One of the important consequences of Theorem 2.3 is the following theorem on parametric sensitivity analysis.

Theorem 2.5. Under the conditions of Theorem 2.3 and Corollary 2.4 there exists for each $\varepsilon > 0$ a function $\alpha_\varepsilon : N_\varepsilon \to \mathbb{R}^+$ satisfying $\lim_{p \to p_0} \alpha_\varepsilon(p) = 0$, such that
\[
|x(p) - \Phi_p(x_0)| \le \alpha_\varepsilon(p)\,|p - p_0|,
\]
where $\Phi_p(x_0)$ is the unique solution in $U_\varepsilon$ of the linear generalized equation
\[
0 \in F(x_0, p) + F'(x_0, p_0)(x - x_0) + \partial\psi_C(x). \tag{2.2.6}
\]

Proof. From the proof of Theorem 2.3 it follows that $x(p) = \Phi_p(x(p))$ and thus by strong regularity and Corollary 2.4
\begin{align*}
|x(p) - \Phi_p(x_0)| &\le \rho\,|h(x(p), p) - h(x_0, p)| \\
&\le \rho\,\Big\|\int_0^1 h'(x_0 + \theta(x(p) - x_0), p)\,d\theta\Big\|\,|x(p) - x_0| \\
&\le \rho\,\nu\,(\rho + \varepsilon)\,|p - p_0|\,\Big\|\int_0^1 h'(x_0 + \theta(x(p) - x_0), p)\,d\theta\Big\|.
\end{align*}
Since $h'(x, p) = F'(x_0, p_0) - F'(x, p)$ and $F'$ is continuous,
\[
\Big\|\int_0^1 h'(x_0 + \theta(x(p) - x_0), p)\,d\theta\Big\| \to 0
\]
as $p \to p_0$, which completes the proof.

In the case $C = X$ the generalized equation (2.2.1) reduces to the nonlinear equation $F(x, p) = 0$ and
\[
\Phi_p(x_0) = x_0 + F'(x_0, p_0)^{-1}\big(F(x_0, p_0) - F(x_0, p)\big).
\]
Thus, Theorem 2.5 shows that if $F(x_0, \cdot)$ is Fréchet differentiable at $p_0$, then so is $x(p)$ and
\[
x'(p_0) = -F'(x_0, p_0)^{-1}\,\frac{\partial F}{\partial p}(x_0, p_0). \tag{2.2.7}
\]
In many applications, one might find (2.2.6) significantly easier to solve than (2.2.1). In fact, the necessary optimality condition (2.1.3) is of the form (2.2.1), and (2.2.6) corresponds to a quadratic programming problem, as we will discuss in Section 2.4. Thus Theorem 2.5 can provide a relatively cheap way to find a good approximation to x(p) for p near p0 .
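For the special case $C = X$ the construction of this section can be tried out numerically. The following is a minimal sketch, not taken from the text: the scalar map $F(x, p) = x^3 + x - p$, the reference point $(x_0, p_0) = (0, 0)$, and the helper names `chord_solve` and `linear_predictor` are illustrative assumptions. With $C = X$ the fixed point map from the proof of Theorem 2.3 reduces to the chord iteration $x \leftarrow x - F'(x_0, p_0)^{-1} F(x, p)$, and the solution of (2.2.6) is $x_0 - F'(x_0, p_0)^{-1} F(x_0, p)$.

```python
def F(x, p):
    """Illustrative scalar map (an assumption, not from the text); F(0, 0) = 0."""
    return x**3 + x - p


X0, P0 = 0.0, 0.0   # reference solution and reference parameter
DF0 = 1.0           # F'(x0, p0) = 3*x0**2 + 1 evaluated at x0 = 0


def chord_solve(p, tol=1e-14, max_iter=200):
    """Fixed point iteration x <- x - F(x, p) / F'(x0, p0), started at x0."""
    x = X0
    for _ in range(max_iter):
        x_new = x - F(x, p) / DF0
        if abs(x_new - x) <= tol:
            return x_new
        x = x_new
    raise RuntimeError("chord iteration did not converge")


def linear_predictor(p):
    """Solution of the linearized equation (2.2.6) in the case C = X."""
    return X0 - F(X0, p) / DF0


if __name__ == "__main__":
    for p in (0.2, 0.1, 0.05, 0.025):
        x_p = chord_solve(p)
        ratio = abs(x_p - linear_predictor(p)) / abs(p - P0)
        print(f"p = {p:6.3f}   x(p) = {x_p:.10f}   error / |p - p0| = {ratio:.2e}")
```

The printed ratio plays the role of $\alpha_\varepsilon(p)$ in Theorem 2.5. For this particular $F$ it should shrink as $p \to p_0$, since the linear predictor only omits the cubic term.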
2.3 Stability results
Throughout this section, let $X$, $Y$ be real Banach spaces and let $C$ be a closed convex subset of $X$. Concerning $K$ it suffices in this section to assume that it is a closed convex set in $Y$, unless it is specifically assumed that it is a closed convex cone in $Y$. For $p$ in the metric space $P$ with metric $\delta(\cdot, \cdot)$ consider the parameterized minimization problem
\[
\min f(x, p) \quad\text{subject to } x \in C,\ g(x, p) \in K. \tag{2.3.1}
\]
This class of problems contains (2.1.1) as a special case. To facilitate the analysis of (2.3.1) let $\Sigma(p)$ denote the set of feasible points, i.e.,
\[
\Sigma(p) = \{x \in C : g(x, p) \in K\}.
\]
For a given $p_0 \in P$ and $x_0 \in \Sigma(p_0)$ a local minimizer of (2.3.1), define the set $\Sigma_r(p)$ by
\[
\Sigma_r(p) = \Sigma(p) \cap \bar B(x_0, r),
\]
where $r > 0$ and $\bar B(x_0, r)$ is the closure of the open ball with radius $r$ at $x_0$ in $X$. We further define the value function $\mu_r(p)$ by
\[
\mu_r(p) = \inf\{f(x, p) \mid x \in \Sigma_r(p)\}. \tag{2.3.2}
\]
Throughout this section it is assumed that $f : X \times P \to \mathbb{R}$ and $g : X \times P \to Y$ are continuously Fréchet differentiable with respect to $x$ and that their first derivatives are continuous in a neighborhood of $(x_0, p_0)$. Recall from Definition 1.5 that $x_0 \in \Sigma(p_0)$ is a regular point if
\[
0 \in \operatorname{int}\{g(x_0, p_0) + g_x(x_0, p_0)(C - x_0) - K\}. \tag{2.3.3}
\]
We use a generalized inverse function theorem for set-valued functions due to Robinson [Ro3] for the stability analysis of (2.3.1). For a set $S$ in $X$ and $x \in X$ we define
\[
d(x, S) = \inf\{|y - x| : y \in S\},
\]
and $d(x, S) = \infty$ if $S = \emptyset$. For nonempty sets $S$ and $\bar S$ in $X$ the Hausdorff distance is defined by
\[
d_H(S, \bar S) = \max\big\{\sup\{d(x, \bar S) : x \in S\},\ \sup\{d(y, S) : y \in \bar S\}\big\}.
\]
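We further set $B_X = \{x \in X : |x| < 1\}$. For a set-valued function $F : X \to Y$ we define the preimage of $F$ by
\[
F^{-1}(y) = \{x \in X : y \in F(x)\}.
\]
A set-valued function is called closed and convex if its graph is closed and convex.

Theorem 2.6 (Generalized Inverse Function Theorem). Let $X$ and $Y$ be normed linear spaces and $F$ a closed convex set-valued function from $X$ to $Y$. Let $y_0 \in F(x_0)$ and suppose that for some $\eta > 0$,
\[
y_0 + \eta B_Y \subset F(x_0 + B_X).
\]
Then for $x \in x_0 + B_X$ and any $y \in y_0 + \operatorname{int}(\eta B_Y)$,
\[
d\big(x, F^{-1}(y) \cap (x_0 + B_X)\big) \le (\eta - |y - y_0|)^{-1}\,(1 + |x - x_0|)\,d(y, F(x)).
\]

Proof. Let $x \in x_0 + B_X$ and $y - y_0 \in \operatorname{int}(\eta B_Y)$. For $y \in F(x)$ the claim is clearly satisfied and therefore we assume $y \notin F(x)$. Let $\delta > 0$ be arbitrary and choose $y_\delta \in F(x)$ such that
\[
0 < |y_\delta - y| < d(y, F(x)) + \delta. \tag{2.3.4}
\]
Let $\alpha = \eta - |y - y_0| > 0$. Choose any $\varepsilon \in (0, \alpha)$ and set
\[
y_\varepsilon = y + (\alpha - \varepsilon)\,|y - y_\delta|^{-1}\,(y - y_\delta).
\]
Then $|y_\varepsilon - y_0| \le |y - y_0| + (\alpha - \varepsilon) < \eta$, and $y_\varepsilon \in y_0 + \operatorname{int}(\eta B_Y)$. Thus by assumption there exists $x_\varepsilon \in x_0 + B_X$ with $y_\varepsilon \in F(x_\varepsilon)$. For
\[
\lambda = \frac{1}{1 + (\alpha - \varepsilon)\,|y - y_\delta|^{-1}} \in (0, 1),
\]
we find, using convexity of $F$ and the fact that $y_\delta \in F(x)$,
\[
y = (1 - \lambda)\,y_\delta + \lambda\,y_\varepsilon \in (1 - \lambda)F(x) + \lambda F(x_\varepsilon) \subset F((1 - \lambda)x + \lambda x_\varepsilon). \tag{2.3.5}
\]
Since $x$ and $x_\varepsilon$ are contained in $x_0 + B_X$ we have $(1 - \lambda)x + \lambda x_\varepsilon \in x_0 + B_X$. Together with (2.3.5) this implies
\[
d\big(x, F^{-1}(y) \cap (x_0 + B_X)\big) \le |x - ((1 - \lambda)x + \lambda x_\varepsilon)| = \lambda\,|x - x_\varepsilon|.
\]
However, $|x - x_\varepsilon| \le |x - x_0| + |x_0 - x_\varepsilon| \le |x - x_0| + 1$ and
\[
\lambda = \frac{|y - y_\delta|}{|y - y_\delta| + \alpha - \varepsilon} < \frac{|y - y_\delta|}{\alpha - \varepsilon}.
\]
Hence by (2.3.4)
\[
d\big(x, F^{-1}(y) \cap (x_0 + B_X)\big) \le (\eta - |y - y_0| - \varepsilon)^{-1}\,(1 + |x - x_0|)\,(d(y, F(x)) + \delta),
\]
and letting $\varepsilon \to 0^+$, $\delta \to 0^+$ the claim follows.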
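The proof of the stability result below relies on the following set-valued fixed point lemma. For a metric space $(\mathcal{X}, d)$ let $\mathcal{F}(\mathcal{X})$ denote the collection of all nonempty closed subsets of $\mathcal{X}$.

Lemma 2.7. Let $(\mathcal{X}, d)$ be a complete metric space and let $\Phi : \mathcal{X} \to \mathcal{F}(\mathcal{X})$ be a set-valued mapping. Suppose that there exist $x_0 \in \mathcal{X}$ and constants $r > 0$, $\alpha \in (0, 1)$, $\varepsilon \in (0, r)$ such that for all $x_1, x_2$ in the closed ball $\bar B(x_0, r)$ the sets $\Phi(x_1)$, $\Phi(x_2)$ are nonempty,
\[
d_H(\Phi(x_1), \Phi(x_2)) \le \alpha\,d(x_1, x_2), \quad\text{and}\quad d(x_0, \Phi(x_0)) \le (1 - \alpha)(r - \varepsilon).
\]
Then there exists $\bar x \in \bar B(x_0, r)$ such that $\bar x \in \Phi(\bar x)$ and $d(x_0, \bar x) \le (1 - \alpha)^{-1}\,d(x_0, \Phi(x_0)) + \varepsilon$. If moreover
\[
d_H(\Phi(S_1), \Phi(S_2)) \le \alpha\,d_H(S_1, S_2) \tag{2.3.6}
\]
for all nonempty closed subsets $S_1$ and $S_2$ of $\bar B(x_0, r)$, then the sequence $S_k = \Phi(S_{k-1})$, with $S_0 = \{x_0\}$, converges in the Hausdorff metric to a set $\bar S$ satisfying $\bar S = \Phi(\bar S)$.

Proof. Let $\delta = d(x_0, \Phi(x_0))$ and set $\gamma = \alpha^{-1}(1 - \alpha)\,\varepsilon$. By induction one proves the existence of a sequence $x_{k+1} \in \Phi(x_k)$, $k = 0, 1, \ldots$, such that $x_k \in \bar B(x_0, r)$ and
\begin{align*}
d(x_{k+1}, x_k) &\le \alpha^k \delta + (1 - 2^{-(k+1)})\,\gamma\,\alpha^{k+1}, \\
d(x_{k+1}, x_0) &\le (1 - \alpha^{k+1})(r - \varepsilon) + \gamma \sum_{i=1}^{k+1} (1 - 2^{-i})\,\alpha^i. \tag{2.3.7}
\end{align*}
In fact, choose $x_1 \in \Phi(x_0)$ with
\[
d(x_0, x_1) \le \delta + \tfrac{\gamma}{2}\,\alpha \le (1 - \alpha)(r - \varepsilon) + \tfrac{\gamma}{2}\,\alpha \le r.
\]
Hence $x_1 \in \bar B(x_0, r)$ and (2.3.7) holds for $k = 0$. Now suppose that (2.3.7) holds with $k$ replaced by $k - 1$. Then $d(x_0, x_k) < r$ and hence $\Phi(x_k)$ is nonempty and there exists $x_{k+1} \in \Phi(x_k)$ satisfying $d(x_{k+1}, x_k) \le d(\Phi(x_k), x_k) + 2^{-(k+1)}\gamma\,\alpha^{k+1}$. Consequently
\begin{align*}
d(x_{k+1}, x_k) &\le d_H(\Phi(x_k), \Phi(x_{k-1})) + 2^{-(k+1)}\gamma\,\alpha^{k+1} \\
&\le \alpha\,d(x_k, x_{k-1}) + 2^{-(k+1)}\gamma\,\alpha^{k+1} \le \alpha^k \delta + (1 - 2^{-(k+1)})\,\gamma\,\alpha^{k+1},
\end{align*}
and
\[
d(x_{k+1}, x_0) \le d(x_{k+1}, x_k) + d(x_k, x_0) \le \gamma \sum_{i=1}^{k+1} (1 - 2^{-i})\,\alpha^i + (1 - \alpha^{k+1})(r - \varepsilon),
\]
and (2.3.7) holds for all $k$. For $k \ge 0$ and for every $m \ge 1$ we have
\begin{align*}
d(x_{k+m}, x_k) &\le \sum_{i=0}^{m-1} d(x_{k+i+1}, x_{k+i}) \\
&\le \sum_{i=0}^{m-1} \big(\alpha^{k+i}\delta + (1 - 2^{-(k+i+1)})\,\gamma\,\alpha^{k+i+1}\big) \\
&\le \alpha^k (1 - \alpha)^{-1} (\delta + \gamma\alpha),
\end{align*}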
and consequently $\{x_k\}$ is a Cauchy sequence in $\bar B(x_0, r)$. Since $\mathcal{X}$ is complete there exists $\bar x \in \bar B(x_0, r)$ with $\lim_{k\to\infty} x_k = \bar x$ and $d(\bar x, x_0) \le \frac{\delta}{1 - \alpha} + \varepsilon$. To verify the fixed point property, note that for every $k$ we have
\begin{align*}
d(\bar x, \Phi(\bar x)) &\le d(\bar x, x_{k+1}) + d(x_{k+1}, \Phi(\bar x)) \\
&\le d(\bar x, x_{k+1}) + d_H(\Phi(x_k), \Phi(\bar x)) \tag{2.3.8} \\
&\le d(\bar x, x_{k+1}) + \alpha\,d(x_k, \bar x).
\end{align*}
Passing to the limit with respect to $k$ implies that $\bar x \in \Phi(\bar x)$.

We now assume that $\Phi$ also satisfies (2.3.6). Since the set of closed subsets of $\bar B(x_0, r)$ is complete with respect to the Hausdorff metric, the existence of a set $\bar S$ with $\lim S_k = \bar S$ in the Hausdorff metric follows. To argue invariance of $\bar S$ under $\Phi$ note that
\[
d_H(\bar S, \Phi(\bar S)) = \lim_{k\to\infty} d_H(S_k, \Phi(\bar S)) = \lim_{k\to\infty} d_H(\Phi(S_{k-1}), \Phi(\bar S)) \le \lim_{k\to\infty} d_H(S_{k-1}, \bar S) = 0,
\]
and hence $\bar S = \Phi(\bar S)$.

The following result is an important consequence of Theorem 2.6.

Theorem 2.8 (Stability). Let $x_0 \in \Sigma(p_0)$ be a regular point. Then for every $\varepsilon > 0$ there exist neighborhoods $U_\varepsilon$ of $x_0$, $N_\varepsilon$ of $p_0$ such that for $p \in N_\varepsilon$ the set $\Sigma(p)$ is nonempty and for each $x \in U_\varepsilon \cap C$
\[
d(x, \Sigma(p)) \le (M + \varepsilon)\,d(0, \{g(x, p) - K : x \in C\}), \tag{2.3.9}
\]
where the constant $M > 0$ is independent of $\varepsilon > 0$.

Proof. Let $F$ be the closed convex set-valued mapping from $X$ into $Y$ given by
\[
F(x) = \begin{cases}
g'(x_0, p_0)(x - x_0) + g(x_0, p_0) - K & \text{for } x \in C,\\
\emptyset & \text{for } x \notin C.
\end{cases}
\]
The regular point condition implies that there exists an $\eta > 0$ such that
\[
\eta B_Y \subset F(x_0 + B_X).
\]
Here we use Theorem 1.4 and the discussion below Definition 1.5. For $\delta > 0$ and $r > 0$ let $\rho = (\eta - 2\delta r)^{-1}(1 + r)$, where we assume that
\[
2\delta r < \eta \quad\text{and}\quad 2\rho\delta < 1.
\]
Let
\[
h(x, p) = g(x_0, p_0) + g'(x_0, p_0)(x - x_0) - g(x, p)
\]
and choose a neighborhood $N$ of $p_0$ and a closed ball $\bar B(x_0, r)$ of radius $r$ about $x_0$ such that for $x \in U = C \cap \bar B(x_0, r)$ and $p \in N$
\[
|g'(x, p) - g'(x_0, p_0)| \le \delta \quad\text{and}\quad |g(x_0, p) - g(x_0, p_0)| \le \delta r.
\]
As in the proof of Theorem 2.3 it can be shown that
\[
|h(x_1, p) - h(x_2, p)| \le \delta\,|x_1 - x_2|
\]
for all $x_1, x_2 \in U$ and $p \in N$. In particular, if $x \in U$ and $p \in N$, then
\[
|h(x, p)| \le |h(x_0, p)| + |h(x, p) - h(x_0, p)| \le \delta r + \delta r < \eta.
\]
For any $p \in N$ define the set-valued closed convex mapping $\Phi_p$ from $U \to C$ by
\[
\Phi_p(x) = F^{-1}(h(x, p)). \tag{2.3.10}
\]
Note that $x \in \Phi_p(x)$ implies that $g(x, p) \in K$. We argue that Lemma 2.7 is applicable with $\Phi = \Phi_p$ and $\mathcal{X} = C$ endowed with the metric induced by $X$. For every $x \in U$ the set $\Phi_p(x)$ is a closed convex subset of $X$. (Hence, if $X$ is a reflexive Banach space, then the metric projection is well defined and Lemma 2.7 will be applicable with $\varepsilon = 0$.) For every $x_1, x_2 \in U$ we have by Theorem 2.6
\[
d_H(\Phi_p(x_1), \Phi_p(x_2)) \le \rho\,|h(x_1, p) - h(x_2, p)| \le \rho\delta\,|x_1 - x_2|,
\]
where we used that $h(x_i, p) \in F(x_i)$ for $i = 1, 2$. Moreover
\[
d(x_0, \Phi_p(x_0)) \le \rho\,|g(x_0, p) - g(x_0, p_0)| \le \rho\delta r,
\]
and for every pair of sets $S_1$ and $S_2$ in $U$ we have
\[
d_H(\Phi_p(S_1), \Phi_p(S_2)) \le \rho\delta\,d_H(S_1, S_2).
\]
Hence all conditions of Lemma 2.7 are satisfied with $\alpha = 2\rho\delta$ and $\varepsilon = \frac{r(1 - 2\alpha)}{1 - \alpha}$. Consequently there exists a Cauchy sequence $\{x_k\}$ in $C$ such that $x_{k+1} \in \Phi_p(x_k)$ and $\bar x = \lim_{k\to\infty} x_k \in C$ satisfies $\bar x \in \Phi_p(\bar x)$. Moreover, if $S_k = \Phi_p(S_{k-1})$ with $S_0 = \{x_0\}$, then $d_H(S_m, S_k) \to 0$ as $m \ge k \to \infty$ and thus $\bar S = \lim_{k\to\infty} S_k$ satisfies $\bar S = \Phi_p(\bar S)$ and $\bar S \subset \Sigma(p)$. Let
\[
(x, \tilde y) \in \mathcal{M} = \{(x, \tilde y) \in U \times Y : g(x_0, p_0) + g'(x_0, p_0)(x - x_0) + \tilde y \in K\}.
\]
For all $\bar x \in \bar S$ we have by Theorem 2.6
\begin{align*}
d(x, F^{-1}(h(\bar x, p))) &\le \rho\,d(h(\bar x, p), F(x)) \le \rho\,|h(\bar x, p) + \tilde y| \\
&\le \rho\,\big(|g(x, p) - g(x_0, p_0) - g'(x_0, p_0)(x - x_0) - \tilde y| + |h(\bar x, p) - h(x, p)|\big).
\end{align*}
Since $|h(x, p) - h(\bar x, p)| \le \delta\,|x - \bar x|$,
\[
d(x, \bar S) \le \frac{\rho}{1 - \rho\delta}\,|g(x, p) - g(x_0, p_0) - g'(x_0, p_0)(x - x_0) - \tilde y|
\]
for all $(x, \tilde y) \in \mathcal{M}$. Now, for $x \in C$ and $y \in g(x, p) - K$ we let
\[
\tilde y = g(x, p) - g(x_0, p_0) - g'(x_0, p_0)(x - x_0) - y.
\]
Then $(x, \tilde y) \in \mathcal{M}$ and hence
\[
d(x, \Sigma(p)) \le d(x, \bar S) \le \frac{\rho}{1 - \rho\delta}\,|y|.
\]
This holds for all $y \in g(x, p) - K$. Since $\frac{\rho}{1 - \rho\delta} = \frac{1 + r}{\eta - \delta(1 + 3r)}$, we can select for every $\varepsilon > 0$ neighborhoods $N = N_\varepsilon$ and $U = U_\varepsilon$ such that $\frac{\rho}{1 - \rho\delta} \le \eta^{-1} + \varepsilon$. This implies the claim with $M = \eta^{-1}$.

Theorem 2.9 (Lipschitz Continuity of Value Function). Let $x_0 \in \Sigma(p_0)$ be a local minimizer. Assume moreover that $x_0$ is regular and that there exists a neighborhood $U \times N$ of $(x_0, p_0)$ such that
\[
|f(x, p) - f(\tilde x, p_0)| \le L_f\,(|x - \tilde x| + \delta(p, p_0)) \tag{2.3.11}
\]
for $x, \tilde x \in U$ and $p \in N$ and
\[
|g(x, p) - g(x, p_0)| \le L_g\,\delta(p, p_0) \tag{2.3.12}
\]
for $(x, p) \in U \times N$. Then there exist constants $r > 0$ and $L_r$ and a neighborhood $\tilde N$ of $p_0$ such that
\[
|\mu_r(p) - \mu_r(p_0)| \le L_r\,\delta(p, p_0)
\]
for all $p \in \tilde N$.

Proof. Since $x_0$ is a local minimizer of (2.3.1) at $p_0$ there exists $s > 0$ such that
\[
f(x_0, p_0) \le f(x, p_0) \quad\text{for all } x \in \Sigma_s(p_0).
\]
Let $C_s = C \cap \bar B(x_0, s)$. By Theorem 1.4 and the discussion following Definition 1.5 the regular point condition (2.3.3) also holds with $C$ replaced by $C_s$. Applying Theorem 2.8 with $C = C_s$ and $\varepsilon = 1$ there exist neighborhoods $U_1$ of $x_0$ and $N_1$ of $p_0$ such that $\Sigma_s(p)$ is nonempty and
\[
d(x, \Sigma_s(p)) \le (M + 1)\,d(0, \{g(x, p) - K : x \in C_s\}) \tag{2.3.13}
\]
for each $x \in U_1$ and $p \in N_1$. We now choose $r > 0$ and a neighborhood $\tilde N$ of $p_0$ such that
\[
r \le s, \quad \bar B(x_0, r) \subset U_1 \cap U, \quad \tilde N \subset N_1 \cap N, \quad\text{and}\quad 2(M + 1)L_g\,\delta(p, p_0) < r \ \text{ for all } p \in \tilde N. \tag{2.3.14}
\]
By (2.3.13) and (2.3.12) we obtain for all $p \in \tilde N$
\[
d(x_0, \Sigma_s(p)) \le (M + 1)\,|g(x_0, p) - g(x_0, p_0)| \le (M + 1)L_g\,\delta(p, p_0).
\]
Thus there exists an $x(p) \in \Sigma_s(p)$ such that
\[
|x(p) - x_0| \le 2(M + 1)L_g\,\delta(p, p_0) < r
\]
and therefore $x(p) \in \Sigma_r(p)$ and $f(x(p), p) \ge \mu_r(p)$. From (2.3.11)
\[
|f(x(p), p) - f(x_0, p_0)| \le L_r\,\delta(p, p_0),
\]
where $L_r = L_f\,(2(M + 1)L_g + 1)$. Combining these inequalities, we have
\[
\mu_r(p) \le f(x(p), p) \le f(x_0, p_0) + L_r\,\delta(p, p_0) = \mu_r(p_0) + L_r\,\delta(p, p_0) \tag{2.3.15}
\]
for all $p \in \tilde N$. Conversely, let $p \in \tilde N$ and $x \in \Sigma_r(p)$. From (2.3.13) and (2.3.12)
\[
d(x, \Sigma_s(p_0)) \le (M + 1)\,|g(x, p) - g(x, p_0)|.
\]
Hence there exists $\bar x \in \Sigma_s(p_0)$ such that
\[
|x - \bar x| \le 2(M + 1)L_g\,\delta(p, p_0) < r, \tag{2.3.16}
\]
and thus $f(x_0, p_0) \le f(\bar x, p_0)$. From (2.3.11) we deduce that $|f(x, p) - f(\bar x, p_0)| \le L_r\,\delta(p, p_0)$, and hence
\[
\mu_r(p_0) = f(x_0, p_0) \le f(\bar x, p_0) \le f(x, p) + L_r\,\delta(p, p_0) \tag{2.3.17}
\]
for each $x \in \Sigma_r(p)$. The desired estimate follows from (2.3.15) and (2.3.17).

In the above proof we followed the work of Alt [Alt3]. Next, we establish Hölder continuity of the local minimizers $x(p)$ of (2.3.1). Theorem 2.9 provides Lipschitz continuity of the value function under the regular point condition. The following example shows that the local solutions to the perturbed problems may behave rather irregularly.

Example 2.10. Let $X = Y = \mathbb{R}^2$, $P = \mathbb{R}$, $C = \mathbb{R}^2$, $K = \{(y_1, y_2) : y_1 \ge 0,\ y_2 \ge 0\}$, $f(x_1, x_2, p) = x_1 + p\,x_2$, and $g(x_1, x_2, p) = (x_1, x_2)$. For $p_0 = 0$ a local solution to (2.3.1) is given by $x_0 = (0, 1)$ and $\mu_r(p_0) = 0$. Now let $r > 0$ be arbitrary. Then for $p \ne p_0$ the perturbed problem
\[
\min f(x, p), \quad x \in \Sigma_r(p) \tag{2.3.18}
\]
has a unique solution
\[
x(p) = \begin{cases} (0, \max(1 - r, 0)) & \text{if } p > 0,\\ (0, 1 + r) & \text{if } p < 0 \end{cases}
\quad\text{and}\quad
\mu_r(p) = \begin{cases} p\,\max(1 - r, 0) & \text{if } p > 0,\\ p\,(1 + r) & \text{if } p < 0. \end{cases}
\]
This shows that $\mu_r(p)$ is Lipschitz continuous but the distance $|x(p) - x_0|$ is not continuous with respect to $p$ at $p_0 = 0$. The reason for this behavior of $x(p)$ is that the unperturbed
problem does not depend on $x_2$ and all $(0, x_2)$ with $x_2 \ge 0$ are optimal for $p = p_0$. On the other hand if we let $f(x_1, x_2, p) = x_1 + p\,x_2 + \frac{1}{2}(x_2 - 1)^2$, then for $r \ge 1$ and $|p| \le 1$ the solution to (2.3.18) is given by $x(p) = (0, 1 - p)$. Thus the distance $|x(p) - x_0|$ is continuous in $p$.

This example suggests that some kind of local invertibility should be assumed on $f$ in order to ensure continuity of the solutions. We define a condition closely related to sufficient optimality conditions: There exist $\kappa > 0$, $\beta \ge 1$, $\gamma > 0$, and neighborhoods $\hat U$ of $x_0$ and $\hat N$ of $p_0$ such that
\[
f(x, p) \ge f(x_0, p_0) + \kappa\,|x - x_0|^\beta - \gamma\,\delta(p, p_0) \tag{2.3.19}
\]
for all $p \in \hat N$ and $x \in \Sigma(p) \cap \hat U$. The following result shows that this condition is implied by sufficient optimality conditions at $x_0$.

Theorem 2.11. Assume that $x_0$ is regular, that (2.3.11)–(2.3.12) hold, and that there exist constants $\kappa > 0$, $\beta \ge 1$ and a neighborhood $\tilde U$ of $x_0$ such that
\[
f(x, p_0) \ge f(x_0, p_0) + \kappa\,|x - x_0|^\beta \quad\text{for all } x \in \Sigma(p_0) \cap \tilde U. \tag{2.3.20}
\]
Then condition (2.3.19) holds.

Proof. We select $r, s > 0$ and a neighborhood $\tilde N$ of $p_0$ as in (2.3.14) in the proof of Theorem 2.9. We can assume that $\bar B(x_0, s) \subset \tilde U$. Set $\hat U = \bar B(x_0, r)$ and choose $x \in \Sigma_r(p)$, $p \in \tilde N$. From (2.3.16) there exists an $\bar x \in \Sigma_s(p_0)$ such that
\[
|x - \bar x| \le 2(M + 1)L_g\,\delta(p, p_0),
\]
and due to (2.3.17)
\[
f(x, p) \ge f(\bar x, p_0) - L_r\,\delta(p, p_0).
\]
By (2.3.20)
\[
f(\bar x, p_0) \ge f(x_0, p_0) + \kappa\,|\bar x - x_0|^\beta.
\]
Note that
\begin{align*}
|x - x_0|^\beta &\le (|\bar x - x_0| + |\bar x - x|)^\beta \le |\bar x - x_0|^\beta + \beta\,(|\bar x - x_0| + |\bar x - x|)^{\beta - 1}\,|x - \bar x| \\
&\le |\bar x - x_0|^\beta + \beta\,(3s)^{\beta - 1}\,|x - \bar x|.
\end{align*}
Thus,
\[
f(x, p) \ge f(x_0, p_0) + \kappa\,|\bar x - x_0|^\beta - L_r\,\delta(p, p_0) \ge f(x_0, p_0) + \kappa\,|x - x_0|^\beta - \gamma\,\delta(p, p_0),
\]
where $\gamma = L_r + \kappa\beta\,(3s)^{\beta - 1}\,2(M + 1)L_g$, and (2.3.19) follows with $\hat N = \tilde N$.
Condition (2.3.20) can be verified under the following second order sufficient optimality condition due to Maurer and Zowe [MaZo].

Theorem 2.12. Assume that $K$ is a closed convex cone in $Y$ and that $x_0 \in \Sigma(p_0)$ is a regular point, and let
\[
\mathcal{L}(x, \lambda) = f(x, p_0) + \langle\lambda, g(x, p_0)\rangle_{Y^*,Y}
\]
be the Lagrangian for (2.3.1) at $x_0$ with Lagrange multiplier $\lambda_0$. If there exist constants $\omega > 0$, $\bar\beta > 0$ such that
\[
\mathcal{L}''(x_0, \lambda_0, p_0)(h, h) \ge \omega\,|h|^2 \quad\text{for all } h \in S, \tag{2.3.21}
\]
where $S = L(\Sigma(p_0), x_0) \cap \{h \in X : \langle\lambda_0, g'(x_0, p_0)h\rangle_{Y^*,Y} \ge -\bar\beta\,|h|\}$, then there exist $\kappa > 0$ and a neighborhood $\tilde U$ of $x_0$ such that (2.3.20) holds with $\beta = 2$.

For convenience we recall the definition of the linearizing cone:
\[
L(\Sigma(p_0), x_0) = \{x \in C(x_0) : g'(x_0)x \in K(g(x_0))\},
\]
where $K(g(x_0)) = \{\lambda(y - g(x_0)) : y \in K,\ \lambda \ge 0\}$.

Proof. All quantities are evaluated at $p_0$ and this dependence will therefore be suppressed. First we show that every $x \in \Sigma = \Sigma(p_0)$ can be represented as
\[
x - x_0 = h(x) + z(x) \quad\text{with } h(x) \in L(\Sigma, x_0) \text{ and } |z(x)| = o(|x - x_0|), \tag{2.3.22}
\]
where $\frac{o(s)}{s} \to 0$ as $s \to 0^+$. Let $x \in \Sigma$ and expand $g$ at $x_0$:
\[
g(x) - g(x_0) = g'(x_0)(x - x_0) + r(x, x_0) \quad\text{with } |r(x, x_0)| = o(|x - x_0|).
\]
By the generalized open mapping theorem (see Theorem 1.4), there exist $\alpha > 0$ and $k(x) \in K(g(x_0))$ such that for some $z(x) \in \alpha\,|r(x, x_0)|\,(B_X \cap C(x_0))$
\[
r(x, x_0) = g'(x_0)z(x) - k(x).
\]
If we put $h(x) = x - x_0 + z(x)$, then $h(x) \in C(x_0)$, $|z(x)| = o(|x - x_0|)$, and
\[
g'(x_0)h(x) = g'(x_0)(x - x_0) + r(x, x_0) + k(x) = g(x) - g(x_0) + k(x) \in K - g(x_0) + k(x) \subset K(g(x_0)),
\]
i.e., $h(x) \in L(\Sigma, x_0)$. Next, for $x \in \Sigma$
\[
f(x) \ge f(x) + \langle\lambda_0, g(x)\rangle = \mathcal{L}(x, \lambda_0) = \mathcal{L}(x_0, \lambda_0) + \tfrac{1}{2}\,\mathcal{L}''(x_0, \lambda_0)(x - x_0, x - x_0) + r(x - x_0), \tag{2.3.23}
\]
where $|r(x - x_0)| = o(|x - x_0|^2)$, and we used $\langle\lambda_0, g(x)\rangle \le 0$ for $x \in \Sigma$, and $\mathcal{L}'(x_0, \lambda_0) = 0$. Now put $B = \mathcal{L}''(x_0, \lambda_0)$ and $S = L(\Sigma, x_0) \cap \{h \in X : \langle\lambda_0, g'(x_0)h\rangle \ge -\bar\beta\,|h|\}$ in Lemma 2.13. It implies the existence of $\delta_0 > 0$ and $\gamma > 0$ such that $\mathcal{L}''(x_0, \lambda_0)(x - x_0, x - x_0) \ge \delta_0\,|x - x_0|^2$ for all $x - x_0 = h(x) + z(x) \in \Sigma - x_0$ with $|z(x)| \le \gamma\,|h(x)|$ and $\langle\lambda_0, g'(x_0)h(x)\rangle \ge -\bar\beta\,|h(x)|$.
Choose $\delta \in (0, 1)$ satisfying $\delta/(1 - \delta) < \gamma$ and $\rho > 0$ such that $|z(x)| \le \delta\,|x - x_0|$ for $|x - x_0| \le \rho$. Then for $|x - x_0| \le \rho$
\[
|h(x)| \ge |x - x_0| - |z(x)| \ge (1 - \delta)\,|x - x_0|
\]
and thus
\[
|z(x)| \le \frac{\delta}{1 - \delta}\,|h(x)| < \gamma\,|h(x)|. \tag{2.3.24}
\]
Hence $\mathcal{L}''(x_0, \lambda_0)(x - x_0, x - x_0) \ge \delta_0\,|x - x_0|^2$ for all $x - x_0 = h(x) + z(x) \in \Sigma - x_0$ with $|x - x_0| \le \rho$ and $\langle\lambda_0, g'(x_0)h(x)\rangle \ge -\bar\beta\,|h(x)|$. By (2.3.23) there exists $\bar\rho \in (0, \rho]$ such that
\[
f(x) \ge f(x_0) + \frac{\delta_0}{4}\,|x - x_0|^2 \tag{2.3.25}
\]
for all $x - x_0 = h(x) + z(x) \in \Sigma - x_0$ with $|x - x_0| \le \bar\rho$ and $\langle\lambda_0, g'(x_0)h(x)\rangle \ge -\bar\beta\,|h(x)|$.

For the case $x - x_0 = h(x) + z(x) \in \Sigma - x_0$ with $|x - x_0| \le \bar\rho$ and $\langle\lambda_0, g'(x_0)h(x)\rangle < -\bar\beta\,|h(x)|$ we find
\begin{align*}
f(x) - f(x_0) &= f'(x_0)(x - x_0) + \bar r(x - x_0) \\
&= -\langle\lambda_0, g'(x_0)h(x)\rangle - \langle\lambda_0, g'(x_0)z(x)\rangle + \bar r(x - x_0) \\
&\ge \bar\beta\,|h(x)| + r_1(x - x_0),
\end{align*}
where $r_1(x - x_0) = -\langle\lambda_0, g'(x_0)z(x)\rangle + \bar r(x - x_0)$ and $r_1(x - x_0) = o(|x - x_0|)$. Together with (2.3.24) this implies
\[
f(x) \ge f(x_0) + \bar\beta\,(1 - \delta)\,|x - x_0| + r_1(x - x_0)
\]
for $x - x_0 = h(x) + z(x) \in \Sigma - x_0$ with $|x - x_0| \le \bar\rho$ and $\langle\lambda_0, g'(x_0)h(x)\rangle < -\bar\beta\,|h(x)|$. Combined with (2.3.25) we arrive at the desired conclusion.

Lemma 2.13. Let $S$ be a subset of a normed linear space $X$ and let $B : X \times X \to \mathbb{R}$ be a bounded symmetric quadratic form satisfying for some $\omega > 0$
\[
B(h, h) \ge \omega\,|h|^2 \quad\text{for all } h \in S.
\]
Then there exist $\delta_0 > 0$ and $\gamma > 0$ such that
\[
B(h + z, h + z) \ge \delta_0\,|h + z|^2 \quad\text{for all } h \in S,\ z \in X \text{ with } |z| \le \gamma\,|h|.
\]

Proof. Let $b = \|B\|$ and choose $\gamma > 0$ such that $\delta = \omega - 2b\gamma - b\gamma^2 > 0$. Then for all $z \in X$ and $h \in S$ satisfying $|z| \le \gamma\,|h|$
\[
B(h + z, h + z) \ge \omega\,|h|^2 - 2b\,|h|\,|z| - b\,|z|^2 \ge \delta\,|h|^2.
\]
Since $|h + z| \le |h| + |z| \le (1 + \gamma)\,|h|$ we get
\[
B(h + z, h + z) \ge \delta_0\,|h + z|^2
\]
with $\delta_0 = \delta/(1 + \gamma)^2$.

Remark 2.3.1. An example which illustrates the benefit of including the set $\{h \in X : \langle\lambda_0, g'(x_0, p_0)h\rangle_{Y^*,Y} \ge -\bar\beta\,|h|\}$ into the definition of the set $S$ on which the Hessian must be positive definite is given by the one-dimensional example with $f(x) = -x^3 - x$, $g(x) = x$, $K = \mathbb{R}^-$, and $C = \mathbb{R}$. In this case $\Sigma = \mathbb{R}^-$, $x_0 = 0$, $f(x_0) = 0$, $f'(x_0) = -1$, and $f''(x_0) = 0$. We further find that $L(\Sigma, x_0) = K$ and that the Lagrange multiplier is given by $\lambda_0 = 1$. Consequently for $\bar\beta \in (0, 1)$ we have that $S = L(\Sigma, x_0) \cap \{h \in \mathbb{R} : \lambda_0\,g'(x_0)h \ge -\bar\beta\,|h|\} = \{0\}$. Hence (2.3.21) is trivially satisfied. Let us note that $f$ is bounded below as $f(x) \ge f(x_0) + \max(|x|, 2|x|^2)$ for $x \in K$. A similar argument does not hold if $f$ is replaced by $f(x) = -x^3$. In this case $x_0 = 0$, $\lambda_0 = 0$, $f''(x_0) = 0$, $S = K$, and (2.3.21) is not satisfied. Note that in this case we have lack of strict complementarity; i.e., the constraint is active and simultaneously $\lambda_0 = 0$.

Theorem 2.14. Suppose that $x_0 \in \Sigma(p_0)$ is a regular point and that for some $\beta > 0$
\[
f'(x_0)h \ge \beta\,|h| \quad\text{for all } h \in L(\Sigma(p_0), x_0).
\]
Then there exist $\alpha > 0$ and a neighborhood $\tilde U$ of $x_0$ such that
\[
f(x) \ge f(x_0) + \alpha\,|x - x_0| \quad\text{for all } x \in \Sigma(p_0) \cap \tilde U.
\]

Proof. For any $x \in \Sigma(p_0)$ we use the representation
\[
x - x_0 = h(x) + z(x)
\]
from (2.3.22) in the proof of Theorem 2.12. By expansion we have
\[
f(x) - f(x_0) = f'(x_0)h(x) + f'(x_0)z(x) + r(x, x_0)
\]
with $|r(x, x_0)| = o(|x - x_0|)$. The first order condition implies
\[
f(x) - f(x_0) \ge \beta\,|h(x)| + r_1(x, x_0),
\]
where $r_1(x, x_0) = f'(x_0)z(x) + r(x, x_0)$ and $|r_1(x, x_0)| = o(|x - x_0|)$. Choose a neighborhood $\tilde U$ of $x_0$ such that $|z(x)| \le \frac{1}{2}\,|x - x_0|$ and $|r_1(x, x_0)| \le \frac{\beta}{4}\,|x - x_0|$ for $x \in \Sigma(p_0) \cap \tilde U$. Then $|h(x)| \ge \frac{1}{2}\,|x - x_0|$ and $f(x) - f(x_0) \ge \frac{\beta}{4}\,|x - x_0|$ for $x \in \Sigma(p_0) \cap \tilde U$.
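The constants in Lemma 2.13 can be checked on a small example. The following is a minimal sketch under illustrative assumptions that are not part of the text: the indefinite form $B(v, v) = v_1^2 - v_2^2$ on $\mathbb{R}^2$ (so $b = \|B\| = 1$), the set $S = \{(t, 0)\}$ on which $B(h, h) = |h|^2$ (so $\omega = 1$), and the choice $\gamma = 0.3$. The helper names are hypothetical. The sketch samples pairs $h \in S$, $|z| \le \gamma\,|h|$ and compares the smallest observed quotient $B(h + z, h + z)/|h + z|^2$ with $\delta_0 = \delta/(1 + \gamma)^2$ from the proof.

```python
import math
import random


def lemma_constants(omega, b, gamma):
    """Return (delta, delta0) as chosen in the proof of Lemma 2.13."""
    delta = omega - 2.0 * b * gamma - b * gamma**2
    if delta <= 0.0:
        raise ValueError("gamma too large: need omega - 2*b*gamma - b*gamma**2 > 0")
    return delta, delta / (1.0 + gamma) ** 2


def B(v):
    """Illustrative indefinite quadratic form on R^2 (an assumption); its norm is 1."""
    return v[0] ** 2 - v[1] ** 2


def worst_ratio(gamma, trials=20000, seed=0):
    """Smallest sampled B(h+z, h+z) / |h+z|^2 with h = (t, 0) and |z| <= gamma * |h|."""
    rng = random.Random(seed)
    worst = math.inf
    for _ in range(trials):
        t = rng.uniform(-1.0, 1.0)
        if t == 0.0:
            continue  # h = 0 carries no information
        radius = gamma * abs(t) * math.sqrt(rng.random())  # |z| <= gamma * |h|
        angle = rng.uniform(0.0, 2.0 * math.pi)
        v = (t + radius * math.cos(angle), radius * math.sin(angle))  # v = h + z
        worst = min(worst, B(v) / (v[0] ** 2 + v[1] ** 2))
    return worst


if __name__ == "__main__":
    delta, delta0 = lemma_constants(omega=1.0, b=1.0, gamma=0.3)
    print(f"delta = {delta:.4f}, delta0 = {delta0:.4f}, sampled minimum = {worst_ratio(0.3):.4f}")
```

The bound $\delta_0$ from the proof is not sharp, so the sampled minimum is expected to lie well above it; sampling can only support the inequality, not prove it.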
Note that for $C = X$ we have $f'(x_0) + g'(x_0, p_0)^*\lambda_0 = 0$ and the second order condition of Theorem 2.12 involves the set of directions $h \in L(\Sigma(p_0), x_0)$ for which the first order condition is violated. Following [Alt3] we now show that (2.3.19) implies stability of local minimizers.
Theorem 2.15. Suppose that the local solution $x_0 \in \Sigma(p_0)$ is a regular point and that (2.3.11), (2.3.12), and (2.3.19) are satisfied. Then there exist real numbers $r > 0$ and $L > 0$ and a neighborhood $N$ of $p_0$ such that the value function $\mu_r$ is Lipschitz continuous at $p_0$ and for each $p \in N$ the following statements hold:

(a) For every sequence $\{x_n\}$ in $\Sigma_r(p)$ with the property that $\lim_{n\to\infty} f(x_n, p) = \mu_r(p)$ it follows that $x_n \in B(x_0, r)$ for all sufficiently large $n$.

(b) If there exists a point $x(p) \in \Sigma_r(p)$ with $\mu_r(p) = f(x(p), p)$, then $x(p) \in B(x_0, r)$, i.e., $x(p)$ is a local minimizer of (2.3.1) and
\[
|x(p) - x_0| \le L\,\delta(p, p_0)^{1/\beta}.
\]

Proof. We choose $r > 0$ and a neighborhood $\tilde N$ of $p_0$ as in (2.3.14) in the proof of Theorem 2.9. Without loss of generality we can assume that $\tilde N \times \bar B(x_0, r) \subset \hat N \times \hat U$ with $\hat N$, $\hat U$ determined from (2.3.19) and that
\[
\kappa\,r^\beta > (L_r + \gamma)\,\delta(p, p_0) \tag{2.3.26}
\]
for all $p \in \tilde N$, where $L_r$ is determined in Theorem 2.9. Thus $\mu_r$ is Lipschitz continuous at $p_0$ by Theorem 2.9. Let $p \in \tilde N$ be arbitrary and let $\{x_n\}$ in $\Sigma_r(p)$ be a sequence satisfying $\lim_{n\to\infty} f(x_n, p) = \mu_r(p)$. By (2.3.19) and Lipschitz continuity of $\mu_r$
\begin{align*}
\kappa\,|x_n - x_0|^\beta &\le |f(x_n, p) - f(x_0, p_0)| + \gamma\,\delta(p, p_0) \\
&\le |f(x_n, p) - \mu_r(p)| + |\mu_r(p) - \mu_r(p_0)| + \gamma\,\delta(p, p_0) \\
&\le |f(x_n, p) - \mu_r(p)| + (L_r + \gamma)\,\delta(p, p_0).
\end{align*}
This estimate together with (2.3.26) implies $|x_n - x_0| < r$ for all sufficiently large $n$. This proves (a). Similarly, for $x(p) \in \Sigma_r(p)$ satisfying $\mu_r(p) = f(x(p), p)$ we have
\[
\kappa\,|x(p) - x_0|^\beta \le (L_r + \gamma)\,\delta(p, p_0).
\]
Thus, $|x(p) - x_0| < r$,
\[
|x(p) - x_0| \le \Big(\frac{L_r + \gamma}{\kappa}\,\delta(p, p_0)\Big)^{1/\beta},
\]
and (b) follows.
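The contrast between the two objectives of Example 2.10 can be reproduced numerically. The following is a minimal sketch, not part of the text: it minimizes $f(\cdot, p)$ over $\Sigma_r(p)$ by brute force on a uniform grid, and the function name `minimize_on_grid` and the grid resolution are illustrative assumptions. For the linear objective the distance $|x(p) - x_0|$ is expected to stay near $r$ however small $|p|$ is, while for the objective with the added term $\frac{1}{2}(x_2 - 1)^2$ it is expected to shrink with $|p|$, in line with Theorem 2.15.

```python
import math


def minimize_on_grid(p, r, regularized, n=200):
    """Brute-force minimizer of f(., p) over Sigma_r(p) for Example 2.10.

    Sigma_r(p) = {x1 >= 0, x2 >= 0} intersected with the closed ball of
    radius r about x0 = (0, 1). Returns (minimizer, minimal value) on the grid.
    """
    x0 = (0.0, 1.0)
    lo2, hi2 = max(0.0, 1.0 - r), 1.0 + r        # admissible range for x2
    best_val, best_x = math.inf, None
    for i in range(n + 1):
        x1 = r * i / n                            # x1 in [0, r]
        for j in range(n + 1):
            x2 = lo2 + (hi2 - lo2) * j / n
            if (x1 - x0[0]) ** 2 + (x2 - x0[1]) ** 2 > r * r + 1e-12:
                continue                          # outside the closed ball
            val = x1 + p * x2
            if regularized:
                val += 0.5 * (x2 - 1.0) ** 2      # modified objective of the example
            if val < best_val:
                best_val, best_x = val, (x1, x2)
    return best_x, best_val


if __name__ == "__main__":
    for p in (0.1, 0.01, -0.01, -0.1):
        x_lin, _ = minimize_on_grid(p, r=0.5, regularized=False)
        x_reg, _ = minimize_on_grid(p, r=1.0, regularized=True)
        d_lin = math.hypot(x_lin[0], x_lin[1] - 1.0)
        d_reg = math.hypot(x_reg[0], x_reg[1] - 1.0)
        print(f"p = {p:+.2f}   |x(p) - x0|: linear f = {d_lin:.3f}, modified f = {d_reg:.3f}")
```

A grid search is used only because it needs no assumptions beyond the definitions; its accuracy is limited by the grid spacing.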
2.4 Lipschitz continuity
In this section we investigate Lipschitz continuous dependence of the solution and the Lagrange multipliers with respect to the parameter p in (2.1.1). Throughout this section and the next we assume that C = X and that the requirements on the function spaces specified below (2.1.1) are met.
Theorem 2.16. Assume (H1)–(H4) hold at the local solution $x_0$ of (2.1.1). Then there exist neighborhoods $N = N(p_0)$ of $p_0$ and $U = U(x_0, \lambda_0, \mu_0, \eta_0)$ of $(x_0, \lambda_0, \mu_0, \eta_0)$ and a constant $M$ such that for all $p \in N$ there exists a unique $(x(p), \lambda(p), \mu(p), \eta(p)) \in U$ satisfying
\[
0 \in \begin{cases}
\mathcal{L}'(x, p, \lambda, \mu, \eta),\\
-e(x, p),\\
-g(x, p) + \partial\psi_{\mathbb{R}^{m,+}}(\mu),\\
-\ell(x, p) + \partial\psi_{K^+}(\eta),
\end{cases}
\tag{2.4.1}
\]
and
\[
|(x(p), \lambda(p), \mu(p), \eta(p)) - (x(q), \lambda(q), \mu(q), \eta(q))| \le M\,|p - q| \tag{2.4.2}
\]
for all $p, q \in N$. Moreover, there exists a neighborhood $\tilde N \subset N$ of $p_0$ such that $x(p)$ is a local solution of (2.1.1) if $p \in \tilde N$.

For the proof we shall employ the following lemma.

Lemma 2.17. Consider the quadratic programming problem in a Hilbert space $H$
\[
\min J(x) = \tfrac{1}{2}\,\langle Ax, x\rangle + \langle\tilde a, x\rangle \tag{2.4.3}
\]
subject to $Ex = \tilde b$ in $\tilde Y$ and $Gx - \tilde c \in \tilde K \subset \tilde Z$, where $\tilde Y$, $\tilde Z$ are Banach spaces, $E \in \mathcal{L}(H, \tilde Y)$, $G \in \mathcal{L}(H, \tilde Z)$, and $\tilde K$ is a closed convex cone in $\tilde Z$. If

(a) $\langle A\,\cdot, \cdot\rangle$ is a symmetric bounded quadratic form on $H \times H$ and there exists an $\omega > 0$ such that $\langle Ax, x\rangle \ge \omega\,|x|_H^2$ for all $x \in \ker(E)$, and

(b) for all $(\tilde b, \tilde c) \in \tilde Y \times \tilde Z$ the set $S(\tilde b, \tilde c) = \{x \in H : Ex = \tilde b,\ Gx - \tilde c \in \tilde K\}$ is nonempty,

then there exists a unique solution to (2.4.3) satisfying
\[
\langle Ax + \tilde a, v - x\rangle \ge 0 \quad\text{for all } v \in S(\tilde b, \tilde c).
\]
Moreover, if $x$ is a regular point, then there exists $(\tilde\lambda, \tilde\eta) \in \tilde Y^* \times \tilde Z^*$ such that
\[
0 \in \begin{pmatrix} \tilde a \\ \tilde b \\ \tilde c \end{pmatrix}
+ \begin{pmatrix} A & E^* & G^* \\ -E & 0 & 0 \\ -G & 0 & 0 \end{pmatrix}
\begin{pmatrix} x \\ \tilde\lambda \\ \tilde\eta \end{pmatrix}
+ \begin{pmatrix} 0 \\ 0 \\ \partial\psi_{\tilde K^+}(\tilde\eta) \end{pmatrix}.
\]

Proof. Since $\tilde K$ is closed and convex, $S(\tilde b, \tilde c)$ is closed and convex as well. Since $S(\tilde b, \tilde c)$ is nonempty, there exists a unique $w \in \operatorname{range} E^*$ such that $Ew = \tilde b$. Moreover, every $x \in S(\tilde b, \tilde c)$ can be expressed as $x = w + y$, where $y \in \ker(E)$. From (a) it follows that $J$ is coercive and bounded below on $S(\tilde b, \tilde c)$. Hence there exists a bounded minimizing sequence $\{x_n\}$ in $S(\tilde b, \tilde c)$ such that $\lim_{n\to\infty} J(x_n) = \inf_{x \in S(\tilde b, \tilde c)} J(x)$. Boundedness of
$x_n$ and weak sequential closedness of $S(\tilde b, \tilde c)$ imply the existence of a subsequence of $x_n$ that converges weakly to some $x$ in $S(\tilde b, \tilde c)$. Since $J$ is weakly lower semicontinuous, $J(x) \le \liminf_{n\to\infty} J(x_n)$. Hence $x$ minimizes $J$ over $S(\tilde b, \tilde c)$.

For $v \in S(\tilde b, \tilde c)$ and $t \in (0, 1)$ we have $x + t(v - x) \in S(\tilde b, \tilde c)$. Thus,
\[
0 \le J(x + t(v - x)) - J(x) = t\,\langle Ax + \tilde a, v - x\rangle + \frac{t^2}{2}\,\langle A(v - x), v - x\rangle
\]
and letting $t \to 0^+$ we have
\[
\langle Ax + \tilde a, v - x\rangle \ge 0 \quad\text{for all } v \in S(\tilde b, \tilde c).
\]
The last assertion follows from Theorem 1.6.

Proof of Theorem 2.16. The proof of the first assertion of the theorem is based on the implicit function theorem of Robinson for generalized equations; see Theorem 2.3. It requires us to verify the strong regularity condition for the linearized form of (2.4.1) which is given by
\[
0 \in \begin{cases}
\mathcal{L}'(x_0, p_0, \lambda_0, \mu_0, \eta_0) + A(x - x_0) + E^*(\lambda - \lambda_0) + G^*(\mu - \mu_0) + L^*(\eta - \eta_0),\\
-e(x_0, p_0) - E(x - x_0),\\
-g(x_0, p_0) - G(x - x_0) + \partial\psi_{\mathbb{R}^{m,+}}(\mu),\\
-\ell(x_0, p_0) - L(x - x_0) + \partial\psi_{K^+}(\eta),
\end{cases}
\]
where the operators $A$, $E$, $G$, and $L$ are defined in the introduction to this chapter. If we define the multivalued operator $\mathcal{T}$ from $X \times W \times \mathbb{R}^m \times Z$ to itself by
\[
\mathcal{T}\begin{pmatrix} x \\ \lambda \\ \mu \\ \eta \end{pmatrix}
= \begin{pmatrix} A & E^* & G^* & L^* \\ -E & 0 & 0 & 0 \\ -G & 0 & 0 & 0 \\ -L & 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} x \\ \lambda \\ \mu \\ \eta \end{pmatrix}
+ \begin{pmatrix} f'(x_0, p_0) - Ax_0 \\ Ex_0 \\ -g(x_0, p_0) + Gx_0 \\ -\ell(x_0, p_0) + Lx_0 \end{pmatrix}
+ \begin{pmatrix} 0 \\ 0 \\ \partial\psi_{\mathbb{R}^{m,+}}(\mu) \\ \partial\psi_{K^+}(\eta) \end{pmatrix},
\]
then the linearization can be expressed as
\[
0 \in \mathcal{T}(x, \lambda, \mu, \eta).
\]
Since the constraint associated with $g^-(x, p_0) \le 0$ is inactive in a neighborhood of $x_0$, strong regularity of $\mathcal{T}$ at $(x_0, \lambda_0, \mu_0, \eta_0)$ easily follows from strong regularity of the multivalued mapping $T$ from $X \times (W \times \mathbb{R}^{m_1}) \times \mathbb{R}^{m_2} \times Z$ into itself at $(x_0, (\lambda_0, \mu_0^+), \mu_0^0, \eta_0)$ defined by
\[
T\begin{pmatrix} x \\ \tilde\lambda \\ \mu^0 \\ \eta \end{pmatrix}
= \begin{pmatrix} A & E_+^* & G_0^* & L^* \\ -E_+ & 0 & 0 & 0 \\ -G_0 & 0 & 0 & 0 \\ -L & 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} x \\ \tilde\lambda \\ \mu^0 \\ \eta \end{pmatrix}
+ \begin{pmatrix} f'(x_0, p_0) - Ax_0 \\ E_+x_0 \\ G_0x_0 \\ -\ell(x_0, p_0) + Lx_0 \end{pmatrix}
+ \begin{pmatrix} 0 \\ 0 \\ \partial\psi_{\mathbb{R}^{m_2,+}}(\mu^0) \\ \partial\psi_{K^+}(\eta) \end{pmatrix}.
\]
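Strong regularity of $T$ requires us to show that there exist neighborhoods $\hat V$ of $0$ and $\hat U$ of $(x_0,(\tilde\lambda_0,\mu_0^+),\mu_0^0,\eta_0)$ in $X \times (W \times \mathbb{R}^{m_1}) \times \mathbb{R}^{m_2} \times Z$ such that $T^{-1}(\hat V) \cap \hat U$ is single valued and Lipschitz continuous from $\hat V$ to $\hat U$. We now divide the proof into several steps.

Existence. We show the existence of a solution to
\[
(\alpha,\beta,\gamma,\delta) \in T(x,\tilde\lambda,\mu^0,\eta)
\]
for $(\alpha,\beta,\gamma,\delta) \in \hat V$. Observe that this is equivalent to
\[
0 \in \begin{pmatrix} a\\ b\\ c\\ d \end{pmatrix}
+ \begin{pmatrix} A & E_+^* & G_0^* & L^*\\ -E_+ & 0 & 0 & 0\\ -G_0 & 0 & 0 & 0\\ -L & 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} x\\ \tilde\lambda\\ \mu^0\\ \eta \end{pmatrix}
+ \begin{pmatrix} 0\\ 0\\ \partial\psi_{\mathbb{R}^{m_2,+}}(\mu^0)\\ \partial\psi_{K^+}(\eta) \end{pmatrix},
\tag{2.4.4}
\]
where
\[
(a,b,c,d) = \bigl(f'(x_0,p_0) - Ax_0 - \alpha,\; E_+x_0 - \beta,\; G_0x_0 - \gamma,\; -\ell(x_0,p_0) + Lx_0 - \delta\bigr).
\tag{2.4.5}
\]
To solve (2.4.4) we introduce the quadratic optimization problem with linear constraints:
\[
\min\; \tfrac12\langle Ax, x\rangle + \langle a, x\rangle
\quad \text{subject to } E_+x = b,\; G_0x \le c,\; \text{and } Lx - d \in K.
\tag{2.4.6}
\]
We verify the conditions in Lemma 2.17 for (2.4.6) with $\tilde Y = W \times \mathbb{R}^{m_1}$, $\tilde Z = \mathbb{R}^{m_2} \times Z$, $E = E_+$, $G = (G_0, L)$, and $\tilde K = \mathbb{R}^{m_2,-} \times K$. Let $S$ be the feasible set of (2.4.6):
\[
S(\alpha,\beta,\gamma,\delta) = \{x \in X : E_+x = b,\; G_0x \le c,\; Lx - d \in K\},
\]
where the relationship between $(\alpha,\beta,\gamma,\delta)$ and $(a,b,c,d)$ is given by (2.4.5). Clearly, $x_0 \in S(0)$ and moreover $x_0$ is a regular point. From Theorem 2.8 it follows that there exists a neighborhood $V$ of $0$ such that for all $(\alpha,\beta,\gamma,\delta) \in V$ the set $S(\alpha,\beta,\gamma,\delta)$ is nonempty. Lemma 2.17 implies the existence of a unique solution $x$ to (2.4.6) for every $(\alpha,\beta,\gamma,\delta) \in V$. By (H1)–(H2) and Theorems 2.12, 2.11, and 2.14 the solution $x$ is Hölder continuous with exponent $\tfrac12$ with respect to $(\alpha,\beta,\gamma,\delta) \in V$. Next we show that $x = x(\alpha,\beta,\gamma,\delta)$ is a regular point for (2.4.6), i.e.,
\[
0 \in \operatorname{int}\left\{ \begin{pmatrix} 0\\ G_0x - c\\ Lx - d \end{pmatrix} + \begin{pmatrix} E_+\\ G_0\\ L \end{pmatrix}(X - x) + \begin{pmatrix} 0\\ \mathbb{R}^{m_2,+}\\ -K \end{pmatrix} \right\}.
\]
This is equivalent to
\[
0 \in \operatorname{int}\left\{ \begin{pmatrix} \beta\\ \gamma\\ \ell(x_0,p_0) + \delta \end{pmatrix} + \begin{pmatrix} E_+\\ G_0\\ L \end{pmatrix}(X - x_0) + \begin{pmatrix} 0\\ \mathbb{R}^{m_2,+}\\ -K \end{pmatrix} \right\}.
\]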
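Since $x_0$ is a regular point, there exists a neighborhood $\tilde V \subset V$ of $0$ such that this inclusion holds for all $(\alpha,\beta,\gamma,\delta) \in \tilde V$. Hence the existence of a solution to (2.4.4) follows from Lemma 2.17.

Uniqueness. To argue uniqueness, assume that $(x_i,\tilde\lambda_i,\mu_i^0,\eta_i)$, $i = 1, 2$, are solutions to (2.4.4). It follows that
\[
\begin{aligned}
&\langle A(x_1 - x_2) + E_+^*(\tilde\lambda_1 - \tilde\lambda_2) + G_0^*(\mu_1^0 - \mu_2^0) + L^*(\eta_1 - \eta_2),\; x_1 - x_2\rangle \le 0,\\
&E_+(x_1 - x_2) = 0,\\
&\langle G_0(x_1 - x_2),\; \mu_1^0 - \mu_2^0\rangle \ge 0,\\
&\langle L(x_1 - x_2),\; \eta_1 - \eta_2\rangle \ge 0.
\end{aligned}
\]
Using (H2) we find
\[
\kappa\,|x_1 - x_2|^2 \le \langle A(x_1 - x_2), x_1 - x_2\rangle \le -\langle G_0^*(\mu_1^0 - \mu_2^0), x_1 - x_2\rangle - \langle L^*(\eta_1 - \eta_2), x_1 - x_2\rangle \le 0,
\]
which implies that $x_1 = x_2 = x$. To show uniqueness of the remaining components, we observe that
\[
E_+^*(\tilde\lambda_1 - \tilde\lambda_2) + G_0^*(\mu_1^0 - \mu_2^0) + L^*(\eta_1 - \eta_2) = 0,
\qquad
\langle \eta_1 - \eta_2,\; L(x - x_0) + \ell(x_0,p_0) + \delta\rangle = 0,
\]
or equivalently
\[
\mathcal{E}^*(z)(\tilde\lambda_1 - \tilde\lambda_2,\; \mu_1^0 - \mu_2^0,\; \eta_1 - \eta_2) = 0,
\]
where $z = L(x - x_0) + \ell(x_0,p_0) + \delta$ and $\mathcal{E}$ is defined below (2.1.6). Due to continuous dependence of $x$ on $(\alpha,\beta,\gamma,\delta)$ we can assure that $z \in O$ for all $(\alpha,\beta,\gamma,\delta) \in \tilde V$. Hence (H3) implies that $(\tilde\lambda_1 - \tilde\lambda_2, \mu_1^0 - \mu_2^0, \eta_1 - \eta_2) = 0$.

Continuity. By (H3) the operator $\mathcal{E}^*(\ell(x_0,p_0))$ has closed range and is injective. Hence there exists $\varepsilon > 0$ such that $|\mathcal{E}^*(\ell(x_0,p_0))\Lambda| \ge \varepsilon|\Lambda|$ for all $\Lambda = (\tilde\lambda,\mu^0,\eta)$. Since $\|\mathcal{E}^*(\ell(x_0,p_0)) - \mathcal{E}^*(z)\| \le |z - \ell(x_0,p_0)|$, there exists a neighborhood of $0$ in $X \times (W \times \mathbb{R}^{m_1}) \times \mathbb{R}^{m_2} \times Z$, again denoted by $\tilde V$, such that
\[
|\mathcal{E}^*(z_\delta)\Lambda| \ge \frac{\varepsilon}{2}\,|\Lambda|
\tag{2.4.7}
\]
for all $(\alpha,\beta,\gamma,\delta) \in \tilde V$, where $z_\delta = \ell(x_0,p_0) + Lx - Lx_0 + \delta$. Any solution $(x,\tilde\lambda,\mu^0,\eta)$ satisfies
\[
\mathcal{E}^*(z_\delta)(\tilde\lambda,\mu^0,\eta) = (-Ax - a,\; 0).
\]
It thus follows from (2.4.7) that there exists a constant $k_1$ such that for all $(\alpha,\beta,\gamma,\delta) \in \tilde V$ the solution $(x,\tilde\lambda,\mu^0,\eta)$ satisfies
\[
|(x,\tilde\lambda,\mu^0,\eta)|_{X \times (W \times \mathbb{R}^{m_1}) \times \mathbb{R}^{m_2} \times Z} \le k_1.
\]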
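Henceforth let $(\alpha_i,\beta_i,\gamma_i,\delta_i) \in \tilde V$ and let $(x_i,\tilde\lambda_i,\mu_i^0,\eta_i)$ denote the corresponding solution to (2.4.4) for $i = 1, 2$. Then
\[
\mathcal{E}^*(z_{\delta_1})\begin{pmatrix} \tilde\lambda_1 - \tilde\lambda_2\\ \mu_1^0 - \mu_2^0\\ \eta_1 - \eta_2 \end{pmatrix}
= \begin{pmatrix} \alpha_1 - \alpha_2 + A(x_2 - x_1)\\ \langle \eta_2,\; L(x_2 - x_1) + \delta_2 - \delta_1\rangle \end{pmatrix}.
\]
By (2.4.7) this implies that there exists a constant $k_2$ such that
\[
|(\tilde\lambda_1 - \tilde\lambda_2,\; \mu_1^0 - \mu_2^0,\; \eta_1 - \eta_2)|_{(W \times \mathbb{R}^{m_1}) \times \mathbb{R}^{m_2} \times Z} \le k_2\,(|\alpha_1 - \alpha_2| + |\delta_1 - \delta_2| + |x_1 - x_2|).
\tag{2.4.8}
\]
Now, from the first equation in (2.4.4) we have
\[
\langle A(x_1 - x_2) + E_+^*(\tilde\lambda_1 - \tilde\lambda_2) + G_0^*(\mu_1^0 - \mu_2^0) + L^*(\eta_1 - \eta_2) + a_1 - a_2,\; x_1 - x_2\rangle \le 0.
\tag{2.4.9}
\]
We also find
\[
\begin{aligned}
\langle \mu_1^0 - \mu_2^0,\; G_0(x_1 - x_2)\rangle
&= \langle \mu_1^0, c_1 - c_2\rangle + \langle \mu_1^0, c_2 - G_0x_2\rangle - \langle \mu_2^0, G_0x_1 - c_1\rangle - \langle \mu_2^0, c_1 - c_2\rangle\\
&\ge \langle \mu_1^0 - \mu_2^0,\; c_1 - c_2\rangle
\end{aligned}
\tag{2.4.10}
\]
and similarly
\[
\langle \eta_1 - \eta_2,\; L(x_1 - x_2)\rangle \ge \langle \eta_1 - \eta_2,\; d_1 - d_2\rangle.
\tag{2.4.11}
\]
Representing $x_1 - x_2 = v + w$ with $v \in \ker(E_+)$ and $w \in \operatorname{range}(E_+^*)$, one obtains $E_+(x_1 - x_2) = E_+w = b_1 - b_2$. Thus
\[
|w| \le k_3\,|b_1 - b_2|
\tag{2.4.12}
\]
for some $k_3$, and from (H2) and (2.4.9)–(2.4.11)
\[
\begin{aligned}
\kappa\,|v|^2 \le \langle Av, v\rangle
&= \langle A(x_1 - x_2), x_1 - x_2\rangle - 2\langle Av, w\rangle - \langle Aw, w\rangle\\
&\le -\langle \tilde\lambda_1 - \tilde\lambda_2, b_1 - b_2\rangle - \langle \mu_1^0 - \mu_2^0, c_1 - c_2\rangle - \langle \eta_1 - \eta_2, d_1 - d_2\rangle\\
&\quad - \langle a_1 - a_2, v + w\rangle - 2\langle Av, w\rangle - \langle Aw, w\rangle.
\end{aligned}
\]
Let $\Lambda = (\tilde\lambda_1 - \tilde\lambda_2,\; \mu_1^0 - \mu_2^0,\; \eta_1 - \eta_2)$. Then
\[
\kappa\,|v|^2 \le |\Lambda|\,|(\beta_1 - \beta_2,\; \gamma_1 - \gamma_2,\; \delta_1 - \delta_2)| + |\alpha_1 - \alpha_2|\,(|v| + |w|) + \|A\|\,(2|v||w| + |w|^2).
\]
It thus follows from (2.4.8), (2.4.12) that there exists a constant $k_4$ such that
\[
|x_1 - x_2| \le k_4\,|(\alpha_1 - \alpha_2,\; \beta_1 - \beta_2,\; \gamma_1 - \gamma_2,\; \delta_1 - \delta_2)|
\]
for all $(\alpha_i,\beta_i,\gamma_i,\delta_i) \in \tilde V$. We apply (2.4.8) once again to obtain Lipschitz continuity of $(x,\tilde\lambda,\mu^0,\eta)$ with respect to $(\alpha,\beta,\gamma,\delta)$ in a neighborhood of the origin. Consequently $T$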
is strongly regular at $(x_0,\lambda_0,\mu_0,\eta_0)$, Robinson's theorem is applicable, and (2.4.1), (2.4.2) follow.

Local solution. We show that there exists a neighborhood $\tilde N$ of $p_0$ such that for $p \in \tilde N$ the second order sufficient optimality condition (2.3.21) is satisfied at $x(p)$, so that $x(p)$ is a local solution of (2.1.1) by Theorem 2.12. Due to (H2) and the smoothness properties of $f$, $e$, $g$ we can assume that
\[
L''(x(p),p,\lambda(p),\eta(p))(h,h) \ge \frac{\kappa}{2}\,|h|^2 \quad \text{for all } h \in \ker E_+
\tag{2.4.13}
\]
if $p \in N(p_0)$. Let us define $E_p = (e'(x(p),p),\, g_+'(x(p),p))$ for $p \in N(p_0)$. Due to surjectivity of $E_{p_0}$ and smoothness properties of $e$, $g$ there exists a neighborhood $\tilde N \subset N(p_0)$ of $p_0$ such that $E_p$ is surjective for all $p \in \tilde N$. Lemma 2.13 implies the existence of $\delta_0 > 0$ and $\gamma > 0$ such that
\[
L''(x(p),p,\lambda(p),\eta(p))(h + z, h + z) \ge \delta_0\,|h + z|^2
\]
for all $h \in \ker E_+$ and $z \in X$ satisfying $|z| \le \gamma|h|$. The orthogonal projection onto $\ker E_p$ is given by $P_{\ker E_p} = I - E_p^*(E_pE_p^*)^{-1}E_p$. We can select $\tilde N$ so that
\[
|E_p^*(E_pE_p^*)^{-1}E_p - E_{p_0}^*(E_{p_0}E_{p_0}^*)^{-1}E_{p_0}| \le \frac{\gamma}{1 + \gamma}
\]
for all $p \in \tilde N$. For $x \in \ker E_p$, we have $x = h + z$ with $h \in \ker E_+$ and $z \in (\ker E_+)^\perp$ and $|x|^2 = |h|^2 + |z|^2$. Thus,
\[
|z| \le |E_p^*(E_pE_p^*)^{-1}E_p x - E_{p_0}^*(E_{p_0}E_{p_0}^*)^{-1}E_{p_0} x| \le \frac{\gamma}{1 + \gamma}\,(|h| + |z|)
\]
and hence $|z| \le \gamma|h|$. From (2.4.13) this implies
\[
L''(x(p),p,\lambda(p),\eta(p))(x,x) \ge \delta_0\,|x|^2 \quad \text{for all } x \in \ker E_p
\]
and, since L((p), p) ⊂ ker $E_p$, the second order sufficient optimality condition (2.3.21) is satisfied at $x(p)$.

The first part of the proof of Theorem 2.16 contains a Lipschitz continuity result for linear complementarity problems. In the following corollary we reconsider this result in a modified form which will be used for the convergence analysis of sequential quadratic programming problems. Throughout the following discussion $p_0$ in (2.1.1) (with $C = X$) is fixed and therefore its notation is suppressed. For $(x,\lambda,\mu) \in X \times W^* \times \mathbb{R}^m$ let
\[
L(x,\lambda,\mu) = f(x) + \langle \lambda, e(x)\rangle + \langle \mu, g(x)\rangle;
\]
i.e., the equality and finite rank inequality constraints are realized in the Lagrangian term. For $(x,\lambda,\mu) \in X \times W^* \times \mathbb{R}^m$, $(\bar x,\bar\lambda,\bar\mu) \in X \times W^* \times \mathbb{R}^m$, and $(a,b,c,d) \in X \times W \times \mathbb{R}^m \times Z$
consider
\[
\begin{cases}
\min\; \tfrac12 L''(\bar x,\bar\lambda,\bar\mu)(x - \bar x, x - \bar x) + \langle f'(\bar x) + a,\; x - \bar x\rangle,\\
e(\bar x) + e'(\bar x)(x - \bar x) = b,\\
g(\bar x) + g'(\bar x)(x - \bar x) \le c,\\
\ell(\bar x) + L(x - \bar x) - d \in K.
\end{cases}
\tag{2.4.14}
\]
Let $(\bar x,\bar\lambda,\bar\mu) \in X \times W^* \times \mathbb{R}^m$ and define the operator $G$ from $X \times W^* \times \mathbb{R}^m \times Z^*$ to $X \times W \times \mathbb{R}^m \times Z$ by
\[
G(\bar x,\bar\lambda,\bar\mu)(x,\lambda,\mu,\eta)
= \begin{pmatrix} f'(\bar x)\\ -e(\bar x)\\ -g(\bar x)\\ -\ell(\bar x) \end{pmatrix}
+ \begin{pmatrix} L''(\bar x,\bar\lambda,\bar\mu)(x - \bar x) + e'(\bar x)^*\lambda + g'(\bar x)^*\mu + L^*\eta\\ -e'(\bar x)(x - \bar x)\\ -g'(\bar x)(x - \bar x)\\ -L(x - \bar x) \end{pmatrix}.
\]
Note that
\[
0 \in \begin{pmatrix} a\\ b\\ c\\ d \end{pmatrix} + G(\bar x,\bar\lambda,\bar\mu)(x,\lambda,\mu,\eta) + \begin{pmatrix} 0\\ 0\\ \partial\psi_{\mathbb{R}^{m,+}}(\mu)\\ \partial\psi_{K^+}(\eta) \end{pmatrix}
\tag{2.4.15}
\]
is the first order optimality system for (2.4.14). The following modification of (H2) will be used:

$(\widetilde{\mathrm{H2}})$ there exists $\kappa > 0$ such that $\langle L''(x_0,\lambda_0,\mu_0)x, x\rangle_X \ge \kappa|x|_X^2$ for all $x \in \ker(E_+)$.

Corollary 2.18. Assume that (H1), $(\widetilde{\mathrm{H2}})$, (H3) hold at a local solution of (2.1.1) and that the second derivatives of $f$, $e$, and $g$ are Lipschitz continuous in a neighborhood of $x_0$. Then there exist neighborhoods $U(\xi_0)$ of $\xi_0 = (x_0,\lambda_0,\mu_0,\eta_0)$ in $X \times W^* \times \mathbb{R}^m \times Z^*$, $\hat U(x_0,\lambda_0,\mu_0)$ of $(x_0,\lambda_0,\mu_0)$, and $V$ of the origin in $X \times W \times \mathbb{R}^m \times Z$ and a constant $\tilde K$, such that for all $(\bar x,\bar\lambda,\bar\mu) \in \hat U(x_0,\lambda_0,\mu_0)$ and $q = (a,b,c,d) \in V$, there exists a unique solution $\xi = \xi(\bar x,\bar\lambda,\bar\mu,q) = (x,\lambda,\mu,\eta) \in U(\xi_0)$ of (2.4.15), and $x$ is a local solution of (2.4.14). Moreover, for every pair $q_1, q_2 \in V$ and $(\bar x_1,\bar\lambda_1,\bar\mu_1), (\bar x_2,\bar\lambda_2,\bar\mu_2) \in \hat U(x_0,\lambda_0,\mu_0)$ we have
\[
|\xi(\bar x_1,\bar\lambda_1,\bar\mu_1,q_1) - \xi(\bar x_2,\bar\lambda_2,\bar\mu_2,q_2)| \le \tilde K\,\bigl(|(\bar x_1,\bar\lambda_1,\bar\mu_1) - (\bar x_2,\bar\lambda_2,\bar\mu_2)| + |q_1 - q_2|\bigr).
\]
Proof. From the first part of the proof of Theorem 2.16 it follows that
\[
0 \in \begin{pmatrix} a\\ b\\ c\\ d \end{pmatrix} + G(x_0,\lambda_0,\mu_0)(x,\lambda,\mu,\eta) + \begin{pmatrix} 0\\ 0\\ \partial\psi_{\mathbb{R}^{m,+}}(\mu)\\ \partial\psi_{K^+}(\eta) \end{pmatrix}
\]
admits a unique solution $\xi = (x,\lambda,\mu,\eta)$ which depends on $(a,b,c,d)$ Lipschitz continuously provided that $(a,b,c,d)$ is sufficiently small. The proof of the corollary can be completed by observing that the estimates in the proof of Theorem 2.16 also hold uniformly if $(x_0,\lambda_0,\mu_0)$ is replaced by $(\bar x,\bar\lambda,\bar\mu)$ in a sufficiently small neighborhood $\hat U(x_0,\lambda_0,\mu_0)$ of $(x_0,\lambda_0,\mu_0)$, and $(a,b,c,d)$ is in a sufficiently small neighborhood $V$ of the origin in $X \times W \times \mathbb{R}^m \times Z$.

As an alternative to reconsidering the proof of Theorem 2.16 one can apply Theorem 2.16 to (2.1.1) with $P = (X \times W^* \times \mathbb{R}^m) \times (X \times W \times \mathbb{R}^m \times Z)$, $p = (\bar x,\bar\lambda,\bar\mu,a,b,c,d)$, $p_0 = (x_0,\lambda_0,\mu_0,0)$, and $f$, $e$, $g$, $\ell$ replaced by
\[
\begin{aligned}
\tilde f(x;\bar x,\bar\lambda,\bar\mu,a) &= \tfrac12 L''(\bar x,\bar\lambda,\bar\mu)(x - \bar x, x - \bar x) + \langle f'(\bar x) + a,\; x - \bar x\rangle,\\
\tilde e(x;\bar x,b) &= e(\bar x) + e'(\bar x)(x - \bar x) - b,\\
\tilde g(x;\bar x,c) &= g(\bar x) + g'(\bar x)(x - \bar x) - c,\\
\tilde\ell(x;\bar x,d) &= \ell(\bar x) + L(x - \bar x) - d.
\end{aligned}
\]
Clearly (H1)–(H3) hold for $\tilde e$, $\tilde g$, $\tilde\ell$ at $(x_0,p_0)$ with $p_0 = (x_0,\lambda_0,\mu_0,0)$. Lipschitz continuity of $\tilde f$, $\tilde e$, $\tilde g$, $\tilde\ell$ with respect to $(\bar x,\bar\lambda,\bar\mu,a,b,c,d)$ is a consequence of the general regularity requirements on $f$, $e$, $g$, $\ell$ made at the beginning of the chapter and the assumption that the second derivatives of $f$ and $e$ are Lipschitz continuous in a neighborhood of $x_0$. This implies (H4) for $\tilde e$, $\tilde f$, $\tilde g$, $\tilde\ell$.
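The Lipschitz estimate of Corollary 2.18 can be checked numerically on a toy problem. The following is a minimal sketch under simplifying assumptions: $X = \mathbb{R}^2$, one affine equality constraint, and no inequality or cone constraints, so that the optimality system reduces to a linear KKT system. The matrix `A`, the row `E`, and the helpers `solve_linear` and `solve_kkt` are illustrative choices and do not appear in the text.

```python
from typing import List, Sequence, Tuple


def solve_linear(M: List[List[float]], rhs: Sequence[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / aug[i][i]
    return x


A = [[2.0, 0.5], [0.5, 1.0]]  # hypothetical positive definite Hessian
E = [1.0, 1.0]                # hypothetical equality constraint row


def solve_kkt(a: Sequence[float], b: float) -> Tuple[Tuple[float, float], float]:
    """Solve min 0.5<Ax,x> + <a,x> s.t. Ex = b via  A x + E^T lam = -a,  E x = b."""
    M = [
        [A[0][0], A[0][1], E[0]],
        [A[1][0], A[1][1], E[1]],
        [E[0], E[1], 0.0],
    ]
    sol = solve_linear(M, [-a[0], -a[1], b])
    return (sol[0], sol[1]), sol[2]


def norm(v: Sequence[float]) -> float:
    return sum(t * t for t in v) ** 0.5


# Two perturbations q = (a, b) of the data.
q1 = ((0.1, -0.2), 0.05)
q2 = ((0.3, 0.1), -0.02)
(x1, lam1), (x2, lam2) = solve_kkt(*q1), solve_kkt(*q2)

# Ratio |xi(q1) - xi(q2)| / |q1 - q2| with xi = (x, lam).
dxi = norm([x1[0] - x2[0], x1[1] - x2[1], lam1 - lam2])
dq = norm([q1[0][0] - q2[0][0], q1[0][1] - q2[0][1], q1[1] - q2[1]])
ratio = dxi / dq
print(ratio)
```

In this linear setting the ratio is bounded by the norm of the inverse KKT matrix, which plays the role of the constant $\tilde K$; with inequality constraints the solution map is only piecewise linear, but the same kind of difference quotient can be monitored.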
2.5 Differentiability

In this section we discuss the differentiability properties of the solution $\xi(p) = (x(p),\lambda(p),\mu(p),\eta(p))$ of the optimality system (2.1.3) with respect to $p$. Throughout it is assumed that $p \in N$ so that (2.1.3) has a unique solution $\xi(p)$ in $U$, where $N$ and $U$ are specified in Theorem 2.16. We assume that $C = X$.

Definition 2.19. A function $\phi$ from $P$ to a normed linear space $\tilde X$ is said to have a directional derivative at $p_0 \in P$ if
\[
\lim_{t \to 0^+} \frac{\phi(p_0 + tq) - \phi(p_0)}{t}
\]
exists for all $q \in P$.

Consider the linear generalized equation
\[
0 \in \begin{cases}
L'(x_0,p,\lambda_0,\mu_0,\eta_0) + A(x - x_0) + E^*(\lambda - \lambda_0) + G^*(\mu - \mu_0) + L^*(\eta - \eta_0),\\
-e(x_0,p) - E(x - x_0),\\
-g(x_0,p) - G(x - x_0) + \partial\psi_{\mathbb{R}^{m,+}}(\mu),\\
-\ell(x_0,p) - L(x - x_0) + \partial\psi_{K^+}(\eta).
\end{cases}
\tag{2.5.1}
\]
In the proof of Theorem 2.16 strong regularity of (2.1.4) at $\xi(0) = (x_0,\lambda_0,\mu_0,\eta_0)$ was established. Therefore it follows from Theorem 2.5 that there exist neighborhoods of $p_0$
and $\xi(p_0)$ which, without loss of generality, we again denote by $N$ and $U$, such that (2.5.1) admits a unique solution $\hat\xi(p) = (\hat x(p),\hat\lambda(p),\hat\mu(p),\hat\eta(p))$ in $U$ for every $p \in N$. Moreover we have
\[
|\xi(p) - \hat\xi(p)| \le \alpha(p)\,|p - p_0|,
\tag{2.5.2}
\]
where $\alpha(p) : N \to \mathbb{R}^+$ satisfies $\alpha(p) \to 0$ as $p \to p_0$. Thus if $\hat\xi$ has a directional derivative at $p_0$, then so does $\xi(p)$ and the directional derivatives coincide. We henceforth concentrate on the analysis of directional differentiability of $\hat\xi(p)$. From (2.5.2) and Lipschitz continuity of $p \to \xi(p)$ at $p_0$, it follows that for every $q \in P$
\[
\limsup_{t > 0} \left| \frac{\hat\xi(p_0 + tq) - \hat\xi(p_0)}{t} \right| < \infty.
\]
Hence there exist weak cluster points of $\frac{\hat\xi(p_0 + tq) - \hat\xi(p_0)}{t}$ as $t \to 0^+$. Note that these weak cluster points coincide with those of $\frac{\xi(p_0 + tq) - \xi(p_0)}{t}$ as $t \to 0^+$. We will show that under appropriate conditions, these weak cluster points are strong limit points and we derive equations that they satisfy. The following definition and additional hypotheses will be used.

Definition 2.20. A closed convex set $C$ of a Hilbert space $H$ is called polyhedric at $z \in H$ if
\[
\overline{\bigcup_{\lambda > 0} \lambda(C - Pz) \cap [z - Pz]^\perp} = \overline{\bigcup_{\lambda > 0} \lambda(C - Pz)} \cap [z - Pz]^\perp,
\]
where $P$ denotes the metric projection onto $C$ and $[z - Pz]^\perp$ stands for the orthogonal complement of the subspace spanned by $z - Pz \in H$. Moreover $C$ is called polyhedric if $C$ is polyhedric at every $z \in H$.

(H5) $e(x_0,\cdot)$, $\ell(x_0,\cdot)$, $f'(x_0,\cdot)$, $e'(x_0,\cdot)$, $g'(x_0,\cdot)$, and $\ell'(x_0,\cdot)$ are directionally differentiable at $p_0$.

(H6) $K$ is polyhedric at $\ell(x_0,p_0) + \eta_0$.

(H7) There exists $\nu > 0$ such that $\langle Ax, x\rangle \ge \nu|x|^2$ for all $x \in \ker E$.

(H8) $\begin{pmatrix} E\\ L \end{pmatrix} : X \to W \times Z$ is surjective.

Since every element $z \in Z$ can be decomposed uniquely as $z = z_1 + z_2$ with $z_1 = P_K z$ and $z_2 = P_{K^+} z$ and $\langle z_1, z_2\rangle = 0$ (see [Zar]), (H6) is equivalent to
\[
\overline{\bigcup_{\lambda > 0} \lambda(K - \ell(x_0,p_0)) \cap [\eta_0]^\perp} = \overline{\bigcup_{\lambda > 0} \lambda(K - \ell(x_0,p_0))} \cap [\eta_0]^\perp.
\]
Recall the decomposition of the inequality constraint with finite rank and the notation that was introduced in (2.1.6). Due to the complementarity condition and continuous dependency of $\xi(p)$ on $p$ one can assume that
\[
g^+(x(p),p) = 0, \quad g^-(x(p),p) < 0, \quad \mu^+(p) > 0, \quad \mu^-(p) = 0
\tag{2.5.3}
\]
for $p \in N$. This also holds with $(x(p),\mu(p))$ replaced by $(\hat x(p),\hat\mu(p))$.
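The orthogonal decomposition $z = P_K z + P_{K^+} z$ used above to rewrite (H6) can be illustrated in finite dimensions. The sketch below assumes $Z = \mathbb{R}^4$, takes $K$ to be the nonpositive orthant, and assumes that $K^+$ denotes the polar cone $\{\eta : \langle \eta, k\rangle \le 0 \text{ for all } k \in K\}$, which for this $K$ is the nonnegative orthant. The function names are hypothetical.

```python
from typing import List, Sequence


def proj_K(z: Sequence[float]) -> List[float]:
    """Metric projection onto K = {z : z_i <= 0}."""
    return [min(t, 0.0) for t in z]


def proj_K_plus(z: Sequence[float]) -> List[float]:
    """Metric projection onto the assumed polar cone K^+ = {z : z_i >= 0}."""
    return [max(t, 0.0) for t in z]


z = [1.5, -2.0, 0.0, 0.3]
z1, z2 = proj_K(z), proj_K_plus(z)

# z = z1 + z2 and <z1, z2> = 0
recomposed = [s + t for s, t in zip(z1, z2)]
inner = sum(s * t for s, t in zip(z1, z2))
print(recomposed, inner)
```

With $z = \ell(x_0,p_0) + \eta_0$ the two components are $\ell(x_0,p_0)$ and $\eta_0$, which is why $[z - P_K z]^\perp$ becomes $[\eta_0]^\perp$ in the reformulation of (H6).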
Theorem 2.21. Let (H1)–(H5) hold and let $(\dot x,\dot\lambda,\dot\mu,\dot\eta)$ denote a weak cluster point of $\frac{\xi(p_0 + tq) - \xi(p_0)}{t}$ as $t \to 0^+$. Then $(\dot x,\dot\lambda,\dot\mu,\dot\eta)$ satisfies
\[
0 \in \begin{cases}
L'_p(x_0,p_0,\lambda_0,\mu_0,\eta_0)q + A\dot x + E^*\dot\lambda + G^*\dot\mu + L^*\dot\eta,\\
-e^+_p(x_0,p_0)q - E_+\dot x,\\
-g^0_p(x_0,p_0)q - G_0\dot x + \partial\psi_{\mathbb{R}^{m_2,+}}(\dot\mu^0),\\
\dot\mu^-,\\
\langle \dot\eta, \ell(x_0,p_0)\rangle + \langle \eta_0,\; L\dot x + \ell_p(x_0,p_0)q\rangle.
\end{cases}
\tag{2.5.4}
\]
Proof. As described above, we can restrict ourselves to a weak cluster point of $\frac{\hat\xi(p_0 + tq) - \hat\xi(p_0)}{t}$. We put $p_n = p_0 + t_nq$. By (2.5.1)
\[
\begin{aligned}
0 &= L'(x_0,p_n,\lambda_0,\mu_0,\eta_0) - L'(x_0,p_0,\lambda_0,\mu_0,\eta_0)\\
&\quad + A(\hat x(p_n) - x_0) + E^*(\hat\lambda(p_n) - \lambda_0) + G^*(\hat\mu(p_n) - \mu_0) + L^*(\hat\eta(p_n) - \eta_0),\\
0 &= -e(x_0,p_n) + e(x_0,p_0) - E(\hat x(p_n) - x_0).
\end{aligned}
\]
Dividing these two equations by $t_n > 0$ and letting $t_n \to 0^+$ we obtain the first two equations in (2.5.4). For the third equation note that $\dot\mu^0 \ge 0$ since $\hat\mu^0(p_n) \ge 0$ and $\mu^0(p_0) = 0$. Since $g^0(x_0,p_0) = 0$ we have from (2.5.1)
\[
\langle g^0(x_0,p_n) - g^0(x_0,p_0) + G_0(\hat x(p_n) - x_0),\; z - (\hat\mu^0(p_n) - \mu^0(p_0))\rangle \ge 0
\]
for all $z \in \mathbb{R}^{m_2,+}$. Dividing this inequality by $t_n > 0$ and letting $t_n \to 0^+$ we obtain the third inclusion. The fourth equation is obvious. For the last equation we recall that
\[
\langle \hat\eta(p_n),\; \ell(x_0,p_n) + L(\hat x(p_n) - x_0)\rangle = \langle \eta_0, \ell(x_0,p_0)\rangle = 0.
\]
Thus,
\[
\langle \hat\eta(p_n) - \eta_0,\; \ell(x_0,p_n)\rangle + \langle \hat\eta(p_n),\; L(\hat x(p_n) - x_0)\rangle + \langle \eta_0,\; \ell(x_0,p_n) - \ell(x_0,p_0)\rangle = 0,
\]
which implies the last equality.

Define the value function $V(p)$ by
\[
V(p) = \inf_{x \in C}\,\{f(x,p) : e(x,p) = 0,\; g(x,p) \le 0,\; \ell(x,p) \in K\}.
\]
Then we have the following corollary.

Corollary 2.22 (Sensitivity of Cost). Let (H1)–(H4) hold and assume that $e$, $g$, $\ell$ are continuously differentiable in the sense of Fréchet at $(x_0,p_0)$. Then the Gâteaux derivative
of the value function $V$ at $p_0$ exists and is given by
\[
V'(p_0) = L_p(x_0,p_0,\lambda_0,\mu_0,\eta_0) = f_p(x_0,p_0) + \langle \lambda_0, e_p(x_0,p_0)\rangle + \langle \mu_0, g_p(x_0,p_0)\rangle + \langle \eta_0, \ell_p(x_0,p_0)\rangle.
\]
Proof. Note that $V(p) = f(x(p),p) = L(x(p),p,\lambda(p),\mu(p),\eta(p))$. For $q \in P$ let $\xi(t) = t^{-1}(x(p_0 + tq) - x(p_0))$, $t > 0$. Then there exists a subsequence $t_n$ such that $\lim_{n\to\infty} t_n = 0$ and $\xi(t_n) \to (\dot x,\dot\lambda,\dot\mu,\dot\eta)$. Now, let $p_n = p_0 + t_nq$ and
\[
\begin{aligned}
V(p_n) - V(p_0) &= L(x(p_n),p_n,\lambda(p_n),\mu(p_n),\eta(p_n)) - L(x(p_n),p_0,\lambda(p_n),\mu(p_n),\eta(p_n))\\
&\quad + L(x(p_n),p_0,\lambda(p_n),\mu(p_n),\eta(p_n)) - L(x(p_n),p_0,\lambda(p_0),\mu(p_0),\eta(p_0))\\
&\quad + L(x(p_n),p_0,\lambda(p_0),\mu(p_0),\eta(p_0)) - L(x(p_0),p_0,\lambda(p_0),\mu(p_0),\eta(p_0)).
\end{aligned}
\]
Thus,
\[
\begin{aligned}
\frac{V(p_n) - V(p_0)}{t_n} &= \langle L'(x_0,p_0,\lambda_0,\mu_0,\eta_0), \dot x\rangle + \langle L_p(x_0,p_0,\lambda_0,\mu_0,\eta_0), q\rangle\\
&\quad + \langle \dot\lambda, e(x_0,p_0)\rangle + \langle \dot\mu, g(x_0,p_0)\rangle + \langle \dot\eta, \ell(x_0,p_0)\rangle + O(t_n).
\end{aligned}
\]
Clearly $\langle \dot\lambda, e(x_0,p_0)\rangle = 0$, and considering separately the cases according to (2.5.3) we find $\langle \dot\mu, g(x_0,p_0)\rangle = 0$ as well. Below we verify that
\[
\langle \eta_0,\; L\dot x + \ell_p(x_0,p_0)q\rangle = 0.
\tag{2.5.5}
\]
From Theorem 2.21 this implies that $\langle \dot\eta, \ell(x_0,p_0)\rangle = 0$. To verify (2.5.5) note that from (2.5.1) we have
\[
\langle \hat\eta(t_n),\; \ell(x_0,p_0) - \ell(x_0,p_n) - L(\hat x(t_n) - x_0)\rangle \le 0
\]
and consequently $\langle \eta_0,\; \ell_p(x_0,p_0)q + L\dot x\rangle \ge 0$. Moreover
\[
\langle \eta_0,\; \ell(x_0,p_n) - \ell(x_0,p_0) + L(\hat x(t_n) - x_0)\rangle \le 0
\]
and therefore $\langle \eta_0,\; \ell_p(x_0,p_0)q + L\dot x\rangle \le 0$ and (2.5.5) holds. It follows that
\[
\lim_{n\to\infty} \frac{V(p_n) - V(p_0)}{t_n} = \langle L_p(x_0,p_0,\lambda_0,\mu_0,\eta_0), q\rangle.
\]
Since the limit does not depend on the sequence $\{t_n\}$ the desired result follows.
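The formula $V'(p_0) = L_p(x_0,p_0,\lambda_0,\mu_0,\eta_0)$ of Corollary 2.22 can be sanity-checked on a problem with a closed-form solution. The sketch below is an illustration under assumptions that are not in the text: $X = \mathbb{R}^2$, $f(x) = \tfrac12|x|^2$, a single equality constraint $e(x,p) = x_1 + x_2 - p$, and no inequality or cone constraints. The function names are hypothetical.

```python
from typing import Tuple


def solve(p: float) -> Tuple[Tuple[float, float], float]:
    """Closed-form solution of  min 0.5*(x1^2 + x2^2)  s.t.  x1 + x2 - p = 0."""
    x = (p / 2.0, p / 2.0)
    lam = -p / 2.0  # from stationarity x + lam * (1, 1) = 0
    return x, lam


def value(p: float) -> float:
    """Value function V(p) = f(x(p))."""
    x, _ = solve(p)
    return 0.5 * (x[0] ** 2 + x[1] ** 2)


def lagrangian_p(p: float) -> float:
    """Partial derivative w.r.t. p of L(x, p, lam) = f(x) + lam * (x1 + x2 - p)."""
    _, lam = solve(p)
    return -lam


p0, t = 1.5, 1e-6
fd = (value(p0 + t) - value(p0)) / t  # one-sided difference quotient of V
print(fd, lagrangian_p(p0))
```

Here $V(p) = p^2/4$, so both quantities agree with $p_0/2$ up to the discretization error of the difference quotient; the multiplier alone carries the sensitivity information, and no derivative of $x(p)$ is needed.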
In order to prove strong differentiability of $x(p)$ we use a result due to Haraux [Har] on directional differentiability of the metric projection onto a closed convex set.

Theorem 2.23. Let $C$ be a closed convex set in a Hilbert space $H$ with metric projection $P$ from $H$ onto $C$. Let $\psi$ be an $H$-valued function and assume that $C$ is polyhedric at $\psi(0)$. Set
\[
\hat K(\psi(0)) = \overline{\bigcup_{\lambda > 0} \lambda(C - P\psi(0))} \cap [\psi(0) - P\psi(0)]^\perp,
\]
and denote by $P_{\hat K(\psi(0))}$ the projection onto $\hat K(\psi(0))$. If there exists a sequence $t_n$ such that $\lim_{n\to\infty} t_n = 0^+$ and $\lim_{n\to\infty} \frac{\psi(t_n) - \psi(0)}{t_n} =: \dot\psi$ exists in $H$, then
\[
\lim_{t_n \to 0^+} \frac{P\psi(t_n) - P\psi(0)}{t_n} = P_{\hat K(\psi(0))}\dot\psi.
\]
Proof. Let $\gamma(t) = \frac{P\psi(t) - P\psi(0)}{t}$. Since the metric projection onto a closed convex set in a Hilbert space is a contraction, $\gamma(t_n)$ has a weak limit which is denoted by $\gamma$. Since
\[
\langle \psi(t_n) - P\psi(t_n),\; P\psi(0) - P\psi(t_n)\rangle \le 0
\]
and $P\psi(t_n) = t_n\gamma(t_n) + P\psi(0)$, we obtain
\[
\langle \psi(t_n) - t_n\gamma(t_n) - P\psi(0),\; -t_n\gamma(t_n)\rangle \le 0.
\]
This implies
\[
\left\langle t_n\,\frac{\psi(t_n) - \psi(0)}{t_n} + \psi(0) - t_n\gamma(t_n) - P\psi(0),\; -t_n\gamma(t_n)\right\rangle \le 0
\]
and hence
\[
\begin{aligned}
t_n^2\left\langle \gamma(t_n),\; \gamma(t_n) - \frac{\psi(t_n) - \psi(0)}{t_n}\right\rangle
&\le t_n\,\langle \gamma(t_n),\; \psi(0) - P\psi(0)\rangle\\
&= \langle P\psi(t_n) - P\psi(0),\; \psi(0) - P\psi(0)\rangle \le 0
\end{aligned}
\tag{2.5.6}
\]
for all $n$. Since the norm is weakly lower semicontinuous, (2.5.6) implies that
\[
\langle \gamma,\; \gamma - \dot\psi\rangle \le 0.
\tag{2.5.7}
\]
Employing (2.5.6) again we have
\[
0 \ge \langle \gamma(t_n),\; \psi(0) - P\psi(0)\rangle \ge t_n\left\langle \gamma(t_n),\; \gamma(t_n) - \frac{\psi(t_n) - \psi(0)}{t_n}\right\rangle
\]
for $t_n > 0$. Since $\{\gamma(t_n)\}$ is bounded, this implies $\langle \gamma,\; \psi(0) - P\psi(0)\rangle = 0$, and thus $\gamma \in \hat K(\psi(0))$.
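Let $w \in \bigcup_{\lambda > 0} \lambda(C - P\psi(0)) \cap [\psi(0) - P\psi(0)]^\perp$ be arbitrary. Then there exist $\lambda > 0$ and $u \in C$ such that $w = \lambda(u - P\psi(0))$ and $\langle \psi(0) - P\psi(0),\; u - P\psi(0)\rangle = 0$. Since
\[
\langle \psi(t_n) - t_n\gamma(t_n) - P\psi(0),\; u - P\psi(0) - t_n\gamma(t_n)\rangle \le 0,
\]
we find for $\delta_n = \gamma(t_n) - \gamma$ that
\[
\left\langle t_n\,\frac{\psi(t_n) - \psi(0)}{t_n} + \psi(0) - P\psi(0) - t_n\gamma - t_n\delta_n,\; u - P\psi(0) - t_n\gamma - t_n\delta_n\right\rangle \le 0.
\]
Thus,
\[
\left\langle t_n\,\frac{\psi(t_n) - \psi(0)}{t_n} - t_n\gamma - t_n\delta_n,\; u - P\psi(0) - t_n\gamma - t_n\delta_n\right\rangle \le \langle \psi(0) - P\psi(0),\; t_n\delta_n\rangle
\]
and therefore
\[
\left\langle \frac{\psi(t_n) - \psi(0)}{t_n} - \gamma,\; u - P\psi(0)\right\rangle \le \langle \delta_n,\; u - P\psi(0)\rangle + \langle \psi(0) - P\psi(0),\; \delta_n\rangle + t_nM
\]
for some constant $M$. Letting $t_n \to 0^+$ we have
\[
\langle \dot\psi - \gamma,\; u - P\psi(0)\rangle \le 0.
\]
Since $C$ is polyhedric at $\psi(0)$ it follows that
\[
\langle \dot\psi - \gamma,\; w\rangle \le 0 \quad \text{for all } w \in \hat K(\psi(0)).
\tag{2.5.8}
\]
Combining (2.5.7) and (2.5.8) one obtains
\[
\langle \gamma - \dot\psi,\; \gamma - w\rangle \le 0 \quad \text{for all } w \in \hat K(\psi(0)),
\]
and thus $\gamma = P_{\hat K(\psi(0))}\dot\psi$. To show strong convergence of $\gamma(t_n)$ observe that from (2.5.7) and (2.5.8) $\langle \dot\psi - \gamma, \gamma\rangle = 0$. Hence
\[
\langle \dot\psi, \gamma\rangle = |\gamma|^2 \le \liminf |\gamma(t_n)|^2 \le \limsup \left\langle \frac{\psi(t_n) - \psi(0)}{t_n},\; \gamma(t_n)\right\rangle = \langle \dot\psi, \gamma\rangle,
\]
which implies that $\lim |\gamma(t_n)|^2 = |\gamma|^2$ and completes the proof.

Theorem 2.24 (Sensitivity Equation). Let (H1)–(H8) hold. Then the solution mapping $p \to \xi(p) = (x(p),\lambda(p),\mu(p),\eta(p))$ is directionally differentiable at $p_0$, and the directional derivative $(\dot x,\dot\lambda,\dot\mu,\dot\eta)$ at $p_0$ in direction $q \in P$ satisfies
\[
0 \in \begin{cases}
L'_p(x_0,p_0,\lambda_0,\mu_0,\eta_0)q + A\dot x + E^*\dot\lambda + G_+^*\dot\mu^+ + G_0^*\dot\mu^0 + L^*\dot\eta,\\
-e^+_p(x_0,p_0)q - E_+\dot x,\\
-g^0_p(x_0,p_0)q - G_0\dot x + \partial\psi_{\mathbb{R}^{m_2,+}}(\dot\mu^0),\\
-\ell_p(x_0,p_0)q - L\dot x + \partial\psi_{\hat K^+}(\dot\eta).
\end{cases}
\tag{2.5.9}
\]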
Proof. Let $\{t_n\}$ be a sequence of real numbers with $\lim_{n\to\infty} t_n = 0^+$ and $\operatorname{w-lim}\, t_n^{-1}(\xi(p_0 + t_nq) - \xi(p_0)) = (\dot x,\dot\lambda,\dot\mu,\dot\eta)$. Then $(\dot x,\dot\lambda,\dot\mu,\dot\eta)$ is also a weak cluster point of $t_n^{-1}(\hat\xi(p_0 + t_nq) - \xi(p_0))$. The proof is now given in several steps.

We first show that $\frac{\hat x(t_n) - x(0)}{t_n}$ converges strongly in $X$ as $n \to \infty$ and (2.5.9) holds for all weak cluster points. Let $p_n = p_0 + t_nq$ and define
\[
\phi(t_n) = L'(x_0,p_n,\lambda_0,\mu_0,\eta_0) + f'(x_0,p_0) - Ax_0.
\]
From (2.5.1) we have
\[
0 \in \begin{cases}
L'(x_0,p_n,\lambda_0,\mu_0,\eta_0) + f'(x_0,p_0) + A(\hat x(t_n) - x_0) + E^*\hat\lambda(t_n) + G^*\hat\mu(t_n) + L^*\hat\eta(t_n),\\
-e(x_0,p_n) - E(\hat x(t_n) - x_0),\\
-g(x_0,p_n) - G(\hat x(t_n) - x_0) + \partial\psi_{\mathbb{R}^{m,+}}(\hat\mu(t_n)),\\
-\ell(x_0,p_n) - L(\hat x(t_n) - x_0) + \partial\psi_{K^+}(\hat\eta(t_n)).
\end{cases}
\tag{2.5.10}
\]
By (H8) there exists a unique $w(t_n) \in \operatorname{range}(E^*, L^*)$ such that
\[
\begin{pmatrix} E\\ L \end{pmatrix} w(t_n) = \begin{pmatrix} -e(x_0,p_n) + Ex_0\\ -\ell(x_0,p_n) + Lx_0 \end{pmatrix}.
\]
Due to (H5) and (H8) there exists $\dot w \in X$ such that $\lim t_n^{-1}(w(t_n) - w(0)) = \dot w$. If we define $y(t_n) = \hat x(t_n) - w(t_n)$, then using Lemma 2.7 one verifies that $y(t_n) \in C$, where
\[
C = \{x \in X : Ex = 0 \text{ and } Lx \in K\}.
\]
Observe that
\[
\langle E^*\hat\lambda(t_n) + L^*\hat\eta(t_n),\; c - y(t_n)\rangle \le 0 \quad \text{for all } c \in C
\]
and thus
\[
\langle Ay(t_n) + \phi(t_n) + Aw(t_n) + G^*\hat\mu(t_n),\; c - y(t_n)\rangle \ge 0 \quad \text{for all } c \in C.
\tag{2.5.11}
\]
If we equip $\ker E$ with the inner product
\[
((x,y)) = \langle Ax, y\rangle \quad \text{for } x, y \in \ker E,
\]
then $\ker E$ is a Hilbert space by (H7). Let $A_P = P_{\ker E}\,A\,P_{\ker E}$ and define
\[
\psi(t_n) = -A_P^{-1}P_{\ker E}\bigl(\phi(t_n) + Aw(t_n) + G^*\hat\mu(t_n)\bigr) \in \ker E.
\]
Then $\lim_{t_n \to 0^+} \frac{\psi(t_n) - \psi(0)}{t_n} =: \dot\psi$ exists, and (2.5.11) is equivalent to
\[
((y(t_n) - \psi(t_n),\; c - y(t_n))) \ge 0 \quad \text{for all } c \in C.
\]
Thus $y(t_n) = P_C\psi(t_n)$, where $P_C$ denotes the metric projection in $\ker E$ onto $C$ with respect to the inner product $((\cdot,\cdot))$. For any $h \in \ker E$ we find
\[
\begin{aligned}
((h,\; \psi(0) - P_C\psi(0))) &= \langle h,\; -f'(x_0,p_0) + Ax_0 - Aw(0) - G^*\mu_0 - Ay(0)\rangle\\
&= \langle h,\; -f'(x_0,p_0) - G^*\mu_0 - E^*\lambda_0\rangle = \langle h,\; L^*\eta_0\rangle,
\end{aligned}
\]
and therefore
\[
[\psi(0) - P_C\psi(0)]^\perp = \{h : \langle \eta_0, Lh\rangle = 0\}.
\tag{2.5.12}
\]
Here the orthogonal complement is taken with respect to $((\cdot,\cdot))$, the inner product on $\ker E$. Moreover we have $Ly(0) = \ell(x_0,p_0)$. Hence from Lemma 2.25 below we have
\[
\overline{\bigcup_{\lambda > 0} \lambda(C - P_C\psi(0)) \cap [\psi(0) - P_C\psi(0)]^\perp} = \overline{\bigcup_{\lambda > 0} \lambda(C - P_C\psi(0))} \cap [\psi(0) - P_C\psi(0)]^\perp,
\]
i.e., $C$ is polyhedric with respect to $\psi(0)$. Theorem 2.23 therefore implies that $\lim_{t_n \to 0^+} t_n^{-1}(y(t_n) - y(0)) =: \dot y$ exists and satisfies
\[
((\dot\psi - \dot y,\; v - \dot y)) \ge 0 \quad \text{for all } v \in \hat C(\psi(0)).
\tag{2.5.13}
\]
Moreover $\lim_{t_n \to 0} t_n^{-1}(x(t_n) - x(0))$ exists and will be denoted by $\dot x$.

To verify that $(\dot x,\dot\lambda,\dot\mu,\dot\eta)$ satisfies (2.5.9) it suffices to prove the last inclusion. From (2.5.13) we have
\[
\langle -L'_p(x_0,p_0,\lambda_0,\mu_0,\eta_0)q - A\dot x - G^*\dot\mu,\; v - \dot y\rangle \le 0
\]
and from the first equation of (2.5.4)
\[
\langle L^*\dot\eta,\; v - \dot y\rangle = \langle \dot\eta,\; Lv - L\dot x - \ell_p(x_0,p_0)q\rangle \le 0
\tag{2.5.14}
\]
for all $v \in \hat C(\psi(0)) = \overline{\bigcup_{\lambda > 0} \lambda(C - P_C\psi(0))} \cap [\psi(0) - P_C\psi(0)]^\perp$. Here we used the facts that $\dot\phi = L'_p(x_0,p_0,\lambda_0,\mu_0,\eta_0)q$ and $L\dot w = -\ell_p(x_0,p_0)q$. From (2.5.12), (H8), and $Ly(0) = \ell(x_0,p_0)$ we have
\[
L\,\hat C(\psi(0)) = \overline{\bigcup_{\lambda > 0} \lambda(K - \ell(x_0,p_0))} \cap [\eta_0]^\perp =: \hat K
\]
and therefore by (2.5.14)
\[
\langle \dot\eta,\; v - L\dot x - \ell_p(x_0,p_0)q\rangle \le 0 \quad \text{for all } v \in \hat K.
\tag{2.5.15}
\]
Recall from (2.5.5) that $\langle \eta_0,\; \ell_p(x_0,p_0)q + L\dot x\rangle = 0$. This further implies $\ell_p(x_0,p_0)q + L\dot x \in \hat K$, and from (2.5.15) we deduce $\dot\eta \in \hat K^+$. Consequently $\dot\eta \in \partial\psi_{\hat K}(\ell_p(x_0,p_0)q + L\dot x)$, which is equivalent to $\ell_p(x_0,p_0)q + L\dot x \in \partial\psi_{\hat K^+}(\dot\eta)$ and implies the last inclusion in (2.5.9).
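Next, we show that the weak cluster points of $\frac{\hat\xi(t) - \xi(0)}{t}$ as $t \to 0^+$ are unique. Let $(\dot x_i,\dot\lambda_i,\dot\mu_i,\dot\eta_i)$, $i = 1, 2$, be two weak cluster points. Then by (2.5.9)
\[
\begin{aligned}
0 &= \langle A(\dot x_1 - \dot x_2),\; \dot x_1 - \dot x_2\rangle + \langle \dot\mu_1 - \dot\mu_2,\; G(\dot x_1 - \dot x_2)\rangle + \langle \dot\eta_1 - \dot\eta_2,\; L(\dot x_1 - \dot x_2)\rangle\\
&= \langle A(\dot x_1 - \dot x_2),\; \dot x_1 - \dot x_2\rangle + \langle \dot\mu_1^0,\; -g^0_p(x_0,p_0)q - G_0\dot x_2\rangle + \langle \dot\mu_2^0,\; -g^0_p(x_0,p_0)q - G_0\dot x_1\rangle\\
&\quad + \langle \dot\eta_1,\; -\ell_p(x_0,p_0)q - L\dot x_2\rangle + \langle \dot\eta_2,\; -\ell_p(x_0,p_0)q - L\dot x_1\rangle\\
&\ge \langle A(\dot x_1 - \dot x_2),\; \dot x_1 - \dot x_2\rangle,
\end{aligned}
\]
and by (H2) this implies $\dot x_1 = \dot x_2$. From (2.5.9) we have
\[
\mathcal{E}^*(\ell(x_0,p_0))\begin{pmatrix} \dot{\tilde\lambda}_1 - \dot{\tilde\lambda}_2\\ \dot\mu_1^0 - \dot\mu_2^0\\ \dot\eta_1 - \dot\eta_2 \end{pmatrix} = 0
\]
and hence (H3) implies $(\dot\lambda_1,\dot\mu_1,\dot\eta_1) = (\dot\lambda_2,\dot\mu_2,\dot\eta_2)$. Consequently $\frac{\hat\xi(t) - \xi(0)}{t}$ has a unique weak limit as $t \to 0^+$.

Finally we show that the unique weak limit is also strong. For the $x$-component this was already verified. Note that from (2.5.1)
\[
\mathcal{E}^*(\ell(x_0,p_0))\begin{pmatrix} \hat{\tilde\lambda}(t) - \tilde\lambda_0\\ \hat\mu^0(t) - \mu_0^0\\ \hat\eta(t) - \eta_0 \end{pmatrix}
= \begin{pmatrix} -L'(x_0,p_0 + tq,\lambda_0,\mu_0,\eta_0) - A(\hat x(t) - x_0)\\ \langle \hat\eta(t),\; -\ell(x_0,p_0 + tq) + \ell(x_0,p_0) - L(\hat x(t) - x_0)\rangle \end{pmatrix}.
\]
Dividing this equation by $t$ and noticing that the right-hand side converges strongly as $t \to 0^+$, it follows from (H3) that $t^{-1}(\hat\lambda(t) - \lambda_0,\; \hat\mu(t) - \mu_0,\; \hat\eta(t) - \eta_0)$ converges strongly as $t \to 0^+$ as well.

Lemma 2.25. Assume that (H6) and (H8) hold and let $y(0) \in C$ be such that $Ly(0) = \ell(x_0,p_0)$. Then we have
\[
\overline{\bigcup_{\lambda > 0} \lambda(C - y(0)) \cap \{h \in \ker E : \langle \eta_0, Lh\rangle = 0\}} = \overline{\bigcup_{\lambda > 0} \lambda(C - y(0))} \cap \{h \in \ker E : \langle \eta_0, Lh\rangle = 0\}.
\tag{2.5.16}
\]
Proof. It suffices to verify that the set on the right-hand side of (2.5.16) is contained in the set on the left. For this purpose let $y \in \overline{\bigcup_{\lambda > 0} \lambda(C - y(0))} \cap \{h \in \ker E : \langle \eta_0, Lh\rangle = 0\}$ and decompose $y = y_1 + y_2$ with $y_1 \in \ker\begin{pmatrix} E\\ L \end{pmatrix}$ and $y_2 \in \bigl(\ker\begin{pmatrix} E\\ L \end{pmatrix}\bigr)^\perp$. We need only consider $y_2$. We have $Ly_2 \in \overline{\bigcup_{\lambda > 0} \lambda(K - \ell(x_0,p_0))} \cap [\eta_0]^\perp$ and therefore $Ly_2 \in \overline{\bigcup_{\lambda > 0} \lambda(K - \ell(x_0,p_0)) \cap [\eta_0]^\perp}$ by (H6). Hence there exist sequences $\{\lambda_n\}$ and $\{k_n\}$ with $\lambda_n > 0$, $k_n \in K$, $\langle k_n, \eta_0\rangle = 0$, and $\lim \lambda_n(k_n - \ell(x_0,p_0)) = Ly_2$. Let us define $c_n = \tilde c_n + y(0)$, where $\tilde c_n$ is the unique element in the range of $\begin{pmatrix} E\\ L \end{pmatrix}^*$ satisfying $\begin{pmatrix} E\\ L \end{pmatrix}\tilde c_n = \begin{pmatrix} 0\\ k_n - Ly(0) \end{pmatrix}$. Then we have $E(c_n - y(0)) = 0$, $\langle \eta_0, Lc_n\rangle = \langle \eta_0, k_n\rangle = 0$ and hence the sequence $\lambda_n(c_n - y(0))$ is contained in the set on the left-hand side of (2.5.16). Moreover $\lim_{n\to\infty} \begin{pmatrix} E\\ L \end{pmatrix}(\lambda_n(c_n - y(0))) = \begin{pmatrix} E\\ L \end{pmatrix}y_2$, and since both $\lambda_n(c_n - y(0))$ and $y_2$ are contained in the range of $\begin{pmatrix} E\\ L \end{pmatrix}^*$ we find that $\lim_{n\to\infty} \lambda_n(c_n - y(0)) = y_2$.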
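Theorem 2.23, on which the proof of Theorem 2.24 rests, can be illustrated in finite dimensions. The sketch below assumes $H = \mathbb{R}^4$ and takes $C$ to be the nonnegative orthant, which is a polyhedric set; both choices, the data `psi0` and `dpsi`, and the function names are illustrative and not from the text. For this $C$ the cone $\hat K(\psi(0))$ can be written componentwise: a component is free where $\psi(0)_i > 0$, nonnegative where $\psi(0)_i = 0$, and zero where $\psi(0)_i < 0$.

```python
from typing import List, Sequence


def proj_C(z: Sequence[float]) -> List[float]:
    """Metric projection onto C = {z : z_i >= 0}."""
    return [max(t, 0.0) for t in z]


def proj_critical_cone(psi0: Sequence[float], d: Sequence[float]) -> List[float]:
    """Projection of d onto the cone K_hat(psi0) for the nonnegative orthant."""
    out = []
    for s, di in zip(psi0, d):
        if s > 0.0:
            out.append(di)            # inactive component: direction is free
        elif s == 0.0:
            out.append(max(di, 0.0))  # biactive component: only w_i >= 0 allowed
        else:
            out.append(0.0)           # strictly active component: w_i = 0
    return out


psi0 = [1.0, 0.0, 0.0, -2.0]   # psi(0)
dpsi = [-3.0, 0.7, -0.4, 5.0]  # derivative of psi at 0
t = 1e-3
psi_t = [s + t * d for s, d in zip(psi0, dpsi)]

# Difference quotient of the projection versus the limit predicted by Theorem 2.23.
quotient = [(a - b) / t for a, b in zip(proj_C(psi_t), proj_C(psi0))]
limit = proj_critical_cone(psi0, dpsi)
print(quotient, limit)
```

The biactive components, where $\psi(0)_i = 0$, are where the directional derivative fails to be linear in the direction: replacing `dpsi` by its negative does not simply flip the sign of `limit`.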
2.6 Application to optimal control of an ordinary differential equation

We consider optimal control of the linear ordinary differential equation with a terminal constraint
\[
\begin{aligned}
&\min \int_0^T \hat f(y(t),u(t),\alpha)\,dt\\
&\text{subject to}\\
&\dot y(t) = A(\alpha)y(t) + B(\alpha)u(t) + h(t) \quad \text{on } (0,T),\\
&y(0) = 0, \quad |y(T) - y^d| \le \delta,\\
&u \le z \quad \text{a.e. on } (0,T),
\end{aligned}
\tag{2.6.1}
\]
where $T > 0$, $y(t) \in \mathbb{R}^n$, $u(t) \in \mathbb{R}^m$, $\alpha \in \mathbb{R}^r$, $y^d \in \mathbb{R}^n$, $\delta > 0$, $z \in L^2$, $A(\alpha) \in \mathbb{R}^{n\times n}$, $B(\alpha) \in \mathbb{R}^{n\times m}$, $\hat f : \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^r \to \mathbb{R}$, and $h \in L^2(0,T)$. All function spaces of this section are considered on the interval $(0,T)$ with Euclidean image space of appropriate dimension. The perturbation parameter is given by $p = (\alpha, y^d, h, z) \in P = \mathbb{R}^r \times \mathbb{R}^n \times L^2 \times L^2$ with reference parameter $p_0 = (\alpha_0, y_0^d, h_0, z_0)$. To identify (2.6.1) as a special case of (2.1.1) we set
\[
X = H^1_L \times L^2, \quad W = L^2, \quad Z = L^2, \quad K = \{\phi \in L^2 : \phi \le 0 \text{ a.e.}\},
\]
where $H^1_L = \{\phi \in H^1 : \phi(0) = 0\}$, $x = (y,u)$, and
\[
\begin{aligned}
f(y,u,p) &= \int_0^T \hat f(y,u,\alpha)\,dt,\\
e(y,u,p) &= \dot y - A(\alpha)y - B(\alpha)u - h,\\
g(y,u,p) &= \tfrac12\bigl(|y(T) - y^d|^2 - \delta^2\bigr),\\
\ell(y,u,p) &= u - z.
\end{aligned}
\]
The linearizations $E$, $G$, and $L$ are
\[
\begin{aligned}
E(y_0,u_0,p_0)(h,v) &= \dot h - A(\alpha_0)h - B(\alpha_0)v,\\
G(y_0,u_0,p_0)(h,v) &= \langle y_0(T) - y_0^d,\; h(T)\rangle,\\
L(y_0,u_0,p_0)(h,v) &= v.
\end{aligned}
\]
The following assumptions will imply conditions (H1)–(H8) of this chapter.

(A1) There exists a solution $(y_0,u_0)$ of (2.6.1).
(A2) $A(\cdot)$ and $B(\cdot)$ are Lipschitz continuous and directionally differentiable at $\alpha_0$.

(A3) $(y,u) \to f(y,u,\alpha)$ is twice continuously differentiable in a neighborhood of $(y_0,u_0,\alpha_0) \in H^1_L \times L^2 \times \mathbb{R}^r$.

(A4) There exists $\varepsilon > 0$ such that for the Hessian with respect to $(y,u)$
\[
(h,v)\,\hat f''(y_0,u_0,\alpha_0)\,(h,v)^T \ge \varepsilon|v|^2
\]
for all $(h,v) \in \mathbb{R}^n \times \mathbb{R}^m$.

(A5) There exist a neighborhood $\hat V$ of $(y_0,u_0,\alpha_0) \in H^1_L \times L^2 \times \mathbb{R}^r$ and a constant $\kappa$ such that $|f'(y,u,\alpha_1) - f'(y,u,\alpha_2)| \le \kappa|\alpha_1 - \alpha_2|$ for all $(y,u,\alpha_i) \in \hat V$, $i = 1, 2$, and $f'(y_0,u_0,\cdot)$ is directionally differentiable at $\alpha_0$, where the prime denotes the derivative with respect to $(y,u)$.

(A6) $\left\langle y_0(T) - y_0^d,\; \int_0^T e^{A(\alpha_0)(T-s)}B(\alpha_0)(z_0 - u_0)\,ds\right\rangle \ne 0$.

With (A3) holding, the regularity requirements made at the beginning of the chapter are satisfied. The regular point condition (H1) will be satisfied if for every $(\phi,\rho,\psi) \in L^2 \times \mathbb{R} \times L^2$ there exist $(h,v) \in H^1_L \times L^2$ and $(r^+, r, k) \in \mathbb{R}^+ \times \mathbb{R} \times K$ such that
\[
\begin{cases}
\dot h - A(\alpha_0)h - B(\alpha_0)v = \phi,\\
\langle y_0(T) - y_0^d,\; h(T)\rangle + r^+ + r\,g(y_0,u_0,p_0) = \rho,\\
v - k + r(u_0 - z_0) = \psi.
\end{cases}
\tag{2.6.2}
\]
From the first and third equations we have $v = \psi + k + r(z_0 - u_0)$ and
\[
h(t) = \rho_1(t) + \int_0^t e^{A(\alpha_0)(t-s)}B(\alpha_0)\bigl(k + r(z_0 - u_0)\bigr)\,ds,
\]
where $\rho_1(t) = \int_0^t e^{A(\alpha_0)(t-s)}(B(\alpha_0)\psi + \phi)\,ds$. The second equation in (2.6.2) is equivalent to
\[
\left\langle y_0(T) - y_0^d,\; \int_0^T e^{A(\alpha_0)(T-s)}B(\alpha_0)\bigl(k + r(z_0 - u_0)\bigr)\,ds\right\rangle + r^+ + r\,g(y_0,u_0,p_0) = \rho - \rho_1(T).
\]
If $g(y_0,u_0,p_0) = 0$, then, using (A6), the desired solution for (2.6.2) is obtained by setting $r^+ = k = 0$ and
\[
r = \left\langle y_0(T) - y_0^d,\; \int_0^T e^{A(\alpha_0)(T-s)}B(\alpha_0)(z_0 - u_0)\,ds\right\rangle^{-1}(\rho - \rho_1(T)).
\]
If $g(y_0,u_0,p_0) < 0$ and $\rho - \rho_1 \ge 0$, then the choice $r^+ = \rho - \rho_1$, $r = k = 0$ gives the desired solution. Finally if $g(y_0,u_0,p_0) < 0$ and $\rho - \rho_1 < 0$, then $r = (\rho - \rho_1)\,g(y_0,u_0,p_0)^{-1} > 0$,
$r^+ = 0$, and $k = r(u_0 - z_0) \in K$ give a solution for (2.6.2). Thus the regular point condition holds and implies the existence of a Lagrange multiplier $(\lambda_0,\mu_0,\eta_0) \in L^2 \times \mathbb{R} \times L^2$. For the Lagrangian functional
\[
\begin{aligned}
L(y,u,p_0,\lambda_0,\mu_0,\eta_0) &= \int_0^T \hat f(y,u,\alpha_0)\,dt + \langle \lambda_0,\; \dot y - A(\alpha_0)y - B(\alpha_0)u - h_0\rangle\\
&\quad + \frac{\mu_0}{2}\bigl(|y(T) - y_0^d|^2 - \delta^2\bigr) - \langle \eta_0,\; u - z_0\rangle,
\end{aligned}
\]
the Hessian with respect to $(y,u)$ at $(y_0,u_0)$ in directions $(h,v) \in H^1_L \times L^2$ satisfies
\[
L''(y_0,u_0,p_0,\lambda_0,\mu_0,\eta_0)((h,v),(h,v)) = \int_0^T (h(t),v(t))\,\hat f''(y_0,u_0,\alpha_0)\,(h(t),v(t))^T\,dt + \mu_0|h(T)|^2 \ge \varepsilon|v|^2_{L^2}
\]
by (A4). Let $k_1 > 0$ be chosen such that $|h|_{H^1_L} \le k_1|v|_{L^2}$ for all $(h,v) \in H^1_L \times L^2$ in $\ker E$. Then there exists a constant $k_2 > 0$ such that
\[
L''(y_0,u_0,p_0,\lambda_0,\mu_0,\eta_0)((h,v),(h,v)) \ge k_2\bigl(|h|^2_{H^1_L} + |v|^2_{L^2}\bigr)
\]
for all $(h,v) \in \ker E$ and (H2), (H7) hold.

We turn to the verification of (H3). The case $g(y_0,u_0,\alpha_0) < 0$ is simple and we therefore consider the case $g(y_0,u_0,\alpha_0) = 0$. For arbitrary $(\phi,\rho,\psi) \in L^2 \times \mathbb{R} \times L^2$ existence of a solution $(h,v,r) \in H^1_L \times L^2 \times \mathbb{R}$ to
\[
\begin{cases}
\dot h - A(\alpha_0)h - B(\alpha_0)v = \phi,\\
\langle y_0(T) - y_0^d,\; h(T)\rangle = \rho,\\
v + r(z_0 - u_0) = \psi
\end{cases}
\tag{2.6.3}
\]
must be shown. From the first and third equations $v = \psi - r(z_0 - u_0)$ and
\[
h(t) = \rho_1(t) - r\int_0^t e^{A(\alpha_0)(t-s)}B(\alpha_0)(z_0 - u_0)\,ds.
\]
From the second equation of (2.6.3) and (H7) we find
\[
r = \left\langle y_0(T) - y_0^d,\; \int_0^T e^{A(\alpha_0)(T-s)}B(\alpha_0)(z_0 - u_0)\,ds\right\rangle^{-1}\bigl(\langle y_0(T) - y_0^d,\; \rho_1(T)\rangle - \rho\bigr),
\]
and (H3) holds. Conditions (H4) and (H5) follow from (A2) and (A5). The cone of a.e. nonpositive functions in $L^2$ is polyhedric [Har] and hence (H6) holds. Finally (H8) is simple to check. We note that (A5) and (A6) are not needed if (2.6.1) is considered without terminal constraint.
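The verification of (H1) and (H3) above repeatedly uses the variation of constants formula for the linearized state equation $\dot h = A(\alpha_0)h + B(\alpha_0)v + \phi$, $h(0) = 0$. The following minimal sketch compares that formula with a direct time integration in the scalar case. The numbers `a`, `b`, `T` and the functions `v`, `phi` are hypothetical stand-ins for $A(\alpha_0)$, $B(\alpha_0)$, and the data; they are not taken from the text.

```python
import math

a, b, T = -0.5, 2.0, 1.0  # hypothetical scalar A(alpha0), B(alpha0), final time


def v(s: float) -> float:
    """Hypothetical control direction."""
    return math.sin(s)


def phi(s: float) -> float:
    """Hypothetical right-hand side perturbation."""
    return 1.0


def rhs(t: float, h: float) -> float:
    return a * h + b * v(t) + phi(t)


def rk4(n: int) -> float:
    """Integrate h' = a h + b v + phi, h(0) = 0, with n classical RK4 steps."""
    h, t, dt = 0.0, 0.0, T / n
    for _ in range(n):
        k1 = rhs(t, h)
        k2 = rhs(t + dt / 2, h + dt * k1 / 2)
        k3 = rhs(t + dt / 2, h + dt * k2 / 2)
        k4 = rhs(t + dt, h + dt * k3)
        h += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += dt
    return h


def variation_of_constants(n: int) -> float:
    """h(T) = int_0^T exp(a (T - s)) (b v(s) + phi(s)) ds, composite Simpson, n even."""
    def integrand(s: float) -> float:
        return math.exp(a * (T - s)) * (b * v(s) + phi(s))

    ds = T / n
    total = integrand(0.0) + integrand(T)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * ds)
    return total * ds / 3


h_ode = rk4(1000)
h_formula = variation_of_constants(1000)
print(h_ode, h_formula)
```

In the vector-valued case the scalar exponential would be replaced by the matrix exponential $e^{A(\alpha_0)(T-s)}$; the scalar version is used here only to keep the sketch free of external dependencies.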
Chapter 3

First Order Augmented Lagrangians for Equality and Finite Rank Inequality Constraints

3.1 Generalities
This chapter is devoted to first order augmented Lagrangian methods for constrained problems of the type
\[
\min f(x) \ \text{over } x \in X \quad \text{subject to } e(x) = 0,\ g(x) \le 0,\ \ell(x) \in K,
\tag{3.1.1}
\]
where $f : X \to \mathbb{R}$, $e : X \to W$, $g : X \to \mathbb{R}^m$, $\ell : X \to Z$ with $X$, $W$, and $Z$ real Hilbert spaces, $K$ a closed convex cone with vertex at 0 in $Z$, and $\mathbb{R}^m$ endowed with the natural ordering $x \le 0$ if $x_i \le 0$ for $i = 1, \dots, m$. A sequence of unconstrained problems with the property that their solutions converge to the solution of (3.1.1) will be formulated. As we shall see, first order augmented Lagrangian techniques are related to Lagrangian or duality methods, as well as to the penalty technique. They arise from considering the Lagrangian functional for (3.1.1) and adding a proper penalty term. Differently from pure penalty methods, however, the penalty parameter is not taken to infinity. Rather the penalty term with fixed weight is used to enhance convexity of the Lagrangian functional in the neighborhood of a local minimum and to speed up convergence of pure Lagrangian methods. Affine inequality constraints with infinite-dimensional image space can be added to the problem formulation (3.1.1). Such constraints, however, are not augmented in this chapter.

Central to the analysis is the augmentability property, which is also of interest in its own right. It has received a considerable amount of attention for finite-dimensional optimization problems; see, e.g., [Hes1]. Augmentability of (3.1.1) means that a properly defined penalty term involving the constraints in (3.1.1) can be added to the Lagrangian functional for (3.1.1) such that the resulting augmented Lagrangian functional has the property that its Hessian is positive definite on all of $X$ under minimal assumptions on (3.1.1). We shall present a convergence theory for the first order augmented Lagrangian method applied to (3.1.1) without strict complementarity assumption. This means that we do not assume that $\mu_i^* = 0$ implies $g_i(x^*) < 0$, where $x^*$ denotes a local solution of (3.1.1) and $\mu_i^*$ is the $i$th coordinate of the Lagrange multiplier associated to the inequality constraint $g(x) \le 0$. As an application we shall discuss the fact that in the context of parameter estimation problems the first order
augmented Lagrangian method can be considered as a hybrid method combining the output-matching and equation error methods. In the context of parameter estimation or, equally well, of optimal control problems, the first order augmented Lagrangian method lends itself to vectorization and parallelization in a natural way; see [KuTa]. It will be convenient throughout this section to identify the Hilbert spaces $X$, $W$, and $Z$ with their duals. As a consequence the Lagrange multiplier associated to the equality constraint $e(x) = 0$ is sought in $W$ and that for $\ell(x) \in K$ in $Z$.

Let us now fix those assumptions which are assumed throughout this chapter: There exist $x^* \in X$ and $(\lambda^*, \mu^*, \eta^*) \in W \times \mathbb{R}^m \times Z$ such that
\[
f, e, \text{ and } g \text{ are twice continuously Fréchet differentiable in a neighborhood of } x^*,
\tag{3.1.2}
\]
$x^*$ is a stationary point of (3.1.1) with Lagrange multiplier $(\lambda^*, \mu^*, \eta^*)$, i.e.,
\[
(f'(x^*), h)_X + \langle \lambda^*, e'(x^*)h \rangle_W + \langle \mu^*, g'(x^*)h \rangle_{\mathbb{R}^m} + \langle \eta^*, \ell'(x^*)h \rangle_Z = 0 \quad \text{for all } h \in X,
\tag{3.1.3}
\]
and
\[
\begin{aligned}
& e(x^*) = 0, \quad \langle \mu^*, g(x^*) \rangle_{\mathbb{R}^m} = 0, \quad \mu^* \ge 0, \quad g(x^*) \le 0, \\
& \langle \eta^*, \ell(x^*) \rangle_Z = 0, \quad \eta^* \in K^+, \quad \ell(x^*) \in K.
\end{aligned}
\tag{3.1.4}
\]
Moreover we assume that
\[
e'(x^*) : X \to W \text{ is surjective.}
\tag{3.1.5}
\]
Above $\langle \cdot, \cdot \rangle_W$ and $\langle \cdot, \cdot \rangle_Z$ denote the inner products in $W$ and $Z$, respectively, and $\langle \cdot, \cdot \rangle_{\mathbb{R}^m}$ is the usual inner product in $\mathbb{R}^m$. If $x^*$ is a local solution of (3.1.1), if further (3.1.2) holds and $x^*$ is a regular point in the sense of Definition 1.5, then there exists a Lagrange multiplier $(\lambda^*, \mu^*, \eta^*)$ such that (3.1.3), (3.1.4) hold. Introducing the Lagrange functional $L : X \times W \times \mathbb{R}^m \times Z \to \mathbb{R}$ by
\[
L(x, \lambda, \mu, \eta) = f(x) + \langle \lambda, e(x) \rangle_W + \langle \mu, g(x) \rangle_{\mathbb{R}^m} + \langle \eta, \ell(x) \rangle_Z,
\]
(3.1.3) can be expressed as $L'(x^*, \lambda^*, \mu^*, \eta^*)h = 0$ for all $h \in X$. Here and below the prime denotes differentiation with respect to $x$. With (3.1.2) holding, $L''(x^*, \lambda^*, \mu^*, \eta^*) : X \times X \to \mathbb{R}$ exists as symmetric bilinear form given by
\[
L''(x^*, \lambda^*, \mu^*, \eta^*)(h, k) = f''(x^*)(h, k) + \langle \lambda^*, e''(x^*)(h, k) \rangle_W + \langle \mu^*, g''(x^*)(h, k) \rangle_{\mathbb{R}^m}
\]
for all $(h, k) \in X \times X$. The index set of the coordinates of the finite rank inequality constraints is decomposed as
\[
I_1 = \{i : g_i(x^*) = 0,\ \mu_i^* = 0\}, \quad I_2 = \{i : g_i(x^*) = 0,\ \mu_i^* > 0\}, \quad I_3 = \{i : g_i(x^*) < 0\}.
\]
Clearly $I_1 \cup I_2 \cup I_3 = \{1, \dots, m\}$. If $I_1$ is empty, then we say that strict complementarity holds. Let us denote the cardinality of $I_i$ by $m_i$. Without loss of generality we assume that $m_i \ge 1$ for $i = 1, 2, 3$ and that $I_1 = \{1, \dots, m_1\}$, $I_2 = \{m_1 + 1, \dots, m_1 + m_2\}$, and $I_3 = \{m_1 + m_2 + 1, \dots, m\}$. The following second order sufficient optimality condition will be of central importance: There exists a constant $\gamma > 0$ such that
\[
\begin{cases}
L''(x^*, \lambda^*, \mu^*, \eta^*)(h, h) \ge \gamma |h|_X^2 \ \text{ for all } h \in C, \\
\text{where } C = \{h \in X : e'(x^*)h = 0,\ g_i'(x^*)h \le 0 \text{ for } i \in I_1,\ g_i'(x^*)h = 0 \text{ for } i \in I_2\}.
\end{cases}
\tag{3.1.6}
\]
In case X is finite-dimensional, (3.1.6) is well known to imply that x ∗ is a strict local solution to (3.1.1); see, e.g., [Be, p. 71]. For the infinite-dimensional case this will be proved in Theorem 3.4 below. Section 3.2 is devoted to augmentability of problem (3.1.1). This means that a functional is associated to (3.1.1) with the property that if x ∗ is a local minimizer of (3.1.1), then it is also a minimizer of that functional where the essential constraints are eliminated and at most simple affine constraints remain. Here this is achieved on the basis of Lagrangian functionals and augmentability is obtained without the use of a strict complementarity assumption. Section 3.3 contains the foundations for the first order augmented Lagrangian algorithm based on a duality framework. The convergence analysis for the first order algorithm is given in Section 3.4. Section 3.5 contains an application to parameter estimation problems.
3.2 Augmentability and sufficient optimality

We consider a modification of (3.1.1) where the equality and finite rank inequality constraints of (3.1.1) are augmented:
\[
\begin{aligned}
& \min f_c(x, u) = f(x) + \frac{c}{2} |e(x)|_W^2 + \frac{c}{2} |g(x) + u|_{\mathbb{R}^m}^2 \ \text{ over } (x, u) \in X \times \mathbb{R}^m \\
& \text{subject to } e(x) = 0,\ g(x) + u = 0,\ u \ge 0,\ \ell(x) \in K,
\end{aligned}
\tag{3.2.1}
\]
where $c > 0$ will be appropriately chosen. Observe that $x^*$ is a local solution to (3.1.1) if and only if $(x^*, u^*) = (x^*, -g(x^*))$ is a (local) solution to (3.2.1). Moreover, $x^*$ is stationary for (3.1.1) with Lagrange multiplier $(\lambda^*, \mu^*, \eta^*)$ if and only if $(x^*, -g(x^*))$ is stationary for (3.2.1) with Lagrange multiplier $(\lambda^*, \mu^*, \mu^*, \eta^*)$, i.e., (3.1.3)–(3.1.4) hold with $g(x^*) = -u^*$. Associated to (3.2.1) we introduce the functional
\[
L_c(x, u, \lambda, \mu, \eta) = L(x, \lambda, \mu, \eta) + \frac{c}{2} |e(x)|_W^2 + \frac{c}{2} |g(x) + u|_{\mathbb{R}^m}^2.
\tag{3.2.2}
\]
Its Hessian $L_c''$ at $(x^*, u^*)$ in direction $((h, k), (h, k)) \in (X \times \mathbb{R}^m)^2$ is given by
\[
\begin{aligned}
L_c''(x^*, u^*, \lambda^*, \mu^*, \eta^*)((h, k), (h, k))
&= L''(x^*, \lambda^*, \mu^*, \eta^*)(h, h) + c |e'(x^*)h|^2 + c |g'(x^*)h + k|^2 \\
&= L''(x^*, \lambda^*, \mu^*, \eta^*)(h, h) + c |e'(x^*)h|^2 + c \sum_{i \in I_2} |g_i'(x^*)h|^2 \\
&\quad + c \sum_{i \in I_1 \cup I_3} |g_i'(x^*)h + k_i|^2 + c \sum_{i \in I_2} \big(|k_i|^2 + 2 k_i\, g_i'(x^*)h\big).
\end{aligned}
\tag{3.2.3}
\]
By the Riesz representation theorem there exists for every $i \in \{1, \dots, m\}$ a unique $l_i \in X$ such that $g_i'(x^*)h = \langle l_i, h \rangle_X$ for every $h \in X$, where $\langle \cdot, \cdot \rangle_X$ denotes the inner product in $X$. The operator $E : X \to W \times \mathbb{R}^{m_2}$, defined by
\[
Eh = \big(e'(x^*)h, \langle l_{m_1+1}, h \rangle_X, \dots, \langle l_{m_1+m_2}, h \rangle_X\big),
\]
combines the linearized equality constraint and the active inequality constraints, which act like equalities in the sense that the associated Lagrange multipliers are positive. The essential technical result which will imply augmentability of the constraints in (3.1.1) is given next.

Proposition 3.1. If (3.1.2)–(3.1.6) hold, then there exist constants $\bar c > 0$ and $\tau \in (0, \gamma]$ such that
\[
H((h, \hat k), (h, \hat k)) := L''(x^*, \lambda^*, \mu^*, \eta^*)(h, h) + c |Eh|^2 + c \sum_{i=1}^{m_1} |\langle l_i, h \rangle_X + \hat k_i|^2 \ge \tau \big(|h|^2 + |\hat k|^2_{\mathbb{R}^{m_1}}\big)
\]
for all $c \ge \bar c$, $h \in X$, and $\hat k \in \mathbb{R}^{m_1}_+$.

Proof. Step 1. An equivalent characterization of $C$ is derived by eliminating linear dependencies in the definition of $C$. The cone in (3.1.6) can be equivalently expressed as
\[
C = \{h \in X : Eh = 0 \text{ and } \langle l_i, h \rangle_X \le 0 \text{ for all } i \in I_1\}.
\tag{3.2.4}
\]
We next show that
\[
\begin{cases}
\text{there exist } \tilde m_2 \le m_2 \text{ and vectors } n_i,\ i = m_1 + 1, \dots, m_1 + \tilde m_2, \\
\text{such that } \tilde E : X \to W \times \mathbb{R}^{\tilde m_2} \text{ defined by} \\
\tilde E h = \big(e'(x^*)h, \langle n_{m_1+1}, h \rangle_X, \dots, \langle n_{m_1+\tilde m_2}, h \rangle_X\big) \\
\text{is surjective and } \ker E = \ker \tilde E.
\end{cases}
\tag{3.2.5}
\]
To verify (3.2.5) let $n_i = P_{\ker e'} l_i$, where $P_{\ker e'}$ denotes the projection onto $\ker e'(x^*)$. Choose $\tilde I_2 \subset I_2$ such that $\{n_i\}_{i \in \tilde I_2}$ are linearly independent and $\operatorname{span}\{n_i\}_{i \in \tilde I_2} = \operatorname{span}\{n_i\}_{i \in I_2}$. Possibly after reindexing we have $\{n_i\}_{i \in \tilde I_2} = \{n_i\}_{i = m_1+1}^{m_1 + \tilde m_2}$. We show that $\ker E = \ker \tilde E$. Let $h \in \ker E$. Then $e'(x^*)h = 0$. For every $i \in \tilde I_2$ let $l_i = n_i + l_i^\perp$, where $l_i^\perp \in (\ker e'(x^*))^\perp$. Then
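$\langle n_i, h \rangle_X = \langle l_i, h \rangle_X = 0$, $i \in \tilde I_2$, and therefore $h \in \ker \tilde E$. The converse, $\ker \tilde E \subset \ker E$, is proved similarly. To verify surjectivity of $\tilde E$, observe that the adjoint $\tilde E^* : W \times \mathbb{R}^{\tilde m_2} \to X$ of $\tilde E$ is given by
\[
\tilde E^*(w, r) = e'(x^*)^* w + \sum_{i \in \tilde I_2} r_i n_i.
\]
If $\tilde E^*(w, r) = 0$, then, using $\sum_{i \in \tilde I_2} r_i n_i \in \ker e'(x^*)$ and $e'(x^*)^* w \in (\ker e'(x^*))^\perp$, we have $\sum_{i \in \tilde I_2} r_i n_i = 0$ and $e'(x^*)^* w = 0$. Therefore $(w, r) = 0$, $\ker \tilde E^* = \{0\}$, and $\operatorname{range} \tilde E = W \times \mathbb{R}^{\tilde m_2}$, by the closed range theorem. Thus (3.2.5) holds.

Finally we consider the set of vectors $\{n_i\}_{i \in I_1}$ with $n_i = P_{\tilde E}\, l_i$. After possible rearrangement of indices there exists a subset $\tilde I_1 = \{1, \dots, \tilde m_1\} \subset I_1$ such that $\{n_i\}_{i \in \tilde I_1}$ are linearly independent in $\ker E$. The cone $C$ can therefore equivalently be expressed as
\[
C = \{h \in X : h \in \ker E \text{ and } \langle n_i, h \rangle_X \le 0 \text{ for all } i \in \tilde I_1\}.
\]
To exclude trivial cases we assume throughout that $\tilde I_1$ and $\tilde I_2$ are nonempty.

Step 2. Here we characterize the polar cone
\[
C_1^* = \{h \in \ker E : \langle h, \tilde h \rangle_X \le 0 \text{ for all } \tilde h \in C_1\}
\]
of the closed convex cone
\[
C_1 = \{h \in \ker E : \langle n_i, h \rangle_X \le 0 \text{ for all } i \in \tilde I_1\}.
\]
Denoting by $P$ and $P^*$ the canonical projections in $\ker E$ onto $C_1$ and $C_1^*$, respectively, every element $h \in \ker E$ can be uniquely expressed as $h = h_1 + h_2$ with $\langle h_1, h_2 \rangle_X = 0$ and $h_1 = Ph \in C_1$, $h_2 = P^* h \in C_1^*$ (see [Zar, p. 256]). Moreover
\[
C_1 = \bigcap_{i \in \tilde I_1} C_i \quad \text{with } C_i = \{h \in \ker E : \langle n_i, h \rangle_X \le 0\}
\]
and
\[
C_1^* = \Big(\bigcap_{i \in \tilde I_1} C_i\Big)^* = \overline{\operatorname{co}} \bigcup_{i \in \tilde I_1} C_i^* = \operatorname{co} \bigcup_{i \in \tilde I_1} C_i^*
\]
(see [Zar]), with $\overline{\operatorname{co}}$ denoting the closed convex hull. It follows that
\[
C_1^* = \Big\{ \sum_{i=1}^{\tilde m_1} \alpha_i n_i : (\alpha_1, \dots, \alpha_{\tilde m_1}) \in \mathbb{R}^{\tilde m_1}_+ \Big\}.
\tag{3.2.6}
\]
Step 3. Let $K_1$ be chosen such that
\[
K_1 \sum_{i=1}^{\tilde m_1} |\alpha_i|^2 \le \Big| \sum_{i=1}^{\tilde m_1} \alpha_i n_i \Big|^2
\]
for every $(\alpha_1, \dots, \alpha_{\tilde m_1}) \in \mathbb{R}^{\tilde m_1}$. For arbitrary $h = \sum_{i=1}^{\tilde m_1} \alpha_i n_i \in C_1^*$ we have
\[
|h|^2 = \sum_{i=1}^{\tilde m_1} \alpha_i \langle n_i, h \rangle_X \le \sum_{I_1^+} \alpha_i \langle n_i, h \rangle_X,
\]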
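where $I_1^+ = \{i \in \tilde I_1 : \langle n_i, h \rangle_X > 0 \text{ and } \alpha_i > 0\}$. Consequently
\[
|h|^2 \le \Big( \sum_{I_1^+} \alpha_i^2 \Big)^{1/2} \Big( \sum_{I_1^+} \langle n_i, h \rangle_X^2 \Big)^{1/2} \le K_1^{-1/2} |h| \Big( \sum_{I_1^+} \langle n_i, h \rangle_X^2 \Big)^{1/2}
\]
and
\[
K_1 |h|^2 \le \sum_{I_1^+} \langle n_i, h \rangle_X^2 \quad \text{for every } h \in C_1^*.
\tag{3.2.7}
\]
Step 4. Every $h \in X$ can be uniquely decomposed into mutually orthogonal elements as $h = h_1 + h_2 + h_3$, where $h_1 \in C_1 \subset \ker E$, $h_2 \in C_1^* \subset \ker E$, and $h_3 \in (\ker E)^\perp$. By Step 2 there exists a vector $(\alpha_1, \dots, \alpha_{\tilde m_1}) \in \mathbb{R}^{\tilde m_1}_+$ such that $h_2 = \sum_{i=1}^{\tilde m_1} \alpha_i n_i$. Note that $\langle n_i, h_1 \rangle_X \le 0$ for all $i \in \tilde I_1$. Therefore, using the fact that $\langle h_1, h_2 \rangle_X = \sum_{i=1}^{\tilde m_1} \alpha_i \langle h_1, n_i \rangle_X = 0$, it follows that
\[
\text{if } \alpha_i > 0 \text{ for some } i = 1, \dots, \tilde m_1, \text{ then } \langle n_i, h_1 \rangle_X = \langle P_{\tilde E}\, l_i, h_1 \rangle_X = \langle l_i, h_1 \rangle_X = 0.
\tag{3.2.8}
\]
We shall use Step 3 with $I_1^+ = \{i \in \tilde I_1 : \langle n_i, h_2 \rangle_X > 0 \text{ and } \alpha_i > 0\}$. Let $\hat k \in \mathbb{R}^{m_1}_+$ and $c > 0$. Then by (3.1.6) and (3.2.8) we have
\[
\begin{aligned}
H((h, \hat k), (h, \hat k))
&= L''(x^*, \lambda^*, \mu^*, \eta^*)(h, h) + c |Eh_3|^2 \\
&\quad + c \sum_{I_1^+} \big| \langle l_i, h_3 \rangle_X + \langle l_i, h_2 \rangle_X + \hat k_i \big|^2 + c \sum_{I_1 \setminus I_1^+} \big| \langle l_i, h \rangle_X + \hat k_i \big|^2 \\
&\ge \gamma |h_1|^2 + 2 L''(x^*, \lambda^*, \mu^*, \eta^*)(h_1, h_2 + h_3) + L''(x^*, \lambda^*, \mu^*, \eta^*)(h_2 + h_3, h_2 + h_3) \\
&\quad + c |Eh_3|^2 + \sum_{I_1^+} \Big( \frac{c_1}{2} \big| \langle n_i, h_2 \rangle_X + \hat k_i \big|^2 - c_1 \langle l_i, h_3 \rangle_X^2 \Big) \\
&\quad + \sum_{I_1 \setminus I_1^+} \Big( \frac{c_2}{2} |\hat k_i|^2 - c_2 \langle l_i, h \rangle_X^2 \Big)
\end{aligned}
\]
for all $0 < c_1, c_2 \le c$ to be chosen below. Here we used $(a + b)^2 \ge \frac12 a^2 - b^2$ and the fact that $\langle l_i, h_2 \rangle = \langle n_i, h_2 \rangle$ for $i \in I_1$. To estimate $Eh_3$ from below we make use of the fact that $\tilde E$ is an isomorphism from $(\ker E)^\perp$ to $W \times \mathbb{R}^{\tilde m_2}$. Hence there exists $K_2 > 0$ such that $K_2 |h_3| \le |\tilde E h_3| \le |Eh_3|$ for all $h_3 \in (\ker E)^\perp$. Setting $L = \|L''(x^*, \lambda^*, \mu^*)\|$, $c_3 = \min(c_1, c_2)$, and $c_4 = \max(c_1, c_2)$ we obtain
\[
\begin{aligned}
H((h, \hat k), (h, \hat k))
&\ge \gamma |h_1|^2 - 2L |h_1| (|h_2| + |h_3|) - L (|h_2| + |h_3|)^2 + c K_2^2 |h_3|^2 \\
&\quad + \frac{c_1}{2} \sum_{I_1^+} \langle n_i, h_2 \rangle_X^2 + \frac{c_3}{2} \sum_{I_1} |\hat k_i|^2 \\
&\quad - 3 c_4 \sum_{I_1} \langle l_i, h_3 \rangle_X^2 - 3 c_2 \sum_{I_1 \setminus I_1^+} \big( \langle l_i, h_1 \rangle_X^2 + \langle l_i, h_2 \rangle_X^2 \big),
\end{aligned}
\]
where we used the fact that $\langle n_i, h_2 \rangle_X \ge 0$ for $i \in I_1^+$.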
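Setting $|l|^2 = \sum_{I_1} |l_i|^2$ and using (3.2.7) we find
\[
\begin{aligned}
H((h, \hat k), (h, \hat k))
&\ge \gamma |h_1|^2 + \frac{c_1}{2} K_1 |h_2|^2 + c K_2^2 |h_3|^2 + \frac{c_3}{2} \sum_{I_1} \hat k_i^2 \\
&\quad - 2L |h_1| (|h_2| + |h_3|) - 2L (|h_2|^2 + |h_3|^2) \\
&\quad - 3 c_2 |l|^2 |h_1|^2 - 3 c_2 |l|^2 |h_2|^2 - 3 c_4 |l|^2 |h_3|^2 \\
&\ge \gamma |h_1|^2 + \frac{c_1}{2} K_1 |h_2|^2 + c K_2^2 |h_3|^2 + \frac{c_3}{2} \sum_{I_1} \hat k_i^2 \\
&\quad - 2L \Big(\frac{\gamma}{4}\Big)^{1/2} \Big(\frac{\gamma}{4}\Big)^{-1/2} |h_1| |h_2| - 2L \Big(\frac{\gamma}{4}\Big)^{1/2} \Big(\frac{\gamma}{4}\Big)^{-1/2} |h_1| |h_3| \\
&\quad - 2L |h_2|^2 - 2L |h_3|^2 - 3 c_2 |l|^2 |h_1|^2 - 3 c_2 |l|^2 |h_2|^2 - 3 c_4 |l|^2 |h_3|^2 \\
&\ge |h_1|^2 \Big( \gamma - \frac{\gamma}{2} - 3 c_2 |l|^2 \Big) + |h_2|^2 \Big( \frac{c_1}{2} K_1 - \frac{4}{\gamma} L^2 - 2L - 3 c_2 |l|^2 \Big) \\
&\quad + |h_3|^2 \Big( c K_2^2 - \frac{4}{\gamma} L^2 - 2L - 3 c_4 |l|^2 \Big) + \frac{c_3}{2} \sum_{I_1} \hat k_i^2.
\end{aligned}
\]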
We now choose the constants in the order $c_2$, $c_1$, and $\bar c$ such that the coefficients of $|h_1|^2$, $|h_2|^2$, and $|h_3|^2$, with $c$ replaced by $\bar c$, are positive. This implies the claim for every $c \ge \bar c$.

It is worthwhile to point out the following corollary to Proposition 3.1 which is well known in finite-dimensional spaces.

Corollary 3.2. Let $E \in \mathcal{L}(X, W)$ be surjective and let $A \in \mathcal{L}(X)$ be self-adjoint and coercive on $\ker E$; i.e., there exists $\gamma > 0$ such that $\langle Ax, x \rangle \ge \gamma |x|^2$ for all $x \in \ker E$. Then there exist constants $\tau > 0$ and $\bar c > 0$ such that $\langle (A + c E^* E)x, x \rangle_X \ge \tau |x|^2$ for all $x \in X$ and $c \ge \bar c$.

This follows from Proposition 3.1 with $A = L''(x^*)$.

Proposition 3.3. Let (3.1.2)–(3.1.6) hold. Then there exists $\sigma > 0$ such that
\[
L_{\bar c}''(x^*, u^*, \lambda^*, \mu^*, \eta^*)((h, k), (h, k)) + \sum_{i \in I_2} \mu_i^* k_i \ge \sigma (|h|^2 + |k|^2)
\]
for all $h \in X$ with $|h| \le \tilde\mu / (2 \bar c \sup_{i \in I_2} |l_i|)$, $\tilde\mu = \min_{I_2} \mu_i^*$, and all $k \in \mathbb{R}^m$ with $k_i \ge 0$ for $i \in I_1 \cup I_2$, where $\bar c$ is as introduced in Proposition 3.1.

Proof. Let $A = L_{\bar c}''(x^*, u^*, \lambda^*, \mu^*, \eta^*)((h, k), (h, k)) + \sum_{i \in I_2} \mu_i^* k_i$. Then from (3.2.3) and the definition of $H$
\[
A = H((h, \hat k), (h, \hat k)) + \bar c \sum_{I_3} \big| \langle l_i, h \rangle_X + k_i \big|^2 + \bar c \sum_{I_2} \big( |k_i|^2 + 2 \langle l_i, h \rangle_X k_i \big) + \sum_{I_2} \mu_i^* k_i,
\]
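where $\hat k$ denotes the first $m_1$ coordinates of the vector $k$. By Proposition 3.1 we have for every $\varepsilon > 1$
\[
\begin{aligned}
A &\ge \tau (|h|^2 + |\hat k|^2_{\mathbb{R}^{m_1}}) + \bar c \sum_{I_3} \big( \langle l_i, h \rangle_X^2 (1 - \varepsilon^2) + |k_i|^2 (1 - \varepsilon^{-2}) \big) \\
&\quad + \bar c \sum_{I_2} |k_i|^2 - 2 \bar c \sum_{I_2} |\langle l_i, h \rangle_X|\, k_i + \sum_{I_2} \mu_i^* k_i \\
&\ge \tau (|h|^2 + |\hat k|^2_{\mathbb{R}^{m_1}}) + \bar c (1 - \varepsilon^2) |h|^2 \sum_{I_3} |l_i|^2 + \bar c (1 - \varepsilon^{-2}) \sum_{I_2 \cup I_3} |k_i|^2 \\
&\quad - 2 \bar c \sum_{I_2} |\langle l_i, h \rangle_X|\, k_i + \sum_{I_2} \mu_i^* k_i.
\end{aligned}
\]
Now choose $\varepsilon > 1$ such that $\frac{\tau}{2} = (\varepsilon^2 - 1)\, \bar c \sum_{I_3} |l_i|^2$. Then using the constraint for $h$ we find that
\[
\begin{aligned}
A &\ge \frac12 \tau (|h|^2 + |\hat k|^2_{\mathbb{R}^{m_1}}) + \bar c (1 - \varepsilon^{-2}) \sum_{I_2 \cup I_3} k_i^2 + \Big( \tilde\mu - 2 \bar c \sup_{I_2} |l_i|\, |h| \Big) \sum_{I_2} k_i \\
&\ge \frac12 \tau (|h|^2 + |\hat k|^2_{\mathbb{R}^{m_1}}) + \bar c (1 - \varepsilon^{-2}) \sum_{I_2 \cup I_3} k_i^2.
\end{aligned}
\]
This proves the claim for $c = \bar c$. For arbitrary $c \ge \bar c$ the assertion follows from the form of $L_c''(x^*, u^*, \lambda^*, \mu^*)$.

Theorem 3.4. Assume that (3.1.2)–(3.1.6) hold. Then there exist constants $\bar\sigma > 0$, $\bar c > 0$ and a neighborhood $U(x^*, -g(x^*)) = U(x^*, u^*)$ of $(x^*, u^*)$ such that
\[
\begin{aligned}
& L_c(x, u, \lambda^*, \mu^*, \eta^*) + \langle \mu^*, u \rangle_{\mathbb{R}^m} \\
&\quad = f(x) + \langle \lambda^*, e(x) \rangle_W + \langle \mu^*, g(x) + u \rangle_{\mathbb{R}^m} + \langle \eta^*, \ell(x) \rangle_Z + \frac{c}{2} |e(x)|_W^2 + \frac{c}{2} |g(x) + u|_{\mathbb{R}^m}^2 \\
&\quad \ge f(x^*) + \bar\sigma (|x - x^*|^2 + |u - u^*|^2)
\end{aligned}
\tag{3.2.9}
\]
for all $c \ge \bar c$ and $(x, u) \in U(x^*, u^*)$ with $u_i \ge 0$ for $i \in I_1 \cup I_2$.

Proof. By (3.1.4) we have $\langle \mu^*, u^* \rangle = -\langle \mu^*, g(x^*) \rangle = 0$. Moreover
\[
L_c(x^*, u^*, \lambda^*, \mu^*, \eta^*) + \langle \mu^*, u^* \rangle_{\mathbb{R}^m} = f(x^*), \quad (L_c)_x(x^*, u^*, \lambda^*, \mu^*, \eta^*) = (L_c)_u(x^*, u^*, \lambda^*, \mu^*, \eta^*) = 0,
\]
where $(L_c)_x$ and $(L_c)_u$ denote the partial derivatives of $L_c$ with respect to $x$ and $u$. Consequently
\[
\begin{aligned}
L_c(x, u, \lambda^*, \mu^*, \eta^*) + \langle \mu^*, u \rangle_{\mathbb{R}^m}
&= f(x^*) + \langle \mu^*, u - u^* \rangle_{\mathbb{R}^m} \\
&\quad + \frac12 L_c''(x^*, u^*, \lambda^*, \mu^*, \eta^*)((x - x^*, u - u^*), (x - x^*, u - u^*)) \\
&\quad + o(|x - x^*|^2 + |u - u^*|^2).
\end{aligned}
\]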
The definitions of $I_1$ and $I_3$ imply that $\mu_i^* = 0$ for $i \in I_1 \cup I_3$. Thus Proposition 3.3 implies the existence of a neighborhood $U(x^*, u^*)$ of $(x^*, u^*)$ and a constant $\bar\sigma > 0$ such that (3.2.9) holds for all $(x, u) \in U(x^*, u^*)$ with $u_i \ge 0$ for $i \in I_1 \cup I_2$.

The conclusion of Theorem 3.4 is referred to as augmentability of problem (3.1.1). This means that a functional, which in our case is $L_c(x, u, \lambda^*, \mu^*, \eta^*) + \langle \mu^*, u \rangle_{\mathbb{R}^m}$, can be found with the property that it has $(x^*, u^*)$ as a strict local minimizer under the simple constraint $u \ge 0$, while the explicit nonlinear constraints are eliminated. Let us note that if (3.1.6) holds with $C$ replaced by the larger set $\hat C = \{h \in X : e'(x^*)h = 0,\ g_i'(x^*)h \le 0 \text{ for } i \in I_1 \cup I_2\}$, then the conclusion of Theorem 3.4 holds with $L_c(x, u, \lambda^*, \mu^*, \eta^*) + \langle \mu^*, u \rangle_{\mathbb{R}^m}$ replaced by $L_c(x, u, \lambda^*, \mu^*, \eta^*)$.

In case $X$ is finite-dimensional, augmentability is analyzed in detail in [Hes1], for example. The proof of augmentability in [Hes1] depends in essential manner on compactness of the unit ball. In [Be, p. 161 ff], the augmentability analysis relies on the strict complementarity assumption, i.e., $I_1 = \emptyset$ is assumed. We obtain, as immediate consequence of Theorem 3.4, that (3.1.6) provides a second order sufficient optimality condition.

Corollary 3.5. Assume that (3.1.2)–(3.1.6) hold. Then there exists a neighborhood $U(x^*)$ of $x^*$ such that
\[
f(x) \ge f(x^*) + \bar\sigma |x - x^*|^2
\]
for all $x \in U(x^*)$ satisfying $e(x) = 0$, $g(x) \le 0$, and $\ell(x) \in K$.

Theorem 3.4 can be used as the basis to define augmented cost functionals in terms of $x$ only with the property that $x^*$ is a uniform strict local unconstrained minimum. The two choices we present differ in the way in which the inequality constraints are treated. The first choice uses the classical cutoff penalty functional
\[
\tilde g(x) = \max(g(x), 0),
\tag{3.2.10}
\]
and the second is the Bertsekas penalty functional
\[
\hat g(x, \mu, c) = \max\Big(g(x), -\frac{\mu}{c}\Big),
\tag{3.2.11}
\]
where $\mu \in \mathbb{R}^m$ and $c > 0$. In each of the two cases the max operation acts coordinatewise. To motivate the choice (3.2.11) we use the Lagrangian $L_c(x, u, \lambda, \mu, \eta)$ introduced in (3.2.2) for (3.2.1) and consider
\[
\begin{cases}
\min L_c(x, u, \lambda, \mu, \eta) = f_c(x, u) + \langle \lambda, e(x) \rangle_W + \langle \mu, g(x) + u \rangle_{\mathbb{R}^m} + \langle \eta, \ell(x) \rangle_Z \\
\text{over } x \in X,\ u \ge 0.
\end{cases}
\tag{3.2.12}
\]
Carrying out the constraint minimization with respect to $u$ with $x$ and $\mu$ fixed results in
\[
u = u(x, \mu) = \max\Big(0, -\Big(g(x) + \frac{\mu}{c}\Big)\Big),
\]
and consequently
\[
g(x) + u(x, \mu) = \max\Big(g(x), -\frac{\mu}{c}\Big) = \hat g(x, \mu, c).
\tag{3.2.13}
\]
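The elimination of $u$ leading to (3.2.13) can be checked numerically. The following is a minimal sketch, not taken from the text: the helper names `u_opt`, `g_hat`, and `u_objective` and the sample data are illustrative assumptions. With $x$ fixed, only the terms $\langle \mu, u \rangle + \frac{c}{2}|g + u|^2$ depend on $u$, and they separate by coordinate.

```python
def u_opt(g, mu, c):
    """Coordinatewise minimizer of mu*u + (c/2)*(g+u)**2 over u >= 0."""
    return [max(0.0, -(gi + mi / c)) for gi, mi in zip(g, mu)]


def g_hat(g, mu, c):
    """Bertsekas penalty functional (3.2.11), coordinatewise."""
    return [max(gi, -mi / c) for gi, mi in zip(g, mu)]


def u_objective(u, g, mu, c):
    """The part of L_c in (3.2.12) that depends on u, for a fixed value g = g(x)."""
    return sum(mi * ui + 0.5 * c * (gi + ui) ** 2 for ui, gi, mi in zip(u, g, mu))


# Illustrative data (assumed): one strongly inactive, one violated,
# and one weakly inactive coordinate of g(x).
g = [-1.0, 0.3, -0.05]
mu = [0.5, 2.0, 1.0]
c = 4.0

u = u_opt(g, mu, c)
shifted = [gi + ui for gi, ui in zip(g, u)]  # g(x) + u(x, mu)
print(u, shifted, g_hat(g, mu, c))
```

The printed vectors `shifted` and `g_hat(g, mu, c)` should coincide, which is the content of (3.2.13).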
Corollary 3.6. Let (3.1.2)–(3.1.6) hold. Then there exists a neighborhood $U(x^*)$ of $x^*$ such that
\[
f(x) + \langle \lambda^*, e(x) \rangle_W + \langle \mu^*, \tilde g(x) \rangle_{\mathbb{R}^m} + \langle \eta^*, \ell(x) \rangle_Z + \frac{c}{2} |e(x)|_W^2 + \frac{c}{2} |\tilde g(x)|_{\mathbb{R}^m}^2 \ge f(x^*) + \bar\sigma |x - x^*|^2
\]
for all $x \in U(x^*)$ and $c \ge \bar c$, where $\tilde g(x) = \max(g(x), 0)$.

Proof. Setting
\[
u = \max(-g(x), 0)
\tag{3.2.14}
\]
we have $g(x) + u = \tilde g(x)$ and $u \ge 0$. Recall the definition of $U = U(x^*, -g(x^*))$ from Theorem 3.4. Determine a neighborhood $U(x^*)$ such that $x \in U(x^*)$ implies $(x, -g(x)) \in U$ and $g_i(x) \le 0$ if $g_i(x^*) < 0$. It is simple to argue that $x \in U(x^*)$ implies $(x, u) \in U$, where $u$ is defined in (3.2.14). The claim now follows from Theorem 3.4.

Corollary 3.7. Let (3.1.2)–(3.1.6) hold and let $r > 0$. Then there exist constants $\delta > 0$ and $\tilde c = \tilde c(r) \ge \bar c$ such that
\[
f(x) + \langle \lambda^*, e(x) \rangle_W + \langle \mu^*, \hat g(x, \mu, c) \rangle_{\mathbb{R}^m} + \langle \eta^*, \ell(x) \rangle_Z + \frac{c}{2} |e(x)|_W^2 + \frac{c}{2} |\hat g(x, \mu, c)|^2 \ge f(x^*) + \bar\sigma |x - x^*|^2
\]
for all $c \ge \tilde c$, $x \in B_\delta = \{x \in X : |x - x^*| \le \delta\}$, and $\mu \in B_r^+ = \{\mu \in \mathbb{R}^m_+ : |\mu| \le r\}$, where $\hat g(x, \mu, c) = \max(g(x), -\frac{\mu}{c})$.

Proof. Let $\varepsilon > 0$ be such that $g_i(x^*) \le -\varepsilon$ for all $i \in I_3$ and such that $|x - x^*| < \varepsilon$ and $|u - u^*| < \varepsilon$ implies $(x, u) \in U(x^*, u^*)$, where $U(x^*, u^*)$ is given in Theorem 3.4. Determine $\delta \in (0, \varepsilon)$ such that $|x - x^*| \le \delta$ implies $|g(x) - g(x^*)| < \frac{\varepsilon}{2}$ and choose $\tilde c \ge \frac{2r}{\varepsilon}$. For $c \ge \tilde c$ and $\mu \in B_r^+$ we define
\[
u = \max\Big(0, -\Big(\frac{\mu}{c} + g(x)\Big)\Big).
\tag{3.2.15}
\]
Then $g(x) + u = \hat g(x, \mu, c)$ and $u \ge 0$. To verify the claim it suffices to show that $x \in B_\delta$ and $\mu \in B_r^+$ imply $|u - u^*| < \varepsilon$, where $u$ is defined in (3.2.15). If $i \in I_1 \cup I_2$, then $|u_i - u_i^*| \le |g_i(x^*) - g_i(x)| + \frac{\mu_i}{c}$. For $i \in I_3$ we have
\[
\frac{\mu_i}{c} + g_i(x) \le \frac{\mu_i}{c} + |g_i(x^*) - g_i(x)| + g_i(x^*) < \frac{r}{c} + \frac{\varepsilon}{2} - \varepsilon \le 0
\]
and consequently $|u_i - u_i^*| = |g_i(x^*) - g_i(x) - \frac{\mu_i}{c}|$. Summing over $i = 1, \dots, m$ we find $|u - u^*|^2 \le 2 |g(x^*) - g(x)|^2 + \frac{2}{c^2} |\mu|^2 < \varepsilon^2$.

Remark 3.2.1. Note that
\[
x \to \langle \mu, \hat g(x, \mu, c) \rangle + \frac{c}{2} |\hat g(x, \mu, c)|^2
\]
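is $C^1$ if $g$ is $C^1$. This follows from the identity
\[
\langle \mu, \hat g(x, \mu, c) \rangle_{\mathbb{R}^m} + \frac{c}{2} |\hat g(x, \mu, c)|_{\mathbb{R}^m}^2 = \frac{1}{2c} \big( |\max(0, \mu + c g(x))|_{\mathbb{R}^m}^2 - |\mu|_{\mathbb{R}^m}^2 \big).
\tag{3.2.16}
\]
Here, as throughout, the norm on $\mathbb{R}^m$ is the Euclidean one. On the other hand,
\[
x \to \langle \mu, \tilde g(x) \rangle + \frac{c}{2} |\tilde g(x)|^2
\]
is not $C^1$.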
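The identity (3.2.16) holds coordinate by coordinate: if $\mu_i + c g_i \ge 0$, then $\hat g_i = g_i$ and both sides equal $\mu_i g_i + \frac{c}{2} g_i^2$; otherwise $\hat g_i = -\mu_i / c$ and both sides equal $-\mu_i^2 / (2c)$. As a small sanity check, the sketch below compares the two sides on random data; the function names and the sampling ranges are illustrative assumptions.

```python
import random


def g_hat(g, mu, c):
    """Bertsekas penalty functional (3.2.11), coordinatewise."""
    return [max(gi, -mi / c) for gi, mi in zip(g, mu)]


def lhs(g, mu, c):
    """<mu, g_hat> + (c/2)|g_hat|^2, the left-hand side of (3.2.16)."""
    gh = g_hat(g, mu, c)
    return sum(m * h for m, h in zip(mu, gh)) + 0.5 * c * sum(h * h for h in gh)


def rhs(g, mu, c):
    """(1/(2c)) (|max(0, mu + c g)|^2 - |mu|^2), the right-hand side of (3.2.16)."""
    plus = sum(max(0.0, m + c * gi) ** 2 for gi, m in zip(g, mu))
    return (plus - sum(m * m for m in mu)) / (2.0 * c)


random.seed(0)
max_gap = 0.0
for _ in range(1000):
    g = [random.uniform(-2.0, 2.0) for _ in range(4)]
    mu = [random.uniform(0.0, 3.0) for _ in range(4)]
    c = random.uniform(0.1, 10.0)
    max_gap = max(max_gap, abs(lhs(g, mu, c) - rhs(g, mu, c)))

print(max_gap)  # expected to be at rounding-error level
```

The right-hand side makes the $C^1$ property visible, since $t \to \max(0, t)^2$ is continuously differentiable.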
3.3 The first order augmented Lagrangian algorithm

Here we describe the first order augmented Lagrangian algorithm which is a hybrid method combining the penalty technique and a Lagrangian method. Considering for a moment equality constraints only, the penalty method for (3.1.1) consists in minimizing
\[
f(x) + \frac{c_k}{2} |e(x)|^2
\]
for a sequence of penalty parameters $c_k$ tending to $\infty$. The Lagrangian method relies on minimizing $f(x) + (\lambda_k, e(x))$ and updating $\lambda_k$ as a maximizer of the dual problem associated to (3.1.1), which will be defined below. The first order augmented Lagrangian method provides a systematic technique for the multiplier update. Its convergence analysis will be given in the next section.

Let $x^*$ be a local solution of (3.1.1) with (3.1.2)–(3.1.6) holding. The algorithm will require startup values $(\lambda_0, \mu_0) \in W \times \mathbb{R}^m_+$ for the Lagrange multipliers. We set $r = |\mu^*| + (|\lambda_0 - \lambda^*|^2 + |\mu_0 - \mu^*|^2)^{1/2}$ and choose $\varepsilon$ and $\tilde c = \tilde c(r, \varepsilon) \ge \bar c$ as in the proof of Corollary 3.7. Recall that $\varepsilon$ was chosen such that $g_i(x^*) \le -\varepsilon$ for all indices $i$ in the set of inactive indices $I_3$ and such that $|x - x^*| < \varepsilon$, $|u - u^*| < \varepsilon$ imply $(x, u) \in U(x^*, -g(x^*))$ with $U(x^*, -g(x^*))$ given in Theorem 3.4. Finally, let $\{c_n\}_{n=1}^\infty$ be a monotonically nondecreasing sequence of penalty parameters with $c_1 > \tilde c$, and set $\sigma_n = c_n - \bar c$. It is not required that $\lim_{n \to \infty} c_n = \infty$, as would be the case for penalty methods.

Algorithm ALM.
1. Choose $(\lambda_0, \mu_0) \in W \times \mathbb{R}^m_+$.
2. Let $x_n$ be a solution to
\[
\min L_n(x) \ \text{over } x \in X \text{ with } \ell(x) \in K.
\tag{$P_n$}
\]
3. Update the Lagrange multipliers
\[
\lambda_n = \lambda_{n-1} + \sigma_n e(x_n), \qquad \mu_n = \mu_{n-1} + \sigma_n \hat g(x_n, \mu_{n-1}, c_n),
\]
where
\[
L_n(x) = f(x) + \langle \lambda_{n-1}, e(x) \rangle_W + \langle \mu_{n-1}, \hat g(x, \mu_{n-1}, c_n) \rangle_{\mathbb{R}^m} + \frac{c_n}{2} |e(x)|_W^2 + \frac{c_n}{2} |\hat g(x, \mu_{n-1}, c_n)|^2.
\]
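To make the three steps concrete, here is a minimal finite-dimensional sketch of Algorithm ALM. Everything specific in it is an assumption chosen for illustration and not part of the text: $X = \mathbb{R}^2$, $f(x) = \frac12 |x - a|^2$ with $a = (2, 0)$, $e(x) = x_1 + x_2 - 2$, $g(x) = x_1 - 1$, no constraint $\ell(x) \in K$, a constant penalty parameter $c_n = c$, a hypothetical value `c_bar` standing in for $\bar c$, and plain gradient descent as the inner solver for $(P_n)$. For this problem $x^* = (1, 1)$, $\lambda^* = -1$, $\mu^* = 2$.

```python
def alm_toy(c=10.0, c_bar=1.0, outer=30, inner=2000):
    """First order augmented Lagrangian iteration on a two-dimensional toy problem.

    Problem (assumed for illustration):
        min 0.5*|x - a|^2  subject to  x1 + x2 - 2 = 0,  x1 - 1 <= 0,  a = (2, 0).
    """
    a = (2.0, 0.0)
    sigma = c - c_bar                # sigma_n = c_n - c_bar, constant here
    step = 1.0 / (1.0 + 3.0 * c)     # 1/L, with L an upper bound on the Hessian of L_n
    lam, mu = 0.0, 0.0               # step 1: startup multipliers, mu_0 >= 0
    x1, x2 = 0.0, 0.0
    for _ in range(outer):
        # Step 2: minimize L_n by gradient descent (warm started). By (3.2.16),
        # grad L_n(x) = (x - a) + (lam + c*e(x)) e'(x)^T + max(0, mu + c*g(x)) g'(x)^T.
        for _ in range(inner):
            t = lam + c * (x1 + x2 - 2.0)
            s = max(0.0, mu + c * (x1 - 1.0))
            x1, x2 = x1 - step * ((x1 - a[0]) + t + s), x2 - step * ((x2 - a[1]) + t)
        # Step 3: multiplier update with g_hat(x, mu, c) = max(g(x), -mu/c).
        e_val = x1 + x2 - 2.0
        g_val = x1 - 1.0
        lam += sigma * e_val
        mu += sigma * max(g_val, -mu / c)
    return (x1, x2), lam, mu


x_n, lam_n, mu_n = alm_toy()
print(x_n, lam_n, mu_n)  # expected to be close to (1, 1), -1, 2
```

Note that the penalty parameter stays fixed at $c = 10$; convergence comes from the multiplier update, not from driving $c$ to infinity.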
Observe that $\mu_n = \max\big(\mu_{n-1} + \sigma_n g(x_n), \mu_{n-1}(1 - \frac{\sigma_n}{c_n})\big)$ and consequently $\mu_n \in \mathbb{R}^m_+$ for each $n = 1, 2, \dots$. Existence of solutions to $(P_n)$ will be discussed in Section 3.4. It will be shown that $(P_n)$ has a solution in the interior of $B_\delta$ (see Corollary 3.7) for all $n$ sufficiently large, or for all $n$ if $c_1$ is sufficiently large, and that these solutions converge to $x^*$, while $(\lambda_n, \mu_n)$ converges to $(\lambda^*, \mu^*)$.

To motivate the Lagrange multiplier update in step 3 of the algorithm and to justify calling this a first order algorithm, we consider problem (3.2.1) without the infinite rank inequality constraint. Introducing Lagrange multipliers for the equality constraints $e(x) = 0$ and $g(x) + u = 0$ we obtained the augmented Lagrange functional $L_c$ in (3.2.12). Carrying out the minimization with respect to $u$ and utilizing (3.2.13) suggests introducing the modified augmented Lagrangian functional
\[
\hat L_c(x, \lambda, \mu) = f(x) + \langle \lambda, e(x) \rangle_W + \langle \mu, \hat g(x, \mu, c) \rangle_{\mathbb{R}^m} + \frac{c}{2} |e(x)|_W^2 + \frac{c}{2} |\hat g(x, \mu, c)|_{\mathbb{R}^m}^2.
\tag{3.3.1}
\]
Since $\hat L_c(x^*, \lambda^*, \mu^*) = f(x^*)$ we find
\[
\hat L_c(x^*, \lambda, \mu) \le \hat L_c(x^*, \lambda^*, \mu^*) \le \hat L_c(x, \lambda^*, \mu^*) \quad \text{for all } x \in B_\delta \text{ and } \mu \ge 0,
\tag{3.3.2}
\]
whenever $c \ge \tilde c$. The first inequality follows from $e(x^*) = 0$ and the identity (3.2.16), since $\max(0, \mu_i + c g_i(x^*)) \le \mu_i$ for $\mu \ge 0$ and $g(x^*) \le 0$, and the second one from Corollary 3.7. From the saddle point property (3.3.2) for $\hat L_c(x, \lambda, \mu)$ we deduce that
\[
\sup_{\lambda, \mu \ge 0} \inf_x \hat L_c(x, \lambda, \mu) \le \hat L_c(x^*, \lambda^*, \mu^*) \le \inf_x \sup_{\lambda, \mu \ge 0} \hat L_c(x, \lambda, \mu).
\tag{3.3.3}
\]
For the following discussion we assume strict complementarity, i.e., $I_1 = \emptyset$. Then $x \to \hat L_c(x, \lambda, \mu)$ is twice continuously differentiable for $(x, \lambda, \mu)$ in a neighborhood of $(x^*, \lambda^*, \mu^*)$ and
\[
\hat L_c''(x^*, \lambda^*, \mu^*)(h, h) = L''(x^*, \lambda^*, \mu^*)(h, h) + c |Eh|^2
\]
for $h \in X$. Thus by (3.1.6) and Corollary 3.2, $\hat L_c''(x^*, \lambda^*, \mu^*)$ is positive definite on $X$ for all $c$ sufficiently large. Moreover $(x, \lambda, \mu) \to \hat L_c''(x, \lambda, \mu)$ is continuous in a neighborhood $U(x^*) \times U(\lambda^*, \mu^*)$ of $(x^*, \lambda^*, \mu^*)$ and $\hat L_c''(x, \lambda, \mu)(h, h) \ge \bar\tau |h|^2$ for all $h \in X$, with $\bar\tau > 0$ independent of $(x, \lambda, \mu)$ in this neighborhood. It follows that
\[
\min_{x \in U(x^*)} \hat L_c(x, \lambda, \mu)
\]
admits a unique solution $x(\lambda, \mu)$ for every $(\lambda, \mu) \in U(\lambda^*, \mu^*)$ and as a consequence of Theorem 2.24 $(\lambda, \mu) \to x(\lambda, \mu)$ is differentiable. Consequently the locally defined dual functional
\[
d(\lambda, \mu) = \min_{x \in U(x^*)} \hat L_c(x, \lambda, \mu)
\]
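is differentiable for $(\lambda, \mu)$ in a neighborhood of $(\lambda^*, \mu^*)$. For the gradient $\nabla d$ at $(\lambda, \mu)$ in direction $(\delta_\lambda, \delta_\mu)$ we obtain using (3.2.16)
\[
\begin{aligned}
\nabla d(\lambda, \mu)(\delta_\lambda, \delta_\mu)
&= \big\langle f'(x) + e'(x)^* (\lambda + c\, e(x)) + g'(x)^* \max(0, \mu + c g(x)),\ x_\lambda(\delta_\lambda) + x_\mu(\delta_\mu) \big\rangle \\
&\quad + \langle \delta_\lambda, e(x) \rangle + \langle \delta_\mu, \hat g(x, \mu, c) \rangle,
\end{aligned}
\]
where $x = x(\lambda, \mu)$. Utilizing the first order optimality condition this implies that the gradient of $d(\lambda, \mu)$ is given by
\[
\nabla d(\lambda, \mu) = \big(e(x(\lambda, \mu)),\ \hat g(x(\lambda, \mu), \mu, c)\big) \in W \times \mathbb{R}^m.
\tag{3.3.4}
\]
Thus the multiplier updates in step 3 of Algorithm ALM are steepest ascent directions for the dual problem
\[
\sup_{\lambda, \mu \ge 0} d(\lambda, \mu).
\]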
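Formula (3.3.4) can be compared with finite differences of the dual functional on a small example. The sketch below is illustrative only and rests on assumptions that are not part of the text: the toy data $f(x) = \frac12 |x - a|^2$ with $a = (2, 0)$, $e(x) = x_1 + x_2 - 2$, $g(x) = x_1 - 1$ on $X = \mathbb{R}^2$, the value $c = 10$, the evaluation point $(\lambda, \mu) = (-0.7, 1.5)$, and a gradient descent inner solver for $x(\lambda, \mu)$.

```python
A = (2.0, 0.0)   # data of the assumed toy problem
C = 10.0         # penalty parameter c (assumed)


def e(x):
    return x[0] + x[1] - 2.0


def g(x):
    return x[0] - 1.0


def L_hat(x, lam, mu):
    """Modified augmented Lagrangian (3.3.1), written via the identity (3.2.16)."""
    s = max(0.0, mu + C * g(x))
    f = 0.5 * ((x[0] - A[0]) ** 2 + (x[1] - A[1]) ** 2)
    return f + lam * e(x) + 0.5 * C * e(x) ** 2 + (s * s - mu * mu) / (2.0 * C)


def argmin_x(lam, mu, iters=3000):
    """x(lam, mu): minimizer of L_hat in x, computed by gradient descent."""
    x1, x2 = 1.0, 1.0
    step = 1.0 / (1.0 + 3.0 * C)
    for _ in range(iters):
        t = lam + C * (x1 + x2 - 2.0)
        s = max(0.0, mu + C * (x1 - 1.0))
        x1, x2 = x1 - step * ((x1 - A[0]) + t + s), x2 - step * ((x2 - A[1]) + t)
    return (x1, x2)


def dual(lam, mu):
    """Dual functional d(lam, mu) = min_x L_hat(x, lam, mu)."""
    return L_hat(argmin_x(lam, mu), lam, mu)


def dual_gradient(lam, mu):
    """Gradient according to (3.3.4): (e(x), g_hat(x, mu, c)) at x = x(lam, mu)."""
    x = argmin_x(lam, mu)
    return e(x), max(g(x), -mu / C)


lam0, mu0, h = -0.7, 1.5, 1e-5
grad = dual_gradient(lam0, mu0)
fd = (
    (dual(lam0 + h, mu0) - dual(lam0 - h, mu0)) / (2.0 * h),
    (dual(lam0, mu0 + h) - dual(lam0, mu0 - h)) / (2.0 * h),
)
print(grad, fd)  # the two pairs are expected to agree up to discretization error
```

In this example the constraint residuals at $x(\lambda, \mu)$ are exactly the ascent direction used in step 3, scaled there by $\sigma_n$.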
In view of (3.3.3) the first order augmented Lagrangian algorithm is an iterative algorithm combining minimization in the primal direction in step 2 and maximization in the dual direction on every level of the iteration.

Remark 3.3.1. In this chapter we identify $X$ and $W$ with their dual spaces and consider $e'(x)^*$ as an operator from $W$ to $X$. If, alternatively, $e'(x)^*$ is considered as an operator from $W^*$ to $X^*$, then the Lagrange multiplier is taken as an element of $W^*$ and the Lagrange functional is defined as $L(x, \tilde\lambda) = f(x) + \langle \tilde\lambda, e(x) \rangle_{W^*, W}$. The relation between $\lambda$ and $\tilde\lambda$ is given by $\mathcal{I} \lambda = \tilde\lambda$, with $\mathcal{I}$ the canonical isomorphism from $W$ to $W^*$. If, for example, $W = H^{-1}(\Omega)$, then $W^* = H_0^1(\Omega)$ and $\mathcal{I} = (-\Delta)^{-1}$. The Lagrange multiplier update in step 3 of Algorithm ALM is then given by $\tilde\lambda_n = \tilde\lambda_{n-1} + \sigma_n \mathcal{I} e(x_n)$.

As already mentioned the augmented Lagrangian algorithm ALM is also a hybrid method combining Lagrange multiplier and penalty methods. Solving $(P_n)$ without the penalty terms we obtain the former; for $\lambda_n = 0$ and $c_n$ satisfying $\lim_{n \to \infty} c_n = \infty$ we obtain the latter. The choice of $c_n$ for Algorithm ALM in practice is an important one. While no general purpose techniques appear to be available, guidelines for the choice include choosing them large enough such that the augmented Lagrangian has positive definite Hessian in the sense of Proposition 3.1. Large $c_n$ will improve the convergence estimates, as we shall see in the following section, and at the same time they can be the cause for ill-conditioning of the auxiliary problems $(P_n)$. In fact, for large $c_n$ the Hessian of the cost functional $L_n(x)$ can have eigenvalues on significantly different scales. For further discussion on the choice of the parameter $c$ we refer the reader to [Be].

Let us make a few historical comments on the first order augmented Lagrangian algorithm. It was originated by Hestenes [Hes2] and Powell [Pow] and extended among others by Bertsekas [Be] and in [PoTi, Roc2].
The infinite-dimensional case has received less attention. In this respect we refer the reader to the work of Polyak and Tret’yakov [PoTr] and Fortin and Glowinski [FoGl]. Both consider the case of augmenting equality constraints only. In [FoGl] augmented Lagrangian methods are also developed systematically for solving nonlinear partial differential equations.
3.4 Convergence of Algorithm ALM
We shall use the following notation:
\[
\hat{\mathcal{L}}_c(x, \mu) = f(x) + \langle \lambda^*, e(x) \rangle + \langle \mu^*, \hat g(x, \mu, c) \rangle + \langle \eta^*, \ell(x) \rangle + \frac{\bar c}{2} |e(x)|^2 + \frac{\bar c}{2} |\hat g(x, \mu, c)|^2.
\]
Further we set $r = |\mu^*| + (|\lambda_0 - \lambda^*|^2 + |\mu_0 - \mu^*|^2)^{1/2}$. Corollary 3.7 then guarantees the existence of $\delta > 0$ and $\tilde c = \tilde c(r)$ such that
\[
\hat{\mathcal{L}}_c(x, \mu) - f(x^*) \ge \bar\sigma |x - x^*|^2
\tag{3.4.1}
\]
for all $x \in B_\delta$, $\mu \in B_r^+$, and $c \ge \tilde c$. We also assume that $B_\delta$ is contained in the region of applicability of (3.4.1) and that $\{c_n\}_{n=1}^\infty$ is a monotonically nondecreasing sequence with $\tilde c < c_1$. Then we have the following convergence properties of Algorithm ALM from arbitrary initializations $(\lambda_0, \mu_0) \in W \times \mathbb{R}^m_+$ in the case that suboptimal solutions are chosen in step 2.

Theorem 3.8. Assume that (3.1.2)–(3.1.5), (3.4.1) hold and that $\tilde x_n \in B_\delta$ satisfy
\[
L_n(\tilde x_n) \le L_n(x^*) \quad \text{for each } n = 1, \dots.
\tag{3.4.2}
\]
If $(\lambda_n, \mu_n)$ are defined as in step 3 of Algorithm ALM with $x_n$ replaced by $\tilde x_n$, then for $n \ge 1$ we have with $\sigma_n = c_n - \bar c$
\[
\bar\sigma |\tilde x_n - x^*|^2 + \frac{1}{2\sigma_n} (|\lambda_n - \lambda^*|^2 + |\mu_n - \mu^*|^2) \le \frac{1}{2\sigma_n} (|\lambda_{n-1} - \lambda^*|^2 + |\mu_{n-1} - \mu^*|^2).
\tag{3.4.3}
\]
This implies that $\mu_n \in B_r^+$ for all $n \ge 1$ and
\[
|\tilde x_n - x^*|^2 \le \frac{1}{2 \bar\sigma \sigma_n} (|\lambda_0 - \lambda^*|^2 + |\mu_0 - \mu^*|^2)
\tag{3.4.4}
\]
and
\[
\sum_{n=1}^\infty \sigma_n |\tilde x_n - x^*|^2 \le \frac{1}{2 \bar\sigma} (|\lambda_0 - \lambda^*|^2 + |\mu_0 - \mu^*|^2).
\tag{3.4.5}
\]
Proof. We proceed by induction and assume that the claim has been verified up to $n - 1$. For $n = 1$ the result can be obtained by the general arguments given below. For the induction step we observe that (3.4.3), which is assumed to hold up to $n - 1$, implies
\[
|\mu_{n-1}| \le |\mu^*| + (|\lambda_0 - \lambda^*|^2 + |\mu_0 - \mu^*|^2)^{1/2} = r.
\]
Consequently $\mu_{n-1} \in B_r^+$ and (3.4.1) with $\mu = \mu_{n-1}$ is applicable. Using the fact that
\[
[\max(0, \mu_{n-1,i} + c_n g_i(x^*))]^2 - \mu_{n-1,i}^2 \le 0
\]
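and (3.2.16) we find
\[
L_n(x^*) = f(x^*) + \frac{1}{2 c_n} \sum_{i=1}^m \big\{ [\max(0, \mu_{n-1,i} + c_n g_i(x^*))]^2 - \mu_{n-1,i}^2 \big\} \le f(x^*).
\tag{3.4.6}
\]
Next we rearrange terms in $L_n(\tilde x_n)$ and obtain
\[
\begin{aligned}
L_n(\tilde x_n)
&= f(\tilde x_n) + \langle \lambda^*, e(\tilde x_n) \rangle + \langle \lambda_{n-1} - \lambda^*, e(\tilde x_n) \rangle + \langle \mu^*, \hat g(\tilde x_n, \mu_{n-1}, c_n) \rangle \\
&\quad + \langle \mu_{n-1} - \mu^*, \hat g(\tilde x_n, \mu_{n-1}, c_n) \rangle + \frac{\bar c}{2} |e(\tilde x_n)|^2 + \frac12 (c_n - \bar c) |e(\tilde x_n)|^2 \\
&\quad + \frac{\bar c}{2} |\hat g(\tilde x_n, \mu_{n-1}, c_n)|^2 + \frac12 (c_n - \bar c) |\hat g(\tilde x_n, \mu_{n-1}, c_n)|^2 \\
&= \hat{\mathcal{L}}_{c_n}(\tilde x_n, \mu_{n-1}) + \frac{1}{2\sigma_n} (|\lambda_n - \lambda^*|^2 - |\lambda_{n-1} - \lambda^*|^2) \\
&\quad + \frac{1}{2\sigma_n} (|\mu_n - \mu^*|^2 - |\mu_{n-1} - \mu^*|^2) - \langle \eta^*, \ell(\tilde x_n) \rangle_Z.
\end{aligned}
\]
Since $\eta^* \in K^+$ and $\ell(\tilde x_n) \in K$ we have $\langle \eta^*, \ell(\tilde x_n) \rangle \le 0$. This fact together with the above equality, (3.4.2), and (3.4.6) implies
\[
\hat{\mathcal{L}}_{c_n}(\tilde x_n, \mu_{n-1}) + \frac{1}{2\sigma_n} (|\lambda_n - \lambda^*|^2 - |\lambda_{n-1} - \lambda^*|^2) + \frac{1}{2\sigma_n} (|\mu_n - \mu^*|^2 - |\mu_{n-1} - \mu^*|^2) \le L_n(\tilde x_n) \le f(x^*).
\]
Finally (3.4.1) implies that
\[
\bar\sigma |\tilde x_n - x^*|^2 + \frac{1}{2\sigma_n} |\lambda_n - \lambda^*|^2 + \frac{1}{2\sigma_n} |\mu_n - \mu^*|^2 \le \frac{1}{2\sigma_n} |\lambda_{n-1} - \lambda^*|^2 + \frac{1}{2\sigma_n} |\mu_{n-1} - \mu^*|^2,
\]
which establishes (3.4.3). Estimates (3.4.4) and (3.4.5) follow from (3.4.3).

The following conditions will be utilized to guarantee existence of local solutions to the auxiliary problems $(P_n)$:
\[
\begin{cases}
f : X \to \mathbb{R} \text{ and } g_i : X \to \mathbb{R},\ i = 1, \dots, m, \text{ are weakly lower semicontinuous,} \\
e : X \to W \text{ maps weakly convergent sequences to weakly convergent sequences.}
\end{cases}
\tag{3.4.7}
\]
Further $(P_n^C)$ denotes problem $(P_n)$ with the additional constraint that $x$ is contained in the closed ball $B_\delta$. We refer to $x_n$ as the solution to $(P_n^C)$ if $L_n(x_n) \le L_n(x)$ for all $x \in B_\delta$.

Proposition 3.9. If (3.1.2)–(3.1.5), (3.4.1), and (3.4.7) hold, then $(P_n^C)$ admits a solution $x_n$ for every $n = 1, 2, \dots$. Moreover, there exists $n_0$ such that every solution $x_n$ of $(P_n^C)$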
Chapter 3. First Order Augmented Lagrangians
satisfies $x_n \in \operatorname{int} B_\delta$ if $n \ge n_0$. If $\frac{1}{c_1 - \bar c}\big(|\lambda_0 - \lambda^*|^2 + |\mu_0 - \mu^*|^2\big)$ is sufficiently small, then every solution $x_n$ of $(P_n^C)$ is in $\operatorname{int} B_\delta$ for every $n = 1, 2, \dots$. Thus for $n_0$ or $c$ sufficiently large the solutions to $(P_n^C)$ are local solutions of the unconstrained problems $(P_n)$.

Proof. Let $\{x_n^i\}_{i \in \mathbb{N}}$ be a minimizing sequence for $(P_n^C)$. There exists a weakly convergent subsequence $\{x_n^{i_k}\}_{k \in \mathbb{N}}$ with weak limit $x_n \in B_\delta$. Condition (3.4.7) implies weak lower semicontinuity of $L_n$ and hence $L_n(x_n) \le \liminf_{k} L_n(x_n^{i_k})$ and $x_n$ is a solution of $(P_n^C)$. In particular (3.4.2) holds with $\tilde x_n$ replaced by $x_n$ for $n = 1, 2, \dots$. Consequently $\lim_n x_n = x^*$ and $x_n \in \operatorname{int} B_\delta$ for all $n$ sufficiently large. Alternatively, if $\frac{1}{c_1 - \bar c}\big(|\lambda_0 - \lambda^*|^2 + |\mu_0 - \mu^*|^2\big)$ is sufficiently small, then $x_n \in \operatorname{int} B_\delta$ for all $n = 1, 2, \dots$ by (3.4.4).

For the solutions $x_n$ obtained in Proposition 3.9 the conclusions of Theorem 3.8 hold, in particular $\lim_n x_n = x^*$ and the associated Lagrange multipliers $\{\lambda_n\}$ and $\{\mu_n\}$ are bounded. To investigate convergence of the Lagrange multipliers we introduce $M : X \to W \times \mathbb{R}^{m_1 + m_2} \times Z$ defined by
$$M = \big(e'(x^*),\ g_{ac}'(x^*),\ \ell'(x^*)\big),$$
where $g_{ac}$ denotes the first $m_1 + m_2$ coordinates of $g$. The following additional hypothesis will be needed: There exists a constant $\kappa > 0$ such that
$$\begin{aligned} |f'(x^*) - f'(x)| &\le \kappa |x^* - x|, \\ |e'(x^*) - e'(x)| &\le \kappa |x^* - x|, \\ |g'(x^*) - g'(x)| &\le \kappa |x^* - x| \end{aligned} \tag{3.4.8}$$
for all $x \in B_\delta$, and
$$M \text{ is surjective.} \tag{3.4.9}$$
We point out that (3.4.9) is more restrictive than the regular point condition in Definition 1.5 of Chapter 1. Below we set $\mu_{n,ac} = \operatorname{col}(\mu_{n,1}, \dots, \mu_{n,m_1+m_2})$ and $\mu^*_{ac} = \operatorname{col}(\mu^*_1, \dots, \mu^*_{m_1+m_2})$. Without loss of generality we shall assume that $\bar\sigma \le 1$.

Theorem 3.10. Let (3.1.2)–(3.1.5), (3.4.1), and (3.4.8)–(3.4.9) hold and let $x_n$ be a solution of $(P_n)$ in $\operatorname{int} B_\delta$, $n \ge 1$. Then there exists a constant $K$ independent of $n$ such that
$$|\lambda_n - \lambda^*| + |\mu_n - \mu^*| + |\eta_n - \eta^*| \le \frac{K}{\sqrt{\bar\sigma \sigma_n}}\big(|\lambda_{n-1} - \lambda^*| + |\mu_{n-1} - \mu^*|\big) \quad \text{for all } n \ge 1.$$
Here $\eta_n$ denotes the Lagrange multiplier associated with the constraint $\ell(x) \in K$ in $(P_n)$. Sufficient conditions for $x_n \in B_\delta$ were given in Proposition 3.9 above.
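Proof. Optimality of $x_n$ and $x^*$ for $(P_n)$ and (3.1.1) (see (3.1.3)) imply
$$\langle f'(x_n), h\rangle + \langle \lambda_{n-1} + c_n e(x_n), e'(x_n)h\rangle + \langle \max(0, \mu_{n-1} + c_n g(x_n)), g'(x_n)h\rangle + \langle \eta_n, \ell'(x_n)h\rangle = 0$$
and
$$\langle f'(x^*), h\rangle + \langle \lambda^*, e'(x^*)h\rangle + \langle \mu^*, g'(x^*)h\rangle + \langle \eta^*, \ell'(x^*)h\rangle = 0$$
for all $h \in X$. If we set $\tilde\lambda_n = \lambda_{n-1} + c_n e(x_n)$ and $\tilde\mu_n = \max(0, \mu_{n-1} + c_n g(x_n))$, we obtain the following equation in $X$:
$$\begin{aligned} &e'(x^*)^*(\tilde\lambda_n - \lambda^*) + g_{ac}'(x^*)^*(\tilde\mu_{n,ac} - \mu^*_{ac}) + \ell'(x^*)^*(\eta_n - \eta^*) \\ &\quad = f'(x^*) - f'(x_n) + \big(e'(x^*)^* - e'(x_n)^*\big)\tilde\lambda_n + \big(g_{ac}'(x^*)^* - g_{ac}'(x_n)^*\big)\tilde\mu_{n,ac}. \end{aligned}$$
Here we used the fact that $\tilde\mu_{n,i} = 0$ for $i \in I_3$ by the choice of $\delta$ and $\tilde c$ (see the proof of Corollary 3.7), and we set $\tilde\mu_{n,ac} = \operatorname{col}(\tilde\mu_{n,1}, \dots, \tilde\mu_{n,m_1+m_2})$. Note that $M^* : W \times \mathbb{R}^{m_1+m_2} \to X$ is given by $M^* = \operatorname{col}\big(e'(x^*)^*, g_{ac}'(x^*)^*\big)$. Hence we find
$$\begin{aligned} &MM^*\big(\tilde\lambda_n - \lambda^*,\ \tilde\mu_{n,ac} - \mu^*_{ac},\ \eta_n - \eta^*\big) \\ &\quad = M\Big(f'(x^*) - f'(x_n) + \big(e'(x^*)^* - e'(x_n)^*\big)\tilde\lambda_n + \big(g_{ac}'(x^*)^* - g_{ac}'(x_n)^*\big)\tilde\mu_{n,ac}\Big). \end{aligned} \tag{3.4.10}$$
Since $\tilde\lambda_n - \lambda_n = \bar c\, e(x_n)$ and $\tilde\mu_n - \mu_n = \bar c\, \hat g(x_n, \mu_{n-1}, c_n)$ we have
$$|\tilde\lambda_n - \lambda_n| = \bar c\,|e(x_n) - e(x^*)|, \qquad |\tilde\mu_{n,ac} - \mu_{n,ac}| \le \bar c\,|g_{ac}(x_n) - g_{ac}(x^*)|. \tag{3.4.11}$$
Thus by (3.4.8) and (3.4.3), (3.4.4) of Theorem 3.8 the sequences $\{\lambda_n\}$ and $\{\mu_n\}$ are uniformly bounded. Consequently by (3.4.9) and (3.4.10) there exists a constant $\bar K$ such that $|\tilde\lambda_n - \lambda^*| + |\tilde\mu_{n,ac} - \mu^*_{ac}| \le \bar K |x_n - x^*|$, and further by (3.4.11) a constant $K$ can be chosen such that
$$|\lambda_n - \lambda^*| + |\mu_{n,ac} - \mu^*_{ac}| + |\eta_n - \eta^*| \le \bar K |x_n - x^*| \quad \text{for all } n = 1, 2, \dots. \tag{3.4.12}$$
The choice of $\delta$ implies that $g_i(x_n) \le -\frac{\mu_{n-1,i}}{c_n}$ for all $i \in I_3$ and therefore
$$\mu_{n,i} = \frac{\bar c}{c_n}\,\mu_{n-1,i} \quad \text{for } i \in I_3,\ n \ge 1. \tag{3.4.13}$$
From (3.4.3), with $\tilde x_n$ replaced by $x_n$, (3.4.12), and (3.4.13), and using $\bar\sigma \le 1$ the theorem follows.

Corollary 3.11. Under the assumptions of Theorem 3.10 we have
$$|\lambda_n - \lambda^*| + |\mu_n - \mu^*| + |\eta_n - \eta^*| \le \Big(\frac{K}{\sqrt{\bar\sigma}}\Big)^{n} \prod_{i=1}^{n} \frac{1}{\sqrt{\sigma_i}}\,\big(|\lambda_0 - \lambda^*| + |\mu_0 - \mu^*|\big)$$
for all $n \ge 1$.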
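The linear decay of the multiplier error described by Theorem 3.10 and Corollary 3.11 can be observed on a toy problem. The following sketch is not taken from the text: it assumes a hypothetical two-variable quadratic program with a single linear equality constraint (no inequality or cone constraints), takes $\bar c = 0$ so that $\sigma_n = c_n = c$, and solves each auxiliary problem exactly. The names `solve_2x2` and `alm_quadratic` are illustrative only.

```python
def solve_2x2(m, r):
    """Solve the 2x2 linear system m x = r by Cramer's rule."""
    (a11, a12), (a21, a22) = m
    det = a11 * a22 - a12 * a21
    return ((r[0] * a22 - a12 * r[1]) / det,
            (a11 * r[1] - a21 * r[0]) / det)


def alm_quadratic(c=10.0, sigma=10.0, lam0=0.0, steps=8):
    """First order augmented Lagrangian iteration for the assumed toy problem

        min 0.5*(x1**2 + 2*x2**2) - x1 - x2   subject to   x1 + x2 - 1 = 0,

    whose exact solution is x* = (2/3, 1/3) with multiplier lam* = 1/3.
    Returns the last primal iterate, the last multiplier, and the history
    of multiplier errors |lam_n - lam*|.
    """
    q = ((1.0, 0.0), (0.0, 2.0))   # Hessian of the cost
    b = (1.0, 1.0)                 # linear term of the cost
    a = (1.0, 1.0)                 # constraint e(x) = a.x - d
    d = 1.0
    lam_star = 1.0 / 3.0
    lam = lam0
    errors = [abs(lam - lam_star)]
    x = (0.0, 0.0)
    for _ in range(steps):
        # Auxiliary problem: minimize f(x) + lam*e(x) + 0.5*c*e(x)**2, i.e.
        # solve (Q + c a a^T) x = b - lam*a + c*d*a.
        m = tuple(tuple(q[i][j] + c * a[i] * a[j] for j in range(2))
                  for i in range(2))
        r = tuple(b[i] - lam * a[i] + c * d * a[i] for i in range(2))
        x = solve_2x2(m, r)
        residual = a[0] * x[0] + a[1] * x[1] - d
        # First order multiplier update lam_n = lam_{n-1} + sigma * e(x_n).
        lam = lam + sigma * residual
        errors.append(abs(lam - lam_star))
    return x, lam, errors
```

For this particular problem a short calculation shows that the multiplier error is multiplied by $2/(2 + 3c)$ in every step when $\sigma = c$, so increasing the penalty parameter speeds up the convergence, in line with the dependence on $\sigma_n$ in the estimate above.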
Remark 3.4.1. If the surjectivity requirement for $M$ is replaced by assuming that $M^*$ is surjective, then the proof of Theorem 3.10 implies the existence of a constant $K$ such that
$$|P_M(\lambda_n - \lambda^*, \mu_{n,ac} - \mu^*_{ac}, \eta_n - \eta^*)|_{W \times \mathbb{R}^{m_1+m_2} \times Z} \le K |x_n - x^*|,$$
where $P_M = M(M^*M)^{-1}M^*$ denotes the orthogonal projection of $W \times \mathbb{R}^{m_1+m_2} \times Z$ onto the range of $M$ which is closed, since $M^*$ is surjective. Since $x_n \to x^*$ in $X$ this implies convergence of $P_M(\lambda_n, \mu_{n,ac}, \eta_n)$ to $P_M(\lambda^*, \mu^*_{ac}, \eta^*)$.

Remark 3.4.2. If a constraint of the type $x \in C$, with $C$ a closed convex set in $X$, appears in (3.1.1), then Theorem 3.8 and Proposition 3.9 remain valid if Algorithm ALM is modified such that in $(P_n)$ the functional $L_n(x)$ is minimized over $x \in C$ and $\tilde x_n$ appearing in (3.4.2) satisfies $\tilde x_n \in C$. The stationarity condition (3.1.3) has to be replaced by
$$f'(x^*)(x - x^*) + \langle \lambda^*, e'(x^*)(x - x^*)\rangle_W + \langle \mu^*, g'(x^*)(x - x^*)\rangle_{\mathbb{R}^m} \ge 0 \tag{3.1.3$'$}$$
for all $x \in C$, in this case.
3.5 Application to a parameter estimation problem

We consider a least squares formulation for the estimation of the coefficient $a$ in
$$-\operatorname{div}(a \operatorname{grad} y) = f \ \text{ in } \Omega, \qquad y = 0 \ \text{ on } \partial\Omega, \tag{3.5.1}$$
where $\Omega$ is a bounded domain in $\mathbb{R}^n$, with Lipschitz continuous boundary $\partial\Omega$ if $n \ge 2$, and $f \in H^{-1}(\Omega)$, $f \ne 0$. Let $Q$ be a Hilbert space that embeds compactly in $L^\infty(\Omega)$ and let $N : Q \to \mathbb{R}$ denote a seminorm on $Q$ with $\ker(N) \subset \operatorname{span}\{1\}$ and such that $N(a)$ defines a norm on $Q/\mathbb{R} = \{a \in Q : \langle 1, a\rangle_Q = 0\}$ that is equivalent to the norm induced by $Q$. For $z \in H_0^1(\Omega)$, $\alpha > 0$, and $\beta > 0$ we consider
$$\min\ \tfrac{1}{2}|y - z|^2_{H_0^1(\Omega)} + \tfrac{\alpha}{2} N^2(a) \quad \text{subject to } e(a, y) = 0,\ a(x) \ge \beta, \tag{3.5.2}$$
where $e : Q \times H_0^1(\Omega) \to H^{-1}(\Omega)$ is given by
$$e(a, y) = \operatorname{div}(a \operatorname{grad} y) + f.$$
This is a special case of (3.1.1) with $X = Q \times H_0^1(\Omega)$, $W = H^{-1}(\Omega)$, $g = 0$, and $f$ the cost functional in (3.5.2). The affine constraint $a(x) \ge \beta$ can be considered as an explicit constraint as discussed in Remark 3.4.2 by setting $C = \{a \in Q : a(x) \ge \beta\}$. Denote the solution to (3.5.1) as a function of $a$ by $y(a)$. To guarantee the existence of a solution for (3.5.2) we require the following assumption:
$$\text{either } N \text{ is a norm on } Q \quad \text{or} \quad \inf\{|y(a) - z|_{H^1} : a \text{ is constant}\} < |z|_{H^1}. \tag{3.5.3}$$
Lemma 3.12. If (3.5.3) holds, then there exists a solution $(a^*, y^*)$ of (3.5.2).

Proof. Let $\{(a_n, y_n)\}_{n=1}^\infty$ denote a minimizing sequence for (3.5.2). If $N$ is a norm, then a subsequence argument implies the existence of a solution to (3.5.2). Otherwise we decompose $a_n$ as $a_n = a_n^1 + a_n^2$ with $a_n^2 \in Q/\mathbb{R}$ and $a_n^1 \in (Q/\mathbb{R})^\perp$. Since $\{(a_n, y_n)\}_{n=1}^\infty$ is a minimizing sequence, $\{N(a_n^2)\}_{n=1}^\infty$ and consequently $\{|a_n^2|_{L^\infty}\}_{n=1}^\infty$ are bounded. We have
$$a_n^1 \int_\Omega |\nabla y_n|^2\,dx = -\int_\Omega a_n^2 |\nabla y_n|^2\,dx + \int_\Omega f y_n\,dx.$$
Since $a \ge \beta$ implies $a^1 \ge \beta$, it follows that $\{a_n^1 |\nabla y_n|^2_{L^2}\}_{n=1}^\infty$ is bounded. If $a_n^1 \to \infty$, then $|\nabla y_n| = |\nabla y(a_n)| \to 0$ for $n \to \infty$. In this case
$$|z|_{H^1} \le \lim_{n \to \infty} |y(a_n) - z|^2_{H^1(\Omega)} + \beta N^2(a_n) \le \inf\{|y(a) - z|_{H^1} : a = \text{constant}\} < |z|_{H^1},$$
which is impossible. Therefore $\{a_n^1\}_{n=1}^\infty$ and as a consequence $\{a_n\}_{n=1}^\infty$ are bounded, and a subsequence of $\{(a_n, y_n)\}$ converges weakly to a solution of (3.5.2).

Using Theorem 1.6 one argues the existence of Lagrange multipliers $(\lambda^*, \eta^*) \in H_0^1(\Omega) \times Q^*$ such that the Lagrangian
$$L(a, y, \lambda^*, \eta^*) = \frac{1}{2}|y - z|^2_{H_0^1(\Omega)} + \frac{\alpha}{2} N^2(a) + \langle \lambda^*, e(a, y)\rangle_{H_0^1, H^{-1}} + \langle \eta^*, \beta - a\rangle_{Q^*, Q}$$
is stationary at $(a^*, y^*)$, and $\langle \eta^*, \beta - a^*\rangle_{Q^*, Q} = 0$, $\langle \eta^*, h\rangle_{Q^*, Q} \ge 0$ for all $h \ge 0$. Hence (3.1.2)–(3.1.5) hold. Here (3.1.3)–(3.1.4) must be modified to include the affine inequality constraint with infinite-dimensional image space.

We turn to the augmentability condition (3.4.1) and introduce the augmented Lagrangian
$$L_c(a, y, \lambda^*, \eta^*) = L(a, y, \lambda^*, \eta^*) + \frac{c}{2}|e(a, y)|^2_{H^{-1}}$$
for $c \ge 0$. Henceforth we assume that (3.5.2) admits a solution $(a_0, y_0)$ for $\alpha = 0$. Such a solution exists if $z$ is attainable, i.e., if there exists $a_0 \in Q$, $a_0 \ge \beta$, such that $y(a_0) = z$, and in this case $y_0 = y(a_0)$. Alternatively, if the set of coefficients is further constrained by a norm bound $|a|_Q \le \gamma$, for example, then existence of a solution to (3.5.2) with $\alpha = 0$ is also guaranteed.

Proposition 3.13. Let $(a_0, y_0)$ denote a solution to (3.5.2) with $\alpha = 0$, let $(a^*, y^*)$ be a solution for $\alpha > 0$, and choose $\gamma \ge |a^*|_Q$. Then, if $|y_0 - z|_{H^1}$ is sufficiently small, there exist positive constants $c_0$ and $\sigma$ such that
$$L_c(a, y; \lambda^*, \eta^*) \ge \frac{1}{2}|y^* - z|^2_{H_0^1} + \frac{\alpha}{2} N^2(a^*) + \sigma\big(|a - a^*|^2_Q + |y - y^*|^2_{H^1}\big)$$
for all $c \ge c_0$ and $(a, y) \in C \times H_0^1(\Omega)$ with $|a|_Q \le \gamma$.

This proposition implies that (3.4.1) is satisfied and that Theorem 3.8 and Proposition 3.9 are applicable with $B_\delta = \{a \in Q : a(x) \ge \beta,\ |a|_Q \le \gamma\} \times H_0^1(\Omega)$.
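Proof. For $(a, y) \in C \times H_0^1(\Omega)$ we have
$$\begin{aligned} L_c(a, y, \lambda^*, \eta^*) - L_c(a^*, y^*, \lambda^*, \eta^*) &= \nabla L(a^*, y^*, \lambda^*, \eta^*)(h, v) - \langle h \nabla\lambda^*, \nabla v\rangle_{L^2} \\ &\quad + \frac{1}{2}|v|^2_{H_0^1} + \frac{\alpha}{2} N^2(h) + \frac{c}{2}|e(a, y)|^2_{H^{-1}}, \end{aligned}$$
where $h = a - a^*$, $v = y - y^*$, and $\nabla L$ denotes the gradient of $L$ with respect to $(a, y)$. First order optimality implies that
$$L_c(a, y, \lambda^*, \eta^*) - L_c(a^*, y^*, \lambda^*, \eta^*) \ge -\langle h \nabla\lambda^*, \nabla v\rangle_{L^2} + \frac{1}{2}|v|^2_{H_0^1} + \frac{\alpha}{2} N^2(h) + \frac{c}{2}|e(a, y)|^2_{H^{-1}} =: D.$$
Introduce $P = \nabla \cdot (-\Delta)^{-1} \nabla$ and note that $P$ can be extended to an orthogonal projection on $L^2(\Omega)^n$. We find that
$$\begin{aligned} |e(a, y)|^2_{H^{-1}} &= \langle P(a \nabla y - a^* \nabla y^*), a \nabla y - a^* \nabla y^*\rangle_{L^2} \\ &= \langle P(a \nabla v + h \nabla y^*), h \nabla y^* + a \nabla v\rangle_{L^2} \\ &\ge \frac{1}{2}\langle P(h \nabla y^*), h \nabla y^*\rangle_{L^2} - \langle P(a \nabla v), a \nabla v\rangle_{L^2} \\ &\ge \frac{1}{2}\langle P(h \nabla y^*), h \nabla y^*\rangle_{L^2} - |a|^2_{L^\infty} |v|^2_{H_0^1} \\ &\ge \frac{1}{2}\langle P(h \nabla y^*), h \nabla y^*\rangle_{L^2} - \gamma^2 k_1^2 |v|^2_{H_0^1}, \end{aligned}$$
where $k_1$ denotes the embedding constant of $Q$ into $L^\infty(\Omega)$. Henceforth we consider the case that $\ker N = \operatorname{span}\{1\}$ and use the decomposition introduced in the proof of Lemma 3.12. We have
$$|e(a, y)|^2 \ge \frac{1}{2}|h_1|^2 |y^*|^2_{H_0^1} - |h_1|\,|h_2|_{L^\infty} |y^*|^2_{H_0^1} - \gamma^2 k_1^2 |v|^2_{H_0^1}.$$
Since $Q$ embeds continuously into $L^\infty(\Omega)$ and $N$ is a norm equivalent to the norm on $Q/\mathbb{R}$, there exists a constant $k_2$ such that $|h_2|_{L^\infty} \le k_2 N(h)$ for all $h = h_1 + h_2 \in Q$. Consequently
$$|e(a, y)|^2 \ge \frac{1}{2}|h_1|^2 |y^*|^2_{H_0^1} - k_2 |h_1| N(h_2) |y^*|^2_{H_0^1} - \gamma^2 k_1^2 |v|^2_{H_0^1} \tag{3.5.4}$$
for all $(a, y) \in C \times H_0^1(\Omega)$. Next note that $L_y(a^*, y^*; \lambda^*, \eta^*) = 0$ implies that $-\nabla \cdot (a^* \nabla\lambda^*) = -\Delta(y^* - z)$ in $H^{-1}(\Omega)$, and hence
$$\langle h \nabla\lambda^*, \nabla v\rangle_{L^2} \le \frac{1}{\beta}\big(|h_1| + k_2 N(h_2)\big)|y^* - z|_{H_0^1} |v|_{H_0^1}. \tag{3.5.5}$$
Setting $c = \delta\alpha$ with $\delta > 0$ and observing that $|f|_{H^{-1}} \le k_3 |y^*|_{H^1}$ for a constant $k_3$ depending only on $\beta$,
$$\begin{aligned} D &= \frac{1}{2}\big(1 - \delta\alpha\gamma^2 k_1^2\big)|v|^2_{H_0^1} - \frac{1}{\beta}\big(|h_1| + k_2 N(h_2)\big)|y^* - z|\,|v|_{H_0^1} \\ &\quad + \frac{\alpha}{2}\Big(N^2(h_2) + \frac{\delta}{2}|y^*|^2_{H_0^1} |h_1|^2 - \frac{k_2}{k_3^2}\,\delta\,|h_1| N(h_2)\,|f|^2_{H^{-1}}\Big). \end{aligned}$$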
There exist $\delta > 0$ and $k_4 > 0$ such that
$$D \ge k_4\big(|v|^2_{H_0^1} + |h_1|^2 + N^2(h_2)\big) - \frac{1}{\beta}\big(|h_1| + k_2 N(h_2)\big)|y^* - z|_{H_0^1} |v|_{H_0^1}.$$
Since $|h_1| + N(h_2)$ defines an equivalent norm on $Q$, the claim follows with $c_0 = \delta\alpha$.

It is worthwhile to comment on the auxiliary problems of Algorithm ALM. These involve the minimization of
$$L_n(a, y) = \frac{1}{2}|y - z|^2_{H_0^1} + \frac{\alpha}{2} N^2(a) + \langle \lambda_{n-1}, A(a)y + f\rangle_{H_0^1, H^{-1}} + \frac{c_n}{2}|A(a)y + f|^2_{H^{-1}} \tag{3.5.6}$$
over $(a, y) \in C \times H_0^1(\Omega)$, where $A(a)y = \operatorname{div}(a \operatorname{grad} y)$. The resulting optimality conditions are given by
$$\frac{\alpha}{2}\big(N^2(a_n)\big)'(a - a_n) + \big(-\nabla\lambda_{n-1} \cdot \nabla y_n + c_n \nabla\Delta^{-1}(A(a_n)y_n + f) \cdot \nabla y_n,\ a - a_n\big)_{L^2} \ge 0 \tag{3.5.7}$$
for all $a \ge \beta$ and
$$-\Delta(y_n - z) + A(a_n)\lambda_{n-1} + c_n A(a_n)(-\Delta)^{-1}(A(a_n)y_n + f) = 0. \tag{3.5.8}$$
The Lagrange multiplier is updated according to
$$\lambda_n = \lambda_{n-1} + \sigma_n(-\Delta)^{-1}(A(a_n)y_n + f). \tag{3.5.9}$$
To simplify (3.5.7), (3.5.8) for numerical realization one can proceed iteratively, solving (3.5.7) with $y_n$ replaced by $y_{n-1}$ for $a_n$ and then (3.5.8) for $y_n$. A good initialization for the variable $\lambda$ is given by $\lambda_0 = 0$. In fact, for small residue problems $|y^* - z|_{H_0^1}$ is small and then $\frac{\partial}{\partial y} L(a^*, y^*, \lambda^*, \eta^*) = 0$ implies that $\lambda^*$ is small. Setting $\lambda_0 = 0$ and $y_0 = z$ the coefficient $a_1$ is determined from
$$\min_{a \ge \beta}\ \frac{\alpha}{2} N^2(a) + \frac{c_1}{2}|A(a)z + f|^2_{H^{-1}}. \tag{3.5.10}$$
Thus the first step of the augmented Lagrangian algorithm for estimating $a$ in (3.5.1) involves a regularized equation error step. Recall that the equation error method for parameter estimation problems consists in solving the hyperbolic equation $-\operatorname{div}(a \operatorname{grad} z) = f$ for $a$. This estimate is improved during the subsequent iterations. The first order augmented Lagrangian method can therefore be considered as a hybrid method combining the output least squares and the equation error approach to parameter estimation.

Let us close this section with some brief comments. If the $H^1$ least squares criterion is replaced by an $L^2$-criterion, then the first negative Laplacian in (3.5.8) must be replaced by the identity operator. For the $H^1$-criterion the same number of differentiations is applied to $y$ in the least squares and the equation error term in (3.5.8). If one approaches the problem of estimating $a$ in (3.5.1) by first discretizing and then applying an optimization strategy, the discrete analogue of $(-\Delta)^{-1}$ can be interpreted as preconditioning. We assumed that $Q$ embeds compactly in $L^\infty(\Omega)$. This is the case, for example, if $Q = H^1(\Omega)$ when $n = 1$, or $Q = H^2(\Omega)$ when $n = 2$ or $3$, or if $Q \subset L^\infty(\Omega)$ is finite-dimensional.
Chapter 4
Augmented Lagrangian Methods for Nonsmooth, Convex Optimization
4.1 Introduction

The class of optimization problems that motivates this chapter is given by
$$\min\ f(x) + \varphi(\Lambda x) \quad \text{over } x \in C, \tag{4.1.1}$$
where $X$, $H$ are real Hilbert spaces, $C$ is a closed convex subset of $X$, and $\Lambda \in \mathcal{L}(X, H)$. In this chapter we identify $H$ with its dual space and we distinguish between $X$ and its dual $X^*$. Further $f : X \to \mathbb{R}$ is a continuously differentiable, convex function, and $\varphi : H \to (-\infty, \infty]$ is a proper, lower semicontinuous, convex but not necessarily differentiable function. This problem class encompasses a wide variety of optimization problems including variational inequalities of the first and second kind [Glo]. Our formulation here is slightly more general than the one in [Glo, EkTe], since we allow the additional constraint $x \in C$. For example, one can formulate regularized inverse scattering problems in the form (4.1.1):
$$\min \int_\Omega \Big(\frac{\mu}{2}|\nabla u|^2 + g\,|\nabla u|\Big)\,dx + \frac{1}{2}\int_\Omega \Big|\int_\Omega k(x, y)u(y)\,dy - z(x)\Big|^2 dx \tag{4.1.2}$$
over $u \in H^1(\Omega)$ and $u \ge 0$, where $k(x, y)$ denotes a scattering kernel. The problem consists in recovering the original image $u$ defined on a domain $\Omega$ from scattered and noisy data $z \in L^2(\Omega)$. Here $\mu > 0$ and $g > 0$ are fixed and should be adjusted to the statistics of the noise. If $\mu = 0$, then this problem is equivalent to the image enhancement algorithm in [ROF] based on minimization of the BV-seminorm $\int_\Omega |\nabla u|\,ds$. In this example $\varphi(v) = g\int_\Omega |v|\,ds$, which is nondifferentiable. Several additional applications are treated at the end of this chapter.

We develop a Lagrange multiplier theory to deal with the nonsmoothness of $\varphi$. To briefly explain the approach let $x, \lambda \in H$ and $c > 0$, and define the family of generalized Yosida–Moreau approximations $\varphi_c(x, \lambda)$ by
$$\varphi_c(x, \lambda) = \inf_{u \in H}\Big\{\varphi(x - u) + (\lambda, u)_H + \frac{c}{2}|u|^2_H\Big\}. \tag{4.1.3}$$
This is equivalent to an augmented Lagrangian approach. In fact, note that (4.1.1) is equivalent to
$$\min\ f(x) + \varphi(\Lambda x - u) \quad \text{subject to } x \in C \text{ and } u = 0 \text{ in } H. \tag{4.1.4}$$
Treating the equality constraint $u = 0$ in (4.1.4) by the augmented Lagrangian method results in the minimization problem
$$\min_{x \in C,\, u \in H}\ f(x) + \varphi(\Lambda x - u) + (\lambda, u)_H + \frac{c}{2}|u|^2_H, \tag{4.1.5}$$
where $\lambda \in H$ is a multiplier and $c$ is a positive scalar penalty parameter. Equivalently, problem (4.1.5) is written as
$$\min_{x \in C}\ L_c(x, \lambda) = f(x) + \varphi_c(\Lambda x, \lambda). \tag{4.1.6}$$
It will be shown that $\varphi_c(u, \lambda)$ is continuously Fréchet differentiable with respect to $u \in H$. Moreover, if $x_c \in C$ denotes the solution to (4.1.6), then it satisfies
$$\langle f'(x_c) + \Lambda^*\lambda_c, x - x_c\rangle_{X^*, X} \ge 0 \quad \text{for all } x \in C, \qquad \lambda_c = \varphi_c'(\Lambda x_c, \lambda_c).$$
It will further be shown that under appropriate conditions the pair $(x_c, \lambda_c) \in C \times H$ has a (strong-weak) cluster point $(\bar x, \bar\lambda)$ as $c \to \infty$ such that $\bar x \in C$ is the minimizer of (4.1.1) and that $\bar\lambda \in H$ is a Lagrange multiplier in the sense that
$$\langle f'(\bar x) + \Lambda^*\bar\lambda, x - \bar x\rangle_{X^*, X} \ge 0 \quad \text{for all } x \in C \tag{4.1.7}$$
with the complementarity condition
$$\bar\lambda = \varphi_c'(\Lambda\bar x, \bar\lambda) \quad \text{for each } c > 0. \tag{4.1.8}$$
System (4.1.7)–(4.1.8) for the pair $(\bar x, \bar\lambda)$ is a necessary and sufficient optimality condition for problem (4.1.1). We analyze iterative algorithms of Uzawa and augmented Lagrangian type for finding the optimal pair $(\bar x, \bar\lambda)$ and present a convergence analysis. It will be shown that condition (4.1.8) is equivalent to the complementarity condition $\bar\lambda \in \partial\varphi(\Lambda\bar x)$. Thus the frequently employed differential inclusion $\bar\lambda \in \partial\varphi(\Lambda\bar x)$ is replaced by the nonlinear equation (4.1.8).

This chapter is organized as follows. In Sections 4.2–4.3 we present the basic convex analysis and duality theory in Banach spaces. Section 4.4 is devoted to the generalized Yosida–Moreau approximation, which is the basis for the remainder of the chapter. Conditions for the existence of Lagrange multipliers for (4.1.1) and optimality systems are derived in Section 4.5. Section 4.6 is devoted to the augmented Lagrangian algorithm. Convergence of both the augmented Lagrangian and the Uzawa methods is proved. Section 4.7 contains a large number of concrete applications.
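For orientation, the approximation (4.1.3) and the complementarity condition (4.1.8) can be made explicit in the simplest scalar case. The sketch below is an illustration under assumptions that are not part of the text: $H = \mathbb{R}$, $\varphi = |\cdot|$, and the operator in (4.1.1) is the identity. Completing the square in (4.1.3) gives a shifted Huber function, and its derivative with respect to $x$ is the projection of $\lambda + cx$ onto $[-1, 1]$. The function names are hypothetical; `phi_c_brute` is only a grid search used to cross-check the closed form.

```python
def phi_c(x, lam, c):
    """Closed form of inf_u { |x - u| + lam*u + 0.5*c*u**2 } on H = R.

    With w = x + lam/c the infimum equals the Moreau envelope of |.| with
    parameter 1/c evaluated at w, minus lam**2 / (2*c).
    """
    w = x + lam / c
    if abs(w) <= 1.0 / c:
        envelope = 0.5 * c * w * w
    else:
        envelope = abs(w) - 0.5 / c
    return envelope - lam * lam / (2.0 * c)


def phi_c_brute(x, lam, c, half_width=5.0, n=20001):
    """Approximate the same infimum by a search over a uniform grid in u."""
    best = float("inf")
    for k in range(n):
        u = -half_width + 2.0 * half_width * k / (n - 1)
        best = min(best, abs(x - u) + lam * u + 0.5 * c * u * u)
    return best


def dphi_c(x, lam, c):
    """Derivative of phi_c with respect to x: projection of lam + c*x onto [-1, 1]."""
    return max(-1.0, min(1.0, lam + c * x))
```

In this scalar setting the fixed point equation $\lambda = \varphi_c'(x, \lambda)$ holds for every $c > 0$ exactly when $\lambda \in \partial|x|$: for $x = 0$ any $\lambda \in [-1, 1]$ is reproduced by `dphi_c`, while for $x > 0$ only $\lambda = 1$ is.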
4.2 Convex analysis
In this section we present standard results from convex analysis and duality theory in Banach spaces following, in part, [BaPe, EkTu]. Throughout $X$ denotes a real Banach space.

Definition 4.1. (1) A functional $F : X \to (-\infty, \infty]$ is called convex if
$$F((1 - \lambda)x_1 + \lambda x_2) \le (1 - \lambda)F(x_1) + \lambda F(x_2)$$
for all $x_1, x_2 \in X$ and $0 \le \lambda \le 1$. It is called proper if it is not identically $\infty$.
(2) A functional $F : X \to (-\infty, \infty]$ is said to be lower semicontinuous (l.s.c.) at $x \in X$ if
$$F(x) \le \liminf_{y \to x} F(y).$$
A functional $F$ is l.s.c. if it is l.s.c. at all $x \in X$.
(3) A functional $F : X \to (-\infty, \infty]$ is called weakly lower semicontinuous (w.l.s.c.) at $x$ if
$$F(x) \le \liminf_{n \to \infty} F(x_n)$$
for all sequences $\{x_n\}$ converging weakly to $x$. Further $F$ is w.l.s.c. if it is w.l.s.c. at all $x \in X$.
(4) The subset $D(F) = \{x \in X : F(x) < \infty\}$ of $X$ is called the effective domain of $F$.
(5) The epigraph of $F$ is defined by $\operatorname{epi}(F) = \{(x, c) \in X \times \mathbb{R} : F(x) \le c\}$.

Lemma 4.2. A functional $F : X \to (-\infty, \infty]$ is l.s.c. if and only if its epigraph is closed.

Proof. The lemma follows from the fact that $\operatorname{epi}(F)$ is closed if and only if $(x_n, c_n) \in \operatorname{epi}(F) \to (x, c)$ in $X \times \mathbb{R}$ implies $F(x) \le c$.

Lemma 4.3. A functional $F : X \to (-\infty, \infty]$ is l.s.c. if and only if its level sets $S_c = \{x \in X : F(x) \le c\}$ are closed for all $c \in \mathbb{R}$.

Proof. Assume that $F$ is l.s.c. Let $c \in \mathbb{R}$ and let $\{x_n\}$ be a sequence in $S_c$ with limit $x$. Then $F(x) \le \liminf_{n \to \infty} F(x_n) \le c$ and hence $x \in S_c$ and $S_c$ is closed. Conversely, assume that $S_c$ is closed for all $c \in \mathbb{R}$, and let $\{x_n\}$ be a sequence converging to $x$ in $X$. Choose a subsequence $\{x_{n_k}\}$ with
$$\lim_{k \to \infty} F(x_{n_k}) = \liminf_{n \to \infty} F(x_n),$$
and suppose that $F(x) > \liminf_{n \to \infty} F(x_n)$. Then there exists $\bar c \in \mathbb{R}$ such that
$$\liminf_{n \to \infty} F(x_n) < \bar c < F(x),$$
and there exists an index $m$ such that $F(x_{n_k}) < \bar c$ for all $k \ge m$. Since $S_{\bar c}$ is closed, $x \in S_{\bar c}$, which is a contradiction to $\bar c < F(x)$.

Lemma 4.4. Assume that all level sets of the functional $F : X \to (-\infty, \infty]$ are convex. Then $F$ is l.s.c. if and only if it is w.l.s.c. on $X$.
Proof. Assume that $F$ is l.s.c. Suppose that $\{x_n\}$ converges weakly to $\bar x$ in $X$, and let $\{x_{n_k}\}$ be a subsequence such that $d = \liminf F(x_n) = \lim_{k \to \infty} F(x_{n_k})$. By Lemma 4.3 the sets $\{x : F(x) \le d + \epsilon\}$ are closed for every $\epsilon > 0$. By assumption they are also convex. Hence by Mazur's lemma they are also weakly (sequentially) closed. Hence $F(\bar x) \le d + \epsilon$ for every $\epsilon > 0$. Since $\epsilon$ is arbitrary, we have that $F(\bar x) \le d$ and $F$ is w.l.s.c. The converse implication is obvious.

Note that for a convex function all associated level sets are convex, but not vice versa.

Theorem 4.5. Let $F$ be a proper, l.s.c., convex functional on $X$. Then $F$ is bounded below by an affine functional; i.e., there exist $x^* \in X^*$ and $c \in \mathbb{R}$ such that
$$F(x) \ge \langle x^*, x\rangle_{X^*, X} + c \quad \text{for all } x \in X.$$
Moreover, $F$ is the pointwise supremum of a family of such continuous affine functionals.

Proof. Let $x_0 \in X$ and choose $\beta \in \mathbb{R}$ such that $F(x_0) > \beta$. Since $\operatorname{epi}(F)$ is a closed convex subset of the product space $X \times \mathbb{R}$, it follows from the separation theorem for convex sets [EkTu] that there exists a closed hyperplane $H \subset X \times \mathbb{R}$ given by
$$H = \{(x, r) \in X \times \mathbb{R} : \langle x_0^*, x\rangle + ar = \alpha\} \quad \text{with } x_0^* \in X^*,\ a \in \mathbb{R},\ \alpha \in \mathbb{R},$$
such that
$$\langle x_0^*, x_0\rangle + a\beta < \alpha < \langle x_0^*, x\rangle + ar \quad \text{for all } (x, r) \in \operatorname{epi}(F).$$
Setting $x = x_0$ and $r = F(x_0)$, we have $\langle x_0^*, x_0\rangle + a\beta < \alpha < \langle x_0^*, x_0\rangle + aF(x_0)$ and thus $a(F(x_0) - \beta) > 0$. If $F(x_0) < \infty$, then $a > 0$ and thus
$$\frac{\alpha}{a} - \frac{1}{a}\langle x_0^*, x\rangle \le r \quad \text{for all } (x, r) \in \operatorname{epi}(F),$$
$$\beta < \frac{\alpha}{a} - \frac{1}{a}\langle x_0^*, x_0\rangle < F(x_0).$$
Hence $b(x) = \frac{\alpha}{a} - \frac{1}{a}\langle x_0^*, x\rangle$ is a continuous affine function on $X$ such that $b \le F$ and the first claim is established with $c = \frac{\alpha}{a}$ and $x^* = -\frac{x_0^*}{a}$. Moreover $\beta < b(x_0) < F(x_0)$. Therefore
$$F(x_0) = \sup\big\{b(x_0) = \langle x^*, x_0\rangle + c : x^* \in X^*,\ c \in \mathbb{R},\ b(x) \le F(x) \text{ for all } x \in X\big\}. \tag{4.2.1}$$
If $F(x_0) = \infty$, either $a > 0$ (thus we proceed as above) or $a = 0$. In the latter case $\alpha - \langle x_0^*, x_0\rangle > 0$ and $\alpha - \langle x_0^*, x\rangle < 0$ on $D(F)$. Since $F$ is proper there exists an affine function $b(x) = \langle x^*, x\rangle + c$ such that $b \le F$. Thus
$$\langle x^*, x\rangle + c + \theta(\alpha - \langle x_0^*, x\rangle) < F(x)$$
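for all $x$ and $\theta > 0$. Choosing $\theta > 0$ large enough so that $\langle x^*, x_0\rangle + c + \theta(\alpha - \langle x_0^*, x_0\rangle) > \beta$, we have that $b(x) = \langle x^*, x\rangle + c + \theta(\alpha - \langle x_0^*, x\rangle)$ is a continuous affine function on $X$ with $b \le F$ and $\beta < b(x_0)$. Therefore (4.2.1) holds at $x_0$ with $F(x_0) = \infty$ as well.

Theorem 4.6. If $F : X \to (-\infty, \infty]$ is convex and bounded above on an open set $U$, then $F$ is continuous on $U$.

Proof. We choose $M \in \mathbb{R}$ such that $F(x) \le M - 1$ for all $x \in U$. Let $\hat x$ be any element in $U$. Since $U$ is open there exists a $\delta > 0$ such that the open ball $\{x \in X : |x - \hat x| < \delta\}$ is contained in $U$. For any $\epsilon \in (0, 1)$, let $\theta = \frac{\epsilon}{M - F(\hat x)}$. Then for $x \in X$ satisfying $|x - \hat x| < \theta\delta$ we have
$$\Big|\frac{x - \hat x}{\theta} + \hat x - \hat x\Big| = \frac{|x - \hat x|}{\theta} < \delta.$$
Hence $\frac{x - \hat x}{\theta} + \hat x \in U$. By convexity of $F$
$$F(x) \le (1 - \theta)F(\hat x) + \theta F\Big(\frac{x - \hat x}{\theta} + \hat x\Big) < (1 - \theta)F(\hat x) + \theta M,$$
and thus $F(x) - F(\hat x) < \theta(M - F(\hat x)) = \epsilon$. Similarly, $\frac{\hat x - x}{\theta} + \hat x \in U$ and
$$F(\hat x) \le \frac{\theta}{1 + \theta}F\Big(\frac{\hat x - x}{\theta} + \hat x\Big) + \frac{1}{1 + \theta}F(x) < \frac{\theta M}{1 + \theta} + \frac{1}{1 + \theta}F(x),$$
which implies $F(x) - F(\hat x) > -\theta(M - F(\hat x)) = -\epsilon$. Therefore $|F(x) - F(\hat x)| < \epsilon$ if $|x - \hat x| < \theta\delta$ and $F$ is continuous in $U$.

Theorem 4.7. If $F : X \to (-\infty, \infty]$ is a proper, l.s.c., convex functional on $X$, and $F$ is bounded above on a convex, open $\delta$-neighborhood $U$ of a bounded, convex set $C$, then $F$ is Lipschitz continuous on $C$.

Proof. By Theorem 4.5 and by assumption there exist constants $M$ and $m$ such that
$$m \le F(x) \le M \quad \text{for all } x \in U.$$
Let $x$ and $\hat x$ be in $C$ with $|x - \hat x| \le \frac{\delta}{M - m}$, $x \ne \hat x$, and set $\theta = \frac{2|x - \hat x|}{\delta}$. Without loss of generality we assume that $\frac{2}{M - m} < 1$ so that $\theta \in (0, 1)$. Then $y = \frac{x - \hat x}{\theta} + \hat x \in U$ since $\hat x \in U$ and $|y - \hat x| = \frac{|x - \hat x|}{\theta} \le \frac{\delta}{2}$. Due to convexity of $F$ we have
$$F(x) \le (1 - \theta)F(\hat x) + \theta F\Big(\frac{x - \hat x}{\theta} + \hat x\Big) \le (1 - \theta)F(\hat x) + \theta M$$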
and hence
$$F(x) - F(\hat x) \le \theta(M - F(\hat x)) \le \frac{2}{\delta}(M - m)|x - \hat x|.$$
Similarly, $\frac{\hat x - x}{\theta} + \hat x \in U$ and
$$F(\hat x) \le \frac{\theta}{1 + \theta}F\Big(\frac{\hat x - x}{\theta} + \hat x\Big) + \frac{1}{1 + \theta}F(x) \le \frac{\theta M}{1 + \theta} + \frac{1}{1 + \theta}F(x),$$
which implies
$$-\theta(M - m) \le -\theta(M - F(\hat x)) \le F(x) - F(\hat x)$$
and therefore
$$|F(x) - F(\hat x)| \le \frac{2}{\delta}(M - m)|x - \hat x| \quad \text{for } |x - \hat x| \le \frac{\delta}{M - m}.$$
Since $C$ is bounded and convex, Lipschitz continuity for all $x, \hat x \in C$ follows.
4.2.1 Conjugate and biconjugate functionals

Definition 4.8. The functional $F^* : X^* \to [-\infty, \infty]$ defined by
$$F^*(x^*) = \sup\{\langle x^*, x\rangle_{X^*, X} - F(x) : x \in X\}$$
is called the conjugate of $F$.

If $F$ is bounded below by an affine functional $\langle x^*, \cdot\rangle - c$, then
$$F^*(x^*) = \sup_{x \in X}\{\langle x^*, x\rangle - F(x)\} \le \sup_{x \in X}\{\langle x^*, x\rangle - \langle x^*, x\rangle + c\} = c,$$
and hence $F^*$ is not identically $\infty$. Note also that $D(F) = \emptyset$ implies that $F^* \equiv -\infty$ and conversely, if $F^*(x^*) = -\infty$ for some $x^* \in X^*$, then $D(F)$ is empty.

Example 4.9. If $F(x) = \frac{1}{p}|x|^p$, $x \in \mathbb{R}$, then $F^*(x^*) = \frac{1}{q}|x^*|^q$ for $1 < p < \infty$ and $\frac{1}{p} + \frac{1}{q} = 1$.

Example 4.10. If $F(x)$ is the indicator function of the closed unit ball of $X$, i.e., $F(x) = 0$ for $|x| \le 1$ and $F(x) = \infty$ otherwise, then $F^*(x^*) = |x^*|$. In fact, $F^*(x^*) = \sup\{\langle x^*, x\rangle : |x| \le 1\} = |x^*|$.

If $F$ is a proper, l.s.c., convex functional on $X$, then by Theorem 4.5
$$F(x_0) = \sup\{\langle x^*, x_0\rangle - c : x^* \in X^*,\ c \in \mathbb{R},\ \langle x^*, x\rangle - c \le F(x) \text{ for all } x \in X\}$$
for every $x_0 \in X$. For $x^* \in X^*$ define
$$c(x^*) = \inf\{c \in \mathbb{R} : c \ge \langle x^*, x\rangle - F(x) \text{ for all } x \in X\}$$
and observe that
$$F(x_0) = \sup_{x^* \in X^*}\{\langle x^*, x_0\rangle - c(x^*)\}.$$
But $c(x^*) = \sup_{x \in X}\{\langle x^*, x\rangle - F(x)\} = F^*(x^*)$ and hence
$$F(x_0) = \sup_{x^* \in X^*}\{\langle x^*, x_0\rangle - F^*(x^*)\}.$$
This suggests the following definition.

Definition 4.11. For $F : X \to (-\infty, \infty]$ the biconjugate functional $F^{**} : X \to (-\infty, \infty]$ is defined by
$$F^{**}(x) = \sup_{x^* \in X^*}\{\langle x^*, x\rangle - F^*(x^*)\}.$$
Above we proved the following result.

Theorem 4.12. If $F$ is a proper, l.s.c., convex functional, then $F = F^{**}$.

It is simple to argue that $(F^*)^* = F^{**}$ if $X$ is reflexive. Next we prove some important results on conjugate functionals.

Theorem 4.13. For every functional $F : X \to (-\infty, \infty]$ the conjugate $F^*$ is convex and l.s.c. on $X^*$.

Proof. If $F$ is identically $\infty$, then $F^*$ is identically $-\infty$. Otherwise $D(F)$ is nonempty and $F^*(x^*) > -\infty$ for all $x^* \in X^*$, and
$$F^*(x^*) = \sup\{\langle x^*, x\rangle - F(x) : x \in D(F)\}.$$
Thus $F^*$ is the pointwise supremum of the family of continuous affine functionals $x^* \to \langle x^*, x\rangle - F(x)$ for $x \in D(F)$. This implies that $F^*$ is convex. Moreover $\{x^* \in X^* : \langle x^*, x\rangle - F(x) \le c\}$ is closed for each $c \in \mathbb{R}$ and $x \in D(F)$. Consequently
$$\{x^* \in X^* : F^*(x^*) \le c\} = \bigcap_{x \in D(F)}\{x^* \in X^* : \langle x^*, x\rangle - F(x) \le c\}$$
is closed for each $c \in \mathbb{R}$. Hence $F^*$ is l.s.c. by Lemma 4.3.

Theorem 4.14. For every $F : X \to (-\infty, \infty]$ the biconjugate $F^{**}$ is convex, l.s.c., and $F^{**} \le F$. Moreover for each $\bar x \in X$
$$F^{**}(\bar x) = \sup\{\langle x^*, \bar x\rangle - c : x^* \in X^*,\ c \in \mathbb{R},\ \langle x^*, x\rangle - c \le F(x) \text{ for all } x \in X\}. \tag{4.2.2}$$
Proof. If there is no continuous affine functional which is everywhere less than $F$, then $F^* \equiv \infty$ and hence $F^{**} \equiv -\infty$. In fact, if there exist $c \in \mathbb{R}$ and $x^* \in X^*$ with $c \ge F^*(x^*)$, then $\langle x^*, \cdot\rangle - c$ is everywhere less than $F$, which is impossible. The claims readily follow in this case.
Otherwise, assume that there exists a continuous affine functional everywhere less than $F$. Then $D(F^*)$ is nonempty and $F^{**}(x) > -\infty$ for all $x \in X$. If $F^*(x^*) = -\infty$ for some $x^* \in X^*$, then $F^{**} \equiv \infty$, $F \equiv \infty$, and the claims follow. In the remaining case $F^*(x^*)$ is finite for all $x^*$ and we have
$$F^{**}(x) = \sup\{\langle x^*, x\rangle - F^*(x^*) : x^* \in X^*,\ F^*(x^*) \text{ finite}\}.$$
For every $x^* \in D(F^*)$ we have $\langle x^*, x\rangle - F^*(x^*) \le F(x)$ for all $x \in X$, hence $F^{**} \le F$ and
$$F^{**}(\bar x) \le \sup\{\langle x^*, \bar x\rangle - c : x^* \in X^*,\ c \in \mathbb{R},\ \langle x^*, x\rangle - c \le F(x) \text{ for all } x \in X\}.$$
Let $x^*$ and $c$ be such that $\langle x^*, x\rangle - c \le F(x)$ for all $x \in X$. Then $\langle x^*, x\rangle - c \le \langle x^*, x\rangle - F^*(x^*)$ for all $x \in X$. Therefore
$$\sup\{\langle x^*, \bar x\rangle - c : x^* \in X^*,\ c \in \mathbb{R},\ \langle x^*, x\rangle - c \le F(x) \text{ for all } x \in X\} \le F^{**}(\bar x),$$
and (4.2.2) follows. Moreover $F^{**}$ is the pointwise supremum of a family of continuous affine functionals, and it follows as in the proof of Theorem 4.13 that $F^{**}$ is convex and l.s.c.

Theorem 4.15. If $F : X \to (-\infty, \infty]$ is a convex functional which is finite and l.s.c. at $x$, then $F(x) = F^{**}(x)$.

For the proof of this result we refer the reader to [EkTu, p. 104], for example.

Theorem 4.16. For every $F : X \to (-\infty, \infty]$, we have $F^* = F^{***}$.

Proof. Since $F^{**} \le F$ due to Theorem 4.14, $F^* \le F^{***}$ by the definition of conjugate functions. The definition of $F^{**}$ implies that
$$\langle x^*, x\rangle - F^{**}(x) \le F^*(x^*) \tag{4.2.3}$$
if $F^{**}(x)$ and $F^*(x^*)$ are finite. If $F^*(x^*) = \infty$ or $F^{**}(x) = \infty$, inequality (4.2.3) holds as well. If $F^*(x^*) = -\infty$, we have that $F$ and $F^{**}$ are identically $\infty$. If $F^{**}(x) = -\infty$, then $F^*$ is identically $\infty$, and (4.2.3) holds for all $(x, x^*) \in X \times X^*$. Thus,
$$F^{***}(x^*) = \sup_{x \in X}\{\langle x^*, x\rangle - F^{**}(x)\} \le F^*(x^*)$$
for all $x^* \in X^*$. Hence $F^{***} = F^*$.

Theorem 4.17. Let $F : X \to (-\infty, \infty]$ and assume that $\partial F(x) \ne \emptyset$ for some $x \in X$. Then $F(x) = F^{**}(x)$.

Proof. Since $\partial F(x) \ne \emptyset$ there exists a continuous affine functional $\ell \le F$ with $\ell(x) = F(x)$. Due to Theorem 4.14 we have $\ell \le F^{**} \le F$. It follows that $\ell(x) = F^{**}(x) = F(x)$, as desired.
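The conjugates in Examples 4.9 and 4.10 and the identity $F = F^{**}$ of Theorem 4.12 can be checked numerically in one dimension. The following sketch is only an illustration: it replaces the supremum over $X = \mathbb{R}$ by a maximum over a finite grid on $[-4, 4]$ (an assumption that is adequate only when the maximizer lies inside the grid), and all names (`legendre_transform`, `GRID`, `power_functional`, `unit_ball_indicator`) are hypothetical.

```python
import math


def legendre_transform(values, grid, s):
    """Approximate sup_x { s*x - F(x) } by a maximum over the grid points.

    values[i] holds F(grid[i]); math.inf marks points outside D(F).
    """
    return max(s * x - v for x, v in zip(grid, values))


# Uniform grid on [-4, 4] with spacing 0.01 (a choice made for this sketch).
GRID = [k / 100.0 for k in range(-400, 401)]


def power_functional(p):
    """Return F(x) = |x|**p / p as in Example 4.9."""
    return lambda x: abs(x) ** p / p


def unit_ball_indicator(x):
    """Indicator function of the closed unit ball of R as in Example 4.10."""
    return 0.0 if abs(x) <= 1.0 else math.inf


def conjugate_on_grid(values, grid):
    """Tabulate the discrete conjugate at every grid point."""
    return [legendre_transform(values, grid, s) for s in grid]
```

Applying `conjugate_on_grid` to the tabulated values of `power_functional(3.0)` and transforming once more with `legendre_transform` gives a discrete biconjugate, which can be compared with $F$ at points whose maximizers lie well inside the grid.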
4.2.2 Subdifferential
In this short section we summarize some important results on subdifferentials of functionals $F : X \to (-\infty, \infty]$. While $F$ is not assumed to be convex, the applications that we have in mind are to convex functionals.

Definition 4.18. Let $F : X \to (-\infty, \infty]$. The subdifferential of $F$ at $x$ is the (possibly empty) set
$$\partial F(x) = \{x^* \in X^* : F(y) - F(x) \ge \langle x^*, y - x\rangle \text{ for all } y \in X\}.$$
If $\partial F(x)$ is nonempty, then $F$ is called subdifferentiable at $x$ and the set of all points where $F$ is subdifferentiable is denoted by $D(\partial F)$.

As a first observation, we note that $x_0 = \operatorname{argmin}_{x \in X} F(x)$ if and only if $0 \in \partial F(x_0)$. In fact, these two statements are equivalent to
$$F(x) \ge F(x_0) + (0, x - x_0) \quad \text{for all } x \in X.$$
Example 4.19. Let $F$ be Gâteaux differentiable at $x$, i.e., there exists $w^* \in X^*$ such that
$$\lim_{t \to 0^+} \frac{F(x + tv) - F(x)}{t} = \langle w^*, v\rangle \quad \text{for all } v \in X,$$
and $w^* \in X^*$ is called the Gâteaux derivative of $F$ at $x$. It is denoted by $F'(x)$. If in addition $F$ is convex, then $F$ is subdifferentiable at $x$ and $\partial F(x) = \{F'(x)\}$. Indeed, for $v = y - x$
$$\frac{F(x + t(y - x)) - F(x)}{t} \le F(y) - F(x), \quad \text{where } 0 < t < 1.$$
As $t \to 0^+$ we obtain
$$\langle F'(x), y - x\rangle \le F(y) - F(x) \quad \text{for all } y \in X,$$
and thus $F'(x) \in \partial F(x)$. On the other hand, if $w^* \in \partial F(x)$, we find for $y \in X$ and $t > 0$
$$\frac{F(x + ty) - F(x)}{t} \ge \langle w^*, y\rangle.$$
Taking the limit $t \to 0^+$, we obtain
$$\langle F'(x) - w^*, y\rangle \ge 0 \quad \text{for all } y \in X.$$
This implies that $w^* = F'(x)$.

Example 4.20. For $F(x) = \frac{1}{2}|x|^2$, $x \in X$, we will show that $\partial F(x) = \mathcal{F}(x)$, where $\mathcal{F} : X \to X^*$ denotes the duality mapping. In fact, if $x^* \in \mathcal{F}(x)$, then
$$\langle x^*, x - y\rangle = |x|^2 - \langle x^*, y\rangle \ge \frac{1}{2}\big(|x|^2 - |y|^2\big) \quad \text{for all } y \in X.$$
Thus $x^* \in \partial F(x)$. Conversely, if $x^* \in \partial F(x)$, then
$$\frac{1}{2}\big(|y|^2 - |x|^2\big) \ge \langle x^*, y - x\rangle \quad \text{for all } y \in X. \tag{4.2.4}$$
We let $y = tx$, $0 < t < 1$, and obtain
$$\frac{1 + t}{2}|x|^2 \le \langle x^*, x\rangle$$
and thus $|x|^2 \le \langle x^*, x\rangle$. Similarly, if $t > 1$, then $|x|^2 \ge \langle x^*, x\rangle$ and therefore $|x|^2 = \langle x^*, x\rangle$ and $|x^*| \ge |x|$. On the other hand, setting $y = x + tu$, $t > 0$, in (4.2.4), we have
$$t\langle x^*, u\rangle \le \frac{1}{2}\big(|x + tu|^2 - |x|^2\big) \le t|u||x| + \frac{t^2}{2}|u|^2,$$
which implies $\langle x^*, u\rangle \le |u||x|$. Hence $|x^*| \le |x|$ and we obtain $|x|^2 = |x^*|^2 = \langle x^*, x\rangle$.

Example 4.21. Let $K$ be a closed convex subset of $X$ and let $\psi_K$ be the indicator function of $K$, i.e.,
$$\psi_K(x) = \begin{cases} 0 & \text{if } x \in K, \\ \infty & \text{otherwise.} \end{cases}$$
Obviously, $\psi_K$ is convex and l.s.c. on $X$. By definition we have for $x \in K$
$$\partial\psi_K(x) = \{x^* \in X^* : \langle x^*, y - x\rangle \le 0 \text{ for all } y \in K\}$$
and $\partial\psi_K(x) = \emptyset$ if $x \notin K$. Thus $D(\psi_K) = D(\partial\psi_K) = K$ and $\partial\psi_K(x) = \{0\}$ for each interior point of $K$. Moreover, for $x \in K$, $\partial\psi_K(x)$ coincides with the definition of the normal cone to $K$ at $x$.

Theorem 4.22. Let $F : X \to (-\infty, \infty]$.
(1) We have $x^* \in \partial F(\bar x)$ if and only if $F(\bar x) + F^*(x^*) = \langle x^*, \bar x\rangle$.
(2) Assume that $X$ is reflexive. Then $x^* \in \partial F(\bar x)$ implies that $\bar x \in \partial F^*(x^*)$. If moreover $F$ is convex and l.s.c., then $x^* \in \partial F(\bar x)$ if and only if $\bar x \in \partial F^*(x^*)$.

Proof. (1) Note that $x^* \in \partial F(\bar x)$ if and only if
$$\langle x^*, x\rangle - F(x) \le \langle x^*, \bar x\rangle - F(\bar x) \tag{4.2.5}$$
for all $x \in X$. By the definition of $F^*$ this implies $F^*(x^*) = \langle x^*, \bar x\rangle - F(\bar x)$. Conversely, if $F^*(x^*) + F(\bar x) = \langle x^*, \bar x\rangle$, then (4.2.5) holds for all $x \in X$.
(2) Since $F^{**} \le F$ by Theorem 4.14, it follows from (1) that
$$F^{**}(\bar x) \le \langle x^*, \bar x\rangle - F^*(x^*).$$
i
i i
i
i
i
i
4.2. Convex analysis
ItoKunisc 2008/6/12 page 97 i
97
By the definition of F ∗∗ , F ∗∗ (x) ¯ ≥ x ∗ , x
¯ − F ∗ (x ∗ ). Hence F ∗ (x ∗ ) + F ∗∗ (x) ¯ = x ∗ , x . ¯ Since X is reflexive, F ∗∗ = (F ∗ )∗ . Applying (1) to F ∗ we have x¯ ∈ ∂F ∗ (x ∗ ). If in addition F is convex and l.s.c., it follows from Theorem 4.12 that F ∗∗ = F . Thus if x¯ ∈ ∂F ∗ (x ∗ ), then F (x) ¯ + F ∗ (x ∗ ) = F ∗ (x ∗ ) + F ∗∗ (x) ¯ = x ∗ , x
¯ by applying (1) to F ∗ . Therefore x ∗ ∈ ∂F (x) ¯ again by (1). Proposition 4.23. For F : X → (−∞, ∞] the set ∂F (x) is closed and convex for every x ∈ X. Proof. If F (x) = ∞, then ∂F (x) = ∅. Henceforth let F (x) < ∞. For every x ∗ ∈ X ∗ we have F ∗ (x ∗ ) ≥ x ∗ , x − F (x) and hence by Theorem 4.22 ∂F (x) = {x ∗ ∈ X∗ : F ∗ (x ∗ ) − x ∗ , x ≤ −F (x)}. By Theorem 4.13 the functional x ∗ → F ∗ (x ∗ ) − x ∗ , x is convex and l.s.c. The claim follows from Lemma 4.3. Theorem 4.24. If the convex function F is continuous at x, ¯ then ∂F (x) ¯ is not empty. Proof. Since F is continuous at x, there exists for every > 0 an open neighborhood U of x¯ such that F (x) ≤ F (x) ¯ + , x ∈ U . Then U × (F (x) ¯ + , ∞) is an open set in X × R, contained in epi F . Hence (epi F )o , the relative interior of epi F , is nonempty. Since F is convex, epi F is convex and (epi F )o is convex. Note that (x, ¯ F (x)) ¯ is a boundary point of epi F . Hence by the Hahn–Banach separation theorem, there exists a closed hyperplane S = {(x, a) ∈ X × R : x ∗ , x + α a = β} for nontrivial (x ∗ , α) ∈ X ∗ × R and β ∈ R such that x ∗ , x + α a > β
for all (x, a) ∈ (epi F )o ,
x ∗ , x
¯ + α F (x) ¯ = β.
(4.2.6)
Since (epi F )o = epi F , every neighborhood of (x, a) ∈ epi F contains an element of (epi ϕ)o . Suppose x ∗ , x + α a < β. Then {(x, ˜ a) ˜ ∈ X × R : x ∗ , x
˜ + α a˜ < β}
i
i i
i
i
i
i
ItoKunisc 2008/6/12 page 98 i
98 Chapter 4. Augmented Lagrangian Methods for Nonsmooth, Convex Optimization is a neighborhood of (x, a) and contains an element of (epi ϕ)o , which contradicts (4.2.6). Therefore x ∗ , x + α a ≥ β
for all (x, a) ∈ epi F.
(4.2.7)
Suppose α = 0. For any u ∈ U there is an a ∈ R such that F (u) ≤ a. Then from (4.2.7) x ∗ , u = x ∗ , u + α a ≥ β and thus
x ∗ , u − x
¯ ≥0
for all u ∈ U .
Choose δ > 0 such that |u − x| ¯ ≤ δ implies u ∈ U . For any nonzero element x ∈ X let δ t = |x| . Then |(tx + x) ¯ − x| ¯ = |tx| = δ so that tx + x¯ ∈ U . Hence ¯ − x /t ¯ ≥ 0. x ∗ , x = x ∗ , (tx + x) Similarly, −t x + x¯ ∈ U and x ∗ , x = x ∗ , (−tx + x) ¯ − x /(−t) ¯ ≤ 0. Thus, x ∗ , x , x ∗ = 0, which is a contradiction. Therefore α is nonzero. From (4.2.6), (4.2.7) we have for a > F (x) ¯ that α(a − F (x)) ¯ > 0 and hence α > 0. Employing (4.2.6), (4.2.7) again implies that ∗ x − , x − x¯ + F (x) ¯ ≤ F (x) α ∗
for all x ∈ X and therefore − xα ∈ ∂F (x). ¯
4.3 Fenchel duality theory

In this section we discuss some elements of duality theory. We call
\[ \inf_{x \in X} F(x) \tag{$P$} \]
the primal problem, where $X$ is a real Banach space and $F : X \to (-\infty, \infty]$ is a proper l.s.c. convex function. We have the following result for the existence of a minimizer.

Theorem 4.25. Let $X$ be reflexive and let $F$ be an l.s.c. proper convex functional defined on $X$ satisfying
\[ \lim_{|x| \to \infty} F(x) = \infty. \tag{4.3.1} \]
Then there exists an $\bar{x} \in X$ such that $F(\bar{x}) = \inf\{F(x) : x \in X\}$.

Proof. Let $\eta = \inf\{F(x) : x \in X\}$ and let $\{x_n\}$ be a minimizing sequence such that $\lim_{n \to \infty} F(x_n) = \eta$. Condition (4.3.1) implies that $\{x_n\}$ is bounded in $X$. Since $X$ is reflexive, there exists a subsequence that converges weakly to some $\bar{x}$ in $X$ and it follows from Lemma 4.4 that $F(\bar{x}) = \eta$.

We embed $(P)$ into a family of perturbed problems
\[ \inf_{x \in X} \Phi(x, y), \tag{$P_y$} \]
where $y \in Y$ is an embedding variable, $Y$ is a Banach space, and $\Phi : X \times Y \to (-\infty, \infty]$ is a proper l.s.c. convex function with $\Phi(x, 0) = F(x)$. Thus $(P_0) = (P)$. For example, in terms of (4.1.1) we let
\[ F(x) = f(x) + \varphi(\Lambda x) \tag{4.3.2} \]
and with $Y = H$
\[ \Phi(x, y) = f(x) + \varphi(\Lambda x + y). \tag{4.3.3} \]

Definition 4.26. The dual problem of $(P)$ with respect to $\Phi$ is defined by
\[ \sup_{y^* \in Y^*} \big(-\Phi^*(0, y^*)\big). \tag{$P^*$} \]
The value function of $(P_y)$ is defined by
\[ h(y) = \inf_{x \in X} \Phi(x, y), \quad y \in Y. \]

In this section we analyze the relationship between the primal problem $(P)$ and its dual $(P^*)$. Throughout we assume that $h(y) > -\infty$ for all $y \in Y$.

Theorem 4.27. $\sup (P^*) \le \inf (P)$.

Proof. For any $(x, y) \in X \times Y$ and $(x^*, y^*) \in X^* \times Y^*$ we have
\[ \langle x^*, x \rangle + \langle y^*, y \rangle - \Phi(x, y) \le \Phi^*(x^*, y^*). \]
Thus
\[ 0 = \langle 0, x \rangle + \langle y^*, 0 \rangle \le F(x) + \Phi^*(0, y^*) \]
for all $x \in X$ and $y^* \in Y^*$. Therefore
\[ \sup_{y^* \in Y^*} \big(-\Phi^*(0, y^*)\big) = \sup (P^*) \le \inf (P) = \inf_{x \in X} F(x). \]
Lemma 4.28. $h$ is convex.

Proof. The proof is established by contradiction. Suppose there exist $y_1, y_2 \in Y$ and $\theta \in (0, 1)$ such that
\[ \theta\, h(y_1) + (1 - \theta)\, h(y_2) < h(\theta y_1 + (1 - \theta) y_2). \]
Then there exist $c$ and $\varepsilon > 0$ such that
\[ \theta\, h(y_1) + (1 - \theta)\, h(y_2) < c - \varepsilon < c < h(\theta y_1 + (1 - \theta) y_2). \]
Set $a_1 = h(y_1) + \frac{\varepsilon}{\theta}$ and
\[ a_2 = \frac{c - \theta a_1}{1 - \theta} = \frac{c - \varepsilon - \theta\, h(y_1)}{1 - \theta} > h(y_2). \]
By definition of $h$ there exist $x_1, x_2 \in X$ such that
\[ h(y_1) \le \Phi(x_1, y_1) \le a_1 \quad \text{and} \quad h(y_2) \le \Phi(x_2, y_2) \le a_2. \]
Thus
\[ h(\theta y_1 + (1 - \theta) y_2) \le \Phi(\theta x_1 + (1 - \theta) x_2, \theta y_1 + (1 - \theta) y_2) \le \theta\, \Phi(x_1, y_1) + (1 - \theta)\, \Phi(x_2, y_2) \le \theta a_1 + (1 - \theta) a_2 = c, \]
which is a contradiction. Hence $h$ is convex.

Lemma 4.29. For all $y^* \in Y^*$, $h^*(y^*) = \Phi^*(0, y^*)$.

Proof.
\begin{align*}
h^*(y^*) &= \sup_{y \in Y} \big(\langle y^*, y \rangle - h(y)\big) = \sup_{y \in Y} \big(\langle y^*, y \rangle - \inf_{x \in X} \Phi(x, y)\big) \\
&= \sup_{y \in Y} \sup_{x \in X} \big(\langle y^*, y \rangle - \Phi(x, y)\big) = \sup_{y \in Y} \sup_{x \in X} \big(\langle 0, x \rangle + \langle y^*, y \rangle - \Phi(x, y)\big) \\
&= \sup_{(x, y) \in X \times Y} \big(\langle (0, y^*), (x, y) \rangle - \Phi(x, y)\big) = \Phi^*(0, y^*).
\end{align*}
Theorem 4.30. If $h$ is l.s.c. at 0, then $\inf (P) = \sup (P^*)$.

Proof. Since $F$ is proper, $h(0) = \inf_{x \in X} F(x) < \infty$. Since $h$ is convex by Lemma 4.28, it follows from Theorem 4.15 that $h(0) = h^{**}(0)$. Thus by Lemma 4.29
\[ \sup (P^*) = \sup_{y^* \in Y^*} \big(-\Phi^*(0, y^*)\big) = \sup_{y^* \in Y^*} \big(\langle y^*, 0 \rangle - h^*(y^*)\big) = h^{**}(0) = h(0) = \inf (P). \]

Theorem 4.31. If $h$ is subdifferentiable at 0, then $\inf (P) = \sup (P^*)$ and $\partial h(0)$ is the set of solutions of $(P^*)$.

Proof. By Lemma 4.29 we have that $\bar{y}^*$ solves $(P^*)$ if and only if
\[ -h^*(\bar{y}^*) = -\Phi^*(0, \bar{y}^*) = \sup_{y^* \in Y^*} \big(-\Phi^*(0, y^*)\big) = \sup_{y^* \in Y^*} \big(\langle y^*, 0 \rangle - h^*(y^*)\big) = h^{**}(0). \]
By Theorem 4.22, $h^{**}(0) + h^{***}(\bar{y}^*) = \langle \bar{y}^*, 0 \rangle = 0$ if and only if $\bar{y}^* \in \partial h^{**}(0)$. Since $h^{***} = h^*$ by Theorem 4.16, we have $\bar{y}^* \in \partial h^{**}(0)$ if and only if $-h^*(\bar{y}^*) = h^{**}(0)$. Consequently $\bar{y}^*$ solves $(P^*)$ if and only if $\bar{y}^* \in \partial h^{**}(0)$. Since $\partial h(0)$ is not empty, $\partial h(0) = \partial h^{**}(0)$ by Theorem 4.17. Therefore $\partial h(0)$ is the set of all solutions of $(P^*)$ and $(P^*)$ has at least one solution. Let $y^* \in \partial h(0)$. Then $\langle y^*, y \rangle + h(0) \le h(y)$ for all $y \in Y$. If $\{y_n\}$ is a sequence in $Y$ such that $y_n \to 0$, then
\[ \liminf_{n \to \infty} h(y_n) \ge \lim_{n \to \infty} \langle y^*, y_n \rangle + h(0) = h(0) \]
and $h$ is l.s.c. at 0. By Theorem 4.30, $\inf (P) = \sup (P^*)$.

Corollary 4.32. If there exists an $\bar{x} \in X$ such that $\Phi(\bar{x}, \cdot)$ is finite and continuous at 0, then $h$ is continuous on an open neighborhood $U$ of 0 and $h = h^{**}$. Moreover, $\inf (P) = \sup (P^*)$ and $\partial h(0)$ is the set of solutions of $(P^*)$.

Proof. We first show that $h$ is continuous. Clearly, $\Phi(\bar{x}, \cdot)$ is bounded above on an open neighborhood $U$ of 0. Since $h(y) \le \Phi(\bar{x}, y)$ for all $y \in Y$, $h$ is bounded above on $U$. Since $h$ is convex by Lemma 4.28, it is continuous by Theorem 4.6. Hence $h = h^{**}$ by Theorem 4.15. Moreover $h$ is subdifferentiable at 0 by Theorem 4.24. The conclusion then follows from Theorem 4.31.

Example 4.33. Consider the case (4.3.2)–(4.3.3), i.e.,
\[ \Phi(x, y) = f(x) + \varphi(\Lambda x + y), \]
where $f : X \to (-\infty, \infty]$ and $\varphi : Y \to (-\infty, \infty]$ are l.s.c. and convex and $\Lambda : X \to Y$ is a continuous linear operator. Let us calculate the conjugate of $\Phi$:
\begin{align*}
\Phi^*(x^*, y^*) &= \sup_{x \in X} \sup_{y \in Y} \{\langle x^*, x \rangle + \langle y^*, y \rangle - \Phi(x, y)\} \\
&= \sup_{x \in X} \Big\{\langle x^*, x \rangle - f(x) + \sup_{y \in Y} [\langle y^*, y \rangle - \varphi(\Lambda x + y)]\Big\},
\end{align*}
where
\begin{align*}
\sup_{y \in Y} [\langle y^*, y \rangle - \varphi(\Lambda x + y)] &= \sup_{y \in Y} [\langle y^*, \Lambda x + y \rangle - \varphi(\Lambda x + y) - \langle y^*, \Lambda x \rangle] \\
&= \sup_{z \in Y} [\langle y^*, z \rangle - \varphi(z)] - \langle y^*, \Lambda x \rangle = \varphi^*(y^*) - \langle y^*, \Lambda x \rangle.
\end{align*}
Thus
\begin{align*}
\Phi^*(x^*, y^*) &= \sup_{x \in X} \{\langle x^*, x \rangle - \langle y^*, \Lambda x \rangle - f(x) + \varphi^*(y^*)\} \\
&= \sup_{x \in X} \{\langle x^* - \Lambda^* y^*, x \rangle - f(x) + \varphi^*(y^*)\} = f^*(x^* - \Lambda^* y^*) + \varphi^*(y^*).
\end{align*}
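The conjugate formula of Example 4.33 can be checked numerically on a small instance. The following Python sketch is an illustration only and is not taken from the text: it assumes the one-dimensional choice $X = Y = \mathbb{R}$, $\Lambda = 1$, $f(x) = \frac{1}{2}(x - a)^2$ and $\varphi(z) = |z|$, for which $f^*(s) = \frac{1}{2}s^2 + a s$ and $\varphi^*$ is the indicator function of $[-1, 1]$, so that $-\Phi^*(0, y^*) = a y^* - \frac{1}{2}(y^*)^2$ for $|y^*| \le 1$. The function names and the grid are hypothetical helpers. Primal and dual values are approximated by grid search and compared.

```python
import numpy as np

def primal_value(a, grid):
    """Approximate inf_x f(x) + phi(x) for f(x) = 0.5*(x - a)**2, phi = |.| on a grid."""
    return np.min(0.5 * (grid - a) ** 2 + np.abs(grid))

def dual_value(a, grid):
    """Approximate sup_{y*} -Phi*(0, y*) = sup_{|y*| <= 1} (a*y* - 0.5*y***2) on a grid."""
    y = grid[np.abs(grid) <= 1.0]          # phi* is +inf outside [-1, 1]
    return np.max(a * y - 0.5 * y ** 2)

grid = np.linspace(-5.0, 5.0, 100001)
for a in (2.0, 0.3, -1.5):
    p, d = primal_value(a, grid), dual_value(a, grid)
    print(f"a={a:+.1f}  inf(P) ~ {p:.6f}  sup(P*) ~ {d:.6f}")
```

In this instance $\Phi(\bar{x}, \cdot)$ is finite and continuous at 0, so Corollary 4.32 predicts that the two printed values agree up to the grid resolution, while Theorem 4.27 alone only guarantees that the dual value does not exceed the primal one.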
Theorem 4.34. For any $\bar{x} \in X$ and $\bar{y}^* \in Y^*$, the following statements are equivalent.
(1) $\bar{x}$ solves $(P)$, $\bar{y}^*$ solves $(P^*)$, and $\min (P) = \max (P^*)$.
(2) $\Phi(\bar{x}, 0) + \Phi^*(0, \bar{y}^*) = 0$.
(3) $(0, \bar{y}^*) \in \partial\Phi(\bar{x}, 0)$.

Proof. Clearly (1) implies (2). If (2) holds, then
\[ \Phi(\bar{x}, 0) = F(\bar{x}) \ge \inf (P) \ge \sup (P^*) \ge -\Phi^*(0, \bar{y}^*) \]
by Theorem 4.27. Thus
\[ \Phi(\bar{x}, 0) = \min (P) = \max (P^*) = -\Phi^*(0, \bar{y}^*). \]
Therefore (2) implies (1). Since $\langle (0, \bar{y}^*), (\bar{x}, 0) \rangle = 0$, equivalence of (2) and (3) follows by Theorem 4.22.

Any solution $y^*$ of $(P^*)$ is called a Lagrange multiplier associated with $\Phi$. For Example 4.33 the optimality condition implies
\begin{align*}
0 &= \Phi(\bar{x}, 0) + \Phi^*(0, \bar{y}^*) = f(\bar{x}) + f^*(-\Lambda^* \bar{y}^*) + \varphi(\Lambda \bar{x}) + \varphi^*(\bar{y}^*) \\
&= [f(\bar{x}) + f^*(-\Lambda^* \bar{y}^*) - \langle -\Lambda^* \bar{y}^*, \bar{x} \rangle] + [\varphi(\Lambda \bar{x}) + \varphi^*(\bar{y}^*) - \langle \bar{y}^*, \Lambda \bar{x} \rangle]. \tag{4.3.4}
\end{align*}
Since each expression of (4.3.4) in square brackets is nonnegative, it follows that
\[ f(\bar{x}) + f^*(-\Lambda^* \bar{y}^*) - \langle -\Lambda^* \bar{y}^*, \bar{x} \rangle = 0, \qquad \varphi(\Lambda \bar{x}) + \varphi^*(\bar{y}^*) - \langle \bar{y}^*, \Lambda \bar{x} \rangle = 0. \]
By Theorem 4.22
\[ -\Lambda^* \bar{y}^* \in \partial f(\bar{x}), \qquad \bar{y}^* \in \partial\varphi(\Lambda \bar{x}). \tag{4.3.5} \]

The functional $L : X \times Y^* \to (-\infty, \infty]$ defined by
\[ -L(x, y^*) = \sup_{y \in Y} \{\langle y^*, y \rangle - \Phi(x, y)\} \tag{4.3.6} \]
is called the Lagrangian. Note that
\begin{align*}
\Phi^*(x^*, y^*) &= \sup_{x \in X,\, y \in Y} \{\langle x^*, x \rangle + \langle y^*, y \rangle - \Phi(x, y)\} \\
&= \sup_{x \in X} \Big\{\langle x^*, x \rangle + \sup_{y \in Y} \{\langle y^*, y \rangle - \Phi(x, y)\}\Big\} = \sup_{x \in X} \big(\langle x^*, x \rangle - L(x, y^*)\big).
\end{align*}
Thus
\[ -\Phi^*(0, y^*) = \inf_{x \in X} L(x, y^*) \tag{4.3.7} \]
and therefore the dual problem $(P^*)$ is equivalent to
\[ \sup_{y^* \in Y^*} \inf_{x \in X} L(x, y^*). \]
If $\Phi$ is a convex l.s.c. function that is finite at $(x, y)$, then for the biconjugate of $\Phi_x : y \mapsto \Phi(x, y)$ in $y$ we have $\Phi_x^{**}(y) = \Phi(x, y)$ and
\[ \Phi(x, y) = \Phi_x^{**}(y) = \sup_{y^* \in Y^*} \{\langle y^*, y \rangle - \Phi_x^*(y^*)\} = \sup_{y^* \in Y^*} \{\langle y^*, y \rangle + L(x, y^*)\}. \]
Hence
\[ \Phi(x, 0) = \sup_{y^* \in Y^*} L(x, y^*) \tag{4.3.8} \]
and the primal problem $(P)$ is equivalently written as
\[ \inf_{x \in X} \sup_{y^* \in Y^*} L(x, y^*). \]
Thus by means of the Lagrangian $L$, the problems $(P)$ and $(P^*)$ can be formulated as min-max problems, and by Theorem 4.27
\[ \sup_{y^*} \inf_{x} L(x, y^*) \le \inf_{x} \sup_{y^*} L(x, y^*). \]
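The min-max inequality above can be observed on a discretized Lagrangian. The sketch below is an illustration under assumptions that are not part of the text: it uses the one-dimensional instance $X = Y = \mathbb{R}$, $\Lambda = 1$, $f(x) = \frac{1}{2}(x - a)^2$, $\varphi = |\cdot|$, for which definition (4.3.6) gives $L(x, y^*) = \frac{1}{2}(x - a)^2 + y^* x$ when $|y^*| \le 1$ and $L(x, y^*) = -\infty$ otherwise. The grids and the name `lagrangian` are hypothetical.

```python
import numpy as np

def lagrangian(x, y, a):
    """L(x, y*) = 0.5*(x - a)**2 + y* * x, valid for |y*| <= 1 (assumed 1-D instance)."""
    return 0.5 * (x - a) ** 2 + y * x

a = 2.0
xs = np.linspace(-4.0, 4.0, 801)   # primal grid
ys = np.linspace(-1.0, 1.0, 201)   # dual grid; L = -inf outside [-1, 1], so it can be omitted
L = lagrangian(xs[:, None], ys[None, :], a)   # L[i, j] = L(xs[i], ys[j])

sup_inf = L.min(axis=0).max()   # sup over y* of inf over x (dual problem)
inf_sup = L.max(axis=1).min()   # inf over x of sup over y* (primal problem)
print(sup_inf, inf_sup)
```

For any finite table of values the first number cannot exceed the second; here the two coincide up to grid resolution, which is the situation of no duality gap.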
Theorem 4.35 (Saddle Point). Assume that $\Phi$ is a convex l.s.c. function that is finite at $(\bar{x}, \bar{y}^*)$. Then the following are equivalent.
(1) $(\bar{x}, \bar{y}^*) \in X \times Y^*$ is a saddle point of $L$, i.e.,
\[ L(\bar{x}, y^*) \le L(\bar{x}, \bar{y}^*) \le L(x, \bar{y}^*) \quad \text{for all } x \in X,\ y^* \in Y^*. \tag{4.3.9} \]
(2) $\bar{x}$ solves $(P)$, $\bar{y}^*$ solves $(P^*)$, and $\min (P) = \max (P^*)$.

Proof. Suppose (1) holds. From (4.3.7) and (4.3.9)
\[ L(\bar{x}, \bar{y}^*) = \inf_{x \in X} L(x, \bar{y}^*) = -\Phi^*(0, \bar{y}^*) \]
and from (4.3.8) and (4.3.9)
\[ L(\bar{x}, \bar{y}^*) = \sup_{y^* \in Y^*} L(\bar{x}, y^*) = \Phi(\bar{x}, 0). \]
Thus $\Phi(\bar{x}, 0) + \Phi^*(0, \bar{y}^*) = 0$ and (2) follows from Theorem 4.34. Conversely, if (2) holds, then from (4.3.7) and (4.3.8)
\[ -\Phi^*(0, \bar{y}^*) = \inf_{x \in X} L(x, \bar{y}^*) \le L(\bar{x}, \bar{y}^*), \qquad \Phi(\bar{x}, 0) = \sup_{y^* \in Y^*} L(\bar{x}, y^*) \ge L(\bar{x}, \bar{y}^*). \]
Consequently $-\Phi^*(0, \bar{y}^*) = \Phi(\bar{x}, 0)$ by Theorem 4.34 and (4.3.9) holds.

Theorem 4.35 implies that the absence of a duality gap between $(P)$ and $(P^*)$ is equivalent to the saddle point property of the pair $(\bar{x}, \bar{y}^*)$. For Example 4.33 we have
\[ L(x, y^*) = f(x) + \langle y^*, \Lambda x \rangle - \varphi^*(y^*). \tag{4.3.10} \]
If $(\bar{x}, \bar{y}^*)$ is a saddle point, then from (4.3.9)
\[ -\Lambda^* \bar{y}^* \in \partial f(\bar{x}), \qquad \Lambda \bar{x} \in \partial\varphi^*(\bar{y}^*). \tag{4.3.11} \]
It follows from Theorem 4.22 that the second relation is equivalent to $\bar{y}^* \in \partial\varphi(\Lambda \bar{x})$ and (4.3.11) is equivalent to (4.3.5), if $X$ is reflexive. Thus the necessary optimality system for
\[ \min_{x \in X} F(x) = f(x) + \varphi(\Lambda x) \]
is given by
\[ -\Lambda^* \bar{y}^* \in \partial f(\bar{x}), \qquad \bar{y}^* \in \partial\varphi(\Lambda \bar{x}). \tag{4.3.12} \]

4.4 Generalized Yosida–Moreau approximation

Definition 4.36 (Monotone Operator). Let $A$ be a graph in $X \times X^*$.
(1) $A$ is monotone if $\langle y_1 - y_2, x_1 - x_2 \rangle \ge 0$ for all $x_1, x_2 \in D(A)$ and $y_1 \in A x_1$, $y_2 \in A x_2$.
(2) $A$ is maximal monotone if any monotone extension of $A$ coincides with $A$.
Let $\varphi$ be an l.s.c. convex function on $X$. For $x_1^* \in \partial\varphi(x_1)$ and $x_2^* \in \partial\varphi(x_2)$,
\[ \varphi(x_1) - \varphi(x_2) \le \langle x_1^*, x_1 - x_2 \rangle, \qquad \varphi(x_2) - \varphi(x_1) \le \langle x_2^*, x_2 - x_1 \rangle. \]
Adding the two inequalities gives $\langle x_1^* - x_2^*, x_1 - x_2 \rangle \ge 0$. Hence $\partial\varphi$ is monotone in $X \times X^*$. The following theorem characterizes maximal monotone operators by a range condition [ItKa].

Theorem 4.37 (Minty–Browder). Assume that $X$ and $X^*$ are reflexive and strictly convex Banach spaces and let $\mathcal{F} : X \to X^*$ denote the duality mapping. Then a monotone operator $A$ is maximal monotone if and only if $R(\lambda \mathcal{F} + A) = X^*$ for all $\lambda > 0$ (or, equivalently, for some $\lambda > 0$).

Theorem 4.38 (Rockafellar). Let $X$ be a real Banach space. If $\varphi$ is an l.s.c. proper convex functional on $X$, then $\partial\varphi$ is a maximal monotone operator from $X$ into $X^*$.

Proof. We prove the theorem for the case that $X$ is reflexive. The general case is considered in [Roc2]. By Asplund's renorming theorem we can assume that, after choosing an equivalent norm, $X$ and $X^*$ are strictly convex. Using the Minty–Browder theorem it suffices to prove that $R(\mathcal{F} + \partial\varphi) = X^*$. For $x_0^* \in X^*$ we must show that the inclusion $x_0^* \in \mathcal{F} x + \partial\varphi(x)$ has at least one solution $x_0$. Note that $\mathcal{F} x$ is single valued due to the fact that $X^*$ is strictly convex. Define the proper convex functional $f : X \to (-\infty, \infty]$ by
\[ f(x) = \frac{1}{2}|x|_X^2 + \varphi(x) - \langle x_0^*, x \rangle. \]
Since $f$ is l.s.c. and $f(x) \to \infty$ as $|x| \to \infty$, there exists $x_0 \in D(f)$ such that $f(x_0) \le f(x)$ for all $x \in X$. The subdifferential of the mapping $x \mapsto \frac{1}{2}|x|^2$ is given by the monotone operator $\mathcal{F}$. Hence we find
\[ \varphi(x) - \varphi(x_0) \ge \langle x_0^*, x - x_0 \rangle - \langle \mathcal{F}(x), x - x_0 \rangle \]
for all $x \in X$. Setting $x(t) = x_0 + t (u - x_0)$ with $u \in X$ and using convexity of $\varphi$ we have
\[ \varphi(u) - \varphi(x_0) \ge \langle x_0^*, u - x_0 \rangle - \langle \mathcal{F}(x(t)), u - x_0 \rangle. \]
Taking the limit $t \to 0^+$ and using the fact that $x \mapsto \mathcal{F}(x)$ is continuous from $X$ endowed with the norm topology to $X^*$ endowed with the weak$^*$ topology, we obtain
\[ \varphi(u) - \varphi(x_0) \ge \langle x_0^*, u - x_0 \rangle - \langle \mathcal{F}(x_0), u - x_0 \rangle, \]
which implies that $x_0^* - \mathcal{F}(x_0) \in \partial\varphi(x_0)$.

Throughout the remainder of this section let $H$ denote a real Hilbert space which is identified with its dual $H^*$. Further let $A$ denote a maximal monotone operator in $H \times H$. Recall that $A$ is necessarily densely defined and closed. Moreover the resolvent
\[ J_\mu = (I + \mu A)^{-1}, \]
with $\mu > 0$, is a contraction defined on all of $H$; see [Bre, ItKa, Paz]. Moreover
\[ |J_\mu x - x| \to 0 \quad \text{as } \mu \to 0^+ \text{ for each } x \in H. \tag{4.4.1} \]
The Yosida approximation $A_\mu$ of $A$ is defined by
\[ A_\mu x = \frac{1}{\mu}\,(x - J_\mu x). \]
The operator $A_\mu$ is single valued, monotone, everywhere defined, and Lipschitz continuous with Lipschitz constant $\frac{1}{\mu}$, and $A_\mu x \in A J_\mu x$ for all $x \in H$.

Let $\varphi$ be an l.s.c., proper, and convex function on $H$. Throughout the remainder of this chapter let $A$ denote the maximal monotone operator $\partial\varphi$ on the Hilbert space $H = H^*$. For $x, \lambda \in H$ and $c > 0$ define the functional $\varphi_c(x, \lambda)$ by
\[ \varphi_c(x, \lambda) = \inf_{u \in H} \Big\{ \varphi(x - u) + (\lambda, u)_H + \frac{c}{2}|u|_H^2 \Big\}. \tag{4.4.2} \]
Then $\varphi_c(x, \lambda)$ is a smooth approximation of $\varphi$ in the following sense.

Theorem 4.39. For $x, \lambda \in H$ the infimum in (4.4.2) is attained at a unique point
\[ u_c(x, \lambda) = x - J_{1/c}\Big(x + \frac{\lambda}{c}\Big). \]
$\varphi_c(x, \lambda)$ is convex, Lipschitz continuously Fréchet differentiable in $x$, and
\[ \varphi_c'(x, \lambda) = \lambda + c\, u_c(x, \lambda) = A_{1/c}\Big(x + \frac{\lambda}{c}\Big). \]
Moreover, $\lim_{c \to \infty} \varphi_c(x, \lambda) = \varphi(x)$ and
\[ \varphi\Big(J_{1/c}\Big(x + \frac{\lambda}{c}\Big)\Big) - \frac{1}{2c}|\lambda|_H^2 \le \varphi_c(x, \lambda) \le \varphi(x) \quad \text{for every } x, \lambda \in H. \]
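To make the statement of Theorem 4.39 concrete, the following Python sketch evaluates $\varphi_c$ for the scalar example $H = \mathbb{R}$, $\varphi = |\cdot|$. This example is an assumption chosen for illustration and does not appear in the text. In this case the resolvent $J_{1/c} = (I + \frac{1}{c}\partial|\cdot|)^{-1}$ is the soft-thresholding map with threshold $1/c$. The closed form obtained from $u_c(x, \lambda) = x - J_{1/c}(x + \lambda/c)$ is compared with a brute-force minimization of (4.4.2) over a grid; all function names are hypothetical.

```python
import numpy as np

def soft_threshold(y, tau):
    """Resolvent (I + tau * d|.|)^{-1}, i.e. soft thresholding with threshold tau."""
    return np.sign(y) * np.maximum(np.abs(y) - tau, 0.0)

def u_c(x, lam, c):
    """Minimizer in (4.4.2) according to Theorem 4.39: x - J_{1/c}(x + lam/c)."""
    return x - soft_threshold(x + lam / c, 1.0 / c)

def phi_c(x, lam, c):
    """Value of (4.4.2) for phi = |.|, evaluated at the minimizer u_c."""
    u = u_c(x, lam, c)
    return np.abs(x - u) + lam * u + 0.5 * c * u ** 2

def phi_c_prime(x, lam, c):
    """Derivative lam + c*u_c; for phi = |.| it equals clip(lam + c*x, -1, 1)."""
    return lam + c * u_c(x, lam, c)

def phi_c_brute(x, lam, c, u_grid):
    """Direct grid minimization of phi(x - u) + lam*u + 0.5*c*u**2."""
    return np.min(np.abs(x - u_grid) + lam * u_grid + 0.5 * c * u_grid ** 2)

x, lam, c = 0.7, 0.4, 5.0
u_grid = np.linspace(-5.0, 5.0, 200001)
print(phi_c(x, lam, c), phi_c_brute(x, lam, c, u_grid))
print(phi_c_prime(x, lam, c), np.clip(lam + c * x, -1.0, 1.0))
```

The two numbers in each printed row should agree up to grid resolution, and the computed value lies between the lower and upper bounds stated in the theorem.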
Proof. First we observe that (4.4.2) is equivalent to
\[ \varphi_c(x, \lambda) = \inf_{v \in H} \Big\{ \varphi(v) + \frac{c}{2}\Big|x + \frac{\lambda}{c} - v\Big|^2 \Big\} - \frac{1}{2c}|\lambda|^2, \tag{4.4.3} \]
where $v = x - u$. Note that $v \mapsto \psi(v) = \varphi(v) + \frac{c}{2}|v - y|^2$, with $y = x + \frac{\lambda}{c}$, is convex and l.s.c. for every $y \in H$. Since $\varphi$ is proper, $D(A) \ne \emptyset$, and thus for $x_0 \in D(A)$ and $x_0^* \in A x_0$ we have
\[ \varphi(x) \ge \varphi(x_0) + (x_0^*, x - x_0) \quad \text{for all } x \in H. \]
Thus $\psi(v) \to \infty$ as $|v| \to \infty$. Hence there exists a unique minimizer $v_0 \in H$ of $\psi$. Let $\xi \in H$ and define $\eta = (1 - t) v_0 + t \xi$ for $0 < t < 1$. Since $\varphi$ is convex we have
\[ \varphi(\eta) - \varphi(v_0) \le t\,(\varphi(\xi) - \varphi(v_0)). \tag{4.4.4} \]
Moreover, since $\psi(v_0) \le \psi(\eta)$,
\[ \varphi(\eta) - \varphi(v_0) \ge \frac{c}{2}\big(|v_0 - y|^2 - |(1 - t) v_0 + t \xi - y|^2\big) = t c\,(v_0 - \xi, v_0 - y) - \frac{t^2 c}{2}|v_0 - \xi|^2. \tag{4.4.5} \]
From (4.4.4) and (4.4.5) we obtain, taking the limit $t \to 0^+$,
\[ \varphi(\xi) - \varphi(v_0) \ge c\,(y - v_0, \xi - v_0). \]
Since $\xi \in H$ is arbitrary this implies that $y - v_0 \in \frac{1}{c} A v_0$. Thus $v_0 = J_{1/c}\, y$ and
\[ u_c(x, \lambda) = x - v_0 = x - J_{1/c}\Big(x + \frac{\lambda}{c}\Big) \]
attains the minimum in (4.4.2). Note that this argument also implies that $A$ is maximal. For $x_1, x_2 \in H$ and $0 < t < 1$
\[ \varphi_c((1 - t) x_1 + t x_2, \lambda) = \psi((1 - t) v_1 + t v_2) - \frac{1}{2c}|\lambda|^2, \]
where $y_i = x_i + \frac{\lambda}{c}$ and $v_i = J_{1/c}\, y_i$ for $i = 1, 2$. Hence the convexity of $x \mapsto \varphi_c(x, \lambda)$ follows from that of $\psi$.

Next, we show that $\varphi_c'(x, \lambda) = A_{1/c}(x + \frac{\lambda}{c})$. For $\hat{x} \in H$, let $\hat{y} = \hat{x} + \frac{\lambda}{c} \in H$ and $\hat{v} = J_{1/c}\, \hat{y}$. Then we have
\[ \varphi(\hat{v}) + \frac{c}{2}|\hat{v} - y|^2 \ge \varphi(v_0) + \frac{c}{2}|v_0 - y|^2 \]
and
\[ \varphi(v_0) + \frac{c}{2}|v_0 - \hat{y}|^2 \ge \varphi(\hat{v}) + \frac{c}{2}|\hat{v} - \hat{y}|^2. \]
Thus,
\[ \frac{c}{2}\big(|v_0 - \hat{y}|^2 - |v_0 - y|^2\big) \ge \varphi_c(\hat{x}, \lambda) - \varphi_c(x, \lambda) \ge \frac{c}{2}\big(|\hat{v} - \hat{y}|^2 - |\hat{v} - y|^2\big). \tag{4.4.6} \]
Since $|\hat{v} - v_0| \to 0$ as $|\hat{x} - x| \to 0$, it follows from (4.4.6) that
\[ \frac{|\varphi_c(\hat{x}, \lambda) - \varphi_c(x, \lambda) - (c\,(y - v_0), \hat{x} - x)|}{|\hat{x} - x|} \to 0 \]
as $|\hat{x} - x| \to 0$. Hence $x \mapsto \varphi_c(x, \lambda)$ is Fréchet differentiable with F-derivative
\[ c\,(y - v_0) = \lambda + c\, u_c(x, \lambda) = A_{1/c}\Big(x + \frac{\lambda}{c}\Big). \]
The dual representation of $\varphi_c(x, \lambda)$ is derived next. Define the functional $\Phi(v, y)$ on $H \times H$ by
\[ \Phi(v, y) = \varphi(v) + \frac{c}{2}|v - (\hat{y} + y)|^2, \]
where we set $\hat{y} = x + \frac{\lambda}{c}$. Consider the family of primal problems
\[ \inf_{v \in H} \Phi(v, y) \quad \text{for each } y \in H \tag{$P_y$} \]
and the dual problem
\[ \sup_{y^* \in H} \{-\Phi^*(0, y^*)\}. \tag{$P^*$} \]
If $h(y)$ is the value functional of $(P_y)$, i.e., $h(y) = \inf_{v \in H} \Phi(v, y)$, then from (4.4.3) we have
\[ \varphi_c(x, \lambda) = h(0) - \frac{1}{2c}|\lambda|^2. \tag{4.4.7} \]
From the proof of Theorem 4.39 it follows that $h(y)$ is continuously Fréchet differentiable with $h'(0) = \varphi_c'(x, \lambda)$. It thus follows from Theorem 4.31 that $\inf (P_0) = \max (P^*)$ and $h'(0)$ is the solution of $(P^*)$. This leads to the following theorem.

Theorem 4.40. For $x, \lambda \in H$
\[ \varphi_c(x, \lambda) = \sup_{y^* \in H} \Big\{ (x, y^*)_H - \varphi^*(y^*) - \frac{1}{2c}|y^* - \lambda|_H^2 \Big\}, \tag{4.4.8} \]
where the supremum is attained at a unique point $\lambda_c(x, \lambda)$ and we have
\[ \lambda_c(x, \lambda) = \lambda + c\, u_c(x, \lambda) = \varphi_c'(x, \lambda) = A_{1/c}\Big(x + \frac{\lambda}{c}\Big). \tag{4.4.9} \]

Proof. For $(v^*, y^*) \in H \times H$ we have by definition
\begin{align*}
\Phi^*(v^*, y^*) &= \sup_{v \in H} \sup_{y \in H} \Big\{ (v^*, v) + (y^*, y) - \varphi(v) - \frac{c}{2}|v - (\hat{y} + y)|^2 \Big\} \\
&= \sup_{v \in H} \Big\{ (v^*, v) + (y^*, v - \hat{y}) - \varphi(v) + \sup_{y \in H} \Big[ -(y^*, v - (\hat{y} + y)) - \frac{c}{2}|v - (\hat{y} + y)|^2 \Big] \Big\} \\
&= \sup_{v \in H} \{ (y^* + v^*, v) - \varphi(v) \} - (y^*, \hat{y}) + \frac{1}{2c}|y^*|^2 \\
&= \varphi^*(y^* + v^*) - (y^*, \hat{y}) + \frac{1}{2c}|y^*|^2.
\end{align*}
Hence,
\[ h(0) = \sup_{y^* \in H} \{-\Phi^*(0, y^*)\} = \sup_{y^* \in H} \Big\{ (y^*, \hat{y}) - \varphi^*(y^*) - \frac{1}{2c}|y^*|^2 \Big\}, \]
which implies (4.4.8) from (4.4.7) and since $\hat{y} = x + \frac{\lambda}{c}$. By Theorem 4.31 the maximum of $y^* \mapsto (y^*, x) - \varphi^*(y^*) - \frac{1}{2c}|y^* - \lambda|^2$ is attained at the unique point $h'(0) = \lambda_c(x, \lambda)$ that is given by (4.4.9).

The following theorem provides an equivalent characterization of $\lambda \in \partial\varphi(x)$. This result will be used in the following section to replace the differential inclusion, which relates the primal and adjoint variables, by a nonlinear equation.

Theorem 4.41. (1) If $\lambda \in \partial\varphi(x)$ for $x, \lambda \in H$, then $\lambda = \varphi_c'(x, \lambda)$ for all $c > 0$.
(2) Conversely, if $\lambda = \varphi_c'(x, \lambda)$ for some $c > 0$, then $\lambda \in \partial\varphi(x)$.

Proof. If $\lambda \in \partial\varphi(x)$, then by Theorems 4.39, 4.40, and 4.22
\[ \varphi(x) \ge \varphi_c(x, \lambda) \ge (\lambda, x) - \varphi^*(\lambda) = \varphi(x) \]
for every $c > 0$. Thus, $\lambda \in H$ attains the supremum in (4.4.8) and by Theorem 4.40 we have $\lambda = \varphi_c'(x, \lambda)$. Conversely, if $\lambda \in H$ satisfies $\lambda = \varphi_c'(x, \lambda)$ for some $c > 0$, then $u_c(x, \lambda) = 0$ by Theorem 4.40. Hence it follows from Theorem 4.39, (4.4.2), and Theorem 4.40 that
\[ \varphi(x) = \varphi_c(x, \lambda) = (\lambda, x) - \varphi^*(\lambda), \]
and thus $\lambda \in \partial\varphi(x)$ by Theorem 4.22.
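The equivalence in Theorem 4.41 can be observed directly in the scalar case. The following sketch is an illustration under the assumption $H = \mathbb{R}$, $\varphi = |\cdot|$ (not an example from the text), where $A_{1/c}(y) = \operatorname{clip}(c y, -1, 1)$ and hence $\varphi_c'(x, \lambda) = \operatorname{clip}(\lambda + c x, -1, 1)$. It compares membership $\lambda \in \partial|x|$ with the fixed-point equation $\lambda = \varphi_c'(x, \lambda)$ for several values of $c$; the helper names are hypothetical.

```python
import numpy as np

def phi_c_prime(x, lam, c):
    """phi_c'(x, lam) for phi = |.| on R, i.e. A_{1/c}(x + lam/c) = clip(lam + c*x, -1, 1)."""
    return float(np.clip(lam + c * x, -1.0, 1.0))

def in_subdifferential(lam, x, tol=1e-12):
    """Membership of lam in the subdifferential of |.| at x."""
    if x > 0:
        return abs(lam - 1.0) <= tol
    if x < 0:
        return abs(lam + 1.0) <= tol
    return -1.0 - tol <= lam <= 1.0 + tol

def is_fixed_point(lam, x, cs=(0.1, 1.0, 10.0), tol=1e-12):
    """Check lam = phi_c'(x, lam) for every c in cs."""
    return all(abs(phi_c_prime(x, lam, c) - lam) <= tol for c in cs)

cases = [(0.5, 1.0), (0.0, 0.3), (-2.0, -1.0), (0.5, 0.3), (0.0, 1.5)]
for x, lam in cases:
    print(x, lam, in_subdifferential(lam, x), is_fixed_point(lam, x))
```

In each row the two Boolean values should coincide: the inclusion holds exactly when the nonlinear equation does.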
4.5 Optimality systems

In this section we derive first order optimality systems for (4.1.1) based on Lagrange multipliers. In what follows we assume that $f : X \to \mathbb{R}$ and $\varphi : H \to (-\infty, \infty]$ are l.s.c., convex functions, that $f$ is continuously differentiable, and that $\varphi$ is proper. Further $\Lambda \in \mathcal{L}(X, H)$ and $C$ is a closed convex set in $X$. In addition we require that
(A1) $f$, $\varphi$ are bounded below by zero on $K$,
(A2) $\langle f'(x_1) - f'(x_2), x_1 - x_2 \rangle_{X^*, X} \ge \sigma |x_1 - x_2|_X^2$ for some $\sigma > 0$ independent of $x_1, x_2 \in C$, and
(A3) $\varphi(\Lambda x_0) < \infty$ and $\varphi$ is continuous at $\Lambda x_0$ for some $x_0 \in C$.

As a consequence of (A2) we have
\[ f(x) - f(x_0) - \langle f'(x_0), x - x_0 \rangle_{X^*, X} = \int_0^1 \langle f'(x_0 + t (x - x_0)) - f'(x_0), x - x_0 \rangle_{X^*, X}\, dt \ge \frac{\sigma}{2}|x - x_0|_X^2 \tag{4.5.1} \]
for all $x \in X$. Due to Theorem 4.24 there exists $y_0^* \in A(y_0) = \partial\varphi(y_0)$, with $y_0 = \Lambda x_0$, such that
\[ \varphi(\Lambda x) - \varphi(\Lambda x_0) \ge (y_0^*, \Lambda (x - x_0))_H \quad \text{for all } x \in X. \tag{4.5.2} \]
Hence $f(x) + \varphi(\Lambda x) \to \infty$ as $|x|_X \to \infty$ and it follows from Theorem 4.25 and (A2) that there exists a unique minimizer $\bar{x} \in C$ of (4.1.1).

Theorem 4.42 (Optimality). A necessary and sufficient condition for $\bar{x} \in C$ to be the minimizer of (4.1.1) is given by
\[ \langle f'(\bar{x}), x - \bar{x} \rangle_{X^*, X} + \varphi(\Lambda x) - \varphi(\Lambda \bar{x}) \ge 0 \quad \text{for all } x \in C. \tag{4.5.3} \]

Proof. Assume that $\bar{x}$ is the minimizer of (4.1.1). Then for $x \in C$ and $0 < t < 1$ we have $\bar{x} + t (x - \bar{x}) \in C$ and
\[ f(\bar{x} + t (x - \bar{x})) + \varphi(\Lambda (\bar{x} + t (x - \bar{x}))) \ge f(\bar{x}) + \varphi(\Lambda \bar{x}). \]
Since
\[ \varphi(\Lambda ((1 - t) \bar{x} + t x)) - \varphi(\Lambda \bar{x}) \le t\,(\varphi(\Lambda x) - \varphi(\Lambda \bar{x})), \]
we obtain
\[ t^{-1}\big(f(\bar{x} + t (x - \bar{x})) - f(\bar{x})\big) + \varphi(\Lambda x) - \varphi(\Lambda \bar{x}) \ge 0. \]
Taking the limit $t \to 0^+$, we have (4.5.3) for all $x \in C$. Conversely, assume that $\bar{x} \in C$ satisfies (4.5.3). Then, from (4.5.1),
\begin{align*}
f(x) + \varphi(\Lambda x) - (f(\bar{x}) + \varphi(\Lambda \bar{x})) &= f(x) - f(\bar{x}) - \langle f'(\bar{x}), x - \bar{x} \rangle + \langle f'(\bar{x}), x - \bar{x} \rangle + \varphi(\Lambda x) - \varphi(\Lambda \bar{x}) \\
&\ge \frac{\sigma}{2}|x - \bar{x}|_X^2
\end{align*}
for all $x \in C$. Thus, $\bar{x}$ is a minimizer of (4.1.1).

Next, $\varphi$ is split from the optimality system in (4.5.3) by means of an appropriately defined Lagrange multiplier. For this purpose we consider the regularized minimization problems
\[ \min f(x) + \varphi_c(\Lambda x, \lambda) \quad \text{over } x \in C \tag{4.5.4} \]
for $c > 0$ and $\lambda \in H$. From Theorem 4.39 it follows that $x \mapsto \varphi_c(\Lambda x, \lambda)$ is convex, continuously differentiable, and bounded below by $-\frac{1}{2c}|\lambda|_H^2$. Thus, for $\lambda \in H$, (4.5.4) has a unique solution $x_c \in C$, and $x_c \in C$ is the solution of (4.5.4) if and only if $x_c$ satisfies
\[ \langle f'(x_c), x - x_c \rangle + (\varphi_c'(\Lambda x_c, \lambda), \Lambda (x - x_c))_H \ge 0 \quad \text{for all } x \in C. \tag{4.5.5} \]
It follows from Theorems 4.39 and 4.40 that
\[ \varphi_c'(\Lambda x_c, \lambda) = A_{1/c}\Big(\Lambda x_c + \frac{1}{c}\lambda\Big) = \lambda_c \in H. \tag{4.5.6} \]
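We have the following result.

Theorem 4.43 (Lagrange Multipliers). (1) Suppose that $x_c$ converges strongly to $\bar{x}$ in $X$ as $c \to \infty$ and that $\{\lambda_c\}_{c \ge 1}$ has a weak cluster point. Then for each weak cluster point $\bar{\lambda} \in H$ of $\{\lambda_c\}_{c \ge 1}$
\[ \bar{\lambda} \in \partial\varphi(\Lambda \bar{x}) \quad \text{and} \quad \langle f'(\bar{x}), x - \bar{x} \rangle_{X^*, X} + (\bar{\lambda}, \Lambda (x - \bar{x}))_H \ge 0 \quad \text{for all } x \in C. \tag{4.5.7} \]
Conversely, if $(\bar{x}, \bar{\lambda}) \in C \times H$ satisfies (4.5.7), then $\bar{x}$ minimizes (4.1.1).
(2) Assume that there exists $\tilde{\lambda}_c \in \partial\varphi(\Lambda x_c)$ for $c \ge 1$ such that $\{|\tilde{\lambda}_c|_H\}$, $c \ge 1$, is bounded. Then $x_c$ converges strongly to $\bar{x}$ in $X$ as $c \to \infty$.

Proof. To verify (1), assume that $x_c \to \bar{x}$ and let $\bar{\lambda}$ be a weak cluster point of $\lambda_c$ in $H$. Then, from (4.5.5)–(4.5.6), we can conclude that $(\bar{x}, \bar{\lambda}) \in C \times H$ satisfies
\[ \langle f'(\bar{x}), x - \bar{x} \rangle + (\bar{\lambda}, \Lambda (x - \bar{x}))_H \ge 0 \]
for all $x \in C$. It follows from Theorems 4.39 and 4.40 that, with $v_c = \Lambda x_c$,
\[ (\lambda_c, v_c)_H - \frac{1}{2c}|\lambda_c - \lambda|^2 = \varphi_c(\Lambda x_c, \lambda) + \varphi^*(\lambda_c) \ge \varphi\Big(J_{1/c}\Big(v_c + \frac{\lambda}{c}\Big)\Big) - \frac{1}{2c}|\lambda|_H^2 + \varphi^*(\lambda_c). \]
Since $J_{1/c}(v_c + \frac{\lambda}{c}) \to \bar{v} = \Lambda \bar{x}$ and $\varphi$ and $\varphi^*$ are l.s.c., letting $c \to \infty$, we obtain
\[ (\bar{\lambda}, \bar{v})_H \ge \varphi(\bar{v}) + \varphi^*(\bar{\lambda}), \]
which implies that $\bar{\lambda} \in \partial\varphi(\bar{v})$ by Theorem 4.22. Hence, $(\bar{x}, \bar{\lambda}) \in C \times H$ satisfies (4.5.7). Conversely, suppose that $(\bar{x}, \bar{\lambda}) \in C \times H$ satisfies (4.5.7). Then $\varphi(\Lambda x) - \varphi(\Lambda \bar{x}) \ge (\bar{\lambda}, \Lambda (x - \bar{x}))_H$ for all $x \in C$. Thus the inequality in (4.5.7) implies (4.5.3). It then follows from Theorem 4.42 that $\bar{x}$ minimizes (4.1.1).

For (2) we note that from (4.5.5) we have
\[ \langle f'(x_c), x - x_c \rangle + \varphi_c(\Lambda x, \lambda) - \varphi_c(\Lambda x_c, \lambda) \ge 0 \quad \text{for all } x \in C. \]
Thus
\[ \langle f'(x_c), \bar{x} - x_c \rangle + \varphi_c(\Lambda \bar{x}, \lambda) - \varphi_c(\Lambda x_c, \lambda) \ge 0. \]
Also, from (4.5.3)
\[ \langle f'(\bar{x}), x_c - \bar{x} \rangle + \varphi(\Lambda x_c) - \varphi(\Lambda \bar{x}) \ge 0. \]
Adding these inequalities, we obtain
\[ \langle f'(\bar{x}) - f'(x_c), \bar{x} - x_c \rangle + \varphi_c(\Lambda x_c, \lambda) - \varphi(\Lambda x_c) \le \varphi_c(\Lambda \bar{x}, \lambda) - \varphi(\Lambda \bar{x}) \le 0. \tag{4.5.8} \]
By (4.4.8)
\[ \varphi(v_c) - \frac{1}{2c}|\tilde{\lambda}_c - \lambda|_H^2 = (v_c, \tilde{\lambda}_c)_H - \varphi^*(\tilde{\lambda}_c) - \frac{1}{2c}|\tilde{\lambda}_c - \lambda|_H^2 \le \varphi_c(v_c, \lambda), \tag{4.5.9} \]
where $v_c = \Lambda x_c$, and hence
\[ \varphi(v_c) - \varphi_c(v_c, \lambda) \le \frac{1}{2c}|\tilde{\lambda}_c - \lambda|_H^2. \]
Thus, from (4.5.8) and (A2)
\[ \frac{\sigma}{2}|x_c - \bar{x}|_X^2 \le \frac{1}{2c}|\tilde{\lambda}_c - \lambda|_H^2 \]
and since $\{|\tilde{\lambda}_c|_H\}_{c \ge 1}$ is bounded, $|x_c - \bar{x}|_X \to 0$ as $c \to \infty$.

The following lemma addresses the condition in Theorem 4.43 (2).

Lemma 4.44. (1) If $D(\varphi) = H$ and $\varphi$ is bounded above on bounded sets, then $\partial\varphi(\Lambda x_c)$ is nonempty and $|\partial\varphi(\Lambda x_c)|_H$ is uniformly bounded for $c \ge 1$.
(2) If $\varphi = \chi_K$ with $K$ a closed convex set in $H$ and $\Lambda x_c \in K$ for all $c > 0$, then $\tilde{\lambda}_c$ can be chosen to be 0 for all $c > 0$.

Proof. (1) By definition of $x_c$ and by Theorem 4.39
\[ f(x_c) + \varphi_c(v_c, \lambda) \le f(\bar{x}) + \varphi(\Lambda \bar{x}) \]
and
\[ f(x_c) + \varphi\Big(J_{1/c}\Big(v_c + \frac{\lambda}{c}\Big)\Big) \le f(\bar{x}) + \varphi(\Lambda \bar{x}) + \frac{1}{2c}|\lambda|_H^2, \]
where $v_c = \Lambda x_c$. Since $\varphi$ is bounded below by 0, there exists a constant $K_1 > 0$ independent of $c \ge 1$ such that
\[ f(x_c) - f(\bar{x}) - \langle f'(\bar{x}), x_c - \bar{x} \rangle \le K_1 + |f'(\bar{x})|\,|x_c - \bar{x}|_X. \]
By (4.5.1)
\[ \frac{\sigma}{2}|x_c - \bar{x}|_X^2 \le K_1 + |f'(\bar{x})|\,|x_c - \bar{x}|_X. \]
Thus there exists a constant $M$ such that $|x_c - \bar{x}|_X \le M$ for $c \ge 1$. Since $\varphi$ is everywhere defined, convex, l.s.c., and assumed to be bounded above on bounded sets, it follows from Theorem 4.7 that $\varphi$ is Lipschitz continuous in the open set $B = \{v \in H : |v - \bar{v}| < (M + 1)\|\Lambda\|\}$, where $\bar{v} = \Lambda \bar{x}$. Let $L$ denote the Lipschitz constant. By Theorem 4.24, $\partial\varphi(v_c)$ is nonempty for $c \ge 1$. Let $\tilde{\lambda}_c \in \partial\varphi(v_c)$ for $c \ge 1$. Hence
\[ L\,|v - v_c| \ge \varphi(v) - \varphi(v_c) \ge (\tilde{\lambda}_c, v - v_c) \]
for all $v \in B$. Since $v_c \in B$ we have $|\tilde{\lambda}_c| \le L$, and the claim follows.
(2) In this case $\partial\varphi(v_c)$ is given by the normal cone $N_K(v_c) = \{w \in H : (w, v_c - v)_H \ge 0 \text{ for all } v \in K\}$ and thus $0 \in \partial\varphi(v_c)$ for all $c > 0$.

Theorem 4.45 (Complementarity). Assume that there exists a pair $(\bar{x}, \bar{\lambda}) \in C \times H$ that satisfies (4.5.7). Then the complementarity condition $\bar{\lambda} \in \partial\varphi(\Lambda \bar{x})$ can equivalently be expressed as
\[ \bar{\lambda} = \varphi_c'(\Lambda \bar{x}, \bar{\lambda}) \tag{4.5.10} \]
and $\bar{x}$ is the unique solution of
\[ \min_{x \in C} f(x) + \varphi_c(\Lambda x, \bar{\lambda}) \tag{4.5.11} \]
for every $c > 0$.

Proof. The first claim follows directly from Theorem 4.41. From (4.5.5) we conclude that $\hat{x}$ is a minimizer of (4.5.11) if and only if $\hat{x} \in C$ satisfies
\[ \langle f'(\hat{x}), x - \hat{x} \rangle_{X^*, X} + (\varphi_c'(\Lambda \hat{x}, \bar{\lambda}), \Lambda (x - \hat{x}))_H \ge 0 \quad \text{for all } x \in C. \]
From (4.5.7) and (4.5.10) it follows that $\bar{x} \in C$ satisfies this inequality as well and hence $\hat{x} = \bar{x}$.

Theorems 4.43 and 4.45 imply that if a pair $(\bar{x}, \bar{\lambda}) \in C \times H$ satisfies
\[ \begin{cases} \langle f'(\bar{x}), x - \bar{x} \rangle + (\bar{\lambda}, \Lambda (x - \bar{x})) \ge 0 \quad \text{for all } x \in C, \\ \bar{\lambda} = \varphi_c'(\Lambda \bar{x}, \bar{\lambda}) \end{cases} \tag{4.5.12} \]
for some $c > 0$, then $\bar{x}$ is the minimizer of (4.1.1). Conversely, if $\bar{x}$ is a minimizer of (4.1.1) and
\[ \partial(\varphi \circ \Lambda + \psi_C)(\bar{x}) = \Lambda^* \partial\varphi(\Lambda \bar{x}) + \partial\psi_C(\bar{x}), \tag{4.5.13} \]
then there exists a $\bar{\lambda} \in H$ such that the pair $(\bar{x}, \bar{\lambda})$ satisfies (4.5.12) for all $c > 0$. Here $\psi_C$ denotes the indicator function of the set $C$. In fact it follows from (4.5.3) that $-f'(\bar{x}) \in \partial(\varphi \circ \Lambda + \psi_C)(\bar{x})$ and by (4.5.13)
\[ -f'(\bar{x}) \in \Lambda^* \partial\varphi(\Lambda \bar{x}) + \partial\psi_C(\bar{x}) = \Lambda^* \partial\varphi(\Lambda \bar{x}) + N_C(\bar{x}), \]
where $N_C(\bar{x}) = \{z \in X^* : \langle z, x - \bar{x} \rangle \le 0 \text{ for all } x \in C\}$. This implies that there exists some $\bar{\lambda} \in \partial\varphi(\Lambda \bar{x})$ such that (4.5.7) holds and also (4.5.12) for the pair $(\bar{x}, \bar{\lambda})$. Condition (4.5.13) holds, for example, if there exists $x \in \operatorname{int}(C)$ and $\varphi$ is continuous and finite at $\Lambda x$ (see, e.g., Propositions 12 and 13 in Section 3.2 of [EkTu]).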
The following theorem discusses the equivalence between the existence of a Lagrange multiplier $\bar{\lambda}$ and uniform boundedness of $\lambda_c$.

Theorem 4.46. Suppose $\Lambda$ is compact and let $\lambda \in H$ be fixed. Then $\lambda_c = \varphi_c'(v_c, \lambda)$, with $v_c = \Lambda x_c$, is uniformly bounded in $c \ge 1$ if and only if there exists $\bar{\lambda} \in \partial\varphi(\Lambda \bar{x})$ such that (4.5.7) holds. In either case there exists a subsequence $(x_{\hat{c}}, \lambda_{\hat{c}}) \in X \times H$ such that $x_{\hat{c}} \to \bar{x}$ strongly in $X$ and $\lambda_{\hat{c}} \to \bar{\lambda}$ weakly in $H$, where $(\bar{x}, \bar{\lambda}) \in C \times H$ satisfies (4.5.7).

Proof. Assume that $|\lambda_c|_H$ is uniformly bounded. From (4.5.5)
\[ \langle f'(x_c) + \Lambda^* \lambda_c, x - x_c \rangle \ge 0 \quad \text{for all } x \in C, \]
where $\lambda_c = \varphi_c'(\Lambda x_c, \lambda)$. From (A2) it follows that
\[ \sigma |x_c - \bar{x}|_X^2 \le \langle f'(x_c) - f'(\bar{x}), x_c - \bar{x} \rangle \le -\langle f'(\bar{x}) + \Lambda^* \lambda_c, x_c - \bar{x} \rangle \le (\|\Lambda\|\,|\lambda_c| + |f'(\bar{x})|)\,|x_c - \bar{x}|, \]
which by assumption implies that $|x_c|_X$ is uniformly bounded. Since $\Lambda$ is compact it follows that any weakly convergent sequence $\lambda_c \to \bar{\lambda}$ in $H$ satisfies $\Lambda^* \lambda_c \to \Lambda^* \bar{\lambda}$ strongly in $X^*$. Again, from (A2) we have
\[ \sigma |x_c - x_{\hat{c}}|_X^2 \le \langle f'(x_c) - f'(x_{\hat{c}}), x_c - x_{\hat{c}} \rangle \le |\Lambda^* \lambda_c - \Lambda^* \lambda_{\hat{c}}|_{X^*}\,|x_c - x_{\hat{c}}|_X \]
for any $c, \hat{c} > 0$. Hence $\{x_c\}$ is a Cauchy sequence in $X$ and thus there exists $\bar{x} \in X$ such that $|x_c - \bar{x}|_X \to 0$. The rest of the proof for the "only if" part is identical to the one of the first part of Theorem 4.43. It will be shown in Theorem 4.49 that
\[ \frac{\sigma}{2}|x_c - \bar{x}|_X^2 + \frac{1}{2c}|\lambda_c - \bar{\lambda}|_H^2 \le \frac{1}{2c}|\lambda - \bar{\lambda}|_H^2 \]
if (4.5.7) holds. This implies the "if" part.
4.6 Augmented Lagrangian method

In this section we discuss the augmented Lagrangian method for (4.1.1). Throughout we assume that (A1)–(A3) of Section 4.5 hold and we set
\[ L_c(x, \lambda) = f(x) + \varphi_c(\Lambda x, \lambda). \]

Augmented Lagrangian Method.
Step 1: Choose a starting value $\lambda_1 \in H$ and a positive number $c$, and set $k = 1$.
Step 2: Given $\lambda_k \in H$ find $x_k \in C$ by $L_c(x_k, \lambda_k) = \min L_c(x, \lambda_k)$ over $x \in C$.
Step 3: Update $\lambda_k$ by $\lambda_{k+1} = \varphi_c'(\Lambda x_k, \lambda_k)$.
Step 4: If the convergence criterion is not satisfied, then set $k = k + 1$ and go to Step 2.
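The four steps above can be sketched in a few lines of Python for a model problem. The instance below is an assumption made for illustration and is not taken from the text: $X = H = C = \mathbb{R}^n$, $\Lambda = I$, $f(x) = \frac{1}{2}|x - a|^2$ (so that (A2) holds with $\sigma = 1$) and $\varphi = |\cdot|_1$. For this choice $\varphi_c'(x, \lambda) = \operatorname{clip}(\lambda + c x, -1, 1)$ componentwise, and the subproblem in Step 2 reduces to the scalar equations $x - a + \operatorname{clip}(\lambda + c x, -1, 1) = 0$, which are solved in closed form. The function name `alm_l1` and the fixed iteration count (used in place of a convergence criterion) are hypothetical.

```python
import numpy as np

def alm_l1(a, c=1.0, iters=60):
    """Augmented Lagrangian iteration for min_x 0.5*|x - a|^2 + |x|_1 (assumed model problem)."""
    a = np.asarray(a, dtype=float)
    lam = np.zeros_like(a)                  # Step 1: starting value lambda_1 = 0
    x = np.zeros_like(a)
    for _ in range(iters):                  # Step 4 replaced by a fixed number of sweeps
        # Step 2: solve x - a + clip(lam + c*x, -1, 1) = 0 componentwise.
        x_mid = (a - lam) / (1.0 + c)       # root when the clip is inactive
        t = lam + c * x_mid
        x = np.where(t > 1.0, a - 1.0, np.where(t < -1.0, a + 1.0, x_mid))
        # Step 3: lambda_{k+1} = phi_c'(x_k, lambda_k)
        lam = np.clip(lam + c * x, -1.0, 1.0)
    return x, lam

a = np.array([3.0, 0.5, -2.0])
x, lam = alm_l1(a)
print(x, lam)
print(np.sign(a) * np.maximum(np.abs(a) - 1.0, 0.0))   # soft thresholding of a, for comparison
```

For this model problem the minimizer is known to be the soft thresholding of $a$ with threshold 1, which gives a reference against which the iterates can be compared; the multiplier should approach $a - \bar{x} \in \partial|\bar{x}|_1$.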
Before proving convergence of the augmented Lagrangian method we motivate the update of the Lagrange multiplier in Step 3. First, it follows from (4.5.12) that it is a fixed point iteration for the necessary optimality condition in the dual variable $\lambda$. To give the second motivation we define the dual functional $d_c : H \to \mathbb{R}$ by

$$d_c(\lambda) = \inf_{x \in C}\, \bigl\{ f(x) + \varphi_c(\Lambda x, \lambda) \bigr\} \tag{4.6.1}$$

for $\lambda \in H$. The update is then a steepest ascent step for $d_c(\lambda)$. In fact it will be shown in the following lemma that the infimum in (4.6.1) is attained at a unique minimizer $x(\lambda) \in C$ and that $d_c$ is continuously Fréchet differentiable with F-derivative $d_c'(\lambda) = u_c(\Lambda x(\lambda), \lambda)$, where $u_c(v, \lambda)$ is defined in Theorem 4.39. Thus the steepest ascent step is given by $\lambda_{k+1} = \lambda_k + c\, u_c(\Lambda x(\lambda_k), \lambda_k)$, which by Theorem 4.40 coincides with the update given in Step 3.

Lemma 4.47. For $\lambda \in H$ and $c > 0$ the infimum in (4.6.1) is attained at a unique minimizer $x(\lambda) \in C$ and the mapping $\lambda \in H \mapsto x(\lambda) \in X$ is Lipschitz continuous with Lipschitz constant $\sigma^{-1}\|\Lambda\|$. Moreover, the dual functional $d_c$ is continuously Fréchet differentiable with F-derivative $d_c'(\lambda) = u_c(\Lambda x(\lambda), \lambda)$, where $u_c(v, \lambda)$ is defined in Theorem 4.39.

Proof. The proof is given in three steps.

Step 1. Since $x \mapsto f(x) + \varphi_c(\Lambda x, \lambda)$ is convex and l.s.c. and since (4.5.1) holds, there exists a unique $x(\lambda)$ that attains its minimum over $C$. To establish Lipschitz continuity of $\lambda \mapsto x(\lambda)$ we note that $x(\lambda)$ satisfies the necessary optimality condition

$$\langle f'(x(\lambda)), x - x(\lambda)\rangle + \bigl(\varphi_c'(\Lambda x(\lambda), \lambda), \Lambda(x - x(\lambda))\bigr) \ge 0 \quad \text{for all } x \in C. \tag{4.6.2}$$

Using this inequality at $\lambda, \mu \in H$, we obtain

$$\langle f'(x(\mu)) - f'(x(\lambda)), x(\mu) - x(\lambda)\rangle + \bigl(\varphi_c'(\Lambda x(\mu), \mu) - \varphi_c'(\Lambda x(\lambda), \lambda), \Lambda(x(\mu) - x(\lambda))\bigr) \le 0.$$

By (A2) and (4.5.6) this implies that

$$\sigma\,|x(\mu) - x(\lambda)|_X^2 + \Bigl(A_{1/c}\bigl(\Lambda x(\mu) + \tfrac{\mu}{c}\bigr) - A_{1/c}\bigl(\Lambda x(\lambda) + \tfrac{\lambda}{c}\bigr), \Lambda(x(\mu) - x(\lambda))\Bigr) \le 0,$$

and thus

$$\sigma\,|x(\mu) - x(\lambda)|_X^2 + \Bigl(A_{1/c}\bigl(\Lambda x(\mu) + \tfrac{\mu}{c}\bigr) - A_{1/c}\bigl(\Lambda x(\lambda) + \tfrac{\mu}{c}\bigr), \Lambda(x(\mu) - x(\lambda))\Bigr) \le \Bigl(A_{1/c}\bigl(\Lambda x(\lambda) + \tfrac{\lambda}{c}\bigr) - A_{1/c}\bigl(\Lambda x(\lambda) + \tfrac{\mu}{c}\bigr), \Lambda(x(\mu) - x(\lambda))\Bigr).$$
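Since $A_{1/c}$ is monotone and Lipschitz continuous with Lipschitz constant $c$, this inequality yields $\sigma\,|x(\mu) - x(\lambda)|_X \le \|\Lambda\|\,|\mu - \lambda|_H$, which shows the first assertion.

Step 2. We show that for every $v \in H$ the functional $\lambda \in H \mapsto \varphi_c(v, \lambda)$ is Fréchet differentiable with F-derivative $\frac{\partial}{\partial\lambda}\varphi_c(v, \lambda)$ given by $u_c(v, \lambda)$. Since (4.4.2) can be equivalently written as (4.4.3), it follows from the proof of Theorem 4.39 that $\lambda \in H \mapsto \varphi_c(v, \lambda)$ is Fréchet differentiable and

$$\frac{\partial}{\partial\lambda}\varphi_c(v, \lambda) = \frac{1}{c}\bigl(\lambda + c\,u_c(v, \lambda)\bigr) - \frac{\lambda}{c} = u_c(v, \lambda).$$

Step 3. To argue differentiability of $\lambda \mapsto d_c(\lambda)$ note that it follows from (4.6.2) that for $\lambda, \mu \in H$

$$\begin{aligned}
d_c(\mu) - d_c(\lambda) &= f(x(\mu)) - f(x(\lambda)) + \varphi_c(\Lambda x(\mu), \lambda) - \varphi_c(\Lambda x(\lambda), \lambda) + \varphi_c(\Lambda x(\mu), \mu) - \varphi_c(\Lambda x(\mu), \lambda) \\
&\ge \langle f'(x(\lambda)), x(\mu) - x(\lambda)\rangle + \bigl(\varphi_c'(\Lambda x(\lambda), \lambda), \Lambda(x(\mu) - x(\lambda))\bigr) + \varphi_c(\Lambda x(\mu), \mu) - \varphi_c(\Lambda x(\mu), \lambda) \\
&\ge \varphi_c(\Lambda x(\mu), \mu) - \varphi_c(\Lambda x(\mu), \lambda).
\end{aligned} \tag{4.6.3}$$

Similarly, we have

$$\begin{aligned}
d_c(\mu) - d_c(\lambda) &= f(x(\mu)) - f(x(\lambda)) + \varphi_c(\Lambda x(\mu), \mu) - \varphi_c(\Lambda x(\lambda), \mu) + \varphi_c(\Lambda x(\lambda), \mu) - \varphi_c(\Lambda x(\lambda), \lambda) \\
&\le -\Bigl(\langle f'(x(\mu)), x(\lambda) - x(\mu)\rangle + \bigl(\varphi_c'(\Lambda x(\mu), \mu), \Lambda(x(\lambda) - x(\mu))\bigr)\Bigr) + \varphi_c(\Lambda x(\lambda), \mu) - \varphi_c(\Lambda x(\lambda), \lambda) \\
&\le \varphi_c(\Lambda x(\lambda), \mu) - \varphi_c(\Lambda x(\lambda), \lambda).
\end{aligned} \tag{4.6.4}$$

It follows from Step 2 and Theorem 4.39 that

$$\begin{aligned}
\bigl|\varphi_c(\Lambda x(\mu), \mu) &- \varphi_c(\Lambda x(\mu), \lambda) - \bigl(u_c(\Lambda x(\lambda), \lambda), \mu - \lambda\bigr)\bigr| \\
&\le \Bigl|\int_0^1 \bigl(u_c(\Lambda x(\mu), \lambda + t(\mu - \lambda)) - u_c(\Lambda x(\lambda), \lambda), \mu - \lambda\bigr)\, dt\Bigr| \\
&\le \frac{1}{c}\,|\mu - \lambda| \int_0^1 \Bigl|A_{1/c}\bigl(\Lambda x(\mu) + \tfrac{1}{c}(\lambda + t(\mu - \lambda))\bigr) - A_{1/c}\bigl(\Lambda x(\lambda) + \tfrac{\lambda}{c}\bigr) + t(\lambda - \mu)\Bigr|\, dt \\
&\le \frac{1}{c}\,|\mu - \lambda| \int_0^1 \bigl(c\,\|\Lambda\|\,|x(\mu) - x(\lambda)| + 2t\,|\mu - \lambda|\bigr)\, dt \\
&\le \Bigl(\frac{\|\Lambda\|^2}{\sigma} + \frac{1}{c}\Bigr)|\mu - \lambda|^2 .
\end{aligned}$$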
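Hence (4.6.3)–(4.6.4) and Step 2 imply that $\lambda \in H \mapsto d_c(\lambda)$ is Fréchet differentiable with F-derivative given by $u_c(\Lambda x(\lambda), \lambda)$, where $u_c(\Lambda x(\lambda), \lambda)$ is Lipschitz continuous in $\lambda$.

The following theorem asserts convergence of the augmented Lagrangian method.

Theorem 4.48. Assume that (A1)–(A3) hold and that there exists $\bar\lambda \in \partial\varphi(\Lambda\bar x)$ such that (4.5.7) is satisfied. Then the sequence $(x_k, \lambda_k)$ is well defined and satisfies

$$\frac{\sigma}{2}\,|x_k - \bar x|_X^2 + \frac{1}{2c}\,|\lambda_{k+1} - \bar\lambda|_H^2 \le \frac{1}{2c}\,|\lambda_k - \bar\lambda|_H^2 \tag{4.6.5}$$

and

$$\sum_{k=1}^{\infty} \frac{\sigma}{2}\,|x_k - \bar x|_X^2 \le \frac{1}{2c}\,|\lambda_1 - \bar\lambda|_H^2, \tag{4.6.6}$$

which implies that $|x_k - \bar x|_X \to 0$ as $k \to \infty$.

Proof. It follows from Theorem 4.45 that $\bar\lambda = \varphi_c'(\bar v, \bar\lambda)$, where $\bar v = \Lambda\bar x$. Next, we establish (4.6.5). From (4.5.5) and Step 3

$$\langle f'(x_k), \bar x - x_k\rangle + \bigl(\lambda_{k+1}, \Lambda(\bar x - x_k)\bigr) \ge 0$$

and from (4.5.7)

$$\langle f'(\bar x), x_k - \bar x\rangle + \bigl(\bar\lambda, \Lambda(x_k - \bar x)\bigr) \ge 0.$$

Adding these two inequalities, we obtain

$$\langle f'(x_k) - f'(\bar x), x_k - \bar x\rangle + \bigl(\lambda_{k+1} - \bar\lambda, \Lambda(x_k - \bar x)\bigr) \le 0. \tag{4.6.7}$$

From Theorems 4.39 and 4.40

$$\lambda_{k+1} - \bar\lambda = A_{1/c}\Bigl(v_k + \frac{\lambda_k}{c}\Bigr) - A_{1/c}\Bigl(\bar v + \frac{\bar\lambda}{c}\Bigr),$$

where $v_k = \Lambda x_k$ and $\bar v = \Lambda\bar x$. From the definition of $A_\mu$ we have

$$(A_\mu v - A_\mu \hat v, v - \hat v) = \mu\,|A_\mu v - A_\mu \hat v|^2 + (A_\mu v - A_\mu \hat v, J_\mu v - J_\mu \hat v) \ge \mu\,|A_\mu v - A_\mu \hat v|^2$$

for $\mu > 0$ and $v, \hat v \in H$, since $A_\mu v \in A J_\mu v$ and $A$ is monotone. Thus,

$$\begin{aligned}
(\lambda_{k+1} - \bar\lambda, v_k - \bar v) &= \Bigl(\lambda_{k+1} - \bar\lambda,\, v_k + \frac{\lambda_k}{c} - \bar v - \frac{\bar\lambda}{c}\Bigr) - \frac{1}{c}(\lambda_{k+1} - \bar\lambda, \lambda_k - \bar\lambda) \\
&\ge \frac{1}{c}\,|\lambda_{k+1} - \bar\lambda|^2 - \frac{1}{c}(\lambda_{k+1} - \bar\lambda, \lambda_k - \bar\lambda) \\
&\ge \frac{1}{2c}\,|\lambda_{k+1} - \bar\lambda|^2 - \frac{1}{2c}\,|\lambda_k - \bar\lambda|^2 .
\end{aligned}$$

Hence, (4.6.5) follows from (A2) and (4.6.7). Summing up (4.6.5) with respect to $k$, we obtain (4.6.6).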
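The estimate (4.6.5) can be observed numerically on a small example. The following is only a sketch under stated assumptions, not the authors' implementation: it takes $X = \mathbb{R}^2$, $H = \mathbb{R}$, $C = X$, $f(x) = \frac12 x^T Q x - b^T x$, $\Lambda = A$, and $\varphi$ the indicator function of $\{0\}$ (the equality constraint $Ax = 0$), so that Step 2 reduces to a linear solve and Step 3 to $\lambda_{k+1} = \lambda_k + c\,A x_k$. The matrices `Q`, `A`, the vector `b`, and the function name are hypothetical test data.

```python
import numpy as np

def augmented_lagrangian_eq(Q, b, A, c, lam0, n_iter):
    """Augmented Lagrangian iteration for min 1/2 x'Qx - b'x subject to Ax = 0.

    Returns a list of triples (x_k, lam_k, lam_{k+1}).
    """
    lam = lam0.copy()
    history = []
    for _ in range(n_iter):
        # Step 2: minimize f(x) + (lam, Ax) + c/2 |Ax|^2, a linear system here
        x = np.linalg.solve(Q + c * A.T @ A, b - A.T @ lam)
        # Step 3: multiplier update lam_{k+1} = lam_k + c * A x_k
        lam_new = lam + c * (A @ x)
        history.append((x, lam, lam_new))
        lam = lam_new
    return history

# Hypothetical test data (assumptions, not taken from the text)
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -1.0])
A = np.array([[1.0, 1.0]])
c = 5.0

# Reference pair (x_bar, lam_bar) from the KKT system
kkt = np.block([[Q, A.T], [A, np.zeros((1, 1))]])
sol = np.linalg.solve(kkt, np.concatenate([b, np.zeros(1)]))
x_bar, lam_bar = sol[:2], sol[2:]
sigma = np.linalg.eigvalsh(Q).min()  # strong monotonicity constant of f'

history = augmented_lagrangian_eq(Q, b, A, c, np.zeros(1), 20)

# Check (4.6.5) at every iteration, up to a small rounding tolerance
estimate_holds = all(
    0.5 * sigma * np.sum((x - x_bar) ** 2) + np.sum((ln - lam_bar) ** 2) / (2 * c)
    <= np.sum((l - lam_bar) ** 2) / (2 * c) + 1e-12
    for x, l, ln in history
)
x_final = history[-1][0]
```

In this setting the multiplier error is nonincreasing for every $c > 0$, which is the unconditional convergence asserted by Theorem 4.48.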
The duality (Uzawa) method is an iterative method for $\lambda_k \in H$. It is given by

$$\lambda_{k+1} = \varphi_c'(\Lambda x_k, \lambda_k), \tag{4.6.8}$$

where $x_k \in C$ solves

$$\langle f'(x_k) + \Lambda^*\lambda_k, x - x_k\rangle \ge 0 \quad \text{for all } x \in C. \tag{4.6.9}$$

It will be shown that the Uzawa method is conditionally convergent in the sense that there exist $0 < \underline{c} < \bar c$ such that it converges for $c \in [\underline{c}, \bar c]$. On the other hand, the augmented Lagrangian method can be written as

$$\lambda_{k+1} = \varphi_c'(\Lambda x_k, \lambda_k),$$

where $x_k \in C$ satisfies

$$\langle f'(x_k) + \Lambda^*\lambda_{k+1}, x - x_k\rangle \ge 0 \quad \text{for all } x \in C.$$

Note that the Uzawa method is explicit with respect to $\lambda$ while the augmented Lagrangian method is implicit and converges unconditionally (see Theorem 4.48) with respect to $c$.

Theorem 4.49 (Uzawa Algorithm). Assume that (A1)–(A3) hold and that there exists $\bar\lambda \in \partial\varphi(\Lambda\bar x)$ such that (4.5.7) is satisfied. Then there exists $\bar c$ such that for the sequence $(x_k, \lambda_k)$ generated by (4.6.8)–(4.6.9) we have $|x_k - \bar x|_X \to 0$ as $k \to \infty$ if $0 < c \le \bar c$.

Proof. As shown in the proof of Theorem 4.48 it follows from (4.5.7) and (4.6.9) that

$$\langle f'(x_k) - f'(\bar x), x_k - \bar x\rangle + \bigl(\lambda_k - \bar\lambda, \Lambda(x_k - \bar x)\bigr) \le 0. \tag{4.6.10}$$

Since

$$\lambda_{k+1} - \bar\lambda = A_{1/c}\Bigl(v_k + \frac{\lambda_k}{c}\Bigr) - A_{1/c}\Bigl(\bar v + \frac{\bar\lambda}{c}\Bigr),$$

where $v_k = \Lambda x_k$ and $\bar v = \Lambda\bar x$, and since $A_{1/c}$ is Lipschitz continuous with Lipschitz constant $c$,

$$\begin{aligned}
|\lambda_{k+1} - \bar\lambda|^2 &\le |\lambda_k - \bar\lambda + c\,(v_k - \bar v)|^2 \\
&= |\lambda_k - \bar\lambda|^2 + 2c\,(\lambda_k - \bar\lambda, v_k - \bar v) + c^2\,|v_k - \bar v|^2 \\
&\le |\lambda_k - \bar\lambda|^2 - (2c\sigma - c^2\|\Lambda\|^2)\,|x_k - \bar x|^2,
\end{aligned}$$

where (4.6.10) was used for the last inequality. Choose $\bar c < 2\sigma\,(\|\Lambda\|^2)^{-1}$. Then $\beta = 2c\sigma - c^2\|\Lambda\|^2 > 0$ if $c \in (0, \bar c]$ and

$$\beta \sum_{k=0}^{\infty} |x_k - \bar x|^2 \le |\lambda_0 - \bar\lambda|^2 .$$

Consequently $|x_k - \bar x| \to 0$ as $k \to \infty$.
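The explicit character of the Uzawa update and the step-size restriction can be illustrated in finite dimensions. The sketch below rests on assumptions that are not part of the text: $X = \mathbb{R}^2$, $C = X$, $f(x) = \frac12 x^T Q x - b^T x$, $\Lambda = A$, and $\varphi$ the indicator function of $\{v \le 0\}$, for which $\varphi_c'(v, \lambda) = \max(0, \lambda + c\,v)$. The data and the function name `uzawa_inequality` are hypothetical.

```python
import numpy as np

def uzawa_inequality(Q, b, A, c, n_iter):
    """Uzawa iteration for min 1/2 x'Qx - b'x subject to Ax <= 0 (componentwise)."""
    lam = np.zeros(A.shape[0])
    x = np.zeros(Q.shape[0])
    for _ in range(n_iter):
        # (4.6.9) with C = X: solve f'(x_k) + A' lam_k = 0 for x_k
        x = np.linalg.solve(Q, b - A.T @ lam)
        # (4.6.8): explicit multiplier update, projection onto {w >= 0}
        lam = np.maximum(0.0, lam + c * (A @ x))
    return x, lam

# Hypothetical test data (assumptions, not taken from the text)
Q = np.diag([1.0, 2.0])
b = np.array([2.0, -1.0])
A = np.array([[1.0, 1.0]])

sigma = np.linalg.eigvalsh(Q).min()   # monotonicity constant of f'
norm_A = np.linalg.norm(A, 2)         # operator norm of Lambda
c = 0.5 * (2.0 * sigma / norm_A**2)   # half of the bound 2*sigma/||Lambda||^2

x, lam = uzawa_inequality(Q, b, A, c, 60)
```

For this data the constraint is active at the solution, and the iterates approach $x = (1, -1)$ with multiplier $\lambda = 1$. The bound $c < 2\sigma/\|\Lambda\|^2$ is sufficient for convergence; it is not claimed to be sharp.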
4.7 Applications
In this section we discuss applications of the results in Sections 4.5 and 4.6. In many cases the conjugate functional $\varphi^*$ is given by $\varphi^*(v) = \psi_{K^*}(v)$, where $K^*$ is a closed convex set in $H$ and $\psi_S$ is the indicator function of a set $S$, i.e.,

$$\psi_S(x) = \begin{cases} 0 & \text{if } x \in S, \\ \infty & \text{if } x \notin S. \end{cases}$$

Then it follows from Theorem 4.40 that for $v, \lambda \in H$

$$\varphi_c(v, \lambda) = \sup_{y^* \in K^*} \Bigl\{ -\frac{1}{2c}\,|y^* - (\lambda + c\,v)|_H^2 \Bigr\} + \frac{1}{2c}\bigl(|\lambda + c\,v|_H^2 - |\lambda|_H^2\bigr). \tag{4.7.1}$$

Hence the supremum is attained at $\lambda_c(v, \lambda) = \mathrm{Proj}_{K^*}(\lambda + c\,v)$, where $\mathrm{Proj}_{K^*}(\phi)$ denotes the projection of $\phi \in H$ onto $K^*$. This implies that the update in Step 3 of the augmented Lagrangian algorithm is given by

$$\lambda_{k+1} = \mathrm{Proj}_{K^*}(\lambda_k + c\,\Lambda x_k). \tag{4.7.2}$$
Example 4.50. For the equality constraint we have $\varphi(v) = \psi_K(v)$ with $K = \{0\}$ and $\varphi^*(w) = 0$ for $w \in H$. Thus

$$\varphi_c(v, \lambda) = \frac{1}{2c}\bigl(|\lambda + c\,v|_H^2 - |\lambda|_H^2\bigr)$$

and $\lambda_{k+1} = \lambda_k + c\,\Lambda x_k$, which coincides with the first order augmented Lagrangian update for equality constraints discussed in Chapter 3.

Example 4.51. If $\varphi(v) = \psi_K(v)$, where $K$ is a closed convex cone with vertex at the origin in $H$, then $\varphi^* = \psi_{K^+}$, where $K^+ = \{w \in H : (w, v)_H \le 0 \text{ for all } v \in K\}$ is the dual cone of $K$. In particular, if $K = \{v \in L^2(\Omega) : v \le 0 \text{ a.e.}\}$, then $K^+ = \{w \in L^2(\Omega) : w \ge 0 \text{ a.e.}\}$. Thus for the inequality constraint in $H = L^2(\Omega)$ the update (4.7.2) becomes

$$\lambda_{k+1} = \max(0, \lambda_k + c\,\Lambda x_k),$$

where the max operation is defined pointwise in $\Omega$. Here $L^2(\Omega)$ denotes the space of scalar-valued square integrable functions over a domain $\Omega$ in $\mathbb{R}^n$. For $K = \{v \in \mathbb{R}^n : v_i \le 0 \text{ for all } i\}$ the update (4.7.2) is given by

$$\lambda_{k+1} = \max(0, \lambda_k + c\,\Lambda x_k),$$

where the maximum is taken coordinatewise. It coincides with the first order augmented Lagrangian update for finite rank inequality constraints in Chapter 3.
Example 4.52. If $\varphi(v) = |v|_H$, then $\varphi^* = \psi_B$, where $B$ is the closed unit ball in $H$. Thus the update (4.7.2) becomes

$$\lambda_{k+1} = \frac{\lambda_k + c\,\Lambda x_k}{\max(1, |\lambda_k + c\,\Lambda x_k|)} .$$
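The three multiplier updates of Examples 4.50–4.52 are all instances of the projection formula (4.7.2). As an illustration only, they can be written for $H = \mathbb{R}^n$ as follows; the function names and the sample vectors are hypothetical, and `v` stands for $\Lambda x_k$.

```python
import numpy as np

def update_equality(lam, v, c):
    """Example 4.50: K = {0}, so K* = H and the projection is the identity."""
    return lam + c * v

def update_inequality(lam, v, c):
    """Example 4.51: K = {v <= 0}, projection onto the cone {w >= 0}."""
    return np.maximum(0.0, lam + c * v)

def update_norm(lam, v, c):
    """Example 4.52: phi = |.|_H, projection onto the closed unit ball of H."""
    q = lam + c * v
    return q / max(1.0, np.linalg.norm(q))

# Hypothetical sample data
lam = np.array([0.5, -1.0, 0.0])
v = np.array([1.0, 0.25, -2.0])
c = 2.0

lam_eq = update_equality(lam, v, c)
lam_ineq = update_inequality(lam, v, c)
lam_norm = update_norm(lam, v, c)
```

Note that `update_norm` uses the norm of the whole vector, in line with the unit ball of $H$ in Example 4.52; pointwise variants appear in the applications below.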
Example 4.53. We consider the bilateral inequality constraint $\phi \le v \le \psi$ in $L^2(\Omega)$. We set

$$K = \Bigl\{ v \in L^2(\Omega) : -\frac{\psi - \phi}{2} \le v - \frac{\phi + \psi}{2} \le \frac{\psi - \phi}{2} \Bigr\}$$

and $\varphi = \psi_K$. It follows that

$$\varphi^*(w) = \Bigl(w, \frac{\phi + \psi}{2}\Bigr)_{L^2} + \Bigl(\frac{\psi - \phi}{2}, |w|\Bigr)_{L^2} .$$

From Theorem 4.40 we conclude that the expression $(\Lambda x, w) - \varphi^*(w) - \frac{1}{2c}\,|w - \lambda|^2$ is maximized with respect to $w$ at the element $y^*$ satisfying

$$\Lambda x - \frac{\phi + \psi}{2} - \frac{y^* - \lambda}{c} \in \frac{\psi - \phi}{2}\,\partial(|\cdot|)(y^*)$$

a.e. in $\Omega$. Thus it follows from Theorem 4.40 that the complementarity condition (4.1.8) is given by

$$\bar\lambda = \max(0, \bar\lambda + c\,(\Lambda\bar x - \psi)) + \min(0, \bar\lambda + c\,(\Lambda\bar x - \phi)),$$

and the Lagrange multiplier update in Step 3 of the augmented Lagrangian method is

$$\lambda_{k+1} = \max(0, \lambda_k + c\,(\Lambda x_k - \psi)) + \min(0, \lambda_k + c\,(\Lambda x_k - \phi)),$$

where the max and min operations are defined pointwise a.e. in $\Omega$. Note that with obvious modifications, $\Lambda x$ in Sections 4.5 and 4.6 can be replaced by the affine function of the form $\Lambda x + a$ with $a \in H$.
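The bilateral update of Example 4.53 can be checked against the maximization problem it comes from. The sketch below assumes pointwise scalar data with constant bounds; the names `bilateral_update` and `dual_objective` and the sample values are hypothetical. It evaluates the pointwise objective $w\,(v - \frac{\phi+\psi}{2}) - \frac{\psi-\phi}{2}|w| - \frac{1}{2c}(w - \lambda)^2$ on a grid and compares it with its value at the update.

```python
import numpy as np

def bilateral_update(lam, v, lower, upper, c):
    """Multiplier update for the constraint lower <= v <= upper (pointwise)."""
    return (np.maximum(0.0, lam + c * (v - upper))
            + np.minimum(0.0, lam + c * (v - lower)))

def dual_objective(w, lam, v, lower, upper, c):
    """Pointwise expression maximized by the update (see Example 4.53)."""
    mid = 0.5 * (lower + upper)
    half_width = 0.5 * (upper - lower)
    return w * (v - mid) - half_width * np.abs(w) - (w - lam) ** 2 / (2.0 * c)

# Hypothetical sample data: one point below, inside, and above [lower, upper]
v = np.array([-2.0, 0.5, 3.0])
lam = np.zeros(3)
lower, upper, c = 0.0, 1.0, 2.0

lam_new = bilateral_update(lam, v, lower, upper, c)

# The update should be at least as good as any grid value of w
grid = np.linspace(-10.0, 10.0, 2001)
is_maximizer = all(
    dual_objective(lam_new[i], lam[i], v[i], lower, upper, c)
    >= dual_objective(grid, lam[i], v[i], lower, upper, c).max() - 1e-9
    for i in range(3)
)
```

Since $\phi \le \psi$, at most one of the two terms in the update is nonzero at each point, which is why the max/min form agrees with a three-way case distinction.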
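4.7.1 Bingham flow
We consider the problem

$$\min \int_\Omega \Bigl(\frac{\mu}{2}\,|\nabla u|^2 - \tilde f\,u\Bigr)\,dx + g \int_\Omega |\nabla u|\,dx \quad \text{over } u \in H_0^1(\Omega), \tag{4.7.3}$$

where $\Omega$ is a bounded open set in $\mathbb{R}^2$ with Lipschitz boundary, $g$ and $\mu$ are positive constants, and $\tilde f \in L^2(\Omega)$. For a discussion of (4.7.3) we refer the reader to [Glo, GLT] and the references therein. In the context of the general theory of Section 4.5 we choose

$$X = H_0^1(\Omega), \quad H = L^2(\Omega) \times L^2(\Omega), \quad \text{and} \quad \Lambda = g\,\mathrm{grad},$$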
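K = X, and define $f : X \to \mathbb{R}$ and $\varphi : H \to \mathbb{R}$ by

$$f(u) = \int_\Omega \Bigl(\frac{\mu}{2}\,|\nabla u|^2 - \tilde f\,u\Bigr)\,dx \quad \text{and} \quad \varphi(v_1, v_2) = \int_\Omega \sqrt{v_1^2 + v_2^2}\;dx.$$

Conditions (A1)–(A3) are clearly satisfied. Since $\mathrm{dom}(\varphi) = H$ it follows from Theorem 4.45 and Lemma 4.44 that there exists $\bar\lambda$ such that (4.5.7) holds. Moreover, it is not difficult to show that $\varphi^*(v) = \psi_{K^*}(v)$, where $K^*$ is given by

$$K^* = \{v \in H : |v(x)|_{\mathbb{R}^2} \le 1 \text{ a.e. in } \Omega\}.$$

Hence it follows from (4.7.2) that Steps 2 and 3 in the augmented Lagrangian method are given by

$$-\mu\,\Delta u_k - g\,\mathrm{div}\,\lambda_{k+1} = \tilde f, \tag{4.7.4}$$

where

$$\lambda_{k+1} = \begin{cases} \lambda_k + c\,\nabla u_k & \text{on } A_k = \{x : |\lambda_k(x) + c\,\nabla u_k(x)|_{\mathbb{R}^2} \le 1\}, \\[4pt] \dfrac{\lambda_k + c\,\nabla u_k}{|\lambda_k + c\,\nabla u_k|_{\mathbb{R}^2}} & \text{on } \Omega \setminus A_k. \end{cases} \tag{4.7.5}$$

4.7.2 Image restoration
For the image restoration problem introduced in (4.1.2) the analysis is similar to the one for the Bingham flow and Steps 2 and 3 in the augmented Lagrangian method are given by

$$-\mu\,\Delta u_k + \mathcal{K}^*(\mathcal{K}u_k - z) + g\,\mathrm{div}\,\lambda_{k+1} = 0, \qquad \mu\,\frac{\partial u}{\partial n} + g\,\lambda_{k+1} = 0 \text{ on } \Gamma,$$

where

$$\lambda_{k+1} = \begin{cases} \lambda_k + c\,\nabla u_k & \text{on } A_k = \{x : |\lambda_k(x) + c\,\nabla u_k(x)|_{\mathbb{R}^2} \le 1\}, \\[4pt] \dfrac{\lambda_k + c\,\nabla u_k}{|\lambda_k + c\,\nabla u_k|_{\mathbb{R}^2}} & \text{on } \Omega \setminus A_k \end{cases}$$

in the strong form. Here $\mathcal{K} : L^2(\Omega) \to L^2(\Omega)$ denotes the convolution operator defined by

$$(\mathcal{K}u)(x) = \int_\Omega k(x, s)\,u(s)\,ds.$$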
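The multiplier update shared by the Bingham flow and the image restoration problem is a pointwise projection of a vector field onto the unit ball of $\mathbb{R}^2$. A minimal sketch, assuming the fields are sampled at finitely many points and stored as arrays of shape `(n_points, 2)`; the function name and the sample values are hypothetical.

```python
import numpy as np

def pointwise_unit_ball_update(lam, grad_u, c):
    """Pointwise update: lam + c*grad_u where its Euclidean norm is <= 1,
    normalized to unit length elsewhere. Inputs have shape (n_points, 2)."""
    q = lam + c * grad_u
    norms = np.linalg.norm(q, axis=1, keepdims=True)
    # Dividing by max(1, |q|) covers both branches of the case distinction
    return q / np.maximum(1.0, norms)

# Hypothetical sample: first point stays in the ball, second is projected
lam = np.zeros((2, 2))
grad_u = np.array([[0.1, 0.2], [3.0, 4.0]])
lam_new = pointwise_unit_ball_update(lam, grad_u, c=1.0)
```

Writing the update as a division by $\max(1, |\cdot|)$ avoids forming the set $A_k$ explicitly.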
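4.7.3 Elastoplastic problem
We consider the problem

$$\min \int_\Omega \Bigl(\frac{\beta}{2}\,|\nabla u|^2 - \tilde f\,u\Bigr)\,dx \quad \text{over } u \in H_0^1(\Omega) \quad \text{subject to } |\nabla u| \le 1 \text{ a.e. in } \Omega, \tag{4.7.6}$$

where $\beta$ is a positive constant, $\Omega$ is a bounded domain in $\mathbb{R}^2$, and $\tilde f \in L^2(\Omega)$. In the context of the general theory we choose

$$X = H_0^1(\Omega), \quad H = L^2(\Omega) \times L^2(\Omega), \quad \text{and} \quad \Lambda = \nabla,$$

K = X, and define $f : X \to \mathbb{R}$ and $\varphi : H \to \mathbb{R}$ by

$$f(u) = \int_\Omega \Bigl(\frac{\beta}{2}\,|\nabla u|^2 - \tilde f\,u\Bigr)\,dx \quad \text{and} \quad \varphi = \psi_{\hat C},$$

where $\hat C$ is the closed convex set defined by $\hat C = \{v \in H : |v|_{\mathbb{R}^2} \le 1 \text{ a.e. in } \Omega\}$. Then

$$\varphi^*(w) = \int_\Omega |w|_{\mathbb{R}^2}\,dx.$$

The maximum of (4.4.8), i.e., of $(\nabla u, w) - \varphi^*(w) - \frac{1}{2c}\,|w - \lambda|^2$ with respect to $w \in L^2(\Omega) \times L^2(\Omega)$, is attained at $y^*$ such that

$$\lambda + c\,\nabla u = y^* + c\,\mu,$$

where $|\mu| \le 1$ and $\mu \cdot y^* = |y^*|$ a.e. in $\Omega$. It thus follows from Theorem 4.40 that the Lagrange multiplier update is given by

$$\lambda_{k+1} = c\,\max\Bigl(0, \Bigl|\frac{\lambda_k}{c} + \nabla u_k\Bigr| - 1\Bigr)\,\frac{\lambda_k + c\,\nabla u_k}{|\lambda_k + c\,\nabla u_k|}$$

a.e. in $\Omega$. The existence of a Lagrange multiplier $\bar\lambda \in L^\infty(\Omega)$ for the inequality constraint in (4.7.6) is shown in [Bre2] for $\tilde f = 1$. In general, existence is still an open problem.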
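4.7.4 Obstacle problem
We consider the problem

$$\min \int_\Omega \Bigl(\frac{1}{2}\,|\nabla u|^2 - \tilde f\,u\Bigr)\,dx \quad \text{over } u \in H_0^1(\Omega) \quad \text{subject to } \phi \le u \le \psi \text{ a.e. in } \Omega. \tag{4.7.7}$$

In the context of the general theory we choose

$$X = H_0^1(\Omega), \quad H = L^2(\Omega), \quad \text{and} \quad \Lambda = \text{the natural injection},$$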
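$C = X$, and define $f : X \to \mathbb{R}$ and $\varphi : H \to \mathbb{R}$ by

$$f(u) = \int_\Omega \Bigl(\frac{1}{2}\,|\nabla u|^2 - \tilde f\,u\Bigr)\,dx \quad \text{and} \quad \varphi(v) = \psi_K(v),$$

where $K$ is the closed convex set defined by $K = \{v \in H : \phi \le v \le \psi \text{ a.e. in } \Omega\}$. For the one-sided constraint $u \le \psi$ (i.e., $\phi = -\infty$) it is shown in [IK4], for example, that there exists a unique $\bar\lambda \in \partial\varphi(\bar u)$ such that (4.5.7) is satisfied provided that $\psi \in H^1(\Omega)$, $\psi|_\Gamma \ge 0$, and $\sup(0, \tilde f + \Delta\psi) \in L^2(\Omega)$. In fact, the optimal pair $(\bar u, \bar\lambda) \in (H^2 \cap H_0^1) \times L^2$ satisfies

$$-\Delta\bar u + \bar\lambda = \tilde f, \qquad \bar\lambda = \max(0, \bar\lambda + c\,(\bar u - \psi)). \tag{4.7.8}$$

In this case Steps 2 and 3 in the augmented Lagrangian method are given by

$$-\Delta u_k + \lambda_{k+1} = \tilde f, \qquad \lambda_{k+1} = \max(0, \lambda_k + c\,(u_k - \psi)).$$

For the two-sided constraint case we assume that $\phi, \psi \in H^1(\Omega)$ satisfy

$$\phi \le 0 \le \psi \text{ on } \Gamma, \qquad \max(0, \Delta\psi + \tilde f),\ \min(0, \Delta\phi + \tilde f) \in L^2(\Omega), \tag{4.7.9}$$

$$S_1 = \{x \in \Omega : \Delta\psi + \tilde f > 0\} \cap S_2 = \{x \in \Omega : \Delta\phi + \tilde f < 0\} = \emptyset, \tag{4.7.10}$$

and that there exists a $c_0 > 0$ such that

$$-\Delta(\psi - \phi) + c_0\,(\psi - \phi) \ge 0 \quad \text{a.e. in } \Omega. \tag{4.7.11}$$

Let $\hat\lambda \in H$ be defined by

$$\hat\lambda(x) = \begin{cases} \Delta\psi(x) + \tilde f(x), & x \in S_1, \\ \Delta\phi(x) + \tilde f(x), & x \in S_2, \\ 0 & \text{otherwise.} \end{cases} \tag{4.7.12}$$

Employing the regularization procedure of (4.5.4) we find the following theorem.

Theorem 4.54. Assume that $\phi, \psi \in H^1(\Omega)$ satisfy (4.7.9)–(4.7.12) where $\hat\lambda$ is defined in (4.7.12). Then (4.5.5) is given by

$$-\Delta u_c + \lambda_c = \tilde f, \qquad \lambda_c = \begin{cases} \hat\lambda + c\,(u_c - \psi) & \text{if } \hat\lambda + c\,(u_c - \psi) > 0, \\ \hat\lambda + c\,(u_c - \phi) & \text{if } \hat\lambda + c\,(u_c - \phi) < 0, \\ 0 & \text{otherwise,} \end{cases} \tag{4.7.13}$$

and $\phi \le u_c \le \psi$, $|\lambda_c| \le |\hat\lambda|$ a.e. in $\Omega$ for $c \ge c_0$. Moreover, as $c \to \infty$, $u_c \rightharpoonup \bar u$ weakly in $H^2(\Omega)$ and $\lambda_c \rightharpoonup \bar\lambda$ weakly in $L^2(\Omega)$, where $\bar u$ is the solution of (4.7.7) and $(\bar u, \bar\lambda)$ satisfies the necessary and sufficient optimality condition

$$-\Delta\bar u + \bar\lambda = \tilde f, \qquad \bar\lambda = \begin{cases} \bar\lambda + c\,(\bar u - \psi) & \text{if } \bar\lambda + c\,(\bar u - \psi) > 0, \\ \bar\lambda + c\,(\bar u - \phi) & \text{if } \bar\lambda + c\,(\bar u - \phi) < 0, \\ 0 & \text{otherwise.} \end{cases}$$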
Proof. From Theorem 4.40 it follows that

$$\lambda_c(u, \hat\lambda) = \begin{cases} \hat\lambda + c\,(u - \psi) & \text{if } \hat\lambda + c\,(u - \psi) > 0, \\ \hat\lambda + c\,(u - \phi) & \text{if } \hat\lambda + c\,(u - \phi) < 0, \\ 0 & \text{otherwise.} \end{cases} \tag{4.7.14}$$

We note that (4.5.4) has a unique solution $u_c \in H_0^1(\Omega)$. From (4.5.5)–(4.5.6) we deduce that $u_c$ satisfies (4.7.13). Since $\hat\lambda \in L^2(\Omega)$ it follows that $\lambda_c(u_c, \hat\lambda) \in L^2(\Omega)$ and thus $u_c \in H^2(\Omega)$. Let $\eta = \sup(0, u_c - \psi)$. Then $\eta \in H_0^1(\Omega)$. Hence, we have

$$(\nabla(u_c - \psi), \nabla\eta) + (-(\Delta\psi + \tilde f) + \lambda_c, \eta) = 0.$$

For $x \in \Omega$ assume that $u_c(x) \ge \psi(x)$. If $\hat\lambda(x) > 0$, then $-(\Delta\psi + \tilde f) + \lambda_c = c\,(u_c - \psi) \ge 0$. If $\hat\lambda(x) = 0$, then $-(\Delta\psi + \tilde f) + \lambda_c \ge c\,(u_c - \psi) \ge 0$. If $\hat\lambda(x) < 0$, then $-(\Delta\psi + \tilde f) + \lambda_c \ge -\Delta(\psi - \phi) + c\,(\psi - \phi) \ge 0$ for $c \ge c_0$. Thus, we have $(-(\Delta\psi + \tilde f) + \lambda_c, \eta) \ge 0$ and $|\nabla\eta|^2 = 0$. This implies that $\eta = 0$ and $u_c \le \psi$ a.e. in $\Omega$. Similarly, we can prove that $u_c \ge \phi$ a.e. in $\Omega$ by choosing the test function $\eta = \inf(0, u_c - \phi) \in H_0^1(\Omega)$. Moreover, it follows from (4.7.14) that $|\lambda_c| = |\lambda_c(u_c, \hat\lambda)| \le |\hat\lambda|$ a.e. in $\Omega$ and $|\lambda_c|_{L^2}$ is uniformly bounded. Thus, there exists a weakly convergent subsequence $(u_{\hat c}, \lambda_{\hat c}) \rightharpoonup (\bar u, \bar\lambda)$ in $H^2(\Omega) \times L^2(\Omega)$. Moreover the subsequence $u_{\hat c}$ converges strongly to $\bar u$ in $H_0^1(\Omega)$ and, as shown in the proof of Theorem 4.46,

$$-\Delta\bar u + \bar\lambda = \tilde f \quad \text{and} \quad \bar\lambda \in \partial\varphi(\bar u). \tag{4.7.15}$$

Hence, it follows from Theorem 4.43 that $\bar u$ minimizes (4.7.7). Since the solution to (4.7.7) is unique it follows from (4.7.15) that $\bar\lambda \in L^2(\Omega)$ is unique. The theorem now follows from Theorem 4.45.

From Theorem 4.54 it follows that Steps 2 and 3 in the augmented Lagrangian method for the two-sided constraint are given by

$$-\Delta u_k + \lambda_{k+1} = \tilde f, \qquad \lambda_{k+1} = \begin{cases} \lambda_k + c\,(u_k - \psi) & \text{if } \lambda_k + c\,(u_k - \psi) > 0, \\ \lambda_k + c\,(u_k - \phi) & \text{if } \lambda_k + c\,(u_k - \phi) < 0, \\ 0 & \text{otherwise.} \end{cases}$$

Note that the last equality can be equivalently expressed as

$$\lambda_{k+1} = \max(0, \lambda_k + c\,(u_k - \psi)) + \min(0, \lambda_k + c\,(u_k - \phi)).$$
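For the one-sided obstacle problem the coupled Steps 2 and 3 can be tried out on a one-dimensional finite difference discretization. The sketch below is an illustration under assumptions that do not come from the text: $\Omega = (0, 1)$, constant data $\tilde f = 8$ and $\psi = 0.5$, a uniform grid with 49 interior nodes, and $c = 10^4$. Because $\lambda_{k+1}$ appears in both equations, the sketch solves Step 2 by an inner active set iteration; this inner solver is a choice made here, not a method described in this section, and the name `solve_step2` is hypothetical.

```python
import numpy as np

def solve_step2(A, f, psi, lam, c, u, max_inner):
    """Solve A u + max(0, lam + c (u - psi)) = f by an active set iteration."""
    active = lam + c * (u - psi) > 0
    for _ in range(max_inner):
        # On the guessed active set the max is replaced by its argument
        D = np.diag(active.astype(float))
        u = np.linalg.solve(A + c * D, f - active * (lam - c * psi))
        new_active = lam + c * (u - psi) > 0
        if np.array_equal(new_active, active):
            break
        active = new_active
    return u

# Assumed discretization and data (not from the text)
n = 49
h = 1.0 / (n + 1)
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2  # -d^2/dx^2
f = np.full(n, 8.0)
psi = np.full(n, 0.5)
c = 1.0e4

u = np.linalg.solve(A, f)   # unconstrained solution, violates u <= psi
lam = np.zeros(n)
for k in range(60):
    u = solve_step2(A, f, psi, lam, c, u, max_inner=n + 2)   # Step 2
    lam = np.maximum(0.0, lam + c * (u - psi))               # Step 3

residual = np.linalg.norm(A @ u + lam - f)
```

With this data the unconstrained solution $4x(1 - x)$ exceeds the obstacle around the midpoint, so a contact zone forms there; on its interior the discrete multiplier equals $\tilde f = 8$, in line with $-\Delta\bar u + \bar\lambda = \tilde f$ and $\bar u = \psi$.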
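4.7.5 Signorini problem
Consider the Signorini problem

$$\min \int_\Omega \Bigl(\frac{1}{2}\bigl(|\nabla u|^2 + |u|^2\bigr) - \tilde f\,u\Bigr)\,dx \quad \text{over } u \in H^1(\Omega) \quad \text{subject to } u \ge 0 \text{ on the boundary } \Gamma, \tag{4.7.16}$$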
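which is a simplified version of a contact problem arising in elasticity theory. In this case we choose

$$X = H^1(\Omega), \quad H = L^2(\Gamma), \quad \text{and} \quad \Lambda = \text{the trace operator on the boundary } \Gamma,$$

$C = X$, and define $f : X \to \mathbb{R}$ and $\varphi : H \to \mathbb{R}$ by

$$f(u) = \int_\Omega \Bigl(\frac{1}{2}\bigl(|\nabla u|^2 + |u|^2\bigr) - \tilde f\,u\Bigr)\,dx \quad \text{and} \quad \varphi(v) = \psi_K(v),$$

where $K$ is the closed convex set defined by $K = \{v \in L^2(\Gamma) : v \ge 0 \text{ a.e. in } \Gamma\}$. If $\tilde f \in L^2(\Omega)$, then the unique minimizer satisfies $\bar u \in H^2(\Omega)$ [Bre2], and it is shown in [Glo] that for $\bar\lambda = \frac{\partial}{\partial n}\bar u$ the pair $(\bar u, \bar\lambda)$ satisfies (4.5.7). Note that $\mathrm{range}(\Lambda)$ is dense in $H$ and $\Lambda$ is compact. It thus follows from Theorems 4.43 and 4.46 that (4.5.7) has a unique solution and that $u_c \to \bar u$ strongly in $X$ and $\lambda_c \to \bar\lambda$ weakly in $H$. Steps 2 and 3 in the augmented Lagrangian method are given by

$$-\Delta u_k + u_k = \tilde f, \qquad \frac{\partial}{\partial n}u_k - \lambda_{k+1} = 0 \text{ on } \Gamma,$$

$$\lambda_{k+1} = \max(0, \lambda_k - c\,u_k) \quad \text{on } \Gamma.$$

4.7.6 Friction problem
Consider a simplified friction problem from elasticity theory

$$\min \int_\Omega \Bigl(\frac{1}{2}\bigl(|\nabla u|^2 + |u|^2\bigr) - \tilde f\,u\Bigr)\,dx + g \int_\Gamma |u|\,ds \quad \text{over } u \in H^1(\Omega), \tag{4.7.17}$$

where $\Omega$ is a bounded open domain of $\mathbb{R}^2$ with sufficiently smooth boundary $\Gamma$. In this case we choose

$$X = H^1(\Omega), \quad H = L^2(\Gamma), \quad \text{and} \quad \Lambda = \text{the trace operator on the boundary } \Gamma,$$

$C = X$, and define $f : X \to \mathbb{R}$ and $\varphi : H \to \mathbb{R}$ by

$$f(u) = \int_\Omega \Bigl(\frac{1}{2}\bigl(|\nabla u|^2 + |u|^2\bigr) - \tilde f\,u\Bigr)\,dx \quad \text{and} \quad \varphi(v) = g \int_\Gamma |v|\,ds.$$

Note that $\varphi^* = \psi_{K^*}$, where $K^* = \{v \in H : |v(x)| \le 1 \text{ a.e. in } \Gamma\}$. Since $\mathrm{dom}(\varphi) = H$ it follows from Theorem 4.43 and Lemma 4.44 that there exists $\bar\lambda$ such that (4.5.7) holds. From (4.7.2) Steps 2 and 3 in the augmented Lagrangian method are given by

$$-\Delta u_k + u_k = \tilde f, \qquad \frac{\partial}{\partial n}u_k + \lambda_{k+1} = 0 \text{ on } \Gamma,$$

$$\lambda_{k+1} = \begin{cases} \lambda_k + c\,u_k & \text{on } \Gamma_k = \{x \in \Gamma : |\lambda_k + c\,u_k| \le 1\}, \\[4pt] \dfrac{\lambda_k + c\,u_k}{|\lambda_k + c\,u_k|} & \text{on } \Gamma \setminus \Gamma_k. \end{cases}$$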
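4.7.7 $L^1$-fitting
Consider the minimization problem

$$\min \int_\Omega |u - z|\,dx + \frac{\mu}{2} \int_\Omega |\nabla u|^2\,dx \quad \text{over } u \in H_0^1(\Omega) \tag{4.7.18}$$

for interpolation of noisy data $z \in L^2(\Omega)$ by minimizing the $L^1$-norm of the error $u - z$ over $u \in H_0^1(\Omega)$. Again $\mu > 0$ is fixed but should be adjusted to the statistics of noise. The analysis of this problem is very similar to that of the friction problem, and Steps 2 and 3 in the augmented Lagrangian method are given by

$$-\mu\,\Delta u_k + \lambda_{k+1} = 0,$$

$$\lambda_{k+1} = \begin{cases} \lambda_k + c\,(u_k - z) & \text{on } \Omega_k = \{x : |\lambda_k(x) + c\,(u_k(x) - z)| \le 1\}, \\[4pt] \dfrac{\lambda_k + c\,(u_k - z)}{|\lambda_k + c\,(u_k - z)|} & \text{on } \Omega \setminus \Omega_k. \end{cases}$$

4.7.8 Control problem
Consider the optimal control problem

$$\min \frac{1}{2} \int_0^T \bigl(|x(t)|^2 + |u(t)|^2\bigr)\,dt \quad \text{over } (x, u) \in L^2(0, T; \mathbb{R}^n) \times L^2(0, T; \mathbb{R}^m) \tag{4.7.19}$$

$$\text{subject to } x(t) - x_0 - \int_0^t \bigl(Ax(s) + Bu(s)\bigr)\,ds = 0 \quad \text{and} \quad |u(t)| \le 1 \text{ a.e.},$$

where $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$, and $x_0 \in \mathbb{R}^n$ are fixed. In this case we formulate the problem in the form (4.1.1) by choosing

$$X = L^2(0, T; \mathbb{R}^n) \times L^2(0, T; \mathbb{R}^m), \quad H = L^2(0, T; \mathbb{R}^m), \quad \text{and} \quad \Lambda(x, u) = u \text{ for } (x, u) \in X,$$

and define $f : X \to \mathbb{R}$ and $\varphi : H \to \mathbb{R}$ by

$$f(x, u) = \frac{1}{2} \int_0^T \bigl(|x(t)|^2 + |u(t)|^2\bigr)\,dt \quad \text{and} \quad \varphi(v) = \psi_K(v),$$

where $K = \{v \in H : |v(t)| \le 1 \text{ a.e. in } (0, T)\}$. $C$ is the closed affine space defined by

$$C = \Bigl\{(x, u) \in X : x(t) - x_0 - \int_0^t \bigl(Ax(s) + Bu(s)\bigr)\,ds = 0\Bigr\}.$$

It follows from Theorems 4.39 and 4.40 that

$$\lambda_c = c\,\Bigl(v + \frac{\lambda}{c} - \frac{v + \frac{\lambda}{c}}{\max(1, |v + \frac{\lambda}{c}|)}\Bigr) = c\,\max\Bigl(0, \Bigl|v + \frac{\lambda}{c}\Bigr| - 1\Bigr)\,\frac{\lambda + c\,v}{|\lambda + c\,v|} \tag{4.7.20}$$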
for $v, \lambda \in H$. For $c > 0$ the regularized problem

$$\min f_c(x, u) = f(x, u) + \varphi_c(u, 0) \quad \text{over } (x, u) \in C$$

has a unique solution $(x_c, u_c) \in C$. Define the Lagrangian

$$L(x, u, \mu) = f_c(x, u) + \int_0^T \Bigl(x(t) - x_0 - \int_0^t \bigl(Ax(s) + Bu(s)\bigr)\,ds,\ \mu(t)\Bigr)_{\mathbb{R}^n} dt.$$

Since the mapping $F : X \to L^2(0, T; \mathbb{R}^n)$ defined by

$$F(x, u) = x(t) - \int_0^t \bigl(Ax(s) + Bu(s)\bigr)\,ds, \quad t \in (0, T),$$

is surjective it follows from the Lagrange multiplier theory that there exists a unique Lagrange multiplier $\mu_c \in L^2(0, T; \mathbb{R}^n)$ such that the Fréchet derivative of $L$ with respect to $(x, u)$ satisfies $L'(x_c, u_c, \mu_c)(h, v) = 0$ for all $(h, v) \in X$. Hence we obtain

$$\int_0^T \Bigl[\Bigl(h(t) - \int_0^t Ah(s)\,ds,\ \mu_c(t)\Bigr) + (x_c(t), h(t))\Bigr]\,dt = 0$$

for all $h \in L^2(0, T; \mathbb{R}^n)$ and

$$\int_0^T \Bigl[\Bigl(u_c(t) + c\,\max(0, |u_c| - 1)\,\frac{u_c}{|u_c|},\ v(t)\Bigr) - \Bigl(\int_0^t Bv(s)\,ds,\ \mu_c(t)\Bigr)\Bigr]\,dt = 0$$

for all $v \in L^2(0, T; \mathbb{R}^m)$. It is not difficult to show that if we define

$$p_c(t) = -\int_t^T \mu_c(s)\,ds,$$

then $(x_c, u_c, p_c)$ satisfies

$$\begin{aligned}
&\frac{d}{dt}p_c(t) + A^T p_c(t) + x_c(t) = 0, \quad p_c(T) = 0, \\
&u_c(t) + c\,\max(0, |u_c(t)| - 1)\,\frac{u_c(t)}{|u_c(t)|} = -B^T p_c(t).
\end{aligned} \tag{4.7.21}$$

Let $(\tilde x, \tilde u)$ be a feasible point of (4.7.19); i.e., $(\tilde x, \tilde u) \in C$ and $\tilde u \in K$. Since $f_c(x_c, u_c) \le f_c(\tilde x, \tilde u) = f(\tilde x, \tilde u)$ and $\varphi_c(u_c, 0) \ge 0$ it follows that $|(x_c, u_c)|_X$ is uniformly bounded in $c > 0$. Thus, there exists a subsequence of $(x_c, u_c)$ that converges weakly to $(\bar x, \bar u) \in X$ as $c \to \infty$. We shall show that $(\bar x, \bar u)$ is the solution of (4.7.19). First, since $F(x_c, u_c) = x_0$, $x_c$ converges strongly to $\bar x$ in $L^2(0, T; \mathbb{R}^n)$ and weakly in $H^1(0, T; \mathbb{R}^n)$ and $(\bar x, \bar u) \in C$. From the first equation of (4.7.21) it follows that $p_c$ converges strongly to $\bar p$ in $H^1(0, T; \mathbb{R}^n)$, where $\bar p$ satisfies

$$\frac{d}{dt}\bar p(t) + A^T \bar p(t) + \bar x(t) = 0, \quad \bar p(T) = 0.$$

Next, from the second equation of (4.7.21) we have

$$|u_c(t)| + c\,\max(0, |u_c(t)| - 1) = |B^T p_c(t)|$$
and thus

$$|u_c(t)| = \begin{cases} |B^T p_c(t)| & \text{if } |B^T p_c(t)| \le 1, \\[4pt] 1 + \dfrac{|B^T p_c(t)| - 1}{c + 1} & \text{if } |B^T p_c(t)| \ge 1, \end{cases} \qquad t \in [0, T]. \tag{4.7.22}$$

Note that

$$u_c(t) = \frac{-B^T p_c(t)}{1 + \eta_c(t)\,|u_c(t)|^{-1}}, \tag{4.7.23}$$

where $\eta_c(t) = c\,\max(0, |u_c(t)| - 1) = \frac{c}{1 + c}\,\max(0, |B^T p_c(t)| - 1)$. Define the functions $\bar\eta, \hat u \in H^1(0, T)$ by

$$\bar\eta(t) = \max(0, |B^T \bar p(t)| - 1) \quad \text{and} \quad \hat u(t) = \frac{-B^T \bar p(t)}{1 + \bar\eta(t)} = \frac{-B^T \bar p(t)}{\max(1, |B^T \bar p(t)|)} \tag{4.7.24}$$

for $t \in [0, T]$. Then

$$|\hat u(t)| = \frac{|B^T \bar p(t)|}{\max(1, |B^T \bar p(t)|)} .$$

Since $B^T p_c \to B^T \bar p$ in $C(0, T)$ we have $|u_c| \to |\hat u|$ in $C(0, T)$ by (4.7.22) and it follows from (4.7.23)–(4.7.24) that $\eta_c \to \bar\eta$ and $u_c \to \hat u$ in $C(0, T)$. Since $u_c \to \bar u$ weakly in $L^2(0, T; \mathbb{R}^m)$ we have $\bar u = \hat u$. Therefore, we conclude that $u_c \to \bar u$ in $C(0, T)$ and if $\bar\lambda(t) = \bar\eta(t)\,\bar u(t)$, then $(\bar x, \bar u, \bar p, \bar\lambda)$ satisfies $(\bar x, \bar u) \in C$, $\bar u \in K$ and

$$\begin{aligned}
&\frac{d}{dt}\bar p(t) + A^T \bar p(t) + \bar x = 0, \quad \bar p(T) = 0, \\
&\bar u(t) + \bar\lambda = -B^T \bar p(t), \quad \bar\lambda(t) = \lambda_c(\bar u(t), \bar\lambda(t)) = \varphi_c'(\bar u, \bar\lambda),
\end{aligned} \tag{4.7.25}$$

which is equivalent to (4.5.7). Now, from Theorem 4.43 we deduce that $(\bar x, \bar u)$ is the solution to (4.7.19). The last equality in (4.7.25) can be verified by separately considering the cases $|\bar u(t)| < 1$ and $|\bar u(t)| = 1$. It follows from (4.7.20) that Steps 2 and 3 in the augmented Lagrangian method are given by

$$\begin{aligned}
&\frac{d}{dt}x_k(t) = Ax_k(t) + Bu_k(t), \quad x_k(0) = x_0, \\
&\frac{d}{dt}p_k(t) + A^T p_k(t) + x_k = 0, \quad p_k(T) = 0, \\
&u_k(t) + \lambda_{k+1}(t) = -B^T p_k(t), \quad \text{where } \lambda_{k+1} = c\,\max\Bigl(0, \Bigl|u_k + \frac{\lambda_k}{c}\Bigr| - 1\Bigr)\,\frac{\lambda_k + c\,u_k}{|\lambda_k + c\,u_k|}
\end{aligned}$$

for $t \in [0, T]$.
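The two expressions for $\lambda_c$ in (4.7.20) can be compared numerically at a single time instant. The following sketch works in $\mathbb{R}^m$ with the Euclidean norm; the function names and the sample vectors are hypothetical, and the zero vector is treated separately to avoid a division by zero.

```python
import numpy as np

def multiplier_update(v, lam, c):
    """Second form of (4.7.20): c * max(0, |v + lam/c| - 1) * q / |q|, q = lam + c v."""
    q = lam + c * v
    nq = np.linalg.norm(q)
    if nq == 0.0:
        return np.zeros_like(q)
    return c * max(0.0, nq / c - 1.0) * q / nq

def multiplier_update_projection(v, lam, c):
    """First form of (4.7.20): c * (z - Proj_K z) with z = v + lam / c."""
    z = v + lam / c
    return c * (z - z / max(1.0, np.linalg.norm(z)))

# Hypothetical sample data: one control outside and one inside the unit ball
c = 2.0
lam0 = np.zeros(2)
outside = multiplier_update(np.array([3.0, 4.0]), lam0, c)
outside_proj = multiplier_update_projection(np.array([3.0, 4.0]), lam0, c)
inside = multiplier_update(np.array([0.1, 0.2]), lam0, c)
```

The multiplier vanishes wherever $|v + \lambda/c| \le 1$ and otherwise points in the direction of $\lambda + c\,v$, with magnitude proportional to the constraint violation.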
Chapter 5
Newton and SQP Methods
In this chapter we discuss the Newton method for the equality-constrained optimal control problem

$$\min J(y, u) \quad \text{subject to } e(y, u) = 0, \tag{P}$$

where $J : Y \times U \to \mathbb{R}$ and $e : Y \times U \to W$, and $Y$, $U$, and $W$ are Hilbert spaces. The focus is on problems where for given $u$ there exists a solution $y(u)$ to $e(y, u) = 0$, which is typical for optimal control problems. We shall refer to

$$\hat J(u) = J(y(u), u) \tag{5.0.1}$$

as the reduced cost functional. In the first section we give necessary and sufficient optimality conditions based on Taylor series expansion arguments. Section 5.2 is devoted to Newton's method to solve (P) and sufficient conditions for quadratic convergence are given. Section 5.3 contains a discussion of SQP (sequential quadratic programming) and reduced SQP techniques. We do not provide a convergence analysis here, since this can be obtained as a special case from the results on second order augmented Lagrangians in Chapter 6. The results are specialized to a class of optimal control problems for the Navier–Stokes equation in Section 5.4. Section 5.5 is devoted to the Newton method for weakly singular problems as introduced in Chapter 1.
5.1 Preliminaries
(I) Necessary Optimality Condition
The first objective is to characterize the derivative of $\hat J$ without recourse to the derivative of the state $y$ with respect to the control $u$. This is in the same spirit as Theorem 1.17 but obtained under the less restrictive assumption of this chapter. We assume that

(C1) $(y, u) \in Y \times U$ satisfies $e(y, u) = 0$ and there exists a neighborhood $V(y) \times V(u)$ of $(y, u)$ on which $J$ and $e$ are $C^1$ with Lipschitz continuous first derivatives such that for every $v \in V(u)$ there exists a unique $y(v) \in V(y)$ satisfying $e(y(v), v) = 0$.
Theorem 5.1. Assume that the pair $(y, u)$ satisfies (C1) and that there exists $\lambda \in W^*$ such that

(C2) $e_y(y, u)^*\lambda + J_y(y, u) = 0$, and

(C3) $\lim_{t \to 0^+} \frac{1}{t}\,|y(u + td) - y|_Y^2 = 0$ for all $d \in U$.

Then the Gâteaux derivative $\hat J'(u)$ exists and

$$\hat J'(u) = e_u(y, u)^*\lambda + J_u(y, u). \tag{5.1.1}$$

Proof. For $d \in U$ and $t$ sufficiently small let $v = y(u + td) - y$. Then

$$\hat J(u + td) - \hat J(u) = \int_0^1 \bigl(J'(y + sv, u + std)(v, td) - J'(y, u)(v, td)\bigr)\,ds + J'(y, u)(v, td). \tag{5.1.2}$$

Similarly,

$$\begin{aligned}
0 &= \langle \lambda, e(y + v, u + td) - e(y, u)\rangle_{W^*, W} \\
&= \Bigl\langle \lambda, \int_0^1 \bigl(e'(y + sv, u + std)(v, td) - e'(y, u)(v, td)\bigr)\,ds \Bigr\rangle + \langle \lambda, e'(y, u)(v, td)\rangle,
\end{aligned}$$

and hence by Lipschitz continuity of $e'$ in $V(y) \times V(u)$ and (C3) we have

$$\lim_{t \to 0^+} \frac{1}{t}\,\langle \lambda, e_y(y, u)v\rangle = -\langle \lambda, e_u(y, u)d\rangle. \tag{5.1.3}$$

Since $J'$ is Lipschitz continuous in $V(y) \times V(u)$, it follows from (5.1.2) and conditions (C2)–(C3) that

$$\lim_{t \to 0^+} \frac{\hat J(u + td) - \hat J(u)}{t} = -\lim_{t \to 0^+} \frac{1}{t}\,\langle \lambda, e_y(y, u)v\rangle + J_u(y, u)d = \bigl(e_u(y, u)^*\lambda + J_u(y, u)\bigr)d.$$

Defining the Lagrangian $L : Y \times U \times W^* \to \mathbb{R}$ by

$$L(y, u, \lambda) = J(y, u) + \langle \lambda, e(y, u)\rangle,$$

(5.1.1) can be written as $\hat J'(u) = L_u(y, u, \lambda)$. If $e_y(y, u) : Y \to W$ is bijective, then (5.1.1) also follows from the implicit function theory. In fact, by the implicit function theorem there exists a neighborhood $V(y) \times V(u)$ of $(y, u)$ such that $e(y, v) = 0$ has a unique solution $y(v) \in V(y)$ for $v \in V(u)$ and $y(\cdot) : V(u) \subset U \to Y$ is $C^1$ with

$$e_y(y, v)\,y'(v) + e_u(y, v) = 0, \quad v \in V(u).$$
Thus

$$\hat J'(u)v = J_y(y, u)\,y'(u)v + J_u(y, u)v = \langle e_u(y, u)^*\lambda, v\rangle + J_u(y, u)v.$$

If conditions (C1)–(C3) hold at a local minimizer $(y^*, u^*)$ of (P), it follows from Theorem 5.1 that the first order necessary optimality condition for (P) is given by

$$\begin{cases} e_y(y^*, u^*)^*\lambda^* + J_y(y^*, u^*) = 0, \\ e_u(y^*, u^*)^*\lambda^* + J_u(y^*, u^*) = 0, \\ e(y^*, u^*) = 0, \end{cases} \tag{5.1.4}$$

or equivalently

$$L'(y^*, u^*, \lambda^*) = 0, \quad e(y^*, u^*) = 0,$$

where, as in the previous chapter, prime with $L$ denotes the derivative with respect to $(y, u)$.
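The adjoint formula (5.1.1) can be tested against finite differences in a small finite-dimensional setting. The example below is a sketch under assumptions that do not come from the text: $Y = U = W = \mathbb{R}^2$, the state equation $e(y, u) = Ay + y^3 - Bu = 0$ with a componentwise cube, and the tracking cost $J(y, u) = \frac12 |y - y_d|^2 + \frac{\alpha}{2}|u|^2$. All function names and data are hypothetical.

```python
import numpy as np

def solve_state(A, B, u, tol=1e-13, max_iter=50):
    """Newton iteration for the state equation e(y, u) = A y + y**3 - B u = 0."""
    y = np.zeros(A.shape[0])
    for _ in range(max_iter):
        residual = A @ y + y**3 - B @ u
        if np.linalg.norm(residual) < tol:
            break
        y = y - np.linalg.solve(A + np.diag(3.0 * y**2), residual)
    return y

def reduced_cost(A, B, u, y_d, alpha):
    """J_hat(u) = J(y(u), u)."""
    y = solve_state(A, B, u)
    return 0.5 * np.sum((y - y_d) ** 2) + 0.5 * alpha * np.sum(u**2)

def reduced_gradient(A, B, u, y_d, alpha):
    """Gradient of J_hat via the adjoint equation (C2) and formula (5.1.1)."""
    y = solve_state(A, B, u)
    e_y = A + np.diag(3.0 * y**2)
    lam = np.linalg.solve(e_y.T, -(y - y_d))   # e_y^* lam = -J_y
    return -B.T @ lam + alpha * u              # e_u^* lam + J_u, with e_u = -B

# Hypothetical test data
A = np.array([[2.0, -1.0], [-1.0, 2.0]])
B = np.array([[1.0, 0.5], [0.0, 1.0]])
u = np.array([0.3, -0.2])
y_d = np.array([1.0, 0.0])
alpha = 0.1

grad = reduced_gradient(A, B, u, y_d, alpha)

# Central finite differences of the reduced cost for comparison
eps = 1e-6
fd = np.array([
    (reduced_cost(A, B, u + eps * d, y_d, alpha)
     - reduced_cost(A, B, u - eps * d, y_d, alpha)) / (2.0 * eps)
    for d in np.eye(2)
])
```

The adjoint approach needs one linear solve with $e_y^*$ per gradient, independent of the dimension of $u$, whereas the finite difference check needs two state solves per component of $u$.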
(II) Sufficient Optimality Condition
Assume that $(y^*, u^*, \lambda^*) \in Y \times U \times W^*$ satisfies (C1) and that the necessary optimality condition (5.1.4) holds. For $(y, u) \in V(y^*) \times V(u^*)$ define the quadratic approximation to $J(y, u)$ and $\langle \lambda, e(y, u)\rangle$ by

$$E_1(y - y^*, u - u^*) = J(y, u) - J(y^*, u^*) - J_u(y^*, u^*)(u - u^*) - J_y(y^*, u^*)(y - y^*) \tag{5.1.5}$$

and

$$E_2(y - y^*, u - u^*) = \langle \lambda^*, e(y, u) - e(y^*, u^*) - e_y(y^*, u^*)(y - y^*) - e_u(y^*, u^*)(u - u^*)\rangle. \tag{5.1.6}$$

Since for $y = y(u)$ with $u \in V(u^*)$

$$E_2(y(u) - y^*, u - u^*) = -\langle \lambda^*, e_y(y^*, u^*)(y - y^*) + e_u(y^*, u^*)(u - u^*)\rangle,$$

we find by summing (5.1.5) and (5.1.6) with $y = y(u)$ and using (5.1.4)

$$\hat J(u) - \hat J(u^*) = J(y(u), u) - J(y^*, u^*) = E_1(y(u) - y^*, u - u^*) + E_2(y(u) - y^*, u - u^*).$$

Based on this identity, we have the following local sufficient optimality conditions.

• If $E(u) := E_1(y(u) - y^*, u - u^*) + E_2(y(u) - y^*, u - u^*) \ge 0$ for all $u \in V(u^*)$, then $u^*$ is a local minimizer of (P). If $E(u) > 0$ for all $u \in V(u^*)$ with $u \ne u^*$, then $u^*$ is a strict local minimizer.

• Assume that $J$ is locally uniformly convex in the sense that for some $\alpha > 0$

$$E_1(y(u) - y^*, u - u^*) \ge \alpha\,\bigl(|y(u) - y^*|_Y^2 + |u - u^*|_U^2\bigr) \quad \text{for all } u \in V(u^*),$$

and let $\beta \ge 0$ be the size of the nonlinearity of $e$, i.e.,

$$|e(y, u) - e(y^*, u^*) - e_y(y^*, u^*)(y - y^*) - e_u(y^*, u^*)(u - u^*)| \le \beta\,\bigl(|y - y^*|_Y^2 + |u - u^*|_U^2\bigr)$$
for all $(y, u) \in V(y^*) \times V(u^*)$. Then

$$E(u) \ge (\alpha - \beta\,|\lambda^*|_{W^*})\,\bigl(|y(u) - y^*|_Y^2 + |u - u^*|_U^2\bigr) \quad \text{for all } u \in V(u^*).$$

Thus, $u^*$ is a strict local minimizer of (P) if $\alpha - \beta\,|\lambda^*|_{W^*} > 0$.

• Note that $E(u) = L(x, \lambda^*) - L(x^*, \lambda^*) - L'(x^*, \lambda^*)(x - x^*)$, where $x = (y, u)$, $x^* = (y^*, u^*)$, and $y^* = y(u^*)$. If $L$ is $C^2$ with respect to $x = (y, u)$ in $V(y^*) \times V(u^*)$ with second derivative with respect to $x$ denoted by $L''$, we have

$$E(u) = L''(x^*, \lambda^*)(x - x^*, x - x^*) + r(x - x^*), \tag{5.1.7}$$

where $|r(x - x^*)| = o(|x - x^*|^2)$. Thus, if there exists $\sigma > 0$ such that

$$L''(x^*, \lambda^*)(\delta x, \delta x) \ge \sigma\,|\delta x|^2 \quad \text{for all } \delta x \in \ker e'(x^*), \tag{5.1.8}$$

and $e'(x^*) : Y \times U \to W$ is surjective, then there exist $\sigma_0 > 0$ and $\rho > 0$ such that

$$\hat J(u) - \hat J(u^*) = E(u) \ge \sigma_0\,\bigl(|y(u) - y(u^*)|^2 + |u - u^*|^2\bigr) \tag{5.1.9}$$

for all $(y(u), u) \in V(y^*) \times V(u^*)$ with $|(y(u), u) - (y(u^*), u^*)| < \rho$. In fact, from Lemma 2.13, Chapter 2, with $S = \ker e'(x^*)$ there exist $\gamma > 0$, $\sigma_0 > 0$ such that

$$L''(x^*, \lambda^*)(x - x^*, x - x^*) \ge \sigma_0\,|x - x^*|^2 \quad \text{for all } x \in V(y^*) \times V(u^*) \text{ with } |x_{\ker^\perp}| \le \gamma\,|x_{\ker}|, \tag{5.1.10}$$

where we decomposed $x - x^*$ as $x - x^* = x_{\ker} + x_{\ker^\perp} \in \ker e'(x^*) + (\ker e'(x^*))^\perp$. Since

$$0 = e(x) - e(x^*) = e'(x^*)(x - x^*) + \eta(x - x^*),$$

where $|\eta(x - x^*)|_W = o(|x - x^*|)$, it follows that for $\delta \in (0, \frac{\gamma}{1 + \gamma})$ there exists $\rho > 0$ such that

$$|x_{\ker^\perp}| \le \delta\,|x - x^*| \quad \text{if } |x - x^*| < \rho.$$

Consequently

$$|x_{\ker^\perp}| \le \frac{\delta}{1 - \delta}\,|x_{\ker}| < \gamma\,|x_{\ker}| \quad \text{if } |x - x^*| < \rho,$$

and hence (5.1.9) follows from (5.1.7) and (5.1.10).

We refer the reader to [CaTr, RoTr] and the literature cited therein for detailed investigations on the topic of second order sufficient optimality. The aim of these results is to establish second order sufficient optimality conditions which are close to second order necessary conditions.
5.2 Newton method
In this section we describe the Newton method applied to the reduced form (5.0.1) of (P). That is, let $y(u)$ denote a solution to $e(y, u) = 0$. Then the constrained optimization problem is transformed to the unconstrained problem for $u$ in $U$:

$$\min_{u \in U} \hat J(u) = J(y(u), u). \tag{5.2.1}$$

Let $(y^*, u^*)$ denote a solution to (P), and assume that (C1) holds for $(y^*, u^*)$ and that (C2), (C3) hold for all $(y(u), u) \in V(y^*) \times V(u^*)$. In addition it is assumed that $J$ and $e$ are $C^2$ in $V(y^*) \times V(u^*)$ with Lipschitz continuous second derivatives. From Theorem 5.1 the first derivative of $\hat J(u)$ is given by

$$\hat J'(u) = e_u(y, u)^*\lambda + J_u(y, u), \tag{5.2.2}$$

where $u \in V(u^*)$, $y = y(u)$, and $\lambda = \lambda(u)$ satisfy

$$e_y(y, u)^*\lambda = -J_y(y, u). \tag{5.2.3}$$

From (5.2.2) it follows that

$$\hat J'(u) = L_u(y(u), u, \lambda(u)) \quad \text{for } u \in V(u^*). \tag{5.2.4}$$

We henceforth assume that for every $u \in V(u^*)$ and $y = y(u)$, $\lambda = \lambda(u)$,

$$\begin{pmatrix} L_{yy}(y, u, \lambda) & e_y(y, u)^* \\ e_y(y, u) & 0 \end{pmatrix} \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} + \begin{pmatrix} L_{yu}(y, u, \lambda) \\ e_u(y, u) \end{pmatrix} = 0 \tag{5.2.5}$$

admits a solution $(\mu_1, \mu_2) \in \mathcal{L}(U, Y) \times \mathcal{L}(U, W^*)$. Note that (5.2.5) is an operator equation in $\mathcal{L}(U, Y^*) \times \mathcal{L}(U, W)$. It consists of the linearized primal equation for $\mu_1$ and the adjoint operator equation with right-hand side $-L_{yy}(y, u, \lambda)\mu_1 - L_{yu}(y, u, \lambda) \in \mathcal{L}(U, Y^*)$ for $\mu_2$. Using (5.2.4) and the adjoint equation in the form $L_y(y, u, \lambda) = 0$ and $e(y, u) = 0$, we find

$$\hat J''(u) = L_{yu}(y, u, \lambda)\mu_1 + e_u(y, u)^*\mu_2 + L_{uu}(y, u, \lambda), \tag{5.2.6}$$

where $(\mu_1, \mu_2)$ satisfy (5.2.5). In fact, since $e$ and $J$ are required to have Lipschitz continuous second derivatives in $V(y^*) \times V(u^*)$ we obtain the following relationships in $W$ and $Y^*$:

$$\begin{aligned}
&e_y(y, u)v + e_u(y, u)(td) = o(|t|), \\
&L_{yy}(y, u, \lambda)v + e_y(y, u)^*w + L_{yu}(y, u, \lambda)(td) = o(|t|),
\end{aligned} \tag{5.2.7}$$
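where $(y, u) \in V(y^*) \times V(u^*)$, $d \in U$, and $v = y(u + td) - y$, $w = \lambda(u + td) - \lambda$. By (5.2.4), (5.2.5), and (5.2.7) we find

$$\begin{aligned}
\hat J'(u + td) - \hat J'(u) &= L_{uy}(y, u, \lambda)v + L_{uu}(y, u, \lambda)(td) + e_u(y, u)^*w + o(|t|) \\
&= -L_{yy}(y, u, \lambda)(\mu_1, v) - \langle e_y(y, u)v, \mu_2\rangle_{W, \mathcal{L}(U, W^*)} + L_{uu}(y, u, \lambda)(td) - \langle e_y(y, u)\mu_1, w\rangle_{\mathcal{L}(U, W), W^*} + o(|t|) \\
&= \langle e_y(y, u)\mu_1, w\rangle_{\mathcal{L}(U, W), W^*} + L_{yu}(y, u, \lambda)(td, \mu_1) + \langle e_u(y, u)^*\mu_2, td\rangle_{\mathcal{L}(U, U^*), U} \\
&\qquad + L_{uu}(y, u, \lambda)(td) - \langle e_y(y, u)\mu_1, w\rangle_{\mathcal{L}(U, W), W^*} + o(|t|).
\end{aligned}$$

Dividing by $t$ and letting $t \to 0$ we obtain (5.2.6). From (5.2.5) we deduce

$$\mu_1 = -(e_y(y, u))^{-1}e_u(y, u), \qquad \mu_2 = -(e_y(y, u)^*)^{-1}(L_{yy}\mu_1 + L_{yu}),$$

where $L$ is evaluated at $(y, u, \lambda)$. Hence the second derivative of $\hat J$ is given by

$$\hat J''(u) = L_{uy}\mu_1 - e_u(y, u)^*(e_y(y, u)^*)^{-1}(L_{yy}\mu_1 + L_{yu}) + L_{uu} = T(y, u)^* L''(y, u, \lambda)\,T(y, u), \tag{5.2.8}$$

with

$$T(y, u) = \begin{pmatrix} -e_y(y, u)^{-1}e_u(y, u) \\ I \end{pmatrix},$$

where $T \in \mathcal{L}(U, Y \times U)$. After this preparation the Newton equation $\hat J''(u)\,\delta u + \hat J'(u) = 0$ can be expressed as

$$\begin{aligned}
&T(y, u)^* L''(y, u, \lambda) \begin{pmatrix} \delta y \\ \delta u \end{pmatrix} + T(y, u)^* \begin{pmatrix} 0 \\ e_u(y, u)^*\lambda + J_u(y, u) \end{pmatrix} = 0, \\
&e_y(y, u)\,\delta y + e_u(y, u)\,\delta u = 0.
\end{aligned} \tag{5.2.9}$$

By the definition of $T(y, u)$ and the closed range theorem we have

$$\ker(T(y, u)^*) = \mathrm{range}(T(y, u))^\perp = \ker(e'(y, u))^\perp = \mathrm{range}(e'(y, u)^*)$$

provided that $\mathrm{range}(e'(y, u))$, and hence $\mathrm{range}(e'(y, u)^*)$, is closed. As a consequence the Newton update can be expressed as

$$\begin{pmatrix} L''(y, u, \lambda) & e'(y, u)^* \\ e'(y, u) & 0 \end{pmatrix} \begin{pmatrix} \delta y \\ \delta u \\ \delta\lambda \end{pmatrix} + \begin{pmatrix} 0 \\ e_u(y, u)^*\lambda + J_u(y, u) \\ 0 \end{pmatrix} = 0, \tag{5.2.10}$$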
where $(\delta y, \delta u, \delta\lambda) \in Y \times U \times W^*$ and the equality is understood in $Y^* \times U^* \times W$. We now state the Newton iteration for (5.2.1).

Newton Method.

(i) Initialization: Choose $u_0 \in V(u^*)$, solve

$$e(y, u_0) = 0, \qquad e_y(y, u_0)^*\lambda + J_y(y, u_0) = 0 \quad \text{for } (y_0, \lambda_0),$$

and set $k = 0$.

(ii) Newton step: Solve for $(\delta y, \delta u, \delta\lambda) \in Y \times U \times W^*$

$$\begin{pmatrix} L''(y_k, u_k, \lambda_k) & e'(y_k, u_k)^* \\ e'(y_k, u_k) & 0 \end{pmatrix} \begin{pmatrix} \delta y \\ \delta u \\ \delta\lambda \end{pmatrix} + \begin{pmatrix} 0 \\ e_u(y_k, u_k)^*\lambda_k + J_u(y_k, u_k) \\ 0 \end{pmatrix} = 0.$$

(iii) Update $u_{k+1} = u_k + \delta u$.

(iv) Feasibility step: Solve for $(y_{k+1}, \lambda_{k+1}) \in Y \times W^*$

$$e(y, u_{k+1}) = 0, \qquad e_y(y, u_{k+1})^*\lambda + J_y(y, u_{k+1}) = 0.$$

(v) Stop, or set $k = k + 1$, and go to (ii).

The following theorem provides sufficient conditions for local quadratic convergence. Here we assume that $e_y(x^*) : Y \to W$ is a bijection at a local solution $x^* = (y^*, u^*)$ of (P). Then there corresponds a unique Lagrange multiplier $\lambda^*$.

Theorem 5.2. Let $x^* = (y^*, u^*)$ be a local solution to (P) and let $V(y^*) \times V(u^*)$ denote a neighborhood in which $J$ and $e$ are $C^2$ with Lipschitz continuous second derivatives. Let us further assume that $e_y(x^*) : Y \to W$ is a bijection and that there exists $\kappa > 0$ such that

$$L''(x^*, \lambda^*)(h, h) \ge \kappa\,|h|^2 \quad \text{for all } h \in \ker e'(x^*).$$

Then, if $|u_0 - u^*|$ is sufficiently small,

$$|(y_{k+1}, u_{k+1}, \lambda_{k+1}) - (y^*, u^*, \lambda^*)| \le K\,|u_k - u^*|^2$$

for a constant $K$ independent of $k = 0, 1, \ldots$.

Proof. Since $e_y(x^*) : Y \to W$ is a homeomorphism and $x \mapsto e_y(x)$ is continuous from $X$ to $\mathcal{L}(Y, W)$, there exist $r > 0$ and $\gamma > 0$ such that

$$\begin{cases} \|e_y(x)^{-1}\|_{\mathcal{L}(W, Y)} \le \gamma & \text{for all } x \in B_r, \\ \|e_y(x)^{-*}\|_{\mathcal{L}(Y^*, W^*)} \le \gamma & \text{for all } x \in B_r, \\ \langle T^*(x) L''(x, \lambda) T(x)u, u\rangle \ge \frac{\kappa}{2}\,|u|^2 & \text{for all } x \in B_r,\ |\lambda - \lambda^*| < r, \end{cases} \tag{5.2.11}$$

where $B_r = \{(y, u) \in Y \times U : |y - y^*| < r,\ |u - u^*| < r\}$. These estimates imply that, possibly after decreasing $r > 0$, there exists some $\hat K > 0$ such that

$$|y(u) - y^*|_Y \le \hat K\,|u - u^*|_U \quad \text{and} \quad |\lambda(u) - \lambda^*|_{W^*} \le \hat K\,|u - u^*|_U \quad \text{for all } |u - u^*| < r,$$
where $e_y(y(u), u)^*\lambda(u) = -J_y(y(u), u)$. Thus for $\rho = \min(r, r/\hat K)$ the inequalities in (5.2.11) hold with $x = (y(u), u)$ and $\lambda = \lambda(u)$, provided that $|u - u^*| < \rho$. Let $S(u) = T(y(u),u)^* L''(y(u),u,\lambda(u))\, T(y(u),u)$. Then there exists $\mu > 0$ such that
$$
|S(u) - S(u^*)| \le \mu |u - u^*| \quad \text{for } |u - u^*| < \rho.
$$
Let $|u_0 - u^*| < \min(\kappa/\mu, \rho)$ and, proceeding by induction, assume that $|u_k - u^*| \le |u_0 - u^*|$. Then
$$
\begin{aligned}
S(u_k)(u_{k+1} - u^*) &= S(u_k)(u_k - u^*) - \hat J'(u_k) + \hat J'(u^*)\\
&= \int_0^1 \big(S(u_k + s(u^* - u_k)) - S(u_k)\big)(u^* - u_k)\, ds
\end{aligned} \tag{5.2.12}
$$
and hence
$$
|u_{k+1} - u^*| \le \frac{2}{\kappa}\,\frac{\mu}{2}\, |u_k - u^*|^2 \le \frac{\mu}{\kappa}\, |u_0 - u^*|\, |u_0 - u^*| \le |u_0 - u^*| < \rho.
$$
It further follows that $|y(u_{k+1}) - y^*|_Y \le \frac{\hat K \mu}{\kappa} |u_k - u^*|^2$ and $|y(u_{k+1}) - y^*|_Y \le \hat K |u_{k+1} - u^*| < \hat K \rho \le r$. In a similar manner we have $|\lambda(u_{k+1}) - \lambda^*|_{W^*} \le \frac{\hat K \mu}{\kappa} |u_k - u^*|^2$ and $|\lambda(u_{k+1}) - \lambda^*|_{W^*} < r$. The claim follows from these estimates.

Remark 5.2.1. If $|\lambda^*|_{W^*}$ is small, then it is suggested to approximate the Hessian $L''(y,u,\lambda)$ by $L''(y,u,\lambda) \sim J''(y,u)$ and the reduced Hessian $T^* L'' T$ by
$$
\hat J''(u) \sim T(y,u)^* J''(y,u)\, T(y,u).
$$
Under the assumptions of Theorem 5.2, $T^* J'' T$ is positive definite in a neighborhood of the solution, provided that $\lambda^*$ is sufficiently small. Moreover, if $\lambda^* = 0$, then the Newton method converges superlinearly to $(y^*, u^*)$, if it converges. Indeed we can follow the proof of Theorem 5.2 and replace (5.2.12) by
$$
\begin{aligned}
S_0(u_k)(u_{k+1} - u^*) &= S(u_k)(u_k - u^*) - \hat J'(u_k) + \hat J'(u^*)\\
&\quad - T^*(y(u_k),u_k)\big(e''(x_k)^*(\lambda_k - \lambda^*)\big) T(y(u_k),u_k)(u_k - u^*),
\end{aligned}
$$
where $S_0(u) = T^*(y(u),u)\, J''(y(u),u)\, T(y(u),u)$.

Remark 5.2.2. Note that the reduced Hessian $S = T^* L'' T$ is a Schur complement of the linear system (5.2.10). That is, if we eliminate $\delta y$ and $\delta\lambda$ by
$$
\delta y = -e_y^{-1} e_u\, \delta u, \qquad \delta\lambda = -(e_y^*)^{-1}(L_{yy}\delta y + L_{yu}\delta u),
$$
we obtain (5.2.9) in the form
$$
S\,\delta u + L_u(y,u,\lambda) = 0. \tag{5.2.13}
$$
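The Newton method in its Schur complement form (5.2.13) can be sketched on a small example. The scalar problem below, with $J(y,u) = \frac12 (y-1)^2 + \frac{\beta}{2} u^2$ and $e(y,u) = y + y^3 - u$, is an assumption chosen purely for illustration and does not appear in the text; the function names `solve_state` and `newton_reduced` are hypothetical as well. Each iteration performs the feasibility step for $(y, \lambda)$ and then the update $S\,\delta u = -L_u$.

```python
# Toy instance (an assumption for illustration, not taken from the text):
#   J(y, u) = 0.5*(y - 1)**2 + 0.5*beta*u**2,   e(y, u) = y + y**3 - u.
# For this choice L_yu = 0 and L_uu = beta.

def solve_state(u, y=0.0, tol=1e-13, max_iter=100):
    """Feasibility step, primal part: solve e(y, u) = 0 for y by scalar Newton."""
    for _ in range(max_iter):
        residual = y + y**3 - u
        if abs(residual) < tol:
            break
        y -= residual / (1.0 + 3.0 * y**2)
    return y


def newton_reduced(u0, beta=0.1, steps=8):
    """Newton iteration on the reduced functional via S*du + L_u = 0."""
    u, grad_history = u0, []
    for _ in range(steps):
        y = solve_state(u)                  # e(y, u) = 0
        e_y, e_u = 1.0 + 3.0 * y**2, -1.0
        lam = -(y - 1.0) / e_y              # adjoint: e_y^* lam + J_y = 0
        grad = e_u * lam + beta * u         # L_u = e_u^* lam + J_u
        grad_history.append(abs(grad))
        mu1 = -e_u / e_y                    # first component of T
        L_yy = 1.0 + 6.0 * lam * y          # J_yy + lam * e_yy
        S = mu1 * L_yy * mu1 + beta         # reduced Hessian T^* L'' T
        u -= grad / S                       # Newton update of the control
    return u, grad_history


u_opt, grads = newton_reduced(u0=1.0)
```

The list `grads` records $|L_u|$ at each iterate, so its decay can be inspected to observe the fast local convergence predicted by Theorem 5.2 on this particular example.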
Remark 5.2.3. If (5.2.13) is solved by an iterative method based on Krylov subspaces, then this uses $Sr$ for given $r \in U$ and requires that we perform the forward solution
$$
\widetilde{\delta y} = -e_y^{-1}(y,u)\, e_u(y,u)\, r,
$$
the adjoint solution
$$
\widetilde{\delta\lambda} = -(e_y(y,u)^*)^{-1}(L_{yy}\widetilde{\delta y} + L_{yu} r),
$$
and evaluation of the expression
$$
Sr = L_{uy}\widetilde{\delta y} + L_{uu} r + e_u(y,u)^*\, \widetilde{\delta\lambda}.
$$
In applications where the discretization of $U$ is much smaller than that of $Y \times U \times W^*$ this procedure may offer a significant reduction in storage and execution time over solving the saddle point problem (5.2.10).
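A minimal finite-dimensional sketch of the matrix-free product $Sr$ from Remark 5.2.3 follows. It assumes that the operators have been discretized into dense NumPy arrays; the random test data and the function name `reduced_hessian_apply` are hypothetical and serve only to compare the matrix-free product with the explicitly assembled Schur complement $T^* L'' T$.

```python
import numpy as np

def reduced_hessian_apply(r, e_y, e_u, L_yy, L_yu, L_uu):
    """Return S r without forming S = T^* L'' T (one forward, one adjoint solve)."""
    dy = -np.linalg.solve(e_y, e_u @ r)                    # forward solve
    dlam = -np.linalg.solve(e_y.T, L_yy @ dy + L_yu @ r)   # adjoint solve
    return L_yu.T @ dy + L_uu @ r + e_u.T @ dlam           # L_uy dy + L_uu r + e_u^* dlam

# Hypothetical discretized data: n state unknowns, m control unknowns.
rng = np.random.default_rng(0)
n, m = 6, 2
e_y = np.eye(n) + 0.1 * rng.standard_normal((n, n))
e_u = rng.standard_normal((n, m))
H = rng.standard_normal((n + m, n + m))
H = H @ H.T                                   # symmetric stand-in for L''
L_yy, L_yu, L_uu = H[:n, :n], H[:n, n:], H[n:, n:]

# Reference: explicit Schur complement S = T^* L'' T.
T = np.vstack([-np.linalg.solve(e_y, e_u), np.eye(m)])
S = T.T @ H @ T

r = rng.standard_normal(m)
Sr = reduced_hessian_apply(r, e_y, e_u, L_yy, L_yu, L_uu)
```

In a Krylov solver only `reduced_hessian_apply` would be called, so the $m \times m$ matrix `S` never needs to be stored; it is formed here solely as a reference.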
5.3
SQP and reduced SQP methods
In this section we describe the SQP (sequential quadratic programming) and reduced SQP methods without entering into technical details. Throughout this section we set $x = (y, u) \in X = Y \times U$. Let $x^*$ denote a local solution to
$$
\begin{cases}
\min J(x) \ \text{ over } x \in X\\
\text{subject to } e(x) = 0.
\end{cases} \tag{5.3.1}
$$
Further let
$$
L(x, \lambda) = J(x) + \langle \lambda, e(x) \rangle
$$
denote the Lagrangian and let $\lambda^*$ be a Lagrange multiplier at the solution $x^*$. Then a necessary first order optimality condition is given by
$$
L'(y^*, u^*, \lambda^*) = 0, \qquad e(x^*) = 0, \tag{5.3.2}
$$
and a second order sufficient optimality condition by
$$
L''(x^*, \lambda^*)(h, h) \ge \sigma |h|_X^2 \quad \text{for all } h \in \ker e'(x^*), \tag{5.3.3}
$$
where $\sigma > 0$. In contrast to the Newton method described in the previous section, in the SQP method both $y$ and $u$ are considered as independent variables related by the equality constraint $e(y, u) = 0$, which is realized by a Lagrangian term. The SQP method then consists essentially in a Newton method applied to the necessary optimality condition (5.3.2) to iteratively solve for $(y^*, u^*, \lambda^*)$. This results in determining updates from the linear system
$$
\begin{pmatrix} L''(y_k,u_k,\lambda_k) & e'(y_k,u_k)^* \\ e'(y_k,u_k) & 0 \end{pmatrix}
\begin{pmatrix} \delta y \\ \delta u \\ \delta\lambda \end{pmatrix}
+ \begin{pmatrix} e_y(y_k,u_k)^*\lambda_k + J_y(y_k,u_k) \\ e_u(y_k,u_k)^*\lambda_k + J_u(y_k,u_k) \\ e(y_k,u_k) \end{pmatrix} = 0. \tag{5.3.4}
$$
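The SQP update (5.3.4) can be illustrated on a small example. The scalar problem below, with $J(y,u) = \frac12 (y-1)^2 + \frac{\beta}{2} u^2$ and $e(y,u) = y + y^3 - u$, is an assumption made only for this sketch, as are the names `kkt_residual` and `sqp_step`. Unlike the Newton method of Section 5.2, no feasibility step is taken, so all three components of the right-hand side are in general nonzero.

```python
import numpy as np

# Toy instance (assumed for illustration): scalar state y and control u,
#   J(y, u) = 0.5*(y - 1)**2 + 0.5*beta*u**2,   e(y, u) = y + y**3 - u.
BETA = 0.1

def kkt_residual(y, u, lam):
    """Right-hand side vector of the SQP system: (L_y, L_u, e)."""
    e_y, e_u = 1.0 + 3.0 * y**2, -1.0
    return np.array([e_y * lam + (y - 1.0),   # L_y = e_y^* lam + J_y
                     e_u * lam + BETA * u,    # L_u = e_u^* lam + J_u
                     y + y**3 - u])           # e(y, u)

def sqp_step(y, u, lam):
    """One SQP step: solve the KKT system for (dy, du, dlam) and update."""
    e_y, e_u = 1.0 + 3.0 * y**2, -1.0
    L_yy = 1.0 + 6.0 * lam * y                # J_yy + lam * e_yy
    K = np.array([[L_yy, 0.0,  e_y],
                  [0.0,  BETA, e_u],
                  [e_y,  e_u,  0.0]])
    dy, du, dlam = np.linalg.solve(K, -kkt_residual(y, u, lam))
    return y + dy, u + du, lam + dlam

x = (0.5, 0.5, 0.0)                           # infeasible starting guess
residuals = []
for _ in range(8):
    residuals.append(np.linalg.norm(kkt_residual(*x)))
    x = sqp_step(*x)
residuals.append(np.linalg.norm(kkt_residual(*x)))
```

The list `residuals` holds the norm of the optimality residual at each iterate and can be printed to watch the local convergence behavior on this example.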
If the values for $y_k$ and $\lambda_k$ are obtained by means of a feasibility step as in (iv) of the Newton algorithm, then the first and the last components on the right-hand side of (5.3.4) are 0 and we arrive at the Newton algorithm. This will be further investigated in Section 5.5 for weakly singular problems. Note that for affine constraints the iterates, except possibly the initialization, are feasible by construction, and hence $e(x_k) = 0$.

Let us note that given an iterate $x_k = (y_k, u_k)$ near $x^*$ the SQP step for (5.3.1) is also obtained as the necessary optimality condition for the quadratic subproblem
$$
\begin{cases}
\min\ \langle L'(x_k, \lambda_k), h \rangle + \frac12 L''(x_k, \lambda_k)(h, h)\\
\text{subject to } e'(x_k) h + e(x_k) = 0, \quad h \in X,
\end{cases} \tag{5.3.5}
$$
for $h_k = (\delta y_k, \delta u_k)$ and setting $x_{k+1} = x_k + h_k$. If (5.3.1) admits a solution $(x^*, \lambda^*)$ and $J$ and $e$ are $C^2$ in a neighborhood of $x^*$ with Lipschitz continuous second derivatives, and if further $e'(x^*)$ is surjective and there exists $\kappa > 0$ such that
$$
L''(x^*, \lambda^*)(h, h) \ge \kappa |h|_X^2 \quad \text{for all } h \in \ker e'(x^*),
$$
then
$$
|(x_{k+1}, \lambda_{k+1}) - (x^*, \lambda^*)| \le K |(x_k, \lambda_k) - (x^*, \lambda^*)|^2
$$
for a constant $K$ independent of $k$, provided that $|(x_0, \lambda_0) - (x^*, \lambda^*)|$ is sufficiently small. Thus the SQP method is locally quadratically convergent. The proof is quite similar to the one of Theorem 6.4.

Assume next that there exists a null-space representation of $e'(x)$ for all $x$ in a neighborhood of $x^*$; i.e., there exist a Hilbert space $H$ and an isomorphism $T(x)$ from $H$ onto $\ker e'(x)$, where $\ker e'(x)$ is endowed with the norm of $X$. Then (5.3.2) and (5.3.3) can be expressed as
$$
T(x^*)^* J'(x^*) = 0 \tag{5.3.6}
$$
and
$$
T(x^*)^* L''(x^*, \lambda^*)\, T(x^*) \ge \sigma I, \tag{5.3.7}
$$
respectively. Referring to (5.3.5), let $q_k \in \ker e'(x_k)^\perp \subset X$ satisfy $e'(x_k) q_k + e(x_k) = 0$. Then $h_k \in X$ can be expressed as $h_k = q_k + T(x_k) w$. Thus, (5.3.5) is reduced to
$$
\min\ \langle L'(x_k, \lambda_k), T(x_k) w \rangle + \langle T(x_k) w, L''(x_k, \lambda_k) q_k \rangle + \frac12 \langle T(x_k) w, L''(x_k, \lambda_k) T(x_k) w \rangle, \tag{5.3.8}
$$
and the solution $h_k$ to (5.3.5) can be expressed as
$$
h_k = q_k + T(x_k) w_k, \tag{5.3.9}
$$
where $w_k$ is a solution to the unconstrained problem (5.3.8) in $H$, given by
$$
T^*(x_k) L''(x_k, \lambda_k) T(x_k) w_k = -T^*(x_k)\big(J'(x_k) - L''(x_k, \lambda_k) R(x_k) e(x_k)\big). \tag{5.3.10}
$$
Therefore the full SQP step is decomposed into a minimization step in $H$, a restoration step to the linearized equation $e'(x_k) q + e(x_k) = 0$, and an update of the Lagrange multiplier according to
$$
e'(x_k)^* \delta\lambda = -e'(x_k)^* \lambda_k - J'(x_k) - L''(x_k, \lambda_k) h_k.
$$
If $e'(x_k) \in L(X, W)$ admits a right inverse $R(x_k) \in L(W, X)$ satisfying $e'(x_k) R(x_k) = I_W$, then $q_k = -R(x_k) e(x_k)$ in (5.3.9) and
$$
h_k = -R_k e(x_k) - T_k \big(T_k^* L''(x_k, \lambda_k) T_k\big)^{-1} T_k^* \big(J'(x_k) - L''(x_k, \lambda_k) R_k e(x_k)\big), \tag{5.3.11}
$$
where $T_k = T(x_k)$ and $R_k = R(x_k)$ and we used that $T^* w = 0$ for $w \in \operatorname{range} e'(x)^*$. Note that a right inverse to $e'(x)$ for $x$ in a neighborhood of $x^*$ exists if $e'(x^*)$ is surjective and $x \to e'(x)$ is continuous.

An alternative to deriving the update $w_k$ is given by differentiating $T(x)^* J'(x) = T(x)^* L'(x, \lambda)$ with respect to $x$, evaluating at $x = x^*$, $\lambda = \lambda^*$, and using (5.3.2):
$$
\frac{d}{dx}\big(T(x^*)^* J'(x^*)\big) = T(x^*)^* L''(x^*, \lambda^*).
$$
This representation holds in general only at the solution $x^*$. But if we use its structure for an update of the form $h_k = q_k + T(x_k) w_k$ in a Newton step to (5.3.6), we arrive at
$$
T_k^* L''(x_k, \lambda_k) h_k = T_k^* L''(x_k, \lambda_k)\big(T_k w_k - R(x_k) e(x_k)\big) = -T_k^* J'(x_k),
$$
which results in an update as in (5.3.10) above.

In the reduced SQP approach the term $L''(x_k, \lambda_k) R_k e(x_k)$ is deleted from the expression in (5.3.11). Recall that for Newton's method this term vanishes since $e(x_k) = 0$ at each iteration level. This results in
$$
T_k^* L''(x_k, \lambda_k) T(x_k) w_k = -T_k^* J'(x_k). \tag{5.3.12}
$$
This equation for the update of the control coincides with the Schur complement form of the Newton update given in (5.2.13), since
$$
L_u(x_k, \lambda_k) = e_u^*(x_k) \lambda_k + J_u(x_k) = T_k^* J'(x_k),
$$
where we used the fact that in the Newton method $e_y^*(x_k)\lambda_k + J_y(x_k) = 0$. The updates for the state and the adjoint state differ, however, for the Newton and the reduced SQP methods.

A second distinguishing feature for reduced SQP methods is that often the reduced Hessians $T_k^* L''(x_k, \lambda_k) T_k$ are approximated by invertible operators $B_k \in L(H)$ suggested by secant methods. Thus a reduced SQP step has the form $x_{k+1} = x_k + h_k^{\mathrm{RED}}$, where
$$
h_k^{\mathrm{RED}} = -R_k e(x_k) - T_k B_k^{-1} T_k^* J'(x_k). \tag{5.3.13}
$$
For the reduced SQP method the step $h_k^{\mathrm{RED}}$ in general depends on the specific choice of the null-space representation and the right inverse. In the full SQP method the step $h_k$ in (5.3.11) is invariant with respect to these choices.

A third distinguishing feature of reduced SQP methods is the choice of the Lagrange multiplier, which is required to specify the update of $B_k$. The $\lambda$-update is typically not derived from the first equation in (5.3.4). From the first order condition we have
$$
\langle J'(x^*), R(x^*) h \rangle + \langle \lambda^*, e'(x^*) R(x^*) h \rangle = \langle J'(x^*), R(x^*) h \rangle + \langle \lambda^*, h \rangle = 0 \quad \text{for all } h \in W.
$$
This suggests the $\lambda$-update
$$
\lambda^+ = -R(x)^* J'(x). \tag{5.3.14}
$$
Other choices for the Lagrange multiplier update are possible. For convergence proofs of reduced SQP methods this update is required to be locally Lipschitz continuous; see [Kup, KuSa].

In the case of problem (P), a vector $(\delta y, \delta u) \in X$ lies in the null-space of $e'(x)$ if
$$
e_y(x)\delta y + e_u(x)\delta u = 0.
$$
Assuming that $e_y(x) : Y \to W$ is invertible, we have
$$
\ker e'(x) = \{(\delta y, \delta u) : \delta y = -e_y(x)^{-1} e_u(x)\, \delta u\}.
$$
This suggests choosing $H = U$ and using the following definitions for the null-space representation and the right inverse:
$$
T(x) = (-e_y(x)^{-1} e_u(x),\ I), \qquad R(x) = (e_y(x)^{-1},\ 0). \tag{5.3.15}
$$
In this case a reduced SQP step can be decomposed as follows:
(i) solve $B_k \delta u = -T(x_k)^* J'(x_k)$,
(ii) solve $e_y(x_k)\delta y = -e_u(x_k)\delta u - e(x_k)$,
(iii) set $x_{k+1} = x_k + (\delta y, \delta u)$,
(iv) update $\lambda$.
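Steps (i)–(iv) can be sketched for a scalar model problem with $T$ and $R$ chosen as in (5.3.15). The problem data ($J(y,u) = \frac12 (y-1)^2 + \frac{\beta}{2} u^2$, $e(y,u) = y + y^3 - u$) and the function name `reduced_sqp_step` are assumptions made for illustration only. As a further simplifying assumption, $B_k$ is taken to be the exact reduced Hessian rather than a secant approximation, and the multiplier is chosen according to (5.3.14) at the current iterate.

```python
# Toy instance (assumed for illustration): scalar state y and control u,
#   J(y, u) = 0.5*(y - 1)**2 + 0.5*beta*u**2,   e(y, u) = y + y**3 - u.
BETA = 0.1

def reduced_sqp_step(y, u):
    """One reduced SQP step; returns the new iterate and two residual measures."""
    e_val = y + y**3 - u
    e_y, e_u = 1.0 + 3.0 * y**2, -1.0
    J_y, J_u = y - 1.0, BETA * u
    lam = -J_y / e_y                          # (iv) lambda = -R^* J', R = (e_y^{-1}, 0)
    red_grad = e_u * lam + J_u                # T^* J' with T = (-e_y^{-1} e_u, 1)
    t_y = -e_u / e_y
    B = t_y * (1.0 + 6.0 * lam * y) * t_y + BETA   # here: exact reduced Hessian
    du = -red_grad / B                        # (i)  B du = -T^* J'
    dy = -(e_u * du + e_val) / e_y            # (ii) e_y dy = -e_u du - e
    return y + dy, u + du, abs(red_grad), abs(e_val)   # (iii)

y, u = 0.5, 0.5                               # infeasible starting guess
history = []
for _ in range(20):
    y, u, g, c = reduced_sqp_step(y, u)
    history.append((g, c))
```

Each entry of `history` contains the reduced gradient norm $|T^* J'|$ and the constraint violation $|e|$ at the iterate before the step, so both optimality and feasibility can be monitored.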
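The update of the Lagrange multiplier $\lambda$ is needed for the approximation $B_k$ to the reduced Hessian $T^*(x_k) L''(x_k, \lambda_k) T(x_k)$. A popular choice is given by the BFGS-update formula
$$
B_{k+1} = B_k + \frac{1}{\langle z_k, \delta u_k \rangle} \langle z_k, \cdot \rangle z_k - \frac{1}{\langle \delta u_k, B_k \delta u_k \rangle} \langle B_k \delta u_k, \cdot \rangle B_k \delta u_k,
$$
where
$$
z_k = T^*(x_k)\big(L'(x_k + T(x_k)\delta u_k, \lambda_k) - J'(x_k)\big).
$$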
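In finite dimensions the BFGS formula above reduces to a rank-two matrix update. The following sketch, with the hypothetical function name `bfgs_update` and made-up test vectors, shows the update and the secant property $B_{k+1}\delta u_k = z_k$ that it enforces; the vectors are assumptions, not data from the text.

```python
import numpy as np

def bfgs_update(B, du, z):
    """B_{k+1} = B + z z^T / <z, du> - (B du)(B du)^T / <du, B du>."""
    Bdu = B @ du
    return B + np.outer(z, z) / (z @ du) - np.outer(Bdu, Bdu) / (du @ Bdu)

B0 = np.eye(2)                     # initial reduced Hessian approximation
du = np.array([1.0, 0.5])          # control update (illustrative values)
z = np.array([2.0, 1.5])           # gradient difference; <z, du> = 2.75 > 0
B1 = bfgs_update(B0, du, z)
```

Provided that $B_k$ is symmetric positive definite and $\langle z_k, \delta u_k \rangle > 0$, the updated matrix remains symmetric positive definite, which keeps step (i) of the reduced SQP method well defined.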
Each SQP step requires at least three linear system solves, one in $W^*$ for the evaluation of $T(x_k)^* J'(x_k)$, one in $U$ for $\delta u$, and another one in $Y$ for $\delta y$. The update of the BFGS formula requires one additional system solve. The typical convergence behavior that can be proved for reduced SQP methods with BFGS update is local two-step superlinear convergence [Kup, KuSa, KSa],
$$
\lim_{k \to \infty} \frac{|x_{k+1} - x^*|}{|x_{k-1} - x^*|} = 0,
$$
provided that
$$
B_0 - T(x^*)^* L''(x^*, \lambda^*) T(x^*) \quad \text{is compact.} \tag{5.3.16}
$$
Here $B_0$ denotes the initialization to the reduced Hessian approximation, and $(x^*, \lambda^*)$ is a solution to (5.1.4). If $e$ is $C^1$ at $x^*$ and $e_y(x^*)$ is continuously invertible and compact, then (5.3.16) holds for the choice $B_0 = L_{uu}(x^*, \lambda^*)$. In [HiK2], condition (5.3.16) is analyzed for a class of optimal control problems related to the stationary Navier–Stokes equations. We recall that condition (5.3.16) is related to an analogous condition in the context of solving nonlinear equations $F(x) = 0$ in infinite-dimensional Hilbert spaces by means of secant methods, in particular the Broyden method. The initialization of the Jacobian must be a compact perturbation of the linearization of $F$ at the solution to ascertain local Q-superlinear convergence [Grie, Sa].

If we deal with finite-dimensional problems with $e$ a mapping from $\mathbb{R}^N$ into $\mathbb{R}^M$, with $M < N$, then a common null-space representation is provided by the QR decomposition. Let $H = \mathbb{R}^{N-M}$ and
$$
e'(x)^T = Q(x) R(x) = \big(Q_1(x)\ \ Q_2(x)\big) \begin{pmatrix} R_1(x) \\ 0 \end{pmatrix},
$$
where $Q(x)$ is orthogonal, $Q_1(x) \in \mathbb{R}^{N \times M}$, $Q_2(x) \in \mathbb{R}^{N \times (N-M)}$, and $R_1(x) \in \mathbb{R}^{M \times M}$ is an upper triangular matrix. Then the null-space representation and the right inverse to $e'(x)$ are given by
$$
T(x) = Q_2(x) \in \mathbb{R}^{N \times (N-M)} \quad \text{and} \quad R(x) = Q_1(x)(R_1^T(x))^{-1} \in \mathbb{R}^{N \times M}.
$$
In the finite-dimensional setting one often works with the "least squares solution" for the $\lambda$-update, i.e., with the solution to
$$
\min_\lambda |J'(x) + e'(x)^* \lambda|,
$$
which results in
$$
\lambda(x)^+ = -R_1^{-1}(x) Q_1^T(x) J'(x).
$$
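The QR-based construction can be checked numerically. The sketch below uses NumPy's `numpy.linalg.qr` with `mode="complete"`; the random Jacobian `E` and gradient `grad_J`, as well as the helper names `qr_nullspace` and `multiplier_estimate`, are hypothetical stand-ins for $e'(x)$ and $J'(x)$.

```python
import numpy as np

def qr_nullspace(E):
    """E is the M x N Jacobian e'(x), assumed to have full row rank (M < N)."""
    M, N = E.shape
    Q, R_full = np.linalg.qr(E.T, mode="complete")   # E^T = Q R
    Q1, Q2 = Q[:, :M], Q[:, M:]
    R1 = R_full[:M, :]                               # upper triangular, M x M
    T = Q2                                           # columns span ker e'(x)
    R = np.linalg.solve(R1, Q1.T).T                  # right inverse Q1 (R1^T)^{-1}
    return T, R, Q1, R1

def multiplier_estimate(Q1, R1, grad_J):
    """Least squares multiplier: lambda = -R1^{-1} Q1^T J'(x)."""
    return -np.linalg.solve(R1, Q1.T @ grad_J)

rng = np.random.default_rng(0)
E = rng.standard_normal((2, 5))        # M = 2 constraints, N = 5 unknowns
grad_J = rng.standard_normal(5)
T, R, Q1, R1 = qr_nullspace(E)
lam = multiplier_estimate(Q1, R1, grad_J)
```

Here `E @ T` vanishes and `E @ R` is the identity up to rounding, which are the two defining properties of the null-space representation and the right inverse.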
To derive another interesting feature of a variant of a reduced SQP method we recapture the derivation from above for (P) with $T$ and $R$ chosen as in (5.3.15), as
$$
\begin{cases}
T(x)^* L''(x, \lambda) T(x) w = -T(x)^* L'(x, \lambda),\\
h = (\delta y, \delta u) = q + T(x) w, \ \text{ where } q + R(x) e(x) = 0,\\
\lambda + \delta\lambda = -R(x)^* J'(x).
\end{cases} \tag{5.3.17}
$$
In particular, the term $L''(x_k, \lambda_k) R_k e(x_k)$ is again deleted from the expression in (5.3.10) and the Lagrange multiplier update is chosen according to (5.3.14). However, in contrast to (5.3.13), the reduced Hessian is not approximated in (5.3.17). Here we do not indicate the iteration index $k$, and we assume that $e_y(x) : Y \to W$ is bijective and that (5.3.7) holds. Then $(q, w, \delta\lambda)$ is a solution to (5.3.17) if and only if $(\delta y, \delta u, \delta\lambda)$ is a solution to the system
$$
\begin{pmatrix}
0 & 0 & e_y(x)^* \\
0 & T(x)^* L''(x, \lambda) T(x) & e_u(x)^* \\
e_y(x) & e_u(x) & 0
\end{pmatrix}
\begin{pmatrix} \delta y \\ \delta u \\ \delta\lambda \end{pmatrix}
= - \begin{pmatrix} L_y(x, \lambda) \\ L_u(x, \lambda) \\ e(x) \end{pmatrix}, \tag{5.3.18}
$$
with $w = \delta u$, $q = -R(x) e(x)$, and $(\delta y, \delta u) = q + T(x)\delta u$. In fact, $R^*(x)$ is a left inverse to $e'(x)^*$, and from the first equation in (5.3.18) we have
$$
\delta\lambda = -(e_y(x)^*)^{-1}\big(J_y(x) + e_y^*(x)\lambda\big) = -R(x)^* J'(x) - \lambda.
$$
From the second equation
$$
\begin{aligned}
T(x)^* L''(x, \lambda) T(x)\delta u &= -L_u(x, \lambda) - e_u(x)^* \delta\lambda\\
&= -L_u(x, \lambda) + e_u(x)^* e_y(x)^{-*} L_y(x, \lambda) = -T(x)^* L'(x, \lambda) = -T(x)^* J'(x).
\end{aligned}
$$
The third equation is equivalent to
$$
(\delta y, \delta u) = -R(x) e(x) + T(x)\delta u.
$$
System (5.3.18) should be compared to (5.3.4). We note that the reduced SQP step is equivalent to a block tridiagonal system, where the "$L_{uu}$" element in the system matrix is replaced by the reduced Hessian while the elements corresponding to "$L_{yy}$" and "$L_{yu}$" are zero. The system matrix of (5.3.18) can be used advantageously as preconditioning for iteratively solving (5.3.4). In fact, let us denote the system matrices in (5.3.4) and (5.3.18) by $S$ and $S_{\mathrm{red}}$, respectively, and consider the iteration
$$
\delta z_{n+1} = \delta z_n - S_{\mathrm{red}}^{-1}\big(S\, \delta z_n + \operatorname{col}(L_y(x, \lambda), L_u(x, \lambda), e(x))\big). \tag{5.3.19}
$$
In [IKSG] it was proved that the iteration matrix $I - S_{\mathrm{red}}^{-1} S$ is nilpotent of degree 3. Hence the iteration (5.3.19) converges in 3 steps to the solution of (5.3.4) with $(x_k, \lambda_k) = (x, \lambda)$.
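The nilpotency of $I - S_{\mathrm{red}}^{-1} S$ and the resulting three-step termination of (5.3.19) can be observed numerically in a finite-dimensional setting. The sketch below assembles both system matrices from randomly generated blocks; all data and dimensions are hypothetical and only the block structure follows (5.3.4) and (5.3.18).

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 2                                    # state and control dimensions (assumed)
e_y = np.eye(n) + 0.1 * rng.standard_normal((n, n))
e_u = rng.standard_normal((n, m))
G = rng.standard_normal((n + m, n + m))
H = G @ G.T + np.eye(n + m)                    # symmetric positive definite stand-in for L''
L_yy, L_yu, L_uu = H[:n, :n], H[:n, n:], H[n:, n:]

# Reduced Hessian T^* L'' T with T = (-e_y^{-1} e_u, I).
T = np.vstack([-np.linalg.solve(e_y, e_u), np.eye(m)])
S_hat = T.T @ H @ T

# System matrices of (5.3.4) and (5.3.18).
S = np.block([[L_yy,   L_yu, e_y.T],
              [L_yu.T, L_uu, e_u.T],
              [e_y,    e_u,  np.zeros((n, n))]])
S_red = np.block([[np.zeros((n, n)), np.zeros((n, m)), e_y.T],
                  [np.zeros((m, n)), S_hat,            e_u.T],
                  [e_y,              e_u,              np.zeros((n, n))]])

N = np.eye(2 * n + m) - np.linalg.solve(S_red, S)   # iteration matrix
N_cubed = np.linalg.matrix_power(N, 3)

# Iteration (5.3.19) for a random right-hand side col(L_y, L_u, e).
b = rng.standard_normal(2 * n + m)
dz = np.zeros(2 * n + m)
for _ in range(3):
    dz = dz - np.linalg.solve(S_red, S @ dz + b)
dz_exact = np.linalg.solve(S, -b)
```

After three sweeps `dz` agrees with the direct solution `dz_exact` up to rounding, while each sweep only requires solves with the block triangular structure of $S_{\mathrm{red}}$.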
5.4
Optimal control of the Navier–Stokes equations
In this section we consider the optimal control problem
$$
\begin{cases}
\min J(y, u) = \int_0^T \big(\ell(y(t)) + h(u(t))\big)\, dt \quad \text{over } u \in U = L^2(0, T; \tilde U)\\
\text{subject to}\\
\frac{d}{dt} y(t) = A_0 y(t) + F(y(t)) + B u(t) \quad \text{for } t \in (0, T],\\
y(0) = y_0,
\end{cases} \tag{5.4.1}
$$
where $A_0$ is a densely defined, self-adjoint operator in a Hilbert space $H$ which satisfies $(-A_0\phi, \phi)_H \ge \alpha|\phi|_H^2$ for some $\alpha > 0$ independent of $\phi \in \operatorname{dom} A_0$. We set $V = \operatorname{dom}((-A_0)^{1/2})$ endowed with $|\phi|_V = |(-A_0)^{1/2}\phi|_H$ as norm. The $V$-coercive form $(v, w) \to ((-A_0)^{1/2} v, (-A_0)^{1/2} w)$ also defines an operator $A \in L(V, V^*)$ satisfying $\langle -Av, w \rangle_{V^*,V} = ((-A_0)^{1/2} v, (-A_0)^{1/2} w)_H$ for all $v, w \in V$. Since $A$ and $A_0$ coincide on $\operatorname{dom} A_0$, and $\operatorname{dom} A_0$ is dense in $V$, we shall not distinguish between $A$ and $A_0$ as operators in $L(V, V^*)$. In (5.4.1), moreover, $\tilde U$ is the Hilbert space of controls, $B \in L(\tilde U, V^*)$, and $y_0 \in H$. We assume that $h \in C^1(\tilde U, \mathbb{R})$ with Lipschitz continuous first derivative and $\ell \in C^1(V, \mathbb{R})$, with $\ell'$ Lipschitz continuous on bounded subsets of $H$. The nonlinearity $F$ is supposed to satisfy

(H1) $F : V \to V^*$ is continuously differentiable and there exists a constant $c > 0$ such that for every $\varepsilon > 0$
$$
\begin{aligned}
\langle F(y) - F(\bar y), y - \bar y \rangle_{V^*,V} &\le \varepsilon |y - \bar y|_V^2 + \frac{c}{\varepsilon}\big(|\bar y|_V^2 + |y|_H^2 |y|_V^2\big)|y - \bar y|_H^2,\\
\langle F'(\bar y) y, y \rangle_{V^*,V} &\le \varepsilon |y|_V^2 + \frac{c}{\varepsilon} |y|_H^2 |\bar y|_V^2 (1 + |\bar y|_H^2),\\
|(F'(y) - F'(\bar y)) v|_{V^*} &\le c\, |y - \bar y|_H^{1/2} |y - \bar y|_V^{1/2} |v|_H^{1/2} |v|_V^{1/2}
\end{aligned}
$$
for all $y$, $\bar y$, and $v$ in $V$.

(H2) For any $u \in L^2(0, T; \tilde U)$ there exists a unique weak solution $y = y(u)$ in $W(0, T) = L^2(0, T; V) \cap H^1(0, T; V^*)$ satisfying
$$
\Big\langle \frac{d}{dt} y(t), \psi \Big\rangle_{V^*,V} = \langle A_0 y(t) + F(y(t)) + B u(t), \psi \rangle_{V^*,V}
$$
for all $\psi \in V$, and $y(0) = y_0$. Moreover $\{y(u) : |u|_{L^2(0,T;\tilde U)} \le r\}$ is bounded in $W(0, T)$ for each $r > 0$.
With these preliminaries the dynamical system in (5.4.1) can be considered with values in V ∗ . The conditions on A0 and F are motivated by the two-dimensional incompressible
Navier–Stokes equations with distributed control given by
$$
\begin{cases}
\frac{d}{dt} y + (y \cdot \nabla) y + \nabla p = \nu \Delta y + B u & \text{in } (0, T] \times \Omega,\\
\nabla \cdot y = 0 & \text{in } (0, T] \times \Omega,\\
y(0, \cdot) = y_0 & \text{in } \Omega,
\end{cases} \tag{5.4.2}
$$
where $\Omega$ is a bounded domain with Lipschitz continuous boundary $\partial\Omega$, $y = y(t, x) \in \mathbb{R}^2$ is the velocity field, $p = p(t, x) \in \mathbb{R}$ is the pressure, and $\nu$ is the normalized viscosity. Let
$$
V = \{\phi \in H_0^1(\Omega)^2 : \nabla \cdot \phi = 0\}, \qquad H = \{\phi \in L^2(\Omega)^2 : \nabla \cdot \phi = 0,\ n \cdot \phi = 0 \text{ on } \partial\Omega\},
$$
where $n$ is the outer unit normal vector to $\partial\Omega$, let $\Delta$ denote the Laplacian in $H$, and let $P_F$ denote the orthogonal projection of $L^2(\Omega)^2$ onto the closed subspace $H$. Then $A_0 = \nu P_F \Delta$ is the Stokes operator in $H$. It is a self-adjoint operator with domain $\operatorname{dom}((-A_0)^{1/2}) = V$ and $-(A_0\phi, \psi)_H = \nu(\nabla\phi, \nabla\psi)_{L^2}$ for all $\phi \in \operatorname{dom}(A_0)$, $\psi \in V$. If $\partial\Omega$ is sufficiently smooth, e.g., $\partial\Omega$ is $C^2$, then $\operatorname{dom}(A_0) = H^2(\Omega)^2 \cap V$. The nonlinearity in (5.4.2) satisfies the following properties:
$$
\int_\Omega (u \cdot \nabla v) \cdot w \, dx = -\int_\Omega (u \cdot \nabla w) \cdot v \, dx, \tag{5.4.3}
$$
$$
\int_\Omega (u \cdot \nabla v) \cdot w \, dx \le c\, |u|_H^{1/2} |u|_V^{1/2} |v|_V |w|_H^{1/2} |w|_V^{1/2} \tag{5.4.4}
$$
for a constant $c$ independent of $u, v, w \in V$; see, e.g., [Te]. In terms of the general formulation (5.4.1) the nonlinearity $F : V \to V^*$ is given by
$$
\langle F(\phi), v \rangle_{V^*,V} = -\int_\Omega (\phi \cdot \nabla)\phi \cdot v \, dx.
$$
It is well defined due to (5.4.4). Moreover (H1) follows from (5.4.3) and (5.4.4). From the theory of variational solutions to the Navier–Stokes equations it follows that there exists a constant $C$ such that for all $y_0 \in H$ and $u \in L^2(0, T; \tilde U)$ there exists a unique solution $y \in W(0, T)$ such that
$$
|y|_{C(0,T;H)} + |y|_{W(0,T)} \le C\big(|y_0|_H + |u|_{L^2(0,T;\tilde U)} + |y_0|_H^2 + |u|_{L^2(0,T;\tilde U)}^2\big),
$$
where $|y|_{W(0,T)} = |y|_{L^2(0,T;V)} + |\frac{d}{dt} y|_{L^2(0,T;V^*)}$ and we recall that $W(0, T)$ is embedded continuously into $C([0, T]; H)$; see, e.g., [LiMa, Tem]. Assuming the existence of a local solution to (5.4.1) we shall present in the following subsections first and second order optimality conditions, as well as the steps which are necessary to realize the Newton algorithm.

Control and especially optimal control have received a considerable amount of attention in the literature. Here we only mention a few [Be, FGH, Gu, HiK2] and refer the reader to further references given there.
5.4.1
Necessary optimality condition
Let $x^* = (y^*, u^*)$ denote a local solution to (5.4.1). We shall derive a first order optimality condition by verifying the assumptions of Theorem 5.1 and applying (5.1.4). In the context of Section 5.1 we set
$$
\begin{aligned}
&Y = W(0, T) = L^2(0, T; V) \cap H^1(0, T; V^*), \quad W = L^2(0, T; V^*) \times H, \quad U = L^2(0, T; \tilde U),\\
&J(y, u) = \int_0^T \big(\ell(y(t)) + h(u(t))\big)\, dt, \quad e(y, u) = (y_t - A_0 y - F(y) - B u,\ y(0) - y_0).
\end{aligned}
$$
To verify (C1)–(C3) with $(y, u) = (y^*, u^*)$, let $V(y^*) \times V(u^*)$ denote a bounded neighborhood of $(y^*, u^*)$. Let $V(y^*)$ be chosen such that $y(u) \in V(y^*)$ for every $u \in V(u^*)$. This is possible by (H2). Since $Y = W(0, T)$ is embedded continuously into $C([0, T]; H)$, the continuity assumptions for $\ell$ and $h$ imply the continuity requirements on $J$ in (C1). Note that $e'(y, u) : Y \times U \to W$ is given by
$$
e'(y, u)(\delta y, \delta u) = ((\delta y)_t - A_0 \delta y - F'(y)\delta y - B\delta u,\ \delta y(0)),
$$
and global Lipschitz continuity of $(y, u) \to e'(y, u)$ from $V(y^*) \times V(u^*) \subset Y \times U$ to $L(Y \times U, W)$ follows from the last condition in (H1). Solvability of $e(y, u) = 0$ with respect to $y$ for given $u$ follows from (H2) and hence (C1) holds. Since, by (H1), the bounded bilinear form $a(t; \cdot, \cdot) : V \times V \to \mathbb{R}$ defined by
$$
t \to a(t; \phi, \psi) = -\langle A_0\phi, \psi \rangle_{V^*,V} - \langle F'(y^*(t))\phi, \psi \rangle_{V^*,V}
$$
satisfies
$$
a(t; \phi, \phi) \ge |\phi|_V^2 - \varepsilon|\phi|_V^2 - \frac{c}{\varepsilon} |\phi|_H^2 |y^*(t)|_V^2 \big(1 + |y^*|_{C(0,T;H)}^2\big),
$$
there exists a constant $\bar c > 0$ such that
$$
a(t; \phi, \phi) \ge \frac12 |\phi|_V^2 - \bar c\, |y^*(t)|_V^2 |\phi|_H^2 \quad \text{for } t \in (0, T),
$$
where $|y^*|_V^2 \in L^1(0, T)$. Consequently, the adjoint equation
$$
-\frac{d}{dt} p^*(t) = A_0 p^*(t) + F'(y^*(t))^* p^*(t) + \ell'(y^*(t)), \qquad p^*(T) = 0 \tag{5.4.5}
$$
admits a unique solution $p^* \in W(0, T)$, with
$$
|p^*(t)|_H^2 + \int_t^T |p^*(s)|_V^2 \, ds \le e^{C \int_0^T |y^*(s)|_V^2 \, ds} \int_t^T |\ell'(y^*(s))|_{V^*}^2 \, ds \tag{5.4.6}
$$
for a constant $C$ independent of $t \in [0, T]$. In fact,
$$
-\frac12 \frac{d}{dt} |p^*(t)|_H^2 + |p^*(t)|_V^2 \le |\langle F'(y^*(t)) p^*(t), p^*(t) \rangle_{V^*,V}| + |\ell'(y^*(t))|_{V^*} |p^*(t)|_V.
$$
Hence by (H1) there exists a constant $C$ such that
$$
-\frac{d}{dt} |p^*(t)|_H^2 - C |y^*(t)|_V^2 |p^*(t)|_H^2 + |p^*(t)|_V^2 \le |\ell'(y^*(t))|_{V^*}^2.
$$
Multiplying by $\exp(-\int_t^T \bar\rho(s)\, ds)$, where $\bar\rho(s) = C |y^*(s)|_V^2$, we find
$$
|p^*(t)|_H^2 + \int_t^T |p^*(s)|_V^2 \exp\Big(\int_t^s \bar\rho(\tau)\, d\tau\Big) ds \le \int_t^T |\ell'(y^*(s))|_{V^*}^2 \exp\Big(\int_t^s \bar\rho(\tau)\, d\tau\Big) ds
$$
and (5.4.6) follows. This implies (C2) with $\lambda^* = (p^*, p^*(0))$. For any $y \in W(0, T)$ we have
$$
\frac{d}{dt} |y(t)|_H^2 = 2 \Big\langle \frac{d}{dt} y(t), y(t) \Big\rangle_{V^*,V} \quad \text{for } t \in (0, T). \tag{5.4.7}
$$
Let $u \in V(u^*)$ and denote by $y = y(u) \in V(y^*)$ the solution to the dynamical system in (5.4.1). From (5.4.7) we have
$$
\begin{aligned}
&\frac12 \frac{d}{dt} |y(t) - y^*(t)|_H^2 + |y(t) - y^*(t)|_V^2\\
&\quad = \langle F(y(t)) - F(y^*(t)), y(t) - y^*(t) \rangle_{V^*,V} + \langle B u(t) - B u^*(t), y(t) - y^*(t) \rangle_{V^*,V}.
\end{aligned}
$$
By (H2) the set $\{y(u) : u \in V(u^*)\}$ is bounded in $W(0, T)$. Hence by (H1) there exists a constant $C$ independent of $u \in V(u^*)$ and $t \in [0, T]$ such that
$$
\frac{d}{dt} |y(t) - y^*(t)|_H^2 + |y(t) - y^*(t)|_V^2 \le C\big(\rho(t) |y(t) - y^*(t)|_H^2 + |u(t) - u^*(t)|_{\tilde U}^2\big),
$$
where $\rho(t) = |y(t)|_V^2 + |y^*(t)|_V^2$. By Gronwall's inequality this implies that
$$
|y(t) - y^*(t)|_H^2 + \int_0^t |y(s) - y^*(s)|_V^2 \, ds \le C \exp\Big(C \int_0^T \rho(\tau)\, d\tau\Big) \int_0^t |u(s) - u^*(s)|^2 \, ds,
$$
where $\int_0^T \rho(\tau)\, d\tau$ is bounded independent of $u \in V(u^*)$. Utilizing the equations satisfied by $y(u)$ and $y(u^*)$ it follows that
$$
|y(u) - y(u^*)|_{W(0,T)} \le \hat C |u - u^*|_{L^2(0,T;\tilde U)} \tag{5.4.8}
$$
for a constant $\hat C$ independent of $u \in V(u^*)$. Hence (C3) follows and Theorem 5.1 implies the optimality system
$$
\begin{aligned}
&\frac{d}{dt} y^*(t) = A_0 y^*(t) + F(y^*(t)) + B u^*(t), \quad y^*(0) = y_0,\\
&-\frac{d}{dt} p^*(t) = A_0 p^*(t) + F'(y^*(t))^* p^*(t) + \ell'(y^*(t)), \quad p^*(T) = 0,\\
&B^* p^*(t) + h'(u^*(t)) = 0.
\end{aligned}
$$
5.4.2
Sufficient optimality condition
We assume $J$ to be of the form
$$
J(y, u) = \frac12 \int_0^T \big(Q(y(t) - \bar y(t)), y(t) - \bar y(t)\big)_H \, dt + \frac{\beta}{2} \int_0^T |u(t)|_{\tilde U}^2 \, dt,
$$
where $Q$ is symmetric and positive semidefinite on $H$, $\bar y \in L^2(0, T; H)$, and $\beta > 0$. We shall verify the requirements in (II) of Section 5.1. By (5.4.8) it follows that
$$
E_1(y(u) - y^*, u - u^*) \ge \alpha\big(|y(u) - y^*|_{W(0,T)}^2 + |u - u^*|_{L^2(0,T;\tilde U)}^2\big) \quad \text{for all } u \in V(u^*). \tag{5.4.9}
$$
Moreover, with $y = y(u)$,
$$
\begin{aligned}
&|e(y(u), u) - e(y^*, u^*) - e'(y^*, u^*)(y(u) - y^*, u - u^*)|_{L^2(0,T;V^*) \times H}\\
&\quad = |F(y(u)) - F(y^*) - F'(y^*)(y(u) - y^*)|_{L^2(0,T;V^*)}\\
&\quad = \Big(\int_0^T |F(y) - F(y^*) - F'(y^*)(y(u) - y^*)|_{V^*}^2 \, dt\Big)^{1/2}\\
&\quad \le c \Big(\int_0^T \Big(\int_0^1 s\, |y(t) - y^*(t)|_H |y(t) - y^*(t)|_V \, ds\Big)^2 dt\Big)^{1/2}\\
&\quad \le \frac{c}{\sqrt 2} |y - y^*|_{C(0,T;H)} |y - y^*|_{L^2(0,T;V)},
\end{aligned}
$$
and therefore
$$
|e(y(u), u) - e(y^*, u^*) - e'(y^*, u^*)(y(u) - y^*, u - u^*)|_{L^2(0,T;V^*) \times H} \le \bar C |y(u) - y^*|_{W(0,T)}^2 \tag{5.4.10}
$$
for a constant $\bar C$ independent of $u \in V(u^*)$. From (5.4.6)
$$
|p^*|_{L^2(0,T;V)} \le \exp\Big(\frac{C}{2} |y^*|_{L^2(0,T;V)}^2\Big) |Q(y^* - \bar y)|_{L^2(0,T;V^*)}. \tag{5.4.11}
$$
Combining (5.4.9)–(5.4.11) implies that if $|Q(y^* - \bar y)|_{L^2(0,T;V^*)}$ is sufficiently small, then $(y^*, u^*)$ is a strict local minimum.
5.4.3
Newton’s method for (5.4.1)
Here we specify the steps necessary to carry out one Newton iteration for the optimal control problem related to the Navier–Stokes equation. Let $u \in L^2(0, T; \tilde U)$ and let $y = y(u) \in W(0, T)$ denote the associated velocity field. For $\delta u \in L^2(0, T; \tilde U)$ let $\delta y$, $\lambda$, and $\mu$ in $W(0, T)$ denote the solutions to the sensitivity equation
$$
\frac{d}{dt} \delta y = A_0 \delta y + F'(y)\delta y + B\delta u, \qquad \delta y(0) = 0, \tag{5.4.12}
$$
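the adjoint equation
$$
-\frac{d}{dt} \lambda = A_0 \lambda + F'(y)^* \lambda + \ell'(y), \qquad \lambda(T) = 0, \tag{5.4.13}
$$
and the adjoint equation arising in the Hessian
$$
-\frac{d}{dt} \mu = A_0 \mu + F'(y)^* \mu + (F'(y)^* \lambda)' \delta y + \ell''(y)\delta y, \qquad \mu(T) = 0. \tag{5.4.14}
$$
Then the operators characterizing the Newton step
$$
\hat J''(u)\, \delta u = -\hat J'(u) \tag{5.4.15}
$$
can be expressed by
$$
\hat J'(u) = h'(u) + B^* \lambda \quad \text{and} \quad \hat J''(u)\delta u = h''(u)\delta u + B^* \mu.
$$
Let us also note that
$$
\begin{aligned}
-\langle F'(y)\delta y, \psi \rangle_{V^*,V} &= b(y, \delta y, \psi) + b(\delta y, y, \psi),\\
-\langle F'(y)^* \lambda, \psi \rangle &= b(y, \psi, \lambda) + b(\psi, y, \lambda),\\
-\langle (F'(y)^* \lambda)' \delta y, \psi \rangle &= b(\delta y, \psi, \lambda) + b(\psi, \delta y, \lambda)
\end{aligned}
$$
for all $\psi \in V$, where $b(u, v, w) = \int_\Omega (u \cdot \nabla) v \cdot w \, dx$. The evaluation of the gradient $\hat J'(u)$ requires one forward in time solve for $y(u)$ and one adjoint solve for $\lambda$. Each evaluation of the Hessian necessitates one forward solve of the sensitivity equation for $\delta y$ and one solve of the second adjoint equation (5.4.14) for $\mu$. If the approximation as in Remark 5.2.1 is used, then this results in not including the term $(F'(y)^* \lambda)' \delta y$ in the second adjoint equation and each evaluation requires us to solve the linear time-varying Hamiltonian system
$$
\begin{aligned}
\frac{d}{dt} \delta y &= A(t)\, \delta y + B\, \delta u, & \delta y(0) &= 0,\\
-\frac{d}{dt} \mu &= A(t)^* \mu + \ell''(y)\delta y, & \mu(T) &= 0,
\end{aligned}
$$
where $A(t) = A_0 + F'(y(t))$.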
5.5
Newton method for the weakly singular case
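In this section we discuss the Newton method for the weakly singular case as introduced in Section 1.5, and we utilize the notation given there. In particular $J : Y \times U \to \mathbb{R}$, $e : Y_1 \times U \to W$, with $e_y(y, u)$ considered as an operator in $Y$. Assuming that $(y, u, \lambda) \in Y_1 \times U \times \operatorname{dom}(e_y(y, u)^*)$, the SQP step for $(\delta y, \delta u, \delta\lambda)$ is given by
$$
\begin{pmatrix} L''(y, u, \lambda) & e'(y, u)^* \\ e'(y, u) & 0 \end{pmatrix}
\begin{pmatrix} \delta y \\ \delta u \\ \delta\lambda \end{pmatrix}
= - \begin{pmatrix} e_y(y, u)^* \lambda + J_y(y, u) \\ e_u(y, u)^* \lambda + J_u(y, u) \\ e(y, u) \end{pmatrix}.
$$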
The updates $(y + \delta y, u + \delta u, \lambda + \delta\lambda)$ will not necessarily remain in $Y_1 \times U \times \operatorname{dom}(e_y(y, u)^*)$ since it is not assumed that $e'$ is surjective from $Y_1 \times U$ to $W$. However, the feasibility steps consisting in solving the primal equation
$$
e(y, u^+) = 0 \tag{5.5.1}
$$
for $y = y^+$ and the adjoint equation
$$
e_y(y^+, u^+)^* \lambda + J_y(y^+, u^+) = 0 \tag{5.5.2}
$$
for the dual variable $\lambda = \lambda^+$ will guarantee that $(y^+, \lambda^+) \in Y_1 \times Z_1$ holds. Here $u^+ = u + \delta u$ and $Z_1 \subset \operatorname{dom}(e_y(y^+, u^+)^*)$ denotes a Banach space densely embedded into $W^*$, with $\lambda^* \in Z_1$. Since $Y_1$ and $Z_1$ are contained in $Y$ and $W^*$, the feasibility steps (5.5.1)–(5.5.2) can also be considered as smoothing steps. Thus we obtain the Newton method for the singular case.

Algorithm
• Initialization: Choose $u_0 \in V(u^*)$, solve
$$
e(y, u_0) = 0, \qquad e_y(y_0, u_0)^* \lambda + J_y(y_0, u_0) = 0 \qquad \text{for } (y_0, \lambda_0) \in Y_1 \times Z_1,
$$
and set $k = 0$.
• Newton step: Solve for $(\delta y, \delta u, \delta\lambda) \in Y \times U \times W^*$
$$
\begin{pmatrix} L''(y_k, u_k, \lambda_k) & e'(y_k, u_k)^* \\ e'(y_k, u_k) & 0 \end{pmatrix}
\begin{pmatrix} \delta y \\ \delta u \\ \delta\lambda \end{pmatrix}
= - \begin{pmatrix} 0 \\ e_u(y_k, u_k)^* \lambda_k + J_u(y_k, u_k) \\ 0 \end{pmatrix}.
$$
• Newton update: $u_{k+1} = u_k + \delta u$.
• Feasibility step: Solve for $(y_{k+1}, \lambda_{k+1}) \in Y_1 \times Z_1$:
$$
e(y, u_{k+1}) = 0, \qquad e_y(y_{k+1}, u_{k+1})^* \lambda + J_y(y_{k+1}, u_{k+1}) = 0.
$$
• Stop, or set $k = k + 1$ and go to the Newton step.

Remark 5.5.1. The algorithm is essentially the Newton method of Section 5.2. Because of the feasibility step, the first and the last components on the right-hand side of the SQP step are zero, and Newton's method and the SQP method coincide. Let us point out that the SQP iteration may not be well defined without the feasibility step since the updates $(y^+, \lambda^+)$ may only be in $Y \times W^*$, while $e$ and $e_y(y^+, u^+)^*$ are not necessarily well defined on $Y \times U$ and $W^*$.

We next specify the assumptions which justify the above derivation and under which well-posedness and convergence of the algorithm can be proved. Thus let $(y^*, u^*, \lambda^*) \in Y_1 \times U \times Z_1$ be a solution to (5.1.4), or equivalently to (1.5.3), and let $V(y^*) \times V(u^*) \times V(\lambda^*) \subset Y_1 \times U \times Z_1$ be a convex bounded neighborhood of the solution triple $(y^*, u^*, \lambda^*)$.
(H5) (a) For every $u \in V(u^*)$ there exists a unique solution $y = y(u) \in V(y^*)$ of $e(y, u) = 0$. Moreover, there exists $M > 0$ such that $|y(u) - y^*|_{Y_1} \le M |u - u^*|_U$.
(b) For every $(y, u) \in V(y^*) \times V(u^*)$ there exists a unique solution $\lambda = \lambda(y, u) \in V(\lambda^*)$ of $e_y(y, u)^* \lambda + J_y(y, u) = 0$ and $|\lambda(y, u) - \lambda^*|_{Z_1} \le M |(y, u) - (y^*, u^*)|_{Y_1 \times U}$.

(H6) $J$ is twice continuously Fréchet differentiable on $Y \times U$ with the second derivative locally Lipschitz continuous.

(H7) The operator $e : V(y^*) \times V(u^*) \subset Y_1 \times U \to W$ is Fréchet differentiable with Lipschitz continuous Fréchet derivative $e'(y, u) \in L(Y_1 \times U, W)$. Moreover, for each $(y, u) \in V(y^*) \times V(u^*)$ the operator $e'(y, u)$ with domain in $Y \times U$ has closed range.

(H8) For every $\lambda \in V(\lambda^*)$ the mapping $(y, u) \to \langle \lambda, e(y, u) \rangle_{W^*,W}$ from $V(y^*) \times V(u^*)$ to $\mathbb{R}$ is twice Fréchet differentiable and the mapping $(y, u, \lambda) \to \langle \lambda, e''(y, u)(\cdot, \cdot) \rangle_{W^*,W}$ from $V(y^*) \times V(u^*) \times V(\lambda^*) \subset Y_1 \times U \times Z_1 \to L(Y_1 \times U, Y^* \times U^*)$ is Lipschitz continuous. Moreover, for each $(y, u, \lambda) \in Y_1 \times U \times Z_1$, the bilinear form $\langle \lambda, e''(y, u)(\cdot, \cdot) \rangle_{W^*,W}$ can be extended as a continuous bilinear form on $(Y \times U)^2$.

(H9) For every $(y, u) \in V(y^*) \times V(u^*) \subset Y_1 \times U$ the operator $e'(y, u)$ can be extended as continuous linear operator from $Y \times U$ to $W$, and the mapping $(y, u) \to e'(y, u)$ from $V(y^*) \times V(u^*) \subset Y_1 \times U \to L(Y \times U, W)$ is continuous and $(y, u, \lambda) \to \langle \lambda, e''(y, u)(\cdot, \cdot) \rangle$ from $V(y^*) \times V(u^*) \times V(\lambda^*) \subset Y_1 \times U \times Z_1 \to L(Y \times U, Y^* \times U^*)$ is continuous.

(H10) There exists $\kappa > 0$ such that $\langle L''(y^*, u^*, \lambda^*) v, v \rangle_{Y^* \times U^*, Y \times U} \ge \kappa |v|_{Y \times U}^2$ for all $v \in \ker e'(y^*, u^*) \subset Y \times U$.

(H11) $e'(y^*, u^*) \in L(Y \times U, W)$ is surjective.

Condition (H5) requires well-posedness of the primal and the adjoint equation in $Y_1$, respectively, $Z_1$. The adjoint equations arise from linearization of $e$ at elements of $Y_1 \times U$. Condition (H6) requires smoothness of $J$. In (H7) and (H8) the necessary regularity requirements for $e$ as mapping on $Y_1 \times U$ and in $Y \times U$ are specified. From (H5) it follows that the initialization as well as the feasibility step are well defined provided that $u_k \in V(u^*)$. As a consequence the derivatives of $J$ and $e$ that are required for defining the Newton step are taken at elements $(y_k, u_k, \lambda_k) \in Y_1 \times U \times Z_1$. For $x = (y, u, \lambda) \in V(y^*) \times V(u^*) \times V(\lambda^*)$ let $A(x)$ denote the operator
$$
A(x) = \begin{pmatrix} L''(y, u, \lambda) & e'(y, u)^* \\ e'(y, u) & 0 \end{pmatrix}.
$$
Conditions (H6) and (H9) guarantee that the operator $A(x) \in L(Y \times U \times W^*, Y^* \times U^* \times W)$ for $x \in V(y^*) \times V(u^*) \times V(\lambda^*)$. Conditions (H6), (H9)–(H11) imply that
(H12) there exist a neighborhood $V(x^*) \subset V(y^*) \times V(u^*) \times V(\lambda^*)$ and $M_1$ such that for every $x = (y,u,\lambda) \in V(x^*)$ and $\delta w \in Y^* \times U^* \times W$ the equation $A(x)\delta x = \delta w$ admits a unique solution $\delta x \in Y \times U \times W^*$ satisfying $|\delta x|_{Y \times U \times W^*} \le M_1 |\delta w|_{Y^* \times U^* \times W}$.

Theorem 5.3. If (H5)–(H8) and (H12) hold at a solution $(y^*,u^*,\lambda^*) \in Y_1 \times U \times Z_1$ of (5.1.4) and $|u_0 - u^*|_U$ is sufficiently small, then the iterates of the algorithm are well defined and they satisfy
\[ |(y_{k+1}, u_{k+1}, \lambda_{k+1}) - (y^*,u^*,\lambda^*)|_{Y_1 \times U \times Z_1} \le K\,|u_k - u^*|_U^2 \tag{5.5.3} \]
for a constant $K$ independent of $k$.

Proof. Well-posedness of the algorithm in a neighborhood of $(y^*,u^*,\lambda^*)$ follows from (H5) and (H12). To prove convergence let us denote by $x^*$ the triple $(y^*,u^*,\lambda^*)$, and similarly $\delta x = (\delta y, \delta u, \delta\lambda)$ and $x_k = (y_k,u_k,\lambda_k)$. Without loss of generality we assume that $\min(M, M_1) \ge 1$. The Newton step of the algorithm can be expressed as $A(x_k)\delta x = -F(x_k)$, with $F : Y_1 \times U \times Z_1 \to Y^* \times U^* \times W$ defined by
\[ F(y,u,\lambda) = -\big(e_y(y,u)^*\lambda + J_y(y,u),\; e_u(y,u)^*\lambda + J_u(y,u),\; e(y,u)\big). \]
Due to the smoothing step the first and third coordinates of $F$ are 0 at $x_k$. By (H5) there exists $M > 0$ such that
\[ |y_k - y^*|_{Y_1} \le M\,|u_k - u^*|_U \quad \text{if } u_k \in V(u^*) \tag{5.5.4} \]
and
\[ |\lambda_k - \lambda^*|_{Z_1} \le M\,|(y_k,u_k) - (y^*,u^*)|_{Y_1 \times U} \quad \text{if } (y_k,u_k) \in V(y^*) \times V(u^*), \tag{5.5.5} \]
where $(y_k, \lambda_k)$ are determined by the feasibility step. Let us assume that $x_k \in V(x^*)$. Then it follows from (H6)–(H8) that, possibly after increasing $M$, it can be chosen such that for every $x_k \in V(y^*) \times V(u^*) \times V(\lambda^*)$
\[ \begin{aligned} |F(x^*) - F(x_k) - F'(x_k)(x^* - x_k)|_{Y^* \times U^* \times W} &\le \int_0^1 |(F'(x_k + s(x^* - x_k)) - F'(x_k))(x^* - x_k)|_{Y^* \times U^* \times W}\, ds \\ &= \int_0^1 |(A(x_k + s(x^* - x_k)) - A(x_k))(x^* - x_k)|_{Y^* \times U^* \times W}\, ds \\ &\le \frac{M}{2}\,|x^* - x_k|^2_{Y_1 \times U \times Z_1}. \end{aligned} \]
Moreover,
\[ |A(x_k)(x_k + \delta x - x^*)| = |F(x^*) - F(x_k) - F'(x_k)(x^* - x_k)| \le \frac{M}{2}\,|x_k - x^*|^2_{Y_1 \times U \times Z_1}. \]
Consequently, by (H12)
\[ |u_{k+1} - u^*|_U \le |x_{k+1} - x^*|_{Y \times U \times W^*} \le \frac{M M_1}{2}\,|x_k - x^*|^2_{Y_1 \times U \times Z_1}, \tag{5.5.6} \]
provided that $x_k \in V(x^*)$. The proof will be completed by an induction argument with respect to $k$. Let $r$ be such that $2M^5 M_1 r < 1$ and that $|x - x^*| < 2M^2 r$ implies $x \in V(x^*)$. Assume that $|u_0 - u^*| \le r$. Then $|y_0 - y^*|_{Y_1} \le M r$ by (5.5.4) and $|\lambda_0 - \lambda^*|_{Z_1} \le \sqrt{2}\,M^2 r$ by (5.5.5). It follows that $|x_0 - x^*|_{Y_1 \times U \times Z_1} \le 2M^2 |u_0 - u^*|_U \le 2M^2 r$ and hence $x_0 \in V(x^*)$. Let $|x_k - x^*|_{Y_1 \times U \times Z_1} \le 2M^2 r$. Then from (5.5.6)
\[ |u_{k+1} - u^*|_U \le 2M^5 M_1 r^2 \le r. \tag{5.5.7} \]
Consequently (5.5.4)–(5.5.6) are applicable and imply
\[ |x_{k+1} - x^*|_{Y_1 \times U \times Z_1} \le 2M^2 |u_{k+1} - u^*|_U \le M^3 M_1 |x_k - x^*|^2_{Y_1 \times U \times Z_1}. \tag{5.5.8} \]
It follows that
\[ |x_{k+1} - x^*|_{Y_1 \times U \times Z_1} \le 4M^7 M_1 |u_k - u^*|_U^2, \]
which implies (5.5.3) with $K = 4M^7 M_1$. From (5.5.7)–(5.5.8) finally $|x_{k+1} - x^*|_{Y_1 \times U \times Z_1} \le 2M^2 r$.

Let us return now to some of the examples of Chapter 1, Section 1.5, and discuss the applicability of conditions (H5)–(H11).

Example 1.19 revisited. Condition (H5)(a) is a direct consequence of Lemma 1.14. Condition (H5)(b) corresponds to
\[ (\nabla\lambda, \nabla\phi) + (e^{y}\lambda, \phi) + (y - z, \phi) = 0 \quad \text{for all } \phi \in Y, \tag{5.5.9} \]
given $y \in Y_1$. Let $Z_1 = Y_1 = Y \cap L^\infty(\Omega)$. It follows from [Tr, Chapter 2.3] and the proof of Lemma 1.14 that there exists a unique solution $\lambda = \lambda(y) \in Z_1$ to (5.5.9). Moreover, if $y \in V(y^*)$ and $w = \lambda(y) - \lambda(y^*)$, then $w \in Z_1$ satisfies
\[ (\nabla w, \nabla\phi) + (e^{y^*} w, \phi) + ((e^{y} - e^{y^*})\lambda, \phi) + (y - y^*, \phi) = 0 \quad \text{for all } \phi \in Y. \]
From the proof of Lemma 4.1 it follows that there exists $M > 0$ such that $|\lambda(y) - \lambda(y^*)|_{Z_1} \le M\,|y - y^*|_{Y_1}$ for all $y \in V(y^*)$ and thus (H5)(b) is satisfied. It is simple to argue the validity of (H6)–(H9). Note that $e'(y^*,u^*)$ is surjective from $Y \times U$ to $W$ and thus (H11) is satisfied. As for (H10), this condition is equivalent to the requirement that
\[ |\delta y|^2_{L^2(\Omega)} + (\lambda^* e^{y^*}, (\delta y)^2) + \beta\,|\delta u|_U^2 \ge \kappa\,(|\delta y|_Y^2 + |\delta u|_U^2) \]
for all $(\delta y, \delta u) \in Y \times U$ satisfying
\[ (\nabla\delta y, \nabla\phi) + (e^{y^*}\delta y, \phi) - (\delta u, \phi) = 0 \quad \text{for all } \phi \in Y, \tag{5.5.10} \]
where $\lambda^*$ is the solution to (5.3.11) with $y = y^*$. Then there exists $\bar k > 0$ such that
\[ |\delta y|_Y^2 \le \bar k\,|\delta u|^2 \]
for all $(\delta y, \delta u)$ satisfying (5.5.10). It follows that (H10) holds if $|(\lambda^*)^-|_{L^\infty}$ is sufficiently small. This is the case, for example, for small residue problems, in the sense that $|y^* - z|^2_{L^2(\Omega)}$ is small enough. If $z \ge y^*$, then the weak maximum principle [Tr] applied to (5.5.9) gives $\lambda^* \ge 0$. With quite analogous arguments it can be shown that (H5)–(H11) also hold for Example 1.20 of Chapter 1.

Example 1.22 revisited. The constraint and adjoint equations are given by
\[ (\nabla y, \nabla\phi) - (u\,y, \nabla\phi) = (f, \phi) \quad \text{for all } \phi \in H_0^1(\Omega) \]
and
\[ (\nabla\lambda, \nabla\phi) + (u\,\lambda, \nabla\phi) + (y - z, \phi) = 0 \quad \text{for all } \phi \in Y = H_0^1(\Omega), \tag{5.5.11} \]
where $\operatorname{div} u = 0$, $f \in L^2(\Omega)$, and $z \in L^2(\Omega)$. As already discussed in Examples 1.15 and 1.22 they admit unique solutions in $Y_1 = H_0^1(\Omega) \cap L^\infty(\Omega)$ for $u \in U = L^2_n(\Omega)$. Let $w_1 = y(u) - y(u^*)$ and $w_2 = \lambda(y,u) - \lambda(y^*,u^*)$. Then
\[ (\nabla w_1, \nabla\phi) - (u\,w_1, \nabla\phi) = ((u - u^*)\,y^*, \nabla\phi) \]
and
\[ (\nabla w_2, \nabla\phi) + (u\,w_2, \nabla\phi) = -((u - u^*)\,\lambda^*, \nabla\phi) - (y - y^*, \phi) \]
for all $\phi \in Y$. From (1.4.22) it follows that (H5) holds if $y^*$ and $\lambda^*$ are in $W^{1,\infty}$. Conditions (H6)–(H8) are easily verified. Here we address only the closed range property of $e'(y,u)$, with $(y,u) \in Y_1 \times U$. For $u \in C_n^\infty(\Omega)$ with $\operatorname{div} u = 0$ surjectivity follows from the Lax–Milgram lemma, and a density argument asserts surjectivity for every $u \in U$. We turn to (H12) and assume that $u \in C_n^\infty(\Omega)$, $\operatorname{div} u = 0$ first. Then $e'(y,u) \in \mathcal{L}(V \times U, H^{-1}(\Omega))$ and $e'(y,u)(\delta y, \delta u) = 0$ can be expressed as
\[ (\nabla\delta y, \nabla\phi) - (\delta u\,y, \nabla\phi) - (u\,\delta y, \nabla\phi) = 0 \quad \text{for all } \phi \in Y. \]
Hence
\[ |\delta y|_Y \le |y|_{L^\infty}\,|\delta u|_U \tag{5.5.12} \]
for all $(\delta y, \delta u) \in \ker e'(y,u)$. Henceforth we assume that $\beta - 2\,|y^*|_{L^\infty}|\lambda^*|_{L^\infty} > 0$.
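Then there exists a neighborhood $V(y^*) \times V(\lambda^*)$ of $(y^*, \lambda^*)$ and $\kappa > 0$ such that
\[ \beta - 2\,|y|_{L^\infty(\Omega)}|\lambda|_{L^\infty} \ge \kappa \tag{5.5.13} \]
for all $(y,\lambda) \in V(y^*) \times V(\lambda^*)$. For $(\delta y, \delta u) \in Y \times U$ and $(y,\lambda) \in V(y^*) \times V(\lambda^*)$ we have by (5.5.12) and (5.5.13)
\[ \begin{aligned} L''(y,u,\lambda)((\delta y,\delta u),(\delta y,\delta u)) &= |\delta y|^2_{L^2(\Omega)} + \beta\,|\delta u|_U^2 - 2(\delta u\,\nabla\delta y, \lambda) \\ &\ge \beta\,|\delta u|^2 - 2\,|\lambda|_{L^\infty}|\nabla\delta y|_Y|\delta u|_U \ge \kappa\,|\delta u|_U^2. \end{aligned} \]
This estimate, together with (5.5.12), implies that $L''(y,u,\lambda)$ is coercive on $\ker e'(y,u)$ and hence $A(x)\delta x = \delta w$ admits a unique solution in $Y \times U \times Y$ for every $\delta w$. To estimate $\delta x$ in terms of $\delta w$, we note that the last equation in the system $A(x)\delta x = \delta w$ implies that
\[ |\delta y|_Y \le |y|_{L^\infty}|\delta u|_U + |w_3|_{H^{-1}}. \tag{5.5.14} \]
Similarly, we find from the first equation in the system that
\[ |\delta\lambda|_Y^2 \le |\delta y|\,|\delta\lambda| + |\lambda|_{L^\infty}|\delta\lambda|_Y|\delta u|_U + |w_1|_{H^{-1}}|\delta\lambda|_Y \]
and consequently
\[ |\delta\lambda|_Y \le c\,|\delta y| + |\lambda|_{L^\infty(\Omega)}|\delta u|_U + |w_1|_{H^{-1}}, \tag{5.5.15} \]
where $c$ is the embedding constant of $H_0^1(\Omega)$ into $L^2(\Omega)$. Moreover, using (5.5.14) we find
\[ \begin{aligned} L''(y,u,\lambda)((\delta y,\delta u),(\delta y,\delta u)) &\ge |\delta y|^2_{L^2} + \beta\,|\delta u|_U^2 - 2\,|\lambda|_{L^\infty}|\delta y|_Y|\delta u|_U \\ &\ge |\delta y|^2_{L^2} + \beta\,|\delta u|_U^2 - 2\,|\lambda|_{L^\infty}|y|_{L^\infty(\Omega)}|\delta u|_U^2 - 2\,|\lambda|_{L^\infty}|w_3|_{H^{-1}}|\delta u|_U \\ &\ge |\delta y|^2 + \kappa\,|\delta u|_U^2 - 2\,|\lambda|_{L^\infty}|w_3|_{H^{-1}}|\delta u|_U. \end{aligned} \tag{5.5.16} \]
From (5.5.15), (5.5.16) and utilizing $A(x)\delta x = \delta w$ we obtain
\[ |\delta y|^2 + \kappa\,|\delta u|^2 \le (2\,|\lambda|_{L^\infty}|\delta u|_U + |\delta\lambda|_Y)\,|w_3|_{H^{-1}} + |w_1|_{H^{-1}}|\delta y|_Y + |w_2|_U|\delta u|_U. \tag{5.5.17} \]
Inequalities (5.5.14), (5.5.15), and (5.5.17) imply the existence of a constant $M_1$ such that
\[ |\delta x|_{Y \times U \times Y} \le M_1\,|\delta w|_{H^{-1} \times U \times H^{-1}} \tag{5.5.18} \]
for all $(y,\lambda) \in V(y^*) \times V(\lambda^*)$ and every $b \in C_n^\infty(\Omega)$, with $\operatorname{div} b = 0$. A density argument with respect to $u$ implies that $A(x)\delta x = \delta w$ admits a solution for all $x \in V(y^*) \times U \times V(\lambda^*)$ and that (5.5.18) holds for all such $x$.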
Chapter 6
Augmented Lagrangian-SQP Methods

6.1 Generalities
This chapter is devoted to second order augmented Lagrangian methods for optimization problems with equality constraints of the type
\[ \min f(x) \ \text{over } x \in X \quad \text{subject to } e(x) = 0, \tag{6.1.1} \]
and to problems with equality constraints as well as additional constraints
\[ \min f(x) \ \text{over } x \in C \quad \text{subject to } e(x) = 0, \tag{6.1.2} \]
where $f : X \to \mathbb{R}$, $e : X \to W$, with $X$ and $W$ real Hilbert spaces and $C$ a closed convex set in $X$. We shall show that for equality constraints it is possible to replace the first order Lagrangian update that was developed in Chapter 4 by a second order one. It will become evident that second order augmented Lagrangian methods are closely related to SQP methods. For equality-constrained problems the SQP method also coincides with the Newton method applied to the first order optimality conditions. Just like the Newton method, the SQP method and second order augmented Lagrangian methods are convergent with second order convergence rate if the initialization is sufficiently close to a local solution of (6.1.1) and if appropriate additional regularity conditions are satisfied. In case the initialization is not in the region of attraction, globalization techniques such as line searches or trust region techniques may be necessary. We shall not focus on such methods within this monograph. However, the penalty term, which, together with the Lagrangian term, characterizes the augmented Lagrangian method, also has a globalization effect. This will become evident from the analytical as well as the numerical results of this chapter. Let us stress that for the concepts that are analyzed in this chapter we do not advocate choosing $c$ large.

As in Chapter 3, throughout this chapter we identify the dual of the space $W$ with itself, and we consider $e'(x)^*$ as an operator from $W$ to $X$. In Section 6.2 we present the second order augmented Lagrangian method for (6.1.1). Problems with additional constraints as in (6.1.2) are considered in Section 6.3. Applications to optimal control problems will be given in Section 6.4. Section 6.5 is devoted to short discussions of miscellaneous topics including reduced SQP methods and mesh independence. In Section 6.6 we give a short description of related literature.
6.2 Equality-constrained problems

In this section we consider
\[ \min f(x) \ \text{over } x \in X \quad \text{subject to } e(x) = 0, \tag{6.2.1} \]
where $f : X \to \mathbb{R}$, $e : X \to W$, with $X$ and $W$ real Hilbert spaces. Let $x^*$ be a local solution of (6.2.1). As before derivatives as well as partial derivatives with respect to the variable $x$ will be denoted by primes. We shall not distinguish by notation between the functional $f'$ in the dual $X^*$ of $X$ and its Riesz representation in $X$. As in Chapter 4 we shall identify the topological duals of $W$ and $Z$ with themselves. It is assumed throughout that
\[ \begin{cases} f \text{ and } e \text{ are twice continuously Fréchet differentiable} \\ \text{with Lipschitz continuous second derivatives} \\ \text{in a convex neighborhood } V(x^*) \text{ of } x^* \end{cases} \tag{6.2.2} \]
and
\[ e'(x^*) \text{ is surjective.} \tag{6.2.3} \]
The Lagrangian functional associated with (6.2.1) is denoted by $L : X \times W \to \mathbb{R}$ and it is given by
\[ L(x,\lambda) = f(x) + \langle \lambda, e(x)\rangle_W. \]
With (6.2.3) holding there exists a Lagrange multiplier $\lambda^* \in W$ such that the following first order necessary optimality condition is satisfied:
\[ L'(x^*,\lambda^*) = 0, \quad e(x^*) = 0. \tag{6.2.4} \]
We shall also make use of the following second order sufficient optimality condition:
\[ \text{there exists } \kappa > 0 \text{ such that } L''(x^*,\lambda^*)(h,h) \ge \kappa\,|h|_X^2 \text{ for all } h \in \ker e'(x^*). \tag{6.2.5} \]
Here $L''(x^*,\lambda^*)$ denotes the bilinear form characterizing the second Fréchet derivative of $L$ with respect to $x$ at $(x^*,\lambda^*)$. For any $c > 0$ the augmented Lagrangian functional $L_c : X \times W \to \mathbb{R}$ is defined by
\[ L_c(x,\lambda) = f(x) + \langle \lambda, e(x)\rangle_W + \frac{c}{2}\,|e(x)|_W^2. \]
We note that the necessary optimality condition implies
\[ L_c'(x^*,\lambda^*) = 0, \quad e(x^*) = 0 \quad \text{for all } c \ge 0. \tag{6.2.6} \]
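As a small numerical illustration of how the penalty term acts on the second derivative of $L_c$, the following sketch uses a hypothetical two-dimensional problem that is not taken from the text: minimize $-x_1 x_2$ subject to $x_1 + x_2 - 2 = 0$, with solution $x^* = (1,1)$ and multiplier $\lambda^* = 1$. Here $L''(x^*,\lambda^*)$ is indefinite on $X$ but satisfies (6.2.5) on $\ker e'(x^*)$. Since $e$ is affine and $e(x^*) = 0$, the second derivative of $L_c$ at $(x^*,\lambda^*)$ is $L''(x^*,\lambda^*) + c\,e'(x^*)^* e'(x^*)$. All function names are assumptions made for this example.

```python
import math


def hessian_augmented(c):
    """Second derivative of L_c at (x*, lam*) for the toy problem
    min -x1*x2  subject to  x1 + x2 - 2 = 0,  x* = (1, 1), lam* = 1."""
    # e is affine, so L''(x*, lam*) = f''(x*); e'(x) = (1, 1).
    lagr_hess = [[0.0, -1.0], [-1.0, 0.0]]
    e_prime = [1.0, 1.0]
    # L_c'' = L'' + c * e'^T e'  (valid here because e(x*) = 0 and e'' = 0)
    return [[lagr_hess[i][j] + c * e_prime[i] * e_prime[j] for j in range(2)]
            for i in range(2)]


def min_eigenvalue_sym2(a):
    """Smallest eigenvalue of a symmetric 2x2 matrix."""
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = max(tr * tr - 4.0 * det, 0.0)
    return 0.5 * (tr - math.sqrt(disc))


for c in (0.0, 0.25, 0.75, 2.0):
    print(c, min_eigenvalue_sym2(hessian_augmented(c)))
```

For this particular problem the eigenvalues of the augmented second derivative are $2c - 1$ and $1$, so it is positive definite on all of $X$ once $c$ exceeds $1/2$, although the unaugmented $L''(x^*,\lambda^*)$ is coercive only on the kernel of $e'(x^*)$.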
Lemma 6.1. Let (6.2.3) and (6.2.5) hold. Then there exists a neighborhood $V(x^*,\lambda^*)$ of $(x^*,\lambda^*)$, $\bar c > 0$ and $\bar\sigma > 0$ such that
\[ L_c''(x,\lambda)(h,h) \ge \bar\sigma\,|h|_X^2 \quad \text{for all } h \in X,\ (x,\lambda) \in V(x^*,\lambda^*), \text{ and } c \ge \bar c. \]

Proof. Corollary 3.2 and conditions (6.2.3) and (6.2.5) imply the existence of $\bar\sigma > 0$ and $\bar c > 0$ such that
\[ L_c''(x^*,\lambda^*)(h,h) \ge 2\bar\sigma\,|h|_X^2 \quad \text{for all } h \in X \text{ and } c \ge \bar c. \]
Due to continuity of $(x,\lambda) \to L_c''(x,\lambda)$ the conclusion of the lemma follows.

Lemma 6.1 implies in particular that $x \to L_c(x,\lambda^*)$ can be bounded from below by a quadratic function. This fact is referred to as augmentability of (6.2.1) at $(x^*,\lambda^*)$.

Lemma 6.2. Let (6.2.3) and (6.2.5) hold. Then there exist $\bar\sigma > 0$, $\bar c > 0$, and a neighborhood $\tilde V(x^*)$ of $x^*$ such that
\[ L_c(x,\lambda^*) \ge L_c(x^*,\lambda^*) + \bar\sigma\,|x - x^*|_X^2 \quad \text{for all } x \in \tilde V(x^*) \text{ and } c \ge \bar c. \tag{6.2.7} \]

Proof. Due to Taylor's theorem, Lemma 6.1, and (6.2.6) we find for $x \in V(x^*)$
\[ \begin{aligned} L_c(x,\lambda^*) &= L_c(x^*,\lambda^*) + \frac{1}{2} L_c''(x^*,\lambda^*)(x - x^*, x - x^*) + o(|x - x^*|_X^2) \\ &\ge L_c(x^*,\lambda^*) + \frac{\bar\sigma}{2}\,|x - x^*|_X^2 + o(|x - x^*|_X^2). \end{aligned} \]
The claim follows from this estimate.

Without loss of generality we may assume that the neighborhoods $V(x^*)$ and $\tilde V(x^*)$ of (6.2.2) and Lemma 6.2 coincide and that $V(x^*,\lambda^*)$ of Lemma 6.1 equals $V(x^*) \times V(\lambda^*)$. Due to (6.2.2) we can further assume that $e'(x)$ is surjective for all $x \in V(x^*)$.

To iteratively determine $(x^*,\lambda^*)$ one can apply Newton's method to (6.2.6). Given a current iterate $(x,\lambda)$ the next iterate $(\hat x, \hat\lambda)$ is the solution to
\[ \begin{pmatrix} L_c''(x,\lambda) & e'(x)^* \\ e'(x) & 0 \end{pmatrix} \begin{pmatrix} \hat x - x \\ \hat\lambda - \lambda \end{pmatrix} = - \begin{pmatrix} L_c'(x,\lambda) \\ e(x) \end{pmatrix}. \tag{6.2.8} \]
Alternatively, (6.2.8) can be used to define the $\lambda$-update only, whereas the $x$-update is calculated by a different technique. We shall demonstrate next that (6.2.8) can be solved for $\hat\lambda$ without recourse to $\hat x$. For $(x,\lambda) \in V(x^*,\lambda^*)$ and $c \ge \bar c$ we define $B(x,\lambda) \in \mathcal{L}(W)$ by
\[ B(x,\lambda) = e'(x)\,L_c''(x,\lambda)^{-1} e'(x)^*. \tag{6.2.9} \]
Here $\mathcal{L}(W)$ denotes the space of bounded linear operators from $W$ into itself. Note that $B(x,\lambda)$ is invertible. In fact, there exists a constant $k > 0$ such that
\[ \langle B(x,\lambda)y, y\rangle_W = \langle L_c''(x,\lambda)^{-1} e'(x)^* y, e'(x)^* y\rangle_X \ge k\,|e'(x)^* y|_X^2 \quad \text{for all } y \in W. \]
Since $e'(x)^*$ is injective and has closed range, there exists $\hat k$ such that
\[ |e'(x)^* y|_X^2 \ge \hat k\,|y|_W^2 \quad \text{for all } y \in W, \]
and by the Lax–Milgram theorem continuous invertibility of $B(x,\lambda)$ follows, provided that $(x,\lambda) \in V(x^*,\lambda^*)$ and $c \ge \bar c$. Premultiplying the first equation in (6.2.8) by $e'(x) L_c''(x,\lambda)^{-1}$ we obtain
\[ \begin{aligned} \hat\lambda &= \lambda + B(x,\lambda)^{-1}\big[e(x) - e'(x) L_c''(x,\lambda)^{-1} L_c'(x,\lambda)\big], \\ \hat x &= x - L_c''(x,\lambda)^{-1} L_c'(x,\hat\lambda). \end{aligned} \tag{6.2.10} \]
Whenever the dependence of $(\hat x, \hat\lambda)$ on $(x,\lambda,c)$ is important, $(\hat x(x,\lambda,c), \hat\lambda(x,\lambda,c))$ will be written in place of $(\hat x, \hat\lambda)$. If, for fixed $\lambda$, $x = x(\lambda)$ is chosen as a local solution to
\[ \min L_c(x,\lambda) \quad \text{subject to } x \in X, \tag{6.2.11} \]
then $L_c'(x(\lambda),\lambda) = 0$ and
\[ \hat\lambda(x(\lambda),\lambda,c) = \lambda + B(x(\lambda),\lambda)^{-1} e(x(\lambda)). \tag{6.2.12} \]
We point out that (6.2.12) can be interpreted as a second order update to the Lagrange variable. To acknowledge this, let $d_c$ denote the dual functional associated with $L_c$, i.e.,
\[ d_c(\lambda) = \min L_c(x,\lambda) \quad \text{subject to } \{x : |x - x^*| \le \varepsilon\} \]
for some $\varepsilon > 0$. Then the first and second derivatives of $d_c$ with respect to $\lambda$ satisfy
\[ \nabla_\lambda d_c(\lambda) = e(x(\lambda)) \quad \text{and} \quad \nabla_\lambda^2 d_c(\lambda) = -B(x(\lambda),\lambda), \]
and (6.2.12) presents a Newton step for maximizing $d_c$.

Returning to (6.2.8) we note that its blockwise solution given in (6.2.10) requires setting up and inverting $L_c''(x,\lambda)$. Following an argument due to Bertsekas [Be] we next argue that $L_c''(x,\lambda)$ can be avoided during the iteration. One requires only $L_0''(x,\lambda) = L''(x,\lambda)$. In fact, we find
\[ L_c'(x,\lambda) = L_0'(x, \lambda + c\,e(x)) \]
and
\[ L_c''(x,\lambda) = L_0''(x, \lambda + c\,e(x)) + c\,\langle e'(x)(\cdot), e'(x)(\cdot)\rangle_W. \]
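The interpretation of (6.2.12) as a Newton step for maximizing $d_c$ can be checked on a small example. The sketch below assumes a hypothetical quadratic problem, minimize $\tfrac12 x_1^2 + x_2^2$ subject to $x_1 + x_2 - 1 = 0$, for which $x(\lambda)$ is available from a $2 \times 2$ linear solve and $W = \mathbb{R}$, so that $B$ is a scalar. The problem data and all function names are assumptions for illustration and do not come from the text.

```python
def solve2(a, b):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


Q = [[1.0, 0.0], [0.0, 2.0]]   # f(x) = 0.5 * x^T Q x
A = [1.0, 1.0]                 # e(x) = A.x - D, so e'(x) = A
D = 1.0


def e(x):
    return A[0] * x[0] + A[1] * x[1] - D


def hess_Lc(c):
    # L_c'' = Q + c * A^T A; constant because f is quadratic and e is affine
    return [[Q[i][j] + c * A[i] * A[j] for j in range(2)] for i in range(2)]


def x_of_lam(lam, c):
    # Minimizer of L_c(., lam): (Q + c A^T A) x = (c*D - lam) * A
    return solve2(hess_Lc(c), [(c * D - lam) * A[0], (c * D - lam) * A[1]])


def L_c(x, lam, c):
    f = 0.5 * (Q[0][0] * x[0] ** 2 + Q[1][1] * x[1] ** 2)
    return f + lam * e(x) + 0.5 * c * e(x) ** 2


def dual(lam, c):
    # Dual functional d_c(lam) = min_x L_c(x, lam)
    return L_c(x_of_lam(lam, c), lam, c)


def B(c):
    # B = e' (L_c'')^{-1} e'^*, cf. (6.2.9)
    w = solve2(hess_Lc(c), A)
    return A[0] * w[0] + A[1] * w[1]


def second_order_update(lam, c):
    # (6.2.12): lam_hat = lam + B^{-1} e(x(lam))
    return lam + e(x_of_lam(lam, c)) / B(c)
```

Because $d_c$ is a concave quadratic in this setting, a single update from any $\lambda$ should reproduce the exact multiplier, which is $\lambda^* = -2/3$ for this problem; finite differences of `dual` can be compared against `e(x_of_lam(lam, c))` and `-B(c)`.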
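Consequently (6.2.8) can be expressed as
\[ \begin{pmatrix} L_0''(x, \lambda + c\,e(x)) + c\,e'(x)^* e'(x) & e'(x)^* \\ e'(x) & 0 \end{pmatrix} \begin{pmatrix} \hat x - x \\ \hat\lambda - \lambda \end{pmatrix} = - \begin{pmatrix} L_0'(x, \lambda + c\,e(x)) \\ e(x) \end{pmatrix}. \tag{6.2.13} \]
Using the second equation $e'(x)(\hat x - x) = -e(x)$ in the first equation of (6.2.13) we arrive at
\[ L_0''(x, \lambda + c\,e(x))(\hat x - x) - c\,e'(x)^* e(x) + e'(x)^*(\hat\lambda - \lambda) = -L_0'(x, \lambda + c\,e(x)), \]
and hence (6.2.8) is equivalent to
\[ \begin{pmatrix} L_0''(x, \lambda + c\,e(x)) & e'(x)^* \\ e'(x) & 0 \end{pmatrix} \begin{pmatrix} \hat x - x \\ \hat\lambda - (\lambda + c\,e(x)) \end{pmatrix} = - \begin{pmatrix} L_0'(x, \lambda + c\,e(x)) \\ e(x) \end{pmatrix}. \tag{6.2.14} \]
Solving (6.2.8) is thus equivalent to

(i) carrying out the first order multiplier iteration $\tilde\lambda = \lambda + c\,e(x)$,

(ii) solving
\[ \begin{pmatrix} L_0''(x,\tilde\lambda) & e'(x)^* \\ e'(x) & 0 \end{pmatrix} \begin{pmatrix} \hat x - x \\ \hat\lambda - \tilde\lambda \end{pmatrix} = - \begin{pmatrix} L_0'(x,\tilde\lambda) \\ e(x) \end{pmatrix} \]
for $(\hat x, \hat\lambda)$.

It will be convenient to introduce the matrix of operators
\[ M(x,\lambda) = \begin{pmatrix} L_0''(x,\lambda) & e'(x)^* \\ e'(x) & 0 \end{pmatrix}. \]
With (6.2.3) and (6.2.5) holding there exist a constant $\kappa > 0$ and a neighborhood $U(x^*,\lambda^*) \subset V(x^*,\lambda^*)$ of $(x^*,\lambda^*)$ such that
\[ \|M^{-1}(x,\lambda)\|_{\mathcal{L}(X \times W)} \le \kappa \quad \text{for all } (x,\lambda) \in U(x^*,\lambda^*). \tag{6.2.15} \]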
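The equivalence between the Newton step (6.2.8) on the augmented system and the two-step procedure (i)–(ii) can be verified numerically. The following sketch assumes a hypothetical test problem, minimize $-x_1 x_2$ subject to $x_1^2 + x_2^2 - 2 = 0$, which has the local solution $x^* = (1,1)$ with $\lambda^* = 1/2$; the problem, the dense solver, and all function names are illustrative assumptions rather than part of the text.

```python
def solve(a, b):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            r = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= r * m[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


# Hypothetical problem: f(x) = -x1*x2, e(x) = x1^2 + x2^2 - 2.
def e(x):
    return x[0] ** 2 + x[1] ** 2 - 2.0


def e_prime(x):
    return [2.0 * x[0], 2.0 * x[1]]


def L0_prime(x, lam):
    a = e_prime(x)
    return [-x[1] + lam * a[0], -x[0] + lam * a[1]]


def L0_second(x, lam):
    # f'' = [[0, -1], [-1, 0]] and e'' = 2*I
    return [[2.0 * lam, -1.0], [-1.0, 2.0 * lam]]


def kkt_solve(hess, x, grad):
    """Solve [[hess, e'^T], [e', 0]] d = -(grad, e(x)) for d = (dx1, dx2, dlam)."""
    a = e_prime(x)
    k = [hess[0] + [a[0]], hess[1] + [a[1]], [a[0], a[1], 0.0]]
    return solve(k, [-grad[0], -grad[1], -e(x)])


def newton_augmented(x, lam, c):
    """One Newton step on (6.2.8), using L_c' and L_c'' directly."""
    a = e_prime(x)
    lam_t = lam + c * e(x)
    h0 = L0_second(x, lam_t)
    hess_c = [[h0[i][j] + c * a[i] * a[j] for j in range(2)] for i in range(2)]
    d = kkt_solve(hess_c, x, L0_prime(x, lam_t))  # L_c'(x, lam) = L_0'(x, lam_t)
    return [x[0] + d[0], x[1] + d[1]], lam + d[2]


def two_step(x, lam, c):
    """Steps (i) and (ii): first order multiplier update, then solve with M(x, lam_t)."""
    lam_t = lam + c * e(x)
    d = kkt_solve(L0_second(x, lam_t), x, L0_prime(x, lam_t))
    return [x[0] + d[0], x[1] + d[1]], lam_t + d[2]
```

Calling both functions with the same arguments, for example `newton_augmented([1.2, 0.9], 0.3, 1.0)` and `two_step([1.2, 0.9], 0.3, 1.0)`, should return the same $(\hat x, \hat\lambda)$ up to rounding, while the second variant never forms $L_c''$.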
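Lemma 6.3. Assume that (6.2.3) and (6.2.5) hold. Then there exists a constant $K > 0$ such that for any $(x,\lambda) \in U(x^*,\lambda^*)$ the solution $(\hat x, \hat\lambda)$ of
\[ M(x,\lambda) \begin{pmatrix} \hat x - x \\ \hat\lambda - \lambda \end{pmatrix} = - \begin{pmatrix} L'(x,\lambda) \\ e(x) \end{pmatrix} \]
satisfies
\[ |(\hat x, \hat\lambda) - (x^*,\lambda^*)|_{X \times W} \le K\,|(x,\lambda) - (x^*,\lambda^*)|^2_{X \times W}. \tag{6.2.16} \]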
Proof. Note that
\[ M(x,\lambda) \begin{pmatrix} \hat x - x \\ \hat\lambda - \lambda \end{pmatrix} = - \begin{pmatrix} L'(x,\lambda) - L'(x^*,\lambda^*) \\ e(x) - e(x^*) \end{pmatrix} \]
and consequently
\[ M(x,\lambda) \begin{pmatrix} \hat x - x^* \\ \hat\lambda - \lambda^* \end{pmatrix} = - \begin{pmatrix} L'(x,\lambda) - L'(x^*,\lambda^*) \\ e(x) - e(x^*) \end{pmatrix} + M(x,\lambda) \begin{pmatrix} x - x^* \\ \lambda - \lambda^* \end{pmatrix}. \]
This equality further implies
\[ M(x,\lambda) \begin{pmatrix} \hat x - x^* \\ \hat\lambda - \lambda^* \end{pmatrix} = \int_0^1 \Big[ M(x,\lambda) - M\big(tx + (1-t)x^*,\, t\lambda + (1-t)\lambda^*\big) \Big] \begin{pmatrix} x - x^* \\ \lambda - \lambda^* \end{pmatrix} dt. \]
The regularity properties of $f$ and $e$ imply that $(x,\lambda) \to M(x,\lambda)$ is Lipschitz continuous on $U(x^*,\lambda^*)$ for a Lipschitz constant $\gamma > 0$. Thus we obtain
\[ \left| M(x,\lambda) \begin{pmatrix} \hat x - x^* \\ \hat\lambda - \lambda^* \end{pmatrix} \right|_{X \times W} \le \frac{\gamma}{2}\,|(x,\lambda) - (x^*,\lambda^*)|^2_{X \times W}, \]
and by (6.2.15)
\[ |(\hat x, \hat\lambda) - (x^*,\lambda^*)|_{X \times W} \le \frac{\gamma\kappa}{2}\,|(x,\lambda) - (x^*,\lambda^*)|^2_{X \times W}, \]
which implies the claim.

We now describe three algorithms and analyze their convergence. They are all based on (6.2.14) and differ only in the choice of $x$. Recall that if $x$ is a solution to (6.2.11), then (6.2.12) and hence (6.2.14) provide a second order update to the Lagrange multiplier. Solving (6.2.11) implies extra computational cost. In the results which follow we show that as a consequence a larger region of attraction with respect to the initial condition and an improved rate of convergence factor is obtained, compared to methods which solve (6.2.11) only approximately or skip it altogether. As a first choice $x$ in (6.2.14) is determined by solving
\[ \min L_c(x,\lambda_n) \quad \text{subject to } x \in V(x^*). \]
The second choice is to take $x$ only as an appropriate suboptimal solution to this optimization problem, and the third choice is to simply choose $x$ as the solution $\hat x$ of the previous iteration of (6.2.14).

Algorithm 6.1.

(i) Choose $\lambda_0 \in W$, $c \in (\bar c, \infty)$ and set $\sigma = c - \bar c$, $n = 0$.

(ii) Determine $\tilde x$ as a solution of
\[ \min L_c(x,\lambda_n) \quad \text{subject to } x \in V(x^*). \tag{$P_{aux}$} \]
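(iii) Set $\tilde\lambda = \lambda_n + \sigma\,e(\tilde x)$.

(iv) Solve for $(\hat x, \hat\lambda)$:
\[ M(\tilde x, \tilde\lambda) \begin{pmatrix} \hat x - \tilde x \\ \hat\lambda - \tilde\lambda \end{pmatrix} = - \begin{pmatrix} L_0'(\tilde x, \tilde\lambda) \\ e(\tilde x) \end{pmatrix}. \]

(v) Set $\lambda_{n+1} = \hat\lambda$, $n = n + 1$, and goto (ii).

The existence of a solution to ($P_{aux}$) is guaranteed if, for example, the following conditions on $f$ and $e$ hold:
\[ \begin{cases} f : X \to \mathbb{R} \text{ is weakly lower semicontinuous,} \\ e : X \to W \text{ maps weakly convergent sequences} \\ \text{to weakly convergent sequences.} \end{cases} \tag{6.2.17} \]
Under the conditions of Theorem 6.4 below it follows that the solutions $\tilde x$ of ($P_{aux}$) satisfy $\tilde x \in V(x^*)$.

Theorem 6.4. If (6.2.3), (6.2.5), and (6.2.17) hold and $\frac{1}{c - \bar c}\,|\lambda_0 - \lambda^*|^2$ is sufficiently small, then Algorithm 6.1 is well defined and
\[ |(x_{n+1}, \lambda_{n+1}) - (x^*,\lambda^*)|_{X \times W} \le \frac{\hat K}{c - \bar c}\,|\lambda_n - \lambda^*|_W^2 \tag{6.2.18} \]
for a constant $\hat K$ independent of $c$ and $n = 0, 1, \ldots$.

Proof. Let $\hat\eta$ be the largest radius for a ball centered at $(x^*,\lambda^*)$ and contained in $U(x^*,\lambda^*)$, and let $\gamma$ be a Lipschitz constant for f and e on $U(x^*,\lambda^*)$. Further let $E = e'(x^*)$ and note that $(EE^*)^{-1}E \in \mathcal{L}(X)$ as a consequence of (6.2.3). We define
\[ \bar M = 1 + 2(\bar c\gamma)^2 + 8\gamma^2 (1 + \mu)^2\,\|(EE^*)^{-1}E\|^2, \]
where $\mu = \Big(1 + \frac{\bar c\gamma}{\sqrt{2\sigma\bar\sigma}}\Big)\,|\lambda_0 - \lambda^*|_W + |\lambda^*|_W$, and we put
\[ \eta = \min\left( \hat\eta\,\sqrt{\frac{2\sigma\bar\sigma}{\bar M}},\ \frac{2\sigma\bar\sigma}{K \bar M} \right), \tag{6.2.19} \]
where $K$ is given in Lemma 6.3. Let us assume that
\[ |\lambda_0 - \lambda^*|_W < \eta. \]
The proof will be given by induction on $n$. The case $n = 0$ follows from the general arguments given below. For the induction step we assume that
\[ |\lambda_i - \lambda^*|_W \le |\lambda_{i-1} - \lambda^*|_W \quad \text{for } i = 1, \ldots, n. \tag{6.2.20} \]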
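Using (ii) of Algorithm 6.1 we have
\[ \begin{aligned} f(x^*) \ge L_c(\tilde x, \lambda_n) &= f(\tilde x) + \langle \lambda^*, e(\tilde x)\rangle_W + \langle \lambda_n - \lambda^*, e(\tilde x)\rangle_W + \frac{1}{2}\bar c\,|e(\tilde x)|_W^2 + \frac{1}{2}(c - \bar c)\,|e(\tilde x)|_W^2 \\ &\ge L_{\bar c}(\tilde x, \lambda^*) + \langle \lambda_n - \lambda^*, e(\tilde x)\rangle_W + \frac{\sigma}{2}\,|e(\tilde x)|_W^2 \\ &= L_{\bar c}(\tilde x, \lambda^*) + \frac{1}{2\sigma}\Big( |\lambda_n + \sigma e(\tilde x) - \lambda^*|_W^2 - |\lambda_n - \lambda^*|_W^2 \Big). \end{aligned} \]
In view of Lemma 6.2 and the fact that $\tilde x$ is chosen in $V(x^*)\ (\subset \tilde V(x^*))$ we find
\[ \bar\sigma\,|\tilde x - x^*|_X^2 + \frac{1}{2\sigma}\,|\tilde\lambda - \lambda^*|_W^2 \le \frac{1}{2\sigma}\,|\lambda_n - \lambda^*|_W^2. \tag{6.2.21} \]
This implies in particular that $|\tilde x - x^*| \le \sqrt{\frac{\bar M}{2\sigma\bar\sigma}}\,|\lambda_0 - \lambda^*| < \hat\eta$ and hence $\tilde x \in V(x^*)$. The necessary optimality conditions for (6.2.1) and ($P_{aux}$) with $\tilde x \in V(x^*)$ are given by
\[ f'(x^*) + E^*\lambda^* = 0 \]
and
\[ f'(\tilde x) + e'(\tilde x)^*(\lambda_n + c\,e(\tilde x)) = 0. \]
Subtracting these two equations and defining $\bar\lambda = \lambda_n + c\,e(\tilde x)$ give
\[ E^*(\lambda^* - \bar\lambda) = f'(\tilde x) - f'(x^*) + (e'(\tilde x)^* - e'(x^*)^*)(\bar\lambda) \]
and consequently
\[ \lambda^* - \bar\lambda = (EE^*)^{-1}E\,\big[ f'(\tilde x) - f'(x^*) + (e'(\tilde x)^* - e'(x^*)^*)(\bar\lambda) \big]. \]
We obtain
\[ |\lambda^* - \bar\lambda|_W \le 2\gamma\,(1 + |\bar\lambda|_W)\,\|(EE^*)^{-1}E\|\,|\tilde x - x^*|_X. \tag{6.2.22} \]
To estimate $\bar\lambda$ observe that due to (6.2.20) and (6.2.21)
\[ \begin{aligned} |\bar\lambda|_W \le |\tilde\lambda - \bar\lambda|_W + |\tilde\lambda|_W &\le \bar c\,|e(\tilde x) - e(x^*)|_W + |\lambda^*|_W + |\tilde\lambda - \lambda^*|_W \\ &\le \bar c\gamma\,|\tilde x - x^*|_X + |\lambda^*|_W + |\lambda_n - \lambda^*|_W \\ &\le \Big(1 + \frac{\bar c\gamma}{\sqrt{2\sigma\bar\sigma}}\Big)\,|\lambda_n - \lambda^*|_W + |\lambda^*|_W \le \mu. \end{aligned} \tag{6.2.23} \]
Using (6.2.20)–(6.2.23) we find
\[ \begin{aligned} |(\tilde x, \tilde\lambda) - (x^*,\lambda^*)|^2_{X \times W} &\le |\tilde x - x^*|_X^2 + 2\,|\tilde\lambda - \bar\lambda|_W^2 + 2\,|\bar\lambda - \lambda^*|_W^2 \\ &\le |\tilde x - x^*|_X^2 + 2(\bar c\gamma)^2\,|\tilde x - x^*|_X^2 + 8\gamma^2 (1 + \mu)^2\,\|(EE^*)^{-1}E\|^2\,|\tilde x - x^*|_X^2 \\ &\le \frac{\bar M}{2\sigma\bar\sigma}\,|\lambda_n - \lambda^*|_W^2 < \frac{\bar M}{2\sigma\bar\sigma}\,\eta^2 \le \hat\eta^2. \end{aligned} \]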
This implies that Lemma 6.3 is applicable with $(x,\lambda) = (\tilde x, \tilde\lambda)$. We find
\[ |(x_{n+1}, \lambda_{n+1}) - (x^*,\lambda^*)|_{X \times W} \le \frac{K \bar M}{2\sigma\bar\sigma}\,|\lambda_n - \lambda^*|_W^2, \tag{6.2.24} \]
and (6.2.18) is proved with $\hat K = \frac{K \bar M}{2\bar\sigma}$. From (6.2.20), (6.2.24), and the definition of $\eta$ we also obtain
\[ |(x_{n+1}, \lambda_{n+1}) - (x^*,\lambda^*)|_{X \times W} \le |\lambda_n - \lambda^*|_W < \eta. \]
This implies (6.2.20) for $i = n + 1$ and the proof is finished.

Algorithm 6.2. This coincides with Algorithm 6.1 except for (i) and (ii), which are replaced by

(i′) Choose $\lambda_0 \in W$, $c \in (\bar c, \infty)$, and $\sigma \in (0, c - \bar c]$ and set $n = 0$.

(ii′) Determine $\tilde x \in V(x^*)$ such that
\[ L_c(\tilde x, \lambda_n) \le L_c(x^*, \lambda_n) = f(x^*). \]

Theorem 6.5. Let (6.2.3) and (6.2.5) hold. If $|\lambda_0 - \lambda^*|_W$ is sufficiently small, then Algorithm 6.2 is well defined and
\[ |(x_{n+1}, \lambda_{n+1}) - (x^*,\lambda^*)|_{X \times W} \le K\Big(1 + \frac{1}{\sigma\bar\sigma}\Big)\,|\lambda_n - \lambda^*|_W^2 \tag{6.2.25} \]
for all $n = 0, 1, \ldots$. Here $x_{n+1}$ stands for $\hat x$ of step (iv) of Algorithm 6.1 and $K$, independent of $c$, is given in (6.2.16).

Proof. Let $\hat\eta$ be defined as in the proof of Theorem 6.4. We define
\[ \eta = \min\left( \frac{\hat\eta}{\sqrt{a}},\ \frac{1}{Ka} \right), \tag{6.2.26} \]
where $a = 1 + \frac{1}{\sigma\bar\sigma}$. The proof is based on an induction argument. If
\[ |\lambda_0 - \lambda^*| < \eta, \]
then the first iterate of Algorithm 6.2 is well defined,
\[ |(x_1, \lambda_1) - (x^*,\lambda^*)|_{X \times W} < \eta \]
and (6.2.25) holds with $n = 0$. This will follow from the general arguments given below. Assuming that $|(x_n,\lambda_n) - (x^*,\lambda^*)|_{X \times W} < \eta$, we show that Algorithm 6.2 is well defined for $n + 1$, that
\[ |(x_{n+1}, \lambda_{n+1}) - (x^*,\lambda^*)|_{X \times W} < \eta, \]
and that (6.2.25) holds. As in the proof of Theorem 6.4 one argues that (6.2.21) holds and consequently
\[ |(\tilde x, \tilde\lambda) - (x^*,\lambda^*)|^2_{X \times W} \le \Big(1 + \frac{1}{2\sigma\bar\sigma}\Big)\,|\lambda_n - \lambda^*|_W^2. \tag{6.2.27} \]
This implies that $|(\tilde x, \tilde\lambda) - (x^*,\lambda^*)|_{X \times W} < \hat\eta$ and hence Lemma 6.3 is applicable with $(x,\lambda) = (\tilde x, \tilde\lambda)$ and (iv) of Algorithm 6.2 is well defined. Combining (6.2.16) with (6.2.27) we find
\[ |(x_{n+1}, \lambda_{n+1}) - (x^*,\lambda^*)|_{X \times W} \le K\Big(1 + \frac{1}{2\sigma\bar\sigma}\Big)\,|\lambda_n - \lambda^*|_W^2, \]
which is (6.2.25). By the definition of $\eta$ we further obtain
\[ |(x_{n+1}, \lambda_{n+1}) - (x^*,\lambda^*)|_{X \times W} \le |\lambda_n - \lambda^*|_W < \eta. \]

Remark 6.2.1. In the proof of Theorem 6.4 as well as in that of Theorem 6.5 we utilize (6.2.7) from Lemma 6.2. Conditions (6.2.3) and (6.2.5) are sufficient conditions for (6.2.7) to hold for $c \ge \bar c > 0$. If (6.2.7) can be shown to hold for all $c \ge 0$, then $\bar c$ can be chosen equal to 0 and $\sigma = c$ is admissible in (i′) of Algorithm 6.2.

In the third algorithm we delete the second step of Algorithms 6.1 and 6.2 and directly iterate (6.2.14).

Algorithm 6.3.

(i) Choose $(x_0, \lambda_0) \in X \times W$, $c \ge 0$ and put $n = 0$.

(ii) Set $\tilde\lambda = \lambda_n + c\,e(x_n)$.

(iii) Solve for $(\hat x, \hat\lambda)$:
\[ M(x_n, \tilde\lambda) \begin{pmatrix} \hat x - x_n \\ \hat\lambda - \tilde\lambda \end{pmatrix} = - \begin{pmatrix} L_0'(x_n, \tilde\lambda) \\ e(x_n) \end{pmatrix}. \]

(iv) Set $(x_{n+1}, \lambda_{n+1}) = (\hat x, \hat\lambda)$, $n = n + 1$, and goto (ii).

Theorem 6.6. Let (6.2.3) and (6.2.5) hold. If $\max(1, c)\,|(x_0,\lambda_0) - (x^*,\lambda^*)|_{X \times W}$ is sufficiently small, then Algorithm 6.3 is well defined and
\[ |(x_{n+1}, \lambda_{n+1}) - (x^*,\lambda^*)|_{X \times W} \le \tilde K\,|(x_n,\lambda_n) - (x^*,\lambda^*)|^2_{X \times W} \]
for a constant $\tilde K$ independent of $n = 0, 1, 2, \ldots$, but depending on $c$.

Proof. Let $\hat\eta$ and $\gamma$ be defined as in the proof of Theorem 6.4. We introduce
\[ \eta = \min\left( \frac{\hat\eta}{\sqrt{a}},\ \frac{1}{Ka} \right), \tag{6.2.28} \]
where $a = \max(2,\ 1 + 2c^2\gamma^2)$ and $K$ is defined in Lemma 6.3. Let us assume that
\[ |(x_0,\lambda_0) - (x^*,\lambda^*)|_{X \times W} < \eta. \]
Again we proceed by induction and the case $n = 0$ follows from the general arguments given below. Let us assume that $|(x_n,\lambda_n) - (x^*,\lambda^*)|_{X \times W} < \eta$. Then
\[ \begin{aligned} |(x_n, \tilde\lambda) - (x^*,\lambda^*)|^2_{X \times W} &\le |x_n - x^*|_X^2 + 2c^2\,|e(x_n) - e(x^*)|_W^2 + 2\,|\lambda_n - \lambda^*|_W^2 \\ &\le |x_n - x^*|_X^2 + 2c^2\gamma^2\,|x_n - x^*|_X^2 + 2\,|\lambda_n - \lambda^*|_W^2 \\ &\le a\,|(x_n,\lambda_n) - (x^*,\lambda^*)|^2_{X \times W} < \hat\eta^2, \end{aligned} \]
and thus Lemma 6.3 is applicable with $(x,\lambda) = (x_n, \tilde\lambda)$.

Remark 6.2.2. (i) If $c$ is set equal to 0 in Algorithm 6.3, then we obtain the well-known SQP algorithm for the equality-constrained problem (6.2.1). It is well known to have a second order convergence rate, which also follows from Theorem 6.6 since $\tilde K$ is finite for $c = 0$.

(ii) Theorem 6.6 suggests that in case ($P_{aux}$) is completely skipped in the second order augmented Lagrangian update the penalty parameter may have a negative effect on the region of attraction and on the convergence rate estimate. Our numerical experience, without additional globalization techniques, indicates that moderate values of $c$ do not impede the behavior of the algorithm when compared to $c = 0$, which results in the SQP algorithm. Choosing $c > 0$ may actually enlarge the region of attraction when compared to $c = 0$. For parameter estimation problems $c > 0$ is useful, because in this way Algorithm 6.3 becomes a hybrid algorithm combining the output least squares and the equation error formulations [IK9].
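A minimal sketch of Algorithm 6.3 in finite dimensions may help to see the role of $c$. It assumes a hypothetical test problem that is not from the text, minimize $-x_1 x_2$ subject to $x_1^2 + x_2^2 - 2 = 0$, whose local solution is $x^* = (1,1)$ with $\lambda^* = 1/2$. The dense solver, the starting point, and all function names are illustrative assumptions; setting `c = 0.0` gives the plain SQP iteration of Remark 6.2.2 (i).

```python
def solve(a, b):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            r = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= r * m[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


# Hypothetical problem: f(x) = -x1*x2, e(x) = x1^2 + x2^2 - 2.
def e(x):
    return x[0] ** 2 + x[1] ** 2 - 2.0


def e_prime(x):
    return [2.0 * x[0], 2.0 * x[1]]


def algorithm_63_step(x, lam, c):
    """One pass through steps (ii)-(iv) of Algorithm 6.3."""
    lam_t = lam + c * e(x)                      # (ii) first order multiplier update
    a = e_prime(x)
    # M(x, lam_t) with L_0'' = f'' + lam_t * e'' = [[2*lam_t, -1], [-1, 2*lam_t]]
    m = [[2.0 * lam_t, -1.0, a[0]],
         [-1.0, 2.0 * lam_t, a[1]],
         [a[0], a[1], 0.0]]
    # Right-hand side -(L_0'(x, lam_t), e(x)) with f'(x) = (-x2, -x1)
    rhs = [x[1] - lam_t * a[0], x[0] - lam_t * a[1], -e(x)]
    d = solve(m, rhs)                           # (iii)
    return [x[0] + d[0], x[1] + d[1]], lam_t + d[2]   # (iv)


def run(x, lam, c, steps=8):
    """Iterate and record the max-norm error with respect to (x*, lam*)."""
    errors = []
    for _ in range(steps):
        x, lam = algorithm_63_step(x, lam, c)
        errors.append(max(abs(x[0] - 1.0), abs(x[1] - 1.0), abs(lam - 0.5)))
    return x, lam, errors


if __name__ == "__main__":
    for c in (0.0, 1.0):
        print(c, run([1.2, 0.9], 0.3, c)[2])
```

Printing the error history for $c = 0$ and a moderate $c > 0$ from the same starting point gives a direct way to compare the two variants on this small problem; the sketch contains no globalization, so it is only meaningful for starting points close to $(x^*,\lambda^*)$.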
6.3 Partial elimination of constraints

Here we consider
\[ \begin{cases} \min f(x) \ \text{subject to} \\ e(x) = 0, \quad g(x) \le 0, \quad \text{and } \ell(x) \in K, \end{cases} \tag{6.3.1} \]
where $f : X \to \mathbb{R}$, $e : X \to W$, $g : X \to \mathbb{R}^m$, $\ell : X \to Z$, where $X$, $W$, and $Z$ are real Hilbert spaces, $K$ is a closed convex cone in $Z$, and $\ell$ is an affine constraint. In the remainder of this chapter we identify the dual of $Z$ with itself. Let $x^*$ be a local solution of (6.3.1) and assume that
\[ \begin{cases} f, e, \text{ and } g \text{ are twice continuously Fréchet differentiable} \\ \text{with second derivatives Lipschitz continuous} \\ \text{in a neighborhood of } x^*. \end{cases} \tag{6.3.2} \]
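The objective of this section is to approximate (6.3.1) by a sequence of problems with quadratic cost and affine constraints. For this purpose we utilize the Lagrangian $L : X \times W \times \mathbb{R}^m \times Z \to \mathbb{R}$ associated with (6.3.1) given by
\[ L(x,\lambda,\mu,\eta) = f(x) + \langle \lambda, e(x)\rangle_W + \langle \mu, g(x)\rangle_{\mathbb{R}^m} + \langle \eta, \ell(x)\rangle_Z. \]
We shall require the following assumption (cf. Chapter 3): $x^*$ is a regular point, i.e.,
\[ 0 \in \operatorname{int} \left\{ \begin{pmatrix} e'(x^*) \\ g'(x^*) \\ L \end{pmatrix} X + \begin{pmatrix} 0 \\ \mathbb{R}^m_+ \\ -K \end{pmatrix} + \begin{pmatrix} 0 \\ g(x^*) \\ \ell(x^*) \end{pmatrix} \right\}, \tag{6.3.3} \]
where the interior is taken in $W \times \mathbb{R}^m \times Z$. With (6.3.3) holding there exists a Lagrange multiplier $(\lambda^*, \mu^*, \eta^*) \in W \times \mathbb{R}^{m,+} \times K^+$ such that
\[ 0 \in \begin{cases} L'(x^*,\lambda^*,\mu^*,\eta^*), \\ -e(x^*), \\ -g(x^*) + \partial\psi_{\mathbb{R}^{m,+}}(\mu^*), \\ -\ell(x^*) + \partial\psi_{K^+}(\eta^*). \end{cases} \]
As in Section 3.1, we assume that the coordinates of the inequalities $g(x) \le 0$ are arranged so that $\mu^* = (\mu^{*,+}, \mu^{*,0}, \mu^{*,-})$ and $g = (g^+, g^0, g^-)$ with $g^+ : X \to \mathbb{R}^{m_1}$, $g^0 : X \to \mathbb{R}^{m_2}$, $g^- : X \to \mathbb{R}^{m_3}$, $m = m_1 + m_2 + m_3$, and
\[ g^+(x^*) = 0,\ \mu^{*,+} > 0, \qquad g^0(x^*) = 0,\ \mu^{*,0} = 0, \qquad g^-(x^*) < 0,\ \mu^{*,-} = 0. \]
We further define
\[ G_+ = (g^+)'(x^*), \quad G_0 = (g^0)'(x^*), \quad E_+ = \begin{pmatrix} E \\ G_+ \end{pmatrix} : X \to W \times \mathbb{R}^{m_1}, \]
and for $z \in Z$ we define the operator $\mathcal{E}(z) : X \times \mathbb{R} \to (W \times \mathbb{R}^{m_1}) \times \mathbb{R}^{m_2} \times Z$ by
\[ \mathcal{E}(z) = \begin{pmatrix} E_+ & 0 \\ G_0 & 0 \\ L & z \end{pmatrix}. \]
The following additional assumptions are required:
\[ \text{there exists } \kappa > 0 \text{ such that } L''(x^*,\lambda^*,\mu^*)(x,x) \ge \kappa\,|x|_X^2 \text{ for all } x \in \ker(E_+) \tag{6.3.4} \]
and
\[ \mathcal{E}(\ell(x^*)) \text{ is surjective.} \tag{6.3.5} \]
The SQP method for (6.3.1) with elimination of the equality and finite rank inequality constraints is given next. We set
\[ L(x,\lambda,\mu) = f(x) + \langle \lambda, e(x)\rangle_W + \langle \mu, g(x)\rangle_{\mathbb{R}^m}. \]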
Algorithm 6.4.

(i) Choose $(x_0, \lambda_0, \mu_0) \in X \times W \times \mathbb{R}^m$ and set $n = 0$.

(ii) Solve for $(x_{n+1}, \lambda_{n+1}, \mu_{n+1})$:
\[ \begin{cases} \min \frac{1}{2} L''(x_n,\lambda_n,\mu_n)(x - x_n, x - x_n) + f'(x_n)(x - x_n), \\ e(x_n) + e'(x_n)(x - x_n) = 0, \\ g(x_n) + g'(x_n)(x - x_n) \le 0, \quad \ell(x_n) + L(x - x_n) \in K, \end{cases} \tag{$P_{aux}$} \]
where $(\lambda_{n+1}, \mu_{n+1})$ are the Lagrange multipliers associated to the equality and inequality constraints.

(iii) Set $n = n + 1$ and goto (ii).

Since (6.3.3)–(6.3.5) imply (H1), (H2), (H3) of Chapter 2, there exist by Corollary 2.18 neighborhoods $\hat U(x^*,\lambda^*,\mu^*)$ and $U(x^*)$ such that the auxiliary problem ($P_{aux}$) of Algorithm 6.4 admits a unique local solution $x_{n+1}$ in $U(x^*)$ provided that $(x_n,\lambda_n,\mu_n) \in \hat U(x^*,\lambda^*,\mu^*) = \hat U(x^*) \times \hat U(\lambda^*,\mu^*)$. To obtain a convergence rate estimate for $x_n$, Lagrange multipliers to the constraints in ($P_{aux}$) are introduced. Since the regular point condition is stable with respect to perturbations in $x^*$, we can assume that $\hat U(x^*)$ is chosen sufficiently small such that for $x_n \in \hat U(x^*)$ there exist $(\lambda_{n+1}, \mu_{n+1}, \eta_{n+1}) \in W \times \mathbb{R}^m \times Z$ such that
\[ 0 \in G(x_n,\lambda_n,\mu_n)(x_{n+1},\lambda_{n+1},\mu_{n+1},\eta_{n+1}) + \begin{pmatrix} 0 \\ 0 \\ \partial\psi_{\mathbb{R}^{m,+}}(\mu_{n+1}) \\ \partial\psi_{K^+}(\eta_{n+1}) \end{pmatrix}, \tag{6.3.6} \]
where
\[ G(x_n,\lambda_n,\mu_n)(x,\lambda,\mu,\eta) = \begin{pmatrix} L''(x_n,\lambda_n,\mu_n)(x - x_n) + f'(x_n) + e'(x_n)^*\lambda + g'(x_n)^*\mu + L^*\eta \\ -e(x_n) - e'(x_n)(x - x_n) \\ -g(x_n) - g'(x_n)(x - x_n) \\ -\ell(x_n) - L(x - x_n) \end{pmatrix}. \]

Theorem 6.7. Assume that (6.3.2)–(6.3.5) are satisfied at $(x^*,\lambda^*,\mu^*)$ and that $|(x_0,\lambda_0,\mu_0) - (x^*,\lambda^*,\mu^*)|$ is sufficiently small. Then there exists $\bar K > 0$ such that
\[ |(x_{n+1},\lambda_{n+1},\mu_{n+1},\eta_{n+1}) - (x^*,\lambda^*,\mu^*,\eta^*)| \le \bar K\,|(x_n,\lambda_n,\mu_n) - (x^*,\lambda^*,\mu^*)|^2 \]
for all $n = 1, 2, \ldots$.

Proof. The optimality system for (6.3.1) can be expressed by
\[ 0 \in G(x^*,\lambda^*,\mu^*)(x^*,\lambda^*,\mu^*,\eta^*) + \begin{pmatrix} 0 \\ 0 \\ \partial\psi_{\mathbb{R}^{m,+}}(\mu^*) \\ \partial\psi_{K^+}(\eta^*) \end{pmatrix}, \]
or equivalently
\[ 0 \in \begin{pmatrix} a_n \\ b_n \\ c_n \\ d_n \end{pmatrix} + G(x_n,\lambda_n,\mu_n)(x^*,\lambda^*,\mu^*,\eta^*) + \begin{pmatrix} 0 \\ 0 \\ \partial\psi_{\mathbb{R}^{m,+}}(\mu^*) \\ \partial\psi_{K^+}(\eta^*) \end{pmatrix}, \tag{6.3.7} \]
where $(a_n, b_n, c_n, d_n) = (a(x_n,\lambda_n,\mu_n), b(x_n), c(x_n), d(x_n))$ and
\[ \begin{aligned} a(x,\lambda,\mu) &= f'(x^*) - f'(x) + (e'(x^*)^* - e'(x)^*)\lambda^* + (g'(x^*)^* - g'(x)^*)\mu^* - L''(x,\lambda,\mu)(x^* - x), \\ b(x) &= e(x) + e'(x)(x^* - x) - e(x^*), \\ c(x) &= g(x) + g'(x)(x^* - x) - g(x^*), \\ d(x) &= 0. \end{aligned} \]
Without loss of generality we may assume that the first and second derivatives of $f$, $e$, and $g$ are Lipschitz continuous in $\hat U(x^*)$. It follows that there exists $\tilde L$ such that
\[ \big(|a(x,\lambda,\mu)|^2 + |b(x)|^2 + |c(x)|^2\big)^{1/2} \le \tilde L\,\big(|x - x^*|^2 + |\lambda - \lambda^*|^2 + |\mu - \mu^*|^2\big) \quad \text{for all } (x,\lambda,\mu) \in \hat U(x^*) \times \hat U(\lambda^*,\mu^*). \tag{6.3.8} \]
Let $\tilde K$ be determined from Corollary 2.18 and let $B((x^*,\lambda^*,\mu^*), r)$ denote a ball in $\hat U(x^*) \times \hat U(\lambda^*,\mu^*)$ with center $(x^*,\lambda^*,\mu^*)$ and radius $r$, where $r \tilde K \tilde L < 1$. Proceeding by induction, assume that $(x_n,\lambda_n,\mu_n) \in B((x^*,\lambda^*,\mu^*), r)$. Then from Corollary 2.18, with $(\bar x_1, \bar\lambda_1, \bar\mu_1) = (\bar x_2, \bar\lambda_2, \bar\mu_2) = (x_n,\lambda_n,\mu_n)$, and (6.3.6)–(6.3.8) we find
\[ \begin{aligned} |(x_{n+1},\lambda_{n+1},\mu_{n+1},\eta_{n+1}) - (x^*,\lambda^*,\mu^*,\eta^*)| &\le \tilde K\,|(a_n, b_n, c_n)|_{X \times W \times \mathbb{R}^m} \\ &\le \tilde K \tilde L\,\big(|x_n - x^*|^2 + |\lambda_n - \lambda^*|^2 + |\mu_n - \mu^*|^2\big) \\ &\le (\tilde K \tilde L\, r)\, r < r. \end{aligned} \]
This estimate implies that $(x_{n+1},\lambda_{n+1},\mu_{n+1}) \in B((x^*,\lambda^*,\mu^*), r)$, as well as the desired local quadratic convergence of Algorithm 6.4.

Remark 6.3.1. Let $L(x,\lambda) = f(x) + \langle \lambda, e(x)\rangle_W$ and consider Algorithm 6.4 with $L''(x_n,\lambda_n,\mu_n)$ replaced by $L''(x_n,\lambda_n)$; i.e., only the equality constraints are eliminated. If (6.3.4) is satisfied with $L''(x^*,\lambda^*,\mu^*)$ replaced by $L''(x^*,\lambda^*)$, then Theorem 6.7 holds for the resulting algorithm with the triple $(x,\lambda,\mu)$ replaced by $(x,\lambda)$.

Next we consider
\[ \min f(x) \quad \text{subject to } e(x) = 0,\ x \in C, \tag{6.3.9} \]
i
i i
i
i
i
i
6.3. Partial elimination of constraints
ItoKunisc 2008/6/12 page 169 i
169
where f and e are as in (6.3.1) and C is a closed convex set in X. Let x ∗ denote a local solution of (6.3.9) and assume that f and e are twice continuously Fréchet differentiable, (6.3.10) f and e are Lipschitz continuous in a neighborhood of x ∗ , 0 ∈ int e (x ∗ )(C − x ∗ ),
(6.3.11)
and
there exists a constant κ > 0 such that L (x ∗ , λ∗ ) (x, x) ≥ κ|x|2X for all x ∈ ker e (x ∗ ).
(6.3.12)
To solve (6.3.9) we consider the SQP algorithm with elimination of the equality constraint.

Algorithm 6.5.
(i) Choose $(x_0, \lambda_0) \in X \times W^*$ and set $n = 0$.
(ii) Solve for $(x_{n+1}, \lambda_{n+1})$:
\[ (P_{aux}) \quad \begin{cases} \min\ \tfrac12 L''(x_n, \lambda_n)(x - x_n, x - x_n) + f'(x_n)(x - x_n), \\ e(x_n) + e'(x_n)(x - x_n) = 0, \quad x \in C, \end{cases} \]
where $\lambda_{n+1}$ is the Lagrange multiplier associated to the equality constraint.
(iii) Set $n = n + 1$ and goto (ii).

The optimality condition for $(P_{aux})$ in Algorithm 6.5 is given by
\[ 0 \in G(x_n, \lambda_n)(x_{n+1}, \lambda_{n+1}) + \begin{pmatrix} \partial\psi_C(x_{n+1}) \\ 0 \end{pmatrix}, \tag{6.3.13} \]
where
\[ G(x_n, \lambda_n)(x, \lambda) = \begin{pmatrix} L''(x_n, \lambda_n)(x - x_n) + f'(x_n) + e'(x_n)^*\lambda \\ -e(x_n) - e'(x_n)(x - x_n) \end{pmatrix}. \]
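The iteration of Algorithm 6.5 can be illustrated on a small finite-dimensional problem. The sketch below is only an illustration under simplifying assumptions that are not part of the text: it takes $C = X = \mathbb{R}^2$ (so that $\partial\psi_C = \{0\}$ and $(P_{aux})$ reduces to a linear saddle point system), and it uses the hypothetical toy data $f(x) = x_1 + x_2$, $e(x) = x_1^2 + x_2^2 - 2$ with solution $x^* = (-1, -1)$, $\lambda^* = 1/2$. The helper names `solve_linear`, `sqp_step`, and `run_sqp` are invented for this example.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def sqp_step(x, lam):
    """One step of (P_aux) for the toy data, assuming C = X (no set constraint)."""
    grad_f = [1.0, 1.0]                    # f'(x_n) for f(x) = x1 + x2
    e_val = x[0] ** 2 + x[1] ** 2 - 2.0    # e(x_n)
    e_jac = [2.0 * x[0], 2.0 * x[1]]       # e'(x_n)
    hess = 2.0 * lam                       # L''(x_n, lam_n) = 2 * lam_n * I
    # Stationarity and linearized constraint of the quadratic subproblem.
    kkt = [[hess, 0.0, e_jac[0]],
           [0.0, hess, e_jac[1]],
           [e_jac[0], e_jac[1], 0.0]]
    rhs = [-grad_f[0], -grad_f[1], -e_val]
    dx1, dx2, lam_new = solve_linear(kkt, rhs)
    return [x[0] + dx1, x[1] + dx2], lam_new


def run_sqp(x, lam, steps):
    """Return the errors |(x_n, lam_n) - (x*, lam*)| for n = 0, ..., steps."""
    x_star, lam_star = [-1.0, -1.0], 0.5
    errors = []
    for _ in range(steps + 1):
        errors.append(math.sqrt((x[0] - x_star[0]) ** 2
                                + (x[1] - x_star[1]) ** 2
                                + (lam - lam_star) ** 2))
        x, lam = sqp_step(x, lam)
    return errors


errors = run_sqp([-1.5, -0.5], 1.0, 8)
```

Printing `errors` shows the rapid decay that is typical of a locally quadratically convergent iteration; a genuine constraint set $C \ne X$ would require a constrained quadratic programming solver in place of `solve_linear`.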
Due to (6.3.11) and (6.3.12) there exists a neighborhood $U(x^*, \lambda^*)$ of $(x^*, \lambda^*)$ such that (6.3.13) admits a solution $(x_{n+1}, \lambda_{n+1})$ and $x_{n+1}$ is the solution of $(P_{aux})$ of Algorithm 6.5, provided that $(x_n, \lambda_n) \in U(x^*, \lambda^*)$. We also require the following condition:

There exist neighborhoods $\hat U(x^*) \times \hat U(\lambda^*)$ of $(x^*, \lambda^*)$ and $V$ of the origin in $X \times W$ and a constant $\tilde K$ such that for all $q_1, q_2 \in V$ and $(\bar x, \bar\lambda) \in \hat U(x^*) \times \hat U(\lambda^*)$ there exists a unique solution $(x, \lambda) = (x(\bar x, \bar\lambda, q_1), \lambda(\bar x, \bar\lambda, q_1)) \in \hat U(x^*) \times \hat U(\lambda^*)$ of
\[ 0 \in q_1 + G(\bar x, \bar\lambda)(x, \lambda) + \partial\psi_C(x) \]
and
\[ |(x(\bar x, \bar\lambda, q_1), \lambda(\bar x, \bar\lambda, q_1)) - (x(\bar x, \bar\lambda, q_2), \lambda(\bar x, \bar\lambda, q_2))| \le \tilde K\, |q_1 - q_2|. \tag{6.3.14} \]
Remark 6.3.2. From the discussion in the first part of this section it follows that (6.3.14) holds for problem (6.3.1) with $C = \{x : g(x) \le 0,\ \ell(x) \in K\}$, provided that $g$ is convex. Condition (6.3.14) was analyzed for optimal control problems in several papers; see, for instance, [AlMa, Tro].

Theorem 6.8. Suppose that (6.3.10)–(6.3.13) hold at a local solution $x^*$ of (6.3.9). If $|(x_0, \lambda_0) - (x^*, \lambda^*)|$ is sufficiently small, then the iterates $(x_n, \lambda_n)$ converge quadratically to $(x^*, \lambda^*)$.

Proof. The optimality system for (6.3.9) can be expressed as
\[ 0 \in G(x^*, \lambda^*)(x^*, \lambda^*) + \begin{pmatrix} \partial\psi_C(x^*) \\ 0 \end{pmatrix}, \]
or equivalently
\[ 0 \in \begin{pmatrix} a_n \\ b_n \end{pmatrix} + G(x_n, \lambda_n)(x^*, \lambda^*) + \begin{pmatrix} \partial\psi_C(x^*) \\ 0 \end{pmatrix}, \tag{6.3.15} \]
where
\[ \begin{aligned} a_n &= f'(x^*) - f'(x_n) + (e'(x^*)^* - e'(x_n)^*)\lambda^* - L''(x_n, \lambda_n)(x^* - x_n), \\ b_n &= e(x_n) + e'(x_n)(x^* - x_n) - e(x^*). \end{aligned} \]
In view of (6.3.10) we may assume that $\hat U(x^*)$ is chosen sufficiently small such that $f''$ and $e''$ are Lipschitz continuous on $\hat U(x^*)$. Consequently there exists $\tilde L$ such that
\[ |(a_n, b_n)| \le \tilde L\left(|x_n - x^*|^2 + |\lambda_n - \lambda^*|^2\right), \quad \text{provided that } x_n \in \hat U(x^*). \tag{6.3.16} \]
Let $B((x^*, \lambda^*), r)$ denote a ball with radius $r < (\tilde K \tilde L)^{-1}$ and center $(x^*, \lambda^*)$ contained in $U(x^*, \lambda^*)$ and in $\hat U(x^*) \times \hat U(\lambda^*)$, and assume that $(x_n, \lambda_n) \in B((x^*, \lambda^*), r)$. Then by (6.3.13)–(6.3.16) we have
\[ |(x_{n+1}, \lambda_{n+1}) - (x^*, \lambda^*)| \le \tilde K \tilde L\, |(x_n, \lambda_n) - (x^*, \lambda^*)|^2 < |(x_n, \lambda_n) - (x^*, \lambda^*)| < r. \]
The proof now follows by an induction argument.

In Section 6.2 the combination of the second order method (6.2.8) with first order augmented Lagrangian updates was analyzed for equality-constrained problems. This approach can also be taken for problems with inequality constraints. We present the analogue of Theorem 6.4. Let
\[ L_c(x, \lambda, \mu) = f(x) + \langle \lambda, e(x) \rangle_W + \langle \mu, \hat g(x, \mu, c) \rangle_{\mathbb{R}^m} + \frac{c}{2}|e(x)|_W^2 + \frac{c}{2}|\hat g(x, \mu, c)|_{\mathbb{R}^m}^2, \]
where $\hat g(x, \mu, c) = \max(g(x), -\frac{\mu}{c})$, as defined in Chapter 3. Below $\tilde c \ge \bar c$ denote the constants and $B_\delta$ the closed ball of radius $\delta$ around $x^*$ that were utilized in Corollary 3.7.
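For finite-dimensional data the augmented Lagrangian $L_c$ and the modified constraint $\hat g$ can be evaluated directly. The following minimal sketch assumes that the values $f(x)$, $e(x)$, and $g(x)$ have already been computed and are passed in as a number and two lists, with the maximum taken componentwise; the function names `g_hat` and `augmented_lagrangian` are hypothetical and chosen only for this illustration.

```python
def dot(u, v):
    """Euclidean inner product of two equally long lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def g_hat(g_val, mu, c):
    """Componentwise g_hat(x, mu, c) = max(g(x), -mu / c), for c > 0."""
    return [max(gi, -mi / c) for gi, mi in zip(g_val, mu)]


def augmented_lagrangian(f_val, e_val, g_val, lam, mu, c):
    """Evaluate L_c(x, lam, mu) from the precomputed values f(x), e(x), g(x)."""
    gh = g_hat(g_val, mu, c)
    return (f_val
            + dot(lam, e_val)
            + dot(mu, gh)
            + 0.5 * c * dot(e_val, e_val)
            + 0.5 * c * dot(gh, gh))


# Illustrative numbers only: one equality and two inequality constraints.
example_g_hat = g_hat([-1.0, 0.2], [1.0, 0.0], 2.0)
example_value = augmented_lagrangian(1.0, [0.5], [-1.0, 0.2], [2.0], [1.0, 0.0], 2.0)
```

In the first inequality component the constraint is inactive and the multiplier term $-\mu/c$ is selected; in the second the constraint value itself is kept.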
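Algorithm 6.6.
(i) Choose $(\lambda_0, \mu_0) \in W \times \mathbb{R}^m$, $c \in [\tilde c, \infty)$ and set $n = 0$.
(ii) Determine $\tilde x$ as the solution to
\[ (P^1_{aux}) \quad \min L_c(x, \lambda_n, \mu_n) \quad \text{subject to } x \in B_\delta,\ \ell(x) \in K. \]
(iii) Update $\tilde\lambda = \lambda_n + (c - \bar c)\, e(\tilde x)$, $\tilde\mu = \mu_n + (c - \bar c)\, \hat g(\tilde x, \mu_n, c)$.
(iv) Solve $(P_{aux})$ of Algorithm 6.4 with $(x_n, \lambda_n, \mu_n) = (\tilde x, \tilde\lambda, \tilde\mu)$ for $(x_{n+1}, \lambda_{n+1}, \mu_{n+1}, \eta_{n+1})$.
(v) Set $n = n + 1$ and goto (ii).

Theorem 6.9. Assume that (3.4.7), (3.4.9) of Chapter 3 and (6.3.2)–(6.3.4) of this chapter hold at $(x^*, \lambda^*, \mu^*)$. If $\frac{1}{c - \bar c}\left(|\lambda_0 - \lambda^*|^2 + |\mu_0 - \mu^*|^2\right)$ is sufficiently small, then Algorithm 6.6 is well defined and
\[ |(x_{n+1}, \lambda_{n+1}, \mu_{n+1}, \eta_{n+1}) - (x^*, \lambda^*, \mu^*, \eta^*)|_{X \times W \times \mathbb{R}^m \times Z} \le \frac{\hat K}{c - \bar c}\left(|\lambda_n - \lambda^*|_W^2 + |\mu_n - \mu^*|_{\mathbb{R}^m}^2\right) \]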
for a constant $\hat K$ independent of $n = 0, 1, 2, \dots$.

Proof. The assumptions guarantee that Corollary 3.7, Proposition 3.9, and Theorem 3.10 of Chapter 3 and Theorem 6.7 of the present chapter are applicable. The proof is now similar to that of Theorem 6.4, with Theorem 6.7 replacing Lemma 6.3 to estimate the step of $(P_{aux})$. Let $\hat\eta$ be the radius of the largest ball $U$ centered at $(x^*, \lambda^*, \mu^*)$ such that Theorem 6.7 is applicable and that the $x$-coordinates of elements in $U$ are also in $B_\delta$. Let
\[ \eta = \min\left(\hat\eta\, \kappa \sqrt{c - \bar c},\ \frac{\kappa^2 (c - \bar c)}{\bar K}\right), \quad \text{where } \kappa^2 = \frac{2(c - \bar c)}{1 + 2K^2}, \]
with $K$ defined in Theorem 3.10 and $\bar K$ in Theorem 6.7, and let $|(\lambda_0, \mu_0) - (\lambda^*, \mu^*)|_{W \times \mathbb{R}^m} < \eta$. We proceed by induction with respect to $n$. The case $n = 0$ is simple and we assume that
\[ |(\lambda_i, \mu_i) - (\lambda^*, \mu^*)| \le |(\lambda_{i-1}, \mu_{i-1}) - (\lambda^*, \mu^*)| \quad \text{for } i = 1, \dots, n. \tag{6.3.17} \]
From Theorems 3.8 and 3.10 we have
\[ |(\tilde x - x^*, \tilde\lambda - \lambda^*, \tilde\mu - \mu^*)| \le \frac{1}{\kappa \sqrt{c - \bar c}}\, |(\lambda_n, \mu_n) - (\lambda^*, \mu^*)| < \frac{\eta}{\kappa \sqrt{c - \bar c}} \le \hat\eta. \tag{6.3.18} \]
This estimate implies that $\tilde x \in \operatorname{int} B_\delta$ and that Theorem 6.7 with $(x_n, \lambda_n, \mu_n)$ replaced by $(\tilde x, \tilde\lambda, \tilde\mu)$ is applicable. It implies
\[ |(x_{n+1}, \lambda_{n+1}, \mu_{n+1}, \eta_{n+1}) - (x^*, \lambda^*, \mu^*, \eta^*)| \le \bar K\, |(\tilde x, \tilde\lambda, \tilde\mu) - (x^*, \lambda^*, \mu^*)|^2, \tag{6.3.19} \]
and combined with (6.3.18)
\[ |(x_{n+1}, \lambda_{n+1}, \mu_{n+1}, \eta_{n+1}) - (x^*, \lambda^*, \mu^*, \eta^*)| \le \frac{\bar K}{\kappa^2 (c - \bar c)}\, |(\lambda_n, \mu_n) - (\lambda^*, \mu^*)|^2 \le |(\lambda_n, \mu_n) - (\lambda^*, \mu^*)|. \]
This implies (6.3.17) with $i = n + 1$ as well as the claim of the theorem.
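The way the two entries in the definition of $\eta$ enter the induction step can be spelled out. The following worked step is a sketch under the assumption that the second entry of the minimum defining $\eta$ is $\kappa^2 (c - \bar c)/\bar K$, and that $|\delta_n| < \eta$ for $\delta_n := (\lambda_n, \mu_n) - (\lambda^*, \mu^*)$, which follows from the choice of $(\lambda_0, \mu_0)$ and the induction hypothesis (6.3.17).

```latex
\begin{aligned}
\bar K\, |(\tilde x, \tilde\lambda, \tilde\mu) - (x^*, \lambda^*, \mu^*)|^2
  &\le \frac{\bar K}{\kappa^2 (c - \bar c)}\, |\delta_n|^2
     && \text{by (6.3.18)} \\
  &= \left( \frac{\bar K\, |\delta_n|}{\kappa^2 (c - \bar c)} \right) |\delta_n|
   \le \frac{\bar K\, \eta}{\kappa^2 (c - \bar c)}\, |\delta_n|
     && \text{since } |\delta_n| < \eta \\
  &\le |\delta_n|
     && \text{since } \eta \le \frac{\kappa^2 (c - \bar c)}{\bar K}.
\end{aligned}
```

The first entry of the minimum, $\hat\eta\, \kappa \sqrt{c - \bar c}$, is what makes the right-hand side of (6.3.18) stay below $\hat\eta$, so that Theorem 6.7 is applicable at $(\tilde x, \tilde\lambda, \tilde\mu)$.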
6.4 Applications

6.4.1 An introductory example

Consider the optimal control problem
\[ \begin{cases} \min\ \tfrac12 \int_Q |y - z|^2\, dx\, dt + \tfrac{\beta}{2} |u|_U^2, \\ y_t = \Delta y + g(y) + Bu \quad \text{in } Q, \\ y = 0 \quad \text{on } \Sigma, \\ y(0, \cdot) = \varphi \quad \text{on } \Omega, \end{cases} \tag{6.4.1} \]
where $Q = (0, T) \times \Omega$ and $\Sigma = (0, T) \times \partial\Omega$. Define $e : X = Y \times U \to W$ by
\[ e(y, u) = y_t - \Delta y - g(y) - Bu. \]
Here $B \in \mathcal{L}(U, Y)$, where $U$ is the Hilbert space of controls, and the choice for $Y$ can be
\[ Y = \{y \in L^2(H_0^1(\Omega)) : y_t \in L^2(H^{-1}(\Omega))\} \quad \text{or} \quad Y = L^2(H_0^1(\Omega) \cap H^2(\Omega)) \cap W^{1,2}(L^2(\Omega)), \]
for example. Here $L^2(H_0^1(\Omega))$ is an abbreviation for $L^2(0, T; H_0^1(\Omega))$. The former choice corresponds to variational formulations, the latter to strong formulations of the partial differential equation, and both require regularity assumptions for $g$. Matching choices for $W$ are $W = L^2(H^{-1}(\Omega))$ and $W = L^2(L^2(\Omega))$. We proceed with $Y = \{y \in L^2(H_0^1(\Omega)) : y_t \in L^2(H^{-1}(\Omega))\}$. The Lagrangian is given by
\[ \begin{aligned} L(y, u, \lambda) &= J(y, u) + \langle \lambda, e(y, u) \rangle_{L^2(H^{-1})} \\ &= J(y, u) + \int_0^T \langle \lambda, (-\Delta)^{-1}(y_t - \Delta y - g(y) - Bu) \rangle_{H^{-1}, H_0^1}\, dt, \end{aligned} \]
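where $\Delta$ denotes the Laplacian with Dirichlet boundary conditions. The augmented Lagrangian SQP step as described below (6.2.14) is given by
\[ \tilde\lambda = \lambda + c\,(y_t - \Delta y - g(y) - Bu) \]
and
\[ \begin{pmatrix} I + g''(y)\Delta^{-1}\tilde\lambda & 0 & \left(\frac{\partial}{\partial t} + \Delta + g'(y)\right)\Delta^{-1} \\ 0 & \beta I & B^*\Delta^{-1} \\ (-\Delta)^{-1}\left(\frac{\partial}{\partial t} - \Delta - g'(y)\right) & \Delta^{-1}B & 0 \end{pmatrix} \begin{pmatrix} \delta y \\ \delta u \\ \delta\lambda \end{pmatrix} = -\begin{pmatrix} y - z - \left(\frac{\partial}{\partial t} + \Delta + g'(y)\right)(-\Delta)^{-1}\tilde\lambda \\ \beta u + B^*\Delta^{-1}\tilde\lambda \\ (-\Delta)^{-1}(y_t - \Delta y - g(y) - Bu) \end{pmatrix}, \]
with $\delta y(0, \cdot) = \delta\lambda(T, \cdot) = 0$. Inspection of the previous system (e.g., for the sake of symmetrization) suggests introducing the new variable $\Lambda = -\Delta^{-1}\lambda$. This results in an update for $\Lambda$ given by
\[ \tilde\Lambda = \Lambda + c\,(-\Delta)^{-1}(y_t - \Delta y - g(y) - Bu), \tag{6.4.2} \]
and the transformed system has the form
\[ \begin{pmatrix} I - g''(y)\tilde\Lambda & 0 & -\left(\frac{\partial}{\partial t} + \Delta + g'(y)\right) \\ 0 & \beta I & -B^* \\ \frac{\partial}{\partial t} - \Delta - g'(y) & -B & 0 \end{pmatrix} \begin{pmatrix} \delta y \\ \delta u \\ \delta\Lambda \end{pmatrix} = -\begin{pmatrix} y - z - \left(\frac{\partial}{\partial t} + \Delta + g'(y)\right)\tilde\Lambda \\ \beta u - B^*\tilde\Lambda \\ y_t - \Delta y - g(y) - Bu \end{pmatrix}. \tag{6.4.3} \]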
Let us also point out that if we had immediately discretized (6.4.1), then the differences between topologies would tend to get lost and $(-\Delta)^{-1}$ in (6.4.2) might have been forgotten. On the discretized level the effect of $(-\Delta)^{-1}$ can be restored by preconditioning.

Let us now return to the system (6.4.3). Due to its size it will, after discretization, not be solved by a direct method but rather by iteration based on conjugate gradients. The question of choice for preconditioners arises. The following two choices for block preconditioners were successful in our tests [KaK]:
\[ \begin{pmatrix} 0 & 0 & P^* \\ 0 & \beta I & 0 \\ P & 0 & 0 \end{pmatrix}^{-1} = \begin{pmatrix} 0 & 0 & P^{-1} \\ 0 & \beta^{-1} I & 0 \\ P^{-*} & 0 & 0 \end{pmatrix} \tag{6.4.4} \]
and
\[ \begin{pmatrix} I & 0 & P^* \\ 0 & \beta I & 0 \\ P & 0 & 0 \end{pmatrix}^{-1} = \begin{pmatrix} 0 & 0 & P^{-1} \\ 0 & \beta^{-1} I & 0 \\ P^{-*} & 0 & -P^{-*}P^{-1} \end{pmatrix}, \tag{6.4.5} \]
where $P = \partial_t - \Delta$. Note that (6.4.4) requires 2, whereas (6.4.5) needs 4, parabolic solves per iteration. Numerically, it appears that (6.4.4) is preferable to (6.4.5). For further investigations on preconditioning of SQP-like systems we refer the reader to [BaSa]. This may still be extended by the results of Battermann and Sachs.
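The two block inverse formulas can be checked numerically on a small stand-in. The sketch below is only a sanity check under stated assumptions: the operator $P$ is replaced by a hypothetical nonsymmetric $2 \times 2$ matrix (so that $P^*$ is the transpose), $\beta = 0.5$ is an arbitrary illustrative value, and the (3,3) block of the inverse in (6.4.5) is taken to be $-P^{-*}P^{-1}$. All helper names are invented for this example.

```python
def matmul(a, b):
    """Product of two dense matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def scale(s, a):
    return [[s * v for v in row] for row in a]


def inverse_2x2(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def assemble(blocks):
    """Assemble a 3x3 grid of 2x2 blocks into one 6x6 matrix."""
    return [[v for blk in block_row for v in blk[i]]
            for block_row in blocks for i in range(2)]


def is_identity(a, tol=1e-12):
    return all(abs(a[i][j] - (1.0 if i == j else 0.0)) < tol
               for i in range(len(a)) for j in range(len(a)))


beta = 0.5                      # illustrative regularization weight
P = [[2.0, 1.0], [0.0, 3.0]]    # stand-in for a discretized parabolic operator
I2 = [[1.0, 0.0], [0.0, 1.0]]
Z2 = [[0.0, 0.0], [0.0, 0.0]]
P_adj = transpose(P)
P_inv = inverse_2x2(P)
P_inv_adj = transpose(P_inv)

# Left- and right-hand sides of (6.4.4).
lhs_644 = assemble([[Z2, Z2, P_adj], [Z2, scale(beta, I2), Z2], [P, Z2, Z2]])
rhs_644 = assemble([[Z2, Z2, P_inv], [Z2, scale(1.0 / beta, I2), Z2],
                    [P_inv_adj, Z2, Z2]])

# Left- and right-hand sides of (6.4.5).
lhs_645 = assemble([[I2, Z2, P_adj], [Z2, scale(beta, I2), Z2], [P, Z2, Z2]])
rhs_645 = assemble([[Z2, Z2, P_inv], [Z2, scale(1.0 / beta, I2), Z2],
                    [P_inv_adj, Z2, scale(-1.0, matmul(P_inv_adj, P_inv))]])

check_644 = is_identity(matmul(lhs_644, rhs_644))
check_645 = is_identity(matmul(lhs_645, rhs_645))
```

Applying the right-hand side of (6.4.4) to a vector involves one solve with $P$ and one with $P^*$, while the extra block $-P^{-*}P^{-1}$ in (6.4.5) adds two more, which is consistent with the solve counts quoted above.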
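6.4.2 A class of nonlinear elliptic optimal control problems

The general framework of the previous section will be applied to optimal control problems governed by partial differential equations of the type
\[ \begin{cases} -\Delta y + g(y) = \hat f & \text{in } \Omega, \\ \frac{\partial y}{\partial n} = u & \text{on } \Gamma_1, \\ \frac{\partial y}{\partial n} = u_f & \text{on } \Gamma_2, \end{cases} \tag{6.4.6} \]
where $\hat f \in L^2(\Omega)$ and $u_f \in L^2(\Gamma_2)$ are fixed and $u \in L^2(\Gamma_1)$ will be the control variable. Here $\Omega$ is a bounded domain in $\mathbb{R}^2$ with $C^{1,1}$ boundary or $\Omega$ is convex. For the case of higher-dimensional domains we refer the reader to [IK10]. The boundary $\partial\Omega$ is assumed to consist of two disjoint sets $\Gamma_1$, $\Gamma_2$, each of which is connected, or possibly consisting of finitely many connected components, with $\Gamma = \partial\Omega = \Gamma_1 \cup \Gamma_2$, and $\Gamma_2$ possibly empty. Further it is assumed that $g \in C^2(\mathbb{R})$ and that $g(H^1(\Omega)) \subset L^{1+\varepsilon}(\Omega)$ for some $\varepsilon > 0$. Equation (6.4.6) is understood in the variational sense, i.e.,
\[ (\nabla y, \nabla\varphi)_\Omega + (g(y), \varphi)_\Omega = (\hat f, \varphi)_\Omega + (\tilde u, \varphi)_\Gamma \quad \text{for all } \varphi \in H^1(\Omega), \tag{6.4.7} \]
where
\[ \tilde u = \begin{cases} u & \text{on } \Gamma_1, \\ u_f & \text{on } \Gamma_2, \end{cases} \]
$(\cdot, \cdot)_\Gamma$ denotes the $L^2$-inner product on $\Gamma$, and $(\cdot, \cdot)_\Omega$ stands for duality pairing between functions in $L^p(\Omega)$ and $L^q(\Omega)$ with $p^{-1} + q^{-1} = 1$ for appropriate choices of $p$. We recall that $H^1(\Omega)$ is continuously embedded into $L^p(\Omega)$ for every $p \ge 1$ if $n = 2$. In (6.4.7) we should more precisely write $(\tilde u, \tau_\Gamma \varphi)_\Gamma$ instead of $(\tilde u, \varphi)_\Gamma$, with $\tau_\Gamma$ the zero order trace operator on $\Gamma$. However, we shall frequently suppress this notation. We refer to $y$ as a solution of (6.4.6) if (6.4.7) holds. The optimal control problem is given by
\[ \begin{cases} \min\ \tfrac12 |Cy - y_d|_Z^2 + \tfrac{\alpha}{2} |u|_{L^2(\Gamma_1)}^2 \\ \text{subject to } (y, u) \in H^1(\Omega) \times L^2(\Gamma_1) \text{ a solution of (6.4.6).} \end{cases} \tag{6.4.8} \]
Here $C$ is a bounded linear (observation) operator from $H^1(\Omega)$ to a Hilbert space $Z$, and $y_d \in Z$ and $\alpha > 0$ are fixed. To express (6.4.8) in the form (6.2.1) of Section 6.2 we introduce $\tilde e : H^1(\Omega) \times L^2(\Gamma_1) \to H^1(\Omega)^*$ with
\[ \langle \tilde e(y, u), \varphi \rangle_{(H^1)^*, H^1} = (\nabla y, \nabla\varphi)_\Omega + (g(y) - \hat f, \varphi)_\Omega - (\tilde u, \varphi)_\Gamma \]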
and $e : H^1(\Omega) \times L^2(\Gamma_1) \to H^1(\Omega)$ by
\[ e = N \tilde e, \]
where $N : H^1(\Omega)^* \to H^1(\Omega)$ is the Neumann solution operator associated with
\[ (\nabla v, \nabla\varphi)_\Omega + (v, \varphi)_\Omega = (h, \varphi)_\Omega \quad \text{for all } \varphi \in H^1(\Omega), \]
where $h \in H^1(\Omega)^*$. In the context of Section 6.2 we set
\[ X = H^1(\Omega) \times L^2(\Gamma_1), \qquad Y = H^1(\Omega), \]
with $x = (y, u) \in X$, and
\[ f(x) = f(y, u) = \frac12 |Cy - y_d|_Z^2 + \frac{\alpha}{2} |u|_{L^2(\Gamma_1)}^2. \]
We assume that (6.4.8) admits a solution $x^* = (y^*, u^*)$. The regularity requirements (6.2.2) of Section 6.2 are clearly met by the mapping $f$. Those for $e$ are implied by

(h0) $y \to g(y)$ is twice continuously differentiable from $H^1(\Omega)$ to $L^{1+\varepsilon}(\Omega)$ for some $\varepsilon > 0$ with Lipschitz continuous second derivative in a neighborhood of $y^*$.

We shall also require the following hypothesis:

(h1) $g'(y^*) \in L^{2+\varepsilon}(\Omega)$ for some $\varepsilon > 0$.

With (h1) holding, $g'(y^*)\varphi \in L^2(\Omega)$ for every $\varphi \in H^1(\Omega)$. It is simple to argue that
\[ \ker e'(x^*) = \{(v, h) : \langle \nabla v, \nabla\varphi \rangle_\Omega + \langle g'(y^*) v, \varphi \rangle_\Omega = \langle h, \varphi \rangle_{\Gamma_1} \text{ for all } \varphi \in H^1(\Omega)\}, \]
i.e., $(v, h) \in \ker e'(x^*)$ if and only if $(v, h)$ is a variational solution of
\[ \begin{cases} -\Delta v + g'(y^*) v = 0 & \text{in } \Omega, \\ \frac{\partial v}{\partial n} = h & \text{on } \Gamma_1, \\ \frac{\partial v}{\partial n} = 0 & \text{on } \Gamma_2. \end{cases} \tag{6.4.9} \]
If (6.2.3) is satisfied, i.e., if $e'(x^*)$ is surjective, then there exists a unique Lagrange multiplier $\lambda^* \in H^1(\Omega)$ associated to $x^*$ such that
\[ e'(x^*)^*\lambda^* + (N C^*(Cy^* - y_d), \alpha u^*) = 0 \quad \text{in } H^1(\Omega) \times L^2(\Gamma_1), \tag{6.4.10} \]
where $e'(x^*)^* : H^1(\Omega) \to H^1(\Omega) \times L^2(\Gamma_1)$ denotes the adjoint of $e'(x^*)$ and $C^* : Z \to H^1(\Omega)^*$ stands for the adjoint of $C : H^1(\Omega) \to Z$ with $Z$ a pivot space. More precisely we have the following proposition.
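Proposition 6.10 (Necessary Condition). If $x^*$ is a solution to (6.4.8), $e'(x^*)$ is surjective, and (h1) holds, then $\lambda^*$ is a variational solution of
\[ \begin{cases} -\Delta\lambda^* + g'(y^*)\lambda^* = -C^*(Cy^* - y_d) & \text{in } \Omega, \\ \frac{\partial\lambda^*}{\partial n} = 0 & \text{on } \Gamma, \end{cases} \tag{6.4.11} \]
i.e., $(\nabla\lambda^*, \nabla\varphi)_\Omega + (g'(y^*)\lambda^*, \varphi)_\Omega + (Cy^* - y_d, C\varphi)_Z = 0$ for all $\varphi \in H^1(\Omega)$, and
\[ \tau_{\Gamma_1}\lambda^* = \alpha u^* \quad \text{on } \Gamma_1. \tag{6.4.12} \]

Proof. The Lagrangian associated with (6.4.8) can be expressed by
\[ L(y, u, \lambda) = \frac12 |Cy - y_d|_Z^2 + \frac{\alpha}{2} |u|_{L^2(\Gamma_1)}^2 + \langle \nabla\lambda, \nabla y \rangle_\Omega + \langle \lambda, g(y) - \hat f \rangle_\Omega - \langle \lambda, \tilde u \rangle_\Gamma. \]
For every $(v, h) \in H^1(\Omega) \times L^2(\Gamma_1)$ we find
\[ L_y(y^*, u^*, \lambda^*)(v) = \langle Cy^* - y_d, Cv \rangle_Z + \langle \nabla\lambda^*, \nabla v \rangle_\Omega + \langle \lambda^*, g'(y^*) v \rangle_\Omega \]
and
\[ L_u(y^*, u^*, \lambda^*)(h) = \alpha \langle u^*, h \rangle_{\Gamma_1} - \langle \lambda^*, h \rangle_{\Gamma_1}. \]
Thus the claim follows.

We now aim for a priori estimates for $\lambda^*$. We define $B : H^1(\Omega) \to H^1(\Omega)^*$ as the differential operator given by the left-hand side of (6.4.11); i.e., $Bv = \varphi$ is characterized as the solution to
\[ \langle \nabla v, \nabla\psi \rangle_\Omega + \langle g'(y^*) v, \psi \rangle_\Omega = \langle \varphi, \psi \rangle_{(H^1)^*, H^1} \quad \text{for all } \psi \in H^1(\Omega). \]
We shall use the following hypothesis:

(h2) 0 is not an eigenvalue of $B$.

Note that (h2) holds, for example, if $g'(y^*) \ge \beta$ a.e. on $\Omega$ for some $\beta > 0$. With (h2) holding, $B$ is an isomorphism from $H^1(\Omega)$ onto $H^1(\Omega)^*$. Moreover, (h2) implies surjectivity of $e'(x^*)$.

Lemma 6.11. Let the conditions of Proposition 6.10 hold.
(i) There exists a constant $K(x^*)$ such that
\[ |\lambda^*|_{H^1} \le K(x^*)\, |(N C^*(Cy^* - y_d), \alpha u^*)|_X. \]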
(ii) If moreover (h2) is satisfied and $C^*(Cy^* - y_d) \in L^2(\Omega)$, then there exists a constant $K(y^*)$ such that
\[ |\lambda^*|_{H^2} \le K(y^*)\, |C^*(Cy^* - y_d)|_{L^2(\Omega)}. \]

Proof. Due to surjectivity of $e'(x^*)$ we have $(e'(x^*) e'(x^*)^*)^{-1} \in \mathcal{L}(H^1(\Omega))$ and thus (i) follows from (6.4.10). Let us turn to (ii). Due to (h2) and (6.4.11) there exists a constant $K_{y^*}$ such that
\[ |\lambda^*|_{H^1} \le K_{y^*}\, |C^*(Cy^* - y_d)|_{(H^1)^*}. \tag{6.4.13} \]
To obtain the desired $H^2(\Omega)$ estimate for $\lambda^*$ one applies the well-known $H^2$ a priori estimate for Neumann problems to
\[ -\Delta\lambda^* + \lambda^* = w \ \text{in } \Omega, \qquad \frac{\partial\lambda^*}{\partial n} = 0 \ \text{on } \partial\Omega, \]
with $w = \lambda^* - g'(y^*)\lambda^* - C^*(Cy^* - y_d)$. This gives
\[ |\lambda^*|_{H^2} \le K\left(|\lambda^*|_{L^2} + |g'(y^*)\lambda^*|_{L^2} + |C^*(Cy^* - y_d)|_{L^2}\right) \]
for a constant $K$ (depending on $\Omega$ but independent of $y^*$). Since
\[ |g'(y^*)\lambda^*|_{L^2} \le |g'(y^*)|_{L^{1+\varepsilon}}\, |\lambda^*|_{H^1}, \]
the desired result follows from (6.4.13).

To calculate the second Fréchet derivative we shall use

(h3) $g''(y^*) \in L^{1+\varepsilon}(\Omega)$ for some $\varepsilon > 0$.

Proposition 6.12. Let (6.2.3), (h1), and (h3) hold. Then
\[ L''(y^*, u^*, \lambda^*)((v, h), (v, h)) = |Cv|_Z^2 + \alpha |h|_{L^2(\Gamma_1)}^2 + \langle \lambda^*, g''(y^*) v^2 \rangle_\Omega \tag{6.4.14} \]
for all $(v, h) \in X$.

Proof. It suffices to observe that by Sobolev's embedding theorem there exists a constant $K_e$ such that
\[ |\langle \lambda^*, g''(y^*) v^2 \rangle_\Omega| \le K_e\, |\lambda^*|_{H^1}\, |g''(y^*)|_{L^{1+\varepsilon}}\, |v|_{H^1}^2 \tag{6.4.15} \]
for all $v \in H^1(\Omega)$.

We turn now to an analysis of the second order sufficient optimality condition (6.2.5) for the optimal control problem (6.4.8). In view of (6.4.14) the crucial term is given by $\langle \lambda^*, g''(y^*) v^2 \rangle_\Omega$. Two types of results will be given. The first will rely on $|\langle \lambda^*, g''(y^*) v^2 \rangle_\Omega|$ being sufficiently small. This can be achieved by guaranteeing that $\lambda^*$ or, in view of (6.4.11), $Cy^* - y_d$ is small. We refer to this case as small residual problems. The second class of assumptions rests on guaranteeing that $\lambda^* g''(y^*) \ge 0$ a.e. on $\Omega$.

In the statement of the following theorem we use $K_e$ from (6.4.15) and $K(x^*)$, $K(y^*)$ from Lemma 6.11. Further $\|B^{-1}\|$ denotes the norm of $B^{-1}$ as an operator from $H^1(\Omega)^*$ to $H^1(\Omega)$.

Theorem 6.13. Let (6.2.3), (h1)–(h3) hold.
(i) If $Z = H^1(\Omega)$, $C = \mathrm{id}$, and
\[ K_e\, K(x^*)\, |(y^* - y_d, \alpha u^*)|_X\, |g''(y^*)|_{L^{1+\varepsilon}} < 1, \tag{6.4.16} \]
then the second order sufficient optimality condition (6.2.5) holds.
(ii) If $Z = L^2(\Omega)$, $C$ is the injection of $H^1(\Omega)$ into $L^2(\Omega)$, and
\[ \tilde k_e\, K(y^*)\, |y^* - y_d|_{L^2(\Omega)}\, |g''(y^*)|_{L^\infty(\Omega)} \le 1, \tag{6.4.17} \]
where $\tilde k_e$ is the embedding constant of $H^2(\Omega)$ into $L^\infty(\Omega)$, then (6.2.5) is satisfied.
(iii) If
\[ 2\, \|B^{-1}\|\, \|\tau_{\Gamma_1}\|\, K_{y^*}\, |C^*(Cy^* - y_d)|_{(H^1)^*} < \alpha, \tag{6.4.18} \]
where $\|\tau_{\Gamma_1}\|$ is the norm of the trace operator from $H^1(\Omega)$ onto $L^2(\Gamma_1)$ and $K_{y^*}$ is defined in (6.4.13), then (6.2.5) is satisfied.

Proof. (i) By (6.4.14) and (6.4.15) we have for every $(v, h) \in X$
\[ \begin{aligned} L''(y^*, u^*, \lambda^*)((v, h), (v, h)) &\ge |v|_{H^1(\Omega)}^2 + \alpha |h|_{L^2(\Gamma_1)}^2 - K_e\, |\lambda^*|_{H^1(\Omega)}\, |g''(y^*)|_{L^{1+\varepsilon}(\Omega)}\, |v|_{H^1(\Omega)}^2 \\ &\ge \left(1 - K_e\, K(x^*)\, |(y^* - y_d, \alpha u^*)|_X\, |g''(y^*)|_{L^{1+\varepsilon}(\Omega)}\right) |v|_{H^1}^2 + \alpha |h|_{L^2(\Gamma_1)}^2, \end{aligned} \]
where in the last estimate we used Lemma 6.11 (i). The claim now follows from (6.4.16). We observe that in this case $L''(y^*, u^*, \lambda^*)$ is positive definite on all of $X$, not only on $\ker e'(x^*)$.

(ii) By (6.4.9) and (h2) we obtain
\[ |v|_{H^1(\Omega)} \le \|B^{-1}\|\, \|\tau_{\Gamma_1}\|\, |h|_{L^2(\Gamma_1)} \quad \text{for all } (v, h) \in \ker e'(x^*). \tag{6.4.19} \]
Here $\|\tau_{\Gamma_1}\|$ denotes the norm of the trace operator from $H^1(\Omega)$ onto $L^2(\Gamma_1)$. Hence by Lemma 6.11 (ii) and (6.4.19) we find for every $(v, h) \in \ker e'(x^*)$
\[ \begin{aligned} L''(y^*, u^*, \lambda^*)((v, h), (v, h)) &= |v|_{L^2(\Omega)}^2 + \alpha |h|_{L^2(\Gamma_1)}^2 + \langle \lambda^*, g''(y^*) v^2 \rangle_{L^2(\Omega)} \\ &\ge |v|_{L^2(\Omega)}^2 + \alpha |h|_{L^2(\Gamma_1)}^2 - |\lambda^*|_{L^\infty(\Omega)}\, |g''(y^*)|_{L^\infty(\Omega)}\, |v|_{L^2(\Omega)}^2 \\ &\ge \left[1 - \tilde k_e\, K(y^*)\, |g''(y^*)|_{L^\infty(\Omega)}\, |y^* - y_d|_{L^2(\Omega)}\right] |v|_{L^2(\Omega)}^2 + \frac{\alpha}{2}\left(|h|_{L^2(\Gamma_1)}^2 + \frac{1}{\|B^{-1}\|^2 \|\tau_{\Gamma_1}\|^2}\, |v|_{H^1(\Omega)}^2\right). \end{aligned} \]
Due to (6.4.17) the expression in brackets is nonnegative and the result follows.

(iii) In this case $C$ can be a boundary observation operator, for example. As in (i) we find
\[ \begin{aligned} L''(y^*, u^*, \lambda^*)((v, h), (v, h)) &\ge |Cv|_Z^2 + \alpha |h|_{L^2(\Gamma_1)}^2 - K_e\, |\lambda^*|_{H^1(\Omega)}\, |g''|_{L^{1+\varepsilon}(\Omega)}\, |v|_{H^1(\Omega)}^2 \\ &\ge |Cv|_Z^2 + \alpha |h|_{L^2(\Gamma_1)}^2 - K_{y^*}\, |C^*(Cy^* - y_d)|_{(H^1)^*}\, |g''|_{L^{1+\varepsilon}(\Omega)}\, |v|_{H^1(\Omega)}^2, \end{aligned} \]
where (6.4.13) was used. This estimate and (6.4.19) imply that for $(v, h) \in \ker e'(x^*)$
\[ L''(y^*, u^*, \lambda^*)((v, h), (v, h)) \ge |Cv|_Z^2 + \frac{\alpha}{2} |h|_{L^2(\Gamma_1)}^2 + \left(\frac{\alpha}{2 \|B^{-1}\|^2 \|\tau_{\Gamma_1}\|^2} - K_{y^*}\, |C^*(Cy^* - y_d)|_{(H^1)^*}\, |g''|_{L^{1+\varepsilon}(\Omega)}\right) |v|_{H^1(\Omega)}^2. \]
The desired result follows from (6.4.18).

In view of (6.4.16)–(6.4.18) and the fact that $|y^* - y_d|$ is decreasing with $\alpha \to 0^+$, the question arises whether decreasing $\alpha$ results in the second order sufficient optimality condition being satisfied. This is nontrivial since the term $|y^* - y_d|$ in (6.4.16)–(6.4.18) is multiplied by factors which depend on $x^*$ and hence on $\alpha$ itself. We refer the reader to [IK10] for a treatment of this issue.

In Theorem 6.13 the second order optimality condition was guaranteed by small residual conditions. Alternatively one can proceed by assuming that

(h4) $\lambda^* g''(y^*) \ge 0$ a.e. on $\Omega$.

Theorem 6.14. Assume that (6.2.3), (h1), (h3), and (h4) hold and that (a) $Z = H^1(\Omega)$ and $C = \mathrm{id}$, or (b) (h2) is satisfied. Then (6.2.5) holds.

Proof. By Proposition 6.12 and (h4) we find for all $(v, h) \in X$
\[ L''(y^*, u^*, \lambda^*)((v, h), (v, h)) \ge |Cv|_Z^2 + \alpha |h|_{L^2(\Gamma_1)}^2. \]
In the case that (a) holds the conclusion is obvious and $L''(y^*, u^*, \lambda^*)$ is positive definite not only on $\ker e'(x^*)$ but on all of $X$. In case (b) we use (6.4.19) to conclude that
\[ L''(y^*, u^*, \lambda^*)((v, h), (v, h)) \ge |Cv|_Z^2 + \frac{\alpha}{2}\left(|h|_{L^2(\Gamma_1)}^2 + \frac{1}{\|B^{-1}\|^2 \|\tau_{\Gamma_1}\|^2}\, |v|_{H^1(\Omega)}^2\right) \]
for all $(v, h) \in \ker e'(x^*)$.

Next we give a sufficient condition for (h4).
Theorem 6.15. Let (6.2.3), (h1) hold and assume that
(i) $\langle C^*(y_d - Cy^*), \psi \rangle_{(H^1)^*, H^1} \ge 0$ for all $\psi \in H^1(\Omega)$ with $\psi \ge 0$ a.e.,
(ii) $g'(y^*) \ge 0$ a.e. on $\Omega$, and
(iii) $g''(y^*) \ge 0$ a.e. on $\Omega$.
Then (h4) holds. The conclusion remains correct if the inequalities in (i) and (iii) are reversed.

Proof. Set $\varphi = \inf(0, \lambda^*) \in H^1(\Omega)$ in (6.4.11). Then we have
\[ \int_\Omega |\nabla\varphi|^2\, dx + (g'(y^*)\varphi, \varphi)_\Omega + \langle C^*(Cy^* - y_d), \varphi \rangle_{(H^1)^* \times H^1} = 0. \]
Since $g'(y^*) \ge 0$ it follows from (i) that $|\nabla\varphi|_{L^2(\Omega)}^2 = 0$ and $\lambda^* \ge 0$. Together with (iii) we find $\lambda^* g''(y^*) \ge 0$ a.e. on $\Omega$. If the inequalities in (i) and (iii) are reversed, we take $\varphi = \sup(0, \lambda^*)$.

Example 6.16. We consider
\[ \begin{cases} -\Delta y + y^3 - y = \hat f & \text{in } \Omega, \\ \frac{\partial y}{\partial n} = u & \text{on } \Gamma \end{cases} \tag{6.4.20} \]
and the associated optimal control problem
\[ \begin{cases} \min\ \tfrac12 \int_\Omega |y - y_d|^2\, dx + \tfrac{\alpha}{2} \int_\Gamma u^2\, ds \\ \text{subject to } (y, u) \in H^1(\Omega) \times L^2(\Gamma) \text{ a solution of (6.4.20).} \end{cases} \tag{6.4.21} \]
In the context of the problem (6.4.8) we set $\Gamma_1 = \Gamma = \partial\Omega$, $\Gamma_2 = \emptyset$, and
\[ Z = L^2(\Omega), \qquad C : H^1(\Omega) \to L^2(\Omega) \text{ canonical injection}, \qquad g(t) = t^3 - t. \]
Equation (6.4.20) represents a simplified Ginzburg–Landau model for superconductivity with $y$ denoting the wave function, which is valid in the absence of internal magnetic fields (see [Tin, Chapters 1, 4]). Both (6.4.20) and $-\Delta y + y^3 + y = \tilde h$ are of interest in this context, but here we concentrate on (6.4.20) since it has three equilibria, $\pm 1$ and 0, of which $\pm 1$ are stable and 0 is unstable.

Proposition 6.17. Problem (6.4.21) admits a solution $(y^*, u^*) \in H^1(\Omega) \times L^2(\Gamma)$.

Proof. We first argue that the set of admissible pairs $(y, u) \in H^1(\Omega) \times L^2(\Gamma)$ for (6.4.21) is not empty. For this purpose we consider
\[ \begin{cases} -\Delta y + y^3 - y = \hat f & \text{in } \Omega, \\ y = 0 & \text{on } \partial\Omega. \end{cases} \tag{6.4.22} \]
Let $T : L^6(\Omega) \to L^6(\Omega)$ be the operator defined by $T(y) = (-\Delta)^{-1}(\hat f + y - y^3)$, where $\Delta$ denotes the Laplacian with Dirichlet boundary conditions. Clearly $T$ is completely continuous and $(I - T)y = 0$ implies that $y$ is a solution to (6.4.22) with $y \in H^2(\Omega) \cap H_0^1(\Omega)$. By a variant of Schauder's fixed point theorem due to Schäfer [Dei, p. 61], either $(I - tT)y = 0$ admits a solution for every $t \in [0, 1]$ or $(I - \hat t\, T)y = 0$ admits an unbounded set of solutions for some $\hat t \in (0, 1)$. Assume that the latter is the case and let $y = y(\hat t)$ satisfy $(I - \hat t\, T)y = 0$. Then $-(\hat t)^{-1}\Delta y + y^3 - y = \hat f$, and hence
\[ \begin{aligned} \hat t^{-1} \int_\Omega |\nabla y|^2\, dx + \int_\Omega y^4\, dx &= \int_\Omega y^2\, dx + \int_\Omega \hat f\, y\, dx \\ &\le \int_{|y(x)| > 2} y^2\, dx + \int_{|y(x)| \le 2} y^2\, dx + \int_\Omega |y|\, |\hat f|\, dx. \end{aligned} \]
This implies that
\[ \hat t^{-1} \int_\Omega |\nabla y|^2\, dx \le 4\, |\Omega| + \int_\Omega |y|\, |\hat f|\, dx. \tag{6.4.23} \]
Since $y \in H_0^1(\Omega)$, Poincaré's inequality implies the existence of a constant $\hat C$ independent of $y = y(\hat t)$ such that
\[ |y(\hat t)|_{H_0^1} \le \hat C\left(|\hat f|_{L^2} + |\Omega|\right). \]
Consequently the set of solutions $y(\hat t)$ to $(I - \hat t\, T)y = 0$ is necessarily bounded in $L^6(\Omega)$, and (6.4.22) admits a solution $y \in H^2(\Omega) \cap H_0^1(\Omega)$. Setting $u = \frac{\partial y}{\partial n} \in H^{1/2}(\Gamma) \subset L^2(\Gamma)$ we find the existence of an admissible pair for (6.4.21).

Let $(y_n, u_n) \in H^1(\Omega) \times L^2(\Gamma)$ be a minimizing sequence for (6.4.21). Due to the choice of the cost functional the sequence $\{u_n\}_{n=1}^\infty$ is bounded in $L^2(\Gamma)$. Proceeding as in (6.4.23) it is simple to argue that $\{y_n\}_{n=1}^\infty$ is bounded in $H^1(\Omega)$. Standard weak convergence arguments then imply that every weak subsequential limit of $\{(y_n, u_n)\}$ is a solution to (6.4.21).

Proposition 6.18. For every $(y, u) \in X = H^1(\Omega) \times L^2(\Gamma)$ the linearization $e'(y, u) : X \to H^1(\Omega)$ is surjective.

Proof. We follow [GHS]. Since $e = N\tilde e$ with $N$ defined below (6.4.8) it suffices to show that $E := \tilde e'(y, u) : X \to H^1(\Omega)^*$ is surjective, where $E$ is characterized by
\[ \langle E(v, h), w \rangle_{H^{1,*}, H^1} = \langle \nabla v, \nabla w \rangle_\Omega + \langle q\, v, w \rangle_\Omega - \langle h, \tau_\Gamma w \rangle_\Gamma, \]
with $q = 3y^2 - 1 \in H^1(\Omega)$. Consider the operator $E_0 : H^1(\Omega) \to H^1(\Omega)^*$ defined by
\[ \langle E_0(v), w \rangle_{H^{1,*}, H^1} = \langle \nabla v, \nabla w \rangle_\Omega + \langle q\, v, w \rangle_\Omega, \]
and observe that $E_0 = -\Delta + I + (q - 1)I$; i.e., $E_0$ is a compact perturbation by the operator $(q - 1)I \in \mathcal{L}(H^1(\Omega), H^1(\Omega)^*)$ of the homeomorphism $(-\Delta + I) \in \mathcal{L}(H^1(\Omega), H^1(\Omega)^*)$.
By the Fredholm alternative either $E_0$, and hence $E$, is surjective or there exists a finite-dimensional kernel of $E_0$ spanned by the basis $\{v_i\}_{i=1}^N$ in $H^1(\Omega)$ and $E_0 v = z$, for $z \in H^1(\Omega)^*$, is solvable if and only if $\langle z, v_i \rangle_{H^{1,*}, H^1} = 0$ for all $i = 1, \dots, N$. In the latter case $v_i$, $i = 1, \dots, N$, are nontrivial on $\Gamma$ by Lemma 6.19 below. Without loss of generality we may assume that $\langle v_i, v_j \rangle_\Gamma = \delta_{ij}$ for $i, j = 1, \dots, N$. For $z \in H^1(\Omega)^*$ we define $\hat z \in H^1(\Omega)^*$ by
\[ \langle \hat z, w \rangle_{H^{1,*}, H^1} = \langle z, w \rangle_{H^{1,*}, H^1} + \langle h, \tau_\Gamma w \rangle_\Gamma, \quad \text{where } h = -\sum_{i=1}^N \langle z, v_i \rangle_{H^{1,*}, H^1}\, \tau_\Gamma v_i \in L^2(\Gamma). \]
Then $\langle \hat z, v_i \rangle_{H^{1,*}, H^1} = 0$ for all $i = 1, \dots, N$ and hence there exists $v \in H^1(\Omega)$ such that $E_0 v = \hat z$ or equivalently
\[ \langle \nabla v, \nabla w \rangle_\Omega + \langle q\, v, w \rangle_\Omega - \langle h, \tau_\Gamma w \rangle_\Gamma = \langle z, w \rangle_{H^{1,*}, H^1} \quad \text{for all } w \in H^1(\Omega). \]
Consequently $E$ is surjective.

Lemma 6.19. If for $q \in L^2(\Omega)$ the function $v \in H^1(\Omega)$ satisfies
\[ \langle \nabla v, \nabla w \rangle_\Omega + \langle q\, v, w \rangle_\Omega = 0 \quad \text{for all } w \in H^1(\Omega) \tag{6.4.24} \]
and $v = 0$ on $\Gamma$, then $v = 0$ in $\Omega$.

Proof. Let $\tilde\Omega$ be a bounded domain with smooth boundary strictly containing $\Omega$. Define
\[ \tilde v = \begin{cases} v & \text{in } \Omega, \\ 0 & \text{in } \tilde\Omega \setminus \Omega, \end{cases} \]
and let $\tilde q$ denote the extension by 0 of $q$. Since $v = 0$ on $\Gamma = \partial\Omega$ we have $\tilde v \in H^1(\tilde\Omega)$ and
\[ \langle \nabla\tilde v, \nabla w \rangle_{\tilde\Omega} + \langle \tilde q\, \tilde v, w \rangle_{\tilde\Omega} = 0 \quad \text{for all } w \in H_0^1(\tilde\Omega) \]
by (6.4.24). This, together with the fact that $\tilde v = 0$ on an open nonempty subset of $\tilde\Omega$, implies that $\tilde v = 0$ in $\tilde\Omega$ and $v = 0$ in $\Omega$; see [Geo].

Let us discuss the validity of the conditions (hi) for the present problem. Sobolev's embedding theorem and
\[ g'(t) = 3t^2 - 1 \quad \text{and} \quad g''(t) = 6t \]
imply that (h1), (h3), and (h4) hold. The location of the equilibria of $g$ suggests that for $\hat f = 0$, $y_d \ge 1$ implies $1 \le y^* \le y_d$ and similarly that $y_d \le -1$ implies $y_d \le y^* \le -1$. This was confirmed in numerical experiments [IK10]. In these cases $g'(y^*) \ge \beta > 0$ and (i)–(iii) of Theorem 6.15 hold.

The numerical results in [IK10] are based on Algorithm 6.3. While it does not contain a strategy on the choice of $c$, it was observed that $c > 0$ and of moderate size is superior (with respect to region of attraction for convergence) to $c = 0$, which corresponds to the SQP method without line search. For example, for the choice $y_d = 1$ and the initialization $y_0 = -2$, the iterates $y_n$ have to pass through the stable equilibrium $-1$ and the unstable equilibrium 0 to reach the desired state $y_d$. This can be accomplished with Algorithm 6.3 with $c > 0$, without globalization strategy, but not with $c = 0$. Similar comments apply to the case of Neumann boundary controls.
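The statements about the nonlinearity $g(t) = t^3 - t$ used in this example can be checked with a few lines of arithmetic. The sketch below evaluates $g$, $g'$, and $g''$ at the three equilibria and on sample values with $|t| \ge 1$. The stability labels rest on an assumption that is not spelled out in the text: an equilibrium $t_0$ is called stable here if $g'(t_0) > 0$, which is the usual criterion for the spatially constant dynamics $y_t = -g(y)$. The sample points are arbitrary.

```python
def g(t):
    """Nonlinearity of Example 6.16."""
    return t ** 3 - t


def dg(t):
    """First derivative g'(t) = 3 t^2 - 1."""
    return 3.0 * t * t - 1.0


def d2g(t):
    """Second derivative g''(t) = 6 t."""
    return 6.0 * t


equilibria = [-1.0, 0.0, 1.0]

# Assumed criterion: stable if g'(t0) > 0 for the dynamics y_t = -g(y).
labels = {t0: ("stable" if dg(t0) > 0 else "unstable") for t0 in equilibria}

# For states with y >= 1 one has g'(y) >= 2 > 0 and g''(y) >= 0,
# and for y <= -1 the sign of g'' is reversed.
upper_samples = [1.0, 1.25, 1.5, 2.0]
lower_samples = [-t for t in upper_samples]
upper_ok = all(dg(t) >= 2.0 and d2g(t) >= 0.0 for t in upper_samples)
lower_ok = all(dg(t) >= 2.0 and d2g(t) <= 0.0 for t in lower_samples)
```

On the range $1 \le y^* \le y_d$ the bound $g'(y^*) \ge 2$ gives a concrete value for the constant $\beta$ in the sufficient condition for (h2).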
Example 6.20. This is the singular system
\[ \begin{cases} -\Delta y - y^3 = h & \text{in } \Omega, \\ \frac{\partial y}{\partial n} = u & \text{on } \Gamma, \end{cases} \tag{6.4.25} \]
and the associated optimal control problem
\[ \begin{cases} \min\ \tfrac12 |y - y_d|_{H^1(\Omega)}^2 + \tfrac{\alpha}{2} \int_\Gamma u^2\, ds \\ \text{subject to } (y, u) \in H^1(\Omega) \times L^2(\Gamma) \text{ a solution of (6.4.25),} \end{cases} \tag{6.4.26} \]
where $y_d \in H^1(\Omega)$. If (6.4.25) admits at least one feasible pair $(y, u)$, then it is simple to argue that (6.4.26) has a solution $x^* = (y^*, u^*)$. We refer the reader to [Lio2, Chapter 3] for existence results in the case that the cost functional is of the form $|y - y_d|_{L^r(\Omega)}^r + \alpha |u|_{L^2(\Gamma)}^2$ for appropriately chosen $r > 2$. The existence of a Lagrange multiplier is assured in the same manner as in Example 6.6. Clearly (h1) and (h3) are satisfied. For $y_d = \text{const} \ge \frac12$ we observed numerically that $0 \le y^* \le y_d$, $\lambda^* < 0$, which in view of (h4) and Theorem 6.14 explains the second order convergence rate that is observed numerically.
6.5 Approximation and mesh-independence

This section is devoted to a brief description of some aspects concerning approximation of Algorithm 6.3. For this purpose let $h$ denote a discretization parameter tending to zero and, for each $h$, let $X_h$ and $W_h$ be finite-dimensional subspaces of $X$ and $W$, respectively. In this section we set $Z = X \times W$ and $Z_h = X_h \times W_h$. The finite-dimensional spaces $X_h$ and $W_h$ are endowed with the inner products and norms induced by $X$ and $W$. We introduce surjective restriction operators $R_h^X \in \mathcal{L}(X, X_h)$ and $R_h^W \in \mathcal{L}(W, W_h)$ and define $R_h = (R_h^X, R_h^W)$. For each $h$, let $f_h : X_h \to \mathbb{R}$ and $e_h : X_h \to W_h$ denote discretizations of the cost functional $f$ and the constraint $e$. Together with the infinite-dimensional problem (6.2.1) we consider the finite-dimensional discrete problems
\[ \min f_h(x_h) \ \text{over } x_h \in X_h \quad \text{subject to } e_h(x_h) = 0. \tag{6.5.1} \]
The Lagrangians for (6.5.1) are given by
\[ L_h(x_h, \lambda_h) = f_h(x_h) + \langle e_h(x_h), \lambda_h \rangle_W, \]
and the approximations of (6.2.6) are
\[ L_h'(x_h, \lambda_h + c\, e_h(x_h)) = 0, \qquad e_h(x_h) = 0, \tag{6.5.2} \]
which are first order necessary optimality conditions for (6.5.1).
We require the following assumptions.

Condition 6.5.1. The approximations $(Z_h, R_h)$ of the space $Z$ are uniformly bounded; i.e., there exists a constant $c_R > 0$ independent of $h$ satisfying
\[ \|R_h\|_{\mathcal{L}(Z, Z_h)} \le c_R. \]

Condition 6.5.2. For every $h$ problem (6.5.1) has a local solution $x_h^*$. The mappings $f_h$ and $e_h$ are twice continuously Fréchet differentiable in neighborhoods $\tilde V_h^*$ of $x_h^*$, and the operators $e_h'$ are Lipschitz continuous on $\tilde V_h^*$ with a uniform Lipschitz constant $\xi_e > 0$ independent of $h$.

Condition 6.5.3. For every $h$ there exists a nonempty open and convex set $\hat V_h^* \subseteq \tilde V_h^*$ such that a uniform Babuška–Brezzi condition is satisfied on $\hat V_h^*$; i.e., there exists a constant $\beta > 0$ independent of $h$ satisfying
\[ \inf_{w_h \in W_h}\ \sup_{q_h \in X_h}\ \frac{\langle e_h'(x_h) q_h, w_h \rangle_W}{\|q_h\|_X\, \|w_h\|_W} \ge \beta \quad \text{for all } x_h \in \hat V_h^*. \]

The Babuška–Brezzi condition implies that the operators $e_h'(x_h)$ are surjective on $\hat V_h^*$. Hence, if Condition 6.5.3 holds, there exists for every $h > 0$ a Lagrange multiplier $\lambda_h^* \in W_h$ such that $(x_h^*, \lambda_h^*)$ solves (6.5.2).

Condition 6.5.4. There exist a scalar $r > 0$ independent of $h$ and a neighborhood $V(\lambda_h^*)$ of $\lambda_h^*$ for every $h$ such that for $V_h^* = \hat V_h^* \times V(\lambda_h^*)$
(i) $B((x_h^*, \lambda_h^*); r) \subseteq V_h^*$ and
(ii) a uniform second order sufficient optimality condition is satisfied on $V_h^*$; i.e., there exists a constant $\bar\kappa > 0$ such that for all $(x_h, \lambda_h) \in V_h^*$
\[ L_h''(x_h, \lambda_h)(q_h)^2 \ge \bar\kappa\, \|q_h\|_X^2 \quad \text{for all } q_h \in \ker e_h'(x_h). \]

We define
\[ F(x, \lambda) = \begin{pmatrix} L'(x, \lambda) \\ e(x) \end{pmatrix}. \]
For every $h$ and for all $(x_h, \lambda_h) \in V_h^*$ we introduce approximations of $F$ and $M$ by
\[ F_h(x_h, \lambda_h) = \begin{pmatrix} L_h'(x_h, \lambda_h) \\ e_h(x_h) \end{pmatrix}, \qquad M_h(x_h, \lambda_h) = \begin{pmatrix} L_h''(x_h, \lambda_h) & e_h'(x_h)^* \\ e_h'(x_h) & 0 \end{pmatrix}. \]
By Conditions 6.5.3 and 6.5.4 there exists a bound $\eta$ which may depend on $\beta$ and $\bar\kappa$ but is independent of $h$ so that
\[ \|M_h^{-1}(x_h, \lambda_h)\|_{\mathcal{L}(Z_h)} \le \eta \tag{6.5.3} \]
for all $(x_h, \lambda_h) \in V_h^*$. We require the following consistency properties of $F_h$ and $M_h$, where $V(x^*, \lambda^*) = V(x^*) \times V(\lambda^*)$ denotes the neighborhood of the solution $x^*$ of (6.2.1) and the associated Lagrange multiplier $\lambda^*$ defined above (6.2.8).

Condition 6.5.5. For each $h$ we have $R_h(V(x^*, \lambda^*)) \subseteq V_h^*$, and the approximations of the operators $F$ and $M$ are consistent on $V(x^*)$ and $V(x^*, \lambda^*)$, respectively, i.e.,
\[ \lim_{h \to 0} \|F_h(R_h(x, \lambda)) - R_h F(x, \lambda)\|_Z = 0 \]
and
\[ \lim_{h \to 0} \left\| M_h(R_h(x, \lambda))\, R_h \begin{pmatrix} q \\ w \end{pmatrix} - R_h M(x, \lambda) \begin{pmatrix} q \\ w \end{pmatrix} \right\|_Z = 0 \]
for all $(q, w) \in Z$ and $(x, \lambda) \in V(x^*, \lambda^*)$. Moreover, for every $h$ let the operator $M_h$ be Lipschitz continuous on $V_h^*$ with a uniform Lipschitz constant $\xi_M > 0$ independent of $h$.

Now we consider the following approximation of Algorithm 6.3.

Algorithm 6.7.
(i) Choose $(x_h^0, \lambda_h^0) \in V_h^*$, $c \ge 0$ and put $n = 0$.
(ii) Set $\tilde\lambda_h^n = \lambda_h^n + c\, e_h(x_h^n)$.
(iii) Solve for $(\hat x_h, \hat\lambda_h)$:
\[ M_h(x_h^n, \tilde\lambda_h^n) \begin{pmatrix} \hat x_h - x_h^n \\ \hat\lambda_h - \tilde\lambda_h^n \end{pmatrix} = -\begin{pmatrix} L_h'(x_h^n, \tilde\lambda_h^n) \\ e_h(x_h^n) \end{pmatrix}. \]
(iv) Set $(x_h^{n+1}, \lambda_h^{n+1}) = (\hat x_h, \hat\lambda_h)$, $n = n + 1$, and goto (ii).

Theorem 6.21. Let $c\, \|(x_h^0, \lambda_h^0) - (x_h^*, \lambda_h^*)\|_Z$ be sufficiently small for all $h$ and let (6.2.3), (6.2.5), and Conditions 6.5.2–6.5.4 hold. Then we have the following:
(a) Algorithm 6.7 is well defined and
\[ \|(x_h^{n+1}, \lambda_h^{n+1}) - (x_h^*, \lambda_h^*)\|_Z \le C\, \|(x_h^n, \lambda_h^n) - (x_h^*, \lambda_h^*)\|_Z^2, \]
where $C = \frac12\, \eta\, \xi_M \max(2, 1 + 2c^2 \xi_e^2)$ is independent of $h$.
(b) Let $(x^0, \lambda^0)$ be a startup value of Algorithm 6.3 such that $c\, \|(x^0, \lambda^0) - (x^*, \lambda^*)\|_Z$ is sufficiently small and assume that
\[ \lim_{h \to 0} \|(x_h^0, \lambda_h^0) - R_h(x^0, \lambda^0)\|_Z = 0. \tag{6.5.4} \]
If in addition Conditions 6.5.1 and 6.5.5 are satisfied, we obtain
\[ \lim_{h \to 0} \|(x_h^n, \lambda_h^n) - R_h(x^n, \lambda^n)\|_Z = 0 \]
for all $n$, where $(x_h^n, \lambda_h^n)$ and $(x^n, \lambda^n)$ are the $n$th iterates of the finite- and infinite-dimensional methods, respectively.

Corollary 6.22. Under the hypotheses of Theorem 6.21
\[
\lim_{h \to 0} \|(x_h^*, \lambda_h^*) - R_h(x^*, \lambda^*)\|_Z = 0
\]
holds.

Corollary 6.23. Let the hypotheses of Theorem 6.21 hold and let $\{(Z_{h(n)}, R_{h(n)})\}$ be a sequence of approximations with $\lim_{n \to \infty} h(n) = 0$. Then we have
\[
\lim_{n \to \infty} \|(x_{h(n)}^n, \lambda_{h(n)}^n) - R_{h(n)}(x^*, \lambda^*)\|_Z = 0.
\]
We turn to a brief account of mesh-independence. This is an important feature of iterative approximation schemes for infinite-dimensional problems. It asserts that the number of iterations to reach a specified approximation quality $\varepsilon > 0$ is independent of the mesh size. We require the following notation:
\[
n(\varepsilon) = \min\{ n_0 \mid \text{for } n \ge n_0 : \|F(x^n, \lambda^n + c\, e(x^n))\|_Z < \varepsilon \},
\]
\[
n_h(\varepsilon) = \min\{ n_0 \mid \text{for } n \ge n_0 : \|F_h(x_h^n, \lambda_h^n + c\, e_h(x_h^n))\|_Z < \varepsilon \}.
\]
We point out that both $n(\varepsilon)$ and $n_h(\varepsilon)$ depend on the startup values $(x^0, \lambda^0)$ and $(x_h^0, \lambda_h^0)$ of the infinite- and finite-dimensional methods.

Theorem 6.24. Under the assumptions of Theorem 6.21 there exists for each $\varepsilon > 0$ a constant $h_\varepsilon > 0$ such that
\[
n(\varepsilon) - 1 \le n_h(\varepsilon) \le n(\varepsilon) \quad \text{for } h \in (0, h_\varepsilon].
\]
The proofs of the results of this section as well as numerical examples which demonstrate that mesh-independence is also observed numerically can be found in [KVo1, Vol].
6.6
Comments
Second order augmented Lagrangian methods as discussed in Section 6.2 were treated in [Be] for finite-dimensional and in [IK9, IK10] for infinite-dimensional problems. The history of SQP methods is a long one; see [Han, Pow1] and [Sto, StTa], for example, and the references therein for the analysis of finite-dimensional problems. Infinite-dimensional problems are considered in [Alt5], for example. Mesh-independence of augmented Lagrangian-SQP methods was analyzed in [Vol]. This paper also contains many references on mesh-independence for SQP and Newton methods. Methods which replace equality- and inequality-constrained optimization problems by a linear quadratic approximation with respect to the equality constraints and the cost functional, while leaving the
inequality constraints as explicit constraints, are frequently referred to as Lagrange–Newton methods in the literature. We quote [Alt4, AlMa, Don, Tro] and the references therein. Reduced SQP methods in infinite-dimensional spaces were analyzed in [JaSa, KSa, Kup] among others. For optimization problems with equality and simple inequality constraints the SQP method can advantageously be combined with projection methods; see [Hei], for example. We have not considered the topic of approximating the Hessian by secant updates. In part this is due to the fact that in the context of optimal control of partial differential equations the structure of the nonlinearities is such that the continuous problem may allow a rather straightforward characterization of the gradient and Hessian, provided, of course, that it is sufficiently regular. Secant methods for SQP methods are considered in, e.g., [Sto, KuSa, Kup, StTa].
Chapter 7
The Primal-Dual Active Set Method
This chapter is devoted to the primal-dual active set strategy for variational problems with simple constraints. This is an efficient method for solving the optimality systems arising from quadratic programming problems with unilateral or bilateral affine constraints and it is equally well applicable to certain complementarity problems. The algorithm and some of its basic properties are described in Section 7.1. In the ensuing sections sufficient conditions for its convergence with arbitrary initialization and without globalization are presented for a variety of different classes of problems. Sections 7.2 and 7.3 are devoted to the finite-dimensional case where the system matrix is an M-matrix or has a cone-preserving property, respectively. Operators which are of diagonally dominant type are considered in Sections 7.4 and 7.5 for unilateral, respectively, bilateral problems. In Section 7.6 nonlinear optimal control problems with control constraints are investigated.
7.1
Introduction and basic properties
Here we investigate the primal-dual active set method. Let us consider the quadratic programming problem
\[
\begin{cases}
\displaystyle \min_{x \in X}\ \tfrac{1}{2} \langle Ax, x \rangle_X - \langle a, x \rangle_X \\[4pt]
\text{subject to } Ex = b, \quad Gx \le \psi,
\end{cases}
\tag{7.1.1}
\]
where $A \in L(X)$ is a self-adjoint operator in the real Hilbert space $X$, $E \in L(X, W)$, $G \in L(X, Z)$, with $W$ a real Hilbert space, and $Z = \mathbb{R}^n$ or $Z = L^2(\Omega)$, with $\Omega$ a domain in $\mathbb{R}^d$, endowed with the usual Hilbert space structure and the natural ordering. We assume that (7.1.1) admits a unique solution denoted by $x$. If $x$ is a regular point in the sense of Definition 1.5 (with $C = X$ and $g(x) = (Ex - b, Gx - \psi)$), then there exists a Lagrange
multiplier $(\lambda, \mu) \in W \times Z$ such that
\[
\begin{aligned}
& Ax + E^* \lambda + G^* \mu = a, \\
& Ex = b, \\
& \mu = \max(0, \mu + c\,(Gx - \psi)),
\end{aligned}
\tag{7.1.2}
\]
where $c > 0$ is a fixed constant and max is interpreted pointwise a.e. in $\Omega$ if $Z = L^2(\Omega)$ and coordinatewise if $Z = \mathbb{R}^n$. The third equation in (7.1.2) constitutes the complementarity condition associated with the inequality constraint in (7.1.1), as discussed in Examples 4.51 and 4.53. Note that the auxiliary problems in the augmented Lagrangian-SQP methods of Chapter 6 (see for instance (ii) of Algorithm 6.4 or (ii) of Algorithm 6.6) take the form of (7.1.1). The primal-dual active set method that will be analyzed in this chapter is an efficient technique for solving (7.1.2). While (7.1.2) is derived from (7.1.1) with $A$ self-adjoint, this assumption is not essential in the remainder of this chapter, and we therefore drop it unless it is explicitly specified. We next give two examples of constrained optimal control problems which are special cases of (7.1.1).

Example 7.1. Let $X = L^2(\hat\Omega)$, where $\hat\Omega \subset \Omega$ is the control domain, and consider the optimal control problem
\[
\begin{cases}
\displaystyle \min_{u \in X}\ \tfrac{1}{2} \int_\Omega |y - y_d|^2\, dx + \tfrac{\alpha}{2} |u|_X^2 \\[4pt]
\text{subject to } -\Delta y = Bu, \quad y = 0 \text{ on } \partial\Omega, \text{ and } u \le \psi,
\end{cases}
\]
where $\alpha > 0$, $y_d \in L^2(\Omega)$, $\psi \in L^2(\hat\Omega)$, and $B \in L(X, L^2(\Omega))$ is the extension-by-zero operator from $\hat\Omega$ to $\Omega$. This problem can be formulated as (7.1.1) without equality constraint ($E = 0$) by setting $A = \alpha I + B^* (-\Delta)^{-2} B$,
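$a = B^* (-\Delta)^{-1} y_d$, and $G = I$, where $\Delta$ denotes the Laplacian with homogeneous Dirichlet boundary conditions.

Example 7.2. Here we consider optimal control of the heat equation with Neumann boundary control and set $X = L^2(0, T; L^2(\hat\Gamma))$, where $\hat\Gamma$ is a subset of the boundary $\partial\Omega$ of $\Omega$:
\[
\begin{cases}
\displaystyle \min_{u \in X}\ \tfrac{1}{2} \int_0^T \!\! \int_\Omega |y - y_d|^2\, dx\, dt + \tfrac{\alpha}{2} |u|_U^2 \\[4pt]
\text{subject to} \\[2pt]
\quad \dfrac{d}{dt} y = \Delta y, \quad y(0, \cdot) = y_0, \\[4pt]
\quad \nu \cdot \nabla y(t) = Bu(t) \text{ on } \hat\Gamma, \quad y = 0 \text{ on } \partial\Omega \setminus \hat\Gamma, \\[2pt]
\quad u(t) \le \psi,
\end{cases}
\]
where $\alpha > 0$, $y_d \in L^2(0, T; L^2(\Omega))$, $y_0 \in L^2(\Omega)$, $\nu$ is the outer normal to $\hat\Gamma$, and $B$ is the extension-by-zero operator from $\hat\Gamma$ to $\partial\Omega$. For $u \in X$ there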
exists a unique solution $y = y(u) \in Y = L^2(0, T; H^1(\Omega)) \cap H^1(0, T; L^2(\Omega))$ to the initial boundary value problem arising as the equality constraints. Let $T \in L(U, Y)$ denote the solution operator given by $y = T(u)$. Setting
\[
A = \alpha I + T^* T, \quad a = T^* y_d, \quad \text{and} \quad G = I,
\]
this optimal control problem is equivalent to (7.1.1).

In these examples $A$ is bounded on $X$. More specifically it is a compact perturbation of a multiple of the identity. There are important problems which are not covered by (7.1.1). For the obstacle problem, considered in Section 4.7.4, the choice $X = L^2(\Omega)$ results in the unbounded operator $A = -\Delta$, and hence it is not covered by (7.1.1). If $X$ is chosen as $H_0^1(\Omega)$ and $G$ as the inclusion of $H_0^1(\Omega)$ into $L^2(\Omega)$, then the solution does not satisfy the regular point condition. While special techniques allow one to guarantee the existence of a Lagrange multiplier $\mu$ associated with the constraint $y \le \psi$ in $L^2(\Omega)$, the natural space for the Lagrange multiplier is $H^{-1}(\Omega)$. This suggests combining the primal-dual active set method with a regularization method, which will be discussed in a more general context in Section 8.6. Discretization of the obstacle problem by central finite differences leads to a system matrix $A$ on $X = \mathbb{R}^n$ that is symmetric positive definite and is an M-matrix. This important class of finite-dimensional variational problems will be discussed in Section 7.2.

Another important class of problems are state-constrained optimal control problems, for example,
\[
\min\ \tfrac{1}{2} \int_\Omega |y - \bar y|^2\, dx + \tfrac{\alpha}{2} |u|_{L^2(\Omega)}^2
\quad \text{subject to } -\Delta y = u, \quad y = 0 \text{ on } \partial\Omega, \quad y \le \psi
\]
for $y \in H_0^1(\Omega)$ and $u \in L^2(\Omega)$. This problem can be formulated as (7.1.1) with $X = (H^2(\Omega) \cap H_0^1(\Omega)) \times L^2(\Omega)$ and $G$ the inclusion of $X$ into $L^2(\Omega)$. The solution, however, will not satisfy a regular point condition, and the Lagrange multiplier associated with the state constraint $y \le \psi$ is only a measure in general. This class of problems will be discussed in Section 8.6.

Let us return to (7.1.2). If the active set $\mathcal{A} = \{\mu + c(Gx - \psi) > 0\}$ is known, then the linear system reduces to
\[
Ax + E^* \lambda + G^* \mu = a, \quad Ex = b, \quad Gx = \psi \text{ in } \mathcal{A} \quad \text{and} \quad \mu = 0 \text{ in } \mathcal{A}^c.
\]
Here and below we frequently use the notation $\{f > 0\}$ to stand for $\{x : f(x) > 0\}$ if $f \in L^2(\Omega)$ and $\{i : f_i > 0\}$ if $f \in \mathbb{R}^n$. The active set at the solution, however, is unknown.
The primal-dual active set method uses the complementarity condition $\mu = \max(0, \mu + c\,(Gx - \psi))$ as a prediction strategy. Based on the current primal-dual pair $(x, \mu)$ the updates for the active and inactive sets are determined by
\[
\mathcal{I} = \{\mu + c(Gx - \psi) \le 0\} \quad \text{and} \quad \mathcal{A} = \{\mu + c(Gx - \psi) > 0\}.
\]
This leads to the following Newton-like method.

Primal-Dual Active Set Method.
(i) Initialize $x^0$, $\mu^0$. Set $k = 0$.
(ii) Set $\mathcal{I}_k = \{\mu^k + c(Gx^k - \psi) \le 0\}$, $\mathcal{A}_k = \{\mu^k + c(Gx^k - \psi) > 0\}$.
(iii) Solve for $(x^{k+1}, \lambda^{k+1}, \mu^{k+1})$:
\[
A x^{k+1} + E^* \lambda^{k+1} + G^* \mu^{k+1} = a, \quad E x^{k+1} = b, \quad G x^{k+1} = \psi \text{ in } \mathcal{A}_k \quad \text{and} \quad \mu^{k+1} = 0 \text{ in } \mathcal{I}_k.
\]
(iv) Stop, or set $k = k + 1$, and return to (ii).

It will be shown in Section 8.4 that the above algorithm can be interpreted as a semismooth Newton method for solving (7.1.2). This will allow the local convergence analysis of the algorithm. In this chapter we concentrate on its global convergence, i.e., convergence for arbitrary initializations and without the necessity for a line search. In case $A$ is positive definite, a sufficient condition for the existence of solutions to the auxiliary systems of step (iii) of the algorithm is given by surjectivity of $E$ and surjectivity of $G : N(E) \to Z$. In fact in this case (iii) is the necessary optimality condition for
\[
\begin{cases}
\displaystyle \min_{x \in X}\ \tfrac{1}{2} \langle Ax, x \rangle_X - \langle a, x \rangle_X \\[4pt]
\text{subject to } Ex = b, \quad Gx = \psi \text{ on } \mathcal{A}_k.
\end{cases}
\]
In the remainder of this chapter we focus on the reduced form of (7.1.2) given by
\[
Ax + \mu = a, \quad \mu = \max(0, \mu + c\,(x - \psi)), \tag{7.1.3}
\]
where $A \in L(Z)$. We now derive sufficient conditions which allow us to transform (7.1.2) into (7.1.3). In a first step we assume that
\[
G \text{ is surjective}, \quad \operatorname{range}(G^*) \subset \ker E, \quad E \bar x = b \text{ for some } \bar x \in (\ker E)^\perp. \tag{7.1.4}
\]
Note that (7.1.4) implies that $G : N(E) \to Z$ is surjective. If not, then there exists a nonzero $z \in Z$ such that $(z, Gx)_Z = (G^* z, x)_X = 0$ for all $x \in \ker E$. If we let $x = G^* z$, which lies in $\ker E$ by (7.1.4), then $|x|^2 = 0$ and hence $z = 0$, since $G^*$ is injective, which is a contradiction. Let $P_E$ denote the orthogonal projection in $X$ onto $\ker E$. Then (7.1.2) is equivalent to
\[
\begin{aligned}
& \hat A \hat x + G^* \mu = P_E (a - A \bar x), \qquad \mu = \max(0, \mu + c\,(G \hat x - (\psi - G \bar x))), \\
& E^* \lambda = (I - P_E)(a - A(P_E \hat x + \bar x)),
\end{aligned}
\]
with $\hat A = P_E A P_E$ and $x = \hat x + \bar x \in \ker E + (\ker E)^\perp$. The first of the above equations is equivalent to the system
\[
\begin{aligned}
& (I - P_G) \hat A ((I - P_G) \hat x + P_G \hat x) + G^* \mu = (I - P_G) P_E (a - A \bar x), \\
& P_G \hat A ((I - P_G) \hat x + P_G \hat x) = P_G P_E (a - A \bar x),
\end{aligned}
\tag{7.1.5}
\]
where $P_G = I - G^* (G G^*)^{-1} G$ is the orthogonal projection in $\ker E \subset X$ onto $\ker G$. Since $G^*$ is injective the first equation in (7.1.5) is equivalent to
\[
(G G^*)^{-1} G \hat A (G^* (G G^*)^{-1} y + x_2) + \mu = (G G^*)^{-1} G P_E (a - A \bar x),
\]
where $y = G x_1$ for $x_1 \in \ker E \cap (\ker G)^\perp$, $x_2 \in \ker G$, and $\hat x = x_1 + x_2$. Let
\[
A_{11} = (G G^*)^{-1} G A G^* (G G^*)^{-1}, \quad A_{12} = (G G^*)^{-1} G A P_G, \quad A_{22} = P_G A P_G
\]
and
\[
a_1 = (G G^*)^{-1} G P_E (a - A \bar x), \quad a_2 = P_G P_E (a - A \bar x).
\]
Then (7.1.5) is equivalent to the following equation in $Z \times (\ker E \cap \ker G)$:
\[
\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \begin{pmatrix} y \\ x_2 \end{pmatrix} + \begin{pmatrix} \mu \\ 0 \end{pmatrix} = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}. \tag{7.1.6}
\]
Equation (7.1.6) together with
\[
\mu = \max(0, \mu + c\,(y - (\psi - G \bar x))) \tag{7.1.7}
\]
and $E^* \lambda = (I - P_E)(a - A(\hat x + \bar x))$ are equivalent to (7.1.2), where $x = \hat x + \bar x$, with $\hat x = x_1 + x_2 \in \ker E$, $x_2 \in \ker E \cap \ker G$, $x_1 \in \ker E \cap (\ker G)^\perp$, $y = G x_1$. Note that the system matrix in (7.1.6) is positive definite if $A$ restricted to $\ker E$ is positive definite. Let us now further assume that
\[
A_{22} \text{ is nonsingular.} \tag{7.1.8}
\]
Then (7.1.6), (7.1.7) are equivalent to
\[
(A_{11} - A_{12} A_{22}^{-1} A_{12}^*)\, y + \mu = a_1 - A_{12} A_{22}^{-1} a_2
\]
and (7.1.7), which is of the desired form (7.1.3).

In the finite-dimensional case, (7.1.3) admits a unique solution for every $a \in \mathbb{R}^n$ if and only if $A$ is a P-matrix; see [BePl, Theorem 10.2.15]. Recall that $A$ is called a P-matrix if all its principal minors are positive. In view of the fact that the reduction of (7.1.6) to (7.1.3) was achieved by taking the Schur complement with respect to $A_{22}$ it is also worthwhile to recall that the Schur complement of a P-matrix (resp., M-matrix) is again a P-matrix (resp., M-matrix); see [BePl, p. 292]. For further reference it will be convenient to specify the primal-dual active set algorithm for the reduced system (7.1.3).

Primal-Dual Active Set Method for Reduced System.
(i) Initialize $x^0$, $\mu^0$. Set $k = 0$.
(ii) Set $\mathcal{I}_k = \{\mu^k + c(x^k - \psi) \le 0\}$, $\mathcal{A}_k = \{\mu^k + c(x^k - \psi) > 0\}$.
(iii) Solve for $(x^{k+1}, \mu^{k+1})$:
\[
A x^{k+1} + \mu^{k+1} = a, \quad x^{k+1} = \psi \text{ in } \mathcal{A}_k \quad \text{and} \quad \mu^{k+1} = 0 \text{ in } \mathcal{I}_k.
\]
(iv) Stop, or set $k = k + 1$, and return to (ii).

We use (7.1.3) rather than (7.1.6) for the convergence analysis in order to avoid additional notation. All convergence results that follow apply equally well to (7.1.6), where it is understood that the coordinates corresponding to the variable $x_2$ are treated as inactive throughout the algorithm and the corresponding Lagrange multipliers are set and updated by 0. In the following subsections convergence will be proved under various different conditions. These conditions will also imply the existence of a solution to the subsystems in step (iii) of the algorithm as well as the existence of a unique solution to (7.1.3).

For $\tilde\Omega \subset \{1, \ldots, n\}$, respectively, $\tilde\Omega \subset \Omega$, let $R_{\tilde\Omega}$ denote the restriction operator to $\tilde\Omega$. Then $R_{\tilde\Omega}^*$ is the extension-by-zero operator to $\mathbb{R}^n$, respectively, $\Omega$. For any $\mathcal{A}$ let $\mathcal{I} = \mathcal{A}^c$ be its complement, and denote
\[
A_{\mathcal{A}} = R_{\mathcal{A}} A R_{\mathcal{A}}^*, \quad A_{\mathcal{A}, \mathcal{I}} = R_{\mathcal{A}} A R_{\mathcal{I}}^*,
\]
and analogously for $A_{\mathcal{I}}$ and $A_{\mathcal{I}, \mathcal{A}}$, and
\[
\delta x_{\mathcal{A}} = R_{\mathcal{A}} (x^{k+1} - x^k), \quad \delta x_{\mathcal{I}} = R_{\mathcal{I}} (x^{k+1} - x^k),
\]
and analogously for $\delta\mu_{\mathcal{A}}$ and $\delta\mu_{\mathcal{I}}$. From (iii) above we have
\[
\begin{aligned}
& A_{\mathcal{A}_k} \delta x_{\mathcal{A}_k} + A_{\mathcal{A}_k, \mathcal{I}_k} \delta x_{\mathcal{I}_k} + \delta\mu_{\mathcal{A}_k} = 0, \\
& A_{\mathcal{I}_k} \delta x_{\mathcal{I}_k} + A_{\mathcal{I}_k, \mathcal{A}_k} \delta x_{\mathcal{A}_k} - \mu^k_{\mathcal{I}_k} = 0.
\end{aligned}
\tag{7.1.9}
\]
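The reduced algorithm can be sketched in a few lines for the finite-dimensional case $Z = \mathbb{R}^n$. The following is a minimal illustration only, not the authors' implementation: the function names `solve` and `pdas`, the zero initialization, and the choice to stop as soon as the active set repeats are assumptions made for this sketch. In each pass it fixes $x = \psi$ on the active set, solves the remaining linear system on the inactive set, and recovers $\mu$ from $Ax + \mu = a$.

```python
def solve(M, r):
    """Solve M z = r by Gaussian elimination with partial pivoting."""
    n = len(r)
    aug = [list(M[i]) + [r[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, n):
            f = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= f * aug[col][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def pdas(A, a, psi, c=1.0, max_iter=50):
    """Primal-dual active set sketch for A x + mu = a, mu = max(0, mu + c (x - psi)).

    Hypothetical helper: starts from x = 0, mu = 0 and returns (x, mu, k),
    where k is the pass at which the active set stopped changing.
    """
    n = len(a)
    x, mu = [0.0] * n, [0.0] * n
    prev_active = None
    for k in range(max_iter):
        # Step (ii): predict the active set from the current primal-dual pair.
        active = [i for i in range(n) if mu[i] + c * (x[i] - psi[i]) > 0]
        if active == prev_active:
            return x, mu, k  # unchanged active set: current pair solves the system
        inactive = [i for i in range(n) if i not in active]
        # Step (iii): x = psi on the active set, mu = 0 on the inactive set.
        x = [psi[i] if i in active else 0.0 for i in range(n)]
        if inactive:
            rhs = [a[i] - sum(A[i][j] * psi[j] for j in active) for i in inactive]
            A_II = [[A[i][j] for j in inactive] for i in inactive]
            for i, v in zip(inactive, solve(A_II, rhs)):
                x[i] = v
        mu = [a[i] - sum(A[i][j] * x[j] for j in range(n)) if i in active else 0.0
              for i in range(n)]
        prev_active = active
    raise RuntimeError("active set did not settle within max_iter passes")
```

For instance, `pdas([[2, -1, 0], [-1, 2, -1], [0, -1, 2]], [0.5, 2.0, 0.5], [1.0, 1.0, 1.0])` activates only the middle coordinate at termination. The dense solver is chosen for brevity; a practical code would exploit the structure of $A$.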
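The following properties for $k = 1, 2, \ldots$ follow from steps (ii) and (iii):
\[
\begin{aligned}
& \mu^k (x^k - \psi) = 0, \quad \mu^k + (x^k - \psi) > 0 \text{ on } \mathcal{A}_k, \\
& x^k - \psi \ge 0,\ \mu^k \ge 0 \text{ on } \mathcal{A}_k, \quad x^k \le \psi,\ \mu^k \le 0 \text{ on } \mathcal{I}_k, \\
& \delta x_{\mathcal{A}_k} \le 0, \quad \delta\mu_{\mathcal{I}_k} \ge 0,
\end{aligned}
\tag{7.1.10}
\]
all statements holding coordinatewise if $Z = \mathbb{R}^n$ and pointwise a.e. for $Z = L^2(\Omega)$.

Remark 7.1.1. From (7.1.10) it follows that $\mathcal{A}_k = \mathcal{A}_{k+1}$ implies that the solution is found, i.e., $(x^k, \mu^k) = (x^*, \mu^*)$. In numerical practice it was observed that $\mathcal{A}_k = \mathcal{A}_{k+1}$ can be used as a stopping criterion; see [HiIK, IK20, IK22].

Remark 7.1.2. The primal-dual active set strategy can be interpreted as a prediction strategy which, on the basis of $(x^k, \mu^k)$, predicts the true active and inactive sets for (7.1.3), i.e., the sets
\[
\mathcal{A}^* = \{\mu^* + c(x^* - \psi) > 0\} \quad \text{and} \quad \mathcal{I}^* = (\mathcal{A}^*)^c.
\]
To further pursue this point we assume that the systems in (7.1.3) and step (iii) of the algorithm admit solutions, and we define the following partitioning of the index set at iteration level $k$:
\[
\mathcal{I}_G = \mathcal{I}_k \cap \mathcal{I}^*, \quad \mathcal{I}_B = \mathcal{I}_k \cap \mathcal{A}^*, \quad \mathcal{A}_G = \mathcal{A}_k \cap \mathcal{A}^*, \quad \mathcal{A}_B = \mathcal{A}_k \cap \mathcal{I}^*.
\]
The sets $\mathcal{I}_G$, $\mathcal{A}_G$ give a good prediction and the sets $\mathcal{I}_B$, $\mathcal{A}_B$ give a bad prediction. Let us denote $\Delta x = x^{k+1} - x^*$, $\Delta\mu = \mu^{k+1} - \mu^*$, and we denote by $G(x^k, \mu^k)$ the system matrix for step (iii) of the algorithm:
\[
G(x^k, \mu^k) = \begin{pmatrix}
A_{\mathcal{I}_k} & A_{\mathcal{I}_k \mathcal{A}_k} & I_{\mathcal{I}_k} & 0 \\
A_{\mathcal{A}_k \mathcal{I}_k} & A_{\mathcal{A}_k} & 0 & I_{\mathcal{A}_k} \\
0 & 0 & I_{\mathcal{I}_k} & 0 \\
0 & -c\, I_{\mathcal{A}_k} & 0 & 0
\end{pmatrix}.
\]
Then we have the identity
\[
G(x^k, \mu^k) \begin{pmatrix} \Delta x_{\mathcal{I}_k} \\ \Delta x_{\mathcal{A}_k} \\ \Delta\mu_{\mathcal{I}_k} \\ \Delta\mu_{\mathcal{A}_k} \end{pmatrix}
= -\operatorname{col}\bigl( 0_{\mathcal{I}_k},\ 0_{\mathcal{A}_k},\ 0_{\mathcal{I}_G},\ \mu^*_{\mathcal{I}_B},\ 0_{\mathcal{A}_G},\ c(\psi - x^*)_{\mathcal{A}_B} \bigr). \tag{7.1.11}
\]
Here we assumed that the components of the equation $\mu - \max\{0, \mu + c(x - \psi)\} = 0$ are ordered as $(\mathcal{I}_G, \mathcal{I}_B, \mathcal{A}_G, \mathcal{A}_B)$. Since $x^k \ge \psi$ on $\mathcal{A}_k$ and $\mu^k \le 0$ on $\mathcal{I}_k$, we have
\[
|\psi - x^*|_{\mathcal{A}_B} \le |x^k - x^*|_{\mathcal{A}_B} \quad \text{and} \quad |\mu^*|_{\mathcal{I}_B} \le |\mu^k - \mu^*|_{\mathcal{I}_B}. \tag{7.1.12}
\]
By the definition
\[
\Delta x_{\mathcal{A}_G} = 0, \quad \Delta x_{\mathcal{A}_B} = (\psi - x^*)_{\mathcal{A}_B}, \quad \Delta\mu_{\mathcal{I}_G} = 0, \quad \Delta\mu_{\mathcal{I}_B} = -\mu^*_{\mathcal{I}_B}. \tag{7.1.13}
\]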
On the basis of (7.1.11)–(7.1.13) we can draw the following conclusions.
(i) If $x^k \to x^*$, then, in the finite-dimensional case, there exists an index $\bar k$ such that $\mathcal{I}_B = \mathcal{A}_B = \emptyset$ for all $k \ge \bar k$. Consequently the convergence occurs in finitely many steps.
(ii) By (7.1.11)–(7.1.12) there exists a constant $\kappa \ge 1$ independent of $k$ such that
\[
|\Delta x| + |\Delta\mu| \le \kappa \bigl( |(x^k - x^*)_{\mathcal{A}_B}| + |(\mu^k - \mu^*)_{\mathcal{I}_B}| \bigr).
\]
Thus if the incorrectly predicted sets are small in the sense that
\[
|(x^k - x^*)_{\mathcal{A}_B}| + |(\mu^k - \mu^*)_{\mathcal{I}_B}| \le \frac{1}{2\kappa - 1} \bigl( |(x^k - x^*)_{\mathcal{A}_{B,c}}| + |(\mu^k - \mu^*)_{\mathcal{I}_{B,c}}| \bigr),
\]
where $\mathcal{A}_{B,c}$, $\mathcal{I}_{B,c}$ denote the complements of the index sets $\mathcal{A}_B$, $\mathcal{I}_B$, respectively, then
\[
|x^{k+1} - x^*| + |\mu^{k+1} - \mu^*| \le \frac{1}{2} \bigl( |x^k - x^*| + |\mu^k - \mu^*| \bigr),
\]
and convergence follows.
(iii) If $x^* < \psi$ and $\mu^0 + c\,(x^0 - \psi) \le 0$ (e.g., $x^0 = \psi$, $\mu^0 = 0$), then the algorithm converges in one step. In fact, in this case $\mathcal{A}_B = \mathcal{I}_B = \emptyset$.
7.2
Monotone class
In this section we assume that $Z = \mathbb{R}^n$ and that $A$ is an M-matrix, i.e., it is nonsingular, its off-diagonal elements are nonpositive, and $A^{-1} \ge 0$. Then there exists a unique solution $(x^*, \mu^*)$ to (7.1.3).

Example 7.3. Consider the discretized obstacle problem on the square $(0, 1) \times (0, 1)$. Let $h = \frac{1}{N}$ and let $u_{i,j}$ denote the approximation of the solution at the nodes $(ih, jh)$, $0 \le i, j \le N$. Using the central difference approximation results in a finite-dimensional variational inequality satisfying
\[
\frac{4 u_{i,j} - u_{i+1,j} - u_{i-1,j} - u_{i,j+1} - u_{i,j-1}}{h^2} + \mu_{i,j} = f_{i,j}, \qquad
\mu_{i,j} = \max(0, \mu_{i,j} + c\,(u_{i,j} - \psi_{i,j}))
\]
for $1 \le i, j \le N - 1$, and $u_{0,j} = u_{N,j} = u_{i,0} = u_{i,N} = 0$. The resulting matrix $A$ acting on $\mathbb{R}^{(N-1)^2}$ is an M-matrix.

The following theorem asserts the convergence of the iterates of the primal-dual active set method for (7.1.3).

Theorem 7.4. Assume that $A$ is an M-matrix. Then $x^k \to x^*$ for arbitrary initial data. Moreover $x^* \le x^{k+1} \le x^k$ for all $k \ge 1$, $x^k \le \psi$ for all $k \ge 2$, and there exists $k_0$ such that $\mu^k \ge 0$ for all $k \ge k_0$.

Proof. Since $A$ is an M-matrix we have $A_{\mathcal{I}}^{-1} \ge 0$ and $A_{\mathcal{I}}^{-1} A_{\mathcal{I}, \mathcal{A}} \le 0$ for every index partition of $\{1, \ldots, n\}$ into $\mathcal{I}$ and $\mathcal{A}$. Since $\delta x_{\mathcal{I}_k} = -A_{\mathcal{I}_k}^{-1} A_{\mathcal{I}_k \mathcal{A}_k} \delta x_{\mathcal{A}_k} - A_{\mathcal{I}_k}^{-1} \delta\mu_{\mathcal{I}_k}$ by (7.1.9)
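it follows that $\delta x_{\mathcal{I}_k} \le 0$. Together with $\delta x_{\mathcal{A}_k} \le 0$, which follows from the third equation in (7.1.10), this implies that $x^{k+1} \le x^k$ for $k \ge 1$. Next we show that $x^k$ is feasible for $k \ge 2$. Due to monotonicity of $x^k$ with respect to $k$ it suffices to show this for $k = 2$. For $i$ such that $(x^1 - \psi)_i > 0$ we have $\mu_i^1 = 0$ by (7.1.10) and hence $\mu_i^1 + c\,(x^1 - \psi)_i > 0$ and $i \in \mathcal{A}_1$. Since $x^2 = \psi$ on $\mathcal{A}_1$ and $x^2 \le x^1$ it follows that $x^2 \le \psi$. To verify that $x^* \le x^k$ for $k \ge 1$, note that
\[
a_{\mathcal{I}_{k-1}} = \mu^*_{\mathcal{I}_{k-1}} + A_{\mathcal{I}_{k-1}} x^*_{\mathcal{I}_{k-1}} + A_{\mathcal{I}_{k-1} \mathcal{A}_{k-1}} x^*_{\mathcal{A}_{k-1}}
= A_{\mathcal{I}_{k-1}} x^k_{\mathcal{I}_{k-1}} + A_{\mathcal{I}_{k-1} \mathcal{A}_{k-1}} \psi_{\mathcal{A}_{k-1}},
\]
and consequently
\[
A_{\mathcal{I}_{k-1}} \bigl( x^k_{\mathcal{I}_{k-1}} - x^*_{\mathcal{I}_{k-1}} \bigr) = \mu^*_{\mathcal{I}_{k-1}} + A_{\mathcal{I}_{k-1} \mathcal{A}_{k-1}} \bigl( x^*_{\mathcal{A}_{k-1}} - \psi_{\mathcal{A}_{k-1}} \bigr).
\]
Since $\mu^*_{\mathcal{I}_{k-1}} \ge 0$ and $x^*_{\mathcal{A}_{k-1}} \le \psi_{\mathcal{A}_{k-1}}$, the M-matrix properties of $A$ imply that $x^k_{\mathcal{I}_{k-1}} \ge x^*_{\mathcal{I}_{k-1}}$ and consequently $x^k \ge x^*$ for all $k \ge 1$.

Turning to the feasibility of $\mu^k$ assume that for a pair of indices $(\bar k, i)$, $\bar k \ge 1$, we have $\mu_i^{\bar k} < 0$. Then necessarily $i \in \mathcal{A}_{\bar k - 1}$, $x_i^{\bar k} = \psi_i$, and $\mu_i^{\bar k} + c(x_i^{\bar k} - \psi_i) < 0$. It follows that $i \in \mathcal{I}_{\bar k}$, $\mu_i^{\bar k + 1} = 0$, and $\mu_i^{\bar k + 1} + c(x_i^{\bar k + 1} - \psi_i) \le 0$, since $x_i^{k+1} \le \psi_i$, $k \ge 1$. Consequently $i \in \mathcal{I}_{\bar k + 1}$ and by induction $i \in \mathcal{I}_k$ for all $k \ge \bar k + 1$. Thus, whenever a coordinate of $\mu^k$ becomes negative at iteration $\bar k$, it is zero from iteration $\bar k + 1$ onwards, and the corresponding primal coordinate is feasible. Due to finite-dimensionality of $\mathbb{R}^n$ it follows that there exists $k_0$ such that $\mu^k \ge 0$ for all $k \ge k_0$.

Monotonicity of $x^k$ and $x^* \le x^k \le \psi$ for $k \ge 2$ imply the existence of $\bar x$ such that $\lim x^k = \bar x \le \psi$. Since $\mu^k = a - A x^k \ge 0$ for all $k \ge k_0$, there exists $\bar\mu$ such that $\lim \mu^k = \bar\mu \ge 0$. Together with the complementarity property $\bar\mu (\bar x - \psi) = 0$, which is a consequence of the first equation in (7.1.10), it follows that $(\bar x, \bar\mu) = (x^*, \mu^*)$.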
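The monotonicity statements of Theorem 7.4 can be observed on a small instance. The sketch below is an illustration under stated assumptions rather than the setting of Example 7.3 itself: it uses the one-dimensional analogue $(2u_i - u_{i-1} - u_{i+1})/h^2 + \mu_i = f_i$ with $u_0 = u_N = 0$, whose system matrix is a tridiagonal M-matrix, and exact rational arithmetic so that the inequalities can be checked without rounding. The names `thomas` and `obstacle_pdas` and the zero initialization are hypothetical choices.

```python
from fractions import Fraction


def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system without pivoting.

    lower[i] is the (i, i-1) entry and upper[i] the (i, i+1) entry.
    """
    n = len(rhs)
    d, r = list(diag), list(rhs)
    for i in range(1, n):
        w = lower[i] / d[i - 1]
        d[i] -= w * upper[i - 1]
        r[i] -= w * r[i - 1]
    z = [Fraction(0)] * n
    z[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        z[i] = (r[i] - upper[i] * z[i + 1]) / d[i]
    return z


def obstacle_pdas(N, f, psi, c=1, max_iter=100):
    """Return the list of iterates (x^k, mu^k) for the 1D discretized obstacle problem."""
    h2 = Fraction(1, N * N)
    n = N - 1

    def apply_A(x):
        xe = [0] + list(x) + [0]  # homogeneous Dirichlet boundary values
        return [(2 * xe[i] - xe[i - 1] - xe[i + 1]) / h2 for i in range(1, n + 1)]

    x, mu = [Fraction(0)] * n, [Fraction(0)] * n
    history, prev = [(x, mu)], None
    for _ in range(max_iter):
        active = {i for i in range(n) if mu[i] + c * (x[i] - psi[i]) > 0}
        if active == prev:
            return history
        inactive = [i for i in range(n) if i not in active]
        x = [psi[i] if i in active else Fraction(0) for i in range(n)]
        if inactive:
            known = apply_A(x)  # contribution of x = psi on the active set
            rhs = [f[i] - known[i] for i in inactive]
            m = len(inactive)
            # Neighbouring inactive indices stay coupled; others decouple.
            low = [Fraction(-1) / h2 if j > 0 and inactive[j] - inactive[j - 1] == 1
                   else Fraction(0) for j in range(m)]
            up = low[1:] + [Fraction(0)]
            for i, v in zip(inactive, thomas(low, [2 / h2] * m, up, rhs)):
                x[i] = v
        Ax = apply_A(x)
        mu = [f[i] - Ax[i] if i in active else Fraction(0) for i in range(n)]
        history.append((x, mu))
        prev = active
    raise RuntimeError("active set did not settle within max_iter passes")
```

With `N = 4`, `f = [8, 32, 8]`, and `psi = [1, 1, 1]`, the recorded iterates decrease componentwise from the first iterate onwards and stay below the obstacle from the second iterate onwards, in line with the theorem.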
7.3
Cone sum preserving class
For rectangular matrices $B \in \mathbb{R}^{n \times m}$ we denote by $\|\cdot\|_1$ the subordinate matrix norm when both $\mathbb{R}^n$ and $\mathbb{R}^m$ are endowed with the 1-norms. Moreover, $B_+$ denotes the $n \times m$ matrix containing the positive parts of the elements of $B$. Recall that a square matrix is called a P-matrix if all its principal minors are positive.

Theorem 7.5. If $A$ is a P-matrix and for every partitioning of the index set $\{1, \ldots, n\}$ into disjoint subsets $\mathcal{I}$ and $\mathcal{A}$ we have $\|(A_{\mathcal{I}}^{-1} A_{\mathcal{I}\mathcal{A}})_+\|_1 < 1$ and $\sum_{i \in \mathcal{I}} (A_{\mathcal{I}}^{-1} x_{\mathcal{I}})_i > 0$ for $x_{\mathcal{I}} \ge 0$ with $x_{\mathcal{I}} \ne 0$, then $\lim_{k \to \infty} x^k = x^*$.

The third condition in the above theorem motivates the terminology cone sum preserving. If $A$ is an M-matrix, then the conditions of Theorem 7.5 are satisfied. The proof will reveal that $M(x^k) = \sum_{i=1}^n x_i^k$ is a merit function.

Proof. From (7.1.9) and the fact that $x^{k+1} = \psi$ on $\mathcal{A}_k$ we have for $k = 1, 2, \ldots$
\[
(x^{k+1} - x^k)_{\mathcal{I}_k} = A_{\mathcal{I}_k}^{-1} A_{\mathcal{I}_k \mathcal{A}_k} (x^k - \psi)_{\mathcal{A}_k} + A_{\mathcal{I}_k}^{-1} \mu^k_{\mathcal{I}_k}
\]
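and upon summation over the inactive indices
\[
\sum_{i \in \mathcal{I}_k} (x_i^{k+1} - x_i^k) = \sum_{i \in \mathcal{I}_k} \bigl( A_{\mathcal{I}_k}^{-1} A_{\mathcal{I}_k \mathcal{A}_k} (x^k - \psi)_{\mathcal{A}_k} \bigr)_i + \sum_{i \in \mathcal{I}_k} \bigl( A_{\mathcal{I}_k}^{-1} \mu^k_{\mathcal{I}_k} \bigr)_i. \tag{7.3.1}
\]
Using again that $x^{k+1} = \psi$ on $\mathcal{A}_k$, this implies that
\[
\sum_{i=1}^n (x_i^{k+1} - x_i^k) = -\sum_{i \in \mathcal{A}_k} (x_i^k - \psi_i) + \sum_{i \in \mathcal{I}_k} \bigl( A_{\mathcal{I}_k}^{-1} A_{\mathcal{I}_k \mathcal{A}_k} (x^k - \psi)_{\mathcal{A}_k} \bigr)_i + \sum_{i \in \mathcal{I}_k} \bigl( A_{\mathcal{I}_k}^{-1} \mu^k_{\mathcal{I}_k} \bigr)_i. \tag{7.3.2}
\]
Since $x^k_{\mathcal{A}_k} \ge \psi_{\mathcal{A}_k}$ it follows that
\[
\sum_{i=1}^n (x_i^{k+1} - x_i^k) \le \bigl( \|(A_{\mathcal{I}_k}^{-1} A_{\mathcal{I}_k \mathcal{A}_k})_+\|_1 - 1 \bigr) |x^k - \psi|_{1, \mathcal{A}_k} + \sum_{i \in \mathcal{I}_k} \bigl( A_{\mathcal{I}_k}^{-1} \mu^k_{\mathcal{I}_k} \bigr)_i < 0 \tag{7.3.3}
\]
unless $x^k$ is the solution to (7.1.3). In fact, if $|x^k - \psi|_{1, \mathcal{A}_k} = 0$, then $x^k \le \psi$ on $\mathcal{A}_k$ and $\mu^k \ge 0$ on $\mathcal{A}_k$. If moreover $\mu^k_{\mathcal{I}_k} = 0$, then $\mu^k \ge 0$ and $x^k \le \psi$ on $\{1, \ldots, n\}$. Together with the first equation in (7.1.10), this implies that $(x^k, \mu^k)$ satisfies the complementarity conditions. It also satisfies $A x^k + \mu^k = a$ and hence $(x^k, \mu^k)$ is a solution to (7.1.3). Consequently
\[
x^k \mapsto M(x^k) = \sum_{i=1}^n x_i^k
\]
acts as a merit function for the algorithm. Since there are only finitely many possible choices for active/inactive sets, there exists an iteration index $\bar k$ such that $\mathcal{I}_{\bar k} = \mathcal{I}_{\bar k + 1}$. In this case $(x^{\bar k + 1}, \mu^{\bar k + 1})$ is a solution to (7.1.3). In fact, in view of (iii) of the algorithm it suffices to show that $x^{\bar k + 1}$ and $\mu^{\bar k + 1}$ are feasible. This follows from the fact that due to $\mathcal{I}_{\bar k} = \mathcal{I}_{\bar k + 1}$ we have $c(x_i^{\bar k + 1} - \psi_i) = \mu_i^{\bar k + 1} + c(x_i^{\bar k + 1} - \psi_i) \le 0$ for $i \in \mathcal{I}_{\bar k}$ and $\mu_i^{\bar k + 1} + c(x_i^{\bar k + 1} - \psi_i) = \mu_i^{\bar k + 1} > 0$ for $i \in \mathcal{A}_{\bar k}$. From (7.1.10) we deduce $\mu^{\bar k + 1} (x^{\bar k + 1} - \psi) = 0$, and hence the complementarity conditions hold and the algorithm converges in finitely many steps.

A perturbation result. We now discuss the primal-dual active set strategy for the case where the matrix $A$ can be expressed as an additive perturbation of an M-matrix.

Theorem 7.6. Assume that $A = M + K$ with $M$ an M-matrix and $K$ an $n \times n$ matrix. If $\|K\|_1$ is sufficiently small, then the primal-dual active set algorithm is well defined and $\lim_{k \to \infty} (x^k, \mu^k) = (x^*, \mu^*)$, where $(x^*, \mu^*)$ is a solution to (7.1.3). If $y^T A y > 0$ for $y \ne 0$, then the solution to (7.1.3) is unique.

Proof. As a consequence of the assumption that $M$ is an M-matrix all principal submatrices of $M$ are M-matrices as well [BePl]. Let $\mathcal{S}$ denote the set of all subsets of $\{1, \ldots, n\}$, and for $\mathcal{I} \in \mathcal{S}$ let $\mathcal{A}$ denote its complement. Define
\[
\rho = \sup_{\mathcal{I} \in \mathcal{S}} \|M_{\mathcal{I}}^{-1} K_{\mathcal{I}}\|_1 \quad \text{and} \quad \sigma = \sup_{\mathcal{I} \in \mathcal{S}} \|B_{\mathcal{I}\mathcal{A}}(K)\|_1, \tag{7.3.4}
\]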
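where $B_{\mathcal{I}\mathcal{A}}(K) = M_{\mathcal{I}}^{-1} K_{\mathcal{I}\mathcal{A}} - M_{\mathcal{I}}^{-1} K_{\mathcal{I}} (M + K)_{\mathcal{I}}^{-1} A_{\mathcal{I}\mathcal{A}}$. Assume that $K$ is chosen such that $\rho < \frac{1}{2}$ and $\sigma < 1$. For every subset $\mathcal{I} \in \mathcal{S}$ the inverse of $A_{\mathcal{I}}$ exists and can be expressed as
\[
A_{\mathcal{I}}^{-1} = \Bigl( I_{\mathcal{I}} + \sum_{i=1}^\infty (-M_{\mathcal{I}}^{-1} K_{\mathcal{I}})^i \Bigr) M_{\mathcal{I}}^{-1}.
\]
Consequently the algorithm is well defined. Proceeding as in the proof of Theorem 7.5 we arrive at
\[
\sum_{i=1}^n (x_i^{k+1} - x_i^k) = -\sum_{i \in \mathcal{A}_k} (x_i^k - \psi_i) + \sum_{i \in \mathcal{I}_k} \bigl( A_{\mathcal{I}_k}^{-1} A_{\mathcal{I}_k \mathcal{A}_k} (x^k - \psi)_{\mathcal{A}_k} \bigr)_i + \sum_{i \in \mathcal{I}_k} \bigl( A_{\mathcal{I}_k}^{-1} \mu^k_{\mathcal{I}_k} \bigr)_i,
\]
where $\mu_i^k \le 0$ for $i \in \mathcal{I}_k$ and $x_i^k \ge \psi_i$ for $i \in \mathcal{A}_k$. Below we drop the index $k$ with $\mathcal{I}_k$ and $\mathcal{A}_k$. Note that $A_{\mathcal{I}}^{-1} A_{\mathcal{I}\mathcal{A}} \le M_{\mathcal{I}}^{-1} K_{\mathcal{I}\mathcal{A}} - M_{\mathcal{I}}^{-1} K_{\mathcal{I}} (M + K)_{\mathcal{I}}^{-1} A_{\mathcal{I}\mathcal{A}} = B_{\mathcal{I}\mathcal{A}}(K)$. Here we used $(M + K)_{\mathcal{I}}^{-1} - M_{\mathcal{I}}^{-1} = -M_{\mathcal{I}}^{-1} K_{\mathcal{I}} (M + K)_{\mathcal{I}}^{-1}$ and $M_{\mathcal{I}}^{-1} M_{\mathcal{I}\mathcal{A}} \le 0$. This implies
\[
\sum_{i=1}^n (x_i^{k+1} - x_i^k) \le -\sum_{i \in \mathcal{A}} (x_i^k - \psi_i) + \sum_{i \in \mathcal{I}} \bigl( B_{\mathcal{I}\mathcal{A}}(K) (x^k - \psi)_{\mathcal{A}} \bigr)_i + \sum_{i \in \mathcal{I}} \bigl( A_{\mathcal{I}}^{-1} \mu^k_{\mathcal{I}} \bigr)_i. \tag{7.3.5}
\]
We estimate
\[
\begin{aligned}
\sum_{i \in \mathcal{I}} \bigl( A_{\mathcal{I}}^{-1} \mu^k_{\mathcal{I}} \bigr)_i
&= \sum_{i \in \mathcal{I}} \Bigl( M_{\mathcal{I}}^{-1} \mu^k_{\mathcal{I}} + \sum_{j=1}^\infty (-M_{\mathcal{I}}^{-1} K_{\mathcal{I}})^j M_{\mathcal{I}}^{-1} \mu^k_{\mathcal{I}} \Bigr)_i \\
&\le -|M_{\mathcal{I}}^{-1} \mu^k_{\mathcal{I}}|_1 + \sum_{j=1}^\infty \rho^j\, |M_{\mathcal{I}}^{-1} \mu^k_{\mathcal{I}}|_1 \\
&= (\alpha - 1) |M_{\mathcal{I}}^{-1} \mu^k_{\mathcal{I}}|_1 + \Bigl( \frac{1}{1 - \rho} - (\alpha + 1) \Bigr) |M_{\mathcal{I}}^{-1} \mu^k_{\mathcal{I}}|_1 = (\alpha - 1) |M_{\mathcal{I}}^{-1} \mu^k_{\mathcal{I}}|_1,
\end{aligned}
\]
where we set $\alpha = \frac{\rho}{1 - \rho} \in (0, 1)$ by (7.3.4). This estimate, together with (7.3.4) and (7.3.5), implies that
\[
\sum_{i=1}^n (x_i^{k+1} - x_i^k) \le (\sigma - 1) |x^k_{\mathcal{A}_k} - \psi_{\mathcal{A}_k}|_1 + (\alpha - 1) |M_{\mathcal{I}_k}^{-1} \mu^k_{\mathcal{I}_k}|_1.
\]
Now it can be verified in the same manner as in the proof of Theorem 7.5 that $x^k \mapsto M(x^k) = \sum_{i=1}^n x_i^k$ is a merit function for the algorithm and convergence of $(x^k, \mu^k)$ to a solution $(x^*, \mu^*)$ follows. If there are two solutions to (7.1.3), then their difference $y$ satisfies $y^T A y \le 0$ and hence $y = 0$ and uniqueness follows.

Observe that the M-matrix property is not stable under arbitrarily small perturbations since off-diagonal elements may become positive. Theorem 7.6 guarantees that convergence of the primal-dual active set strategy for arbitrary initial data is preserved for sufficiently small perturbations $K$ of an M-matrix. Therefore, Theorem 7.6 is also of interest in connection with numerical implementations of the primal-dual active set algorithm.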
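The instability of the M-matrix property under small perturbations can be seen on a 2 × 2 example. The sketch below is illustrative only; the helper names are hypothetical, and it relies on the standard characterization that a matrix with nonpositive off-diagonal entries is a nonsingular M-matrix exactly when all its principal minors are positive. It does not check the smallness conditions on $\rho$ and $\sigma$ required by Theorem 7.6.

```python
from fractions import Fraction
from itertools import combinations


def det(M):
    """Determinant by Laplace expansion along the first row (tiny matrices only)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def is_p_matrix(A):
    """True if every principal minor of A is positive."""
    n = len(A)
    for size in range(1, n + 1):
        for idx in combinations(range(n), size):
            sub = [[A[i][j] for j in idx] for i in idx]
            if det(sub) <= 0:
                return False
    return True


def has_nonpositive_offdiagonal(A):
    n = len(A)
    return all(A[i][j] <= 0 for i in range(n) for j in range(n) if i != j)


def is_m_matrix(A):
    """Nonsingular M-matrix test: nonpositive off-diagonal and positive principal minors."""
    return has_nonpositive_offdiagonal(A) and is_p_matrix(A)


# M is an M-matrix with a zero off-diagonal entry; K is a tiny perturbation.
M = [[2, 0], [-1, 2]]
K = [[0, Fraction(1, 100)], [0, 0]]
A = [[M[i][j] + K[i][j] for j in range(2)] for i in range(2)]
```

Here `is_m_matrix(M)` holds, while `A = M + K` has a positive off-diagonal entry and is no longer an M-matrix, although it is still a P-matrix.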
7.4
Diagonally dominated class
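We again consider the reduced problem (7.1.3),
\[
Ax + \mu = a, \quad \mu = \max(0, \mu + c\,(x - \psi)), \tag{7.4.1}
\]
but differently from Sections 7.2 and 7.3 we admit also the infinite-dimensional case. Sufficient conditions related to diagonal dominance of $A$ will be given which imply that
\[
M(x^{k+1}, \mu^{k+1}) = \max\Bigl( \beta \int_{\mathcal{I}_k} |(x^{k+1} - \psi)^+|\, dx,\ \int_{\mathcal{A}_k} |(\mu^{k+1})^-|\, dx \Bigr) \tag{7.4.2}
\]
with $\beta > 0$ acts as a merit functional for the primal-dual algorithm. Here we set $\phi^+ = \max(\phi, 0)$ and $\phi^- = -\min(\phi, 0)$. The natural norm associated to this merit functional is the $L^1(\Omega)$-norm and consequently we assume that
\[
A \in L(L^1(\Omega)), \quad a \in L^1(\Omega), \quad \text{and} \quad \psi \in L^1(\Omega). \tag{7.4.3}
\]
The analysis of this section can also be used to obtain convergence in the $L^p(\Omega)$-norm for any $p \in (1, \infty)$ if the norms in the integrands of $M$ are replaced by $|\cdot|^p$-norms and the $L^1(\Omega)$-norms below are replaced by $L^p(\Omega)$-norms as well. The results also apply for $Z = \mathbb{R}^n$. In this case the integrals in (7.4.2) are replaced by sums over the respective index sets.

We assume that there exist constants $\rho_i$, $i = 1, \ldots, 5$, such that for all partitions $\mathcal{A}$ and $\mathcal{I}$ of $\Omega$ and for all $\phi_{\mathcal{A}} \ge 0$ in $L^2(\mathcal{A})$ and $\phi_{\mathcal{I}} \ge 0$ in $L^2(\mathcal{I})$
\[
\begin{aligned}
& |[A_{\mathcal{I}}^{-1} \phi_{\mathcal{I}}]^-| \le \rho_1 |\phi_{\mathcal{I}}|, \\
& |[A_{\mathcal{I}}^{-1} A_{\mathcal{I}\mathcal{A}} \phi_{\mathcal{A}}]^+| \le \rho_2 |\phi_{\mathcal{A}}|
\end{aligned}
\tag{7.4.4}
\]
and
\[
\begin{aligned}
& |[A_{\mathcal{A}} \phi_{\mathcal{A}}]^-| \le \rho_3 |\phi_{\mathcal{A}}|, \\
& |[A_{\mathcal{A}\mathcal{I}} A_{\mathcal{I}}^{-1} \phi_{\mathcal{I}}]^-| \le \rho_4 |\phi_{\mathcal{I}}|, \\
& |[A_{\mathcal{A}\mathcal{I}} A_{\mathcal{I}}^{-1} A_{\mathcal{I}\mathcal{A}} \phi_{\mathcal{A}}]^+| \le \rho_5 |\phi_{\mathcal{A}}|.
\end{aligned}
\tag{7.4.5}
\]
Here $|\cdot|$ denotes the $L^1(\Omega)$-norm. Assumption (7.4.4) requires in particular the existence of $A_{\mathcal{I}}^{-1}$. By a Schur complement argument with respect to the sets $\mathcal{I}_k$ and $\mathcal{A}_k$ this implies existence of a solution to the linear systems in step (iii) of the algorithm for every $k$.

Theorem 7.7. If (7.4.3), (7.4.4), (7.4.5) hold and $\rho = \max\bigl( \beta \rho_1 + \rho_2,\ \frac{\rho_3}{\beta} + \rho_4 + \frac{\rho_5}{\beta} \bigr) < 1$, then $M$ is a merit function for the primal-dual algorithm of the reduced system and $\lim_{k \to \infty} (x^k, \mu^k) = (x^*, \mu^*)$ in $L^1(\Omega) \times L^1(\Omega)$, with $(x^*, \mu^*)$ a solution to (7.4.1).

Proof. For every $k \ge 1$ we have $(x^{k+1} - \psi)^+ \le (x^{k+1} - x^k)^+$ on $\mathcal{I}_k$ and $(\mu^{k+1})^- = (\delta\mu)^-$ on $\mathcal{A}_k$. Therefore
\[
M(x^{k+1}, \mu^{k+1}) \le \max\Bigl( \beta \int_{\mathcal{I}_k} (\delta x_{\mathcal{I}_k})^+,\ \int_{\mathcal{A}_k} (\delta\mu_{\mathcal{A}_k})^- \Bigr). \tag{7.4.6}
\]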
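From (7.1.9) we deduce that
\[
\delta x_{\mathcal{I}_k} = -A_{\mathcal{I}_k}^{-1} (-\mu^k_{\mathcal{I}_k}) + A_{\mathcal{I}_k}^{-1} A_{\mathcal{I}_k \mathcal{A}_k} (-\delta x_{\mathcal{A}_k}),
\]
with $\mu^k_{\mathcal{I}_k} \le 0$ and $\delta x_{\mathcal{A}_k} \le 0$. By (7.4.4) therefore
\[
|(\delta x_{\mathcal{I}_k})^+| \le \rho_1 |\mu^k_{\mathcal{I}_k}| + \rho_2 |\delta x_{\mathcal{A}_k}|
= \rho_1 \int_{\mathcal{I}_k \cap \mathcal{A}_{k-1}} |(\mu^k_{\mathcal{I}_k})^-| + \rho_2 \int_{\mathcal{A}_k \cap \mathcal{I}_{k-1}} (x^k - \psi)^+
\le \Bigl( \rho_1 + \frac{\rho_2}{\beta} \Bigr) M(x^k, \mu^k). \tag{7.4.7}
\]
Similarly, by (7.1.9)
\[
\delta\mu_{\mathcal{A}_k} = A_{\mathcal{A}_k} (-\delta x_{\mathcal{A}_k}) + A_{\mathcal{A}_k \mathcal{I}_k} A_{\mathcal{I}_k}^{-1} (-\mu^k_{\mathcal{I}_k}) - A_{\mathcal{A}_k \mathcal{I}_k} A_{\mathcal{I}_k}^{-1} A_{\mathcal{I}_k \mathcal{A}_k} (-\delta x_{\mathcal{A}_k}).
\]
Since $\delta x_{\mathcal{A}_k} \le 0$ and $\mu^k_{\mathcal{I}_k} \le 0$, we find by (7.4.5)
\[
|(\delta\mu_{\mathcal{A}_k})^-| \le \rho_3 |\delta x_{\mathcal{A}_k}| + \rho_4 |\mu^k_{\mathcal{I}_k}| + \rho_5 |\delta x_{\mathcal{A}_k}| \le \Bigl( \frac{\rho_3 + \rho_5}{\beta} + \rho_4 \Bigr) M(x^k, \mu^k), \tag{7.4.8}
\]
and therefore
\[
M(x^{k+1}, \mu^{k+1}) \le \max\Bigl( \beta \rho_1 + \rho_2,\ \frac{\rho_3 + \rho_5}{\beta} + \rho_4 \Bigr) M(x^k, \mu^k) = \rho\, M(x^k, \mu^k).
\]
Thus, if $\rho < 1$, then $M$ is a merit functional. Furthermore $M(x^{k+1}, \mu^{k+1}) \le \rho^k M(x^1, \mu^1)$. Together with (7.4.7), (7.4.8), and (7.1.9) it follows that $(x^k, \mu^k)$ is a Cauchy sequence. Hence there exists $(x^*, \mu^*)$ such that $\lim_{k \to \infty} (x^k, \mu^k) = (x^*, \mu^*)$ and $A x^* + \mu^* = a$, $\mu^* (x^* - \psi) = 0$ a.e. in $\Omega$. Since $(x^k - \psi)^+ \to (x^* - \psi)^+$ as $k \to \infty$ and $\lim_{k \to \infty} (x^{k+1} - \psi)^+ = 0$, it follows that $x^* \le \psi$. Similarly, one argues that $\mu^* \ge 0$. Thus $(x^*, \mu^*)$ is a solution to (7.4.1).

Concerning the uniqueness of the solution to (7.4.1), assume that $A \in L(L^2(\Omega))$ and that $(Ay, y)_{L^2(\Omega)} > 0$ for all $y \ne 0$. Assume further that $(x^*, \mu^*)$ and $(\hat x, \hat\mu)$ are solutions to (7.4.1) with $\hat x - x^* \in L^2(\Omega)$. Then $(\hat x - x^*, A(\hat x - x^*))_{L^2(\Omega)} \le 0$ and therefore $\hat x - x^* = 0$.

Remark 7.4.1. In the finite-dimensional case the integrals in the definition of $M$ must be replaced by sums over the active/inactive index sets. If $A$ is an M-matrix, then $\rho_1 = \rho_2 = \rho_5 = 0$ and $\rho < 1$ if $\frac{\rho_3}{\beta} + \rho_4 < 1$. This is the case if $A$ is diagonally dominant in the sense that $\rho_4 < 1$ and $\beta$ is chosen sufficiently large. If these conditions are met, then $\rho < 1$ is stable under perturbations of $A$.

Remark 7.4.2. Consider the infinite-dimensional case with $A = \alpha I + K$, where $\alpha > 0$, $K \in L(L^1(\Omega))$, and $K\phi \ge 0$ for all $\phi \ge 0$. This is the case for the operators in Examples 7.1 and 7.2, as can be argued by using the maximum principle. Let $\|K\|$ denote the norm of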
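$K$ in $L(L^1(\Omega))$. For $\|K\| < \alpha$ and any $\mathcal{I} \subset \Omega$ we have $A_{\mathcal{I}}^{-1} = \frac{1}{\alpha} I_{\mathcal{I}} - \frac{1}{\alpha} K_{\mathcal{I}} A_{\mathcal{I}}^{-1}$ and hence $\rho_1 \le \frac{\|K\|}{\alpha(\alpha - \|K\|)}$. Moreover $\rho_3 = 0$. The conditions involving $\rho_2$, $\rho_4$, and $\rho_5$ are satisfied with
\[
\rho_2 = \frac{\|K\|}{\alpha - \|K\|}, \quad \rho_4 = \frac{\|K\|^2}{\alpha(\alpha - \|K\|)}, \quad \text{and} \quad \rho_5 = \frac{\|K\|^2}{\alpha - \|K\|}.
\]
7.5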
Bilateral constraints, diagonally dominated class
Consider the quadratic programming problem with bilateral constraints
\[
\min\ \tfrac{1}{2} \langle Ax, x \rangle_X - \langle a, x \rangle_X \quad \text{subject to } Ex = b, \quad \varphi \le Gx \le \psi
\]
with conditions on $A$, $E$, and $G$ as in (7.1.1). From Example 4.53 of Section 4.7, we recall that the necessary optimality condition for this problem is given by
\[
\begin{aligned}
& Ax + E^* \lambda + G^* \mu = a, \\
& Ex = b, \\
& \mu = \max(0, \mu + c(Gx - \psi)) + \min(0, \mu + c(Gx - \varphi)),
\end{aligned}
\tag{7.5.1}
\]
where max as well as min are interpreted as pointwise a.e. operations if $Z = L^2(\Omega)$ and coordinatewise for $Z = \mathbb{R}^n$.

Primal-Dual Active Set Algorithm.
(i) Initialize $x^0$, $\mu^0$. Set $k = 0$.
(ii) Given $(x^k, \mu^k)$, set
\[
\begin{aligned}
& \mathcal{A}_k^+ = \{\mu^k + c(Gx^k - \psi) > 0\}, \\
& \mathcal{I}_k = \{\mu^k + c(Gx^k - \psi) \le 0 \le \mu^k + c(Gx^k - \varphi)\}, \\
& \mathcal{A}_k^- = \{\mu^k + c(Gx^k - \varphi) < 0\}.
\end{aligned}
\]
(iii) Solve for $(x^{k+1}, \lambda^{k+1}, \mu^{k+1})$:
\[
\begin{aligned}
& A x^{k+1} + E^* \lambda^{k+1} + G^* \mu^{k+1} = a, \quad E x^{k+1} = b, \\
& G x^{k+1} = \psi \text{ in } \mathcal{A}_k^+, \quad \mu^{k+1} = 0 \text{ in } \mathcal{I}_k, \quad \text{and} \quad G x^{k+1} = \varphi \text{ in } \mathcal{A}_k^-.
\end{aligned}
\]
(iv) Stop, or set $k = k + 1$, and return to (ii).

We assume the existence of a solution to the auxiliary systems in step (iii). In case $A$ is positive definite a sufficient condition for existence is given by surjectivity of $E$ and
surjectivity of $G : N(E) \to Z$. As in the unilateral case we shall consider a transformed version of (7.5.1). If (7.1.4) and (7.1.8) are satisfied, then (7.5.1) can be transformed into the equivalent system
\[
Ax + \mu = a, \qquad \mu = \max(0,\mu + c(x-\psi)) + \min(0,\mu + c(x-\varphi)),
\tag{7.5.2}
\]
where $A$ is a bounded operator on $Z$. The algorithm for this reduced system is obtained from the above algorithm by replacing $G$ by $I$ and deleting the terms involving $E$ and $E^*$. As in the unilateral case, if one does not carry out the reduction step from (7.1.6) to (7.5.2), then the coordinates corresponding to $x_2$ are treated as inactive ones in the algorithm. We henceforth concentrate on the infinite-dimensional case and give sufficient conditions for
\[
M(x^{k+1},\mu^{k+1}) = \max\left( \int_{\mathcal{I}_k} \big((x^{k+1}-\psi)^+ + (x^{k+1}-\varphi)^-\big)\,dx,\ \int_{\mathcal{A}_k^+} (\mu^{k+1})^-\,dx + \int_{\mathcal{A}_k^-} (\mu^{k+1})^+\,dx \right)
\tag{7.5.3}
\]
to act as a merit function for the algorithm applied to the reduced system. In the finite-dimensional case the integrals must be replaced by sums over the respective index sets. We note that (iii) with $G = I$ implies the complementarity property
\[
(x^k-\psi)(x^k-\varphi)\mu^k = 0 \quad \text{a.e. in } \Omega.
\tag{7.5.4}
\]
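The algorithm for the reduced system (7.5.2) can be sketched in the finite-dimensional case as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `pdas_bilateral`, the zero initialization, the stopping rule (two consecutive identical active sets), and the test data are all choices made here, and NumPy is assumed to be available.

```python
import numpy as np

def pdas_bilateral(A, a, phi, psi, c=1.0, max_iter=50):
    """Primal-dual active set sketch for A x + mu = a, phi <= x <= psi (R^n)."""
    n = len(a)
    x, mu = np.zeros(n), np.zeros(n)        # assumed initialization (x^0, mu^0)
    prev = None
    for k in range(max_iter):
        upper = mu + c * (x - psi) > 0      # A_k^+
        lower = mu + c * (x - phi) < 0      # A_k^-
        act = upper | lower
        inact = ~act                        # I_k
        # If the sets did not change, (x, mu) already solves the reduced system.
        if prev is not None and np.array_equal(upper, prev[0]) \
                and np.array_equal(lower, prev[1]):
            return x, mu, k
        prev = (upper, lower)
        x_new = np.empty(n)
        x_new[upper] = psi[upper]           # x^{k+1} = psi on A_k^+
        x_new[lower] = phi[lower]           # x^{k+1} = phi on A_k^-
        if inact.any():                     # mu^{k+1} = 0 on I_k
            rhs = a[inact] - A[np.ix_(inact, act)] @ x_new[act]
            x_new[inact] = np.linalg.solve(A[np.ix_(inact, inact)], rhs)
        mu_new = a - A @ x_new
        mu_new[inact] = 0.0
        x, mu = x_new, mu_new
    return x, mu, max_iter

# Hypothetical test data: A = I + K with a small perturbation K.
A = np.eye(4) + 0.05 * np.ones((4, 4))
a = np.array([3.0, -3.0, 0.2, 0.1])
phi, psi = -np.ones(4), np.ones(4)
x, mu, iters = pdas_bilateral(A, a, phi, psi)
```

The returned pair can be checked directly against (7.5.2) by evaluating the residual of the equation and of the max/min complementarity relation.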
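As in the previous section the merit function involves $L^1$-norms and accordingly we aim for convergence in $L^1(\Omega)$. We henceforth assume that
\[
A \in \mathcal{L}(L^1(\Omega)), \quad a \in L^1(\Omega), \quad \psi \text{ and } \varphi \in L^1(\Omega).
\tag{7.5.5}
\]
Below $\|\cdot\|$ denotes the norm of operators in $\mathcal{L}(L^1(\Omega))$. The following conditions will be used: There exist constants $\rho_i$, $i = 1,\dots,5$, such that for arbitrary partitions $\mathcal{A} \cup \mathcal{I} = \Omega$ we have
\[
\|A_{\mathcal{I}}^{-1}\| \le \rho_1, \qquad \|A_{\mathcal{I}}^{-1} A_{\mathcal{I}\mathcal{A}}\| \le \rho_2
\tag{7.5.6}
\]
and
\[
\|A_{\mathcal{A}} - cI\| \le \rho_3, \qquad \|A_{\mathcal{A}\mathcal{I}} A_{\mathcal{I}}^{-1}\| \le \rho_4, \qquad \|A_{\mathcal{A}\mathcal{I}} A_{\mathcal{I}}^{-1} A_{\mathcal{I}\mathcal{A}}\| \le \rho_5.
\tag{7.5.7}
\]
We further set
\[
\rho = 2\max\left( \max\Big(\rho_1,\rho_2,\frac{\rho_2}{c}\Big),\ \max(\rho_3+\rho_5,\rho_4),\ \frac{\rho_3+\rho_5}{c} \right).
\]

Theorem 7.8. If (7.5.5), (7.5.6), (7.5.7) hold and $\rho < 1$, then $M$ is a merit function for the primal-dual algorithm of the reduced system (7.5.2) and $\lim_{k\to\infty}(x^k,\mu^k) = (x^*,\mu^*)$ in $L^1(\Omega) \times L^1(\Omega)$, with $(x^*,\mu^*)$ a solution to (7.5.2).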
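Proof. For $\delta x = x^{k+1}-x^k$ and $\delta\mu = \mu^{k+1}-\mu^k$ we have
\[
\begin{aligned}
A_{\mathcal{A}_k^+}\delta x_{\mathcal{A}_k^+} + A_{\mathcal{A}_k^+\mathcal{I}_k}\delta x_{\mathcal{I}_k} + A_{\mathcal{A}_k^+\mathcal{A}_k^-}\delta x_{\mathcal{A}_k^-} + \delta\mu_{\mathcal{A}_k^+} &= 0,\\
A_{\mathcal{I}_k}\delta x_{\mathcal{I}_k} + A_{\mathcal{I}_k\mathcal{A}_k^+}\delta x_{\mathcal{A}_k^+} + A_{\mathcal{I}_k\mathcal{A}_k^-}\delta x_{\mathcal{A}_k^-} - \mu^k_{\mathcal{I}_k} &= 0,\\
A_{\mathcal{A}_k^-}\delta x_{\mathcal{A}_k^-} + A_{\mathcal{A}_k^-\mathcal{I}_k}\delta x_{\mathcal{I}_k} + A_{\mathcal{A}_k^-\mathcal{A}_k^+}\delta x_{\mathcal{A}_k^+} + \delta\mu_{\mathcal{A}_k^-} &= 0
\end{aligned}
\tag{7.5.8}
\]
with
\[
\mu^k_{\mathcal{A}_k^+}
\begin{cases}
> 0 & \text{on } \mathcal{A}_{k-1}^+ \cap \mathcal{A}_k^+,\\
= 0 & \text{on } \mathcal{I}_{k-1} \cap \mathcal{A}_k^+,\\
> c(\psi-\varphi) & \text{on } \mathcal{A}_{k-1}^- \cap \mathcal{A}_k^+,
\end{cases}
\tag{7.5.9}
\]
\[
\mu^k_{\mathcal{I}_k}
\begin{cases}
\in [c(\varphi-\psi),0) & \text{on } \mathcal{A}_{k-1}^+ \cap \mathcal{I}_k,\\
= 0 & \text{on } \mathcal{I}_{k-1} \cap \mathcal{I}_k,\\
\in (0,c(\psi-\varphi)] & \text{on } \mathcal{A}_{k-1}^- \cap \mathcal{I}_k,
\end{cases}
\tag{7.5.10}
\]
\[
\mu^k_{\mathcal{A}_k^-}
\begin{cases}
< c(\varphi-\psi) & \text{on } \mathcal{A}_{k-1}^+ \cap \mathcal{A}_k^-,\\
= 0 & \text{on } \mathcal{I}_{k-1} \cap \mathcal{A}_k^-,\\
< 0 & \text{on } \mathcal{A}_{k-1}^- \cap \mathcal{A}_k^-,
\end{cases}
\tag{7.5.11}
\]
\[
\delta x_{\mathcal{A}_k^+}
\begin{cases}
= 0 & \text{on } \mathcal{A}_{k-1}^+ \cap \mathcal{A}_k^+,\\
< 0 & \text{on } \mathcal{I}_{k-1} \cap \mathcal{A}_k^+,\\
= \psi-\varphi < \frac{\mu^k}{c} & \text{on } \mathcal{A}_{k-1}^- \cap \mathcal{A}_k^+,
\end{cases}
\tag{7.5.12}
\]
\[
\delta x_{\mathcal{A}_k^-}
\begin{cases}
= \varphi-\psi > \frac{\mu^k}{c} & \text{on } \mathcal{A}_{k-1}^+ \cap \mathcal{A}_k^-,\\
> 0 & \text{on } \mathcal{I}_{k-1} \cap \mathcal{A}_k^-,\\
= 0 & \text{on } \mathcal{A}_{k-1}^- \cap \mathcal{A}_k^-.
\end{cases}
\tag{7.5.13}
\]
From (7.5.4),
\[
(x^{k+1}_{\mathcal{I}_k} - \psi_{\mathcal{I}_k})^+ \le (\delta x_{\mathcal{I}_k})^+ \quad \text{and} \quad (x^{k+1}_{\mathcal{I}_k} - \varphi_{\mathcal{I}_k})^- \le (\delta x_{\mathcal{I}_k})^-.
\tag{7.5.14}
\]
This implies that
\[
M(x^{k+1},\mu^{k+1}) \le \max\left( \int_{\mathcal{I}_k} |\delta x_{\mathcal{I}_k}|,\ \int_{\mathcal{A}_k^+} (\mu^{k+1})^- + \int_{\mathcal{A}_k^-} (\mu^{k+1})^+ \right).
\tag{7.5.15}
\]
From (7.5.8), (7.5.6), (7.5.10), (7.5.12), and (7.5.13) we have, with $|\cdot|$ denoting the $L^1$-norm,
\[
\begin{aligned}
|\delta x_{\mathcal{I}_k}| &\le \rho_1 |\mu^k_{\mathcal{I}_k}| + \rho_2 |\delta x_{\mathcal{A}_k}|\\
&\le \rho_1 \Big( |(\mu^k_{\mathcal{I}_k \cap \mathcal{A}_{k-1}^+})^-| + |(\mu^k_{\mathcal{I}_k \cap \mathcal{A}_{k-1}^-})^+| \Big) + \rho_2 \Big( |\delta x_{\mathcal{A}_k^+}| + |\delta x_{\mathcal{A}_k^-}| \Big)\\
&\le \rho_1 \Big( |(\mu^k_{\mathcal{I}_k \cap \mathcal{A}_{k-1}^+})^-| + |(\mu^k_{\mathcal{I}_k \cap \mathcal{A}_{k-1}^-})^+| \Big) + \rho_2 \Big( |(x^k-\psi)^+_{\mathcal{A}_k^+ \cap \mathcal{I}_{k-1}}| + \tfrac{1}{c}|(\mu^k_{\mathcal{A}_k^+ \cap \mathcal{A}_{k-1}^-})^+|\\
&\qquad\qquad + |(x^k-\varphi)^-_{\mathcal{A}_k^- \cap \mathcal{I}_{k-1}}| + \tfrac{1}{c}|(\mu^k_{\mathcal{A}_k^- \cap \mathcal{A}_{k-1}^+})^-| \Big).
\end{aligned}
\]
This implies
\[
|\delta x_{\mathcal{I}_k}| \le 2\max\Big(\rho_1,\rho_2,\frac{\rho_2}{c}\Big) M(x^k,\mu^k).
\tag{7.5.16}
\]
From (7.5.8) further
\[
\mu^{k+1}_{\mathcal{A}_k} - (\mu^k_{\mathcal{A}_k} - c\,\delta x_{\mathcal{A}_k}) = g,
\tag{7.5.17}
\]
where, after eliminating $\delta x_{\mathcal{I}_k}$ by means of the second equation in (7.5.8),
\[
g = (cI - A_{\mathcal{A}_k})\delta x_{\mathcal{A}_k} + A_{\mathcal{A}_k\mathcal{I}_k} A_{\mathcal{I}_k}^{-1} A_{\mathcal{I}_k\mathcal{A}_k}\delta x_{\mathcal{A}_k} - A_{\mathcal{A}_k\mathcal{I}_k} A_{\mathcal{I}_k}^{-1}\mu^k_{\mathcal{I}_k}.
\]
By (7.5.9), (7.5.12), we have $\mu^k_{\mathcal{A}_k^+} - c\,\delta x_{\mathcal{A}_k^+} \ge 0$. Similarly, by (7.5.11), (7.5.13), $\mu^k_{\mathcal{A}_k^-} - c\,\delta x_{\mathcal{A}_k^-} \le 0$. Consequently,
\[
\begin{aligned}
|(\mu^{k+1}_{\mathcal{A}_k^+})^-| + |(\mu^{k+1}_{\mathcal{A}_k^-})^+| &\le |g_{\mathcal{A}_k}| \le (\rho_3+\rho_5)|\delta x_{\mathcal{A}_k}| + \rho_4 |\mu^k_{\mathcal{I}_k}|\\
&\le 2\max\Big(\rho_4,\rho_3+\rho_5,\frac{\rho_3+\rho_5}{c}\Big) M(x^k,\mu^k).
\end{aligned}
\tag{7.5.18}
\]
By (7.5.15), (7.5.16), and (7.5.18)
\[
M(x^{k+1},\mu^{k+1}) \le 2\max\left( \max\Big(\rho_1,\rho_2,\frac{\rho_2}{c}\Big),\ \max\Big(\rho_4,\rho_3+\rho_5,\frac{\rho_3+\rho_5}{c}\Big) \right) M(x^k,\mu^k).
\]
It follows that $M(x^{k+1},\mu^{k+1}) \le \rho^k M(x^1,\mu^1)$ and if $\rho < 1$, then $M(x^k,\mu^k) \to 0$ as $k\to\infty$. From the estimates leading to (7.5.16) it follows that $x^k$ is a Cauchy sequence. Moreover $\mu^k$ is a Cauchy sequence by (7.5.8). Hence there exist $(x^*,\mu^*)$ such that $\lim_{k\to\infty}(x^k,\mu^k) = (x^*,\mu^*)$. By Lebesgue's bounded convergence theorem and since $M(x^k,\mu^k) \to 0$, it follows that $\varphi \le x^* \le \psi$. Clearly $Ax^* + \mu^* = a$ and $(x^*-\psi)(x^*-\varphi)\mu^* = 0$ by (7.5.4). This last equation implies that $\mu^* = 0$ on $\mathcal{I}^* = \{\varphi < x^* < \psi\}$. It remains to show that $\mu^* \ge 0$ on $\mathcal{A}^{*,+} = \{x^* = \psi\}$ and $\mu^* \le 0$ on $\mathcal{A}^{*,-} = \{x^* = \varphi\}$. Let $s \in \mathcal{A}^{*,+}$ be such that $x^k(s)$ and $\mu^k(s)$ converge. Then $\mu^*(s) \ge 0$. If not, then $\mu^*(s) < 0$ and there exists $\bar k$ such that $\mu^k(s) + c(x^k(s)-\psi(s)) \le \frac{\mu^*(s)}{2} < 0$ for all $k \ge \bar k$. Then $s \in \mathcal{I}_k$ and $\mu^{k+1}(s) = 0$ for $k \ge \bar k$, contradicting $\mu^*(s) < 0$. Analogously one shows that $\mu^* \le 0$ on $\mathcal{A}^{*,-}$.

Conditions (7.5.6) and (7.5.7) are satisfied for additive perturbations of the operator $cI$, for example. This can be deduced from the following result.

Theorem 7.9. Assume that $A = cI + K$ with $K \in \mathcal{L}(L^1(\Omega))$ and $\|K\| < c$ and that (7.5.5), (7.5.6), (7.5.7) are satisfied. If
\[
\bar\rho = 2\max\left( \max\Big(\frac{\|K\|}{c}\rho_1,\rho_2\Big),\ \max(\rho_3+\rho_5,\rho_4),\ \frac{\rho_3+\rho_5}{c} \right) < 1,
\]
then the conclusions of the previous theorem are valid.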
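Proof. We follow the proof of Theorem 7.8 and eliminate the overestimate (7.5.14). Let $P = \{x^{k+1}_{\mathcal{I}_k} - \psi > 0\} \cap \mathcal{I}_k$. We find
\[
x^{k+1} - \psi
\begin{cases}
\le \delta x^k_{P \cap \mathcal{I}_{k-1}} & \text{on } P \cap \mathcal{I}_{k-1},\\
= \delta x^k_{P \cap \mathcal{A}_{k-1}^+} & \text{on } P \cap \mathcal{A}_{k-1}^+,\\
= \delta x^k_{P \cap \mathcal{A}_{k-1}^-} + (\varphi-\psi) & \text{on } P \cap \mathcal{A}_{k-1}^-.
\end{cases}
\]
This estimate, together with $A_{\mathcal{I}_k}^{-1} = \frac{1}{c}I - \frac{1}{c}K_{\mathcal{I}_k}A_{\mathcal{I}_k}^{-1}$, implies that
\[
\begin{aligned}
\int_{\mathcal{I}_k} (x^{k+1}-\psi)^+ &\le \int_P \delta x^k + \int_{P \cap \mathcal{A}_{k-1}^-} (\varphi-\psi)\\
&= \int_P A_{\mathcal{I}_k}^{-1}\mu^k_{\mathcal{I}_k} - \int_P A_{\mathcal{I}_k}^{-1}A_{\mathcal{I}_k\mathcal{A}_k}\delta x_{\mathcal{A}_k} + \int_{P \cap \mathcal{A}_{k-1}^-} (\varphi-\psi)\\
&= \frac{1}{c}\int_P \mu^k_{\mathcal{I}_k} + \int_{P \cap \mathcal{A}_{k-1}^-} (\varphi-\psi) - \frac{1}{c}\int_P K_{\mathcal{I}_k}A_{\mathcal{I}_k}^{-1}\mu^k_{\mathcal{I}_k} - \int_P A_{\mathcal{I}_k}^{-1}A_{\mathcal{I}_k\mathcal{A}_k}\delta x_{\mathcal{A}_k}\\
&\le -\frac{1}{c}\int_P K_{\mathcal{I}_k}A_{\mathcal{I}_k}^{-1}\mu^k_{\mathcal{I}_k} - \int_P A_{\mathcal{I}_k}^{-1}A_{\mathcal{I}_k\mathcal{A}_k}\delta x_{\mathcal{A}_k},
\end{aligned}
\]
and hence
\[
\int_{\mathcal{I}_k} (x^{k+1}-\psi)^+ \le \frac{\|K\|}{c}\rho_1 |\mu^k_{\mathcal{I}_k}| + \rho_2 |\delta x_{\mathcal{A}_k}|.
\]
An analogous estimate can be obtained for $\int_{\mathcal{I}_k} (x^{k+1}-\varphi)^-$ and we find
\[
\int_{\mathcal{I}_k} (x^{k+1}-\psi)^+ + \int_{\mathcal{I}_k} (x^{k+1}-\varphi)^- \le \frac{\|K\|}{c}\rho_1 |\mu^k_{\mathcal{I}_k}| + \rho_2 |\delta x_{\mathcal{A}_k}|.
\]
We can now proceed as in the proof of Theorem 7.8.

Example 7.10. We apply Theorem 7.9 to $A = I + K \in \mathcal{L}(L^1(\Omega))$. By Neumann series arguments we find
\[
\bar\rho = 2\max\left( \frac{\gamma}{1-\gamma},\ \max\Big(\gamma + \frac{\gamma^2}{1-\gamma},\ \frac{\gamma}{1-\gamma}\Big) \right) = \frac{2\gamma}{1-\gamma},
\]
where $\gamma = \|K\|$, and $\bar\rho < 1$ if $\|K\| < \frac{1}{3}$. If $A = I + K$ is replaced by $A = cI + K$, then $\bar\rho < 1$ if $\gamma < \frac{c}{2c+1}$, in case $c \ge 1$, and $\bar\rho < 1$ if $\gamma < \frac{c^2}{c+2}$, in case $c \le 1$.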
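The thresholds in Example 7.10 can be traced as follows. The bounds on the $\rho_i$ below are not stated in the text; they are the ones a standard Neumann series estimate for $A_{\mathcal{I}} = cI + K_{\mathcal{I}}$ with $\gamma = \|K\| < c$ would give, and should be read as an assumption of this sketch.

```latex
\begin{align*}
\rho_1 &\le \frac{1}{c-\gamma}, \quad
\rho_2 \le \frac{\gamma}{c-\gamma}, \quad
\rho_3 \le \gamma, \quad
\rho_4 \le \frac{\gamma}{c-\gamma}, \quad
\rho_5 \le \frac{\gamma^2}{c-\gamma}, \\
\rho_3 + \rho_5 &\le \gamma + \frac{\gamma^2}{c-\gamma} = \frac{c\,\gamma}{c-\gamma}, \qquad
\frac{\|K\|}{c}\,\rho_1 \le \frac{\gamma}{c\,(c-\gamma)}, \\
\bar\rho &\le \frac{2\gamma}{c-\gamma}\,\max\Big(1,\; c,\; \frac{1}{c}\Big).
\end{align*}
% c = 1:   2\gamma/(1-\gamma) < 1        <=>  \gamma < 1/3
% c >= 1:  2c\gamma/(c-\gamma) < 1       <=>  \gamma < c/(2c+1)
% c <= 1:  2\gamma/(c(c-\gamma)) < 1     <=>  \gamma < c^2/(c+2)
```

For $c = 1$ all three entries of the maximum coincide, which recovers $\bar\rho = 2\gamma/(1-\gamma)$.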
Example 7.11. Here we consider the finite-dimensional case $A = I + K \in \mathbb{R}^{n\times n}$, where $\mathbb{R}^n$ is endowed with the $\ell^1$-norm. Again Theorem 7.9 is applicable and $\bar\rho < 1$ if $\|K\| < \frac{1}{3}$, where $\|\cdot\|$ denotes the matrix norm subordinate to the $\ell^1$-norm of $\mathbb{R}^n$. Recall that this norm is given by the maximum over the column sums of the absolute values of the matrix.
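The condition of Example 7.11 is easy to test numerically. The following sketch assumes NumPy; the helper name `satisfies_example_7_11` and the sample matrices are illustrative choices made here.

```python
import numpy as np

def column_sum_norm(K):
    """Matrix norm subordinate to the l1-norm: maximum absolute column sum."""
    return np.abs(K).sum(axis=0).max()

def satisfies_example_7_11(K):
    """True if ||K|| < 1/3, the sufficient condition quoted for A = I + K."""
    return column_sum_norm(K) < 1.0 / 3.0

K_small = 0.05 * np.ones((4, 4))   # column sums 0.2
K_large = 0.10 * np.ones((4, 4))   # column sums 0.4

ok_small = satisfies_example_7_11(K_small)
ok_large = satisfies_example_7_11(K_large)
# The hand-written norm agrees with NumPy's built-in induced 1-norm.
same = np.isclose(column_sum_norm(K_small), np.linalg.norm(K_small, 1))
```

The condition is only sufficient, so a matrix that fails the test may still lead to a convergent iteration.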
7.6 Nonlinear control problems with bilateral constraints

In this section we consider a special case of bilaterally constrained problems that were already investigated in the previous section, where the operator $A$ is of the form $T^*T$. This will allow us to obtain improved sufficient conditions for global convergence of the primal-dual active set algorithm. The primary motivation for such problems are optimal control problems, and we therefore slightly change the notation to that which is more common in optimal control. Let $U$ and $Y$ be Hilbert spaces with $U = L^2(\Omega)$, where $\Omega$ is a bounded measurable set in $\mathbb{R}^d$, and let $T : U \to Y$ be a, possibly nonlinear, continuously differentiable, injective mapping with Fréchet derivative denoted by $T'$. Further let $\varphi, \psi \in U$ with $\varphi < \psi$ a.e. in $\Omega$. For $\alpha > 0$ and $z \in Y$ consider
\[
\min_{\varphi \le u \le \psi} J(u) = \frac{1}{2}|T(u)-z|_Y^2 + \frac{\alpha}{2}|u|_U^2.
\tag{7.6.1}
\]
The necessary optimality condition for (7.6.1) is given by
\[
\begin{aligned}
&\alpha u + T'(u)^*(T(u)-z) + \mu = 0,\\
&\mu = \max(0,\mu + \alpha(u-\psi)) + \min(0,\mu + \alpha(u-\varphi)),
\end{aligned}
\tag{7.6.2}
\]
where $(u,\mu) \in U \times U$, and max as well as min are interpreted as pointwise a.e. operations. If the upper or lower constraint is not present, we can set $\psi = \infty$ or $\varphi = -\infty$.

Example 7.12. Let $\Omega \subset \mathbb{R}^n$ be a bounded domain with Lipschitz continuous boundary $\Gamma$, let $\tilde\Omega \subset \Omega$, $\tilde\Gamma \subset \Gamma$ be measurable subsets, and consider the optimal control problem
\[
\min_{\varphi \le u \le \psi} \frac{1}{2}|y-z|_Y^2 + \frac{\alpha}{2}|u|_U^2 \quad \text{subject to} \quad (\nabla y,\nabla v)_\Omega + (y,v)_\Omega = (u,v)_{\tilde\Gamma} \ \text{ for all } v \in H^1(\Omega),
\tag{7.6.3}
\]
where $z \in L^2(\tilde\Omega)$. We set $Y = L^2(\tilde\Omega)$ and $U = L^2(\tilde\Gamma)$. Define $Lu = y$ with $L : L^2(\tilde\Gamma) \to L^2(\Omega)$ as the solution operator to the inhomogeneous Neumann boundary value problem (7.6.3) and set $T = R_{\tilde\Omega}L : L^2(\tilde\Gamma) \to L^2(\tilde\Omega)$, where $R_{\tilde\Omega} : L^2(\Omega) \to L^2(\tilde\Omega)$ denotes the canonical restriction operator. Then $T^* : L^2(\tilde\Omega) \to L^2(\tilde\Gamma)$ is given by $T^* = R_{\tilde\Gamma}L^*E_{\tilde\Omega}$ with $R_{\tilde\Gamma}$ the restriction operator to $\tilde\Gamma$ and $E_{\tilde\Omega}$ the extension-by-zero operator from $\tilde\Omega$ to $\Omega$. Further the adjoint $L^* : L^2(\Omega) \to L^2(\Gamma)$ of $L$ is given by $L^*\tilde w = \tau_\Gamma p$, where $\tau_\Gamma$ is the Dirichlet trace operator from $H^1(\Omega)$ to $L^2(\Gamma)$ and $p$ is the solution to
\[
(\nabla p,\nabla w) + (p,w) = (\tilde w,w) \quad \text{for all } w \in H^1(\Omega).
\tag{7.6.4}
\]
The fact that $L$ and $L^*$ are adjoint to each other follows by setting $v = p$ in (7.6.3) and $w = y$ in (7.6.4).

We next specify the primal-dual active set algorithm for (7.6.1). The iteration index is denoted by $k$ and an initial choice $(u^0,\mu^0)$ is assumed to be available.
Primal-Dual Active Set Algorithm.
(i) Given $(u^k,\mu^k)$, determine
$\mathcal{A}_k^+ = \{\mu^k + \alpha(u^k-\psi) > 0\}$,
$\mathcal{I}_k = \{\mu^k + \alpha(u^k-\psi) \le 0 \le \mu^k + \alpha(u^k-\varphi)\}$,
$\mathcal{A}_k^- = \{\mu^k + \alpha(u^k-\varphi) < 0\}$.
(ii) Determine $(u^{k+1},\mu^{k+1})$ from
$u^{k+1} = \psi$ on $\mathcal{A}_k^+$, $u^{k+1} = \varphi$ on $\mathcal{A}_k^-$, $\mu^{k+1} = 0$ on $\mathcal{I}_k$,
and
\[
\alpha u^{k+1} + T'(u^{k+1})^*(T(u^{k+1})-z) + \mu^{k+1} = 0.
\tag{7.6.5}
\]
Note that the equations for $(u^{k+1},\mu^{k+1})$ in step (ii) of the algorithm constitute the necessary optimality condition for the auxiliary problem
\[
\begin{cases}
\min\ \frac{1}{2}|T(u)-z|_Y^2 + \frac{\alpha}{2}|u|_U^2 \ \text{ over } u \in U\\
\text{subject to } u = \psi \text{ on } \mathcal{A}_k^+,\ u = \varphi \text{ on } \mathcal{A}_k^-.
\end{cases}
\tag{7.6.6}
\]
The analysis in this section relies on the fact that (7.6.2) can be equivalently expressed as
\[
\begin{cases}
y = T(u),\\
\lambda = -T'(u)^*(y-z),\\
\alpha u - \lambda + \mu = 0,\\
\mu = \max(0,\mu + (u-\psi)) + \min(0,\mu + (u-\varphi)),
\end{cases}
\tag{7.6.7}
\]
where $\lambda = -T'(u)^*(T(u)-z)$ is referred to as the adjoint state. Analogously, for $u^{k+1} \in U$, setting $y^{k+1} = T(u^{k+1})$, $\lambda^{k+1} = -T'(u^{k+1})^*(T(u^{k+1})-z)$, (7.6.5) can equivalently be expressed as
\[
\begin{cases}
y^{k+1} = T(u^{k+1}), \text{ where } u^{k+1} =
\begin{cases}
\psi & \text{on } \mathcal{A}_k^+,\\
\frac{1}{\alpha}\lambda^{k+1} & \text{on } \mathcal{I}_k,\\
\varphi & \text{on } \mathcal{A}_k^-,
\end{cases}\\
\lambda^{k+1} = -T'(u^{k+1})^*(y^{k+1}-z),\\
\alpha u^{k+1} - \lambda^{k+1} + \mu^{k+1} = 0.
\end{cases}
\tag{7.6.8}
\]
In what follows we give conditions which guarantee convergence of the primal-dual active set strategy for linear and certain nonlinear operators $T$ from arbitrary initial data. The convergence proof is based on an appropriately defined functional which decays when evaluated along the iterates of the algorithm. An a priori estimate for the adjoint variable $\lambda$ in (7.6.8) will play an essential role.

To specify the condition alluded to in the above let us consider two consecutive iterates of the algorithm. For every $k = 1, 2, \dots$, the sets $\mathcal{A}_k^+$, $\mathcal{A}_k^-$, and $\mathcal{I}_k$ give a mutually disjoint decomposition of $\Omega$. According to (i) and (ii) in the form (7.6.8) we find
\[
u^{k+1} - u^k =
\begin{cases}
R^k_{\mathcal{A}^+} & \text{on } \mathcal{A}_k^+,\\
\frac{1}{\alpha}(\lambda^{k+1}-\lambda^k) + R^k_{\mathcal{I}} & \text{on } \mathcal{I}_k,\\
R^k_{\mathcal{A}^-} & \text{on } \mathcal{A}_k^-,
\end{cases}
\tag{7.6.9}
\]
where the residual $R^k$ is given by
\[
R^k_{\mathcal{A}^+} =
\begin{cases}
0 & \text{on } \mathcal{A}_{k-1}^+ \cap \mathcal{A}_k^+,\\
\psi - \frac{1}{\alpha}\lambda^k = \psi - u^k < 0 & \text{on } \mathcal{I}_{k-1} \cap \mathcal{A}_k^+,\\
\psi - \varphi < \frac{1}{\alpha}\mu^k & \text{on } \mathcal{A}_{k-1}^- \cap \mathcal{A}_k^+,
\end{cases}
\tag{7.6.10}
\]
\[
R^k_{\mathcal{I}} =
\begin{cases}
\frac{1}{\alpha}\mu^k = \frac{1}{\alpha}\lambda^k - \psi \le 0 & \text{on } \mathcal{A}_{k-1}^+ \cap \mathcal{I}_k,\\
0 & \text{on } \mathcal{I}_{k-1} \cap \mathcal{I}_k,\\
\frac{1}{\alpha}\mu^k = \frac{1}{\alpha}\lambda^k - \varphi \ge 0 & \text{on } \mathcal{A}_{k-1}^- \cap \mathcal{I}_k,
\end{cases}
\tag{7.6.11}
\]
\[
R^k_{\mathcal{A}^-} =
\begin{cases}
\varphi - \psi > \frac{1}{\alpha}\mu^k & \text{on } \mathcal{A}_{k-1}^+ \cap \mathcal{A}_k^-,\\
\varphi - \frac{1}{\alpha}\lambda^k = \varphi - u^k > 0 & \text{on } \mathcal{I}_{k-1} \cap \mathcal{A}_k^-,\\
0 & \text{on } \mathcal{A}_{k-1}^- \cap \mathcal{A}_k^-.
\end{cases}
\tag{7.6.12}
\]
Here $R^k$ denotes the function defined on $\Omega$ whose restrictions to $\mathcal{A}_k^+$, $\mathcal{I}_k$, $\mathcal{A}_k^-$ coincide with $R^k_{\mathcal{A}^+}$, $R^k_{\mathcal{I}}$, and $R^k_{\mathcal{A}^-}$. We shall utilize the following a priori estimate:
\[
\begin{cases}
\text{There exists } \rho < \alpha \text{ such that}\\
|\lambda^{k+1}-\lambda^k|_U < \rho\,|R^k|_U \ \text{ for every } k = 1, 2, \dots.
\end{cases}
\tag{7.6.13}
\]
Sufficient conditions for (7.6.13) will be given at the end of this section. The convergence proof will be based on the following merit functional $M : U \times U \to \mathbb{R}$ given by
\[
M(u,\mu) = \alpha^2 \int_\Omega \big(|(u-\psi)^+|^2 + |(\varphi-u)^+|^2\big)\,dx + \int_{\mathcal{A}^+(u)} |\mu^-|^2\,dx + \int_{\mathcal{A}^-(u)} |\mu^+|^2\,dx,
\]
where $\mathcal{A}^+(u) = \{x : u \ge \psi\}$ and $\mathcal{A}^-(u) = \{x : u \le \varphi\}$. Note that the iterates $(u^k,\mu^k) \in U \times U$ satisfy
\[
\mu^k(u^k-\psi)(\varphi-u^k)(x) = 0 \quad \text{for a.e. } x \in \Omega,
\tag{7.6.14}
\]
and hence at most one of the integrands of $M(u^k,\mu^k)$ can be strictly positive at $x \in \Omega$.

Theorem 7.13. Assume that (7.6.13) holds for the iterates of the primal-dual active set strategy. Then $M(u^{k+1},\mu^{k+1}) \le \alpha^{-2}\rho^2 M(u^k,\mu^k)$ for every $k = 1, \dots$. Moreover there exist $(u^*,\mu^*) \in U \times U$, such that $\lim_{k\to\infty}(u^k,\mu^k) = (u^*,\mu^*)$ and $(u^*,\mu^*)$ satisfies (7.6.2).
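Proof. From (7.6.8) we have
\[
\mu^{k+1} = \lambda^{k+1} - \alpha\psi \ \text{ on } \mathcal{A}_k^+, \qquad u^{k+1} = \frac{1}{\alpha}\lambda^{k+1} \ \text{ on } \mathcal{I}_k, \qquad \mu^{k+1} = \lambda^{k+1} - \alpha\varphi \ \text{ on } \mathcal{A}_k^-.
\]
Using step (ii) of the algorithm in the form of (7.6.8) implies that
\[
\mu^{k+1} = \lambda^{k+1} - \lambda^k + \lambda^k - \alpha\psi = \lambda^{k+1} - \lambda^k +
\begin{cases}
\mu^k > 0 & \text{on } \mathcal{A}_{k-1}^+ \cap \mathcal{A}_k^+,\\
\alpha(u^k-\psi) > 0 & \text{on } \mathcal{I}_{k-1} \cap \mathcal{A}_k^+,\\
\alpha u^k + \mu^k - \alpha\psi \ge 0 & \text{on } \mathcal{A}_{k-1}^- \cap \mathcal{A}_k^+,
\end{cases}
\]
and therefore
\[
|\mu^{k+1,-}(x)| \le |\lambda^{k+1}(x) - \lambda^k(x)| \quad \text{for } x \in \mathcal{A}_k^+.
\tag{7.6.15}
\]
Analogously one derives
\[
|\mu^{k+1,+}(x)| \le |\lambda^{k+1}(x) - \lambda^k(x)| \quad \text{for } x \in \mathcal{A}_k^-.
\tag{7.6.16}
\]
Moreover
\[
u^{k+1} - \psi = \frac{1}{\alpha}(\lambda^{k+1} - \lambda^k + \lambda^k) - \psi = \frac{1}{\alpha}(\lambda^{k+1} - \lambda^k) +
\begin{cases}
\frac{1}{\alpha}\mu^k \le 0 & \text{on } \mathcal{A}_{k-1}^+ \cap \mathcal{I}_k,\\
u^k - \psi \le 0 & \text{on } \mathcal{I}_{k-1} \cap \mathcal{I}_k,\\
\frac{1}{\alpha}\mu^k + u^k - \psi \le 0 & \text{on } \mathcal{A}_{k-1}^- \cap \mathcal{I}_k,
\end{cases}
\]
which implies that
\[
|(u^{k+1}-\psi)^+(x)| \le \frac{1}{\alpha}|\lambda^{k+1}(x) - \lambda^k(x)| \quad \text{for } x \in \mathcal{I}_k.
\tag{7.6.17}
\]
Analogously one derives that
\[
|(\varphi-u^{k+1})^+(x)| \le \frac{1}{\alpha}|\lambda^{k+1}(x) - \lambda^k(x)| \quad \text{for } x \in \mathcal{I}_k.
\tag{7.6.18}
\]
Due to (ii) of the algorithm we have that
\[
(u^{k+1}-\psi)^+ = (\varphi-u^{k+1})^+ = 0 \quad \text{on } \mathcal{A}_k^+ \cup \mathcal{A}_k^-,
\]
which, together with (7.6.17)–(7.6.18), implies that
\[
|(u^{k+1}-\psi)^+(x)| + |(\varphi-u^{k+1})^+(x)| \le \frac{1}{\alpha}|\lambda^{k+1}(x) - \lambda^k(x)| \quad \text{for } x \in \Omega.
\tag{7.6.19}
\]
From (7.6.14)–(7.6.16) and since $\varphi < \psi$ a.e. on $\Omega$ we find
\[
|\mu^{k+1,-}(x)| \le |\lambda^{k+1}(x) - \lambda^k(x)| \quad \text{for } x \in \mathcal{A}^+(u^{k+1})
\tag{7.6.20}
\]
and
\[
|\mu^{k+1,+}(x)| \le |\lambda^{k+1}(x) - \lambda^k(x)| \quad \text{for } x \in \mathcal{A}^-(u^{k+1}).
\tag{7.6.21}
\]
Combining (7.6.19)–(7.6.21) implies that
\[
M(u^{k+1},\mu^{k+1}) \le \int_\Omega |\lambda^{k+1}(x) - \lambda^k(x)|^2\,dx.
\tag{7.6.22}
\]
Since (7.6.13) is supposed to hold we have $M(u^{k+1},\mu^{k+1}) \le \rho^2 |R^k|_U^2$. Moreover, from the definition of $R^k$ we deduce that
\[
|R^k|_U^2 \le \alpha^{-2} M(u^k,\mu^k),
\tag{7.6.23}
\]
and consequently
\[
M(u^{k+1},\mu^{k+1}) \le \alpha^{-2}\rho^2 M(u^k,\mu^k) \quad \text{for } k = 1, 2, \dots.
\tag{7.6.24}
\]
From (7.6.13), (7.6.23), (7.6.24) it follows that $|\lambda^{k+1}-\lambda^k|_U \le (\frac{\rho}{\alpha})^k \rho\,|R^0|_U$. Thus there exists $\lambda^* \in U$ such that $\lim_{k\to\infty}\lambda^k = \lambda^*$. Note that for $k \ge 1$
\[
\mathcal{A}_k^+ = \{x : \lambda^k(x) > \alpha\psi(x)\}, \quad \mathcal{I}_k = \{x : \alpha\varphi(x) \le \lambda^k(x) \le \alpha\psi(x)\}, \quad \mathcal{A}_k^- = \{x : \lambda^k(x) < \alpha\varphi(x)\},
\]
and hence
\[
\mu^{k+1} = \max(0,\lambda^k - \alpha\psi) + \min(0,\lambda^k - \alpha\varphi) + (\lambda^{k+1}-\lambda^k)\chi_{\mathcal{A}_k^+ \cup \mathcal{A}_k^-}.
\]
Since $\lim_{k\to\infty}(\lambda^{k+1}-\lambda^k) = 0$ and $\lim_{k\to\infty}\lambda^k$ exists, it follows that there exists $\mu^* \in U$ such that $\lim_{k\to\infty}\mu^k = \mu^*$, and
\[
\mu^* = \max(0,\lambda^* - \alpha\psi) + \min(0,\lambda^* - \alpha\varphi).
\tag{7.6.25}
\]
From the last equation in (7.6.8) it follows that there exists $u^*$ such that $\lim_{k\to\infty}u^k = u^*$ and $\alpha u^* - \lambda^* + \mu^* = 0$. Combined with (7.6.25) the pair $(u^*,\mu^*)$ satisfies the complementarity condition given by the second equation in (7.6.2). Passing to the limit with respect to $k$ in (7.6.5) we obtain that the first equation in (7.6.2) is satisfied by $(u^*,\mu^*)$.

We turn to the discussion of (7.6.13) and consider the linear case first.

Proposition 7.14. If $T$ is linear and $\|T\|^2_{\mathcal{L}(U,Y)} < \alpha$, then (7.6.13) holds.

Proof. From (7.6.8) and (7.6.9) we have, with $\delta u = u^{k+1}-u^k$, $\delta y = y^{k+1}-y^k$, and $\delta\lambda = \lambda^{k+1}-\lambda^k$,
\[
\begin{cases}
\delta u = R^k + \frac{1}{\alpha}\,\delta\lambda\,\chi_{\mathcal{I}_k},\\
T\delta u = \delta y,\\
T^*\delta y + \delta\lambda = 0.
\end{cases}
\tag{7.6.26}
\]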
Taking the inner product in $L^2$ with $\delta\lambda$ in the first equation we arrive at $(\delta u,\delta\lambda) \ge (R^k,\delta\lambda)$, and taking the inner product with $\delta u$ in the third equation, $|\delta y|^2 + (\delta\lambda,\delta u) = 0$. Combining these two relations implies that $|\delta y|^2 \le -(R^k,\delta\lambda) \le |R^k|\,|\delta\lambda|$. Utilizing the third equation in (7.6.26) in this last inequality and the fact that the norms of $T$ and $T^*$ coincide, we have $|\delta\lambda| \le \|T^*\|^2 |R^k|$, from which the desired estimate follows.

We now turn to a particular case when (7.6.13) holds for a nonlinear operator $T$. Let $\Omega$ be a bounded domain in $\mathbb{R}^n$, $n = 2$ or $3$, with smooth boundary $\partial\Omega$. Further let $\phi : \mathbb{R} \to \mathbb{R}$ be a monotone mapping with locally Lipschitzian derivative, satisfying $\phi(0) = 0$, and such that the substitution operator determined by $\phi$ maps $H^1(\Omega)$ into $L^2(\Omega)$. We choose $U = Y = L^2(\Omega)$ and define $T(u) = y$ as the solution operator to
\[
-\Delta y + \phi(y) = u \ \text{ in } \Omega, \qquad y = 0 \ \text{ on } \partial\Omega,
\tag{7.6.27}
\]
where $\Delta$ denotes the Laplacian. The adjoint variable $\lambda$ is the solution to
\[
-\Delta\lambda + \phi'(y)\lambda = -(y-z) \ \text{ in } \Omega, \qquad \lambda = 0 \ \text{ on } \partial\Omega.
\tag{7.6.28}
\]
Let $(u^0,\mu^0)$ be an arbitrary initialization and let $\mathcal{U} = \{u^k : k = 1, 2, \dots\}$ denote the set of iterates generated by the primal-dual active set algorithm. Since these iterates are solutions to the auxiliary problems (7.6.6), it follows that for every $\bar\alpha > 0$ the set $\mathcal{U}$ is bounded in $L^2(\Omega)$ uniformly with respect to $\alpha \ge \bar\alpha$. By monotone operator theory and regularity theory of elliptic partial differential equations it follows that the set of primal states $\{y^k = y(u^k) : k = 1, 2, \dots\}$ and adjoint states $\{\lambda^k = \lambda(y(u^k)) : k = 1, 2, \dots\}$ are bounded subsets of $L^\infty(\Omega)$; see [Tr]. Let $C$ denote this bound and let $L_C$ denote the Lipschitz constant of $\phi'$ on the ball $B_C(0)$ with center $0$ and radius $C$ in $\mathbb{R}$. Denote by $H_0^1(\Omega) = \{u \in H^1(\Omega) : u = 0 \text{ on } \partial\Omega\}$ the Hilbert space endowed with norm $|\nabla u|_{L^2}$ and let $\kappa$ stand for the embedding constant from $H_0^1(\Omega)$ into $L^2(\Omega)$.

Proposition 7.15. Assume that $0 < \frac{(1+C L_C)\kappa^4}{\alpha-(1+C L_C)\kappa^4} < 1$, where $\alpha \ge \bar\alpha$. Then (7.6.13) holds for the mapping $T$ determined by the solution operator to (7.6.27).

Proof. From (7.6.8) we have
\[
-\Delta(y^{k+1}-y^k) + \phi(y^{k+1}) - \phi(y^k) = \frac{1}{\alpha}(\lambda^{k+1}-\lambda^k)\chi_{\mathcal{I}_k} + R^k,
\tag{7.6.29}
\]
\[
-\Delta(\lambda^{k+1}-\lambda^k) + \phi'(y^{k+1})(\lambda^{k+1}-\lambda^k) + (\phi'(y^{k+1}) - \phi'(y^k))\lambda^k + y^{k+1} - y^k = 0,
\tag{7.6.30}
\]
both Laplacians with homogeneous Dirichlet boundary conditions. Taking the inner product of (7.6.29) with $y^{k+1}-y^k$ we have, using monotonicity of $\phi$,
\[
|y^{k+1}-y^k|_1 \le \frac{\kappa^2}{\alpha}|\lambda^{k+1}-\lambda^k|_1 + |R^k|_{-1},
\tag{7.6.31}
\]
where $|\cdot|_1$ and $|\cdot|_{-1}$ denote the norms in $H_0^1(\Omega)$ and $H^{-1}(\Omega)$, respectively. Note that $\phi'(y^{k+1}) \ge 0$. Hence from (7.6.30) we find
\[
|\lambda^{k+1}-\lambda^k|_1^2 \le C L_C |y^{k+1}-y^k|_{L^2}|\lambda^{k+1}-\lambda^k|_{L^2} + |(y^{k+1}-y^k,\lambda^{k+1}-\lambda^k)| \le (1+C L_C)\kappa^2 |y^{k+1}-y^k|_1 |\lambda^{k+1}-\lambda^k|_1.
\]
Thus, $|\lambda^{k+1}-\lambda^k|_1 \le (1+C L_C)\kappa^2 |y^{k+1}-y^k|_1$ and hence from (7.6.31)
\[
|y^{k+1}-y^k|_1 \le \frac{\alpha}{\alpha-(1+C L_C)\kappa^4}|R^k|_{-1}.
\]
It thus follows that
\[
|\lambda^{k+1}-\lambda^k|_{L^2} \le \frac{\alpha(1+C L_C)\kappa^4}{\alpha-(1+C L_C)\kappa^4}|R^k|_{L^2}.
\]
This implies (7.6.13) with $\rho = \frac{\alpha(1+C L_C)\kappa^4}{\alpha-(1+C L_C)\kappa^4}$.
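For a linear operator $T$ given by a matrix, the algorithm of this section and the merit functional $M$ can be sketched as below. This is an illustration under stated assumptions rather than the authors' code: $U = \mathbb{R}^n$ with the Euclidean norm replaces $L^2(\Omega)$, integrals become sums, and the names `merit`, `pdas_control`, the zero initialization, the stopping rule, and the test data are choices made here.

```python
import numpy as np

def merit(u, mu, phi, psi, alpha):
    """Discrete analogue of M(u, mu); integrals are replaced by sums."""
    viol = np.maximum(u - psi, 0.0) ** 2 + np.maximum(phi - u, 0.0) ** 2
    mu_neg = np.minimum(mu, 0.0) ** 2          # |mu^-|^2
    mu_pos = np.maximum(mu, 0.0) ** 2          # |mu^+|^2
    return alpha**2 * viol.sum() + mu_neg[u >= psi].sum() + mu_pos[u <= phi].sum()

def pdas_control(T, z, phi, psi, alpha, max_iter=100):
    """Primal-dual active set sketch for min 0.5|Tu - z|^2 + 0.5*alpha|u|^2."""
    n = T.shape[1]
    H = alpha * np.eye(n) + T.T @ T            # alpha*I + T^* T
    f = T.T @ z
    u, mu = np.zeros(n), np.zeros(n)           # assumed initialization
    prev, history = None, []
    for k in range(max_iter):
        upper = mu + alpha * (u - psi) > 0     # A_k^+
        lower = mu + alpha * (u - phi) < 0     # A_k^-
        if prev is not None and np.array_equal(upper, prev[0]) \
                and np.array_equal(lower, prev[1]):
            break                              # active sets repeated: done
        prev = (upper, lower)
        act = upper | lower
        inact = ~act                           # I_k
        u = np.where(upper, psi, np.where(lower, phi, 0.0))
        if inact.any():
            r = f[inact] - H[np.ix_(inact, act)] @ u[act]
            u[inact] = np.linalg.solve(H[np.ix_(inact, inact)], r)
        mu = f - H @ u                         # from alpha*u + T^*(Tu - z) + mu = 0
        mu[inact] = 0.0
        history.append(merit(u, mu, phi, psi, alpha))
    return u, mu, history

# Hypothetical data with ||T||^2 < alpha, the condition of Proposition 7.14.
T = np.array([[0.5, 0.2], [0.1, 0.4], [0.0, 0.3]])
z = np.array([4.0, -3.0, 1.0])
phi, psi, alpha = -np.ones(2), np.ones(2), 1.0
u, mu, history = pdas_control(T, z, phi, psi, alpha)
```

Here `np.linalg.norm(T, 2) ** 2` gives the squared spectral norm, which plays the role of $\|T\|^2_{\mathcal{L}(U,Y)}$ in this discrete setting, and `history` records the merit values along the iterates.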
Chapter 8
Semismooth Newton Methods I
8.1 Introduction
In this chapter we study semismooth Newton methods for solving nonlinear nonsmooth equations. These investigations are motivated by complementarity problems, variational inequalities, and optimal control problems with control or state constraints, for example. The operator equation for which we desire to find a solution is typically Lipschitz continuous but not $C^1$ regular. We shall also establish the relationship between the semismooth Newton method and the primal-dual active set method that was discussed in Chapter 7. Since semismooth Newton methods are not widely known even for finite-dimensional problems, we consider the finite-dimensional case before we turn to problems in infinite dimensions. In fact, these two cases are distinctly different. In finite dimensions we have Rademacher's theorem, which states that every locally Lipschitz continuous function is differentiable almost everywhere. This result has no counterpart for functions between infinite-dimensional function spaces.

As an example consider the nonlinear complementarity problem
\[
g(x) \le 0, \quad x \le \psi, \quad \text{and} \quad (g(x), x-\psi)_{\mathbb{R}^n} = 0,
\]
where $g : \mathbb{R}^n \to \mathbb{R}^n$ and $\psi \in \mathbb{R}^n$. It can be expressed equivalently as the problem of finding a root to the following equation:
\[
F(x) = g(x) + \max(0, -g(x) + x - \psi) = \max(g(x), x-\psi) = 0,
\tag{8.1.1}
\]
where the max operation just like the inequalities must be interpreted componentwise. Note that $F$ is a locally Lipschitz continuous function if $g$ is locally Lipschitzian, but it is in general not $C^1$ even if $g$ is $C^1$. A function is called locally Lipschitz continuous if it is Lipschitz continuous on every bounded subset of its domain.

Let us introduce some of the key concepts for the finite-dimensional case. For a locally Lipschitz continuous function $F : \mathbb{R}^m \to \mathbb{R}^n$ let $D_F$ denote the set of points at which $F$ is differentiable. For $x \in \mathbb{R}^m$ we define $\partial_B F(x)$ as
\[
\partial_B F(x) = \Big\{ J : J = \lim_{x_i \to x,\ x_i \in D_F} \nabla F(x_i) \Big\},
\tag{8.1.2}
\]
and we denote by $\partial F(x)$ the generalized derivative at $x$ introduced by Clarke [Cla], i.e.,
\[
\partial F(x) = \operatorname{co} \partial_B F(x),
\tag{8.1.3}
\]
where co stands for the convex hull. A generalized Newton iteration for solving the nonlinear equation $F(x) = 0$ with $F : \mathbb{R}^n \to \mathbb{R}^n$ can be defined by
\[
x^{k+1} = x^k - V_k^{-1}F(x^k), \quad \text{where } V_k \in \partial_B F(x^k).
\tag{8.1.4}
\]
The reason for using $\partial_B F$ rather than $\partial F$ in (8.1.4) is the following: For the convergence analysis we shall require that all $V \in \partial_B F(x^*)$ are nonsingular, where $x^*$ is the sought solution to $F(x) = 0$. This is more readily satisfied for $\partial_B F$ than for $\partial F$, as can be seen from $F(x) = |x|$, for example. In this case $0 \in \partial F(0)$ but $0 \notin \partial_B F(0)$. We also introduce the coordinatewise operation
\[
\partial_b F(x) = \otimes_{i=1}^m \partial_B F_i(x),
\]
where $F_i$ is the $i$th coordinate of $F$. From the definition of $\partial_B F(x)$ it follows that $\partial_B F(x) \subset \partial_b F(x)$. For $F$ given in (8.1.1) we have $\partial_B F(x) = \partial_b F(x)$ if $-g'(x) + I$ is surjective. Moreover, in Section 8.4 it will be shown that if we select
\[
V(x)\,d =
\begin{cases}
d & \text{if } -g(x) + x - \psi > 0,\\
g'(x)\,d & \text{if } -g(x) + x - \psi \le 0,
\end{cases}
\]
then the generalized Newton method reduces to the primal-dual active set method.

Local convergence of $\{x^k\}$ to $x^*$, a solution of $F(x) = 0$, is based on the following concepts. The generalized Jacobians $V_k \in \partial_B F(x^k)$ are selected so that their inverses $V(x^k)^{-1}$ are uniformly bounded and that they satisfy the condition
\[
|F(x^*+h) - F(x^*) - Vh| = o(|h|),
\tag{8.1.5}
\]
where $V = V(x^*+h) \in \partial_B F(x^*+h)$, for $h$ in a neighborhood of $0$. Then from (8.1.5) with $h = x^k - x^*$ and $V_k = V(x^k)$ we have
\[
|x^{k+1} - x^*| = |V_k^{-1}(F(x^k) - F(x^*) - V_k(x^k - x^*))| = o(|x^k - x^*|).
\tag{8.1.6}
\]
Thus, there exists a neighborhood $B(x^*,\rho)$ of $x^*$ such that if $x^0 \in B(x^*,\rho)$, then $x^k \in B(x^*,\rho)$ and $x^k$ converges to $x^*$ superlinearly.

This discussion will be made rigorous for $F$ mapping between finite-dimensional spaces, as above, as well as for the infinite-dimensional case. For the finite-dimensional case we shall rely on the notion of semismooth functions.

Definition 8.1. $F : \mathbb{R}^m \to \mathbb{R}^n$ is called semismooth at $x$ if $F$ is locally Lipschitz at $x$ and
\[
\lim_{V \in \partial F(x+th'),\ h' \to h,\ t \to 0^+} Vh' \quad \text{exists for all } h \in \mathbb{R}^m.
\tag{8.1.7}
\]
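The generalized Newton iteration (8.1.4) applied to the complementarity function (8.1.1), with the selection of $V(x)$ quoted above, can be sketched as follows. The sketch assumes NumPy; the function name `semismooth_newton`, the affine choice $g(x) = Ax - b$, and the data are illustrative assumptions made here.

```python
import numpy as np

def semismooth_newton(g, dg, psi, x0, tol=1e-12, max_iter=50):
    """Generalized Newton sketch for F(x) = max(g(x), x - psi) = 0."""
    x = np.asarray(x0, dtype=float).copy()
    n = len(x)
    for k in range(max_iter):
        gx = g(x)
        F = np.maximum(gx, x - psi)
        if np.linalg.norm(F) <= tol:
            return x, k
        active = (x - psi) > gx              # rows where -g(x) + x - psi > 0
        V = np.array(dg(x), dtype=float)     # rows of g'(x) elsewhere
        V[active, :] = np.eye(n)[active, :]  # identity rows on the active set
        x = x - np.linalg.solve(V, F)
    return x, max_iter

# Hypothetical affine example g(x) = A x - b.
A = np.array([[2.0, -1.0], [-1.0, 2.0]])
b = np.array([3.0, 0.5])
psi = np.array([1.0, 1.0])
x_star, iters = semismooth_newton(lambda x: A @ x - b, lambda x: A, psi, np.zeros(2))
```

Each step fixes $x_i = \psi_i$ on the rows flagged as active and solves $g(x) = 0$ in the remaining rows, which is the active set structure referred to in the text.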
Semismoothness was originally introduced by Mifflin [Mif] for scalar-valued functions. Convex functions and real-valued $C^1$ functions are examples of such semismooth functions. Definition 8.1 is due to Qi and Sun [Qi, QiSu]. It will be shown that if $F$ is semismooth at $x^*$, condition (8.1.5) holds. Due to the fact that the notion of Clarke derivative is not available in infinite dimensions, Definition 8.1 does not allow a direct generalization to the infinite-dimensional case. Rather, the notion of Newton differentiability related to the property expressed in (8.1.5) was developed in [CNQ, HiIK]. Alternatively, in [Ulb] the infinite-dimensional case of mappings into $L^p$ spaces was treated by considering the superposition of mappings $F = \Psi(G)$, with $G$ a $C^1$ mapping into $L^r$ and $\Psi$ the substitution operator $(\Psi y)(s) = \psi(y(s))$ for $y \in L^r$, where $\psi$ is a semismooth function between finite-dimensional spaces.

As an alternative to (8.1.4) the increment $d^k$ for a generalized Newton method $x^{k+1} = x^k + d^k$ can be defined as the solution to
\[
F(x^k) + F'(x^k; d) = 0,
\tag{8.1.8}
\]
where $F'(x; d)$ denotes the directional derivative of $F$ at $x$ in direction $d$. Note that it may be a nontrivial task to solve (8.1.8) for $d^k$. This method was investigated in [Pan1, Pan2]. We shall return to (8.1.8) in the context of globalization of the method specified by (8.1.4).

In Section 8.2 we present the finite-dimensional theory for Newton's method for semismooth functions. Section 8.3 is devoted to the discussion of Newton differentiability and solving nonsmooth equations in Banach spaces. In Section 8.4 we exhibit the relationship between the primal-dual active set method and semismooth Newton methods. Section 8.5 is devoted to a class of nonlinear complementarity problems. In Section 8.6 we discuss applications where, for different reasons, semismooth Newton methods are not directly applicable but rather a regularization is necessary as, for instance, in the case of state-constrained optimal control problems.

8.2 Semismooth functions in finite dimensions

8.2.1 Basic concepts and the semismooth Newton algorithm

In this section we discuss properties of semismooth functions and analyze convergence of the generalized Newton method (8.1.4). We follow quite closely the work by Qi and Sun [QiSu]. First we describe the relationship of semismoothness to directional differentiability. A function $F : \mathbb{R}^m \to \mathbb{R}^n$ is called directionally differentiable at $x \in \mathbb{R}^m$ if
\[
\lim_{t \to 0^+} \frac{F(x+th) - F(x)}{t} = F'(x; h)
\]
exists for all $h \in \mathbb{R}^m$. Further $F$ is called Bouligand-differentiable (B-differentiable) [Ro4] at $x$ if it is directionally differentiable at $x$ and
\[
\lim_{h \to 0} \frac{F(x+h) - F(x) - F'(x; h)}{|h|} = 0.
\]
A locally Lipschitzian function $F$ is B-differentiable at $x$ if and only if it is directionally differentiable at $x$; see [Sha] and the references therein.
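The two notions can be illustrated numerically for the componentwise absolute value at the origin, where $F'(0; h) = |h|$. The sketch below assumes NumPy; the helper names `directional_derivative` and `b_quotient`, the step size `t`, and the chosen direction are illustrative assumptions made here.

```python
import numpy as np

def directional_derivative(F, x, h, t=1e-8):
    """One-sided difference quotient (F(x + t h) - F(x)) / t for small t > 0."""
    return (F(x + t * h) - F(x)) / t

def b_quotient(F, x, h, dF_xh):
    """|F(x + h) - F(x) - F'(x; h)| / |h|, the quotient in the B-differentiability limit."""
    return np.linalg.norm(F(x + h) - F(x) - dF_xh) / np.linalg.norm(h)

F = np.abs                      # componentwise |x|, not differentiable at 0
x = np.zeros(2)
h = np.array([1.0, -2.0])

d_plus = directional_derivative(F, x, h)    # approximates |h|
d_minus = directional_derivative(F, x, -h)  # also approximates |h|: not linear in h
q = b_quotient(F, x, 1e-3 * h, np.abs(1e-3 * h))
```

Both one-sided quotients approximate $|h|$, so $F'(0;\cdot)$ is positively homogeneous but not linear, while the B-differentiability quotient vanishes for this example.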
Theorem 8.2. (i) Suppose that $F : \mathbb{R}^m \to \mathbb{R}^n$ is locally Lipschitz continuous and directionally differentiable at $x$. Then $F'(x;\cdot)$ is Lipschitz continuous and for every $h$, there exists a $V \in \partial F(x)$ such that
\[
F'(x; h) = Vh.
\tag{8.2.1}
\]
(ii) If $F$ is locally Lipschitz continuous, then the following statements are equivalent.
(1) $F$ is semismooth at $x$.
(2) $F$ is directionally differentiable at $x$ and for every $V \in \partial F(x+h)$, $Vh - F'(x; h) = o(|h|)$ as $h \to 0$.
(3) $\displaystyle\lim_{x+h \in D_F,\ |h| \to 0} \frac{F'(x+h; h) - F'(x; h)}{|h|} = 0$.

Proof. (i) For $h, h' \in \mathbb{R}^m$ we have
\[
|F'(x; h) - F'(x; h')| = \lim_{t \to 0^+} \left| \frac{F(x+th) - F(x+th')}{t} \right| \le L\,|h - h'|,
\tag{8.2.2}
\]
where $L$ is the Lipschitz constant of $F$ in a neighborhood of $x$. Thus $F'(x;\cdot)$ is Lipschitz continuous at $x$. Since
\[
F(y) - F(x) \in \operatorname{co} \partial F([x,y])(y-x)
\tag{8.2.3}
\]
for all $x, y \in \mathbb{R}^m$ (see [Cla, p. 72]), there exist a sequence $\{t_k\}$, with $t_k \to 0^+$, and $V_k \in \operatorname{co} \partial F([x, x+t_k h])$ such that
\[
F'(x; h) = \lim_{k \to \infty} V_k h.
\]
Since $F$ is locally Lipschitz, the sequence $\{V_k\}$ is bounded, and there exists a subsequence of $V_k$, denoted by the same symbol, such that $V_k \to V$. Moreover $\partial F$ is closed at $x$; i.e., $x_i \to x$ and $Z_i \to Z$, with $Z_i \in \partial F(x_i)$, imply that $Z \in \partial F(x)$ [Cla, p. 70], and hence $V \in \partial F(x)$. Thus $F'(x; h) = Vh$, as desired.

(ii) We turn to verify the equivalence of (1)–(3).

(1) → (2): First we show that $F'(x; h)$ exists and that
\[
F'(x; h) = \lim_{V \in \partial F(x+th),\ t \to 0^+} Vh.
\tag{8.2.4}
\]
Since $F$ is locally Lipschitz, $\{\frac{F(x+th)-F(x)}{t} : t \to 0\}$ is bounded. Thus there exists a sequence $t_i \to 0^+$ and $\ell \in \mathbb{R}^n$ such that
\[
\lim_{i \to \infty} \frac{F(x+t_i h) - F(x)}{t_i} = \ell.
\]
We argue that $\ell$ equals the limit in (8.2.4). By (8.2.3)
\[
\frac{F(x+t_i h) - F(x)}{t_i} \in \operatorname{co}\big(\partial F([x, x+t_i h])\big)\,h = \operatorname{co}\big(\partial F([x, x+t_i h])\,h\big).
\]
The Carathéodory theorem implies that for each $i$ there exist $t_i^k \in [0,t_i]$, $\lambda_i^k \in [0,1]$ with $\sum_{k=0}^n \lambda_i^k = 1$, and $V_i^k \in \partial F(x+t_i^k h)$, where $k = 0, \dots, n$, such that
\[
\frac{F(x+t_i h) - F(x)}{t_i} = \sum_{k=0}^n \lambda_i^k V_i^k h.
\]
By passing to subsequences such that $\lim_{i\to\infty} \lambda_i^k = \lambda^k$, for $k = 0, \dots, n$, we have
\[
\ell = \sum_{k=0}^n \lim_{i\to\infty} \lambda_i^k \lim_{i\to\infty} V_i^k h = \sum_{k=0}^n \lambda^k \lim_{V \in \partial F(x+th),\ t \to 0^+} Vh = \lim_{V \in \partial F(x+th),\ t \to 0^+} Vh.
\]
Next we prove that the limit in (8.2.4) is uniform for all $h$ with $|h| = 1$. This implies (2). If the claimed uniform convergence in (8.2.4) does not hold, then there exists $\epsilon > 0$, and sequences $\{h_k\}$ in $\mathbb{R}^m$ with $|h_k| = 1$, $\{t_k\}$ with $t_k \to 0^+$, and $V_k \in \partial F(x+t_k h_k)$ such that
\[
|V_k h_k - F'(x; h_k)| \ge 2\epsilon.
\]
Passing to a subsequence we may assume that $h_k$ converges to some $h \in \mathbb{R}^m$. By Lipschitz continuity of $F'(x; h)$ with respect to $h$ we have
\[
|V_k h_k - F'(x; h)| \ge \epsilon
\]
for all $k$ sufficiently large. This contradicts the semismoothness of $F$ at $x$.

(2) → (1): Suppose that $F$ is not semismooth at $x$. Then there exist $\epsilon > 0$, $h \in \mathbb{R}^m$, and sequences $\{h_k\}$ in $\mathbb{R}^m$ and $\{t_k\}$ satisfying $h_k \to h$, $t_k \to 0^+$, and $V_k \in \partial F(x+t_k h_k)$ such that
\[
|V_k h_k - F'(x; h)| \ge 2\epsilon.
\]
Since $F'(x;\cdot)$ is Lipschitz continuous,
\[
|V_k h_k - F'(x; h_k)| \ge \epsilon
\]
for sufficiently large $k$. This contradicts assumption (2).

(2) → (3) follows from (8.2.1).

(3) → (2): For arbitrary $\epsilon > 0$ there exists $\delta > 0$ such that for all $h$ with $|h| < \delta$ and $x+h \in D_F$
\[
|F'(x+h; h) - F'(x; h)| \le \epsilon\,|h|.
\tag{8.2.5}
\]
Let $V \in \partial F(x+h)$ and $|h| < \frac{\delta}{2}$ with $h \ne 0$. By (8.1.3)
\[
Vh \in \operatorname{co}\Big\{ \lim_{h' \to h,\ x+h' \in D_F} F'(x+h'; h) \Big\}.
\]
By the Carathéodory theorem, there exist $\lambda^k \ge 0$ with $\sum_{k=0}^n \lambda^k = 1$ and $h^k$, $k = 0, \dots, n$, satisfying $x+h^k \in D_F$ and $|h^k - h| \le \min(\frac{\delta}{2}, \frac{\epsilon}{L}|h|)$, where $L$ is the Lipschitz constant of $F$ near $x$, such that
\[
\Big| Vh - \sum_{k=0}^n \lambda^k F'(x+h^k; h) \Big| \le \epsilon\,|h|.
\tag{8.2.6}
\]
From (8.2.5) and (8.2.2) we find
n
k
k
λ F (x + h ; h) − F (x; h)
k=0
≤
n k=0
≤
n
λk |F (x + hk ; h) − F (x + hk ; hk )| + |F (x + hk ; hk ) − F (x; hk )| +|F (x; hk ) − F (x; h)| λk (2L|hk − h| + |hk |) ≤ 4 |h|.
k=0
It thus follows from (8.2.6) that |V h − F (x + h; h)| ≤ 5 |h|. Since > 0 is arbitrary, (2) follows. Theorem 8.3. Let F : Rn → Rn be locally Lipschitz continuous, let x ∈ Rn , and suppose that all V ∈ ∂B F (x) are nonsingular. Then there exist a neighborhood N of x and a constant C such that for all y ∈ N and V ∈ ∂B F (y) |V −1 | ≤ C.
(8.2.7)
Proof. First, we claim that there exist a neighborhood N of x and a constant C such that for all y ∈ DF ∩ N , ∇F (y) is nonsingular and |∇F (y)−1 | ≤ C.
(8.2.8)
If this claim is not true, then there exists a sequence y k → x, y k ∈ DF , such that either all ∇F (y k ) are singular or |(∇F (y k ))−1 | → ∞. Since F is locally Lipschitz the set {∇F (y k ) : k = 1, . . .} is bounded. Thus there exists a subsequence of ∇F (y k ) that converges to some V . Then V must be singular and V ∈ ∂B F (x). This contradicts the assumption and there exists a neighborhood N of x such that (8.2.8) holds for all y ∈ DF ∩N . Moreover (8.2.7) follows from (8.1.2), (8.2.8), and continuity of the norm. Lemma 8.4. Suppose that F : Rn → Rn is semismooth at a solution x ∗ of F (x) = 0 and that all V ∈ ∂B F (x ∗ ) are nonsingular. Then there exist a neighborhood N of x ∗ and : R+ → R+ with limt→0+ (t) = 0 monotonically such that |x − V −1 F (x) − x ∗ | ≤ (|x − x ∗ |)|x − x ∗ |, |F (x − V −1 F (x))| ≤ (|x − x ∗ |)|F (x)| for all V ∈ ∂B F (x) and x ∈ N . Proof. From Theorem 8.3 there exist a neighborhood N of x ∗ and a constant C such that |V −1 | ≤ C for all V ∈ ∂B F (x) with x ∈ N . Thus, if x ∈ N , then it follows from
Theorem 8.2 (2) and B-differentiability of $F$ at $x^*$, which is implied by semismoothness of $F$ at $x^*$, that
$$\begin{aligned}
|x - V^{-1}F(x) - x^*| &\le |V^{-1}|\,|F(x) - F(x^*) - V(x - x^*)| \\
&\le |V^{-1}| \big( |F(x) - F(x^*) - F'(x^*; x - x^*)| + |F'(x^*; x - x^*) - V(x - x^*)| \big) \\
&\le \epsilon(|x - x^*|)\,|x - x^*|.
\end{aligned}$$
This implies the first claim. Let $\hat{x} = x - V^{-1}F(x)$. Since $F$ is B-differentiable at $x^*$ we have
$$|F(\hat{x})| \le |F'(x^*; \hat{x} - x^*)| + \epsilon(|\hat{x} - x^*|)\,|\hat{x} - x^*|.$$
From the first claim we obtain, for a possibly redefined function $\epsilon$,
$$|F(\hat{x})| \le \epsilon\,(L + \epsilon)\,|x - x^*|,$$
where $\epsilon = \epsilon(|x - x^*|)$ and $L$ is the Lipschitz constant of $F$ at $x^*$. Since
$$|x - x^*| \le |\hat{x} - x| + |\hat{x} - x^*| \le |V^{-1}F(x)| + |\hat{x} - x^*| \le C|F(x)| + \epsilon|x - x^*|,$$
we find
$$|x - x^*| \le \frac{C}{1 - \epsilon}\,|F(x)|$$
for all $x$ sufficiently close to $x^*$ and hence
$$|F(\hat{x})| \le \frac{C\,\epsilon\,(L + \epsilon)}{1 - \epsilon}\,|F(x)|.$$
This implies the second claim.

We are now prepared for the local superlinear convergence result that was announced in the introduction of this chapter.

Theorem 8.5 (Superlinear Convergence). Suppose that $F : \mathbb{R}^n \to \mathbb{R}^n$ is semismooth at a solution $x^*$ of $F(x) = 0$ and that all $V \in \partial_B F(x^*)$ are nonsingular. Then the iterates
$$x^{k+1} = x^k - V_k^{-1}F(x^k), \qquad V_k \in \partial_B F(x^k),$$
for $k = 0, 1, \dots$ are well defined and converge to $x^*$ superlinearly if $x^0$ is chosen sufficiently close to $x^*$. Moreover, $|F(x^k)|$ decreases superlinearly to 0.

Proof. We proceed by induction and suppose that $x^0 \in N$ and $\epsilon(|x^0 - x^*|) \le 1$ with $N$, $\epsilon$ defined in Lemma 8.4. Without loss of generality we may assume that $N$ is a ball in $\mathbb{R}^n$. To verify the induction step suppose that $x^k \in N$. Then
$$|x^{k+1} - x^*| \le \epsilon(|x^k - x^*|)\,|x^k - x^*| \le |x^0 - x^*|.$$
This implies that $x^{k+1} \in N$ and superlinear convergence of $x^k \to x^*$. Superlinear convergence of $F(x^k)$ to 0 easily follows from the second part of Lemma 8.4.
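The iteration of Theorem 8.5 can be illustrated on a small piecewise linear system. The sketch below is not taken from the text: the test function $F(x) = Ax + \max(0, x) - b$, the matrix `A`, the vector `b`, and all function names are assumptions chosen for illustration. For this $F$, one element of $\partial_B F(x)$ is $A + \mathrm{diag}(\chi_{\{x_i > 0\}})$, which is nonsingular here because `A` is symmetric positive definite.

```python
def residual(x, A, b):
    """F(x) = A x + max(0, x) - b, with max taken componentwise (2-vectors)."""
    return [
        A[i][0] * x[0] + A[i][1] * x[1] + max(0.0, x[i]) - b[i]
        for i in range(2)
    ]


def b_subdifferential_element(x, A):
    """One element V of the B-subdifferential: A + diag(1 if x_i > 0 else 0)."""
    V = [row[:] for row in A]
    for i in range(2):
        if x[i] > 0.0:
            V[i][i] += 1.0
    return V


def solve_2x2(V, rhs):
    """Solve V h = rhs by Cramer's rule (V is assumed nonsingular)."""
    det = V[0][0] * V[1][1] - V[0][1] * V[1][0]
    return [
        (rhs[0] * V[1][1] - V[0][1] * rhs[1]) / det,
        (V[0][0] * rhs[1] - V[1][0] * rhs[0]) / det,
    ]


def semismooth_newton(x0, A, b, tol=1e-12, max_iter=50):
    """Iterate x^{k+1} = x^k - V_k^{-1} F(x^k) and record |F(x^k)|."""
    x = list(x0)
    history = []
    for _ in range(max_iter):
        F = residual(x, A, b)
        norm_F = (F[0] ** 2 + F[1] ** 2) ** 0.5
        history.append(norm_F)
        if norm_F <= tol:
            break
        h = solve_2x2(b_subdifferential_element(x, A), [-F[0], -F[1]])
        x = [x[0] + h[0], x[1] + h[1]]
    return x, history


# Hypothetical data; the exact solution of this instance is (0.8, -1.4).
A = [[2.0, 1.0], [1.0, 2.0]]
b = [1.0, -2.0]
x_star, history = semismooth_newton([0.0, 0.0], A, b)
print(x_star, history)
```

Because this $F$ is piecewise linear, the iteration identifies the correct linear piece after finitely many steps; for a general semismooth $F$ one would only expect the superlinear rate stated in the theorem.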
8.2.2 Globalization
We now discuss globalization of the semismooth Newton method (8.1.4). For this purpose we define the merit function $\theta$ by
$$\theta(x) = |F(x)|^2.$$
Throughout we assume that $F : \mathbb{R}^n \to \mathbb{R}^n$ is locally Lipschitz continuous and B-differentiable and that the following assumptions (8.2.9)–(8.2.11) hold:
$$S = \{x \in \mathbb{R}^n : |F(x)| \le |F(x^0)|\} \ \text{is bounded.} \tag{8.2.9}$$
There exist $\bar\sigma, b > 0$ and a graph $\Phi$ from $S$ to the nonempty subsets of $\mathbb{R}^n$ such that
$$\theta'(x; d) \le -\bar\sigma\,\theta(x) \ \text{and} \ |d| \le b\,|F(x)| \ \text{for all } d \in \Phi(x) \text{ and } x \in S. \tag{8.2.10}$$
Moreover, $\Phi$ has the following closure property:
$$x_k \to \bar{x} \ \text{and} \ d_k \to \bar{d} \ \text{with } x_k \in S \text{ and } d_k \in \Phi(x_k) \ \text{imply} \ \theta^o(\bar{x}; \bar{d}) \le -\bar\sigma\,\theta(\bar{x}). \tag{8.2.11}$$
Here $\theta^o(x; d)$ denotes the Clarke generalized directional derivative of $\theta$ at $x$ in direction $d$ (see [Cla]), which is defined by
$$\theta^o(x; d) = \limsup_{y \to x,\ t \to 0^+} \frac{\theta(y + t d) - \theta(y)}{t}.$$
It is assumed that $b > C$ is an arbitrarily large parameter. It serves the purpose that the iterates $\{d^k\}$ are uniformly bounded. With reference to the closure property of $\Phi$ given in (8.2.11), note that it is not required that $\bar{d} \in \Phi(\bar{x})$. Conditions (8.2.10) and (8.2.11) will be discussed in Section 8.2.3 below.

Following [HaPaRa, Qi] we next investigate a globalization strategy for the generalized Newton iteration (8.1.4) and introduce the following algorithm.

Algorithm G. Let $\beta, \gamma \in (0, 1)$ and $\sigma \in (0, \bar\sigma)$. Choose $x^0 \in \mathbb{R}^n$ and set $k = 0$. Given $x^k$ with $F(x^k) \ne 0$:

(i) If there exists a solution $h^k$ to $V_k h^k = -F(x^k)$ with $|h^k| \le b\,|F(x^k)|$, and if further $|F(x^k + h^k)| < \gamma\,|F(x^k)|$, set $d^k = h^k$, $x^{k+1} = x^k + d^k$, $\alpha_k = 1$, and $m_k = 0$.

(ii) Otherwise choose $d^k \in \Phi(x^k)$ and let $\alpha_k = \beta^{m_k}$, where $m_k$ is the first positive integer $m$ for which
$$\theta(x^k + \beta^m d^k) - \theta(x^k) \le -\sigma \beta^m \theta(x^k).$$
Finally set $x^{k+1} = x^k + \alpha_k d^k$. In (ii) the increment $d^k = h^k$ from (i) can be chosen if $\theta'(x^k; h^k) \le -\sigma\,\theta(x^k)$.

Theorem 8.6. Suppose that $F : \mathbb{R}^n \to \mathbb{R}^n$ is locally Lipschitz and B-differentiable.

(a) Assume that (8.2.9)–(8.2.11) hold. Then the sequence $\{x^k\}$ generated by Algorithm G is bounded, it satisfies $|F(x^{k+1})| < |F(x^k)|$ for all $k \ge 0$, and each accumulation point $x^*$ of $\{x^k\}$ satisfies $F(x^*) = 0$.

(b) If moreover for one such accumulation point
$$|h| \le c\,|F'(x^*; h)| \ \text{for all } h \in \mathbb{R}^n, \tag{8.2.12}$$
then the sequence $x^k$ converges to $x^*$.

(c) If in addition to the above assumptions $F$ is semismooth at $x^*$ and all $V \in \partial_B F(x^*)$ are nonsingular, then $x^k$ converges to $x^*$ superlinearly.

Proof. (a) First we prove that for each $x \in S$ such that $\theta(x) \ne 0$ and $d$ satisfying $\theta'(x; d) \le -\bar\sigma\,\theta(x)$, there exists a $\bar\tau > 0$ such that
$$\theta(x + \tau d) - \theta(x) \le -\sigma \tau\, \theta(x) \ \text{for all } \tau \in [0, \bar\tau].$$
If this is not the case, then there exists a sequence $\tau_n \to 0^+$ such that $\theta(x + \tau_n d) - \theta(x) > -\sigma \tau_n \theta(x)$. Dividing both sides by $\tau_n$ and letting $n \to \infty$, we have by (8.2.10)
$$-\bar\sigma\,\theta(x) \ge \theta'(x; d) \ge -\sigma\,\theta(x).$$
Since $\sigma < \bar\sigma$, this shows $\theta(x) = 0$, which contradicts the assumption $\theta(x) \ne 0$. Hence for each level $k$ at which $d^k \in \Phi(x^k)$ is chosen according to the second alternative in Algorithm G, there exist $m_k < \infty$ and $\alpha_k > 0$ such that $|F(x^{k+1})| < |F(x^k)|$. By construction the iterates therefore satisfy $|F(x^{k+1})| < |F(x^k)|$ for each $k \ge 0$.

Assume first that $\limsup \alpha_k > 0$. If the first alternative in Algorithm G with $\alpha_k = 1$ occurs infinitely many times, then using the fact that $\gamma < 1$ we find that $\lim_{k \to \infty} \theta(x^k) = 0$. Otherwise, for all $k$ sufficiently large
$$0 \le \theta(x^{k+1}) \le (1 - \sigma \alpha_k)\,\theta(x^k) \le \theta(x^k).$$
Thus $\theta(x^k)$ is monotonically decreasing, bounded below by 0, and hence convergent. Therefore $\lim_{k \to \infty} (\theta(x^{k+1}) - \theta(x^k)) = 0$ and consequently $\lim_{k \to \infty} \alpha_k \theta(x^k) = 0$. Thus $\limsup \alpha_k > 0$ implies that $\lim_{k \to \infty} \theta(x^k) = 0$. Hence each accumulation point $x^*$ of $\{x^k\}$ satisfies $F(x^*) = 0$. Note that the existence of an accumulation point follows from (8.2.9).

If on the other hand $\limsup \alpha_k = 0$, then $m_k \to \infty$. By the definition of $m_k$, for $\tau_k := \beta^{m_k - 1}$, we have $\tau_k \to 0$ and
$$\theta(x^k + \tau_k d^k) - \theta(x^k) > -\sigma \tau_k\, \theta(x^k). \tag{8.2.13}$$
By (8.2.9), (8.2.10) the sequence $\{(x^k, d^k)\}$ is bounded. Let $\{(x^k, d^k)\}_{k \in K}$ be any convergent subsequence with limit $(x^*, d)$. Note that
$$\frac{\theta(x^k + \tau_k d^k) - \theta(x^k)}{\tau_k} = \frac{\theta(x^k + \tau_k d) - \theta(x^k)}{\tau_k} + \frac{\theta(x^k + \tau_k d^k) - \theta(x^k + \tau_k d)}{\tau_k},$$
where
$$\lim_{k \in K,\ k \to \infty} \frac{\theta(x^k + \tau_k d^k) - \theta(x^k + \tau_k d)}{\tau_k} = 0,$$
since $\theta$ is locally Lipschitz continuous. Since $d^k \in \Phi(x^k)$ for all $k \in K$ it follows from (8.2.11) that $\theta^o(x^*; d) \le -\bar\sigma\,\theta(x^*)$. Then from (8.2.13) and (8.2.11) we find
$$-\sigma\,\theta(x^*) \le \limsup_{k \in K,\ k \to \infty} \frac{\theta(x^k + \tau_k d^k) - \theta(x^k)}{\tau_k} \le \theta^o(x^*; d) \le -\bar\sigma\,\theta(x^*). \tag{8.2.14}$$
It follows that $(\bar\sigma - \sigma)\,\theta(x^*) \le 0$ and thus $\theta(x^*) = 0$.

(b) Since $F$ is B-differentiable at $x^*$ there exists a $\delta > 0$ such that for $|x - x^*| \le \delta$
$$|F(x) - F(x^*) - F'(x^*; x - x^*)| \le \frac{1}{2c}\,|x - x^*|.$$
Thus,
$$|F'(x^*; x - x^*)| \le |F(x)| + |F(x) - F(x^*) - F'(x^*; x - x^*)| \le |F(x)| + \frac{1}{2c}\,|x - x^*|.$$
From (8.2.12)
$$|x - x^*| \le c\,|F'(x^*; x - x^*)| \le c\,|F(x)| + \frac{1}{2}\,|x - x^*|$$
and thus
$$|x - x^*| \le 2c\,|F(x)| \quad \text{if } |x - x^*| \le \delta.$$
Given $\epsilon \in (0, \delta)$ define the set
$$N(x^*, \epsilon) = \Big\{ x \in \mathbb{R}^n : |x - x^*| \le \epsilon,\ |F(x)| \le \frac{\epsilon}{2c + b} \Big\}.$$
Since $x^*$ is an accumulation point of $\{x^k\}$, there exists an index $\bar{k}$ such that $x^{\bar{k}} \in N(x^*, \epsilon)$. Since $|d^k| \le b\,|F(x^k)|$ for all $k$ we have
$$\begin{aligned}
|x^{\bar{k}+1} - x^*| &\le |x^{\bar{k}} - x^* + \alpha_{\bar{k}} d^{\bar{k}}| \le |x^{\bar{k}} - x^*| + \alpha_{\bar{k}} |d^{\bar{k}}| \\
&\le 2c\,|F(x^{\bar{k}})| + b\,|F(x^{\bar{k}})| = (2c + b)\,|F(x^{\bar{k}})| \le \epsilon.
\end{aligned}$$
Hence $x^{\bar{k}+1} \in N(x^*, \epsilon)$. By induction, $x^k \in N(x^*, \epsilon)$ for all $k \ge \bar{k}$ and thus the sequence $x^k$ converges to $x^*$.
(c) Since $\lim_{k \to \infty} x^k = x^*$, the iterates $x^k$ of Algorithm G enter into the region of attraction for Theorem 8.5. Moreover, referring to the proof of Lemma 8.4, for any $\gamma \in (0, 1)$ there exists $k_\gamma$ such that the iterates according to (8.1.4) satisfy $|F(x^{k+1})| \le \gamma\,|F(x^k)|$ for $k \ge k_\gamma$. Hence these iterates coincide with those of Algorithm G for $k \ge k_\gamma$, and superlinear convergence follows.

Remark 8.2.1. (i) We point out that the requirement that the graph $\Phi$ satisfies the closure property (8.2.11) is used in the proof of Theorem 8.6 only for the case that $\limsup_{k \to \infty} \alpha_k = 0$.

(ii) For part (a) of Theorem 8.6 the condition $|h^k| \le b\,|F(x^k)|$ in the first alternative of Algorithm G is not required, and $|d| \le b\,|F(x)|$ for all $x \in S$ used in alternative (ii) can be replaced by requiring that the directions are uniformly bounded. These conditions are used in the proof of Theorem 8.6(b).

(iii) Since $h \to F'(x^*; h)$ is positively homogeneous and since we consider the finite-dimensional case here, one can easily argue that (8.2.12) is equivalent to $F'(x^*; h) \ne 0$ for all $h \ne 0$, which is called BD-regularity in [Qi].
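Algorithm G can be sketched for a scalar equation as follows. This is an illustration under stated assumptions, not the authors' implementation: the test function $F(x) = \arctan(x) + \tfrac12 \max(0, x)$, the parameter values, and the names `F`, `V`, and `algorithm_g` are all hypothetical. The Newton increment is also used as the descent direction in alternative (ii); away from the kink at $x = 0$ it satisfies $\theta'(x; h) = -2\theta(x)$, so any $\sigma \in (0, 2)$ is admissible.

```python
import math


def F(x):
    """Semismooth scalar test function with a kink at its root x = 0."""
    return math.atan(x) + 0.5 * max(0.0, x)


def V(x):
    """An element of the B-subdifferential of F at x (right derivative at 0)."""
    return 1.0 / (1.0 + x * x) + (0.5 if x >= 0.0 else 0.0)


def algorithm_g(x0, beta=0.5, gamma=0.5, sigma=0.5, b=1e3, tol=1e-12, max_iter=100):
    """Globalized semismooth Newton iteration; returns the iterate and |F(x^k)|."""
    x = x0
    residuals = [abs(F(x))]
    for _ in range(max_iter):
        Fx = F(x)
        if abs(Fx) <= tol:
            break
        h = -Fx / V(x)  # Newton increment: V_k h = -F(x^k)
        if abs(h) <= b * abs(Fx) and abs(F(x + h)) < gamma * abs(Fx):
            x = x + h  # alternative (i): full step, alpha_k = 1
        else:
            # Alternative (ii): backtrack along h, starting from m = 1,
            # until theta(x + beta^m h) - theta(x) <= -sigma beta^m theta(x).
            theta = Fx * Fx
            m = 1
            while m < 60 and F(x + beta**m * h) ** 2 - theta > -sigma * beta**m * theta:
                m += 1
            x = x + beta**m * h
        residuals.append(abs(F(x)))
    return x, residuals


# From x0 = -3 the undamped Newton step overshoots, so the line search is used.
x_final, residuals = algorithm_g(-3.0)
print(x_final, residuals)
```

The cap `m < 60` only guards against an endless loop in floating-point arithmetic; part (a) of the proof of Theorem 8.6 shows that a finite $m_k$ exists whenever the direction satisfies (8.2.10).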
8.2.3 Descent directions
We turn to a discussion of conditions (8.2.10) and (8.2.11) required for the descent directions $d$. For the Clarke generalized directional derivative $\theta^o(x, d)$ we have by local Lipschitz continuity of $F$ at $x$ that
$$\theta^o(x, d) = \limsup_{y \to x,\ t \to 0^+} \frac{2\,(F(x), F(y + t d) - F(y))}{t}$$
and there exists $F^o : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$ such that
$$\theta^o(x, d) = 2\,(F(x), F^o(x; d)) \quad \text{for } (x, d) \in \mathbb{R}^n \times \mathbb{R}^n.$$
We introduce the notion of quasi-directional derivative.

Definition 8.7. Let $F : \mathbb{R}^n \to \mathbb{R}^n$ be directionally differentiable. Then $G : S \times \mathbb{R}^n \to \mathbb{R}^n$ is called a quasi-directional derivative of $F$ on $S \subset \mathbb{R}^n$ if

(i) $(F(x), F'(x; d)) \le (F(x), G(x; d))$,

(ii) $G(x; td) = t\,G(x; d)$ for all $d \in \mathbb{R}^n$, $x \in S$, and $t \ge 0$,

(iii) $(F(\bar{x}), F^o(\bar{x}; \bar{d})) \le \limsup_{x \to \bar{x},\ d \to \bar{d}} (F(x), G(x; d))$ for all $x \to \bar{x}$, $d \to \bar{d}$ with $x, \bar{x} \in S$.

For the special case of optimization subject to box constraints a quasi-directional derivative will be constructed in Section 8.2.5. In the remainder of this section we consider the relationship between well-known choices for descent directions (see, e.g., [HPR, Pan1]) and the concept of quasi-directional derivative of Definition 8.7, and we assume that $F$ is
a locally Lipschitz continuous and directionally differentiable function on $S$, where $S$ refers to the set defined in (8.2.9).

(a) Bouligand direction. If there exists $\bar{b}$ such that
$$|h| \le \bar{b}\,|F'(x; h)| \ \text{for all } x \in S,\ h \in \mathbb{R}^n \tag{8.2.15}$$
and if
$$F(x) + F'(x; d) = 0 \tag{8.2.16}$$
admits a solution $d$ for each $x \in S$, then a first choice for the direction is given by the solution $d$ to (8.2.16), i.e., $\Phi(x) = d$; see, e.g., [Pan1]. By (8.2.15) we have $|d| \le \bar{b}\,|F(x)|$. Moreover
$$\theta'(x, d) = 2\,(F'(x; d), F(x)) = -2\,\theta(x),$$
and therefore the inequalities in (8.2.10) hold with $b = \bar{b}$ and $\bar\sigma = 2$. For this choice, however, $\Phi$ does not satisfy (8.2.11), in general, unless additional conditions are imposed on the problem data; see Section 8.2.5 below.

(b) Generalized Bouligand direction. As a second choice (see [HPR, Pan1]), we assume that $G$ is a quasi-directional derivative of $F$ on $S$, that
$$|h| \le \bar{b}\,|G(x; h)| \ \text{for all } x \in S,\ h \in \mathbb{R}^n \tag{8.2.17}$$
holds, and that
$$F(x) + G(x; d) = 0 \tag{8.2.18}$$
admits a solution $d$ which is used as the descent direction for each $x \in S$. We set $\Phi(x) = d$. Then one argues as for the first choice that the inequalities in (8.2.10) hold with $b = \bar{b}$ and $\bar\sigma = 2$. Moreover $\Phi$ satisfies (8.2.11), since for any $(x, d) \to (\bar{x}, \bar{d})$ in $S \times \mathbb{R}^n$ with $\Phi(x) = d$ we have
$$\theta^o(\bar{x}, \bar{d}) \le 2 \limsup_{x \to \bar{x},\ d \to \bar{d}} (F(x), G(x; d)) = -2 \lim_{x \to \bar{x}} |F(x)|^2 = -2\,\theta(\bar{x}).$$
We refer the reader to Section 8.2.5 for the construction of $G$ for specific applications.

(c) Generalized gradient direction. The following choice was discussed in [HPR]. Here $d$ is chosen as the solution to
$$\min_d J(x, d) = 2\,(F(x), G(x; d)) + \eta\,|d|^2, \tag{8.2.19}$$
where $\eta > 0$ and $x \in S$. Assume that for some $L > 0$
$$h \to G(x; h) \ \text{is continuous and} \ |G(x; h)| \le L\,|h| \ \text{for all } x \in S,\ h \in \mathbb{R}^n. \tag{8.2.20}$$
Then $d \to J(x, d)$ is coercive, bounded below, and continuous. Thus there exists an optimal solution $d$ to (8.2.19).
Lemma 8.8. Assume that $F : \mathbb{R}^n \to \mathbb{R}^n$ is Lipschitz continuous and directionally differentiable, and that $G$ is a quasi-directional derivative of $F$ on $S$ satisfying (8.2.20).

(a) If $d$ is an optimal solution to (8.2.19), then $(F(x), G(x; d)) = -\eta\,|d|^2$.

(b) If $d = 0$ is an optimal solution to (8.2.19), then $(F(x), G(x; h)) \ge 0$ for all $h \in \mathbb{R}^n$.

Proof. (a) Let $d$ be an optimal solution to (8.2.19) and consider
$$\min_{\alpha \ge 0}\ 2\,(F(x), G(x; \alpha d)) + \alpha^2 \eta\,|d|^2.$$
Then $\alpha = 1$ is optimal and, differentiating with respect to $\alpha$, we have $(F(x), G(x; d)) + \eta\,|d|^2 = 0$.

(b) If $d = 0$ is an optimal solution, then the optimal value of (8.2.19) is zero. It follows that for all $h \in \mathbb{R}^n$ and $\alpha \ge 0$
$$0 \le 2\alpha\,(F(x), G(x; h)) + \alpha^2 \eta\,|h|^2.$$
Dividing by $\alpha > 0$ and letting $\alpha \to 0^+$ we obtain the claim.

By Lemma 8.8 the optimal value of the cost $J$ in (8.2.19) is given by $-\eta\,|d|^2$. If this value is negative, then any solution to (8.2.19) provides a decay for $\theta$. The optimal value of the cost is 0 if and only if $d = 0$ is the optimal solution. In this case Lemma 8.8 implies that $x$ is a stationary point in the sense that $(F(x), G(x; h)) \ge 0$ for all $h \in \mathbb{R}^n$.

Let us now turn to the discussion of condition (8.2.10) for the direction given by the solution $d$ to (8.2.19), i.e., $\Phi(x) = d$. We assume (8.2.17) and that (8.2.18) admits a solution for every $x \in S$. Since $J(x, 0) = 0$, we have $2\,(F(x), G(x; d)) \le -\eta\,|d|^2$ and therefore
$$\eta\,|d|^2 \le -2\,(F(x), G(x; d)) \le 2\,|F(x)|\,|G(x; d)| \le 2L\,|d|\,|F(x)|.$$
Thus,
$$|d| \le \frac{2L}{\eta}\,|F(x)|,$$
and the second condition in (8.2.10) holds. Turning to the first condition let $\hat{d}$ satisfy $F(x) + G(x; \hat{d}) = 0$. Then using Lemma 8.8(a) and (8.2.17) we find, at a solution $d$ to (8.2.19),
$$\begin{aligned}
J(x, d) = (F(x), G(x; d)) = -\eta\,|d|^2 &\le 2\,(G(x; \hat{d}), F(x)) + \eta\,|\hat{d}|^2 \\
&\le -2\,|F(x)|^2 + \eta \bar{b}^2 |F(x)|^2 = -(2 - \eta \bar{b}^2)\,\theta(x).
\end{aligned} \tag{8.2.21}$$
Since $G$ is a quasi-directional derivative of $F$ on $S$, we have
$$\theta'(x; d) \le 2\,(F(x), G(x; d)) \le -2\,(2 - \eta \bar{b}^2)\,\theta(x)$$
and thus the direction $d$ defined by (8.2.19) satisfies the first condition in (8.2.10) with $\bar\sigma = 2(2 - \eta \bar{b}^2)$, provided that $\eta < 2/\bar{b}^2$.
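The identity of Lemma 8.8(a) can be checked numerically in the simplest situation, where $G(x; d) = Vd$ is linear in $d$. This linear choice, the data `V`, `F_val`, `eta`, and the helper names are assumptions made for illustration only; in that case the minimizer of $J(d) = 2(F, Vd) + \eta|d|^2$ is $d = -V^T F / \eta$.

```python
def matvec(V, d):
    """Return V d for a dense matrix stored as a list of rows."""
    return [sum(V[i][j] * d[j] for j in range(len(d))) for i in range(len(V))]


def transpose_matvec(V, F):
    """Return V^T F."""
    return [sum(V[i][j] * F[i] for i in range(len(V))) for j in range(len(V[0]))]


def dot(u, v):
    return sum(a * c for a, c in zip(u, v))


def cost(V, F, eta, d):
    """J(d) = 2 (F, V d) + eta |d|^2, i.e. (8.2.19) with the linear G(x; d) = V d."""
    return 2.0 * dot(F, matvec(V, d)) + eta * dot(d, d)


def gradient_direction(V, F, eta):
    """Minimizer of J: the stationarity condition 2 V^T F + 2 eta d = 0."""
    return [-c / eta for c in transpose_matvec(V, F)]


# Hypothetical data for the check.
V = [[2.0, 1.0], [0.0, 1.0]]
F_val = [1.0, -3.0]
eta = 0.5

d = gradient_direction(V, F_val, eta)
inner = dot(F_val, matvec(V, d))       # (F, G(x; d))
optimal_cost = cost(V, F_val, eta, d)  # should equal -eta |d|^2
print(d, inner, optimal_cost)
```

For nonlinear, merely positively homogeneous $G$ there is no closed form for the minimizer, and (8.2.19) would have to be solved by other means.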
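To argue (8.2.11) for this choice of $\Phi$, let $x_k \to \bar{x}$, $d_k \to \bar{d}$, with $d_k = \Phi(x_k)$, $x_k \in S$, and choose $\hat{d}_k$ such that $F(x_k) + G(x_k; \hat{d}_k) = 0$. Then
$$\begin{aligned}
\tfrac{1}{2}\,\theta^o(\bar{x}; \bar{d}) &\le \limsup_{k \to \infty}\, (F(x_k), G(x_k; d_k)) \\
&\le \limsup_{k \to \infty}\, \big( 2\,(F(x_k), G(x_k; d_k)) + \eta\,|d_k|^2 \big) \\
&\le \limsup_{k \to \infty}\, \big( 2\,(F(x_k), G(x_k; \hat{d}_k)) + \eta\,|\hat{d}_k|^2 \big) \\
&\le -2\,|F(\bar{x})|^2 + \eta \bar{b}^2 \lim_{k \to \infty} |F(x_k)|^2 = -(2 - \eta \bar{b}^2)\,\theta(\bar{x}),
\end{aligned}$$
and thus (8.2.11) holds if $\eta < 2/\bar{b}^2$.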
8.2.4 A Gauss–Newton algorithm

In this section we admit the case that the minimum of $\theta$ is not attained with value 0, which typically arises in nonlinear least squares problems, and the case where the equation $V_k d = -F(x^k)$ or $F(x^k) + G(x^k; d) = 0$, which would provide candidates for search directions in (ii) of Algorithm G, does not have a solution. For these cases we consider the following Gauss–Newton method.

Gauss–Newton algorithm. Let $\beta, \gamma \in (0, 1)$, $\alpha > 0$, and $\sigma \in (0, 2\eta)$. Choose $x^0 \in \mathbb{R}^n$ and set $k = 0$. Given $x^k$ with $F(x^k) \ne 0$:

(i) If there exists $V_k \in \partial_B F(x^k)$ such that
$$h^k = -\big( \alpha\,|F(x^k)|\,I + V_k^T V_k \big)^{-1} V_k^T F(x^k) \tag{8.2.22}$$
with $|h^k| \le b\,|F(x^k)|$ satisfies $|F(x^k + h^k)| < \gamma\,|F(x^k)|$, let $d^k = h^k$ and set $x^{k+1} = x^k + d^k$, $\alpha_k = 1$, and $m_k = 0$.

(ii) Otherwise, let $d^k$ be defined by (8.2.19) with $x = x^k$. Stop if $d^k = 0$ or $F(x^k) = 0$. Otherwise, set $\alpha_k = \beta^{m_k}$, where $m_k$ is the first positive integer $m$ for which
$$\theta(x^k + \beta^m d^k) - \theta(x^k) \le -\sigma \beta^m |d^k|^2, \tag{8.2.23}$$
and set $x^{k+1} = x^k + \alpha_k d^k$.

Note that (8.2.22) gives the solution to
$$\min_h\ \frac{1}{2}\,|F(x^k) + V_k h|^2 + \frac{\alpha}{2}\,|F(x^k)|\,|h|^2. \tag{8.2.24}$$
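A small numerical sketch of the regularized step (8.2.22) follows. The data are hypothetical and chosen so that $V$ is singular: the plain Newton equation $Vh = -F$ then has no solution, while (8.2.22) is still well defined. The function names are illustrative, and the 2x2 system is solved by Cramer's rule to keep the example dependency-free.

```python
def gauss_newton_step(V, F, alpha):
    """h = -(alpha |F| I + V^T V)^{-1} V^T F for a 2x2 matrix V, cf. (8.2.22)."""
    norm_F = (F[0] ** 2 + F[1] ** 2) ** 0.5
    # Entries of the symmetric matrix N = alpha |F| I + V^T V.
    n00 = alpha * norm_F + V[0][0] ** 2 + V[1][0] ** 2
    n01 = V[0][0] * V[0][1] + V[1][0] * V[1][1]
    n11 = alpha * norm_F + V[0][1] ** 2 + V[1][1] ** 2
    # Right-hand side -V^T F.
    r0 = -(V[0][0] * F[0] + V[1][0] * F[1])
    r1 = -(V[0][1] * F[0] + V[1][1] * F[1])
    det = n00 * n11 - n01 * n01  # positive whenever alpha |F| > 0
    return [(r0 * n11 - n01 * r1) / det, (n00 * r1 - n01 * r0) / det]


def objective(V, F, alpha, h):
    """(1/2)|F + V h|^2 + (alpha/2)|F||h|^2, the functional in (8.2.24)."""
    norm_F = (F[0] ** 2 + F[1] ** 2) ** 0.5
    r0 = F[0] + V[0][0] * h[0] + V[0][1] * h[1]
    r1 = F[1] + V[1][0] * h[0] + V[1][1] * h[1]
    return 0.5 * (r0 * r0 + r1 * r1) + 0.5 * alpha * norm_F * (h[0] ** 2 + h[1] ** 2)


# Hypothetical singular example.
V_sing = [[1.0, 0.0], [0.0, 0.0]]
F_sing = [1.0, 1.0]
alpha = 1.0

h = gauss_newton_step(V_sing, F_sing, alpha)
print(h, objective(V_sing, F_sing, alpha, h))
```

For this instance the step works out to $h = (-1/(\sqrt{2} + 1),\ 0)$, and comparing `objective` at `h` with its value at nearby points gives a quick consistency check of the claim that (8.2.22) minimizes (8.2.24).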
If the Gauss–Newton algorithm terminates after finitely many steps with index $\bar{k}$, then either $F(x^{\bar{k}}) = 0$ or $d^{\bar{k}} = 0$ in the second alternative of the algorithm. In this case $x^{\bar{k}}$ is a stationary point of $\theta$ in the sense that $(F(x^{\bar{k}}), G(x^{\bar{k}}; d)) \ge 0$ for all $d$ by Lemma 8.8.
We next discuss global convergence when the Gauss–Newton algorithm takes infinitely many steps.

Theorem 8.9. Suppose that $F : \mathbb{R}^n \to \mathbb{R}^n$ is locally Lipschitz and B-differentiable, (8.2.9), (8.2.11), and (8.2.20) hold, and $G$ is a quasi-directional derivative of $F$ on $S$.

(a) If $\{x^k\}$ is an infinite sequence generated by the Gauss–Newton algorithm, then $\{x^k\}$ is bounded and $|F(x^{k+1})| < |F(x^k)|$ for all $k$. If the first alternative occurs infinitely often, then all accumulation points $x^*$ satisfy $F(x^*) = 0$. Otherwise, $\lim_{k \to \infty} d^k = 0$ and
$$(F(x^*), G(x^*; h)) \ge 0 \quad \text{for all } h \in \mathbb{R}^n. \tag{8.2.25}$$

(b) If for some accumulation point $F(x^*) = 0$ and
$$|h| \le c\,|F'(x^*; h)| \quad \text{for all } h \in \mathbb{R}^n, \tag{8.2.26}$$
then the sequence $x^k$ converges to $x^*$.

(c) If in addition to the assumptions in (a) and (b), $F$ is semismooth at $x^*$ and all $V \in \partial_B F(x^*)$ are nonsingular, then $x^k$ converges to $x^*$ superlinearly.

Proof. (a) The algorithm guarantees that $|F(x^{k+1})| < |F(x^k)|$ for all $k$. Due to (8.2.9) the sequence $\{x^k\}$ is bounded. In case the first alternative is taken, comparing the optimal value of (8.2.24) with its value at $h = 0$ gives
$$\alpha\,|d^k|^2 \le |F(x^k)|.$$
In case of the second alternative Lemma 8.8(a) and (8.2.20) imply that
$$\eta\,|d^k|^2 \le |F(x^k)|\,|G(x^k; d^k)| \le L\,|F(x^k)|\,|d^k|.$$
Consequently the sequence $\{d^k\}$ is bounded. If the first alternative of the Gauss–Newton algorithm occurs infinitely often, then $\lim_{k \to \infty} |F(x^k)| = 0$ and every accumulation point $x^*$ of $\{x^k\}$ satisfies $F(x^*) = 0$. Let us turn to the case when eventually only the second alternative occurs. Since $|F(x^k)|$ is monotonically decreasing and bounded below, the sequence $\theta(x^{k+1}) - \theta(x^k)$ converges to 0 and hence by (8.2.23)
$$\lim_{k \to \infty} \alpha_k\,|d^k|^2 = 0.$$
If $\lim_{k \to \infty} d^k \ne 0$, then there exists an index set $K$ such that $\lim_{k \in K,\ k \to \infty} |d^k| \ne 0$ and consequently $\lim_{k \in K,\ k \to \infty} \alpha_k = 0$. For $\tau_k = \beta^{m_k - 1}$ we have
$$-\sigma\,|d^k|^2 \le \frac{1}{\tau_k}\,\big( \theta(x^k + \tau_k d^k) - \theta(x^k) \big). \tag{8.2.27}$$
Let $\hat{K} \subset K$ be such that $\{x^k\}_{k \in \hat{K}}$, $\{d^k\}_{k \in \hat{K}}$ are convergent subsequences with limits $x^*$ and $d$. Note that
$$\frac{1}{\tau_k}\big( \theta(x^k + \tau_k d^k) - \theta(x^k) \big) = \frac{1}{\tau_k}\big( \theta(x^k + \tau_k d) - \theta(x^k) \big) + \frac{1}{\tau_k}\big( \theta(x^k + \tau_k d^k) - \theta(x^k + \tau_k d) \big) \tag{8.2.28}$$
with
$$\lim_{k \in \hat{K},\ k \to \infty} \frac{1}{\tau_k}\big( \theta(x^k + \tau_k d^k) - \theta(x^k + \tau_k d) \big) = 0,$$
since $\theta$ is locally Lipschitz continuous. By Lemma 8.8(a) we have $2\,(F(x^k), G(x^k; d^k)) = -2\eta\,|d^k|^2$ for all $k \in \hat{K}$. Since $G$ is assumed to be a quasi-directional derivative we have
$$\theta^o(x^*, d) = 2\,(F(x^*), F^o(x^*; d)) \le -2\eta\,|d|^2.$$
Passing to the limit in (8.2.27), utilizing (8.2.28), we find
$$-\sigma\,|d|^2 \le \limsup_{k \in \hat{K},\ k \to \infty} \frac{1}{\tau_k}\big( \theta(x^k + \tau_k d^k) - \theta(x^k) \big) \le \theta^o(x^*, d) \le -2\eta\,|d|^2.$$
Since $\sigma \in (0, 2\eta)$ this implies that $d = 0$, which contradicts our assumption. Consequently $\lim_{k \to \infty} d^k = 0$. From Lemma 8.8(a) we have
$$2\alpha\,(F(x^k), G(x^k; h)) + \eta \alpha^2 |h|^2 = J(x^k, \alpha h) \ge J(x^k, d^k) = -\eta\,|d^k|^2$$
for every $\alpha > 0$ and $h \in \mathbb{R}^n$. Passing to the limit with respect to $k$ and dividing by $\alpha$ we obtain
$$2\,(F(x^*), G(x^*; h)) + \eta \alpha\,|h|^2 \ge 0,$$
and (8.2.25) follows by letting $\alpha \to 0^+$.

(b) The proof is identical to the one of Theorem 8.6(b).

(c) From Theorem 8.3 there exist a bounded neighborhood $N \subset \{x : |F(x)| \le |F(x^0)|\}$ of $x^*$ and a constant $C$ such that for all $x \in N$ and $V \in \partial_B F(x)$ we have $|V^{-1}| \le C$. Consequently there exists $M > 0$ such that for all $x \in N$ we have
$$\Big| \big( \alpha|F(x)|I + V(x)^T V(x) \big)^{-1} V(x)^T - V(x)^{-1} \Big| \le \alpha\,|F(x)|\, \Big| \big( \alpha|F(x)|I + V(x)^T V(x) \big)^{-1} \Big| \le M\,|F(x)| \le M\,|F(x^0)|. \tag{8.2.29}$$
Let $h = -\big( \alpha|F(x)|I + V(x)^T V(x) \big)^{-1} V(x)^T F(x)$. Then by Lemma 8.4, possibly after shrinking $N$, we have for all $x \in N$
$$|x + h - x^*| = |x - V(x)^{-1}F(x) - x^* + h + V(x)^{-1}F(x)| \le \epsilon(|x - x^*|)\,|x - x^*| + M\,|F(x)|^2. \tag{8.2.30}$$
Moreover
$$\begin{aligned}
|F(x + h)| &\le \big| F\big( x - V(x)^{-1}F(x) \big) \big| + \big| F\big( \big[ V(x)^{-1} - (\alpha|F(x)|I + V(x)^T V(x))^{-1} V(x)^T \big] F(x) \big) \big| \\
&\le \bar{L}\,\epsilon(|x - x^*|)\,|F(x)| + M \bar{L}^2 |F(x)|^2 \le \bar{L}\,\big( \epsilon(|x - x^*|) + M \bar{L}^2 |x - x^*| \big)\,|F(x)|,
\end{aligned}$$
where we used (8.2.29) and denoted by $\bar{L}$ the Lipschitz constant of $F$ on the bounded set $\{x : |x| \le M|F(x^0)|^2\} \cup \{x - V(x)^{-1}F(x) : x \in N\} \cup N$.
Since $x^k \to x^*$, the last estimate implies the existence of an index $\bar{k}$ such that $x^k \in N$ and
$$|F(x^k + h^k)| \le \gamma\,|F(x^k)| \quad \text{for all } k \ge \bar{k},$$
where $h^k = -\big( \alpha|F(x^k)|I + V(x^k)^T V(x^k) \big)^{-1} V(x^k)^T F(x^k)$. Thus the first alternative in the Gauss–Newton algorithm is chosen for all $k \ge \bar{k}$, and superlinear convergence follows from (8.2.30) with $(x, h) = (x^k, h^k)$, where we also use that $|F(x)| \le \bar{L}\,|x - x^*|$ for $x \in N$.
8.2.5 A nonlinear complementarity problem

Consider the complementarity problem
$$\begin{cases}
\phi \le x \le \psi, \quad g(x)_i\,(x - \psi)_i\,(x - \phi)_i = 0 \ \text{for all } i, \\
g(x) \le 0 \ \text{for } x \ge \psi \quad \text{and} \quad g(x) \ge 0 \ \text{for } x \le \phi,
\end{cases}$$
where $g \in C^1(\mathbb{R}^n, \mathbb{R}^n)$ and $\phi < \psi$. This corresponds to the bilateral constraint case and can equivalently be expressed as
$$F(x) = g(x) + \max(0, -g(x) + x - \psi) + \min(0, -g(x) + x - \phi) = 0;$$
see Section 4.7, Example 4.53. Clearly $\theta(x) = |F(x)|^2$ is locally Lipschitz continuous and directionally differentiable, and hence B-differentiable. Define
$$\begin{aligned}
A^+ &= \{-g(x) + x - \psi > 0\}, & A^- &= \{-g(x) + x - \phi < 0\}, \\
I^1 &= \{-g(x) + x - \psi = 0\}, & I^2 &= \{x - \psi < g(x) < x - \phi\}, \\
I^3 &= \{-g(x) + x - \phi = 0\}.
\end{aligned}$$
We obtain
$$F'(x; d) = \begin{cases}
d & \text{on } A^+ \cup A^-, \\
g'(x)d & \text{on } I^2, \\
\max(g'(x)d, d) & \text{on } I^1, \\
\min(g'(x)d, d) & \text{on } I^3,
\end{cases}$$
and the Bouligand direction (8.2.16) is obtained as the solution to
$$\begin{aligned}
d + x - \psi &= 0 \ \text{on } A^+, & d + x - \phi &= 0 \ \text{on } A^-, \\
\max(g'(x)d, d) + F(x) &= 0 \ \text{on } I^1, & g'(x)d + g(x) &= 0 \ \text{on } I^2, \\
\min(g'(x)d, d) + F(x) &= 0 \ \text{on } I^3.
\end{aligned} \tag{8.2.31}$$
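The case distinction for $F'(x; d)$ can be compared with one-sided difference quotients. In the sketch below the affine map $g(x) = Ax - b$, the bounds, the evaluation point, and all names are assumptions picked so that index 0 lies in $I^1$ and index 1 lies in $I^2$; the exact float comparisons with 0.0 are only adequate for such hand-picked data.

```python
# Hypothetical affine data: g(x) = A x - b with bounds phi < psi.
A = [[2.0, 0.5], [0.5, 2.0]]
b = [2.0, 0.5]
phi = [-1.0, -1.0]
psi = [1.0, 1.0]


def g(x):
    return [A[i][0] * x[0] + A[i][1] * x[1] - b[i] for i in range(2)]


def F(x):
    """F(x) = g(x) + max(0, -g(x) + x - psi) + min(0, -g(x) + x - phi)."""
    gx = g(x)
    return [
        gx[i] + max(0.0, -gx[i] + x[i] - psi[i]) + min(0.0, -gx[i] + x[i] - phi[i])
        for i in range(2)
    ]


def directional_derivative(x, d):
    """Componentwise formula for F'(x; d) on A+, A-, I^1, I^2, I^3."""
    gx = g(x)
    Ad = [A[i][0] * d[0] + A[i][1] * d[1] for i in range(2)]  # g'(x) d
    out = []
    for i in range(2):
        upper = -gx[i] + x[i] - psi[i]
        lower = -gx[i] + x[i] - phi[i]
        if upper > 0.0 or lower < 0.0:
            out.append(d[i])               # A+ or A-
        elif upper == 0.0:
            out.append(max(Ad[i], d[i]))   # I^1
        elif lower == 0.0:
            out.append(min(Ad[i], d[i]))   # I^3
        else:
            out.append(Ad[i])              # I^2
    return out


def one_sided_difference(x, d, t=1e-6):
    """(F(x + t d) - F(x)) / t for a small t > 0."""
    Fx = F(x)
    Fxt = F([x[0] + t * d[0], x[1] + t * d[1]])
    return [(Fxt[i] - Fx[i]) / t for i in range(2)]


x0 = [1.0, 0.0]  # here g(x0) = (0, 0): index 0 is in I^1, index 1 in I^2
directions = [[1.0, 1.0], [-1.0, 0.0], [1.0, -1.0]]
for d in directions:
    print(d, directional_derivative(x0, d), one_sided_difference(x0, d))
```

Since $g$ is affine here, $F$ is piecewise linear and the difference quotient agrees with $F'(x; d)$ up to rounding for every sufficiently small $t > 0$.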
Further if $g'(x) - I$ is surjective and $g \in C^2(\mathbb{R}^n, \mathbb{R}^n)$, then $F^o(x, d)$ satisfies
$$F^o(x; d) = \begin{cases}
d & \text{on } A^+ \cup A^-, \\
g'(x)d & \text{on } I^2, \\
\max(g'(x)d, d) & \text{on } \{F(x) > 0\} \cap (I^1 \cup I^3), \\
\min(g'(x)d, d) & \text{on } \{F(x) \le 0\} \cap (I^1 \cup I^3).
\end{cases} \tag{8.2.32}$$
To verify this claim we consider $F(x) = g(x) + \max(0, -g(x) + x - \psi)$. The general case then follows with minor modifications. We find
$$\begin{aligned}
\theta^o(x, d) &= \limsup_{y \to x,\ t \to 0^+} \frac{2}{t}\,\big( F(x), F(y + td) - F(y) \big) \\
&= \limsup_{y \to x,\ t \to 0^+} \frac{2}{t}\,\big( F(x), \max(g'(x)(y + td),\ y + td - \hat\psi) - \max(g'(x)y,\ y - \hat\psi) \big),
\end{aligned} \tag{8.2.33}$$
where $\hat\psi = \psi + g(x) - g'(x)x$. For $d \in \mathbb{R}^n$ we define $r \in \mathbb{R}^n$ by
$$\begin{aligned}
r_i > 0 \ &\text{on } S_1 = \{i \in I^1 : F_i(x) > 0,\ (Ad)_i > d_i\} \cup \{i \in I^1 : F_i(x) \le 0,\ (Ad)_i < d_i\}, \\
r_i < 0 \ &\text{on } S_2 = \{i \in I^1 : F_i(x) > 0,\ (Ad)_i \le d_i\} \cup \{i \in I^1 : F_i(x) \le 0,\ (Ad)_i \ge d_i\},
\end{aligned}$$
and $r$ arbitrary otherwise. Here we set $A = g'(x)$. Let $y \in \mathbb{R}^n$ be such that $Ay = y - \hat\psi + r$, and choose $t = t_y$ such that
$$(Ay + tAd)_i > (y + td - \hat\psi)_i \ \text{for } i \in S_1, \qquad (Ay + tAd)_i < (y + td - \hat\psi)_i \ \text{for } i \in S_2$$
for $t \in (0, t_y)$. Passing to the limit in (8.2.33) we arrive at (8.2.32).

For arbitrary $\delta > 0$ we claim that $G$, defined by
$$G(x; d) = \begin{cases}
d & \text{on } A_\delta = \{-g(x) + x - \psi > \delta\} \cup \{-g(x) + x - \phi < -\delta\}, \\
g'(x)d & \text{on } I_\delta = \{x - \psi + \delta \le g(x) \le x - \phi - \delta\}, \\
\max(g'(x)d, d) & \text{otherwise, if } F(x) > 0, \\
\min(g'(x)d, d) & \text{otherwise, if } F(x) \le 0,
\end{cases} \tag{8.2.34}$$
is a quasi-directional derivative for $F$. In fact, $G$ is clearly positively homogeneous of degree 1, and
$$(F(x), F'(x, d)) \le (F(x), G(x, d)),$$
which shows (i) of Definition 8.7. Moreover we have
$$(F(x), F^o(x, d)) \le (F(x), G(x, d)).$$
If $|x - \bar{x}|$ is sufficiently small, then $i \in I_\delta(x)$ implies that $i \in I^2(\bar{x})$, and $i \in A_\delta(x)$ implies that $i \in A(\bar{x})$. Thus, (iii) holds by the definition of $G$.

To argue that (8.2.18) has a solution, let $A_\delta \cup I_\delta \cup (A_\delta \cup I_\delta)^c$ be a pairwise disjoint partition of the index set $\{i = 1, \dots, n\}$. Set $A = g'(x)$, for $x \in S$, and decompose $A$ according to the partition of the index set:
$$A = \begin{pmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{pmatrix}.$$
Setting $d = (d_1, d_2, d_3)^T$ and $F = (F_1, F_2, F_3)^T$, the equation $G(x; d) = -F$ is equivalent to
$$d_1 = -F_1, \qquad d_2 = -A_{22}^{-1} A_{23} d_3 + A_{22}^{-1}(-F_2 + A_{21} F_1),$$
and
$$\begin{cases}
M d_3 + \tilde\mu = -w - F_3, \\
\tilde\mu = \max(0, \tilde\mu + d_3 + F_3) & \text{on } \{F(x) > 0\}, \\
\tilde\mu = \min(0, \tilde\mu + d_3 + F_3) & \text{on } \{F(x) \le 0\},
\end{cases} \tag{8.2.35}$$
where
$$w = -A_{31} F_1 + A_{32} A_{22}^{-1}(-F_2 + A_{21} F_1) \quad \text{and} \quad M = A_{33} - A_{32} A_{22}^{-1} A_{23}.$$
Assume that $A$ is symmetric positive definite for every $x \in S$. Then every Schur complement of $A$ is positive definite as well, and (8.2.35) admits a unique solution $(d_3, \tilde\mu)$. It follows that $G(x; d) = -F$ admits a unique solution $d$ for every $x \in S$ and $F \in \mathbb{R}^n$. Consequently (8.2.18) is satisfied. Since $g'(x)$ is positive definite for every $x \in S$ and $S$ is closed and bounded by (8.2.9), and hence compact, it follows that $g'(x)$ and $M$ are uniformly positive definite with respect to $x \in S$. This implies that there exists $\bar{b}$, independent of $x \in S$, such that $|d| \le \bar{b}\,|F|$. Consequently (8.2.17) holds.

We remark that (iii) of Definition 8.7 is not satisfied for the choice $G(x; d) = F'(x, d)$ unless $g(x)_i \ne (x - \psi)_i$ and $g(x)_i \ne (x - \phi)_i$ for all $i$ and $x \in S$.
8.3 Semismooth functions in infinite-dimensional spaces
In infinite-dimensional spaces notions of generalized derivatives for functions which are not $C^1$ cannot rely on Rademacher's theorem. Here, instead, we shall mainly utilize a concept of generalized derivative that is sufficient to guarantee superlinear convergence of Newton's method. We refer the reader to the discussion in Section 8.1, especially (8.1.5) and (8.1.6). This notion of differentiability is called Newton derivative and will be defined below. We refer the reader to [HiIK, CNQ, Ulb] for further discussions of the topics covered in this section. Let $X$, $Z$ be real Banach spaces and let $D \subset X$ be an open set.

Definition 8.10. (1) $F : D \subset X \to Z$ is called Newton differentiable at $x$ if there exist an open neighborhood $N(x) \subset D$ and mappings $G : N(x) \to L(X, Z)$ such that
$$\lim_{|h| \to 0} \frac{|F(x + h) - F(x) - G(x + h)h|_Z}{|h|_X} = 0. \tag{A}$$
The family $\{G(s) : s \in N(x)\}$ is called an N-derivative of $F$ at $x$.

(2) $F$ is called semismooth at $x$ if it is Newton differentiable at $x$ and
$$\lim_{t \to 0^+} G(x + th)h \ \text{exists uniformly in } |h| = 1.$$

(3) $F$ is directionally differentiable at $x \in D$ if
$$\lim_{t \to 0^+} \frac{F(x + th) - F(x)}{t} =: F'(x; h)$$
exists for all $h \in X$.

(4) $F$ is B-differentiable at $x \in D$ if $F$ is directionally differentiable at $x$ and
$$\lim_{|h| \to 0} \frac{F(x + h) - F(x) - F'(x; h)}{|h|_X} = 0.$$

Note that differently from the finite-dimensional case we do not require Lipschitz continuity of $F$ as part of the definition of semismoothness.

Lemma 8.11. Suppose that $F : D \subset X \to Z$ is Newton differentiable at $x \in D$ with N-derivative $G$.

(1) $F$ is directionally differentiable at $x$ if and only if
$$\lim_{t \to 0^+} G(x + th)h \ \text{exists for all } h \in X. \tag{8.3.1}$$
In this case $F'(x; h) = \lim_{t \to 0^+} G(x + th)h$ for all $h \in X$.

(2) $F$ is B-differentiable at $x$ if and only if
$$\lim_{t \to 0^+} G(x + th)h \ \text{exists uniformly in } |h| = 1, \tag{8.3.2}$$
i.e., $F$ is semismooth.
Proof. (1) If (8.3.1) holds for $h \in X$ with $|h| = 1$, then
$$\lim_{t \to 0^+} \frac{F(x + th) - F(x)}{t} = \lim_{t \to 0^+} G(x + th)h.$$
Since $h \in X$ with $|h| = 1$ was arbitrary, this implies that $F$ is directionally differentiable at $x$ and
$$F'(x; h) = \lim_{t \to 0^+} G(x + th)h.$$
Similarly the converse holds.

(2) If $F$ is directionally differentiable at $x$, then $F$ is B-differentiable at $x$, i.e.,
$$\lim_{|h| \to 0} \frac{F(x + h) - F(x) - F'(x; h)}{|h|_X} = 0,$$
if and only if
$$\lim_{t \to 0^+} \Big( \frac{F(x + tv) - F(x)}{t} - F'(x; v) \Big) = 0 \ \text{and the limit is uniform in } |v|_X = 1.$$
Here we use positive homogeneity of the directional derivative $F'(x; h)$ with respect to the second variable. If $F$ is B-differentiable at $x$, then it is directionally differentiable and from (1) we have
$$\lim_{t \to 0^+} \frac{F(x + tv) - F(x)}{t} = F'(x; v) = \lim_{t \to 0^+} G(x + tv)v.$$
The Bouligand property and the equivalence stated above imply that $\lim_{t \to 0^+} G(x + tv)v$ exists uniformly in $|v| = 1$. The converse easily follows as well.

Example 8.12. Let $\psi : \mathbb{R} \to \mathbb{R}$ be semismooth at every $x \in \mathbb{R}$ in the sense of Definition 8.1 and globally Lipschitz, i.e., there exists a constant $L$ such that $|\psi(s) - \psi(t)| \le L|s - t|$ for all $s, t \in \mathbb{R}$. We first argue that there exists a measurable selection $V : \mathbb{R} \to \mathbb{R}$ such that $V(t) \in \partial\psi(t)$ for a.e. $t \in \mathbb{R}$. Since $\partial\psi(s)$ is a nonempty closed set in $\mathbb{R}$ for every $s \in \mathbb{R}$ (see [Cla, p. 70]), it suffices to prove that the multivalued function $s \to \partial\psi(s)$ is measurable; i.e., for every compact set $C \subset \mathbb{R}$ the preimage $P_C = \{t \in \mathbb{R} : \partial\psi(t) \cap C \ne \emptyset\}$ is measurable. The measurable selection theorem (see [Cla, p. 111]) then ensures the existence of the desired measurable selection $V$. To verify measurability of $s \to \partial\psi(s)$ let $C$ be a compact set and let $\{t_k\}$ be a convergent sequence in $P_C$ with limit $t^*$. Choose $v_k \in \partial\psi(t_k) \cap C$ for $k = 1, 2, \dots$. By compactness of $C$ there exists a convergent subsequence, denoted by the same symbol, with limit $v^* \in C$. Since $t_k \to t^*$, upper semicontinuity of $\partial\psi$ at $t^*$ (see [Cla, p. 70]) implies the existence of a sequence $\tilde{v}_k \in \partial\psi(t^*)$ such that $\lim_{k \to \infty} (\tilde{v}_k - v_k) = 0$. Consequently $\lim_{k \to \infty} \tilde{v}_k = v^*$ and by closedness of $\partial\psi(t^*)$ we have $v^* \in \partial\psi(t^*) \cap C$. Thus $P_C$ is closed and therefore measurable.

Associated to $\psi$ with the properties specified above, we define for $1 \le p \le q \le \infty$ the substitution operator $F : L^q(\Omega) \to L^p(\Omega)$
by
$$F(x)(s) = \psi(x(s)) \ \text{for a.e. } s \in \Omega,$$
where $x \in L^q(\Omega)$, and with $\Omega$ a bounded domain in $\mathbb{R}^n$. We now verify that $F$ is Newton differentiable on $L^q(\Omega)$ if $1 \le p < q \le \infty$ and that any measurable selection $V$ of $\partial\psi$ provides an N-derivative. Let $D : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ be given by
$$D(s, v) = |\psi(s + v) - \psi(s) - V(s + v)v|.$$
By Theorem 8.2 we have
$$\lim_{v \to 0} v^{-1} D(s, v) = 0 \ \text{for every } s \in \mathbb{R}. \tag{8.3.3}$$
Moreover global Lipschitz continuity of $\psi$ implies that
$$D(s, v) \le 2L|v| \ \text{for all } (s, v) \in \mathbb{R}^2. \tag{8.3.4}$$
Let $x \in L^q(\Omega)$ and let $\{h_k\}$ be a sequence in $L^q(\Omega)$ converging to 0. Then there exists a subsequence $\{h_k\}$ converging to 0 a.e. in $\Omega$. By (8.3.3) this implies that $h_k(s)^{-1} D(x(s), h_k(s)) \to 0$ for a.e. $s \in \Omega$. By (8.3.4) Lebesgue's bounded convergence theorem is applicable and implies that $h_k^{-1} D(x, h_k)$ converges to 0 in $L^{\hat{p}}$ for every $1 \le \hat{p} < \infty$. Since this is the case for every a.e. convergent subsequence of $h_k$ and since $\{h_k\}$ was arbitrary, we find that
$$|h^{-1} D(x, h)|_{L^{\hat{p}}} \to 0 \ \text{for every } 1 \le \hat{p} < \infty \ \text{as } |h|_{L^q(\Omega)} \to 0. \tag{8.3.5}$$
By the Hölder inequality we obtain
$$|D(x, h)|_{L^p} \le |h^{-1} D(x, h)|_{L^r}\,|h|_{L^q},$$
where $r = \frac{qp}{q - p}$ if $q < \infty$, and $r = p$ if $q = \infty$. Using (8.3.5) this implies Newton differentiability of $F$ at $x$.

To verify that $F$ is semismooth at $x \in L^q(\Omega)$, we shall show that
$$\frac{|V(x + h)h - \psi'(x; h)|_{L^p}}{|h|_{L^q}} \to 0 \ \text{as } |h|_{L^q} \to 0. \tag{8.3.6}$$
Let $\bar{D} : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ be given by $\bar{D}(s, v) = |V(s + v)v - \psi'(s; v)|$. Then by Theorem 8.2 (3) and Lipschitz continuity of $\psi$ we have
$$\lim_{v \to 0} v^{-1} \bar{D}(s, v) = 0 \ \text{and} \ \bar{D}(s, v) \le 2L|v| \ \text{for all } (s, v) \in \mathbb{R}^2.$$
The proof of (8.3.6) can now be completed in the same manner as that for Newton differentiability, by using Lebesgue's bounded convergence theorem. Semismoothness of $F : L^q(\Omega) \to L^p(\Omega)$ follows from (8.3.5), (8.3.6), and Lemma 8.11 (2). The class of mappings $F$ of this example was treated in [Ulb].

Example 8.13. Let $X$ be a Hilbert space. Then the norm functional $F(x) = |x|$ is Newton differentiable. In fact, let $G(x + h)h = \big( \frac{x + h}{|x + h|}, h \big)_X$ and $G(0)h = (\lambda, h)_X$ for some $\lambda \in X$. Then
$$|h|^{-1}\,|F(x + h) - F(x) - G(x + h)h| = |h|^{-1}\,\Big| \frac{(2(x + h), h)_X - |h|^2}{|x| + |x + h|} - \frac{(x + h, h)_X}{|x + h|} \Big| \to 0$$
as $h \to 0$. Hence $F$ is Newton differentiable on $X$. Moreover $F$ is semismooth.
Example 8.14. Let $F : L^q(\Omega) \to L^p(\Omega)$ denote the pointwise max operation $F(x) = \max(0, x)$ and define $G_m$ by
$$G_m(x)(s) = \begin{cases} 0 & \text{if } x(s) < 0, \\ \delta & \text{if } x(s) = 0, \\ 1 & \text{if } x(s) > 0, \end{cases}$$
where $\delta \in [0, 1]$. It follows from Example 8.12 that $F$ is semismooth from $L^q(\Omega)$ into $L^p(\Omega)$ provided that $1 \le p < q \le \infty$, with $G_m$ as N-derivative. It can also be argued that $G_m$ is an N-derivative for $F$ for any choice of $\delta \in \mathbb{R}$ (see [HiIK]). If $p = q$, then $F$ is directionally differentiable at every $x \in L^q(\Omega)$. In fact, for $h \in L^q(\Omega)$ define
$$F'(x; h)(s) = \begin{cases} 0 & \text{if } x(s) < 0 \text{ or } x(s) = 0,\ h(s) \le 0, \\ h(s) & \text{if } x(s) > 0 \text{ or } x(s) = 0,\ h(s) \ge 0. \end{cases}$$
Then we have
$$\Big| \frac{F(x(s) + t h(s)) - F(x(s))}{t} - F'(x; h)(s) \Big| \le 2|h(s)|$$
and
$$\lim_{t \to 0^+} \Big| \frac{F(x(s) + t h(s)) - F(x(s))}{t} - F'(x; h)(s) \Big| = 0 \quad \text{for a.e. } s \in \Omega.$$
By the Lebesgue dominated convergence theorem
$$\lim_{t \to 0^+} \Big| \frac{F(x + t h) - F(x)}{t} - F'(x; h) \Big|_{L^p} = 0,$$
i.e., $F'(x; h)$ is the directional derivative of $F$ at $x$.

However, $F$ is not Newton differentiable with $G_m$ as N-derivative in general. For this purpose consider $x = -|s|$ on $\Omega = (-1, 1)$ and choose $h_n(s)$ as $\frac{1}{n}$ multiplied by the characteristic function of the interval $(-\frac{1}{n}, \frac{1}{n})$. Then $|h_n|^p_{L^p} = \frac{2}{n^{p+1}}$ and
$$\int_{-1}^{1} |F(x + h_n) - F(x) - G_m(x + h_n)h_n|^p\,ds = \int_{-1/n}^{1/n} |x(s)|^p\,ds = \frac{2}{p + 1} \Big( \frac{1}{n} \Big)^{p+1}.$$
Thus,
$$\lim_{n \to \infty} \frac{|F(x + h_n) - F(x) - G_m(x + h_n)h_n|_{L^p}}{|h_n|_{L^p}} = \Big( \frac{1}{p + 1} \Big)^{1/p} \ne 0,$$
and hence condition (A) is not satisfied at $x$ for any $p \in [1, \infty)$. To consider the case $p = \infty$ we choose $\Omega = (0, 1)$ and show that (A) is not satisfied at $x(s) = s$. For this purpose define for $n = 2, \dots$
$$h_n(s) = \begin{cases} -(1 + \frac{1}{n})\,s & \text{on } (0, \frac{1}{n}], \\ (1 + \frac{1}{n})\,s - \frac{2}{n}(1 + \frac{1}{n}) & \text{on } (\frac{1}{n}, \frac{2}{n}], \\ 0 & \text{on } (\frac{2}{n}, 1]. \end{cases}$$
Observe that $E_n = \{s : x(s) + h_n(s) < 0\} \supset (0, \frac1n]$. Therefore
\[ \lim_{n\to\infty} \frac{1}{|h_n|_{L^\infty(0,1)}} |\max(0, x+h_n) - \max(0,x) - G_m(x+h_n)h_n|_{L^\infty(0,1)} = \lim_{n\to\infty} \frac{n^2}{n+1} |x|_{L^\infty(E_n)} \ge \lim_{n\to\infty} \frac{n}{n+1} = 1, \]
and hence (A) cannot be satisfied.

Lemma 8.15. Suppose that $H : D \subset X \to Y$ is continuously Fréchet differentiable at $x \in D$ and $\phi : Y \to Z$ is Newton differentiable at $H(x)$ with N-derivative $G$. Then $F = \phi(H)$ is Newton differentiable at $x$ with N-derivative $G(H(x+h))H'(x+h) \in L(X,Z)$ for $h$ sufficiently small.

Proof. Let $U$ be a convex neighborhood of $x$ in $D$ such that $H' \in L(X,Y)$ is continuous in $U$ and $H(U)$ is contained in $N(H(x))$, with $N(H(x))$ defined according to N-differentiability of $\phi$ at $H(x)$. Let $h \in X$ be such that $x + h \in U$ and note that
\[ H(x+h) = H(x) + \int_0^1 H'(x + \theta h)h\, d\theta, \tag{8.3.7} \]
where
\[ \left\| \int_0^1 H'(x + \theta h)\, d\theta - H'(x+h) \right\| \to 0 \quad \text{as } |h|_X \to 0, \tag{8.3.8} \]
since $H' \in L(X,Y)$ is continuous at $x$. By Newton differentiability of $\phi$ at $H(x)$ and (8.3.7) it follows that
\[ \lim_{|h|_X \to 0} \frac{1}{|h|_X} \left| \phi(H(x+h)) - \phi(H(x)) - G(H(x+h)) \int_0^1 H'(x+\theta h)h\, d\theta \right|_Z = 0. \]
From (8.3.8) we deduce that
\[ \lim_{|h|_X \to 0} \frac{1}{|h|_X} \left| \phi(H(x+h)) - \phi(H(x)) - G(H(x+h))H'(x+h)h \right|_Z = 0. \]
This implies Newton differentiability of $F = \phi(H)$ at $x$.

Theorem 8.16. Suppose that $x^*$ is a solution to $F(x) = 0$ and that $F$ is Newton differentiable at $x^*$ with N-derivative $G$. If $G$ is nonsingular for all $x \in N(x^*)$ and $\{\|G(x)^{-1}\| : x \in N(x^*)\}$ is bounded, then the Newton iteration
\[ x^{k+1} = x^k - G(x^k)^{-1} F(x^k) \]
converges superlinearly to $x^*$ provided that $|x^0 - x^*|$ is sufficiently small.

Proof. Note that the Newton iterates satisfy
\[ |x^{k+1} - x^*| \le \|G(x^k)^{-1}\|\, |F(x^k) - F(x^*) - G(x^k)(x^k - x^*)| \tag{8.3.9} \]
if $x^k \in N(x^*)$. Let $B(x^*, r)$ denote a ball of radius $r$ centered at $x^*$ contained in $N(x^*)$ and let $M$ be such that $\|G(x)^{-1}\| \le M$ for all $x \in B(x^*, r)$. We apply (A) with $x = x^*$. Let $\eta \in (0,1]$ be arbitrary. Then there exists $\rho \in (0,r)$ such that
\[ |F(x^* + h) - F(x^*) - G(x^* + h)h| < \frac{\eta}{M}|h| \le \frac{1}{M}|h| \tag{8.3.10} \]
for all $|h| < \rho$. Consequently, if we choose $x^0$ such that $|x^0 - x^*| < \rho$, then by induction from (8.3.9), (8.3.10) with $h = x^k - x^*$ we have $|x^{k+1} - x^*| < \rho$ and in particular $x^{k+1} \in B(x^*, \rho)$. It follows that the iterates are well defined. Moreover, since $\eta \in (0,1]$ is chosen arbitrarily, $x^k$ converges to $x^*$ superlinearly.

The following theorem provides conditions which guarantee, in an appropriate sense, global convergence of the Newton iteration [QiSu].

Theorem 8.17. Suppose that $F$ is continuous and directionally differentiable on the closed sphere $S = B(x^0, r)$. Assume also the existence of bounded operators $G(\cdot) \in L(X,Z)$ and constants $\beta, \gamma, \delta \in \mathbb R^+$ such that
\[ \|G(x)^{-1}\| \le \beta, \quad |G(x)(y-x) - F'(x; y-x)| \le \gamma |y-x|, \quad |F(y) - F(x) - F'(x; y-x)| \le \delta |y-x| \]
for all $x, y \in S$, where $\alpha = \beta(\gamma + \delta) < 1$, and $\beta |F(x^0)| \le r(1-\alpha)$. Then the iterates defined by
\[ x^{k+1} = x^k - G(x^k)^{-1} F(x^k) \quad \text{for } k = 0, 1, \ldots \]
remain in $S$ and converge to the unique solution $x^*$ of $F(x) = 0$ in $S$. Moreover, we have the error estimate
\[ |x^k - x^*| \le \frac{\alpha}{1-\alpha} |x^k - x^{k-1}|. \]

Proof. First note that $|x^1 - x^0| \le |G(x^0)^{-1} F(x^0)| \le \beta |F(x^0)| \le r(1-\alpha)$. Thus $x^1 \in S$. Suppose that $x^1, \ldots, x^k \in S$. Then
\[ \begin{aligned} |x^{k+1} - x^k| &\le |G(x^k)^{-1} F(x^k)| \le \beta |F(x^k)| \\ &\le \beta |F(x^k) - F(x^{k-1}) - F'(x^{k-1}; x^k - x^{k-1})| + \beta |G(x^{k-1})(x^k - x^{k-1}) - F'(x^{k-1}; x^k - x^{k-1})| \\ &\le \beta(\delta + \gamma) |x^k - x^{k-1}| = \alpha |x^k - x^{k-1}| \le \alpha^k |x^1 - x^0| \le r(1-\alpha)\alpha^k. \end{aligned} \tag{8.3.11} \]
Since
\[ |x^{k+1} - x^0| \le \sum_{j=0}^{k} |x^{j+1} - x^j| \le \sum_{j=0}^{k} r\alpha^j(1-\alpha) \le r, \]
we have $x^{k+1} \in S$ and by induction $x^k \in S$ for all $k$. For each $m > n$
\[ |x^m - x^n| \le \sum_{j=n}^{m-1} |x^{j+1} - x^j| \le \sum_{j=n}^{m-1} r\alpha^j(1-\alpha) \le r\alpha^n. \]
Hence $\{x^k\}$ is a Cauchy sequence in $S$ and there exists $\lim_{k\to\infty} x^k = x^* \in S$. By local Lipschitz continuity of $F$ and (8.3.11)
\[ |F(x^*)| = \lim |F(x^k)| \le \lim \alpha\beta^{-1} |x^k - x^{k-1}| = 0, \]
i.e., $F(x^*) = 0$. For $y^* \in S$ satisfying $F(y^*) = 0$ we find
\[ \begin{aligned} |y^* - x^*| &\le \beta |G(x^*)(y^* - x^*)| \\ &\le \beta |F(y^*) - F(x^*) - F'(x^*; y^* - x^*)| + \beta |G(x^*)(y^* - x^*) - F'(x^*; y^* - x^*)| \le \alpha |y^* - x^*|. \end{aligned} \]
This implies $y^* = x^*$ and hence $x^*$ is the unique solution to $F(x) = 0$ in $S$. Finally
\[ |x^m - x^k| \le \sum_{j=k}^{m-1} |x^{j+1} - x^j| \le \sum_{j=1}^{m-k} \alpha^j |x^k - x^{k-1}| \le \frac{\alpha}{1-\alpha} |x^k - x^{k-1}| \]
implies the asserted error estimate by letting $m \to \infty$.
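The a posteriori estimate of Theorem 8.17 can be observed on a scalar toy problem. The sketch below is an illustration under stated assumptions: the function $F(x) = 4x + \max(0,x) + \tfrac12 \sin x - 2$, the inexact operator $G$ (which ignores the smooth sine term), the starting point, and the hand-derived constants $\beta = 1/4$, $\gamma = 1/2$, $\delta = 2$ are all chosen here for illustration and do not come from the text.

```python
import math

def F(x):
    return 4.0 * x + max(0.0, x) + 0.5 * math.sin(x) - 2.0

def G(x):
    # Inexact generalized derivative: the term 0.5*cos(x) is deliberately
    # dropped, so G(x)h differs from F'(x; h) by at most gamma*|h|, gamma = 0.5.
    return 4.0 + (1.0 if x > 0 else 0.0)

# Hand-derived bounds for this toy F (assumptions of the sketch):
# |G(x)^{-1}| <= 1/4, gamma = 1/2, delta = 2 (crude global bound).
beta, gamma, delta = 0.25, 0.5, 2.0
alpha = beta * (gamma + delta)   # 0.625 < 1

xs = [-3.0]
for _ in range(60):
    xs.append(xs[-1] - F(xs[-1]) / G(xs[-1]))
x_star = xs[-1]                  # numerically converged iterate

# Compare the true error with the bound alpha/(1-alpha) * |x^k - x^{k-1}|
errors = [abs(xs[k] - x_star) for k in range(1, 11)]
bounds = [alpha / (1 - alpha) * abs(xs[k] - xs[k - 1]) for k in range(1, 11)]
for e, b in zip(errors, bounds):
    print(f"{e:.3e} <= {b:.3e}")
```

Because $G$ is only an approximation of the directional derivative, the convergence seen here is linear rather than superlinear, which is the situation Theorem 8.17 is designed to cover.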
8.4 The primal-dual active set method as a semismooth Newton method

Let us consider the complementarity problem in the unknowns $(x, \mu) \in L^2(\Omega) \times L^2(\Omega)$
\[ \begin{cases} Ax + \mu = a, \\ \mu = \max(0, \mu + c(x - \psi)), \end{cases} \tag{8.4.1} \]
where $A \in L(L^2(\Omega))$, with $\Omega$ a bounded domain in $\mathbb R^n$, $c > 0$, and $a \in L^p(\Omega)$, $\psi \in L^p(\Omega)$ for some $p > 2$. Recall that the second equation in (8.4.1) is equivalent to
\[ x \le \psi, \quad \mu \ge 0, \quad (\mu, x - \psi)_{L^2(\Omega)} = 0, \tag{8.4.2} \]
where the inequalities and the max operation are understood in the pointwise a.e. sense. In Example 7.1 of Chapter 7 it was shown that such problems arise in the context of constrained optimal control problems with $A$ of the form
\[ A = \alpha I + B^* (-\Delta)^{-2} B, \tag{8.4.3} \]
where $\alpha > 0$, $B$ is a bounded operator, and $\Delta$ denotes the Laplacian with homogeneous Dirichlet boundary conditions. This suggests and justifies our assumption that
\[ A = \alpha I + C, \quad \text{where } C \in L(L^2(\Omega), L^p(\Omega)) \text{ with } p > 2. \tag{8.4.4} \]
We shall show that the primal-dual active set method discussed in Chapter 7 is equivalent to the semismooth Newton method applied to
\[ 0 = F(x, \mu) = \begin{cases} Ax - a + \mu, \\ \mu - \max(0, \mu + c(x - \psi)). \end{cases} \tag{8.4.5} \]
For this purpose we define $G : L^p(\Omega) \to L^2(\Omega)$ by
\[ G(x)(s) = \begin{cases} 0 & \text{if } x(s) \le 0, \\ 1 & \text{if } x(s) > 0, \end{cases} \tag{8.4.6} \]
and we recall that the max function is Newton differentiable from $L^p(\Omega)$ to $L^2(\Omega)$ if $p > 2$ with $G$ as an N-derivative. A Newton step applied to the second equation in (8.4.5) results in
\[ \begin{aligned} c(x^{k+1} - x^k) + c(x^k - \psi) &= 0 && \text{in } \mathcal A_k = \{s : \mu^k(s) + c(x^k(s) - \psi(s)) > 0\}, \\ (\mu^{k+1} - \mu^k) + \mu^k &= 0 && \text{in } \mathcal I_k = \{s : \mu^k(s) + c(x^k(s) - \psi(s)) \le 0\}. \end{aligned} \]
Hence a Newton step for (8.4.5) is given by
\[ \begin{cases} Ax^{k+1} - a + \mu^{k+1} = 0, \\ x^{k+1} = \psi \quad \text{in } \mathcal A_k, \\ \mu^{k+1} = 0 \quad \text{in } \mathcal I_k. \end{cases} \tag{8.4.7} \]
This coincides with the primal-dual active set strategy of Section 7.1. To analyze its local convergence properties note that under assumption (8.4.4), equation (8.4.5) with $c = \alpha$ is equivalent to the reduced problem
\[ Ax - a + \max(0, -Cx + a - \alpha\psi) = 0. \tag{8.4.8} \]
Applying a semismooth Newton step to (8.4.8) results in
\[ A(x^{k+1} - x^k) - G(-Cx^k + a - \alpha\psi)\, C(x^{k+1} - x^k) + Ax^k - a + \max(0, -Cx^k + a - \alpha\psi) = 0. \tag{8.4.9} \]
Setting $\mu^{k+1} = a - Ax^{k+1}$ the iterates of (8.4.9) coincide with those of (8.4.7), provided that the initialization for the reduced iteration (8.4.9) is chosen such that $\mu^0 = a - Ax^0$. This follows from the fact that $\mu^k + \alpha(x^k - \psi) = -Cx^k + a - \alpha\psi$.
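Iteration (8.4.7) can be sketched in finite dimensions as follows. This is a minimal illustration, not the authors' implementation: the tridiagonal matrix, the data, the stopping rule (repetition of the active set), and the helper names `solve` and `pdas` are assumptions made for this example.

```python
def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    T = [row[:] + [b[i]] for i, row in enumerate(M)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(T[i][k]))
        T[k], T[p] = T[p], T[k]
        for i in range(k + 1, n):
            f = T[i][k] / T[k][k]
            for j in range(k, n + 1):
                T[i][j] -= f * T[k][j]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(T[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (T[i][n] - s) / T[i][i]
    return z

def pdas(A, a, psi, c=1.0, max_iter=50):
    """Primal-dual active set iteration (8.4.7) for A x + mu = a, x <= psi."""
    n = len(a)
    x, mu = [0.0] * n, [0.0] * n
    prev = None
    for _ in range(max_iter):
        act = [i for i in range(n) if mu[i] + c * (x[i] - psi[i]) > 0]
        if act == prev:          # active set repeated: iterates are stationary
            break
        inact = [i for i in range(n) if i not in act]
        # x = psi on the active set; solve the reduced system on the inactive set
        x = [psi[i] if i in act else 0.0 for i in range(n)]
        rhs = [a[i] - sum(A[i][j] * psi[j] for j in act) for i in inact]
        sub = [[A[i][j] for j in inact] for i in inact]
        for i, xi in zip(inact, solve(sub, rhs)):
            x[i] = xi
        # mu = a - A x on the active set, mu = 0 on the inactive set
        mu = [a[i] - sum(A[i][j] * x[j] for j in range(n)) if i in act else 0.0
              for i in range(n)]
        prev = act
    return x, mu, act

n = 6
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
x, mu, act = pdas(A, [1.0] * n, [4.0] * n)
print(x, mu, act)
```

For this data the unconstrained solution violates the bound in the middle of the index range, and the iteration settles on the active set of the two central indices after a few steps.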
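For any partition $\Omega = \mathcal A \cup \mathcal I$ into measurable sets $\mathcal A$ and $\mathcal I$ let $R_{\mathcal I} : L^2(\Omega) \to L^2(\mathcal I)$ denote the canonical restriction operator and $R_{\mathcal I}^* : L^2(\mathcal I) \to L^2(\Omega)$ its adjoint. Further set $A_{\mathcal I} = R_{\mathcal I} A R_{\mathcal I}^*$.

Proposition 8.18. Assume that (8.4.1) with $\psi$ and $a$ in $L^p(\Omega)$, $p > 2$, admits a solution $(x^*, \mu^*) \in L^2(\Omega) \times L^2(\Omega)$ and that (8.4.4) holds. If moreover
\[ \{A_{\mathcal I}^{-1} : \Omega = \mathcal I \cup \mathcal A\} \text{ is uniformly bounded in } L(L^2(\Omega)), \tag{8.4.10} \]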
then the iterates $(x^k, \mu^k)$ defined by (8.4.7) converge superlinearly in $L^2(\Omega) \times L^2(\Omega)$ to $(x^*, \mu^*)$, provided that $|x^* - x^0|$ is sufficiently small and $\mu^0 = a - Ax^0$. Moreover $(x^*, \mu^*) \in L^p(\Omega) \times L^p(\Omega)$.

Proof. First note that if (8.4.1) admits a solution $(x^*, \mu^*)$ for some $c > 0$, then it admits the same solution with $c = \alpha$ and $x^*$ is also a solution of (8.4.8). By (8.4.4) this implies that $x^* \in L^p(\Omega)$ and the first equation in (8.4.1) implies that $\mu^* \in L^p(\Omega)$. By Lemma 8.15 and Example 8.14 the mapping $\tilde F x = Ax - a + \max(0, -Cx + a - \alpha\psi)$ is Newton differentiable from $L^2(\Omega)$ into itself and $\tilde G(x) = A - G(-Cx + a - \alpha\psi)C$ is an N-derivative. By (8.4.10) the family of operators $\{\tilde G(x)^{-1} : x \in L^2(\Omega)\}$ is uniformly bounded in $L(L^2(\Omega))$, and hence by Theorem 8.16 the iterates $x^k$ converge superlinearly to $x^*$, provided that $|x^0 - x^*|$ is sufficiently small. Superlinear convergence of $\mu^k$ to $\mu^*$ follows from (8.4.5) and (8.4.7).

Note that (8.4.10) is satisfied for operators of the form given in (8.4.3). The characterization of inequalities by means of max and min operations in complementarity systems such as (8.4.2) is one of several possibilities. Another frequently used complementarity function is the Fischer–Burmeister function $\sqrt{(\psi - x)^2 + \mu^2} - (\psi - x) - \mu$. Numerical experiments appear to indicate that max-based complementarity functions can be more efficient numerically than the Fischer–Burmeister function; see, e.g., [Kan].

As shown in Section 7.5 a class of optimization problems with bilateral constraints $\varphi \le x \le \psi$ can be expressed as
\[ Ax - a + \mu = 0, \quad \mu = \max(0, \mu + c(x - \psi)) + \min(0, \mu + c(x - \varphi)) \tag{8.4.11} \]
or, equivalently, as
\[ Ax - a + \max(0, -Cx + a - \alpha\psi) + \min(0, -Cx + a - \alpha\varphi) = 0, \tag{8.4.12} \]
where $c = \alpha$ and $\varphi < \psi$ with $a$, $\varphi$, and $\psi$ in $L^p(\Omega)$. The primal-dual active set strategy applied to (8.4.11) can be expressed as
\[ \begin{cases} Ax^{k+1} - a + \mu^{k+1} = 0, \\ x^{k+1} = \psi \quad \text{in } \mathcal A_k^+ = \{s : \mu^k(s) + c(x^k(s) - \psi(s)) > 0\}, \\ \mu^{k+1} = 0 \quad \text{in } \mathcal I_k = \{s : \mu^k(s) + c(x^k(s) - \psi(s)) \le 0 \le \mu^k(s) + c(x^k(s) - \varphi(s))\}, \\ x^{k+1} = \varphi \quad \text{in } \mathcal A_k^- = \{s : \mu^k(s) + c(x^k(s) - \varphi(s)) < 0\}. \end{cases} \tag{8.4.13} \]
In the bilateral case, the choice of $c$ influences the likelihood of switches from active-above to active-below in one iteration. If, for example, $s \in \mathcal A_k^+$, then $\mu^{k+1}(s) + c(x^{k+1}(s) - \varphi(s)) = \mu^{k+1}(s) + c(\psi(s) - \varphi(s))$ is more likely to be negative, and hence $s \in \mathcal A_{k+1}^-$, if $c$ is small, since $\psi - \varphi > 0$. For $c = \alpha$, iteration (8.4.13) is equivalent to applying the semismooth Newton method to (8.4.12) with $G$ as N-derivative for the max function and
\[ \tilde G(x)(s) = \begin{cases} 0 & \text{if } x(s) \ge 0, \\ 1 & \text{if } x(s) < 0 \end{cases} \]
as N-derivative for the min function. Under the assumptions of Proposition 8.18 local superlinear convergence of the iteration (8.4.13) can be shown.
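The bilateral iteration (8.4.13) admits the same kind of finite-dimensional sketch. Again the matrix, the data, the stopping rule, and the names `solve` and `pdas_bilateral` are illustrative assumptions; the inactive set is taken as the complement of the two active sets.

```python
def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    T = [row[:] + [b[i]] for i, row in enumerate(M)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(T[i][k]))
        T[k], T[p] = T[p], T[k]
        for i in range(k + 1, n):
            f = T[i][k] / T[k][k]
            for j in range(k, n + 1):
                T[i][j] -= f * T[k][j]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(T[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (T[i][n] - s) / T[i][i]
    return z

def pdas_bilateral(A, a, phi, psi, c=1.0, max_iter=50):
    """Active set iteration (8.4.13) for A x + mu = a, phi <= x <= psi."""
    n = len(a)
    x, mu = [0.0] * n, [0.0] * n
    prev = None
    for _ in range(max_iter):
        up = [i for i in range(n) if mu[i] + c * (x[i] - psi[i]) > 0]  # A_k^+
        lo = [i for i in range(n) if mu[i] + c * (x[i] - phi[i]) < 0]  # A_k^-
        if (up, lo) == prev:
            break
        fixed = {i: psi[i] for i in up}
        fixed.update({i: phi[i] for i in lo})
        free = [i for i in range(n) if i not in fixed]                 # I_k
        rhs = [a[i] - sum(A[i][j] * v for j, v in fixed.items()) for i in free]
        sol = solve([[A[i][j] for j in free] for i in free], rhs)
        x = [fixed.get(i, 0.0) for i in range(n)]
        for i, xi in zip(free, sol):
            x[i] = xi
        mu = [a[i] - sum(A[i][j] * x[j] for j in range(n)) if i in fixed else 0.0
              for i in range(n)]
        prev = (up, lo)
    return x, mu, up, lo

n = 4
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
x, mu, up, lo = pdas_bilateral(A, [3.0, 0.0, 0.0, -3.0], [-1.0] * n, [1.0] * n)
print(x, mu, up, lo)
```

In the result the multiplier is positive where the upper bound is active and negative where the lower bound is active, matching the sign structure of the max and min terms in (8.4.11).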
8.5 Semismooth Newton methods for a class of nonlinear complementarity problems

In this section we consider nonlinear complementarity problems in the Hilbert space $X = L^2(\Omega)$. Let $g$ be a continuous mapping from $L^2(\Omega)$ into itself with $\Omega$ a bounded domain in $\mathbb R^n$, and consider
\[ g(x) + \mu = 0, \quad \mu \ge 0, \quad x \le \psi, \quad \text{and} \quad (\mu, x - \psi) = 0, \tag{8.5.1} \]
where $\psi \in L^2(\Omega)$. As discussed in Chapter 7, (8.5.1) can be expressed equivalently as
\[ 0 = F(x, \mu) = \begin{cases} g(x) + \mu, \\ \mu - \max(0, \mu + c(x - \psi)), \end{cases} \tag{8.5.2} \]
for any $c > 0$, where max denotes the pointwise max operation. If $J$ is a continuously differentiable functional on $X$, then (8.5.1) with $g = J'$ is the necessary optimality condition for
\[ \min_{x \in L^2(\Omega)} J(x) \quad \text{subject to } x \le \psi. \tag{8.5.3} \]
We shall employ semismooth Newton methods for solving $F(x, \mu) = 0$. Let $G(x)$ be as defined in Section 8.4. Applying a primal-dual active set strategy to the second equation in (8.5.2) results in
\[ \begin{cases} g(x^{k+1}) + \mu^{k+1} = 0, \\ x^{k+1} = \psi \quad \text{in } \mathcal A_k = \{s : \mu^k(s) + c(x^k(s) - \psi(s)) > 0\}, \\ \mu^{k+1} = 0 \quad \text{in } \mathcal I_k = \{s : \mu^k(s) + c(x^k(s) - \psi(s)) \le 0\} \end{cases} \tag{8.5.4} \]
for $k = 0, 1, \ldots$. To investigate the convergence properties of (8.5.4) we note that (8.5.2) is equivalent to
\[ g(x) + \max(0, -g(x) + c(x - \psi)) = 0, \tag{8.5.5} \]
and the iteration (8.5.4) is equivalent to the reduced iteration
\[ g(x^{k+1}) + G(-g(x^k) + c(x^k - \psi)) \bigl( -g(x^{k+1}) + c(x^{k+1} - \psi) - (-g(x^k) + c(x^k - \psi)) \bigr) + \max(0, -g(x^k) + c(x^k - \psi)) = 0, \tag{8.5.6} \]
provided that the initialization for (8.5.4) is chosen such that $\mu^0 = -g(x^0)$. Note that (8.5.6) can be considered as a partial semismooth Newton iteration. Separating the generalized differentiation of the max operation from that of the linearization of the nonlinearity $g$ was investigated numerically in [dReKu, GrVo].

Proposition 8.19. Assume that (8.5.2) admits a solution $x^*$, that $x \to g(x) - c(x - \psi)$ is Lipschitz continuous from $L^2(\Omega)$ to $L^p(\Omega)$ in a neighborhood $U$ of $x^*$ for some $c > 0$ and $p > 2$, and that there exists $\alpha > 0$ such that
\[ \alpha |x - y|^2_{L^2(\mathcal I)} \le \int_{\mathcal I} (g(x) - g(y))(s)\,(x - y)(s)\, ds \tag{8.5.7} \]
for all partitions $\Omega = \mathcal A \cup \mathcal I$ and $x, y \in L^2(\Omega)$. Then the iterates $(x^k, \mu^k)$ defined by (8.5.4) converge superlinearly to $(x^*, \mu^*)$, provided that $|x^* - x^0|$ is sufficiently small and $\mu^0 = -g(x^0)$.

Proof. From (8.5.5) and (8.5.6) we have
\[ \begin{aligned} &g(x^{k+1}) - g(x^*) + G(z^k)\bigl(-g(x^{k+1}) + c(x^{k+1} - \psi) + g(x^*) - c(x^* - \psi)\bigr) \\ &\quad = G(z^k)\bigl(-g(x^k) + c(x^k - \psi) + g(x^*) - c(x^* - \psi)\bigr) + \max(0, -g(x^*) + c(x^* - \psi)) - \max(0, -g(x^k) + c(x^k - \psi)), \end{aligned} \]
where $z^k = -g(x^k) + c(x^k - \psi)$. Taking the inner product in $L^2(\Omega)$ with $x^{k+1} - x^*$, using (8.5.7) to estimate the left-hand side from below, and Example 8.14 and Lipschitz continuity of $x \to g(x) - c(x - \psi)$ from $L^2(\Omega)$ to $L^p(\Omega)$ to bound the right-hand side from above we find
\[ \min(c, \alpha)\, |x^{k+1} - x^*|_{L^2(\Omega)} \le o(|x^k - x^*|_{L^2(\Omega)}). \]
Finally (8.5.2) and (8.5.4) imply that $\mu^{k+1} - \mu^* = g(x^*) - g(x^{k+1})$, and hence superlinear convergence of $\mu^k$ to $\mu^*$ follows from superlinear convergence of $x^k$ to $x^*$.

A full semismooth Newton step applied to (8.5.5) is given by
\[ g'(x^k)(x^{k+1} - x^k) + G(-g(x^k) + c(x^k - \psi)) \bigl( -g'(x^k)(x^{k+1} - x^k) + c(x^{k+1} - x^k) \bigr) + g(x^k) + \max(0, -g(x^k) + c(x^k - \psi)) = 0. \tag{8.5.8} \]
It can be equivalently expressed as
\[ \begin{cases} g'(x^k)(x^{k+1} - x^k) + g(x^k) + \mu^{k+1} = 0, \\ x^{k+1} = \psi \quad \text{in } \mathcal A_k = \{s : -g(x^k)(s) + c(x^k(s) - \psi(s)) > 0\}, \\ \mu^{k+1} = 0 \quad \text{in } \mathcal I_k = \{s : -g(x^k)(s) + c(x^k(s) - \psi(s)) \le 0\}. \end{cases} \tag{8.5.9} \]

Remark 8.5.1. Note that, differently from the linear case considered in Section 8.4, if we apply a full semismooth Newton step to (8.5.2) rather than to the reduced equation (8.5.5), then the resulting algorithm differs in the update of the active/inactive sets. Applying a semismooth Newton step to (8.5.2) results in
\[ \mathcal A_k = \{s : \mu^k(s) + c(x^k(s) - \psi(s)) > 0\} = \{s : -g(x^{k-1})(s) - (g'(x^{k-1})(x^k - x^{k-1}))(s) + c(x^k(s) - \psi(s)) > 0\}. \]

To investigate local convergence of (8.5.8) we denote, for any partition $\Omega = \mathcal A \cup \mathcal I$ into measurable sets $\mathcal I$ and $\mathcal A$, by $R_{\mathcal I} : L^2(\Omega) \to L^2(\mathcal I)$ the canonical restriction operator and by $R_{\mathcal I}^* : L^2(\mathcal I) \to L^2(\Omega)$ its adjoint. Further we set $g'(x)_{\mathcal I} = R_{\mathcal I}\, g'(x)\, R_{\mathcal I}^*$.

Proposition 8.20. Assume that (8.5.2) admits a solution $x^*$, that $x \to g(x) - c(x - \psi)$ is a $C^1$ function from $L^2(\Omega)$ to $L^p(\Omega)$ in a neighborhood $U$ of $x^*$ for some $c > 0$ and $p > 2$, and that
\[ \{g'(x)_{\mathcal I}^{-1} \in L(L^2(\mathcal I)) : x \in U,\ \Omega = \mathcal A \cup \mathcal I\} \text{ is uniformly bounded.} \]
Then the iterates $x^k$ defined by (8.5.8) converge superlinearly to $x^*$, provided that $|x^* - x^0|$ is sufficiently small.

Proof. By Lemma 8.15 and Example 8.14 the mapping $x \to \max(0, -g(x) + c(x - \psi))$ is Newton differentiable from $L^2(\Omega)$ into itself and $G(-g(x) + c(x - \psi))(-g'(x) + cI)$ is an N-derivative. Moreover $g'(x) + G(-g(x) + c(x - \psi))(-g'(x) + cI)$ is invertible in $L(L^2(\Omega))$ with uniformly bounded inverses for $x \in U$. Setting
\[ z = -g(x) + c(x - \psi), \quad \mathcal A = \{z > 0\}, \quad \mathcal I = \Omega \setminus \mathcal A, \quad h_{\mathcal I} = \chi_{\mathcal I} h, \quad h_{\mathcal A} = \chi_{\mathcal A} h, \]
this follows from the fact that for given $f \in L^2(\Omega)$ the solution to the equation
\[ g'(x)h + G(z)(-g'(x)h + ch) = f \]
is given by
\[ c\, h_{\mathcal A} = f_{\mathcal A} \quad \text{and} \quad h_{\mathcal I} = g'(x)_{\mathcal I}^{-1} \Bigl( f_{\mathcal I} - \frac1c \chi_{\mathcal I}\, g'(x) f_{\mathcal A} \Bigr). \]
From Theorem 8.16 we conclude that $x^k \to x^*$ superlinearly, provided that $|x^* - x^0|$ is sufficiently small.
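The full semismooth Newton step in the form (8.5.9) is easy to sketch when $g$ acts componentwise, because the linearized system then decouples. The example below assumes $g(x)_i = x_i + x_i^3 - a_i$ in $\mathbb R^3$; this nonlinearity, the data, the tolerance, and the function name `ssn_ncp` are illustrative assumptions and not taken from the text.

```python
def ssn_ncp(a, psi, c=1.0, tol=1e-12, max_iter=50):
    """Semismooth Newton step (8.5.9) for g(x) + mu = 0, x <= psi, mu >= 0,
    with the componentwise toy nonlinearity g(x)_i = x_i + x_i**3 - a_i."""
    n = len(a)
    x, mu = [0.0] * n, [0.0] * n
    for _ in range(max_iter):
        gx = [x[i] + x[i] ** 3 - a[i] for i in range(n)]    # g(x^k)
        dgx = [1.0 + 3.0 * x[i] ** 2 for i in range(n)]     # g'(x^k), diagonal
        x_new, mu = [], []
        for i in range(n):
            if -gx[i] + c * (x[i] - psi[i]) > 0:            # index in A_k
                xi = psi[i]
                mu.append(-(gx[i] + dgx[i] * (xi - x[i])))  # from first line of (8.5.9)
            else:                                           # index in I_k
                xi = x[i] - gx[i] / dgx[i]                  # plain Newton step
                mu.append(0.0)
            x_new.append(xi)
        step = max(abs(u - v) for u, v in zip(x_new, x))
        x = x_new
        if step < tol:
            break
    return x, mu

x, mu = ssn_ncp([0.5, 1.0, 10.0], [1.0, 1.0, 1.0])
print(x, mu)
```

In this run the first two components stay inactive and converge to the roots of $x + x^3 = a_i$, while the third is clamped to $\psi$ with multiplier $-g(\psi)$.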
8.6 Semismooth Newton methods and regularization
In this section we discuss the case where the operator $A$ of Section 8.4 is not a continuous but only a closed operator in $X = L^2(\Omega)$. Let $V$ denote a real Hilbert space that is densely and continuously embedded into $X$ and let $V^*$ denote its dual. For $\psi \in V$ let $C = \{y \in V : y \le \psi\}$. We identify $X^*$ with $X$ and thus we have the Gel'fand triple framework: $V \subset X = X^* \subset V^*$. Let $a : V \times V \to \mathbb R$ be a bounded coercive bilinear form, i.e., for $M, \nu > 0$
\[ |a(y,v)| \le M |y|_V |v|_V \text{ for all } y, v \in V, \quad a(v,v) \ge \nu |v|_V^2 \text{ for all } v \in V. \]
For given $f \in V^*$ consider the variational inequality for $y \in C$:
\[ a(y, v - y) - \langle f, v - y \rangle \ge 0 \quad \text{for all } v \in C. \tag{8.6.1} \]
The existence of solutions to (8.6.1) is established in Theorem 8.26. The solution to (8.6.1) is unique. In fact if $\tilde y \in C$ is another solution to (8.6.1), we have
\[ a(y, \tilde y - y) - \langle f, \tilde y - y \rangle \ge 0, \quad a(\tilde y, y - \tilde y) - \langle f, y - \tilde y \rangle \ge 0. \]
Summing these inequalities, we have $a(y - \tilde y, y - \tilde y) \le 0$ and thus $\tilde y = y$. If $a$ is symmetric, then (8.6.1) is the necessary and sufficient optimality condition for the constrained minimization problem in $V$
\[ \min\ \tfrac12 a(y,y) - \langle f, y \rangle \quad \text{over } y \in C. \tag{8.6.2} \]
Let us define $A \in L(V, V^*)$ by
\[ \langle Ay, v \rangle_{V^* \times V} = a(y,v) \quad \text{for } y, v \in V. \]
We note that (8.6.2) is equivalent to
\[ Ay + \mu = f, \quad y \le \psi, \quad \mu \in C^+, \quad \langle \mu, y - \psi \rangle_{V^*, V} = 0, \tag{8.6.3} \]
where the first equality holds in $V^*$ and $C^+ = \{\mu \in V^* : \langle \mu, v \rangle_{V^*, V} \le 0 \text{ for all } v \le 0\}$. If $\mu$ (or equivalently $y$) has extra regularity in the sense that $\mu \in L^2(\Omega)$, then (8.6.3) is equivalent to
\[ \begin{cases} Ay + \mu = f, \\ \mu = \max(0, \mu + c(y - \psi)) \end{cases} \tag{8.6.4} \]
for each $c > 0$.
Example 8.21. For the obstacle problem in its simplest form we choose $V = H_0^1(\Omega)$ and $a(v,w) = \int_\Omega \nabla v \cdot \nabla w\, dx$. In general, the unique solution $(y, \mu)$ is in $L^2(\Omega) \times H^{-1}(\Omega)$. If $\partial\Omega$ and $\psi$ are sufficiently regular, then $(y, \mu) \in (H_0^1(\Omega) \cap H^2(\Omega)) \times L^2(\Omega)$ and (8.6.3) is equivalent to (8.6.4).

Example 8.22. Consider the state-constrained optimal control problem with $\beta > 0$ and $\bar y \in L^2(\Omega)$
\[ \min_{u \in L^2(\Omega)} \tfrac12 |y - \bar y|^2_{L^2(\Omega)} + \tfrac{\beta}{2} |u|^2_{L^2(\Omega)} \quad \text{subject to } Ey = u \text{ and } y \in C, \tag{8.6.5} \]
where $E$ is a closed linear operator in $L^2(\Omega)$. We assume that $E^{-1}$ exists and set $V = \mathrm{dom}(E)$, where $\mathrm{dom}(E)$ is endowed with the graph norm of $E$. For $E = -\Delta$ with homogeneous Dirichlet boundary conditions, we have $\mathrm{dom}\, E = H_0^1(\Omega) \cap H^2(\Omega)$. The necessary and sufficient optimality condition for (8.6.5) is given by
\[ Ey = u, \quad y \in C, \quad \beta u = p, \quad \langle E^* p + y - \bar y, v - y \rangle_{V^* \times V} \ge 0 \quad \text{for all } v \in C, \tag{8.6.6} \]
where $E^* : X \to V^*$ denotes the conjugate of $E$ as operator from $V$ to $X$. This optimality system can be written as (8.6.1) with $f = \bar y$ and $a(v,w) = \beta (Ev, Ew)_X + (v,w)$ for $v, w \in V$. It can also be written in the form (8.6.3) with $\mu \in V^*$, but differently from Example 8.21, $\mu \notin L^2(\Omega)$ in general; see [BeK3]. In the case of mixed constraints, i.e., when pointwise constraints on the controls and the state are present, it is natural to treat the control constraints with the active set methods presented in Chapter 7 and to relax the state constraints by the technique explained in this section.

Thus the obstacle and the state-constrained optimal control problem differ in that for the former, under weak regularity conditions on the problem data, the Lagrange multiplier is in $L^2(\Omega)$ and complementarity can be expressed in the form of (8.6.4), whereas for the latter this is not the case. Before we can address the applicability of the primal-dual active set strategy or semismooth Newton methods to such problems we also need to consider the candidates for the iterates. In the case of the obstacle problem these are given formally by
\[ \begin{cases} a(y^{k+1}, v) + (\mu^{k+1}, v)_{L^2(\Omega)} = (f, v)_{L^2(\Omega)} \quad \text{for all } v \in H_0^1(\Omega), \\ y^{k+1} = \psi \quad \text{in } \mathcal A_k = \{x : \mu^k(x) + c(y^k(x) - \psi(x)) > 0\}, \\ \mu^{k+1} = 0 \quad \text{in } \mathcal I_k = \{x : \mu^k(x) + c(y^k(x) - \psi(x)) \le 0\}. \end{cases} \tag{8.6.7} \]
This is the optimality condition for
\[ \min\ \tfrac12 \int_\Omega |\nabla y|^2\, dx - (f, y)_{L^2(\Omega)} \quad \text{over } y \in H_0^1(\Omega) \text{ with } y = \psi \text{ on } \mathcal A_k, \]
for which the Lagrange multiplier $\mu^{k+1}$ is not in $L^2(\Omega)$, and hence an update of $\mathcal A_k$ as in (8.6.7) is not feasible. Alternatively, considering the possibility of applying a semismooth Newton approach we observe that the reduced form of (8.6.4) is given by
\[ \mu = \max(0, \mu + c(A^{-1}(f - \mu) - \psi)), \]
so that no smoothing of $\mu$ under the max operation occurs. In order to remedy these difficulties related to the regularity properties of the Lagrange multiplier we consider a one-parameter family of regularized problems based on smoothing of the complementarity condition given by
\[ \mu = \alpha \max(0, \mu + c(y - \psi)), \quad \text{with } 0 < \alpha < 1. \tag{8.6.8} \]
This is a relaxation of the second equation in (8.6.4) with $\alpha$ as a continuation parameter. Note that an update for $\mu$ based on (8.6.8) results in $\mu \in L^2(\Omega)$. Equation (8.6.8) is equivalent to
\[ \mu = \max\Bigl(0, \frac{c\alpha}{1-\alpha}(y - \psi)\Bigr), \tag{8.6.9} \]
with $c\alpha/(1-\alpha)$ ranging in $(0, \infty)$ for $\alpha \in (0,1)$. We shall use a generalization of (8.6.8) and introduce an additional shift parameter $\bar\mu \in L^2(\Omega)$ into (8.6.9). Moreover we replace $c\alpha/(1-\alpha)$ by $c$ and arrive at
\[ \mu = \max(0, \bar\mu + c(y - \psi)), \quad \text{with } c \in (0, \infty). \tag{8.6.10} \]
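The equivalence of (8.6.8) and (8.6.9) can be verified pointwise. The following short derivation is a sketch that distinguishes the cases $\mu > 0$ and $\mu = 0$ at a given point, using only $0 < \alpha < 1$ and $c > 0$; negative values of $\mu$ are excluded by either equation.

```latex
% Pointwise a.e., with 0 < \alpha < 1 and c > 0:
\begin{align*}
\mu > 0:\quad & \mu = \alpha\bigl(\mu + c(y-\psi)\bigr)
   \iff (1-\alpha)\mu = \alpha c\,(y-\psi)
   \iff \mu = \tfrac{c\alpha}{1-\alpha}(y-\psi),
   \text{ which forces } y > \psi, \\
\mu = 0:\quad & 0 = \alpha \max\bigl(0, c(y-\psi)\bigr)
   \iff y \le \psi
   \iff \max\bigl(0, \tfrac{c\alpha}{1-\alpha}(y-\psi)\bigr) = 0 .
\end{align*}
% In both cases \mu = \max(0, \tfrac{c\alpha}{1-\alpha}(y-\psi)), which is (8.6.9).
```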
This is exactly the same as the generalized Yosida–Moreau approximation for inequality constraints discussed in Chapter 4.7 and is related to augmented Lagrangian methods as described in Chapter 4.6. Utilizing this regularization in (8.6.4) results in
\[ \begin{cases} Ay + \mu = f, \\ \mu = \max(0, \bar\mu + c(y - \psi)). \end{cases} \tag{8.6.11} \]
Note that for each $c > 0$
\[ y \to \max(0, \bar\mu + c(y - \psi)) \]
is Lipschitz continuous and monotone from $X$ to $X$. Thus existence of a unique $y_c \in V$ satisfying
\[ Ay + \max(0, \bar\mu + c(y - \psi)) = f \]
follows by monotone operator techniques; see, e.g., the discussion above Theorem 4.43. This implies the existence of a unique solution $(y_c, \mu_c) \in V \times L^2(\Omega)$ to (8.6.11). Convergence as $c \to \infty$ will be analyzed at the end of this section.

The semismooth Newton iteration for (8.6.11) is discussed next.

Semismooth Newton Algorithm with Regularization.
(i) Choose $\bar\mu$, $c$, $y^0$, set $k = 0$.
(ii) Set $\mathcal A_k = \{x : (\bar\mu + c(y^k - \psi))(x) > 0\}$, $\mathcal I_k = \Omega \setminus \mathcal A_k$.
(iii) Solve for $y^{k+1} \in V$: $a(y, v) + (\bar\mu + c(y - \psi), \chi_{\mathcal A_k} v)_{L^2(\Omega)} = (f, v)_{L^2(\Omega)}$ for all $v \in V$.
(iv) Set $\mu^{k+1} = 0$ on $\mathcal I_k$ and $\mu^{k+1} = \bar\mu + c(y^{k+1} - \psi)$ on $\mathcal A_k$.
(v) Stop, or set $k = k + 1$, goto (ii).

The iterates $\mu^k$ as assigned in step (iv) are not necessary for the algorithm, but they will be useful in the convergence analysis. The practical relevance of $\bar\mu$ is given by the fact that for certain problems a proper choice can guarantee feasibility of the iterates, i.e., $y^k \le \psi$, as will be shown next.

Theorem 8.23. Assume that $a(\phi, \phi^+) \le 0$, for $\phi \in V$, implies that $\phi^+ = 0$. Then if $\bar\mu \ge (f - A\psi)^+$ in the sense that
\[ (\bar\mu, \phi) \ge \langle f, \phi \rangle - a(\psi, \phi) \tag{8.6.12} \]
for all $\phi \in C$, the solution to (8.6.11) is feasible, i.e., $y_c \le \psi$.

Proof. Since for $\phi = (y_c - \psi)^+$
\[ (\max(0, \bar\mu + c(y_c - \psi)), \phi) \ge (\bar\mu, \phi), \]
it follows from (8.6.11) and (8.6.12) that
\[ a(y_c - \psi, \phi) \le \langle f, \phi \rangle - a(\psi, \phi) - (\bar\mu, \phi) \le 0. \]
This implies that $\phi = 0$ and thus $y_c \le \psi$.

Theorem 8.24. If $a(\phi, \phi^+) \le 0$, for $\phi \in V$, implies that $\phi^+ = 0$, then the iterates of the semismooth Newton algorithm with regularization satisfy $y^{k+1} \le y^k$ for all $k \ge 1$.

Proof. Let $\phi = (y^{k+1} - y^k)^+$ and observe that
\[ a(y^{k+1} - y^k, \phi) + a(y^k, \phi) - \langle f, \phi \rangle + (\bar\mu + c(y^k - \psi), \phi \chi_{\mathcal A_k}) + c(y^{k+1} - y^k, \phi \chi_{\mathcal A_k}) = 0. \]
We have
\[ \begin{aligned} a(y^k, \phi) - \langle f, \phi \rangle + (\bar\mu + c(y^k - \psi), \phi \chi_{\mathcal A_k}) &= -(\bar\mu + c(y^k - \psi), \phi \chi_{\mathcal A_{k-1}}) + (\bar\mu + c(y^k - \psi), \phi \chi_{\mathcal A_k}) \\ &= (\bar\mu + c(y^k - \psi), \phi \chi_{\mathcal A_k \cap \mathcal I_{k-1}}) - (\bar\mu + c(y^k - \psi), \phi \chi_{\mathcal I_k \cap \mathcal A_{k-1}}) \ge 0. \end{aligned} \]
This implies that
\[ a(y^{k+1} - y^k, \phi) + c(y^{k+1} - y^k, \phi \chi_{\mathcal A_k}) \le 0; \]
thus, $a(y^{k+1} - y^k, \phi) \le 0$ and hence $\phi = 0$.

Let us stress that in a finite-dimensional or discretized setting with $L^2(\Omega)$ replaced by $\mathbb R^n$ the need for regularization is not apparent, since the lack of regularity, as exhibited in Examples 8.21 and 8.22 and in the discussion of the iterative step of the semismooth Newton algorithm, does not exist. If the finite-dimensional problems arise from discretization of continuous ones, then the regularity of the Lagrange multipliers, however, certainly influences the convergence and approximation properties. It will typically lead to meshsize-dependent behavior of the semismooth Newton algorithm.

We turn to convergence of the semismooth Newton algorithm with regularization. Recall that $A^{-1} \in L(V^*, V)$. Below we shall also denote the restriction of $A^{-1}$ to $L^2(\Omega)$ by the same symbol.
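Steps (i) to (v) of the regularized algorithm, together with the monotonicity of Theorem 8.24, can be sketched for a one-dimensional obstacle problem. The discretization below (finite differences for $-y'' = f$ on $(0,1)$ with zero boundary values, constant $f$ and $\psi$, $\bar\mu = 0$, $c = 10^4$) and the names `thomas` and `regularized_ssn` are assumptions made for illustration only.

```python
def thomas(lo, di, up, rhs):
    """Solve a tridiagonal system; lo[i], di[i], up[i] are the sub-, main and
    super-diagonal entries of row i (lo[0] and up[-1] are unused)."""
    n = len(di)
    d, r = di[:], rhs[:]
    for i in range(1, n):
        w = lo[i] / d[i - 1]
        d[i] -= w * up[i - 1]
        r[i] -= w * r[i - 1]
    y = [0.0] * n
    y[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        y[i] = (r[i] - up[i] * y[i + 1]) / d[i]
    return y

def regularized_ssn(n=49, c=1e4, f_val=20.0, psi_val=0.5, mu_bar=0.0, max_iter=100):
    h = 1.0 / (n + 1)
    off = -1.0 / h ** 2
    y = [0.0] * n                                     # step (i): y^0 = 0
    history, prev = [], None
    for _ in range(max_iter):
        # step (ii): active set A_k
        act = [mu_bar + c * (y[i] - psi_val) > 0 for i in range(n)]
        if act == prev:                               # step (v): stop if A_k repeats
            break
        # step (iii): (A + c*chi_Ak) y = f - chi_Ak*(mu_bar - c*psi)
        di = [2.0 / h ** 2 + (c if act[i] else 0.0) for i in range(n)]
        rhs = [f_val - (mu_bar - c * psi_val if act[i] else 0.0) for i in range(n)]
        y = thomas([off] * n, di, [off] * n, rhs)
        history.append(y)
        prev = act
    # step (iv) at termination: mu = max(0, mu_bar + c*(y - psi))
    mu = [max(0.0, mu_bar + c * (y[i] - psi_val)) for i in range(n)]
    return y, mu, history, h

y, mu, history, h = regularized_ssn()
residual = max(
    abs((2 * y[i] - (y[i - 1] if i > 0 else 0.0)
         - (y[i + 1] if i < len(y) - 1 else 0.0)) / h ** 2 + mu[i] - 20.0)
    for i in range(len(y)))
print(len(history), max(y) - 0.5, residual)
```

With $\bar\mu = 0$ the computed state is slightly infeasible, and the violation $\max(y) - \psi$ is bounded by $f/c$ for this discretization, which illustrates why the limit $c \to \infty$ is of interest.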
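Theorem 8.25. Assume that $\bar\mu - c\psi \in L^p(\Omega)$ and $A^{-1} \in L(L^2(\Omega), L^p(\Omega))$ for some $p > 2$. If $\mu^0 \in L^2(\Omega)$ and $|\mu^0 - \mu_c|_{L^2(\Omega)}$ is sufficiently small, then $(y^k, \mu^k) \to (y_c, \mu_c)$ superlinearly in $V \times L^2(\Omega)$.

Proof. First we show superlinear convergence of $\mu^k$ to $\mu_c$ by applying Theorem 8.16 to $F : L^2(\Omega) \to L^2(\Omega)$ defined by
\[ F(\mu) = \mu - \max(0, \bar\mu + c(A^{-1}(f - \mu) - \psi)). \]
By Example 8.14 and Lemma 8.15 the mapping $F$ is Newton differentiable with N-derivative given by
\[ \tilde G(\mu) = I + c\, G(\bar\mu + c(A^{-1}(f - \mu) - \psi)) A^{-1}, \]
where $G$ was defined in (8.4.6). The proof will be completed by showing that $\tilde G(\mu)$ has uniformly bounded inverses in $L(L^2(\Omega))$ for $\mu \in L^2(\Omega)$. We define
\[ \mathcal A = \{x : (\bar\mu + c(A^{-1}(f - \mu) - \psi))(x) > 0\}, \quad \mathcal I = \Omega \setminus \mathcal A. \]
Further, let $R_{\mathcal A} : L^2(\Omega) \to L^2(\mathcal A)$ and $R_{\mathcal I} : L^2(\Omega) \to L^2(\mathcal I)$ denote the restriction operators to $\mathcal A$ and $\mathcal I$. Their adjoints $R_{\mathcal A}^* : L^2(\mathcal A) \to L^2(\Omega)$ and $R_{\mathcal I}^* : L^2(\mathcal I) \to L^2(\Omega)$ are the extension-by-zero operators from $\mathcal A$ and $\mathcal I$ to $\Omega$, respectively. The mapping $(R_{\mathcal A}, R_{\mathcal I}) : L^2(\Omega) \to L^2(\mathcal A) \times L^2(\mathcal I)$ determines an isometric isomorphism and every $\mu \in L^2(\Omega)$ can uniquely be expressed as $(R_{\mathcal A}\mu, R_{\mathcal I}\mu)$. The operator $\tilde G(\mu)$ can equivalently be expressed as
\[ \tilde G(\mu) = \begin{pmatrix} I_{\mathcal A} & 0 \\ 0 & I_{\mathcal I} \end{pmatrix} + c \begin{pmatrix} R_{\mathcal A} A^{-1} R_{\mathcal A}^* & R_{\mathcal A} A^{-1} R_{\mathcal I}^* \\ 0 & 0 \end{pmatrix}, \]
where $I_{\mathcal A}$ and $I_{\mathcal I}$ denote the identity operators on $L^2(\mathcal A)$ and $L^2(\mathcal I)$. Let $(g_{\mathcal A}, g_{\mathcal I}) \in L^2(\mathcal A) \times L^2(\mathcal I)$ be arbitrary and consider the equation
\[ \tilde G(\mu)((\delta\mu)_{\mathcal A}, (\delta\mu)_{\mathcal I}) = (g_{\mathcal A}, g_{\mathcal I}). \tag{8.6.13} \]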
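Then necessarily $(\delta\mu)_{\mathcal I} = g_{\mathcal I}$ and (8.6.13) is equivalent to
\[ (\delta\mu)_{\mathcal A} + c\, R_{\mathcal A} A^{-1} R_{\mathcal A}^* (\delta\mu)_{\mathcal A} = g_{\mathcal A} - c\, R_{\mathcal A} A^{-1} R_{\mathcal I}^* g_{\mathcal I}. \tag{8.6.14} \]
The Lax–Milgram theorem and nonnegativity of $A^{-1}$ imply the existence of a unique solution $(\delta\mu)_{\mathcal A}$ to (8.6.14) and consequently (8.6.13) has a unique solution for every $(g_{\mathcal A}, g_{\mathcal I})$ and every $\mu$. Moreover these solutions are uniformly bounded with respect to $\mu \in L^2(\Omega)$ since $(\delta\mu)_{\mathcal I} = g_{\mathcal I}$ and
\[ |(\delta\mu)_{\mathcal A}|_{L^2(\mathcal A)} \le |g_{\mathcal A}|_{L^2(\mathcal A)} + c\, \|A^{-1}\|_{L(L^2(\Omega))} |g_{\mathcal I}|_{L^2(\mathcal I)}. \]
This proves superlinear convergence $\mu^k \to \mu_c$ in $L^2(\Omega)$. Superlinear convergence of $y^k$ to $y_c$ in $V$ follows from $Ay^k + \mu^k = f$ and the fact that $A : V \to V^*$ is a homeomorphism.

Convergence of the solutions $(y_c, \mu_c)$ to (8.6.11) as $c \to \infty$ is addressed next.

Theorem 8.26. The solution $(y_c, \mu_c) \in V \times X$ of the regularized problem (8.6.11) converges to the solution $(y^*, \mu^*) \in V \times V^*$ of (8.6.3) in the sense that $y_c \to y^*$ strongly in $V$ and $\mu_c \to \mu^*$ strongly in $V^*$ as $c \to \infty$.

Proof. Recall that $y_c \in V$ satisfies
\[ \begin{cases} a(y_c, v) + (\mu_c, v) = \langle f, v \rangle \quad \text{for all } v \in V, \\ \mu_c = \max(0, \bar\mu + c(y_c - \psi)). \end{cases} \tag{8.6.15} \]
Since $\mu_c \ge 0$, for $y \in C$
\[ (\mu_c, y_c - y) = (\mu_c, y_c - \psi - (y - \psi)) \ge \frac{c}{2} |(y_c - \psi)^+|_X^2 - \frac{1}{2c} |\bar\mu|_X^2. \]
Letting $v = y_c - y$ in (8.6.15),
\[ a(y_c, y_c - y) + \frac{c}{2} |(y_c - \psi)^+|_X^2 \le \langle f, y_c - y \rangle + \frac{1}{2c} |\bar\mu|_X^2. \tag{8.6.16} \]
Since $a$ is coercive, this implies that
\[ \nu |y_c|_V^2 + \frac{c}{2} |(y_c - \psi)^+|_X^2 \text{ is bounded uniformly in } c. \]
Thus there exist a subsequence $y_c$, denoted by the same symbol, and $y^* \in V$ such that $y_c \to y^*$ weakly in $V$. For all $\phi \ge 0$
\[ ((y_c - \psi)^+, \phi)_X \ge (y_c - \psi, \phi)_X. \]
Since $(y_c - \psi)^+ \to 0$ in $X$ and $y_c \to y^*$ weakly in $X$ as $c \to \infty$, this yields
\[ (y^* - \psi, \phi)_X \le 0 \quad \text{for all } \phi \ge 0 \]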
which implies $y^* - \psi \le 0$ and thus $y^* \in C$. Since $\phi \to \sqrt{a(\phi, \phi)}$ defines an equivalent norm on $V$ and norms are w.l.s.c., letting $c \to \infty$ in (8.6.16) we obtain
\[ a(y^*, y^* - y) \le \langle f, y^* - y \rangle \quad \text{for all } y \in C, \]
which implies that $y^*$ is the solution to (8.6.1). Now, setting $y = y^*$ in (8.6.16)
\[ a(y_c - y^*, y_c - y^*) + a(y^*, y_c - y^*) - \langle f, y_c - y^* \rangle \le \frac{1}{2c} |\bar\mu|_X^2. \]
Since $y_c \to y^*$ weakly in $V$, this implies that $\lim_{c \to \infty} |y_c - y^*|_V = 0$. Hence $(y_c, \mu_c) \to (y^*, \mu^*)$ strongly in $V \times V^*$.
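The convergence asserted in Theorem 8.26 can be made explicit in the simplest possible setting. The scalar toy problem below is an illustrative assumption, not taken from the text: $V = X = \mathbb R$, $a(y,v) = yv$ (so $A = 1$), data $f > \psi$, and a shift $\bar\mu \ge 0$.

```latex
% Limit problem (8.6.3): y^* = \psi, \quad \mu^* = f - \psi > 0.
% Regularized problem (8.6.11) in the active case \bar\mu + c(y_c - \psi) > 0:
\begin{align*}
y_c + \bar\mu + c(y_c - \psi) = f
  &\;\Longrightarrow\; y_c = \frac{f - \bar\mu + c\psi}{1+c}, \qquad
  \mu_c = f - y_c = \frac{\bar\mu + c(f-\psi)}{1+c} > 0, \\
y_c - y^* &= \frac{f - \psi - \bar\mu}{1+c}, \qquad
  \mu_c - \mu^* = -\frac{f - \psi - \bar\mu}{1+c}.
\end{align*}
% Both errors are O(1/c) as c \to \infty. For \bar\mu = f - \psi = (f - A\psi)^+
% one obtains y_c = \psi exactly, in line with the feasibility result of Theorem 8.23.
```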
Chapter 9
Semismooth Newton Methods II: Applications
In the previous chapter semismooth Newton methods in function spaces were investigated. It was demonstrated that in certain cases the semismooth Newton method is equivalent to the primal-dual active set method. The application to nonlinear complementarity problems was discussed and the necessity of introducing regularization in cases where the Lagrange multiplier associated to the inequality condition has low regularity was demonstrated. In this chapter applications of semismooth Newton methods to nondifferentiable variational problems in function spaces will be treated. They concern image restoration problems regularized by bounded variation functionals in Section 9.1 and frictional contact problems in elasticity in Section 9.2.

We shall make use of the Fenchel duality theorem which we recall for further reference; see, e.g., Section 4.3 and [BaPe, EkTe] for details. Let $V$ and $Y$ be Banach spaces with topological duals $V^*$ and $Y^*$, respectively. Further, let $\Lambda \in L(V, Y)$ and let $\mathcal F : V \to \mathbb R \cup \{\infty\}$, $\mathcal G : Y \to \mathbb R \cup \{\infty\}$ be convex, proper, and l.s.c. functionals such that there exists $v_0 \in V$ with $\mathcal F(v_0) < \infty$, $\mathcal G(\Lambda v_0) < \infty$ and $\mathcal G$ is continuous at $\Lambda v_0$. Then
\[ \inf_{v \in V} \{\mathcal F(v) + \mathcal G(\Lambda v)\} = \sup_{q \in Y^*} \{-\mathcal F^*(-\Lambda^* q) - \mathcal G^*(q)\}, \tag{9.0.1} \]
where $\Lambda^* \in L(Y^*, V^*)$ is the adjoint of $\Lambda$. See, e.g., Section 4.3, Theorem 4.30 and Example 4.33 with $\Phi(v, w) = \mathcal F(v) + \mathcal G(\Lambda v + w)$. The convex conjugates $\mathcal F^* : V^* \to \mathbb R \cup \{\infty\}$ and $\mathcal G^* : Y^* \to \mathbb R \cup \{\infty\}$ of $\mathcal F$ and $\mathcal G$, respectively, are defined by
\[ \mathcal F^*(v^*) = \sup_{v \in V} \{\langle v, v^* \rangle_{V, V^*} - \mathcal F(v)\}, \]
and analogously for $\mathcal G^*$. The conditions imposed on $\mathcal F$ and $\mathcal G$ guarantee that the dual problem, i.e., the problem on the right-hand side of (9.0.1), admits a solution. Furthermore, $\bar v \in V$ and $\bar q \in Y^*$ are solutions to the two optimization problems in (9.0.1) if and only if the extremality conditions
\[ -\Lambda^* \bar q \in \partial\mathcal F(\bar v), \quad \bar q \in \partial\mathcal G(\Lambda \bar v) \tag{9.0.2} \]
hold, where $\partial\mathcal F$ denotes the subdifferential of $\mathcal F$.
9.1 BV-based image restoration problems
In this section we consider the nondifferentiable optimization problem min 12 |Ku − f |2 dx + α2 |u|2 dx + β |Du| over u ∈ BV(),
(9.1.1)
where is a simply connected domain in R2 with Lipschitz continuous boundary ∂, f ∈ L2 (), β > 0, α ≥ 0 are given, and K ∈ L(L2 ()). We assume that K∗ K is invertible or α > 0. Further BV() denotes the space of functions of bounded variation. A function u is in BV() if the BV-seminorm defined by ∞ 2 |Du| = sup u div v : v ∈ (C0 ()) , | v (x)|∞ ≤ 1
is finite. Here | · |∞ denotes the supremum norm on R2 . It is well known that BV() ⊂ L2 () for ⊂ R2 (see [Giu]) and that u → |u|L2 + |Du| defines a norm on BV(). If K = identity and α = 0, then (9.1.1) is the well-known image restoration problem with BV-regularization. It consists of recovering the true image u from the noisy image f . BV-regularization is known to be preferable to regularization by |∇u|2 dx, for example, due to its ability to preserve edges in the original image during the reconstruction process. Since the pioneering work in [ROF], the literature on (9.1.1) has grown tremendously. We give some selected references [AcVo, CKP, ChLi, ChK, DoSa, GeYa, IK12] and refer the reader to the monograph [Vog] for additional ones. Despite its favorable properties for reconstruction of images, and especially images with blocky structure, problem (9.1.1) poses some severe difficulties. On the analytical level these are related to the fact that (9.1.1) is posed in a nonreflexive Banach space, the dual of which is difficult to characterize [Giu, IK18], and on the numerical level the optimality system related to (9.1.1) consists of a nonlinear partial differential equation, which is not directly amenable to numerical implementations. Following [HinK1] we shall show that the predual of (9.1.1) is a bilaterally constrained optimization problem in a Hilbert space, for which the primal-dual active set strategy can advantageously be applied and which can be analyzed as a semismooth Newton method. We require some facts from vector-valued function spaces, which we summarize next. Let IL2 () = L2 () × L2 () be endowed with the Hilbert space inner product structure and norm. If the context suggests to do so, then we shall distinguish between vector fields v ∈ IL2 () and scalar functions v ∈ L2 () by using an arrow on top of the letter. Analogously we set IH10 () = H01 () × H01 (). 
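We set $L^2_0(\Omega) = \{v \in L^2(\Omega) : \int_\Omega v\,dx = 0\}$ and
\[ H_0(\mathrm{div}) = \{\vec v \in \mathbb{L}^2(\Omega) : \mathrm{div}\,\vec v \in L^2(\Omega),\ \vec v \cdot n = 0 \text{ on } \partial\Omega\}, \]
where $n$ is the outer normal to $\partial\Omega$. The space $H_0(\mathrm{div})$ is endowed with $|\vec v|^2_{H_0(\mathrm{div})} = |\vec v|^2_{\mathbb{L}^2(\Omega)} + |\mathrm{div}\,\vec v|^2_{L^2}$ as norm. Further we put $H_0(\mathrm{div}\,0) = \{\vec v \in H_0(\mathrm{div}) : \mathrm{div}\,\vec v = 0 \text{ a.e. in } \Omega\}$. It is well known that
\[ \mathbb{L}^2(\Omega) = \mathrm{grad}\,H^1(\Omega) \oplus H_0(\mathrm{div}\,0); \tag{9.1.2} \]
cf. [DaLi, p. 216], for example. Moreover,
\[ H_0(\mathrm{div}) = H_0(\mathrm{div}\,0)^\perp \oplus H_0(\mathrm{div}\,0), \tag{9.1.3} \]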
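with $H_0(\mathrm{div}\,0)^\perp = \{\vec v \in \mathrm{grad}\,H^1(\Omega) : \mathrm{div}\,\vec v \in L^2(\Omega),\ \vec v \cdot n = 0 \text{ on } \partial\Omega\}$, and $\mathrm{div} : H_0(\mathrm{div}\,0)^\perp \subset H_0(\mathrm{div}) \to L^2_0(\Omega)$ is a homeomorphism. In fact, it is injective by construction and for every $f \in L^2_0(\Omega)$ there exists, by the Lax–Milgram lemma, $\varphi \in H^1(\Omega)$ such that
\[ \mathrm{div}\,\nabla\varphi = f \text{ in } \Omega, \qquad \nabla\varphi \cdot n = 0 \text{ on } \partial\Omega, \]
with $\nabla\varphi \in H_0(\mathrm{div}\,0)^\perp$. Hence, by the closed mapping theorem we have $\mathrm{div} \in \mathcal{L}(H_0(\mathrm{div}\,0)^\perp, L^2_0(\Omega))$. Finally, let $P_{\mathrm{div}}$ and $P_{\mathrm{div}^\perp}$ denote the orthogonal projections in $\mathbb{L}^2(\Omega)$ onto $H_0(\mathrm{div}\,0)$ and $\mathrm{grad}\,H^1(\Omega)$, respectively. Note that the restrictions of $P_{\mathrm{div}}$ and $P_{\mathrm{div}^\perp}$ to $H_0(\mathrm{div})$ coincide with the orthogonal projections in $H_0(\mathrm{div})$ onto $H_0(\mathrm{div}\,0)$ and $H_0(\mathrm{div}\,0)^\perp$.

Let $\vec 1$ denote the two-dimensional vector field with 1 in both coordinates, set $B = \alpha I + K^*K$, and consider
\[ \min\ \tfrac12 |\mathrm{div}\,\vec p + K^*f|_B^2 \quad \text{over } \vec p \in H_0(\mathrm{div}) \quad \text{such that } -\beta\vec 1 \le \vec p \le \beta\vec 1, \tag{9.1.4} \]
where for $v \in L^2(\Omega)$ we put $|v|_B^2 = (v, B^{-1}v)_{L^2}$. It is straightforward to argue that (9.1.4) admits a solution.

Theorem 9.1. The Fenchel dual to (9.1.4) is given by (9.1.1) and the solutions $u^*$ of (9.1.1) and $\vec p^{\,*}$ of (9.1.4) are related by
\[ Bu^* = \mathrm{div}\,\vec p^{\,*} + K^*f, \tag{9.1.5} \]
\[ \langle (-\mathrm{div})^* u^*, \vec p - \vec p^{\,*} \rangle_{H_0(\mathrm{div})^*, H_0(\mathrm{div})} \le 0 \quad \text{for all } \vec p \in H_0(\mathrm{div}) \tag{9.1.6} \]
with $-\beta\vec 1 \le \vec p \le \beta\vec 1$. Alternatively (9.1.4) can be considered as the predual of the original problem (9.1.1).

Proof. We apply Fenchel duality as recalled at the beginning of the chapter with $V = H_0(\mathrm{div})$, $Y = Y^* = L^2(\Omega)$, $\Lambda = -\mathrm{div}$, $\mathcal{G} : Y \to \mathbb{R}$ given by $\mathcal{G}(v) = \tfrac12 |v - K^*f|_B^2$, and $\mathcal{F} : V \to \mathbb{R}$ defined by $\mathcal{F}(\vec p) = I_{[-\beta\vec 1, \beta\vec 1]}(\vec p)$, where
\[ I_{[-\beta\vec 1, \beta\vec 1]}(\vec p) = \begin{cases} 0 & \text{if } -\beta\vec 1 \le \vec p(x) \le \beta\vec 1 \text{ for a.e. } x \in \Omega, \\ \infty & \text{otherwise.} \end{cases} \]
The convex conjugate $\mathcal{G}^* : L^2(\Omega) \to \mathbb{R}$ of $\mathcal{G}$ is given by
\[ \mathcal{G}^*(v) = \tfrac12 |Kv + f|^2 + \tfrac{\alpha}{2} |v|^2 - \tfrac12 |f|^2. \]
Further the conjugate $\mathcal{F}^* : H_0(\mathrm{div})^* \to \mathbb{R}$ of $\mathcal{F}$ is given by
\[ \mathcal{F}^*(\vec q) = \sup_{\vec p \in S_1} \langle \vec q, \vec p \rangle_{H_0(\mathrm{div})^*, H_0(\mathrm{div})} \quad \text{for } \vec q \in H_0(\mathrm{div})^*, \tag{9.1.7} \]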
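where $S_1 = \{\vec p \in H_0(\mathrm{div}) : -\beta\vec 1 \le \vec p \le \beta\vec 1\}$. Let us set
\[ S_2 = \{\vec p \in C_0^1(\Omega) \times C_0^1(\Omega) : -\beta\vec 1 \le \vec p \le \beta\vec 1\}. \]
The set $S_2$ is dense in the topology of $H_0(\mathrm{div})$ in $S_1$. In fact, let $\vec p$ be an arbitrary element of $S_1$. Since $(\mathcal{D}(\Omega))^2$ is dense in $H_0(\mathrm{div})$ (see, e.g., [GiRa, p. 26]), there exists a sequence $\vec p_n \in (\mathcal{D}(\Omega))^2$ converging in $H_0(\mathrm{div})$ to $\vec p$. Let $P$ denote the canonical projection in $H_0(\mathrm{div})$ onto the closed convex subset $S_1$ and note that, since $\vec p \in S_1$,
\[ |\vec p - P\vec p_n|_{H_0(\mathrm{div})} \le |\vec p - \vec p_n|_{H_0(\mathrm{div})} + |\vec p_n - P\vec p_n|_{H_0(\mathrm{div})} \le 2|\vec p - \vec p_n|_{H_0(\mathrm{div})} \to 0 \quad \text{for } n \to \infty. \]
Hence $\lim_{n\to\infty} |\vec p - P\vec p_n|_{H_0(\mathrm{div})} = 0$ and $S_2$ is dense in $S_1$. Returning to (9.1.7) we have for $v \in L^2(\Omega)$ and $(-\mathrm{div})^* \in \mathcal{L}(L^2(\Omega), V^*)$,
\[ \mathcal{F}^*((-\mathrm{div})^* v) = \sup_{\vec p \in S_2} (v, -\mathrm{div}\,\vec p), \]
which can be $+\infty$. By the definition of the functions of bounded variation it is finite if and only if $v \in BV(\Omega)$ (see [Giu, p. 3]) and
\[ \mathcal{F}^*((-\mathrm{div})^* v) = \beta \int_\Omega |Dv| < \infty \quad \text{for } v \in BV(\Omega). \]
The dual problem to (9.1.4) is found to be
\[ \min\ \tfrac12 |Ku - f|^2 + \tfrac{\alpha}{2} |u|^2 + \beta \int_\Omega |Du| \quad \text{over } u \in BV(\Omega). \]
From (9.0.2) moreover we find
\[ \langle (-\mathrm{div})^* u^*, \vec p - \vec p^{\,*} \rangle_{H_0(\mathrm{div})^*, H_0(\mathrm{div})} \le 0 \quad \text{for all } \vec p \in S_1 \]
and
\[ Bu^* = \mathrm{div}\,\vec p^{\,*} + K^*f. \]
We obtain the following optimality system for (9.1.4).

Corollary 9.2. Let $\vec p^{\,*} \in H_0(\mathrm{div})$ be a solution to (9.1.4). Then there exists $\vec\lambda^* \in H_0(\mathrm{div})^*$ such that
\[ \mathrm{div}^* B^{-1} \mathrm{div}\,\vec p^{\,*} + \mathrm{div}^* B^{-1} K^* f + \vec\lambda^* = 0, \tag{9.1.8} \]
\[ \langle \vec\lambda^*, \vec p - \vec p^{\,*} \rangle_{H_0(\mathrm{div})^*, H_0(\mathrm{div})} \le 0 \quad \text{for all } \vec p \in H_0(\mathrm{div}) \tag{9.1.9} \]
with $-\beta\vec 1 \le \vec p \le \beta\vec 1$.

Proof. Apply $\mathrm{div}^* B^{-1}$ to (9.1.5) and set $\vec\lambda^* = -\mathrm{div}^* u^* \in H_0(\mathrm{div})^*$ to obtain (9.1.8). For this choice of $\vec\lambda^*$, (9.1.9) follows from (9.1.6).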
To guarantee uniqueness of the solutions we replace (9.1.4) by
\[ \min\ \tfrac12 |\mathrm{div}\,\vec p + K^*f|_B^2 + \tfrac{\gamma}{2} |\vec p|^2 \quad \text{over } \vec p \in H_0(\mathrm{div}) \quad \text{such that } -\beta\vec 1 \le \vec p \le \beta\vec 1 \tag{9.1.10} \]
with $\gamma > 0$. Clearly (9.1.10) admits a unique solution. Our goal now is to investigate the primal-dual active set strategy for (9.1.10). If we were to simply apply this method to the first order necessary optimality condition for (9.1.10), then the resulting nonlinear operator arising in the complementarity system is not Newton differentiable; cf. Section 8.6. The numerical results which are obtained in this way are competitive from the point of view of image reconstruction [HinK1], but due to lack of Newton differentiability the iteration numbers are mesh-dependent. We therefore approximate (9.1.10) by a sequence of more regular problems where the constraints $-\beta\vec 1 \le \vec p \le \beta\vec 1$ are realized by penalty terms and an $\mathbb{H}^1_0(\Omega)$ smoothing guarantees superlinear convergence of semismooth Newton methods applied to the first order optimality conditions of the approximating problems:
\[ \min\ \tfrac{1}{2c} |\nabla\vec p|^2 + \tfrac12 |\mathrm{div}\,\vec p + K^*f|_B^2 + \tfrac{\gamma}{2} |\vec p|^2 + \tfrac{1}{2c} |\max(0, c(\vec p - \beta\vec 1))|^2 + \tfrac{1}{2c} |\min(0, c(\vec p + \beta\vec 1))|^2 \quad \text{over } \vec p \in \mathbb{H}^1_0(\Omega). \tag{9.1.11} \]
Let $\vec p_c$ denote the unique solution to (9.1.11). It satisfies the optimality condition
\[ -\tfrac1c \Delta\vec p_c - \nabla B^{-1} \mathrm{div}\,\vec p_c - \nabla B^{-1} K^* f + \gamma\vec p_c + \vec\lambda_c = 0, \tag{9.1.12a} \]
\[ \vec\lambda_c = \max(0, c(\vec p_c - \beta\vec 1)) + \min(0, c(\vec p_c + \beta\vec 1)). \tag{9.1.12b} \]
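Next we address convergence as $c \to \infty$.

Theorem 9.3. The family $\{(\vec p_c, \vec\lambda_c)\}_{c>0}$ converges weakly in $H_0(\mathrm{div}) \times \mathbb{H}^1_0(\Omega)^*$ to the unique solution $(\vec p^{\,*}, \vec\lambda^*) \in H_0(\mathrm{div}) \times H_0(\mathrm{div})^*$ of the optimality system associated to (9.1.10) given by
\[ \mathrm{div}^* B^{-1} \mathrm{div}\,\vec p^{\,*} + \mathrm{div}^* B^{-1} K^* f + \gamma\vec p^{\,*} + \vec\lambda^* = 0, \tag{9.1.13} \]
\[ \langle \vec\lambda^*, \vec p - \vec p^{\,*} \rangle_{H_0(\mathrm{div})^*, H_0(\mathrm{div})} \le 0 \quad \text{for all } \vec p \in H_0(\mathrm{div}) \text{ with } -\beta\vec 1 \le \vec p \le \beta\vec 1. \tag{9.1.14} \]
Moreover, the convergence of $\vec p_c$ to $\vec p^{\,*}$ is strong in $H_0(\mathrm{div})$.

Proof. The proof is related to that of Theorem 8.26. The variational form of (9.1.13) is given by
\[ (\mathrm{div}\,\vec p^{\,*}, \mathrm{div}\,\vec v)_B + (K^*f, \mathrm{div}\,\vec v)_B + \gamma(\vec p^{\,*}, \vec v) + \langle \vec\lambda^*, \vec v \rangle_{H_0(\mathrm{div})^*, H_0(\mathrm{div})} = 0 \tag{9.1.15} \]
for all $\vec v \in H_0(\mathrm{div})$. To verify uniqueness, let us suppose that $(\vec p_i, \vec\lambda_i) \in H_0(\mathrm{div}) \times H_0(\mathrm{div})^*$, $i = 1, 2$, are two solution pairs to (9.1.13), (9.1.14). For $\delta\vec p = \vec p_2 - \vec p_1$, $\delta\vec\lambda = \vec\lambda_2 - \vec\lambda_1$ we have
\[ (B^{-1} \mathrm{div}\,\delta\vec p, \mathrm{div}\,\vec v) + \gamma(\delta\vec p, \vec v) + \langle \delta\vec\lambda, \vec v \rangle_{H_0(\mathrm{div})^*, H_0(\mathrm{div})} = 0 \tag{9.1.16} \]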
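for all $\vec v \in H_0(\mathrm{div})$, and $\langle \delta\vec\lambda, \delta\vec p \rangle_{H_0(\mathrm{div})^*, H_0(\mathrm{div})} \ge 0$. With $\vec v = \delta\vec p$ in (9.1.16) we obtain
\[ |\mathrm{div}\,\delta\vec p|_B^2 + \gamma |\delta\vec p|^2 \le 0 \]
and hence $\vec p_1 = \vec p_2$. From (9.1.15) we deduce that $\vec\lambda_1 = \vec\lambda_2$. Thus uniqueness is established and we can henceforth rely on subsequential arguments.

In the following computation we consider the coordinates $\lambda_c^i$, $i = 1, 2$, of $\vec\lambda_c$. We have for a.e. $x \in \Omega$
\[ \lambda_c^i p_c^i = \big(\max(0, c(p_c^i - \beta)) + \min(0, c(p_c^i + \beta))\big) p_c^i = \begin{cases} c(p_c^i - \beta) p_c^i & \text{if } p_c^i \ge \beta, \\ 0 & \text{if } |p_c^i| \le \beta, \\ c(p_c^i + \beta) p_c^i & \text{if } p_c^i \le -\beta. \end{cases} \]
It follows that
\[ (\lambda_c^i, p_c^i)_{L^2(\Omega)} \ge \tfrac1c |\lambda_c^i|^2_{L^2(\Omega)} \quad \text{for } i = 1, 2, \]
and consequently
\[ (\vec\lambda_c, \vec p_c)_{\mathbb{L}^2(\Omega)} \ge \tfrac1c |\vec\lambda_c|^2_{\mathbb{L}^2(\Omega)} \quad \text{for every } c > 0. \tag{9.1.17} \]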
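The pointwise inequality behind (9.1.17) can be checked numerically. The sketch below is only an illustration for scalar values; the function names and the sample grid are assumptions and not part of the text. It evaluates the multiplier formula (9.1.12b) for one coordinate and tests $\lambda p \ge \lambda^2/c$.

```python
def multiplier(p, c, beta):
    """Pointwise form of (9.1.12b) for one coordinate of p."""
    return max(0.0, c * (p - beta)) + min(0.0, c * (p + beta))


def satisfies_9_1_17(p, c, beta, tol=1e-9):
    """Check lam * p >= lam**2 / c, the integrand version of (9.1.17)."""
    lam = multiplier(p, c, beta)
    return lam * p >= lam * lam / c - tol


# Sample values of p below, inside, and above the interval [-beta, beta].
samples = [-2.0 + 0.01 * i for i in range(401)]
all_ok = all(satisfies_9_1_17(p, c=100.0, beta=0.5) for p in samples)
print(all_ok)
```

For $p > \beta$ the difference of the two sides equals $c(p - \beta)\beta \ge 0$, which is the reason the estimate holds for any $c > 0$ and $\beta > 0$.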
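From (9.1.12) and (9.1.17) we deduce that
\[ \tfrac1c |\nabla\vec p_c|^2 + |\mathrm{div}\,\vec p_c|_B^2 + \gamma |\vec p_c|^2 \le |\mathrm{div}\,\vec p_c|_B\, |K^*f|_B \]
and hence
\[ \tfrac1c |\nabla\vec p_c|^2 + \tfrac12 |\mathrm{div}\,\vec p_c|_B^2 + \gamma |\vec p_c|^2 \le \tfrac12 |K^*f|_B^2. \tag{9.1.18} \]
We further estimate
\[ |\vec\lambda_c|_{\mathbb{H}^1_0(\Omega)^*} = \sup_{|\vec v|_{\mathbb{H}^1_0(\Omega)} = 1} \langle \vec\lambda_c, \vec v \rangle_{\mathbb{H}^1_0(\Omega)^*, \mathbb{H}^1_0(\Omega)} \le \sup_{|\vec v|_{\mathbb{H}^1_0(\Omega)} = 1} \Big( \tfrac1c |\nabla\vec p_c|\,|\nabla\vec v| + |\mathrm{div}\,\vec p_c|_B\, |\mathrm{div}\,\vec v|_B + |K^*f|_B\, |\mathrm{div}\,\vec v|_B + \gamma |\vec p_c|\,|\vec v| \Big). \]
From (9.1.18) we deduce the existence of a constant $K$, independent of $c \ge 1$, such that
\[ |\vec\lambda_c|_{\mathbb{H}^1_0(\Omega)^*} \le K. \tag{9.1.19} \]
Combining (9.1.18) and (9.1.19) we can assert the existence of $(\vec p^{\,*}, \vec\lambda^*) \in H_0(\mathrm{div}) \times \mathbb{H}^1_0(\Omega)^*$ such that for a subsequence denoted by the same symbol
\[ (\vec p_c, \vec\lambda_c) \rightharpoonup (\vec p^{\,*}, \vec\lambda^*) \quad \text{weakly in } H_0(\mathrm{div}) \times \mathbb{H}^1_0(\Omega)^*. \tag{9.1.20} \]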
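We recall the variational form of (9.1.12), i.e.,
\[ \tfrac1c (\nabla\vec p_c, \nabla\vec v) + (\mathrm{div}\,\vec p_c, \mathrm{div}\,\vec v)_B + (K^*f, \mathrm{div}\,\vec v)_B + \gamma(\vec p_c, \vec v) + (\vec\lambda_c, \vec v) = 0 \]
for all $\vec v \in \mathbb{H}^1_0(\Omega)$. Passing to the limit $c \to \infty$, using (9.1.18) and (9.1.20) we have
\[ (\mathrm{div}\,\vec p^{\,*}, \mathrm{div}\,\vec v)_B + (K^*f, \mathrm{div}\,\vec v)_B + \gamma(\vec p^{\,*}, \vec v) + \langle \vec\lambda^*, \vec v \rangle_{\mathbb{H}^1_0(\Omega)^*, \mathbb{H}^1_0(\Omega)} = 0 \quad \text{for all } \vec v \in \mathbb{H}^1_0(\Omega). \tag{9.1.21} \]
Since $\mathbb{H}^1_0(\Omega)$ is dense in $H_0(\mathrm{div})$ and $\vec p^{\,*} \in H_0(\mathrm{div})$ we have that (9.1.21) holds for all $\vec v \in H_0(\mathrm{div})$. Consequently $\vec\lambda^*$ can be identified with an element in $H_0(\mathrm{div})^*$ and $\langle\cdot,\cdot\rangle_{\mathbb{H}^1_0(\Omega)^*, \mathbb{H}^1_0(\Omega)}$ in (9.1.21) can be replaced by $\langle\cdot,\cdot\rangle_{H_0(\mathrm{div})^*, H_0(\mathrm{div})}$.

We next verify that $\vec p^{\,*}$ is feasible. For this purpose note that
\[ (\vec\lambda_c, \vec p - \vec p_c) = \big(\max(0, c(\vec p_c - \beta\vec 1)) + \min(0, c(\vec p_c + \beta\vec 1)),\ \vec p - \vec p_c\big) \le 0 \tag{9.1.22} \]
for all $-\beta\vec 1 \le \vec p \le \beta\vec 1$. From (9.1.11) we have
\[ \tfrac1c |\nabla\vec p_c|^2 + |\mathrm{div}\,\vec p_c + K^*f|_B^2 + \gamma |\vec p_c|^2 + \tfrac1c |\vec\lambda_c|^2 \le |K^*f|_B^2. \tag{9.1.23} \]
Consequently, $\tfrac1c |\vec\lambda_c|^2 \le |K^*f|_B^2$ for all $c > 0$. Note that
\[ \tfrac1c |\vec\lambda_c|^2_{\mathbb{L}^2(\Omega)} = c\,|\max(0, \vec p_c - \beta\vec 1)|^2_{\mathbb{L}^2(\Omega)} + c\,|\min(0, \vec p_c + \beta\vec 1)|^2_{\mathbb{L}^2(\Omega)} \]
and thus
\[ |\max(0, \vec p_c - \beta\vec 1)|^2_{\mathbb{L}^2(\Omega)} \to 0 \quad \text{and} \quad |\min(0, \vec p_c + \beta\vec 1)|^2_{\mathbb{L}^2(\Omega)} \to 0 \quad \text{as } c \to \infty. \tag{9.1.24} \]
Recall that $\vec p_c \rightharpoonup \vec p^{\,*}$ weakly in $\mathbb{L}^2(\Omega)$. Weak lower semicontinuity of the convex functional $\vec p \mapsto |\max(0, \vec p - \beta\vec 1)|_{\mathbb{L}^2(\Omega)}$ and (9.1.24) imply that
\[ \int_\Omega |\max(0, \vec p^{\,*} - \beta\vec 1)|^2\,dx \le \liminf_{c\to\infty} \int_\Omega |\max(0, \vec p_c - \beta\vec 1)|^2\,dx = 0. \]
Consequently, $\vec p^{\,*} \le \beta\vec 1$ and analogously one verifies that $-\beta\vec 1 \le \vec p^{\,*}$. In particular $\vec p^{\,*}$ is feasible and from (9.1.22) we conclude that
\[ \langle \vec\lambda_c, \vec p^{\,*} - \vec p_c \rangle_{H_0(\mathrm{div})^*, H_0(\mathrm{div})} \le 0 \quad \text{for all } c > 0. \tag{9.1.25} \]
By optimality of $\vec p_c$ for (9.1.11) we have
\[ \limsup_{c\to\infty} \Big( \tfrac12 |\mathrm{div}\,\vec p_c + K^*f|_B^2 + \tfrac{\gamma}{2} |\vec p_c|^2 \Big) \le \tfrac12 |\mathrm{div}\,\vec p + K^*f|_B^2 + \tfrac{\gamma}{2} |\vec p|^2 \tag{9.1.26} \]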
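for all $\vec p \in S_2 = \{\vec p \in (C_0^1(\Omega))^2 : -\beta\vec 1 \le \vec p \le \beta\vec 1\}$. Density of $S_2$ in $S_1 = \{\vec p \in H_0(\mathrm{div}) : -\beta\vec 1 \le \vec p \le \beta\vec 1\}$ in the norm of $H_0(\mathrm{div})$ implies that (9.1.26) holds for all $\vec p \in S_1$ and consequently
\[ \limsup_{c\to\infty} \Big( \tfrac12 |\mathrm{div}\,\vec p_c + K^*f|_B^2 + \tfrac{\gamma}{2} |\vec p_c|^2 \Big) \le \tfrac12 |\mathrm{div}\,\vec p^{\,*} + K^*f|_B^2 + \tfrac{\gamma}{2} |\vec p^{\,*}|^2 \le \liminf_{c\to\infty} \Big( \tfrac12 |\mathrm{div}\,\vec p_c + K^*f|_B^2 + \tfrac{\gamma}{2} |\vec p_c|^2 \Big), \]
where for the last inequality weak lower semicontinuity of norms is used. The above inequalities, together with weak convergence of $\vec p_c$ to $\vec p^{\,*}$ in $H_0(\mathrm{div})$, imply strong convergence of $\vec p_c$ to $\vec p^{\,*}$ in $H_0(\mathrm{div})$.

Finally we aim at passing to the limit in (9.1.25). This is impeded by the fact that we only established $\vec\lambda_c \rightharpoonup \vec\lambda^*$ in $\mathbb{H}^1_0(\Omega)^*$. Note from (9.1.12) that $\{-\tfrac1c \Delta\vec p_c + \vec\lambda_c\}_{c\ge1}$ is bounded in $H_0(\mathrm{div})^*$. Hence there exists $\vec\mu^* \in H_0(\mathrm{div})^*$ such that
\[ -\tfrac1c \Delta\vec p_c + \vec\lambda_c \rightharpoonup \vec\mu^* \quad \text{weakly in } H_0(\mathrm{div})^*, \]
and consequently also in $\mathbb{H}^1_0(\Omega)^*$. Moreover, $\{\tfrac{1}{\sqrt c} |\nabla\vec p_c|\}_{c\ge1}$ is bounded and hence
\[ -\tfrac1c \Delta\vec p_c \rightharpoonup 0 \quad \text{weakly in } \mathbb{H}^1_0(\Omega)^* \]
as $c \to \infty$. Since $\vec\lambda_c \rightharpoonup \vec\lambda^*$ weakly in $\mathbb{H}^1_0(\Omega)^*$ it follows that
\[ \langle \vec\lambda^* - \vec\mu^*, \vec v \rangle_{\mathbb{H}^1_0(\Omega)^*, \mathbb{H}^1_0(\Omega)} = 0 \quad \text{for all } \vec v \in \mathbb{H}^1_0(\Omega). \]
Since both $\vec\lambda^*$ and $\vec\mu^*$ are elements of $H_0(\mathrm{div})^*$ and since $\mathbb{H}^1_0(\Omega)$ is dense in $H_0(\mathrm{div})$, it follows that $\vec\lambda^* = \vec\mu^*$ in $H_0(\mathrm{div})^*$. For $\vec p \in S_2$ we have
\begin{align*}
\langle \vec\lambda^*, \vec p - \vec p^{\,*} \rangle_{H_0(\mathrm{div})^*, H_0(\mathrm{div})} &= \langle \vec\mu^*, \vec p - \vec p^{\,*} \rangle_{H_0(\mathrm{div})^*, H_0(\mathrm{div})} \\
&= \lim_{c\to\infty} \big\langle -\tfrac1c \Delta\vec p_c + \vec\lambda_c,\ \vec p - \vec p_c \big\rangle_{H_0(\mathrm{div})^*, H_0(\mathrm{div})} \\
&= \lim_{c\to\infty} \Big( \tfrac1c (\nabla\vec p_c, \nabla(\vec p - \vec p_c)) + (\vec\lambda_c, \vec p - \vec p_c) \Big) \\
&\le \lim_{c\to\infty} \Big( \tfrac1c (\nabla\vec p_c, \nabla\vec p) + (\vec\lambda_c, \vec p - \vec p_c) \Big) \le 0
\end{align*}
by (9.1.22) and (9.1.23). Since $S_2$ is dense in $S_1$ we find
\[ \langle \vec\lambda^*, \vec p - \vec p^{\,*} \rangle_{H_0(\mathrm{div})^*, H_0(\mathrm{div})} \le 0 \quad \text{for all } \vec p \in S_1. \]

Remark 9.1.1. For $\gamma = 0$ problems (9.1.10) and (9.1.11) admit a solution, which, however, is not unique. From the proof of Theorem 9.3 it follows that $\{(\vec p_c, \vec\lambda_c)\}_{c>0}$ contains a weak accumulation point in $H_0(\mathrm{div}) \times \mathbb{H}^1_0(\Omega)^*$ and every weak accumulation point is a solution of (9.1.10).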
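Semismooth Newton treatment of (9.1.11). We turn to the algorithmic treatment of the infinite-dimensional problem (9.1.11) for which we propose the following algorithm.

Algorithm.
(1) Choose $\vec p_0 \in \mathbb{H}^1_0(\Omega)$ and set $k = 0$.
(2) Set, for $i = 1, 2$,
\[ A^{+,i}_{k+1} = \{x : (p^i_k - \beta)(x) > 0\}, \quad A^{-,i}_{k+1} = \{x : (p^i_k + \beta)(x) < 0\}, \quad I^i_{k+1} = \Omega \setminus (A^{+,i}_{k+1} \cup A^{-,i}_{k+1}). \]
(3) Solve for $\vec p \in \mathbb{H}^1_0(\Omega)$ and set $\vec p_{k+1} = \vec p$, where
\[ \tfrac1c (\nabla\vec p, \nabla\vec v) + (\mathrm{div}\,\vec p, \mathrm{div}\,\vec v)_B + (K^*f, \mathrm{div}\,\vec v)_B + \gamma(\vec p, \vec v) + (c(\vec p - \beta\vec 1)\chi_{A^+_{k+1}}, \vec v) + (c(\vec p + \beta\vec 1)\chi_{A^-_{k+1}}, \vec v) = 0 \tag{9.1.27} \]
for all $\vec v \in \mathbb{H}^1_0(\Omega)$.
(4) Set
\[ \lambda^i_{k+1} = \begin{cases} 0 & \text{on } I^i_{k+1}, \\ c(p^i_{k+1} - \beta) & \text{on } A^{+,i}_{k+1}, \\ c(p^i_{k+1} + \beta) & \text{on } A^{-,i}_{k+1} \end{cases} \]
for $i = 1, 2$.
(5) Stop, or set $k = k + 1$, and goto (2).

Above $\chi_{A^+_{k+1}}$ stands for
\[ \chi^i_{A^+_{k+1}} = \begin{cases} 1 & \text{if } x \in A^{+,i}_{k+1}, \\ 0 & \text{if } x \notin A^{+,i}_{k+1}, \end{cases} \]
and analogously for $A^-_{k+1}$. The superscript $i$, $i = 1, 2$, refers to the respective component. We note that (9.1.27) admits a solution $\vec p_{k+1} \in \mathbb{H}^1_0(\Omega)$. Step (4) is included for the sake of the analysis of the algorithm. Let $C : \mathbb{H}^1_0(\Omega) \to H^{-1}(\Omega) \times H^{-1}(\Omega)$ stand for the operator
\[ C = -\tfrac1c \Delta - \nabla B^{-1} \mathrm{div} + \gamma\,\mathrm{id}. \]
It is a homeomorphism for every $c > 0$ and allows us to express (9.1.12) as
\[ C\vec p - \nabla B^{-1} K^* f + c \max(0, \vec p - \beta\vec 1) + c \min(0, \vec p + \beta\vec 1) = 0, \tag{9.1.28} \]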
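where we drop the index in the notation for $\vec p_c$. For $\varphi \in L^2(\Omega)$ we define
\[ D\max(0, \varphi)(x) = \begin{cases} 1 & \text{if } \varphi(x) > 0, \\ 0 & \text{if } \varphi(x) \le 0 \end{cases} \tag{9.1.29} \]
and
\[ D\min(0, \varphi)(x) = \begin{cases} 1 & \text{if } \varphi(x) < 0, \\ 0 & \text{if } \varphi(x) \ge 0. \end{cases} \tag{9.1.30} \]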
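The role of (9.1.29) and (9.1.30) as generalized derivatives can be seen in a scalar model of (9.1.28). The following sketch is an illustration only: the scalar `a` stands in for the operator $C$, `b` for the data term, and all names and parameter values are assumptions. It applies Newton's method with the derivative choices above to $a x + c\max(0, x - \beta) + c\min(0, x + \beta) - b = 0$.

```python
def d_max(phi):
    """Newton derivative of max(0, .) as in (9.1.29)."""
    return 1.0 if phi > 0 else 0.0


def d_min(phi):
    """Newton derivative of min(0, .) as in (9.1.30)."""
    return 1.0 if phi < 0 else 0.0


def residual(x, a, b, c, beta):
    """Scalar analogue of the left-hand side of (9.1.28)."""
    return a * x + c * max(0.0, x - beta) + c * min(0.0, x + beta) - b


def semismooth_newton(a, b, c, beta, x0=0.0, max_iter=20, tol=1e-12):
    """Newton iteration using d_max and d_min as generalized derivatives."""
    x = x0
    for k in range(max_iter):
        r = residual(x, a, b, c, beta)
        if abs(r) <= tol:
            return x, k
        slope = a + c * d_max(x - beta) + c * d_min(x + beta)
        x -= r / slope
    return x, max_iter


x_star, steps = semismooth_newton(a=1.0, b=3.0, c=10.0, beta=1.0)
print(x_star, steps)
```

Since the residual is piecewise linear, the iteration terminates as soon as the correct branch of the max and min terms has been identified; for the sample values the exact solution is $13/11$, which lies on the branch $x > \beta$.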
Using (9.1.29), (9.1.30) as Newton derivatives for the max and the min operations in (9.1.28), the semismooth Newton step can be expressed as
\[ C\vec p_{k+1} + c(\vec p_{k+1} - \beta\vec 1)\chi_{A^+_{k+1}} + c(\vec p_{k+1} + \beta\vec 1)\chi_{A^-_{k+1}} - \nabla B^{-1} K^* f = 0, \tag{9.1.31} \]
and $\vec\lambda_{k+1}$ from step (4) of the algorithm is given by
\[ \vec\lambda_{k+1} = c(\vec p_{k+1} - \beta\vec 1)\chi_{A^+_{k+1}} + c(\vec p_{k+1} + \beta\vec 1)\chi_{A^-_{k+1}}. \tag{9.1.32} \]
The iteration of the algorithm can also be expressed with respect to the variable $\vec\lambda$ rather than $\vec p$. For this purpose we define
\[ F(\vec\lambda) = \vec\lambda - c \max(0, C^{-1}(\nabla\hat f - \vec\lambda) - \beta\vec 1) - c \min(0, C^{-1}(\nabla\hat f - \vec\lambda) + \beta\vec 1), \tag{9.1.33} \]
where we put $\hat f = B^{-1} K^* f$. Setting $\vec p_k = C^{-1}(\nabla\hat f - \vec\lambda_k)$, the semismooth Newton step applied to $F(\vec\lambda) = 0$ at $\vec\lambda = \vec\lambda_k$ results in
\[ \vec\lambda_{k+1} = c(C^{-1}(\nabla\hat f - \vec\lambda_{k+1}) - \beta\vec 1)\chi_{A^+_{k+1}} + c(C^{-1}(\nabla\hat f - \vec\lambda_{k+1}) + \beta\vec 1)\chi_{A^-_{k+1}}, \]
which coincides with (9.1.32). Therefore the semismooth Newton iterations according to the algorithm and that for $F(\vec\lambda) = 0$ coincide, provided that the initializations are related by $C\vec p_0 - \nabla\hat f + \vec\lambda_0 = 0$. The mapping $F$ is Newton differentiable, i.e., for every $\vec\lambda \in \mathbb{L}^2(\Omega)$
\[ |F(\vec\lambda + \vec h) - F(\vec\lambda) - DF(\vec\lambda + \vec h)\vec h|_{\mathbb{L}^2(\Omega)} = o(|\vec h|_{\mathbb{L}^2(\Omega)}) \tag{9.1.34} \]
for $|\vec h|_{\mathbb{L}^2(\Omega)} \to 0$; see Example 8.14. Here $D$ denotes the Newton derivative of $F$ defined by means of (9.1.29) and (9.1.30). For (9.1.34) to hold, the smoothing property of $C^{-1}$ in the sense of an embedding from $\mathbb{L}^2(\Omega)$ into $\mathbb{L}^p(\Omega)$ for some $p > 2$ is essential. The following result now follows from Theorem 8.16.

Theorem 9.4. If $|\vec\lambda_c - \vec\lambda_0|_{\mathbb{L}^2(\Omega)}$ is sufficiently small, then the iterates $\{(\vec p_k, \vec\lambda_k)\}_{k=1}^\infty$ of the algorithm converge superlinearly in $\mathbb{H}^1_0(\Omega) \times \mathbb{L}^2(\Omega)$ to the solution $(\vec p_c, \vec\lambda_c)$ of (9.1.11).

In Theorem 9.3 convergence as $c \to \infty$ is established and Theorem 9.4 asserts convergence of the iterates $(\vec p_k, \vec\lambda_k)$ for each fixed $c$. The combination of these two limit processes was addressed in [HinK2], where a path concept with respect to the variable $c$ is developed and the iterations with respect to $k$ are controlled to remain in a neighborhood around the path.
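The active set iteration of this section can be sketched for a one-dimensional finite-difference analogue. Everything below is an assumption made for illustration and is not the discretization used in [HinK1]: the domain is $(0,1)$, $K$ is the identity and $\alpha = 0$ (so $B = I$), the divergence is replaced by a forward difference, `f` holds one data value per grid cell, and the linear systems of step (3) are tridiagonal and solved by the Thomas algorithm.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[i] and upper[i] are the off-diagonals of row i."""
    n = len(diag)
    d, r = diag[:], rhs[:]
    for i in range(1, n):
        w = lower[i] / d[i - 1]
        d[i] -= w * upper[i - 1]
        r[i] -= w * r[i - 1]
    x = [0.0] * n
    x[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (r[i] - upper[i] * x[i + 1]) / d[i]
    return x


def active_set_1d(f, c, beta, gamma, max_iter=50):
    """1D analogue of steps (1)-(5); returns nodal values and a success flag."""
    n = len(f) - 1                      # interior nodes; p vanishes at 0 and 1
    h = 1.0 / (n + 1)
    kappa = (1.0 + 1.0 / c) / h ** 2    # weight of (1/c)(p', v') + (p', v')
    base_rhs = [(f[i + 1] - f[i]) / h for i in range(n)]   # from -(f, v')
    p, prev_sets = [0.0] * n, None
    for _ in range(max_iter):
        plus = [pi - beta > 0 for pi in p]      # step (2): upper active set
        minus = [pi + beta < 0 for pi in p]     # step (2): lower active set
        if (plus, minus) == prev_sets:          # sets unchanged: p is a solution
            return p, True
        prev_sets = (plus, minus)
        diag = [2 * kappa + gamma + (c if plus[i] or minus[i] else 0.0)
                for i in range(n)]
        off = [-kappa] * n
        rhs = [base_rhs[i] + c * beta * (int(plus[i]) - int(minus[i]))
               for i in range(n)]
        p = solve_tridiagonal(off, diag, off, rhs)   # step (3)
    return p, False


def residual_1d(p, f, c, beta, gamma):
    """Maximum norm of the discrete analogue of (9.1.12)."""
    n = len(p)
    h = 1.0 / (n + 1)
    kappa = (1.0 + 1.0 / c) / h ** 2
    ext = [0.0] + p + [0.0]
    worst = 0.0
    for i in range(n):
        lam = max(0.0, c * (p[i] - beta)) + min(0.0, c * (p[i] + beta))
        r = (kappa * (2 * ext[i + 1] - ext[i] - ext[i + 2]) + gamma * p[i]
             + lam - (f[i + 1] - f[i]) / h)
        worst = max(worst, abs(r))
    return worst


data = [0.0] * 10 + [1.0] * 11          # piecewise constant data with one edge
p_c, converged = active_set_1d(data, c=1.0e3, beta=0.05, gamma=1.0e-3)
print(converged, residual_1d(p_c, data, 1.0e3, 0.05, 1.0e-3))
```

The stopping rule used here (unchanged active sets) is one common choice; when it triggers, the iterate satisfies the discrete penalized optimality condition up to rounding, because the linear system just solved was assembled with exactly the sets that the iterate reproduces.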
Let us compare the algorithm of this section to the general framework of augmented Lagrangians presented in Section 4.6 for nonsmooth problems. We again introduce in (9.1.10) a diffusive regularization and realize the inequality constraints by a generalized Yosida–Moreau approximation. This suggests considering the Lagrangian $L_c(\vec p, \vec\lambda) : \mathbb{H}^1_0(\Omega) \times \mathbb{L}^2(\Omega) \to \mathbb{R}$ defined by
\[ L_c(\vec p, \vec\lambda) = \tfrac{1}{2\bar c} |\nabla\vec p|^2 + \tfrac12 |\mathrm{div}\,\vec p + K^*f|_B^2 + \tfrac{\gamma}{2} |\vec p|^2 + \varphi_c(\vec p, \vec\lambda), \tag{9.1.35} \]
where $\varphi_c$ is the generalized Yosida–Moreau approximation of the indicator function $\varphi$ of the set $\{\vec p \in \mathbb{L}^2(\Omega) : -\beta\vec 1 \le \vec p \le \beta\vec 1\}$, and $c > 0$, $\bar c > 0$. Here we choose $c$ differently from $\bar c$ since in the limit of the augmented Lagrangian iteration the constraint $-\beta\vec 1 \le \vec p \le \beta\vec 1$ is satisfied for any fixed $c > 0$. We have
\[ \varphi_c(\vec p, \vec\lambda) = \inf_{\vec q \in \mathbb{L}^2(\Omega)} \Big\{ \varphi(\vec p - \vec q) + (\vec\lambda, \vec q)_{\mathbb{L}^2(\Omega)} + \tfrac{c}{2} |\vec q|^2_{\mathbb{L}^2(\Omega)} \Big\} \]
for $c > 0$ and $\vec\lambda \in \mathbb{L}^2(\Omega)$, which can equivalently be expressed as
\[ \varphi_c(\vec p, \vec\lambda) = \tfrac{1}{2c} \big|\max(0, \vec\lambda + c(\vec p - \beta\vec 1))\big|^2_{\mathbb{L}^2(\Omega)} + \tfrac{1}{2c} \big|\min(0, \vec\lambda + c(\vec p + \beta\vec 1))\big|^2_{\mathbb{L}^2(\Omega)} - \tfrac{1}{2c} |\vec\lambda|^2_{\mathbb{L}^2(\Omega)}. \]
The auxiliary problems in step 2 of the augmented Lagrangian method of Section 4.6 with $L_c$ given in (9.1.35) coincide with (9.1.11) except for the shift by $\vec\lambda_k$ in the max/min operations. Each of these auxiliary problems can efficiently be solved by the semismooth Newton algorithm presented in this section if $\vec p_k \mp \beta\vec 1$ is replaced by $\vec\lambda_k + c(\vec p_k \mp \beta\vec 1)$. Conversely, one can think of introducing augmented Lagrangian steps to the algorithm of this section, with the goal of avoiding possible ill-conditioning as $c \to \infty$. For numerical experience with the algorithmic concepts of this section we refer the reader to [HinK1].
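The equivalence of the infimum and the closed form of the generalized Yosida–Moreau approximation can be checked pointwise. The sketch below treats a single scalar value of $p$ and $\lambda$, which is an assumption made for illustration (the $\mathbb{L}^2$ expressions are obtained by integrating the pointwise ones); the function names are hypothetical.

```python
def phi_c_closed_form(p, lam, c, beta):
    """Pointwise closed form with the max and min terms shifted by lam."""
    plus = max(0.0, lam + c * (p - beta))
    minus = min(0.0, lam + c * (p + beta))
    return (plus ** 2 + minus ** 2 - lam ** 2) / (2.0 * c)


def phi_c_by_minimization(p, lam, c, beta):
    """Minimize lam*q + c/2*q^2 subject to p - q in [-beta, beta]."""
    # The constraint means q in [p - beta, p + beta]; the objective is a convex
    # parabola in q, so the minimizer is the clamped unconstrained one, -lam/c.
    q = min(max(-lam / c, p - beta), p + beta)
    return lam * q + 0.5 * c * q * q


grid = [-2.0 + 0.25 * i for i in range(17)]
max_gap = max(
    abs(phi_c_closed_form(p, lam, 4.0, 0.5) - phi_c_by_minimization(p, lam, 4.0, 0.5))
    for p in grid
    for lam in grid
)
print(max_gap)
```

For $\lambda = 0$ the closed form reduces to the penalty terms of (9.1.11), which is one way to see why the auxiliary problems differ from (9.1.11) only by the shift in the max/min operations.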
9.2 Friction and contact problems in elasticity

9.2.1 Generalities

This section is concerned with the application of semismooth Newton methods to contact problems with Tresca and Coulomb friction in two spatial dimensions. In contact problems, also known as Signorini problems, one has to detect the contact zone between an elastic body and a rigid foundation. At the contact boundary, frictional forces are often too large to be neglected. Thus, besides the nonpenetration condition the frictional behavior in the contact zone has to be taken into account. Modeling friction involves a convex, nondifferentiable functional which, by means of Fenchel duality theory, can be related to a bilateral obstacle problem for which the primal-dual active set concept can advantageously be applied.

We commence with a formulation of the Signorini contact problem with Coulomb friction in two dimensions. A generalization to the three-dimensional case is given in [HuStWo]. We consider an elastic body that occupies, in its initial configuration, the open and bounded domain $\Omega \subset \mathbb{R}^2$ with $C^{1,1}$-boundary $\Gamma = \partial\Omega$. Let this boundary be divided
into three disjoint parts, namely, the Dirichlet part $\Gamma_d$, further the part $\Gamma_n$ with prescribed surface load $h \in \mathbf{L}^2(\Gamma_n) := (L^2(\Gamma_n))^2$, and the part $\Gamma_c$, where contact and friction with a rigid foundation may occur. For simplicity we assume that $\bar\Gamma_c \cap \bar\Gamma_d = \emptyset$ to avoid working with the space $H^{1/2}_{00}(\Gamma_c)$. We are interested in the deformation $y = (y_1, y_2)^\top$ of the elastic body which is also subject to a given body force $f \in \mathbf{L}^2(\Omega) := (L^2(\Omega))^2$. The gap between the elastic body and the rigid foundation is $d := \tau_N \mathbf{d} \ge 0$, where $\mathbf{d} \in \mathbf{H}^1(\Omega) := (H^1(\Omega))^2$ and $\tau_N y$ denotes the normal component of the trace along $\Gamma_c$. As usual in linear elasticity, the linearized strain tensor is
\[ \varepsilon(y) = \tfrac12 \big(\nabla y + (\nabla y)^\top\big). \]
Using Hooke's law for the stress-strain relation, the linearized stress tensor
\[ \sigma(y) := \mathbb{C}\varepsilon(y) := \lambda\,\mathrm{tr}(\varepsilon(y))\,\mathrm{Id} + 2\mu\,\varepsilon(y) \]
is obtained, where $\lambda$ and $\mu$ are the Lamé parameters. These parameters are given by $\lambda = (E\nu)/((1+\nu)(1-2\nu))$ and $\mu = E/(2(1+\nu))$ with Young's modulus $E > 0$ and the Poisson ratio $\nu \in (0, 0.5)$. Above, $\mathbb{C}$ denotes the fourth order isotropic material tensor for linear elasticity. The Signorini problem with Coulomb friction is then given as follows:
\begin{align}
-\mathrm{Div}\,\sigma(y) &= f && \text{in } \Omega, \tag{9.2.1a} \\
\tau y &= 0 && \text{on } \Gamma_d, \tag{9.2.1b} \\
\sigma(y) n &= h && \text{on } \Gamma_n, \tag{9.2.1c} \\
\tau_N y - d \le 0,\quad \sigma_N(y) \le 0,\quad (\tau_N y - d)\,\sigma_N(y) &= 0 && \text{on } \Gamma_c, \tag{9.2.1d} \\
|\sigma_T(y)| &< \mathfrak{F}\,|\sigma_N(y)| && \text{on } \{x \in \Gamma_c : \tau_T y = 0\}, \tag{9.2.1e} \\
\sigma_T(y) &= -\mathfrak{F}\,|\sigma_N(y)|\, \frac{\tau_T y}{|\tau_T y|} && \text{on } \{x \in \Gamma_c : \tau_T y \ne 0\}, \tag{9.2.1f}
\end{align}
where the sets $\{x \in \Gamma_c : \tau_T y = 0\}$ and $\{x \in \Gamma_c : \tau_T y \ne 0\}$ are referred to as the sticky and the sliding regions, respectively. Above, Div denotes the rowwise divergence operator and $\tau : \mathbf{H}^1(\Omega) \to \mathbf{H}^{1/2}(\Gamma) := (H^{1/2}(\Gamma))^2$ the zero order trace mapping. The corresponding scalar-valued normal and tangential component mappings are denoted by $\tau_N, \tau_T : \mathbf{H}^1(\Omega) \to H^{1/2}(\Gamma_c)$; i.e., for $y \in \mathbf{H}^1(\Omega)$ we have the splitting $\tau y = (\tau_N y) n + (\tau_T y) t$ with $n$ and $t$ denoting the unit normal and tangential vector along $\Gamma_c$, respectively. Similarly, using (9.2.1a) we can, following [KO], decompose the stress along the boundary, namely, $\sigma(y) n = \sigma_N(y) n + \sigma_T(y) t$ with mappings $\sigma_N, \sigma_T : \mathbf{Y} \to H^{-1/2}(\Gamma_c)$. Moreover, $\mathfrak{F} : \Gamma_c \to \mathbb{R}$ denotes the friction coefficient which is supposed to be uniformly Lipschitz continuous.

There are major mathematical difficulties inherent in the problem (9.2.1). For instance, in general $\sigma_N(y)$ in (9.2.1e), (9.2.1f) is not pointwise a.e. defined. Replacing the Coulomb friction in the above model by Tresca friction means replacing $|\sigma_N(y)|$ by a given friction $g$. The resulting system can then be analyzed and the existence of a unique solution can be proved. For a review we refer the reader to [Rao], for example.
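The formulas for the Lamé parameters and Hooke's law translate directly into code. The following is a minimal sketch with hypothetical function names; it works with plain 2x2 nested lists and is meant only to make the constitutive relation concrete.

```python
def lame_parameters(E, nu):
    """Lame parameters from Young's modulus E > 0 and Poisson ratio 0 < nu < 0.5."""
    if not (E > 0 and 0 < nu < 0.5):
        raise ValueError("require E > 0 and 0 < nu < 0.5")
    lam = E * nu / ((1 + nu) * (1 - 2 * nu))
    mu = E / (2 * (1 + nu))
    return lam, mu


def strain(grad_y):
    """Linearized strain: symmetric part of a 2x2 displacement gradient."""
    return [[0.5 * (grad_y[i][j] + grad_y[j][i]) for j in range(2)]
            for i in range(2)]


def stress(eps, lam, mu):
    """Hooke's law: sigma = lam * tr(eps) * Id + 2 * mu * eps."""
    trace = eps[0][0] + eps[1][1]
    return [[lam * trace * (1.0 if i == j else 0.0) + 2 * mu * eps[i][j]
             for j in range(2)] for i in range(2)]


lam, mu = lame_parameters(E=1.0, nu=0.25)
sigma = stress(strain([[1.0, 0.0], [0.0, 0.0]]), lam, mu)
print(lam, mu, sigma)
```

The restriction $\nu < 0.5$ is enforced because $\lambda$ blows up as $\nu \to 0.5$, the incompressible limit.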
9.2.2 Contact problem with Tresca friction
We now give a variational statement of the Signorini problem with given friction. The set of admissible deformations is defined as
\[ \mathbf{Y} := \{v \in \mathbf{H}^1(\Omega) : \tau v = 0 \text{ a.e. on } \Gamma_d\}, \]
where $\mathbf{H}^1(\Omega) := (H^1(\Omega))^2$. To incorporate the nonpenetration condition between the elastic body and the rigid foundation we introduce the cone
\[ \mathbf{K} := \{v \in \mathbf{Y} : \tau_N v \le 0 \text{ a.e. on } \Gamma_c\}. \]
We define the symmetric bilinear form $a(\cdot\,,\cdot)$ on $\mathbf{Y} \times \mathbf{Y}$ and the linear form $L(\cdot)$ on $\mathbf{Y}$ by
\[ a(y, z) := \int_\Omega (\sigma y) : (\varepsilon z)\,dx, \qquad L(y) = \int_\Omega f\,y\,dx + \int_{\Gamma_n} h\,\tau y\,dx, \]
where $:$ denotes the sum of the componentwise products.

For the friction $g$ we assume that $g \in H^{-1/2}(\Gamma_c)$ and $g \ge 0$, i.e., $\langle g, h \rangle_{\Gamma_c} \ge 0$ for all $h \in H^{1/2}(\Gamma_c)$ with $h \ge 0$. Since the friction coefficient $\mathfrak{F}$ is assumed to be uniformly Lipschitz continuous, it is a factor on $H^{1/2}(\Gamma_c)$, i.e., the mapping
\[ \lambda \in H^{1/2}(\Gamma_c) \mapsto \mathfrak{F}\lambda \in H^{1/2}(\Gamma_c) \]
is well defined and bounded; see [Gri, p. 21]. By duality it follows that $\mathfrak{F}$ is a factor on $H^{-1/2}(\Gamma_c)$ as well. Consequently, the nondifferentiable functional
\[ j(y) := \int_{\Gamma_c} \mathfrak{F} g\,|\tau_T y|\,dx \]
is well defined on $\mathbf{Y}$. After these preparations we can state the contact problem with given friction as
\[ \min_{y \in \mathbf{d} + \mathbf{K}} J(y) := \tfrac12 a(y, y) - L(y) + j(y) \tag{P} \]
or, equivalently, as elliptic variational inequality [Glo]:
\[ \text{Find } y \in \mathbf{d} + \mathbf{K} \text{ such that } a(y, z - y) + j(z) - j(y) \ge L(z - y) \text{ for all } z \in \mathbf{d} + \mathbf{K}. \tag{9.2.2} \]
Due to the Korn inequality, the functional $J(\cdot)$ is uniformly convex; further it is l.s.c. This implies that (P) and equivalently (9.2.2) admit a unique solution $y^* \in \mathbf{d} + \mathbf{K}$.

To derive the dual problem corresponding to (P), we apply the Fenchel calculus to the mappings $\mathcal{F} : \mathbf{Y} \to \mathbb{R}$ and $\mathcal{G} : \mathbf{V} \times H^{1/2}(\Gamma_c) \to \mathbb{R}$ given by
\[ \mathcal{F}(y) := \begin{cases} -L(y) & \text{if } y \in \mathbf{d} + \mathbf{K}, \\ \infty & \text{else;} \end{cases} \qquad \mathcal{G}(q, \nu) := \tfrac12 \int_\Omega q : \mathbb{C} q\,dx + \int_{\Gamma_c} \mathfrak{F} g\,|\nu|\,dx, \]
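where
\[ \mathbf{V} = \{p \in (L^2(\Omega))^{2\times2} : p_{12} = p_{21}\}. \]
Furthermore, $\Lambda \in \mathcal{L}(\mathbf{Y}, \mathbf{V} \times H^{1/2}(\Gamma_c))$ is given by
\[ \Lambda y := (\Lambda_1 y, \Lambda_2 y) = (\varepsilon y, \tau_T y), \]
which allows us to express (P) as
\[ \min_{y \in \mathbf{Y}} \big\{ \mathcal{F}(y) + \mathcal{G}(\Lambda y) \big\}. \]
Endowing $\mathbf{V} \times H^{1/2}(\Gamma_c)$ with the usual product norm, $\mathcal{F}$ and $\mathcal{G}$ satisfy the conditions for the Fenchel duality theorem, which was recalled in the preamble of this chapter. For the convex conjugate one derives that $\mathcal{F}^*(-\Lambda^*(p, \mu))$ equals $+\infty$ unless
\[ -\mathrm{Div}\,p = f, \quad p \cdot n = h \text{ in } \mathbf{L}^2(\Gamma_n), \quad \text{and} \quad p_T + \mu = 0 \text{ in } H^{-1/2}(\Gamma_c), \tag{9.2.3} \]
where $p_T = (n^\top p) \cdot t \in H^{-1/2}(\Gamma_c)$. Further one obtains that
\[ \mathcal{F}^*(-\Lambda^*(p, \mu)) = \begin{cases} -\langle p_N, d \rangle_{\Gamma_c} & \text{if (9.2.3) and } p_N \le 0 \text{ in } H^{-1/2}(\Gamma_c) \text{ hold,} \\ \infty & \text{else.} \end{cases} \]
Evaluating the convex conjugate for $\mathcal{G}$ yields that
\[ \mathcal{G}^*(p, \mu) = \begin{cases} \tfrac12 \int_\Omega \mathbb{C}^{-1} p : p\,dx & \text{if } \langle \mathfrak{F} g, |\nu| \rangle_{\Gamma_c} - \langle \nu, \mu \rangle_{\Gamma_c} \ge 0 \text{ for all } \nu \in H^{1/2}(\Gamma_c), \\ \infty & \text{else.} \end{cases} \]
Thus, following (9.0.1) we derive the dual problem corresponding to (P):
\[ \sup\ -\tfrac12 \int_\Omega \mathbb{C}^{-1} p : p\,dx + \langle p_N, d \rangle_{\Gamma_c} \tag{P$^*$} \]
over $(p, \mu) \in \mathbf{V} \times H^{-1/2}(\Gamma_c)$ subject to (9.2.3), $p_N \le 0$ in $H^{-1/2}(\Gamma_c)$, and $\langle \mathfrak{F} g, |\nu| \rangle_{\Gamma_c} - \langle \nu, \mu \rangle_{\Gamma_c} \ge 0$ for all $\nu \in H^{1/2}(\Gamma_c)$.

This problem is an inequality-constrained maximization problem of a quadratic functional, while the primal problem (P) involves the minimization of a nondifferentiable functional. Evaluating the extremality conditions (9.0.2) for the above problems one obtains the following lemma.

Lemma 9.5. The solution $y^* \in \mathbf{d} + \mathbf{K}$ of (P) and the solution $(p^*, \mu^*)$ of (P$^*$) are related by $\sigma y^* = p^*$ and by the existence of $\lambda^* \in H^{-1/2}(\Gamma_c)$ such that
\begin{align}
a(y^*, z) - L(z) + \langle \mu^*, \tau_T z \rangle_{\Gamma_c} + \langle \lambda^*, \tau_N z \rangle_{\Gamma_c} &= 0 \quad \text{for all } z \in \mathbf{Y}, \tag{9.2.4a} \\
\langle \lambda^*, \tau_N z \rangle_{\Gamma_c} &\le 0 \quad \text{for all } z \in \mathbf{K}, \tag{9.2.4b} \\
\langle \lambda^*, \tau_N y^* - d \rangle_{\Gamma_c} &= 0, \tag{9.2.4c} \\
\langle \mathfrak{F} g, |\nu| \rangle_{\Gamma_c} - \langle \mu^*, \nu \rangle_{\Gamma_c} &\ge 0 \quad \text{for all } \nu \in H^{1/2}(\Gamma_c), \tag{9.2.4d} \\
\langle \mathfrak{F} g, |\tau_T y^*| \rangle_{\Gamma_c} - \langle \mu^*, \tau_T y^* \rangle_{\Gamma_c} &= 0. \tag{9.2.4e}
\end{align}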
Proof. The extremality condition $-\Lambda^*(p^*, \mu^*) \in \partial\mathcal{F}(y^*)$ results in $y^* \in \mathbf{d} + \mathbf{K}$ and
\[ (p^*, \varepsilon(z - y^*)) - L(z - y^*) + \langle \mu^*, \tau_T(z - y^*) \rangle_{\Gamma_c} \ge 0 \quad \text{for all } z \in \mathbf{d} + \mathbf{K}. \tag{9.2.5} \]
The condition $(p^*, \mu^*) \in \partial\mathcal{G}(\Lambda y^*)$ yields that $p^* = \sigma y^*$ and (9.2.4d) and (9.2.4e). Introducing the multiplier $\lambda^*$ for the variational inequality (9.2.5) leads to (9.2.4a), (9.2.4b), and (9.2.4c).

According to (9.2.3) the multiplier $\mu^* \in H^{-1/2}(\Gamma_c)$ corresponding to the nondifferentiability of the primal functional $J(\cdot)$ has the mechanical interpretation $\mu^* = -\sigma_T y^*$. Using Green's theorem in (9.2.4a) one finds
\[ \lambda^* = -\sigma_N y^*, \tag{9.2.6} \]
i.e., $\lambda^*$ is the negative stress in normal direction. We now briefly comment on the case that the given friction $g$ is more regular, namely, $g \in L^2(\Gamma_c)$. In this case we can define $\mathcal{G}$ on the larger set $\mathbf{V} \times L^2(\Gamma_c)$. One can verify that the assumptions for the Fenchel duality theorem hold, and thus obtain higher regularity for the dual variable $\mu$ corresponding to the nondifferentiability of the cost functional in (P$^*$), in particular $\mu \in L^2(\Gamma_c)$. This implies that the dual problem can be written as follows:
\[ \sup\ -\tfrac12 \int_\Omega \mathbb{C}^{-1} p : p\,dx + \langle p_N, d \rangle_{\Gamma_c} \tag{9.2.7} \]
over $(p, \mu) \in \mathbf{V} \times L^2(\Gamma_c)$ subject to (9.2.3), $p_N \le 0$ in $H^{-1/2}(\Gamma_c)$, and $|\mu| \le \mathfrak{F} g$ a.e. on $\Gamma_c$.

Utilizing the relation $p = \sigma y$ and (9.2.6), one can transform (9.2.7) into
\[ \begin{cases} -\min\ \tfrac12 a(y_{\lambda,\mu}, y_{\lambda,\mu}) + \langle \lambda, d \rangle_{\Gamma_c} \quad \text{over } (\lambda, \mu) \in H^{-1/2}(\Gamma_c) \times L^2(\Gamma_c) \text{ with } \lambda \ge 0 \text{ in } H^{-1/2}(\Gamma_c),\ |\mu| \le \mathfrak{F} g \text{ a.e. on } \Gamma_c, \\ \text{where } y_{\lambda,\mu} \text{ satisfies} \\ a(y_{\lambda,\mu}, z) - L(z) + \langle \lambda, \tau_N z \rangle_{\Gamma_c} + (\mu, \tau_T z)_{\Gamma_c} = 0 \quad \text{for all } z \in \mathbf{Y}. \end{cases} \tag{9.2.8} \]
Problem (9.2.8) is an equivalent form for the dual problem (9.2.7), now written in the variables $\lambda$ and $\mu$. The primal variable $y_{\lambda,\mu}$ appears only as an auxiliary variable determined from $\lambda$ and $\mu$. Since $g \in L^2(\Gamma_c)$, also the extremality conditions corresponding to (P) and (9.2.7) can be given more explicitly. First, (9.2.4d) is equivalent to
\[ |\mu^*| \le \mathfrak{F} g \quad \text{a.e. on } \Gamma_c, \tag{9.2.4d$'$} \]
and a brief computation shows that (9.2.4e) is equivalent to
\[ \tau_T y^* = 0 \quad \text{or} \quad \tau_T y^* \ne 0 \text{ and } \mu^* = \mathfrak{F} g\, \frac{\tau_T y^*}{|\tau_T y^*|}. \tag{9.2.4e$'$} \]
Moreover (9.2.4d$'$) and (9.2.4e$'$) can equivalently be expressed as
\[ \sigma \tau_T y^* - \max(0, \sigma \tau_T y^* + \mu^* - \mathfrak{F} g) - \min(0, \sigma \tau_T y^* + \mu^* + \mathfrak{F} g) = 0 \tag{9.2.9} \]
with arbitrary $\sigma > 0$.
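The equivalence between the friction law and the nonsmooth equation (9.2.9) can be tested pointwise. In the sketch below the scalars `t`, `m`, and `g` stand for values of the tangential trace, the multiplier, and the product of friction coefficient and given friction at a single boundary point; these names, the assumption `g > 0`, and the sample grid are illustrative choices that do not appear in the text.

```python
def friction_residual(t, m, g, sigma):
    """Pointwise left-hand side of (9.2.9)."""
    return (sigma * t
            - max(0.0, sigma * t + m - g)
            - min(0.0, sigma * t + m + g))


def satisfies_friction_law(t, m, g, tol=1e-12):
    """Bound |m| <= g together with: t = 0, or m = g * t / |t|."""
    if abs(m) > g + tol:
        return False
    if abs(t) <= tol:
        return True
    return abs(m - g * (1.0 if t > 0 else -1.0)) <= tol


g = 1.0
cases = [(t, m, s)
         for t in (-1.0, -0.5, 0.0, 0.5, 1.0)
         for m in (-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5)
         for s in (0.1, 1.0, 10.0)]
agree = all(
    (abs(friction_residual(t, m, g, s)) < 1e-12) == satisfies_friction_law(t, m, g)
    for t, m, s in cases
)
print(agree)
```

The check runs over several values of $\sigma$ to reflect that the reformulation holds for arbitrary $\sigma > 0$; the parameter affects only the scaling of the residual, not its zero set.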
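We now introduce and analyze a regularized version of the contact problem with given friction that allows the application of the semismooth Newton method. In what follows we assume that $g \in L^2(\Gamma_c)$. We start our consideration with a regularized version of the dual problem (9.2.7) written in the form (9.2.8). Let $\gamma_1, \gamma_2 > 0$, $\hat\lambda \in L^2(\Gamma_c)$, $\hat\lambda \ge 0$, and $\hat\mu \in L^2(\Gamma_c)$, and define the functional $J^*_{\gamma_1,\gamma_2} : L^2(\Gamma_c) \times L^2(\Gamma_c) \to \mathbb{R}$ by
\[ J^*_{\gamma_1,\gamma_2}(\lambda, \mu) := \tfrac12 a(y_{\lambda,\mu}, y_{\lambda,\mu}) + (\lambda, d)_{\Gamma_c} + \tfrac{1}{2\gamma_1} \|\lambda - \hat\lambda\|^2_{\Gamma_c} + \tfrac{1}{2\gamma_2} \|\mu - \hat\mu\|^2_{\Gamma_c} - \tfrac{1}{2\gamma_1} \|\hat\lambda\|^2_{\Gamma_c} - \tfrac{1}{2\gamma_2} \|\hat\mu\|^2_{\Gamma_c}, \]
where $y_{\lambda,\mu} \in \mathbf{Y}$ satisfies
\[ a(y_{\lambda,\mu}, z) - L(z) + (\lambda, \tau_N z)_{\Gamma_c} + (\mu, \tau_T z)_{\Gamma_c} = 0 \quad \text{for all } z \in \mathbf{Y}. \tag{9.2.10} \]
The regularized dual problem with given friction is defined as
\[ \max\ -J^*_{\gamma_1,\gamma_2}(\lambda, \mu) \quad \text{over } (\lambda, \mu) \in L^2(\Gamma_c) \times L^2(\Gamma_c) \text{ with } \lambda \ge 0,\ |\mu| \le \mathfrak{F} g \text{ a.e. on } \Gamma_c. \tag{P$^*_{\gamma_1,\gamma_2}$} \]
Obviously, the last two terms in the definition of $J^*_{\gamma_1,\gamma_2}$ are constants and can thus be neglected in the optimization problem (P$^*_{\gamma_1,\gamma_2}$). However, they are introduced with regard to the primal problem corresponding to (P$^*_{\gamma_1,\gamma_2}$), which we turn to next. We define the functional $J_{\gamma_1,\gamma_2} : \mathbf{Y} \to \mathbb{R}$ by
\[ J_{\gamma_1,\gamma_2}(y) := \tfrac12 a(y, y) - L(y) + \tfrac{1}{2\gamma_1} \|\max(0, \hat\lambda + \gamma_1(\tau_N y - d))\|^2_{\Gamma_c} + \tfrac{1}{\gamma_2} \int_{\Gamma_c} \mathfrak{F} g\, h(\tau_T y(x), \hat\mu(x))\,dx, \]
where $h(\cdot\,,\cdot) : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ is a local smoothing of the absolute value function, given by
\[ h(x, \alpha) := \begin{cases} |\gamma_2 x + \alpha| - \tfrac12 \mathfrak{F} g & \text{if } |\gamma_2 x + \alpha| \ge \mathfrak{F} g, \\ \tfrac{1}{2\mathfrak{F} g} |\gamma_2 x + \alpha|^2 & \text{if } |\gamma_2 x + \alpha| < \mathfrak{F} g. \end{cases} \tag{9.2.11} \]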
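The smoothing (9.2.11) is a Huber-type function: quadratic near the origin and linear elsewhere. The following sketch evaluates it pointwise; the argument `fg` stands for the value of the product of friction coefficient and given friction at one boundary point and is assumed positive, and the function name and sample values are illustrative.

```python
def h_smooth(x, alpha, gamma2, fg):
    """Pointwise evaluation of (9.2.11); fg > 0 is the local friction bound."""
    s = abs(gamma2 * x + alpha)
    if s >= fg:
        return s - 0.5 * fg          # linear branch
    return s * s / (2.0 * fg)        # quadratic branch near the kink


gamma2, fg = 4.0, 2.0
kink = fg / gamma2                   # |gamma2 * x| = fg at x = 0.5 when alpha = 0
at_kink = h_smooth(kink, 0.0, gamma2, fg)
just_below = h_smooth(kink - 1e-9, 0.0, gamma2, fg)
far_out = h_smooth(10.0, 0.0, gamma2, fg)
print(at_kink, just_below, far_out)
```

Both branches take the value $\tfrac12\mathfrak{F}g$ where they meet, and their slopes with respect to $x$ agree there as well, which is consistent with the remark that the regularized primal functional is continuously differentiable.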
Then the primal problem corresponding to (P$^*_{\gamma_1,\gamma_2}$) is
\[ \min_{y \in \mathbf{Y}} J_{\gamma_1,\gamma_2}(y). \tag{P$_{\gamma_1,\gamma_2}$} \]
This can be verified similarly as for the original problem using Fenchel duality theory; see [Sta1] for details. Clearly, both (P$_{\gamma_1,\gamma_2}$) and (P$^*_{\gamma_1,\gamma_2}$) admit unique solutions $y_{\gamma_1,\gamma_2}$ and $(\lambda_{\gamma_1,\gamma_2}, \mu_{\gamma_1,\gamma_2})$, respectively. Note that the regularization turns the primal problem into the unconstrained minimization of a continuously differentiable functional, while the corresponding dual problem is still a constrained minimization of a quadratic functional. To shorten notation we henceforth mark all variables of the regularized problems only by the
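index $\gamma$ instead of $\gamma_1,\gamma_2$. It can be shown that the extremality conditions relating (P$_{\gamma_1,\gamma_2}$) and (P$^*_{\gamma_1,\gamma_2}$) are
\begin{align}
& a(y_\gamma, z) - L(z) + (\mu_\gamma, \tau_T z)_{\Gamma_c} + (\lambda_\gamma, \tau_N z)_{\Gamma_c} = 0 \quad \text{for all } z \in \mathbf{Y}, \tag{9.2.12a} \\
& \lambda_\gamma - \max(0, \hat\lambda + \gamma_1(\tau_N y_\gamma - d)) = 0 \quad \text{on } \Gamma_c, \tag{9.2.12b} \\
& \gamma_2(\xi_\gamma - \tau_T y_\gamma) + \mu_\gamma - \hat\mu = 0, \quad \xi_\gamma - \max(0, \xi_\gamma + \sigma(\mu_\gamma - \mathfrak{F} g)) - \min(0, \xi_\gamma + \sigma(\mu_\gamma + \mathfrak{F} g)) = 0 \tag{9.2.12c}
\end{align}
for any $\sigma > 0$. Here, $\xi_\gamma$ is the Lagrange multiplier associated to the constraint $|\mu| \le \mathfrak{F} g$ in (P$^*_{\gamma_1,\gamma_2}$). By setting $\sigma = \gamma_2^{-1}$, $\xi_\gamma$ can be eliminated from (9.2.12c), which results in
\[ \gamma_2 \tau_T y_\gamma + \hat\mu - \mu_\gamma - \max(0, \gamma_2 \tau_T y_\gamma + \hat\mu - \mathfrak{F} g) - \min(0, \gamma_2 \tau_T y_\gamma + \hat\mu + \mathfrak{F} g) = 0. \tag{9.2.13} \]
While (9.2.13) and (9.2.12c) are equivalent, they will motivate slightly different active set algorithms due to the parameter $\sigma$ in (9.2.12c). Next we investigate the convergence of the primal variable $y_\gamma$ as well as the dual variables $(\lambda_\gamma, \mu_\gamma)$ as the regularization parameters $\gamma_1, \gamma_2$ tend to infinity. For this purpose we denote by $y^*$ the solution of (P) and by $(\lambda^*, \mu^*)$ the solution to (P$^*$).

Theorem 9.6. For all $\hat\lambda \in L^2(\Gamma_c)$, $\hat\lambda \ge 0$, $\hat\mu \in L^2(\Gamma_c)$, and $g \in L^2(\Gamma_c)$, the primal variable $y_\gamma$ converges to $y^*$ strongly in $\mathbf{Y}$ and the dual variables $(\lambda_\gamma, \mu_\gamma)$ converge to $(\lambda^*, \mu^*)$ weakly in $H^{-1/2}(\Gamma_c) \times L^2(\Gamma_c)$ as $\gamma_1 \to \infty$ and $\gamma_2 \to \infty$.

Proof. The proof of this theorem is related to that of Theorem 8.26 in Chapter 8 and can be found in [KuSt].

An active set algorithm for solving (9.2.12) is presented next. The interpretation as generalized Newton method is discussed later. In the following we drop the index $\gamma$.

Algorithm SSN.
1. Initialize $(\lambda^0, \xi^0, \mu^0, y^0) \in L^2(\Gamma_c) \times L^2(\Gamma_c) \times L^2(\Gamma_c) \times \mathbf{Y}$, $\sigma > 0$ and set $k := 0$.
2. Determine the active and inactive sets
\begin{align*}
A_c^{k+1} &= \{x \in \Gamma_c : \hat\lambda + \gamma_1(\tau_N y^k - d) > 0\}, \\
I_c^{k+1} &= \Gamma_c \setminus A_c^{k+1}, \\
A_{f,-}^{k+1} &= \{x \in \Gamma_c : \xi^k + \sigma(\mu^k + \mathfrak{F} g) < 0\}, \\
A_{f,+}^{k+1} &= \{x \in \Gamma_c : \xi^k + \sigma(\mu^k - \mathfrak{F} g) > 0\}, \\
I_f^{k+1} &= \Gamma_c \setminus (A_{f,-}^{k+1} \cup A_{f,+}^{k+1}).
\end{align*}
3. If $k \ge 1$, $A_c^{k+1} = A_c^k$, $A_{f,-}^{k+1} = A_{f,-}^k$, and $A_{f,+}^{k+1} = A_{f,+}^k$ stop. Else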
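4. Solve
\begin{align*}
& a(y^{k+1}, z) - L(z) + (\mu^{k+1}, \tau_T z)_{\Gamma_c} + (\lambda^{k+1}, \tau_N z)_{\Gamma_c} = 0 \quad \text{for all } z \in \mathbf{Y}, \\
& \lambda^{k+1} = 0 \text{ on } I_c^{k+1}, \quad \lambda^{k+1} = \hat\lambda + \gamma_1(\tau_N y^{k+1} - d) \text{ on } A_c^{k+1}, \\
& \mu^{k+1} - \hat\mu - \gamma_2 \tau_T y^{k+1} = 0 \text{ on } I_f^{k+1}, \\
& \mu^{k+1} = -\mathfrak{F} g \text{ on } A_{f,-}^{k+1}, \quad \mu^{k+1} = \mathfrak{F} g \text{ on } A_{f,+}^{k+1}.
\end{align*}
5. Set
\[ \xi^{k+1} := \begin{cases} \tau_T y^{k+1} + \gamma_2^{-1}(\hat\mu + \mathfrak{F} g) & \text{on } A_{f,-}^{k+1}, \\ \tau_T y^{k+1} + \gamma_2^{-1}(\hat\mu - \mathfrak{F} g) & \text{on } A_{f,+}^{k+1}, \\ 0 & \text{on } I_f^{k+1}, \end{cases} \]
$k := k + 1$ and goto step 2.

Note that there exists a unique solution to the system in step 4, since it represents the necessary and sufficient optimality conditions for the equality-constrained auxiliary problem
\[ \min\ J^*_{\gamma_1,\gamma_2}(\lambda, \mu) \quad \text{subject to } \lambda = 0 \text{ on } I_c^{k+1},\ \mu = -\mathfrak{F} g \text{ on } A_{f,-}^{k+1},\ \mu = \mathfrak{F} g \text{ on } A_{f,+}^{k+1}, \]
with $J^*_{\gamma_1,\gamma_2}$ as defined in (9.2.10) that clearly has a unique solution. If the algorithm stops at step 3 then $y^k$ is the solution to the primal problem (P$_{\gamma_1,\gamma_2}$) and $(\lambda^k, \mu^k)$ solves the dual problem (P$^*_{\gamma_1,\gamma_2}$).

Provided that we choose $\sigma = \gamma_2^{-1}$, the above algorithm can be interpreted as a semismooth Newton method in infinite-dimensional spaces. To show this assertion, we consider a reduced system instead of (9.2.12). Thereby, as in the dual problem (P$^*_{\gamma_1,\gamma_2}$), the primal variable $y$ only acts as an auxiliary variable that is calculated from the dual variables $(\lambda, \mu)$. We introduce the mapping $F : L^2(\Gamma_c) \times L^2(\Gamma_c) \to L^2(\Gamma_c) \times L^2(\Gamma_c)$ by
\[ F(\lambda, \mu) = \begin{pmatrix} \lambda - \max(0, \hat\lambda + \gamma_1(\tau_N y_{\lambda,\mu} - d)) \\ \gamma_2 \tau_T y_{\lambda,\mu} + \hat\mu - \mu - \max(0, \gamma_2 \tau_T y_{\lambda,\mu} + \hat\mu - \mathfrak{F} g) - \min(0, \gamma_2 \tau_T y_{\lambda,\mu} + \hat\mu + \mathfrak{F} g) \end{pmatrix}, \tag{9.2.14} \]
where for given $\lambda$ and $\mu$ we denote by $y_{\lambda,\mu}$ the solution to
\[ a(y, z) - L(z) + (\mu, \tau_T z)_{\Gamma_c} + (\lambda, \tau_N z)_{\Gamma_c} = 0 \quad \text{for all } z \in \mathbf{Y}. \tag{9.2.15} \]
For $(\lambda, \mu) \in L^2(\Gamma_c) \times L^2(\Gamma_c)$ we have that $\tau_N y_{\lambda,\mu}, \tau_T y_{\lambda,\mu} \in H^{1/2}(\Gamma_c)$. Since $H^{1/2}(\Gamma_c)$ embeds continuously into $L^p(\Gamma_c)$ for every $p < \infty$, we have the following composition of the first component of $F$, for each $2 < p < \infty$:
\[ \begin{array}{ccccc} L^2(\Gamma_c) \times L^2(\Gamma_c) & \to & L^p(\Gamma_c) & \to & L^2(\Gamma_c), \\ (\lambda, \mu) & \mapsto & \tau_N y_{\lambda,\mu} & \mapsto & \max(0, \hat\lambda + \gamma_1(\tau_N y_{\lambda,\mu} - d)). \end{array} \tag{9.2.16} \]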
Since the mapping $\Theta$ involves a norm gap under the max functional, it is Newton differentiable (see Example 8.14), and thus the first component of $F$ is Newton differentiable. A similar observation holds for the second component as well, and thus the whole mapping $F$ is Newton differentiable. Hence, we can apply the semismooth Newton method to the equation $F(\lambda, \mu) = 0$. Calculating the explicit form of the Newton step leads to Algorithm SSN with $\sigma = \gamma_2^{-1}$.

Theorem 9.7. Suppose that there exists a constant $g_0 > 0$ with $Fg \ge g_0$, and further suppose that $\sigma \ge \gamma_2^{-1}$ and that $\|\lambda^0 - \lambda_\gamma\|_{\Gamma_c}$, $\|\mu^0 - \mu_\gamma\|_{\Gamma_c}$ are sufficiently small. Then the iterates $(\lambda^k, \xi^k, \mu^k, y^k)$ of Algorithm SSN converge superlinearly to $(\lambda_\gamma, \xi_\gamma, \mu_\gamma, y_\gamma)$ in $L^2(\Gamma_c) \times L^2(\Gamma_c) \times L^2(\Gamma_c) \times Y$.

Proof. The proof consists of two steps. First we prove the assertion for $\sigma = \gamma_2^{-1}$ and then we utilize this result for the general case $\sigma \ge \gamma_2^{-1}$.

Step 1. For $\sigma = \gamma_2^{-1}$ Algorithm SSN is a semismooth Newton method for the equation $F(\lambda, \mu) = 0$ ($F$ as defined in (9.2.14)). We already argued Newton differentiability of $F$. To apply Theorem 8.16, it remains to show that the generalized derivatives have uniformly bounded inverses, which can be achieved similarly as in the proof of Theorem 8.25 in Chapter 8. Clearly, the superlinear convergence of $(\lambda^k, \mu^k)$ carries over to the variables $\xi^k$ and $y^k$.

Step 2. For $\sigma > \gamma_2^{-1}$ we cannot use the above argument directly. Nevertheless, one can prove superlinear convergence of the iterates by showing that in a neighborhood of the solution the iterates of Algorithm SSN with $\sigma > \gamma_2^{-1}$ coincide with those of Algorithm SSN with $\sigma = \gamma_2^{-1}$. The argument for this fact exploits the smoothing properties of the Neumann-to-Dirichlet mapping for the elasticity equation. First, we again consider the case $\sigma = \gamma_2^{-1}$. Clearly, for all $k \ge 1$ we have $\lambda^k - \lambda^{k-1} \in L^2(\Gamma_c)$ and $\mu^k - \mu^{k-1} \in L^2(\Gamma_c)$. The corresponding difference $y^k - y^{k-1}$ of the primal variables satisfies

$$a(y^k - y^{k-1}, z) + (\mu^k - \mu^{k-1}, \tau_T z)_{\Gamma_c} + (\lambda^k - \lambda^{k-1}, \tau_N z)_{\Gamma_c} = 0 \quad \text{for all } z \in Y.$$

From regularity results for elliptic variational equalities it follows that there exists a constant $C > 0$ such that

$$\|\tau_T y^k - \tau_T y^{k-1}\|_{C^0(\Gamma_c)} \le C \left( \|\lambda^k - \lambda^{k-1}\|_{\Gamma_c} + \|\mu^k - \mu^{k-1}\|_{\Gamma_c} \right). \tag{9.2.17}$$

We now show that (9.2.17) implies

$$A_{f,-}^k \cap A_{f,+}^{k+1} = A_{f,+}^k \cap A_{f,-}^{k+1} = \emptyset \tag{9.2.18}$$

provided that $\|\lambda^0 - \lambda_\gamma\|_{\Gamma_c}$ and $\|\mu^0 - \mu_\gamma\|_{\Gamma_c}$ are sufficiently small. If $B := A_{f,-}^k \cap A_{f,+}^{k+1} \ne \emptyset$, then it follows that $\tau_T y^{k-1} + \gamma_2^{-1}(\hat\mu + Fg) < 0$ and $\tau_T y^k + \gamma_2^{-1}(\hat\mu - Fg) > 0$ on $B$, which implies that $\tau_T y^k - \tau_T y^{k-1} > 2\gamma_2^{-1} Fg \ge 2\gamma_2^{-1} g_0 > 0$ on $B$. This contradicts (9.2.17) provided that $\|\lambda^0 - \lambda_\gamma\|_{\Gamma_c}$ and $\|\mu^0 - \mu_\gamma\|_{\Gamma_c}$ are sufficiently small. Analogously, one can show that $A_{f,+}^k \cap A_{f,-}^{k+1} = \emptyset$.

We now choose an arbitrary $\sigma \ge \gamma_2^{-1}$ and assume that (9.2.18) holds for Algorithm SSN if $\sigma = \gamma_2^{-1}$. Then we can argue that in a neighborhood of the solution the iterates of Algorithm SSN are independent of $\sigma \ge \gamma_2^{-1}$. To verify this assertion we separately consider
the sets $I_f^k$, $A_{f,-}^k$, and $A_{f,+}^k$. On $I_f^k$ we have that $\xi^k = 0$ and thus $\sigma$ has no influence when determining the new active and inactive sets. On the set $A_{f,-}^k$ we have $\mu^k = -Fg$. Here, we consider two types of sets. First, sets where $\xi^k < 0$ belong to $A_{f,-}^{k+1}$ for the next iteration independently of $\sigma$. And, second, if $\xi^k \ge 0$, we use $\xi^k + \sigma(\mu^k - Fg) = \xi^k - 2\sigma Fg$. Sets where $\xi^k - 2\sigma Fg \le 0$ are transferred to $I_f^{k+1}$, and those where $0 < \xi^k - 2\sigma Fg \le \xi^k - 2\gamma_2^{-1} Fg$ belong to $A_{f,+}^{k+1}$ for the next iteration. However, the case that $x \in A_{f,-}^k \cap A_{f,+}^{k+1}$ cannot occur for $\sigma \ge \gamma_2^{-1}$, since it is already ruled out by (9.2.18) for $\sigma = \gamma_2^{-1}$. On the set $A_{f,+}^k$ one argues analogously. This shows that in a neighborhood of the solution the iterates are the same for all $\sigma \ge \gamma_2^{-1}$, and thus the superlinear convergence result from Step 1 carries over to the general case $\sigma \ge \gamma_2^{-1}$, which ends the proof.

Aside from the assumption that $\|\lambda^0 - \lambda_\gamma\|_{\Gamma_c}$ and $\|\mu^0 - \mu_\gamma\|_{\Gamma_c}$ are sufficiently small, $\sigma$ controls the probability that points are moved from the lower active set to the upper, or vice versa, in one iteration. Smaller values for $\sigma$ make it more likely that points belong to $A_{f,-}^k \cap A_{f,+}^{k+1}$ or $A_{f,+}^k \cap A_{f,-}^{k+1}$. In the numerical realization of Algorithm SSN it turns out that choosing small values for $\sigma$ may not be optimal, since this may lead to the situation that points which are active with respect to the upper bound become active with respect to the lower bound in the next iteration, and vice versa. This in turn may lead to cycling of the iterates. Such undesired behavior can be overcome by choosing larger values for $\sigma$. If the active set strategy is based on (9.2.13), one cannot take advantage of a parameter which helps prevent points from changing from $A_{f,+}$ to $A_{f,-}$, or vice versa, in one iteration.

Remark 9.2.1. So far we have not remarked on the choice of $\bar\lambda$ and $\bar\mu$. One possibility is to choose them according to first order augmented Lagrangian updates. In practice this will result in carrying out some steps of Algorithm SSN and then updating $(\bar\lambda, \bar\mu)$ as the current values of $(\lambda^k, \mu^k)$.
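The influence of $\sigma$ on step 2 of Algorithm SSN can be illustrated with a small nodal sketch. The function name `friction_sets` and the sample values below are illustrative assumptions and are not taken from [KuSt]; the arrays stand for nodal values of $\xi^k$, $\mu^k$, and $Fg$ on a discretized contact boundary.

```python
import numpy as np

def friction_sets(xi, mu, Fg, sigma):
    """Boolean masks (lower, upper, inactive) as in step 2 of Algorithm SSN.

    Hypothetical helper: xi, mu, Fg are nodal arrays, sigma > 0 is the
    algorithmic parameter.
    """
    lower = xi + sigma * (mu + Fg) < 0.0   # A_{f,-}^{k+1}
    upper = xi + sigma * (mu - Fg) > 0.0   # A_{f,+}^{k+1}
    inactive = ~(lower | upper)            # I_f^{k+1}
    return lower, upper, inactive

# One node that currently lies in the lower active set (mu = -Fg) with xi >= 0.
Fg = np.array([1.0])
mu = -Fg
xi = np.array([0.5])

for sigma in (0.1, 1.0):
    lower, upper, inactive = friction_sets(xi, mu, Fg, sigma)
    print(sigma, bool(lower[0]), bool(upper[0]), bool(inactive[0]))
```

For the small value $\sigma = 0.1$ the node jumps directly from the lower to the upper active set, since $\xi^k - 2\sigma Fg = 0.3 > 0$, whereas for $\sigma = 1$ it is moved to the inactive set. This is the mechanism behind the cycling discussed above.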
9.2.3 Contact problem with Coulomb friction
We give a short description of how the contact problem with Coulomb friction can be treated with the methods of the previous subsection. As pointed out earlier, one of the main difficulties of such problems is related to the lack of regularity of $\sigma_N(y)$ in (9.2.1). The following formulation utilizes the contact problem with given friction $g \in H^{-1/2}(\Gamma_c)$ and a fixed point idea. We define the cone of nonnegative functionals over $H^{1/2}(\Gamma_c)$ as

$$H_+^{-1/2}(\Gamma_c) := \left\{ \xi \in H^{-1/2}(\Gamma_c) : \langle \xi, \eta \rangle_{\Gamma_c} \ge 0 \ \text{for all } \eta \in H^{1/2}(\Gamma_c), \ \eta \ge 0 \right\}$$

and consider the mapping $\Psi : H_+^{-1/2}(\Gamma_c) \to H_+^{-1/2}(\Gamma_c)$ defined by $\Psi(g) := \lambda_g$, where $\lambda_g$ is the unique multiplier for the contact condition in (9.2.4) for the problem with given friction $g$. Property (9.2.4b) implies that $\Psi$ is well defined. With (9.2.6) in mind, $y \in Y$ is called a weak solution of the Signorini problem with Coulomb friction if its negative normal
boundary stress $-\sigma_N(y)$ is a fixed point of the mapping $\Psi$. In general, such a fixed point for the mapping $\Psi$ does not exist unless $F$ is sufficiently small; see, e.g., [EcJa, Has, HHNL]. The regularization for the Signorini contact problem with Coulomb friction that we consider here corresponds to the regularization in $(P_{\gamma_1,\gamma_2})$ and reflects the fact that the Lagrangian for the contact condition relates to the negative stress in the normal direction. It is given by

$$a(y, z - y) + \left( \max(0, \hat\lambda + \gamma_1(\tau_N y - d)), \tau_N(z - y) \right)_{\Gamma_c} - L(z - y) + \frac{1}{\gamma_2} \int_{\Gamma_c} F \max(0, \hat\lambda + \gamma_1(\tau_N y - d)) \left( h(\tau_T z, \hat\mu) - h(\tau_T y, \hat\mu) \right) dx \ge 0 \tag{9.2.19}$$

for all $z \in Y$, with $h(\cdot\,, \cdot)$ as defined in (9.2.11). Existence for (9.2.19) is obtained by means of a fixed point argument for the regularized Tresca friction problem. For this purpose we set $L_+^2(\Gamma_c) := \{\xi \in L^2(\Gamma_c) : \xi \ge 0 \text{ a.e.}\}$ and define the mapping $\Psi_\gamma : L_+^2(\Gamma_c) \to L_+^2(\Gamma_c)$ by $\Psi_\gamma(g) := \lambda_\gamma = \max(0, \hat\lambda + \gamma_1(\tau_N y_\gamma - d))$ with $y_\gamma$ the unique solution of the regularized contact problem with friction $g \in L_+^2(\Gamma_c)$. In a first step Lipschitz continuity of the mapping $\Phi_\gamma : L_+^2(\Gamma_c) \to Y$ which assigns to a given friction $g \in L_+^2(\Gamma_c)$ the corresponding solution $y_\gamma$ of $(P_{\gamma_1,\gamma_2})$ is investigated.

Lemma 9.8. For every $\gamma_1, \gamma_2 > 0$ and $\hat\lambda \in L^2(\Gamma_c)$, $\hat\mu \in L^2(\Gamma_c)$ the mapping $\Phi_\gamma$ defined above is Lipschitz continuous with constant

$$L = \frac{\|F\|_{L^\infty(\Gamma_c)} \, c_1}{\kappa}, \tag{9.2.20}$$

where $\|F\|_\infty$ denotes the essential supremum of $F$, $\kappa$ the coercivity constant of $a(\cdot\,, \cdot)$, and $c_1$ the continuity constant of the trace mapping from $Y$ to $L^2(\Gamma_c)$. In particular, the Lipschitz constant $L$ does not depend on the regularization parameters $\gamma_1, \gamma_2$.

For the proof we refer the reader to [Sta1]. We next address properties of the mapping $\Psi_\gamma$.

Lemma 9.9. For every $\gamma_1, \gamma_2 > 0$ and $\hat\lambda \in L^2(\Gamma_c)$, $\hat\mu \in L^2(\Gamma_c)$ the mapping $\Psi_\gamma : L_+^2(\Gamma_c) \to L_+^2(\Gamma_c)$ is compact and Lipschitz continuous with constant

$$L = \frac{c \, \gamma_1 \|F\|_\infty}{\kappa},$$

where $c$ is a constant resulting from trace theorems.

Proof. We consider the following composition of mappings:

$$L_+^2(\Gamma_c) \xrightarrow{\ \Phi_\gamma\ } Y \xrightarrow{\ \Theta\ } L^2(\Gamma_c) \xrightarrow{\ \Upsilon\ } L_+^2(\Gamma_c), \qquad g \mapsto y \mapsto \tau_N y \mapsto \max(0, \hat\lambda + \gamma_1(\tau_N y - d)). \tag{9.2.21}$$

From Lemma 9.8 it is known that $\Phi_\gamma$ is Lipschitz continuous. The mapping $\Theta$ consists of the linear trace mapping from $Y$ into $H^{1/2}(\Gamma_c)$ and the compact embedding of this space
into $L^2(\Gamma_c)$. Therefore, it is compact and linear, in particular Lipschitz continuous with a constant $c_2 > 0$. Since

$$\|\max(0, \hat\lambda + \gamma_1(\xi - d)) - \max(0, \hat\lambda + \gamma_1(\tilde\xi - d))\|_{\Gamma_c} \le \gamma_1 \|\xi - \tilde\xi\|_{\Gamma_c} \tag{9.2.22}$$

for all $\xi, \tilde\xi \in L^2(\Gamma_c)$, the mapping $\Upsilon$ is Lipschitz continuous with constant $\gamma_1$. This implies that $\Psi_\gamma$ is Lipschitz continuous with constant

$$L = \frac{c_1 c_2 \gamma_1 \|F\|_\infty}{\kappa}, \tag{9.2.23}$$

where $c_1, c_2$ are constants from trace theorems. Concerning compactness, the composition of $\Theta$ and $\Phi_\gamma$ is compact. From (9.2.22) it then follows that $\Psi_\gamma$ is compact. This ends the proof.

We can now show that the regularized contact problem with Coulomb friction has a solution.

Theorem 9.10. The mapping $\Psi_\gamma$ admits at least one fixed point, i.e., the regularized Coulomb friction problem (9.2.19) admits a solution. If $\|F\|_\infty$ is such that $L$ as defined in (9.2.23) is smaller than 1, the solution is unique.

Proof. We apply the Leray–Schauder fixed point theorem to the mapping $\Psi_\gamma : L^2(\Gamma_c) \to L^2(\Gamma_c)$. Using Lemma 9.9, it suffices to show that $\lambda$ is bounded in $L^2(\Gamma_c)$ independently of $g$. This is clear taking into account the dual problem $(P^*_{\gamma_1,\gamma_2})$. Indeed,

$$\min_{\lambda \ge 0, \ |\mu| \le Fg \text{ a.e. on } \Gamma_c} J_{\gamma_1,\gamma_2}(\lambda, \mu) \le \min_{\lambda \ge 0} J_{\gamma_1,\gamma_2}(\lambda, 0) < \infty.$$

Hence, the Leray–Schauder theorem guarantees the existence of a solution to the regularized Coulomb friction problem. Uniqueness of the solution holds if $F$ is such that $L$ is smaller than 1, since in this case $\Psi_\gamma$ is a contraction.

In the following algorithm the fixed point approach is combined with an augmented Lagrangian concept to solve (9.2.19).

Algorithm ALM-FP.

1. Initialize $(\hat\lambda^0, \hat\mu^0) \in L^2(\Gamma_c) \times L^2(\Gamma_c)$ and $g^0 \in L^2(\Gamma_c)$, $m := 0$.

2. Choose $\gamma_1^m, \gamma_2^m > 0$ and determine the solution $(\lambda^m, \mu^m)$ to problem $(P^*_{\gamma_1,\gamma_2})$ with given friction $g^m$ and $\hat\lambda := \hat\lambda^m$, $\hat\mu := \hat\mu^m$.

3. Update $g^{m+1} := \lambda^m$, $\hat\lambda^{m+1} := \lambda^m$, $\hat\mu^{m+1} := \mu^m$, and $m := m + 1$. Unless an appropriate stopping criterion is met, go to step 2.

The auxiliary problems $(P^*_{\gamma_1,\gamma_2})$ in step 2 can be solved by Algorithm SSN. Numerical experiments with this algorithm are given in [KuSt]. For the following brief discussion of the above algorithm, we assume that the mapping $\Psi$ admits a fixed point $\lambda^*$ in $L^2(\Gamma_c)$. Then,
with the variables $y^* \in Y$ and $\mu^* \in L^2(\Gamma_c)$ corresponding to $\lambda^*$, we have for $\gamma_1, \gamma_2 > 0$ that

$$a(y^*, z) - L(z) + (\lambda^*, \tau_N z)_{\Gamma_c} + (\mu^*, \tau_T z)_{\Gamma_c} = 0 \quad \text{for all } z \in Y,$$
$$\lambda^* - \max(0, \lambda^* + \gamma_1(\tau_N y^* - d)) = 0 \quad \text{on } \Gamma_c,$$
$$\gamma_2 \tau_T y^* - \max(0, \gamma_2 \tau_T y^* + \mu^* - F\lambda^*) - \min(0, \gamma_2 \tau_T y^* + \mu^* + F\lambda^*) = 0 \quad \text{on } \Gamma_c.$$

Provided that Algorithm ALM-FP has a limit point, it also satisfies this system of equations; i.e., the limit satisfies the original, nonregularized contact problem with Coulomb friction. We end the section with a numerical example taken from [KuSt].

Example 9.11. We solve the contact problem with Tresca as well as with Coulomb friction using Algorithms SSN and ALM-FP, respectively. We choose $\Omega = [0, 3] \times [0, 1]$, the gap function $d = \max\left(0.0015, \ 0.003(x_1 - 1.5)^2 + 0.001\right)$ and $E = 10000$, $\nu = 0.45$, and $f = 0$. The boundary of possible contact and friction is $\Gamma_c := [0, 3] \times \{0\}$, and we assume traction-free boundary conditions on $\Gamma_n := [0, 3] \times \{1\} \cup \{0\} \times [0, 0.2] \cup \{3\} \times [0, 0.2]$. On $\Gamma_d := \{0\} \times [0.2, 1] \cup \{3\} \times [0.2, 1]$ we prescribe the deformation as follows:

$$\tau y = \begin{cases} \begin{pmatrix} 0.003(1 - x_2) \\ -0.004 \end{pmatrix} & \text{on } \{0\} \times [0.2, 1], \\[2ex] \begin{pmatrix} -0.003(1 - x_2) \\ -0.004 \end{pmatrix} & \text{on } \{3\} \times [0.2, 1]. \end{cases}$$

Further $\gamma_1 = \gamma_2 = 10^8$, $\sigma = 1$, $\bar\lambda = \bar\mu = 0$, and Algorithm SSN is initialized by $\lambda^0 = \mu^0 = 0$. Algorithm ALM-FP is initialized with the solution of the pure contact problem and $g^0 = 0$, $\hat\lambda^0 = \hat\mu^0 = 0$. The MATLAB code is based on [ACFK], which uses linear and bilinear finite elements for the discretization of the elasticity equations without friction and contact.

The semismooth Newton method detects the solution for given friction $g \equiv 1$ and $F = 1$ after 7 iterations. The corresponding deformed mesh and the elastic shear energy density are shown in Figure 9.1. As expected, in a neighborhood of the points $(0, 0.2)$ and
Figure 9.1. Deformed mesh for $g \equiv 1$; gray tones visualize the elastic shear energy density.
Figure 9.2. Upper row, left: multiplier $\lambda^*$ (solid), rigid foundation (multiplied by $5 \cdot 10^3$, dotted), and normal displacement $\tau_N y^*$ (multiplied by $5 \cdot 10^3$, dashed). Lower row, left figure: dual variable $\mu^*$ (solid) with bounds $\pm F\lambda^*$ (dotted) and tangential displacement $y^*$ (multiplied by $5 \cdot 10^3$, dashed) for $F = 2$. Middle column: same as left, but with $F = 5$. Right column: same as first column, but with $F = 10$.
$(3, 0.2)$, i.e., the points where the boundary conditions change from Neumann to Dirichlet, we observe a stress concentration due to a local singularity of the solution. Further, we also observe a (small) stress concentration close to the points where the rigid foundation has the kinks.

We turn to the Coulomb friction problem and investigate the performance of Algorithm ALM-FP. In Figure 9.2 the normal and the tangential displacement with corresponding multipliers for $F = 2, 5, 10$ are depicted. One observes that the friction coefficient significantly influences the deformation. For instance, in the case $F = 2$ the elastic body is in contact with the foundation in the interval $[1.4, 1.6]$, but it is not for $F = 5$ and $F = 10$. These large values of $F$ may, however, be physically of little relevance. Algorithm ALM-FP requires overall between 20 and 25 linear solves to stop with $\|g^m - g^{m-1}\|_{\Gamma_c} / \|g^m\|_{\Gamma_c} \le 10^{-7}$. For further information on the numerical performance we refer the reader to [Sta1].
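The outer loop of Algorithm ALM-FP together with the relative stopping criterion quoted above can be sketched as follows. This is only a schematic illustration: the callable `solve_tresca` is a hypothetical placeholder for a solver of the regularized Tresca problem (for instance Algorithm SSN), and `toy_solver` is an artificial contraction with fixed point $g = 2$ that ignores $\hat\lambda$ and $\hat\mu$. It is not the elasticity problem of Example 9.11.

```python
import numpy as np

def alm_fp(solve_tresca, g0, lam_hat0, mu_hat0, tol=1e-7, max_outer=100):
    """Outer loop of Algorithm ALM-FP with a user-supplied inner solver.

    solve_tresca(g, lam_hat, mu_hat) -> (lam, mu) is an assumed interface.
    Returns the final friction g and the number of outer iterations.
    """
    g, lam_hat, mu_hat = g0, lam_hat0, mu_hat0
    for m in range(1, max_outer + 1):
        lam, mu = solve_tresca(g, lam_hat, mu_hat)          # step 2
        g_new = lam                                          # step 3: g^{m+1} := lam^m
        rel = np.linalg.norm(g_new - g) / max(np.linalg.norm(g_new), 1e-300)
        g, lam_hat, mu_hat = g_new, lam, mu                  # step 3: update shifts
        if rel <= tol:
            return g, m
    return g, max_outer

def toy_solver(g, lam_hat, mu_hat):
    """Artificial stand-in: a contraction in g with Lipschitz constant 0.5."""
    lam = np.maximum(0.0, 1.0 + 0.5 * g)
    mu = 0.3 * lam
    return lam, mu

g_star, outer_iters = alm_fp(toy_solver, np.zeros(3), np.zeros(3), np.zeros(3))
print(g_star, outer_iters)
```

Because the toy map is a contraction, the iteration converges linearly to its unique fixed point, which mirrors the uniqueness statement of Theorem 9.10 for $L < 1$.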
Chapter 10
Parabolic Variational Inequalities
In this section we discuss the Lagrange multiplier approach to parabolic variational inequalities in the Hilbert space $H = L^2(\Omega)$ which are of the type

$$\left\langle \frac{d}{dt} y^*(t) + A y^*(t) - f(t), \ y - y^*(t) \right\rangle \ge 0, \quad y^*(t) - \psi \in C, \qquad y^*(0) = y_0 \tag{10.0.1}$$

for all $y - \psi \in C$, where the closed convex set $C$ of $H$ is defined by

$$C = \{y \in H : y \ge 0\},$$

$A$ is a closed operator on $H$, $\Omega$ denotes an open domain in $\mathbb{R}^n$, and $y \ge 0$ is interpreted in the pointwise a.e. sense. We consider the Black–Scholes model for American options, which is a variational inequality of the form

$$-\frac{d}{dt} v(t, S) - \left( \frac{\sigma^2}{2} S^2 v_{SS} + r S v_S - r v + B v \right) \ge 0 \ \perp \ v(t, S) \ge \psi(S), \qquad v(T, S) = \psi(S) \tag{10.0.2}$$

for a.e. $(t, S) \in (0, T) \times (0, \infty)$, where $\perp$ denotes complementarity, i.e., $a \ge 0 \perp b \ge \psi$ if $a \ge 0$, $b \ge \psi$, and $a(b - \psi) = 0$. In (10.0.2) the reward function $\psi(S) = (K - S)^+$ for the put option and $\psi(S) = (S - K)^+$ for the call option. Here $S \ge 0$ denotes the price, $v$ the value of the share, $r > 0$ is the interest rate, $\sigma > 0$ is the volatility of the market, and $K$ is the strike price. Further $T$ is the maturity date. The integral operator $B$ is defined by

$$B v(S) = -\lambda \int_0^\infty \left( (z - 1) S v_S + (v(t, S) - v(t, zS)) \right) d\nu(z).$$

Note that (10.0.2) is a backward equation with respect to the time variable. Setting $y(t, S) = v(T - t, S)$ we arrive at (10.0.1), and (10.0.2) has the following interpretation
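[Kou] in mathematical finance. The price process $S_t$ is governed by Itô's stochastic differential equation

$$dS_t / S_{t-} = r \, dt + \sigma \, dB_t + (J_t - 1) \, d\pi_t,$$

where $B_t$ denotes a standard Brownian motion, $\pi_t$ is a counting Poisson process, $J_t - 1$ is the magnitude of the jump $\nu$, and $\lambda \ge 0$ is the rate. The value function $v$ is represented by

$$v(t, S) = \sup_\tau E^{t,x}\left[ e^{-r(\tau - t)} \psi(S_\tau) \right] \quad \text{over all stopping times } \tau \le T. \tag{10.0.3}$$

It will be shown that for the put case, $(0, \infty)$ can be replaced by $\Omega = (\bar S, \infty)$ with certain $\bar S > 0$. Thus, (10.0.2) can be formulated as (10.0.1) by defining a bounded bilinear form $a$ on $V \times V$ by

$$a(v, \phi) = \int_{\bar S}^\infty \left( \left( \frac{\sigma^2}{2} S^2 v_S + (r - \sigma^2) S v \right) \phi_S + (2r - \sigma^2) v \phi - B v \, \phi \right) dS \tag{10.0.4}$$

for $v, \phi \in V$, where $V$ is defined by

$$V = \left\{ \phi \in H : \phi \text{ is absolutely continuous on } (\bar S, \infty), \ \int_{\bar S}^\infty S^2 |\phi_S|^2 \, dS < \infty \ \text{and } \phi(S) \to 0 \text{ as } S \to \infty \text{ and } \psi(\bar S) = 0 \right\}$$

equipped with

$$|\phi|_V^2 = \int_{\bar S}^\infty \left( S^2 |\phi_S|^2 + |\phi|^2 \right) dS.$$

Now

$$a(v, \phi) \le \frac{\sigma^2}{2} |v|_V |\phi|_V + |r - \sigma^2| \, |v|_H |\phi|_V + |2r - \sigma^2| \, |v|_H |\phi|_H$$

and

$$a(v, v) \ge \frac{\sigma^2}{2} |v|_V^2 + \left( 2r - \frac{3}{2} \sigma^2 \right) |v|_H^2 - |r - \sigma^2| \, |v|_V |v|_H \ge \frac{\sigma^2}{4} |v|_V^2 + \left( 2r - \frac{3}{2} \sigma^2 - \frac{(r - \sigma^2)^2}{\sigma^2} \right) |v|_H^2.$$

Note that if $Av \in L^2(\Omega)$, then

$$(Av, \phi) = a(v, \phi) \quad \text{for all } \phi \in V.$$

Then, the solution $v(T - t, S)$ satisfies (10.0.1) with $f(S) = \lambda \int_0^{\bar S / S} \psi(zS) \, d\nu(z)$. Let $v^+ = \sup(0, v)$. Since $v^+ \ge v$ a.e. in $\Omega$,

$$(-Bv, v^+) \ge (-Bv^+, v^+).$$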
Thus,

$$Av = -\left( \frac{\sigma^2}{2} S^2 v_{SS} + r S v_S - r v + B v \right)$$

satisfies

$$\langle Av, v^+ \rangle \ge \langle Av^+, v^+ \rangle.$$
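A finite-dimensional analog of the inequality $\langle Av, v^+ \rangle \ge \langle Av^+, v^+ \rangle$ may help to see why it is natural for second order operators. The sketch below is an illustration under stated assumptions: it replaces $A$ by the standard tridiagonal finite-difference matrix of $-d^2/dx^2$ with homogeneous Dirichlet conditions (not the Black–Scholes operator above) and evaluates the gap $\langle A(v - v^+), v^+ \rangle$ for random vectors. Since the off-diagonal entries of this matrix are nonpositive and $v - v^+ \le 0$ vanishes wherever $v^+ > 0$, the gap is nonnegative.

```python
import numpy as np

n = 40
h = 1.0 / (n + 1)
# Tridiagonal finite-difference matrix of -d^2/dx^2 (an M-matrix).
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

rng = np.random.default_rng(0)
gaps = []
for _ in range(100):
    v = rng.standard_normal(n)
    v_plus = np.maximum(v, 0.0)
    # <A v, v+> - <A v+, v+> = <A (v - v+), v+>
    gaps.append(v_plus @ (A @ (v - v_plus)))

min_gap = min(gaps)
print(min_gap)
```

The same sign argument fails for matrices with positive off-diagonal entries, such as discretizations of the biharmonic operator.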
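Motivated by this example we make the following assumptions. Let $X$ be a Hilbert space that is continuously embedded into $H$, and let $V$ be a separable closed linear subspace of $X$ endowed with the induced norm and dense in $H$. Assume that $\psi \in X$, and that

$$f \in L^2(0, T; V^*), \qquad \phi^+ = \sup(0, \phi) \in V \quad \text{for all } \phi \in V.$$

The following assumptions will be used for the operator $A$.

(1) $A \in \mathcal{L}(X, V^*)$, i.e., there exists $\bar M$ such that
$$|\langle Ay, \phi \rangle_{V^* \times V}| \le \bar M \, |y|_X |\phi|_V \quad \text{for all } y \in X \text{ and } \phi \in V,$$
and $A$ is closed in $H$ with
$$\mathrm{dom}(A) = \{y \in X : Ay \in H\} \subset X,$$
where $\mathrm{dom}(A)$ is a Hilbert space equipped with the graph norm.

(2) There exist $\omega > 0$ and $\rho \in \mathbb{R}$ such that for all $\phi \in V$
$$\langle A\phi, \phi \rangle \ge \omega |\phi|_V^2 - \rho |\phi|_H^2.$$

(3) For all $\phi \in V$,
$$\langle A\phi, \phi^+ \rangle \ge \langle A\phi^+, \phi^+ \rangle.$$

(4) There exists $\bar\lambda \in H$ satisfying $\bar\lambda \le 0$ a.e. such that
$$\langle \bar\lambda + A\psi - f(t), \phi \rangle \le 0$$
for a.e. $t$ and all $\phi \in V$ satisfying $\phi \ge 0$ a.e.

(5) There exists a $\bar\psi \in \mathrm{dom}(A)$ such that $\bar\psi - \psi \in V \cap C = C$.

(6) Let $a_s$ be the symmetric form on $V \times V$ defined by
$$a_s(y, \phi) = \frac{1}{2} \left( \langle Ay, \phi \rangle + \langle A\phi, y \rangle \right)$$
for $y, \phi \in V$ and assume that the skew-symmetric form satisfies
$$\frac{1}{2} |\langle Ay, \phi \rangle - \langle A\phi, y \rangle| \le M |y|_V |\phi|_H$$
for a constant $M$ independent of $y, \phi \in V$.

(7) $(\phi - \gamma)^+ \in V$ for any $\gamma \in \mathbb{R}^+$ and $\phi \in V$, and
$$\langle A1, (\phi - \gamma)^+ \rangle \ge 0.$$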
Assumptions (1)–(5) apply to (10.0.2) and to second order elliptic differential operators. Assumption (6) applies to the biharmonic operator $\Delta^2$ and to self-adjoint operators. For the biharmonic operator and for systems of equations such as the elasticity system, the monotone property (3) does not hold.

In this chapter we discuss (10.0.1) without assuming that $V$ is embedded compactly into $H$. In the latter case, one can use the Aubin lemma, which states that $W(0, T) = L^2(0, T; V) \cap H^1(0, T; V^*)$ is compactly embedded into $L^2(0, T; H)$. This ensures that the weak limit of certain approximating sequences defines the solution; see, e.g., [GLT, IK23]. Instead, our analysis uses the monotone trick for variational inequalities. From, e.g., [Tan, p. 151], we recall that $W(0, T)$ embeds continuously into $C([0, T]; H)$. We commence with the definitions of strong and weak solutions to (10.0.1).

Definition 10.1 (Strong Solution). Given $y_0 - \psi \in C$ and $f \in L^2(0, T; H)$, an $X$-valued function $y^*(t)$, with $y^* - \psi \in L^2(0, T; V) \cap H^1(0, T; V^*)$, is called a strong solution of (10.0.1) if $y^*(0) = y_0$, $y^* \in H^1(\delta, T; H)$ for every $\delta > 0$, $y^*(t, x) \ge \psi(x)$ a.e. in $(0, T) \times \Omega$, and

$$\left\langle \frac{d}{dt} y^*(t) + A y^*(t) - f(t), \ y - y^*(t) \right\rangle \ge 0$$

for a.e. $t \in (0, T)$ and for all $y - \psi \in C$.

Defining $\lambda^* = -\frac{d}{dt} y^* - A y^* + f(t) \in L^2(0, T; V^*)$, we have in the distributional sense that $y^*$ satisfies

$$\begin{cases} \dfrac{d}{dt} y^*(t) + A y^*(t) + \lambda^*(t) = f(t), \quad y^*(0) = y_0, \\[1ex] \langle \lambda^*(t), y - \psi \rangle \le 0 \ \text{for all } y - \psi \in C \ \text{and} \ \langle \lambda^*, y^* - \psi \rangle = 0. \end{cases} \tag{10.0.5}$$

Moreover, in case that $y^*$ is a strong solution we have $y^* \in L^2(\delta, T; \mathrm{dom}(A))$ for every $\delta > 0$. Further $\lambda^* \in L^2(\delta, T; H)$ and (10.0.1) can equivalently be written as a variational inequality in the form

$$\begin{cases} \dfrac{d}{dt} y^*(t) + A y^*(t) + \lambda^*(t) = f(t), \quad y^*(0) = y_0, \\[1ex] \lambda^*(t) \le 0, \quad y^*(t) \ge \psi, \quad (y^*(t) - \psi, \lambda^*(t))_H = 0 \ \text{for a.e. } t > 0. \end{cases} \tag{10.0.6}$$

Definition 10.2 (Weak Solution). Assume that $y_0 \in H$ and $f \in L^2(0, T; V^*)$. Then a function $y^* - \psi \in L^2(0, T; V)$ satisfying $y^*(t, x) \ge \psi(x)$ a.e. in $(0, T) \times \Omega$ is called a weak solution to (10.0.1) if

$$\int_0^T \left[ \left\langle \frac{d}{dt} y(t), y(t) - y^*(t) \right\rangle + \langle A y^*(t), y(t) - y^*(t) \rangle - \langle f(t), y(t) - y^*(t) \rangle \right] dt + \frac{1}{2} |y(0) - y_0|_H^2 \ge 0 \tag{10.0.7}$$
is satisfied for all $y - \psi \in K$, where

$$K = \{y \in W(0, T) : y(t, x) \ge 0 \text{ a.e. in } (0, T) \times \Omega\} \tag{10.0.8}$$

and $W(0, T) = L^2(0, T; V) \cap H^1(0, T; V^*)$.

Since for $y^*$ and $y$ in $W(0, T)$

$$\int_0^T \left\langle \frac{d}{dt} (y(t) - y^*(t)), \ y(t) - y^*(t) \right\rangle dt = \frac{1}{2} \left( |y(T) - y^*(T)|_H^2 - |y(0) - y_0|_H^2 \right),$$

it follows that a strong solution to (10.0.1) is also a weak solution.

Let us briefly outline this chapter. Section 10.1 is devoted to proving existence and uniqueness of strong solutions. Extra regularity of solutions is obtained in Section 10.2. Continuous dependence of the solution with respect to parameters in $A$ is investigated in Section 10.3. Section 10.4 focuses on weak solutions obtained as the limit of approximating difference schemes. In Section 10.5 monotonic behavior of solutions with respect to initial conditions and the forcing function is proved. We refer to [Sey] and the literature cited there for an introduction to numerical aspects in mathematical finance.
10.1 Strong solutions
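In this section we establish the existence of the strong solution to (10.0.1) under assumptions (1)–(5) and (1)–(2), (5)–(6), respectively. For $\bar\lambda \in H$ satisfying $\bar\lambda \le 0$ we consider the regularized equations of the form

$$\begin{cases} \dfrac{d}{dt} y_c + A y_c + \min(0, \bar\lambda + c \, (y_c - \psi)) = f, \quad c > 0, \\[1ex] y_c(0) = y_0. \end{cases} \tag{10.1.1}$$

Proposition 10.3. If assumptions (1)–(2) hold and $y_0 \in H$, $f \in L^2(0, T; V^*)$, and $\bar\lambda \in H$, then (10.1.1) has a unique solution $y_c$ satisfying $y_c - \psi \in L^2(0, T; V) \cap H^1(0, T; V^*)$.

Proof. Existence and uniqueness of the solution to (10.1.1) follow with monotone techniques; see [ItKa, Lio3], for instance. Define $\mathcal{A} : V \to V^*$ by

$$\mathcal{A}\phi = A\phi + \min(-\bar\lambda, c\phi).$$

Then (10.1.1) can equivalently be expressed as

$$\frac{d}{dt} v + \mathcal{A} v = f - \bar\lambda - A\psi \in L^2(0, T; V^*), \tag{10.1.2}$$

with $v = y_c - \psi$ and $v(0) = y_0 - \psi \in H$. We note that $\mathcal{A}$ is hemicontinuous, i.e., $s \mapsto \langle \mathcal{A}(\phi_1 + s\phi_2), \phi_3 \rangle$ is continuous from $\mathbb{R} \to \mathbb{R}$ for all $\phi_i \in V$, $i = 1, 2, 3$, and

$$|\mathcal{A}\phi|_{V^*} \le |A\phi|_{V^*} + c |\phi|_H \quad \text{for all } \phi \in V,$$
$$\langle \mathcal{A}\phi_1 - \mathcal{A}\phi_2, \phi_1 - \phi_2 \rangle \ge \omega |\phi_1 - \phi_2|_V^2 - \rho |\phi_1 - \phi_2|_H^2 \quad \text{for all } \phi_1, \phi_2 \in V,$$
$$\langle \mathcal{A}\phi, \phi \rangle \ge \omega |\phi|_V^2 - \rho |\phi|_H^2 \quad \text{for all } \phi \in V.$$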
It follows that (10.1.2) admits a unique solution $v \in L^2(0, T; V) \cap H^1(0, T; V^*)$ and this gives the desired solution $y_c = v + \psi$ of (10.1.1); cf. [ItKa, Theorem 8.7], [Lio3, Theorem II.1.2].

Theorem 10.4. (1) If in addition to the assumptions in Proposition 10.3 assumptions (3)–(4) hold, $y_0 - \psi \in C$, then $y_c(t) - \psi \in C$ and $y_c(t) \ge y_{\hat c}(t)$ for $\hat c \ge c$. Moreover, $y_c - \psi \to y^* - \psi$ strongly in $L^2(0, T; V)$ and weakly in $H^1(0, T; V^*)$ as $c \to \infty$, where $y^*$ is the unique solution of (10.0.1) in the sense that $y^* - \psi \in K$, (10.0.6) is satisfied with $\lambda^* \in L^2(0, T; H)$, and the estimate

$$\frac{1}{2} e^{-2\rho t} |y_c(t) - y^*(t)|_H^2 + \omega \int_0^t e^{-2\rho s} |y_c(s) - y^*(s)|_V^2 \, ds \le \frac{1}{c} \int_0^t e^{-2\rho s} |\bar\lambda|_H^2 \, ds \to 0$$

for $t \in [0, T]$ holds. If in addition assumption (7) is satisfied and $\bar\lambda \in L^\infty(\Omega)$, then

$$|y_c(t) - y^*(t)|_{L^\infty} \le \frac{1}{c} |\bar\lambda|_{L^\infty}.$$

(2) If assumptions (1)–(4) hold, $y_0 - \psi \in C$, and $f \in L^2(0, T; H)$, then $y^*$ is the unique strong solution to (10.0.1).

Proof. (1) From (10.1.1) it follows that $y_c$ satisfies

$$\begin{cases} \dfrac{d}{dt} y_c + A(y_c - \psi) + \lambda_c = f - A\psi, \\[1ex] y_c(0) = y_0, \end{cases} \tag{10.1.3}$$

where

$$\lambda_c = \min(0, \bar\lambda + c \, (y_c - \psi)).$$

If $y_0 - \psi \in C$, then $y_c(t) - \psi \in C$. In fact, let $\phi = \min(0, y_c - \psi) = -(y_c - \psi)^- \in -C \cap V$. Since $\bar\lambda \le 0$ it follows that

$$\left\langle \frac{d}{dt} y_c + A(y_c - \psi), \phi \right\rangle + \langle A\psi - f + \bar\lambda + c \, \phi, \phi \rangle = 0,$$

where by assumptions (3) and (2)

$$\langle A(y_c - \psi), \phi \rangle \ge \langle A\phi, \phi \rangle \ge \omega |\phi|_V^2 - \rho |\phi|_H^2$$

and by (4)

$$\langle A\psi - f(t) + \bar\lambda, \phi \rangle \ge 0.$$

Thus,

$$\frac{1}{2} \frac{d}{dt} |\phi|_H^2 \le \rho \, |\phi|_H^2$$
and consequently

$$e^{-2\rho t} |\phi|_H^2 \le |\phi(0)|_H^2 = 0. \tag{10.1.4}$$

Since

$$0 \ge \lambda_c = \min(0, \bar\lambda + c \, (y_c - \psi)) \ge \bar\lambda,$$

we have

$$|\lambda_c(t)|_H \le |\bar\lambda|_H \quad \text{for all } t \in [0, T].$$

From (10.1.3) we deduce that $\{y_c\}$ is bounded in $L^2(0, T; V)$. By assumption (1) and again by (10.1.3) it follows that $\{A y_c\}$ and $\{\frac{d}{dt} y_c\}$ are bounded in $L^2(0, T; V^*)$. Thus, there exist $\lambda^* \in L^2(0, T; H)$ satisfying $\lambda^* \le 0$ a.e. and $y^*$ satisfying $y^* - \psi \in K$, such that for a subsequence denoted again by $c$,

$$\lambda_c \to \lambda^* \ \text{weakly in } L^2(0, T; H), \qquad A y_c \to A y^* \ \text{and} \ \frac{d}{dt} y_c \to \frac{d}{dt} y^* \ \text{weakly in } L^2(0, T; V^*) \tag{10.1.5}$$

as $c \to \infty$. Taking the limit in (10.1.3) implies that

$$\frac{d}{dt} y^* + A y^* - f = -\lambda^*, \qquad y^*(0) = y_0, \tag{10.1.6}$$

with equality in the differential equation holding in the sense of $L^2(0, T; V^*)$.

For $\phi = -(y_c - y_{\hat c})^-$ with $c \le \hat c$ we deduce from (10.1.3) that

$$\left\langle \frac{d}{dt} (y_c - y_{\hat c}) + A(y_c - y_{\hat c}), \phi \right\rangle + (\lambda_c - \lambda_{\hat c}, \phi) = 0,$$

where

$$(\lambda_c - \lambda_{\hat c}, \phi) = \Big( \min(0, \bar\lambda + c(y_{\hat c} - \psi)) - \min(0, \bar\lambda + \hat c(y_{\hat c} - \psi)) + \min(0, \bar\lambda + c(y_c - \psi)) - \min(0, \bar\lambda + c(y_{\hat c} - \psi)), \ \phi \Big) \ge 0,$$

since $y_{\hat c} \ge \psi$. Hence, using the same arguments as those leading to (10.1.4), we have $|\phi(t)|_H = 0$ and thus

$$y_c \ge y_{\hat c} \quad \text{for } c \le \hat c.$$

By the Lebesgue dominated convergence theorem and the theorem of Beppo Levi, $y_c \to y^*$ strongly in $L^2(0, T; H)$ and pointwise a.e. in $(0, T) \times \Omega$. Since

$$0 \ge \int_0^T (\lambda_c, y_c - \psi)_H \, dt \ge -\frac{1}{c} \int_0^T |\bar\lambda|_H^2 \, dt \to 0$$

as $c \to \infty$, we have

$$\int_0^T (\lambda^*, y^* - \psi) \, dt = 0.$$
That is, $(y^*, \lambda^*)$ satisfies (10.0.6), where the first equation is satisfied in the sense of $L^2(0, T; V^*)$. Suppose that $y \in K$ satisfies (10.0.1). Then it follows that

$$\frac{1}{2} \frac{d}{dt} |y^*(t) - y(t)|_H^2 + \langle A(y^*(t) - y(t)), y^*(t) - y(t) \rangle \le 0$$

and thus

$$e^{-2\rho t} |y^*(t) - y(t)|_H^2 \le |y_0 - y(0)|_H^2.$$

This implies that $y^*$ is the unique solution to (10.0.1) in $K$ and that the whole family $\{(y_c, \lambda_c)\}$ converges in the sense specified in (10.1.5). From (10.0.1) and (10.1.1)

$$\left\langle \frac{d}{dt} y^*(t) + A y^*(t) - f(t), \ y_c(t) - y^*(t) \right\rangle \ge 0,$$
$$\left\langle \frac{d}{dt} y_c(t) + A y_c(t) - f(t), \ y^*(t) - y_c(t) \right\rangle \ge (\lambda_c, y_c - \psi)_H.$$

Since $y_c \ge \psi$, we have $(\lambda_c, y_c - \psi) \ge -\frac{1}{c} |\bar\lambda|_H^2$. Summing the above inequalities and multiplying by $e^{-2\rho t}$ give

$$\frac{1}{2} \frac{d}{dt} \left( e^{-2\rho t} |y_c(t) - y^*(t)|_H^2 \right) + e^{-2\rho t} \left( \langle A(y_c(t) - y^*(t)), y_c(t) - y^*(t) \rangle + \rho \, |y_c(t) - y^*(t)|_H^2 \right) \le e^{-2\rho t} \frac{1}{c} |\bar\lambda|_H^2,$$

which implies the first estimate and in particular that $y_c \to y^*$ strongly in $L^2(0, T; V)$.

Suppose next that in addition $\bar\lambda \in L^\infty(\Omega)$. Let $k \in \mathbb{R}^+$ and $\phi = (y_c - y^* - k)^+$. By assumption $\phi \in V$. From (10.0.6) and (10.1.1)

$$\left\langle \frac{d}{dt} y^* + A y^* - f, \phi \right\rangle \ge 0 \quad \text{and} \quad \left\langle \frac{d}{dt} y_c + A y_c + \lambda_c - f, \phi \right\rangle = 0.$$

If $k \ge \frac{1}{c} |\bar\lambda|_{L^\infty}$, then

$$(\lambda_c, \phi) = \left( \min(0, \bar\lambda + c \, (y_c - \psi)), (y_c - y^* - k)^+ \right) = 0,$$

where we use that $y_c \ge y^* \ge \psi$. Hence, we obtain

$$\left\langle \frac{d}{dt} (y_c - y^* - k) + A(y_c - y^* - k) + A k, \phi \right\rangle \le 0.$$

By assumption (3) and since $\langle A1, \phi \rangle \ge 0$,

$$\frac{1}{2} \frac{d}{dt} |\phi|^2 + \langle A\phi, \phi \rangle \le 0,$$

which implies the second estimate.
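The qualitative statements of part (1), namely feasibility $y_c \ge \psi$, monotonicity $y_c \ge y_{\hat c}$ for $c \le \hat c$, and the $L^\infty$ bound, can be observed in a stationary finite-dimensional analog of (10.1.1). The sketch below is an illustration under stated assumptions and not the scheme analyzed in the text: $A$ is the tridiagonal finite-difference matrix of $-d^2/dx^2$ with Dirichlet conditions, $\psi = 0$, $f$ is an arbitrary sample forcing, and $\bar\lambda = \min(0, f - A\psi)$ is one choice compatible with assumption (4). A single implicit Euler step of (10.1.1) has the same structure with $A$ replaced by $A + I/\Delta t$. The helper `solve_penalized` is a hypothetical name; it applies a semismooth Newton (active set) iteration to $Ay + \min(0, \bar\lambda + c(y - \psi)) = f$.

```python
import numpy as np

def solve_penalized(A, f, psi, lam_bar, c, max_iter=200):
    """Semismooth Newton iteration for A y + min(0, lam_bar + c (y - psi)) = f."""
    y = psi.copy()
    active = None
    for _ in range(max_iter):
        # Nodes where the min-term is strictly negative.
        new_active = lam_bar + c * (y - psi) < 0.0
        if active is not None and np.array_equal(new_active, active):
            break  # active set unchanged: y solves the nonlinear equation
        active = new_active
        D = np.diag(active.astype(float))
        # On active nodes: A y + lam_bar + c (y - psi) = f; elsewhere: A y = f.
        rhs = f - active * (lam_bar - c * psi)
        y = np.linalg.solve(A + c * D, rhs)
    return y

n = 50
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
psi = np.zeros(n)
f = 50.0 * np.sin(2.0 * np.pi * x)          # sample forcing (assumption)
lam_bar = np.minimum(0.0, f - A @ psi)      # lam_bar <= 0 and lam_bar <= f - A psi

y_10, y_1e3, y_1e5 = (solve_penalized(A, f, psi, lam_bar, c)
                      for c in (1e1, 1e3, 1e5))
print(y_10.min(), np.max(y_10 - y_1e3), np.max(y_1e3 - y_1e5))
```

In this example the iterates for increasing $c$ decrease monotonically toward the solution of the obstacle problem while remaining feasible, and the distance between $y_c$ and the solution for a much larger penalty parameter stays below $|\bar\lambda|_{L^\infty}/c$.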
(2) Now suppose that $f \in L^2(0, T; H)$ and that assumptions (1)–(4) hold. Consider (10.1.3) in the form

$$\begin{cases} \dfrac{d}{dt} y_c + A y_c = f - \lambda_c, \\[1ex] y_c(0) = y_0. \end{cases} \tag{10.1.7}$$

We decompose $y_c = y_{c,i} + y_h$, where $y_{c,i}$ and $y_h$ are the solutions to (10.1.7) with initial condition and forcing functions set to zero, respectively. Note that $\{\lambda_c\}$ is bounded in $L^2(0, T; H)$ uniformly with respect to $c$. Hence by the following lemma $\{A y_{c,i}\}$ and $\{\frac{d}{dt} y_{c,i}\}$ are bounded in $L^2(0, T; H)$ uniformly for $c > 0$. Moreover, $A y_h \in L^2(\delta, T; H)$ and $\frac{d}{dt} y_h \in L^2(\delta, T; H)$ for every $\delta > 0$. Thus $y_c$ is bounded in $H^1(\delta, T; H) \cap L^2(\delta, T; \mathrm{dom}(A))$ and converges weakly in $H^1(\delta, T; H) \cap L^2(\delta, T; \mathrm{dom}(A))$ to $y^*$ for $c \to \infty$.

Lemma 10.5. Under the assumptions of the previous theorem $-A$ generates an analytic semigroup on $H$. If $\frac{d}{dt} x + Ax = g \in L^2(0, T; H)$, with $x_0 = 0$, then $\frac{d}{dt} x(t)$ and $Ax(t) \in L^2(0, T; H)$, and

$$|Ax|_{L^2(0,T;H)} \le \bar k \, |g|_{L^2(0,T;H)},$$

with $\bar k$ independent of $g \in L^2(0, T; H)$.

Proof. Let $B = A - \rho I$. Further for $u \in \mathrm{dom}(A)$ and $\lambda \in \mathbb{C}$ with $\mathrm{Re}\,\lambda \ge 0$ set $\bar g = \lambda u + Bu$. Then, since

$$\mathrm{Re}\,\lambda \, (u, u)_H + \langle Bu, u \rangle \le |\bar g|_H |u|_H,$$

assumption (2) implies that

$$\omega |u|_V^2 \le |\bar g|_H |u|_H. \tag{10.1.8}$$

From assumption (1) and (10.1.8)

$$|\lambda| \, |u|_H^2 \le |\bar g|_H |u|_H + \bar M |u|_V^2 \le \left( 1 + \frac{\bar M}{\omega} \right) |\bar g|_H |u|_H,$$

and thus

$$|u|_H = |(\lambda I + B)^{-1} \bar g|_H \le \frac{1}{|\lambda|} \left( 1 + \frac{\bar M}{\omega} \right) |\bar g|_H. \tag{10.1.9}$$

It thus follows from [ItKa, Paz, Tan] that $-B$ and hence $-A$ generate analytic semigroups on $H$ related by $e^{-Bt} = e^{(\rho - A)t}$. For $g \in L^2(0, \infty; H)$ with $e^{\rho \, \cdot} g \in L^2(0, \infty; H)$ consider

$$\frac{d}{dt} x + Ax = g \quad \text{with } x(0) = 0.$$

This is related to

$$\frac{d}{dt} z + Bz = g_\rho := g e^{\rho \, \cdot} \quad \text{with } z(0) = 0 \tag{10.1.10}$$
by $z(t) = e^{\rho t} x$. Taking the Laplace transform of (10.1.10), we obtain

$$\lambda \hat z + B \hat z = \hat g_\rho, \quad \text{where } \hat z = \int_0^\infty e^{-\lambda s} z(s) \, ds,$$

and thus by (10.1.9)

$$|B \hat z|_H \le |B(\lambda I + B)^{-1} \hat g_\rho|_H \le \left( 2 + \frac{\bar M}{\omega} \right) |\hat g_\rho|_H.$$

From the Fourier–Plancherel theorem we have

$$\int_0^\infty |Az(t)|_H^2 \, dt \le \left( 2 + \frac{\bar M}{\omega} \right)^2 \int_0^\infty |g_\rho|^2 \, dt.$$

This implies that $|Ax|_{L^2(0,T;H)} \le e^{\rho T} \left( 2 + \frac{\bar M}{\omega} \right) |g|_{L^2(0,T;H)}$, choosing $g = 0$ for $t \ge T$.

To allow $\delta = 0$ for the strong solutions in the previous theorem we let $\bar y$ denote the solution to

$$\begin{cases} \dfrac{d}{dt} \bar y + A \bar y = 0, \\[1ex] \bar y(0) = y_0, \end{cases}$$

and we consider

$$\begin{cases} \dfrac{d}{dt} (y_c - \bar y) + A(y_c - \bar y) = f - \lambda_c, \\[1ex] y_c(0) - \bar y(0) = 0. \end{cases}$$

Arguing as in (2) of Theorem 10.4 we obtain the following corollary.

Corollary 10.6. Under the assumptions of Theorem 10.4 we have

$$y^* - \bar y \in H^1(0, T; H) \cap L^2(0, T; \mathrm{dom}(A)).$$

Next we turn to verify existence under a different set of assumptions which, in particular, does not involve the monotone assumption (3). For $\bar\lambda = 0$ in (10.1.1), let $\hat y_c$ denote the corresponding solution, i.e.,

$$\frac{d}{dt} \hat y_c + A \hat y_c + c \, \min(0, \hat y_c - \psi) = f, \quad c > 0, \tag{10.1.11}$$

which exists by Theorem 10.4 (1).

Theorem 10.7. If assumptions (1)–(2) and (5)–(6) hold, $y_0 - \psi \in C \cap V$, and $f \in L^2(0, T; H)$, then (10.0.1) has a unique, strong solution $y^*(t)$ in $H^1(0, T; H) \cap L^2(0, T; \mathrm{dom}(A))$, and $\hat y_c \to y^*$ strongly in $L^2(0, T; V) \cap C(0, T; H)$ as $c \to \infty$. Moreover $t \mapsto y^*(t) \in V$ is right continuous. If in addition assumptions (3)–(4) hold, then $\hat y_c \le \hat y_{\hat c}$ for $c \le \hat c$ and $\hat y_c(t) \to y^*(t)$ strongly in $H$ for each $t \in [0, T]$ and pointwise a.e. in $\Omega$.
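Proof. For $\bar\lambda = 0$ we have
\[ \frac12\frac{d}{dt}|\hat y_c - \psi|_H^2 + \langle A(\hat y_c - \psi), \hat y_c - \psi\rangle + \langle A\psi - f, \hat y_c - \psi\rangle + c\,|(\hat y_c - \psi)^-|_H^2 = 0. \]
From assumptions (1)–(2) we have
\[ |\hat y_c(t) - \psi|_H^2 + \int_0^t \bigl(\omega|\hat y_c - \psi|_V^2 + c\,|(\hat y_c - \psi)^-|_H^2\bigr)\,ds \le |y_0 - \psi|_H^2 + \int_0^t \Bigl(2\rho|\hat y_c - \psi|_H^2 + \frac1\omega|A\psi - f|_{V^*}^2\Bigr)\,ds \]
and thus
\[ |\hat y_c(t) - \psi|_H^2 + \int_0^t \bigl(\omega|\hat y_c - \psi|_V^2 + c\,|(\hat y_c - \psi)^-|_H^2\bigr)\,ds \le e^{2\rho t}\Bigl(|y_0 - \psi|_H^2 + \frac1\omega\int_0^t |A\psi - f|_{V^*}^2\,ds\Bigr). \tag{10.1.12} \]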
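With assumptions (5) and (6) holding, we have that $\hat y_c \in H^1(0,T;H)$,
\[ \Bigl\langle \frac{d}{dt}\hat y_c + A\hat y_c + c\min(0, \hat y_c - \psi) - f(t), \frac{d}{dt}\hat y_c \Bigr\rangle = 0 \]
for a.e. $t \in (0,T)$. Then
\[ \Bigl|\frac{d}{dt}\hat y_c\Bigr|_H^2 + \frac{d}{dt}\bigl(a_s(\hat y_c - \bar\psi, \hat y_c - \bar\psi) + c\,|(\hat y_c - \psi)^-|^2\bigr) \le 2\bigl(M^2|\hat y_c - \bar\psi|_V^2 + |A\bar\psi - f|_H^2\bigr) \]
and hence
\[ a_s(\hat y_c(t) - \bar\psi, \hat y_c(t) - \bar\psi) + c\,|(\hat y_c(t) - \psi)^-|_H^2 + \int_0^t \Bigl|\frac{d}{dt}\hat y_c\Bigr|_H^2\,ds \le a_s(y_0 - \bar\psi, y_0 - \bar\psi) + \int_0^t 2\bigl(M^2|\hat y_c(s) - \bar\psi|_V^2 + |A\bar\psi - f(s)|_H^2\bigr)\,ds, \tag{10.1.13} \]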
where we used the fact that $y_0 - \psi \in C$. It thus follows from (10.1.12)–(10.1.13) that
\[ |\hat y_c(t) - \bar\psi|_V^2 + c\,|(\hat y_c(t) - \psi)^-|_H^2 + \int_0^t \Bigl|\frac{d}{dt}\hat y_c\Bigr|_H^2\,ds \le K\Bigl(|y_0 - \bar\psi|_V^2 + |\psi - \bar\psi|_V^2 + \int_0^t |A\bar\psi - f(s)|_H^2\,ds\Bigr) \tag{10.1.14} \]
for a constant $K$ independent of $c > 0$ and $t \in [0,T]$. Hence there exists a $y^*$ such that $y^* - \bar\psi \in H^1(0,T;H) \cap B(0,T;V)$ and on a subsequence
\[ \frac{d}{dt}\hat y_c \to \frac{d}{dt}y^*, \quad A\hat y_c \to Ay^* \]
weakly in $L^2(0,T;H)$ and $L^2(0,T;V^*)$, respectively. In particular this implies that $\hat y_c \to y^*$ weakly in $L^2(0,T;H)$. Above $B(0,T;V)$ denotes the space of all everywhere bounded measurable functions from $[0,T]$ to $V$. By assumption (5) we have that $y^* - \psi \in B(0,T;V)$ as well. Since
\[ \int_0^T (\hat y_c - \psi, \phi)_H\,dt \ge -\int_0^T ((\hat y_c - \psi)^-, \phi)_H\,dt \]
for all $\phi \in L^2(0,T;H)$ with $\phi(t) \in C$ for a.e. $t$, and since $\lim_{c\to\infty}\int_0^T |(\hat y_c(t) - \psi)^-|_H^2\,dt = 0$ by (10.1.14), we have
\[ \int_0^T (y^*(t) - \psi, \phi)_H\,dt \ge 0 \quad \text{for all } \phi \in L^2(0,T;H) \text{ with } \phi(t) \in C. \tag{10.1.15} \]
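This implies that $y^*(t) - \psi \in C$ for a.e. $t \in [0,T]$. For $y - \psi \in C$
\[ (-c(\hat y_c(t) - \psi)^-, y - \hat y_c(t)) = (-c(\hat y_c(t) - \psi)^-, y - \psi - (\hat y_c(t) - \psi)) \le 0 \]
for a.e. $t \in (0,T)$. It therefore follows from (10.1.11) that for $y \in K$
\[ \Bigl\langle \frac{d}{dt}(\hat y_c(t) - y(t) + y(t)) + A(\hat y_c(t) - y(t)) + Ay(t) - f(t), y(t) - \hat y_c(t) \Bigr\rangle \ge 0. \tag{10.1.16} \]
Hence
\[ \int_0^t e^{-2\rho s}\Bigl\langle \frac{d}{ds}(\hat y_c(s) - y(s)), \hat y_c(s) - y(s) \Bigr\rangle ds + \int_0^t e^{-2\rho s}\Bigl\langle \frac{d}{ds}y(s) + A(\hat y_c(s) - y(s)), \hat y_c(s) - y(s) \Bigr\rangle ds \le \int_0^t e^{-2\rho s}\langle Ay(s) - f(s), y(s) - \hat y_c(s)\rangle\,ds. \]
For $z \in W(0,T)$ we have
\[ \int_0^t e^{-2\rho s}\Bigl\langle \frac{d}{ds}z(s), z(s) \Bigr\rangle ds = \frac12\bigl(e^{-2\rho t}|z(t)|_H^2 - |z(0)|_H^2\bigr) + \rho\int_0^t e^{-2\rho s}|z(s)|_H^2\,ds. \tag{10.1.17} \]
This, together with (10.1.16), implies for $y \in K$
\[ \frac12\bigl(e^{-2\rho t}|\hat y_c(t) - y(t)|_H^2 - |y_0 - y(0)|_H^2\bigr) + \int_0^t e^{-2\rho s}\Bigl(\Bigl\langle \frac{d}{ds}y(s) + A(\hat y_c(s) - y(s)), \hat y_c(s) - y(s) \Bigr\rangle + \rho|\hat y_c(s) - y(s)|_H^2\Bigr) ds \]
\[ \le \int_0^t e^{-2\rho s}\langle Ay(s) - f(s), y(s) - \hat y_c(s)\rangle\,ds \to \int_0^t e^{-2\rho s}\langle Ay(s) - f(s), y(s) - y^*(s)\rangle\,ds \]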
as $c \to \infty$. Since norms are w.l.s.c., we obtain
\[ \frac12\bigl(e^{-2\rho t}|y^*(t) - y(t)|_H^2 - |y_0 - y(0)|_H^2\bigr) + \int_0^t e^{-2\rho s}\Bigl(\Bigl\langle \frac{d}{ds}y(s) + A(y^*(s) - y(s)), y^*(s) - y(s) \Bigr\rangle + \rho|y^*(s) - y(s)|_H^2\Bigr) ds \le \int_0^t e^{-2\rho s}\langle Ay(s) - f(s), y(s) - y^*(s)\rangle\,ds, \]
or equivalently, using (10.1.17),
\[ \int_0^t e^{-2\rho s}\Bigl\langle \frac{d}{ds}y^*(s) + Ay^*(s) - f(s), y(s) - y^*(s) \Bigr\rangle ds \ge 0 \tag{10.1.18} \]
for all $y \in K$ and $t \in [0,T]$. If $y(t) - \psi \in H^1(0,T;H) \cap B(0,T;V)$ also satisfies (10.1.18), then it follows from (10.1.18) that
\[ \int_0^t e^{-2\rho s}\Bigl\langle \frac{d}{ds}(y(s) - y^*(s)) + A(y(s) - y^*(s)), y(s) - y^*(s) \Bigr\rangle ds \le 0. \]
Using (10.1.17) this implies
\[ \frac12 e^{-2\rho t}|y(t) - y^*(t)|_H^2 + \int_0^t e^{-2\rho s}\bigl(\langle A(y(s) - y^*(s)), y(s) - y^*(s)\rangle + \rho|y^*(s) - y(s)|_H^2\bigr)\,ds \le 0 \]
and thus $y(t) = y^*(t)$. Hence the solution to (10.1.18) is unique. Integrating (10.1.16) on $(\tau, t)$ with $0 \le \tau < t \le T$ we obtain with the arguments that lead to (10.1.18)
\[ \int_\tau^t e^{-2\rho s}\Bigl\langle \frac{d}{ds}y^*(s) + Ay^*(s) - f(s), y(s) - y^*(s) \Bigr\rangle ds \ge 0, \tag{10.1.19} \]
and thus $y^*$ satisfies (10.0.1). To argue that $\hat y_c - \psi \to y^* - \psi$ strongly in $L^2(0,T;V) \cap C(0,T;H)$, note that $\hat\lambda_c = c\min(0, \hat y_c - \psi)$ converges weakly in $L^2(0,T;V^*)$ to $\lambda^*$. From (10.1.11) and (10.0.5) we have
\[ \frac12\frac{d}{dt}|\hat y_c - y^*|^2 + \langle A(\hat y_c - y^*), \hat y_c - y^*\rangle = \langle \lambda^* - \hat\lambda_c, \hat y_c - y^*\rangle \le \langle \lambda^* - \hat\lambda_c, \hat y_c - \psi + \psi - y^*\rangle \le \langle \lambda^*, \hat y_c - \psi\rangle + \langle \hat\lambda_c, y^* - \psi\rangle =: \eta_c, \]
where $|\eta_c|_{L^1(0,T;\mathbb R)} \to 0$ for $c \to \infty$. By assumption (2)
\[ \frac12\frac{d}{dt}|\hat y_c - y^*|_H^2 + \omega|\hat y_c - y^*|_V^2 - \rho|\hat y_c - y^*|_H^2 \le \eta_c, \]
and hence
\[ \frac{d}{dt}\bigl[e^{-2\rho t}|\hat y_c - y^*|_H^2\bigr] + \omega e^{-2\rho t}|\hat y_c - y^*|_V^2 \le 2e^{-2\rho t}\eta_c, \]
which implies that
\[ e^{-2\rho t}|\hat y_c(t) - y^*(t)|_H^2 + \omega\int_0^t e^{-2\rho s}|\hat y_c(s) - y^*(s)|_V^2\,ds \le \int_0^t e^{-2\rho s}\eta_c(s)\,ds, \]
and the desired convergence of $\hat y_c$ to $y^*$ in $L^2(0,T;V) \cap C(0,T;H)$ follows. It remains to argue right continuity of $t \to y^*(t) \in V$. From (10.1.19) it follows that
\[ \int_\tau^t e^{-2\rho s}\Bigl\langle \frac{d}{ds}y^*(s) + Ay^*(s) - f(s), \frac{y^*(s-h) - y^*(s)}{-h} \Bigr\rangle ds \le 0, \tag{10.1.20} \]
where $h > 0$. Using $a(b - a) = \frac{b^2 - a^2}{2} - \frac12(a - b)^2$ we find
\[ \liminf_{h\to 0}\int_\tau^t e^{-2\rho s}\,\frac{1}{-h}\,a_s(y^*(s) - \bar\psi, y^*(s-h) - y^*(s))\,ds \ge 2\rho\int_\tau^t e^{-2\rho s}a_s(y^*(s) - \bar\psi, y^*(s) - \bar\psi)\,ds + \frac12 e^{-2\rho t}a_s(y^*(t) - \bar\psi, y^*(t) - \bar\psi) - \frac12 e^{-2\rho\tau}a_s(y^*(\tau) - \bar\psi, y^*(\tau) - \bar\psi). \]
This estimate, together with assumption (6) and the fact that $\bigl\{\frac{y^*(\cdot - h) - y^*}{-h}\bigr\}_{h>0}$ is weakly bounded in $L^2(0,T;H)$, allows us to pass to the limit in (10.1.20) to obtain
\[ e^{-2\rho t}a_s(y^*(t) - \bar\psi, y^*(t) - \bar\psi) - e^{-2\rho\tau}a_s(y^*(\tau) - \bar\psi, y^*(\tau) - \bar\psi) + \int_\tau^t e^{-2\rho s}\Bigl|\frac{d}{dt}y^*(s)\Bigr|_H^2 ds \]
\[ \le M\int_\tau^t e^{-2\rho s}|y^*(s) - \bar\psi|_V\Bigl|\frac{d}{dt}y^*(s)\Bigr|_H ds + \int_\tau^t e^{-2\rho s}|A\bar\psi - f(s)|_H\Bigl|\frac{d}{dt}y^*(s)\Bigr|_H ds. \]
Consequently, we have
\[ e^{-2\rho t}a_s(y^*(t) - \bar\psi, y^*(t) - \bar\psi) \le e^{-2\rho\tau}a_s(y^*(\tau) - \bar\psi, y^*(\tau) - \bar\psi) + \int_\tau^t e^{-2\rho s}\bigl(M^2|y^*(s) - \bar\psi|_V^2 + |A\bar\psi - f(s)|_H^2\bigr)\,ds \]
for $0 \le \tau \le t \le T$. This implies that
\[ a_s(y^*(\tau) - \bar\psi, y^*(\tau) - \bar\psi) + |y^*(\tau) - \bar\psi|_H^2 \ge \limsup_{t\downarrow\tau}\bigl(a_s(y^*(t) - \bar\psi, y^*(t) - \bar\psi) + |y^*(t) - \bar\psi|_H^2\bigr). \]
Since $a_s(\phi,\phi) + |\phi|_H^2$ defines an equivalent norm on the Hilbert space $V$, $|y^*(t) - y^*(\tau)|_V \to 0$ as $t \downarrow \tau$. Hence $y^*$ is right continuous.
Now, in addition assumptions (3)–(4) are supposed to hold. Let $\hat\lambda_c(t) = c\min(0, \hat y_c(t) - \psi)$. Then for $c \le \hat c$ and $\phi = (\hat y_c - \hat y_{\hat c})^+$
\[ (\hat\lambda_c - \hat\lambda_{\hat c}, \phi) = \bigl((c - \hat c)\min(0, \hat y_c - \psi) + \hat c\,(\min(0, \hat y_c - \psi) - \min(0, \hat y_{\hat c} - \psi)), \phi\bigr) \ge 0. \]
Hence, using the arguments leading to (10.1.4), we have $\hat y_c \le \hat y_{\hat c}$ for $c \le \hat c$. Then $\hat y_c(t) \to y^*(t)$ strongly in $H$ and pointwise a.e. in $\Omega$. Moreover, $\hat\lambda_c(t) = c\min(0, \hat y_c - \psi) \to \lambda^*$ weakly in $L^2(0,T;V^*)$.
Summarizing Theorems 10.4 and 10.7 we have the following corollary.
Corollary 10.8. If assumptions (1)–(6) hold, $y_0 - \psi \in C \cap V$, and $f \in L^2(0,T;H)$, then for every $t \in [0,T]$
\[ \hat y_c(t) \le y^*(t) \le y_c(t) \]
and $\hat y_c(t) \uparrow y^*(t)$ and $y_c(t) \downarrow y^*(t)$ pointwise a.e., monotonically in $\Omega$ as $c \to \infty$. Moreover, $y^* - \psi \in H^1(0,T;H) \cap L^2(0,T;\mathrm{dom}(A)) \cap L^2(0,T;V)$.
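10.2 Regularity
In this section we discuss additional regularity of the solution $y^*$ to (10.0.1) under the assumptions of Theorem 10.4 (3). For $h > 0$ we have, suppressing the superscripts "$*$",
\[ \frac{d}{dt}\frac{y(t+h) - y(t)}{h} + A\,\frac{y(t+h) - y(t)}{h} + \frac{\lambda(t+h) - \lambda(t)}{h} = \frac{f(t+h) - f(t)}{h}. \]
From (10.0.5)
\[ (\lambda(t+h) - \lambda(t), y(t+h) - y(t)) = -(\lambda(t+h), y(t) - \psi) - (\lambda(t), y(t+h) - \psi) \ge 0 \]
and thus
\[ \Bigl\langle \frac{d}{dt}\frac{y(t+h) - y(t)}{h}, \frac{y(t+h) - y(t)}{h} \Bigr\rangle + \Bigl\langle A\,\frac{y(t+h) - y(t)}{h}, \frac{y(t+h) - y(t)}{h} \Bigr\rangle \le \Bigl\langle \frac{f(t+h) - f(t)}{h}, \frac{y(t+h) - y(t)}{h} \Bigr\rangle. \]
Multiplying this by $t > 0$ we obtain
\[ \frac{d}{dt}\Bigl(\frac{t}{2}\Bigl|\frac{y(t+h) - y(t)}{h}\Bigr|_H^2\Bigr) + \frac{t\omega}{2}\Bigl|\frac{y(t+h) - y(t)}{h}\Bigr|_V^2 \le \frac12\Bigl|\frac{y(t+h) - y(t)}{h}\Bigr|_H^2 + \rho t\Bigl|\frac{y(t+h) - y(t)}{h}\Bigr|_H^2 + \frac{t}{2\omega}\Bigl|\frac{f(t+h) - f(t)}{h}\Bigr|_{V^*}^2. \]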
Integrating in time,
\[ t\,\Bigl|\frac{y(t+h) - y(t)}{h}\Bigr|_H^2 + \omega\int_0^t s\,\Bigl|\frac{y(s+h) - y(s)}{h}\Bigr|_V^2 ds \le e^{2\rho t}\int_0^t \Bigl(\Bigl|\frac{y(s+h) - y(s)}{h}\Bigr|_H^2 + \frac{s}{\omega}\Bigl|\frac{f(s+h) - f(s)}{h}\Bigr|_{V^*}^2\Bigr) ds, \]
and letting $h \to 0^+$, we obtain
\[ t\,\Bigl|\frac{d}{dt}y\Bigr|_H^2 + \omega\int_0^t s\,\Bigl|\frac{d}{dt}y\Bigr|_V^2 ds \le e^{2\rho t}\int_0^t \Bigl(\Bigl|\frac{d}{dt}y\Bigr|_H^2 + \frac{s}{\omega}\Bigl|\frac{d}{dt}f\Bigr|_{V^*}^2\Bigr) ds. \tag{10.2.1} \]
Hence we obtain the following theorem.
Theorem 10.9. Suppose that assumptions (1)–(5) hold and that $y_0 - \psi \in C \cap V$, $f \in L^2(0,T;H) \cap H^1(0,T;V^*)$. Then the strong solution satisfies (10.2.1) and thus $y(t) \in \mathrm{dom}(A)$ for all $t > 0$.
The conclusion of Theorem 10.9 remains correct under the assumptions of the first part of Theorem 10.7, i.e., under assumptions (1)–(2) and (5)–(6), $y_0 - \psi \in C \cap V$, and $f \in L^2(0,T;H) \cap H^1(0,T;V^*)$.
10.3 Continuity of $q \to y(q) \in L^\infty(\Omega)$
In this section we analyze the continuous dependence of the strong solution to (10.0.1) with respect to parameters in the operator $A$. Let $U$ denote the normed linear space of parameters and let $\tilde U \subset U$ be a bounded subset such that for each $q \in \tilde U$ the operator $A(q)$ satisfies the assumptions (1)–(5) specified at the beginning of this chapter with $\rho = 0$. We assume further that $\mathrm{dom}(A(q)) = D$ is independent of $q \in \tilde U$ and that $q \in \tilde U \to A(q) \in L(X, V^*)$ is Lipschitz continuous with Lipschitz constant $\kappa$. Let $y(q)$ be the solution to (10.0.1) corresponding to $A = A(q)$, $q \in \tilde U$. Then for $q_1, q_2 \in \tilde U$, we have
\[ \Bigl\langle \frac{d}{dt}(y(q_1) - y(q_2)) + A(q_1)(y(q_1) - y(q_2)) + (A(q_1) - A(q_2))y(q_2), y(q_1) - y(q_2) \Bigr\rangle \le 0, \]
and therefore
\[ |y(q_1)(T) - y(q_2)(T)|_H^2 + \omega\int_0^T |y(q_1) - y(q_2)|_V^2\,dt \le \frac1\omega\int_0^T |(A(q_2) - A(q_1))y(q_2)|_{V^*}^2\,dt = e(q_1,q_2)^2, \]
where
\[ e(q_1,q_2)^2 \le \frac{\kappa}{\omega}\,|q_1 - q_2|_U^2\,|y(q_2)|_{L^2(0,T;V)}^2. \]
Since from Theorem 10.9, $y(q)(T) \in D$ for $q \in \tilde U$, it follows by interpolation that
\[ |y(q_1)(T) - y(q_2)(T)|_{W^\alpha} \le \bar C\,e(q_1,q_2)^{1-\alpha}, \tag{10.3.1} \]
where $W^\alpha = [H, D]_\alpha$ is the interpolation space between $D$ and $H$ (see, e.g., [Fat, Chapter 8]), and $\bar C$ is an embedding constant. If $W^\alpha \subset L^\infty(\Omega)$, with $\alpha \in (\alpha_0, 1)$ for some $\alpha_0$, then Hölder continuity of $q \in \tilde U \to y(q)(T) \in L^\infty(\Omega)$ follows.
Next we prove Lipschitz continuity of $q \in \tilde U \to y(q) \in L^\infty(0,T;L^\infty(\Omega))$. Some prerequisites are established first. We assume that $A(q)$ generates an analytic semigroup $S(t) = S(t;q)$ on $H$ for every $q \in \tilde U$ [Paz]. Then for each $q \in \tilde U$ there exists $M$ such that
\[ \|A^\alpha S(t)\| \le \frac{M}{t^\alpha} \quad \text{for all } t > 0, \tag{10.3.2} \]
where $A^\alpha$ denote the fractional powers of $A$, with $\alpha \in (0,1)$. We assume that $M$ is independent of $q \in \tilde U$. We shall further assume that
\[ \mathrm{dom}(A^{1/2}) = V \tag{10.3.3} \]
for all $q \in \tilde U$, which is the case for a large class of second order elliptic differential operators; see, e.g., [Fat, Chapter 8]. We assume that
\[ \|A(q)\|_{L(V,V^*)} \le \bar\omega \quad \text{for all } q \in \tilde U. \tag{10.3.4} \]
Let $r > 2$ be such that $V \subset L^r(\Omega)$, and let $\bar M$ denote the embedding constant so that
\[ |\zeta|_{L^r(\Omega)} \le \bar M|\zeta|_V \quad \text{for all } \zeta \in V. \tag{10.3.5} \]
For
\[ p = \frac{2r}{r-2} \in (2,\infty), \]
we shall utilize the assumption
\[ |A^{-1/2}(q_1)(A(q_1) - A(q_2))y|_{L^p(\Omega)} \le \bar\kappa\,|q_1 - q_2|_U\,|A(q_2)^\alpha y|_H \tag{10.3.6} \]
for some $\alpha \in (0,1)$ and all $q_1, q_2 \in \tilde U$, $y \in D$. This assumption is applicable, for example, if the parameter enters as a constant into the leading differential term of $A(q)$ or if it enters into the lower order terms.
Theorem 10.10. Let $A(q)$ generate an analytic semigroup for every $q \in \tilde U$, and let assumptions (1)–(5) and (7) hold. If further (10.3.3)–(10.3.6) are satisfied and $f \in L^\infty(0,T;H)$, $y_0 \in C \cap D$, then $q \to y(q)$ is Lipschitz continuous from $\tilde U \to L^\infty(0,T;L^\infty(\Omega))$.
Proof. (1) Let $q \in \tilde U$ and $A = A(q)$. Let $f^1, f^2 \in L^\infty(0,T;H)$ with
\[ A^{-1/2}f^i \in L^\infty(0,T;L^p(\Omega)), \quad i = 1, 2, \]
and let $y^1, y^2$ denote the corresponding strong solutions to (10.0.1). For $k > 0$ let $\phi_k = \max(0, y^1 - y^2 - k)$ and $\Omega_k = \{x \in \Omega : \phi_k > 0\}$. By assumption $\phi_k \in V$ and $\phi_k \ge 0$. Note that
\[ (\min(0, \bar\lambda + c(z_1 - \psi)) - \min(0, \bar\lambda + c(z_2 - \psi)), (z_1 - z_2 - k)^+) \ge 0 \]
for all $z_1, z_2 \in H$ and $k > 0$. Thus, it follows from (10.0.5) and Theorem 10.4 that for $\phi_k = (y^1 - y^2 - k)^+ \in V$
\[ \Bigl\langle \frac{d}{dt}(y^1 - y^2), \phi_k \Bigr\rangle + \langle A(y^1 - y^2), \phi_k\rangle \le (f^1 - f^2, \phi_k) \tag{10.3.7} \]
for a.e. $t \in (0,T)$. By assumption we have $\langle A\zeta, (\zeta - k)^+\rangle \ge \omega|(\zeta - k)^+|_V^2$ for $\zeta \in V$ and hence it follows from (10.3.7) that
\[ \omega\int_0^T |\phi_k|_V^2\,dt \le \int_0^T |(f, \phi_k)|\,dt \le \|A(q)\|_{L(V,V^*)}\int_0^T |A^{-1/2}f|_{L^2(\Omega_k)}|\phi_k|_V\,dt, \]
where $\phi_k(0) = 0$, $f = f^1 - f^2$, and thus for any $\beta > 1$
\[ \omega\Bigl(\int_0^T |\phi_k|_V^2\,dt\Bigr)^{1/2} \le \bar\omega\Bigl(\int_0^T |A^{-1/2}f|_{L^2(\Omega_k)}^2\,dt\Bigr)^{1/2} \le \bar\omega\,C^{1-\beta}\Bigl(\int_0^T |A^{-1/2}f|_{L^2(\Omega_k)}^2\,dt\Bigr)^{\beta/2} \tag{10.3.8} \]
with
\[ C = \Bigl(\int_0^T |A^{-1/2}f|_{L^2}^2\,dt\Bigr)^{1/2}. \]
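For $p > q = 2$ we have
\[ \int_{\Omega_k} |A^{-1/2}f|^q\,dx \le \Bigl(\int_{\Omega_k} |A^{-1/2}f|^p\,dx\Bigr)^{q/p}|\Omega_k|^{(p-q)/p}. \tag{10.3.9} \]
We denote by $h$ and $k$ arbitrary real numbers satisfying $0 < k < h < \infty$ and we find for $r > 2$
\[ |\phi_k|_{L^r}^r \ge \int_{\Omega_k} |\phi - k|^r\,dx > \int_{\Omega_h} |\phi - k|^r\,dx \ge |\Omega_h|\,|h - k|^r. \tag{10.3.10} \]
It thus follows from (10.3.5) and (10.3.8)–(10.3.10) that for $\beta > 1$
\[ \Bigl(\int_0^T |\Omega_h|^{2/r}\,dt\Bigr)^{1/2} \le \frac{\bar M}{|h - k|}\Bigl(\int_0^T |\phi_k|_V^2\,dt\Bigr)^{1/2} \le \frac{\bar\omega\bar M C^{1-\beta}}{\omega|h - k|}\Bigl(\int_0^T |A^{-1/2}f|_{L^2(\Omega_k)}^2\,dt\Bigr)^{\beta/2} \le \frac{\bar\omega\bar M C^{1-\beta}}{\omega|h - k|}\Bigl(\int_0^T |A^{-1/2}f|_{L^p}^2\,|\Omega_k|^{(p-2)/p}\,dt\Bigr)^{\beta/2}. \]
For $\frac1P + \frac1Q = 1$ this implies that
\[ \Bigl(\int_0^T |\Omega_h|^{2/r}\,dt\Bigr)^{1/2} \le \frac{\bar\omega\bar M C^{1-\beta}}{\omega|h - k|}\Bigl(\int_0^T |A^{-1/2}f|_{L^p}^{2P}\,dt\Bigr)^{\beta/(2P)}\Bigl(\int_0^T |\Omega_k|^{Q(p-2)/p}\,dt\Bigr)^{\beta/(2Q)}. \]
For $P = \infty$ and $Q = 1$ this implies, using that $p = \frac{2r}{r-2}$,
\[ \Bigl(\int_0^T |\Omega_h|^{2/r}\,dt\Bigr)^{1/2} \le \frac{\bar K}{|h - k|}\Bigl(\int_0^T |\Omega_k|^{2/r}\,dt\Bigr)^{\beta/2}, \tag{10.3.11} \]
where $\bar K = \frac{\bar\omega\bar M C^{1-\beta}}{\omega}\,|A^{-1/2}f|_{L^\infty(0,T;L^p(\Omega))}^\beta$. Now, we use the following fact [Tr]: Let $\varphi : (k_1, h_1) \to \mathbb R$ be a nonnegative, nonincreasing function and suppose that there are positive constants $K$, $s$, and $\beta > 1$ such that
\[ \varphi(h) \le K(h - k)^{-s}\varphi(k)^\beta \quad \text{for } k_1 < k < h < h_1. \]
Then, if $\hat k = K^{1/s}\,2^{\beta/(\beta-1)}\,\varphi(k_1)^{(\beta-1)/s}$ satisfies $k_1 + \hat k < h_1$, it follows that $\varphi(k_1 + \hat k) = 0$. Here we set
\[ \varphi(k) = \Bigl(\int_0^T |\Omega_k|^{2/r}\,dt\Bigr)^{1/2} \]
on $(0,\infty)$, $s = 1$, $\beta > 1$, and
\[ k_1 = \sup_{t\in(0,T)} |A^{-1/2}(f^1(t) - f^2(t))|_{L^p(\Omega)}. \]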
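It then follows from (10.3.8)–(10.3.10), as in the computation below (10.3.10), that
\[ \varphi(k_1) \le \frac{\bar M\bar\omega C}{\omega k_1}. \]
From the definition of $\hat k$ we have in case $C \ge k_1$
\[ \hat k \le 2^{\beta/(\beta-1)}\Bigl(\frac{\bar\omega\bar M}{\omega}\Bigr)^\beta C^{1-\beta}k_1^\beta\,C^{\beta-1}k_1^{1-\beta} = 2^{\beta/(\beta-1)}\Bigl(\frac{\bar\omega\bar M}{\omega}\Bigr)^\beta k_1. \]
The same estimate can also be obtained in the case that $C \le k_1$, and consequently
\[ k_1 + \hat k \le \Bigl(1 + 2^{\beta/(\beta-1)}\Bigl(\frac{\bar\omega\bar M}{\omega}\Bigr)^\beta\Bigr)k_1. \]
Hence $y^1 - y^2$ is bounded from above by the right-hand side of this inequality a.e. in $(0,T) \times \Omega$. Analogously a uniform lower bound for $y^1 - y^2$ is obtained by using $\phi_k = \min(0, y^1 - y^2 - k) \le 0$ and thus
\[ |y^1 - y^2|_{L^\infty(0,T;L^\infty(\Omega))} \le \Bigl(1 + 2^{\beta/(\beta-1)}\Bigl(\frac{\bar\omega\bar M}{\omega}\Bigr)^\beta\Bigr)\sup_{t\in(0,T)} |A^{-1/2}(f^1(t) - f^2(t))|_{L^p(\Omega)}. \tag{10.3.12} \]
(2) We use the estimate of step (1) to obtain Lipschitz continuous dependence of the solution on the parameter $q$. Let $q_1, q_2 \in \tilde U$ with corresponding solutions $y(q_1)$ and $y(q_2)$. Since
\[ \frac{d}{dt}y(q_2) + A(q_1)y(q_2) + (A(q_2) - A(q_1))y(q_2) + \lambda(q_2) = f(t), \]
$y(q_2)$ is the solution to (10.0.1) with $A = A(q_1)$ and $\tilde f(t) = f - (A(q_2) - A(q_1))y(q_2) \in L^2(0,T;H)$. Hence we can apply the estimate of (1) with $A = A(q_1)$ and $f^1 - f^2 = (A(q_2) - A(q_1))y(q_2)$ and obtain, with the constant of (10.3.12),
\[ |y^1 - y^2|_{L^\infty(0,T;L^\infty(\Omega))} \le \Bigl(1 + 2^{\beta/(\beta-1)}\Bigl(\frac{\bar\omega\bar M}{\omega}\Bigr)^\beta\Bigr)\sup_{t\in(0,T)} |A(q_1)^{-1/2}(A(q_1) - A(q_2))y(t;q_2)|_{L^p(\Omega)}. \]
Utilizing (10.3.6) this implies that
\[ |y^1 - y^2|_{L^\infty(0,T;L^\infty(\Omega))} \le \Bigl(1 + 2^{\beta/(\beta-1)}\Bigl(\frac{\bar\omega\bar M}{\omega}\Bigr)^\beta\Bigr)\bar\kappa\,|q_1 - q_2|_U\sup_{t\in(0,T)} |A^\alpha(q_2)y(t;q_2)|_H. \tag{10.3.13} \]
To estimate $A^\alpha(q_2)y(t;q_2)$ recall from Theorem 10.4 that $\bar\lambda \le \lambda(t;q) \le 0$ and thus $\{f - \lambda(q) : q \in \tilde U\}$ is uniformly bounded in $L^\infty(0,T;H)$. From (10.0.6) we have that
\[ A(q_2)^\alpha y(t;q_2) = A(q_2)^\alpha S(t;q_2)y_0 + \int_0^t A(q_2)^\alpha S(t-s;q_2)(f(s) - \lambda(s;q_2))\,ds \in L^\infty(0,T;H). \]
From (10.3.2) and since $y_0 \in D$ it follows that $\{A(q_2)^\alpha y(q_2) : q_2 \in \tilde U\}$ is bounded in $L^\infty(0,T;H)$ as desired.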
10.4 Difference schemes and weak solutions
In this section we establish existence of weak solutions to (10.0.1) based on finite difference schemes (10.4.1). This difference approximation is also used to establish uniqueness of the weak solution and for proving monotonicity properties of the solution in the following section. For the sake of the simplicity of presentation we assume that $\rho = 0$. Consequently $\langle A\phi, \phi\rangle = a_s(\phi,\phi)$ defines an equivalent norm on $V$. For $h > 0$ consider the discretized (in time) variational inequality: Find $y^k - \psi \in C$, $k = 1, \dots, N$, satisfying
\[ \Bigl\langle \frac{y^k - y^{k-1}}{h} + Ay^k - f^k, y - y^k \Bigr\rangle \ge 0, \quad y^0 = y_0 \tag{10.4.1} \]
for all $y - \psi \in C$, where
\[ f^k = \frac1h\int_{(k-1)h}^{kh} f(t)\,dt \]
and $Nh = T$. Throughout this section we assume that $y_0 \in H$ and $f \in L^2(0,T;V^*)$.
Theorem 10.11. Assume that $y_0 \in H$ and $f \in L^2(0,T;V^*)$ and that assumptions (1)–(2) hold. Then there exists a unique solution $\{y^k\}_{k=1}^N$ to (10.4.1).
Proof. To establish existence of solutions to (10.4.1), we proceed by induction with respect to $k$ and assume that existence has been proven up to $k - 1$. To verify the induction step consider the regularized problems
\[ \frac{y_c^k - y^{k-1}}{h} + Ay_c^k + c\min(0, y_c^k - \psi) - f^k = 0. \tag{10.4.2} \]
Since $y \in H \to c\min(0, y - \psi)$ is Lipschitz continuous and monotone, the operator $B : V \to V^*$ defined by
\[ B(y) = \frac{y}{h} + Ay + c\min(0, y - \psi) \]
is coercive, monotone, and continuous for all $h > 0$. Hence by the theory of maximal monotone operators (10.4.2) admits a unique solution; cf., e.g., [ItKa, Chapter I.5], [Ba, Chapter II.1]. For each $c > 0$ and $k = 1, \dots, N$ we find
\[ \frac{1}{2h}\bigl(|y_c^k - \psi|_H^2 - |y^{k-1} - \psi|_H^2 + |y_c^k - y^{k-1}|_H^2\bigr) + \langle A(y_c^k - \psi) + A\psi - f^k, y_c^k - \psi\rangle + c\,|(y_c^k - \psi)^-|_H^2 = 0. \]
Thus the families $|y_c^k - \psi|_V^2$ and $c\,|(y_c^k - \psi)^-|_H^2$ are bounded in $c > 0$ and there exists a subsequence of $\{y_c^k - \psi\}$ that converges to some $y^k - \psi$ weakly in $V$ as $c \to \infty$. As
argued in the proof of Theorem 10.7 (cf. (10.1.15)), $y^k - \psi \in C$. Note that
\[ (-(y_c^k - \psi)^-, y - y_c^k) = (-(y_c^k - \psi)^-, y - \psi - (y_c^k - \psi)) \le 0 \quad \text{for all } y - \psi \in C \tag{10.4.3} \]
and
\[ \liminf_{c\to\infty}\langle Ay_c^k, y_c^k - y\rangle = \liminf_{c\to\infty}\bigl(\langle A(y_c^k - \psi), y_c^k - \psi\rangle + \langle A(y_c^k - \psi), \psi - y\rangle + \langle A\psi, y_c^k - y\rangle\bigr) \]
\[ \ge \langle A(y^k - \psi), y^k - \psi\rangle + \langle A(y^k - \psi), \psi - y\rangle + \langle A\psi, y^k - y\rangle = \langle Ay^k, y^k - y\rangle. \tag{10.4.4} \]
Passing to the limit in (10.4.2) utilizing (10.4.3) and (10.4.4) we obtain
\[ \Bigl\langle \frac{y^k - y^{k-1}}{h} + Ay^k - f^k, y^k - y \Bigr\rangle \le \liminf_{c\to\infty}\Bigl(\Bigl(\frac{y_c^k - y^{k-1}}{h}, y_c^k - y\Bigr) + \langle Ay_c^k, y_c^k - y\rangle - \langle f^k, y_c^k - y\rangle\Bigr) \le 0, \]
and hence $y^k$ satisfies (10.4.1). To verify uniqueness, let $\tilde y^k$ be another solution to (10.4.1). Then, from (10.4.1),
\[ \Bigl\langle \frac{(y^k - \tilde y^k) - (y^{k-1} - \tilde y^{k-1})}{h} + A(y^k - \tilde y^k), y^k - \tilde y^k \Bigr\rangle \le 0 \]
and thus
\[ \frac{1}{2h}|y^k - \tilde y^k|_H^2 + \langle A(y^k - \tilde y^k), y^k - \tilde y^k\rangle \le \frac{1}{2h}|y^{k-1} - \tilde y^{k-1}|_H^2. \]
Since $y^0 = \tilde y^0 = y_0$, this implies that $y^k = \tilde y^k$ for all $k \ge 1$.
Next we discuss existence and uniqueness of weak solutions to (10.0.1) by passing to the limit in the piecewise defined functions
\[ y_h^{(1)} = y^k + \frac{t - kh}{h}(y^{k+1} - y^k), \quad y_h^{(2)} = y^{k+1} \quad \text{on } (kh, (k+1)h] \tag{10.4.5} \]
for $k = 0, \dots, N - 1$.
Theorem 10.12. Suppose that the assumptions of Theorem 10.11 hold. Then there exists a unique weak solution $y^*$ of (10.0.1). Moreover $t \to y^*(t) \in H$ is right continuous, $y^* \in B(0,T;H)$, and $y_h^{(2)} - \psi \to y^* - \psi$ strongly in $L^2(0,T;V)$.
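Proof. Setting $y = \psi$ in (10.4.1), we obtain
\[ |y^k - \psi|_H^2 - |y^{k-1} - \psi|_H^2 + |y^k - y^{k-1}|_H^2 + h\omega\,|y^k - \psi|_V^2 \le \frac{h}{\omega}|A\psi - f^k|_{V^*}^2. \]
Thus,
\[ |y^m - \psi|_H^2 + \sum_{k=\ell+1}^m \bigl(|y^k - y^{k-1}|_H^2 + \omega h\,|y^k - \psi|_V^2\bigr) \le |y^\ell - \psi|_H^2 + \frac1\omega\sum_{k=\ell+1}^m |A\psi - f^k|_{V^*}^2\,h \tag{10.4.6} \]
for all $0 \le \ell < m \le N$. Moreover,
\[ \int_0^T |y_h^{(1)} - y_h^{(2)}|_H^2\,dt \le \frac{h}{3}\sum_{k=1}^N |y^k - y^{k-1}|_H^2 \to 0 \quad \text{as } h \to 0^+. \]
From the above estimates it follows that there exist subsequences of $y_h^{(1)}$, $y_h^{(2)}$ (denoted by the same symbols) and $y^*(t) \in L^2(0,T;V)$ such that
\[ y_h^{(1)}(t),\ y_h^{(2)}(t) \to y^*(t) \quad \text{weakly in } L^2(0,T;V) \text{ as } h \to 0^+. \tag{10.4.7} \]
Note that
\[ \frac{d}{dt}y_h^{(1)} = \frac{y^{k+1} - y^k}{h} \quad \text{on } (kh, (k+1)h]. \]
Thus, we have from (10.4.1) for every $y \in K$
\[ \Bigl\langle \frac{d}{dt}y + Ay_h^{(2)} - f_h, y - y_h^{(2)} \Bigr\rangle + \Bigl\langle \frac{d}{dt}y_h^{(1)} - \frac{d}{dt}y, y - y_h^{(2)} \Bigr\rangle \ge 0 \tag{10.4.8} \]
a.e. in $(0,T)$. Here
\[ \Bigl\langle \frac{d}{dt}(y_h^{(1)} - y), y - y_h^{(2)} \Bigr\rangle = \Bigl\langle \frac{d}{dt}y_h^{(1)} - \frac{d}{dt}y, y - y_h^{(1)} \Bigr\rangle + \Bigl\langle \frac{d}{dt}y_h^{(1)} - \frac{d}{dt}y, y_h^{(1)} - y_h^{(2)} \Bigr\rangle \tag{10.4.9} \]
with
\[ \int_0^T \Bigl\langle \frac{d}{dt}y_h^{(1)} - \frac{d}{dt}y, y - y_h^{(1)} \Bigr\rangle dt \le \frac12|y(0) - y_0|_H^2 \tag{10.4.10} \]
and
\[ \int_0^T \Bigl\langle \frac{d}{dt}y_h^{(1)}, y_h^{(1)} - y_h^{(2)} \Bigr\rangle dt = -\frac12\sum_{k=1}^N |y^k - y^{k-1}|_H^2. \tag{10.4.11} \]
Since
\[ \int_0^T \langle Ay^*, y^* - y\rangle\,dt \le \liminf_{h\to 0^+}\int_0^T \langle Ay_h^{(2)}, y_h^{(2)} - y\rangle\,dt, \]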
which can be argued as in (10.4.4), it follows from (10.4.7)–(10.4.11) that every weak cluster point $y^*$ of $y_h^{(2)}$ satisfies
\[ \int_0^T \Bigl\langle \frac{d}{dt}y + Ay^* - f, y - y^* \Bigr\rangle dt + \frac12|y(0) - y_0|_H^2 \ge 0 \tag{10.4.12} \]
for all $y \in K$ and a.e. $t \in (0,T)$. Hence $y^* \in L^2(0,T;V)$ is a weak solution of (10.0.1) and $y^* \in B(0,T;H)$. Moreover, from (10.4.6)
\[ |y^*(t) - \psi|_H^2 \le |y^*(\tau) - \psi|_H^2 + \frac1\omega\int_\tau^t |A\psi - f(s)|_{V^*}^2\,ds \]
for all $0 \le \tau \le t \le T$. Thus,
\[ \limsup_{t\downarrow\tau}|y^*(t) - \psi|_H^2 \le |y^*(\tau) - \psi|_H^2, \]
which implies that $t \to y^*(t) \in H$ is right continuous. Let $y^*$ be a weak solution. Setting $y = y_h^{(1)} \in K$ in (10.4.12) and $y = y^*(t)$ in (10.4.8) we have
\[ \int_0^T \Bigl\langle \frac{d}{dt}y_h^{(1)} + Ay^* - f, y_h^{(1)} - y^* \Bigr\rangle dt \ge 0, \tag{10.4.13} \]
\[ \int_0^T \Bigl\langle \frac{d}{dt}y_h^{(1)} + Ay_h^{(2)} - f_h, y^* - y_h^{(2)} \Bigr\rangle dt \ge 0, \tag{10.4.14} \]
where we used that
\[ \int_0^T \Bigl\langle \frac{d}{dt}(y_h^{(1)} - y^*), y^* - y_h^{(2)} \Bigr\rangle dt \le 0 \]
from (10.4.9)–(10.4.11). Summing up (10.4.13) and (10.4.14) and using (10.4.11) implies that
\[ \int_0^T \bigl(\langle Ay^*, y_h^{(1)} - y_h^{(2)}\rangle - \langle f, y_h^{(1)} - y^*\rangle - \langle f_h, y^* - y_h^{(2)}\rangle\bigr)\,dt \ge \frac12\sum_{k=1}^N |y^k - y^{k-1}|_H^2 + \int_0^T \langle A(y^* - y_h^{(2)}), y^* - y_h^{(2)}\rangle\,dt. \]
Letting $h \to 0^+$ we obtain $0 \ge \langle A(y^*(t) - \hat y(t)), y^*(t) - \hat y(t)\rangle$ a.e. on $(0,T)$ for every weak cluster point $\hat y$ of $y_h^{(2)}$ in $L^2(0,T;V)$. This implies that the weak solution is unique and that
\[ \int_0^T \langle A(y^* - y_h^{(2)}), y^* - y_h^{(2)}\rangle\,dt \to 0 \quad \text{as } h \to 0^+. \]
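The difference scheme (10.4.1) and its regularization (10.4.2) can be tried out numerically. The following is only a minimal sketch under assumptions that are not part of the text: the space interval $(0,1)$, $A = -d^2/dx^2$ discretized by central differences with zero Dirichlet data, the obstacle $\psi = 0$, the load $f = -10$, and the initial value $\sin(\pi x)$. The helper names (`laplacian_1d`, `penalized_step`, `solve_penalized`) are hypothetical, and the piecewise linear equation of each time step is solved by a semismooth Newton (active set) iteration, which is one possible choice rather than the method of the proof.

```python
import numpy as np


def laplacian_1d(n, dx):
    """Finite-difference matrix of A = -d^2/dx^2 with zero Dirichlet data (assumed model)."""
    return (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / dx**2


def penalized_step(A, y_prev, f, psi, h, c, max_iter=100):
    """One step of (10.4.2): (y - y_prev)/h + A y + c*min(0, y - psi) = f."""
    n = len(y_prev)
    base = np.eye(n) / h + A
    rhs = y_prev / h + f
    y = y_prev.copy()
    active = None
    for _ in range(max_iter):
        new_active = y < psi  # nodes where the penalty term is switched on
        if active is not None and np.array_equal(new_active, active):
            break  # active set repeated: y solves the piecewise linear equation
        active = new_active
        chi = active.astype(float)
        # On the active set min(0, y - psi) = y - psi; elsewhere it vanishes.
        y = np.linalg.solve(base + c * np.diag(chi), rhs + c * chi * psi)
    return y


def solve_penalized(c, n=49, T=0.5, steps=50):
    """Implicit Euler in time for the penalized obstacle problem on (0, 1)."""
    dx = 1.0 / (n + 1)
    x = np.linspace(dx, 1.0 - dx, n)
    A = laplacian_1d(n, dx)
    psi = np.zeros(n)        # obstacle: the limit problem requires y >= 0
    f = -10.0 * np.ones(n)   # load pushing the state against the obstacle
    y = np.sin(np.pi * x)    # initial value with y0 - psi >= 0
    h = T / steps
    for _ in range(steps):
        y = penalized_step(A, y, f, psi, h, c)
    return y, psi


if __name__ == "__main__":
    for c in (1e1, 1e2, 1e3, 1e4):
        y, psi = solve_penalized(c)
        # Largest violation of the constraint y >= psi at the final time.
        print(c, np.max(np.maximum(psi - y, 0.0)))
```

In this model the constraint violation $\max(\psi - y_c)^+$ shrinks as $c$ grows, and the penalized states increase with $c$, in line with the monotone behavior of $\hat y_c$ stated in Theorem 10.7 for the continuous problem.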
Corollary 10.13. Let $y = y(y_0, f)$ denote the weak solution to (10.0.1), given $y_0 \in H$ and $f \in L^2(0,T;V^*)$. Then for all $t \in [0,T]$
\[ |y(y_0,f)(t) - y(\tilde y_0,\tilde f)(t)|_H^2 + \omega\int_0^t |y(y_0,f) - y(\tilde y_0,\tilde f)|_V^2\,ds \le |y_0 - \tilde y_0|_H^2 + \frac1\omega\int_0^t |f - \tilde f|_{V^*}^2\,ds. \]
Proof. Let $y^k$ and $\tilde y^k$ be the solutions to (10.4.1) corresponding to $(y_0, f)$ and $(\tilde y_0, \tilde f)$. It then follows from (10.4.1) that
\[ \Bigl\langle \frac{(y^k - \tilde y^k) - (y^{k-1} - \tilde y^{k-1})}{h} + A(y^k - \tilde y^k) - (f^k - \tilde f^k), y^k - \tilde y^k \Bigr\rangle \le 0. \]
Thus,
\[ |y^k - \tilde y^k|_H^2 + \omega h\,|y^k - \tilde y^k|_V^2 \le |y^{k-1} - \tilde y^{k-1}|_H^2 + \frac{h}{\omega}|f^k - \tilde f^k|_{V^*}^2. \]
Summing this in $k$, we have
\[ |y^m - \tilde y^m|_H^2 + \omega\sum_{k=1}^m h\,|y^k - \tilde y^k|_V^2 \le |y_0 - \tilde y_0|_H^2 + \frac1\omega\sum_{k=1}^m h\,|f^k - \tilde f^k|_{V^*}^2, \]
which implies the desired estimate by letting $h \to 0^+$.
Corollary 10.14. Let $\bar\lambda \in H$ satisfy $\bar\lambda \le 0$ and let $y_c \in W(0,T)$ be the solution to
\[ \frac{d}{dt}y_c(t) + Ay_c(t) + \min(0, \bar\lambda + c(y_c - \psi)) = f. \tag{10.4.15} \]
Then $y_c \to y^*$ weakly in $L^2(0,T;V)$ and $y_c(T) \to y^*(T)$ weakly in $H$ as $c \to \infty$, where $y^*$ is the unique weak solution to (10.0.1). In addition, if $y^* \in W(0,T)$, then $|y_c - y^*|_{L^2(0,T;V)} + |y_c - y^*|_{L^\infty(0,T;H)} \to 0$ as $c \to \infty$.
Proof. Note that
\[ (\min(0, \bar\lambda + c(y_c - \psi)), y_c - \psi) \ge \frac{c}{2}|(y_c - \psi)^-|_H^2 - \frac{1}{2c}|\bar\lambda|_H^2. \]
Thus, we have
\[ \frac12\frac{d}{dt}|y_c - \psi|_H^2 + \langle A(y_c - \psi), y_c - \psi\rangle + \frac{c}{2}|(y_c - \psi)^-|_H^2 \le \langle f - A\psi, y_c - \psi\rangle + \frac{1}{2c}|\bar\lambda|_H^2 \]
and
\[ |y_c(t) - \psi|_H^2 + \int_0^t \bigl(\omega|y_c - \psi|_V^2 + c\,|(y_c - \psi)^-|_H^2\bigr)\,ds \le |y_0 - \psi|_H^2 + \int_0^t \Bigl(\frac1\omega|f - A\psi|_{V^*}^2 + \frac1c|\bar\lambda|_H^2\Bigr)\,ds. \]
Hence $\int_0^T |(y_c - \psi)^-|_H^2\,dt \to 0$ as $c \to \infty$ and $\{y_c - \psi\}_{c\ge 1}$ is bounded in $L^2(0,T;V)$. Using the same arguments as in the proof of Theorem 10.7, there exist $y^*$ and a subsequence of $\{y_c - \psi\}_{c\ge 1}$ that converges weakly to $y^* - \psi \in L^2(0,T;V)$, and $y^* - \psi \ge 0$ a.e. in $(0,T) \times \Omega$. For $y(t) \in K$
\[ \int_0^T \Bigl(\Bigl\langle \frac{d}{dt}y(t) - \frac{d}{dt}(y(t) - y_c), y(t) - y_c(t) \Bigr\rangle + \langle Ay_c(t) - f(t), y(t) - y_c(t)\rangle + (\min(0, \bar\lambda + c(y_c - \psi)), y(t) - \psi - (y_c - \psi))\Bigr)\,dt = 0, \]
where
\[ -\int_0^T \Bigl\langle \frac{d}{dt}(y(t) - y_c), y(t) - y_c(t) \Bigr\rangle dt = \frac12\bigl(|y(0) - y_0|_H^2 - |y(T) - y_c(T)|_H^2\bigr), \tag{10.4.16} \]
\[ (\min(0, \bar\lambda + c(y_c - \psi)), y(t) - \psi - (y_c - \psi)) \le \frac{1}{2c}|\bar\lambda|_H^2. \tag{10.4.17} \]
Hence, we have
\[ \int_0^T \Bigl(\Bigl\langle \frac{d}{dt}y(t), y(t) - y_c(t) \Bigr\rangle + \langle Ay_c(t) - f(t), y(t) - y_c(t)\rangle\Bigr)\,dt + \frac12|y(0) - y_0|_H^2 \ge \frac12|y(T) - y_c(T)|_H^2 - \frac{1}{2c}\int_0^T |\bar\lambda|_H^2\,ds. \]
Letting $c \to \infty$, $y^*$ satisfies (10.0.7) and thus $y^*$ is the weak solution of (10.0.1). Suppose that $y^* \in W(0,T)$. Then by (10.4.15),
\[ \Bigl\langle \frac{d}{dt}(y_c - y^*) + A(y_c - y^*) + \frac{d}{dt}y^* + Ay^* - f, y^* - y_c \Bigr\rangle + (\min(0, \bar\lambda + c(y_c - \psi)), y^*(t) - \psi - (y_c - \psi)) = 0. \]
From (10.4.16)–(10.4.17), and since $y_c \to y^*$ weakly in $L^2(0,T;V)$,
\[ \frac12|y_c(t) - y^*(t)|_H^2 + \int_0^t \langle A(y_c - y^*), y_c - y^*\rangle\,ds \to 0, \]
and this convergence is uniform with respect to $t \in [0,T]$.
10.5 Monotone property
In this section we establish monotone properties of the weak solution to (10.0.1). As in the previous section assumption (2) is used with $\rho = 0$.
Corollary 10.15. If assumptions (1)–(3) hold, then
\[ y(y_0, f) \ge y(\tilde y_0, \tilde f) \]
provided that $y_0 \ge \tilde y_0$ and $f \ge \tilde f$, with $y_0, \tilde y_0 \in H$ and $f, \tilde f \in L^2(0,T;V^*)$. As a consequence, if $y_0 = \psi$ and $f(t) \ge f(s)$ for a.e. $t > s$, then the weak solution satisfies $y(t) \ge y(s)$ for $t \ge s$.
Proof. Assume that $y^{k-1} \ge \tilde y^{k-1}$. For $\phi = -(y_c^k - \tilde y_c^k)^-$ it follows from (10.4.2) that
\[ \Bigl(\frac{y_c^k - \tilde y_c^k - (y^{k-1} - \tilde y^{k-1})}{h}, \phi\Bigr) + \langle A(y_c^k - \tilde y_c^k) - (f^k - \tilde f^k), \phi\rangle - c\,\bigl((y_c^k - \psi)^- - (\tilde y_c^k - \psi)^-, \phi\bigr) = 0. \]
Since
\[ -\Bigl(\frac{y^{k-1} - \tilde y^{k-1}}{h}, \phi\Bigr) - \langle f^k - \tilde f^k, \phi\rangle - c\,\bigl((y_c^k - \psi)^- - (\tilde y_c^k - \psi)^-, \phi\bigr) \ge 0, \]
we have from assumption (3) that
\[ \frac{|\phi|_H^2}{h} + \langle A\phi, \phi\rangle \le 0 \]
and thus $y_c^k - \tilde y_c^k \ge 0$ for sufficiently small $h > 0$. From the proof of Theorem 10.11 it follows that we can take the limit with respect to $c$ and obtain $y^k - \tilde y^k \ge 0$. By induction this holds for all $k \ge 0$. The first claim of the theorem now follows from (10.4.2) and Theorem 10.12. The second one follows from
\[ y(t; \psi, f(\cdot)) = y(s; y(t - s; \psi, f), f(\cdot + t - s)) \ge y(s; \psi, f(\cdot)). \]
Corollary 10.16. Let assumptions (1)–(3) hold and suppose that the stationary variational inequality
\[ \langle Ay - f, \phi - y\rangle \ge 0 \quad \text{for all } \phi - \psi \in C \tag{10.5.1} \]
with $f \in V^*$ has a solution $y - \psi \in C$. Then if $y(0) = \psi$ and $f(t) = f$, we have $y(t) \uparrow \hat y$, where $\hat y$ is the minimum solution to (10.5.1).
Proof. Suppose $\bar y$ is a solution to (10.5.1). Since $\bar y(t) := \bar y$, $t \ge 0$, is also the unique solution to (10.0.1) with $y_0 = \bar y \ge \psi$, it follows from Corollary 10.15 that $y(t) \le \bar y$ for all $t \in [0,T]$. On the other hand, it follows from Theorem 10.4 (2) that
\[ \Bigl\langle y(\tau + 1) - y(\tau) + \int_\tau^{\tau+1}(Ay(s) - f)\,ds, \phi - y(\tau) \Bigr\rangle \ge 0 \quad \text{for all } \phi - \psi \in C \tag{10.5.2} \]
since $y(s) \ge y(\tau)$, $s \ge \tau$. By the Lebesgue monotone convergence theorem $\hat y = \lim_{\tau\to\infty} y(\tau) \in C$ exists. Letting $\tau \to \infty$ in (10.5.2), we obtain that $\hat y$ satisfies (10.5.1) and thus $\hat y \le \bar y$.
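The monotone behavior in Corollary 10.16 can be observed on a discrete model. The sketch below rests on assumptions that are not in the text: a one-dimensional finite-difference Laplacian for $A$, the obstacle $\psi = 0$, the time-independent load $f(x) = 20\sin(2\pi x)$, and the implicit scheme (10.4.1) in which each step is solved by a primal-dual active set iteration based on the sign of $\lambda + c(y - \psi)$. The function names `obstacle_solve` and `evolve` are hypothetical.

```python
import numpy as np


def obstacle_solve(B, rhs, psi, c=1.0, max_iter=200):
    """Solve B y + lam = rhs, y >= psi, lam <= 0, lam*(y - psi) = 0
    by a primal-dual active set iteration (B is assumed to be an M-matrix)."""
    y = psi.copy()
    lam = rhs - B @ y
    active = None
    for _ in range(max_iter):
        new_active = lam + c * (y - psi) < 0
        if active is not None and np.array_equal(new_active, active):
            break  # active set repeated: (y, lam) solves the complementarity system
        active = new_active
        free = ~active
        y = psi.copy()                      # y = psi on the active set
        if free.any():
            r = rhs[free] - B[np.ix_(free, active)] @ psi[active]
            y[free] = np.linalg.solve(B[np.ix_(free, free)], r)
        lam = rhs - B @ y
        lam[free] = 0.0                     # lam = 0 on the free set
    return y


def evolve(n=49, h=0.01, steps=200):
    """Implicit time stepping from y(0) = psi with a time-independent load."""
    dx = 1.0 / (n + 1)
    x = np.linspace(dx, 1.0 - dx, n)
    A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / dx**2
    psi = np.zeros(n)
    f = 20.0 * np.sin(2.0 * np.pi * x)      # assumed load, positive on (0, 1/2)
    B = np.eye(n) / h + A
    ys = [psi.copy()]
    for _ in range(steps):
        ys.append(obstacle_solve(B, ys[-1] / h + f, psi))
    y_stat = obstacle_solve(A, f, psi)      # discrete analogue of (10.5.1)
    return np.array(ys), y_stat


if __name__ == "__main__":
    ys, y_stat = evolve()
    print("smallest time increment:", np.min(ys[1:] - ys[:-1]))
    print("distance to stationary solution:", np.max(np.abs(ys[-1] - y_stat)))
```

In this discrete setting the iterates increase in time and stay below the stationary solution, which mirrors $y(t) \uparrow \hat y$; the comparison argument behind it uses that the system matrices are M-matrices, the discrete counterpart of assumption (3).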
Corollary 10.17 (Perturbation). Let assumptions (1)–(3) hold, and let $\psi^1, \psi^2 \in H$ and $f \in L^2(0,T;V^*)$. Denote by $y_c^1$ and $y_c^2$ the solutions to (10.1.1) with $\psi$ equal to $\psi^1$ and $\psi^2$, respectively, and let $y^1$ and $y^2$ be the corresponding weak solutions to (10.0.1). Assume that $(\phi - \gamma)^+ \in V$ for any $\gamma \in \mathbb R^+$, that $\phi \in V$, and that $\langle A1, (\phi - \gamma)^+\rangle \ge 0$. Then for $\alpha = \max(0, \sup_{x\in\Omega}(\psi^1 - \psi^2))$ and $\beta = \min(0, \inf_{x\in\Omega}(\psi^1 - \psi^2))$ we have
\[ \beta \le y_c^1 - y_c^2 \le \alpha, \quad \beta \le y^1 - y^2 \le \alpha. \]
Proof. Note that
\[ c(y_c^1 - y_c^2 - \alpha) = \bar\lambda + c(y_c^1 - \psi^1) - (\bar\lambda + c(y_c^2 - \psi^2)) + c(\psi^1 - \psi^2 - \alpha). \]
Thus, an elementary calculation shows that
\[ (\min(0, \bar\lambda + c(y_c^1 - \psi^1)) - \min(0, \bar\lambda + c(y_c^2 - \psi^2)), (y_c^1 - y_c^2 - \alpha)^+) \ge 0. \]
As in the proof of Theorem 10.4 it follows that $\phi = (y_c^1 - y_c^2 - \alpha)^+$ satisfies
\[ \frac12\frac{d}{dt}|\phi|_H^2 \le \rho|\phi|_H^2. \]
Since $\phi(0) = 0$, this implies that $\phi(t) = 0$, $t \ge 0$, and thus $y_c^1 - y_c^2 \le \alpha$ a.e. on $(0,T) \times \Omega$. Similarly, letting $\phi = (y_c^1 - y_c^2 - \beta)^-$ we obtain $y_c^1 - y_c^2 \ge \beta$ a.e. on $(0,T) \times \Omega$. Since from Corollary 10.14, $y_c^1 \to y^1$ and $y_c^2 \to y^2$ weakly in $L^2(0,T;H)$ for $c \to \infty$, we obtain the desired estimates.
Chapter 11. Shape Optimization
11.1 Problem statement and generalities
We apply the framework that we developed in Section 5.5 for weakly singular problems to calculate the shape derivative of constrained optimization problems of the form
\[ \min J(y, \Omega, \Gamma) \quad \text{subject to } e(y, \Omega) = 0 \tag{11.1.1} \]
over admissible domains $\Omega$ and manifolds $\Gamma$. Here $e$ denotes a partial differential equation depending on the state variable $y$ and the domain $\Omega$, and $J$ stands for a cost functional depending, besides $y$ and $\Omega$, on $\Gamma$, which constitutes the variable part of the boundary of $\Omega$. We consider as an example the problem of determining an unknown interior domain $D \subset U$ in an elliptic equation
\[ \Delta y = 0 \ \text{in } \Omega = U \setminus D, \quad y = 0 \ \text{on } \partial D \]
from the Cauchy data
\[ y = f, \quad \frac{\partial y}{\partial n} = g \quad \text{on the outer boundary } \partial U. \]
This problem can be formulated as the shape optimization problem
\[ \min \int_{\partial U} |y - f|^2\,ds \quad \text{subject to} \quad \Delta y = 0 \ \text{in } \Omega \ \text{and} \ y = 0 \ \text{on } \partial D, \quad \frac{\partial}{\partial n}y = g \ \text{on } \partial U, \]
and it is a special case of the theory to be presented in Section 11.2. Shape sensitivity calculus has been widely analyzed in the past and is covered in the well-known lecture notes [MuSi] and in the monographs [DeZo, HaMä, HaNe, SoZo, Zo],
for example. The most commonly used approach relies on differentiating the reduced functional $\hat J(\Omega) = J(y(\Omega), \Omega, \Gamma(\Omega))$ using the chain rule. As a consequence shape differentiability of $y$ with respect to variations of the domain is essential in this method. In an alternative approach the partial differential equation is realized in a Lagrangian formulation; see [DeZo], for example. The method for computing the shape derivative that we describe here is quite different and elementary. In short, it can be described as follows. First we embed $e(y, \Omega) = 0$ to an equation in a fixed domain $\Omega_0$ by a coordinate transformation which is called the method of mapping. Then we combine the Lagrange multiplier method to realize the constraint $e(y, \Omega) = 0$, using the shape derivative of functionals to calculate the shape derivative of $\hat J(\Omega)$. In this process, differentiability of the state with respect to the geometric quantities is not used. In fact, we require only Hölder continuity with exponent greater than or equal to $\frac12$ of $y$ with respect to the geometric data. We refer the reader to [IKPe2] for an example in which the reduced cost functional is shape differentiable whereas the state variable of the constraining partial differential equation is not.
For comparison we briefly discuss an example using the "chain rule" approach. But first we require some notions from shape calculus, which we introduce only formally here. Let $\Omega$ be a reference domain in $\mathbb R^d$, and let $h : U \to \mathbb R^d$, with $\bar\Omega \subset U$, denote a mapping describing perturbations of $\Omega$ by means of
\[ \Omega_t = F_t(\Omega), \]
where $F_t : \Omega \to \mathbb R^d$ is a perturbation of the identity, given by $F_t = \mathrm{id} + th$, for example. Let $\{z_t : \Omega_t \to \mathbb R \mid |t| \le \tau\}$, for some $\tau > 0$, denote a family of mappings, and consider the associated family $z^t = z_t \circ F_t : \Omega \to \mathbb R$ which transports $z_t$ back to $\Omega$. Then the shape derivative of $\{z_t\}$ at $\Omega$ in direction $h$ is defined as
\[ z'(x) = \lim_{t\to 0}\frac1t\bigl(z_t(x) - z_0(x)\bigr) \quad \text{for } x \in \Omega, \]
and the material derivative is given by
\[ \dot z(x) = \lim_{t\to 0}\frac1t\bigl(z^t(x) - z_0(x)\bigr) \quad \text{for } x \in \Omega. \]
Under appropriate regularity conditions we have
\[ z' = \dot z - \nabla z \cdot F_0'. \]
For a functional $\Omega \to J(\Omega)$ the shape derivative at $\Omega$ with respect to the perturbation $h$ is defined as
\[ J'(\Omega)h = \lim_{t\to 0}\frac1t\bigl(J(\Omega_t) - J(\Omega)\bigr). \]
1 2
y2d
(11.1.2)
subject to the constraint $e(y, \Omega) = 0$ which is given by the mixed boundary value problem
\[ -\Delta y = f \quad \text{in } \Omega, \tag{11.1.3} \]
\[ y = 0 \quad \text{on } \Gamma_0, \tag{11.1.4} \]
\[ \frac{\partial y}{\partial n} = g \quad \text{on } \Gamma. \tag{11.1.5} \]
Here the boundary of the domain $\Omega$ is the disjoint union of a fixed part $\Gamma_0$ and the unknown part $\Gamma$, and $f$ and $g$ are given functions. A formal differentiation leads to the shape derivative of the reduced cost functional
\[ \hat J'(\Omega)h = \int_\Gamma y\,y'\,d\Gamma + \frac12\int_\Gamma \Bigl(\frac{\partial y^2}{\partial n} + \kappa y^2\Bigr)h \cdot n\,d\Gamma, \tag{11.1.6} \]
where $y'$ denotes the shape derivative of the solution $y$ of (11.1.3) at $\Omega$ with respect to a deformation field $h$, and $\kappa$ stands for the curvature of $\Gamma$. For a thorough discussion of the details we refer the reader to [DeZo, SoZo]. Differentiating formally the constraint $e(y, \Omega) = 0$ with respect to the domain, one obtains that $y'$ satisfies
\[ -\Delta y' = 0 \quad \text{in } \Omega, \]
\[ y' = 0 \quad \text{on } \Gamma_0, \]
\[ \frac{\partial y'}{\partial n} = \mathrm{div}_\Gamma(h \cdot n\,\nabla_\Gamma y) + \Bigl(f + \frac{\partial g}{\partial n} + \kappa g\Bigr)h \cdot n \quad \text{on } \Gamma, \tag{11.1.7} \]
where $\mathrm{div}_\Gamma$, $\nabla_\Gamma$ stand for the tangential divergence and tangential gradient, respectively, on the boundary $\Gamma$. Introducing a suitably defined adjoint variable and using (11.1.7), the first term on the right-hand side of (11.1.6) can be manipulated in such a way that $\hat J'(\Omega)h$ can be represented in the form required by the Zolesio–Hadamard structure theorem (see [DeZo])
\[ \hat J'(\Omega)h = \int_\Gamma G\,h \cdot n\,d\Gamma. \]
We emphasize that the kernel $G$ does not involve the shape derivative $y'$ anymore. Although $y'$ is only an intermediate quantity, a rigorous analysis requires justifying the formal steps in the preceding discussion. In addition one has to verify that the solution of (11.1.7) actually is the shape derivative of $y$ in the sense of the definition in, e.g., [SoZo]. Furthermore, since the trace of $y'$ on $\Gamma_0$ is used in (11.1.7) one needs $y' \in H^1(\Omega)$. However, $y \in H^2(\Omega)$ is not sufficient to allow for an interpretation of the Neumann condition in (11.1.7) in $H^{-1/2}(\Gamma)$. Hence $y' \in H^1(\Omega)$ requires more regularity of the solution $y$ of (11.1.3) than $H^2(\Omega)$. In the approach of this chapter we utilize only $y \in H^2(\Omega)$ for the characterization of the shape derivative of $\hat J(\Omega)$. We return to this example in Section 11.3.
In Section 11.2 we present the proposed general framework to compute the shape derivative for (11.1.1). Section 11.3 contains applications to shape optimization constrained by linear elliptic systems, inverse interface problems, the Bernoulli problem, and shape optimization for the Navier–Stokes equations.
308
ItoKunisc 2008/6/12 page 308 i
Chapter 11. Shape Optimization
11.2
Shape derivative
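Consider the shape optimization problem
\[ \min\ J(y,\Omega,\Gamma) \equiv \int_\Omega j_1(y)\,dx + \int_\Gamma j_2(y)\,ds + \int_{\partial\Omega\setminus\Gamma} j_3(y)\,ds \tag{11.2.1} \]
subject to the constraint
\[ e(y,\Omega) = 0, \tag{11.2.2} \]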
which represents a partial differential equation posed on the domain $\Omega\subset\mathbb{R}^d$ with boundary $\partial\Omega$. We focus on sensitivity analysis of the reduced cost functional in (11.2.1)–(11.2.2) with respect to $\Omega$. To describe the admissible class of geometries, let $U\subset\mathbb{R}^d$ be a fixed bounded domain with $C^{1,1}$-boundary $\partial U$, or a convex domain with Lipschitzian boundary, and let $D$ be a domain with $C^{1,1}$-boundary $\Gamma := \partial D$ satisfying $\bar D\subset U$. For the reference domain $\Omega$ we admit any of the three cases (i) $\Omega = D$, (ii) $\Omega = U$, (iii) $\Omega = U\setminus\bar D$. Note that
\[ \partial\Omega = (\partial\Omega\cap\Gamma)\,\dot\cup\,(\partial\Omega\setminus\Gamma) \subset \Gamma\,\dot\cup\,\partial U. \tag{11.2.3} \]
Thus the boundary $\partial\Omega$ for the cases (i)–(iii) is given by (i) $\partial\Omega = \Gamma\cup\emptyset = \Gamma$, (ii) $\partial\Omega = \emptyset\cup\partial U = \partial U$, (iii) $\partial\Omega = \Gamma\cup\partial U$. To introduce the admissible class of perturbations let $h\in C^{1,1}(\bar U)$ with $h|_{\partial U} = 0$ and define, for $t\in\mathbb{R}$, the mappings $F_t : U\to\mathbb{R}^d$ by the perturbation of identity
\[ F_t = \mathrm{id} + t\,h. \tag{11.2.4} \]
Then there exists $\tau>0$ such that $F_t(U) = U$ and $F_t$ is a diffeomorphism for $|t|<\tau$. Defining the perturbed domains $\Omega_t = F_t(\Omega)$ and the perturbed manifolds as $\Gamma_t = F_t(\Gamma)$, it follows that $\Gamma_t$ is of class $C^{1,1}$ and $\bar\Omega_t\subset U$ for $|t|<\tau$. Note that since $h|_{\partial U} = 0$ the boundary of $U$ remains fixed as $t$ varies, and hence by (11.2.3)
\[ (\partial\Omega)_t\setminus\Gamma_t = \partial\Omega\setminus\Gamma \quad\text{for } |t|<\tau. \]
Alternatively to (11.2.4) the perturbations could be described as the flow determined by the initial value problem
\[ \frac{d}{dt}\chi(t) = h(\chi(t)), \qquad \chi(0;x) = x, \]
with $F_t(x) = \chi(t;x)$, i.e., by the velocity method. Let $\hat J(\Omega_t)$ be the functional defined by $\hat J(\Omega_t) = J(y_t,\Omega_t,\Gamma_t)$, where $y_t$ satisfies the constraint
\[ e(y_t,\Omega_t) = 0. \tag{11.2.5} \]
The shape derivative of $\hat J$ at $\Omega$ in the direction of the deformation field $h$ is defined as
\[ \hat J'(\Omega)h = \lim_{t\to 0}\frac1t\big(\hat J(\Omega_t) - \hat J(\Omega)\big). \]
The functional $\hat J$ is called shape differentiable at $\Omega$ if $\hat J'(\Omega)h$ exists for all $h\in C^{1,1}(\bar U,\mathbb{R}^d)$ and defines a continuous linear functional on $C^{1,1}(\bar U,\mathbb{R}^d)$. Using the method of mappings one transforms the perturbed state constraint (11.2.5) to the fixed domain $\Omega$. For this purpose define $y^t = y_t\circ F_t$. Then $y^t : \Omega\to\mathbb{R}^l$ satisfies an equation on the reference domain $\Omega$, which we express as
\[ \tilde e(y^t,t) = 0, \qquad |t|<\tau. \tag{11.2.6} \]
We suppress the dependence of $\tilde e$ on $h$, because $h$ will denote a fixed vector field throughout. Because $F_0 = \mathrm{id}$ one obtains $y^0 = y$ and
\[ \tilde e(y^0,0) = e(y,\Omega). \tag{11.2.7} \]
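The definition of the shape derivative as a difference quotient can be illustrated on the simplest functional, the area $\hat J(\Omega)=\int_\Omega 1\,dx$, which involves no state equation. The following is a minimal sketch under stated assumptions: $\Omega$ is the unit disc in $\mathbb{R}^2$, the field $h(x)=x$ is a hypothetical choice (it would have to be cut off near $\partial U$ for a large hold-all $U$, which does not affect $\Omega$), and the boundary is approximated by a polygon. By the change of variables formula the limit should equal $\int_\Omega \operatorname{div} h\,dx = 2\pi$.

```python
import math

def polygon_area(points):
    """Shoelace formula for a closed polygon given as a list of (x, y)."""
    area = 0.0
    n = len(points)
    for k in range(n):
        x0, y0 = points[k]
        x1, y1 = points[(k + 1) % n]
        area += x0 * y1 - x1 * y0
    return 0.5 * area

def h(x, y):
    """Hypothetical deformation field h(x) = x, so that div h = 2."""
    return (x, y)

def perturbed_area(t, n_pts=20000):
    """Area of Omega_t = F_t(Omega) with F_t = id + t*h and Omega the unit disc."""
    pts = []
    for k in range(n_pts):
        theta = 2.0 * math.pi * k / n_pts
        x, y = math.cos(theta), math.sin(theta)
        hx, hy = h(x, y)
        pts.append((x + t * hx, y + t * hy))
    return polygon_area(pts)

t = 1e-4
fd = (perturbed_area(t) - perturbed_area(0.0)) / t  # difference quotient
exact = 2.0 * math.pi                               # integral of div h over the unit disc
print(fd, exact)
```

The difference quotient agrees with $2\pi$ up to a term of order $t$, as expected from a one-sided quotient.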
We axiomatize the above description and impose the following assumptions on $\tilde e$, respectively, $e$.

(H1) There is a Hilbert space $X$ and a $C^1$-function $\tilde e : X\times(-\tau,\tau)\to X^*$ such that $e(y_t,\Omega_t)=0$ is equivalent to $\tilde e(y^t,t)=0$ in $X^*$, with $\tilde e(y,0) = e(y,\Omega)$ for all $y\in X$.

(H2) There exists $0<\tau_0\le\tau$ such that for $|t|<\tau_0$ there exists a unique solution $y^t\in X$ to $\tilde e(y^t,t)=0$ and
\[ \lim_{t\to 0}\frac{|y^t - y^0|_X}{t^{1/2}} = 0. \]

(H3) $e_y(y,\Omega)\in\mathcal{L}(X,X^*)$ satisfies
\[ \langle e(v,\Omega) - e(y,\Omega) - e_y(y,\Omega)(v-y),\psi\rangle_{X^*\times X} = O(|v-y|_X^2) \]
for every $\psi\in X$, where $y,v\in X$.

(H4) $\tilde e$ and $e$ satisfy
\[ \lim_{t\to 0}\frac1t\,\langle \tilde e(y^t,t) - \tilde e(y,t) - e(y^t,\Omega) + e(y,\Omega),\psi\rangle_{X^*\times X} = 0 \]
for every $\psi\in X$, where $y^t$ and $y$ are the solutions of (11.2.6) and (11.2.2), respectively.

In applications (H4) typically results in an assumption on the regularity of the coefficients in the partial differential equation and on the vector field $h$. We assume throughout that $X\hookrightarrow L^2(\Omega,\mathbb{R}^l)$ and, in the case that $j_2$, $j_3$ are nontrivial, that the elements of $X$ admit traces in $L^2(\Gamma,\mathbb{R}^l)$, respectively, $L^2(\partial\Omega\setminus\Gamma,\mathbb{R}^l)$. Typically $X$ will be a subspace of $H^1(\Omega,\mathbb{R}^l)$ for some $l\in\mathbb{N}$. With regards to the cost functional $J$ we require

(H5) $j_i\in C^{1,1}(\mathbb{R}^l,\mathbb{R})$, $i=1,2,3$.

As a consequence of (H1)–(H2) we infer that (11.2.5) has a unique solution $y_t$ which is given by $y_t = y^t\circ F_t^{-1}$. Condition (H5) implies that $j_1(y)\in L^2(\Omega)$, $j_1'(y)\in L^2(\Omega)^l$, $j_2(y)\in L^2(\Gamma)$, $j_2'(y)\in L^2(\Gamma)^l$, and $j_3(y)\in L^2(\partial U)$, $j_3'(y)\in L^2(\partial U)^l$ for $y\in X$. Hence the cost functional $J(y,\Omega,\Gamma)$ is well defined for every $y\in X$.

Lemma 11.1. There is a constant $c>0$, such that
\[ |j_i(v) - j_i(y) - j_i'(y)(v-y)|_{L^1} \le c\,|v-y|_X^2 \]
holds for all $y,v\in X$, $i=1,2,3$.

Proof. For $j_1$ the claim follows from
\[ \begin{aligned} \int_\Omega |j_1(v) - j_1(y) - j_1'(y)(v-y)|\,dx &\le \int_\Omega\int_0^1 \big|j_1'\big(y(x) + s(v(x)-y(x))\big) - j_1'(y(x))\big|\,ds\;|v(x)-y(x)|\,dx\\ &\le \frac{L}{2}\,|v-y|_{L^2}^2 \le c\,|v-y|_X^2, \end{aligned} \]
where $L>0$ is the Lipschitz constant for $j_1'$. The same argument is valid also for $j_2$ and $j_3$.
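The pointwise inequality behind Lemma 11.1 can be checked numerically. The sketch below assumes the hypothetical integrand $j(s)=\cos s$, whose derivative is Lipschitz with constant $L=1$, and verifies $|j(v)-j(y)-j'(y)(v-y)|\le \tfrac{L}{2}|v-y|^2$ on a grid of sample pairs; integrating this bound over $\Omega$ gives the estimate of the lemma.

```python
import math

def j(s):
    """Hypothetical integrand; j' = -sin is Lipschitz with constant L = 1."""
    return math.cos(s)

def dj(s):
    return -math.sin(s)

L = 1.0

def remainder(v, y):
    """First-order Taylor remainder |j(v) - j(y) - j'(y)(v - y)|."""
    return abs(j(v) - j(y) - dj(y) * (v - y))

# Sample pairs (v, y); the grid itself is an arbitrary illustrative choice.
samples = [(-3.0 + 0.37 * k, 2.5 - 0.41 * m) for k in range(17) for m in range(13)]

# Largest violation of the bound (should be <= 0 up to rounding).
worst = max(remainder(v, y) - 0.5 * L * (v - y) ** 2 for v, y in samples)
print(worst)
```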
Subsequently we use the following notation:
\[ A_t = (DF_t)^{-T}, \qquad I_t = \det DF_t, \qquad w_t = I_t\,|A_t n|, \]
where $DF_t$ is the Jacobian of $F_t$ and $n$ denotes the outer normal unit vector to $\Omega$. We require additional regularity properties of the transformation $F_t$ which we specify next. Let $I = [-\tau_0,\tau_0]$ with $\tau_0\le\tau$ sufficiently small.
\[ \begin{aligned} &F_0 = \mathrm{id}, && \\ &t\mapsto F_t\in C(I, C^2(\bar U,\mathbb{R}^d)), && t\mapsto F_t\in C^1(I, C^1(\bar U,\mathbb{R}^d)),\\ &t\mapsto F_t^{-1}\in C(I, C^1(\bar U,\mathbb{R}^d)), && t\mapsto I_t\in C^1(I, C(\bar U)),\\ &t\mapsto A_t\in C(I, C(\bar U,\mathbb{R}^{d\times d})), && t\mapsto w_t\in C(I, C(\Gamma)),\\ &\tfrac{d}{dt}F_t|_{t=0} = h, && \tfrac{d}{dt}F_t^{-1}|_{t=0} = -h,\\ &\tfrac{d}{dt}DF_t|_{t=0} = Dh, && \tfrac{d}{dt}DF_t^{-1}|_{t=0} = \tfrac{d}{dt}(A_t)^T|_{t=0} = -Dh,\\ &\tfrac{d}{dt}I_t|_{t=0} = \operatorname{div} h, && \tfrac{d}{dt}w_t|_{t=0} = \operatorname{div}_\Gamma h. \end{aligned} \tag{11.2.8} \]
The surface divergence $\operatorname{div}_\Gamma$ is defined by
\[ \operatorname{div}_\Gamma h = \operatorname{div} h|_\Gamma - (Dh\,n)\cdot n. \]
The properties (11.2.8) are easily verified if $F_t$ is specified by perturbation of the identity as in (11.2.4). As a consequence of (11.2.8) there exists $\alpha>0$ such that
\[ I_t(x)\ge\alpha, \qquad x\in\bar U. \tag{11.2.9} \]
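The last two identities in (11.2.8) can be verified numerically for the perturbation of identity. The sketch below works in $\mathbb{R}^2$ with a hypothetical smooth field $h(x)=(\sin(x_1)\,x_2,\ x_1^2+\cos x_2)$, an arbitrary evaluation point, and an arbitrary unit vector $n$ standing in for the normal; it compares central difference quotients of $I_t$ and $w_t$ at $t=0$ with $\operatorname{div} h$ and $\operatorname{div}_\Gamma h = \operatorname{div} h - (Dh\,n)\cdot n$.

```python
import math

def Dh(x1, x2):
    """Jacobian of the hypothetical field h(x) = (sin(x1)*x2, x1**2 + cos(x2))."""
    return [[math.cos(x1) * x2, math.sin(x1)],
            [2.0 * x1, -math.sin(x2)]]

def I_and_w(t, x1, x2, n):
    """Return I_t = det DF_t and w_t = I_t |A_t n| for F_t = id + t*h."""
    D = Dh(x1, x2)
    DF = [[1.0 + t * D[0][0], t * D[0][1]],
          [t * D[1][0], 1.0 + t * D[1][1]]]
    I_t = DF[0][0] * DF[1][1] - DF[0][1] * DF[1][0]
    inv = [[DF[1][1] / I_t, -DF[0][1] / I_t],
           [-DF[1][0] / I_t, DF[0][0] / I_t]]
    # A_t = (DF_t)^{-T}, hence (A_t n)_i = sum_j inv[j][i] * n[j]
    An = [inv[0][0] * n[0] + inv[1][0] * n[1],
          inv[0][1] * n[0] + inv[1][1] * n[1]]
    return I_t, I_t * math.hypot(An[0], An[1])

x1, x2 = 0.3, -0.7                      # evaluation point (arbitrary)
n = (math.cos(0.9), math.sin(0.9))      # unit vector playing the role of the normal
t = 1e-6

I_p, w_p = I_and_w(t, x1, x2, n)
I_m, w_m = I_and_w(-t, x1, x2, n)
dI_fd = (I_p - I_m) / (2.0 * t)         # approximates d/dt I_t at t = 0
dw_fd = (w_p - w_m) / (2.0 * t)         # approximates d/dt w_t at t = 0

D = Dh(x1, x2)
div_h = D[0][0] + D[1][1]
Dhn = (D[0][0] * n[0] + D[0][1] * n[1], D[1][0] * n[0] + D[1][1] * n[1])
div_gamma_h = div_h - (Dhn[0] * n[0] + Dhn[1] * n[1])
print(dI_fd, div_h, dw_fd, div_gamma_h)
```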
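We furthermore recall the following transformation theorem where we already utilize (11.2.9).

Lemma 11.2. (1) Let $\varphi_t\in L^1(\Omega_t)$; then $\varphi_t\circ F_t\in L^1(\Omega)$ and
\[ \int_{\Omega_t}\varphi_t\,dx_t = \int_\Omega \varphi_t\circ F_t\,\det DF_t\,dx. \]
(2) Let $h_t\in L^1(\Gamma_t)$; then $h_t\circ F_t\in L^1(\Gamma)$ and
\[ \int_{\Gamma_t} h_t\,d\Gamma_t = \int_\Gamma h_t\circ F_t\,\det DF_t\,|(DF_t)^{-T}n|\,d\Gamma. \]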
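We now formulate the representation of the shape derivative of $\hat J$ at $\Omega$ in direction $h$.

Theorem 11.3. Assume that (H1)–(H5) hold, that $F$ satisfies (11.2.8), and that the adjoint equation
\[ \langle e_y(y,\Omega)\psi, p\rangle_{X^*\times X} - (j_1'(y),\psi)_\Omega - (j_2'(y),\psi)_\Gamma - (j_3'(y),\psi)_{\partial\Omega\setminus\Gamma} = 0, \qquad \psi\in X, \tag{11.2.10} \]
admits a unique solution $p\in X$, where $y$ is the solution to (11.2.2). Then the shape derivative of $\hat J$ at $\Omega$ in the direction $h$ exists and is given by
\[ \hat J'(\Omega)h = -\frac{d}{dt}\langle\tilde e(y,t),p\rangle_{X^*\times X}\Big|_{t=0} + \int_\Omega j_1(y)\operatorname{div} h\,dx + \int_\Gamma j_2(y)\operatorname{div}_\Gamma h\,ds. \tag{11.2.11} \]

Proof. Referring to (H2) let $y^t, y\in X$ satisfy
\[ \tilde e(y^t,t) = e(y,\Omega) = 0 \tag{11.2.12} \]
for $|t|<\tau_0$. Then $y_t = y^t\circ F_t^{-1}$ is the solution of (11.2.5). Utilizing Lemma 11.2 one therefore obtains
\[ \begin{aligned} \frac1t\big(\hat J(\Omega_t) - \hat J(\Omega)\big) &= \frac1t\int_\Omega \big(I_t\,j_1(y^t) - j_1(y)\big)\,dx + \frac1t\int_\Gamma \big(w_t\,j_2(y^t) - j_2(y)\big)\,ds + \frac1t\int_{\partial U}\big(j_3(y^t) - j_3(y)\big)\,ds\\ &= \frac1t\int_\Omega \Big[ I_t\big(j_1(y^t) - j_1(y) - j_1'(y)(y^t-y)\big) + (I_t-1)\,j_1'(y)(y^t-y)\\ &\qquad\qquad + j_1'(y)(y^t-y) + (I_t-1)\,j_1(y)\Big]\,dx\\ &\quad + \frac1t\int_\Gamma \Big[ w_t\big(j_2(y^t) - j_2(y) - j_2'(y)(y^t-y)\big) + (w_t-1)\,j_2'(y)(y^t-y)\\ &\qquad\qquad + j_2'(y)(y^t-y) + (w_t-1)\,j_2(y)\Big]\,ds\\ &\quad + \frac1t\int_{\partial U}\big(j_3(y^t) - j_3(y) - j_3'(y)(y^t-y)\big)\,ds + \frac1t\int_{\partial U} j_3'(y)(y^t-y)\,ds. \end{aligned} \tag{11.2.13} \]
Lemma 11.1 and (11.2.8) result in the estimates
\[ \begin{aligned} \Big|\int_\Omega I_t\big(j_1(y^t) - j_1(y) - j_1'(y)(y^t-y)\big)\,dx\Big| &\le c\,|y^t-y|_X^2,\\ \Big|\int_\Gamma w_t\big(j_2(y^t) - j_2(y) - j_2'(y)(y^t-y)\big)\,ds\Big| &\le c\,|y^t-y|_X^2,\\ \Big|\int_{\partial U}\big(j_3(y^t) - j_3(y) - j_3'(y)(y^t-y)\big)\,ds\Big| &\le c\,|y^t-y|_X^2, \end{aligned} \tag{11.2.14} \]
where $c>0$ does not depend on $t$. Employing the adjoint state $p$ one obtains
\[ \begin{aligned} (j_1'(y), y^t-y)_\Omega &+ (j_2'(y), y^t-y)_\Gamma + (j_3'(y), y^t-y)_{\partial U} = \langle e_y(y,\Omega)(y^t-y), p\rangle_{X^*\times X}\\ &= -\langle e(y^t,\Omega) - e(y,\Omega) - e_y(y,\Omega)(y^t-y), p\rangle_{X^*\times X}\\ &\quad - \langle \tilde e(y^t,t) - \tilde e(y,t) - e(y^t,\Omega) + e(y,\Omega), p\rangle_{X^*\times X}\\ &\quad - \langle \tilde e(y,t) - \tilde e(y,0), p\rangle_{X^*\times X}, \end{aligned} \tag{11.2.15} \]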
where we used (11.2.12). We estimate the ten additive terms on the right-hand side of (11.2.13). Terms one, five, and nine converge to zero by (11.2.14) and (H2). Terms two and six converge to 0 by (11.2.8) and (H2). For terms four and eight one uses (11.2.8). The claim (11.2.11) now follows by passing to the limit in terms three, seven, and ten using (11.2.15), (H3), (H2), (H4), and (H1).

To check (H2) in specific applications the following result will be useful. It relies on

(H6) the linearized equation
\[ \langle e_y(y,\Omega)\delta y,\psi\rangle_{X^*\times X} = \langle f,\psi\rangle_{X^*\times X}, \qquad \psi\in X, \]
admits a unique solution $\delta y\in X$ for every $f\in X^*$.

Note that this condition is more stringent than the assumption of solvability of the adjoint equation in Theorem 11.3, which requires solvability only for a specific right-hand side.

Proposition 11.4. Assume that (11.2.2) admits a unique solution $y$ and that (H6) is satisfied. Then (H2) holds.

Proof. Let $y\in X$ be the unique solution of (11.2.2). In view of $\tilde e_y(y,0) = e_y(y,\Omega)$, assumption (H6) implies that $\tilde e_y(y,0)$ is bijective. The claim follows from the implicit function theorem.

Computing the derivative $\frac{d}{dt}\langle\tilde e(y,t),p\rangle_{X^*\times X}|_{t=0}$ in (11.2.11), and subsequently arguing that $\hat J'(\Omega)h$ is in fact a shape derivative at $\Omega$, can be facilitated by transforming the equation $\langle\tilde e(y,t),p\rangle = 0$ back to $\langle e(y\circ F_t^{-1},\Omega_t), p\circ F_t^{-1}\rangle = 0$ together with the following well-known differentiation rules of the functionals.

Lemma 11.5 (see [DeZo]). (1) Let $f\in C(I, W^{1,1}(U))$ and assume that $f_t(0)$ exists in $L^1(U)$. Then
\[ \frac{d}{dt}\int_{\Omega_t} f(t,x)\,dx\Big|_{t=0} = \int_\Omega f_t(0,x)\,dx + \int_\Gamma f(0,x)\,h\cdot n\,ds. \]
(2) Let $f\in C(I, W^{2,1}(U))$ and assume that $f_t(0)$ exists in $W^{1,1}(U)$. Then
\[ \frac{d}{dt}\int_{\Gamma_t} f(t,x)\,ds\Big|_{t=0} = \int_\Gamma f_t(0,x)\,ds + \int_\Gamma\Big(\frac{\partial}{\partial n}f(0,s) + \kappa f(0,s)\Big)\,h\cdot n\,ds, \]
where $\kappa$ stands for the additive curvature of $\Gamma$.

The first part of the lemma is valid also for domains with Lipschitz continuous boundary.
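Both differentiation rules of Lemma 11.5 can be tested on a disc, where all quantities are explicit. The sketch below is an illustration under assumptions not taken from the text: $\Omega$ is the disc of radius $R=1.5$ in $\mathbb{R}^2$, $f(t,x)=|x|^2$ is independent of $t$, the field is $h(x)=x$ (so $h\cdot n = R$ on $\Gamma$ and $\Omega_t$ is the disc of radius $R(1+t)$), and the additive curvature of the circle is taken as $\kappa = 1/R$. The left-hand sides are approximated by central differences of numerically integrated quantities.

```python
import math

R = 1.5  # radius of the reference disc (hypothetical choice)

def f_radial(r):
    """f(x) = |x|^2 written as a function of r = |x|; independent of t."""
    return r * r

def df_dn(r):
    """Normal derivative of f on the circle of radius r (outward normal)."""
    return 2.0 * r

def volume_integral(radius, n=2000):
    """Midpoint rule in polar coordinates for the integral of f over a disc."""
    dr = radius / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        total += f_radial(r) * r * dr
    return 2.0 * math.pi * total

def boundary_integral(radius):
    """Integral of f over the circle of given radius (f is constant there)."""
    return f_radial(radius) * 2.0 * math.pi * radius

t = 1e-5
# Left-hand sides of Lemma 11.5 (1) and (2), by central differences in t.
lhs_volume = (volume_integral(R * (1 + t)) - volume_integral(R * (1 - t))) / (2 * t)
lhs_boundary = (boundary_integral(R * (1 + t)) - boundary_integral(R * (1 - t))) / (2 * t)

# Right-hand sides: f_t = 0, h.n = R on the circle, kappa = 1/R.
h_dot_n = R
kappa = 1.0 / R
rhs_volume = f_radial(R) * h_dot_n * 2.0 * math.pi * R
rhs_boundary = (df_dn(R) + kappa * f_radial(R)) * h_dot_n * 2.0 * math.pi * R
print(lhs_volume, rhs_volume, lhs_boundary, rhs_boundary)
```

In closed form the two sides are $2\pi R^4$ for part (1) and $6\pi R^3$ for part (2).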
In the examples below $f(t)$ will be typically given by expressions of the form
\[ \mu\, v\circ F_t^{-1}, \qquad \mu\,\partial_i(v\circ F_t^{-1})\,\partial_j(w\circ F_t^{-1}), \qquad v\circ F_t^{-1}\; w\circ F_t^{-1}\;\partial_i(z\circ F_t^{-1}), \]
where $\mu\in H^1(U)$ and $v$, $z$, and $w\in H^2(U)$ are extensions of elements in $H^2(\Omega)$. The assumptions of Lemma 11.5 can be verified using the following result.

Lemma 11.6 (see [SoZo]). (1) If $y\in L^p(U)$, then $t\mapsto y\circ F_t^{-1}\in C(I, L^p(U))$, $1\le p<\infty$.
(2) If $y\in H^2(U)$, then $t\mapsto y\circ F_t^{-1}\in C(I, H^2(U))$.
(3) If $y\in H^2(U)$, then $\frac{d}{dt}(y\circ F_t^{-1})|_{t=0}$ exists in $H^1(U)$ and is given by
\[ \frac{d}{dt}(y\circ F_t^{-1})\Big|_{t=0} = -(Dy)\,h. \]

As a consequence we note that $\frac{d}{dt}\,\partial_i(y\circ F_t^{-1})|_{t=0}$ exists in $L^2(U)$ and is given by
\[ \frac{d}{dt}\,\partial_i(y\circ F_t^{-1})\Big|_{t=0} = -\partial_i(Dy\,h), \qquad i=1,\dots,d. \]
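Part (3) of Lemma 11.6 can be checked pointwise in one space dimension. The sketch below assumes $U=(0,1)$, the hypothetical field $h(x)=x(1-x)$, which vanishes on $\partial U$, and the test function $y(x)=\sin(2x)$. The inverse $F_t^{-1}$ of $F_t = \mathrm{id} + t\,h$ is computed by a fixed-point iteration, which converges for small $|t|$, and a central difference in $t$ is compared with $-y'(x)\,h(x)$.

```python
import math

def h(x):
    """Hypothetical deformation field on U = (0, 1), vanishing on the boundary."""
    return x * (1.0 - x)

def y(x):
    return math.sin(2.0 * x)

def dy(x):
    return 2.0 * math.cos(2.0 * x)

def F_inv(t, x, iters=50):
    """Solve z + t*h(z) = x for z by fixed-point iteration (small |t| assumed)."""
    z = x
    for _ in range(iters):
        z = x - t * h(z)
    return z

t = 1e-5
errors = []
for k in range(1, 10):
    x = 0.1 * k
    # central difference approximation of d/dt (y o F_t^{-1})(x) at t = 0
    fd = (y(F_inv(t, x)) - y(F_inv(-t, x))) / (2.0 * t)
    errors.append(abs(fd - (-dy(x) * h(x))))
max_err = max(errors)
print(max_err)
```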
In the next section ∇y stands for (Dy)T , where y is either a scalar- or vector-valued function. To enhance readability we use two symbols for the inner product in Rd , (x, y), respectively, x · y. The latter will be utilized only in the case of nested inner products.
11.3
Examples
Throughout this section it is assumed that (H5) is satisfied and that the regularity assumptions of Section 11.2 for $D$, $\Omega$, and $U$ hold. If $J$ does not depend on $\Gamma$, we write $J(y,\Omega)$ in place of $J(y,\Omega,\Gamma)$.
11.3.1
Elliptic Dirichlet boundary value problem
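As a first example we consider the volume functional
\[ J(y,\Omega) = \int_\Omega j_1(y)\,dx \]
subject to the constraint
\[ (\mu\nabla y,\nabla\psi)_\Omega - (f,\psi)_\Omega = 0, \tag{11.3.1} \]
where $X = H_0^1(\Omega)$, $f\in H^1(U)$, and $\mu\in C^1(\bar U,\mathbb{R}^{d\times d})$ such that $\mu(x)$ is symmetric and uniformly positive definite. Here $\Omega = D$ and $\Gamma = \partial\Omega$. Thus $e(y,\Omega) : X\to X^*$ is given by
\[ \langle e(y,\Omega),\psi\rangle_{X^*\times X} = (\mu\nabla y,\nabla\psi)_\Omega - (f,\psi)_\Omega. \]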
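The equation on the perturbed domain is determined by
\[ \begin{aligned} \langle e(y_t,\Omega_t),\psi_t\rangle_{X_t^*\times X_t} &= \int_{\Omega_t}(\mu\nabla y_t,\nabla\psi_t)\,dx_t - \int_{\Omega_t} f\,\psi_t\,dx_t\\ &= \int_\Omega (\mu^t A_t\nabla y^t, A_t\nabla\psi^t)\,I_t\,dx - \int_\Omega f^t\,\psi^t\,I_t\,dx \equiv \langle\tilde e(y^t,t),\psi^t\rangle_{X^*\times X} \end{aligned} \tag{11.3.2} \]
for any $\psi_t\in X_t$, with $y^t = y_t\circ F_t$, $\psi^t = \psi_t\circ F_t$, $\mu^t = \mu\circ F_t$, $f^t = f\circ F_t$, and $X_t = H_0^1(\Omega_t)$. Here we used that $\nabla y_t = (A_t\nabla y^t)\circ F_t^{-1}$ and the transformation rule of Lemma 11.2. (H1) is a consequence of (11.2.8), (11.3.2), and the smoothness of $\mu$ and $f$. Since (11.3.1) admits a unique solution and (H6) holds, Proposition 11.4 implies (H2). Since $\tilde e$ is linear in $y$, assumption (H3) follows. For the verification of (H4) observe that
\[ \langle\tilde e(y^t,t) - \tilde e(y,t) - e(y^t,\Omega) + e(y,\Omega),\psi\rangle_{X^*\times X} = \big((\mu^t I_t A_t - \mu)\nabla(y^t-y), A_t\nabla\psi\big) + \big(\mu\nabla(y^t-y),(A_t - I)\nabla\psi\big). \]
Hence (H4) follows from differentiability of $\mu$, (11.2.8), and (H2). In view of Theorem 11.3 we have to compute $\frac{d}{dt}\langle\tilde e(y,t),p\rangle_{X^*\times X}|_{t=0}$, for which we use the representation on $\Omega_t$ in (11.3.2). Recall that the solution $y$ of (11.3.1) as well as the adjoint state $p$, defined by
\[ (\mu\nabla p,\nabla\psi)_\Omega = (j_1'(y),\psi)_\Omega, \qquad \psi\in H_0^1(\Omega), \tag{11.3.3} \]
belong to $H^2(\Omega)\cap H_0^1(\Omega)$. Since $\Gamma\in C^{1,1}$ (actually Lipschitz continuity of the boundary would suffice), $y$ as well as $p$ can be extended to functions in $H^2(U)$, which we again denote by the same symbol. Therefore by Lemmas 11.5 and 11.6
\[ \begin{aligned} \frac{d}{dt}\langle\tilde e(y,t),p\rangle_{X^*\times X}\Big|_{t=0} &= \frac{d}{dt}\Big(\int_{\Omega_t}\big(\mu\nabla(y\circ F_t^{-1}),\nabla(p\circ F_t^{-1})\big)\,dx_t - \int_{\Omega_t} f\,p\circ F_t^{-1}\,dx_t\Big)\Big|_{t=0}\\ &= \int_\Gamma (\mu\nabla y,\nabla p)\,(h,n)\,ds + \int_\Omega\Big[\big(\mu\nabla(-\nabla y\cdot h),\nabla p\big) + \big(\mu\nabla y,\nabla(-\nabla p\cdot h)\big) + f\,(\nabla p,h)\Big]\,dx. \end{aligned} \tag{11.3.4} \]
Note that $\nabla y\cdot h$ as well as $\nabla p\cdot h$ do not belong to $H_0^1(\Omega)$ but they are elements of $H^1(\Omega)$. Therefore Green's theorem implies
\[ \begin{aligned} \int_\Omega\Big[\big(\mu\nabla(-\nabla y\cdot h),\nabla p\big) &+ \big(\mu\nabla y,\nabla(-\nabla p\cdot h)\big) + f\,(\nabla p,h)\Big]\,dx\\ &= \int_\Omega \operatorname{div}(\mu\nabla p)\,(\nabla y,h)\,dx - \int_\Gamma (\mu\nabla p,n)(\nabla y,h)\,ds\\ &\quad + \int_\Omega \big(\operatorname{div}(\mu\nabla y) + f\big)(\nabla p,h)\,dx - \int_\Gamma (\mu\nabla y,n)(\nabla p,h)\,ds\\ &= -\int_\Omega j_1'(y)\,(\nabla y,h)\,dx - 2\int_\Gamma (\mu n,n)\,\frac{\partial y}{\partial n}\frac{\partial p}{\partial n}\,(h,n)\,ds. \end{aligned} \tag{11.3.5} \]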
Above we used the strong form of (11.3.1) and (11.3.3) in $L^2(\Omega)$ as well as the identities
\[ (\mu\nabla y,n) = (\mu n,n)\,\frac{\partial y}{\partial n}, \qquad (\nabla y,h) = \frac{\partial y}{\partial n}\,(h,n) \]
(together with the ones with $y$ and $p$ interchanged) which follow from $y,p\in H_0^1(\Omega)$. Applying Theorem 11.3 results in
\[ \begin{aligned} \hat J'(\Omega)h &= -\frac{d}{dt}\langle\tilde e(y,t),p\rangle_{X^*\times X}\Big|_{t=0} + \int_\Omega j_1(y)\operatorname{div} h\,dx\\ &= \int_\Gamma (\mu n,n)\,\frac{\partial y}{\partial n}\frac{\partial p}{\partial n}\,(h,n)\,ds + \int_\Omega \big(j_1'(y)(\nabla y,h) + j_1(y)\operatorname{div} h\big)\,dx\\ &= \int_\Gamma (\mu n,n)\,\frac{\partial y}{\partial n}\frac{\partial p}{\partial n}\,(h,n)\,ds + \int_\Omega \operatorname{div}\big(j_1(y)\,h\big)\,dx, \end{aligned} \]
and the Stokes theorem yields the final result,
\[ \hat J'(\Omega)h = \int_\Gamma\Big((\mu n,n)\,\frac{\partial y}{\partial n}\frac{\partial p}{\partial n} + j_1(y)\Big)(h,n)\,ds. \]

Remark 11.3.1. If we were to be content with a representation of the shape variation in terms of volume integrals we could take the expression for $\frac{d}{dt}\tilde e(y,t)|_{t=0}$ given in (11.3.4) and bypass the use of Green's theorem in (11.3.5). The regularity requirement on the domain then results from $y\in H^2(\Omega)$, $p\in H^2(\Omega)$. In [Ber] the shape derivative in terms of the volume integral is referred to as the weak shape derivative, whereas the final form in terms of the boundary integrals is called the strong shape derivative.
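The boundary representation $\hat J'(\Omega)h = \int_\Gamma\big((\mu n,n)\frac{\partial y}{\partial n}\frac{\partial p}{\partial n} + j_1(y)\big)(h,n)\,ds$ can be tested on a case with a closed-form solution. The following sketch rests on assumptions chosen only for illustration: $\Omega$ is the disc of radius $R$ in $\mathbb{R}^2$, $\mu = I$, $f\equiv 1$, and $j_1(y)=y$. Then $y(r) = (R^2-r^2)/4$, the adjoint equation $-\Delta p = j_1'(y) = 1$ gives $p=y$, and $\hat J(\Omega)=\pi R^4/8$. For a field with $(h,n)=1$ on $\Gamma$ the perturbed domain is the disc of radius $R+t$, so the shape derivative should equal $\frac{d}{dR}\hat J = \pi R^3/2$.

```python
import math

def state(r, R):
    """Solution of -Laplace(y) = 1 in the disc of radius R with y = 0 on the boundary."""
    return (R * R - r * r) / 4.0

def dstate_dn(R):
    """Outward normal derivative of the state on the circle r = R."""
    return -R / 2.0

def J_hat(R, n=4000):
    """Reduced cost: integral of j1(y) = y over the disc, midpoint rule in r."""
    dr = R / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        total += state(r, R) * r * dr
    return 2.0 * math.pi * total

def shape_derivative(R):
    """Boundary formula with mu = I, p = y, (h, n) = 1 on the circle."""
    dy_dn = dstate_dn(R)
    dp_dn = dstate_dn(R)          # adjoint coincides with the state here
    j1_on_boundary = state(R, R)  # equals 0 because y vanishes on the boundary
    return 2.0 * math.pi * R * (dy_dn * dp_dn + j1_on_boundary)

R = 1.2
t = 1e-5
fd = (J_hat(R + t) - J_hat(R - t)) / (2.0 * t)  # derivative of J_hat w.r.t. the radius
formula = shape_derivative(R)
print(fd, formula, math.pi * R**3 / 2.0)
```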
11.3.2
Inverse interface problem
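We consider an inverse interface problem which is motivated by electrical impedance tomography. Let $U = \Omega = (-1,1)\times(-1,1)$ and $\partial U = \partial\Omega$. Further let the domain $\Omega^-$, with $\bar\Omega^-\subset U$, represent the inhomogeneity of the conducting medium and set $D = \Omega^-$, $\Omega^+ = U\setminus\bar\Omega^-$. We assume that $\Omega^-$ is a simply connected domain of class $C^2$ with boundary $\Gamma$ which represents the interface between $\Omega^-$ and $\Omega^+$. The inverse problem consists of identifying the unknown interface $\Gamma$ from measurements $z$ which are taken on the boundary $\partial U$. This can be formulated as
\[ \min\ J(y,\Omega) \equiv \int_{\partial U}(y-z)^2\,ds \tag{11.3.6} \]
subject to the constraints
\[ \begin{aligned} -\operatorname{div}(\mu\nabla y) &= 0 && \text{in } \Omega^-\cup\Omega^+,\\ [y] = 0,\quad \Big[\mu\frac{\partial y}{\partial n^-}\Big] &= 0 && \text{on } \Gamma,\\ \frac{\partial y}{\partial n} &= g && \text{on } \partial U, \end{aligned} \tag{11.3.7} \]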
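where $g\in H^{1/2}(\partial U)$, $z\in L^2(\partial U)$, with $\int_{\partial U} g = \int_{\partial U} z = 0$, $[v] = v^+ - v^-$ on $\Gamma$, and $n^{+/-}$ standing for the unit outer normals to $\Omega^{+/-}$. The conductivity $\mu$ is given by
\[ \mu(x) = \begin{cases} \mu^-, & x\in\Omega^-,\\ \mu^+, & x\in\Omega^+, \end{cases} \]
for some positive constants $\mu^-$ and $\mu^+$. In the context of the general framework of Section 11.2 we have $j_1 = j_2 = 0$ and $j_3 = (y-z)^2$. Clearly (11.3.7) admits a unique solution $y\in H^1(U)$ with $\int_{\partial U} y = 0$. Its restrictions to $\Omega^+$ and $\Omega^-$ will be denoted by $y^+$ and $y^-$, respectively. It turns out that the regularity of $y^\pm$ is better than the one of $y$.

Proposition 11.7. Let $\Omega$ and $\Omega^\pm$ be as described above. Then the solution $y\in H^1(U)$ of (11.3.7) satisfies $y^\pm\in H^2(\Omega^\pm)$.

Proof. Let $\Gamma_H$ be the smooth boundary of a domain $\Omega_H$ with $\bar\Omega^-\subset\Omega_H\subset\bar\Omega_H\subset U$. Then $y|_{\Gamma_H}\in H^{3/2}(\Gamma_H)$. The problem
\[ \begin{aligned} -\operatorname{div}(\mu^+\nabla y_H) &= 0 && \text{in } U\setminus\bar\Omega_H,\\ \frac{\partial y_H}{\partial n} &= g && \text{on } \partial U,\\ y_H &= y|_{\Gamma_H} && \text{on } \Gamma_H \end{aligned} \]
has a unique solution $y_H\in H^2(U\setminus\bar\Omega_H)$ with $y_H = y^+|_{U\setminus\bar\Omega_H}$. Therefore, $b := y|_{\partial U} = y_H|_{\partial U}\in H^{3/2}(\partial U)$. Then the solution $y$ to (11.3.7) coincides with the solution to
\[ \begin{aligned} -\operatorname{div}(\mu\nabla y) &= 0 && \text{in } \Omega^-\cup\Omega^+,\\ [y] = 0,\quad \Big[\mu\frac{\partial y}{\partial n^-}\Big] &= 0 && \text{on } \Gamma,\\ y &= b && \text{on } \partial U. \end{aligned} \]
We now argue that $y^\pm\in H^2(\Omega^\pm)$. Let $y_b\in H^2(U)$ denote the solution to
\[ -\Delta y_b = 0 \ \text{ in } U, \qquad y_b = b \ \text{ on } \partial U. \]
Define $w\in H_0^1(U)$ as the unique solution to the interface problem
\[ \begin{aligned} -\operatorname{div}(\mu\nabla w) &= 0 && \text{in } \Omega,\\ [w] = 0,\quad \Big[\mu\frac{\partial w}{\partial n^-}\Big] &= -\Big[\mu\frac{\partial y_b}{\partial n^-}\Big] && \text{on } \Gamma,\\ w &= 0 && \text{on } \partial U. \end{aligned} \tag{11.3.8} \]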
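Then $y_b\in H^2(\Omega)$ implies $\big[\mu\frac{\partial y_b}{\partial n^-}\big]\in H^{1/2}(\Gamma)$. By [ChZo], (11.3.8) has a unique solution $w\in H_0^1(U)$ with the additional regularity $w^\pm\in H^2(\Omega^\pm)$. Consequently $y = w + y_b$ satisfies $y|_{\partial\Omega} = b$ and $y^\pm\in H^2(\Omega^\pm)$, as desired. In an analogous way $p^\pm\in H^2(\Omega^\pm)$.

To consider the inverse problem (11.3.6), (11.3.7) within the general framework of Section 11.2 we set $X = \{v\in H^1(U) : \int_{\partial U} v = 0\}$ and define
\[ \langle e(y,\Omega),\psi\rangle_{X^*\times X} = (\mu\nabla y,\nabla\psi)_U - (g,\psi)_{\partial U}, \]
respectively,
\[ \begin{aligned} \langle\tilde e(y,t),\psi\rangle_{X^*\times X} &= (\mu^t A_t\nabla y, A_t\nabla\psi\,I_t)_U - (g,\psi)_{\partial U}\\ &= \big(\mu^+\nabla(y\circ F_t^{-1}),\nabla(\psi\circ F_t^{-1})\big)_{\Omega_t^+} + \big(\mu^-\nabla(y\circ F_t^{-1}),\nabla(\psi\circ F_t^{-1})\big)_{\Omega_t^-} - (g,\psi)_{\partial U}. \end{aligned} \]
Note that the boundary term is not affected by the transformation $F_t$ since the deformation field $h$ vanishes on $\partial U$. The adjoint state is given by
\[ \begin{aligned} -\operatorname{div}(\mu\nabla p) &= 0 && \text{in } \Omega^-\cup\Omega^+,\\ [p] = 0,\quad \Big[\mu\frac{\partial p}{\partial n^-}\Big] &= 0 && \text{on } \Gamma,\\ \frac{\partial p}{\partial n} &= 2(y-z) && \text{on } \partial U, \end{aligned} \tag{11.3.9} \]
respectively,
\[ (\mu\nabla p,\nabla\psi)_U = 2\,(y-z,\psi)_{\partial U} \quad\text{for } \psi\in X. \tag{11.3.10} \]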
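Assumption (H4) requires us to consider
\[ \begin{aligned} \frac1t\,\big|\langle\tilde e(y^t-y,t) - e(y^t-y,\Omega),\psi\rangle\big| &\le \frac1t\int_{\Omega^+}\big|(\mu^+ I_t A_t\nabla(y^t-y), A_t\nabla\psi) - (\mu^+\nabla(y^t-y),\nabla\psi)\big|\,dx\\ &\quad + \frac1t\int_{\Omega^-}\big|(\mu^- I_t A_t\nabla(y^t-y), A_t\nabla\psi) - (\mu^-\nabla(y^t-y),\nabla\psi)\big|\,dx\\ &\le \mu^+\int_{\Omega^+}\Big|\Big(\frac1t(I_t A_t - I)\nabla(y^t-y), A_t\nabla\psi\Big)\Big|\,dx + \mu^+\int_{\Omega^+}\Big|\Big(\nabla(y^t-y),\frac1t(A_t - I)\nabla\psi\Big)\Big|\,dx\\ &\quad + \mu^-\int_{\Omega^-}\Big|\Big(\frac1t(I_t A_t - I)\nabla(y^t-y), A_t\nabla\psi\Big)\Big|\,dx + \mu^-\int_{\Omega^-}\Big|\Big(\nabla(y^t-y),\frac1t(A_t - I)\nabla\psi\Big)\Big|\,dx. \end{aligned} \]
The right-hand side of this inequality converges to 0 as $t\to 0^+$ by (11.2.8). The remaining assumptions can be verified as in Section 11.3.1 for the Dirichlet problem and thus Theorem 11.3 is applicable. By Proposition 11.7 the restrictions $y^\pm = y|_{\Omega^\pm}$, $p^\pm = p|_{\Omega^\pm}$ satisfy $y^\pm, p^\pm\in H^2(\Omega^\pm)$. Using Lemma 11.5 we find that
\[ \begin{aligned} \frac{d}{dt}\langle\tilde e(y,t),p\rangle_{X^*\times X}\Big|_{t=0} &= \int_{\partial\Omega^+}(\mu^+\nabla y^+,\nabla p^+)(h,n^+)\,ds - \int_{\Omega^+}\big(\mu^+\nabla(\nabla y^+\cdot h),\nabla p^+\big)\,dx - \int_{\Omega^+}\big(\mu^+\nabla y^+,\nabla(\nabla p^+\cdot h)\big)\,dx\\ &\quad + \int_{\partial\Omega^-}(\mu^-\nabla y^-,\nabla p^-)(h,n^-)\,ds - \int_{\Omega^-}\big(\mu^-\nabla(\nabla y^-\cdot h),\nabla p^-\big)\,dx - \int_{\Omega^-}\big(\mu^-\nabla y^-,\nabla(\nabla p^-\cdot h)\big)\,dx\\ &= \int_\Gamma [\mu(\nabla y,\nabla p)]\,(h,n^+)\,ds - \int_{\Omega^+}\mu^+\Big(\big(\nabla(\nabla y^+\cdot h),\nabla p^+\big) + \big(\nabla y^+,\nabla(\nabla p^+\cdot h)\big)\Big)\,dx\\ &\quad - \int_{\Omega^-}\mu^-\Big(\big(\nabla(\nabla y^-\cdot h),\nabla p^-\big) + \big(\nabla y^-,\nabla(\nabla p^-\cdot h)\big)\Big)\,dx. \end{aligned} \]
Applying Green's formula as in Section 11.3.1 (observe that $(\nabla y,h)$, $(\nabla p,h)\notin H^1(U)$) together with (11.3.9) results in
\[ \begin{aligned} -\int_{\Omega^+}\big(\mu^+\nabla(\nabla y^+\cdot h),\nabla p^+\big)\,dx &- \int_{\Omega^-}\big(\mu^-\nabla(\nabla y^-\cdot h),\nabla p^-\big)\,dx\\ &= \int_{\Omega^+}\operatorname{div}(\mu^+\nabla p^+)(\nabla y^+,h)\,dx + \int_{\Omega^-}\operatorname{div}(\mu^-\nabla p^-)(\nabla y^-,h)\,dx\\ &\quad - \int_{\partial\Omega^+}(\mu^+\nabla p^+,n^+)(\nabla y^+,h)\,ds - \int_{\partial\Omega^-}(\mu^-\nabla p^-,n^-)(\nabla y^-,h)\,ds\\ &= -\int_\Gamma\Big[\mu\frac{\partial p}{\partial n^+}(\nabla y,h)\Big]\,ds. \end{aligned} \]
In the last step we utilize $h = 0$ on $\partial U$. Similarly we obtain
\[ -\int_{\Omega^+}\big(\mu^+\nabla y^+,\nabla(\nabla p^+\cdot h)\big)\,dx - \int_{\Omega^-}\big(\mu^-\nabla y^-,\nabla(\nabla p^-\cdot h)\big)\,dx = -\int_\Gamma\Big[\mu\frac{\partial y}{\partial n^+}(\nabla p,h)\Big]\,ds. \]
Collecting terms results in
\[ \hat J'(\Omega)h = -\int_\Gamma[\mu(\nabla y,\nabla p)]\,(h,n^+)\,ds + \int_\Gamma\Big(\Big[\mu\frac{\partial p}{\partial n^+}(\nabla y,h)\Big] + \Big[\mu\frac{\partial y}{\partial n^+}(\nabla p,h)\Big]\Big)\,ds. \]
The identity
\[ [ab] = [a]\,b^+ + a^-[b] = a^+[b] + [a]\,b^- \]
implies $[ab] = 0$ if $[a] = [b] = 0$.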
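Hence the transition conditions
\[ \Big[\mu\frac{\partial y}{\partial n^+}\Big] = \Big[\frac{\partial y}{\partial\tau}\Big] = 0, \qquad \Big[\mu\frac{\partial p}{\partial n^+}\Big] = \Big[\frac{\partial p}{\partial\tau}\Big] = 0, \tag{11.3.11} \]
where $\frac{\partial}{\partial\tau}$ stands for the tangential derivative, imply
\[ \begin{aligned} \Big[\mu\frac{\partial p}{\partial n^+}(\nabla y,h)\Big] &= \Big[\mu\frac{\partial p}{\partial n^+}\frac{\partial y}{\partial n^+}(h,n^+) + \mu\frac{\partial p}{\partial n^+}\frac{\partial y}{\partial\tau}(h,\tau)\Big]\\ &= \Big[\mu\frac{\partial p}{\partial n^+}\frac{\partial y}{\partial n^+}\Big](h,n^+) + \Big[\mu\frac{\partial p}{\partial n^+}\frac{\partial y}{\partial\tau}\Big](h,\tau)\\ &= \Big[\mu\frac{\partial p}{\partial n^+}\frac{\partial y}{\partial n^+}\Big](h,n^+), \end{aligned} \]
and analogously
\[ \Big[\mu\frac{\partial y}{\partial n^+}(\nabla p,h)\Big] = \Big[\mu\frac{\partial y}{\partial n^+}\frac{\partial p}{\partial n^+}\Big](h,n^+), \]
which entails
\[ \begin{aligned} \hat J'(\Omega)h &= -\int_\Gamma[\mu(\nabla y,\nabla p)]\,(h,n^+)\,ds + 2\int_\Gamma\Big[\mu\frac{\partial y}{\partial n^+}\frac{\partial p}{\partial n^+}\Big](h,n^+)\,ds\\ &= -\int_\Gamma\Big[\mu\frac{\partial y}{\partial\tau}\frac{\partial p}{\partial\tau}\Big](h,n^+)\,ds + \int_\Gamma\Big[\mu\frac{\partial y}{\partial n^+}\frac{\partial p}{\partial n^+}\Big](h,n^+)\,ds\\ &= -\int_\Gamma[\mu]\,\frac{\partial y}{\partial\tau}\frac{\partial p}{\partial\tau}\,(h,n^+)\,ds + \int_\Gamma\Big[\mu\frac{\partial y}{\partial n^+}\frac{\partial p}{\partial n^+}\Big](h,n^+)\,ds. \end{aligned} \]
In view of (11.3.11) this can be rearranged as
\[ \begin{aligned} -[\mu]\,\frac{\partial y}{\partial\tau}\frac{\partial p}{\partial\tau} + \Big[\mu\frac{\partial y}{\partial n^+}\frac{\partial p}{\partial n^+}\Big] &= -\mu^+\frac{\partial y}{\partial\tau}\frac{\partial p}{\partial\tau} - \mu^-\frac{\partial y^-}{\partial n^+}\frac{\partial p^-}{\partial n^+} + \mu^-\frac{\partial y}{\partial\tau}\frac{\partial p}{\partial\tau} + \mu^+\frac{\partial y^+}{\partial n^+}\frac{\partial p^+}{\partial n^+}\\ &= -\mu^+\Big(\frac{\partial y}{\partial\tau}\frac{\partial p}{\partial\tau} + \frac12\Big(\frac{\partial y^+}{\partial n^+}\frac{\partial p^-}{\partial n^+} + \frac{\partial y^-}{\partial n^+}\frac{\partial p^+}{\partial n^+}\Big)\Big)\\ &\quad + \mu^-\Big(\frac{\partial y}{\partial\tau}\frac{\partial p}{\partial\tau} + \frac12\Big(\frac{\partial y^-}{\partial n^+}\frac{\partial p^+}{\partial n^+} + \frac{\partial y^+}{\partial n^+}\frac{\partial p^-}{\partial n^+}\Big)\Big)\\ &= -\frac12[\mu]\Big((\nabla y^+,\nabla p^-) + (\nabla y^-,\nabla p^+)\Big), \end{aligned} \]
which gives the representation
\[ \hat J'(\Omega)h = -\frac12\int_\Gamma[\mu]\Big((\nabla y^+,\nabla p^-) + (\nabla y^-,\nabla p^+)\Big)(h,n^+)\,ds = -\int_\Gamma[\mu]\,(\nabla y^+,\nabla p^-)\,(h,n^+)\,ds. \]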
11.3.3
Elliptic systems
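Here we consider a domain $\Omega = U\setminus\bar D$, where $\bar D\subset U$ and the boundaries $\partial U$ and $\Gamma = \partial D$ are assumed to be $C^2$ regular. We consider the optimization problem
\[ \min\ J(y,\Omega,\Gamma) \equiv \int_\Omega j_1(y)\,dx + \int_\Gamma j_2(y)\,ds, \]
where $y$ is the solution of the elliptic system
\[ \langle e(y,\Omega),\psi\rangle_{X^*\times X} = \int_\Omega\big(a(x,\nabla y,\nabla\psi) - (f,\psi)\big)\,dx - \int_\Gamma (g,\psi)\,ds = 0 \tag{11.3.12} \]
in $X = \{v\in H^1(\Omega)^l : v|_{\partial U} = 0\}$. Above $\nabla y$ stands for $(Dy)^T$. We require that $f\in H^1(U)^l$ and that $g$ is the trace of a given function $G\in H^2(U)^l$. Furthermore we assume that $a : \bar U\times\mathbb{R}^{d\times d}\times\mathbb{R}^{d\times d}\to\mathbb{R}$ satisfies
(1) $a(\cdot,\xi,\eta)$ is continuously differentiable for every $\xi,\eta\in\mathbb{R}^{d\times d}$,
(2) $a(x,\cdot,\cdot)$ defines a bilinear form on $\mathbb{R}^{d\times d}\times\mathbb{R}^{d\times d}$ which is uniformly bounded in $x\in\bar U$,
(3) $a(x,\cdot,\cdot)$ is uniformly coercive for all $x\in\bar U$.
In the case of linear elasticity $a$ is given by
\[ a(x,\nabla y,\nabla\psi) = \lambda\operatorname{tr} e(y)\operatorname{tr} e(\psi) + 2\mu\, e(y) : e(\psi), \]
where $e(y) = \frac12(\nabla y + (\nabla y)^T)$, and $\lambda$, $\mu$ are the positive Lamé coefficients. In this case $a$ is symmetric, and (11.3.12) admits a unique solution in $X\cap H^2(\Omega)^l$ for every $f\in L^2(\Omega)^l$ and $g\in H^{1/2}(\partial U)^l$; see, e.g., [Ci]. The method of mapping suggests defining
\[ \begin{aligned} \langle\tilde e(y,t),\psi\rangle_{X^*\times X} &= \int_\Omega\big(a(F_t(x), A_t\nabla y, A_t\nabla\psi) - (f^t,\psi)\big)\,I_t\,dx - \int_\Gamma (g^t,\psi)\,w_t\,ds\\ &= \int_{\Omega_t}\big(a(x,\nabla(y\circ F_t^{-1}),\nabla(\psi\circ F_t^{-1})) - (f,\psi\circ F_t^{-1})\big)\,dx - \int_{\Gamma_t}(g,\psi\circ F_t^{-1})\,ds. \end{aligned} \tag{11.3.13} \]
The adjoint state is determined by the equation
\[ \langle e_y(y,\Omega)\psi,p\rangle_{X^*\times X} = \int_\Omega\big(a(x,\nabla\psi,\nabla p) - j_1'(y)\psi\big)\,dx - \int_\Gamma j_2'(y)\psi\,ds = 0, \qquad \psi\in X. \tag{11.3.14} \]
Under the regularity assumptions on $a$, (11.3.12) admits a unique solution in $X\cap H^2(\Omega)^l$ and the adjoint equation admits a solution for any right-hand side in $X^*$ so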
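that Proposition 11.4 is applicable. All these properties are satisfied for the linear elasticity case. Assumptions (H1)–(H4) can then be argued as in Section 11.3.1. Employing Lemma 11.5 we obtain
\[ \begin{aligned} \frac{d}{dt}\langle\tilde e(y,t),p\rangle_{X^*\times X}\Big|_{t=0} &= -\int_\Omega\Big(a(x,\nabla(\nabla y^T h),\nabla p) + a(x,\nabla y,\nabla(\nabla p^T h))\Big)\,dx + \int_\Gamma a(x,\nabla y,\nabla p)\,(h,n)\,ds\\ &\quad + \int_\Omega (f,\nabla p^T h)\,dx - \int_\Gamma (f,p)\,(h,n)\,ds\\ &\quad + \int_\Gamma (g,\nabla p^T h)\,ds - \int_\Gamma\Big(\frac{\partial}{\partial n}(g,p) + \kappa\,(g,p)\Big)(h,n)\,ds. \end{aligned} \]
Since $\nabla y^T h\in X$ and $\nabla p^T h\in X$, this expression can be simplified using (11.3.12) and (11.3.14):
\[ \begin{aligned} \frac{d}{dt}\langle\tilde e(y,t),p\rangle_{X^*\times X}\Big|_{t=0} &= -\int_\Omega j_1'(y)\nabla y^T h\,dx - \int_\Gamma j_2'(y)\nabla y^T h\,ds\\ &\quad + \int_\Gamma\big(a(x,\nabla y,\nabla p) - (f,p)\big)(h,n)\,ds - \int_\Gamma\Big(\frac{\partial}{\partial n}(g,p) + \kappa\,(g,p)\Big)(h,n)\,ds, \end{aligned} \]
which implies
\[ \begin{aligned} \hat J'(\Omega)h &= \int_\Omega j_1'(y)\nabla y^T h\,dx + \int_\Omega j_1(y)\operatorname{div} h\,dx + \int_\Gamma j_2'(y)\nabla y^T h\,ds + \int_\Gamma j_2(y)\operatorname{div}_\Gamma h\,ds\\ &\quad + \int_\Gamma\Big(-a(x,\nabla y,\nabla p) + (f,p) + \frac{\partial}{\partial n}(g,p) + \kappa\,(g,p)\Big)(h,n)\,ds. \end{aligned} \]
For the third and fourth terms the tangential Green's formula (see, e.g., [DeZo]) yields
\[ \int_\Gamma j_2'(y)\nabla y^T h\,ds + \int_\Gamma j_2(y)\operatorname{div}_\Gamma h\,ds = \int_\Gamma\Big(\frac{\partial}{\partial n}j_2(y) + \kappa\,j_2(y)\Big)(h,n)\,ds. \]
The first and second terms can be combined using the Stokes theorem. Summarizing we finally obtain
\[ \begin{aligned} \hat J'(\Omega)h &= \int_\Gamma\big(-a(x,\nabla y,\nabla p) + (f,p) + j_1(y)\big)(h,n)\,ds\\ &\quad + \int_\Gamma\Big(\frac{\partial}{\partial n}\big(j_2(y) + (g,p)\big) + \kappa\big(j_2(y) + (g,p)\big)\Big)(h,n)\,ds. \end{aligned} \tag{11.3.15} \]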
This example also comprises the shape optimization problem of Bernoulli type:
\[ \min\ J(y,\Omega,\Gamma) \equiv \min\int_\Gamma y^2\,ds, \]
where $y$ is the solution of the mixed boundary value problem
\[ \begin{aligned} -\Delta y &= f && \text{in } \Omega,\\ y &= 0 && \text{on } \partial U,\\ \frac{\partial y}{\partial n} &= g && \text{on } \Gamma, \end{aligned} \]
which was analyzed with a similar approach in [IKP]. Here the boundary $\partial\Omega$ of the domain $\Omega\subset\mathbb{R}^2$ is the disjoint union of a fixed part $\partial U$ and an unknown part $\Gamma$, both with nonempty relative interior. Let the state space $X$ be given by
\[ X = \{\varphi\in H^1(\Omega) : \varphi = 0 \text{ on } \partial U\}. \]
Then the Eulerian derivative of $J$ is given by (11.3.15), which reduces to
\[ \hat J'(\Omega)h = \int_\Gamma\Big(-(\nabla y,\nabla p) + fp + \frac{\partial}{\partial n}(y^2 + gp) + \kappa\,(y^2 + gp)\Big)(h,n)\,ds. \]
This result coincides with the representation obtained in [IKP]. The present derivation, however, is considerably simpler due to a better arrangement of terms in the proof of Theorem 11.3. It is straightforward to adapt the framework to shape optimization problems associated with the exterior Bernoulli problem.
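The reduction of (11.3.15) to the Bernoulli case can be written out term by term. The following worked substitution is a sketch, not taken from the text; it assumes the identifications $l=1$, $a(x,\xi,\eta)=(\xi,\eta)$, $j_1\equiv 0$, and $j_2(y)=y^2$, and it also records the adjoint problem that (11.3.14) then formally yields.

```latex
% Substitution into (11.3.15) with l = 1, a(x,\xi,\eta) = (\xi,\eta), j_1 = 0, j_2(y) = y^2:
\begin{align*}
  -a(x,\nabla y,\nabla p) + (f,p) + j_1(y)
    &= -(\nabla y,\nabla p) + f p, \\
  \frac{\partial}{\partial n}\bigl(j_2(y) + (g,p)\bigr)
    + \kappa\bigl(j_2(y) + (g,p)\bigr)
    &= \frac{\partial}{\partial n}\bigl(y^2 + g p\bigr)
    + \kappa\bigl(y^2 + g p\bigr).
\end{align*}
% Adjoint equation (11.3.14) under the same identifications (j_2'(y) = 2y):
\begin{equation*}
  \int_\Omega (\nabla\psi,\nabla p)\,dx - \int_\Gamma 2\,y\,\psi\,ds = 0
  \quad\text{for all } \psi \in X,
\end{equation*}
% which formally corresponds to
\begin{equation*}
  -\Delta p = 0 \ \text{in } \Omega, \qquad
  p = 0 \ \text{on } \partial U, \qquad
  \frac{\partial p}{\partial n} = 2y \ \text{on } \Gamma .
\end{equation*}
```

Adding the two substituted integrands and multiplying by $(h,n)$ reproduces the integrand of the reduced formula above.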
11.3.4
Navier–Stokes system
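Consider the stationary Navier–Stokes equations
\[ \begin{aligned} -\nu\Delta y + (y\cdot\nabla)y + \nabla p &= f && \text{in } \Omega,\\ \operatorname{div} y &= 0 && \text{in } \Omega,\\ y &= 0 && \text{on } \partial\Omega \end{aligned} \tag{11.3.16} \]
on a bounded domain $\Omega\subset\mathbb{R}^d$, $d = 2,3$, with $\nu>0$ and $f\in H^1(U)$. In the context of the general framework we set $\Omega = D$ with $C^2$-boundary $\Gamma = \partial\Omega$. The variational formulation of (11.3.16) is given by: Find $(y,p)\in X\equiv H_0^1(\Omega)^d\times L^2(\Omega)/\mathbb{R}$ such that
\[ \langle e((y,p),\Omega),(\psi,\chi)\rangle_{X^*\times X} \equiv \nu(\nabla y,\nabla\psi)_\Omega + ((y\cdot\nabla)y,\psi)_\Omega - (p,\operatorname{div}\psi)_\Omega - (f,\psi)_\Omega + (\operatorname{div} y,\chi)_\Omega = 0 \tag{11.3.17} \]
holds for all $(\psi,\chi)\in X$. Let the cost functional $J$ be given by
\[ J(y,\Omega) = \int_\Omega j_1(y)\,dx. \]
Considering (11.3.17) on a perturbed domain $\Omega_t$ and mapping the equation back to the reference domain $\Omega$ yields the form of $\tilde e(y,t)$. Concerning the transformation of the divergence we note that for $\psi_t\in H_0^1(\Omega_t)^d$ and $\psi^t = \psi_t\circ F_t\in H_0^1(\Omega)^d$, one obtains
\[ \operatorname{div}\psi_t = (D\psi_i^t A_t^T e_i)\circ F_t^{-1} = ((A_t)_i\nabla\psi_i^t)\circ F_t^{-1}, \]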
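where $e_i$ stands for the $i$th canonical basis vector in $\mathbb{R}^d$ and $(A_t)_i$ denotes the $i$th row of $A_t = (DF_t)^{-T}$. We follow the convention to sum over indices which occur at least twice in a term. Thus one obtains
\[ \begin{aligned} \langle\tilde e((y^t,p^t),t),(\psi,\chi)\rangle_{X^*\times X} &= \nu(I_t A_t\nabla y^t, A_t\nabla\psi)_\Omega + ((y^t\cdot A_t\nabla)y^t, I_t\psi)_\Omega - (p^t, I_t(A_t)_k\nabla\psi_k)_\Omega\\ &\quad - (f^t I_t,\psi)_\Omega + (I_t(A_t)_k\nabla y_k^t,\chi)_\Omega = 0 \end{aligned} \]
for all $(\psi,\chi)\in X$. The adjoint state $(\lambda,q)\in X$ is given by the solution to
\[ \langle e'((y,p),\Omega)(\psi,\chi),(\lambda,q)\rangle_{X^*\times X} = (j_1'(y),\psi)_\Omega, \]
which amounts to
\[ \nu(\nabla\psi,\nabla\lambda)_\Omega + ((\psi\cdot\nabla)y + (y\cdot\nabla)\psi,\lambda)_\Omega - (\chi,\operatorname{div}\lambda)_\Omega + (\operatorname{div}\psi,q)_\Omega = (j_1'(y),\psi)_\Omega \tag{11.3.18} \]
for all $(\psi,\chi)\in X$. Integrating by parts one obtains
\[ ((y\cdot\nabla)\psi,\lambda)_\Omega = -\int_\Omega \psi\cdot\lambda\,\operatorname{div} y\,dx - \int_\Omega \psi\cdot((y\cdot\nabla)\lambda)\,dx + \int_\Gamma (\psi\cdot\lambda)(y\cdot n)\,ds = -(\psi,(y\cdot\nabla)\lambda)_\Omega \]
because $y\in H_0^1(\Omega)^d$ and $\operatorname{div} y = 0$. Therefore
\[ ((\psi\cdot\nabla)y + (y\cdot\nabla)\psi,\lambda)_\Omega = (\psi,(\nabla y)\lambda - (y\cdot\nabla)\lambda)_\Omega \tag{11.3.19} \]
holds for all $\psi\in H^1(\Omega)^d$. As a consequence the adjoint equation can be interpreted as
\[ -\nu\Delta\lambda + (\nabla y)\lambda - (y\cdot\nabla)\lambda - \nabla q = j_1'(y), \qquad \operatorname{div}\lambda = 0, \tag{11.3.20} \]
where the first equation holds in $L^2(\Omega)^d$ and the second one in $L^2(\Omega)$. For the evaluation of $\frac{d}{dt}\langle\tilde e((y,p),t),(\lambda,q)\rangle_{X^*\times X}|_{t=0}$, with $(y,p)$, $(\lambda,q)\in X$ being the solutions of (11.3.17), respectively, (11.3.18), we transform this expression back to $\Omega_t$, which gives
\[ \begin{aligned} \langle\tilde e((y,p),t),(\lambda,q)\rangle_{X^*\times X} &= \nu\big(\nabla(y\circ F_t^{-1}),\nabla(\lambda\circ F_t^{-1})\big)_{\Omega_t} + \big((y\circ F_t^{-1}\cdot\nabla)\,y\circ F_t^{-1},\lambda\circ F_t^{-1}\big)_{\Omega_t}\\ &\quad - \big(\operatorname{div}(\lambda\circ F_t^{-1}), p\circ F_t^{-1}\big)_{\Omega_t} - (f,\lambda\circ F_t^{-1})_{\Omega_t} + \big(\operatorname{div}(y\circ F_t^{-1}), q\circ F_t^{-1}\big)_{\Omega_t}. \end{aligned} \]
To verify conditions (H1)–(H4) we introduce the continuous trilinear form $c : H_0^1(\Omega)^d\times H_0^1(\Omega)^d\times H_0^1(\Omega)^d\to\mathbb{R}$ by $c(y,v,w) = ((y\cdot\nabla)v,w)_\Omega$ and assume that
\[ \nu^2 > N\,|f|_{H^{-1}} \quad\text{and}\quad \nu > M, \tag{11.3.21} \]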
where
\[ N = \sup_{y,v,w\in H_0^1}\frac{c(y,v,w)}{|y|_{H_0^1}|v|_{H_0^1}|w|_{H_0^1}} \quad\text{and}\quad M = \sup_{v\in H_0^1}\frac{c(v,v,y)}{|v|_{H_0^1}^2}, \]
with $y$ the solution to (11.3.16). Condition (H1) is satisfied by construction. If $\nu$ is sufficiently large so that the first inequality in (11.3.21) is satisfied, existence of a unique solution $(y,p)\in H_0^1(\Omega)^d\times L^2(\Omega)/\mathbb{R}$ to (11.3.16) is guaranteed; see, e.g., [Te]. The second condition in (11.3.21) ensures the bijectivity of the linearized operator $e'(y,p)$, and thus (H6) holds. In particular this implies that (H2) holds and that the adjoint equation admits a unique solution. To verify (H3) we consider for arbitrary $(v,q)\in X$ and $(\psi,\chi)\in X$
\[ \begin{aligned} \langle e((v,q),\Omega) - e((y,p),\Omega) - e'((y,p),\Omega)((v,q)-(y,p)),(\psi,\chi)\rangle_{X^*\times X} &= \big(((v-y)\cdot\nabla)(v-y),\psi\big)_\Omega\\ &\le K\,|\psi|_{H_0^1(\Omega)}\,|v-y|_{H_0^1(\Omega)}^2, \end{aligned} \]
where $K$ is an embedding constant, independent of $(v,q)\in X$ and $(\psi,\chi)\in X$. Verifying (H4) requires us to consider the quotient of the following expression with $t$ and taking the limit as $t\to 0$:
\[ \begin{aligned} &\nu\big[(I_t A_t\nabla(y^t-y), A_t\nabla\psi) - (\nabla(y^t-y),\nabla\psi)\big]\\ &\quad + \big[((y^t\cdot A_t\nabla)y^t, I_t\psi) - ((y^t\cdot\nabla)y^t,\psi) - ((y\cdot A_t\nabla)y, I_t\psi) + ((y\cdot\nabla)y,\psi)\big]\\ &\quad - \big[(I_t(A_t)_k\nabla\psi_k, p^t-p) - (\operatorname{div}\psi, p^t-p)\big] + \big[(I_t(A_t)_k\nabla(y_k^t-y_k),\chi) - (\operatorname{div}(y^t-y),\chi)\big] \end{aligned} \]
for $(\psi,\chi)\in X$. The first two terms in square brackets can be treated by analogous estimates as in Sections 11.3.1 and 11.3.2. Noting that the third and fourth square brackets can be estimated quite similarly to each other, we give the estimate for the last one:
\[ \big((I_t-1)(A_t)_k\nabla(y_k^t-y_k),\chi\big) + \big(((A_t)_k - e_k)\nabla(y_k^t-y_k),\chi\big) + \big(e_k\nabla(y_k^t-y_k) - \operatorname{div}(y^t-y),\chi\big), \]
which, upon division by $t$, tends to 0 for $t\to 0$. In the following calculation we utilize that $(y,p)$, $(\lambda,q)\in H^2(\Omega)^d\times H^1(\Omega)$, which is satisfied if $\Gamma$ is $C^2$. Applying Lemma 11.5 results in
\[ \begin{aligned} \frac{d}{dt}\langle\tilde e((y,p),t),(\lambda,q)\rangle_{X^*\times X}\Big|_{t=0} &= \nu(\nabla(-\nabla y^T h),\nabla\lambda)_\Omega + \nu(\nabla y,\nabla(-\nabla\lambda^T h))_\Omega + \nu\int_\Gamma(\nabla y,\nabla\lambda)(h,n)\,ds\\ &\quad + \big(((-\nabla y^T h)\cdot\nabla)y,\lambda\big)_\Omega + \big((y\cdot\nabla)(-\nabla y^T h),\lambda\big)_\Omega\\ &\quad + \big((y\cdot\nabla)y,-\nabla\lambda^T h\big)_\Omega + \int_\Gamma((y\cdot\nabla)y,\lambda)(h,n)\,ds\\ &\quad - (-\nabla p^T h,\operatorname{div}\lambda)_\Omega - (p,\operatorname{div}(-\nabla\lambda^T h))_\Omega - \int_\Gamma p\operatorname{div}\lambda\,(h,n)\,ds\\ &\quad - (f,-\nabla\lambda^T h)_\Omega - \int_\Gamma f\lambda\,(h,n)\,ds\\ &\quad + (\operatorname{div}(-\nabla y^T h),q)_\Omega + (\operatorname{div} y,-\nabla q^T h)_\Omega + \int_\Gamma q\operatorname{div} y\,(h,n)\,ds. \end{aligned} \]
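Since $\operatorname{div} y = \operatorname{div}\lambda = 0$ and $y,\lambda\in H_0^1(\Omega)^d$, this expression simplifies to
\[ \begin{aligned} \frac{d}{dt}\langle\tilde e((y,p),t),(\lambda,q)\rangle_{X^*\times X}\Big|_{t=0} &= \nu(\nabla y,\nabla\psi_\lambda)_\Omega + ((y\cdot\nabla)y,\psi_\lambda)_\Omega - (p,\operatorname{div}\psi_\lambda)_\Omega - (f,\psi_\lambda)_\Omega\\ &\quad + \nu(\nabla\psi_y,\nabla\lambda)_\Omega + ((\psi_y\cdot\nabla)y + (y\cdot\nabla)\psi_y,\lambda)_\Omega + (\operatorname{div}\psi_y,q)_\Omega\\ &\quad + \nu\int_\Gamma(\nabla y,\nabla\lambda)(h,n)\,ds, \end{aligned} \]
where we have used the abbreviation
\[ \psi_y = -(\nabla y)^T h, \qquad \psi_\lambda = -(\nabla\lambda)^T h. \]
Note that $\psi_y,\psi_\lambda\in H^1(\Omega)^d$ but not in $H_0^1(\Omega)^d$. Green's formula, together with (11.3.16), (11.3.20), entails
\[ \begin{aligned} \frac{d}{dt}\langle\tilde e((y,p),t),(\lambda,q)\rangle_{X^*\times X}\Big|_{t=0} &= (-\nu\Delta y + (y\cdot\nabla)y + \nabla p - f,\psi_\lambda)_\Omega + \nu\int_\Gamma\Big(\frac{\partial y}{\partial n},\psi_\lambda\Big)\,ds + \int_\Gamma p\,(\psi_\lambda,n)\,ds\\ &\quad + (\psi_y,-\nu\Delta\lambda + (\nabla y)\lambda - (y\cdot\nabla)\lambda - \nabla q)_\Omega + \nu\int_\Gamma\Big(\frac{\partial\lambda}{\partial n},\psi_y\Big)\,ds\\ &\quad + \int_\Gamma q\,(\psi_y,n)\,ds + \nu\int_\Gamma(\nabla y,\nabla\lambda)(h,n)\,ds\\ &= -\nu\int_\Gamma\Big(\frac{\partial y}{\partial n},(\nabla\lambda)^T h\Big)\,ds - \int_\Gamma p\,((\nabla\lambda)^T h,n)\,ds - \nu\int_\Gamma\Big(\frac{\partial\lambda}{\partial n},(\nabla y)^T h\Big)\,ds\\ &\quad - \int_\Gamma q\,((\nabla y)^T h,n)\,ds + \nu\int_\Gamma(\nabla y,\nabla\lambda)(h,n)\,ds - (j_1'(y),(\nabla y)^T h)_\Omega\\ &= -\int_\Gamma\Big[\nu\Big(\frac{\partial y}{\partial n},\frac{\partial\lambda}{\partial n}\Big) + p\Big(\frac{\partial\lambda}{\partial n},n\Big) + q\Big(\frac{\partial y}{\partial n},n\Big)\Big](h,n)\,ds - (j_1'(y),(\nabla y)^T h)_\Omega. \end{aligned} \]
Arguing as in Section 11.3.3 one eventually obtains by Theorem 11.3
\[ \begin{aligned} \hat J'(\Omega)h &= \int_\Gamma\Big[\nu\Big(\frac{\partial y}{\partial n},\frac{\partial\lambda}{\partial n}\Big) + p\Big(\frac{\partial\lambda}{\partial n},n\Big) + q\Big(\frac{\partial y}{\partial n},n\Big)\Big](h,n)\,ds + \int_\Omega\big(j_1(y)\operatorname{div} h + j_1'(y)\nabla y^T h\big)\,dx\\ &= \int_\Gamma\Big[\nu\Big(\frac{\partial y}{\partial n},\frac{\partial\lambda}{\partial n}\Big) + p\Big(\frac{\partial\lambda}{\partial n},n\Big) + q\Big(\frac{\partial y}{\partial n},n\Big) + j_1(y)\Big](h,n)\,ds. \end{aligned} \]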
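The integration by parts used for (11.3.19), namely $((y\cdot\nabla)\psi,\lambda)_\Omega = -(\psi,(y\cdot\nabla)\lambda)_\Omega$ for divergence-free $y\in H_0^1(\Omega)^d$, can be checked numerically. The sketch below makes several illustrative assumptions that are not part of the text: $\Omega=(0,1)^2$, the field $y$ is the rotated gradient of the stream function $\sin^2(\pi x_1)\sin^2(\pi x_2)$ (hence divergence free and zero on $\partial\Omega$), $\psi$ and $\lambda$ are arbitrary smooth fields, and integrals are approximated by a midpoint rule.

```python
import math

def y_field(x1, x2):
    """Divergence-free field vanishing on the boundary of the unit square."""
    s1, s2 = math.sin(math.pi * x1), math.sin(math.pi * x2)
    return (math.pi * s1 * s1 * math.sin(2.0 * math.pi * x2),
            -math.pi * math.sin(2.0 * math.pi * x1) * s2 * s2)

def psi(x1, x2):
    """Hypothetical test field; returns (value, gradient rows per component)."""
    value = (math.sin(x1) * x2, x1 * x1)
    grad = ((math.cos(x1) * x2, math.sin(x1)), (2.0 * x1, 0.0))
    return value, grad

def lam(x1, x2):
    """Second hypothetical field, same return convention as psi."""
    value = (x2 * x2, x1 * math.cos(x2))
    grad = ((0.0, 2.0 * x2), (math.cos(x2), -x1 * math.sin(x2)))
    return value, grad

def c(v, w, n=200):
    """Midpoint-rule approximation of c(y, v, w) = ((y . grad) v, w)."""
    hq = 1.0 / n
    total = 0.0
    for i in range(n):
        for k in range(n):
            x1, x2 = (i + 0.5) * hq, (k + 0.5) * hq
            y1, y2 = y_field(x1, x2)
            _, grad_v = v(x1, x2)
            w_val, _ = w(x1, x2)
            for comp in range(2):
                total += (y1 * grad_v[comp][0] + y2 * grad_v[comp][1]) * w_val[comp]
    return total * hq * hq

c_psi_lam = c(psi, lam)   # ((y . grad) psi, lambda)
c_lam_psi = c(lam, psi)   # ((y . grad) lambda, psi)
print(c_psi_lam, c_lam_psi, c_psi_lam + c_lam_psi)
```

The two values should cancel up to quadrature error, which reflects that only $y\cdot n = 0$ on $\Gamma$ and $\operatorname{div} y = 0$ enter the identity; $\psi$ and $\lambda$ need not vanish on the boundary.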
Bibliography
[ACFK]
J. Alberty, C. Carstensen, S. A. Funken, and R. Klose, MATLAB implementation of the finite element method in elasticity, Computing 69(2002), 239–263.
[AcVo]
R. Acar and C. R. Vogel, Analysis of bounded variation penalty methods for ill-posed problems, Inverse Problems 10(1994), 1217–1229.
[Ad]
R. Adams, Sobolev Spaces, Academic Press, Boston, 1975.
[AlMa]
W. Alt and K. Malanowski, The Lagrange-Newton method for nonlinear optimal control problems, Comput. Optim. Appl. 2(1993), 77–100.
[Alt1]
W. Alt, Stabilität mengenwertiger Abbildungen mit Anwendungen auf nichtlineare Optimierungsprobleme, Bayreuther Mathematische Schriften 3, 1979.
[Alt2]
W. Alt, Lipschitzian perturbations in infinite optimization problems, in Mathematical Programming with Data Perturbations II, A.V. Fiacco, ed., Lecture Notes in Pure and Appl. Math. 85, Marcel Dekker, New York, 1983, 7–21.
[Alt3]
W. Alt, Stability of Solutions and the Lagrange-Newton Method for Nonlinear Optimization and Optimal Control Problems, Habilitation thesis, Bayreuth, 1990.
[Alt4]
W. Alt, The Lagrange-Newton method for infinite-dimensional optimization problems, Numer. Funct. Anal. Optimiz. 11(1990), 201–224.
[Alt5]
W. Alt, Sequential quadratic programming in Banach spaces, in Advances in Optimization, W. Oettli and D. Pallaschke, eds., Lecture Notes in Econom. and Math. Systems 382, Springer-Verlag, Berlin, 1992, 281–301.
[Ba]
V. Barbu, Analysis and Control of Nonlinear Infinite Dimensional Systems, Academic Press, Boston, 1993.
[BaHe]
A. Battermann and M. Heinkenschloss, Preconditioners for Karush-Kuhn-Tucker matrices arising in the optimal control of distributed systems, in Control and Estimation of Distributed Parameter Systems, Vorau (1996), Internat. Ser. Numer. Math. 126, Birkhäuser, Basel, 1998, 15–32.
[BaK1]
V. Barbu and K. Kunisch, Identification of nonlinear elliptic equations, Appl. Math. Optim. 33(1996), 139–167.
[BaK2]
V. Barbu and K. Kunisch, Identification of nonlinear parabolic equations, Control Theory Adv. Tech. 10(1995), 1959–1980.
[BaKRi]
V. Barbu, K. Kunisch, and W. Ring, Control and estimation of the boundary heat transfer function in Stefan problems, RAIRO Modél. Math. Anal. Numér. 30(1996), 1–40.
[BaPe]
V. Barbu and Th. Precupanu, Convexity and Optimization in Banach Spaces, Reidel, Dordrecht, 1986.
[BaSa]
A. Battermann and E. Sachs, An indefinite preconditioner for KKT systems arising in optimal control problems, in Fast Solution of Discretized Optimization Problems, Berlin, 2000, Internat. Ser. Numer. Math. 138, Birkhäuser, Basel, 2001, 1–18.
[Be]
D. P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods, Academic Press, Paris, 1982.
[BeK1]
M. Bergounioux and K. Kunisch, Augmented Lagrangian techniques for elliptic state constrained optimal control problems, SIAM J. Control Optim. 35(1997), 1524–1543.
[BeK2]
M. Bergounioux and K. Kunisch, Primal-dual active set strategy for state constrained optimal control problems, Comput. Optim. Appl. 22(2002), 193–224.
[BeK3]
M. Bergounioux and K. Kunisch, On the structure of the Lagrange multiplier for state-constrained optimal control problems, Systems Control Lett. 48(2002), 169–176.
[BeMeVe]
R. Becker, D. Meidner, and B. Vexler, Efficient numerical solution of parabolic optimization problems by finite element methods, Optim. Methods Softw. 22(2007), 813–833.
[BePl]
A. Berman and R. J. Plemmons, Nonnegative Matrices in the Mathematical Sciences, Computer Science and Scientific Computing Series, Academic Press, New York, 1979.
[Ber]
M. Berggren, A Unified Discrete-Continuous Sensitivity Analysis Method for Shape Optimization, Lecture at the Radon Institut, Linz, Austria, 2005.
[Bew]
T. Bewley, Flow control: New challenges for a new renaissance, Progress in Aerospace Sciences 37(2001), 24–58.
[BGHW]
L. T. Biegler, O. Ghattas, M. Heinkenschloss, and B. van Bloemen Waanders, eds., Large-Scale PDE-Constrained Optimization, Lecture Notes in Comput. Sci. Eng. 30, Springer-Verlag, Berlin, 2003.
[BGHKW]
L. T. Biegler, O. Ghattas, M. Heinkenschloss, D. Keyes, and B. van Bloemen Waanders, eds., Real-Time PDE-Constrained Optimization, SIAM, Philadelphia, 2007.
[BHHK]
M. Bergounioux, M. Haddou, M. Hintermüller, and K. Kunisch, A comparison of a Moreau–Yosida-based active set strategy and interior point methods for constrained optimal control problems, SIAM J. Optim. 11(2000), 495–521.
[BIK]
M. Bergounioux, K. Ito, and K. Kunisch, Primal-dual strategy for constrained optimal control problems, SIAM J. Control Optim. 37(1999), 1176–1194.
[BKW]
A. Borzì, K. Kunisch, and D. Y. Kwak, Accuracy and convergence properties of the finite difference multigrid solution of an optimal control optimality system, SIAM J. Control Optim. 41(2003), 1477–1497.
[BoK]
A. Borzì and K. Kunisch, The numerical solution of the steady state solid fuel ignition model and its optimal control, SIAM J. Sci. Comput. 22(2000), 263–284.
[BoKVa]
A. Borzì, K. Kunisch, and M. Vanmaele, A multigrid approach to the optimal control of solid fuel ignition problems, in Multigrid Methods VI, Lect. Notes Comput. Sci. Eng. 14, Springer-Verlag, Berlin, 59–65.
[Bre]
H. Brezis, Opérateurs Maximaux Monotones et Semi-groupes de Contractions dans les Espaces de Hilbert, North–Holland, Amsterdam, 1973.
[Bre2]
H. Brezis, Problèmes unilatéraux, J. Math. Pures Appl. 51(1972), 1–168.
[CaTr]
E. Casas and F. Tröltzsch, Second-order necessary and sufficient optimality conditions for optimization problems and applications to control theory, SIAM J. Optim. 13(2002), 406–431.
[ChK]
G. Chavent and K. Kunisch, Regularization of linear least squares problems by total bounded variation, ESAIM Control Optim. Calc. Var. 2(1997), 359–376.
[ChLi]
A. Chambolle and P.-L. Lions, Image recovery via total bounded variation minimization and related problems, Numer. Math. 76(1997), 167–188.
[ChZo]
Z. Chen and J. Zou, Finite element methods and their convergence for elliptic and parabolic interface problems, Numer. Math. 79(1998), 175–202.
[Ci]
P. G. Ciarlet, Mathematical Elasticity, Vol. 1, North–Holland, Amsterdam, 1987.
[CKP]
E. Casas, K. Kunisch, and C. Pola, Regularization by functions of bounded variation and applications to image enhancement, Appl. Math. Optim. 40(1999), 229–258.
[Cla]
F. H. Clarke, Optimization and Nonsmooth Analysis, John Wiley and Sons, New York, 1983.
[CNQ]
X. Chen, Z. Nashed, and L. Qi, Smoothing methods and semismooth methods for nondifferentiable operator equations, SIAM J. Numer. Anal. 38(2000), 1200–1216.
[CoKu]
F. Colonius and K. Kunisch, Output least squares stability for parameter estimation in two point boundary value problems, J. Reine Angew. Math. 370(1986), 1–29.
[DaLi]
R. Dautray and J. L. Lions, Mathematical Analysis and Numerical Methods for Science and Technology, Vol. 3, Springer-Verlag, Berlin, 1990.
[Dei]
K. Deimling, Nonlinear Functional Analysis, Springer-Verlag, Berlin, 1985.
[DeZo]
M. C. Delfour and J.-P. Zolésio, Shapes and Geometries: Analysis, Differential Calculus, and Optimization, SIAM, Philadelphia, 2001.
[Don]
A. L. Dontchev, Local analysis of a Newton-type method based on partial elimination, in The Mathematics of Numerical Analysis, Lectures in Appl. Math. 32, AMS, Providence, RI, 1996, 295–306.
[DoSa]
D. C. Dobson and F. Santosa, Recovery of blocky images from noisy and blurred data, SIAM J. Appl. Math. 56(1996), 1181–1198.
[dReKu]
C. de los Reyes and K. Kunisch, A comparison of algorithms for control constrained optimal control of the Burgers equation, Calcolo 41(2004), 203–225.
[EcJa]
C. Eck and J. Jarusek, Existence results for the static contact problem with Coulomb friction, Math. Models Methods Appl. Sci. 8(1998), 445–468.
[EkTe]
I. Ekeland and R. Temam, Convex Analysis and Variational Problems, North–Holland, Amsterdam, 1976.
[EkTu]
I. Ekeland and T. Turnbull, Infinite Dimensional Optimization and Convexity, The University of Chicago Press, Chicago, 1983.
[Fat]
H. O. Fattorini, Infinite Dimensional Optimization and Control Theory, Cambridge University Press, Cambridge, 1999.
[FGH]
A. V. Fursikov, M. D. Gunzburger, and L. S. Hou, Boundary value problems and optimal boundary control for the Navier–Stokes systems: The two-dimensional case, SIAM J. Control Optim. 36(1998), 852–894.
[FoGl]
M. Fortin and R. Glowinski, Augmented Lagrangian Methods: Applications to Numerical Solutions of Boundary Value Problems, North–Holland, Amsterdam, 1983.
[Fr]
A. Friedman, Variational Principles and Free Boundary Value Problems, John Wiley and Sons, New York, 1982.
[Geo]
V. Georgescu, On the unique continuation property for Schrödinger Hamiltonians, Helv. Phys. Acta 52(1979), 655–670.
[GeYa]
D. Geman and C. Yang, Nonlinear image recovery with half-quadratic regularization, IEEE Trans. Image Process. 4(1995), 932–945.
[GHS]
M. D. Gunzburger, L. Hou, and T. P. Svobodny, Finite element approximations of optimal control problems associated with the scalar Ginzburg-Landau equation, Comput. Math. Appl. 21(1991), 123–131.
[GiRa]
V. Girault and P.-A. Raviart, Finite Element Methods for Navier–Stokes Equations, Springer-Verlag, Berlin, 1986.
[Giu]
E. Giusti, Minimal Surfaces and Functions of Bounded Variation, Birkhäuser, Boston, 1984.
[Glo]
R. Glowinski, Numerical Methods for Nonlinear Variational Problems, Springer-Verlag, Berlin, 1984.
[GLT]
R. Glowinski, J. L. Lions, and R. Trémolières, Numerical Analysis of Variational Inequalities, North–Holland, Amsterdam, 1981.
[Gri]
P. Grisvard, Elliptic Problems in Nonsmooth Domains, Monographs Stud. Math. 24, Pitman, Boston, MA, 1985.
[Grie]
A. Griewank, The local convergence of Broyden-like methods on Lipschitzian problems in Hilbert spaces, SIAM J. Numer. Anal. 24(1987), 684–705.
[GrVo]
R. Griesse and S. Volkwein, A primal-dual active set strategy for optimal boundary control of a nonlinear reaction-diffusion system, SIAM J. Control Optim. 44(2005), 467–494.
[Gu]
M. D. Gunzburger, Perspectives in Flow Control and Optimization, SIAM, Philadelphia, 2003.
[Han]
S.-P. Han, Superlinearly convergent variable metric algorithms for general nonlinear programming problems, Math. Programming 11(1976), 263–282.
[HaMä]
J. Haslinger and R. A. E. Mäkinen, Introduction to Shape Optimization, SIAM, Philadelphia, 2003.
[HaNe]
J. Haslinger and P. Neittaanmäki, Finite Element Approximation for Optimal Shape, Material and Topology Design, 2nd ed., Wiley, Chichester, 1996.
[HaPaRa]
S.-P. Han, J.-S. Pang, and N. Rangaraj, Globally convergent Newton methods for nonsmooth equations, Math. Oper. Res. 17(1992), 586–607.
[Har]
A. Haraux, How to differentiate the projection on a closed convex set in Hilbert space. Some applications to variational inequalities, J. Math. Soc. Japan 29(1977), 615–631.
[Has]
J. Haslinger, Approximation of the Signorini problem with friction, obeying the Coulomb law, Math. Methods Appl. Sci. 5(1983), 422–437.
[Hei]
M. Heinkenschloss, Projected sequential quadratic programming methods, SIAM J. Optim. 6(1996), 373–417.
[Hes1]
M. R. Hestenes, Optimization Theory. The Finite Dimensional Case, John Wiley and Sons, New York, 1975.
[Hes2]
M. R. Hestenes, Multiplier and gradient methods, J. Optim. Theory Appl. 4(1968), 303–320.
[HHNL]
I. Hlavacek, J. Haslinger, J. Necas, and J. Lovisek, Solution of Variational Inequalities in Mechanics, Appl. Math. Sci. 66, Springer-Verlag, New York, 1988.
[HiIK]
M. Hintermüller, K. Ito, and K. Kunisch, The primal-dual active set strategy as a semismooth Newton method, SIAM J. Optim. 13(2002), 865–888.
[HiK1]
M. Hinze and K. Kunisch, Three control methods for time-dependent fluid flow, Flow Turbul. Combust. 65(2000), 273–298.
[HiK2]
M. Hinze and K. Kunisch, Second order methods for optimal control of time-dependent fluid flow, SIAM J. Control Optim. 40(2001), 925–946.
[HinK1]
M. Hintermüller and K. Kunisch, Total bounded variation regularization as a bilaterally constrained optimization problem, SIAM J. Appl. Math. 64(2004), 1311–1333.
[HinK2]
M. Hintermüller and K. Kunisch, Feasible and noninterior path-following in constrained minimization with low multiplier regularity, SIAM J. Control Optim. 45(2006), 1198–1221.
[HPR]
S.-P. Han, J.-S. Pang, and N. Rangaraj, Globally convergent Newton methods for nonsmooth equations, Math. Oper. Res. 17(1992), 586–607.
[HuStWo]
S. Hüeber, G. Stadler, and B. I. Wohlmuth, A primal-dual active set algorithm for three-dimensional contact problems with Coulomb friction, SIAM J. Sci. Comput. 30(2008), 572–596.
[IK1]
K. Ito and K. Kunisch, The augmented Lagrangian method for equality and inequality constraints in Hilbert space, Math. Programming 46(1990), 341–360.
[IK2]
K. Ito and K. Kunisch, The augmented Lagrangian method for parameter estimation in elliptic systems, SIAM J. Control Optim. 28(1990), 113–136.
[IK3]
K. Ito and K. Kunisch, The augmented Lagrangian method for estimating the diffusion coefficient in an elliptic equation, in Proceedings of the 26th IEEE Conference on Decision and Control, Los Angeles, 1987, 1400–1404.
[IK4]
K. Ito and K. Kunisch, An augmented Lagrangian technique for variational inequalities, Appl. Math. Optim. 21(1990), 223–241.
[IK5]
K. Ito and K. Kunisch, Sensitivity analysis of solutions to optimization problems in Hilbert spaces with applications to optimal control and estimation, J. Differential Equations 99(1992), 1–40.
[IK6]
K. Ito and K. Kunisch, On the choice of the regularization parameter in nonlinear inverse problems, SIAM J. Optim. 2(1992), 376–404.
[IK7]
K. Ito and K. Kunisch, Sensitivity measures for the estimation of parameters in 1-D elliptic boundary value problems, J. Math. Systems Estim., Control 6(1996), 195–218.
[IK8]
K. Ito and K. Kunisch, Maximizing robustness in nonlinear illposed inverse problems, SIAM J. Control Optim. 33(1995), 643–666.
[IK9]
K. Ito and K. Kunisch, Augmented Lagrangian-SQP-methods in Hilbert spaces and application to control in the coefficients problems, SIAM J. Optim. 6(1996), 96–125.
[IK10]
K. Ito and K. Kunisch, Augmented Lagrangian-SQP methods for nonlinear optimal control problems of tracking type, SIAM J. Control Optim. 34(1996), 874–891.
[IK11]
K. Ito and K. Kunisch, Augmented Lagrangian methods for nonsmooth convex optimization in Hilbert spaces, Nonlinear Anal. 41(2000), 573–589.
[IK12]
K. Ito and K. Kunisch, An active set strategy based on the augmented Lagrangian formulation for image restoration, M2AN Math. Model. Numer. Anal. 33(1999), 1–21.
[IK13]
K. Ito and K. Kunisch, Augmented Lagrangian formulation of nonsmooth, convex optimization in Hilbert spaces, in Control of Partial Differential Equations, E. Casas, ed., Lecture Notes in Pure and Appl. Math. 174, Marcel Dekker, New York, 1995, 107–117.
[IK14]
K. Ito and K. Kunisch, Estimation of the convection coefficient in elliptic equations, Inverse Problems 13(1997), 995–1013.
[IK15]
K. Ito and K. Kunisch, Newton’s method for a class of weakly singular optimal control problems, SIAM J. Optim. 10(1999), 896–916.
[IK16]
K. Ito and K. Kunisch, Optimal control of elliptic variational inequalities, Appl. Math. Optim. 41(2000), 343–364.
[IK17]
K. Ito and K. Kunisch, Optimal control of the solid fuel ignition model with $H^1$-cost, SIAM J. Control Optim. 40(2002), 1455–1472.
[IK18]
K. Ito and K. Kunisch, BV-type regularization methods for convoluted objects with edge, flat and grey scales, Inverse Problems 16(2000), 909–928.
[IK19]
K. Ito and K. Kunisch, Optimal control, in Encyclopedia of Electrical and Electronic Engineering, J. G. Webster, ed., 15, John Wiley and Sons, New York, 1999, 364–379.
[IK20]
K. Ito and K. Kunisch, Semi-smooth Newton methods for variational inequalities of the first kind, M2AN Math. Model. Numer. Anal. 37(2003), 41–62.
[IK21]
K. Ito and K. Kunisch, The primal-dual active set method for nonlinear optimal control problems with bilateral constraints, SIAM J. Control Optim. 43(2004), 357–376.
[IK22]
K. Ito and K. Kunisch, Semi-smooth Newton methods for state-constrained optimal control problems, Systems Control Lett. 50(2003), 221–228.
[IK23]
K. Ito and K. Kunisch, Parabolic variational inequalities: The Lagrange multiplier approach, J. Math. Pures Appl. (9) 85(2006), 415–449.
[IKK]
K. Ito, M. Kroller, and K. Kunisch, A numerical study of an augmented Lagrangian method for the estimation of parameters in elliptic systems, SIAM J. Sci. Statist. Comput. 12(1991), 884–910.
[IKP]
K. Ito, K. Kunisch, and G. Peichl, Variational approach to shape derivatives for a class of Bernoulli problems, J. Math. Anal. Appl. 314(2006), 126–149.
[IKPe]
K. Ito, K. Kunisch, and G. Peichl, On the regularization and the numerical treatment of the inf-sup condition for saddle point problems, Comput. Appl. Math. 21(2002), 245–274.
[IKPe2]
K. Ito, K. Kunisch, and G. Peichl, A variational approach to shape derivatives, to appear in ESAIM Control Optim. Calc. Var.
[IKSG]
K. Ito, K. Kunisch, V. Schulz, and I. Gherman, Approximate Nullspace Iterations for KKT Systems in Model Based Optimization, preprint, University of Trier, 2008.
[ItKa]
K. Ito and F. Kappel, Evolution Equations and Approximations, World Scientific, River Edge, NJ, 2002.
[Ja]
J. Jahn, Introduction to the Theory of Nonlinear Optimization, Springer-Verlag, Berlin, 1994.
[JaSa]
H. Jäger and E. W. Sachs, Global convergence of inexact reduced SQP methods, Optim. Methods Softw. 7(1997), 83–110.
[Ka]
T. Kato, Perturbation Theory for Linear Operators, Springer-Verlag, Berlin, 1980.
[KaK]
A. Kauffmann and K. Kunisch, Optimal control of the solid fuel ignition model, ESAIM Proc. 8(2000), 65–76.
[Kan]
C. Kanzow, Inexact semismooth Newton methods for large-scale complementarity problems, Optim. Methods Softw. 19(2004), 309–325.
[KeSa]
C. T. Kelley and E. W. Sachs, Solution of optimal control problems by a pointwise projected Newton method, SIAM J. Control Optim. 33(1995), 1731–1757.
[KO]
N. Kikuchi and J. T. Oden, Contact Problems in Elasticity: A Study of Variational Inequalities and Finite Element Methods, SIAM Stud. Appl. Math. 8, SIAM, Philadelphia, 1988.
[Kou]
S. G. Kou, A jump-diffusion model for option pricing, Management Sci. 48(2002), 1086–1101.
[KPa]
K. Kunisch and X. Pan, Estimation of interfaces from boundary measurements, SIAM J. Control Optim. 32(1994), 1643–1674.
[KPe]
K. Kunisch and G. Peichl, Estimation of a temporally and spatially varying diffusion coefficient in a parabolic system by an augmented Lagrangian technique, Numer. Math. 59(1991), 473–509.
[KRe]
K. Kunisch and F. Rendl, An infeasible active set method for quadratic problems with simple bounds, SIAM J. Optim. 14(2003), 35–52.
[KRo]
K. Kunisch and A. Rösch, Primal-dual active set strategy for a general class of optimal control problems, to appear in SIAM J. Optim.
[KSa]
K. Kunisch and E. W. Sachs, Reduced SQP methods for parameter identification problems, SIAM J. Numer. Anal. 29(1992), 1793–1820.
[Kup]
F.-S. Kupfer, An infinite-dimensional convergence theory for reduced SQP methods in Hilbert space, SIAM J. Optim. 6(1996), 126–163.
[KuSa]
F.-S. Kupfer and E. W. Sachs, Numerical solution of a nonlinear parabolic control problem by a reduced SQP method, Comput. Optim. Appl. 1(1992), 113–135.
[KuSt]
K. Kunisch and G. Stadler, Generalized Newton methods for the 2D-Signorini contact problem with friction in function space, M2AN Math. Model. Numer. Anal. 39(2005), 827–854.
[KuTa]
K. Kunisch and X.-C. Tai, Sequential and parallel splitting methods for bilinear control problems in Hilbert spaces, SIAM J. Numer. Anal. 34(1997), 91–118.
[KVo1]
K. Kunisch and S. Volkwein, Augmented Lagrangian-SQP techniques and their approximation, in Optimization Methods in Partial Differential Equations, Contemp. Math. 209, AMS, Providence, RI, 1997, 147–160.
[KVo2]
K. Kunisch and S. Volkwein, Galerkin proper orthogonal decomposition methods for parabolic systems, Numer. Math. 90(2001), 117–148.
[KVo3]
K. Kunisch and S. Volkwein, Galerkin proper orthogonal decomposition methods for a general equation in fluid dynamics, SIAM J. Numer. Anal. 40(2002), 492–515.
[LeMa]
F. Lempio and H. Maurer, Differential stability in infinite-dimensional nonlinear programming, Appl. Math. Optim. 6(1980), 139–152.
[LiMa]
J.-L. Lions and E. Magenes, Non-Homogeneous Boundary Value Problems and Applications, Springer-Verlag, Berlin, 1972.
[Lio1]
J.-L. Lions, Control of Distributed Singular Systems, Gauthier-Villars, Paris, 1985.
[Lio2]
J.-L. Lions, Optimal Control of Systems Governed by Partial Differential Equations, Springer-Verlag, Berlin, 1971.
[Lio3]
J.-L. Lions, Quelques Méthodes de Résolution des Problèmes aux Limites Non Linéaires, Dunod, Paris, 1969.
[MaZo]
H. Maurer and J. Zowe, First and second-order necessary and sufficient optimality conditions for infinite-dimensional programming problems, Math. Programming 16(1979), 98–110.
[Mif]
R. Mifflin, Semismooth and semiconvex functions in constrained optimization, SIAM J. Control Optim. 15(1977), 959–972.
[MuSi]
F. Murat and J. Simon, Sur le contrôle par un domaine géométrique, Rapport 76015, Université Pierre et Marie Curie, Paris, 1976.
[Pan1]
J. S. Pang, Newton’s method for B-differentiable equations, Math. Oper. Res. 15(1990), 311–341.
[Pan2]
J. S. Pang, A B-differentiable equation-based, globally and locally quadratically convergent algorithm for nonlinear programs, complementarity and variational inequality problems, Math. Programming 51(1991), 101–131.
[Paz]
A. Pazy, Semi-groups of nonlinear contractions and their asymptotic behavior, in Nonlinear Analysis and Mechanics: Heriot Watt Symposium, Vol III, R. J. Knops, ed., Res. Notes in Math. 30, Pitman, Boston, MA, 1979, 36–134.
[PoTi]
E. Polak and A. L. Tits, A globally convergent implementable multiplier method with automatic limitation, Appl. Math. Optim. 6(1980), 335–360.
[PoTr]
V. T. Polak and N. Y. Tret’yakov, The method of penalty estimates for conditional extremum problems, Žurnal Vyčislitel’noĭ Matematiki i Matematičeskoĭ Fiziki 13(1973), 34–46.
[Pow]
M. J. D. Powell, A method for nonlinear constraints in minimization problems, in Optimization, R. Fletcher, ed., Academic Press, New York, 1968, 283–298.
[Pow1]
M. J. D. Powell, The convergence of variable metric methods for nonlinearly constrained optimization problems, Nonlinear Programming 3, O. L. Mangasarian, ed., Academic Press, New York, 1987, 27–63.
[Qi]
L. Qi, Convergence analysis of some algorithms for solving nonsmooth equations, Math. Oper. Res. 18(1993), 227–244.
[QiSu]
L. Qi and J. Sun, A nonsmooth version of Newton’s method, Math. Programming 58(1993), 353–367.
[Rao]
M. Raous, Quasistatic Signorini problem with Coulomb friction and coupling to adhesion, in New Developments in Contact Problems, P. Wriggers and Panagiotopoulos, eds., CISM Courses and Lectures 384, Springer-Verlag, New York, 1999, 101–178.
[Ro1]
S. M. Robinson, Regularity and stability for convex multivalued functions, Math. Oper. Res. 1(1976), 130–143.
[Ro2]
S. M. Robinson, Stability theory for systems of inequalities, Part II: Differentiable nonlinear systems, SIAM J. Numer. Anal. 13(1976), 497–513.
[Ro3]
S. M. Robinson, Strongly regular generalized equations, Math. Oper. Res. 5(1980), 43–62.
[Ro4]
S. M. Robinson, Local structure of feasible sets in nonlinear programming, Part III: Stability and sensitivity, Math. Programming Stud. 30(1987), 45–66.
[Roc1]
R. T. Rockafellar, Local boundedness of nonlinear monotone operators, Michigan Math. J. 16(1969), 397–407.
[Roc2]
R. T. Rockafellar, The multiplier method of Hestenes and Powell applied to convex programming, J. Optim. Theory Appl. 12(1973), 34–46.
[ROF]
L. I. Rudin, S. Osher, and E. Fatemi, Nonlinear total variation based noise removal algorithms, Phys. D 60(1992), 259–268.
[RoTr]
A. Rösch and F. Tröltzsch, On regularity of solutions and Lagrange multipliers of optimal control problems for semilinear equations with mixed pointwise control-state constraints, SIAM J. Control Optim. 46(2007), 1098–1115.
[Sa]
E. Sachs, Broyden’s method in Hilbert space, Math. Programming 35(1986), 71–82.
[Sey]
R. Seydel, Tools for Computational Finance, Springer-Verlag, Berlin, 2002.
[Sha]
A. Shapiro, On concepts of directional differentiability, J. Optim. Theory Appl. 13(1999), 477–487.
[SoZo]
J. Sokolowski and J. P. Zolesio, Introduction to Shape Optimization, Springer-Verlag, Berlin, 1991.
[Sta1]
G. Stadler, Infinite-Dimensional Semi-Smooth Newton and Augmented Lagrangian Methods for Friction and Contact Problems in Elasticity, Ph.D. thesis, University of Graz, 2004.
[Sta2]
G. Stadler, Semismooth Newton and augmented Lagrangian methods for a simplified friction problem, SIAM J. Optim. 15(2004), 39–62.
[Sto]
J. Stoer, Principles of sequential quadratic programming methods for solving nonlinear programs, in Computational Mathematical Programming, K. Schittkowski, ed., NATO Adv. Sci. Inst. Ser. F Comput. Systems Sci. 15, Springer-Verlag, Berlin, 1985, 165–207.
[StTa]
J. Stoer and R. A. Tapia, The Local Convergence of Sequential Quadratic Programming Methods, Technical Report 87–04, Rice University, 1987.
[Tan]
H. Tanabe, Equations of Evolution, Pitman, London, 1979.
[Te]
R. Temam, Navier–Stokes Equations: Theory and Numerical Analysis, North–Holland, Amsterdam, 1979.
[Tem]
R. Temam, Infinite-Dimensional Dynamical Systems in Mechanics and Physics, Springer-Verlag, Berlin, 1988.
[Tin]
M. Tinkham, Introduction to Superconductivity, McGraw–Hill, New York, 1975.
[Tr]
G. M. Troianiello, Elliptic Differential Equations and Obstacle Problems, Plenum Press, New York, 1987.
[Tro]
F. Troeltzsch, An SQP method for the optimal control of a nonlinear heat equation, Control Cybernet. 23(1994), 267–288.
[Ulb]
M. Ulbrich, Semismooth Newton methods for operator equations in function spaces, SIAM J. Optim. 13(2003), 805–841.
[Vog]
C. R. Vogel, Computational Methods for Inverse Problems, Frontiers Appl. Math. 23, SIAM, Philadelphia, 2002.
[Vol]
S. Volkwein, Mesh-independence for an augmented Lagrangian-SQP method in Hilbert spaces, SIAM J. Control Optim. 38(2000), 767–785.
[We]
J. Werner, Optimization—Theory and Applications, Vieweg, Braunschweig, 1984.
[Zar]
E. H. Zarantonello, Projection on convex sets in Hilbert space and spectral theory, in Contributions to Nonlinear Functional Analysis, E. H. Zarantonello, ed., Academic Press, New York, 1971, 237–424.
[Zo]
J. P. Zolesio, The material derivative (or speed) method for shape optimization, in Optimization of Distributed Parameter Structures, Vol II, E. Haug and J. Cea, eds., Sijthoff & Noordhoff, Alphen aan den Rijn, 1981, 1089–1151.
[ZoKu]
J. Zowe and S. Kurcyusz, Regularity and stability for the mathematical programming problem in Banach spaces, Appl. Math. Optim. 5(1979), 49–62.
Index
adapted penalty, 23
adjoint equation, 18
American options, 277
augmentability, 65, 68, 73
augmented Lagrange functional, 76
augmented Lagrangian, 65, 263
augmented Lagrangian-SQP method, 155
Babuška–Brezzi condition, 184
Bernoulli problem, 322
Bertsekas penalty functional, 73
BFGS-update formula, 141
biconjugate functional, 93
bilateral constraints, 202
Bingham flow, 120
Black–Scholes model, 277
Bouligand differentiability, 217, 234
Bouligand direction, 226
box constraints, 225
Bratu problem, 13, 152
BV-regularization, 254
BV-seminorm, 87, 254
Clarke directional derivative, 222
coercive, 71
complementarity condition, 88, 113
complementarity problem, 51, 215, 240
conical hull, 1
conjugate functional, 92, 253
constrained optimal control, 190
contact problem, 263
convection estimation, 14, 153
convex functional, 89
Coulomb friction, 263
descent direction, 225
directional differentiability, 58, 217, 234
dual cone, 1, 27
dual functional, 76, 115
dual problem, 77, 99
duality pairing, 1
effective domain, 89
Eidelheit separation theorem, 3
elastoplastic problem, 122
epigraph, 89
equality constraints, 3, 8
equality-constrained problem, 7
extremality condition, 253
feasibility step, 149
Fenchel duality theorem, 253
first order augmented Lagrangian, 67, 75
first order necessary optimality condition, 156
friction problem, 125, 263
Gâteaux differentiable, 95
Gauss–Newton algorithm, 228
Gel’fand triple, 246
generalized equation, 31, 33
generalized implicit function theorem, 31
generalized inverse function theorem, 35
globalization strategy, 222
Hessian, 29
image restoration, 121
implicit function theorem, 31
indicator function, 28, 92
inequality constraints, 6
inequality-constrained problem, 7
inverse function theorem, 35
inverse interface problem, 316
Karush–Kuhn–Tucker condition, 6
$L^1$-fitting, 126
Lagrange functional, 28, 66
Lagrange multiplier, 2, 28, 277
Lagrangian method, 75
least squares, 10
linear elasticity, 263
linearizing cone, 2
lower semicontinuous, 89
M-matrix, 194, 196
Mangasarian–Fromowitz constraint qualification, 6
material derivative, 306
Maurer–Zowe optimality condition, 42
maximal monotone operator, 104
mesh-independence, 183
method of mapping, 306
metric projection, 57
monotone operator, 104
Navier–Stokes control, 143, 323
Newton differentiable, 234
Newton method, 129, 133
nonlinear complementarity problem, 231, 243
nonlinear elliptic optimal control problem, 174
normal cone, 28
null-space representation, 138
obstacle problem, 8, 122, 247
optimal boundary control, 20
optimal control, 62, 126, 172
optimal distributed control, 19
optimality system, 18
P-matrix, 194
parabolic variational inequality, 277
parameter estimation, 82
parametric mathematical programming, 27
    Hölder continuity of solutions, 45
    Lipschitz continuity of solutions, 46
    Lipschitz continuity of value function, 39
    sensitivity of value function, 55
partial semismooth Newton iteration, 244
penalty method, 75
penalty techniques, 24
polar cone, 1, 69
polyhedric, 54, 57
preconditioning, 174
primal equation, 18
primal-dual active set strategy, 189, 202, 240
proper functional, 89
quasi-directional derivative, 225
reduced formulation, xiv
reduced SQP method, 139
regular point, 28, 166
regular point condition, 5
restriction operator, 183
saddle point, 103
second order augmented Lagrangian, 155
second order sufficient optimality condition, 156
semismooth function, 216
semismooth Newton method, 192, 215, 241
sensitivity equation, 58
sequential tangent cone, 2
shape calculus, 305
shape derivative, 305, 306, 308
shape optimization, 305
Signorini problem, 124, 263
Slater condition, 6
SQP (sequential quadratic programming), 129, 137
stability analysis, 34
state-constrained optimal control, 191, 247
stationary point, 66
strict complementarity, 65, 67, 76
strong regularity, 31
strong regularity condition, 47
subdifferential, 28, 95
sufficient optimality, 67
sufficient optimality condition, 131
superlinear convergence, 141, 221
Tresca friction, 263
Uzawa method, 118
value function, 39, 99
variational inequality, 246
weakly lower semicontinuous, 89
weakly singular problem, 148
Yosida–Moreau approximation, 87, 248