Lecture Notes in Control and Information Sciences Editor: M. Thoma
252
Springer London Berlin Heidelberg New York Barcelona Hong Kong Milan Paris Santa Clara Singapore Tokyo
Murti V. Salapaka and Mohammed Dahleh
Multiple Objective Control Synthesis With 17 Figures
Springer
Series Advisory Board
A. Bensoussan • M.J. Grimble • P. Kokotovic • A.B. Kurzhanski • H. Kwakernaak • J.L. Massey • M. Morari

Authors
Murti V. Salapaka, PhD
Department of Electrical Engineering, Iowa State University, Ames, Iowa 50011, USA
Mohammed Dahleh, PhD
Department of Mechanical Engineering, University of California, Santa Barbara, CA 93106, USA
ISBN 1-85233-256-5 Springer-Verlag London Berlin Heidelberg

British Library Cataloguing in Publication Data
Salapaka, Murti V.
Multiple objective control synthesis. - (Lecture notes in control and information sciences ; 252)
1. Automatic control - Mathematical models
I. Title II. Dahleh, Mohammed
629.8'312
ISBN 1852332565

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms of licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers.

© Springer-Verlag London Limited 2000
Printed in Great Britain

The use of registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant laws and regulations and therefore free for general use. The publisher makes no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that may be made.

Typesetting: Camera ready by authors
Printed and bound at the Athenaeum Press Ltd., Gateshead, Tyne & Wear
Printed on acid-free paper
SPIN 10746705
To my parents: Shri. S. Prasada Rao and Smt. S. Subhadra (Murti V. Salapaka)
To my wife: Marie Dahleh (Mohammed Dahleh)
Preface
Many control design tasks which arise from engineering objectives can be effectively addressed by an equivalent convex optimization problem. The significance of this step stems from the efficient computational tools that are available for solving such problems. However, in most cases the resulting convex optimization problem is infinite dimensional. Thus effective finite dimensional convex approximations are needed to complete the control design task. Researchers have employed advanced mathematical tools to exploit and expose the structure of the resulting convex optimization problem, with the objective of obtaining computable methods for constructing the controller. One of the striking insights obtained with such tools was that certain optimal control problems are equivalent to finite dimensional programming problems; thus seemingly infinite dimensional problems can be converted to finite dimensional ones. Even when the convex optimization problem is truly infinite dimensional, or when a finite dimensional characterization has not been established, the philosophy that has emerged is to establish finite dimensional approximations to the infinite dimensional problem which guarantee the optimal performance within any prespecified tolerance. Here too researchers have borrowed advanced mathematical tools to exploit the underlying structure of the problem.

One of the difficulties faced by a researcher in this area is the lack of a source where a comprehensive treatment of the tools employed is given. In this book we attempt to fill this gap by developing various topological and functional analytic tools that are commonly used in the formulation and solution of a class of optimal control problems.

Efficient design techniques for multi-objective controllers are necessary because often a single measure fails to capture the design performance objective. The standard ℋ₂, ℋ∞ and ℓ₁ designs are incapable of handling such multi-objective concerns because they optimize a single measure, which is no guarantee of performance with respect to some other measure. An important subclass of multi-objective problems is the class of problems of synthesizing optimal controllers which guarantee performance with respect to both the ℋ₂ measure and relevant time domain measures. The ℋ₂/ℓ₁ problem is an example which falls in this class of multi-objective problems, where the objective is the design of controllers which optimally reject white noise while guaranteeing stability margins with respect to uncertainty. We apply the developed mathematical tools to such problems, where the ℋ₂ measure and time domain measures on the performance of the closed loop system can be incorporated in a natural manner.
Organization of the Book

This book can be divided into two parts. The first part consists of Chapters 1, 2 and 3, where the mathematical machinery is developed. In the second part (Chapters 4, 5, 6, 7, 8 and 9) various control design problems are formulated and solved.

In Chapter 1 we introduce the basic topological concepts. The importance of continuity and compactness with regard to optimization is established. We take a top-down approach where the spaces described have more and more structure as one proceeds through the chapter. In Chapter 2 functions on vector spaces and weak topologies are studied. Motivation for why weak topologies are important is elucidated. The Banach-Alaoglu result on compactness of bounded sets in weak topologies is proven. Important results on sublinear functions are given which prove to be instrumental in studying convex sets and functions. The chapter on convex analysis is the culmination of the mathematical treatment, where we establish the Kuhn-Tucker-Lagrange duality result.

Chapter 4 develops a paradigm in which control design objectives can be stated precisely and in an effective manner. The Youla parameterization of all closed-loop maps achievable via stabilizing controllers is developed. Chapters 5 and 6 study single-input single-output systems: in Chapter 5 the ℓ₁ norm of the closed loop is minimized while keeping its ℋ₂ norm below a prespecified value; in Chapter 6 a weighted combination of the ℋ₂ norm of the closed loop and various other relevant time domain measures is minimized over all stabilizing controllers. Exact solutions to the problems formulated are given and continuity of the solutions with respect to changes in parameters is established. Even though these problems address single-input single-output systems, they serve to highlight the nature of mixed objective problems involving the two norm of the closed loop and time domain measures. In Chapter 7 the square case of the ℋ₂-ℓ₁ problem is studied. It is shown that the problem is equivalent to a single finite dimensional quadratic programming problem. In Chapter 8 the interplay of the ℋ₂ and the ℓ₁ norms of the closed loop in the general multiple-input multiple-output setting is studied. It is shown that controllers can be designed to achieve performance within any given tolerance of the optimal performance via finite dimensional quadratic programming. The design methodology avoids many problems associated with zero-interpolation based methods. Chapter 9 tackles a non-convex problem where the ℋ₂ norm of the closed loop is minimized while guaranteeing a specified level of ℓ₁ performance for a collection of plants in a certain class. It is shown that this robust performance problem can be solved via a simplex-like procedure.
Acknowledgements

We would like to thank the students at the University of California at Santa Barbara who made valuable suggestions at various stages of the book. In particular we would like to thank Srinivasa Salapaka, who proofread the first four chapters of the book. We would like to acknowledge Petar Kokotovic for the encouragement he provided in publishing this book. The methodology on multiple objective problems in the book was largely shaped in collaboration with Petros Voulgaris. The results presented in Chapter 8 were obtained in collaboration with Mustafa Khammash. The results presented in Chapter 9 were obtained in collaboration with Antonio Vicino and Alberto Tesi. We would like to acknowledge the support of NSF and AFOSR during the period in which this manuscript was written.
Notation
{}            The empty set.
(X, τ)        The set X endowed with the topology τ.
(X, d)        The set X endowed with the metric d.
(X, ‖·‖)      The set X endowed with the norm ‖·‖.
R             The real number system.
Rⁿ            The n dimensional Euclidean space.
|x|ₚ          The p-norm of the vector x ∈ Rⁿ, defined as |x|ₚ := (Σ_{i=1}^{n} |xᵢ|ᵖ)^{1/p}.
|x|₁          The 1-norm of the vector x ∈ Rⁿ.
|x|₂          The 2-norm of the vector x ∈ Rⁿ.
x̂(λ)          The λ transform of a right sided real sequence x = (x(k))_{k=0}^{∞}, defined as x̂(λ) := Σ_{k=0}^{∞} x(k) λᵏ.
ℓ             The vector space of sequences.
ℓ^{m×n}       The vector space of matrix sequences of size m × n.
ℓ₁            The Banach space of right sided absolutely summable real sequences with the norm ‖x‖₁ := Σ_{k=0}^{∞} |x(k)|.
ℓ₁^{m×n}      The Banach space of matrix valued right sided real sequences with the norm ‖x‖₁ := max_{1≤i≤m} Σ_{j=1}^{n} ‖x_ij‖₁, where x ∈ ℓ₁^{m×n} is the matrix (x_ij) and each x_ij is in ℓ₁.
ℓ∞            The Banach space of right sided, bounded sequences with the norm ‖x‖∞ := sup_k |x(k)|.
ℓ∞^{m×n}      The Banach space of matrix valued right sided real sequences with the norm ‖x‖∞ := Σ_{i=1}^{m} max_{1≤j≤n} ‖x_ij‖∞.
c₀            The Banach subspace of ℓ∞ with elements x which satisfy lim_{k→∞} x(k) = 0.
c₀^{m×n}      The Banach subspace of ℓ∞^{m×n} with elements x which satisfy lim_{k→∞} x(k) = 0.
ℓ₂            The Banach space of right sided square summable sequences with the norm ‖x‖₂ := (Σ_{k=0}^{∞} x(k)²)^{1/2}.
ℓ₂^{m×n}      The Banach space of matrix valued right sided real sequences with the norm ‖x‖₂ := (Σ_{i=1}^{m} Σ_{j=1}^{n} ‖x_ij‖₂²)^{1/2}, where x ∈ ℓ₂^{m×n} is the matrix (x_ij) and each x_ij is in ℓ₂.
ℋ₂            The isometric isomorphic image of ℓ₂ under the transform x̂(λ), with the norm given by ‖x̂(λ)‖₂ = ‖x‖₂.
X*            The dual space of the Banach space X.
⟨x, x*⟩       The value of the bounded linear functional x* at x ∈ X.
W(X*, X)      The weak star topology on X* induced by X.
T*            The adjoint operator of T : X → Y, which maps Y* to X*.
int(A)        The interior of the set A.
𝒟             The closed unit disc in the complex plane.
A′            The transpose of the matrix A.
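The sequence-space norms listed above are straightforward to evaluate for finitely supported sequences. The following Python sketch (added here purely for illustration; the sequence data and dimensions are made up) computes ‖x‖₁ = max_i Σ_j ‖x_ij‖₁ and ‖x‖∞ = Σ_i max_j ‖x_ij‖∞ for a small matrix-valued sequence, following the definitions as stated above.

# Illustrative sketch: l1 and l-infinity norms of a finitely supported
# matrix-valued sequence x, stored as a list of m x n matrices
# (x[k][i][j] is the (i, j) entry of x at time k).

def l1_norm(x):
    # ||x||_1 = max_i sum_j ||x_ij||_1, with ||x_ij||_1 = sum_k |x_ij(k)|
    m, n = len(x[0]), len(x[0][0])
    return max(
        sum(sum(abs(x[k][i][j]) for k in range(len(x))) for j in range(n))
        for i in range(m)
    )

def linf_norm(x):
    # ||x||_inf = sum_i max_j ||x_ij||_inf, with ||x_ij||_inf = sup_k |x_ij(k)|
    m, n = len(x[0]), len(x[0][0])
    return sum(
        max(max(abs(x[k][i][j]) for k in range(len(x))) for j in range(n))
        for i in range(m)
    )

# A 2 x 2 sequence supported on k = 0, 1.
x = [[[1.0, -2.0], [0.5, 0.0]],   # k = 0
     [[0.0,  1.0], [-1.5, 2.0]]]  # k = 1
print(l1_norm(x), linf_norm(x))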
Contents
1. Topology
   1.1 Sets
   1.2 General Topology
   1.3 Metric Topology
   1.4 Normed Linear (Vector) Spaces
   1.5 Finite Dimensional Spaces
   1.6 Extrema of Real Valued Functions

2. Functions on Vector Spaces
   2.1 Sublinear Functions
   2.2 Dual Spaces
   2.3 Weak Topologies
   2.4 ℓp Spaces

3. Convex Analysis
   3.1 Convex Sets and Convex Maps
   3.2 Separation of Disjoint Convex Sets
   3.3 Convex Optimization
       3.3.1 Minimum Distance to a Convex Set
       3.3.2 Kuhn-Tucker Theorem

4. Paradigm for Control Design
   4.1 Notation and Preliminaries
   4.2 Interconnection of Systems
       4.2.1 Interconnection of FDLTIC Systems

5. SISO ℓ₁/ℋ₂ Problem
   5.1 Problem Formulation
   5.2 Optimal Solutions and their Properties
       5.2.1 Existence of a Solution
       5.2.2 Structure of Optimal Solutions
       5.2.3 An A Priori Bound on the Length of Any Optimal Solution
   5.3 Uniqueness and Continuity of the Solution
       5.3.1 Uniqueness of the Optimal Solution
       5.3.2 Continuity of the Optimal Solution
   5.4 An Example
   5.5 Summary

6. A Composite Performance Measure
   6.1 Problem Formulation
       6.1.1 Relation to Pareto Optimality
   6.2 Properties of the Optimal Solution
       6.2.1 Existence of a Solution
       6.2.2 Structure of Optimal Solutions
       6.2.3 An A Priori Bound on the Length of Any Optimal Solution
   6.3 An Example
   6.4 Continuity of the Optimal Solution
   6.5 Summary

7. MIMO Design: The Square Case
   7.1 Preliminaries
   7.2 The Combination Problem
       7.2.1 Square Case
   7.3 The Mixed Problem
       7.3.1 The Approximate Problem
       7.3.2 Relation between the Approximate and the Mixed Problem
   7.4 An Illustrative Example
       7.4.1 Standard ℓ₁ Solution
       7.4.2 Solution of the Mixed Problem
   7.5 Summary
   7.6 Appendix
       7.6.1 Interpolation Conditions
       7.6.2 Existence of a Solution for the Combination Problem
       7.6.3 Results on the Mixed Problem

8. Multiple-input Multiple-output Systems
   8.1 Problem Statement
   8.2 Converging Lower and Upper Bounds
       8.2.1 Converging Lower Bounds
       8.2.2 Converging Upper Bounds
   8.3 Summary

9. Robust Performance
   9.1 Robust Stability and Robust Performance
       9.1.1 Robust Stability
       9.1.2 Robust Performance
   9.2 Problem Formulation
       9.2.1 Delay Augmentation Approach
       9.2.2 Finitely Many Variables Approach
   9.3 Quadratic Programming
   9.4 Problem Solution
   9.5 Summary

References

Index
1. Topology
In this chapter we lay down the foundations of the mathematical structure required for optimization methods on vector spaces. We start with a terse introduction to sets. No attempt is made to provide an axiomatic description of set theory; we appeal to the intuitive notion of a set as a collection of objects. The reader is introduced to the axiom of choice and Zorn's lemma. The meat of this chapter is the section on general topology, where we study topological sets with the bare minimum of structure on the sets. The concepts of convergence, continuity and compactness are presented in this general setting. The next two sections discuss metric and normed vector spaces. Normed vector spaces will be studied in greater detail in the next chapter. The section on finite dimensional spaces summarizes some important properties enjoyed by finite dimensional spaces that are lacking in infinite dimensional spaces. In the last section of this chapter we study extrema of real valued functions. It is shown that compact sets and various forms of continuity play a pivotal role in the existence of extrema; thus they form the focus of optimization. It is also shown that the norm topology does not have an abundance of compact sets.

Elementary knowledge of real analysis is assumed. Definitions of well known operations like union and intersection of sets are not provided. Familiarity with countable sets, uncountable sets and the real number system is assumed. Except for these topics this chapter is self contained. However, the material presented in this chapter will be more transparent to a reader who has been exposed to the analysis concepts taught in an undergraduate course (the first seven chapters of [1] are sufficient background).
1.1 Sets

Here, we do not attempt to provide an axiomatic development of sets; rather, we appeal to the intuitive notion of a set as a collection of objects. If A and B are two sets then A × B is another set defined as A × B := {(a, b) : a ∈ A and b ∈ B}. A binary relation on a set X is a subset T of X × X.
The relation is usually denoted by a symbol ≺ and we say x ≺ y if and only if (x, y) belongs to T. An order is a binary relation which is transitive (x ≺ y and y ≺ z implies that x ≺ z), reflexive (x ≺ x) and antisymmetric (x ≺ y and y ≺ x implies that x is the same element as y). A relation without the antisymmetric property is called a preorder. An element x is called the majorant of the subset Y if for all y in Y, y ≺ x. An element x is called the minorant of the subset Y if for all y in Y, x ≺ y. The set is said to be totally ordered if either x ≺ y or y ≺ x for any two elements x and y of X. Furthermore, it is well ordered if every subset Y of X has a minorant in the set Y. A directed set X is a set with a preorder such that every pair (x, y) of X has a majorant. We say that (X, ≺) is inductively ordered if each totally ordered subset of X (in the order induced from X) has a majorant in X. We denote the collection of all subsets of a set X by χ(X) and the empty set by {}. For sets A and B, we define A \ B as the collection of all elements of A which are not in B.

The axiom of choice states that it is possible to choose an element from any nonempty subset of a set. We now state this axiom precisely.

Axiom 1.1 (Axiom of choice) Given any set X there exists a function c such that
    c : χ(X) \ {} → X.
c is called the choice function.
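For a finite set the choice function of Axiom 1.1 can be written down explicitly, for example by always selecting the smallest element; the force of the axiom lies in the infinite case, where no such explicit rule may be available. The following Python sketch (an added illustration with an arbitrary three-element set, not part of the original text) constructs such a map on the nonempty subsets of X and checks the defining property c(A) ∈ A.

from itertools import combinations

X = {1, 2, 3}

def nonempty_subsets(s):
    # All nonempty subsets of a finite set s, i.e. chi(X) \ {empty set}.
    items = sorted(s)
    return [set(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]

def c(subset):
    # A choice function: picks one element from every nonempty subset.
    return min(subset)

for A in nonempty_subsets(X):
    assert c(A) in A   # the defining property of a choice function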
In the intuitive description of a set, thought of as a collection of elements, it is difficult to explain the role of the axiom of choice. In the axiomatic description of set theory it can be shown that the axiom of choice is independent of the axioms used in developing set theory. Thus, a different mathematics results depending on whether or not the axiom of choice is accepted as true. The axiom of choice leads to important theorems (such as the Hahn-Banach theorem). There seems to be no other alternative for establishing such key results, and thus the validity of the axiom of choice is generally accepted.

Lemma 1.1.1 (Zorn's lemma). Every inductively ordered set X has an element x such that if y is in X and x ≺ y then y = x; x is called a maximal element.

It can be shown that Zorn's lemma is equivalent to the axiom of choice. This lemma will be used in obtaining important results.
1.2 General Topology

Definition 1.2.1 (Topology). A topology on a set X is a collection of subsets τ of X with the following properties.
1. Any union of sets in τ belongs to τ.
2. Any finite intersection of sets in τ belongs to τ.
3. {} and X belong to τ.

We say that (X, τ) is a topological set and that τ consists of the open subsets of X. A subset F of a topological set (X, τ) is said to be closed if X \ F is open. A subset Y in a topological set (X, τ) is a neighbourhood of the point x in X if there is an open subset A of Y such that x ∈ A. For each subset Y of X we define the closure of Y as the intersection of all the closed sets that contain Y. The closure of the set Y is denoted by Y⁻.

Theorem 1.2.1. For a topological set (X, τ), the following assertions are true.
1. The union of any collection of open sets is open and the union of a finite collection of closed sets is closed.
2. The intersection of any collection of closed sets is closed and the intersection of a finite collection of open sets is open.
3. If Y ⊂ X then z ∈ Y⁻ if and only if for every neighbourhood A of z, Y ∩ A ≠ {}.

Proof. (3.) A⁻ = ∩{F : F ⊃ A and F is a closed set} = ∩{X \ G : X \ G ⊃ A and G ∈ τ}. Therefore z0 belongs to A⁻ if and only if z0 ∈ X \ G for every open set G which is such that X \ G ⊃ A. Therefore z0 is in A⁻ if and only if z0 ∈ X \ G for every open set G which is such that G ∩ A = {}. This is equivalent to the statement that z0 is in A⁻ if and only if for all open G which contain z0, G ∩ A ≠ {}. We leave the rest of the proof to the reader. □

Definition 1.2.2 (Relative topology). If (X, τ) is a topological set and Y is a subset of X then the relative topology on Y is the collection of sets of the form Y ∩ A where A belongs to τ.

It follows from the definition of relative topology that a set is closed in the relative topology on Y if and only if it has the form Y ∩ F where F is closed in (X, τ).

Definition 1.2.3 (Interior of a set). Let Y be a subset of a topological set (X, τ). A point z is called an interior point of Y if there is an open set A ∈ τ such that z ∈ A with A ⊂ Y. The collection of all interior points of the subset Y is called the interior of the set Y and is denoted by int(Y).
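The three axioms in Definition 1.2.1 can be verified mechanically on small finite examples. The following Python sketch (the set X and the candidate collection τ are made up for illustration and are not part of the original text) checks closure under unions and intersections of arbitrary subfamilies, which on a finite set covers both the arbitrary-union and finite-intersection requirements.

from itertools import combinations

X = frozenset({1, 2, 3})
# Candidate topology: {}, {1}, {1, 2}, X
tau = {frozenset(), frozenset({1}), frozenset({1, 2}), X}

def is_topology(X, tau):
    if frozenset() not in tau or X not in tau:
        return False
    # check every nonempty subfamily of tau
    for r in range(1, len(tau) + 1):
        for family in combinations(tau, r):
            union = frozenset().union(*family)
            inter = family[0]
            for A in family[1:]:
                inter = inter & A
            if union not in tau or inter not in tau:
                return False
    return True

print(is_topology(X, tau))   # True for the collection above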
Given a topological set (X, τ) and a point x ∈ X we define the neighbourhood filter 𝒩(x) associated with x by

    𝒩(x) := {N : N ⊂ X such that there exists U ∈ τ, x ∈ U and U ⊂ N},    (1.1)

which is a collection of neighbourhoods of x. We say that for any two sets A and B in 𝒩(x), A ≺ B if A ⊃ B. It can be shown that the relation ≺ defined above is reflexive and transitive. Also, given A and B in 𝒩(x), A ∩ B is in 𝒩(x) and is contained in A and B. Therefore, any two elements in 𝒩(x) have a majorant. This implies that 𝒩(x) with the relation ≺ is a directed set.

If σ and τ are two topologies on X then we say that σ is a stronger topology than τ, or τ is a weaker topology than σ, if τ ⊂ σ.

Lemma 1.2.1. Let {τj : j ∈ Λ} be a collection of topologies on a set X indexed by the set Λ. Then there exists a weakest topology which contains all τj for j in Λ. There also exists a strongest topology which is contained in all τj for j in Λ.

Proof. Let τ_int be the collection of sets which are in all τj for j in Λ. Then it is clear that τ_int defines a topology on X. Also, τ_int is contained in all τj for j in Λ, and it is easy to see that τ_int is the strongest such topology. Let 𝒯 be the collection of all topologies which are stronger than τj for all j in Λ. This collection is not empty because the topology in which all subsets of X are defined to be open (called the discrete topology) is in 𝒯. We know that there exists a strongest topology τ_out which is contained in all topologies in 𝒯. This is the weakest topology which contains all τj for j in Λ. □

Definition 1.2.4 (Subbase, Base, Neighbourhood base). Let σ be a collection of subsets of X. Then σ is a subbase for the topology τ if τ is the weakest topology, amongst topologies on X, that contains σ. σ is a basis for τ if all sets in τ are unions of elements in σ. σ is a neighbourhood basis for an element x in (X, τ) if for every set A in the neighbourhood filter of x (given by 𝒩(x)) there exists a set in both 𝒩(x) and σ which is a subset of A.
Lemma 1.2.2. The following assertions are true.
1. Let σ be a collection of subsets of a set X and let τ be a topology on X. σ is a subbasis for τ if and only if τ is the set {F : F = X, F = {}, or F is a union of sets which are finite intersections of sets in σ}. In particular, if τ(σ) denotes the topology for which σ is a subbasis, then every set in τ(σ) is a union of sets which are finite intersections of sets in σ, or the null set, or the set X.
2. A collection of open sets σ in a topological set (X, τ) is a basis for τ if σ forms a neighbourhood basis for all elements in X.
3. A collection of open sets σ in a topological set (X, τ) is a subbasis for τ if the collection of sets formed by finite intersections of sets in σ forms a neighbourhood basis for all elements in X.

Proof. (1.) Every topology that contains σ contains all the sets which are unions of sets formed by finite intersections of sets in σ. This follows from the definition of a topology on a set. Also, all sets which are unions of sets formed by finite intersections of sets from σ, together with the null set and the set X, define a topology on X.
(2.) Let A ∈ τ. Then for every x in A, A is in 𝒩(x) where 𝒩(x) is the neighbourhood filter of x. As the collection of sets σ forms a neighbourhood basis for all points in X, it follows that there exists a set B_x such that B_x ∈ σ ∩ 𝒩(x) and B_x ⊂ A. It is evident that A = ∪_{x∈A} B_x. Therefore, A can be written as a union of sets from σ. As A is an arbitrary set in τ it follows that σ forms a basis for the topology τ.
(3.) Let A ∈ τ be such that A ≠ X and A ≠ {}. Then for every x in A, A is in 𝒩(x) where 𝒩(x) is the neighbourhood filter of x. It follows that there exists a set B_x which is a finite intersection of sets from σ such that B_x ∈ 𝒩(x) and B_x ⊂ A. It is evident that A = ∪_{x∈A} B_x. Therefore, A can be written as a union of sets from the collection of sets formed by finite intersections of sets from σ. As A is an arbitrary set in τ it follows that σ forms a subbasis for the topology τ. □

Definition 1.2.5 (Nets). A net in a set X is a pair (Λ, k) where Λ is a directed set and k is a map from Λ into X. A net is also denoted as {x_λ}_{λ∈Λ} where x_λ = k(λ). We also say that x_λ is a net in X on Λ. The net x_λ is eventually in the set A if there exists λ0 ∈ Λ such that if λ0 ≺ λ then x_λ ∈ A. The net x_λ is frequently in the set A if for every λ0 ∈ Λ there exists a λ ∈ Λ such that λ0 ≺ λ and x_λ ∈ A.

Definition 1.2.6 (Convergence). Let x_λ be a net in a topological set (X, τ). We say that x_λ → x0 if x_λ is eventually in N for all N ∈ 𝒩(x0). We also say that x_λ converges to x0 in the topology τ. Another notation used is lim x_λ to represent x0, and x0 is said to be the limit of the net x_λ. x_λ is said to be a convergent net if there exists an x0 ∈ X such that x_λ → x0.

It is possible that a net converges to more than a single element. This is not the case for the Hausdorff topology which is defined below.

Definition 1.2.7 (Hausdorff topology). For a set X, τ is a Hausdorff topology if for all elements x and y in X with x ≠ y there exist sets A ∈ τ and B ∈ τ such that x ∈ A, y ∈ B and A ∩ B is empty.

Definition 1.2.8 (Denseness, Separability, Axiom of countability). A subset Y of a topological set (X, τ) is dense in (X, τ) if Y⁻ = X. The topological set (X, τ) is called separable if it has a countable dense subset. The topological set (X, τ) is said to satisfy the first axiom of countability if for every x in X there exists a countable number of open sets A_n(x) such that any neighbourhood of x contains at least one of them.

A net defined on the set of integers in X is called a sequence. In most results the concept of a net is superfluous and can be replaced by the notion of a sequence if the topological set satisfies the first axiom of countability.
However, the concept of nets allows for more general results and the proofs of many results become easier to establish.

Lemma 1.2.3. Let (X, τ) be a topological set and let A be a subset of X. Then x0 is in the closure of A if and only if there is a net x_λ in A such that x_λ → x0.

Proof. From Theorem 1.2.1 we know that if A ⊂ X, then x0 ∈ A⁻ if and only if for every neighbourhood G of x0, A ∩ G ≠ {}.
(⇐) Suppose there is a net x_λ in A such that x_λ → x0. Then for any open set G which contains x0 (and therefore is in 𝒩(x0)), there exists a λ0 such that λ0 ≺ λ implies that x_λ ∈ G. However, every such x_λ is in A because x_λ is a net in A. Therefore G ∩ A ≠ {}, which implies that x0 ∈ A⁻.
(⇒) Suppose x0 ∈ A⁻ and N ∈ 𝒩(x0). Then there exists G ∈ τ such that G ⊂ N and x0 ∈ G. Therefore for every N ∈ 𝒩(x0), N ∩ A ≠ {}. Let x_N belong to N ∩ A. x_N is a net in A defined on the directed set 𝒩(x0) (we have shown before that 𝒩(x0) is a directed set with A ≺ B if and only if A ⊃ B). Given any N ∈ 𝒩(x0), let λ0 = N. If λ0 ≺ λ then x_λ ∈ N because N = λ0 ⊃ λ and x_λ ∈ λ. Thus x_N is a net in A such that x_N → x0. This proves the lemma. □

Example 1.2.1. It is not true that if x0 is in the closure of a set A then there exists a sequence in A which converges to x0. Consider any uncountable set X and define the topology on X by τ := {{}, X, all sets which have countable complements}. Then the closed sets are given by {{}, X, all countable sets}. Let A := X \ {x0} where x0 is any element in X. Then A is open and X is the only closed set that contains A. Therefore A⁻ = X. Let x_n be any sequence in A. The set X \ {x_n : n ≥ 1} is an open set which contains x0 and therefore is a neighbourhood of x0. However x_n is never in this neighbourhood and therefore x_n does not converge to x0. This example illustrates that to fully describe topological concepts in terms of convergence, the use of nets is indispensable.

Definition 1.2.9 (Continuity). Let (X, τ) and (Y, σ) be topological sets. A function f : (X, τ) → (Y, σ) is continuous if for every set G ∈ σ, f⁻¹(G) ∈ τ.
Here f⁻¹ : (Y, σ) → (X, τ) is defined as

    f⁻¹(B) := {x ∈ X : f(x) ∈ B}.

f is said to be continuous at a point x0 if for every neighbourhood G of f(x0), f⁻¹(G) is a neighbourhood of x0.
Lemma 1.2.4. The following statements are equivalent.
1. f : (X, τ) → (Y, σ) is continuous at every point x ∈ X.
2. f : (X, τ) → (Y, σ) is a continuous function.
3. If x_λ is a net such that x_λ → x0 in (X, τ) then f(x_λ) → f(x0) in (Y, σ).

Proof. (1 ⇒ 2) Suppose f : (X, τ) → (Y, σ) is continuous at every point x ∈ X. Let G ∈ σ and let x ∈ f⁻¹(G). As G is a neighbourhood of f(x) we have that f⁻¹(G) is a neighbourhood of x. This implies that for every x ∈ f⁻¹(G) there exists a set A_x in τ containing x such that A_x ⊂ f⁻¹(G). It is easy to show that ∪{A_x : x ∈ f⁻¹(G)} = f⁻¹(G). Therefore f⁻¹(G) (being a union of open sets) is open. As G is an arbitrary open set in Y we have shown that f is continuous.
(2 ⇒ 1) Suppose f is a continuous function. Let N be a neighbourhood of f(x) for some x ∈ X. Then there exists a set G ∈ σ such that f(x) ∈ G and G ⊂ N. From continuity of f we have that f⁻¹(G) ∈ τ. As f⁻¹(N) ⊃ f⁻¹(G) and f⁻¹(G) is open and contains x, we have that f⁻¹(N) is a neighbourhood of x. Therefore we have shown that for any neighbourhood N of f(x), f⁻¹(N) is a neighbourhood of x. As x was chosen arbitrarily we have that f is continuous at every point x ∈ X.
(2 ⇒ 3) Suppose f is continuous. Given any neighbourhood N of f(x0) we know that there exists a set G ∈ σ such that f(x0) ∈ G and G ⊂ N. From continuity of f we know that f⁻¹(G) ∈ τ. Also, x0 ∈ f⁻¹(G). Therefore f⁻¹(G) is a neighbourhood of x0. As x_λ → x0 we know that there exists a λ0 such that λ0 ≺ λ implies that x_λ is in f⁻¹(G). Therefore there exists a λ0 such that λ0 ≺ λ implies that f(x_λ) ∈ G ⊂ N. As the neighbourhood N of f(x0) was chosen arbitrarily we have shown that f(x_λ) → f(x0).
(3 ⇒ 2) Suppose f is not continuous. Then there exists x0 ∈ X such that f is not continuous at x0 (from 2 ⇒ 1). This implies that there exists a neighbourhood N of f(x0) such that f⁻¹(N) is not a neighbourhood of x0. Therefore given M ∈ 𝒩(x0) there exists x_M ∈ M such that x_M ∉ f⁻¹(N). x_M is a net in X on the directed set 𝒩(x0) where the binary relation is given by A ≺ B if and only if A ⊃ B. It is clear that x_M → x0. However, f(x_M) does not converge to f(x0) because f(x_M) ∈ Y \ N for all M ∈ 𝒩(x0) and N is a neighbourhood of f(x0). This proves the lemma. □

For a map f between sets X and Y and A ⊂ X we define

    f(A) := {y ∈ Y : y = f(x) for some x ∈ A}.

The set f(X) is called the range of f and is denoted by range(f).

Lemma 1.2.5. If f : (X, τ_x) → (Y, τ_y) is a continuous map then f⁻¹(B) is closed if B is closed in Y.

Proof. As B is closed in Y it follows that Y \ B is open. From continuity of f it follows that f⁻¹(Y \ B) is open. However, f⁻¹(Y \ B) = X \ f⁻¹(B). Therefore, f⁻¹(B) is closed. □
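For finite topological sets, the continuity criterion of Definition 1.2.9 is a finite check of preimages. The sketch below is a Python illustration with made-up spaces and map (not taken from the text); it tests whether f⁻¹(G) is open in X for every open G in Y.

# Finite illustration of Definition 1.2.9: f is continuous iff the preimage
# of every open set of Y is open in X.

X = frozenset({1, 2, 3})
tau_X = {frozenset(), frozenset({1}), frozenset({2, 3}), X}

Y = frozenset({'a', 'b'})
tau_Y = {frozenset(), frozenset({'a'}), Y}

f = {1: 'a', 2: 'b', 3: 'b'}          # the map f : X -> Y

def preimage(f, B):
    return frozenset(x for x in f if f[x] in B)

def is_continuous(f, tau_X, tau_Y):
    return all(preimage(f, G) in tau_X for G in tau_Y)

print(is_continuous(f, tau_X, tau_Y))  # True: f^{-1}({a}) = {1} is open in X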
Definition 1.2.10 (Initial topology). Let {f_j : j ∈ J} be a collection of functions, indexed by the set J, on a set X such that f_j : X → Y_j where Y_j is a topological set with topology τ_j. The weakest topology on X which makes all functions f_j continuous is called the initial topology on X induced by f_j.

Lemma 1.2.6. Let {f_j : j ∈ J} be a collection of functions on a set X, indexed by the set J, such that f_j : X → Y_j where Y_j is a topological set with topology τ_j. The collection of sets of the form

    {f_j⁻¹(A) : A ∈ τ_j for some j ∈ J}

forms a subbasis for the initial topology on X.

Proof. It is clear that any topology on X which makes all the functions f_j continuous must contain all the sets from σ, where σ := {f_j⁻¹(A) : A ∈ τ_j for some j ∈ J}; otherwise at least one of the f_j will be discontinuous. Therefore, the initial topology also contains σ. Let τ denote the topology for which σ forms the subbasis. As τ is the weakest topology that contains σ and the initial topology contains σ, it follows that the initial topology is stronger than τ. Also, if X is endowed with the topology τ then the functions f_j are continuous. As the initial topology is the weakest topology on X which makes f_j continuous for all j, it follows that τ is stronger than the initial topology. Thus τ is the initial topology. □

Lemma 1.2.7. Let X be endowed with the initial topology τ induced by a collection of functions {f_j : j ∈ J}, indexed by the set J, where f_j : X → (Y_j, τ_j). Then a net x_λ in (X, τ) on Λ converges to x0 in X if and only if f_j(x_λ) converges to f_j(x0) in τ_j for all j in J.
Proof. (⇒) Suppose the net x_λ in X on Λ converges to x0. Then, because the topology on X is the initial topology (which makes all f_j continuous), f_j(x_λ) → f_j(x0) for all j ∈ J.
(⇐) Suppose for all j ∈ J, f_j(x_λ) → f_j(x0). Let B be a neighbourhood of x0. Then, as

    σ := {f_j⁻¹(A) : A ∈ τ_j for some j ∈ J}

is a subbasis for the initial topology on X, B contains a set C which is a finite intersection of sets from σ such that x0 ∈ C. Without loss of generality assume that

    C = ∩_{i=1}^{k} {f_i⁻¹(A_i) : where A_i ∈ τ_i}.

Note that as x0 ∈ C it follows that A_i is an open set containing f_i(x0) for all i = 1, ..., k. As f_i(x_λ) → f_i(x0) it follows that there exist λ_i such that if λ_i ≺ λ then f_i(x_λ) ∈ A_i, for all i = 1, ..., k. Let λ0 represent the majorant of the set {λ_1, ..., λ_k}. This implies that if λ0 ≺ λ then f_i(x_λ) ∈ A_i for all i = 1, ..., k. Therefore, if λ0 ≺ λ then x_λ ∈ f_i⁻¹(A_i) for all i = 1, ..., k, which implies that x_λ ∈ ∩_{i=1}^{k} f_i⁻¹(A_i) ⊂ B. This implies that x_λ is eventually in B. As B is an arbitrary neighbourhood of x0 it follows that x_λ → x0. □

Definition 1.2.11 (Isomorphism). A function f : X → Y is an isomorphism if it is a one-to-one and onto function.

Definition 1.2.12 (Homeomorphism). A function f : (X, τ_x) → (Y, τ_y) is a homeomorphism if it is a one-to-one and onto function which is continuous with its inverse also continuous. Two topological sets which have a homeomorphism between them are said to be topologically identical.

Definition 1.2.13 (Filters). A filter ℱ in a set X is a collection of subsets of X which has the following properties.
1. {} ∉ ℱ.
2. X ∈ ℱ.
3. A ⊂ B and A ∈ ℱ implies that B ∈ ℱ.
4. A ∈ ℱ and B ∈ ℱ implies that A ∩ B ∈ ℱ.

Definition 1.2.14 (Ultrafilter). An ultrafilter in a set X is a filter in X with the additional property that no other filter in X properly contains it.

Lemma 1.2.8. For a filter 𝒢 in a set X, the following statements are equivalent.
1. 𝒢 is an ultrafilter in X.
2. For every set A ⊂ X either A ∈ 𝒢 or X \ A ∈ 𝒢.

Proof. (2 ⇒ 1) Suppose 𝒢 is not an ultrafilter in X. Then there exists a filter ℱ in X and a subset A of X such that A ∈ ℱ but A ∉ 𝒢, with the property that if B ∈ 𝒢 then B ∈ ℱ. If X \ A ∈ 𝒢 then X \ A ∈ ℱ and, because ℱ is a filter, (X \ A) ∩ A ∈ ℱ. This would imply that {} ∈ ℱ. Therefore X \ A ∉ 𝒢. Therefore we have shown that both A and X \ A are not in the filter 𝒢.
(1 ⇒ 2) Suppose 𝒢 is a filter and there exists a set Y ⊂ X such that Y ∉ 𝒢 and X \ Y ∉ 𝒢. Let A be any set which belongs to the filter 𝒢. Then (X \ Y) ∩ A ≠ {} because otherwise A ⊂ Y and, as 𝒢 is a filter, Y ∈ 𝒢. Let

    ℱ := {C : there exists A ∈ 𝒢 such that C ⊃ (X \ Y) ∩ A}.

As (X \ Y) ∩ A ≠ {} for every A ∈ 𝒢 it is clear that {} ∉ ℱ. It is clear that X ∈ ℱ. If C1 ∈ ℱ then there exists a set A1 ∈ 𝒢 such that C1 ⊃ (X \ Y) ∩ A1. If C2 ⊃ C1 then C2 ⊃ (X \ Y) ∩ A1 and therefore C2 ∈ ℱ. If C1 and C2 both belong to ℱ then there exist sets A1 and A2, both in 𝒢, such that C1 ⊃ (X \ Y) ∩ A1 and C2 ⊃ (X \ Y) ∩ A2. As A1 ∩ A2 ∈ 𝒢 and C1 ∩ C2 ⊃ (X \ Y) ∩ (A1 ∩ A2) it follows that C1 ∩ C2 ∈ ℱ. This proves that ℱ is a filter. Note that if B ∈ 𝒢 then B ∈ ℱ. Also, X \ Y ∈ ℱ whereas X \ Y ∉ 𝒢. Thus ℱ properly contains 𝒢. Therefore 𝒢 is not an ultrafilter, which proves the lemma. □
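The four defining properties of a filter (Definition 1.2.13) are easy to check by brute force on a finite set. The following Python sketch (the set and the filter are illustrative choices, not from the text) verifies them for the collection of all subsets of X that contain a fixed point.

from itertools import combinations

X = frozenset({1, 2, 3})
subsets = [frozenset(c) for r in range(len(X) + 1)
           for c in combinations(sorted(X), r)]

# The principal filter of the point 1: all subsets of X that contain 1.
F = {A for A in subsets if 1 in A}

def is_filter(F, subsets, X):
    if frozenset() in F or X not in F:
        return False
    for A in F:
        # upward closed: every superset of A is in F
        if any(A <= B and B not in F for B in subsets):
            return False
        # closed under finite intersections
        if any(B in F and (A & B) not in F for B in subsets):
            return False
    return True

print(is_filter(F, subsets, X))   # True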
Lemma 1.2.9. Every filter in a set X is contained in an ultrafilter in X.

Proof. Suppose 𝒢 is a filter in X. Consider

    P := {ℋ : ℋ is a filter in X which contains 𝒢}.

Let the binary relation on P be given by ≺, where ℬ ≺ 𝒜 if and only if ℬ ⊂ 𝒜. Let Q be a totally ordered subset of P. Let N denote the union of all the elements of Q. Then it can be shown that N ∈ P and that N is a majorant for Q. This implies that P is inductively ordered. From Zorn's lemma (Lemma 1.1.1) P has a maximal element ℱ. It follows that ℱ is an ultrafilter that contains 𝒢. □

Definition 1.2.15 (Subnets). Let (Λ, i) be a net in X on Λ. A subnet of (Λ, i) is a net (M, j) in X with a function h : M → Λ such that j = i(h) and for every λ0 ∈ Λ there exists a β0 ∈ M with λ0 ≺ h(β) if β0 ≺ β. We also say that y_β on M is a subnet of the net x_λ on Λ if there exists a function h : M → Λ such that x_{h(β)} = y_β and for every λ0 ∈ Λ there exists a β0 ∈ M with λ0 ≺ h(β) if β0 ≺ β.

The definition of subnets seems involved. However, the definition of a subsequence of a sequence will bring out the similarity between subnets and subsequences.

Lemma 1.2.10. Let x_λ be a net in (X, τ) on Λ. Let y_β be a subnet of x_λ on M. If x_λ → x0 then y_β → x0.

Proof. Let N ∈ 𝒩(x0). As x_λ → x0 we know that there exists a λ0 ∈ Λ such that λ0 ≺ λ implies that x_λ ∈ N. As y_β is a subnet of x_λ there exists a function h : M → Λ and β0 ∈ M such that β0 ≺ β implies that λ0 ≺ h(β). Thus there exists a β0 ∈ M such that if β0 ≺ β then x_{h(β)} ∈ N. But from the definition of a subnet x_{h(β)} = y_β, and therefore we have shown that for every N ∈ 𝒩(x0) there exists a β0 ∈ M such that if β0 ≺ β then y_β ∈ N. □

Definition 1.2.16 (Universal nets). A net x_λ in X on Λ is universal if for all A ⊂ X either x_λ is eventually in A or x_λ is eventually in X \ A.

As we will see, universal nets play an important role in characterizing compactness. The following lemma brings out an interesting property of universal nets.

Lemma 1.2.11. If x_λ is a universal net in X on Λ and f : X → Y where Y is a set, then f(x_λ) is a universal net in Y on Λ.

Proof. Let B ⊂ Y. Then as x_λ is a universal net on X it follows that x_λ is eventually either in f⁻¹(B) or in X \ f⁻¹(B) = f⁻¹(Y \ B). Therefore, f(x_λ) is eventually either in B or in Y \ B. Therefore, f(x_λ) is a universal net in Y. □
1.2 General Topology Theorem
11
1.2.2. Every net has a universal subnet.
Proof. Let x~ be a net in X on A. Consider the set := {A : A C X and x~ is eventually in A}. It is evident that ~ is a filter. From L e m m a 1.2.9 we know t h a t there exists an ultrafilter .T which contains ~. We will show now t h a t x~ is frequently in every set in 9r . Suppose there exists a set F E .T and a A0 E A such t h a t if )~0 -< ~ then xx ~ F. Therefore {x~ : )~0 -< )~} M F = {}. The set {x~ : )~0 -< )~} belongs to G and therefore it belongs to .T. F belongs to 9e and as .T is a filter it follows t h a t {x~ : )~0-<)~} N F E 9v. This means that the null set belongs to ~ . We have reached a contradiction and therefore x,x is frequently in every set in .T. Now we construct an universal subnet of x~. Consider the set M:={()~,B)
:~EA
and B E ~ }
with the order defined by (A1,B1) ~ ()~2, B2) if and only if A1 -< A2 in A and B1 D B2. It follows that M with the relation -~ defined above is a directed set. Define the m a p p i n g h : M --+ A so that ,~ -< h(A, B) and Xh(a,B) E B (such a m a p exists because xx is frequently in every set of 9~). Let the elements of M be denoted by 13 and define y~ = xh(:~,B) where /3 = ()~, C). Given any )~0 E A choose/3o E M such that/3o = (A0, X). Then/3o -< /3 = ()~, C) implies that A0 -< )~ in A. Therefore given )~0 E A there exists a/3o E M such t h a t /3o -3 implies that )~0 -< )~ -< h()~, C) and therefore yz is a subnet of x~. Given any subset A of X we know that either A E ~" or X \ A E ~" (see L e m m a 1.2.8). Suppose that A E ~'. Let /3o = ()~0,A) where )~0 E A is arbitrary. If/3 = ()~, C) is such that /3o -< /3 then A0 -< )~ and A D C. Also, yZ = Xh(~,c) E C and therefore yp E A. Therefore we have shown t h a t there exists a/3o E M such that/3o -3 implies that yz E A. Thus the net yZ is eventually in the set A. It can be shown in a similar manner that if X \ A E :T then yZ is eventually in X \ A . As A was chosen arbitrarily we have shown that y~ is a universal subnet of x~. [] c o v e r ) . An open covering of a subset Y of a topological set (X, v) is a subset ~ of r such that Y C U A, A E ~r. A subcovering of ~r is a covering C~o that is contained in c~.
D e f i n i t i o n 1.2.17 ( O p e n
A topological set (X, v) is compact if every open cover of X has a finite subcover.
D e f i n i t i o n 1.2.18 ( C o m p a c t n e s s ) .
Thus (X, r) is compact if for every collection (countable or otherwise) of open sets which cover X there is a finite n u m b e r from the collection which cover X. A subset of (X, r) is said be compact if it is compact in the relative topology. Lemma
compact.
1.2.12. Every closed subset C of a compact topological set (X, r) is
12
1. Topology
Proof. Suppose, {Ao} forms an open cover for C. Then, there exists G~ in r such that Go f q C = A~. As C is closed we have that Go := X \ C is open. Therefore {Go} together with G0 forms an open cover of X. As X is compact there exists a finite number of sets from G0 and {Go } which cover X. It follows that only a finite number of sets from {Go} and G0 are required to cover X. This implies that only finitely m a n y of the the sets {Go} are required to cover C. Thus we have shown t h a t given any open cover {Ao} only finitely m a n y of the sets from {Aa} form an open cover of C. This proves the lemma. [] Not every compact subset of a closed set is closed. However, the following l e m m a holds. L e m m a 1.2.13. Every compact subset of a closed set in a topological set is
closed if the topology is Hausdorff. L e m m a 1.2.14. / f f : (X, r~) --+ (Y, ry) is a continuous map from a compact topological set (X, r~) to a topological set (Y, ru) then f ( X ) is compact as a subset of (Y, ru) where
f ( X ) := {y: there exists a z E X such that y = f ( z ) } . Proof. Suppose, {Bo} is an open cover for f ( X ) (in the relative topology). This implies that Bo = Go fq f ( X ) for some Go E ry. Then the collection of sets { f - l ( G ~ ) } forms an open cover of X. From compactness of X there exists a finite set F (i.e. F has finite number of elements) such t h a t the collection { f - l ( G a ) } o e F covers X. It is clear that f ( X ) = t.J~eFBo. [] 1.2.3. (X, 7") is a compact topological set if and only if every universal net in X is convergent.
Theorem
Proof. (==r Suppose (X, r) is a compact topological set. Let x~ be a universal net in X on A such that for every x E X there exists an open set G~ containing x such that xx is not eventually in G~. As zx is a universal net it follows t h a t for every x E X, xx is eventually in X \ G x . It is clear that U x e x G~ is an open cover of X. As X is compact there exist elements x l , . . . , x,, in X such t h a t n X = I,.Ji=l G ~ . As zx is eventually in X \ G ~ , for all n = 1 , . . . , n it follows that x~ is eventually in Ni~=l X \ G ~ . This implies that x~ is eventually in 12 X \ [.Ji=l G ~ = {}. Thus we have reached a contradiction and it follows t h a t every universal net in a compact set converges. (r Suppose (X, 7-) is not a compact set. Then there exists an open cover {Ax}xea such that for every finite subset F of A there exists an element XF in X such that XF E X \ U X e F Ax. Let T := {F : F C A and F is finite}. Define an order -< on T b y F1 -~ F2 if and only if F1 C F ~ . T h e n x f is a net in X on T. Let (Y~)#eM with h : M --+ T be a universal subnet of the net x~- (we know that such a subnet exists from T h e o r e m 1.2.2). Suppose
1.3 Metric Topology
13
there exists y0 such that yz --+ y0. Then there exists AFo with F0 E A such that Y0 E AFo and yz is eventually in AFo. Note that F0 is also an element of T. As yp is a subnet of x~ we know that there exists/~0 E M such that if /3o -~/3 then F0 -~ h(/?) in T. As y~ is eventually in AFo we know that there exists a/~r such that/?r _< 13 implies that yz E AFo. Let a be the majorant of/~0 and/yr. Then y~ E AFo, Fo -< h(a) and y~ = xh(~). But we know that Xh(o) E X \ Uxeh(~) A~. As Fo C h(a) we have a contradiction. This proves the theorem. [] C o r o l l a r y 1.2.1. In a compact topological set (X, v) every net has a con-
vergent subnet. Proof. Follows immediately from Theorem 1.2.2 and Theorem 1.2.3. [] It is not true that in a compact topological set every sequence has a convergent subsequence. We will give examples of this fact. later after we have introduced weak , topology. We will use the machinery developed to prove an important theorem; Tychonoff's theorem. D e f i n i t i o n 1.2.19 ( P r o d u c t set, P r o j e c t i o n , P r o d u c t t o p o l o g y ) . Suppose {Xj }je J is a collection of sets with topologies rj. The product set denoted by I I j e j X j is a collection of sets each of which has an element from all the sets Xj. The projection operators 7rj are mappings from H j e j X j to Xj. IrA is in I I j e j X j then ~rj(A) called the projection of A onto its jth factor is the element in A which belongs to Xj. The initial topology on H j e j X j induced by the functions 7rj is called the product topology.
Example 1.2.2. Let X1 = {x E R : 1 < x _< 2} and let X2 := {x E R : 3
_<
x < 4}. Then
II~=lX j := {(a,b):a E Xl and b E X2}. T h e o r e m 1.2.4 ( T y c h o n o t f ' s t h e o r e m ) . Let {Xj : j E J} be a collection
of compact topological sets. The product set I l j e g X j with the product topology is compact. Proof. Let xA be a universal net in I l j e j X j. Then, lrj(x~) is a universal net in Xj (see L e m m a 1.2.11). As Xj is compact it follows that 7rd(xa ) is a convergent net (see Theorem 1.2.3). As I I j e j X j has the product topology it follows that xx is convergent in H j e j X j (see L e m m a 1.2.7). As x~ is an arbitrary universal net in H j e j X j it follows that H j e j X j is compact (see Theorem 1.2.3). []
1.3 Metric
Topology
The exposition to topology that we have presented untill now is very general and it lacks the structure that most sets we encounter have. The notion of distance between any two elements of a set is an important concept which is made precise by defining an appropriate metric on the set.
14
1. Topology
D e f i n i t i o n 1.3.1 ( M e t r i c ) . Let X be a nonempty set. A metric on the set
X is a map d, from X x X to the real line which has the following properties. 1. d(x,y) >_ 0 and d(x,y) = 0 if and only if x = y; 2. d(z, y) = d(y, z) (symmetry}; 3. d(x, y) < d(x, z) + d(z, y) (the triangle inequality}, where x, y and z are elements in X. The set X with the metric d is denoted as (X, d). A metric on a set X induces a topology on the the set called the metric topology. D e f i n i t i o n 1.3.2 ( M e t r i c t o p o l o g y ) . Let (X, d) be a metric set. We say
that a subset A, of X is open in the metric topology induced by d if for every element a in A there exists an , sufficiently small such that the set {x E X : d(x,a) < ,} is contained in A. It can be shown easily that defining open sets in this manner induces a topology on X. We say t h a t (X, d) is a topological metric set when the topology on X is induced by the metric d. Let A,~(x) := { y : d(x, y) < 1--} n where n is any positive integer and x is any element in X. If N is a neighbourhood of x in the topological metric set (X, d), then it contains a set of the form { y : d(x,y) < ,} w h e r e , > 0. The set A,~(x) with 1 < e is contained in {y : d(x, y) < ,} and therefore every neighbourhood of x contains a set from the collection {An(x)}. Thus we have shown that any topological metric set satisfies the first axiom of countability. It follows that all convergence results can be established by using the notion of sequences and subsequences and the use of nets and subnets becomes superfluous for topological metric sets. L e m m a 1.3.1. Let Xo be any element in a topological metric set (X, d). Let
B,(x0) := {y: d(y, x0) < ,), where e is an arbitrary non-negative number. Then, B~ is closed. Also, the set
a,(x0) := {y: d(y, x0) < is open. Proof. If x E X \ B , then d(x, xo) > ,. Consider the set A , := {y : d(x,y) < a}, where a = (d(x,Xo) - , ) / 2 . If y E A~ then d(y, xo) >__d(x,xo) - d(y,x) > d(x, Xo)-a > d(x, z20) + , > e. Therefore, A~ C X \ B e . Its clear that X \ B e = U ~ e x \ s , A~. As A~ is open for all x in X \ B ~ it follows that X \ B ~ is open and therefore B, is closed. We leave the proof of the second assertion to the reader. [] We now summarize convergence results for topological metric sets. L e m m a 1.3.2. The following assertions are true for a topological metric
sets.
1.3 Metric Topology
15
1. Let xn be a sequence in a metric set ( X , d ) and let xo E X. xn --+ xo in the topology induced by d, if and only if for every real e > 0 there exists an integer m such that n > m implies that d(xn, xo) < e. 2. A function f : (X, dx) --+ (Y, dy) is continuous at xo if and only if for every sequence xn which converges to xo in (X, d x ) , f(x,~) -+ f(xo) in
(r, dr). 3. A function f : (X, dx) --+ (Y, dg) is continuous if and only if for any sequence {zn} convergent in (X, dx), f(x,~) converges to f ( l i m x ~ ) in
(r, dy). Proof. (1. =>) Suppose that the sequence x , is eventually in every neighbourhood of x0. Given any e > 0 consider the set G := {y : d(y, xo) < e}. Because this set is a neighbourhood of x0 it follows t h a t x~ is eventually in G which implies that there exists an integer m such that if m < n then xn E G. Therefore, for every real e > 0 there exists an integer m such that n > m implies that d(x,~, xo) < e. (1. r Proving the other direction is similar and we leave it to the reader. We also leave the proofs of assertions 2 and 3 to the reader. [] The next l e m m a is an i m p o r t a n t characterization of compactness for a topological metric space. T h e proof of the following l e m m a is left to the reader. L e m m a 1.3.3. For a topological metric set (X, d) the following statements
are equivalent. 1. X is compact. 2. Every sequence has a convergent subsequence. Thus unlike general topological sets, in metric sets compactness is completely characterized in terms of sequences. s e q u e n c e ) . A sequence {x,;} in a topological metric set ( X, d) is a cauchy sequence if for every positive number e, there exists an integer N such that if n, m > N then d(xn,xm) < e.
D e f i n i t i o n 1.3.3 ( C a u c h y
Note that every convergent sequence is Cauchy. Itowever, it is not true that in a topological metric set all Cauchy sequences are convergent.
Example 1.3.1. Let X := (0, 1] := {x : 0 < x _ 1} with the metric defined by d(x, y) = Ix - Yl (which is the absolute value of x - y). Indeed, the sequence 1n is cauchy but is not convergent. A topological metric set is said to be complete if all Cauchy sequences in it converge.
D e f i n i t i o n 1.3.4 ( C o m p l e t e n e s s ) .
D e f i n i t i o n 1.3.5 ( I s o m e t r i c m a p s ) . A map f :
metric if dx(Xl, x2) = d y ( f ( x l ) , f(x2)), for all elements Xl and x2 in X.
(X, d,) -~ (Y, dy) is iso-
16
1. Topology
1.4 Normed
Linear
(Vector)
Spaces
Linear spaces (often called vector spaces) are sets with the operations of vector addition and scalar multiplication that satisfy a set of properties which are given in the following definition.

Definition 1.4.1 (Vector space). Let $X$ be a nonempty set and let '+', called the addition operation, associate with two elements $x$ and $y$ in $X$ another element $x+y$ in $X$ which satisfies the following properties.
1. $x + y = y + x$ for vectors $x$ and $y$ in $X$.
2. $x + (y+z) = (x+y) + z$ for vectors $x$, $y$ and $z$ in $X$.
3. There exists in $X$ a unique element, denoted by $0$ and called the zero element or the origin, such that $x + 0 = x$ for all $x$ in $X$.
4. For every element $x$ in $X$ there exists a unique element $-x$, called the negative of $x$, such that $x + (-x) = 0$.
We will refer to the system of real numbers or to the complex numbers as the scalars. The scalar multiplication operation associates with a scalar $\alpha$ and an element $x$ in $X$ another element $\alpha x$ in $X$ in such a way that
1. $\alpha(x+y) = \alpha x + \alpha y$.
2. $(\alpha + \beta)x = \alpha x + \beta x$ for vectors $x$ and scalars $\alpha$ and $\beta$.
3. $(\alpha\beta)x = \alpha(\beta x)$.
4. $\alpha x = x$ if the scalar $\alpha = 1$.
The set $X$ with the operations of vector addition and scalar multiplication as defined above is called a vector space or a linear space.

The term 'space' is used to mean a vector space. It should be clear from the above definition that the set of scalars admits two operations; one is scalar multiplication and the other is scalar addition. The underlying set of scalars is the real number system unless mentioned otherwise.

Definition 1.4.2 (Subspace). A subset $A$ of a vector space $X$, with $\alpha x + \beta y \in A$ for all scalars $\alpha$ and $\beta$ and vectors $x \in A$ and $y \in A$, is a subspace of $X$. Note that a subspace of a vector space is also a vector space.

Definition 1.4.3 (Normed vector space). A normed linear space is a vector space $X$ with a function $\|\cdot\| : X \to R$ defined such that
1. $\|x\| \ge 0$ and $\|x\| = 0$ if and only if $x = 0$.
2. $\|\alpha x\| = |\alpha| \, \|x\|$ for any scalar $\alpha$ and vector $x$ in $X$.
3. $\|x + y\| \le \|x\| + \|y\|$.
It is clear that a norm on a vector space $X$ induces a metric on $X$ defined by $d(x,y) := \|x-y\|$ for elements $x$ and $y$ in $X$. Therefore, a norm also induces a topology (called the norm topology), which is the metric topology with the metric defined as above. The normed topological space $X$ with a norm $\|\cdot\|$ is denoted by $(X,\|\cdot\|)$. Also, note that because a normed space is a metric space, sequences suffice to describe convergence.

Example 1.4.1. Consider the set $R^n$ which is defined as
\[ R^n := \{(x_1,x_2,\ldots,x_n) : x_i \in R \text{ for } i = 1,\ldots,n\}. \]
$R^n$ is a vector space with the real numbers as the scalars. Many different norms can be defined on $R^n$. An important class of norms on $R^n$ is given by the $p$ norm, defined as
\[ |x|_p := \Big( \sum_{i=1}^{n} |x_i|^p \Big)^{\frac{1}{p}}, \]
where $x = (x_1,x_2,\ldots,x_n)$ and $p$ is an integer such that $1 \le p < \infty$. The two norm ($p=2$) and the one norm ($p=1$) are of particular interest. Another important norm on $R^n$ is the $\infty$ norm, which is defined as
\[ |x|_\infty := \max_{1 \le i \le n} |x_i|. \]
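As a quick illustration of these definitions (our addition, not part of the original text), the following Python sketch, assuming NumPy is available, evaluates the one, two and infinity norms of a vector in $R^n$ directly from the formulas above.

```python
import numpy as np

def p_norm(x, p):
    """|x|_p = (sum_i |x_i|^p)^(1/p) for 1 <= p < infinity."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

def inf_norm(x):
    """|x|_inf = max_i |x_i|."""
    return np.max(np.abs(x))

x = np.array([3.0, -4.0, 1.0])
print(p_norm(x, 1))   # one norm: 8.0
print(p_norm(x, 2))   # two norm: sqrt(26) ~ 5.099
print(inf_norm(x))    # infinity norm: 4.0
```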
Example 1.4.2 (Product normed spaces). Let $(X,\|\cdot\|_X)$ and $(Y,\|\cdot\|_Y)$ be normed vector spaces. Let $X \times Y := \{(x,y) : x \in X \text{ and } y \in Y\}$. For $(x,y)$ in $X \times Y$ and $\alpha \in R$, $\alpha(x,y) \in X \times Y$ is defined as $(\alpha x, \alpha y)$. Also, for elements $(x_1,y_1)$ and $(x_2,y_2)$ in $X \times Y$, $(x_1,y_1) + (x_2,y_2)$ is defined as $(x_1+x_2, y_1+y_2)$. With these operations $X \times Y$ is a vector space. For $(x,y)$ in $X \times Y$ let $\|\cdot\| : X \times Y \to R$ be defined by
\[ \|(x,y)\| := \|x\|_X + \|y\|_Y. \]
Then it can be shown that $\|\cdot\|$ is a norm on $X \times Y$. The normed vector space $(X \times Y, \|\cdot\|)$ is called the product space of $X$ and $Y$, and $\|\cdot\|$ is called the product norm.

Lemma 1.4.1. In a normed topological vector space $(X,\|\cdot\|)$, the function $\|\cdot\| : X \to R$ is a continuous function.

Proof. Let $\{x_n\}$ be any sequence in $X$. Note that $\|x_n\| = \|x_n - x_0 + x_0\| \le \|x_n - x_0\| + \|x_0\|$, which implies that $\|x_n\| - \|x_0\| \le \|x_n - x_0\|$. Using the inequality $\|x_0\| \le \|x_0 - x_n\| + \|x_n\|$ it follows similarly that $\|x_0\| - \|x_n\| \le \|x_n - x_0\|$. Therefore, $\big| \|x_n\| - \|x_0\| \big| \le \|x_n - x_0\|$. This implies that if $x_n$ converges to $x_0$ in $X$ then $\|x_n\|$ converges to $\|x_0\|$ in $R$. Therefore, from Lemma 1.3.2 it follows that $\|\cdot\|$ is a continuous function. $\Box$

We have given an example in the previous section to illustrate that not all Cauchy sequences converge. However, the following lemma shows that every Cauchy sequence is bounded.
Lemma 1.4.2. Let $\{x_n\}$ be a Cauchy sequence in a normed linear space $(X,\|\cdot\|)$. Then there exists a positive real number $M$ such that $\|x_n\| \le M$ for all $n$.

Proof. As $\{x_n\}$ is Cauchy there exists an integer $N$ such that $n,m \ge N$ implies that $\|x_n - x_m\| \le 1$. Choose any $n \ge N$. Then $\|x_n\| = \|x_n - x_N + x_N\| \le \|x_n - x_N\| + \|x_N\| \le 1 + \|x_N\|$. This proves the lemma. $\Box$

Of particular importance to optimization are Banach spaces.

Definition 1.4.4 (Banach spaces). A complete normed vector space is a Banach space.
The a priori knowledge of the existence of a limit point for a Cauchy sequence is helpful in optimization problems where the construction of Cauchy sequences is natural.

Definition 1.4.5 (Linear independence, Basis, Dimension). In a vector space $X$, vectors $x_1,x_2,\ldots,x_n$ are linearly independent if for any set of scalars $\alpha_1,\alpha_2,\ldots,\alpha_n$, $\alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_n x_n = 0$ implies that $\alpha_i = 0$ for all $i = 1,\ldots,n$. If vectors $x_1,\ldots,x_n$ are not linearly independent then they are called dependent. A set of linearly independent vectors $x_1,x_2,\ldots,x_n$ is a basis for the vector space $X$ if for any vector $y$ in $X$ there exist scalars $\alpha_1,\alpha_2,\ldots,\alpha_n$ such that
\[ y = \sum_{i=1}^{n} \alpha_i x_i. \]
If there exist $n$ vectors which form a basis for a vector space $X$ then $X$ has dimension $n$ and is said to be finite dimensional. If no finite number of vectors can form a basis for the vector space $X$ then $X$ is infinite dimensional.
Finite dimensional normed spaces have structure which might be missing from infinite dimensional spaces. Now we study linear operators from one normed space to another. As we will see in the coming chapters, linear functions play an extremely important role in optimization theory.

Definition 1.4.6 (Linear map, Affine linear map). A map $T$ from a vector space $X$ to a vector space $Y$ is linear if for any scalars $\alpha$ and $\beta$ and vectors $x_1$ and $x_2$, $T(\alpha x_1 + \beta x_2) = \alpha T(x_1) + \beta T(x_2)$. A map $S$ from a vector space $X$ to a vector space $Y$ is affine linear if the map $S - S(0)$ is linear.

Example 1.4.3. Let $A : R^2 \to R^2$ be defined by
\[ A(x) := \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \quad \text{where } x := \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}. \]
Then the map $A$ is linear. The map $H : R^2 \to R^2$ with $H(x) = A(x) + b$, where $b \in R^2$, is an affine linear map.
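To see the distinction concretely (an illustration of ours, not from the text), the Python sketch below, assuming NumPy and an arbitrarily chosen matrix and shift, checks numerically that the matrix map $A$ satisfies $A(\alpha x_1 + \beta x_2) = \alpha A(x_1) + \beta A(x_2)$, while the shifted map $H(x) = A(x) + b$ does not unless $b = 0$, although $H - H(0)$ does.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])   # matrix defining the linear map A(x)
b = np.array([1.0, -1.0])                # shift defining H(x) = A(x) + b

A_map = lambda x: A @ x
H_map = lambda x: A @ x + b

x1, x2 = np.array([1.0, 0.5]), np.array([-2.0, 3.0])
alpha, beta = 2.0, -0.5

# Linearity holds for A ...
print(np.allclose(A_map(alpha * x1 + beta * x2),
                  alpha * A_map(x1) + beta * A_map(x2)))      # True
# ... but fails for the affine map H (since b != 0 and alpha + beta != 1) ...
print(np.allclose(H_map(alpha * x1 + beta * x2),
                  alpha * H_map(x1) + beta * H_map(x2)))      # False
# ... while H - H(0) is again linear.
G = lambda x: H_map(x) - H_map(np.zeros(2))
print(np.allclose(G(alpha * x1 + beta * x2),
                  alpha * G(x1) + beta * G(x2)))              # True
```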
Theorem 1.4.1. Let $(X,\|\cdot\|_X)$ and $(Y,\|\cdot\|_Y)$ be normed spaces. For a linear map $T : (X,\|\cdot\|_X) \to (Y,\|\cdot\|_Y)$ the following conditions are equivalent.
1. $T$ is continuous.
2. $T$ is continuous at the origin, i.e. if the sequence $x_n \to 0$ in $(X,\|\cdot\|_X)$ then $T(x_n) \to 0$ in $(Y,\|\cdot\|_Y)$.
3. There exists a real number $K > 0$ such that $\|T(x)\|_Y \le K \|x\|_X$ for all $x$ in $X$.

Proof. (1 $\Rightarrow$ 2) Follows immediately because every continuous map is continuous at every point.
(2 $\Rightarrow$ 1) Suppose that for any sequence $\{x_n\}$ in $X$ which converges to zero, $T(x_n)$ converges to zero. Let $\{z_n\}$ be a sequence which converges to $z_0$ in $X$. It follows clearly that the sequence $\{z_n - z_0\}$ converges to zero in $X$ and therefore $T(z_n - z_0)$ converges to zero. From linearity of the map $T$ it follows that $T(z_n) - T(z_0)$ converges to zero, which implies that $T(z_n)$ converges to $T(z_0)$. Thus we have shown that if $z_n$ converges to $z_0$ in $X$ then $T(z_n)$ converges to $T(z_0)$ in $Y$.
(2 $\Rightarrow$ 3) Suppose that for every positive integer $n$ there exists an element $x_n$ in $X$ such that $\|T(x_n)\|_Y > n\|x_n\|_X$. Let $z_n := \frac{x_n}{n\|x_n\|_X}$. Then $\|z_n\|_X = \frac{1}{n}$ and therefore $z_n$ converges to $0$ in $X$. However, $\|T(z_n)\|_Y > 1$ for all $n$, which implies that $T(z_n)$ does not converge to zero. This contradicts the fact that $z_n \to 0$.
(3 $\Rightarrow$ 2) Suppose there exists a real number $K$ such that $\|T(x)\|_Y \le K\|x\|_X$ for all $x$ in $X$. Let $x_n$ be a sequence in $X$ which converges to zero. Then, because $\|T(x_n)\|_Y \le K\|x_n\|_X$, it follows that $T(x_n)$ converges to zero in $Y$. $\Box$

A map $f$ from a normed vector space $X$ to a normed vector space $Y$ is said to be a bounded map if there exists a real number $K$ such that $\|f(x)\| \le K\|x\|$. Thus, for a linear map the adjectives continuous and bounded can be used interchangeably. For any map $f$ which maps a normed vector space $(X,\|\cdot\|_X)$ to another normed vector space $(Y,\|\cdot\|_Y)$ we define
\[ \|f\| := \sup\{\|f(x)\|_Y : \|x\|_X \le 1\}. \]
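To make condition 3 and the definition of $\|f\|$ concrete (this numerical illustration is ours, not the authors'), the Python sketch below, assuming NumPy and a small example matrix, estimates the induced norm of a matrix $A$ viewed as a linear map from $(R^n, |\cdot|_1)$ to $(R^m, |\cdot|_\infty)$. For this particular pair of norms the supremum over the unit ball is attained at a standard basis vector, so it equals the largest absolute entry of $A$.

```python
import numpy as np

def induced_norm_1_to_inf(A):
    """Induced norm of A : (R^n, |.|_1) -> (R^m, |.|_inf).

    The l1 unit ball has the signed standard basis vectors as extreme points,
    so sup{ ||A x||_inf : |x|_1 <= 1 } is attained at some +-e_j and equals
    max_{i,j} |a_ij|.
    """
    return np.max(np.abs(A))

def random_sampling_lower_bound(A, trials=10000, seed=0):
    """Monte-Carlo lower bound: sample x with |x|_1 = 1 and keep the best value."""
    rng = np.random.default_rng(seed)
    best = 0.0
    for _ in range(trials):
        x = rng.standard_normal(A.shape[1])
        x /= np.sum(np.abs(x))            # normalize so |x|_1 = 1
        best = max(best, np.max(np.abs(A @ x)))
    return best

A = np.array([[1.0, -3.0], [2.0, 0.5]])
print(induced_norm_1_to_inf(A))        # 3.0
print(random_sampling_lower_bound(A))  # a lower bound, close to 3.0
```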
1.5 Finite Dimensional Spaces
Finite dimensional spaces enjoy a number of properties which do not hold for general infinite dimensional spaces. Here we show that any normed finite dimensional space with dimension $n$ is essentially similar to $R^n$ with any norm defined on it.

Lemma 1.5.1. Let $I$ be a closed bounded interval in $R$ (that is, $I = [a,b]$ with $a$ and $b$ in $R$). Then $I$ is compact.

Proof. Let $\{A_\lambda\}_{\lambda \in \Lambda}$ be an open covering of $I$. Let
\[ r = \sup\{t : a \le t \le b+1, \text{ such that } [a,t] \text{ has a finite subcover from } \{A_\lambda\}\}. \]
Suppose $r \le b$. Then there exists $\lambda_0 \in \Lambda$ such that $r \in A_{\lambda_0}$. As $A_{\lambda_0}$ is open there exists an $\epsilon > 0$ such that $[r-\epsilon, r+\epsilon] \subset A_{\lambda_0}$. From the definition of $r$ it follows that $[a, r-\epsilon]$ admits a finite subcover from $\{A_\lambda\}$. This subcover together with $A_{\lambda_0}$ forms a finite subcover of $[a, r+\epsilon]$. This contradicts the definition of $r$. Thus $r > b$, and therefore $I = [a,b]$ itself admits a finite subcover. This proves the lemma. $\Box$

Now we prove the well known Heine-Borel theorem.

Theorem 1.5.1 (Heine-Borel). Every closed bounded subset $C$ of $(R^n, |\cdot|_1)$ is compact.
Proof. Because $C$ is bounded there exist bounded closed intervals $I_k$ for $k = 1,\ldots,n$ in $R$ such that $C \subset \Pi_{k=1}^{n} I_k$. As the $I_k$ are compact (see Lemma 1.5.1), it follows from Theorem 1.2.4 (Tychonoff's theorem) that $\Pi_{k=1}^{n} I_k$ is compact. As $C$ is a closed subset of $\Pi_{k=1}^{n} I_k$ it follows from Lemma 1.2.12 that $C$ is compact. $\Box$

Lemma 1.5.2. Every linear map from $(R^n, |\cdot|_1)$ to any normed space $(X, \|\cdot\|_X)$ is continuous.
Proof. Let $e_i$ be the natural basis for $R^n$, where $e_i$ is the $n$-tuple with $1$ in the $i$th place and zeros elsewhere. Suppose $x_k \to x_0$ in $(R^n, |\cdot|_1)$, that is, $|x_k - x_0|_1 \to 0$. If $x_k = \sum_{i=1}^{n} a_i^k e_i$ and $x_0 = \sum_{i=1}^{n} a_i^0 e_i$ then this implies that $a_i^k \to a_i^0$ for all $i = 1,\ldots,n$. Now, if $T : (R^n, |\cdot|_1) \to (X, \|\cdot\|_X)$ is linear then
\[ \|T(x_k) - T(x_0)\| = \|T(x_k - x_0)\| = \Big\| T\Big( \sum_{i=1}^{n} (a_i^k - a_i^0) e_i \Big) \Big\| \le \max_{1 \le i \le n} |a_i^k - a_i^0| \sum_{i=1}^{n} \|T(e_i)\|. \]
As $a_i^k - a_i^0 \to 0$ for all $i = 1,\ldots,n$ it follows that $T(x_k)$ converges to $T(x_0)$. Therefore, $T$ is continuous. $\Box$

The proof above is essentially a consequence of the fact that in $(R^n, |\cdot|_1)$ convergence is equivalent to componentwise convergence. In the next lemma we show that for any finite dimensional normed space there exists a map between it and $(R^n, |\cdot|_1)$ which is continuous and whose inverse (which exists) is also continuous.

Lemma 1.5.3. Let $(X, \|\cdot\|)$ be an $n$ dimensional normed vector space with the vectors $\{x_1,\ldots,x_n\}$ forming a basis for $X$. For every $x \in X$ there exists a unique set of $n$ scalars $\alpha_i$, with $i = 1,\ldots,n$, such that $x = \sum_{i=1}^{n} \alpha_i x_i$. If $T$ denotes the map from $(X, \|\cdot\|)$ to $(R^n, |\cdot|_1)$, where $|\cdot|_1$ is the one norm on $R^n$ and $T(x) := (\alpha_1,\ldots,\alpha_n)$ for $x = \sum_{i=1}^{n} \alpha_i x_i$, then $T$ is a linear homeomorphism.

Proof. Suppose $x = \sum_{i=1}^{n} \alpha_i x_i = \sum_{i=1}^{n} \beta_i x_i$; then it follows that $\sum_{i=1}^{n} (\alpha_i - \beta_i) x_i = 0$. As the $x_i$ are independent it follows that $\alpha_i = \beta_i$ for $i = 1,\ldots,n$. Therefore, there exists a unique set of scalars $\alpha_i$ such that $x = \sum_{i=1}^{n} \alpha_i x_i$. This implies that $T$ is a well defined function. It is clear that $T$ is one-to-one, onto and linear (the proof is left to the reader). As $T$ is one-to-one and onto, $T^{-1}(x)$ consists only of a single element from $X$, and therefore $T^{-1}$ is a linear map from $(R^n, |\cdot|_1)$ to $(X, \|\cdot\|)$. From Lemma 1.5.2 it follows that $T^{-1}$ is a continuous map.
Suppose $T$ is not continuous. Then, from Theorem 1.4.1, $T$ is not continuous at zero. Therefore, there exists a sequence $\{x_k\}$ such that $x_k \to 0$ in $(X, \|\cdot\|)$ but $T(x_k) \not\to 0$ in $(R^n, |\cdot|_1)$. This implies that there exists an $\epsilon > 0$ such that for every integer $j$ there exists $x_{k_j}$ with $k_j \ge j$ and $|T(x_{k_j})|_1 \ge \epsilon$. Let $y_j := \frac{x_{k_j}}{|T(x_{k_j})|_1}$. Then $|T(y_j)|_1 = 1$ and $y_j \to 0$ in $(X, \|\cdot\|)$.

$T(y_j)$ is a sequence inside the compact set $B := \{a : a \in R^n, |a|_1 \le 1\}$, and therefore from Theorem 1.5.1 (the Heine-Borel theorem) and Lemma 1.3.3 it follows that there exists a subsequence of $T(y_j)$ which converges to some element $p$ in $R^n$. Without loss of generality we assume that the sequence $T(y_j)$ is convergent. As the norm is a continuous function it follows that $|p|_1 = 1$. Because $T^{-1}$ is a continuous function and $T(y_j) \to p$, it follows that $T^{-1}(T(y_j)) \to T^{-1}(p)$ and therefore $y_j \to T^{-1}(p)$. But $y_j \to 0$ and therefore $T^{-1}(p) = 0$, which implies that $p = 0$. This is a contradiction to the fact that $|p|_1 = 1$. Therefore, $T$ is continuous. $\Box$

Thus we have shown that there exists a homeomorphism between any finite dimensional normed vector space of dimension $n$ and $(R^n, |\cdot|_1)$. It follows that there exists a homeomorphism between any two normed finite dimensional spaces which have the same dimension.

Corollary 1.5.1. Let $(X, \|\cdot\|_a)$ and $(X, \|\cdot\|_b)$ be the same $n$ dimensional
vector space $X$, with two different norms defined. Then there exist constants $m > 0$ and $M > 0$ such that
\[ m \|x\|_a \le \|x\|_b \le M \|x\|_a \quad \text{for all } x \in X. \]

Proof. Let the basis for the vector space $X$ be given by $\{x_1,\ldots,x_n\}$. From Lemma 1.5.3 it follows that there exists a map $T : (X, \|\cdot\|_a) \to (R^n, |\cdot|_1)$ such that if $x = \sum_{i=1}^{n} \alpha_i x_i$ then $T(x) = (\alpha_1,\ldots,\alpha_n)$, and both $T$ and $T^{-1}$ are continuous linear maps (the fact that $T^{-1}$ is linear is left for the reader to prove). From Theorem 1.4.1 it follows that there exist constants $K_1$ and $K_2$ such that $|T(x)|_1 \le K_1 \|x\|_a$ and $\|x\|_a = \|T^{-1}(T(x))\|_a \le K_2 |T(x)|_1$. Therefore, there exist constants $K_1$ and $K_2$ such that $\frac{1}{K_1} |T(x)|_1 \le \|x\|_a \le K_2 |T(x)|_1$. Similarly, there exist constants $N_1$ and $N_2$ such that $\frac{1}{N_1} |T(x)|_1 \le \|x\|_b \le N_2 |T(x)|_1$. The corollary follows easily from these inequalities. $\Box$
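As a concrete instance of this corollary (our illustration, not from the text), take $\|\cdot\|_a = |\cdot|_\infty$ and $\|\cdot\|_b = |\cdot|_1$ on $R^n$, for which the sharp constants are $m = 1$ and $M = n$. The Python sketch below, assuming NumPy, checks these bounds empirically on random vectors.

```python
import numpy as np

# Norm equivalence on R^n:  |x|_inf <= |x|_1 <= n * |x|_inf.
# The sampling below gives an empirical check of these constants.
def empirical_ratio_bounds(n, trials=100000, seed=0):
    rng = np.random.default_rng(seed)
    ratios = []
    for _ in range(trials):
        x = rng.standard_normal(n)
        ratios.append(np.sum(np.abs(x)) / np.max(np.abs(x)))  # |x|_1 / |x|_inf
    return min(ratios), max(ratios)

lo, hi = empirical_ratio_bounds(4)
print(lo, hi)   # observed ratios lie in [1, 4], consistent with m = 1, M = n
```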
Corollary 1.5.2. Every finite dimensional normed vector space is complete, and every finite dimensional subspace of a normed vector space is closed in the relative topology.

Proof. Let $(X, \|\cdot\|)$ be a finite dimensional normed vector space. Let $T$ be the homeomorphism between $(X, \|\cdot\|)$ and $(R^n, |\cdot|_1)$ (as given in Lemma 1.5.3). Suppose $x_k$ is a Cauchy sequence in $(X, \|\cdot\|)$. As $T$ is bounded there exists a constant $M$ such that $|T(x)|_1 \le M\|x\|$. This implies that $|T(x_k) - T(x_m)|_1 = |T(x_k - x_m)|_1 \le M \|x_k - x_m\|$. As the sequence $\{x_k\}$ is Cauchy it follows that $\{T(x_k)\}$ is Cauchy. $(R^n, |\cdot|_1)$ is a complete space (left to the reader to
prove) and therefore $T(x_k)$ converges to some element $q \in R^n$. As $T^{-1}$ is continuous it follows that $T^{-1}(T(x_k)) \to T^{-1}(q)$. Therefore $x_k$ is convergent in $X$, and hence $(X, \|\cdot\|)$ is a complete space. The proof of the second assertion, which is a consequence of the first assertion, is left to the reader. $\Box$

The following theorem says that the only normed vector spaces in which all norm closed and bounded sets are compact are the finite dimensional ones.

Theorem 1.5.2. In a normed vector space $(X, \|\cdot\|)$, the set $B := \{x : x \in X \text{ and } \|x\| \le 1\}$ is compact if and only if $X$ is finite dimensional.
Proof. ($\Leftarrow$) Suppose $X$ is a finite dimensional vector space with dimension $n$. From Lemma 1.5.3 we know that there exists a linear homeomorphism $T : (X, \|\cdot\|) \to (R^n, |\cdot|_1)$. Therefore there exists a constant $M$ such that $|T(x)|_1 \le M\|x\|$ for all $x \in X$, and it follows that $T(B)$ is a bounded set in $(R^n, |\cdot|_1)$. From Lemma 1.3.1 we know that $B$ is closed. As $T$ is continuous it follows from Lemma 1.2.5 that $T(B)$ is closed. Therefore, $T(B)$ is compact in $(R^n, |\cdot|_1)$ (see Theorem 1.5.1, the Heine-Borel theorem). As $T^{-1}$ is continuous it follows that $B = T^{-1}(T(B))$ is compact (Lemma 1.2.14).
($\Rightarrow$) Let $C_1$ and $C_2$ be two subsets of a vector space $X$. Then we define the sum of the two subsets as
\[ C_1 + C_2 := \{z : \text{there exist } x \in C_1 \text{ and } y \in C_2 \text{ such that } z = x + y\}. \]
Also, we define for a scalar $\alpha$ and a subset $C_1$ of the vector space $X$,
\[ \alpha C_1 := \{z : \text{there exists } x \in C_1 \text{ such that } z = \alpha x\}. \]
Let $B(x, \tfrac{1}{2}) := \{y : \|x - y\| < \tfrac{1}{2}\}$ for any $x$ in $X$. Then $\bigcup_{x \in B} B(x, \tfrac{1}{2})$ is an open cover of $B$. As $B$ is compact there exist vectors $x_1, x_2, \ldots, x_n$ in $B$ such that $B \subset \bigcup_{i=1}^{n} B(x_i, \tfrac{1}{2})$. Note that $B(x_i, \tfrac{1}{2}) \subset \{x_i\} + \tfrac{1}{2}B$. Therefore, $B \subset \bigcup_{i=1}^{n} (\{x_i\} + \tfrac{1}{2}B)$. Let $Y = \operatorname{span}\{x_1, x_2, \ldots, x_n\}$. Then it follows that $B \subset Y + \tfrac{1}{2}B \subset Y + \tfrac{1}{2}(Y + \tfrac{1}{2}B) \subset Y + \tfrac{1}{4}B$. Continuing in this manner we can establish that $B \subset Y + 2^{-m}B$ for any positive integer $m$. For every $x \ne 0$, $\frac{x}{\|x\|} \in B$, and therefore given $x \in X$ and $m > 0$ there exist $y_m \in Y$ and $b_m \in B$ such that $y_m + 2^{-m}b_m = \frac{x}{\|x\|}$. Because $Y$ is a vector space, for every $x \in X$ and $m > 0$ there exist $y_m \in Y$ and $b_m \in B$ such that $y_m + 2^{-m}\|x\|\, b_m = x$. This implies that for any $x \in X$ and $m > 0$ there exist $y_m \in Y$, $b_m \in B$ such that $\|y_m - x\| = 2^{-m}\|x\|\,\|b_m\| \le 2^{-m}\|x\|$. Therefore, $X = Y^-$. But $Y^-$ is $Y$ itself because $Y$ is finite dimensional (see Corollary 1.5.2). Therefore, $X$ is finite dimensional. $\Box$

This theorem indicates the scarceness of compact sets in the norm topology. As we will see in the next section, compactness is essential in optimization, and this will lead us to define less restrictive topologies in the next chapter.
1.6 Extrema of Real Valued Functions
In this section we provide characterizations for functions which allow for extrema to exist. It is shown that compactness of sets and continuity properties of functions play an important role for extrema to exist.

Definition 1.6.1 (Local extrema). Let $(X, \|\cdot\|_X)$ be a normed vector space and let $f : D \to R$ be a real valued function defined on a subset $D$ of $X$. An element $x_0$ in $X$ is a local minimum if there exists a neighbourhood $N$ of $x_0$ such that for all $x \in N \cap D$, $f(x_0) \le f(x)$; $x_0$ is a strict minimum if for all $x \in N \cap D$, $f(x_0) < f(x)$. Local maxima are defined analogously. Local extrema refers to either local minima or local maxima.

Theorem 1.6.1. If $(X, \tau)$ is a topological compact set and $f : (X, \tau) \to R$ is a real valued continuous function, then there exist elements $x_0$ and $x_1$ in $X$ such that $f(x_0) \le f(y)$ and $f(x_1) \ge f(y)$ for all $y \in X$.

Proof. Let $\mu := \inf\{f(x) : x \in X\}$. Then, from the definition of infimum, for every positive integer $n$ there exists an element $x_n$ such that $f(x_n) \le \mu + \frac{1}{n}$. Note that, as $x_n$ is a sequence in a compact set, from Corollary 1.2.1 it follows that there exists a subnet $y_\lambda$ of $x_n$ in $X$ on a directed set $\Lambda$ such that $y_\lambda \to x_0$ for some $x_0$ in $X$. As $f$ is continuous it follows that $f(y_\lambda) \to f(x_0)$. As $y_\lambda$ is a subnet of $x_n$ in $X$ on $\Lambda$ there exists a map $h : \Lambda \to I$, where $I$ is the set of positive integers, such that $x_{h(\lambda)} = y_\lambda$. Also, for every $n$ in $I$ there exists a $\lambda_n$ such that $\lambda_n \preceq \lambda$ implies that $n \le h(\lambda)$. Now, $f(y_\lambda) = f(x_{h(\lambda)}) \le \mu + \frac{1}{h(\lambda)}$. Given any $\epsilon > 0$ choose $N \in I$ such that $\frac{1}{N} \le \epsilon$. If $\lambda_N \preceq \lambda$ then $N \le h(\lambda)$ and therefore $f(y_\lambda) \le \mu + \frac{1}{h(\lambda)} \le \mu + \epsilon$. As $f(y_\lambda) \to f(x_0)$ we know that, given $\epsilon > 0$, there exists $\lambda_\epsilon$ such that if $\lambda_\epsilon \preceq \lambda$ then $|f(y_\lambda) - f(x_0)| \le \epsilon$. Let $\lambda_0$ be a majorant of $\lambda_N$ and $\lambda_\epsilon$. If $\lambda_0 \preceq \lambda$ then $f(x_0) \le \mu + 2\epsilon$, and because $\epsilon$ is an arbitrary positive real number it follows that $f(x_0) \le \mu$. As $x_0 \in X$ it follows that $f(x_0) = \mu$. Thus we have established that $x_0$ attains the infimum. The proof of the existence of $x_1$ is similar and is left to the reader. $\Box$
Definition 1.6.2 (Lower semicontinuity, Upper semicontinuity). Let $(X, \tau)$ be a topological set and let $f : (X, \tau) \to R$ be a real valued function such that every set of the form $\{x : f(x) > \alpha\}$, for $\alpha$ real, is open. Then $f$ is a lower semicontinuous function; $f$ is upper semicontinuous if $-f$ is lower semicontinuous.

For $\Lambda$ a directed set and $r_\lambda$ a net in $R$ on $\Lambda$ we use the notation $\liminf r_\lambda$ to represent
\[ \liminf r_\lambda := \sup_{\lambda_0 \in \Lambda} \Big( \inf_{\lambda_0 \preceq \lambda} r_\lambda \Big), \]
and we use $\limsup r_\lambda$ to represent
\[ \limsup r_\lambda := \inf_{\lambda_0 \in \Lambda} \Big( \sup_{\lambda_0 \preceq \lambda} r_\lambda \Big). \]
Lemma 1.6.1. Let $(X, \tau)$ be a topological compact set and $f : (X, \tau) \to R$. $f$ is lower semicontinuous if and only if
\[ f(\lim x_\lambda) \le \liminf f(x_\lambda), \]
for every convergent net $x_\lambda$. $f$ is upper semicontinuous if and only if
\[ f(\lim x_\lambda) \ge \limsup f(x_\lambda), \]
for every convergent net $x_\lambda$.

Proof. ($\Rightarrow$) Suppose $f$ is a lower semicontinuous function and suppose $x_\lambda$ is a convergent net with $\lim x_\lambda = x_0$. Choose any real number $t$ such that $t < f(x_0)$. As $f$ is lower semicontinuous it follows that the set $\{x : f(x) > t\}$ is in $\tau$. Note that $x_0$ is in this set and therefore $\{x : f(x) > t\}$ is a neighbourhood of $x_0$. As $x_\lambda \to x_0$ it follows that $x_\lambda$ is eventually in this set. Therefore, there exists a $\lambda_1$ such that if $\lambda_1 \preceq \lambda$ then $f(x_\lambda) > t$, which implies that $\inf_{\lambda_1 \preceq \lambda} f(x_\lambda) \ge t$. Therefore, $\liminf f(x_\lambda) = \sup_{\lambda_0 \in \Lambda} \inf_{\lambda_0 \preceq \lambda} f(x_\lambda) \ge t$. This is true for any $t < f(x_0)$ and therefore $\liminf f(x_\lambda) \ge f(x_0)$.
($\Leftarrow$) Suppose that, for every convergent net $x_\lambda$, $f(\lim x_\lambda) \le \liminf f(x_\lambda)$. Consider any set $F := \{x : f(x) \le t\}$ where $t \in R$. Suppose $x_0 \in F^-$. Then from Lemma 1.2.3 it follows that there exists a net $x_\lambda$ in $F$ on a directed set $\Lambda$ such that $x_\lambda \to x_0$. From the assumption we have $f(x_0) \le \liminf f(x_\lambda) \le t$. Therefore, $x_0 \in F$, which implies that $F$ is closed. It follows that $\{x : f(x) > t\} = X \setminus F$ is open. Thus we have shown that $f$ is lower semicontinuous if and only if $f(\lim x_\lambda) \le \liminf f(x_\lambda)$. The rest of the lemma is left as an exercise. $\Box$

Corollary 1.6.1. If $(X, \tau)$ is a topological compact set and $f : (X, \tau) \to R$ is a real valued lower semicontinuous function, then there exists an element $x_0$ in $X$ such that $f(x_0) \le f(y)$ for all $y \in X$. Similarly, if $f$ is upper semicontinuous then there exists an element $x_1$ in $X$ such that $f(x_1) \ge f(y)$ for all $y \in X$.

Proof. Follows from Lemma 1.6.1 and arguments similar to the one used in proving Theorem 1.6.1. $\Box$

It is clear that the topology of a set is vital in determining whether extrema for a function exist or not. In many cases the function is a measure of a physical quantity which needs to be minimized or maximized on the given set. We have seen in the previous section that the norm topology is particularly restrictive for infinite dimensional spaces because of the dearth of compact sets in this topology (a norm bounded ball is not compact in the norm topology; see Theorem 1.5.2). Therefore, for infinite dimensional spaces
it is worthwhile to study relevant topologies other than the norm topology. We do this in the next chapter.

The fact that the derivative of a real valued function $f : R \to R$ vanishes where it has a local maximum or a local minimum is a classical result. Now we generalize this result.

Definition 1.6.3 (Gateaux derivative). Let $X$ be a vector space with an open subset $D$ and let $(Y, \|\cdot\|_Y)$ be a normed vector space with a map $f : D \to Y$ defined. For an element $x \in X$ and $h \in X$, $f$ is said to be Gateaux differentiable at $x$ with increment $h$ if there exists an element $f_h(x) \in Y$ such that
\[ \Big\| \frac{f(x + \alpha h) - f(x)}{\alpha} - f_h(x) \Big\|_Y \to 0 \quad \text{as } \alpha \to 0. \]
$f_h(x)$ is called the Gateaux derivative of $f$ at $x$ in the direction $h$. If $f$ is Gateaux differentiable at $x$ with all increments $h \in X$ then $f$ is said to be Gateaux differentiable at $x$. If $f$ is Gateaux differentiable at all $x \in X$ then $f$ is Gateaux differentiable. Note that for a differentiable function $f : R \to R$ the notions of the Gateaux derivative and the ordinary derivative coincide.

Theorem 1.6.2. Let $f : (X, \|\cdot\|_X) \to R$ be a real valued Gateaux differentiable function on a normed vector space $(X, \|\cdot\|_X)$. An element $x_0$ in $X$ is a local extremum only if $f_h(x_0) = 0$ for all $h \in X$.

Proof. Suppose at $x_0$ there is a local minimum. Then there exists an $\epsilon > 0$ such that $\|z\|_X \le \epsilon$ implies that $f(x_0 + z) - f(x_0) \ge 0$. Therefore, if $\alpha > 0$ and $\|\alpha h\|_X \le \epsilon$ then $\frac{f(x_0 + \alpha h) - f(x_0)}{\alpha} \ge 0$. Letting $\alpha \to 0$ while keeping $\alpha$ positive we see that $f_h(x_0) \ge 0$. Similarly, note that if $\alpha < 0$ and $\|\alpha h\|_X \le \epsilon$ then $\frac{f(x_0 + \alpha h) - f(x_0)}{\alpha} \le 0$. Letting $\alpha \to 0$ while keeping $\alpha$ negative we see that $f_h(x_0) \le 0$. Thus $f_h(x_0) = 0$. Similarly, it can be shown that if at $x_0$ there is a local maximum then for all $h \in X$, $f_h(x_0) = 0$. $\Box$
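As a numerical companion to this definition (our addition; the function and step size are chosen for illustration only), the Python sketch below approximates the Gateaux derivative of $f(x) = \|x\|_2^2$ on $R^n$ in a direction $h$ by the difference quotient $\frac{f(x+\alpha h)-f(x)}{\alpha}$ for small $\alpha$, and compares it with the closed-form value $2\langle x, h\rangle$.

```python
import numpy as np

def gateaux_derivative(f, x, h, alpha=1e-6):
    """Finite-difference approximation of the Gateaux derivative of f at x
    in the direction h: (f(x + alpha*h) - f(x)) / alpha for small alpha."""
    return (f(x + alpha * h) - f(x)) / alpha

f = lambda x: np.dot(x, x)          # f(x) = ||x||_2^2, with Gateaux derivative 2<x, h>
x = np.array([1.0, -2.0, 0.5])
h = np.array([0.3, 0.1, -1.0])

print(gateaux_derivative(f, x, h))  # approximately 2*<x, h>
print(2 * np.dot(x, h))             # exact value: -0.8
```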
2. Functions on Vector Spaces
One of the most important results in optimization theory in vector spaces is the Hahn-Banach theorem, which admits many versions. The extension and the geometric versions of the Hahn-Banach theorem are the most important for our development. In this chapter we develop the extension form and leave the geometric form for the next chapter. Very intimately related to the Hahn-Banach theorem is the study of sublinear functions. Sublinear functions also provide the basis for convex analysis. We study sublinear functions in the first section of this chapter. In the previous chapter, the importance of compactness of a set and the continuity properties of a function were elucidated. It was also shown that in infinite dimensional spaces there is a dearth of compact sets in the norm topology (see Theorem 1.5.2). This necessitates the search for other topologies where the conditions for compactness are relatively easier to satisfy. This leads us to the study of dual spaces and weak topologies. The study of dual spaces, which is essentially the study of bounded linear functions, also lays down the foundation for the separation of disjoint convex sets. We end this chapter with a study of $\ell_p$ spaces, which serve as an example for the various concepts presented. These spaces are also important in the formulation of discrete-time robust control problems.
2.1 Sublinear Functions
The various versions of the Hahn-Banach theorem, the results on linear functions and the weak topologies are all related to sublinear functions. We study sublinear functions with the aim of establishing the Hahn-Banach theorem. We call a function $f : X \to R \cup \{\infty\} \cup \{-\infty\}$ real valued on the set $X$ if $f(x) \in R$ for all $x \in X$.

Definition 2.1.1 (Sublinear functions). Let $X$ be a vector space. We say $f : X \to R$ is a sublinear function if it satisfies the following properties.
1. For elements $x$ and $y$ in $X$, $f(x+y) \le f(x) + f(y)$. Any function satisfying this property is called a subadditive function.
2. For a scalar $\alpha \ge 0$ and $x$ in $X$, $f(\alpha x) = \alpha f(x)$. Any function satisfying this property is called a positively homogeneous function.
Note that for a sublinear function $f$, $f(0) = 0$.

Lemma 2.1.1. Let $X$ be a vector space and let $S : X \to R$ and $T : X \to R \cup \{\infty\}$ be functions on $X$ such that the following conditions hold.
1. For all elements $x$ and $y$ in $X$, $S(x+y) \le S(x) + S(y)$ and $T(x+y) \le T(x) + T(y)$.
2. For all elements $x$ in $X$ and real numbers $\alpha > 0$, $S(\alpha x) = \alpha S(x)$ and $T(\alpha x) = \alpha T(x)$.
3. $S(0) = T(0) = 0$.
Define $U(x) := \inf\{S(x - v) + T(v) : v \in X\}$. Also, suppose there exists a $u \in X$ such that $U(u) > -\infty$. Then $U$ is a real valued sublinear function such that for all $x$ in $X$, $U(x) \le S(x)$ and $U(x) \le T(x)$.

Proof. Given elements $x$ and $y$ in $X$,
\[
\begin{aligned}
U(x+y) &= \inf\{S(x+y-v) + T(v) : v \in X\} \\
&= \inf\{S(x+y-v_1-v_2) + T(v_1+v_2) : v_1, v_2 \in X\} \\
&\le \inf\{S(x-v_1) + T(v_1) + S(y-v_2) + T(v_2) : v_1, v_2 \in X\} \\
&= U(x) + U(y).
\end{aligned}
\]
The second equality is true because $\{v \in X\} = \{v_1 + v_2 \in X\}$ (as $X$ is a vector space), and the inequality is true because of the subadditivity of $S$ and $T$. This establishes the subadditivity of $U$. Suppose $\alpha$ is a positive real number; then
\[ U(\alpha x) = \inf\{S(\alpha x - v) + T(v) : v \in X\} = \inf\{S(\alpha x - \alpha v) + T(\alpha v) : v \in X\} = \inf\{\alpha S(x - v) + \alpha T(v) : v \in X\} = \alpha U(x). \]
The second equality above is true because $\{v \in X\} = \{\alpha v \in X\}$ (as $X$ is a vector space). Suppose $x \in X$; then, as $0 \in X$, we have $U(x) = \inf\{S(x-v) + T(v) : v \in X\} \le S(x) + T(0) = S(x)$, and, as $x \in X$, we have $U(x) = \inf\{S(x-v) + T(v) : v \in X\} \le S(0) + T(x) = T(x)$. This implies that $U(x) \le S(x)$ and $U(x) \le T(x)$ for all $x$ in $X$. Note that $U(x) \le S(x) < \infty$ for all $x$ in $X$. Since $U$ satisfies $U(u) > -\infty$ for some $u \in X$, we have that $U(u)$ is real (that is, $|U(u)| < \infty$). Therefore, $U(u) = U(u+0) \le U(u) + U(0)$, which implies that $U(0) \ge 0$. However, $U(0) \le S(0) = 0$. Hence $U(0) = 0$. Note that for any $x$ in $X$, $U(x) + U(-x) \ge U(x-x) = 0$. That is, $U(x) \ge -U(-x) \ge -S(-x) > -\infty$. This implies that for all $x$ in $X$, $U(x) > -\infty$. We have established earlier that $U(x) < \infty$, and thus $U$ is a real valued function. $\Box$
Lemma 2.1.2. Let $S : X \to R$ be a sublinear function on a vector space $X$, and let $x_0$ be an element in $X$. Define for every $w \in X$
\[ U(w) := \inf\{S(w + \alpha x_0) - \alpha S(x_0) : \alpha \ge 0\}. \]
Then $U$ is a real valued sublinear function such that for all $x \in X$, $U(x) \le S(x)$ and $U(-x_0) \le -S(x_0)$.

Proof. We assume that $x_0 \ne 0$ (otherwise it follows that for $w \in X$, $U(w) = \inf\{S(w) - \alpha S(0) : \alpha \ge 0\} = S(w)$). Define
\[ A := \{y \in X : \text{there exists } \alpha \in R, \ \alpha \ge 0, \text{ with } y = -\alpha x_0\}, \]
and $B = X \setminus A$. Define a function $T$ on $X$ as
\[ T(y) = -\alpha S(x_0) \ \text{ if } y \in A \text{ with } y = -\alpha x_0 \text{ where } \alpha \ge 0, \qquad T(y) = \infty \ \text{ if } y \in B. \]
If $x$ and $y$ are elements in $A$ then it follows easily that $T(x+y) = T(x) + T(y)$, and if $\alpha$ is any non-negative real number then $\alpha x \in A$ and $T(\alpha x) = \alpha T(x)$. If $x \in A$ and $z \in B$ then $T(z) = \infty$, and it follows that $T(x+z) \le T(x) + T(z)$. Also, if $z \in B$ and $\alpha > 0$ then $\alpha z \in B$ and $T(\alpha z) = \infty = \alpha T(z)$. Note that
\[ U(0) = \inf\{S(\alpha x_0) - \alpha S(x_0) : \alpha \ge 0\} = \inf\{\alpha S(x_0) - \alpha S(x_0) : \alpha \ge 0\} = 0. \]
Also, for any $w \in X$,
\[ \inf\{S(w - v) + T(v) : v \in X\} = \inf\{S(w - v) + T(v) : v \in A\} = \inf\{S(w + \alpha x_0) - \alpha S(x_0) : \alpha \ge 0\} = U(w). \]
The first equality is true because $T(v) = \infty$ if $v \notin A$. Thus $S$, $T$ and $U$ satisfy the conditions stipulated in Lemma 2.1.1. Therefore, it follows from Lemma 2.1.1 that $U$ is a real valued sublinear function on $X$ such that $U(x) \le S(x)$ and $U(x) \le T(x)$ for all $x \in X$. This implies that $U(-x_0) \le T(-x_0) = -S(x_0)$. This proves the lemma. $\Box$

Theorem 2.1.1 (Hahn-Banach). Let $X$ be a vector space and let $S : X \to R$ be a sublinear function on $X$. Then there exists a linear function $L : X \to R$ such that $L(x) \le S(x)$ for all $x \in X$.
Proof. Define $\mathcal{F}$ to be the set $\{f : X \to R \text{ such that } f \text{ is sublinear and } f(x) \le S(x) \text{ for all } x \in X\}$. Define the order $\prec$ on $\mathcal{F}$ by $f \prec g$ if $f(x) \ge g(x)$ for all $x$ in $X$. We will prove the existence of $L$ by proving that $\mathcal{F}$ is inductively ordered and then applying Zorn's lemma to it. Let $\mathcal{G}$ be a totally ordered subset of $\mathcal{F}$. We will show that $\mathcal{G}$ has a majorant in $\mathcal{F}$. Define for $w$ in $X$
\[ M(w) := \inf\{g(w) : g \in \mathcal{G}\}. \]
It is clear that $g \prec M$ for any $g \in \mathcal{G}$. Also note that for any element $f$ in $\mathcal{F}$ and any element $w$ in $X$, $0 = f(w - w) \le f(w) + f(-w)$. Therefore, $f(w) \ge -f(-w) \ge -S(-w) > -\infty$. Therefore, for any element $w \in X$, $M(w) > -\infty$. As $M(w) \le S(w) < \infty$ for all $w \in X$, it follows that $M$ is real valued. Let $\alpha$ be a non-negative real number. Then for any $w \in X$ we have
\[ M(\alpha w) = \inf\{g(\alpha w) : g \in \mathcal{G}\} = \inf\{\alpha g(w) : g \in \mathcal{G}\} = \alpha \inf\{g(w) : g \in \mathcal{G}\} = \alpha M(w). \]
The second equality follows from the sublinearity of the elements of $\mathcal{G}$. We now show that $M$ is subadditive. Let $h$ and $g$ be any two elements of $\mathcal{G}$. Then, as $\mathcal{G}$ is totally ordered, either $g \prec h$ or $h \prec g$. Let $x$ and $y$ be any two elements in $X$. If $h \prec g$ then $h(x) + g(y) \ge g(x) + g(y) \ge g(x+y)$, and if $g \prec h$ then $h(x) + g(y) \ge h(x) + h(y) \ge h(x+y)$. In either case there exists a function $k$ in $\mathcal{G}$ such that $h(x) + g(y) \ge k(x+y)$. It follows that
\[ M(x) + M(y) = \inf\{h(x) + g(y) : h, g \in \mathcal{G}\} \ge \inf\{h(x+y) : h \in \mathcal{G}\} = M(x+y). \]
Thus $M$ is a real valued sublinear function. It is clear that $M(x) \le S(x)$ for all $x \in X$ and that $g \prec M$ for any $g \in \mathcal{G}$. Therefore, we have shown that every totally ordered subset of $\mathcal{F}$ has a majorant in $\mathcal{F}$, which implies that $\mathcal{F}$ is an inductively ordered set. Applying Zorn's lemma (Lemma 1.1.1) to $\mathcal{F}$ we conclude that there exists an element $L$ in $\mathcal{F}$ such that if $L \prec f$ then $L = f$. As $L \in \mathcal{F}$ it follows that $L$ is sublinear and $L(x) \le S(x)$ for all $x \in X$. Let $x_0$ be any element of $X$. Then from Lemma 2.1.2 we have that there exists a real valued sublinear function $U$ such that for all $y \in X$, $U(y) \le L(y) \le S(y)$ and $U(-x_0) \le -L(x_0)$. As $L$ is a maximal element of $\mathcal{F}$ and $L \prec U$, it follows that $U = L$. This implies that $L(-x_0) \le -L(x_0)$; since sublinearity gives $0 = L(x_0 - x_0) \le L(x_0) + L(-x_0)$, we conclude that $L(-x_0) = -L(x_0)$. As $x_0$ was chosen arbitrarily it follows that $L(-x) = -L(x)$ for all $x \in X$. If $\alpha$ is a negative real number then $-\alpha L(x) = L(-\alpha x) = -L(\alpha x)$ (the first equality is true because $L$ is sublinear and $-\alpha > 0$). Therefore, $L(\alpha x) = \alpha L(x)$ for any $\alpha \in R$ and any $x \in X$. Suppose $x_1$ and $x_2$ are elements of $X$. Then
\[ L(-x_1 - x_2) = -L(x_1 + x_2) \ge -L(x_1) - L(x_2) = L(-x_1) + L(-x_2) \ge L(-x_1 - x_2). \]
It follows that for elements $x_1$ and $x_2$ in $X$, $L(-x_1 - x_2) = L(-x_1) + L(-x_2)$, which implies that if $x_1$ and $x_2$ are any two elements in $X$ then $L(x_1 + x_2) = L(x_1) + L(x_2)$. Thus we have shown that $L$ is a linear function such that $L(x) \le S(x)$ for all $x$ in $X$. $\Box$
Lemma 2.1.3. Let $X$ be a vector space with a sublinear function $S : X \to R$. Let $x_0$ be any element in $X$. Then there exists a linear function $L : X \to R$ such that $L(x) \le S(x)$ for all $x \in X$ and $L(x_0) = S(x_0)$.

Proof. For all $w$ in $X$ let $U(w) := \inf\{S(w + \alpha x_0) - \alpha S(x_0) : \alpha \ge 0\}$. From Lemma 2.1.2 we have that $U$ is a real valued sublinear function such that $U(x) \le S(x)$ for all $x \in X$ and $U(-x_0) \le -S(x_0)$. From Theorem 2.1.1 it follows that there exists a linear function $L : X \to R$ such that $L(x) \le U(x) \le S(x)$ for all $x$ in $X$ and $L(-x_0) \le U(-x_0) \le -S(x_0)$. This implies that $L(x_0) \ge S(x_0)$. Therefore, $L(x_0) = S(x_0)$, which proves the lemma. $\Box$

Theorem 2.1.2. Let $X$ be a vector space with $S : X \to R$ a sublinear function defined on $X$. Let $Y$ be a subspace of $X$. Let $M : Y \to R$ be a linear function on $Y$ such that for all $y \in Y$, $M(y) \le S(y)$. For any $w$ in $X$ let
\[ U(w) := \inf\{S(w - y) + M(y) : y \in Y\}. \]
Then $U$ is a real valued sublinear function on $X$ such that $U(x) \le S(x)$ for all $x$ in $X$ and $U(y) \le M(y)$ for all $y \in Y$. Also, there exists a linear function $L : X \to R$ such that $L(x) \le S(x)$ for all $x$ in $X$ and $L(y) = M(y)$ for all $y$ in $Y$.

Proof. For any $v$ in $X$ let $T(v) := M(v)$ if $v \in Y$ and $T(v) := \infty$ if $v \in X \setminus Y$. It can be shown that if $x_1$ and $x_2$ are any two elements in $X$ then $T(x_1 + x_2) \le T(x_1) + T(x_2)$, and that if $\alpha > 0$ is a real number then $T(\alpha x) = \alpha T(x)$. Note that for any $w$ in $X$,
\[ U(w) = \inf\{S(w - y) + M(y) : y \in Y\} = \inf\{S(w - v) + T(v) : v \in X\}, \]
because $T(v) = \infty$ if $v$ is not in $Y$. Also, $U(0) = \inf\{S(-y) + M(y) : y \in Y\} \ge \inf\{M(-y) + M(y) : y \in Y\} = 0 > -\infty$. Thus all the conditions on $T$, $S$ and $U$ stipulated in Lemma 2.1.1 are satisfied, and therefore $U$ is a real valued sublinear function such that for all $x$ in $X$, $U(x) \le S(x)$ and $U(x) \le T(x)$. Applying Theorem 2.1.1 to $U$ we deduce that there exists a linear function $L : X \to R$ such that for all $x$ in $X$, $L(x) \le U(x)$. In particular, for all $y$ in $Y$, $L(y) \le U(y) \le T(y) = M(y)$. As $Y$ is a vector space we know that for all $y$ in $Y$, $L(y) \le M(y)$ and $L(-y) \le M(-y)$. As $M$ and $L$ are linear we have that for all $y$ in $Y$, $L(y) = M(y)$. This proves the theorem. $\Box$
2.2 Dual Spaces

In this section we study the set of bounded linear functions on a normed vector space. As will be seen, the initial topology induced by the functions in this set leads to a desirable topology on vector spaces.

Theorem 2.2.1 (Bounded linear operators). The set of all bounded linear operators between normed vector spaces $(X, \|\cdot\|_X)$ and $(Y, \|\cdot\|_Y)$ is denoted by $B(X,Y)$. Thus $B(X,Y)$ is the set
\[ \{T : T \text{ is linear and there exists } K \in R \text{ such that } \|Tx\|_Y \le K\|x\|_X\}. \]
Associate to each $T \in B(X,Y)$ the number
\[ \|T\| := \sup\{\|Tx\|_Y : \|x\|_X \le 1\}. \tag{2.1} \]
This definition of $\|T\|$ makes $B(X,Y)$ into a normed vector space. If $Y$ is a Banach space then so is $B(X,Y)$ with the norm defined in (2.1).
Proof. The proof of the assertion that $B(X,Y)$ is a normed linear space with the norm defined by (2.1) is left as an exercise. Suppose $\{T_n\}$ is a Cauchy sequence in $B(X,Y)$. Then, given $\epsilon > 0$, there exists an integer $N$ such that $n, m \ge N$ implies that $\|T_n - T_m\| \le \epsilon$. This implies that $\sup\{\|T_n(x) - T_m(x)\|_Y : \|x\|_X \le 1\} \le \epsilon$. It follows that for a given $\epsilon > 0$ and for all $x \in X$ there exists an integer $N$ such that $n, m \ge N$ implies that $\|T_n(x) - T_m(x)\|_Y \le \epsilon \|x\|_X$. Therefore, for all $x \in X$, $\{T_n(x)\}$ is a Cauchy sequence in $Y$. Thus, for every element $x \in X$ there exists an element $T(x)$ in $Y$ to which $T_n(x)$ converges. Now we show that $T : X \to Y$ is a linear map. Note that, for any integer $n$, $\|T(x_1) + T(x_2) - T(x_1 + x_2)\|_Y \le \|T(x_1) - T_n(x_1)\|_Y + \|T(x_2) - T_n(x_2)\|_Y + \|T_n(x_1) + T_n(x_2) - T(x_1 + x_2)\|_Y$. From the linearity of $T_n$ and the continuity of $\|\cdot\|_Y$, the right hand side of the above inequality goes to zero as $n \to \infty$. Therefore, $T(x_1) + T(x_2) = T(x_1 + x_2)$. Also, $\|\alpha T_n(x) - T(\alpha x)\|_Y = \|T_n(\alpha x) - T(\alpha x)\|_Y$. From the definition of $T(\alpha x)$ the right hand side goes to zero as $n \to \infty$, and therefore $\alpha T(x) = T(\alpha x)$. This proves that $T$ is linear. Now we show that $T$ is bounded. As $\{T_n\}$ is a Cauchy sequence there exists an integer $M$ such that $m, n \ge M$ implies that $\|T_m - T_n\| \le 1$. Therefore, if $m \ge M$ then $\|T_m\| \le 1 + \|T_M\|$, which implies that if $m \ge M$ then $\|T_m(x)\|_Y \le (1 + \|T_M\|)\|x\|_X$ for all $x \in X$. Given any $x \in X$ and $\epsilon > 0$, choose $m$ such that $m \ge M$ and $\|T_m(x) - T(x)\|_Y \le \epsilon$. Therefore, $\|T(x)\|_Y \le (1 + \|T_M\|)\|x\|_X + \epsilon$. As $\epsilon$ is arbitrary it follows that $\|T(x)\|_Y \le (1 + \|T_M\|)\|x\|_X$. This relation holds for any $x \in X$; therefore $T$ is bounded. Given any real $\epsilon > 0$, choose $M_1$ such that $n, m \ge M_1$ implies that $\|T_n - T_m\| \le \frac{\epsilon}{2}$. This implies that for all $x \in X$, if $n, m \ge M_1$ then $\|T_n(x) - T_m(x)\|_Y \le \frac{\epsilon}{2}\|x\|_X$. Given any $x$, choose $n \ge M_1$ large enough so that
$\|T_n(x) - T(x)\|_Y \le \frac{\epsilon}{2}\|x\|_X$. Therefore, if $m \ge M_1$ then $\|T(x) - T_m(x)\|_Y \le \epsilon\|x\|_X$. This implies that $\|T - T_m\| \le \epsilon$ if $m \ge M_1$. As $\epsilon$ is arbitrary it follows that $\|T - T_m\| \to 0$. This proves that $T_n$ converges to $T$ in the norm topology. Thus we have established that any Cauchy sequence in $B(X,Y)$ converges if $Y$ is Banach. $\Box$

If $X$ is not the empty set or $\{0\}$ then it is also true that
\[ \|T\| = \sup\{\|Tx\|_Y : \|x\|_X = 1\} \quad \text{and} \quad \|T\| = \sup\Big\{\frac{\|Tx\|_Y}{\|x\|_X} : x \ne 0\Big\}. \]
The proof of this assertion is left to the reader.
Corollary 2.2.1 (Hahn-Banach: extension form). Let $Y$ be a subspace of a normed vector space $(X, \|\cdot\|_X)$. If $f : Y \to R$ is a bounded linear function on $Y$ then there exists a linear function $F : X \to R$ such that $F(y) = f(y)$ for all $y \in Y$ and $\|F\| = \|f\|$.

Proof. Let $S : X \to R$ be defined by $S(x) = \|f\| \, \|x\|_X$. Then it is clear that $S$ is a real valued sublinear function. It is also true that for all $y \in Y$, $|f(y)| \le S(y)$ (because $\|f\| = \sup\{|f(y)| : y \in Y, \|y\|_X \le 1\}$). From Theorem 2.1.2 it follows that there exists a linear function $F : X \to R$ such that
\[ F(x) \le S(x) \text{ for all } x \in X, \tag{2.2} \]
and
\[ F(y) = f(y) \text{ for all } y \in Y. \tag{2.3} \]
Equation (2.2) implies that $\|F\| \le \|f\|$ and equation (2.3) implies that $\|F\| \ge \|f\|$. Therefore, $\|F\| = \|f\|$. $\Box$

Definition 2.2.1 (Dual spaces). Let $(X, \|\cdot\|_X)$ be a normed vector space. Then $X^*$ denotes the set of bounded linear functions from $X$ to $R$. The norm of any element $x^*$ in $X^*$ is defined as
\[ \|x^*\| := \sup\{x^*(x) : \|x\|_X \le 1\}. \tag{2.4} \]
In other words we have $X^* = B(X,R)$. $X^*$ is said to be the dual space of $X$. It follows immediately from Theorem 2.2.1 that the dual space of any normed vector space is Banach (because $R$ is Banach).

Theorem 2.2.2. Let $x_0 \in X$, where $(X, \|\cdot\|_X)$ is a normed vector space with $X^*$ as its dual. Then there exists an element $x_0^* \in X^*$ such that $\|x_0^*\| = 1$ and $\langle x_0, x_0^* \rangle = \|x_0\|_X$.

Proof. Let $S : X \to R$ be defined by $S(x) := \|x\|_X$ for all $x \in X$. Then $S$ is a real valued sublinear function. From Lemma 2.1.3 we know that there exists a linear function $L : X \to R$ such that $L(x) \le S(x)$ for all $x \in X$ and $L(x_0) = S(x_0)$. This implies that $\|L\| = 1$ and $L(x_0) = \|x_0\|_X$. Denote $L$ by $x_0^*$. This proves the theorem. $\Box$
Theorem 2.2.3. Let $(X, \|\cdot\|_X)$ and $(Y, \|\cdot\|_Y)$ be normed vector spaces. Let $X \times Y$ be endowed with the product norm. Then there exists an isometric isomorphism between $(X \times Y)^*$ and $X^* \times Y^*$ with the norm on $X^* \times Y^*$ defined by
\[ \|(x^*, y^*)\| := \max\{\|x^*\|, \|y^*\|\}, \]
where $x^* \in X^*$ and $y^* \in Y^*$.

Proof. Let $f$ be a bounded linear real valued function on $X \times Y$. Then from the linearity of $f$ we have, for any $(x,y) \in X \times Y$,
\[ f((x,y)) = f((x,0) + (0,y)) = f((x,0)) + f((0,y)). \tag{2.5} \]
It is clear that $x^*$, the function on $X$ defined by $x^*(x) = f((x,0))$, is linear. Similarly, $y^*$, the function on $Y$ defined by $y^*(y) = f((0,y))$, is linear. Also, from equation (2.5) we have $|x^*(x) + y^*(y)| = |f(x,y)| \le \|f\|(\|x\|_X + \|y\|_Y)$ for all $(x,y) \in X \times Y$. In particular, $|x^*(x)| \le \|f\| \, \|x\|_X$, which implies that $\|x^*\| \le \|f\|$. Similarly, it can be shown that $\|y^*\| \le \|f\|$, and therefore
\[ \max\{\|x^*\|, \|y^*\|\} \le \|f\|. \]
Given any $\epsilon > 0$ we know that there exist $x_\epsilon \in X$ and $y_\epsilon \in Y$ such that $\|x_\epsilon\|_X + \|y_\epsilon\|_Y \le 1$ and $|f(x_\epsilon, y_\epsilon)| \ge \|f\| - \epsilon$. Therefore, for every $\epsilon > 0$ there exist $x_\epsilon \in X$ and $y_\epsilon \in Y$ such that $\|x_\epsilon\|_X + \|y_\epsilon\|_Y \le 1$ and $|x^*(x_\epsilon) + y^*(y_\epsilon)| \ge \|f\| - \epsilon$, which implies that $\|x^*\| \, \|x_\epsilon\|_X + \|y^*\| \, \|y_\epsilon\|_Y \ge \|f\| - \epsilon$. Thus,
\[ \max\{\|x^*\|, \|y^*\|\} (\|x_\epsilon\|_X + \|y_\epsilon\|_Y) \ge \|f\| - \epsilon. \]
As $\|x_\epsilon\|_X + \|y_\epsilon\|_Y \le 1$ and $\epsilon > 0$ is arbitrary, it follows that $\max\{\|x^*\|, \|y^*\|\} \ge \|f\|$. Thus we have established that the map $i : (X \times Y)^* \to X^* \times Y^*$ which takes $f \in (X \times Y)^*$ into $(x^*, y^*) \in X^* \times Y^*$ as defined above is isometric. The fact that $i$ is an isomorphism is left to the reader to prove. $\Box$

Definition 2.2.2 (Second dual space). Let $(X, \|\cdot\|_X)$ be a normed vector space with $(X^*, \|\cdot\|)$ as its dual. The set of all bounded linear functions from $X^*$ to $R$ is the second dual of $X$. It is denoted by $X^{**}$.

Every element $x$ in $X$ can be thought of as a map from $X^*$ into $R$. Thus every element $x \in X$ can be identified with an element in $X^{**}$. For any $x^*$ in $X^*$ let $(J(x))(x^*) := x^*(x)$. Note that $(J(x))(x_1^* + x_2^*) = (x_1^* + x_2^*)(x) = x_1^*(x) + x_2^*(x) = (J(x))(x_1^*) + (J(x))(x_2^*)$, where $x \in X$, $x_1^* \in X^*$ and $x_2^* \in X^*$. Also, $(J(x))(\alpha x^*) = (\alpha x^*)(x) = \alpha x^*(x) = \alpha (J(x))(x^*)$. Therefore, it follows that $J(x)$ is a linear map from $X^*$ to $R$. It is also a bounded linear function on $X^*$. Indeed, for $x \in X$ and $x^* \in X^*$, $|(J(x))(x^*)| = |x^*(x)| \le \|x^*\| \, \|x\|_X$. Therefore,
Thus we have established t h a t
IIJ(x)ll_ Ilxllx.
(2.7)
2.2 Dual Spaces
35
In fact we will show using a H a h n - B a n a c h result t h a t IIJ(x)ll = Ilxllx. We call the m a p J : X --4 X** the canonical map from X to X**. Let (x, II. [Ix) be a n o r m e d vector space with X* as its dual. We use the symmetric notation < x,x* > to denote x*(x). With this notation we have ( J ( x ) ) ( x * ) = < x * , J ( x ) > = < x,x* > . We will call < .,. > the bilinear form on X. D e f i n i t i o n 2.2.3 ( A d j o i n t m a p ) . Let A : (X, [ [ - [ [ x ) -+ (Y, [[" [Jr) be a bounded linear map from a normed vector space X to a normed vector space Y. The adjoint of the map A, denoted by A* is a map from Y* to X* defined by <x,A*(y*)>x:=
>v for a l l x 9
and for ally* 9
where < .,. > x is the bilinear form on X and < .,. > y is the bilinear form on Y. T h e o r e m 2.2.4. For any element xo of a normed linear space (X, in ' Ilx) there exists an element x~ in X* such that [Ix~[I <_ 1 and[ < x0,x~ > I -IIx011x. Also, if j : x x ' " is the canonical map then IlJ(x)fl = IJxllx. Proof. Note that JR' mix : X -+ R is a sublinear function on X. Applying L e m m a 2.1.3 we know that there exists a linear function L : X -~ R such that i ( x ) <_ lixi[x for all x E X and L(xo) = Ilxolix. Because L is linear it follows that IL(x)[ _< lixllx for all x in X which implies that [[51[ _< 1. Therefore, i is in X* and we denote i by x~. T h u s IIx~)]t 5 1 and < Xo,X~ > = [[xo[lx. We have established earlier t h a t [[J(xo)[[ _< [[xo[[x. However, I[J(xo)[[-s u p ( < x * , J ( x o ) >: IIx'll <_ 1} ___ < x ; , J ( x o ) > = < xo, x; > = IIx0llx. This proves that IlJ(x)llx = Ilxllx for all x in X. [] T h e o r e m 2.2.5. Let (X, [l" mix) and (Y, II [iv) be normed vector spaces with a linear map A defined from X to Y. Let A* : Y* -+ X* denote the adjoint of A. Then, IIAll = IIA*II that is sup{llA(x)l[y : x E X and IIxIIx <_ 1} = sup{]iA*(y*)[I : y* 9 Y* and I]Y*[I _~ 1}. Proof. T h e proof is similar to the proof of T h e o r e m 2.2.4 and is left to the reader. [] D e f i n i t i o n 2 . 2 . 4 ( R e f l e x i v e ) . Let (X, I1" IIx) be a formed vector space. We say X is reflexive if J ( X ) = X**. Thus for a reflexive space every element x** in X** can be identified with an element x in X.
36
2. Functions on Vector Spaces
2.3 Weak
Topologies
We s t u d y here the weakest t o p o l o g y on X which makes all b o u n d e d linear function on X continuous and the weakest t o p o l o g y on X* which makes all the elements of X viewed as functions on X* continuous. t o p o l o g y ) . Let (X, II'lIx) be a normed linear space with X* as its dual. The initial topology on X induced by the elements of X* is called the weak topology on X and is denoted by W ( X , X ' ) . The initial topoogy on X* induced by the set
D e f i n i t i o n 2.3.1 ( W e a k t o p o l o g y , W e a k - s t a r
J ( X ) :: {J(x) :x 9 X}, where J : X -+ X ' * is the canonical map is called the weak-star topology on X* and is denoted by W ( X * , X ) . Let A be any open set in R. As A is open, for every r 9 A there exists a positive n u m b e r e, > 0 such t h a t the set {t : It - r] < er} is a subset of A. Therefore, A = L.J { t : i t - r[ < e,}. tEA
Ifa:~ is an element t o p o l o g y for X (see - r [ < e} where x* n e i g h b o u r h o o d of x form
of X* then x~ : X -+ R and a subbase for the weak L e m m a 1.2.6) are sets of the form {x 9 X : [ < x,x* > 9 X*, r 9 R and e > 0. This implies t h a t if N is a in the weak t o p o l o g y then it m u s t c o n t a i n a set of the
n
["){y 9 x : I< y -
>l<
i----1
where n is an integer, x l , . . . , x n are elements of X* and el, 9 9 9 e, are positive real numbers. Similarly if N* is a n e i g h b o u r h o o d of z* in the weak-star t o p o l o g y t h e n it m u s t contain a set of the form 11
n(y" 9 x ' :
>1
i=l
where n is an integer, x l , . . . , xn are elements of X and e l , . . . , en are positive real numbers. Lemma
2 . 3 . 1 . Let {xx} be a net in X , then zx --+ Xo in the weak topology
if and only if < x x , x * >--+< xo,x* >
in R for all x* in X*.
Similarly, if {x~} is a net in X*, then x"x --+ x~ in the weak-star topology if and only if <x,x~
>--+<x,x~ >
in R for all x in X .
2.3 Weak Topologies
Proof. Follows from L e m m a 1.2.7. T h e o r e m 2.3.1 ( B a n a c h - A l a o g l u ) . space with X* as its dual. The set
37 []
Let (X, I1" IIx) be a normed vector
B* := {x*: llx*ll < M},
(2.8)
is compact in the weak-star topology for any M G R. Proof. Let x~ be a universal net in B* on the directed set A. For any x in X, (g(x))(x~) = < x, x~ > is a universal net in R (see L e m m a 1.2.11) and furthermore, [ < x,x~ > I < M[[x[[x. Therefore < x,x~ > is a universal net in the compact set B := {r E R : -M[[x[[x][ _< r <_ M[[x[[x}. Therefore, there exists an element f ( x ) in R such that < x, x~, >-+ f ( x ) and f ( x ) E B (see Theorem 1.2.3). Let x and y be any two elements in X. Let < x,x~ >--4 f ( x ) and < y, x*~ >-4 f(y). Also, it is true that < x + y, x~ >--+ f ( x + y). This implies that < x, x~ > + < y, x~ >--4 f ( x + y). Therefore, f ( x + y) = f ( x ) + f(y). It can be shown in a similar manner that f ( a x ) = a f ( x ) for any real number a and x in X. This implies f : X --4 R is linear. As f ( x ) is in B it follows that I]f[[ := sup{[f(x)l : [Ix[Ix < 1} _< M. Therefore, f belongs to B* and by definition < x, x~, >--4 f ( x ) for all x in X. Hence, from Lemma2.3.1 it follows that x~, --~ f. Thus, every universal net in B* is convergent in the weak-star topology which implies that B* is compact in the weak-star topology (see Theorem 1.2.3). [] This shows that the weak-star topology is important because, unlike the norm topology, the unit norm ball is compact. However, it is not true that every sequence in a compact topological space has a convergent subsequence (an example is given later). The following result guarantees existence of a convergent subsequence when the space is separable. T h e o r e m 2.3.2. Let (X, [1" [Ix) be a separable normed vector space with X* as its dual. Then every sequence in {x* : [tx*[[ < M} has a convergent subsequence in the weak-star topology where M E R.
Proof. Left to the reader. [] Next we present a result on the compactness of the norm ball of X in the weak topology. T h e o r e m 2.3.3. Let (X, [[. [[x) be a normed vector space with X* as its dual. The set B := {x: Ilxllx
_< 1},
is compact in the weak topology if and only if X is reflexive.
(2.9)
2.4 $\ell_p$ Spaces
In this section we study the vector space of sequences with different norms imposed on it. We denote the space of sequences by $\ell$. Therefore, every element of $\ell$ is a function from the set of integers to the real numbers; $\ell = \{x \mid x : I \to R\}$. If $\alpha \in R$ and $x \in \ell$ then we define $\alpha x$ by $(\alpha x)(i) = \alpha x(i)$. For $x \in \ell$ and $y \in \ell$ we define $x + y$ by $(x+y)(i) = x(i) + y(i)$. It is evident that $\ell$ with the scalar multiplication and vector addition as defined above is a vector space. Often, for an element $x \in \ell$ and $i \in I$, $x(i)$ is denoted as $x_i$. For an element $x$ in $\ell$ define
\[ \|x\|_p := \Big( \sum_{i=-\infty}^{\infty} |x_i|^p \Big)^{\frac{1}{p}} \quad \text{and} \quad \|x\|_\infty := \sup_{-\infty < i < \infty} |x_i|, \]
where $0 < p < \infty$ is a real number.

Definition 2.4.1 ($\ell_p$ spaces). Let $p$ be such that $0 < p \le \infty$. The $\ell_p$ spaces are defined by
\[ \ell_p := \{x \in \ell : \|x\|_p < \infty\}. \]
We will restrict our attention to right-sided sequences, i.e. to elements of $\ell$ which map negative integers to zero. This is a matter of convenience and the results in this section are valid in general.

Definition 2.4.2. $c_0$ is the subset of $\ell$ consisting of sequences $x$ such that $\lim_{i \to \infty} x_i = 0$.

Now we show that the $\ell_p$ spaces are nested.

Lemma 2.4.1.
If $0 < p < q < \infty$ then $\ell_p \subset c_0$ and $\ell_p \subset \ell_q$.
Proof. We leave the first part of the proof to the reader. Suppose $x \in \ell_p$. Then $\sum_{i=0}^{\infty} |x_i|^p < \infty$. Therefore, there exists an integer $N$ such that $i > N$ implies that $|x_i| < 1$. Note that if $p \le q$ then for $i > N$, $|x_i|^q \le |x_i|^p$. Therefore,
\[ \sum_{i=0}^{\infty} |x_i|^q = \sum_{i=0}^{N} |x_i|^q + \sum_{i=N+1}^{\infty} |x_i|^q \le \sum_{i=0}^{N} |x_i|^q + \sum_{i=N+1}^{\infty} |x_i|^p \le \sum_{i=0}^{N} |x_i|^q + \|x\|_p^p < \infty. \]
Thus $x \in \ell_q$. $\Box$
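As an informal numerical check of this nesting (our illustration; the sequence and truncation level are chosen arbitrarily), the Python sketch below compares partial sums of $\sum_i |x_i|^p$ for $p = 1$ and $p = 2$ for the sequence $x_i = 1/i$, which lies in $\ell_2$ but not in $\ell_1$.

```python
import numpy as np

def partial_p_sum(x, p):
    """Partial sum of sum_i |x_i|^p for a finite truncation of the sequence."""
    return np.sum(np.abs(x) ** p)

i = np.arange(1, 200001, dtype=float)
x = 1.0 / i          # x_i = 1/i: in l_2 (sum 1/i^2 converges) but not in l_1

print(partial_p_sum(x, 1))   # grows like log(N): the sequence is not summable
print(partial_p_sum(x, 2))   # approaches pi^2/6 ~ 1.6449: it is square-summable
```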
Lemma 2.4.2 (Hölder's inequality). Suppose that $1 \le p \le \infty$ and $\frac{1}{p} + \frac{1}{q} = 1$; then for elements $x$ and $y$ in $\ell$,
\[ \|xy\|_1 \le \|x\|_p \, \|y\|_q, \tag{2.10} \]
where xy E g is defined by xy(i) = x(i)y(i) with i being any integer. Also, the equality holds in (2.10) if and only if ( ~ ) p = (~)q for all i >_ O. Proof. We will assume that 1 < p < co and 1 < q < co. The other cases are left to the reader to prove. We will first show t h a t if r > 0, t > 0 and 0<)~< 1then rXt (l-x) <_ ,~r + (1 - A)t.
(2.11)
Consider the function f
f ( h ) = h A - Ah + )~- l, which m a p s the set of positive real numbers to R. Then f ' ( h ) = A(h( x - l ) - 1). Therefore for 0 < h < 1, f ' ( h ) > 0 and for h > 1, f ' ( h ) < 0. This implies that f is a strictly increasing function in the region 0 < h < 1 and therefore f ( h ) < f(1) = 0 if 0 < h < 1. Similarly in the region h > 1, f is an strictly decreasing function and f ( h ) < f(1) = 0 if h > 1. This implies that f ( h ) < 0 if h _> 0 and the equality holds only if h = 1. In other words h A _< ~ h + 1 - A, with equality only if h = 1. If t > 0 then by substituting ~ for h we have the inequality in (2.11). If t = 0 then the inequality in (2.11) clearly holds. Let r=
,)~= 1
,t=
and l - A =
.
From inequality (2.11) we have
ix,
y,I
Therefore,
+ i=0
i=0
This proves the lemma.
= ~ §
= 1.
i=0
[]
L e m m a 2.4.3 ( M i n k o w s k i ' s i n e q u a l i t y ) . Suppose that 1 <_ p <_ co then for elements x and y in
{{x+ y{{,,< {{x{{,,+ {{y{{p. The equality holds if and only if there exists a real k such that xi = kyi for all i = O,. . .n.
Proof. We will assume that 1 < p < oo and leave the other cases for the reader to prove. Let q be such that 1 + 1 = 1. Note that N
N
N
I~ + y,I p = ~ i--0
I~ + y,I p-a Ix,I + ~
i=0
I~ + y~IP-~Iy~I
i=0
~_(i=~o,x,i-~-yil(P-1)q) iq[(i=~o'zi'P)~~'-(i=~o'YilP)~] where the first inequality follows by applying ttolder's inequality (see L e m m a 2.4.2), whereas the second equality is true because ( p - 1)q = p. Also, note that the equality holds if and only if (see L e m m a 2.4.2), N
Ix,+y,l(p-~)q
_
--
I~, + y,I (p-1)q i-=0
N
I~,1p
=
N
ly, Ip
~ Ix, I~
~
i----0
i=O
I~,1~
Therefore the equality holds if and only if there exists a real k such that xi = kyi for all i = O , . . . n .
Yilp), !
inequality bY ( ~-~ i=0 Ixi -t-
Dividing the ab~
and noting that 1 +
= 1, we have 1
1
i=O
1
i=0
This holds for all positive integers N. Therefore, IIx+yllp 5 IlXllp+ Ilyllp. It follows from L e m m a 2.4.3 that II lip is a norm on gp for 1 < p < ~ . T h e o r e m 2.4.1. T h e following s t a t e m e n t s are true. 1. If 1 <_ p <_ oo then (gv, I1" lip) is a B a n a e h space. 2. If O < p < 1 then (gv, I1" lip) is complete. Proof. (1) Let {x n} be a Cauchy sequence in gv with 1 _< p < co. Then given any e > 0 there exists an M such that n, m > N implies that
[xn(i) - xm(i)[ p ~_ [[Xn -- Xm[[ p ~_ e. Therefore, {xn(i)} is a Cauehy sequence in R. This implies that there exists an element x(i) such that x'~(i) -+ x(i) because R is complete. We will show now that x := { x ( 0 ) , x ( 1 ) , x ( 2 ) , . . . , } is in gp. Indeed, from L e m m a 1.4.2 we know that there exists an M E R such that for all n = 1 , . . . , o o
2.4 gp S p a c e s
IIx"[l
41
M.
This implies t h a t for all n = 1 , . . . , c ~ K
Iz"(i)l p < Z i=0
M,
where K is any positive integer. Taking limits as n -+ co it follows t h a t K
Z fx(i)l p <
M,
i=0
for any positive integer K. Therefore, x E tp. We prove now t h a t x n converges to x in [1' lip norm. Given any e > 0 there exists N such t h a t n, m > N implies t h a t I]x" - x'r~llpp < ~. This implies t h a t for any integer K and n, m > N K
Ix"U) - xm(i)l p <_ e. i=0
Letting m -+ c~ we have t h a t for any integer K and n _> N K
Ix"(/) - z(i)l p _< e. i=0
This proves t h a t [Ix n - ~llp ~ 0 as n -+ ~ . T h u s we have established t h a t if 1 _< p _< oc then (gp, I1" lip) is a B a n a c h space. If p = co and {x"} is a C a u c h y sequence in goo, then given any e there exists a N such t h a t for i = 1 , . . . , oo and for any n, m >_ N ]xn(i) - xm(i)] _< e.
(2.13)
This implies t h a t {x'~(i)} is a C a u c h y sequence in R and we suppose it converges to x(i). Letting m -~ c~ in (2.13) we see t h a t given any e > 0 there exists a N such t h a t if n > N then
Iz"(i) - z(i)l < e, for any i = 1 , . . . , c ~ . This proves t h a t x n --+ x in the I1 I1~ norm. T h u s we have established t h a t if 1 < p _< oe then (tp, I1" lip) is a B a n a c h space. (2) P r o o f is identical to the p r o o f for 1 < p < e~. gp is not a B a n a c h space for 0 < p < 1 because the Minkowski's inequality does not hold. [] L e m r n a 2 . 4 . 4 . If O < p < ee then (gp, I1" lip) is separable.
Zgso, (co, I1 I1~)
is separable. Proof. Let A be a subset of g such t h a t for every an integer N such t h a t if i > N then x(i) = 0 i = 0 , . . . , o0. We will show t h a t A is dense in (gp, in fp. Given any e > 0 there exists an integer M
element x E A, there exists and x(i) is rational for all I1" lip)- Let x be any element such t h a t
42
2. Functions on Vector Spaces
Z
ix(i):<
i=M+I
For every x(i), with i < M there exists a rational n u m b e r r(i) such t h a t Ix(i) - r(i)l p <_ 2,-~ . Ifefine r e gp as r := { r ( 0 ) , r ( 1 ) , . . . , r ( M ) , 0 , 0 , . . . } . Note t h a t r E A and M
e~
IIz- flipv = ~ Ix(i) - r(i)lP + i=0
~
Ix(i) - r(i)l p
i---M+I
<_
+ i=0
L (i)l i=M+I
i=0
It can be shown t h a t A is a c o u n t a b l e set. T h u s we have established t h a t if 0 < p < oo then gp has a countable dense subset and therefore it is separable. T h e p r o o f for co being separable is very similar to the one given above and is left to the reader. [] 2.4.2. Let 1 < p < oo. Then there exists an isometric isomorphism between gp and gq + ~ = 1. Theorem
::}
9
Proof. We prove the t h e o r e m for 1 < p < ~ . T h e p r o o f of the t h e o r e m when p = 1 is left to the reader. Let x* E gp be a b o u n d e d linear function on gp. Let e '~ E g denote the sequence {e'~(i)} such t h a t e n (i) = 0 if i :/: r, =lifi=n. N
For any element x = {x(i)} in g~ it can be shown t h a t Z
x(i)ei converges
i=0
to x in the 11" lip n o r m . As x* is continuous it follows t h a t N
z*(x) = z*( lim Z x ( i ) e i ) = N--* ~
i=0
N
lim E x ( i ) x * ( e i ) . N-~ e~
i=0 N
Let y := {x*(e~
x*(el), ...}. Therefore, z*(z) =
show t h a t y E gq. Let z N in ~p be defined by
xN ( i) = ]y(z)l,sgn(y(z)) . ~ if i < N =0 i f i > N. W i t h this definition we have
lira ~ - ' x ( i ) y ( i ) . We will
N---+ oo
i=0
2.4 ~v S p a c e s
43
1
,xN,,p=
'
(2.14)
and
N N N X'(xN) -~ E xN(i)y(i) = E ly(i)[~+I -- ~ ly(i)l q" i=0
However, that
i=0
(2.15)
i=0
Ix*(xN)l < IIx'll [IxNIIp and thus it follows from (2.14) and (2.15)
E lY(i)[q <-IIx'll
ly(i)[ q
i=0
,
i=0
which implies that
lY(i)] q
_< IIx'll,
i=0
and l
q). < ,,x.,, This holds for all N and therefore ]lyllq ~_ II~'ll. Note that IIx'll := sup{Ix*(x)l : II~llp _< 1}. Therefore for any e > 0 there exists x e gp such that r
[x*(x)l+e >_ [[x*[[ and IIxIlp _< 1. This implies that ~
Ix(i)l lY(i)[ _> tlx*ll-~.
i=0
From ttolder's inequality it follows that Ilytlqllxllp >_ I I ~ ' l l - ~, As II~:llp _< 1 and e is any arbitrary positive number it follows that Ilyllq _> II~'ll and therefore Ilyllq = II~*ll, Thus we have shown that there exists a map F : g~ --+ gq defined by r ( ~ ' ) = {x.(~i)} OO
such that F is isometric and also, x*(x) -- E
x(i)F(x*)(i). It is clear that
i----0
F is one to one. We will now show that F is onto. Indeed, let z := {z(i)} be any element t'~r
in gq. Let f : gv --~ R be defined by f(x) = ~ x ( i ) z ( i ) . OO
linear and [f(x)l = [ Z x ( i ) z ( i ) ]
It is clear that f is
i----0
<_ [[xt]p[[z[Iq. Therefore, [[f[[ < [[Z[[q and
i=0
f E x*. It can be easily verified that F(f) = z. Thus F is a one to one and onto isometric map from iv to gq. []
44
2. Functions on Vector Spaces
L e m m a 2.4.5. There exists an isometric isomorphism which maps c~ to ~1 where c; is the dual space of (co, I1' I1~).
Proof. Let x" E c; and let V := {x*(ei)} where e i E/~ is defined in the proof of the previous theorem. Define x N E co by z~(i) = sgn(y(i)) if i _< N =0
i f i > N. N
Then x*(x N) = E s g n ( y ( i ) ) y ( i )
N
= E
i=0
[V(i)]. As
Ix'(xN)l
< Itx'll
IIxNIl~
i=0
N
we have E
ly(i)I
_ IIx'll IIxNII~ _< IIx*l[.
This holds for any arbitrary N
i=0
and therfore Ilylll _ IIx*ll. The rest of the proof is similar to the one given for 1 < p < ~ and is left to the reader. [] Often, if there is a isometric isomorphism between two spaces X and Y the notation X = Y is employed. For example c$ = el means that there is a isometric isomorphism between the set of bounded linear functions on co and /~1. Thus, Theorem 2.4.2 says that s * = eq where !v_ t _ l ~ = 1 and 1 _< p < e~. The following example illustrates that not every sequence in the set {x" : IIx'll _< 1} has a weak-star convergent subsequence.
Example 2.4.1. Consider the normed linear space (g~, 1[. denote its dual. Define x,~ E e~ by
II~)
and let e~
< x, x~, >= x(n). It is clear that []x~,ll = sup{ I < x,x~ > I: Ilxlloo < 1} <_ 1 for all integers n. Let x~k be a subsequence of {x~}. Let x be an element in s such that
x(i) =
1 i f i = n k , k is even =-lifi=nk, kisodd = 0 otherwise.
It is clear that < x,x~k > = 1 if k is even and < x,x*~k > = - 1 if k is odd. Therefore, < x, x~k > does not converge which implies that x*, is not a convergent sequence in e ~ . However, it should be noted that. Banach-Alaoglu theorem implies that B* := {x* 9 f~,llz*N _< 1} c f~o is weak-star compact. Therefore every universal net has a convergent subnet. But we have demonstrated that not every sequence has a convergent subsequence (however, every sequence has a convergent subnet).
3. C o n v e x
Analysis
One of the most important concepts in optimization is that of convexity. It can be said that the only true global optimization results involve conw~xity in one way or another. Establishing that a problem is equivalent to a finite dimensional convex optimization is often considered as solving the problem. This viewpoint is further rienforced due to efficient software packages available for convex programming. In this chapter we study convex sets and convex functions. The study of sublinear functions in the previous chapter will aid in establishing results on convex optimization. The famous Kuhn-Tucker Lagrange duality result is established which is important because it converts a constrained optimization problem to an unconstrained optimization problem.
3.1 Convex
Sets
and
Convex
Maps
In this section we present the definitions of convex sets, convex maps and related concepts. Prcliminary results on convexity are given. D e f i n i t i o n 3.1.1 ( C o n v e x s e t s ) . A subset ~2 of a vector space X is said to be convex if for any two elements cl and c2 in ~ and for a real number A with 0 < A < 1 the element ,~Cl + (1 - A)c2 E f2 (see Figure 3.1). The set {} is assumed to be convex.
L e m m a 3.1.1. If C is a convex subset o[ a normed vector space, (X, I1" IIx), then i n t ( C ) is convex and C is convex where C denotes the norm closure of the set C. Proof. If int(C) is empty then by assumption it is convex. Suppose, xl and x2 are elements in int(C). Then there exists an c > 0 such that
B(x ,O := D : IIx-
,llx <
and B(x~,~) := { x : IIx - x 2 l l x
< ~},
both are subsets of C. Let A E R be such that 0 < A < 1. Let z := AXl + ( 1 A)x~. Then for any w 6 X with tlwllx < ~,
46
3. Convex Analysis
X2
9. . - X 1
Fig. 3.1. In a convex set a chord joining any two elements of the set lies inside the set. Z -~- W = ~(X 1 -~- W) -~- (1 - ~)(x2 + w).
However, x l -l- w and x2 -I- w are elements in C. Therefore, from convexity of C, z -I- w is in C. Thus, z is in int(C). The proof that C is convex is left to the reader. []
D e f i n i t i o n 3.1.2 ( C o n v e x c o m b i n a t i o n ) . A vector o,f the ,form ~-~-~=1 ,kkxk, where ~ kn= l ~ k = 1 and )~k ~_ 0 f o r all k = 1, .. . , n is a convex combination o f the vectors x l , . . . , x n .
D e f i n i t i o n 3.1.3 ( C o n e s ) . A subset C of a vector space X is a cone if f o r every non-negative ct in R and c in C, ac E C. A subset C o f a vector space is a convex cone i f C is convex and is also a cone.
D e f i n i t i o n 3.1.4 ( P o s i t i v e c o n e s ) . A convex cone P in a vector space X is a positive convex cone if a relation ' >' is defined on that .for elements x and y in X , x > y if x - y E P. := N a n d x x Eint(P). Similarlyx < y ifx-y E-P Given a vector space X with positive cone P the positive defined as
X based on P such We write x > 0 i f < 0 ifx Eint(N). cone in X " , P ~ is
P ~ := {x* G X* :< x,x" > > 0 for all x E P}. Example 3.1. I. Consider the real number system R. The set
P := {x : x is nonnegative},
3.1 Convex Sets and Convex Maps
47
defines a cone in R. It also induces a relation _> on R where for any two elements x and y in R, x >_ y if and only if x - y 9 P. T h e convex cone P with the relation _> defines a positive cone on R. D e f i n i t i o n 3 . 1 . 5 ( C o n v e x m a p s ) . Let X be a vector space and Z be a vector space with positive cone P. A mapping, G : X --~ Z is convex if G ( t x + ( 1 - t ) y ) <_ t G ( x ) + (1 - t ) G ( y ) f o r all x , y in X and t with 0 < t < 1 and is strictly convex f f G ( t x + (1 - t)~) < tG(x) + (1 - t ) a ( ~ ) for all x r in X and t with O < t < I.
f(x)
If(b)
f( a ) ~ f(Xa ~ (1-~.)b) i
a
| ka+(1-~)b
..r
X
b
Fig. 3.2. A convex function.
3 . 1 . 2 . Let ( X , II" Itx) be a finite dimensional normed vector space and let C be a convex subset of X . Let f : C -+ R be a convex functional on C. Then f is continuous on i n t ( C ) . Lemma
Proof. F r o m L e m m a 1.5.3, we know t h a t it suffices to prove the l e m m a for (R '~, ] " ]1). We will prove the l e m m a for R 2 which can be easily generalized to R n. We will also assume t h a t 0 E i n t ( C ) (there is no loss of generality in doing so). We will show t h a t there exists an e > 0 and M E R such t h a t if ( x , y ) 9 R ~, and I(x,Y)ll _< ~ then f ( ( x , y ) ) <_ M (i.e. f is b o u n d e d above in some n e i g h b o u r h o o d of 0.) As 0 9 i n t ( C ) we know t h a t there exists a "rectangle" A such t h a t A : = {(x,y) : a l < x < 51, and as _< y _< b2}, where a t , a ~ , b l and b2 are elements in R, with A segl and seg2 be defined as
C
C. Let the sets
48
3. Convex Analysis {x : x = A(al,a2) + (1 - A)(al, bs) such that A 9 R and 0 < A < 1}
and
{x:z
= A(bl,a2) + (1 - A)(bl, bs) such that A 9 R and 0 < A < 1},
respectively. For any element (x, y) in segl we have f((x, y)) < Af((al, as)) -t- (1 - A)f((al, b2)) < If((al, as)) + f ( ( a l , bs))l. Similarly, for any element (z, y) 9 segs, f ( ( x , y)) < I/((bl, a2) + :((bl, b2))l. It is clear that any element (x, y) of A is a convex combination of elements in segl and segs. Thus, for all (z, y) 9 A,
(z,y)) < [f((ax,a2)) + f ( ( a l , bs))l + I f ( ( b l , a s ) + f ( ( b l , b 2 ) ) l =: M. Choose 5 > 0 such that B(0,5) := {r 9 R s : Irll _< 5} C A. Thus we have shown that f is bounded above in the neighbourhood, B(0, 5) of 0 by M. Given any 0 < e < 1 in R let x 9 B(0, eh) := {r 9 R 2 :lrl < eh}. This implies that 1-x 9 B(0,5) and therefore
f ( c ( ! x ) + ( 1 - c)0) < e f ( ! x ) +
( 1 - e)f(0) < cM + ( 1 - e)f(O).
Therefore, f ( x ) - f(O) <_ e(M - f(0)). Also, f(0) equals
1
1
x + (1 - 1 + , ) ( -
l x ) ) _<
f(x) +
(1-
1
1
Therefore,
1
f ( z ) - f(O) >_ - e f ( - - ~ z )
+ el(O) >_ - e ( M - f(0)).
Thus we have shown that given any 0 < e < 1 there exists a neighbourhood B(O, eh) of 0 such that if x is in this neighbourhood then I:(~) - f(0)l _< e(M - f(0)). Therefore, f is continuous at 0. rn
3.2 Separation
of Disjoint
Convex
Sets
Consider the vector space R 2. The equation of a line (see Figure 3.3) in R s is given by
m l X l -1- msx2 -~ c, where ml, ms and c are constants. The graph of a line is given by the set n -- {(Xl,XS)lmlXl + msxs = c}, which can be written as
n = {x E R s] < x,x* >= c},
(3.1)
where x* = (ml, ms). Note that if rns -- 0 then we have a vertical line. We now generalize the concept of a line in R 2 to normed vector spaces.
3.2 Separation of Disjoint Convex Sets
49
X2
(Xl,
c} (xl, X2) ~k
A ={x 9 < x , x >~< c}
L= {x : < x,x*> =c} Fig. 3.3. Separation of R 2 into half spaces by a line L.
D e f i n i t i o n 3.2.1 ( L i n e a r v a r i e t y , H y p e r p l a n e s ) . A subset V of a vector space X is a linear variety if there exists an element xv in X such that V = xv + M : = { x : X = X v + m f o r s o m e r n E
M},
where M is a subspace of X . A subset H of X is called a hyperplane if it is a linear variety which is proper (i.e. there exists an element xo in X which is not in H ) and maximal (i.e. if i l l is another linear variety which contains H then H1 = X ) . The line L defined earlier is a hyperplane in R 2. T h e o r e m 3.2,1. H is a hyperplane in X if and only if there exists a nonzero linear function f : X --~ R such that H := {x : f ( x ) = e}, where c E R. Proof. (=r Let H be a hyperplane in X. Then there exists x0 in X such t h a t H = x o + M := { x 0 + m : rn E M} where M is a proper subspace of X which is maximal. Let xl in X be such that xl ~ M. It is clear that the set M@Rxl
:= { m + y : m E
M andy=axl
for some a E R )
is equal to X (because it is a subspace which contains M and M is m a x i m a l ) . It is also clear t h a t for any x in X there exists unique elements f ( x ) E R and m~ E M such that x = ra~ + f(X)Xl.
50
3. Convex Analysis We will now show t h a t f is linear. Let x a n d y be e l e m e n t s in X , t h e n
x = rnz + f ( x ) x l a n d y = m y + f ( y ) x l . T h e r e f o r e x + y = ( f ( x ) + f ( y ) ) x l . F r o m the definition a n d uniqueness of f ( x x + y = m x + y + f ( x + y ) x l with f ( x + y ) = f ( x ) + f ( y ) and mx+y can be shown s i m i l a r l y t h a t f ( a x ) = a f ( x ) where a E R a n d x
(rnx + my) + + y) we have = m , + m y . It
is a n y e l e m e n t in X. Thus, f is linear. It is clear t h a t M = {x : f ( x ) = 0} a n d therefore tt = {x : f ( x ) = e} where c : = f(xo). (r Let f : X ~ R be a nonzero linear m a p a n d let g : = {x : f ( x ) = c} for s o m e c in R. Define M : = { x : f ( x ) = 0}. As f is nonzero t h e r e exists an e l e m e n t x0 in X such t h a t x0 ~ M. N o t e t h a t x E H if a n d o n l y if f ( x ) = c which is true if a n d only if f ( x - f(-~o)x0) = 0. Therefore, x E H if a n d only if x - ] - - ~ x 0 E M. Thus, H = f - ~ x 0 + M which i m p l i e s H is a linear variety. Now we show t h a t H is a p r o p e r m a x i m a l linear variety. I n d e e d , H is p r o p e r because x0 ~ M. Let n E N where N is a s u b s p a c e which c o n t a i n s M a n d n ~ M. As f ( n ) # 0 we have f ( x - /,,-~, n) = 0 for all x in X which jtn)
.
implies that x n E M for all x in X. T h u s x = m + n for s o m e m E M. As N is a s u b s p a c e which c o n t a i n s M a n d n E N it follows t h a t x E N. As x E X was chosen a r b i t r a r i l y it follows t h a t N = X. T h u s we have e s t a b l i s h e d t h a t M is a p r o p e r m a x i m a l subspace. T h i s proves t h a t H is a hyperplane. [] W i t h this t h e o r e m we have recovered the f a m i l i a r d e s c r i p t i o n as given in e q u a t i o n (3.1) for h y p e r p l a n e s . 3 . 2 . 1 . Let H be a hyperplane in a vector space X such that 0 is not in H. Then there exists a unique nonzero linear map f : X -+ R such that H = { x : f ( x ) = 1}.
Corollary
Proof. F r o m T h e o r e m 3.2.1 it is clear t h a t t h e r e exists a l i n e a r m a p f l : X -+ R a n d c E R such t h a t H = {x : f l ( x ) = c}. As 0 ~ H it is n o t p o s s i b l e t h a t e = 0. Let f : X -+ R be defined b y f ( x ) = ~ f l ( x ) for any x in X. T h e n it follows t h a t H = { x : f ( x ) = 1}. We will show t h a t f is unique. Let g : X --+ R be a n o t h e r n o n z e r o linear f u n c t i o n such t h a t H = {x : g(x) = 1}. Let h be an a r b i t r a r y e l e m e n t in H. T h e n it is clear t h a t for any x in X , f ( x + ( 1 - f ( x ) ) h ) = 1 which i m p l i e s t h a t x + (1 - f ( x ) ) h E H. Therefore, g ( x + (1 - f ( x ) ) h ) = 1 from which it follows t h a t g(x) = f ( x ) for any x in X (because g(h) = 1). T h u s , f is unique. [] For t h e p u r p o s e s of the discussion below we will a s s u m e t h a t c, m l a n d m 2 which d e s c r i b e t h e line L in F i g u r e 3.1 are all n o n n e g a t i v e . T h e r e s u l t s for o t h e r cases will be s i m i l a r . C o n s i d e r the region A in F i g u r e 3.1 which is t h e region "below" t h e line L. As i l l u s t r a t e d earlier, L = {x : < x, x* > = c} where x* = ( m l , m 2 ) . C o n s i d e r a n y p o i n t x = ( x l , x 2 ) in region A. Such a p o i n t lies "below" t h e line L. T h u s if x ' = ( x l , x~) d e n o t e s the p o i n t on t h e line L which has t h e s a m e first c o o r d i n a t e as t h a t of x t h e n x '2 >_ x2 . As x ' is on the line L it follows t h a t < x ' , z * > = mix1 + m2x'2 = c. As m2 > 0 it follows
3.2 Separation of Disjoint Convex Sets
51
that < x, x* > = r n l z l + rnex2 <_ mlXl -4- rnexl2 ---- C. Thus we have shown that for every point x in the region A, < x, x* >_< c. In a similar manner it can be established that if < x, x* > < c then z lies "below" the line L, that is x C A. Thus the region A is given by the set {z :< x , z * >_< c} which is termed the negative half space of L. In an analogous manner it can be shown that the region B (which is the region "above" the line L) is described by {x :< x, x* >_> e}. This set is termed the positive half space of L. Thus the line L separates R e into two halves; a positive and a negative half. We generalize the concept of half spaces for an arbitrary normed vector space. D e f i n i t i o n 3.2.2 ( H a l f s p a c e s ) . Let (X, II" IIx) be a normed linear space and let z* : X --4 R be a bounded linear functon on X . Let S1 : = { x E X : < z , x ' >
$2 := { x 9 X :< z , x " > <_c}, Sa= {x 9 >c}, S4 := {x 9 X :< z, z* > > c } . Then S1 iS an open negative half space, S~ is a closed negative half space, $3 is an open positive half space and $4 is a closed positive half space. L e m m a 3.2.1. Let (X, I1" IIx) be a normed linear space and let f : X --4 R be a bounded linear functon on X . Then the sets {x 9 X : f ( x ) < c} and {x 9 X : f ( x ) > c} are open and the sets {x 9 X : f ( x ) < c} and {x 9 X : f ( x ) >_ c} are closed in the norm topology for any c in R. Proof. The proof is left to the reader. [] It is intuitively clear that in R e if two convex sets C1 and Ce do not intersect then there exists a line in R e which separates the two sets (see Figure 3.4). In other words there exists x* in (R2) * and a constant c in R such that C1 lies on the positive half space of the line L = {x I < x, x* > = c} and C~ lies in the negative half space of L. T h a t is
C1 C {.v :< x,x* > > c}, and C2 c {z :< z, z* > < c}. The main focus of this section is to generalize this result to disjoint convex sets in a general normed vector space. In this regard we can immediately establish the following result. Suppose (X, II-IIx) is a normed vector space and let B := {xlllzll < 1} be the unit norm ball such that xo ~ i n t ( B ) . T h e n it is possible to separate B and z0 by a hyperplane. Indeed from T h e o r e m 2.2.4, it follows that there exists z* in X* such that IIz'll _ 1 and < z0, z" > =
IIz011x > 1. As IIx'll < 1 and Ilzitx < 1 it follows that < x, ~* > _ IIx'llll~llx < II~llx < X < II~011x for all x 9 Jut(B). Thus
52
3. Convex Analysis
~
~
/
/
/
/
/
A
Fig. 3.4. Separation of of convex sets in R 2.
int(~) c {x :< x,x" > < II~ollx), whereas
x0 E L := {~ :< ~, ~" > - - [Ix011x} Thus we have shown that it is possible to separate the interior of a unit norm ball and any element which does not belong to the interior of the unit n o r m ball. Minkowski's function is a norm like functional associated with a convex set which allows us to use a similar argument as developed above to separate disjoint convex sets. L e m m a 3.2.2 ( M i n k o w s k i ' s f u n c t i o n ) . Let K be a convex subset of a normed linear space (X,[[. [Ix) such that 0 E i n t ( K ) . For any x E X let p(z) := inf{A E R : $ > 0, such that z E M~'},
where A/~" :=- {X : X = Ak for some k E K } . Then p is a real valued continuous sublinear function which is non-negative. Proof. It is clear that p is non-negative. As 0 E i n t ( I ( ) , there exists an a > 0 in R such that if for any k E X , ]lkl[x < a implies that k E h'. Let x be ax any element in X such that ]lxi]x # O. Then [ ] ~ I i x < a and therefore 211~llx~ " Thus, for any z E X , p(x) < 211~llx < ~. a a shown that p is real valued and non-negative. e
Thus
we have
3.2 Separation of Disjoint Convex Sets
53
Let a E R be such that a > 0. Then
p(ax) = inf{A E R : A > 0 and ax E AN} =inf{a~
ER: ~ >0 andxE
~I(}
=
As 0 E i n t ( K ) it follows that p(0) = 0. Let z and y be elements in X. Note that ifp(z) < 1 then z E K. Indeed, if p(z) < 1 then there exists A E R and k E K such that 0 < A < 1 with z = Ak = A k + ( 1 - A ) 0 . As k and 0 both are in K and K is convex it follows that z is in K. Given any e > 0 let r , E R and ry E R be such that p(z) < r, < p(x) + ~ and p(y) < ry < p(y) + ~. As, 1 >
p(z) = P(r-~) we know that ~
E K. Similarly, ~
E K. Let
r := r~ + r~. From the convexity of K, Lr_ r ~z + r~ r yry E K. This implies that
l ( x +y) 9 K and therefore z + y 9 vii. Thus, p(x +y) <_ r < p ( x ) + p ( y ) + e . As e > 0 is arbitrary it follows that p(x + y) <_ p(x) + p(y). This proves that p is sublinear. Note that for elements x and y in X, p(x) = p ( z - y + y ) <_ p ( x - y ) + p ( y ) . This implies that p(x) - p(y) < p(x - y) < 2Ilx - Ylix - -
- -
a
"
Similarly it follows
that p ( y ) - p(z) < 2 I I Y - xjIx and therefore ] p ( z ) - p(y)] < 211x-YJIx -
a
-
Thus p is a continuous function.
a
[]
T h e o r e m 3.2.2. Let (X, II' [Ix) be a normed linear" space and let X* denote its dual. Let K be a convex subset of X such that i n t ( K ) 5s {} and let V be a linear variety such that int(K) M V = {}. Then there exists a nonzero x"
in X* such that i n t ( K ) C {z 9 X :< x,x" > < c} ff C {z 9 X :< x,z* > < c} V C {z 9 X :< z, z* > = c},
where K denotes the closure of the set I( in the norm topology. Proof. We will first, prove the theorem when 0 9 i n t ( K ) . Let V = x0 + N where x0 is an element in X and N is a subspace of X. Let
M=N•Rxo
:={n+y:n 9
andy=ax0
for s o m e c ~ 9
For any element m in M let m = nm + f ( m ) x o . We now show that nm and f ( x ) are unique. Let m = nl + axo = n2 + ~xo where a and /3 in R with --n I a # /3 and nl and n2 are elements in M. Then it follows that x0 = n 2~_# and therefore z0 is in N. As N is a subspace we have - x 0 9 N. This imphes that 0 9 V which is not true because i n t ( K ) f3 V = {} and 0 9 i n t ( K ) . Thus nm and f ( m ) are unique for every m in M. Thus f defines a function on M. It can also be shown that f is a linear function. Note that
V = {m 9 M : f(m) =1}.
54
3. Convex Analysis
For any x in X let
p(x) = inf{A E R : x E AK}. p is the Minkowski's function of the convex set K. As int(K) ;3 V = {} it follows t h a t for all v E V, p(v) _> 1. Therefore, if v is in M and f(v) = 1 then p(v) > 1. For all m E M, with f ( m ) > 0, f(f-~-~}) = 1 which implies t h a t for all m in M with f ( m ) > 0, p ( ] - - ~ ) >_ 1. F r o m the sublinearity and the non-negativity of p (see L e m m a 3.2.2) it follows t h a t p(m) > f ( m ) for all m in M. From T h e o r e m 2.1.2 we know t h a t there exists a linear function F : X R such t h a t F(m) = f ( m ) for all m in M and F(x) ~_ p(x) for all x in X. It is clear t h a t F is continuous because p is continuous and F(x) ~ p(x) for all x in X. We r e n a m e F as x*. T h e n it is clear t h a t for every element k in int(K), p(k) < 1 and therefore < k, x* > < 1. Also, for every element k in K , < k,x* > < p(k) < ]. As F(v) = f(v) --- 1 for every element v in V it follows t h a t < v,x* > = 1 on V. Also, x* :~ 0 as V is not e m p t y . T h u s we have established the t h e o r e m when 0 E int(K). For the m o r e general case if k0 is in int(K) then let h " := { k -
k0: k E K } , and let Y' := {v - k o : v E V}.
W i t h these definitions we have t h a t 0 E int(K') and V' N int(K') = {} (as V M int(K) = {}). A p p l y i n g the result to K ' and V' the t h e o r e m in the general case follows easily with c : = < k0, x* > . [] 3 . 2 . 2 ( S e p a r a t i o n o f a p o i n t a n d a c o n v e x s e t ) . Let K be a convex subset of a normed linear space (X, [[. [Ix) with int(K) # {}. Let xo in X be such that xo q~ int(K). Then there exists a nonzero x* E X* such that Corollary
> < < x 0 , x* > for allk i n i n t ( K ) and > < < x 0 , x * > for allk in K.
Proof. Let V := {x0}. T h e n V is a linear variety such t h a t V M int(K) = {}. From T h e o r e m 3.2.2 it follows t h a t there exists x* in X* and c E R such t h a t < x 0 , x* > = c, < k, x* > < c f o r a l l k i n i n t ( K ) and < k , x * > < c f o r a l l k in K . This proves the corollary. [] C o r o l l a r y 3.2.3. Let K be a convex subset of a normed linear space (X, I[. [ix) with int(K) r {}. Let xo E bd(K) := If \ int(K) := {x E K : x int(K)}. Then there exists a nonzero x* E X* such that < xo,x* > = s u p { < k,x* >: k E K } .
Proof. From Corollary 3.2.2 we know t h a t there exists x* in X* such t h a t
< <x0, x*>
for a l l k E I f .
(3.2)
3.3 Convex Optimization
55
As x0 E K we know that there exists a sequence {x,~} in K such that Ilx0 x,~[[x --+ O. From continuity of x* it follows that < x,~,x* >--+< xo, x* > . From equation 3.2 we conclude that < x0,x* > = s u p { < k,x* >: k E K}. [] The following corollary is often referred to as the Eidelheit separation result. C o r o l l a r y 3.2.4 ( S e p a r a t i o n o f d i s j o i n t c o n v e x s e t s ) . Let I(1 and K2 be convex subsets of a normed linear space (X, I1' IIx). Let int(I~x) 5s {} and suppose int(Ka) N K2 = {}. Then there exists a nonzero x* in X* such that s u p { < x,x* >: x E K1} _< inf{< x,x* >: x E K2}.
Proof. Let K : = [ ( 1 - I~'2 :---- {kl - k2 : kl E [ ( 1 and k2 E K2}. As int(/(1) f3 I(2 = {} it follows that 0 ~ i n t ( K ) . Also, i n t ( K ) 5s {} because int(Ifl)fqI(2 = {} and int(K1) 5s {}. From Corollary 3.2.2 there exists a nonzero x* in X* such that < k, x* > < 0 for all k in K. This implies that for any kl in h'l and for any k2 in Ks, < kl, x* > < < k~,x* > . This proves the corollary. []
3.3 Convex
Optimization
The problem that is the subject of the rest of the chapter is the following problem. p =
inf f ( x ) subject to xE/2,
where f :/2 --+ R is a convex function on a convex subset /2 of a vector spacc X. Such a problem is called a convex optimization problem. L e m m a 3.3.1. Let f : (X, ][.]]x) -~ R be a convex function and let ~ be a convex subset of X . If there exists a neighbourhood N in /2 of ~o where ~o E /2 such that for all w E N, f(wo) <_ f(cv) then f(uJo) < f(w) for all w i n / 2 (that is every local minimum is a global minimum).
Proof. Let w be any element o f / 2 . Let 0 < ,k < 1 be such that x := Aw0 + (1 - A)w be in N. Then f(cz0) < f ( x ) < Af(w0)--{- (1 - A)f(w). This implies that f ( ~ 0 ) < f ( ~ ) . As w is an arbitrary element o f / 2 we have established the lemma. []
L e m m a 3.3.2. Let~2 be a convex subset of a Banach space X and f :/2 ---+R be strictly convex. If there exists an xo E I2 such that
f(xo) = xE/2 inf f ( x ) , (that is f achieves its minimum on 12) then the minimizer is unique.
56
3. Convex Analysis
Proof. Let m : = ~ f ( z ) .
Let x l , z 2 E /2 be such t h a t f ( z l ) = f ( x ~ ) = m.
Let 0 < A < 1. From convexity o f / 2 we have AXl + (1 - A)x2 E /2. F r o m strict convexity of f we have t h a t if xl :fi z2 then f ( A z l + (1 - A)x2) < A f ( x l ) + (1 - A)f(x2) = m which is a contradiction. Therefore xl = z2. This proves the lemma. I--I M a n y convex o p t i m i z a t i o n problems have tile following structure w(z) =
inf f ( x ) subject to
z E /2 g(x) _< z,
(3.3)
where f : /2 --+ R, g : X ~ Z are convex m a p s with /2 a convex subset of the vector space X and Z a n o r m e d vector space with a positive cone P. T h e condition g ( z ) < z is to be interpreted with respect to the positive cone P of the vector space Z. Lemma
3.3.3.
The function w is convex.
Proof9 Let Zl and z2 be elements in Z and let 0 < A < 1 be any constant. Then w(AZl + (1 - A)z2) = i n f { f ( z ) : x E / 2 , g(x) < Azl + (1 - A)z2} = i n f { f ( x ) : x = /~X 1 "31-(1 -- ,'~)X2, X 1 9 /2, X 2 9 /2, g ( X ) < ~Z 1 "-~ (1 - A ) z o }
___ inf{,
f(xl)+ (1 -
9
x= 9
9
9
g(x) < /~Z1 -[- (1 - A)z:)
< inf{
f(xl) + (1 g ( x l ) _< zl,g(x2)
<_z2}
= AT(z1) + (1 - A)w(z2). T h e second equality is true because for any given A with 0 < A < 1 the set /2 = {x : x = Ax~ + (1 - A)x2,xl 9 x2 9 T h e first inequality is true because f is a convex map. T h e second inequality is true because the set { ( x l , x 2 ) 9 /2 x /2 : g(Axl + (1 - A)x2) < Azl + (1 - A)z2) D { ( x l , x 2 ) 9 12 x /2 : g ( x l ) < zl,g(x2) < z2}, which follows from the convexity o f g . This proves the lemma. [] L e m m a 3 . 3 . 4 . Let Zl and z2 be elements in Z such that zl < z2 with respect to the convex cone P. Then w(z2) < w(zl).
Proof. Follows i m m e d i a t e l y from the relation {x 9 /2 : g(z) < z2} D {x 9 /2 : g ( z ) _< zt}, if zx < z2. [] D e f i n i t i o n 3 . 3 . 1 ( E p i g r a p h ) . Let f : /2 --~ R be a real valued function where/2 is a subset of a vector space X . The epigraph of f over/2 is a subset If,/2] of R • X defined by If,/2] : = {(r,w) 9 R x X : x 9
f ( x ) < r}.
3.3 Convex O p t i m i z a t i o n
57
L e m m a 3.3.5. Let f : f2 --~ R be a real valued function where f2 is a convex subset of a vector space X . Then f is convex if and only if [f, ~2] is convex. []
Proof. Left to the reader.
3.3.1 M i n i m u m D i s t a n c e t o a C o n v e x Set
X2
\
~!~
~i~:~: 9 :.'::
x/
"
'i,"
":::~"
"
.... ,
.~Xl
L= Ix: < x,x*> =hK(x) }
F i g . 3.5. Support hyperplane to a convex set K. T h e figure also illustrates the fact t h a t the m i n i m u m distance from a point x to a convex set K is the m a x i m u m of the distances of the point from the s u p p o r t i n g hyperplanes of the convcx set.
Note that the minimum distance of a point x = (xl,x2) from a line (see Figure 3.3) in R 2 given by inf Ily - xll,
yEL
is equal to mlXl+rn2;g2--c
< X~X* > --C llx-ll '
where the equation of the line is given by m l x l + r n 2 x 2 = c and x* = (ml, rn2). D e f i n i t i o n 3.3.2 ( S u p p o r t - f u n c t i o n a l ) . Let K be a n o n e m p t y convex subset of a normed linear space (X, II' tlx). The support functional h : X * --4 R U {ec} is defined by h K ( x * ) := sup{< k,x* >: k E K } , f o r any x* in X * .
58
3. Convex Analysis
Note t h a t for all k E K, < k,x* > < hg(x*). T h u s it is clear t h a t the h y p e r p l a n e L = {< x , x * > = hk(x'-)} divides the vector space into two halves such t h a t the convex set K lies entirely in one half of the space (see Figure 3.5). By inspection it Call be seen t h a t in R ~, the m i n i m u m distance of a point x from a convex set K is equal to the m a x i m u m of the distances of the point x from the s u p p o r t i n g hyperplanes. As the distance of the point x from the s u p p o r t i n g h y p e r p l a n e associated with h g ( x * ) is given by
<
~, ~"
> -hK(~') llz*ll
we have
:
fllk- xll = max =
< ~:,x* > - h ~ ( z * )
m a x {< x, x* > --hK(X*)}. II~'ll
In tile r e m a i n i n g p a r t of this subsection we will prove the above result for a general n o r m e d vector space. L e m m a 3.3.6. Let S : X -4 R be a sublinear function defined on a norraed linear space (X, [[. [Ix). Let Z be a nonemptg convex subset of X . Then, there exists a linear function L : X -4 R ,such that L(z) < S(x) for all z E X and inf{L(z) : z E Z} = i n f { S ( z ) : z E Z}.
Proof. We will assume t h a t a := inf{S(z) : z E Z} > - o o (the p r o o f for the case when a = - o o follows easily from T h e o r e m 2.1.1). For any z in X let U(x) := i n f { S ( x + Az) - A a : z E Z, A _> 0}. We will show t h a t U is a real valued sublinear function such t h a t U(x) < S(x) for all x in X and for all z E Z, U ( - z ) < - a . Let x and y be elements in X, then U(x) + U(y) = i n f { S ( z + AlZl) --~ S ( y --[-A2Z2) -(A1 + A2)a : zl, z2 E Z, A1,A2 _ 0} >_ inf{S(x + y + AlZl + A2z2) - (Ax + A2)a :zt,z2 E Z, A1,A2 _> 0} inf{S(x + y + (At + A2)(AI+A2Zl x____xx___+ ~_.__ha__z~ ----AI+Aa " ] ! --(A1 "q- /~2)a : Zl, z2 E Z, AI, A2 _~ 0}
= inf{S(x + y + Az) - Aa : z E Z , A _> O} = U(x + y). T h e third equality is true because, for A1,A2 -> O, tAt+A2 l ~---AZ--zx9 + x---AZ--zAx+A~ z : z,,z~ in Z} = Z and the set {A1 + A2 : A1,A2 > 0} = {A: A _> 0}. Let c~ > 0, then
U(c~x) = i n f { S ( a x ) - A a : z E Z,A _> O} = (~inf{S(x) - h a : z E Z,A > O} = a i n f { S ( x ) - A a : z E Z,A > 0}.
3.3 Convex Optimization
59
T h e second equality is true because S(crx) = aS(x) and the last equality is true because {~ : A > 0) = {A: A > 0}. It is clear t h a t U(0) = 0. Thus, we have established t h a t U(ax) = aU(x) for all x E X and for all a > 0. Note t h a t for any x in X, 0 = U(0) = V(x - x) <_ U(x) + U ( - x ) <_ U(x) + S ( - x ) which implies t h a t U(x) >_ - S ( - z ) > - o e . Also it follows easily from the definition of U t h a t U(x) _< S(x) < oe. Therefore, we have shown t h a t U is a real valued sublinear function such t h a t for all x in X, U(x) <_S(x). From T h e o r e m 2.1.1 we know t h a t there exists a linear function L : X -+ R such t h a t for all z in X, L(x) <_ U(x) <_ S(z). This implies t h a t inf{L(z) : zEZ}
U ( - w ) = i n f { S ( - w + Az) - A a : ,~ _> 0, z E Z} < S ( w - tv) - a = - a . Therefore, for any z E Z, L ( - z ) <_ U ( - z ) <_ - a which implies t h a t for all z in Z L(z) > a (because n is linear). Thus, we have inf{L(z) : z E Z} < inf{S(z) : z E Z} = a and therefore i n f { n ( z ) : z E Z} = a. This proves the lemma. [] 3.3.1 (Minimum d i s t a n c e f r o m a c o n v e x s e t ) . Let K be a nonempty convex subset of a normed space (X, I1" Itx) and get xo be an element in X. Let the dual space of (X, 11' IIx) be denoted by X*. DeJine
Theorem
# := i n f { l I x 0 - klIx : k E K}.
Then, , = m a x { , ( xo, x" > - - h K ( x * ) : IIx'll
<_ 1}.
Also, if ko in K and x; e X* are such that [ I x 0 - kollx = < --hK(x;) = I.t then
xo,x; >
< z 0 - k0,z; > = IIz0- k0llx IIz;ll.
Proof. We will first show that sup{< Xo,Z* > - h K ( x * ) : IIx*ll _< 1} >_ •. Let S : X --+ R be a real valued sublinear function defined by S ( z ) = for any z in X. Let Z := x 0 -
Ilxllx
K : = {y : there exists k E K with y = z0 - k}.
F r o m L e m m a 3.3.6 we know t h a t there exists x ; in X* such t h a t Ilx;[I _< 1 and i n f { < z,x• > : z E Z} = inf{iiziix : z E Z} = inf{ll~:0- kllx : k E K} = p. However, i n f { < z,x; > : z E Z} = i n f { < x0 - k , x ; >: k E h'} =<xo,z;>+inf{-:kEK}
= < x0, x; > - h u ( ~ ; ) .
60
3. Convex Analysis
Thus we have established that there exists z~ in X* such that IIz~ll ~ 1 and p = < z0, x~ > - h K ( z ~ ) . This implies that
sup{< ~0,z" > - h K ( ~ ' ) : I1~'11 _< X} > ~. Now we will show that/1 _> sup{< x0, z'* > - - h K ( x ' ) : I]:c*ll < l}. Let ,v" in X* be such that ]]x'll < 1. Then for any k 6 K, lifo - kllx >_ I < ~o - k, ~" > I >_ < xo, x" > - < k, ~* >
Therefore, inf{I]xo- kllx : k E K } >_ < xo, x* > - h K ( x ' ) . This holds for any x" in X* which satisfies IIz'll <__ 1. Thus /1 > sup{< zo, x" > - h K ( Z ' )
: IIx'll S 1}.
However, we have established earlier that there exists x~ E X* such that ]]z~l I __< 1, and/~ = < zo, x~ > - h K (z~). Therefore,
= max{< ~o,x" > - h K ( x ' ) :
I1~*11 _< 1),
where we ha.ve replaced the term sup by max in the right hand side of the equation. Let k0 in I~ and x~ in X" be such that I1~;11_ 1 and = lifo - k l l x = < ~0, x~ > - h K ( ~ ) .
From the definition of hK we have that < k0, x~ > _< hK(x~) which implies that < z0 - k0, ~=; > ~ < x0,m~ > -hz,-(x~) = ~ - IIx0 - k011x. As IIx~ll ~ 1 it follows that < Xo - ko,z~ > > IIx0 - k0[lx I1~*11. Thus < zo - k o , x ; > = Ilz0 - k011x 11~'11. This proves the theorem. [] 3.3.2
Kuhn-Tucker Theorem
Consider the convex optimization problem ~(z) =
inf f(x) subject to xEI2
g(x) _< z. We will obtain information about w(0) by analyzing w(z). We have shown that w(z) is a decreasing function of z (see Lemma 3.3.4) and that it is a convex function (see Lemma 3.3.3). It can be visualized as illustrated in Figure 3.6. As w(z) is a decreasing fimction it is evident that the tangent to the curve at (0,w(0)) has a negative slope (see Figure 3.6). Thus the tangent can be characterized by a line L with the equation:
3.3 Convex Optimization
\(z,w(z))
,.
R
61
tl ff is ff
L= lx : < x,x* > =hK(x 3 }
Fig. 3.6. Illutstration of
w(z).
w ( z ) + < z, z* > = c, where z* > 0. Also, note that if we change the coordinates such that L becomes the horizontal axis and its perpendicular the vertical axis with the origin at (0,w(0)) (see Figure 3.6) then the function w(z) achieves its minim u m at the new origin. In the new cordinate system the vertical cordinate of the curve w(z) is given by the distance of (z, w(z)) from the line L. This distance is given by s(z) = w ( z ) + < z, z* > - c II(1,z')ll Thus
s(z)
achieves its m i n i m u m at z = O. This implies that
w(0) -=- min{w(z)+ < z, z* >} zEZ
= mi~{inf{f(x):x E fl,g(x) <_z } + < z,z" >} = i n f { f ( x ) + < z,z* >: x E Y2, z E Z,g(x) <_z} > i n f { f ( x ) + < g(x),z" >: x E ~ , z E Z,g(x) < z} i n f { f ( x ) + < g(x),z* >: x E $2). The first inequality is true because z* > 0 and g(x) <_ z. The second inequality is true because the {x E () : z E Z,g(x) < z} C {x E Y)). It is also true that w(0) = i n f { f ( x ) + < z,z* >: x E 9 , z E i n f { f ( x ) + < z, z* >: x E Y2), because
g(x) < g(x)
Z,g(x) < z}
is true for every x E (2. Thus we have
62
3. Convex Analysis w(0) = i n f { f ( x ) + < z, z" >: z E /2}.
Note that the above equation states that a constrained optimization problem given by the problem statement of w(0) can be converted to an unconstrained optimization problem as given by the right hand side of the above equation. We make these arguments more precise in the rest of this subsection.
y
separatmg hyperplane
Fig. 3.7. Figure for Lemma 3.3.7.
Lemma 3.3.7. Let (X, II-IIx), and (Z, II' IIz), be normed vector spaces with /2 a convex subset of X. Let P be a positive convex cone defined in Z. Let Z* denote the dual space of Z with the postive cone P ~ associated with P. Let f : /2 --+ R be a real valued convex functional and g : X --+ Z be a convex mapping. Define P0 := inf{f(x) : g(z) < 0, x E /2}.
(3.4)
Suppose there exists Xl E / 2 such that g(xl) < 0 and and suppose Po is finite. Then, there exist z~ > 0 such that /z0 = i n f { f ( x ) + < g(x), z~ >: x E /2}.
(3.5)
Furthermore, if there exists xo such that g(xo) <<0 and i~o = f ( x o ) then < g(x0), z~ > = 0
(3.6)
Proof. We will say that an element x in /2 is feasible if g(x) < O. Define A, (see Figure 3.7) a subset of Z x R by A:={(z,r):
there e x i s t s x E / 2 s u c h t h a t g ( x ) < z a n d
f(x)
3.3 Convex Optimization
63
and B (see Figure 3.7) a n o t h e r subset of Z • R by B := -P
x ( - 0 % # 0 ] : = { ( z , r ) : - z E P and r < P0}.
We will assume t h a t the n o r m on Z • R is the p r o d u c t n o r m induced by the norms on Z and R. Note t h a t in this n o r m Jut(B) # {} (let P0 E i n t ( - P ) ; then (P0,P0 - 1) E Jut(B)). We will show t h a t int(B)M A = {}. Suppose ( z , r ) E int(B) MA. T h e n there exists z in 12 such t h a t f(x) < r and g(z) <_ z. Also z E - P and r < P0. Therefore, f ( z ) _< r < #0 and 9(z) < z < 0. This implies t h a t z is feasible and f ( x ) is strictly less than/~0 which contradicts the definition of #0. Therefore, int(B) (3 A = {}. A p p l y i n g Eidelheit's separation result (see Corollary 3.2.4) to A and B (note t h a t A and B are convex) we know t h a t there exists a nonzero element (z*,s) E (Z • R)* = Z* • R (see T h e o r e m 2.2.3) and k E R such t h a t
< z,z* > +sr >_k for all ( z , r ) E A and
(3.7)
< z,z* > + s t _< k for all ( z , r ) E B.
(3.8)
We will now show t h a t s >_ 0. As (0, r) for r _< /J0 is in B it follows from inequality (3.8) t h a t s r < k for all r _< P0. This implies t h a t s > 0 (otherwise by letting r -+ - ~ we see t h a t k = oc which is not possible because inequality (3.7) holds). We will now show t h a t s > 0. Suppose t h a t s = 0. T h e n from inequality (3.7) we have < g ( x l ) , z* >~_ k,
(3.9)
because (9(xl), f(xl)) belongs to A. Also, from inequality (3.8) we have t h a t < z,z* > _< k, for all z z E -P, cr --+ cr < az, z*
(3.10)
E - P . In particular as 0 E - P we have k > 0. Suppose for some < z,z* > > 0. T h e n we have < a z , z* > = a < z,z* >--+ oz as However as P is a cone and a > 0, az E - P if z E - P . Therefore > < k < ~ if z E - P . T h u s we have a contradiction and therefore i
< z,z* > < 0 for all z E - P
and k > 0.
(3.11)
As -9(za) E int(P) we have t h a t there exists an r > 0 in R such t h a t Ilzllz < r implies t h a t - g ( z a ) + z E P. Therefore, from (3.11) we have t h a t < 9(x~) - z, z* > < 0 if Ilzllz < r which implies t h a t < 9(xl), z" > < < z, z* > if Ilzllz < From inequality (3.9) we have 0 < k < < g(xl), z* > < < z, z* > if Ilzllz < ~. This implies t h a t for any z E Z, < z,z* > > 0. For any nonzero
zEZ,
] ez ~Z Z* and therefore < [lYl~z' > > 0. This implies t h a t for any z E Z, < z, z* > > 0. As Z is a vector space (which implies - < z, z* > ___ 0) it follows t h a t
64
3. Convex Analysis
< z, z" > = 0 for all z E Z. T h u s z* = 0. This contradicts (z, s) :/: (0, 0) and therefore, s > 0. 7,* Let z~ = -Y" Dividing inequality (3.7) by s we have k < z,z~ > + r > - for all ( z , r ) E A and
(3.12)
8
dividing inequality (3.8) by s we have k < z,z~ > + r _< - for all ( z , r ) E B.
(3.13)
8
In particular, as (z, p0) E B for all z E - P it follows from inequality (3.13) that k < z,z~ > < - - p 0 for a l l z E - P . s This implies t h a t < z,z~ > _< 0 for all z E - P . Indeed, if for s o m e Zl E - P , < -~ 1 , z 0* > > 0 then < c~zl,z* >--+ oo as a -4 oo which contradicts the fact that < a z l , z* > is bounded above by k_ $ _/10. T h u s we conclude t h a t
z; e P c . Also, as (g(x), .f(x)) for x E 12 is in A it follows from (3.12) t h a t
< g(x), z~) > +f(x) > _k for all x E 12 and
(3.14)
8
as (0,p0) E B it folllows from (3.13) t h a t k ]~o < - for all (z, r) E/3.
(3.15)
8
F r o m inequalities (3.14) and (3.15) we conclude t h a t i n f { < g(x), z~ > +f(x) : x E 12} _> t o .
(3.16)
Suppose x E 12 and g(x) < 0 (i.e. x is feasible), then
f(x)+ < g(x), z~ >< f(x),
(3.17)
because ~o -* E p C . Therefore, we have i n f { f ( x ) + < g(x),z; >: x E 12} < i n f { f ( x ) + < g(x),z; >
: . e 12,a(.) < o} _< inf{f(x) : x E 12,g(x) <_ 0} = p0. T h e first inequality is true because 12 D {x E 12,g(x) <_ 0} and the second inequality follows from (3.17). It follows from inequality (3.16) t h a t P0 = i n f { f ( x ) + < g(x),z; >: x E 12}.
(3.18)
Let x0 be such t h a t x0 E 12 and g(xo) < 0 and f(xo) = #o. T h e n
f(~o) = to ___/(~o)+ < a(~o), z~ > _< f(~o) = t~o. T h e first inequality follows from e q u a t i o n (3.18) and the second inequality l S true because z~ E p e and g(zo) <_ O. T h i s proves t h a t < g(xo), z~ > = 0. []
3.3 Convex Optimization
65
L e m m a 3.3.8. Let X be a Banach space, 12 be a convex subset of X, Y be a finite dimensional normed space, Z be a normed space with positive cone P. Let Z* denote the dual space of Z with a postive cone P(~. Let f : 12 -+ R be a real valued convex functional, g : X --+ Z be a convex mapping, H : X ~ Y be an afflne linear map and O E int({y E Y : H ( x ) = y for s o m e x E 12"2}). Define #o := i n f { f ( x ) : g(z) _< 0, H ( z ) = 0, x E ~2}.
(3.19)
Suppose there exists xl E 12-2such that g ( x l ) < 0 and H ( x l ) = 0 and suppose #o is finite. Then, there exist z~ > 0 and y~ such that #0 = i n f { f ( x ) + < g(x),z~ > + < H ( x ) , y ; >: x E 12}.
(3.20)
Proof. Let 121 : = { x :
x E 12, g ( x ) = O } .
A p p l y i n g L e m m a 3.3.7 to 121 we know t h a t there exists z; E p e such t h a t P0 = i n f { f ( x ) + < g(x),z~ >: x E 12I}.
(3.21)
Consider the convex subset, H(Y2) := { y E Y
: y=
H(x) for s o m e x E
12}
of Y. For y E H(12) define
k(y) := i n f { f ( x ) + < g ( x ) , z ; >: x E I"2, g ( x ) = y}. We now show t h a t k is convex. Suppose y,y' E H(12) and :c,x ~ are such t h a t H ( x ) = y and H(x') = y'. Suppose, 0 < A < 1. We have, A ( f ( x ) + < g(x), z; >) -{- (1 - A ) ( I ( x ' ) + < g ( x ' ) , z; > ) >__ f()~x + (1 - )~)z') q- < g()~x q(1 ,~)x'), z~ >>_ k(Ay + (1 - A)y'). (the first inequality follows f r o m the convexity of f and g. T h e second inequality is true because H ( A x + (1 A)x') = Ay + (1 - )~)y'.) T a k i n g i n f i m u m on the left hand side we o b t a i n )~k(y) + (1 - )~)k(y') >_ k(Ay + (1 - A)y'). This proves t h a t k is a convex function. We now show t h a t k : H(12) --+ R (i.e. we show t h a t k(y) > - o c for all y E H(12)). As, 0 E int[H(12)] we know t h a t there exists a n , > 0 such t h a t if IlYll < e then y E H(12). Take any y E H(12) such t h a t y r 0. Choose A, b/ such t h a t -
)~ =
e 2-~
and y~
= -)~Y"
This implies t h a t y~ E H(12). Let, B = X--~S" We have (1 -
+ B y = 0.
Therefore, from convexity of the function k we have
66
3. Convex Analysis
~k(y) + (1 - ~)k(y') > k(O) = #o. Note that tt0 > - c r by assumption. Therefore, k(y) > - o a . Note, that for all y E H(12), k(y) < oc. This proves that k is a real valued function. Let [k, H(12)] be defined as given below [k,H(12)] :-- {(r,y) E R • Y : y E H(12), k(y) <_ r}. We first show that [k, H(12)] has nonempty interior. As, k is a real valued convex function on the finite-dimensional convex set H[12] and 0 E int[H(12)] we have from from L e m m a 3.1.2 that k is continuous at 0. Let r0 = k(0) + 2 and choose e~ such that 0 < d < 1. As, k is continuous at 0 we know that there exists 5 > 0 such that y E H(12) and Ilyll < 5 implies that Ik(y) - k(0)l < ,'.
This means that if y E H(12) and IlYll <_ ~ then
k(y) < k ( O ) + e ' < k ( O ) + l
Therefore, for all y E H(12) with Ilyll _<
we have k(y) < r0 - 89 This implies
that for all (r,y) E R x Y such that Iv - r01 < 88 Y E H(12) and IIYll _< 5 we have k(y) < r. This proves that (r0, 0) E int([k, H(12)]). It is clear that (k(0), 0) E R x Y is not in the interior of [k, H(12)]. Using, Corollary 3.2.2 we know that there exists (s, y*) # (0, 0) E R x Y* such that for all (r, y) E [k, H(I2)] the following is true
< y,y* > +rs > < O,y* > +k(O)s = s#o.
(3.22)
In particular, rs > s#o for all r > #0 (note that (7-,0) E [k, H(12)] for all r > #0). This means that s > 0. Suppose, s = 0. We have from (3.22) that < y, y* > > 0 for all y E H(12). As, 0 E int[H(12)] it follows that there exists an e E R such that [lyll < e implies that < y,y* > > 0 and < -y,y* > > 0. This implies that if Ilyil _< e then < y,y* > = 0. But, then for any y E Y one can choose a positive constant ~ such that IIo~yll < e and therefore < c~y,y* > = 0. This implies that (s, y*) = (0, 0) which is not possible. Therefore, we conclude that s > 0. Let y~ = y*/s. From (3.22) we have, < y, y~ > + r > P0, for all (r, y) E [k, H(12)].
(3.23)
This implies that for all y E H(12),
< y, y~ > +k(y) > #o,
(3.24)
(This is because (k(y), y) E[k, H(12)]). Therefore, for all x e 12,
< H(x),y~ > +f(x)+ < g(x),z; > > #0, which implies that
(3.25)
3.3 Convex Optimization inf{f(z)++
>: z E 1 2 } > # o .
67 (3.26)
But if x E 12 is such that H ( x ) = 0 then f ( x ) + < g(x), z~ > = f ( x ) + < g(x), z~ > + < H(x), y; > _>inf{f(z)++
>: z 6 1 2 }
_> 120.
Taking infimum on the left hand side of the above inequality over all z E (2 which satisfy H ( z ) = 0 (that is infimum over all x E ~21) we have, /~0 = i n f { f ( x ) + < g(x),z~ > + < H(x),y~ >: x 6 12}.
(3.27)
This proves the lemma. The following is a Lagrange duality theorem.
[]
T h e o r e m 3.3.2 ( K u h n - T u c k e r - L a g r a n g e d u a l i t y ) . Let X be a Banach space, ~ be a convex subset of X, Y be a finite dimensional normed space, Z be a normed space with positive cone P. Let Z* denote the dual space of Z with a positive cone P ~ . Let f : ~2 --+ R be a real valued convex functional, g : X --+ Z be a convex mapping, H : X -~ Y be an at:fine linear map and
0 e int[range(H)]. Define #o := inf{f(x) : g(x) < 0, H ( x ) = 0, x E 12}.
(3.28)
Suppose there exists xl E 12 such that g(xl) < 0 and H ( x l ) = 0 and suppose i.to is finite. Then, #o = max{~o(z',y) : z" _> 0, z" E Z*, y E Y},
(3.29)
where ~(z*, y):= i n f { f ( x ) + < g(x),z* > + < H ( x ) , y >: x E g2 } and the maximum is achieved for some z~ > O, z~ E Z*, yo E Y . Furthermore if the infimum in (3.28) is achieved by some xo E 12 then < g ( x 0 ) , z ; > + < H(xo),Y0 > = 0,
(3.30)
and xo m i n i m i z e s f ( x ) + < g(x), z~ > + < H(x), Yo >,
over all x E 12.
(3.31)
Proof. Given any z* > 0, y E Y we have inf { f ( x ) + < g(x),z* >
xED
+ < H ( z ) , y >} < i ~ f { f ( x ) +
+ < H(x),y >
: g(x) < O, g ( x ) = 0} < i n f { f ( x ) : g ( x ) < 0, H ( x ) = 0} --
zEI'~
~-
/.tO.
Therefore it follows that max{~o(z*, Y) : z" _> 0, Y E Y}_< Po. From Lemma 3.3.8 we know that there exists z~ E Z*,z~ _> 0, Vo E Y such that /~o = ~o(z;, Yo). This proves (3.29).
68
3. Convex Analysis
Suppose there exists x0 E 12, H(xo) = O, g(xo) <_ 0 and #o = f ( x o ) then t0 = y0) < y ( x o ) + < g(x0),z > + < H(xo),y0 >< = t0. Therefore we have < g(xo),z~ > + < H(xo),yo > = 0 and to = f ( x o ) + < g(xo), z~ > + < H(xo), Y0 >. This proves the theorem. [] We refer to (3.28) as the Primal problem and (3.29) as the Dual problem. C o r o l l a r y 3.3.1 ( S e n s i t i v i t y ) . Let X , Y, Z, f, H, g, 12 be as in Theorem 3.3.2. Let xo be the solution to the problem
minimize f(x) subject to x E 12, H ( x ) = O, g(x) <_ zo with (z~, Yo) as the dual solution. Let xl be the solution to the problem minimize f(x) subject to x E 12, H ( x ) = O, g(x) < zl with (z[, yl) as the dual solution. Then, < zl - zo,z; > <_ f ( x o ) -
f(xl)
5
o >.
(3.32)
Proof. From T h e o r e m 3.3.2 we know that for any x E 12, f ( x o ) + < g(xo) - zo,z~ > + < t[(xo),yo > < f ( x ) + < g(x) - zo,z~ > + < H ( x ) , y o > . In particular we have
f ( x o ) + < g(xo) -- zo, z; > + < tt(xo),Yo > f(Xl)-]-
< .q(Xl) -- Zo,Z; • + < ~ t ( X l ) , Y 0
> .
From T h e o r e m 3.3.2 we know that < g(xo) - z0, z~ > + < H(xo), Yo > = 0 and H ( x l ) = 0. This implies f(xo)--f(xl)
_~< g ( X l ) - - z 0 , z ~ > _~< z l - - z 0 , z ~ > .
A similar argument gives the other inequality. This proves the corollary.
[]
4. P a r a d i g m
for Control
Design
We present notions of stability, causality and well-posedness of interconnections of systems. The main part of this chapter focusses on the parametrization of all closed loop maps that are achievable through stabilizing controllers.
4.1 Notation
and
Preliminaries
We will generalize the gp space that we introduced in Chapter 2. Let t~ denote the space of all vector-valued real sequences taking values on positive integers that is g~ = {x: x = ( x l , x 2 , . . . , x n ) with xi 6 gp}. For any x in tm let
[xi(k)] v
[[x[[v = \k=O
Ilxll~
=
sup
1 _< p < oo and
i=1
max
Ix~(k)l
where x = ( x l , x = , . . . , x n ) and xi = (xi(O),xi(1),...) with zi(k) E R. Let
e~ :-- {xlx ~ e~, II~llp< oo}. All the results that were established for gp spaces in Section 2.4 hold for the ( q , II.llv) spaces. We often refer to e" as a signal-space. Let g;'• denote the spaces of m x n matrices with each element of the matrix in s Let Pk denote the truncation operator on gmxn which is defined by
Pk(x(O), x(1), x ( 2 ) , . . . ) = (x(0), x ( 1 ) , . . . , x(k), O, 0,...). Let S denote the shift map from t~ to ~ defined by
S(x(O), x(1), x ( 2 ) , . . . ) = (0, x(0), x(1), x(2), x ( 3 ) , . . . ) . D e f i n i t i o n 4.1.1 ( C a u s a l i t y ) . A linear map 7- : ~
causal if P t T = PtTPt for all t.
-4 (m is said to be
70
4. Paradigm for Control Design
T is strictly causal if P t T = P t T P t - 1 for all t where Pt is the truncation operator.
D e f i n i t i o n 4.1.2 ( T i m e i n v a r i a n c e ) . A map T : ga _.+ trm is time invariant if S T = 7-S where S is the shift operator. Let T be a linear m a p from ( q , ]].]]p) to ( ~ , ]].llp). The p-induced norm of T is defined as
117-11p-i.d :=
117-xllp
sup II~lt,~0 Ilxllp
We often refer to a m a p from a signal-space e '~ to another signal space gm as a system. D e f i n i t i o n 4.1.3 ( S t a b i l i t y ) . A linear map 7- : (X, t].]]x) -+ (Y, ]].]]Y) is said to be stable if it is bounded. T : (~'~, [[.lip) --4 (~p [[.lip) is said to be s stable if it is bounded. Example 4.1.I. Let 7- : /~ --4 ~ be a linear operator such that y = T u is defined as
y(1) y(2).
~
IT(l) T(0) '0" . =
T 1) T(2). T(0)... ':
)ii|u(1)~ u 2)
,
where y = (y(0), y(1), y(2), ...), u = (u(1), u(2) . . . . ) and T ( j ) E R, for all j = 0, 1 , . . . . Let t be any positive integer. Then it can be verified t h a t P t T P t u = P t T u for any u E L Thus T is causal. If T(0) = 0 then it can be verified t h a t P t T P t - 1 = P t T . In this case T is strictly causal. It also follows t h a t for any u E ~, S T u = T S u . Thus 7" is a time invaraint map. D e f i n i t i o n 4.1.4 ( C o n v o l u t i o n m a p s ) . 7- : gn __+ ~ is linear, time invariant causal, convolution map if and only if y = T u is given by
I 1 1 2 )(i1) y2
T~I ~ 2
T2,~
u2
rLrL
r;o
o
where y = (Yl Y2, . . . , Ym) E gm and u : (ul, u2, . . . , un) E ~ , 7ij : e ~ is described by
4.1 Notation and Preliminaries
|T~5(1 ) T~t(0)
71
lut(1)l
'0" .
with Tit(k) E R for all k = O, 1,.... {7;}j(k)}~~ is also called the impulse response of the system 7}t. The linear map T 0 can be identified with the sequence {T0(k)}~= 0 9 t. Thus with some abuse of notation we often write Tit to mean the map 7~t and T to mean the map T . Depending on the context Tij can denote the map Tit or the sequence {T~j(k)}. The operation given by Equation ,~. 1 is often written as Yi = Ti t * u t. L e m m a 4 . 1 . 1 . Let T : f --4 f be a linear, time invariant, causal, convolution map. Let {T(k)} denote its impulse response. Then
llTIl~-.~d -- ~ IT(k)l. k=0
Proof. Note that for any u 9 ~ w i t h llull~o <
1 we
have
IITulI~o=II~T(!2) T(2)T(O). o 1(i0,1lloo ] T ( I ) T(0)
0"
]u(1) u2)
...
"..
12
Y~
= sup I ~--~ T(n n
-
k)u(k)
k=0
-
k)l I~(k)l
k=0
Do
sup~--~I T ( n n
supEI T ( n n
Do
<
<
-
k)l = ~
k=0
IT(k)t.
k=0
Thus we have show,, that IITIl~o-,.d _ } 2 ~ 0 IT(k)l. Consider elements u i in s which are defined by
u,(k)
= s g n ( T ( k ) ) if k < i =0ifk>i.
It can be seen t h a t if yi = T u i then yi(i) = }--~k=0 i IT(k)l 9 T h u s it follows t h a t T u ~ --+ ~k~=o IT(k)l. N o t e t h a t u i 9 ~ for all i. T h u s it follows t h a t llTll~o _> ~2~=0IT(k)l. This proves the lemma, rn From the l a m i n a above it follows t h a t for a linear, time invariant, causal, Do convolution m a p , T : en --+ g~ if }-~'4=017qj(t)l < o~ for all i and j, then T is a b o u n d e d m a p f r o m ~ to ~ . Lemma
4 . 1 . 2 . Let T be a linear, time invariant, causal map f r o m ( ~ ,
II II~)
to ( ~ , I1-11~). Then
IITIl~-~.a :-- max 3-" IIT~tlll, l < i < m -
-
j=l
where Tit are elements o f T (see Definition 4.1.4) and IIT~jlI, :=
~=0
IT~j(k)l.
72
4. Paradigm for Control Design
Proof. The proof is left to the reader.
[]
D e f i n i t i o n 4 . 1 . 5 (A-transforms). For a linear, time invariant, causal, convolution map T : ~n _+ ~m as described in Definition ,~.1.~ the A-transform o f T is defined as O0
r := E T ( i ) A i ' i----0
where f Tll(i) T12(i) ... Tin(i)
T(i):= [ T21.(i):T2 (i).
J
\ T m l ( i ) Tin2 (i)
Tm,~(i)
It can be shown that ~b is analytic inside the open unit disc and continuous on the boundary if the matrix sequence {T(i)} E ~ • Example 4.1.2. Consider a linear map as described in Example 4.1.1 where T(k) = ~ . Then
T(k)Ak = ~ k=0
1
k=O
1 -
1 -
89
The eo~ induced norm of T, given by I[Tll~_i~d = ~
IT(k)l : 2 < c~. Thus
k:0
T is l ~ stable. Definition 4.1.6 ( F i n i t e d i m e n s i o n a l s y s t e m ) . If the A-transform of any linear, time invariant, causal, convolution map T : gn __+ gm is such that ~j(A) is a ratio of finite polynomials in A then T represents a finite dimensional system. We use the term FDLTIC as an abbreviation for finite-dimensional, linear, time invariant, causal. Unless otherwise mentioned all systems will be assumed to be convolution maps. L e m m a 4.1.3. Let T be a F D L T I C system; T : e'~ ~ g,n. Then there exist real matrices A , B , C and D such that if y = T u for some u E gn then x(k + 1) = A x ( k ) + B u ( k ) y(k) = C x ( k ) + Du(k) x(0)
= 0,
where u = (u(O),u(1),...) and y : (y(O),y(1),...).
(4.2)
4.1 Notation and Preliminaries
73
Proof. See [2]. [] The representation of the map T as given in (4.2) is called a state space representation of T. A convenient notation empployed to denote the system described by (4.2) is
C
]
D
"
Definition 4.1.7 (Controllability, observability). The pair of real matrices \(A\) and \(B\) with \(A \in \mathbf{R}^{n \times n}\) and \(B \in \mathbf{R}^{n \times m}\) is a controllable pair if the matrix \([B, AB, \ldots, A^{n-1}B]\) has full rank. The pair of real matrices \(A\) and \(C\) is observable if \((A^T, C^T)\) is controllable.

Definition 4.1.8 (Stabilizability, detectability). The pair of real matrices \(A\) and \(B\) with \(A \in \mathbf{R}^{n \times n}\) and \(B \in \mathbf{R}^{n \times m}\) is a stabilizable pair if there exists a real matrix \(K\) such that \(\rho(A + BK) < 1\), where \(\rho\) denotes the spectral radius. The pair of real matrices \(A\) and \(C\) is detectable if there exists a real matrix \(L\) such that \(\rho(A + LC) < 1\).

Definition 4.1.9 (Minimal realization). The triplet \((A, B, C)\) is a minimal realization of \(T\) in (4.2) if \((A, B)\) is controllable and \((A, C)\) is observable.

Lemma 4.1.4. Suppose \(T\) is a FDLTIC system. Then there exists a state-space description
\[
\left[ \begin{array}{c|c} A & B \\ \hline C & D \end{array} \right]
\]
of \(T\) such that \((A, B, C)\) is minimal.
Proof. See [2]. \(\Box\)

Definition 4.1.10 (Unimodular matrices). A square polynomial matrix function \(P(\lambda) = P(0) + P(1)\lambda + \cdots + P(k)\lambda^k\) is said to be unimodular if the determinant of \(P(\lambda)\) is a non-zero constant independent of \(\lambda\).

Theorem 4.1.1. Let \(\hat T(\lambda)\) be an \(m \times n\) matrix of rational functions of \(\lambda\) (a function is a rational function of \(\lambda\) if it can be written as a ratio of two polynomials of \(\lambda\)). Then there exist \(\hat L\), \(\hat U\) and \(\hat M\) such that \(\hat T = \hat L \hat M \hat U\), where \(\hat L\) and \(\hat U\) are unimodular with appropriate dimensions and \(\hat M\) has the structure
\[
\hat M = \begin{pmatrix}
\frac{\epsilon_1}{\psi_1} & & & 0 & \cdots & 0 \\
& \ddots & & \vdots & & \vdots \\
& & \frac{\epsilon_r}{\psi_r} & 0 & \cdots & 0 \\
0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & 0 & \cdots & 0
\end{pmatrix},
\]
where \(\epsilon_i, \psi_i\) are coprime (that is, they do not have any common factors) monic (leading coefficient is one) polynomials, which are not identically zero, for all \(i = 1, \ldots, r\), with the following divisibility property: \(\epsilon_i(\lambda)\) divides \(\epsilon_{i+1}(\lambda)\) without remainder and \(\psi_{i+1}(\lambda)\) divides \(\psi_i(\lambda)\) without remainder.
\(\hat M\) is called the Smith-McMillan form of \(\hat T(\lambda)\).

Definition 4.1.11 (Zeros and poles of \(\hat T\)). The zeros of \(\hat T(\lambda)\) are the roots of \(\prod_{i=1}^{r} \epsilon_i(\lambda)\). The poles of \(\hat T(\lambda)\) are the roots of \(\prod_{i=1}^{r} \psi_i(\lambda)\).

Theorem 4.1.2. Suppose \(T\) is a FDLTIC system. Then the following statements are equivalent.
1. \(T\) is \(\ell_p\) stable for any \(p\), \(1 \le p \le \infty\).
2. If \(\left[ \begin{array}{c|c} A & B \\ \hline C & D \end{array} \right]\) is any state-space description of \(T\) such that \((A, B)\) is stabilizable and \((A, C)\) is detectable, then \(\rho(A) < 1\), where \(\rho(A)\) denotes the spectral radius of the matrix \(A\).
3. \(\hat T(\lambda)\), the \(\lambda\)-transform of \(T\), has all its poles outside the unit disc (that is, if \(\lambda_0\) is a pole of \(\hat T\) then \(|\lambda_0| > 1\)).

Proof. See [2].
[]
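Theorem 4.1.2 gives two equivalent numerical stability checks: the spectral radius of \(A\) in a stabilizable and detectable realization, and the pole locations of the \(\lambda\)-transform. The Python snippet below illustrates both on the system of Example 4.1.2 with \(T(k) = (1/2)^k\); the realization used is one valid choice of ours, not necessarily the one in the text.

```python
import numpy as np

A, B, C, D = 0.5, 1.0, 0.5, 1.0            # one realization of T(k) = (1/2)^k

# Check 2: spectral radius of A strictly less than one.
print(np.max(np.abs(np.linalg.eigvals(np.atleast_2d(A)))) < 1)   # True

# Check 3: all poles of T_hat(lambda) = 1/(1 - lambda/2) lie outside the unit disc.
den = [1.0, -0.5]                          # denominator coefficients in lambda
poles = np.roots(den[::-1])                # np.roots wants highest degree first
print(np.all(np.abs(poles) > 1))           # True: single pole at lambda = 2
```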
Example 4.1.3. Consider Example 4.1.2. \(T\) is a FDLTIC system. A state-space description of \(T\) is given by
\[
\left[ \begin{array}{c|c} A & B \\ \hline C & D \end{array} \right]
= \left[ \begin{array}{c|c} \tfrac{1}{2} & 1 \\ \hline \tfrac{1}{2} & 1 \end{array} \right].
\]
Note that as \(T\) is \(\ell_\infty\) stable it is \(\ell_p\) stable for all \(1 \le p \le \infty\). Furthermore \(\hat T(\lambda)\) has a single pole at \(2\), which is outside the unit disc. This theorem establishes the fact that for FDLTIC systems stability in the \(\ell_p\) sense implies stability in the \(\ell_q\) sense for any \(p\) and \(q\) such that \(1 \le p \le \infty\) and \(1 \le q \le \infty\). Thus for FDLTIC systems we can use the term stability to mean stability in the \(\ell_p\) sense for any \(1 \le p \le \infty\).

Definition 4.1.12 (Normal rank). Let \(\hat T(\lambda)\) be an \(m \times n\) matrix of rational functions of \(\lambda\). The normal rank of \(\hat T\) is the rank of \(\hat T(\lambda_0)\) where \(\lambda_0\) is any complex number which is not a zero of \(\hat T(\lambda)\).

Definition 4.1.13 (rcf, lcf, dcf). Stable FDLTIC systems \(M\) and \(N\) are right coprime if there exist stable FDLTIC systems \(X\) and \(Y\) such that the \(\lambda\)-transforms satisfy the identity
\[
\hat X(\lambda)\hat M(\lambda) - \hat Y(\lambda)\hat N(\lambda) = I.
\tag{4.3}
\]
Stable FDLTIC systems \(\tilde M\) and \(\tilde N\) are left coprime if there exist stable FDLTIC systems \(\tilde X\) and \(\tilde Y\) such that the \(\lambda\)-transforms satisfy the identity
\[
\hat{\tilde M}(\lambda)\hat{\tilde X}(\lambda) - \hat{\tilde N}(\lambda)\hat{\tilde Y}(\lambda) = I.
\tag{4.4}
\]
Suppose \(\hat T = \hat{\tilde M}^{-1}\hat{\tilde N} = \hat N \hat M^{-1}\), where \(N\) and \(M\) are right coprime and \(\tilde M\) and \(\tilde N\) are left coprime. Then the pair \(N\) and \(M\) form a right coprime factorization (rcf) of \(T\) and the pair \(\tilde M\) and \(\tilde N\) form a left coprime factorization (lcf) of \(T\).
A doubly-coprime factorization (dcf) of a FDLTIC system \(T\) is a set of stable FDLTIC maps \(M, N, \tilde M, \tilde N\) (together with stable FDLTIC maps \(X, Y, \tilde X, \tilde Y\)) such that \(\hat T = \hat{\tilde M}^{-1}\hat{\tilde N} = \hat N\hat M^{-1}\) and
\[
\begin{pmatrix} \hat{\tilde X} & -\hat{\tilde Y} \\ -\hat{\tilde N} & \hat{\tilde M} \end{pmatrix}
\begin{pmatrix} \hat M & \hat Y \\ \hat N & \hat X \end{pmatrix}
= \begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix}.
\tag{4.5}
\]
Lemma 4.1.5. Let \(T\) be a FDLTIC map with a state space description \(\left[ \begin{array}{c|c} A & B \\ \hline C & D \end{array} \right]\). Suppose \((A, B)\) is stabilizable and \((A, C)\) is detectable. Then there exists a dcf of \(T\).

Proof. Let \(F\) and \(L\) be real matrices such that \(\rho(A + BF) < 1\) and \(\rho(A + LC) < 1\). In the definition of dcf the factors \(\hat X, \hat Y, \hat M, \hat N, \hat{\tilde X}, \hat{\tilde Y}, \hat{\tilde M}\) and \(\hat{\tilde N}\) can be taken to be explicit state-space formulas built from the matrices \(A + BF\), \(A + LC\), \(B + LD\), \(C + DF\), \(F\) and \(L\) (see [2]). Then it can be shown that \(\hat T = \hat{\tilde M}^{-1}\hat{\tilde N} = \hat N \hat M^{-1}\) and (4.5) is satisfied. \(\Box\)
Definition 4.1.14 (Unit in \(\ell_1\)). A system \(U\) in \(\ell_1^{m \times m}\) is a unit in \(\ell_1\) if \(U^{-1}\) is in \(\ell_1^{m \times m}\).
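For a scalar FIR system a convenient test, and the one invoked later (for instance in the proof of Theorem 5.1.2), is that \(\hat U(\lambda)\) has no zeros in the closed unit disc; then \(1/\hat U\) is analytic on a neighbourhood of the disc and its coefficients are absolutely summable. The following Python check is a small illustration of this criterion; the polynomials chosen are arbitrary.

```python
import numpy as np

def is_unit_in_l1(u_coeffs, tol=1e-9):
    """Check whether a scalar FIR system u(lambda) = sum_k u[k] lambda^k
    is a unit in l1, i.e. has no zeros with |lambda| <= 1."""
    roots = np.roots(np.asarray(u_coeffs, dtype=float)[::-1])  # zeros of u(lambda)
    return bool(np.all(np.abs(roots) > 1 + tol))

print(is_unit_in_l1([1.0, -0.5]))   # u = 1 - lambda/2, zero at 2: a unit
print(is_unit_in_l1([-0.5, 1.0]))   # u = lambda - 1/2, zero at 1/2: not a unit
```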
4.2 Interconnection of Systems

Fig. 4.1. Framework (generalized plant \(G\) with blocks \(G_{11}, G_{12}, G_{21}, G_{22}\), controller \(K\), and signals \(w, z, u, y, v_1, v_2\)).
Many control design issues can be cast into the framework shown in Figure 4.1, which shows a generalized plant \(G\), exogenous inputs \(w\), control inputs \(u\), measured outputs \(y\) and regulated outputs \(z\). \(K\) is the controller which maps the measured outputs \(y\) to control inputs \(u\) when \(v_1\) and \(v_2\) are zero. Both \(K\) and \(G\) are linear time invariant causal convolution maps. With respect to the interconnection of the systems \(G\) and \(K\) in Figure 4.1, the first issue that needs to be addressed is the existence and uniqueness of the signals \(z, u\) and \(y\) for given input signals \(w, v_1\) and \(v_2\). If the signals resulting in an interconnection of two systems exist (that is, they satisfy the conditions posed by the interconnection) and are unique then we say that the interconnection is well-posed. Note that for the interconnection in Figure 4.1 the existence and uniqueness of \(z, u\) and \(y\) is sufficient for the well-posedness of the interconnection. The signals satisfy the relation
\[
\begin{pmatrix} I & -G_{12} & 0 \\ 0 & I & -K \\ 0 & -G_{22} & I \end{pmatrix}
\begin{pmatrix} z \\ u \\ y \end{pmatrix}
=
\begin{pmatrix} G_{11} & 0 & 0 \\ 0 & I & K \\ G_{21} & 0 & 0 \end{pmatrix}
\begin{pmatrix} w \\ v_1 \\ v_2 \end{pmatrix}.
\tag{4.6}
\]
We will suppose throughout that the interconnection is well-posed. This is guaranteed if the map \(G_{22}\) is strictly causal. Let \(H(G, K)\) be such that
\[
\begin{pmatrix} z \\ u \\ y \end{pmatrix} = H(G, K) \begin{pmatrix} w \\ v_1 \\ v_2 \end{pmatrix}.
\]
The interconnection described by \(H(G, K)\) is often referred to as the closed loop map.

Definition 4.2.1 (Stability of closed loop maps). The closed loop map described by Figure 4.1 is \(\ell_p\) stable if \(\|H(G, K)\|_{p\text{-ind}} < \infty\). In such a case \(K\) is said to be a stabilizing controller in the \(\ell_p\) sense.

4.2.1 Interconnection of FDLTIC Systems
Fig. 4.2. Parametrization of stabilizing controllers for \(G_{22}\).
Lemma 4.2.1. Let \(G_{22}\) be a FDLTIC system which has a dcf given by \(\hat G_{22} = \hat{\tilde M}^{-1}\hat{\tilde N} = \hat N\hat M^{-1}\), where
\[
\begin{pmatrix} \hat{\tilde X} & -\hat{\tilde Y} \\ -\hat{\tilde N} & \hat{\tilde M} \end{pmatrix}
\begin{pmatrix} \hat M & \hat Y \\ \hat N & \hat X \end{pmatrix} = I.
\tag{4.7}
\]
A FDLTIC controller \(K\) stabilizes the closed loop map shown in Figure 4.2 if and only if \(K\) has a rcf \(\hat K = \hat Y_1\hat X_1^{-1}\) such that the map \(\begin{pmatrix} \hat M & \hat Y_1 \\ \hat N & \hat X_1 \end{pmatrix}\) is a unit in \(\ell_1\).
Fig. 4.3. Closed loop map with coprime factors for \(G_{22}\) and \(K\).
Proof. (\(\Leftarrow\)) Suppose an rcf of \(K\) is given by \(\hat Y_1\hat X_1^{-1}\) and suppose \(\begin{pmatrix} \hat M & \hat Y_1 \\ \hat N & \hat X_1 \end{pmatrix}\) is a unit in \(\ell_1\). It is clear that Figure 4.2 is the same as Figure 4.3. Note that the map from \((\zeta, \eta)\) to \((v_1, v_2)\) is given by \(\begin{pmatrix} \hat M & -\hat Y_1 \\ -\hat N & \hat X_1 \end{pmatrix}\). Because the inverse of this map is stable it follows that the map from \((v_1, v_2)\) to \((\zeta, \eta)\) is stable. But \(\|y\|_p = \|N\zeta\|_p \le \|N\|_{p\text{-ind}}\|\zeta\|_p\) and \(\|u\|_p = \|Y_1\eta\|_p \le \|Y_1\|_{p\text{-ind}}\|\eta\|_p\). Thus the map from \((v_1, v_2)\) to \((u, y)\) is stable and therefore the closed loop map is stable.

(\(\Rightarrow\)) Let the FDLTIC controller \(K\) be such that the closed loop map in Figure 4.2 is stable. Thus the map from \((v_1, v_2)\) to \((u, y)\) is stable. Every FDLTIC system admits a dcf (see Lemma 4.1.5), and therefore it admits a rcf also. Let a rcf of \(K\) be given by \(\hat K = \hat Y_1\hat X_1^{-1}\). From the dcf of \(G_{22}\) it follows that \(\hat{\tilde X}\hat M - \hat{\tilde Y}\hat N = I\). Multiplying both sides of this equation by \(\zeta\) we have \(\zeta = \tilde X(v_1 + u) - \tilde Y y\), and thus \(\|\zeta\|_p \le \|\tilde X\|_{p\text{-ind}}(\|v_1\|_p + \|u\|_p) + \|\tilde Y\|_{p\text{-ind}}\|y\|_p\). This implies that the map from \((v_1, v_2)\) to \((\zeta, \eta)\) is stable. Thus the inverse of the map \(\begin{pmatrix} \hat M & -\hat Y_1 \\ -\hat N & \hat X_1 \end{pmatrix}\) is stable. \(\Box\)
Theorem 4.2.1. Let the FDLTIC system \(G_{22}\) admit a dcf as given in Lemma 4.2.1. Then \(K\) is a FDLTIC stabilizing controller for the closed loop system in Figure 4.2 if and only if
\[
\hat K = (\hat Y - \hat M\hat Q)(\hat X - \hat N\hat Q)^{-1}
\]
for some FDLTIC stable system \(Q\).

Proof.
(\(\Leftarrow\)) Multiplying both sides of (4.7) by \(\begin{pmatrix} I & \hat Q \\ 0 & I \end{pmatrix}\) from the left and by \(\begin{pmatrix} I & -\hat Q \\ 0 & I \end{pmatrix}\) from the right we have
\[
\begin{pmatrix} \hat{\tilde X} - \hat Q\hat{\tilde N} & -(\hat{\tilde Y} - \hat Q\hat{\tilde M}) \\ -\hat{\tilde N} & \hat{\tilde M} \end{pmatrix}
\begin{pmatrix} \hat M & \hat Y - \hat M\hat Q \\ \hat N & \hat X - \hat N\hat Q \end{pmatrix} = I,
\tag{4.8}
\]
where \(Q\) is a stable FDLTIC map. From Lemma 4.2.1 it follows that \(\hat K = (\hat Y - \hat M\hat Q)(\hat X - \hat N\hat Q)^{-1}\) is a stabilizing controller. It also follows from (4.8) that \((\hat Y - \hat M\hat Q)(\hat X - \hat N\hat Q)^{-1} = (\hat{\tilde X} - \hat Q\hat{\tilde N})^{-1}(\hat{\tilde Y} - \hat Q\hat{\tilde M})\) (this follows from the observation that the (1,2) element of the product in (4.8) is zero).

(\(\Rightarrow\)) Suppose \(K\) is a stabilizing controller. Then from Lemma 4.2.1 we know that there exist stable FDLTIC systems \(Y_1\) and \(X_1\) such that \(\hat K = \hat Y_1\hat X_1^{-1}\) and \(\begin{pmatrix} \hat M & \hat Y_1 \\ \hat N & \hat X_1 \end{pmatrix}\) is a unit in \(\ell_1\). Thus it follows that
\[
\begin{pmatrix} \hat{\tilde X} & -\hat{\tilde Y} \\ -\hat{\tilde N} & \hat{\tilde M} \end{pmatrix}
\begin{pmatrix} \hat M & \hat Y_1 \\ \hat N & \hat X_1 \end{pmatrix}
= \begin{pmatrix} I & \hat{\tilde X}\hat Y_1 - \hat{\tilde Y}\hat X_1 \\ 0 & \hat D \end{pmatrix}
\]
is a unit in \(\ell_1\), where \(\hat D = -\hat{\tilde N}\hat Y_1 + \hat{\tilde M}\hat X_1\) and \(\hat Q := -(\hat{\tilde X}\hat Y_1 - \hat{\tilde Y}\hat X_1)\hat D^{-1}\). Therefore \(\hat D\) is a unit in \(\ell_1\). Thus \(\hat D^{-1}\) is a stable system and therefore \(Q\) is also stable. Multiplying both sides of the above equation by \(\begin{pmatrix} I & \hat Q \\ 0 & I \end{pmatrix}\) we have
\[
\begin{pmatrix} I & \hat Q \\ 0 & I \end{pmatrix}
\begin{pmatrix} \hat{\tilde X} & -\hat{\tilde Y} \\ -\hat{\tilde N} & \hat{\tilde M} \end{pmatrix}
\begin{pmatrix} \hat M & \hat Y_1 \\ \hat N & \hat X_1 \end{pmatrix}
= \begin{pmatrix} I & 0 \\ 0 & \hat D \end{pmatrix}.
\]
By comparing entries in the above equality we have the result that \(\hat K = (\hat Y - \hat M\hat Q)(\hat X - \hat N\hat Q)^{-1}\). This proves the theorem. \(\Box\)

Let us assume that in Figure 4.1 \(G\) and \(K\) are FDLTIC systems. Also assume that a stabilizable and detectable state-space description of \(G\) is described by
\[
G = \begin{pmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{pmatrix}
= \left[ \begin{array}{c|cc} A & B_1 & B_2 \\ \hline C_1 & D_{11} & D_{12} \\ C_2 & D_{21} & D_{22} \end{array} \right].
\]
This notation is a convenient way of writing
\[
G_{11} = \left[ \begin{array}{c|c} A & B_1 \\ \hline C_1 & D_{11} \end{array} \right], \quad
G_{12} = \left[ \begin{array}{c|c} A & B_2 \\ \hline C_1 & D_{12} \end{array} \right], \quad
G_{21} = \left[ \begin{array}{c|c} A & B_1 \\ \hline C_2 & D_{21} \end{array} \right], \quad
G_{22} = \left[ \begin{array}{c|c} A & B_2 \\ \hline C_2 & D_{22} \end{array} \right].
\]
Lemma 4.2.2. There exists a FDLTIC system \(K\) which stabilizes the closed loop in Figure 4.1 if and only if \((A, B_2)\) is stabilizable and \((A, C_2)\) is detectable. If \(F\) and \(L\) are such that \(\rho(A + B_2F) < 1\) and \(\rho(A + LC_2) < 1\) then a controller with a state space realization given by
\[
\hat K = \left[ \begin{array}{c|c} A + B_2F + LC_2 & -L \\ \hline F & 0 \end{array} \right]
\tag{4.9}
\]
stabilizes the closed loop system depicted in Figure 4.1.

Proof. (\(\Leftarrow\)) If \((A, B_2)\) is stabilizable and \((A, C_2)\) is detectable then there exist matrices \(F\) and \(L\) such that \(\rho(A + B_2F) < 1\) and \(\rho(A + LC_2) < 1\). Let \(K\) be a controller with a state space realization given in (4.9). It can be shown that the closed loop system has a state-space description
\[
\left[ \begin{array}{c|c} \hat A & \hat B \\ \hline \hat C & \hat D \end{array} \right],
\qquad\text{where}\qquad
\hat A = \begin{pmatrix} A & B_2F \\ -LC_2 & A + B_2F + LC_2 \end{pmatrix},
\]
which has the same eigenvalues as the matrix
\[
\begin{pmatrix} A + LC_2 & 0 \\ -LC_2 & A + B_2F \end{pmatrix}.
\]
Thus \(\rho(\hat A) < 1\), from which it follows from Theorem 4.1.2 that the closed loop map is stable.

(\(\Rightarrow\)) If \((A, B_2)\) is not stabilizable or \((A, C_2)\) is not detectable then some eigenvalues of \(\hat A\) will remain outside the unit disc for any FDLTIC controller \(K\). Details are left to the reader. \(\Box\)

The controller \(K\) given above is called the Luenberger controller.

Lemma 4.2.3. Suppose \((A, B_2)\) is stabilizable and \((A, C_2)\) is detectable. Then a FDLTIC system \(K\) stabilizes the closed loop system depicted in Figure 4.1 if and only if it stabilizes the closed loop system depicted in Figure 4.2.

Proof. (\(\Rightarrow\)) The closed loop map depicted in Figure 4.1 is described by the equations
\[
\begin{aligned}
z &= G_{11}w + G_{12}u, \\
y &= G_{21}w + G_{22}u, \\
u &= Ky + Kv_2 + v_1.
\end{aligned}
\tag{4.10}
\]
The description of the closed loop map depicted in Figure 4.2 is given by
\[
\begin{aligned}
y &= G_{22}u, \\
u &= Ky + Kv_2 + v_1.
\end{aligned}
\tag{4.11}
\]
It is thus clear (substitute \(w = 0\) in (4.10)) that if the map from \((w, v_1, v_2)\) to \((z, u, y)\) in (4.10) is stable then the map from \((v_1, v_2)\) to \((u, y)\) in (4.11) is stable.
(\(\Leftarrow\)) Suppose \(K\) is a stabilizing controller for the closed loop map in Figure 4.2. Let \(\left[ \begin{array}{c|c} A_K & B_K \\ \hline C_K & D_K \end{array} \right]\) be a stabilizable and detectable state-space description of \(K\). By assumption \(\left[ \begin{array}{c|c} A & B_2 \\ \hline C_2 & D_{22} \end{array} \right]\) is a stabilizable and detectable state-space description of \(G_{22}\). Suppose \(\left[ \begin{array}{c|c} \hat A & \hat B \\ \hline \hat C & \hat D \end{array} \right]\) is a state-space description of the closed loop map obtained by employing the aforementioned state-space descriptions of \(G_{22}\) and \(K\). Then one can show that \((\hat A, \hat B)\) and \((\hat A, \hat C)\) are stabilizable and detectable [2]. Thus from Theorem 4.1.2 it follows that \(\rho(\hat A) < 1\). If \(\left[ \begin{array}{c|c} \tilde A & \tilde B \\ \hline \tilde C & \tilde D \end{array} \right]\) is a description of the closed loop map in Figure 4.1 obtained by using the descriptions \(\left[ \begin{array}{c|c} A_K & B_K \\ \hline C_K & D_K \end{array} \right]\) for \(K\) and \(\left[ \begin{array}{c|cc} A & B_1 & B_2 \\ \hline C_1 & D_{11} & D_{12} \\ C_2 & D_{21} & D_{22} \end{array} \right]\) for \(G\), then by computing \(\tilde A\) one can verify that \(\tilde A = \hat A\). Thus \(\rho(\tilde A) < 1\) and therefore from Theorem 4.1.2 it follows that the closed loop system in Figure 4.1 is stable. \(\Box\)

Theorem 4.2.2. Suppose \((A, B_2)\) is stabilizable and \((A, C_2)\) is detectable. Let the FDLTIC system \(G_{22}\) admit a dcf as given in Lemma 4.2.1. Then \(K\) is a FDLTIC stabilizing controller for the closed loop system in Figure 4.1 if and only if
\[
\hat K = (\hat Y - \hat M\hat Q)(\hat X - \hat N\hat Q)^{-1} = (\hat{\tilde X} - \hat Q\hat{\tilde N})^{-1}(\hat{\tilde Y} - \hat Q\hat{\tilde M}),
\]
for some FDLTIC stable system \(Q\).

Proof. Follows immediately from Theorem 4.2.1 and Lemma 4.2.3. \(\Box\)

By using the above parametrization we can show that
\[
K(I - G_{22}K)^{-1} = (Y - MQ)\tilde M.
\]
The map from \(w\) to \(z\) in Figure 4.1 is given by
\[
\Phi = G_{11} + G_{12}K(I - G_{22}K)^{-1}G_{21}.
\]
Thus we have the following theorem.

Theorem 4.2.3. Let \(G\) be a FDLTIC system and let \(G_{22}\) admit a dcf as given in Lemma 4.2.1. \(\Phi\) is a map from \(w\) to \(z\) in Figure 4.1 for some FDLTIC \(K\) which stabilizes the closed loop if and only if
\[
\Phi = H - UQV,
\]
where \(H = G_{11} + G_{12}Y\tilde M G_{21}\), \(U = G_{12}M\), \(V = \tilde M G_{21}\) and \(Q\) is some stable FDLTIC system.
We now present a result which is a generalization of Theorem 4.2.3.

Theorem 4.2.4 (Youla parametrization). Let \(G\) be a FDLTIC system and let \(G_{22}\) admit a dcf as given in Lemma 4.2.1. \(\Phi\) is a map from \(w\) to \(z\) in Figure 4.1 for some linear, time invariant, causal \(K\) which stabilizes the closed loop in the \(\ell_\infty\) sense if and only if
\[
\Phi = H - UQV,
\]
where \(H = G_{11} + G_{12}Y\tilde M G_{21}\), \(U = G_{12}M\), \(V = \tilde M G_{21}\) and \(Q\) is some \(\ell_\infty\) stable system.

The parameter \(Q\) is often referred to as the Youla parameter. The difference between Theorem 4.2.3 and Theorem 4.2.4 is that in Theorem 4.2.4 the controller \(K\) is not restricted to be finite-dimensional. The proof of this theorem is similar to the one presented for Theorem 4.2.3 except that an analogous result for coprime factorization over \(\ell_\infty\) stable systems is utilized.
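In the SISO problems of the following chapters the parametrization collapses to the scalar convolution form \(\phi = h - u*q\). The Python sketch below is a numerical illustration only (the FIR truncation and the randomly chosen parameter \(q\) are our own choices); it uses the \(h\) and \(u\) that appear in the examples of Chapters 5 and 6, namely \(\hat h = 1\) and \(\hat u(\lambda) = \lambda - \tfrac{1}{2}\), and checks the interpolation condition \(\hat\phi(\tfrac{1}{2}) = \hat h(\tfrac{1}{2})\) satisfied by every achievable closed loop map.

```python
import numpy as np

h = np.array([1.0])                 # h_hat(lambda) = 1
u = np.array([-0.5, 1.0])           # u_hat(lambda) = lambda - 1/2

rng = np.random.default_rng(0)
q = rng.normal(size=20)             # an arbitrary (truncated) stable Youla parameter

uq = np.convolve(u, q)              # impulse response of u * q
phi = np.pad(h, (0, len(uq) - len(h))) - uq   # phi = h - u * q

# Every achievable phi interpolates h at the zero z = 1/2 of u inside the disc.
z = 0.5
print(np.polyval(phi[::-1], z))     # approximately 1.0 for any q
```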
5. SISO \(\ell_1/\mathcal{H}_2\) Problem
Consider the system of Figure 5.1 where \(w = (w_1\ w_2)\) is the exogenous disturbance, \(z = (z_1\ z_2)\) is the regulated output, \(u\) is the control input and \(y\) is the measured output. A number of control design problems can be cast in the framework depicted by Figure 5.1 (see [3]). In feedback control design the objective is to design a controller \(K\) such that with \(u = Ky\) the resulting closed loop map \(\Phi_{zw}\) from \(w\) to \(z\) is stable (see Figure 5.1) and satisfies certain performance criteria. Such criteria may be posed in terms of a measure on \(\Phi_{zw}\) which depends on the signal norms of \(w\) and \(z\) that may be of interest in a particular situation. For example, the \(\mathcal H_\infty\) norm of the closed loop measures the energy of the regulated output \(z\) for the worst disturbance \(w\) whose energy is bounded by one, i.e.
\[
\|\Phi_{zw}\|_{\mathcal H_\infty} = \sup_{\|w\|_2 \le 1} \|\Phi_{zw} w\|_2.
\]
The standard \(\mathcal H_\infty\) problem minimizes this norm over all achievable closed loop maps. Thus the standard \(\mathcal H_\infty\) problem solves the following problem:
\[
\min_{K \text{ stabilizing}} \|\Phi_{zw}(K)\|_{\mathcal H_\infty},
\]
where \(\Phi_{zw}(K)\) is the closed loop map from \(w\) to \(z\). The two norm of the closed loop, \(\|\Phi_{zw}\|_2\), measures the energy in the regulated output \(z\) for a unit pulse input \(w\). The standard \(\mathcal H_2\) problem finds a stabilizing controller \(K\) which results in a closed loop map which has the minimum \(\mathcal H_2\) norm when compared to all other closed loop maps achievable through stabilizing controllers.
Fig. 5.1. Closed Loop System.
State space solutions for both the above mentioned problems are provided in [4]. The \(\ell_1\) norm of the closed loop, \(\|\Phi_{zw}\|_1\), is the infinity norm of the regulated output \(z\) for the worst disturbance \(w\) whose magnitude is bounded by one, i.e.
\[
\|\Phi_{zw}\|_1 = \sup_{\|w\|_\infty \le 1} \|\Phi_{zw} w\|_\infty.
\]
The standard \(\ell_1\) problem finds a controller which minimizes this norm over all closed loop maps that are achievable through stabilizing controllers. Thus the standard \(\ell_1\) problem is to determine a controller \(K\) which solves the following problem:
\[
\min_{K \text{ stabilizing}} \|\Phi_{zw}(K)\|_1.
\]
It is shown in [5] that this problem reduces to a finite dimensional linear program for the 1-block case.

All of the previous criteria refer to a single performance measure of the closed loop. It is well known (see for example [3]) that minimization with respect to one norm may not necessarily yield good performance with respect to another. This has led researchers to consider problems where multiple measures of the closed loop are incorporated directly into the design. One of the important classes of problems in this category is the mixed \(\mathcal H_2/\mathcal H_\infty\) design. Here, the focus is on problems which include the \(\mathcal H_2\) and the \(\mathcal H_\infty\) norms of the closed loop in their definitions. Several state space results are available in this class (e.g., [6, 7]). Another important class of problems considers the interplay between the \(\ell_1\) and the \(\mathcal H_\infty\) norm of the closed loop. Problems from this class are addressed in [8, 9, 10, 11, 12]. In this book we will design controllers which guarantee performance as reflected by the \(\mathcal H_2\) and the \(\ell_1\) measures. In this and the next chapter we present single-input single-output (SISO) problems (where there is only one exogenous disturbance and only one regulated variable) for this class. In Chapter 7 and Chapter 8 we present the multi-input multi-output (MIMO) problems where we study the interplay between the \(\mathcal H_2\) and the \(\ell_1\) norms for a general plant. The SISO problems serve to highlight the characteristics of the \(\mathcal H_2\)-\(\ell_1\) problems. The relevant literature for various methodologies which address the objective of the \(\mathcal H_2\) measure in conjunction with time domain measures can be found in [13, 14, 15, 16, 17, 18, 19, 20, 21].

In this chapter we consider the problem of minimizing the \(\ell_1\) norm of the transfer function from the exogenous input to the regulated output over all stabilizing controllers while keeping its \(\mathcal H_2\) norm under a specified level. The problem is analysed for the discrete-time, single-input single-output (SISO), linear time invariant case. It is shown that an optimal solution always exists. Duality theory is employed to show that any optimal solution is a finite impulse response (FIR) sequence and an a priori bound is given on its length.
Thus, the problem can be reduced to a finite dimensional convex optimization problem with an a priori determined dimension. Finally it is shown that, in the region of interest of the \(\mathcal H_2\) constraint level, the optimal solution is unique and continuous with respect to changes in the constraint level.
5.1 Problem Formulation

Consider the standard feedback problem represented in Figure 5.2 where \(P\) and \(K\) are the plant and the controller respectively. Let \(w\) represent the exogenous input, \(z\) the output of interest, \(y\) the measured output and \(u\) the control input, where \(z\) and \(w\) are assumed scalar. Let \(\phi\) be the closed loop map which maps \(w \to z\). From the Youla parametrization (see Theorem 4.2.4) it is known that all achievable closed loop maps under stabilizing controllers are given by \(\phi = h - u*q\), where \(h, u, q \in \ell_1\); \(h, u\) depend only on the plant \(P\) and \(q\) is a free parameter in \(\ell_1\).

The following is a result from complex analysis.

Theorem 5.1.1. Given a sequence \(\{t(i)\}\) in \(\ell_1\) with the associated \(\lambda\)-transform \(\hat t(\lambda)\), the following statements are equivalent.
1. \(\frac{d^k \hat t}{d\lambda^k}\Big|_{\lambda=\lambda_0} = 0\) for \(k = 0, 1, \ldots, \alpha - 1\).
2. \(\hat t(\lambda) = (\lambda - \lambda_0)^{\alpha}\hat s(\lambda)\), where \(\hat s\) is the \(\lambda\)-transform of some element \(s \in \ell_1\).

Throughout the chapter we make the following assumption.

Assumption 1. All the zeros of \(\hat u\) (the \(\lambda\)-transform of \(u\)) inside the unit disc are real and distinct. Also, \(\hat u\) has no zeros on the unit circle.

The assumption that all zeros of \(\hat u\) which are inside the open unit disc are real and distinct is not restrictive and is made to streamline the presentation of the chapter. Let the zeros of \(\hat u\) which are inside the unit disc be given by \(z_1, z_2, \ldots, z_n\).

Fig. 5.2. Plant controller configuration

Let
\[
\Theta := \{\phi \in \ell_1 : \text{there exists } q \in \ell_1 \text{ with } \phi = h - u*q\}.
\]
\(\Theta\) is the set of all achievable closed loop maps under stabilizing controllers. Let \(A : \ell_1 \to \mathbf{R}^n\) be given by
\[
A\phi := \begin{pmatrix}
1 & z_1 & z_1^2 & z_1^3 & \cdots \\
1 & z_2 & z_2^2 & z_2^3 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \\
1 & z_n & z_n^2 & z_n^3 & \cdots
\end{pmatrix}
\begin{pmatrix} \phi(0) \\ \phi(1) \\ \phi(2) \\ \vdots \end{pmatrix},
\quad\text{so that } (A\phi)_i = \hat\phi(z_i),
\]
and let \(b \in \mathbf{R}^n\) be given by \(b_i := \hat h(z_i)\), \(i = 1, \ldots, n\).

Theorem 5.1.2. The following is true:
\[
\Theta = \{\phi \in \ell_1 : \hat\phi(z_i) = \hat h(z_i) \text{ for all } i = 1, \ldots, n\} = \{\phi \in \ell_1 : A\phi = b\}.
\]
Proof. (\(\Rightarrow\)) Suppose \(\phi \in \Theta\). Then \(\phi = h - u*q\) for some \(q \in \ell_1\). It follows that \(\phi \in \ell_1\) and that \(\hat\phi(z_i) = \hat h(z_i) - \hat u(z_i)\hat q(z_i) = \hat h(z_i)\) for all \(i = 1, \ldots, n\), since \(\hat u(z_i) = 0\). Thus \(\phi \in \ell_1\) and \(A\phi = b\).

(\(\Leftarrow\)) Suppose \(\hat\phi(z_i) = \hat h(z_i)\) for all \(i = 1, \ldots, n\) and \(\phi \in \ell_1\). It follows that \((\hat h - \hat\phi)(z_i) = 0\) for all \(i = 1, \ldots, n\) and \(h - \phi \in \ell_1\). From Theorem 5.1.1 it follows that \((\hat h - \hat\phi)(\lambda) = (\lambda - z_1)\cdots(\lambda - z_n)\hat p(\lambda)\), where \(\hat p(\lambda)\) is the \(\lambda\)-transform of an element \(p \in \ell_1\). Let \(\hat u(\lambda) = (\lambda - z_1)\cdots(\lambda - z_n)\hat u_1(\lambda)\), where \(\hat u_1\) has no zeros inside the closed unit disc in the complex plane. Thus it follows that
\[
\hat q(\lambda) := \frac{\hat h(\lambda) - \hat\phi(\lambda)}{\hat u(\lambda)} = \frac{\hat p(\lambda)}{\hat u_1(\lambda)}.
\]
As \(\hat u_1\) has no zeros inside the closed unit disc it is a unit in \(\ell_1\). It follows that \(q \in \ell_1\). Thus \(\phi = h - u*q\) where \(q \in \ell_1\). \(\Box\)

The following problem
\[
\nu_\infty := \inf\{\|h - u*q\|_1 : q \in \ell_1\} = \inf\{\|\phi\|_1 : \phi \in \ell_1 \text{ and } A\phi = b\}
\tag{5.1}
\]
is the standard \(\ell_1\) problem. In [5] it is shown that this problem has a solution which is possibly non-unique. Optimal solutions are shown to be finite impulse response sequences. Let
\[
\mu_\infty := \inf\{\|h - u*q\|_2^2 : q \in \ell_1\} = \inf\{\|\phi\|_2^2 : \phi \in \ell_1 \text{ and } A\phi = b\},
\tag{5.2}
\]
which is the standard \(\mathcal H_2\) problem. The solution to this problem is unique and is an infinite impulse response sequence. Define
\[
m_1 := \inf_{A\phi = b,\ \|\phi\|_2^2 \le \mu_\infty} \|\phi\|_1,
\tag{5.3}
\]
which is the \(\ell_1\) norm of the unique optimal solution of the standard \(\mathcal H_2\) problem. Let
\[
m_2 := \inf_{A\phi = b,\ \|\phi\|_1 = \nu_\infty} \|\phi\|_2^2,
\tag{5.4}
\]
which is the infimum over the squared \(\ell_2\) norms of the optimal solutions of the standard \(\ell_1\) problem. The problem of interest is: Given a positive constant \(\gamma > \mu_\infty\), obtain a solution to the following mixed objective problem:
\[
\nu_\gamma := \inf\{\|h - u*q\|_1 : q \in \ell_1 \text{ and } \langle h - u*q,\, h - u*q\rangle \le \gamma\}
= \inf\{\|\phi\|_1 : \phi \in \ell_1,\ A\phi = b \text{ and } \langle\phi,\phi\rangle \le \gamma\}.
\tag{5.5}
\]
In the following sections we will study this problem from the point of view of existence, structure, continuity, and computation of the optimal solutions.
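The quantities \(\mu_\infty\) and \(m_1\) are easy to compute for the scalar example used later in this chapter (\(n = 1\), \(z_1 = \tfrac{1}{2}\), \(b = 1\)): the \(\mathcal H_2\)-optimal solution of \(A\phi = b\) is the least \(\ell_2\)-norm interpolant \(\phi = A^*(AA^*)^{-1}b\). The Python sketch below is only an illustration (the truncation length is ours); it should reproduce the values \(\mu_\infty = 0.75\) and \(m_1 = 1.5\) reported in Section 5.4.

```python
import numpy as np

N, z, b = 60, 0.5, 1.0
a = z ** np.arange(N)                    # truncated row of A: (1, z, z^2, ...)

phi_h2 = a * b / (a @ a)                 # least-norm solution of <a, phi> = b
mu_inf = phi_h2 @ phi_h2                 # optimal H2 cost (squared l2 norm)
m1 = np.abs(phi_h2).sum()                # l1 norm of the H2-optimal solution

print(mu_inf, m1)                        # approximately 0.75 and 1.5
print(phi_h2[:4])                        # 0.75, 0.375, 0.1875, ... = 0.75 (1/2)^t
```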
5.2 Optimal Solutions and their Properties
In the first part of this section we show that (5.5) always has a solution. In the second part we show that any solution to (5.5) is of finite length, and in the third we give an a priori bound on the length.

5.2.1 Existence of a Solution

Here we show that a solution to (5.5) always exists.

Theorem 5.2.1. There exists \(\phi_0\)
its0111 =
E (P such that
f{llSlll},
where ~ : = { S E gl : A S = b and < S , S ><_ 7} with "y > i ~ . infimum in (5.5) is a m i m m u m .
Therefore the
Proof. We denote the feasible set of our problem by ~ : = { S E gl : AS = b and < S, S >_< 7). v-r < oo because 3' > poo and therefore the feasible set is not empty. Let B := {S E gl : IlSl[x _< u-r + 1}. It is clear that
v-r=
inf .{[[Slla}.
CE,~nB
Therefore given i > 0 there exists Si E (P A B such that [ISil]l < u-r + ! B is a bounded set in gl = c0. It follows from the Banach-Alaoglu result --
i
~
88
(see Theorem 2.3.1) that B is W(c~,co) compact. Using the fact that co is separable we know (see Theoerm 2.3.2) that there exists a subsequence {r of {r and r E r B such that {r -+ r in the W(c~,co) sense, that is for all v in co
>--+
as k ~ e c .
(5.6)
Let the jth row of A be denoted by aj and the jthelement of b be given by bj . Then a s aj E co w e have, <~ aj, r
~>-~<
aj, r > as k ~ oo for all j = 1 , 2 , . . . , n.
(5.7)
= b we have < aj, r > = by for all k and for all j which implies < aj, dpo ;>= bj for all j. Therefore we have A(r = b. As 12 C Co we have
As A(r
from (5.6) that for all v in 12 < v, r
>--+< v, r
> as k -+ ~ ,
(5.8)
which shows that r --+ r in W(l~,l~). Also, from the construction of r we know that I1r 112 _< x/7. From Theorem 2.3.1 (Banach-Alaoglu theorem) we conclude that < r r > < 7 and therefore we have shown that r G ~. 1 From Theorem 2.3.1 Recall that r were chosen so that IIr < u~ § i-~" for all k. Therefore IIr < u~. As r E we have that [leo[It < u-r + U (which is the feasible set) we have IIr -- t,.,. This proves the theorem. [] 5.2.2
Structure of Optimal Solutions
In this subsection we use duality results to show that every optimal solution is of finite length. The following two lemmas establish the dual problem. L e m m a 5.2.1. u~ : max{~(yl,y2) :Yl >_ 0 and y2 E Rn},
(5.9)
where
~(y,, y2) := ~ f {11r
+ yl(< r r > - ~ ) + < b -
Ar
Y2 >}.
Proof. We will apply Theorem 3.3.2 (Kuhn-Tucker-Lagrange duality theorem) to get the result. Let X, $?, Y, Z in Theorem 3.3.2 correspond to ~ea,gl, R" and R respectively. Let g(r : = < r r > -% H ( r := b - Ar With this notation we have Z* = R. A has full range which implies 0 E int[range(H)]. 7 > p~o and therefore their exists r such that < r r > - 7 < 0 and H(r = 0. Therefore all the conditions of Theorem 3.3.2 are satisfied. From Theorem 3.3.2 (KuhnTucker-Lagrange duality theorem) we have u-y:
max
inf{Jlr162162162
yt )_O,y2ER'* CElx
This proves the lemma. The right hand side of (5.9) is the dual problem.
}. []
5.2 Optimal Solutions and their Properties Lemma
The dual problem is given by :
5.2.2.
m a x { ~ ( y l , Y2) : Yl >_ 0 and Y2 E tt'~},
where ~(Yl,
89
(5.10)
Y2) is
r162
>}
+ y x ( < r 1 6 2> - 7 ) + < b, y2 > - < r
v(i) is defined by v(i) := A* y2(i). Proof. Let Ya _> 0, Y2 E R '~. It is clear that inf {114)111+ Yl(< 4),8 > - 7 ) +
CEll
=
< b - A t , y2 >}
inf {114)111 + Y l ( < 4), 4) ~> --7)-}- < b, .I]2 > - < r v > } .
Suppose 4) E gl and there exists i such t h a t r < 0 then define 4)1 E gi such t h a t 4)l(j) = 4)(j) for all j # i and r -- 0. Therefore we have IIr + yl(< r > -7)+ < b, y2 > - < 4),v > k II4111~+ y l ( < 4)1,4)1 > --7)+ < b, y2 > - < 4)1,v >. This shows that we can restrict 4) in the infimization to satisfy r >_ O. This proves the lemma. [] T h e following theorem is the main result, of this subsection. It shows t h a t any solution of (5.5) is a finite impulse response sequence. 5.2.2. Define T:={4) E gi : there exists L* with r L" }. The dual of the problem is given by.
= 0 if i >
Theorem
max{~(yl,Y2) : Yl > 0, y2 E /in},
(5.11)
where ~(Yl, Y2) is r
inf
{llr162162
y2>-<4),v>}.
v(i) defined by v(O = A'y2(i). Also, any optimal solution r of (5.5) belongs to T. Proof. Let y~ >_ 0, y~ E R '~ be the solution to max yI >>O,y2ER"
r162
+ y~(< 4),r > - 7 ) + < b
m4),y2 >}. * At
It is easy to show t h a t there exists L* such that v'Y(i) := (A y2)(i) satisfies Iv*r(i)l < 1 if/>_ L'. If 4)(i)v'~ ( i) > 0 for all i then, 114)11~ + y ; ( < r 1 6 2 > - 7 ) + < b,y~ > - < r ~ > (x) =
~--~{1r
+ y~ (r
2- r
- y~7+ < y~,b >
i=0 oo
= Elr
- vW(i)) + y~(r
~ } - Y1~7+ < y~, b >
- v'~(i)) + y~(r
2}
i=0 L*
= E{r i=0
+
)_~ { r 1 6 2
2} - Y ~ 7 + < yg,b > .
i=L*+I
Suppose [v"(i)[ < 1. Then we have, r
- vY(i)) + y~(r
2> 0
and equals zero only if r = 0. Therefore, in the infimization we can restrict r = 0 whenever IvY(i)[ < 1. As IvY(i)[ < 1 for all i _> L" it follows t h a t we can restrict r to 7- in the infimization. In T h e o r e m 5.2.1 we showed t h a t there exists a solution r to the primal. From T h e o r e m 3.3.2 (Kuhn-TuckerLagrange duality theorem) we have t h a t r minimizes
Iir
+y~(< r162> -7)+
< b, y'~ > - < r v y >,
over all r in gl. From the previous discussion it follows t h a t r proves the theorem.
G 7-. This []
5.2.3 An A priori Bound
Solution
on the Length
of Any Optimal
In this subsection we give an a priori b o u n d on the length of any solution to (5.5). First we establish the following three lemmas. Lemma Ar
5.2.3. Let 7
inf
r162
>
#~,
:=
ml
Ar
inf
r162162
IIr
and Vy : =
y:, y~ represent a dual solution as obtained in (5.10).
I[r
Then y~ < M.y where My : = --
m~ .
~[ -- tt oo
Proof. Let 7 > 3'1 > # ~ and vy I '-.-
Ar
inf
r 1 6 2 ~"h
IIr
Let Yly, Y2Yrepresent
a dual solution as o b t a i n e d in (5.10). From Corollary 3.3.1 (a sensitivity result) we have < 7 -- 3'1, Y~ > ~ Vy, -- /JAr ~ Vy, _< m l , which implies t h a t y~ < '~: . This holds for all 7 > 71 > /loo. Therefore -"7-')'1 My : = y -m;,ur is an a priori b o u n d on y~. This proves the lemma. [] 5.2.4. Let r be a solution of the primal (5.5). Let y~, y~ represent the corresponding dual solution as obtained in (5.10). Let v y : = A*y~ then,
Lemma
y~r
= . - ( i2) - 1 i f vY(i) > 1 _ ~'(i)+1 --
=
2
0
i f vY(i) < - 1 i f IvY(i)[_< 1. 2rn xy/'~
Also, IlvY[[oo <_ ay where ay = y - u ~ + 1.
5.2 Optimal Solutions and their Properties
91
Proof. Let oo
L(r
:= ~-~{r
v'Y(i)) + Y7(r
-
2} - 7Y7+ <
b
,Y2-r > 9
i=O
Suppose ]v'~(i)[ = 1. Now, i f y 7 = 0 then it is clear that. Y•r = 0. I f y 7 > 0 then as r minimizes L(r we have r = 0. We have already shown that if [v'r(i)l < 1 then r = 0. Therefore, Y7r -= 0 if Iv~(i)l _< 1. Suppose v~(i) > 1 then it is easy to show that there exists r such that r > 0 and r - v'r(i)) + y7(r 2 < 0. As any optimal minimizes n(r we know that r ~ ( i ) ) - v ~ (i))+y7 (r 2 < 0, which implies r > 0 and therefore 1 - v'r(i) + 2y7r ) = 0. This implies that Y7r = ,'(i)-~ Similarly the result follows when v~(i) < - 1 . Therefore, 2 " I[v~[l~ < 2M~llr
< ~
--
--
IIr
"y--,u~o
follows from the fact that < r is an a priori upper bound on
r
< ~"F+I. --
"Y--P,
o o -
The last inequality
> < 7. This implies that c~-r := 2"~'v~ + 1 This proves the lemma. [] --
~ - P o o
IIv lloo.
--
L e m m a 5.2.5. If y~ E R n is such that IIA*y?llo~ < a n then there exists a positive integer L* independent of y2 such that I(A* y2)(i)l < 1 for all i >_ L*.
Proof. Define Zl
A)=
Z2
Z3 . . .
L" ' L.' L.' ' L. . . \
Zn 1
I
z3 ... z. /
A*L : R n -+ R L+I. With this definition we have A~o = A*. Let Y2 E R'* be such that [IA*y211~ < a~. Choose any L such that L > ( n - 1). As zl,i = 1 , . . . , n are distinct A~, has full column rank. A~ can be regarded as a linear map taking (R n, ]].]]1) --4 (R L+I, [I-[[oo). As A~ has full column rank we can define the left inverse of A~,, (A~,) -l which takes (R L+I, ]].1]o~) --+ (R", [I.[]l). Let the induced norm of (A~,) -l be given by H (A~) -l Ho~,l. Y2 e R n is such that IIA * y2l[oo < _ a-y and therefore IIALy2II+ * _< a~. It follows that, Ily2lll <_1[ (A~,) -t [[oo,1 I]A~y2lloo _
(5.12)
Choose L* such that max
k:l,...,n
Izkl L" II ( A ~ ) - ' II~o,~ ~-,
< 1
(5.13)
There always exists such an L* because ]zk] < 1 for all k = 1 , . . . , n . Note that L* does not depend on y~. For any i > L* we have
92
5. SISO t l / "H2 Problem k-~-I2
I(A*y2)(i)] = I E
z~y2(k)] <
Izklilly2111
max
-- k = l , . . . , n
k=l
<
max
Izk[i [[ (A}.) -1 11003 a,~,
-- k = l , . . . , n
< --
max
Izkl L" [[ (A'L) -l Iloo,1 a~.
k=l,...,n
The second inequality follows from (5.12). From (5.13) we have I(A'y2)(i)l < 1 if i > L*. This proves the lemma. [] The following theorem is the main result of the section. T h e o r e m 5.2.3. Every solution r
of the primal (5.5) is such that r = 0 if i >_ L* where L" given in Lemma 5.2.5 can be determined a priori. Furthermore the upperbound on lengths of the optimal solutions is nonincreasing as a function of 7.
Proof. Using Lemma 5.2.4 we can bound on I[v'r]]c~ by a~. By applying Lemma 5.2.5 we conclude that there exists L~ (which can be determined a priori) such that ]v~(i)] < 1 if i > L~. Using the fact that r = 0 if [v~(i)[ < 1 we conclude that r = 0 if i _> L.~. L,r was chosen to satisfy max
k=l,...,n
Iz l z"
II (A},) -I 1[~,1 av < 1.
a-~ is nonincreasing as a function of 7 9 Therefore L~ is nonincresaing as a function of 7. This proves the theorem. [] Note that as a~ = 2ml,fi ~ , - ttzr + 1, we have that the upper bound on lengths of the solutions increases to infinity as 7 decreases to #oo. This is commmensurate with the fact that the optimal solution for the standard 7-/, problem (5.2) is an infinite impulse response sequence. The above theorem shows that the problem at hand is a finite dimensional convex problem of a priori determined dimension. In particular, in view of Theorem 5.2.3 the problem that needs to be solved is as follows L"
u~ = At, r
r <~ E '
where AL. =
'
-
Ir k)l,
(5.14)
k = 0
1 Zl z [ . . . \ 1 z2 z ~ . . z L ' J . . . . . .
and
L*
is given in L e m m a 5.2.5. An alter-
1 zn z~ .. z nL* native representation can be given as the following lemma suggests L e m m a 5.2.6. The primal is given by." L*
minimize E k----O
r
+ r
(5.15)
5.3 Uniqueness and Continuity of tile Solution subject to
AL.(r
< r r162
-- r
r162
93
= b
r
>_<
in R L~ w i t h r 1 6 2
> 0.
Proof. Note that in the above theorem the ordering is componentwise for the inequalities. We will show that (5.14) is equivalent to (5.15). Let P0 denote the value attained by the objective functional in (5.15). Suppose r r satisfy the constraints of 5.15. Let r := r - r Then it is clear that r satisfies the constraints of (5.14). Also, for each k, ]r = Ir + - r _< Jr + Jr = r + r (k). This implies that v-~ _< P0. Suppose r satisfies the constraints of (5.14). Define r such that r = r if r _> 0 and 0 otherwise. Similarly, define r such that r = -r if r _< 0 and 0 otherwise. It is clear that r = r - r and that r r satisfy the constraints of (5.15). Also, Jr = r + r This proves that u7 _> P0. Therefore u7 = P0. It is easy to show that if r r is optimal for (5.15) then r := r - r is optimal for (5.14). This proves the lemma. [] This class of convex problems can be solved efficiently using standard methods [22].
5.3 Uniqueness and Continuity of the Solution In this section we address the issue of uniqueness and continuity of solutions to the primal problem with respect to changes in the constraint level on the 7/~ norm of the closed loop map. In the first part we address the issue of uniqueness and in the second part we show that the optimal solution is continuous in the region where it is unique. 5.3.1 U n i q u e n e s s o f t h e O p t i m a l S o l u t i o n The following three lemmas are established before the main result of this subsection L e m m a 5.3.1. Let y~ >_ O, y~ E R n be a solution to (5.10). If y~ = 0 then v~ = v ~ . This implies that (5.5) reduces to solving a standard 21 problem. Proof. Let v :-- A ' y 2 and r (5.10) is given by:
be such that Ar 1 = b. If y~ -- 0 then the dual max
y~ER n r
inf
{JJCJJl+ < b - A r
>}
O0
= max inf y2eR ~ r
~--~{r i=o
- v(i))}+ < r
v >
94
5. SISO ~1/ 7t2 Problem co
=
max
inf
~-~{r
- v(i))}+ < r
v e R a n g e ( A* ) r i)v( i) > O i = 0
i
v >
Suppose Ilvllo~ > 1 then there exists j such that Iv(j)[ > 1. Thus we can choose r with r M. This implies that
>_ 0 such that C(j)(sgn(v(j)) - v(j)) < M for any
oo
r
inf
~-'{r "= W:o
- v(i))}+ < ffa,v > = - o e .
Therefore we can restrict v in the maximization to satisfy Ilvll~ ___ 1. From arguments similar to that of the proof of Theorem 5.2.3 r = 0 whenever Iv(i)[ < 1. Therefore the infimum term is zero whenever Ilvll~ _< 1. This implies that the dual problem reduces to: max < C 1, v >, ven,,~ge(a'),llvl[~__ O, y~ E R '~ be a solution in (5.10). If y~ > 0 then
the solution r
of (5.5) is unique.
Proof. Let L ( C ) : = I1r + y~'(< r162 > - 7 ) + < b - AC, y~ >. From Theorem 3.3.2 (Kuhn-Tucker-Lagrange duality theorem) we know that C0 minimizes L(C), C E ~1. If y~ > 0 then it is easy to show that L(C) is strictly convex in ~I- From the L e m m a 3.3.2 it follows that r is unique. This proves the lemma. [] The main result of this subsection is now presented. Theorem
5.3.1. Define S := {r : Ar = b and 11r
= u ~ } , m2 := inf < CES
r C > - The following is true: 1) I f 7 > rn2 then problem (5.5) is equivalent to the standard ~x problem
whose solution is possibly nonunique. 2) If #o~ < 7 < rn~ then the solution to (5.5) is unique. Proof. Suppose m2 < 7 then there exists r E s such that Ar = b, I1r = u ~ and < Cl, Cl > 5 ")'. This implies that t,~ = inf IIC111 < uoo. T h e Ar
~b,r162
other inequality is obvious. This proves 1). Let/zoo < 7 < m2 and suppose yl~ = 0 then we have shown in L e m m a 5 . 3 . 1 that v7 = u ~ . Therefore there exists Cl e el such that [1r = u ~ , Ar = b and < r162 > < 7 < m2. This implies that r E S and < r162 > < m2 which is a contradiction. Therefore y~ > 0. From L e m m a 5.3.2 we know that C0 is unique. This proves 2). [] The above theorem shows that in the region where the constraint level on the 7"/2 is essentially of interest (i.e., active) the optimal solution is unique.
5.4 An Example 5.3.2 C o n t i n u i t y o f t h e O p t i m a l
95
Solution
Following is a theorem which shows that the ~1 norm of the optimal solution is continuous with respect to changes in the constraint level 7. Theorem
5.3.2. Let v.y :=
Ar162
inf
~?
116111- Then v.~ is a continuous
function of 7 on (/aoo, oo). Proof. If 7 E ( ~ , ~ ) then it is obvious that "~ ~ i n t { d o m ( v ~ ) } where dom(v~) := {7 : - o o < v-~ < oo} is the domain of v-~. From L e m m a 3.3.3 we know that v~ is a convex function of 7. The theorem follows from the fact that every convex function is continuous in the interior of its domain (see L e m m a 3.1.2). [] Now we prove that the optimal solution is continuous with respect to changes in the constraint level in the region where the optimal is unique. Theorem
Then r
5.3.3. Let poo < 7 ~ m2. Let 6~ represent the solution of
-4 6~ in the norm topology if 7k --4 7.
Proof. Let ml :=
min
ACmb,
]]6]]1. Then it is obvious that ]]6~]11 = v~ (
m l . Without loss of generality assume that 7k > 7/2. Let L* represent the upper bound on the length of 6~- Then as the upperbound is nonincreasing (see Theorem 5.2.3) we can assume that r E R L ' . L e t B := { x : x E R L" : ]]x]]l _< m l } then we have 6-~k E B. Therefore there exists a subsequence Ok, of 6~k and 61 such that 6k, ~ 61 as i -4 cx~ in (R L" , 11.111).
(5.16)
It is clear as in the proof of T h e o r e m 5.2.1 that A61 = b as ACk, = b for all i. Also,
11611t22 <_ II6! -
r l l / + II6k,112 _< I161 6k,1122 +Tk,. -
Taking limits on both sides as i -4 0o we get < r >_< 7. This implies that 61 is a feasible element in the problem of v-y. From T h e o r e m 5.3.2 we have ]]6k,]]1 -4 v~. From (5.16) we have ]]61]]1 = v~. From uniqueness of the optimal solution we have 61 = 6~. From uniqueness of the o p t i m a l it also follows that 6 ~ -4 6~. This proves the theorem. []
5.4 A n E x a m p l e In this section we illustrate the theory developed in the previous sections with an example. Consider the SISO plant,
96
5. SISO gl/ 7i~ Problem 1 /5()~) = )~_ 2 '
(5.17)
where we are interested in the sensitivity m a p r := ( I - P K ) - I . Using Youla p a r a m e t r i z a t i o n we get that all achievable transfer functions are given by r = (I - / 5 / ~ - ) - ~ = 1 - (A - 1)0 where ~) is a stable m a p . T h e m a t r i x A and b are given by 1
1
A:(1,2,22
. . . . ),
b = 1.
It is easy to check t h a t for this p r o b l e m ~r162:= inf{llr
: r 9 t?~ and Ar : b} : 0.75,
and rnl :--
I1r
inf
with the o p t i m a l solution r r
=
~
= 1.5,
given by
0.75.t --~-A .
t----O
P e r f o r m i n g a s t a n d a r d gl o p t i m i z a t i o n [3] we obtain u ~ :-- i n f { l l r
: r 9 g~ and A r = b} = i
and m2 :=
inf
Ar162
11r
= x,
with the o p t i m a l solution r = 1. We choose the constraint level to be 0.95. Therefore, ~-y = 2ml,/~ + 1 = 15.62. For this e x a m p l e n = 1 and zl = 1 L" the a priori b o u n ~ o n the length of the o p t i m a l is chosen to satisfy max
k=l,...,n
Izkl L" II (A~) -t I1~,1 a~ < 1.
(5.18)
where L is any positive integer such t h a t L > (n - 1). We choose L = 0 and therefore AL = 1 and ]] (A~) -~ Hob,l= 1. We choose L* = 4 which satisfies (5.18). Therefore, the o p t i m a l solution r satisfies r = 0 if i > 4. T h e p r o b l e m reduces to the following finite dimensional convex o p t i m i z a t i o n problem: u~ =
rain
3 {~"~ ]r
: r 9 R4},
1 1 1 where AL. = (1, 2,2~, 2~). We obtain (using M a t l a b O p t i m i z a t i o n T o o l b o x ) the o p t i m a l solution r to be:
r (A) = 0.9732 + 0.0535)~.
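The finite dimensional problem (5.14), in the split form of Lemma 5.2.6, can be handed to any convex or nonlinear programming routine; the text used the Matlab Optimization Toolbox. The following SciPy sketch is illustrative only (the solver choice, starting point and rounding are ours) and should land near the optimum reported above.

```python
import numpy as np
from scipy.optimize import minimize

# Example of Section 5.4: z1 = 1/2, b = 1, L* = 4, gamma = 0.95.
L, gamma = 4, 0.95
AL = 0.5 ** np.arange(L + 1)              # (1, 1/2, ..., 1/2^L)

# Split phi = phi_plus - phi_minus with both parts nonnegative (Lemma 5.2.6),
# so the objective sum(phi_plus + phi_minus) equals ||phi||_1.
phi_of = lambda x: x[:L + 1] - x[L + 1:]

cons = [{'type': 'eq',   'fun': lambda x: AL @ phi_of(x) - 1.0},
        {'type': 'ineq', 'fun': lambda x: gamma - phi_of(x) @ phi_of(x)}]

res = minimize(lambda x: x.sum(), x0=np.full(2 * (L + 1), 0.1),
               bounds=[(0, None)] * (2 * (L + 1)),
               constraints=cons, method='SLSQP')

phi = phi_of(res.x)
print(np.round(phi, 4), np.abs(phi).sum(), phi @ phi)
# Should land near phi = (0.9732, 0.0535, 0, 0, 0) with ||phi||_1 about 1.027.
```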
5.4 An Example
97
Tradeoff Curve 1.5
]
I
I
I
I
0.8
0.85
~I 0.9
I
1.45 1.4 1.35 1.3 1.25 1.2 1.15 1.1 1.05 1 0.75
~
~
' ~ 0.95
Fig. 5.3. The e 1 and the 7i2 norms of the optimal closed loop for various values of "r are plotted. The x axis can be read as the square of 7i2 norm or the value of % The y axis shows the gl norm. Therefore we have IIr = 1.02670 and 11r ~ 0.95. The same c o m p u t a tion was carried out for various values of the constraint level, 7 E [0.75, 1]. The trazteoff curve between the gx and the 7i2 norms of the o p t i m a l solution is given in Figure 5.3. For all values of 7 in the chosen range the square of the 7/2 norm of the optimal closed loop was equal to the constraint level 7. Although when the constraint level 7 equals 0.75 the o p t i m a l closed loop m a p is an infinite impulse response sequence, the o p t i m a l closed loop m a p has very few nonzero terms in its impulse response even for values of 7 very close to 0.75. For example with "~ = 0.755 the o p t i m a l closed loop m a p is given by: r
= 0.7708 + 0.3632A + 0.1596A 2 + 0.0578A 3 + 0.0065A 4.
As a final remark, we can use the structure of this example to illustrate that the optimal unconstrained 7/2 solution can have 7/2 norm much smaller than the 7/2 norm of the o p t i m a l gl (unconstrained) solution. Hence, minimizing only the s norm, which is an upper bound on the 7/2 norm, m a y require substantial sacrifices in terms of 7-/2 performance. Indeed, instead of the P used in the example before, consider the plant /5~(A) -- A - a where now a is a zero in the unit disk (i.e., lal < 1) and very close to the unit circle (i.e., ]a] ~ 1). Then the optimal unconstrained 7/2 norm given by
(b~,(AaA*~)-lb~) 1/2
- (1-
la12) 1/2
where Aa = (1,a, a2,...), b~ = 1 (see [3] for details) is close to 0. On the other hand, for the optimal/1 unconstrained solution r we have r "- 1 which has 7/~ norm equal to 1. Therefore minimizing only with respect to ~1 may have undesirable 7/~ performance.
5.5 Summary In this chapter the mixed problem of/1/712 for the SISO discrete time case is solved. The problem was reduced to a finite dimensional convex optimization problem with an a p r i o r i determined dimension. The region of the constraint level in which the optimal is unique was determined and it was shown that in this region the optimal solution is continuous with respect to changes in the constraint level of the 7/2 norm. A duality theorem and a sensitivity result were used.
6. A Composite
Performance
Measure
This chapter studies a "mixed" objective problem of minimizing a composite measure of the ~1, 7t2, and/?r norms together with the ~ norm of the step response of the closed loop . This performance index can be used to generate Pareto optimal solutions with respect to the individual measures. The problem is analysed for the discrete time, single-input single-output (SISO), linear time invariant systems. It is shown via the Lagrange duality theory that the problem can be reduced to a convex optimization problem with a priori known dimension. In addition, continuity of the unique optimal solution with respect to changes in the coefficients of the linear combination is established.
6.1 Problem
Formulation
Let wl be the unit step input i.e., wl = (1, 1,...). The problem of interest can be stated as: Given cl > 0, c2 > 0, c3 > 0, and c4 > 0 obtain a solution to the following mixed objective problem:
:= =
inf
{elll~lll + e=ll~ll22 + e3110* wxll~ + e4ILr
inf
{c~llr
~b Achievable
CEel, Ar
e=11r = +c311r wxll~ + e41lr
} (6,1)
The assumptions made on the plant are the same assumptions that were made in Chapter 5. The definitions of achievability, the matrix A and the vector b are as given in Chapter 5. We define f : s --+ R by,
Y(r := cxllr
+ c211r
+ c311r wall~ + c411r
which is the objective functional in the optimization given by (6.1). In the following sections we will study the existence, structure and computation of the optimal solution. Before we initiate our study towards these goals it is worthwhile to point out certain connections between the cost under consideration and the notion of Pareto optimality.
6.1.1 R e l a t i o n
to Pareto
Optimality
T h e notion of Pareto o p t i m a l i t y can be stated as follows (see for e x a m p l e , [22]). Given a set of rn n o n n e g a t i v e functionals 7i, i -- l , . . . , m on a n o r m e d linear space X , a point x0 E X is Pareto o p t i m a l with respect to the vector valued criterion 7 := ( f l , . . . , f m ) if there does not exist any x E X such that 7 i ( x ) _< f i ( x 0 ) for all i E { 1 , . . . , m } and f i ( x ) < 7~(x0) for s o m e i E {1,...,m}. Under certain conditions the set of all Pareto o p t i m a l solutions can be generated by solving a m i n i m i z a t i o n of weighted s u m of the functionals as the following t h e o r e m indicates. T h e o r e m 6.1.1. [23] Let X be a normed linear space and each nonnegative functional 7i be convex. Also let {TI
s,.. := {e e ~ " : c, _> O, ~ c ,
= 1),
i=1
and for each e E R m consider the following scalar valued optimization: inf s
ciTi(x).
xEX i=1
If xo E X is Pareto optimal with respect to the vector valued criterion -](x), then there exists some c E Sra such that xo solves the above minimization. Conversely, given c E Sin, if the above minimization has at most one solution xo then xo is Pareto optimal with respect to 7(x). [] In the next section we show t h a t there is a unique solution r to P r o b lem (6.1). F u r t h e r m o r e , since u is assumed to be a scalar, there is a unique o p t i m a l q E ~1. Hence, in view of the a f o r e m e n t i o n e d t h e o r e m we have t h a t if we restrict our a t t e n t i o n to p a r a m e t e r s el,c2,c3 and c4 such t h a t (Cl,C2, C3, C4) E ~4 := {(c1,c2,e3, c4) : Cl JrC24-c34-c4 = 1, c,,c2, c3, c4 > 0}, we will produce a set of P a r e t o o p t i m a l solutions with respect to the vector valued function f ( q ) := ( l l h - u , q{ll, lib - u , qll2 2, {{(h - u , q) , w , l l ~ , Ilh - u , q l l ~ ) =: (71 (q), 72 (q), 73 (q), f 4 ( q ) ) . where q E ~l. Thus, if r is the o p t i m a l solution for P r o b l e m (6.1) with a corresponding qo for some given (el, c2, c3, c4) E Z'4, then there does not exist a preferable alternative r with r = h - u * q for some q E tl such t h a t f i ( q ) < fi(qo) for all i E { 1 , . . . , 4 } and f i ( q ) < fi(qo) for s o m e i E {1,...,4}. As a final note we m e n t i o n t h a t if (cl, c2, c3, c4) do not satisfy cl + c2 + c3 + c4 = 1 then we can define a new set of p a r a m e t e r s ~1, ~2, ~3 and 54 by cl = cl c2 = c~ 53 = c~ and 54 = c~ c1+c:+c3+c4 ' c1+c2+ca+c4 ' cl+c2+ca+c4 ct+c2+ca+c4 with 51 + 52 + ~3 + ~4 = 1. T h e s e new p a r a m e t e r s would yield the s a m e o p t i m a l solution as with (cl, c2, c3, c4).
6.2 Properties of the Optimal Solution
101
6.2 Properties of the Optimal Solution In the first part of this section we show that Problem (6.1) always has a solution. In the second part we show that any solution to Problem (6.1) is a finite impulse responsc sequence and in the third we give an a priori bound on the length. 6.2.1 E x i s t e n c e o f a S o l u t i o n Here we show that a solution to (6.1) always exists. T h e o r e m 6.2.1.
There exists r
= inf {C111r
f(r 9
+ C211r
r
where r162
9 ~ such that
"t- C3I] r * Wl ]]r "]- C411r
9 e~ : Ar = b}. ThereSore the i , ~ , m
,~ (6. ~) is ~ m i m m u ~ .
Proof. We denote the feasible set of our problem by 9 := {r 9 gt : Ar = b}. Let
B := { r 1 4 9
: e~llr
+ ~11r
+ r162
w~lloo + c~11r
< ~+ 1).
It is clear that u =
inf {~11r cE~nB
+ c211r
Therefore given i > 0 there exists r
~11r
+ c~11r
+ c311r w~ll~ + c~11r 9 ~5 M B such that
~ + c311r * w~llo~ + ~411r
1 < - + =.
Let
m B:={CEgl
:c~[14[11< u + l } .
B is a bounded set ill gt = C~. It follows from the Banach-Alaoglu result (see Theorem 2.3.1) that B is W(c~,co) compact. Using the fact that co is separable and that {r is a sequence in B we know that there exists a subsequence {r of {r and r 9 B such that r -+ r in the W(c~, co) sense, that is for all v in co < v, r
> ' - ~ < v, r
> as k --~ oe.
(6.2)
Let the jth row of A be denoted by aj and the jth element of b be given by bj . Then as aj 9 Co we have,
aj, r As A(r < aj, r that r
~"'~<~ aj, r
> as k -+ oo for all j = 1 , 2 , . . . , n.
(6.3)
= b we have < aj,r > = by for all k and for all j which implies >---=bj for all j. Therefore we have A(r = b from which it follows 9 4. This gives us ct I1r162
+ca]lr
* wlll~+eallr
> -
From (6.2) we can deduce t h a t for all t, r (t) converges to r An easy consequence of this is we have for all N as k tends to co, ~tN=o{ellr + c~(r (t)) 2 } + e3 max0
~Ct=o {etlr
+ e2(r (t)) 2} + c3 maxt
y]~{c,lr
+ c2(r
2} + c3 max ](r * wl)(t)[ + c4 t
max
Ir
0
<
u.
--
/:=0
Letting
N
c211r
--+ c~ in the above inequality we conclude that cl][r _< .. This proves the theorem. []
+
+ c311r * w~ll~ + c411r
6.2.2
Structure
of Optimal Solutions
In this subsection we use a Lagrange duality result to show t h a t every o p t i m a l solution is of finite length. Lemma
6.2.1.
u = m a x in f { f ( r
< b- Ar
y E R ~ ~Etl
>}.
(6.4)
Proof. We will apply T h e o r e m 3.3.2 (Kuhn-Tucker-Lagrange duality theorem) to get the result. Let X , / 2 , Y , Z in T h e o r e m 3.3.2 correspond to gt,ga, R n, R respectively. Let 7 :-- u-t-l, g ( r f(r and H ( r b-Ar W i t h this notation we have Z* = R. A has full range which implies 0 E int[range(H)]. From T h e o r e m 6.2.1 we know that there exists r such t h a t g(r = f(r -- - 1 < 0 and H(r = 0. Therefore all the conditions of T h e o r e m 3.3.2 are satisfied. From T h e o r e m 3.3.2 we have v--
max
inf{f(r
z>+
z>O,yER ~ r
Let z0 E R, Y0 E R" be a maximizing solution to the right hand side of the above equation. r being the solution of the primal we have from (3.30) t h a t < g(r z0 > + < g ( r Y0 > = 0 which implies that < g(r z0 > = 0. As g(r -'/: 0 we conclude that z0 -- 0. This proves the theorem.r-I T h e following theorem shows t h a t the solution to (6.1) is unique and t h a t it is a finite impulse response sequence. T h e o r e m 6.2.2. Define 7-:={r E gt :there exists L* with r L*}. The following is true: v -- m a x i n [ { f ( r y E R '~
C E T -
-
< r v > 4- < b, y >},
= 0 if i > (6.5)
-
where v(i) := (A* y)(i). Also, the solution to the primal (6.1) is unique and the solution belongs to 7-.
Proof. Let Y0 E R n be the solution to the right hand side of (6.5). Define vo := (A*yo)(i) and let J(r
:=f(r
y0>.
It is immediate that v is equal to inf ~--~{el [r162162
sup ](r
CEll ~0
sup Ir
i
<
i
b, yo > . As v0 is in gl we know that there exists L* such that vo(i) satisfies [vo(i)] < cl if i > L*. We now show that for r to be optimal it is necessary that r = 0 for all i > L*. Indeed, if r # 0 while [v0(i)] < cl for some i, note that, cllr
2- r
+ c2(r
> cl]r
2- r
+ c2(r
for any r e s such that r = 0. Moreover, ifr is such that r whenever j < L* and r = 0 whenever j > L* it follows that i
sup I Z i
= r
i
r
> sup ] Z r i
j=O
and sup Ir
j=0
> sup ]r
i
1
i
or equivalently,
[[r * Wl[[oo ~ [[r * Wl[[o~ and ][r
_> [[r
Hence, we have that J ( r > J(r which proves our claim. In T h e o r e m 6.2.1 we showed that there exists a solution r to the primal (6.1). From Theorem 3.3.2 we know that r is a solution to inf J ( r As J ( r is strictly CEll
convex in r we conclude that the solution to the primal (6.1) is unique. From the previous discussion it follows that the solution to the primal (6.1), r is in T. This proves the theorem. [] 6.2.3 A n A p r i o r i B o u n d o n t h e L e n g t h o f a n y O p t i m a l S o l u t i o n In this subsection we give an a priori bound on the length of the solution to (6.1). First we establish the following two lemmas. L e m m a 6.2.2. Let r
be a solution of the primal (6.1). Let Yo represent a corresponding dual solution as obtained in (6.5). Let vo := A*yo then, IIv011o <
where
= el + e3 + e4 + 2
f(h).
Proof. From the proof of Theorem 6.2.2 it is clear that r that it minimizes J ( r given by L~
Y~{c1[r i=0
should be such
i
+ c2(r
2- r
+ c3 maxi
j=O
r
+ e4 ~<~xlC(i)J, -
104
6. A Composite Performance Measure
where L* is such that Ivo(i)l < cl if i >_ L*. Let i be any integer such that i _< L*. Consider perturbation S of S0 given as S(i) = r and S(j) = r f o r j r i. Then, for all e, it can be shown that t
O
t
I~r
max
max I E S o ( j ) l O
j=0
-
< I~l
(6-6/
--
and max ISo(t)l < Ir
max I r
O
~
O
(6.7)
~
assume that
I n d e e d ,
t
N
i. LV', o(J>l ~,-----.
max O
j=0
= i V ' ~ o ( i ) l for some N < L*. 'Z..~-"--'' j=0
For the given e let t
-
M
I E S ( J ) l = , z . _---.l~--"S(i)lf~176 , j=0 j=0
max O
(6.8)
L*.
If M < i then t
t
max I~-~ S ( J ) l O
-
max I ~--] r 0
= tr162
-
-Ir +... + r < o < I~1, and if M > i then t
t
max mE SIJ/I-
O < t < L
9
j=O
max I ~--~ r 0
= Ir
+ . . . + (r
+r
+ c) + . . .
- Ir
< 1r -Ir < I~1.
+... + r
+... + r +...+r
+ I~1
(6.7) can be proved easily. It follows from (6.6) and (6.7) that J(r
- J(r
= c1(1r
+ cl - 1r
+ c2(e ~ + 2er
t
+c3 max I E r 0
~
< cxl,I + c2(, 2 + 2~r
t
max I E r j=0 max I r 0
O
~
+ cal~l + c4lcl-
As r is the unique minimum we have that J ( r it follows that
)
- J(r
evo(i).
> 0 and therefore
6.2 Properties of the Optimal Solution
+ cal~l + c~l~l- evo(i) > o for all
c~l~l + c~(? + 2~r
105
~.
Dividing both sides of the above inequality by le_l we get ~ + ca + c~ +
c21~1
+ 2c~r
-
~-~vo(z) ' " > 0 for all e.
Letting e --~ 0 + and r --+ O- in the above inequality we have
vo(i) < cl + c3 + c4 +
2c~1r
and
-vo(i)
_< C l -t- e3 4- c4
--
2c21r
_< cl + c3 + c4 + 2c21r
respectively. This implies that Iv0(i)l < el + ca + c4 + 2c21r
< cl + ca + c4 + 2c21tr
As this holds for any i < L* we have
Ilvoll~ < cl + ca + c4 + 2c2 f ( r
< cl + c~ + ~4 +
Cl
2c~f(h), Cl
where we have used that 2c211r _< 2 ~ f ( r _< ca + e3 + c4 + 2 ~ f ( h ) , and, f(r < f ( h ) since h is feasible (q = 0) but not necessarily optimal. This proves the lemma.D L e m m a 6.2,3. If y E R n is such that ]]A*y[[~ _< a then there exists a positive integer L* independent of y such that I(A* y)(i)] < Cl for all i >_ L*.
Proof. Define
1 1 1 ...1~
A'L=
Zl
Z2
Z3
.,.
Zn
. . . . . . L .L .L " .L Z l ~2 ~3 . . . . n
J
A*L : R " --+ R L+I . With this definition we have A ~ = A*. Let y E R n be such that I[A*ylloo < ~. Choose any L such that L > (n - 1). As zi,i = 1 , . . . , n are distinct A~ has full column rank. A~, can be regarded as a linear map taking (R '~, ]1.111) -'~ ( RL+I , I].II~). As A~ has full column rank we can define the left inverse of A~,, (A~,) -t which takes (R L+I, [I.1[~) --+ ( Rn, I1.111). Let the induced norm of (m~,) -~ be given by II (A~,) -I I1~,1. Y E R n is such that m _ ~. It follows that, IIA*yll~o -<- ~ and therefore IIA LYlIo~ < ]]ylll _<]1 (A~) -l Iloo,1 IIA*LYlI~ <[I (A~) -t I1~,I a.
(6.9)
Choose L* such that max
k:l,...,n
Iz~l L" II (A~,) -l I1~,1 '~ < Cl.
(6.1o)
There always exists such an L* because [zk[ < 1 for all k = 1 . . . . ,n. Note that L* does not depend on y. For any i > L* we have k~rt
I(A'y)(i)l
l Y~ z~,y(k)l
=
< --
max
kml,...,n
Izkl~llylll
k.~ l
<- -
max
k=l,...,rt
< --
max
I kl' II (A~,) -l I1 ,1 Izkl L" II (A~) -l Iioo,1
kml,...,rt
T h e second inequality follows from 6.9. From 6.10 we have I(A*y)(i)l < Cl if i > L*. This prove s the lemma. [7 We now s u m m a r i z e the main result of this subsection T h e o r e m 6.2.3. The unique solution r of the primal (6.1) is such that r = 0 if i > L* where L* given in Lemma 6.2.3 can be determined a
priori. Proof. Let Y0 be the dual solution to (6.1) and let v0 := A*yo. From L e m m a 6.2.2 we know that ]lv0[l~ < a where a = cl + c3 + e4 + 2 ~ f ( h ) . Applying L e m m a 6.2.3 we conclude that there exists L* (which can be determined a priori) such that Ivo(i)l < el if i > L*. Therefore, r = 0 if i > L*. This proves the theorem. [] The above theorem shows that the Problem (6.1) is a finite dimensional convex minimization problem. Such problems can be solved efficiently using standard numerical methods. At this point we would like to make a few remarks. It should be clear t h a t the uniqueness property of the optimal solution is due the the non-zero coefficient c2. This makes the problem strictly convex. T h e finite impulse response property of the optimal solution is due to the nonzero cl. Also, it should be noted t h a t in the case where c3 a n d / o r c4 are allowed to be zero, all of the previous results apply by setting respectively c3 a n d / o r c4 to zero in the appropriate expressions for the upper bounds.
6.3 An
Example
In this section we illustrate the theory developed in the previous sections with an example taken from [14] and also considered in C h a p t e r 5. Consider the SISO plant,
=
- 5'1
(6.11)
where we are interested in the sensitivity m a p r := ( I - P K ) -1. Using Youla parametrization we get that all achievable transfer functions are given by
6.4 Continuity of the Optimal Solution
107
=(I-Pk)
~)q^ where ~ is a stable m a p . Therefore, h = 1 and u = A - 89 The m a t r i x A and b are given by 1 1 A = (1, 2 , 2 2 , . . . ) ,
b=l.
We consider the case where cl = 1, c~ = 1, c3 = 1 and c4 = 1. Therefore, 1 L* the a = cl + c3 +c4 + 2 ~ f ( h ) ) = 11. For this example n = 1 and zl = 3' a priori bound on the length of the o p t i m a l is chosen to satisfy
max
k-~l,...,n
Izk] L" Jl ( A D -*
" <
(6.12)
where L is any positive integer such t h a t L _~ ( n - 1). We choose L = 0 and therefore AL = 1 and I[ (A'L)-' [Io~,1= 1. We choose L* = 4 which satisfies (6.12). Therefore, the o p t i m a l solution r satisfies r = 0 if i > 4. The problem reduces to the following finite dimensional convex optimization problem: 3
v = Amicn=l{ E ( l r
+ 0
+ (r
k----0
max Ir
0
: r e R4},
where AL. ---- (1, ~, 1 ~, 1 ~). We obtain (using Matlab O p t i m i z a t i o n Toolbox) the o p t i m a l solution r to be: r
= 0.9 + 0.2A.
6.4 Continuity
of the
Optimal
Solution
In this section we show that the o p t i m a l is continuous with respect to changes in the p a r a m e t e r s cl, c2, ca and c4. First, we prove the following lemma: L e m m a 6.4.1. Let {fk } be a sequence of functions which map R m to R. If fk converges uniformly to a function f on a set S C R m then lira m i n f k ( x ) =
k-+c~
~i~l f ( x ) ,
provided that the minima exist. Proof. Let me m f ( x ) = f(xo) for some x0 E S. Given e > 0 we know from convergence of the sequence {fk} to f t h a t there exists an integer K such that if k > K then [fk(x0) - f(x0)] < e, ::~ fk(x0) < e + f(x0),
108
6. A Composite Performance Measure ::~ min f k ( x ) < e + f ( x o ) xES
)
::~ lim m i n f k ( z ) < e + f(xo). k-..+oo xE S
As e is a r b i t r a r y we have lim m i n f k ( x ) < f ( x o ) . Now we prove the other k--+ oo ~ E S
inequality. Given e > 0 we know t h a t there exists an integer K such t h a t if k > K then [ A ( x ) - f ( x ) l < e for any x E S
=~ fk(x) > f ( x ) -- e 3> f(xo) -- e for any x E S
=~ min fk(z) > f(z0) -- c ~ES
lim minfk(z) k-+oo xE S
=r
> f ( x o ) -- ~.
As c is a r b i t r a r y we have k--roolim~ l ~ ]'k (X) _> f(Xo). This proves the l e m m a
[]
T h e o r e m 6.4.1. Let c~ 9 [al, bl], ck2 9 [a2, b2], Ck 9 [a3, b3] and ck4 9 [a4, b4] where al > O, a2 > O, aa > O, a4 > O. Let Ck be the unique solution to the problem "k : = min exkllr Ar
and let r
k + c211r
~ + e~llr
wxlloo + c4k11r
(6.13)
be the solution to the problem
v := rain c111r Ar
+ c21]r
+ c31tr
w l l l ~ + e411r
(6.14)
I/c~ ~ el, c~ ~ c~, c~ -~ c~ . n d e~ ~ e. t h e . Ck ~ r Proof. We prove this t h e o r e m in three parts; first we show t h a t we can restrict the proof to a finite dimensional space, second we show t h a t Uk --~ ~ and finally we show t h a t Ck "~ r Let Yk represent the dual solution of (6.13) and let Vk : : A*yk. Let f k ( r : = Ckll[qJl11"~-Ck2[l~)l122"4:-ck3[lCk * Wl[lc~-lt-C4k][r and f ( r := c,11r + c211r ~ + callr * w, ll~ + c~llr Let Ok the u p p e r bound on Ilvkll~ be as given by L e m m a 6.2.2. Therefore, ck
ak = c k + eka + c k + 2 ~ A ( h )
<_ bl + b3 "q- b4 -F 2b2(llhll~ + ~llhll22 + ~ l l h * wl[l~ + ~ l l h l l ~ ) . Let this bound be denoted by d. Choose L* such t h a t
max Iz~l L" II (m~) -~ I1~,~ d < a~.
i=l,...,n
where L is such t h a t L > (n - 1). Therefore, it follows t h a t
max I:il L' II (A'L) - t I1~,1 c~k < c~.
i=l,...,n
6.5
Summary
109
for all k. From arguments similar to that of L e m m a 6.2.3 and Theorem 6.2.3 it follows that ek(i) = 0 if i > L* for all k. Therefore we can assume that ek
9 R L" 9
Now, we prove that uk -+ u. Let r ul := min bll]r Ar
+ b2]]r
be the solution of the problem
+ ba]]r * Wl][oo -t- b4]]r
As Clk _< bl, c~ < be, ca~ _< ba and c~ _< b4 we have that uk < ul for all k. Therefore, for any k we have c~]]r ]]r I~ ~+ c 3 1k1 r * w~ll~+c~llCkll~ _< vl which implies I1r x <- ~c~ <- ~~ ' lick * w~ll~ <- ~c ~ - < -< ~ < - ~ a , , I1r and IlCkll~ <_ ~ . Let S : = {r 9 R L* : Ar = b, IIr "-~x~ _ ~ } . Then it is clear that
Uk := min clk11r r
+ c~11r
<_ ~o,, I1r
~ _<
~ + c~11r * w~lloo + ckllr
We now prove that fk converges to f uniformly on S. Given e > 0 choose K such that if k > It" then ] c ~ - c l ] < ~4 v t ' 1c2 k - c ~ ] < ~4~'1 ~ Ick - c a l < ~4v't and ]c4k - c 4 1 < ~4 u l " Then for a n y r 1 4 9 1 6 2 1 6 2 ](Ckl-Cl)llr (c~ - c~)llr
+ (c~3 - ca)lie * wl II~ + (c~ - c4)llr +
+
and thus §
Therefore, it follows that fk converges uniformly to f on S. From L e m m a 6.4.1 it follows that uk --+ u. We now prove that ek ~ r Let B :--- {r 9 R L" : I1r ___ o~ } then we know that ek 9 B which is compact in (R L*, I[.llt). Therefore there exists a subsequence ek, of r and r 9 R L" such that ek, --~ r As c) ~ cl, c~ --+ c2, ck3 --+ c3, ck4 ~ c4, and r --+ r we have that f~,(r -+ f(-r As v~ converges to v it follows t h a t f a , ( r --+ f ( r (note that va. = fk,(r and v = f(r and therefore f ( r = f(r As Aek, = b for all i we have that Ar = b. From uniqueness of the solution of (6.14) it follows that r = r Therefore we have established that Ca, --+ r From uniqueness of the solution of (6.14) it also follows that ek --+ r This prov('s the theorem. []
6.5 Smnmary In this chapter we considered a mixed objective problem of minimizing a given linear combination of the gl norm, the square of the 7t2 norm, and the go~ norms of the step and pulse responses respectively of the closed loop. Employing the Khun-Tucker-Lagrange duality theorem it was shown that this problem is equivalent to a finite dimensional convex optimization problem with an a priori known dimension. The solution is unique and represents a
110
6. A Composite Performance Measure
Pareto optimal point with respect to the individual measures involved. It was also shown that the optimal solution is continuous with respect to changes in the coefficients of the composite measure.
7. M I M O
Design: The Square Case
This chapter and the next chapter explore the interplay of the ~1 and the 7/2 norms of the closed loop for the multi-input multi-output (MIMO) case which is much richer in its complexity and its applicability than the SISO case, discussed in previous chapters. Consider for example Figure 5.1 where the part of the regulated output given by z2 is used to reflect the performance with respect to a unit pulse input and it is also required that the maximum magnitude of zl due to a worst unity magnitude bounded input stays below a prespecified level. This objective can be captured by the following problem: min
K stabilizing
{ll z wll2 : I[ ,wlll
(7.1)
where 3' is the level over which the infinity norm of z2 is not allowed to cross for the worst bounded disturbance. Or, it may be that the disturbance w is such that a part of it, wl is a white noise while another part, w2 is unity magnitude bounded. A relevant objective is the minimization of the effect of these disturbances on the regulated output. The problem, man
K stabilizing
{ll~z~,ll2 : I 1 ~ 1 1 1 <_ 7},
(7.2)
where 7 is the level over which the infinity norm of z is not allowed to cross for the worst unity magnitude bounded disturbance is then a problem of interest. Both problems mentioned previously fall under a general framework of a problem which we call the mixed problem. This problem is addressed and solved in this chapter along with a related problem which minimizes a combination of the various input-output maps of the closed loop which we call the combination problem. The latter is of interest by itself and in relation to the mixed problem. The treatment in the chapter is restricted to the square case also known as the 1-block case (where the number of regulated variables is equal to the number of control inputs and the number of measured outputs is equal to the number of exogenous inputs). The MIMO problem in the mixed 7/2 and t?l setting poses many questions which are not addressed in the SISO setting discussed in previous chapters. It is shown that MIMO problems need to be handled differently in a significant way. The optimal solutions for the 1-block are not in general finite impulse responses nor are they unique (as will be shown) unlike the SISO
112
7. MIMO Design: The Square Case
case. However, it is established that the solution can be obtained via finite dimensional quadratic optimization and linear programming. We show that it is possible to obtain an a priori bound on the dimension of the suboptimal problems even for the MIMO version of the problem addressed in [14] (no a priori bound is given in [14]). The chapter is organized as follows. In section 7.1 we give preliminaries. In section 7.2 we show that the combination problem can be solved exactly via a finite dimensional quadratic optimization and linear programming for the square case. In section 7.3 we study the mixed problem and its relation to the combination problem. In section 7.4 we give an example to illustrate the theory developed. In section 7.5 conclusions are given. Section 7.6 is the appendix which contains the proofs of some of the facts stated in earlier sections.
7.1 Preliminaries In this section we state theorems, assumptions and give notation which will be relevant to the rest of the chapter. A good reference for this section is [3]. We denote by n~, n~o, nz and ny the number of control inputs, exogenous inputs, regulated outputs and measured outputs respectively of the plant G. We represent by 69, the set of closed loop maps of the plant G which are achievable through stabilizing controllers. H E glf ~ X ~ , U E gln z X~.u and V E gl~ X n w characterize the Youla parametrization of the plant. The following theorem follows from the Youla parametrization. T h e o r e m 7.1.1. n u XTIY
O = {4 E ~ " x n , : there exists a Q E el
with r
~
/:/
--
0Oil'}.
If r is in O we say that ~ is an achievable closed loop map. We assume throughout the chapter that U has normal rank nu and (/ has normal rank ny. There is no loss of generality in making this assumption [3]. Let the Smith-McMillan decomposition of U and "v" be given by = Lv
tuku,
f" = Lv. Iv R v , nyX~
respectively where Lu E ~ , • Ru E ~x,~., Lv E gl and Rv E ~ x , ~ , ~ are unimodular matrices. Let Auv denote the set of zeros of/)" and ~ ' i n T ) (i.e. the zeros of IIi=xe, nu ^. IIj=le -~ ^'j which lie inside the closed unit disc). For A0 E A v v define, au.(A0) := multiplicity of A0 as a root of ~i(A) for i = 1 . . . . , n~, o'yj(A0) := multiplicity of A0 as a root of g'j(A)for j - 1 . . . . ,ny.
7.1 Preliminaries
113
As Lu, /~v are unimodular we can define the following polynomial row and column vectors:
Oti(~ ) = (Lu1)i(,~) for i = ~j(.,~) ---(kv1)J(,~) f o r j =
1 , . . . , nz, 1,...,n~,
where (M)i denotes the i th row of the matrix M and ( M ) J denotes the j t h and /3j E g?u,xl We assume column of a matrix M. Note that ai E plxn. ~1 that U and (7 have no zeros which lie on the unit circle. This is a standard assumption in the optimal model-matching approach we employ and is crucial for the 1-block development. For a detailed discussion about the case where this assumption fails see [3]. We now present the main interpolation theorem for a closed loop map to be achievable. We denote the k th derivative of (.) with respect to ,k by (.)(k) whereas the k th power of (.) is denoted by (.)k. T h e o r e m 7.1.2. [3] q5 E g~* • tions hold f o r all ~o E A u v
is in 0 if and only if the following condi-
i= 1,...,nu i)(dti~flj)(k)(Ao) -= (ai[-Iflj)(k)(~o) f o r j = 1 , . . . , ny k = O,...,ag,(.~o) + J" (&ir = (&i/2I)()~) f o r i = n,, + 1 , . . . , nz ii) (r (/://)j)()~) f o r j = nu + 1 , . . . , n ~ "
o'vj(.Xo)
-
1
The first set of conditions constitutes the zero interpolation conditions whereas the second set consists of the rank interpolation conditions. The plant G is called square, or equivalently, we have a 1-block problem, if the rank interpolation conditions are absent (i.e., when nu = nz and ny = nw ). Otherwise, the plant is non-square plant, or equivalently, we have a 4-block problem. Define F i j ~ ~ E ~ xn~ by oo
oa
l=0
t----0
:=
s)
It can be easily verified that for any ~/' E t ~x X (&ir
(7.3t ~
= < 4, F Okx~ > .
(7.4)
This is shown in the Appendix. F ijkx~ characterize the zero interpolation conditions. We state the following lemma which will be of use in the next section. L e m m a 7.1.1. For all )~o G A u v : F'JkXo e e ~ ' , •
(andhence
eg; "xn') for
i = 11 ,, .. .. .. ,, nn u u = 0,...,crv.(,k0) +o'v,(,k0) --l.
Furthermore, given any e > 0 we can always choose a To > T such that
f o r .ll s > To
114
7. M I M O Design: T h e S q u a r e Case
Proof. See Appendix. This means that the zero interpolation conditions can be characterized via elements in gl~ , . Similarly, the rank interpolation conditions can be characterized via elements Ga,qt and G~jpt in g~' x,~ (see Appendix) as the following theorem states. In the following theorem Gaiqt and Go,pt (elements in r215 are defined in the Appendix. Xtl
w
T h e o r e m 7.1.3. [3] Let
R F ijtr176:= Real(F ijk~'~ and I F ijk)'~ = Irnaginary(FiJk~o). Suppose Aug Cint(~D) then q~ E g~xn~ is in 0 if and only if the following conditions hold: A0 E A u v < ~, R F ijk'x~ > = < H, R F ijk'x~ > i= 1,...,nu j = 1 . . . . , ny < or2,I F ijk)'~ > = < H, I F ijtr176> for
{
k = 0,...,
o'u,(Ao) + o'v,(),o) - 1
and
f i=n,,+l,.
.,n~ j = nu + 1,...,n~ < ~, Gaiqt > = < H, Ga,qt > for J q = l , . . . , n w = < H,G~jpt > p= 1,...,nz t =0,1,2,... Furthermore, F ijkx~ Ga,qt and Gzjpt are matrix sequences in g~~x'~'~
I
Proof. Follows easily from Theorem 7.1.2, equation (7.4) and the fact that H and R are real matrix sequences. The fact about sequences in ~ , x , ~ is shown in the Appendix. [] We assume without loss of generality that Y ijk)'~ is a real sequence. Further, we define I,l u
ny
H,F,5 o> and XofiAuv i=1
j=l
Cz is the total number of zero interpolation conditions. The following problem /]0,1 = ,i~ Ai~fv~blr {llr
(7.5)
is the standard multiple input multiple output ~ ' x n ~ problem. In [5] it is shown that this problem for a square plant has a solution, possibly nonunique but the solution is a finite impulse response m a t r i x sequence. Let #0,2 := ~ A{l14511~}'~mev inf ~ble --
(7.6)
which is the standard 7t2 problem. The solution to this problem is unique and is an infinite impulse response sequence. We now collect all the assumptions made (which will be assumed throughout this chapter) for easy reference. A s s u m p t i o n 2 ~r has normal rank n~, and ~/ has normal rank ny.
7.2 The Combination Problem
115
A s s u m p t i o n 3 U and V have no zeros which lie on the unit circle, that is A u v C int(:D). Assumption
7.2 The
4
F ijk;~~
is a 7~ealsequence.
Combination
Problem
In this section we state and solve the combination problem. We first make the problem statement precise. Next we show the existence of an o p t i m a l solution. We then solve the problem for the square case. Finally, we study the nonsquare case. Let Nw := { 1 , . . . , n ~ } and let Nz := { 1 , . . . , n z ) . Let M, Y and M Y be subsets of N~ • N~ such that the intersection between any two of these sets is e m p t y and their union is N~ • Nw. Let ~pq and Cpq be given positive constants for (p,q) E M N U M and for (p,q) E M N U N respectively. The problem of interest is the following: Given a plant G solve the following optimization
problem:
=
inf
Achievable
~
~11r
(p,q) e M N u M
2+
~
cpqll~'pqtl~. (7.7)
(p,q) e M N u N
Note that for all (p, q) E M only the 712 norm ofqSpq appears in the objective, for all (p, q) E N only the gl norm of Cpq appears in the objective and for all (p, q) E M N a combination of the 7/~ and the gl norm of q~pq appears in the objective. For notational convenience we define f : s • --4 R by f(r
::
Z
-
(p,q)E M N u M
: (p,q)E M N u N
which is the objective functional being minimized. As it can be seen the objective functional of the combination problem constitutes a weighted sum of the square of the 7/2 norm and the t?l norm of individual elements qSpq of the closed loop m a p r Note that with this type of functional the overall 712 norm of the closed loop as well as s norms of individual rows can be incorporated as special cases. For technical reasons explained in the sequel we define the space .4 := {~b e Lcr~2"•
: ~pq E gl for all (p, q) 9 M Y O N } .
The following set is an extension of O O~ := {~ 9 ,4 : 4~ satisfies the zero and the rank interpolation conditions). Note that 69 is the set {45 9 g~" • 9 r satisfies the zero and the rank interpolation conditions). Also, note that when M is e m p t y then O : O~. Finally, we define the following optimization problem v~ := inf f(45). CEO~
Now, we show that a solution to (7.8) always exists.
(7.8)
116
7. MIMO Design: The Square Case
There exists ~o E O~ such that
L e m m a 7.2.1.
~-~
/]e
--
0
e; llr176
2
(p,q)E M N u M
(p,q)E M N u N
Therefore, the infimum in (7.8) is a minimum. Proof. See Appendix. 7.2.1 S q u a r e C a s e Here, we solve the combination problem for the square case. T h r o u g h o u t this subsection the following assumption holds:
5 nu = n~ and ny = nw.
Assumption
In the sequel y E R c" is indexed by ijkAo where i, j, k, A0 vary as in the zero interpolation conditions. The following l e m m a gives the dual problem for the square case. L e m m a 7.2.2.
ue = m a x { r
where r
y E R e'},
(7.9)
inf L(45) and
'~EA
L(r
qiI%II2 + (p,q)e ( M1V )u M
epqii%itl+
(p,q)E ( ~t N )u N -~ ~ . . . . (hijkAo
v,:~0w
""
- < F'3J')'~ ~ > ) '
i,j,k,Ao
Proof. We will apply Theorem 3.3.2 (Kuhn-Tucker-Lagrange duality theorem) to get the result. Let X, s Y,Z in T h e o r e m 3.3.2 correspond to A,.A, R e ' , R respectively. Let 7 = ue + 1 and let g : .,4 -+ R be given by g((P) := f ( ~ ) - 7. Let t I : , A --~ R e` be given by tIijkXo(r
: = biJ k A o _ < FiJk)'~
> ,
We index the equality constraints of [_[/by ijkAo where i, j, k, A0 vary as in the zero interpolation conditions. In [3] it is shown that the m a p H__ is onto R c" (it is shown that the zero interpolation conditions are independent). This means that 0 E i n t ( R a n g e ( H ) ) . From L e m m a 7.2.l wc know that there exists a (/il E .A such that /_/(~1) = 0 (that is (P~ satisfies the zero interpolation conditions) and f((P~) = ue which implies that g ( ~ l ) = - 1 < 0. Thus all the conditions of T h e o r e m 3.3.2 (Kuhn-Tucker-Lagrange duality theorem) are saitisfied. The l e m m a follows by applying T h e o r e m 3.3.2 to (7.8). [] For notational convenience we define the functionals Zpq E ~1 by
Zpq(t) :=
~--~ ffijkAorpq " ~ijk,ko (t). i,j,k,Ao
In what follows we show that the dual problem is in fact a finite dimensional convex p r o g r a m m i n g problem. A bound on its dimension is also furnished.
7.2 T h e Combination P r o b l e m
117
T h e o r e m 7.2.1. It is true that v = v~. v~ can be obtained by solving the following problem: O0
max{ E E - ~ P q C P q (t)2 + E E--6Pq~bPq (t)2+ E yijk~o~-~,jk~o, (p,q)EM t=0 (p,q)EMN t=0 i,j,k,Ao subject to Y 9 RC',r 9 gl for all (p,q) 9 M N U M, --Cpq 5 Zpq(t) < Cpq i f (p, q) 9 N ] 2"~pqOpq(t) = Zpq(t) -epq i f (p,q) 9 M N and Zpq(t) > epq, = Zpq(t) + epq i f (p, q) 9 M N and Zpq(t) < -epq, (I) 0 i f (p, q) 9 M N and IZpq(t)] < Cpq, = Zpq(t) i f (p, q) 9 M, for all t = 0, 1 , 2 , . . . . Furthermore, it holds that the infimum in (7. 7) is a minimum, and, ~o is a solution of (7.8) if and only if it is a solution of (7. 7). In addition, r is unique for all (p, q) 9 ( M N ) tO M.
Proof. In Lemma 7.2.1 we showed that an optimal solution ~b~ always exists for problem (7.8). From Theorem 3.3.2 (Kuhn-Tucker-Lagrange duality theorem) we know that if y 9 R ~" is optimal for the dual problem then r minimizes L(~) where L(~) =
~pqll~qllfl +
E
E
Cpqll~Pq}[1
(p,q)E( M N)tJM (p,q)E (M N)v N -[- E YiJk)~~176 <~ FiJk)~~162> ) " i,j,k,Ao
Thus, r176 minimizes
(~v~q(t) ~ - zpq(t)r
+
-z,q(t)r
~
(e~r
~ + c~l~(t)l
(p,q)EMN
(p,q)EM
+
~_, ( c ~ l r
z,~(t)~(t)) +
(p,q)eN
Therefore, if (p, q) 9 M then ~~
E
YijkAo bijkA~ i,j,k ,)~o
minimizes
~pq~p~(t) 2 - z~(t)%~(t), which is strictly convex in ~pq(t) and therefore ~~ is unique. Differentiating the above function with respect to ~pq(t) and equating tile result to zero we conclude that if (p, q) E M then
2~pq%~
= zpq(t).
As Zpq E gl we have that for all (p, q) E M, ~Oq E gl. If (p, q) E M N then ~~ minimizes
~pqCpq(t) 2 + Cpql,~pq(t)l- zp~(t)r which is strictly convex and therefore the minimizer is unique. This also implies that if (p, q) E M N then ~b~ satisfies conditions stipulated in (I).
118
7. MIMO Design: The Square Case Suppose, (p, q) G N then #~
minimizes
7(epq(t)) := {cpqlepq(t)t- zpq(t)ep~(t)}. Note that if Zpq(t)q~pq(t) < 0 then -f(r the optimal minimizes
> O. But, 7(0) = 0. Therefore,
ep~ (t)(e~sgn(Z~ (t)) - Z~ (t)), over all ~pq(t) 6 R such that Zpq(t)~)pq(t) >_ O. Now, if IZpq(t)l > Cpq then given any K > 0 we can choose r 6 R that satisfies Zpq(t)~pq(t) > 0 and "-f(~pq(t)) < -A" and thus the infimum value would be - o o . Therefore, we can restrict Zrq(t ) in the maximization of the dual to satisfy IZpq(t)l < Cpq. If, IZpq(t)l < cpq then -f(r >_ 0 for any qbpq(t) E R t h a t satisfies Zpq(t)~pq(t) ~> 0 and is equal to zero only for r = 0. Therefore, we conclude that if (p, q) 6 N then we can restrict Zpq(t) in the m a x i m i z a t i o n of the dual to satisfy IZpq(t)l _ Cpq and that the o p t i m a l r176 minimizes -f(g'pq(t)) with a m i n i m u m value of zero. It also follows that if IZpq(t)l < cpq then r E R that minimizes f(~pq(t)) is equal to zero. The expression for ue follows by substituting the value of r obtained in the above discussion for various indices in the functional L(r Note that O = Oe f'l t?7"Il z X t ' l . w . But in the previous steps we have shown by construction that the optimal solution to (7.8), ~b~ is such t h a t Op0q 6 el for all (p, q) 6 M. This means that 4' ~ E O. From the above discussion the theorem follows easily. [] Note that the above theorem demonstrates that the problem at hand is finite dimensional. Indeed, at an optimal point y0, r the constraint IZp~ I _ Cpq will be satisfied for sufficiently large t since Z~q~ E el. Thus, ~pq(t)O __-- 0 for (p,q) 6 M N U N and large t i.e., ~p0q is F I R for (p,q) E M N U N. T h e following lemmas provide a way to compute a priori bounds on the dimension of the problem. 7.2.3. Let q~o be a solution to the primal problem (7.7) and let yO, ZOq be solutions to the dual. Then the following is true: -Cpq < Z~ <_ cpq i f (p,q) E N, 4)~ = 0 i f (p,q) 6 N and Iz~ < Cpq, 2 ~ , e ~ ( t ) = z~q(t) - e~q if (p,q) e M N a~d Zg~(t) > c~, Z~ + Cpq i f (p, q) e M N and Z~ < -Cpq, 0 i f (p,q) 6 M N and IZ~ffq(t)l_< cpq, = Z~ i f (p, q) E M. 4'~ is unique for all (p, q) in ( M N ) U M. Also, there exists an a priori bound a such that IIZ~ _< ~ for all (p,q) E N~ x N,o. Lemma
Proof. The first part of the l e m m a follows from the arguments used in Theorem 7.2.1. We now determine an a priori bound. For all (p, q) E M N the following is true:
7.2 The Combination Problem
1~9
< Cpq + ~ y ( H ) where the last inequality follows since H is a feasible solution and hence cvqll~Oqllx <_ f(~o) < f(H). For all (p,q) E N the following is true:
[Z~
<_ Cpq.
For all (p, q) E M
IZ~
<
2~vql~~
<
Pq V
_
_< 2Cvq[[r176112 -Epq < _
Pq V
cl,q "
Denote by dpq the upper bounds determined above, all of which can be determined a priori. Let a :=
dpq.
max
(p,q)EN, •
Thus a is an a priori upperbound on [tZ~ proves the lemma. Lemma
for all (p,q) E Nz x Nw. This []
7.2.4. Let
Cmin
:~
min
(p,q)EMNuN
Cpq.
If y E R c" is such that for all (p,q) E Nz x Nw IIZpqll~ < ~,
where Zpq(t) := E
u~j~oFLijk~o (t),
ijk)~o
then there exists an a priori computable positive integer L* such that Izpq(t)l < em~. f o r alt t > L'.
Proof. For notational convenience we index Fpq ijkx~ and bijk~~ where ijkAo vary as in the zero interpolation conditions by Fvq and bn respectively where n = 1 , . . . , cz. T h e vector in R r whose n th element is given by b'* is denoted by b. We interpret Fpq as a cx~ x 1 column vector equal to -
(F;~ (0), F ; ~ ( 1 ) , . . . ) ' . With this notation = ( F;1, FL, . . . ,
)y,
where Zpq is viewed as a infinite column vector with the t th element equal to Zpq(t). Therefore the condition
120
7. MIMO Design: The Square Case
IIZpqlloo < a for all (p,q) E N, x iV,,,, is equvalent to the condition
IIA'YlI~
< ,~,
where
F!~.
F~2
..
F:~
:
:
:
A !
F~I. : ~k
F;; :
:
E rl,l z fl, ~o
T h e m a t r i x A := (A') ~ is the m a t r i x which has c~ rows each for one zero interpolation condition. If 9 E ~ " • is stringed out into a vector as below: r
~11
'
then A ~ = b gives the zero interpolation conditions. It is known t h a t the zero interpolation conditions are independent and therefore A has full row rank 9 Equivalently, A' has full c o l u m n rank. Choose L > c~ such t h a t D E R e` • with rows from the first L rows of A' is invertible. Consider D as a m a p f r o m (R c', ].]1) to (R ~', [.l~). Now, as y = O-1Dy we have [Y]I = [D-1Dy]I <_ [D-l[oo,l[Dy[oo <_ ]D-l[oo,la,
(7.10)
where ID-11oo,1 is the induced n o r m of D -1. T h e last inequality follows because IIA'YlI~ < ~ which implies IDytoo < 4. From the note after L e m m a 7.1.1 we know t h a t for any n and for any (p, q) E N~ x N,~ there exists an integer Tpq, such t h a t t > Tvq, implies t h a t
IF~(t)l <
ernin
Let L" = m a x T p q n . pqn
Note t h a t L* is d e t e r m i n e d a priori9 Let t > L* then
7.2 The Combination Problem c~
121
ct
IZpq(t)l = I~--~ Y~q(t)y,~l _< ~ n=l
c~
Cm~n -< ID-'I~,,=
E
lY.llY~q(t)l
n=l
ly,,I -<- cm~,~.
The last inequality follows from equation (7.10). This proves the lemma. We now state the main result of this subsection.
[]
T h e o r e m 7.2.2. It is true that u equals max{
E (p,q)EM t = 0
Zvq (t)~+
E E - ~ P q q ) P q (t)2+ E YiJk~obiJk~~ (p,q)EMN t = 0 i,j,k,Ao
subject to y E Rc',qSpq E R L• for all (p,q) E MN, --epq < Zpq(t) <_ epq if (p,q) E N ]. (Ill) --evq <_ 25pqr -- Zpq(t) <_ Cvq if (p, q) E M N S for all t = 0, 1, 2 , . . . , L * where L" is determined a priori as given in Lemrna 7.2.4. Furthermore, the optimal of the primal (7. 7) ~Oq is unique for all (p, q) E (MN) U M. Proof. An easy conclusion of Lemma 7.2.3 and Lemma 7.2.4 is that v is equal to c~
maxr E E-4 1q Zvq(t)2+
L~
E
E --dvq~pq(t)2+ E
YiJkx~176
(p,q)6m t = O (p,q)~MN t - O i,j,k,Ao subject to y E R c~,q~vq E R r" for all (p,q) E MN, -evq <_ Zvq(t ) <_ Cvq if (p, q) E N "~ 2-dvq~vq(t ) = Zvq(t ) - cvq if (p,q) E M N and Zvq(t ) > Cvq' i (11) = Zpq(t) + Cpq if (p,q) E M N and Zpq(t) < -Cpq, = 0 if (p,q) E M N and [Zpq(t)[ < Cvq, for all t = 0, 1 , 2 , . . . , L*. L* is determined a priori as indicated in L e m m a 7.2.4. Indeed, from L e m m a 7.2.3 we know that we can restrict the maximization in Theorem 7.2.1 over the set y E R c' such that [[Zvq[[c~ < ol. Now we conclude from L e m m a 7.2.4 that there exists an integer L* such that t > L* implies that IZpq(t)l < cmin < Cpq. One of the conditions stipulated in (I) of Theorem 7.2.1 is that
~bpq(t) = 0 if (p,q) E M N and [Zpq(t)[ < Cpq. Note that the condition 2~vq(t ) = Zpq(t) for (p, q) E M ],as been incorporated into the objective functional of this theorem. The fact that ~p0q is unique for all (p, q) E (MN) U M is due to Theorem 7.2.1. To bring the problem into tile form stated in the theorem, denote the right hand side of the equation in the statement of the theorem by P. Suppose,
122
7. MIMO Design: The Square Case
y E R c', Zvq(t) (determined by y), and q~pq(t) satisfy condition (II) above for all the appropriate indices. If (p, q) E M N and Zpq(t) > %q then --Cpq : 2-CpqCpq(~) -- Zpq(t) < Cpq,
because Cpq > O. Similarly it can be checked that all the other conditions of (II1) are satisfied. This implies that P > u. Suppose, y E R c', Zpq(t) (determnined by y) and 4~pq(t) satisfy condition (Ili) of Theorem 7.2.2. Let q)pq(t) be defined as follows: 2-dpqqSpq(t) = Z p q ( t ) - epq if (p,q) C M N and Zpq(t) > Cpq, 2"dpqOpq(t) = Zpq(t) + Cpq if (p, q)E M N and Zvq(t ) < -%q, Opq(t) = 0 if (p,q) E i N and IZpq(t)[ < %q, for all 0 < t < L* (i.e ~vq(t) satisfies constraints (II)). Suppose, (p, q) E M N and Zpq(t) > Cpq then 0 ~__ 2"Cpq~pq(t) = Zpq(t) - Cpq ~__ 2-dpq~vq(t ).
Therefore, --2
2
--~pq (4) >__--~pq (t).
Similarly, the above condition follows for other indices. Thus, given variables satisfying (III) we have constructed variables satisfying (II) which achieve a greater objective value. This proves that ~ _< u. This proves the theorem [] Thus, we have shown that the problem (7.7) for a square plant is equivalent to the finite dimensional quadratic programming problem of Theorem 7.2.2 with the dimension known a priori. Such types of programming are well studied in the literature and efficient numerical methods are available (e.g., [22]). We should point, out that the sum ~-'~4~o 4~vq 1 Zpq(t)2 appearing in the quadratic program above is a quadratic function of yijk),o with coefficients of the form < pijk)~o >, which can be readily computed pq ~-prst-~o pq The solution procedure consists of solving the quadratic program of Theo0 and r176 with t = 0, 9 9", L* rem 7.2.2 to obtain the optimal variables Yijkxo for all (p,q) E M N . The latter set completely determines (pOq for all (p, q) E M N . From Yijk~o~ the optimal ~p0q for (p, q) E m can be computed as CpOq 1 0 = 2~pq Z~q (see Lemma 7.2.3). The quadratic program of Theorem 7.2.2 0 for (p, q) E N. Nontheless, does not yield immediately any information on ~p~ 0 for (p, q) E N can be easily obtained o n c e q)pq u for (p, q) E M N U M are ~pq found through the following (finite dimensional) optimization:
minimize
E Cpqll~Pqlll (V,q)eN
subject to
45pq E R L" E < FiJkX~ (p,q)E N
> = biJkX~ --
E (p,q)E ( M N )u M
< FiJk~~
0 >
7.3 The Mixed Problem
123
This problem can be readily solved via linear programming [3]. From the developments above it follows that the structure of an optimal solution ~p0 to the primal problem (7.7) has in general an infinite iiupulse response (IIR). The parts of ~0 however that are contained in the cost via 0 ,s with (p,q) in M N U N, are always FIR. their/71 norm i.e., the ~pq Finally, it should be noted that the optimal solution has certain properties related to the notion of Pareto optimality (e.g., [22]). In particular, from the uniqueness properties of r it is clear that there is no other feasible cp such that 1]4ipqlI2 < ]l~p~ for some (p,q) E M N U M while tl~ppqlll < IIr176 or, conversely, there is no 4) such that Ilgipqlll < IIr176 for some (p, q) 6 M N U M while ll4ipqll2 < [Iq~p~
7.3 The
Mixed
Problem
In this section we make the statement for the mixed problem precise. We solve the mixed problem via a related problem called the approximate problem. For both the mixed and the approximate problems the following notation is relevant: Let N~ := { 1 , . . . , n w } and let Nz := { 1 , . . . , n z } . Let S be a given subset of Nz. S corresponds to those rows of the closed loop which have some part constrained in the/71 norm. We denote the cardinality of S by c,~. Let Np for p 6 S be a subset of N,o. Np characterizes the part of the pth row of the closed loop that is constrained in the el norm. The (positive) scalars 7p for p 6 S represent the/?1 constraint level on the pth r o w . It is assumed that 7p > tJ0.1. Finally, 7 6 R c~ is a vector which has 7p for p 6 S as its elements. We define a set F, C ~ " x,~ of feasible solutions as follows: q~ 6 ~ " • is in F-~ if and only if it satisfies the following conditions: a)
~
II~pqlll < ~p for all p E S,
qENp
b) 4i 6 0
(i.e 9 is an achievable closed loop map).
r is said to be feasible if r 6 _r'~. Let M_M_be a given subset of N~ • N~o. The problem statements for the mixed and the approximate problems are now presented. Given a plant G the mixed problem is the following optimization: p-~ := inf { Z @6F.~
IlCpqll~}.
(7.11)
(p,q)EM
Given a plant G the approximate problem of order ~ is the following optimization: p76 := ver,inf(
~ (p,q)6M
II~pqll2~+~--~ ~ II~pqll~}.
(7.12)
p68 q 6 N n
W e will further assume that for all (p,q) 6 Nz • Nw the component qSpq appears in the ~i constraint or in the objective function or in both. Note that
124
7. MIMO Design: The Square Case
M__M_is the set of transfer function pairs whose two norms have to be minimized in the problem. The problem is set lip so that one can include the constraint of a complete row in the closed loop m a p (P or part of a row. This way we can easily incorporate constraints of the form I](pI]l < 1 which is cquivalent to each row having one norrn less than 1. Also, the 7/, norm of (P can be included in the cost as a special case. We also define the following sets which help in isolating various cases in the dual formulation: N := Ui6s(i, Ni), which is set of indices (i, j) such that cPij occur in the/~1 constraint,
M N := M A N , which is the set of indices (i, j) such that 4'ij occurs in the g~ constraint and its two norm appears in the objective, M := M \ M N , which is the set of indices (i, j) such that two norm of Oij occurs in the objectivc but it does not appear in tile t?l constraint and N := N \ M N , which is the set of indices (i, j) such that ~ i j o c c u r s ill the gl constraint but its two norm does not appear in the objective. With this notation we have, M = ( M N ) U M and N = ( M N ) U N. We assumc that M N U M U N equals N~ x N~. This implies that for all (p, q) 6 Nz • N~ r appears in the i71 constraint or in the objective function or in both. We define Sm : g ? ' x ~ __4 R and ]a~ : g ? ' x " ~ --4 R by
Sm(r := E
IlCpqllg=
(p,q)6-M"
~
Ilepqll ,
(p,q)E(M N ) u M
and (p,q)6 M (p ,q)6 M Nu M
p6S qeNp (p,q)@ M Nu N
which are the objective functions of the mixed and the a p p r o x i m a t e problems respectively. We make the following assumption. Assumption
6 The plant is square i.e., n~ = nu and ny = n~.
We now solve the a p p r o x i m a t e problem and later we give the relation of the mixed problem to the a p p r o x i m a t e problem.
7.3 The Mixed Problem
125
7.3.1 T h e A p p r o x i m a t e P r o b l e m In this subsection wc study the approximate problem of order 5. This problem is very similar to the combination problem. The techniques used in solving the combination problem are often identical to thc ones used in solving the approximate problem. We state many facts without proof. These facts can be easily deduced in ways similar to the ones used in the solution of the combination problem. 'the importance of this problem comes from its connection to the mixed problem. As in the combination problem, we define for notational convenience
Zvq(t) :=
YiJkx~ jkx~
Z i,j,k ,,ko
T h e o r e m 7.3.1. There exists q5~ 9 F~ ,such that 5 (p,q)E M N o M
(p,q)E M N u N
Therefore, the infimum in (7.12) is a minimum. Moreover, the following it is true that p~ equals max
Z (p,q)EM
oo t=0
E
Z
(p,q)EMN
t=0
bi j k A ~
i,j,k,,ko
pES
subject to y E R c ' , $ v q E g l for all (p,q) E M N U M ,
-(5 + ~p) < Izpq(t)l < (5 + yp) if (p, q) 9 N 2Ovq(t ) = Zvq(t ) - (5 + yv) = Z p q ( t ) -t- (5 -t- y p )
=0 = Zpq(t)
i f (p,q) 9 M N , Zvq(t ) > (5 + yv), i f (p, q) 9 M N , Zpq(t) < - ( 5 + Yv)' i f (p, q) 9 MN, IZpq(t)l < (5 + ~p),
] (IV)
if (p, q) 9 M, for all t = 0 , 1 , 2 , . . . .
In addition, the optimal ~5~ is unique for all (p, q) E (M N) 0 M. Proof. The proof follows by utilizing results analogous to Lemmas 2 and 3, and similar arguments to Theorem 7.2.1. [] To get an analogous result to Lemma 7.2.3 it is clear that we have to get an a priori bound on the dual variable ~. L e m m a 7.3.1. Let (p0,1 denote a solution of the standard gt problem (7.5).
f~a(q5~ is the objective of the approximate problem evaluated at a solution of the standard ~1 problem. If (~-~, y'~) is the solution to the approximate problem as given in Theorem 7.3.1 then ~p _< fa~(~~ ~'p - - 120,1
for all p E S .
126
7. MIMO Design: The Square Case
Proof. Take any c 6 R such that uo,1 < c < minTp. p6s Let -t o 6 R c" be given by 7v~ = c. Let 6 pwo :=
inf f ~ ( ~ ) .
4'fi F.,o
Then from Corollary 3.3.1 we have,
> < ~ 5o - ,..~5 < ~ o5 < f ~ ( ~ o , 1 ) .
< v - v~ Therefore, pfS
As (Tp - c) > 0 we have < f~((po,1) yp _ - for a l l p 9
-~
"/p
--
c
This holds for all c > /I0,1and therefore the l e m m a follows. Now we state the l e m m a analogous to L e m m a 7.2.3.
[]
L e m m a 7.3.2. Let q5~ be a solution to the primal problem (7.12) and let ~o, yO, zOq be solutions to the dual. Then the following is true:
- ( 5 + ~p) <_ Z~ O~ = 0 2Ov~ = Z~ = Z~ = 0 = Z~
<_ (5 + ~p) i f if - (5 + ~p) i f (5+yp) i f if if
(p, q) 6 N, (p,q) 6 N and IZ~ ( p , q ) 9 M N and Z~ (p,q) 9 M N and Z~ (p, q) 9 M N and ]Z~ (p, q) 6 M.
< (5 + yp), > (5 + ~p), < -(5+yp), < (5 + ~p),
qS~ is unique for all (p, q) 6 ( M N ) tO M. Also, there exists an a priori bound c~ such that IIZ~ < ~o where 2 ~ oi ~o ._ ~(2 ~ + ~ + ~fo(~, )+ 2 V,. , a ~~ - . "~p /20,1
J
- -
for all (p, q) 9 Nz • Nw. r
is a solution to (7.5).
Proof. Similar to the proof of L e m m a 7.2.3. Note here that to the a p p r o x i m a t e problem is always the gl optimal 4 ~ (used in L e m m a 7.2.3) which m a y not be feasible in this not satisfy the gl constraint. This is why ~0,i appears in C~a
a feasible solution as opposed to H case since it m a y the expression for []
Using arguments similar to the ones used in L e m m a 7.2.4 we can determine L~ such that ]Zpq(t)l < 5 for all t >_ L:. The following theorem follows using arguments identical to that used in proving T h e o r e m 7.2.2.
7.3 The Mixed Problem
127
T h e o r e m 7.3.2. It is true that #~ equals 8 = max{
E
er E -xzpq(t)2+
(p,q)EM t-=O
L*. Z E -(~)pq (t) 2 (p,q)EMNt=O
+
i,j,k,Ao
pEs
subject to E R e" , ~ > O, y E R e` ,~pq(t) E R L" f o r all (p, q) 9 M N , --(~ + ~p) < Zpq(t) <_ (g + ~p) if (p, q) 9 Y - ( ~ +~p) < 2gbpq(t)-Zpq(t) <_ (~ + ~p) i f (p,q) 9 M N f o r all t = O, 1 , 2 , . . . , L~. Furthermore, the optimal qS~ of the primal (7.12) is unique for all (p, q) E ( M N ) O M. Thus, we have reduced the approximate problem to a finite dimensional quadratic optimization problem with a priori known dimension. The same remarks relative to the solution procedure hold as in the combination problem. Note that again for the optimal solution ~0 the 45p~ with (p, q) ill M N U N, are always FIR while the 4ipq~,s with (p, q) in M, are IIR. 7.3.2 R e l a t i o n b e t w e e n t h e A p p r o x i m a t e a n d t h e M i x e d P r o b l e m In this section we show how to solve the mixed problem using the results of the approximate problem. Note that the approximate problem reduces to a finite dimensional quadratic optimization problem with a priori known dimension. For the mixed problem (1-block) a similar Lagrange duality approach can be used to show that the problem can be converted to a finite dimensional convex problem with some of the optimal 45Oq being possibly FIR and some IIR as in the approximate problem (see Theorem 7.3.1). Nonetheless, even in the single input-single-output-case, an a priori bound on the dimension of the equivalent quadratic problem has proved elusive [14]. In addition, the MIMO problem is substantially more complex, for one cannot determine a priori which of the optimal dual variables ~pp corresponding to the gl constraint is active (i.e., ~ > 0.) Hence, the a priori determination of which (if any) of the otpimal ~0q is FIR is not possible. This can make the solution procedure extremely complicated and virtually intractable by trying to examine all possibilities. This difficulty can be circumvented by considering the approximate problem. The results in this section show that a suboptimal solution to the mixed problem can be obtained by solving an approximate problem. The following theorem shows that we can design a controller K for the mixed problem which achieves an objective value within any given tolerance of the optimal value by
128
7. MIMO Design: The Square Case
solving a corresponding approximate problem. The existence of a solution for the mixed problem and the optimal r being unique for (p, q) E ( M N ) U M can be proved in a similar manner as was done for the approximate problem. 6
T h e o r e m 7.3.3.
#-~ _< P~ _< P~ + 5]7[t.
Proof. It is easy to show that #-/ _
II~pqlll _< ~p for all p ~ S qE Np
This implies that
f~(~) _<
~
II~qll2 ~ + al-~[1,
(p,q)e( M N)tJ M
Taking infimum over F~ on both sides in the above inequality the theorem follows. [] The next theorem is a result on the convergence of the optimal solutions of the approximate problems to the solution of the nfixed problem. T h e o r e m 7.3.4. Let q~'~ be a solution of the approximate problem of order
88 Then, there exists a subsequence {~,~k } of ~ r is a solution of the mixed problem and ~ , , ~ _+
~0 i~ th~ w((e~.•
", ~2" . •
and 4)~ E e~" x,~, such that
) topology.
Furthermore, ~ q ~ ~o in the W((g2)*, g2) topology for all (p, q) E (MN) O M. Proof. See Appendix
7.4 An
Illustrative
Example
Here we illustrate the theory developed with an example. Consider, the two input single output plant P as depicted in Figure 2 where u :=
is the U2
input, w is tile exogenous disturbance, and y is the measured output. The plant P is given by /5 = (A - 0.5 1). The regulated output is given by z := (y us)'. Therefore,
= )01
I
=
7.4 An Illustrative Example
129
As the plant P is stable a valid Youla parametrization is given by:
~/()~) ~-Gl1:
(~),
U()~)--('~12 :
( ~ O 0"5 1 1 ) a n d I)'(A)= G21 = I .
For this problem nz = 2, n,o = 1, ny = 1 and n~ = 2. Let r be an achievable closed loop map then
=
-
This implies that
Therefore, ql and q2 are in gl only if 1
A. 0.5(1-
r
+ ~b2(A)) E ft.
Therefore ~b is an achievable closed loop map if and only if E g~xl and 1 - ~1(0.5) + r
= 0.
The abow ~.equation is the only interpolation condition. Following, the notation developed in the earlier sections we define F1 : = ( 1 , ~1, (
)-', ...), and F~ := (-1, _ -2' _(
)2),.
It can be checked that the interpolation condition is equivalent to < F,~b > = < Fl,~bl > A- < F2,~b2 > = 1. As n= = nz and nw = ny the system is square and rank interpolation conditions are absent. First we solve the standard multiple input multiple output el problem for the given system G.
y
Ul it2
It
Fig. 7.1. A two input single output example
130
7. M I M O Design: T h e Square Case
7.4.1 S t a n d a r d s
Solution
In this subsection we are interested in solving the following optimization: u0,1 =
inf
4' Achievable
IIr
=
inf
: 1
114)111.
We refer the reader to section 12.1.2 of [3] for the the theory used to solve this problem. It can be easily verified that the above problem reduces to the following finite dimensional linear program:
min
it,
",r subject to 1
r
+ E~+(t)
= u for i = 1,2,
+ r
t=O
ct(o) - r
+ ~'(~t(1) - ~i-(1)) - (r r - 89 ~ ; ( 1 ) ) = 1,
r
> o.
Using the linear programming software of MATLAB we obtain that an optimal is given by
o0,__ This implies that u0,1 : 0.5.
7.4.2 Solution o f t h e M i x e d P r o b l e m In this subsection we are interested in solving the following optimization for the given system in Figure 2:
m := ~ Ai~vab,e{ll~l122 : I1~111 <_ i, 9 E g~xt}. We give the corresponding approximate problem of order 0.1 by the the following optimization: 0.1
m
:=
9
~ Alhnfvablo{ll~ll~ + 0.111~,111 +0.111~2111 : 11r
_ 1, ~ e g ~
x
1}.
The dual of the above problem using Theorem 3.3.2 (Kuhn-Tucker-Lagrange duality theorem) is given by: #o.1 := max
inf {1l~112 2 + (0.1 +Yl)11~1111 + (0.1 +Y~)11~2111
@E/~ xx
- y < F,O > + y - y 1 subject to ~1_>0, Y2 >_ O, y E R.
- Y~}
7.4 An Illustrative Example
131
In keeping with the notation defined in earlier section we denote
Z := yF, that is Z1 = yF1 and Z2 = yF2, 1oI (~) := ii~ii122 + ii~iii 2 + o.iII~iI11 + o.III~2111. Therefore, we have, f ~ = 1 2 + 0 + 0 . 1 + 0 = 1.1 and fo.1 ( ~ 0 j ) = (0.5)2+ (0.5)2+0.1(0.5+0.5) = 0.6. Let ~ and y~ be the solution to the dual problem stated above and let ~ be the solution to the primal. We define L : ~ x l _+ R
by 5 ( r
I[(P[[22 + +(0.i + fflf)l](~l[]1-Jr-(0.I 4-ff;)I]~2[[I-- < Z;,(Pl > - <
Z~,(/is > . From Theorem 3.3.2 (Kuhn-Tucker-Lagrange duality theorem) it follows that (P~ minimizes L((/)) over all (/i E g~• This implies that ~ ( t ) minimizes,
9 1(t) 2 4- (0.1 4- yT)l~1(t)l-
z?(t)~1(t),
(7.13)
over all ~ l ( t ) E R. We can discard the ~ l ( t ) ~ R which have an opposite sign to that of Z~(t) because then - < Z~, r >_> 0. Therefore ~7(t) minimizes,
r (t) ~ + ~ (t)((0.1 + ~ ) s g ~ ( z ? (t)) - z? (t)), over all (/il(t) E R which satisfy assume that Z~(t) >_O. Then r Ol(t) 2 4- r (t)((0.1 4- ff~) -
O~(t)Z~(t) >_ O. Without loss of generality minimizes,
Z'~(t)),
over all ~l(t) E R which are positive. Now if (0.1+y~) _> Z'~(t) then objective is always positive as ~i I (t) is restrained to be positive and therefore the minimizer Or(t) is forced to be equal to zero. If 0 < (0.1 4- ff~) < Z'~(t) then the coefficient of ~(t) in the objective is negative and therefore we can do better than achieving a zero objective value and this forces 4~7(t) > 0. With this knowledge we can now differentiate the unconstrained objective function in (7.13) to get 2(P'~(t) = Z'~(t)- (0.1 + ~ ) . Similarly, if Z~(t) < - ( 0 . 1 +ff~) < 0 then 2(V~(t) = Z~(t) 4- (0.1 4- y~). In any case the following holds:
IZ?(t)l ___0.1 4- ~7 + 21r
___0.1 + fO.1(~0,1) 4- 211~111. 7p -- U0,1
The second inequality follows from Lemma 7.3.1. From the fact that I1(/i~111 < 1 we have 0.6 IZlr(t)[ < 0.1 + 1 - 0.-----~4- 2 = 3.3. Note that Z'~(t) ly~l ___3.3. Now, IZ?(t)l =
= y'rFl(t). As IZ~'(0)l _< 3.3 it follows that ly~Fl(0)l = 1
ly'~Fl(t)l < 3.3 Fl(t) = 3 . 3 ~ .
This implies that we can determine
a priori L'~ such that if t > L~ then
132
7. MIMO Design: The Square Case
IZ~(t)l ~ 0.1 _~ 0.1 + ~1~, which will imply that ~l~(t) = 0 if t :> L~. L~ = 5 does satisfy this requirement. A similar development holds for 4)5. From Theorem 7.3.2 we have that the dual can be written as: 2
#o., =
5
max{E E-r
2
+ y- E
i = l t=O
~i}
i=l
subject to y E R ~, y > O y E R,~i E R6 i = l,2 Zi(t) ~_ (0.1 + Yi) for i = 1,2. for all t = 0, 1 , 2 , . . . , 6 .
- - ( 0 . 1 + Yi) ~ 2(Pi(/) --
Using MATLAB software we obtain that the optimal ~7 which is unique for this example is given by r
= 0.3972 + 0.1732)~ + 0.0617()~) 2 + 0.0058()~) 3,
Therefore, f ~ ) =- 0.5109, II(P'~II1 -- 0.6379 and II(P'~II2 = 0.6191. This implies from Theorem 7.3.3 that if (p0 represents the solution of the mixed problem then If-,(~~ - / - ~ ( ~ ) l
_< 0.2.
This completes the example.
7.5 Summary In this chapter we considered two related problems of MIMO controller design which incorporate the 7t2 and the gt norms of input-output maps constituting the closed loop directly in their definitions. In the first problem termed as the combination problem, a positive linear combination of the square of the 7/2 norms and the gl norms of the inputoutput maps was minimized over all stabilizing controllers. It was shown that, for the 1-block case, the optimal is possibly IIR and the solution can be nonunique. However, it was shown that the problem can be solved exactly via a finite dimensional quadratic optimization problem and a linear programming problem of a priori known dimensions. In the second problem termed as the mixed problem, the 7/~ performance of the closed loop is minimized subject to a gl constraint. It was shown that suboptimal solutions within any given tolerance of the optimal value can be obtained via the solution to a related combination problem.
7.6 Appendix
133
7.6 Appendix 7.6.1 I n t e r p o l a t i o n
Conditions
We analyse here in some detail the zero interpolation conditions. ;1xn, and /33" E ~?~xl we have Given a l ~ ~
~,(~)4(~)b~ (~) : ~ ~ ~,~ ( ~ ) ~ ( ~ ) ~
(~)
p=l q=l
= ~ ~ ~(~,~ 9~
9 ~jq)(t)~ ~
p=l q=l t=0
=~ ~
~,~(t - l ) ( ~ , , ~ ) ( ~ ) ~
p=l q=l t=0 l=0
=E E
aip(t - l) E fljq(l - S)Opq(S)At
p=lq:l flz
t=0 l=0
ntv
OO
OO
s=0 (:~
: p=lq----1 EE E E E t=O l=O 8----0 Therefore, it follows t h a t ~Iz
n w
=
~,p(t - t)~j~(l - s)~p~(~) (~,)(k) ~:~o p = l q = l t = 0 1=0 s=O
= EEEEEcqP(t-l)fljq(1-s)~pq(s)(At)(k)x=,~o
"
p=l q=l 8----0 t=O l-=O
Define F ijkx~ E ~ •
by O0
ijkAo F~q (s) := E E aip(t - l ) f l j q ( l - s) (At) (k) X=Xo
(7.14)
/=0 t=0
It can be easily verified that for any 9 E s
•
( ~ ) ( ~ ) ( ~ o ) =< ~, F ' j ~ ~ >.
(7.15)
Proof of Lemma 7.1.1 : We first show that if IAI < 1 then there exists an integer T such t h a t t > T implies I(,kt)kl _< l(At+l)k I where t is an integer. Let T be any integer such t h a t T > ~ and let t > T be an integer then
I(,V)(k)l
- I(At+l)(k)l = I(t)(t - 1 ) , . . . , (t - k + 1)At-k I - I ( t + 1 ) ( t ) , . . . , (t - k -t- 2)At-k+1[ = (t)(t -- 1 ) , . . . , (t -- k + 2)1,~1'-k. =
=
{t - k + 1 - (t + 1)IAI} - 1),..., (t - k + 2)IXl '-k. {t(1 -I,Xl) - k -I,Xl + 1} (t)(t 1) .... , (t - k + 2)1~1'-k(1 (t)(t
{t- (~)} >0.
- I,Xl).
134
7. MIMO Design: The Square Case
Suppose s is an integer such t h a t s > T where T is as defined above. F r o m (7.3) we have
IFSq
(~)1 =
~,(t
t)~(t
s)(~*)(k)
1=0 t = 0
A=Ao
A=Ao
Ll=s
A=Ao
t=s (21o
<
sup s_
oo
Ic~,p(t
I(~')(k)l~,=,,o ~ l=s
-
OI I~'jq(t - s)l
t=s
oo
< <
sup s
I(A')(k)l~=~o ~
I~q(l
-
s)l ~
l=s
laip(t
- l)
t=s
I('~')(k)lx=xo II~pllx II~'qlll
= (s)(s -- 1)... (s -- k + 1)l,~01'-kll~iplll II/~jqlll. F r o m the ratio test as ])%1 < 1 it follows t h a t
ijkA~ -]7'pq
E gl because
lim (s + 1 ) ( s ) . . . ( s k + 2)]Aol ('+1) = ])%] < 1. ( s ) ( s - 1 ) . . . (s - k + 1)lAoP
,-*~
Note t h a t given any e > 0 we can always choose a To > T such t h a t
s~l,~01'-kll~,pllx II~jqll~ < r for all s > To, which m e a n s t h a t we can choose T0 > T such t h a t
IF~Jk~~
< e for all s >
To.
This proves the l e m m a . [] T h e the elements G , ~ , q t and G i 3 j p t c o r r e s p o n d i n g to the rank conditions can be defined as [3]:
a~,q,(O :=
,~;(t
r
.
c~,,,(t):
.
....
'
0
...
0
.
.
.
~'](t - 0-..
}p'" ,-o~.
9
.
.
0
,
.
,
9
.
.
O
.
.
.
7.6 A p p e n d i x
135
As 6~i and ~j are polynomial vectors we have that Ga,qt and G~jpt are in ~1, Xnw .
7.6.2 E x i s t e n c e o f a S o l u t i o n f o r t h e C o m b i n a t i o n P r o b l e m Here we show that a solution to (7.7) always exists. Proof of Lemma 7.2.1 : As
f(~) :=
pqllpql122+
E (p,q)EMNuM
epqll pqlll, (p,q)6MNuN
we have r'e-- inf { f ( 4 ~ ) : f ( r
This implies that we can restrict 45 in the infimization to satisfy the conditions
that [[r ~ is bounded above by some constant, ~ for all (p, q) 9 M and the [[~pq[]l is bounded above by some constant, C for all (p, q) 9 M N t3 g . Define B = {~P9 g~.xn~ : ]l~bpqll2 2 <~ C f o r all (p,q) 9 M and I]4~pqlll _< C for all (p, q) 9 M N t3 N}. Therefore, ~,~=
inf f(q~). ~6@,ClB
From, above we conclude that there exists a sequence {~"} 9 ~9~ f3 B such that 1
f(~") < re + -.
(7.16)
n
As {(pn} 9 O~ [3 B we have the following
< FiJk:'~ )'~ > = bijkx~
(7.17)
II'P~qllX < C for all (p,q) 9 M N U N,
(7.18)
I](~qll2 2 < U for all (p, q) 9 M.
(7.19)
From (7.18) we conclude that for all (p,q) 9 M N U N, ~ q belongs to a bounded set in (co)*. From the Banach-Alaoglu result (see Theorem 2.3.1) and separability of co we conclude that for all (p, q) 9 M N U N there exists a subsequence {~pq nk } of {45~q} and (Ppq 0 such that {~pq ,~ } --4 ~pq 0 in the W((co)*, co) topology. This implies that < "/), ~)p~ >--4"< ~), ~p0q >
for all v 9 co.
(7.20)
Similarly, we conclude that for all (p, q) 9 M there exists a subsequence { 4 ~ ' } of {4~pq -k } and Ovq 0 such that { ~ ' } --+ ~p0q in the W((/~)*,ts) topoiogy. This implies that
136
7. MIMO Design: The Square Case 0 < v , S ; ; ' >--+< v,Svq > for all v 9 e2.
(7.21)
Thus, we have defined ~0 9 .4 by the limits (7.20) and (7.21). Note that
< FiJk~~
>':
E
< FPjk~~
> "
(p,q)EN, • Therefore, it follows from (7.17), (7.21), (7.20) and Lemma 7.1.1 that
< FiJk;%,qb 0 > = biJ k)~o. Similarly, the rank interpolation conditions are also satisfied by S ~ From the above discussion it follows that S ~ is in Oe and therefore, f(~b0) __
cpqlir
~
-cpql['Pvqll2 0 2+
~ (p ,q)r ( M N )u M
_> lie"
(v,q ) e N
From (7.20) and (7.21) it follows that for all t 9 R and for all (p, q) 9 Nz • N~,.
S ; ; ' (t) -+ ~~
This implies that for all T as s ~ T
tin0
(v,q)E(MN)uM T E ( t=0
(p,q)E(MN)uN
E "Cpql~Opq(~)[2 -4- E -~pq[~Oq(,)[2) (p,q)E(MN)uM (p,q)e(MN)uN
We have from (7.16) that for all s and T T
~--~( t=0
~ ~,~ql~;;'(t)l:~+
(p,q)e(MN)uM
~
1 cvqlS;;'(t)l)
% v~ 4- - - .
(p,q)E(MN)uN
Ilk"
(7.22)
In (7.22) first letting s -+ cr and then letting T ~ cr we have that f ( S ~ < ue. This proves the lemma. [] 7.6.3 R e s u l t s o n t h e M i x e d P r o b l e m
Proof of Theorem 7.3.4 : From Theorem 7.3.3 we have that
f# (4,") < ~w + -11,.;,11. n
This implies that there exists a constant CI such that II~;qll.~ < c1, for all (p, q) E. (MN)tJ M. From the Banach-Alaoglu (see Theorem 2.3.1) result we conclude that there exists a subsequence { ~ } of { ~ q } and S~ E e2 such that 0 in the W((s
e2) topology for all (p, q) E ( M N ) LJ M.
7.6 Appendix
137
This implies that for all v E g2 and for all (p, q) E (MN) tO M, <
,~k >--~< V,4pq o > V,4pq
(7.23)
as k -.), oo.
4 "~ E F~ for every k therefore [14pq Ill < "/p for all p E S.
qs Nr nk
_..=o
We conclude that there exists a subsequence { 4 ~ ' } of {4vq } and 4pq E ~ , x,,~ such that for all v E co and for all (p, q) E (MN) U N, _---0 < U, 4p~' >---+< V, 4pq > as s ~ 0(3.
(7.24)
From the uniqueness of the limit for all (p, q) E ((MN) U M) M ((MN) U N), 4pq-~: 4pq.O Thus, for every (p, q) 6 N~ x Nw we have a sequence { 4 ~ ' } which converges to 4p0q in the W((t2)*,g2) topology (note convergence in W((co)*, co) implies convergence in W((~2)*, t2)). For all s we have
114 3.112 <
#~ +
(p,q)6(MN)uM
1 l~11. nk"
From convergence in the ~2 weak star topology we have
114~
2 < ~.
(p,q)6(MN)uM
Similarly, it can be shown that
114~
_< ~
for all p E S.
q6Np
Also, from equations (7.23), (7.24) and the fact that the zero interpolation conditions can be characterized by elements in e2 we know that 40 satisfies the zero interpolation conditions. To prove that 40 E F-~ we still have to prove that 4v~ E el for all (p, q) E M. This follows in a similar manner to that of the combination problem. Analogous to what was done in the combination problem we define .A and Oe to be the sets
{4 E ~22"xn'~ : 4pq 9 el for all (p,q) 9 (MN) U N} and {4 9 ,4 : 4 satisfies the zero and the rank interpolation conditions}, respectively.Finally _F~ :-- {4 9 O~: ~
[]4pqlll _< 3'p for all p 9 S}.
q6Np
We also define the corresponding optimization problem/.t~ as p; := inf fro(4).
~EL.y
(7.25)
138
7. MIMO Design: The Square Case
We can show in a similar manner to the combination problem that the solution to (7.25) exists and is such that its solution is in (9. This proves that p~ = #~ and also that the solution to (7.25) is a solution to (7.11). The 4 ~ we have constructed is a solution to (7.25) and therefore we conclude that it is a solution of (7.11) and that
II ~
2=
(p,q)E(MN)uM The statement about the original sequence converging for (p, q) E (MN) U M follows from the fact that for the mixed problem ~0q is unique if (p, q) E (MN) t.J M. This proves the theorem. []
8. Multiple-input Multiple-output Systems
Most approaches which incorporate the gl objective characterize the achievability of a closed loop map through a stabilizing controller by using zero interpolation conditions on the closed loop map. This was the approach taken in Chapter 7. Computation of the zeros and the zero directions can be done by finding the nullspaces of certain Toeplitz like matrices [3]. Once the optimal closed loop map is determined the task of determining the controller still remains. The closed loop map needs to satisfy the zero interpolation conditions exactly to guarantee that the correct cancellations take place while solving for the controller. However, numerical errors are always present and there exists a need to determine which poles and zeros cancel. These difficulties exist even for the pure MIMO gl problem, when zero interpolation methods are employed. However, in [24] it was shown that converging upper and lower bounds can be determined to the gl problem by solving an auxiliary problem which does not require zero interpolation and thus avoids the above mentioned problems. In this chapter we study the 7/~ - gl problem for the general case. Unlike the square case it is difficult to obtain exact solutions in the general case. We show that converging upper and lower bounds can be computed without zero interpolation for the most general MIMO case. This provides an attractive method for solving multi-objective problems which incorporate 21 and 7t2 objectives. This chapter is organized as follows. In Section 8.1 we formulate the problem and define an auxiliary problem which regularizes the original one. In Section 8.2 we present converging lower and upper bounds for the problem. Finally, we conclude in Section 8.3.
8.1 Problem
Statement
Let H, U and V in the Youla parametrization be partitioned into submatrices as given below H=
H21HU2
, U=
U2
according to the following equation
and V = ( V 1 V 2 ) ,
140
8. Multiple-input Multiple-output Systems
H-U,Q,V:
\H21H22
-
V~
*Q*(V1V2),
for some Q E s215 where n~ is the number of inputs and ny is the number of measured outputs. The problem statement is: Given a plant G, positive real number 7 solve the following problem. inf
qes
[]g 22 -- U 2 * V , V21122 xn~
subject to IIH l l - u 1 * Q * v ' l l l _< 7. We denote by # the optimal value obtained from the above problem. Now we define an auxiliary problem which is intimately related to the one defined above. The auxiliary problem statement is: Given a plant G, positive real numbers ~ and 7 solve the following problem. inf IIH 2 2 - U 2 * Q * V=ll 2 O~g~. xn~ subject to
(8.1) IIHll - UI * Q * V'II~ <_7
IIQlll _< ~The optimal value obtained from the above problem is denoted by u. Note that in the problem statement of/~ the allowable Youla parameter Q which is in glnuXny needs to satisfy II H ' I u 1 * Q * v i l l i _< 7. Therefore it follows that [IU 1. Q * V l[ll = []H i 1 - U 1 * Q * V1 - Hill[1 <_ IIH i 1 - U1 * Q* Villi + IIHlll]I _< [Ig11[ll + 7 . Suppose, U1 has more rows than columns and ~,1 has more columns than rows and both have full normal rank. Thus the left inverse of U1 exists (given by ( 0 t ) -l) and the right inverse of ~1 exists (given by ( ~ l ) - r ) . Further suppose that ~fl and i~'1 have no zeros on the unit circle. Then it can be shown that there exists a/3 (which depends only on (U1)-t and ( ~ l ) - r ) such that IIQII1 _3. Indeed as U 1 and V 1 are left and right invertible it follows that Q = ( / ) - l ) - I / ~ ( ~ l ) - r . This implies that I[QI[1 _< II(U1)-tlll(llglllll + 7 ) l l ( V l ) - r l ] l =:/3. The assumption that U 1 and 1)-1 have no zeros on the unit circle ensures that /3 is finite. Thus if in the auxiliary problem we choose a = / 3 then the constraint ]IQII1 _< a is redundant in the problem statement of u and we get p = u. The extra constraint in the problem statement of v is useful because it regularizes the problem (as will be seen). The following lemma is a result on the uniqueness of the solution to (8.1). L e m m a 8.1.1. Let QO in g~,xn~ be a solution to (8.1). Let ~o = H - U * QO , v with ~22.~ = H22 _ U 2 , Qo , V2 antiC11'~ = H 1 1 _ U 1 , Q 0 , V 1. Then ~22,o is unique. Furthermore, if (f2 and ~2 have full normal column and row ranks respectively then QO is unique.
8.2 Converging Lower and Upper Bounds
141
Proof. Note that the problem statement of v given by (8.1) can be recast as, = inf{ll~=2ll~
: ~=2 9 A,~,},
(8.2)
where Aat = {rb22 : there exists Q 9 gl,~,, x,,,,, with ~p22 = H~2 - U 2 , Q , Y 2,1]H 1 1 - U 1 , Q , Vl111 < % and IIQI[1 _< a}. It is clear that Aat is a convex set. It is also true that I1.]]'~ is a strictly convex function. It follows from Lemma 3.3.2 that the minimizer of (8.2) given by (p22,o, if it exists is unique. If ~2 and i5"2 have full column and row ranks then it follows that
~)0= (0~)-~r where (U2)-t and (I)2) -~ represent the left and the right inverses o f U 2 and ~2 respectively. Thus (~0 is unique. This proves the lemma. []
8.2 Converging
Lower
and
Upper
Bounds
In this section we will obtain converging upper and lower bounds to the auxiliary problem. The following lemma will be useful towards this goal. L e m m a 8.2.1.. Suppose r is a sequence in e2, r is in g2 and Ok(t) -+ Co(t) for all t. Suppose also that IlCkil2/~ 11r Then lick -- r --+ 0. Proof. Given e > 0 choose n such that I I ( I - P~)r
_< min{~, (8([1r
+ 1) )u}'
(8.3)
where Pn is the truncation operator. As Ck(/) -+ r such that k > ;~'~ ~ II;~(r
- r
we can choose K2
___ ~.
(8.4)
We know that IIPn(r --+ IIP,~(r as k --+ oo. From above and the fact that IlCkl[2 -+ IIr it follows that we can choose K3 such that
k > ]~'3 ~ I I 1 ( I - P.)r
- II(Z - P,~)r
I < ~[.
(8.5)
Let K _~ max{K2, I(3} the k > K implies lick - r
= IIP-(r - r + II(I- P.)(r - r _< ~ + II(Z - P,,)(r + I I ( Z - P,~)(r oo
+2 ~
ICk(OI Ir
t=n+l
< ~ + 211(z - P,,)(r
+ ~: + 2 ~
ICk(01 Ir
t=n+l
< ~ + 2 ~ + ~ + 211(I- P.)~ll~ I1(I - P.)r < ~ + 2~ + ~[ + 211r de.
8(11r
e
+ 1) []
142
8. Multiple-input Multiple-output Systems
8.2.1 Converging Lower B o u n d s Let vn be defined by inf
[[Pn(H ~ - U 2 * Q * Y2)[[ 2
QEs = xn~
subject to
(8.6) lIp.(H
11 -
U 1 , Q 9 v~)lh
<
It is clear that only the parameters of Q ( 0 ) , . . . , Q ( n ) enter into the optimization problem and therefore (8.6) is a finite dimensional quadratic programming problem. Once optimal Q ( 0 ) , . . . , Q(n) are found, then Q -{ V ( 0 ) , . . . , Q(n), 0 , . . . } will be an FIR optimal solution to (8.6). T h e o r e m 8.2.1. Suppose the constraint set in problem (8.1) is nonempty. nuX;~,y Then problem (8.1) always has an optimal solution QO in s . Furthermore,
Also, i f ~ ~,~ := H 22 - U s 9 QO , V ~ and ~22,~ := H22 _ U 2 , Q,~ , V 2 where Q'~ is a solution to (8.6) then there exists a subsequence {(p22,n,,) of the sequence {~22.n} such that I1~ ~ , " ' - - ~22,~
--+ 0 as m ~
o~.
If (J2 and ~/2 have full normal column and row ranks respectively then QO is unique and [l~ 22''~ - ,~22'~
~
0 as n ~
oo.
Proof. We know that for any Q in 81~ u X ~ y , IIP,(H 11 m U 1 , Q 9 v1)111 < ]]P,~+I(H 11 - V 1 , Q , v1)]]1 and [IPn(H ~ 2 - U s* Q , v~)]]~ < ]]P,~+I(H ~ 2 U s , Q , v2)[]~. Therefore un < Un+l for all n -- 1 , 2 , . . . . Thus {v,~} forms an increasing sequence. Similarly it can be shown that for all n, v,~ < u. For n = 1 , 2 , . . . , let {Q'~} in i l "xn~ be FIR solutions of (8.6). As the nuXny sequence {Qn} is uniformly bounded by a in s it follows from BanachAlaoglu theorem (see Tl~eorem 2.3.1) that there exists a subsequence {Qn,,} n m of {Qn} and Q0 in s nuX~y 1 such that Qij converges to QOj in the W(cG,co ) topology. This implies that Qn~(t) converges to QO(t) for all t = 0, 1 , . . . . Therefore for all n, Pn(U * Qn,, , V) converges to Pn(U * QO , V) as m tends to c~. Now for any n > 0 and for any nm > n, ][P~(H11-U1,Q'~,V1)]]1 <_ % This implies that ][Pn(H 11 - U 1 , Q 0 , V1)][1 < 7. Since n is arbitrary, we have [[Hll
_ U 1 ,
QO, V1[[1 _< %
8.2 Converging Lower and Upper Bounds
143
Similarly for any n > 0 and for any nm > n, IIP.(HS2-U2,Q"..,v2)II== <_v. Again, this implies t h a t IIP,~(H s2 - U 2 , Q 0 , VS)ll~ _< u. Since n is arbitrary, it follows t h a t iiHSS _ U 2 , Q 0 ,
v21l~ < v.
It follows t h a t Q0 is an o p t i m a l solution for (8.1). To prove t h a t Vn ./~ V, we note t h a t I I P , ( H 2s - U s , Q " ~ , VS)ll~ _< I I P ~ ( H ss - U s , Q " ~ , V2)ll~ -v,~., for all n > 0, for all nm > n. Taking the limit as m goes to infinity we have
ilP.(HS= _ U 2 , Q 0 ,
VS)ll~ _<
lim v,~,, for all n > 0. rn--~ oo
It follows t h a t [[HS2 _ U 2 , Q 0 , V2I[~ _< limoo vn.,. T h u s we have shown t h a t limm-~oo v,~.~ = u. Since vn is a m o n o t o n i c a l l y increasing sequence, it follows t h a t u n / z u. It is clear from L e m m a 8.1.1 t h a t ~S2.o := HS2 _ U s , Q 0 , V 2 is unique. If(P ss'n := P,~(H s2 - U s * Q'~ * V s) then from the discussion above it follows t h a t v , , , = 11~22,~11~ converges to v = I1r176 ~. Also, O22'n"(t) converges to r 1 7 6 It follows f r o m L e m m a 8.2.1 t h a t II~ 2 2 ' " ~ - ~22'~
- ~ o as m
-+ ~ .
From L e m m a 8.1.1 we also have t h a t if ~-2 and ~,s have full n o r m a l column and row ranks respectively then Q0 is unique. F r o m the uniqueness of Q0 it follows t h a t the original sequence, {r converges to ~22,o in the two norm. This proves the theorem. [] 8.2.2 Converging
Upper
Bounds
Let v '~(7) be defined by inf
IIH 2 2 _ u 2 , Q 9 v211~
subject to lJ H l l - U 1 * Q * V i i i 1 ~ 7
(8.7)
IIQII~<~ Q ( k ) = O if k > n.
T h e following t h e o r e m shows t h a t {vn(7)} defines a sequence of upper b o u n d s to v ( 7 ) which converge to v(7). Theorem
8.2.2.
F o r all n, u n ( 7 ) >_ v'*+1(7) :> v(7). Also,
~"(7) "~ ~(7).
144
8. Multiple-input Multiple-output Systems ~uXny
Proof. It is clear that urn(7) _> u'~+1(7) because any Q in s which satisfies the constraints in the problem definition of u'~(7) will satisfy the constraints in the problem definition of u '~+1 (7). For the same reason we also have u ' ( 7 ) _> u(7) for all relevant n. Thus {u n (7)} is a decreasing sequence of real numbers bounded below by u(7). It can be shown that u(7 ) is a continuous function of 7 (see Theorem 6.5 in [15]). Given e > 0 choose 6 > 0 such that - ( 7 - 5) - - ( 7 ) < ~.
(8.8)
Such a 6 exists from the continuity of u(7) in 7- Let Q'Y-~ be a solution to the problem u(7 - 5) which is guaranteed to exist from Theorem 8.2.1. Let M be large enough so that m ~ M implies that
IIIH 22 - U 2 , Pm(Q ~-~) , V2II~ - ] ] H 2 2 - U 2 * Q'Y-~ 9 v211~l < -~ and (8.9) 6
IlIHll-U1,pm(Q~-~),V1111
-IIHI1-UI,Q~-~,V'II~I
< ~.
(8.10)
As Q'~-a is a solution to the problem u(7 - 5) it is also true that IIH22 - U ~ * Q"-~ * V2ll~ -- ~'(7 - 5), IIHx~ - u x * Q ~ - a * v i i i 1 _< 7 - 6 and IlQ~-allx ___~. From the above and equations (8.9), (8.10) it follows that for all m >_ M,
IIH22
- u 2 * Pm(Q "~-~) ,
E
v211~ - ~(7 - 5) <_ ~,
]IH 11 - U 1 * P.~(Q'~-~) * Vl111 < 7 and
IIPm(Q~-6)lll < ~.
(8.11) (8.12) (8.13)
From equation (8.8) and the above it follows that for all ra > M, Pm(Q "~-~) satisfies all the constraints of problem u m(7) and
liB 22 -
U 2 , Pm(Q "~-6) ,
v211~
~
v(7) ___ ~.
Thus for all m > M it follows that .~(~)
- . ( 7 ) <_ ~.
This proves the theorem.
8.3
[]
Summary
In this chapter we have formulated a problem which incorporates the 7t2 performance measure and the ~1 measure. It is shown that converging upper and lower bounds can be obtained via finite dimensional convex programming problems. This methodology avoids many of the problems present in zero interpolation based methods.
9. R o b u s t P e r f o r m a n c e
The robust stability problem addresses the stability of the closed loop system for all perturbations A which lie in a specified class. The larger this class, the more conservative the condition on the closed loop map which guarantees stability with respect to the perturbations in the class. The small gain theorem gives a condition on the closed loop map which is sufficient for stability when the only restriction on the perturbation is a norm bound. However, many physical systems can be cast into the framework of Figure 9.1 with A having a block diagonal structure [3, 25]. This has led to research into conditions on the closed loop map for robust stability with respect to perturbations which have a block diagonal structure. Necessary and sufficient conditions for robust stability with respect to linear time varying and block diagonal perturbations which have finite eoo induced norms are given in [26, 27, 28]. Similar conditions can be obtained when the perturbations are nonlinear time invariant instead of linear time varying. These conditions on the closed loop map can be verified easily and can be considerably less conservative than the small gain condition. The robust performance problem in contrast to the nominal performance problem addresses the issue of synthesis of a controller K which minimizes the effect of w on z over all controllers which stabilize the system in Figure 9.1 for the worst case A belonging to a specified class. The gl robust performance problem captures the objectives of gl robust stability and/?1 performance. In [29] it is shown that such a problem can be solved by employing sensitivity methods to linear programming, when there is only one perturbation block. However, as mentioned earlier gl performance is no guarantee of acceptable 7i2 performance. Motivated by these concerns we formulate a problem which reflects the objectives of the el robust design and nominal 7/2 performance. We show that this problem can be solved via finite dimensional quadratic programming when there is only one perturbation block. This chapter is organized as follows. In Section 9.1 we give results on robust stability and robust performance when the perturbation block is bounded in the too induced norm. In Section 9.2 we present the problem of interest and give upper and lower bounds to the main problem. We also show the connection between these bounds and quadratic programming. In Section 9.3 we give results on quadratic programming while in Section 9.4
146
9. Robust Performance
i,
G
Z
-I
Fig. 9.1. The Performance Problem we describe the solution method to the problem formulated in this chapter. Finally, in Section 9.5 we give a summary of this chapter.
9.1 Robust
Stability
and
Robust
Performance
In the first part of this section we present results on robust stability. In the second part we present results on robust performance. 9.1.1 R o b u s t S t a b i l i t y Let A be a causal map from g~o to Coo and
I1~11~-~=~:--
sup IIz~wllo~. Iltolloo<_l
The small gain theorem tells us that the system in Figure 9.2(a) is stable for all AX ~ {/~ causal : II/Xlloo_~.~ _ 1} if IIMII1 < 1. This class of perturbations does not include any structure. However, many physical systems have perturbations which have diagonal structure as shown in Figure 0.2(b). The small gain theorem is a conservative result for such systems. Therefore, it is natural to ask when the system is stable for a restricted class of perturbations which accounts for the structure. Consider Figure 0.2(b), where M is in ~ x ~ and Ai are single-input single-output maps for i = 1 , . . . , n. Define, ZILTV := {AI A = diag{A1,... ' An}, Ai is causal linear time varying}, ANL := {A] A = diag{A~,..., A,,}, A~ is causal nonlinear time invariant},
9.1 Robust Stability and Robust Performance
(a)
147
(b)
Fig. 9.2. (a) Perturbations are unstructured. (b) Perturbations have diagonal structure BZaLTV : = { A E A L T v [ [[A[l~-ind_< 1} and BZaNL := { A E AWL[ [[zh[[~-ind< 1}. BALTv and BzanL incorporate the diagonal structure of the perturbations. Each of these sets leads to a question of robust stability. We are interested in a condition on M which if satisfied leads to the stability of the system for all perturbations lying in BALT v and a condition on M which leads to stability of the system for all perturbations in BaNL- The Theorem 9.1.1 proven in [27] gives such conditions. We define [M[ where M in ~'• by the following, 11Ml1111 9 9 [[Ml,~llt) IMI:=
"
IIM.I[[1
'
9 9 [[M,~,~[[1
For the rest of this chapter by stability we mean s
stability.
T h e o r e m 9.1.1. The system in Figure 9.2(b) is stable for all A E BALTV if and only if either one of the following conditions is satisfied 1) p([M[) < 1 where p(.) is the spectral radius, 2) inf [[D-1MD[]I < 1 DED
where :D is the set of diagonal matrices with strictly positive elements. The same result holds if A is restricted to lie in BZ~L instead of B~LTV. 9.1.2 R o b u s t P e r f o r m a n c e Consider Figure 9.3(a). The ga robust performance problem with respect to linear time varying structured perturbations is the problem of synthesizing a controller K such that the closed loop map is stable for all A E BaLTV and the ga norm of the map from w to z is less than one for all A E B,aLTV" In [27] it is shown that this problem is the same as the problem of synthesizing a
148
9. Robust Performance m
A1
An
U
U
3
z ~
G
8 $
w lid
W
tL
*
G
U
i
(a)
(~)
Fig. 9.3. Robust Performance for (a) is equivalent to robust stability of (b) controller K such that the system in Figure 9.3(b) is stable for every (A, Ap) which lies in Bap, which is the set {A = diag(A, Ap) I /~ E Bzx6rv, IIApll~-i,~d <_ 1, Ap is L T V and causal}. This results in the following, theorem [28], T h e o r e m 9.1.2. The system in Figure 9.3 achieves Tvbust performance with respect to B,~LT v if and only if either of the following conditions is satisfied (a) p(lr < 1, (b) inf IID-:r DE'D
< 1,
nal matrices with strictly positive elements which have compatible dimensions with the dimensions of ~. The same theorem holds for the robust performance problem with respect to nonlinear time invariant structured perturbations. The extra block Ap is now restricted to be nonlinear time invariant.
9.2 Problem
Formulation
In this section we define the optimization problem which captures the objectives of interest stated in the previous sections. After defining the problem
9.2 P r o b l e m F o r m u l a t i o n
149
of interest we formulate problems which give upper and lower bounds to the problem of interest. Youla parametrization (see Theorem 4.2.4) tells us that all closed loop m a p s achievable through stabilizing controllers are given by
r where H, U and V are fixed elements dependent only on the system G and Q is a stable free parameter. In this chapter we assume the closed loop m a p is a two-input two-output map. We define the sets 7:) and O(D) to be the sets 7) := {
0)
d2
:
di > 0} and ,
O(O) := {r 9 e~x21 9 = H - U * Q * V for Q stable IID-lqSDII1 < "y}, respectively. Let # ( D ) := inf{[lr
[ (p 9 O(D)},
# := inf p ( D ) .
(9.1) (9.2)
DET~
We Note that (P22 is the m a p between w and z. p captures the objectives we have in mind. In the following sections we obtain converging upper and lower bounds to p. 9.2.1 D e l a y A u g m e n t a t i o n
Approach
Let n u , n w , n o , n y , nz and ns denote the dimension of u , w , v , y, z and s respectively. The Youla parametrization tells us that all achievable closed loop m a p s are given by H - U * Q * V, where H, U and V are fixed stable elements and Q is a free variable which is stable (for a detailed discussion see [3]). The constraint t h a t Q is stable can be translated into linear constraints on (P. Thus there exists an operator .4: ~ + ~ -+ s such that (P = H - U * Q * V for some Q stable if and only if A ( ~ ) = b, where b is a fixed element in el. The range space of the operator .4 is finite dimensional if n~ + n~ = nu and nw + nv = ny. In this case the problem is called a square problem. We define #~(D) :=
inf {11~2~11~2 + 5 ( I I ( D - l ~ D ) l l l l + I I ( D - t ~ D ) 2 1 1 1 ) } , ~EO(D)
#~ := inf p~(D), DES)
(9.3) (9.4)
where (D-I~D)i is the i th row of(P. It is clear that we can a p p r o x i m a t e # by ~6 to the desired accuracy by choosing an appropriate (f. # ( D ) is a quadratic p r o g r a m m i n g problem. However, it is intractable because it is not possible to show that it can be solved via finite dimensional quadratic p r o g r a m m i n g even in the square case. In contrast we have seen in Chapter 7 that for the square case, p~(D) can be solved via finite dimensional quadratic p r o g r a m m i n g . If nz + n~ > nu or n~ + nv > ny then the range space of.A is not finite dimensional. In this case #~ (D) is solved by converting it to a square problem.
150
9. Robust Performance
This is done by the Delay Augmentation Method. We give a brief description of this method here (for a detailed discussion see [3]). Let S denote a unit shift, that is,
S(r
z(1), z(2),...) = (0, x(O), z(1) .... ),
and S T denotes a T th order shift. Suppose, that the Youla parametrization of the plant yields H in gl"xn~, U in e~~xn', and V in gl , where n z = t n~ + n~ and n~o = nw + no. Partition, 0 into i
0=
02
i
l
i
,
where U 1 in g~" xn~. Similarly, partition I~" into (Q1, V~) where V 1 in gl Let (P and H be partitioned according to the following equation:
(r r
11 ~12~
~11 H~2~
We augment 0 and ~" by following
(
U1
~ 2 2 j = (/:/ul f i 2 2 j - ( ~ r 2 ) Qll (tY' Q2) . N th
9
(9.5)
order delays and augment Qll a,s given by the
~ll,N ~12,N~_(IJ11 /t12
01 0
011 012"~(~'1 ~"2 (9.6)
or equivalently, CN := H _ 0 ~ 0 9 r r
We define 69(D, N) the feasible set for the delay augmented problem to be the set p2x2J ~N = f[
{~U
E ~1
~fNQQN with O stable and IID-Ir
< 7}-
We define the Delay Augmentation problem of order N by /z~v(D) := inf{l(#N): #N E O ( D , N ) ) .
(9.7)
where
l(~) := 11~221122+ a(ll(O-~O)lll~
+
II(D-I~D)~II1).
This is a square problem and can be solved via finite dimensional quadratic programming. Let the delay augmented problem corresponding to (9.4) be given by P~v := inf #~v(D).
DE'D
We will now show that P~N converges to #a from below.
(9.8)
9.2 Problem Formulation
151
L e m m a 9.2.1. lim P~N = /~
where the limit on the left hand side of the equation above exists. Proof. It can be shown that O(D) C O(D, N + 1) C O(D, N) for all integers N. Therefore, for a given D and for a]l N #~v(D) _< #~N+,(D) and p~v(D) _2~(D). Therefore #~v(D) is an increasing sequence in N. It can be shown that this sequence converges to #~(D) from below. Now, p~v(D) _< #~(D) for all D in D. This implies that inf p~N(D) < inf #6(D). DE'/:)
--
DE'D
Therefore, P~v _< #~. Also, because #~N(D) < P~v+~ (D) for all D in "D it follows t h a t / ~ v is an increasing sequence bounde.d above by #~. Let L = lim p~. N-+c~
Suppose p~ - L =: c > 0. Then, there exists an integer M such that
P~ -- P~N _> r
VN_>M.
This implies that #*-(#~v+e/4)_>e/4V
N > M.
Therefore there exists Do in "D such that
p~(Do) - P~N(Do) >_el4 V X >_ M. This, contradicts the fact that #~N(Do) --+ IJ~(Do) as N -+ exp. This proves the lemma. [] Notice that for a given D in 7) we have the following: (
D-I~D =
q~lt
(d2/dl)r
(dt/d2)C'2t
~/'22
"
We denote d~/dl by s and therefore we have
D-lC~D =
(l/s)e_~l e.~2 2 "
Thus, #~v (D) can be obtained by solving the following problem. 2 Achievable subject to
I1' :11
<_
+
1/sll 2 ll
+ ll 2ll
_< %
where ~ is achievable if r := /2/ - (fNo~fN for some Q stable. It can be shown that the above problem is a finite dimensional quadratic programming
152
9. Robust Performance
problem (since it is a square problem) where the dimension can be determined a priori (see Chapter 7). We assume in this section that the dimension is given by T. We define new variables corresponding to each @ij(t) as follows: @+(t) := @ij(t) if @ij(t) ~_ 0 := 0 if @ij (t) < 0 @~(t) := -@ij(t) if @ij(t) <_0 := 0 if @ij(t) > O. This implies that @ij (t) = @+ (t) - @~ (t) and I@ij(t)l = @+ (t) + @~ (t). Also, we can obtain # ~ ( D ) by solving the following. rain
T 5--'(@~2(t) - @~2(t)) 2 + ~(r
* @i-. (t) + ~ ( t )
~t Achievable ~
"
+ @q(t))
+5(s@+12(t) + s@-[2(t) + 1/s@+l(t) + 1/sqS~1(t)) subject to
T E O+l,(t) +@-[l(t) + s@+2(t) + s@~o(t) <_7, t=O T
E @+~(t) +@~2(t) + 1/s@+l(t) + 1/s@2i(t) <_7, t=O
@~.(t) > o, @~(t) > o, v t : o , . . . , T . Let E := [
06(T+1)+2 ) IT+I
\ -Ir+l T
and let C = 2EE'. We define
xl
T
:= ~ r162 t----0
and x2 := E @ + l ( t ) +
t.-~0
(P~-i(l). Let the vector x and p(s) be given by:
z2
1/s
@+li @-[1
1 1
e+12 x :=
4)-[2
p(s)
o := -5
0
@2+~ @21
o 0
@+2
I
'
where 1 is a vector of ones with length (T + 1) and 0 is a vector of zeros with length (T + 1). With the above definitions we can cast P~N(D) into the following form:
9.2 Problem Formulation ]
153
!
min -~x Cx - p'(s)x subject to (A~I~;) AI2"~ A22,] x < b
(QPI)
gx:e
x>0
whereAll(s)=(o~)ands>O.
Theachievabilityconditionsareabsorbed
into H. C is positive semidefinite. #~N(D) is a function of the variable s. Therefore, we denote #~v(D) by 7(s). Note that P~v =
inf 7(s) =: "/opt.
sER+
9.2.2 F i n i t e l y M a n y V a r i a b l e s A p p r o a c h Converging upper bounds can be obtained by Finitely Many Variables approach. In this approach the allowable closed loop maps achievable via stabilizing controllers arc restricted to have a finite impluse response structure. This means that the allowable closed loop maps can be characterized by the linear map r as defined in the earlier subsection with ~4 having a finite dimensional domain space. It can be shown that in this case the range space is also finite dimensional [3]. Thus the allowable closed loop maps can be characterized by finite number of constraints involving finite number of variables. We define
#N(D) = inf{ll,P2.~Iiu u I,/, in ON(D)}, where ON(D) 0 for all k and
(9.9)
= {cp =_ H - U * Q * V with Q stable I q~(N + k) =
IID-lq, D[I1 <
.
,./~[
7}. It can be shown that #N := mf p',. (D) -+ p
--
DE'D
from above as N --+ 0r following similar arguments employed in the previous subsection. Defining x and C a.s defined in the earlier subsection we can cast pT(D) into the following form 1 i rain -~x Cx subject to A22
x < b H x : e
z>0 Note that if we denote #T(D) by 7(s) then ~T __ inf 7(s) =: "/opt. sER +
(QP2)
154
9. Robust Performance
9.3 Quadratic
Programming
Consider the following quadratic p r o g r a m m i n g problem 1
,
min ~ x C x - p' x subject to
(QP)
AxO
where A in R m~x'~l, H in R m2xnl has full row rank, and C is positive semidefinite. We are interested in obtaining necessary and sufficient conditions for x0 to be optimal for ( Q P ) . The following theorem gives such conditions. Theorem
9.3.1. C o n s i d e r the quadratic p r o g r a m m i n g problem, ( Q P ) . xo is optimal f o r the problem i f and only i f there exist yo in R m l , u in R m~, A in R m:, v in R nl such that xo, yo, u, v, ~ satisfy the following conditions p = C x o + A~u + H ~ A - v e -= H x o b = A x o + Yo 0 = u~yo 0 ---- ~)IX 0
xo >_O, yo >_O,u>_O,v >_O. Proof. (=v) Suppose, x0 is optimal for the problem ( Q P ) . This implies that x0 satisfies the conditions e : H x o , A x o - b < 0 and x0 > 0. From Theorem 3.3.2 (Kuhn-Tucker-Lagrange duality result) we know-that there exists u in R rn~ , )~ in R m2' v in R '~ with u > 0 and v > 0 such that x0 minimizes L ( x ) where 1
~
L ( x ) := ~ x C x -
p x + u ' ( A x - b) + A ' ( H x -
e) - v ' x .
This implies that d L(x)
= Cxo - p + A'u +
- v = O.
2~-X 0
Also, from Theorem 3.3.2 we know that u ' ( A x o - b) + A ' ( H x o - e) - v ' x o = O.
As x0 satisfies e - H x o = 0 we have u ' ( A x o - b) - v~xo = O. However, x0 satisfies A x o - b ~ 0 and x0 > 0. Therefore we conclude that u ' ( A x o - b) = z/x0 = 0. The necessity of the conditions given in the theorem s t a t e m e n t for x0 to be optimal is established by defining Yo = b - A x o .
9.3 Quadratic Programming
155
(r Suppose, for a given x0 there exist vectors A, y0, u, v which satisfy the conditions given in the theorem statement. Let x in R ~' be any element which satisfies the constraints of ( Q P ) . Let f(.) denote the objective function of ( Q P ) . We have 1
,
1
,
f ( x ) - f ( x o ) - -~x C x - 7 x o C x o - p ' ( x - xo) = .~(xl
_ X o ) ' C ( x - xo) + x ' C x o - xoCxo'
- p ' ( x - xo)
1
-- -~(x - X o ) ' C ( x - x o ) + ( x - x o ) ' ( p - A ' u - H A + v)
-p'(x > -(x
- xo)
- X o ) ' A ' u - ( x - x o ) ' f t ' A + (x - x o ) ' v
-- - u ' ( ( A x
- b) - ( A x o - b)) - A ' H ( x - xo) + (x - Xo)'v
= -u'((Ax
- b) + Yo) - A ' H ( x - Xo) + ( x -
xo)'v
"- u'(b - A x ) + v' x >_ 0
This proves the theorem. [] The above theorem shows that the solution of a convex quadratic programming problem as given in ( Q P ) is equivalent to the search of a vector (x, u, v, y, X) which satisfies the following conditions:
A 0 H0
(i)
0 I 0 0
=
p
,
(9.10)
v ' x + u ' y = 0,
(9.11)
(x u v y) _> 0.
(9.12)
Also, note that if conditions (9.10) and (9.11) are satisfied then the objective function f(.) of ( Q P ) is given by: 1
I
1
!
f(x) = !
= ~x (p-Au-H'A+v)-p'x I ,A x - IA, 1 ,v = - - ~Ip ,x - "~u ,~ H x + -~x
(9.13)
ii 11 ii ii = - - ~ p x - -~u (b - y) - -~:~ e + -~v 1
,
:
--~p x -
=
-~P
1, z -
1 , -~b u
1 , lv, x 1 , ~eAq~ --k ~ u y
1, -~b u
1, ~e A
(9.14)
156
9. Robust Performance
Define b :=
and x :=
v
. Let the m a t r i x in equation (9.10) be
denoted by A. Also, we assume t h a t A in R m• has full row rank (i.e. it has rank m). Note t h a t the objective function f(.) of ( Q P ) is given by
f=(_l
1 0 0 - 89
~ : CtX
(see equation (9.14)). In this section, whenever, we refer to x we assume that it is in tire form (x u v y A)' where the variables x , u , v , y and A are as defined in T h e o r e m 9.3.1. We call zi and yi primal variables. We call vi the dual variable associated with the primal variable xi and ui as the dual variable associated with the primal variable yi. Before we characterize the set of elcmemts which satisfy equations (9.10), (9.11) and (9.12), we give the following definitions. D e f i n i t i o n 9.3.1 ( F e a s i b l e s o l u t i o n ) . A n e l e m e n t x in R n is called feasible if it satisfies equations (9.10), (9.11) and (9.15). The set of all such elements is denoted by 5 . Note that a primal variable and its corresponding dual variable both c a n n o t be nonzero in a feasible solution, because of (9.11) and (9.12). D e f i n i t i o n 9 . 3 . 2 ( B a s i c s o l u t i o n ) . Let B be a m x m submatrix f o r m e d f r o m the columns of A such that B is invertible. Then, x u : = B - I b defines a basic solution of A x = b. Such a solution will have n - m components equal to zero corresponding to the columns of A not in [3. These components are called the non-basic variables. The rn components that correspond to the columns of B are called basic variables. D e f i n i t i o n 9 . 3 . 3 ( B a s i c f e a s i b l e s o l u t i o n ) . A n e l e m e n t x in R n is called basic feasible solution if it is basic and feasible. Theorem solution.
9.3.2.
If iP is not e m p t y then it has at least one basic feasible
Proof. Let ai denote the i th c o l u m n of A. Let z be a feasible solution and let the i th element of the vector z be denoted by zi. Also, let z be p a r t i t i o n e d as (x z u z v z yZ Az), where the variables x z, u z, v ~, y~, A z correpond to variables x, u, v, y, A in T h e o r e m 9.3.1 indexed by z. For simplicity assume t h a t the first p c o m p o n e n t s of z are nonzero while the rest are zero. This m e a n s t h a t Zlal + z2a2 + . . . + zpap = b, and z is such t h a t (u~)'y z + (vZ)'x z = 0 and (x ~ u z v ~ y~)' >_ 0.
9.4 Problem Solution
157
If a l , . . . , ap are independent columns then p < m because A has rank m. This implies t h a t z is a basic solution. Suppose, ax,. 9 av are dependent. T h e n there exists a in R" with at least one strictly positive element such that aiai + a 2 a 2 + , . . + a p a p = O,
with the last n - p c o m p o n e n t s equal to zero. Let c :=
zi
min - {i:a,>0} ~i
Let t : = (z - e~). This implies t h a t A t = A ( z - c~) = b because A a = 0. Also, note t h a t if zi = 0 then ti = 0. T h e condition ( u ~ ) ' y z + ( v Z ) ' x ~ = 0 is equivalent to uiz Yi~ = v ~ x zi = 0 (because (x ~ u s v ~ yZ), > 0). This m e a n s t h a t u i~y t = vixitt = 0. Also, if zl > 0 then t~ >_ 0. T h u s t is a feasible solution. From the definition of e, t will have a t m o s t p - 1 nonzero c o m p o n e n t s . T h u s from a feasible solution which had p nonzero c o m p o n e n t s we have created a feasible solution which has p - 1 nonzero c o m p o n e n t s . This process can be repeated until the n u m b e r of strictly positive c o m p o n e n t s is less than or equal to m and the corresponding columns of A are linearly independent (i.e. until the feasible solution is also basic). This concludes the proof. [] In the next section we exploit T h e o r e m 9.3.2 to solve the robust perform a n c e problem wc have formulated.
9.4
Problem
Solution
We saw ill Section 9.2 t h a t converging upper and lower b o u n d s to p as defined in (9.2) can be obtained by solving problems which can be cast into the following form: 1
,
min -~x C a - p ' ( s ) x subject to All(S) A 1 2 ~ A21 A22 J x < b Hx~e x>0
(Qp(s))
with "/opt : =
inf 7(s),
sER+
where "/opt is the p r o b l e m of interest. Note t h a t A l l (s) has the s t r u c t u r e given by
Alx(S'
158
9. Robust Performance
and s > 0. Using the results obtained in the previous section we know that the above problem has a solution if and only if there exists x, u, v, y, )~ which satisfy the following constraints:
)
0 0
0 I 0 0
=
,
(9.15)
v' x + u' y = O,
(9.16)
( x u v y ) _> 0.
(9.17)
Using the structure of A(s) the constraints given by equation (9.15) can be rearranged as given below: 0 ~
0
9 ,-s
0 * **
x2
b2
0.**
ul
pl(s)
9
*
*
****
~
9
*
*
****
"-~
_
where the entries denoted by * do not depcnd on s. We denote the m a t r i x on the left hand side by A(s), the vector on the rightmost side of the equation by b(s) and (xl,x2, ul,u2, x_,)~)' by x (note that ~ is the last element in x). We have also shown that if (QP(s)) has a finite value for some fixed s then there exists a basic feasible solution z of A ( s ) x = b(s). Note that the lower bound given by (QP1) and the upper bound given by (QP2) (which are of the form (Qp(s)) always have a finite value. Thus we will assume that (QP(s)) has a finite value for all relevant s. Also note that f(.) is given by f ( x ) ----C'(S)X. Suppose, for some fixed value so > 0 we have obtained a basic feasible solution of (QP(so)), given by z, o. Note that because of condition (9.11) one can choose the m a t r i x B(so) in R mxm where B(so) is the associated matrix with the basic solution z~o (see Definition 9.3.2) such that if a column corresponding to a dual (primal) variable is included in B(so) then the column associated with the primal (dual) variable is not in B(so). We call the rn independent columns of B(so) as the optimal basis associated with z, o. Our intention is to characterize the set of reals 0 < s such that (QP(s)) has a basic feasible solution which has the same optimal basis as the o p t i m a l basis of Zso. The way we have chosen the optimal basis for so guarantees that the condition (9.11) is satisfied if we generate a basic solution using the same columns for a value of s different from so (because the product vixi = uiyi = 0 will always be true if a solution is generated with the an optimal basis). We introduce some notation now. We assumc that A(s) is a m • n m a t r i x with m _< n.
9.4 Problem Solution
159
Given an indexing set of m positive integers J = { j l , - . . , j m } , the notation B j ( s ) denotes the matrix formed by those columns of A(s) indexed by the elements of ,7. An indexing set is said to be basis-index if the rn x rn m a t r i x Bff(s) is invertible and is an optimal-basis-index if B T(s ) is an o p t i m a l basis for the problem (QP(s)). The vector cn in R l x ' ' consists of entries of c corresponding to the basic variables whereas CD is the 1 • (n - m) vector corresponding to the nonbasic variables. Let fl be defined as /9 := B ~ 1 =
D e f i n i t i o n 9.4.1. Let so > O. Let ,70 be an optimal basis index for the problem (QP(so)). Define X j o ( . ) : R -+ R m, the solution function w.r.t J as follows
~joCs) : :
B-1 joCS)b(8)
1 if B -Jo(S) exists. Otherwise this function is given a value O. We assume throughout this chapter that xl and x2 are basic variables. 9.4.1. Let so > O. Let,70 be an optimal-basis-index with xB as the basic feasible solution for the problem (QP(so)). Suppose ul and u2 are basic variables in the optimal solution. Define B := Bffo(s0) and let fl := B -1 Then B flo (s) is invertible if and only
Theorem
a(s) := det(I4 + S Y B - 1 X ) # 0
(o00 0o)
where
X =
, S=
0
0
sos
0
so-s
0
0
'
Y
=
,o, ,
Ira(s) # 0 then
(xsCs))1 (~8(s))2 Xjo(S) = xB(,) - [~1 ~2 ~3 a4] R(~) (xBCs))3 (xsCs))4 where R(s) := (14 + S Y B - 1 X ) - I S Ira(s) = 0 then Xflo(S) = 0.
and ZB(S) := B - l b ( s ) .
(14
0).
160
9. Robust Performance
Proof. Let B := BJo(SO). As z1,x2,ul and u2 are basic variables in the o p t i m a l we have, 8-
0
0
0
0
so
~o-'
0
0
0
0
so - s
0
0
0
0
$0 3
(/4
0) =: B + X S Y .
$0 $
Therefore, det(Bjo (S)) = d e t [ B ( I + B - 1 X S Y ) ] = d e t ( B ) d e t ( I + B - 1 X S Y )
det(B)det(I4 + SYB-IX). Note that c~(s) = det(14+ SYB-IX) and therefore, it is clear that the inverse of Bjo(S) exists if and only if ~(s) ~ O. Assuming c~(s) ~ 0 we find an expression for Bj0 (s) as follows:
Bjo(S) = B - ' ( I + X S Y B - ~ ) -~ = B-I[I - (I+XSYB-1)-IXSYB-1 ] = B-1 _ B-1X(I4 + S Y B - 1 X ) - I S Y B -1. Now, X j o (s) = B)lo ( s ) b ( s ) _- [B-1 _ B - 1 X (I4 + S Y B - 1 X ) - I S Y B-1]b(s) = B - l b ( s ) - B-1X(I4 + S Y B - I X ) - I S Y B - l b ( s )
[
( xs( sl) ~
S
=B-lb(s)_[fllfl2fl3~4](i4+SrB-1X)-i
(zs(s))2 (xB(s))3 (xB(s))4
(xB(s))~ = x~Cs) - [Z 1 Z 2 Z 3 Z 4 ] RCs)
(x~Cs))2 (x~Cs))~ (xBCs))~
'
where we have defined R(s) := (I4 + S Y B - 1 X ) - I S and x n ( s ) :-- B - l b ( s ) . An expression for the 4 x 4 m a t r i x R(s) can be found easily. Note t h a t if c~(s) = 0 then BJo(S) is not invertible and by definition it follows t h a t
XjoC8) = 0.
[]
D e f i n i t i o n 9.4.2. Given so > O. Let Jo be an optimal-basis-index for the problem (QP(so) ). Define
Reg(Jo) := {s :> 0 : ~(s) r 0, ( X j o ( S ) ) i > 0 for all i : 1 , . . . , m } . Note t h a t X J o ( S ) is a rational function o f s and therefore Reg(flo) is a union of closed intervals except for the roots of a ( s ) = 0. D e t e r m i n i n g Reg(Jo) is therefore an easy task. 9.4.2. Let So > 0 and let Jo be an optimal-basis-index with XB as the basic feasible solution for the problem (QP(so)). Suppose ul and u2
Theorem
=
9.4 Problem Solution
161
are basic variables in the optimal solution. Then B j o ( s ) is an optimal basis for (QP(s)) if and only if s in Reg(Jo). Suppose, s in Reg(/To) then the objective value of (QP(s)) is given by
-(x.(s))l-
720(8) _m. c T ( s ) X B ( 8 )
_ cT(s)
[~1 ~2 f13 ]~4] R(8)
(XB(8))2
(x,(s))3 _(x,(s))4.
Proof. Suppose s in Reg(/To). Then, B/T ~(s) has linearly independent columns (because a(s) # 0). As x/To(s ) := Bjo(S)b(s) we know that X/To(S ) is a basic solution. X/To (s) is a feasible solution because (X/To(S))i >_ 0 for all i = 1 , . . . , m . If s > 0 is such that s ~ Reg(/To) then either feasibility is lost or the columns of B/T ~(s) are not independent. This proves the first part of the theorem. If the solution is optimal for (QP(s)) then the optimal objective value is given by
"~/To(S)= c~ (s)x s (s) - (XB(S))I
= c~(s){xB(s) - [~1 ~2 ~3 ~4] R(s)
(xB(s)),(~(s))2 } (~B(s))~_ (zB(S))ll = e~(s)~(s) - c~(s) [~' ~2 ~ ~ ] R(s) (xB(s))~| (~B(s))~/ 9
(~(s))~J This proves the theoren. [] We now present the following theorem which gives a way to compute 7opt. T h e o r e m 9.4.3. There exists a finite set of basis indices /T0,/T1,...,/Tt
such that R + = U~=lReg(/Tk ). Furthermore if fk :=
min
~eR~g(/Tk)
7/Tk (s)
then 7opt = min fk.
k=0,...,l
Proof. The proof is iterative: Step 1) Let Sl > 0. Find ,71, Reg(/Ti) and fl where /T1 is the optimal basis-index for (QP(sl)). Note that Reg(fll) is a finite union of closed intervals except for a finite number of points which can be determined,
162
9. Robust Performance
Step 2) Suppose we have reached the (k - 1)th step. If t.)k-ll Reg ( J p ) ----R + then stop and the theorem is true with l = k - 1. Otherwise choose any s in R + - U~'-~Reg(ffp) and perform step 1 with sl = s. This procedure has to terminate because for any s in R + there exists a basic feasible solution and there are only finite number of basis-index sets. [] We have assumed t h a t for (QP(so)) the optimal is such t h a t ul, us are there in the basis (we assume that xl and x2 are always in the o p t i m a l basis). This might not be so. In t h a t case the expressions can be easily modified and they will be simpler t h a n the ones derived.
9.5 Summary A problem which incorporates ~/~ nominal performance and gl robust performance was formulated. It was shown t h a t this problem can be solved via quadratic p r o g r a m m i n g using sensitivity techniques.
References
1. W. Rudin. Principles of Mathematical Analysis. McGraw-Hill, Inc., 1976. 2. C. Chen. Linear System Theory And Design. Holt, Rinehart And Winston, Inc., New York, 1984. 3. M. A. Dahleh and I. J. Diaz-Bobillo. Control of Uncertain Systems: A Linear Programming Approach. Prentice Hall, Englewood Cliffs, New Jersey, 1995. 4. J. C. Doyle, K. Glover, P. Khargonekar, and B. A. Francis. State space solutions to standard 7-/2 and 7-/oo control problems. IEEE Trans. Automat. Control, 34, no. 8:pp. 831-847, 1989. 5. M. A. Dahleh and J. B. Pearson. £1 Optimal feedback controllers for MIMO discrete-time systems. IEEE Trans. Automat. Control, 32, no. 4:pp. 314-322, 1987. 6. J. C. Doyle, K. Zhou, and B. Bodenheimer. Optimal control with mixed 7t2 and 7-/oo performance objectives. In Proceedings of the American Control Conference. Vol. 3, pp. 2065-2070, Pittsburg, PA, June 1989. 7. P. P. Khargaonekar and M. A. Rotea. Mixed 7/2/7/¢¢ control; a convex optimization approach. [EEE Trans. Automat. Control, 36, no. 7:pp. 824-837, 1991. 8. N. Eha, M. A. Dahleh, and I. J. Diaz-Bobillo. Controller design via infinite dimensional linear programming. In Proceedings of the American Control Conference. Vol. 3, pp. 2165-2169, San Fransiscoi California, June 1993, 9. M. Sznaier. Mixed e l / 7 / ~ controllers for MIMO discrete time systems. In Proceedings of the IEEE Conference on Decision and Control. pp. 3187-3191, Orlando, Florida, December 1994. 10. H. Rotstein and A. Sideris. 7/oo optimization with time domain constarints. IEEE Trans. Automat. Control, 39:pp. 762-770, 1994. 11. X. Chen and J. Wen. A linear matrix inequality approach to discrete-time e l / 7 / ~ control problems. In Proceedings of the IEEE Conference on Decision and Control. pp: 3670-3675, New Orleans, LA, December 1995. 12. N. Eha and M. A. Dahleh. ea minimization with magnitude constraints in the frequency domain. Journal of Optimization Theory and its Applications, 93:27-52, 1997. 13. N. Elia, P. M. Young, and M. A. Dahleh. Multiobjective control via infinite dimensional lmi optimization. In Proceedings of the Allerton Conference on Communication, Control and Computing. pp: 186-195, Urbana, I1., 1995. 14. P. Voulgaris. Optimal 7/2/el control via duality theory. IEEE Trans. Automat. Control, 4, no. ll:pp. 1881-1888, 1995. 15. M. V. Salapaka, M. Dahleh, and P. Voulgaris. Mixed objective control synthesis: Optimal tl/7/2 control. SIAM Journal on Control and Optimization, V35 N5:1672-1689, 1997.
164
References
16. M. V. Salapaka, P. Voulgaris, and M. Dahleh. SISO controller design to minimize a positive combination of the ~1 and the 7t2 norms. Automatica, 33 no. 3:387-391, 1997. 17. M. V. Salapaka, P. Voulgaris, and M. Dahleh. Controller design to optimize a composite performance measure. Journal of Optimization Theory and its Applications, 91 no. 1:91-113, 1996. 18. M. V. Salapaka, M. Khammash, and M. Dahleh. Solution of mimo 7t2/ Q problem without zero interpolation. In Proceedings of the IEEE Conference on Decision and Control. pp: 1546-1551, San Diego, CA, December 1997. 19. P. M. Young and M. A. Dahleh. Infinite dimensional convex optimization in optimal and robust control. IEEE Trans. Automat. Control, 12, 1997. 20. N. Elia and M. A. Dahleh. Controller design with multiple objectives. IEEE Trans. Automat. Control, 42, no. 5:596-613, 1997. 21. M. V. Salapaka, M. Dahleh, and P. Voulgaris. Mimo optimal control design: the interplay of the 7/2 and the el norms. IEEE Trans. Automat. Control, 43, no. 10:1374-1388, 1998. 22. S. P. Boyd and C. H. Barratt. Linear Controller Design: Limits (9] Performance. Prentice Hall, Englewood Cliffs, New Jersey, 1991. 23. N. O. D. Cuhna and E. Polak. Constrained minimization under vector valued criteria in finite dimensional spaces. J. Math. Annal. and Appl., 19:pp 103-124, 1967. 24. M. Khammash. Solution of the l l mimo control problem without zero interpolation. In Proceedings of the IEEE Conference on Decision and Control. pp: 4040-4045, Kobe, Japan, December 1996. 25. J. C. Doyle. Analysis of feedback systems with structured uncertainty. In IEE Proceedings. Vol. 129-D(6), pp. 242-250, November 1982. 26. M. H. K h a m m a s h and J. B. Pearson. Robust disturbance rejection in g l optimal control systems. Systems and Control letters, 14, no. 2:pp. 93-101, 1990. 27. M. H. K h a m m a s h and J. B. Pearson. Performance robustness of discrete-time systems with structured uncertainty. I E E E Trans. Automat. Control, 36, no. 4:pp. 398-412, 1991. 28. M. H. K h a m m a s h and J. B. Pearson. Analysis and design for robust performance with structured uncertainty. Systems and Control letters, 20, no. 3:pp. 179-187, 1993. 29. M. H. Khammash. Synthesis of globally optimal controllers for robust performance to unstructured uncertainty. IEEE Trans. Automat. Control, 41:189-198, 1996:
Index
(x, I1" II), 17 (x, T), 3 (X,d), 14 < .,. >, 35
A\B, 2 AxB, 1
B(X, Y), 32
C
D
' 73
bi3k~° , 114 co, 38
f-'(B), 6 hK(x*), 57 int(Y), 3 B~p, 148 BZ~LTU, 147 B~NL , 147 ANL, 146
F Ok'% , 113 Gc,,qt, 114 Gf~apt, 114 P $ , 46 Pk, 69 S. 69
W(X, X*),
[AI ]
ALTV, 36
X*, 33 X**, 34 Y-,3 [f, ~], 56 Auv, 112 H3ej , 13
x(X), 2 £r,, 38 g~, 69
146
Af(x), 3 1-block problem, 113 4-block problem, 113 7/00, 83
achievable, 112 adjoint map, 35 affine linear map, 18 approximate problem, 123 axiom of choice, 1, 2 axiom of countability, 5
e~, 69 m×n , 69 t~p
{}, 2 &i(A), 113
~(A), u3 A-transforms, 72 ]im inf r~ 23 ]im sup r~, 24 b d ( K ) , 54 p~, I49 p~(D), 149
Banach space, 18 Banach-Alaoglu, 37, 44. 87. 88, 101, 135. 136, 142 base. 4 basic feasible solution, 156 basic solution, 156 basis, 18 basis-index 159 bilinear form, 35 bounded linear operators, 32 bounded map, 19
II II, 16 -~,2 O, 73 aui (Ao), 112 ~vj (A0), 112
canonical map, 35 Cauchy sequence, 15 causality, 69 closed, 3
166
Index
closed loop map, 76 closure, 3 combination problem, 115 compactness, 11 completeness, 15 composite performance measure, 99 cones, 46 continuity, 6 controllability, 73 convergence, 5 convex combination, 46 convex maps, 47 convex optimization problem, 55 convex sets, 45 convolution maps, 70 coprime, 73 dcf, 74 delay augmentation, 149 denseness, 5 detectability, 73 dimension, 18 directed set, 2 dual, 68 dual spaces, 33 dual variable, 156 Eidelheit separation, 55 Epigraph, 56 eventually, 5 FDLTIC, 72 feasible solution, 156 filter, 9 finite dimensional system, 72 finite dimensional vector space, 18 FIR, 84 frequently, 5 Gateaux derivative, 25 Hahn-Banach. 33 half spaces, 51 Hausdorff topology, 5 Hiene-Borel theorem, 20-22 Holder's inequality, 39 homeomorphism, 9 hyperplanes, 49 inductively ordered, 2 initial topology, 8 interior, 3 isometric maps, 15 isomorphism, 9
Kuhn-Tucker-Lagrange duafity, 67, 88, 90, 94, 102, 116, 117, 130, 131, 154 lcf, 74 linear independence, 18 linear map, 18 linear variety, 49 local extrema, 23 local maxima, 23 local minimum, 23 Luenberger controller, 79 majorant, 2 maximal element, 2 metric, 14 metric topology, 14 MIMO systems, 139 minimal realization, 73 minimum distance from a convex set, 59 Minkowski's function, 52 Minkowski's inequality, 39 minorant, 2 mixed problem, 123 neighbourhood, 3 neighbourhood base, 4 neighbourhood filter, 3 nets, 5 non-basic variable, 156 non-square, 113 norm topology, 17 normal rank, 74 normed vector space, 16 observability, 73 open, 3 open cover, 11 optimal basis, 158 optimal-basis-index, 159 order, 2 pareto optimality, 100 poles, 74 positive cones, 46 positively homogeneous , 27 preorder, 2 primal, 68 primal variable, 156 product normed spaces, 17 product set, 13 product topology, 13 projection, 13
Index
quadratic programming, 154 range, 7 rank interpolation conditions, 113 rcf, 74 real valued, 27 reflexive, 35 relative topology, 3 robust performance, 147 robust stability, 146 second dual space, 34 semicontinuity, 23 sensitivity, 68, 90 separability, 5 separation of a point and a convex set, 54 separation of disjoint convex sets, 55 sequence, 5 shift map, 69 signal-space, 69 SISO fl/7t2 problem, 83 Smith-Mcmillan form, 74 square plant, 113 stability, 70 stability of closed loop maps, 76 stabilizability, 73 stabilizing controller, 76 state space, 73 stronger topology, 4 strongest topology, 4 subadditive function, 27 subbase, 4
sublinear functions, 27 subnets, 10 subspace, 16 support-functional, 57 system, 70 time invariance, 70 topology, 2 totally ordered, 2 transitive, 2 truncation operator, 69 Tychonoff's theorem, 13, 20 ultrafilter, 9 unimodular matrices, 73 unit in t l , 75 universal nets, 10 vector space, 16 weak topology, 36 weak-star topology, 36 weaker topology, 4 weakest topology, 4 well ordered, 2 well-posed, 76 Youla parameter, 81 Youla parametrization, 81 zero interpolation conditions, 113 zeros, 74 Zorn, 2, 10, 29, 30
167
Lecture Notes in Control and Information Sciences Edited by M. Thoma 1993-1999 Published Titles:
Vol. 186: Sreenath, N. Systems Representation of Global Climate Change Models. Foundation for a Systems Science Approach. 288 pp. 1993 [3-540-19824-5] Vol. 187: Morecki, A.; Bianchi, G.;
Jaworeck, K. (Eds) RoManSy 9: Proceedings of the Ninth CISM-IFToMM Symposium on Theory and Practice of Robots and Manipulators. 476 pp. 1993 [3-540-19834-2] Vol. 188: Naidu, D. Subbaram Aeroassisted Orbital Transfer: Guidance and Control Strategies 192 pp. 1993 [3-540-198199] Vol. 189: Ilchmann, A. Non-Identifier-Based High-Gain Adaptive Control 220 pp. 1993 [3-540-198458]
Vol. 194: Cao, Xi-Ren Realization Probabilities: The Dynamics of Queuing Systems 336 pp. 1993 [3-540-19872-5] Vol. 195: Liu, D.; Michel, A.N. Dynamical Systems with Saturation Nonlinearities: Analysis and Design 212 pp. 1994 [3-540-19886-1] Vol. t96: BattilotU, S. Noninteracting Control with Stability for Nonlinear Systems 196 pp. 1994 [3-540-19891-1] Vol. 197: Henry, J.; Yvon, J.P. (Eds) System Modelling and Optimization 975 pp approx. 1994 [3-540-19893-8] Vol. 198: Winter, H.; N0l~er, H.-G. (Eds)
Advanced Technologies for Air Traffic Flow Management 225 pp approx. 1994 [3-540-19895-4]
Vol. 190: Chatila, R.; Hirzinger, G. (Eds)
Experimental Robotics I1: The 2nd International Symposium, Toulouse, France, June 25-27 1991 580 pp. 1993 [3-540-19851-2] Vol. 191: Blondel, V. Simultaneous Stabilization of Linear Systems 212 pp. 1993 [3-540-19862-8] Vol. 192: Smith, R.S.; Dahleh, M. (Eds) The Modeling of Uncertainty in Control Systems 412 pp. 1993 [3-540-19870-9] Vol. 193: Zinober, A.S.I. (Ed.) Variable Structure and Lyapunov Control 428 pp. 1993 [3-540-19869-5]
Vol. 199: Cohen, G.; Quadrat, J.-P. (Eds) 1 lth International Conference on Analysis and Optimization of Systems Discrete Event Systems: Sophia-Antipolis, June 15-16-17, 1994 648 pp. 1994 [3-540-19896-2] Vol. 200: Yoshikawa, T.; Miyazaki, F. (Eds) Experimental Robotics IIh The 3rd Intemational Symposium, Kyoto, Japan, October 28-30, 1993 624 pp. 1994 [3-540-19905-5] Vol. 201: Kogan, J. Robust Stability and Convexity 192 pp. 1994 [3-540-19919-5] Vol. 202: Francis, B.A.; Tannenbaum, A.R. (Eds) Feedback Control, Nonlinear Systems, and Complexity 288 pp. 1995 [3-540-19943-8]
Vol. 203: Popkov, Y.S. Macrosystems Theory and its Applications: Equilibrium Models 344 pp. 1995 [3-540-19955-1]
Vol. 213: Patra, A.; Rao, G.P. General Hybrid Orthogonal Functions and their Applications in Systems and Control 144 pp. 1996 [3-540-76039-3]
Vol. 204: Takahashi, S.; Takahare, Y. Logical Approach to Systems Theory 192 pp. 1995 [3-540-19956-X]
Vol. 214: Yin, G.; Zhang, Q. (Eds) Recent Advances in Control and Optimization of Manufacturing Systems 240 pp. 1996 [3-540-76055-5]
Vol. 205: Kotta, U. Inversion Method in the Discrete-time Nonlinear Control Systems Synthesis Problems 168 pp. 1995 [3-540-19966-7] Vol. 206: Aganovic, Z.; Gajic, Z. Linear Optimal Control of Bilinear Systems with Applications to Singular Perturbations and Weak Coupling 133 pp. 1995 [3-540-19976-4] Vol. 207: Gabasov, R.; Kirillova, F.M.; Prischepova, S.V. Optimal Feedback Control 224 pp. 1995 [3-540-19991-8] Vol. 208: Khalil, H.K.; Chow, J.H.; Ioannou, P.A. (Eds) Proceedings of Workshop on Advances inControl and its Applications 300 pp. 1995 [3-540-19993-4] Vol. 209: Foias, C.; Ozbay, H.; Tannenbaum, A. Robust Control of Infinite Dimensional Systems: Frequency Domain Methods 230 pp. 1995 [3-540-19994-2] VoI. 210: De Wilde, P. Neural Network Models: An Analysis 164 pp. 1996 [3-540-19995-0]
Vol. 211: Gawronski, W. Balanced Control of Flexible Structures 280 pp. 1996 [3-540-76017-2] Vol. 212: Sanchez, A. Formal Specification and Synthesis of Procedural Controllers for Process Systems 248 pp. 1996 [3-540-76021-0]
VoI. 2t5: Bonivento, C.; Marro, G.; Zanasi, R. (Eds) Colloquium on Automatic Control 240 pp. 1996 [3-540-76060-1]
Vol. 216: Kulhavy, R. Recursive Nonlinear Estimation: A Geometric Approach 244 pp. 1996 [3-540-76063-6] Vol. 217: Garofalo, F.; Glielmo, L. (Eds) Robust Control via Variable Structure and LyapunQv Techniques 336 pp. 1996 [3-540-76067-9] Vol. 2t8: van der Schaft, A. L2 Gain and Passivity Techniques in Nonlinear Control 176 pp. 1996 [3-540-76074-1] Vol. 2t9: Berger, M.-O.; Deriche, R.; Herlin, I.; Jaffr~, Jo; Morel, J.-M. (Eds) ICAOS '96: 12th International Conference on Analysis and Optimization of Systems Images, Wavelets and PDEs: Paris, June 26-28 1996 378 pp. 1996 [3-540-76076-8] Vol. 220: Brogliato, B. Nonsmooth Impact Mechanics: Models, Dynamics and Control 420 pp. 1996 [3-540-76079-2]
Vol. 221: Kelkar, A.; Joshi, S. Control of Nonlinear Multibody Flexible Space Structures 160 pp. 1996 [3-540-76093-8] Vol. 222: Morse, A.S. Control Using Logic-Based Switching 288 pp. 1997 [3-540-76097-0]
Vol. 223: Khatib, O.; Salisbury, J.K.
Vol. 233: Chiacchio, P.; Chiaverini, S. (Eds)
Experimental Robotics IV: The 4th International Symposium, Stanford, California, June 30 - July 2, 1995 596 pp. 1997 [3-540-76133-0]
Complex Robotic Systems 189 pp. 1998 [3-540-76265-5] Vol. 234: Arena, P.; Fortuna, L.; Muscato, G.; Xibilia, M.G.
Tenfouw, J. (Eds) Robust Flight Control: A Design Challenge 654 pp. 1997 [3-540-76151-9]
Neural Networks in Multidimensional Domains: Fundamentals and New Trends in Modelling and Control 179 pp. 1998 [1-85233-006-6]
Vol. 225: Poznyak, A.S.; Najim, K.
Vol. 235: Chen, B.M.
Learning Automata and Stochastic Optimization 219 pp. 1997 [3-540-76154-3]
Hoo Control and Its Applications 361 pp. 1998 [1-85233-026-0]
Vol. 224: Magni, J.-F.; Bennani, S.;
Vol. 236: de Almeida, A.T.; Khatib, O. (Eds) Vol. 226: Cooperman, G.; Michler, G.;
Autonomous Robotic Systems 283 pp. 1998 [1-85233-036-8]
Vinck, H. (Eds) Workshop on High Performance Computing and Gigabit Local Area Networks 248 pp. 1997 [3-540-76169-1]
Vol. 237: Kreigman, D.J.; Hagar, G.D.;
Vol. 227: Tarboudech, S.; Garcia, G. (Eds) Control of Uncertain Systems with Bounded Inputs 203 pp. 1997 [3-540-76183-7]
Vol. 238: Elia, N. ; Dahleh, M.A.
Morse, A.S. (Eds) The Confluence of Vision and Control 304 pp. 1998 [1-85233-025-2] Computational Methods for Controller Design 200 pp. 1998 [1-85233-075-9]
Vol. 228: Dugard, L.; Verdest, E.I. (Eds)
Stability and Control of Time-delay Systems 344 pp. 1998 [3-540-76193-4]
Vol. 239: Wang, Q.G.; Lee, T.H.; Tan, K.K.
Vol. 229: Laumond, J.-P. (Ed.)
Finite Spectrum Assignment for Time-Delay Systems 200 pp. 1998 [1-85233-065-1]
Robot Motion Planning and Control 360 pp. 1998 [3-540-76219-1]
Vol. 240: Lin, Z.
Vol. 230: Siciliano, B.; Valavanis, K.P. (Eds)
Low Gain Feedback 376 pp. 1999 [1-85233-081-3]
Control Problems in Robotics and Automation 328 pp. 1998 [3-540-76220-5] Vol. 231: Emeryanov, S.V.; Burovoi, I.A.; Levada, F.Yu. Control of Indefinite Nonlinear Dynamic Systems 196 pp. 1998 [3-540-76245-0]
Vol. 241: Yamamoto, Y.; Hare S. Learning, Control and Hybrid Systems 472 pp. 1999 [1-85233-0767] Vol. 242: Conte, G.; Moog, C.H.; Perdon A.M. Nonlinear Control Systems 192 pp. 1999 [1-85233-151-8]
Vol. 232: Casals, A.; de Almeida, A.T. (Eds)
Experimental Robotics V: The Fifth International Symposium Barcelona, Catalonia, June 15-18, 1997 190 pp. 1998 [3-540-76218-3]
Vol. 243: Tzafestas, S.G.; Schmidt, G. (Eds)
Progress in Systems and Robot Analysis and Control Design 624 pp. 1999 [1-85233-123-2]
Vol. 244: Nijmeijer, H.; Fossen, T.I. (Eds) New Directions in Nonlinear Observer Design 552pp: 1999 [1-85233-134-8] Vol. 246: Garulli, A.; Tesi, A.; Vicino, A. (Eds)
Robustness in Identification and Control 448pp: 1999 [1-85233-179-8] Vol. 246: Aeyels, D.;
Lamnabhi-Laganigue,F.; van der Schaft,A. (Eds) Stability and Stabilization of Nonlinear Systems 408pp: 1999 [1-85233-638-2] Vol. 247: Young, K.D.; OzgQner, 0. (Eds)
Variable Structure Systems, Sliding Mode and Nonlinear Control 400pp: 1999 [1-85233-197-6] Vol. 248: Chen, Y.; Wen C.
Iterative Leaming Control 216pp: 1999 [1-85233-190-9] Vol. 249: Cooperman, G.; Jessen, E.;
Michler, G. (Eds) Workshop on Wide Area Networks and High Performance Computing 352pp: 1999 [1-85233-642-0] Vol. 250: Corke, P. ; Trevelyan, J. (Eds) Experimental Robotics VI 552pp: 2000 [1-85233-210-7] Vol. 251: van der Schaft, A. ; Schumacher, J. An Introduction to Hybrid Dynamical Systems 192pp: 2000 [1-85233-233-6]