Dynamic Noncooperative Game Theory
This is Volume 160 in MATHEMATICS IN SCIENCE AND ENGINEERING A Series of Monographs and Textbooks Edited by RICHARD BELLMAN, University of Southern California The complete listing of books in this series is available from the Publisher upon request.
Dynamic Noncooperative Game Theory

TAMER BAŞAR
Coordinated Science Laboratory and Department of Electrical Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA

GEERT JAN OLSDER
Department of Applied Mathematics, Twente University of Technology, Enschede, The Netherlands
1982
Academic Press A Subsidiary of Harcourt Brace Jovanovich, Publishers
London · New York · Paris · San Diego · San Francisco · São Paulo · Sydney · Tokyo · Toronto

ACADEMIC PRESS INC. (LONDON) LTD.
24-28 Oval Road, London NW1 7DX

United States Edition published by
ACADEMIC PRESS INC.
111 Fifth Avenue, New York, New York 10003
Copyright © 1982 by ACADEMIC PRESS INC. (LONDON) LTD.
All Rights Reserved. No part of this book may be reproduced in any form, by photostat, microfilm, or any other means, without written permission from the publishers.
British Library Cataloguing in Publication Data Basar, T. Dynamic noncooperative game theory. 1. Differential games. I. Olsder, G. J. 519.3 QA272
ISBN 0-12-080220-1
LCCCN 81-67913
Typeset by Macmillan India Ltd., Bangalore. Printed in Great Britain by Page Bros (Norwich) Ltd.
Preface
This book presents an extensive treatment of static and dynamic noncooperative game theory, with emphasis placed on the interplay between dynamic information patterns and the structural properties of several different types of equilibria. We believe that, in this respect, it fills a gap in the game theory literature both as a textbook and as a reference book.

The first part of the book, which comprises Chapters 2 to 4, covers the material that is generally taught in an advanced undergraduate or first-year graduate course on noncooperative game theory. The coverage includes static finite and infinite games of both the zero-sum and nonzero-sum type; in the latter case both Nash and Stackelberg solution concepts are discussed. Furthermore, this part includes an extensive treatment of the class of dynamic games in which the strategy spaces are finite: the so-called multi-act games. Through an extensive tree formulation, the impact of information patterns on the existence, uniqueness and the nature of different types of noncooperative equilibria of multi-act games is thoroughly investigated. Most of the important concepts of static and dynamic game theory are introduced in these three chapters, and they are supplemented by several illustrative examples. Exposition of the material is quite novel, emphasizing concepts and techniques of multi-person decision making rather than mathematical details. However, mathematical proofs of most of the results are also provided, but without hindering the flow of the main ideas.

The second part of the book, which includes Chapters 5 to 8, extends the theory of the first part to infinite dynamic games in both discrete and continuous time (the so-called differential games). Here the emphasis is again on the close interrelation between information patterns and noncooperative equilibria of such multi-person dynamic decision problems. We present a unified treatment of the existing, but scattered, results in the literature on infinite dynamic games, as well as some new results on the topic. The treatment is confined to deterministic games and to stochastic games under perfect state information, mainly because inclusion of a complete investigation of stochastic dynamic games under imperfect state information would require presentation of some new techniques, and thereby a volume much larger than the present one.
Each chapter (with the exception of the first) is supplemented with a problem section which includes problems of two types: (i) standard exercises on the contents of the chapter, which a reader who has carefully followed the text should have no difficulty in solving; (ii) exercises which can be solved by making use of the techniques developed in the text, but which also require some elaborate thinking on the part of the reader, or research problems at the level of a master's or even a Ph.D. thesis. The first part of the book contains exercises mostly of the first category, whereas the second part contains more problems belonging to the second category, these being indicated by a "star" preceding the problem number. Following the problem section in each chapter (except the first) is a notes section, which is devoted to historical remarks and sources for further reading on the topics covered in that particular chapter.

A one-semester course on noncooperative game theory, taught using this book, would involve mainly Part I and also an appropriate blend of some of the topics covered in Part II, this latter choice depending to a great extent on the taste of the instructor and the background of the students. Such a course would be suitable for advanced undergraduate or first-year graduate students in engineering, economics, mathematics, operations research and business administration. In order to follow the main flow of Chapters 2 and 3 the student need not have any real mathematical background, but he should be able to think in abstract terms. However, proofs of some of the theorems in these two chapters, as well as the contents of Chapter 4, require some basic knowledge of real analysis and probability, which is summarized in the three appendices included towards the end of the book. Part II of the book, on the other hand, is intended more for the researcher in the field, to provide him with the state of the art in infinite dynamic game theory. However, selected topics from this part could also be used as part of a graduate course on dynamic optimization, optimal control theory or mathematical economics. We believe that it is in fact the first time these topics have been included in such an extensive form in a textbook.

This project started in October 1978, when the first author was on a year-long sabbatical leave at the Department of Applied Mathematics of Twente University of Technology; it continued in a somewhat decentralized (but cooperative) fashion when (this time) the second author was on a year-long sabbatical leave at the Division of Applied Sciences of Harvard University and the first author had returned to the Department of Applied Mathematics of the Marmara Scientific and Industrial Research Institute; and it was concluded two years after its inception, again at Twente University of Technology, during a short visit of the first author, which was sponsored by the Foundation "Hogeschoolfonds Twente". Hence an interesting feature of the book is that it was written on three different continents: Asia, Europe and America.

The financial support of all these institutions and organizations, as well as the scientific environment, secretarial service and different facilities provided by the Marmara Scientific and Industrial Research Institute and Twente University of Technology, contributed greatly toward the completion of this project, and are therefore gratefully acknowledged. In this connection, we should particularly mention Miss Jacoline Brugge, Mrs Sacide Suer, Mrs Carla Hassing and Miss Miriam Bezemer, who typed skilfully, at different stages, parts of the manuscript, and Mr R. H. Arends, who made all the drawings. We also wish to extend our thanks to Prof. Milan Vlach, of Charles University, Prague, Mr Derya Cansever, of the Marmara Research Institute, and Prof. Rajan Suri, of Harvard University, who read parts of the manuscript and made valuable suggestions which led to improvements in the presentation of the material. The first author would like to take this opportunity to express his indebtedness to two individuals, Prof. Max Mintz, of the University of Pennsylvania, and Prof. Yu-Chi Ho, of Harvard University, for introducing him to different facets of dynamic game theory and for highly stimulating discussions during the early phases of his research career. The second author would like to acknowledge his interaction with Prof. John V. Breakwell, of Stanford University, who aroused his interest in differential games and whose intellectual imagination he greatly admires.

Tamer Başar
Urbana, November 1981
Geert Jan Olsder
Enschede, November 1981
To Tangül and Gözen (T. B.) and Elke, Rena, Theda and Kike (G. J. O.)
Contents

Preface

Chapter 1 Introduction and Motivation
1.1 Preliminary remarks
1.2 Preview on noncooperative games
1.3 Outline of the book
1.4 Conventions, notation and terminology

PART I

Chapter 2 Noncooperative Finite Games: Two-Person Zero-Sum
2.1 Introduction
2.2 Matrix games
2.3 Computation of mixed equilibrium strategies
2.4 Extensive forms: single-act games
2.5 Extensive forms: multi-act games
2.6 Zero-sum games with chance moves
2.7 Problems
2.8 Notes

Chapter 3 Noncooperative Finite Games: N-Person Nonzero-Sum
3.1 Introduction
3.2 Bimatrix games
3.3 N-person games in normal form
3.4 Computation of mixed-strategy Nash equilibria in bimatrix games
3.5 Nash equilibria of N-person games in extensive form
3.5.1 Single-act games: pure-strategy Nash equilibria
3.5.2 Single-act games: Nash equilibria in behavioural and mixed strategies
3.5.3 Multi-act games: pure-strategy Nash equilibria
3.5.4 Multi-act games: behavioural and mixed equilibrium strategies
3.6 The Stackelberg equilibrium solution
3.7 Nonzero-sum games with chance moves
3.8 Problems
3.9 Notes

Chapter 4 Static Noncooperative Infinite Games
4.1 Introduction
4.2 ε-equilibrium solutions
4.3 Continuous-kernel games: results on existence of Nash equilibria
4.4 Some special types of zero-sum games on the square
4.4.1 Games with separable kernels
4.4.2 Games of timing
4.4.3 Bell-shaped games
4.5 Stackelberg solution of continuous-kernel games
4.6 Quadratic games with applications in microeconomics
4.7 Problems
4.8 Notes

PART II

Chapter 5 General Formulation of Infinite Dynamic Games
5.1 Introduction
5.2 Discrete-time infinite dynamic games
5.3 Continuous-time infinite dynamic games
5.4 Mixed and behavioural strategies in infinite dynamic games
5.5 Tools for one-person optimization
5.5.1 Dynamic programming for discrete-time systems
5.5.2 Dynamic programming for continuous-time systems
5.5.3 The minimum principle
5.5.4 Representations of strategies along trajectories
5.6 Problems
5.7 Notes

Chapter 6 Nash Equilibria of Infinite Dynamic Games
6.1 Introduction
6.2 Open-loop and feedback Nash equilibria for dynamic games in discrete time
6.2.1 Open-loop Nash equilibria
6.2.2 Closed-loop no-memory (feedback) Nash equilibria
6.3 Informational properties of Nash equilibria in discrete-time dynamic games
6.3.1 A three-person dynamic game illustrating informational nonuniqueness
6.3.2 General results on informationally nonunique equilibrium solutions
6.4 Stochastic nonzero-sum games with deterministic information patterns
6.5 Appendix A: Open-loop and feedback Nash equilibria of differential games
6.5.1 Open-loop Nash equilibria
6.5.2 Closed-loop no-memory (feedback) Nash equilibria
6.6 Appendix B: Stochastic differential games with deterministic information patterns
6.7 Problems
6.8 Notes

Chapter 7 Hierarchical (Stackelberg) Equilibria of Infinite Dynamic Games
7.1 Introduction
7.2 Open-loop Stackelberg solution of two-person dynamic games in discrete time
7.3 Feedback Stackelberg solution under CLPS information pattern
7.4 (Global) Stackelberg solution under CLPS information pattern
7.4.1 An illustrative example (Example 1)
7.4.2 A second example (Example 2): follower acts twice in the game
7.4.3 Linear Stackelberg solution of linear-quadratic dynamic games
7.5 Stochastic dynamic games with deterministic information patterns
7.5.1 (Global) Stackelberg solution
7.5.2 Feedback Stackelberg solution
7.6 Problems
7.7 Notes

Chapter 8 Pursuit-Evasion Games
8.1 Introduction
8.2 Necessary and sufficient conditions for saddle-point equilibria
8.3 Capturability
8.4 Singular surfaces
8.5 Solutions of two pursuit-evasion games
8.5.1 The lady in the lake
8.5.2 The dolichobrachistochrone
8.6 An application in maritime collision avoidance
8.7 Role determination and an application in aeronautics
8.8 Problems
8.9 Notes

Appendix I Mathematical Review
I.1 Sets
I.2 Normed linear (vector) spaces
I.3 Matrices
I.4 Convex sets and functionals
I.5 Optimization of functionals

Appendix II Some Notions of Probability Theory
II.1 Ingredients of probability theory
II.2 Random vectors
II.3 Integrals and expectation

Appendix III Fixed Point Theorems

References

List of Corollaries, Definitions, Examples, Lemmas, Propositions, Remarks and Theorems

Index
Chapter 1

Introduction and Motivation

1.1 PRELIMINARY REMARKS
This book is concerned with dynamic noncooperative game theory. In a nutshell, game theory involves multi-person decision making; it is dynamic if the order in which the decisions are made is important, and it is noncooperative if each person involved pursues his or her† own interests, which are partly conflicting with others'.

A considerable part of everything that has been written down, whether it is history, literature or a novel, has as its central theme a conflicting situation: a collision of interests. Even though the notion of "conflict" is as old as mankind, the scientific approach started relatively recently, in the years around nineteen hundred and thirty, with, as a result, a still growing stream of scientific publications. We also see that more and more scientific disciplines devote time and attention to the analysis of conflicting situations. These disciplines include (applied) mathematics, economics, aeronautics, sociology and politics.

It is relatively easy to delineate the main ingredients of a conflicting situation: an individual has to make a decision, and each possible decision leads to a different outcome or result, which are valued differently by that individual. This individual may not be the only one who decides about a particular outcome; a series of decisions by several individuals may be necessary. If all these individuals value the possible outcomes differently, the germs of a conflicting situation are there. The individuals involved, also called players or decision makers, or simply persons, do not always have complete control over the outcome. Sometimes there are uncertainties which influence the outcome in an unpredictable way. Under such circumstances, the outcome is (partly) based on data not yet known and not determined by the other players' decisions. Sometimes it is said that such data is under the control of "nature", or "God", and that every outcome is caused by the joint or individual actions of human beings and nature.

The established names of "game theory" (development from approximately 1930) and "theory of differential games" (development from approximately 1950, parallel to that of optimal control theory) are somewhat unfortunate. "Game theory", especially, appears to be directly related to parlour games; of course it is, but the notion that it is only related to such games is far too restrictive. The term "differential game" became a generally accepted name for games where differential equations play an important role. Nowadays the term "differential game" is also being used for other classes of games for which the more general term "dynamic game" would be more appropriate.

The applications of "game theory" and the "theory of differential games" mainly deal with economic and political conflicting situations, worst-case designs, and also modelling of war games. However, it is not only the applications in these fields that are important; equally important is the development of suitable concepts to describe and understand conflicting situations. It turns out, for instance, that the role of information (what one player knows relative to the others) is very crucial in such problems.

Scientifically, dynamic game theory can be viewed as a child of the parents game theory and optimal control theory.‡ Its character, however, is much more versatile than that of either of its parents, since it involves a dynamic decision process evolving in (discrete or continuous) time, with more than one decision maker, each with his own cost function and possibly having access to different information. This view is summarized in Table I.

Table I. The place of dynamic game theory.

              One player                  Many players
  Static      Mathematical programming    (Static) game theory
  Dynamic     Optimal control theory      Dynamic (and/or differential) game theory

† Without any preference between the sexes, a decision maker in this book is most times referred to as a "he". It could equally well be a "she".
‡ In almost all analogies there is a deficiency; a deficiency in the present analogy is that the child is as old as one of its parents, optimal control theory. For the relationship between these theories and the theory of mathematical programming, see Table I.
The formulation of "games in extensive form" started in the nineteen-thirties through the pioneering work of Von Neumann, culminated in his book with Morgenstern (Von Neumann and Morgenstern, 1947), and was then made mathematically precise by Kuhn (1953), all within the framework of "finite" games. The general idea in this formulation is that a game evolves according to a road or tree structure; at every crossing or branching a decision has to be made as to how to proceed.

In spite of this original set-up, the evolution of game theory has followed a rather different path. Most research in this field has been, and is being, concentrated on the normal or strategic form of a game. In this form all possible sequences of decisions of every player are set out against each other. For a two-player game this results in a matrix structure. In such a formulation dynamic aspects of a game are completely suppressed, and this is the reason why game theory is classified as basically "static" in Table I. In this framework, emphasis has been more on (mathematical) existence questions than on the development of algorithms to obtain solutions.

Independently, control theory gradually evolved from Second World War servomechanisms, where questions of solution techniques and stability were studied. Then followed Bellman's "dynamic programming" (Bellman, 1957) and Pontryagin's "maximum principle" (Pontryagin et al., 1962), which spurred the interest in a new field called optimal control theory. Here the concern has been with obtaining optimal (i.e. minimizing or maximizing) solutions and developing numerical algorithms for one-person single-objective dynamic decision problems.

The merging of the two fields, game theory and optimal control theory, which leads to even more concepts and to actual computation schemes, has achieved a level of maturity, with which the reader will hopefully agree after he/she goes through this book.

At this point, at the very beginning of the book, where many concepts still have to be introduced, it is rather difficult to describe how dynamic game theory evolved in time and what the contributions of the relevant references are. We therefore defer such a description until later, to the "notes" section of each chapter (except the present one), where relevant historical remarks are included.
1.2 PREVIEW ON NONCOOPERATIVE GAMES

A clear distinction exists between two-player (or, equivalently, two-person) zero-sum games and the rest. In a zero-sum game, as the name implies, the sum of the cost functions of the players is identically zero. Mathematically speaking, if uⁱ and Lⁱ denote, respectively, the decision variable and the cost function of the ith player (to be written Pi), then L¹(u¹, u²) + L²(u¹, u²) ≡ 0 in a zero-sum game. If this sum is, instead, equal to a nonzero constant (independent of
the decision variables), then we talk about a "constant-sum" game, which can, however, easily be transformed into a zero-sum game through a simple translation without altering the essential features of the game. Therefore, constant-sum games can be treated within the class of zero-sum games without any loss of generality, which we shall choose to do in this book. A salient feature of two-person zero-sum games that distinguishes them from other types of games is that they do not allow for any cooperation between the players, since, in a two-person zero-sum game, what one player gains incurs a loss to the other player. However, in other games, such as two-player nonzero-sum games (wherein the quantity L¹(u¹, u²) + L²(u¹, u²) is not a constant) or three-or-more-player games, cooperation between two or more players may lead to their mutual advantage.

Example 1. A point object (with mass one) can move in a plane which is endowed with the standard (x₁, x₂)-coordinate system. Initially, at t = 0, the point mass is at rest at the origin. Two unit forces act on the point mass; one is chosen by P1, the other by P2. The directions of these forces, measured counter-clockwise with respect to the positive x₁-axis, are determined by the players and are denoted by u¹ and u², respectively; they may in general be time-variant. At time t = 1, P1 wants to have the point mass as far in the negative x₁-direction as possible, i.e. he wants to minimize x₁(1), whereas P2 wants it as far in the positive x₁-direction as possible, i.e. he wants to maximize x₁(1), or equivalently, to minimize −x₁(1). (See Fig. 1.) The "solution" to this zero-sum game follows immediately; each player pulls in his own favourite direction, and the point mass remains at the origin. Such a solution is known as the saddle-point solution.
[Figure: the point mass at the origin of the (x₁, x₂)-plane, with the directions u¹ and u² of the two unit forces.]
Fig. 1. The rope-pulling game.
We now alter the formulation of the game slightly, so that, in the present set-up, P2 wishes to move the point mass as far in the negative x₂-direction as possible, i.e. he wants to minimize x₂(1). P1's goal is still to minimize x₁(1). This new game is clearly nonzero-sum. The equations of motion for the point
mass are

    ẍ₁ = cos(u¹) + cos(u²),    x₁(0) = ẋ₁(0) = 0;
    ẍ₂ = sin(u¹) + sin(u²),    x₂(0) = ẋ₂(0) = 0.
Let us now consider the pair of decisions {u¹ ≡ π, u² ≡ −π/2}, with the corresponding values of the cost functions being L¹ = x₁(1) = −½ and L² = x₂(1) = −½. If P2 sticks to u² ≡ −π/2, the best thing for P1 to do is to choose u¹ ≡ π; any other choice of u¹(t) will yield an outcome which is greater than −½. Analogously, if P1 sticks to u¹ ≡ π, P2 does not have a better choice than u² ≡ −π/2. Hence, the pair {u¹ ≡ π, u² ≡ −π/2} exhibits an equilibrium behaviour, and this kind of solution, where no player can improve his outcome by altering his decision unilaterally, is called a Nash equilibrium solution, or shortly, a Nash solution.

If both players choose u¹ ≡ u² ≡ 5π/4, however, then the cost values become L¹ = L² = −½√2, which are obviously better, for both players, than the costs incurred under the Nash solution. But, in this case, the players have to cooperate. If, for instance, P2 would stick to his choice u² ≡ 5π/4, then P1 can improve upon his outcome by playing u¹ ≡ c, where c is a constant with π ≤ c < 5π/4. P1 is now better off, but P2 is worse off! Therefore, the pair of strategies {u¹ ≡ 5π/4, u² ≡ 5π/4} cannot be in equilibrium in a noncooperative mode of decision making, since it requires some kind of faith (or even negotiation), and thereby cooperation, on the part of the players. If this is allowable, then the said pair of strategies, known as a Pareto-optimal solution, stands out as a reasonable equilibrium solution for the game problem (which is now called a cooperative game), since it features the property that no other joint decision of the players can improve the performance of at least one of them without degrading the performance of the other. □
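The cost values above are easy to check numerically. The following Python sketch (an illustration added here, not part of the original text) evaluates both players' costs under constant controls, for which integrating the dynamics twice from rest gives x₁(1) = (cos u¹ + cos u²)/2 and x₂(1) = (sin u¹ + sin u²)/2:

```python
import math

def costs(u1, u2):
    # For constant controls, each acceleration component is constant,
    # so starting from rest at the origin, x(1) = acceleration / 2.
    L1 = 0.5 * (math.cos(u1) + math.cos(u2))  # P1 minimizes L1 = x1(1)
    L2 = 0.5 * (math.sin(u1) + math.sin(u2))  # P2 minimizes L2 = x2(1)
    return L1, L2

# Nash pair {u1 = pi, u2 = -pi/2}: both costs equal -1/2.
nash = costs(math.pi, -math.pi / 2)

# Cooperative (Pareto) pair {u1 = u2 = 5*pi/4}: both costs equal -sqrt(2)/2.
pareto = costs(5 * math.pi / 4, 5 * math.pi / 4)

# No unilateral deviation (over a 1-degree grid) improves P1's cost
# against P2's Nash choice u2 = -pi/2.
best_deviation = min(
    costs(math.radians(d), -math.pi / 2)[0] for d in range(360)
)
```

The grid scan confirms that P1 cannot do better than −½ against u² ≡ −π/2, whereas the cooperative pair achieves −½√2 ≈ −0.707 for both players.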
In this book we shall deal only with noncooperative games. The reasons for such a seeming limitation are twofold. Firstly, cooperative games† can, in general, be reduced to optimal control problems by determining a single cost function to be optimized by all players, which suppresses the "game" aspects of the problem. Secondly, the size of this book would have increased considerably by the inclusion of a complete discussion of cooperative games.

Actions and strategies

Heretofore, we have safely talked about "decisions" made by the players, without being very explicit about what a decision really is. This will be made

† What we have in mind here is the class of cooperative games without side payments. If side payments are allowed, we enter the territory of (cooperative) games in characteristic function form, which is an altogether different topic. (See Owen (1968) and Vorob'ev (1977).)
more precise now in terms of the information available to each player. In particular, we shall distinguish between actions (also called controls) on the one hand and strategies (or, equivalently, decision rules) on the other. If an individual has to decide about what to do the next day, and the options are fishing and going to work, then a strategy is: "if the weather report early tomorrow morning predicts dry weather, then I will go fishing, otherwise I will go to my office". This is a strategy or decision rule: what actually will be done depends on quantities not yet known and not controlled by the decision maker; the decision maker cannot influence the course of the events further, once he has fixed his strategy. Any consequence of such a strategy, after the unknown quantities are realized, is called an action. In a sense, a constant strategy (such as an irrevocable decision to go fishing without any restrictions or reservations) coincides with the notion of action.

In the example above, the alternative actions are to go fishing and to go to work, and the action to be implemented depends on information (the weather report) which has to be known at the time it is carried out. In general, such information can be of different types. It can, for instance, comprise the previous actions of all the other players. As an example, consider the following sequence of actions: if he is nice to me, I will be kind to him; if he is cool, I will be cool, etc. The information can also be of a stochastic nature, such as in the fishing example. Then, the actual decision (action) is based on data not yet known and not controlled by the other players, but instead determined by "nature". The next example will now elucidate the role of information in a game situation and show how it affects the solution. The information here is deterministic; "nature" does not play a role.
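In programming terms, a strategy is a function from the available information to an action, while an action is the value that function returns once the information is realized. A brief Python sketch of the fishing example (with illustrative names, not part of the original text):

```python
def fishing_strategy(weather_report):
    # A decision rule fixed in advance: the action actually taken
    # depends on information not yet known when the rule is chosen.
    return "go fishing" if weather_report == "dry" else "go to office"

def constant_strategy(weather_report):
    # A constant strategy ignores the information entirely; it
    # coincides with the notion of an action.
    return "go fishing"

# Once the unknown quantity (the weather report) is realized,
# the strategy yields the action to be implemented.
action = fishing_strategy("rain")  # -> "go to office"
```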
Example 2. This example aims at a game-theoretic analysis of the advertising campaigns of two competing firms during two periods of time. Initially, each firm is allotted a certain number of customers; P1 has 100 and P2 has 150 customers. Per customer, each firm makes a fixed profit per period, say one dollar. Through advertisement, a firm can increase its number of customers (some are stolen away from the competitor and others come from outside) and thereby its profit. However, advertising costs money; an advertising campaign for one period costs thirty dollars (for each firm). The figures in this example may not be very realistic; it is, however, only the ratio of the data that matters, not the scale with respect to which it is expressed.

First, we consider the one-period version of this game and assume that the game ends after the first period is over. Suppose that P1 has to decide first whether he should advertise (Yes) or not (No), and subsequently P2 makes his choice. The four possible outcomes and the paths that lead to those outcomes are depicted in Fig. 2, in the form of a tree diagram. At every branching, a
[Figure: tree diagram of the one-period game. Initially P1 has 100 customers and P2 has 150. P1 decides first (Y = advertise, N = do not advertise), then P2 decides; each of the four leaves lists the resulting numbers of customers and the profits of P1 and P2.]
Fig. 2. One-period advertising game.
decision has to be made as to how to proceed. The objective of each firm is to maximize its profit (for this one-period game). The "best" decisions, in this case, can be found almost by inspection: whatever P1 does (Y or N), P2 will always advertise (Y), since that is more profitable to him. P1, realizing this, has to choose between Y (with profit 95) and N (with profit 75), and will therefore choose Y. Note that, at the point when P2 makes his decision, he knows P1's choice (action), and therefore P2's choice depends, in principle, on what P1 has done. P2's best strategy can be described as: "if P1 chooses N, then I choose Y; if P1 chooses Y, then I choose Y", which, in this case, is a constant strategy.

This one-period game will now be changed slightly, so that P1 and P2 make their decisions simultaneously, or, equivalently, independently of each other. The information structure associated with this new game is depicted in Fig. 3; a dashed curve encircling the points D₁ and D₂ has been drawn, which indicates that P2 cannot distinguish between these two points. In other words, P2 has to
so
75
125
160
115 120
95 145
) profits
Fig. 3. P2 chooses independently of PI's action.
arrive at a decision without knowing what P1 has actually done. Hence, in this game, the strategy of P2 has a different domain of definition, and it can easily be verified that the pair {Y, Y} provides a Nash solution, in the sense described before in Example 1, yielding the same profits as in the previous game.

We now extend the game to two periods, with the objective of the firms being maximization of their respective cumulative profits (over both periods). The complete game is depicted in Fig. 4 without the information sets; we shall, in fact, consider three different information structures in the sequel. First, the order of the actions will be taken as P1-P2-P1-P2, which means that at the time of his decision each player knows the actions previously taken. Under the second information structure to be considered, the order of the actions is (P1, P2)-(P1, P2), which means that during each period the decisions are made independently of each other, but that for the decisions of the second period the (joint) actions of the first period are known. Finally, as a third case, it will be assumed that there is no order in the actions at all; P1 has to decide on what to do during both periods without any prior knowledge of what P2 will be doing, and vice versa. We shall be looking for Nash solutions in these three games.

The P1-P2-P1-P2 information structure

The solution is obtained by working backward in time (à la "dynamic programming"). For the decision during the second period, P2 knows to which point (D₇-D₁₄) the game has proceeded. From each of these points he chooses Y or N, depending on which decision leads to the higher profit. These decisions are denoted by an asterisk in Fig. 4. At the time of his decision during the
Fig. 4. The two-period advertising game, without the information sets.
second period, PI knows that the game has proceeded to one of the points D3-D6. He can also guess precisely what P2 will do after he himself has made his decision (the underlying assumption is that P2 will behave rationally). Hence, PI can determine what will be the best for him. At point D3, for instance, decision N leads to a profit of 144 (for PI) and Y leads to 142; therefore, PI will choose N if the game were at D3. The optimal decisions for PI at the second period are also indicated by an asterisk. In this way we continue (in retrograde time) until the vertex of the tree is reached; all best actions are designated by an asterisk in Fig. 4. As a result, the actual game will evolve along a path passing through the points D2, D6 and D14, and the cumulative profits for PI and P2 will be 170 and 261, respectively.
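The backward recursion just described can be sketched in a few lines. The following is only an illustration, not the book's algorithm verbatim; the tiny two-stage tree and its profit pairs are hypothetical stand-ins, not the data of Fig. 4:

```python
# Backward induction on a game tree: at a leaf the (PI, P2) profits are
# known; at an interior node the player who moves picks the child whose
# solved outcome maximizes his own profit. Tree and payoffs are made up.
def solve(node):
    if "leaf" in node:
        return node["leaf"]                     # (PI profit, P2 profit)
    outcomes = [solve(child) for child in node["children"]]
    who = 0 if node["player"] == "PI" else 1
    return max(outcomes, key=lambda o: o[who])  # mover maximizes own profit

tree = {"player": "PI", "children": [
    {"player": "P2", "children": [{"leaf": (3, 1)}, {"leaf": (1, 4)}]},
    {"player": "P2", "children": [{"leaf": (2, 2)}, {"leaf": (0, 3)}]},
]}
print(solve(tree))   # (1, 4)
```

Applied to the full tree of Fig. 4 with the PIP2PIP2 ordering, the same recursion would reproduce the asterisked decisions and the realized profit pair (170, 261).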
The (PI, P2HPI, P2) information structure During the last period, both players know that the game has evolved to one of the points D 3D6 , upon which information the players will have to base their actions. This way, we have to solve four "oneperiod games", one for each of the points D 3D6 . The reader can easily verify that the optimal (Nash) solutions associated with these games are: information (starting point)
PI
D3 D4
action
N N Y N
o,
D6
profit
P2
PI
N N N Y
144 135
212 166
P2 229 303
216 270
The profit pairs corresponding to these solutions can be attached to the respective points D3-D6, after which a one-period game is left to be solved. The game tree is:
                  P2 plays N     P2 plays Y
    PI plays N    (144, 229)     (135, 303)
    PI plays Y    (212, 216)     (166, 270)

(profit pairs (PI, P2), taken from the table above)
It can be verified that both players should play Y during the first period, since a unilateral deviation from this decision leads to a smaller profit for the deviating player. The realized cumulative profits of PI and P2 under this information scheme are therefore 166 and 270, respectively.
The no-information case
Both players have four possible choices: NN, NY, YN and YY, where the first (respectively, second) symbol refers to the first (respectively, second) period. Altogether there are 4 × 4 = 16 possible pairs of realizable profits, which can be written in matrix form:
                                     P2 chooses
                     NN           NY           YN           YY
             NN   (144, 229)   (140, 228)   (135, 303)   (132, 301)
   PI        NY   (142, 225)   (126, 216)   (131, 258)   (117, 285)
   chooses   YN   (207, 220)   (202, 218)   (171, 266)   (166, 270)
             YY   (212, 216)   (194, 206)   (170, 261)   (155, 255)

(each cell lists the profit pair (PI, P2))
If PI chooses YN and P2 chooses YY, then the profits are 166 and 270, respectively. If any one of the players deviates from this decision point, he will be worse off. (If, for example, PI deviates, his other profit options are 132, 117 and 155, each one being worse than 166.) Therefore {(YN), (YY)} is the Nash solution under the "no-information" scheme. □

Game problems of this kind will be studied extensively in this book, specifically in Chapter 3 which is devoted to finite nonzero-sum games, i.e. games in which each player has only a finite number of possible actions available to him. Otherwise, a game is said to be infinite, such as the one treated in Example 1. In Chapter 3, we shall also elucidate the reasons why the realized profits are, in general, dependent on the information structure. The information structure (PI, P2)-(PI, P2) in Example 2 is called a feedback information structure; during each period the players know exactly to which point ("state") the game has evolved and that information is fed back into their strategies, which then leads to certain actions. This structure is sometimes also referred to as closed-loop, though later on in this book a distinction between feedback and closed-loop will be made. The no-information case is referred to as an open-loop information structure. Not every game is well-defined with respect to every possible information structure; even though Example 2 was well-defined under both a feedback and
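The exhaustive check just described is easy to mechanize. Below is a minimal sketch (not from the book) that scans all 16 cells of the bimatrix above for pure-strategy Nash equilibria, using the profit entries of the no-information game:

```python
# Pure-strategy Nash search for the 4 x 4 advertising bimatrix.
# p1[i][j] and p2[i][j] are the profits of PI and P2; both maximize.
strategies = ["NN", "NY", "YN", "YY"]
p1 = [[144, 140, 135, 132],
      [142, 126, 131, 117],
      [207, 202, 171, 166],
      [212, 194, 170, 155]]
p2 = [[229, 228, 303, 301],
      [225, 216, 258, 285],
      [220, 218, 266, 270],
      [216, 206, 261, 255]]

def pure_nash(p1, p2):
    eqs = []
    for i in range(4):
        for j in range(4):
            # a Nash cell admits no profitable unilateral deviation
            if (p1[i][j] >= max(p1[k][j] for k in range(4)) and
                p2[i][j] >= max(p2[i][l] for l in range(4))):
                eqs.append((strategies[i], strategies[j]))
    return eqs

print(pure_nash(p1, p2))   # [('YN', 'YY')]
```

The search confirms that {(YN), (YY)} is the unique pure-strategy Nash equilibrium of this game.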
an open-loop information structure, this is not always true. To exemplify this situation, we provide, in the sequel, two specific games. The game to be formulated in Example 3 does not make much sense under an open-loop information structure, and the one of Example 4 can only be played under the open-loop structure.

Example 3
"The lady in the lake" A lady is swimming in a circular lake with a maximum speed VI' A man who wishes to intercept the lady, and who has not mastered swimming, is on the side of the pond and can run along the shore with a maximum speed Vm • The lady does not want to stay in the lake forever and wants eventually to escape. If, at the moment she reaches the shore, the man is not there, she "wins" the game since on land she can run faster than the man. Note that openloop solutions (i.e. solutions corresponding to the openloop information structure) do not make much sense here (at least not for the man). It is reasonable to assume that each player will react immediately to what his/her opponent has done and hence the feedback information structure, whereby the current position of man and lady are fed back, is more appropriate. 0 Example 4 "The princess and the monster" This game is played in complete darkness. Like the previous example, the current one is also of the pursuitevasion type. Here, however, the players do not know where their opponent is, unless they bump into each other (or come closer to each other than a given distance 1»; that moment in fact defines the termination of the game. The velocities of both players are bounded and their positions are confined to be in a bounded area. The monster wants to bump into the princess as soon as possible, whereas the princess wants to postpone this moment for as long as possible. Since the players cannot react to each other's actions or positions, the information structure is openloop. 0
The solution to Example 3 will be given in Chapter 8; for more information on Example 4, see Foreman (1977). This latter example also leads, rather naturally, to the concept of "mixed strategies"; the optimal strategies in the "princess and monster" game cannot be deterministic, as were the strategies in Examples 1 and 2, for instance. For, if the monster had a deterministic optimal strategy, then the princess would be able to calculate this strategy; if, in addition, she knew the monster's initial position, this would enable her to determine the monster's path and thereby to choose for herself an appropriate strategy so that she can avoid him forever. Therefore, an optimal strategy for the monster (if it exists) should dictate random actions, so that his trajectory cannot be predicted by the princess. Such a strategy is called mixed. Equilibria in mixed strategies will be discussed throughout this book.
What is optimal?
In contrast to optimal control problems (one-player games) where optimality has an unambiguous meaning, in multi-person decision making, optimality, in itself, is not a well-defined concept. Heretofore we have considered the Nash equilibrium solution, which is a specific form of "optimality". We also briefly discussed Pareto-optimality. There exist yet other kinds of optimality in nonzero-sum games. Two of them will be introduced now by means of the matrix game encountered in the "no-information" case of Example 2. The equilibrium strategies in the "Nash" sense were YN for PI and YY for P2. Suppose now that P2 is a careful and defensive player and wants to protect himself against any irrational behaviour on the part of the other player. If P2 sticks to YY, the worst that can happen to him is that PI chooses YY, in which case P2's profit becomes 255 instead of 270. Under such a defensive attitude P2 might play YN, since then his profit is at least 258. This strategy (or, equivalently, action in this case) provides P2 with a lower bound for his earnings, and the corresponding solution concept is called minimax. The player who adopts this solution concept basically solves a zero-sum game, even though the original game might be nonzero-sum. Yet another solution concept is the one that involves a hierarchy in decision making: one of the players, say PI, declares and announces his strategy before the other player chooses his strategy, and he is in a position to enforce this strategy. In the matrix game of the previous paragraph, if PI says "I will play YY" and irrevocably sticks to it (by the rules of the game we assume that cheating is not possible), then the best P2 can do is to choose YN, in which case PI's profit becomes 170 instead of the 166 which was obtainable under the Nash solution concept.
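Both concepts introduced above, P2's minimax (security) strategy and PI's announced strategy, can be computed mechanically on the same game; the sketch below (mine, not the book's) uses the profit entries of the 4 × 4 no-information matrix:

```python
# Two alternative solution concepts on the 4 x 4 advertising game.
strategies = ["NN", "NY", "YN", "YY"]
p1 = [[144, 140, 135, 132],
      [142, 126, 131, 117],
      [207, 202, 171, 166],
      [212, 194, 170, 155]]
p2 = [[229, 228, 303, 301],
      [225, 216, 258, 285],
      [220, 218, 266, 270],
      [216, 206, 261, 255]]
n = 4

# P2's minimax (security) strategy: for each of his columns, take the
# worst case over PI's rows, then pick the best guaranteed profit.
floors = [min(p2[i][j] for i in range(n)) for j in range(n)]
j_sec = max(range(n), key=lambda j: floors[j])
print("P2 security:", strategies[j_sec], floors[j_sec])     # YN 258

# Hierarchical play with PI announcing first: PI declares a row, P2
# responds with his best column, and PI announces what serves him best.
def follower_best(i):
    return max(range(n), key=lambda j: p2[i][j])

i_lead = max(range(n), key=lambda i: p1[i][follower_best(i)])
print("PI announces:", strategies[i_lead],
      "P2 responds:", strategies[follower_best(i_lead)],
      "PI profit:", p1[i_lead][follower_best(i_lead)])      # YY YN 170
```

The computation recovers both results quoted in the text: a guaranteed 258 for a defensive P2, and a profit of 170 for PI when he can announce and enforce YY.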
Such games in which one player (called the leader) declares his strategy first and enforces it on the other player (called the follower) are referred to as Stackelberg games. They will be discussed in Chapter 3 for finite action sets, and in Chapters 4 and 7 for infinite action sets.

Static versus dynamic
As the last topic of this section, we shall attempt to answer the question: "When is a game called dynamic and when is it static?" So far, we have talked rather loosely about these terms; there is, in fact, no uniformly accepted separating line between static games on the one hand and dynamic games on the other. We shall choose to call a game dynamic if at least one player is allowed to use a strategy that depends on previous actions. Thus, the games treated in Example 2 with the information schemes PIP2PIP2 and (PI, P2)-(PI, P2) are dynamic. The third game of Example 2, with the "no-information" scheme, should be called static, but, by an abuse of language, such a game is often also called dynamic. The reason is that the players act more than once and thus time plays a role. A game in which the players act only once and independently of
each other is definitely called a static game. The game displayed in Fig. 3 is clearly static.
1.3 OUTLINE OF THE BOOK

The book comprises eight chapters and three appendices. The present (first) chapter serves the purpose of introducing the reader to the contents of the book, and to the conventions and terminology adopted. The next three chapters constitute Part I of the text, and deal with finite static and dynamic games and infinite static games. Chapter 2 discusses the existence, derivation and properties of saddle-point equilibria in pure, mixed and behavioural strategies for two-person zero-sum finite games. It is in this chapter that the notions of normal and extensive forms of a dynamic game are introduced, and the differences between actions and strategies are delineated. Also treated is the class of two-person zero-sum finite games in which a third (chance) player with a fixed mixed strategy affects the outcome. Chapter 3 extends the results of Chapter 2 to nonzero-sum finite games under basically two different types of equilibrium solution concepts, viz. the Nash solution and the Stackelberg (hierarchical) solution. The impact of dynamic information on the structure of these equilibria is thoroughly investigated, and in this context the notions of prior and delay commitment modes of play are elucidated. Chapter 4 deals with static infinite games of both the zero-sum and nonzero-sum type. In this context it discusses the existence, uniqueness and derivation of (pure and mixed) saddle-point, Nash and Stackelberg equilibria. It provides explicit solutions for some types of games, with applications in microeconomics.

The remaining four chapters constitute Part II of the book, for which Chapter 5 provides a general introduction. It introduces the class of infinite dynamic games to be studied in the remaining chapters, and also gives some background material on optimal control theory.
The major portion of Chapter 6 deals with the derivation and properties of Nash equilibria in infinite discrete-time deterministic and stochastic dynamic games with prescribed fixed duration under different types of deterministic information patterns. Two appendices to the chapter present counterparts of all these results in continuous time. Chapter 7 discusses the derivation of global and feedback Stackelberg equilibria for the class of dynamic games treated in Chapter 6, but only in discrete time. Finally, Chapter 8 deals with the class of zero-sum differential games for which the duration is not fixed a priori, the so-called pursuit-evasion
games, and under the feedback information pattern. It first presents some sets of necessary and sufficient conditions for the saddle-point solution of such differential games, and then applies these to pursuit-evasion games with specific structures so as to obtain some explicit results. Each chapter (with the exception of Chapter 1) starts with an introduction section which summarizes its contents, and therefore we have kept the descriptions above rather brief. Following Chapter 8 are the three appendices, two of which (Appendices I and II) provide some background material on sets, vector spaces, matrices, optimization theory and probability theory to the extent these notions are utilized in the book. The third appendix, on the other hand, presents some theorems which are used in Chapters 3 and 4. The book ends with a list of references, a table that indicates the page numbers of the Propositions, Theorems, Definitions, etc. appearing in the text, and an index.
1.4 CONVENTIONS, NOTATION AND TERMINOLOGY
Each chapter of the book is divided into sections, and occasionally sections are divided into subsections. Section 2.1 refers to section 1 of Chapter 2, and section 5 referred to in a particular chapter stands for the fifth section of that chapter. Section 8.5.2 refers to subsection 2 of section 8.5. The following items appear in this book and they are numbered per chapter:

    Theorem (abbreviated to Thm)          Problem
    Definition (abbreviated to Def.)      Remark
    Figure (abbreviated to Fig.)          Lemma
    Proposition (abbreviated to Prop.)    Example
    Corollary                             Equation
The "telephone system" is used to refer to these items. Thus Thm 7 refers to the seventh theorem of the chapter in which this reference is made. When crossreferencing, the chapter number is also included in the notation; for instance, Thm 2.7 refers to Theorem 7 of Chapter 2. Unlike the numbering of other items, equation numbers appear within parentheses, such as equation (2.5) which refers to the fifth numbered equation in Chapter 2. An exception to this convention is the numbering of items referred to in sections 6.6 and 6.7, which are also considered as appendices to Chapter 6. Theorem 2 of section 6.6 (respectively, section 6.7)is referred to as Thm 6.A2 (respectively, Thm 6.B2), and the same convention applies to all the other items in these two sections. References to bibliographical sources (listed alphabetically at the end of the book) are made according to the Harvard system (i.e. by name(s) and date). The following abbreviations and symbols are adopted in the book, unless stated otherwise in a specific context:
    RHS        right-hand side
    LHS        left-hand side
    w.p. α     with probability α
    LP         linear programming
    ≜          equality sign; "defined by"
    □          end of proof, remark, example, etc.
    ∥          parallel to
    sgn(x)     = +1 if x > 0; undefined if x = 0; -1 if x < 0
    ∂A         boundary of the set A
    Pi         player i
    P (E)      pursuer (evader)
    L^i        cost function(al) of Pi for a game in extensive form
    J^i        cost function(al) of Pi for a game in normal form
    N          number of players
    N          ≜ {1, ..., N}, players' set; i ∈ N
    u^i        action (decision, control) variable of Pi; u^i ∈ U^i
    γ^i        strategy (decision law) of Pi; γ^i ∈ Γ^i
    η^i        information available to Pi
    K          number of stages (levels) in a discrete-time game
    K          ≜ {1, ..., K}; k ∈ K
    [0, T]     time interval on which a differential game is defined; t ∈ [0, T]
    x          state variable; x(t) in continuous time and x_k in discrete time
    ẋ(t)       time derivative of x(t)
    V          value of a zero-sum game (in pure strategies)
    V̄          upper value
    V_         lower value
    V_m        value of a zero-sum game in mixed strategies
Other symbols or abbreviations used with regard to sets, unions, summation, matrices, optimization, random variables, etc. are introduced in Appendices I and II. A convention that we have adopted throughout the text (unless stated otherwise) is that in nonzero-sum games all players are cost-minimizers, and in two-person zero-sum games PI is the minimizer and P2 is the maximizer. In two-person matrix games PI is the row selector and P2 is the column selector. The word "optimality" is used in the text rather freely for different equilibrium solutions when there is no ambiguity in the context; optimum quantities (such as strategies, controls, etc.) are generally identified with an asterisk (*).
PART I
Chapter 2
Noncooperative Finite Games: Two-person Zero-sum

2.1 INTRODUCTION

This chapter deals with the class of two-person zero-sum games in which the players have a finite number of alternatives to choose from. There exist two different formulations for such games: the normal (matrix) form and the extensive (tree) form. The former constitutes a suitable representation of a zero-sum game when each player's information is static in nature, since it suppresses all the dynamic aspects of the decision problem. The extensive formulation, on the other hand, displays explicitly the evolution of the game and the existing information exchanges between the players. In the first part of the chapter (sections 2 and 3), the normal form is introduced together with several related concepts, and then existence and computation of saddle-point equilibria are discussed for both pure and mixed strategies. In the second part of the chapter (sections 4 and 5), the extensive form description for zero-sum finite games without chance moves is introduced, and saddle-point equilibria for such games are discussed, also within the class of behavioural strategies. This discussion is first confined to single-act games in which each player is allowed to act only once, and then it is extended to multi-act games. Section 6 of the chapter is devoted to a brief discussion on the nature of equilibria for zero-sum games which also incorporate chance moves.
2.2 MATRIX GAMES

The most elementary type of two-person zero-sum games are matrix games. There are two players, to be referred to as Player 1 (PI) and Player 2 (P2), and
an (m × n)-dimensional matrix A = {a_ij}. Each entry of this matrix is an outcome of the game, corresponding to a particular pair of decisions made by the players. For PI, the alternatives are the m rows of the matrix, while for P2 the possible choices are the n columns of the same matrix. These alternatives are known as the strategies of the players. If PI chooses the ith row and P2 the jth column, then a_ij is the outcome of the game, and PI pays this amount to P2. In case a_ij is negative, this should be interpreted as P2 paying PI the positive amount corresponding to this entry (i.e. -a_ij). To regard the entries of the matrix as sums of money to be paid by one player to the other is, of course, only a convention. In a more general framework, these outcomes represent utility transfers from one player to the other. Thus, we can view each element of the matrix A (i.e. outcome of the game) as the net change in the utility of P2 for a particular play of the game, which is equal to minus the net change in the utility of PI. Then, regarded as a rational decision maker, PI will seek to minimize the outcome of the game, while P2 will seek to maximize it, by independent decisions.

Assuming that this game is to be played only once, a reasonable mode of play for PI is to secure his losses against any (rational or irrational) behaviour of P2. Under such an incentive, PI is forced to pick that row (i*) of matrix A whose largest entry is no bigger than the largest entry of any other row. Hence, if PI adopts the i*th row as his strategy, where i* satisfies the inequalities

    V̄(A) ≜ max_j a_{i*j} ≤ max_j a_{ij},    i = 1, ..., m,        (1)
then his losses will be no greater than V̄ which we call the loss ceiling of PI, or, equivalently, the security level for his losses. The strategy "row i*" that yields this security level will be called a security strategy for PI. Adopting a similar mode of play, P2 will want to secure his gains against any behaviour of PI, and consequently, he will choose that column (j*) whose smallest entry is no smaller than the smallest entry of any other column. In mathematical terms, j* will be determined from the n-tuple of inequalities

    V_(A) ≜ min_i a_{ij*} ≥ min_i a_{ij},    j = 1, ..., n.        (2)
By deciding on the j*th column as his strategy, P2 secures his gains at the level V_ which we call the gain floor of P2, or, equivalently, the security level for his gains. Furthermore, any strategy for P2 that secures his gain floor, such as "column j*", is called a security strategy for P2.

THEOREM 1   In every matrix game A = {a_ij},
(i) the security level of each player is unique,
(ii) there exists at least one security strategy for each player,
(iii) the security level of PI (the minimizer) never falls below the security level of P2 (the maximizer), i.e.

    min_i a_{ij*} = V_(A) ≤ V̄(A) = max_j a_{i*j},        (3)
where i* and j* denote security strategies for PI and P2, respectively.

Proof   The first two parts follow readily from (1) and (2), since in every matrix game there are only a finite number of alternatives for each player to choose from. To prove part (iii), we first note the obvious set of inequalities

    min_i a_{il} ≤ a_{kl} ≤ max_j a_{kj}

which hold true for all possible k and l. Now, letting k = i* and l = j*, the desired result, i.e. (3), follows.   □
The third part of Thm 1 says that, if played by rational players, the outcome of the game will always lie between V̄ and V_. It is for this reason that the security levels V̄ and V_ are also called, respectively, the upper and lower values of the matrix game. Yet another interpretation that could be given to these values is the following. Consider the matrix game in which players do not make their decisions independently, but there is a predetermined ordering according to which the players act. First PI decides on what row he will choose, and he transmits this information to P2 who subsequently chooses a column. In this game, P2 definitely has an advantage over PI, since he knows what his opponent's actual choice of strategy is. Then, it is unquestionable that the best play for PI is to choose one of his security strategies (say, row i*). P2's "optimal" response to this is the choice of a "column j°", where j° is determined from
    a_{i*j°} = max_j a_{i*j} = V̄ = min_i max_j a_{ij},        (4)

with the "min max" operation designating the order of play in this decision process. Thus, we see that the upper value of the matrix game is actually attained in this case. Now, if the roles of the players are interchanged, this time PI having a definite advantage over P2, then P2's best choice will again be one of his security strategies (say, column j*), and PI's response to this will be a "row i°", where i° satisfies

    a_{i°j*} = min_i a_{ij*} = V_ = max_j min_i a_{ij},        (5)
with the "max min" symbol implying that the minimizer acts after the
maximizer. Relation (5) indicates that the outcome of the game is equal to the lower value V_ when PI observes P2's choice. To illustrate these facets of matrix games, let us now consider the following (3 × 4) matrix:
             P2
       [  1   3   3   2 ]
  PI   [  0  -1   2   1 ]        (6)
       [ -2   2   0  -1 ]
Here P2 (the maximizer) has a unique security strategy, "column 3" (i.e. j* = 3), securing him a gain floor of V_ = 0. PI (the minimizer), on the other hand, has two security strategies, "row 2" and "row 3" (i.e. i*_1 = 2, i*_2 = 3), yielding him a loss ceiling of V̄ = max_j a_{2j} = max_j a_{3j} = 2 which is above the security level of P2. Now, if P2 plays first, then he chooses his security strategy "column 3", with PI's unique response being "row 3" (i° = 3),† resulting in an outcome of 0 = V_. If PI plays first, he is actually indifferent between his two security strategies. In case he chooses "row 2", then P2's unique response is "column 3" (j° = 3),† whereas if he chooses "row 3", his opponent's response is "column 2" (j° = 2), both pairs of strategies yielding an outcome of 2 = V̄.

The preceding discussion validates the argument that, when there is a definite order of play, security strategies of the player who acts first make complete sense, and actually they can be considered to be in "equilibrium" with the corresponding response strategies of the other player. By two strategies (of PI and P2) to be in equilibrium, we roughly mean here that, after the game is over and its outcome is observed, the players should have no ground to regret their past actions. In a matrix game with a fixed order of play, for example, there is no justifiable reason for the player who acts first to regret his security strategy after the game is over. But what about the class of matrix games in which the players arrive at their decisions independently? Do security strategies have any sort of an equilibrium property in that case? To shed some light on this question, let us consider the (3 × 3) matrix game of (7) below, with the players acting independently, and the game to be played only once.
             P2
       [  4   0  -1 ]
  PI   [  0  -1   3 ]        (7)
       [  1   2   1 ]

† Here, it is only coincidental that optimal response strategies are also security strategies.
Both players have unique security strategies, "row 3" for PI and "column 1" for P2, with the upper and lower values of the game being V̄ = 2 and V_ = 0, respectively. If both players play their security strategies, then the outcome of the game is 1, which is midway between the security levels of the players. But after the game is over, PI might think: "Well, I knew that P2 was going to play his security strategy; it is a pity that I didn't choose row 2 and enjoy an outcome of 0". Thinking along the same lines, P2 might also regret that he did not play column 2 and achieve an outcome of 2. This, then, indicates that, in this matrix game, the security strategies of the players cannot possibly possess any equilibrium property. On the other hand, if a player chooses a row or column (whichever the case is) different from the one dictated by his security strategy, then he will be taking chances, since there is always a possibility that the outcome of the game might be worse for him than his security level. For the class of matrix games with equal upper and lower values, however, such a dilemma does not arise. Consider, for example, the (2 × 2) matrix game
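As a quick check, the security computations of (1)-(2) can be run mechanically on game (7). The entries below, including the minus signs, are as inferred from the surrounding discussion (a transcription assumption, since this copy of the text drops minus signs):

```python
# Security levels for a matrix game: PI (rows) minimizes, P2 (columns)
# maximizes. A is taken to be game (7); signs inferred from the text.
A = [[4, 0, -1],
     [0, -1, 3],
     [1, 2, 1]]
m, n = len(A), len(A[0])

V_upper = min(max(A[i][j] for j in range(n)) for i in range(m))  # PI's loss ceiling
V_lower = max(min(A[i][j] for i in range(m)) for j in range(n))  # P2's gain floor
i_star = min(range(m), key=lambda i: max(A[i]))                  # a security row of PI
j_star = max(range(n), key=lambda j: min(A[i][j] for i in range(m)))
print(V_upper, V_lower, i_star + 1, j_star + 1)   # 2 0 3 1
```

The output matches the discussion: V̄ = 2 with security strategy "row 3" for PI, and V_ = 0 with security strategy "column 1" for P2.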
             P2
  PI   [ (2 × 2) matrix; entries not recoverable ]        (8)
in which "row 2" and "column 2" are the security strategies of PI and P2, respectively, resulting in the same security level V = Y = 1. These security strategies are in equilibrium, since each one is "optimal" against the other one. Furthermore, since the security levels of the players coincide, it does not make any difference, as far as the outcome of the game is concerned, whether the players arrive at their decisions independently or in a predetermined order. The strategy pair {row 2, column 2}, possessing all these favourable features, is clearly the only candidate that could be considered as the equilibrium solution of the matrix game (8). Such equilibrium strategies are known as saddlepoint strategies, and the matrix game, in question, is then said to have a saddle point in pure strategies. A more precise definition for these terms is given below. DEFINITION 1 For a given (m x n) matrix game A = {a;J, let {row i*, column j*} be a pair of strategies adopted by the players. Then, if the inequality
    a_{i*j} ≤ a_{i*j*} ≤ a_{ij*}        (9)

is satisfied for all i = 1, ..., m and all j = 1, ..., n, the strategies {row i*, column j*} are said to constitute a saddle-point equilibrium (or simply, they are said to be saddle-point strategies), and the matrix game is said to have a saddle point in pure strategies. The corresponding outcome a_{i*j*} of the game is called the saddle-point value, or simply the value, of the matrix game, and is denoted by V(A).   □
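Definition 1 translates directly into a finite search. The sketch below (mine, not the book's) returns a saddle-point pair whenever inequality (9) holds. The 2 × 2 example is hypothetical, constructed so that {row 2, column 2} is a saddle point with value 1, as described for game (8) above; game (7) has no pure saddle point since its upper and lower values differ:

```python
# Direct test of inequality (9): PI (rows) minimizes, P2 (columns)
# maximizes; returns a 1-based saddle-point pair (i*, j*) or None.
def has_saddle(A):
    m, n = len(A), len(A[0])
    for i_s in range(m):
        for j_s in range(n):
            if (all(A[i_s][j] <= A[i_s][j_s] for j in range(n)) and
                all(A[i_s][j_s] <= A[i][j_s] for i in range(m))):
                return i_s + 1, j_s + 1
    return None

print(has_saddle([[1, 2],
                  [0, 1]]))                               # (2, 2)
print(has_saddle([[4, 0, -1], [0, -1, 3], [1, 2, 1]]))    # None
```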
THEOREM 2   Let A = {a_ij} denote an (m × n) matrix game with V̄(A) = V_(A).
Then,
(i) A has a saddle point in pure strategies,
(ii) an ordered pair of strategies provides a saddle point for A if, and only if, the first of these is a security strategy for PI and the second one is a
security strategy for P2,
(iii) V(A) is uniquely given by V(A) = V̄(A) = V_(A).
Proof   Let i* denote a security strategy for PI and j* denote a security strategy for P2, which always exist by Thm 1 (ii). Now, since V̄ = V_, we have

    a_{i*j} ≤ max_j a_{i*j} = V̄ = V_ = min_i a_{ij*} ≤ a_{ij*}

for all i = 1, ..., m and j = 1, ..., n; and letting i = i*, j = j* in this inequality, we obtain V̄ = V_ = a_{i*j*}. Using this result in the same inequality yields
which is (9). This completes the proof of (i) and the sufficiency part of (ii). We now show that the class of saddle-point strategy pairs is no larger than the class of ordered security strategy pairs. Let {row i*, column j*} be a saddle-point strategy pair. Then it follows from (9) that

    max_j a_{i*j} ≤ a_{i*j*} ≤ max_j a_{ij},    i = 1, ..., m,
thus proving that "row i*" is a security strategy for PI. Analogously, it can be shown that "column j*" is a security strategy for P2. This completes the proof of (ii). Part (iii) then follows readily from (ii).   □
COROLLARY
This feature of the saddlepoint strategies, given above in Corollary 1,is also known as their ordered interchangeability property. Hence, in the case of nonunique (multiple) saddle points, each player does not have to know (or guess) the particular saddlepoint strategy his opponent will use in the game,
since all such strategies are in equilibrium and they yield the same value, indeed a very desirable property that an equilibrium solution is expected to possess. For the case when the security levels of the players do not coincide, however, no such equilibrium solution can be found within the class of (pure) strategies that we have considered so far. One way of resolving this problem is, as we have discussed earlier, to assume that one player acts after observing the decision of the other one, in which case the security level of the player who acts first is attained by an equilibrium "strategy pair". Here, of course, the strategy of the player who acts last explicitly depends on the action of the first acting player, and hence the game has a "dynamic character". The precise meaning of this will be made clear in section 4, where more detail can be found on those aspects of zero-sum games.
Mixed strategies
Yet another approach to obtain an equilibrium solution in matrix games that do not possess a saddle point, and in which players act independently, is to enlarge the strategy spaces, so as to allow the players to base their decisions on the outcome of random events, thus leading to the so-called mixed strategies. This is an especially convincing approach when the same matrix game is played over and over again, and the final outcome, sought to be minimized by PI and maximized by P2, is determined by averaging the outcomes of individual plays. To introduce the concept of a mixed strategy, let us again consider the matrix game A = {a_ij} with m rows and n columns. In this game, the "strategy space" of PI comprises m elements, since he is allowed to choose one of the m rows of A. If Γ¹ denotes this space of (pure) strategies, then Γ¹ = {row 1, ..., row m}. If we allow mixed strategies for PI, however, he will pick probability distributions on Γ¹. That is, an allowable strategy for PI is to choose "row 1" with probability (w.p.) γ_1, "row 2" w.p. γ_2, ..., "row m" w.p. γ_m, where γ_1 + γ_2 + ... + γ_m = 1. The mixed strategy space of PI, which we denote by Y, is now comprised of all such probability distributions. Since the probability distributions are discrete, Y simply becomes the space of all nonnegative numbers γ_i that add up to 1, which is a simplex. Note that the m pure strategies of PI can also be considered as elements of Y, obtained by setting (i) γ_1 = 1, γ_i = 0 for all i ≠ 1; (ii) γ_2 = 1, γ_i = 0 for all i ≠ 2; and so on. The mixed strategy space of P2, which we denote by Z, can likewise be defined as an n-dimensional simplex. A precise definition for a mixed strategy now follows.†

† This definition of a mixed strategy is valid not only for matrix games but also for other types of finite games to be discussed in this and the following chapter.
DEFINITION 2 A mixed strategy for a player is a probability distribution on the space of his pure strategies. Equivalently, it is a random variable whose values are the player's pure strategies. □
Thus, in matrix games, a mixed strategy for each player can either be considered as an element of a simplex, or as a random variable with a discrete probability distribution. Typical mixed strategies for the players, under the latter convention, would be independent random variables u and v, defined respectively by

   u = { 1 w.p. y_1, ..., m w.p. y_m },   Σ_{i=1}^{m} y_i = 1,  y_i ≥ 0,   (10a)

   v = { 1 w.p. z_1, ..., n w.p. z_n },   Σ_{j=1}^{n} z_j = 1,  z_j ≥ 0.   (10b)
Now, if the players adopt these random variables as their strategies to be implemented during the course of a repeated game, they will have to base their actual decisions (as to what specific row or column to choose during each play of the game) on the outcome of a chance mechanism, unless the probability distributions involved happen to be concentrated at one point — the case of pure strategies. If the adopted strategy of P1 is to pick "row 1" w.p. 1/2 and "row m" w.p. 1/2, for example, then the player in question could actually implement this strategy by tossing a fair coin before each play of the game, playing "row 1" if the actual outcome is "head", and "row m" otherwise. It should be noted here that the actual play (action) dictated by P1's strategy becomes known (even to him) only after the outcome of the chance mechanism (tossing of a fair coin, in this case) is observed. Hence, we have a sharp distinction between a player's (proper mixed) strategy† and its implemented "value" for a particular play, where the latter explicitly depends on the outcome of a chance experiment designed so as to generate the odds dictated by the mixed strategy. Such a dichotomy between a strategy and its implemented value does not exist for the class of pure strategies that we have discussed earlier in this section, since if a player adopts the strategy "to play row i", for example, then its implementation is known for sure: he will play "row i". Given an (m x n) matrix game A, which is to be played repeatedly and the final outcome to be determined by averaging the scores of the players on
† Since, by definition, the concept of a mixed strategy also covers pure strategies, we shall sometimes use the term "proper mixed" to emphasize (whenever this is the case) that the underlying probability distribution is not concentrated at one point.
individual plays, let us now investigate how this final outcome will be related to the (mixed) strategies adopted by the players. Let the independent random variables u and v, defined by (10), be the strategies adopted by P1 and P2, respectively. Then, as the number of times this matrix game is played gets arbitrarily large, the frequencies with which different rows and columns of the matrix are chosen by P1 and P2, respectively, will converge to the respective probability distributions that characterize the strategies u and v. Hence, the average value of the outcome of the game, corresponding to the strategy pair {u, v}, will be equal to

   J(y, z) = Σ_{i=1}^{m} Σ_{j=1}^{n} y_i a_ij z_j = y'Az,   (11)
where y and z are the probability distribution vectors defined by

   y = (y_1, ..., y_m)',   z = (z_1, ..., z_n)'.   (12)
P1 wishes to minimize this quantity, J(y, z), by an appropriate choice of a probability distribution vector y ∈ Y, while P2 wishes to maximize the same quantity by choosing an appropriate z ∈ Z, where the sets Y and Z are, respectively, the m- and n-dimensional simplices introduced earlier, i.e.

   Y = { y ∈ R^m : y ≥ 0, Σ_{i=1}^{m} y_i = 1 },   (13a)

   Z = { z ∈ R^n : z ≥ 0, Σ_{j=1}^{n} z_j = 1 }.   (13b)
The following definitions now follow as obvious extensions to mixed strategies of some of the concepts introduced earlier for matrix games with strategy spaces comprised of only pure strategies.

DEFINITION 3 A vector y* ∈ Y is called a mixed security strategy for P1, in the matrix game A, if the following inequality holds for all y ∈ Y:

   V̄m(A) ≜ max_{z∈Z} y*'Az ≤ max_{z∈Z} y'Az,  ∀ y ∈ Y.   (14)
Here, the quantity V̄m is known as the average security level of P1 (or, equivalently, the average upper value of the game). Analogously, a vector z* ∈ Z is called a mixed security strategy for P2, in the matrix game A, if the following inequality holds for all z ∈ Z:

   V̲m(A) ≜ min_{y∈Y} y'Az* ≥ min_{y∈Y} y'Az,  ∀ z ∈ Z.   (15)
Here, V̲m is known as the average security level of P2 (or, equivalently, the average lower value of the game). □
DEFINITION 4 A pair of strategies {y*, z*} is said to constitute a saddle point for a matrix game A, in mixed strategies, if

   y*'Az ≤ y*'Az* ≤ y'Az*,  ∀ y ∈ Y, z ∈ Z.   (16)

The quantity Vm(A) = y*'Az* is known as the saddle-point value, or simply the value, of the game, in mixed strategies. □

Remark 1 The assumptions of existence of a maximum in (14) and of a minimum in (15) are justified since, in each case, the objective functional is linear (thereby continuous) and the simplex over which the optimization is performed is closed and bounded (thereby compact). (See Appendix I.) □
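By the linearity of y'Az in each argument, the inequalities (16) hold for all (y, z) ∈ Y x Z if, and only if, they hold against all pure-strategy deviations of the opponents. A small sketch of such a saddle-point check (the helper name and the candidate pair are illustrative assumptions, not from the text):

```python
# Sketch: checking the saddle-point inequalities (16) for a candidate mixed
# pair (y*, z*). By linearity of y'Az in each argument, it suffices to test
# pure-strategy deviations: max_j (y*'A)_j <= y*'Az* <= min_i (Az*)_i.

def is_mixed_saddle(A, ys, zs, tol=1e-9):
    m, n = len(A), len(A[0])
    v = sum(ys[i] * A[i][j] * zs[j] for i in range(m) for j in range(n))
    cols = [sum(ys[i] * A[i][j] for i in range(m)) for j in range(n)]  # y*'A
    rows = [sum(A[i][j] * zs[j] for j in range(n)) for i in range(m)]  # Az*
    return max(cols) <= v + tol and min(rows) >= v - tol, v

ok, v = is_mixed_saddle([[3, 0], [-1, 1]], [0.4, 0.6], [0.2, 0.8])
# ok -> True; v is then the saddle-point value of this illustrative game
```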
As a direct extension of Thm 1, we can now verify the following properties of the security levels and security strategies in matrix games, when the players are allowed to use mixed strategies.

THEOREM 3 In every matrix game A in which the players are allowed to use mixed strategies, the following properties hold:
(i) The average security level of each player is unique.
(ii) There exists at least one mixed security strategy for each player.
(iii) The security levels in pure and mixed strategies satisfy the following inequalities:

   V̲(A) ≤ V̲m(A) ≤ V̄m(A) ≤ V̄(A).   (17)

Proof (i) Uniqueness of V̄m follows directly from its definition, V̄m = inf_y max_z y'Az, in view of Remark 1. Analogously, V̲m = sup_z min_y y'Az is also unique. (ii) The quantity max_{z∈Z} y'Az, on the RHS (right-hand side) of (14), is a continuous function of y ∈ Y, a result which can be proven by employing standard methods of analysis. But since Y is a compact set, the minimum of this function is attained on Y, thus proving the desired result for the security strategy of P1. Proof of existence of a security strategy for P2 follows along similar lines. (iii) The middle inequality follows from the simple reasoning that the maximizer's security level cannot be higher than the minimizer's security level. The other two inequalities follow, since the pure security strategies of the players are included in their mixed strategy spaces. □
As an immediate consequence of part (ii) of Thm 3 above, we obtain

COROLLARY 2 In a matrix game A, the average upper and lower values in mixed strategies are given, respectively, by

   V̄m(A) = min_{y} max_{z} y'Az   (18a)

and

   V̲m(A) = max_{z} min_{y} y'Az.   (18b)  □
Now, one of the important results of zero-sum game theory is that these upper and lower values are equal, that is, V̄m(A) = V̲m(A). This result is given in Thm 4 below, which is known as the "Minimax Theorem". In its proof, we shall need the following lemma.

LEMMA 1 Let A be an arbitrary (m x n)-dimensional matrix. Then, either
(i) there exists a nonzero vector y ∈ R^m, y ≥ 0, such that A'y ≤ 0, or
(ii) there exists a nonzero vector z ∈ R^n, z ≥ 0, such that Az ≥ 0.

Proof This result is related to the separating hyperplane theorem, and it can be found in any standard book on linear or nonlinear programming; see, for example, Boot (1964; p. 34). □

THEOREM 4 (The Minimax Theorem) In any matrix game A, the average security levels of the players in mixed strategies coincide, that is,

   V̄m(A) = min_{y} max_{z} y'Az = max_{z} min_{y} y'Az = V̲m(A).   (19)
Proof We first show, by applying Lemma 1, that for a given matrix game A, at least one of the following two inequalities holds:

   V̄m(A) ≤ 0,   (i)
   V̲m(A) ≥ 0.   (ii)

Assume that the first alternative of Lemma 1 is valid. Then there exists a vector y⁰ ∈ Y such that A'y⁰ ≤ 0, which is equivalent to the statement that the inequality

   y⁰'Az ≤ 0

holds for all z ∈ Z. This is further equivalent to saying

   max_{z} y⁰'Az ≤ 0,

which definitely implies (by also making use of (18a))

   V̄m(A) = min_{y} max_{z} y'Az ≤ 0.

Now, under the second alternative of Lemma 1, there exists a vector z⁰ ∈ Z such that Az⁰ ≥ 0, or equivalently,

   y'Az⁰ ≥ 0,  ∀ y ∈ Y.

This is further equivalent to the inequality

   min_{y} y'Az⁰ ≥ 0,

which finally implies (ii), by also making use of (18b), since

   V̲m(A) = max_{z} min_{y} y'Az ≥ min_{y} y'Az⁰.

Thus we see that, under the first alternative of Lemma 1, inequality (i) holds; and under the second alternative, inequality (ii) remains valid. Let us now consider a new (m x n)-dimensional matrix B = {b_ij} obtained by shifting all entries of A by a constant −c, that is, b_ij = a_ij − c. This results in a shift of the same amount in both V̄m and V̲m, that is,

   V̄m(B) = V̄m(A) − c   and   V̲m(B) = V̲m(A) − c.

Since matrix A was arbitrary in (i) and (ii), we may replace A with B, as defined above, in these inequalities, and arrive at the following property with regard to the upper and lower values of a matrix game A, in mixed strategies. For a given matrix game A and an arbitrary constant c, at least one of the following two inequalities holds true:

   V̄m(A) ≤ c,   (iii)
   V̲m(A) ≥ c.   (iv)

But, for this statement to be valid for arbitrary c, it is necessary that V̄m(A) = V̲m(A). For, otherwise, the only possibility is V̄m(A) = V̲m(A) + k, in view of the middle inequality of (17), where k > 0; and picking c = V̲m(A) + (1/2)k in (iii) and (iv), it readily follows that neither (iii) nor (iv) is satisfied. This completes the proof of the theorem. □
COROLLARY 3 Let A denote an (m x n) matrix game. Then,
(i) A has a saddle point in mixed strategies,
(ii) a pair of mixed strategies provides a saddle point for A if, and only if, the first of these is a mixed security strategy for P1, and the second one is a mixed security strategy for P2,
(iii) Vm(A) is uniquely given by Vm(A) = V̄m(A) = V̲m(A),
(iv) in case of multiple saddle points, the mixed saddle-point strategies possess the ordered interchangeability property.

Proof This corollary to Thm 4 is actually an extension of Thm 2 and Corollary 1 to the case of mixed strategies, and its verification follows along the same lines as that of Thm 2. It should only be noted that the equality of the average upper and lower values of the matrix game is, in the present case, a fact, rather than a part of the hypothesis, as in Thm 2. □
We have thus seen that, if the players are allowed to use mixed strategies, matrix games always admit a saddle-point solution, which thereby manifests itself as the only reasonable equilibrium solution in two-person zero-sum games of this type.
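The equality (19) can also be checked numerically on small examples. A brute-force sketch for a 2x2 game, scanning mixed strategies y = (p, 1−p) and z = (q, 1−q) on a grid (the matrix, matching pennies, is our own illustrative choice, not from the text):

```python
# Brute-force sketch of the Minimax Theorem for a 2x2 game: compare the
# average upper and lower values (18a)-(18b) over a grid of mixed
# strategies. Against a fixed mixed strategy, it suffices to consider the
# opponent's pure responses, since y'Az is linear in each argument.

A = [[1, -1], [-1, 1]]                    # matching pennies, no pure saddle
grid = [i / 1000 for i in range(1001)]

# average upper value: min over y of the max over P2's pure responses
upper = min(max(p * A[0][j] + (1 - p) * A[1][j] for j in range(2))
            for p in grid)
# average lower value: max over z of the min over P1's pure responses
lower = max(min(A[i][0] * q + A[i][1] * (1 - q) for i in range(2))
            for q in grid)
# here upper == lower == 0, attained at p = q = 1/2, as (19) asserts
```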
2.3 COMPUTATION OF MIXED EQUILIBRIUM STRATEGIES

We have seen in the previous section that two-person zero-sum matrix games always admit a saddle-point equilibrium in mixed strategies (cf. Thm 4). One important property of mixed saddle-point strategies is that, for each player, the corresponding one also constitutes a mixed security strategy, and conversely, each mixed security strategy is a mixed saddle-point strategy (Corollary 3 (ii)). This property, then, strongly suggests a possible way of obtaining the mixed saddle-point solution of a matrix game, which is to determine the mixed security strategy(ies) of each player. To illustrate this approach, let us consider the (2 x 2) matrix game

            P2
   P1  [  3   0 ]
       [ -1   1 ]        (20)

which clearly does not admit a pure-strategy saddle point, since V̄ = 1 and V̲ = 0. Let the mixed strategies of P1 and P2 be denoted by y = (y_1, y_2)' and z = (z_1, z_2)', respectively, with y_i ≥ 0, z_i ≥ 0 (i = 1, 2), y_1 + y_2 = z_1 + z_2 = 1, and consider first the average security level of P1. It should be noted that, while determining the average security levels, we can assume, without any loss of generality, that the other player chooses only pure strategies (this follows directly from Defs 3 and 4). Hence, in the present case, we can take P2 to play either (z_1 = 1, z_2 = 0) or (z_1 = 0, z_2 = 1); and under different choices of mixed strategies for P1, we can determine the average outcome of the game as shown in Fig. 1 by the bold line, which forms an upper envelope to the two straight lines drawn. Now, if the mixed strategy (y_1* = 2/5, y_2* = 3/5) corresponding to the lowest point of that envelope is adopted by P1, then the average outcome will be no greater than 3/5. For any other mixed strategy of P1, however, P2 can obtain a higher average outcome. (If, for instance, y_1 = 1/2, y_2 = 1/2, then P2 can increase the outcome by playing z_1 = 1, z_2 = 0.) This then implies that (y_1* = 2/5, y_2* = 3/5) is a mixed security strategy of P1 (and his only one), and thereby, it is his mixed saddle-point strategy. The mixed saddle-point value can easily be read off from Fig. 1 to be Vm = 3/5.
Fig. 1. Mixed security strategy of P1 for the matrix game (20).
Fig. 2. Mixed security strategy of P2 for the matrix game (20).
To determine the mixed saddle-point strategy of P2, we now consider his security level, this time assuming that P1 adopts pure strategies. Then, for different mixed strategies of P2, the average outcome of the game can be determined to be the bold line shown in Fig. 2, which forms a lower envelope to the two straight lines drawn. Since P2 is the maximizer, the highest point on this envelope is his average security level, which he can guarantee (on the average) by playing the mixed strategy (z_1* = 1/5, z_2* = 4/5), which is also his saddle-point strategy. The computational technique discussed above is known as the "graphical solution" of matrix games, since it makes use of a graphical construction directly related to the entries of the matrix. Such an approach is practical, not
only for (2 x 2) matrix games, but also for general (2 x n) and (m x 2) matrix games. Consider, for example, the (2 x 3) matrix game

            P2
   P1  [ 1   3   0 ]
       [ 6   2   7 ]        (21)

for which P1's average security level in mixed strategies has been determined in Fig. 3.
Fig. 3. Mixed security strategy of P1 for the matrix game (21).
Assuming again that P2 uses only pure strategies (z_1 = 1, z_2 = z_3 = 0; or z_2 = 1, z_1 = z_3 = 0; or z_3 = 1, z_1 = z_2 = 0), the average outcome of the game for different mixed strategies of P1 is given by the bold line drawn in Fig. 3, which is again the upper envelope (this time of three straight lines). The lowest point on this envelope yields the mixed security strategy, and thereby the mixed saddle-point strategy, of P1, which is (y_1* = 2/3, y_2* = 1/3). The mixed saddle-point value, in this case, is Vm = 8/3. To determine the mixed saddle-point strategy of P2, we first note that the average security level of P1 has been determined by the intersection of only two straight lines, which correspond to two pure strategies of P2, namely "column 1" and "column 2". If P1 uses his mixed security strategy, and P2 knows that before he plays, he would of course be indifferent between these two pure strategies, but he would never
play the third pure strategy, "column 3", since that would reduce the average outcome of the game. This then implies that, in the actual saddle-point solution, he will "mix" only between the first two pure strategies. Hence, in the process of determining P2's mixed security strategies, we can consider the (2 x 2) matrix game

            P2
   P1  [ 1   3 ]
       [ 6   2 ]        (22)

which is obtained from (21) by deleting the last column. Using the graphical approach, P2's mixed security strategy in this matrix game can readily be determined to be (z_1* = 1/6, z_2* = 5/6), which is also his mixed saddle-point strategy.
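For a completely mixed 2x2 game, the lowest (respectively, highest) point of the envelope found by the graphical method is obtained by equating the two straight lines, i.e. by the players' indifference conditions. A sketch of that computation (the closed-form expressions assume no pure-strategy saddle point, so the denominator is nonzero; the entries used below are those of games (20) and (22) as used in the surrounding example):

```python
# Sketch: the indifference computation behind the graphical method for a
# completely mixed 2x2 game [[a, b], [c, d]] (P1 mixes rows, P2 mixes
# columns; no pure-strategy saddle point is assumed, so den != 0).

def solve_2x2(a, b, c, d):
    den = a - b - c + d
    y1 = (d - c) / den            # P1 plays row 1 w.p. y1
    z1 = (d - b) / den            # P2 plays column 1 w.p. z1
    v = (a * d - b * c) / den     # mixed saddle-point value
    return (y1, 1 - y1), (z1, 1 - z1), v

y, z, v = solve_2x2(3, 0, -1, 1)    # game (20): expect v = 3/5
y2, z2, v2 = solve_2x2(1, 3, 6, 2)  # game (22): expect v2 = 8/3
```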
Relation with linear programming
One alternative to the graphical solution, when the matrix dimensions are high, is to convert the original matrix game into a linear programming (LP) problem and make use of some of the powerful algorithms of LP. To elucidate the close connection between a two-person zero-sum matrix game and an LP problem, let us start with an (m x n) matrix game A = {a_ij} with all entries positive (i.e. a_ij > 0, ∀ i = 1, ..., m; j = 1, ..., n). Then, the average value of this game, in mixed strategies, is given by

   Vm(A) = min_{y} max_{z} y'Az = max_{z} min_{y} y'Az,   (23)

which is necessarily a positive quantity by our assumption on A. Let us now consider the min-max operation used in (23). Here, first a y ∈ Y is given, and then the resulting expression is maximized over Z; that is, the choice of z ∈ Z can depend on y. Hence, the middle expression of (23) can also be written as

   min_{y} v_1(y),

where v_1(y) is a positive function of y, defined by

   v_1(y) = max_{z} y'Az ≥ y'Az,  ∀ z ∈ Z.   (24)

Since Z is the n-dimensional simplex defined by (13b), the inequality in (24) becomes equivalent to the vector inequality
   A'y ≤ v_1(y) 1_n,

with 1_n ≜ (1, ..., 1)' ∈ R^n. Further introducing the notation ỹ = y/v_1(y), and recalling the definition of Y from (13a), we observe that the optimization problem faced by P1 in determining his mixed security strategy is

   minimize v_1(y) over Y
   subject to A'ỹ ≤ 1_n,  ỹ'1_m = [v_1(y)]^(-1),  ỹ ≥ 0,  ỹ = y/v_1(y).

This is further equivalent to the maximization problem

   max_{ỹ} ỹ'1_m   (25a)

subject to

   A'ỹ ≤ 1_n   (25b)
   ỹ ≥ 0,   (25c)

which is a standard LP problem. The solution of this LP problem will give the mixed security strategy of P1, normalized with the average saddle-point value of the game, Vm(A). Hence, if y* ∈ Y is a mixed saddle-point strategy of P1 in the matrix game A, then the quantity ỹ* = y*/Vm(A) solves the LP problem (25), whose optimal value is in fact 1/Vm(A). Conversely, every solution of the LP problem will correspond to a mixed strategy of P1 in the same manner. Now, if we instead consider the right-hand expression of (23), and introduce

   v_2(z) ≜ min_{y} y'Az ≤ y'Az,  ∀ y ∈ Y,

and z̃ ≜ z/v_2(z), following similar steps and reasoning leads us to the minimization problem

   min_{z̃} z̃'1_n   (26a)

subject to

   Az̃ ≥ 1_m   (26b)
   z̃ ≥ 0,   (26c)

which is the "dual" of (25). We thus arrive at the conclusion that if z* ∈ Z is a mixed saddle-point strategy of P2 in the matrix game A, then z̃* = z*/Vm(A)
solves the LP problem (26), whose optimal value is again 1/Vm(A). Furthermore, the converse statement is also valid. We have thus shown that, given a matrix game with all positive entries, there exist two LP problems (duals of each other) whose solutions yield the saddle-point solution(s) of the matrix game. The positivity of the matrix A, however, is only a convention here, and can easily be dispensed with, as the following lemma exhibits.

LEMMA 2 Let A and B be two (m x n)-dimensional matrices related to each other by the relation

   A = B + c 1_m 1_n',   (27)

where c is some constant. Then,
(i) every mixed saddle-point strategy pair for the matrix game A also constitutes a mixed saddle-point solution for the matrix game B, and vice versa;
(ii) Vm(A) = Vm(B) + c.

Proof Let (y*, z*) be a saddle-point solution for A, thus satisfying inequalities (16). If A is replaced by B + c 1_m 1_n' in (16), then it is easy to see that (y*, z*) also constitutes a saddle-point solution for B, since y'1_m 1_n'z = 1 for every y ∈ Y, z ∈ Z. Since the reverse argument also applies, this completes the proof of (i). Part (ii), on the other hand, follows from the relation

   Vm(A) = y*'[B + c 1_m 1_n']z* = y*'Bz* + c y*'1_m 1_n'z* = Vm(B) + c.  □
Because of property (i) of the above lemma, we call matrix games which satisfy relation (27) strategically equivalent matrix games. It should now be apparent that, given a matrix game A, we can always find a strategically equivalent matrix game with all entries positive, and thus the transformation of a matrix game into two LP problems, as discussed prior to Lemma 2, is always valid as far as determination of saddle-point strategies is concerned. This result is summarized below in Theorem 5.

THEOREM 5 Let B be an (m x n) matrix game, and let A be defined by (27) with c chosen to make all its entries positive. Introduce the two LP problems
   max ỹ'1_m  subject to  A'ỹ ≤ 1_n,  ỹ ≥ 0      "primal problem"   (28a)

   min z̃'1_n  subject to  Az̃ ≥ 1_m,  z̃ ≥ 0      "dual problem"   (28b)
with their optimal values (if they exist) denoted by Vp and Vd, respectively. Then:
(i) Both LP problems admit a solution, and Vp = Vd = 1/Vm(A).
(ii) If (y*, z*) is a mixed saddle-point solution of the matrix game B, then y*/Vm(A) solves (28a), and z*/Vm(A) solves (28b).
(iii) If ỹ* is a solution of (28a), and z̃* is a solution of (28b), the pair (ỹ*/Vp, z̃*/Vd) constitutes a mixed saddle-point solution for the matrix game B. Furthermore, Vm(B) = (1/Vp) − c.

Proof Since A is a positive matrix and strategically equivalent to B, the theorem has already been proven prior to the statement of Lemma 2. We should only note that the reason why both LP problems admit a solution is that the mixed security strategies of the players always exist in the matrix game A. □
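The recipe of Thm 5 can be carried out with an off-the-shelf LP solver. A sketch using scipy.optimize.linprog (an assumed external dependency, not part of the text; the 2x2 game is the one used in the graphical example):

```python
# Sketch of Thm 5: shift the game B by a constant c so that A = B + c has
# all entries positive, solve the dual LP pair (28), and recover the mixed
# saddle point and value of B.
import numpy as np
from scipy.optimize import linprog

B = np.array([[3.0, 0.0], [-1.0, 1.0]])   # illustrative matrix game
c = 1.0 - B.min()                          # c chosen so that A > 0
A = B + c
m, n = A.shape

# primal (28a): max 1'y~  s.t.  A'y~ <= 1, y~ >= 0   (linprog minimizes)
primal = linprog(-np.ones(m), A_ub=A.T, b_ub=np.ones(n))
# dual (28b):   min 1'z~  s.t.  Az~ >= 1, z~ >= 0
dual = linprog(np.ones(n), A_ub=-A, b_ub=-np.ones(m))

Vp = -primal.fun                 # = Vd = 1 / Vm(A), by Thm 5(i)
y_star = primal.x / Vp           # mixed saddle-point strategy of P1 in B
z_star = dual.x / dual.fun       # mixed saddle-point strategy of P2 in B
Vm_B = 1.0 / Vp - c              # value of the original game B
```

For the game above this recovers y* = (2/5, 3/5), z* = (1/5, 4/5) and Vm(B) = 3/5, the values obtained graphically earlier in this section.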
Theorem 5 provides one method of making matrix games "essentially" equivalent to two LP problems that are duals of each other, thus enabling the use of some powerful algorithms of LP in order to obtain their saddle-point solutions. (The reader is referred to Dantzig (1963) or Luenberger (1973) for LP algorithms.) There are also other transformation methods available in the literature, which form different kinds of equivalences between matrix games and LP problems, but the underlying idea is essentially the same in all those techniques. For one such equivalence the reader is referred to the next chapter, to Corollary 3.2, where an LP problem, not equivalent to the one of Thm 5, is obtained as a special case of a more general result on nonzero-sum matrix games. For more details on computational techniques for matrix games the reader is referred to Luce and Raiffa (1957) and to Singleton and Tyndall (1974). Although it falls outside the scope of this chapter, we mention, for the sake of completeness, that each LP problem can also be transformed into an equivalent zero-sum matrix game, towards which end the following theorem is given.

THEOREM 6 Given the dual pair of LP problems
   max y'c  subject to  A'y ≤ b,  y ≥ 0      "primal problem"

   min z'b  subject to  Az ≥ c,  z ≥ 0       "dual problem"
where A is an (m x n) matrix, associate with this pair a zero-sum matrix game characterized by the (m + n + 1) x (m + n + 1) matrix

        [  0    A   −c ]
   M =  [ −A'   0    b ]
        [  c'  −b'   0 ].
Then,
(i) P1 and P2 have the same saddle-point strategies in this matrix game; denote such a (mixed) saddle-point strategy by the (m + n + 1)-vector w' = (y', z', s), where y ∈ R^m, z ∈ R^n and s ∈ R;
(ii) if s > 0, then ŷ = y/s and ẑ = z/s solve, respectively, the primal and dual LP problems.

Proof The first part of the theorem follows readily since M is a skew-symmetric matrix (see Problem 9, section 7). For a proof of part (ii), see, for instance, Vorob'ev (1977). □
Dominating strategies
We conclude this section by introducing the concept of "dominance" in matrix games, which can sometimes be useful in reducing the dimensions of a matrix game by eliminating some of its rows and/or columns which are known from the very beginning to have no influence on the optimal solution. More precisely, given an (m x n) matrix game A = {a_ij}, we say that "row i" dominates "row k" if a_ij ≤ a_kj, j = 1, ..., n, and if, for at least one j, the strict inequality sign holds. In terms of pure strategies, this means that the choice of the dominating strategy, i.e. "row i", is at least as good as the choice of the dominated strategy, i.e. "row k". If, in the above set of inequalities, the strict inequality sign holds for all j = 1, ..., n, then we say that "row i" strictly dominates "row k", in which case, regardless of what P2 chooses, P1 does better with "row i" (the strictly dominating strategy) than with "row k" (the strictly dominated strategy). It therefore follows that P1 can always dispense with his strictly dominated strategies and consider only strictly undominated ones, since adoption of a strictly dominated strategy is apt to increase his security level. This argument also holds for mixed strategies, and strictly dominated rows are always assigned a probability of zero in an optimal (mixed) strategy for P1. Analogously for P2, "column j" of A is said to dominate (respectively, strictly dominate) "column l" if a_ij ≥ a_il, i = 1, ..., m, and if, for at least one i (respectively, for all i), the strict inequality sign holds. In an optimal (mixed) strategy for P2, strictly dominated columns are also assigned a probability of zero. Mathematical verifications of these intuitively obvious assertions, as well as some other properties of strictly dominating strategies, can be found in Vorob'ev (1977).
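Iterated deletion of strictly dominated rows and columns can be sketched as follows (the helper name is ours; for strict dominance, the order of elimination does not affect the final reduced game):

```python
# Sketch: iterated elimination of strictly dominated rows (for the
# minimizing P1) and columns (for the maximizing P2) of a matrix game.

def eliminate_strictly_dominated(A):
    rows = list(range(len(A)))
    cols = list(range(len(A[0])))
    changed = True
    while changed:
        changed = False
        # row i strictly dominates row k (for P1) if a_ij < a_kj for all j
        for k in list(rows):
            if any(all(A[i][j] < A[k][j] for j in cols)
                   for i in rows if i != k):
                rows.remove(k); changed = True
        # column j strictly dominates column l (for P2) if a_ij > a_il for all i
        for l in list(cols):
            if any(all(A[i][j] > A[i][l] for i in rows)
                   for j in cols if j != l):
                cols.remove(l); changed = True
    return [[A[i][j] for j in cols] for i in rows]

reduced = eliminate_strictly_dominated([[1, 2], [3, 4]])
# row 1 strictly dominates row 2, then column 2 strictly dominates
# column 1, leaving the 1x1 game [[2]]
```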
We have thus seen that, in the computation of (pure or mixed) saddlepoint equilibria of matrix games, strictly dominated rows and columns can readily be
deleted, since they do not contribute to the optimal solution, and this could lead to considerable simplifications in the graphical solution or in the solution of the equivalent LP problems, since the matrix is now of smaller dimension(s). With (nonstrictly) dominated rows and columns, however, this may not always be so. Even though every saddle-point solution of a matrix game whose dominated rows and columns are deleted is also a saddle-point solution of the original matrix game, there might be other saddle points (of the original game) which are eliminated in this procedure. As a specific illustration of this possibility, consider a (2 x 2) matrix game of the form

            P2
   P1  [ 1   2* ]
       [ 0   2* ]

which admits two saddle points, as indicated. For P1, "row 2" is clearly dominating (but not strictly dominating), and for P2, "column 2" is strictly dominating. Thus, by eliminating dominated rows and columns, we end up with the (1 x 1) game corresponding to the dominating strategies of the players, which trivially has a unique saddle point. Motivated by this result, we call the saddle-point solutions of the "reduced" game which does not have any dominated rows or columns dominant saddle-point solutions. Even though the original game might have other saddle points, it may still be reasonable to confine attention only to dominant saddle points, since any game that possesses a saddle point also possesses (by definition) a dominant saddle point, and furthermore the saddle-point value (in pure or mixed strategies) is unique; in other words, there is nothing essential to gain in seeking saddle-point equilibria which are not dominant.
2.4 EXTENSIVE FORMS: SINGLE-ACT GAMES

There exist different mathematical descriptions for two-person zero-sum games with finite strategy sets, the matrix form (also known as the normal form) being one of these. A normal form does not in general provide the full picture of the underlying decision process, since it only describes the correspondences between different ordered strategy pairs and outcomes. Some important issues, like the order of play in the decision process, the information available to the players at the time of their decisions, and the evolution of the game in dynamic situations, are suppressed in the matrix description of a zero-sum game. An alternative to the normal form, which explicitly displays the dynamic
character of the problem, is known as the extensive form of a two-person zero-sum game. An extensive form basically involves a tree structure with several nodes and branches, providing an explicit description of the order of play and the information available to each player at the time of his decision(s); the game evolves from the top of the tree to the tip of one of its branches. Two such tree structures are depicted in Fig. 4, where in each case P1 has two alternatives (branches) to choose from, whereas P2 has three alternatives, and the order of play is such that P1 acts before P2 does. The numbers at the end of the lower branches represent the payoffs to P2 (or equivalently, the losses incurred by P1) if the corresponding paths are selected by the players.
Fig. 4. Two zero-sum games in extensive form, differing only in the information available to P2.
It should be noted that the two zero-sum games displayed in Figs 4(a) and 4(b) are equivalent in every aspect other than the information available to P2 at the time of his play, which is indicated on the tree diagrams by dotted lines enclosing an area (known as the information set) including the relevant nodes. In Fig. 4(a), the two possible nodes of P2 are included in the same dotted area, implying that, even though P1 acts before P2 does, P2 does not have access to his opponent's decision. That is to say, at the time of his play, P2 does not know at what node he really is. This is, of course, equivalent to the case when both players act simultaneously. Hence, simultaneous play can also be represented by a tree structure, provided that the relevant information set is chosen properly. The extensive form of Fig. 4(a) can now easily be converted into the normal form, with the equivalent matrix game being the one given by (21) in section 3. As has been discussed there, this matrix game admits a saddle-point solution in mixed strategies, which is (y_1* = 2/3, y_2* = 1/3; z_1* = 1/6, z_2* = 5/6, z_3* = 0), with the mixed saddle-point value being Vm = 8/3. The extensive form of Fig. 4(b), however, admits a different matrix game as its normal form and induces a different behaviour on the players. In this case, each node of P2 is included in a separate information set, thus implying that P2 has perfect information as to which branch of the tree P1 has chosen. Hence, if u¹ denotes the actual choice of P1 and γ²(u¹) denotes a strategy for P2, as a
maximizer P2's optimal choice will be

   γ²*(u¹) = { M,  if u¹ = L
             { R,  if u¹ = R.        (29a)
Not knowing this situation ahead of time, P1 definitely adopts the strategy

   γ¹* = L,        (29b)

with the equilibrium outcome of the game being 3, which is, in fact, the upper value of the matrix game (21). Thus, we have obtained the solution of the zero-sum game of Fig. 4(b) by directly making use of the extensive tree formulation. Let us now attempt to obtain the (same) saddle-point solution by transforming the extensive form into an equivalent normal form. To this end, we first delineate the possible strategies of the players. For P1, there are two possibilities:

   γ¹ = L   and   γ¹ = R.
For P2, however, since he observes the action of P1, there exist 3² = 9 possible strategies, which are

   γ²_1(u¹) = u¹,    γ²_2(u¹) = L,    γ²_3(u¹) = M,    γ²_4(u¹) = R,

   γ²_5(u¹) = { L, if u¹ = L      γ²_6(u¹) = { R, if u¹ = L
              { M, if u¹ = R                 { L, if u¹ = R

   γ²_7(u¹) = { M, if u¹ = L      γ²_8(u¹) = { M, if u¹ = L
              { R, if u¹ = R                 { L, if u¹ = R

   γ²_9(u¹) = { R, if u¹ = L
              { M, if u¹ = R

where the subscripts i = 1, ..., 9 denote a particular ordering (labelling) of the possible strategies of P2. Hence, the equivalent normal form of the zero-sum game of Fig. 4(b) is the 2 x 9 matrix game
                           P2
           1    2    3    4    5    6    7     8    9
   P1  L   1    1    3    0    1    0   (3)* (3)*   0
       R   7    6    2    7    2    6    7     6    2
which admits two saddle points, as indicated, with the dominant one being {L, column 7}. It is this dominant saddle-point solution that corresponds to the one given by (29), and thus we observe that the derivation outlined earlier, which directly utilizes the extensive tree structure, could cancel out dominated saddle points. This, however, does not lead to any real loss of generality, since, if a matrix game admits a pure-strategy saddle point, it also admits a dominant pure-strategy saddle point, and furthermore, the saddle-point value is unique (regardless of the number of pure-strategy equilibria). We should now note that the notion of "(pure) strategy" introduced above within the context of the zero-sum game of Fig. 4(b) is somewhat different from (and, in fact, more general than) the one adopted in section 2. This difference arises mainly because of the dynamic character of the decision problem of Fig. 4(b), P2 being in a position to know exactly how P1 acts. A (pure) strategy for P2, in this case, is a rule that tells him which one of his alternatives to choose for each possible action of P1. In mathematical language, it is a mapping from the collection of his information sets into the set of his alternatives. We thus see that there is a rather sharp distinction between a strategy and the actual action dictated by that strategy, unless there exists a single information set (which is the case in Fig. 4(a)). In the case of a single information set, the notions of strategy and action definitely coincide, and that is why we have used these two terms interchangeably in section 2 while discussing matrix games with pure-strategy solutions. We are now in a position to give precise definitions of some of the concepts introduced above.
DEFINITION 5  Extensive form of a two-person zero-sum finite game without chance moves is a finite tree structure with
(i) a specific vertex indicating the starting point of the game,
(ii) a pay-off function assigning a real number to each terminal vertex of the tree, which determines the pay-off (respectively, loss) to P2 (respectively, P1),
(iii) a partition of the nodes of the tree into two player sets (to be denoted by 𝒩¹ and 𝒩² for P1 and P2, respectively),
(iv) a sub-partition of each player set 𝒩ⁱ into information sets {ηⁱⱼ}, such that the same number of immediate branches emanate from every node belonging to the same information set, and no node follows another node in the same information set.† □

† A tree structure in which a node follows another node in the same information set can be reduced to another tree structure that does not exhibit such a feature.

Remark 2  This definition of an extensive form covers more general types of
two-person zero-sum games than the ones discussed heretofore, since it also allows a player to act more than once in the decision process, with the information sets being different at different levels of play. Such games are known as multi-act games, and they will be discussed in section 5. In the remaining parts of this section we shall confine our analysis to single-act games: games in which each player acts only once. □

Remark 3  It is possible to extend Def. 5 so that it also allows for chance moves by a third party called "Nature". Such an extension will be incorporated in section 6. In the present section, and also in section 5, we only consider the class of zero-sum finite games which do not incorporate any chance moves. □
DEFINITION 6  Let Nⁱ denote the class of all information sets of Pi, with a typical element designated as ηⁱ. Let Uⁱ_{ηⁱ} denote the set of alternatives of Pi at the nodes belonging to the information set ηⁱ. Define Uⁱ = ∪ Uⁱ_{ηⁱ}, where the union is over ηⁱ ∈ Nⁱ. Then, a strategy γⁱ for Pi is a mapping from Nⁱ into Uⁱ, assigning one element in Uⁱ for each set in Nⁱ, and with the further property that γⁱ(ηⁱ) ∈ Uⁱ_{ηⁱ} for each ηⁱ ∈ Nⁱ. The set of all strategies for Pi is called his strategy set (space), and it is denoted by Γⁱ. □

Remark 4  For the starting player, it is convenient to take Nⁱ to be a singleton, to eliminate any possible ambiguity. □

Example 1  In the extensive form of Fig. 4(b), P2 has two information sets (N² = {(node 1), (node 2)}) and three alternatives (U² = {L, M, R}). Therefore, he has nine possible strategies, which have actually been listed earlier during the normalization phase of this extensive form. Here, by an abuse of notation, η² = {u¹}. □

DEFINITION 7  A strategy γⁱ(·) of Pi is said to be constant if its value does not depend on its argument ηⁱ. □
Saddle-point equilibria of single-act games

The pure-strategy equilibrium solution of single-act two-person zero-sum games in extensive form can be defined in a way analogous to Def. 1, with appropriate notational modifications. To this end, let J(γ¹, γ²) denote the numerical outcome of a game in extensive form, interpreted as the loss incurred to P1 when P1 and P2 employ the strategies γ¹ ∈ Γ¹ and γ² ∈ Γ², respectively. This loss function in fact defines the correspondences between different ordered strategy pairs and outcomes, and thus describes the equivalent normal form of the zero-sum game. Then, we have
DEFINITION 8  A pair of strategies {γ¹* ∈ Γ¹, γ²* ∈ Γ²} is in saddle-point equilibrium if the following set of inequalities is satisfied for all γ¹ ∈ Γ¹, γ² ∈ Γ²:

J(γ¹*, γ²) ≤ J(γ¹*, γ²*) ≤ J(γ¹, γ²*).    (30)
The quantity J(γ¹*, γ²*) is known as the saddle-point value of the zero-sum game. □

Since saddle-point equilibrium is defined in terms of the normal form of a game, a direct method for solving two-person single-act games in extensive form would be first to convert them into equivalent normal form and solve for the saddle-point solution of the resulting matrix game, and then to interpret this solution in terms of the original game through the one-to-one correspondences that exist between the rows and columns of the matrix and the strategies of the players. This method has already been illustrated in this section in conjunction with the solution of the zero-sum game of Fig. 4(b), and as we have observed there, a major disadvantage of such a direct approach is that one has to list all possible strategies of the players and consider a rather high-dimensional matrix game, especially when the second-acting player has several information sets. An alternative to this approach exists, which makes direct use of the extensive form description of the zero-sum game. This, in fact, is a recursive procedure, and it obtains the solution without necessarily considering all strategy combinations, especially when the second-acting player has more than one information set. The strategies not considered in this procedure are the dominated ones, and this has actually been observed earlier in this section when we employed the technique to arrive at the equilibrium strategy pair (29a)-(29b) for the zero-sum game of Fig. 4(b). Before providing a general outline of the steps involved in this derivation, let us first consider another, somewhat more elaborate, example, which is the zero-sum game whose extensive form is depicted in Fig. 5 below.

To determine the saddle-point strategies associated with this zero-sum game, we first note that if u¹ = R, then P2 should choose γ²(R) = L, to incur a loss of 2 to P1. If P1 picks L or M, however, P2 cannot differentiate between
Fig. 5. An extensive form that admits a saddle-point equilibrium in pure strategies.
these two, since they belong to the same information set; and hence now a matrix game has to be solved for that part of the tree. The equivalent normal form is the 2 x 2 matrix game whose rows are the choices L, M of P1 and whose columns are the choices L, R of P2, and it clearly admits the unique saddle-point solution u¹ = M, u² = R, yielding a cost of 1 for P1. Hence, in the actual play of the game, P1 will always play u¹ = γ¹ = M, and P2's optimal response will be u² = R. However, the saddle-point strategy of P2 is
γ²*(η²) = { L, if u¹ = R
          { R, otherwise                    (31)
and not the constant strategy γ²(u¹) ≡ R, since if P2 adopts this constant strategy, then P1 can switch to R to have a loss of 0 instead of 1. The strategy for P1 that is in equilibrium with (31) is still

γ¹* = M,    (32)

since N¹ is a singleton. The reader can now easily check that the pair (31)-(32) satisfies the saddle-point inequalities (30).

The preceding analysis now readily suggests a method of obtaining the saddle-point solution of single-act zero-sum games in extensive form, provided that it exists. The steps involved are as follows.
Derivation of pure-strategy saddle-point solutions of single-act games in extensive form
(1) For each information set of the second-acting player, solve the corresponding matrix game (assuming that each such matrix game admits a pure-strategy saddle point). For the case when an information set is a singleton, the matrix game involved is a degenerate one with one of the players having a single choice; but such (essentially one-person) games always admit well-defined pure-strategy solutions.
(2) Note down the saddle-point value of each of these matrix games. If P1 is the starting player, then the lowest of these is the saddle-point value of the extensive form; if P2 is the starting player, then the highest one is.
(3) The saddle-point strategy of the first-acting player is his saddle-point strategy obtained for the matrix game whose saddle-point value corresponds to the saddle-point value of the extensive game.
(4) For the second-acting player, the saddle-point strategy is comprised of his saddle-point solutions in all the matrix games considered, by appropriately identifying them with each of his information sets. □

We leave it to the reader to verify that any pair of equilibrium strategies obtained by following the preceding procedure does, indeed, constitute a saddle-point strategy pair that satisfies (30). The following conclusions can now readily be drawn.

PROPOSITION 1  A single-act zero-sum two-person finite game in extensive form admits a (pure-strategy) saddle-point solution if, and only if, each matrix game corresponding to the information sets of the second-acting player has a saddle point in pure strategies. □

PROPOSITION 2  Every single-act zero-sum two-person finite game in extensive form, in which the information sets of the second-acting player are singletons,† admits a pure-strategy saddle-point solution. □

† Such games are also known as perfect information games.
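The four steps can be rendered in a few lines of code. The sketch below is ours, under an assumed encoding: P1 acts first, `outcomes[u1][u2]` is his loss, and P2's information sets are given as a partition of P1's moves; each restricted matrix game is assumed to admit a pure saddle point (cf. Prop. 1).

```python
def solve_single_act(outcomes, info_sets):
    """Steps 1-4 for a single-act game with P1 acting first (P1 minimizes).
    Returns (value, P1's saddle-point action, P2's strategy per info set)."""
    best, strategy = None, {}
    for eta in info_sets:                       # step 1: one matrix game per information set
        rows = sorted(eta)
        for r in rows:
            for c in outcomes[r]:
                a = outcomes[r][c]
                # pure saddle point: maximum of its row, minimum of its column
                if a == max(outcomes[r].values()) and a == min(outcomes[i][c] for i in rows):
                    strategy[frozenset(eta)] = c        # step 4: P2's piecewise strategy
                    if best is None or a < best[0]:     # step 2: P1 starts, so keep the lowest value
                        best = (a, r)                   # step 3: P1's strategy
    return best[0], best[1], strategy

# Hypothetical terminal losses in the spirit of Fig. 5: P2 observes whether
# P1 played R, but cannot distinguish L from M.
outcomes = {'L': {'L': 3, 'R': 1}, 'M': {'L': 0, 'R': 1}, 'R': {'L': 2, 'R': 0}}
print(solve_single_act(outcomes, [{'L', 'M'}, {'R'}]))
# value 1 with u1* = M; P2 plays R on the set {L, M} and L on {R}
```

This mirrors the Fig. 5 discussion: the loss of 2 available at the singleton information set {R} is rejected in favour of the value 1 of the {L, M} matrix game. The numbers above are illustrative only.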
If the matrix game corresponding to an information set does not admit a saddle-point solution in pure strategies, then it is clear that the strategy spaces of the players have to be enlarged, in a way similar to the introduction of mixed strategies in section 2 within the context of matrix games. To pave the way for such an enlargement in the case of games in extensive form, let us first consider the single-act two-person zero-sum game whose extensive form is depicted in Fig. 6.

Fig. 6. An extensive form that admits a mixed (behavioural) strategy saddle-point solution.
Here, P2 is the second-acting player and he has two information sets. If his opponent picks R, then he can observe that choice, and his best response (strategy) in that case is L, yielding an outcome of 1. If P1 chooses L or M, however, then P2 is ignorant of that choice, since both of these nodes are
included in the same information set of P2. The relevant matrix game, in this case, is the 2 x 2 game with rows L, M for P1 and columns L, R for P2, which is the matrix game (20) considered earlier in section 3, and it is known to admit the mixed saddle-point solution (y₁* = 1/2, y₂* = 1/2) for P1 and (z₁* = 1/2, z₂* = 1/2) for P2, yielding an average outcome of 1/2. Since 1/2 < 1, it is clear that P1 will prefer to stay on that part of the tree and thus stick to the strategy
γ̂¹* = { L  w.p. 1/2
      { M  w.p. 1/2
      { R  w.p. 0                             (33a)
which is "mixed" in nature. P2's equilibrium strategy will then be A2·
Y
=
L
L
{ R
w.p. 1 if u 1 = R w
.p.
I} otherwise 1
(33b)
W.p.s
since he also has to consider the possibility of P1 playing R (by definition of a strategy). The pair (33) now yields the average outcome of 1/2.

In order to declare (33) as the equilibrium solution pair of the extensive form of Fig. 6, we have to specify the spaces in which it possesses such a property. To this end, we first note that P1 has three possible pure strategies, γ₁¹ = L, γ₂¹ = M and γ₃¹ = R, and in view of this, (33a) is indeed a mixed strategy for P1 (see Def. 2). For P2, however, there exist four possible pure strategies, namely

γ₁²(u¹) ≡ L,    γ₂²(u¹) ≡ R,
γ₃²(u¹) = { L, if u¹ = R; R, otherwise },    γ₄²(u¹) = { R, if u¹ = R; L, otherwise }.

Hence, a mixed strategy for P2 is a probability law according to which these four strategies will be "mixed" during the play of the game, and (33b) provides
one such probability distribution, which assigns a weight of 1/2 to the first pure strategy and a weight of 1/2 to the third one. We have thus verified that both (33a) and (33b) are well-defined mixed strategies for P1 and P2, respectively. Now, if J̄(γ̂¹, γ̂²) denotes the average outcome of the game when the mixed strategies γ̂¹ and γ̂² are adopted by P1 and P2, respectively, the reader can easily check that the pair (33) indeed satisfies the relevant set of saddle-point inequalities

J̄(γ̂¹*, γ̂²) ≤ J̄(γ̂¹*, γ̂²*) ≤ J̄(γ̂¹, γ̂²*)

and thus constitutes a saddle-point solution within the class of mixed strategies.

However, the class of mixed strategies is, in general, an unnecessarily large class for the second-acting player. In the extensive form of Fig. 6, for example, since P2 can tell exactly whether P1 has played R or not, there is no reason why he should mix between his possible actions in case u¹ = R and his actions otherwise. That is, P2 can do equally well by considering probability distributions only on the alternatives belonging to the information set including two nodes, and (33b) is, in fact, that kind of a strategy. Such strategies are known as behavioural strategies, a precise definition of which is given below.

DEFINITION 9†  For a given two-person zero-sum finite game in extensive form, let Y_{η¹} denote the set of all probability distributions on U¹_{η¹}, where the latter is the set of all alternatives of P1 at the nodes belonging to the information set η¹. Analogously, let Z_{η²} denote the set of all probability distributions on U²_{η²}. Further define Y = ∪_{η¹ ∈ N¹} Y_{η¹}, Z = ∪_{η² ∈ N²} Z_{η²}. Then, a behavioural strategy γ̂¹ for P1 is a mapping from the class of all his information sets (N¹) into Y, assigning one element in Y for each set in N¹, such that γ̂¹(η¹) ∈ Y_{η¹} for each η¹ ∈ N¹. A typical behavioural strategy γ̂² for P2 is defined, analogously, as a restricted mapping from N² into Z. The set of all behavioural strategies for Pi is called his behavioural strategy set, and it is denoted by Γ̂ⁱ. □

† This definition is valid not only for single-act games, but also for multi-act games, which are treated in section 5.
Remark 5  Every behavioural strategy is a mixed strategy, but every mixed strategy is not necessarily a behavioural strategy, unless the player has a single information set. □

For games in extensive form, the concept of a saddle-point equilibrium in behavioural strategies can now be introduced as in Def. 8, but with some slight
obvious modifications. If J̄(γ̂¹, γ̂²) denotes the average outcome of the game resulting from the strategy pair {γ̂¹ ∈ Γ̂¹, γ̂² ∈ Γ̂²}, then

DEFINITION 10  A pair of strategies {γ̂¹* ∈ Γ̂¹, γ̂²* ∈ Γ̂²} is said to constitute a saddle point in behavioural strategies for a zero-sum game in extensive form, if the set of inequalities

J̄(γ̂¹*, γ̂²) ≤ J̄(γ̂¹*, γ̂²*) ≤ J̄(γ̂¹, γ̂²*)    (34)

is satisfied for all γ̂¹ ∈ Γ̂¹, γ̂² ∈ Γ̂². The quantity J̄(γ̂¹*, γ̂²*) is known as the saddle-point value of the game in behavioural strategies. □

For a given single-act zero-sum game in extensive form, derivation of a behavioural-strategy saddle-point solution basically follows the four steps outlined earlier for pure-strategy saddle points, with the only difference being that this time mixed equilibrium solutions of the matrix games are also allowed in this process. The reader should note that this routine extension has already been illustrated within the context of the extensive form of Fig. 6. Since every matrix game admits a saddle point in mixed strategies (cf. Corollary 3), and further, since the above derivation involves only a finite number of comparisons, it readily follows that every single-act zero-sum game in extensive form admits a saddle point in behavioural (or, equivalently, mixed) strategies. We thus conclude this section with a precise statement of this property, and by recording another apparent feature of the saddle-point solution in extensive games.

COROLLARY 4
(i) In every single-act two-person zero-sum finite game, there exists at least one saddle point in behavioural strategies;
(ii) if there exist more than one saddle-point solution in behavioural strategies, then they possess the ordered interchangeability property; and
(iii) every saddle point in behavioural strategies is also a saddle point in the larger class of mixed strategies. □
We should note, before concluding this section, that a finite single-act game in extensive form might also admit a mixed-strategy saddle point (which is not behavioural) in addition to a behavioural saddle point. The corresponding average saddle-point values, however, will all be the same, because of the interchangeability property, and therefore we may confine our attention to behavioural strategies only, also in view of Corollary 4(i).
2.5 EXTENSIVE FORMS: MULTI-ACT GAMES

Zero-sum games in which at least one player is allowed to act more than once, and with possibly different information sets at each level of play, are known as multi-act zero-sum games.† For an extensive formulation of such games, and in accordance with Def. 5, we consider the case when the number of alternatives available to a player at each information set is finite. This leads to a finite tree structure, incorporating possibly different information sets for each player at each level of play. A (pure) strategy of a player is again defined as a restricted mapping from his collection of information sets to the finite set of all his alternatives, exactly as in Def. 6, and the concept of a saddle-point solution then follows the lines of Def. 8. This general framework, however, suppresses the dynamic nature of the decision process and is not constructive as far as the saddle-point solution is concerned. An alternative to this set-up exists, which brings out the dynamic nature of the problem, for an important class of multi-act zero-sum games known as "feedback games", a precise definition of which is given below.

DEFINITION 11  A multi-act two-person zero-sum game in extensive form is called a two-person zero-sum feedback game in extensive form, if
(i) at the time of his act, each player has perfect information concerning the current level of play, i.e. no information set contains nodes of the tree belonging to different levels of play,
(ii) information sets of the first-acting player at every level of play are singletons,‡ and the information sets of the second-acting player at every level of play are such that none of them includes nodes corresponding to branches emanating from two or more different information sets of the other player, i.e. each player knows the state of the game at every level of play. □

† Such games are also referred to as "multi-stage zero-sum games" in the literature. They may also be considered as "dynamic games", since the information sets of a player do, in general, have a dynamic character, providing information concerning the past actions of the players. In this respect, the single-act games of the previous section may also be referred to as "dynamic games" if the second-acting player has at least two information sets.
‡ The node corresponding to each such information set is known as the "state" of the game at that level (stage) of play.

Figure 7 displays two multi-act zero-sum games in extensive form which are not feedback games, since the first one violates (i) above, and the second one violates (ii). The one displayed in Fig. 8, however, is a feedback game. Now, let the number of levels of play in a zero-sum feedback game be K. Then, a typical strategy γⁱ of Pi in such a game can be considered as composed of K components (γ₁ⁱ, …, γ_Kⁱ), where γⱼⁱ stands for the corresponding strategy
Fig. 7. Two multi-act zero-sum games which are not feedback games.

Fig. 8. A zero-sum feedback game in extensive form.
of Pi at his jth level of act. Moreover, because of the nature of a feedback game, γⱼⁱ can be taken, without any loss of generality, to have as its domain only those information sets of Pi which pertain to the jth level of play. Let us denote the collection of all such strategies for Pi at level j by Γⱼⁱ. Then, we can rewrite the saddle-point inequality (30), for a feedback game in extensive form, as

J(γ₁¹*, …, γ_K¹*; γ₁², …, γ_K²) ≤ J* ≜ J(γ₁¹*, …, γ_K¹*; γ₁²*, …, γ_K²*) ≤ J(γ₁¹, …, γ_K¹; γ₁²*, …, γ_K²*),    (35)

which is to be satisfied for all γⱼⁱ ∈ Γⱼⁱ, i = 1, 2; j = 1, …, K. Such a decomposition of the strategies of the players now allows a recursive procedure for the determination of the saddle-point solution {γ¹*, γ²*} of the feedback game, if we impose some further restrictions on the pair {γ¹*, γ²*}. Namely, let the pair {γ¹*, γ²*} also satisfy (recursively) the following set of K pairs of inequalities for all γⱼⁱ ∈ Γⱼⁱ, i = 1, 2; j = 1, …, K:
J(γ₁¹, …, γ_{K−1}¹, γ_K¹*; γ₁², …, γ_K²) ≤ J(γ₁¹, …, γ_{K−1}¹, γ_K¹*; γ₁², …, γ_{K−1}², γ_K²*) ≤ J(γ₁¹, …, γ_K¹; γ₁², …, γ_{K−1}², γ_K²*),

J(γ₁¹, …, γ_{K−2}¹, γ_{K−1}¹*, γ_K¹*; γ₁², …, γ_{K−1}², γ_K²*) ≤ J(γ₁¹, …, γ_{K−2}¹, γ_{K−1}¹*, γ_K¹*; γ₁², …, γ_{K−2}², γ_{K−1}²*, γ_K²*) ≤ J(γ₁¹, …, γ_{K−1}¹, γ_K¹*; γ₁², …, γ_{K−2}², γ_{K−1}²*, γ_K²*),

  ⋮

J(γ₁¹*, γ₂¹*, …, γ_K¹*; γ₁², γ₂²*, …, γ_K²*) ≤ J(γ₁¹*, …, γ_K¹*; γ₁²*, …, γ_K²*) ≤ J(γ₁¹, γ₂¹*, …, γ_K¹*; γ₁²*, …, γ_K²*).    (36)
DEFINITION 12  For a two-person zero-sum feedback game in extensive form with K levels of play, let {γ¹*, γ²*} be a pair of strategies satisfying (35) and (36) for all γⱼⁱ ∈ Γⱼⁱ, i = 1, 2; j = 1, …, K. Then, {γ¹*, γ²*} is said to constitute a pair of (pure) feedback saddle-point strategies for the feedback game. □
PROPOSITION 3  Every pair {γ¹*, γ²*} that satisfies the set of inequalities (36) also satisfies the pair of saddle-point inequalities (35). Hence, the requirement for satisfaction of (35) in Def. 12 is redundant, and it can be deleted without any loss of generality.

Proof  Replace γ₁¹, …, γ_{K−1}¹ on the LHS inequalities of the set (36) by γ₁¹*, …, γ_{K−1}¹*, respectively. Then, the LHS inequalities of (36) read

J(γ₁¹*, …, γ_K¹*; γ₁², …, γ_K²) ≤ J(γ₁¹*, …, γ_K¹*; γ₁², …, γ_{K−1}², γ_K²*)
J(γ₁¹*, …, γ_K¹*; γ₁², …, γ_{K−1}², γ_K²*) ≤ J(γ₁¹*, …, γ_K¹*; γ₁², …, γ_{K−2}², γ_{K−1}²*, γ_K²*)
  ⋮
J(γ₁¹*, …, γ_K¹*; γ₁², γ₂²*, …, γ_K²*) ≤ J(γ₁¹*, …, γ_K¹*; γ₁²*, …, γ_K²*)

for all γⱼ² ∈ Γⱼ², j = 1, …, K. But this set of inequalities reduces to the single inequality

J(γ₁¹*, …, γ_K¹*; γ₁², …, γ_K²) ≤ J(γ₁¹*, …, γ_K¹*; γ₁²*, …, γ_K²*),

since they characterize the recursive dynamic programming conditions for a one-person maximization problem (Bellman, 1957). This then proves the validity of the LHS inequality of (35) under satisfaction of (36). Validity of the RHS inequality of (35) can likewise be verified by considering the RHS inequalities of (36). □

The appealing feature of a feedback saddle-point solution is that it can be computed recursively, by solving a number of static games at each level of play. Details of this recursive procedure readily follow from the set of inequalities (36) and the specific structure of the information sets of the game, as described below.

A recursive procedure to determine the (pure) feedback saddle-point strategies of a feedback game
(1) Starting at the last level of play, K, solve each single-act game corresponding to the information sets of the first-acting player at that level K. The resulting (pure) saddle-point strategies {γ_K¹*, γ_K²*} are the saddle-point strategies sought for the feedback game at level K. Record the value of each single-act game corresponding to each of the information sets of the first-acting player, and assign these values to the
corresponding nodes (states) of the extensive form at level K.
(2) Cross out the Kth level of play, and consider the resulting (K−1)-level feedback game. Now solve the last-level single-act games of that extensive form, with the resulting saddle-point strategies denoted by {γ_{K−1}¹*, γ_{K−1}²*}, and repeat the remaining steps of step 1 with K replaced by K−1.
  ⋮
(K) Cross out the 2nd level of play, and consider the resulting single-act game in extensive form. Denote its saddle-point strategies by {γ₁¹*, γ₁²*}, and its value by V. Then, the original feedback game in extensive form admits a saddle-point solution {(γ₁¹*, …, γ_K¹*), (γ₁²*, …, γ_K²*)} and has value J* = V. □
Example 2  As an illustration of the above procedure, let us consider the 2-level feedback game whose extensive form is depicted in Fig. 8. At the last (2nd) level, there are four single-act games to be solved, which are all equivalent to matrix games. From left to right (with rows u₂¹ = L, R for P1 and columns u₂² = L, R for P2), we have the normal forms

   ①  0        2   3        ②  1        ①  0
   2  3        ⓪  −1        3  4        2  2

with the encircled entries indicating the location of the saddle-point solution in each case. The saddle-point strategies γ₂¹* and γ₂²* are thus given by

γ₂¹*(η₂¹) = { R, if u₁¹ = L, u₁² = R
            { L, otherwise;           γ₂²*(η₂²) = L,     (37a)

where uⱼⁱ denotes the choice of Pi at level j. It is noteworthy that the saddle-point strategy of P2 is a constant at this level of play. Now, crossing out the 2nd level of play, but retaining the encircled quantities in the four matrix games as costs attached to the four nodes, we end up with the equivalent single-act game
in extensive form, with the costs 1, 0, 2, 1 attached (from left to right) to its terminal nodes, whose normal form is

P2:       L    R
P1:  L    1    0
     R    2    1

admitting the unique saddle-point solution
yi" = L,
y1* = L,
(37b)
and with the value being V = 1, which is also the value of the initial 2level feedback game. The feedback saddlepoint strategies of this feedback game are now given by (37), with the actual play dictated by them being u~
= u1 =
ui = u~ = L.
(38)
o
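Example 2 can be replayed mechanically. In the sketch below (our own encoding: 0 stands for L and 1 for R, and the sixteen terminal losses are our reading of the Fig. 8 tree), the four level-2 matrix games are solved first, their values become the terminal costs of the level-1 game, and the overall value 1 emerges:

```python
def pure_saddle(A):
    """First pure saddle point of a matrix game (P1: rows, minimizer;
    P2: columns, maximizer); returns (row, col, value)."""
    for r, row in enumerate(A):
        for c, a in enumerate(row):
            if a == max(row) and a == min(A[i][c] for i in range(len(A))):
                return r, c, a
    return None

# Terminal losses J[u11][u12][u21][u22] of the Fig. 8 tree (0 = L, 1 = R),
# as we read them; e.g. J[0][0][0][0] = 1 is the all-L path.
vals = [1, 0, 2, 3,   2, 3, 0, -1,   2, 1, 3, 4,   1, 0, 2, 2]
it = iter(vals)
J = [[[[next(it) for _ in (0, 1)] for _ in (0, 1)] for _ in (0, 1)] for _ in (0, 1)]

# Step 1: solve the four level-2 games, one per level-1 history (u11, u12).
level1 = [[pure_saddle(J[a][b])[2] for b in (0, 1)] for a in (0, 1)]

# Step 2 (= step K): solve the equivalent level-1 single-act game.
r, c, V = pure_saddle(level1)
print(level1, V, (r, c))   # level-1 game [[1, 0], [2, 1]], value 1, both play L
```

The recovered level-1 game and value V = 1 agree with (37b)-(38); replacing two terminal entries as in Example 3 (J[1][0][0][0] = 0 and J[1][0][0][1] = -1) leaves the level-1 game without a pure saddle point, which is exactly where behavioural strategies enter.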
If a feedback game in extensive form does not admit a feedback saddle point in pure strategies, then it could still admit a saddle point in accordance with inequalities (35). If that is not possible either, then one has to extend the strategy spaces of the players so as to include also behavioural strategies. The definition of a behavioural strategy in multi-act games is precisely the one given by Def. 9, and the relevant saddle-point inequality is still (34), of course under the right type of interpretation. Now, in a zero-sum feedback game with K levels of play, a typical behavioural strategy γ̂ⁱ of Pi can also be considered as composed of K components (γ̂₁ⁱ, …, γ̂_Kⁱ), where γ̂ⱼⁱ stands for the corresponding behavioural strategy of Pi at his jth level of act (a feature that follows readily from Defs 9 and 11). By the same token, γ̂ⱼⁱ can be taken, without any loss of generality, to have as its domain only those information sets of Pi which pertain to the jth level of play. Let us now denote the collection of all such behavioural strategies for Pi at level j by Γ̂ⱼⁱ. Then, since the saddle-point inequality (34) corresponding to a K-level feedback game can equivalently be written as

J̄(γ̂₁¹*, …, γ̂_K¹*; γ̂₁², …, γ̂_K²) ≤ J̄* ≜ J̄(γ̂₁¹*, …, γ̂_K¹*; γ̂₁²*, …, γ̂_K²*) ≤ J̄(γ̂₁¹, …, γ̂_K¹; γ̂₁²*, …, γ̂_K²*),  ∀ γ̂ⱼⁱ ∈ Γ̂ⱼⁱ, i = 1, 2; j = 1, …, K,    (39)

it follows that a recursive procedure can be used to determine the behavioural saddle-point strategies if we further require satisfaction of a set of inequalities analogous to (36). Such a recursive procedure basically follows the K steps outlined in this section within the context of pure feedback saddle-point solutions, with the only difference being that now behavioural strategies are also allowed in the equilibrium solutions of the single-act games involved. Since this is a routine extension, we do not provide details of this recursive procedure here, but only illustrate it in the sequel in Example 3.
It is worth noting at this point that since every single-act two-person zero-sum game admits a saddle-point solution in behavioural strategies (cf. Corollary 4), every feedback game will also admit a saddle-point solution in behavioural strategies. More precisely,

PROPOSITION 4  Every two-person zero-sum feedback game, which has an extensive form comprised of a finite number of branches, admits a saddle point in behavioural strategies. □
Example 3  As an illustration of the derivation of a behavioural saddle point in feedback games, let us reconsider the game of Fig. 8 with two modifications: the outcomes corresponding to the paths {u₁¹ = R, u₁² = L, u₂¹ = L, u₂² = L} and {u₁¹ = R, u₁² = L, u₂¹ = L, u₂² = R} are now taken as 0 and −1, respectively. Then, the four single-act games to be solved at the last (2nd) level are equivalent to the matrix games (from left to right)

   ①  0        2   3        ⓪  −1        ①  0
   2  3        ⓪  −1        3   4        2  2
which all admit pure-strategy solutions, as indicated. The saddle-point strategies at this level are, in fact, as given by (37a). Now, crossing out the 2nd level of play, but retaining the encircled quantities in the four matrix games as costs attached to the four nodes, we end up with the equivalent single-act game, with the costs 1, 0, 0, 1 attached (from left to right) to its terminal nodes, whose normal form

P2:       L    R
P1:  L    1    0
     R    0    1

admits a mixed saddle-point solution:
γ̂₁¹* = { L  w.p. 1/2          γ̂₁²* = { L  w.p. 1/2
       { R  w.p. 1/2,                { R  w.p. 1/2.       (40)
Furthermore, the average saddle-point value is Vₘ = 1/2. Since, within the class of behavioural strategies at level 2, the pair (37a) can be written as

γ̂₂¹* = { L w.p. 0, R w.p. 1,  if u₁¹ = L, u₁² = R          γ̂₂²* = { L  w.p. 1
       { L w.p. 1, R w.p. 0,  otherwise,                          { R  w.p. 0,      (41)

(40) and (41) now constitute a set of behavioural saddle-point strategies for the feedback game under consideration, with a behavioural saddle-point value of J̄* = 1/2. □
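The level-1 computation of Example 3 reduces to the mixed solution of a 2 x 2 matrix game; for a game without a pure saddle point, the classical equalizer formulas apply. A minimal sketch (our own helper; it assumes no pure saddle point exists, so the denominator is non-zero):

```python
def mixed_2x2(A):
    """Mixed saddle point of a 2x2 matrix game with no pure saddle point
    (P1 minimizes over rows, P2 maximizes over columns).
    Returns (p, q, value): p = prob. of row 0, q = prob. of column 0."""
    (a, b), (c, d) = A
    den = a - b - c + d
    p = (d - c) / den            # makes P1's expected loss equal across both columns
    q = (d - b) / den            # makes P2's expected pay-off equal across both rows
    return p, q, (a * d - b * c) / den

print(mixed_2x2([[1, 0], [0, 1]]))   # -> (0.5, 0.5, 0.5)
```

For the level-1 game [[1, 0], [0, 1]] both players mix half-and-half and the average outcome is 1/2, reproducing (40) and the behavioural saddle-point value J̄* = 1/2.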
For multi-act games which are not feedback games, neither Prop. 4 nor the recursive procedures discussed in this section are, in general, valid, and there is no general procedure that would aid in the derivation of pure or behavioural saddle-point solutions, even if they exist. If we allow for mixed-strategy solutions, however, it directly follows from Corollary 3 that an equilibrium solution exists, since every multi-act zero-sum finite game in extensive form has an equivalent normal form, which is basically a matrix game with a finite number of rows and columns. More precisely,

PROPOSITION 5  Every two-person zero-sum multi-act finite game in extensive form admits a saddle-point solution in mixed strategies. □
The following example now illustrates the derivation of mixed-strategy saddle-point equilibria in multi-act games, which basically involves the solution of an appropriate matrix game.

Example 4  Let us reconsider the extensive form of Fig. 8, but with a single information set for each player at each level of play, as depicted in Fig. 9.

Fig. 9. A zero-sum multi-act game that does not admit a behavioural saddle-point solution.
The players each have four possible ordered choices (LL, LR, RL, RR), and thus its equivalent normal form is

                     P2
           LL    LR    RL    RR
     LL     1     0     2     3
P1   LR     2     3     0    −1
     RL     2     1     1     0
     RR     3     4     2     2
which admits the mixed saddle-point solution

γ¹* = { LL  w.p. 3/5          (42a)
      { LR  w.p. 2/5

γ²* = { LL  w.p. 4/5          (42b)
      { RR  w.p. 1/5

with the saddle-point value in mixed strategies being Vₘ = 7/5. (The reader can check this result by utilizing Thm 5.) □

Even though every zero-sum multi-act game in extensive form admits a saddle point in mixed strategies, such a result is no longer valid within the class of behavioural strategies, unless the mixed saddle-point strategies happen to be behavioural strategies. A precise version of this property is given below in Prop. 6.

PROPOSITION 6  Every saddle-point equilibrium of a zero-sum multi-act game in behavioural strategies also constitutes a saddle-point equilibrium in the larger class of mixed strategies for both players.
Proof  Assume, to the contrary, that the behavioural saddle-point value (J̄*) is not equal to the mixed saddle-point value (Jₘ), where the latter always exists by Prop. 5, and consider the case J̄* < Jₘ. This implies that P1 does better with his behavioural saddle-point strategy than with his mixed saddle-point strategy. But this is not possible, since the set of mixed strategies is a much larger class. Hence, we can only have J̄* > Jₘ. Now, repeating the same argument on this strict inequality from P2's point of view, it follows that only the equality J̄* = Jₘ can hold. □
determine all mixed saddle-point strategies of the equivalent normal form, and then investigate whether there exists, in that solution set, a pair of behavioural strategies. If we apply this method to the extensive form of Example 4, we first note that it admits a unique saddle-point solution in mixed strategies, which is the one given by (42). Now, since every behavioural strategy for P1 is of the form
{ LL  w.p. y₁y₂
{ LR  w.p. y₁(1 − y₂)
{ RL  w.p. (1 − y₁)y₂
{ RR  w.p. (1 − y₁)(1 − y₂)

for some 0 ≤ y₁ ≤ 1, 0 ≤ y₂ ≤ 1,† and every behavioural strategy for P2 is given by

{ LL  w.p. z₁z₂
{ LR  w.p. z₁(1 − z₂)
{ RL  w.p. (1 − z₁)z₂
{ RR  w.p. (1 − z₁)(1 − z₂)
for some 0 ≤ z₁ ≤ 1, 0 ≤ z₂ ≤ 1, it follows (by picking y₁ = 1, y₂ = 3/5) that (42a) is indeed a behavioural strategy for P1; however, (42b) is not a behavioural strategy for P2. Hence the conclusion is that the extensive form of Fig. 9 does not admit a saddle point in behavioural strategies, from which we deduce the following corollary to Prop. 6.

COROLLARY 5  A zero-sum multi-act game does not necessarily admit a saddle-point equilibrium in behavioural strategies, unless it is a feedback game. □
COROLLARY
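The factorization test underlying this method — checking whether a mixed strategy over the four pure strategies decomposes into independent level-by-level randomizations — can be sketched in code. The function name and tolerance below are illustrative choices, not from the text:

```python
def factor_as_behavioural(p, tol=1e-9):
    """Attempt to factor a mixed strategy p = (pLL, pLR, pRL, pRR) into
    independent level-by-level randomizations y1 = P(L at level 1) and
    y2 = P(L at level 2); return (y1, y2) if p factors, else None."""
    pLL, pLR, pRL, pRR = p
    y1 = pLL + pLR                    # marginal probability of L at level 1
    y2 = pLL + pRL                    # marginal probability of L at level 2
    product_form = (y1 * y2, y1 * (1 - y2), (1 - y1) * y2, (1 - y1) * (1 - y2))
    if all(abs(a - b) <= tol for a, b in zip(p, product_form)):
        return y1, y2
    return None

# The mixture generated by y1 = 1, y2 = 3/5 is behavioural:
print(factor_as_behavioural((0.6, 0.4, 0.0, 0.0)))   # (1.0, 0.6)
# A correlated mixture over LL and RR alone is not:
print(factor_as_behavioural((0.5, 0.0, 0.0, 0.5)))   # None
```

A distribution factors in this way exactly when the two levels are randomized independently, which is what membership in the behavioural class requires here.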
We now introduce a special class of zerosum multiact games known as openloop games, in which the players do not acquire any dynamic information
throughout the decision process, and they only know the level of play that corresponds to their action. More precisely:

DEFINITION 13 A multiact game is said to be an openloop game if, at each level of play, each player has a single information set. □
† Note that y₁ and y₂ (and also z₁ and z₂, in the sequel) are picked independently, and hence do not necessarily add up to 1.
The multiact game whose extensive form is depicted in Fig. 9 is an openloop game. Such games can be considered as one extreme class of multiact games, in which both players are completely ignorant about the evolution of the decision process. Feedback games, on the other hand, constitute another extreme case (with regard to the structure of the information sets), in which both players have full knowledge of their past actions. Hence, the openloop and feedback versions of the same multiact game could admit different saddlepoint solutions in pure, behavioural or mixed strategies. A comparison of Examples 2 and 4, in fact, provides a validation of the possibility of such a situation: the extra information embodied in the feedback formulation actually helps P1 in that case. It is, of course, possible to devise examples in which extra information in the above sense instead makes P2 better off, or does not change the value of the game at all. A precise condition for this latter property to hold is given below in Prop. 7, whose proof is in the same spirit as that of Prop. 6, and is thus left as an exercise for the reader.

PROPOSITION 7 A zerosum openloop game in extensive form admits a (purestrategy) saddlepoint solution if, and only if, its feedback version admits constant saddlepoint strategies at each level of play.† □
† It is implicit here that these constant strategies might differ from one level to another, and that they might not be feedback saddlepoint strategies (i.e. they might not satisfy (36)).

Prior and delayed commitment models
Since there is no extra information available to the players throughout the duration of an openloop game, the players can decide on their actions at the very beginning, and then there is no real incentive for them to change these decisions during the actual play of the game. Such games, in which decisions are made at the very beginning and there is no incentive to deviate from them later, are known as "prior commitment" games. Mixed saddlepoint strategies can then be considered to constitute a reasonable equilibrium solution within such a framework. Feedback games, on the other hand, are of the "delayed commitment" type, since each player can wait until he finds out at which information set he really is, and only then announce his action. This then implies that mixed strategies are not the "right type" of strategies to be considered for such games; behavioural strategies are. Since feedback games always admit a saddle point in behavioural strategies (cf. Prop. 3), equilibrium is again well defined for this class of games. There are also other classes of multiact games of the delayed commitment type which might admit a saddle point in behavioural strategies. One such class of multiact zerosum games comprises those in which the players recall their own past actions but are ignorant about the past actions of their opponent. As an
illustrative example, let us reconsider the extensive form of Fig. 9, but under the present setup. The information sets of the players would then look as displayed in Fig. 10. Now, admissible behavioural strategies of the players in this game are of the form
    γ₁¹ = { L  w.p. y₁
            R  w.p. 1 − y₁ }                                        (43a)

    γ₂¹(u₁¹) = { L  w.p. y₂,  R  w.p. 1 − y₂ }   if u₁¹ = L
               { L  w.p. y₃,  R  w.p. 1 − y₃ }   if u₁¹ = R         (43b)

    γ₁² = { L  w.p. z₁
            R  w.p. 1 − z₁ }                                        (43c)

    γ₂²(u₁²) = { L  w.p. z₂,  R  w.p. 1 − z₂ }   if u₁² = L
               { L  w.p. z₃,  R  w.p. 1 − z₃ }   if u₁² = R         (43d)
where 0 ≤ yᵢ ≤ 1, 0 ≤ zᵢ ≤ 1, i = 1, 2, 3, and the average outcome of the game corresponding to these strategies can be determined to be
    J(γ¹, γ²) = z₁z₂(2y₁y₂ − 3y₁y₃ + 2y₃ − 1 + y₂y₃)
                + z₁(6y₁y₃ − 2y₁ − 7y₁y₂ − 5y₃ − y₃y₂ + 6)
                + z₃(z₁ − 1)(2y₁y₂ + 3y₁ + 3y₃ − 3y₁y₃ − 4)
                + 4y₁y₂ + y₁ + 2y₃ − y₁y₃ − 2
              ≜ F(y₁, y₂, y₃; z₁, z₂, z₃).
Fig. 10. A multiact game of the delayed commitment type that admits a behavioural saddlepoint solution.
The reader can now check that the values (y₁* = 1, y₂* = 3/5, y₃* = 0; z₁* = 4/5, z₂* = 1, z₃* = 0) satisfy the inequalities

    F(y₁*, y₂*, y₃*; z₁, z₂, z₃) ≤ F(y₁*, y₂*, y₃*; z₁*, z₂*, z₃*) ≤ F(y₁, y₂, y₃; z₁*, z₂*, z₃*)        (44)
for all 0 ≤ yᵢ ≤ 1, 0 ≤ zᵢ ≤ 1, i = 1, 2, 3, and hence, when they are used in (43), the resulting behavioural strategies constitute a saddlepoint solution for the multiact game under consideration, with the average value being 7/5. We thus observe that a modified version of the extensive form of Fig. 9, in which the players recall their past actions, does admit a saddlepoint solution in behavioural strategies. Derivation of the behavioural saddlepoint solutions of multiact games of the delayed commitment type involves, in general, the solution of a pair of saddlepoint inequalities of the type (44). For the present case (i.e. for the specific example of Fig. 10) it so happens that F admits a saddlepoint solution which, in turn, yields the behavioural strategies sought; in general, however, it is not guaranteed that F will admit a welldefined solution. The kernel F will always be a continuous function of its arguments, which belong to a convex compact set (in fact a simplex), but this is not sufficient for existence of a saddle point for F (see section 4.3). The implication, therefore, is that there will exist multiact games of the delayed commitment type which do not admit a saddlepoint solution in pure or behavioural strategies.
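A brute-force check of the inequalities (44) on a grid is straightforward. In the sketch below the kernel is written out with the coefficients as given in the text for this example; the grid resolution is an arbitrary choice:

```python
def F(y1, y2, y3, z1, z2, z3):
    # Kernel of the delayed-commitment example (coefficients as in the text).
    return (z1 * z2 * (2*y1*y2 - 3*y1*y3 + 2*y3 - 1 + y2*y3)
            + z1 * (6*y1*y3 - 2*y1 - 7*y1*y2 - 5*y3 - y3*y2 + 6)
            + z3 * (z1 - 1) * (2*y1*y2 + 3*y1 + 3*y3 - 3*y1*y3 - 4)
            + 4*y1*y2 + y1 + 2*y3 - y1*y3 - 2)

grid = [k / 20 for k in range(21)]              # 21-point grid on [0, 1]
y_star, z_star = (1.0, 0.6, 0.0), (0.8, 1.0, 0.0)
v = F(*y_star, *z_star)                         # the saddlepoint value 7/5

# Left inequality of (44): P2 cannot improve against y*.
assert all(F(*y_star, z1, z2, z3) <= v + 1e-9
           for z1 in grid for z2 in grid for z3 in grid)
# Right inequality of (44): P1 cannot improve against z*.
assert all(F(y1, y2, y3, *z_star) >= v - 1e-9
           for y1 in grid for y2 in grid for y3 in grid)
print(round(v, 9))                              # 1.4
```

Since F is bilinear in each player's variables, checking the inequalities on the corners of the cube would in fact suffice; the grid check is simply the most direct verification.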
2.6 ZEROSUM GAMES WITH CHANCE MOVES
In this section, we briefly discuss the class of twoperson zerosum finite games which also incorporate chance moves. In such decision problems, the final outcome is determined not only by the decisions (actions) of the two players, but also by the outcome of a chance mechanism whose odds between different alternatives are known a priori by both players. One can also view this situation as a threeplayer game wherein the third player, commonly known as "nature", has a fixed mixed strategy.
Fig. 11. A zerosum singleact game in extensive form that incorporates a chance move.
As an illustration of such a zerosum game, let us consider the tree structure of Fig. 11, where N (nature) picks L w.p. 1/3 and R w.p. 2/3. Not knowing the realized outcome of the chance mechanism, the players each decide on whether to play L or R. Let us now assume that such a game is played a sufficiently large number of times and the final outcome is determined as the arithmetic mean of the outcomes of the individual plays. In each of these plays, the matrix of possible outcomes is either
    [(2 × 2) matrix of outcomes, rows L, R for P1, columns L, R for P2; entries illegible in this copy]        (45a)
or
    [(2 × 2) matrix of outcomes, rows L, R for P1, columns L, R for P2; entries illegible in this copy]        (45b)
the former occurring w.p. 1/3 and the latter w.p. 2/3. Each of these matrix games admits a purestrategy saddlepoint solution as indicated. If N's choice were known to both players, then P2 would always play R, and PI would play R if N's choice is L and play L otherwise. However, since N's choice is not known a priori, and since the equilibrium strategies to be adopted by the players cannot change from one individual play to another, the matrix game of real interest is the one whose entries correspond to the possible average outcomes of a sufficiently large number of such games. The said matrix game can readily be obtained from (45a) and (45b) to be
    [(2 × 2) matrix of average outcomes, rows L, R for P1, columns L, R for P2; entries illegible in this copy]        (46a)
which admits, as indicated, the unique saddlepoint solution
    γ¹* = … ,   γ²* = L                                             (46b)
with an average outcome of 1. This solution is, in fact, the only reasonable equilibrium solution of the zerosum singleact game of Fig. 11, since it also uniquely satisfies the pair of saddlepoint inequalities

    J(γ¹*, γ²) ≤ J(γ¹*, γ²*) ≤ J(γ¹, γ²*)   for all γ¹ ∈ Γ¹, γ² ∈ Γ²        (47)

where
Γ¹ = Γ² = {L, R} and J(γ¹, γ²) denotes the expected (average) outcome of the game corresponding to a pair of permissible strategies {γ¹, γ²}, with the expectation operation taken with respect to the probability weights assigned a priori to different choices of N. For this problem, the correspondences between J and different choices of (γ¹, γ²) are in fact displayed as the entries of matrix (46a). It is noteworthy that P2's equilibrium strategy is L for the extensive form of Fig. 11 (as discussed above), whereas if N's choice were known to both players (Fig. 12), he would have chosen the constant strategy R, as shown earlier. Such a feature might, at first sight, seem to be counterintuitive: after all, when his information sets can distinguish between different choices of N, P2 disregards N's choice and adopts a constant strategy (R), whereas in the case when this information is not available to him, he plays quite a different one. However, such an argument is misleading, for the following reason: in the tree structure of Fig. 12, P1's equilibrium strategy is not a constant, but varies with N's choice; whereas in Fig. 11 he is confined to constant strategies, and hence the corresponding zerosum game is quite a different one. Consequently, there is no reason for P2's equilibrium strategies to be the same in both games, even though they are both constants.
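The averaging step described in this section — weighting each outcome matrix by nature's probabilities and then looking for a pure saddle point of the averaged matrix — can be sketched as follows. The two matrices are hypothetical stand-ins (the entries of (45a)-(45b) are not legible in this copy); only the weights 1/3 and 2/3 are from the text:

```python
from fractions import Fraction as Fr

def pure_saddle_points(A):
    """All (i, j) such that A[i][j] is the maximum of row i and the
    minimum of column j (the row player P1 minimizes, the column
    player P2 maximizes), i.e. all purestrategy saddle points."""
    return [(i, j)
            for i, row in enumerate(A) for j, a in enumerate(row)
            if a == max(row) and a == min(r[j] for r in A)]

# Hypothetical stand-ins for the two outcome matrices; nature's
# weights 1/3 and 2/3 are the ones given in the text.
A_L = [[Fr(3), Fr(0)], [Fr(2), Fr(1)]]
A_R = [[Fr(0), Fr(3)], [Fr(1), Fr(2)]]
w = Fr(1, 3)
A_avg = [[w * A_L[i][j] + (1 - w) * A_R[i][j] for j in range(2)]
         for i in range(2)]                      # [[1, 2], [4/3, 5/3]]
print(pure_saddle_points(A_avg))                 # [(1, 1)] i.e. row 2, column 2
```

Exact rational arithmetic (`fractions.Fraction`) avoids any floating-point ambiguity in the equality tests of the saddle-point search.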
Fig. 12. A modified version of the zerosum game of Fig. 11, with both players having access to nature's actual choice.
Remark 6 It should be noted that the saddlepoint solution of the zerosum game of Fig. 12 also satisfies a pair of inequalities similar to (47), but this time each player has four permissible strategies, i.e. each Γⁱ is comprised of four elements. It directly follows from (45a)-(45b) that the average saddlepoint
value, in this case, is (1/3)(1) + (2/3)(1/4) = 1/2, which is less than the average equilibrium value of the zerosum game of Fig. 11. Hence, the increase in information with regard to the action of N helps P1 to reduce his average losses, but works against P2. □

The zerosum games of both Fig. 11 and Fig. 12 are in extensive form, but they do not fit the framework of Def. 5, since the possibility of chance moves was excluded there. We now provide a more general definition of a zerosum game in extensive form that also accounts for chance moves.
This definition of a zerosum game in extensive form clearly covers both singleact and multiact games, and nature's chance moves can also be at intermediate levels of play. For such a game, let us denote the strategy sets of P1 and P2 again as r 1 and r 2 , respectively. Then, for each pair 1,')'2 2} {')'I E r the value of the payoff function, J(')'I,')'2), is a random Er quantity, and hence the real quantity of interest is its expected value which we denote as J(')'I, ,),2). The saddlepoint equilibrium can now be introduced precisely in the same fashion as in Def. 8, with only J replaced by J. Since this formulation converts the extensive form into an equivalent normal form, and since both II and r are finite sets, it follows that the saddlepoint equilibrium solution of such a game problem can be obtained by basically solving a matrix game whose entries correspond to the possible values of J(')'I, ')'2). This has actually been the procedure followed earlier in this section in solving the singleact game of Fig. 11, and it readily extends also to multiact games. When a purestrategy saddle point does not exist, it is possible to extend the strategy sets so as to include behavioural or mixed strategies, and to seek equilibria in these larger classes of strategies. Proposition 5 clearly also holds
for the class of zerosum games covered by Def. 14, and so does Prop. 6. Existence of a saddle point in behavioural strategies is again not guaranteed, and even if such a saddle point exists there is no systematic way to obtain it, unless the players have access to nature's actions and the game possesses a feedback structure (as in Def. 11). For games of this specific structure, one can use an appropriate extension of the recursive derivation developed in section 2.5 for feedback games, which we do not discuss further here. Before concluding, we should mention that the notions of prior and delayed commitment introduced in the previous section also fit within the framework of games with chance moves, with the extension being conceptually straightforward.
2.7 PROBLEMS
1. Obtain the security strategies and the security levels of the players in the following matrix games. Which one(s) admit (pure) saddlepoint solutions?
                 P2
         5  2  1  2
         1  4  2  3     P1
         2  0  3  1
         2  1  1  1

                 P2
         2  3  1  2
         1  2  3  4     P1
         1  1  0  1
2. Obtain the mixed security strategies and the average security levels of the players in the following matrix games.
    [two matrix games; entries illegible in this copy]
3. Verify that the quantity sup_z min_y y′Az is unique.
4. Prove that the quantity max_z y′Az is a continuous function of y ∈ Y.
5. Determine graphically the mixed saddlepoint solutions of the following matrix games.
              P2
         1  3  1
         2  3  2     P1

              P2
         2  1  1  0
         2  2  1  4     P1
6. Convert the following matrix games into linear programming problems and thereby numerically evaluate their saddlepoint solutions.
              P2
    [entries illegible in this copy]     P1

              P2
         0  1  2  3
         1  0  1  2     P1
         0  1  0  1
         1  0  1  0
7. The space of all real (m × n) matrices A = {a_{ij}} can be considered to be isomorphic to the mndimensional Euclidean space R^{mn}. Let S_{mn} denote the set of all real (m × n) matrices which, considered as the payoff matrix of a zerosum game, admit purestrategy saddlepoint equilibria. Then S_{mn} is isomorphic to a subset of R^{mn}, to be denoted by S^{m,n}. (i) Is S^{m,n} a subspace? (ii) Is S^{m,n} closed, convex? (iii) Describe S^{2,3} explicitly.
8. Let Γⁱ(A) denote the set of all mixed saddlepoint strategies of Pi in a matrix game A. Show that Γⁱ(A) is nonempty, convex, closed and bounded.
9. A matrix game A, where A is a skewsymmetric matrix, is called a symmetric game. For such a matrix game prove that Γ¹(A) = Γ²(A), and Vm(A) = 0.
10. Obtain the pure or behavioural saddlepoint solutions of the following singleact games in extensive form.
11. Obtain a feedback saddlepoint solution for the feedback game in extensive form depicted in Fig. 13. What is the value of the game? What is the actual play dictated by the feedback saddlepoint solution? Show that these constant strategies also constitute a saddlepoint solution for the feedback game.
12. Show that the feedback game in extensive form depicted in Fig. 14 does not admit a purestrategy feedback saddlepoint solution. Obtain its behavioural saddlepoint solution and the behavioural saddlepoint value. Compare the latter with the value of the game of Problem 11. What is the actual play dictated by the feedback saddlepoint solution?
Fig. 13. Feedback game of Problem 11.
Fig. 14. Feedback game of Problem 12.
13. What is the openloop version of the feedback game of Fig. 14? Obtain its saddlepoint solution and value.
14. Prove that if a feedback game in extensive form admits a feedback saddlepoint solution with the value Jf, and its openloop version (if it exists) admits a unique saddlepoint solution with the value Jo, then Jf = Jo. Furthermore, show that the openloop saddlepoint solution is the actual play dictated by the feedback saddlepoint solution. (Hint: make use of Prop. 7 and the interchangeability property of the saddlepoint strategies.)
*15. Verify the following conjecture: If a feedback game in extensive form admits a behavioural saddlepoint solution which actually dictates a random choice at least at one level of play, then its openloop version (if it exists) cannot admit a purestrategy saddlepoint equilibrium.
16. Investigate whether the following multiact game of the delayed commitment type admits a behavioural saddlepoint solution or not.
17. Determine the pure or behavioural saddlepoint solution of the following game in extensive form.
18. Determine the pure or behavioural saddlepoint solution of the following game in extensive form which also incorporates a chance move.
19. Determine the solution of the previous problem in the case when P2 has access to nature's choice. What is the value of this extra information for P2?
20. Construct an example of a matrix with only real eigenvalues which, when considered as a zerosum game, has a value in mixed or pure strategies that is smaller than the minimum eigenvalue.
*21. A matrix game is said to be completely mixed if it admits a mixed saddlepoint solution which assigns positive probability weight to all pure strategies of the players. Now, if A = {a_{ij}} is a nonsingular (n × n) matrix with nonnegative entries, prove that the game whose payoff matrix is A⁻¹ is completely mixed, with average value Vm(A⁻¹) = 1 / (Σⁿᵢ₌₁ Σⁿⱼ₌₁ a_{ij}).
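The claim of Problem 21 can be checked numerically for a concrete matrix. The sketch below uses the standard closed-form value of a completely mixed 2 × 2 game, which applies here because the inverse game has no purestrategy saddle point (the matrix chosen is an illustrative assumption):

```python
from fractions import Fraction as Fr

def value_2x2_completely_mixed(g):
    """Mixed saddlepoint value of a 2 x 2 game with no purestrategy
    saddle point: v = (a11*a22 - a12*a21) / (a11 + a22 - a12 - a21)."""
    (a11, a12), (a21, a22) = g
    return (a11 * a22 - a12 * a21) / (a11 + a22 - a12 - a21)

A = [[Fr(2), Fr(1)], [Fr(1), Fr(2)]]             # nonsingular, nonnegative
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]      # det A = 3
A_inv = [[ A[1][1] / det, -A[0][1] / det],
         [-A[1][0] / det,  A[0][0] / det]]       # [[2/3, -1/3], [-1/3, 2/3]]
v = value_2x2_completely_mixed(A_inv)
total = sum(sum(row) for row in A)               # sum of all entries of A
print(v, Fr(1, 1) / total)                       # 1/6 1/6
```

The computed value of the inverse game agrees with 1 over the sum of the entries of A, as the problem asserts.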
2.8 NOTES
Section 2.2 The theory of finite zerosum games dates back to Borel in the early 1920s, whose work on the subject was later translated into English (Borel, 1953). Borel introduced the notion of a conflicting decision situation that involves more than one decision maker, and the concepts of pure and mixed strategies, but he did not really develop a complete theory of zerosum games. He even conjectured that the minimax theorem was false. It was Von Neumann who first came up with a proof of the minimax theorem, and laid down the foundations of game theory as we know it today (Von Neumann, 1928, 1937). His pioneering book with Morgenstern definitely culminates his research on the theory of games (Von Neumann and Morgenstern, 1947). A full account of the historical development in finite zerosum games, as well as possible applications in social sciences, is given in Luce and Raiffa (1957). The reader is also referred to McKinsey (1952) and the two edited volumes by Kuhn and Tucker (1950,
1953). The original proof of the minimax theorem given by Von Neumann is nonelementary and rather complicated. Since then, the theorem has been proven in several different ways, some being simpler and more illuminating than others. The proof given here seems to be the simplest one available in the literature.
Section 2.3 Several illustrative examples of the graphical solution of zerosum matrix games can be found in the nontechnical book by Williams (1954). Graphical solution and the linear programming approach are not the only two applicable in the computation of mixed saddlepoint solutions of matrix games; there is also the method of Brown and Von Neumann (1950), which relates the solution of symmetric games (see Problem 9 for a definition) to the solution of a particular differential equation. Furthermore, there is the iterative solution method of Brown (1951) by fictitious play. A good account of these two numerical procedures, as well as another numerical technique of Von Neumann (1954), can be found in Luce and Raiffa (1957, Appendix 6).

Sections 2.4, 2.5 and 2.6 Description of single and multiact games in extensive form again goes back to the original work of Von Neumann and Morgenstern (1947), which was improved upon considerably by Kuhn (1953). For a critical discussion of prior and delayed commitment approaches to the solution of zerosum games in extensive form the reader is referred to Aumann and Maschler (1972).

Section 2.7 Problem 20: For more results on the relation between games and their spectra, see Weil (1968). Problem 21: The complete solution to this problem, as well as some other results related to completely mixed matrix games, can be found in Raghavan (1979); see also the references cited therein.
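Brown's fictitious-play scheme mentioned above admits a compact sketch: at each round every player best-responds to the opponent's empirical mixture of past actions, and the empirical mixtures yield converging bounds on the value. The iteration count and the test matrix below are arbitrary illustrative choices:

```python
def fictitious_play(A, rounds=2000):
    """Brown's fictitious play for a matrix game A (the row player
    minimizes, the column player maximizes). Returns lower and upper
    bounds on the mixed saddlepoint value of the game."""
    m, n = len(A), len(A[0])
    row_counts, col_counts = [0] * m, [0] * n
    i, j = 0, 0                                  # arbitrary initial actions
    for _ in range(rounds):
        row_counts[i] += 1
        col_counts[j] += 1
        # Each player best-responds to the opponent's empirical mixture.
        i = min(range(m), key=lambda r: sum(A[r][c] * col_counts[c] for c in range(n)))
        j = max(range(n), key=lambda c: sum(A[r][c] * row_counts[r] for r in range(m)))
    x = [k / rounds for k in row_counts]         # empirical mixture of P1
    y = [k / rounds for k in col_counts]         # empirical mixture of P2
    upper = max(sum(x[r] * A[r][c] for r in range(m)) for c in range(n))
    lower = min(sum(A[r][c] * y[c] for c in range(n)) for r in range(m))
    return lower, upper

lo, hi = fictitious_play([[1, -1], [-1, 1]])     # matching-pennies-type game, value 0
print(lo <= 0 <= hi)                             # True
```

The two bounds always bracket the value; by Robinson's convergence result for fictitious play in zerosum games they approach each other as the number of rounds grows.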
Chapter 3
Noncooperative Finite Games: Nperson Nonzerosum
3.1 INTRODUCTION

This chapter develops a general theory for static and dynamic nonzerosum finite games under two different types of noncooperative solution concepts: those named after J. Nash and H. von Stackelberg. The analysis starts in section 2 with bimatrix games, which basically constitute a normal form description of 2person nonzerosum finite games, and in this context the Nash equilibrium solution (in both pure and mixed strategies) is introduced and its properties and features are thoroughly investigated. The section concludes with some computational aspects of mixedstrategy Nash equilibria in bimatrix games. Section 3 extends the analysis of section 2 to Nperson games in normal form and includes a proof for an important theorem (Thm 2) which states that every Nperson finite game in normal form admits Nash equilibria in mixed strategies. In section 4, the computational aspects of mixedstrategy Nash equilibria are briefly discussed, and in particular the close connection between mixedstrategy Nash equilibria in bimatrix games and the solution of a nonlinear programming problem is elucidated. Section 5 deals with properties and derivation of Nash equilibria (in pure, behavioural and mixed strategies) for dynamic Nperson games in extensive form. The analysis is first confined to singleact games and then extended to multiact games. In both contexts, the concept of "informational inferiority" between two finite games in extensive form is introduced, which readily leads to "informational nonuniqueness" of Nash equilibria in dynamic games. To eliminate this feature, the additional concept of a "delayed commitment" type equilibrium strategy is introduced, and such equilibria of special types of singleact and multiact finite games are obtained through recursive procedures.
Section 6 is devoted to an extensive discussion of hierarchical equilibria in
Nperson finite games. First, the twoperson games are treated, and in this context both the (global) Stackelberg and feedback Stackelberg solution concepts are introduced, also within the class of mixed and behavioural strategies. Several examples are included to illustrate these concepts and the derivation of the Stackelberg solution. The analysis is then extended to manyplayer games. The final section of the chapter discusses extension of the results of the earlier sections to Nperson finite games in extensive form which also incorporate chance moves.
3.2 BIMATRIX GAMES

Before developing a general theory of nonzerosum noncooperative games, it is most appropriate first to investigate equilibria of bimatrix games, which in fact constitute the most elementary type of decision problems within the class of nonzerosum games. In spite of their simple structure, bimatrix games still carry most of the salient features and intricacies of noncooperative decision making. A bimatrix game can be considered as a natural extension of the matrix game of section 2.2, to cover situations in which the outcome of a decision process does not necessarily dictate the verdict that what one player gains the other one has to lose. Accordingly, a bimatrix game is comprised of two (m × n)dimensional matrices, A = {a_{ij}} and B = {b_{ij}}, with each pair of entries (a_{ij}, b_{ij}) denoting the outcome of the game corresponding to a particular pair of decisions made by the players. As in section 2.2, we call the alternatives available to the players strategies. If P1 adopts the strategy "row i" and P2 adopts the strategy "column j", then a_{ij} (respectively, b_{ij}) denotes the loss incurred to P1 (respectively, P2). Being a rational decision maker, each player will strive for an outcome which provides him with the lowest possible loss. Stipulating that there exists no cooperation between the players and that they make their decisions independently, let us now investigate which pair(s) of strategies to declare as equilibrium pair(s) of a given bimatrix game; in other words, how do we define a noncooperative equilibrium solution in bimatrix games?
To recall the basic property of a saddlepoint equilibrium solution in zerosum games would definitely help in this investigation: "A pair of strategies is in saddlepoint equilibrium if there is no incentive for any unilateral deviation by any one of the players" (section 2.2). This is, in fact, a most natural property that an equilibrium solution is expected to possess, and surely it finds relevance also in nonzerosum games. Hence, we have
DEFINITION 1 A pair of strategies {row i*, column j*} is said to constitute a noncooperative (Nash) equilibrium solution to a bimatrix game (A = {a_{ij}}, B = {b_{ij}}) if the following pair of inequalities is satisfied for all i = 1, ..., m and all j = 1, ..., n:
    a_{i*j*} ≤ a_{ij*}        (1a)

    b_{i*j*} ≤ b_{i*j}        (1b)

Furthermore, the pair (a_{i*j*}, b_{i*j*}) is known as a noncooperative (Nash) equilibrium outcome of the bimatrix game. □
P2 A
_][Dl(il [DQ]
P2 PI,
B=
crnr:D ~
PI.
It admits two Nash equilibria, as indicated: {row 1, column I} and {row 2, column 2}. The corresponding equilibrium outcomes are (1,2) and ( 1,0). 0
We now readily observe from the preceding example that a bimatrix game can admit more than one Nash equilibrium solution, with the equilibrium outcomes being different in each case. This then raises the question of whether it would be possible to order different Nash equilibrium solution pairs among themselves so as to declare only one of them as the most favourable equilibrium solution. This, however, is not completely possible, since a total ordering does not exist between pairs of numbers; but, notions of "betterness" and "admissibility" can be introduced through a partial ordering. DEFINITION 2 A pair of strategies {row iI' column j I} is said to be better than another pair of strategies {row i 2 , column jj ] if aid, s ai,i; bit j, s bi'h and if at least one of these inequalities is strict. 0 DEFINITION 3 A Nash equilibrium strategy pair is said to be admissible if there exists no better Nash equilibrium strategy pair. 0
In Example 1, out of a total of two Nash equilibrium solutions only one of them is admissible ({row 2, column 2}), since it provides uniformly lower costs for both players. This pair of strategies can therefore be declared as the most "reasonable" noncooperative equilibrium solution of the bimatrix game. In the case when a bimatrix game admits more than one admissible Nash
equilibrium, however, such a definite choice cannot be made. Consider, for example, the (2 × 2) bimatrix game

              P2                        P2
    A =  −2    1    P1,       B =  −1    1    P1            (2)
          1   −1                    1   −2
which admits two admissible Nash equilibrium solutions {row 1, column 1}, {row 2, column 2}, with the equilibrium outcomes being (−2, −1) and (−1, −2), respectively. These Nash equilibria are definitely not interchangeable, and hence a possibility emerges for the outcome of the game to be a nonequilibrium one in an actual play. In effect, since there is no cooperation between the players by the nature of the problem, P1 might stick to one equilibrium strategy (say, row 1) and P2 might adopt the other one (say, column 2), thus yielding an outcome of (1, 1) which is unfavourable to both players. This is indeed one of the dilemmas of noncooperative nonzerosum decision making (if a given problem does not admit a unique admissible equilibrium), but there is really no remedy for it unless one allows some kind of communication between the players, at least to ensure that a nonequilibrium outcome will not be realized. In fact, a possibility of collusion saves only that aspect of the difficulty in the above example, since the "best" choice still remains indeterminate. One way of resolving this indeterminacy is to allow a repetition and to choose one or the other Nash equilibrium solution at consecutive time points. The average outcome is then clearly (−3/2, −3/2), assuming that the players make their choices dependent on each other. However, this favourable average outcome is not an equilibrium outcome in mixed strategies, as we shall see in the last part of this section. The solution proposed above is acceptable in the "real" decision situation, known as the battle of the sexes, that the bimatrix game (2) represents. The story goes as follows: A couple has to make a choice out of two alternatives for an evening's entertainment. The alternatives are a basketball game, which is preferred by the husband (P1), and a musical comedy, which is preferred by the wife (P2). But they would rather go together, instead of going separately to their top choices.
This then implies, also in view of the bimatrix game (2), that they will eventually decide to go together either to the game or to the comedy. It should be noted that there is a flavour of cooperation in this example, but the solution is not unique in any sense. If repetition is possible, however, then they can go one evening to the game, and another evening to the comedy, thus resolving the problem in a "fair" way. Such a solution, however, is not game theoretic, as we shall see in the last part of this section. The message here is that if a bimatrix game admits more than one admissible Nash
solution, then a valid approach would be to go back to the actual decision process that the bimatrix game represents, and see whether a version of the bimatrix game that also accounts for repetition admits a solution which is favoured by both parties. This, of course, is all valid provided that some communication is allowed between the players, in which case the "favourable" solution thus obtained will not be a noncooperative equilibrium solution.

Thus, we have seen that if a bimatrix game admits more than one admissible Nash equilibrium solution, then the equilibrium outcome of the game becomes rather illdefined, mainly due to the fact that multiple Nash equilibria are in general not interchangeable. This ambiguity disappears, of course, whenever the equilibrium strategies are interchangeable, which necessarily requires the corresponding outcomes to be the same. Since zerosum matrix games are special types of bimatrix games, in which case the equilibrium solutions are known to be interchangeable (Corollary 2.1), it follows that there exists some nonempty class of bimatrix games whose equilibrium solutions possess such a property. This class is, in fact, larger than the set of all (m × n) zerosum games, as is shown below.

DEFINITION 4 Two (m × n) bimatrix games (A, B) and (C, D) are said to be strategically equivalent if there exist positive constants α₁, α₂, and scalars β₁, β₂, such that

    a_{ij} = α₁ c_{ij} + β₁        (3a)

    b_{ij} = α₂ d_{ij} + β₂        (3b)

for all i = 1, ..., m; j = 1, ..., n. □
Remark 1 The reader can easily verify that relation (3) is indeed an equivalence relation, since it is symmetric, reflexive and transitive. □

PROPOSITION 1 All strategically equivalent bimatrix games have the same Nash equilibria.
Proof Let (A, B) be a bimatrix game with a Nash equilibrium solution {row i*, column j*}. Then, by definition, inequalities (1a)-(1b) are satisfied. Now let (C, D) be any other bimatrix game that is strategically equivalent to (A, B). Then there exist α₁ > 0, α₂ > 0, β₁, β₂ such that (3a)-(3b) are satisfied for all possible i and j. Using these relations in (1a) and (1b), we obtain, in view of the positivity of α₁ and α₂,

    c_{i*j*} ≤ c_{ij*}

    d_{i*j*} ≤ d_{i*j}
3   N-PERSON NONZERO-SUM   75
for all i = 1, ..., m and all j = 1, ..., n, which proves the proposition since (A, B) was an arbitrary bimatrix game.  □

PROPOSITION 2  Nash equilibria of a bimatrix game (A, B) are interchangeable if (A, B) is strategically equivalent to (A, −A).

Proof  The result readily follows from Corollary 2.1 since, by hypothesis, the bimatrix game (A, B) is strategically equivalent to the zero-sum matrix game A.  □
The undesirable features of the Nash equilibrium solutions of bimatrix games, that we have witnessed, might at first sight lead to questioning the appropriateness and suitability of the Nash equilibrium solution concept for such noncooperative multi-person decision problems. Such a verdict, however, is not fair, since the undesirable features detected so far are due to the noncooperative nature of the decision problem under consideration, rather than to the weakness of the Nash equilibrium solution concept. To supplement our argument, let us consider one extreme case (the so-called "team" problem) that involves two players, identical goals (i.e. A = B), and noncooperative decision making. Note that, since possible outcomes are single numbers in this case, a total ordering of outcomes is possible, which then makes the admissible Nash equilibrium solution the only possible equilibrium solution for this class of decision problems. Moreover, the admissible Nash equilibrium outcome will be unique and be equal to the smallest entry of the matrix that characterizes the game. Note, however, that even for such a reasonable class of decision problems, the players might end up at a nonequilibrium point in case of multiple equilibria; this being mainly due to the noncooperative nature of the decision process, which requires the players to make their decisions independently. To illustrate such a possibility, let us consider the identical goal game (team problem) with the cost matrices

                  P2
    A = B =  | 0  2 |   P1,        (4)
             | 2  0 |

which both parties try to minimize, by appropriate choices of strategies. There are clearly two equilibria, {row 1, column 1} and {row 2, column 2}, both yielding the minimum cost of 0 for both players. These equilibria, however, are not interchangeable, thus leaving the players at a very difficult position if there is no communication between them. (Perhaps tossing a fair coin and thereby securing an average cost of 1 would be preferred by both sides to any pure strategy which could easily lead to an unfortunate cost of 2.) This then indicates that even the well-established team-equilibrium solution could be frowned upon in case of multiple equilibria and when the decision medium is noncooperative.

76   DYNAMIC NONCOOPERATIVE GAME THEORY

One can actually make things look even worse by replacing matrix (4) with a nonsymmetric one

                  P2
    A = B =  | 0  2 |   P1,        (5)
             | 1  0 |
which admits the same two equilibria as in the previous matrix game. But what are the players actually going to play in this case? To us, there seems to be only one logical play for each player: P1 would play "row 2", hoping that P2 would play his second equilibrium strategy, but at any rate securing a loss ceiling of 1. P2, reasoning along similar lines, would rather play "column 1". Consequently, the outcome of their joint decision is 1, which is definitely not an equilibrium outcome. But, in an actual play of this matrix game, this outcome is more likely to occur than the equilibrium outcome. One might counter-argue at this point and say that P2 could very well figure out P1's way of thinking and pick "column 2" instead, thus enjoying the low equilibrium outcome. But what if P1 thinks the same way? There is clearly no end to this so-called second-guessing iteration procedure if adopted by both players as a method that guides them to the "optimum" strategy.

One might wonder at this point as to why, while the multiple Nash equilibria of bimatrix games (including identical-goal games) possess all these undesirable features, the saddle-point equilibrium solutions of zero-sum (matrix) games are quite robust, in spite of the fact that both are noncooperative decision problems. The answer to this lies in the nature of zero-sum games: they are completely antagonistic, and the noncooperative equilibrium solution fits well within this framework. In other nonzero-sum games, however, the antagonistic nature is rather suppressed or is completely absent, and consequently the "noncooperative decision making" framework does not totally suit such problems. If some cooperation is allowed, or if there is a hierarchy in decision making, then equilibrium solutions of nonzero-sum (matrix) games could possess more desirable features. We shall, in fact, observe this later in section 6, where we discuss equilibria in nonzero-sum games under the latter mode of decision making.
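Since the argument above turns entirely on unilateral deviations, it can be checked mechanically. The following sketch (with the matrices (4) and (5) as reconstructed above taken as an assumption) enumerates the pure-strategy Nash equilibria of a bimatrix cost game:

```python
# Pure-strategy Nash equilibria of a bimatrix (cost) game by exhaustive check.
# The team matrices below follow the reconstruction of (4) and (5) above and
# should be treated as an assumption of this sketch.

def pure_nash(A, B):
    """Return all (row, col) pairs from which no player can lower his cost."""
    m, n = len(A), len(A[0])
    eq = []
    for i in range(m):
        for j in range(n):
            row_ok = all(A[i][j] <= A[k][j] for k in range(m))   # P1 cannot improve
            col_ok = all(B[i][j] <= B[i][l] for l in range(n))   # P2 cannot improve
            if row_ok and col_ok:
                eq.append((i + 1, j + 1))                        # 1-based, as in the text
    return eq

A4 = [[0, 2], [2, 0]]          # symmetric team game (4), A = B
A5 = [[0, 2], [1, 0]]          # nonsymmetric team game (5), A = B

print(pure_nash(A4, A4))       # [(1, 1), (2, 2)]
print(pure_nash(A5, A5))       # [(1, 1), (2, 2)]
print(A5[1][0])                # joint play {row 2, column 1} costs 1: not an equilibrium outcome
```

Running it confirms that both games admit exactly the two non-interchangeable equilibria {row 1, column 1} and {row 2, column 2}, while the joint play {row 2, column 1} in game (5) yields the non-equilibrium cost 1 discussed above.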
The minimax solution

One important property of the saddle-point strategies in zero-sum games was that they were also the security strategies of the players (see Thm 2.2). In
bimatrix games, one can still introduce security strategies for the players in the same way as was done in section 2.2, and with the properties cited in Thm 2.1 being valid with the exception of the third one. It should of course be clear that, in a bimatrix game (A, B), the security strategies of P1 involve only the entries of matrix A, while the security strategies of P2 involve only those of matrix B. In Example 1, considered earlier, the unique security strategy of P1 is "row 1", and the unique security strategy of P2 is "column 1". Thus, considered as a pair, the security strategies of the players correspond to one of the Nash equilibrium solutions, but not to the admissible one, in this case. One can come up with examples of bimatrix games in which the security strategies do not correspond to any one of the equilibrium solutions, or examples of bimatrix games in which they coincide with an admissible Nash solution. As an illustration of the latter possibility, consider the following bimatrix game:

              P2                        P2
    A =  | 8   0 |   P1,    B =  | 8  30 |   P1.        (6)
         | 30  2 |               | 0   2 |

This has a unique pair of Nash equilibrium strategies, {row 1, column 1}, which are also the security strategies of the players. Moreover, the equilibrium outcome is determined by the security levels of the players.

The preceding bimatrix game is known in the literature as the Prisoners' dilemma. It characterizes a situation in which two criminals, suspected of having committed a serious crime, are detained before a trial. Since there is no direct evidence against them, their conviction depends on whether they confess or not. If the prisoners both confess, then they will be sentenced to 8 years. If neither one confesses, then they will be convicted of a lesser crime and sentenced to 2 years. If only one of them confesses and puts the blame on the other one, then he is set free according to the laws of the country and the other one is sentenced to 30 years. The unique Nash equilibrium in this case dictates that they both should confess. Note, however, that there is in fact a better solution for both criminals, which is that they both should refuse to confess; but implementation of such a solution would require a cooperation of some kind (and also trust), which is clearly out of the question in the present context. Besides, this second solution (which is not Nash) is extremely unstable, since a player will find it to his advantage to unilaterally deviate from this position at the last minute.

Even if the security strategies might not be in noncooperative equilibrium, they can still be employed in an actual game, especially in cases when there exist two or more noninterchangeable Nash equilibria or when a player is not completely sure of the cost matrix, or even the rationality, of the other player.
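The security levels invoked above are simple min-max computations over a player's own cost matrix. A minimal sketch, assuming the Prisoners' dilemma matrices (6) as reconstructed above:

```python
# Security (minimax) strategies for the Prisoners' dilemma game (6).
# The entries follow the story in the text (8, 2, 30 and 0 years); the exact
# matrices are the reconstruction above and are an assumption of this sketch.

def security(costs_by_own_choice):
    """costs_by_own_choice[k] = this player's possible costs if he picks k.
    Returns (best choice, 1-based, and its guaranteed ceiling = security level)."""
    levels = [max(row) for row in costs_by_own_choice]
    best = min(range(len(levels)), key=lambda k: levels[k])
    return best + 1, levels[best]

A = [[8, 0], [30, 2]]        # P1's costs; rows = P1's choice (row 1 = confess)
B = [[8, 30], [0, 2]]        # P2's costs; columns = P2's choice

B_cols = [[B[i][j] for i in range(2)] for j in range(2)]  # group B by P2's choice

print(security(A))           # (1, 8): confessing guarantees P1 at most 8 years
print(security(B_cols))      # (1, 8): same for P2
```

The pair of security strategies coincides with the unique Nash equilibrium, and the security levels (8, 8) determine the equilibrium outcome, as stated in the text.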
This reasoning leads to the following second solution concept in bimatrix games.

DEFINITION 5  A pair of strategies {row i, column j} is known as a pair of minimax strategies for the players in a bimatrix game (A, B) if the former is a security strategy for P1 in the matrix game A, and the latter is a security strategy for P2 in the matrix game B. The corresponding security levels of the players are known as the minimax values of the bimatrix game.  □

Remark 2  It should be noted that the minimax values of a bimatrix game are definitely not lower (in an ordered way) than the pair of values of any Nash equilibrium outcome. Even when the unique Nash equilibrium strategies correspond to the minimax strategies, the minimax values could be higher than the values of the Nash equilibrium outcome, mainly because the minimax strategy of a player might not constitute an optimal response to the minimax strategy of the other player.  □

Mixed strategies

In our discussion of bimatrix games in this section, we have so far encountered cases in which a given bimatrix game admits a unique Nash equilibrium, or multiple Nash equilibria. There are cases, however, when Nash equilibrium strategies might not exist, as illustrated in the following example. In such a case we simply say that a Nash equilibrium does not exist in pure strategies.

Example 2
Consider the (2 x 2) bimatrix game

              P2                      P2
    A =  | 1   0 |   P1,    B =  | 3  2 |   P1.
         | 2  −1 |               | 0  1 |

It clearly does not admit a Nash equilibrium solution in pure strategies. The minimax strategies, however, exist (as they always do, by Thm 2.1 (ii)) and are given as {row 1, column 2}.  □

Paralleling our analysis in the case of nonexistence of a saddle point in zero-sum games (section 2.2), we now enlarge the class of strategies so as to include all mixed strategies, defined as the set of all probability distributions on the set of pure strategies of each player (see Def. 2.2). This extension is in fact sufficient to insure existence of a Nash equilibrium solution in mixed strategies in a bimatrix game, a result which we state below in Theorem 1, after making the concept of a Nash equilibrium in this extended space precise. Using the same notation as in Defs 2.3-2.5, wherever appropriate, we first have
DEFINITION 6  A pair {y* ∈ Y, z* ∈ Z} is said to constitute a noncooperative (Nash) equilibrium solution to a bimatrix game (A, B) in mixed strategies, if the following inequalities are satisfied for all y ∈ Y and z ∈ Z:

    y*′Az* ≤ y′Az*,   y ∈ Y,        (7a)
    y*′Bz* ≤ y*′Bz,   z ∈ Z.        (7b)

Here, the pair (y*′Az*, y*′Bz*) is known as a noncooperative (Nash) equilibrium outcome of the bimatrix game in mixed strategies.  □

THEOREM 1  Every bimatrix game has at least one Nash equilibrium solution in mixed strategies.

Proof  We postpone the proof of this important result till the next section, where it is proven in a more general context (Thm 2).  □

Computation of mixed Nash equilibrium strategies of bimatrix games is more involved than the computation of mixed saddle-point solutions, and in the literature there are very few generally applicable methods in that context. One of these methods converts the original game problem into a nonlinear programming problem, and it will be discussed in section 4. What we intend to include here is an illustration of the computation of mixed Nash equilibrium strategies in (2 x 2) bimatrix games by directly making use of inequalities (7a)-(7b). Specifically, consider again the bimatrix game of Example 2, which is known not to admit a pure-strategy Nash equilibrium. In this case, since every y ∈ Y can be written as y = (y_1, (1 − y_1))′ with 0 ≤ y_1 ≤ 1, and similarly every z ∈ Z as z = (z_1, (1 − z_1))′ with 0 ≤ z_1 ≤ 1, we first obtain the following equivalent inequalities for (7a) and (7b), respectively, for the bimatrix game under consideration:
    −2y_1*z_1* + y_1* ≤ −2y_1 z_1* + y_1,    0 ≤ y_1 ≤ 1,
    2y_1*z_1* − z_1* ≤ 2y_1*z_1 − z_1,    0 ≤ z_1 ≤ 1.

Here (y_1*, (1 − y_1*))′ and (z_1*, (1 − z_1*))′ denote the mixed Nash equilibrium strategies of P1 and P2, respectively, which have yet to be computed so that the preceding pair of inequalities is satisfied. A way of obtaining this solution would be to find that value of z_1* (if it exists in the interval [0, 1]) which would make the RHS of the first inequality independent of y_1, and also the value of y_1* that would make the RHS of the second inequality independent of z_1. The solution then readily turns out to be (y_1* = 1/2, z_1* = 1/2), which can easily be checked to be the unique solution set of the two inequalities. Consequently, the bimatrix game of Example 2 admits a unique Nash equilibrium solution in mixed strategies, which is {y* = (1/2, 1/2)′, z* = (1/2, 1/2)′}.
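The claimed equilibrium can be double-checked numerically. By linearity of the average cost in each player's mixed strategy, it suffices to verify that no pure strategy improves on y* or z*; the sketch below assumes the Example 2 matrices as reconstructed above:

```python
from fractions import Fraction as F

# Check that y* = z* = (1/2, 1/2) is a mixed equilibrium of the game of
# Example 2 (matrices as reconstructed above, an assumption of this sketch).

A = [[1, 0], [2, -1]]       # P1's costs
B = [[3, 2], [0, 1]]        # P2's costs
y = [F(1, 2), F(1, 2)]
z = [F(1, 2), F(1, 2)]

def avg(M, p, q):           # expected cost p'Mq
    return sum(p[i] * M[i][j] * q[j] for i in range(2) for j in range(2))

J1, J2 = avg(A, y, z), avg(B, y, z)
row_costs = [avg(A, [1, 0], z), avg(A, [0, 1], z)]   # P1's pure deviations
col_costs = [avg(B, y, [1, 0]), avg(B, y, [0, 1])]   # P2's pure deviations

print(J1, J2)                   # 1/2 3/2
print(min(row_costs) >= J1)     # True: P1 cannot improve
print(min(col_costs) >= J2)     # True: P2 cannot improve
```

Note that both pure deviations of each player give exactly his equilibrium average cost, which is the indifference property discussed in Remark 3 below.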
Remark 3  In the above solution to Example 2 we observe an interesting, and rather counter-intuitive, feature of the Nash equilibrium solution in mixed strategies. The solution obtained, i.e. {y*, z*}, has the property that the inequalities (7a) and (7b) in fact become independent of y ∈ Y and z ∈ Z. In other words, the direction of the inequality loses its significance. This leads to the important conclusion that if the players were instead seeking to maximize their average costs, then we would have obtained the same mixed equilibrium solution. This implies that, while computing his mixed Nash equilibrium strategy, each player pays attention only to the average cost function of his co-player, rather than optimizing his own average cost function. Hence, the nature of the optimization (i.e. minimization or maximization) becomes irrelevant in this case. The same feature can also be observed in mixed saddle-point solutions, yielding the conclusion that a zero-sum matrix game A could have the same mixed saddle-point solution as the zero-sum game with matrix −A.  □

The following proposition now makes the conclusions of the preceding remark precise.

PROPOSITION 3  Let Y̊ and Z̊ denote the sets of inner points (interiors) of Y and Z, respectively. If a bimatrix game (A, B) admits a mixed-strategy Nash equilibrium solution {y* ∈ Y̊, z* ∈ Z̊},† then this also provides a mixed-strategy Nash equilibrium solution for the bimatrix game (−A, −B).
Proof  By hypothesis, inequalities (7a) and (7b) are satisfied by a pair {y* ∈ Y̊, z* ∈ Z̊}. This implies that min_{y∈Y} y′Az* is attained by an inner point of Y, and hence, because of linearity of y′Az* in y, this expression has to be independent of y. Similarly, y*′Bz is independent of z. Consequently, (7a)-(7b) can be written as

    y*′Az* = y′Az*,   ∀y ∈ Y,
    y*′Bz* = y*′Bz,   ∀z ∈ Z,

which further leads to the inequality-pair

    y*′(−A)z* ≤ y′(−A)z*,   ∀y ∈ Y,
    y*′(−B)z* ≤ y*′(−B)z,   ∀z ∈ Z,

which verifies the assertion of the proposition.  □

As a special case of this proposition, we now immediately have the following result for zero-sum matrix games.

† Such a solution is also known as a completely mixed Nash equilibrium solution or as an inner mixed-strategy Nash equilibrium solution.
COROLLARY 1  Any inner mixed saddle point of a zero-sum matrix game A also constitutes an inner mixed saddle point for the zero-sum matrix game −A.  □
Even though we have introduced the concept of a mixed noncooperative equilibrium solution for bimatrix games which do not admit a Nash equilibrium in pure strategies, it is quite possible that a bimatrix game that admits (pure) Nash equilibria would also admit a mixed Nash equilibrium solution. To verify this possibility, let us again consider the bimatrix game of the "battle of sexes", given by (2). We have already seen that it admits two noninterchangeable Nash equilibria with outcomes (−2, −1) and (−1, −2). Let us now investigate whether it admits a third equilibrium, this time in mixed strategies. For this bimatrix game, letting y = (y_1, (1 − y_1))′ ∈ Y and z = (z_1, (1 − z_1))′ ∈ Z, we first rearrange the inequalities (7a)-(7b) and write them in the equivalent form

    −5y_1*z_1* + 2y_1* ≤ −5y_1 z_1* + 2y_1,    0 ≤ y_1 ≤ 1,
    −5y_1*z_1* + 3z_1* ≤ −5y_1*z_1 + 3z_1,    0 ≤ z_1 ≤ 1.

Now it readily follows that these inequalities admit the unique inner point solution (y_1* = 3/5, z_1* = 2/5), dictating the mixed Nash equilibrium solution {y* = (3/5, 2/5)′, z* = (2/5, 3/5)′} and an average outcome of (−1/5, −1/5). Of course, the above inequalities also admit the solutions (y_1* = 1, z_1* = 1) and (y_1* = 0, z_1* = 0), which correspond to the two pure Nash equilibria which we have discussed earlier at some length. Our interest here lies in the nature of the mixed equilibrium solution. Interpreted within the context of the "battle of sexes", this mixed solution dictates that the husband chooses to go to the game with probability 3/5 and to the musical comedy with probability 2/5. The wife, on the other hand, decides on the game with probability 2/5 and on the comedy with probability 3/5. Since, by the very nature of the noncooperative equilibrium solution concept, these decisions have to be made independently, there is a probability of 13/25 with which the husband and wife will have to spend the evening separately. In other words, even if repetition is allowed, the couple will be together for less than half the time, as dictated by the unique mixed Nash equilibrium solution; and this so in spite of the fact that the husband and wife explicitly indicate in their preference ranking that they would rather be together in the evening. The main reason for such a dichotomy between the actual decision process and the solution dictated by the theory is that the noncooperative equilibrium solution concept requires decisions to be made independently, whereas in the bimatrix game of the "battle of sexes" our way of thinking for a reasonable equilibrium solution leads us to a cooperative solution which inevitably asks for dependent decisions. (See Problem 4, section 8, for more variations on the "battle of sexes".)
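The 13/25 figure follows directly from the two mixing probabilities just computed, and can be verified with exact rational arithmetic:

```python
from fractions import Fraction as F

# Mixed equilibrium of the "battle of sexes": the probabilities of going to
# the game (3/5 for the husband, 2/5 for the wife) are taken from the
# solution computed in the text.

y1 = F(3, 5)                 # husband goes to the game with probability 3/5
z1 = F(2, 5)                 # wife goes to the game with probability 2/5

# probability that the couple ends up in different places
separate = y1 * (1 - z1) + (1 - y1) * z1
print(separate)              # 13/25
print(separate > F(1, 2))    # True: apart more than half the time
```

The design point here is only arithmetic: independent randomization makes a "miscoordination" event unavoidable, which is exactly the dichotomy discussed in the paragraph above.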
The conclusion we draw here is that bimatrix games with pure-strategy Nash equilibria could also admit mixed-strategy Nash solutions which, depending on the relative magnitudes of the entries of the two matrices, could yield better or worse (and of course also noncomparable) average equilibrium outcomes.
3.3  N-PERSON GAMES IN NORMAL FORM

The class of N-person nonzero-sum finite static games in normal form models a decision-making process similar in nature to that modelled by bimatrix games, but this time with N (> 2) interacting decision makers (players). Decisions are again made independently and out of a finite set of alternatives for each player. Since there exist more than two players, a matrix formulation on the plane is not possible for such games, thus making the display of possible outcomes and visualization of equilibrium strategies rather difficult. However, a precise formulation is still possible, and it is provided below together with the notation to be used in describing N-person finite static games in normal form.
Formulation of an N-person finite static game in normal form

(1) There are N players, to be denoted by P1, P2, ..., PN. Let us further denote the index set {1, 2, ..., N} by N.
(2) There is a finite number of alternatives for each player to choose from. Let m_i denote the number of alternatives available to Pi, and further denote the index set {1, 2, ..., m_i} by M_i, with a typical element of M_i designated as n_i.
(3) If Pj chooses a strategy n_j ∈ M_j, and this so for all j ∈ N, then the loss incurred to Pi is a single number a^i_{n_1,n_2,...,n_N}. The ordered N-tuple of all these numbers (over i ∈ N), i.e. (a^1_{n_1,...,n_N}, a^2_{n_1,...,n_N}, ..., a^N_{n_1,...,n_N}), constitutes the corresponding unique outcome of the game.
(4) Players make their decisions independently, and each one seeks unilaterally the minimum possible loss, of course by also taking into account the possible rational choices of the other players.

The noncooperative equilibrium solution concept within the context of this N-person game can be introduced as follows, as a direct extension of Def. 1.

DEFINITION 7  An N-tuple of strategies {n_1*, n_2*, ..., n_N*}, with n_i* ∈ M_i, i ∈ N, is said to constitute a noncooperative (Nash) equilibrium solution for an N-person nonzero-sum static finite game in normal form, as formulated above, if the
following N inequalities are satisfied for all n_i ∈ M_i, i ∈ N:

    a^{1*} = a^1_{n_1*,n_2*,...,n_N*} ≤ a^1_{n_1,n_2*,...,n_N*},   for all n_1 ∈ M_1,
    a^{2*} = a^2_{n_1*,n_2*,n_3*,...,n_N*} ≤ a^2_{n_1*,n_2,n_3*,...,n_N*},   for all n_2 ∈ M_2,
    ....................................................................        (8)
    a^{N*} = a^N_{n_1*,...,n_N*} ≤ a^N_{n_1*,...,n_{N−1}*,n_N},   for all n_N ∈ M_N.
Here, the N-tuple (a^{1*}, a^{2*}, ..., a^{N*}) is known as a noncooperative (Nash) equilibrium outcome of the N-person game in normal form.  □

There is actually no simple method to determine the Nash equilibrium solutions of N-person finite games in normal form. One basically has to check exhaustively all possible combinations of N-tuples of strategies, to see which ones provide a Nash equilibrium. This enumeration, though straightforward, could at times be rather strenuous, especially when N and/or m_i, i ∈ N, are large. However, given an N-tuple of strategies asserted to be in Nash equilibrium, it is relatively simpler to verify their equilibrium property, since one then has to check only unilateral deviations from the given equilibrium solution. To get a flavour of the enumeration process and the method of verification of a Nash equilibrium solution, let us now consider the following example.

Example 3  Consider a 3-person game in which each player has two alternatives to choose from. That is, N = 3 and m_1 = m_2 = m_3 = 2. To complete the description of the game, the 2^3 = 8 possible outcomes are given as

    (a^1_{1,1,1}, a^2_{1,1,1}, a^3_{1,1,1}) = (1, −1, 0)
    (a^1_{1,2,1}, a^2_{1,2,1}, a^3_{1,2,1}) = (2, 1, 1)
    (a^1_{2,1,1}, a^2_{2,1,1}, a^3_{2,1,1}) = (2, 0, 1)
    (a^1_{2,2,1}, a^2_{2,2,1}, a^3_{2,2,1}) = (0, −2, 1)
    (a^1_{1,1,2}, a^2_{1,1,2}, a^3_{1,1,2}) = (1, 1, 1)
    (a^1_{1,2,2}, a^2_{1,2,2}, a^3_{1,2,2}) = (0, 1, 0)
    (a^1_{2,1,2}, a^2_{2,1,2}, a^3_{2,1,2}) = (0, 1, 2)
    (a^1_{2,2,2}, a^2_{2,2,2}, a^3_{2,2,2}) = (−1, 2, 0).
These possible outcomes can actually be displayed in the form of two (2 x 2) matrices:

                        P2
    n_3 = 1:  | (1, −1, 0)  (2, 1, 1)  |   P1,        (9a)
              | (2, 0, 1)   (0, −2, 1) |

                        P2
    n_3 = 2:  | (1, 1, 1)  (0, 1, 0)  |   P1,        (9b)
              | (0, 1, 2)  (−1, 2, 0) |
where the entries of the former matrix correspond to the possible outcomes if P3's strategy is fixed at n_3 = 1, and the latter matrix provides the possible outcomes if his strategy is fixed at n_3 = 2. We now assert that the first entry of the first matrix, (1, −1, 0), is a Nash equilibrium outcome for this game. A verification of this assertion would involve three separate checks, concerning unilateral deviation of each player. If P1 deviates from his asserted equilibrium strategy n_1 = 1, then his loss becomes 2, which is not favourable. If P2 deviates from n_2 = 1, his loss becomes 1, which is not favourable either. Finally, if P3 deviates from n_3 = 1, his loss becomes 1, which is higher than his asserted equilibrium loss 0. Consequently, the first entry of the first matrix, i.e. (1, −1, 0), indeed provides a Nash equilibrium outcome, with the corresponding equilibrium strategies being {n_1* = 1, n_2* = 1, n_3* = 1}. The reader can now check by enumeration that this is actually the only Nash equilibrium solution of this 3-person game.  □

When the Nash equilibrium solution of an N-person nonzero-sum game is not unique, we can again introduce the concept of "admissible Nash equilibrium solution" as a direct extension of Defs 2 and 3 to the N-person case. It is of course possible that a given N-person game will admit more than one admissible Nash equilibrium solution which are also not interchangeable. This naturally leads to an ill-defined equilibrium outcome; but we will not elaborate on these aspects of the equilibrium solution here, since they were extensively discussed in the previous section within the context of bimatrix games, and that discussion can readily be carried over to fit the present framework. Furthermore, Prop. 1 has a natural version in the present framework, which we quote below without proof, after introducing an extension of Def. 4. Proposition 2, on the other hand, has no direct counterpart in an N-person game with N > 2.
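The exhaustive check described in Example 3 is easily mechanized. The sketch below (with the eight outcome triples of Example 3, using 0-based indices) enumerates all strategy triples and tests unilateral deviations:

```python
from itertools import product

# Exhaustive Nash check for the 3-person game of Example 3.
# a maps (n1, n2, n3), 0-based, to the triple of losses (P1, P2, P3).
a = {
    (0, 0, 0): (1, -1, 0), (0, 1, 0): (2, 1, 1),
    (1, 0, 0): (2, 0, 1),  (1, 1, 0): (0, -2, 1),
    (0, 0, 1): (1, 1, 1),  (0, 1, 1): (0, 1, 0),
    (1, 0, 1): (0, 1, 2),  (1, 1, 1): (-1, 2, 0),
}

def is_nash(n):
    for i in range(3):                   # try each player's unilateral deviation
        m = list(n)
        m[i] = 1 - m[i]                  # the only other alternative
        if a[tuple(m)][i] < a[n][i]:     # strictly better for player i: not Nash
            return False
    return True

equilibria = [n for n in product((0, 1), repeat=3) if is_nash(n)]
print(equilibria)                        # [(0, 0, 0)]
print(a[(0, 0, 0)])                      # (1, -1, 0)
```

Running the enumeration confirms the claim in the text: {n_1 = 1, n_2 = 1, n_3 = 1} is the only Nash equilibrium, with outcome (1, −1, 0).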
DEFINITION 8  Two nonzero-sum finite static games in normal form are said to be strategically equivalent if the following three conditions are satisfied: (i) the two games have the same number of players (say, N), (ii) each player has the same number of alternatives in both games, (iii) if {(a^1_{n_1,...,n_N}, ..., a^N_{n_1,...,n_N}), n_i ∈ M_i, i ∈ N} is the set of possible outcomes in one game, and {(b^1_{n_1,...,n_N}, ..., b^N_{n_1,...,n_N}), n_i ∈ M_i, i ∈ N} is the set of possible outcomes in the other, then there exist positive constants α_i, i ∈ N, and scalars β_i, i ∈ N, such that

    a^i_{n_1,...,n_N} = α_i b^i_{n_1,...,n_N} + β_i,   n_j ∈ M_j, j ∈ N; i ∈ N.        (10)
                                                                                    □

PROPOSITION 4  All strategically equivalent nonzero-sum finite static games in normal form have the same Nash equilibria.  □
Minimax strategies introduced earlier within the context of bimatrix games (Def. 5) find applications also in N-person games, especially when there exist more than one admissible Nash equilibrium which are not interchangeable. In the present context their definition involves a natural extension of Def. 5, which we do not provide here.

Example 4  In the 3-person game of Example 3, the minimax strategies of P1, P2 and P3 are n̄_1 = 1 or 2, n̄_2 = 1 and n̄_3 = 1, respectively. It should be noted that for P1 any strategy is minimax. The security levels of the players, however, are unique (as they should be) and are 2, 1 and 1, respectively. Note that these values are higher than the unique Nash equilibrium outcome (1, −1, 0), in accordance with the statement of Remark 2 altered to fit the present context.  □
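The security levels quoted in Example 4 can be recomputed directly: each player takes the worst case over the other players' choices and then minimizes over his own. A sketch using the outcome data of Example 3:

```python
from itertools import product

# Security levels and minimax strategies for the 3-person game of Example 3.
a = {
    (0, 0, 0): (1, -1, 0), (0, 1, 0): (2, 1, 1),
    (1, 0, 0): (2, 0, 1),  (1, 1, 0): (0, -2, 1),
    (0, 0, 1): (1, 1, 1),  (0, 1, 1): (0, 1, 0),
    (1, 0, 1): (0, 1, 2),  (1, 1, 1): (-1, 2, 0),
}

def security_level(i):
    """Player i's security level and his minimax choices (1-based)."""
    levels = {}
    for k in (0, 1):                                  # player i's own choice
        worst = max(a[n][i] for n in product((0, 1), repeat=3) if n[i] == k)
        levels[k + 1] = worst                         # guaranteed ceiling for choice k
    best = min(levels.values())
    return best, [k for k, v in levels.items() if v == best]

print([security_level(i) for i in range(3)])
# [(2, [1, 2]), (1, [1]), (1, [1])]: levels 2, 1, 1, and any row is minimax for P1
```

This reproduces Example 4: P1's two choices are equally safe (level 2), while P2 and P3 secure level 1 with their first alternatives.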
Mixed-strategy Nash equilibria

Mixed noncooperative equilibrium solutions of an N-person finite static game in normal form can be introduced by extending Def. 6 to N players, i.e. by replacing inequalities (7a) and (7b) by an N-tuple of similar inequalities. To this end, let us first introduce the notation Y^i to denote the mixed strategy space of Pi, and further denote a typical element of this space by y^i, and its kth component by y^i_k. Then, we have

DEFINITION 9  An N-tuple {y^{i*} ∈ Y^i; i ∈ N} is said to constitute a mixed-strategy noncooperative (Nash) equilibrium solution for an N-person finite static game in normal form if the following N inequalities are satisfied for all y^i ∈ Y^i, i ∈ N:

    J^{1*} = Σ_{M_1} ⋯ Σ_{M_N} y^{1*}_{n_1} y^{2*}_{n_2} ⋯ y^{N*}_{n_N} a^1_{n_1,...,n_N} ≤ Σ_{M_1} ⋯ Σ_{M_N} y^1_{n_1} y^{2*}_{n_2} ⋯ y^{N*}_{n_N} a^1_{n_1,...,n_N},

    J^{2*} = Σ_{M_1} ⋯ Σ_{M_N} y^{1*}_{n_1} y^{2*}_{n_2} ⋯ y^{N*}_{n_N} a^2_{n_1,...,n_N} ≤ Σ_{M_1} ⋯ Σ_{M_N} y^{1*}_{n_1} y^2_{n_2} y^{3*}_{n_3} ⋯ y^{N*}_{n_N} a^2_{n_1,...,n_N},
    ....................................................................        (11)
    J^{N*} = Σ_{M_1} ⋯ Σ_{M_N} y^{1*}_{n_1} ⋯ y^{N*}_{n_N} a^N_{n_1,...,n_N} ≤ Σ_{M_1} ⋯ Σ_{M_N} y^{1*}_{n_1} ⋯ y^{(N−1)*}_{n_{N−1}} y^N_{n_N} a^N_{n_1,...,n_N}.
Here, the N-tuple (J^{1*}, ..., J^{N*}) is known as a noncooperative (Nash) equilibrium outcome of the N-person game in mixed strategies.  □

One of the important results of static game theory is that every N-person game of the type discussed in this section admits a mixed-strategy Nash equilibrium solution (note that pure strategies are also included in the class of mixed strategies). We now state and prove this result, which is also an extension of Thm 1.

THEOREM 2  Every N-person static finite game in normal form admits a noncooperative (Nash) equilibrium solution in mixed strategies.

Proof  Let {y^i ∈ Y^i; i ∈ N} denote an N-tuple of mixed strategies for the N-person game, and introduce

    ψ^i_{n_i}(y^1, ..., y^N) = Σ_{M_1} ⋯ Σ_{M_N} y^1_{n_1} y^2_{n_2} ⋯ y^N_{n_N} a^i_{n_1,...,n_N} − Σ_{M_1} ⋯ Σ_{M_{i−1}} Σ_{M_{i+1}} ⋯ Σ_{M_N} y^1_{n_1} ⋯ y^{i−1}_{n_{i−1}} y^{i+1}_{n_{i+1}} ⋯ y^N_{n_N} a^i_{n_1,...,n_{i−1},n_i,n_{i+1},...,n_N}

for each n_i ∈ M_i and i ∈ N, where in the second sum Pi's strategy is fixed at the pure strategy n_i. Note that this function is continuous on the product set Π_N Y^i. Now, let c^i_{n_i} be related to ψ^i_{n_i} by

    c^i_{n_i} = max{ψ^i_{n_i}, 0}

for each n_i ∈ M_i and i ∈ N. It readily follows that c^i_{n_i}(y^1, ..., y^N) is also a continuous function on Π_N Y^i. Then, introducing the transformation

    ȳ^i_{n_i} = (y^i_{n_i} + c^i_{n_i}) / (1 + Σ_{j ∈ M_i} c^i_j),   n_i ∈ M_i, i ∈ N,        (12)

and denoting it by

    ȳ = T(y),        (13)

we observe that T is a continuous mapping of Π_N Y^i into itself. Since each Y^i is a simplex of appropriate dimension (see section 2.2), the product set Π_N Y^i becomes closed, bounded and convex. Hence, by Brouwer's fixed point theorem (see Appendix III), the mapping T has at least one fixed point. We now prove that every fixed point of this mapping is necessarily a mixed-strategy Nash equilibrium solution of the game, and conversely that every mixed-strategy Nash equilibrium solution is a fixed point of T, thereby concluding the proof of the theorem.

We first verify the latter assertion: if {y^{i*}; i ∈ N} is a mixed-strategy
equilibrium solution, then, by definition (i.e. from inequalities (11)), the function ψ^i_{n_i}(y^{1*}, ..., y^{N*}) is nonpositive for every n_i ∈ M_i and i ∈ N, and thus c^i_{n_i} = 0 for all n_i ∈ M_i, i ∈ N. This readily yields the conclusion

    T(y^{1*}, ..., y^{N*}) = (y^{1*}, ..., y^{N*}),

which says that {y^{i*}; i ∈ N} is a fixed point of T.

For verification of the former assertion, suppose that {y^i; i ∈ N} is a fixed point of T, but not a mixed equilibrium solution. Then, for some i ∈ N (say i = 1) there exists a ȳ^1 ∈ Y^1 such that

    Σ_{M_1} ⋯ Σ_{M_N} ȳ^1_{n_1} y^2_{n_2} ⋯ y^N_{n_N} a^1_{n_1,...,n_N} < Σ_{M_1} ⋯ Σ_{M_N} y^1_{n_1} y^2_{n_2} ⋯ y^N_{n_N} a^1_{n_1,...,n_N}.        (14)

Now let ñ_1 denote an index for which the quantity

    Σ_{M_2} ⋯ Σ_{M_N} y^2_{n_2} ⋯ y^N_{n_N} a^1_{n_1,n_2,...,n_N}        (15)

attains its minimum value over n_1 ∈ M_1. Then, since ȳ^1 is a mixed strategy, the LHS of (14) is bounded below by (15) with n_1 = ñ_1, so that the RHS of (14) satisfies the strict inequality

    Σ_{M_1} ⋯ Σ_{M_N} y^1_{n_1} ⋯ y^N_{n_N} a^1_{n_1,...,n_N} > Σ_{M_2} ⋯ Σ_{M_N} y^2_{n_2} ⋯ y^N_{n_N} a^1_{ñ_1,n_2,...,n_N},

which can further be seen to be equivalent to

    ψ^1_{ñ_1}(y^1, y^2, ..., y^N) > 0.        (16)

This inequality, when used in the definition of c^1_{ñ_1}, implies that c^1_{ñ_1} > 0. But since c^1_{n_1} is nonnegative for all n_1 ∈ M_1, the summation term Σ_{M_1} c^1_{n_1} becomes positive. Now, again referring back to inequality (14), this time we let n̂_1 denote an index for which (15) attains its maximum value over n_1 ∈ M_1. Then, going through an argument similar to the one used in the preceding paragraph, we bound the LHS of (14) from above by (15) with n_1 = n̂_1, and arrive at the strict inequality

    ψ^1_{n̂_1}(y^1, y^2, ..., y^N) < 0.

This then implies that c^1_{n̂_1} = 0, which, when used in (12) with i = 1 and n_1 = n̂_1, yields the conclusion

    ȳ^1_{n̂_1} < y^1_{n̂_1},

since Σ_{M_1} c^1_{n_1} > 0. But this contradicts the hypothesis that {y^i; i ∈ N} is a fixed point of T.  □
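The map T of (12)-(13) is concrete enough to be coded. The sketch below builds T for the 3-person game of Example 3 and checks that the pure equilibrium found there, viewed as a degenerate mixed strategy, is indeed a fixed point; it is only a consistency check, not a method for finding fixed points:

```python
from itertools import product

# The transformation T of (12)-(13), sketched for the 3-person game of
# Example 3 (outcomes as listed there, 0-based indices).
a = {
    (0, 0, 0): (1, -1, 0), (0, 1, 0): (2, 1, 1),
    (1, 0, 0): (2, 0, 1),  (1, 1, 0): (0, -2, 1),
    (0, 0, 1): (1, 1, 1),  (0, 1, 1): (0, 1, 0),
    (1, 0, 1): (0, 1, 2),  (1, 1, 1): (-1, 2, 0),
}

def J(i, y):                                   # player i's average cost under y
    return sum(y[0][n[0]] * y[1][n[1]] * y[2][n[2]] * a[n][i]
               for n in product((0, 1), repeat=3))

def T(y):
    out = []
    for i in range(3):
        c = []
        for k in (0, 1):                       # psi: improvement of pure strategy k
            yk = [list(p) for p in y]
            yk[i] = [0, 0]
            yk[i][k] = 1                       # player i switched to pure k
            c.append(max(J(i, y) - J(i, yk), 0.0))
        s = 1 + sum(c)                         # normalizer of (12)
        out.append([(y[i][k] + c[k]) / s for k in (0, 1)])
    return out

y_star = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]  # the pure equilibrium as a mixed strategy
print(T(y_star) == y_star)                     # True: an equilibrium is a fixed point of T
```

At the equilibrium every ψ^i_{n_i} is nonpositive, so all c^i_{n_i} vanish and T leaves the point unchanged, exactly as in the first half of the proof.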
The proof of Thm 2, as given above, is not constructive. For a constructive proof, which also leads to a numerical scheme to obtain the equilibrium solution, the reader is referred to Scarf (1967). But, in general, it is not possible to obtain the equilibrium solution explicitly. If a given game admits an inner mixed-strategy equilibrium solution, however, then a possibility of obtaining the corresponding strategies in explicit form emerges, as the following proposition (which could also be considered as an extension of Remark 3 and Prop. 3) indicates.

PROPOSITION 5  Let Y̊^i denote the interior of Y^i, i ∈ N. Then, any inner mixed-strategy Nash equilibrium solution {y^{i*} ∈ Y̊^i; i ∈ N} of an N-person finite static game in normal form satisfies the set of equations

    Σ_{M_2} ⋯ Σ_{M_N} y^{2*}_{n_2} ⋯ y^{N*}_{n_N} (a^1_{n_1,n_2,...,n_N} − a^1_{1,n_2,...,n_N}) = 0,   n_1 ∈ M_1, n_1 ≠ 1,
    Σ_{M_1} Σ_{M_3} ⋯ Σ_{M_N} y^{1*}_{n_1} y^{3*}_{n_3} ⋯ y^{N*}_{n_N} (a^2_{n_1,n_2,...,n_N} − a^2_{n_1,1,n_3,...,n_N}) = 0,   n_2 ∈ M_2, n_2 ≠ 1,        (17)
    ....................................................................
    Σ_{M_1} ⋯ Σ_{M_{N−1}} y^{1*}_{n_1} ⋯ y^{(N−1)*}_{n_{N−1}} (a^N_{n_1,...,n_N} − a^N_{n_1,...,n_{N−1},1}) = 0,   n_N ∈ M_N, n_N ≠ 1.
Furthermore, these mixed strategies also provide a mixed-strategy Nash equilibrium solution for the N-person game structured in a similar way but with some (or all) of the players maximizing their average costs instead of minimizing.

Proof  Since {y^{i*} ∈ Y̊^i; i ∈ N} is an equilibrium solution, y^{i*} minimizes the quantity

    Σ_{M_1} ⋯ Σ_{M_N} y^{1*}_{n_1} ⋯ y^{(i−1)*}_{n_{i−1}} y^i_{n_i} y^{(i+1)*}_{n_{i+1}} ⋯ y^{N*}_{n_N} a^i_{n_1,...,n_N}

over Y^i, and this so for each i ∈ N. Let us first take i = 1, and rewrite the preceding expression (in view of the relation Σ_{M_1} y^1_{n_1} = 1) as

    Σ_{M_2} ⋯ Σ_{M_N} y^{2*}_{n_2} ⋯ y^{N*}_{n_N} a^1_{1,n_2,...,n_N}
      + Σ_{n_1 ∈ M_1, n_1 ≠ 1} y^1_{n_1} Σ_{M_2} ⋯ Σ_{M_N} y^{2*}_{n_2} ⋯ y^{N*}_{n_N} (a^1_{n_1,n_2,...,n_N} − a^1_{1,n_2,...,n_N}).

Now, since the minimizing y^{1*} is an inner point of Y^1 by hypothesis, and since the preceding expression is linear in {y^1_{n_1}; n_1 ∈ M_1, n_1 ≠ 1}, it readily follows that the coefficient of y^1_{n_1} has to vanish for each n_1 ∈ M_1, n_1 ≠ 1. This condition then yields the first set of equations of (17). The remaining ones can be verified analogously, by taking i = 2, 3, ..., N. Finally, the last part of Prop. 5 follows, as in Prop. 3, from the property that at the equilibrium point {y^{i*} ∈ Y̊^i; i ∈ N, i ≠ j} Pj's average cost is actually independent of y^j (which has just been proven), and thus it becomes immaterial whether he minimizes or maximizes his average cost.  □

Since the hypothesis of Prop. 5 requires the mixed-strategy Nash equilibrium solution to be an inner point of the product set Y^1 × ⋯ × Y^N, the set of equations (17) is satisfied under quite restrictive conditions, and even more so under the conjecture of Problem 9 (section 8), since then it is required that the number of alternatives of each player be the same. Nevertheless, the proposition still provides a characterization of the equilibrium solution, and it enables direct computation of the corresponding strategies by merely solving coupled algebraic equations, under conditions which are not totally void. As an illustration of this point we now provide the following example.
Example 5 Consider a 3-person game in which each player has two alternatives to choose from and in which the possible outcomes are given by

  n3 = 1:            P2                    n3 = 2:            P2
          (1, 1, 0)    (0, 1, 0)                   (2, 0, 0)    (0, 0, 1)
     P1                                       P1
          (1, 0, 1)    (0, 0, 0)                   (0, 3, 0)    (1, 2, 0)
Since M_i consists of 2 elements for each i = 1, 2, 3, the set (17) involves only 3 equations for this game, which can be written as

  1 − y^2_1 + y^3_1 − 2y^2_1 y^3_1 = 0
  2 − 2y^1_1 − 2y^3_1 − 2y^1_1 y^3_1 = 0          (18)
  y^1_1 + y^2_1 − 1 = 0.

This coupled set of equations admits a unique solution with the property 0 < y^1_1, y^2_1, y^3_1 < 1, which is

  y^1_1 = 2 − √3,   y^2_1 = √3 − 1,   y^3_1 = ⅓√3.
Hence, this 3-person game admits a unique inner Nash equilibrium solution in mixed strategies, which is

  y^{1*} = (2 − √3, √3 − 1)′,   y^{2*} = (√3 − 1, 2 − √3)′,   y^{3*} = (⅓√3, 1 − ⅓√3)′.
It should be noted that this game also admits two pure-strategy Nash equilibria, with outcomes (1, 1, 0) and (1, 2, 0). □
3.4 COMPUTATION OF MIXED-STRATEGY NASH EQUILIBRIA IN BIMATRIX GAMES
We have discussed in Section 3 a possible approach towards obtaining mixed-strategy Nash equilibria of N-person games in normal form when these equilibria have the property of being inner solutions. In other words, when a mixed-strategy equilibrium solution assigns positive probabilities to all possible pure-strategy choices of a player, and this so for all players, then the corresponding equilibrium strategies can be determined by solving a set of algebraic equations (cf. Prop. 5). For the special case of bimatrix games, this requirement of Prop. 5 says that the pair of mixed Nash equilibrium strategies y* and z* should be elements of the interiors of Y and Z, respectively, which is a rather restrictive condition. If this condition is not satisfied, then one natural approach would be to set some components of the strategies y and z equal to zero and obtain algebraic equations for the other components, in the hope that the solutions of these algebraic equations will be nonnegative. If such a solution does not exist, then one can set some other components equal to zero and look for a nonnegative solution of the algebraic equations obtained from the other components, and so on. Since there are only a finite number of possibilities, it is obvious that such an approach will eventually yield mixed-strategy Nash equilibria of bimatrix games, also in view of Thm 2. But, then, the natural question that comes to mind is whether this search can be done in a systematic way. Indeed it can, and this has been established by Lemke and Howson (1964) in their pioneering work. In that paper, the authors' intention was actually to give an algebraic proof of the existence of mixed equilibrium solutions in bimatrix games (i.e. the result of Thm 1), but, as a by-product, they also obtain an efficient scheme for computing mixed Nash equilibria, which was thereafter referred to as the "Lemke-Howson" algorithm.
The reader should consult the original work (Lemke and Howson, 1964) and an expository article (Shapley, 1974) for the essentials of this algorithm and its application in bimatrix games.
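For a 2 × 2 bimatrix game whose equilibrium has full support, the "fix a support and solve" idea described above reduces to one linear indifference equation per player. The following sketch illustrates this (the cost matrices are hypothetical, chosen only so that an inner equilibrium exists):

```python
from fractions import Fraction as F

def inner_mixed_2x2(A, B):
    """Full-support mixed Nash equilibrium of a 2x2 bimatrix game of losses
    (A for P1, B for P2), assuming an inner equilibrium exists.
    Returns (y, z): P1's row probabilities and P2's column probabilities."""
    # P2's mixture z must make P1 indifferent between his two rows: A[0].z = A[1].z
    dA = A[0][0] - A[0][1] - A[1][0] + A[1][1]
    z1 = F(A[1][1] - A[0][1], dA)
    # P1's mixture y must make P2 indifferent between his two columns: y.B[:,0] = y.B[:,1]
    dB = B[0][0] - B[0][1] - B[1][0] + B[1][1]
    y1 = F(B[1][1] - B[1][0], dB)
    return (y1, 1 - y1), (z1, 1 - z1)

A = [[0, 2], [3, 1]]        # hypothetical losses for P1
B = [[3, 1], [0, 2]]        # hypothetical losses for P2
y, z = inner_mixed_2x2(A, B)
# Here y = (1/2, 1/2) and z = (1/4, 3/4).

# Both players are indifferent, so neither can lower his expected loss:
rows = [A[i][0] * z[0] + A[i][1] * z[1] for i in range(2)]
cols = [y[0] * B[0][j] + y[1] * B[1][j] for j in range(2)]
assert rows[0] == rows[1] and cols[0] == cols[1]
```

For larger games, repeating this computation over all candidate supports (discarding solutions with negative components) is precisely the exhaustive search sketched in the text; the Lemke-Howson algorithm organizes that search efficiently.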
Relation to nonlinear programming
Yet another general method for the solution of a bimatrix game is to transform it into a nonlinear (in fact, a bilinear) programming problem, and to utilize the numerical techniques developed for the solution of nonlinear programming problems. In the sequel, we establish this equivalence between bimatrix games and a specific class of bilinear programming problems with linear constraints (see Prop. 6); we refer the reader to Luenberger (1973) for algorithms on the numerical solution of nonlinear programming problems.

PROPOSITION 6 The pair {y*, z*} constitutes a mixed-strategy Nash equilibrium solution to a bimatrix game (A, B) if, and only if, {y*, z*, p*, q*} is a solution of the following bilinear programming problem:
  min_{y, z, p, q} [y′Az + y′Bz + p + q]                    (19)

subject to

  Az ≥ −p 1_m,
  B′y ≥ −q 1_n,                                             (20)
  y ≥ 0,  z ≥ 0,
  y′1_m = 1,  z′1_n = 1.

Proof The constraints evidently imply that

  y′Az + y′Bz + p + q ≥ 0,                                  (i)

which shows that the optimal value of the objective function is nonnegative. If {y*, z*} is an equilibrium pair, then the quadruple

  y*, z*, p* = −y*′Az*, q* = −y*′Bz*                        (ii)

is feasible, and the corresponding value of the objective function is zero. Hence (ii) is an optimal solution to the bilinear programming problem.

Conversely, let {ȳ, z̄, p̄, q̄} be a solution to the bilinear programming problem. The fact that feasible solutions to this problem exist has already been shown by (ii); specifically, by Thm 2, the bimatrix game (A, B) has an equilibrium pair (y*, z*), and hence (ii) is a feasible solution. Furthermore, (ii) also yields the conclusion that the minimum value of the objective function is nonpositive, which, in view of (i), implies

  ȳ′Az̄ + ȳ′Bz̄ + p̄ + q̄ = 0.

For any y ≥ 0, z ≥ 0 such that y′1_m = 1 and z′1_n = 1, the constraints give

  y′Az̄ ≥ −p̄,  ȳ′Bz ≥ −q̄,

and hence

  y′Az̄ + ȳ′Bz ≥ −p̄ − q̄ = ȳ′Az̄ + ȳ′Bz̄.

In particular, we have ȳ′Az̄ ≥ −p̄, ȳ′Bz̄ ≥ −q̄ and ȳ′Az̄ + ȳ′Bz̄ = −(p̄ + q̄). Thus ȳ′Az̄ = −p̄ and ȳ′Bz̄ = −q̄, and therefore

  y′Az̄ ≥ ȳ′Az̄,  ȳ′Bz ≥ ȳ′Bz̄.

This last set of inequalities verifies that {ȳ, z̄} is indeed a mixed-strategy Nash equilibrium solution of the bimatrix game (A, B). □
For the special case of zero-sum matrix games, the following corollary now readily follows from Prop. 6, and establishes an equivalence between two-person zero-sum matrix games and LP problems.

COROLLARY 2 The pair {y*, z*} constitutes a mixed-strategy saddle-point solution for a two-person zero-sum matrix game A if, and only if, {y*, z*, p*, q*} is a solution of the LP problem

  min_{y, z, p, q} [p + q]                                  (21)

subject to

  Az ≥ −p 1_m,
  A′y ≤ q 1_n,                                              (22)
  y ≥ 0,  z ≥ 0,
  y′1_m = 1,  z′1_n = 1.                                    □
Remark 4 Note that Prop. 6 directly extends to N-person finite games in normal form. Furthermore, it is noteworthy that the LP problem of Corollary 2 is structurally different from the LP problem considered in Section 2.3 (Thm 2.5), which also provided a means of obtaining mixed-strategy saddle-point solutions for matrix games. □
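Corollary 2 can be probed in the same spirit. The sketch below uses a matching-pennies-like cost matrix (chosen purely for illustration): for given mixed strategies, the smallest feasible p and q in (22) are p = −min_i (Az)_i and q = max_j (A′y)_j, so p + q ≥ 0 always, with equality exactly at the saddle point here:

```python
import random

A = [[1, -1], [-1, 1]]   # illustrative zero-sum matrix (costs to P1)

def best_pq(y, z):
    """Smallest p and q satisfying the constraints (22) for given y, z."""
    Az = [sum(A[i][j] * z[j] for j in range(2)) for i in range(2)]
    Aty = [sum(A[i][j] * y[i] for i in range(2)) for j in range(2)]
    p = -min(Az)          # Az >= -p 1_m
    q = max(Aty)          # A'y <= q 1_n
    return p, q

# Saddle point of this game: uniform mixing by both players, value 0.
p, q = best_pq([0.5, 0.5], [0.5, 0.5])
assert p + q == 0

# Any other feasible (y, z, p, q) can only do worse: p + q >= 0 throughout.
random.seed(0)
for _ in range(1000):
    y1, z1 = random.random(), random.random()
    p1, q1 = best_pq([y1, 1 - y1], [z1, 1 - z1])
    assert p1 + q1 >= 0
```

The quantity −p is P1's guaranteed cost ceiling and q is P2's, so minimizing p + q drives the two security levels together, which is the saddle-point condition.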
3.5 NASH EQUILIBRIA OF NPERSON GAMES IN EXTENSIVE FORM This section is devoted to noncooperative (Nash) equilibria of Nperson finite games in extensive form, which do not incorporate chance moves. Extensive tree formulation for finite games without chance moves has already been introduced in Chapter 2 within the context of zerosum games (cf. Def. 2.5),
and such a formulation is equally valid for finite nonzero-sum games, with certain appropriate modifications. Hence, first, as a direct extension of Def. 2.5, we have the following definition, which covers both single-act and multi-act games:

DEFINITION 10 The extensive form of an N-person nonzero-sum finite game without chance moves is a tree structure with
(i) a specific vertex indicating the starting point of the game,
(ii) N cost functions, each one assigning a real number to each terminal vertex of the tree, where the ith cost function determines the loss to be incurred to Pi,
(iii) a partition of the nodes of the tree into N player sets,
(iv) a sub-partition of each player set into information sets {η^i_j}, such that the same number of immediate branches emanate from every node belonging to the same information set and no node follows another node in the same information set. □
Two typical nonzero-sum finite games in extensive form are depicted in Fig. 1. The first one represents a 3-player single-act nonzero-sum finite game in extensive form in which the information sets of the players are such that both P2 and P3 have access to the action of P1. The second extensive form of Fig. 1, on the other hand, represents a 2-player multi-act nonzero-sum finite game in which P1 acts twice and P2 only once. In both extensive forms, the set of alternatives for each player is the same at all information sets and it consists of two elements. The outcome corresponding to each possible path is denoted by an ordered N-tuple of numbers (u^1, . . . , u^N), where N stands for the number of players and u^i stands for the corresponding cost to Pi.

Fig. 1. Two typical nonzero-sum finite games in extensive form.
Pure-strategy Nash equilibria
We now introduce the concept of a noncooperative (Nash) equilibrium solution for N-person nonzero-sum finite games in extensive form, which
covers both single-act and multi-act games. To this end, let us first recall the definition of a strategy from Chapter 2 (cf. Def. 2.6).

DEFINITION 11 Let N^i denote the class of all information sets of Pi, with a typical element designated as η^i. Let U^i_{η^i} denote the set of alternatives of Pi at the nodes belonging to the information set η^i. Define U^i = ∪ U^i_{η^i}, where the union is over η^i ∈ N^i. Then, a strategy γ^i for Pi is a mapping from N^i into U^i, assigning one element in U^i for each set in N^i, and with the further property that γ^i(η^i) ∈ U^i_{η^i} for each η^i ∈ N^i. The set of all strategies of Pi is called his strategy set (space), and it is denoted by Γ^i. □
Let J^i(γ^1, . . . , γ^N) denote the loss incurred to Pi when the strategies γ^1 ∈ Γ^1, . . . , γ^N ∈ Γ^N are adopted by the players. Then, the noncooperative (Nash) equilibrium solution concept for such games can be introduced as follows, as a direct extension of Def. 7.

DEFINITION 12 An N-tuple of strategies {γ^{1*}, γ^{2*}, . . . , γ^{N*}}, with γ^{i*} ∈ Γ^i, i ∈ N, is said to constitute a noncooperative (Nash) equilibrium solution for an N-person nonzero-sum finite game in extensive form, if the following N inequalities are satisfied for all γ^i ∈ Γ^i, i ∈ N:

  J^{1*} ≜ J^1(γ^{1*}, γ^{2*}, . . . , γ^{N*}) ≤ J^1(γ^1, γ^{2*}, . . . , γ^{N*})
  J^{2*} ≜ J^2(γ^{1*}, γ^{2*}, γ^{3*}, . . . , γ^{N*}) ≤ J^2(γ^{1*}, γ^2, γ^{3*}, . . . , γ^{N*})
  . . . . . . . . .                                         (23)
  J^{N*} ≜ J^N(γ^{1*}, . . . , γ^{(N−1)*}, γ^{N*}) ≤ J^N(γ^{1*}, . . . , γ^{(N−1)*}, γ^N).
The N-tuple of quantities {J^{1*}, . . . , J^{N*}} is known as a Nash equilibrium outcome of the nonzero-sum finite game in extensive form. □

We emphasize the word a in the last sentence of the preceding definition, since the Nash equilibrium solution could possibly be nonunique, with the corresponding sets of Nash values being different. This then leads to a partial ordering in the set of all Nash equilibrium solutions, as in the case of Defs 2 and 3, whose extensions to the N-person case are along similar lines and are therefore omitted.

Nash equilibria in mixed and behavioural strategies
The concepts of noncooperative equilibria in mixed and behavioural strategies for nonzero-sum finite games in extensive form can be introduced as straightforward (natural) extensions of the corresponding ones presented in Chapter 2 within the context of saddle-point equilibria. We should recall that a
mixed strategy for a player (Pi) is a probability distribution on the set of all his pure strategies, i.e. on Γ^i. A behavioural strategy, on the other hand, is an appropriate mapping whose domain of definition is the class of all the information sets of the player. By denoting the behavioural strategy set of Pi by Γ̂^i, and the average loss incurred to Pi as a result of adoption of the behavioural strategy N-tuple {γ̂^1 ∈ Γ̂^1, . . . , γ̂^N ∈ Γ̂^N} by Ĵ^i(γ̂^1, . . . , γ̂^N), the definition of a Nash equilibrium solution in behavioural strategies may be obtained directly from Def. 12 by replacing γ^i, Γ^i and J^i with γ̂^i, Γ̂^i and Ĵ^i, respectively. We now discuss, in the four subsections to follow, properties and derivation of these different types of Nash equilibria for finite games in extensive form, first for single-act games and then for multi-act games.
3.5.1 Single-act games: pure-strategy Nash equilibria
In single-act games, each player acts only once, and the order in which the players act could be a variant of their strategies. If the order is fixed a priori, and further if the single-act game is static in nature (i.e. if each player has a single information set), then there is no basic difference between extensive and normal form descriptions, and consequently no apparent advantage of one form over the other. Since such static games in normal form have already been extensively studied in Sections 2 and 3, our concern here will primarily be with dynamic single-act games, that is, games in which at least one of the players has access to some nontrivial information concerning the action of some other player. Figure 1(a), for example, displays one such game, wherein both P2 and P3 have access to P1's action, but they are ignorant about each other's actions. Yet another single-act game with dynamic information is the 2-person nonzero-sum game depicted in Fig. 2, wherein P2's information sets can differentiate between whether P1 has played L or not, but not between his possible actions M and R.
One method of obtaining Nash equilibria of a dynamic single-act finite game in extensive form is to transform it into an equivalent normal form and to make use of the theory of Sections 2 and 3. This direct method has already been discussed earlier in Section 2.4 within the context of similarly structured zero-sum games, and it has the basic drawback that one has to take into consideration all possible strategy combinations. As in the case of dynamic zero-sum games, an alternative to this direct method exists for some special types of nonzero-sum games, which makes explicit use of the extensive form description and obtains a subclass of all Nash equilibria through a systematic (recursive) procedure. But in order to aid in understanding the essentials and the limitations of such a recursive procedure, it will be instructive first to
Fig. 2. A 2-person nonzero-sum single-act game with dynamic information. (P1 chooses among L, M and R; P2 then chooses L or R. The leaf cost pairs (u^1, u^2), from left to right, are (0, −1), (2, 1), (3, 2), (0, 3), (2, 1) and (−1, 0).)
consider the following specific example. This simple example, in fact, displays most of the intricacies of Nash equilibria in dynamic finite games.
Example 6 Consider the 2-person nonzero-sum single-act game whose extensive form is displayed in Fig. 2. The following is an intuitively appealing recursive procedure that would generate a Nash equilibrium solution for this game. At the first of his two information sets (from left), P2 can tell precisely whether P1 has played L or not. Hence, in case u^1 = L, the unique decision of P2 would be u^2 = L, yielding him a cost of −1, which is in fact the lowest possible level P2 can hope to attain. Now, if u^1 = M or R, however, P2's information set does not differentiate between these two actions of P1, thus forcing him to play a static nonzero-sum game with P1. The corresponding bimatrix game (under the usual convention of A denoting the loss matrix of P1 and B denoting that of P2) is:

          P2                     P2
         L     R                L     R
  A = M  3     0       B = M    2     3              (24)
      R  2   (−1)          R    1    (0)
This bimatrix game admits (as indicated by the parenthesized entries) a unique Nash equilibrium {R, R}, with an equilibrium cost pair of (−1, 0). This then readily suggests

  γ^{2*}(η^2) = L, if u^1 = L;  R, otherwise             (25a)

as a candidate equilibrium strategy for P2. P1, on the other hand, has two possible strategies, u^1 = L and u^1 = R, since the third possibility, u^1 = M, is ruled out by the preceding argument. Now, the former of these leads (under (25a)) to a cost of J^1 = 0, while the latter leads to J^1 = −1. Hence, he would clearly decide on playing R, i.e.

  γ^{1*} = R                                             (25b)
is a candidate equilibrium strategy for P1. The cost pair corresponding to (25a)-(25b) is

  J^{1*} = −1,  J^{2*} = 0.                              (25c)
The strategy pair (25a)-(25b) is thus what an intuitively appealing (and rather straightforward) recursive procedure would provide us with as a reasonable noncooperative equilibrium solution for this extensive form. It can, in fact, be directly verified by referring to inequalities (23) that the pair (25a)-(25b) is indeed in Nash equilibrium. To see this, let us first fix P2's strategy at (25a), and observe that (25b) is then the unique cost-minimizing decision for P1. Now, fixing P1's strategy at (25b), it is clear that P2's unique cost-minimizing decision is u^2 = R, which is indeed implied by (25a) under (25b). The recursive scheme adopted surely resulted in a unique strategy pair which is also in Nash equilibrium. But is this the only Nash equilibrium solution that the extensive form of Fig. 2 admits? The reply is, in fact, no! To obtain the additional Nash equilibrium solution(s), we first transform the extensive form of Fig. 2 into an equivalent normal form. To this end, let us first note that Γ^1 = {L, M, R} and Γ^2 = {γ^2_1, γ^2_2, γ^2_3, γ^2_4}, where

  γ^2_1(η^2) = L,   γ^2_2(η^2) = R,
  γ^2_3(η^2) = L, if u^1 = L;  R, otherwise,
  γ^2_4(η^2) = R, if u^1 = L;  L, otherwise.
Hence, the equivalent normal form is the 3 × 4 bimatrix game

            P2                                  P2
        γ^2_1  γ^2_2  γ^2_3  γ^2_4          γ^2_1  γ^2_2  γ^2_3  γ^2_4
    L   [0]     2      0      2         L   [−1]    1     −1      1
A = M    3      0      0      3     B = M     2     3      3      2
    R    2     −1    (−1)     2         R     1     0     (0)     1
which admits two pure-strategy Nash equilibria, as indicated. The one marked by parentheses is {R; γ^2_3}, which corresponds to the strategy pair (25a)-(25b) obtained through the recursive procedure. The other Nash equilibrium solution is the constant strategy pair

  γ^2(η^2) = L                                           (26a)
  γ^1(η^1) = L                                           (26b)
with a corresponding cost pair of

  J^1 = 0,  J^2 = −1.                                    (26c)
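These two equilibria can be confirmed mechanically by checking the defining inequalities (23) against all unilateral deviations in the 3 × 4 normal form. A sketch (the entries below are as used in this Example 6 discussion; rows are P1's choices L, M, R, and columns P2's strategies γ^2_1 through γ^2_4):

```python
# Loss matrices of the equivalent 3 x 4 normal form of the game of Fig. 2.
A = [[0, 2, 0, 2],
     [3, 0, 0, 3],
     [2, -1, -1, 2]]   # losses of P1
B = [[-1, 1, -1, 1],
     [2, 3, 3, 2],
     [1, 0, 0, 1]]     # losses of P2

def is_pure_nash(i, j):
    """Inequalities (23): no unilateral deviation lowers a player's loss."""
    p1_ok = all(A[i][j] <= A[k][j] for k in range(3))
    p2_ok = all(B[i][j] <= B[i][l] for l in range(4))
    return p1_ok and p2_ok

# The constant-strategy equilibrium {L; g1} with outcome (0, -1) ...
assert is_pure_nash(0, 0) and (A[0][0], B[0][0]) == (0, -1)
# ... and the delayed-commitment equilibrium {R; g3} with outcome (-1, 0).
assert is_pure_nash(2, 2) and (A[2][2], B[2][2]) == (-1, 0)
```

The check is membership only: it verifies the two indicated equilibria without claiming to enumerate every equilibrium of the normal form.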
A comparison of (25c) and (26c) clearly indicates that both of these equilibria are admissible. It is noteworthy that, since (26a) is a constant strategy, the pair (26a)-(26b) also constitutes a Nash equilibrium solution for the static single-act game obtained from the one of Fig. 2 by replacing the dynamic information of P2 with static information (i.e. by allowing him a single information set). It is, in fact, the unique Nash equilibrium solution of this static game, which admits the normal form (bimatrix) description

          P2                   P2
         L     R              L     R
    L  [0]     2          L  [−1]   1
A = M   3      0      B = M    2    3
    R   2     −1          R    1    0
The actual play (R, R) dictated by (25a)-(25b), however, does not possess such a property as a constant strategy pair. □

The preceding example has displayed certain important features of Nash equilibria of a single-act game, which we now list below. We will shortly see, in this section, that these features are, in fact, valid on a broader scale for both single-act and multi-act nonzero-sum games.

Features of the single-act game of Example 6
(i) The single-act game admitted multiple (two) Nash equilibria.
(ii) The (unique) Nash equilibrium solution of the static version of the single-act game (obtained by replacing the information sets of the second-acting player by a single information set) also constituted a Nash equilibrium solution for the original single-act game with the dynamic information.
(iii) A recursive procedure that involves the solution of only bimatrix games (possibly degenerate ones) at each information set yielded only one of the two Nash equilibria. The actual play dictated by this equilibrium solution did not constitute an equilibrium solution as constant strategies. □

We now show that appropriate (and more general) versions of these features are retained in a more general framework for N-person single-act games. To
this end, we first introduce some terminology and also a partial ordering of extensive forms in terms of their information sets.

DEFINITION 13 A single-act N-person finite game in extensive form (say, I) is the static version of a dynamic single-act N-person finite game in extensive form (say, II), if I can be obtained from II by replacing the information sets of each player with a single information set encompassing all the nodes pertaining to that player.† The equivalent normal form of the static game I is called the static normal form of II. □

DEFINITION 14 Let I and II be two single-act N-person games in extensive form, and further let Γ^i_I and Γ^i_II denote the strategy sets of Pi (i ∈ N) in I and II, respectively. Then, I is said to be informationally inferior to II if Γ^i_I ⊆ Γ^i_II for all i ∈ N, with strict inclusion for at least one i. □
Remark 5 In Fig. 3, the single-act game I is informationally inferior to II and III. The latter two, however, do not admit any such comparison. □
Fig. 3. Extensive forms displaying informational inferiority. (The three extensive forms I, II and III share the same tree and cost pairs, and differ only in the players' information sets.)
Fig. 3. Extensive forms displaying informational inferiority. PROPOSITION 7 Let I be an Nperson singleact game that is informationally inferior to some other singleact Nperson game, say II. Then, (i) any Nash equilibrium solution of I also constitutes a Nash equilibrium solution for II,
† It should be noted that a single-act finite game admits a static version only if the order in which the players act is fixed a priori and the possible actions of each player are the same at all of his information sets.
(ii)
if {γ^1, . . . , γ^N} is a Nash equilibrium solution of II such that γ^i ∈ Γ^i_I for all i ∈ N, then it also constitutes a Nash equilibrium solution for I.
Proof (i) If {γ^{1*} ∈ Γ^1_I, . . . , γ^{N*} ∈ Γ^N_I} constitutes a Nash equilibrium solution for I, then inequalities (23) are satisfied for all γ^i ∈ Γ^i_I, i ∈ N. But, since Γ^i_I ⊆ Γ^i_II, i ∈ N, we clearly also have γ^{i*} ∈ Γ^i_II, i ∈ N. Now assume, to the contrary, that {γ^{1*}, . . . , γ^{N*}} is not a Nash equilibrium solution of II. Then, this implies that there exists at least one i (say, i = N, without any loss of generality) for which the corresponding inequality of (23) is not satisfied for all γ^i ∈ Γ^i_II. In particular, there exists γ̃^N ∈ Γ^N_II such that

  J^{N*} > J^N(γ^{1*}, . . . , γ^{(N−1)*}, γ̃^N).          (i)

Now, the N-tuple of strategies {γ^{1*}, . . . , γ^{(N−1)*}, γ̃^N} leads to a unique path of action, and consequently to a unique outcome, in the single-act game II. Let us denote the information set of PN which is actually traversed by this path by η̃_II, and the specific element (node) of η̃_II intercepted by the path by ñ^N. Let us further denote the information set of PN in the game I which includes the node ñ^N by η̃_I. Then, there exists at least one element in Γ^N_I (say, γ^N) with the property γ^N(η̃_I) = γ̃^N(η̃_II). If this strategy replaces γ̃^N on the RHS of inequality (i), the value of J^N clearly does not change, and hence we equivalently have

  J^{N*} > J^N(γ^{1*}, . . . , γ^{(N−1)*}, γ^N).

But this inequality contradicts the initial hypothesis that the N-tuple {γ^{1*}, . . . , γ^{N*}} was in Nash equilibrium for the game I. This then completes the proof of part (i).
(ii) Part (ii) of the proposition can likewise be proven. □

Remark 6 Proposition 7 now verifies a general version of feature (ii) of the single-act game of Example 6 for N-person single-act finite games in extensive form. A counterpart of feature (i) also readily follows from this proposition, which is that such games with dynamic information will in general admit multiple Nash equilibria. A more definite statement cannot be made, since there will always be exceptional cases in which the Nash equilibrium solution of a single-act dynamic game is unique and is attained in constant strategies, which then definitely also constitute a Nash equilibrium solution for its static version, assuming that it exists. It is almost impossible to single out all these special games, but we can comfortably say that they are "rarely" met, and existence of multiple Nash equilibria is the rule in dynamic single-act games rather than the exception. Since this sort of nonuniqueness emerges mainly because of the dynamic nature of the information sets of the players, we call it informational nonuniqueness. □
Now, to elaborate on the third (and the last) feature listed earlier, we have to impose some further structure on the "relative nestedness" of the information sets of the players, which is already implicit in the extensive form of Fig. 2. It is, in fact, possible to develop a systematic recursive procedure, as an extension of the one adopted in the solution of Example 6, if the order in which the players act is fixed a priori and the extensive form is in "ladder-nested" form, a notion that we introduce below.

DEFINITION 15 In an extensive form of a single-act nonzero-sum finite game with a fixed order of play, a player Pi is said to be a precedent of another player Pj if the former is situated closer to the vertex of the tree than the latter. The extensive form is said to be nested if each player has access to the information acquired by all his precedents. If, furthermore, the only difference (if any) between the information available to a player (Pi) and his closest (immediate) precedent (say, Pi−1) involves only the actions of Pi−1, and only at those nodes corresponding to the branches of the tree emanating from singleton information sets of Pi−1, and this so for all players, the extensive form is said to be ladder-nested.† A single-act nonzero-sum finite game is said to be nested (respectively, ladder-nested) if it admits an extensive form that is nested (respectively, ladder-nested). □
Remark 7 The single-act extensive forms of Figs 1(a) and 2 are both ladder-nested. If the extensive form of Fig. 1(a) is modified so that both nodes of P2 are included in the same information set, then it is only nested, but not ladder-nested, since P3 can differentiate between different actions of P1 but P2 cannot. Finally, if the extensive form of Fig. 1(a) is modified so that this time P3 has a single information set (see Fig. 4(a)), then the resulting extensive form becomes nonnested, since then, even though P2 is a precedent of P3, he actually knows more than P3 does. The single-act game that this extensive form describes is, however, nested, since it also admits the extensive form description depicted in Fig. 4(b). □

One advantage of dealing with ladder-nested extensive forms is that they can recursively be decomposed into simpler tree structures which are basically static in nature. This enables one to obtain a class of Nash equilibria of such games recursively, by solving static games at each step of the recursive procedure. Before providing the details of this recursive procedure, let us introduce some terminology.

† Note that in 2-person single-act games the concepts of "nestedness" and "ladder-nestedness" coincide, and every extensive form is, by definition, ladder-nested.
Fig. 4. Two extensive forms of the same nested single-act game.

DEFINITION 16 For a given single-act dynamic game in nested extensive form (say, I), let η denote a singleton information set of Pi's immediate follower (say, Pj); consider the part of the tree structure of I which is cut off at η, has η as its vertex, and has as immediate branches only those that enter into that information set of Pj. Then, this tree structure is called a subextensive form of I. (Here, we adopt the convention that the starting vertex of the original extensive form is the singleton information set of the first-acting player.) □
Remark 8 The single-act game depicted in Fig. 2 admits two subextensive forms: one rooted at the branch u^1 = L, with terminal cost pairs (0, −1) and (2, 1), and one spanning P1's choice between M and R, with terminal cost pairs (3, 2), (0, 3), (2, 1) and (−1, 0).
The extensive form of Fig. 1(a), on the other hand, admits a total of four subextensive forms, which we do not display here. It should be noted that each subextensive form is itself an extensive form describing a simpler game. The first one described above corresponds to a degenerate 2-player game in which P1 has only one alternative. The second one again describes a 2-player game, in which the players each have two alternatives. Both of these subextensive forms will be called static, since the first one is basically a one-player game and the second one describes a static 2-player game. A precise definition follows.

DEFINITION 17 A subextensive form of a nested extensive form of a single-act game is static if every player appearing in this tree structure has a single information set. □
We are now in a position to extend the recursive procedure adopted in Example 6 to obtain (some of) the Nash equilibrium solutions of N-person single-act games in ladder-nested extensive form.

A recursive procedure to determine pure-strategy Nash equilibria of N-person single-act games in ladder-nested extensive form
(1) For each fixed information set of the last-acting player (say, PN), single out the players who have precisely the same information as PN, determine the static subextensive form that includes all these players and their single information sets, and solve the static game that this subextensive form describes. Assuming that each of these static games admits a unique pure-strategy Nash equilibrium solution, record the equilibrium strategies and the corresponding Nash outcomes by identifying them with the players and their information sets. (If any one of the static games admits more than one Nash equilibrium solution, then this procedure, as well as the following steps, is repeated for each of these multiple equilibria.)
(2) Replace each static subextensive form considered at step 1 with the immediate branch emanating from its vertex that corresponds to the Nash strategy of the starting player of that subextensive form; furthermore, attach the corresponding N-tuple of Nash equilibrium values to the end of this branch.
(3) For the remaining game in extensive form, repeat steps 1 and 2 until the extensive form left is a tree structure comprising only a vertex and some immediate branches with an N-tuple of numbers attached to each of them.
(4) The branch of this final tree structure which corresponds to a minimum loss for the starting player is his Nash equilibrium strategy, and the corresponding N-tuple of numbers at the end of this branch determines a Nash outcome for the original game.
The corresponding Nash equilibrium strategies of the other players in the original singleact game can then be determined from the solutions of the static subextensive forms, by appropriately identifying them with the players and their information sets. 0 Remark 9 Because of the laddernestedness property of the singleact game, it can readily be shown that every solution obtained through the foregoing recursive procedure is indeed a Nash equilibrium solution of the original singleact game. 0
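The recursive procedure can be sketched in code for the game of Fig. 2, with the leaf cost pairs as used in Example 6 (the decomposition into the two subextensive forms is hardwired here for brevity; a general implementation would derive it from the information-set structure):

```python
# Leaf cost pairs (J1, J2) of the game of Fig. 2, indexed by (u1, u2),
# with the values used in Example 6.
leaf = {('L', 'L'): (0, -1), ('L', 'R'): (2, 1),
        ('M', 'L'): (3, 2),  ('M', 'R'): (0, 3),
        ('R', 'L'): (2, 1),  ('R', 'R'): (-1, 0)}

# Step 1a: at P2's singleton information set (P1 played L), P2 simply
# minimizes his own loss.
u2_after_L = min(('L', 'R'), key=lambda u2: leaf[('L', u2)][1])

# Step 1b: at P2's two-node information set, P1 (restricted to M, R) and P2
# play the static bimatrix game (24); enumerate its pure Nash equilibria.
rows, cols = ('M', 'R'), ('L', 'R')
def pure_nash(u1, u2):
    return (all(leaf[(u1, u2)][0] <= leaf[(k, u2)][0] for k in rows) and
            all(leaf[(u1, u2)][1] <= leaf[(u1, l)][1] for l in cols))
eqs = [(u1, u2) for u1 in rows for u2 in cols if pure_nash(u1, u2)]
assert eqs == [('R', 'R')]          # the subgame equilibrium is unique

# Steps 2-4: P1 compares the L-branch with the value of the M/R subgame.
candidates = {'L': leaf[('L', u2_after_L)], 'R': leaf[eqs[0]]}
u1_star = min(candidates, key=lambda u1: candidates[u1][0])
# Result: u1_star == 'R', P2's policy plays L after L and R otherwise,
# and the Nash outcome is (-1, 0), i.e. exactly (25a)-(25c).
```

As the example warns, this delivers only the delayed-commitment equilibrium; the constant-strategy equilibrium (26a)-(26b) is invisible to the backward decomposition.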
A most natural question to raise now is whether the preceding recursive procedure generates all the Nash equilibrium solutions of a given Nperson
single-act game in ladder-nested extensive form. We have already seen in Example 6 that this is, in fact, not so, since out of the two Nash equilibrium solutions only one was generated by this recursive procedure. Then, the question may be rephrased as: precisely what subclass of Nash equilibria is generated by the preceding scheme? In other words, what common feature (if any) can we attribute to the Nash equilibria that can be derived recursively? To shed some light on this matter, it will be instructive to refer again to Example 6 and to take a closer look at the two Nash equilibria obtained:

Example 6 (contd) For the single-act game of Fig. 2, the recursive procedure generated the solution (25a)-(25b), with Nash cost pair (−1, 0). When compared with the other Nash cost pair, which is (0, −1), it is clear that the former is more advantageous for P1. Hence, P2 would rather prefer to play the constant strategy (26a) and thereby attain a favourable cost level of J^2 = −1. But, since he acts later, the only way he can ensure such a cost level is by announcing his constant strategy ahead of time and by irrevocably sticking to it. If he has the means of doing this, and further if P1 has strong reasons to believe in such an attitude on the part of P2, then (0, −1) will clearly be the only cost pair to be realized. Hence, the constant-strategy Nash solution in this case corresponds to a "prior commitment" mode of play on the part of the last-acting player, a notion which has been introduced earlier in Section 2.5 within the context of multi-act zero-sum games, and which also has connections with the Stackelberg mode of play to be discussed in the next section of this chapter. Now, yet another mode of play for P2 (the second-acting player) would be to wait until he observes at what information set he is at the time of his play, and only then decide on his action.
This is known as the "delayed commitment" type of attitude, which has also been introduced earlier in Section 2.5, within the context of zero-sum games and in conjunction with behavioural strategies. For the single-act game of Example 6, a Nash equilibrium solution of the delayed commitment type is the one given by (25a)-(25b), which has been obtained using the recursive procedure. This discussion now leads us to the following definition.

DEFINITION 18 A Nash equilibrium solution of a ladder-nested single-act finite N-person game is of the delayed commitment type if it can be obtained through the recursive procedure outlined earlier, i.e. by solving only static single-act games. □
Remark 10 As a side remark, it is noteworthy that in the single-act game of Example 6 additional information for P2 concerning the action of P1 could be detrimental. For, if his information set were a single one (i.e. the static version), then the game would admit the unique Nash solution (26a)-(26b) with P2's
Nash cost being J² = -1. In the setup of the extensive form of Fig. 2, however, he could end up with a higher Nash cost of J² = 0. Hence, in nonzero-sum games, additional information could be detrimental for the player who receives it. □ We now illustrate, through two examples, the steps involved in the outlined recursive procedure to obtain admissible Nash equilibrium strategies of the delayed commitment type. Example 7 Consider the 3-person single-act game whose extensive form (of the ladder-nested type) is depicted in Fig. 1(a). Here, both P2 and P3 have access to the action of P1 (and to no more information), and hence they are faced with a bimatrix game at each of their information sets. Specifically, if u¹ = L, then the corresponding bimatrix game is
[2 × 2 bimatrix game; entries not recoverable from the scan]

where the entries of matrix A denote the possible losses to P2 and the entries of matrix B denote the possible losses to P3. This bimatrix game admits a unique admissible Nash equilibrium solution, which is {u²* = L, u³* = L}, with the corresponding Nash values being -1 and 0 for P2 and P3, respectively. If u¹ = R, on the other hand, the bimatrix game of interest is

[2 × 2 bimatrix game; entries not recoverable from the scan]

This game admits a unique Nash solution, which is {u²* = R, u³* = L}, with a Nash cost pair of (-2, -1). Hence, regardless of what P1 plays, the delayed commitment type admissible Nash equilibrium strategies of P2 and P3 in this ladder-nested game are unique, and they are given as

γ²*(η²) = u¹,   γ³*(η³) = L.   (i)
(Note that P3's equilibrium strategy is a constant.) Now, if P1 picks u¹ = L, his loss under (i) will be 1, but otherwise his loss is 0. Hence he also has a unique equilibrium strategy

γ¹* = R.   (ii)
Strategies (i) and (ii) now constitute the unique delayed commitment type admissible Nash equilibrium solution of the single-act game of Fig. 1(a), which can also be directly verified by again referring to the set of inequalities (23). The corresponding Nash equilibrium outcome is (0, -2, -1). □ Example 8 Consider the 4-person single-act game whose extensive form (of the ladder-nested type) is depicted in Fig. 5. The last-acting player is P4, and he has three information sets. The first one (counting from left) tells him whether P1 has actually played L or not, but nothing concerning the actions of the other players. The same is true for P2 and P3. The corresponding three-player static game admits an equivalent normal form which is the one considered in Example 3, with only Pj replaced by Pj+1 in the present context. Furthermore, we now also have the costs of an additional player (P1) showing up at the end of the branches (i.e. in the present context we have a quadruple
Fig. 5. A 4-person single-act ladder-nested game in extensive form.
cost instead of a triple). It is already known that this game admits a unique Nash equilibrium solution which, under the present convention, is the one obtained in Example 3, and this is so if P1 has picked u¹ = L, with the corresponding cost quadruple being (1, 1, -1, 0). The second information set of P4 tells him whether P1 has picked R and P2 has picked L; and P3 also has access to this information. Hence, we now have a bimatrix game under consideration, which is essentially the first bimatrix game of Example 7, admitting the unique solution {u³* = L, u⁴* = L}, and this is so if u¹ = R, u² = L, with the corresponding cost quadruple being (0, 1, -1, 0).
The third information set of P4 provides him with the information u¹ = R, u² = R. P3 has access to the same information, and thus the equivalent normal form is the second bimatrix game of Example 7, which admits the unique Nash solution {u³* = R, u⁴* = L}, with the corresponding cost quadruple being (0, 0, -2, -1). Using all these, we can now write down the unique Nash equilibrium strategies of P3 and P4, which are given as

γ³*(η³) = R if u¹ = R, u² = R, and L otherwise,   (i)

γ⁴*(η⁴) = L,   (ii)

respectively. Deletion of all the branches of the extensive tree already used now leads to the simpler tree structure
[Tree: P1 chooses L, with outcome (1, 1, -1, 0), or R, after which P2 chooses L, with outcome (0, 1, -1, 0), or R, with outcome (0, 0, -2, -1).]
In this game, P2 acts only if u¹ = R, in which case he definitely picks u² = R, since this yields him a lower cost (0) as compared with the cost of 1 he would obtain otherwise. This also completely determines the delayed commitment type Nash equilibrium strategy of P2 in the game of Fig. 5, which is unique and is given by

γ²*(η²) = u¹.   (iii)

Then, the final equivalent form that the game takes is

[Tree: P1 chooses L, with outcome (1, 1, -1, 0), or R, with outcome (0, 0, -2, -1).]

from which it is clear that the unique optimal strategy for P1 is

γ¹* = R.   (iv)

The strategies (i)-(iv) constitute the unique delayed commitment type Nash equilibrium solution of the extensive form of Fig. 5 (which is clearly also
admissible), and the reader is also encouraged to check this result by verifying satisfaction of inequalities (23). The corresponding unique Nash equilibrium outcome is (0, 0, -2, -1). □ Nested extensive forms which are not ladder-nested cannot be decomposed into static subextensive forms, and hence the recursive derivation does not directly apply to such games. However, some nested extensive forms still admit a recursive decomposition into simpler subextensive forms, some of which will be dynamic in nature. Hence, an extended version of the recursive procedure applies for nested games, which involves the solution of simpler dynamic subextensive forms. In this context, the delayed commitment mode of play also makes sense, as will be introduced in the sequel. DEFINITION 19 A nested extensive (or subextensive) form of a single-act game is said to be undecomposable if it does not admit any simpler subextensive form. It is said to be dynamic if at least one of the players has more than one information set. □ DEFINITION 20 For an N-person single-act dynamic game in nested undecomposable extensive form (say, I), let J denote the set of games which are informationally inferior to I. Let {γ¹*, . . . , γᴺ*} be an N-tuple of strategies in Nash equilibrium for I, and further let j* denote the number of games in J to which this N-tuple provides an equilibrium solution. Then, {γ¹*, . . . , γᴺ*} is of the delayed commitment type if there exists no other N-tuple of Nash equilibrium strategies of I which constitutes an equilibrium solution to a smaller (than j*) number of games in J.†
The following example now provides an illustration of these notions, as well as of the derivation of delayed commitment type Nash equilibria in nested games. Example 9 Consider the 3-person single-act game whose extensive form is displayed in Fig. 6. In this game P3 has access to P1's action, but P2 does not. Hence, the game is nested but not ladder-nested. Furthermore, it is both dynamic and undecomposable, with the latter feature eliminating the possibility of a recursive derivation of any of its Nash equilibria. Then, the only method is to bring the single-act game into equivalent normal form and to obtain the Nash equilibria of this normal form. Towards this end, let us first note that P1 and P2 each have two possible strategies: γ¹₁ = L, γ¹₂ = R; γ²₁ = L, † The reader should verify for himself that every single-act game that admits Nash equilibria in pure strategies has at least one equilibrium solution of the delayed commitment type, i.e. the set of such equilibria is not empty unless the game does not admit any pure-strategy Nash equilibrium solution.
γ²₂ = R. P3, on the other hand, has four possible strategies:

γ³₁(η³) = L,  γ³₂(η³) = R,  γ³₃(η³) = u¹,  γ³₄(η³) = L if u¹ = R, and R otherwise.

Fig. 6. A nested single-act game in extensive form which is dynamic and undecomposable.
Using the terminology of section 3, we now display the normal form as two 2 × 4 matrices of cost triples, one for each strategy of P1, with rows indexed by P2's two strategies and columns by P3's four strategies.

[Matrix entries not recoverable from the scan; two cells are marked as Nash equilibria.]

(Here, the integers 1, 2, . . . identify particular strategies of the players in accordance with the subscripts attached to the γⁱ's.) This normal form admits two Nash equilibria, as indicated, which correspond, in the original game, to the strategy triplets

{γ¹* = L, γ²* = L, γ³*(η³) = u¹}   (i)

and (ii).
There exists only one game which is informationally inferior to the single-act game under consideration, namely the one obtained by allowing a single information set for P3 (i.e. the static version). Since the second Nash solution dictates a constant action for P3, it clearly also constitutes a Nash solution for this informationally inferior static game (cf. Prop. 7). Hence, for the second set of Nash strategies, j* = 1, using the terminology of Def. 20. The first set of Nash strategies, on the other hand, does not constitute a Nash equilibrium solution to the informationally inferior game, and hence in this case j* = 0. This then yields the conclusion that the nested single-act game of Fig. 6 admits a unique delayed commitment type Nash equilibrium, given by (i), with an outcome of (0, 0, 0). □ For more general types of single-act finite games in nested extensive form, a recursive procedure could be used to simplify the derivation by decomposing the original game into static and dynamic undecomposable subextensive forms. The essential steps of such a recursive procedure are given below. Nash equilibria obtained through this recursive procedure will be called of the delayed commitment type. A recursive procedure to determine delayed commitment type pure-strategy Nash equilibria of N-person single-act games in nested extensive form (1) Each information set of the last-acting player is included in either a static subextensive form or a dynamic undecomposable subextensive form. Single out these extensive forms and obtain their delayed commitment type Nash equilibria. (Note that in static games every Nash equilibrium is, by definition, of the delayed commitment type.) Assuming that each of these games admits a unique pure-strategy Nash equilibrium solution of the delayed commitment type, record the equilibrium strategies and the corresponding Nash outcomes by identifying them with the players and their information sets. (If any one of these games admits more than one Nash equilibrium solution of the delayed commitment type, then this implies that the original game admits more than one such equilibrium, and this procedure as well as the following steps will have to be repeated for each of these multiple equilibria.) (2) Replace each subextensive form considered at step 1 with the immediate branch emanating from its vertex that corresponds to the delayed commitment Nash strategy of the starting player of that subextensive form; furthermore, attach the corresponding N-tuple of Nash equilibrium values to the end of this branch. (3) For the remaining game in nested extensive form, repeat steps 1 and 2 until the extensive form left is a tree structure comprised of only a vertex
and some immediate branches with an N-tuple of numbers attached to each of them. (4) The branch of this final tree structure which corresponds to a minimum loss for the starting player is his Nash equilibrium strategy of the delayed commitment type, and the corresponding N-tuple of numbers at the end of this branch determines a Nash outcome for the original game. The corresponding delayed commitment type Nash equilibrium strategies of the other players in the original single-act game can then be captured from the solution(s) of the subextensive forms considered, by appropriately identifying them with the players and their information sets. □ Example 10 To illustrate the steps of the foregoing recursive procedure, let us consider the 3-person single-act game of Fig. 7, which is in nested extensive form. At step 1 of the recursive procedure, we have two 2-person static subextensive forms and one 3-person dynamic undecomposable subextensive form. Counting from left, the first static subextensive form admits the unique Nash equilibrium solution (from Example 7):
{u²* = L, u³* = L}.   (i)

Fig. 7. The single-act game of Example 10.
The second static subextensive form also admits a unique solution, which is (again from Example 7):

{u²* = R, u³* = L}.   (ii)

It should be noted that (i) corresponds to the information sets η² = η³ = {u¹ = L} and (ii) corresponds to η² = η³ = {u¹ = M1}. Now, the dynamic undecomposable subextensive form is in fact the single-act game of Example 9, which is already known to admit the unique delayed commitment type Nash equilibrium solution

γ¹* = M2,  γ²* = L,  γ³*(η³) = L if u¹ = M2, and R if u¹ = R.
At step 2, therefore, the extensive form of Fig. 7 reduces to a tree in which P1 chooses among L, M1 and M2, with the outcome triples (1, -1, 0), (0, -2, -1) and (0, 0, 0) attached to the respective branches, and it readily follows from this tree structure that P1 has two Nash equilibrium strategies: γ¹* = M1 and γ¹* = M2.
Consequently, the original single-act game admits two Nash equilibria of the delayed commitment type:

γ¹* = M1,  γ²*(η²) = R if u¹ = M1, and L otherwise,  γ³*(η³) = R if u¹ = R, and L otherwise,

and

γ¹* = M2,  γ²*(η²) = R if u¹ = M1, and L otherwise,  γ³*(η³) = R if u¹ = R, and L otherwise,

with the equilibrium outcomes being (0, -2, -1) and (0, 0, 0), respectively. □
Remark 11 As we have discussed earlier in this section, single-act nonzero-sum finite dynamic games admit, in general, a multiplicity of Nash equilibria, mainly due to the fact that Nash equilibria of informationally inferior games also provide Nash solutions to the original game (cf. Prop. 7); this is the so-called informational nonuniqueness of Nash equilibria. The delayed commitment mode of play introduced for nested and ladder-nested games, however, eliminates this informational nonuniqueness, and therefore strengthens the Nash equilibrium solution concept for dynamic single-act games. □
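The recursive procedure of this subsection can be mechanized. The sketch below treats the simplest ladder-nested case, a two-player single-act game in which P2 observes P1's action; the cost entries are hypothetical and are not those of any figure in the text:

```python
# Delayed-commitment Nash equilibrium of a single-act game in which P1
# acts first and P2 observes P1's action (a two-player ladder-nested
# tree).  costs[a][b] = (J1, J2) are hypothetical losses.
costs = {
    'L': {'L': (1, 0), 'R': (0, -1)},
    'R': {'L': (2, 1), 'R': (1, 2)},
}

# Step 1: at each information set of the last-acting player, solve the
# static game (here P2 simply minimizes his own loss J2).
reaction = {a: min(costs[a], key=lambda b: costs[a][b][1]) for a in costs}

# Step 2: replace each sub-tree by the outcome it generates and let P1
# minimize his own loss J1 over the remaining branches.
a_star = min(costs, key=lambda a: costs[a][reaction[a]][0])
b_star = reaction[a_star]
print(a_star, b_star, costs[a_star][b_star])
```

With these illustrative costs, P2's reaction map is the strategy γ²* and the pair (a_star, reaction) is the delayed-commitment equilibrium; if some static subgame admitted several equilibria, the procedure would have to be repeated for each, exactly as in step (1) of the recursive procedure above.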
3.5.2 Single-act games: Nash equilibria in behavioural and mixed strategies
When an N-person single-act game in extensive form does not admit a Nash equilibrium solution (in pure strategies), then an appropriate approach is to investigate Nash equilibria within the enlarged class of behavioural strategies, as has been done in section 2.5 for zero-sum games. If the nonzero-sum single-act finite game under consideration is of the "ladder-nested" type, then it is relatively simple to obtain Nash equilibria in behavioural strategies (which also fit well within a delayed commitment framework), since one still follows the recursive procedure developed in this section for pure strategies, but this time also considering the mixed-strategy Nash equilibria of the static subextensive forms. Then these mixed strategies, obtained for each equivalent normal form, will have to be appropriately concatenated and written in the form of a behavioural strategy. The following example now illustrates these steps in the derivation of behavioural Nash equilibria. Example 11 Consider the 3-person single-act game depicted in Fig. 8. This is of the ladder-nested type, and both P2 and P3 have access to the action of P1, but to no other information. Hence, we first have to consider two bimatrix games corresponding to the possible actions L and R of P1. If u¹ = L, then the bimatrix game is the one treated in Example 2, which admits the unique mixed-strategy Nash equilibrium
u²* = L w.p. 1/2 and R w.p. 1/2,   u³* = L w.p. 1/2 and R w.p. 1/2,
with the corresponding average cost triple being (0, 1/2, 3/2). If u¹ = R, on the other hand, the bimatrix game is

[2 × 2 bimatrix game; entries not recoverable from the scan]

Fig. 8. The 3-person single-act game of Example 11.
which admits the unique Nash solution (in pure strategies) {u²* = L, u³* = L}, with the corresponding cost triple being (1, 1, 1). Comparing this outcome with the previous one, it is obvious that the unique equilibrium strategy of P1 is γ¹* = L. Appropriate concatenation now yields the following unique behavioural Nash equilibrium strategies for P2 and P3, respectively:
γ²*(η²) = L w.p. 1/2 and R w.p. 1/2 if u¹ = L;  L w.p. 1 and R w.p. 0 if u¹ = R,

γ³*(η³) = L w.p. 1/2 and R w.p. 1/2 if u¹ = L;  L w.p. 1 and R w.p. 0 if u¹ = R.

The unique average Nash equilibrium outcome in behavioural strategies is (0, 1/2, 3/2). □ Since the recursive procedure involves the solution of static games in normal form at every step of the derivation, and since every N-person nonzero-sum finite game in normal form admits a mixed-strategy Nash equilibrium solution (cf. Thm 2), it follows that a Nash equilibrium solution in behavioural strategies always exists for single-act games in ladder-nested extensive form. This result is summarized below in Prop. 8, which is the counterpart of Corollary 2.4 in the present framework. PROPOSITION 8 In every single-act N-person game which has a ladder-nested extensive form comprising a finite number of branches, there exists at least one Nash equilibrium solution in behavioural strategies. □
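The mixed-strategy step used at each static subextensive form can be made concrete for the 2 × 2 case: each player randomizes so as to make the other player indifferent between his two actions. A minimal sketch with hypothetical (matching-pennies-like) cost entries, not those of Example 11, and under the assumption that the game has no pure-strategy equilibrium (so the denominators below are nonzero):

```python
# Interior mixed-strategy Nash equilibrium of a 2x2 bimatrix game in
# which A and B hold the COSTS of P1 and P2.  Each player's probability
# is chosen to make the OTHER player indifferent between his actions.
def mixed_2x2(A, B):
    # p = Prob(P1 plays row 0), from P2's indifference condition
    p = (B[1][1] - B[1][0]) / (B[0][0] - B[0][1] - B[1][0] + B[1][1])
    # q = Prob(P2 plays column 0), from P1's indifference condition
    q = (A[1][1] - A[1][0]) / (A[0][0] - A[0][1] - A[1][0] + A[1][1])
    return p, q

A = [[1, -1], [-1, 1]]   # P1's costs (hypothetical)
B = [[-1, 1], [1, -1]]   # P2's costs (hypothetical)
print(mixed_2x2(A, B))   # -> (0.5, 0.5)
```

Concatenating such mixed solutions over the information sets of the later-acting players, as in Example 11, is precisely what produces a behavioural strategy.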
Remark 12 It should be noted that, as opposed to Corollary 2.4, Prop. 8 does not claim an ordered interchangeability property in the case of multiple Nash equilibria, for reasons which have been discussed extensively in sections 2 and 3 of this chapter. Furthermore, the result might fail to hold true if the single-act game is not ladder-nested, in which case one has to consider the even larger class of mixed strategies. □
Remark 13 Since the players have only a finite number of possible pure strategies, it readily follows from Thm 2 that there always exists a Nash equilibrium in mixed strategies in single-act finite games of the general type. However, computation of these mixed equilibrium strategies is quite a nontrivial problem, since a recursive procedure cannot be developed for finite single-act games which are not ladder-nested. □
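Although no recursive shortcut exists in the general case, whether a given mixed-strategy profile is a Nash equilibrium can always be verified directly, since no mixed unilateral deviation can do better than the best pure one. A sketch with hypothetical two-player data (the same test applies verbatim to N players):

```python
# Check whether a mixed-strategy profile is a Nash equilibrium of an
# N-person finite game by testing every unilateral PURE deviation.
import itertools

def expected_cost(cost, profile, player):
    total = 0.0
    for actions in itertools.product(*[range(len(p)) for p in profile]):
        prob = 1.0
        for j, a in enumerate(actions):
            prob *= profile[j][a]
        total += prob * cost[player][actions]
    return total

def is_nash(cost, profile, tol=1e-9):
    for i, mix in enumerate(profile):
        base = expected_cost(cost, profile, i)
        for a in range(len(mix)):
            pure = [[1.0 if k == a else 0.0 for k in range(len(mix))]]
            dev = profile[:i] + pure + profile[i + 1:]
            if expected_cost(cost, dev, i) < base - tol:
                return False
    return True

# Hypothetical zero-sum (matching-pennies) cost data, keyed by action tuples:
cost = [
    {(0, 0): 1, (0, 1): -1, (1, 0): -1, (1, 1): 1},    # P1's losses
    {(0, 0): -1, (0, 1): 1, (1, 0): 1, (1, 1): -1},    # P2's losses
]
profile = [[0.5, 0.5], [0.5, 0.5]]
print(is_nash(cost, profile))  # -> True
```

Finding equilibria in general still requires searching over supports, which is what makes the computation mentioned in Remark 13 nontrivial; the check above only certifies a candidate.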
3.5.3 Multi-act games: pure-strategy Nash equilibria

In the discussion of the Nash equilibria of multi-act nonzero-sum finite games in extensive form, we shall follow rather closely the analysis of section 2.5, which was devoted to similarly structured zero-sum games, since the results to be obtained for multi-act nonzero-sum games will be direct extensions of their counterparts in the case of zero-sum games. Not all conclusions drawn in section 2.5, however, are valid in the present context, and these important differences between the properties of saddle-point and Nash equilibrium solutions will also be elucidated in the sequel. In order to be able to develop a systematic (recursive) procedure to obtain Nash equilibria of multi-act nonzero-sum finite games, we will have to impose (as in section 2.5) some restrictions on the nature of the informational coupling among different levels of play. To this end, we confine ourselves to multi-act games whose Nash equilibria can be obtained by solving a sequence of single-act games and by appropriate concatenation of the equilibrium strategies determined at each level of play. This reasoning, then, brings us to the so-called "feedback games", a precise definition of which is given below (as a counterpart of Def. 2.11). DEFINITION 21 A multi-act N-person nonzero-sum game in extensive form with a fixed order of play is called an N-person nonzero-sum feedback game in extensive form if (i) at the time of his act, each player has perfect information concerning the current level of play, i.e. no information set contains nodes of the tree belonging to different levels of play; (ii) information sets of the first-acting player at every level of play are singletons, and the information sets of the other players at every level of play are such that none of them includes nodes corresponding to branches emanating from two or more different information sets of the first-acting player, i.e. each player knows the state of the game at every level of play.
If, furthermore, (iii) the single-act games corresponding to the information sets of the first-acting player at each level of play are of the ladder-nested (respectively, nested) type (cf. Def. 15), then the multi-act game is called an N-person nonzero-sum feedback game in ladder-nested (respectively, nested) extensive form. □ Remark 14 It should be noted that, in the case of two-person multi-act games, every feedback game is, by definition, also in ladder-nested extensive form; hence, the zero-sum feedback game introduced earlier by Def. 2.11 is also ladder-nested. □ Now, paralleling the analysis of section 2.5, let the number of levels of play in an N-person nonzero-sum feedback game in extensive form be K, and consider a typical strategy γⁱ of Pi in such a game to be composed of K components {γ_1^i, . . . , γ_K^i}. Here, γ_j^i stands for the corresponding strategy of Pi at his jth level of act, and it can be taken to have as its domain the class of only those information sets of Pi which pertain to the jth level of play. Let us denote the collection of all such strategies of Pi at level j by Γ_j^i. Then, for such a game, the set of inequalities (23) can be written as (as a counterpart of (2.35))
J^i(γ^{1*}; . . . ; γ^{(i-1)*}; γ_1^{i*}, . . . , γ_K^{i*}; γ^{(i+1)*}; . . . ; γ^{N*})
  ≤ J^i(γ^{1*}; . . . ; γ^{(i-1)*}; γ_1^i, . . . , γ_K^i; γ^{(i+1)*}; . . . ; γ^{N*}),  i = 1, . . . , N,   (27)
which are to be satisfied for all γ_j^i ∈ Γ_j^i, i = 1, . . . , N; j = 1, . . . , K. On any N-tuple of Nash equilibrium strategies {γ^{1*}, . . . , γ^{N*}} that satisfies (27), let us impose the further restriction that it also satisfies (recursively) the following K N-tuples of inequalities for all γ_j^i ∈ Γ_j^i, i = 1, . . . , N; j = 1, . . . , K:

level K:
J^1(γ_1^1, . . . , γ_{K-1}^1, γ_K^{1*}; γ_1^2, . . . , γ_{K-1}^2, γ_K^{2*}; . . . ; γ_1^N, . . . , γ_{K-1}^N, γ_K^{N*})
  ≤ J^1(γ_1^1, . . . , γ_{K-1}^1, γ_K^1; γ_1^2, . . . , γ_{K-1}^2, γ_K^{2*}; . . . ; γ_1^N, . . . , γ_{K-1}^N, γ_K^{N*}),
J^2(γ_1^1, . . . , γ_{K-1}^1, γ_K^{1*}; γ_1^2, . . . , γ_{K-1}^2, γ_K^{2*}; . . . ; γ_1^N, . . . , γ_{K-1}^N, γ_K^{N*})
  ≤ J^2(γ_1^1, . . . , γ_{K-1}^1, γ_K^{1*}; γ_1^2, . . . , γ_{K-1}^2, γ_K^2; . . . ; γ_1^N, . . . , γ_{K-1}^N, γ_K^{N*}),
  ⋮
J^N(γ_1^1, . . . , γ_{K-1}^1, γ_K^{1*}; . . . ; γ_1^N, . . . , γ_{K-1}^N, γ_K^{N*})
  ≤ J^N(γ_1^1, . . . , γ_{K-1}^1, γ_K^{1*}; . . . ; γ_1^N, . . . , γ_{K-1}^N, γ_K^N);

level K-1:
J^1(γ_1^1, . . . , γ_{K-2}^1, γ_{K-1}^{1*}, γ_K^{1*}; γ_1^2, . . . , γ_{K-2}^2, γ_{K-1}^{2*}, γ_K^{2*}; . . . ; γ_1^N, . . . , γ_{K-2}^N, γ_{K-1}^{N*}, γ_K^{N*})
  ≤ J^1(γ_1^1, . . . , γ_{K-2}^1, γ_{K-1}^1, γ_K^{1*}; γ_1^2, . . . , γ_{K-2}^2, γ_{K-1}^{2*}, γ_K^{2*}; . . . ; γ_1^N, . . . , γ_{K-2}^N, γ_{K-1}^{N*}, γ_K^{N*}),
  ⋮ (and similarly for J^2, . . . , J^N);
  ⋮
down to level 1.   (28)
DEFINITION 22 For an N-person nonzero-sum feedback game in extensive form with K levels of play, let {γ^{1*}, . . . , γ^{N*}} be an N-tuple of strategies satisfying (27) and (28) for all γ_j^i ∈ Γ_j^i, i = 1, . . . , N; j = 1, . . . , K. Then, {γ^{1*}, . . . , γ^{N*}} constitutes a (pure) feedback Nash equilibrium solution for the feedback game. □ PROPOSITION 9 Every N-tuple {γ^{1*}, . . . , γ^{N*}} that satisfies the set of inequalities (28) also satisfies the set of N inequalities (27). Hence, the requirement for satisfaction of (27) in Def. 22 is redundant, and it can be deleted without any loss of generality.
Proof This basically follows the lines of argument used in the proof of Prop. 2.3, with the two inequalities at each level of play replaced by N inequalities in the present case. □
The set of inequalities (28) now readily suggests a recursive procedure for the derivation of feedback Nash equilibria in feedback nonzero-sum games. Starting at level K, we have to solve a single-act game for each information set
of the first-acting player at each level of play, and then appropriately concatenate all the equilibrium strategies (with restricted domains) thus obtained. If the single-act games to be encountered in this recursive procedure admit more than one (pure-strategy) Nash equilibrium, the remaining analysis will have to be repeated for each one of these equilibria so as to determine the complete set of feedback Nash equilibria the multi-act game admits. For nested multi-act feedback games one might wish to restrict the class of Nash equilibria of interest even further, so that only those obtained by concatenation of delayed commitment type equilibria of single-act games are considered. The motivations behind such a further restriction are: (i) the delayed commitment mode of play conforms well with the notion of feedback Nash equilibria and it eliminates informational nonuniqueness, and (ii) Nash equilibria at every level of play can then be obtained by utilizing the recursive technique developed earlier in this section for single-act games (which is computationally attractive, especially for ladder-nested games). A precise definition now follows. DEFINITION 23 For a nonzero-sum feedback game in extensive form, a feedback Nash equilibrium solution is of the delayed commitment type if its appropriate restriction at each level of play constitutes a delayed commitment type Nash equilibrium (cf. Defs 18, 20) for each corresponding single-act game to be encountered in the recursive derivation. □ Heretofore, the discussion on multi-act feedback games has been confined only to feedback Nash equilibria (of the delayed commitment type or otherwise). However, such games could very well admit other types of Nash equilibria which satisfy inequalities (23) but which do not fit the framework of Def. 22. The main reason for this phenomenon is the "informational nonuniqueness" feature of Nash equilibria in dynamic games, which has been discussed earlier in this section within the context of single-act games. In the present context, a natural extension of Prop. 7 is valid, which we provide below after introducing some terminology and also the concept of informational inferiority in nonzero-sum feedback games. DEFINITION 24 Let I be an N-person multi-act nonzero-sum game with a fixed order of play, which satisfies the first requirement of Def. 21. An N-person open-loop game in extensive form (say, II) (cf. Def. 2.13) is said to be the static (open-loop) version of I if it can be obtained from I by replacing, at each level of play, the information sets of each player with a single information set. The equivalent normal form of the open-loop version of I is the static normal form of I. □
DEFINITION 25 Let I and II be two N-person multi-act nonzero-sum games with fixed orders of play, which satisfy the first requirement of Def. 21. Further, let Γ_I^i and Γ_II^i denote the strategy sets of Pi in I and II, respectively. Then, I is informationally inferior to II if Γ_I^i ⊆ Γ_II^i for all i ∈ N, with strict inclusion for at least one i. □ PROPOSITION 10 Let I and II be two N-person multi-act nonzero-sum games as introduced in Def. 25, so that I is informationally inferior to II. Then, (i) any Nash equilibrium solution for I is also a Nash equilibrium solution for II; (ii) if {γ¹, . . . , γᴺ} is a Nash equilibrium solution for II such that γⁱ ∈ Γ_I^i for all i = 1, . . . , N, then it is also a Nash equilibrium solution for I.
Proof It is analogous to the proof of Prop. 7, and is therefore omitted. □ Remark 15 It should now be apparent why informationally nonunique Nash equilibria exist in nonzero-sum multi-act feedback games. Every nontrivial multi-act feedback game in which the players have at least two alternatives at each level of play admits several informationally inferior multi-act games. These different games in general admit different Nash equilibria which are, moreover, not interchangeable. Since every one of these also constitutes a Nash equilibrium for the original multi-act game (cf. Prop. 10), the existence of a plethora of informationally nonunique equilibria readily follows. The further restrictions of Defs 22 and 23, imposed on the concept of noncooperative equilibrium, clearly eliminate this informational nonuniqueness, and they stand out as providing one possible criterion according to which a further selection can be made. This "informational nonuniqueness" property of noncooperative equilibrium in dynamic games is, of course, also featured by zero-sum games, since they are special types of nonzero-sum games, in which case the Nash equilibrium solution coincides with the saddle-point solution. However, since all saddle-point strategies have the ordered interchangeability property, and further since the saddle-point value is unique, nonuniqueness of equilibrium strategies does not create a problem in zero-sum dynamic games; this is the main reason why we have not included a discussion on informational nonuniqueness in Chapter 2. It should be noted, however, that for 2-person zero-sum multi-act feedback games, Def. 23 is redundant and Def. 22 becomes equivalent to Def. 2.12, which clearly dictates a delayed commitment mode of play. □ The following example now serves to illustrate both the recursive procedure and the informational nonuniqueness of Nash equilibria in feedback nonzero-sum games.
Example 12 Consider the multi-act feedback game whose extensive form is depicted in Fig. 9. Since this is a 2-person game (of the feedback type), it is definitely also ladder-nested (or, equivalently, nested), and hence derivation of its delayed commitment type feedback Nash equilibria involves solutions of only static subextensive forms. At the second (and last) level of play, there are four single-act games, one for each singleton information set of P1. The first of these (counting from left) admits the unique Nash equilibrium solution of the delayed commitment type

γ_2^{1*}(η_2^1) = L,  γ_2^{2*}(η_2^2) = R if u_2^1 = L, and L otherwise,   (i)

with the Nash equilibrium outcome being (1, 0). The second one admits the unique Nash equilibrium solution (ii), with an outcome of (1, 1). The third single-act game again has a unique Nash equilibrium, given by (iii), with an outcome of (0, -1/2). The last one is the single-act game of Example 6, which admits the unique delayed commitment type Nash equilibrium solution

γ_2^{1*}(η_2^1) = R,  γ_2^{2*}(η_2^2) = R if u_2^1 = M, and L otherwise,   (iv)

with the Nash outcome being (-1, 0). When (i)-(iv) are collected together, we obtain the delayed commitment type Nash equilibrium solution of the
Fig. 9. The multi-act feedback nonzero-sum game of Example 12.
feedback game at level 2 to be uniquely given as

γ_2^{1*}(η_2^1) as specified by (i)-(iv) at the respective information sets, and
γ_2^{2*}(η_2^2) = R if {u_1^1 = L, u_1^2 = L, u_2^1 = L} or {u_1^1 = R, u_1^2 = R, u_2^1 = M}, and L otherwise.   (29a)
Now, crossing out the second level of play, we end up with the single-act game with the level-2 outcome pairs (1, 0), (1, 1), (0, -1/2) and (-1, 0) attached to its four terminal branches,
whose equivalent normal form is the corresponding 2 × 2 bimatrix game,
which admits, as indicated, the unique Nash equilibrium solution
γ_1^{1*}(η_1^1) = R,  γ_1^{2*}(η_1^2) = L,   (29b)

with the corresponding outcome being

(0, -1/2).   (29c)
In conclusion, the feedback game of Fig. 9 admits a unique feedback Nash equilibrium solution of the delayed commitment type, which is given by (29a)-(29b), with the corresponding Nash outcome being (29c). This recursively obtained Nash equilibrium solution, although it is unique as a delayed commitment type feedback Nash equilibrium, is clearly not unique as a feedback Nash equilibrium solution, since it is already known from Example 6 that the fourth single-act game encountered at the first step of the recursive derivation admits another Nash equilibrium solution which is not of the delayed commitment type. This equilibrium solution is the constant strategy pair given by (26a)-(26b), that is

γ_2^{1*}(η_2^1) = L,  γ_2^{2*}(η_2^2) = L,

which has to replace (iv) in the recursive derivation. The Nash outcome
corresponding to this pair of strategies is (0, -1). The strategy pair (29a) will now be replaced by

γ_2^{1*}(η_2^1) as before, except that it now picks L at the information sets with u_1^1 = R, u_1^2 = R, and
γ_2^{2*}(η_2^2) = R if u_1^1 = L, u_1^2 = L, u_2^1 = L, and L otherwise,   (30a)
which also provides a Nash equilibrium solution at level 2. Now, crossing out the second level of play, this time we have the single-act game whose cost pairs, counting from the left, are (1,0), (1,1), (0,-1/2) and (0,-1), and whose equivalent normal form is

          P2                       P2
        L     R                  L      R
A =  L  1     1          B =  L  0      1
     R  0     0               R  -1/2  -1

which admits the unique Nash equilibrium solution

γ^{1*}_1(η^1_1) = γ^{2*}_1(η^2_1) = R   (30b)

with the corresponding outcome being

(0, -1).   (30c)
We therefore have the conclusion that (30a)-(30b) provide another feedback Nash equilibrium solution for the multi-act game of Fig. 9, which, however, is not of the delayed commitment type. It is noteworthy that the unique feedback Nash solution of the delayed commitment type is in this case inadmissible, which follows immediately by a comparison of (29c) and (30c). To determine the other (non-feedback type) Nash equilibria of this multi-act game, the only possible approach is to bring it into equivalent normal form and to investigate Nash equilibria of the resulting bimatrix game. This is, however, quite a strenuous exercise, since the bimatrix game is of rather high dimensions. Specifically, at level 1 each player has 2 possible strategies, while at level 2 P1 has 2^3 × 3 = 24 and P2 has 2^6 = 64 possible strategies. This then implies that the bimatrix game is of dimensions 48 × 128. Two Nash equilibria
of this bimatrix game are the ones given by (29) and (30), but there are possibly others, which in fact correspond to Nash equilibria of multi-act games which are informationally inferior to the one of Fig. 9; one of these games is the open-loop game, which is the static version of the original feedback game. □
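Pure-strategy Nash equilibria of such equivalent normal forms can be found by direct enumeration of all cells. The following Python sketch (the helper function is ours, not from the text) applies this to the level-1 bimatrix game derived above; both players are minimizers:

```python
def pure_nash(A, B):
    """Enumerate pure-strategy Nash equilibria of a bimatrix game.

    A[i][j] and B[i][j] are the costs to P1 and P2 when P1 picks row i
    and P2 picks column j; both players minimize.
    """
    m, n = len(A), len(A[0])
    return [(i, j)
            for i in range(m) for j in range(n)
            if all(A[i][j] <= A[k][j] for k in range(m))      # P1 cannot improve
            and all(B[i][j] <= B[i][l] for l in range(n))]    # P2 cannot improve

# Level-1 game with cost pairs (1,0), (1,1), (0,-1/2), (1,0); indices 0 = L, 1 = R.
A = [[1, 1], [0, 1]]
B = [[0, 1], [-0.5, 0]]
print(pure_nash(A, B))   # -> [(1, 0)], i.e. P1 plays R and P2 plays L
```

Running the same enumeration on the variant whose fourth cost pair is (0, -1) yields the single equilibrium (R, R), in agreement with (30b).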
3.5.4  Multi-act games: behavioural and mixed equilibrium strategies
Behavioural equilibrium strategies for an N-person feedback game in ladder-nested extensive form can be derived by basically following the recursive procedure developed in section 3.5.3 for pure-strategy feedback equilibria, but this time by also considering the behavioural-strategy Nash equilibria of the single-act games to be encountered in the derivation. The justification of applicability of such a recursive procedure is analogous to the one of behavioural saddle-point equilibria of feedback games discussed in section 2.5. Specifically, if K denotes the number of levels of play in the N-person feedback game of the ladder-nested type, then a typical behavioural strategy γ̂^i of Pi in such a game can be decomposed into K components (γ̂^i_1, ..., γ̂^i_K), where γ̂^i_j ∈ Γ̂^i_j is the corresponding behavioural strategy of Pi at his jth level of act. Furthermore, γ̂^i_j has, as its domain, only the class of those information sets of Pi which pertain to the jth level of play. Then, the set of inequalities that determines the behavioural Nash equilibrium solution can be written as

J̄^{1*} = J̄^1(γ̂^{1*}; γ̂^{2*}; ...; γ̂^{N*}) ≤ J̄^1(γ̂^1_1, ..., γ̂^1_K; γ̂^{2*}; ...; γ̂^{N*})
   ⋮                                                                   (31)
J̄^{N*} = J̄^N(γ̂^{1*}; ...; γ̂^{N*}) ≤ J̄^N(γ̂^{1*}; ...; γ̂^{(N-1)*}; γ̂^N_1, ..., γ̂^N_K),

which have to be satisfied for all γ̂^i_j ∈ Γ̂^i_j, j = 1, ..., K; i ∈ N. Under the delayed commitment mode of play, we impose the further restriction that they satisfy the K N-tuple inequalities similar to (28), with only γ^i_j replaced by γ̂^i_j and J^i by J̄^i. A counterpart of Prop. 9 also holds true in the present framework, and this readily leads to recursive derivation of behavioural-strategy Nash equilibria in feedback games of the ladder-nested type. This procedure always leads to a Nash equilibrium solution in behavioural strategies, as the following proposition verifies.

PROPOSITION 11  Every finite N-person nonzero-sum feedback game in ladder-nested extensive form admits a Nash equilibrium solution in behavioural strategies, which can be determined recursively.
Proof. This result is an immediate consequence of Prop. 8, since every feedback game in ladder-nested extensive form can be decomposed recursively into ladder-nested single-act games. □
For feedback games which are not of the ladder-nested type, and also for other types of (non-feedback) multi-act games, Prop. 11 is not in general valid, and a Nash equilibrium in behavioural strategies (even if it exists) cannot be obtained recursively. Then, the only approach would be to bring the extensive form into equivalent normal form, obtain all mixed-strategy Nash equilibria of this normal form, and to seek whether any one of these equilibria would constitute a Nash equilibrium in behavioural strategies. This argument is now supported by the following two propositions, which are direct counterparts of Props 2.4 and 2.5, respectively.

PROPOSITION 12  Every N-person nonzero-sum multi-act finite game in extensive form admits a Nash equilibrium solution in mixed strategies.

Proof. The result readily follows from Thm 2, since every such game can be transformed into an equivalent normal form in which each player has a finite number of strategies. □

PROPOSITION 13  Every Nash equilibrium of a finite N-person nonzero-sum multi-act game in behavioural strategies also constitutes a Nash equilibrium in the larger class of mixed strategies.
Proof. Let {γ̂^{1*} ∈ Γ̂^1, ..., γ̂^{N*} ∈ Γ̂^N} denote an N-tuple of behavioural strategies stipulated to be in Nash equilibrium, and let Γ̄^i denote the mixed-strategy set of Pi, i ∈ N. Since Γ̂^i ⊂ Γ̄^i, we clearly have γ̂^{i*} ∈ Γ̄^i for every i ∈ N. Assume, to the contrary, that {γ̂^{1*}, ..., γ̂^{N*}} is not a mixed-strategy Nash equilibrium solution; then this implies that there exists at least one i (say, i = N, without any loss of generality) for which the corresponding inequality of (31)† is not satisfied for all γ̄^N ∈ Γ̄^N. In particular, there exists a γ̄^N ∈ Γ̄^N such that

J̄^{N*} > J̄^N(γ̂^{1*}; ...; γ̂^{(N-1)*}; γ̄^N).   (i)

Now, abiding by our standard convention, let Γ^N denote the pure-strategy set of PN, and consider the quantity

F(γ^N) ≜ J̄^N(γ̂^{1*}; ...; γ̂^{(N-1)*}; γ^N)

defined for each γ^N ∈ Γ^N. (This is well defined since Γ^N ⊂ Γ̄^N.) The infimum of this quantity over Γ^N is definitely achieved (say, by γ^{N°} ∈ Γ^N), since Γ^N is a finite set. Furthermore, since Γ̄^N is comprised of all probability distributions on Γ^N,

inf over Γ̄^N of F(γ̄^N) = inf over Γ^N of F(γ^N) = F(γ^{N°}).

We therefore have

F(γ^{N°}) ≤ F(γ̄^N) for all γ̄^N ∈ Γ̄^N,   (ii)

and also the inequality J̄^{N*} > F(γ^{N°}) in view of (i). But this is impossible since J̄^{N*} ≤ F(γ^N) for every γ^N ∈ Γ^N ⊂ Γ̂^N, thus completing the proof of the proposition. □

† Here, by an abuse of notation, we take J̄^i to denote the average loss to Pi also under mixed strategy N-tuples.
3.6  THE STACKELBERG EQUILIBRIUM SOLUTION

The Nash equilibrium solution concept that we have heretofore studied in this chapter provides a reasonable noncooperative equilibrium solution for nonzero-sum games when the roles of the players are symmetric, that is to say, when no single player dominates the decision process. However, there are yet other types of noncooperative decision problems wherein one of the players has the ability to enforce his strategy on the other player(s), and for such decision problems one has to introduce a hierarchical equilibrium solution concept. Following the original work of H. von Stackelberg (1934), the player who holds the powerful position in such a decision problem is called the leader, and the other players who react (rationally) to the leader's decision (strategy) are called the followers. There are, of course, cases of multi-levels of hierarchy in decision making, with many leaders and followers; but for purposes of brevity and clarity in exposition we will first confine our analysis to hierarchical decision problems which incorporate two players (decision makers): one leader and one follower. To set the stage to introduce the hierarchical (Stackelberg) equilibrium solution concept, let us first consider the bimatrix game (A, B) displayed (under our standard convention) as

(the 3 × 3 cost matrices A and B, with rows L, M, R belonging to P1 and columns L, M, R to P2; the unique Nash equilibrium and the Stackelberg solutions with P1 and with P2 as leader are indicated on the matrices)   (32)
This bimatrix game clearly admits a unique Nash equilibrium solution in pure strategies, which is {M, M}, with the corresponding outcome being (1,0). Let us now stipulate that the roles of the players are not symmetric and P1 can enforce his strategy on P2. Then, before he announces his strategy, P1 has to take into account possible responses of P2 (the follower), and in view of this, he has to decide on the strategy that is most favourable to him. For the decision problem whose possible cost pairs are given as entries of A and B above, let us now work out the reasoning that P1 (the leader) will have to go through. If P1 chooses L, then P2 has a unique response (that minimizes his cost), which is L, thereby yielding a cost of 0 to P1. If the leader chooses M, P2's response is again unique (which is M), with the corresponding cost incurred to P1 being 1. Finally, if he picks R, P2's unique response is also R, and the cost to P1 is 2. Since the lowest of these costs is the first one, it readily follows that L is the most reasonable choice for the leader (P1) in this hierarchical decision problem. We then say that L is the Stackelberg strategy of the leader (P1) in this game, and the pair {L, L} is the Stackelberg solution with P1 as the leader. Furthermore, the cost pair (0, -1) is the Stackelberg (equilibrium) outcome of the game with P1 as the leader. It should be noted that this outcome is actually more favourable for both players than the unique Nash outcome; this latter feature, however, is not a rule in such games. If, for example, P2 is the leader in the bimatrix game (32), then the unique Stackelberg solution is {L, R}, with the corresponding outcome being (3/2, -2/3), which is clearly not favourable for P1 (the follower) when compared with his unique Nash cost. For P2 (the leader), however, the Stackelberg outcome is again better than his Nash outcome.
The Stackelberg equilibrium solution concept introduced above within the context of the bimatrix game (32) is applicable to all two-person finite games in normal form, provided that they exhibit one feature which was inherent to the bimatrix game (32) and was used implicitly in the derivation: the follower's response to every strategy of the leader should be unique. If this requirement is not satisfied, then there is ambiguity in the possible responses of the follower, and thereby in the possible attainable cost levels of the leader. As an explicit example to demonstrate such a decision situation, consider the bimatrix game

(the 2 × 3 cost matrices A and B, with rows L, R for P1 and columns L, M, R for P2)   (33)

with P1 acting as the leader. Here, if P1 chooses (and announces) L, P2 has two optimal responses, L and M, whereas if P1 picks R, P2 again has two
optimal responses, L and R. Since this multiplicity of optimal responses for the follower results in a multiplicity of cost levels for the leader for each one of his strategies, the Stackelberg solution concept introduced earlier cannot directly be applied here. However, this ambiguity in the attainable cost levels of the leader can be resolved if we stipulate that the leader's attitude is towards securing his possible losses against the choices of the follower within the class of his optimal responses, rather than towards taking risks. Then, under such a mode of play, P1's secured cost level corresponding to his strategy L would be 1, and the one corresponding to R would be 2. Hence, we declare γ^{1*} = L as the unique Stackelberg strategy of P1 in the bimatrix game of (33), when he acts as the leader. The corresponding Stackelberg cost for P1 (the leader) is J^{1*} = 1. It should be noted that, in the actual play of the game, P1 could actually attain a lower cost level, depending on whether the follower chooses his optimal response γ^2 = L or the optimal response γ^2 = M. Consequently, the outcome of the game could be either (1,0) or (0,0), and hence we cannot talk about a unique Stackelberg equilibrium outcome of the bimatrix game (33) with P1 acting as the leader. Before concluding our discussion on this example, we finally note that the admissible Nash outcome of the bimatrix game (33) is (-1,-1), which is more favourable for both players than the possible Stackelberg outcomes given above. We now provide a precise definition for the Stackelberg solution concept introduced above within the context of two bimatrix games, so as to encompass all two-person finite games of the single-act and multi-act type which do not incorporate any chance moves. For such a game, let Γ^1 and Γ^2 again denote the pure-strategy spaces of P1 and P2, respectively, and J^i(γ^1, γ^2) denote the cost incurred to Pi corresponding to a strategy pair {γ^1 ∈ Γ^1, γ^2 ∈ Γ^2}.
Then, we have

DEFINITION 26  In a two-person finite game, the set R^2(γ^1) ⊂ Γ^2, defined for each γ^1 ∈ Γ^1 by

R^2(γ^1) = {ξ ∈ Γ^2 : J^2(γ^1, ξ) ≤ J^2(γ^1, γ^2), ∀γ^2 ∈ Γ^2},   (34)

is the optimal response (rational reaction) set of P2 to the strategy γ^1 ∈ Γ^1 of P1. □

DEFINITION 27  In a two-person finite game with P1 as the leader, a strategy γ^{1*} ∈ Γ^1 is called a Stackelberg equilibrium strategy for the leader, if

max over γ^2 ∈ R^2(γ^{1*}) of J^1(γ^{1*}, γ^2) = min over γ^1 ∈ Γ^1 of max over γ^2 ∈ R^2(γ^1) of J^1(γ^1, γ^2) ≜ J^{1*}.   (35)

The quantity J^{1*} is the Stackelberg cost of the leader. If, instead, P2 is the
leader, the same definition applies with only the superscripts 1 and 2 interchanged. □

THEOREM 3  Every two-person finite game admits a Stackelberg strategy for the leader.

Proof. Since Γ^1 and Γ^2 are finite sets, and R^2(γ^1) is a subset of Γ^2 for each γ^1 ∈ Γ^1, the result readily follows from (35). □
Remark 16  The Stackelberg strategy for the leader does not necessarily have to be unique. But nonuniqueness of the equilibrium strategy does not create any problem here (as it did in the case of Nash equilibria), since the Stackelberg cost for the leader is unique. □
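The min-max operation in (35) is directly computable for bimatrix games. The Python sketch below (our own helper, not from the text; the sample matrices are hypothetical ones exhibiting non-unique follower responses, in the spirit of game (33)) returns the leader's Stackelberg strategy and his secured cost:

```python
def stackelberg(A, B):
    """Leader's (P1's) Stackelberg strategy per Def. 27: minimize over rows
    the worst-case cost within the follower's optimal response set R2(row).
    Both players are minimizers; returns (row index, secured cost J1*)."""
    best_row, best_cost = None, float("inf")
    for i in range(len(A)):
        bmin = min(B[i])
        R2 = [j for j in range(len(B[i])) if B[i][j] == bmin]  # rational reactions
        secured = max(A[i][j] for j in R2)    # leader secures against ties
        if secured < best_cost:
            best_row, best_cost = i, secured
    return best_row, best_cost

# Hypothetical 2x3 game in which each row admits two optimal responses for
# the follower; rows 0, 1 = L, R and columns 0, 1, 2 = L, M, R.
A = [[1, 0, 2], [-1, 2, 2]]
B = [[0, 0, 1], [-1, 1, -1]]
print(stackelberg(A, B))   # -> (0, 1): the leader plays L with secured cost 1
```

When every response set is a singleton, the same routine reduces to the simplified form (36), since `max` then ranges over one element.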
Remark 17  If R^2(γ^1) is a singleton for each γ^1 ∈ Γ^1, then there exists a mapping T^2 : Γ^1 → Γ^2 such that γ^2 ∈ R^2(γ^1) implies γ^2 = T^2γ^1. This corresponds to the case when the optimal response of the follower (which is T^2γ^1) is unique for every strategy of the leader, and it leads to the following simplified version of (35) in Def. 27:

J^1(γ^{1*}, T^2γ^{1*}) = min over γ^1 ∈ Γ^1 of J^1(γ^1, T^2γ^1) ≜ J^{1*}.   (36)

Here J^{1*} is no longer only a secured equilibrium cost level for the leader (P1), but it is the cost level that is actually attained. □

From the follower's point of view, the equilibrium strategy in a Stackelberg game is any optimal response to the announced Stackelberg strategy of the leader. More precisely:

DEFINITION 28  Let γ^{1*} ∈ Γ^1 be a Stackelberg strategy for the leader (P1). Then, any element γ^{2*} ∈ R^2(γ^{1*}) is an optimal strategy for the follower (P2) that is in equilibrium with γ^{1*}. The pair {γ^{1*}, γ^{2*}} is a Stackelberg solution for the game with P1 as the leader, and the cost pair (J^1(γ^{1*}, γ^{2*}), J^2(γ^{1*}, γ^{2*})) is the corresponding Stackelberg equilibrium outcome. □
Remark 18  In the preceding definition, the cost level J^1(γ^{1*}, γ^{2*}) could in fact be lower than the Stackelberg cost J^{1*}, a feature which has already been observed within the context of the bimatrix game (33). However, if R^2(γ^{1*}) is a singleton, then these two cost levels have to coincide. □

For a given two-person finite game, let J^{1*} again denote the Stackelberg cost of the leader (P1), and J^1_N denote any Nash equilibrium cost for the same
player. We have already seen within the context of the bimatrix game (33) that J^{1*} is not necessarily lower than J^1_N, in particular, when the optimal response of the follower is not unique. The following proposition now provides one sufficient condition under which the leader never does worse in a "Stackelberg game" than in a "Nash game".

PROPOSITION 14  For a given two-person finite game, let J^{1*} and J^1_N be as defined before. If R^2(γ^1) is a singleton for each γ^1 ∈ Γ^1, then

J^{1*} ≤ J^1_N.

Proof. Under the hypothesis of the proposition, assume to the contrary that there exists a Nash equilibrium solution {γ^{1°} ∈ Γ^1, γ^{2°} ∈ Γ^2} whose corresponding cost to P1 is lower than J^{1*}, i.e.

J^{1*} > J^1(γ^{1°}, γ^{2°}).   (i)

Since R^2(γ^1) is a singleton, let T^2 : Γ^1 → Γ^2 be the unique mapping introduced in Remark 17. Then, clearly, γ^{2°} = T^2γ^{1°}, and if this is used in (i), together with the RHS of (36), we obtain

min over γ^1 ∈ Γ^1 of J^1(γ^1, T^2γ^1) = J^{1*} > J^1(γ^{1°}, T^2γ^{1°}),

which is a contradiction. □
Remark 19  One might be tempted to think that if a nonzero-sum game admits a unique Nash equilibrium solution and a unique Stackelberg strategy (γ^{1*}) for the leader, and further if R^2(γ^{1*}) is a singleton, then the inequality of Prop. 14 should still hold. This, however, is not true, as the following bimatrix game demonstrates:

(a 2 × 2 bimatrix game (A, B), with rows L, R for P1 and columns L, R for P2, and with the unique Nash equilibrium indicated on the matrices)

Here, there exists a unique Nash equilibrium solution, as indicated, and a unique Stackelberg strategy γ^{1*} = L for the leader (P1). Furthermore, the follower's optimal response to γ^{1*} = L is unique (which is γ^2 = L). However, 0 = J^{1*} > J^1_N = -1. This counterexample indicates that the sufficient condition of Prop. 14 cannot be relaxed any further in any satisfactory way. □
Since the Stackelberg equilibrium concept is nonsymmetric as to the roles of the players, there is no recursive procedure that can be developed (as in section 5) to determine the Stackelberg solution of dynamic games. The only possibility is the brute-force method, that is, to convert the original two-person finite dynamic game in extensive form into equivalent normal form (which is basically a bimatrix game), to exhaustively work out the possible outcomes of the game for each strategy choice of the leader, and to see which of those is the most favourable one for the leader. Since this is a standard application of Def. 27 and is no different from the approach adopted in solving the bimatrix games (32) and (33), we do not provide any further examples here. However, the reader is referred to section 8 for problems that involve derivation of Stackelberg equilibria in single-act and multi-act dynamic games.

The feedback (stagewise) Stackelberg solution

Within the context of multi-act dynamic games, the Stackelberg equilibrium concept is suitable for the class of decision problems in which the leader has the ability to announce his decisions at all of his possible information sets ahead of time, an attitude which naturally forces the leader to commit himself irrevocably to the actions dictated by these strategies. Hence, in a sense, the Stackelberg solution is of the prior commitment type from the leader's point of view. There are yet other types of hierarchical multi-act (multi-stage) decision problems, however, in which the leader does not have the ability to announce and enforce his strategy at all levels of play prior to the start of the game, but can instead enforce his strategy on the follower(s) at every level (stage) of the game. Such a hierarchical equilibrium solution, which has the "Stackelberg" property at every level of play (but not globally), is called a feedback Stackelberg solution.†

We now provide below a precise definition of this equilibrium solution within the context of two-person nonzero-sum feedback games in extensive form (cf. Def. 21). To this end, let us adopt the terminology and notation of section 5, and consider I to be a two-person nonzero-sum feedback game in extensive form with K levels of play and with P1 being the first-acting player at each level of play. Let a typical strategy γ^i ∈ Γ^i of Pi be decomposed as (γ^i_1 ∈ Γ^i_1, ..., γ^i_K ∈ Γ^i_K), where γ^i_j stands for the corresponding strategy of Pi at the jth level of play. Furthermore, for a given strategy γ^i ∈ Γ^i, let γ^{i,k} and γ_{i,k} denote two truncated versions of γ^i, defined as

γ^{i,k} = (γ^i_k, ..., γ^i_K)   (37a)

and

γ_{i,k} = (γ^i_1, ..., γ^i_{k-1}).   (37b)

† We shall henceforth refer to the Stackelberg equilibrium solution introduced through Defs 26-28 as the global Stackelberg equilibrium solution whenever it is not clear from the context which type of equilibrium solution we are working with.
Then, we have

DEFINITION 29  Using the preceding notation and terminology, a pair {β^1 ∈ Γ^1, β^2 ∈ Γ^2} constitutes a feedback Stackelberg solution for I with P1 as the leader if the following two conditions are fulfilled:

(i)  J^1(γ_{1,k}, β^{1,k}; γ_{2,k}, β^{2,k}) = min over γ^1_k ∈ Γ^1_k of max over γ^2_k ∈ R^2_k(β; γ_{1,k+1}) of J^1(γ_{1,k}, γ^1_k, β^{1,k+1}; γ_{2,k}, γ^2_k, β^{2,k+1})   (38a)

for all γ^i_j ∈ Γ^i_j, j = 1, ..., k-1; i = 1, 2; and with k = K, K-1, ..., 1, where R^2_k(β; γ_{1,k+1}) is the optimal response set of the follower at level k, defined by

R^2_k(β; γ_{1,k+1}) ≜ {γ̂^2_k ∈ Γ^2_k : J^2(γ_{1,k+1}, β^{1,k+1}; γ_{2,k}, γ̂^2_k, β^{2,k+1}) ≤ J^2(γ_{1,k+1}, β^{1,k+1}; γ_{2,k}, γ^2_k, β^{2,k+1}), ∀γ^2_k ∈ Γ^2_k};   (38b)

(ii)  R^2_k(β; β_{1,k+1}) is a singleton, with its only element being β^2_k, k = 1, ..., K.
The quantity J^1(β^1, β^2) is the corresponding feedback Stackelberg cost of the leader. □

Relations (38a) and (38b) now indicate that the feedback Stackelberg solution features the "Stackelberg" property at every level of play, and this can in fact be verified recursively, thus leading to a recursive construction of the equilibrium solution. Before we go into a discussion of this recursive procedure, let us first prove a result concerning the structures of R^2_k(·) and J^1(·).
PROPOSITION 15  For a two-person feedback game that admits a feedback Stackelberg solution with P1 as the leader,
(i) R^2_k(β; γ_{1,k+1}) does not depend on γ_{1,k}, and hence it can be written as R^2_k(β; γ^1_k);
(ii) J^1(γ_{1,k}, β^{1,k}; γ_{2,k}, β^{2,k}) depends on γ_{1,k} and γ_{2,k} only through the singleton information sets η^1_k of P1 at level k.
Proof. Both of these results follow from the "feedback" nature of the two-person dynamic game under consideration, which enables one to decompose the K-level game recursively into a number of single-act games, each of which corresponds to a singleton information set of P1 (the leader). In accordance with this observation, let us start with k = K, in which case the proposition is trivially true. Relation (38a) then defines the (global) Stackelberg solution of
the single-act game obtained for each information set η^1_K of P1, and the quantity on the RHS of (38a) defines, for k = K, the Stackelberg cost for the leader in each of these single-act games, which is actually realized because of condition (ii) of Def. 29 (see Remark 18). Then, the cost pair transferred to level k = K-1 is well defined (not ambiguous) for each information set η^1_K of P1. Now, with the Kth level crossed out, we repeat the same analysis and first observe that J^2(·) in (38b) depends only on the information set η^1_{K-1} of P1 and on the strategies γ^1_{K-1} and γ^2_{K-1} of P1 and P2, respectively, at level k = K-1, since we again have basically single-act games. Consequently, R^2_{K-1}(·) depends only on γ^1_{K-1}. If the cost level of P1 at k = K-1 is minimized subject to that constraint, β^1_{K-1} is by hypothesis a solution to that problem, to which a unique response of the follower corresponds. Hence, again a well-defined cost pair is determined for each game corresponding to the singleton information sets η^1_{K-1} of P1, and these are attached to these nodes to provide cost pairs for the remaining (K-2)-level game. An iteration on this analysis then proves the result. □

The foregoing proof already describes the recursive procedure involved to obtain a feedback Stackelberg solution of a two-person feedback game. The requirement (ii) in Def. 29 ensures that the outcomes of the Stackelberg games considered at each level are unambiguously determined, and it imposes an indirect restriction on the problem under consideration, which cannot be tested a priori. This then leads to the conclusion that a two-person nonzero-sum feedback game does not necessarily admit a feedback Stackelberg solution. There are also cases when a feedback game admits more than one feedback Stackelberg solution.
Such a situation might arise if, in the recursive derivation, one or more of the single-act Stackelberg games admit multiple Stackelberg strategies for the leader whose corresponding costs to the follower are not the same. The leader is, of course, indifferent to these multiple equilibria at the particular level where they arise, but this nonuniqueness might affect the (overall) feedback Stackelberg cost of the leader. Hence, if {β^1, β^2} and {ξ^1, ξ^2} are two feedback Stackelberg solutions of a given game, it could very well happen that J^1(β^1, β^2) < J^1(ξ^1, ξ^2), in which case the leader definitely prefers β^1 over ξ^1 and attempts to enforce the components of β^1 at each level of the game. This argument then leads to a total ordering among different feedback Stackelberg solutions of a given game, which we formalize below.

DEFINITION 30  A feedback Stackelberg solution {β^1, β^2} of a two-person nonzero-sum feedback game is admissible if there exists no other feedback Stackelberg solution {ξ^1, ξ^2} with the property J^1(ξ^1, ξ^2) < J^1(β^1, β^2). □
It should be noted that if a feedback game admits more than one feedback Stackelberg solution, then the leader can definitely enforce the admissible one on the follower, since he leads the decision process at each level. A unique feedback Stackelberg solution is clearly admissible. We now provide below an example of a 2-level two-person nonzero-sum feedback game which admits a unique feedback Stackelberg solution. This example also illustrates the recursive derivation of this equilibrium solution.

Example 13  Consider the multi-act feedback nonzero-sum game of Fig. 9. To determine its feedback Stackelberg solution with P1 acting as the leader, we first attempt to solve the Stackelberg games corresponding to each of the four information sets of P1 at level 2. The first one (counting from the left) is a trivial game, admitting the unique equilibrium solution

γ^{1*}_2(η^1_2) = L,   γ^{2*}_2(η^2_2) = { R if u^1_2 = R; L otherwise },   (i)

with the corresponding outcome being (1,0). The next two admit, respectively, the following bimatrix representations, whose unique Stackelberg equilibria are indicated on the matrices:
(the 2 × 2 bimatrix games (A, B) corresponding to the second and third information sets, labelled (ii) and (iii), with rows L, R for P1 and columns L, R for P2)
Finally, the fourth one admits the trivial bimatrix representation (iv), in which P1's choice is fixed at L, together with the 2 × 2 bimatrix representation (v), in which P1 chooses between M and R,
with the unique Stackelberg solutions being as indicated, in each case. By a comparison of the leader's cost levels at (iv) and (v), we conclude that he plays R at the fourth of his information sets at level 2. Hence, in going from the second to the first level, we obtain from (i), (ii), (iii) and (v) the unique cost pairs at each of P1's information sets to be (counting from the left): (1,0), (1,1), (0,-1/2) and (1,0). Then, the Stackelberg game at the first level has the bimatrix representation

          P2                       P2
        L     R                  L      R
A =  L  1     1          B =  L  0      1
     R  0     1               R  -1/2   0     (vi)

whose unique Stackelberg solution with P1 as the leader is {R, L}. To recapitulate, the leader's unique feedback Stackelberg strategy in this feedback game is

γ^{1*}_1(η^1_1) = R at level 1, together with the level-2 components dictated by (i), (ii), (iii) and (v),
and the follower's unique optimal response strategy is

γ^{2*}_1(η^2_1) = L,
γ^{2*}_2(η^2_2) = { R if u^1_1 = L and u^1_2 = L, or u^1_1 = u^2_1 = R and u^1_2 ≠ L; L otherwise }.

The unique feedback Stackelberg outcome of the game is (0, -1/2), which follows from (vi) by inspection. □

Before concluding our discussion on the feedback Stackelberg equilibrium solution, we should mention that the same concept has potential applicability to multi-act decision problems in which the roles of the players change from one level of play to another, that is, when different players act as leaders at different stages. An extension of Def. 29 is possible so that such classes of decision problems are also covered; but since this involves only an appropriate change of indices and an interpretation in the right framework, it will not be covered here.
Mixed and behavioural Stackelberg equilibria

The motivation behind introducing mixed strategies in the investigation of saddle-point equilibria (in Chapter 2) and Nash equilibria (in section 2) was that such equilibria do not always exist in pure strategies, whereas within the enlarged class of mixed strategies one can ensure existence of noncooperative equilibria. In the case of the Stackelberg solution of two-person finite games, however, an equilibrium always exists (cf. Thm 3), and thus, at the outset, there seems to be no need to introduce mixed strategies. Besides, since the leader dictates his strategy on the follower in a Stackelberg game, it might at first seem unreasonable to imagine that the leader would ever employ a mixed strategy. Such an argument, however, is not always valid, and there are cases when the leader can actually do better (in the average sense) with a proper mixed strategy than the best he can do within the class of pure strategies. As an illustration of such a possibility, consider the bimatrix game (A, B) displayed below:

          P2                       P2
        L     R                  L      R
A =  L  1     0          B =  L  1/2    1
     R  0     1               R  1     1/2     (39)
If P1 acts as the leader, then the game admits two pure-strategy Stackelberg equilibrium solutions, which are {L, L} and {R, R}, the Stackelberg outcome in each case being (1, 1/2). However, if the leader (P1) adopts the mixed strategy which is to pick L and R with equal probability 1/2, then the average cost incurred to P1 will be equal to 1/2, quite independent of the follower's (pure or mixed) strategy. This value J̄^1 = 1/2 is clearly lower than the leader's Stackelberg cost in pure strategies, and it can further be shown to be the unique Stackelberg cost of the leader in mixed strategies, since any deviation from (1/2, 1/2) for the leader results in higher values for J̄^1, by taking into account the optimal responses of the follower. The preceding result then establishes the significance of mixed strategies in the investigation of Stackelberg equilibria of two-person nonzero-sum games, and demonstrates the possibility that a proper mixed-strategy Stackelberg solution could lead to a lower cost level for the leader than the Stackelberg cost level in pure strategies. To introduce the concept of mixed-strategy Stackelberg equilibrium in mathematical terms, we assume the two-person nonzero-sum finite game to be in normal form (without any loss of generality) and associate with it a bimatrix game (A, B). Abiding by the notation and terminology of section 2, we let Y and Z denote the mixed-strategy spaces of P1 and P2, respectively, with their typical elements designated as y and z. Then, we have
DEFINITION 31  For a bimatrix game (A, B), the set

R̄^2(y) = {z° ∈ Z : y′Bz° ≤ y′Bz, ∀z ∈ Z}   (40)

is the optimal response (rational reaction) set of P2 in mixed strategies to the mixed strategy y ∈ Y of P1. □

DEFINITION 32  In a bimatrix game (A, B) with P1 acting as the leader, a mixed strategy y* ∈ Y is called a mixed Stackelberg equilibrium strategy for the leader if

max over z ∈ R̄^2(y*) of y*′Az = inf over y ∈ Y of max over z ∈ R̄^2(y) of y′Az ≜ J̄^{1*}.   (41)

The quantity J̄^{1*} is the Stackelberg cost of the leader in mixed strategies. □

It should be noted that the "maximum" in (41) always exists since, for each y ∈ Y, y′Az is continuous in z, and R̄^2(y) is a closed and bounded subset of Z (which is a finite-dimensional simplex). Hence, J̄^{1*} is a well-defined quantity. The "infimum" in (41), however, cannot always be replaced by a "minimum", unless the problem admits a mixed Stackelberg equilibrium strategy for the leader. The following example now demonstrates the possibility that a two-person finite game might not admit a mixed-strategy Stackelberg strategy even though J̄^{1*} < J^{1*}.
(Counter-)Example 14 Consider the following modified version of the bimatrix game of (39) (rows: P1's alternatives L and R; columns: P2's alternatives L and R):

   A =  L [ 1    0  ]   P1,      B =  L [ 1/2   1  ]   P1.
        R [ 0    1  ]                 R [  1   1/3 ]
            L    R                       L    R
               P2                           P2

With P1 as the leader, this bimatrix game also admits two pure-strategy Stackelberg equilibria, which are {L, L} and {R, R}, the Stackelberg cost for the leader being J¹* = 1. Now, let the leader adopt the mixed strategy y = (y₁, (1 − y₁))', under which J² is

   J²(y, z) = y'Bz = (2/3 − (7/6)y₁)z₁ + (2/3)y₁ + 1/3,

where z = (z₁, (1 − z₁))' denotes any mixed strategy of P2. Then, the mixed-strategy optimal response set of P2 can readily be determined as

   R̄²(y) = { {z = (1, 0)'},   if y₁ > 4/7
            { {z = (0, 1)'},   if y₁ < 4/7
            { Z,               if y₁ = 4/7.
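These computations can be checked numerically. The following sketch (a minimal grid search, taking A = [[1,0],[0,1]] and B = [[1/2,1],[1,1/3]] as reconstructed from the scan) computes the follower's mixed best-response set and the leader's realized cost, confirming that the infimum 3/7 is approached as y₁ increases to 4/7, while the only candidate y° = (4/7, 3/7)' yields the strictly larger value 4/7:

```python
import numpy as np

# Counter-Example 14: A is the leader's (P1's) cost matrix, B the follower's
# (P2's); both players minimize.  Entries as reconstructed from the scan.
A = np.array([[1.0, 0.0], [0.0, 1.0]])
B = np.array([[0.5, 1.0], [1.0, 1.0 / 3.0]])

def follower_best_mixed(y, tol=1e-9):
    """Vertices spanning the follower's mixed best-response set to the mix y:
    y'Bz is linear in z, so it is minimized at a vertex; on a tie the whole
    simplex Z is optimal and both vertices are returned."""
    costs = y @ B
    if abs(costs[0] - costs[1]) < tol:
        return [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
    z = np.zeros(2)
    z[int(np.argmin(costs))] = 1.0
    return [z]

def leader_cost(y):
    """Worst-case (sup) cost of the leader over the follower's response set."""
    return max(y @ A @ z for z in follower_best_mixed(y))

# Pure Stackelberg cost: the leader restricted to the vertices of Y.
J1_pure = min(leader_cost(np.array([1.0, 0.0])),
              leader_cost(np.array([0.0, 1.0])))

# Grid scan over y1: the infimum 3/7 is approached as y1 -> 4/7 from below,
# but at y1 = 4/7 the response set becomes all of Z and the sup jumps to 4/7.
grid = [leader_cost(np.array([y1, 1.0 - y1]))
        for y1 in np.linspace(0.0, 0.999, 2000)]
print(J1_pure, min(grid))
```

The printed minimum over the grid exceeds 3/7 only by roughly the grid spacing, illustrating that the infimum in (41) is not attained.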
Hence, for y₁ > 4/7, the follower chooses "column 1" with probability 1, and this leads to an average cost of J¹ = y₁ for P1. For y₁ < 4/7, on the other hand, P2 chooses "column 2" with probability 1, which leads to an average cost level of J¹ = (1 − y₁) for P1. Then, clearly, the leader will prefer to stay in this latter region; in fact, if he employs the mixed strategy y = (4/7 − ε, 3/7 + ε)', where ε > 0 is sufficiently small, his realized average cost will be J¹ = 3/7 + ε, since then P2 will respond with the unique pure strategy γ² = R. Since ε > 0 can be taken arbitrarily small, we arrive at the conclusion that J̄¹* = 3/7 < 1 = J¹*. In spite of this fact, the leader does not have a mixed Stackelberg strategy since, for the only candidate y° = (4/7, 3/7)', we have R̄²(y°) = Z, and therefore max_{z ∈ R̄²(y°)} y°'Az = 4/7, which is higher than J̄¹*. □

The preceding example thus substantiates the possibility that a mixed Stackelberg strategy might not exist for the leader, but that he can still do better than his pure Stackelberg cost J¹* by employing some suboptimal mixed strategy (such as y = (4/7 − ε, 3/7 + ε)' in Example 14, for sufficiently small ε > 0). In fact, whenever J̄¹* < J¹*, there will always exist such an approximating mixed strategy for the leader. If J̄¹* = J¹*, however, it is of course reasonable to employ the pure Stackelberg strategy, which always exists by Thm 3. The following proposition now verifies that J̄¹* < J¹* and J̄¹* = J¹* are the only two possible relations we can have between J̄¹* and J¹*; in other words, the inequality J̄¹* > J¹* never holds.

PROPOSITION 16 For every two-person finite game, we have

   J̄¹* ≤ J¹*.     (42)
Proof Let Y₀ denote the subset of Y consisting of all one-point distributions. Analogously, define Z₀ as comprised of the one-point distributions in Z. Note that Y₀ is equivalent to Γ¹, and Z₀ is equivalent to Γ². Then, for each y ∈ Y₀,

   min_{z ∈ Z} y'Bz = min_{z ∈ Z₀} y'Bz,

since any minimizing solution in Z can be replaced by an element of Z₀. This further implies that, for each y ∈ Y₀, elements of R̄²(y) are probability distributions on R²(y), where the latter set is defined by (40) with Z replaced by Z₀. Now, since Y₀ ⊂ Y,

   J̄¹* = inf_{y ∈ Y} max_{z ∈ R̄²(y)} y'Az ≤ min_{y ∈ Y₀} max_{z ∈ R̄²(y)} y'Az,

and further, because of the cited relation between R̄²(·) and R²(·), the latter quantity is equal to

   min_{y ∈ Y₀} max_{z ∈ R²(y)} y'Az,

which, by definition, is J¹*, since R²(y) is equivalent to the pure-strategy optimal response set of the follower, as defined by (34). Hence, J̄¹* ≤ J¹*. □

Computation of a mixed-strategy Stackelberg equilibrium (whenever it exists) is not as straightforward as in the case of pure-strategy equilibria, since the spaces Y and Z are not finite. The standard technique is first to determine the minimizing solution(s) of

   min_{z ∈ Z} y'Bz
as functions of y ∈ Y. This will lead to a decomposition of Y into subsets (regions), on each of which a reaction set for the follower is defined. (Note that, in the analysis of Example 14, Y has been decomposed into three regions.) Then, one has to minimize y'Az over y ∈ Y, subject to the constraints imposed by these reaction sets, and under the stipulation that the same quantity is maximized on these reaction sets whenever they are not singletons. This brute-force approach also provides approximating strategies for the leader, whenever a mixed Stackelberg solution does not exist, together with the value of J̄¹*.

If the two-person finite game under consideration is a dynamic game in extensive form, then it is more reasonable to restrict attention to behavioural strategies (cf. Def. 2.9). Stackelberg equilibrium within the class of behavioural strategies can be introduced as in Defs 31 and 32, by replacing the mixed-strategy sets with the behavioural-strategy sets. Hence, using the terminology and notation of Def. 2.9, we have the following counterparts of Defs 31 and 32 in behavioural strategies:

DEFINITION 33 Given a two-person finite dynamic game with behavioural-strategy sets (Γ̂¹, Γ̂²) and average cost functions (J̄¹, J̄²), the set

   R̂²(γ¹) = {γ²° ∈ Γ̂² : J̄²(γ¹, γ²°) ≤ J̄²(γ¹, γ²), ∀γ² ∈ Γ̂²}     (43)

is the optimal response (rational reaction) set of P2 in behavioural strategies to the behavioural strategy γ¹ ∈ Γ̂¹ of P1. □

DEFINITION 34 In a two-person finite dynamic game with P1 acting as the leader, a behavioural strategy γ¹* ∈ Γ̂¹ is called a behavioural Stackelberg equilibrium strategy for the leader if

   sup_{γ² ∈ R̂²(γ¹*)} J̄¹(γ¹*, γ²) = inf_{γ¹ ∈ Γ̂¹} sup_{γ² ∈ R̂²(γ¹)} J̄¹(γ¹, γ²) ≜ Ĵ¹*.     (44)

The quantity Ĵ¹* is the Stackelberg cost of the leader in behavioural strategies. □

Remark 20 The reason why we use the "supremum" in (44), instead of the "maximum", is that the set R̂²(γ¹) is not necessarily compact. The behavioural Stackelberg cost for the leader is again a well-defined quantity, but a behavioural Stackelberg strategy for the leader does not necessarily exist. □
We now conclude our discussion on the notion of behavioural Stackelberg equilibrium by presenting an example of a single-act dynamic game that admits a Stackelberg solution in behavioural strategies.

Example 15 Consider the two-person single-act game whose extensive form is depicted in Fig. 10. With P2 acting as the leader, it admits two pure-strategy Stackelberg solutions, both of the form

   γ²*(η²) = { L,      if u¹ = R          γ¹*(η¹) = R,
             { L or R, otherwise,

and with the unique Stackelberg outcome in pure strategies being (0, 2/3). Now, let us allow the leader to use behavioural strategies. The bimatrix game at the first (counting from the left) of the leader's information sets is (rows: P1's alternatives L and M; columns: P2's alternatives L and R)

   A =  L [ 2  1 ]   P1,      B =  L [ 0  1 ]   P1,
        M [ 1  2 ]                 M [ 1  0 ]
            L  R                       L  R
              P2                         P2
Fig. 10. The two-person single-act game of Example 15. [In the tree: P1 moves first, choosing L, M or R; P2, the leader, has one information set containing the nodes after u¹ = L and u¹ = M, and a singleton information set after u¹ = R. Leaf cost pairs (J¹, J²): (L, L) → (2, 0); (L, R) → (1, 1); (M, L) → (1, 1); (M, R) → (2, 0); (R, L) → (0, 2/3); (R, R) → (2, ·).]
for which J̄²' = 1/2. The mixed Stackelberg strategy exists for the leader, given as

   γ² = { L w.p. 1/2
        { R w.p. 1/2,

and any mixed strategy is an optimal response strategy for P1 (the follower), yielding him an average cost level of J¹ = 3/2. We thus observe that if the leader can force the follower to play L or M, his average cost can be pushed down to 1/2, which is lower than J²* = 2/3, and the follower's cost is then pushed up to 3/2, which is higher than the cost level of J¹ = 0 that would have been attained if P2 were allowed to use only pure strategies. There indeed exists such a (behavioural) strategy for P2, which is given as

   γ²*(η²) = { L w.p. 1/2, R w.p. 1/2,  if u¹ = L or M        (i)
             { R w.p. 1,                if u¹ = R,

and the behavioural Stackelberg cost for the leader is

   Ĵ²* = 1/2.                                                 (ii)

It should be noted that the follower (P1) does not dare to play u¹ = R since, by announcing (i) ahead of time, the leader threatens him with an undesirable cost level of J¹ = 2. We now leave it to the reader to verify that the behavioural strategy (i) is indeed a Stackelberg strategy for the leader and that the quantity (ii) is his Stackelberg cost in behavioural strategies. This verification can be accomplished by converting the original game in extensive form into equivalent normal form and then applying Def. 34. An alternative method is to show that (ii) is the mixed-strategy Stackelberg cost for the leader; since (i) attains that cost level, it is also a behavioural Stackelberg strategy, by the result of Problem 21 (section 8). □
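The verification suggested above can also be carried out numerically. The sketch below enumerates the leader's four pure strategies and grid-searches over behavioural strategies (q and r denoting the probabilities of playing L at the first and second information sets, respectively). The leaf costs are the ones read off Fig. 10; the P2-cost d at the leaf (R, R) is not legible in our copy, so it is treated as an assumed parameter (d = 2 below), and the pure-strategy computation does not involve d at all:

```python
import numpy as np

def solve(d):
    # Leaf costs (J1, J2) of the game of Fig. 10; d is the assumed, not
    # reliably legible, P2-cost at the leaf (R, R).
    cost = {('L', 'L'): (2, 0), ('L', 'R'): (1, 1),
            ('M', 'L'): (1, 1), ('M', 'R'): (2, 0),
            ('R', 'L'): (0, 2/3), ('R', 'R'): (2, d)}

    # Pure Stackelberg (P2 leads): P2 commits to an action f on the info set
    # {u1 = L, u1 = M} and an action s on the singleton set after u1 = R.
    def p2_cost(f, s):
        resp = {'L': f, 'M': f, 'R': s}
        u1 = min('LMR', key=lambda a: cost[a, resp[a]][0])  # P1 minimizes J1
        return cost[u1, resp[u1]][1]
    pure = min(p2_cost(f, s) for f in 'LR' for s in 'LR')

    # Behavioural strategies: for each (q, r) compute P1's expected costs and
    # charge the leader the worst (sup) cost over P1's optimal responses,
    # as in Def. 34.
    best = np.inf
    for q in np.linspace(0.0, 1.0, 201):
        for r in np.linspace(0.0, 1.0, 201):
            j1 = {'L': 2*q + (1-q), 'M': q + 2*(1-q), 'R': 2*(1-r)}
            j2 = {'L': (1-q), 'M': q, 'R': (2/3)*r + d*(1-r)}
            m = min(j1.values())
            best = min(best, max(j2[a] for a in 'LMR' if j1[a] < m + 1e-9))
    return pure, best

pure_cost, behavioural_cost = solve(2.0)
print(pure_cost, behavioural_cost)
```

The search returns 2/3 and 1/2; the minimizing behavioural pair is q = 1/2 with r small enough that playing R stays unattractive to P1, which is exactly the threat strategy (i).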
Many-player games

Heretofore, we have confined our investigation of the Stackelberg concept in nonzero-sum games to two-person games wherein one of the players is the leader and the other one is the follower. Extension of this investigation to N-player games can be accomplished in a number of ways; here we discuss the possible extensions for the case N = 3, the further generalizations to N > 3 being conceptually straightforward. For three-person nonzero-sum games, we basically have three different possible modes of play among the players within the framework of noncooperative decision making.
(1) There are two levels of hierarchy in decision making: one leader and two followers. The followers react to the leader's announced strategy by playing according to a specific equilibrium concept among themselves (for instance, Nash).
(2) There are still two levels of hierarchy in decision making: now two leaders and one follower. The leaders play according to a specific equilibrium concept among themselves, by taking into account the possible optimal responses of the follower.
(3) There are three levels of hierarchy. First P1 announces his strategy, then P2 determines his strategy by also taking into account possible responses of P3 and enforces this strategy on him, and finally P3 optimizes his cost function in view of the announced strategies of P1 and P2.

Let us now provide precise definitions for the three types of hierarchical equilibria outlined above. To this end, let Jⁱ(γ¹, γ², γ³) denote the cost function of Pi corresponding to the strategy triple {γ¹ ∈ Γ¹, γ² ∈ Γ², γ³ ∈ Γ³}. Then, we have

DEFINITION 35 For the 3-person finite game with one leader (P1) and two followers (P2 and P3), γ¹* ∈ Γ¹ is a hierarchical equilibrium strategy for the leader if

   max_{(γ²,γ³) ∈ R^F(γ¹*)} J¹(γ¹*, γ², γ³) = min_{γ¹ ∈ Γ¹} max_{(γ²,γ³) ∈ R^F(γ¹)} J¹(γ¹, γ², γ³)     (45a)

where R^F(γ¹) is the optimal response set of the followers' group and is defined for each γ¹ ∈ Γ¹ by

   R^F(γ¹) = {(ξ², ξ³) ∈ Γ² × Γ³ : J²(γ¹, ξ², ξ³) ≤ J²(γ¹, γ², ξ³) and
              J³(γ¹, ξ², ξ³) ≤ J³(γ¹, ξ², γ³), ∀γ² ∈ Γ², γ³ ∈ Γ³}.     (45b)

Any (γ²*, γ³*) ∈ R^F(γ¹*) is a corresponding optimal strategy pair for the followers' group. □

DEFINITION 36 For the 3-person finite game with two leaders (P1 and P2) and one follower (P3), γⁱ* ∈ Γⁱ is a hierarchical equilibrium strategy for Pi (i = 1, 2) if

   max_{γ³ ∈ R³(γ¹*; γ²*)} J¹(γ¹*, γ²*, γ³) = min_{γ¹ ∈ Γ¹} max_{γ³ ∈ R³(γ¹; γ²*)} J¹(γ¹, γ²*, γ³)     (46a)

   max_{γ³ ∈ R³(γ¹*; γ²*)} J²(γ¹*, γ²*, γ³) = min_{γ² ∈ Γ²} max_{γ³ ∈ R³(γ¹*; γ²)} J²(γ¹*, γ², γ³)     (46b)

where R³(γ¹; γ²) is the optimal response set of the follower and is defined for each (γ¹, γ²) ∈ Γ¹ × Γ² by

   R³(γ¹; γ²) = {ξ ∈ Γ³ : J³(γ¹, γ², ξ) ≤ J³(γ¹, γ², γ³), ∀γ³ ∈ Γ³}.     (46c)

Any strategy γ³* ∈ R³(γ¹*; γ²*) is a corresponding optimal strategy for the follower (P3). □

DEFINITION 37 For the 3-person finite game with three levels of hierarchy in decision making (with P1 enforcing his strategy on P2 and P3, and P2 enforcing his strategy on P3), γ¹* ∈ Γ¹ is a hierarchical equilibrium strategy for P1 if
   max_{γ² ∈ S²(γ¹*)} max_{γ³ ∈ S³(γ¹*; γ²)} J¹(γ¹*, γ², γ³)
        = min_{γ¹ ∈ Γ¹} max_{γ² ∈ S²(γ¹)} max_{γ³ ∈ S³(γ¹; γ²)} J¹(γ¹, γ², γ³)     (47a)

where

   S²(γ¹) ≜ {ξ ∈ Γ² : max_{γ³ ∈ S³(γ¹; ξ)} J²(γ¹, ξ, γ³) ≤ max_{γ³ ∈ S³(γ¹; γ²)} J²(γ¹, γ², γ³), ∀γ² ∈ Γ²}     (47b)

   S³(γ¹; γ²) ≜ {ξ ∈ Γ³ : J³(γ¹, γ², ξ) ≤ J³(γ¹, γ², γ³), ∀γ³ ∈ Γ³}.     (47c)
Any element γ²* ∈ S²(γ¹*) is a corresponding optimal strategy for P2, and any γ³* ∈ S³(γ¹*; γ²*) is an optimal strategy for P3 corresponding to the pair (γ¹*, γ²*). □

Several simplifications in these definitions are possible if the optimal response sets are singletons. In the case of Def. 35, for example, we would have:

PROPOSITION 17 In the 3-person finite game of Def. 35, if R^F(γ¹) is a singleton for each γ¹ ∈ Γ¹, then there exist unique mappings T²: Γ¹ → Γ² and T³: Γ¹ → Γ³ such that

   J¹(γ¹*, T²γ¹*, T³γ¹*) = min_{γ¹ ∈ Γ¹} J¹(γ¹, T²γ¹, T³γ¹)     (48a)

and, for each γ¹ ∈ Γ¹,

   J²(γ¹, T²γ¹, T³γ¹) ≤ J²(γ¹, γ², T³γ¹), ∀γ² ∈ Γ²     (48b)

   J³(γ¹, T²γ¹, T³γ¹) ≤ J³(γ¹, T²γ¹, γ³), ∀γ³ ∈ Γ³.     (48c)

Analogous results can be obtained in the cases of Defs 36 and 37, if the optimal response sets are singletons. The following example now illustrates all these cases with singleton optimal response sets.

Example 16 Using the notation of section 3, consider a 3-person game with possible outcomes given as
   (a¹₁,₁,₁, a²₁,₁,₁, a³₁,₁,₁) = (1, 0, 0)        (a¹₁,₁,₂, a²₁,₁,₂, a³₁,₁,₂) = (1, −1, 1)
   (a¹₁,₂,₁, a²₁,₂,₁, a³₁,₂,₁) = (2, 1, −1)       (a¹₁,₂,₂, a²₁,₂,₂, a³₁,₂,₂) = (0, 1, 0)
   (a¹₂,₁,₁, a²₂,₁,₁, a³₂,₁,₁) = (0, 0, 1)        (a¹₂,₁,₂, a²₂,₁,₂, a³₂,₁,₂) = (−1, 2, 0)
   (a¹₂,₂,₁, a²₂,₂,₁, a³₂,₂,₁) = (2, −1, 1)       (a¹₂,₂,₂, a²₂,₂,₂, a³₂,₂,₂) = (0, 1, 2)
and determine its hierarchical equilibria (of the three different types discussed above).

1. One leader (P1) and two followers (P2 and P3). For the first alternative of P1 (i.e. n₁ = 1), the game faced by P2 and P3 is (entries are the full cost triples (a¹, a², a³); rows: n₃ = 1, 2; columns: n₂ = 1, 2)

      (1, 0, 0)     (2, 1, −1)
      (1, −1, 1)    (0, 1, 0)

which clearly admits the unique Nash equilibrium solution {n₂* = 1, n₃* = 1}, whose cost to the leader is J¹ = 1. For n₁ = 2, the corresponding game is

      (0, 0, 1)     (2, −1, 1)
      (−1, 2, 0)    (0, 1, 2)

with a unique Nash equilibrium solution {n₂* = 2, n₃* = 1}, whose cost to P1 is J¹ = 2. This determines the mappings T² and T³ of Prop. 17 uniquely. Now, since 1 < 2, the leader has a unique hierarchical equilibrium strategy that satisfies (48a), which is γ¹* = "alternative 1", and the corresponding unique optimal strategies for P2 and P3 are γ²* = "alternative 1" and γ³* = "alternative 1", respectively. The unique equilibrium outcome is (1, 0, 0).
2. Two leaders (P1 and P2) and one follower (P3). There are four pairs of possible strategies for the leaders' group, which are listed below together with the corresponding unique optimal responses of the follower and the cost pairs (J¹, J²) incurred to P1 and P2:

   {n₁ = 1, n₂ = 1}:  n₃* = 1  →  (1, 0)
   {n₁ = 1, n₂ = 2}:  n₃* = 1  →  (2, 1)
   {n₁ = 2, n₂ = 1}:  n₃* = 2  →  (−1, 2)
   {n₁ = 2, n₂ = 2}:  n₃* = 1  →  (2, −1).

The two leaders are thus faced with the bimatrix game (rows: n₁ = 1, 2; columns: n₂ = 1, 2)

   (1, 0)     (2, 1)
   (−1, 2)    (2, −1)

whose Nash solution is uniquely determined as {n₁* = 2, n₂* = 2}, as indicated. Hence, the hierarchical equilibrium strategies of the leaders are unique: γ¹* = "alternative 2", γ²* = "alternative 2". The corresponding unique optimal strategy of P3 is γ³* = "alternative 1", and the unique equilibrium outcome is (2, −1, 1).

3. Three levels of hierarchy (first P1, then P2, and finally P3). We first observe that S²(γ¹) is a singleton and is uniquely determined by the mapping p²: Γ¹ → Γ² defined by

   p²(n₁) = { 1, if n₁ = 1
            { 2, if n₁ = 2.

Hence, if P1 chooses n₁ = 1 his cost will be J¹ = 1, and if he chooses n₁ = 2 his cost will be J¹ = 2. Since 1 < 2, this leads to the conclusion that P1's unique hierarchical equilibrium strategy is γ¹* = "alternative 1", and the corresponding unique optimal strategies of P2 and P3 are γ²* = "alternative 1" and γ³* = "alternative 1", respectively. The unique equilibrium outcome is (1, 0, 0). □
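The three hierarchical equilibria of Example 16 can be recomputed mechanically from the outcome table. The sketch below implements Defs 35-37 directly for this 2 × 2 × 2 game, using the cost triples as reconstructed from the scan:

```python
from itertools import product

# Cost triples (J1, J2, J3) indexed by the action triple (n1, n2, n3).
a = {(1, 1, 1): (1, 0, 0),  (1, 2, 1): (2, 1, -1),
     (2, 1, 1): (0, 0, 1),  (2, 2, 1): (2, -1, 1),
     (1, 1, 2): (1, -1, 1), (1, 2, 2): (0, 1, 0),
     (2, 1, 2): (-1, 2, 0), (2, 2, 2): (0, 1, 2)}

def nash_followers(n1):
    """Nash equilibria (n2, n3) of the followers' game for fixed n1 (Def. 35)."""
    return [(n2, n3) for n2, n3 in product((1, 2), repeat=2)
            if all(a[n1, n2, n3][1] <= a[n1, m2, n3][1] for m2 in (1, 2))
            and all(a[n1, n2, n3][2] <= a[n1, n2, m3][2] for m3 in (1, 2))]

# (1) One leader (P1); the two followers play Nash between themselves.
best1 = min((a[(n1,) + nash_followers(n1)[0]][0], n1) for n1 in (1, 2))

# (2) Two leaders (P1, P2) playing Nash against each other, anticipating the
# follower's unique optimal response (Def. 36).
def p3(n1, n2):
    return min((1, 2), key=lambda n3: a[n1, n2, n3][2])

leader_eqs = [(n1, n2) for n1, n2 in product((1, 2), repeat=2)
              if a[n1, n2, p3(n1, n2)][0] <= a[3 - n1, n2, p3(3 - n1, n2)][0]
              and a[n1, n2, p3(n1, n2)][1] <= a[n1, 3 - n2, p3(n1, 3 - n2)][1]]

# (3) Three levels of hierarchy: P2 minimizes anticipating P3, and P1
# anticipates both (Def. 37, singleton response sets).
def p2(n1):
    return min((1, 2), key=lambda n2: a[n1, n2, p3(n1, n2)][1])

n1_star = min((1, 2), key=lambda n1: a[n1, p2(n1), p3(n1, p2(n1))][0])
print(best1, leader_eqs, (n1_star, p2(n1_star), p3(n1_star, p2(n1_star))))
```

The output reproduces the outcomes (1, 0, 0), (2, −1, 1) and (1, 0, 0) obtained above for the three modes of play.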
3.7 NONZERO-SUM GAMES WITH CHANCE MOVES

In this section, we briefly discuss an extension of the general analysis of the previous sections to the class of nonzero-sum games which incorporate chance moves, that is, games wherein the final outcome is determined not only by the
actions of the players, but also by the outcome of a chance mechanism. A special class of such decision problems was already covered in section 2.6 within the context of zero-sum games, and the present analysis in a way builds on the material of section 2.6 and extends that analysis to nonzero-sum games under the Nash and Stackelberg solution concepts. To this end, let us first provide a precise definition of an extensive form description of a nonzero-sum finite game with chance moves.

DEFINITION 38 The extensive form of an N-person nonzero-sum finite game with chance moves is a tree structure with
(i) a specific vertex indicating the starting point of the game,
(ii) N cost functions, each one assigning a real number to each terminal vertex of the tree, where the ith cost function determines the loss incurred to Pi for each possible set of actions of the players together with the possible choices of nature,
(iii) a partition of the nodes of the tree into N + 1 player sets (to be denoted by Nⁱ for Pi, i ∈ N, and by N⁰ for nature),
(iv) a probability distribution, defined at each node of N⁰, among the immediate branches (alternatives) emanating from that node,
(v) a sub-partition of each player set Nⁱ into information sets {ηⁱⱼ}, such that the same number of immediate branches emanate from every node belonging to the same information set, and no node follows another node in the same information set. □

For such a game, let us again denote the (pure) strategy space of Pi by Γⁱ, with a typical element designated as γⁱ. For a given N-tuple of strategies {γʲ ∈ Γʲ; j ∈ N}, the cost function Jⁱ(γ¹, …, γᴺ) of Pi is in general a random quantity, and hence the real quantity of interest is the one obtained by taking the average value of Jⁱ over all relevant actions of nature, under the a priori known probability distribution. Let us denote that quantity by Lⁱ(γ¹, …, γᴺ), which in fact provides us with a normal form description of the original nonzero-sum game with chance moves in extensive form. Apart from a difference in notation, there is no real difference between this description and the normal form description studied heretofore in this chapter, and this observation readily leads to the conclusion that all the concepts introduced and general results obtained in the previous sections on the normal form description of nonzero-sum games are equally valid in the present context. In particular, the concept of Nash equilibrium for N-person nonzero-sum finite games with chance moves can be introduced through Def. 12 by merely replacing Jⁱ with Lⁱ (i = 1, …, N). Analogously, the Stackelberg equilibrium solution in 2-person nonzero-sum finite games with chance moves can be introduced by replacing Jⁱ with Lⁱ (i = 1, 2) in Defs 26-29. Rather than list
all these apparent analogies, we shall consider, in the remaining part of this section, two examples which illustrate the derivation of Nash and Stackelberg equilibria in static and dynamic games that incorporate chance moves.

Example 17 Consider the two-person static nonzero-sum game depicted in Fig. 11. If nature picks the left branch, then the bimatrix game faced by the players is (rows: P1's alternatives L and R; columns: P2's; entries (J¹, J²))

        L          R
   L  (4, 0)    (0, 4)
   R  (0, 0)    (−4, 4)        (i)

whereas if nature picks the right branch, the relevant bimatrix game is

        L          R
   L  (8, −4)   (4, −8)
   R  (4, 4)    (0, −4)        (ii)
Since the probabilities with which nature chooses the left and right branches are 1/4 and 3/4, respectively, the final bimatrix game of average costs is

        L          R
   L  (7, −3)   (3, −5)
   R  (3, 3)    (−1, −2)       (iii)

Fig. 11. A single-act static nonzero-sum game that incorporates a chance move.
This game admits a unique Nash equilibrium solution in pure strategies, which is γ¹* = R, γ²* = R, with an average outcome of (−1, −2). To determine the Stackelberg solution with P1 as the leader, we again focus our attention on the bimatrix game (iii) and observe that the Stackelberg solution is in fact the same as the unique Nash solution determined above. □
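The reduction to the average-cost bimatrix game (iii), and the equilibria computed from it, can be verified with a few lines of code (a minimal sketch over the reconstructed entries of (i) and (ii)):

```python
import numpy as np

# Leaf bimatrix games (entries (J1, J2); rows: P1 in {L, R}; columns: P2 in
# {L, R}) and nature's branch probabilities, as reconstructed from the scan.
left = np.array([[(4, 0), (0, 4)], [(0, 0), (-4, 4)]], dtype=float)
right = np.array([[(8, -4), (4, -8)], [(4, 4), (0, -4)]], dtype=float)
avg = 0.25 * left + 0.75 * right          # the bimatrix game (iii)

def pure_nash(G):
    """Pure Nash equilibria of a 2x2 bimatrix game G[i, j] = (J1, J2)."""
    return [(i, j) for i in range(2) for j in range(2)
            if G[i, j, 0] <= G[1 - i, j, 0] and G[i, j, 1] <= G[i, 1 - j, 1]]

def stackelberg(G):
    """P1 announces a row; the follower minimizes his own column cost, and the
    leader is charged the worst cost over the follower's optimal columns."""
    def worst(i):
        return max(G[i, j, 0] for j in range(2) if G[i, j, 1] == G[i, :, 1].min())
    i = min(range(2), key=worst)
    j = min(range(2), key=lambda j: G[i, j, 1])
    return i, j

print(avg[1, 1], pure_nash(avg), stackelberg(avg))
```

Both routines return the cell (R, R), i.e. index (1, 1), with average outcome (−1, −2), as asserted in the text.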
Example 18 Let x₀ be a discrete random variable taking the values

   x₀ = { −1 w.p. 1/4
        {  0 w.p. 1/2
        {  1 w.p. 1/4.

We view x₀ as the random initial state of some system under consideration. P1, whose decision variable (denoted by u¹) assumes the values −1 or +1, acts first, and as a result the system is transferred to another state (x₁) according to the equation x₁ = x₀ + u¹. Then P2, whose decision variable u² can take the same two values (−1 and +1), acts and transfers the system to some other state (x₂) according to

   x₂ = x₁ + u².

It is further assumed that, in this process, P1 does not make any observation (i.e. he does not know what the real outcome of the chance mechanism (value of x₀) is), whereas P2 observes sgn(x₁), where

   sgn(x) = { −1, x < 0
            {  0, x = 0
            { +1, x > 0.

This observation provides P2 with partial information concerning the value of x₀ and the action of P1. The cost structure of the problem is such that Pi wishes to minimize Jⁱ, i = 1, 2, where

   J¹ = mod(x₂) + u¹,     J² = mod(x₂) − u²,

with

   mod(x) = { −1, if x = −1, −2
            {  0, if x = −3, 0, 3
            { +1, if x = 1, 2.
We now seek to obtain Nash equilibria, and also Stackelberg equilibria, of this single-act game with either player as the leader. To this end, let us first note that this is a finite nonzero-sum game which incorporates a chance move, but it has not been cast in the standard framework. However, by spelling out all possible values of J¹ and J², and the actions of the players leading to those values, an extensive tree formulation can be obtained for the game, which is the one depicted in Fig. 12. This extensive form also clearly displays the information sets and thereby the number of possible strategies for each player. Since P1 gains no information, he has two possible strategies:

   γ¹₁ = −1,     γ¹₂ = +1.

P2, on the other hand, has eight possible strategies (his three information sets η²₁, η²₂ and η²₃ correspond to the observations sgn(x₁) = −1, 0 and +1, respectively), which can be listed as

   γ²₁(η²) = −1,                                γ²₂(η²) = +1,
   γ²₃(η²) = { +1, if η² = η²₁                  γ²₆(η²) = { −1, if η² = η²₁
             { −1, otherwise,                             { +1, otherwise,
   γ²₄(η²) = { +1, if η² = η²₂                  γ²₇(η²) = { −1, if η² = η²₂
             { −1, otherwise,                             { +1, otherwise,
   γ²₅(η²) = { +1, if η² = η²₃                  γ²₈(η²) = { −1, if η² = η²₃
             { −1, otherwise,                             { +1, otherwise.
Then, the equivalent bimatrix representation in terms of average costs is (rows: γ¹₁ = −1 (top) and γ¹₂ = +1; columns: γ²₁, …, γ²₈)

         γ²₁    γ²₂    γ²₃    γ²₄    γ²₅    γ²₆    γ²₇    γ²₈
   A = [ −7/4   −1    −3/2   −5/4   −7/4   −5/4   −3/2   −1  ]   P1,
       [   1    7/4     1     3/2    5/4    7/4    5/4   3/2 ]
Fig. 12. Extensive tree formulation for the single-act two-person game of Example 18.
         γ²₁    γ²₂    γ²₃    γ²₄    γ²₅    γ²₆    γ²₇    γ²₈
   B = [  1/4   −1     −1    1/4    1/4    1/4    −1     −1  ]   P2.
       [   1   −1/4     1     1    −1/4   −1/4   −1/4     1  ]
Since the first row of matrix A strictly dominates the second row, P1 clearly has a unique (permanent) optimal strategy γ¹* = γ¹₁ = −1, regardless of whether it is a Nash or a Stackelberg game. P2 has four strategies that are in Nash or Stackelberg equilibrium with γ¹*, which are γ²₂, γ²₃, γ²₇ and γ²₈. Hence, we have four sets of Nash equilibrium strategies, with the equilibrium outcome being either (−1, −1) or (−3/2, −1). If P1 is the leader, his Stackelberg strategy is γ¹* = −1 and the Stackelberg average cost (as defined in Def. 27) is L¹* = −1. He may, however, also end up with the more favourable average cost level of L¹ = −3/2, depending on the actual play of P2. If P2 is the leader, he has four Stackelberg strategies (γ²₂, γ²₃, γ²₇ and γ²₈), with his unique Stackelberg average cost being L²* = −1. To summarize the results in terms of the terminology of the original game problem, regardless of the specific mode of play (i.e. whether it is Nash or Stackelberg), P1 always chooses u¹ = −1, and P2 chooses u² = +1 if x₁ < 0, and is indifferent between his two alternatives (−1 and +1) if x₁ ≥ 0. □
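The entire analysis of this example (the 2 × 8 average-cost bimatrix game, the dominance of u¹ = −1, and the four equilibria with their outcomes) can be reproduced by brute-force enumeration. The sketch below assumes the cost functions J¹ = mod(x₂) + u¹ and J² = mod(x₂) − u², with the sign pattern of mod as recovered from the damaged scan:

```python
from itertools import product
from fractions import Fraction as F

# Primitives of Example 18; the signs in mod and J2 are reconstructed
# readings of the scan and therefore an assumption of this sketch.
p_x0 = {-1: F(1, 4), 0: F(1, 2), 1: F(1, 4)}
mod = {-3: 0, -2: -1, -1: -1, 0: 0, 1: 1, 2: 1, 3: 0}
sgn = lambda x: (x > 0) - (x < 0)

# P2's eight pure strategies map the observation sgn(x1) in {-1, 0, +1} to u2.
p2_strategies = [dict(zip((-1, 0, 1), acts))
                 for acts in product((-1, 1), repeat=3)]

def avg_costs(u1, g2):
    """Average costs (L1, L2) of the strategy pair (u1, g2)."""
    L1 = L2 = F(0)
    for x0, p in p_x0.items():
        x1 = x0 + u1
        u2 = g2[sgn(x1)]
        x2 = x1 + u2
        L1 += p * (mod[x2] + u1)          # J1 = mod(x2) + u1
        L2 += p * (mod[x2] - u2)          # J2 = mod(x2) - u2
    return L1, L2

# Pure Nash equilibria of the 2 x 8 average-cost bimatrix game.
nash = [(u1, i) for u1 in (-1, 1) for i, g2 in enumerate(p2_strategies)
        if avg_costs(u1, g2)[0] <= avg_costs(-u1, g2)[0]
        and avg_costs(u1, g2)[1] <= min(avg_costs(u1, h)[1]
                                        for h in p2_strategies)]
outcomes = {avg_costs(u1, p2_strategies[i]) for u1, i in nash}
print(len(nash), sorted(outcomes))
```

The enumeration yields four equilibria, all with u¹ = −1, and exactly the two equilibrium outcomes (−1, −1) and (−3/2, −1) stated above.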
3.8 PROBLEMS 1. Determine the Nash equilibrium solutions of the following bimatrix game, and investigate the admissibility properties of these equilibria: P2
A=
P2
8
4
8
4
0
0
6
1
6
0
0
4
12
0
0
0
4
0
0
0
12
0
4
0
0
4
0
0
1
0
PI
B=
PI.
2. Verify that minimax values of a bimatrix game are not lower (in an ordered way) than the pair of values of any Nash equilibrium outcome. Now, construct a bimatrix game wherein minimax strategies of the players are also their Nash equilibrium strategies, but minimax values are not the same as the ordered values of the Nash equilibrium outcome.
3. Obtain the mixed Nash equilibrium solution of the following bimatrix game: P2
A~110 What is the mixed Nash equilibrium solution of the bimatrix game (A, B)?
4. Consider the game of the "battle of sexes", but with the matrices
A~I~
I; I B~I~
I~
I
where α is a nonnegative number. Find the value(s) of α for which the unique inner mixed Nash equilibrium solution dictates the couple to spend the evening together (i) with highest probability, (ii) with lowest probability.
*5. Let (A, B) be an (m × n) bimatrix game admitting a unique Nash equilibrium solution in mixed strategies, say (y*, z*). Prove that the numbers of nonzero entries of y* and z* are equal. How would this statement have to be altered if the number of mixed-strategy equilibrium solutions is more than one?
*6. Prove (or disprove) the following conjecture: every bimatrix game that admits more than one pure Nash equilibrium solution with different equilibrium outcomes necessarily also admits a Nash equilibrium solution in proper mixed strategies.
7. Determine all Nash equilibria of the 3-person nonzero-sum game, with m₁ = m₂ = m₃ = 2, whose possible outcomes are given below:

   (a¹₁,₁,₁, a²₁,₁,₁, a³₁,₁,₁) = (0, 0, 1)       (a¹₁,₁,₂, a²₁,₁,₂, a³₁,₁,₂) = (2, 1, 1)
   (a¹₁,₂,₁, a²₁,₂,₁, a³₁,₂,₁) = (−2, 1, 2)      (a¹₁,₂,₂, a²₁,₂,₂, a³₁,₂,₂) = (−2, 0, 1)
   (a¹₂,₁,₁, a²₂,₁,₁, a³₂,₁,₁) = (0, −3, 1)      (a¹₂,₁,₂, a²₂,₁,₂, a³₂,₁,₂) = (1, −2, 2)
   (a¹₂,₂,₁, a²₂,₂,₁, a³₂,₂,₁) = (1, 1, −2)      (a¹₂,₂,₂, a²₂,₂,₂, a³₂,₂,₂) = (1, 0, 0).

8. Obtain the minimax strategies of the players in the three-person game of Problem 7. What are the security levels of the players?
*9. Prove (or disprove) the N-person version of the conjecture of Problem 6.
*10. This problem is devoted to a specific N-person zero-sum game† (N ≥ 2), which is called Mental Poker after Epstein (1967). Each of the N players chooses independently an integer out of the set {1, 2, …, n}. The player who has chosen the lowest unique integer is declared the winner, and he receives N − 1 units from the other players, each of whom loses one unit. If there is no player who has chosen a unique integer, then no one wins the game and everybody receives zero units. If, for instance, N = 3, n = 5 and the integers chosen are 2, 2 and 5, then the player who has chosen 5 is the winner. Show that for N = 2, n arbitrary, there exists a unique saddle-point strategy for both players, which is to choose integer 1 with probability 1. Next consider the case N = 3. A mixed strategy for Pi will be denoted by yⁱ = (y₁ⁱ, …, yₙⁱ), i = 1, 2, 3, with yⱼⁱ ≥ 0 and Σⁿⱼ₌₁ yⱼⁱ = 1. Show that there exists a unique Nash equilibrium solution {y¹*, y²*, y³*} with the property y¹* = y²* = y³*, which is given by yⱼⁱ* = (1/2)ʲ, j = 1, …, n − 1; yₙⁱ* = (1/2)ⁿ⁻¹. Generalize these results to N > 3. (See Moraal (1978).)
11. Obtain all pure-strategy Nash equilibria of the following single-act games in extensive form. Which of these equilibria are (i) admissible, (ii) of the delayed commitment type?
[(a) and (b): extensive-form tree diagrams, not reproduced.]
12. Obtain all purestrategy Nash equilibria of the following threeplayer singleact game wherein the order in which P2 and P3 act is a variant of PI's strategy. (Hint: note that a recursive procedure still applies here for . each fixed strategy of Pl.) L
R , I /
;n
,
P3(
6 ~.
t
.
0:
o, 0 o·
., ~.
r:
~:
o.
o'
.,'
0
'\f
c;
o
o'
~.
N
.,
C5
With respect to the notation of section 3, the game is zerosum if LiEN a~I'
jEN.
;::. ~.
~.
' .. "N =
0. It nJE M J •
13. Investigate whether the following three-person single-act game in nested extensive form admits a Nash equilibrium of the delayed commitment type:
14. Determine all multi-act nonzero-sum feedback games which are informationally inferior to the feedback game whose extensive form is depicted in Fig. 9. Obtain their delayed commitment type feedback Nash equilibria.
15. Investigate whether the following multi-act game admits pure-strategy Nash equilibria, by transforming it into equivalent normal form.
16. Determine the Stackelberg strategies for the leader (P1), and also his Stackelberg costs, in the single-act games of Problem 11. Solve the same game problems under the Stackelberg mode of play with P2 acting as the leader.
17. Let I be a two-person single-act finite game in extensive form in which P2 is the first-acting player, and denote the Stackelberg cost for the leader (P1) as J*_I. Let II be another two-person single-act game that is informationally inferior to I, and whose Stackelberg cost for the leader (P1) is J*_II. Prove the inequality J*_I ≤ J*_II.
18. Determine the Stackelberg solution of the two-person dynamic game of Problem 15 with (i) P1 as the leader, (ii) P2 as the leader.
*19. In a two-person feedback game, denote the global Stackelberg cost for the leader as J*ₛ, and the admissible feedback Stackelberg cost for the same player as J*_f. Investigate the validity of the inequality J*ₛ ≤ J*_f.
20. Obtain the pure, behavioural and mixed Stackelberg costs with P2 as the leader for the following single-act game.
Does the game admit a behavioural Stackelberg strategy for the leader (P2)?
21. Let y* ∈ Y be a mixed Stackelberg strategy for the leader (P1) in a two-person finite game, with the further property that y* is also a behavioural strategy. Prove that y* then also constitutes a behavioural Stackelberg strategy for the leader.
22. Prove that the hierarchical equilibrium strategy γ¹* introduced in Def. 37 always exists. Discuss the reasons why γ¹* introduced in Def. 35, and γ¹* and γ²* introduced in Def. 36, do not necessarily exist.
23. Determine hierarchical equilibria in pure or behavioural strategies of the single-act game of Fig. 4(b) when (i) P2 is the leader, P1 and P3 are followers, (ii) P1 and P2 are leaders, P3 is the follower, (iii) there are 3 levels of hierarchy, with the ordering being P2, P1, P3.
24. In the four-person single-act game of Fig. 5, let P3 and P4 form the leaders' group, and P1 and P2 form the followers' group. Stipulating that the noncooperative Nash solution concept is adopted within each group and the Stackelberg solution concept between the two groups, determine the corresponding hierarchical equilibrium solution of the game in pure or behavioural strategies.
25. Determine the pure-strategy Nash equilibria of the following three-person single-act game in extensive form, which incorporates a chance move. Also determine its (pure-strategy) Stackelberg equilibria with (i) P1 as leader, P2 and P3 as followers (i.e. 2 levels of hierarchy), (ii) P2 as leader, P1 as the first follower, P3 as the second follower (i.e. 3 levels of hierarchy).
[Extensive-form tree diagram with a chance move, not reproduced.]
26. Let x₀ be a discrete random variable taking the values

   x₀ = { −1 w.p. 1/3
        {  0 w.p. 1/3
        {  1 w.p. 1/3.

Consider the two-act game whose evolution is described by

   x₁ = x₀ + u₁¹ + u₁²
   x₂ = x₁ + u₂¹ + u₂²,

where u₁ⁱ (respectively, u₂ⁱ) denotes the decision (action) variable of Pi at level 1 (respectively, level 2), i = 1, 2, and they take values out of the set {−1, 0, 1}. At level 1, none of the players makes any observation (i.e. they do not acquire any information concerning the actual outcome of the chance mechanism (x₀)). At level 2, however, P2 still makes no observation, while P1 observes the value of sgn(x₁). At this level, both players also recall their own past actions. Finally, let

   Jⁱ = (−1)ⁱ⁺¹ mod(x₂) + sgn(u₁ⁱ) + sgn(u₂ⁱ), i = 1, 2,

denote the cost function of Pi, which he desires to minimize, where

   mod(x) = { −1, if x = −4, 1, 2, 5
            {  0, if x = −3, 0, 3
            {  1, if x = −5, −2, −1, 4;      sgn(0) = 0.

(i) Formulate this dynamic multi-act game in the standard extensive form and obtain its pure-strategy Nash equilibrium.
(ii) Determine its pure-strategy Stackelberg equilibria with P1 as leader.
3.9 NOTES

Sections 3.2 and 3.3 The first formulation of nonzero-sum finite games dates back to the pioneering work of Von Neumann and Morgenstern (1947). But the noncooperative equilibrium solution discussed in these two sections within the context of N-person finite games in normal form was first introduced by J. Nash (1951), who also gave a proof of Thm 2. For interesting discussions on the reasons for, and the implications of, the occurrence of nonunique Nash equilibria, see Howard (1971).

Section 3.4 For further discussion on the relationship between bimatrix games and nonlinear programming problems, see section 7.4 of Parthasarathy and Raghavan (1971). Section 7.3 of the same reference also presents the Lemke-Howson algorithm, which is further exemplified in section 7.6.

Sections 3.5 and 3.7 A precise formulation for finite nonzero-sum dynamic games in extensive form was first given by Kuhn (1953), who also proved that every N-person
nonzero-sum game with perfect information (i.e. with singleton information sets) admits a pure-strategy Nash equilibrium solution (note that this is a special case of Prop. 11 given in section 3.5). The notion of ladder-nested extensive forms and the recursive derivation of the Nash equilibrium solution of the delayed commitment type are introduced here for the first time. The "informational nonuniqueness" property of the Nash equilibrium solution in finite dynamic games is akin to a similar feature in infinite nonzero-sum games observed by Basar (1974, 1976a, 1977b); see also section 6.3. A special version of this feature was observed earlier by Starr and Ho (1969a, b), who showed that, in nonzero-sum finite feedback games, the feedback Nash equilibrium outcome could be different (for all players) from the open-loop Nash equilibrium outcome.

Sections 3.6 and 3.7 The Stackelberg solution in nonzero-sum static games was first introduced by H. von Stackelberg (1934) within the context of economic competition, and it was later extended to dynamic games primarily in the works of Chen and Cruz, Jr (1972) and Simaan and Cruz, Jr (1972a, b), who also introduced the concept of a feedback Stackelberg solution; their formulation, however, is restricted by the assumption that the follower's response is unique for every strategy of the leader. The present analysis is free from such a restriction. The feature of the mixed-strategy Stackelberg solution exhibited in (Counter-)Example 14 is reported here for the first time. Its counterpart in the context of infinite games was first observed in Basar and Olsder (1980a); see also section 6.4. For a discussion of a unified solution concept, from which Nash, Stackelberg and minimax solutions (and also cooperative solutions) follow as special cases, see Blaquiere (1976).
Chapter 4
Static Noncooperative Infinite Games

4.1 INTRODUCTION
This chapter deals with static zerosum and nonzerosum infinite games, i.e. static noncooperative games wherein at least one of the players has at his disposal an infinite number of alternatives to choose from. In section 2, the concept of ε equilibrium solutions is introduced, and several of its features are discussed. In sections 3 and 4, continuous-kernel games defined on closed and bounded subsets of finite dimensional spaces are treated. Section 3 deals mostly with the general questions of existence and uniqueness of the equilibrium solution in such games, while section 4 addresses some more specific types of zerosum games. Section 5 is devoted to the Stackelberg solution of continuous-kernel games, in both pure and mixed strategies; it presents some existence and uniqueness results, together with illustrative examples. Finally, section 6 deals with quadratic games and economic applications. Here, the theory developed in the previous sections is specialized to games whose kernels are quadratic functions of the decision variables, and in that context the equilibrium strategies are obtained in closed form. These results are then applied to different price and quantity models of oligopoly.
4.2 ε EQUILIBRIUM SOLUTIONS
Abiding by the notation and terminology of the previous chapters, we have N players, P1, ..., PN, where Pi has the cost function J^i whose value depends not only on his action but also on the actions of some or all of the other players. Pi's action is denoted by u^i, which belongs to his action space U^i.†

† In this chapter, we deal with static problems only, and in that context we do not distinguish between strategy space and action space, since they are identical.
For infinite games, the action space of at least one of the players has infinitely many elements. In Chapters 2 and 3, we have only dealt with finite games in normal and extensive form, and in that context we have proven that noncooperative equilibrium solutions always exist in the extended space of mixed strategies (cf. Thm 3.2). The following example now shows that this property does not necessarily hold for infinite games in normal form; it will then motivate the introduction of a weaker equilibrium solution concept, the so-called ε equilibrium solution.

Example 1  Consider the zerosum matrix game A defined by
              P2
          1     2     3     4    ...
  P1  1 [ 2    1/2   1/3   1/4   ... ]
      2 [ 0    1/2   2/3   3/4   ... ]
Here, P1 has two pure strategies, whereas P2 has countably many alternatives. As in finite zerosum games, let us denote the mixed strategies of P1 and P2 by y_i, i = 1, 2, and z_j, j = 1, 2, ..., respectively; that is, y_i denotes the probability with which P1 chooses row i, and z_j denotes the probability with which P2 chooses column j. Obviously we should have y_i ≥ 0, z_j ≥ 0, y_1 + y_2 = 1, Σ_{j=1}^∞ z_j = 1. If we employ a natural extension of the graphical solution method described in section 2.3 to determine the mixed security strategy of P1, we arrive at Fig. 1. The striking feature here is that the upper envelope is not well defined.
Fig. 1. Graphical solution to the (2 × ∞) matrix game of Example 1.
Suppose that the matrix A is truncated to A_k, where A_k comprises the first k columns of A. The mixed security strategy for P1 is then y_1* = (k−1)/(3k−2), y_2* = (2k−1)/(3k−2), and the average security level of P1 is V̄_m(A_k) = 2(k−1)/(3k−2). If we let k → ∞, we obtain y_1* = 1/3, y_2* = 2/3, V̄_m(A) = 2/3.
Thus it is easily seen that (y_1* = 1/3, y_2* = 2/3) is a mixed security strategy for P1 in the matrix game A. The mixed security strategy for P2 in the truncated game A_k is (z_1* = (k−2)/(3k−2), z_k* = 2k/(3k−2)), and the average security level of P2 is V_m(A_k) = 2(k−1)/(3k−2). In the limit as k → ∞, we obtain (z_1* = 1/3, z_∞* = 2/3) and V_m(A) = 2/3. But P2 cannot guarantee for himself the average lower value V_m(A), since he cannot choose column "∞". We observe, however, that if he adopts the mixed security strategy of a truncated game A_k with k sufficiently large, then he can secure for himself an average lower value arbitrarily close to V_m(A) = 2/3. In accordance with this observation, if P2 chooses k such that V_m(A_k) > 2/3 − ε, then the corresponding mixed security strategy is called an ε-mixed security strategy, which is also an ε-mixed saddle-point strategy for P2. The reason for the indicated disparity between finite and infinite games lies in the fact that the set of mixed strategies of P2 is not compact in the latter case. Consequently, an (m × ∞) matrix game might not admit a saddle point in mixed strategies, in spite of the fact that the outcome is continuous with respect to the mixed strategies of P2. □

The preceding example has displayed nonexistence of a saddle point within the class of mixed strategies. It is even possible to construct examples of zerosum (semi-infinite) matrix games which admit ε saddle-point strategies within the class of pure strategies. This happens, for instance, in the degenerate matrix game A, where A = [0 1/2 2/3 3/4 ...]. Similar features are also exhibited by nonzerosum matrix games under the Nash solution concept. We now provide below a precise definition of an ε equilibrium solution in N-person games within the context of pure strategies. Extension to the class of mixed strategies is then immediate, and the last part of this section is devoted to a discussion on that extension, as well as to some results on existence of mixed ε equilibrium solutions.
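The truncation computations of Example 1 can be checked numerically. The sketch below relies on the reconstruction of A displayed above (first column (2, 0)', column j equal to (1/j, (j−1)/j)' for j ≥ 2; this matrix is an assumption of the sketch, chosen because it reproduces the closed-form expressions of the example) and approximates P1's mixed security level in A_k by a grid search over y_1:

```python
# Grid-search approximation of P1's mixed security level in the truncated
# game A_k.  Assumed columns: (2, 0)' for j = 1, (1/j, (j-1)/j)' for j >= 2.

def columns(k):
    return [(2.0, 0.0)] + [(1.0 / j, (j - 1.0) / j) for j in range(2, k + 1)]

def security_level(k, grid=10001):
    """min over y1 in [0, 1] of max over columns of y1*a_1j + (1 - y1)*a_2j."""
    cols = columns(k)
    best = float("inf")
    for step in range(grid):
        y1 = step / (grid - 1)
        worst = max(y1 * a + (1.0 - y1) * b for a, b in cols)
        best = min(best, worst)
    return best

for k in (5, 50, 200):
    print(k, security_level(k), 2.0 * (k - 1) / (3.0 * k - 2))
```

For each k the grid value agrees with the closed-form expression 2(k−1)/(3k−2) to within the grid resolution, and both approach the limiting security level 2/3.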
DEFINITION 1  For a given ε ≥ 0, an N-tuple {u_ε^{1*}, ..., u_ε^{N*}}, with u_ε^{i*} ∈ U^i, i ∈ N, is called a (pure) ε Nash equilibrium solution for an N-person nonzerosum infinite game if

    J^i(u_ε^{1*}, ..., u_ε^{N*}) ≤ inf_{u^i ∈ U^i} J^i(u_ε^{1*}, ..., u_ε^{(i−1)*}, u^i, u_ε^{(i+1)*}, ..., u_ε^{N*}) + ε,   i ∈ N.
For ε = 0, one simply speaks of an "equilibrium" instead of a "0 equilibrium" solution, in which case we denote the equilibrium strategy of Pi by u^{i*}. □

For zerosum two-person games the terminology is somewhat different. For N = 2 and J^1 = −J^2 ≜ J, we have

DEFINITION 2  For a given ε ≥ 0, the pair {u_ε^{1*}, u_ε^{2*}} ∈ U^1 × U^2 is called an ε saddle point if

    J(u_ε^{1*}, u^2) − ε ≤ J(u_ε^{1*}, u_ε^{2*}) ≤ J(u^1, u_ε^{2*}) + ε

for all {u^1, u^2} ∈ U^1 × U^2. For ε = 0 one simply speaks of a "saddle point". □
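Definition 2 can be made concrete on the degenerate semi-infinite game mentioned earlier. The sketch below assumes the reconstructed row A = [0 1/2 2/3 3/4 ...]; since P1 has a single row, the ε saddle-point condition reduces to P2 (the maximizer) finding a column j whose entry is within ε of sup_j a_1j = 1:

```python
# Degenerate (1 x infinity) zerosum game with a_1j = (j - 1)/j (assumed
# reconstruction): the supremum 1 is never attained, only approached.

def a(j):
    return (j - 1.0) / j

def eps_saddle_column(eps):
    """Smallest column j of P2 with a_1j > 1 - eps (an eps saddle point)."""
    j = 1
    while a(j) <= 1.0 - eps:
        j += 1
    return j

for eps in (0.5, 0.1, 0.01):
    j = eps_saddle_column(eps)
    print(eps, j, a(j))
```

No pure (or mixed) saddle point exists here, yet for every ε > 0 the pure pair (row 1, column eps_saddle_column(ε)) satisfies the inequalities of Def. 2.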
By direct analogy with the treatment given in section 2.2 for finite games, the lower value of a zerosum two-person infinite game is defined by

    V_ = sup_{u^2 ∈ U^2} inf_{u^1 ∈ U^1} J(u^1, u^2),

which is also the security level of P2 (the maximizer). Furthermore, the upper value is defined by
    V̄ = inf_{u^1 ∈ U^1} sup_{u^2 ∈ U^2} J(u^1, u^2),

which is also the security level of P1. Since the strategy spaces are fixed, and in particular the structure of U^i does not depend on u^j, i ≠ j, i, j = 1, 2, it can easily be shown, as in the case of Thm 2.1(iii), that V_ ≤ V̄. If V_ = V̄, then V = V_ = V̄ is called the value of the game. We should note that existence of the value of a game does not necessarily imply existence of (pure) equilibrium strategies; it does, however, imply existence of an ε saddle point, a property that is verified in Thm 1 below, which proves, in addition, that the converse statement is also true.
THEOREM 1  A two-person zerosum (infinite) game has a finite value if, and only if, for every ε > 0, an ε saddle point exists.
Proof  Firstly, suppose that the game has a finite value (V_ = V̄ = V). Then, given an ε > 0, one can find u_ε^{1*} ∈ U^1 and u_ε^{2*} ∈ U^2 such that

    J(u_ε^{1*}, u^2) < V̄ + ½ε,  for all u^2 ∈ U^2,   (i)
    J(u^1, u_ε^{2*}) > V_ − ½ε,  for all u^1 ∈ U^1,   (ii)

which follow directly from the definitions of V̄ and V_, respectively. Now, since V_ = V̄ = V, by adding −ε to both sides of the first inequality we obtain

    J(u_ε^{1*}, u^2) − ε < V − ½ε < J(u_ε^{1*}, u_ε^{2*}),   (iii)

where the latter inequality follows from (ii) by letting u^1 = u_ε^{1*}. Analogously, if we now add ε to both sides of (ii), and also make use of (i) with u^2 = u_ε^{2*}, we obtain

    J(u^1, u_ε^{2*}) + ε > V + ½ε > J(u_ε^{1*}, u_ε^{2*}).   (iv)

If (iii) and (iv) are collected together, the result is the set of inequalities

    J(u_ε^{1*}, u^2) − ε ≤ J(u_ε^{1*}, u_ε^{2*}) ≤ J(u^1, u_ε^{2*}) + ε,   (v)

for all u^1 ∈ U^1, u^2 ∈ U^2, which verifies the sufficiency part of the theorem, in view of Def. 2.

Secondly, suppose that for every ε > 0 an ε saddle point exists, that is, a pair {u_ε^{1*} ∈ U^1, u_ε^{2*} ∈ U^2} can be found satisfying (v) for all u^1 ∈ U^1, u^2 ∈ U^2. Let the middle term be denoted as J_ε. We now show that the sequence {J_{ε_1}, J_{ε_2}, ...}, with ε_1 > ε_2 > ... > 0 and lim_{i→∞} ε_i = 0, is Cauchy.† Towards this end, let us first take ε = ε_k and ε = ε_j, j > k, in subsequent order in (v) and add the resulting two inequalities to obtain

    J(u_{ε_k}^{1*}, u^2) − J(u^1, u_{ε_k}^{2*}) + J(u_{ε_j}^{1*}, u^2) − J(u^1, u_{ε_j}^{2*}) ≤ 2(ε_k + ε_j) < 4ε_k.

Now, substituting first {u^1 = u_{ε_j}^{1*}, u^2 = u_{ε_k}^{2*}} and then {u^1 = u_{ε_k}^{1*}, u^2 = u_{ε_j}^{2*}} in the preceding inequality, we get

    −4ε_k < J_{ε_k} − J_{ε_j} < 4ε_k
for any finite k and j, with j > k, which proves that {J_{ε_k}} is indeed a Cauchy sequence. Hence, it has a limit in R, which is the value of the game. It should be noted that, although the sequence {J_{ε_k}} converges, the sequences {u_{ε_k}^{1*}} and {u_{ε_k}^{2*}} need not have limits. □

† For a definition of a Cauchy sequence, the reader is referred to Appendix I.

Nash equilibria and reaction curves

A pure-strategy Nash equilibrium solution in infinite static games can be obtained as the intersection point of the reaction curves of the players. The concept of a reaction set was already introduced in Chapter 3 (Def. 3.26) within the context of finite games, and its counterpart in infinite games describes a curve, or a family of curves, with some properties like continuity and differentiability, depending on the structure of the action sets and the cost functionals. In the sequel we devote a brief discussion to the graphical derivation of pure-strategy Nash equilibria via reaction curves; some further discussion on this topic will be given in section 3.

DEFINITION 3  In a two-person nonzerosum game, let the minimum of J^2(u^1, u^2) with respect to u^2 ∈ U^2 be attained for each u^1 ∈ U^1. Then, the set R^2(u^1) ⊂ U^2 defined by

    R^2(u^1) = {ξ ∈ U^2 : J^2(u^1, ξ) ≤ J^2(u^1, u^2), for all u^2 ∈ U^2}

is called the optimal response or rational reaction set of P2. If R^2(u^1) is a singleton for every u^1 ∈ U^1, then it is called the reaction curve or reaction function of P2, and is denoted by l_2(u^1). The reaction set and curve of P1 are similarly defined (simply by interchanging the indices). □

In Fig. 2, the "constant level" or isocost curves of J^1 and J^2 have been drawn for a specific game with U^1 = U^2 = R. For fixed u^1, say u^1 = ū^1, the best P2 can do is to minimize J^2 along the line u^1 = ū^1. Assuming that this minimization problem admits a unique solution, the optimal response of P2 is determined, in the figure, as the point where the line u^1 = ū^1 is tangent to an isocost curve J^2 = constant. For each different ū^1, a different unique optimal response can thus be found for P2, and the collection of all these points forms the reaction curve of P2, indicated by l_2 in the figure. The reaction curve of P1 is similarly constructed: it is the collection of all points (u^1, u^2) where horizontal lines are tangent to the isocost curves of J^1, and it is indicated by l_1 in the figure. By definition, the Nash solution must lie on both reaction curves, and therefore, if these curves have only one point of intersection, as in the figure, the Nash solution exists and is unique.
Fig. 2. Constant level curves for J^1 and J^2, and the corresponding reaction curves (l_1 and l_2) of P1 and P2, respectively.
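The graphical construction above can also be carried out by calculus. The following sketch uses a hypothetical pair of quadratic cost functions (chosen purely for illustration; they do not come from the text) whose reaction curves are straight lines, and locates the Nash solution at their unique intersection:

```python
# Reaction curves and Nash equilibrium for illustrative quadratic costs:
#   J1(u1, u2) = u1**2 - u1*u2            =>  l1(u2) = u2/2
#   J2(u1, u2) = u2**2 - 0.5*u1*u2 - u2   =>  l2(u1) = u1/4 + 1/2

def J1(u1, u2):
    return u1 ** 2 - u1 * u2

def J2(u1, u2):
    return u2 ** 2 - 0.5 * u1 * u2 - u2

def l1(u2):            # stationarity of J1 in u1: 2*u1 - u2 = 0
    return u2 / 2.0

def l2(u1):            # stationarity of J2 in u2: 2*u2 - 0.5*u1 - 1 = 0
    return u1 / 4.0 + 0.5

# Solving u1 = l1(u2), u2 = l2(u1) gives the unique intersection (2/7, 4/7).
u1, u2 = 2.0 / 7.0, 4.0 / 7.0
assert abs(l1(u2) - u1) < 1e-9 and abs(l2(u1) - u2) < 1e-9

# Unilateral deviations cannot lower either player's cost (grid check):
for d in range(-100, 101):
    dev = d / 50.0
    assert J1(u1 + dev, u2) >= J1(u1, u2) - 1e-12
    assert J2(u1, u2 + dev) >= J2(u1, u2) - 1e-12
```

Because both costs are strictly convex in the player's own variable, each reaction set is a singleton, exactly the situation covered by Definition 3.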
The configuration of the reaction curves is not always as depicted in Fig. 2. In Fig. 3, some other possibilities have been displayed. In all these cases it has been assumed that U^1 = U^2 = R. In Fig. 3a, l_1 and l_2 are parallel straight lines, and neither a Nash nor an ε Nash solution exists. In Fig. 3b, l_1 and l_2 partly coincide, and there exists a continuum of Nash solutions. In Fig. 3c, there is a finite number of Nash solutions. In Fig. 3d, one of the reaction curves is not connected, and neither a Nash nor an ε Nash solution exists. The nonzerosum games associated with Figs 3b and 3c both exhibit multiple Nash equilibria, in which case, as discussed earlier in section 3.2, the Nash concept is weak as an equilibrium solution concept. Unless other criteria are imposed, there is no reason why players should prefer one particular equilibrium solution over the other(s). In fact, since the players make their decisions independently, it could so happen that the outcome of their joint (but noncooperative) choices is not an equilibrium point at all. Such non-equilibrium outcomes of unilaterally equilibrium strategies are shown in Figs 3b and 3c, where P1 picks a Nash strategy ū^1 and P2 chooses a Nash strategy ū^2.
Fig. 3. Possible configurations of the reaction curves.
Extension to mixed strategies

All the definitions, as well as the result of Thm 1, of this section have been presented for pure strategies. We now outline possible extensions to mixed strategies. We define a mixed strategy for Pi as a probability distribution μ^i on U^i, and denote the class of all such probability distributions by M^i. For the case when U^i is a finite set, we have already seen in Chapters 2 and 3 that M^i is an appropriate dimensional simplex. For an infinite U^i, however, its description will be different. Let us consider the simplest case when U^i is the unit interval [0, 1]. Then, elements of M^i are defined as mappings μ : [0, 1] → [0, 1] with the properties

    μ(0) = 1 − μ(1) = 0,
    μ(û) ≥ μ(u) whenever û > u,        (1)
    μ(u) = μ(u+) if u ≠ 1,

that is, M^i is the class of all probability distribution functions defined on [0, 1]. More generally, if U^i is a closed and bounded subset of a finite dimensional space, a mixed strategy for Pi can likewise be defined, since every such U^i has the cardinality of the continuum and can therefore be mapped into the unit interval in a one-to-one manner. However, such a definition of a mixed strategy leads to difficulties in the construction of the average cost function, and therefore it is convenient to define μ^i as a probability measure (see Ash (1972)) on U^i, which is valid even if U^i is not closed and bounded. The average cost function of Pi is then defined as a mapping J̃^i : M^1 × ... × M^N → R by

    J̃^i(μ^1, ..., μ^N) = ∫_{U^1} ... ∫_{U^N} J^i(u^1, ..., u^N) dμ^1(u^1) ... dμ^N(u^N)   (2)
whenever this integral exists, which has to be interpreted in the Lebesgue-Stieltjes sense (see section 3 of Appendix II). With the average cost functions defined as in (2), the mixed ε Nash equilibrium solution concept can be introduced as in Def. 1, by merely replacing J^i by J̃^i, U^i by M^i, u^i by μ^i and u_ε^{i*} by μ_ε^{i*}. Analogously, Defs 2 and 3 can be extended to mixed strategies, and Thm 1 finds a direct counterpart in mixed strategies, with V_ and V̄ replaced by

    V_m = sup_{μ^2 ∈ M^2} inf_{μ^1 ∈ M^1} J̃(μ^1, μ^2)

and

    V̄_m = inf_{μ^1 ∈ M^1} sup_{μ^2 ∈ M^2} J̃(μ^1, μ^2),

respectively, and with V replaced by V_m.

We devote the remainder of this section to the mixed ε Nash equilibrium solution concept within the context of semi-infinite bimatrix games. The two matrices, whose entries characterize the possible costs to be incurred to P1 and P2, will be denoted by A = {a_ij} and B = {b_ij}, respectively, and both will have size (m × ∞). If m < ∞, then we speak of a semi-infinite bimatrix game, and if m = ∞, we simply have an infinite bimatrix game. The case when the matrices both have the size (∞ × n) follows the same lines and will not be treated separately. A mixed strategy for P1, in this case, is a sequence {y_1, y_2, ..., y_m} satisfying Σ_{i=1}^m y_i = 1, y_i ≥ 0. The class of such sequences is denoted by Y. The class Z is defined, similarly, as the class of all sequences {z_1, z_2, ...} satisfying Σ_{j=1}^∞ z_j = 1, z_j ≥ 0. Each element of Z defines a mixed strategy for P2. Now, for each pair {y ∈ Y, z ∈ Z}, the average costs incurred to P1 and P2 are defined, respectively, by

    J̃^1(y, z) = Σ_i Σ_j y_i a_ij z_j;   J̃^2(y, z) = Σ_i Σ_j y_i b_ij z_j,

assuming that the infinite series corresponding to these sums are absolutely convergent. If the entries of A and B are bounded, for example, then such a requirement is fulfilled for every y ∈ Y, z ∈ Z. The following theorem now verifies existence of mixed ε equilibrium strategies for the case m < ∞.

THEOREM 2  For each ε > 0, the semi-infinite bimatrix game (A, B), with A = {a_ij; i = 1, ..., m, j = 1, 2, ...}, B = {b_ij; i = 1, ..., m, j = 1, 2, ...}, m < ∞ and the entries a_ij and b_ij being bounded, admits an ε equilibrium solution in mixed strategies.
Proof  Let N denote the class of all positive integers. For each j ∈ N, define c_j as the jth column of B, i.e. c_j = Be_j. The set C = {c_j, j ∈ N} is a bounded subset of R^m. Let ε > 0 be given. Then, a finite subset C̄ ⊂ C exists such that to each c ∈ C a c̄ ∈ C̄ corresponds with the property ‖c − c̄‖_∞ < ε, which means that the magnitude of each component of (c − c̄) is less than ε.
Without loss of generality, we assume that C̄ = {c_1, c_2, ..., c_n} with n < ∞. If necessary, the pure strategies of P2 can be rearranged so that C̄ consists of the first n columns of B. Define A_n = {a_ij; i = 1, ..., m, j = 1, ..., n}, B_n = {b_ij; i = 1, ..., m, j = 1, ..., n}. By Thm 3.1, the "truncated" bimatrix game (A_n, B_n) has at least one equilibrium point in mixed strategies, to be denoted by (ȳ, z̄). Since z̄ places all of its probability mass on the first n columns, ȳ remains an exact optimal response in the full game:

    ȳ'A z̄ = ȳ'A_n z̄ ≤ y'A_n z̄ = y'A z̄,  for all y ∈ Y.   (i)

For each r ∈ N, we choose a b(r) ∈ {1, ..., n} such that ‖c_r − c_{b(r)}‖_∞ < ε. For each j ∈ {1, ..., n}, let R(j) ≜ {r ∈ N : b(r) = j}. Next, for each z ∈ Z, we define z̃ = (z̃_1, ..., z̃_n) by z̃_j = Σ_{r ∈ R(j)} z_r. Then z̃_j ≥ 0 and Σ_j z̃_j = 1. Subsequently,

    B z = Σ_{r=1}^∞ z_r c_r ≥ Σ_{j=1}^n (Σ_{r ∈ R(j)} z_r) c_j − ε 1_m = B_n z̃ − ε 1_m,

where 1_m = (1, ..., 1)' ∈ R^m. Hence

    ȳ'B z ≥ ȳ'B_n z̃ − ε ≥ ȳ'B_n z̄ − ε = ȳ'B z̄ − ε   (ii)

for all z ∈ Z. Inequalities (i) and (ii) verify that (ȳ, z̄) constitutes a mixed ε equilibrium solution for (A, B). □

COROLLARY 1  Every bounded semi-infinite zerosum matrix game has a value in the class of mixed strategies.
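The covering step at the heart of this proof can be illustrated numerically. The sketch below uses a hypothetical bounded B with m = 2 whose columns converge (the column formula is an illustrative assumption, not from the text); finitely many columns then form an ε-net, exactly as the proof requires:

```python
# Covering step of the proof of Thm 2 on a hypothetical bounded B (m = 2):
# the columns converge to (0, 1)', so finitely many of them cover all
# columns to within eps in the max norm.
import math

def col(j):
    return (1.0 / j, (j - 1.0) / j)   # assumed j-th column of B, j = 1, 2, ...

eps = 0.05
n = int(math.ceil(2.0 / eps))          # ensures 1/n <= eps/2

def b(r):
    return r if r <= n else n          # representative b(r) in {1, ..., n}

for r in range(1, 5 * n):              # verify the covering property
    cr, cbr = col(r), col(b(r))
    assert max(abs(cr[0] - cbr[0]), abs(cr[1] - cbr[1])) < eps
```

With such a map b(·), the compression z̃_j = Σ_{r ∈ R(j)} z_r of the proof is well defined, and the componentwise bound Bz ≥ B_n z̃ − ε 1_m follows.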
Proof This result is a direct consequence of Thm 2, and constitutes a natural extension of Thm 1 to mixed strategies. 0
For (∞ × ∞) bimatrix games such powerful results do not exist, as is shown in the following (counter)example.

Example 2  Each one of two players chooses a natural number, independently of the other. The player who has chosen the higher number wins and receives one unit from the other player. If both players choose the same number, then the outcome is a draw. The matrix corresponding to this zerosum game is
     0    1    1    1   ...
    -1    0    1    1   ...
    -1   -1    0    1   ...
    -1   -1   -1    0   ...
     :    :    :    :
whose size is (∞ × ∞). For this game, V_ = −1 and V̄ = +1, and consequently the average value V_m does not exist, in spite of the fact that the matrix entries are bounded. □
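The absence of a value can be seen computationally: any mixed strategy of P2 with finite support is beaten outright by a pure strategy of P1, and, by symmetry, vice versa (the particular strategy z below is an arbitrary illustration):

```python
# "Pick the larger number": cost to P1 (the minimizer) is 0 if i == j,
# +1 if j > i (P2's number is larger), -1 if j < i.

def payoff(i, j):
    return 0 if i == j else (1 if j > i else -1)

def expected_cost(row, z):
    """Average cost to P1 of the pure strategy `row` against mixed z."""
    return sum(p * payoff(row, j) for j, p in z.items())

z = {1: 0.25, 3: 0.5, 10: 0.25}   # an arbitrary finitely supported strategy
best_reply = max(z) + 1            # P1 names a strictly larger number
assert expected_cost(best_reply, z) == -1.0
```

Since P2 can do exactly the same against any finitely supported strategy of P1, the two security levels stay at −1 and +1, matching the lower and upper values stated above.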
4.3 CONTINUOUS-KERNEL GAMES: RESULTS ON EXISTENCE OF NASH EQUILIBRIA

This section deals with static games in which the number of alternatives available to each player is a continuum and the cost functionals are continuous. In particular, we shall consider the class of games for which a pure strategy of each player can be represented as a real number in the interval [0, 1]. If there are two players, then one speaks of "games on the square". The following theorem now provides a set of sufficient conditions under which such nonzerosum games admit pure-strategy Nash equilibria.

THEOREM 3  For each i ∈ N, let the cost functional J^i : [0, 1]^N → R be jointly continuous in all its arguments and strictly convex in u^i for every u^j ∈ [0, 1], j ∈ N, j ≠ i. Then the associated N-person nonzerosum game admits a Nash equilibrium in pure strategies.
Proof  Let us take i = 1. By strict convexity, there exists a unique mapping l_1 : [0, 1]^{N−1} → [0, 1] such that u^1 = l_1(u^2, ..., u^N) uniquely minimizes J^1(u^1, ..., u^N) for any given (N−1)-tuple {u^2, ..., u^N}.
The mapping l_1 is actually the reaction function of P1 in this N-person game. Similarly, reaction functions l_i, i = 2, ..., N, can be defined as unique mappings from [0, 1]^{N−1} into [0, 1]. Using vector notation, these relations can be written in compact form as u = L(u), where u = (u^1, ..., u^N) ∈ [0, 1]^N and L = (l_1, ..., l_N). It will be shown in the sequel that the individual reaction functions l_i are continuous in their arguments, and hence L is a continuous mapping. Since L maps [0, 1]^N into [0, 1]^N, this then implies, by utilization of Brouwer's fixed point theorem (see Appendix III, Thm 1), that a u* ∈ [0, 1]^N exists such that u* = L(u*). Obviously, the components of this u* constitute a Nash equilibrium solution.

To complete the proof of the theorem, what remains to be shown is the continuity of l_i. Let us take i = 1 and assume, to the contrary, that l_1 is discontinuous at (u_0^2, ..., u_0^N). Further let l_1(u_0^2, ..., u_0^N) = u_0^1. Then there exists a sequence of vectors {ū_j ≜ (u_j^2, ..., u_j^N)'; j = 1, 2, ...} such that (u_0^2, ..., u_0^N)' is the limit of this sequence, but u_0^1 is not the limit of l_1(u_j^2, ..., u_j^N) as j → ∞. Because of compactness of the action spaces, there exists a subsequence of {ū_j}, say {ū_{j_l}}, such that l_1(ū_{j_l}) converges to a limit û_0^1 ≠ u_0^1, and simultaneously the following inequality holds:

    J^1(l_1(ū_{j_l}), ū_{j_l}) < J^1(u_0^1, ū_{j_l}).

Now, by taking limits with respect to the subsequence of indices {j_l}, one obtains the inequality

    J^1(û_0^1, u_0^2, ..., u_0^N) ≤ J^1(u_0^1, u_0^2, ..., u_0^N),

which, together with û_0^1 ≠ u_0^1, forms a contradiction to the initial hypothesis that u_0^1 is the unique u^1 which minimizes J^1(u^1, u_0^2, ..., u_0^N). Hence, l_1 is continuous. The continuity of l_i, i > 1, can analogously be proven. □

Remark 1  It should be clear from the proof of the theorem that the result is valid if U^i, i ∈ N, are taken as general compact convex sets (instead of [0, 1]). □
For N = 2, the contents of Thm 3 can be visualized graphically on a square (see Fig. 4). The function u^2 = l_2(u^1) is a connected curve, connecting the line segments AD and BC. Similarly, the function u^1 = l_1(u^2) is a connected curve, connecting the line segments AB and DC. Obviously, the point of intersection yields the Nash equilibrium solution. In general, however, the Nash solution is nonunique, because there may be more than one point of intersection, or equivalently, in the proof of Thm 3, there may be more than one fixed point of the equation u = L(u). In the case of nonunique Nash equilibrium solutions, we can classify them in a number of ways. One such classification is provided by the notion of "robustness", which is introduced below.
Fig. 4. Reaction curves for a game on a square.
DEFINITION 4  Given two connected curves u^2 = l_2(u^1) and u^1 = l_1(u^2) on the square, denote their weak δ-neighbourhoods by N_δ^{l_2} and N_δ^{l_1}, respectively.† Then, a point P of intersection of these two curves is said to be robust if, given ε > 0, there exists a δ_0 > 0 so that every ordered pair of curves selected from N_{δ_0}^{l_2} and N_{δ_0}^{l_1} has an intersection in an ε-neighbourhood of P. □

† A weak δ-neighbourhood N_δ of l : [0, 1] → [0, 1] is defined as the set of all maps l̃ : [0, 1] → [0, 1] such that |l(ξ) − l̃(ξ)| < δ for all ξ ∈ [0, 1].
As specific examples, consider the two-person nonzerosum games for which the reaction functions are depicted in Figs 5a and 5b. In Fig. 5a, the point P_1 constitutes a robust Nash solution, while point P_2 is not robust. In Fig. 5b, however, all the Nash equilibrium solutions are robust. It should be noted that the number of robust intersections of two connected curves is actually odd, provided that possible points of intersection at the corners of the square are excluded. Hence, we can say that nonzerosum two-person static games on the square, whose cost functionals fulfil the requirements stated in Thm 3, essentially admit an odd number of Nash equilibrium solutions. An exception to this rule is the case when the two reaction curves intersect only at the corners of the square.
Fig. 5. Games on the square, illustrating robust Nash solutions.
Yet another classification within the class of multiple Nash equilibrium solutions of a two-person nonzerosum static game is provided by the notion of "stability" of the solution(s) of the fixed point equation. Given a Nash equilibrium solution, consider the following sequence of moves: (i) one of the players (say P1) deviates from his corresponding equilibrium strategy; (ii) P2 observes this and minimizes his cost function in view of the new strategy of P1; (iii) P1 now optimally reacts to that (by minimizing his cost function); (iv) P2 optimally reacts to that optimum reaction; etc. Now, if this infinite sequence of moves converges to the original Nash equilibrium solution, regardless of the nature of the initial deviation of P1, we say that the Nash equilibrium solution is stable. If convergence is valid only under small initial deviations, then we say that the Nash equilibrium solution is locally stable. Otherwise, the Nash solution is said to be unstable. A nonzerosum game can of course admit more than one locally stable equilibrium solution, but a stable Nash equilibrium solution has to be unique. The reaction functions of two different nonzerosum games on the square are depicted in Fig. 6. In the first one (Fig. 6a), the equilibrium solution is stable, whereas in the second one it is unstable. The reader is referred to section 6 for more examples of stable and unstable equilibrium solutions. Since two-person zerosum games are special types of nonzerosum games, the statement of Thm 3 is equally valid for saddle-point solutions. It can even be sharpened to some extent.
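The sequence of moves (i)-(iv) is precisely a best-response iteration. A minimal sketch on a hypothetical game with linear reaction curves l_1(u^2) = u^2/2 and l_2(u^1) = u^1/4 + 1/2 (illustrative choices, not from the text):

```python
# Best-response iteration for illustrative linear reaction curves.  The
# product of the slopes is (1/2)*(1/4) = 1/8 < 1, so the alternating
# reactions of steps (ii)-(iv) contract toward the unique Nash equilibrium
# (u1, u2) = (2/7, 4/7), which is therefore stable.

def l1(u2):
    return u2 / 2.0

def l2(u1):
    return u1 / 4.0 + 0.5

u1, u2 = 5.0, -3.0                # P1 starts with an arbitrary deviation
for _ in range(60):
    u2 = l2(u1)                   # P2 reacts optimally ...
    u1 = l1(u2)                   # ... and P1 reacts to that reaction
assert abs(u1 - 2.0 / 7.0) < 1e-9 and abs(u2 - 4.0 / 7.0) < 1e-9
```

If the product of the reaction-curve slopes exceeded one, the same iteration would diverge, which corresponds to the unstable configuration of Fig. 6b.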
Fig. 6. Illustration of (a) a stable and (b) an unstable Nash solution.

COROLLARY 2  Consider a two-person zerosum game on the square, defined by the continuous kernel J(u^1, u^2). Suppose that J(u^1, u^2) is strictly convex in u^1 for each u^2 ∈ [0, 1] and strictly concave in u^2 for each u^1 ∈ [0, 1]. Then there exists a unique pure-strategy saddle-point equilibrium.
Proof  Existence of a saddle-point equilibrium is a direct consequence of Thm 3. Furthermore, by strict convexity and concavity, there can be no saddle-point solutions outside the class of pure strategies. Hence only uniqueness within the class of pure strategies remains to be proven, which, however, follows readily from the interchangeability property of multiple saddle points (viewed as a direct extension of Corollary 2.1). □
The following theorem, whose proof can be found in either Karlin (1959) or Owen (1968), provides a slight generalization of the preceding corollary.

THEOREM 4  Let J(u^1, u^2) be a functional defined on U^1 × U^2, where both U^1 and U^2 are compact and convex sets. If J is continuous on U^1 × U^2, concave in u^2 for each u^1 ∈ U^1 and convex in u^1 for each u^2 ∈ U^2, then a saddle point exists in pure strategies, but it is not necessarily unique. □
We now focus attention on nonzerosum games whose cost functionals are continuous but not necessarily convex. For such games one cannot, in general, hope to obtain pure-strategy Nash equilibria; however, in the enlarged class of mixed strategies, the Nash equilibrium solution exists, as is verified in the following theorem.

THEOREM 5  An N-person nonzerosum game in which the action spaces U^i (i ∈ N) are compact and the cost functionals J^i (i ∈ N) are continuous on U^1 × ... × U^N admits a Nash equilibrium in mixed strategies.
Proof  A proof of this theorem can be found in Owen (1974). The underlying idea is to make the kernels J^i discrete so as to obtain an N-person
finite matrix game that suitably approximates the original game, in the sense that a mixed-strategy solution of the latter (which always exists by Thm 3.1) is arbitrarily close to a mixed equilibrium solution of the former. Compactness of the action spaces ensures that a limit to the sequence of solutions obtained for approximating finite matrix games exists. For yet another proof of this result, see Glicksberg (1950). □

As a special case of Thm 5 we now have
The question ofthe weakest conditions under which a mixedstrategy saddle point exists naturally arises. Corollary 3 says that continuity is a sufficient condition. But it is by no means necessary, and, in fact, it can be replaced by semicontinuity conditions (see Glicksberg, 1950). However, it is not possible to relax the semicontinuity conditions any further and still retain existence of a mixedstrategy saddle point within a general enough framework (see Sion and Wolfe, 1957). We conclude this section with an example of a zerosum game whose cost functional is continuous but not convexconcave, and which admits a mixed saddlepoint equilibrium. Example 3 Consider the twoperson zerosum game on the square [0, 2]Z characterized by the kernel J = (u 1  uZ)z. In Fig. 7, the two reaction curves 11 and l z have been drawn, which clearly do not intersect. It can easily be seen that r = 1 and J! = 0, i.e. a purestrategy saddle point does not exist. In the extended class of mixed strategies, however, a candidate saddlepoint solution directly follows from Fig. 7, which is u
1· _

1 w.p. l:, u 2" _ {O2 w.p, w.p.
1/2 1/2.
It can readily be verified that this pair of strategies indeed provides a mixed saddlepoint solution. 0
Fig. 7. Reaction curves for the zerosum game with kernel J = (u l
 U
2)2.
170
DYNAMIC NONCOOPERATIVE GAME THEORY
4.4 SOME SPECIAL TYPES OF ZEROSUM GAMES ON THE SQUARE In this section, some special types of twoperson zerosum games will be briefly discussed. These special types of games, or special types of kernels through which the games are characterized, have attracted special attention either because their saddlepoint solutions exhibit special features and/or because special numerical algorithms can be developed to compute these saddlepoint solutions.
4.4.1
Games with separable kernels
Suppose we are given a twoperson zerosum game on the unitsquare with the kernel J(u 1, u 2) =
m
•
i=1
j=1
L: L:
aij ri(u 1)Sj(u2),
where r, and Sj are continuous functions. Such a game is called a game with a separable kernel, or simply a separable game, and it can easily be reduced to a game in a simpler form. If a mixed strategy (J.11 E M 1,J.1 2 E M 2) is adopted, the outcome becomes m
J(J.11, J.12) =
L:
i= 1
i
j= 1
aij
J~
ri(u 1)dJ.11(u 1).
J~
sj(u 2)dJ.12(u
2)
which can be interpreted as the average outcome of a game with a finite set of alternatives, In this new game, PI chooses r1> ... , rm subject to
ri =
J~ ri(u 1)dJ.11(u1), i = 1, . , . ,m
(i)
and with J.11 E MI. The set of all such mvectors r ~ (r 1, . . . , rmY will be denoted by C r It can be proven that this set is compact and convex, spanned by a curve in R", whose parametric representation is r; = ri(~)' 0 ~ ~ ~ 1, i = 1, ... , m. Each point on this curve corresponds to a onepoint distribution function J.11. Similarly, P2 chooses 81 , . , . , s" subject to A Sj
=
J01 Sj (2) u d J.1 2(u 2) ,J. = 1, ... , n, J.1 2EM2 .
(1'1')
The set of all such n-vectors s̃ ≜ (s̃_1, ..., s̃_n)' will be denoted by C_2. This set is also compact and convex, and is spanned by a curve in R^n whose parametric representation is s̃_j = s_j(ξ), 0 ≤ ξ ≤ 1, j = 1, ..., n.
Thus, the original separable game can be reformulated as

    min_{r̃ ∈ C_1} max_{s̃ ∈ C_2} r̃'A s̃,   (iii)
where A is the (m × n) matrix {a_ij}. The conditions of Thm 4 are satisfied for this new game and hence it has a pure, but not necessarily unique, saddle-point solution in the product space C_1 × C_2. Such a solution, denoted by (r̃*, s̃*), must be transformed back into the distributions μ^1, μ^2. In other words, functions μ^1 and μ^2 must be found such that (i) and (ii) are satisfied, where now r̃_i, r_i(u^1), i = 1, ..., m, and s̃_j, s_j(u^2), j = 1, ..., n, are considered as being given. This is reminiscent of the so-called moment problem, since, if r_i(u^1) is a power of u^1 for i = 1, ..., m, then the value of r̃_i stands for the corresponding moment of μ^1.

Example 4  Consider the two-person zerosum game characterized by the kernel
l
,
u
2) = 2u l sin(Iu2) 
3(U I)3 sin(Iu2) + ulcos
+ (ul )3cos(Iu2) 
(I U2)
2u l + 2t(U I)3  cos(IU 2)
The functions $r_i(u^1)$ and $s_j(u^2)$ are determined as
$$r_1 = u^1; \quad r_2 = (u^1)^3; \quad r_3 = 1;$$
$$s_1 = \sin(\tfrac{\pi}{2}u^2); \quad s_2 = \cos(\tfrac{\pi}{2}u^2); \quad s_3 = 1.$$
The vectors $\bar r = (\bar r_1, \bar r_2, \bar r_3)'$, defined by (i), and $\bar s = (\bar s_1, \bar s_2, \bar s_3)'$, defined by (ii), must belong to the sets $C_1$ and $C_2$, respectively, which are graphically depicted in Fig. 8. The game in terms of $\bar r$ and $\bar s$ becomes
$$\min_{\bar r \in C_1} \max_{\bar s \in C_2} \bar J(\bar r, \bar s),$$
Fig. 8. The admissible sets $C_1$ and $C_2$ for Example 4.
DYNAMIC NONCOOPERATIVE GAME THEORY
where
$$\bar J = 2\bar r_1 \bar s_1 - 3\bar r_2 \bar s_1 + \bar r_1 \bar s_2 + \bar r_2 \bar s_2 - 2\bar r_1 + \tfrac{5}{2}\bar r_2 - \bar s_2$$
$$\quad= \bar r_1(2\bar s_1 + \bar s_2 - 2) + \bar r_2(-3\bar s_1 + \bar s_2 + \tfrac{5}{2}) - \bar s_2$$
$$\quad= \bar s_1(2\bar r_1 - 3\bar r_2) + \bar s_2(\bar r_1 + \bar r_2 - 1) - 2\bar r_1 + \tfrac{5}{2}\bar r_2.$$
In order to analyse this latter game, we must distinguish between several cases, depending on the signs (positive, negative or undetermined) of the functions
$$2\bar s_1 + \bar s_2 - 2, \quad -3\bar s_1 + \bar s_2 + \tfrac{5}{2}, \quad 2\bar r_1 - 3\bar r_2, \quad \bar r_1 + \bar r_2 - 1.$$
This example has been constructed so that the point of intersection of the two lines $2\bar s_1 + \bar s_2 - 2 = 0$ and $-3\bar s_1 + \bar s_2 + \tfrac{5}{2} = 0$, i.e. $(\bar s_1 = 0.9,\ \bar s_2 = 0.2)$, belongs to $C_2$, and the point of intersection of the two lines $2\bar r_1 - 3\bar r_2 = 0$ and $\bar r_1 + \bar r_2 - 1 = 0$, i.e. $(\bar r_1 = 0.6,\ \bar r_2 = 0.4)$, belongs to $C_1$. Therefore, the coordinates of these two points of intersection yield a saddle point in terms of $\bar s_j$ and $\bar r_i$. If at least one of these points of intersection had fallen outside the admissible region, the analysis would have been slightly more involved, though still elementary and straightforward. The two points of intersection, $S_1$ and $S_2$, have been indicated in Fig. 8, from which it is rather straightforward to obtain the solution of the original problem in terms of $u^1$ and $u^2$. Consider, for instance, the point $S_1$; it lies on the line connecting $O$ and $B$ (see Fig. 8). Point $O$ corresponds to the pure strategy $u^1 = 0$ and point $B$ corresponds to the pure strategy $\bar u \approx 0.82$. Hence the following two-point distribution for $u^1$ is an optimal choice for P1:
$$u^1 = 0 \ \text{ w.p. } \frac{0.82 - 0.6}{0.82} \approx 0.27; \qquad u^1 = 0.82 \ \text{ w.p. } \frac{0.6}{0.82} \approx 0.73.$$
It should be clear that there exists more than one two-point distribution for $u^1$ which leads to the same point $S_1$.  □
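The arithmetic of this reduction can be checked directly. The sketch below (illustrative code, not from the text; it uses only the standard library) solves the two line intersections and recovers one admissible two-point distribution for P1 from the chord construction on the curve $(\xi, \xi^3)$:

```python
import math

# Reduced game of Example 4:
#   Jbar = s1*(2*r1 - 3*r2) + s2*(r1 + r2 - 1) - 2*r1 + 2.5*r2.
# The candidate saddle point makes both bracketed line pairs vanish.

def solve2(a, b, c, d, e, f):
    """Solve a*x + b*y = e, c*x + d*y = f by Cramer's rule."""
    det = a * d - b * c
    return (e * d - b * f) / det, (a * f - e * c) / det

# P2's coordinates: 2*s1 + s2 - 2 = 0 and -3*s1 + s2 + 5/2 = 0
s1, s2 = solve2(2.0, 1.0, -3.0, 1.0, 2.0, -2.5)

# P1's coordinates: 2*r1 - 3*r2 = 0 and r1 + r2 - 1 = 0
r1, r2 = solve2(2.0, -3.0, 1.0, 1.0, 0.0, 1.0)

# Recover a two-point distribution for P1.  The curve spanning C1 is
# (xi, xi**3), 0 <= xi <= 1; the chord from O = (0, 0) through (r1, r2)
# meets it at u_bar with u_bar**2 = r2/r1, and the mass on u_bar is
# lam = r1/u_bar (the remaining mass 1 - lam sits on u = 0).
u_bar = math.sqrt(r2 / r1)
lam = r1 / u_bar

# The mixture reproduces both moments (r1, r2):
assert abs(lam * u_bar - r1) < 1e-12 and abs(lam * u_bar ** 3 - r2) < 1e-12
print(s1, s2, r1, r2, round(u_bar, 2), round(lam, 2))
```

This reproduces the values quoted above: the intersection points $(0.9, 0.2)$ and $(0.6, 0.4)$, and the two-point distribution with mass $\approx 0.73$ on $\bar u \approx 0.82$.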
4.4.2  Games of timing

The prototype for a game of timing is a two-person zero-sum game wherein each player can act a finite number of times at arbitrary instants within a given time interval, say $[0, 1]$. The order in which the players act is usually very important. In this subsection, only games in which each player acts once will be considered.

As an example, suppose that two large mail-order companies are planning their yearly catalogues. Here, the timing of the catalogue is an important
factor. The company that issues its catalogue first gains a seeming advantage over its competitor; on the other hand, the one that waits to be the second can capitalize on the weaknesses of his opponent. From the mathematical viewpoint, a game of timing can be considered as a game on the unit square whose kernel has the following structure:
$$J(u^1,u^2) = \begin{cases} L_1(u^1,u^2) & \text{if } u^1 < u^2,\\ \phi(u^1) & \text{if } u^1 = u^2,\\ L_2(u^1,u^2) & \text{if } u^1 > u^2, \end{cases}$$
where each of the functions $L_1$ and $L_2$ is assumed to be jointly continuous in its arguments, and $\phi$ is also continuous. The kernel may be discontinuous along the diagonal, and because it may be neither upper nor lower semicontinuous, mixed saddle-point strategies do not necessarily exist. In the example of the mail-order companies, if $J$ stands for the relative profit for P2 and the relative loss for P1, it is quite natural to assume that $L_1(u^1,u^2)$ and $L_2(u^1,u^2)$ are both monotone decreasing in $u^1$ for fixed $u^2$ and monotone increasing in $u^2$ for fixed $u^1$. The monotonicity of $L_1$ and $L_2$ and the discontinuity of $J$ at $u^1 = u^2$ permit the following interpretation: if P2 acts at a fixed time $u^2$, P1 improves his chances of success by waiting as long as possible, provided that he acts before P2. If P1 waits until after P2 acts, however, he may lose if P2 has been successful; this is the main reason for the discontinuity at $u^1 = u^2$. Once P2 has acted, P1's chances increase with time. Analogous statements apply when the roles of the players are reversed. Games of timing have been discussed in detail in Karlin (1959), where it has been proven that, for a large class of games of timing, an $\varepsilon$ saddle point exists. There are several examples which can appropriately be considered as games of timing; we shall conclude our discussion here with one such example.

Example 5  Consider a duel: two men, starting at $t = 0$, walk toward each other at such a speed that they meet each other at $t = 1$ if nothing intervenes. Each one has a pistol with exactly one bullet, and he may fire the pistol at any time $t \in [0,1]$ he wishes. If one of them hits the other, the duel is immediately over, and the one who has fired successfully is declared the winner. If neither one fires successfully, or if both fire simultaneously and successfully, the duel becomes a standoff.
The probability of hitting the opponent after the pistol has been fired is a function of the distance between the men at that moment. If a player fires at time $t \in [0,1]$, we assume that the probability of hitting the opponent is equal to $t$. Moreover, we assume that the duel is silent, i.e. a player does not know whether his opponent has already fired, unless, of course, he is hit. In a nonsilent, say noisy, duel, a player knows when his opponent has fired, and if he has not been hit he will wait until $t = 1$ to fire his own pistol, because at $t = 1$ he is certain to hit his opponent.
If $u^1$ and $u^2$ are the instants at which the pistols are fired, the kernel for the silent duel, interpreted as a zero-sum game, becomes
$$J(u^1,u^2) = \begin{cases} -u^1u^2 - u^1 + u^2, & u^1 < u^2,\\ 0, & u^1 = u^2,\\ u^1u^2 - u^1 + u^2, & u^1 > u^2. \end{cases}$$
It is easily seen that the kernel is skew-symmetric, i.e. $J(u^1,u^2) = -J(u^2,u^1)$. As we have already mentioned in Problem 9 of section 2.8, a zero-sum game with a skew-symmetric kernel is called a symmetric game. The arguments to be employed in the verification of the hypothesis of that problem can also be used here to show that, if the value $V_m$ (in mixed strategies) of this symmetric game exists, then it must be zero. Moreover, the optimal values of $\mu^1$ and $\mu^2$, to be denoted by $\mu^{1*}$ and $\mu^{2*}$, respectively, will be identical. Since the sufficiency condition of Corollary 3 is not satisfied, it is not known beforehand whether this game admits a mixed saddle-point equilibrium. However, assuming a solution (i.e. a pair of probability distributions) whose density vanishes for $u \le a$ and $u \ge b$ and is positive for $a < u < b$, we arrive, after some analysis, at the conclusion that a mixed saddle point indeed exists and is given by the density
$$\frac{\mathrm{d}\mu^*(u)}{\mathrm{d}u} = \begin{cases} 0, & 0 \le u < \tfrac13,\\[2pt] \dfrac{1}{4u^3}, & \tfrac13 \le u \le 1. \end{cases} \qquad\Box$$
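As a numerical sanity check (a sketch, not part of the text), one can verify that this density makes the value of the duel zero: against every pure firing time $v \in [\tfrac13, 1]$ the expected outcome vanishes, while firing earlier than $\tfrac13$ is strictly disadvantageous for the maximizer:

```python
# Verify the silent-duel solution by midpoint-rule integration.

def J(u, v):
    """Kernel of the silent duel: relative profit of P2, loss of P1."""
    if u < v:
        return -u * v - u + v
    if u > v:
        return u * v - u + v
    return 0.0

def f(u):
    """Candidate saddle-point density 1/(4u^3) on [1/3, 1]."""
    return 1.0 / (4.0 * u ** 3) if u >= 1.0 / 3.0 else 0.0

A, B, N = 1.0 / 3.0, 1.0, 40000
H = (B - A) / N
GRID = [A + (k + 0.5) * H for k in range(N)]

mass = sum(f(u) for u in GRID) * H            # should be ~1

def value_against(v):
    """E_{u ~ f}[J(u, v)]: the duel's outcome against a pure time v."""
    return sum(J(u, v) * f(u) for u in GRID) * H

vals = [value_against(v) for v in (1.0 / 3.0, 0.5, 0.8, 1.0)]
early = value_against(0.2)                    # firing before 1/3 hurts P2

print(round(mass, 6), [round(x, 6) for x in vals], round(early, 3))
```

The check confirms that the density integrates to one, that the game value is zero on $[\tfrac13, 1]$, and that early firing yields a strictly negative payoff for the maximizer.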
4.4.3  Bell-shaped games

The following game is bell-shaped. A submarine (P1) tries to escape a depth bomb to be dropped by an airplane (P2). During the time before the bomb is discharged, the submarine can locate its position anywhere in the interval $-a \le u^1 \le a$, where $a$ is a given positive constant. P2 is aware of the possible positions of P1. If P1 locates itself at $u^1$ and P2 aims a bomb at $u^2$, we assume that the damage done is proportional to $\exp\{-b(u^1-u^2)^2\}$, where $b$ is a positive constant. With appropriate renormalization, the game may be formulated as a continuous game on the unit square such that
$$J(u^1,u^2) = \exp\{-\lambda(u^1-u^2)^2\},$$
where $\lambda$ is a positive parameter depending on $a$ and $b$.
A kernel $J$ defined on $[0,1] \times [0,1]$ is said to be bell-shaped if $J(u^1,u^2) = \phi(u^1-u^2)$, where $\phi$ satisfies the following conditions:

(1) $\phi(u)$ is an analytic function defined for all real $u$;
(2) for every $n$ and every set of values $\{u^1_i\}_{i=1}^n$, $\{u^2_j\}_{j=1}^n$ such that $u^i_1 < u^i_2 < \cdots < u^i_n$, $i = 1,2$, the determinant of the matrix $A$ with the entries $a_{ij} = \phi(u^1_i - u^2_j)$ is positive;
(3) $\int_{-\infty}^{\infty} \phi(u)\,\mathrm{d}u < \infty$.
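Condition (2) is a strict total-positivity requirement on $\phi$, and it is typically the one that fails for kernels which merely look bell-shaped. The following probe (an illustrative sketch, not from the text; the sample points are arbitrary choices) evaluates a single $3 \times 3$ determinant for the Gaussian kernel $\exp(-u^2)$ and for $\psi(u) = 1/(1+u^2)$:

```python
from math import exp

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def kernel_matrix(phi, xs, ys):
    """Entries a_ij = phi(x_i - y_j) as in condition (2)."""
    return [[phi(x - y) for y in ys] for x in xs]

xs = [0.0, 1.0, 2.0]     # increasing u^1 sample points (hypothetical)
ys = [0.0, 10.0, 20.0]   # increasing u^2 sample points (hypothetical)

# The Gaussian determinant is astronomically small but positive;
# the Cauchy-type determinant comes out negative for these points.
gauss = det3(kernel_matrix(lambda u: exp(-u * u), xs, ys))
cauchy = det3(kernel_matrix(lambda u: 1.0 / (1.0 + u * u), xs, ys))

print(gauss > 0, cauchy < 0)
```

For these points the Gaussian kernel passes the determinant test while $\psi$ fails it, which is consistent with the statement below that $\psi$ does not meet all of the conditions.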
The function $\phi(u) = \exp(-\lambda u^2)$ with $\lambda > 0$ satisfies these properties. However, the function $\psi(u) = 1/(1+u^2)$, whose graph also has a bell shape, does not meet all of the above conditions. For bell-shaped games we have

PROPOSITION 1  The average value $V_m$ of a bell-shaped game exists, it is positive, and any mixed saddle-point strategy of a player assigns positive probability to only a finite number of pure strategies.
Proof  Existence of the average value $V_m$ follows directly from Corollary 3. Furthermore, it is bounded below by the minimum of $J(u^1,u^2)$ on the unit square; since $\phi$ is continuous, $J$ is continuous, and this minimum must be positive according to property (2). If the mixed saddle-point strategy of one of the players, say P2, assigns positive probability to an infinite number of pure strategies, then, for any optimal $\mu^1$, say $\mu^{1*}$, we must have
$$\int_0^1 \phi(u^1-u^2)\,\mathrm{d}\mu^{1*}(u^1) = V_m > 0\tag{i}$$
for an infinite number of $u^2$'s in $[0,1]$. Since $\phi$ is analytic, the integral is analytic in $u^2$, and therefore (i) should hold true for all $u^2$. However, from property (3) it follows that
$$\lim_{u^2 \to \pm\infty} \int_0^1 \phi(u^1-u^2)\,\mathrm{d}\mu^{1*}(u^1) = 0,$$
which contradicts the previous strict inequality. Hence, the mixed saddle-point strategy of P2 is of the finite type, i.e. it is a finite combination of pure strategies. The proof for the mixed saddle-point strategy of P1 follows similar lines.  □

For more details on bell-shaped games the reader is referred to Karlin (1959). It can be shown that, for the bomber-submarine problem which we initially formulated, the mixed saddle-point strategies are in the form of two-point distributions in case $\lambda = 3$. (It can also be proven that, with increasing $\lambda$, the number of pure strategies which are assigned positive probability by the
mixed saddle-point strategy will increase.) For $\lambda = 3$, the solution is
$$u^{1*} = \begin{cases} 0 & \text{w.p. } \tfrac12,\\ 1 & \text{w.p. } \tfrac12, \end{cases} \qquad u^{2*} = \begin{cases} \beta & \text{w.p. } \tfrac12,\\ 1-\beta & \text{w.p. } \tfrac12, \end{cases}$$
where $\beta$ is approximately 0.072.

4.5  STACKELBERG SOLUTION OF CONTINUOUS-KERNEL GAMES
This section is devoted to the Stackelberg solution of static nonzero-sum game problems when the number of alternatives available to each player is not a finite set and the cost functions are described by continuous kernels. For the sake of simplicity and clarity in exposition, we shall deal primarily with two-person static games. A variety of possible extensions of the Stackelberg solution concept to $N$-person static games with different levels of hierarchy will be briefly mentioned towards the end of the section, with the details left to the interested reader as exercises.

Let $u^i \in U^i$ denote the action variable of P$i$, where his action set $U^i$ is assumed to be a subset of an appropriate metric space. The cost function $J^i$ of P$i$ is defined as a continuous function on the product space $U^1 \times U^2$. Then we can give the following general definition of a Stackelberg equilibrium solution, which is the counterpart of Def. 3.27 for infinite games.

DEFINITION 5  In a two-person game with P1 as the leader, a strategy $u^{1*} \in U^1$ is called a Stackelberg equilibrium strategy for the leader if
$$J^{1*} \triangleq \sup_{u^2 \in R^2(u^{1*})} J^1(u^{1*}, u^2) \le \sup_{u^2 \in R^2(u^1)} J^1(u^1, u^2)\tag{3}$$
for all $u^1 \in U^1$. Here, $R^2(u^1)$ is the rational reaction set of the follower as introduced in Def. 3.  □

Remark 2  If $R^2(u^1)$ is a singleton for each $u^1 \in U^1$, in other words, if it is described completely by a reaction curve $l^2: U^1 \to U^2$, then inequality (3) in the above definition can be replaced by
$$J^{1*} \triangleq J^1(u^{1*}, l^2(u^{1*})) \le J^1(u^1, l^2(u^1))\tag{4}$$
for all $u^1 \in U^1$.  □
If a Stackelberg equilibrium strategy exists for the leader, then the LHS of inequality (3) is known as the leader's Stackelberg cost, and is denoted by $J^{1*}$. A more general definition for $J^{1*}$ is, in fact,
$$J^{1*} = \inf_{u^1 \in U^1} \sup_{u^2 \in R^2(u^1)} J^1(u^1, u^2),\tag{5}$$
which also covers the case when a Stackelberg equilibrium strategy does not exist. It follows from this definition that the Stackelberg cost of the leader is a well-defined quantity, and that there will always exist a sequence of strategies for the leader which will insure him a cost value arbitrarily close to $J^{1*}$. This observation brings us to the following definition of $\varepsilon$ Stackelberg strategies.

DEFINITION 6  Let $\varepsilon > 0$ be a given number. Then, a strategy $u^{1*}_\varepsilon \in U^1$ is called an $\varepsilon$ Stackelberg strategy for the leader (P1) if
$$\sup_{u^2 \in R^2(u^{1*}_\varepsilon)} J^1(u^{1*}_\varepsilon, u^2) \le J^{1*} + \varepsilon. \qquad\Box$$

The following two properties of $\varepsilon$ Stackelberg strategies now readily follow.

Property 1  In a two-person game, let $J^{1*}$ be a finite number. Then, given an arbitrary $\varepsilon > 0$, an $\varepsilon$ Stackelberg strategy necessarily exists.  □

Property 2  Let $\{u^{1*}_{\varepsilon_i}\}$ be a given sequence of $\varepsilon$ Stackelberg strategies in $U^1$, with $\varepsilon_i > \varepsilon_j$ for $i < j$ and $\lim_{i \to \infty} \varepsilon_i = 0$. Then, if there exists a convergent subsequence $\{u^{1*}_{\varepsilon_{i_k}}\}$ in $U^1$ with its limit denoted as $u^{1*}$, and further if $\sup_{u^2 \in R^2(u^1)} J^1(u^1, u^2)$ is a continuous function of $u^1$ in an open neighbourhood of $u^{1*} \in U^1$, then $u^{1*}$ is a Stackelberg strategy for P1.  □
The equilibrium strategy of the follower in a Stackelberg game would be any strategy that constitutes an optimal response to the one adopted (and announced) by the leader. Mathematically speaking, if $u^{1*}$ (respectively, $u^{1*}_\varepsilon$) is adopted by the leader, then any $u^2 \in R^2(u^{1*})$ (respectively, $u^2 \in R^2(u^{1*}_\varepsilon)$) will be referred to as an optimal strategy for the follower that is in equilibrium with the Stackelberg (respectively, $\varepsilon$ Stackelberg) strategy of the leader. This pair is referred to as a Stackelberg (respectively, $\varepsilon$ Stackelberg) solution of the two-person game with P1 as the leader (see Def. 3.28). The following theorem now provides a set of sufficient conditions for two-person nonzero-sum games to admit a Stackelberg equilibrium solution.

THEOREM 6  Let $U^1$ and $U^2$ be compact metric spaces, and $J^i$ be continuous on $U^1 \times U^2$, $i = 1,2$. Further, let there exist a finite family of continuous mappings $l^{(i)}: U^1 \to U^2$, indexed by a parameter $i \in I \triangleq \{1, \ldots, M\}$, so that $R^2(u^1) = \{u^2 \in U^2 : u^2 = l^{(i)}(u^1),\ i \in I\}$. Then, the two-person nonzero-sum static game admits a Stackelberg equilibrium solution.  □
Proof  It follows from the hypothesis of the theorem that $J^{1*}$, as defined by (5), is finite. Hence, by Property 1, a sequence of $\varepsilon$ Stackelberg strategies exists for the leader, and it admits a convergent subsequence whose limit lies in $U^1$, due to compactness of $U^1$. Now, since $R^2(\cdot)$ can be constructed from a finite family of continuous mappings (by hypothesis),
$$\sup_{u^2 \in R^2(u^1)} J^1(u^1, u^2) = \max_{i \in I} J^1(u^1, l^{(i)}(u^1)),$$
and the latter function is continuous on $U^1$. Then, the result follows from Property 2.  □

Remark 3  The assumption of Thm 6 concerning the structure of $R^2(\cdot)$ imposes some severe restrictions on $J^2$; but such an assumption is inevitable, as the following example demonstrates. Take $U^1 = U^2 = [0,1]$, $J^1 = -u^1u^2$ and $J^2 = (u^1 - \tfrac12)u^2$. Here, $R^2(\cdot)$ is determined by a mapping $l(\cdot)$ which is continuous on the half-open intervals $[0, \tfrac12)$, $(\tfrac12, 1]$, but is multivalued at $u^1 = \tfrac12$. The Stackelberg cost of the leader is clearly $J^{1*} = -\tfrac12$, but a Stackelberg strategy does not exist because of the "infinitely multivalued" nature of $l$.  □
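The discontinuity behind Remark 3 can be traced numerically. The sketch below (an illustration, not from the text) evaluates the leader's guaranteed cost $\sup_{u^2 \in R^2(u^1)} J^1(u^1, u^2)$ on a grid: it approaches $-\tfrac12$ as $u^1 \uparrow \tfrac12$ but jumps to $0$ at $u^1 = \tfrac12$, so the infimum $-\tfrac12$ is never attained, while any $u^1 = \tfrac12 - \varepsilon$ is an $\varepsilon$ Stackelberg strategy:

```python
def leader_worst_cost(u1):
    """sup over the follower's rational reactions of J1 = -u1*u2.

    The follower minimizes J2 = (u1 - 1/2)*u2 over [0, 1]:
    u2 = 1 if u1 < 1/2, u2 = 0 if u1 > 1/2, any u2 in [0, 1] at u1 = 1/2.
    """
    if u1 < 0.5:
        reactions = [1.0]
    elif u1 > 0.5:
        reactions = [0.0]
    else:
        reactions = [k / 100.0 for k in range(101)]   # whole interval
    return max(-u1 * u2 for u2 in reactions)

grid = [k / 1000.0 for k in range(1001)]
costs = [leader_worst_cost(u1) for u1 in grid]

inf_cost = min(costs)              # -0.499 on this grid, tending to -1/2
at_half = leader_worst_cost(0.5)   # sup over [0,1] of -u2/2 is 0

print(inf_cost, at_half)
```

The grid minimum is achieved just below $u^1 = \tfrac12$, exactly the "infinitely multivalued" point at which the guaranteed cost jumps back to zero.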
When $R^2(u^1)$ is a singleton for every $u^1 \in U^1$, the hypothesis of Thm 6 can definitely be made less restrictive. One such set of conditions is provided in the following corollary to Thm 6, under which there exists a unique $l$ which is continuous (by an argument similar to the one used in the proof of Thm 3).

COROLLARY 4  Every two-person nonzero-sum continuous-kernel game on the square, for which $J^2(u^1, \cdot)$ is strictly convex for all $u^1 \in U^1$ and P1 acts as the leader, admits a Stackelberg equilibrium solution.  □
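As a concrete illustration of this singleton-reaction case (the cost functions below are hypothetical, chosen only for this sketch), take $J^1 = (u^1)^2 + (u^2-1)^2$ and $J^2 = (u^2 - u^1/2)^2$ on the square: $J^2(u^1,\cdot)$ is strictly convex, the reaction curve is $l^2(u^1) = u^1/2$, and substituting it into $J^1$ reduces the leader's problem to a one-dimensional minimization:

```python
def l2(u1):
    """Follower's unique reaction: minimizes J2 = (u2 - u1/2)**2."""
    return u1 / 2.0

def J1(u1, u2):
    return u1 ** 2 + (u2 - 1.0) ** 2

# Leader minimizes J1(u1, l2(u1)) = u1**2 + (u1/2 - 1)**2 over [0, 1];
# setting the derivative (5/2)*u1 - 1 to zero gives u1* = 0.4.
grid = [k / 10000.0 for k in range(10001)]
u1_star = min(grid, key=lambda u1: J1(u1, l2(u1)))
u2_star = l2(u1_star)

print(u1_star, u2_star, J1(u1_star, u2_star))
```

The grid search recovers the analytic solution $u^{1*} = 0.4$, $u^{2*} = 0.2$, with leader cost $0.8$.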
It should be noted that the Stackelberg equilibrium solution for a two-person game exists under a set of sufficiency conditions which are much weaker than those required for existence of Nash equilibria (cf. Thm 3). It should further be noted, however, that the statement of Thm 6 does not rule out the existence of a mixed-strategy Stackelberg solution which might provide the leader with a lower average cost. We have already observed the occurrence of such a phenomenon within the context of matrix games in section 3.6, and we now investigate to what extent such a result could remain valid in continuous-kernel games.

If mixed strategies are also allowed, then permissible strategies for P$i$ will be probability measures $\mu^i$ on the space $U^i$. Let us denote the collection of all such probability measures for P$i$ by $M^i$. Then, the quantity replacing $J^i$ will be the average cost function
$$\bar J^i(\mu^1, \mu^2) = \int_{U^1} \int_{U^2} J^i(u^1, u^2)\,\mathrm{d}\mu^1(u^1)\,\mathrm{d}\mu^2(u^2),\tag{6}$$
and the reaction set $R^2$ will be replaced by
$$\bar R^2(\mu^1) \triangleq \{\bar\mu^2 \in M^2 : \bar J^2(\mu^1, \bar\mu^2) \le \bar J^2(\mu^1, \mu^2),\ \forall \mu^2 \in M^2\}.\tag{7}$$
Hence, we have DEFINITION 7 In a l is called a
JlI. E M
J" A
twoperson game with PI as the leader, a mixed strategy mixed Stackelberg equilibrium strategy for the leader if sup
1" e il' (1',.)
P (Jl l., Jl2) ~
Jl (Jll, Jl2)
sup 1" e R'(I")
for all Jll E M l , where JI·is known as the average Stackelberg cost of the leader in mixed strategies. 0 PROPOSITION
2 (8) i
Proof Since M also includes all onepoint measures, we have (by an abuse of notation) Vi c Mi. Then, for each u l E VI, considered as an element of M l , inf J2(u l, Jl2) == inf Jv,)2(U l, u2)dJl2(u2) 1"eM'
leM'
= inf )2(U l ,
U
2),
uleU2
where the last equality follows since any infimizing sequence in M 2 can be replaced by a subsequence of onepoint measures. This implies that, for one point measures in M l , iP (Jll) coincides with the set of all probability measures defined on R 2 (u l ) . Now, since M l => VI, P·=inf sup P(Jll,Jl2) M'
i
2 ( p' )
s
inf sup P(Jl l,Jl2), v'
Jf2(p')
and because of the cited relation between iP(Jll) and R 2(Ul ), the last expression can be written as inf sup v'
R 2 (u' )
)1
(u l , u 2 ) =
)1".
o
We now show, by a counterexample, that, even under the hypothesis of Thm 6, it is possible to have strict inequality in (8). (Counter) Example 6
Consider a twoperson continuouskernel game with
VI = V 2 = [0, IJ, and with cost functions
= I:(U l )2 +u l P )2 = (u2 _ (ul )2)2,
)1
where
I:
> 0 is a sufficiently small parameter.
_u
2,
180
DYNAMIC NONCOOPERATIVE GAME THEORY
The unique Stackelberg solution of this game, in pure strategies, is u l • = 0, u 2 • = (UI)2, and the Stackelberg cost for the leader is J I· = O. We now show that the leader can actually do better by employing a mixed strategy. First note that the follower's unique reaction to a mixed strategy of the leader is u 2 = E[ (ul )2] which, when substituted into JI, yields the expression
JI = eE[(u l)2] +E[u l] J{E[(U I)2]} E[(ulf]. Now, if the leader uses the uniform probability distribution on [0,1], his average cost becomes
JI=e~1+~J~ which clearly indicates that, for s sufficiently small,
JI' < 0
=
o
P'.
The preceding example has displayed the fact that even Stackelberg games with strictly convex cost functionals may fail to admit only purestrategy solutions, and the mixed Stackelberg solution may in fact be preferable. However, if we further restrict the cost structure to be quadratic, it can be shown that then only purestrategy Stackelberg equilibria will exist. PROPOSITION 3 Consider the twoperson nonzerosum game with Vi V 2 = Rm" and
Ji =
tui ' Rliu i + u i' Rliu i +iUilR~juj + ui'rI + uj'r~;i,j
= R'"
= 1,2, i =F j,
where R!i > 0, RL R!j, R~j are appropriate dimensional matrices, and rl, r~ are appropriate dimensional vectors. This "quadratic" game can only admit a purestrategy Stackelberg solution, with PI as the leader. Proof Assume, to the contrary, that the game admits a mixedstrategy Stackelberg solution, and denote the leader's optimal mixed strategy by Ill'. Furthermore, denote the expectation operation under Ill'by E [.]. If the leader announces this mixed strategy, then the follower's reaction is unique and is given by
=  REI R~I E[u 1 ]  REI d. is substituted in P' = E[JI], we obtain JI' = tE[ul'Rt I u l ] + E[u 1 ]' KE[u 1 ] + E[ul]'k + c, u2
If this where
4
STATIC NONCOOPERATIVE INFINITE GAMES
181
Now, applying CauchySchwarz inequality (see Ash, 1972)on the first term of
Ji', we further obtain the bound Ji'
~
tE[u 1 ]' R~
1
E[u 1 ]
+ E[u 1 ]' KE[u 1 ] + E[u 1]'k + c
(i)
which depends only on the mean value of u 1 • Hence P' = infau 1 ' R~1U1 u'
+ u 1 ' Ku 1 + u 1 ' k +c}
~
Jr.
This implies that enlargement of the strategy space of the leader, so as also to include mixed strategies, does not yield him any better performance. In fact, since $E[u^{1\prime} u^1] > E[u^1]' E[u^1]$ whenever the probability distribution is not one-point, it follows that the inequality in (i) is actually strict for the case of a proper mixed strategy. This then implies that, outside the class of pure strategies, there can be no Stackelberg solution.  □

Graphical display of Stackelberg solutions and the notion of relative leadership

If $U^1$ and $U^2$ are one-dimensional spaces, and if, in addition, $R^2(u^1)$ is a singleton for each $u^1 \in U^1$, then the Stackelberg equilibrium solution can easily be visualized on a graph. In Fig. 9, iso-cost curves for $J^1$ and $J^2$ have been drawn, together with the reaction curves $l^1$ and $l^2$. With P1 as the leader, the Stackelberg solution will be situated on $l^2$, at the point where it is tangent to the appropriate iso-cost curve of $J^1$. This point is designated as $S^1$ in Fig. 9. The coordinates of $S^1$ now correspond to the Stackelberg solution of this game with P1 as the leader. The point of intersection of $l^1$ and $l^2$ (which is denoted by $N$ in the figure) denotes the unique Nash equilibrium solution of this game. It should be noted that the Nash costs of both players are higher than their corresponding equilibrium costs in the Stackelberg game with P1 as the leader. This is, though, only a specific example, since it would be possible to come up with situations in which the follower is worse off in the Stackelberg game than in the associated "Nash game". However, when the reaction set $R^2(u^1)$ is a
Fig. 9. Graphical construction of the Stackelberg solution in a two-person game; a "stalemate" solution.
singleton for every $u^1 \in U^1$, the leader cannot do worse in the "Stackelberg game" than his best performance in the associated "Nash game", since he can, at the worst, play the strategy corresponding to the most favourable (from the leader's point of view) Nash equilibrium solution pair, as discussed earlier in section 3.6 (see, in particular, Prop. 3.14). To summarize,

PROPOSITION 4  For a two-person nonzero-sum game, let $\bar J^1$ denote the infimum of all the Nash equilibrium costs of P1. Then, if $R^2(u^1)$ is a singleton for every $u^1 \in U^1$, $\bar J^1 \ge J^{1*}$.  □

Hence, in two-person nonzero-sum games with unique follower responses, the leader never prefers to play the "Nash game" instead of the "Stackelberg game", whereas the follower could prefer to play the "Nash game", if such an option is open to him. In some games, even the option of who should be the leader and who should follow might be open to the players, and in such situations there is the question of whether it is profitable for either player to act as the leader rather than be the follower. For the two-person game depicted in Fig. 9, for example, both players would prefer to act as the follower (in the figure, point $S^2$ characterizes the Stackelberg solution with P2 as the leader). There are other cases, however, when both players prefer the leadership of only one of the players (see Fig. 10a, where leadership of P2 is more advantageous to both players) or when each player wants to be the leader himself (Fig. 10b). When the players mutually benefit from the leadership of one of them, then this constitutes a stable situation, since there is no reason for either player to deviate from the corresponding Stackelberg solution which was computed under mutual agreement; such a Stackelberg equilibrium solution is called a "concurrent" solution. When each player prefers to be the leader himself, then the Stackelberg solution is called "nonconcurrent". In such a situation, each player will try to dictate to the other player his own Stackelberg strategy and thus force him to play the "Stackelberg game" under his leadership. In this case, the one who can process his data faster will certainly be the leader and announce his strategy first. However, if the slower player does not actually know that the other player can process his data faster than he does, and/or if there is a delay in the information exchange between the players, then he might tend to announce a Stackelberg strategy under his own leadership, quite unaware of the announcement of the other player, which certainly results in a nonequilibrium situation. Finally, for the case when neither player wants to be the leader (as in Fig. 9), the "Stackelberg game" is said to be in a "stalemate", since each player will wait for the other one to announce his strategy first.
Fig. 10. Different types of Stackelberg equilibrium solutions. (a) Concurrent solution. (b) Nonconcurrent solution.
4.6  QUADRATIC GAMES WITH APPLICATIONS IN MICROECONOMICS

In this section, we obtain explicit expressions for the Nash and Stackelberg solutions of static nonzero-sum games in which the cost functions of the players are quadratic in the decision variables, the so-called quadratic games. The action (strategy) spaces will be taken as appropriate dimensional Euclidean spaces, but the results are also equally valid (under the right interpretation) when the strategy spaces are taken as infinite-dimensional Hilbert spaces. In that case, the Euclidean inner products will have to be replaced by the inner product of the underlying Hilbert space, and the positive definiteness requirements on some of the matrices will have to be replaced by strong positive definiteness of the corresponding self-adjoint operators. This section will also include an illustration of the quadratic model and its Nash and Stackelberg solutions within the context of noncooperative economic equilibrium behaviour of firms in oligopolistic markets.

A general quadratic cost function for P$i$, which is strictly convex in his action variable, can be written as
$$J^i = \tfrac12 \sum_{j=1}^{N} \sum_{k=1}^{N} u^{j\prime} R^i_{jk} u^k + \sum_{j=1}^{N} r^{i\prime}_j u^j + c^i,\tag{9}$$
where $u^j \in U^j = R^{m_j}$ is the $m_j$-dimensional decision variable of P$j$, $R^i_{jk}$ is an $(m_j \times m_k)$-dimensional matrix with $R^i_{ii} > 0$, $r^i_j$ is an $m_j$-dimensional vector, and $c^i$ is a constant. Without loss of generality, we may assume that, for $j \ne k$, $R^i_{jk} = R^{i\prime}_{kj}$, since if this were not the case, the corresponding two quadratic terms could be written as
$$u^{j\prime} R^i_{jk} u^k + u^{k\prime} R^i_{kj} u^j = u^{j\prime}\frac{(R^i_{jk} + R^{i\prime}_{kj})}{2} u^k + u^{k\prime}\frac{(R^i_{kj} + R^{i\prime}_{jk})}{2} u^j,\tag{10}$$
and, redefining $R^i_{jk}$ as $(R^i_{jk} + R^{i\prime}_{kj})/2$, a symmetric matrix could be obtained. By
an analogous argument, we may take $R^i_{jj}$ to be symmetric, without any loss of generality. Quadratic cost functions are of particular interest in game theory, firstly because they constitute second-order approximations to other types of nonlinear cost functions, and secondly because they are analytically tractable, admitting, in general, closed-form equilibrium solutions which provide insight into the properties and features of the equilibrium solution concept under consideration.

To determine the Nash equilibrium solution in strictly convex quadratic games, we differentiate $J^i$ with respect to $u^i$ $(i \in N)$, set the resulting expressions equal to zero, and solve the set of equations thus obtained. This set of equations, which also provides a sufficient condition because of strict convexity, is
$$R^i_{ii} u^i + \sum_{j \ne i} R^i_{ij} u^j + r^i_i = 0 \quad (i \in N),\tag{11}$$
which can be written in compact form as
$$R u = -r,\tag{12a}$$
where
$$R \triangleq \begin{pmatrix} R^1_{11} & R^1_{12} & \cdots & R^1_{1N}\\ R^2_{21} & R^2_{22} & \cdots & R^2_{2N}\\ \vdots & \vdots & & \vdots\\ R^N_{N1} & R^N_{N2} & \cdots & R^N_{NN} \end{pmatrix},\tag{12b}$$
$$u' \triangleq (u^{1\prime}, u^{2\prime}, \ldots, u^{N\prime}),\tag{12c}$$
$$r' \triangleq (r^{1\prime}_1, r^{2\prime}_2, \ldots, r^{N\prime}_N).\tag{12d}$$
u*
= R1r.
(13)
o
Remark 4 Since each player's cost function is strictly convex and continuous
in his action variable, quadratic nonzerosum games of the type discussed
4
185
STATIC NONCOOPERATIVE INFINITE GAMES
above cannot admit a Nash equilibrium solution in mixed strategies. Hence, in strictly convex quadratic games, the analysis canbe confined to the class of pure strategies. 0 We now investigate the stability properties of the unique Nash solution of quadratic games, where the notion of stability was introduced in section 3 for the class of twoperson games. Assuming N = 2, in the present context, the corresponding alternating sequence of moves leads to the iterative relation
ui+ 1 = (Rt
.r' Rt2 (R~2)1
R~l
ui  (Rt
.r' [d  Rt2 (R~2)1
k = 0, I, ...
dJ, (14)
for PI, with uA denoting any starting choice for him. IfP2 is the starting player, then an analogous iterative equation can be written for him. Then, the question of stability of the Nash solution (13), with N = 2, reduces to stability of the equilibrium solution of (14) (which is obtained by setting ui+ 1 = ui = u" in (14) and solving for u 1 *). Since (14) is a linear difference equation, a necessary and sufficient condition for the iterative procedure to converge to the actual Nash strategy of PI is that the eigenvalues of (Rtd 1 Rt2 (R~2)1 R~l should be in the unit circle, i.e. IJc[(Rtd 1 Rt2(R~2)lR~lJI < 1. (15) If, instead of (14), we had written a difference equation for P2, the corresponding condition would then be IJc[(R~2)lR~dRtl)lRt2JI
< 1,
which is though equivalent to the preceding condition since, if A and B are two matrices of compatible dimensions, nonzero eigenvalues of AB and BA remain the same (cf. Marcus and Mine, 1964, p. 24). This way of obtaining the solution of the set of equations (11) for N = 2 corresponds to the socalled GaussSeidel numerical procedure. Its extension to the N > 2 case, however, is possible in a number of ways. For a threeplayer quadratic game, for example, one such possible iteration scheme is for PI and P2 to announce and update their strategies simultaneously, but for P3 only after he observes the other two. Other iterative schemes can be obtained by interchanging the roles of the players. Since the conditions involved are in general different for different iteration schemes, it should be apparent at this point that stability of the Nash solution in Nperson games will depend to great extent on the specific iteration procedure selected (see Fox (1964) for illustrative examples). From a game theoretic point of view, each of these iteration schemes corresponds to a game with a sufficiently large number of stages and with a
particular mode of play among the players. Moreover, the objective of each player is to minimize a kind of an average long-horizon cost, with costs at each stage contributing to this average cost. Viewing this problem overall as a multi-act nonzero-sum game, we observe that the behaviour of each player at each stage of the game is rather "myopic", since at each stage the players minimize their cost functions only under past information, and quite in ignorance of the possibility of any future moves. If the possibility of future moves is also taken into account, then the rational behaviour of each player at a particular stage could be quite different. For an illustration of this possibility the reader is referred to Example 7, which is given later in this section. Such myopic decision making could make sense, however, if the players have absolutely no idea as to how many stages the game comprises, in which case there is the possibility that at any stage a particular player could be the last one to act in the game. In such a situation, risk-aversing players would definitely adopt "myopic" behaviour, minimizing their current cost functions under only the past information, whenever given the opportunity to act.

Two-person zero-sum games

Since zero-sum games are special types of two-person nonzero-sum games with $J^1 = -J^2$ (P1 minimizing and P2 maximizing), in which case the Nash equilibrium solution concept coincides with the concept of saddle-point equilibrium, a special version of Prop. 5 will be valid for quadratic zero-sum games. To this end, we first note that the relation $J^1 = -J^2$ imposes in (9) the restrictions
$$R^1_{jk} = -R^2_{jk}, \quad r^1_j = -r^2_j, \quad c^1 = -c^2, \quad j,k = 1,2,$$
under which the matrix R defined by (12b) can be written as
R = [ R¹₁₁   R¹₁₂ ; −R¹₁₂'   −R¹₂₂ ],
which has to be nonsingular for the existence of a saddle point. Since R can also be written as the sum of the two matrices
[ R¹₁₁  0 ; 0  −R¹₂₂ ]  and  [ 0  R¹₁₂ ; −R¹₁₂'  0 ],
the first one being positive definite (by strict convexity-concavity) and the second one skew-symmetric, and since the eigenvalues of the latter are always imaginary, it readily follows that R is a nonsingular matrix. Hence we arrive at the conclusion that quadratic strictly convex-concave zero-sum games admit a unique saddle-point equilibrium in pure strategies.
COROLLARY 5 The strictly convex-concave quadratic zero-sum game
J = ½u¹'R¹₁₁u¹ + u¹'R¹₁₂u² − ½u²'R¹₂₂u² + u¹'r¹₁ + u²'r¹₂ + c¹;  R¹₁₁ > 0, R¹₂₂ > 0,
admits a unique saddle-point equilibrium in pure strategies, which is given by
u¹* = −[R¹₁₁ + R¹₁₂(R¹₂₂)⁻¹R¹₁₂']⁻¹ [r¹₁ + R¹₁₂(R¹₂₂)⁻¹r¹₂],
u²* = [R¹₂₂ + R¹₁₂'(R¹₁₁)⁻¹R¹₁₂]⁻¹ [r¹₂ − R¹₁₂'(R¹₁₁)⁻¹r¹₁].  □
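Corollary 5 can be sanity-checked numerically. The sketch below is not from the text: it uses scalar controls and arbitrarily chosen coefficients, verifies the first-order (stationarity) conditions implicitly through the closed-form formulas, and confirms the saddle-point inequalities along a few perturbations.

```python
# Scalar instance of the quadratic zero-sum game of Corollary 5:
# J(u1, u2) = 0.5*R11*u1**2 + R12*u1*u2 - 0.5*R22*u2**2 + r1*u1 + r2*u2 + c
R11, R12, R22 = 2.0, 1.0, 3.0   # R11 > 0, R22 > 0 (strict convexity-concavity)
r1, r2, c = 1.0, -2.0, 0.5      # arbitrary linear/constant terms

def J(u1, u2):
    return 0.5*R11*u1**2 + R12*u1*u2 - 0.5*R22*u2**2 + r1*u1 + r2*u2 + c

# Closed-form saddle point from the corollary (scalar specialization)
u1s = -(r1 + (R12/R22)*r2) / (R11 + R12**2/R22)
u2s =  (r2 - (R12/R11)*r1) / (R22 + R12**2/R11)

# Saddle property: J(u1s, u2) <= J(u1s, u2s) <= J(u1, u2s) for all u1, u2
for d in (-1.0, -0.1, 0.1, 1.0):
    assert J(u1s + d, u2s) >= J(u1s, u2s) - 1e-12   # minimizer P1 cannot improve
    assert J(u1s, u2s + d) <= J(u1s, u2s) + 1e-12   # maximizer P2 cannot improve
print(round(u1s, 6), round(u2s, 6))
```

With these (hypothetical) coefficients the saddle point is (u¹*, u²*) = (−1/7, −5/7), at which both stationarity conditions of J hold exactly.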
Remark 5 Corollary 5 can also be considered as an extension of Corollary 2 to games with noncompact strategy spaces, but with a quadratic kernel. □

Team problems
Yet another special class of nonzero-sum games comprises the team problems, in which the players (equivalently, the members of the team) share a common objective. Within the general framework, this corresponds to the case J¹ ≡ J² ≡ ... ≡ J^N ≜ J, and the objective is to minimize this common cost function over all uⁱ ∈ Uⁱ, i = 1, ..., N. The resulting solution N-tuple (u¹*, u²*, ..., u^N*) is known as the team-optimal solution. The Nash solution, however, corresponds to a weaker solution concept in team problems, the so-called person-by-person (pbp) optimality. In a two-member team problem, for example, a pbp optimal solution (u¹*, u²*) dictates satisfaction of the pair of inequalities
J(u¹*, u²*) ≤ J(u¹, u²*),  ∀u¹ ∈ U¹,
J(u¹*, u²*) ≤ J(u¹*, u²),  ∀u² ∈ U²,
whereas a team-optimal solution (u¹*, u²*) requires satisfaction of the single inequality
J(u¹*, u²*) ≤ J(u¹, u²),  ∀u¹ ∈ U¹, u² ∈ U².
A team-optimal solution always implies pbp optimality, but not vice versa. Of course, if J is quadratic and strictly convex on the product space U¹ × ... × U^N, then a unique pbp optimal solution exists, and it is also team-optimal.† However, for a cost function that is strictly convex only on the individual spaces Uⁱ, but not on the product space, this latter property may fail to hold. Consider, for example, the quadratic cost function
J = (u¹)² + (u²)² + 10u¹u² + 2u¹ + 3u²,
which is strictly convex in u¹ and in u² separately. The matrix corresponding to R defined by (12b) is
[ 2  10 ; 10  2 ],
which is clearly nonsingular. Hence a unique pbp optimal solution exists. However, a team-optimal solution does not exist, since this matrix (which is also the Hessian of J) has one positive and one negative eigenvalue. By cooperating along the direction of the eigenvector corresponding to the negative eigenvalue, the members of the team can make the value of J as small as they please. In particular, taking u² = −u¹ and letting u¹ → +∞ drives J to −∞.

† This result may fail to hold true for team problems with strictly convex but nondifferentiable kernels (see Problem 9, Section 7).

The Stackelberg solution
We now elaborate on the Stackelberg solutions of quadratic games of type (9), but with N = 2 and P1 acting as the leader. We first note that, since the quadratic cost function Jⁱ is strictly convex in uⁱ, by Prop. 3 we can confine our investigation of an equilibrium solution to the class of pure strategies. Then, to every announced strategy u¹ of P1, the follower's unique response will be as given by (11) with N = 2, i = 2:
u² = −(R²₂₂)⁻¹[R²₂₁u¹ + r²₂].   (16)
Now, to determine the Stackelberg strategy of the leader, we have to minimize J¹ over U¹, subject to the constraint imposed by the reaction of the follower. Since the reaction curve gives u² uniquely in terms of u¹, this constraint can best be handled by substituting (16) into J¹ and minimizing the resulting functional (to be denoted by J̃¹) over U¹. To this end, we first determine J̃¹:
J̃¹(u¹) = ½u¹'R¹₁₁u¹ + ½[R²₂₁u¹ + r²₂]'(R²₂₂)⁻¹R¹₂₂(R²₂₂)⁻¹[R²₂₁u¹ + r²₂] − u¹'R¹₁₂(R²₂₂)⁻¹[R²₂₁u¹ + r²₂] + u¹'r¹₁ − r¹₂'(R²₂₂)⁻¹[R²₂₁u¹ + r²₂] + c¹.
For the minimum of J̃¹ over U¹ to be unique, we have to impose a strict convexity restriction on J̃¹. Because of the quadratic structure of J̃¹, this condition amounts to having the coefficient matrix of the quadratic term in u¹ positive definite, that is,
R¹₁₁ + R²₂₁'(R²₂₂)⁻¹R¹₂₂(R²₂₂)⁻¹R²₂₁ − R¹₁₂(R²₂₂)⁻¹R²₂₁ − R²₂₁'(R²₂₂)⁻¹R¹₁₂' > 0.   (17)
Under this condition, the unique minimizing solution can be obtained by setting the gradient of J̃¹ equal to zero, which yields
u¹* = −[R¹₁₁ + R²₂₁'(R²₂₂)⁻¹R¹₂₂(R²₂₂)⁻¹R²₂₁ − R¹₁₂(R²₂₂)⁻¹R²₂₁ − R²₂₁'(R²₂₂)⁻¹R¹₁₂']⁻¹ [r¹₁ − R¹₁₂(R²₂₂)⁻¹r²₂ + R²₂₁'(R²₂₂)⁻¹R¹₂₂(R²₂₂)⁻¹r²₂ − R²₂₁'(R²₂₂)⁻¹r¹₂].   (18)
PROPOSITION 6 Under condition (17), the two-person version of the quadratic game (9) admits a unique Stackelberg strategy for the leader, which is given by (18). The follower's unique response is then given by (16). □
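To illustrate the leader-follower logic of Prop. 6 concretely, the following sketch works out a scalar Stackelberg game. The cost functions here are invented for illustration and are not the ones in the text; since the follower's reaction is unique, the leader can never do worse than at the Nash solution, because he could always announce his Nash action.

```python
# Two-person quadratic Stackelberg game with scalar actions, P1 the leader
# (illustrative cost functions, chosen here and not taken from the text):
#   J1(u1, u2) = (u1 + u2 - 1)**2 + u1**2
#   J2(u1, u2) = (u2 - u1)**2 + u2**2
def J1(u1, u2): return (u1 + u2 - 1)**2 + u1**2
def J2(u1, u2): return (u2 - u1)**2 + u2**2

# Follower's reaction: dJ2/du2 = 2*(u2 - u1) + 2*u2 = 0  =>  u2 = u1/2
def reaction(u1): return u1 / 2.0

# Leader minimizes J1(u1, reaction(u1)) = (1.5*u1 - 1)**2 + u1**2:
# derivative 3*(1.5*u1 - 1) + 2*u1 = 6.5*u1 - 3 = 0
u1_S = 3.0 / 6.5
u2_S = reaction(u1_S)

# Nash (simultaneous play) for comparison: solve both stationarity conditions,
# 2*(u1 + u2 - 1) + 2*u1 = 0 together with u2 = u1/2, giving u1 = 2/5, u2 = 1/5.
u1_N, u2_N = 2.0/5.0, 1.0/5.0

# Leadership can only help here: announcing u1_N would reproduce the Nash outcome.
assert J1(u1_S, u2_S) <= J1(u1_N, u2_N) + 1e-12
print(round(J1(u1_S, u2_S), 4), round(J1(u1_N, u2_N), 4))
```

For these costs the leader's Stackelberg cost is 4/13 ≈ 0.3077, strictly below his Nash cost 0.32.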
Remark 6 The reader can easily verify that a sufficient condition for condition (17) to be satisfied is that J¹ be strictly convex on the product space U¹ × U². □

Applications in microeconomics
We now consider specific applications of quadratic nonzero-sum game theory in microeconomics, in establishing economic equilibria in certain types of markets. The markets we have in mind are defined with respect to a specific good or a collection of goods, and an individual participant (player) in these markets is either a buyer or a seller. The traded goods are produced by the seller(s) and consumed by the buyer(s). An important characteristic of such markets is the number of participants involved. Markets in which there are a large number of both buyers and sellers, and in which each participant is involved with only a negligible fraction of the transactions, are called (bilaterally) competitive. A key characteristic of such competitive markets is that no single participant can, by his own actions, have a noticeable effect on the prices at which the transactions take place. There is a market price; no seller needs to sell for less, and no buyer needs to pay more. Our interest in the sequel will be in the so-called oligopoly markets, which comprise a few sellers and many competitive buyers. In the special case of two sellers, this market is known as a duopoly. In an oligopoly, the buyers cannot influence the price or the quantities offered, and it is assumed that the collective behaviour of the buyers is fixed and known. The sellers, however, have an influence via the price they ask and the output (production) they realize.

Example 7 This example refers to what is known as a Cournot duopoly situation. There are two firms with identical products, which operate in a market in which the market demand function is known. The demand function relates the unit price of the product to the total quantity offered by the firms.
It is then assumed that all the quantity offered is sold at the prevailing price dictated by the demand function. Let qⁱ denote the production level of firm i (Pi), and let p denote the price of the commodity. Then, assuming a linear
structure for the market demand curve, we have the relation
p = α − β(q¹ + q²),
where α and β are positive constants known to both firms. Furthermore, if we assign quadratic production costs to the firms, the profit functions are given by
P¹ = q¹p − k₁(q¹)²,  P² = q²p − k₂(q²)²,
for P1 and P2, respectively, where k₁, k₂ are nonnegative constants. P1 wants to maximize P¹ and P2 wants to maximize P². Under the stipulation that the firms act rationally and that there is no explicit collusion, the Nash equilibrium concept seems to fit rather well within this framework. Formulated as a nonzero-sum game, the Nash equilibrium solution of this decision problem is
q¹* = α(β + 2k₂)/[3β² + 4k₁k₂ + 4β(k₁ + k₂)],  q²* = α(β + 2k₁)/[3β² + 4k₁k₂ + 4β(k₁ + k₂)],
which has been obtained by solving equation (11). By the very nature of the Nash equilibrium concept, it is assumed here that the decisions of the players are made independently. If this is not true, i.e. if either q¹ depends on q² and/or q² depends on q¹, then one must solve the set of equations
dP¹/dq¹ = p + q¹{∂p/∂q¹ + (∂p/∂q²)(dq²/dq¹)} − 2k₁q¹ = 0,
dP²/dq² = p + q²{∂p/∂q² + (∂p/∂q¹)(dq¹/dq²)} − 2k₂q² = 0.   (19)
The terms dq²/dq¹ and dq¹/dq² are called the conjectural variation terms, and they reflect the effect of variations in the output of one firm on the output of the other firm. In the Nash solution concept these terms are identically zero. However, in order to obtain the Stackelberg solution with either firm as the leader, an appropriate version of (19) has to be used. If, for instance, P1 is the leader, then dq¹/dq² = 0, but the term dq²/dq¹ will not be zero. This latter term can in fact be determined by making use of the reaction curve of P2, which is
q² = (α − βq¹)/[2(β + k₂)].
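The closed-form Cournot-Nash equilibrium of Example 7, together with P2's reaction curve, can be verified numerically; the parameter values below are chosen arbitrarily for illustration.

```python
# Numerical check of the Cournot-Nash equilibrium of Example 7
# (illustrative positive parameter values):
alpha, beta, k1, k2 = 10.0, 1.0, 0.5, 2.0

D = 3*beta**2 + 4*k1*k2 + 4*beta*(k1 + k2)   # common denominator
q1 = alpha*(beta + 2*k2) / D
q2 = alpha*(beta + 2*k1) / D

# First-order conditions dPi/dqi = 0 with p = alpha - beta*(q1 + q2):
#   alpha - 2*beta*q1 - beta*q2 - 2*k1*q1 = 0   (firm 1)
#   alpha - beta*q1 - 2*beta*q2 - 2*k2*q2 = 0   (firm 2)
assert abs(alpha - 2*beta*q1 - beta*q2 - 2*k1*q1) < 1e-9
assert abs(alpha - beta*q1 - 2*beta*q2 - 2*k2*q2) < 1e-9

# P2's reaction curve q2 = (alpha - beta*q1)/(2*(beta + k2)) passes through it:
assert abs(q2 - (alpha - beta*q1)/(2*(beta + k2))) < 1e-9
print(round(q1, 4), round(q2, 4))
```

Both stationarity conditions and the reaction-curve relation hold exactly, confirming the printed formulas for this parameter choice.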
The reader can now easily determine the Stackelberg strategy of P1, either by maximizing P¹ along this reaction curve or by directly making use of (18). □

Example 8 This is a somewhat more sophisticated version, and also an extension, of Example 7. There will be N firms (players) in the oligopoly problem to be defined, and it will be possible for the price of the product to
vary from one firm to another. Let us first introduce the following notation:
Pⁱ = the profit of the ith firm (Pi),
pⁱ = the price per unit product charged by Pi,
p̄ = the average price, p̄ = (Σᵢ pⁱ)/N,
β = the price sensitivity of overall demand,
V = the price at which demand is zero,
c = the fixed cost per unit product,
dⁱ = the demand for the product of the ith firm.
Suppose that the demand and the price of the product of Pi are related through
dⁱ = (β/N)[V − pⁱ − γ(pⁱ − p̄)],   (20)
where γ > 0 is a constant, or equivalently through
pⁱ = V − (1/(1 + γ))[(γ/β) Σⱼ qʲ + (N/β) qⁱ],   (21)
where we have replaced dⁱ with qⁱ, since the quantity produced by each firm is assumed to be equal to the demand for the product of that firm under the price dictated by this relation. The profit of Pi can then be written as
Pⁱ = (pⁱ − c)dⁱ,
and in the so-called "price game" the first-order conditions for Nash equilibrium are
∂Pⁱ/∂pⁱ = 0,  i ∈ N,   (22)
whereas in the so-called "quantity game" the first-order conditions for Nash equilibrium are
∂Pⁱ/∂qⁱ = 0,  i ∈ N.   (23)
In the price game it is assumed that the demands are determined by the prices according to (20), whereas in the quantity game it is assumed that the prices satisfy (21). In both cases the profit function of each firm is quadratic in the decision variables, and strictly concave in his own decision variable, and therefore it suffices to consider only the first-order conditions. For the price game, the set of equations (22) becomes
dⁱ(p¹, ..., p^N) + (pⁱ − c) ∂dⁱ(p¹, ..., p^N)/∂pⁱ = 0,  i ∈ N,
and since the pⁱ's appear symmetrically in this set of equations, we may substitute p̂ ≜ p¹ = p² = ... = p^N and solve for p̂, to obtain
pⁱ = p̂ = [V + c(1 + γ(N − 1)/N)] / [2 + γ(N − 1)/N],  i ∈ N.   (24)
If this solution is substituted into (20), the following equilibrium values for the demands are obtained:
d¹ = ... = d^N ≜ d̂ = (β/N) [1 + γ(N − 1)/N] / [2 + γ(N − 1)/N] (V − c).   (25)
Now, for the quantity game, the set of equations (23) can be rewritten as
pⁱ(q¹, ..., q^N) − c + qⁱ ∂pⁱ(q¹, ..., q^N)/∂qⁱ = 0,  i ∈ N.   (26)
In order to express pⁱ as a function of q¹, ..., q^N only, we substitute p = p¹ = ... = p^N into (21), to obtain
p = pⁱ = V − (1/β)(q¹ + ... + q^N),  i ∈ N,   (27)
where we have also made use of the property that pⁱ (i ∈ N) is symmetric in q¹, ..., q^N. Now, evaluating the quantity ∂pⁱ/∂qⁱ from (21), we obtain
∂pⁱ/∂qⁱ = −(1/β)(N + γ)/(1 + γ),  i ∈ N.   (28)
Substitution of (27) and (28) into (26) yields the following unique equilibrium value for qⁱ:
qⁱ = q̂ = β(V − c)(1 + γ)/[2N + γ(1 + N)],  i ∈ N.   (29)
If this value is further substituted into (27), we obtain
pⁱ = p̂ = [V(N + γ) + cN(1 + γ)] / [2N + γ(1 + N)],  i ∈ N.   (30)
A comparison of (24) with (30) readily yields the conclusion that the Nash equilibria of the price and quantity games are not the same. For the trivial game with only one firm, i.e. N = 1, however, we obtain in both cases the same result, which is p* = ½(V + c), d* = ½β(V − c). Also, in the limit as N → ∞, the corresponding limiting equilibrium values of p and q (d) in both games are again the same:
p → [V + c(1 + γ)]/(2 + γ),  q → 0.
For 1 < N < ∞, however, the Nash equilibria differ, in spite of the fact that the economic background in both models is the same. Therefore, it is important to know a priori which variable is going to be adopted as the decision variable in this market. The price game with its equilibrium is sometimes named after Edgeworth (1925), whereas the quantity game is named after Cournot, who first suggested it in the nineteenth century (Cournot, 1838). We now present a graphical explanation for the occurrence of different Nash equilibria when different variables are taken as decision variables in the same model. Suppose that the isocost curves for a nonzero-sum game are as depicted in Fig. 2, which have been redrawn in Fig. 11, where the decision variables are denoted by p¹ and p², and the reaction curves corresponding to these decision variables are shown as l₁ and l₂, respectively. If a transformation to a new set of decision variables d¹(p¹, p²), d²(p¹, p²) is made, then it should be apparent from the figure that we obtain different reaction curves m₁ and m₂
Fig. 11. Graphical illustration of the effect of a coordinate transformation on Nash equilibria.
and hence a different Nash solution. As a specific example, in Fig. 11 we have chosen d¹ = p¹, d² = p¹ + p², in which case m₁ = l₁; but since m₂ ≠ l₂, the Nash equilibrium does not remain invariant under this coordinate transformation. It should be noted, however, that if the transformation is of the form d¹(p¹), d²(p²), then the Nash equilibrium remains invariant. □
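The disagreement between the price-game equilibrium (24) and the quantity-game equilibrium (30), and their coincidence at N = 1 and in the limit N → ∞, can also be checked numerically; the parameter values below are illustrative only.

```python
# Comparison of the price-game equilibrium (24) with the quantity-game
# equilibrium (30), using illustrative parameter values:
V, c, gamma = 10.0, 2.0, 1.5

def p_price(N):     # equation (24)
    g = gamma * (N - 1) / N
    return (V + c*(1 + g)) / (2 + g)

def p_quantity(N):  # equation (30)
    return (V*(N + gamma) + c*N*(1 + gamma)) / (2*N + gamma*(1 + N))

# N = 1: both reduce to the single-firm price (V + c)/2
assert abs(p_price(1) - (V + c)/2) < 1e-9
assert abs(p_quantity(1) - (V + c)/2) < 1e-9

# 1 < N < infinity: the two Nash equilibria differ
assert abs(p_price(5) - p_quantity(5)) > 1e-3

# N -> infinity: both tend to (V + c*(1 + gamma))/(2 + gamma)
limit = (V + c*(1 + gamma)) / (2 + gamma)
assert abs(p_price(10**6) - limit) < 1e-3
assert abs(p_quantity(10**6) - limit) < 1e-3
print(round(p_price(5), 4), round(p_quantity(5), 4), round(limit, 4))
```

For N = 5 the two equilibrium prices already differ by about 0.24 under these values, while both converge to the same limit as the number of firms grows.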
4.7 PROBLEMS
1. Solve the following zero-sum semi-infinite matrix games:
P2
PI
P2
[
~1:f21:/31~/4
3/4
PI
8/9 15/16 24/25 ... 0
1
0
0
...
2. The kernel of a zero-sum game on the unit square is given by J(u¹, u²) = (u¹)³ − 3u¹u² + (u²)³. Determine the pure or mixed saddle-point strategies. (Hint: note that J is strictly convex in u¹, and that for any u¹ the maximum value of J with respect to u² is attained either at u² = 0 or at u² = 1.)
3. Consider the zero-sum continuous-kernel game on the unit square, defined by J(u¹, u²) = (u¹ − u²)² − α(u²)², where α is a scalar parameter. Determine its pure or mixed saddle-point solution when (i) 1 < α ≤ 2, (ii) α = 0 and (iii) 0 < α ≤ 1.
4. We consider the duel described in Example 5 with the sole exception that now the pistols are noisy, i.e. a player knows whether his opponent has already fired or not. Show that this version of the duel game can be described by the kernel
J(u¹, u²) = 1 − 2u¹ if u¹ < u²;  0 if u¹ = u²;  2u² − 1 if u¹ > u²,
and verify that (u¹* = ½, u²* = ½) constitutes a (pure) saddle-point strategy pair.
5. Consider the two-person Stackelberg game defined by the cost functions
Jⁱ = αⁱ(1 + u¹ + u²)² + (uⁱ)²,  i = 1, 2,
where αⁱ, i = 1, 2, are scalar parameters, and u¹ and u² are scalars. For what values of (α¹, α²) is the game (i) concurrent, (ii) nonconcurrent, or (iii) stalemate?
6. Consider the two-person Stackelberg game, with P1 as the leader, defined on the unit square, and with the cost functions
J¹ = β(1 − u¹)u²(u² − 1) + u¹,
J² = (u¹)ᵏ − (u¹ − u²)²,
where β and k are scalars. For what values of β and k does a mixed Stackelberg solution for the leader yield a lower average cost for him than the pure Stackelberg solution?
7. In a three-person game with three levels of hierarchy, first P1 announces his strategy and dictates it on both P2 and P3; subsequently P2 announces his strategy and dictates it on P3. The cost functions are given by
J¹ = (x − 2)² + (u¹)²,
J² = (x − 1)² + (u²)²,
J³ = (x − 3)² + (u³)²,
where x = u¹ + u² + u³ and uⁱ ∈ R, i = 1, 2, 3. Determine the hierarchical equilibrium solution of this game. Compare this with the Nash equilibrium solution of a nonzero-sum game having the same cost structure. In which one (the three-level hierarchy or the Nash game) does P1 attain a better performance?
*8. A two-person nonzero-sum Stackelberg game is given with Uⁱ = R^{mᵢ}, i = 1, 2, and with P1 as the leader, whose cost function is J¹. Prove that if the graph {(u¹, u²) ∈ U¹ × U²: u² ∈ R²(u¹)}, where R²(u¹) is the reaction set of P2, is convex, and P1's cost function is convex on this set, then J̄¹* = J¹*, i.e. there is no need for P1 to extend the class of pure strategies to the class of mixed strategies. (This problem is a generalization of Prop. 3.)
9. Consider a two-person game with J¹ ≡ J² ≜ J. Show that the cost function J: R² → R defined by
J(u¹, u²) = (1 − u²)² + (u¹)²  if u¹ ≥ u²,  and  J(u¹, u²) = (1 − u¹)² + (u²)²  if u¹ ≤ u²,
is strictly convex on R². What are the reaction sets R²(u¹) and R¹(u²)? Prove that infinitely many person-by-person (pbp) optimal solutions exist, but that the team solution is unique. Now prove that, for quadratic and strictly convex J, the reaction curves can have at most one point in common, and therefore the pbp optimal solution and the team solution are unique and identical.
*10. We are given a three-person game with the cost functions
J¹ = (u¹)² + u¹u² + u¹u³ − 2u¹,
J² = u²u¹ + (u²)² + u²u³ − 2u²,
J³ = u³u¹ + u³u² + (u³)² − 2u³,
where the action variables uⁱ, i = 1, 2, 3, are scalars. Consider two different numerical schemes for the determination of the Nash equilibrium:
Scheme 1:
u¹ₖ₊₁ = arg minᵤ J¹(u, u²ₖ, u³ₖ),
u²ₖ₊₁ = arg minᵤ J²(u¹ₖ, u, u³ₖ),
u³ₖ₊₁ = arg minᵤ J³(u¹ₖ, u²ₖ, u);
Scheme 2:
u¹ₖ₊₁ = arg minᵤ J¹(u, u²ₖ, u³ₖ),
u²ₖ₊₁ = arg minᵤ J²(u¹ₖ₊₁, u, u³ₖ),
u³ₖ₊₁ = arg minᵤ J³(u¹ₖ₊₁, u²ₖ₊₁, u);
where uⁱₖ is the value of uⁱ at the kth iteration step. Prove that, if calculated according to scheme 1, the sequences {uⁱₖ}, i = 1, 2, 3, do not converge, whereas they do converge if calculated according to scheme 2. Can you come up with another three-person game for which the sequence generated by scheme 1 converges to the Nash equilibrium solution, but scheme 2 does not?
11. A two-person nonzero-sum game is given as follows. The action variables u¹ and u² are scalars, and the cost functions are
J¹ = x'

j2
(1t
n
x+C(U 1)2,
t
= x' ( t1 ~)
x + c (u
2
f,
where c is a positive constant and
Determine the Nash solution as a function of the parameter c. Are the equilibrium strategies continuous in c for 0.2 ≤ c ≤ 1?
*12. Consider the "price" and "quantity" models of oligopoly as introduced in Example 8, but with the roles of the players such that P1 is the leader and P2, ..., PN are the followers. Further assume that the followers play according to the Nash solution concept among themselves. Obtain the solution of this "leader-followers" game, and investigate its properties as the number of followers goes to infinity.
13. Consider a two-person nonzero-sum game with the cost functions defined by
J¹ = (u¹)² + (u²)²;  J² = (u¹ − 1)² + (u² − 1)².
(i) Determine the Nash solution and the Stackelberg solution with P1 as the leader.
(ii) Now assume that P1 has access to the action variable u² of P2, and can announce a strategy in the form
u¹ = αu² + β,
4.8 NOTES
Section 4.2 The concept of ε-equilibrium solution is as old as the concept of equilibrium solution itself. The results presented in this section are standard, and our proof of Thm 2 follows closely the one given in Burger (1966), where additional results on ε-equilibrium solutions can be found. Corollary 1, which follows from Thm 2 in our case, was originally proven by Wald (1945) in a different way. Recently, the concept of ε-equilibrium solution has received attention in the works of Tijs (1975) and Rupp (1977). The notion of a reaction curve (also called reaction function) was first introduced in an economics context and goes as far back as 1838, to the original work of Cournot.
Section 4.3 There are several results in the literature, known as minimax theorems, which establish existence of the value in pure strategies under different conditions on U¹, U² and J. Corollary 2 provides only one set of conditions which are sufficient for existence of the value. Some other minimax theorems have been given by Sion (1958), Fan (1953) and Bensoussan (1971). Bensoussan, in particular, relaxes the compactness assumption of Corollary 2 and replaces it with two growth conditions on the kernel of the game. The result on the existence of Nash equilibria in pure strategies (i.e. Thm 3) is due to Nikaido and Isoda (1955), and extensions to mixed strategies were first considered by Ville (1938). The discretization procedure mentioned in the proof of Thm 5 suggests a possible method for obtaining mixed Nash equilibria of continuous-kernel games in an approximate way, if an explicit derivation is not possible. In this case, one has to solve for mixed Nash equilibria of only finite matrix games, with the size of these matrices determined by the degree of approximation desired.
Section 4.4 This section presents some results of Karlin (1959). Extensions of the game of timing involve the cases of P1 having m bullets and P2 having n bullets, m and n being integers ≥ 1.
The general silent (m, n) case was solved by Restrepo (1957), and the general noisy duel with m and n bullets was solved by Fox and Kimeldorf (1969). The case of P1 having m noisy bullets and P2 having n silent bullets remains unsolved for m > 1. For other bomber-battleship problems, see Burrow (1978).
Section 4.5 The Stackelberg solution concept was first introduced in economics through the work of H. von Stackelberg in the early 1900s (cf. Von Stackelberg, 1934). But the initial formulation involved a definition in terms of reaction curves, with the generalization to nonunique reaction curves (the so-called reaction sets) being quite recent (cf. Basar and Selbuz, 1979b; Leitmann, 1978). A different version of Thm 5, which accounts only for unique reaction curves, was given by Simaan and Cruz, Jr (1973a), but the more general version given here is due to Basar and Olsder (1980a); this latter reference is also the first source for the extension to mixed strategies. A classification of different static Stackelberg solutions as given here is from Basar (1973), but the concept of "relative leadership" goes as far back as the original work of Von Stackelberg (1934).
Section 4.6 Quadratic games have attracted considerable attention in the literature, partly because of their analytic tractability (reaction functions are always affine). Two good sources for economic applications of (quadratic) games are Friedman (1977) and Case (1979). Example 7 is the so-called Cournot quantity model and can be found in the original work of Cournot (1838). Several later publications have used this model in one way or another (see e.g. Intriligator (1971) and Basar and Ho (1974)). Example 8 is a simplified version of the model presented in Levitan and Shubik (1971), where the authors also provide a simulation study on some data obtained from the automobile industry in the USA.
Section 4.7 Problem 4 can be found in Karlin (1959). Problem 13(ii) exhibits a dynamic information structure and can therefore also be viewed as a dynamic game. For more details on such Stackelberg games the reader is referred to Chapter 7, and in particular to Section 7.4.
PART II
Chapter 5
General Formulation of Infinite Dynamic Games

5.1 INTRODUCTION
This chapter provides a general formulation and some background material for the class of infinite dynamic games to be studied in the remaining chapters of the book. In these games, the action sets of the players comprise an infinite number (in fact a continuum) of elements (alternatives), and the players gain some dynamic information throughout the decision process. Moreover, such games are defined either in discrete time, in which case there exists a finite (or at most a countable) number of levels of play, or in continuous time, which corresponds to a continuum of levels of play. Quite analogously to finite dynamic games, infinite dynamic games can also be formulated in extensive form; this, however, does not lead to a (finite) tree structure, firstly because the action sets of the players are not finite, and secondly because of the existence of a continuum of levels of play if the game is defined in continuous time. Instead, the extensive form of an infinite dynamic game involves a difference (in discrete time) or a differential (in continuous time) equation which describes the evolution of the underlying decision process. In other words, possible paths of action are not determined by following the branches of a tree structure (as in the finite case) but by the solutions of these functional equations, which we also call "state equations". Furthermore, the information sets, in such an extensive formulation, are defined as subsets of some infinite sets on which some further structure is imposed (such as Borel subsets). A precise formulation which takes into account all these extensions, as well as the possibility of chance moves, is provided in section 2 for discrete-time problems and in section 3 for game problems defined in continuous time. In a discrete-time dynamic game, every player acts only at discrete instants of time, whereas in the continuous-time formulation the players act throughout a time interval which is either fixed a
priori or determined by the rules of the game and the actions of the players. For finite games, the case when the duration of the game is finite, but not fixed a priori, has already been included in the extensive tree formulation of Chapters 2 and 3. These chapters, in fact, included examples of multi-act games wherein the number of times a player acts is explicitly dependent on the actions of some other player(s), and these examples have indicated that such decision problems do not require a separate treatment: they can be handled by making use of the theory developed for multi-act games with an a priori fixed number of levels. In infinite dynamic games, however, an a priori fixed upper bound on the duration may sometimes be lacking, in which case "termination" of the play becomes a more delicate issue. Therefore, we devote a portion of section 3 to a discussion of this topic, which will be useful later in Chapter 8. Even though we shall not be dealing, in this book, with mixed and behavioural strategies in infinite dynamic games, we devote section 4 of this chapter to a brief discussion of this topic, mainly to familiarize the reader with the difficulties encountered in extending these notions from the finite to the infinite case, and to point out the direction along which such an extension would be possible. Finally, section 5 of the chapter deals with certain standard techniques of one-person single-goal optimization and, more specifically, with optimal control problems. Our objective in including such a section is twofold: firstly, to introduce the reader to our notation and terminology for the remaining portions of the book, and secondly, to summarize some of the tools for one-person optimization problems which will be heavily employed in subsequent chapters in the derivation of noncooperative equilibria of infinite dynamic games.
This section also includes a discussion on the notion of "representations of strategies on trajectories", which is of prime importance in infinite dynamic games, as it will become apparent in Chapters 6 and 7.
5.2 DISCRETE-TIME INFINITE DYNAMIC GAMES
In order to motivate the formulation of a discrete-time infinite dynamic game in extensive form, we first introduce an alternative to the tree model of finite dynamic games: the so-called loop model. To this end, consider an N-person finite dynamic game with strategy sets {Γⁱ; i ∈ N} and with decision (action) vectors {uⁱ ∈ Uⁱ; i ∈ N}, where the kth block component (uⁱₖ) of uⁱ designates Pi's action at the kth level (stage) of the game, and no chance moves are allowed. Now, if such a finite dynamic game is defined in extensive form (cf. Def. 3.10), then for any given N-tuple of strategies {γⁱ ∈ Γⁱ, i ∈ N} the actions of the players are completely determined by the relations
uⁱ = γⁱ(ηⁱ),  i ∈ N,   (1)
where ηⁱ denotes the information set of Pi. Moreover, since each ηⁱ is completely determined by the actions of the players, there exists a point-to-set mapping† gⁱ: U¹ × U² × ... × U^N → Nⁱ (i ∈ N) such that ηⁱ = gⁱ(u¹, ..., u^N) (i ∈ N), which, when substituted into (1), yields the "loop" relation
uⁱ = γⁱ(gⁱ(u¹, ..., u^N)) ≡ pⁱ(u¹, ..., u^N),  i ∈ N.   (2)
This clearly indicates that, for a given strategy N-tuple, the actions of the players are completely determined through the solution of a set of simultaneous equations, which admits a unique solution under the requirements of Kuhn's extensive form (Kuhn, 1953). If the cost function of Pi is defined on the action spaces, as Lⁱ: U¹ × U² × ... × U^N → R, then substitution of the solution of (2) into Lⁱ yields Lⁱ(u¹, ..., u^N) as the cost incurred to Pi under the strategy N-tuple {γⁱ ∈ Γⁱ, i ∈ N}. If the cost function is instead defined on the strategy spaces, as Jⁱ: Γ¹ × Γ² × ... × Γ^N → R, then we clearly have the relation Jⁱ(γ¹, ..., γ^N) = Lⁱ(u¹, ..., u^N) (i ∈ N). Now, if a finite dynamic game is described through a set of equations of the form (2), with {pⁱ; i ∈ N} belonging to an appropriately specified class, together with an N-tuple of cost functions {Lⁱ; i ∈ N}, then we call this a loop model for the dynamic game (cf. Witsenhausen, 1971a, 1975). The above discussion has already displayed the steps involved in going from a tree model to a loop model in finite deterministic dynamic games. Conversely, it is also possible to start with a loop model and derive a tree model out of it, provided that the functions pⁱ: U¹ × U² × ... × U^N → Uⁱ (i ∈ N) are restricted to a suitable class satisfying conditions like causality, unique solvability of the loop equation (2), etc. In other words, if the sets Pⁱ (i ∈ N) are chosen properly, then the set of equations
uⁱ = pⁱ(u¹, ..., u^N),  pⁱ ∈ Pⁱ,  i ∈ N,   (3)
together with the cost structure {Lⁱ; i ∈ N}, leads to a tree structure that satisfies the requirements of Def. 3.10. If the finite dynamic game under consideration also incorporates a chance move, with possible alternatives for nature denoted by ω ∈ Ω, then the loop equation (3) will accordingly have to be replaced by
uⁱ = pⁱ(u¹, ..., u^N, ω),  ω ∈ Ω,  i ∈ N,   (4)
where, by an abuse of notation, we again let Pⁱ be the class of all permissible mappings pⁱ: U¹ × U² × ... × U^N × Ω → Uⁱ. The solution of (4) will now be a function of ω; and when this is substituted into the cost function Lⁱ: U¹ × ... × U^N × Ω → R and expectation is taken over the statistics of ω,

† Here Nⁱ denotes the set of all ηⁱ, as in Def. 3.11.
the resulting quantity determines the corresponding expected loss to Pi. Now, for infinite dynamic games defined in discrete time, the tree model of Def. 3.10 is not suitable since it cannot accommodate infinite action sets. However, the loop model defined by relations such as (3)does not impose such a restriction on the action sets; in other words, in the loop model we can take each U i, to which the action variable u i belongs, to be an infinite set. Hence, let us now start with a set of equations of the form (3) with U i (i E N) taken as infinite sets. Furthermore let us decompose u i into K blocks such that, considered as a column vector,
$$u^i = (u_1^{i\prime}, \ldots, u_K^{i\prime})^\prime, \quad i \in \mathbf{N}, \tag{5}$$
where $K$ is an integer denoting the maximum number of stages (levels) in the game,† and $u_k^i$ denotes the action (decision or control) variable of P$i$ corresponding to his move during the $k$th stage of the game. In accordance with this decomposition, let us decompose each $P^i \in \mathbf{P}^i$ in its range space, so that (3) can equivalently be written as
$$u_k^i = P_k^i(u^1, \ldots, u^N), \quad P_k^i \in \mathbf{P}_k^i, \quad i \in \mathbf{N}, \; k \in \mathbf{K}. \tag{6}$$
Under the causality assumption, which requires $u_k^i$ to be a function of only the past actions of the players, (6) can equivalently be written as (by an abuse of notation)
$$u_k^i = P_k^i(u_1^1, \ldots, u_{k-1}^1; \cdots; u_1^N, \ldots, u_{k-1}^N), \quad P_k^i \in \mathbf{P}_k^i, \quad i \in \mathbf{N}, \; k \in \mathbf{K}. \tag{7}$$
By following an analysis parallel to the one used in system theory in going from input-output relations for systems to state-space models (cf. Zadeh, 1969), we now assume that $P_k^i$ is structured in such a way that there exist sets $\Gamma_k^i$, $Y_k^i$, $X$ $(i \in \mathbf{N}, k \in \mathbf{K})$, with the latter two being finite-dimensional, and functions $f_k: X \times U_k^1 \times \cdots \times U_k^N \to X$, $h_k^i: X \to Y_k^i$ $(i \in \mathbf{N}, k \in \mathbf{K})$, such that for each $P_k^i \in \mathbf{P}_k^i$ $(i \in \mathbf{N}, k \in \mathbf{K})$ there exists a $\gamma_k^i \in \Gamma_k^i$ $(i \in \mathbf{N}, k \in \mathbf{K})$, so that (7) can equivalently be written as
$$u_k^i = \gamma_k^i(\eta_k^i) \quad \text{for some } \gamma_k^i \in \Gamma_k^i, \tag{8}$$
where $\eta_k^i$ is a subcollection of $\{y_1^1, \ldots, y_k^1; \cdots; y_1^N, \ldots, y_k^N; u_1^1, \ldots, u_{k-1}^1; \cdots; u_1^N, \ldots, u_{k-1}^N\}$, and
$$y_k^i = h_k^i(x_k), \quad y_k^i \in Y_k^i, \tag{9a}$$
$$x_{k+1} = f_k(x_k, u_k^1, \ldots, u_k^N), \quad x_{k+1} \in X, \quad x_1 \in X. \tag{9b}$$
† In other words, no player can make more than $K$ moves in the game, regardless of the strategies picked.
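The recursion (8)-(9) can be exercised numerically. The following sketch (entirely our own; the dynamics, gains and numbers are hypothetical, not from the text) rolls out a two-player, three-stage game with a scalar state, perfect observations $y_k^i = x_k$, and feedback strategies $u_k^i = \gamma^i(y_k^i)$, producing the unique trajectory and action sequence that a fixed strategy pair induces.

```python
# Hypothetical two-player game in the state-space form (8)-(9).

def f(x, u1, u2):
    # state equation (9b): next state from current state and both actions
    return x - u1 + u2

def h(x):
    # observation equation (9a): here both players observe the state perfectly
    return x

# feedback strategies gamma_k^i (stage-invariant linear gains, chosen arbitrarily)
gamma1 = lambda y: 0.50 * y   # P1's action at stage k
gamma2 = lambda y: 0.25 * y   # P2's action at stage k

K = 3
x = 4.0                       # initial state x_1
trajectory = [x]
actions = []
for k in range(K):
    y = h(x)                  # y_k^i = h_k^i(x_k)
    u1, u2 = gamma1(y), gamma2(y)
    actions.append((u1, u2))
    x = f(x, u1, u2)          # x_{k+1} = f_k(x_k, u_k^1, u_k^2)
    trajectory.append(x)

print(trajectory)             # the states x_1, ..., x_{K+1}
print(actions)                # the induced actions, one pair per stage
```

Note how, once the strategies are fixed, the loop produces exactly one trajectory and one action sequence, which is the uniqueness property the causality assumption is meant to secure.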
Adopting the system-control theory terminology, we call $x_k$ the state of the game, $y_k^i$ the (deterministic) observation of P$i$ and $\eta_k^i$ the information available to P$i$, all at stage $k$; and we shall take relations (8)-(9) as the starting point in our formulation of discrete-time infinite dynamic games. More precisely, we define an $N$-person discrete-time infinite dynamic game of prespecified fixed duration as follows.

DEFINITION 1  An $N$-person discrete-time deterministic infinite dynamic game† of prespecified fixed duration involves:
(i) An index set $\mathbf{N} = \{1, \ldots, N\}$ called the players' set.
(ii) An index set $\mathbf{K} = \{1, \ldots, K\}$ denoting the stages of the game, where $K$ is the maximum possible number of moves a player is allowed to make in the game.
(iii) An infinite set $X$ with some topological structure, called the state set (space) of the game, to which the state of the game ($x_k$) belongs for all $k \in \mathbf{K}$ and $k = K+1$.
(iv) An infinite set $U_k^i$ with some topological structure, defined for each $k \in \mathbf{K}$ and $i \in \mathbf{N}$, which is called the action (control) set of P$i$ at stage $k$. Its elements are the permissible actions $u_k^i$ of P$i$ at stage $k$.
(v) A function $f_k: X \times U_k^1 \times \cdots \times U_k^N \to X$, defined for each $k \in \mathbf{K}$, so that
$$x_{k+1} = f_k(x_k, u_k^1, \ldots, u_k^N), \quad k \in \mathbf{K}, \tag{1}$$
for some $x_1 \in X$ which is called the initial state of the game. The difference equation (1) is called the state equation of the dynamic game, describing the evolution of the underlying decision process.
(vi) A set $Y_k^i$ with some topological structure, defined for each $k \in \mathbf{K}$ and $i \in \mathbf{N}$, and called the observation set of P$i$ at stage $k$, to which the observation $y_k^i$ of P$i$ belongs at stage $k$.
(vii) A function $h_k^i: X \to Y_k^i$, defined for each $k \in \mathbf{K}$ and $i \in \mathbf{N}$, so that
$$y_k^i = h_k^i(x_k), \quad k \in \mathbf{K}, \quad i \in \mathbf{N},$$
which is the state-measurement (observation) equation of P$i$ concerning the value of $x_k$.
(viii) A finite set $\eta_k^i$, defined for each $k \in \mathbf{K}$ and $i \in \mathbf{N}$ as a subcollection of $\{y_1^1, \ldots, y_k^1; y_1^2, \ldots, y_k^2; \ldots; y_1^N, \ldots, y_k^N; u_1^1, \ldots, u_{k-1}^1; u_1^2, \ldots, u_{k-1}^2; \ldots; u_1^N, \ldots, u_{k-1}^N\}$, which determines the information gained and recalled by P$i$ at stage $k$ of the game. Specification of $\eta_k^i$ for all $k \in \mathbf{K}$ characterizes the information structure (pattern) of P$i$, and the collection (over $i \in \mathbf{N}$) of these information structures is the information structure of the game.
† Also known as an "N-person deterministic multistage game".
(ix) A set $N_k^i$, defined for each $k \in \mathbf{K}$ and $i \in \mathbf{N}$ as an appropriate subset of $\{(Y_1^1 \times \cdots \times Y_k^1) \times \cdots \times (Y_1^N \times \cdots \times Y_k^N) \times (U_1^1 \times \cdots \times U_{k-1}^1) \times \cdots \times (U_1^N \times \cdots \times U_{k-1}^N)\}$, compatible with $\eta_k^i$. $N_k^i$ is called the information space of P$i$ at stage $k$, induced by his information $\eta_k^i$.
(x) A prespecified class $\Gamma_k^i$ of mappings $\gamma_k^i: N_k^i \to U_k^i$, which are the permissible strategies of P$i$ at stage $k$. The aggregate mapping $\gamma^i = \{\gamma_1^i, \gamma_2^i, \ldots, \gamma_K^i\}$ is a strategy for P$i$ in the game, and the class $\Gamma^i$ of all such mappings $\gamma^i$ so that $\gamma_k^i \in \Gamma_k^i$, $k \in \mathbf{K}$, is the strategy set (space) of P$i$.
(xi) A functional $L^i: (X \times U_1^1 \times \cdots \times U_1^N) \times (X \times U_2^1 \times \cdots \times U_2^N) \times \cdots \times (X \times U_K^1 \times \cdots \times U_K^N) \to \mathbf{R}$, defined for each $i \in \mathbf{N}$, and called the cost functional of P$i$ in the game of fixed duration.  □
The preceding definition of a deterministic discrete-time infinite dynamic game is clearly not the most general one that could be given, firstly because the duration of the game need not be fixed, but could be a variant of the players' strategies, and secondly because a "quantitative" measure might not exist to reflect the preferences of the players among different alternatives. In other words, it is possible to relax and/or modify the restrictions imposed by items (ii) and (xi) in Def. 1, and still retain the essential features of a dynamic game. A relaxation of the requirement of (ii) would involve introduction of a termination set $\Lambda \subset X \times \{1, 2, \ldots\}$, in which case we say that the game terminates, for a given $N$-tuple of strategies, at stage $k$, if $k$ is the smallest integer for which $(x_k, k) \in \Lambda$.† Such a more general formulation clearly also covers fixed-duration game problems, in which case $\Lambda = X \times \{K\}$, where $K$ denotes the number of stages involved. A modification of (xi), on the other hand, might for instance involve a "qualitative" measure (instead of the "quantitative" measure induced by the cost functional), thus giving rise to the so-called qualitative games (as opposed to "quantitative games" covered by Def. 1). Any qualitative game (also called game of kind) can, however, be formulated as a quantitative game (also known as game of degree) by assigning a fixed cost of zero to paths and strategies leading to preferred states, and positive cost to the remaining paths and strategies. For example, in a two-player game, if P1 wishes to reach a certain subset $\Lambda$ of the state set $X$ after $K$ stages and P2 wishes to avoid it, we can choose
$$L^1 = \begin{cases} 0, & \text{if } x_{K+1} \in \Lambda \\ 1, & \text{otherwise,} \end{cases} \qquad L^2 = \begin{cases} 0, & \text{if } x_{K+1} \in \Lambda \\ -1, & \text{otherwise,} \end{cases}$$
and thus consider it as a zero-sum quantitative game.
† It is, of course, implicit here that $x_k$ is determined by the given $N$-tuple of strategies, and the strategies are defined as in Def. 1, but by taking $K$ sufficiently large.
Now, returning to Def. 1, we note that it corresponds to an extensive-form description of a dynamic game, since the evolution of the game, the information gains and exchanges of the players throughout the decision process, and the interactions of the players among themselves are explicitly displayed in such a formulation. It is, of course, also possible to give a normal-form description of such a dynamic game, which in fact readily follows from Def. 1. More specifically, for each fixed initial state $x_1$ and for each fixed $N$-tuple of permissible strategies $\{\gamma^i \in \Gamma^i; i \in \mathbf{N}\}$, the extensive-form description leads to a unique set of vectors $\{u_k^i \equiv \gamma_k^i(\eta_k^i), x_{k+1}; i \in \mathbf{N}, k \in \mathbf{K}\}$, because of the causal nature of the information structure and because the state evolves according to a difference equation. Then, substitution of these quantities into $L^i$ $(i \in \mathbf{N})$ clearly leads to a unique $N$-tuple of numbers reflecting the corresponding costs to the players. This further implies the existence of a composite mapping $J^i: \Gamma^1 \times \cdots \times \Gamma^N \to \mathbf{R}$, for each $i \in \mathbf{N}$, which is also known as the cost functional of P$i$ $(i \in \mathbf{N})$. Hence, the permissible strategy spaces of the players (i.e. $\Gamma^1, \ldots, \Gamma^N$) together with these cost functionals ($J^1, \ldots, J^N$) constitute the normal-form description of the dynamic game for each fixed initial state vector $x_1$. It should be noted that, under the normal-form description, there is no essential difference between infinite discrete-time dynamic games and finite games (the complex structure of the former being disguised in the strategy spaces and the cost functionals), and this permits us to adopt all the noncooperative equilibrium solution concepts introduced in Chapters 2 and 3 directly in the present framework. In particular, the reader is referred to Defs 2.8, 3.12 and 3.27, which introduce the saddle-point, Nash and Stackelberg equilibrium solution concepts, respectively, and which are equally valid for infinite dynamic games (in normal form).
Furthermore, the feedback Nash (cf. Def. 3.22) and feedback Stackelberg (cf. Def. 3.29) solution concepts are also applicable to discrete-time infinite dynamic games under the right type of interpretation, and these are discussed in Chapters 6 and 7, respectively. Before concluding our discussion on the ingredients of a discrete-time dynamic game as presented in Def. 1, we now finally classify possible information structures that will be encountered in the following chapters, and also introduce a specific class of cost functions: the so-called stage-additive cost functions.

DEFINITION 2  In an $N$-person discrete-time deterministic dynamic game of prespecified fixed duration, we say that P$i$'s information structure is
(i) open-loop (OL) pattern if $\eta_k^i = \{x_1\}$, $k \in \mathbf{K}$,
(ii) closed-loop perfect state (CLPS) pattern if $\eta_k^i = \{x_1, \ldots, x_k\}$, $k \in \mathbf{K}$,
(iii) closed-loop imperfect state (CLIS) pattern if $\eta_k^i = \{y_1^i, \ldots, y_k^i\}$, $k \in \mathbf{K}$,
(iv) memoryless perfect state (MPS) pattern if $\eta_k^i = \{x_1, x_k\}$, $k \in \mathbf{K}$,
(v) feedback (perfect state) (FB) pattern if $\eta_k^i = \{x_k\}$, $k \in \mathbf{K}$,
(vi) feedback imperfect state (FIS) pattern if $\eta_k^i = \{y_k^i\}$, $k \in \mathbf{K}$,
(vii) one-step delayed CLPS (1DCLPS) pattern if $\eta_k^i = \{x_1, \ldots, x_{k-1}\}$, $k \in \mathbf{K}$, $k \neq 1$,
(viii) one-step delayed observation sharing (1DOS) pattern if $\eta_k^i = \{y_1, \ldots, y_{k-1}, y_k^i\}$, $k \in \mathbf{K}$, where $y_j \triangleq \{y_j^1, y_j^2, \ldots, y_j^N\}$.  □

DEFINITION 3  In an $N$-person discrete-time deterministic dynamic game of prespecified fixed duration (i.e. $K$ stages), P$i$'s cost functional is said to be stage-additive if there exist $g_k^i: X \times X \times U_k^1 \times \cdots \times U_k^N \to \mathbf{R}$ $(k \in \mathbf{K})$ so that
$$L^i(u^1, \ldots, u^N) = \sum_{k=1}^{K} g_k^i(x_{k+1}, x_k, u_k), \tag{10}$$
where $u_j = (u_j^{1\prime}, \ldots, u_j^{N\prime})^\prime$. Furthermore, if $L^i(u^1, \ldots, u^N)$ depends only on $x_{K+1}$ (the terminal state), then we call it a terminal cost functional.  □
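The information patterns of Def. 2 can be made concrete with a small sketch (our own construction, not from the text): given the state history $x_1, \ldots, x_k$, each pattern simply selects which part of that history the set $\eta_k^i$ contains. The function names and list representation below are ours, chosen purely for illustration.

```python
# Hypothetical illustration of some information patterns in Def. 2.

def eta(pattern, states, k):
    """states = [x_1, ..., x_K]; returns eta_k for stage k (1-based)."""
    if pattern == "OL":        # open-loop: initial state only
        return [states[0]]
    if pattern == "CLPS":      # closed-loop perfect state: full state history
        return states[:k]
    if pattern == "MPS":       # memoryless perfect state: initial and current state
        return [states[0], states[k - 1]]
    if pattern == "FB":        # feedback: current state only
        return [states[k - 1]]
    if pattern == "1DCLPS":    # one-step delayed CLPS (defined for k != 1)
        return states[:k - 1]
    raise ValueError(pattern)

xs = [10, 11, 13, 16]          # hypothetical states x_1, ..., x_4
print(eta("OL", xs, 3))        # [10]
print(eta("CLPS", xs, 3))      # [10, 11, 13]
print(eta("MPS", xs, 3))       # [10, 13]
print(eta("FB", xs, 3))        # [13]
print(eta("1DCLPS", xs, 3))    # [10, 11]
```

The nesting visible here (FB inside MPS inside CLPS) is what drives many of the structural comparisons between equilibria in later chapters.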
Remark 1  It should be noted that every stage-additive cost functional can be converted into a terminal cost functional, by introducing an additional variable $z_k$ $(k \in \mathbf{K})$ through the recursive relation
$$z_{k+1} = z_k + g_k^i(x_{k+1}, x_k, u_k), \quad z_1 = 0,$$
and by adjoining $z_k$ to the state vector $x_k$ as the last component. Denoting the new state vector as $\tilde{x}_k \triangleq (x_k^\prime, z_k)^\prime$, the stage-additive cost functional (10) can then be written as
$$L^i(u^1, \ldots, u^N) = (0, \ldots, 0, 1)\,\tilde{x}_{K+1},$$
which is a terminal cost functional.  □
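Remark 1 can be checked numerically. In the sketch below (a hypothetical one-player example of our own; the dynamics and stage cost are arbitrary), the adjoined variable $z_k$ is propagated alongside the state, and at the end its value equals the stage-additive sum, i.e. the last component of the augmented terminal state carries the whole cost.

```python
# Numerical check of Remark 1 on a hypothetical scalar example.

def f(x, u):            # state equation (one player, for brevity)
    return 0.5 * x + u

def g(x_next, x, u):    # stage cost g_k
    return x**2 + u**2

us = [1.0, -0.5, 0.25]          # an arbitrary action sequence, K = 3
x, z = 2.0, 0.0                 # x_1 and the adjoined variable z_1 = 0
stage_additive = 0.0
for u in us:
    x_next = f(x, u)
    stage_additive += g(x_next, x, u)
    z = z + g(x_next, x, u)     # the recursion of Remark 1
    x = x_next

# the last component of the augmented terminal state equals the total cost
assert z == stage_additive
print(z)
```

The augmented system thus lets stage-additive problems be treated with machinery developed for terminal-cost problems.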
Games with chance moves: stochastic games
Infinite discrete-time dynamic games which also incorporate chance moves (the so-called stochastic games) can be introduced by modifying Def. 1 so that we now have an additional player, called "nature", whose actions influence the evolution of the state of the game. These are, in fact, random actions which obey an a priori known probability law, and hence what replaces the state equation (9b) in stochastic games is a conditional probability distribution function of the state, given the past actions of the players and the past values of the state. An equivalent way of saying this is that there exists a function $F_k: X \times U_k^1 \times \cdots \times U_k^N \times \Theta \to X$, defined for each $k \in \mathbf{K}$, so that
$$x_{k+1} = F_k(x_k, u_k^1, \ldots, u_k^N, \theta_k), \quad k \in \mathbf{K},$$
where $\theta_k$ is the action variable of nature at stage $k$, taking values in $\Theta$; the initial state $x_1$ is also a random variable, and the joint probability distribution function of $\{x_1, \theta_1, \ldots, \theta_K\}$ is known. A precise formulation of a stochastic dynamic game in discrete time would then be as follows.
DEFINITION 4  An $N$-person discrete-time stochastic infinite dynamic game of prespecified fixed duration involves all but items (v) and (xi) of Def. 1, and in addition:
(o) A finite or infinite set $\Theta$, with some topological structure, which denotes the action set of the auxiliary ($N+1$th) player, nature. Any permissible action $\theta_k$ of nature at stage $k$ is an element of $\Theta$.
(v) A function $F_k: X \times U_k^1 \times \cdots \times U_k^N \times \Theta \to X$, defined for each $k \in \mathbf{K}$, so that
$$x_{k+1} = F_k(x_k, u_k^1, \ldots, u_k^N, \theta_k), \quad k \in \mathbf{K}, \tag{11}$$
where $x_1$ is a random variable taking values in $X$, and the joint probability distribution function of $\{x_1, \theta_1, \ldots, \theta_K\}$ is specified.
(xi) A functional $L^i: (X \times U_1^1 \times \cdots \times U_1^N \times \Theta) \times (X \times U_2^1 \times \cdots \times U_2^N \times \Theta) \times \cdots \times (X \times U_K^1 \times \cdots \times U_K^N \times \Theta) \to \mathbf{R}$, defined for each $i \in \mathbf{N}$, and called the cost functional of P$i$ in the stochastic game of fixed duration.  □
To introduce the noncooperative equilibrium solution concepts for stochastic games as formulated in Def. 4, we again have to transform the original game in extensive form into equivalent normal form, by first computing $L^i(\cdot)$ $(i \in \mathbf{N})$ for each $N$-tuple of permissible strategies $\{\gamma^i \in \Gamma^i; i \in \mathbf{N}\}$ and as a function of the random variables $\{x_1, \theta_1, \ldots, \theta_K\}$, and then by taking the expectation of the resulting $N$-tuple of cost functions over the statistics of these random variables. The resulting deterministic functions $J^i(\gamma^1, \ldots, \gamma^N)$ $(i \in \mathbf{N})$ are known as the expected (average) cost functionals of the players, and they characterize the normal form of the original stochastic game, together with the strategy spaces $\{\Gamma^i; i \in \mathbf{N}\}$. We now note that, since such a description is free of any stochastic nature of the problem, all the solution concepts applicable to the deterministic dynamic game are also applicable to the stochastic dynamic game formulated in Def. 4, and hence the stochastic case need not be treated separately while introducing the equilibrium solution concepts.
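The passage from extensive to normal form for a stochastic game can be mimicked by Monte Carlo simulation. In the sketch below (our own hypothetical example; the dynamics, noise law, stage cost and feedback strategies are all invented for illustration), the strategies are fixed, the chance variables are drawn, the state is rolled out as in (11), and the average over many draws approximates the expected cost functional $J^1$.

```python
# Monte Carlo estimate of an expected cost functional J^1 (hypothetical game).

import random

def expected_cost(gamma1, gamma2, K=3, samples=20000, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        x = rng.gauss(0.0, 1.0)            # random initial state x_1
        cost = 0.0
        for _ in range(K):
            u1, u2 = gamma1(x), gamma2(x)  # feedback strategies, for illustration
            theta = rng.gauss(0.0, 0.1)    # nature's action theta_k
            x = 0.9 * x + u1 - u2 + theta  # state equation of the form (11)
            cost += x * x                  # P1's stage cost (hypothetical)
        total += cost
    return total / samples                 # estimate of J^1(gamma^1, gamma^2)

J1 = expected_cost(lambda x: -0.4 * x, lambda x: 0.1 * x)
print(J1)   # a single deterministic number: the normal-form cost of this pair
```

Once each strategy pair is mapped to such a number, the saddle-point, Nash and Stackelberg concepts apply verbatim, exactly as the text asserts.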
Remark 2  Definition 4 does not cover the most general class of stochastic dynamic games since (i) the state measurements of the players could be of stochastic nature, by taking $h_k^i$ to be a mapping $X \times \Theta \to Y_k^i$, (ii) the duration of the game might not be fixed a priori, but be a variant of the players' actions as well as the outcome(s) of the chance move(s), and (iii) the order in which the players act might not be fixed a priori, but again depend on the outcome of the chance move as well as on the actions of the players (cf. Witsenhausen, 1971a). Such extensions will, however, not be considered in this book. Yet another possible extension, which is also valid in deterministic dynamic games (and which is included in the formulation of finite games in extensive form in Chapters 2 and 3), is the case when the action sets $U_k^i$ $(i \in \mathbf{N}, k \in \mathbf{K})$ of the players are structurally dependent on the history of the evolution of the game. Such games will also not be treated in the following chapters.  □
5.3  CONTINUOUS-TIME INFINITE DYNAMIC GAMES

Continuous-time infinite dynamic games, also known as differential games in the literature, constitute a class of decision problems wherein the evolution of the state is described by a differential equation and the players act throughout a time interval. Hence, as a counterpart of Def. 1, we can formulate such games of prespecified fixed duration as follows:

DEFINITION 5  A quantitative $N$-person differential game of prespecified fixed duration involves the following.
(i) An index set $\mathbf{N} = \{1, \ldots, N\}$ called the players' set.
(ii) A time interval $[0, T]$ which is specified a priori and which denotes the duration of the evolution of the game.
(iii) An infinite set $S_0$ with some topological structure, called the trajectory space of the game. Its elements are denoted as $\{x(t), 0 \le t \le T\}$ and constitute the permissible state trajectories of the game. Furthermore, for each fixed $t \in [0, T]$, $x(t) \in S^0$, where $S^0$ is a subset of a finite-dimensional vector space, say $\mathbf{R}^n$.
(iv) An infinite set $U^i$ with some topological structure, defined for each $i \in \mathbf{N}$, which is called the control (action) space of P$i$, whose elements $\{u^i(t), 0 \le t \le T\}$ are the control functions or simply the controls of P$i$. Furthermore, there exists a set $S^i \subseteq \mathbf{R}^{m_i}$ $(i \in \mathbf{N})$ so that, for each fixed $t \in [0, T]$, $u^i(t) \in S^i$.
(v) A differential equation
$$\frac{\mathrm{d}x(t)}{\mathrm{d}t} = f(t, x(t), u^1(t), \ldots, u^N(t)), \quad x(0) = x_0, \tag{12}$$
whose solution describes the state trajectory of the game corresponding to the $N$-tuple of control functions $\{u^i(t), 0 \le t \le T\}$ $(i \in \mathbf{N})$ and the given initial state $x_0$.
(vi) A set-valued function $\eta^i(\cdot)$, defined for each $i \in \mathbf{N}$ as
$$\eta^i(t) = \{x(s), \; 0 \le s \le \varepsilon_t^i\}, \quad 0 \le \varepsilon_t^i \le t, \tag{13}$$
where $\varepsilon_t^i$ is nondecreasing in $t$, and $\eta^i(t)$ determines the state information gained and recalled by P$i$ at time $t \in [0, T]$. Specification of $\eta^i(\cdot)$ (in fact, $\varepsilon_t^i$ in this formulation) characterizes the information structure (pattern) of P$i$, and the collection (over $i \in \mathbf{N}$) of these information structures is the information structure of the game.
(vii) A sigma-field $N_t^i$ in $S_0$, generated for each $i \in \mathbf{N}$ by the cylinder sets $\{x \in S_0, x(s) \in B\}$, where $B$ is a Borel set in $S^0$ and $0 \le s \le \varepsilon_t^i$. $N_t^i$, $t \ge 0$, is called the information field of P$i$.
(viii) A prespecified class $\Gamma^i$ of mappings $\gamma^i: [0, T] \times S_0 \to S^i$, with the property that $u^i(t) = \gamma^i(t, x)$ is $N_t^i$-measurable (i.e. it is adapted to the information field $N_t^i$). $\Gamma^i$ is the strategy space of P$i$ and each of its elements $\gamma^i$ is a permissible strategy for P$i$.
(ix) Two functionals $q^i: S^0 \to \mathbf{R}$, $g^i: [0, T] \times S^0 \times S^1 \times \cdots \times S^N \to \mathbf{R}$, defined for each $i \in \mathbf{N}$, so that the composite functional
$$L^i(u^1, \ldots, u^N) = \int_0^T g^i(t, x(t), u^1(t), \ldots, u^N(t))\,\mathrm{d}t + q^i(x(T)) \tag{14}$$
is well-defined† for every $u^i(t) = \gamma^i(t, x)$, $\gamma^i \in \Gamma^i$ $(i \in \mathbf{N})$, and for each $i \in \mathbf{N}$. $L^i$ is the cost functional of P$i$ in the differential game of fixed duration.  □

A differential game, as formulated above, is not yet well-defined unless we impose some additional restrictions on some of the terms introduced. In particular, we have to impose conditions on $f$ and $\Gamma^i$ $(i \in \mathbf{N})$, so that the differential equation (12) admits a unique solution for every $N$-tuple $\{u^i(t) = \gamma^i(t, x), i \in \mathbf{N}\}$, with $\gamma^i \in \Gamma^i$. A nonunique solution to (12) is clearly not allowed under the extensive-form description of a dynamic game, since it corresponds to nonunique state trajectories (or game paths) and thereby to a possible nonuniqueness in the cost functions for a single $N$-tuple of strategies. We now provide below in Thm 1 a set of conditions under which this uniqueness requirement is fulfilled. But first we list some information structures within the context of deterministic differential games, as a counterpart of Def. 2.
† This term will be made precise in the sequel.
DEFINITION 6  In an $N$-person continuous-time deterministic dynamic game (differential game) of prespecified fixed duration $[0, T]$, we say that P$i$'s information structure is
(i) open-loop (OL) pattern if $\eta^i(t) = \{x_0\}$, $t \in [0, T]$,
(ii) closed-loop perfect state (CLPS) pattern if $\eta^i(t) = \{x(s), 0 \le s \le t\}$, $t \in [0, T]$,
(iii) $\varepsilon$-delayed closed-loop perfect state ($\varepsilon$DCLPS) pattern if
$$\eta^i(t) = \begin{cases} \{x_0\}, & 0 \le t \le \varepsilon \\ \{x(s), \; 0 \le s \le t-\varepsilon\}, & \varepsilon < t, \end{cases}$$
where $\varepsilon > 0$ is fixed,
(iv) memoryless perfect state (MPS) pattern if $\eta^i(t) = \{x_0, x(t)\}$, $t \in [0, T]$,†
(v) feedback (perfect state) (FB) pattern if $\eta^i(t) = \{x(t)\}$, $t \in [0, T]$.†  □
† Note that these two are not covered by (13) in Def. 5.
THEOREM 1  Within the framework of Def. 5, let the information structure for each player be any one of the information patterns of Def. 6. Furthermore, let $S_0 = C^n[0, T]$. Then, if
(i) $f(t, x, u^1, \ldots, u^N)$ is continuous in $t \in [0, T]$ for each $x \in S^0$, $u^i \in S^i$, $i \in \mathbf{N}$,
(ii) $f(t, x, u^1, \ldots, u^N)$ is uniformly Lipschitz in $x, u^1, \ldots, u^N$; i.e. for some $k > 0$,‡
$$\| f(t, x(t), u^1(t), \ldots, u^N(t)) - f(t, \bar{x}(t), \bar{u}^1(t), \ldots, \bar{u}^N(t)) \| \le k \max_{0 \le t \le T} \Big\{ \| x(t) - \bar{x}(t) \| + \sum_{i \in \mathbf{N}} \| u^i(t) - \bar{u}^i(t) \| \Big\},$$
$$x(\cdot), \bar{x}(\cdot) \in C^n[0, T]; \quad u^i(\cdot), \bar{u}^i(\cdot) \in U^i \quad (i \in \mathbf{N}),$$
(iii) for $\gamma^i \in \Gamma^i$ $(i \in \mathbf{N})$, $\gamma^i(t, x)$ is continuous in $t$ for each $x(\cdot) \in C^n[0, T]$ and uniformly Lipschitz in $x(\cdot) \in C^n[0, T]$,
then the differential equation (12) admits a unique solution (i.e. a unique state trajectory) for every $\gamma^i \in \Gamma^i$ $(i \in \mathbf{N})$, so that $u^i(t) = \gamma^i(t, x)$, and furthermore this unique trajectory is continuous.
‡ $\| v \|$ denotes here the Euclidean norm of the vector $v$.

Proof  It follows from a standard result on the existence of unique continuous solutions to differential equations.  □

Remark 3  Theorem 1 provides only one set of conditions which are sufficient to ensure existence of a unique state trajectory for every $N$-tuple of
strategies $\{\gamma^i \in \Gamma^i; i \in \mathbf{N}\}$, which necessarily implies a well-defined differential game problem within the framework of Def. 5. Since these conditions are all related to the existence of a unique solution to
$$\frac{\mathrm{d}x(t)}{\mathrm{d}t} = f(t, x(t), \gamma^1(t, x), \ldots, \gamma^N(t, x)), \quad x(0) = x_0, \quad i \in \mathbf{N}, \tag{15}$$
they can definitely be relaxed (but slightly) by making use of the available theory on functional differential equations (see, for example, Hale (1977)). But, inevitably, these conditions all involve some sort of Lipschitz-continuity on the permissible strategies of the players. However, although such a restriction could be reasonable in the extreme case of one-person differential games (i.e. optimal control problems), it might be quite demanding in an $N$-player ($N \ge 2$) differential game. To illustrate this point, consider, for instance, the one-player game described by the scalar differential equation
$$\frac{\mathrm{d}x}{\mathrm{d}t} = u^1, \quad x(0) = 0, \tag{i}$$
and adopt the strategy $\gamma^1(t, x) = \mathrm{sgn}(x(t))$. The solution to (i) with $u^1(t) = \gamma^1(t, x)$ is clearly not unique; a multiplicity of solutions exists. If we adopt yet another strategy, viz. $\gamma^1(t, x) = -\mathrm{sgn}(x(t))$, then the solution to (i) does not even exist in the classical sense. Hence, a relaxation of the Lipschitz-continuity condition on the permissible strategies could make an optimal control problem quite ill-defined. In such a problem, the single player may be satisfied with smooth (but suboptimal) strategies. In differential games, however, it is unlikely that players are willing to restrict themselves to smooth strategies voluntarily. If one player would restrict his strategy to be Lipschitz, the other player(s) may be able to exploit this. Specifically, in pursuit-evasion games (see Chapter 8) the players play "on the razor's edge", and such a restriction could result in a drastic change in the outcome of the game. In conclusion, non-Lipschitz strategies cannot easily be put into a rigorous mathematical framework. On the other hand, in many games, we do not want the strategy spaces to comprise only smooth mappings. These difficulties may show up especially under the Nash equilibrium solution concept. In two-person Stackelberg games, however, non-Lipschitz strategies can more easily be handled in the general formulation, since one of the players' (the follower's) choice of strategy is allowed to depend on the other player's (the leader's) announced strategy.  □

The saddle-point, Nash and Stackelberg equilibrium concepts introduced earlier for finite games are equally valid for (continuous-time) differential games if we bring them into equivalent normal form. To this end, we start with
the extensive-form description of a differential game, as provided in Def. 5 and under the assumptions of Thm 1, and for each fixed $N$-tuple of strategies $\{\gamma^i \in \Gamma^i; i \in \mathbf{N}\}$ we obtain the unique solution of the functional differential equation (15) and determine the corresponding action (control) vectors $u^i(\cdot) = \gamma^i(\cdot, x)$, $i \in \mathbf{N}$. Substitution of these into (14), together with the corresponding unique state trajectory, thus yields an $N$-tuple of numbers $\{L^i; i \in \mathbf{N}\}$ for each choice of strategies by the players, assuming of course that the functions $g^i$ $(i \in \mathbf{N})$ are integrable, so that (14) is well-defined. Therefore, we have mappings $J^i: \Gamma^1 \times \cdots \times \Gamma^N \to \mathbf{R}$ $(i \in \mathbf{N})$ for each fixed initial state vector $x_0$, which we call the cost functional of P$i$ in a differential game in normal form. These cost functionals, together with the strategy spaces $\{\Gamma^i; i \in \mathbf{N}\}$ of the players, then constitute the equivalent normal-form description of the differential game, which is the right framework to introduce noncooperative equilibrium solution concepts, as we have done earlier for other classes of dynamic games.
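This passage to normal form can be sketched numerically. Below is a minimal illustration (our own; the dynamics $f(t,x,u^1,u^2) = u^1 + u^2$, the quadratic cost, and the feedback gains are all hypothetical) in which, for fixed Lipschitz feedback strategies $\gamma^i(t,x)$, equation (15) is integrated by Euler's method and the cost (14) is evaluated along the resulting unique trajectory, yielding a single number $J^1$ for the strategy pair.

```python
# Euler-discretized evaluation of J^1 for a fixed strategy pair (hypothetical game).

def normal_form_cost(gamma1, gamma2, x0=1.0, T=1.0, n=1000):
    dt = T / n
    x = x0
    L1 = 0.0
    for j in range(n):
        t = j * dt
        u1, u2 = gamma1(t, x), gamma2(t, x)
        L1 += (x * x + u1 * u1) * dt       # integral part of (14) for P1
        x += (u1 + u2) * dt                # f(t, x, u1, u2) = u1 + u2, say
    return L1 + x * x                      # plus a terminal cost q^1(x(T)) = x(T)^2

# J^1 at one strategy pair: a single real number, as the normal form requires
J1 = normal_form_cost(lambda t, x: -x, lambda t, x: 0.0)
print(J1)
```

With $\gamma^1(t,x) = -x$ and $\gamma^2 \equiv 0$, the trajectory is approximately $x(t) = e^{-t}$, and the discretized cost is close to the exact value $2\int_0^1 e^{-2t}\,\mathrm{d}t + e^{-2} = 1$.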
Termination
Definition 5 covers differential games of fixed prespecified duration; however, as discussed in Section 5.2 within the context of discrete-time games, it is possible to extend such a formulation so that the end point in both state and time is a variable. Let $S^0$ again denote a subset of $\mathbf{R}^n$, and $\mathbf{R}_+$ denote the half-open interval $[0, \infty)$. Let a closed subset $\Lambda \subset S^0 \times \mathbf{R}_+$ be given, which we call a terminating (target) set, so that $(x_0, 0) \notin \Lambda$. Then, we say that the differential game terminates, for a given $N$-tuple of strategies, at time $T \in \mathbf{R}_+$ if $T$ is the smallest element of $\mathbf{R}_+$ with the property $(x(T), T) \in \Lambda$, i.e.
$$T = \min\{t \in \mathbf{R}_+ : (x(t), t) \in \Lambda\}. \tag{16}$$
This positive number $T$ is called the terminal time of the differential game, corresponding to the given $N$-tuple of strategies. The terminal time is sometimes defined in a slightly different way, as the smallest time at which $x(\cdot)$ penetrates the interior of $\Lambda$. The two definitions can only differ in situations where the trajectory $x(\cdot)$ belongs to the boundary of $\Lambda$ for a while, or only touches $\Lambda$ at one instant of time. Unless stated differently, we shall adopt (16) as the definition of the terminal time. The question of whether a given differential game necessarily terminates is one that requires some further discussion. If there is a finite time, say $t_1$, such that $(x, t_1) \in \Lambda$ for all $x \in S^0$, then the game always terminates, in at most $t_1$ units of time. Such a finite $t_1$, however, does not always exist, as elucidated by the following optimal control (one-player differential game) problem.
Example 1  Consider the optimal control problem described by the two-dimensional state equation
$$\dot{x}_1 = u, \quad x_1(0) = 1,$$
$$\dot{x}_2 = -x_2 + u, \quad x_2(0) = 1,$$
where $u(\cdot)$ is the control (action) variable satisfying the constraint $0 \le u(t) \le 3$ for all $t \in [0, \infty)$. Let the terminal time $T$ be defined as $T = \min\{t \in [0, \infty): x_2(t) = 2\}$, and a permissible OL strategy $\gamma$ be defined as a mapping from $[0, \infty)$ into $[0, 3]$. Now, if the cost function is $L(u) = \int_0^T 1\,\mathrm{d}t = T$, then the minimizing strategy is $\gamma^*(t) = 3$, $t \ge 0$, with the corresponding terminal time (and thereby the minimum value of $L$) being $\ln 2$. If the cost function to be minimized is $L(u) = x_1(T)$, however, the player may not have an incentive to terminate this "one-player game" since, as long as termination has not been achieved, his cost is not defined. So, on the one hand, he does not want to terminate the game; while on the other hand, if he has an incentive to terminate it, he should do it as soon as possible, which dictates that he employ the strategy $\gamma^*(t) = 3$, $t \ge 0$. The player is faced with a dilemma here, since the latter cost function is not well-defined for all strategies available to the player. This ambiguity can, however, be removed by either (i) restricting the class of permissible strategies to the class of so-called playable strategies, which are those that terminate the game in finite time, or (ii) extending the definition of the cost functional so that
$$L(u) = \begin{cases} x_1(T), & \text{if } T \text{ is finite} \\ \infty, & \text{otherwise,} \end{cases}$$
which eliminates any possible incentive for the player not to terminate the game. In both cases, the optimal strategy will be $\gamma^*(t) = 3$, $t \ge 0$.  □

The difficulties encountered in the preceding optimal control example, as well as the proposed ways out of the dilemma, are also valid for differential games, and this motivates us to introduce the following concept of "playability".

DEFINITION 7  For a given $N$-person differential game with a target set $\Lambda$, a strategy $N$-tuple is said to be playable (at $(t_0, x_0)$) if it generates a trajectory $x(\cdot)$ such that $(x(t), t) \in \Lambda$ for finite $t$. Such a trajectory $x(\cdot)$ is called terminating.  □
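The terminal time claimed in Example 1 can be verified numerically: with $\gamma^*(t) = 3$, the $x_2$-dynamics are $\dot{x}_2 = -x_2 + 3$, $x_2(0) = 1$, whose solution $x_2(t) = 3 - 2e^{-t}$ reaches 2 exactly at $T = \ln 2$. The Euler step size below is our own choice.

```python
# Simulate Example 1 under gamma*(t) = 3 and detect the terminal time (16).

import math

dt = 1e-5
t, x1, x2 = 0.0, 1.0, 1.0
while x2 < 2.0:                 # terminal condition x2(t) = 2
    x1 += 3.0 * dt              # x1' = u = 3
    x2 += (-x2 + 3.0) * dt      # x2' = -x2 + u
    t += dt

print(t, math.log(2))           # simulated T vs. the exact value ln 2
print(x1)                       # x1(T) = 1 + 3T, close to 1 + 3 ln 2
```

The simulated stopping time agrees with $\ln 2 \approx 0.6931$ to within the discretization error, confirming the minimum of $L(u) = T$ stated in the example.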
Differential games with chance moves†
† Here it is assumed that the reader has some prior knowledge of stochastic processes. If this is not the case, either this part may be skipped (without much loss, since this formulation is used later only in Section 6 of Chapter 6), or the standard references may be consulted for background knowledge (Fleming and Rishel, 1975; Gikhman and Skorohod, 1972; Wong, 1971).

Chance moves can be introduced in the formulation of differential games by basically following the same lines as in the discrete-time case (cf. Def. 4), but this time one has to be mathematically more precise. In particular, if we assume the chance player (nature) to influence the state trajectory throughout a given time interval, then the actions of this additional player will be realizations of a stochastic process $\{\theta_t, t \ge 0\}$ whose statistics are known a priori. If we adjoin such a stochastic process to (12), then the resulting "differential equation"
$$\frac{\mathrm{d}x(t)}{\mathrm{d}t} = F(t, x(t), u^1(t), \ldots, u^N(t), \theta_t), \quad x(0) = x_0,$$
might not be well-defined, in the sense that even though its solution might be unique for each realization (sample path) of $\{\theta_t, t \ge 0\}$, it might not exist as a stochastic process. To obtain a meaningful formulation and tractable results, one has to impose some restrictions on $F$ and $\{\theta_t, t \ge 0\}$. One particular such assumption is to consider equations of the form
$$x_t = x_0 + \int_0^t F(s, x_s, u^1(s), \ldots, u^N(s))\,\mathrm{d}s + \int_0^t \sigma(s, x_s)\,\mathrm{d}\theta_s, \tag{17a}$$
where $F$ satisfies the conditions imposed on $f$ in Def. 5, $\sigma$ satisfies similar conditions in its arguments $s$ and $x_s$, and $\{\theta_t, t \ge 0\}$ is a special type of stochastic process called the Wiener process. Equation (17a) can also be written symbolically as
$$\mathrm{d}x_t = F(t, x_t, u^1(t), \ldots, u^N(t))\,\mathrm{d}t + \sigma(t, x_t)\,\mathrm{d}w_t, \quad x_t\big|_{t=0} = x_0, \tag{17b}$$
where we have used $w_t$ instead of $\theta_t$ to denote that it is the Wiener process. It should further be noted that in both (17a) and (17b), $x(\cdot)$ is written as $x_\cdot$, to indicate explicitly that it now stands for a stochastic process instead of a deterministic function. Equation (17b) is known as a stochastic differential equation, and the existence and uniqueness properties of its solution, whenever $u^i(\cdot) = \gamma^i(\cdot, x)$, $\gamma^i \in \Gamma^i$ $(i \in \mathbf{N})$, are elucidated in the following theorem, whose proof can be found in Gikhman and Skorohod (1972).

THEOREM 2  Let $\Gamma^1, \ldots, \Gamma^N$ denote the strategy spaces of the players under any one of the information patterns of Def. 6. Furthermore, let $S_0 = C^n[0, T]$, let $F$ satisfy the requirements imposed on $f$ in Thm 1, and let $\gamma^i \in \Gamma^i$ $(i \in \mathbf{N})$ satisfy the restrictions imposed on $\gamma^i$ in Thm 1 (iii). If, further, $\sigma$ is a nonsingular $(n \times n)$ matrix, whose elements are continuous in $t$ and uniformly Lipschitz in $x$, then the stochastic differential equation (17b) with $u^i(\cdot) = \gamma^i(\cdot, x)$, $\gamma^i \in \Gamma^i$ $(i \in \mathbf{N})$, admits as its solution a unique stochastic process with continuous sample paths, for every such $N$-tuple of strategies.  □
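Sample paths of an equation of the form (17b) can be generated with the Euler-Maruyama scheme, which replaces $\mathrm{d}t$ by a step and $\mathrm{d}w_t$ by an $N(0, \mathrm{d}t)$ increment. The sketch below is our own illustration: the drift $F$, the constant scalar $\sigma$, and the feedback strategies are all hypothetical.

```python
# Euler-Maruyama sample path of a scalar SDE of the form (17b), hypothetical data.

import math, random

def sample_path(gamma1, gamma2, x0=1.0, T=1.0, n=1000, seed=0):
    rng = random.Random(seed)
    dt = T / n
    x = x0
    path = [x]
    for j in range(n):
        t = j * dt
        u1, u2 = gamma1(t, x), gamma2(t, x)
        drift = -x + u1 + u2              # F(t, x, u1, u2), chosen for illustration
        sigma = 0.2                       # sigma(t, x): constant, nonsingular (1x1)
        x += drift * dt + sigma * rng.gauss(0.0, math.sqrt(dt))
        path.append(x)
    return path

path = sample_path(lambda t, x: 0.5 * x, lambda t, x: 0.0)
print(len(path), path[-1])                # one discretized sample path
```

Each seed produces one realization; averaging a cost evaluated along many such paths is the numerical counterpart of replacing $L^i$ by its expected value, as described next.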
To complete the formulation of a differential game with chance moves (i.e. a stochastic differential game), it will now be sufficient to replace $L^i$ in Def. 5 (ix) with the expected value of the same expression, with the expectation operation taken with respect to the statistics of the Wiener process $\{w_t, t \ge 0\}$ and the initial state $x_0$. The game can then be converted into normal form by determining, in the usual sense, functions $J^i: \Gamma^1 \times \cdots \times \Gamma^N \to \mathbf{R}$ $(i \in \mathbf{N})$, which is now the suitable form to introduce the solution concepts already discussed earlier.
5.4  MIXED AND BEHAVIOURAL STRATEGIES IN INFINITE DYNAMIC GAMES
In Chapter 2, we have defined mixed strategy for a player as a probability distribution on the space of his pure strategies, or equivalently, as a random variable whose values are the player's pure strategies (cf Def. 2.2), which was also adopted in Chapters 3 and 4, for finite games and static infinite games, respectively. Defined in this way, mixed strategy is a mathematically wellestablished object, mainly because the strategy spaces are finite for the former class of problems, and at most a continuum for the latter class of games, thereby allowing one to introduce (Borel) measurable subsets of these strategy spaces, on which probability measures can be defined. An attempt to extend this directly to infinite dynamic games, however, is hampered by several measuretheoretic difficulties (Aumann, 1964). To illustrate the extent of these difficulties, let us consider a simple twostage twoperson dynamic game defined by the state equation
x_2 = 1 + u^1,   x_3 = x_2 + u^2,
where −1 ≤ u^1 ≤ 0, 0 ≤ u^2 ≤ 1, and P2 has access to closed-loop perfect state information (i.e. he knows the value of x_2). A mixed strategy for P1 can easily be defined in this case, since his pure-strategy space is [−1, 0], which is endowed with a measurable structure, viz. its Borel subsets. However, for P2, the permissible (pure) strategies are measurable mappings from [0, 1] into [0, 1], and thus the permissible strategy space of P2 is Γ^2 = I^I, where I denotes the unit interval. In order to define a probability distribution on I^I, we have to endow it with a measurable structure, but no such structure exists which is suitable for the problem under consideration (see Aumann, 1961). Intuitively, such a difficulty arises because the set of all probability distributions on I is already an extremely rich class, so that if one wants to define a similar object on a domain I^I whose cardinality is higher than that of I, such an increase in cardinality cannot be reflected in the set of probability distributions. An alternative, then, is to adopt the "random variable" definition of a mixed strategy. This involves a sample space, say Ω, and a class of measurable mappings from Ω into I^I, with each of these mappings being a candidate mixed
218
DYNAMIC NONCOOPERATIVE GAME THEORY
strategy. The key issue here, now, is the choice of the sample space Ω. In fact, since Ω stands for a random device, the question can be rephrased as the choice of a random device whose outcomes are rich enough to be compatible with I^I. But the richest one is the continuous roulette wheel, which corresponds to a sample space Ω taken as a copy of the unit interval I.† Such a consideration then leads to the conclusion that the random variable (mixed strategy) should be a measurable mapping f from Ω into I^I. Unfortunately, this approach also leads into difficulties, since one still has to define a measurable structure on the range space I^I. But now, if we note that, to every function f: Ω → I^I, there corresponds a function g: Ω × I → I defined by g(ω, x) = f(ω)(x), then the "measurability" difficulty is resolved, since one can define measurable mappings from Ω × I into I. This conclusion of course remains valid when the action set I is replaced by any measurable space U, and the information space I is replaced by any measurable space N. Moreover, an extension to multi-stage infinite dynamic games is also possible, as the following definition elucidates.

DEFINITION 8  Given a K-stage discrete-time infinite dynamic game within the framework of Def. 1, let U_k^i, N_k^i be measurable spaces for each i ∈ N, k ∈ K. Then, a mixed strategy γ^i for Pi is a sequence γ^i = (γ_1^i, …, γ_K^i) of measurable mappings γ_j^i: Ω × N_j^i → U_j^i (j ∈ K), so that γ^i(ω, ·) ∈ Γ^i for every ω ∈ Ω, where Ω is a fixed sample space (taken as a copy of the unit interval).  □
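As a minimal sketch (not from the text), the g(ω, x) device above can be rendered directly in code: a jointly measurable map g from Ω × I into I, where drawing ω realizes a pure strategy x_2 ↦ g(ω, x_2) for P2 in the two-stage game. The particular threshold rule used for g is an invented example.

```python
import random

# P2's mixed strategy given as a jointly measurable map g : Omega x I -> I,
# rather than as a probability distribution over the function space I^I.
# The threshold rule below is an invented example of such a map.
def g(omega, x2):
    return 1.0 if x2 >= omega else 0.0

def realize_pure_strategy(omega):
    # f(omega), an element of I^I: the pure strategy x2 |-> g(omega, x2)
    return lambda x2: g(omega, x2)

random.seed(1)
omega = random.random()              # one spin of the continuous roulette wheel
gamma2 = realize_pure_strategy(omega)
actions = [gamma2(x2 / 10) for x2 in range(11)]
print(actions)                       # the realized pure strategy on a grid
```

Each fixed ω gives one element of I^I, while g itself stays measurable as a function of the pair (ω, x_2), which is exactly how the cardinality obstruction is sidestepped.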
A direct extension of this definition to continuous-time systems would be as follows.

DEFINITION 9  Given a differential game that fits the framework of Def. 5, let S_0 and S^i (i ∈ N) be measurable spaces. Then, a mixed strategy γ^i for Pi, in this game, is a measurable transformation γ^i: Ω × [0, T] × S_0 → S^i, so that γ^i(ω, ·, ·) ∈ Γ^i for every ω ∈ Ω, where Ω is a fixed sample space (taken as a copy of the unit interval). Equivalently, γ^i(·, ·, x) is a stochastic process for each x ∈ S_0.  □
A behavioural strategy, on the other hand, has been defined (cf. Def. 2.9) as a collection of probability distributions, one for each information set of the player. In other words, using the notation of section 2.4, it is a mixed strategy γ(·) with the property that γ(η_1) and γ(η_2) are independent random variables whenever η_1 and η_2 belong to different information sets. If we extend this notion directly to multi-stage infinite dynamic games (and within the framework of Def. 8), we have to define a behavioural strategy as a mixed

† This follows from the intuitive idea of a random device (cf. Aumann, 1964). It is, of course, possible to consider purely abstract random devices which are even richer.
strategy γ^i(·, ·) with the property that the collection of random variables {γ_j^i(·, η_j), η_j ∈ N_j^i; j ∈ K} are mutually independent. But, since N_j^i is in general a nondenumerable set, in infinite dynamic games this would imply that we have a nondenumerable number of mutually independent bounded random variables on the same sample space. This is clearly not possible, since any bounded random variable defined on our sample space Ω should have a countable basis (cf. Loeve, 1963). Then, a way out of this difficulty is to define a behavioural strategy as a mixed strategy which is independent from stage to stage, but not necessarily within each stage. Aumann actually argues that such stagewise correlation is quite irrelevant (Aumann, 1964), and the expected cost function is invariant under such a correlation. Therefore, we have

DEFINITION 10  Given a K-stage discrete-time infinite dynamic game within the framework of Def. 1, let U_k^i, N_k^i be measurable spaces for each i ∈ N, k ∈ K. Then, a behavioural strategy γ^i for Pi is a mixed strategy (cf. Def. 8) with the further property that the set of random variables {γ_j^i(·, η_j), j ∈ K} are mutually independent for every fixed η_j ∈ N_j^i (j ∈ K).  □
This definition of a behavioural strategy can easily be extended to multi-stage games with a (countably) infinite number of stages; however, an extension to continuous-time infinite dynamic games is not possible, since the time set [0, T] is not denumerable.
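The stagewise-independence requirement of Def. 10 can be made concrete on a toy finite model (invented here, not from the text): take Ω finite and uniform, let a mixed strategy assign one action per stage to each ω, and check whether the stage actions are independent as random variables.

```python
from itertools import product

# Omega = {0,1} x {0,1}, uniform.  A mixed strategy maps each omega to a
# pair of stage actions.  It is behavioural (Def. 10) iff the two stage
# actions are independent random variables.  Both maps are invented examples.
OMEGA = list(product([0, 1], [0, 1]))

def behavioural(omega):              # stage k uses only the k-th coordinate
    return (omega[0], omega[1])

def correlated(omega):               # both stages reuse the first coordinate
    return (omega[0], omega[0])

def is_stagewise_independent(strategy):
    joint, m1, m2 = {}, {}, {}
    for omega in OMEGA:
        u = strategy(omega)
        joint[u] = joint.get(u, 0) + 1
        m1[u[0]] = m1.get(u[0], 0) + 1
        m2[u[1]] = m2.get(u[1], 0) + 1
    n = len(OMEGA)
    return all(joint.get((a, b), 0) / n == (m1[a] / n) * (m2[b] / n)
               for a in m1 for b in m2)

print(is_stagewise_independent(behavioural))  # True
print(is_stagewise_independent(correlated))   # False
```

The second map is a perfectly good mixed strategy but fails Def. 10: its two stage actions are fully correlated through the shared coordinate of ω.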
5.5  TOOLS FOR ONE-PERSON OPTIMIZATION

Since optimal control problems constitute a special class of infinite dynamic games with one player and one criterion, the mathematical tools available for such problems may certainly be useful in dynamic game theory. This holds particularly true if the players adopt the noncooperative Nash equilibrium solution concept, in which case each player is faced with a single-criterion optimization problem (i.e. an optimal control problem) with the strategies of the remaining players taken to be fixed at their equilibrium values. Hence, in order to verify whether a given set of strategies is in Nash equilibrium, we inevitably have to utilize the tools of optimal control theory. We, therefore, present in this section some important results on dynamic one-person optimization problems, so as to provide an introduction to the theory of subsequent chapters. The section comprises four subsections. The first two deal with the dynamic programming technique applied to discrete-time and continuous-time optimal control problems, and the third is devoted to the "minimum principle". For more details on, and a rigorous treatment of, the material presented in these
subsections, the reader is referred to Fleming and Rishel (1975) and Boltyanski (1978). Finally, the last subsection (section 5.5.4) introduces the notion of "representations of strategies along trajectories".
5.5.1  Dynamic programming for discrete-time systems
The method of dynamic programming is based on the principle of optimality, which states that an optimal strategy has the property that, whatever the initial state and time are, all remaining decisions (from that particular initial state and particular initial time onwards) must also constitute an optimal strategy. To exploit this principle, we work backwards in time, starting at all possible final states with the corresponding final times. Such a technique has already been used in this book within the context of finite dynamic (multi-act) games, specifically in sections 2.5 and 3.5, in the derivation of noncooperative equilibria, though we did not refer explicitly to dynamic programming. We now discuss the principle of optimality within the context of discrete-time systems that fit the framework of Def. 1, but with only one player (i.e. N = 1). Towards that end, we consider equation (9b), assume feedback perfect state information and a stage-additive cost functional of the form (10), all for N = 1, i.e.

x_{k+1} = f_k(x_k, u_k),  k ∈ K,   (18a)

L(u) = Σ_{k=1}^{K} g_k(x_{k+1}, u_k, x_k),   (18b)

where u_k = u_k^1 = γ_k(x_k), γ_k(·) denotes a permissible (control) strategy at stage k ∈ K, and K is a fixed positive integer. In order to determine the minimizing control strategy, we shall need the expression for the minimum (or minimal) cost from any starting point at any initial time. This is also called the value function, and is defined as
V(k, x) = min_{γ_k, …, γ_K} Σ_{i=k}^{K} g_i(x_{i+1}, u_i, x_i),

with u_i = γ_i(x_i) ∈ U_i and x_k = x. A direct application of the principle of optimality now readily leads to the recursive relation

V(k, x) = min_{u_k ∈ U_k} [ g_k(f_k(x, u_k), u_k, x) + V(k+1, f_k(x, u_k)) ].   (19)
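As an illustration (not from the text), the backward recursion (19) can be carried out numerically on a small invented problem with finite state and control grids; the dynamics f_k(x, u) = x + u (clamped to the grid) and cost g_k(x_{k+1}, u_k, x_k) = x_{k+1}^2 + u_k^2 are chosen purely for this sketch.

```python
# Backward recursion (19) on an invented scalar problem:
# f_k(x, u) = x + u (clamped), g_k(x_next, u, x) = x_next**2 + u**2, K = 3.
K = 3
X = range(-5, 6)                 # state grid
U = (-1, 0, 1)                   # control set U_k

def f(x, u):                     # state equation, clamped to the grid
    return max(-5, min(5, x + u))

def g(x_next, u, x):             # stage cost g_k(x_{k+1}, u_k, x_k)
    return x_next ** 2 + u ** 2

V = {x: 0 for x in X}            # V(K+1, .) = 0: no cost beyond the horizon
policy = []
for k in range(K, 0, -1):        # work backwards: k = K, ..., 1
    Vk, pik = {}, {}
    for x in X:
        costs = {u: g(f(x, u), u, x) + V[f(x, u)] for u in U}
        pik[x] = min(costs, key=costs.get)
        Vk[x] = costs[pik[x]]
    V, policy = Vk, [pik] + policy

print(V[3], [p[3] for p in policy])   # prints: 8 [-1, -1, -1]
```

After the loop, V holds V(1, ·) and policy[k−1] is the optimal feedback strategy γ_k^* restricted to the grid, exactly in the sense that each u_k^* is a minimizing argument of the RHS of (19).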
If the optimal control problem admits a solution u* = {u_k^*, k ∈ K}, then the solution V(1, x_1) of (19) should be equal to L(u*), and furthermore each u_k^* should be determined as a minimizing argument of the RHS of (19). As a specific application, let us consider the so-called linear-quadratic
discrete-time optimal control problem, which is described by the state equation

x_{k+1} = A_k x_k + B_k u_k + c_k,   (20a)

and cost functional

L(u) = ½ Σ_{k=1}^{K} (x'_{k+1} Q_{k+1} x_{k+1} + u'_k R_k u_k).   (20b)

Let us further assume that x_k ∈ R^n, u_k ∈ R^m, c_k ∈ R^n, R_k > 0, Q_{k+1} ≥ 0 for all k ∈ K, and A_k, B_k are matrices of appropriate dimensions. Here, the
corresponding expression for f_k is obvious, but the one for g_k is not uniquely defined; though, it is more convenient to take it as

g_k(u_k, x_k) = { ½ u'_1 R_1 u_1,  k = 1;
                  ½ (u'_k R_k u_k + x'_k Q_k x_k),  k ≠ 1, K+1;
                  ½ x'_{K+1} Q_{K+1} x_{K+1},  k = K+1. }
We now obtain, by inspection, that V(k, x) should be a general quadratic function of x for all k ∈ K, and that V(K+1, x_{K+1}) = ½ x'_{K+1} Q_{K+1} x_{K+1}. This leads to the structural form V(k, x) = ½ x'S_k x + x's_k + q_k. Substituting this in the recursive relation (19), we obtain the unique solution of the optimal control problem (20) as follows.

PROPOSITION 1  The optimal control problem (20) admits the unique solution

u_k^* = γ_k^*(x_k) = −P_k S_{k+1} A_k x_k − P_k (s_{k+1} + S_{k+1} c_k),  k ∈ K,   (21)

where

P_k = [R_k + B'_k S_{k+1} B_k]^{−1} B'_k,
S_k = Q_k + A'_k S_{k+1} [I − B_k P_k S_{k+1}] A_k;  S_{K+1} = Q_{K+1},
s_k = A'_k [I − B_k P_k S_{k+1}]' [s_{k+1} + S_{k+1} c_k];  s_{K+1} = 0.
Furthermore, the minimum value of (20b) is

L(u*) = ½ x'_1 S_1 x_1 + x'_1 s_1 + q_1,   (22)

where

q_1 = ½ Σ_{k=1}^{K} [ c'_k S_{k+1} c_k − (s_{k+1} + S_{k+1} c_k)' P'_k B'_k (s_{k+1} + S_{k+1} c_k) + 2 c'_k s_{k+1} ].  □
Remark 4  It should be noted that if the requirement Q_k ≥ 0 is not satisfied, a necessary and sufficient condition for (21) to provide a unique solution to the linear-quadratic discrete-time optimal control problem is R_k + B'_k S_{k+1} B_k > 0, ∀k ∈ K.  □
5.5.2  Dynamic programming for continuous-time systems
The dynamic programming approach, when applied to optimal control problems defined in continuous time, leads to a partial differential equation (PDE) which is known as the Hamilton-Jacobi-Bellman (HJB) equation. Towards this end, we consider the optimal control problem defined by

ẋ = f(t, x, u(t)),  x(0) = x_0,  t ≥ 0,
u(t) = γ(t, x(t)),  u(t) ∈ S,  γ ∈ Γ,   (23)
L(u) = ∫_0^T g(t, x, u) dt + q(T, x(T)),
T = min {t: l(t, x(t)) = 0},
and confine our analysis to the class Γ of all admissible feedback strategies. The minimum cost-to-go from any initial state (x) and any initial time (t) is described by the so-called value function, which is defined by

V(t, x) = min_{u(s), t ≤ s ≤ T} [ ∫_t^T g(s, x(s), u(s)) ds + q(T, x(T)) ],   (24a)

satisfying the boundary condition

V(T, x) = q(T, x).   (24b)
A direct application of the principle of optimality on (24a), under the assumption of continuous differentiability of V, leads to the HJB equation

−∂V(t, x)/∂t = min_u [ (∂V(t, x)/∂x) f(t, x, u) + g(t, x, u) ],   (25)
which takes (24b) as the boundary condition. In general, it is not easy to compute V(t, x). Moreover, the continuous differentiability assumption imposed on V(t, x) is rather restrictive (see, for instance, Example 2 in section 5.5.3, for which it does not hold). Nevertheless, if such a function exists, then the HJB equation (25) provides a means of obtaining the optimal control strategy. This "sufficiency" result is now proven in the following theorem.

THEOREM 3  If a continuously differentiable function V(t, x) can be found that satisfies the HJB equation (25) subject to the boundary condition (24b), then it generates the optimal strategy through the static (pointwise) minimization problem defined by the RHS of (25).
Proof  If we are given two strategies, γ* ∈ Γ (the optimal one) and γ ∈ Γ (an arbitrary one), with the corresponding terminating trajectories x* and x, and
terminal times T* and T, respectively, then (25) reads

g(t, x, u) + (∂V(t, x)/∂x) f(t, x, u) + ∂V(t, x)/∂t
  ≥ g(t, x*, u*) + (∂V(t, x*)/∂x) f(t, x*, u*) + ∂V(t, x*)/∂t = 0,   (26)

where γ* and γ have been replaced by the corresponding controls u* and u, respectively. Integrating the LHS from 0 to T and the RHS from 0 to T*, we obtain

∫_0^T g(t, x, u) dt + V(T, x(T)) − V(0, x_0) ≥ 0,   (27)

∫_0^{T*} g(t, x*, u*) dt + V(T*, x*(T*)) − V(0, x_0) = 0.   (28)

Elimination of V(0, x_0) yields

∫_0^T g(t, x, u) dt + q(T, x(T)) ≥ ∫_0^{T*} g(t, x*, u*) dt + q(T*, x*(T*)),   (29)
from which it readily follows that u* is the optimal control, and therefore γ* is the optimal strategy.  □

Remark 5  If, in the problem statement (23), the time variable t does not appear explicitly, i.e. if f, g, q and l are time-invariant, the corresponding value function will also be time-invariant. This then implies that ∂V/∂t = 0, and the resulting optimal strategy can be expressed as a function of only x(t).  □

Remark 6  An alternative derivation, of a geometrical nature, can be given of the HJB equation (25). Since it is a special case of the more general derivation given in section 8.2 for two-player zero-sum differential games, it will not be treated here.  □
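Thm 3 can be checked on a closed-form case supplied here for illustration (it is not from the text): for the scalar problem ẋ = u, g = ½(x² + u²), no terminal cost and fixed T, the value function V(t, x) = ½ tanh(T − t) x² satisfies (25), since its Riccati coefficient S(t) = tanh(T − t) solves Ṡ = S² − 1 with S(T) = 0.

```python
import math

# Check the HJB equation (25) for the invented scalar problem xdot = u,
# g = (x^2 + u^2)/2, q = 0, for which V(t, x) = tanh(T - t) * x^2 / 2.
T = 1.0

def V_t(t, x):                     # partial V / partial t
    return 0.5 * x * x * (-1.0 / math.cosh(T - t) ** 2)

def V_x(t, x):                     # partial V / partial x
    return math.tanh(T - t) * x

def rhs(t, x):                     # min over u of [V_x * f + g], with f = u
    grid = [i / 1000.0 for i in range(-3000, 3001)]
    return min(V_x(t, x) * u + 0.5 * (x * x + u * u) for u in grid)

# (25) states -V_t = min_u [V_x f + g]; check it on a few sample points
err = max(abs(-V_t(t, x) - rhs(t, x))
          for t in (0.0, 0.3, 0.7) for x in (-1.0, 0.5, 2.0))
print(err)                         # tiny, limited only by the u-grid spacing
```

The minimizing u on the grid sits next to the analytic minimizer u* = −V_x(t, x), which is precisely the feedback strategy that Thm 3 extracts from the RHS of (25).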
5.5.3  The minimum principle
Continuous-time systems
In this subsection our starting point will be the HJB equation, under the additional assumption that V is twice continuously differentiable, and we shall convert it into a series of pointwise optimization problems, indexed by the parameter t. This new form is in some cases more convenient to work with, and it is closely related to the so-called minimum principle, as also discussed here. Towards this end we first introduce the function

H̄(t, x, u) = (∂V(t, x)/∂x) f(t, x, u) + g(t, x, u),   (30)
in terms of which equation (25) can be written as

−∂V/∂t = min_u H̄(t, x, u).   (31)
Being consistent with our earlier convention, the minimizing u will be denoted by u*. Then

H̄(t, x, u*) + ∂V(t, x)/∂t = 0.   (32)

This is an identity in x, and the partial derivatives of this function with respect to x are also zero. This leads to (also by interchanging the orders of second partial derivatives, which requires V to be twice continuously differentiable):

∂g/∂x + (d/dt)(∂V/∂x) + (∂V/∂x)(∂f/∂x) + (∂H̄/∂u)(∂u*/∂x) = 0.   (33)
If there are no constraints on u, then ∂H̄/∂u = 0 for u = u* according to equation (31). If there are constraints on u, and u* is not an interior point, then it can be shown that (∂H̄/∂u)'(∂u*/∂x) = 0 (because of optimality, ∂H̄/∂u and ∂u*/∂x are orthogonal; for specific problems we may have ∂u*/∂x = 0). In view of this, equation (33) becomes

∂g/∂x + (d/dt)(∂V/∂x) + (∂V/∂x)(∂f/∂x) = 0.   (34)
By introducing the so-called costate vector, p'(t) = ∂V(t, x*)/∂x, where x* denotes the state trajectory corresponding to u*, equation (34) can be written as

dp'/dt = −(∂/∂x)[ g(t, x*, u*) + p'(t) f(t, x*, u*) ] = −(∂/∂x) H(t, p, x*, u*),   (35)

where H is defined by

H(t, p, x, u) = g(t, x, u) + p'(t) f(t, x, u).   (36)
The boundary condition for p(t) is determined from

p'(T) = ∂V(T, x*)/∂x = ∂q(T, x*)/∂x  along l(T, x) = 0.   (37)
In conclusion, we have arrived at the following necessary condition for the optimal control u*(·): under the assumption that the value function V(t, x) is twice continuously differentiable, the optimal control u*(t) and the corresponding trajectory x*(t) must satisfy the following so-called canonical equations:
ẋ*(t) = (∂H/∂p)' = f(t, x*, u*),  x*(t_0) = x_0;

ṗ'(t) = −(∂H/∂x)(t, p, x*, u*);

p'(T) = ∂q(T, x*)/∂x  along l(T, x) = 0;   (38)

H(t, p, x, u) ≜ g(t, x, u) + p'(t) f(t, x, u);

u*(t) = arg min_{u ∈ S} H(t, p, x*, u).
In the derivation of (38), the controls have been assumed to be functions of time and state; i.e. u(t) = γ(t, x(t)). If, instead, one starts with the class of control functions which depend only on time, i.e. u(t) = γ(t), the set of necessary conditions (38) can be derived in quite a different way, and under milder assumptions. Following such a derivation, one obtains the following result, which can, for instance, be found in Fleming and Rishel (1975), and which is referred to as the minimum principle.

THEOREM 4  Consider the optimal control problem defined by (23) and under the OL information structure. If the functions f, g, q and l are continuously differentiable in x and continuous in t and u, then relations (38) provide a set of necessary conditions for the optimal control and the corresponding optimal trajectory to satisfy.  □
Remark 7  A particular class of optimal control problems which are not covered by Thm 4 are those with a fixed terminal condition, i.e. with the terminal time T and the terminal state x(T) prescribed. It should be noted that the function l is not differentiable in this case; however, the necessary conditions for the optimal control are still given by (38), with the exception that the condition on p(T) is absent, and instead the terminal state constraint x(T) = x_f replaces it. We also note, in passing, that in the theory of zero-sum differential games, reasonable problems with state space dimension ≥ 2 will (almost) never be formulated with a fixed prescribed terminal state, since it is not possible to enforce this endpoint constraint on both players when they have totally conflicting objectives.  □
The following example now illustrates how Thm 4 can be employed to obtain the solution of an optimal control problem for which the value function V is not continuously differentiable. It also exhibits some features which are
frequently met in the theory of differential games, and introduces some terminology which will be useful in Chapter 8.

Example 2  Consider an open-loop control problem with system dynamics described by

ẋ_1 = x_2,  ẋ_2 = u,   (39)

where the control satisfies the constraint −1 ≤ u(t) ≤ 1 for all t ≥ 0. The objective is to steer the system from an arbitrary but known initial point in the (x_1, x_2)-plane to the line x_1 = x_2 in minimum time. Hence, the cost function can be written as L(u) = ∫_0^T g(t, x, u) dt = ∫_0^T 1 dt = T, where T is defined as the first instant for which the function

l(t, x_1(t), x_2(t)) ≡ x_1(t) − x_2(t)
becomes zero. In the derivation to follow, we shall restrict ourselves to initial points satisfying x_2 − x_1 < 0. Initial points with the property x_2 − x_1 > 0 can be dealt with analogously. Now, application of relations (38) yields (with p ≜ (p_1, p_2)'):
H = 1 + p_1 x_2 + p_2 u,
ṗ_1 = 0,  ṗ_2 = −p_1,   (40)
u*(t) = −sgn {p_2(t)},
where the costate variable p_2(t) is also known as the switching function, since it determines the sign of u*(t). For points on the line x_1 = x_2 which can be reached by optimal trajectories, obviously V ≡ 0, and therefore the costate vector p(T) = (p_1(T), p_2(T))' will be orthogonal to the line x_1 = x_2 and point in the direction "south-east". Since the magnitude of the costate vector is irrelevant in this problem, we take

p_1(T) = 1,  p_2(T) = −1,   (41)
which leads to

u*(t) = −1  for t < T − 1,   u*(t) = +1  for T − 1 ≤ t ≤ T,   (42)
assuming, of course, that T > 1. Now, by integrating (39) backwards in time from an arbitrary terminal condition on the line x_1 = x_2 (which we take as x_1(T) = x_2(T) = a, where a is a parameter), and by replacing u(t) with u*(t) as given by (42), we obtain

x_1(t) = (t − T)a + ½(t − T)² + a,
x_2(t) = a + (t − T),
for T − 1 ≤ t ≤ T. For t = T − 1 we get x_1(T − 1) = ½, x_2(T − 1) = a − 1. The line x_1 = ½, x_2 = a − 1 is called, for varying a, the switching line. Since we only consider the "south-east" region of the state space, the trajectory (x_1(t), x_2(t)) should move into this region from x_1(T) = x_2(T) = a if t goes backwards in time, which is only true for a ≤ 1. For a > 1, the trajectories first move into the region x_2 > x_1 in retrograde time. Those trajectories cannot be optimal, since we only consider the region x_2 < x_1, and, moreover, the boundary conditions (41) are not valid for the region x_2 > x_1. Hence only points x_1(T) = x_2(T) = a with a ≤ 1 can be terminal points. The line segment x_1 = x_2 = a, a ≤ 1, is called the usable part (UP) of the terminal manifold or target set. Integrating the state equations backwards in time from the UP, and substituting t = T − 1, we observe that only the line segment x_1 = ½, x_2 = a − 1, a ≤ 1, is a switching line. Using the points on this switching line as new terminal state conditions, we can integrate the state equations further backwards in time with u*(t) = −1. The complete picture with the optimal trajectories is shown in Fig. 1. These solutions are the only ones which satisfy the necessary conditions of Thm 4, and therefore they must be optimal, provided that an optimal solution exists. Once the optimal trajectories are known, it is not difficult to construct V(x). Curves of "equal time-to-go", also called isochrones, have been indicated in Fig. 1. Such a curve is determined from the equation V(x) = constant. It is seen that along the curve AB the value function V(x) is discontinuous. Curves along
Fig. 1. The solution of Example 2. The dotted lines denote lines of "equal time-to-go".
which V(x) is discontinuous are called barriers. The reason behind the usage of such terminology is that no optimal trajectory can ever cross a curve along which V(x) is discontinuous. Note that, because of the discontinuity in V(x), Thm 3 cannot be utilized in this case. The corresponding feedback strategy γ*(t, x), which we write as γ*(x) since the system is autonomous (i.e. time-invariant), can readily be obtained from Fig. 1: to the right of the switching line and barrier, we have γ*(x) = −1, and to the left of the switching line and barrier, and also on the barrier, we have γ*(x) = +1.  □

Linear-quadratic problems
We now consider an important class of problems, the so-called linear-quadratic continuous-time optimal control problems, for which V(t, x) is continuously differentiable, so that Thm 3 can be applied. Towards that end, let the system be described (as a continuous-time counterpart of (20)) by

ẋ = A(t)x + B(t)u(t) + c(t);  x(0) = x_0,   (43a)

and the cost functional to be minimized be given as

L(u) = ½ x'(T) Q_f x(T) + ½ ∫_0^T (x'Qx + u'Ru) dt,   (43b)
where x(t) ∈ R^n, u(t) ∈ R^m, 0 ≤ t ≤ T, and T is fixed. A(·), B(·), Q(·) ≥ 0, R(·) > 0 are matrices of appropriate dimensions with continuous entries on [0, T]. Q_f is a nonnegative definite matrix, and c(·) is a continuous vector-valued function taking values in R^n. Furthermore, we adopt the feedback information pattern, and take a typical control strategy as a continuous mapping γ: [0, T] × R^n → R^m. Denote the space of all such strategies by Γ. Then the optimal control problem is to find a γ* ∈ Γ such that

J(γ*) ≤ J(γ),  for all γ ∈ Γ,   (44a)

where

J(γ) ≜ L(u),  with u(·) = γ(·, x).   (44b)
Several methods exist to obtain the optimal strategy γ* or the optimal control function u*(·) = γ*(·, x). We shall derive the former by making use of Thm 3. Simple arguments (see Anderson and Moore, 1971) lead to the conclusion that min J(γ) is quadratic in x_0. Moreover, it can be shown that, if the system is positioned at an arbitrary t ∈ [0, T] at an arbitrary point x ∈ R^n, the minimum cost-to-go, starting from this position, is quadratic in x. Therefore, we may assume existence of a continuously differentiable value function of the form

V(t, x) = ½ x'S(t)x + k'(t)x + m(t),   (45)
that satisfies (25). Here S(·) is a symmetric (n × n) matrix with continuously differentiable entries, k(·) is a continuously differentiable n-vector, and m(·) is a continuously differentiable function. If we can determine such S(·), k(·) and m(·) so that (45) satisfies the HJB equation, then Thm 3 justifies the assumption of the existence of a value function quadratic in x. Substitution of (45) into (25) leads to

−½ x'Ṡx − x'k̇ − ṁ = min_{u ∈ R^m} [ (Sx + k)'(Ax + Bu + c) + ½ x'Qx + ½ u'Ru ].   (46)
Carrying out the minimization on the RHS gives

u*(t) = γ*(t, x(t)) = −R^{−1} B' [S(t)x(t) + k(t)],   (47)
substitution of which into (46) leads to an identity relation, which is readily satisfied if

Ṡ + SA + A'S − SBR^{−1}B'S + Q = 0,  S(T) = Q_f,   (48a)
k̇ + A'k + Sc − SBR^{−1}B'k = 0,  k(T) = 0,   (48b)
ṁ + k'c − ½ k'BR^{−1}B'k = 0,  m(T) = 0.   (48c)
Thus we arrive at:

PROPOSITION 2  The linear-quadratic optimal control problem (43) admits a unique solution γ*, which is given by (47), where S(·), k(·) and m(·) uniquely satisfy (48). The minimum value of the cost functional is

J(γ*) = ½ x'_0 S(0) x_0 + k'(0) x_0 + m(0).

Proof  Except for existence of a unique S(·) ≥ 0, k(·) and m(·) that satisfy (48), the proof has been given prior to the statement of the proposition. Furthermore, if there exists a unique S(·) that satisfies the matrix Riccati equation (48a), existence of unique solutions to the remaining two differential equations of (48) is assured, since they are linear in k and m, respectively. Then, what remains to be proven is that a unique solution S(·) ≥ 0 to (48a), which is also known as a matrix Riccati equation, exists on [0, T]. But this can readily be verified by either making use of the theory of differential equations (cf. Reid, 1972) or utilizing the specific form of the optimal control problem with Q_f and Q(·) taken to be nonnegative definite (cf. Anderson and Moore, 1971).  □

Remark 8  Other methods exist which yield the solution (47). Two of these are the minimum principle (see Bryson and Ho, 1969) and the "completing the square" method (see Brockett, 1970).  □
Remark 9  Nonnegative definiteness requirements on Q_f and Q(·) may be removed, but then we have to assume existence of a unique bounded solution to (48a) in order to ensure existence of a unique minimizing control, which is given by (47); otherwise (48a) might not admit a solution (depending on the length of the time interval), implying that the "optimal" cost might tend to −∞. To exemplify this situation, consider the scalar example:

ẋ = x + u,  x(0) = x_0,
L(u) = ½ ∫_0^T (−x² + u²) dt.   (49)
The Riccati equation (48a), for this example, becomes

Ṡ + 2S − S² − 1 = 0,  S(T) = 0,   (50)
which admits the solution

S(t) = (T − t)/(T − 1 − t)   (51)

on the interval (T − 1, T]. Hence, for T ≥ 1, a continuously differentiable solution to (50) does not exist. This nonexistence of a solution to the matrix Riccati equation is directly related to the well-posedness of the optimal control problem (49), since it can readily be shown that, for T ≥ 1, L(u) can be made arbitrarily small (negative) by choosing a proper u(t). In such a case we say that the Riccati equation has a conjugate point in [0, T] (cf. Sagan (1969) and Brockett (1970)).
Discrete-time systems
We now conclude this subsection by stating a counterpart of Thm 4 for discrete-time optimal control problems. Such problems have earlier been formulated by (18a) and (18b), but now we also assume that f_k is continuously differentiable in x_k, and g_k is continuously differentiable in x_k and x_{k+1} (k ∈ K). Then, the following theorem, whose derivation can be found in either Canon et al. (1970) or Boltyanski (1978), provides a minimum principle for such systems.

THEOREM 5  For the discrete-time optimal control problem described by (18a) and (18b), let
(i) f_k(·, u_k) be continuously differentiable on R^n (k ∈ K),
(ii) g_k(·, u_k, ·) be continuously differentiable on R^n × R^n (k ∈ K).
Then, if {u_k^*, k ∈ K} denotes an optimal control sequence, and {x_{k+1}^*, k ∈ K} denotes the corresponding state trajectory, there exists a finite sequence
{p_2, …, p_{K+1}} of n-dimensional costate vectors so that the following relations are satisfied:

x_{k+1}^* = f_k(x_k^*, u_k^*),  x_1^* = x_1,

u_k^* = arg min_{u_k ∈ U_k} H_k(p_{k+1}, u_k, x_k^*),

p_k = [ (∂/∂x_k) f_k(x_k^*, u_k^*) ]' [ p_{k+1} + ( (∂/∂x_{k+1}) g_k(x_{k+1}^*, u_k^*, x_k^*) )' ] + ( (∂/∂x_k) g_k(x_{k+1}^*, u_k^*, x_k^*) )',  p_{K+1} = 0,   (52)

where

H_k(p_{k+1}, u_k, x_k) ≜ g_k(f_k(x_k, u_k), u_k, x_k) + p'_{k+1} f_k(x_k, u_k).  □
The reader should now note that Prop. 1 also follows as a special case of this theorem.
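This specialization can be checked numerically on an invented scalar test case: with x_{k+1} = a x_k + b u_k and g_k(x_{k+1}, u_k, x_k) = ½(q x_{k+1}² + r u_k²), the gradient ∂H_k/∂u = q b x_{k+1} + r u_k + b p_{k+1} should vanish along the Riccati-optimal trajectory when the costates are generated by (52).

```python
# Consistency of Thm 5 with Prop. 1 on an invented scalar instance:
# dynamics x_{k+1} = a x_k + b u_k, cost (1/2) sum(q x_{k+1}^2 + r u_k^2).
a, b, q, r, K = 0.9, 1.0, 2.0, 1.0, 4

S = {K + 1: q}
P = {}
for k in range(K, 0, -1):                    # the recursions of Prop. 1
    P[k] = b / (r + b * S[k + 1] * b)
    S[k] = q + a * S[k + 1] * (1.0 - b * P[k] * S[k + 1]) * a

x, u = {1: 1.0}, {}
for k in range(1, K + 1):                    # optimal controls and trajectory
    u[k] = -P[k] * S[k + 1] * a * x[k]
    x[k + 1] = a * x[k] + b * u[k]

p = {K + 1: 0.0}                             # costates from (52): p_{K+1} = 0,
for k in range(K, 0, -1):                    # p_k = a (p_{k+1} + q x_{k+1})
    p[k] = a * (p[k + 1] + q * x[k + 1])

# stationarity of H_k in u at the Riccati-optimal controls
grads = [q * b * x[k + 1] + r * u[k] + b * p[k + 1] for k in range(1, K + 1)]
print(max(abs(v) for v in grads))            # ~ 0, up to rounding
```

Since the control set here is all of R, the arg-min condition of Thm 5 reduces to this stationarity, and the residuals are zero up to floating-point rounding.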
5.5.4  Representations of strategies along trajectories
While discussing some important results of optimal control theory in this section, we have so far confined our treatment to two specific information patterns, and thereby to two types of strategies; namely, (i) the open-loop information pattern dictates strategies that depend only on the initial state and time, (ii) the feedback information pattern forces the permissible strategies to depend only on the current value of the state and time. The former class of strategies is known as open-loop strategies, while the latter class is referred to as feedback strategies. Referring back to the discrete-time linear-quadratic optimal control problem described by (20) (and assuming that c_k = 0 for all k ∈ K, for the sake of brevity in the discussion to follow), the solution presented in Prop. 1, i.e.

γ_k^*(x_k) = −P_k S_{k+1} A_k x_k,  k ∈ K,   (53)
is a feedback strategy, and in fact it constitutes the unique solution of that optimal control problem within the class of feedback strategies. If the optimal control problem had been formulated under the open-loop information pattern, then (53) would clearly not be a solution candidate. However, the unique optimal open-loop strategy for this problem can readily be obtained from (53) as

u_k^* = −P_k S_{k+1} A_k x_k^*,  k ∈ K,   (54a)
where x_k^* is the value of the state at stage k, obtained by substitution of (53) in the state equation; in other words, x* can be solved recursively from the difference equation

x_{k+1}^* = [A_k − B_k P_k S_{k+1} A_k] x_k^*,  x_1^* = x_1.   (54b)

It should be noted that (54a) is an open-loop strategy, since it only depends on the discrete time parameter k and the initial state x_1 (through (54b)). We can also interpret (54a) as the realization of the feedback strategy (53) on the control set U_k; that is to say, (54a) is the action dictated by (53). For this reason, we also sometimes refer to (54a) as the open-loop realization or open-loop value of the feedback strategy (53). The question now is whether we can obtain other solutions (optimal strategies) for the optimal control problem under consideration by extending the strategy space to conform with some other information structure, say the closed-loop perfect state pattern. If we denote by Γ_k the class of all permissible closed-loop strategies (under the CLPS pattern) at stage k (i.e. strategies of the form γ_k(x_k, x_{k−1}, …, x_1)), then (53) is clearly an element of that space, and it constitutes an optimal solution to the optimal control problem (20) also on that enlarged strategy space. As a matter of fact, any element γ_k of Γ_k (k ∈ K) which satisfies the boundary condition

γ_k(x_k^*, x_{k−1}^*, …, x_2^*, x_1^*) = −P_k S_{k+1} A_k x_k^*,  k ∈ K,   (55)
also constitutes an optimal solution. One such optimal policy is the one-step memory strategy

γ_k(x_k, x_{k−1}) = −P_k S_{k+1} A_k x_k + C_k [x_k − F_{k−1} x_{k−1}],

where F_k ≜ A_k − B_k P_k S_{k+1} A_k, and C_k is an arbitrary matrix of appropriate dimensions. It should be noted that all such strategies have the same open-loop value (which is the RHS of (55)) and generate the same state trajectory (54b), both of which follow from (55). Because of these properties, we call them different representations of a single strategy, a precise definition of which is given below. Since the foregoing analysis also applies to continuous-time systems, this definition covers both discrete-time and continuous-time systems.

DEFINITION 11  For a control problem with a strategy space Γ, a strategy γ̃ ∈ Γ is said to be a representation of another strategy γ ∈ Γ if
(i) they both generate the same unique state trajectory, and
(ii) they both have the same open-loop value on this trajectory.  □
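As an invented scalar illustration (c_k = 0) of Def. 11: the feedback law (53), its open-loop realization (54a), and a one-step-memory law of the form above with the arbitrary gain C = 0.7 all generate the same trajectory and the same open-loop value.

```python
# Three representations of one strategy for a scalar instance of (20):
# feedback (53), open-loop (54a), and a one-step-memory law.  The numbers
# a, b, q, r, K and the gain C = 0.7 are an invented test case.
a, b, q, r, K = 1.1, 1.0, 1.0, 1.0, 4

S = {K + 1: q}
P = {}
for k in range(K, 0, -1):                     # Riccati recursion of Prop. 1
    P[k] = b / (r + b * S[k + 1] * b)
    S[k] = q + a * S[k + 1] * (1.0 - b * P[k] * S[k + 1]) * a

F = {k: a - b * P[k] * S[k + 1] * a for k in P}   # F_k = A_k - B_k P_k S_{k+1} A_k

def simulate(strategy):
    xs, us = [1.0], []                        # x_1 = 1
    for k in range(1, K + 1):
        us.append(strategy(k, xs))            # the strategy sees x_k, ..., x_1
        xs.append(a * xs[-1] + b * us[-1])
    return xs, us

fb = lambda k, xs: -P[k] * S[k + 1] * a * xs[-1]      # feedback law (53)
xs_fb, us_fb = simulate(fb)
ol = lambda k, xs: us_fb[k - 1]                       # open-loop value (54a)
mem = lambda k, xs: (fb(k, xs) if k == 1 else         # one-step memory law
                     fb(k, xs) + 0.7 * (xs[-1] - F[k - 1] * xs[-2]))
xs_ol, us_ol = simulate(ol)
xs_mem, us_mem = simulate(mem)

close = lambda A, B: all(abs(s - t) < 1e-9 for s, t in zip(A, B))
print(close(xs_fb, xs_ol), close(xs_fb, xs_mem), close(us_fb, us_mem))
```

The memory term C [x_k − F_{k−1} x_{k−1}] vanishes along the generated trajectory (since x_k = F_{k−1} x_{k−1} there), which is exactly why all three laws belong to one equivalence class of representations.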
Remark 10  It should be noted that the notion of representation of a strategy, as introduced above, involves no optimality. As a matter of fact, it is a property
of the strategy space itself, together with the control system for which it is defined; no cost functional has to be introduced. □

A significance of the notion of "representations" in control problems is that it enables one to construct equivalence classes (of control laws with equal open-loop value) within the general class of closed-loop strategies, so that there is essentially no difference between the elements of each equivalence class. In fact, for an optimal control problem, we can only talk about a unique optimal "equivalence class", instead of a unique optimal "strategy", since, as we have discussed earlier within the context of the optimal control problem (20), every representation of the feedback strategy (53) constitutes an optimal solution to the optimal control problem, and there are uncountably many such (optimal) strategies. As a converse to the italicized statement, we can say that every solution of the optimal control problem (20) is a representation of the feedback strategy (53). Both of these statements readily follow from Def. 11, and they are valid not only for the specific optimal control problem (20) but for the general class of control problems defined in either discrete or continuous time.

We now address ourselves to the problem of constructing equivalence classes of representations of strategies for a control problem with a given space Γ of closed-loop strategies, first for discrete-time systems. The procedure, which follows directly from Def. 11, is as follows. For such a system, first determine the set Γ_OL of all elements of Γ which are strategies that depend only on the initial state x_1 and the discrete time parameter k. Γ_OL, thus constructed, is the class of all permissible open-loop controls in Γ. Now, let γ̃ = {γ̃_k(x_1); k ∈ K} be a chosen element of Γ_OL, which generates (by substitution into (18a)) a (necessarily) unique trajectory {x_k; k ∈ K}. Then, consider all elements γ = {γ_k(·); k ∈ K} of Γ with the properties (i) γ generates the same state trajectory as γ̃, and (ii) γ_k(x_k, x_{k−1}, ..., x_2, x_1) ≡ γ̃_k(x_1), k ∈ K. The subset of Γ thus constructed constitutes an equivalence class of representations which, in this case, have the open-loop value γ̃. If this procedure is executed for every element of Γ_OL, then the construction of all equivalence classes of Γ is complete.

For continuous-time systems, essentially the same procedure can be followed to construct equivalence classes of representations of strategies. In both discrete-time and continuous-time infinite dynamic systems, such a construction leads, in general, to an uncountable number of elements in each equivalence class, which necessarily contains one, and only one, open-loop strategy. The main reason for the "nondenumerable" property of equivalence classes is that, in deterministic problems, the CLPS information pattern (which involves memory) exhibits redundancy in information, thus giving rise to the existence of a plethora of different representations of the same open-loop
policy. This "informational nonuniqueness" property of the CLPS pattern is of no real worry to us in optimal control problems, since every representation of an optimal control strategy also constitutes an optimal solution. However, in dynamic games, the concept of "representations" plays a crucial role in the characterization of noncooperative equilibria, and the preceding italicized statements are no longer valid in a game situation. We defer details of an investigation in that direction to Chapters 6 and 7, and only provide here an extension of Def. 11 to dynamic games.

DEFINITION 12  For an N-person dynamic game with strategy spaces {Γ^i; i ∈ N}, let the strategies of all players, except Pi, be fixed at γ̃^j ∈ Γ^j, j ∈ N, j ≠ i. Then, a strategy γ^i ∈ Γ^i for Pi is a representation of another strategy γ̃^i ∈ Γ^i, with γ̃^j ∈ Γ^j (j ∈ N, j ≠ i) fixed, if
(i) the N-tuples {γ^i, γ̃^j; j ∈ N, j ≠ i} and {γ̃^i, γ̃^j; j ∈ N, j ≠ i} generate the same unique state trajectory, and
(ii) γ^i and γ̃^i have the same open-loop value on this trajectory.  □
Stochastic systems  We devote the remaining portion of this subsection to a brief discussion of the representation of closed-loop strategies in stochastic control problems. In particular, we consider, as a special case of Def. 4, the class of control problems described by the state equation

x_{k+1} = f_k(x_k, u_k) + θ_k,   (56)

where θ_k (k ∈ K) is a random vector taking values in R^n, and the joint probability distribution function of {x_1, θ_1, ..., θ_K} is known. Then, given a feedback strategy {γ_k(·); k ∈ K}, it is in general not possible to write down a representation of this strategy that also involves past values of the state, since now there is no redundancy in the CLPS information, due to the existence of the disturbance θ_k. In other words, while in deterministic systems x_{k+1} can easily be recovered from x_k if one knows the strategy employed, this is no longer possible in stochastic systems, because x_{k+1} also contains some information concerning θ_k. This is particularly true if the random vectors x_1, θ_1, ..., θ_K are Gaussian, statistically independent, and cov(θ_k) > 0† for all k ∈ K. Hence, we have:

PROPOSITION 3  If, for the stochastic system (56), all the random vectors are Gaussian, statistically independent, and the covariance of θ_k is of full rank for all k ∈ K, then every equivalence class of representations is a singleton.  □
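The mechanism behind Proposition 3 can be seen in a few lines of code (again with illustrative parameters; the fixed disturbance sequence stands in for a realization of nondegenerate Gaussian noise): once additive disturbances enter the state equation, x_k − f x_{k−1} equals the previous disturbance rather than zero, so the memory-augmented policy of the earlier example is no longer a representation of the feedback policy.

```python
import numpy as np

a, b, p, C, K, x1 = 1.5, 1.0, 0.9, 5.0, 6, 2.0
f = a - b * p
theta = np.linspace(0.5, 1.5, K)        # fixed, nonzero disturbance sequence

def rollout(use_memory):
    x, xprev, xs = x1, None, [x1]
    for k in range(K):
        u = -p * x                      # the feedback strategy
        if use_memory and xprev is not None:
            # the extra term no longer vanishes: x_k - f*x_{k-1} = theta_{k-1}
            u += C * (x - f * xprev)
        xprev, x = x, a * x + b * u + theta[k]
        xs.append(x)
    return np.array(xs)

print(np.allclose(rollout(False), rollout(True)))   # False: trajectories differ
```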
Remark 11  The reason why we do not allow some random vectors to be

† cov(θ_k) stands for the covariance of θ_k, i.e. the quantity E[(θ_k − E[θ_k])(θ_k − E[θ_k])'].
statistically dependent, and/or cov(θ_k) to have lower rank, is that then some components of x_{k+1} can be expressed in terms of some components of x_k, x_{k−1}, etc., independently of θ_k, θ_{k−1}, ..., which clearly gives rise to nonunique representations of a strategy. Furthermore, we should note that the requirement of a Gaussian probability distribution with positive definite covariance can be replaced by any probability distribution that assigns positive probability mass to every nonempty open subset of R^n. The latter requirement is important because, if any open subset of R^n receives zero probability, then nonunique representations are possible on that particular open set. □

Optimal control of stochastic systems described by (56) can be performed by following the basic steps of dynamic programming applied to the deterministic system (18). Towards this end, we first replace the cost functional (18b) by

J(γ) = E[ Σ_{k=1}^{K} g_k(x_{k+1}, u_k, x_k) | u_k = γ_k(x_l, l ≤ k), k ∈ K ],   (57)
where {γ_k; k ∈ K} is any permissible closed-loop strategy, and E[·] denotes the expectation operation defined with respect to the underlying statistics of x_1, θ_1, ..., θ_K, which we assume to be statistically independent. Then, the value function V(k, x) is defined by

V(k, x) = min_{γ_k, ..., γ_K} E[ Σ_{i=k}^{K} g_i(x_{i+1}, u_i, x_i) | u_i = γ_i(x_l, l ≤ i), i = k, ..., K ]   (58a)

with x_k = x. Following the steps of dynamic programming, and by utilizing the assumption of independence of the random vectors involved, it can readily be shown that the value function satisfies the recursive relation†

V(k, x) = min_{u_k ∈ U_k} E_{θ_k} [ g_k(f_k(x, u_k) + θ_k, u_k, x) + V(k+1, f_k(x, u_k) + θ_k) ],

which also leads to the conclusion that the optimal strategy γ_k (k ∈ K) depends only on the current value of the state, i.e. it is a feedback strategy. We now apply this result to the so-called linear-quadratic discrete-time stochastic optimal control problem described by the linear state equation

x_{k+1} = A_k x_k + B_k u_k + θ_k   (59a)
and the quadratic cost functional

J(γ) = E[ ½ Σ_{k=1}^{K} ( x_{k+1}' Q_{k+1} x_{k+1} + u_k' R_k u_k ) | u_k = γ_k(·), k ∈ K ],   (59b)

† E_{θ_k} denotes the expectation operation with respect to the statistics of θ_k.
where R_k > 0, Q_{k+1} ≥ 0 (k ∈ K); θ_k (k ∈ K) are n-dimensional independent Gaussian vectors with mean value c_k and covariance Λ_k > 0; and γ_k (k ∈ K) are permissible closed-loop strategies. For this special case, V(k, x) again turns out to be quadratic in x, and this leads to the following counterpart of Prop. 1:

PROPOSITION 4  For the linear-quadratic stochastic control problem described by (59), and under the closed-loop perfect state information pattern, there exists a unique minimizing solution (which is a feedback strategy), given by

u_k* = γ_k*(x_k) = −P_k S_{k+1} A_k x_k − P_k (s_{k+1} + S_{k+1} c_k),  k ∈ K,   (60)

where all the terms are defined in Prop. 1.  □
Remark 12  It should be noted that the solution of the stochastic linear-quadratic optimal control problem depends only on the mean value (c_k, k ∈ K) of the additive disturbance term, but not on its covariance (Λ_k, k ∈ K).† Moreover, the solution coincides with the optimal feedback solution of the deterministic version of the problem (as presented in Prop. 1), which takes the mean value of the Gaussian disturbance term as a constant known input. Finally, we note that, in view of Remark 11, uniqueness of the solution is valid under the assumption that the covariance matrix Λ_k is positive definite for all k ∈ K. □
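The certainty-equivalence property stated in Remark 12 can be checked numerically. The sketch below uses illustrative data and re-derives the affine dynamic-programming recursion locally (the symbols S, s, Gm are internal to this example and are not the quantities of Prop. 1): the resulting gains involve the noise mean c but never the covariance, and their optimality among affine policies is confirmed by comparing exact expected costs, computed by propagating means and covariances rather than by sampling, against randomly perturbed gains.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, K = 2, 1, 4
A = np.array([[1.0, 0.3], [0.0, 1.1]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(n), np.eye(m)
c = np.array([0.2, -0.1])               # noise mean
Lam = 0.5 * np.eye(n)                   # noise covariance (never enters the gains)

def dp_gains():
    # Backward recursion with an affine value function V_k(x) = x'Sx/2 + s'x + const.
    S, s, gains = np.zeros((n, n)), np.zeros(n), []
    for _ in range(K):
        T, t = Q + S, s
        Gm = R + B.T @ T @ B
        Kk = -np.linalg.solve(Gm, B.T @ T @ A)        # feedback gain
        mk = -np.linalg.solve(Gm, B.T @ (T @ c + t))  # feedforward: uses the mean only
        S = A.T @ (T - T @ B @ np.linalg.solve(Gm, B.T @ T)) @ A
        s = A.T @ (T @ c + t - T @ B @ np.linalg.solve(Gm, B.T @ (T @ c + t)))
        gains.append((Kk, mk))
    return gains[::-1]                  # reorder as k = 1..K

def exact_cost(gains):
    # Exact expected cost of an affine policy u = K x + m via mean/covariance propagation.
    mu, Sig, J = np.array([1.0, -1.0]), np.zeros((n, n)), 0.0
    for Kk, mk in gains:
        u = Kk @ mu + mk
        J += 0.5 * (u @ R @ u + np.trace(Kk.T @ R @ Kk @ Sig))
        Acl = A + B @ Kk
        mu = Acl @ mu + B @ mk + c
        Sig = Acl @ Sig @ Acl.T + Lam
        J += 0.5 * (mu @ Q @ mu + np.trace(Q @ Sig))
    return J

g = dp_gains()
base = exact_cost(g)
for _ in range(5):                      # any perturbed affine policy costs at least as much
    gp = [(Kk + 0.1 * rng.standard_normal(Kk.shape), mk) for Kk, mk in g]
    assert exact_cost(gp) >= base - 1e-9
```

Note that `dp_gains` never reads `Lam`: the covariance only shifts the achievable cost (through the trace terms in `exact_cost`), exactly as the footnote to Remark 12 states.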
5.6
PROBLEMS
1. A company has x(t) customers at time t and makes a total profit of

f(t) = ∫_0^t ( c x(s) − u(s) ) ds,
up to time t, where c is a positive constant. The function u(t), restricted by u(t) ≥ 0, f(t) ≥ 0, represents the money put into advertising. The restriction f(t) ≥ 0 indicates that the company is not allowed to borrow money. Advertisement helps to increase the number of customers, according to ẋ = u, x(0) = x_0 > 0. The company wants to maximize the total profit during a given time period [0, T], where T is fixed. Obtain both the open-loop and feedback optimal solutions for this problem.

2. Consider the two-dimensional system

ẋ_1 = x_2 u
ẋ_2 = 1 + x_1 u,

† The minimum expected value of the cost function does, however, depend on {Λ_k, k ∈ K}.
where the control variable u is scalar and satisfies |u(t)| ≤ 1. In polar coordinates, the equations of motion become

ṙ = cos α
r α̇ = −ru + sin α,

where α is the angle measured clockwise from the positive x_2-axis. The target set Λ is |α| ≤ ᾱ, where 0 < ᾱ < π. (For an interpretation of this system, see Example 8.3, later in Chapter 8, with v_2 = 0.) The decision maker wants the state (x_1, x_2) to stay away from Λ for as long as possible. Determine the time-optimal control. Show that, for ᾱ ≥ π/2, there is a barrier.

3. Consider the following discrete-time, discrete state space optimal control problem. The state space consists of the integers {0, 1, 2, ...}. At time t_k the decision maker steers the system, whose position is integer i_k ≥ 1, either to {0} or to the integer i_k + 1, where it arrives at time t_{k+1}. The control problem terminates when the system is steered to the origin {0}. The payoff is completely determined by the last move. If the system moves from {i_k} to {0}, then the payoff is 2^{i_k}, to be maximized by the decision maker. Show that each playable pure strategy leads to a finite value, and that the behavioural strategy which dictates both alternatives with equal probability (½) at each stage leads to an infinite expected payoff.

*4. Consider the following discrete-time, discrete state space, zero-sum differential game. The state space consists of the integers {0, 1, 2, ...}. The target set is Λ = {0}. Suppose that at time t_k P1's position is integer i_k. At the next time step he may move 0, 1 or 2 positions to the left, i.e. at t_{k+1} P1's possible positions are i_k, i_k − 1 or i_k − 2. During each time step t_k, P2 drops a bomb on one of the integers, and he knows P1's position at time t_{k−1}. However, P2 does not know what P1's position at t_k will be (apart from the fact that it is either i_{k−1}, i_{k−1} − 1 or i_{k−1} − 2 if his position was i_{k−1} at t_{k−1}). The game ends when either P1 reaches the integer 0 or P2 drops a bomb on P1's current position, whichever occurs first. The payoff function is P1's chance of reaching {0} without being bombed, which P1 wants to maximize and P2 wishes to minimize.
Show that behavioural strategies are the appropriate ones to solve this game. Determine its saddle point and value in behavioural strategies.
5.7
NOTES
Section 5.2  As already noted in Remark 2, the state-model description of a discrete-time (deterministic or stochastic) infinite dynamic game, as presented here, is clearly not the most general extension of Kuhn's finite game model (cf. Chapters 2 and 3) to the
infinite case. One such extension, which has not been covered in Remark 2, is the infinite-horizon problem, wherein the number of stages K tends to infinity. In such games, the so-called stationary strategies (i.e. strategies which do not depend on the discrete time parameter k) become of real interest, and they have hitherto attracted considerable attention in the literature (see e.g. Shapley, 1953; Sobel, 1971; Maitra and Parthasarathy, 1970a, b), but mostly for finite, or at most denumerable, state and strategy spaces. The general class of discrete-time stochastic dynamic games under the feedback information structure are also referred to as Markov games in the literature. For a recent survey article in this context, see Parthasarathy and Stern (1977).

Section 5.3  Differential games were first introduced by Isaacs (1954-56), within the framework of two-person zero-sum games, whose work culminated in the publication of his book about ten years later (see Isaacs, 1975). Nonzero-sum differential games were later introduced in the works of Starr and Ho (1969a, b), Case (1967) and Friedman (1971), but under specific information structures, namely the open-loop and feedback perfect state information patterns. Some representative references on stochastic differential games, on the other hand, are Friedman (1972), Elliott (1976) and Basar (1977b). A class of differential games not covered by Def. 5 are those whose state dynamics are governed by partial differential equations. These will not be treated in this book; for some examples of practical games which fit into this framework, the reader is referred to Roxin (1977). Yet another class of differential games not to be covered in subsequent chapters are those in which the players are allowed to employ impulsive controls; see Blaquiere (1977) for some results on this topic.
Section 5.4  Mixed and behavioural strategies in infinite dynamic games were first introduced by Aumann (1964), who also provided a rigorous extension of Kuhn's extensive form description of finite games to infinite games, and proved in this context that if an N-person nonzero-sum infinite dynamic game is of "perfect recall" for one player, then that player can restrict himself to behavioural strategies. For a further discussion of mixed strategies in differential games, and for an elaboration on the relation of this concept to that of "relaxed controls" used in control theory (cf. Warga, 1972), the reader is referred to Wilson (1977), who also proves the existence of mixed-strategy saddle points in suitably structured zero-sum differential games of fixed duration and with open-loop information structure. Some other references which deal with the existence of mixed-strategy equilibrium solutions in differential games are Elliott (1976), Pachter and Yavin (1979), Levine (1979) and Shiau and Kumar (1979).

Section 5.5  The concept of "representations of a strategy" was introduced by Basar in a series of papers, within the context of infinite dynamic games (see, e.g. Basar, 1974, 1976a, 1977b), and it was shown to be closely related to the existence of nonunique Nash equilibria under the CLPS information pattern, as will be elucidated in Chapter 6.
Chapter 6
Nash Equilibria of Infinite Dynamic Games

6.1 INTRODUCTION
This chapter discusses the properties and derivation of Nash equilibria in infinite dynamic games of prescribed fixed duration. The analysis is confined primarily to dynamic games defined in discrete time, and with a finite number of stages; however, counterparts of most of the results are obtained also in continuous time, and are included in the two appendices following the main body of the chapter. Utilization of the two standard techniques of optimal control theory, viz. the minimum principle and dynamic programming, leads to the so-called open-loop and feedback Nash equilibrium solutions, respectively. These two different Nash equilibria, their derivation and their properties (such as existence and uniqueness) are discussed in section 2, and the results are also specialized to two-person zero-sum and linear-quadratic games. When the underlying information structure for at least one player is dynamic and involves memory, a plethora of (so-called informationally nonunique) Nash equilibria with different sets of cost values exists, whose derivation entails some rather intricate analysis, not totally based on standard techniques of optimal control theory. This derivation, as well as several important features of Nash equilibria under the closed-loop perfect state information pattern, are discussed in section 3, first within the context of a scalar three-person two-stage game (cf. section 6.3.1) and then for general dynamic games in discrete time (cf. section 6.3.2). Section 4 is devoted to the derivation of necessary and sufficient conditions for Nash equilibria in stochastic nonzero-sum dynamic games with deterministic information patterns. Such a stochastic formulation eliminates informational nonuniqueness, thereby making the question of existence of a unique Nash equilibrium under the closed-loop perfect state information pattern a meaningful one.
The last two sections, also named Appendices, present the counterparts of the results of sections 2 and 4 in continuous time, that is, for differential games of fixed duration.
6.2 OPEN-LOOP AND FEEDBACK NASH EQUILIBRIA FOR DYNAMIC GAMES IN DISCRETE TIME

Within the context of Def. 5.1, consider the class of N-person discrete-time deterministic infinite dynamic games of prescribed fixed duration (K stages), which are described by the state equation

x_{k+1} = f_k(x_k, u_k^1, ..., u_k^N),   (1)
where x_k ∈ X = R^n and x_1 is specified a priori. Furthermore, the control sets U_k^i are taken as measurable subsets of R^{m_i} (i.e. U_k^i ⊆ R^{m_i}; k ∈ K, i ∈ N), and a stage-additive cost functional
L^i(u^1, ..., u^N) = Σ_{k=1}^{K} g_k^i(x_{k+1}, u_k^1, ..., u_k^N, x_k)   (2)

is given for each player i ∈ N. In this section, we discuss the derivation of Nash equilibria for this class of nonzero-sum games when the information structure of the game is either (i) open-loop or (ii) memoryless perfect state for all the players. In the former case, any permissible strategy for Pi at stage k ∈ K is a constant function, and can therefore be considered to be an element of U_k^i, i.e. Γ_k^i = U_k^i, k ∈ K, i ∈ N. Under the latter information structure, however, any permissible strategy for Pi at stage k ∈ K is a measurable mapping γ_k^i : X × X → U_k^i, i ∈ N, k ∈ K. With Γ_k^i (k ∈ K, i ∈ N) taken as the appropriate strategy space in each case, we recall (from Def. 3.12) that an N-tuple of strategies {γ^{i*} ∈ Γ^i; i ∈ N} constitutes a Nash equilibrium solution if, and only if, the following inequalities are satisfied for all {γ^i ∈ Γ^i; i ∈ N}:

J^{1*} ≜ J^1(γ^{1*}; γ^{2*}; ...; γ^{N*}) ≤ J^1(γ^1; γ^{2*}; ...; γ^{N*}),
J^{2*} ≜ J^2(γ^{1*}; γ^{2*}; γ^{3*}; ...; γ^{N*}) ≤ J^2(γ^{1*}; γ^2; γ^{3*}; ...; γ^{N*}),
⋮   (3)
J^{N*} ≜ J^N(γ^{1*}; ...; γ^{(N−1)*}; γ^{N*}) ≤ J^N(γ^{1*}; ...; γ^{(N−1)*}; γ^N).

Here, γ^i ≜ {γ_1^i, ..., γ_K^i} is the aggregate strategy of Pi, and the notation γ^i ∈ Γ^i stands for γ_k^i ∈ Γ_k^i, ∀k ∈ K. Moreover, J^i(γ^1, ..., γ^N) is equivalent to L^i(u^1, ..., u^N) with u_k^i replaced by γ_k^i(·) (i ∈ N, k ∈ K).
Under the open-loop information structure, we refer to the Nash solution as the "open-loop Nash equilibrium solution", which we discuss in the first subsection to follow. For the memoryless perfect state information pattern, the Nash solution will be referred to as the "closed-loop no-memory Nash equilibrium solution", and under a further restriction, which is the counterpart of Def. 3.22 in the present framework, it will be called the "feedback Nash equilibrium solution". Both of these types of equilibria will be discussed in section 6.2.2.
6.2.1
Open-loop Nash equilibria
One method of obtaining the open-loop Nash equilibrium solution(s) of the class of discrete-time games formulated above is to view them as static infinite games and directly apply the analysis of Chapter 4. Towards this end, we first note that, by backwards recursive substitution of (1) into (2), it is possible to express L^i solely as a function of {u^j; j ∈ N} and the initial state x_1, whose value is known a priori, where u^i is defined as the aggregate control vector (u_1^{i'}, u_2^{i'}, ..., u_K^{i'})'. This then implies that, to every given set of functions {f_k, g_k^i; k ∈ K}, there corresponds a unique function L̄^i : X × U^1 × ... × U^N → R, which is the cost functional of Pi, i ∈ N. Here, U^j denotes the aggregate control set of Pj, compatible with the requirement that if u^j ∈ U^j, then u_k^j ∈ U_k^j, ∀k ∈ K.
THEOREM 1  For an N-person discrete-time infinite dynamic game, let
(i) f_k(·, u_k^1, ..., u_k^N) be continuously differentiable on R^n, (k ∈ K),
(ii) g_k^i(·, u_k^1, ..., u_k^N, ·) be continuously differentiable on R^n × R^n, (k ∈ K, i ∈ N).
Then, if {γ^{i*}(x_1) = u^{i*}; i ∈ N} provides an open-loop Nash equilibrium solution, and {x_{k+1}^*; k ∈ K} is the corresponding state trajectory, there exists a finite sequence of n-dimensional (costate) vectors {p_2^i, ..., p_{K+1}^i} for each i ∈ N, such that the following relations are satisfied:

x_{k+1}^* = f_k(x_k^*, u_k^{1*}, ..., u_k^{N*}),  x_1^* = x_1,   (4a)

γ_k^{i*}(x_1) ≡ u_k^{i*} = arg min_{u_k^i ∈ U_k^i} H_k^i(p_{k+1}^i, u_k^{1*}, ..., u_k^{(i−1)*}, u_k^i, u_k^{(i+1)*}, ..., u_k^{N*}, x_k^*),   (4b)

p_k^i = [ ∂f_k(x_k^*, u_k^{1*}, ..., u_k^{N*}) / ∂x_k ]' [ p_{k+1}^i + ( ∂g_k^i / ∂x_{k+1} )' ] + ( ∂g_k^i / ∂x_k )',  p_{K+1}^i = 0,   (4c)

where

H_k^i(p_{k+1}^i, u_k^1, ..., u_k^N, x_k) ≜ g_k^i(f_k(x_k, u_k^1, ..., u_k^N), u_k^1, ..., u_k^N, x_k) + p_{k+1}^{i'} f_k(x_k, u_k^1, ..., u_k^N),  (k ∈ K, i ∈ N).
Proof  Consider the ith inequality of (3), which says that γ^{i*}(x_1) ≡ u^{i*} minimizes L̄^i(u^{1*}, ..., u^{(i−1)*}, u^i, u^{(i+1)*}, ..., u^{N*}) over U^i, subject to the state equation

x_{k+1} = f_k(x_k, u_k^{1*}, ..., u_k^{(i−1)*}, u_k^i, u_k^{(i+1)*}, ..., u_k^{N*}),  x_1 given.
But this is a standard optimal control problem for Pi, since u^{j*} (j ∈ N, j ≠ i) are open-loop controls and hence do not depend on u^i. The result then follows directly from the minimum principle for discrete-time control systems (cf. Thm 5.5). □

Theorem 1 thus provides a set of necessary conditions (solvability of a set of coupled two-point boundary value problems) for the open-loop Nash solution to satisfy; in other words, it produces candidate equilibrium solutions. In principle, one has to determine all solutions of this set of equations and further investigate which of these candidate solutions satisfy the original set of inequalities (3). If some further restrictions are imposed on f_k and g_k^i (i ∈ N, k ∈ K), so that the resulting cost functional L̄^i (defined earlier in this subsection) is convex in u^i for all u^j ∈ U^j, j ∈ N, j ≠ i, the latter phase of
verification can clearly be eliminated, since then every solution set of (4) constitutes an open-loop Nash equilibrium solution. A specific class of problems for which this can be done, and for which the conditions involved can be expressed explicitly in terms of the parameters of the game, is the class of so-called "linear-quadratic" games, which we first introduce below.

DEFINITION 1  An N-person discrete-time infinite dynamic game is of the linear-quadratic type if U_k^i = R^{m_i} (i ∈ N, k ∈ K), and

f_k(x_k, u_k^1, ..., u_k^N) = A_k x_k + Σ_{j∈N} B_k^j u_k^j,
g_k^i(x_{k+1}, u_k^1, ..., u_k^N) = ½ ( x_{k+1}' Q_{k+1}^i x_{k+1} + Σ_{j∈N} u_k^{j'} R_k^{ij} u_k^j ),   (5a)

where A_k, B_k^j, Q_{k+1}^i, R_k^{ij} are matrices of appropriate dimensions, Q_{k+1}^i is symmetric, R_k^{ii} > 0, and k ∈ K, i ∈ N. □

THEOREM 2  For an N-person linear-quadratic dynamic game with Q_{k+1}^i ≥ 0 (i ∈ N, k ∈ K), let Λ_k, M_k^i (k ∈ K, i ∈ N) be appropriate dimensional matrices defined by

Λ_k = I + Σ_{i∈N} B_k^i [R_k^{ii}]^{-1} B_k^{i'} M_{k+1}^i,   (6a)
M_k^i = Q_k^i + A_k' M_{k+1}^i Λ_k^{-1} A_k;  M_{K+1}^i = Q_{K+1}^i.   (6b)
If Λ_k (k ∈ K), thus recursively defined, is invertible, the game admits a unique open-loop Nash equilibrium solution given by

γ_k^{i*}(x_1) ≡ u_k^{i*} = −[R_k^{ii}]^{-1} B_k^{i'} M_{k+1}^i Λ_k^{-1} A_k x_k^*,   (k ∈ K, i ∈ N)   (7a)

where {x_{k+1}^*; k ∈ K} is the associated state trajectory, determined from

x_{k+1}^* = Λ_k^{-1} A_k x_k^*,  x_1^* = x_1.   (7b)
Proof  Since Q_{k+1}^i ≥ 0 and R_k^{ii} > 0, for the linear-quadratic game L̄^i(x_1, u^1, ..., u^N) is a strictly convex function of u^i, for all u^j ∈ R^{m_j K} (j ≠ i, j ∈ N) and for all x_1 ∈ R^n. Therefore, every solution set of (4) provides an open-loop Nash solution. Hence, the proof will be completed if we can show that (7a) is the only candidate solution.

† Note that g_k^i can also be taken to depend on x_k instead of x_{k+1}, as in the proof of Prop. 5.1; but we prefer here the present structure, for convenience in the analysis to follow.
First note that

H_k^i = ½ (A_k x_k + Σ_{j∈N} B_k^j u_k^j)' Q_{k+1}^i (A_k x_k + Σ_{j∈N} B_k^j u_k^j) + ½ Σ_{j∈N} u_k^{j'} R_k^{ij} u_k^j + p_{k+1}^{i'} (A_k x_k + Σ_{j∈N} B_k^j u_k^j),

and since Q_{k+1}^i ≥ 0, R_k^{ii} > 0, minimization of this "Hamiltonian" over u_k^i ∈ R^{m_i} yields the unique relation

u_k^{i*} = −[R_k^{ii}]^{-1} B_k^{i'} [Q_{k+1}^i x_{k+1}^* + p_{k+1}^i],   (i)

where

x_{k+1}^* = A_k x_k^* + Σ_{j∈N} B_k^j u_k^{j*},  x_1^* = x_1.   (ii)

Furthermore, the costate (difference) equation in (4) reads

p_k^i = A_k' [p_{k+1}^i + Q_{k+1}^i x_{k+1}^*];  p_{K+1}^i = 0.   (iii)

Let us start with k = K, in which case (i) becomes

u_K^{i*} = −[R_K^{ii}]^{-1} B_K^{i'} Q_{K+1}^i x_{K+1}^*,   (iv)

and if both sides are first premultiplied by B_K^i and then summed over i ∈ N, we obtain, by also making use of (ii) and (6a),

Λ_K x_{K+1}^* = A_K x_K^*,

which further yields the unique relation

x_{K+1}^* = Λ_K^{-1} A_K x_K^*,

which is precisely (7b) for k = K. Substitution of this relation into (iv) then leads to (7a) for k = K.

We now prove by induction that the unique solution set of (i)-(iii) is given by (7a)-(7b) and p_k^i = (M_k^i − Q_k^i) x_k^* (i ∈ N, k ∈ K). Let us assume that this is true for k = l + 1 (already verified for l = K − 1), and prove its validity for k = l. Firstly, using the solution p_{l+1}^i = (M_{l+1}^i − Q_{l+1}^i) x_{l+1}^* in (i), we have

u_l^{i*} = −[R_l^{ii}]^{-1} B_l^{i'} M_{l+1}^i x_{l+1}^*.   (v)

Again, premultiplying this by B_l^i and summing over i ∈ N leads to, also in view of (ii) and (6a),

x_{l+1}^* = Λ_l^{-1} A_l x_l^*,

which is (7b). If this relation is used in (v), we obtain the unique control vectors (7a), for k = l, and if it is further used in (iii), we obtain

p_l^i = A_l' M_{l+1}^i Λ_l^{-1} A_l x_l^*,
which clearly indicates, in view of (6b), that

p_l^i = (M_l^i − Q_l^i) x_l^*.

This then completes the proof by induction, and thereby the proof of the theorem. □

Remark 1  An alternative derivation of the open-loop Nash equilibrium solution of the linear-quadratic game is (as discussed earlier in this subsection, in a general context) to convert it into a standard static quadratic game and then to make use of the available results on such games (cf. section 4.6). By backward recursive substitution of the state vector from the state equation into the quadratic cost functionals, it is possible to bring the cost functional of Pi into the structural form given by (4.9), which, furthermore, is strictly convex in u^i = (u_1^{i'}, ..., u_K^{i'})' because of the assumptions Q_{k+1}^i ≥ 0, R_k^{ii} > 0 (k ∈ K). Consequently, each player has a unique reaction curve, and the condition for existence of a unique Nash solution becomes equivalent to the condition for unique intersection of these reaction curves (cf. Prop. 4.5).† The existence condition of Thm 2, i.e. nonsingularity of Λ_k (k ∈ K), is precisely that condition, but expressed in a different (more convenient, recursive) form. For yet another derivation of the result of Thm 2, by making use of Prop. 5.1, the reader is referred to Problem 1, section 7. □
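The recursion of Thm 2 is straightforward to implement. The sketch below (with illustrative, hypothetical problem data) runs (6a)-(6b) backward, generates the open-loop Nash controls and trajectory from (7a)-(7b), and then verifies the Nash property independently: each player's exact best response, obtained by solving the stacked (batch) quadratic problem with the other player's control sequence held fixed, coincides with the recursion's output.

```python
import numpy as np

n, m, K = 2, 1, 3
A = np.array([[1.0, 0.2], [0.1, 0.9]])
B = [np.array([[1.0], [0.0]]), np.array([[0.0], [1.0]])]   # B^1, B^2
Q = [np.eye(n), 2.0 * np.eye(n)]        # Q^i (taken constant over k)
R = [2.0 * np.eye(m), np.eye(m)]        # R^{ii} (constant over k)
x1 = np.array([1.0, -1.0])

# Backward recursion (6a)-(6b), storing Lam_k and M^i_{k+1} for k = 1..K:
M = [Q[0].copy(), Q[1].copy()]          # M^i_{K+1} = Q^i_{K+1}
Lams, Ms = [], []
for _ in range(K):
    Lam = np.eye(n) + sum(B[i] @ np.linalg.solve(R[i], B[i].T) @ M[i]
                          for i in range(2))
    Lams.append(Lam); Ms.append(list(M))
    M = [Q[i] + A.T @ M[i] @ np.linalg.solve(Lam, A) for i in range(2)]
Lams.reverse(); Ms.reverse()

# Forward pass (7a)-(7b):
x, U = x1, [[], []]
for k in range(K):
    xnext = np.linalg.solve(Lams[k], A @ x)          # x*_{k+1}
    for i in range(2):
        U[i].append(-np.linalg.solve(R[i], B[i].T @ Ms[k][i] @ xnext))
    x = xnext
U = [np.concatenate(u) for u in U]

# Best-response check in batch form: X = G x1 + H^1 U^1 + H^2 U^2.
G = np.vstack([np.linalg.matrix_power(A, k) for k in range(1, K + 1)])
def stack(Bi):
    Hi = np.zeros((n * K, m * K))
    for r in range(1, K + 1):
        for s in range(1, r + 1):
            Hi[n*(r-1):n*r, m*(s-1):m*s] = np.linalg.matrix_power(A, r-s) @ Bi
    return Hi
H = [stack(B[0]), stack(B[1])]
for i in range(2):
    j = 1 - i
    Qb, Rb = np.kron(np.eye(K), Q[i]), np.kron(np.eye(K), R[i])
    best = np.linalg.solve(H[i].T @ Qb @ H[i] + Rb,
                           -H[i].T @ Qb @ (G @ x1 + H[j] @ U[j]))
    assert np.allclose(best, U[i], atol=1e-8)
```

The batch check exploits the convexity noted in the proof: with Q^i ≥ 0 and R^{ii} > 0, each player's stacked first-order condition characterizes his unique best response exactly, so agreement with the recursion confirms the Nash property.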
We now turn our attention to another special class of nonzero-sum dynamic games, the two-person zero-sum games, in which case

g_k^1 ≡ −g_k^2 ≜ g_k   (8a)

(assuming, in accordance with our earlier convention, that P1 is the minimizer and P2 is the maximizer), and the "Nash" inequalities (3) reduce to the saddle-point inequality

J(γ^{1*}, γ^2) ≤ J(γ^{1*}, γ^{2*}) ≤ J(γ^1, γ^{2*}),  ∀γ^1 ∈ Γ^1, γ^2 ∈ Γ^2,   (8b)

where J ≜ J^1 ≡ −J^2. Directly applying Thm 1 in the present context, we first have from (4), in view of (8a), the relation p_k^1 = −p_k^2 ≜ p_k (k ∈ K), and therefore, from (4), H_k^1 ≡ −H_k^2 ≜ H_k (k ∈ K). Hence, we have arrived at the following conclusion:

THEOREM 3  For a two-person zero-sum discrete-time infinite dynamic game, let
(i) f_k(·, u_k^1, u_k^2) be continuously differentiable on R^n, (k ∈ K),
(ii) g_k(·, u_k^1, u_k^2, ·) be continuously differentiable on R^n × R^n, (k ∈ K).

† For a derivation of the open-loop Nash solution in two-person linear-quadratic games along these lines, the reader is referred to Basar (1976a) and Olsder (1977a).
Then, if {γ^{i*}(x_1) = u^{i*}; i = 1, 2} provides an open-loop saddle-point solution, and {x_{k+1}^*; k ∈ K} is the corresponding state trajectory, there exists a finite sequence of n-dimensional (costate) vectors {p_2, ..., p_{K+1}} such that the following relations are satisfied:

x_{k+1}^* = f_k(x_k^*, u_k^{1*}, u_k^{2*}),  x_1^* = x_1,   (9a)

H_k(p_{k+1}, u_k^{1*}, u_k^2, x_k^*) ≤ H_k(p_{k+1}, u_k^{1*}, u_k^{2*}, x_k^*) ≤ H_k(p_{k+1}, u_k^1, u_k^{2*}, x_k^*),
∀u_k^1 ∈ U_k^1, u_k^2 ∈ U_k^2,   (9b)

p_k = [ ∂f_k(x_k^*, u_k^{1*}, u_k^{2*}) / ∂x_k ]' [ p_{k+1} + ( ∂g_k / ∂x_{k+1} )' ] + ( ∂g_k / ∂x_k )',  p_{K+1} = 0,   (9c)

where

H_k(p_{k+1}, u_k^1, u_k^2, x_k) ≜ g_k(f_k(x_k, u_k^1, u_k^2), u_k^1, u_k^2, x_k) + p_{k+1}' f_k(x_k, u_k^1, u_k^2),  k ∈ K.   (9d)

Proof  Follows from Thm 1, as discussed earlier. □
As a specific application of this theorem, we now consider the class of linear-quadratic two-person zero-sum dynamic games (cf. Def. 1) described by the state equation

x_{k+1} = A_k x_k + B_k^1 u_k^1 + B_k^2 u_k^2,  x_1 given,   (10a)

and the objective functional

L(u^1, u^2) = ½ Σ_{k=1}^{K} ( x_{k+1}' Q_{k+1} x_{k+1} + u_k^{1'} u_k^1 − u_k^{2'} u_k^2 ),   (10b)

which P1 wishes to minimize and P2 attempts to maximize. It should be noted that we have taken the weighting matrices for the controls in (10b) as identity matrices, without any loss of generality, since otherwise they can be absorbed into B_k^1 and B_k^2, provided, of course, that R_k^{11} > 0 and R_k^{22} > 0, which was our a priori assumption in Def. 1. Let us also assume that Q_{k+1} ≥ 0 (k ∈ K), which essentially makes L(u^1, u^2) strictly convex in u^1. In order to formulate a meaningful problem, we also have to require L(u^1, u^2) to be (strictly) concave in u^2, since otherwise P2 can make the value of L unboundedly large. The following lemma now provides a necessary and sufficient condition for (10b) to be strictly concave in u^2.
LEMMA 1  For the linear-quadratic two-person zero-sum dynamic game introduced above, the objective functional L(u^1, u^2) is strictly concave in u^2 (for all u^1 ∈ R^{Km_1}) if, and only if,

I − B_k^{2'} S_{k+1} B_k^2 > 0,  (k ∈ K),   (11a)

where S_k is given by

S_k = Q_k + A_k' S_{k+1} A_k + A_k' S_{k+1} B_k^2 [I − B_k^{2'} S_{k+1} B_k^2]^{-1} B_k^{2'} S_{k+1} A_k;  S_{K+1} = Q_{K+1}.   (11b)
Proof  Since L(u^1, u^2) is a quadratic function of u^2, the requirement of strict concavity is equivalent to the existence of a unique solution to the optimal control problem

min_{u^2 ∈ R^{Km_2}} [ −L(u^1, u^2) ]

subject to (10a), and for each u^1 ∈ R^{Km_1}. Furthermore, since the Hessian matrix (matrix of second partials) of L with respect to u^2 is independent of u^1, we can instead consider the optimal control problem

min_{u^2 ∈ R^{Km_2}} [ −L(0, u^2) ]

subject to

x_{k+1} = A_k x_k + B_k^2 u_k^2;

that is, we can take u^1 ≡ 0 without any loss of generality. Then, the result follows from the dynamic programming technique outlined in section 5.5.1, and in particular from (5.19), which admits the unique solution V(k, x) = −x' S_k x + x' Q_k x if, and only if, (11a) holds. □

We are now in a position to present the open-loop saddle-point solution of the linear-quadratic two-person zero-sum dynamic game.

THEOREM 4  For the two-person linear-quadratic zero-sum dynamic game described by (10a)-(10b) and with Q_{k+1} ≥ 0 (k ∈ K), let condition (11a) be satisfied, and let Λ_k, M_k (k ∈ K) be appropriate dimensional matrices defined through

Λ_k = I + (B_k^1 B_k^{1'} − B_k^2 B_k^{2'}) M_{k+1},   (12a)
M_k = Q_k + A_k' M_{k+1} Λ_k^{-1} A_k;  M_{K+1} = Q_{K+1}.   (12b)

Then,
(i) Λ_k (k ∈ K), thus recursively defined, is invertible,
(ii) the game admits a unique open-loop saddle-point solution given by
γ_k^{1*}(x_1) ≡ u_k^{1*} = −B_k^{1'} M_{k+1} Λ_k^{-1} A_k x_k^*,   (13a)
γ_k^{2*}(x_1) ≡ u_k^{2*} = +B_k^{2'} M_{k+1} Λ_k^{-1} A_k x_k^*,  k ∈ K,   (13b)

where {x_{k+1}^*; k ∈ K} is the corresponding state trajectory, determined from

x_{k+1}^* = Λ_k^{-1} A_k x_k^*,  x_1^* = x_1.   (13c)

Proof  Let us first note that, with the open-loop information structure and under condition (11a) (cf. Lemma 1), the game is a static strictly convex-concave quadratic zero-sum game, which admits a unique saddle-point solution by Corollary 4.5. Secondly, it follows from Thm 3 that this unique saddle-point solution should satisfy relations (9), which can be rewritten, for the linear-quadratic game, as follows:
x_{k+1}^* = A_k x_k^* + B_k^1 u_k^{1*} + B_k^2 u_k^{2*},  x_1^* = x_1,   (i)

H_k(p_{k+1}, u_k^{1*}, u_k^2, x_k^*) ≤ H_k(p_{k+1}, u_k^{1*}, u_k^{2*}, x_k^*) ≤ H_k(p_{k+1}, u_k^1, u_k^{2*}, x_k^*),
∀u^1 ∈ R^{Km_1}, u^2 ∈ R^{Km_2},   (ii)

p_k = A_k' [ p_{k+1} + Q_{k+1} x_{k+1}^* ],  p_{K+1} = 0,   (iii)

where

H_k = ½ (A_k x_k + B_k^1 u_k^1 + B_k^2 u_k^2)' Q_{k+1} (A_k x_k + B_k^1 u_k^1 + B_k^2 u_k^2) + ½ u_k^{1'} u_k^1 − ½ u_k^{2'} u_k^2 + p_{k+1}' (A_k x_k + B_k^1 u_k^1 + B_k^2 u_k^2),  k ∈ K.
Now, start with k = K, to determine u_K^{1*} and u_K^{2*} from (ii) (and also by noting that p_{K+1} = 0). The result is the unique set of relations

u_K^{1*} = −B_K^{1'} Q_{K+1} x_{K+1}^*,
u_K^{2*} = +B_K^{2'} Q_{K+1} x_{K+1}^*,

which, when substituted into (i), yield the relation

Λ_K x_{K+1}^* = A_K x_K^*.

But since there exists a unique saddle-point solution, there necessarily exists a unique relation between x_K^* and x_{K+1}^*, implying that Λ_K should be invertible. Then,

x_{K+1}^* = Λ_K^{-1} A_K x_K^*,

and

u_K^{i*} = (−)^i B_K^{i'} M_{K+1} Λ_K^{-1} A_K x_K^*;  i = 1, 2.

Hence, the theorem is verified for k = K. Now, proceeding to k = K − 1 and following the same steps, we first obtain

u_{K−1}^{i*} = (−)^i B_{K−1}^{i'} (Q_K x_K^* + p_K);  i = 1, 2.
6
249
NASH EQUILIBRIA
But, from (iii), and if this is used in the preceding expression for U~l=
()iBL 1M K 4
UiK _I
; i=
we obtain
1,2
where we have also made use of (12b). Substitution of this last set of relations into (i) again leads to which implies that A K is invertible and since there exists a unique saddle point. Furthermore, we have U~l
=
(_)i Bi~_l
M'j(A;( 1 A K 
141;
i
= 1,2,
which are (13a) and (13b); hence the result is verified for k = K 1. A simple inductive argument then verifies the desired result, by also making use of the recursive relation (12b). 0
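The recursion of Thm 4 is easily implemented. The following Python sketch is our own illustration (the time-invariant matrices and the function API are assumptions for brevity, not part of the theorem):

```python
import numpy as np

def open_loop_saddle_point(A, B1, B2, Q, K, x1):
    """Backward recursion (12a)-(12b) followed by the forward pass (13a)-(13c).

    A, B1, B2, Q: constant matrices of the game (10a)-(10b); K: number of
    stages; x1: initial state.  Returns {M_k}, the state trajectory and the
    open-loop saddle-point controls of the two players.
    """
    n = A.shape[0]
    I = np.eye(n)
    M = Q.copy()                                     # M_{K+1} = Q_{K+1}
    Ms, Lams = [M], []
    for _ in range(K):                               # k = K, K-1, ..., 1
        Lam = I + (B1 @ B1.T - B2 @ B2.T) @ M        # (12a)
        M = Q + A.T @ M @ np.linalg.solve(Lam, A)    # (12b)
        Ms.insert(0, M)
        Lams.insert(0, Lam)
    x, traj, u1s, u2s = x1, [x1], [], []
    for k in range(K):                               # forward pass
        LinvA = np.linalg.solve(Lams[k], A)          # Lam_k^{-1} A_k
        u1s.append(-B1.T @ Ms[k + 1] @ LinvA @ x)    # (13a)
        u2s.append(+B2.T @ Ms[k + 1] @ LinvA @ x)    # (13b)
        x = LinvA @ x                                # (13c)
        traj.append(x)
    return Ms, traj, u1s, u2s
```

For the scalar data used later in the counterexample of Prop. 2 (A = B^1 = Q = 1, B^2 = 1/√2, K = 2), this recursion produces M_2 = 5/3 and M_1 = 21/11.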
6.2.2
Closed-loop no-memory (feedback) Nash equilibria
We now turn our attention to a discussion and derivation of Nash equilibria under the memoryless perfect state information pattern, which provides the players with only the current value of the state at every stage of the game (and, of course, also the initial value of the state, which is known a priori). For this class of dynamic games we first obtain the counterpart of Thm 1, which provides a set of necessary conditions for any closed-loop no-memory Nash equilibrium solution to satisfy.

THEOREM 5 For an N-person discrete-time infinite dynamic game, let

(i) f_k(·) be continuously differentiable on R^n × U_k^1 × ⋯ × U_k^N (k ∈ K),
(ii) g_k^i(·) be continuously differentiable on R^n × U_k^1 × ⋯ × U_k^N × R^n (k ∈ K, i ∈ N).

Then, if {γ_k^{i*}(x_k, x_1) = u_k^{i*}; k ∈ K, i ∈ N} provides a closed-loop no-memory Nash equilibrium solution such that γ_k^{i*}(·, x_1) is continuously differentiable on R^n (∀k ∈ K, i ∈ N), and if {x_{k+1}^*; k ∈ K} is the corresponding state trajectory, there exists a finite sequence of n-dimensional (costate) vectors {p_2^i, ..., p_{K+1}^i} for each i ∈ N such that the following relations are satisfied:

x_{k+1}^* = f_k(x_k^*, u_k^{1*}, ..., u_k^{N*});  x_1^* = x_1,   (14a)
γ_k^{i*}(x_k^*, x_1) ≡ u_k^{i*} = arg min_{u^i ∈ U_k^i} H_k^i(p_{k+1}^i, u_k^{1*}, ..., u_k^{i−1*}, u^i, u_k^{i+1*}, ..., u_k^{N*}, x_k^*),   (14b)

p_k^i = [∂f_k(*)/∂x_k]' [p_{k+1}^i + (∂g_k^i(*)/∂x_{k+1})'] + [∂g_k^i(*)/∂x_k]'
   + Σ_{j∈N, j≠i} [∂γ_k^{j*}(x_k^*, x_1)/∂x_k]' { [∂f_k(*)/∂u_k^j]' [p_{k+1}^i + (∂g_k^i(*)/∂x_{k+1})'] + [∂g_k^i(*)/∂u_k^j]' };
  p_{K+1}^i = 0,   (14c)

where

H_k^i(p_{k+1}, u_k^1, ..., u_k^N, x_k) ≜ g_k^i(f_k(x_k, u_k^1, ..., u_k^N), u_k^1, ..., u_k^N, x_k) + p_{k+1}^{i'} f_k(x_k, u_k^1, ..., u_k^N),  k ∈ K, i ∈ N,   (14d)

and where we have adopted the notation

(∂/∂x_k) f_k(*) ≜ (∂/∂x_k) f_k(x_k, u_k^{1*}, ..., u_k^{N*}) |_{x_k = x_k^*},

with a similar convention applying to (∂/∂x_k) g_k^i(*), (∂/∂u_k^j) f_k(*) and (∂/∂u_k^j) g_k^i(*).
Proof. The proof is similar to that of Thm 1, but here we also have to take into account the possible dependence of u_k^{j*} on x_k (k ∈ K, j ∈ N). Accordingly, the ith inequality of (3) now says that {γ_k^{i*}(x_k, x_1); k ∈ K} minimizes

J^i(γ^{1*}, ..., γ^{i−1*}, γ^i, γ^{i+1*}, ..., γ^{N*}) = Σ_{k∈K} g_k^i(x_{k+1}, γ_k^{1*}(x_k, x_1), ..., γ_k^i(x_k, x_1), ..., γ_k^{N*}(x_k, x_1), x_k)
  ≜ Σ_{k∈K} ḡ_k^i(x_{k+1}, γ_k^i(x_k, x_1), x_k)

over Γ^i, subject to the state equation

x_{k+1} = f_k(x_k, γ_k^{1*}(x_k, x_1), ..., γ_k^{i−1*}(x_k, x_1), γ_k^i(x_k, x_1), γ_k^{i+1*}(x_k, x_1), ..., γ_k^{N*}(x_k, x_1)) ≜ f̄_k(x_k, γ_k^i(x_k, x_1)),  k ∈ K.
Then, the result follows directly from Thm 5.5, by taking f_k as f̄_k and g_k as ḡ_k^i, which are continuously differentiable in their relevant arguments because of the hypothesis that the equilibrium strategies are continuously differentiable in their arguments. □

If the set of relations (14) is compared with (4), it will be seen that they are identical except for the costate equations, the latter (equation (14c)) having two additional terms, which are due to the dynamic nature of the information structure, allowing the players to utilize the current value of the state. Furthermore, it is important to note that every solution of the set (4) also satisfies relations (14), since every such solution is associated with static information, thereby causing the last two terms in (14c) to drop out. But the "open-loop" solution is not the only one that the set (14a)-(14c) admits; it could well have other solutions which depend explicitly on the current value of the state, thus leading to nonunique Nash equilibria. This phenomenon is closely related to the "informational nonuniqueness" feature of Nash equilibria under dynamic information, as introduced in section 3.5 (see, in particular, Prop. 3.10 and Remark 3.15), whose counterpart in infinite games will be thoroughly discussed in the next section. Here, we deal with a more restrictive class of Nash equilibrium solutions under the memoryless perfect state information pattern, the so-called feedback Nash equilibrium, which is devoid of any "informational nonuniqueness" (see also Def. 3.22).

DEFINITION 2 For an N-person K-stage discrete-time infinite dynamic game with memoryless perfect state information pattern,† let J^i denote the cost functional of Pi (i ∈ N), defined on Γ^1 × ⋯ × Γ^N. An N-tuple of strategies {γ^{i*} ∈ Γ^i; i ∈ N} constitutes a feedback Nash equilibrium solution if it satisfies the set of K N-tuple inequalities (3.28) for all γ_k^i ∈ Γ_k^i, i ∈ N, k ∈ K. □
PROPOSITION 1 Under the memoryless perfect state information pattern, every feedback Nash equilibrium solution of an N-person discrete-time infinite dynamic game is also a closed-loop no-memory Nash equilibrium solution (but not vice versa).
Proof. This result is the counterpart of Prop. 3.9 in the present framework, and therefore its proof parallels that of Prop. 3.9. The result basically follows by showing that, for each i ∈ N, the collection of all the ith inequalities of the K N-tuples implies the ith inequality of (3). □

† The statement of this definition remains valid if the information pattern is instead "closed-loop perfect state".
The definition of the feedback Nash equilibrium solution directly leads to a recursive derivation, which involves the solution of a static N-person nonzero-sum game at every stage of the dynamic game. Again, a direct consequence of Def. 2 is that the feedback equilibrium solution depends only on x_k at stage k, and dependence on x_1 occurs only at stage k = 1.† By utilizing these properties, we readily arrive at the following theorem.

THEOREM 6 For an N-person discrete-time infinite dynamic game, the set of strategies {γ_k^{i*}(x_k); k ∈ K, i ∈ N} provides a feedback Nash equilibrium solution if, and only if, there exist functions V^i(k, ·): R^n → R, k ∈ K, i ∈ N, such that the following recursive relations are satisfied:
V^i(k, x) = min_{u_k^i ∈ U_k^i} [ g_k^i( f̃_k^{i*}(x, u_k^i), γ_k^{1*}(x), ..., γ_k^{i−1*}(x), u_k^i, γ_k^{i+1*}(x), ..., γ_k^{N*}(x), x ) + V^i(k+1, f̃_k^{i*}(x, u_k^i)) ]

  = g_k^i( f̃_k^{i*}(x, γ_k^{i*}(x)), γ_k^{1*}(x), ..., γ_k^{N*}(x), x ) + V^i(k+1, f̃_k^{i*}(x, γ_k^{i*}(x)));

  V^i(K+1, x) = 0,  i ∈ N,   (15)

where

f̃_k^{i*}(x, u_k^i) ≜ f_k(x, γ_k^{1*}(x), ..., γ_k^{i−1*}(x), u_k^i, γ_k^{i+1*}(x), ..., γ_k^{N*}(x)).
Proof. Let us start with the first set of N inequalities of (3.28). Since they have to hold true for all γ_k^i ∈ Γ_k^i, i ∈ N, k ≤ K − 1, this necessarily implies that they have to hold true for all values of the state x_K which are reachable by utilization of some combination of these strategies. Let us denote that subset of R^n by X_K. Then, the first set of inequalities of (3.28) becomes equivalent to the problem of seeking Nash equilibria of an N-person static game with cost functionals

g_K^i(f_K(x_K, u_K^1, ..., u_K^N), u_K^1, ..., u_K^N, x_K),  i ∈ N,   (i)

which should be valid for all x_K ∈ X_K. This is precisely what (15) says for k = K, with a set of associated Nash equilibrium controls denoted by {γ_K^{i*}(x_K); i ∈ N}, since they depend explicitly on x_K ∈ X_K, but not on the past values of the state (including the initial state x_1). Now, with these strategies substituted into (i), a similar argument (as above) leads to the conclusion that the second set of inequalities of (3.28) defines a static Nash game with cost functionals

V^i(K, x_K) + g_{K−1}^i(x_K, u_{K−1}^1, ..., u_{K−1}^N, x_{K−1}),  i ∈ N,

† This statement is valid also under the "closed-loop perfect state" information pattern. Note that the feedback equilibrium solution retains its equilibrium property also under the feedback information pattern.
where x_K ≜ f_{K−1}(x_{K−1}, u_{K−1}^1, ..., u_{K−1}^N), and the Nash solution has to be valid for all x_{K−1} ∈ X_{K−1} (where X_{K−1} is the counterpart of X_K at stage k = K − 1). Here again, we observe that the Nash equilibrium controls can only be functions of x_{K−1}, and (15) with k = K − 1 provides a set of necessary and sufficient conditions for {γ_{K−1}^{i*}(x_{K−1}); i ∈ N} to solve this static Nash game. The theorem then follows from a standard induction argument. □

The following corollary, which is the counterpart of Thm 2 in the case of feedback Nash equilibrium, now follows as a special case of Thm 6.

PRELIMINARY NOTATION FOR COROLLARY 1 Let P_k^i (i ∈ N, k ∈ K) be appropriate dimensional matrices satisfying the set of linear matrix equations

[R_k^{ii} + B_k^{i'} Z_{k+1}^i B_k^i] P_k^i + B_k^{i'} Z_{k+1}^i Σ_{j∈N, j≠i} B_k^j P_k^j = B_k^{i'} Z_{k+1}^i A_k,  i ∈ N,   (16a)

where Z_k^i (i ∈ N) is obtained recursively from

Z_k^i = F_k' Z_{k+1}^i F_k + Σ_{j∈N} P_k^{j'} R_k^{ij} P_k^j + Q_k^i;  Z_{K+1}^i = Q_{K+1}^i,  i ∈ N,   (16b)

and

F_k ≜ A_k − Σ_{j∈N} B_k^j P_k^j,  k ∈ K.   (16c)

COROLLARY 1 An N-person linear-quadratic dynamic game (cf. Def. 1) with Q_{k+1}^i ≥ 0 (i ∈ N, k ∈ K) and R_k^{ij} ≥ 0 (i, j ∈ N, j ≠ i, k ∈ K) admits a unique feedback Nash equilibrium solution if, and only if, (16a) admits a unique solution set {P_k^{i*}; i ∈ N, k ∈ K}, in which case the equilibrium strategies are given by

γ_k^{i*}(x_k) = −P_k^{i*} x_k,  i ∈ N, k ∈ K.   (17)
Proof. Starting with k = K in the recursive equation (15), we first note that the functional to be minimized (for each i ∈ N) is strictly convex, since R_K^{ii} + B_K^{i'} Q_{K+1}^i B_K^i > 0. Then, the first-order necessary conditions for minimization are also sufficient, and therefore we have (by differentiation) the unique set of equations

[R_K^{ii} + B_K^{i'} Q_{K+1}^i B_K^i] γ_K^{i*}(x_K) + B_K^{i'} Q_{K+1}^i Σ_{j∈N, j≠i} B_K^j γ_K^{j*}(x_K) = −B_K^{i'} Q_{K+1}^i A_K x_K,  i ∈ N,   (i)
which readily leads to the conclusion that any set of Nash equilibrium strategies at stage k = K has to be linear in x_K. Therefore, by substituting γ_K^{i*}(x_K) = −P_K^i x_K (i ∈ N) into the foregoing equation, and by requiring it to be satisfied for all possible x_K, we arrive at (16a) for k = K. Further substitution of this solution into (15) for k = K leads to V^i(K, x) = ½ x'(Z_K^i − Q_K^i) x; that is, V^i(K, ·) has a quadratic structure at stage k = K. Now, if this expression is substituted into (15) with k = K−1, the outlined procedure is carried out for k = K−1, and so on (recursively) for all k ≤ K−1, one arrives at the conclusion that (i) V^i(k, x) = ½ x'(Z_k^i − Q_k^i) x is the unique solution of the recursive equation (15), under the hypothesis of the corollary and by noting that Z_k^i ≥ 0 (i ∈ N, k ∈ K), and (ii) the minimization operation in (15) leads to the unique solution (17), under the condition of unique solvability of (16a). This, then, completes verification of Corollary 1. □

Remark 2. The "nonnegative definiteness" requirements imposed on Q_{k+1}^i and R_k^{ij} (i, j ∈ N, j ≠ i; k ∈ K) are sufficient for strict convexity of the functionals to be minimized in (15), but they are by no means necessary. A set of less stringent (but more indirect) conditions would be
R_k^{ii} + B_k^{i'} Z_{k+1}^i B_k^i > 0  (i ∈ N, k ∈ K),

under which the statement of Corollary 1 still remains valid. Furthermore, it follows from the proof of Corollary 1 that, if (16a) admits more than one set of solutions, every such set constitutes a feedback Nash equilibrium solution. □

Remark 3. It is possible to give a precise condition for unique solvability of the set of equations (16a) for P_k^i (i ∈ N, k ∈ K). The said condition is invertibility of the matrices Φ_k, k ∈ K, which are composed of block matrices, with the ii-th block given as R_k^{ii} + B_k^{i'} Z_{k+1}^i B_k^i and the ij-th block as B_k^{i'} Z_{k+1}^i B_k^j, where i, j ∈ N, j ≠ i. □
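Remark 3's block-matrix condition also suggests a direct computational scheme for one stage of Corollary 1: stack the N equations (16a) into a single linear system with coefficient matrix Φ_k, solve for the gains P_k^i, and then update Z_k^i through (16b)-(16c). The following Python sketch of a single backward step is our own illustration (the function name and API are assumptions, not from the text):

```python
import numpy as np

def feedback_nash_step(A, Bs, Rs, Qs, Znext):
    """One backward step of (16a)-(16c) for N players.

    Bs[i]: B_k^i;  Rs[i][j]: R_k^{ij};  Qs[i]: Q_k^i;  Znext[i]: Z_{k+1}^i.
    Builds the block matrix Phi_k of Remark 3, solves for the gains P_k^i
    (assuming Phi_k is invertible, i.e. a unique feedback equilibrium), and
    returns the updated Z_k^i and F_k.
    """
    N = len(Bs)
    ms = [B.shape[1] for B in Bs]
    # ii-block: R^ii + B^i' Z^i B^i;  ij-block: B^i' Z^i B^j  (Remark 3)
    Phi = np.block([[(Rs[i][i] + Bs[i].T @ Znext[i] @ Bs[j]) if i == j
                     else Bs[i].T @ Znext[i] @ Bs[j]
                     for j in range(N)] for i in range(N)])
    rhs = np.vstack([Bs[i].T @ Znext[i] @ A for i in range(N)])
    Pstack = np.linalg.solve(Phi, rhs)
    # split the stacked solution into the individual gains P_k^i
    Ps, r = [], 0
    for m in ms:
        Ps.append(Pstack[r:r + m, :])
        r += m
    F = A - sum(B @ P for B, P in zip(Bs, Ps))        # (16c)
    Z = [F.T @ Znext[i] @ F
         + sum(Ps[j].T @ Rs[i][j] @ Ps[j] for j in range(N))
         + Qs[i] for i in range(N)]                   # (16b)
    return Ps, Z, F
```

For two-player zero-sum data (R^{11} = R^{22} = I, R^{12} = R^{21} = −I, Q^2 = −Q^1), the computed gains coincide with the closed forms derived later in this section, and the update reproduces the M-recursion of Thm 4.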
We now consider the general class of two-person discrete-time zero-sum dynamic games, and determine the "cost-to-go" equation to be satisfied by the feedback saddle-point solution, as a special case of Thm 6.

COROLLARY 2 For a two-person discrete-time zero-sum dynamic game, the set of strategies {γ_k^{i*}(x_k); k ∈ K, i = 1, 2} provides a feedback saddle-point solution if, and only if, there exist functions V(k, ·): R^n → R, k ∈ K, such that the following recursive relation is satisfied:

V(k, x) = min_{u_k^1 ∈ U_k^1} max_{u_k^2 ∈ U_k^2} [ g_k(f_k(x, u_k^1, u_k^2), u_k^1, u_k^2, x) + V(k+1, f_k(x, u_k^1, u_k^2)) ]

 = max_{u_k^2 ∈ U_k^2} min_{u_k^1 ∈ U_k^1} [ g_k(f_k(x, u_k^1, u_k^2), u_k^1, u_k^2, x) + V(k+1, f_k(x, u_k^1, u_k^2)) ]

 = g_k(f_k(x, γ_k^{1*}(x), γ_k^{2*}(x)), γ_k^{1*}(x), γ_k^{2*}(x), x) + V(k+1, f_k(x, γ_k^{1*}(x), γ_k^{2*}(x)));

 V(K+1, x) = 0.   (18)

Proof. The recursive equation (18) follows from (15) by taking N = 2, g_k^1 ≡ −g_k^2 ≜ g_k (k ∈ K), since then V^1 ≡ −V^2 ≜ V, and existence of a saddle point is equivalent to interchangeability of the min and max operations. □
For the further special case of a linear-quadratic zero-sum game, the solution of (18), as well as the feedback saddle-point solution, can be explicitly determined in a simple structural form, but only after a rather lengthy derivation. We accomplish this in two steps: first we obtain directly the special case of Corollary 1 for the linear-quadratic zero-sum game (see Corollary 3 below), and second we simplify these expressions further so as to bring them into a form compatible with the results of Thm 4 (see Thm 7, in the sequel).

COROLLARY 3 For the two-person linear-quadratic zero-sum dynamic game described by (10a)-(10b), and with Q_{k+1} ≥ 0 (k ∈ K), the unique solution of (16a) is given by

P_k^1 = [I + K_k^1 B_k^1]^{-1} K_k^1 A_k   (19a)

P_k^2 = −[I − K_k^2 B_k^2]^{-1} K_k^2 A_k,   (19b)

where

K_k^1 ≜ B_k^{1'} Z_{k+1} [I + B_k^2 (I − B_k^{2'} Z_{k+1} B_k^2)^{-1} B_k^{2'} Z_{k+1}]   (20a)

K_k^2 ≜ B_k^{2'} Z_{k+1} [I − B_k^1 (I + B_k^{1'} Z_{k+1} B_k^1)^{-1} B_k^{1'} Z_{k+1}],   (20b)

and Z_k ≜ Z_k^1 = −Z_k^2 satisfies the recursive equation

Z_k = F_k' Z_{k+1} F_k + P_k^{1'} P_k^1 − P_k^{2'} P_k^2 + Q_k;  Z_{K+1} = Q_{K+1},   (21)

with

F_k ≜ A_k − B_k^1 P_k^1 − B_k^2 P_k^2.

Furthermore, the set of conditions of Remark 2 is equivalent to

I + B_k^{1'} Z_{k+1} B_k^1 > 0  (k ∈ K)   (22a)

I − B_k^{2'} Z_{k+1} B_k^2 > 0  (k ∈ K).   (22b)
Proof. By letting N = 2, Q_{k+1}^1 = −Q_{k+1}^2 ≜ Q_{k+1}, R_k^{11} = R_k^{22} = I, R_k^{12} = R_k^{21} = −I in Corollary 1, we first observe that Z_k^1 ≡ −Z_k^2, and the equation for Z_k^1 is the same as (21), assuming of course that the P_k^i's in the two equations are correspondingly the same, a property which we now verify. Towards this end, start with (16a) for i = 2, solve for P_k^2 in terms of P_k^1 (this solution is unique under (22b)), substitute this into (16a) for i = 1, and solve for P_k^1 from this linear matrix equation. The result is (19a), assuming this time that the matrix [I + K_k^1 B_k^1] is invertible. Furthermore, by following the same procedure with the indices 1 and 2 interchanged, we arrive at (19b), on account of the invertibility of [I − K_k^2 B_k^2]. Therefore, P_k^1 and P_k^2, given by (19a) and (19b) respectively, are indeed the unique solutions of (16a) for this special case, provided that the two matrices in question are invertible. A direct manipulation on these matrices actually establishes nonsingularity under conditions (22a)-(22b); however, we choose here to accomplish this by employing an indirect method which is more illuminating. First note that (22a) and (22b) correspond to the existence conditions given in Remark 2, and therefore they make the functionals to be minimized in (15) strictly convex in the relevant control variables. But, for the zero-sum game, (15) is equivalent to (18), and consequently (22a) and (22b) make the kernel in (18) strictly convex in u_k^1 and strictly concave in u_k^2 (k ∈ K). Since every strictly convex-concave quadratic static game admits a (unique) saddle point (cf. Corollary 4.5), the set (16a) has to admit a unique solution for the specific problem (zero-sum game) under consideration. Hence, the required inverses in (19a) and (19b) should exist. This, then, completes the proof of Corollary 3. □
To further simplify (19)-(21), we now make use of the following matrix identity.

LEMMA 2 Let Z = Z' and B be matrices of dimensions (n × n) and (n × m), respectively, with the further property that B'ZB does not have any unity eigenvalues. Then

[I_m − B'ZB]^{-1} B'Z = B'Z [I_n − BB'Z]^{-1}.   (23)

Proof. First note that the matrix inverse on the RHS of this identity exists under the hypothesis of the lemma, since the nonzero eigenvalues of B'ZB and BB'Z are the same (see Marcus and Minc, 1964, p. 24). Then, the result follows by multiplying both sides of the identity by I_n − BB'Z from the right and by straightforward manipulations. □
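Lemma 2 is easy to sanity-check numerically; the Python snippet below (our own illustration) verifies (23) on random data satisfying the hypothesis:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 2
Z = rng.standard_normal((n, n))
Z = (Z + Z.T) / 2                      # Z = Z', as required by Lemma 2
B = rng.standard_normal((n, m))        # generic B: B'ZB has no unity eigenvalue
lhs = np.linalg.inv(np.eye(m) - B.T @ Z @ B) @ B.T @ Z
rhs = B.T @ Z @ np.linalg.inv(np.eye(n) - B @ B.T @ Z)
assert np.allclose(lhs, rhs)           # identity (23)
print("identity (23) verified")
```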
Application of this identity to (20a) and (20b) readily yields (by identifying B_k^2 with B and Z_{k+1} with Z in the former, and B_k^1 with B and −Z_{k+1} with Z in the latter)†

K_k^1 = B_k^{1'} Z_{k+1} [I − B_k^2 B_k^{2'} Z_{k+1}]^{-1}

K_k^2 = B_k^{2'} Z_{k+1} [I + B_k^1 B_k^{1'} Z_{k+1}]^{-1}.

† Here we suppress the dimensions of the identity matrices, since they are clear from the context.
Furthermore, if these are substituted into (19a) and (19b), some extensive but straightforward matrix manipulations, which involve repeated application of (23), lead to the expressions

P_k^1 = B_k^{1'} Z_{k+1} [I + (B_k^1 B_k^{1'} − B_k^2 B_k^{2'}) Z_{k+1}]^{-1} A_k   (24a)

P_k^2 = −B_k^{2'} Z_{k+1} [I + (B_k^1 B_k^{1'} − B_k^2 B_k^{2'}) Z_{k+1}]^{-1} A_k,   (24b)

and F_k, given in Corollary 3, is then expressed as

F_k = [I + (B_k^1 B_k^{1'} − B_k^2 B_k^{2'}) Z_{k+1}]^{-1} A_k.   (24c)
Now, finally, if (24a)-(24c) are substituted into (21), we arrive at the conclusion that Z_k satisfies the same equation as M_k, given by (12b). Therefore, we have:

THEOREM 7 The two-person linear-quadratic zero-sum dynamic game described by (10a)-(10b) and with Q_{k+1} ≥ 0 (k ∈ K) admits a unique feedback saddle-point solution if, and only if,

I + B_k^{1'} M_{k+1} B_k^1 > 0  (k ∈ K)   (25a)

I − B_k^{2'} M_{k+1} B_k^2 > 0  (k ∈ K),   (25b)

in which case the unique equilibrium strategies are given by

γ_k^{1*}(x_k) = −B_k^{1'} M_{k+1} Λ_k^{-1} A_k x_k   (26a)

γ_k^{2*}(x_k) = +B_k^{2'} M_{k+1} Λ_k^{-1} A_k x_k,  k ∈ K,   (26b)
and the corresponding unique state trajectory {x_{k+1}^*; k ∈ K} satisfies the difference equation

x_{k+1}^* = Λ_k^{-1} A_k x_k^*;  x_1^* = x_1,   (26c)

where M_{k+1}, Λ_k (k ∈ K) are as defined in Thm 4.

Proof. This result follows from Corollary 1, Corollary 3 and the discussion given prior to the statement of Theorem 7. □

Remark 4. A comparison of Thms 4 and 7 now readily reveals an important property of the saddle-point solution in such games: namely, whenever they both exist, the unique open-loop saddle-point solution and the unique feedback saddle-point solution generate the same state trajectory in linear-quadratic zero-sum games. Furthermore, the open-loop values (cf. section 5.5.4) of the feedback saddle-point strategies (26a)-(26b) are correspondingly the same as the open-loop saddle-point strategies (13a)-(13b). These two features of the saddle-point solution under different information structures are in fact characteristic not only of linear-quadratic games, but of the most general zero-sum dynamic games treated in this section, as will be verified in the next section (see, in particular, Thm 8). In nonzero-sum dynamic games, however, the Nash equilibrium solutions obtained under different information structures do not exhibit such a feature, as will also be clear from the discussion of the next section (see, in particular, section 6.3.1). □

Even though the two saddle-point solutions of Thms 4 and 7 generate the same state trajectory, the existence conditions involved are not equivalent. This result follows from a comparison of (11a) and (25a)-(25b), and is further elaborated on in the following proposition.

PROPOSITION 2 For the linear-quadratic two-person zero-sum dynamic game described by (10a)-(10b) and with Q_{k+1} > 0 ∀k ∈ K, condition (11a) implies both (25a) and (25b), but not vice versa. In other words, every linear-quadratic two-person zero-sum dynamic game (with positive cost on the state) that admits a unique open-loop saddle point also admits a unique feedback saddle point, but existence of the latter does not necessarily imply existence of a saddle point in open-loop strategies.
Proof. Let us first show that (11a) necessarily implies (25a) and (25b). To accomplish this, it will be sufficient to verify the validity of the relation

S_k ≥ M_k > 0  (k ∈ K)   (i)

under condition (11a), from which (25a) and (25b) readily follow. Towards this end, let us first observe from (11b) that (11a), together with Q_k > 0, implies

S_k > 0  (k ∈ K),   (ii)

and further note that (11a) can equivalently be written as

S_{k+1}^{-1} − B_k^2 B_k^{2'} > 0  (k ∈ K).   (iii)

Relation (i) clearly holds for k = K; therefore, let us assume its validity for k+1 and then verify it for k. First,

M_{k+1} Λ_k^{-1} = (M_{k+1}^{-1} + B_k^1 B_k^{1'} − B_k^2 B_k^{2'})^{-1},   (iv)

and, from (iii) and (i), we have

M_{k+1}^{-1} + B_k^1 B_k^{1'} − B_k^2 B_k^{2'} ≥ M_{k+1}^{-1} − B_k^2 B_k^{2'} ≥ S_{k+1}^{-1} − B_k^2 B_k^{2'} > 0.   (v)

If this is used in (12b), together with Q_k > 0, we readily obtain M_k > 0, which is the RHS relation of (i). Furthermore, using (11b) and (12a),

S_k − M_k = A_k'[(S_{k+1}^{-1} − B_k^2 B_k^{2'})^{-1} − M_{k+1} Λ_k^{-1}] A_k ≥ 0,

where the inequality follows from (v) in view of (iv). Hence, the LHS relation of (i) is also valid for general k. This completes the first part of the proof.

To verify that (25a)-(25b) do not necessarily imply (11a), we simply produce a counterexample. Consider the scalar two-stage game (i.e. K = 2) with Q_{k+1} = B_k^1 = A_k = 1, B_k^2 = 1/√2 (k = 1, 2). Both (11a) and (25a)-(25b) are satisfied for k = 2. For k = 1, however, S_2 = 3, M_2 = 5/3, and (25a)-(25b) are satisfied, while (11a) fails to hold true. □
6.3 INFORMATIONAL PROPERTIES OF NASH EQUILIBRIA IN DISCRETE-TIME DYNAMIC GAMES

This section is devoted to an elaboration on the occurrence of "informationally nonunique" Nash equilibria in discrete-time dynamic games, and to a general discussion of the interplay between information patterns and existence-uniqueness properties of noncooperative equilibria in such games. First, we consider, in some detail, a scalar three-person dynamic game which admits uncountably many Nash equilibria, and which features several important properties of infinite dynamic games. Then, we discuss these properties in a general context.
6.3.1 A three-person dynamic game illustrating informational nonuniqueness

Consider a scalar three-person two-stage linear-quadratic dynamic game in which each player acts only once. The state equation is given by

x_3 = x_2 + u^1 + u^2   (27a)

x_2 = x_1 + u^3,   (27b)

and the cost functionals are defined as

L^1 = (x_3)² + (u^1)²   (28a)

L^2 = −(x_3)² + 2(u^2)² − (u^3)²   (28b)

L^3 = −L^2 = (x_3)² + (u^3)² − 2(u^2)².   (28c)
In this formulation, u^i is the scalar unconstrained control variable of Pi (i = 1, 2, 3), and x_1 is the initial state, whose value is known to all players. P1 and P2, who act at stage 2, also have access to the value of x_2 (i.e. the underlying information pattern is CLPS (or, equivalently, MPS) for both P1 and P2), and their permissible strategies are taken as twice continuously differentiable mappings from R × R into R. A permissible strategy for P3, on the other hand, is any measurable mapping from R into R. This, then, completes the description of the strategy spaces Γ^1, Γ^2 and Γ^3 (for P1, P2 and P3, respectively), where we suppress the subindices denoting the corresponding stages, since each player acts only once.

Now, let {γ^1 ∈ Γ^1, γ^2 ∈ Γ^2, γ^3 ∈ Γ^3} denote any noncooperative (Nash) equilibrium solution for this three-person nonzero-sum dynamic game. Since L^i is strictly convex in u^i (i = 1, 2), a set of necessary and sufficient conditions for γ^1 and γ^2 to be in equilibrium (with γ^3 ∈ Γ^3 fixed) is obtained by differentiation of L^i with respect to u^i (i = 1, 2), thus leading to

x̂_2(γ^3) + 2γ^1(x̂_2, x_1) + γ^2(x̂_2, x_1) = 0

x̂_2(γ^3) + γ^1(x̂_2, x_1) − γ^2(x̂_2, x_1) = 0,

where

x̂_2 ≜ x̂_2(γ^3) = x_1 + γ^3(x_1).

Solving for γ^1(x̂_2, x_1) and γ^2(x̂_2, x_1) from the foregoing pair of equations, we obtain

γ^1(x̂_2, x_1) = −(2/3) x̂_2   (29a)

γ^2(x̂_2, x_1) = (1/3) x̂_2,   (29b)

which are the side conditions on the equilibrium strategies γ^1 and γ^2, and which depend on the equilibrium strategy γ^3 of P3. Besides these side conditions, the Nash equilibrium strategies of P1 and P2 have no other natural constraints imposed on them. To put it in other words, every Nash equilibrium strategy for P1 will be a closed-loop representation (cf. Def. 5.12) of the open-loop value (29a), and every Nash equilibrium strategy for P2 will be a closed-loop representation of (29b).

To complete the solution of the problem, we now proceed to stage one. Since {γ^1, γ^2, γ^3} constitutes an equilibrium triple, with {u^1 = γ^1(x_2, x_1), u^2 = γ^2(x_2, x_1)} substituted into L^3, the resulting cost functional of P3 (denoted L̂^3) should attain a minimum at u^3 = γ^3(x_1); and since γ^1 and γ^2 are twice continuously differentiable in their arguments, this requirement can be expressed in terms of the relations
(d/du^3) L̂^3(γ^3(x_1)) = x_3 (1 + γ^1_{x_2} + γ^2_{x_2}) − 2 γ^2_{x_2} γ^2 + γ^3(x_1) = 0

(d²/d(u^3)²) L̂^3(γ^3(x_1)) = (1 + γ^1_{x_2} + γ^2_{x_2})² + x_3 (γ^1_{x_2x_2} + γ^2_{x_2x_2}) − 2 (γ^2_{x_2})² − 2 γ^2 γ^2_{x_2x_2} + 1 > 0,†
where we have suppressed the arguments of the strategies. Now, by utilizing the side conditions (29a)-(29b) in the above set of relations, we arrive at a simpler set of relations, which are, respectively,

(2/3)[x_1 + γ^3(x_1)][1 + γ^1_{x_2}(x̂_2, x_1)] + γ^3(x_1) = 0,   (30a)

[1 + γ^1_{x_2}(x̂_2, x_1) + γ^2_{x_2}(x̂_2, x_1)]² + (2/3) x̂_2 γ^1_{x_2x_2}(x̂_2, x_1) − 2[γ^2_{x_2}(x̂_2, x_1)]² + 1 > 0,   (30b)

where x̂_2 is again defined as

x̂_2 = x_1 + γ^3(x_1).   (30c)
These are the relations which should be satisfied by an equilibrium triple, in addition to (29). The following proposition summarizes the result.

PROPOSITION 3 Any triple {γ^{1*} ∈ Γ^1, γ^{2*} ∈ Γ^2, γ^{3*} ∈ Γ^3} that satisfies (29) and (30), and also possesses the additional feature that (30a) with γ^1 = γ^{1*} and γ^2 = γ^{2*} admits a unique solution γ^3 = γ^{3*}, constitutes a Nash equilibrium solution for the nonzero-sum dynamic game described by (27)-(28).

Proof. This result follows from the derivation outlined prior to the statement of the proposition. Uniqueness of the solution of (30a) for each pair {γ^{1*}, γ^{2*}} is imposed in order to ensure that the resulting γ^{3*} is indeed a globally minimizing solution for L̂^3. □
2

2
1  3 "3X2+q[X2X2(1')],
3
l' (x2,xd= "3X2+P[X2X2(1')] l' (x 2,xd=
t Here, we could of course also have nonstrict inequality (i.e. :2:)in which case we also have to look at higher derivatives of [3. We avoid this by restricting our analysis at the outset, only to those equilibrium triples {1'I, 1'2, 1'3} which lead to an P that is locally strictly convex at the solution point.
262
DYNAMIC NONCOOPERATIVE GAME THEORY
where p and q are free parameters. These structural forms for yl and yl clearly satisfy the side conditions (29a) and (29b), respectively. With these choices, (30a) can be solved uniquely (for each p, q) to give y3(xd
=  [(2 + 6p)/(11 + 6p)]x 1 ,
p =1= 11/6,
with the existence condition (30b) reading
G+ Y+ P
2pq  ql
+~ >
(31)
O.
The scalar Xl is then given by Xl
Hence, PROPOSITION
4
= [9/(11 + 6p)]Xl'
The set of strategies y·*(xl,xd
yl*(Xl'
=
~Xl
xd = ~
y3*(xd
Xl
+P{Xl
[9/(11 +6p)]xd
+ q{Xl  [9/(11 + 6p)]Xl
}
=  [(2 + 6p)/(11 + 6p)]Xl
constitutes a Nash equilibrium solution for the dynamic game described by (27)(28), for all values of the parameters p and q satisfying (31) and with p =1=  11/6. The corresponding equilibrium costs of the players are
LT =
2[6/(11 + 6p)]l(X 1)l L! =  Lj =  [(22 + 24p
+ 36pl)/(11 + 6p)2](xd 2.
0
Several remarks and observations are in order here, concerning the Nash equilibrium solutions presented above. (1) The nonzerosum dynamic game of this subsection admits uncountably many Nash equilibrium solutions, each one leading to a different equilibrium cost triple. (2) Within the class of linear strategies, Prop. 4 provides the complete solution to the problem, which is parameterized by two scalars p and q. (3) The equilibrium strategy of P3, as well as the equilibrium cost values of all three players, depend only on p (not on q), whereas the existence condition (31)involves both p and q. There is indeed an explanation for this: the equilibrium strategies of PI and P2 are in fact representations of the openloop values (29a)(29b) on appropriate trajectories. By choosing a specific representation of (29a), PI affects the cost functional of P3 and thereby the optimization problem faced by him. Hence, for each different representation of (29a), P3 ends up, in general, with a
6
263
NASH EQUILIBRIA
different solution to his minimization problem, which directly contributes to nonuniqueness of Nash equilibria. For P2, on the other hand, even though he may act analogouslyi.e. choose different representations of (29b}these different representations do not lead to different minimizing solutions for P3 (but instead affect only existence of the minimizing solution) since L 2 ==  L 3 , i.e. P2 and P3 have completely conflicting goals (seeThm 8, later in this section, and also the next remark for further clarification). Consequently, y3' (xd is independent of q, but the existence condition explicitly depends upon q. (This is true also for nonlinear representations, as it can be seen from (30a) and (30b).) (4) If PI has only access to X2 (but not to Xl), then, necessarily, p = 0, and both PI and P3 have unique equilibrium strategies which are {y\'(x 2 ) = jX2' y3*(xd =  (2/11)xd. (This is true also within the class of nonlinear strategies.) Furthermore, the equilibrium cost values are also unique (simply set p = 0 in V', i = 1, 2, 3, in Prop. 4). However, the existence condition (31) still depends on q, since it now reduces to q2 < 11/9. The reason for this is that P2 has still the freedom of employing different representations of (29b), which affects existence of the equilibrium solution but not the actual equilibrium state trajectory, since P2 and P3 are basically playing a zerosum game (in which case the equilibrium (i.e. saddlepoint) solutions are interchangeable). (See Thm 8, later in this section; also recall the feature discussed in Remark 4 earlier.) (5) By setting p = q = 0 in Prop. 4, we obtain the unique feedback Nash equilibrium solution (cf. Corollary 1 and Remark 2) of the dynamic game under consideration (which exists since p = 0 i= 11/6, and (31) is satisfied). (6) Among the uncountable number of Nash equilibrium solutions presented in Prop. 
4, there exists a subsequence of strategies which brings PI's Nash cost arbitrarily close to zero which is the lowest possible value L 1 can attain. Note, however, that the corresponding cost for P3 approaches (xY which is unfavourable to him. Before we conclude this subsection, it is worthy to note that the linear equilibrium solutions presented in Prop. 4 are not the only ones that the dynamic game under consideration admits, since (30a) will also admit nonlinear solutions. To obtain an explicit nonlinear equilibrium solution, we may start with a nonlinear representation of (29a), for instance 1
2

3
2
y (x2,xd= 3X2+P[X2X2(Y)] '
substitute it into (30a), and solve for a corresponding y3(xd, checking at the same time satisfaction of the second order condition (30b). For such a
264
DYNAMIC NONCOOPERATIVE GAME THEORY
derivation (of nonlinear Nash solutions in a linearquadratic game) the reader is referred to Basar (1974). See also Problem 5, section 7.
6.3.2
General results on informationaJJy nonunique equilibrium solutions
Chapter 3 (section 5) has already displayed existence of "informationally nonunique" Nash equilibria in finite multiact nonzerosum dynamic games, which was mainly due to the fact that an increase in information to one or more players leaves the Nash equilibrium obtained under the original information pattern unchanged, but it also creates new equilibria (cf. Prop. 3.10). In infinite games, the underlying reason for occurrence of informationally nonunique Nash equilibria is essentially the same (though much more intricate), and a counterpart of Prop. 3.10can be verified. Towards this end we first introduce the notion of "informational inferiority" in such dynamic games. DEFINITION 3 Let I and II be two Nperson Kstage infinite dynamic games which admit precisely the same extensive form description except the underlying information structure (and, of course, also the strategy spaces whose descriptions depend on the information structure). Let I'/i( (respectively 1'/\1) denote the information pattern of Pi in the game I (respectively, II), and let the inclusion relation 1'/; f; I'/h imply that whatever Pi knows at each stage of game I he also knows at the corresponding stages of game II, but not necessarily vice versa. Then, I is informationally inferior to II if I'/i( S; 1'/\( for all i E N, with strict inclusion for at least one i. 0
PROPOSITION 5 Let I and lJ be two Nperson Kstage infinite dynamic games as introduced in De! 3, so that I is informationally inferior to I I. Furthermore, let the strategy spaces of the players in the two games be compatible with the given information patterns and the constraints (if any) imposed on the controls, so that 1'/\ S; 1'/\, implies r\ s; r\J, i E N. Then, (i) any Nash equilibrium solution for I is also a Nash equilibrium solution for l l, (ii) if {y 1, . . . , yN} is a Nash equilibrium solution for lJ such that yi E for all i E N, then it is also a Nash equilibrium solution for I.
n
Proof Let by definition,
V*; i E N} constitute a Nash equilibrium solution for I. Then,
Jl(yl*, y2*, ... , yN')::::; Jl(yl, y2*, ... , yN'), Vyl
Erl;
6
265
NASH EQUILIBRIA
n,
with the corresponding therefore, PI minimizes J1 (., y2*, ... , yN") over solution being r" Now consider minimization of the same expression over n,( ~ n) which reflects an increase in deterministic information concerning the values of state. But, since we have a qeterministic optimization problem, the minimum value of jl (y1, y2*, ... , yN ) does not change with an increase in information (see section 5.5.4). Hence,
En.
and furthermore since y 1 * E r
h, we have the
inequality
Since PI was an arbitrary player in this discussion, it in general follows that Ji(y1*, . . . ,
r", . . . , yN* )::;; Ji(y1*,
. . . , yi1*, yi, yi+1*, . . . , yN* ),
'tyiErh (iEN) which verifies (i) of Prop. 5. Proof of (ii) is along similar lines.
o
Since there corresponds at least one informationally inferior game (viz. a game with an open-loop information structure) to every multi-stage game with CLPS information, the foregoing result clearly provides one set of reasons for the existence of "informationally nonunique" Nash equilibria in infinite dynamic games (as Prop. 3.10 did for finite dynamic games). However, this is not yet the whole story, since it does not explain the occurrence of uncountably many equilibria in such games. What is really responsible for this is the existence of uncountably many representations of a strategy under dynamic information (cf. Section 5.5.4). To elucidate somewhat further, consider the scalar three-person dynamic game of Section 6.3.1. We have already seen that, for each fixed equilibrium strategy γ^3 of P3, the equilibrium strategies of P1 and P2 have unique open-loop values given by (29a) and (29b), respectively, but are otherwise free. We also know from Section 5.5.4 that there exist infinitely many closed-loop representations of such open-loop policies; and since each one has a different structure, this leads to infinitely many equilibrium strategies for P3, and consequently to a plethora of Nash equilibria. The foregoing discussion is valid on a much broader scale (not only for the specific three-person game treated in Section 6.3.1), and it provides a general guideline for the derivation of informationally nonunique Nash equilibria, leading to the algorithm given below after the concept of "stagewise equilibrium" is introduced.
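The mechanism of multiple representations can be made concrete with a toy deterministic system. The dynamics and numerical values below are purely illustrative (not from the text): two strategies that differ as functions of the observed history can generate identical control values, and hence identical trajectories, along the realized path.

```python
# Toy scalar system x_{k+1} = x_k + u_k over two stages (hypothetical example).
# The open-loop values u_1 = -1, u_2 = -0.5 admit many closed-loop
# "representations": functions of the state history that reproduce exactly
# these values along the trajectory they themselves generate.

def f(x, u):
    return x + u

x1 = 2.0
u_ol = [-1.0, -0.5]                     # fixed open-loop values

def rep_a(k, history):
    # trivial representation: ignore the observed state entirely
    return u_ol[k]

def rep_b(k, history):
    # state-dependent representation that agrees with u_ol on-trajectory
    return -history[k] / 2.0            # -x_1/2 = -1.0, then -x_2/2 = -0.5

def rollout(policy):
    xs = [x1]
    for k in range(2):
        xs.append(f(xs[-1], policy(k, xs)))
    return xs

print(rollout(rep_a))                   # [2.0, 1.0, 0.5]
print(rollout(rep_b))                   # same trajectory: [2.0, 1.0, 0.5]
print(rep_a(1, [2.0, 1.5]), rep_b(1, [2.0, 1.5]))   # off-trajectory they differ
```

In a deterministic game each such representation leaves trajectories and costs unchanged, which is exactly what generates the uncountable families of equilibria discussed above.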
DEFINITION 4 For an N-person K-stage infinite dynamic game, a set of strategies {γ^{i*}; i ∈ N} satisfying the following inequalities for all γ^i_k ∈ Γ^i_k, i ∈ N, k ∈ K, is a stagewise (Nash) equilibrium solution:

J^i(γ^{1*}; ...; γ^{i*}; ...; γ^{N*}) ≤ J^i(γ^{1*}; ...; γ^{i*}_1, ..., γ^{i*}_{k-1}, γ^i_k, γ^{i*}_{k+1}, ..., γ^{i*}_K; ...; γ^{N*}).   (32)
□

Remark 5 Every Nash equilibrium solution (cf. the set of inequalities (3)) is a stagewise Nash equilibrium solution, but not vice versa. □

An algorithm to obtain informationally nonunique Nash equilibria in infinite dynamic games

First determine the entire class of stagewise equilibrium solutions:
(1) Starting at the last stage of the game (k = K), fix the N-tuples of strategies at every other stage k < K, and solve what is basically an N-person static game at stage K (defined by (32) with k = K) to determine the open-loop values of the corresponding stagewise equilibrium strategies at k = K as functions of the strategies applied previously. Furthermore, determine the equivalence class of representations of this N-tuple of open-loop values, again as functions of the previous strategies, and such that they are compatible with the underlying information structure of the problem.
(2) Now consider inequalities (32) for k = K − 1 and adopt a specific member of the equivalence class determined at step 1 as the strategy N-tuple applied at stage K; furthermore, fix the N-tuples of strategies at every stage k < K − 1, and solve what is basically an N-person static game to determine the open-loop values of the corresponding stagewise equilibrium strategies at stage K − 1 as functions of the strategies applied previously. Repeat this for every member of the equivalence class determined at step 1. Each will, in general, result in a different set of open-loop values. Now, for each set of open-loop values thus determined, find the corresponding equivalence class of representations which are also compatible with the underlying information structure and which depend on the strategies applied at earlier stages.
(K) Finally, consider the set of inequalities (32) for k = 1, and adopt specific (and compatible) members of the equivalence classes determined at steps 1, ..., K − 1 as the strategies applied at stages K, ..., 2; solve the resulting N-person static game to determine the corresponding stagewise equilibrium strategies at stage 1, which will in general also require the solution of some implicit equations. Repeat this for every compatible (K − 1)-tuple of members of the equivalence classes constructed at steps 1, ..., K − 1.

This then completes determination of the entire class of stagewise equilibrium solutions of the given dynamic game. Finally, one has to check which of these stagewise equilibrium solutions also constitute Nash equilibria, by referring to the set of inequalities (3).

Remark 6
In obtaining a set of informationally nonunique equilibrium solutions for the scalar example of Section 6.3.1, we have actually followed the steps of the foregoing algorithm; but there we did not have to check the last condition, since every stagewise equilibrium solution was also a Nash equilibrium solution, because every player acted only once. □

The foregoing algorithm leads to an uncountable number of informationally nonunique Nash equilibria in deterministic multi-stage infinite nonzero-sum games wherein at least one player has dynamic information. Furthermore, these informationally nonunique Nash equilibria are, in general, not interchangeable, thereby leading to infinitely many different equilibrium cost N-tuples. The reason why we use the words "in general" in the preceding sentence is that there exist some extreme cases of nonzero-sum games which do not exhibit such features. One such class is the so-called team problems, for which L^1 ≡ L^2 ≡ ... ≡ L^N, i.e. there is a common objective. The Nash equilibrium solution corresponds in this case to a person-by-person optimal solution, and it becomes a team solution under some further restrictions (see the discussion given in Section 4.6). Deterministic team problems are no different from the optimal control problems discussed in Section 5.5, and in particular the discussion of Section 5.5.4 leads to the conclusion that informational nonuniqueness of equilibria does not create any major difficulty in team problems, since all these equilibrium solutions are different representations of the same N-tuple of strategies, which is associated with the global minimum of a single objective functional. Hence, for the special case of team problems, informational nonuniqueness does not lead to different equilibrium cost N-tuples.
Let us now consider the other extreme case, the class of two-person zero-sum infinite dynamic games, in which case L^1 ≡ −L^2. For such dynamic games, informational nonuniqueness does not lead to different (saddle-point) equilibrium cost pairs either, since saddle points are interchangeable; however, as we have observed in Section 6.3.1, existence of a saddle point will depend on the specific pair of representations adopted. Therefore, contrary to team solutions, every representation of a pair of saddle-point strategies is not
necessarily a saddle-point strategy pair. The following theorem makes this statement precise, and it also provides a strengthened version of Prop. 5 for two-person zero-sum dynamic games.

THEOREM 8 Let I be a two-person zero-sum K-stage dynamic game with the closed-loop perfect state information pattern.
(i) If I admits an open-loop saddle-point solution and also a feedback saddle-point solution, the former is unique if, and only if, the latter is unique.
(ii) If I admits a unique feedback saddle-point solution {γ^{1f}, γ^{2f}} with corresponding state trajectory {x^f_{k+1}; k ∈ K}, and if {γ^{1*}, γ^{2*}} denotes some other closed-loop saddle-point solution with corresponding state trajectory {x^*_{k+1}; k ∈ K}, we have

x^*_{k+1} = x^f_{k+1},  k ∈ K,
γ^{i*}_k(x^*_k, ..., x^*_2, x_1) ≡ γ^{if}_k(x^f_k),  k ∈ K,  i = 1, 2.

(iii) If I admits a unique open-loop saddle-point solution {γ^{1o}, γ^{2o}} with corresponding state trajectory {x^o_{k+1}; k ∈ K}, and if {γ^{1*}, γ^{2*}} denotes some other closed-loop saddle-point solution with corresponding state trajectory {x^*_{k+1}; k ∈ K}, we have

x^*_{k+1} = x^o_{k+1},  k ∈ K,
γ^{i*}_k(x^*_k, ..., x^*_2, x_1) ≡ γ^{io}_k(x_1),  k ∈ K,  i = 1, 2.
Proof (i) Let {γ^{1f}, γ^{2f}} be the unique feedback saddle-point solution and {γ^{1o}, γ^{2o}} be any open-loop saddle-point solution. It follows from Prop. 5 that {γ^{1f}, γ^{2f}} and {γ^{1o}, γ^{2o}} are also closed-loop saddle-point solutions. Furthermore, because of the interchangeability property of saddle points, {γ^{1o}, γ^{2f}} and {γ^{1f}, γ^{2o}} are also closed-loop saddle-point strategy pairs, i.e.

J(γ^{1o}, γ^2) ≤ J(γ^{1o}, γ^{2f}) ≤ J(γ^1, γ^{2f}),  ∀γ^1 ∈ Γ^1, γ^2 ∈ Γ^2,   (i)

and an analogous pair of inequalities holds for {γ^{1f}, γ^{2o}}, where Γ^i denotes the closed-loop strategy space of Pi. Now, let us consider the RHS inequality of (i), which is equivalent to

J(γ^{1o}, γ^{2f}) = min_{γ^1 ∈ Γ^1} J(γ^1, γ^{2f}).

This is a standard optimal control problem in discrete time, with cost functional

Ĵ(γ^1) = Σ_{k∈K} ĝ_k(x_{k+1}, u^1_k, x_k),  ĝ_k(x_{k+1}, u^1_k, x_k) ≜ g_k(x_{k+1}, u^1_k, γ^{2f}_k(x_k), x_k),  u^1_k = γ^1_k(·),

and state equation

x_{k+1} = f̂_k(x_k, u^1_k) ≜ f_k(x_k, u^1_k, γ^{2f}_k(x_k)).

A globally optimal solution of this problem over Γ^1 can be obtained (if it exists) via dynamic programming, and as a feedback strategy. But by hypothesis (since {γ^{1f}, γ^{2f}} is a saddle-point pair), there exists a unique feedback strategy (namely, γ^{1f}) that solves this problem. Since every closed-loop strategy admits a unique open-loop representation (cf. Section 5.5.4), it follows that there exists a unique open-loop solution to that optimal control problem, which is given by γ^{1o}_k(x_1) = γ^{1f}_k(x^f_k), k ∈ K, where {x^f_{k+1}; k ∈ K} denotes the state trajectory corresponding to {γ^{1f}, γ^{2f}}. Therefore, γ^{1o} is unique whenever γ^{1f} is. It can likewise be shown, by making use of the LHS inequality of (i), that γ^{2o}_k(x_1) = γ^{2f}_k(x^f_k), and hence γ^{2o} is unique whenever γ^{2f} is unique. This completes verification of the "if" part of (i). The "only if" part can be verified analogously, this time starting with the hypothesis that {γ^{1o}, γ^{2o}} is the unique open-loop saddle-point solution.

(ii) By Prop. 5 and the interchangeability property of saddle points, {γ^{1*}, γ^{2f}} and {γ^{1f}, γ^{2*}} are also saddle-point strategy pairs; therefore

J(γ^{1*}, γ^{2f}) = min_{γ^1 ∈ Γ^1} J(γ^1, γ^{2f}).

This equality defines an optimal control problem in discrete time and, as in the proof of part (i), every globally optimal solution of this optimization problem can be obtained via dynamic programming; by hypothesis, γ^{1f} is the unique feedback strategy that renders the global minimum. Hence, every solution in Γ^1 has the same open-loop representation as γ^{1f}, that is, γ^{1*}_k(x^f_k, ..., x^f_2, x_1) ≡ γ^{1f}_k(x^f_k), k ∈ K. Similarly, this time by starting with {γ^{1f}, γ^{2*}}, γ^{2*}_k(x^f_k, ..., x^f_2, x_1) ≡ γ^{2f}_k(x^f_k), k ∈ K. Therefore, the pairs of strategies {γ^{1f}, γ^{2f}} and {γ^{1*}, γ^{2*}} have the same open-loop representations, and this necessarily implies that the corresponding state trajectories are the same.

(iii) The idea of the proof here is the same as that of (ii). By Prop. 5 and the interchangeability property of saddle points, {γ^{1*}, γ^{2o}} and {γ^{1o}, γ^{2*}} are also saddle-point strategy pairs; the former of these implies that

J(γ^{1*}, γ^{2o}) = min_{γ^1 ∈ Γ^1} J(γ^1, γ^{2o}),

which defines an optimization problem whose open-loop solution is known to be unique (namely, γ^{1o}). Therefore, every other solution of this optimization problem in Γ^1 has to be a closed-loop representation of γ^{1o} on the trajectory {x^o_{k+1}; k ∈ K}, i.e. γ^{1*}_k(x^o_k, ..., x^o_2, x_1) ≡ γ^{1o}_k(x_1), k ∈ K. Similarly, γ^{2*}_k(x^o_k, ..., x^o_2, x_1) ≡ γ^{2o}_k(x_1), k ∈ K. Furthermore, because of this equivalence between the open-loop representations, we necessarily have x^*_{k+1} = x^o_{k+1}, k ∈ K. □
Remark 7 The "uniqueness" assumptions of parts (ii) and (iii) of Thm 8 serve to simplify the statement of the theorem, but do not bring in any loss of generality of a conceptual nature. To indicate the modification to be incorporated in (ii) and (iii) of Thm 8 in the case of nonunique open-loop or feedback saddle points, let us assume that there exist two open-loop saddle-point solutions, say {γ^{1o}, γ^{2o}} and {γ^{1oo}, γ^{2oo}}. Then, because of the interchangeability property, {γ^{1o}, γ^{2oo}} and {γ^{1oo}, γ^{2o}} will also be saddle-point solutions; and if {γ^{1*}, γ^{2*}} denotes some closed-loop saddle-point solution, the open-loop representation of this pair of strategies will be equivalent to one of the four open-loop saddle-point strategy pairs. The saddle-point value is, of course, the same for all these different saddle-point solutions. □
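The interchangeability property invoked in the proof of Thm 8 and in Remark 7 can be checked directly on a small matrix game. The payoff matrix below is a hypothetical example (row player minimizes, column player maximizes), not taken from the text.

```python
import itertools

# Hypothetical zero-sum matrix game: entry A[i][j] is paid by the row player
# (minimizer) to the column player (maximizer).
A = [[2, 1, 2],
     [4, 3, 4],
     [2, 1, 2]]

def is_saddle(A, i, j):
    # minimizer cannot improve by changing rows, maximizer by changing columns
    col_min = all(A[i][j] <= A[r][j] for r in range(len(A)))
    row_max = all(A[i][j] >= A[i][c] for c in range(len(A[0])))
    return col_min and row_max

saddles = [(i, j) for i in range(3) for j in range(3) if is_saddle(A, i, j)]
print(saddles)                          # four saddle points

# Interchangeability: combining the row of one saddle point with the column
# of another again gives a saddle point, and all carry the same value.
for (i, _), (_, j) in itertools.product(saddles, saddles):
    assert is_saddle(A, i, j)
print({A[i][j] for i, j in saddles})    # a single common value
```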
Theorem 8, together with Remark 7, provides a complete characterization of the saddle-point solutions of two-person zero-sum discrete-time dynamic games whenever their open-loop and/or feedback saddle-point solutions are known: every other saddle point can be obtained as a representation of these under the given information pattern. Furthermore, whenever they both uniquely exist, the open-loop representation of the feedback saddle-point solution is necessarily the open-loop saddle-point solution. This does not imply, however, that if a feedback saddle point exists, an open-loop saddle point necessarily has to exist (and vice versa); we have already seen in Section 6.2.2 (cf. Prop. 2) that the former statement is not true in linear-quadratic zero-sum games.

In conclusion, a nonzero-sum multi-stage infinite game with dynamic information will, in general, admit uncountably many (informationally nonunique) Nash equilibria which are not interchangeable, unless it is a team problem or a zero-sum game. In Basar and Selbuz (1976) it has actually been shown, within the context of a class of dynamic two-stage nonzero-sum games, that this is in fact a strict property; i.e. unless the nonzero-sum game can be converted into an equivalent team problem or a zero-sum game, it admits uncountably many Nash equilibria. One possible way of removing informational nonuniqueness in the noncooperative equilibria of nonzero-sum dynamic games is to further restrict the equilibrium solution concept by requiring it to be a feedback Nash equilibrium (cf. Def. 2), which has been discussed thoroughly in Section 6.2.2. However, it should be noted that this restriction (which is compatible with a delayed commitment mode of play) makes sense only if all players have access to the current value of the state (e.g. the CLPS, MPS or FB information pattern).
If at least one of the players has access to delayed or open-loop information, the feedback equilibrium solution cannot be defined, and one then has to resort to some other method for eliminating informational nonuniqueness. One such alternative is to formulate the nonzero-sum game problem in a stochastic framework, which is the subject of the next section.
6.4 STOCHASTIC NONZERO-SUM GAMES WITH DETERMINISTIC INFORMATION PATTERNS

We have already seen in the previous sections that one way of eliminating "informational nonuniqueness" in the Nash solution of dynamic nonzero-sum games is to further restrict the equilibrium concept by requiring satisfaction of a "feedback" property (cf. Def. 2), which is, however, valid only under the CLPS and MPS information patterns. Yet another alternative to eliminate informational nonuniqueness is to formulate the dynamic game in a stochastic framework, in accordance with Def. 5.4; for, under an appropriate stochastic formulation, every strategy has a unique representation (cf. Prop. 5.3) and the statement of Prop. 5 is no longer valid. This section is, therefore, devoted to an investigation of the existence, uniqueness and derivation of Nash equilibria in stochastic dynamic games with deterministic information patterns.

The class of N-person K-stage discrete-time stochastic infinite dynamic games to be treated in this section will be as described in Def. 5.4, but with the state equation (5.11) replaced by

x_{k+1} = f_k(x_k, u^1_k, ..., u^N_k) + θ_k,  k ∈ K,   (33)

where {x_1, θ_1, ..., θ_K} is a set of statistically independent Gaussian random vectors with values in R^n, and cov(θ_k) > 0 ∀k ∈ K. Furthermore, the cost functionals of the players are taken to be stage-additive, i.e.

L^i(u^1, ..., u^N) = Σ_{k∈K} g^i_k(x_{k+1}, u^1_k, ..., u^N_k, x_k),  i ∈ N.   (34)
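To make the setup (33)-(34) concrete, expected costs under fixed feedback strategies can be estimated by simulation. The minimal sketch below uses hypothetical scalar dynamics and a hypothetical stage cost, chosen only so that the Monte Carlo estimate can be checked against a hand computation.

```python
import random

def simulate_expected_cost(strategies, x1, sigma, K, runs, seed=0):
    """Monte Carlo estimate of a stage-additive cost of the form (34) for the
    scalar instance of (33): x_{k+1} = x_k + sum_i u_i + theta_k, with
    theta_k ~ N(0, sigma^2).  `strategies` is a list of feedback maps
    u_i = gamma_i(x); the stage cost used here is simply x_{k+1}**2
    (a hypothetical choice, not from the text)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        x, cost = x1, 0.0
        for _ in range(K):
            u = sum(g(x) for g in strategies)
            x = x + u + rng.gauss(0.0, sigma)
            cost += x * x
        total += cost
    return total / runs

# With all controls identically zero, E[L] = sum_{k=1..K} (x1**2 + k*sigma**2).
est = simulate_expected_cost([lambda x: 0.0], 1.0, 1.0, 2, 20000)
print(est)   # close to (1 + 1) + (1 + 2) = 5
```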
Under the CLPS information pattern, the following theorem provides a set of necessary and sufficient conditions for any Nash equilibrium solution of such a stochastic dynamic game to satisfy:

THEOREM 9 Every Nash equilibrium solution of an N-person K-stage stochastic infinite dynamic game described by (33)-(34) and with the CLPS information pattern (for all players) comprises only feedback strategies; and for an N-tuple {γ^{i*}_k(x_k), k ∈ K; i ∈ N} to constitute such a solution, it is necessary and sufficient that there exist functions V^i(k, ·): R^n → R (k ∈ K, i ∈ N) such that the following recursive relations are satisfied:

V^i(k, x) = min_{u^i_k ∈ U^i_k} E_{θ_k}[ g^i_k( f̃^i_k(x, u^i_k) + θ_k, γ^{1*}_k(x), ..., γ^{(i-1)*}_k(x), u^i_k, γ^{(i+1)*}_k(x), ..., γ^{N*}_k(x), x ) + V^i(k+1, f̃^i_k(x, u^i_k) + θ_k) ]
          = E_{θ_k}[ g^i_k( f_k(x, γ^{1*}_k(x), ..., γ^{N*}_k(x)) + θ_k, γ^{1*}_k(x), ..., γ^{N*}_k(x), x ) + V^i(k+1, f_k(x, γ^{1*}_k(x), ..., γ^{N*}_k(x)) + θ_k) ];   (35)
V^i(K+1, x) = 0,  i ∈ N,

where

f̃^i_k(x, u^i_k) ≜ f_k(x, γ^{1*}_k(x), ..., γ^{(i-1)*}_k(x), u^i_k, γ^{(i+1)*}_k(x), ..., γ^{N*}_k(x))   (36)
and E_{θ_k} denotes the expectation operation with respect to the statistics of θ_k.

Proof The first step is to verify the statement of the theorem for the stagewise equilibrium solution (cf. Def. 4); hence start with inequalities (32) at stage k = K, which read

J^i(γ^{1*}; ...; γ^{i*}; ...; γ^{N*}) ≤ J^i(γ^{1*}; ...; γ^{i*}_1, ..., γ^{i*}_{K-1}, γ^i_K; ...; γ^{N*}),

where

J^i(γ^1, ..., γ^N) = E_{θ^K, x_1}[ L^i(u^1, ..., u^N) | u^i = γ^i(·), i ∈ N ]

and θ^k ≜ {θ_1, ..., θ_k}, k ∈ K. Because of the stage-additive nature of L^i, these inequalities equivalently describe an N-person static game defined by

min_{γ^i_K} E_{θ^K, x_1}[ g^i_K( f_K(x^*_K, γ^{1*}_K(·), ..., γ^i_K(·), ..., γ^{N*}_K(·)) + θ_K, γ^{1*}_K(·), ..., γ^i_K(·), ..., γ^{N*}_K(·), x^*_K ) ],   (i)

where x^*_k (k ∈ K) is recursively defined by

x^*_{k+1} = f_k(x^*_k, γ^{1*}_k(x^*_k), ..., γ^{N*}_k(x^*_k)) + θ_k,  x^*_1 = x_1.   (ii)

Now, since θ_K is statistically independent of {θ^{K-1}, x_1}, (i) can equivalently be written as

E_{θ^{K-1}, x_1}{ min_{u^i_K ∈ U^i_K} E_{θ_K}[ g^i_K( f̃^i_K(x^*_K, u^i_K) + θ_K, γ^{1*}_K(·), ..., u^i_K, ..., γ^{N*}_K(·), x^*_K ) ] },   (iii)

which implies that the minimizing u^i_K will be a function of x^*_K; furthermore (in contrast with the deterministic version), x^*_K cannot be expressed in terms of x^*_{K-1}, ..., x^*_1, since there is a noise term in (ii) which would directly contribute additional errors in case of such a substitution (mainly because every strategy has a unique representation when the dynamics are given by (ii)). Therefore, under the CLPS information pattern, any stagewise Nash equilibrium strategy for Pi at stage K will only be a function of x_K, i.e. a feedback strategy u^i_K = γ^{i*}_K(x_K), which further implies that the choice of such a strategy is independent of all the optimal (or otherwise) strategies employed by the players at previous stages. Finally, note that at stage k = K the minimizing solutions of (iii) coincide with those of (35), since the minimization problems are equivalent.

Let us now consider inequalities (32) at stage k = K − 1. Because of the stage-additive nature of L^i, and since the γ^{i*}_K (i ∈ N) have already been determined at stage K as feedback strategies (independent of all the previous strategies), these can further be written as

E_{θ^{K-2}, x_1}{ min_{u^i_{K-1} ∈ U^i_{K-1}} E_{θ_{K-1}}[ g^i_{K-1}( f̃^i_{K-1}(x^*_{K-1}, u^i_{K-1}) + θ_{K-1}, γ^{1*}_{K-1}(·), ..., u^i_{K-1}, ..., γ^{N*}_{K-1}(·), x^*_{K-1} ) + V^i(K, f̃^i_{K-1}(x^*_{K-1}, u^i_{K-1}) + θ_{K-1}) ] }.   (iv)

In writing down (iv), we have also made use of the statistical independence of θ^{K-2}, x_1, θ_{K-1} and θ_K. Through a reasoning similar to the one employed at stage k = K, we readily conclude that any γ^i_{K-1}(·) satisfying (iv) will have to be a feedback strategy, independent of all the past strategies employed and the past values of the state; therefore u^i_{K-1} = γ^i_{K-1}(x_{K-1}) (i ∈ N). This is precisely what (35) says for k = K − 1. Proceeding in this manner, the theorem can readily be verified for the stagewise equilibrium; that is, every set of strategies {γ^{i*}; i ∈ N} satisfying (35) constitutes a stagewise equilibrium solution for the stochastic game under consideration, and conversely, every stagewise equilibrium solution of the stochastic dynamic game is comprised of feedback strategies which satisfy (35).

To complete the last phase of the verification, we first observe that every stagewise equilibrium solution determined in the foregoing derivation is also a feedback Nash equilibrium solution (satisfying (3.28)), since the construction of stagewise equilibrium strategies at stage k = l did not depend on the strategies employed at stages k < l. Now, since every feedback Nash equilibrium under the CLPS information pattern is also a Nash equilibrium (a trivial extension of Prop. 1 to stochastic dynamic games), and furthermore since every Nash equilibrium solution is a stagewise equilibrium solution (see Remark 5), it readily follows that the statement of Thm 9 is valid also for the Nash equilibrium solution. □
Remark 8 Firstly, since every strategy admits a unique representation under the state dynamics description (33) (with cov(θ_k) > 0, k ∈ K), the solution presented in Thm 9 clearly does not exhibit any "informational nonuniqueness". Therefore, if nonunique Nash equilibria exist in a stochastic dynamic game with the CLPS information pattern, this is only due to the structure of f_k and g^i_k, and not a consequence of the dynamic nature of the information pattern as in the deterministic case. Secondly, the Nash equilibrium solution presented in Thm 9 will, in general, depend on the statistical properties (i.e. the mean and covariance) of the Gaussian noise term θ_k (k ∈ K). For the special case of linear-quadratic stochastic dynamic games, however, no such dependence on the covariance exists, as is shown in the sequel. □
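The way noise forces representations apart can be seen in a one-player toy calculation (entirely hypothetical, not from the text): a feedback law and its open-loop representation give identical costs when the disturbance is absent, but different expected costs as soon as the disturbance has positive variance.

```python
def expected_costs(x1, var):
    """Toy problem: x2 = x1 + theta with E[theta] = 0 and Var[theta] = var;
    the controller then chooses u and pays E[(x2 + u)**2 + u**2].
    The optimal feedback law is u = -x2/2; its open-loop representation along
    the noise-free trajectory is the constant u = -x1/2.  Both expected costs
    are available in closed form."""
    feedback = (x1**2 + var) / 2.0       # E[x2**2 / 2] under u = -x2/2
    open_loop = x1**2 / 2.0 + var        # E[(x2 - x1/2)**2 + (x1/2)**2]
    return feedback, open_loop

print(expected_costs(2.0, 0.0))   # (2.0, 2.0): indistinguishable without noise
print(expected_costs(2.0, 1.0))   # (2.5, 3.0): the noise separates them
```

With positive variance the two strategies are genuinely different objects with different costs, which is why representations collapse to a single strategy in the stochastic formulation.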
As an immediate application of Thm 9, let us consider the class of linear-quadratic stochastic dynamic games wherein f_k and g^i_k are as defined by (5a)-(5b), and R^{ii}_k > 0 (k ∈ K, i ∈ N). Furthermore, assume that E[θ_k] = 0 ∀k ∈ K. Because of this latter assumption, and since f_k is linear and g^i_k is quadratic, the minimization problem in (35) becomes independent of the covariance of θ_k as far as the optimal strategies are concerned, and a comparison with (15) readily leads to the conclusion that they should admit the same solution, which is linear in the current value of the state. Therefore, we arrive at the following corollary, which we give without any further verification.

COROLLARY 4 An N-person linear-quadratic stochastic dynamic game as formulated above, and with Q^i_{k+1} ≥ 0 (i ∈ N, k ∈ K), R^{ij}_k ≥ 0 (i, j ∈ N, j ≠ i, k ∈ K) and the CLPS information pattern, admits a unique Nash equilibrium solution if, and only if, (16a) admits a unique solution set {P^{i*}_k; i ∈ N, k ∈ K}, in which case the equilibrium strategies are given by

γ^{i*}_k(x_k) = −P^{i*}_k x_k  (k ∈ K, i ∈ N);   (37)

that is, the unique solution coincides with the unique feedback Nash equilibrium solution of the deterministic version of the game (cf. Corollary 1). □
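The feedback equilibrium of Corollary 4 can be computed by a backward recursion in the spirit of (35). The sketch below does this for a hypothetical scalar two-player linear-quadratic game; the per-stage linear system for the gains stands in for the solution set of (16a), which is not reproduced here, and the equilibrium property is then checked by perturbing one player's gain.

```python
def feedback_nash_lq(a, b, q, r, K):
    """Backward recursion for a scalar two-player LQ dynamic game with
    x_{k+1} = a*x + b[0]*u1 + b[1]*u2 and stage cost of player i equal to
    q[i]*x_{k+1}**2 + r[i]*u_i**2.  Returns gains[k] = (p1, p2) so that the
    equilibrium strategies are u_i = -p_i * x at stage k."""
    s = [0.0, 0.0]                              # value coefficients V_i = s_i x^2
    gains = []
    for _ in range(K):
        c = [(q[i] + s[i]) * b[i] for i in range(2)]
        # stationarity of player i:  r_i p_i + c_i*(b1 p1 + b2 p2) = c_i * a
        a11, a12, rhs1 = r[0] + c[0]*b[0], c[0]*b[1], c[0]*a
        a21, a22, rhs2 = c[1]*b[0], r[1] + c[1]*b[1], c[1]*a
        det = a11*a22 - a12*a21
        p = ((rhs1*a22 - a12*rhs2) / det, (a11*rhs2 - a21*rhs1) / det)
        closed = a - b[0]*p[0] - b[1]*p[1]      # closed-loop coefficient
        s = [(q[i] + s[i])*closed**2 + r[i]*p[i]**2 for i in range(2)]
        gains.append(p)
    gains.reverse()                             # gains[k] applies at stage k
    return gains

def total_cost(gains, a, b, q, r, i, x1):
    x, J = x1, 0.0
    for p1, p2 in gains:
        u = (-p1*x, -p2*x)
        x = a*x + b[0]*u[0] + b[1]*u[1]
        J += q[i]*x**2 + r[i]*u[i]**2
    return J

gains = feedback_nash_lq(1.0, (1.0, 1.0), (1.0, 1.0), (1.0, 2.0), 2)
base = total_cost(gains, 1.0, (1.0, 1.0), (1.0, 1.0), (1.0, 2.0), 0, 1.0)
bumped = [(gains[0][0] + 0.05, gains[0][1])] + gains[1:]
print(total_cost(bumped, 1.0, (1.0, 1.0), (1.0, 1.0), (1.0, 2.0), 0, 1.0) >= base)
```

Per Corollary 4, these gains are unchanged by additive zero-mean noise in the state equation, even though the value functions themselves are not.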
We therefore observe that an appropriate stochastic formulation in linear-quadratic dynamic games with the CLPS information pattern serves to remove informational nonuniqueness of equilibria, since it leads to a unique equilibrium which is one of the uncountably many equilibria of the original deterministic game. This particular equilibrium solution (which is also a feedback equilibrium solution in this case) may be viewed as the unique robust solution of the deterministic game, insensitive to zero-mean random perturbations in the state equation. As an illustration, let us consider the three-person dynamic game example of Section 6.3.1, and, out of the uncountably many Nash equilibrium solutions obtained there, let us attempt to select the
one which features such a property. The new (stochastic) game is now the one which has zero-mean random perturbations in (27a)-(27b); more specifically, the state equation is

x₃ = x₂ + u¹ + u² + θ₂,  x₂ = x₁ + u³ + θ₁,

where E[θ₂] = E[θ₁] = 0; θ₂, θ₁, x₁ are Gaussian and statistically independent; and E[(θ₂)²] > 0, E[(θ₁)²] > 0. The unique Nash equilibrium solution of this stochastic game is {γ^{1*}(x₂) = −(2/3)x₂, γ^{2*}(x₂) = (1/3)x₂, γ^{3*}(x₁) = −(2/11)x₁}, which is also the feedback Nash equilibrium solution of the associated deterministic game.

If the underlying information pattern of a stochastic dynamic game is not CLPS (or, more precisely, if every player does not have access to the current value of the state), the statement of Thm 9 is not valid. It is, in fact, not possible to obtain a general theorem which provides the Nash equilibrium solution under all different types of deterministic information patterns, and therefore one has to solve each such game separately. One common salient feature of all these solutions is that they are all free of any "informational nonuniqueness", and any nonuniqueness in equilibria (if it exists) is only due to the structure of the cost functionals and the state equation. In particular, for linear-quadratic stochastic dynamic games, it can be shown that the Nash equilibrium solution will in general be unique under every deterministic information pattern. One such class of problems, viz. N-person linear-quadratic stochastic dynamic games wherein some players have CLPS information and others OL information, has been considered in Basar (1975), and a unique Nash equilibrium solution has been obtained there in explicit form. This solution has the property that, for the players with CLPS information, the equilibrium strategies depend linearly not only on the current value of the state, but also on the initial state. The following example illustrates the nature of this solution and also serves to provide a comparative study of Nash equilibria under four different information patterns.

Example 1 Consider the stochastic version of the three-person nonzero-sum game of Section 6.3.1.
We consider four different information structures, namely when
(i) P1 and P2 have access to CLPS information and P3 has OL information (this case is also covered by Corollary 4),
(ii) P1 has CLPS information and P2, P3 have OL information,
(iii) P2 has CLPS information and P1, P3 have OL information,
(iv) all three players have OL information.
In these stochastic games the state dynamics are described by

x₃ = x₂ + u¹ + u² + θ₂,
x₂ = x₁ + u³ + θ₁,
where {θ₁, θ₂, x₁} is a set of statistically independent Gaussian variables with E[θ_k] = 0, E[(θ_k)²] > 0, k = 1, 2.† The cost functionals are again as given by (28a)-(28c), but now their expected values determine the costs incurred by the players.

(i) Under the first information pattern (CLPS for P1 and P2, OL for P3), the unique solution follows from Corollary 4 and is given by (as discussed right after Corollary 4)

γ^{1*}(x₂) = −(2/3)x₂,  γ^{2*}(x₂) = (1/3)x₂,  γ^{3*}(x₁) = −(2/11)x₁.

The corresponding expected values of the cost functionals are

J^{1*} = (72/121)E[(x₁)²] + (8/9)E[(θ₁)²] + E[(θ₂)²],   (38a)
J^{2*} = −J^{3*} = …   (38b)
(ii) Under the second information pattern (CLPS for P1, OL for P2 and P3), direct derivation, or utilization of the results of Basar (1975), leads to the unique Nash equilibrium solution, in which γ^{1*} depends linearly on both x₂ and x₁, while γ^{2*} and γ^{3*} are open-loop strategies linear in x₁. Note that this unique solution of the stochastic game under consideration corresponds to one of the uncountably many equilibrium solutions presented in Prop. 4 for its deterministic version (simply set p = 1/6, q = 1/3). The corresponding expected values of the cost functionals are

J^{1*} = (1/2)E[(x₁)²] + (1/2)E[(θ₁)²] + E[(θ₂)²],   (39a)
J^{2*} = −J^{3*} = (1/36)E[(x₁)²] − E[(θ₁)²] − E[(θ₂)²].   (39b)
(iii) Under the third information pattern (CLPS for P2, OL for P1 and P3), direct derivation leads to a unique solution set, which also corresponds to one of the solution sets of Prop. 4, this time obtained by setting p = 2/3, q = 2/3. The corresponding expected values of the cost functionals are

J^{1*} = (8/25)E[(x₁)²] + 4E[(θ₁)²] + E[(θ₂)²],   (40a)
J^{2*} = −J^{3*} = −(26/25)E[(x₁)²] − 2E[(θ₁)²] − E[(θ₂)²].   (40b)

† This Gaussian distribution may be replaced by any probability distribution that assigns positive probability mass to every open subset of R^n (see also Remark 5.11), without affecting the results to be obtained in the sequel.
(iv) Under the fourth information pattern (OL for all three players), the unique Nash equilibrium solution can also be obtained from Prop. 4, by letting p = 2/3, q = 1/3. The corresponding expected cost values are

J^{1*} = (8/25)E[(x₁)²] + E[(θ₁)²] + E[(θ₂)²],   (41a)
J^{2*} = −J^{3*} = −(6/25)E[(x₁)²] − E[(θ₁)²] − E[(θ₂)²].   (41b)
A comparison of the expected equilibrium cost values of the players attained under the four different information patterns now reveals some interesting features of the Nash equilibrium solution.
(1) Comparing (39a) with (41a), we observe that when the other players' information patterns are OL, an increase in information to P1 (that is, going from OL to CLPS) could improve his performance (if E[(θ₁)²] is sufficiently large) or be detrimental (if E[(θ₁)²] is relatively small).
(2) When P2 has CLPS information, a comparison of (38a) and (40a) leads to a similar trend in the effect of an increase in P1's information on his equilibrium performance.
(3) When P1 has CLPS information, an increase in information to P2 leads to degradation in J^{2*} (simply compare (38b) with (39b)).
(4) When P1 has OL information, an increase in information to P2 leads to improvement in his performance (compare (40b) with (41b)).
These results clearly indicate that, in nonzero-sum games and under the Nash equilibrium solution concept, an increase in information to one player does not necessarily lead to improvement in his equilibrium performance, and at times it could even be detrimental. □
6.5 APPENDIX A: OPEN-LOOP AND FEEDBACK NASH EQUILIBRIA OF DIFFERENTIAL GAMES

This section presents results which are counterparts of those given in Section 2 for differential games that fit the framework of Def. 5.5; that is, we consider N-person quantitative dynamic games defined in continuous time and described by the state equation

ẋ(t) = f(t, x(t), u^1(t), ..., u^N(t));  x(0) = x₀   (A-1a)

and cost functionals

L^i(u^1, ..., u^N) = ∫₀^T g^i(t, x(t), u^1(t), ..., u^N(t)) dt + q^i(x(T));  i ∈ N.   (A-1b)

Here, [0, T] denotes the fixed prescribed duration of the game, x₀ is the initial state known by all players, x(t) ∈ R^n and u^i(t) ∈ S^i ⊆ R^{m_i} (i ∈ N), ∀t ∈ [0, T]. Furthermore, f and γ^i ∈ Γ^i (i ∈ N) satisfy the conditions of Thm 5.1, so that, for a given information structure (one of (i), (ii) or (iv) of Def. 5.6), the state equation admits a unique solution for every corresponding N-tuple {γ^i ∈ Γ^i; i ∈ N}. The Nash equilibrium solution under a given information structure is again defined as one satisfying the set of inequalities (3), where J^i denotes the cost functional of Pi for the differential game in strategic (normal) form.
6.5.1
Openloop Nash equilibria
The results to be presented here are counterparts of those given in section 6.2.1; and in order to display this relationship explicitly, we shall adopt the same numbering system (as in section 6.2.1) for theorems, definitions, etc., with "A" preceding the corresponding number. That is, Thm A1 in the sequel corresponds to Thm 1 of section 6.2.1, Def. A1 corresponds to Def. 1, etc. We therefore first have

THEOREM A1  For an N-person differential game of prescribed fixed duration [0, T], let
(i) f(t, ·, u^1, ..., u^N) be continuously differentiable on R^n, ∀t ∈ [0, T],
(ii) g^i(t, ·, u^1, ..., u^N) and q^i(·) be continuously differentiable on R^n, ∀t ∈ [0, T], i ∈ N.
Then, if {γ^i*(t, x_0) = u^i*(t); i ∈ N} provides an open-loop Nash equilibrium solution, and {x*(t), 0 ≤ t ≤ T} is the corresponding state trajectory, there exist N costate functions p^i(·): [0, T] → R^n, i ∈ N, such that the following relations are satisfied:

ẋ*(t) = f(t, x*(t), u^1*(t), ..., u^N*(t));  x*(0) = x_0

γ^i*(t, x_0) ≡ u^i*(t) = arg min_{u^i ∈ S^i} H^i(t, p^i(t), x*(t), u^1*(t), ..., u^(i-1)*(t), u^i, u^(i+1)*(t), ..., u^N*(t))   (A2)

ṗ^i′(t) = −(∂/∂x) H^i(t, p^i(t), x*(t), u^1*(t), ..., u^N*(t))

p^i′(T) = (∂/∂x) q^i(x*(T)),  i ∈ N,

where H^i(t, p, x, u^1, ..., u^N) ≜ g^i(t, x, u^1, ..., u^N) + p′f(t, x, u^1, ..., u^N), t ∈ [0, T], i ∈ N.

Proof  Follows the same lines as the proof of Thm 1, but now the minimum principle for continuous-time control systems (i.e. Thm 5.4) is used instead of Thm 5.5.  □
Theorem A1 provides a set of necessary conditions for the open-loop Nash equilibrium solution to satisfy, and therefore it can be used to generate candidate solutions. For the special class of "linear-quadratic" differential games, however, a unique candidate solution can be obtained in explicit terms, which can further be shown to be an open-loop Nash equilibrium solution under certain convexity restrictions on the cost functionals of the players:

DEFINITION A1  An N-person differential game of fixed prescribed duration is of the linear-quadratic type if S^i = R^{m_i} (i ∈ N) and

f(t, x, u^1, ..., u^N) = A(t)x + Σ_{i∈N} B^i(t)u^i

g^i(t, x, u^1, ..., u^N) = ½ (x′Q^i(t)x + Σ_{j∈N} u^j′R^{ij}(t)u^j)

q^i(x) = ½ x′Q_f^i x,

where A(·), B^i(·), Q^i(·), R^{ij}(·) are matrices of appropriate dimensions, defined on [0, T], and with continuous entries (i, j ∈ N). Furthermore, Q_f^i, Q^i(·) are symmetric and R^{ii}(·) > 0 (i ∈ N).  □

THEOREM A2  For an N-person linear-quadratic differential game with Q^i(·) ≥ 0, Q_f^i ≥ 0 (i ∈ N), let there exist a unique solution set {M^i; i ∈ N} to the
coupled matrix Riccati differential equations

Ṁ^i + M^i A + A′M^i + Q^i − M^i Σ_{j∈N} B^j (R^{jj})^{-1} B^j′ M^j = 0;  M^i(T) = Q_f^i, i ∈ N.   (A3)

Then, the differential game admits a unique open-loop Nash equilibrium solution given by

γ^i*(t, x_0) = u^i*(t) = −R^{ii}(t)^{-1} B^i′(t) M^i(t) x*(t)  (i ∈ N),   (A4)

where x*(·) denotes the associated state trajectory defined by

x*(t) = Φ(t, 0)x_0

(d/dt) Φ(t, σ) = F(t)Φ(t, σ);  Φ(σ, σ) = I   (A5)

F ≜ A − Σ_{i∈N} B^i (R^{ii})^{-1} B^i′ M^i.
Proof  For the linear-quadratic differential game, and in view of the additional restrictions Q^i(·) ≥ 0, Q_f^i ≥ 0, L^i(u^1, ..., u^N) is a strictly convex function of u^i(·) for all permissible control functions u^j(·) (j ≠ i, j ∈ N) and for all x_0 ∈ R^n. This then implies that Thm A1 is also a sufficiency theorem, and every solution set of the first-order conditions provides an open-loop Nash solution. We, therefore, now show that the solution given in Thm A2 is the only candidate solution. First note that the Hamiltonian is

H^i(t, p, x, u^1, ..., u^N) = ½ (x′Q^i x + Σ_{j∈N} u^j′R^{ij}u^j) + p^i′(Ax + Σ_{j∈N} B^j u^j),

whose minimization with respect to u^i(t) ∈ R^{m_i} yields the unique relation

u^i*(t) = −R^{ii}(t)^{-1} B^i(t)′ p^i(t),  i ∈ N.   (i)

Furthermore, the costate equations are

ṗ^i = −Q^i x* − A′p^i;  p^i(T) = Q_f^i x*(T)  (i ∈ N),

and the optimal state trajectory is given by

ẋ* = Ax* − Σ_{j∈N} B^j (R^{jj})^{-1} B^j′ p^j;  x*(0) = x_0.   (ii)

This set of differential equations constitutes a two-point boundary value problem, the solution of which can be written, without any loss of generality, as {p^i(t) = M^i(t)x*(t), i ∈ N; x*(t), t ∈ [0, T]}, where M^i(·) are (n × n)-dimensional matrices. Now, substituting p^i = M^i x* (i ∈ N) into the costate equations, we arrive at the conclusion that M^i (i ∈ N) should satisfy the coupled matrix Riccati equations (A3). The expressions for the open-loop Nash strategies follow readily from (i) by substituting p^i = M^i x*, and likewise the associated state trajectory (A5) follows from (ii).  □

Remark A1  It is possible to obtain the result of Thm A2 in a somewhat different way, by directly utilizing Prop. 5.2; see Problem 10.  □
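The coupled Riccati equations (A3), the open-loop strategies (A4), and the trajectory (A5) are straightforward to compute numerically. The following sketch (not from the text; all parameter values are illustrative) integrates (A3) backward from t = T for a scalar two-player game and recovers the strategies (A4):

```python
# Sketch: numerical solution of the coupled open-loop Nash Riccati equations (A3)
# for a scalar two-player linear-quadratic game; illustrative parameters only.
import numpy as np
from scipy.integrate import solve_ivp

T = 1.0
a, b = 1.0, (1.0, 1.0)                         # A(t) = a, B^i(t) = b[i]
q, qf, r = (1.0, 2.0), (1.0, 0.5), (1.0, 1.0)  # Q^i, Q_f^i, R^{ii}

def riccati(t, M):
    # (A3) in scalar form: dM^i/dt = -2 a M^i - q^i + M^i * sum_j (b_j^2 / r_j) M^j
    s = sum(b[j]**2 / r[j] * M[j] for j in range(2))
    return [-2*a*M[i] - q[i] + M[i]*s for i in range(2)]

# integrate backward from the terminal condition M^i(T) = Q_f^i
sol = solve_ivp(riccati, (T, 0.0), list(qf), dense_output=True, rtol=1e-8)

# forward state trajectory x*(t) under F(t) = a - sum_i (b_i^2/r_i) M^i(t), cf. (A5)
def xdot(t, x):
    M = sol.sol(t)
    return (a - sum(b[i]**2 / r[i] * M[i] for i in range(2))) * x

x0 = 1.0
traj = solve_ivp(xdot, (0.0, T), [x0], dense_output=True, rtol=1e-8)

def u_star(i, t):
    # open-loop Nash control (A4): u^{i*}(t) = -(b_i / r_i) M^i(t) x*(t)
    return -(b[i] / r[i]) * sol.sol(t)[i] * traj.sol(t)[0]

print(u_star(0, 0.0), u_star(1, 0.0))
```

Note that the backward sweep for {M^i} is decoupled from the forward sweep for x*, mirroring the structure of the proof above.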
For the class of two-person zero-sum differential games, we rewrite the cost functional (to be minimized by P1 and maximized by P2) as

L(u^1, u^2) = ∫_0^T g(t, x, u^1, u^2) dt + q(x(T)),

and obtain the following set of necessary conditions for the saddle-point strategies to satisfy.

THEOREM A3  For a two-person zero-sum differential game of prescribed fixed duration [0, T], let
(i) f(t, ·, u^1, u^2) be continuously differentiable on R^n, ∀t ∈ [0, T],
(ii) g(t, ·, u^1, u^2) and q(·) be continuously differentiable on R^n, ∀t ∈ [0, T].
Then, if {γ^i*(t, x_0) = u^i*(t); i = 1, 2} provides an open-loop saddle-point solution, and {x*(t), 0 ≤ t ≤ T} is the corresponding state trajectory, there exists a costate function p(·): [0, T] → R^n such that the following relations are satisfied:

ẋ*(t) = f(t, x*(t), u^1*(t), u^2*(t));  x*(0) = x_0

H(t, p, x*, u^1*, u^2) ≤ H(t, p, x*, u^1*, u^2*) ≤ H(t, p, x*, u^1, u^2*),
  ∀u^1(t) ∈ S^1, u^2(t) ∈ S^2, t ∈ [0, T]   (A6)

ṗ′(t) = −(∂/∂x) H(t, p(t), x*, u^1*(t), u^2*(t));  p′(T) = (∂/∂x) q(x*(T)),

where H(t, p, x, u^1, u^2) ≜ g(t, x, u^1, u^2) + p′f(t, x, u^1, u^2).

Proof  This result follows directly from Thm A1 by taking N = 2, g^1 ≡ −g^2 ≜ g, q^1 ≡ −q^2 ≜ q, and by noting that p^1(·) ≡ −p^2(·) ≜ p(·) and H^1(·) ≡ −H^2(·) ≜ H(·).  □
For the special case of a two-person zero-sum linear-quadratic game described by the state equation

ẋ = A(t)x + B^1(t)u^1 + B^2(t)u^2;  x(0) = x_0   (A7a)

and the objective functional

L(u^1, u^2) = ½ ∫_0^T (x′(t)Q(t)x(t) + u^1(t)′u^1(t) − u^2(t)′u^2(t)) dt + ½ x(T)′Q_f x(T),   (A7b)

we now first impose additional restrictions on the parameters so that L(u^1, u^2) is strictly concave in u^2.

LEMMA A1  For the linear-quadratic two-person zero-sum differential game introduced above, the objective functional L(u^1, u^2) is strictly concave in u^2 (for all permissible u^1(·)) if, and only if, there exists a unique bounded symmetric solution S(·) to the matrix Riccati equation

Ṡ + A′S + SA + Q + S B^2 B^2′ S = 0;  S(T) = Q_f   (A8)

on the interval [0, T].

Proof  A reasoning similar to the one employed in the proof of Lemma 1 leads to the observation that the requirement of strict concavity is equivalent to the condition of existence of a unique solution to the optimal control problem

min_{u(·)} { ∫_0^T [u(t)′u(t) − y(t)′Q(t)y(t)] dt − y(T)′Q_f y(T) }

subject to

ẏ = A(t)y + B^2(t)u;  y(0) = x_0.

The lemma then follows, in view of this observation, from Remark 5.9.  □
THEOREM A4  For the two-person linear-quadratic zero-sum differential game described by (A7a)-(A7b) and with Q_f ≥ 0, Q(·) ≥ 0, let there exist a unique bounded symmetric solution to the matrix Riccati differential equation (A8) on [0, T]. Then,
(i) there exists a unique bounded symmetric solution M(·) to the matrix Riccati differential equation

Ṁ + A′M + MA + Q − M(B^1 B^1′ − B^2 B^2′)M = 0;  M(T) = Q_f   (A9)

on [0, T],
(ii) the game admits a unique open-loop saddle-point solution given by

γ^1*(t, x_0) ≡ u^1*(t) = −B^1(t)′M(t)x*(t)
γ^2*(t, x_0) ≡ u^2*(t) = B^2(t)′M(t)x*(t),   (A10)

where x*(·) denotes the associated state trajectory defined by

x*(t) = Φ(t, 0)x_0

(d/dt) Φ(t, σ) = F(t)Φ(t, σ);  Φ(σ, σ) = I   (A11)

F ≜ A − (B^1 B^1′ − B^2 B^2′)M.
Proof  The steps involved in the proof of this theorem are similar to those involved in the proof of Thm 4: we first establish existence of a unique saddle-point solution to the linear-quadratic differential game under consideration, and then verify (i) and (ii) above in view of this property. For the former step we shall proceed rather casually, since a rigorous verification requires some prior knowledge of functional analysis which is beyond the scope of this book. The interested reader can, however, refer to Balakrishnan (1976) for a rigorous exposition of the underlying mathematics, or to Bensoussan (1971) and Lukes and Russell (1971) for a framework (Hilbert space setting) wherein similar mathematical techniques are used at a rigorous level but in a somewhat different context.

Let U^i, the space of permissible control functions u^i(·), be a complete (Banach) space equipped with the inner product

⟨u, v⟩_i = ∫_0^T u(t)′v(t) dt,  for u, v ∈ U^i, i = 1, 2.

Then, since L(u^1, u^2) is quadratic in x, u^1, u^2, and x is linear in u^1, u^2, L(u^1, u^2) can be written as

L(u^1, u^2) = ½⟨u^1, L_1 u^1⟩_1 − ½⟨u^2, L_2 u^2⟩_2 + ⟨u^1, L_12 u^2⟩_1 + ⟨u^1, r_1⟩_1 + ⟨u^2, r_2⟩_2 + r_3,   (i)

where L_1: U^1 → U^1, L_2: U^2 → U^2, L_12: U^2 → U^1 are bounded linear operators, r_1 ∈ U^1, r_2 ∈ U^2 and r_3 ∈ R. Moreover, since Q(·) ≥ 0, Q_f ≥ 0, L_1 is a strongly positive operator (written as L_1 > 0), and furthermore, under the condition of unique solvability of (A8) (cf. Lemma A1), L_2 > 0. Now, if L(u^1, u^2) admits a unique saddle point in open-loop policies, it is necessary and sufficient that it satisfy the following two relations (obtained by differentiation of (i)):

L_1 u^1 + L_12 u^2 + r_1 = 0
L_12* u^1 − L_2 u^2 + r_2 = 0,

where L_12* denotes the adjoint of L_12. Solving for u^2 ∈ U^2 from the second one (uniquely), and substituting it into the first one, we obtain the single relation

(L_1 + L_12 L_2^{-1} L_12*) u^1 + r_1 + L_12 L_2^{-1} r_2 = 0,   (ii)

which admits a unique solution, since L_1 + L_12 L_2^{-1} L_12* > 0 and thereby invertible. (This latter feature follows since L_1 > 0 and L_12 L_2^{-1} L_12* is a nonnegative self-adjoint compact operator (see Balakrishnan, 1976, for details).) We have thus established existence of a unique open-loop saddle-point strategy for P1 under the condition of Lemma A1. This readily implies existence of a unique open-loop saddle-point strategy for P2, since L_2 is invertible. Thus completing the proof of existence of a unique saddle point, we now turn to verification of (i) and (ii). At the outset, let us first note that the unique saddle-point solution should necessarily satisfy relations (A6), which can be rewritten for the linear-quadratic game as

ẋ*(t) = A(t)x*(t) + B^1(t)u^1*(t) + B^2(t)u^2*(t);  x*(0) = x_0
u^1*(t) = −B^1(t)′p(t)
u^2*(t) = B^2(t)′p(t)   (iii)
ṗ(t) = −Q(t)x*(t) − A(t)′p(t);  p(T) = Q_f x*(T).

Since r_1(·) and r_2(·) introduced earlier are linear functions of x_0, the unique solution u^1* of (ii) is necessarily a linear function of x_0, and so is u^2*. Therefore, p(·) in (iii) should depend on x_0 linearly. This readily implies that we can write, without any loss of generality, p(t) = M(t)x*(t), where M(·) is some bounded matrix function of dimension (n × n) and with continuously differentiable elements. Now, substitution of this relation into (iii), and requiring (iii) to hold true for all x_0 ∈ R^n, leads to (i) and (ii) of Thm A4.  □
6.5.2  Closed-loop no-memory (feedback) Nash equilibria
Under the memoryless perfect state information pattern, the following theorem provides a set of necessary conditions for any closed-loop no-memory Nash equilibrium solution to satisfy.

THEOREM A5  For an N-person differential game of prescribed fixed duration [0, T], let
(i) f(t, ·) be continuously differentiable on R^n × S^1 × ... × S^N, ∀t ∈ [0, T],
(ii) g^i(t, ·) and q^i(·) be continuously differentiable on R^n × S^1 × ... × S^N and R^n, respectively, ∀t ∈ [0, T], i ∈ N.
Then, if {γ^i*(t, x, x_0) = u^i*(t); i ∈ N} provides a closed-loop no-memory Nash equilibrium solution such that γ^i*(t, ·, x_0) is continuously differentiable on R^n (∀t ∈ [0, T], i ∈ N), and {x*(t), 0 ≤ t ≤ T} is the corresponding state trajectory, there exist N costate functions p^i(·): [0, T] → R^n, i ∈ N, such that the following relations are satisfied:

ẋ*(t) = f(t, x*(t), u^1*(t), ..., u^N*(t));  x*(0) = x_0

γ^i*(t, x*, x_0) ≡ u^i*(t) = arg min_{u^i ∈ S^i} H^i(t, p^i(t), x*(t), u^1*(t), ..., u^(i-1)*(t), u^i, u^(i+1)*(t), ..., u^N*(t))

ṗ^i′(t) = −(∂/∂x) H^i(t, p^i(t), x*, γ^1*(t, x*, x_0), ..., γ^(i-1)*(t, x*, x_0), u^i*, γ^(i+1)*(t, x*, x_0), ..., γ^N*(t, x*, x_0))   (A12)

p^i′(T) = (∂/∂x) q^i(x*(T)),  i ∈ N,

where H^i is as defined in Thm A1.

Proof  Let us consider the i-th inequality of (3), which fixes all players' strategies (except the i-th one) at γ^j = γ^j* (j ≠ i, j ∈ N) and constitutes an optimal control problem for Pi. Therefore, relations (A12) directly follow from Thm 5.4 (the minimum principle for continuous-time systems), as do relations (A2) of Thm A1. We should note, however, that the partial derivative (with respect to x) in the costate equations of (A12) receives a contribution also from the dependence of the remaining N − 1 players' strategies on the current value of x, a feature which is clearly absent in the costate equations of Thm A1.  □
The set of relations (A12) in general admits an uncountable number of solutions, which correspond to "informationally nonunique" Nash equilibrium solutions of the differential game under the memoryless perfect state information pattern, one of which is the open-loop solution given in Thm A1. The analysis of section 6.3.1 can in fact be duplicated in continuous time to produce explicitly a plethora of informationally nonunique Nash equilibria under dynamic information; the reader is referred to Basar (1977b) for one such explicit derivation. In order to eliminate informational nonuniqueness in the derivation of Nash equilibria under dynamic information (specifically, under the MPS and CLPS patterns), we constrain the Nash solution concept further, by requiring it to satisfy conditions similar to those of Def. 2, but now in continuous time. Such a consideration leads to the concept of a "feedback Nash equilibrium solution", which is introduced below.

DEFINITION A2  For an N-person differential game of prescribed fixed duration [0, T] and with memoryless perfect state (MPS) or closed-loop perfect state (CLPS) information pattern, an N-tuple of strategies {γ^i* ∈ Γ^i, i ∈ N}† constitutes a feedback Nash equilibrium solution if there exist functionals V^i(·,·) defined on [0, T] × R^n and satisfying the following relations for each i ∈ N:

V^i(T, x) = q^i(x)

V^i(t, x) = ∫_t^T g^i(s, x*(s), γ^1*(s, η_s), ..., γ^i*(s, η_s), ..., γ^N*(s, η_s)) ds + q^i(x*(T))
  ≤ ∫_t^T g^i(s, x^i(s), γ^1*(s, η_s), ..., γ^(i-1)*(s, η_s), γ^i(s, η_s), γ^(i+1)*(s, η_s), ..., γ^N*(s, η_s)) ds + q^i(x^i(T)),  ∀γ^i ∈ Γ^i, x ∈ R^n,   (A13)

where, on the interval [t, T],

ẋ^i(s) = f(s, x^i(s), γ^1*(s, η_s), ..., γ^(i-1)*(s, η_s), γ^i(s, η_s), γ^(i+1)*(s, η_s), ..., γ^N*(s, η_s));  x^i(t) = x   (A14)

ẋ*(s) = f(s, x*(s), γ^1*(s, η_s), ..., γ^i*(s, η_s), ..., γ^N*(s, η_s));  x*(t) = x,

and η_s stands for either the data set {x(s), x_0} or {x(σ), σ ≤ s}, depending on whether the information pattern is MPS or CLPS.  □

† Here Γ^i is chosen to be compatible with the associated information pattern.

One salient feature of the concept of feedback Nash equilibrium solution introduced above is that if an N-tuple {γ^i*; i ∈ N} provides a feedback Nash equilibrium solution (FNES) to an N-person differential game with duration [0, T], its restriction to the time interval [t, T] provides an FNES to the same differential game defined on the shorter time interval [t, T], with the initial state taken as x(t), and this being so for all 0 ≤ t < T. An immediate consequence of this observation is that, under either the MPS or CLPS information pattern, feedback Nash equilibrium strategies will depend only on the time variable and the current value of the state, but not on memory (including the initial state x_0). The following proposition now verifies that an FNES is indeed a Nash equilibrium solution.

PROPOSITION A1  Under the memoryless (respectively, closed-loop) perfect state information pattern, every feedback Nash equilibrium solution of an N-person differential game is a closed-loop no-memory (respectively, closed-loop) Nash equilibrium solution (but not vice versa).

Proof  Note that V^i(t, x) is the value function (cf. section 5.5.2) associated with the optimal control problem of minimizing J^i(γ^1*, ..., γ^(i-1)*, γ^i, γ^(i+1)*, ..., γ^N*) over γ^i ∈ Γ^i (see, in particular, equation (5.24)).
Therefore, (A13), together with the given terminal boundary condition, implies satisfaction of the i-th inequality of (3). Since i ∈ N was arbitrary, and the foregoing argument is valid under both the MPS and CLPS information patterns, the result stands proven.  □

If the value functions V^i (i ∈ N) are continuously differentiable in both their arguments, then N partial differential equations, related to the Hamilton-Jacobi-Bellman equation of optimal control (cf. section 5.5.2), replace (A13).

THEOREM A6  For an N-person differential game of prescribed fixed duration [0, T], and under either MPS or CLPS information pattern, an N-tuple of strategies {γ^i* ∈ Γ^i; i ∈ N} provides a feedback Nash equilibrium solution if there exist functions V^i: [0, T] × R^n → R, i ∈ N, satisfying the partial differential equations
−∂V^i(t, x)/∂t = min_{u^i ∈ S^i} [ (∂V^i(t, x)/∂x) f^i*(t, x, u^i) + g^i*(t, x, u^i) ]
  = (∂V^i(t, x)/∂x) f^i*(t, x, γ^i*(t, x)) + g^i*(t, x, γ^i*(t, x));
V^i(T, x) = q^i(x)  (i ∈ N),   (A15)

where

f^i*(t, x, u^i) ≜ f(t, x, γ^1*(t, x), ..., γ^(i-1)*(t, x), u^i, γ^(i+1)*(t, x), ..., γ^N*(t, x))
g^i*(t, x, u^i) ≜ g^i(t, x, γ^1*(t, x), ..., γ^(i-1)*(t, x), u^i, γ^(i+1)*(t, x), ..., γ^N*(t, x)).
Proof  In view of the argument employed in the proof of Prop. A1, this result follows by basically tracing the standard technique of obtaining (5.25) from (5.24).  □
For the class of N-person linear-quadratic differential games (cf. Def. A1), it is possible to obtain an explicit solution for (A15), which is quadratic in x. This also readily leads to a set of feedback Nash equilibrium strategies which are expressible in closed form.

COROLLARY A1  For an N-person linear-quadratic differential game with Q^i(·) ≥ 0, Q_f^i ≥ 0, R^{ij}(·) ≥ 0 (i, j ∈ N, i ≠ j), let there exist a set of matrix-valued functions Z^i(·) ≥ 0, i ∈ N, satisfying the following N coupled matrix Riccati differential equations:

Ż^i + Z^i F + F′Z^i + Σ_{j∈N} Z^j B^j (R^{jj})^{-1} R^{ij} (R^{jj})^{-1} B^j′ Z^j + Q^i = 0;  Z^i(T) = Q_f^i,   (A16a)

where

F(t) ≜ A(t) − Σ_{i∈N} B^i(t) (R^{ii}(t))^{-1} B^i(t)′ Z^i(t).   (A16b)

Then, under either the MPS or CLPS information pattern, the differential game admits a feedback Nash equilibrium solution in linear strategies, given by

γ^i*(t, x) = −(R^{ii}(t))^{-1} B^i(t)′ Z^i(t) x  (i ∈ N),   (A17)

and the corresponding values of the cost functionals are

J^i* = V^i(0, x_0) = ½ x_0′ Z^i(0) x_0  (i ∈ N).

Proof  Simply note that, under the condition of solvability of the set of matrix Riccati differential equations (A16a), the partial differential equation (A15) admits a solution in the form V^i(t, x) = ½ x′Z^i(t)x (i ∈ N), with the corresponding minimizing controls being given by (A17). The "nonnegative definiteness" requirement imposed on Z^i(·) is a consequence of the fact that V^i(t, x) ≥ 0 ∀x ∈ R^n, t ∈ [0, T], this latter feature being due to the eigenvalue restrictions imposed a priori on Q^i(·), Q_f^i and R^{ij}(·). Finally, the corresponding "Nash" values for the cost functionals follow from the definition of V^i(t, x) (see equation (A13)).  □

Remark A2  The foregoing corollary provides only one set of feedback Nash equilibrium strategies for the linear-quadratic game under consideration, and it does not attribute any uniqueness feature to this solution set. However, in view of the discrete-time counterpart of this result (cf. Corollary 1), one would expect the solution to be unique under the condition that (A16a) admits a unique solution set; but, in order to verify this, one has to show that it is not possible to come up with other (possibly nonlinear) solutions that satisfy (A13), and hitherto this has remained an unresolved problem. What has been accomplished, though, is verification of uniqueness of the feedback Nash equilibrium when the players are restricted at the outset to linear strategies (cf. Lukes, 1971).  □
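As with the open-loop case, the coupled equations (A16a)-(A16b) can be integrated backward numerically. The sketch below (illustrative parameters, not from the text) does this for a scalar two-player game and evaluates the feedback strategies (A17) and the equilibrium costs ½ x_0′Z^i(0)x_0:

```python
# Sketch: backward integration of the coupled feedback Nash Riccati equations
# (A16a) for a scalar two-player linear-quadratic game; illustrative parameters.
import numpy as np
from scipy.integrate import solve_ivp

T = 1.0
a, b = 0.5, (1.0, 1.0)              # A, B^1, B^2
q, qf = (1.0, 1.0), (1.0, 1.0)      # Q^i, Q_f^i
R = [[1.0, 0.2], [0.2, 1.0]]        # R[i][j] = R^{ij}; R^{ii} > 0, R^{ij} >= 0

def dZ(t, Z):
    # (A16b) in scalar form, then (A16a) rearranged for dZ^i/dt
    F = a - sum(b[j]**2 / R[j][j] * Z[j] for j in range(2))
    return [-2*F*Z[i] - q[i]
            - sum(R[i][j] * (b[j] / R[j][j])**2 * Z[j]**2 for j in range(2))
            for i in range(2)]

sol = solve_ivp(dZ, (T, 0.0), list(qf), dense_output=True, rtol=1e-8)

def gamma_star(i, t, x):
    # feedback Nash strategy (A17): gamma^{i*}(t, x) = -(R^{ii})^{-1} B^i Z^i(t) x
    return -(b[i] / R[i][i]) * sol.sol(t)[i] * x

x0 = 1.0
J = [0.5 * x0 * sol.sol(0.0)[i] * x0 for i in range(2)]  # J^{i*} = (1/2) x0' Z^i(0) x0
print(J)
```

With the symmetric parameters chosen here the two players' Riccati variables coincide, which provides a quick sanity check on the implementation.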
We now specialize the result of Thm A6 to the class of two-person zero-sum differential games of prescribed fixed duration, and obtain a single partial differential equation to replace (A15).

COROLLARY A2  For a two-person zero-sum differential game of prescribed fixed duration [0, T], and under either MPS or CLPS information pattern, a pair of strategies {γ^i* ∈ Γ^i; i = 1, 2} provides a feedback saddle-point solution if there exists a function V: [0, T] × R^n → R satisfying the partial differential equation

−∂V(t, x)/∂t = min_{u^1 ∈ S^1} max_{u^2 ∈ S^2} [ (∂V(t, x)/∂x) f(t, x, u^1, u^2) + g(t, x, u^1, u^2) ]
  = max_{u^2 ∈ S^2} min_{u^1 ∈ S^1} [ (∂V(t, x)/∂x) f(t, x, u^1, u^2) + g(t, x, u^1, u^2) ]   (A18)
  = (∂V(t, x)/∂x) f(t, x, γ^1*(t, x), γ^2*(t, x)) + g(t, x, γ^1*(t, x), γ^2*(t, x));
V(T, x) = q(x).

Proof  This result follows as a special case of Thm A6 by taking N = 2, g^1(·) ≡ −g^2(·) ≜ g(·) and q^1(·) ≡ −q^2(·) ≜ q(·), in which case V^1 ≡ −V^2 ≜ V, and existence of a saddle point is equivalent to interchangeability of the min-max operations.  □

Remark A3  The partial differential equation (A18) was first obtained by Isaacs (see Isaacs, 1975), and is therefore sometimes named after him. Furthermore, the requirement of interchangeability of the min-max operations in (A18) is also known as the "Isaacs condition". For more details on this, the reader is referred to Chapter 8.  □
We now obtain the solution of the Isaacs equation (A18) for the case of a two-person zero-sum linear-quadratic differential game described by (A7a)-(A7b). Since the boundary condition is quadratic in x (i.e. V(T, x) = q(x) = ½ x′Q_f x), we try out a solution in the form V(t, x) = ½ x′Z(t)x, where Z(·) is symmetric and Z(T) = Q_f. Substitution of this quadratic structure in (A18) yields, for the linear-quadratic problem, the following equation:

−½ x′Ż(t)x = min_{u^1 ∈ R^{m_1}} max_{u^2 ∈ R^{m_2}} [ x′Z(t)(A(t)x + B^1(t)u^1 + B^2(t)u^2) + ½ x′Q(t)x + ½ u^1′u^1 − ½ u^2′u^2 ].

Since the kernel on the RHS is strictly convex in u^1 (respectively, strictly concave in u^2), and is further separable, it admits a unique saddle-point solution given by

u^1 = −B^1(t)′Z(t)x,  u^2 = B^2(t)′Z(t)x.

If these unique values for u^1 and u^2 are substituted back into the RHS of the Isaacs equation, we finally obtain the following matrix Riccati differential equation for Z(·), by also taking into account the starting assumption that Z is symmetric (which is no loss of generality) and the requirement that the equation should be satisfied for all x ∈ R^n:

Ż + A′Z + ZA + Q − Z(B^1 B^1′ − B^2 B^2′)Z = 0;  Z(T) = Q_f.   (A19)

The following theorem now strengthens this result further:
THEOREM A7  Let there exist a unique symmetric bounded solution to the matrix Riccati differential equation (A19) on the interval [0, T]. Then, the two-person linear-quadratic zero-sum differential game described by (A7a)-(A7b), and under the MPS or CLPS information pattern, admits a unique feedback saddle-point solution given by

γ^1*(t, x) = −B^1(t)′Z(t)x,  γ^2*(t, x) = B^2(t)′Z(t)x.   (A20)

The associated state trajectory is obtained as the solution of

ẋ(t) = [A(t) − (B^1(t)B^1(t)′ − B^2(t)B^2(t)′)Z(t)] x(t);  x(0) = x_0,   (A21a)

and the saddle-point value of the game is

J(γ^1*, γ^2*) = ½ x_0′Z(0)x_0.   (A21b)
Proof  With the exception of the "uniqueness" of the feedback saddle-point solution, the result follows as a special case of Corollary A2, as explained prior to the statement of the theorem. To verify uniqueness of the solution, one approach is to go back to the Isaacs equation and investigate existence and uniqueness properties of solutions of this partial differential equation. This, however, requires some advance knowledge of the theory of partial differential equations, which is beyond the scope of our coverage in this book. In its stead, we shall choose here an alternative approach which makes direct use of the linear-quadratic structure of the differential game. This approach will provide us with the uniqueness result, as well as the equilibrium strategies (A20). Towards this end, let us first note that, under the condition of unique solvability of (A19), x(T)′Q_f x(T) can be written as

x(T)′Q_f x(T) ≡ x(T)′Z(T)x(T) ≡ x_0′Z(0)x_0 + ∫_0^T (d/dt)[x(t)′Z(t)x(t)] dt
  = x_0′Z(0)x_0 + ∫_0^T x(t)′[−A′Z − ZA − Q + Z(B^1 B^1′ − B^2 B^2′)Z] x(t) dt
   + ∫_0^T [ẋ(t)′Z(t)x(t) + x(t)′Z(t)ẋ(t)] dt,

where we have suppressed the t-dependencies of the terms appearing in the integrands. If we now substitute for ẋ(·) from (A7a), we obtain (after cancelling out some common terms)

x(T)′Q_f x(T) ≡ x_0′Z(0)x_0 + ∫_0^T [u^1′B^1′Zx + u^2′B^2′Zx + x′ZB^1 u^1 + x′ZB^2 u^2 − x′Qx + x′ZB^1 B^1′Zx − x′ZB^2 B^2′Zx] dt.

If this identity is used in (A7b), the result is the following equivalent expression for L(u^1, u^2):

L(u^1, u^2) = ½ x_0′Z(0)x_0 + ½ ∫_0^T (u^1 + B^1′Zx)′(u^1 + B^1′Zx) dt − ½ ∫_0^T (u^2 − B^2′Zx)′(u^2 − B^2′Zx) dt.

Since the first term above is independent of u^1 and u^2, L(u^1, u^2) is separable in u^1 and u^2, and it clearly admits the unique saddle-point solution given by (A20). The saddle-point value of L(u^1, u^2) also readily follows as ½ x_0′Z(0)x_0.  □

Remark A4  A comparison of Thms A4 and A7 reveals one common feature of the open-loop and feedback saddle-point solutions in linear-quadratic differential games, which is akin to the one observed for games described in discrete time (cf. Remark 4). The two matrix Riccati differential equations (A9) and (A19) are identical, and by this token the state trajectories generated by the open-loop saddle-point solution and the feedback saddle-point solution are identical; furthermore, the open-loop values of the feedback saddle-point strategies are correspondingly the same as the open-loop saddle-point strategies. A comparison of the existence conditions of Thms A4 and A7, on the other hand, readily leads to the conclusion that existence of an open-loop saddle-point solution necessarily implies existence of a feedback saddle-point solution (since then (A9) (equivalently, (A19)) admits a unique bounded symmetric solution), but not vice versa. This latter feature is a consequence of the following observation: for the class of linear-quadratic differential games satisfying the restriction

B^1 B^1′ − B^2 B^2′ ≥ 0,   (A22)

the matrix Riccati differential equation (A19) admits a unique nonnegative definite solution whenever Q_f ≥ 0, Q(·) ≥ 0 (since then (A19) becomes comparable with (5.48a)); however, (A22) is not sufficient for (A8) to admit a bounded solution on [0, T]. This situation is now exemplified in the sequel.  □
Example A1  Consider a scalar linear-quadratic two-person zero-sum differential game described by

ẋ = √2 u^1 − u^2;  x(0) = x_0, t ∈ [0, 2];

L(u^1, u^2) = ½ [x(2)]^2 + ½ ∫_0^2 {[u^1(t)]^2 − [u^2(t)]^2} dt.

For these parametric values, the Riccati differential equation (A19) can be rewritten as

Ż − Z^2 = 0;  Z(2) = 1,

which admits the unique solution

Z(t) = 1/(3 − t),  t ∈ [0, 2].

Hence, the differential game under consideration admits the unique feedback saddle-point solution

γ^1*(t, x) = −[√2/(3 − t)] x(t)
γ^2*(t, x) = −[1/(3 − t)] x(t).

Let us now investigate the existence of an open-loop saddle-point solution for the same scalar differential game. The Riccati differential equation (A8) can be rewritten, for this example, as

Ṡ + S^2 = 0;  S(T) = 1  (t ∈ [0, T], T = 2),

which admits a bounded solution only if T < 1, in which case

S(t) = 1/(t − T + 1).

However, since T = 2 > 1 in our case, the existence condition of Thm A4 is not satisfied, and therefore an open-loop saddle-point solution does not exist. This implies, in view of Lemma A1, that under the open-loop information structure L(u^1, u^2) is not concave in u^2, and hence the upper value of the game is unbounded (since it is scalar and quadratic).  □

Even though the feedback saddle-point equilibrium requires less stringent existence conditions than the open-loop saddle-point equilibrium in linear-quadratic differential games (as also exemplified in the preceding example), this feature does not necessarily imply (at the outset) that the least stringent conditions for existence of a saddle-point equilibrium under closed-loop information structure are those required by the feedback saddle-point solution. This also brings up the question of whether the linear-quadratic differential game
still admits a saddle-point solution in spite of the existence of a conjugate point of the matrix Riccati equation (A19) on the interval [0, T]. Bernhard (1979) has shown, by going outside the class of continuously differentiable strategies, that this may be possible (i.e. a saddle point may survive a conjugate point) if the conjugate point is of (what he calls) an "even" type.

A limiting case for the result of Thm A7 is obtained by letting T → ∞, assuming that the solution of the Riccati equation (A19), to be denoted by Z(t, T), remains bounded and independent of the terminal condition for all t < ∞ as T → ∞. With all the system matrices taken as constants, the solution of (A19) as T → ∞ then coincides with the solution of the algebraic Riccati equation

A′Z + ZA + Q − Z(B^1 B^1′ − B^2 B^2′)Z = 0,   (A23)

and in this context one may wonder whether the pair of strategies (A20), with Z determined as a solution of (A23), provides a saddle-point equilibrium for the infinite-horizon version of the linear-quadratic zero-sum differential game (obtained by taking all matrices to be time-invariant and by letting T → ∞ in (A7a) and (A7b)). This, however, is not always true, and one has to impose some additional restrictions on the parameters of the problem, as discussed in Mageirou (1976) and Jacobson (1977). The reader is also referred to Mageirou (1977) for the computational aspects of the solution of (A23).
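The behaviour noted in Example A1, a bounded feedback Riccati solution alongside a finite-time escape of the open-loop concavity equation, can be checked numerically. A minimal sketch (the escape threshold 1e4 is an arbitrary illustrative choice):

```python
# Numerical check of Example A1: the feedback equation dZ/dt = Z^2, Z(2) = 1
# stays bounded on [0, 2], while the open-loop concavity equation (A8),
# dS/dt = -S^2 with S(2) = 1, escapes to infinity as t decreases toward 1.
import numpy as np
from scipy.integrate import solve_ivp

# feedback Riccati equation (A19) specialized to the example
zsol = solve_ivp(lambda t, z: z**2, (2.0, 0.0), [1.0],
                 dense_output=True, rtol=1e-10, atol=1e-12)
ts = np.linspace(0.0, 2.0, 21)
# closed form from the example: Z(t) = 1/(3 - t)
assert np.allclose(zsol.sol(ts)[0], 1.0 / (3.0 - ts), atol=1e-5)

# open-loop equation (A8) specialized; stop backward integration when S exceeds 1e4
def escape(t, s):
    return s[0] - 1e4
escape.terminal = True

ssol = solve_ivp(lambda t, s: -s**2, (2.0, 0.0), [1.0],
                 events=escape, rtol=1e-10, atol=1e-12)
t_esc = ssol.t_events[0][0]
print("open-loop equation escapes near t =", t_esc)
```

Since S(t) = 1/(t − 1) on this example, the recorded escape time should sit just above t = 1, consistent with the nonexistence of an open-loop saddle point for T = 2.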
6.6 APPENDIX B: STOCHASTIC DIFFERENTIAL GAMES WITH DETERMINISTIC INFORMATION PATTERNS

As has been discussed in section 6.5.2, one possible way of eliminating "informational nonuniqueness" of Nash equilibria in nonzero-sum differential games is to require the equilibrium solution also to satisfy a feedback property (cf. Def. A2); but this is a valid approach only if the information structure of the problem permits the players to have access to the current value of the state. Yet another approach to eliminating informational nonuniqueness is to formulate the problem in a stochastic framework, and this has been done in section 4 for dynamic games defined in discrete time.

A stochastic formulation for quantitative differential games (i.e. dynamic games defined in continuous time) of prescribed duration involves (cf. section 5.3) a stochastic differential equation

dx_t = F(t, x_t, u^1(t), ..., u^N(t)) dt + σ(t, x_t) dw_t,  x_t|_{t=0} = x_0,   (B1)

which describes the evolution of the state (and replaces (33)), and N cost functionals

L^i(u^1, ..., u^N) = ∫_0^T g^i(t, x_t, u^1(t), ..., u^N(t)) dt + q^i(x_T);  i ∈ N.   (B2)

If Γ^i (i ∈ N) denotes the strategy space of Pi (compatible with the deterministic information pattern of the game), his expected cost functional, when the differential game is brought to normal form, is given by

J^i(γ^1, ..., γ^N) = E[L^i(u^1, ..., u^N) | u^j(·) = γ^j(·, η^j), j ∈ N]   (B3)
with γ^j ∈ Γ^j (j ∈ N) and E[·] denoting the expectation operation taken with respect to the statistics of the standard Wiener process {w_t, t ≥ 0}. Furthermore, F and σ are assumed to satisfy the requirements of Thm 5.2, and in particular σ is assumed to be nonsingular.

This section presents counterparts of some of the results of section 4 for stochastic nonzero-sum differential games of the type described above; the analysis, however, will be rather informal, since the underlying mathematics (on stochastic differential equations, stochastic calculus, etc.) is beyond the scope of this volume. Appropriate references will be given throughout for the interested reader who requires a more rigorous exposition of our treatment here.

Firstly, let us consider the one-person version of the N-person stochastic differential game, which is a stochastic optimal control problem described by the state dynamics

dx_t = F(t, x_t, u(t)) dt + σ(t, x_t) dw_t,  x_t|_{t=0} = x_0,   (B4)

and cost functional

J(γ) = E[ ∫_0^T g(t, x_t, γ(t, x_t)) dt + q(x_T) ],   (B5)
where u(t) = γ(t, x_t) ∈ S, i.e. only feedback strategies are allowed. The following result from Fleming (1969) or Fleming and Rishel (1975) provides a set of sufficient conditions and a characterization for the optimal solution of this stochastic control problem.

PROPOSITION B1  Let there exist a suitably smooth solution W(t, x) of the semilinear parabolic partial differential equation

∂W/∂t + ½ Σ_{i,j} a^{ij}(t, x) ∂²W/∂x_i∂x_j + min_{u ∈ S} [∇_x W · F(t, x, u) + g(t, x, u)] = 0;
W(T, x) = q(x),   (B6)

where a^{ij} denotes the ij-th element of the matrix a(t, x) = σ(t, x)σ(t, x)′, and x_i denotes the i-th component of the vector x. If there is a function u°(t, x, p) that minimizes [p′F(t, x, u) + g(t, x, u)] on S, then γ*(t, x) = u°(t, x, ∇_x W) is an optimal control, with the corresponding expected cost given by J* = W(0, x_0).  □
Remark B1  The partial differential equation (B6) corresponds to the Hamilton-Jacobi-Bellman equation (5.25) of dynamic programming, but now we have the extra second term, which is due to the stochastic nature of the problem. It should also be noted that if the control were allowed to depend also on the past values of the state, the optimal control would still be a feedback strategy, since the underlying system is Markovian (i.e. F and σ depend only on x_t, but not on {x_s, s < t}, at time t).  □

We now observe an immediate application of Prop. B1 to the N-person nonzero-sum stochastic differential game of this section.

THEOREM B1  For an N-person nonzero-sum stochastic differential game of prescribed fixed duration [0, T], as described by (B1)-(B2), and under FB, MPS or CLPS information pattern, an N-tuple of feedback strategies {γ^i* ∈ Γ^i, i ∈ N} provides a Nash equilibrium solution if there exist suitably smooth functions W^i: [0, T] × R^n → R, i ∈ N, satisfying the semilinear parabolic partial differential equations

'
uW ' ot
::1 
1" 2 L. k. j
k' (J
= V x Wi
::12
U
.
W'
J(t, x) ~
uxkux j
= min [V x W' . F'
+ g' (r, x, u')]
'  ' 0 " "
U'ES'
(r, X, u ')
.tr«, x, yi'(t, X))+gi'(t, x, yi'(t,X»;
(B7)
WilT, x) = qi(X) (i EN), where pO(t, x, ui) ~ Fit, x, y1'(t, x), gi'(t,x,Ui)~gi(t,x,y1'(t,X),
, yi 1 (r, x), u', yi+ l'(t, x), ,yi1°(t,x),U i,yi+1°(t,X),
, YN'(t, x)) ,yN'(t,X)).
Proof  This result follows readily from the definition of Nash equilibrium and from Prop. B1, since by fixing all players' strategies, except the i-th one's, at their equilibrium choices (which are known to be feedback by hypothesis), we arrive at a stochastic optimal control problem of the type covered by Prop. B1 and whose optimal solution (if it exists) is a feedback strategy. □

Theorem B1 actually involves two sets of conditions: (i) existence of minima on the RHS of (B7), and (ii) existence of "sufficiently smooth" solutions to the set of partial differential equations (B7). A sufficient condition for the former is existence of functions u^{i°}: [0, T] × R^n × R^n → S^i which satisfy the inequalities

H^i(t, x, p^i; u^{1°}, ..., u^{i°}, ..., u^{N°}) ≤ H^i(t, x, p^i; u^{1°}, ..., u^{i-1°}, u^i, u^{i+1°}, ..., u^{N°}),  ∀u^i ∈ S^i (i ∈ N),  (B8a)

where t ∈ [0, T], p^i ∈ R^n, x ∈ R^n, and

H^i(t, x, p^i; u^1, ..., u^N) ≜ p^i · F(t, x, u^1, ..., u^N) + g^i(t, x, u^1, ..., u^N).  (B8b)
If such functions u^{i°}(t, x, p^i) (i ∈ N) can be found, then we say that the Nash condition holds for the associated differential game.† In fact, Uchida (1978) has shown, by utilizing some of the results of Davis and Varaiya (1973) and Davis (1973) obtained for stochastic optimal control problems, that under the Nash condition and certain technical restrictions on F, σ, S^i, g^i and q^i (i ∈ N) there indeed exists a set of sufficiently smooth solutions to the partial differential equations (B7). Therefore, the Nash condition is sufficient for a suitably well-defined class of stochastic differential games to admit a Nash equilibrium solution,‡ and a set of conditions sufficient for the Nash condition to hold are given in Uchida (1979).

For the special case of zero-sum stochastic differential games, i.e. when L^1 ≡ -L^2 ≜ L and N = 2, every solution set of (B7) is given by W^1 ≡ -W^2 ≜ W, where W satisfies

-∂W/∂t - (1/2) Σ_{i,j} σ^{ij}(t, x) ∂²W/∂x_i∂x_j
  = min_{u^1∈S^1} max_{u^2∈S^2} [∇_x W · F(t, x, u^1, u^2) + g(t, x, u^1, u^2)]
  = max_{u^2∈S^2} min_{u^1∈S^1} [∇_x W · F(t, x, u^1, u^2) + g(t, x, u^1, u^2)];
W(T, x) = q(x).  (B9)
Furthermore, the Nash condition (B8a) can be re-expressed for the stochastic zero-sum differential game as existence of functions u^{i°}: [0, T] × R^n × R^n → S^i, i = 1, 2, which satisfy

H(t, x, p; u^{1°}, u^{2°}) = min_{u^1∈S^1} max_{u^2∈S^2} H(t, x, p; u^1, u^2)
                           = max_{u^2∈S^2} min_{u^1∈S^1} H(t, x, p; u^1, u^2),  (B10a)

where

H(t, x, p; u^1, u^2) ≜ p · F(t, x, u^1, u^2) + g(t, x, u^1, u^2).  (B10b)

Because of the similarity with the deterministic problem, condition (B10a) is known as the Isaacs condition. Now, it can actually be shown (see Elliott, 1976)

† Note that such a condition could also have a counterpart in Thm A6, for the deterministic problem.
‡ Uchida (1978) actually proves this result for a more general class of problems for which the state equation is not necessarily Markovian, in which case a more general version replaces (B7).
that, if the Isaacs condition holds and if S^1 and S^2 are compact, there exists a suitably smooth solution to the semilinear partial differential equation (B9), and consequently a saddle-point solution to the stochastic zero-sum differential game under consideration. Therefore, we have†

COROLLARY B1  For the two-person zero-sum stochastic differential game described as an appropriate special case of (B1)-(B2), and satisfying certain technical restrictions, let S^1 and S^2 be compact and let the Isaacs condition (B10a) hold. Then, there exists a saddle-point solution in feedback strategies, which is

γ^{1*}(t, x_t) = u^{1°}(t, x_t, ∇_x W(t, x_t)),
γ^{2*}(t, x_t) = u^{2°}(t, x_t, ∇_x W(t, x_t)),

where W(t, x) satisfies (B9) and u^{i°}(t, x, p) (i = 1, 2) are determined from (B10a). □
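Whether (B10a) holds for a given Hamiltonian can be tested directly by comparing the two iterated values. The sketch below is our illustration, using arbitrary toy Hamiltonians on the compact sets S^1 = S^2 = [-1, 1], discretized by a grid:

```python
def minmax_gap(H, grid1, grid2):
    """Compare the upper value min_{u1} max_{u2} H and the lower value
    max_{u2} min_{u1} H on finite grids; the Isaacs condition (B10a)
    requires the two to coincide."""
    upper = min(max(H(u1, u2) for u2 in grid2) for u1 in grid1)
    lower = max(min(H(u1, u2) for u1 in grid1) for u2 in grid2)
    return upper, lower

grid = [i / 10.0 - 1.0 for i in range(21)]   # 21 grid points covering [-1, 1]
p = 0.6                                      # an arbitrary costate value

# Separable toy Hamiltonian: min and max decouple, so the values coincide.
up, lo = minmax_gap(lambda u1, u2: p * (u1 - u2) + u1**2 - u2**2, grid, grid)

# Non-separable toy Hamiltonian H = (u1 + u2)^2: they do not.
up2, lo2 = minmax_gap(lambda u1, u2: (u1 + u2)**2, grid, grid)
```

For the separable Hamiltonian the two values coincide for every p, so the Isaacs condition holds; for H = (u1 + u2)² the min-max value strictly exceeds the max-min value, so no pure-strategy saddle point exists and the condition fails.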
For the special case of linear-quadratic stochastic differential games, both Thm B1 and Corollary B1 permit explicit expressions for the equilibrium strategies. Towards this end, let F, g^i and q^i (i ∈ N) be as given in Def. A1 (with F replacing f), and let σ(t, x_t) = σ(t), which is nonsingular. Furthermore, assume that S^i = R^{m_i} (i ∈ N).‡ Then, u^{i°} from (B8a) uniquely satisfies

B^i(t)'p^i + R^{ii}(t)u^{i°} = 0  (i ∈ N),

as long as R^{ii}(·) > 0 (i ∈ N), which we assume to hold a priori. Therefore, (B7) can be written with the boundary condition

W^i(T, x) = (1/2) x'Q^i_f x,

which admits the unique solution set

W^i(t, x) = (1/2) x'Z^i(t)x + ξ^i(t)  (i ∈ N),

† Elliott (1976) has proven this result for a non-Markovian problem, in which case (B9) is replaced by a more complex relation; here we provide a version of his result which is applicable to Markovian systems.
‡ In order to apply Corollary B1 directly, we in fact have to restrict ourselves to an appropriate (sufficiently large) compact subset of R^{m_i}.
where Z^i satisfies the N coupled matrix Riccati differential equations (A16a), and ξ^i satisfies the differential equation

ξ̇^i = -(1/2) Tr[σ(t)σ(t)'Z^i(t)].

Now, since γ^{i*} is given by

γ^{i*}(t, x_t) = -R^{ii}(t)^{-1} B^i(t)' ∇_x W^i(t, x_t),

it readily follows that the Nash equilibrium solution of the stochastic linear-quadratic differential game coincides with that of the deterministic game, as provided in Corollary A1. Therefore, we have

COROLLARY B2  The set of Nash equilibrium strategies (A17) for the deterministic linear-quadratic differential game with MPS or CLPS information pattern also provides a Nash equilibrium solution for its stochastic counterpart, with σ not depending on the state x_t. □
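To make the gains entering (A17) concrete, the coupled Riccati equations can be integrated numerically. The scalar sketch below is ours, with hypothetical coefficients, for the two-player game ẋ = a·x + b1·u1 + b2·u2 and costs L^i = ∫_0^T ½(q_i·x² + (u^i)²)dt + ½·q_{if}·x_T²; by Corollary B2 the resulting feedback laws u^i = -b_i·z_i(t)·x remain in Nash equilibrium when a nonsingular additive noise term σ(t)dw_t is appended to the dynamics:

```python
def coupled_riccati(a, b1, b2, q1, q2, q1f, q2f, T, n=100000):
    """Backward-integrate the two coupled scalar Riccati ODEs that
    characterize feedback Nash equilibria (cf. (A16a)) for
    xdot = a*x + b1*u1 + b2*u2, with cost for player i given by
    integral of 0.5*(q_i*x^2 + u_i^2) dt plus 0.5*q_if*x_T^2.
    The equilibrium strategies are u_i = -b_i*z_i(t)*x."""
    dt = T / n
    z1, z2 = q1f, q2f                    # terminal conditions z_i(T) = q_if
    for _ in range(n):
        s = b1 * b1 * z1 + b2 * b2 * z2  # combined feedback term of both players
        z1dot = -2 * a * z1 + 2 * z1 * s - b1 * b1 * z1 * z1 - q1
        z2dot = -2 * a * z2 + 2 * z2 * s - b2 * b2 * z2 * z2 - q2
        z1 -= dt * z1dot                 # explicit Euler, backwards in time
        z2 -= dt * z2dot
    return z1, z2
```

With symmetric parameters the two gains coincide, and over a long horizon z_i(0) approaches the positive root of 3b²z² - 2az - q = 0, so the equilibrium strategies become constant-gain linear feedback.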
A similar analysis and reasoning also leads to

COROLLARY B3  Under the conditions of Thm A7, the set of saddle-point strategies (A20) for the deterministic linear-quadratic two-person zero-sum differential game with MPS or CLPS information pattern also provides a saddle-point solution for its stochastic counterpart, with σ not depending on the state x_t. □

Remark B2  Under appropriate convexity restrictions, which hold for the linear-quadratic nonzero-sum game treated here, existence of a solution set to (B7) is also sufficient for existence of a Nash equilibrium (see Varaiya, 1976); therefore the solution referred to in Corollary B2 is the unique Nash equilibrium solution for the linear-quadratic stochastic differential game within the class of feedback strategies. It should be noted that we do not rule out existence of some other Nash equilibrium solution that depends also on the past values of the state (though we strongly suspect that this should not be the case), since Thm B1 provides a sufficiency condition only within the class of feedback strategies. The reader should note, however, that we cannot have informationally nonunique equilibrium solutions here, since {w_t, t ≥ 0} is an independent-increment process and σ is nonsingular. For the zero-sum differential game, on the other hand, an analogous uniqueness feature may be attributed to the result of Corollary B3, by making use of a sufficiency result of Elliott (1976), which is obtained for compact S^1 and S^2 but can easily be generalized to noncompact action spaces for the linear-quadratic problem. □
6.7
PROBLEMS
1. An alternative derivation of the result of Thm 2 makes use of Prop. 5.1. Firstly show that the i-th inequality of (3) dictates an optimization problem to Pi which is similar to the one covered by Prop. 5.1, with c_k = Σ_{j∈N, j≠i} B_k^j u_k^{j*}, which is only a function of x_1 (and hence can be considered as a constant). Then the open-loop Nash equilibrium solution is given as the solution of the set of relations

u_k^{i*} = -P_k^i S_{k+1}^i A_k x_k^* - P_k^i (s_{k+1}^i + S_{k+1}^i Σ_{j∈N, j≠i} B_k^j u_k^{j*}),
P_k^i = [R_k^{ii} + B_k^{i'} S_{k+1}^i B_k^i]^{-1} B_k^{i'},
S_k^i = Q_k^i + A_k' S_{k+1}^i [I - B_k^i P_k^i S_{k+1}^i] A_k;  S_{K+1}^i = Q_{K+1}^i,
s_k^i = A_k' [I - B_k^i P_k^i S_{k+1}^i]' [s_{k+1}^i + S_{k+1}^i Σ_{j∈N, j≠i} B_k^j u_k^{j*}];  s_{K+1}^i = 0,

k ∈ K, i ∈ N, where x_{k+1}^* (k ∈ K) denotes the corresponding state trajectory. Now finally prove that the preceding set of relations admits a unique solution as given in Thm 2, under precisely the same existence condition.

2. Obtain a more general version of Thm 2 when f_k in (5a) is replaced by

f_k(x_k, u_k^1, ..., u_k^N) = A_k x_k + Σ_{i∈N} B_k^i u_k^i + c_k,
where c_k (k ∈ K) is a constant vector. Then, specialize this result to the class of two-person zero-sum dynamic games so as to obtain a generalized version of Thm 3.

*3. Verify the result of Prop. 2 when the positivity requirement on Q_{k+1} is relaxed to Q_{k+1} ≥ 0 (k ∈ K).

*4. Within the context of Thm 7, consider the scalar (with n = m_1 = m_2 = 1) zero-sum dynamic game in which all system parameters are constants (i.e. independent of k ∈ K). Obtain a set of sufficient conditions on these parameters so that the zero-sum constant-parameter dynamic game admits a unique feedback saddle-point solution regardless of the number of stages (i.e. K).

5. Derive a set of nonlinear Nash equilibrium solutions for the dynamic game of section 6.3.1. Specifically, obtain a counterpart of the result of Prop. 4 by starting with the representations

γ^1(x_2, x_1) = -(2/3)x_2 + p[x_2 - x_2(γ^2)]^3
and by mimicking the analysis that precedes Prop. 4. How does the underlying existence condition compare with (31)?

*6. Consider the scalar two-person three-stage stochastic dynamic game described by the state equation

x_2 = x_1 + u_1^1 + θ
x_3 = x_2 + u_2^1 + u_2^2 + θ
x_4 = x_3 + u_3^2 + θ

and cost functionals

L^1 = (x_4)^2 + (u_1^1)^2 + (u_2^1)^2
L^2 = (x_4)^2 + 2(u_3^2)^2 + (u_2^2)^2.

Here θ denotes a Gaussian random variable with mean zero and variance 1, and both players have access to CLPS information. Show that this dynamic game admits an uncountable number of Nash equilibria, and obtain one such set which is linear in the state information available to both players. Will there be additional Nash equilibria if θ has instead a two-point distribution with mean zero and variance 1?

*7. For the stochastic dynamic game of the previous problem, prove that the Nash equilibrium solution is unique if the underlying information pattern is, instead, the one-step delay CLPS pattern. Obtain the corresponding solution, and identify it as a member of the solution set determined in Problem 6.

*8. Consider the class of two-person K-stage linear-quadratic stochastic dynamic games with cost functional (10b) and state equation

x_{k+1} = A_k x_k + B_k^1 u_k^1 + B_k^2 u_k^2 + θ_k,  k ∈ K,
where {θ_k; k ∈ K} denotes a set of independent Gaussian vectors with mean zero and covariance I_n. Obtain the saddle-point solution when (i) both players have access to one-step delayed CLPS information, (ii) P1 has access to CLPS information and P2 has access to one-step delayed CLPS information, and (iii) P1 has access to CLPS information whereas P2 has only access to the value of x_1 (i.e. OL information). How do the existence conditions compare in the three cases, among themselves and with (25a)-(25b)? (Hint: Make use of Thm 8 in order to simplify the derivation of the saddle-point strategies.)

9. Obtain a more general version of Thm A2 when f is replaced by

f(t, x, u^1, ..., u^N) = A(t)x + Σ_{i∈N} B^i(t)u^i + c(t),

where c(·) is a vector-valued function with continuous entries. Then,
specialize this result to the class of two-person zero-sum differential games so as to obtain a generalized version of Thm A3.

10. Show that Thm A2 can also be obtained by directly utilizing Prop. 5.2, in a manner which is parallel to the one outlined in Problem 1 for the discrete-time counterpart.

11. This problem addresses a generalization of Problem 5.1. Assume that we now have two companies, P1 and P2, with Pi having x_i(t) customers at time t and making a total profit of

P^i(t) = ∫_0^t [c^i x_i(s) - u^i(s)] ds

up to time t, where c^i is a constant. The function u^i(t), restricted by u^i(t) ≥ 0, P^i(t) ≥ 0, represents the money put into advertisement by Pi, which helps to increase its number of customers according to

ẋ_i = u^i - u^j,  x_i(0) = x_{i0} > 0,  i, j = 1, 2,  j ≠ i.

The total number of customers shared between the two companies is a constant, so that we may write, without any loss of generality, the constraint relation

x_1(t) + x_2(t) = 1,  ∀t ∈ [0, T],

where T denotes the end of the time period when each firm wishes to maximize its total profit (i.e. P^i(T)). Obtain the open-loop Nash equilibrium solution of this two-person nonzero-sum differential game wherein the decision variable for Pi is u^i. (For details see Olsder, 1976.)

*12. Consider a variation on the previous problem so that now the state equation is described by
ẋ_1 = x_2 u^1 - x_1 u^2
ẋ_2 = x_1 u^2 - x_2 u^1,

and the decision variables satisfy the constraints

|u^i(t)| ≤ 1,  i = 1, 2.

The objective for Pi is still maximization of P^i(T). (i) Obtain the open-loop Nash equilibrium solution. (ii) Obtain the feedback Nash equilibrium solution when both players have access to the current values of x_1 and x_2.

*13. Obtain the complete set of informationally nonunique linear Nash equilibrium solutions of the scalar linear-quadratic two-person nonzero-sum differential game described by the state dynamics

ẋ = 2u^1 - u^2,  x(0) = 1,  t ∈ [0, 2],
and cost functionals

L^i = [x(2)]^2 + ∫_0^2 [u^i(t)]^2 dt,  i = 1, 2,

and with the underlying information structure being MPS for both players. Which of these would still constitute a Nash equilibrium when (i) P2 has, instead, access to OL information, (ii) both players have access to OL information? (Note that, in this case, OL information is in fact "no information", since the initial state is a known number.)

14. Prove that the statement of Thm 8 is also valid for two-person zero-sum differential games with prescribed fixed duration.

*15. Consider the differential game of Problem 13, but with the state equation replaced by

ẋ = 2u^1 - u^2 + θ,

where θ is a random variable taking the values +1 and -1 with equal probability 1/2. Under the CLPS information pattern for both players, does this differential game, which also incorporates a chance move, admit a unique Nash equilibrium solution? In case the reply is "yes", obtain the unique solution; in case it is "no", obtain the complete set of Nash equilibria that are linear in the current value of the state.

16. Obtain the saddle-point solution of the class of two-person zero-sum stochastic differential games described by the state equation

dx_t = [A(t)x_t + B^1(t)u^1 + B^2(t)u^2] dt + σ(t)dw_t

and cost functional (A7b), where P1 has access to CLPS information and P2 has OL information. Here, σ(·) is a nonsingular matrix, {w_t, t ≥ 0} is the standard Wiener process, and the initial state vector x_0 is a known quantity. (For details see Basar, 1977c.)
6.8
NOTES
Section 6.2  Results pertaining to Nash equilibria in infinite dynamic games first appeared in the works of Case (1967, 1969) and Starr and Ho (1969a, b), but in continuous time (i.e. for differential games). Here we present counterparts of these results in discrete time, with special attention paid to linear-quadratic (nonzero-sum and zero-sum) games and to existence conditions for equilibria under open-loop and memoryless perfect state information patterns. A possible extension of these results would be to the class of game problems in which the players also have the option of whether they should make a perfect state measurement or not, if it is costly to do so; for a discussion of such problems see Olsder (1977b). For applications of some of these results in macroeconomics, see Pindyck (1977), Kydland (1975), Plasmans (1980) and Plasmans and de Zeeuw (1980). In the latter two references the dynamic game constitutes a model of the European Economic Community and the players are the contributing countries.

Section 6.3  Existence of uncountably many informationally nonunique Nash equilibria in dynamic games under closed-loop information was first displayed in Basar (1974, 1976a) for discrete-time games, and in Basar (1977b) for differential games. The underlying idea in this section therefore follows these references. The algorithm of section 6.3.2 is from Basar (1977d), and Thm 8 was first presented in Basar (1977a); some further discussion on the relation between existence of a saddle point and the underlying information pattern in zero-sum dynamic games can be found in Basar (1976b) and Witsenhausen (1971b), where the latter reference also discusses the impact of information on the upper and lower values of a game when a saddle point does not exist.

Section 6.4  Elimination of informational nonuniqueness in Nash equilibria through a stochastic formulation was first discussed in Basar (1976a), and further elaborated on in Basar (1975, 1979b); this section follows basically the lines of those references. The fact that an increase in information could be detrimental for a player under Nash equilibria was first pointed out in Basar and Ho (1974), but within the context of stochastic games with noisy (imperfect) observation. For some results on the equilibrium solution of linear-quadratic stochastic games with noisy observation, see Basar (1978), Basar and Mintz (1972, 1973).

Section 6.5  Derivation of open-loop and feedback Nash equilibria in nonzero-sum differential games was first discussed in Case (1967, 1969) and Starr and Ho (1969a, b), where the latter two references also display the differences (in value) between these two types of equilibria. Derivation of open-loop saddle-point equilibria, however, predates this development, and was first discussed in Berkovitz (1964) and Ho et al. (1965), with the latter devoted to a particular class of linear-quadratic differential games of the pursuit-evasion type, whose results were later put into a rigorous framework in Schmitendorf (1970); see also Halanay (1968), which extends the results of Ho et al. (1965) to differential games with delays in the state equation. For an excellent exposition on the status of nonzero-sum and zero-sum differential game theory in the early nineteen seventies, see the survey paper by Ho (1970). Some selective references which display the advances on Nash equilibria of deterministic differential games in the nineteen seventies are (in addition to those already cited in the text) Bensoussan (1974), Ho and Olsder (1980), Leitmann (1974, 1977), Scalzo (1974), Vidyasagar (1976) and related papers in the edited volumes Blaquiere (1973), Hagedorn et al. (1977), Ho and Mitter (1976), Kuhn and Szego (1971). For applications of this theory to problems in economics, see Case (1971, 1979), Clemhout et al. (1973) and Leitmann and Wan, Jr (1979), where the latter discusses a worst-case approach towards stabilization of uncertain systems that utilizes zero-sum differential game theory. Tolwinski (1978b) provides a rather complete account of the results on existence of open-loop Nash equilibria in nonzero-sum differential games, and also discusses (Tolwinski, 1978a) a possible computational scheme for the numerical evaluation of the Nash equilibrium solution; see also Varaiya (1967, 1970), Rosen (1965) and Friedman (1971) for existence results concerning noncooperative equilibria. For the special class of linear-quadratic games, Papavassilopoulos and Cruz, Jr (1979b) and Papavassilopoulos et al. (1979) discuss existence of a unique solution to the coupled set of Riccati equations (A16a), which plays an important role in the characterization of feedback Nash equilibria. For a discussion of noncooperative equilibria in differential games with state dynamics described by partial differential equations, see Roxin (1977).

Section 6.6  For some explicit results in the case of stochastic quadratic differential games with mixed information patterns, see Basar (1977b, c, 1979a, 1980a). Extensions of these results to stochastic differential games with noisy observation meet with formidable difficulties, some of which have been resolved in the case of linear-quadratic zero-sum games (see Willman, 1969, and Bagchi and Olsder, 1981).
Chapter 7
Hierarchical (Stackelberg) Equilibria of Infinite Dynamic Games

7.1  INTRODUCTION

This chapter is devoted to derivation of the Stackelberg solution in infinite dynamic games with fixed prescribed duration. The main body of the chapter treats dynamic games defined in discrete time and with the number of players restricted to two; however, continuous-time counterparts of some of these results and possible extensions to many-player games are briefly mentioned in the notes section at the end of the chapter, supported by relevant references. The next two sections (i.e. sections 2 and 3) deal, respectively, with the Stackelberg solution under open-loop information and the feedback Stackelberg solution under CLPS information. These solutions are obtained using two standard techniques of optimal control theory, viz. the minimum principle and dynamic programming, respectively. Derivation of the (global) Stackelberg solution under the CLPS information pattern, however, requires a much more subtle analysis, since all standard techniques and approaches of optimal control theory fail to provide the solution. Therefore, section 4 of the chapter is devoted exclusively to this topic and to elucidation of an indirect approach towards derivation of the closed-loop Stackelberg solution. This indirect approach is first introduced within the context of two scalar examples (cf. sections 7.4.1 and 7.4.2), and then its application is extended to the class of two-person linear-quadratic dynamic games in section 7.4.3. Section 5 discusses derivation of the Stackelberg solution in stochastic dynamic games with deterministic information patterns, primarily for the open-loop information structure (cf. section 7.5.1) and for the CLPS information structure under the "feedback Stackelberg" solution concept (cf. section 7.5.2). Derivation of the (global) Stackelberg solution under the CLPS information pattern is still an unresolved problem today, and this topic is also briefly discussed in section 7.5.1.
7.2  OPEN-LOOP STACKELBERG SOLUTION OF TWO-PERSON DYNAMIC GAMES IN DISCRETE TIME

In this section we discuss the Stackelberg solution for the class of two-person deterministic discrete-time infinite dynamic games of prescribed fixed duration (cf. Def. 5.1) when the underlying information structure is open-loop (for both players) and P1 acts as the leader. Hence, abiding by our standard terminology and notation, the state evolves according to

x_{k+1} = f_k(x_k, u_k^1, u_k^2),  k ∈ K,  (1)

where x_k ∈ X = R^n, x_1 is specified a priori, and u_k^i ∈ U_k^i ⊆ R^{m_i}, i = 1, 2; and the stage-additive cost functional for Pi is introduced as

L^i(u^1, u^2) = Σ_{k∈K} g_k^i(x_{k+1}, u_k^1, u_k^2, x_k),  i = 1, 2.  (2)
Because of the open-loop information structure, the game is already in normal form, and therefore Defs 3.26, 3.27 and 3.28 readily introduce the Stackelberg equilibrium solution with P1 acting as the leader (since finiteness of the game was not essential in those definitions). One method of obtaining the Stackelberg solution for this class of games is to make use of the theory of section 4.5, since they can be viewed as static games under the open-loop information structure; such an approach also readily leads to conditions for existence of a Stackelberg solution. Towards this end, we first note that, by recursive substitution of (1) into (2), the cost functional J^i(u^1, u^2) can be made to depend only on the control vectors u^1, u^2 (which are of dimensions m_1K and m_2K, respectively) and also on the initial state x_1, which is known a priori. Now, if (i) J^i is continuous on U^1 × U^2 (i = 1, 2), (ii) J^2(u^1, ·) is strictly convex on U^2 for all u^1 ∈ U^1, and (iii) U^i is a closed and bounded (thereby compact) subset of R^{m_iK} (i = 1, 2), then it follows from Corollary 4.4 (more generally from Thm 4.6) that a Stackelberg equilibrium solution exists for the open-loop infinite game. A brute-force method to obtain the corresponding solution would be first to determine the unique reaction curve of the follower by minimizing J^2(u^1, u^2) over u^2 ∈ U^2 for every fixed u^1 ∈ U^1, which is a meaningful optimization problem because of assumptions (i)-(iii) above. Denoting this unique mapping by T^2: U^1 → U^2, the optimization problem faced by P1 (the leader) is then minimization of J^1(u^1, T^2u^1) over U^1, thus yielding a Stackelberg strategy for the leader in this open-loop game. Despite its straightforwardness, such a derivation is not practical, especially if the number of stages in the game is large, which directly contributes to the dimensions of the control vectors u^1, u^2. An alternative derivation, which
parallels that of Thm 6.1 in the case of open-loop Nash equilibria, utilizes techniques of optimal control theory. Towards this end, we first determine the unique optimal response of the follower to every announced strategy γ^1 ∈ Γ^1 of the leader. Since the underlying information pattern is open-loop, the optimization problem faced by the follower is (for each fixed u^1 ∈ U^1)

min_{u^2∈U^2} L^2(u^1, u^2)

subject to

x_{k+1} = f_k(x_k, u_k^1, u_k^2),  x_1 given.

This is a standard optimal control problem whose solution exists and is unique under conditions (i)-(iii), and which can further be obtained by utilizing Thm 5.5.

LEMMA 1  In addition to conditions (i)-(iii), assume that†
(iv) f_k(·, u_k^1, u_k^2) is continuously differentiable on R^n, (k ∈ K),
(v) g_k^2(·, u_k^1, u_k^2, ·) is continuously differentiable on R^n × R^n, (k ∈ K).
Then, to any announced strategy u^1 = γ^1 ∈ Γ^1 of the leader, there exists a unique optimal response of the follower (to be denoted by γ^2(x_1) = ū^2) satisfying the following relations:

x̄_{k+1} = f_k(x̄_k, u_k^1, ū_k^2),  x̄_1 = x_1,  (3a)

ū_k^2 = arg min_{u_k^2∈U_k^2} H_k^2(p_{k+1}, u_k^1, u_k^2, x̄_k),  (3b)

p_k = [∂/∂x_k f_k(x̄_k, u_k^1, ū_k^2)]'[p_{k+1} + (∂/∂x_{k+1} g_k^2(x̄_{k+1}, u_k^1, ū_k^2, x̄_k))'] + [∂/∂x_k g_k^2(x̄_{k+1}, u_k^1, ū_k^2, x̄_k)]';  p_{K+1} = 0,  (3c)

where

H_k^2(p_{k+1}, u_k^1, u_k^2, x_k) ≜ g_k^2(f_k(x_k, u_k^1, u_k^2), u_k^1, u_k^2, x_k) + p'_{k+1} f_k(x_k, u_k^1, u_k^2),  k ∈ K.  (3d)

Here, {p_2, ..., p_{K+1}} is a sequence of n-dimensional costate vectors associated with this optimal control problem. □

† Except for very special cases, conditions (iv) and (v) given below, and conditions (vi), (vii), (ix) and (x) to be given later, are required for satisfaction of condition (i) (i.e. they are, in general, implicit in condition (i)); but we nevertheless rewrite them here since we need them explicitly in some of the results obtained in the sequel.
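Lemma 1 pins down the follower's unique optimal response; together with the brute-force reaction-curve route described earlier, the whole construction can be checked numerically on a toy problem. The sketch below is ours (not from the text), for a hypothetical one-stage scalar game x2 = x1 + u1 + u2 with J1 = x2² + (u1)², J2 = x2² + 2(u2)², and P1 leading:

```python
def argmin_ternary(f, lo, hi, iters=100):
    """Minimize a strictly convex scalar function by ternary search."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

def stackelberg_open_loop(x1):
    """Brute-force open-loop Stackelberg point for the toy one-stage game
    x2 = x1 + u1 + u2, J1 = x2^2 + u1^2, J2 = x2^2 + 2*u2^2 (P1 leads)."""
    def J2(u1, u2):
        return (x1 + u1 + u2) ** 2 + 2.0 * u2 ** 2
    def T2(u1):                       # follower's unique reaction curve
        return argmin_ternary(lambda u2: J2(u1, u2), -10.0, 10.0)
    def J1_on_curve(u1):              # leader's cost along the reaction curve
        return (x1 + u1 + T2(u1)) ** 2 + u1 ** 2
    u1 = argmin_ternary(J1_on_curve, -10.0, 10.0)
    return u1, T2(u1)
```

Hand computation gives the reaction u2 = -(x1 + u1)/3 and the Stackelberg pair u1* = -(4/13)x1, u2* = -(3/13)x1, which the nested search reproduces; with K stages the leader's search space grows to dimension m1·K, which is why the minimum-principle route developed in this section is preferred.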
If we further assume
(vi) f_k(x_k, u_k^1, ·) is continuously differentiable on U_k^2, (k ∈ K),
(vii) g_k^2(x_{k+1}, u_k^1, ·, x_k) is continuously differentiable on U_k^2, (k ∈ K),
(viii) ū^2 in Lemma 1 is an inner-point solution for every u^1 ∈ U^1,
then (3b) can equivalently be written as

∇_{u_k^2} H_k^2(p_{k+1}, u_k^1, ū_k^2, x̄_k) = 0  (k ∈ K).  (3e)

Now, to obtain the Stackelberg strategy of the leader, we have to minimize L^1(u^1, u^2) in view of the unique optimal response of the follower, to be determined from (3e) in conjunction with (3a) and (3c). Therefore, P1 is faced with the optimal control problem

min_{u^1∈U^1} L^1(u^1, u^2)  (4a)

subject to

x_{k+1} = f_k(x_k, u_k^1, u_k^2),  x_1 given,  (4b)
p_k = F_k(x_k, u_k^1, u_k^2, p_{k+1}),  p_{K+1} = 0,  (4c)
∇_{u_k^2} H_k^2(p_{k+1}, u_k^1, u_k^2, x_k) = 0  (k ∈ K),  (4d)

where H_k^2 is defined by (3d) and

F_k ≜ [∂/∂x_k f_k(x_k, u_k^1, u_k^2)]'[p_{k+1} + (∂/∂x_{k+1} g_k^2(f_k(x_k, u_k^1, u_k^2), u_k^1, u_k^2, x_k))'] + [∂/∂x_k g_k^2(f_k(x_k, u_k^1, u_k^2), u_k^1, u_k^2, x_k)]'.  (4e)
Note that in this optimal control problem the constraint equations involve boundary conditions at both end points, and hence Thm 5.5 is not applicable; but a standard result of nonlinear programming (cf. Canon et al., 1970) can be utilized to yield the following conclusion.

THEOREM 1  In addition to conditions (i)-(viii) stated in this section, assume that
(ix) f_k(x_k, ·, u_k^2), g_k^2(x_{k+1}, ·, u_k^2, x_k) are continuously differentiable on U_k^1, k ∈ K,
(x) g_k^1(·, ·, ·, ·) is continuously differentiable on R^n × U_k^1 × U_k^2 × R^n, k ∈ K,
(xi) f_k(·, u_k^1, ·) is twice continuously differentiable on R^n × U_k^2, and g_k^2(·, u_k^1, ·, ·) is twice continuously differentiable on R^n × U_k^2 × R^n, k ∈ K.
Then, if {γ_k^{1*}(x_1) = u_k^{1*} ∈ Ů_k^1, k ∈ K} denotes an open-loop Stackelberg equilibrium strategy for the leader in the dynamic game formulated in this section, there exist finite vector sequences {λ_1, ..., λ_K}, {μ_1, ..., μ_K}, {ν_1, ..., ν_K} that satisfy the following relations:

x*_{k+1} = f_k(x*_k, u_k^{1*}, u_k^{2*}),  x*_1 = x_1,  (5a)

∇_{u_k^1} H̄_k^1(λ_k, μ_k, ν_k, p*_{k+1}, u_k^{1*}, u_k^{2*}, x*_k) = 0,  (5b)
∇_{u_k^2} H̄_k^1(λ_k, μ_k, ν_k, p*_{k+1}, u_k^{1*}, u_k^{2*}, x*_k) = 0,  (5c)

λ_{k-1} = [∇_{x_k} H̄_k^1(λ_k, μ_k, ν_k, p*_{k+1}, u_k^{1*}, u_k^{2*}, x*_k)]';  λ_K = 0,  (5d)

μ_{k+1} = [∇_{p_{k+1}} H̄_k^1(λ_k, μ_k, ν_k, p*_{k+1}, u_k^{1*}, u_k^{2*}, x*_k)]';  μ_1 = 0,  (5e)

∇_{u_k^2} H_k^2(p*_{k+1}, u_k^{1*}, u_k^{2*}, x*_k) = 0,  (5f)

p*_k = F_k(x*_k, u_k^{1*}, u_k^{2*}, p*_{k+1});  p*_{K+1} = 0,  (5g)

where

H̄_k^1 ≜ g_k^1(f_k(x_k, u_k^1, u_k^2), u_k^1, u_k^2, x_k) + λ_k'f_k(x_k, u_k^1, u_k^2) + μ_k'F_k(x_k, u_k^1, u_k^2, p_{k+1}) + ν_k'(∇_{u_k^2} H_k^2(p_{k+1}, u_k^1, u_k^2, x_k))',  (5h)

H_k^2 is defined by (3d), F_k by (4e), and Ů_k^1 denotes the interior of U_k^1. Furthermore, {u_k^{2*}, k ∈ K} is the corresponding unique open-loop Stackelberg strategy of the follower (P2), and {x*_{k+1}; k ∈ K} is the state trajectory associated with the Stackelberg solution.

Proof  As discussed prior to the statement of the theorem, the leader (P1) is faced with the optimal control problem of minimizing L^1(u^1, u^2) over U^1 and subject to (4b)-(4d). Since the problem is formulated in discrete time, it is equivalent to a finite-dimensional nonlinear programming problem, the "Lagrangian" of which is
L = Σ_{k∈K} {g_k^1(x_{k+1}, u_k^1, u_k^2, x_k) + λ_k'[f_k(x_k, u_k^1, u_k^2) - x_{k+1}] + μ_k'[F_k(x_k, u_k^1, u_k^2, p_{k+1}) - p_k] + ν_k'[∂/∂u_k^2 H_k^2(p_{k+1}, u_k^1, u_k^2, x_k)]'},

where λ_k, μ_k, ν_k (k ∈ K) denote appropriate Lagrange multiplier vectors. Now, if {u_k^{1*}, k ∈ K} is a minimizing solution and {x*_{k+1}, p*_{k+1}, u_k^{2*}; k ∈ K} are the corresponding values of the other variables so that the constraints (4b)-(4d) are satisfied, then it is necessary that (see e.g. Canon et al., 1970, p. 51)

∇_{u_k^1} L = 0,  ∇_{u_k^2} L = 0,  ∇_{x_k} L = 0,  ∇_{p_{k+1}} L = 0  (k ∈ K),

wherefrom (5b)-(5e) follow. □
Remark 1  If the follower (P2) has, instead, access to closed-loop perfect state (CLPS) information, his optimal response (cf. Lemma 1) will be any closed-loop representation of the open-loop policy {ū_k^2, k ∈ K}; however, since the
constraints (4b)-(4d) are basically open-loop relations, these different representations do not lead to different optimization problems for the leader (P1). Therefore, the solution presented in Thm 1 also constitutes a Stackelberg equilibrium for the case when the follower has access to CLPS information (with the leader still having OL information). Under this latter information pattern, we may still talk about a unique open-loop Stackelberg strategy for the leader; whereas for the follower the corresponding optimal response strategy will definitely not be unique, since any closed-loop representation of u^{2*} given in Thm 1 will constitute an optimal strategy for P2. □

We now specialize the result of Thm 1 to the class of linear-quadratic dynamic games (cf. Def. 6.1) wherein

f_k(x_k, u_k^1, u_k^2) = A_k x_k + B_k^1 u_k^1 + B_k^2 u_k^2,  (6a)
g_k^i(x_{k+1}, u_k^1, u_k^2, x_k) = (1/2)(x'_{k+1} Q^i_{k+1} x_{k+1} + u_k^{i'} u_k^i + u_k^{j'} R_k^{ij} u_k^j),  j ≠ i,  i, j = 1, 2.  (6b)
We further assume that Q^i_{k+1} ≥ 0, R_k^{12} > 0, and U_k^i = R^{m_i}, ∀k ∈ K, i = 1, 2. Under these assumptions, it readily follows that the optimal response of the follower to any announced control sequence of the leader is unique and affine, and moreover the minimum of L^1 under the constraint imposed by this affine response curve is attained uniquely by an open-loop control sequence (for P1) that is linear in x_1.† This unique solution can be obtained explicitly by utilizing Thm 1 and in terms of the current (open-loop) value of the state; it should be noted that Thm 1 provides in this case both necessary and sufficient conditions because of strict convexity.‡ The result is given below in Corollary 1, which is obtained, however, under a condition of invertibility of a matrix related to the parameters of the game.

COROLLARY 1  The two-person linear-quadratic dynamic game described by (6a)-(6b) under the parametric restrictions Q^i_{k+1} ≥ 0, R_k^{12} > 0 (k ∈ K, i = 1, 2) admits a unique open-loop Stackelberg solution with P1 acting as the leader, which is given, under the condition of invertibility of the matrix appearing in the braces in (8c), by

u_k^{1*} = γ_k^{1*}(x_1) = -B_k^{1'} K_k^1 x_k^*,  (7a)
u_k^{2*} = γ_k^{2*}(x_1) = -B_k^{2'} K_k^2 x_k^*  (k ∈ K),  (7b)
† This follows since L^1(u^1, u^2) is quadratic and strictly convex on R^{m_1K} × R^{m_2K}.
‡ Note that condition (iii) in the general formulation is also met here, since we already know that a unique solution exists with U_k^i taken as R^{m_i}, and therefore it can be replaced by an appropriate compact subset of R^{m_i} (i = 1, 2).
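The structural content of Corollary 1, namely that the open-loop Stackelberg controls are linear in the initial state, can be checked numerically without the recursions (8a)-(9e). The following sketch is ours, for a hypothetical two-stage scalar game x_{k+1} = x_k + u_k^1 + u_k^2 with J^1 = x_3^2 + Σ_k(u_k^1)^2 and J^2 = x_3^2 + Σ_k(u_k^2)^2; the follower's affine reaction curve is obtained in closed form from its first-order conditions, and the leader's cost is minimized along it by coordinate descent:

```python
def tmin(f, lo=-10.0, hi=10.0, iters=100):
    # Ternary search for the minimizer of a strictly convex scalar function.
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

def follower_reaction(x1, u11, u12):
    # Follower's first-order conditions for J2 = x3^2 + a^2 + b^2 give
    # equal stage controls a = b = -(x1 + u11 + u12)/3: a unique affine curve.
    s = x1 + u11 + u12
    return -s / 3.0, -s / 3.0

def leader_stackelberg(x1, sweeps=80):
    # Leader minimizes J1 = x3^2 + u11^2 + u12^2 along the reaction curve,
    # one coordinate at a time (J1 is strictly convex in (u11, u12)).
    def J1(u11, u12):
        a, b = follower_reaction(x1, u11, u12)
        return (x1 + u11 + u12 + a + b) ** 2 + u11 ** 2 + u12 ** 2
    u11 = u12 = 0.0
    for _ in range(sweeps):
        u11 = tmin(lambda v: J1(v, u12))
        u12 = tmin(lambda v: J1(u11, v))
    return u11, u12
```

Hand computation gives u_k^{1*} = -x_1/11 at both stages; doubling x_1 doubles every control, which is the linearity that (7a)-(7b) express through the gains -B_k^{i'}K_k^i x_k^*.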
7
311
HIERARCHICAL (STACKELBERG) EQUILIBRIA
where K~ ~[I
+Qt+ lBtB{]IAH l<1>k +Qt+ lBt[I +B{Qt+ IBn 1 R~2B{Pk+ +Qt+l[I+BtB{Q~+d1AmMk
K~ ~
PH 1
!k (8a)
e,
(8b)
+BkB k PHI +BkB k QH1 BdI +B k QH1 B d  R k B k PHI 1 i ' 2 2 2' 1 } 1 +BkB k U+QH1 Bk Bk) A H 1 2
k~{I
2'
. [A k  B~ B( Q~+
1
1 (/
i '
2
+ B~
2
B{ Qt+
2'
1
2
1
12
2'
.r ' AkMk],
(8c)
and P k , A k , M k are appropriate dimensional matrices satisfying the difference equations
r, = A"Pk+ 1 +A"Q~+I<1>k; Ak = A,,[Q~+I<1>k+AHI MH
1
=
AkMkB~Nk;
P K+ 1 = QLl' +Q~+IAkMkQ~+lB~Nk];
(9a) A K+ 1 = QLl (9b)
M 1 = O.
(9c)
with
n, ~ (/ + B{ Qt+ 1 Bt)1 [B{(A k+ 1k + Q~+
lAkMd  Rf2 B{ K~].
(9d)
Furthermore, the unique state trajectory associated with this pair of strategies satisfies (ge) Proof Existence of a unique solution to the linearquadratic openloop Stackelberg game has already been verified in the discussion prior to the statement of the corollary. We now apply Thm 1 to obtain the corresponding equilibrium strategies of the players, in the structural form (7a}(7b). Towards this end, we first rewrite (5b}(5g), under the "linearquadratic" assumption . (6a}(6b), respectively as follows:
ur12+B(Q~+IX:+12'QI
+BfAk+BfQ~+lAkJJk+BfQ~+IB~Vk =0 (i) B * 2' 1 2'Q2 (I 2'Q2 2 ("") R k Uk + k k+1 Xk+l +B k Ak+Bk k+1 Ak,uk+ +B k H1Bk)Vk = 0 11 Akl = A"Q~+IX:+1 +A"Ak+A"Q~+dAkJJk+B~vd; AK = 0 (iii) JJk+l =AkJJk+B~Vk; JJI =0 (iv) uf +BnQ~+lxt+l +pt+d = 0 (v) pt = A,,[pt+l +Q~+lxt+d; pt+! = O. (vi) 2'
The coefficient of Vk in (ii) is clearly positive definite; therefore, solving for Vk from (ii) and substituting this in (i),.(iii), and (iv), we obtain, respectively,
312
DYNAMIC NONCOOPERATIVE GAME THEORY
uk"
=
Akl
BnQl+IQ~+IB~(l Bn1 Q~+IB~(l  BfQ~+ I [I  B~(l +BfQ~+IB~(l+B{Q~+lB~)IRi2ur,
(vii)
= A;'[Qi + I Q~+IB~(l
+B{Q~+lBnlB{Qi+IJxt+1 Qf+IB k2 (I + B{ Qf+ IBf)1 B{]Ak
+ A;'[I 
+ A;'Q~+
I
[I  Bf(l + B{Qf+ IB~)l
AkQ~+I(l+B{Q~+IB~)lRl2ur; IlH I
+B{Q~+IB~)IB{Qi+IJXt+1 + B{Q~+IB~)l B{]Ak + B{ Q~+ IB~)l B{Q~+ tJAkll k
B{ Qf+ IJAkll k AK=O
(viii)
+ B{Q~+ IB~)1 B{ Q~+ IJAkll k  B{ (l + B{ Q~+ IB~)l 2uf* . [RI +B{(Ql+IXt+1 +AdJ; III = O. (ix)
= [I  B~(l
Now, letting pt+l = (PHI Q;+dXt+b Ak = (AH 1 Ql+dxt+b Ilk = Mkxt, and substituting (vii)and (v)into the state equation, we obtain the relation (x)
assuming that the required inverse in as given by (8c), we have also utilized the matrix identity of Lemma 6.2 in simplifying some of the terms involved. Now, finally, by further making use of (x) in (vii), (v), (vi), (viii)and (ix), we arrive at (8a), (8b), (9a), (9b) and (9c), respectively. 0 Remark 2 The coupled matrix difference equations (9a)(9c) constitute a twopoint boundary value problem, since the starting conditions are specified at both end points. If the problem is initially formulated as a (high dimensional) static quadratic game problem, however, such a twopoint boundary value problem does not show up; but then one is faced with the task of inverting high dimensional matrices (i.e. the difficulty again shows up in a different form). 0
7.3
FEEDBACK STACKELBERG SOLUTION UNDER CLPS INFORMATION PATTERN
The feedback Stackelberg solution concept introduced in Def. 3.29 for finite games is equally valid for infinite dynamic games defined in discrete time and with additive cost functionals, provided that the underlying information structure allows the players to have access to the current value of the state (such as in the case of CLPS or MPS information pattern). Furthermore, in view of Prop. 3.15 (which is also valid for infinite games defined in discrete
7
313
HIERARCHICAL (STACKELBERG) EQUILIBRIA
time), it is possible to obtain the solution recursively (in retrograde time), by employing a dynamic programming argument and by solving static Stackelberg games at every stage. Towards this end, we consider, in this section, the class of twoperson dynamic games described by the state equation (1),and the cost functionals L i (U 1 , u 2 ) given by (2). Under the delayedcommitment mode of play, we restrict attention to feedback strategies, i.e. u~ = Y~(Xk)' k E K, i = 1, 2, and seek to obtain the feedback Stackelberg solution that is valid for all possible initial states XI E R~. To complete the description of the dynamic denote the class of all permissible strategies of Pi at Stackelberg game, let stage k (i.e. measurable mappings from R" into U~). Then, the cost functional for Pi, when the game is in normal form, is (as opposed to (2»
n
Ji(y 1,y2 )
K
=
I
k=1
y{Er{, kEK,j
gt(X k+1,yl(Xk),y2(Xk),Xk);
=
1,2. (10)
Because of the additive nature of this cost functional, the following theorem now readily follows from Def. 3.29 and Prop. 3.15 (interpreted appropriately for the infinite dynamic game under consideration), and it provides a sufficient condition for a pair of strategies to constitute a feedback Stackelberg solution. 2 For the twoperson discretetime infinite dynamic game formulated in this section, a pair of strategies {y I' E F 1, y2'E r 2 } constitutes a feedback Stackelberg solution with PI as the leader if
THEOREM
min is a singleton set defined by
where R~(·) R~(y~)
(11)
=
= min
{MEn:G~(Y~,f3~,Xk)
yiE r:
~ Gt(!..(Xb yt(x k), y~(Xk»'
G~(yLy~,Xk)
G~(yLy~,Xk)
V XkERn},
yt(Xk), Yf(Xk),X,;),
i
(12a)
= 1,2, k E K,
(12b)
and Gt is recursively defined by G~(Xk+
b
yt(Xk)' y~(Xk)'
x k) = G~+
1 (!..+ dXk+ 1,
yf+ 1 (Xk+ d, yf+ 1 (Xk+ d, Xk+ d + g~; Gk+l
= 0,
i
= 1,2.
yk'+1 (Xk+ d, yf+ dx k+ 1)), (13)
o
Remark 3 If the optimal response set R~() of P2 (the follower) is not a singleton, then (in accordance with Def. 3.29) we have to take the maximum of
314
DYNAMIC NONCOOPERATIVE GAME THEORY
the LHS of (11) over all these nonunique response functions. However, if G~ (yi, ., x k ) is strictly convex on V ~ (k E K), the underlying assumption of "unique follower response" is readily met, and the theorem provides an easily implementable algorithm for computation of feedback Stackelberg strategies.
o
For the special case of linearquadratic games with strictly convex cost functionals, the feedback Stackelberg solution of Thm 2 can be determined in closed form. Towards this end, let.f,. and gi be as defined by (6a) and (6b), respectively, with Qi+ 1 ~ 0, R~ ~ 0 V k E K, i,j = 1,2, i i= i. and take Vi = Rm,. Then, solving recursively for GL in view of (11)and (I2a), we obtain the quadratic structure
where
Li satisfies the
backward recurrence relation
Li = (AkBiSi B~SU L~+l=QLl
Li+dAkBiSi B~S~)+SrSi+~fR~Sk+QL (i,j=I,2; ji=i),
(14)
and
s; g [Bf (l + L~+lB~Bnl
L~+dI
+ B{ L~+ IBf)1 R~2(l [(/ +L~+
IB~B{)I
+B~
+B{ L~+lB~)1
u, 1(/ +B~B{L~+
B{ L~+dlBi B(L:~ IB~(l + B{ L~+ IB~ + I] I B(
(l+BrL~+lBnIBrL~+I]Ak' S~ g(l + B{ L~+ IB~)I
B{ L~+ dA k  B~S~).
d+L~~
IB~(/
+BrL~+
IB~)I
RF (I5a) (I5b)
It should be noted that Li+ 1 ~ 0 Vk E K, i = 1,2 (since Qi+ 1 ~ 0, R~ ~ 0),and
therefore the required inverses in (I5a) and (I5b) exist. By the same token, G~ (yL ., Xk) is strictly convex on Rm 2, and hence R~ is a singleton for every k E K; furthermore, the minimization problem (11) admits a unique solution, again because of the strict convexity (on Rm l ) . The implication, then, is the following corollary to Thm 2, which summarizes the feedback Stackelberg solution of the linearquadratic game. 2 The twoperson linearquadratic discretetime dynamic game, as defined by (6a) and (6b) and with strictly convex costfunctionals and FB, CLPS or MPS information pattern, admits a unique feedback Stackelberg solution with PI as the leader (under the delayed commitment mode of play), which is COROLLARY
7
HIERARCHICAL (STACKELBERG) EQUILIBRIA
315
linear in the current value of the state:
Here Sf and Sf are defined by (15a) and (I5b), respectively, via the recursive 0 relations (14).
7.4
(GLOBAL) STACKELBERG SOLUTION UNDER CLPS INFORMATION PATTERN
We consider, in this section, derivation of (global) Stackelberg solution when the leader has access to dynamic information (e.g. CLPS information). Such decision problems cannot be solved by utilizing standard techniques of optimal control theory (as in sections 2 and 3); because the reaction set of the follower cannot, in general, be determined in closed form, for all possible strategies of the leader, and hence the optimization problem faced by the leader on this reaction set becomes quite an implausible one (at least, using the standard available techniques of optimization). In finite games (cf. section 3.6) this difficulty can definitely be circumvented by converting the original game in extensive form into normal form (which is basically a static (matrix) game) and then by performing only a finite number of comparisons. In infinite dynamic games, however, such an approach is apt to fail because of the "infinite" nature of the strategy space of the leader (i.e. he has uncountably many alternatives). In the sequel we first consider a specific example to elucidate the difficulties involved in a bruteforce derivation of the Stackelberg solution when the leader has access to dynamic information, and introduce (within the context of that example which involves two stageseach player acting only once, first the follower and then the leader) an indirect method to obtain the solution (see section 7.4.1).Then, we extend this indirect approach to a modified version of the example of section 7.4.1, in which the follower also acts at the second stage (see section 7.4.2)and finally we obtain in section 7.4.3 closedform expressions for linear Stackelberg strategies in general linearquadratic dynamic games, by utilizing the methods introduced in sections 7.4.1 and 7.4.2.
7.4.1
An illustrative example (Example 1)
Consider the twostage scalar dynamic game described by the state equations (16)
316
DYNAMIC NONCOOPERATIVE GAME THEORY
and cost functionals
e
= (X3)2+2(u lf+fJ(U 2)2, fJ L 2 = (X3)2 + (U 2)2.
~o
(17a) (17b)
(Note that we have suppressed the subindices on the controls, designating the stages, since every player acts only once in this game.) The information structure of the problem is CLPS, i.e. PI (the leader), acting at stage 2, has access to both Xl and X2' while P2 (the follower) has only access to the value of x.; therefore, a permissible strategy yl for PI is a measurable mapping from R x R into R, and a permissible strategy y2 for P2 is a measurable mapping from R into R. At this point, we impose no further restrictions (such as continuity, differentiability) on these mappings, and denote the strategy spaces associated with them by r l and r, respectively. Now, to any announced strategy yl E r l of the leader, the follower's optimal reaction (for fixed xd will be the solution of the minimization problem min {[Xl  u 2  yl(XI  u 2, Xd]2 + [U 2]2}
(18)
uleR
which determines R 2(yl ). For the sake of simplicity in the discussion to follow, let us now confine our analysis only to those strategies in r I which lead to a singleton R 2 (yl). Denote the class of such strategies for the leader by r I , and note that the follower's response is now a mapping y2:R x j'"l > R, which we symbolically write as y2 [x I; yl]. There is, in general, no simple characterization of y2 since the minimization problem (18)depends "structurally" on the choice of yl E j'"1. Now, the optimization problem faced by the leader is min {[X~'
 yl (X~',
XdY + 2[yl (x~',
Xd]2
+ fJ[y2[XI; yl]]2}
yl Erl
where This is a constrained optimization problem which is, however, not of the standard type, since the constraint is in the form of minimum of a functional. If we further restrict the permissible strategies of the leader to be twice continuously differentiable in the first argument (in which case the underlying r becomes a smaller set), the first and secondorder conditions associated with the minimization problem (18) can be obtained, which implicitly determine y2[XI;yl]. The problem faced by the leader then becomes one of calculus of variations, with equality and inequality constraints, which is quite intractable even for the "simple" problem formulated in this subsection. Besides, even if the solution of this constrained calculus of variations problem is obtained (say, through numerical techniques), we cannot be sure that it
7
HIERARCHICAL (STACKELBERG) EQUILIBRIA
317
constitutes a Stackelberg strategy for the leader in the game problem under consideration (i.e. it might not be the best choice for the leader) since the a priori assumption of twice differentiability might be overly restrictive. We shall, in fact, see in the sequel that, depending on the value of {3, the Stackelberg strategy of the leader could be nondifferentiable. These considerations then force us to abandon the direct approach towards the solution of the Stackelberg game under CLPS information pattern, and to seek indirect methods. One such indirect method involves determination of a (tight) lower bound on the (Stackelberg) cost of the leader, and then finding an appropriate strategy for the leader that realizes that bound in view of possible rational responses of the follower. In an attempt to explore this indirect method somewhat further, let us first observe that the lowest possible cost the leader can expect to attain is min min J 1 (yl , y2) yl er 1 y2 e f 2
(19)
which is the one obtained when the follower cooperates with him to (globally) minimize his cost functional. This quantity clearly provides a lower bound to the leader's Stackelberg cost; but it is not as yet known w.hether it can be realized, t and this is what we investigate in the sequel. Firstly, solving the optimization problem (19) in view of (16) and (17a), we obtain (by utilizing dynamic programming or directly Prop. 5.1) the pair of feedback strategies
"31 X 2
(20a)
y2'(xd = [2/(3{3 + 2)]x 1
(20b)
y
l'
(X2) =
as providing the unique globally minimizing solution within the class of feedback strategies. The corresponding optimum state trajectory, in this twostage team problem, is described by x~ x~
= [3{3/(3{3 + 2)]x 1 } = (2/3)x~,
(2Oc)
and the minimum cost is (20d) The superscripts in (20a}(20d) stand for "team", since (19)actually describes a twomember team problem with the common objective being minimization of (17a), and (20a}(20b) is the unique teamoptimal solution in feedback t Note that the follower in fact attempts to minimize his own cost functional which is different from J'.
318
DYNAMIC NONCOOPERATIVE GAME THEORY
strategies. We should note, however, that if PI is also allowed to know (and utilize) the value of the initial state Xl' (20a) is no longer a unique teamoptimal strategy for PI, but any representation of it on the state trajectory (2Oc) also constitutes an optimal strategy (cf.section 5.5.4). Denoting the class of all such strategies by r I', we now have the more general result that any pair yl E I'", yZ'(xd = [2/(3P + 2)]x I
constitutes a teamoptimal solution to (19). For each such pair, the state trajectory and the corresponding cost value are still as given by (20c)and (20d), respectively. Now, if (19), or equivalently (20d), is going to be realized as the Stackelberg cost of the leader, there should be an element of r I', say y I", which forces the follower to play yt even though he is minimizing his own cost functional (which is J Zthe counterpart of L Z in the extensive form description). This, inevitably, asks for a relation of the form yt = arg min J2(yl", y2), y2
ef
2
and this being so uniquely, i.e. the minimum of J 2 (yl", . ) on r 2 should be attained uniquely at yt. t Intuitively, an appropriate strategy for the leader is the one that dictates (20a)if the follower plays (20b),and penalizes the follower rather heavily otherwise. Since the leader acts, in this game, after the follower does, and since the state equation is scalar and linear, he (the leader) is in a position to implement such a strategy (by utilizing solely the state information available to him). In particular, the strategy I"
y (xz,xd =
{j
X2
k
XI
if X2 = [3P/(3P + 2)]x I otherwise,
(21)
where k is any positive number, is one such candidate. Such a strategy is clearly in rt'. Moreover, one may readily check validity of the inequality min JZ(yl = jxz,YZ) < min J2(yl = kx l,y2)
},2
er2
y1 e f 2
for k > O,'which implies that the strategy (21)does indeed force the follower to the team strategy (20b), since otherwise he incurs a cost which is higher than the one obtained under the strategy (20a).The conclusion, therefore, is that the strategy (21) constitutes, for k > 0, a Stackelberg strategy for the leader, with the unique optimal response of the follower being (20b). t In case of nonunique solutions, we have to take the supremum of JI over all those solutions, which should equal to JI'. t This inequality, in fact, holds for k > 1 + J(8/13), but weconsider only positive values ofkfor the sake of simplicity in presentation.
7
HIERARCHICAL (STACKELBERG) EQUILIBRIA
319
To recapitulate, (I) For the dynamic game described by (16}{17) and under the CLPS information pattern, a Stackelberg solution exists for all nonnegative values of p. (2) The Stackelberg cost of the leader (P 1) is equal to the global minimum value of his cost function (obtained by cooperative action of the follower), even though the follower (P2) seeks to minimize his own (different) cost functional and may not know the cost functional of PI. (3) The leader's Stackelberg strategy is a representation of his optimal feedback strategy in the related team problem described by (19), and it necessarily involves memory. (A feedback strategy for PI cannot be a Stackelberg strategy.) (4) The Stackelberg strategy of PI is nonunique (parameterized, in this case, by a positive scalar k), but the optimal response of P2 to all those strategies of PI is unique (and independent of k); it is the strategy that minimizes the leader's cost function, even though the follower's objective is quite different. (5) The Stackelberg strategy of the leader, as given by (21), is not even continuous; this, however, does not rule out the possibility for existence of continuous (and differentiable) Stackelberg strategies. (6) The strategy given by (21)may be viewed as a threat strategy on part of the leader, since he essentially threatens the follower to make his cost worse if he does not play the optimal team strategy (20b). One natural question that arises at this point is whether there exists a Stackelberg strategy for the leader which is structurally more appealing (such as continuous, differentiable, etc.). In our investigation in this direction, we necessarily have to consider only the strategies in rr, since the quantity (19) constitutes a tight lower bound on the cost of the leader. In particular, we now consider only those strategies in I?' which are linear in the available information, which can be written as yl(X2,xd =
~X2
+ p(x 2 
3;~2
Xl)
(22)
where p is a scalar parameter. To determine the value of p for which (22) constitutes a Stackelberg strategy for the leader, we have to substitute (22) into J 2, minimize the resulting expression over y2E r 2 by also utilizing (16), and compare the argument of this minimization problem with the strategy (20b). Such an analysis readily leads to the unique value
provided, of course, that
p i=
2 1 p=3 P O. Hence, we have
320 PROPOSITION
DYNAMIC NONCOOPERATIVE GAME THEORY
1 Provided that f3
+ 0, the linear
strategy
I. 1 2f3 3 ( 3f3 y (x2,xd = 3X2 +~ X 2  3f3+2
Xl
)
constitutes a Stackelberg strategy for the leader in the dynamic game described by (16)(17), and the unique optimal response ofthefollower is as given by (20b). The corresponding state trajectory is described by (20c) and the leader's Stackelberg cost is equal to the team cost (20d). 0
Therefore, with the exception of the singular case f3 = 0, the leader can force the follower to the team strategy (20b) by announcing a linear (continuously differentiable) strategy which is definitely more appealing than (21) which was discontinuous on the equilibrium trajectory. For the case f3 = 0, however, a continuously differentiable Stackelberg strategy does not exist for the leader (see Basar (1979b) for a verification), and he has to employ a nondifferentiable strategy such as (21). Remark 4 The linear strategy presented in Prop. 1 and the discontinuous strategies given by (21) do not constitute the entire class of Stackelberg strategies of the leader in the dynamic game of this subsection; there exist other nonlinear (continuous and discontinuous) strategies for the leader which force the follower to the optimal team strategy (20b). Each such strategy can be determined by basically following the analysis that led to Prop. 1 under a different nonlinear representation of (20a) (instead of the linear representation (22)). 0 Remark 5 The results of this example can also be visualized graphically for each fixed Xl and f3. Let us take Xl = 1 and leave f3 as a parameter, in which case
the isocost curves of P2100k as depicted in Fig. If and the (openloop) team solution for PI can be written as (from (20)) u l'
= P/(2 + 3(3),
ul!
= 2/(2 + 3(3).
since there is a onetoone correspondence between X2 and u 2 , any permissible strategy of PI can also be written as a function of u 2 , i.e. u l = yl(U 2). Now, if we can find a strategy r" with the property that its graph has only the point (u 1', u2') in common with the set D ~ {(ul , u 2): J 2(U l , u 2 ) :::;; J2(U I ', u2')},
then the follower (P2) will definitely choose u2 ' . Thus ylo constitutes a Stackelberg strategy for PI, leading to the global minimum of his cost function.
7
HIERARCHICAL (STACKELBERG) EQUILIBRIA
321
u' 3
Fig. 1. Graphical illustration of Example 2.
It easily follows from Fig. I that, for f3 > 0, there exists a unique linear yl' which satisfies the abovementioned requirement; it is the tangent at (u I', u2') to the corresponding isocost curve of P2. For f3 = this tangent line turns out to be vertical (parallel to the u 1axis), and hence cannot be described by a linear strategy; however, a Stackelberg strategy exists in the class of continuous strategies for the leader. (In Fig. 1 we have drawn the tangent lines for f3 = 2 and f3 = 0, and for the latter case the dashed curve provides a continuous Stackelberg strategy for the leader.) 0
°
Remark 6 A common property of the Stackelberg solutions obtained for the dynamic game described by (16H17), and under CLPS information, is that they are also Nash equilibrium solutions. To see this, we first note the obvious inequality
for such a Stackelberg equilibrium solution pair. Furthermore, the inequality Jl(yl*,y2*):S; Jl(yt, y2*)
V yl
Er 1
also holds, since the Stackelberg solution is also teamoptimal (under J 1) in this case. These two inequalities then imply that every such pair also constitutes a Nash equilibrium solution. We already know that the dynamic game under consideration admits an uncountable number of Nash equilibrium solutions (cf.section 6.3); the above discussion then shows that, since the leader can announce his strategy ahead of time in a Stackelberg game, he can
322
DYNAMIC NONCOOPERATIVE GAME THEORY
choose those particular Nash equilibrium strategies (and only those) which lead to an equilibrium cost that is equal to the global minimum of his cost functional. 0
7.4.2 A second example (Example 2): follower acts twice in the game The indirect method introduced in the foregoing subsection for derivation of the Stackelberg solution of dynamic games under CLPS information basically involves two steps: (1) determination of a tight lower bound on the cost function of the leader, which coincides with the global minimum value of a particular team problem; (2) adoption of a particular representation of the optimal team strategy of the leader in this team problem, which forces the follower to minimize (in effect) the cost function of the team problem while he is in fact minimizing his own cost function. Even though this general approach is applicable in a much more general context (i.e. applicable to dynamic games with several stages and more complex dynamics), the related team problem is not always the one determined solely by the cost function of the leader (as in the previous subsection), in particular if the follower also acts at the last stage of the game. The following scalar dynamic game now exemplifies derivation of the related team problem in such a situation. As a modified version of the scalar dynamic game of section 7.4.1,consider the one with dynamics (23) and cost functionals L I = (X3)2 + 2 (U I)2 + P(Ui)2, L 2 = (X3)2+(uif+(un 2.
P> 0
(24a) (24b)
The information structure of the problem is CLPS, and PI (the leader) acting at stage 2 has a single control variable u I, while P2 (the follower) acting at both stages I and 2 has control variables ui and u~, respectively. Let us introduce the strategy spaces r (for PI), (for P2 at stage 1)and (for P2 at stage 2) in a way compatible with the governing information structure. Then, to any announced strategy yl E r of the leader, the follower's optimal reaction at stage 2 will be
n
2 I I I Y2[X2;y] = 2[x 2  y
n
(X2'X I)]
En.
which is obtained by minimization of J20ver y~ Hence, regardless of what strategy yl E r the leader announces, and regardless of what strategy the
7
HIERARCHICAL (STACKELBERG) EQUILIBRIA
323
follower employs at stage 1, his (the follower's) optimal strategy at stage 2 is a unique linear function of X2 and y I, as given above. Substituting this structural form into )1 and )2, derived from (24a) and (24b), respectively, we obtain the reduced cost functionals
and therefore the lowest possible cost value the leader can achieve is min min j1 (i, yn
(26)
y1er1 )'feri
which is the quantity that replaces (19) for this game. The original Stackelberg game has thus been converted into one wherein the follower does not act at the last stage (his only strategy now being yi), and the related team problem is described by (26), whose optimal team solution in feedback strategies is I' 1 Y (X2)=gX 2
(27a)
yf (XI) =
(27b)
[2/(913 + 2)]xI
with the corresponding unique team trajectory being x~ = [9 P/(9P+ 2)]Xl } = (4/9)x~.
x~
(27c)
The associated minimum team cost is ]1' = [213/(913 + 2)] (x d 2
(27d)
which forms a lower bound on the leader's cost in this game, and is definitely higher than the quantity min min min yI e r
1
y~ E
ri
)12 E
rf
)1
(yt, y~,
yn.
Now, paralleling the arguments of section 7.4.1, we obtain the following representation of (27a),on the trajectory (27c), to be a Stackelberg strategy for the leader, forcing the follower at the first stage to the team strategy (27b):t I"
y (x 2 , X tl = t
{~X2
_ kX
I
if X 2 = [913/(913 + 2)]x I otherwise
The reader is asked to fill in the details of this derivation.
(28)
324
DYNAMIC NONCOOPERATIVE GAME THEORY
where k >  1 + J(96/113). The optimal response of the follower to any such strategy is
yi* (xd =
[2/(9P
+ 2)]x 1
(29a) (29b)
leading to (27d) as the Stackelberg cost of the leader. To obtain the counterpart of Prop. 1 for the dynamic game of this subsection, we again start with a general linear representation of (27a), and by following similar arguments we arrive at: PROPOSITION
2
Provided that
p + 0, the
linear strategy
constitutes a Stackelberg strategy for the leader in the dynamic game described by (23)(24), and the unique optimal response ofthe follower is given by (29). The corresponding state trajectory is described by (27c), and the leader's Stackelberq cost is as given by (27d). 0 Remark 7 It can again be shown that, for p = 0, no continuously differentiable Stackelberg solution exists for the leader; therefore, he has to announce a strategy which is discontinuous in the derivative, such as (28), or the one analogous to that discussed in Remark 5. Furthermore, the reader should note that, for the dynamic game under consideration, there does not exist a Stackelberg solution which also constitutes a Nash equilibrium solution; i.e. a counterpart of Remark 6 does not apply here. The main reason for this is the fact that the follower also acts at the last stage of the game. 0
7.4.3 Linear Stackelberg solution of linearquadratic dynamic games We now utilize and extend the indirect methods introduced in sections 7.4.1 and 7.4.2 to solve for the Stackelberg solution of twoperson linearquadratic dynamic games within the class oflinear memory strategies, so as to obtain the counterpart of Props 1and 2 in this more general context. The class of dynamic games under consideration is described by the state dynamics (30)
7
325
HIERARCHICAL (STACKELBERG) EQUILIBRIA
and cost functionals (for PI and P2, respectively) 1 ~ (' Ql r' 1 i'R12 2) L 1 =2 :\ Xk+l k+l Xk+l+ Uk Uk+Uk k Uk
k
(31a) (31b)
Rk 2>O, Q~+l~O, R~I~O, 'v'kEK~{I, ... , K}, with Qk+l~O, dim (x) = n, dim (ui) = m., i = 1,2. The information structure of the problem is CLPS, under which the closedloop strategy spaces for PI (the leader) and P2 (the follower) are denoted by nand respectively, at stage k E K. Furthermore, let J 1 and J 2 denote the cost functionals derived from (31a)and (3Ib), respectively, for the normal (strategic) form of the game, under these strategy spaces. Since Bk does not necessarily vanish in this formulation, the follower also acts at the last stage of the game, and this fact has to be taken into account in the derivation of a tight lower bound on the leader's cost functional. Proceeding as in section 7.4.2, we first note that, to any announced strategy Ktuple {Yk ErL kEK} by the leader, there corresponds a "robust" optimal reaction by the follower at stage K, which is determined by minimizing j2(y 1, y 2) over YiErL with the remaining strategies held fixed. This quadratic strictly convex minimization problem readily leads to the relation
n,
yi[X K; yl]
=
[I+B{QL1 B
n1B{QLl [A KxK+B1Yk('1K)]
(32) ,X K}, and yi [xK;Y1J stands for Yi('1K)the where nK~{xl,X2"" strategy of P2 at stage K and displays the explicit dependence of the choice of yi on yl. It should be noted, however, that the structure of this optimal response of the follower does not depend on the choice of yl: regardless of what yl and the previously applied strategies are, yi depends linearly on XK and yl('1K)' The unique relation (32) can therefore be used in (3Ia) and (3Ib), without any loss of generality, in order to obtain an equivalent Stackelberg game wherein the follower does not act at the last stage. The cost functionals of this new dynamic game in extensive form are
I
i
I
I
1

= 2(X K+ 1QK+IXK+l
+ u( Uk
I
1 + UKi ' UK)+2
Kl ~
L... (Xk+ 1 Qk+l Xk+
1
k = 1
+ u{Rk2U~)
(33a) I
I Q2 I'R21 1) I 2 =:2 (' X K + 1 K+1XK+l+U K K UK +2 +u{u~+u(R~lun
Kl
~
k~1
(' 1Q2 Xk+l k+l Xk+
(33b)
326
DYNAMIC NONCOOPERATIVE GAME THEORY
where 1 X K+ I = A KXK+ B KUK1 Qi + I = [I  BiT]' Qi + 1 [I  BiT] + T' R k2T Qi+1 = Qi+1 [I BiT] D. (B 2'Q2 B 2 1)lBK2'Q2 T = K K+l K+ K+l'
(34a) (34b) (34c)
(34d)
Let us further introduce the notation
Ii (yl, y2)
~
y2 Er 2+ {yfE:r~; k = 1, ... , Kl} (i I{ul = yl (11k), k E K; u~ = y~ (11k), k = 0, ... , K l}.
Then, a lower bound on the leader's Stackelberg cost is, clearly, the quantity min mip Ji (yl, y2) (35) y'er' ji'er' which is the minimum cost ofa team problem in which both PI and P2 strive to minimize the single objective functional T', This team problem admits a unique optimal solution within the class of feedback strategies, which is given in the following lemma whose proof readily follows from Prop. 5.1 by an appropriate decomposition. LEMMA 2 lnfeedback strategies, the joint optimization (team) problem defined by (35) admits a unique solution given by
y~' (x k) =
 Ll Xk'
Yk Xk =  L2kXk' 2/ ( )
kEK{K}
and with minimum cost
where
kEK k50K1
(36a) (36b)
M() is defined recursively by
M K = Qk+FKQi+IFK+Ll'Lk M k =Ql+F/,Mk+1 F k+ LfLi+LtRi 2Lf,
k 50 K I,}
(37).
7
HIERARCHICAL (STACKELBERG) EQUILIBRIA
327
where FKf;,AKBkLk
Fk f;, A k  B~ L~  B~ L~,
k:S:Kl.}
(38)
The optimum team trajectory is then described by
(39)
o
In view of the analysis of section 7.4.2, we now attempt to find a representation of the team strategy {γ^{1t}_k; k ∈ K} on the optimum team trajectory (39) which provides a Stackelberg strategy for the leader. The following proposition justifies this indirect approach for the problem under consideration.

PROPOSITION 3  Let {γ^{1∘}_k; k ∈ K} be a representation of the team strategy {γ^{1t}_k; k ∈ K} on the optimum team trajectory (39), such that every solution of the minimization problem

min_{γ^2 ∈ Γ̃^2} J̃^2(γ^{1∘}, γ^2)

is a representation of {γ^{2t}_k; k ∈ K − {K}} on the same team trajectory. Then, {γ^{1∘}_k; k ∈ K} provides a Stackelberg strategy for the leader.

Proof  We have already shown, prior to the statement of Lemma 2, that the quantity (35) provides a lower bound on the cost function of the leader for the dynamic game under consideration. But, under the hypothesis of Prop. 3, this lower bound is tight, and is realized if the leader announces the strategy {γ^{1∘}_k; k ∈ K}, which thereby manifests itself as a Stackelberg strategy for the leader.  □
To obtain some explicit results, we now confine our attention to linear one-step memory strategies for the leader (more precisely, to those representations of {γ^{1t}_k; k ∈ K} which are linear in the current and most recent past values of the state), viz.

γ^1_k(x_k, x_{k−1}) = −L^1_k x_k + P_k [x_k − F_{k−1} x_{k−1}]   (k ∈ K − {1}),   (40)

where {P_k; k ∈ K − {1}} is a matrix sequence which is yet to be determined. This matrix sequence is independent of x_k and x_{k−1}, but it may, in general, depend on the initial state x_1 which is known a priori. Our objective for the remaining portion of this subsection may now be succinctly stated as (i) determination of conditions under which a Stackelberg strategy for the leader exists in the structural form (40) and also satisfies the hypothesis of
Prop. 3, (ii) derivation of the corresponding strategy of the leader (more precisely, the matrices P_k, k ∈ K − {1}) recursively, whenever those conditions are fulfilled. Towards this end, we substitute (40) into J̃^2(γ^1, γ^2), minimize the resulting functional over γ^2 ∈ Γ̃^2 and compare the minimizing solutions with {γ^{2t}_k; k ∈ K − {K}} and its representations on the optimum team trajectory. Such an analysis leads to Thm 3 given in the sequel.
Preliminary notation for Theorem 3
Σ(·): an (n × n)-matrix defined recursively by

Σ_k = Q^2_k + F'_k Σ_{k+1} F_k + L^{1'}_k R^{21}_k L^1_k + L^{2'}_k L^2_k;   Σ_{K+1} = Q̃^2_{K+1}.   (41)

Λ_k: an (m_1 × n)-matrix defined recursively (as a function of {P_{k+1}, P_{k+2}, …, P_K}) by

Λ_k = B^{1'}_k P'_{k+1} Λ_{k+1} F_k − R^{21}_k L^1_k + B^{1'}_k Σ_{k+1} F_k;   Λ_{K+1} = 0.   (42)

CONDITION 1  For a given x_1 ∈ R^n, let there exist at least one matrix sequence {P_K, P_{K−1}, …, P_2} that satisfies recursively the vector equation

[B^{2'}_k P'_{k+1} Λ_{k+1} F_k + B^{2'}_k Σ_{k+1} F_k − L^2_k] x^t_k = 0   (k + 1 ∈ K),   (43)

where Λ_k is related to {P_K, …, P_{k+1}} through (42), and x^t_k is a known linear function of x_1, as determined through (39).  □
THEOREM 3  Let Condition 1 be satisfied and let {P*_2(x_1), …, P*_K(x_1)} denote one such sequence. Then, there exists a Stackelberg solution for the dynamic game described by (30)–(31) and under the CLPS information, which is given by

γ^{1*}_k(x_k, x_{k−1}, x_1) = [P*_k(x_1) − L^1_k] x_k − P*_k(x_1) F_{k−1} x_{k−1},   k ∈ K − {1}   (44a)
γ^{1*}_1(x_1) = −L^1_1 x_1,
γ^{2*}_k(x_k) = −L^2_k x_k,   k ∈ K − {K},   (44b)

and the corresponding Stackelberg costs of the leader and the follower are given, respectively, by

J^{1*} = ½ x'_1 M_1 x_1   (45a)
J^{2*} = ½ x'_1 Σ_1 x_1.   (45b)
Proof  Substituting u^1_k = γ^{1*}_k(·), k ∈ K, as given by (44a), into J̃^2 defined by (33b), we obtain a functional which is quadratic and strictly convex in
{u^2_k; k ∈ K − {K}}. Let us denote this functional by L(u^2_1, …, u^2_{K−1}), which is, in fact, what the follower will minimize in order to determine his optimal response. Strict convexity of L readily implies that the optimal response of the follower to the announced strategy K-tuple (44a) of the leader is unique in open-loop policies, and that every other optimal response in Γ̃^2 is a representation of this unique open-loop strategy on the associated state trajectory. Consequently, in view of Prop. 3, verification of the theorem amounts to showing that the minimum of L(u^2_1, …, u^2_{K−1}) is attained by the open-loop representation of the feedback strategy (K−1)-tuple (44b) on the state trajectory described by (39); in other words, we should verify that the set of controls

u^{2t}_k = −L^2_k x^t_k,   k ∈ K − {K},

minimizes L(u^2_1, …, u^2_{K−1}). Now, since L is quadratic and strictly convex, it is both necessary and sufficient that {u^{2t}_k} be a stage-by-stage optimal solution; i.e. the set of inequalities

L(u^{2t}_1, …, u^{2t}_{K−1}) ≤ L({u^{2t}}_k, u^2_k)   (i)

should be satisfied for all u^2_k ∈ R^{m_2} and all k = 1, …, K−1, where {u^{2t}}_k denotes the entire sequence u^{2t}_1, …, u^{2t}_{K−1} with only u^{2t}_k missing. Because of the additive nature of the cost function L, (i) can equivalently be written as

L*_k(u^{2t}_k) ≤ L*_k(u^2_k)   (ii)

for all u^2_k ∈ R^{m_2} and all k = 1, …, K−1, where, for fixed k,

L_k(u^{2t}_{K−1}, …, u^{2t}_{k+1}, u^2_k) ≜ L*_k(u^2_k)
= ½ (y'_{K+1} Q̃^2_{K+1} y_{K+1} + μ'_K R̃^{21}_K μ_K) + ½ Σ_{j=k}^{K−1} (y'_{j+1} Q^2_{j+1} y_{j+1} + μ'_j R^{21}_j μ_j) + ½ Σ_{j=k+1}^{K−1} u^{2t'}_j u^{2t}_j + ½ u^{2'}_k u^2_k,   (iii)

with

y_{j+1} = F_j y_j + B^1_j P*_j [y_j − F_{j−1} y_{j−1}],   j ≥ k+2
y_{k+2} = F_{k+1} y_{k+1} + B^1_{k+1} P*_{k+1} [y_{k+1} − F_k x^t_k]
y_{k+1} = (A_k − B^1_k L^1_k) x^t_k + B^2_k u^2_k;   y_k = x^t_k,   (iv)
and

μ_j = −L^1_j y_j + P*_j [y_j − F_{j−1} y_{j−1}],   j ≥ k+1.   (v)

It is important to note that, on the RHS of (ii), L*_k is not only a function of u^2_k but also of x_k, which we take at its equilibrium value x^t_k because of the RHS of (i) (i.e. the sequence {u^{2t}_1, …, u^{2t}_{k−1}} that determines x_k is already at equilibrium while treating (ii)). It is for the same reason that y_k, on the last row of (iv), is taken to be equal to x^t_k. Now, if the last row relation in (iv) is used iteratively in the remaining relations of (iv), we can express y_j, j ≥ k+2, in terms of y_{k+1} and x^t_k, and if this is used in (v), μ_j, j ≥ k+1, can be expressed in terms of the same variables. The resulting expressions are

y_j = F(j, k+1) y_{k+1} + H(j, k+1) B^1_{k+1} P*_{k+1} [y_{k+1} − F_k x^t_k],   j ≥ k+2
μ_j = −L^1_j F(j, k+1) y_{k+1} + [Y(j, k+1) − L^1_j H(j, k+1) B^1_{k+1}] P*_{k+1} [y_{k+1} − F_k x^t_k],   j ≥ k+1,

where

F(j, k) = F_{j−1} ⋯ F_k,   F(j, j) = I
H(j, k) = H(j, k+1) B^1_{k+1} P*_{k+1} + F(j, k+1),   H(j, j−1) = I,   H(j, i) = 0,  j ≤ i
Y(j, k) = P*_j B^1_{j−1} P*_{j−1} ⋯ P*_{k+1} B^1_k,   Y(j, j) = I.

Utilizing these expressions, we can next show that

∇_{u^2_k} L*_k(u^2_k) |_{u^2_k = u^{2t}_k} = 0,   ∀ k = 1, …, K−1.   (vi)
Towards this end, we first obtain

∇_{u^2_k} L*_k |_{u^2_k = u^{2t}_k} = [B^{2'}_k P*'_{k+1} {B^{1'}_{k+1} H'(K+1, k+1) Q̃^2_{K+1} F(K+1, k) + Σ_{j=k+1}^{K} [B^{1'}_{k+1} H'(j, k+1) Z_j F(j, k) − Y'(j, k+1) R^{21}_j L^1_j F(j, k)]} + B^{2'}_k Σ_{k+1} F_k − L^2_k] x^t_k,   (vii)

where Z_j is an appropriately defined matrix whose exact expression will not be needed in the sequel. Some further manipulations prove that (vii) can equivalently be written as

[B^{2'}_k P*'_{k+1} Λ*_{k+1} F_k + B^{2'}_k Σ_{k+1} F_k − L^2_k] x^t_k,
where Λ*_{k+1} satisfies the recursive equation (42) with P_{k+1} replaced by P*_{k+1}. But, by Condition 1 [(43)], the foregoing expression identically vanishes; this proves (vi), and in turn establishes the desired result that u^{2t}_k minimizes L at stage k and is in equilibrium with {u^{2t}}_k. (Note that, since L is a strictly convex functional, (vi) is a sufficient condition for minimization.) Since k was an arbitrary integer, the proof of the main part of the theorem follows. Expression (45a) is readily obtained from Lemma 2 under the stipulation that the optimum team trajectory is realized, since then J^{1*} is identical with J̃^{1t}. Expression (45b), on the other hand, follows from substitution of the team solution given in Lemma 2 into J̃^2.  □
Remark 8  Condition 1, and thereby the Stackelberg solution presented in Thm 3, depend, in general, on the value of the initial state x_1; that is, the Stackelberg strategy of the leader at stage k is not only a function of x_k and x_{k−1}, but also of x_1. This latter dependence may be removed under a more stringent condition (than Condition 1) on the parameters of the game, in which case the vector equation (43) is replaced by the matrix equation

B^{2'}_k P'_{k+1} Λ_{k+1} F_k + B^{2'}_k Σ_{k+1} F_k − L^2_k = 0   (k + 1 ∈ K).   (46)

Such a restriction is, in fact, inevitable if we seek a solution to the infinite-horizon problem (i.e. with K → ∞), in which case our attention is solely confined to stationary controls (see Basar and Selbuz, 1979b, for an illustrative example).  □
Remark 9  The analyses of sections 7.4.1 and 7.4.2 have already displayed the possibility that a linear-quadratic dynamic game might not admit a Stackelberg solution in linear strategies. Therefore, depending on the parameters of the dynamic game, Condition 1 might not be fulfilled. In such a case, we have to adopt, instead of (40), a parameterized nonlinear representation of {γ^{1t}_k; k ∈ K} as a candidate Stackelberg strategy of the leader, and then determine those parameter values in view of Prop. 3. For derivation of such a nonlinear Stackelberg solution, the reader is referred to Tolwinski (1981b). Another important point worth noting is that, for the linear-quadratic game of this section, and for general parameter values, J̃^{1t} given in Lemma 2 is not always the Stackelberg cost of the leader; i.e. there may not exist a strategy γ^{1*} in Γ^1 (linear or otherwise) under which the follower is forced to choose a representation of γ^{2t} on the optimal team trajectory. This would be the case if, for example, there exist certain control variables of the follower, at intermediate stages, which do not affect the state variable, and thereby cannot be "detected" by the leader. In such a case, the leader cannot influence the cost function of the follower through these control variables and hence cannot enforce the team solution. This then necessitates derivation of a new team cost
(which is realizable as the Stackelberg cost of the leader), by taking into account the robust optimal responses of the follower in terms of these controls (as we did at the final stage). The resulting Stackelberg cost will be higher than J̃^{1t}. Another such case occurs if the leader can detect (through his state observation) a linear combination of the control variables of the follower, but does not observe them separately. (This would be the case if, for example, the dimension of the state is lower than that of the control vector of the follower.) Here again, we have to determine a new minimum team cost (different from J̃^{1t}), to be realized as the Stackelberg cost of the leader, by taking into account the freedom allotted to the follower in the nondetectable (by the leader) region of his control space. We do not pursue this point any further here, and refer the reader to Tolwinski (1981b) for such a derivation, and also to Basar (1980c) for an indirect derivation of the Stackelberg cost value of the leader in general dynamic games when the leader has access to closed-loop imperfect state information (cf. Def. 5.2). See also Problem 6 in section 6.  □
7.5 STOCHASTIC DYNAMIC GAMES WITH DETERMINISTIC INFORMATION PATTERNS

This section is devoted to a brief discussion on possible extensions of the results of the previous sections to stochastic dynamic games (cf. Def. 5.4) wherein the state evolves according to

x_{k+1} = f_k(x_k, u^1_k, u^2_k) + θ_k,   k ∈ K,   (47)
where {θ_1, …, θ_K} is a set of statistically independent Gaussian vectors with values in R^n and with cov(θ_k) > 0, ∀ k ∈ K. In a general context, the cost functional of Pi, for the game in extensive form, is again taken to be stage-additive, viz.

L^i(u^1, u^2) = Σ_{k=1}^{K} g^i_k(x_{k+1}, u^1_k, u^2_k, x_k),   i = 1, 2,   (48)
and, abiding by our standard convention, P1 is taken to act as the leader and P2 as the follower. In the discussion to follow, we shall consider three different information structures; namely, (A) open-loop for both players, (B) open-loop for the leader and closed-loop perfect state (CLPS) for the follower, (C) CLPS for both players. In addition to derivation of the (global) Stackelberg solution (cf. section 7.5.1), we shall also discuss derivation of the feedback Stackelberg solution under the third information pattern listed above (cf. section 7.5.2).
7.5.1 (Global) Stackelberg solution
A. Open-loop information for both players
If the underlying deterministic information structure is open-loop for both players (in which case the controls depend on the initial state x_1 which is assumed to be known a priori), then the Stackelberg solution can be obtained by basically converting the game into equivalent static normal form and utilizing the first method outlined in section 2 for the deterministic open-loop Stackelberg solution. Towards this end, we first recursively substitute (47) into (48), and take expected values of L^i over the statistics of the random variables {θ_1, …, θ_K}, to obtain a cost functional J^i(u^1, u^2) which depends only on the control vectors u^1, u^2 (and also on the initial state vector x_1 which is already known by the players). Since such static games have already been treated in section 4.5, we do not discuss them here any further, with the exception of one special case (see the next paragraph and Prop. 4). It should be noted that the Stackelberg solution will, in general, depend on the statistical moments of the random disturbances in the state equation, unless the system equation is linear and the cost functionals are quadratic. For the special case of linear-quadratic dynamic games (i.e. when f_k in (47) is given as in (6a), and g^i_k in (48) is structured as in (6b)), and under the open-loop information structure, the cost functional J^i(u^1, u^2) can be written as
J^i(u^1, u^2) = ½ Σ_{k=1}^{K} (y'_{k+1} Q^i_{k+1} y_{k+1} + u^{i'}_k u^i_k + u^{j'}_k R^{ij}_k u^j_k) + c^i;   j ≠ i,  i, j = 1, 2,

where

y_{k+1} = A_k y_k + B^1_k u^1_k + B^2_k u^2_k + E[θ_k];   y_1 = x_1,

and c^i is independent of the controls and x_1, and depends only on the covariances of {θ_1, …, θ_K}. This result is, of course, valid as long as the set {θ_1, …, θ_K} is statistically independent of x_1, which was one of our underlying assumptions at the beginning. Hence, for the linear-quadratic stochastic game, the stochastic contribution completely separates out; and, furthermore, if E[θ_k] = 0 ∀ k ∈ K, the open-loop Stackelberg solution matches (completely) with the one given in Corollary 1 for the deterministic problem. This result is now summarized below in Proposition 4.
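The separation of the stochastic contribution can be seen already in a one-step scalar illustration of my own (not taken from the text): with x_2 = a x_1 + b u + θ, E θ = 0 and var(θ) = σ², the expected cost E[(x_2)² + u²] equals its deterministic value plus σ², so the minimizing control does not depend on the noise.

```python
# E[(a*x1 + b*u + theta)^2 + u^2] = (a*x1 + b*u)^2 + u^2 + var, since E theta = 0
def expected_cost(u, x1=1.0, a=0.5, b=1.0, var=0.0):
    return (a * x1 + b * u) ** 2 + u ** 2 + var

def argmin_on_grid(var):
    grid = [i / 10000.0 for i in range(-20000, 20001)]
    return min(grid, key=lambda u: expected_cost(u, var=var))

u0 = argmin_on_grid(0.0)      # deterministic problem
u1 = argmin_on_grid(3.0)      # noisy problem with var = 3
assert u0 == u1               # the optimal control is noise-independent
# closed form for these numbers: u* = -a*b*x1/(1 + b^2) = -0.25
assert abs(u0 - (-0.25)) < 1e-3
```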
PROPOSITION 4  Consider the two-person linear-quadratic stochastic dynamic game described by (47)–(48) together with the structural assumptions (6a)–(6b), and under the parametric restrictions Q^i_{k+1} ≥ 0, R^{12}_k > 0, E[θ_k] = 0 (k ∈ K; i = 1, 2). It admits a unique open-loop Stackelberg solution with P1 acting as the leader, which is given, under the condition of invertibility of the matrix appearing in the braces in (8c), by (7a)–(7b).  □
334
DYNAMIC NONCOOPERATIVE GAME THEORY
B. OL information for P1 and CLPS information for P2
For the deterministic Stackelberg game, we have already seen in section 2 that additional state information for the follower does not lead to any difference in the open-loop Stackelberg strategy of the leader, provided, of course, that he (the leader) still has only open-loop information (cf. Remark 1). The only difference between the OL-OL and OL-CLPS Stackelberg solutions then lies in the optimum response strategy of the follower, which, in the latter case, is any closed-loop representation of the open-loop response strategy on the equilibrium trajectory associated with the OL Stackelberg solution; hence, the optimum strategy of the follower under the OL-CLPS information pattern is definitely not unique, but it can, nevertheless, be obtained from the OL Stackelberg solution. For the stochastic Stackelberg game, however, the issue is more subtle, in particular if the state dynamics is not linear and the cost functional is not quadratic. In general, the OL-OL Stackelberg solution does not coincide with the OL-CLPS Stackelberg solution, and the latter has to be obtained independently of the former. The steps involved in this derivation are as follows:

(1) For each fixed {u^1_k ∈ U^1_k, k ∈ K}, minimize

J^2(u^1, γ^2) = E[ Σ_{k=1}^{K} g^2_k(x_{k+1}, u^1_k, u^2_k, x_k) | u^2_k = γ^2_k(x_l, l ≤ k), k ∈ K ]

subject to (47) and over the permissible class of strategies, Γ^2. Any solution γ^{2∘} ∈ Γ^2 of this stochastic control problem satisfies the dynamic programming equation (see section 5.5.4, equation (5.58b))
V(k, x_k) = min_{u^2_k ∈ U^2_k} E_{θ_k}[ g^2_k(f_k(x_k, u^1_k, u^2_k) + θ_k, u^1_k, u^2_k, x_k) + V(k+1, f_k(x_k, u^1_k, u^2_k) + θ_k) ]
= E_{θ_k}[ g^2_k(f_k(x_k, u^1_k, γ^{2∘}_k(η^2_k)) + θ_k, u^1_k, γ^{2∘}_k(η^2_k), x_k) + V(k+1, f_k(x_k, u^1_k, γ^{2∘}_k(η^2_k)) + θ_k) ]   (49)

where

η^2_k ≜ {x_l, l ≤ k},

and E_{θ_k} denotes the expectation operation with respect to the statistics of θ_k.

(2) Now, minimize the cost functional

J^1(γ^1, γ^{2∘}) = E[ Σ_{k=1}^{K} g^1_k(x_{k+1}, u^1_k, γ^{2∘}_k(η^2_k), x_k) | u^1_k = γ^1_k(x_1), k ∈ K ]   (50)

over the permissible class of strategies, Γ^1, and subject to the constraints imposed by the dynamic programming equation (49) and the state equation
(47) with u^2_k replaced by γ^{2∘}_k(η^2_k).† The solution of this optimization problem constitutes the Stackelberg strategy of the leader in the stochastic dynamic game under consideration.‡

For the class of linear-quadratic games, these two steps can readily be carried out, to lead to a unique Stackelberg solution in closed form. In this case, the leader's equilibrium strategy, in fact, coincides with its counterpart in the OL-OL game (presented in Prop. 4), a property that can be verified without actually going explicitly through the two steps of the foregoing procedure. This result is given below in Prop. 5, whose proof, as provided here, is an indirect one.

PROPOSITION 5  Consider the two-person linear-quadratic stochastic Stackelberg game of Prop. 4, but under the OL-CLPS information pattern. Provided that the condition of matrix invertibility of Prop. 4 is satisfied, this stochastic game admits a unique Stackelberg solution with P1 as the leader, which is given by
γ^{1*}_k(x_1) = −B^{1'}_k K_k x^*_k,   k ∈ K,   (51a)
γ^{2*}_k(x_k, x_1) = −P_k S_{k+1} A_k x_k − P_k (S̄_{k+1} − S_{k+1} B^1_k B^{1'}_k K_k x^*_k),   k ∈ K,   (51b)

where

P_k = [I + B^{2'}_k S_{k+1} B^2_k]^{−1} B^{2'}_k   (52a)
S_k = Q^2_k + A'_k S_{k+1} [I − B^2_k P_k S_{k+1}] A_k;   S_{K+1} = Q^2_{K+1}   (52b)
S̄_k = A'_k [I − B^2_k P_k S_{k+1}]' [S̄_{k+1} − S_{k+1} B^1_k B^{1'}_k K_k x^*_k];   S̄_{K+1} = 0,   (52c)

and K_k is defined by (8a) and x^*_k by (9e).

Proof  For each fixed control strategy {u^1_k = γ^1_k(x_1), k ∈ K} of the leader, the dynamic programming equation (49) can explicitly be solved to lead to the unique solution

γ^{2∘}_k(x_k, x_1) = −P_k S_{k+1} A_k x_k − P_k (S̄_{k+1} + S_{k+1} B^1_k u^1_k),   k ∈ K,   (i)

where P_k and S_{k+1} are defined by (52a) and (52b), respectively, and S̄_k is defined by

S̄_k = A'_k [I − B^2_k P_k S_{k+1}]' [S̄_{k+1} + S_{k+1} B^1_k u^1_k];   S̄_{K+1} = 0.   (ii)
† Note that γ^{2∘}_k(·) is, in fact, dependent on γ^1, but this dependence cannot be written explicitly (in closed form), unless the cost functional of P2 and the state equation have a specific simple structure (such as quadratic and linear, respectively).
‡ The underlying assumption here is that step (1) provides a unique solution {γ^{2∘}_k; k ∈ K} for every {u^1_k ∈ U^1_k; k ∈ K}.
(This result follows from Prop. 5.4 by taking E[θ_k] = B^1_k u^1_k.) Now, at step (2) of the derivation, (i) and (ii) above will have to be used, together with the state equation, as constraints in the minimization of (50) over Γ^1. By recursive substitution of (ii) and the state equation into (i), we obtain the following equivalent expression for (i), to be utilized as a constraint in the optimization problem faced by the leader:

u^2_k = Σ_l T_k(l) u^1_l + M_k x_1 + Σ_l N_k(l) θ_l,   (iii)

where T_k(l), M_k, N_k(l) are the coefficient matrices associated with this representation, whose exact expressions will not be needed in the sequel. Now, (iii) being a linear relation, and L^1 being a quadratic expression in {x_{k+1}, u^1_k, u^2_k; k ∈ K}, it follows that, with u^1 ≜ (u^{1'}_1, …, u^{1'}_K)', J^1(γ^1, γ^{2∘}) can be written as

J^1(γ^1, γ^{2∘}) = E[ u^{1'} T_1 u^1 + u^{1'} T_2 x_1 + x'_1 T_3 x_1 | u^1_k = γ^1_k(x_1), k ∈ K ] + J_m,

where T_1, T_2, T_3 are deterministic matrices, and J_m is independent of u^1 and is determined solely by the covariance matrices of {θ_k; k ∈ K}, this decomposition being due to the fact that the leader's information is open-loop and hence his controls are independent of the random variables {θ_k; k ∈ K} which were taken to have zero mean. Since the Stackelberg strategy of the leader is determined as the global minimum of J^1(γ^1, γ^{2∘}), it readily follows that it should be independent of the covariance of {θ_k; k ∈ K}. Now, if there were no noise in the state equation, the follower's response (though not unique) would still be expressible in the form (i), and therefore the leader's Stackelberg strategy would still be determined as the control that minimizes [u^{1'} T_1 u^1 + u^{1'} T_2 x_1 + x'_1 T_3 x_1]. Hence the leader's Stackelberg strategy in this stochastic game is the same as the one in its deterministic counterpart, thus leading to (51a) (see Corollary 1 and Remark 1). The follower's unique optimal response (51b) then follows from (i) by substituting (51a) into (ii). Note that this strategy is, in fact, a particular closed-loop representation of (7b) on the open-loop equilibrium trajectory of the deterministic problem (which is described by (9e)).  □
C. CLPS information for both players
When the leader has access to dynamic information, derivation of the Stackelberg solution in stochastic dynamic games meets with insurmountable difficulties. Firstly, any direct approach readily fails at the outset (as in its deterministic counterpart, discussed in section 4) since the optimal response of the follower to any announced strategy of the leader (from a general strategy space) cannot be expressed analytically (in terms of that strategy), even for linear-quadratic games. Secondly, the indirect method of section 4, which is
developed for deterministic dynamic games, cannot be extended to stochastic games, since, in the latter case, every strategy has a unique representation (see section 5.5.4), as opposed to the existence of infinitely many closed-loop representations of a given strategy in a deterministic system. Consequently, derivation of the closed-loop Stackelberg solution of stochastic dynamic games remains, today, a challenge for researchers. If, however, we make some structural assumptions on the possible strategies of the leader (which is tantamount to seeking suboptimal solutions), then the problem may become tractable. In particular, if, under the stipulated structural assumptions, the class of permissible strategies of the leader can be described by a finite number of parameters, and if the follower's optimal response can be determined analytically as a function of these parameters, then the original game may be viewed as a static one in which the leader selects his strategy from a Euclidean space of appropriate dimension; such a static Stackelberg game is, in general, solvable, but more often numerically than analytically. The following example now illustrates this approach and demonstrates that even for simple stochastic dynamic games, and under the crudest type of structural assumptions, the corresponding (suboptimal) Stackelberg solution cannot be obtained analytically, but only through some numerical minimization techniques.

Example 3  Consider the 2-stage scalar stochastic dynamic game described by the state equations

x_2 = x_1 + u^2 + θ_1,   x_3 = x_2 − u^1 + θ_2,   (53)

and cost functionals

L^1 = (x_3)^2 + 2(u^1)^2 + (u^2)^2   (54a)
L^2 = (x_3)^2 + (u^2)^2.   (54b)
Here, θ_1 and θ_2 are taken as independent random variables with mean zero and variances σ_1 and σ_2, respectively. The leader (P1) acts at stage 2 and has access to both x_1 and x_2, while the follower (P2) acts at stage 1 and has only access to x_1. If γ^i ∈ Γ^i denotes a general strategy of Pi (i = 1, 2), the expected (average) cost functional of P2 can be written as

J^2(γ^1, γ^2) = E{[x_2 − γ^1(x_2, x_1) + θ_2]^2 + [γ^2(x_1)]^2}
= E{[x_2 − γ^1(x_2, x_1)]^2 + [γ^2(x_1)]^2} + σ_2,

which has to be minimized over γ^2 ∈ Γ^2 to determine the optimal response of P2 to γ^1 ∈ Γ^1. Barring the stochastic aspects, this is similar to the problem treated in section 7.4.1, where the difficulties involved in working with a general γ^1 have been delineated. We therefore now restrict our investigation
(also in line with the discussion preceding the example) to a subclass of strategies in Γ^1 which are affine in x_2, that is, to strategies of the form†

γ^1(x_2, x_1) = αx_2 + βx_1,   (55)
where α and β are free parameters which are yet to be determined. They will, in general, be dependent on x_1 which is, though, known a priori by both players. Under the structural restriction (55), J^2 admits a unique minimum, thus leading to the optimal response strategy (for the follower)

γ^{2∘}(x_1) = − (1 − α)(1 − α − β) x_1 / [1 + (1 − α)^2],   (56)
which explicitly depends on the parameters α and β that characterize the leader's strategy. To determine their optimal values, we now substitute (55)–(56) into J^1(γ^1, γ^2), together with the corresponding values of x_3 and x_2 from (53), to obtain the function F given below, which has to be minimized over α ∈ R, β ∈ R for fixed x_1:

F(α, β) = { (1 − α − β)^2 / [1 + (1 − α)^2] + 2(α + 2β − αβ)^2 / [1 + (1 − α)^2]^2 } x_1^2 + [(1 − α)^2 + 2α^2] σ_1 + σ_2.
Let us now note that
(i) F is jointly continuous in (α, β), F(α, β) ≥ 0 ∀ (α, β) ∈ R × R, and F(α, β) → ∞ as |α|, |β| → ∞. Therefore, we can restrict our search for a minimum on R^2 to a closed and bounded subset of R^2, and consequently there exists (by the Weierstrass theorem‡) at least one pair (α*, β*) that minimizes F for any given pair (x_1, σ_1).
(ii) The optimum pair (α*, β*) depends on (x_1, σ_1), but not on σ_2, and it cannot be expressed analytically as a function of (x_1, σ_1). Hence, for each fixed (x_1, σ_1), F(α, β) has to be minimized numerically, by utilizing one of the available optimization algorithms (Luenberger, 1973).
(iii) With (α*, β*) determined as above, the linear-in-x_2 strategy γ^{1∘}(x_2, x_1) = α*(x_1)x_2 + β*(x_1)x_1 is only a suboptimal Stackelberg strategy for the leader, since he may possibly achieve a better performance by announcing a strategy outside the "linear-in-x_2" class.  □
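Point (ii) can be checked numerically. The sketch below is a minimal dependency-free version of such a computation (the coefficients of F are transcribed from the example as reconstructed here, with x_1 = σ_1 = 1 fixed, and should be treated as an assumption); a crude shrinking grid search stands in for the algorithms of Luenberger (1973). It confirms that the minimizer does not move when σ_2 changes, while the minimum value shifts by exactly the change in σ_2.

```python
def F(a, b, x1=1.0, s1=1.0, s2=1.0):
    # F(alpha, beta) of Example 3 (coefficients assumed from the reconstruction)
    D = 1.0 + (1.0 - a) ** 2
    return (((1 - a - b) ** 2 / D
             + 2 * (a + 2 * b - a * b) ** 2 / D ** 2) * x1 ** 2
            + ((1 - a) ** 2 + 2 * a ** 2) * s1 + s2)

def minimize(f, span=3.0, steps=60, passes=5):
    """Shrinking grid search over (alpha, beta): crude but dependency-free."""
    a_c = b_c = 0.0                    # center of the search square
    best = (f(a_c, b_c), a_c, b_c)
    for _ in range(passes):
        for i in range(steps + 1):
            for j in range(steps + 1):
                a = a_c - span + 2 * span * i / steps
                b = b_c - span + 2 * span * j / steps
                v = f(a, b)
                if v < best[0]:
                    best = (v, a, b)
        _, a_c, b_c = best
        span /= 5.0                    # refine around the incumbent
    return best

v1, a1, b1 = minimize(lambda a, b: F(a, b, s2=1.0))
v2, a2, b2 = minimize(lambda a, b: F(a, b, s2=4.0))
assert abs(a1 - a2) < 1e-3 and abs(b1 - b2) < 1e-3   # (ii): minimizer independent of s2
assert abs((v2 - v1) - 3.0) < 1e-6                   # s2 only shifts the minimum value
```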
7.5.2 Feedback Stackelberg solution
† The apparent linear structure of the second term in (55) is adopted for the sake of convenience in the analysis to follow; it could also have been taken as a single function β(x_1).
‡ See section 5 of Appendix I.

In this subsection, we extend the results of section 3 to stochastic dynamic games described by (47)–(48), and, in particular, we obtain the counterpart of
Thm 2 in the present context (see Thm 4 below). In this analysis, we do not restrict our attention at the outset to feedback strategies (as was done in section 3), but rather start with general closed-loop strategies u^i_k = γ^i_k(x_l, l ≤ k), k ∈ K, i = 1, 2. The conclusion, however, is that the feedback Stackelberg solution can be realized only in feedback strategies. To see this, we start (in view of Def. 3.29) at the last stage k = K and solve basically a static Stackelberg game between P1 and P2, with their strategies denoted by γ^1_K and γ^2_K, respectively. Because of the additive nature of the cost functional of each player, and since cov(θ_K) > 0, θ_K is Gaussian and statistically independent of the remaining random vectors, every Stackelberg solution at this stage will depend only on x_K and not on the past values of the state vector. Proceeding to the next stage k = K − 1, after substitution of the Stackelberg solution at stage k = K into the state equation, we conclude, by the same reasoning, that the Stackelberg strategies at stage k = K − 1 are only functions of x_{K−1}. An inductive argument (as in the proof of Thm 6.9) then verifies the "feedback" property of the feedback Stackelberg strategies, and it also simultaneously leads to the equations that yield the corresponding solution for the stochastic dynamic game. Theorem 4, below, now provides the complete solution under the assumption of singleton reaction sets for the follower.

THEOREM 4  Every feedback Stackelberg solution of a two-person K-stage stochastic infinite dynamic game described by (47)–(48) and with the CLPS information pattern (for both players) comprises only feedback strategies; and a pair of strategies {γ^{1*} ∈ Γ^1, γ^{2*} ∈ Γ^2} constitutes a feedback Stackelberg solution with P1 as the leader if
min_{γ^1_k ∈ Γ^1_k, γ^2_k ∈ R^2_k(γ^1_k)} G^1_k(γ^1_k, γ^2_k, x_k) = G^1_k(γ^{1*}_k, γ^{2*}_k, x_k)   ∀ x_k ∈ R^n  (k ∈ K),   (57)

where R^2_k(·) is a singleton set defined by

R^2_k(γ^1_k) ≜ {γ^2_k ∈ Γ^2_k : G^2_k(γ^1_k, γ^2_k, x_k) ≤ G^2_k(γ^1_k, γ̂^2_k, x_k), ∀ γ̂^2_k ∈ Γ^2_k, ∀ x_k ∈ R^n},   (58a)

G^i_k(γ^1_k, γ^2_k, x_k) ≜ E_{θ_k}[ Ĝ^i_k(f_k(x_k, γ^1_k(x_k), γ^2_k(x_k)) + θ_k, γ^1_k(x_k), γ^2_k(x_k), x_k) ],   (58b)

and Ĝ^i_k is recursively defined by

Ĝ^i_k(x_{k+1}, γ^1_k(x_k), γ^2_k(x_k), x_k) = E_{θ_{k+1}}[ Ĝ^i_{k+1}(f_{k+1}(x_{k+1}, γ^{1*}_{k+1}(x_{k+1}), γ^{2*}_{k+1}(x_{k+1})) + θ_{k+1}, γ^{1*}_{k+1}(x_{k+1}), γ^{2*}_{k+1}(x_{k+1}), x_{k+1}) ] + g^i_k(x_{k+1}, γ^1_k(x_k), γ^2_k(x_k), x_k);   Ĝ^i_{K+1} ≡ 0,   i = 1, 2.   (59)
Proof  This result follows from a direct application of Def. 3.29 and Prop. 3.15 (interpreted appropriately for the infinite stochastic dynamic game under consideration), if one utilizes the techniques and lines of thought employed in the proof of Thm 6.9 for the feedback Nash solution.  □

Remark 3 has a natural counterpart here for the stochastic game; and for the special case of the linear-quadratic stochastic dynamic game in which E[θ_k] = 0 ∀ k ∈ K, it may readily be verified that the feedback Stackelberg solution of Thm 4 is precisely the one given in Corollary 2.
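For a scalar linear-quadratic special case, the stage-by-stage recursion behind Thm 4 can be carried out in closed form. The sketch below is my own illustration (the model, stage costs g^i = q_i x_{k+1}^2 + r_{i1}(u^1_k)^2 + r_{i2}(u^2_k)^2, and all parameter values are assumptions, not from the text): at each stage the follower's reaction to the leader's stage strategy is computed first, the leader then minimizes, and the value functions stay quadratic, with the noise entering only through an additive constant; this is why the solution comprises feedback strategies.

```python
def feedback_stackelberg(K, a, b1, b2, q1, q2, r11, r12, r21, r22, var):
    """Backward recursion for a scalar LQ feedback Stackelberg game (a sketch).

    Dynamics: x_{k+1} = a x_k + b1 u1 + b2 u2 + theta,  E theta = 0, var(theta) = var.
    Value functions: V^i(k, x) = p_i x^2 + c_i.
    """
    p1 = p2 = c1 = c2 = 0.0                       # V^i(K+1, .) = 0
    gains = []
    for _ in range(K):
        m1, m2 = q1 + p1, q2 + p2                 # weights on the next state
        c = m2 * b2 / (r22 + m2 * b2 ** 2)        # follower: u2 = -c (a x + b1 u1)
        d = r22 / (r22 + m2 * b2 ** 2)            # mean next state = d (a x + b1 u1)
        w = m1 * d ** 2 + r12 * c ** 2
        k1 = w * b1 * a / (r11 + w * b1 ** 2)     # leader: u1 = -k1 x
        e = a * r11 / (r11 + w * b1 ** 2)         # on the play, a x + b1 u1 = e x
        k2 = c * e                                # u2 = -k2 x: a feedback strategy
        p1 = m1 * (d * e) ** 2 + r11 * k1 ** 2 + r12 * k2 ** 2
        p2 = m2 * (d * e) ** 2 + r21 * k1 ** 2 + r22 * k2 ** 2
        c1 += m1 * var                            # noise only shifts the values
        c2 += m2 * var
        gains.append((k1, k2))
    gains.reverse()                               # gains[k] applies at stage k+1
    return gains, (p1, c1), (p2, c2)

# One-stage sanity check against a brute-force search (x = 1, no noise).
a, b1, b2 = 1.0, 1.0, 1.0
q1, q2, r11, r12, r21, r22 = 1.0, 2.0, 1.0, 0.5, 0.5, 1.0
gains, _, _ = feedback_stackelberg(1, a, b1, b2, q1, q2, r11, r12, r21, r22, 0.0)
k1, k2 = gains[0]

def leader_cost(u1, x=1.0):
    u2 = -q2 * b2 * (a * x + b1 * u1) / (r22 + q2 * b2 ** 2)  # follower reacts
    xn = a * x + b1 * u1 + b2 * u2
    return q1 * xn ** 2 + r11 * u1 ** 2 + r12 * u2 ** 2

u1_star = min((i / 10000.0 - 2.0 for i in range(40001)), key=leader_cost)
assert abs(u1_star - (-k1)) < 1e-3
```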
7.6 PROBLEMS
1. Show that, for the special case of two-person zero-sum dynamic games (i.e. with L^1 ≡ −L^2), Thm 1 coincides with Thm 6.3, and Corollary 1 coincides with Thm 6.4.
2. Obtain the open-loop Stackelberg solution of the two-stage dynamic game described by (23) and (24) when (i) P1 is the leader, (ii) P2 is the leader. For what values of the parameter β is the open-loop Stackelberg solution concurrent? (See section 4.5 for terminology.)
3. Obtain the feedback Stackelberg solution of the dynamic game of the previous problem under the CLPS information pattern when (i) P1 is the leader, (ii) P2 is the leader.
4. Generalize the result of Corollary 2 to the class of linear-quadratic dynamic games with state dynamics

x_{k+1} = A_k x_k + B^1_k u^1_k + B^2_k u^2_k + c_k,

where c_k (k ∈ K) is a constant vector.
5. Consider the scalar K-stage dynamic game described by the state equation and cost functionals

L^1 = Σ_{k=1}^{K} [(x_k)^2 + (u^1_k)^2 + 2(u^2_k)^2]
L^2 = Σ_{k=1}^{K} [(x_k)^2 + 0.5(u^1_k)^2 + 1.5(u^2_k)^2].
Obtain the global and feedback Stackelberg solutions of this dynamic game under the CLPS information pattern when (i) P1 is the leader, (ii) P2 is the leader. Compare the realized values of the cost functions of the
players under these different Stackelberg solutions for the cases when K = 5 and K → ∞.
*6. Consider the two-stage dynamic game characterized by the 2-dimensional state equation

x_2 = x_1 − u^2;   x_1 = (1, 1)'
x_3 = x_2 − (1, 1)' u^1

and cost functionals

L^1 = x'_3 x_3 + (u^1)^2 + u^{2'} diag(1, 2) u^2
L^2 = x'_3 diag(1, 2) x_3 + u^{2'} diag(1, 2) u^2.
Here, u^1 is the scalar control variable of P1 (the leader) who has access to

y = (1, 1) x_2,

and u^2 is the 2-dimensional control vector of P2 (the follower) who has open-loop information. Show that the global minimum of L^1 cannot constitute a tight lower bound on the leader's Stackelberg cost function, and that a tight lower bound can be obtained by first minimizing L^2 over u^2 ∈ R^2 subject to the constraint

2 − (1, 1) u^2 = y,

and then minimizing L^1 over u^1 ∈ R and y ∈ R, with u^2 determined as above. (For more details on this problem and the method of solution see Basar, 1980c.)
7. Consider the scalar dynamic game of section 7.4.1 with x_1 = 1, β = 2, and under the additional restriction −1 ≤ u^i ≤ 0, i = 1, 2. Show (by graphical display) that the Stackelberg cost of the leader is higher than the global minimum value of his cost function, and that a strategy that leads to this cost value (which is J^{1*} = 8/9) is γ^{1*} = −1/3.
8. Prove Prop. 4 by carrying out the two steps preceding it in section 7.5.1.
*9. Consider the scalar stochastic 2-stage dynamic game described by the state equation
x_{k+1} = 0.5 x_k + u^1_k − 2u^2_k + θ_k;   x_1 = 1,

and cost functionals

L^1 = (x_3)^2 + Σ_{k=1}^{2} [(u^1_k)^2 + 2(u^2_k)^2]
L^2 = (x_3)^2 + Σ_{k=1}^{2} [1.5(u^2_k)^2 + 0.5(u^1_k)^2],
where θ_1 and θ_2 are statistically independent Gaussian variables with mean zero and equal variance 1. The underlying information structure is CLPS for both players. Determine the feedback Stackelberg solution and the linear (global) Stackelberg solution with P1 as the leader. Can the leader improve his performance in the latter case by employing nonlinear strategies?
*10. Obtain the linear (global) Stackelberg solution of the previous problem when the stochastic terms are correlated so that E[θ_1 θ_2] = 1 (in other words, there exists a single Gaussian variable θ with mean zero and variance 1, so that θ_1 = θ_2 = θ).
7.7 NOTES
Sections 7.2 and 7.3  Open-loop and feedback Stackelberg solution concepts in infinite dynamic games were first treated in the continuous time in the works of Chen and Cruz, Jr (1972) and Simaan and Cruz, Jr (1973a, b). The results presented in these two sections are their counterparts in the discrete time, and the analysis follows that of Basar (1979b). Other references that discuss the open-loop and feedback Stackelberg solutions in discrete-time infinite dynamic games are Kydland (1975), Hamalainen (1976) and Cruz, Jr (1978), where the latter also discusses derivation of the feedback Stackelberg solution of continuous-time dynamic games under sampled-data state information. Haurie and Lhousni (1979) provide a discussion on the feedback Stackelberg solution concept in continuous-time dynamic games with CLPS information structure. For applications in microeconomics see Okuguchi (1976). In Wishart and Olsder (1979), necessary conditions for open-loop Stackelberg solutions in deterministic continuous-time differential games are applied to some simple investment problems, giving rise to discontinuous solutions.

Section 7.4  Derivation of the global Stackelberg solution in infinite dynamic games with CLPS information has remained a challenge for a long time, because of the difficulties explained in this section prior to section 7.4.1 and illustrated in section 7.4.1. The indirect approach presented in this chapter was first introduced in Basar and Selbuz (1979a, b) and later in Tolwinski (1981b) and Basar (1980c); the specific examples included in sections 7.4.1 and 7.4.2 are taken from Basar (1980c). The analysis of section 7.4.3 follows closely the one of Basar and Selbuz (1979b), which also includes an extension to N(> 2)-person games with one leader and N−1 followers, and the followers playing according to the Nash equilibrium solution concept among themselves.
Extensions of this approach to other types of N(>2)-person dynamic games (with different types of hierarchy in decision making) can be found in Basar (1980b). At present, there is growing research interest in the hierarchical (Stackelberg) equilibria of many-player dynamic games; some recent articles devoted to this topic are Basar (1981), Ho et al. (1980) and Tolwinski (1980, 1981a). The indirect approach of this section has been extended to differential games with CLPS information pattern in Basar and Olsder (1980b) and Papavassilopoulos and Cruz, Jr (1980), but with partial success, since the global minimum of the leader's cost function is attained as his Stackelberg cost only under rather restrictive conditions. For a
possible indirect limiting approach that leads to the Stackelberg cost of the leader, see Basar (1981). A general set of necessary conditions for the Stackelberg solution of two-person deterministic differential games with CLPS information pattern has been obtained in Papavassilopoulos and Cruz, Jr (1979a).
Section 7.5 This section follows the lines of Basar (1979b) and extends some of the results given there. The difficulties illustrated by Example 3 were first pointed out in Castanon (1976). For a discussion of the derivation of the Stackelberg solution when players have access to noisy (but redundant) state information, see Basar (1980d). Furthermore, Basar (1979c) discusses the feedback Stackelberg solution in linear-quadratic stochastic dynamic games with noisy observation.
Chapter 8
Pursuit-Evasion Games

8.1 INTRODUCTION
This chapter deals with two-person deterministic zero-sum differential games with variable terminal time. We have already discussed zero-sum differential games in Chapter 6, but as a special case of nonzero-sum games and with fixed terminal time. The class of problems to be treated in this chapter, however, is of the pursuit-evasion type, for which the duration of the game is not fixed. Actually, it was through this type of problems (i.e. through the study of pursuit and evasion between two objects moving according to simple kinematic laws) that the theory of differential games was started in the early 1950s. Extensions to nonzero-sum dynamic games, as treated in Chapters 6 and 7, were subsequently considered in the late sixties. Section 2 discusses the necessary and sufficient conditions for saddle-point equilibrium strategies. We start with the natural two-person extension of the Hamilton-Jacobi-Bellman equation, which provides a sufficient condition, and which is called the "Isaacs equation", after Isaacs, the acknowledged father of pursuit-evasion games. A geometric derivation of the Isaacs equation is given in this section, which utilizes the principle of dynamic programming, and by means of which the concept of semipermeable surfaces is introduced; such surfaces play an important role in the remaining sections of the chapter. Subsequently, necessary conditions are derived, which are the two-person extension of the Pontryagin minimum principle (cf. section 5.5). These conditions are valid not for feedback strategies but for their open-loop representations, and in order to obtain the feedback strategies these open-loop solutions have to be synthesized. In section 3 we treat "capturability", which addresses the question of whether the pursuer can "catch" the evader or not. The answer to this question is completely determined by the kinematics, the initial conditions and the target set; the cost function does not play any role here.
It is in this section that the homicidal chauffeur game and the two-cars model are introduced.
In section 4 we return to the derivation of saddle-point strategies for quantitative games. What makes the application of the necessary and sufficient conditions intellectually interesting and nontrivial is the existence of a large number of singular surfaces. These surfaces are manifolds in the state space across which the backward solution process of dynamic programming fails because of discontinuities in the value function or in its derivatives. In section 5, two examples are worked out completely; they illustrate the complexity that arises in the derivation due to the presence of singular surfaces. Each example displays a different type of such surface. In section 6 a problem in maritime collision avoidance is treated: under what circumstances can one ship avoid collision with another? The approach taken is one of worst-case analysis, which quite naturally leads to a differential game formulation. The solution method, however, is applicable also to other collision avoidance problems which do not necessarily feature noncooperative decision making. In section 7 we address the problem of role determination in a differential game wherein the roles of the players (viz. pursuing or evading) are not specified at the outset, but are rather determined as functions of the initial conditions. Role determination is applied to an aeronautical problem which involves a dogfight between two airplanes.
8.2 NECESSARY AND SUFFICIENT CONDITIONS FOR SADDLE-POINT EQUILIBRIA
The systems considered in this chapter can all be described by

ẋ(t) = f(t, x(t), u1(t), u2(t)),   x(0) = x0,    (1)
where x(t) ∈ S0 ⊂ Rn, ui(t) ∈ Si ⊂ R^mi, i = 1, 2. The function f is continuous in t, u1 and u2, and is continuously differentiable in x. The first player, P1, who chooses u1, is called the pursuer, which we shall abbreviate as P. The second player is called the evader and we shall refer to him as E instead of P2. The final time T of the game is defined by

T = inf {t ∈ R+ : (x(t), t) ∈ Λ},    (2)
where Λ is a closed subset, called the target set, in the product space S0 × R+. The boundary ∂Λ of Λ is assumed to be an n-dimensional manifold, i.e. a hypersurface, in the product space Rn × R+, characterized by a scalar function ℓ(t, x) = 0. This function is assumed to be continuous in t and continuously differentiable in x, unless stated differently. Since all differential games in this chapter will be zero-sum, we have a single objective function
L(u1, u2) = ∫0T g(t, x(t), u1(t), u2(t)) dt + q(T, x(T)),    (3)

where g is continuous in t, u1 and u2, and continuously differentiable in x; q is continuous in T and continuously differentiable in x(T). We define J as the cost function of the game in normal form, which is determined from L in terms of the strategies γi(t, x(t)) = ui(t), i = 1, 2. Only feedback strategies will be considered, and we assume γi(t, x) to be piecewise continuous in t and x. With the permissible strategy spaces denoted by Γ1 and Γ2, let us recall that a strategy pair (γ1*, γ2*) is in (saddle-point) equilibrium if

J(γ1*, γ2) ≤ J(γ1*, γ2*) ≤ J(γ1, γ2*)    (4)
for all γi ∈ Γi. The reason for considering feedback strategies, and not strategies of a more general class (such as strategies with memory), lies in the continuous-time counterpart of Thm 6.8: as long as a saddle point in feedback strategies exists, it is not necessary to consider saddle points with respect to other classes of strategies. Actually, as will be shown in the sections to follow, "solutions" to various problems are, in general, first obtained in open-loop strategies, which may then be synthesized to feedback strategies, provided that they both exist.

The Isaacs equation
In Corollary 6.A2 a sufficiency condition was obtained for the feedback saddle-point solution under the assumption of a fixed final time. It will now be shown that this corollary essentially holds true when, instead of a fixed final time, the final time is determined by the terminal constraint (2). The underlying idea here, too, is the principle of optimality: wherever the system is at any given arbitrary time, from then onwards the pursuer (respectively, evader) minimizes (respectively, maximizes) the remaining portion of the cost function under the feedback information structure. The function describing the minimax value† of the cost function, when started from the position (t, x), is

V(t, x) = min_{γ1} max_{γ2} { ∫tT g(s, x(s), γ1(s, x(s)), γ2(s, x(s))) ds + q(T, x(T)) },    (5)

which is called the value function. Under the assumption that such a function exists and is continuously differentiable in x and t, it satisfies the partial differential equation

min_{u1∈S1} max_{u2∈S2} [ (∂V/∂x) f(t, x, u1, u2) + g(t, x, u1, u2) ] = −∂V/∂t.‡    (6)

† It is tacitly assumed here (and hereafter) that the minimax value equals the maximin value, i.e. the operations min and max in (5) commute.
‡ Ibid.
This equation is called the Isaacs equation, for which a geometric derivation will be given here. It is also possible to extend the derivation given in Thm 6.A6 so as to include variable terminal time, but we prefer the geometric derivation here, since it provides more insight into the problem, in particular if there is only a terminal payoff. Accordingly, we assume that g ≡ 0, which is in fact no loss of generality because of the argument of Remark 5.1 (suitably modified for continuous-time systems). The value function V, if it exists, is then defined by

V(t, x) = min_{γ1} max_{γ2} q(T, x(T)) = max_{γ2} min_{γ1} q(T, x(T)),    (7)
when the initial position is (t, x). One particular saddle-point trajectory is drawn in Fig. 1; it is indicated by curve m, and the value corresponding to any point of m is c. We now make the assumption that for initial points in a certain neighbourhood below m, V(t, x) < c, and similarly for initial points above m, V(t, x) > c.† At point A, on curve m, P will choose u1 ∈ S1 so as to make the inner product between (∂V/∂x, ∂V/∂t)′ and (f′, 1)′ as small as possible, i.e. he tries to make the vector (f′, 1)′ point downward, towards points corresponding to lowest possible costs. On the other hand, E wants to choose u2 ∈ S2 in such a way as to maximize this inner product, because then the system will move in E's favourite direction, i.e. towards points corresponding to highest possible costs. Hence u1* and u2* are chosen as the arguments of

min_{u1} max_{u2} ( (∂V/∂x) f + ∂V/∂t ) = max_{u2} min_{u1} ( (∂V/∂x) f + ∂V/∂t ).    (8)
Fig. 1. Geometrical derivation of the Isaacs equation, where c1 > c > c2.

† If this assumption does not hold true (which is the case of V being constant in an (n+1)-dimensional subset of Rn × R+) the derivation to follow can easily be adjusted.
Since m is an equilibrium trajectory, the system will move along m if u1* and u2* are applied, and that results in

min_{u1∈S1} max_{u2∈S2} ( (∂V/∂x) f + ∂V/∂t ) = 0,    (9)
which is equivalent to (6) if g ≡ 0. If E plays optimally, i.e. if he chooses u2*, then

(∂V/∂x) f(x, u1, u2*) + ∂V/∂t ≥ 0    (10)
for all u1 ∈ S1, which implies that the outcome of the game will be ≥ c, since the system will move into the area where V ≥ c. However, P can keep the outcome at c by playing u1*, since then the equality sign will apply in (10). Thus, without E's cooperation (i.e. without E playing nonoptimally), P cannot make the system move from A towards the area where V < c. Analogously, by interchanging the roles of the players in this discussion, E cannot make the system move from A in the "upward" direction if P plays optimally. Only if P plays nonoptimally will E be able to obtain an outcome higher than c. Because of these features, the n-dimensional manifold in Rn × R+, comprising all initial positions (t, x) with the same value, is called semipermeable, provided that g ≡ 0.
In conclusion, if the value function V(t, x) is continuously differentiable, then the saddle-point strategies are determined from (6). This equation also provides a sufficiency condition for saddle-point strategies, as stated in the following theorem, the proof of which is a straightforward two-player extension of Thm 5.3. (Compare it also with Corollary 6.A2.)

THEOREM 1 If (i) a continuously differentiable function V(t, x) exists that satisfies the Isaacs equation (6), (ii) V(T, x) = q(T, x) on the boundary of the target set, defined by ℓ(t, x) = 0, and (iii) either u1*(t) = γ1*(t, x), or u2*(t) = γ2*(t, x), as derived from (6), generates trajectories that terminate in finite time (whatever γ2, respectively γ1, is), then V(t, x) is the value function and the pair (γ1*, γ2*) constitutes a saddle point. □
Remark 1 The underlying assumption of interchangeability of the min and max operations in the Isaacs equation is often referred to as the Isaacs condition. The slightly more general condition

min_{u1} max_{u2} [p′f + g] = max_{u2} min_{u1} [p′f + g],
for all n-vectors p, is sometimes also referred to as the Isaacs condition. Regardless of which definition is adopted, the Isaacs condition will hold if both f and g are separable in u1 and u2, i.e.

f(t, x, u1, u2) = f1(t, x, u1) + f2(t, x, u2),
g(t, x, u1, u2) = g1(t, x, u1) + g2(t, x, u2).
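This separability argument is easy to confirm numerically. In the sketch below (our own illustration; the particular f1, f2, g1, g2 are arbitrary choices, not taken from the text), the min-max and max-min of p′f + g coincide on a control grid, because the expression splits into a u1-part and a u2-part that can be optimized independently.

```python
# Numerical illustration: for separable f and g, min-max equals max-min.
import math

def h(p, x, u1, u2):
    f = math.sin(u1) + x * u2        # f = f1(x, u1) + f2(x, u2)
    g = u1 ** 2 + math.cos(u2)       # g = g1(x, u1) + g2(x, u2)
    return p * f + g

p, x = 0.7, 1.3                      # arbitrary costate and state values
U = [i / 20.0 for i in range(-20, 21)]   # grid on [-1, 1] for both controls

minmax = min(max(h(p, x, u1, u2) for u2 in U) for u1 in U)
maxmin = max(min(h(p, x, u1, u2) for u1 in U) for u2 in U)
assert abs(minmax - maxmin) < 1e-9   # the Isaacs condition holds
```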
If the Isaacs condition does not hold, and if only feedback strategies are allowed, then one has to seek equilibria in the general class of mixed strategies defined on feedback strategy spaces. In this chapter, we will not extend our investigation to mixed strategies, and deal only with the class of differential games for which the Isaacs condition holds (almost) everywhere. Note, however, that, even if separability holds, this does not necessarily mean that the order of the actions of the players is irrelevant, since the underlying assumption of V being continuously differentiable may not always be satisfied. In regions where V is not continuously differentiable, it may be crucial which player acts first (see section 8.5.2). □

In the derivation of saddle-point strategies, equation (6) cannot directly be used, since V is not known ahead of time. An alternative is to use Thm 2, given below, which provides a set of necessary conditions for an open-loop representation (or realization) of the feedback saddle-point solution. In spite of the fact that it does not deal with feedback solutions directly, the theorem is extremely useful in the computation (synthesis) of feedback saddle-point solutions. It can be viewed as the variable-terminal-time extension of Thm 6.A3 and the two-player extension of Thm 5.4. The proof follows by direct application of Thm 5.4, which is also valid under the weaker assumption of f and g measurable in t (instead of being continuous in t) (see Berkovitz, 1974, p. 52).

THEOREM 2 Given a two-person zero-sum differential game, described by (1)-(4), suppose that the pair {γ1*, γ2*} provides a saddle-point solution in
feedback strategies, with x*(t) denoting the corresponding state trajectory. Furthermore, let its open-loop representation {ui*(t) = γi*(t, x*(t)), i = 1, 2} also provide a saddle-point solution (in open-loop policies). Then there exists a costate function p(·): [0, T] → Rn such that the following relations are satisfied:

ẋ*(t) = f(t, x*(t), u1*(t), u2*(t)),   x*(0) = x0,
H(t, p, x*, u1*, u2) ≤ H(t, p, x*, u1*, u2*) ≤ H(t, p, x*, u1, u2*),   ∀ u1 ∈ S1, u2 ∈ S2,
ṗ′(t) = −(∂/∂x) H(t, p(t), x*(t), u1*(t), u2*(t)),
p′(T) = (∂/∂x) q(T, x*(T))   along ℓ(T, x) = 0,    (11)

where H denotes the Hamiltonian, H(t, p, x, u1, u2) = g(t, x, u1, u2) + p′f(t, x, u1, u2).
Remark 2 Provided that both open-loop and feedback equilibria exist, the conditions of Thm 2 will lead to the desired feedback saddle-point solution. Even if an open-loop saddle-point solution does not exist, this method can still be utilized to obtain the feedback saddle-point solution, and, by and large, this has been the method of approach adopted in the literature for solving pursuit-evasion games. For an example of such a game in which this method is utilized, but an open-loop (pure) saddle point does not exist, see the "Lady in the lake" game treated in section 5. Also, for certain classes of problems, the value function V(t, x) is not continuously differentiable, and therefore the sufficiency conditions of Thm 1 cannot be employed. Sufficiency conditions for certain types of nonsmooth V(t, x) have been discussed in Bernhard (1977), but they are beyond the scope of our treatment here. □

In the literature on pursuit-evasion differential games, it is common practice to write Vx instead of p′ (for the costate vector), and this will also be adopted here. The reader should note, however, that in this context, Vx is only a function of time (but not of x), just the way p′ is. After synthesis, the partial derivative with respect to x of the value function corresponding to the feedback equilibrium solution coincides with this function of time. We now illustrate the procedure outlined above by means of a simple example. As with most other problems to be considered in this chapter, this example is time-invariant, and consequently the value function and the saddle-point strategies do not explicitly depend on t (cf. Remark 5.5).
Example 1 Consider the two-dimensional system described by

ẋ1 = (2 + √2) u2 − sin u1,
ẋ2 = −2 − cos u1,    (i)
where the initial value of the state lies in the half plane x2 > 0, u2 satisfies the constraint |u2| ≤ 1 and no restrictions are imposed on u1. The target set is x2 ≤ 0 and the objective functional is L = x1(T). Note that the game terminates when the x1-axis is reached, and since ẋ2 ≤ −1, it will always terminate. In order to apply Thm 2, we first write the Hamiltonian H for this pursuit-evasion game as

H = Vx1 {(2 + √2) u2 − sin u1} + Vx2 {−2 − cos u1},

where Vx1 (= p1) and Vx2 (= p2) satisfy, according to (11),

V̇x1 = 0,   V̇x2 = 0.    (ii)
Since the value function does not explicitly depend on time, we have V(x1, x2 = 0) = x1 and hence Vx1 = 1 along the x1-axis. The final condition for Vx2
along the x1-axis cannot be obtained directly; it will instead be obtained through the relation min_{u1} max_{u2} H = 0 at t = T. Carrying out these min and max operations, we get u2 = sgn(Vx1) = 1 and hence γ2*(x) = 1, and the vector (sin u1, cos u1) is parallel to (Vx1, Vx2). Substitution of these equilibrium strategies into H leads to

2 + √2 − √(1 + Vx2²) − 2 Vx2 = 0,
which yields Vx2 = 1 at t = T. From (ii) it now follows that Vx1 = 1 and Vx2 = 1 for all t. In this example it is even possible to obtain V explicitly: V(x1, x2) = x1 + x2. Hence, the feedback saddle-point strategies are constants: γ1*(x) = π/4; γ2*(x) = 1. The corresponding state trajectories are straight lines, making an angle of π/4 radians with the negative x2-axis (see Fig. 2). Figure 2 also helps to visualize the geometric derivation of the saddle-point strategies. Suppose that the system is at point A corresponding to the value 6, say. Emanating from point A, the velocity vector (ẋ1, ẋ2)′ has been drawn in terms of the vectors (0, −2)′, ((2 + √2)γ2*, 0)′ and (−sin γ1*, −cos γ1*)′, which add up to the RHS of (i). E, the maximizer, would like to make the angle α (see Fig. 2) as large as possible. The best he can do, irrespective of P's decision, is to choose γ2* = +1. If E would play γ2 ≠ 1, i.e. γ2 < 1, the angle α would become smaller, and consequently the outcome would become smaller than 6. Similarly, P, the minimizer, would like to have a vector (ẋ1, ẋ2)′ with angle α as small as possible, which leads to his optimal strategy γ1* = π/4; for any other γ1, the vector (ẋ1, ẋ2)′ will point more to the right, and this is exactly what P wants to avoid.
Fig. 2. Vectogram for Example 1.
All conditions of Thm 1 are satisfied, and therefore the strategies obtained indeed constitute a saddle-point solution in feedback strategies. □

In addition to the necessary conditions of Thm 2, we now exemplify some of the intricacies which might occur with respect to termination.
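Before moving on, the constant saddle-point strategies of Example 1 can also be confirmed by direct simulation. The following sketch is our own illustration (not part of the book); it integrates the dynamics (i) with Euler steps and checks that the outcome under (γ1*, γ2*) equals V(x0) = x1(0) + x2(0) = 3 for x0 = (1, 2), and that a unilateral deviation does not benefit the deviating player.

```python
# Simulation of Example 1 (illustrative sketch): x1' = (2+sqrt(2))u2 - sin(u1),
# x2' = -2 - cos(u1), terminal cost x1(T), termination at x2 = 0.
import math

def outcome(u1, u2, x1=1.0, x2=2.0, dt=1e-4):
    """Euler-integrate under constant controls until x2 <= 0; return x1(T)."""
    while True:
        dx1 = (2 + math.sqrt(2)) * u2 - math.sin(u1)
        dx2 = -2 - math.cos(u1)
        if x2 + dt * dx2 <= 0:
            tau = -x2 / dx2          # linear interpolation to the crossing time
            return x1 + tau * dx1
        x1, x2 = x1 + dt * dx1, x2 + dt * dx2

V = 1.0 + 2.0                                  # V(x0) = x1(0) + x2(0)
assert abs(outcome(math.pi / 4, 1.0) - V) < 1e-2   # saddle-point play gives V
assert outcome(0.0, 1.0) > V                   # P's deviation raises x1(T)
assert outcome(math.pi / 4, 0.0) < V           # E's deviation lowers x1(T)
```

Since the optimal controls are constants, the Euler integration is exact up to the interpolation at the final step.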
Example 2 Consider the one-dimensional system

ẋ = u1 + u2,   x(0) = x0 > 0,

where the controls u1 and u2 are constrained to −1 ≤ ui ≤ 0, i = 1, 2. The target set is the half line x ≤ 0. The cost functional is

L = ∫0T (1 − x) dt.
The costate equation for this problem is
ṗ = −∂H/∂x = 1,

and the optimal {u1*, u2*} satisfy

u1* = { −1 if p > 0;  0 if p < 0;  undetermined if p = 0 },
u2* = { −1 if p < 0;  0 if p > 0;  undetermined if p = 0 }.    (i)
How do we determine p(T)? The appropriate condition in (11) cannot be used, since that condition, which gives p(T) along the boundary of the target set, requires variations in x within ℓ(T, x) = 0; but this latter relation only yields x = 0. In its stead, we make use of the definition of the derivative for p(T), and also make use of the functional form V(Δx) = Δx − ½(Δx)² for sufficiently small (positive) Δx (which is obtained through some analysis), to arrive at the value p(T) = 1. (See Problem 1 for another derivation.) Integration of the costate equation and substitution into (i) leads to

u1* = γ1*(x) = −1,   u2* = γ2*(x) = 0,   for 0 ≤ x < 1.    (ii)
For x = 1 we have p = 0, and (i) does not determine the equilibrium strategies. For γi* = 0 (i = 1, 2), the system would remain at x = 1 and the game would not terminate in finite time. To exclude the occurrence of such singular cases, we have to include an additional restriction in the problem statement, which forces P to terminate the game in finite time; this can be achieved by defining L = ∞ at T = ∞. Then, at the state x = 1, the preceding restriction forces P to choose γ1*(x) = −1. Now we can integrate further in retrograde time. For t < T − 1, p(t) = Vx(x*(t)) changes sign, and therefore, for x > 1, we obtain from (i):

u1* = γ1*(x) = 0,   u2* = γ2*(x) = −1,   x > 1.    (iii)

Now the question is whether the strategy pair {γ1*, γ2*}, defined by (ii) and (iii), constitutes a saddle point. To this end, Thm 1 will be consulted. The value function is easily determined to be V(x) = x(2 − x)/2, and all conditions of Thm 1 hold, except condition (iii). Condition (iii) of Thm 1 does not hold, since, for initial points x > 1, the strategy pair {γ1*, γ2(·) ≡ 0} does not lead to
termination in finite time, and hence, by playing γ2 ≡ 0, E would be much better off. However, in such a situation, P can do much better if he is allowed to have an informational advantage in the form of

u1(t) = γ1(t, x(t), u2(t)),    (iv)

i.e. at each instant of time, P is informed about E's current decision (open-loop value).† If P chooses his new strategy as

γ1′ = { −1 if x ≤ 1, or if x > 1 and u2 > −½;  0 otherwise },
the saddle-point equilibrium will be restored by means of the pair (γ1′, γ2*), which follows from direct comparison. Another way of getting out of the "termination dilemma" is to restrict the class of permissible strategies to "playable pairs", i.e. to pairs for which termination occurs in finite time (cf. Example 5.1 and Def. 5.7). Such an assumption excludes, for instance, the pair {γ1 ≡ 0, γ2 ≡ 0} from consideration. □
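The value function of Example 2 can be checked by simulation as well. The sketch below is our own illustration; it assumes the scalar dynamics ẋ = u1 + u2 as reconstructed in the problem statement above, plays the equilibrium strategies (ii) on 0 ≤ x < 1, and compares the accumulated cost with V(x) = x(2 − x)/2; it also recovers p(T) = 1 as the one-sided derivative of V at x = 0.

```python
# Numerical check of Example 2 (illustrative sketch; dynamics x' = u1 + u2
# are an assumption reconstructed from the costate analysis).
def cost(x0, dt=1e-4):
    """Play g1*(x) = -1, g2*(x) = 0 on 0 <= x < 1 and accumulate (1 - x) dt."""
    x, L = x0, 0.0
    while x > 0:
        L += (1.0 - x) * dt
        x += (-1.0 + 0.0) * dt       # x' = u1 + u2 with u1 = -1, u2 = 0
    return L

def V(x):
    return x * (2.0 - x) / 2.0       # value function found in the text

for x0 in (0.25, 0.5, 0.9):
    assert abs(cost(x0) - V(x0)) < 1e-2

eps = 1e-6                           # p(T) = dV/dx at x = 0 equals 1
assert abs((V(eps) - V(0.0)) / eps - 1.0) < 1e-5
```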
8.3 CAPTURABILITY
Consider the class of pursuit-evasion games formulated in section 2, which we now assume to be time-invariant, i.e. f, g and ℓ are independent of t, and q does not depend on T. The target set Λ is therefore a tube in Rn × R+, with rays parallel to the time axis. The projection of Λ onto the Rn-space is denoted by Λ̄. Before considering the problem of derivation of a saddle-point solution, we must first address the more fundamental question of whether the target set can be reached at all. If it cannot be reached, we simply say that the pursuit-evasion game is not well defined. Accordingly, we deal, in this section, with the following qualitative pursuit-evasion game. The evader tries to prevent the state from reaching Λ̄, whereas the pursuer seeks the opposite. Once it is known, for sure, that the target set can be reached, one can return to the original (quantitative) pursuit-evasion game and investigate existence and derivation of the saddle-point solution. For the differential game of kind introduced above, we introduce an auxiliary cost functional as

J = { −1 if (x(t), t) ∈ Λ for some t, 0 ≤ t < ∞;  +1 otherwise },
† This is, of course, possible if E has access to only open-loop information, and can therefore only pick open-loop policies. If E has access to feedback information, however, (iv) is not physically realizable and a time delay has to be incorporated.
thus converting it into a differential game of degree. The value function for this differential game can take only two values, V(x) = −1 and V(x) = +1, where the former corresponds to initial states that lead to capture, while the latter corresponds to initial states from where capture cannot be assured. Consequently, the value function is discontinuous, and therefore the theory of section 2 is not directly applicable. It can, however, be modified suitably to fit the present framework, which we now discuss below. Let us first note that only those points x of ∂Λ̄ for which

min_{u1} max_{u2} ν′f(x, u1, u2) = max_{u2} min_{u1} ν′f(x, u1, u2) ≤ 0    (12)
Fig. 3. The construction of the BUP (boundary of usable part).
Consider a point A on S, at which a tangent hyperplane relative to Sexists. The outward normal at A, indicated by p, is unique, apart from its magnitude. The surface S is then determined by mm maxP1(x,u 1,u 2 ) = 0, ut
u2
(13)
t This terminology differs from Isaacs'. In Isaacs (1975), the BUP consists of those points of the
UP for which the equality sign holds in (12).
8
PURSUITEVASION GAMES
355
which is only a necessary condition and therefore does not characterize S completely. This semipermeable surface should in fact have the property that without P's cooperation E cannot make the state cross S (from the shaded area V = 1 to the nonshaded area V = + 1) and conversely, without E's cooperation, P cannot make the system cross S (from V = + 1 to V =  1). Hence if a "tangential" penetration of S is impossible, (13) is also sufficient. Points of S at which no tangent hyperplane exists (such as H 1 and H2 in Fig. 3) must be considered separately. Those points will be considered later, in sections 6 and 7, while discussing intersections of several smooth semipermeable surfaces. To construct S, we now substitute the saddlepoint solution of (13), to be denoted by u i' = yi'(X), i = 1,2, into (13) to obtain the identity p'(x)f(x, y t·(x), y2'(X)):= o.
Differentiation with respect to x leads to (also in view of the relation (%ui)(p'f). (oyi/OX) = 0, which follows from the discussion included in section 5.5.3. right after equation 5.3.3) dp = _ dt
(Of)' ax
p,
(14)
which is evaluated along S. The boundary condition for (14) is p(T)
= v,
(15)
where v is the outward normal of A at the BUP. We now introduce various features of the BUP by means of the following example, which is known as the "homicidal chauffeur game". Example 3 Let us consider a pursuer and an evader, who both move in a twodimensional plane (with coordinates Xl> X2) according to
X1p = V1 sin Op, xz P = V 1 cos Op,
Bp =
W1 U 1,
X1e =
V2
=
V2
xZ e
Be
=
sin 0e, cos Oe'
WzU
(16)
Z,
where the subscripts p and e stand for pursuer and evader, respectively. The controls u 1 and U Z satisfy the constraints Iuil ~ 1, i = 1,2. The scalars V 1 and V z are the constant speeds, and W1 and Wz are the maximum angular velocities of P and E, respectively. Since the angular velocities are bounded, three coordinates (Xl' Xz and 8) are needed for each player to describe his position. For u 1 = 0 (respectively, u Z = 0), P (respectively, E) will move in a straight line in the (Xl' xz) plane. The action u = + 1 stands for the sharpest possible turn to the right; u =  1 analogously stands for a left turn. The minimum turn and for E by R z ~wz/vz. radius for P is given by R 1 ~Wl/Vl For this differential game, yet to be formulated, only the relative positions of
356
DYNAMIC NONCOOPERATIVE GAME THEORY
P and E are important, and not the absolute positions by means of which the model (16) has been described. Model (16) is sixdimensional; however, by means of the relative coordinates, defined as
=
Xl : (Xl. Xl p) c~s fJ p ~ (x 2 • ~ (Xl. Xl p) sin fJ p (X2. fJ = fJ.  fJ p ,
X2 
X2p) X2p)
sin fJ p , cos fJ p ,
}
(17)
the model will be threedimensional, as to be shown in the sequel. In this relative coordinate system, (Xl' x 2 ) denotes E's position with respect to P and the origin coincides with P's position. The x2axis is aligned with P's velocity vector, the xlaxis is perpendicular to the x2axis (see Fig. 4), and the angle fJ is the relative direction (heading) in which the players move. With respect to the relative coordinates (17), model (16) can be described as XI=WIUlx2+v2sinfJ, } X2 = VI +WIUIXI +V2 cosO, () = W2U2 WIU I,
(18)
which is sometimes referred to as the twocars model. We shall consider a special version of (18), namely the one in which E is allowed to change his direction instantaneously (i.e. W 2 = (0). Then E will have complete control over fJ, which we therefore take as E's new control u 2 • Also, normalizing both P's speed and minimum turn radius to one, we obtain 2 l } Xl =  u X2 + V2 sin u , (19) x2 = 1 +U1XI +V2 cos u", with no bounds imposed on u 2 , while u l still satisfies the constraint lull :s; 1. The two normalizations introduced above determine the distance scale and time scale, respectively. Figure 4 depicts model (19); note that the angle u 2 is measured clockwise with respect to the positive xraxis. This convention of measuring u 2 has been initiated by Isaacs (1975), and since then it has been
2
vekxrty
vector P
~
E
velocity
vector E
Fig. 4. The relative coordinate system.
8
357
PURSUITEVASION GAMES
adopted in the literature on pursuitevasion games. Note that the coordinate system is affixed to P and moves along with P through the "real" space. The target set A is defined through the inequality xi+x~
s 13 2 ,
that is, P tries to get within a distance 13 of the evader, whereas the evader tries to avoid this. The game thus formulated is referred to as the homicidal chauffeur game, in which a chauffeur (P) tries to overrun a pedestrian (E). The homicidal chauffeur game is trivial for V2 > 1 = Vi> since capture will then never be possible, provided that E plays the appropriate strategy. Therefore we shall henceforth assume 0 < V2 < 1. The UP (part of the circle described by xi + x~ = 13 2 ) is determined by those (Xl' X2) for which the following relation holds (note that at the point (Xl' X2), the vector (Vi> V2)' = (Xi> X 2)' is an outward normal to A): min max {x I u
1
2
( 
u l X 2 + V2 sin u2 ) + X 2 (  1 + U I X I
+ V 2 cos u2 ) }
u
=X2+V2f3~0.
Hence only those X2 for which X2 ~ V2f3 on the circle xi + x~ = 13 2 constitute the UP. Intuitively this is clear: directly in front of P, E cannot escape; if E is close to P's side, he can sidestep and avoid capture, at least momentarily; if E is behind P, i.e. X 2 < 0, there is no possibility for immediate capture. In order to construct the surface S, we shall start from the end point (x I = 13 ) (1  v X 2 = V2 13), on the capture set. Another branch of S can be obtained if we start from the other end of the UP, which is (x 1 =  13)(1 v~), X 2 = V2f3), but that one is the mirror image of the first one; i.e. if the first branch is given by (Xl (t), X2 (t», then the second one will be given by (  X I (t), X2 (t). Therefore we shall carry out the analysis only for the "first" branch. Equation (13) determines S and reads in this case
n,
min max {PI ( u l X 2 + V 2 sin u 2 ) + P2 (  1 + u l Xl u1
u2
+ V2 cos u 2 ) } == 0,
whereby sin uZ* =
PI
)(pi+pn'
u 1' = sgns,
with
cos u z* = .j
~z
Z)'
(PI +pz
(20)
358    DYNAMIC NONCOOPERATIVE GAME THEORY
The differential equations for $p_1$ and $p_2$ are, from (14),
$$\dot p_1 = -u^1 p_2, \qquad \dot p_2 = u^1 p_1. \qquad (21)$$
The final condition for p, as expressed by (15), will be normalized to unit magnitude, since only the direction of the vector p plays a role, i.e.
$$p_1(T) = \sqrt{1 - v_2^2} \triangleq \cos\alpha, \qquad p_2(T) = v_2 \triangleq \sin\alpha \qquad (22)$$
for some appropriate angle $\alpha$. In order to solve for $p(t)$, we need to know $u^{1*}$. At $t = T$, $s = 0$, and therefore $u^1(T)$ cannot be determined directly. For an indirect derivation we first compute
$$\frac{ds}{dt} = \frac{d}{dt}(p_1 x_2 - p_2 x_1) = -p_1. \qquad (23)$$
Since $p_1 > 0$ at $t = T$, we have $ds/dt < 0$ at $t = T$, which leads to $s(t) > 0$ for $t$ sufficiently close to $T$, and therefore $u^1 = +1$ just before termination. The solution to (21), together with (22) and $u^1 = +1$, now becomes
$$p_1(t) = \cos(t - T + \alpha), \qquad p_2(t) = \sin(t - T + \alpha), \qquad (24)$$
and the equations determining the branch of S emanating from $(x_1 = \beta\sqrt{1 - v_2^2},\ x_2 = \beta v_2)$ become
$$\dot x_1 = -x_2 + v_2\cos(t - T + \alpha), \qquad x_1(T) = \beta\sqrt{1 - v_2^2},$$
$$\dot x_2 = -1 + x_1 + v_2\sin(t - T + \alpha), \qquad x_2(T) = \beta v_2, \qquad (25)$$
which are valid as long as $s(t) > 0$. The solution is
$$x_1(t) = (\beta + v_2(t - T))\cos(t - T + \alpha) + 1 - \cos(t - T),$$
$$x_2(t) = (\beta + v_2(t - T))\sin(t - T + \alpha) - \sin(t - T), \qquad (26)$$
which leaves the circle $x_1^2 + x_2^2 = \beta^2$ tangentially. What we have to ensure is that the trajectory (26) lies outside $\Lambda$. This is what we tacitly assumed in Fig. 3, but it unfortunately does not always hold true, as we shall shortly see. If we define $r \triangleq \sqrt{x_1^2 + x_2^2}$, we require $\ddot r > 0$ at $t = T$ (note that $\dot r = 0$ at $t = T$), and some analysis then leads to the conclusion that this holds if, and only if,
$$\beta^2 + v_2^2 < 1. \qquad (27)$$
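The role of condition (27) can be checked numerically against the closed-form barrier (26). The sketch below (the parameter values are illustrative choices, not from the text) evaluates (26) at a few retrograde times and tests whether the point stays outside the capture circle:

```python
import math

def barrier_point(beta, v2, tau):
    """Barrier solution (26) at retrograde time tau = T - t, with
    cos(alpha) = sqrt(1 - v2**2), sin(alpha) = v2 as in (22)."""
    alpha = math.asin(v2)
    x1 = (beta - v2 * tau) * math.cos(alpha - tau) + 1 - math.cos(tau)
    x2 = (beta - v2 * tau) * math.sin(alpha - tau) + math.sin(tau)
    return x1, x2

def stays_outside(beta, v2, taus=(0.05, 0.1, 0.2)):
    # does the barrier stay outside the capture circle r = beta near t = T?
    return all(math.hypot(*barrier_point(beta, v2, t)) > beta for t in taus)

print(stays_outside(0.5, 0.5))   # True:  0.25 + 0.25 < 1, condition (27) holds
print(stays_outside(0.9, 0.6))   # False: 0.81 + 0.36 > 1, (27) violated
```

For $\beta^2 + v_2^2 < 1$ the trajectory indeed leaves the circle outwards; for $\beta^2 + v_2^2 > 1$ it dips inside, which is the degenerate case discussed next.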
For parameter values $(\beta, v_2)$ for which $\beta^2 + v_2^2 > 1$, the semipermeable surface S moves inside $\Lambda$ and thereby loses its significance.† In the latter case the proposed construction (which was to split up the state space into two parts, $V = +1$ and $V = -1$) obviously fails. This case will be dealt with later in
† In fact, it is still a semipermeable surface, but for a different ("dual") game in which E is originally within the set $x_1^2 + x_2^2 < \beta^2$ and tries to remain there, whereas P tries to make the state cross the circle $x_1^2 + x_2^2 = \beta^2$.
section 5 within the context of the dolichobrachistochrone problem. In the present example we henceforth assume that (27) holds.
Let us now explore the trajectories described by (26). The following two possibilities may be distinguished: (i) $x_1(t_2) = 0$ for some $t_2 < T$, and (ii) $x_1(t) > 0$ for all $t < T$, i.e. as long as $s(t) > 0$. These two cases have been sketched in Fig. 5. From (26) it follows that the two branches of S intersect (if they do, the point of intersection will be on the positive $x_2$-axis) if
$$\beta \le v_2\arcsin(v_2) + \sqrt{1 - v_2^2} - 1, \qquad (28)$$
whose verification is left as an exercise for the reader.
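The threshold separating the two configurations of Fig. 5 can be evaluated directly from (28) and (29); the following sketch (the sample speed $v_2 = 0.5$ and the two values of $\beta$ are arbitrary illustrative choices) classifies a given $(\beta, v_2)$ pair:

```python
import math

def case_margin(beta, v2):
    """beta minus the threshold in (28)-(29); <= 0 means the two branches of S
    intersect (Case 1, Fig. 5a), > 0 means they do not (Case 2, Fig. 5b)."""
    return beta - (v2 * math.asin(v2) + math.sqrt(1 - v2 * v2) - 1)

v2 = 0.5
threshold = v2 * math.asin(v2) + math.sqrt(1 - v2 * v2) - 1
print(round(threshold, 4))            # 0.1278
print(case_margin(0.10, v2) <= 0)     # True: Case 1 (Fig. 5a)
print(case_margin(0.30, v2) > 0)      # True: Case 2 (Fig. 5b)
```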
Fig. 5. Possible configurations for the BUP.
Case 1 (Fig. 5a). For the situation depicted in Fig. 5a, let us first assume that E is initially somewhere within the shaded area. Then he cannot escape from this area, except by moving through the UP (= capture), provided that P plays optimally, which means that P will never let E pass through the two semipermeable surfaces indicated by S₁ and S₂. If E is initially outside the shaded area, however, he will never be captured (assuming that he plays optimally when in the neighbourhood of S₁ or S₂). If E is initially in the shaded area, the intersection point C requires closer attention. Towards this end let us assume that the state is initially on S₁; then P should try to force it to move in the direction of the UP (which fortunately can be done in this example), and not towards C, eventually passing C. This latter possibility has been depicted in Fig. 6, where P tries to keep E in the shaded region. Suppose that, in Fig. 6, the state is initially at point Q. In order for P to keep the state within or on the boundary of the shaded area, and for E to try to escape from it, both players should employ their strategies according to (13). If these strategies cause the state to move in the direction of C, and eventually pass C, then P cannot keep E within the shaded area, though he can
Fig. 6. A leaking corner.
prevent E from passing either S₁ or S₂. The intersection point of the two semipermeable surfaces "leaks", and it is therefore called a leaking corner. As already noted, point C in Fig. 5a does not leak.
The next question is: once E is inside the shaded area, can P terminate the game in finite time? In other words, would it be possible for E to manoeuvre within the shaded area in such a way that he can never be forced to pass the UP? Even though such a stalemate situation may in general be possible, in the present example E can always be captured in finite time, which, for the time being, is left to the reader to verify; Example 5 in section 4 will in fact provide the solution.
Case 2 (Fig. 5b). We now turn our attention to the situation depicted in Fig. 5b, in which case
$$\beta > v_2\arcsin(v_2) + \sqrt{1 - v_2^2} - 1. \qquad (29)$$
In the figure, the semipermeable surfaces have been drawn up to $t_1 = T - \pi - 2\alpha$, which corresponds to the first switching time of $u^1$ in retrograde time from $t = T$. (From (23) and (24) it follows that $s(t) = -\sin(t - T + \alpha) + \sin\alpha$.) For $t < t_1$ we have to continue with $u^{1*} = -1$. Carrying out the corresponding analysis, it follows that the resulting continuation of S₁, designated by S̄₁, would be as sketched in Fig. 7. The trajectory leaves point C in exactly the opposite direction from which it arrived, and the semipermeable surface has a sharp
Fig. 7. The failure of continuation of the semipermeable surface.
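The switching time $t_1 = T - \pi - 2\alpha$ can be recovered numerically from the retrograde form of the switching function $s$ given by (23) and (24); the bisection below (for the illustrative value $v_2 = 0.5$) reproduces it:

```python
import math

def s_retro(tau, alpha):
    # switching function from (23)-(24) written in retrograde time tau = T - t
    return math.sin(alpha) - math.sin(alpha - tau)

def first_switch_time(v2):
    """First retrograde zero of s, found by bisection; analytically it is
    tau1 = pi + 2*alpha, i.e. t1 = T - pi - 2*alpha, with alpha = arcsin(v2)."""
    alpha = math.asin(v2)
    lo, hi = 1.0, math.pi + 2 * alpha + 0.5   # s > 0 at lo, s < 0 at hi
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if s_retro(mid, alpha) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

v2 = 0.5
print(abs(first_switch_time(v2) - (math.pi + 2 * math.asin(v2))) < 1e-9)  # True
```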
corner (which is also known as a cusp). For the semipermeable surface S₁, P can keep the state on the shaded side of S₁, and the same is true for S̄₁ (the reader should note the directions of p(t) in Fig. 7). Clearly, S₁ and S̄₁ together do not constitute a semipermeable surface. Therefore, S̄₁ will be disregarded altogether, and we shall consider only the semipermeable surface of Fig. 5b. It is now claimed that termination is possible from any initial point. If the state is originally at I, say (see Fig. 5b), E can prevent the state from passing S₁; P, on the other hand, can force the system to move along the dotted line, and therefore termination will occur. We have to ensure that no little "island" exists anywhere in the $(x_1, x_2)$ plane, surrounded by semipermeable surfaces, not connected to $\Lambda$, wherein E would be safe. Such a possibility is easily ruled out in this example, on account of the initial hypothesis that P's velocity is greater than E's: P can always force the state to leave such an island, and therefore such an island cannot exist to begin with. □
In conclusion, the foregoing example has displayed various features of the BUP, in particular that, if two semipermeable surfaces intersect, we have to ensure that the composite surface is also semipermeable, i.e. there should not be a leaking corner. Furthermore, in the construction discussed above, within the context of Example 3, we have treated only the case where the BUP does not initially move inside the target set. The other case will be elucidated in section 5, within the context of the dolichobrachistochrone problem.
8.4 SINGULAR SURFACES
An assumption almost always made at the outset of every pursuit-evasion game is that the state space can be split up into a number of mutually disjoint regions, the value function being continuously differentiable in each of them. The behaviour of $V(t, x)$, and the methods for constructing it, are well understood in such regions. The boundaries of these regions are called singular surfaces (or singular lines if they are one-dimensional manifolds), and the value function is not continuously differentiable across them. In this book we in fact adopt a more general definition of a singular surface. A singular surface is a manifold on which (i) the equilibrium strategies are not uniquely determined by the necessary conditions (11), or (ii) the value function is not continuously differentiable, or (iii) the value function is discontinuous.
In general, singular surfaces cannot be obtained by routinely integrating the state and costate equations backward in time. A special singular surface that does manifest itself by backward integration is the switching surface or, equivalently, transition surface. In Example 5.2, we have already met a
switching line within the context of an optimal control problem, and its appearance in pursuit-evasion games is quite analogous. In the dolichobrachistochrone problem to be treated in section 5, we shall see that the procedure for locating a switching or transition line is fairly straightforward, since it readily follows from the state and costate equations. A singular surface which does not manifest itself by backward integration of the state and costate equations is the so-called dispersal surface. The following example serves to demonstrate what such a surface is and how it can be detected.
Example 4. The system is the same as that of Example 1:
$$\dot x_1 = (2 + \sqrt2)u^2 - \sin u^1, \qquad \dot x_2 = -2 - \cos u^1,$$
with $|u^2| \le 1$ and with no bounds on $u^1$. The initial state lies in the positive $x_2$ half plane, and the game terminates when $x_2(t) = 0$. Instead of taking $L = x_1(T)$ as in Example 1, we now take $L = x_1^2(T)$.
In order to obtain the solution, we initially proceed in the standard manner. The Hamiltonian is
$$H = V_{x_1}\{(2 + \sqrt2)u^2 - \sin u^1\} + V_{x_2}\{-2 - \cos u^1\}, \qquad \text{(i)}$$
where $\dot V_{x_1} = 0$, $\dot V_{x_2} = 0$. Since V is the value function, we obviously have $V_{x_1} = 2x_1$ along the $x_1$-axis. Then, as candidate equilibrium strategies, we obtain
$$u^2 = \operatorname{sgn}(V_{x_1}) = \operatorname{sgn}(x_1(T)), \qquad (\sin u^1, \cos u^1) \parallel (V_{x_1}, V_{x_2}), \qquad \text{(ii)}$$
where $\parallel$ stands for "parallel to". Note that the $u^1$ and $u^2$ in (ii) are candidates for open-loop strategies, and therefore the expression $u^2(t) = \operatorname{sgn}(x_1(T))$ does not conflict with causality. Substituting (ii) and $V_{x_1} = 2x_1$ into (i) with $t = T$ and setting the resulting expression equal to zero, we obtain $V_{x_2}(x(T)) = 2|x_1(T)|$. The costate variables $V_{x_1}$ and $V_{x_2}$ are now completely determined. For terminal state conditions $(x_1 > 0, x_2 = 0)$ we find, as candidate equilibrium strategies,
$$u^1 = \frac{\pi}{4}, \qquad u^2 = 1,$$
and in the case of terminal state conditions $(x_1 < 0, x_2 = 0)$ their counterparts are
$$u^1 = -\frac{\pi}{4}, \qquad u^2 = -1.$$
The corresponding state trajectories are drawn in Fig. 8, from which we observe that they intersect. Starting from an initial position I₁, there are two possibilities, but only one of them (the trajectory going in the "south-east" direction) is optimal. Starting from I₂, the trajectory in the "south-west" direction would be optimal. The separation line between the south-east (SE) and south-west (SW) directions is obviously the $x_2$-axis, since it is on this line that the two values of the cost function (one corresponding to a trajectory going SW, the other to one going SE) match. While integrating the optimal paths backward in time, we should stop once the $x_2$-axis, which is called a dispersal line, is reached. More precisely, it is a dispersal line for the evader, since, if the initial condition is on this line, it is E who chooses one of the two optimal directions along which to proceed; along both of these directions, the system trajectory leaves the dispersal line, and both lead to the same outcome. □
Fig. 8. Intersection of candidate optimal trajectories.
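The defining property of the dispersal line in Example 4, that the two candidate trajectories starting on the $x_2$-axis yield identical costs, can be verified in closed form, since the controls are constant along each candidate path. A minimal sketch (the starting height h is an arbitrary choice):

```python
import math

def outcome(x2_start, u1, u2):
    """Play Example 4's dynamics under constant controls:
    x1' = (2 + sqrt(2))*u2 - sin(u1),  x2' = -2 - cos(u1),
    from (0, x2_start); returns the cost x1(T)**2 at the time x2(T) = 0."""
    vx1 = (2 + math.sqrt(2)) * u2 - math.sin(u1)
    vx2 = -2 - math.cos(u1)
    T = x2_start / (-vx2)          # time to reach the x1-axis
    return (vx1 * T) ** 2

h = 1.7
se = outcome(h, math.pi / 4, +1)    # "south-east" candidate
sw = outcome(h, -math.pi / 4, -1)   # "south-west" candidate
print(abs(se - sw) < 1e-9)          # True: equal costs up to rounding error
```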
In the foregoing example we have discovered a dispersal line for the evader. Of course, dispersal lines for the pursuer also exist. Towards that end, consider the homicidal chauffeur game introduced in Example 3. Suppose that the parameters of the game $(\beta, v_2)$ are such that termination is always possible, and that the cost functional stands for the time elapsed before capture. If E is far behind P (i.e. if, in Fig. 4, the state is $x_1 = 0$, $x_2 \ll -\beta$), then P will turn either to the right or to the left as quickly as possible, so as to have E directly in front of him. Therefore, the half-line $(x_1 = 0, x_2 \le a)$, for some negative a, is a dispersal line for the pursuer.
By means of the same game we now introduce another type of singular surface, viz. the so-called universal line or universal surface. Assume that (27) and (29) hold, which means that E can be captured by P, whatever the initial position is. Suppose that E is situated far in front of P, i.e. $x_2 \gg \beta$, relatively close to the $x_2$-axis. Then the best P can do is to turn toward E, such that, in the relative $(x_1, x_2)$ space, E's position moves toward the $x_2$-axis. Once E is on the $x_2$-axis, the remaining phase of the game is trivial, since it is basically a chase along a straight line. A line, or more generally a surface, which optimal trajectories enter from both sides and then stay on, is called a universal line
(surface). It turns out that, in the case when both (27) and (28) hold, the $x_2$-axis is also a universal line, as shown in the following example.
Example 5. Consider the homicidal chauffeur game as introduced in Example 3, in which the cost for P is the "time to capture", and where the parameters $\beta$ and $v_2$ are such that (27) and (28) hold. In order to construct candidates of optimal paths by backward integration from the UP, we now follow the analysis of Example 3 rather closely, along with the formulas (20) and (21). For the present example, equation (6) becomes
$$\min_{u^1}\max_{u^2}\bigl\{V_{x_1}(-u^1 x_2 + v_2\sin u^2) + V_{x_2}(-1 + u^1 x_1 + v_2\cos u^2) + 1\bigr\} = 0,$$
whereby
$$u^1 = \operatorname{sgn}(V_{x_1}x_2 - V_{x_2}x_1); \qquad (\sin u^2, \cos u^2) \parallel (V_{x_1}, V_{x_2}), \qquad (30)$$
where $V_{x_1}$ and $V_{x_2}$ satisfy
$$\dot V_{x_1} = -u^1 V_{x_2}, \qquad \dot V_{x_2} = u^1 V_{x_1}$$
along optimal trajectories. The final conditions for the costate equations are
$$V_{x_1} = \frac{\sin\theta(T)}{\cos\theta(T) - v_2}, \qquad V_{x_2} = \frac{\cos\theta(T)}{\cos\theta(T) - v_2}, \qquad (31)$$
the derivation of which is left as an exercise for the reader. (Rewrite the problem in polar coordinates $(r, \theta)$, and utilize the boundary condition $V_\theta = 0$ at termination.) Exactly as in Example 3, it can be shown that $u^{1*} = +1$ near the end. For the right half plane, the terminal conditions are
$$x_1(T) = \beta\sin\theta_0, \qquad x_2(T) = \beta\cos\theta_0,$$
where $\theta_0$ is the parameter determining the final state, and the solution of the state equation close to termination can be written as
$$x_1(\tau) = 1 - \cos\tau + (\beta - v_2\tau)\sin(\theta_0 + \tau),$$
$$x_2(\tau) = -\sin\tau + (\beta - v_2\tau)\cos(\theta_0 + \tau), \qquad (32)$$
where $\tau$ is the retrograde time $T - t$. Equation (32) displays two facts: (i) $x_2(\tau) > 0$ for $\tau > 0$, which means that part of the shaded region in Fig. 5a is not yet filled with (candidate) optimal trajectories; (ii) for $\tau = \beta/v_2$, the state $(x_1(\tau), x_2(\tau))$ is independent of the arrival angle $\theta_0$. Figure 9 depicts these trajectories. Point A₁ in this figure has the coordinates
Fig. 9. The decision points in the homicidal chauffeur game.
$(x_1(\beta/v_2), x_2(\beta/v_2))$; it is a so-called decision point: E has various options, all leading to the same payoff. Point A₂ is the mirror image of A₁ with respect to the $x_2$-axis. In order to fill the gap with optimal paths, we try a singular arc, on which the switching function is identically zero, i.e.
$$V_{x_1}x_2 - V_{x_2}x_1 \equiv 0. \qquad (33)$$
For this to be true, its time derivative must also be identically zero, which yields
$$V_{x_1}(-1 + v_2\cos u^2) - V_{x_2}v_2\sin u^2 = 0. \qquad (34)$$
For (33) and (34) to hold when not both $V_{x_1}$ and $V_{x_2}$ are zero, the determinant of the coefficients must be zero, which results in
$$x_1 - v_2(x_1\cos u^2 - x_2\sin u^2) = 0.$$
Because of (30) and (33), $x_1\cos u^2 - x_2\sin u^2 \equiv 0$, which ultimately leads to $x_1 = 0$ as the only candidate for a singular surface. Along this line the switching function is $V_{x_1}x_2$, and therefore we must have $V_{x_1} \equiv 0$, leading to $\{u^{1*} \equiv 0,\ u^{2*} \equiv 0\}$. Along the line $x_1 = 0$, both P and E follow the same straight path in real space. From the singular arc, paths can leave in retrograde time at any instant, leading to Fig. 10, thus filling the whole shaded area in
Fig. 10. The x₂-axis is a universal line.
Fig. 5b with optimal trajectories. The line $x_1 = 0$ is a universal line. It should be noted that both V and $\partial V/\partial x$ are continuous across the universal line (see Problem 2, section 8). □
There are also other types of singular surfaces. In Fig. 11 we have sketched all hitherto known singular surfaces of codimension one (i.e. these surfaces are manifolds of dimension one less than that of the state space) across which $V(x)$ is continuous. It is believed that no other such singular surface exists, but a verification of this claim is not yet available. The singular surfaces not treated heretofore in this chapter are the equivocal line, the focal line and the switching envelope. They will not be treated in this book; for examples with equivocal lines the reader is referred to Isaacs (1975) and Lewin and Olsder (1979); the focal line appears for instance in the "obstacle tag" example (Breakwell, 1971), and a game with a switching envelope is treated in Breakwell (1973). Problem 7 in section 8 is also believed to feature a switching envelope in its solution.
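Fact (ii) noted below (32), that at $\tau = \beta/v_2$ the retrograde trajectories collapse to the single decision point A₁, is immediate to check numerically; the parameter values below are illustrative ones satisfying (27) and (28):

```python
import math

def retro_state(tau, theta0, beta, v2):
    """Candidate optimal trajectory (32); theta0 parametrizes the arrival
    point on the capture circle, tau = T - t is retrograde time."""
    x1 = 1 - math.cos(tau) + (beta - v2 * tau) * math.sin(theta0 + tau)
    x2 = -math.sin(tau) + (beta - v2 * tau) * math.cos(theta0 + tau)
    return x1, x2

beta, v2 = 0.15, 0.6          # illustrative values satisfying (27) and (28)
tau_star = beta / v2
pts = [retro_state(tau_star, th, beta, v2) for th in (0.1, 0.4, 0.8)]
same = all(abs(p[0] - pts[0][0]) < 1e-12 and abs(p[1] - pts[0][1]) < 1e-12
           for p in pts)
print(same)   # True: the factor (beta - v2*tau) vanishes at tau = beta/v2
```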
Fig. 11. Classification of singular surfaces: transition line, universal line, dispersal line, focal line, equivocal line, switching envelope.
In the cases of the equivocal line and the switching envelope, one of the players has the option of either staying on the singular surface or leaving it; both possibilities lead to the same outcome. The difference between these two surfaces is that in the switching-envelope case the trajectories enter (in forward time) tangentially, whereas in the equivocal-line case they do not. An analogous difference exists between the universal line and the focal line. Some of the singular surfaces can only be approximated by strategies of the form $u^1 = \gamma^1(x)$, $u^2 = \gamma^2(x)$. In the case of the focal line, for instance, one player, say P, tries to get the state vector off the focal line, but E brings it back to the focal line. Strategies of the form $u^1 = \gamma^1(x)$, $u^2 = \gamma^2(x, u^1)$, with an
informational advantage to the evader, would be able to keep the state precisely on the focal line.
Singular surfaces across which $V(x)$ is discontinuous are the barrier and the safe contact. We have already met barriers in Example 5.2 in the context of optimal control, and in section 3, where games of kind were treated. Their appearance in games of degree will now be delineated within the context of the homicidal chauffeur game. An example of safe contact is given in section 8.5.2.
Example 6. Consider the homicidal chauffeur game with the cost function defined as the time-to-go. In Example 3 we described how semipermeable surfaces, starting from the end points of the UP of the target, could be constructed. We now treat the case depicted in Fig. 5b, i.e. the case when the parameters $v_2$ and $\beta$ are such that (29) holds. If E is initially situated at point C₂ (see Fig. 5b), P cannot force him across S₁, but he can force him to go around it. Therefore it is plausible, and this can be proven rigorously (see Merz, 1971), that capture from C₂ will take considerably more time than capture from C₁. Hence, the value function is discontinuous across S₁, which, therefore, is a barrier. □
This concludes the present section on singular surfaces. The crucial problem in the construction of the value function is to locate the singular surfaces, but hitherto this problem has not been solved in a routine way. On the other hand, once a particular V has been constructed which is continuously differentiable in each of a finite number of mutually disjoint regions of the state space, some conditions (known as "junction conditions") exist to check whether the $V(x)$ obtained is really the value function or not. Since these conditions are not yet very well understood, we do not treat them here; for some discussion on this topic the reader is referred to Bernhard (1977).
8.5 SOLUTIONS OF TWO PURSUIT-EVASION GAMES
In this section we obtain the complete solutions of two pursuit-evasion games by utilizing the techniques developed in the previous sections. Each of these games features at least one singular surface. In the game of "the lady in the lake", it is a dispersal surface for the evader; there is also a decision point. In the "dolichobrachistochrone" the singular surfaces are a transition line and the so-called safe contact, which is the remedy when the BUP happens to move inside the target set instead of towards the outside.
8.5.1 The lady in the lake
A lady (E) is swimming in a circular pond with a maximum constant speed $v_2$. She can change the direction in which she swims instantaneously. A man (P), who has not mastered swimming and wishes to intercept the lady, is at the side of the pond and can run along the perimeter with maximum speed 1. He, too, can change his direction instantaneously. Furthermore, it is assumed that both E and P never tire. E does not want to stay in the lake forever; she wishes eventually to come out without being caught by the man. (On land, E can run faster than P.) E's goal is to maximize the payoff, which is the angular distance PE, viewed from the centre of the pond, at the time E reaches the shore (see Fig. 12). P obviously wants to minimize this payoff. To make the game nontrivial, it is assumed that $v_2 < 1$.
Fig. 12. The "lady in the lake".
Even though the problem could be cast in a rectangular coordinate system fixed in space, a description in the relative coordinates $\theta$ and $r$ turns out to be simpler, where $\theta$ is the angle between P and E (see Fig. 12 for the direction) and $r$ is E's distance from the centre of the lake. This coordinate system is called relative since it is attached to P; it is not fixed in space, and yet it describes the game completely. The kinematics of the game are
$$\dot\theta = \frac{v_2\sin u^2}{r} - \frac{u^1}{R}, \qquad \dot r = v_2\cos u^2,$$
where R is the radius of the pond. P's control $u^1$ is restricted by $|u^1(t)| \le 1$, while E's control $u^2$, which stands for the angle of E's velocity vector with respect to the radius vector CE (see Fig. 12), is not restricted in any way. We now seek the equilibrium strategies in feedback form. The cost function is $|\theta(T)|$, where T is defined as $T = \min\{t : r(t) = R\}$, and $-\pi \le \theta(t) \le +\pi$.
The Isaacs equation for this problem is
$$\min_{u^1}\max_{u^2}\left\{\frac{\partial V(\theta,r)}{\partial r}\,v_2\cos u^2 + \frac{\partial V(\theta,r)}{\partial\theta}\left(\frac{v_2\sin u^2}{r} - \frac{u^1}{R}\right)\right\} = 0, \qquad (35)$$
whence
$$u^{1*} = \operatorname{sgn}\left(\frac{\partial V}{\partial\theta}\right), \qquad (\cos u^{2*}, \sin u^{2*}) \parallel \left(\frac{\partial V}{\partial r}, \frac{1}{r}\frac{\partial V}{\partial\theta}\right). \qquad (36)$$
The differential equations for the costate variables along the optimal trajectories are
$$\frac{d}{dt}\left(\frac{\partial V(\theta,r)}{\partial\theta}\right) = 0, \qquad \frac{d}{dt}\left(\frac{\partial V(\theta,r)}{\partial r}\right) = \frac{\partial V}{\partial\theta}\,\frac{v_2\sin u^2}{r^2},$$
and the value function at $t = T$ is given by $V(\theta(T), r(T)) = |\theta(T)|$.
The optimal control $u^{1*}$, taken as the open-loop representation of the feedback equilibrium strategy,† can now be written as $u^{1*}(t) = \operatorname{sgn}(\partial V/\partial\theta) = \operatorname{sgn}(\theta(T))$. Since we assumed $-\pi \le \theta(t) \le +\pi$, this implies that P will move in E's direction along the smallest angle. Substituting $u^{1*}$ and $u^{2*}$ into (35), we obtain
$$\sin u^{2*}(t) = \frac{Rv_2}{r(t)}\,\operatorname{sgn}\theta(T). \qquad (37)$$
Two observations can be made at this stage. Firstly, (37) is valid as long as $r(t) \ge Rv_2$; for $r(t) < Rv_2$ the outlined derivation does not hold. Secondly, E runs along a straight line in real space, tangent to the circle of radius $Rv_2$ (see Fig. 13a).
Fig. 13. E's equilibrium strategy for the "lady in the lake".
† Note that, although the open-loop representation of the feedback saddle-point solution (if it exists) may exist, an open-loop saddle-point solution surely does not exist for this pursuit-evasion game. Whether a mixed open-loop saddle-point equilibrium exists is an altogether different question, which we do not address here.
This result could also have been obtained geometrically (see Fig. 13b). In the relative coordinate system, E's velocity $v_R$ is made up of two components: (i) E's own velocity vector in real space, $\bar v_2$, and (ii) a vector $\bar v_1$, obtained by rotation around the centre C, opposite to P's direction of motion (recall that P tries to reduce $|\theta(T)|$), and with magnitude $r(t)/R$. The best policy for E is to make $\bar v_2$ and the resulting velocity vector $v_R$ perpendicular to each other. For any other choice of $u^2$ (i.e. of the direction of $\bar v_2$ in Fig. 13b), the angle between CE and $v_R$ will become smaller; in other words, E will move faster into P's direction, which is precisely what E tries to avoid. Therefore, E's equilibrium strategy is to keep the angle between $\bar v_2$ and $v_R$ at $\pi/2$ radians, or equivalently, $\sin u^{2*} = Rv_2/r(t)$.
This way of constructing E's optimal strategy fails for $r(t) < Rv_2$, in which case E can achieve a larger angular velocity (with respect to the centre C) than P, and therefore she can always manoeuvre herself into the position $\theta(t) = \pi$, i.e. to a position diametrically opposite P. Note that, in order to maintain $\theta(t) \equiv \pi$, E would have to know P's current action. If she does not have access to this (which is the more realistic situation), she can stay arbitrarily close to $\theta(t) \equiv \pi$ as she moves outward towards the circle with radius $Rv_2$. In case E knows P's current action, she can reach the point $(\theta = \pi, r = Rv_2)$ in finite time (see Problem 3); if she does not know P's current action, she can get arbitrarily close to this point, also in finite time.† From the position $(\theta = \pi, r = Rv_2)$, E moves right or left along the tangent. More precisely, E will first swim outward to a position $r > Rv_2$ which is sufficiently close to $r = Rv_2$. Then P has to make a decision as to whether to run clockwise or anticlockwise around the pond (P cannot wait, since, as E moves outward, his payoff will otherwise become worse).
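The claim that E's equilibrium path is a straight line in real space can be tested by integrating E's own fixed-frame motion under the feedback law $\sin u^{2*} = Rv_2/r$ (taking $\operatorname{sgn}\theta(T) = +1$ and tracking only E, since P's control does not enter E's real-space path). A straight path is equivalent to a constant fixed-frame heading $\varphi + u^2$, which the Euler sketch below confirms to within discretization error; all parameter values are illustrative:

```python
import math

def heading_drift(R=1.0, v2=0.3, r0=0.5, dt=2e-5):
    """Euler-integrate E's fixed-frame polar motion under sin(u2) = R*v2/r
    (sgn theta(T) = +1 assumed); returns the maximum drift of the fixed-frame
    heading phi + u2, which is exactly constant for the continuous dynamics."""
    r, phi = r0, 0.0
    h0 = phi + math.asin(R * v2 / r0)
    drift = 0.0
    while r < 0.99 * R:
        u2 = math.asin(R * v2 / r)
        r += dt * v2 * math.cos(u2)
        phi += dt * v2 * math.sin(u2) / r
        drift = max(drift, abs(phi + math.asin(R * v2 / r) - h0))
    return drift

print(heading_drift() < 1e-3)   # heading essentially constant: a straight line
```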
Once E knows which direction has been chosen by P, she will swim "away" from P, along the tangent line just described. In the region $r(t) > Rv_2$, P's angular velocity is the larger, and he therefore will continue to run in the direction he had chosen before. The outcome of the game is readily calculated to be
$$|\theta(T)| = \pi + \arccos v_2 - \frac{\sqrt{1 - v_2^2}}{v_2}, \qquad (38)$$
which holds true for all initial positions inside the circle of radius $Rv_2$. The lady can escape from the man if $|\theta(T)| > 0$, which places a lower bound on E's speed: $v_2 > 0.21723\ldots$. From all initial positions inside the pond, E can always first swim to the centre and then abide by the strategy just described, resulting in the outcome (38). From some initial positions she can do better,
† This would inevitably involve some delay in the current state information of E, and thereby some memory in her information. Restriction to only feedback strategies would lead to some subtle measurability questions.
namely the positions in the shaded area in Fig. 14, which is bounded by the pond and the two equilibrium trajectories which, constructed in retrograde time, end at $(r = Rv_2, \theta = \pi)$. Within the shaded area the optimal strategies are determined by (36). The backward construction is only valid until $|\theta(t)| = \pi$; the line segment $(\theta = \pi, Rv_2 \le r \le R)$ obviously forms a dispersal line, on which it is P who decides whether to go "to the left" or "to the right". In the non-shaded area of the pond in Fig. 14, the value function is constant and equals the $|\theta(T)|$ given by (38). Properly speaking, this value can only be obtained if there is an informational advantage to E, as explained above. See also Problem 9. If this is not the case and only closed-loop strategies are allowed, this value function can only be approximated; therefore, an ε saddle-point equilibrium exists (cf. section 4.2, specifically Def. 4.2), but a saddle point does not exist.
Fig. 14. Optimal trajectories in the relative space.
The solution obtained has a very special feature. If E accidentally chooses a wrong manoeuvre, she can always return to the centre of the lake and start all over again. The point E₀ in the relative coordinate system (see Fig. 14), which plays a special role, is sometimes referred to as a decision point.
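The escape bound quoted with (38) can be reproduced numerically: the right-hand side of (38) is strictly increasing in $v_2$, so a simple bisection recovers the critical speed:

```python
import math

def escape_margin(v2):
    """|theta(T)| from (38): the lady escapes iff this is positive."""
    return math.pi + math.acos(v2) - math.sqrt(1 - v2 * v2) / v2

def critical_speed():
    """Solve escape_margin(v2) = 0 by bisection (the margin is increasing)."""
    lo, hi = 0.1, 0.9       # margin < 0 at lo, > 0 at hi
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if escape_margin(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(abs(critical_speed() - 0.21723) < 1e-4)   # True: matches the text's bound
```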
8.5.2 The dolichobrachistochrone
The dolichobrachistochrone problem is an extension of the well-known brachistochrone problem in the calculus of variations. In the latter problem a point mass moves in a two-dimensional plane $(x_1, x_2)$, under a uniform gravitational field which we assume to be in the direction of the positive $x_2$-axis. Under this convention, the point mass will "fall upwards". The brachistochrone problem can be formulated as an optimal control problem in the first quadrant $(x_1 \ge 0, x_2 \ge 0)$: at each state $(x_1, x_2)$ the point mass is navigated in such a way that the direction of travel, i.e. the direction of the velocity vector $(\dot x_1, \dot x_2)$, is the control variable. The gravitational constant being properly scaled, we take the speed of the point mass to be equal to $\sqrt{x_2}$. The control
problem terminates when the point mass penetrates the positive $x_2$-axis; therefore, the target set $\Lambda$ is the second quadrant $(x_2 \ge 0, x_1 \le 0)$. Termination is defined here as penetration (rather than grazing) of $\partial\Lambda$, for a reason which will become clear later, but which essentially allows movements along the positive $x_2$-axis. The cost function is the time elapsed; the direction of the velocity vector should be chosen such that the terminal condition is satisfied in minimum time. The underlying state equation for this optimal control problem is
$$\dot x_1 = \sqrt{x_2}\cos u^1, \qquad \dot x_2 = \sqrt{x_2}\sin u^1,$$
where $u^1$ is chosen by P. In the dolichobrachistochrone problem,† another player, E, is included in the formulation, and he influences the dynamics through his control $u^2$ according to
$$\dot x_1 = \sqrt{x_2}\cos u^1 + \frac{w}{2}(u^2 + 1), \qquad \dot x_2 = \sqrt{x_2}\sin u^1 + \frac{w}{2}(u^2 - 1), \qquad (39)$$
where w is a positive constant and $u^2$ satisfies the constraint $|u^2| \le 1$. E's aim is to postpone termination for as long as possible. The cost function, for P to minimize and for E to maximize, is therefore chosen as
$$L = \int_0^T 1\,dt, \qquad (40)$$
where T is defined as $\inf\{t : x_1(t) = 0,\ \dot x_1(t) < 0\}$. If E chooses $u^2 = +1$, this implies that he pulls as hard as possible in the positive $x_1$-direction, away from $\Lambda$; if he chooses $u^2 = -1$, he pulls downwards, against gravitation, trying to keep $x_2$ small (and a small $x_2$ stands for the case when P's influence is relatively small). Intermediate values of $u^2$ should be interpreted as E taking a combination of these two options.
First we wish to determine those initial points from which capture is possible. The normal to $\Lambda$ is $(1, 0)'$ and the UP, determined from
$$\min_{u^1}\max_{u^2}\left\{\sqrt{x_2}\cos u^1 + \frac{w}{2}(u^2 + 1)\right\} = -\sqrt{x_2} + w < 0, \qquad (41)$$
is $x_1 = 0,\ x_2 > w^2$; we have used strict inequality in (41) since we require
† A name which is more suitable for this game, in view of the description given below, is "macrobrachistochrone", as explained in Isaacs (1969). We shall, for historical reasons, adopt the original one.
penetration. The set of differential equations that determines the BUP is (39), with initial condition $(x_1 = 0, x_2 = w^2)$ and with $\{u^1, u^2\}$ determined as the saddle point of
$$p_1\left(\sqrt{x_2}\cos u^1 + \frac{w}{2}(u^2 + 1)\right) + p_2\left(\sqrt{x_2}\sin u^1 + \frac{w}{2}(u^2 - 1)\right), \qquad (42)$$
where $p_1$ and $p_2$ are related to $\{u^1, u^2\}$ via the differential equations
$$\dot p_1 = 0, \qquad p_1(T) = 1,$$
$$\dot p_2 = -\frac{p_1\cos u^1 + p_2\sin u^1}{2\sqrt{x_2}}, \qquad p_2(T) = 0. \qquad (43)$$
The saddle-point solution of (42) can further be shown to satisfy the relations
$$\cos u^{1*} = -p_1/p, \qquad \sin u^{1*} = -p_2/p, \qquad u^{2*} = \operatorname{sgn} s, \qquad (44)$$
where $p \triangleq \sqrt{p_1^2 + p_2^2}$, $s \triangleq p_1 + p_2$.
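The relations (44) can be spot-checked by brute force: for fixed $(p_1, p_2, x_2, w)$, a grid min-max of (42) should agree with the value implied by (44). The numbers below are an arbitrary test point:

```python
import math

def expr42(u1, u2, p1, p2, x2, w):
    # the expression (42)
    return (p1 * (math.sqrt(x2) * math.cos(u1) + 0.5 * w * (u2 + 1))
            + p2 * (math.sqrt(x2) * math.sin(u1) + 0.5 * w * (u2 - 1)))

def value_from_44(p1, p2, x2, w):
    """Saddle-point value implied by (44): (cos u1*, sin u1*) = -(p1, p2)/p
    and u2* = sgn(p1 + p2)."""
    p = math.hypot(p1, p2)
    return -p * math.sqrt(x2) + 0.5 * w * (abs(p1 + p2) + p1 - p2)

p1, p2, x2, w = 1.0, -0.4, 2.0, 1.0          # arbitrary test point
u1s = [-math.pi + i * 2 * math.pi / 4000 for i in range(4001)]
u2s = [-1 + i * 0.1 for i in range(21)]
minmax = min(max(expr42(u1, u2, p1, p2, x2, w) for u2 in u2s) for u1 in u1s)
print(abs(minmax - value_from_44(p1, p2, x2, w)) < 1e-3)   # True
```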
If (44) is used in (43) and (39), we obtain the following set of four differential equations:
$$\dot x_1 = -\frac{p_1}{p}\sqrt{x_2} + \frac{w}{2}(1 + \operatorname{sgn} s), \qquad x_1(T) = 0,$$
$$\dot x_2 = -\frac{p_2}{p}\sqrt{x_2} + \frac{w}{2}(\operatorname{sgn} s - 1), \qquad x_2(T) = w^2, \qquad (45)$$
$$\dot p_1 = 0, \qquad p_1(T) = 1,$$
$$\dot p_2 = \frac{p}{2\sqrt{x_2}}, \qquad p_2(T) = 0.$$
In general it is difficult to solve (45) in closed form. However, by making use of (42), which is identically zero with $u^1 = u^{1*}$, $u^2 = u^{2*}$, we obtain
$$x_1(t) = -\frac{w}{2}(T - t) + \frac{w^2}{2}\sin\left(\frac{T - t}{w}\right),$$
$$x_2(t) = \frac{w^2}{2}\left(1 + \cos\frac{T - t}{w}\right). \qquad (46)$$
To our dissatisfaction, we observe that Xl (t) < for t near Tand that the BUP, as determined by (46),is not as sketched in Fig. 15; it moves, instead, towards inside A and therefore loses its significance. A BUP must, however, exist, since, for Xz > w 2 , it is clear from (39) that P can always terminate the game in finite time. For initial positions with Xz close
374
DYNAMIC NONCOOPERATIVE GAME THEORY
o
x,
Fig. 15. What the BUP does not look like.
to zero, but positive, i.e. 0 < x₂ ≪ w², P cannot enforce ẋ₂ > 0 or ẋ₁ < 0 without E's cooperation, and for such initial positions the game will never end, thus leading to "victory" for E. Therefore it might be reasonable to think that x₂ = w² will be the BUP†, but this is not completely true, as we shall elucidate in the sequel.

Let us now look into the problem of capture in finite time for initial states close to the x₂-axis and just below the half line x₂ = w², x₁ ≥ 0. More specifically, consider the initial point indicated by A₁ in Fig. 15. Its coordinates are x₁ = 0, w²/2 < x₂ < w². Assume that P chooses the strategy u¹ = γ¹(x₁, x₂) = 3π/4, with the corresponding contribution to the state equation, the vector (√x₂ cos u¹, √x₂ sin u¹)′, indicated by the vector M in Fig. 16a. If E does not want the point mass to cross the x₂-axis in the negative x₁-direction, he must choose u² such that the vector ((w/2)(u² + 1), (w/2)(u² − 1))′ belongs to the segment DE (see Fig. 16a). Any other u² will lead to immediate termination. Under these choices for u¹ and u², the point mass will move in the direction OM₁, i.e. ẋ₂ > 0 and ẋ₁ ≥ 0 (even though OM₁ is not uniquely determined, since the point E on the line segment DE is not unique, the result is

† The line (x₁ ≥ 0, x₂ = w²) is semipermeable. The point (x₁ = 0, x₂ = w²) is a leaking corner of the two lines (x₁ ≥ 0, x₂ = w²) and (x₁ = 0).

Fig. 16. Vectograms for the dolichobrachistochrone.
always ẋ₂ > 0). E does not want the point mass to move up, since it will eventually get into the region x₂ > w², wherefrom capture is unavoidable. Suppose, therefore, that the point mass has moved a short distance in the direction OM₁, after which x₁ > 0. Now E can choose u² = −1, i.e. he pulls downwards as hard as possible, and the point mass moves in the direction OM₂. E can stick to his choice u² = −1 until x₁ becomes zero again, after which he has to choose a strategy corresponding to DE again, etc. As a result of this iterative procedure the point mass moves up and follows a path as depicted in Fig. 16b. The preceding analysis has been mostly heuristic (since it deals with a discretized version of the game), but it nevertheless shows that E cannot prevent the point mass from moving up into the region x₂ > w². Later on we shall make this analysis more precise; the conclusion at this point is that {x₁ ≥ 0, x₂ = w²} is not the BUP. For large values of x₁, however, the line x₂ = w² will constitute part of the BUP, because in that region P cannot threaten his opponent by letting the state pass the x₂-axis.

We now construct the exact form of the BUP, together with the solution of the differential game formulated by (39) and (40). Towards this end, suppose that we are given an arbitrary final point (x₁ = 0, x₂ = q) with q > w², and consider construction of the equilibrium trajectory for the dolichobrachistochrone problem which terminates at (0, q). For this problem, equation (6) can be written as
0 = min_{u¹} max_{u²} [ V_{x₁}{√x₂ cos u¹ + (w/2)(u² + 1)} + V_{x₂}{√x₂ sin u¹ + (w/2)(u² − 1)} ] + 1,   (47)

which leads to the unique set of relations

cos u¹* = −V_{x₁}/√(V_{x₁}² + V_{x₂}²),   sin u¹* = −V_{x₂}/√(V_{x₁}² + V_{x₂}²),   u²* = sgn s,   s = V_{x₁} + V_{x₂}.   (48)

If these are further used in (47), we obtain the equation

−√x₂ √(V_{x₁}² + V_{x₂}²) + (w/2)(|s| + V_{x₁} − V_{x₂}) + 1 = 0.   (49)
The costate variables V_{x₁} and V_{x₂} satisfy

V̇_{x₁} = 0,   V_{x₁}(x(T)) = 1/(√q − w),
V̇_{x₂} = √(V_{x₁}² + V_{x₂}²)/(2√x₂),   V_{x₂}(x(T)) = 0,   (50)
which leads to V_{x₁}(x(t)) ≡ 1/(√q − w).
To solve for V_{x₂} from (50) we need the corresponding expressions for the state variables, which satisfy, in view of (48), the differential equations

ẋ₁ = −(V_{x₁}/√(V_{x₁}² + V_{x₂}²))√x₂ + (w/2)(sgn s + 1),   x₁(T) = 0,
ẋ₂ = −(V_{x₂}/√(V_{x₁}² + V_{x₂}²))√x₂ + (w/2)(sgn s − 1),   x₂(T) = q.   (51)
Since these equations explicitly depend on V_{x₂}, their solution cannot readily be obtained in closed form. However, certain simplifications become possible if we make use of (49); as long as s > 0 (which holds in a certain neighbourhood of the terminal position), equation (49) leads to

V_{x₂} = −√(q/x₂ − 1)/(√q − w).   (52)
The reader can now verify that this solution indeed satisfies the second equation of (50). Equation (49) also provides a second solution for V_{x₂}, with the opposite sign, which, however, has to be rejected, since it follows from (50) that V_{x₂} < 0 near t = T. Once (52) is substituted into (51), x₁(t) and x₂(t) can explicitly be solved for in terms of τ ≜ T − t, the corresponding expressions being given below, together with those for V_{x₁} and V_{x₂}:
x₁(t) = (√q/2)τ + (q/2) sin(τ/√q) − wτ,
x₂(t) = (q/2)( 1 + cos(τ/√q) ),
V_{x₁} = 1/(√q − w),
V_{x₂} = −√(q/x₂ − 1)/(√q − w).   (53)

These expressions might fail to characterize the solution if x₁(τ) < 0 or s(τ) ≤ 0, whichever is more restrictive. For small τ we clearly have s > 0. The first instant τ for which s = V_{x₁} + V_{x₂} becomes zero is denoted by τ₀(q) and is determined as

τ₀(q) = π√q/2.   (54)
Hence, it is (54) which determines the interval on which (53) is valid. It readily follows that the points (x₁(τ₀(q)), x₂(τ₀(q))) form, for varying q, part of a parabola, which is a switching line indicated by m in Fig. 17. For q ≥ q₀ ≜ 4π²w²/(π + 2)², the solution x₁ of (53) satisfies x₁(τ) ≥ 0 for all τ ∈ [0, τ₀(q)]. For q belonging to the open interval (w², q₀), only part of the trajectory (x₁(τ), x₂(τ)) satisfies the condition x₁(τ) ≥ 0 for all τ ∈ [0, τ₀(q)]. By hypothesis, the initial position (x₁(0), x₂(0)) lies in the first quadrant, and therefore we should use only the parts of the trajectories which satisfy the condition x₁(τ) ≥ 0.

Fig. 17. Candidate equilibrium trajectories for the dolichobrachistochrone problem.
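The threshold q₀ ≜ 4π²w²/(π + 2)² can be recovered numerically from (53) and (54): x₁(τ₀(q)) is increasing in q on the relevant interval and vanishes exactly at q₀. The sketch below (the bisection routine and the choice w = 1 are ours, for illustration only) performs this check.

```python
import math

w = 1.0

def x1_at_switch(q):
    # x1 from (53) evaluated at tau0(q) = pi*sqrt(q)/2, cf. (54)
    tau = math.pi*math.sqrt(q)/2
    return tau*math.sqrt(q)/2 + q/2*math.sin(tau/math.sqrt(q)) - w*tau

# bisection: x1(tau0(q)) changes sign between q = w^2 and q = 4w^2
lo, hi = w**2, 4*w**2
for _ in range(60):
    mid = (lo + hi)/2
    if x1_at_switch(mid) < 0:
        lo = mid
    else:
        hi = mid

q0_closed = 4*math.pi**2*w**2/(math.pi + 2)**2
print(lo, q0_closed)   # the bisection root agrees with 4*pi^2*w^2/(pi+2)^2
```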
In order to find the continuation of the trajectories in retrograde time, we take s < 0 in (49) and (51). Direct integration of equations (50) and (51) turns out to be difficult. However, by making use of (49), and by taking into account that V_{x₁} = 1/(√q − w), we can solve for V_{x₂} explicitly, to obtain

V_{x₂} = (−w ± Q(x₂, q))/(x₂ − w²),   Q(x₂, q) ≜ [ w² + (x₂ − w²){1 − x₂/(√q − w)²} ]^{1/2}.   (55)

Some analysis shows that the plus sign holds for q ≥ 4w² and the minus sign holds for q₀ ≤ q < 4w². Once V_{x₂} has been determined on and below m, it can be substituted into (39); the solutions to these differential equations are drawn in Fig. 17. For q ≥ 4w², the trajectories "drop downward" with increasing τ, starting from the parabola m. For q₀ ≤ q < 4w², the trajectories first "turn upward"; at time τ = τ₁(q), which is determined from Q(x₂(τ₁), q) = 0, they have a horizontal tangent in the (x₁, x₂) plane. For τ > τ₁(q) we again must choose the plus sign in (55). From (50) it follows that V̇_{x₂} is positive along optimal trajectories, and hence V_{x₂}(x₁(τ), x₂(τ)) is decreasing as a function of τ. Therefore, once s becomes negative, it will stay so, and no more switches will occur for increasing τ. The trajectories can be continued until they reach the half line x₂ = w², at which instant τ becomes infinite. The family of trajectories thus constructed does not
completely cover the region (x₁ ≥ 0, x₂ > w²). However, we already know that capture from this region is possible in finite time. Therefore, some optimal trajectories, or parts of optimal trajectories, are missing. This also follows from the heuristic argument used earlier in the construction of the BUP, in which case the conclusion was that capture in finite time is possible also from the line segment x₁ = 0, w²/2 < x₂ < q₀/2, which is not covered by the analysis so far. At this point, we assume that s < 0 everywhere below m and for x₁ > 0. Under this assumption (which is only a conjecture at this stage), it turns out that we can fill in the missing part of the state space with optimal trajectories. Other possible choices concerning the sign of s lead to contradictions, which are left for the reader to verify. Since s < 0, we have u²* = −1 as long as x₁ > 0. Then it follows that, once the system reaches the x₂-axis, with w²/2 < x₂ ≤ q₀/2, it will stay there. Referring to Fig. 16b, the teeth of the "saw-tooth movement" become infinitely small. In the limit we have ẋ₁ = 0, under the assumption that x₁(t) is differentiable, leading to
u²(u¹) = −(2√x₂/w) cos u¹ − 1.   (56)

This indicates that, for this limiting case to be possible, we have to consider a different information structure: apart from x₂, E has to know also the current value of u¹. If such an information structure is not allowed, we have to approach the resulting trajectory along the x₂-axis again by the "saw-tooth movement", and this leads to ε-equilibrium solutions. Substitution of (56) into the differential equation for x₂ in (39) gives
ẋ₂ = √x₂ (sin u¹ − cos u¹) − w.

Since P wishes to maximize ẋ₂, he will choose u¹ = 3π/4, leading to

ẋ₂ = √(2x₂) − w,   (57)

which is positive for x₂ > w²/2. For initial points on the line segment α, defined as those (x₁, x₂) with x₁ = 0, w²/2 < x₂ < q₀/2, the system will move (in forward time) in the positive direction of the x₂-axis until it reaches x₂ = q₀/2, wherefrom it will follow the trajectory indicated in Fig. 17, and the game ends at the point (x₁ = 0, x₂ = q₀). For the "one-dimensional game" along the line segment α (which is a particular type of a singular line called safe contact), equation (47) reads (note that ẋ₁ = 0) 1 + V_{x₂}ẋ₂ = 0, whence

V_{x₂} = −1/(√(2x₂) − w).   (58)
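The claims around (56) and (57) are easy to confirm mechanically. The fragment below is a sketch of ours (w = 1 and the sample values of x₂ on α are arbitrary illustrative choices): it substitutes u¹ = 3π/4 and the E-strategy (56) into the right-hand sides of the state equations (39) and checks that the motion stays on the x₂-axis with ẋ₂ = √(2x₂) − w.

```python
import math

w = 1.0
q0 = 4*math.pi**2*w**2/(math.pi + 2)**2   # threshold from the text

u1 = 3*math.pi/4                          # P's strategy on alpha
for x2 in (0.55*w**2, 0.65*w**2, q0/2):   # sample points on alpha
    u2 = -2*math.sqrt(x2)*math.cos(u1)/w - 1          # (56)
    assert abs(u2) <= 1                               # admissible control
    x1dot = math.sqrt(x2)*math.cos(u1) + w/2*(u2 + 1) # first eq. of (39)
    x2dot = math.sqrt(x2)*math.sin(u1) + w/2*(u2 - 1) # second eq. of (39)
    assert abs(x1dot) < 1e-12                         # state stays on the x2-axis
    assert abs(x2dot - (math.sqrt(2*x2) - w)) < 1e-12 # (57)
    assert x2dot > 0                                  # the state climbs along alpha
print("(56)-(57) verified on alpha")
```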
To determine V_{x₁} along the same line segment, we must consider equation (49) in its general form (not with ẋ₁ = 0 substituted), which, with s < 0, leads to
V_{x₁} = 1/(√(2x₂) − w) = −V_{x₂}.   (59)
Equations (58) and (59) indicate that s = 0 along α. Note that, in this one-dimensional game, V_{x₁} is not a constant. This is not in contradiction with (50), since the derivation of (50) requires the existence of neighbouring optimal trajectories (cf. Section 2), which do not exist in this case. We have earlier conjectured that, before the trajectories reach α, s < 0, which is now seen to be in complete agreement with ṡ > 0, as follows from (50) and the fact that s = 0 on α. Since V_{x₁}, V_{x₂}, x₁ and x₂ are known along α, the differential equations for x₁, x₂, V_{x₁} and V_{x₂} can be solved in retrograde time, leading to Fig. 18. As x₂ approaches w²/2 along α, V_{x₁} and −V_{x₂} become infinitely large, and the first two terms in (47) dominate the last term (which is +1). Neglecting this +1 term and normalizing the magnitudes of V_{x₁} and V_{x₂} at the point (x₁ = 0, x₂ = w²/2) to +1 and −1, respectively, the trajectory departing from this point satisfies

0 = min_{u¹} max_{u²} [ V_{x₁}{√x₂ cos u¹ + (w/2)(u² + 1)} + V_{x₂}{√x₂ sin u¹ + (w/2)(u² − 1)} ],
V̇_{x₁} = 0,   V_{x₁}(x₁ = 0, x₂ = w²/2) = 1,
V̇_{x₂} = √(V_{x₁}² + V_{x₂}²)/(2√x₂),   V_{x₂}(x₁ = 0, x₂ = w²/2) = −1,

which, together with the system equation, determine the semipermeable surface leaving (x₁ = 0, x₂ = w²/2). From these equations one readily obtains

dx₁/dx₂ = √( x₂/(w² − x₂) ),   (60)

Fig. 18. The complete solution to the dolichobrachistochrone problem.
whose solution, passing through (x₁ = 0, x₂ = w²/2), is

x₁ = w²(1/2 − π/4) − √x₂ √(w² − x₂) + w² arcsin(√x₂/w),   (61)
which holds as long as w²/2 ≤ x₂ ≤ w². This semipermeable surface, together with x₂ = w², x₁ ≥ w²(π/4 + 1/2), forms the BUP, at which τ = ∞ and which separates the capture region from the region where evasion is possible. We have thus completely determined the saddle-point strategies above the BUP. The strategy u¹* = γ¹*(x₁, x₂) is expressed in terms of V_{x₁} and V_{x₂} via (48); and since V_{x₁} and V_{x₂} have explicitly been determined above the BUP, γ¹* is known for x₁ > 0. Furthermore, on α it is given by γ¹*(0, x₂) = 3π/4. For u²* = γ²*(x₁, x₂) we also have explicit expressions: γ²* = +1 above m, and γ²* = −1 below m and for x₁ > 0. Along α, we assumed an informational advantage for E, in which case (see (56)) γ²*(0, x₂, u¹) = −(2√x₂) cos(u¹)/w − 1. If we substitute u¹
= γ¹*(0, x₂) into this latter expression, in order to determine γ²* in terms of the original information structure, we obtain γ²* = √(2x₂)/w − 1, which corresponds to direction OD in Fig. 16. The value function can now be determined; it is simply the time needed for the system to travel from any initial point, along the optimal trajectory passing through this point, to the corresponding terminal point on the x₂-axis.

8.6 AN APPLICATION IN MARITIME COLLISION AVOIDANCE

In this section, collision avoidance during a two-ship encounter in the open sea will be treated as a problem in the theory of differential games. Given two ships close to each other in the open sea, a critical question to be addressed is whether a collision can be avoided. If, for instance, we assume that the helmsman of ship 1 (P1) has complete information on the state of ship 2, whereas the helmsman of ship 2 (P2) is not aware of the presence of the first ship, this lack of information on the part of P2 may result in a hazardous situation. It is quite possible that P2 may perform a manoeuvre leading to a collision which would not have occurred if P2 had had full knowledge of P1's position. Hence, in such a situation, it would be reasonable to undertake a worst-case analysis, by assuming that one of the ships (P1) tries to avoid and the other (P2) tries to cause a collision. One can envisage other situations, such as the case when both ships are aware of each other's position and both try to prevent a collision, i.e. they cooperate. This case, although not a differential game in the proper sense, can
be solved by similar techniques. Roughly speaking, the min-max operation in the differential game case must be replaced by the max-max operation in the cooperative case. Still another situation arises when one ship holds course and speed (is "standing on") because of the "rules of the road", whereas the other ship must evade.

The dynamic system comprises two ships, P1 and P2, manoeuvring on a sea surface which is assumed to be homogeneous, isotropic, unbounded and undisturbed. The kinematic equations to be used are identical to those of the two-cars model (see Example 3):

ẋ₁ = −w₁u¹x₂ + v₂ sin θ,
ẋ₂ = −v₁ + w₁u¹x₁ + v₂ cos θ,   (62)
θ̇ = w₂u² − w₁u¹.
This simple model may not be (too) unreasonable if only short-duration manoeuvres are considered, during which the ships cannot change their speeds markedly. Each ship is characterized by two parameters: the maximum angular velocity wᵢ and the constant forward speed vᵢ. The control of Pi is uⁱ, which is the rudder angle, and which is bounded by |uⁱ(t)| ≤ 1. Extensions of this model are possible, such as, for instance, the case of variable forward speed (the engine setting will then become a control variable as well), but these would make the analysis to follow much more complicated. Some other effects (hydrodynamic and inertial) have been ignored in the model (62), such as the uⁱ-dependence of vᵢ.

The cost function is described in terms of the closest point of approach (CPA); the distance corresponding to it is called the miss distance. The terminal condition is given by the CPA at first pass, characterized by

dr/dt = 0,   d²r/dt² > 0,   (63)

where r = √(x₁² + x₂²), and the terminal range r(T) is the cost function. This constitutes a game of degree. In this section we shall consider, instead, the directly related game of kind, characterized by r_m, the minimum range or minimum miss distance. If r(T) > r_m, no collision takes place, and for r(T) ≤ r_m a collision will take place. In the three-dimensional state space, the target set is described by the cylinder x₁² + x₂² = r_m². The UP, i.e. the set of points on this cylinder at which a collision can take place, is determined by

v₂(x₁ sin θ + x₂ cos θ) − v₁x₂ ≤ 0.   (64)

The terminal condition wherefrom a barrier can be constructed is determined from this relation with the inequality sign replaced by an equality sign. Substituting x₁ = r_m sin α and x₂ = r_m cos α, where α is the bearing, into this
equality, we obtain, at the final time,

r(T) = r_m,
sin α(T) = ε(v₁ − v₂ cos θ(T))/w,
cos α(T) = ε v₂ sin θ(T)/w,   (65)
where w ≜ √(v₁² + v₂² − 2v₁v₂ cos θ(T)), and where ε = ±1. For each θ(T) we obtain two values for α(T), one corresponding to ε = +1 and the other one to ε = −1. These two values correspond to a right and a left barrier, respectively; P2 can just miss P1, either on the right or on the left side. From the final conditions thus obtained, we must construct equilibrium trajectories backward in time in order to obtain the BUP. The BUP (actually comprising two parts, each corresponding to a value of ε in (65)) is depicted in Fig. 19. In this figure the two parts of the BUP intersect, which indicates that the enclosed points are guaranteed collision points for P2 (provided that the intersection of the two BUP parts does not leak).
Fig. 19. A schematic picture of the BUP.
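The terminal conditions (65) can be tabulated directly. In the sketch below, the speeds v₁ = 1, v₂ = 0.5 are illustrative assumptions only, and the helper name terminal_bearing is ours: for each heading θ(T) it computes the two bearings α(T), one per barrier, and checks that the (sin, cos) pair in (65) is internally consistent.

```python
import math

v1, v2 = 1.0, 0.5          # illustrative speeds, v1 > v2

def terminal_bearing(theta_T, eps):
    # (65): bearing alpha(T) on the right (eps = +1) or left (eps = -1) barrier
    wbar = math.sqrt(v1**2 + v2**2 - 2*v1*v2*math.cos(theta_T))
    s = eps*(v1 - v2*math.cos(theta_T))/wbar
    c = eps*v2*math.sin(theta_T)/wbar
    assert abs(s*s + c*c - 1) < 1e-12    # the pair really is (sin, cos)
    return math.atan2(s, c)

for theta_T in (0.5, 1.5, 2.5):
    aR = terminal_bearing(theta_T, +1)
    aL = terminal_bearing(theta_T, -1)
    # flipping eps negates both sin and cos, so the two bearings differ by pi
    assert abs(abs(aR - aL) - math.pi) < 1e-12
print("two terminal bearings per theta(T), as in (65)")
```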
We now turn to the actual construction of the BUP. The optimal u¹ and u² are the arguments of the corresponding Isaacs equation, thus leading to

u¹* = sgn(x₁V_{x₂} − x₂V_{x₁} − V_θ),   u²* = sgn(−V_θ).   (66)
Furthermore, the costate variables satisfy

V̇_{x₁} = −w₁u¹V_{x₂},
V̇_{x₂} = w₁u¹V_{x₁},   (67)
V̇_θ = v₂(V_{x₂} sin θ − V_{x₁} cos θ).
As long as the arguments of the sgn-relations in (66) are nonzero, it is simple to obtain the solution of (62) and (67); but substitution of the final values for V_{x₁}, V_{x₂}, V_θ into (66) shows that the arguments of the sgn-relations are zero at t = T. The situation can, however, be saved by replacing the arguments with their retrograde time derivatives, which leads to

u¹(T) = ε sgn( (v₁/v₂) − cos θ(T) ),
u²(T) = −ε sgn( (v₂/v₁) − cos θ(T) ),
where ε = ±1. Hence, the equilibrium strategies at the terminal situation are determined for almost all values of θ(T). For v₂/v₁ < 1, we can expect a singular control for P2 if θ(T) = ±arccos(v₂/v₁) for the last part of the manoeuvre. Similarly, if v₁/v₂ < 1, P1 may have a singular control. Let us proceed with the case v₂/v₁ < 1. Closer scrutiny reveals that there are no equilibrium paths leading to the singular point θ(T) = ε arccos(v₂/v₁); instead, a dispersal line ends at this point. The other point, θ(T) = −ε arccos(v₂/v₁), however, is the end point of a singular line (with u² = 0), which is a universal line. Integrating backward in time from the final condition, we first start with the singular control u² = 0, which may be terminated at any time (switching time) we desire; going further backward in time we obtain a nonsingular u². In Fig. 20 it is shown how the right barrier (ε = +1) can be constructed by determining the optimal trajectories backward in time. In the figure, the projections of the trajectories on the (x₂, θ) plane have been drawn. In order to construct these trajectories, θ(T) can be taken as a parameter, and for θ(T) = −ε arccos(v₂/v₁) we have another parameter, viz. the time instant at which the singular control switches to a nonsingular one (there are two options: u² = +1 and u² = −1) in retrograde time. The dispersal line is found as the intersection of two sets of trajectories obtained through backward integration, namely that set of trajectories which have θ(T) smaller than arccos(v₂/v₁), modulo 2π, and the set of trajectories with θ(T) greater than arccos(v₂/v₁). The left barrier can be constructed analogously. If
Fig. 20. The trajectories which make up the right barrier, projected in the (x₂, θ) plane.
the two barriers intersect over the whole range 0 ≤ θ ≤ 2π, and if the line of intersection does not leak, then the enclosed points are guaranteed collision points for P2. Also, a tacit assumption which makes this whole construction work is that the two parts which define the dispersal line (for both the right and the left barrier) do indeed intersect. If they do not intersect, then there would not be a dispersal line; instead, there would be a hole in the surface which encloses the points we described above. Consequently, this set of points would not be enclosed anymore, and P1 might be able to force escape by steering the state through this hole. As yet, no sufficiency conditions are known which ensure enclosure; hence, for each problem one has to verify numerically that no holes exist. For this reason, the method can be used conveniently only for systems with state space dimension not greater than three.

A possible and appealing way of constructing the barriers which enclose the points wherefrom collision is unavoidable is to consider cuts in the (x₁, x₂, θ) space for which θ is a constant, say θ = θ₀. For each parameter value θ(T), and also for the retrograde time parameter when θ(T) = −ε arccos(v₂/v₁), one calculates the equilibrium trajectories until they hit the plane θ = θ₀, this being performed for both the right and the left barrier. For different values of the parameters one obtains, in general, different points in the plane θ = θ₀, with the ultimate picture looking like the one sketched in Fig. 21. In this approach the existence of a dispersal line cannot be detected; one simply "by-integrates" it. The semipermeable surface (line) m in Fig. 21 is the result of such a procedure. However, this is not a serious problem, since what one seeks in the plane θ = θ₀ is a region completely surrounded by semipermeable surfaces; the connecting corners must not leak. Hence, in a picture like Fig. 21, one has to check whether point A leaks or not. If it does not, then one can disregard the semipermeable line m. If point A had been leaking, however, it would have been an indication that either the problem statement is not correct (i.e. no enclosed region exists from where collision can be guaranteed), or there must be other semipermeable surfaces which, together with (parts of) the already existing semipermeable surfaces, define a new enclosed region.
Fig. 21. Intersection of the semipermeable surfaces with the plane θ = θ₀.
The next figure, Fig. 22, taken from Miloh and Sharma (1976), depicts several enclosed areas for different r_m values. The figure shows the barrier cross-sections at θ = π/2. The numerical data are as follows: v₁ = 1, v₂ = 1, w₁ = w₂ = 1. This means that the minimum turn radii of P1 and P2 are 1 and 1, respectively. By constructing similar pictures for different sets of parameter values (different speed ratios and different manoeuvring capabilities), it is possible to gain insight into collision sensitivity with respect to these parameters.

Fig. 22. Initial points (x₁, x₂), with θ = π/2, for which the distance of CPA is r_m.
As a final remark, collision was defined here as "two ships (considered as points on the plane) approaching each other closer than a given distance". If ships could actually collide this way, then they would have to have the shape of round disks, as viewed from above. The construction presented can, however, be applied to more realistic shapes, though this is slightly more complicated; such an example will be presented in Section 7. Problem 5 (Section 8) presents yet another slightly different problem, in which the target set is still a round disk, but it is not centred at the origin.
8.7 ROLE DETERMINATION AND AN APPLICATION IN AERONAUTICS

Heretofore we have considered the class of pursuit-evasion games wherein the roles of the players are prescribed at the outset of the problem. These problems are sometimes called one-target games, which model situations such as a missile
chasing a plane. However, in a "dogfight" that takes place between two planes or ships which are both armed and capable of destroying their opponents, it may not be apparent at the outset who pursues whom. Accordingly, we now introduce two-target games, where either player may be the pursuer or the evader, depending on their current configuration, and each target set is determined by the weapon range of the respective player. Each player's task is to destroy his opponent (i.e. manoeuvre the opponent into his target set) and to avoid destruction by his opponent. The solution method for two-target games is essentially made up of two stages:

(i) Given the initial conditions, determine the relative roles of the players: who pursues whom? It is clear that in a deterministic differential game the roles of the players are determined completely by the initial state and will not change during the course of the game, provided that only pure strategies are permissible and that the players act rationally.
(ii) With the roles of the players determined, what are the optimal, e.g. time-optimal, strategies for the players?

Note that the question of who pursues whom has nothing to do with the cost function. Here we shall be concerned with the first stage. The state space consists of three mutually disjoint regions: two regions corresponding to victory by one or the other player, and a third region corresponding to a draw, i.e. neither player is capable of destroying his opponent. The states within the draw region may correspond to different situations. A particular possibility is a stalemate: both players play certain strategies such that neither player will be destroyed and, moreover, a deviation from his equilibrium strategy by one of the players will lead to his own destruction. Another possibility is that the faster player can escape, tacitly assuming that this player cannot win the game in finite time.
The region R₁ of the state space, which corresponds to victory by P1, may be expected to be bounded by a surface comprising a part of the surface of Λ₁, the target set for P1, together with a semipermeable surface L₁ which prevents the state from leaving R₁ provided that P1 plays optimally in the neighbourhood of L₁. The interior of R₁ does not contain any points of P2's target set Λ₂, and thus, assuming that the game terminates in finite time (i.e. a stalemate cannot arise), victory for P1 is guaranteed for initial states inside R₁ (see Fig. 23a). That portion of Λ₁ which forms part of the boundary of R₁ is permeable only in the direction away from R₁. In a similar way we can construct R₂, the region corresponding to victory by P2. If R₁ and R₂ do not fill up the whole state space, then points belonging to neither R₁ nor R₂ belong to R₃, the draw region. Such a situation is illustrated schematically in Fig. 23b. Victory by P2 in region R₂ is guaranteed if we assume that no stalemate is possible, so that the target set Λ₂ is reached in finite time. States belonging to both L₁ and L₂ (such as l in Fig. 23b) will terminate on
Fig. 23. Regions R₁ and R₂ corresponding to victory by P1 and P2, respectively.
both Λ₁ and Λ₂, thereby giving rise to "victory" by both players or, what is essentially the same, to a simultaneous confrontation. States belonging to L₂ and not to L₁, or belonging to L₁ and not to L₂, give rise to near-miss situations. We now present an example of a two-target game which has the additional feature that the two target sets are non-smooth. Consider an aerial duel or dogfight, wherein each combatant wishes to destroy his opponent without himself being destroyed. In order to keep the analysis at a reasonable level, we take the dimension of the state space to be three, and assume that the two players move in a single horizontal plane. Equations of motion within this plane are the same as those in the ship collision avoidance problem, and they are given by (62). The constraints on the controls are also the same. We assume that v₁ > v₂ > 0 and 0 < w₁ < w₂, i.e. P1 has the advantage of having a higher speed, whereas P2 is more manoeuvrable. Pi has a confrontation range (which is his target set Λᵢ) consisting of a line segment of length lᵢ in the direction of his velocity vector. If P1 crosses the confrontation range of P2, or vice versa, the game ends. For the analysis to follow, we make the additional assumption 0 < l₁ < l₂.

We start with the construction of the barrier L, which is the composition of L₁ and L₂ introduced earlier. Each portion of L, whether it leads to simultaneous confrontation or to near miss, is obtainable by solving a particular "local" differential game, with a terminal cost function defined only in the immediate neighbourhood of the simultaneous confrontation or near miss. If we adopt the convention that the local game has a positive (respectively, negative) value when it ends in favour of P2 (respectively, P1), the equilibrium controls are determined from

min_{u¹} max_{u²} ( V_{x₁}ẋ₁ + V_{x₂}ẋ₂ + V_θ θ̇ ) = 0,   (68)
which leads to

u¹* = sgn(x₂V_{x₁} − x₁V_{x₂} + V_θ),   u²* = sgn V_θ.   (69)
The costate equations are the same as in (67), but in this case the terminal conditions are different, as will shortly be seen. In fact, the terminal conditions will be different for each local game. From (69) it follows that, if u¹ remains constant for a duration of at least τ units of time before the final time T is reached, then, apart from an insignificant constant multiplier, assumed henceforth to be unity, the costates at time τ = T − t satisfy

V_{x₁}(τ) = cos(φ − w₁u¹τ),
V_{x₂}(τ) = sin(φ − w₁u¹τ),   (70)

where φ is determined by the final conditions. From (70), (69) and (67) it further follows that

V̇_θ = −v₂ cos(θ + φ − w₁u¹τ) = −v₂ cos(θ(T) + φ − w₂u²τ),   (71)

so that, if u² ≠ 0,

V_θ(τ) = V_θ(0) + (v₂/(w₂u²)) [ sin(θ(T) + φ) − sin(θ(T) + φ − w₂u²τ) ].   (72)

By introducing A ≜ x₂V_{x₁} − x₁V_{x₂} + V_θ, we also find Ȧ = −v₁V_{x₁}, so that, if u¹ ≠ 0,

A(τ) = A(0) + (v₁/(w₁u¹)) [ sin φ − sin(φ − w₁u¹τ) ].   (73)
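Relations (70) and (72) follow from elementary integration, and a crude retrograde Euler scheme over (67) and the θ-equation of (62) reproduces them. In the sketch below all constants (φ, θ(T), the speeds and the controls) are arbitrary test values of ours, not data from the text.

```python
import math

v2, w1, w2 = 0.5, 1.0, 3.0       # arbitrary test constants
u1, u2 = 1.0, -1.0               # constant controls near the end
phi, thetaT = 0.3, math.pi       # phi is fixed by the terminal conditions

h, N = 1e-4, 10000               # integrate to tau = 1 in retrograde time
V1, V2, Vth, th = math.cos(phi), math.sin(phi), 0.0, thetaT
for k in range(N):
    # retrograde (d/dtau = -d/dt) versions of (67) and the theta-equation of (62)
    dV1 = w1*u1*V2
    dV2 = -w1*u1*V1
    dVth = -v2*(V2*math.sin(th) - V1*math.cos(th))
    dth = -(w2*u2 - w1*u1)
    V1, V2, Vth, th = V1 + h*dV1, V2 + h*dV2, Vth + h*dVth, th + h*dth

tau = N*h
V1_cf = math.cos(phi - w1*u1*tau)                    # (70)
Vth_cf = v2/(w2*u2)*(math.sin(thetaT + phi)          # (72), with V_theta(0) = 0
                     - math.sin(thetaT + phi - w2*u2*tau))
print(abs(V1 - V1_cf), abs(Vth - Vth_cf))            # both of order h
```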
Since many of the local games will include singular controls, we now investigate their properties. From (69) it follows that a singular control for P2 is only possible for V_θ ≡ 0, which results in V̇_θ ≡ 0. Equation (71) now leads to the conclusion that the only possible singular control for P2 is u² ≡ 0. Similarly, it can be shown that the only possible singular control for P1 is u¹ ≡ 0.

Returning to the actual construction of L, we shall first deal with simultaneous confrontation and thereafter with the near-miss situations. For each situation, several types may be distinguished, since simultaneous confrontation and near miss occur in different ways.

Simultaneous confrontation

Consider the situation depicted in Fig. 24, where the trajectories of P1 and P2 in real space have been sketched, together with the relative coordinate system at terminal time T. In the figure, P1 turns to the right as fast as possible, and P2 does likewise, but to the left. The final state is given as follows: x₁(T) = 0, θ(T) = π, and x₂(T) is a variable that must satisfy the constraint (74).
Fig. 24. A situation of simultaneous confrontation.
The equilibrium strategies are γ¹*(·) = +1, γ²*(·) = −1. Initial conditions associated with these strategies and terminal conditions can easily be found by backward integration of the state equations. The value of the game corresponding to these initial conditions is neither positive (victory for P2) nor negative (victory for P1) and is therefore determined as V = 0. It should be clear from Fig. 24 that γ¹* and γ²* are indeed in equilibrium, since a deviation from either strategy will lead to the destruction of the deviating player. This is also confirmed by application of the necessary conditions. Let us assume for the moment that the terminal values of the costate variables satisfy the requirements

V_{x₁}(T) > 0,   V_{x₂}(T) = 0,   V_θ(T) < 0.   (75)

Then it follows that φ = 0 in (70), which, together with (69) and (73), indeed leads to u¹* = +1, u²* = −1. But why should (75) be a requirement? Let us consider the case of V_{x₁}(T): a small deviation of P2's position in the positive x₁-direction in the final situation (keeping x₂(T) and θ(T) = π fixed) will be to the advantage of P2, since he will be able to win the game from the new relative position. Hence, after this deviation, we have V > 0 instead of V = 0, and therefore V_{x₁}(T) > 0. A parallel argument verifies the sign of V_θ(T). Furthermore, since a change in x₂(T) does not change the outcome of the game (it remains V = 0), we have V_{x₂}(T) = 0. Let us now investigate how this simultaneous confrontation affects the barrier L in the (x₁, x₂, θ) space, which separates the initial points wherefrom P1 can win from those corresponding to P2's victory. Since x₂(T) is a parameter, subject to the constraint (74), a whole family of equilibrium
trajectories exists, all leading to a simultaneous confrontation of the type described. This family forms a semipermeable surface in the (x₁, x₂, θ) space. For the actual construction we shall consider the intersection of this surface with the planes θ = constant, say θ = θ₀. Equations (62), (67), (69) and (75) are integrated backwards in time until θ = θ₀ is reached, and for varying x₂(T) we obtain different points in the θ = θ₀ plane, which make up a line segment. In Fig. 25, the solution of a specific example has been sketched, for which the characteristics of the airplanes are

v₁ = w₁ = 1,  v₂ = 1/4,  w₂ = 3,  l₁ = 2  and  l₂ = ∞.    (76)
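The backward construction just described can be sketched numerically. The kinematic equations (62) and (67) are not reproduced on this page, so the sketch below assumes the standard two-cars relative kinematics ẋ₁ = −w₁u¹x₂ + v₂ sin θ, ẋ₂ = w₁u¹x₁ − v₁ + v₂ cos θ, θ̇ = w₂u² − w₁u¹, and reads the garbled v₂ in (76) as 1/4; the function name `barrier_point` is ours, and the costate equations are omitted since the equilibrium controls are constant here.

```python
import math

# Airplane characteristics from (76); the confrontation ranges l1, l2
# do not enter the kinematics themselves.
v1, w1, v2, w2 = 1.0, 1.0, 0.25, 3.0

def barrier_point(x2_T, theta0=math.pi / 3, dt=1e-4):
    """Backward-integrate the relative kinematics from the terminal state
    x1(T) = 0, theta(T) = pi, with the equilibrium controls u1 = +1
    (P1 hard right) and u2 = -1 (P2 hard left), until theta has returned
    (mod 2*pi) to theta0.  Returns the point (x1, x2) in the theta = theta0
    plane; sweeping x2_T traces the type-1 line segment of Fig. 25."""
    u1, u2 = 1.0, -1.0
    theta_dot = w2 * u2 - w1 * u1          # = -4, constant along the arc
    # going backwards in time theta increases: pi -> theta0 + 2*pi
    tau = (theta0 + 2 * math.pi - math.pi) / abs(theta_dot)
    n = int(round(tau / dt))
    x1, x2, theta = 0.0, x2_T, math.pi
    for _ in range(n):                     # explicit Euler, backward step
        dx1 = -w1 * u1 * x2 + v2 * math.sin(theta)
        dx2 = w1 * u1 * x1 - v1 + v2 * math.cos(theta)
        x1 -= dt * dx1
        x2 -= dt * dx2
        theta -= dt * theta_dot
    return x1, x2

# A family of terminal values x2(T) sweeps out a line segment
# in the theta = pi/3 plane.
segment = [barrier_point(s) for s in (0.0, 0.1, 0.2)]
```

Each choice of the parameter x₂(T) yields one point of the intersection curve; plotting `segment` for a fine grid of terminal values reproduces the qualitative picture of Fig. 25.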
In the figure, θ₀ = π/3. The line segment which separates the states corresponding to victory by P1 from those corresponding to P2's victory, according to the simultaneous confrontation described above, has number 1 attached to it. From initial positions slightly above this line, P1 will win, and from initial positions slightly below, P2 will win.

Fig. 25. Intersection of semipermeable surfaces with the plane θ = π/3.
Different types of simultaneous confrontation exist, each one corresponding to a different manoeuvre. These different types are shown in Fig. 26. Type 1 has already been discussed extensively. Type 2 is similar, but now P2 turns right as fast as possible; x₂(T) is again the parameter. Types 3 and 4 are different in the sense that destruction is not caused by the confrontation range, but instead by collision; the angle θ(T) is the parameter. Type 3 is a collision with P1 turning right and P2 turning left. In type 4, both turn right. Type 5 calls for an S-manoeuvre by P1 to get within the range of P2, who has the longer confrontation range, before P2, who is turning right, faces directly
toward P1, at a distance of x₂(T) = l₁; the turn angle α of the final bend of P1 is the parameter. Type 6 is an extension of the S-manoeuvre of P1 in type 5, in which the final bend with a fixed angle α₁ is preceded by a straight-line segment (singular arc) whose length functions as a parameter. This singular arc itself is preceded by a left turn. Type 7 is similar to type 6, the only difference being that now P1 starts with a right turn. In addition to these 7 types, there are 7 other types, indicated by 1′, 2′, …, 7′, obtained from the first 7 types by left-right reflections. For instance, in case 1′, P1 turns left and P2 turns right.
Fig. 26. Different manoeuvres, all leading to simultaneous confrontation.
In all these types, the parameter is subject to certain constraints. We will not treat detailed analyses of all these types here; they are somewhat technical and not enlightening for the results to be obtained. The details can be found in Olsder and Breakwell (1974). Those types which give rise to curves lying in the θ = π/3 plane have been indicated in Fig. 25. A question that comes to mind now is whether these fourteen different types of simultaneous confrontation constitute the whole picture. In fact, we do not know for sure whether there are other types or not. The ones that have
been depicted in Fig. 25 have been obtained intuitively, and the analysis indeed showed that they give rise to semipermeable surfaces. However, together with the semipermeable surfaces corresponding to near miss situations, for which the discussion will follow shortly, they separate (and enclose) the states wherefrom P1 can guarantee victory from the other states, i.e. there is no hole in the barrier (cf. section 6). If, however, another simultaneous confrontation manoeuvre (or another not-yet-known near miss manoeuvre) is discovered, it may lead to either a larger or a smaller R₁.

Fig. 27. Different manoeuvres, all leading to near misses.

Near miss

Essentially ten different types of near miss situations have been discovered so far, which are labelled a, b, …, j in Fig. 27, which depicts the corresponding trajectories in real space. Another set of ten types exists, obtained by interchanging left and right, as in the case of simultaneous confrontation. We now give a brief rundown of the first ten types.

In type a both players move to the right as fast as possible; P2's trajectory in relative space looks as sketched in Fig. 28, and its characteristic feature is that it touches the boundary of P1's confrontation range, but does not cross it; hence the name "near miss". The parameter can be taken as x₂(T), which satisfies the constraint 0 ≤ x₂(T) ≤ min(l₁, v₂/w₁) and determines θ(T) through sin θ(T) = w₁x₂(T)/v₂. The terminal conditions for the costate equations are V_{x₁}(T) > 0, V_{x₂}(T) = V_θ(T) = 0. Type b resembles type a, the difference being that u²* = −1. Types c and d have θ(T) = π/2 and end with a singular arc by P2. The length of this arc is the parameter. These two types only occur if w₁l₁ ≥ v₂. For w₁l₁ < v₂, the types c and d do not arise, but instead we have the types e, f, g and h, for which x₂(T) = l₁. For types e and h the parameter is θ(T); for f and g it is the length of the singular arc of P2's trajectory. Types i and j resemble e and h, respectively, but now P1 is almost destroyed at the far end of P2's confrontation range. In e and h it is P2 who can barely escape the far end of P1's confrontation range. Note that in the figures corresponding to i and j, the relative coordinate system at t = T has been drawn with respect to P2 (instead of P1, as in all other types). In these two latter types, P2 has a singular arc, the length of which is the parameter.

Fig. 28. The trajectory corresponding to type a in Fig. 27, now in relative space.
In Fig. 25, we have drawn the line segments corresponding to the applicable near miss cases for the example with characteristics (76); these are the types a, b, c and d and their mirror images a′, b′, c′, d′. The crucial question now is
whether all the line segments (both simultaneous confrontation and near miss) enclose a region, together with Λ₁. It should be clear from the figure that two such regions exist: an "inner one" and an "outer one". Closer scrutiny reveals that some corners of the outer region leak, and the semipermeable surfaces which enclose the inner region constitute the barrier which separates R₁ from R₂. To summarize, we have given a complete characterization of the regions leading to victory for P1 and P2, and have arrived at the qualitative result that if P1 plays optimally along the boundary of his region of victory (R₁), then P2 is captured within R₁ (and likewise with the roles of P1 and P2 interchanged).† If he cannot stay within R₁ for ever, then he has to leave this region via Λ₁, which leads to termination of the game (with P1's victory).

† Note that since l₂ = ∞, R₃ is empty.

8.8 PROBLEMS

1. Instead of the pursuit-evasion game of Example 2, consider the equivalent problem described by

ẋ₁ = u¹ + u², x₁(0) > 0,
ẋ₂ = |x₁|, x₂(0) = 0,
L = x₂(T),

where T = min{t: x₁(t) = 0}, −1 ≤ uⁱ ≤ 0, i = 1, 2. Obtain the final condition for the costate equation by making use of (11), and show that the result is in complete agreement with the corresponding condition derived in Example 2.

2. Calculate the value function V(x) for Example 5, and show that both V and ∂V/∂x are continuous along the universal surface.

3. Consider the "lady in the lake" with an informational advantage to E, i.e. the strategies are of the form u¹(t) = γ¹(x(t)), u²(t) = γ²(x(t), u¹(t)). E starts at the centre of the pond and spirals outward as fast as possible, while maintaining θ(t) ≡ π. Show that the circle r = Rv₂ will be reached in finite time.

4. This problem addresses the derivation of the equilibrium solution of the "lady in the lake" geometrically, without utilizing Thm 2. Start from the decision point E₀ in Fig. 29, and assume that P chooses u¹ = −1 (i.e. he moves such that the pond is on his left in real space); the case u¹ = +1 can be dealt with analogously. E will then swim in a straight line to the shore. The question is: in which direction will she swim? Suppose that she swims in the direction β (see Fig. 29).
(i) Why can β be restricted to |β| ≤ π/2?
(ii) Show geometrically that swimming towards E₃ is better for E than swimming to E₂, where E₃ is only an ε-distance away from E₂. (Show that the time needed for P to cover the distance E₂E₃ is greater than the time needed for E to cover the distance E₂′E₃, where E₂′ is the projection of E₂ on E₀E₃.)
(iii) Show that swimming towards E₄ is the best E can do. (Make use of the fact that the circle through the points E₄, E₀ and C lies inside the pond.)
Fig. 29. Geometrical derivation of the saddle-point solution.
*5. Consider the following two-cars model:

ẋ₁ = (√2/2) sin θ − x₂u¹,
ẋ₂ = (√2/2) cos θ + x₁u¹ − 1,
θ̇ = −u¹ + 2√2 u²,

together with the constraints |uⁱ(t)| ≤ 1, i = 1, 2. The target set Λ is defined by x₁² + (x₂ + R)² − R² = 0, where R = √2/8. P1 tries to avoid Λ, whereas P2 would like the state to enter Λ.
(i) Attempt to determine the BUP in the standard way and show that it lies partly inside Λ. Therefore, the semipermeable surface calculated cannot be the BUP.
(ii) Determine the actual BUP, in a way similar to that employed in the dolichobrachistochrone problem.

*6. Consider the system described by

ẋ₁ = x₂ + 1 + 2 sin u²,
ẋ₂ = −3u¹ + 2 cos u²,

with |u¹| ≤ 1 and no constraints imposed on u². The target set is the half line (x₁ > 0, x₂ = 0). P1 wants to steer the system to the target set as soon
as possible, whereas P2 wants to do the opposite. The game is played in the half plane x₂ ≥ 0.
(i) Show that a barrier starts at the origin (x₁ = 0, x₂ = 0) and ends at x₂ = 1.
(ii) Show that the continuation of the barrier, for x₂ > 1, is an equivocal line.

*7. The differential game addressed in this problem is the same as that of Problem 6, except that the dynamics are replaced by

ẋ₁ = x₂ + 1 + u¹₂ + 2 sin u²,
ẋ₂ = −3u¹₁ + 2 cos u²,

where P1 chooses u¹₁ and u¹₂, subject to the constraint (u¹₁)² + (u¹₂)²/ε ≤ 1, where ε is a small positive number. P2 chooses, as before, u², without any restrictions. For ε = 0 this problem reduces to Problem 6. It is conjectured that the present more general version features a switching envelope. Prove or disprove this conjecture, and also investigate the limiting behaviour of the solution when ε ↓ 0.

8. The kinematics of this problem are described by (19) with v₂ = 1/2. The only restriction on the controls is |u¹(t)| ≤ 1. The target set of P1 is given by
Λ₁ = {(x₁, x₂): x₁² + x₂² ≤ 4 and arctan(x₂/x₁) ≤ 2π/3},

and the target set of P2 is given by Λ₂. P1's objective is to force (x₁, x₂) into Λ₁ without passing through Λ₂, whereas P2 wants the state to enter Λ₂ without passing through Λ₁. Show that the region of victory Rᵢ for Pi is as indicated in the following figure:
Check also that the semipermeable curve DE lies completely outside Λ₂ and that the corner E does not leak.

*9. We are given a zero-sum differential game for which the Isaacs condition does not hold everywhere; instead, the Isaacs equation admits two different solutions depending on the order of the min and max operations. Prove that the associated differential game does not admit a saddle-point solution in feedback strategies, but that it admits a saddle point when one or the other player is given an informational advantage concerning the current action of his opponent. Make use of this result in providing a possible verification of the existence of a saddle point in the "lady in the lake" problem.
8.9 NOTES
Section 8.2 The theory of deterministic pursuit-evasion differential games was single-handedly created by Isaacs in the early fifties, which culminated in his book (Isaacs, 1975; first edn 1965). Blaquière and Leitmann, independently of Isaacs, also obtained Thms 1 and 2 (see Blaquière et al., 1969). Their geometric approach is essentially the same as that of Isaacs, but it is more rigorously stated. In this chapter we follow essentially Isaacs' approach, though the terminology and interpretation are slightly different. Starting in the sixties, many researchers have worked on a rigorous establishment of the validity of the Isaacs equation. The starting point is often a time-discretization and a carefully defined information structure. Depending on this information structure, lower and upper values of the game are defined, and then appropriate limiting arguments are incorporated, under which these upper and lower values approach each other; the resulting limiting value, in each case, is declared to be the value of the game. References in this connection are Fleming (1961, 1964), Varaiya and Lin (1969), Roxin (1969), Friedman (1971), Elliott and Kalton (1972), Elliott (1977) and Elliott et al. (1973). For extended solution concepts of ordinary differential equations, when, for instance, the f function in the state equation is not Lipschitz, the reader is referred to Krassovski and Soubbotine (1977), which also includes a rigorous definition of strategy, and also to Hajek (1979). The notions of playability and nontermination aided in the understanding of the so-called "bang-bang-bang surface" introduced in Isaacs (1969); see also Ciletti (1970) and Lewin (1976).

Section 8.3 The concept of capturability also originates in Isaacs' works. A mathematically more rigorous approach to capturability is given in Krassovski and Soubbotine (1977), leading to the so-called "Theorem of the Alternative". A different approach to capturability has been developed by Pontryagin (1967) and subsequently by Hajek (1975). In this set-up the pursuer knows the current action of the evader, which he "neutralizes" by means of his own control. The left-over power (if any) is used by the pursuer to steer the system to the target set. The famous "homicidal chauffeur game" is very versatile in that its solution features many singular surfaces. We do not provide here the complete solution to this game; it was partly solved in Isaacs (1975). The complete solution is spread out over Merz (1971) and Breakwell (1973). The notion of a leaking corner was introduced in Bernhard (1971).
Section 8.4 The transition, dispersal, universal and equivocal lines were first introduced in Isaacs (1975). The focal line and the switching envelope appeared for the first time in Merz (1971) and Breakwell (1973). Introduction of noise in the system equations tends to "smoothen" the singular surfaces, which will therefore no longer exist as such; see the recent work by Pachter and Yavin (1979).

Section 8.5 The "lady in the lake" appeared in the Russian translation of Isaacs' book; see also Breakwell (1977). The dolichobrachistochrone problem was stated and partly solved in Isaacs (1975). The complete solution was given later in Breakwell (1971) and Chigir (1976). We essentially follow the latter reference. In the survey article (Ho and Olsder, 1980), a rather exhaustive list of other known solved or partly solved problems is given, including such games as "lion and man", "obstacle tag", "conic surveillance evasion" and the "suicidal pedestrian". References to extensions with "two pursuers and one evader" or "one pursuer and two evaders" are also given there.

Section 8.6 The application in maritime ship collision avoidance follows Sharma (1976) and Vincent and Peng (1973). See also Merz (1973) and Olsder and Walter (1978).
Section 8.7 Role determination and its application in aeronautics were first discussed in Olsder and Breakwell (1974). Similar problems, but with different target sets, have been considered in Merz (1976). For other aeronautical applications see Peng and Vincent (1975) and Vincent et al. (1976).

Section 8.8 Problem 5 is from Vincent and Peng (1973). Problem 6 is a special version of a problem considered in Isaacs (1975). Problem 8 is from Getz and Pachter (1980).
Appendix I
Mathematical Review
This appendix provides some background material on those aspects of real analysis and optimization which are frequently used in the text; it also serves to introduce the reader to our notation and terminology. For a more detailed exposition on the topics covered here, a standard reference is Luenberger (1969).
I.1 SETS

A set S is a collection of elements. If s is a member (element) of S, we write s ∈ S; if s does not belong to S, we write s ∉ S. If S contains a finite number of elements, it is called a finite set; otherwise it is called an infinite set. If the number of elements of an infinite set is countable (i.e. if there is a one-to-one correspondence between its elements and the positive integers), we say that it is a denumerable (countable) set; otherwise it is a nondenumerable set.

A set S with some specific structure attached to it is called a space, and it is called a linear (vector) space if this specific structure is of algebraic nature, with certain well-known properties with which we assume the reader is familiar. If S is a vector space, a subset of S which is also a vector space is called a subspace. An example of a vector space is the n-dimensional Euclidean space (denoted Rⁿ), each element of which is determined by n real numbers. An x ∈ Rⁿ can either be written as a row vector x = (x₁, …, xₙ), where x₁, …, xₙ are real numbers and denote the components of x, or as a column vector which is the "transpose" of (x₁, …, xₙ) [written as x = (x₁, …, xₙ)′]. We shall adopt the latter convention in this text, unless indicated otherwise.
Linear independence

Given a finite set of vectors s₁, …, sₙ in a vector space S, we say that this set of vectors is linearly independent if the equation Σᵢ₌₁ⁿ αᵢsᵢ = 0 implies αᵢ = 0 ∀i = 1, …, n. Furthermore, if every element of S can be written as a linear
combination of these vectors, we say that this set of vectors generates S. Now, if S is generated by such a linearly independent finite set (say, X), it is said to be finite dimensional with its unique "dimension" being equal to the number of elements of X; otherwise, S is infinite dimensional.
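As a concrete illustration (ours, not from the text), two vectors in R² are linearly independent exactly when their determinant is nonzero, and any such pair generates the two-dimensional space, so every x ∈ R² has unique coordinates with respect to it:

```python
def det2(s1, s2):
    """Determinant test: s1, s2 in R^2 are linearly independent iff nonzero."""
    return s1[0] * s2[1] - s1[1] * s2[0]

s1, s2 = (1.0, 2.0), (3.0, 4.0)
assert det2(s1, s2) != 0                  # linearly independent

def coordinates(x, s1, s2):
    """Unique (a1, a2) with x = a1*s1 + a2*s2, via Cramer's rule."""
    d = det2(s1, s2)
    a1 = (x[0] * s2[1] - x[1] * s2[0]) / d
    a2 = (s1[0] * x[1] - s1[1] * x[0]) / d
    return a1, a2

a1, a2 = coordinates((5.0, 6.0), s1, s2)
# the reconstruction a1*s1 + a2*s2 recovers x = (5, 6)
assert abs(a1 * s1[0] + a2 * s2[0] - 5.0) < 1e-12
assert abs(a1 * s1[1] + a2 * s2[1] - 6.0) < 1e-12
```

Here {s₁, s₂} generates R², so the space is finite dimensional with dimension 2.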
I.2 NORMED LINEAR (VECTOR) SPACES

A normed linear vector space is a linear (vector) space S which has some additional structure of a topological nature. This structure is induced on S by a real-valued function which maps each element u ∈ S into a real number ‖u‖, called the norm of u. The norm satisfies the following three axioms:
(1) ‖u‖ ≥ 0 ∀u ∈ S; ‖u‖ = 0 if, and only if, u = 0.
(2) ‖u + v‖ ≤ ‖u‖ + ‖v‖ for each u, v ∈ S.
(3) ‖αu‖ = |α|·‖u‖ ∀α ∈ R and for each u ∈ S.
Convergent sequences, Cauchy sequences

An infinite sequence of vectors {s₁, s₂, …, sᵢ, …} in a normed vector space S is said to converge to a vector s if, given an ε > 0, there exists an N such that ‖s − sᵢ‖ < ε for all i ≥ N. In this case we write sᵢ → s, or limᵢ→∞ sᵢ = s, and call s the limit point of the sequence {sᵢ}. More generally, a point s is said to be a limit point of an infinite sequence {sᵢ} if it has an infinite subsequence {sᵢⱼ} that converges to s. An infinite sequence {sᵢ} in a normed vector space is said to be a Cauchy sequence if, given an ε > 0, there exists an N such that ‖sₙ − sₘ‖ < ε for all n, m ≥ N. A normed vector space S is said to be complete, or a Banach space, if every Cauchy sequence in S converges to an element of S.
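As a concrete illustration (our own, not from the text), the Euclidean norm on R² satisfies the three axioms, and the partial sums of Σ 2⁻ⁱ form a Cauchy sequence in R converging to the limit point 1:

```python
import math

def norm(u):
    """Euclidean norm on R^n."""
    return math.sqrt(sum(c * c for c in u))

u, v, alpha = (3.0, -4.0), (1.0, 2.0), -2.5

# The three norm axioms, checked for these particular vectors:
assert norm(u) >= 0 and norm((0.0, 0.0)) == 0
assert norm(tuple(a + b for a, b in zip(u, v))) <= norm(u) + norm(v)
assert math.isclose(norm(tuple(alpha * c for c in u)), abs(alpha) * norm(u))

# A Cauchy sequence in R: the partial sums s_n = sum_{i=1}^n 2^{-i},
# which converge to the limit point s = 1.
s = [sum(2.0 ** -i for i in range(1, n + 1)) for n in range(1, 40)]
assert abs(s[-1] - 1.0) < 1e-11           # ||s_n - s|| < eps for large n
```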
Open, closed and compact sets
Let S be a normed vector space. Given an s ∈ S and an ε > 0, the set N_ε(s) = {x ∈ S: ‖x − s‖ < ε} is said to be an ε-neighbourhood of s. A subset X of S is open if, for every x ∈ X, there exists an ε > 0 such that N_ε(x) ⊂ X. A subset X of S is closed if its complement in S is open; equivalently, X is closed if every convergent sequence in X has its limit point in X. Given a set X ⊆ S, the largest subset of X which is open is called the interior of X. A subset X of a normed vector space S is said to be compact if every infinite sequence in X has a convergent subsequence whose limit point is in X. If X is finite dimensional, compactness is equivalent to being closed and bounded.
Transformations and continuity A mapping f of a vector space S into a vector space T is called a transformation
or a function, and is written symbolically f: S → T or y = f(x), for x ∈ S, y ∈ T. f is said to be a functional if T = R. Let f: S → T, where S and T are normed linear spaces. f is said to be continuous at x₀ ∈ S if, for every ε > 0, there exists a δ > 0 such that f(x) ∈ N_ε(f(x₀)) for every x ∈ N_δ(x₀). If f is continuous at every point of S, it is said to be continuous everywhere or, simply, continuous.
I.3 MATRICES

An (m × n) matrix A is a rectangular array of numbers, called elements or entries, arranged in m rows and n columns. The element in the ith row and jth column of A is denoted by a subscript ij, such as aᵢⱼ or [A]ᵢⱼ, in which case we write A = {aᵢⱼ}. A matrix is said to be square if it has the same number of rows and columns; an (n × n) square matrix A is said to be an identity matrix if aᵢᵢ = 1, i = 1, …, n, and aᵢⱼ = 0, i ≠ j, i, j = 1, …, n. An (n × n) identity matrix will be denoted by Iₙ or, simply, by I whenever its dimension is clear from the context. The transpose of an (m × n) matrix A is the (n × m) matrix A′ with elements a′ᵢⱼ = aⱼᵢ. A square matrix A is symmetric if A = A′; it is nonsingular if there is an (n × n) matrix called the inverse of A, denoted by A⁻¹, such that A⁻¹A = I = AA⁻¹.
Eigenvalues and quadratic forms

Corresponding to a square matrix A, a scalar λ and a nonzero vector x satisfying the equation Ax = λx are said to be, respectively, an eigenvalue and an eigenvector of A. A square symmetric matrix A whose eigenvalues are all positive (respectively, nonnegative) is said to be positive definite (respectively, nonnegative definite or positive semidefinite). An equivalent definition is as follows. A symmetric (n × n) matrix A is said to be positive definite (respectively, nonnegative definite) if x′Ax > 0 (respectively, x′Ax ≥ 0) for all nonzero x ∈ Rⁿ. The matrix A is said to be negative definite (respectively, nonpositive definite) if the matrix (−A) is positive (respectively, nonnegative) definite. We symbolically write A > 0 (respectively, A ≥ 0) to denote that A is positive (respectively, nonnegative) definite.
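The equivalence of the two definitions can be illustrated numerically for a particular symmetric (2 × 2) matrix; the example below (ours, not from the text) computes the eigenvalues in closed form and spot-checks the quadratic form at random points:

```python
import math, random

A = [[2.0, 1.0],
     [1.0, 2.0]]                      # a symmetric 2x2 matrix

# Eigenvalues of a symmetric 2x2 matrix via the characteristic equation
# lambda^2 - tr(A)*lambda + det(A) = 0.
tr = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
disc = math.sqrt(tr * tr - 4 * det)
eigs = ((tr - disc) / 2, (tr + disc) / 2)
assert all(lam > 0 for lam in eigs)   # all eigenvalues positive: A > 0 ...

# ... equivalently, x'Ax > 0 for all nonzero x (spot-checked at random):
random.seed(0)
for _ in range(1000):
    x = (random.uniform(-1, 1), random.uniform(-1, 1))
    if x != (0.0, 0.0):
        quad = sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))
        assert quad > 0
```

For this matrix the eigenvalues are 1 and 3, and indeed x′Ax = x₁² + x₂² + (x₁ + x₂)² > 0 for every nonzero x.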
I.4 CONVEX SETS AND FUNCTIONALS

A subset C of a vector space S is said to be convex if for every u, v ∈ C and every α ∈ [0, 1], we have αu + (1 − α)v ∈ C. A functional f: C → R defined over a
convex subset C of a vector space S is said to be convex if, for every u, v ∈ C and every scalar α ∈ [0, 1], we have f(αu + (1 − α)v) ≤ αf(u) + (1 − α)f(v). If this is a strict inequality for every α ∈ (0, 1), then f is said to be strictly convex. The functional f is said to be concave if (−f) is convex, and strictly concave if (−f) is strictly convex.

A functional f: Rⁿ → R is said to be differentiable if, with x = (x₁, …, xₙ)′ ∈ Rⁿ, the partial derivatives of f with respect to the components of x exist, in which case we write

∇f(x) = [∂f(x)/∂x₁, …, ∂f(x)/∂xₙ].

∇f(x) is called the gradient of f at x and is a row vector. We shall also use the notation f_x(x) or df(x)/dx to denote the same quantity. If we partition x into two vectors y and z of dimensions n₁ and n − n₁, respectively, and are interested only in the partial derivatives of f with respect to the components of y, then we use the notation ∇_y f(y, z) or ∂f(y, z)/∂y to denote this partial gradient. Let g: Rⁿ → Rᵐ be a vector-valued function whose components are differentiable with respect to the components of x ∈ Rⁿ. Then we say that g(x) is differentiable, with the derivative dg(x)/dx being an (m × n) matrix whose ijth element is ∂gᵢ(x)/∂xⱼ. (Here gᵢ denotes the ith component of g.) The gradient ∇f(x) being a vector, its derivative (which is the second derivative of f: Rⁿ → R) will thus be an (n × n) matrix, assuming that f(x) is twice continuously differentiable in terms of the components of x. This matrix, denoted by ∇²f(x), is symmetric, and is called the Hessian matrix of f at x. This Hessian matrix is nonnegative definite for all x ∈ Rⁿ if, and only if, f is convex.
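A small numerical illustration (our own) of these definitions, using the convex functional f(x) = x₁² + x₁x₂ + x₂² on R², whose Hessian [[2, 1], [1, 2]] is positive definite; the finite-difference gradient is an approximation of the exact ∇f = [2x₁ + x₂, x₁ + 2x₂]:

```python
import random

def f(x):
    return x[0] ** 2 + x[0] * x[1] + x[1] ** 2     # a convex functional on R^2

def grad(x, h=1e-6):
    """Central finite-difference approximation of the gradient (row vector)."""
    g = []
    for i in range(2):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

# The convexity inequality f(a*u + (1-a)*v) <= a*f(u) + (1-a)*f(v),
# spot-checked at random pairs u, v and random a in [0, 1]:
random.seed(1)
for _ in range(200):
    u = [random.uniform(-2, 2) for _ in range(2)]
    v = [random.uniform(-2, 2) for _ in range(2)]
    a = random.random()
    lhs = f([a * u[i] + (1 - a) * v[i] for i in range(2)])
    assert lhs <= a * f(u) + (1 - a) * f(v) + 1e-9

# The approximate gradient agrees with [2*x1 + x2, x1 + 2*x2] at x = (1, 2):
assert abs(grad([1.0, 2.0])[0] - 4.0) < 1e-4
assert abs(grad([1.0, 2.0])[1] - 5.0) < 1e-4
```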
I.5 OPTIMIZATION OF FUNCTIONALS

Given a functional f: S → R, where S is a vector space, and a subset X ⊆ S, by the optimization problem

minimize f(x) subject to x ∈ X

we mean the problem of finding an element x* ∈ X (called a minimizing element or an optimal solution) such that

f(x*) ≤ f(x)  ∀x ∈ X.
This is sometimes also referred to as a globally minimizing solution, in order to differentiate it from the other alternative: a locally minimizing solution. An element x⁰ ∈ X is called a locally minimizing solution if we can find an ε > 0 such that

f(x⁰) ≤ f(x)  ∀x ∈ N_ε(x⁰) ∩ X,

i.e. we compare f(x⁰) with values of f(x) in that part of a certain ε-neighbourhood of x⁰ which lies in X. For a given optimization problem, it is not necessary that an optimal solution exists; an optimal solution will exist if the set of real numbers {f(x): x ∈ X} is bounded below and there exists an x* ∈ X such that inf{f(x): x ∈ X} = f(x*), in which case we write
f(x*) = inf_{x∈X} f(x) = min_{x∈X} f(x).
If such an x* cannot be found, even though inf{f(x): x ∈ X} is finite, we simply say that an optimal solution does not exist; but we declare the quantity inf{f(x): x ∈ X}, or inf_{x∈X} f(x), as the optimal value of the optimization problem. If {f(x): x ∈ X} is not bounded below, i.e. inf_{x∈X} f(x) = −∞, then neither an optimal solution nor an optimal value exists. An optimization problem which involves maximization instead of minimization may be converted into a minimization problem by simply replacing f by −f. Any optimal solution of this minimization problem is also an optimal solution for the initial maximization problem, and the optimal value of the latter, denoted sup_{x∈X} f(x), is equal to minus the optimal value of the former. When a maximizing element x* ∈ X exists, then sup_{x∈X} f(x) = max_{x∈X} f(x) = f(x*).
Existence of optimal solutions

In the minimization problem formulated above, an optimal solution exists if X is a finite set, since then there is only a finite number of comparisons. When X is not finite, however, existence of an optimal solution is guaranteed if f is continuous and X is compact, a result known as the Weierstrass theorem. For the special case when X is finite dimensional, we should recall that compactness is equivalent to being closed and bounded.
Necessary and sufficient conditions for optimality

Let S = Rⁿ, and let f: Rⁿ → R be a differentiable function. If X is an open set, a first order necessary condition for an optimal solution to satisfy is

∇f(x*) = 0.

If, in addition, f is twice continuously differentiable on Rⁿ, a second order necessary condition is

∇²f(x*) ≥ 0.

The pair of conditions {∇f(x*) = 0, ∇²f(x*) > 0} is sufficient for x* ∈ X to be a
locally minimizing solution. These conditions are also sufficient for global optimality if, in addition, X is a convex set and f is a convex functional on X. These results, by and large, hold also for the case when S is infinite dimensional, but then one has to replace the gradient vector and the Hessian matrix by first and second Gâteaux (or Fréchet) derivatives, and the positive definiteness requirement is replaced by "strong positiveness" of an operator. See Luenberger (1969) for these extensions.
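These conditions can be illustrated on the convex functional f(x) = (x₁ − 1)² + 2x₂² (an example of ours, not from the text): ∇f vanishes at x* = (1, 0), the Hessian diag(2, 4) is positive definite, and since f is convex the point is globally minimizing. A simple gradient descent locates it:

```python
def f(x):
    return (x[0] - 1.0) ** 2 + 2.0 * x[1] ** 2

def grad_f(x):
    # exact gradient of f: a row vector (2*(x1 - 1), 4*x2)
    return (2.0 * (x[0] - 1.0), 4.0 * x[1])

# Gradient descent converges to the point where the first order condition
# grad f(x*) = 0 holds; the Hessian diag(2, 4) > 0 certifies a local
# minimum, and convexity of f upgrades it to a global one.
x = (5.0, -3.0)
step = 0.1
for _ in range(500):
    g = grad_f(x)
    x = (x[0] - step * g[0], x[1] - step * g[1])

assert abs(x[0] - 1.0) < 1e-8 and abs(x[1]) < 1e-8
assert all(abs(c) < 1e-7 for c in grad_f(x))
```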
Appendix II
Some Notions of Probability Theory
This appendix briefly presents some notions of probability theory which are used in the text. For a more complete exposition the reader should consult standard texts on probability theory, such as Feller (1971), Papoulis (1965), Ash (1972) and Loève (1963). In two of the sections of the book (viz. sections 5.3 and 6.7) we have used some material on stochastic processes and stochastic differential equations which is, however, not covered in this appendix; for this, the reader should consult Wong (1971), Gikhman and Skorohod (1972), Fleming and Rishel (1975) and the references cited in section 6.7.
II.1 INGREDIENTS OF PROBABILITY THEORY
Let Ω denote a set whose elements are the outcomes of a random experiment. This random experiment might be the toss of a coin (in which case Ω has only two elements), or the selection of an integer from the set [0, ∞) (in which case Ω is countably infinite), or the continuous roulette wheel, which corresponds to a nondenumerable Ω. Any subset of Ω on which a probability measure can be defined is known as an event. Specifically, if F denotes the class of all such events (i.e. subsets of Ω), then it has the following properties:
(1) Ω ∈ F.
(2) If A ∈ F, then its complement Aᶜ = {ω ∈ Ω: ω ∉ A} also belongs to F. (The empty set φ, being the complement of Ω, also belongs to F.)
(3) If A₁, A₂ ∈ F, then A₁ ∩ A₂ and A₁ ∪ A₂ also belong to F.
(4) If A₁, A₂, …, Aᵢ, … denote a countable number of events, the countable intersection ∩ᵢ₌₁^∞ Aᵢ and the countable union ∪ᵢ₌₁^∞ Aᵢ are also events (i.e. ∈ F).

The class F, thus defined, is called a sigma algebra (σ-algebra), and a probability measure P is a nonnegative functional defined on the elements of this σ-algebra. P also satisfies the following axioms:
(1) For every event A ∈ F, 0 ≤ P(A) ≤ 1, and P(Ω) = 1.
(2) If A₁, A₂ ∈ F and A₁ ∩ A₂ = φ (i.e. A₁ and A₂ are disjoint events), then P(A₁ ∪ A₂) = P(A₁) + P(A₂).
(3) Let {Aᵢ} denote a (countably) infinite sequence in F, with the properties Aᵢ₊₁ ⊂ Aᵢ and ∩ᵢ₌₁^∞ Aᵢ = φ. Then the limit of the sequence of real numbers {P(Aᵢ)} is zero (i.e. limᵢ→∞ P(Aᵢ) = 0).

The triple (Ω, F, P) defined above is known as a probability space, while the pair (Ω, F) is called a measurable space. If Ω = Rⁿ, then its subsets of interest are the n-dimensional rectangles, and the smallest σ-algebra generated by these rectangles is called the n-dimensional Borel σ-algebra and is denoted Bⁿ. Elements of Bⁿ are Borel sets, and the pair (Rⁿ, Bⁿ) is a Borel (measurable) space. A probability measure defined on this space is known as a Borel probability measure.

Finite and countable probability spaces

If Ω is a finite set (say, Ω = {ω₁, ω₂, …, ωₙ}), we can assign probability weights to individual elements of Ω, instead of to subsets of Ω, in which case we write pᵢ to denote the probability of the single event ωᵢ. We call the n-tuple (p₁, p₂, …, pₙ) a probability distribution over Ω. Clearly, we have the restriction 0 ≤ pᵢ ≤ 1 ∀i = 1, …, n, and furthermore, if the elements of Ω are all disjoint (independent) events, we have the property Σᵢ₌₁ⁿ pᵢ = 1. The same convention applies when Ω is a countable set (i.e. Ω = {ω₁, ω₂, …, ωᵢ, …}), in which case we simply replace n by ∞.
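A minimal illustration (ours, not from the text) of a finite probability space, taking Ω to be the outcomes of one fair die roll, with exact rational weights:

```python
from fractions import Fraction

# A finite probability space: Omega = outcomes of one fair die roll.
omega = [1, 2, 3, 4, 5, 6]
p = {w: Fraction(1, 6) for w in omega}            # probability distribution

assert all(0 <= p[w] <= 1 for w in omega)
assert sum(p.values()) == 1                       # the weights p_i sum to 1

def P(event):
    """Probability measure induced on subsets (events) of Omega."""
    return sum(p[w] for w in event)

A1, A2 = {1, 3, 5}, {2, 4}                        # two disjoint events
assert P(A1 | A2) == P(A1) + P(A2)                # additivity axiom (2)
assert P(set(omega)) == 1                         # P(Omega) = 1, axiom (1)
```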
II.2 RANDOM VECTORS

Let (Ω₁, F₁) and (Ω₂, F₂) be two measurable spaces, and let f be a function defined from the domain set Ω₁ into the range set Ω₂. If for every A ∈ F₂ we have f⁻¹(A) ≜ {ω ∈ Ω₁: f(ω) ∈ A} ∈ F₁, then f is said to be a measurable function, or a measurable transformation of (Ω₁, F₁) into (Ω₂, F₂). If the latter measurable space is a Borel space, then f is said to be a Borel function, in which case we denote it by x. For the special case when the Borel space is (Ω₂, F₂) = (R, B), the Borel function x is called a random variable. For the case when (Ω₂, F₂) = (Rⁿ, Bⁿ), x is known as an n-dimensional random vector. If there is a probability measure P defined on (Ω₁, F₁), which we henceforth write simply as (Ω, F), then the random vector x will induce a probability measure Pₓ on the Borel space (Rⁿ, Bⁿ), so that for every B ∈ Bⁿ we have Pₓ(B) = P(x⁻¹(B)). Since the elements of Bⁿ are generated by the n-dimensional rectangles, the arguments of Pₓ are in general infinite sets; however, considering the collection of sets {ξ ∈ Rⁿ: ξᵢ < aᵢ, i = 1, …, n} in Bⁿ, where aᵢ (i = 1, …, n) are real numbers, the restriction of Pₓ to this class is also a probability measure whose argument is now a finite set. We denote this probability measure by
P_x(a_1, a_2, …, a_n), and call it the probability distribution function of the random vector x. Note that

P_x(a_1, a_2, …, a_n) = P({ω ∈ Ω : x_1(ω) < a_1, x_2(ω) < a_2, …, x_n(ω) < a_n}),

where x_i is a random variable denoting the ith component of x. Whenever n > 1, P_x is sometimes also called the cumulative (joint) probability distribution function. It is a well-established fact that there is a one-to-one correspondence between the probability measure P_x and its distribution function, and the class of sets on which the distribution function is defined can generate the whole of B^n (cf. Loève, 1963).

Independence

Given the probability distribution function of a random vector x = (x_1, …, x_n)′, the (marginal) distribution function of each random variable x_i can be obtained from

P_{x_i}(a_i) = lim_{a_j → ∞, j ≠ i} P_x(a_1, …, a_n).

The random variables x_1, …, x_n are said to be (statistically) independent if

P_x(a_1, …, a_n) = P_{x_1}(a_1) P_{x_2}(a_2) ⋯ P_{x_n}(a_n), for all scalars a_1, …, a_n.

Probability density function

A measure defined on subintervals of the real line which equals the length of the corresponding subinterval(s) is called a Lebesgue measure. It assigns zero weight to countable subsets of the real line, and its definition can readily be extended to n-dimensional rectangles in R^n. Let P be a Borel probability measure on (R^n, B^n) such that any element of B^n which has a Lebesgue measure of zero also has a P-measure of zero; then we say that P is absolutely continuous with respect to the Lebesgue measure. Now, a well-established result of probability theory says that (cf. Loève, 1963), if x: (Ω, F, P) → (R^n, B^n, P_x) is a random vector and if P_x is absolutely continuous with respect to the Lebesgue measure, there exists a nonnegative Borel function p_x(·) such that, for every A ∈ B^n,

P_x(A) = ∫_A p_x(ξ) dξ.
Such a function p_x(·) is called the probability density function of the random vector x. In terms of the distribution function P_x, the preceding relation can be written as

P_x(a_1, …, a_n) = ∫_{-∞}^{a_1} ⋯ ∫_{-∞}^{a_n} p_x(ξ_1, …, ξ_n) dξ_1 ⋯ dξ_n

for every scalar a_1, …, a_n.
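As a numerical sanity check of this relation (our own sketch; the choice of a standard Gaussian density for p_x is an illustrative assumption, not from the text), integrating a density up to a should reproduce the distribution function P_x(a):

```python
import math

# Standard Gaussian density p_x on R (illustrative choice of density).
def p(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Distribution function P_x(a) = integral of p over (-inf, a],
# approximated by the trapezoidal rule on [-10, a] (the tail below -10
# carries negligible mass).
def P(a, steps=20000):
    lo = -10.0
    h = (a - lo) / steps
    s = 0.5 * (p(lo) + p(a)) + sum(p(lo + i * h) for i in range(1, steps))
    return s * h

# Compare with the closed form of the Gaussian distribution function,
# expressed via the error function.
for a in (-1.0, 0.0, 1.5):
    exact = 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))
    assert abs(P(a) - exact) < 1e-6
print("density integrates to the distribution function")
```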
II.3 INTEGRALS AND EXPECTATION
Let x: (Ω, F, P) → (R^n, B^n, P_x) be a random vector and f: (R^n, B^n) → (R^n, B^n) be a nonnegative Borel function. Then, f can also be considered as a random vector from (Ω, F) into (R^n, B^n), and its average value (expected value) is defined either by ∫_Ω f(x(ω)) P(dω) or by ∫_{R^n} f(ξ) P_x(dξ), depending on which interpretation we adopt. Both of these integrals are well defined and equal in value. If f changes sign, then we take f = f⁺ − f⁻, where both f⁺ and f⁻ are nonnegative, and write the expected value of f as

E[f(x)] = E[f⁺(x)] − E[f⁻(x)],

provided that at least one of the pair E[f⁺(x)] and E[f⁻(x)] is finite. Since, by definition, P_x(dξ) = P_x(ξ + dξ) − P_x(ξ) ≜ dP_x(ξ), this integral can further be written as E[f(x)] = ∫_{R^n} f(ξ) dP_x(ξ), which is a Lebesgue-Stieltjes integral and which is the convention that we shall adopt. For the special case when f(x) = x we have

E[x] ≜ ∫_{R^n} ξ dP_x(ξ) ≜ x̄,

which is known as the mean (expected) value of x. The covariance of the n-dimensional random vector x is defined as

E[(x − x̄)(x − x̄)′] = ∫_{R^n} (ξ − x̄)(ξ − x̄)′ dP_x(ξ) ≜ cov(x),

which is a nonnegative definite matrix of dimension (n × n). Now, if P_x is absolutely continuous with respect to the Lebesgue measure, E[f] can equivalently be written, in terms of the corresponding density function p_x, as

E[f(x)] = ∫_{R^n} f(ξ) p_x(ξ) dξ.

If Ω consists only of a finite number of events ω_1, ω_2, …, ω_n, then the integrals are all replaced by the single summation

E[f(x(ω))] = Σ_{i=1}^n f(x(ω_i)) p_i,

where p_i denotes the probability of occurrence of the event ω_i. For a countable set Ω, we have the counterpart

E[f(x(ω))] = lim_{n→∞} Σ_{i=1}^n f(x(ω_i)) p_i.
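The finite-Ω summation is easy to exercise directly. The sketch below (ours; the six-point space and the helper `E` are illustrative assumptions) computes E[f(x(ω))] for a few choices of f, including the mean x̄ and the scalar analogue of cov(x):

```python
from fractions import Fraction

# Finite probability space: Omega = {w_1, ..., w_6} with p_i = 1/6
# (illustrative), x the identity random variable.
omega = [1, 2, 3, 4, 5, 6]
p = [Fraction(1, 6)] * 6

def E(g):
    # E[g(x(w))] = sum_{i=1}^n g(x(w_i)) * p_i
    return sum(g(w) * pi for w, pi in zip(omega, p))

mean = E(lambda w: w)               # the mean E[x]
var = E(lambda w: (w - mean) ** 2)  # scalar analogue of cov(x)
print(mean, var)  # 7/2 35/12
```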
Appendix III
Fixed Point Theorems
In this appendix we give, without proof, two theorems which have been used in Chapters 3 and 4 of the text. Proofs of these theorems are rather lengthy and can be found in the references cited.
Fixed point theorems

THEOREM 1 (Brouwer fixed point theorem) If S is a compact and convex subset of R^n and f is a continuous function mapping S into itself, then there exists at least one x ∈ S such that f(x) = x. □
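The theorem is an existence result and gives no algorithm, but in one dimension a fixed point of a sign-changing g(x) = f(x) − x can be located by bisection. A minimal sketch (ours; the map cos on S = [0, 1] is an illustrative choice, not from the text):

```python
import math

# f maps the compact convex set S = [0, 1] continuously into itself
# (cos is decreasing on [0, 1] with values in [cos 1, 1], a subset of
# [0, 1]), so Brouwer guarantees some x with f(x) = x.
f = math.cos

# Locate it by bisection on g(x) = f(x) - x, which is positive at 0
# (g(0) = 1) and negative at 1 (g(1) = cos 1 - 1 < 0).
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if f(mid) - mid > 0:
        lo = mid
    else:
        hi = mid

x = 0.5 * (lo + hi)
assert abs(f(x) - x) < 1e-12
print(round(x, 6))  # 0.739085
```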
Several proofs exist for this fixed point theorem, one of the most elementary ones being given by Kuga (1974). A generalization of this theorem is the Kakutani fixed point theorem (Kakutani, 1941) given below.

DEFINITION 1 (Upper semicontinuity) Let f be a function defined on a normed linear space X, associating with each x ∈ X a subset f(x) of some (other) normed linear space Y. Then f is said to be upper semicontinuous (usc) at a point x_0 ∈ X if, for any sequence {x_i} converging to x_0 and any sequence {y_i ∈ f(x_i)} converging to y_0, we have y_0 ∈ f(x_0). The function f is upper semicontinuous if it is usc at each point of X. □
THEOREM 2 (Kakutani) Let S be a compact and convex subset of R^n, and let f be an upper semicontinuous function which assigns to each x ∈ S a closed and convex subset of S. Then there exists some x ∈ S such that x ∈ f(x). □
References

Anderson, B. D. O. and J. B. Moore (1971). "Linear Optimal Control". Prentice Hall, New York.
Ash, R. B. (1972). "Real Analysis and Probability". Academic Press, New York and London.
Aumann, R. J. (1961). Borel structures for function spaces. Illinois Journal of Mathematics, 5, 614-630.
Aumann, R. J. (1964). Mixed and behavior strategies in infinite extensive games. In M. Dresher, L. S. Shapley and A. W. Tucker, eds, "Advances in Game Theory", pp. 627-650. Princeton University Press, Princeton, New Jersey.
Aumann, R. J. and M. Maschler (1972). Some thoughts on the minimax principle. Management Science, 18, no. 5, part 2, 54-63.
Bagchi, A. and T. Basar (1981). Stackelberg strategies in linear-quadratic stochastic differential games. Journal of Optimization Theory and Applications, 35, no. 3.
Bagchi, A. and G. J. Olsder (1981). Linear stochastic pursuit evasion games. Journal of Applied Mathematics and Optimization, 7, 95-123.
Balakrishnan, A. V. (1976). "Applied Functional Analysis". Springer-Verlag, Berlin.
Basar, T. (1973). On the relative leadership property of Stackelberg strategies. Journal of Optimization Theory and Applications, 11, no. 6, 655-661.
Basar, T. (1974). A counter example in linear-quadratic games: Existence of nonlinear Nash solutions. Journal of Optimization Theory and Applications, 14, no. 4, 425-430.
Basar, T. (1975). Nash strategies for M-person differential games with mixed information structures. Automatica, 11, no. 5, 547-551.
Basar, T. (1976a). On the uniqueness of the Nash solution in linear-quadratic differential games. International Journal of Game Theory, 5, no. 2/3, 65-90.
Basar, T. (1976b). Some thoughts on saddle-point conditions and information structures in zero-sum differential games. Journal of Optimization Theory and Applications, 18, no. 1, 165-170.
Basar, T. (1977a). Two general properties of the saddle-point solutions of dynamic games. IEEE Transactions on Automatic Control, AC-22, no. 1, 124-126.
Basar, T. (1977b).
Informationally nonunique equilibrium solutions in differential games. SIAM Journal on Control and Optimization, 15, no. 4, 636-660.
Basar, T. (1977c). Existence of unique Nash equilibrium solutions in nonzero-sum stochastic differential games. In E. O. Roxin, P. T. Liu and R. L. Sternberg, eds, "Differential Games and Control Theory II", pp. 201-228. Marcel Dekker, New York.
Basar, T. (1977d). "Multicriteria Optimization of Linear Deterministic and Stochastic Systems: A Survey and Some New Results". Marmara Research Institute Publication, Applied Mathematics Division Report no. 37.
Basar, T. (1978). Decentralized multicriteria optimization of linear stochastic systems.
IEEE Transactions on Automatic Control, AC-23, no. 3, 233-243.
Basar, T. (1979a). Stochastic linear-quadratic nonzero-sum differential games with delayed information patterns. In A. Niemi, ed., "Proceedings of the 7th IFAC World Congress", vol. 2. Pergamon Press, New York.
Basar, T. (1979b). Information structures and equilibria in dynamic games. In M. Aoki and A. Marzollo, eds, "New Trends in Dynamic System Theory and Economics", pp. 3-55. Academic Press, New York and London.
Basar, T. (1979c). Stochastic stagewise Stackelberg strategies for linear-quadratic systems. In M. Kohlmann and W. Vogel, eds, "Stochastic Control Theory and Stochastic Differential Systems", pp. 264-276. Lecture Notes in Control and Information Sciences, vol. 16. Springer-Verlag, Berlin.
Basar, T. (1980a). On the existence and uniqueness of closed-loop sampled-data Nash controls in linear-quadratic stochastic differential games. In K. Iracki, K. Malonowski and S. Walukiewicz, eds, "Optimization Techniques", part I, pp. 193-203. Lecture Notes in Control and Information Sciences, vol. 22. Springer-Verlag, Berlin.
Basar, T. (1980b). Equilibrium strategies in dynamic games with multi levels of hierarchy. In "Proceedings of the Second IFAC Symposium on Large Scale Systems: Theory and Applications", Toulouse, France.
Basar, T. (1980c). Memory strategies and a general theory for Stackelberg games with partial dynamic information. In "Proceedings of the Fourth International Conference on Analysis and Optimization of Systems", Versailles, France.
Basar, T. (1980d). Hierarchical decision making under uncertainty. In P. T. Liu, ed., "Dynamic Optimization and Mathematical Economics", pp. 205-221. Plenum Press, New York.
Basar, T. (1981). A new method for the Stackelberg solution of differential games with sampled-data state information. In "Proceedings of the Eighth IFAC World Congress", Kyoto, Japan.
Basar, T. and Y. C. Ho (1974). Informational properties of the Nash solutions of two stochastic nonzero-sum games.
Journal of Economic Theory, 7, no. 4, 370-387.
Basar, T. and M. Mintz (1972). On the existence of linear saddle-point strategies for a two-person zero-sum stochastic game. In "Proceedings of the Eleventh IEEE Conference on Decision and Control", New Orleans, pp. 188-192.
Basar, T. and M. Mintz (1973). A multistage pursuit-evasion game that admits a Gaussian random process as a maximin control policy. Stochastics, 1, no. 1, 25-69.
Basar, T. and G. J. Olsder (1980a). Mixed Stackelberg strategies in continuous-kernel games. IEEE Transactions on Automatic Control, AC-25, no. 2.
Basar, T. and G. J. Olsder (1980b). Team-optimal closed-loop Stackelberg strategies in hierarchical control problems. Automatica, 16, no. 4.
Basar, T. and H. Selbuz (1976). Properties of Nash solutions of a two-stage nonzero-sum game. IEEE Transactions on Automatic Control, AC-21, no. 1, 48-54.
Basar, T. and H. Selbuz (1979a). A new approach for derivation of closed-loop Stackelberg strategies. In "Proceedings of the 19th IEEE Conference on Decision and Control", San Diego, California, pp. 1113-1118.
Basar, T. and H. Selbuz (1979b). Closed-loop Stackelberg strategies with applications in the optimal control of multilevel systems. IEEE Transactions on Automatic Control, AC-24, no. 2, 166-179.
Bellman, R. (1957). "Dynamic Programming". Princeton University Press, Princeton, New Jersey.
Bensoussan, A. (1971). Saddle points of convex concave functionals. In H. W. Kuhn and G. P. Szegö, eds, "Differential Games and Related Topics", pp. 177-200. North-Holland, Amsterdam.
Bensoussan, A. (1974). Points de Nash dans le cas de fonctionnelles quadratiques et jeux différentiels linéaires à N personnes. SIAM Journal on Control, 12, no. 3, 237-243.
Berkovitz, L. D. (1964). A variational approach to differential games. In M. Dresher, L. S. Shapley and A. W. Tucker, eds, "Advances in Game Theory", pp. 127-174. Princeton University Press, Princeton, New Jersey.
Berkovitz, L. D. (1974). "Optimal Control Theory". Springer-Verlag, Berlin.
Bernhard, P. (1971). Condition de coin pour les jeux différentiels. Séminaire des 21-25 Juin, "Les Jeux Différentiels", Centre d'Automatique de l'Ecole Nationale Supérieure des Mines de Paris.
Bernhard, P. (1977). Singular surfaces in differential games: an introduction. In P. Hagedorn, H. W. Knobloch and G. J. Olsder, eds, "Differential Games and Applications", pp. 1-33. Lecture Notes in Control and Information Sciences, vol. 3. Springer-Verlag, Berlin.
Bernhard, P. (1979). Linear quadratic two-person zero-sum differential game: necessary and sufficient condition. Journal of Optimization Theory and Applications, 27, no. 1.
Blackwell, D. and M. A. Girshick (1954). "Theory of Games and Statistical Decisions". John Wiley, New York.
Blaquière, A., ed. (1973). "Topics in Differential Games". North-Holland, Amsterdam.
Blaquière, A. (1976). Une généralisation du concept d'optimalité et de certaines notions géométriques qui s'y rattachent. Institut des Hautes Etudes de Belgique, Cahiers du centre d'études de recherche opérationnelle, vol. 18, no. 1-2, Bruxelles, 49-61.
Blaquière, A. (1977). Differential games with piecewise continuous trajectories. In P. Hagedorn, H. W. Knobloch and G. J. Olsder, eds, "Differential Games and Applications", pp. 34-69. Lecture Notes in Control and Information Sciences, vol. 3. Springer-Verlag, Berlin.
Blaquière, A., F.
Gérard and G. Leitmann (1969). "Quantitative and Qualitative Games". Academic Press, New York and London.
Bogardi, I. and F. Szidarovsky (1976). Application of game theory in water management. Applications of Mathematical Modelling, 1, 16-20.
Boltyanski, V. G. (1978). "Optimal Control of Discrete Systems". Halsted Press, John Wiley, New York.
Boot, J. C. G. (1964). "Quadratic Programming". North-Holland, Amsterdam.
Borel, E. (1953). The theory of play and integral equations with skew symmetrical kernels; On games that involve chance and the skill of the players; On systems of linear forms of skew symmetric determinants and the general theory of play, trans. by L. J. Savage. Econometrica, 21, 97-117.
Breakwell, J. V. (1971). Exemples élémentaires. Séminaire des 21-25 Juin, "Les Jeux Différentiels", Centre d'Automatique de l'Ecole Nationale Supérieure des Mines de Paris.
Breakwell, J. V. (1973). Some differential games with interesting discontinuities. Internal report, Stanford University.
Breakwell, J. V. (1977). Zero-sum differential games with terminal payoff. In P. Hagedorn, H. W. Knobloch and G. J. Olsder, eds, "Differential Games and
Applications", pp. 70-95. Lecture Notes in Control and Information Sciences, vol. 3. Springer-Verlag, Berlin.
Brockett, R. (1970). "Finite Dimensional Linear Systems". John Wiley, New York.
Brown, G. W. (1951). Iterative solutions of games by fictitious play. In T. C. Koopmans, ed., "Activity Analysis of Production and Allocation", pp. 374-376. Cowles Commission Monograph, vol. 13. John Wiley, New York.
Brown, G. W. and J. von Neumann (1950). Solutions of games by differential equations. In H. W. Kuhn and A. W. Tucker, eds, "Contributions to the Theory of Games", vol. I, pp. 73-79. Annals of Mathematics Studies, no. 24. Princeton University Press, Princeton, New Jersey.
Bryson, A. E. and Y. C. Ho (1969). "Applied Optimal Control". Blaisdell, Waltham, Massachusetts.
Burger, E. (1966). "Einführung in die Theorie der Spiele" (2. Auflage). Walter de Gruyter and Company, Berlin.
Burrow, J. L. (1978). A multistage game with incomplete information requiring an infinite memory. Journal of Optimization Theory and Applications, 24, 337-360.
Canon, M. D., C. D. Cullum, Jr and E. Polak (1970). "Theory of Optimal Control and Programming". McGraw-Hill, New York.
Case, J. H. (1967). "Equilibrium Points of N-person Differential Games". Ph.D. Thesis, University of Michigan, Department of Industrial Engineering, TR no. 1967-1, Ann Arbor, Michigan.
Case, J. H. (1969). Toward a theory of many player differential games. SIAM Journal on Control, 7, no. 2.
Case, J. H. (1971). Applications of the theory of differential games to economic problems. In H. W. Kuhn and G. P. Szegö, eds, "Differential Games and Related Topics", pp. 345-371. North-Holland, Amsterdam.
Case, J. H. (1979). "Economics and the Competitive Process". New York University Press, New York.
Castanon, D. A. (1976). "Equilibria in Stochastic Dynamic Games of Stackelberg Type". Ph.D. Thesis, M.I.T., Electronics Systems Laboratory, Cambridge, Massachusetts.
Chen, C. I. and J. B. Cruz, Jr (1972).
Stackelberg solution for two-person games with biased information patterns. IEEE Transactions on Automatic Control, AC-17, no. 5, 791-798.
Chigir, S. A. (1976). The game problem on the dolichobrachistochrone. PMM, 30, 1003-1013.
Ciletti, M. D. (1970). On the contradiction of the bang-bang-bang surfaces in differential games. Journal of Optimization Theory and Applications, 5, no. 3, 163-169.
Clemhout, S., G. Leitmann and H. Y. Wan, Jr (1973). A differential game model of oligopoly. Journal of Cybernetics, 3, no. 1, 24-39.
Coddington, E. A. and N. Levinson (1955). "Theory of Ordinary Differential Equations". McGraw-Hill, New York.
Cournot, A. (1838). "Recherches sur les Principes Mathématiques de la Théorie des Richesses". Hachette, Paris. (English edition published in 1960 by Kelley, New York, under the title "Researches into the Mathematical Principles of the Theory of Wealth", trans. by N. T. Bacon.)
Cruz, J. B., Jr (1978). Leader-follower strategies for multilevel systems. IEEE Transactions on Automatic Control, AC-23, no. 2, 244-254.
Dantzig, G. B. (1963). "Linear Programming and Extensions". Princeton University Press, Princeton, New Jersey.
Davis, M. H. A. (1973). On the existence of optimal policies in stochastic control. SIAM Journal on Control, 11, 587-594.
Davis, M. H. A. and P. P. Varaiya (1973). Dynamic programming conditions for partially observable stochastic systems. SIAM Journal on Control, 11, 226-261.
Dresher, M., A. W. Tucker and P. Wolfe, eds (1957). "Contributions to the Theory of Games", vol. III, Annals of Mathematics Studies, no. 39. Princeton University Press, Princeton, New Jersey.
Dresher, M., L. S. Shapley and A. W. Tucker, eds (1964). "Advances in Game Theory", Annals of Mathematics Studies, no. 52. Princeton University Press, Princeton, New Jersey.
Edgeworth, F. Y. (1925). "Papers Relating to Political Economy", vol. 1. Macmillan, London.
Elliott, R. J. (1976). The existence of value in stochastic differential games. SIAM Journal on Control and Optimization, 14, 85-94.
Elliott, R. J. (1977). Feedback strategies in deterministic differential games. In P. Hagedorn, H. W. Knobloch and G. J. Olsder, eds, "Differential Games and Applications", pp. 136-142. Lecture Notes in Control and Information Sciences, vol. 3. Springer-Verlag, Berlin.
Elliott, R. J. and N. J. Kalton (1972). The existence of value in differential games of pursuit and evasion. Journal of Differential Equations, 12, 504-523.
Elliott, R. J., N. J. Kalton and L. Markus (1973). Saddle-points for linear differential games. SIAM Journal on Control, 11, 100-112.
Epstein, R. A. (1967). "The Theory of Gambling and Statistical Logic". Academic Press, New York and London.
Fan, K. (1953). Minimax theorems. Proceedings of the National Academy of Sciences, USA, 39, 42-47.
Feller, W. (1971). "An Introduction to Probability Theory and Its Applications", 2nd edn, vol. II. John Wiley, New York.
Fleming, W. H. (1961). The convergence problem for differential games. Journal of Mathematical Analysis and Applications, 3, 102-116.
Fleming, W.
H. (1964). The convergence problem for differential games II. In M. Dresher, L. S. Shapley and A. W. Tucker, eds, "Advances in Game Theory", pp. 195-210. Annals of Mathematics Studies, no. 52. Princeton University Press, Princeton, New Jersey.
Fleming, W. H. (1969). Optimal continuous-parameter stochastic control. SIAM Review, 11, 470-509.
Fleming, W. H. and R. W. Rishel (1975). "Deterministic and Stochastic Optimal Control". Springer-Verlag, Berlin.
Foreman, J. G. (1977). The princess and the monster on the circle. SIAM Journal on Control and Optimization, 15, 841-856.
Fox, L. (1964). "An Introduction to Numerical Linear Algebra". Clarendon Press, Oxford.
Fox, M. and G. S. Kimeldorf (1969). Noisy duels. SIAM Journal of Applied Mathematics, 17, no. 2, 353-361.
Friedman, A. (1971). "Differential Games". John Wiley, New York.
Friedman, A. (1972). Stochastic differential games. Journal of Differential Equations, 11, no. 1, 79-108.
Friedman, J. W. (1977). "Oligopoly and the Theory of Games". North-Holland, Amsterdam.
Gal, S. (1979). Search games with mobile and immobile hider. SIAM Journal on Control and Optimization, 17, 99-122.
Getz, W. M. and M. Pachter (1980). Two-target pursuit-evasion differential games in the plane. Journal of Optimization Theory and Applications, 30.
Gikhman, I. I. and A. V. Skorohod (1972). "Stochastic Differential Equations". Springer-Verlag, Berlin.
Glicksberg, I. L. (1950). Minimax theorem for upper and lower semicontinuous payoffs. Rand Corporation Research Memorandum RM-478.
Glicksberg, I. L. (1952). A further generalization of the Kakutani fixed point theorem, with application to Nash equilibrium points. Proceedings of the American Mathematical Society, 3, 170-174.
Grote, J. D., ed. (1975). "The Theory and Applications of Differential Games". Reidel Publishing Company, The Netherlands.
Hagedorn, P., H. W. Knobloch and G. J. Olsder, eds (1977). "Differential Games and Applications", Lecture Notes in Control and Information Sciences, vol. 3. Springer-Verlag, Berlin.
Hajek, O. (1975). "Pursuit Games". Academic Press, New York and London.
Hajek, O. (1979). Discontinuous differential equations, I and II. Journal of Differential Equations, 32, 149-185.
Halanay, A. (1968). Differential games with delay. SIAM Journal on Control, 6, no. 4, 579-593.
Hale, J. (1977). "Theory of Functional Differential Equations". Springer-Verlag, Berlin.
Hämäläinen, R. P. (1976). Nash and Stackelberg solutions to general linear-quadratic two-player difference games. Helsinki University of Technology publication, Systems Theory Laboratory, Report B-29, Espoo, Finland, 1976.
Haurie, A. and R. Lhousni (1979). A duopoly model with variable leadership: Stackelberg solution. Presented at the International Atlantic Economic Conference, Salzburg, Austria, May 10-18.
Ho, Y. C., ed. (1969).
"Proceedings of the First International Conference on the Theory and Applications of Differential Games", Amherst, Massachusetts.
Ho, Y. C. (1970). Survey paper: Differential games, dynamic optimization and generalized control theory. Journal of Optimization Theory and Applications, 6, no. 3, 179-209.
Ho, Y. C. and S. K. Mitter (1976). "Directions in Large Scale Systems". Plenum Press, New York.
Ho, Y. C. and G. J. Olsder (1980). Differential games: concepts and applications. In M. Shubik, ed., "Proceedings of Mathematical Models of Conflict and Combat: Uses and Theory".
Ho, Y. C., A. E. Bryson, Jr and S. Baron (1965). Differential games and optimal pursuit-evasion strategies. IEEE Transactions on Automatic Control, AC-10, no. 4, 385-389.
Ho, Y. C., P. B. Luh and G. J. Olsder (1980). A control-theoretic view on incentives. In A. Bensoussan and J. L. Lions, eds, "Proceedings of the Fourth International Conference on Analysis and Optimization of Systems", Versailles, France, pp. 359-383. Lecture Notes in Control and Information Sciences, vol. 28. Springer-Verlag, Berlin.
Ho, Y. C., P. B. Luh and R. Muralidharan (1981). Information structure, Stackelberg
games and incentive controllability. IEEE Transactions on Automatic Control, AC-26, no. 2.
Howard, N. (1971). "Paradoxes of Rationality: Theory of Metagames and Political Behavior". The MIT Press, Cambridge, Massachusetts.
Intriligator, M. D. (1971). "Mathematical Optimization and Economic Theory". Prentice Hall, Englewood Cliffs, New Jersey.
Isaacs, R. (1954-56). Differential Games I, II, III, IV. Rand Corporation Research Memoranda RM-1391, 1399, 1411, 1468.
Isaacs, R. (1975). "Differential Games", 2nd edn. Krieger Publishing Company, Huntington, New York. (First edition: Wiley, New York, 1965.)
Isaacs, R. (1969). Differential games: their scope, nature and future. Journal of Optimization Theory and Applications, 3, no. 5.
Jacobson, D. H. (1977). On values and strategies for infinite-time linear quadratic games. IEEE Transactions on Automatic Control, AC-22, 490-491.
Kakutani, S. (1941). A generalization of Brouwer's fixed point theorem. Duke Mathematical Journal, 8, 457-459.
Karlin, S. (1959). "Matrix Games, Programming and Mathematical Economics", vols I and II. Addison-Wesley, Reading, Massachusetts.
Krassovski, N. and A. Soubbotine (1977). "Jeux Différentiels". Editions Mir, Moscow. (Original Russian edition published in 1974.)
Kuga, K. (1974). Brouwer's fixed point theorem: An alternative proof. SIAM Journal on Mathematical Analysis, 5, no. 6, 893-897.
Kuhn, H. W. (1953). Extensive games and the problem of information. In H. W. Kuhn and A. W. Tucker, eds, "Contributions to the Theory of Games", pp. 193-216. Annals of Mathematics Studies, no. 28. Princeton University Press, Princeton, New Jersey.
Kuhn, H. W. and G. P. Szegö, eds (1971). "Differential Games and Related Topics". North-Holland, Amsterdam.
Kuhn, H. W. and A. W. Tucker, eds (1950). "Contributions to the Theory of Games", vol. I, Annals of Mathematics Studies, no. 24. Princeton University Press, Princeton, New Jersey.
Kuhn, H. W. and A. W. Tucker, eds (1953). "Contributions to the Theory of Games", vol.
II, Annals of Mathematics Studies, no. 28. Princeton University Press, Princeton, New Jersey.
Kydland, F. (1975). Noncooperative and dominant player solutions in discrete dynamic games. International Economic Review, 16, no. 2, 321-335.
Leitmann, G. (1974). "Cooperative and Noncooperative Many Player Differential Games", CISM Monograph no. 190. Springer-Verlag, Vienna.
Leitmann, G., ed. (1976). "Multicriteria Decision Making and Differential Games". Plenum Press, New York.
Leitmann, G. (1977). Many player differential games. In P. Hagedorn, H. W. Knobloch and G. J. Olsder, eds, "Differential Games and Applications", pp. 153-171. Lecture Notes in Control and Information Sciences, vol. 3. Springer-Verlag, Berlin.
Leitmann, G. (1978). On generalized Stackelberg strategies. Journal of Optimization Theory and Applications, 26, no. 6.
Leitmann, G. and H. Y. Wan, Jr (1979). Macroeconomic stabilization policy for an uncertain dynamic economy. In M. Aoki and A. Marzollo, eds, "New Trends in Dynamic System Theory and Economics", pp. 105-136. Academic Press, London and New York.
Lemke, C. E. and J. F. Howson (1964). Equilibrium points of bimatrix games. SIAM Journal on Applied Mathematics, 12, 413-423.
Levine, J. (1979). Two-person zero-sum differential games with incomplete information: a Bayesian model. In P. T. Liu and E. O. Roxin, eds, "Differential Games and Control Theory III", pp. 119-151. Proceedings of the 3rd Kingston Conference 1978, Part A. Lecture Notes in Pure and Applied Mathematics, vol. 44. Plenum Press, New York.
Levitan, R. E. and M. Shubik (1971). Noncooperative equilibria and strategy spaces in an oligopolistic market. In H. W. Kuhn and G. P. Szegö, eds, "Differential Games and Related Topics", pp. 429-448. North-Holland, Amsterdam.
Lewin, J. (1976). The bang-bang-bang problem revisited. Journal of Optimization Theory and Applications, 18, no. 3, 429-432.
Lewin, J. and G. J. Olsder (1979). Conic surveillance evasion. Journal of Optimization Theory and Applications, 19, 107-125.
Loève, M. (1963). "Probability Theory", 3rd edn. Van Nostrand, Princeton, New Jersey.
Luce, R. D. and H. Raiffa (1957). "Games and Decisions". Wiley, New York.
Luenberger, D. G. (1969). "Optimization by Vector Space Methods". Wiley, New York.
Luenberger, D. G. (1973). "Introduction to Linear and Nonlinear Programming". Addison-Wesley, Reading, Massachusetts.
Lukes, D. L. (1971). Equilibrium feedback control in linear games with quadratic costs. SIAM Journal on Control, 9, no. 2, 234-252.
Lukes, D. L. and D. L. Russell (1971). A global theory for linear quadratic differential games. Journal of Mathematical Analysis and Applications, 33, 96-123.
Mageirou, E. F. (1976). Values and strategies for infinite time linear quadratic games. IEEE Transactions on Automatic Control, AC-21, no. 4, 547-550.
Mageirou, E. F. (1977). Iterative techniques for Riccati game equations. Journal of Optimization Theory and Applications, 22, no. 1, 51-61.
Maitra, A. and T. Parthasarathy (1970a). On stochastic games. Journal of Optimization Theory and Applications, 5, 289-300.
Maitra, A. and T.
Parthasarathy (1970b). On stochastic games, II. Journal of Optimization Theory and Applications, 8, 154-160.
Marcus, M. and H. Minc (1964). "A Survey of Matrix Theory and Matrix Inequalities", vol. 14. Prindle, Weber and Schmidt, Boston, Massachusetts.
McKinsey, J. C. C. (1952). "Introduction to the Theory of Games". McGraw-Hill, New York.
Merz, A. W. (1971). The homicidal chauffeur: a differential game. Stanford University publication, Guidance and Control Laboratory, Report no. 418, 1971.
Merz, A. W. (1973). Optimal evasive maneuvers in maritime collision avoidance. Navigation, 20, 144-152.
Merz, A. W. (1976). A differential game solution to the coplanar tail-chase aerial combat problem. NASA-CR-137809.
Miloh, T. and S. D. Sharma (1976). Maritime collision avoidance as a differential game. Bericht no. 329, Institut für Schiffbau der Universität Hamburg.
Moraal, R. (1978). Mental poker. Master's Thesis, Twente University of Technology, Enschede, Netherlands (in Dutch).
Nash, J. (1951). Noncooperative games. Annals of Mathematics, 54, no. 2, 286-295.
Von Neumann, J. (1928). Zur Theorie der Gesellschaftsspiele. Mathematische Annalen, 100, 295-320.
Von Neumann, J. (1937). Über ein ökonomisches Gleichungssystem und eine Verallgemeinerung des Brouwerschen Fixpunktsatzes. Ergebnisse eines Mathematischen Kolloquiums, 8, 73-83.
Von Neumann, J. (1954). A numerical method to determine optimum strategy. Naval Research Logistics Quarterly, 1, 109-115.
Von Neumann, J. and O. Morgenstern (1947). "Theory of Games and Economic Behavior", 2nd edn. Princeton University Press, Princeton, New Jersey.
Nikaido, H. and K. Isoda (1955). Note on noncooperative convex games. Pacific Journal of Mathematics, 5, 807-815.
Okuguchi, K. (1976). "Expectations and Stability in Oligopoly Models", Lecture Notes in Economics and Mathematical Systems, no. 138. Springer-Verlag, Berlin.
Olsder, G. J. (1976). Some thoughts about simple advertising models as differential games and the structure of coalitions. In Y. C. Ho and S. K. Mitter, eds, "Directions in Large Scale Systems", pp. 187-206. Plenum Press, New York.
Olsder, G. J. (1977a). Information structures in differential games. In E. O. Roxin, P. T. Liu and R. L. Sternberg, eds, "Differential Games and Control Theory II", pp. 99-136. Marcel Dekker, New York.
Olsder, G. J. (1977b). On observation costs and information structures in stochastic differential games. In P. Hagedorn, H. W. Knobloch and G. J. Olsder, eds, "Differential Games and Applications", pp. 172-185. Lecture Notes in Control and Information Sciences, vol. 3. Springer-Verlag, Berlin.
Olsder, G. J. and J. V. Breakwell (1974). Role determination in an aerial dogfight. International Journal of Game Theory, 3, 47-66.
Olsder, G. J. and J. Walter (1978). Collision avoidance of ships. In J. Stoer, ed., "Proceedings 8th IFIP Conference on Optimization Techniques", pp. 264-271. Lecture Notes in Control and Information Sciences, vol. 6. Springer-Verlag, Berlin.
Owen, G. (1968). "Game Theory". Saunders, Philadelphia, Pennsylvania.
Owen, G. (1974). Existence of equilibrium pairs in continuous games. International Journal of Game Theory, 5, no.
2/3, 97-105.
Pachter, M. and Y. Yavin (1979). A stochastic homicidal chauffeur pursuit-evasion differential game. Technical Report no. TWISK 95, WNNR, CSIR, Pretoria, South Africa.
Papavassilopoulos, G. P. and J. B. Cruz, Jr (1979a). Nonclassical control problems and Stackelberg games. IEEE Transactions on Automatic Control, AC-24, no. 2, 155-166.
Papavassilopoulos, G. P. and J. B. Cruz, Jr (1979b). On the existence of solutions to coupled matrix Riccati differential equations in linear quadratic games. IEEE Transactions on Automatic Control, AC-24, no. 1.
Papavassilopoulos, G. P. and J. B. Cruz, Jr (1980). Sufficient conditions for Stackelberg and Nash strategies with memory. Journal of Optimization Theory and Applications, 31, 233-260.
Papavassilopoulos, G. P., J. V. Medanic and J. B. Cruz, Jr (1979). On the existence of Nash strategies and solutions to coupled Riccati equations in linear quadratic games. Journal of Optimization Theory and Applications, 28, no. 1, 49-76.
Papoulis, A. (1965). "Probability, Random Variables and Stochastic Processes". McGraw-Hill, New York.
Parthasarathy, T. and T. E. S. Raghavan (1971). "Some Topics in Two-person Games". Elsevier, New York.
Parthasarathy, T. and T. E. S. Raghavan (1975). Existence of saddle-points and Nash
equilibrium points for differential games. SIAM Journal on Control, 11, 100-112.
Parthasarathy, T. and M. Stern (1977). Markov games: a survey. In E. O. Roxin, P. T. Liu and R. L. Sternberg, eds, "Differential Games and Control Theory II", pp. 1-46. Marcel Dekker, New York.
Peng, W. Y. and T. L. Vincent (1975). Some aspects of aerial combat. AIAA Journal, 13, 7-11.
Pindyck, R. S. (1977). Optimal economic stabilization policies under decentralized control and conflicting objectives. IEEE Transactions on Automatic Control, AC-22, no. 4, 517-530.
Plasmans, J. E. J. (1980). Interplay: a linked model for economic policy in the EEC. In Proceedings of the 3rd IFAC/IFORS Conference on Dynamic Modelling and Control of National Economies, Warsaw, Poland.
Plasmans, J. E. J. and A. J. de Zeeuw (1980). Nash, Pareto and Stackelberg solutions in macroeconomic policy models with a decentralized decision structure. Presented at the IFAC Workshop on Differential Games, Sochi, USSR. Also as an internal report, University of Tilburg, the Netherlands.
Pontryagin, L. S., V. G. Boltyanskii, R. V. Gamkrelidze and E. F. Mishchenko (1962). "The Mathematical Theory of Optimal Processes". Interscience Publishers, New York.
Pontryagin, L. S. (1967). Linear differential games, I. Soviet Math. Doklady, 8, 769-771.
Raghavan, T. E. S. (1979). Some remarks on matrix games and nonnegative matrices. SIAM Journal of Applied Mathematics, 36, no. 1, 83-85.
Reid, W. T. (1972). "Riccati Differential Equations". Academic Press, New York and London.
Restrepo, R. (1957). Tactical problems involving several actions. In M. Dresher, A. W. Tucker and P. Wolfe, eds, "Contributions to the Theory of Games", vol. III, pp. 313-335. Princeton University Press, Princeton, New Jersey.
Rosen, J. B. (1965). Existence and uniqueness of equilibrium points for concave Nperson games. Econometrica, 33, 520-534.
Roxin, E. O. (1969). The axiomatic approach in differential games. Journal of Optimization Theory and Applications, 3, 153-163.
Roxin, E. O. (1977). Differential games with partial differential equations. In P. Hagedorn, H. W. Knobloch and G. J. Olsder, eds, "Differential Games and Applications", pp. 186-204, Lecture Notes in Control and Information Sciences, vol. 3. Springer-Verlag, Berlin.
Rupp, W. (1977). ε-Gleichgewichtspunkte in n-Personenspielen. In R. Henn and O. Moeschlin, eds, "Mathematical Economics and Game Theory; Essays in Honor of Oskar Morgenstern". Springer-Verlag, Berlin.
Sagan, H. (1969). "Introduction to the Calculus of Variations". McGraw-Hill, New York.
Scalzo, R. C. (1974). Nperson linear quadratic differential games with constraints. SIAM Journal on Control, 12, no. 3, 419-425.
Scarf, H. (1967). The approximation of fixed points of a continuous mapping. SIAM Journal of Applied Mathematics, 15, no. 5, 1328-1343.
Schmitendorf, W. E. (1970). Existence of optimal openloop strategies for a class of differential games. Journal of Optimization Theory and Applications, 5, no. 5.
Shapley, L. S. (1953). Stochastic games. Proceedings of the National Academy of Sciences, USA, 39, 1095-1100.
Shapley, L. S. (1974). A note on the Lemke-Howson algorithm. Mathematical Programming Study, 1, 175-189.
Sharma, L. D. (1976). On ship maneuverability and collision avoidance. Internal report, Institut für Schiffbau der Universität Hamburg.
Shiau, T. H. and P. R. Kumar (1979). Existence of value and randomized strategies in zerosum discretetime stochastic dynamic games. Mathematics research report no. 79-17, University of Maryland, Baltimore County, Maryland.
Shakun, M. F., ed. (1972). Dedicated issue on game theory and gaming. Management Science, 18, no. 5, part II.
Simaan, M. and J. B. Cruz, Jr (1973a). On the Stackelberg strategy in nonzerosum games. Journal of Optimization Theory and Applications, 11, no. 5, 533-555.
Simaan, M. and J. B. Cruz, Jr (1973b). Additional aspects of the Stackelberg strategy in nonzerosum games. Journal of Optimization Theory and Applications, 11, no. 6, 613-626.
Simaan, M. and J. B. Cruz, Jr (1973c). Sampleddata Nash controls in nonzerosum differential games. International Journal of Control, 17, no. 6, 1201-1209.
Singleton, R. R. and W. F. Tyndal (1974). "Games and Programs". Freeman, San Francisco, California.
Sion, M. (1958). On general minimax theorems. Pacific Journal of Mathematics, 8, 171-176.
Sion, M. and P. Wolfe (1957). On a game without a value. In M. Dresher, A. W. Tucker and P. Wolfe, eds, "Contributions to the Theory of Games", vol. III, pp. 299-306. Princeton University Press, Princeton, New Jersey.
Sobel, M. J. (1971). Noncooperative stochastic games. Annals of Mathematical Statistics, 42, 1930-1935.
Von Stackelberg, H. (1934). "Marktform und Gleichgewicht". Springer, Vienna. (An English translation appeared in 1952 entitled "The Theory of the Market Economy", published by Oxford University Press, Oxford.)
Starr, A. W. and Y. C. Ho (1969a). Nonzerosum differential games. Journal of Optimization Theory and Applications, 3, no. 3, 184-206.
Starr, A. W. and Y. C. Ho (1969b).
Further properties of nonzerosum differential games. Journal of Optimization Theory and Applications, 3, no. 3, 207-219.
Tijs, S. H. (1975). Semiinfinite and infinite matrix games and bimatrix games. Ph.D. Thesis, University of Nijmegen, the Netherlands.
Tolwinski, B. (1978a). Numerical solution of Nperson nonzerosum differential games. Control and Cybernetics, 7, no. 1, 37-50.
Tolwinski, B. (1978b). On the existence of Nash equilibrium points for differential games with linear and nonlinear dynamics. Control and Cybernetics, 7, no. 3, 57-69.
Tolwinski, B. (1980). Stackelberg solution of dynamic game with constraints. In Proceedings of the 19th IEEE Conference on Decision and Control, Albuquerque, New Mexico.
Tolwinski, B. (1981a). Equilibrium solutions for a class of hierarchical games. In J. Gutenbaum and M. Niezgódka, eds, "Applications of Systems Theory to Economics, Management and Technology". PWN, Warsaw.
Tolwinski, B. (1981b). Closedloop Stackelberg solution to multistage linear quadratic game. Journal of Optimization Theory and Applications, 35.
Tucker, A. W. and R. D. Luce, eds (1959). "Contributions to the Theory of Games", vol. IV, Annals of Mathematics Studies, no. 40. Princeton University Press, Princeton, New Jersey.
Uchida, K. (1978). On the existence of Nash equilibrium point in Nperson nonzerosum stochastic differential games. SIAM Journal on Control and Optimization, 16, no. 1, 142-149.
Uchida, K. (1979). A note on the existence of a Nash equilibrium point in stochastic differential games. SIAM Journal on Control and Optimization, 17, no. 1, 1-4.
Varaiya, P. P. (1967). The existence of solutions to a differential game. SIAM Journal on Control, 5, 153-162.
Varaiya, P. P. (1970). Nperson nonzerosum differential games with linear dynamics. SIAM Journal on Control, 8, 441-449.
Varaiya, P. P. (1976). Nplayer stochastic differential games. SIAM Journal on Control and Optimization, 14, 538-545.
Varaiya, P. P. and J. G. Lin (1969). Existence of saddle point in differential games. SIAM Journal on Control, 7, 141-157.
Vidyasagar, M. (1976). A new approach to Nperson, nonzerosum, linear differential games. Journal of Optimization Theory and Applications, 18, 171-175.
Ville, J. A. (1938). Sur la théorie générale des jeux où intervient l'habileté des joueurs. In E. Borel, "Traité du calcul des probabilités et de ses applications IV", pp. 105-113.
Vincent, T. L. and W. Y. Peng (1973). Ship collision avoidance. Workshop on Differential Games, Naval Academy, Annapolis, Maryland.
Vincent, T. L., D. L. Stricht and W. Y. Peng (1976). Aircraft missile avoidance. Journal of Operations Research, 24.
Vorob'ev, N. H. (1977). "Game Theory". Springer-Verlag, Berlin.
Wald, A. (1945). Generalization of a theorem by von Neumann concerning zerosum twoperson games. Annals of Mathematics, 46, 281-286.
Warga, J. (1972). "Optimal Control of Differential and Functional Equations". Academic Press, New York and London.
Weil, R. L. (1968). Game theory and eigensystems. SIAM Review, 10, 360-367.
Whittaker, E. T. and G. N. Watson (1965). "A Course of Modern Analysis", 4th edn. Cambridge University Press, Cambridge.
Williams, J. D. (1954). "The Compleat Strategyst". McGraw-Hill, New York.
Willman, W. W. (1969).
Formal solutions for a class of stochastic pursuitevasion games. IEEE Transactions on Automatic Control, AC-14, no. 5, 504-509.
Wilson, D. J. (1977). Differential games with no information. SIAM Journal on Control and Optimization, 15, 233-246.
Wishart, D. M. G. and G. J. Olsder (1979). Discontinuous Stackelberg solutions. International Journal of Systems Science, 10, no. 12, 1359-1368.
Witsenhausen, H. S. (1971a). On information structures, feedback and causality. SIAM Journal on Control, 9, 149-160.
Witsenhausen, H. S. (1971b). On the relations between the values of a game and its information structure. Information and Control, 19, no. 3.
Witsenhausen, H. S. (1975). Alternatives to the tree model for extensive games. In J. D. Grote, ed., "The Theory and Applications of Differential Games", pp. 77-84. Reidel Publishing Company, The Netherlands.
Wong, E. (1971). "Stochastic Processes in Information and Dynamical Systems". McGraw-Hill, New York.
Zadeh, L. A. (1969). The concepts of system, aggregate and state in system theory. In L. A. Zadeh and E. Polak, eds, "System Theory", pp. 3-42. McGraw-Hill, New York.
Corollaries, Definitions, Examples, Lemmas, Propositions, Remarks and Theorems
Chapter 1
Examples: No. 1 (p. 4), 2 (p. 6), 3 (p. 11), 4 (p. 11)

Chapter 2
Corollaries: No. 1 (p. 24), 2 (p. 29), 3 (p. 31), 4 (p. 49), 5 (p. 58)
Definitions: No. 1 (p. 23), 2 (p. 26), 3 (p. 27), 4 (p. 28), 5 (p. 42), 6 (p. 43), 7 (p. 43), 8 (p. 44), 9 (p. 48), 10 (p. 49), 11 (p. 50), 12 (p. 52), 13 (p. 58), 14 (p. 64)
Examples: No. 1 (p. 43), 2 (p. 53), 3 (p. 55), 4 (p. 56)
Lemmas: No. 1 (p. 29), 2 (p. 36)
Propositions: No. 1 (p. 46), 2 (p. 46), 3 (p. 52), 4 (p. 55), 5 (p. 56), 6 (p. 57), 7 (p. 59)
Remarks: No. 1 (p. 28), 2 (p. 43), 3 (p. 43), 4 (p. 43), 5 (p. 48), 6 (p. 63)
Theorems: No. 1 (p. 20), 2 (p. 24), 3 (p. 28), 4 (p. 29), 5 (p. 36), 6 (p. 37)

Chapter 3
Corollaries: No. 1 (p. 81), 2 (p. 92)
Definitions: No. 1 (p. 72), 2 (p. 72), 3 (p. 72), 4 (p. 74), 5 (p. 78), 6 (p. 79), 7 (p. 82), 8 (p. 84), 9 (p. 85), 10 (p. 93), 11 (p. 94), 12 (p. 94), 13 (p. 99), 14 (p. 99), 15 (p. 101), 16 (p. 102), 17 (p. 102), 18 (p. 104), 19 (p. 108), 20 (p. 108), 21 (p. 115), 22 (p. 117), 23 (p. 118), 24 (p. 118), 25 (p. 119), 26 (p. 127), 27 (p. 127), 28 (p. 128), 29 (p. 131), 30 (p. 132), 31 (p. 136), 32 (p. 136), 33 (p. 138), 34 (p. 138), 35 (p. 141), 36 (p. 141), 37 (p. 142), 38 (p. 145)
Examples: No. 1 (p. 72), 2 (p. 78), 3 (p. 83), 4 (p. 85), 5 (p. 89), 6 (p. 96), 6 contd. (p. 104), 7 (p. 105), 8 (p. 106), 9 (p. 108), 10 (p. 111), 11 (p. 113), 12 (p. 120), 13 (p. 133), 14 (p. 136), 15 (p. 139), 16 (p. 143), 17 (p. 146), 18 (p. 147)
Propositions: No. 1 (p. 74), 2 (p. 75), 3 (p. 80), 4 (p. 85), 5 (p. 88), 6 (p. 91), 7 (p. 99), 8 (p. 114), 9 (p. 117), 10 (p. 119), 11 (p. 123), 12 (p. 124), 13 (p. 124), 14 (p. 129), 15 (p. 131), 16 (p. 137), 17 (p. 142)
Remarks: No. 1 (p. 74), 2 (p. 78), 3 (p. 80), 4 (p. 92), 5 (p. 99), 6 (p. 100), 7 (p. 101), 8 (p. 102), 9 (p. 103), 10 (p. 104), 11 (p. 112), 12 (p. 114), 13 (p. 115), 14 (p. 116), 15 (p. 119), 16 (p. 128), 17 (p. 128), 18 (p. 128), 19 (p. 129), 20 (p. 139)
Theorems: No. 1 (p. 79), 2 (p. 86), 3 (p. 128)

Chapter 4
Corollaries: No. 1 (p. 164), 2 (p. 168), 3 (p. 169), 4 (p. 178), 5 (p. 187)
Definitions: No. 1 (p. 158), 2 (p. 159), 3 (p. 160), 4 (p. 166), 5 (p. 176), 6 (p. 177), 7 (p. 179)
Examples: No. 1 (p. 157), 2 (p. 164), 3 (p. 169), 4 (p. 171), 5 (p. 173), 6 (p. 179), 7 (p. 189), 8 (p. 190)
Propositions: No. 1 (p. 175), 2 (p. 179), 3 (p. 180), 4 (p. 182), 5 (p. 184), 6 (p. 189)
Remarks: No. 1 (p. 166), 2 (p. 176), 3 (p. 178), 4 (p. 184), 5 (p. 187), 6 (p. 189)
Theorems: No. 1 (p. 159), 2 (p. 163), 3 (p. 165), 4 (p. 168), 5 (p. 168), 6 (p. 177)
Properties: No. 1 (p. 177), 2 (p. 177)

Chapter 5
Definitions: No. 1 (p. 205), 2 (p. 207), 3 (p. 208), 4 (p. 209), 5 (p. 210), 6 (p. 212), 7 (p. 215), 8 (p. 218), 9 (p. 218), 10 (p. 219), 11 (p. 232), 12 (p. 234)
Examples: No. 1 (p. 215), 2 (p. 226)
Propositions: No. 1 (p. 221), 2 (p. 229), 3 (p. 234), 4 (p. 236)
Remarks: No. 1 (p. 208), 2 (p. 209), 3 (p. 212), 4 (p. 221), 5 (p. 223), 6 (p. 223), 7 (p. 225), 8 (p. 229), 9 (p. 230), 10 (p. 232), 11 (p. 234), 12 (p. 236)
Theorems: No. 1 (p. 212), 2 (p. 216), 3 (p. 222), 4 (p. 225), 5 (p. 230)

Chapter 6
Corollaries: No. 1 (p. 253), 2 (p. 254), 3 (p. 255), 4 (p. 274), A1 (p. 287), A2 (p. 288), B1 (p. 297), B2 (p. 298), B3 (p. 298)
Definitions: No. 1 (p. 243), 2 (p. 251), 3 (p. 264), 4 (p. 266), A1 (p. 279), A2 (p. 285)
Examples: No. 1 (p. 275), A1 (p. 292)
Lemmas: No. 1 (p. 247), 2 (p. 256), A1 (p. 282)
Propositions: No. 1 (p. 251), 2 (p. 258), 3 (p. 261), 4 (p. 262), 5 (p. 264), A1 (p. 286), B1 (p. 294)
Remarks: No. 1 (p. 245), 2 (p. 254), 3 (p. 254), 4 (p. 257), 5 (p. 266), 6 (p. 267), 7 (p. 270), 8 (p. 274), A1 (p. 281), A2 (p. 288), A3 (p. 289), A4 (p. 291), B1 (p. 295), B2 (p. 298)
Theorems: No. 1 (p. 242), 2 (p. 243), 3 (p. 245), 4 (p. 247), 5 (p. 249), 6 (p. 252), 7 (p. 257), 8 (p. 268), 9 (p. 271), A1 (p. 278), A2 (p. 279), A3 (p. 281), A4 (p. 282), A5 (p. 284), A6 (p. 287), A7 (p. 290), B1 (p. 295)

Chapter 7
Corollaries: No. 1 (p. 307), 2 (p. 326)
Examples: No. 1 (p. 320), 2 (p. 324), 3 (p. 327)
Lemmas: No. 1 (p. 333), 2 (p. 335)
Propositions: No. 1 (p. 310), 2 (p. 314), 3 (p. 315), 4 (p. 322), 5 (p. 337)
Remarks: No. 1 (p. 309), 2 (p. 312), 3 (p. 313), 4 (p. 320), 5 (p. 320), 6 (p. 321), 7 (p. 324), 8 (p. 331), 9 (p. 331)
Theorems: No. 1 (p. 308), 2 (p. 313), 3 (p. 328), 4 (p. 339)
Condition: No. 1 (p. 328)

Chapter 8
Examples: No. 1 (p. 350), 2 (p. 352), 3 (p. 355), 4 (p. 362), 5 (p. 364), 6 (p. 367)
Remarks: No. 1 (p. 348), 2 (p. 350)
Theorems: No. 1 (p. 348), 2 (p. 349)
Index
Absolute continuity, 407 Action, 5 set, 205 Admissible feedback Stackelberg solution, 132 Admissible Nash equilibrium solution, 84 Admissible strategy pair, 72 Advertising game, 6, 301 Aeronautics application, 385 Average cost functional, 209 Average lower value, 28 Average security level, 27 Average Stackelberg cost, 179 Average upper value, 27 Average value, 408 Banach space, 400 Bangbangbang surface, 397 Barrier, 228, 367 Battle of the sexes, 73, 81 Behavioural Stackelberg equilibrium, 135 Behavioural strategies, 48, 94 in infinite dynamic games, 217 in multiact games, 54, 123 Bellshaped games, 174 Better pair of strategies, 72 Bimatrix game, 71 Bombersubmarine problem, 175 Borel set, 406 Boundary of usable part (BUP), 354 Brachistochrone, 371 Brouwer fixed point theorem, 409 Canonical equations, 224 Capturability, 353 Cauchy sequence, 400 Causality, 204 Chance moves, 61, 144
Closed set, 400 Closedloop imperfect state information, 207 Closedloop nomemory Nash equilibria, 249 Closedloop nomemory Nash equilibria for differential games, 284 Closedloop perfect state information, 207, 212 Collision avoidance, 380 Compact set, 400 Completely mixed game, 68 Completely mixed Nash equilibrium, 80 Computation, mixedstrategy Nash equilibria, 79, 88, 90 Nash solution for quadratic games, 185 mixed equilibrium strategies, 31 Concavity, 402 Concurrent Stackelberg solution, 182 Conflict, 1 Conic surveillance evasion, 398 Conjectural variation, 190 Conjugate point, 230 Constant level curve, 161 Constant strategy, 7, 43 Constantsum game, 4 Continuity, 401 Continuouskernel game, 165 Continuoustime infinite dynamic games, 210 Control function, 210 Control set, 205 Convexity, 401 Cooperation, 4 Cooperative game, 5 Cost function(al), 3, 93, 206 Costate vector (variable), 224 Countable set, 399
Cournot duopoly, 189 Covariance, 408 Curve of equal time to go, 227 Cusp, 361 Decision maker, 1 Decision point, 365, 371 Decision rule, 6 Delayed commitment, 59, 104, 108, 118 Density function, 407 Differential game, 2, 210 Discretetime infinite game, 202 Discretetime systems, 230 Dispersal surface (line), 362 Dogfight, 386 Dolichobrachistochrone, 371 Dominating strategy, 38 Duel, 173 Duopoly, 189 Dynamic form, 108 Dynamic game, 1 Dynamic programming, 3, 220, 222 Eigenvalue, 401 Eigenvector, 401 Epsilon Stackelberg strategy, 177 Epsilondelayed closedloop perfect state information, 212 Epsilonequilibrium, 156 Epsilonmixed saddlepoint strategy, 158 Epsilonneighbourhood, 400 Epsilonsaddle point, 159 Equivocal line, 366 Evader, 345 Expected cost functional, 209 Expected value, 408 Extensive form, 39, 93, 207 with chance moves, 64 Feedback, 10, 208 game in extensive form, 50 game in laddernested extensive form, 116 imperfect state information, 208 Nash equilibria, 117, 249 Nash equilibria for differential games, 278 perfect state information, 212 saddlepoint strategy, 52 Stackelberg solution, 130, 312
Finite game, 10 Finite set, 399 Fixed point theorem, 409 Focal line (surface), 366 Follower, 12, 125 Functional, 401 Gainfloor, 20 Games, of degree, 206 of kind, 206 with chance moves, 61 with separable kernels, 170 on the square, 165 of timing, 172 Gauss-Seidel procedure, 185 Global Stackelberg solution, 130, 315, 342 Gradient, 402 Graphical solution, 32 Hamilton-Jacobi-Bellman (HJB) equation, 222, 295 Hamiltonian, 224 Hessian, 402 Hierarchical equilibrium, 141, 305 Homicidal chauffeur game, 357, 364 Infinite game, 10, 156 Infinite horizon, 238 Infinite set, 399 Information, 205 field, 211 pattern, 205 set, 40, 42, 93 space, 206 structure, 10, 205, 211 Informational inferiority, 99, 119, 264 Informationally nonunique Nash equilibria, 266 Informational nonuniqueness, 100, 119, 259 Initial state, 205 Inner mixedstrategy Nash equilibrium, 80 Interchangeability of equilibria, 24, 75 Interior, 400 Isaacs condition, 296, 348 Isaacs equation, 346 Isocost curve, 161 Isochrone, 227
Kakutani's fixed point theorem, 409 Laddernested form, 101 Lady in the lake, 11, 368 Leader, 12, 125 Leaking corner, 360, 374 Lebesgue measure, 407 Lemke-Howson algorithm, 90 Limit, 400 Linear independence, 399 Linear programming, 34 Linearquadratic, differential game, 279 discretetime game, 243 Linearquadratic optimal control, continuoustime, 228 discretetime, 221 Lion and man, 398 Lipschitz continuity, 213 Locally stable Nash equilibrium, 167 Loop model, 202 Lossceiling, 20 Lower value, 21, 159 Many player Stackelberg games, 140 Markov game, 238 Matrix, 401 game, 19 Riccati equation, 229 Memoryless perfect state information, 207, 212 Mental poker, 151 Microeconomics, 189 Minimax solution, 12, 76 strategies, 78 theorem, 29 value, 78 Minimum principle, 223 Miss distance, 381 Mixed epsilon Nash equilibrium, 163 security strategy, 27 Stackelberg solution, 135, 179 Mixed strategies, 11, 25, 26, 78, 94 in infinite dynamic games, 217 in infinite static games, 162 in multiact games, 123 Mixedstrategy Nash equilibria, 85 Multiact game, 43, 50, 115 Multistage game, 50
Nperson discretetime infinite dynamic game, 205 Nperson games, 82 Nperson nonzerosum feedback game, 115 Nash condition, 296 Nash equilibrium, 5, 72, 82, 92, 94, 158, 239, 240 in behavioural strategies, 112 in mixed strategies, 79, 112 outcome, 72 Nature, 2, 209 Near miss, 387 Nested form, 101 Noisy duel, 194 Noisy observations, 304 Noncooperative equilibrium, 72, 82, 94 outcome, 72 Noncooperative game, 1 Nonconcurrent Stackelberg solution, 182 Nonlinear programming, 91 Nonuniqueness of Nash equilibria, 72, 100 Nonzerosum games, 4, 70 with chance moves, 144 Norm, 400 Normal form, 39, 82, 207 Observation, 205 Obstacle tag, 398 Oligopoly, 189 Onestep delayed closedloop perfect state information pattern, 208 Onestep delayed observation sharing pattern, 208 Open set, 400 Openloop, 10, 207, 212 game, 58 Nash equilibrium for differential games, 278 Nash equilibrium for discretetime games, 240 realization, 232 Stackelberg solution, 306 Optimal control theory, 2 Optimal response set, 127, 136, 138, 160 Optimal solution, 402 Optimality, 12 principle of, 220
Pareto-optimal solution, 5 Payoff function, 42 Penetration, 355 Permissible strategy, 211 Personbyperson optimality, 187 Playability, 215 Playable strategies, 353 Player, 1 set, 42, 93 Players' set, 205, 210 Precedence of players, 101 Price game, 191 Princess and the monster, 11 Prior commitment, 59, 104 Prisoners' dilemma, 77 Probability distribution, 162, 406 Probability measure, 405 Proper mixed strategy, 26 Pure strategy, 23 Pursuer, 345 Pursuitevasion, 344 Quadratic games, 183 Qualitative game, 206 Quantitative game, 206 Quantity game, 191 Random variable (vector), 26, 406 Rational reaction set, 127, 136, 138, 160 Reaction function, 160 Representation of a strategy, 231, 265, 318 Riccati equation, 229 Robust Nash solution, 166 Role determination, 385 Rope pulling game, 4 Saddle point, 23 equilibrium, 23, 44 in behavioural strategies, 49 in differential games, 345 in mixed strategies, 28 Saddlepoint value, 44 Safe contact, 367, 378 Security level, 20 Security strategy, 20 Semiinfinite bimatrix game, 163 Semipermeable, 348 surface, 359, 384 Separable kernel, 170
Ship collision avoidance, 380 Sigma algebra, 405 Simplex, 25 Simultaneous confrontation, 387 Singleact game, nonzerosum, 95 zerosum, 43 Singular surface (line), 361 Stable Nash equilibrium, 167 Stackelberg cost, 127 Stackelberg equilibrium, 125, 127, 176, 305 in mixed strategies, 136 in behavioural strategies, 138 Stackelberg game, 12, 176, 324, 332 Stage of a game, 205 Stageadditive cost function, 207 Stagewise Stackelberg solution, 130 Stalemate Stackelberg solution, 182 State equation, 205 State of a game, 50, 205 State space, 205 State trajectory, 210 Static infinite game, 156 Static normal form, 99, 118 Static subextensive form, 102 Static version of a dynamic game, 99, 118 Stochastic differential equation, 216 Stochastic differential game, 216, 293 Stochastic discretetime games, 271 Stochastic games, 208 Stochastic systems, 234 Strategic equivalence, 36, 74, 84 Strategic form, 3 Strategy, 5, 43, 206 set, 43, 94 space, 43, 94 Strict dominance, 38 Subextensive form, 102 Switching envelope, 366 Switching function, 226 Switching surface, 361 Symmetric game, 66 Target set, 214, 345 Team problem, 75, 187, 267, 319 Teamoptimal solution, 187 Terminal cost functional, 208 Terminal time, 214
Termination, 206, 214, 352 dilemma, 353 Threat strategy, 319 Trajectory space, 210 Transition surface (line), 361 Tree structure, 40 Twocars model, 356 Twotarget game, 386 Undecomposable extensive form, 108 Universal surface (line), 363 Unstable Nash equilibrium, 167 Upper value, 21, 159
Usable part (UP), 227, 354 Value, 23, 159 function, 220, 222, 346 in mixed strategies, 28 Vector space, 400 Weierstrass theorem, 403 Wiener process, 216 Zerosum game, 3, 19 with chance moves, 61